Linking teacher classroom assessments to standardized
state and national tests
Excerpted from: National Research Council. (2001).
Knowing What Students Know: The Science and Design of Educational
Assessment, Committee on the Foundations of Assessment, J. Pellegrino, N.
Chudowsky and R. Glaser (eds.), Division on Behavioral and Social Sciences and
Education. Washington, DC: National Academy Press, pp. 252-257.
Assessment Systems
In the preceding discussion we have addressed issues
of practice related to classroom and large-scale assessment separately. We now
return to the matter of how such assessments can work together conceptually and
operationally. As argued throughout this chapter, one form of assessment does
not serve all purposes. Given that reality, it is inevitable that multiple
assessments (or assessments consisting of multiple components) are required to
serve the varying educational assessment needs of different audiences. A
multitude of different assessments are already being conducted in schools. It
is not surprising that users are often frustrated when such assessments have
conflicting achievement goals and results. Sometimes such discrepancies can be
meaningful and useful, such as when assessments are explicitly aimed at
measuring different school outcomes. More often, however, conflicting
assessment goals and feedback cause much confusion for educators, students, and
parents. In this section we describe a vision for coordinated systems of
multiple assessments that work together, along with curriculum and instruction,
to promote learning. Before describing specific properties of such systems, we
consider issues of balance and allocation of resources across classroom and
large-scale assessment.
Balance Between Classroom and Large-Scale Assessment
The current educational assessment environment in the
United States clearly reflects the considerable value and credibility placed on
external, large-scale assessments of individuals and programs relative to
classroom assessment designed to assist learning. The resources invested in
producing and using large-scale testing in terms of money, instructional time,
research, and development far outweigh the investment in the design and use of
effective classroom assessment. It is the committee's position that to better
serve the goals of learning, the research, development, and training investment
must be shifted toward the classroom where teaching and learning occur.
Not only does large-scale assessment dominate
classroom assessment, but there is also ample evidence that accountability
measures negatively affect classroom instruction and assessment. For
instance, as discussed earlier, teachers feel pressure to teach to the test,
which results in a narrowing of instruction. They also model their own
classroom tests after less-than-ideal standardized tests (Gifford and O'Connor,
1992; Linn, 2000; Shepard, 2000). These kinds of problems suggest that beyond
striking a better balance between classroom and large-scale assessment, what is
needed are coordinated systems of assessments that collectively support a
common set of learning goals, rather than working at cross-purposes.
Ideally, in a balanced assessment environment, a single
assessment does not function in isolation, but rather within a nested
assessment system involving states, local school districts, schools, and
classrooms. Assessment systems should be designed to optimize the credibility
and utility of the resulting information for both educational decision making
and general monitoring. To this end, an assessment system should exhibit three
properties: comprehensiveness, coherence, and continuity. These three
characteristics describe an assessment system that is aligned along three
dimensions: vertically, across levels of the education system; horizontally,
across assessment, curriculum, and instruction; and temporally, across the
course of a student's studies. These notions of alignment are consistent with
those set forth by the National Institute for Science Education (Webb, 1997)
and the National Council of Teachers of Mathematics (1995).
Features of a Balanced Assessment System
Comprehensiveness
By comprehensiveness, we mean that a range of
measurement approaches should be used to provide a variety of evidence to
support educational decision making. Educational decisions often require more
information than a single measure can provide. As emphasized in the National
Research Council report High Stakes: Testing for Tracking, Promotion, and
Graduation (1999b), multiple measures take on particular importance when
important, life-altering decisions (such as high school graduation) are being
made about individuals. No single test score can be considered a definitive
measure of a student's competence. Multiple measures enhance the validity and
fairness of the inferences drawn by giving students various ways and
opportunities to demonstrate their competence. The measures could also address
the quality of instruction, providing evidence that improvements in tested
achievement represent real gains in learning (NRC, 1999c).
Example: U.K. Secondary School Certification Exam
System. One form of comprehensive assessment system is illustrated in
Box 6-4, which shows the components of a U.K. examination for certification of
top secondary school students who have studied physics as one of three chosen
subjects for two years between ages 16 and 18. The results of such examinations
are the main criterion for entrance to university courses. Components A, B, C,
and D are all taken within a few days, but E and F involve activities that
extend over several weeks preceding the formal examination.
| Component | Title | No. of Questions or Tasks | Time | Weight in Marks | Description |
|---|---|---|---|---|---|
| A | Coded Answer | 40 | 75 min. | 20% | Multiple choice questions, all to be attempted. |
| B | Short Answer | 7 or 8 | 90 min. | 20% | Short with structured subcomponents, fixed space for answer, all to be attempted. |
| C | Comprehension | 3 | 150 min. | 24% | a) Answer questions on a new passage. b) Analyze and draw conclusions from a set of presented data. c) Explain phenomena described in short paragraphs: select 3 from 5. |
| D | Practical Problems | 8 | 90 min. | 16% | Short problems with equipment set up in a laboratory, all to be attempted. |
| E | Investigation | 1 | About 2 weeks | 10% | In normal school laboratory time, investigate a problem of the student's own choice. |
| F | Project Essay | 1 | About 2 weeks | 10% | In normal school time, research and write about a topic chosen by the student. |
This system combines external testing on paper
(components A, B, and C) with external performance tasks done using equipment
(D) and with teachers' assessment of work done during the course of instruction
(E and F). While this particular physics examination is now subject to
change[1], combining the results of external tests
with classroom assessments of particular aspects of achievement for which a
short formal test is not appropriate is an established feature of achievement
testing systems in the United Kingdom and several other countries. This feature
is also part of the examination system for the International Baccalaureate
degree program. In such systems, work is needed to develop procedures for
ensuring comparability of standards across all teachers and schools.
[1] Because the whole structure of the 16-18 examinations is being changed, this
examination and the curriculum on which it is based, which has been in place
for 30 years, will no longer operate after 2001. A new curriculum and
examination, based on the same principles, will replace it.
Overall, the purpose is to reflect the variety of the
aims of the course, including the range of knowledge and simple understanding
explored in A, the practical skills explored in D, and the broader capacities
for individual investigation explored in E and F. Validity and
comprehensiveness are enhanced, albeit through an expensive and complex
assessment process.
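As a rough illustration of how the "Weight in Marks" column of Box 6-4 could combine six component results into one overall mark, the following Python sketch computes a weighted average. The weights are those listed in the box; everything else is an assumption made for illustration: that each component is first scored on a common 0-100 scale, the function name composite_mark, and the sample scores. The report does not describe the actual aggregation or moderation procedure used by the examination board.

```python
# Weights taken from Box 6-4 (percent of total marks per component).
WEIGHTS = {"A": 20, "B": 20, "C": 24, "D": 16, "E": 10, "F": 10}


def composite_mark(scores):
    """Combine per-component scores into one weighted mark.

    Assumption (not from the report): every component score has already
    been placed on a common 0-100 scale.
    """
    if set(scores) != set(WEIGHTS):
        raise ValueError("scores must cover exactly components A-F")
    total_weight = sum(WEIGHTS.values())  # the six weights sum to 100
    return sum(WEIGHTS[c] * scores[c] for c in WEIGHTS) / total_weight


# Hypothetical student: stronger on teacher-assessed work (E, F)
# than on the timed written papers (A, B, C).
example = {"A": 70, "B": 60, "C": 50, "D": 80, "E": 90, "F": 100}
print(composite_mark(example))
```

In this sketch the teacher-assessed components E and F together carry one fifth of the total, so strong coursework raises the overall mark but cannot outweigh the externally set papers.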
There are other possible ways to design comprehensive
assessment systems. Portfolios are intended to record "authentic" assessments
over a period of time and a range of classroom contexts. A system may assess
and give certification in stages, so that the final outcome is an accumulation
of results achieved and credited separately over, say, 1 or 2 years of a
learning course; results of this type may be built up by combining on-demand
externally controlled assessments with work samples drawn from coursework. Such
a system may include assessments administered at fixed times or at times of the
candidate's choice using banks of tasks from which tests can be selected to
match the candidate's particular opportunities to learn. Thus designers must
always look to the possibility of using the broader approaches discussed here,
combining types of tasks and the timing of assessments and of certifications in
the optimum way.
Further, in a comprehensive assessment system, the
information derived should be technically sound and timely for given decisions.
One must be able to trust the accuracy of the information and be assured that
the inferences drawn from the results can be substantiated by evidence of
various types. The technical quality of assessment is a concern primarily for
external, large-scale testing; but if classroom assessment information is to
feed into the larger assessment system, the reliability, validity, and fairness
of these assessments must be addressed as well. Researchers are just beginning
to explore issues of technical quality in the realm of classroom assessment
(e.g., Wilson and Sloane, 2000).
Coherence
For the system to support learning, it must also have
a quality the committee refers to as coherence. One dimension of coherence is
that the conceptual base or models of student learning underlying the various
external and classroom assessments within a system should be compatible. While
a large-scale assessment might be based on a model of learning that is coarser
than that underlying the assessments used in classrooms, the conceptual base
for the large-scale assessment should be a broader version of one that makes
sense at the finer-grained level (Mislevy, 1996). In this way, the external
assessment results will be consistent with the more detailed understanding of
learning underlying classroom instruction and assessment. As one moves up and
down the levels of the system, from the classroom through the school, district,
and state, assessments along this vertical dimension should align. As long as
the underlying models of learning are consistent, the assessments will
complement each other rather than present conflicting goals for learning.
To keep learning at the center of the educational
enterprise, assessment information must be strongly linked to curriculum and
instruction. Thus another aspect of coherence, emphasized earlier, is that
alignment is needed among curriculum, instruction, and assessment so that all
three parts of the education system are working toward a common set of learning
goals. Ideally, assessment will not simply be aligned with instruction, but
integrated seamlessly into instruction so that teachers and students are
receiving frequent but unobtrusive feedback about their progress. If
assessment, curriculum, and instruction are aligned with common models of
learning, it follows that they will be aligned with each other. This can be
thought of as alignment along the horizontal dimension of the system.
To achieve both the vertical and horizontal dimensions
of coherence or alignment, models of learning are needed that are shared by
educators at different levels of the system, from teachers to policy makers.
This need might be met through a process that involves gathering together the
necessary expertise, not unlike the approach used to develop state and national
curriculum standards that define the content to be learned. But current
definitions of content must be significantly enhanced based on research from
the cognitive sciences. Needed are user-friendly descriptions of how students
learn the content, identifying important targets for instruction and assessment
(see e.g., American Association for the Advancement of Science, 2001). Research
centers could be charged with convening the appropriate experts to produce a
synthesis of the best available scientific understanding of how students learn
in particular domains of the curriculum. These models of learning would then
guide assessment design at all levels, as well as curriculum and instruction,
effecting alignment in the system. Some might argue that what we have described
are the goals of current curriculum standards. But while the existing standards
emphasize what students should learn, they do not describe how
students learn in ways that are maximally useful for guiding instruction and
assessment.
Continuity
In addition to comprehensiveness and coherence, an
ideal assessment system would be designed to be continuous. That is,
assessments should measure student progress over time, more akin to a videotape
record than to the snapshots provided by the current system of on-demand
tests. To provide such pictures of progress, multiple sets of observations over
time must be linked conceptually so that change can be observed and
interpreted. Models of student progression in learning should underlie the
assessment system, and tests should be designed to provide information that
maps back to the progression. With such a system we would move from "one-shot"
testing situations and cross-sectional approaches to defining student
performance toward an approach that focuses on the processes of learning and an
individual's progress through that process (Wilson and Sloane, 2000). Thus,
continuity calls for alignment along the third dimension of time.
Approximations of a Balanced System
No existing systems of assessment meet all three
criteria of comprehensiveness, coherence, and continuity, but many of the
examples described in this report represent steps toward these goals. For
instance, the Developmental Assessment program shows how progress maps can be
used to achieve coherence between formative and summative assessment, as well
as among curriculum, instruction, and assessments. Progress maps also enable
the measurement of growth (continuity). The Australian Council for Educational
Research has produced an excellent set of resource materials for teachers to
support their use of a wide range of assessment strategies, from written tests
to portfolios to projects at the classroom level, that can all be designed to
link back to the progress maps (comprehensiveness) (e.g., Forster and Masters,
1996a, 1996b; Masters and Forster, 1996). The BEAR assessment shares many
similar features; however, the underlying models of learning are not as
strongly tied to cognitive research as they could be. On the other hand,
intelligent tutoring systems have a strong cognitive research base and offer
opportunities for integrating formative and summative assessments, as well as
measuring growth, yet their use for large-scale assessment purposes has not yet
been explored. Thus, examples in this report offer a rich set of opportunities
for further development toward the goal of designing assessment systems that
are maximally useful for both informing and improving learning.
Some questions for thought:
- What are the features of current large-scale
assessment practices that should be preserved in a "balanced" assessment
system?
- What should be added?
- Where are the roadblocks to creating the
large-scale part of a balanced assessment system? In terms of policy? In terms
of assessment and measurement developments?
- What are the features of current classroom
assessment practices that should be preserved in a "balanced" assessment
system?
- What should be added?
- Where are the roadblocks to creating the classroom
part of a balanced assessment system? In terms of policy? In terms of
assessment and measurement developments?
- What aspects of the two parts (classroom and
large-scale) must be coordinated?
- What aspects can be left unique to each part?
- What groups should be engaged in the policy-setting
and development effort that would be needed?
- How can NSF help in this effort?