Learning and Education: Building Knowledge, Understanding Its Implications, May 15-17, 2002, Arlington, VA

Linking teacher classroom assessments to standardized state and national tests

Excerpted from: National Research Council. (2001) Knowing What Students Know: The Science and Design of Educational Assessment, Committee on the Foundations of Assessment, J. Pellegrino, N. Chudowsky and R. Glaser (eds.), Division on Behavioral and Social Sciences and Education. Washington, DC: National Academy Press, pp. 252-257.

Assessment Systems

In the preceding discussion we have addressed issues of practice related to classroom and large-scale assessment separately. We now return to the matter of how such assessments can work together conceptually and operationally. As argued throughout this chapter, one form of assessment does not serve all purposes. Given that reality, it is inevitable that multiple assessments (or assessments consisting of multiple components) are required to serve the varying educational assessment needs of different audiences. A multitude of different assessments are already being conducted in schools. It is not surprising that users are often frustrated when such assessments have conflicting achievement goals and results. Sometimes such discrepancies can be meaningful and useful, such as when assessments are explicitly aimed at measuring different school outcomes. More often, however, conflicting assessment goals and feedback cause much confusion for educators, students, and parents. In this section we describe a vision for coordinated systems of multiple assessments that work together, along with curriculum and instruction, to promote learning. Before describing specific properties of such systems, we consider issues of balance and allocation of resources across classroom and large-scale assessment.

Balance Between Classroom and Large-Scale Assessment

The current educational assessment environment in the United States clearly reflects the considerable value and credibility placed on external, large-scale assessments of individuals and programs relative to classroom assessment designed to assist learning. The resources invested in producing and using large-scale testing, in terms of money, instructional time, research, and development, far outweigh the investment in the design and use of effective classroom assessment. It is the committee's position that to better serve the goals of learning, the research, development, and training investment must be shifted toward the classroom, where teaching and learning occur.

Not only does large-scale assessment dominate over classroom assessment, but there is also ample evidence of accountability measures negatively impacting classroom instruction and assessment. For instance, as discussed earlier, teachers feel pressure to teach to the test, which results in a narrowing of instruction. They also model their own classroom tests after less-than-ideal standardized tests (Gifford and O'Connor, 1992; Linn, 2000; Shepard, 2000). These kinds of problems suggest that beyond striking a better balance between classroom and large-scale assessment, what is needed are coordinated systems of assessments that collectively support a common set of learning goals, rather than working at cross-purposes.

Ideally in a balanced assessment environment, a single assessment does not function in isolation, but rather within a nested assessment system involving states, local school districts, schools, and classrooms. Assessment systems should be designed to optimize the credibility and utility of the resulting information for both educational decision making and general monitoring. To this end, an assessment system should exhibit three properties: comprehensiveness, coherence, and continuity. These three characteristics describe an assessment system that is aligned along three dimensions: vertically, across levels of the education system; horizontally, across assessment, curriculum, and instruction; and temporally, across the course of a student's studies. These notions of alignment are consistent with those set forth by the National Institute for Science Education (Webb, 1997) and the National Council of Teachers of Mathematics (1995).

Features of a Balanced Assessment System

Comprehensiveness

By comprehensiveness, we mean that a range of measurement approaches should be used to provide a variety of evidence to support educational decision making. Educational decisions often require more information than a single measure can provide. As emphasized in the National Research Council report High Stakes: Testing for Tracking, Promotion, and Graduation (1999b), multiple measures take on particular importance when important, life-altering decisions (such as high school graduation) are being made about individuals. No single test score can be considered a definitive measure of a student's competence. Multiple measures enhance the validity and fairness of the inferences drawn by giving students various ways and opportunities to demonstrate their competence. The measures could also address the quality of instruction, providing evidence that improvements in tested achievement represent real gains in learning (NRC, 1999c).

Example: UK Secondary School Certification Exam System. One form of comprehensive assessment system is illustrated in Box 6-4, which shows the components of a U.K. examination for certification of top secondary school students who have studied physics as one of three chosen subjects for two years between ages 16 and 18. The results of such examinations are the main criterion for entrance to university courses. Components A, B, C, and D are all taken within a few days, but E and F involve activities that extend over several weeks preceding the formal examination.

| Component | Title | No. of Questions or Tasks | Time | Weight in Marks | Description |
|---|---|---|---|---|---|
| A | Coded Answer | 40 | 75 min. | 20% | Multiple choice questions, all to be attempted. |
| B | Short Answer | 7 or 8 | 90 min. | 20% | Short with structured subcomponents, fixed space for answer, all to be attempted. |
| C | Comprehension | 3 | 150 min. | 24% | a) Answer questions on a new passage. b) Analyze and draw conclusions from a set of presented data. c) Explain phenomena described in short paragraphs: select 3 from 5. |
| D | Practical Problems | 8 | 90 min. | 16% | Short problems with equipment set up in a laboratory, all to be attempted. |
| E | Investigation | 1 | About 2 weeks | 10% | In normal school laboratory time, investigate a problem of the student's own choice. |
| F | Project Essay | 1 | About 2 weeks | 10% | In normal school time, research and write about a topic chosen by the student. |

This system combines external testing on paper (components A, B, and C) with external performance tasks done using equipment (D) and with teachers' assessment of work done during the course of instruction (E and F). While this particular physics examination is now subject to change [1], combining the results of external tests with classroom assessments of particular aspects of achievement for which a short formal test is not appropriate is an established feature of achievement testing systems in the United Kingdom and several other countries. This feature is also part of the examination system for the International Baccalaureate degree program. In such systems, work is needed to develop procedures for ensuring comparability of standards across all teachers and schools.

[1] Because the whole structure of the 16-18 examinations is being changed, this examination and the curriculum on which it is based, which has been in place for 30 years, will no longer operate after 2001. A new curriculum and examination, based on the same principles, will replace it.

Overall, the purpose is to reflect the variety of the aims of the course, including the range of knowledge and simple understanding explored in A, the practical skills explored in D, and the broader capacities for individual investigation explored in E and F. Validity and comprehensiveness are enhanced, albeit through an expensive and complex assessment process.
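To make the weighting in the component table concrete, the short sketch below combines six component scores into a single mark out of 100. The percentage weights come from the table. Everything else is an assumption for illustration: the excerpt does not say how the examining body actually aggregates or moderates marks, so the simple linear weighting, the function name `composite_mark`, and the sample student scores are all hypothetical.

```python
# Percentage weights for components A-F, as listed in the component table.
WEIGHTS = {"A": 20, "B": 20, "C": 24, "D": 16, "E": 10, "F": 10}


def composite_mark(fractions):
    """Combine per-component scores into one mark out of 100.

    `fractions` maps each component letter to the share of that
    component's marks the student earned (0.0 to 1.0).
    Assumption: plain linear weighting, with no scaling or moderation.
    """
    if set(fractions) != set(WEIGHTS):
        raise ValueError("expected exactly one score for each of components A-F")
    for name, value in fractions.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"score for component {name} must be between 0 and 1")
    return sum(WEIGHTS[name] * fractions[name] for name in WEIGHTS)


# Hypothetical student: strong on the investigation (E), weaker on the written papers.
example = {"A": 0.75, "B": 0.5, "C": 0.5, "D": 0.75, "E": 1.0, "F": 0.5}
print(composite_mark(example))  # 64.0
```

Under this assumed weighting, the two teacher-assessed components (E and F) together contribute at most 20 of the 100 marks, while the externally set components (A through D) contribute the remaining 80.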

There are other possible ways to design comprehensive assessment systems. Portfolios are intended to record "authentic" assessments over a period of time and a range of classroom contexts. A system may assess and give certification in stages, so that the final outcome is an accumulation of results achieved and credited separately over, say, 1 or 2 years of a learning course; results of this type may be built up by combining on-demand externally controlled assessments with work samples drawn from coursework. Such a system may include assessments administered at fixed times or at times of the candidate's choice using banks of tasks from which tests can be selected to match the candidate's particular opportunities to learn. Thus designers must always look to the possibility of using the broader approaches discussed here, combining types of tasks and the timing of assessments and of certifications in the optimum way.

Further, in a comprehensive assessment system, the information derived should be technically sound and timely for given decisions. One must be able to trust the accuracy of the information and be assured that the inferences drawn from the results can be substantiated by evidence of various types. The technical quality of assessment is a concern primarily for external, large-scale testing; but if classroom assessment information is to feed into the larger assessment system, the reliability, validity, and fairness of these assessments must be addressed as well. Researchers are just beginning to explore issues of technical quality in the realm of classroom assessment (e.g., Wilson and Sloane, 2000).

Coherence

For the system to support learning, it must also have a quality the committee refers to as coherence. One dimension of coherence is that the conceptual base or models of student learning underlying the various external and classroom assessments within a system should be compatible. While a large-scale assessment might be based on a model of learning that is coarser than that underlying the assessments used in classrooms, the conceptual base for the large-scale assessment should be a broader version of one that makes sense at the finer-grained level (Mislevy, 1996). In this way, the external assessment results will be consistent with the more detailed understanding of learning underlying classroom instruction and assessment. As one moves up and down the levels of the system, from the classroom through the school, district, and state, assessments along this vertical dimension should align. As long as the underlying models of learning are consistent, the assessments will complement each other rather than present conflicting goals for learning.

To keep learning at the center of the educational enterprise, assessment information must be strongly linked to curriculum and instruction. Thus another aspect of coherence, emphasized earlier, is that alignment is needed among curriculum, instruction, and assessment so that all three parts of the education system are working toward a common set of learning goals. Ideally, assessment will not simply be aligned with instruction, but integrated seamlessly into instruction so that teachers and students are receiving frequent but unobtrusive feedback about their progress. If assessment, curriculum, and instruction are aligned with common models of learning, it follows that they will be aligned with each other. This can be thought of as alignment along the horizontal dimension of the system.

To achieve both the vertical and horizontal dimensions of coherence or alignment, models of learning are needed that are shared by educators at different levels of the system, from teachers to policy makers. This need might be met through a process that involves gathering together the necessary expertise, not unlike the approach used to develop state and national curriculum standards that define the content to be learned. But current definitions of content must be significantly enhanced based on research from the cognitive sciences. What is needed are user-friendly descriptions of how students learn the content, identifying important targets for instruction and assessment (see, e.g., American Association for the Advancement of Science, 2001). Research centers could be charged with convening the appropriate experts to produce a synthesis of the best available scientific understanding of how students learn in particular domains of the curriculum. These models of learning would then guide assessment design at all levels, as well as curriculum and instruction, effecting alignment in the system. Some might argue that what we have described are the goals of current curriculum standards. But while the existing standards emphasize what students should learn, they do not describe how students learn in ways that are maximally useful for guiding instruction and assessment.

Continuity

In addition to comprehensiveness and coherence, an ideal assessment system would be designed to be continuous. That is, assessments should measure student progress over time, more akin to a videotape record than to the snapshots provided by the current system of on-demand tests. To provide such pictures of progress, multiple sets of observations over time must be linked conceptually so that change can be observed and interpreted. Models of student progression in learning should underlie the assessment system, and tests should be designed to provide information that maps back to the progression. With such a system we would move from "one-shot" testing situations and cross-sectional approaches to defining student performance toward an approach that focuses on the processes of learning and an individual's progress through that process (Wilson and Sloane, 2000). Thus, continuity calls for alignment along the third dimension of time.

Approximations of a Balanced System

No existing systems of assessment meet all three criteria of comprehensiveness, coherence, and continuity, but many of the examples described in this report represent steps toward these goals. For instance, the Developmental Assessment program shows how progress maps can be used to achieve coherence between formative and summative assessment, as well as among curriculum, instruction, and assessments. Progress maps also enable the measurement of growth (continuity). The Australian Council for Educational Research has produced an excellent set of resource materials for teachers to support their use of a wide range of assessment strategies, from written tests to portfolios to projects at the classroom level, that can all be designed to link back to the progress maps (comprehensiveness) (e.g., Forster and Masters, 1996a, 1996b; Masters and Forster, 1996). The BEAR assessment shares many similar features; however, the underlying models of learning are not as strongly tied to cognitive research as they could be. On the other hand, intelligent tutoring systems have a strong cognitive research base and offer opportunities for integrating formative and summative assessments, as well as measuring growth, yet their use for large-scale assessment purposes has not yet been explored. Thus, examples in this report offer a rich set of opportunities for further development toward the goal of designing assessment systems that are maximally useful for both informing and improving learning.

Some questions for thought:
  1. What are the features of current large-scale assessment practices that should be preserved in a "balanced" assessment system?
  2. What should be added?
  3. Where are the roadblocks to creating the large-scale part of a balanced assessment system? In terms of policy? In terms of assessment and measurement developments?
  4. What are the features of current classroom assessment practices that should be preserved in a "balanced" assessment system?
  5. What should be added?
  6. Where are the roadblocks to creating the classroom part of a balanced assessment system? In terms of policy? In terms of assessment and measurement developments?
  7. What aspects of the two parts (classroom and large-scale) must be coordinated?
  8. What aspects can be left unique to each part?
  9. What groups should be engaged in the policy-setting and development effort that would be needed?
  10. How can NSF help in this effort?
   
    
 
Division of Research, Evaluation and Communication
National Science Foundation
4201 Wilson Boulevard • Arlington, Virginia • (703)292-8650