Learning and Education: Building Knowledge, Understanding Its Implications, May 15-17, 2002, Arlington, VA

Linking Classroom and Large-Scale Assessments

Karen Draney and Mark Wilson
UC Berkeley

The discussion began with a conceptual framework for thinking about how to link classroom and large-scale assessments. This framework can be cast in the terms of the NRC book Knowing What Students Know, which lists three qualities as essential to linking the two kinds of assessment: coherence, comprehensiveness, and continuity.

The model that was used to guide this discussion is the Berkeley Evaluation and Assessment Research (BEAR) Assessment System, developed by Mark Wilson and his team at Berkeley, and described (along with an extensive example) by Wilson and Sloane (2000). The first half hour or so of the session was spent describing the essential details of that system. The BEAR Assessment System is based on:

I. A developmental perspective on learning. The system assesses the development of skills and concepts over time. Progress variables are used to track students' growth. These are well-thought-out and researched hierarchies of qualitatively different levels of performance defined on important educational achievement variables. These hierarchies are embodied in general scoring guides, one for each variable (or sub-variable). The descriptions of the levels of the scoring guide are independent of the specific task being scored, so that student progress on the variable can be tracked across large sets of tasks.

II. The match between instruction and assessment. Progress variables are used as frameworks for both curriculum and assessment. Some progress variables are unique to a particular curriculum. These will probably be reflected only in classroom assessments. However, some progress variables are common across many curricula, with only minor variation (for example, designing and conducting a scientific investigation). These could be used to structure large-scale assessments. Thus all assessment tasks, at both the classroom level and the large-scale testing level, can be linked to sets of progress variables.

III. Teacher management and responsibility. There are two broad issues involved in the Teacher Management and Responsibility principle. First, it is the teachers who will use the assessment information to inform and guide the teaching and learning process. For this function of assessment, teachers must be
A. involved in the process of collecting and selecting student work,
B. able to score and use the results immediately--not wait for scores to be returned several months later,
C. able to interpret the results in instructional terms, and
D. able to have a creative role in the way that the assessment system is realized in their classrooms.
Second, issues of teacher professionalism and teacher accountability demand that teachers play a more central and active role in collecting and interpreting evidence of student progress and performance. If they are to be held accountable for their students' performance, teachers need a good understanding of what students are expected to learn and of what counts as adequate evidence of student learning. They are then in a better position, and a more central and responsible position, for presenting, explaining, and defending their students' performances and the "outcomes" of their instruction.

Teachers use scoring guides associated with each variable to score student work on assessment tasks. To ensure comparability and consistency of teacher scores across classrooms, scoring guides must be supplemented with exemplars: specially selected pieces of student work that serve as examples, for each assessment task, of each scoring level on each variable being assessed.

IV. Quality evidence. This is required of both classroom and high-stakes assessments. Along with traditional measures of assessment quality, such as reliability coefficients, the progress variables allow the use of progress maps. Progress maps take the results of powerful item response models, which can model effects such as item difficulty and rater severity, and present them as visual metaphors interpretable by a lay audience. Various versions of progress maps have been developed to show a variety of effects, from student progress over long periods of time to detailed information about student performance on a task-by-task level, showing student strengths and weaknesses or drops in performance so that they can be corrected early.
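To make the idea of an item response model with item difficulty and rater severity effects more concrete, the following is a minimal sketch in Python. It assumes a simple dichotomous Rasch-type model with an added rater facet; the summary does not say which model the BEAR system uses, so this is an illustration rather than the system's actual method. The function names (p_success, text_progress_map) and all of the numbers are hypothetical. The second function lists students and tasks on one shared scale, which is the basic idea behind a progress map, in a crude text form.

```python
import math


def p_success(ability, difficulty, severity=0.0):
    """Probability of a successful scored response.

    Assumed model (not taken from the source): a Rasch-type logistic model
    in which student ability, item difficulty, and rater severity are all
    expressed on the same logit scale.
    """
    logit = ability - difficulty - severity
    return 1.0 / (1.0 + math.exp(-logit))


def text_progress_map(students, items):
    """Return text lines placing students and items on one shared scale.

    Entries are sorted from the highest logit value to the lowest, so a
    student listed above a task is more likely than not to succeed on it
    when the rater severity is zero.
    """
    entries = [(value, "student", name) for name, value in students.items()]
    entries += [(value, "item", name) for name, value in items.items()]
    entries.sort(key=lambda entry: entry[0], reverse=True)
    return [f"{value:+.1f}  {kind:<7}  {name}" for value, kind, name in entries]


if __name__ == "__main__":
    # Hypothetical estimates, invented for illustration only.
    students = {"Student A": 1.2, "Student B": -0.4}
    items = {"Task 1": 0.0, "Task 2": 0.8}
    for line in text_progress_map(students, items):
        print(line)
    # A more severe rater lowers the chance that the same work is scored as successful.
    print(p_success(1.2, 0.8, severity=0.0))
    print(p_success(1.2, 0.8, severity=0.5))
```

In practice the ability, difficulty, and severity values would be estimated from scored student work by specialized software; here they are simply supplied by hand to show how the pieces relate.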

In the general discussion that followed this orientation, the following points were raised.

  1. State standards can differ quite a bit, both from each other and from the national standards. State standards can also change from year to year. However, there is less variation in large-scale assessments. Only a limited number of contractors develop large-scale tests, and these tests may be more alike than different. One reason may be that these contractors do not develop a fully new test for each state, but adapt the tests, forms, and item banks they already have. Another may be that a relatively small number of concepts and skills form the core of most science education, and that these are the things that show up on large-scale tests. Features that are unique to particular curricula, districts, or states are most likely not incorporated into the tests, forms, and item banks that large-scale testing companies use generally.
  2. It might not be possible to have one assessment that works for every state, due to (among other things) the variation between state standards. However, even different curricula emphasize many of the same big ideas, so it might be possible to develop large-scale assessments that overlap, using progress variables or something similar as a guiding principle.
  3. The use of large-scale testing in and of itself is not necessarily a problem. The question is how to break the mold of one high-stakes test at one point in time. Most experts in the field of assessment (as reflected in publications such as the Standards for Educational and Psychological Testing, developed jointly by AERA, APA, and NCME) recommend that high-stakes decisions be made on the basis of more than a single test offered at only one point in time. In addition, information on a student's development as measured at multiple points in time is more useful to teachers, and ultimately to students as well.
  4. The PASS assessment is aligned to national (rather than state) standards, and is used by many systemic initiatives. The assessment contains multiple kinds of items, and teachers can use the results. This may serve as a model for more useful large-scale assessments.
  5. Teachers need help in learning how to examine student work. Preservice programs should include more training specifically on assessment, and more in-service professional development programs that focus on assessment as a central component need to be offered.

The final portion of the discussion was centered on a set of questions; they are given here with the answers proposed by the group.

What features should be preserved in classroom and large-scale assessments?

  • Most of the discussion centered not on what was good about current assessment practices, but on what needs to be improved. However, the group agreed that accountability is an important idea and should be preserved.
  • In addition, teacher judgment needs to be preserved as a feature in assessment, and used more systematically.

What needs to be added? What are the road blocks?

  • How can classroom assessments be made more valid and reliable? This change is necessary if classroom assessment is ever going to be taken seriously by others, such as school administrators and persons in the testing industry.
  • It might be desirable to change the timing of high-stakes tests. The end of the school year may not be the best time to administer them, because teachers cannot then use the information to make decisions about what the children being tested need. Not everyone agreed on this, however. It was noted that testing at the beginning of the year would offer teachers no insight into their own teaching, and that teachers could use end-of-year testing to learn which aspects of their teaching are currently working and which are not, as opposed to gaining diagnostic information about a particular group of students. One possibility is to invest in greater coherence between classroom and large-scale assessments, so that teachers can use their local tests in concert with the results from external ones.
  • Technology could be used to make assessment delivery systems more flexible. The delivery system needs to include approaches for using formative assessments. Recommendations included:
    • Teachers receiving the results of large-scale assessments for their students online, so that the information would be available to them more quickly;
    • Teachers having access to tests online. This could include public versions of current large-scale testing initiatives, as well as classroom assessments that they could use with their students.
    • For written-response type items, providing access to student work online. This could include exemplary responses at a variety of scoring levels, as well as practice scoring sets and calibration sets of papers.
  • Extensive professional development was recommended:
    • Preservice education should be reconfigured to include in-depth approaches to evaluating student work. Group members agreed that not enough attention is paid to student assessment in current teacher education programs.
    • Teachers need, and often do not receive, both preservice and inservice training specifically about assessment.
    • Teachers (particularly elementary teachers, but middle school and high school teachers as well) need training in science and mathematics content.
    • Perhaps technology could be used to enhance assessment, using methods such as the progress maps, but also web-based access to tests and to student work.
    • Teachers need to be given time - both time during the week to work on professional development, and time over a period of years to make improvements. Teachers also need space and other resources.
  • More time and resources need to be built in to enable teachers to become expert. Teachers are currently expected to take the results from a new testing program, learn how to use them, and make big changes in student performance in an impossibly short time.
  • Assessment initiatives need to include an educational component. Education of teachers should be a given; but education should also be provided for administrators, parents and the public. Public education in particular needs to be part of any assessment initiative.

What are some possible next steps for NSF?

  • NSF can facilitate meetings to allow researchers, teachers, and testing professionals to come together to address these issues. Group members felt that the current meeting regarding assessment was extremely helpful; however, they felt that there would be great benefit in including people from testing companies who are involved in producing large-scale tests; they are the ones who would actually be making changes.
  • NSF can provide leadership to help states with how to think about assessment in the current environment. Coping with issues such as the setting of state standards, the selection of appropriate assessment tools, and the development of appropriate teacher training and professional development programs requires the leadership of experts in a variety of fields; NSF can provide this leadership.
  • Some members of the group felt that NSF could provide something like a "Consumer Reports" to examine the quality of currently available tests.
  • Each state feels the need to develop its own assessment. Can NSF promote common themes and threads (i.e., "progress variables"), such that the truly important issues are addressed, and then let states "customize"?
  • NSF could take steps to make sure that administrators are educated and involved in important decisions involving assessment.
   
    
 
Division of Research, Evaluation and Communication
National Science Foundation
4201 Wilson Boulevard • Arlington, Virginia • (703)292-8650