Linking Classroom and Large-Scale Assessments
Karen Draney and Mark Wilson, UC Berkeley
The discussion began with a portrayal of a conceptual
framework for thinking about the issue of linking classroom and large-scale
assessments. This framework can be cast in the terms of the National Research
Council (NRC) report Knowing What Students Know, which lists three qualities as
essential to linking the two kinds of assessments: coherence,
comprehensiveness, and continuity.
The model that was used to guide this discussion is
the Berkeley Evaluation and Assessment Research (BEAR) Assessment System,
developed by Mark Wilson and his team at Berkeley, and described (along with an
extensive example) by Wilson and Sloane (2000). The first half hour or so of
the session was spent describing the essential details of that system. The BEAR
Assessment System is based on:
I. A developmental perspective on learning. The
system assesses the development of skills and concepts over time. Progress
variables are used to track students' growth. These are well-thought-out and
researched hierarchies of qualitatively different levels of performance defined
on important educational achievement variables. These hierarchies are embodied
in general scoring guides, one for each variable (or sub-variable). The
descriptions of the levels of the scoring guide are independent of the specific
task being scored, so that student progress on the variable can be tracked
across large sets of tasks.
II. The match between instruction and
assessment. Progress variables are used as frameworks for both curriculum
and assessment. Some progress variables are unique to a particular curriculum.
These will probably be reflected only in classroom assessments. However, some
progress variables are common across many curricula, with only minor variation
(for example, designing and conducting a scientific investigation). These could
be used to structure large-scale assessments. Thus all assessment tasks, both
at the classroom level, and at the large-scale testing level, can be linked to
sets of progress variables.
III. Teacher management and responsibility.
There are two broad issues involved in the Teacher Management and
Responsibility principle. First, it is the teachers who will use the assessment
information to inform and guide the teaching and learning process. For this
function of assessment, teachers must be (A) involved in the process of
collecting and selecting student work, (B) able to score and use the results
immediately, rather than waiting for scores to be returned several months
later, (C) able to interpret the results in instructional terms, and (D) able
to have a creative role in the way that the assessment system is realized in
their classrooms. Second, issues of teacher professionalism and teacher
accountability demand that teachers play a more central and active role in
collecting and interpreting evidence of student progress and performance. If
they are to be held accountable for their students' performance, teachers need
a good understanding of what students are expected to learn and of what counts
as adequate evidence of student learning. They are then in a better, more
central, and more responsible position to present, explain, and
defend their students' performances and the "outcomes" of their instruction.
Teachers use scoring guides associated with each
variable to score student work on assessment tasks. To ensure comparability and
consistency of teacher scores across classrooms, scoring guides must be
supplemented with exemplars: specially selected pieces of student work that
serve as examples, for each assessment task, of each scoring level on each
variable being assessed.
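One way to check the comparability of teacher scores described above is to have two teachers score the same set of student work and compare the results. The following is a minimal sketch only: the source does not name a particular agreement statistic, so the choice of exact agreement and Cohen's kappa, the function names, and the score data (levels 0 to 4 on a single scoring guide) are all illustrative assumptions.

```python
from collections import Counter


def exact_agreement(scores_a, scores_b):
    """Proportion of pieces of work given the same level by both raters."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two non-empty score lists of equal length")
    matches = sum(a == b for a, b in zip(scores_a, scores_b))
    return matches / len(scores_a)


def cohens_kappa(scores_a, scores_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(scores_a)
    observed = exact_agreement(scores_a, scores_b)
    counts_a = Counter(scores_a)
    counts_b = Counter(scores_b)
    # Chance agreement: probability both raters pick the same level
    # if each assigned levels independently at their own base rates.
    expected = sum(counts_a[level] * counts_b[level] for level in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical level throughout
    return (observed - expected) / (1 - expected)


# Hypothetical scores from two teachers on the same eight pieces of work.
teacher_1 = [2, 3, 3, 1, 4, 2, 0, 3]
teacher_2 = [2, 3, 2, 1, 4, 2, 1, 3]

print("exact agreement:", exact_agreement(teacher_1, teacher_2))
print("Cohen's kappa:", round(cohens_kappa(teacher_1, teacher_2), 3))
```

Low agreement on a given variable would suggest that the scoring guide or its exemplars need refinement, or that raters need further calibration.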
IV. Quality evidence. This is required of both
classroom and high-stakes assessments. Along with traditional measures of
assessment quality, such as reliability coefficients, the progress variables
allow the use of progress maps. These take the results of powerful item
response models, which can model effects such as item difficulty and rater
severity, and provide visual metaphors interpretable by a lay
audience. Various versions of progress maps have been developed to show a
variety of effects, from student progress over long periods of time to
detailed task-by-task information about student performance, showing
student strengths and weaknesses, or drops in performance that can then be
corrected early.
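To illustrate how an item response model can account for both item difficulty and rater severity, here is a minimal sketch of a Rasch-type model with an added rater term. It is an assumption for illustration, not the specific model used in the BEAR Assessment System: it treats responses as simply successful or not (real scoring guides have several levels), and the function name and all parameter values are hypothetical.

```python
import math


def p_success(ability, difficulty, severity=0.0):
    """Probability of a successful response under a simplified
    Rasch-type model with a rater facet.

    All three parameters are on the same logit scale:
    higher ability raises the probability, while higher item
    difficulty or rater severity lowers it.
    """
    logit = ability - difficulty - severity
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical student (ability 1.0) on a hypothetical item (difficulty 0.5),
# scored by raters of differing severity.
for label, severity in [("lenient rater", -0.5),
                        ("neutral rater", 0.0),
                        ("severe rater", 0.5)]:
    prob = p_success(ability=1.0, difficulty=0.5, severity=severity)
    print(f"{label}: {prob:.3f}")
```

Because students, items, and raters are placed on one common scale, their estimated locations can be drawn on a single map, which is the kind of display a progress map provides.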
In the general discussion that followed this
orientation, the following points were raised.
- The states have standards that can differ quite a
bit, both from each other and from the national standards. Also, state
standards can change from year to year. However, there is less variation in
large-scale assessments. Only a limited number of contractors develop
large-scale tests, and these tests may be more alike than different. This may be
true because these contractors don't develop a fully new test for each state,
but adapt the tests, forms, and item banks they have; it may also be the case
that there are a relatively small number of concepts and skills that form the
core of most science education, and that these concepts and skills are the
things that show up on large-scale tests. Features that are unique to
particular curricula, districts, or states are most likely not incorporated
into tests, forms, and item banks that are used generally by large-scale
testing companies.
- It might not be possible to have one assessment
that works for every state, due to (among other things) the variation among
state standards. However, even different curricula emphasize many of the same
big ideas, so it might be possible to develop large-scale assessments that
overlap, using progress variables or something similar as a guiding
principle.
- The use of large-scale testing in and of itself is
not necessarily a problem. The question is how to break the mold of one
high-stakes test at one point in time. Most experts in the field of assessment (as
reflected in publications such as the Standards for Educational and
Psychological Testing, developed jointly by AERA, APA, and NCME) recommend that
high-stakes decisions be made on the basis of more than a single test offered
at only one point in time. In addition, information on a student's development
as measured at multiple points in time is more useful to teachers, and
ultimately, to students as well.
- The PASS assessment is aligned to national (rather
than state) standards, and is used by many systemic initiatives. The assessment contains
multiple kinds of items and teachers can use the results. This may serve as a
model for more useful large-scale assessments.
- Teachers need help in learning how to
examine student work. Preservice programs should include more training
specifically on assessment, and more inservice professional development
programs that focus on assessment as a central component need to be
offered.
The final portion of the discussion was centered on a
set of questions; they are given here with the answers proposed by the
group.
What features should be preserved in classroom and
large-scale assessments?
- Most of the discussion centered not on what was
good about current assessment practices, but on what needs to be improved.
However, the group agreed that accountability is an important idea and should
be preserved.
- In addition, teacher judgment needs to be preserved
as a feature in assessment, and used more systematically.
What needs to be added? What are the
roadblocks?
- How can classroom assessments be made more valid
and reliable? This change is necessary if classroom assessment is
ever going to be taken seriously by others, such as school administrators
and people in the testing industry.
- It might be desirable to change the timing of
high-stakes tests. The end of the school year may not be the best time to administer
them. (One thing that was mentioned that should perhaps not be preserved is the
current timing of tests: at the end of the school year, when teachers cannot
use the information from them to make decisions about what the children being
tested need. Not everyone agreed on this, however. It was noted that testing at
the beginning of the year would offer teachers no insight into their own
teaching. In addition, teachers could use end-of-year testing to gain insight
into which aspects of their teaching are currently working, and which are not, as
opposed to gaining diagnostic information about a particular group of
students.) One possibility is to invest in greater coherence between classroom
and large-scale assessments, so that teachers can use their local tests in
concert with the results from external ones.
- Technology could be used to make assessment
delivery systems more flexible. The delivery system needs to include approaches
for using formative assessments. Recommendations included:
  - Teachers receiving the results of large-scale
    assessments for their students online, so that the information would be
    available to them more quickly.
  - Teachers having access to tests online. This
    could include public versions of current large-scale testing initiatives, as
    well as classroom assessments that they could use with their students.
  - For written-response items, providing
    access to student work online. This could include exemplary responses at a
    variety of scoring levels, as well as practice scoring sets and calibration
    sets of papers.
- Extensive professional development was recommended:
  - Preservice education should be reconfigured to
    include in-depth approaches to evaluating student work. Group members agreed
    that not enough attention is paid to student assessment in current teacher
    education programs.
  - Teachers need, and often do not receive, both
    preservice and inservice training specifically about assessment.
  - Teachers (particularly elementary teachers, but
    middle school and high school teachers as well) need training in science and
    mathematics content.
- Perhaps technology could be used to enhance
assessment, through methods such as the progress maps but also through
web-based access to tests and to student work.
- Teachers need to be given time: both time
during the week to work on professional development, and time over a period of
years to make improvements. Teachers also need space and other resources.
- More time and resources need to be built in to
enable teachers to become expert. Teachers are currently expected to take the
results from a new testing program, learn how to use them, and make big changes
in student performance in impossibly short times.
- Assessment initiatives need to include an
educational component. Education of teachers should be a given; but education
should also be provided for administrators, parents and the public. Public
education in particular needs to be part of any assessment initiative.
What are some possible next steps for NSF?
- NSF can facilitate meetings to allow researchers,
teachers, and testing professionals to come together to address these issues.
Group members felt that the current meeting regarding assessment was extremely
helpful; however, they felt that there would be great benefit in including
people from testing companies who are involved in producing large-scale tests;
they are the ones who would actually be making changes.
- NSF can provide leadership to help states with how
to think about assessment in the current environment. Coping with issues such
as the setting of state standards, the selection of appropriate assessment
tools, and the development of appropriate teacher training and professional
development programs requires the leadership of experts in a variety of fields;
NSF can provide this leadership.
- Some members of the group felt that NSF could
provide something like a "Consumer Reports" to examine the quality of currently
available tests.
- Each state feels the need to develop its own
assessment. Can NSF promote common themes and threads (i.e., "progress
variables"), such that the truly important issues are addressed, and then let
states "customize"?
- NSF could take steps to make sure that
administrators are educated and involved in important decisions involving
assessment.