Developing a Psychological Theory of Test Design for Educational Achievement Tests
Pre-Meeting Abstract for an NSF Building Knowledge Session
Steve Ferrara and Maria Araceli Ruiz-Primo
May 1, 2002
Overview
Educational achievement tests, including assessments
of mathematics and science achievement, do a good job of ranking students,
schools, school systems, states, and nations. They typically are structured
well, as evidenced by internal consistency reliability coefficients. And
usually they show moderate to strong convergence with external measures from
the same content domain (though moderate to weak discrimination with measures
from other content domains). Educational achievement tests of school content
domains usually are supported by good evidence of content validity, primarily
based on judgments by content experts on the relationship between test items
and content objectives. In short, educational achievement tests, including
those in mathematics and science, tell us what we expect to see: that is, high
achieving students and schools tend to score higher on these tests than do
lower achieving students and schools.
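As an illustration of the internal consistency evidence mentioned above, the following minimal sketch computes coefficient alpha (Cronbach's alpha), one commonly reported internal consistency coefficient. The function name and the small data set are hypothetical and are not drawn from any particular test; the sketch assumes a complete matrix of item scores with one row per examinee.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Coefficient alpha for a list of examinee rows of item scores.

    Hypothetical helper for illustration; assumes no missing responses
    and at least two items with nonzero total-score variance.
    """
    k = len(scores[0])                      # number of items
    item_columns = list(zip(*scores))       # one tuple of scores per item
    sum_item_variances = sum(pvariance(col) for col in item_columns)
    total_variance = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum_item_variances / total_variance)


# Invented responses: four examinees, three dichotomously scored items.
example_scores = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
alpha = cronbach_alpha(example_scores)
print(round(alpha, 2))
```

A coefficient near 1 indicates that the items rank examinees consistently. It says nothing about which knowledge and skills produced the responses, which is the concern taken up next.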
Educators lament that most educational achievement
tests do not provide the information that they need to guide important
decisions about instruction. Teachers lament that scores from these tests tell
us who is doing well and who is not, but little about important things such
as:
- The specific knowledge and skills that students
have or have not acquired.
- Whether high achieving students get high scores for
the right reasons (i.e., because they have mastered the knowledge and skills
that the test items are intended to assess).
- Why low achieving students do not get high scores;
that is, beyond that they did not answer many test items successfully.
The last 30 years of research in the psychology of
human memory, learning, and cognitive processing have shed a great deal of
light on how people represent knowledge and how they develop competence in
various school content domains (see Pellegrino, Chudowsky, & Glaser, 2001,
p. 2 and elsewhere). In particular, the research on differences between novices
and experts in specific knowledge domains has illuminated how experts structure
knowledge, recognize problem types, and match solution strategies to problem
types. Likewise, in the last 30 years psychometricians have extended test
theory from elegant models of true scores and measurement errors in tests
(i.e., collections of test items) to sophisticated models of the relationship
between individual test items and examinee responses. Finally, ethnographic
researchers, social psychologists, and sociocultural theorists have highlighted
the influences of culture, situations, and other contextual factors on
learning, performance, and interpretations of performance.
Current widely practiced approaches to educational
achievement test design, development, score reporting and interpretation, and
validation have incorporated parts of these advances from the last 30 years.
However, broad theories that encompass these advances in a psychological theory
of test design have not emerged to guide educational achievement testing
practice. A psychological theory of test design for educational achievement
tests would encompass what is known currently about cognition; organization,
development, and use of content area knowledge and skills; psychometrics; and
the influence of context on learning and performance.
Current educational achievement testing practice falls
short in addressing what is known in these areas. We discuss these shortcomings
briefly below.
Organization of Knowledge and Development of
Expertise in Content Domains
Current practice in test design and analysis addresses
content and procedural knowledge, but only in terms of alignment with learning
objectives. Little regard is given to research on how experts at different ages
and stages of development organize knowledge and use what they know to reason
and solve problems (e.g., need a citation).
Cognition
Test items are intended to assess content-based
knowledge (e.g., conceptual understanding) and skills (e.g., hypothesizing in
science) and broader cognitive processes (e.g., reasoning, planning). However,
research has begun to appear only in recent years to validate that examinees
actually implement these intended processes when they respond to test items
(e.g., Baxter, Elder, & Glaser, 1996; Hamilton, Nussbaum, & Snow, 1997;
Levine, 1998; Ruiz-Primo, Shavelson, Li, & Schultz, 2001).
Psychometrics
In this area, educational achievement testing has in fact
capitalized on many advances in psychometrics, for example, domain sampling
theory, dichotomous and polytomous IRT models, managing dimensionality, and
analyzing differential item functioning.
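To make the reference to dichotomous IRT models concrete, the following minimal sketch evaluates the two-parameter logistic (2PL) item response function, one standard dichotomous model. The function name and the parameter values are hypothetical and chosen only for illustration: theta denotes examinee proficiency, a denotes item discrimination, and b denotes item difficulty.

```python
import math


def p_correct_2pl(theta, a, b):
    """Probability of a correct response under the 2PL model.

    Hypothetical helper for illustration:
    P(correct) = 1 / (1 + exp(-a * (theta - b)))
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# Invented item: discrimination 1.2, difficulty 0.5.
for theta in (-1.0, 0.5, 2.0):
    print(theta, round(p_correct_2pl(theta, a=1.2, b=0.5), 3))
```

Under this model an examinee whose proficiency equals the item difficulty has a probability of .5 of answering correctly, and the probability rises with proficiency. The model describes the relationship between items and responses; it does not by itself identify the cognitive processes that examinees use.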
Sociocultural and Other Contextual Influences on
Learning and Test Performance
Typical approaches to educational test design have
given little attention to the fact that children from different backgrounds and
cultures bring different prior knowledge and resources to learning and
performance situations. Unfortunately, some cultural resources may be better
recognized or rewarded in typical test items. For example, test items favor
certain communication styles inherent to specific cultures. This can have the
effect of inadvertently favoring some students and penalizing others.
What features would a psychological theory of test
design have? And what benefits would such a theory provide? The accompanying
figure portrays features of a possible psychological theory of test design. It
portrays the four elements described above. These features also reflect the
assessment triangle proposed in a National Research Council report (Pellegrino
et al., 2001). The figure portrays a theory in that (a) each circle presents a
set of constructs, definitions, and propositions; (b) it portrays relationships
among the circles; and (c) the circles and their relationships facilitate
hypothesizing about relationships (see Kerlinger, 1973, p. 9) between what
examinees know and can do and their performances on educational achievement
tests that are designed according to the figure.

Discussion Questions
- What features and elements would be contained in a
comprehensive psychological theory of test design for educational achievement
tests?
- What features and elements can be defined and
described in some detail based on validated theory or empirical research?
- What features and elements can be defined and
described only by speculation and hypothesis?
- How can we strengthen the cognitive foundations of
test design and construction? Assessments are based, in part, on a set of
beliefs about the kinds of tasks or situations that will prompt students to
say, do, or create something that demonstrates important knowledge. These
beliefs should be based on and determined by a cognitive model of learning
(Pellegrino et al., 2001).
- What research is needed to understand better the
different facets of an assessment in different domains? How can we define more
precisely what we mean by achievement in order to improve our understanding for
designing an assessment task? Selection of assessment tasks should account for
the knowledge and skills required to understand and answer a test item or solve
a problem, including the context in which the task is presented, and whether an
assessment task or situation is functioning as a test of near, far, or zero
transfer (Pellegrino et al., 2001).
References
Baxter, G. P., Elder, A. D., & Glaser, R. (1996). Knowledge-based cognition
and performance assessment in the science classroom. Educational
Psychologist, 31(2), 133-140.
Hamilton, L. S., Nussbaum, E. M., & Snow, R. E. (1997). Interview procedures
for validating science assessments. Applied Measurement in Education, 10,
181-200.
Kerlinger, F. N. (1973). Foundations of behavioral research (2nd ed.). New
York: Holt, Rinehart and Winston.
Levine, R. (1998). Cognitive lab report (Report prepared for the National
Assessment Governing Board). Palo Alto, CA: American Institutes for Research.
Pellegrino, J. W., Chudowsky, N., & Glaser, R. (Eds.). (2001). Knowing what
students know: The science and design of educational assessment. Washington,
DC: National Academy Press.
Ruiz-Primo, M. A., Shavelson, R. J., Li, M., & Schultz, S. E. (2001). On the
validity of cognitive interpretations of scores from alternative
concept-mapping techniques. Educational Assessment, 7(2), 99-141.