Learning and Education: Building Knowledge, Understanding Its Implications, May 15-17, 2002, Arlington, VA

Developing a Psychological Theory of Test Design
For Educational Achievement Tests

Pre-Meeting Abstract for an NSF Building Knowledge Session
Steve Ferrara and Maria Araceli Ruiz-Primo
May 1, 2002

Overview

Educational achievement tests, including assessments of mathematics and science achievement, do a good job of ranking students, schools, school systems, states, and nations. They typically are structured well, as evidenced by internal consistency reliability coefficients. And usually they show moderate to strong convergence with external measures from the same content domain (though moderate to weak discrimination with measures from other content domains). Educational achievement tests of school content domains usually are supported by good evidence of content validity, primarily based on judgments by content experts on the relationship between test items and content objectives. In short, educational achievement tests, including those in mathematics and science, tell us what we expect to see: that is, high achieving students and schools tend to score higher on these tests than do lower achieving students and schools.
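As a rough illustration of what an internal consistency reliability coefficient summarizes, the sketch below computes Cronbach's alpha, one common coefficient of this kind, for a small matrix of item scores. The function name `cronbach_alpha` and the response data are assumptions made up for illustration; they are not drawn from any actual test.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of examinee rows, each a list of item scores."""
    n_items = len(scores[0])
    if n_items < 2:
        raise ValueError("alpha needs at least two items")
    # Variance of each item's scores across examinees.
    item_vars = [pvariance([row[i] for row in scores]) for i in range(n_items)]
    # Variance of examinees' total scores.
    total_var = pvariance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return (n_items / (n_items - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical right/wrong (1/0) responses: 5 examinees by 4 items.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 3))
```

A high value indicates that the items rank examinees consistently. It says nothing about which knowledge and skills produced a given score.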

Educators lament that most educational achievement tests do not provide the information they need to guide important decisions about instruction. Teachers lament that scores from these tests tell them who is doing well and who is not, but little about important things such as:

  • The specific knowledge and skills that students have or have not acquired.
  • Whether high achieving students get high scores for the right reasons (i.e., because they have mastered the knowledge and skills that the test items are intended to assess).
  • Why low achieving students do not get high scores, beyond the fact that they did not answer many test items successfully.

The last 30 years of research in the psychology of human memory, learning, and cognitive processing have shed a great deal of light on how people represent knowledge and how they develop competence in various school content domains (see Pellegrino, Chudowsky, & Glaser, 2001, p. 2 and elsewhere). In particular, the research on differences between novices and experts in specific knowledge domains has illuminated how experts structure knowledge, recognize problem types, and match solution strategies to problem types. Likewise, in the last 30 years psychometricians have extended test theory from elegant models of true scores and measurement errors in tests (i.e., collections of test items) to sophisticated models of the relationship between individual test items and examinee responses. Finally, ethnographic researchers, social psychologists, and sociocultural theorists have highlighted the influences of culture, situations, and other contextual factors on learning, performance, and interpretations of performance.

Current widely practiced approaches to educational achievement test design, development, score reporting and interpretation, and validation have incorporated parts of these advances from the last 30 years. However, broad theories that encompass these advances in a psychological theory of test design have not emerged to guide educational achievement testing practice. A psychological theory of test design for educational achievement tests would encompass what is known currently about cognition; organization, development, and use of content area knowledge and skills; psychometrics; and the influence of context on learning and performance.

Current educational achievement testing practice falls short in addressing what is known in these areas. We discuss these shortcomings briefly below.

Organization of Knowledge and Development of Expertise in Content Domains

Current practice in test design and analysis addresses content and procedural knowledge, but only in terms of alignment with learning objectives. Little regard is given to research on how experts at different ages and stages of development organize knowledge and use what they know to reason and solve problems (citation needed).

Cognition

Test items are intended to assess content-based knowledge (e.g., conceptual understanding) and skills (e.g., hypothesizing in science) and broader cognitive processes (e.g., reasoning, planning). However, research validating that examinees actually implement these intended processes when they respond to test items has begun to appear only in recent years (e.g., Baxter, Elder, & Glaser, 1996; Hamilton, Nussbaum, & Snow, 1997; Levine, 1998; Ruiz-Primo, Shavelson, Li, & Schultz, 2001).

Psychometrics

By contrast, educational achievement testing has capitalized on many advances in psychometrics, for example, domain sampling theory, dichotomous and polytomous IRT models, managing dimensionality, and analyzing differential item functioning.
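To make one of these advances concrete, the sketch below evaluates the item response function of the two-parameter logistic (2PL) model, a widely used dichotomous IRT model. The symbols `theta` (examinee proficiency), `a` (item discrimination), and `b` (item difficulty) follow common convention, but the function name and all parameter values are assumptions chosen for illustration.

```python
import math


def p_correct_2pl(theta, a, b):
    """Probability of a correct response under the 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical item: moderate discrimination, average difficulty.
a, b = 1.2, 0.0
for theta in (-2.0, -1.0, 0.0, 1.0, 2.0):
    # Probability rises with proficiency and equals 0.5 when theta == b.
    print(theta, round(p_correct_2pl(theta, a, b), 3))
```

A model like this relates each item to examinee responses individually, rather than treating the test only as a total score. It does not by itself describe the cognitive processes behind a response.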

Sociocultural and Other Contextual Influences on Learning and Test Performance

Typical approaches to educational test design have given little attention to the fact that children from different backgrounds and cultures bring different prior knowledge and resources to learning and performance situations. Unfortunately, some cultural resources may be better recognized or rewarded in typical test items. For example, test items favor certain communication styles inherent to specific cultures. This can have the effect of inadvertently favoring some students and penalizing others.

What features would a psychological theory of test design have? And what benefits would such a theory provide? The accompanying figure portrays features of a possible psychological theory of test design. It portrays the four elements described above. These features also reflect the assessment triangle proposed in a National Research Council report (Pellegrino et al., 2001). The figure portrays a theory in that (a) each circle presents a set of constructs, definitions, and propositions; (b) it portrays relationships among the circles; and (c) the circles and their relationships facilitate hypothesizing about relationships (see Kerlinger, 1973, p. 9) between what examinees know and can do and their performances on educational achievement tests that are designed according to the figure.

Discussion Questions

  1. What features and elements would be contained in a comprehensive psychological theory of test design for educational achievement tests?
  2. What features and elements can be defined and described in some detail based on validated theory or empirical research?
  3. What features and elements can be defined and described only by speculation and hypothesis?
  4. How can we strengthen the cognitive foundations of test design and construction? Assessments are based, in part, on a set of beliefs about the kinds of tasks or situations that will prompt students to say, do, or create something that demonstrates important knowledge. These beliefs should be based on and determined by a cognitive model of learning (Pellegrino et al., 2001).
  5. What research is needed to understand better the different facets of an assessment in different domains? How can we define more precisely what we mean by achievement in order to improve our understanding for designing an assessment task? Selection of assessment tasks should account for the knowledge and skills required to understand and answer a test item or solve a problem, including the context in which the task is presented, and whether an assessment task or situation is functioning as a test of near, far, or zero transfer (Pellegrino et al., 2001).

References

Baxter, G. P., Elder, A. D., & Glaser, R. (1996). Knowledge-based cognition and performance assessment in the science classroom. Educational Psychologist, 31(2), 133-140.

Hamilton, L. S., Nussbaum, E. M., & Snow, R. E. (1997). Interview procedures for validating science assessments. Applied Measurement in Education, 10, 181-200.

Kerlinger, F. N. (1973). Foundations of behavioral research (2nd ed.). New York: Holt, Rinehart and Winston.

Levine, R. (1998). Cognitive lab report (Report prepared for the National Assessment Governing Board). Palo Alto, CA: American Institutes for Research.

Pellegrino, J. W., Chudowsky, N., & Glaser, R. (Eds.). (2001). Knowing what students know: The science and design of educational assessment. Washington, DC: National Academy Press.

Ruiz-Primo, M. A., Shavelson, R. J., Li, M., & Schultz, S. E. (2001). On the validity of cognitive interpretations of scores from alternative concept-mapping techniques. Educational Assessment, 7(2), 99-141.

Division of Research, Evaluation and Communication
National Science Foundation
4201 Wilson Boulevard • Arlington, Virginia • (703)292-8650