Learning and Education: Building Knowledge, Understanding Its Implications, May 15-17, 2002, Arlington, VA

Summary of Knowledge Session 8: Developing a Psychology of Test Design

Notes re: revising

  • There are three notes where the original speaker should fill in needed information or provide accurate information. I should make a note that the attached summary and graphic are updated, based on the discussion session.
  • I should note that "theory" should be changed to "theories" of test design and that "test" should be changed to "assessment" to avoid narrow associations with large-scale paper-pencil testing.

This summary is based on notes taken during the session discussion by the discussion leader, Steve Ferrara, and from summaries written by participants during the discussion sessions or as follow-up to the session.

Overview of the Issue

Educational achievement tests do an excellent job of ranking students, schools, school systems, states, and nations. However, educators lament that educational achievement tests do not provide the information that they need to guide important decisions about instruction. The last 30 years of research in the psychology of human memory, learning, and cognitive processing have shed light on how people represent knowledge and how they develop competence in various school content domains. Current educational testing practices have incorporated parts of these advances. However, broad theories that encompass these advances in a coherent theory of test design have not emerged to guide educational achievement testing practice. A comprehensive theory of educational test design would incorporate what is known about the psychology of human learning and performance in school content areas and the socio-cultural contexts that shape, support, and inhibit learning and performance.

Prior to the meeting on May 16 participants received an initial attempt to portray the constructs, definitions, propositions, and their interrelationships. They received a graphic display of an outline of a theory for educational test design and a more detailed discussion of issues surrounding a theory of test design.

The participants in the discussion session were Principal Investigators of NSF-funded projects that focus on, or include a focus on, assessing student learning in science and mathematics. The goals of the discussion were to (a) afford meeting participants an opportunity to share and hear about their projects with explicit references to activities in the project that might be relevant to a theory of test design, (b) comment on the initial discussion and graphical display of elements of a theory of test design, and (c) extend these initial ideas.

The initial discussion of a theory of educational achievement test design and the graphical display are attached to this meeting summary.

Summary of the Discussion

We began the session by introducing the topic and reminding participants of the two-page introductory document and graphic and of the purpose of the session. Participating PIs were encouraged to take up to five minutes to (a) describe their NSF-funded project, and (b) discuss explicitly features of their project that may be relevant to the topic of developing a psychological theory of test design. While these presentations continued into the middle of the second of the three discussion sessions, participants raised questions about projects, identified commonalities among projects, and raised issues relevant to the topic of the session. For example, while discussing his project, one participant suggested that a theory of test design should address sociological as well as psychological considerations. This comment set the stage for another participant to discuss the importance of cultural considerations in a theory of test design.

Discussion session participants provided the following summary statements to represent the gist of the discussions. The statements summarize sometimes far-ranging discussions that lasted as long as 10 minutes, as well as comments that were as brief as two minutes.

A Theory That Reaches Beyond Psychology

Early in the discussion, as described above, participants suggested that a theory of educational achievement test design should address sociological and cultural considerations as well as psychological features. We agreed to discuss a socio-cultural-psychological theory of test design, or SCP theory of test design and development.

Role of Test Items in a Theory of Test Design

A participant made the point that teachers and the general public rely on released test items to develop an understanding of test content and to aid interpretation of test performance. This participant recommended inserting "item interpretation" as another element in the graphical display and linking this element to other elements: test design, implemented curriculum, and policy and instructional decisions.

Questions to Consider in Reviewing the Graphical Display

The graphical display was an initial attempt to address the main elements of a coherent theory of test design. One participant asked, in follow-up to the meeting, how such a theory would encompass the following testing purposes:

  • Student diagnosis
  • Curriculum validation and development
  • Student and school accountability
  • Other purposes

A Three-dimensional Structure for Test Design

One participant suggested that assessment design could be three-dimensional and that these dimensions should be addressed in designing an assessment:

  • A map of the content domain. This map could be either narrow or wide, depending on the domain being assessed.
  • Aggregation and disaggregation of performance. Aggregation can be considered across examinees or across items or tasks within an assessment. Likewise, disaggregation could be applied to subgroups of examinees or subgroups of test items and tasks.
  • Time. Here, time refers to the development of student knowledge, understanding, and skill in a content area over the course of instruction.
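
The second dimension above, aggregation and disaggregation, can be illustrated with a minimal sketch. The records, field names, subgroup labels, and the helper mean_score_by are all hypothetical and are not drawn from the session; the sketch only shows that the same scored responses can be summarized across examinees, across items, or by subgroup.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical scored responses: (examinee, subgroup, item, strand, score 0 or 1).
RESPONSES = [
    ("s1", "group_a", "q1", "algebra", 1),
    ("s1", "group_a", "q2", "geometry", 0),
    ("s2", "group_b", "q1", "algebra", 1),
    ("s2", "group_b", "q2", "geometry", 1),
    ("s3", "group_a", "q1", "algebra", 0),
    ("s3", "group_a", "q2", "geometry", 0),
]

# Position of each grouping field within a response record.
FIELDS = {"examinee": 0, "subgroup": 1, "item": 2, "strand": 3}


def mean_score_by(responses, field):
    """Group responses by one field and return the mean score per group."""
    idx = FIELDS[field]
    groups = defaultdict(list)
    for row in responses:
        groups[row[idx]].append(row[4])
    return {key: mean(scores) for key, scores in groups.items()}


# Aggregation across all examinees and items.
overall = mean(row[4] for row in RESPONSES)

# Disaggregation by examinee subgroup and by item.
by_subgroup = mean_score_by(RESPONSES, "subgroup")
by_item = mean_score_by(RESPONSES, "item")
```

Adding a testing-occasion field to each record would be one way to bring in the third dimension, time, under the same assumptions.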

Assessment Facets

Another participant suggested that the theory could consider the different facets of an assessment. In this conceptualization, items would be developed and/or selected considering, for example, the context and purpose of the assessment, the type of knowledge that the developer/assessor intends to tap, or features of the items themselves.

  • Context for the assessment. Contexts can include classroom assessment, school-level and school system-level assessment, and large-scale external assessment, as well as age or grade level.
  • Types of knowledge assessed. Item types can be linked to types of knowledge and skills. Types of knowledge can be considered components of achievement. Examples within a content domain include declarative knowledge, which is knowing that (e.g., facts, definitions, descriptions), and procedural knowledge, which is knowing how (e.g., executing procedures, routine actions, algorithms). It was mentioned that other types of knowledge have been proposed for this purpose.
  • Types of items/assessment tasks. Items can be considered in terms of the demands on examinees of the tasks themselves (e.g., whether the examinee must bubble in a selected response or generate and write a response) and the cognitive demands of the item or task (e.g., whether the item requires recalling a fact, explaining understanding of a concept, or applying knowledge to solve a problem).

Role of Domain Knowledge in a Theory of Test Design

Discussion also focused on the central importance of considering what is known about the structure of knowledge in a content domain (e.g., algebra). One participant asserted that a theory of knowledge in a content domain is "the backbone of assessment design." In developing a theory of test design for a content domain, two considerations must be explicated:

  • Organization of knowledge in the domain.
  • How student learning progresses in the domain.

Our understanding of both of these considerations should be research-based.

Compilation of Assessment Examples

Another participant suggested that a theory of educational test design should include a collection of examples of:

  • Assessment methods (e.g., items, tasks) that have been shown to be more or less useful and valid [in a content domain].
  • Methods for validating the assessment methods.

Role of New Artifacts as Assessment Tools

One participant suggested that social interactions in and outside of schooling can result in learning and in opportunities to use different approaches to assessment. This participant used weather maps as an example. Weather maps are sophisticated, employ unique symbol systems, and convey a large amount of complex information succinctly. Even with all that complexity, they are well understood by the general public. Weather maps and other new artifacts in culture could be used as assessment tools. If they are used, it may become necessary to develop new psychometric models. Another participant followed up this discussion by referring briefly to the Interactive Multi-Media Exercises (IMMEX) project at UCLA, which makes it possible to investigate the steps and procedures students use to complete a task. Because all records are in electronic form, IMMEX is able to track and display students' decisions and pathways (as a network) as they solve the problems (i.e., all steps are documented by sequence and time per step).

Support for Making Inferences as Part of a Theory of Test Design

One participant suggested that a theory of educational test design should include reasoning tools to draw inferences from test performance in relation to the assessment context. This participant referred to a theory of test design that is a "theory of relativity."

The reasoning tools would enable test designers and developers to identify issues that are relevant to testing in various contexts (e.g., language, culture, socio-economic status) and that make each context unique. The tools would account for differences across testing contexts among examinees, purposes for testing, people administering tests, and people interpreting test performances and making decisions based on test scores. Finally, a theory of test design can incorporate dimensions such as purposes and users. A relativistic theory of test design should enable test developers to address the uncertainties associated with the complexities and wide varieties of these dimensions.

Addressing Four Components of a Test

Another participant recommended that any theory of test design should address four components of a test:

  • Question delivery. Test items can be case-based, situated, or static.
  • Response collection. Examinee responses can be collected in various formats, including criteria-based (e.g., multiple-choice, essay, fill-in), process-based (e.g., procedures, detailed interactions), and latency of response (e.g., eye movement, problem-solving time). (I do not have any notes on this issue so…)
  • Feedback strategy. Feedback to examinees about their performance can be delayed (i.e., the test can be graded after it is administered), instantaneous (i.e., given in real time, concurrent with the test administration), and so forth. For instantaneous feedback, there are hints, prompts, etc. that human tutors provide to guide the tutee's response.
  • Evaluation methods. Evaluation of examinees' responses can take many forms, including correct scores (e.g., percent correct) and error analysis (e.g., using a wrong-choice misconception matrix).
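
The two evaluation methods named in the last bullet can be contrasted in a minimal sketch. The summary does not define the wrong-choice misconception matrix, so the lookup table below is only one plausible reading of it; the items, answer key, misconception labels, and the function evaluate are hypothetical.

```python
from collections import Counter

# Hypothetical answer key for two multiple-choice items.
ANSWER_KEY = {"q1": "A", "q2": "D"}

# Hypothetical wrong-choice misconception matrix:
# (item, wrong choice) -> the misconception that choice is assumed to signal.
MISCONCEPTIONS = {
    ("q1", "B"): "adds denominators",
    ("q1", "C"): "ignores sign",
    ("q2", "A"): "adds denominators",
}


def evaluate(responses):
    """Return (percent correct, Counter of misconceptions) for one examinee."""
    correct = sum(
        1 for item, choice in responses.items() if ANSWER_KEY[item] == choice
    )
    errors = Counter(
        MISCONCEPTIONS.get((item, choice), "unclassified")
        for item, choice in responses.items()
        if ANSWER_KEY[item] != choice
    )
    return 100 * correct / len(responses), errors


# One examinee who chose B on q1 and A on q2.
percent, errors = evaluate({"q1": "B", "q2": "A"})
```

In this sketch the percent-correct score says only that both items were missed, while the error tally points to a single recurring misconception, which is the kind of information the bullet associates with error analysis.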

In a theory of test design these components can be crossed with the following features:

  • Type of knowledge. Knowledge can include conceptual, procedural, factual, etc.
  • Purpose of the test. Purposes can include assessment for certification, for placement, etc.
  • Scale of the test. Scale can include individual (e.g., tutor-tutee), classroom, or state and nation-wide.
  • Cultural factors. Cultural factors can include language, family, and community factors which influence learning in formal school situations.
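
Crossing the four components with the four features yields a grid of design considerations. The sketch below enumerates that grid; the dictionary layout, the wording of each cell, and the function name design_grid are assumptions made for illustration, not part of the participant's proposal.

```python
from itertools import product

# The four components and four features named in the summary.
COMPONENTS = [
    "question delivery",
    "response collection",
    "feedback strategy",
    "evaluation methods",
]
FEATURES = [
    "type of knowledge",
    "purpose of the test",
    "scale of the test",
    "cultural factors",
]


def design_grid(components, features):
    """Cross every component with every feature, one design question per cell."""
    return {
        (component, feature): f"How does {feature} shape {component}?"
        for component, feature in product(components, features)
    }


grid = design_grid(COMPONENTS, FEATURES)

# Print the 16 cells as a simple checklist.
for (component, feature), question in grid.items():
    print(f"{component} x {feature}: {question}")
```

Each of the 16 cells is a prompt for the test designer, for example how cultural factors bear on the feedback strategy.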

Conclusion

The discussion did not turn directly to gaps in research and knowledge bases on socio-cultural-psychological theories of educational assessment design. However, participants did make at least two comments in the course of discussion relevant to such gaps. First, one participant observed that considerable progress has been made in mapping the organization and development of knowledge in some parts of some school content domains (e.g., subtopics in science and mathematics), but that large gaps remain in the empirically derived knowledge base on the organization of, and learning in, much of the school content domains. Second, another participant observed that until about 10 years ago, theories and research on educational test design focused primarily on measurement accuracy (i.e., reliability and validity of test score interpretations). Over the last 10 years much attention has been paid to the consequences associated with educational testing as well as to accuracy. The area of consequences for students, educators, education policy, curriculum and teaching, and so forth seems promising for identifying considerations for theories of educational assessment design.

   
    
 
Division of Research, Evaluation and Communication
National Science Foundation
4201 Wilson Boulevard • Arlington, Virginia • (703)292-8650