Summary of Knowledge Session 8: Developing a
Psychology of Test Design
Notes re: revising
- There are three notes where the original speaker
should fill in needed information or provide accurate information. I should make
a note that the attached summary and graphic are updated, based on the
discussion session.
- I should note that "theory" should be changed to
"theories" of test design and that "test" should be changed to "assessment" to
avoid narrow associations with large-scale paper-pencil testing.
This summary is based on notes taken during the
session discussion by the discussion leader, Steve Ferrara, and from summaries
written by participants during the discussion sessions or as follow-up to the
session.
Overview of the Issue
Educational achievement tests do an excellent job of
ranking students, schools, school systems, states, and nations. However,
educators lament that educational achievement tests do not provide the
information that they need to guide important decisions about instruction. The
last 30 years of research in the psychology of human memory, learning, and
cognitive processing have shed light on how people represent knowledge and how
they develop competence in various school content domains. Current educational
testing practices have incorporated parts of these advances. However, broad
theories that encompass these advances in a coherent theory of test design have
not emerged to guide educational achievement testing practice. A comprehensive
theory of educational test design would incorporate what is known about the
psychology of human learning and performance in school content areas and the
socio-cultural contexts that shape, support, and inhibit learning and
performance.
Prior to the meeting on May 16, participants received
an initial attempt to portray the constructs, definitions, propositions, and
their interrelationships. They received a graphic display of an outline of a
theory for educational test design and a more detailed discussion of issues
surrounding a theory of test design.
The participants in the discussion session were
Principal Investigators of NSF-funded projects that focus on, or include a
focus on, assessing student learning in science and mathematics. The goals of
the discussion were to (a) afford meeting participants an opportunity to share
and hear about their projects with explicit references to activities in the
project that might be relevant to a theory of test design, (b) comment on the
initial discussion and graphical display of elements of a theory of test
design, and (c) extend these initial ideas.
The initial discussion of a theory of educational
achievement test design and the graphical display are attached to this meeting
summary.
Summary of the Discussion
We began the session by introducing the topic,
reminding participants of the two-page introductory document and graphic, and
the purpose of the session. Participating PIs were encouraged to take up to
five minutes to (a) describe their NSF-funded project, and (b) discuss
explicitly features of their project that may be relevant to the topic of
developing a psychological theory of test design. While these presentations
continued into the middle of the second of the three discussion sessions,
participants raised questions about projects, identified commonalities among
projects, and raised issues relevant to the topic of the session. For example,
while discussing his project, one participant suggested that a theory of test
design should address sociological as well as psychological considerations.
This comment set the stage for another participant to discuss the importance of
cultural considerations in a theory of test design.
Discussion session participants provided the following
summary statements to capture the gist of the discussions, which ranged from
far-ranging exchanges lasting as long as 10 minutes to comments as brief as two
minutes.
A Theory That Reaches Beyond Psychology
Early in the discussion, as described above,
participants suggested that a theory of educational achievement test design
should address sociological and cultural considerations as well as
psychological features. We agreed to discuss a socio-cultural-psychological
theory of test design, or SCP theory of test design and development.
Role of Test Items in a Theory of Test Design
A participant made the point that teachers and the
general public rely on released test items to develop an understanding of test
content and to aid interpretation of test performance. This participant
recommended inserting "item interpretation" as another element in the graphical
display and linking this element to other elements: test design, implemented
curriculum, and policy and instructional decisions.
Questions to Consider in Reviewing the Graphical
Display
The graphical display was an initial attempt to
address the main elements of a coherent theory of test design. One participant
asked, in follow-up to the meeting: How would such a theory encompass the
following testing purposes?
- Student diagnosis
- Curriculum validation and development
- Student and school accountability
- Other purposes
A Three-dimensional Structure for Test Design
One participant suggested that assessment design could
be three-dimensional and that these dimensions should be addressed in designing
an assessment:
- A map of the content domain. This map could
be either narrow or wide, depending on the domain being assessed.
- Aggregation and disaggregation of
performance. Aggregation can be considered across examinees or across items
or tasks within an assessment. Likewise, disaggregation could be applied to
subgroups of examinees or subgroups of test items and tasks.
- Time. Here, time refers to the development
of student knowledge, understanding, and skill in a content area over the
course of instruction.
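The aggregation and disaggregation dimension can be sketched in code. The following is a minimal illustration, not a method proposed in the session: the record layout, the field names, and the scores are all made up for the example, with "item" standing in for a position on the content map and "occasion" standing in for time.

```python
from collections import defaultdict

# Hypothetical field layout for one scored response (an assumption for this sketch).
FIELDS = ("examinee", "subgroup", "item", "occasion", "score")

# Made-up records for illustration only; they are not data from any project.
RECORDS = [
    ("s1", "A", "item1", 1, 0), ("s1", "A", "item2", 1, 1),
    ("s2", "B", "item1", 1, 1), ("s2", "B", "item2", 1, 1),
    ("s1", "A", "item1", 2, 1), ("s1", "A", "item2", 2, 1),
    ("s2", "B", "item1", 2, 1), ("s2", "B", "item2", 2, 1),
]


def aggregate(records, by):
    """Return the mean score for each distinct value of the field named `by`."""
    idx = FIELDS.index(by)
    groups = defaultdict(list)
    for rec in records:
        groups[rec[idx]].append(rec[-1])
    return {key: sum(scores) / len(scores) for key, scores in groups.items()}


by_item = aggregate(RECORDS, "item")          # aggregate across examinees, per item
by_subgroup = aggregate(RECORDS, "subgroup")  # disaggregate by examinee subgroup
by_occasion = aggregate(RECORDS, "occasion")  # performance over time
```

The same function serves both aggregation and disaggregation; only the grouping field changes, which is one way to read the participant's point that the two are applied along the same dimension.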
Assessment Facets
Another participant suggested that the theory
consider the different facets of an assessment. In this
conceptualization, items would be developed and/or selected considering, for
example, the context and purpose of the assessment, the type of knowledge that
the developer/assessor intends to tap, or features of the items themselves.
- Context for the assessment. Contexts can
include classroom assessment, school-level and school system-level assessment,
and large-scale external assessment, as well as age or grade level.
- Types of knowledge assessed. Item types can
be linked to types of knowledge and skills. Types of knowledge can be
considered components of achievement. Examples within a content domain include
declarative knowledge, or knowing that (e.g., facts, definitions, descriptions),
and procedural knowledge, or knowing how (e.g., executing procedures, routine
actions, algorithms). It was mentioned that other types of knowledge have been
proposed for this purpose.
- Types of items/assessment tasks. Items can
be considered in terms of the demands on examinees of the tasks themselves
(e.g., whether the examinee must bubble in a selected response or generate and
write a response) and the cognitive demands of the item or task (e.g., whether
the item requires recalling a fact, explaining understanding of a concept, or
applying knowledge to solve a problem).
Role of Domain Knowledge in a Theory of Test Design
Discussion also focused on the central importance of
considering what is known about the structure of knowledge in a content domain
(e.g., algebra). One participant asserted that a theory of knowledge in a
content domain is "the backbone of assessment design." In developing a theory
of test design for a content domain, two considerations must be explicated:
- Organization of knowledge in the domain.
- How student learning progresses in the domain.
Our understanding of both of these considerations
should be research-based.
Compilation of Assessment Examples
Another participant suggested that a theory of
educational test design should include a collection of examples of:
- Assessment methods (e.g., items, tasks) that have
been shown to be more or less useful and valid [in a content domain].
- Methods for validating the assessment methods.
Role of New Artifacts as Assessment Tools
One participant suggested that social interactions in
and outside of schooling can result in learning and opportunities to use
different approaches to assessment. This participant used weather maps as an
example. Weather maps are sophisticated, employ unique symbol systems, and
convey a large amount of complex information succinctly. Even with all that
complexity they are well understood by the general public. Weather maps and
other new artifacts in culture could be used as assessment tools. If they are
used, it may become necessary to develop new psychometric models. Another
participant followed up this discussion by referring briefly to the Interactive
Multi-Media Exercises (IMMEX) project at UCLA, which makes it possible to
investigate the steps and procedures students use to complete a task. Because
all records are in electronic form, IMMEX is able to track and display
students' decisions and pathways (as a network) as they solve the problems
(i.e., all steps are documented by sequence and time per step).
Support for Making Inferences as Part of a Theory
of Test Design
One participant suggested that a theory of educational
test design should include reasoning tools to draw inferences from test
performance in relation to the assessment context. This participant referred to
a theory of test design that is a "theory of relativity."
The reasoning tools would enable test designers and
developers to identify issues that are relevant to testing in various contexts
(e.g., language, culture, socio-economic status) and that make that context
unique. The tools would account for differences across testing contexts among
examinees, purposes for testing, people administering tests, and people
interpreting test performances and making decisions based on test scores.
Finally, a theory of test design can incorporate dimensions such as purposes
and users, etc. A relativistic theory of test design should enable test
developers to address the uncertainties associated with the complexities and
wide varieties of these dimensions.
Addressing Four Components of a Test
Another participant recommended that any theory of
test design should address four components of a test:
- Question delivery. Test items can be
case-based, situated, or static.
- Response collection. Examinee responses can
be collected in various formats, including criteria-based (e.g.,
multiple-choice, essay, fill-in), process-based (e.g., procedures, detailed
interactions), or latency of response (e.g., eye movement, problem-solving time).
(I do not have any notes on this issue so
)
- Feedback strategy. Feedback to examinees
about their performance can be delayed (i.e., the test can be graded after it
is administered), instantaneous (i.e., given in real time, concurrent with the
test administration), and so forth. For instantaneous feedback, there are
hints, prompts, etc. that human tutors provide to guide the tutee's response.
- Evaluation methods. Evaluation of examinees'
responses can take many forms, including correct scores (e.g., percent
correct) and error analysis (e.g., using a wrong-choice misconception matrix).
In a theory of test design these components can be
crossed with the following features:
- Type of knowledge. Knowledge can include
conceptual, procedural, factual, etc.
- Purpose of the test. Purposes can include
assessment for certification, for placement, etc.
- Scale of the test. Scale can include
individual (e.g., tutor-tutee), classroom, or state and nation-wide.
- Cultural factors. Cultural factors can
include language, family, and community factors which influence learning in
formal school situations.
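One way to picture "crossing" the four components with the four features is as a grid in which each cell is a design question. The sketch below is only an illustration under that reading: the dictionary names and the helper functions are hypothetical, and the levels are the examples listed above rather than a complete taxonomy.

```python
from itertools import product

# Components and example levels, taken from the lists in this summary.
COMPONENTS = {
    "question delivery": ["case-based", "situated", "static"],
    "response collection": ["criteria-based", "process-based", "latency of response"],
    "feedback strategy": ["delayed", "instantaneous"],
    "evaluation methods": ["correct scores", "error analysis"],
}

# Features and example levels, taken from the lists in this summary.
FEATURES = {
    "type of knowledge": ["conceptual", "procedural", "factual"],
    "purpose of the test": ["certification", "placement"],
    "scale of the test": ["individual", "classroom", "state and nation-wide"],
    "cultural factors": ["language", "family", "community"],
}


def cross(components, features):
    """Return every (component, feature) pair: one grid cell per design question."""
    return list(product(components, features))


def cross_levels(component, feature):
    """Return every pairing of example levels inside one (component, feature) cell."""
    return list(product(COMPONENTS[component], FEATURES[feature]))


cells = cross(COMPONENTS, FEATURES)
feedback_by_purpose = cross_levels("feedback strategy", "purpose of the test")
```

Here `cells` holds the 16 component-by-feature cells, and `feedback_by_purpose` lists the four pairings (for example, delayed feedback for a placement test) that a designer would need to reason about within one cell.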
Conclusion
The discussion did not turn directly to gaps in
research and knowledge bases on socio-cultural-psychological theories of
educational assessment design. However, participants did make at least two
comments in the course of discussion relevant to such gaps. First, one participant
observed that considerable progress has been made in mapping the organization
and development of knowledge in some parts of some school content domains
(e.g., subtopics in science and mathematics). However, large gaps
remain in the empirically derived knowledge base on the organization of, and
learning in, large portions of school content domains. Second, another participant
observed that until about 10 years ago, theories and research on educational
test design had focused primarily on measurement accuracy (i.e., reliability
and validity of test score interpretations). Over the last 10 years much
attention has been paid to the consequences associated with educational testing
as well as to accuracy. The area of consequences for students, educators,
education policy, curriculum and teaching, and so forth seems promising for
identifying considerations for theories of educational assessment design.