Using Automated Item Generation to Promote
Principled Test Design and Development
Cecilia B. Alves
Mark J. Gierl
Hollis Lai
Centre for Research in Applied Measurement and Evaluation
University of Alberta
Paper presented at the annual meeting of the
American Educational Research Association
Denver, CO, USA
April 30 – May 4, 2010
Principled Test Design 2
Introduction
Educational assessment plays an important role in modern society. Teachers use tests
to measure students’ strengths and weaknesses and to determine whether students are
meeting educational objectives; school administrators use tests to monitor students’
progress, and to place students in the appropriate grade; students are selected by colleges
and universities based on their performance on standardized tests; parents are informed
about their children’s performance in each subject by means of report cards. The diversity of
assessment situations is truly impressive. However, such high demand for educational
assessment comes at a significant cost: hundreds, if not thousands, of test items
must be developed to measure student performance. As a result, item development requires
considerable money, time, and effort.
In the traditional approach to test construction, each item is individually developed
by content specialists. In this process, the item is first written, then reviewed, revised, edited,
and, finally, administered. Because this process is lengthy, it is difficult to
meet the ever-increasing demand for more test items (Drasgow, Luecht, & Bennett, 2006).
Automated Item Generation (AIG) is an alternative approach to item development that can
supplement the traditional approach by using specifically programmed algorithms. The goal
of AIG is to produce large numbers of high-quality items that require little human review
prior to administration (Williamson, Johnson, Sinharay, & Bejar, 2002). The purpose of this
study is to describe and illustrate an approach for developing task models that can be used
for AIG with the College Board’s Advanced Placement Program (AP).
The College Board supports major programs and services that promote college
admissions, guidance, assessment, financial aid, enrollment, teaching, and learning (College
Board, 2008). Being committed to the principles of excellence and equity, the College Board
has paid renewed attention to maintaining and improving the quality of its exams. In
accordance with this goal, the use of Evidence-Centered Design (ECD) is one important
action in the process of improving the AP Program.
ECD, as initially described by Robert J. Mislevy, Linda S. Steinberg, and Russell G.
Almond in 2003, provides a conceptual framework for designing, producing, and delivering
educational assessments using evidentiary arguments. Huff, Steinberg, and Matts (2009)
defined ECD as a:
set of activities and artifacts that facilitate explicit thinking about a) given the
purpose of the assessment, what content and skills are both useful and interesting
to claim about examinees; b) what is the reasonable and observable evidence in
student work or performance required to support the claims; and c) how tasks
(items) can be developed within (p. 8).
This framework helps to ensure that evidence supports the underlying knowledge the
assessment is intended to measure (Mislevy, Steinberg, & Almond, 2003). This standpoint
is valuable because it clarifies the conceptualization of assessments in a structured, coherent,
and purposeful way that should lead to more valid inferences about student performance on
exams. Three advantages that result from the application of ECD to the AP Program are:
a) a foundation for alignment between what is taught in the course and measured
on the exam; b) an explication of what is meant by deep conceptual understanding
and complex reasoning skills; and c) detailed item design and item development
guidelines and form assembly specifications that flow directly from this
explication and that serve as the basis for comparable scores.
(Huff, Steinberg, & Matts, 2009, p. 9)
In their ECD-based assessment framework, the AP Program will incorporate task
models as input for item generation. A task model identifies features of the assessment
situation that make it possible for the student to produce the evidence that supports a claim. The use of a task
model will ensure that the generated items are derived from claims and evidence, thereby
connecting items and inferences about student performance. The continuum that depicts the
progress from claims to the items is presented in Figure 1.
Figure 1. Continuum from Claim to Items, ordered from General (left) to Specific (right) (adapted from Hendrickson, Huff, & Luecht, 2009)
The boxes in Figure 1 increase in detail from left to right. This figure depicts a
continuum that goes from the most general inference, the Claim on the left-hand side, to a
very specific instance of student performance, the item on the right-hand side. The
continuum represents a progression of specificity, which occurs at five levels: Claim,
Observable Evidence, Task Model, Template, and Item. The five particular levels of item
development illustrated in this continuum are intended to indicate the range of different
types of processes and procedures that must be executed to link ECD and item development.
Each of these levels will be described and illustrated in our paper.
Claim is used to articulate the purposes of the assessment. It represents content and
skills deemed as useful for making statements about examinees.
Observable evidence is a way to support particular claims about examinee
proficiency (Huff, Steinberg, & Matts, 2009). In other words, observable evidence is
considered a behavior that allows inferences about aspects of an examinee’s proficiency.
Task models, as outlined by Hendrickson, Huff, and Luecht (2009), are components
of an evidentiary argument that supports the validity of the inferences made from student
assessments. Mislevy, Almond, and Steinberg (2002) describe a task model as “a language
for characterizing features of tasks and specifying how the interaction between the examinee
and the task is managed” (p. 115). They also state that different sets of variables (i.e., item
types or kinds of stimulus materials) may require different task models because these
characteristics may be important in modeling item parameters or controlling item selection.
Task models clarify the features of the test performance situations that will elicit
relevant evidence (PytlikZillig, Bodvarsson, & Bruning, 2005). These features can be related
to what the student is asked to say, do, or create (e.g., draw a map, write an essay, select
the correct response from a set of options) or features related to the stimulus provided to a
student (e.g., a passage, a table of data, a question) (Hendrickson, Huff, & Luecht, 2009).
Figure 2 illustrates a task model from Haladyna (2004) for applied statistics and educational
measurement statistics.
Construct Identifier: Applied statistics and educational measurement statistics
Level of the construct: Basic
Primary Context: Effect size, d
Competency Claim: Computes and interprets an effect size as a standardized difference between groups or levels of an independent variable.
Evidence Documentation
1. Successfully computes d, given two means and standard deviations from a common population.
2. Successfully computes d, given two means and standard deviations from independent populations (i.e., using the pooled variances).
3. Correctly interprets d, given two means and standard deviations from a common population.
4. Correctly interprets d, given two means and standard deviations from independent populations (i.e., using the pooled variances).
Conceptual Task Model
Specific Tasks [Expected Mastery Criteria]
1. interprets (d|single pop. means) [Plausible choice from options]
2. interprets (d|separate pop. means) [Plausible choice from options]
3. interprets (d|levels of independent variable) [Plausible choice from options]
4. computes(d|µ1, µ2, σ) [Correct value]
5. computes(d|µ1, µ2, σ1, σ2) [Correct value]
6. interprets(computes(d|µ1, µ2, σ)) [Plausible choice from options]
7. interprets(computes(d|µ1, µ2, σ1, σ2)) [Plausible choice from options]
8. interprets(generates(scatterplot|x)) [Plausible choice from options]
Manipulable features of complexity/difficulty
1. Magnitude of d (low, moderate, high)
2. Standardization of variables
3. Number of groups (two or more)
4. Sign of the effect size
5. Formulas provided
6. Software/calculator access/training
7. Graphic facilitators (depictions of the distributions)
Manipulable features of complexity/difficulty
1. Variable labels
2. Magnitude of scale
3. Compute vs. interpret vs. interpret(compute())
Figure 2. Example of Task model specification (from Haladyna, 2004. In Luecht, 2008)
Figure 2 documents the key features of a task model. Since task models serve as the
foundation for the item development, the key features ensure the items will be consistent
with the claims and evidence specified in the domain model. The first four lines of the table
represent the domain area of the task. Information about the intended construct, complexity
level, context, and claim is presented. Next, the evidence documentation specifies what types of
behaviors to expect from a student who has mastered the claim, that is, the evidence that
supports a particular claim about examinee proficiency. In this task, for example, a student
should be able to successfully compute the effect size, given two means and the standard
deviations, from a common population. The conceptual task model then presents more specific tasks,
such as interpret (d|single population means). The expected mastery criterion,
which represents the way the evidence is observed, is also given. The evidence in this
task model is exhibited by choosing the plausible option in a multiple-choice item. The
features that affect the difficulty or complexity of the item, as well as the features that are
irrelevant to the item difficulty or complexity, are also documented. If the evidence is
presented in different formats or is focused on different aspects of proficiency, then multiple
task models may be needed. Each specific format of the task model is called an item
template.
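The two computes() tasks in the Figure 2 task model can be made concrete with a short Python sketch. It is offered only as an illustration: the function names are hypothetical, and the pooled standard deviation assumes two groups of equal size, which Figure 2 does not specify.

```python
import math


def effect_size_common_sd(mean1, mean2, sd):
    """computes(d | mu1, mu2, sigma): standardized difference between two means."""
    if sd <= 0:
        raise ValueError("sd must be positive")
    return (mean1 - mean2) / sd


def effect_size_pooled_sd(mean1, mean2, sd1, sd2):
    """computes(d | mu1, mu2, sigma1, sigma2) using a pooled standard deviation.

    Assumption (not stated in the source): both groups have the same size,
    so the pooled variance is the simple average of the two variances.
    """
    pooled_sd = math.sqrt((sd1 ** 2 + sd2 ** 2) / 2)
    return effect_size_common_sd(mean1, mean2, pooled_sd)
```

Swapping the order of the means flips the sign of d, which corresponds to the "sign of the effect size" feature listed in the task model.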
Templates are useful for producing new items with the same task model
specifications and serve as a guide for automated item generation. Terminology is not
standardized in the automatic item generation literature (Johnson & Sinharay, 2005). Item
templates are also called item models, schemas, item forms, and item shells. Similarly, the
generated items are called siblings, variants, instances, isomorphs, and clones. Items are
directly generated from templates that are linked to a task model and, subsequently, to a
claim. Item templates provide the foundation necessary for automatic item generation.
Automatic item generation is a procedure for using item templates to create isomorphic
instances with known item characteristics.
By employing ECD, a better understanding of the cognitive mechanisms required to
solve items and the features that affect difficulty is obtained. A careful use of design
principles when manipulating the variables is vital to the creation of items at desired
difficulty levels. As Gitomer and Bennett (2002) state, “variation in difficulty may be
obtained by creating different templates, each intended to produce items in a particular target
range, or by creating a single template to generate items spanning the desired range” (p. 6).
Templates are written at a level of specificity sufficient to produce items (Hendrickson, Huff,
& Luecht, 2009). Templates present attributes in detail whereas task models present
attributes at a more general level (Mislevy & Riconscente, 2006). Item templates are
developed explicitly to represent the clauses listed in a corresponding task model.
Furthermore, task models provide the theoretical foundation for item development. In
contrast, item templates provide an operational foundation (Zhou, 2009). This theoretical
foundation, present in a task model, entails providing information about important features
required for item development such as the construct being measured, complexity, cognitive
level of the construct, documentation of the evidence, features that affect
complexity/difficulty, and features that are irrelevant to the complexity/difficulty of the task.
A task model is a detailed structure that reveals how the information is related to
other components of the assessment, and it serves as the blueprint for constructing the actual
tasks presented to students. Task models are also created at different locations along an
ability scale and, in turn, each model provides measurement information in a particular
ability location (Luecht, 2008). Lai, Gierl, and Alves (2010) recognized this issue when
they stated:
Whereas the goal of generating task models from a cognitive model is to have a set
of task models that are representative across the scale of ability, item generation
from item templates is an attempt to achieve the opposite. The goal of AIG is to
vary these elements within the item template, in an iterative/systematic manner, to
generate unique items that are comparable to each other psychometrically (p. 8).
Since features required for item development, such as the construct being measured and the
cognitive level of the construct, are considered, each task model should be capable of
generating multiple item templates. Items produced by one item template should have
comparable psychometric properties if only incidental elements are included in an item
template (Zhou, 2009). In other words, templates are written to create items at a specific
ability level, which depends on the variables and constraints placed on the template. If one
wants to assess different ability levels, then different item templates should be used. Camara
and Kimmel (2005) state that a task model is an item template augmented with metadata
and instantiation logic and the means of exchanging information with other components of
the assessment.
Item templates are constructed by the content specialists’ manipulation of specific,
well-defined elements. This step requires the differentiation of the fixed and variable
elements. The variable elements can be numeric or string values, and replacing these variables with
values results in a new item. As indicated by Gierl, Zhou, and Alves (2008), in order to
develop item templates, at least three variables may be required: the stem, options, and
auxiliary information. Each variable functions differently. The stem is the section of the
model used to formulate context, content, and/or questions. The stem can be classified into
four categories. Independent indicates that the elements in the stem are independent or
unrelated to one another. That is, a change in one element will have no effect on the other
stem elements in the item template. Dependent indicates that the elements in the stem are
dependent or directly related to one another. Mixed Independent/Dependent includes both
independent and dependent elements in the stem. Fixed represents a constant stem format
with no variation or change.
The options contain the alternatives for the item template when the multiple-choice
format is used. The options can be categorized as randomly selected when the distractors are
selected randomly; constrained when the keyed option and the distractors are generated
according to specific constraints, such as formulas, calculation, and/or context; and fixed when
both the keyed option and distractors are invariant or unchanged in the item template.
The auxiliary information includes any additional material, in either the stem or
options, required to generate an item, including texts, images, tables, and/or diagrams. To
illustrate these concepts, a Biology template is presented in Figure 3 (see Gierl et al., 2008,
for more examples).
Stem: Mixed; Options: Fixed; Auxiliary Information: Graph
On a newly formed island, successful populations of grasses and a species of mouse appeared. Later, a species of hawks flew in. The hawks feed on mice. The population levels of mice and hawks are represented in the graph.
In 1991, the data for the mice indicates that
A. r is negative because b<d
B. r is negative because b>d
C. r is positive because b<d
D. r is positive because b>d
STEM:
On S1, successful populations of grasses and a species of S2 appeared. Later, a species of S3 flew in. The S3 feed on S2. The population levels of S2 and S3 are represented in the graph.
In I1, the data for the S2 indicates that
ELEMENTS:
I1 Range: “1990”, “1991”, “1994”
S1 Range: “a newly formed island”, “distant forest”, “isolated jungle”
S2 Range: “rabbits”, “beetles”, “mice”, “snakes”, “fish”, “lizards”, “insects”, “bugs”, “frogs”
S3 Range: “hawks”, “eagles”, “ravens”
As S3=“hawks”, then S2=“rabbits”, “beetles”, “mice”, “snakes”, or “frogs”
As S3=“eagles”, then S2=“fish”, “snakes”, or “lizards”
As S3=“ravens”, then S2=“lizards”, “insects”, “bugs”, or “frogs”
AUXILIARY INFORMATION:
Graph with Yearly Populations
KEY:
A
Figure 3. Example of Item Template (retrieved from Gierl et al., 2008).
The stem contains one integer (I1) and three strings (S1, S2, and S3). The I1 element
takes one of three possible years: 1990, 1991, or 1994. S1 identifies the context or location of the
event, such as “a newly formed island”, “distant forest”, or “isolated jungle”. S2 represents the
prey, the animal that is attacked by the predator (S3). Since different predators feed on
different prey, S2 is dependent on S3. The location (S1), however, does not depend on the
type of prey (S2) or predator (S3). Hence, the stem is considered Mixed. The constraints
on predators and prey are stated as follows: As S3=“hawks”, then S2=“rabbits”,
“beetles”, “mice”, “snakes”, or “frogs”. The four alternatives, labeled A to D, do not vary
according to the elements used; thus, the options are Fixed. A graph is required as auxiliary
information for this item model.
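The way the Figure 3 template expands into item stems can be sketched in Python. This is a minimal illustration under stated assumptions, not a description of how IGOR works: the names YEARS, LOCATIONS, PREY_BY_PREDATOR, STEM, and generate_stems are hypothetical, the element ranges are copied verbatim from the figure, and the fixed options and the graph are left out.

```python
from itertools import product

# Element ranges copied from the Figure 3 template.
YEARS = ["1990", "1991", "1994"]                                          # I1
LOCATIONS = ["a newly formed island", "distant forest", "isolated jungle"]  # S1

# Dependent elements: the prey (S2) allowed for each predator (S3).
PREY_BY_PREDATOR = {
    "hawks": ["rabbits", "beetles", "mice", "snakes", "frogs"],
    "eagles": ["fish", "snakes", "lizards"],
    "ravens": ["lizards", "insects", "bugs", "frogs"],
}

STEM = (
    "On {S1}, successful populations of grasses and a species of {S2} "
    "appeared. Later, a species of {S3} flew in. The {S3} feed on {S2}. "
    "The population levels of {S2} and {S3} are represented in the graph.\n"
    "In {I1}, the data for the {S2} indicates that"
)


def generate_stems():
    """Return every stem permitted by the ranges and the S3 -> S2 constraints."""
    stems = []
    # I1 and S1 are independent, so every year/location pair is allowed.
    for year, location in product(YEARS, LOCATIONS):
        # S2 depends on S3, so only the listed predator/prey pairs are used.
        for predator, prey_list in PREY_BY_PREDATOR.items():
            for prey in prey_list:
                stems.append(
                    STEM.format(I1=year, S1=location, S2=prey, S3=predator)
                )
    return stems
```

With these ranges the sketch yields 3 years × 3 locations × 12 permitted predator/prey pairs = 108 distinct stems, which shows how the dependency between S2 and S3 limits the number of items a Mixed stem can produce.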
In this paper we describe the methodology that was used with the College Board
subject-matter experts (SMEs) to articulate the content and skills deemed important in the
Biology domain and then the iterative processes of constructing task models. Part of this
objective is accomplished by documenting all the relevant background information and
considerations that are required for using content expertise as well as our generative
procedures so future researchers can benefit from our experience. Three important products
of this process are (1) robust task models, (2) item templates, and (3) a sample
of items generated automatically. The use of AIG together with the ECD perspective may
promote efficient and high-quality item and test development. By outlining test development
principles for creating item templates for AIG, we will describe procedures for automatically
generating items for AP assessments. It is expected that, at the end of this project, hundreds
of new items will be generated to demonstrate our AIG approach.
Method
In the present study, task models were developed based on the Biology claims and
evidence provided by the College Board. The claims and evidence are viewed as the goals for
both instruction (in the AP courses) and assessment (on the AP exams). The AP program
allows students to earn credit or advanced placement in the college admissions process. It
covers 37 courses and exams across 22 subject areas. Due to space limitations, our study will
focus on Biology. The Biology exam uses multiple-choice items. Hence, this exam may
benefit from automated item generation. SMEs from the College Board articulated the claims
and the evidence in the domain and then participated in the iterative processes of constructing
task models and item templates, which included reviewing sets of keys, distractors, and
constraints. After these procedures were conducted, AIG was performed.
Procedures
Step 1: Revision of Claims and Evidence
Recently, the College Board adopted the ECD perspective for some of the AP
programs, including Biology. Based on the content and skills that were deemed important to
the Biology domain, the SMEs produced a document containing claims and evidence. This
document describes the connection among what is (a) contained in the curriculum, (b) taught
in AP courses, and (c) measured on the exams. A detailed description of the processes used to
write claims and evidence is provided in Ewing, Packman, Hamen, and Clark (2009). This
claims and evidence document will constitute the starting point for the task model
development.
Step 2: Creating Task Models
Once high-quality claims and evidence were available, task models were created. Task
models serve as a guide for creating item templates, which are the foundation for automated
item generation. It is expected that the generated task models are explicitly linked to the
claim and evidence about student proficiency.
Step 3: Creating Item Templates
An item template provides a way to generate a large number of items with similar
conceptual and statistical properties. Item templates are constructed by the SMEs’
manipulation of specific and well-defined elements. Template development requires
manipulation of three elements: stem, options, and auxiliary information. For each
template, the specification of the fixed and variable elements, as well as their numeric or
string ranges, is necessary.
Step 4: Item Generation
The software to be used in this project is called IGOR (Item GeneratOR). It was
developed and used by Gierl et al. (2008) for developing achievement test items in
Language Arts, Social Studies, Mathematics, and Science. It was reported to be a robust and
reliable tool for generating items automatically. After creating the item template, the
software will be used to generate items. The number of generated items depends on several
factors, including the model, the number of elements in the stem of the model, and the range
specified for the elements (Gierl et al., 2008).
Results
The results are presented according to the steps described in the Method section.
Step 1: Revision of Claims and Evidence
This step was carried out in rounds of discussion. To begin, a review of a document
provided by the College Board containing pairings of mechanisms/processes and
structure/processes was undertaken. This document provided the starting points for creating
statements intended to characterize evidence that the student has the knowledge and skills
necessary to master Claim 2A2 & 6.2 (The student is able to construct explanations of
the mechanisms that allow organisms to capture free energy with [the production of ATP
from ADP, photosynthesis, cellular respiration]). Key parts of the discussion, as well as some
resulting changes, are listed below to highlight the sequential steps executed to enhance the claim
and evidence, which was necessary for the subsequent development of the task models as
well as the construction of keys and distractors.
(1) The evidence statements were subcategorized into structures or mechanisms in the
original document. However, the SME stated there was not a clear division between
structure and mechanism. Hence, values for structures and mechanisms were
collapsed.
(2) The SME argued that the values for the collapsed processes needed to be aligned with a
statement that clarified “explanation of the mechanism.” For example, the claim “A
student can explain X” will contain as features of evidence “correct use of language”
and “reasoning connects cause and effect”. The SME argued that while the second
feature may provide sufficient evidence that the student’s work has satisfied the claim,
the first is not sufficient. Hence, all statements were re-checked in order to satisfy the
“explanation” requirement. For example, ‘Pyruvate oxidation connects glycolysis to
the Krebs cycle’ was changed to ‘Pyruvate is produced by glycolysis and reacts to
form carbon dioxide and a two carbon compound that is added to molecule of the
Krebs cycle’ in order to express the “explanation” feature.
(3) The SME claimed that sub-values for the correct statements would be very desirable,
since the number of assertions would increase and identifying the key would become
more difficult when answering an item. As an example, the assertion ‘In algae
and higher plants the free energy captured by the light reactions in the form of ATP
and NADPH2 are used to produce carbohydrates from carbon dioxide in the Calvin
cycle that occurs in the stroma of the chloroplast’ was broken into (1) ‘In algae and
higher plants, ATP and NADPH are used to produce carbohydrates from carbon dioxide in the
Calvin cycle, which occurs in the stroma of the chloroplast’ and (2) ‘In algae and higher
plants the free energy is captured by the light reactions in the form of ATP and
NADPH2’.
(4) False explanations for the claim were then created. The false statements
were constructed by negating aspects of the correct explanations, expressing a
misconception, or simply using the wrong explanation for the claim. These false
statements were also scrutinized by the SME.
(5) The SMEs re-checked the correct and false explanations for the claim. This verification had
a threefold purpose. First, all the correct and incorrect statements were verified and
rewritten, when necessary, in order to ensure that the assertions could be used as keys
and distractors in the item generation process. Second, the manipulable features of
complexity/difficulty were collected for this specific claim. Third, the features
irrelevant to complexity were also investigated.
Step 2: Creating Task Models
Next, the task model for the Biology Claim 2A2 & 6.2 (The student is able to
construct explanations of the mechanisms that allow organisms to capture free energy with
[the production of ATP from ADP, with photosynthesis, with cellular respiration]) is
illustrated. This task model, which outlines the construct being measured, complexity and
cognitive level of the construct, documentation of the evidence, features that affect
complexity/difficulty, and features that are irrelevant to the complexity/difficulty of the task,
is based on Haladyna’s task model example (Haladyna, 2004). Figure 4 documents the key
features of a task model for the AP Biology Claim 2A2 & 6.2.
Construct Identifier: AP Biology
Level of the construct: 5 (complex)
Primary Context: ATP production
Competency Claim: 2A2 & 6.2 The student is able to construct explanations of the mechanisms that allow organisms to capture free energy with [the production of ATP from ADP, photosynthesis, cellular respiration]
Evidence Documentation
1. Successfully identifies a statement that describes one aspect of an organism’s ability to capture free energy with the production of ATP from ADP.
2. Successfully identifies a statement that describes one aspect of an organism’s ability to capture free energy with photosynthesis.
3. Successfully identifies a statement that describes one aspect of an organism’s ability to capture free energy with cellular respiration.
Conceptual Task Model
Specific Tasks [Expected Mastery Criteria]
1. Identifies that during cellular respiration and fermentation, ATP is produced from the phosphorylation of ADP as organic molecules are broken down [Plausible choice from options]
2. Identifies that during photosynthesis, the production of ATP from ADP is coupled to the release of energy from a proton gradient established by electron transport chains embedded in a membrane [Plausible choice from options]
3. Identifies that the production of ATP from ADP is coupled to the release of energy from a proton gradient established by electron transport chains embedded in the inner membrane of the mitochondria [Plausible choice from options]
4. Identifies that photosynthesis captures free energy in visible light through the excitation of electrons in chlorophyll molecules in chloroplasts [Plausible choice from options]
5. Identifies that during photosynthesis, free energy is captured during the light reactions in the form of ATP and NADPH [Plausible choice from options]
6. Identifies that glycolysis captures free energy with a series of enzyme-catalyzed reactions, some of which are not spontaneous and involve the conversion of ATP to ADP [Plausible choice from options]
7. Identifies that the difference in free energy between the mitochondrial matrix and the inner membrane space is the energy used to produce ATP from ADP [Plausible choice from options]
Manipulable features of complexity/difficulty
Conceptual difficulty increases when the concept:
1. involves simultaneous conceptualization of multiple variables
2. involves multiple, possibly confounding causes of an effect
3. requires weighing and assessing of relative causal significance
4. involves multiple competing criteria
5. involves multiple variables whose significance must be decided
6. involves novel concepts that combine existing concepts
7. involves novel situations that require extension of existing knowledge
Figure 4. Task Model for the AP Biology Claim 2A2 & 6.2
The key features help ensure that the items are consistent with the claims and evidence
specified in the domain model. The first four lines of the table represent the domain area of
the task. AP Biology is the construct identifier, the level of the construct is complex, the
primary context of the construct is ATP production, and the claim is 2A2 & 6.2 - The student
is able to construct explanations of the mechanisms that allow organisms to capture free
energy with [the production of ATP from ADP, photosynthesis, cellular respiration].
Evidence documentation specifies the evidence that supports a particular claim about
examinee proficiency; for example, the student successfully identifies components of the mechanisms
that allow organisms to capture free energy with the production of ATP from ADP.
The conceptual task model provides more specific tasks, such as identifying that during cellular
respiration and fermentation, ATP is produced from the phosphorylation of ADP as organic
molecules are broken down. The expected mastery criteria, which represent the way the
evidence is observed, are also given. In this task model, the evidence is exhibited by
choosing the plausible choice from the options of a multiple-choice item. The features that affect
the difficulty or complexity of the item, as well as the features that are irrelevant to the item
difficulty or complexity, are also documented.
Step 3: Creating Item Templates
To illustrate the task model for claim 2A2 & 6.2, the following item template is
presented.
Subject: AP Biology
Targeted Claim: 2A2 & 6.2 The student is able to construct explanations of the mechanisms that allow organisms to capture free energy with [the production of ATP from ADP, photosynthesis, cellular respiration]
Key: A
Level: 5
Stem: Independent; Options: Constrained; Auxiliary Information: None
Identify a statement that describes one aspect of an organism’s ability to capture free energy with the production of ATP from ADP.
a. During cellular respiration and fermentation, ATP is produced from the phosphorylation of ADP as organic molecules are broken down.
b. The production of ATP by the phosphorylation of ADP occurs on the surface of the thylakoid membrane and therefore occurs only in chloroplasts
c. ADP is phosphorylated to produce ATP using a proton gradient embedded in a membrane, which is established by an electron transport chain that includes oxygen as an electron donor
d. The change in free energy during the production of ATP from ADP is used to create a proton gradient across a membrane in both the mitochondrion and the chloroplast
Item template variables
Stem: Identify correct components of the mechanisms that allow organisms to capture free
energy with [PROCESS]
Elements: [PROCESS] range: “the production of ATP from ADP”, “photosynthesis”, “cellular
respiration”
Options:
a. [KEY]
b. [DISTRACTOR1]
c. [DISTRACTOR2]
d. [DISTRACTOR3]
As [PROCESS]= “the production of ATP from ADP”, then [KEY]:
(1)
During cellular respiration and fermentation, ATP is produced from the
phosphorylation of ADP as organic molecules are broken down
.
.
.
(10)
…
As [PROCESS]= “photosynthesis”, then [KEY]:
(11)
Photosynthesis captures free energy in visible light through the excitation of
electrons in chlorophyll molecules in chloroplasts
.
.
.
(23)
…
As [PROCESS]= “cellular respiration”, then [KEY]:
(24)
Glycolysis captures free energy with a series of enzyme-catalyzed reactions,
some of which are not spontaneous and involve the conversion of ATP to ADP
.
.
.
(35)
…
As [PROCESS]= “the production of ATP from ADP”, then [DISTRACTORS]:
(1)
The production of ATP by the phosphorylation of ADP occurs on the
surface of the thylakoid membrane and therefore occurs only in chloroplasts
.
.
.
(5)
…
As [PROCESS]= “photosynthesis”, then [DISTRACTORS]:
(6)
The Calvin cycle is a series of biochemical reactions that takes place in the
cytoplasm of cells in photosynthetic algae and higher plants
.
.
.
(17)
…
As [PROCESS]= “cellular respiration”, then [DISTRACTORS]:
(18)
During Krebs cycle, ATP is converted to ADP when bonds in carbohydrates
and other organic molecules are broken
.
.
.
(24)
…
Constraints:
As the [KEY]=11, 12, 13, 14, 18, or 19, then [DISTRACTORS] ≠10
As the [KEY]=15, 16, or 19, then [DISTRACTORS] ≠6
As the [KEY]=16, then [DISTRACTORS] ≠9
As the [KEY]=17, then [DISTRACTORS] ≠14
As the [KEY]=28, 30, or 35, then [DISTRACTORS] ≠19
As one [DISTRACTOR]=7, then remaining [DISTRACTORS] ≠8
As one [DISTRACTOR]=10, then remaining [DISTRACTORS] = 8 or 12
Auxiliary Information: None
Figure 5. Item template for the AP Biology Claim 2A2 & 6.2
In this example, the stem contains one string variable (PROCESS). This variable can assume
three values: “the production of ATP from ADP”, “photosynthesis”, and “cellular respiration”.
The four alternatives, labelled A to D, are generated using algorithms that combine
elements in a systematic manner. Therefore, depending on which value PROCESS assumes,
different keys and distractors will be displayed in the generated item. For example, if
PROCESS equals “the production of ATP from ADP”, then there are 10 possible keys. For
security reasons, only one key has been disclosed here: During cellular respiration and
fermentation, ATP is produced from the phosphorylation of ADP as organic molecules are
broken down. A similar process is used with the distractors. If PROCESS equals “the
production of ATP from ADP”, then there are five potential distractors, among them: The
production of ATP by the phosphorylation of ADP occurs on the surface of the thylakoid
membrane and therefore occurs only in chloroplasts.
There is no auxiliary information for this item template. The stem contains only one
variable, the options are constrained, and no additional material (such as texts,
images, tables, and/or diagrams) is required in either the stem or the options to generate
an item from this template.
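The way a template of this kind turns into a displayed item can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the software used in the study: the names STEM, GeneratedItem, render_item, and format_item are hypothetical, the stem text is the one shown in Figure 5, and the key is placed in option a because the template lists Key: A.

```python
from dataclasses import dataclass

# Stem from Figure 5, with its single string variable [PROCESS].
STEM = ("Identify correct components of the mechanisms that allow "
        "organisms to capture free energy with [PROCESS]")


@dataclass(frozen=True)
class GeneratedItem:
    """Hypothetical container for one generated item."""
    stem: str
    options: tuple  # option a is the key (Key: A), options b-d are distractors


def render_item(process, key, distractors):
    """Fill the [PROCESS] slot and place the key first, then three distractors."""
    if len(distractors) != 3:
        raise ValueError("a four-option item needs exactly three distractors")
    stem = STEM.replace("[PROCESS]", process)
    return GeneratedItem(stem=stem, options=(key, *distractors))


def format_item(item):
    """Lay the item out as a stem followed by options labelled a to d."""
    lines = [item.stem]
    for label, text in zip("abcd", item.options):
        lines.append(f"{label}. {text}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder option texts; the real key and distractor pools are not public.
    demo = render_item("photosynthesis", "KEY TEXT",
                       ["DISTRACTOR 1", "DISTRACTOR 2", "DISTRACTOR 3"])
    print(format_item(demo))
```

Keeping the stem, the key pool, and the distractor pool separate is what allows one template to yield many items: only the value of PROCESS and the selected statements change from item to item.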
Step 4: Item Generation
Using the generator software IGOR, a total of 1,966 items were created. IGOR
creates items by iterating over all combinations of elements in the item template, taking
into account the specified constraints. The number of items generated by type of PROCESS is
presented in Table 1.
Table 1 – The number of generated items from the AP Biology template.
Process                       Number of Keys   Number of Distractors   Number of items
Production of ATP from ADP    10               5                       100
Photosynthesis                13               12                      1,461
Cellular respiration          12               7                       405
Total                         35               24                      1,966
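The enumeration described above can be illustrated with a short Python sketch. It is not IGOR's algorithm, whose internals are not described here; it assumes that each item pairs one key with three distractors chosen without regard to order, and the function names generate and no_7_with_8 are hypothetical. Under that assumption, the 10 keys and 5 distractors for the production of ATP from ADP give 10 × C(5,3) = 100 combinations, which agrees with the first row of Table 1; none of the constraints in Figure 5 refers to keys 1 to 10 or distractors 1 to 5. The sketch does not attempt to reproduce the constrained totals for the other two processes, because the exact reading of each constraint is an assumption.

```python
from itertools import combinations


def generate(keys, distractors, allowed=lambda key, triple: True):
    """Yield (key, triple) for every key and every 3-distractor combination
    that passes the constraint predicate `allowed`."""
    for key in keys:
        for triple in combinations(distractors, 3):
            if allowed(key, triple):
                yield key, triple


def no_7_with_8(key, triple):
    """One assumed reading of 'As one [DISTRACTOR]=7, then remaining
    [DISTRACTORS] != 8': distractors 7 and 8 never appear together."""
    return not (7 in triple and 8 in triple)


# Production of ATP from ADP: keys numbered 1-10, distractors numbered 1-5.
atp_items = list(generate(range(1, 11), range(1, 6)))

if __name__ == "__main__":
    print(len(atp_items))  # 100, i.e. 10 keys x C(5,3) distractor triples
```

A constraint is expressed as a predicate over the candidate (key, distractors) pair, so further rules from the template can be added by combining predicates rather than by changing the enumeration loop.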
Conclusions and Implications
The purpose of our study is to link automatic item generation and ECD to improve
test development practices at the College Board. Clearly, our approach is complex but, we
believe, worthwhile. To use Mislevy and Riconscente’s (2006) words:
initial applications of the ideas encompassed in the ECD framework may be labor
intensive and time consuming. Nevertheless, the import of ideas for improving
assessment will become clear from (a) the explication of the reasoning behind
assessment design decisions and (b) the identification of reusable elements and
pieces of structure−conceptual as well as technical−that can be adapted for new
projects (p.86).
The inclusion of the ECD approach in this project fills the gap between traditional
automated item generation and the cognitive perspective on assessment. In this
way, task models and item templates are constructed taking into account the knowledge and
skills students use to succeed on items based on specific claims and to demonstrate the
evidence supporting these claims.
Other strengths of the proposed approach to item development include: (a) the
process of item development can occur more quickly because items are generated
automatically; (b) difficulties present in creating parallel test forms (i.e., items with the same
content and difficulty level) can be addressed and minimized since item templates can be
used to generate large numbers of parallel items; (c) issues of test security become less of a
concern because many test items are now available for creating tests; and (d) the costs
involved in the item development process can be significantly reduced.
The benefits of AIG are also evident for computer-based, online, and continuous
assessments, where a large number of items are required for maintaining a bank. Although
we may be able to generate an enormous number of items from a few task models and item
templates, only a small number of these items might be useful for supporting an automated
generation process in AP. However, this project still has great potential to contribute
to a better understanding of the processes involved in building robust task models and in
constructing successful items from those task models. In this way, the knowledge and
experience gathered about how to work with item modeling and item generation, and the
potential to fill the gap between ECD and AIG, might by themselves be very valuable.
Author Note
This project was completed with funds provided to the first author as part of the College
Board Research Grant Program. The purpose of the program is to encourage and support
developing young research scientists who wish to gain experience in conducting a program
of research. We would like to thank the College Board for their support. However, the
authors are solely responsible for the methods, procedures, and interpretations expressed in
this study. Our views do not necessarily reflect those of the College Board.
References
Camara, W. J., & Kimmel, E. W. (Eds.). (2005). Choosing Students: Higher Education
Admissions Tools for the 21st Century. Mahwah, NJ: Lawrence Erlbaum Associates.
Drasgow, F., Luecht, R. M., & Bennett, R. (2006). Technology and testing. In R. L. Brennan
(Ed.), Educational measurement (pp. 471–516). Washington, DC: American Council on
Education.
Gierl, M. J., Zhou, J., & Alves, C. B. (2008). Developing a Taxonomy of Item Model Types
to Promote Assessment Engineering. The Journal of Technology, Learning, and
Assessment, 7(2), 1-51. Available from http://www.jtla.org.
Gitomer, D. H., & Bennett, R. E. (2002). Unmasking constructs through new technology,
measurement theory, and cognitive sciences. In National Academy of Sciences (Ed.),
Technology and Assessment Thinking Ahead: Proceedings from a Workshop (pp. 1-11).
Washington: National Academy Press.
Haladyna, T. (2004). Developing and Validating Multiple-Choice Test Items. Mahwah, New
Jersey: Lawrence Erlbaum Associates, Publishers.
Luecht, R. M. (2008). Assessment Engineering in Test Design, Development, Assembly, and
Scoring. Presentation at East Coast Organization of Language Testers (ECOLT)
Conference. Retrieved December 13, 2009 from:
http://www.govtilr.org/Publications/ECOLT08-AEKeynote-RMLuecht07Nov08%5B1%5D.pdf
Johnson, M. S., & Sinharay, S. (2005). Calibration of polytomous item families using
Bayesian hierarchical modeling. Applied Psychological Measurement, 29, 369–400.
Mislevy, R. J., Almond, R. G., & Lukas, J. F. (2003). A brief introduction to evidence
centered design. ETS Research Report RR03-16. Princeton: ETS.
Mislevy, R. J., & Riconscente, M. M. (2006). Evidence-centered assessment design. In
Downing, S. M., & Haladyna, T. M. (Eds.), Handbook of test development (pp. 61-90).
Mahwah, NJ: Erlbaum.
Mislevy, R. J., Almond, R. G., & Steinberg, L. S. (2002). On the roles of task model
variables in assessment design. In Irvine, S., & Kyllonen, P. (Eds.), Generating items
for cognitive tests: Theory and practice (pp. 97-128). Hillsdale, NJ: Lawrence Erlbaum.
PytlikZillig, L. M., Bodvarsson, M., & Bruning, R. (2005). Technology-Based Education:
Bringing Researchers and Practitioners Together. Greenwich, CT: Information Age
Publishing.
Williamson, D. M., Mislevy, R. J., & Bejar, I. I. (Eds.). (2006). Automated scoring of
complex tasks in computer-based testing. Mahwah, NJ: Lawrence Erlbaum Associates.
Williamson, D. M., Johnson, M. S., Sinharay, S., & Bejar, I. (2002). Hierarchical IRT
examination of isomorphic equivalence of complex constructed response tasks. Paper
presented at the American Educational Research Association, New Orleans, LA.
Zhou, J. (2009). A Review of Assessment Engineering Principles with Select Applications to
the Certified Public Accountant Examination. Technical Report prepared for The
American Institute of Certified Public Accountants. Retrieved March 30, 2009 from:
http://www.cpa-exam.org/download/Zhou-A-Review-of-Assessment-Engineering.pdf
Appendix
The appendix contains a sample of automatically generated items. The key for each item is
marked with an asterisk (*).
Identify a statement that describes one aspect of an organism’s ability to capture free energy
with the production of ATP from ADP.
a) The production of ATP from ADP is coupled to the release of energy from a proton
gradient established by electron transport chains embedded in the inner membrane of
the mitochondria*
b) The change in free energy during the production of ATP from ADP is used to create
a proton gradient across a membrane in both the mitochondrion and the chloroplast
c) Facilitated diffusion of protons across the cell membrane and out of the cytoplasm of
bacteria is coupled to the capture of free energy through production of ATP from
ADP on the inner surface of the membrane
d) Active transport of protons across the cell membrane and out of the cytoplasm of
bacteria is coupled to the capture of free energy through the production of ATP from
ADP on the inner surface of the membrane
Identify a statement that describes one aspect of an organism’s ability to capture free energy
with photosynthesis.
a) In the light reactions, electrons are transferred from the oxygen atom in a water
molecule to NADP+ to establish an electrochemical gradient*
b) In chloroplasts, the conversion of ATP to ADP provides the free energy required to
move protons across the thylakoid membrane and into the stroma
c) Electrons from O2 are used to replace the electrons from chlorophyll molecules that
are energized by light photons and phosphorylate ADP in the electron transport chain
d) In photosystems I and II, solar energy elevates the free energy of the hydrogen atoms
in water, which bond with the carbon atom in carbon dioxide, leaving the oxygen
atoms in carbon dioxide to form molecular oxygen
Identify a statement that describes one aspect of an organism’s ability to capture free energy
with cellular respiration.
a) Two pyruvate molecules produced during glycolysis are transported to the Krebs
cycle, thereby connecting the process of glycolysis with the Krebs cycle*
b) The proton gradient across the mitochondrial membrane that surrounds the matrix is
established by the free energy extracted from carbohydrates
c) In cellular respiration, electrons extracted from molecular oxygen are transferred to
NADH and then to carbon atoms during a series of reactions in the electron transport
chain
d) During cellular respiration, the pH of the mitochondrial matrix becomes lower than
the pH of the inner membrane space, and the difference in pH is used to generate
ATP from ADP