ESSENTIALS OF EDUCATIONAL MEASUREMENT

Fifth Edition
Robert L. Ebel
David A. Frisbie
University of Iowa
Prentice-Hall of India Private Limited
New Delhi-110001
1991
ESSENTIALS OF EDUCATIONAL MEASUREMENT, 5th Ed.
by Robert L. Ebel and David A. Frisbie
Prentice-Hall International, Inc., Englewood Cliffs.
Prentice-Hall International, Inc., London.
Prentice-Hall of Australia, Pty. Ltd., Sydney.
Prentice-Hall Canada, Inc., Toronto.
Prentice-Hall of Japan, Inc., Tokyo.
Prentice-Hall of Southeast Asia (Pte.) Ltd., Singapore.
Editora Prentice-Hall do Brasil Ltda., Rio de Janeiro.
Prentice-Hall Hispanoamericana, S.A., Mexico City.
© 1991 by Prentice-Hall, Inc., Englewood Cliffs, N.J., U.S.A. All rights reserved. No part of this book may be reproduced in any form, by mimeograph or any other means, without permission in writing from the publishers.
lsBN{€7692-70G2
The export rights of this book are vested solely with the publisher.
Reprinted in India by special arrangement with Prentice-Hall, Inc., Englewood Cliffs, N.J., U.S.A.
Printed by Bhuvnesh Seth at Rajkamal Electric Press, B-35/9, G.T. Karnal Road Industrial Area, Delhi-110033 and Published by Prentice-Hall of India Private Limited, M-97, Connaught Circus, New Delhi-110001.
Contents
Preface
1 The Status of Educational Measurement
  The Prevalence of Testing
  Some Chronic Complaints about Testing
  Some Current Issues and Developments
  The Principal Task of the School
  The Potential Value of Testing in Education
  Summary Propositions
  Questions for Study and Discussion
2 Measurement and the Instructional Process
  Evaluation, Measurement, and Testing
  Evaluation in the Teaching Process
  Functions of Achievement Tests
  Limitations of Achievement Tests
  Interpreting Measurements
  Summary Propositions
  Questions for Study and Discussion
3 Measuring Important Achievements
  The Cognitive Outcomes of Education
  Using Instructional Objectives
  Summary Propositions
  Questions for Study and Discussion
4 Describing and Summarizing Measurement Results
  Frequency Distributions
  Describing Score Distributions
  Score Scales Describe Performance
  Correlation Coefficients
  Summary Propositions
  Questions for Study and Discussion
5 The Reliability of Test Scores
  The Meaning of Reliability
  Methods of Estimating Score Reliability
  Using Reliability Information
  Factors Influencing Score Reliability
  Criterion-referenced Score Reliability
  Summary Propositions
  Questions for Study and Discussion
6 Validity: Interpretation and Use
  The Meaning of Validity
  Evidence Used to Support Validity
  Applying Validity Principles
  Summary Propositions
  Questions for Study and Discussion
7 Achievement Test Planning
  Establishing the Purpose for Testing
  Alternative Types of Test Tasks
  Test Specifications
  Item Format Selection
  Number of Items
  Level and Distribution of Difficulty
  Summary Propositions
  Questions for Study and Discussion
8 True-False Test Items
  Merits of the True-False Format
  Common Misconceptions about True-False Items
  Writing Effective True-False Items
  Multiple True-False Items
  Summary Propositions
  Questions for Study and Discussion
9 Multiple-Choice Test Items
  The Popularity of the Multiple-choice Format
  The Content Basis for Creating Multiple-choice Items
  The Multiple-choice Item Stem
  Preparing the Response Choices
  Summary Propositions
  Questions for Study and Discussion
10 Other Objective-Item Formats
  Short-answer Items
  Matching Items
  Numerical Problems
  Summary Propositions
  Questions for Study and Discussion
11 Essay-Test Items
  The Prevalence of Essay Testing
  The Value of Essay Testing
  Reliability of Essay-Test Scores
  Preparing Essay Items
  Scoring Essay Items
  Summary Propositions
  Questions for Study and Discussion
12 Test Administration and Scoring
  Preparing the Students
  Test-preparation Considerations
  Test-administration Considerations
  Scoring Procedures and Issues
  Computer-assisted Test Administration
  Summary Propositions
  Questions for Study and Discussion
13 Evaluating Test and Item Characteristics
  Test Characteristics to Evaluate
  Item-analysis Procedures
  Selection of the Upper and Lower Groups
  Index of Difficulty
  Index of Discrimination
  Item Selection
  Item Revision
  Other Criterion-referenced Procedures
  Posttest Discussions
  Summary Propositions
  Questions for Study and Discussion
14 Nontest and Informal Evaluation Methods
  Observational Techniques
  Informal Inventories
  Oral-questioning Techniques
  Summary Propositions
  Questions for Study and Discussion
15 Grading and Reporting Achievements
  The Need for Grades
  Some Problems of Grading
  The Meaning Conveyed by Grades
  Establishing a Grading System
  Threats to the Validity of Grades
  Grading Course Assignments
  Combining Grade Components
  Methods of Assigning Grades
  Grading Software
  Summary Propositions
  Questions for Study and Discussion
16 The Nature of Standardized Tests
  Characteristics of Standardized Tests
  Types of Standardized Test Scores
  Norms
  Selection of Standardized Tests
  Summary Propositions
  Questions for Study and Discussion
17 Using Standardized Achievement Tests
  The Status of Standardized Achievement Testing
  Uses of Achievement-test Results
  Interpreting Scores of Individuals
  Interpreting Scores of Classes
  Reporting to Students and Parents
  Some Interpretation Problems
  School Testing Program Issues
  Summary Propositions
  Questions for Study and Discussion
18 Standardized Intelligence and Aptitude Measures
  The Concept of Intelligence
  The Nature of Intelligence Tests
  Scores Reported from Ability Tests
  Aptitude Testing
  Summary Propositions
  Questions for Study and Discussion
E. Teacher In-service Topics Regarding Preparation for Test Administration
F. Teacher In-service Topics on Achievement-test Score Interpretation
References
Author Index
Subject Index
Preface
This fifth edition of Essentials of Educational Measurement, like the previous editions, has been designed as a textbook for introductory measurement courses and as a reference for practitioners engaged in the development and use of educational measures. The evaluation needs of teachers have been weighed heavily in making decisions about content coverage and emphasis, but consideration also has been
tion of a chapter on nontest evaluation methods and the deletion of the chapters on personality measures and recent developments. Chapter 14, Nontest and Informal Evaluation Methods, describes the development and use of observation
The chapter-end projects and problems sections have been moved to the
to illustrate the application of test-development concepts and procedures. The last three appendixes provide lists of topics that are appropriate to consider when planning in-service instruction about standardized-test selection, administration, and score interpretation for teachers.
The first three chapters deal with fundamental educational measurement concepts and current testing issues. A
throughout the text, is a change in us
more common usage found in the lite
discussion of classification systems (tax
The treatment of reliability has
detail on criterion-referenced situations
give proper emphasis to construct valida-
intrinsic rational validity evidence. In addi-
to planning for criterion-referenced tests
ts for all kinds of tests.
have undergone only minor revision; the
discussion of holistic and analytical sc
chapters on test administration, test eva
most educational measurement texts do
and aptitude testing also has been revised
score interpretation and use.
able and generous contributions to this
ful to several colleagues who furnished
Tim Ansley, Doug Becker, Bob Forsyth,
Dave Lohman, Rick Stiggins, and Jon
for his support, Kim Volk for her able word-processing assistance, and Bob Jordan for his helpful library work. I also appreciate the efforts of Fred Finch and Bill Zwack of the Riverside Publishing Company in obtaining illustrations. Finally, thanks are due to my family for the patience and understanding they have shown and to Bob Ebel for the solid foundation he established with the first three editions of this book.
D. A. F.
The Status of Educational Measurement
THE PREVALENCE OF TESTING
As the last decade of this century unfolds, there is more educational testing occurring than we have ever witnessed before. Like many other educational phenomena, however, testing seems to fall in and out of favor in cyclic fashion over time. Usually the era of peak demand is followed by a period of increasing criticism of the inadequacies of testing and of the inability of tests to address our most pressing educational problems. By decade's end we likely will have come full circle from the 1970s, when charges of racial/ethnic bias dominated our thoughts and contributed to a decrease in educational testing.
The most recent escalation in the use of tests gained impetus from such educational movements as "excellence," "effective schools," "public accountability," and "minimum competency." The pressure of these movements brought both more and different kinds of testing to the schools. Teachers continued as usual to give classroom tests to assess learning outcomes and to motivate their students to learn. And schools continued as usual to administer standardized testing programs to monitor the progress of each grade group and to assess curricular strengths and weaknesses. However, in many states this teacher and district testing was supplemented by a host of other testing programs mandated by the state or by the district itself. Thus, students and teachers alike continually found themselves in some phase of testing: preparing to take a test, administering or taking a test, or reviewing or explaining the results from some kind of test.
Unfortunately, there has been too much testing: too little use of good tests and too little good use of well
more by the content of the upcoming mandated test than by the goals, values, and perceived needs of the local community. As a result, in many places, the
consequences of testing has begun to shape
more than it should.
policymakers and the public for test informa-
preparation and to ill-advised uses of existing
purposes other than what their makers intended. Unfortunately, too, some of these tests are not of very high quality. Too many of them are produced under severe time constraints by individuals with little special training in test development and no special aptitude for the task.
All educators (teachers, administrators, counselors, curriculum coordinators, and instructional designers) need to know more about educational measurement than they have had an opportunity to learn. Most states have no teacher certification requirements that specify a test and measurement course
and such special
teacher certification may be a prerequisite to special
certification,
dation in educational measurement as they prepare for and
cialty. For these and other reasons, the practices of teachers and other educators are influenced primarily by lore: what they experienced as students, what they have seen or heard from colleagues, and what they have learned incidentally through related coursework or short-term professional development experiences.
The status of achievement tes
different. Employment testing by pe
for licensure or certification decisions
on the upswing. Tests of achievemen
tions, and the quality of these tests v
the next. The decisions made on the basis of scores from such tests are no less critical than many of those made in our schools. These are high-stakes tests: the consequences for test takers are great because the decisions will influence career paths, economic opportunities, and social acceptance by peers. They are high-stakes tests because there is much to lose or much to gain, depending on the decision that results from the scores.
The excessive pressures brought on by high-stakes decisions should serve to reduce the amount of testing that takes place as we near the century's end. And the realization that a test score provides limited information of less-than-perfect accuracy should help to curb the tide also. But tests will not go away permanently or even sink to low obscurity. Nor should they. If we want to motivate and reward efforts to learn, if we want effective and productive schools, and if we want to deal fairly with individuals on the basis of their abilities and accomplishments, we need more good testing, not simply less testing. Despite the current prevalence of educational testing, we are far from receiving the full benefits that could be obtained from the wise use of good tests.
SOME CHRONIC COMPLAINTS ABOUT TESTING
Even in periods of unprecedented amounts of testing, there are those who decry the use of tests and whose goals are to highlight the misuses of tests or the harm done to students by them. Of course, not all tests are skillfully prepared, not all test scores are used in prudent ways, and no test score is likely to be free of error. The wise use of test scores requires an understanding of the issues raised by critics and an ability to distinguish constructive criticism from emotional reaction or uninformed opinion. Because some of the most frequent charges have implications for test development and test-score interpretation and use, it is appropriate to consider their merits early on in the study of educational measurement.
1. Standardized test makers control what students learn. It is possible for the developers of a commercial achievement test to construct their instrument to include only those items that meet their own personal criteria of relevance, difficulty, and timeliness. But unless the content represents the essential topics of current textbooks, unless the items reflect the recommendations of national curriculum councils and committees, and unless the content is deemed relevant by district test selection committees, such tests will not be sold. A test that might be considered ahead of its time or behind recent developments will die a quick death because many school personnel will fear that such a test will yield a set of very low scores and, more importantly, useless information. The success of commercial test developers is measured in the marketplace. The most successful tests are those that respond to curricular changes and emphases rather than those that attempt to effect such changes.
To the extent that a school modifies its curriculum (what is taught, when it is taught, and how much effort is devoted to teaching it) so that test scores will improve, some may say the test is controlling the curriculum. However, the effects of such instructional modifications can be considered both positive and negative. Instruction that is measurement driven is purposeful instruction, whether or not the intentions are laudable. For example, when the domain of instruction is lim-
boredom, more frustration, and lower achievement levels. Trade-offs abound.
Teachers and administrators are faced with the decision about how much the test should influence instructional emphases. Unfortunately, parents, school board members, superintendents, and other school personnel are reluctant to explain low test scores in terms of test-curriculum content mismatches. Instead, the current sentiment is to treat low scores as an indication of failure on the part of the school. Most often the appropriateness and quality of the test instrument remain unquestioned and are assumed to be ideal. We are not in the habit of considering competing explanations for a set of test scores, whether they be high or low, but we are in the habit of uncritically accepting the quality and appropriateness of the test instrument.
A school that focuses instruction on what the tests measure surely should teach other things as well. Even in the basic skills areas that the test does sample, there should be ample class time and teacher time to venture into interesting and important areas not covered by the standardized tests. Standardized tests can dominate local curricula only to the extent that school administrators and teachers permit.
2. Test scores are inflated because of teaching to the test. If the effectiveness of instruction is to be judged on the basis of students' performance on a test, the temptation may be for the teacher to prepare students to answer the specific questions that will be included on the test. This is often referred to as "teaching to the test." When the negative consequences of low scores are significant for the teacher (loss of job, low salary increase, reassignment to a less favorable setting), there is great urgency to ensure that students' scores will not be too low. The morality of the decisions often takes a back seat to practicality and survival.
The pressures of accountabilit
but, more importantly, unrealistic a
their associated rewards and punishm
For example, teachers with average-a
requiring above-average achievement.
tem. Teaching the test questions and
And as one state found out, erasing i
rect ones in their places after the test
stakes are, the bolder the stakeholders will become, especially in circumstances of unreasonable expectations or requirements.
An important distinction should be made between teaching to the test (that is, attempting to fix in students' minds the answers to particular test questions) and teaching material to be covered by the test (that is, attempting to give
questions like those in the test on topics covered
reprehensible. The second reflects purposeful
for giving away the answers to particular
for testing performance on skills or general
entitled to know what the stude
must be thoroughly relevant to the instruction it is intended to assess. Since a test can rarely elicit more than a sample of performance, usually much more will be taught (and learned) than can be tested. However, the test should seldom go beyond what students have had an opportunity to learn.
3. Tests make students anxious and stressed. Claims that testing is harmful to students have taken many forms: tests threaten and upset students; some students even break down and cry when faced with a test; if students get a low score on a test, they will become discouraged and quit trying; students' self-concepts will be damaged seriously; and testing is incompatible with educational procedures designed to be supportive of students.
There is undoubtedly anecdotal evidence to support some of these claims. Common sense suggests, however, that the majority of students are not harmed by testing. There are no substantial survey data that would contradict common sense on this matter. Teachers seem concerned much more often with students who don't care enough how well or how poorly they do on tests than with the relatively exceptional instances of students who seem to care too much.
It is normal and biologically helpful to be somewhat anxious when facing any real test of performance in life. But it is also a necessary part of growing up to learn to cope with the kind of tests that life inevitably brings. Of the many challenges to a child's peace of mind caused by such things as angry parents, playground bullies, bad dogs, shots from the doctor, and things that go bump in the night, tests surely must be among the least fearsome for most youngsters. Unwise parental pressure can in some cases elevate anxiety to harmful levels. But usually the child who breaks down in tears at the prospect of a test has problems of security, adjustment, and maturity that testing did not create and that cannot be solved by eliminating tests. Indeed, more frequent testing might help to solve the problem.
A student who consistently gets low test scores on material that the student has tried hard to learn is indeed likely to be discouraged. If this does happen, the school cannot claim to be offering a good educational program, and the teacher cannot claim to be doing a good job of teaching. Most low test scores, however, go to students who, for whatever reason, have not tried very hard to learn. In the opinion of the teachers of such students, it is the trying rather than the testing that is more in need of correction.
4. Standardized tests are biased against some students. Standardized tests of educational achievement have been attacked for their alleged bias against racial/ethnic minorities, against either males or females, or against students with poor reading skills. The reason for the attack, at least in part, is that such students tend to score lower on standardized tests than their age-mates. But surely lower scores alone do not signify bias. If they did, every spelling test would be biased against poor spellers, and every typing test against persons who never learned to type. A test is biased only if it yields measures that are consistently lower than they should be.
That students who do poorly on a particular test written in English might
do better if the test were in Spanish, or if the questions were presented orally,
does not mean that the original test is biased against them. It simply means that
they have not learned enough of what the particular test measures. Its linguistic
context is part of the test. The particularity of what the test does and should
measure does not constitute bias.
The score of a student on an achievement test indicates how successfully the test questions were answered under the conditions of the test. The reasonable assumption usually is made that the student would be equally successful with other tasks requiring the same knowledge or ability. Consequently, if a test score is judged to be an inaccurate indication of the student's level of achievement in the domain covered by the test, bias is not the likely explanation. It is more likely that the conditions for testing were undesirable or that a reasonably good test was chosen for a purpose other than the one for which it was originally intended.
Suppose we present third graders with this math problem-solving item: "A football team scored two touchdowns, no extra points, and one field goal in the first quarter. How many points d
Surely this item is biased against stu
But it is also biased against those wh
it probably should not be if it is inte
ity. The item probably favors Amer
ably disadvantages those who know
a game in which there are no touchd
item be biased?
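Writing out the computation the football item intends makes the source of its bias concrete. Assuming standard American football scoring (6 points per touchdown, 1 per extra point, 3 per field goal), which the item itself never states, the expected answer is:

```latex
\underbrace{2 \times 6}_{\text{touchdowns}} \;+\; \underbrace{0 \times 1}_{\text{extra points}} \;+\; \underbrace{1 \times 3}_{\text{field goal}} \;=\; 12 + 0 + 3 \;=\; 15
```

The arithmetic is well within a third grader's reach; what the item silently adds is knowledge of the point values, which has nothing to do with the mathematical problem-solving ability it is meant to measure.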
5. Paper-and-pencil objective
would indeed be foolish to claim th
could be accomplished with a careful
test. In many situations we need to ex
istics of a product developed by the
the therapy session of a counselor i
the draft of a one-act play, a persuas
surement of skill is no substitution f
tion can be observed directly and e
and economy.
In situations where objectivi
performance measures, the ability to
in the hands of the test maker. Items
unimportant detail are the easiest t
writers. Thus, all of us who have be
enced test developers can attest to
difficult by the obscurity of their co
were stated. Only a small collection
are tempted to draw the conclusion t
(cannot) measure worthwhile instruct
ber of more positive experiences should elicit a
the skill of the test developer and the nature of the important content to be measured influence the quality and usefulness of a given test.
Finally, the increased interest in recent years in giving more curricular emphasis to higher-order thinking skills has raised questions about how to measure the achievement of these skills. The ability to analyze, predict, evaluate, de
cannot be measured, some have said, with objective-test
skill beyond the level of remembering often will need to be
rce or product development to be assessed
rains, only the skill of the item writer pre
measure important higher-order abilities.
requirements of good item writing, as will be seen in later chapters, fundamental
6. Test scores need to tell what students can do rather than how students rank in a group. The fact is, both kinds of scores are important, and we need tests that will provide each type. For example, I would not be satisfied to know that my daughter is the best swimmer in her class. Rather, I want to know if she can swim far enough to save herself in the pool. If she can swim halfway across the pool, that still may not be far enough, even if it is "best" in the class. When it comes to such activities as swimming, writing one's name, driving a motorcycle,
tion, being the best in some group may be quite inadequate: ranking is not enough.
Scott can swim 100 yards and spell 310 words. Is he a better swimmer or speller? Knowing only what he can do is not sufficient for deciding about relative strengths and weaknesses. If it were important to decide which is better, I would need to know how other five-year-olds swim and spell. Suppose I learn that no other five-year-olds in Scott's YMCA swim class can swim 100 yards and that three girls in his kindergarten class can spell more than 310 words. Is Scott a better swimmer than speller? This is an issue to be explored in greater depth later, but for now it will be sufficient to point out that, as long as the comparison groups are different from one another, the question must remain open. If we learn that everyone else in Scott's swimming class can spell at least 325 words, what conclusion can be drawn about Scott's strengths and weaknesses?
In sum, we need different kinds of information for different kinds of instructional decisions: what students can do and how they rank with others are both important. Beyond that, knowing the type of information most needed in a given situation and how best to obtain it are key issues faced by teachers at all instructional levels.
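The two kinds of information can be computed from the same score. The sketch below is a minimal illustration, assuming hypothetical swimming distances for a class and a hypothetical 25-yard pool length as the criterion. `percentile_rank` follows one common convention (percent of the group scoring below, plus half of those tied); that convention is an assumption, not the only definition in use.

```python
def percentile_rank(score, group_scores):
    """Norm-referenced view: percent of the comparison group scoring below
    `score`, counting half of any tied scores (one common convention)."""
    below = sum(1 for s in group_scores if s < score)
    tied = sum(1 for s in group_scores if s == score)
    return 100.0 * (below + 0.5 * tied) / len(group_scores)


def meets_criterion(score, cutoff):
    """Criterion-referenced view: did the performance reach a fixed standard?"""
    return score >= cutoff


# Hypothetical data: yards each child in the class can swim (the child of
# interest swims 15 yards), and a hypothetical 25-yard pool as the standard.
class_yards = [5, 8, 10, 12, 15]
pool_length = 25

child = 15
print(percentile_rank(child, class_yards))  # 90.0: best in the class
print(meets_criterion(child, pool_length))  # False: cannot yet cross the pool
```

The same 15-yard performance ranks at the top of the group yet falls short of the standard, which is exactly the situation in which ranking alone is not enough.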
SOME CURRENT ISSUES AND DEVELOPMENTS
Mandated Assessment
The continuing press for accountability has led to more testing, more emphasis on test results in policymaking, and new developments or advances in test-related methodology. Mandated assessment is the term that describes the collective testing programs organized at the state (or local) level in response to legislation enacted by state (or local) governments. The mandate varies from state to state regarding the flexibility accorded local school districts to implement an assessment program and to use the results. In some states, for example, local districts have the option of conducting an assessment, but in others the legislation dictates the grade levels, subject areas, and specific purposes of testing for every district. Virtually every state has enacted some form of statewide assessment law, and new legislation is introduced annually in many states as assessment experiences uncover unanticipated gaps in implementation.
The emphasis placed on testing in response to state and local mandates can be seen by reviewing this contrived, but realistic, testing program for one school district:
1. State assessment. Tests in mathematics and reading must be administered to all students in grades 2, 4, 6, 8, and 10 of every district during a two-week period in May. Results are summarized by the state and reported to each district in September.
2. Graduation competency test. High school sophomores are given tests in reading, math, health, and consumer education to determine if they have the minimum knowledge required for high school graduation as determined by the local district. Those who fail may be retested as juniors and, if necessary, as seniors. Test content coverage and standard-setting decisions are made locally.
3. Middle school promotion test. The school district requires pupils at the end of fifth grade to pass tests in reading, mathematics, and language skills for promotion to grade 6. Those who fail any test must attend summer school and pass a comparable test before they may enroll in middle school.
4. Writing assessment. Writing samples are collected near the end of grade 6 and are scored locally to determine the extent to which writing skills are being developed. This local assessment does not provide individual student scores, but it does provide information by classroom to describe the nature of group achievement. Curriculum adjustments in terms of focus and time allocation are made on a building-by-building basis after examining the results.
5. End-of-course tests. High school students completing any of these courses (U.S. history, American government, algebra, geometry, biology, and chemistry) are required to pass a district-wide comprehensive test to receive graduation credit for the course. Students who fail to achieve minimum performance levels may opt for summer school study in the area of deficiency and retest with a comparable form of the test. These tests are intended to ensure equality in academic demands in the course within and between buildings and to enforce minimum standards of performance in each course.
Of course, other forms of assessment take place throughout the school year, interspersed among the preparation, administration, and reporting activities associated with the mandated tests. In grades 1, 3, 5, 7, 9, and 11 the district administers a traditional achievement-test battery to help fulfill the need for administrative, instructional, and guidance information that mandated programs do not furnish. A test of cognitive abilities is administered in grades 2 and 5, an aptitude battery is given in grade 8, and many high school students take any number of college scholarship and admission tests.
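The volume of testing in this contrived program is easier to see when it is tallied by grade. The sketch below encodes only the programs whose grades the description fixes; it deliberately leaves out end-of-course tests, retests of students who fail, and college admission tests, so the counts are a lower bound. The `PROGRAMS` mapping and the function name are illustrative assumptions, not part of the original program description.

```python
from collections import Counter

# Testing programs from the example district, mapped to the grades in which
# each is given (only programs whose grades are stated in the text).
PROGRAMS = {
    "state assessment (math, reading)": [2, 4, 6, 8, 10],
    "graduation competency test": [10],
    "middle school promotion test": [5],
    "writing assessment": [6],
    "achievement-test battery": [1, 3, 5, 7, 9, 11],
    "cognitive abilities test": [2, 5],
    "aptitude battery": [8],
}


def programs_per_grade(programs):
    """Count how many separate testing programs occur in each grade."""
    counts = Counter()
    for grades in programs.values():
        counts.update(grades)  # adds 1 for every grade listed
    return counts


if __name__ == "__main__":
    counts = programs_per_grade(PROGRAMS)
    for grade in range(1, 13):
        print(f"grade {grade:2d}: {counts[grade]} program(s)")
```

Even with the omissions, every grade from 1 through 11 carries at least one program and grade 5 carries three, which is the kind of load that motivates the consolidation efforts described next.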
The sheer volume of testing has led a number of measurement specialists to lobby for a consolidation of testing so that one test administration might effectively serve multiple purposes. Considerable experimentation is taking place to develop acceptable testing procedures and reporting methods that will accommodate the variety of needs with a reduced amount of testing. Two developments that warrant brief mention in this regard are customized testing and district report cards.
Customized testing, as the term implies, allows states or districts to have tests tailor-made to their own content specifications rather than select one that was designed to represent the curriculum of the mythical "typical school in the nation." The district identifies its instructional objectives from a publisher's catalog and specifies how many test items should be used to measure each one. The publisher selects items that match the chosen instructional objectives from its large bank of test items. Custom test booklets are printed, the tests are administered and scored, and results are reported in terms of mastery of the various instructional objectives. This mastery testing program may be a useful supplement to the district's traditional standardized testing program.
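The assembly-and-reporting cycle just described can be sketched in a few lines of Python. This is a minimal illustration, not any publisher's system: the `Item` record, the objective names, and the 80 percent mastery cutoff are hypothetical assumptions chosen only to make the example concrete.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Item(NamedTuple):
    """Hypothetical item-bank record: an item id and the objective it measures."""
    id: int
    objective: str


def assemble_custom_test(bank: List[Item], spec: Dict[str, int]) -> List[Item]:
    """Draw the requested number of bank items for each chosen objective."""
    by_objective: Dict[str, List[Item]] = defaultdict(list)
    for item in bank:
        by_objective[item.objective].append(item)

    booklet: List[Item] = []
    for objective, count in spec.items():
        available = by_objective[objective]
        if len(available) < count:
            raise ValueError(f"bank has only {len(available)} items for {objective!r}")
        booklet.extend(available[:count])  # first `count` matching items
    return booklet


def mastery_report(booklet: List[Item], responses: Dict[int, bool],
                   cutoff: float = 0.8) -> Dict[str, bool]:
    """Per-objective mastery; `cutoff` (proportion correct) is an assumed standard."""
    right: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for item in booklet:
        total[item.objective] += 1
        right[item.objective] += int(responses[item.id])
    return {obj: right[obj] / total[obj] >= cutoff for obj in total}


# Hypothetical bank and district specification.
bank = [Item(1, "fractions"), Item(2, "fractions"), Item(3, "fractions"),
        Item(4, "decimals"), Item(5, "graphs")]
booklet = assemble_custom_test(bank, {"fractions": 2, "decimals": 1})
report = mastery_report(booklet, {1: True, 2: True, 4: False})
# report == {"fractions": True, "decimals": False}
```

Taking the first matching items keeps the sketch deterministic; an actual selection procedure would presumably also balance difficulty and content coverage within each objective.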
Another form of customized testing that has been tried in an effort to
reduce the amount of time devoted to testing has serious limitations. In this
method a district (or state) chooses a sample of math and reading items from a
standardized achievement battery for administration in its assessment program. These items, along with a few others chosen by the district, are administered, and then, using complex statistical procedures, the scores on the full-length standardized math and reading tests are estimated. The whole purpose here is to obtain nationally normed test scores without giving the entire tests. There are at least two problems with these procedures that make the norms inapplicable: (1) the district chose only items on which its students could do well, and (2) the test taken by the district was different (in item context and length) from the one taken by the norming schools. Both conditions lead to overestimates of the performance of district pupils relative to the national group. (See Way, Forsyth, and Ansley, 1989, for further details on context effects in customized testing.)
The state or district report card is the most popular method under development for reporting the results of statewide assessment, along with other information, at the school-district level. Typically, the report provides school-building achievement-test score averages by grade level and compares them with the averages of other buildings in the district and in the state. To help the reader understand achievement in the school, student characteristics are described and compared with those in other buildings in the district and the state. For example, the distribution of racial/ethnic background, limited-English ability, and family income levels might be presented to help explain current student academic performance. Information about instructional resources, school finances, and student attendance and mobility might be presented for similar purposes. The report card method shows promise for improving the interpretability of a school district's test scores, and the wide range of information presented helps focus attention on factors that seem to relate to school achievement. But there is much we do not know about how such factors as family income level, per-pupil expenditure, educational levels of teachers, and racial/ethnic backgrounds of students influence the achievement of individual pupils.
Legislated statewide testing has evolved over the past 20 years to address public accountability concerns. It is likely to remain with us in some form for the indefinite future, and it is likely to provide further motivation for improving test development and reporting practices.
National Assessment of Educational Progress
The quest for school accountability that emerged in the post-Sputnik era made both educators and legislators realize that no useful mechanism existed to provide information about how much young people nationwide had learned in school. No dependable guides existed for steering public policy on priorities for educational spending or needed curriculum reform. Plans were laid in the mid-1960s for the National Assessment of Educational Progress (NAEP), a project that would survey the knowledge, skills, and attitudes of young Americans in several subject areas and report this information to educational decision makers, practitioners, and the public. Initial assessments in each of ten learning areas (science, writing, citizenship, reading, literature, music, social studies, mathematics, career and occupational development, and art) have been updated periodically to gauge progress. More limited assessments, called probes, have been
conducted in such areas as basic life skills, health, and energy. The reports of each assessment include selected exercises (test items) and the proportion of the sample tested that chose each multiple-choice alternative.
Many of the factors that shaped the initial structure and goals of NAEP in the 1960s have changed. For example, there is less public confidence in the ability of the school to do its job, there are greater demands for some form of accountability, and the once modest role of the federal government in education has become a prominent one. Beginning in the late 1970s, critics charged that NAEP was failing to serve the audience that needed serving and that its results needed to be more useful (Comptroller General, 1976; Wiley, 1981). Subsequently, NAEP funding to the Education Commission of the States was not renewed, and the contract was awarded to Educational Testing Service on the basis of a redesign of the purposes and technical procedures proposed for future assessments (Messick, Beaton, and Lord, 1983). Included in those new plans for NAEP were assessment of functionally handicapped students, assessment of limited-English-proficiency students, and computer-assisted assessment procedures.
The social/political climate of the 1980s left many congressional and educational leaders dissatisfied with the national assessment data available to them. A study committee appointed by the U.S. Department of Education reviewed NAEP and made its recommendations in the report "The Nation's Report Card" (Alexander and James, 1987). Another significant redesign of NAEP ensued, with considerable effort directed toward a testing plan that would permit state-by-state comparisons of achievement by 1990. Such comparisons, viewed as dangerous and inappropriate by the original designers of NAEP, are considered essential by current policymakers to respond to accountability needs and to motivate states to improve education in their jurisdictions.
The 1990s expansion of NAEP requires that all states administer NAEP tests to representative samples of their students in selected grades. (Add this form of "imposed" testing to the illustration of mandated assessment in the previous section.) No doubt the extra testing required by NAEP will raise issues related to school time, personnel requirements, and the need for dollars to support state-level participation.
There are many good reasons to believe that the state-by-state comparisons made possible by NAEP are a bad idea. First, despite the apparent demand for these data, there is no useful plan in place and no explicit purpose given for using state-level results. A state's ranking, either among the 50 states or relative to its border neighbors, is more likely to provide fuel for political fires than to improve the quality of education in the state. Second, it is unlikely that states can agree on the essential content to be measured at each grade level for which comparisons are to be made. As a result, the content is likely to be low-level, minimum essentials that a majority of students have mastered. (If such compromising were unnecessary, more of the current statewide assessment programs would use the same or similar tests to conduct their state programs.) Third, content compromising will hamper the development of tests that are difficult enough to show actual differences in achievement from state to state. That is, if the state scores are all fairly high, the scores of the five highest-scoring states may not look much different from the scores of the five lowest-scoring states.
Fourth, the meaningfulness and value of the scores from tests representing a low-level, "plain-vanilla" content domain are very questionable. For example, much of what students learn in the science curriculum of Missouri schools will not be tested by NAEP. Why should Missouri be interested in the scores from such a test? The lowest-common-denominator curriculum represented by the test content will necessarily be far lower than what we see in widely used standardized achievement test batteries. Fifth, the costs of gathering state-level NAEP scores probably cannot be justified in terms of the benefits states can accrue. The hidden and indirect costs to states and school districts may exceed the direct costs funded by the federal government or allocated by each state legislature. If all these resources could be divided and channeled to instructional programs and school facilities in each state, the impact on educational quality would certainly be greater. Bold leadership at the state level is needed to redirect the state-by-state comparison efforts.
The Impact of Computers
Advances in computing, especially with regard to the microcomputer, continue to provide new opportunities for more efficient, more realistic, and more accurate measurement of educational achievement. A textbook description of new developments seems futile, since the technological changes are likely to be regarded in historical terms by the time the print reaches the audience. Even so, these changes reach the classroom implementation phase at a snail's pace, especially those related to testing.
A major but somewhat hidden impact of computers on testing has been in the area of theoretical measurement developments. The increased capacity and speed of computing have made it possible for researchers to perform simulations requiring complex and lengthy statistical analyses that previously were too cumbersome to carry out. Bunderson and Inouye (1987) have outlined and detailed these developments in terms of four generations of computing in educational measurement.
One of the most noteworthy theoretical advances made possible by technological improvements has been computerized adaptive testing. The computer creates a test for the examinee during the test administration process by "adapting" to the examinee's most recent response. If the last response was wrong, an easier item is selected; if it was correct, a harder item is selected. This continuous movement from easy to hard, or vice versa, allows the examinee's achievement level to be determined quickly with a relatively small number of items. Adaptive testing requires less than half the number of items and testing time relative to conventional methods. And though efficiency is the main advantage at this stage, this method is likely to prove more accurate, more versatile in the types of items that can be presented, and more economical than our traditional group test-administration procedures. When the capabilities of video disk and voice synthesis are added, it is easy to see that adaptive testing can revolutionize the entire testing process in the near future.
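The up-and-down rule in the paragraph above can be illustrated with a simple "staircase" procedure. This is a simplified sketch under stated assumptions, not how operational adaptive tests work: real programs typically choose items and estimate achievement with item response theory, whereas here item difficulties sit on an arbitrary numeric scale, the step size halves after each change of direction, and the simulated examinee (a hypothetical `lambda`) always answers correctly below a fixed level.

```python
from typing import Callable, List, Optional, Tuple


def adaptive_test(answer: Callable[[float], bool], start: float = 0.0,
                  step: float = 2.0, max_items: int = 10
                  ) -> Tuple[float, List[Tuple[float, bool]]]:
    """Give a harder item after a right answer and an easier one after a wrong one.

    `answer(difficulty)` reports whether the examinee answers an item of that
    difficulty correctly. The step is halved whenever the direction reverses,
    so later items cluster around the examinee's level.
    """
    difficulty = start
    history: List[Tuple[float, bool]] = []
    last: Optional[bool] = None
    for _ in range(max_items):
        correct = answer(difficulty)
        history.append((difficulty, correct))
        if last is not None and correct != last:
            step /= 2  # direction reversed: take smaller steps
        difficulty += step if correct else -step
        last = correct

    passed = [d for d, ok in history if ok]
    failed = [d for d, ok in history if not ok]
    if passed and failed:
        # Level lies between the hardest item passed and the easiest item failed.
        return (max(passed) + min(failed)) / 2, history
    return difficulty, history  # all right (or all wrong): no bracket yet


# Hypothetical examinee who answers correctly whenever difficulty <= 1.3.
estimate, items = adaptive_test(lambda d: d <= 1.3)
# estimate == 1.296875 after 10 items
```

With this idealized examinee, ten items narrow the estimate to within 0.01 of the true level, which illustrates why an adaptive test can get by with fewer items than a fixed-length form.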
New and revised software packages for microcomputers are making the classroom testing process more efficient for teachers and better suited to individualized instruction. Item banking allows teachers to reduce test preparation
time and to access test questions that have been designed to accompany their other instructional materials. In some cases the test can be administered by the computer and scored by the time the student has finished. Responses and scores from all students can be stored in the computer and summarized in a convenient report for the teacher at a later time. In other cases, a desk-top scoring machine attached to a microcomputer can score answer sheets and provide a summary analysis report for the teacher within minutes.
Another significant impact of the computer has been in the processing and reporting of the results of standardized testing. Computers can easily aggregate scores for buildings, districts, and states, and they can disaggregate scores of subgroups to monitor the achievement of pupils in special programs. Laser printers can display test scores attractively and in ways that are most convenient and meaningful for each of the several different users. Since the way test scores are organized and formatted on a report has such an influence on whether the reports are even used, the flexibility made possible by computers may be among the most prominent factors influencing testing policy during this last decade of the century.
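Aggregating and disaggregating scores amounts to averaging the same pupil records over different groupings. The sketch below assumes hypothetical field names (`district`, `building`, `program`, `score`) and made-up data; it illustrates the idea only, not any particular reporting system.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def average_by(records: Iterable[dict], *keys: str) -> Dict[Tuple, float]:
    """Mean score for each combination of values of the given grouping keys."""
    groups: Dict[Tuple, List[float]] = defaultdict(list)
    for record in records:
        groups[tuple(record[k] for k in keys)].append(record["score"])
    return {group: mean(scores) for group, scores in groups.items()}


# Hypothetical pupil records; field names and values are illustrative only.
records = [
    {"district": "D1", "building": "North", "program": "special", "score": 40},
    {"district": "D1", "building": "North", "program": "regular", "score": 60},
    {"district": "D1", "building": "South", "program": "regular", "score": 50},
    {"district": "D1", "building": "South", "program": "special", "score": 30},
]
district = average_by(records, "district")      # aggregate to the district
by_building = average_by(records, "building")   # aggregate to buildings
by_program = average_by(records, "program")     # disaggregate by subgroup
```

Because every level of reporting is just a different choice of grouping keys, the same records can feed building, district, and special-program reports without re-scoring anything.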
Despite the many positive contributions of computer technology to testing, there are some potential negative side effects worth contemplating. The most troublesome may be that computers can provide teachers with more information about test results than they understand and are able to use. More in-service work and careful software design can both address this problem. Second, the availability of an item bank means that teachers need to write fewer items themselves and, thus, they obtain less of the practice needed to nourish good item-writing skills. Of course, this potential problem is greatly diminished when teachers build and maintain their own banks. Third, the quality of the items in a commercially prepared item bank is not necessarily higher than the caliber of items the teacher might prepare. So increased reliance on item banks may yield poorer measures of achievement than teachers could develop on their own. Finally, the use of computers to administer a test may impede the performance of some test takers, even though it may yield more valid results for others. (In what ways might some test takers be put at a disadvantage?) All these potential negative effects can be examined through empirical research, but a heightened awareness of their possible influence may be most effective in minimizing their impact.
Licensure and Certification Testing
The amount of testing done to license professionals or to certify individuals [...]
Though licensure or certification requirements may include a certain college degree, so many hours of related work experience, or letters of endorsement from licensed practitioners, test scores most often carry the most significant weight in the decision. Thus it is reasonable to question, as many have done, whether a paper-pencil objective test can measure the possession of many of the skills deemed essential for safe and trustworthy practice. Should a mechanic be required to demonstrate ability to diagnose an engine problem? Should a dentist demonstrate a tooth restoration on a live patient before being licensed? Is a multiple-choice test score sufficient information for certifying an emergency medical technician? Probably not.
A number of the important issues [...] seen by considering some of the difficulties [...] licensure by state governments. If pros[...] teacher preparation program and obtain [...] to test them? If program quality actually varies so [...] does that mean the accreditation process should be [...] ers are to be tested, what kinds of content should be covered? If writing skills are to be assessed, what standard should be used to define minimal [...]? [...] sound lesson plan is deemed to be an [...] test be used to check knowledge of the [...] learning? If teachers need to demonstrate [...] requisite to renewing a license, should [...] required as evidence? In view of the purpose [...] a lifetime license to a teacher [...] would provide the "final" [...] questions, and some are [...] prototype procedures for teacher evaluation. No doubt computer simulations, video chronologies, and work-product portfolios will be used to capture the many facets of "teacher" that will need to be evaluated. We need to be able to measure teacher competence in such a way that we are convinced we have measured the proper characteristics and we could obtain nearly the same results if someone else did the measuring on another occasion.
The Impact of the Courts
As might be anticipated, while the amount of testing has increased and the importance of test scores in decision making [...], more legal challenges of test-dominated decisions have developed. The grievances seldom have been directed at the instruments themselves [...] questioned the relevance of the test scores [...] or they have argued that the use of the scores [...]
Scores from intelligence and cognitive ability tests have been used to
group students in buildings and cl[...] have said that such ability grouping [...] in some cases (Stell v. Savannah, [...]) [...] (Hobson v. Hansen, 1967; Diana v. State [...]) [...] Larry P. v. Wilson Riles (1979), it was ruled that [...] children in classes for the educably [...] it had a disproportionate effect on [...] question.
Discrimination or equal protection has been central to most court decisions involving educational testing. In Bakke v. California (1978), the plaintiff had been rejected by a medical school and then learned that a minority applicant with lower test scores had been admitted. The U.S. Supreme Court ruled that Bakke should be admitted, but it only implied that the use of different standards for applicants of different races was inappropriate. However, the court did indicate that it was proper for the race o[...] admission decisions. In another case [...] high school graduation test, a class a[...] cause a disproportionate number of r[...] bias were dismissed (Debra P. v. Turlington [...]) [...] of tests of minimum competence for [...] However, it suspended the use of the [...] all students who had received any p[...] [...]gated schools would have had an opp[...] Thus, opportunity to learn the ma[...] judged to be a significant criterion for [...] Perhaps the most significant outcome [...] to make between what is taught, what [...] taught. To the extent these three domains differ, only the first is a legitimate standard for establishing the relevance of the content of a graduation competency test.
Discrimination has been an issue in court cases involving employment testing as well. In Griggs v. Duke Power Company (1971) the court ruled that requirements for employment, such as a passing test score, must be shown to be relevant to some aspect of success [...] decided that it is permissible for the use of tests [...] selection from different racial groups [...] measures knowledge or skill required [...] the outcome of U.S. v. South Carolina [...] National Teacher Examinations (NTE) by the state of South Carolina resulted in disproportionate failure rates for minority certification candidates. But because teacher educators provided evidence that the content of their teacher preparation programs [...] the court rejected racial discrimination as [...] failure rate of minority examinees.
Disparate impact was also at the heart of another [...] the Golden Rule case, which was settled out [...] Department of Insurance, [...] Educational Testing Service (ETS), the state's testing consultant, because [...] minority applicants for
its insurance broker positions failed the state's test. The settlement requires ETS to follow certain rules in selecting test items for future versions of the Illinois test. For example, when selecting between an item that previously has been equally difficult for blacks and whites and an item that has been harder for blacks, the former must be chosen. (For further details about the controversial procedures required by the Golden Rule settlement, see several related articles in the second issue of Volume 6 of Educational Measurement: Issues and Practice, 1987.)
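The one selection rule quoted above can be restated as "among otherwise interchangeable items, prefer the one whose past difficulty differed least between the two groups." The sketch below captures only that example; the settlement's full procedures were more detailed and are not reproduced here, and the record fields and proportions are hypothetical.

```python
from typing import List, NamedTuple


class ItemStats(NamedTuple):
    """Hypothetical record of an item's past proportion correct in two groups."""
    item_id: str
    p_group1: float
    p_group2: float


def between_group_gap(item: ItemStats) -> float:
    """Absolute difference in proportion correct between the two groups."""
    return abs(item.p_group1 - item.p_group2)


def choose_item(candidates: List[ItemStats]) -> ItemStats:
    """Among content-equivalent candidates, prefer the smallest difficulty gap."""
    return min(candidates, key=between_group_gap)


equal = ItemStats("A", 0.70, 0.70)    # equally difficult for both groups
harder = ItemStats("B", 0.75, 0.55)   # harder for group 2
chosen = choose_item([equal, harder])  # chosen.item_id == "A"
```

The rule says nothing about which items are content-equivalent in the first place; that judgment is assumed to have been made before `choose_item` is called.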
Finally, scholarship selection procedures were at issue in a suit brought by the American Civil Liberties Union (ACLU) against the New York Department of Education. The court said the purpose of the state scholarship program was to reward achievement in high school, not academic promise in college. Therefore, since the exclusive use of the Scholastic Aptitude Test (SAT) score for awarding scholarships resulted in a sizable disparity favoring males, the court said the procedure discriminated against females. (There is ample evidence that the high school grades of females are higher than those of males.) The judge concluded that the most appropriate resolution was to use a composite of high school grade-point average and SAT score as the selection criterion (Staff, 1989).
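The text does not say how the court's composite was weighted. As a hypothetical illustration only, one common way to combine two measures reported on different scales is to convert each to z-scores within the applicant pool and take a weighted sum; the equal weights and the applicants below are assumptions.

```python
from statistics import mean, pstdev
from typing import Dict, Tuple


def composite_scores(applicants: Dict[str, Tuple[float, float]],
                     w_gpa: float = 0.5, w_sat: float = 0.5) -> Dict[str, float]:
    """Weighted sum of GPA and SAT z-scores computed within the applicant pool.

    `applicants` maps a name to (gpa, sat). Equal weights are an assumption;
    both measures must vary across applicants (nonzero standard deviation).
    """
    gpas = [g for g, _ in applicants.values()]
    sats = [s for _, s in applicants.values()]
    m_gpa, sd_gpa = mean(gpas), pstdev(gpas)
    m_sat, sd_sat = mean(sats), pstdev(sats)
    return {
        name: w_gpa * (g - m_gpa) / sd_gpa + w_sat * (s - m_sat) / sd_sat
        for name, (g, s) in applicants.items()
    }


# Hypothetical applicants: name -> (high school GPA, SAT total).
pool = {"A": (4.0, 1000), "B": (3.0, 1400), "C": (3.5, 1300)}
scores = composite_scores(pool)
top_by_composite = max(scores, key=scores.get)                 # "C"
top_by_sat_alone = max(pool, key=lambda name: pool[name][1])   # "B"
```

In this made-up pool the SAT alone would favor applicant B, while the composite favors C, which shows how the choice of selection criterion can change who is selected.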
While all the cases cited above involved the use of tests, the crucial issue in most of them was a matter of social policy, and what was on trial was the fair and appropriate use of test scores in decision making rather than the tests themselves. Is desegregation in school important enough to justify some apparent sacrifice of optimum learning conditions (Stell, Hobson)? Is it proper for employers to specify employee qualifications that are not related directly to job requirements (Griggs)? Should a seemingly "good" test be disqualified as a selection device because of its adverse impact on minorities (Larry P., South Carolina, Debra P.)? In attempting to right old wrongs, should selection procedures discriminate to the advantage of minority test takers (Debra P., Bakke)?
The issue that underlies the testing controversies that have ended up in court has everything to do with what a test is perceived to measure in relation to how the scores are to be used. And this is the fundamental issue that underlies the procedures we will address in subsequent chapters on item writing and test building: most well-developed, technically sound tests were built with a particular purpose in mind and are less useful for other purposes for which we might consider them. Why might it be inappropriate, for example, to use the scores from a high school graduation competency test to select the recipients of three college scholarship awards?
Standards for Testing Practice
In view of the increases in the amount and the significance of testing, it seems reasonable to expect that state and federal governments would regulate and control the development and use of tests. Shouldn't the public be protected from poorly made tests or inappropriate uses of them, just as they should be protected from inept insurance sellers, fraudulent lawyers, or low-grade beef? Aside from the truth-in-testing legislation from New York State in 1979, no other legislation, state or federal, has been directed at the control of the testing industry or at the protection of test takers' rights. Fortunately, the testing profession has been active in developing standards of practice, albeit unenforceable, for makers and users of tests.
The Standards for Educational and Psychological Testing (American Psychological Association, 1985), also referred to as the Standards, is the most recent form of a document that has been prepared and revised over a 30-year period by educational and psychological test specialists. Though they distinguish between essential and important aspects of test development, the Standards are not intended primarily as prescriptions for commercial test publishers. Instead, they are intended to distinguish appropriate from inappropriate test use and to describe the types of evidence users should seek and developers should furnish to support a specific use of the scores from a test. There are no legal ramifications for violators, and no professional sanctions will be placed on those who fail to adhere to these standards. However, professionals can exert pressure on their colleagues to conform when the beliefs and values of the profession have been documented in writing and published widely as a consensus for reasonable practice. Thus, despite their lack of teeth, the Standards are essential to the profession and, not so indirectly, to consumers whose test-taking rights are seldom protected legally or formally.
A more recent effort of similar intent has been the preparation of the Code of Fair Testing Practices in Education (1988), a document more limited than the Standards in both scope of content and intended audience. The Code has separate lists of responsibilities for test developers and test users and is written to communicate with the general public rather than only with testing professionals. Its sections on developing and selecting appropriate tests, interpreting scores, striving for fairness, and informing test takers are intended to highlight the proper use of tests rather than to broaden the existing Standards in any way. The publication and mass distribution of the Code by the five sponsoring professional organizations is a clear indication that many testing professionals see the need for self-regulation of sorts. It also demonstrates a keen desire for test takers to be treated fairly and for tests to be used properly.
There has been some interest in creating a type of consumer protection agency for testing that would function much like Consumers Union, Underwriters Laboratories, or any of the various accrediting bodies for schools and universities. In fact, the Center for the Study of Testing, Evaluation, and Educational Policy at Boston College has received grant funds to explore the feasibility of creating such an organization. One possible outcome of the study could be the formation of an organization that would certify (1) the quality of existing test instruments, (2) the procedures used by testing companies to develop their instruments and to perform their statistical analyses for scale and norms development, and (3) the proposed uses of existing tests for certain specific selection, placement, certification, or licensure decisions. There is still considerable debate about whether a testing "watchdog" is needed and whether the cost passed on to test takers to support this protection is worth the potential benefit to consumers. No doubt when the stakes are very high, all relevant parties (makers, users, and takers) will be persuaded of the value of independent review and subsequent certification of the process. Opinions are likely to diverge more widely, however, as the stakes decrease.
THE PRINCIPAL TASK OF THE SCHOOL
When one considers the reasons why schools were built, the reasons why both children and adults attend them, and the activities that go on inside them, it seems apparent that the main purpose of the school is to facilitate cognitive learning. However, this thesis has been challenged over the years by those who have argued that schools should be concerned primarily with, for example, development of moral character (Ligon, 1961), life adjustment (U.S. Office of Education, 1951), enhancing self-confidence (Kelly, 1962), or even the restructuring of society (Counts, 1932). Clearly, all these are worthy purposes. And since learning can contribute to the attainment of each, they are not actually alternatives to learning as much as they are reasons for learning. But should they be given primary emphasis in defining the task of the school? Should they form the foundation for the school curriculum? Don't they have more to do with ultimate, lifetime goals than with the means the school should use to help students achieve those goals?
Many educators disagree with those who espouse "higher" goals than cognitive learning for education. While most teachers would acknowledge the ultimate importance of character, adjustment, self-confidence, and the good society, there are several reasons they might give for why none of these should replace learning as the school's primary focus of attention. One reason is that the school is a special-purpose social institution. It was designed and developed to do a specific task: to facilitate learning. Other agencies are responsible for other aspects of the complex task of helping people to live good lives together. For example, there are families and churches, legislative assemblies and courts, factories and unions, publishers and libraries, and markets and moneylenders. To believe that the major responsibility for ethical character, life adjustment, social reconstruction, or personal happiness must rest on the schools is as presumptuous as it is foolish.
Even the private, parochial, and home schools that have emerged as alternatives to public school education have sought to make the transmission of knowledge their primary function; the other goals that may have led to their formation are secondary. The instructional methods they choose, the curricular supplements they endorse, and the physical facilities and environment they choose for their setting do not overshadow the fostering of cognitive learning as their main function. The special responsibility of every school is to provide training, instruction, and education. The task of facilitating learning is challenging enough, and important enough, to occupy nearly all of a school's time and to consume nearly all its energy and resources.
Another reason for believing that the schools should continue to emphasize learning is the basic, instrumental importance of learning to all human affairs. With their gift of language, human beings are specially equipped for verbal learning. Cognitive excellence is their unique excellence. The more they know and understand, the better, more effective, and happier they are likely to be. How better can schools help youngsters toward happiness than by increasing their knowledge and understanding of themselves and the world around them? How else can adjustment be facilitated, character developed, or ability to
contribute to society increased? Cognitive learning is effective in reaching all these goals, but it is not the only means.
The psychological process we call conditioning also can be used to achieve some of these same goals. It works by making use of rewards and punishments to establish specific, habitual responses to certain specific conditions. Much of our behavior was molded, especially [...] processes of conditioning, or behavior [...] much subject to its influences. If the schools [...] and if their sole mission was to establish [...] behavior patterns, then they should [...] conditioning could probably get the job [...] the cognitive learning process could. But [...] person flexibility and freedom in choosing [...]tioning is better suited to the training of [...] of human beings to live happy, useful lives.
People who object to the emphasis on learning as the school's main function may do so because they think of learning as academic specialization, designed mainly to prepare a person for further learning and remote from the practical concerns of living. There may be some justification for this view. But learning need not be, and ought not to be, the learning of useless things. It can and should be the student's main road to effective living. And as long as cognitive learning is the focus of our schools, there is a need for ways to determine the extent and type of learning that has occurred. Tests can be, and should be, among the most useful instructional tools for planning new learning activities and for monitoring students' progress in attaining the learning goals presented to them.
The Role of Affective Outcomes
Teachers and test developers are sometimes accused of overemphasizing cognitive learning, with consequent neglect of the affective determiners of behavior. Some people believe most teachers are preoccupied with what their students know, while students, they say, are most concerned with what they like or dislike and how they feel. Furthermore, they submit, the most profound challenges in our society are not cognitive. They are challenges to our social unity and our social righteousness, to our ethical standards and moral values, and to our courage and compassion. If our schools dwell too much on cognitive outcomes, they will fail to contribute as they should to meeting these other important challenges.
Such viewpoints are not without foundation. Feeling is as real and as important a part of human nature as is knowing. How we feel is almost always more important to us than what we know, and how we behave is a paramount concern for those with whom we share our lives. And since behavior is often determined more by how we feel about a situation than by what we know about it, clearly the affective dimension will play a most significant role in meeting the challenges of society.
Should schools give up some of their concerns for cognitive learning in favor of affective outcomes? Such a reemphasis should not occur, for a variety of
reasons. Many affective goals can be reached, at least in part, through cognitive means. Affect and cognition are not independent aspects of the personality. How we feel about a problem or an event depends in part on what we know about it. Wisdom does not guarantee happ
unhappiness. The affective failures ar
the pushouts, can nearly always b
theirs or ours. Psychologists who tr
ally use cognitive means. The psych
tive process of fostering self-know
courses in human relations focus on
and attempt to create a new aware
ships by expanding the knowledge
No teacher can afford to ignore the affective side effects of efforts to promote cognitive learning. In fact, the affective disposition to learn (the willingness to attend and respond) must be considered by teachers in assessing the entering behaviors of their students for each instructional unit. But teachers should not use their concern for affect as an excuse for paying less attention to cognitive outcomes.
THE POTENTIAL VALUE OF TESTING IN EDUCATION
There is currently much testing in education, but tests seldom contribute as much as they could to effective instruction. How much is learned in any particular course of instruction depends largely on how much the students want to learn and on how hard the teacher works to help them to learn it. These efforts by students and teachers depend, in turn, on the immediate and ultimate rewards or satisfaction that seem likely to result from their efforts. Tests can be used to provide recognitions and rewards for success in learning and teaching; they can be used to motivate and direct efforts to learn. In short, they can be made to contribute substantially to effective instruction.
Tests have sometimes been used very successfully to stimulate efforts to learn. For example, in the Iowa Academic Contest that began in 1929 (Lindquist, 1960), high school students were offered tests in each of the major subjects of study: English, history, geometry, physics, and so on. Those who received the highest scores on the local test were invited to a district contest where a similar but somewhat more difficult test was given. Those who scored highest on the district tests were invited to the state contest where they took a third level of tests. Those who scored highest on these tests were offered scholarships to the State University.
This academic contest was used by some high school principals to provide incentives for both students and teachers to work hard. In some schools the local contest winners were recognized at a school assembly and in news stories. In conferences with teachers whose students had done well on the tests, the principal offered congratulations and support for continued efforts to teach effectively. In conferences with teachers whose students had not done well, the principal tried to identify things that the principal or the teacher might do to make students more successful the next year. Thus the whole school was led to believe that learning was important and that successful efforts to learn would be rewarded. An environment conducive to learning was created in the school, and every student, not just the contest winners, benefited from it.
In many schools, unfortunately, tests are not used so effectively to stimulate and facilitate learning. Test scores do not matter very much, and unless they matter they cannot contribute much to effective instruction. There are several reasons, none of them very good, why some teachers and school administrators depreciate testing and do as little of it as possible. The tests are criticized as having little value or as being actually harmful. Doing a good job of testing demands skills that many educators know that they lack, and it requires work that their lives are more comfortable without. Testing involves comparison and competition. Even though these are facts of life, some teachers believe that schools should protect their students from competition as much as possible: unless all students can win, none should be allowed to win. Thus some schools are content with a comfortable mediocrity as long as the public will tolerate it.
Heavily taxed citizens in many states are not willing to tolerate mediocrity in their schools. They are asking for evidence that their tax dollars are buying excellence in education. They are asking that the schools do something, or demanding that communities do something, to correct the conditions that educators blame for low achievement in learning. Since public school teachers and administrators are public employees, it is entirely proper for the public to hold their schools accountable for doing the best job possible under the circumstances.
There are two things, both involving the use of tests, that teachers and schools can do, and ought to do, to justify their stewardship to the community. Each teacher ought to present evidence periodically to the school administration that the students he or she has been teaching have made substantial progress in learning. Each school ought to present evidence periodically to the community that the students in the school are making substantial progress in learning. It is not sufficient for teachers and schools to describe their processes of instruction and to claim that they know how to do a good job of educating children. The public is more interested in the product than in its process, and it would like to see evidence to support the claims.
Of course, not all evidence of learning can be, or should be, furnished in the form of test scores, whether from teacher-prepared or standardized tests. Public performances and displays of students' creative efforts, portfolios of student work products, and in-class observations by teachers can and should be used to support the positive efforts of students, administrators, and teachers. It is possible to use tests effectively to promote and document learning also. However, doing so requires that both teachers and administrators be knowledgeable about and skilled in the use of educational tests. The remaining chapters of this book are devoted to the presentation of concepts and principles that will contribute to the development of many of these essential skills.
SUMMARY PROPOSITIONS
1. […] recent surge in educational testing is […] a historical cycle pattern of test use.
2. […] testing and the accompanying potential negative consequences have unduly shaped the curriculum and teaching practices of many […]
3. […] educators have not had an opportunity to learn as much as they need to know about educational measurement.
4. The influence standardized tests have on the local school curriculum is probably more beneficial than harmful.
5. "Teaching to the test" is deplorable if it means giving students answers to the particular questions on a test; it is commendable if it means helping students learn what they must know to answer questions like those on the test.
6. Claims that testing harms students tend to be exaggerated and seldom are substantiated.
7. Test bias may exist to some extent on some tests, but it cannot account for substantial differences in test scores between different cultural groups.
8. Objective tests can provide highly valid, precise, and convenient measurements of most of the important outcomes of education.
9. Comprehensive instruction can benefit from two kinds of test information: scores that tell what students can do and scores that show how students rank among their peers.
10. State and district assessment programs required by law are intended to evaluate the general quality of the educational program or to identify specific competencies held by students "ready" for high school graduation.
11. A district's customized testing program is no adequate replacement for a nationally standardized testing program.
12. One significant contribution of the National Assessment of Educational Progress was to provide a model that states might adapt for designing and operating statewide assessment programs.
13. Test assembly and administration can be accomplished efficiently with the use of computers without sacrificing test quality.
14. The use of purchased item banks by teachers could have a negative impact on test quality, as well as on the development of teachers' item-writing skills.
15. Testing to certify competence or to license practitioners often requires a demonstration of processes or products that a paper-and-pencil testing approach cannot accommodate.
16. Court cases of the last several decades that involved testing were focused nearly exclusively on the social consequences of testing and fair test use rather than on the tests themselves.
17. The Standards for Educational and Psychological Testing represent an effort by the testing profession to monitor itself and to provide a form of consumer protection to test takers and users.
18. There are good reasons for believing that the primary task of the school is to facilitate cognitive learning.
19. Among the limited means that schools can use to help students become effective and happy adults, cultivating their cognitive abilities is the most appropriate and desirable.
20. Schools should seek to attain affective ends only through cognitive means.
21. Tests could be used to promote learning better if both teachers and administrators systematically provided test results to the community as evidence of the educational progress of students.
QUESTIONS FOR STUDY AND DISCUSSION
1. What useful purposes could be served by a mandated national testing program?
2. Why is there no federal school curriculum, a common core for all schools in the United States?
3. Under what kinds of circumstances might "teaching the test" be appropriate and desirable?
4. What kind of evidence should be furnished to support someone's claim that a particular test is biased against a given ethnic group?
5. How should the standards for a high school graduation competency test be established?
6. What are the pros and cons of making available state-by-state comparisons of achievement test scores?
7. What functions might be served by a testing "watchdog" organization that would provide protection to test consumers?
8. Which individuals or agencies should be assigned major responsibility for developing interpersonal skills, societal values, and personal attitudes in young people?
Measurement and the Instructional Process
EVALUATION, MEASUREMENT, AND TESTING
The purpose of evaluation is to make a judgment about the quality or worth
of something-an educational program, worker performance or proficiency, or
student attainments. That is what we attempt to do when we evaluate students'
achievements, employees' productivity, or prospective practitioners' competencies. In each case the goal is not simply to describe what the students, employees, or other personnel can do. Instead we seek answers to such questions as: How good is the level of achievement? How good is the performance? Have they learned enough? Is their work good enough? These are questions of value that require the exercise of judgment. To say simply that evaluation is the process of making value judgments understates the complexity and difficulty of the effort required. Once it has been determined that evaluation is needed, the evaluator must decide what kind of information is needed, how the information should be gathered, and how the information should be synthesized to support the outcome: the value judgment. Thus, evaluation is as concerned with information gathering as it is with making decisions. In addition, the term is used to refer to the product or outcome of the process. That is, we might, for example, submit our evaluation (the product) of Scott's school performance to his parents following our evaluation (the process) of his accomplishments. In this respect evaluation has a dual connotation.
Evaluation: Formative and Summative
The terms formative and sum
to describe the various roles of evalua
struction. Formative evaluation is condu
to determine whether learning is takin
conducted at the end of an instructio
sufficiently complete to warrant movin
struction. The distinctions between th
tions for test development and use in
educational programs. As will be noted
summative purposes may be used on o
The major function of formative
feedback to the teacher and to the stu
feedback provides an opportunity for t
ods or materials to facilitate learning
going well, formative evaluation requir
mation on frequent occasions. Informat
tion, classroom oral questioning, homew
inventories. Much of what a teacher do
led as formative evaluation. The role of
highly systematized programs of individu
rently for formative evaluation.
at the end of an instructional segment.
Relative to formative evaluation, there is
tive evaluation. The information gathered is less detailed in nature but broader in scope of content or skills assessed. Figure 2-1 compares some of the distinguishing characteristics of the two types.
Obviously, both types of evaluation are necessary components of classroom instruction. In some cases, information gathered for summative purposes may be useful in a formative sense. For example, the scores on a unit test may be used to evaluate achievement at the end of that unit. At the same time, the scores reflect progress in the course and in the broader instructional program. In such circumstances the tests should be designed to yield useful information for summative evaluation purposes, but the scores might be used incidentally as gross indicators of progress in the broader context.

Figure 2-1. Characteristics That Distinguish Classroom Formative and Summative Evaluation

                 Formative                          Summative
Purpose          Monitor progress                   Check final status
Content Focus    Detailed, narrow scope             General, broad scope
Methods          Observations, daily assignments    Tests, projects
Frequency        Daily                              Weekly, or every 2-3 weeks
Measurement: Assigning Numbers
Measurement is the process of assigning numbers to individuals or their characteristics according to specified rules. Measurement requires the use of numbers but does not require that value judgments be made about the numbers obtained from the process. We measure achievement with a test by counting the number of test items a student answers correctly, and we use exactly the same rule to assign a number to the achievement of each student in the class. Measurements are useful for describing the amount of certain abilities that individuals have. For that reason, they represent useful information for the evaluation process. But can we measure all the important outcomes of our instructional efforts?
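Before taking up that question, the counting rule just described can be sketched in a few lines of Python. The answer key, the student labels, and the responses are hypothetical; the sketch only shows that one specified rule (one point for each response matching the key) is applied identically to every student, and that what it produces is a number, not a value judgment.

```python
# Hypothetical answer key for a five-item test (illustration only).
ANSWER_KEY = ["B", "D", "A", "C", "B"]


def raw_score(responses, key=ANSWER_KEY):
    """Apply one specified rule: one point for each response that matches the key."""
    if len(responses) != len(key):
        raise ValueError("responses and answer key must have the same length")
    return sum(1 for given, correct in zip(responses, key) if given == correct)


# Hypothetical responses from three students.
class_responses = {
    "Student 1": ["B", "D", "A", "C", "B"],
    "Student 2": ["B", "A", "A", "C", "D"],
    "Student 3": ["C", "A", "B", "C", "B"],
}

# Exactly the same rule assigns a number to every student in the class.
scores = {name: raw_score(answers) for name, answers in class_responses.items()}
print(scores)  # {'Student 1': 5, 'Student 2': 3, 'Student 3': 2}
```

Deciding whether a score of 3 is "good enough" is a separate, evaluative step that the function deliberately does not take.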
Education is an extensive, diverse, and complex enterprise, not only in terms of the achievements it seeks to develop, but also in terms of the means by which it seeks to develop them. Our understanding of the nature and process of education is far from perfect. Hence it is easy to agree that we do not now know how to measure all important educational outcomes. But, in principle, all important outcomes of education are measurable. They may not be measurable with the tests currently available. They may not even be measurable in principle, using only paper and pencil tests. But if they are known to be important, they must be measurable.
To be important, an outcome of education must make an observable difference. That is, at some time, under some circumstance, a person who has more of it must behave differently from a person who has less of it. If different degrees or amounts of an educational achievement never make any observable difference, what evidence can be found to show that it is in fact important? But if such differences can be observed, then the achievement is measurable, for all that measurement requires is verifiable observation of a more-less relationship. Can integrity be measured? It can if verifiable differences in integrity can be observed among individuals. Can mother love be measured? If observers can agree that a hen shows more mother love than a female trout, or that Mrs. A shows more love for her children than Mrs. B, then mother love can be measured.
The argument, then, is this: To be important an educational outcome must make a difference. If it makes a difference, the basis for measurement exists. To say that Rita shows more "spunk" than Ned may not seem like much of a measurement. Where are the numbers? Yet out of a series of such more-less comparisons, a scale for measuring people's spunk can be constructed. The Ayres scale for measuring the quality of handwriting is a familiar example of this (Ayres, 1912). If a sequence of numbers is assigned to the sequence of steps or intervals that make up the scale, then the scale can yield quantitative measurements. If used carefully by a skilled judge, it yields measurements that are reasonably objective (that is, free from errors associated with specific judges) and reliable (that is, free from errors associated with the use of a particular set of test items or tasks).
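One simple way to turn a series of more-less comparisons into numbered scale steps is sketched below. It is an illustrative assumption, not the procedure Ayres used: each person is placed by counting how often a judge rated him or her "more," people with equal counts share a step, and the distinct counts are numbered in order. The names extend the Rita and Ned example; the judgments themselves are invented.

```python
from collections import Counter

# Hypothetical judgments: each pair is (person judged "more", person judged "less").
judgments = [
    ("Rita", "Ned"),
    ("Rita", "Ana"),
    ("Ana", "Ned"),
    ("Rita", "Sam"),
    ("Sam", "Ned"),
    ("Ana", "Sam"),
]


def build_scale(pairs):
    """Turn more-less judgments into numbered scale steps.

    Each person is placed by how many times he or she was judged "more";
    people with equal counts share a step. The numbers record order only;
    they do not show that the steps are equally far apart.
    """
    people = {person for pair in pairs for person in pair}
    wins = Counter(more for more, _ in pairs)  # names never judged "more" count as 0
    levels = sorted({wins[p] for p in people})
    step_of_level = {level: step for step, level in enumerate(levels, start=1)}
    return {p: step_of_level[wins[p]] for p in sorted(people)}


print(build_scale(judgments))  # {'Ana': 3, 'Ned': 1, 'Rita': 4, 'Sam': 2}
```

Rita, judged "more" in all three of her comparisons, lands on the top step. Nothing in the procedure guarantees that the distance from step 1 to step 2 equals the distance from step 3 to step 4.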
Are some outcomes of education essentially qualitative rather than quantitative? If so, is it reasonable to expect that these qualitative outcomes can be measured? It is certainly true that some differences between persons are not usually […] as more-less differences.
is a man; that one is a
in quantitative terms, too. This person
man; that one has less. This person
person has more ability to speak French;
that one
We may think of the weight
account as quantities, while regarding
his health,
as qualities. And if they serve to differentiate him
exhibits more or less of them than other men, they
It is difficult to think of any qu
fied. "Whatever exists at all exists in some amount"
p. 16). And William A. McCall (1939)
can be measured" (p. […]).
Testing: A Form of Measurement
Tests represent one particular measurement technique. A test is a set of questions, each of which has a correct answer, that examinees usually answer orally or in writing. Test questions differ from those used in measures of attitudes, interest or preference, or certain other aspects of personality. Ideally, the questions in tests of achievement or many tests of intelligence have answers that content experts can agree are correct; correctness is not determined by the particular values, preferences, or dislikes of a group of judges.
All tests are a subset of the quantitative tools or techniques that are classified as measurements. And all measurement techniques are a subset of the quantitative and qualitative techniques used in evaluation. Our major concern in this text, but certainly not the only one, will be with the development of tests that can contribute to summative evaluation of student learning. Other measurement and evaluation techniques are useful for other evaluation purposes, but tests that measure school learning with precision are the most useful tools available to teachers for most classroom summative evaluation needs.
EVALUATION IN THE TEACHING PROCESS
The evaluation of learning takes place in an instructional context and, consequently, that learning environment shapes […] what we evaluate, influ[…] evaluating […] of instruction, it is not […] is an integral part […], not loosely attached to the teaching process. The […] role of evaluation in it both must be understood as […] educational measurement. To that end, the role of e[…] are described using a model that explains how the teaching process works.
The Basic Teaching Model
There are many models that describe the variety of approaches to teaching found in our schools, but the Basic Teaching Model (BTM), introduced by Glaser (1962), accounts for the fundamental components of most other specific teaching models, such as the Socratic approach, the individualized instruction approach, or the computer-dominated instructional approach (Joyce and Weil, 1980). Few teachers probably follow the BTM steps explicitly to guide their instructional activities. And though we do not specifically endorse the use of the BTM or any other particular model, we do advocate instructional approaches, by whatever name, that account for the fundamental functions represented in the BTM as described next.
The main purposes of the BTM are to identify the major activities of the teacher and to describe the relationships between activities. Figure 2-2 is a diagram of the model. Our primary interest is the Performance Assessment component, but we cannot understand completely the role of evaluation without understanding how Performance Assessment affects, and is affected by, other teaching activities. Instructional Objectives, the first component of the BTM, represents the teacher's starting point in providing instruction. What should students learn? What skills and knowledge should be the focus of instruction? What is the curriculum and how is it defined? The second component, Entering Behavior, indicates that the teacher must try to assess the students' levels of achievement and readiness to learn prior to beginning instruction. What do the students know already and what are their cognitive skills like? How receptive to learning are they? Which ones seem self-motivated? This component indicates a need for evaluation information before instruction actually begins.
Once the teacher has decided what will be taught and to whom the teaching is to be directed, the "How?" must be determined. The Instructional Procedures component deals with the materials and methods of instruction the teacher selects or develops to facilitate student learning. Does the text need to be supplemented with illustrations? Should small group projects be developed? Is there computer software available to serve as a refresher for prerequisites? At this point instruction could begin, and often it does. But unless the teacher makes plans to evaluate students' performances, the students and teacher will never be
Figure 2-2. The Basic Teaching Model (DeCecco and Crawford, 1974)
Evaluation Planning
Figure 2-3. Examples of Methods That Serve Varying Purposes in an Evaluation Planning Guide

Type of information, by level of instruction:

Entering Behavior
  Course: Cumulative folders, questionnaires, observation, oral questioning
  Unit: Pretest, oral questioning, checklist, observation
  Daily Lesson: Observation, oral questioning, homework results

Formative Evaluation
  Course: Unit tests, projects, papers, observation, participation patterns
  Unit: Quizzes, oral questioning, […] results, participation records
  Daily Lesson: Teacher questioning, student questioning, quizzes, activity observation, nonverbal observation

Summative Evaluation
  Course: Final examination, comprehensive project, research paper, performance ratings
  Unit: Unit test, written project, work product, presentation, participation record, performance checklist
  Daily Lesson: [Ordinarily not applicable]
of school, teachers spend considerable time "sizing up" their class, both the
group and the individuals in it (Airasian, 1989). Teachers might review cumulative record folders or solicit specific background information from students, but
most data gathering is unplanned observation and questioning directed more
toward social and emotional behaviors than academic ones. Teachers should plan
for their evaluation needs, no matter which level of instruction they happen to
be considering. By deciding in advance what kind of information they need and how it might be obtained, teachers can ensure that evaluation will be done efficiently and will yield complete and helpful information.
Another reason for developing an evaluation planning guide for a unit, for example, is that the teacher will be forced to plan for assessing the achievement of some of the hard-to-measure outcomes of instruction. For example, the teacher may plan to evaluate the achievement of the 12 main objectives in a science unit by using an objective test, two essay items, and a laboratory observation checklist. Without the planning, however, last-minute attention to evaluation might result in the use of only an objective test. Test-planning activities that will be discussed in Chapter 7 will help to decide how "testable" objectives can be measured. But in the absence of an evaluation planning guide for the unit, nontestable outcomes may get lost in the shuffle. An assessment of science achievement should reflect learning of all relevant objectives, not just those that are most easily assessed.
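A planning guide can be treated as a small coverage check: record which objectives each planned method is meant to assess, then list any objective that nothing covers. The sketch below uses the 12-objective science unit from the example above, but the assignment of particular objectives to particular methods is invented for illustration; the text gives only the number of objectives and the three methods.

```python
# Hypothetical planning guide for a science unit with 12 main objectives.
# Each planned method lists the objectives (by number) it is meant to assess.
planned_methods = {
    "objective test": set(range(1, 10)),  # objectives 1-9
    "two essay items": {10, 11},
    "laboratory observation checklist": {12},
}


def uncovered_objectives(objectives, methods):
    """Return, in order, the objectives that no planned method addresses."""
    covered = set().union(*methods.values())
    return sorted(set(objectives) - covered)


unit_objectives = range(1, 13)
print(uncovered_objectives(unit_objectives, planned_methods))  # []

# Last-minute plan: only the objective test is used.
last_minute = {"objective test": planned_methods["objective test"]}
print(uncovered_objectives(unit_objectives, last_minute))  # [10, 11, 12]
```

The second call makes the "lost in the shuffle" problem concrete: the objectives that only the essay items and the laboratory checklist would have assessed simply disappear from the evaluation.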
Finally, it can be seen from an inspection of Figure 2-3 that evaluation activities vary depending on the level of instruction and the type of information needed. Note, for example, that standardized achievement-test scores, found in cumulative folders, are useful on the course level, but only for providing information on entering behavior. They are much less useful as summative evaluation information, and they are of little value at the unit and daily lesson levels. Note, too, that summative evaluation of daily lessons is not meaningful because such lessons are seldom ends in themselves. Which methods seem more helpful for formative than summative evaluation purposes? Why isn't homework included under the summative evaluation heading? In which other category do you find the components that are listed under summative evaluation at the course level?
FUNCTIONS OF ACHIEVEMENT TESTS
The major function of a clas
ment and thus to contribute to the ev
ments. This is a matter of consideral
have, that what students know and
the opportunities, there is nothing "mere"
will be tested, if they know what the test will require, and if the test does a good job of measuring the achievement of the essential course objectives, then its motivating and guiding influence will be most wholesome.
Anticipated tests should be regarded as extrinsic motivators of learning efforts, and internal desires or needs to achieve should be regarded as intrinsic motivators. Since both kinds contribute to learning, the withdrawal of either would probably lessen the learning of most students. For a fortunate few, intrinsic motivation may be strong enough to stimulate all the effort to learn that the student ought to exert. For most of us, however, the motivation provided by tests and other influential factors is indispensable. What we stand to gain or what we might lose in a given situation is motivating to all of us. We live with such trade-offs. Tests help us make many of our decisions about trying to learn: whether to try, how hard to try, and when to stop trying.
Classroom tests do serve other useful educational functions. The process of building them should cause instructors to think carefully about the objectives of instruction in a course. It should cause them to define their objectives operationally, that is, in terms of the kinds of tasks a student must be able to handle to demonstrate achievement of those objectives. And from the students' perspective, the process of taking a classroom test and discussing the outcome afterward can be a richly rewarding learning experience. As Stroud (1946) put it long ago,
It is probably not extravagant to say that the contribution made to a student's store of knowledge by taking of an examination is as great, minute for minute, as any other enterprise he engages in. (p. 476)
Hence, testing and teaching should not be regarded as mutually exclusive or as competitors for valuable instructional time. They are intimately related parts of the total teaching effort, as the BTM illustrates.
LIMITATIONS OF ACHIEVEMENT TESTS
It is easy to show that mental measurement falls far short of the standards of logical soundness that have been set for physical measurement. Ordinarily, the best it can do is provide an approximate rank order of individuals in terms of their ability to perform a more or less well defined set of tasks. Unlike the inch or pound, the units used in measuring this ability cannot be shown to be equal. The zero point on the ability scale is not clearly defined. Because of these limitations, some of the things we often do with test scores, such as finding means, standard deviations, and correlation coefficients, ought not to be done if strict mathematical logic holds sway. Nonetheless, we often find it practically useful to do them. When strict logic conflicts with practical utility, it is the utility that usually wins, as it probably should.
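A short numerical sketch shows why strict logic objects to averaging scores whose units cannot be shown to be equal. If only rank order is known, any strictly increasing relabeling of the scale is equally legitimate, yet such a relabeling can reverse which group has the higher mean. The two groups and the squaring transformation below are hypothetical choices made only to display the effect.

```python
from statistics import mean

# Hypothetical raw scores for two small groups (illustration only).
group_a = [0, 6]
group_b = [4, 4]


def rescale(scores):
    """A strictly increasing relabeling of nonnegative scores.

    It keeps every person's rank order, which is all an ordinal scale
    guarantees, but it stretches the upper end of the scale.
    """
    return [s ** 2 for s in scores]


print(mean(group_a), mean(group_b))                    # group means 3 and 4: B higher
print(mean(rescale(group_a)), mean(rescale(group_b)))  # group means 18 and 16: A higher
```

No individual changes position relative to anyone else, but the group comparison flips. This is the sense in which means computed from rank-order information rest on an assumption of equal units that the measurement itself does not supply.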
It is well for us to recognize the logical limitations of the units and scales used in educational measurement. But it is also important not to be so impressed by these limitations that we stop doing the useful things we can legitimately do. One of those useful things is to measure educational achievement.
Are some outcomes of education too intangible to be measured? No
Alternatives to Tests
Teachers obtain information ab
greater amounts of some characteristic
of students' abilities can be made on the basis of descriptive information. Because these assessments are bound to be qualitative, not quantitative, they provide only limited and very imperfect indicators of achievement. Such descriptions are dominated by terms like excellent, mediocre, worthwhile, well written, satisfactory, and quite good. Qualitative descriptions of direct or indirect behavior observations, however specific and objective, may have some value in assessing achievement, but they are no adequate replacement for a well-prepared classroom achievement test.
Ratings of performance or products, on the other hand, do involve assigning numbers to things and, hence, do constitute measurements. Thus, they are useful in differentiating individuals who possess different amounts of the traits measured by the ratings. These kinds of rating scales tend to measure certain aspects of achievement that tests are less well equipped to measure. Consequently, while they seldom can replace tests, they frequently provide useful supplements to the information provided by tests.
Many teachers use ratings of a student's discussion participation and of the student's written work as part of the basis for evaluating learning. It is important to note that the value of ratings, as well as of a test or any other measurement of achievement, depends on their reproducibility, accuracy, and appropriateness. Assessments of achievement ordinarily should not be limited to tests, but the alternatives and supplements that are available must be used with full realization of their limitations and pitfalls. Chapter 14 is devoted to a discussion of the development and use of nontest alternatives for gathering achievement information.
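Reproducibility of ratings can be checked in several ways. One simple index, sketched below with assumed data, is the proportion of students who receive identical ratings from two raters working independently. The 1-to-5 scale and the ratings are hypothetical. Agreement indices that correct for chance, such as Cohen's kappa, are often preferred in practice; this sketch computes only raw agreement.

```python
def exact_agreement(ratings_1, ratings_2):
    """Proportion of students who received the identical rating from both raters."""
    if len(ratings_1) != len(ratings_2) or not ratings_1:
        raise ValueError("need two non-empty rating lists of equal length")
    matches = sum(1 for a, b in zip(ratings_1, ratings_2) if a == b)
    return matches / len(ratings_1)


# Hypothetical 1-5 ratings of eight students' discussion participation,
# made independently by two raters.
rater_1 = [4, 3, 5, 2, 4, 3, 1, 5]
rater_2 = [4, 3, 4, 2, 4, 2, 1, 5]
print(exact_agreement(rater_1, rater_2))  # 0.75 (6 of 8 ratings match)
```

Low agreement would suggest that a student's rating depends heavily on which rater happened to do the rating, and a rating of that kind adds little dependable information to what a test provides.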
INTERPRETING MEASUREMENTS
The result of measuring is a number, but that number has no inherent meaning and, consequently, is not a useful contributor to decision making. To make the number useful, or meaningful, it is necessary to compare it with something. If Gail, a 21-year-old female, weighs 62 gauchos, what does that mean? If it is 8 bukas between your town and mine, what does that number mean? If Tonya got 15 right on an algebra test, what does that 15 tell us? My score of 23 on the Attitude Toward Computers Scale means nothing by itself. I need to reference it, or compare it with something that has meaning, in order to interpret my score. What are these "somethings"?
When we step down from the scale after weighing ourselves, we lend meaning to the number we read by referencing it to any of several other numbers: our expectation for how much we think we should weigh; the result from our last weighing; the weights of other individuals of our own height, gender, or age; or a listing of numbers that define such terms as obese, overweight, about right, underweight, and emaciated. Similar kinds of comparisons can be made to interpret a test score. If my expectation for myself was a score of at least 85 on a midsemester exam, then actual scores of 67 or 92 will each have quite different meanings to me. If my score of 92 is 15 higher than my score on this same test a week ago, my score obtains meaning in terms of growth or change by referencing my first score. Knowing that three-fourths of my classmates obtained scores above 92 also supplies interpretative information. And, finally, if I know there were 115
[...] is important to understand [...] depending [...] mind. A close look at the [...] background for test [...] and inappropriate score [...]

Norm-referenced Interpretations

[...] (or groups) to obtain [...] term "norm" relates to [...] norm-referenced interpretation [...] geometry [...] score with the [...] to determine [...]
of summative evaluation is to compare the achievement scores of the groups. Treatment-referenced interpretations are made when the score (average) of one group is compared with the scores (averages) of other groups that have experienced the same, rival, or no instructional treatment. Tests designed to yield such interpretations contain items that are sensitive to instruction; that is, they are much easier for students who have been instructed than for those who have not been. Interpretations are made in reference to the varying methodological treatments or instructional levels. The mean score of a particular group takes on significance only when compared with the mean of some other group. For example, "The team-taught class scored higher than the control group" or "High-ability students who used calculators scored the same as low-ability students who did not use calculators." No direct reference is ordinarily made to test-item content or subject matter to derive meaning from the scores. Achievement scores obtained through most of the "methods" research in the education literature are interpreted in a treatment-referenced fashion. In addition, school norms, or norms for school averages, that are reported from the scoring of standardized achievement tests are best labeled treatment referenced.
Group referenced is the broad term we find convenient to use in referring to norm-referenced and treatment-referenced collectively. Both kinds of interpretation involve comparing a single score with a group of scores. In the first case, these are scores of individuals. In the second, these are scores of groups of individuals. The distinction is an important one because, when we want to interpret the performance of, say, a class of first graders, we should compare their average score with the averages of other first-grade classes, not with the scores of other individual first-grade pupils. The consequences of using the wrong norm group for making comparisons will be explained in greater detail in Chapter 16.
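To make the norm-referenced versus treatment-referenced distinction concrete, here is a minimal Python sketch. It is an illustration, not a procedure from the text: all scores are invented, and `percentile_rank` uses one common definition (percentage of reference scores below a given score, plus half of any ties); other definitions exist.

```python
from statistics import mean


def percentile_rank(score, reference_scores):
    """Percent of reference scores below `score`, counting half of any ties.

    This is one common definition of percentile rank; others exist.
    """
    below = sum(1 for s in reference_scores if s < score)
    ties = sum(1 for s in reference_scores if s == score)
    return 100 * (below + 0.5 * ties) / len(reference_scores)


# Invented data for illustration only.
individual_scores = [12, 15, 18, 20, 22, 25, 27, 30]  # individual first graders
class_averages = [19.5, 21.0, 22.5, 23.0, 24.5]        # other first-grade classes

our_class = [18, 20, 22, 25, 27]
our_mean = mean(our_class)  # 22.4

# Norm-referenced: one pupil's score against other individuals.
print(percentile_rank(25, individual_scores))        # 68.75

# Treatment-referenced: our class mean against other class means.
print(percentile_rank(our_mean, class_averages))     # 40.0

# Mismatched reference group: a class mean against individual pupils.
print(percentile_rank(our_mean, individual_scores))  # 62.5
```

With these invented numbers the class mean of 22.4 stands at the 40th percentile among class averages but at the 62.5th percentile among individual pupils. Averages of groups are typically less spread out than individual scores, so comparing a group mean with individuals can give a misleading picture of the group's standing.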
Criterion-referenced Interpretations
A criterion-referenced interpretation is made when we compare a person's score with scores that each represent distinct levels of performance in some specific content area or with respect to a behavioral task. Meaning is obtained by describing what the person can do, in terms of various gradations, in an absolute sense. Glaser (1963) first used the term to highlight the need for tests that can describe the position of a learner on a performance continuum, rather than the learner's rank within a group of learners. For example, we may want to know if Bob can solve geometry problems that require an understanding of the properties of the rhombus and trapezoid, but we do not care so much how well or poorly his performance compares with that of his classmates. Such interpretations are important when the goal is to determine whether students have the prerequisites to profit from a new instructional unit or whether they have learned the essential ideas in a unit before moving on to a new unit.
After more than 25 years of using the term criterion referenced, there is much confusion, even among measurement specialists, about what the term means. Part of the confusion stems from the fact that "criterion" is used in several other ways by testing specialists. Another part of the confusion relates to the wide variety of interpretations that can be classified correctly as criterion referenced (Nitko, 1980). Both Hively (1974) and Millman (1974b) recognized this ambiguity and
[...] Items represent only a (random) [...] Some behaviors included in the [...] or may be underrepresented by the [...] on the basis of both sampled and [...] course proficiency tests, [...] prepared achievement batteries are [...]

In many instructional settings [...] easy to describe because its constituent [...] mains in mathematics and foreign [...] example, than those in literature [...] interpretations are meaningful only [...] be communicated to those who [...] score. It should be apparent, also, that [...] communicated clearly, misinterpretation [...]
reflect the intended domain well. An example will illustrate several of these points.
Suppose we want to measure students' skills in using the dictionary. With that goal in mind we could begin to write test items, but the content domain of interest will be relatively ill defined. A [...] described by listing these particular uses [...] to determine word meaning, to determine [...] extent of mastery of that collection [...] such objectives-referenced interpretations [...] will be needed and separate subtests will need to be constructed to ensure that the subdomains are being measured thoroughly. Then four scores, one for each subdomain, would be obtained so that separate decisions about the mastery of each skill (objective) could be made.
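The objectives-referenced scoring just described (one subscore and one mastery decision per subdomain) can be sketched in Python. This is a hypothetical illustration: only "word meaning" comes from the text, the other three objective names are invented, and the 80 percent mastery standard is an assumed value the text does not give.

```python
from collections import defaultdict

# Hypothetical mastery standard; the text does not specify one.
MASTERY_THRESHOLD = 0.80


def subscores(responses):
    """Proportion correct for each objective.

    `responses` is a list of (objective, answered_correctly) pairs.
    """
    right = defaultdict(int)
    total = defaultdict(int)
    for objective, correct in responses:
        total[objective] += 1
        right[objective] += int(correct)
    return {obj: right[obj] / total[obj] for obj in total}


def mastery_decisions(responses, threshold=MASTERY_THRESHOLD):
    """One separate mastery decision per objective (subdomain)."""
    return {obj: p >= threshold for obj, p in subscores(responses).items()}


# Hypothetical results for one student on a four-subtest dictionary test.
# Only "word_meaning" comes from the text; the other names are invented.
student = (
    [("word_meaning", True)] * 5
    + [("pronunciation", True)] * 3 + [("pronunciation", False)] * 2
    + [("spelling", True)] * 4 + [("spelling", False)] * 1
    + [("part_of_speech", True)] * 2 + [("part_of_speech", False)] * 3
)

print(subscores(student))
print(mastery_decisions(student))
```

A single total for this invented student would be 14 of 20 items (70 percent), which hides the fact that two objectives meet the assumed standard and two do not; separate subscores keep those decisions distinct.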
Cutoff-score Interpretations
There is another score-interpretation situation that merits special consideration because it is so easy for the outcome to result in misinterpretation. Here are some examples first.
1. You need a score of 88 percent to earn a B grade.
2. You will be placed in German 202 if your score is in the range from 40 to 52.
3. A TOEFL score of at least 280 is needed for admission.
4. Those who score 16 or higher will be awarded five credit hours for Calculus I.
5. The passing score on the certification test is 120.
These are situations in which the score users are not particularly interested in domain scores or in norms. Instead, some minimal standard of performance is the most logical reference for obtaining score meaning. It appears, on the surface at least, that this is just another form of criterion-referenced interpretation because each cutoff score represents a performance standard. But is it? What is the basis for choosing 88 percent rather than 85 or 82 percent for the B cutoff score? What level of content proficiency does a calculus test score of 16 represent, or does a score of 16 simply "promote" the top 10 percent of the test takers out of Calculus I?
Whenever a cutoff score is used as a basis for score interpretation, we must know the rationale for selecting the cutoff score in order to determine whether norm-referenced or criterion-referenced interpretation is being used. For example, a test might be given to identify the talented eighth graders who could benefit from an enriched algebra course next year. Those in the top 20 percent (the top 16 students out of the class of 80) might be selected. Whatever score separates the top 20 percent is the relevant performance standard for this group. This is obviously a norm-referenced interpretation because each student's outcome depended on his or her ranking in the group.

[...] prerequisites. A cutoff score using the [...] be set. This method of setting the cutoff [...] standard and thus qualifies as criterion [...]. [...] bottom section, which statements are [...]?

Figure 2-4. Interpretive Statements Distinguishing Norm-Referenced and Criterion-Referenced Interpretations

Norm-Referenced Interpretations
1. Rico got the highest score in the class.
2. No other 5th grade class in the district has a lower average vocabulary score.
3. Sara's score of 77 is well above the class average of 58.
4. Prince won the "Best of Show" award at the pet show.
5. Ben's percentile rank on the listening test is 35.
6. The average score of the "Writing to Read" students was higher than the average of the other students.
7. My GRE score is 450.

Criterion-Referenced Interpretations
1. Erica can correctly name the capitals of 47 states.
2. Jody has achieved 3 of the 9 science goals.
3. Katie has a perfect score.
4. Billie missed 6 of the 8 items dealing with adding unlike fractions.
5. [...] correctly spelled 93% of the words from this quarter's list.
6. Bert can type 52 words/minute without errors.
7. I got half of the true-false capitalization items right.
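The difference between a norm-referenced cutoff (a fixed share of the group, such as the top 20 percent of a class of 80) and a criterion-referenced cutoff (a fixed standard) can be sketched in Python. This is an illustration under stated assumptions, not a procedure from the text: the two classes are invented, the standard of 90 is a hypothetical value assumed to come from judgments about required algebra prerequisites, and ties at the cutoff are not handled.

```python
def norm_referenced_cutoff(scores, top_fraction):
    """Lowest score within the top `top_fraction` of examinees.

    The cutoff depends on how this particular group performed.
    Ties at the cutoff are ignored, so tied scores could let more
    than the intended number of examinees through.
    """
    n_selected = round(len(scores) * top_fraction)
    ranked = sorted(scores, reverse=True)
    return ranked[n_selected - 1]


def count_at_or_above(scores, cutoff):
    return sum(1 for s in scores if s >= cutoff)


# Hypothetical fixed standard, assumed to come from judgments about the
# algebra prerequisites a student must have mastered.
CRITERION_CUTOFF = 90

# Two invented classes of 80 students with distinct scores.
class_a = list(range(21, 101))  # scores 21..100
class_b = list(range(11, 91))   # scores 11..90

for name, scores in [("A", class_a), ("B", class_b)]:
    cut = norm_referenced_cutoff(scores, 0.20)
    print(name, cut,
          count_at_or_above(scores, cut),
          count_at_or_above(scores, CRITERION_CUTOFF))
```

In both invented classes the norm-referenced rule admits exactly 16 of 80 students, but the cutoff itself moves from 85 to 75 with the group. The fixed standard of 90 stays put, while the number of students meeting it drops from 11 to 1. Knowing which rule produced a cutoff is what tells us how to interpret it.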
SUMMARY PROPOSITIONS
1. Evaluation is an information-gathering process that results in judgments about the quality or worth of a performance, product, process, or activity.
2. The results of formative evaluation are used primarily to monitor learning and improve the instructional process.
3. The results of summative evaluation are used primarily to make final judgments about the extent of learning or the quality of the instructional program.
4. Measures are tools of evaluation that require a quantification of information.
5. Any important outcome of education is necessarily measurable, but not necessarily by means of a paper and pencil test.
6. It is a mistake to believe that qualities cannot be measured.
7. All tests are measures, and all measures are included in the set of qualitative and quantitative techniques of evaluation.
8. The Basic Teaching Model is a conceptual description of the essential ingredients of the teaching process. Its components (instructional objectives, entering behavior, instructional procedures, performance assessment, and feedback loop) represent the general activities one would expect to find among the procedures of successful teachers, regardless of the specific teaching model they employ.
9. The relationship of evaluation activities to the other essential aspects of teaching can be described with the Basic Teaching Model.
10. An evaluation plan describes the methods to be used in an instructional segment to obtain information about entering behavior and information for formative and summative purposes.
11. The measurement of educational achievement is essential to effective formal education.
12. The primary function of a classroom test is to measure student achievement accurately.
13. Classroom tests can help motivate and direct student achievement and can contribute to learning directly.
14. The development of a good classroom test requires the teacher to define the course objectives in specific terms.
15. The fact that educational measurements fail to meet high standards of mathematical soundness does not destroy their educational value.
16. Educational outcomes that are said to be intangible because they are not clearly defined are as difficult to attain through purposeful teaching as they are to measure.
17. The imperfect tests we now use serve us far better than we would be served by the use of qualitative assessments alone.
18. Criterion referenced and norm referenced more precisely describe kinds of test score interpretations than types of tests.
19. Domain referenced and objectives referenced, both types of criterion-referenced interpretations, are applied in situations where the test content is either a sample of interest or the entire universe of interest, respectively.
20. Norm-referenced interpretations involve comparing one person's score with the scores of other individuals, but treatment-referenced interpretations compare the score of one group with the scores of other groups.
21. Criterion-referenced interpretations involve comparing one person's score with a set of absolute performance standards.
22. When a cutoff score is used, the underlying interpretation may be either absolute or relative, depending on the method used to establish the cutoff score.
QUESTIONS FOR STUDY AND DISCUSSION
1. How is the process of evaluation different from the process of measuring?
2. In what ways do formative and summative evaluation often take place in employee performance appraisal in various work settings?
3. What are some important educational outcomes that seemingly cannot be measured by any available means? What is the basis for establishing the importance of these outcomes?
4. Thinking back to a course you have taken recently, how were the components of the BTM evidenced in the teacher's behaviors? Which components seemed to have been missing, if any?
5. In which component of the BTM does the evaluation planning process probably best fit?
6. For what reasons might the results of daily assignments be better categorized as formative rather than summative evaluation? What are the implications of the distinction?
7. If the use of paper and pencil tests were abolished at all educational levels, what might some of the important direct and indirect consequences be for students, teachers, and others?
8. What kinds of group-referenced interpretations do teachers and administrators make most frequently?
9. Under what circumstances might we be interested in knowing only whether a specific examinee scored above or below the average score of a certain group?
10. Why might it be difficult to make useful norm-referenced interpretations with scores from a test designed to provide criterion-referenced interpretations?
11. What are some practical examples of the use of cutoff scores that do not provide content-related interpretations, that is, indications of what examinees can do?
Measuring Important Achievements
THE COGNITIVE OUTCOMES OF EDUCATION
If we look at what actually goes on in our school and college classrooms, labs, libraries, and lecture halls, it is reasonable to conclude that the major goal of education is to develop in students a command of substantive knowledge. Achievement of this kind of cognitive mastery is certainly not the only concern of educators, parents, and students, but it is the central concern. What is this important knowledge, and how does it relate to understanding, thinking, and performing? We need answers to these questions so that we can decide which achievements our educational tests should measure.
Knowledge Versus Information
Knowledge originates in information that can be received directly from observation or indirectly from reports of observations. Anything we hear, read, smell, or otherwise experience can become part of our knowledge. If it is remembered, it does become knowledge. But if it is only remembered, without being thought about, it remains mere information, the most elementary and least useful form of knowledge. If, on the other hand, information becomes the subject of our reflective thought, if we ask ourselves, "What does it mean?" "How do we know?" "Why is it so?", we may come to understand the information. It can be integrated into a system of relations among concepts and ideas, all of which constitute a structure of knowledge. This process of encoding is essential to enable later retrieval; observations that are not encoded in some way cannot be recalled.
Information that is stored in our memory by semantic encoding, that is, by associating its meaning with information already stored, is permanent, useful, and satisfying relative to information stored through episodic encoding (Anderson, 1983). In the latter case, information is stored by associating it with other information related to our personal experience. Telephone numbers, what we wore two days ago, and what we plan to do next weekend are examples of information that is episodic. We may not remember that we learned the meaning of "prognosticator" in seventh grade (episodic encoding), but we likely still remember what it means (semantic encoding). Information that has been assimilated into an existing structure of knowledge is likely to be a more [...] possession than information that is simply remembered [...] (Boulding, [...]).
The source of our verbal knowledge [...] minds in the form of [...] (Polanyi, [...]) and [...] is a purely private possession. [...] But [...] abstracted from these images and expressed in words, [...] information [...] can be communicated, it can be recorded and stored for future reference, and it can be manipulated in the process of [...]. Verbal knowledge is a very powerful form of knowledge. The peculiar [...] of humans among all other earthly creatures is their ability to produce and use verbal knowledge.
If a structure of verbal knowledge consists entirely of a system of articulated relations among concepts and ideas, can it be described completely by listing the elements (propositions) that compose it? Might not a complex structure involve relations or dimensions [...] that are not expressed by the constituent elements of the structure? [...] a [...] of the elements of a structure may lack some that have not been perceived [...] or expressed in words. But [...] of such an unperceived element, [...] and express it. It could then be [...] added to the list. The [...] that a structure of verbal knowledge can be described by listing the propositions [...] appears to be logical. The whole [...] in this case appears to be precisely equal to the sum of all its parts.
Propositions Represent Knowledge
If the primary goal of education is to help students build and use structures of verbal knowledge, it follows that tests designed to measure achievement should be composed [...] to determine the [...] which students possess these structures of verbal knowledge. Two ideas [...] and Nagel (1934) about propositions are particularly germane in this context: (1) knowledge is of propositions and (2) a proposition is a statement that can be said to be true or false. (Our use of the term [...] is not limited to the basic "if-then" statements used in logical analysis in the [...].) Propositions are expressed in sentences, but not all sentences are propositions. Those expressing questions or commands cannot be said to be true or false; nor can those that report purely [...] wishes or feelings. Propositions are always declarative statements about objects or events in the external world, for example,
The earth is a planet in the solar system.
A body immersed in a fluid is buoyed up by a force equal to the weight of the fluid displaced.
As we consume or acquire additional units of any commodity, the satisfaction derived from each additional installment tends to diminish.
William J. Bryan failed in his bid for election to the presidency of the U.S. in the campaign of [...].
Rain fell in New York on December 1, [...].
The cost of living in Canada increased by two-fifths of a point during October, [...].
[...] mentioned on page 136 of Educational Measurement, edited by E. F. Lindquist [...].
[...] tests are based on propositions such as these, but [...] test items. To be suitable, propositions need to meet [...]:
1. They must be concise, worded as accurately and unambiguously as the precision of knowledge and language permits.
2. They must be true, as established by a preponderance of experts in the field.
3. They must be worthy of remembering, as judged by experts in the field.
4. They must represent knowledge unique to the field, that is, principles and concepts not generally known by those who have not studied the subject matter.
[...] propositions that meet these standards in [...] about the value of study in that field. [...] difficult to prepare, it may be because [...] is too ill defined or because the [...] item [...]. [...] can only represent the verbal knowledge [...]. Physical skills or affective outcomes that we may want students to acquire cannot be represented by propositions.
Performance Requires Knowledge
Our concern here is with the [...] education. The term cognitive ability [...] whatever particular kind of task [...] general mental ability, general numerical [...] are examples of generalized abilities [...] cognitive ability as used here. Here are [...] ability:

Ability to trace the route of the pilgrim voyage
Ability to calculate the square root of a number
Ability to outline the economic theories of J. M. Keynes
Ability to trace the circulation of blood
Ability to describe the origins of the Industrial Revolution
Ability to identify the parts of a flower by name
Ability to describe a method for removing tarnish from copper
These abilities indicate what a person can do. They require applications of knowledge to perform specific tasks or to answer particular questions. They can be taught specifically and are learned specifically.
Most written tests used to measure school achievement, professional capabilities, or qualifications for effective performance on the job [...] of specific cognitive abilities like those listed above. To acquire any such ability, a person must learn how to do it. To perform a cognitive task, one must know how to do it. The basis of any cognitive ability [...] and practice may develop and perfect the ability, [...] the tasks more efficiently and accurately. But the [...] person know how the task is to be [...] the ability to do something the person [...] reasonable. Knowledge is the key.
It is sometimes said that individuals possess knowledge they do not know how to use. They may indeed. Based on this supposition, the inference is sometimes drawn that knowledge alone is not enough; something more is needed. But such an inference is open to question. It may be that the individual lacks sufficient knowledge of the right kind. Or those who cannot apply knowledge they possess may simply lack the knowledge of how to apply it. The problem may not be the inadequacy of knowledge per se, but inadequacies in the specific knowledge possessed.
The contribution of knowledge to effective human behavior is sometimes questioned. Knowledge alone is not enough, says the businessman. It does not guarantee financial success. Knowledge alone is not enough, says the college president. It does not guarantee scholarly achievement. Knowledge alone is not enough, says the religious leader. It does not guarantee virtue. Knowledge alone is not enough, says the philosopher. It does not guarantee wisdom.
They are all right, of course. Knowledge alone is not enough. But in our complex world of chance and change, no one thing or combination of things will ever be enough to guarantee financial success or scholarly achievement or virtue or wisdom. Although this is true, few would deny that the command of substantive knowledge does contribute greatly to the attainment of these and other ultimate goals.
Some have argued that knowing how does not always require knowing that (Ryle, 1949). But are the two really so distinct and unrelated? For cognitive tasks, would not a sufficient amount of relevant knowing that enable a person to know how? If you know that to find the quotient of two common fractions you must invert the divisor and multiply, and all that those words mean, do you not know how to divide common fractions? In general, if we wish to teach someone how to do something, is there any better way than to teach them that this, this, and this must be done?
Surely, knowing is not the same as doing. If the doing involves physical manipulation, it may require psychomotor skills that knowing cannot supply. Even in the realm of pure mental tasks, practice may increase facility. But facility aside, can doing any mental task require more than knowing perfectly well how to do it? If so, what is that "something more"? The best way to prepare learners to complete a cognitive task is to help them acquire the knowledge of how to complete it. The basis of that knowledge is necessarily verbal knowledge. Given sufficient motivation to attempt to complete a task, sufficient verbal knowledge about how to complete it should enable learners to do so successfully.
Knowledge, Thinking, and Understanding
Thinking, understanding, and performing are among the significant goals of education, but none of these behaviors can be produced or nurtured without a substantive knowledge base. Thinking is a process and knowledge is a product, but the two are intimately related (Aaron, 1971). New knowledge cannot be produced internally or used without thinking, and thinking always involves knowledge. Thought processes are wholly dependent on the knowledge being processed. Knowing how to think can be distinguished from knowing what is so but cannot be separated from it. Acquiring knowledge and learning how to think thus would seem to be interdependent goals. To say that schools should teach students how to think instead of teaching them knowledge is to urge the impossible. In sum, the best way to teach people how to think is to help them acquire useful knowledge; the ability to think is necessarily dependent on having something to think about.
To assimilate new information, learners must incorporate it into their own structure of knowledge. They must relate it to what they already know. Relating is understanding. Thunder is understood better when it is related to lightning. Fermentation is understood better when it is related to bacteria. In general, the understanding of any separate thing involves seeing its relations to other known things. And knowledge that is understood is more useful than knowledge that is only information.
Teachers can give pupils information. But they cannot give them understanding, for a person's understanding is a private, personal possession created by the one who seeks it. We earn for ourselves the right to say "I understand." How much we know about a subject depends not only on how much information
we have obtained from others or from [...] much we have thought about that information [...] other elements of information we [...] study. We ask students to study because [...] to think about relationships between [...] learn. New information that can be [...] means will be remembered and [...] structure of knowledge with superficial [...] but it may be remembered. Learning [...] understanding are correctly perceived [...]
To be understood, information must become part of a coherent structure of knowledge. When the occasion for its use arises, we must [...] to remember it and see its relevance. When all this is true, we can say we have command of the [...] the bad name [...] rote learning [...] and not enough emphasis on command. [...] much emphasis on possession [...] we actually could understand; the cost [...] knowledge would be worth to us. No doubt [...] provides one of the greatest challenges to [...] student learning. How do we increase the [...] or decrease the "cost" of learning it?
Describing Cognitive Outcomes
The terms that some educators have used to identify or describe achievement are more impressionistic than [...] Nearly all important aspects of achievement (knowledge or abilities) can be described by the type of behavior required to demonstrate attainment of the achievement. Nearly every test item on a good classroom achievement test can be classified using one of these seven categories:
1. Understanding of terminology (or vocabulary)
2. Understanding of fact or principle (or generalization)
3. Ability to explain or illustrate (understand relationships)
4. Ability to calculate (numerical problems)
5. Ability to predict (what is likely to happen under specified conditions)
6. Ability to recommend appropriate action (in some specific, practical problem situation)
7. Ability to make an evaluation judgment
The usefulness of these categories in the classification of items testing various aspects of achievement depends on the fact that they are defined mainly in terms of overt behavior requirements, rather than in terms of presumed mental processes that may be required for successful response. Items belonging to the first category always designate a term to be defined or otherwise identified. Items dealing with facts and principles are based on descriptive statements of the way things are. If the question asks who? what? when? or where?, it tests a person's factual information. Items testing explanations usually involve the words why or because, while items belonging to the fourth category require the student to use mathematical processes to get from given information to the required quantities. Items that belong in either category 5 or 6 are based on descriptions of specific situations. "Prediction" items specify all the conditions and ask for the future result; "action" items specify some of the conditions and ask what other conditions (or actions) will lead to a specified result. In "judgment" items, the response options are statements whose appropriateness or quality is to be judged on the basis of criteria specified in the item itself.
The fundamental concern of test developers is the process of translating the relevant structure of knowledge into tasks (test items) that require a demonstration of the knowledge and abilities of that specific structure. To do so requires that the elements of the structure be identified so that test items can be written based on them. These elements can be represented in a variety of ways (propositions, instructional objectives, or goal statements) and with varying levels of specificity. To the extent that we are able to dissect the knowledge structure and describe its components precisely, the measurements of achievement that result will be most useful and most meaningful in describing the cognitive outcomes of education.
USING INSTRUCTIONAL OBJECTIVES
The knowledge and understanding on which the instructional efforts in our schools are focused is the same knowledge and understanding that tests of achievement ought to measure. The specific knowledge we expect students to learn is represented in the Instructional Objectives component of the Basic Teaching Model described in Chapter 2. The teacher's job is to define the structures of knowledge, the concepts and relationships that should form the basis of instruction. Statements of instructional objectives can be useful for instructional planning, for promoting intentional learning, and for developing tools for performance assessment. What are instructional objectives [...] from? How can they be used to enhance our evaluation [...] efforts?
The Derivation of Instructional Objectives
Instructional objectives are statements that describe the abilities students should be able to display to demonstrate that important concepts and principles have been incorporated into their own structures of knowledge. They indicate what the learner should be able to do at the end of an instructional sequence. Because the development of cognitive abilities ought to be the primary concern of our schools, the delineation of these important abilities is [...] matter. Particularly at the elementary and secondary school levels, [...] deciding what students should learn, what they should know, should not be left to the classroom teacher alone. Most purposeful formal learning is organized in the context of a curriculum defined in terms of grade levels and subject matters. For example, the instructional objectives of a seventh-grade mathematics class must fit into the entire organizational plan; they should not be decided solely by the personal preferences, interests, or capabilities of each different seventh-grade mathematics teacher in the school district.
The derivation of instructional objectives is outlined in Figure 3-1. The pyramid r[…] that indicate the purpose of the inst[…] general statements that are the found[…] teachers, school board members, and […]cational goals of the schools should b[…]s goal, level objectives are prepared for each […]vel objective for grade 7 might be: "To use […]ocesses to solve problems encountered in […]." The level objective for the seventh grade, one of several, suggests the need for a mathematics course. The purpose of the course is to address all the level objectives related to mathematics co[…] needed to define in more detail the […] course objective might be to "compute wi[…] word problems." Once course objectives […]ers must organize them logically and s[…] in the formation of instructional units […] of the abilities students should attain […].
Figure 3-1. The Source of Instructional Objectives: The Pyramid Effect
The pyramid illustrates that instructional objectives are derived from a few broad educational goals through successive stages in hierarchical fashion. Each stage yields more statements, collectively, than the prior stage, and the statements generated at any one stage are more precise than those in the prior stage in indicating the nature of the ability to be achieved. In fact, the writing of instructional objectives can become a seemingly endless task if the writer attempts to separate cognitive abilities into increasingly finer components.
Stating Instructional Objectives
In contrast to educational goals and level objectives, instructional objectives should be prepared primarily by those who will do the teaching. The statements should be written in a form and at a level of specificity that will make them most useful for their intended purposes. Objectives that have been prepared to guide instructional planning or to communicate intended learning outcomes to students can also be used for evaluation planning and test development. For example, the cognitive abilities indicated by instructional objectives are prescriptive of the type of evaluation tool to use (observation, objective test, research paper, essay, or project) to assess achievement. And when a test seems most appropriate, each objective suggests the most appropriate type of test item (essay, multiple choice, problem type). The nature of the objectives also may suggest how frequently evaluation should occur and, perhaps, how much formative evaluation is needed.
Though there is general agreement among educators about the value and the role of instructional objectives, there remains little agreement about how such statements should be prepared. Most, however, will agree that explicit statements are more helpful than implicit statements, no matter how the objectives are intended to be used. Explicit statements contain a verb that indicates in operational, behavioral, or observable terms what the learner must do to demonstrate attainment of the objective. Examples of such verbs are listed in Figure 3-2. Contrast them with the verbs from implicit statements. It is not possible to observe directly when someone knows, thinks about, or comprehends something, but we can observe someone explaining, developing, and defining.
The approach to developing objectives recommended by Gronlund and Linn (1990) incorporates both implicit statements (what they call general learning outcomes) and explicit statements (what they call specific learning outcomes). Their method probably parallels the thought processes most of us would use to develop separate (explicit) instructional objectives. For example, "Knows where to use commas in writing" is a general outcome because it has an implicit verb in it. Some specific learning outcomes can be developed that indicate the kinds of behavior we are willing to accept as evidence for attainment of the general learning outcome. Here are some examples: separates names of city and state, sets off introductory clauses, separates quotation from rest of sentence, and ends complimentary close of a letter. Of course, it is the specific outcomes that are most useful for evaluation instrument development. For which purposes might […]?
[…]r preparing instructional objectives was […] individualized instruction, but […] behavioral (explicit) terms is widely applicable.
How do propositions relate to instructional objectives? Is it necessary or useful for teachers to have both? The anatomy of an instructional objective consists of a content portion, the underlying proposition, and […] the verb. Consider this sample objective: […] how frequent exercise can contribute to […]. The […]sition is that exercise does enhance efficien[…] circulatory systems. The learner should […] merely recognize (1) that it happens, (2) that it affects lung capacity or heart strength, or (3) that aerobics are particularly useful for this purpose. Thus propositions are an essential ingredient in instructional objectives (an expression of the relevant content), but the two are not one and the same. Propositions are statements of content knowledge; instructional objectives are statements of performance with respect to content. The sample instructional objectives shown in Appendix C are based on some of the propositions listed in Appendix B. Consider how some of the propositions could be translated differently into instructional objectives, depending on how the learner is expected to "operate" on that content.

Figure 3-2. Verbs that Distinguish Explicit and Implicit Statements of Instructional Objectives
Explicit, Behavioral, Observable: identify, explain, describe, rearrange, summarize, select, develop, predict, differentiate, define, compare, write
Implicit, Non-Behavioral, Inferential: know, consider, understand, enjoy, discuss, realize, remember, judge, perceive, think about, comprehend, imagine
How instructional objectives should be prepared for a specific situation may be dictated by the teaching model adopted. For example, individualized approaches to instruction (Bloom, 1968; Glaser, 1968; Keller, 1968) require explicit statements of objectives to define and organize the curriculum, to plan instructional activities, to monitor learner progress, and to advance the learner through the curriculum. Domain-referenced or objectives-referenced tests are essential measures of achievement in these teaching models. Regardless of the teaching model, instructional objectives can be useful to test constructors as guides to determining the nature of test content and the differential emphasis of topics within a test. Highly specific statements may even be useful in suggesting particular questions or types of questions to ask.
Taxonomies of Educational Achievements
A number of educators have devoted considerable effort to reducing the ambiguity associated with stating instructional objectives and translating these objectives into relevant test items. In doing so, some have divided learning outcomes into three nonoverlapping domains: cognitive, affective, and psychomotor. The first taxonomy of educational objectives stemming from this work, The Cognitive Domain: Handbook I (Bloom and others, 1956), commonly called "Bloom's Taxonomy," provides six categories for classifying cognitive behaviors: knowledge, comprehension, application, analysis, synthesis, and evaluation. The categories are intended to be hierarchical in terms of the intellectual demand required of the learner. That is, knowledge, the remembering of information, is less demanding than comprehension, the relating of concepts or the translation of ideas from one form to another. Evaluation, the most demanding, requires judgments using criteria remembered or formulated by the learner. Each major category is further subdivided, and test items are presented in the handbook to illustrate how achievement can be measured at each taxonomic level.
The taxonomy for the cognitive domain has received the most attention from test constructors because it has been available the longest and because it describes the kinds of abilities test constructors are most interested in measuring. A major contribution of this taxonomy has been the awareness it has created regarding the intellectual level at which instructional objectives and test items are written. That is, teachers who may have written most of their objectives to require simple remembering or recall of information have come to realize that they actually intended for students to understand and apply knowledge. By using the taxonomy to classify objectives, teachers can reflect more readily on whether their expectations are appropriate.
Though the cognitive taxonomy can be somewhat useful for classifying […]
[Sample test item, partly illegible in this copy: "1. Why is a fusible alloy better than its constituents for use in automatic fire sprinkling systems?" The answer options cannot be recovered.]
Figure 3-3. Comparison of Classification Systems of Bloom, Ebel, and Gagné
Bloom's Taxonomy: A. Knowledge; B. Comprehension; C. Application; D. Analysis; E. Synthesis; F. Evaluation
Ebel's Relevance Guide: Terminology; Factual Information; Explanation; Calculation; Prediction; Recommended Action; Evaluation
Gagné's Learning Outcomes: Verbal Information; Intellectual Skills; Cognitive Strategies; Attitudes; Motor Skills
[…] use some categories from each of these systems to achieve some special purpose? (Choose one of the classification systems and use it to classify the sample objectives provided in Appendix C. Compare your results with those of another member of your class to see how well you agreed.)
Interest in promoting critical or higher-order thinking skills (HOTS) has led several educators to try to develop a taxonomy of thinking skills. But there has been little agreement about what categories should be included or even whether such a classification system would be helpful. The kind of mental activity that most of us would consider higher thinking can be described by rows B through F of Figure 3-3. These are "beyond the knowledge or recall level," which itself is a useful way to describe HOTS. As we have noted earlier, thinking can occur only when there is something to think about: new information or an existing knowledge structure. Even learning how to think requires "how to" knowledge. Consequently, an independent thinking curriculum seems to be illogical and unnecessary as long as "the use of verbal knowledge" is a prominent strand in each aspect of the school curriculum.
SUMMARY PROPOSITIONS
1. A major goal of education is to develop in the students a command of substantive knowledge.
2. Knowledge is information that has been integrated into a structure of relations between ideas.
3. A structure of verbal knowledge can be described by listing the concepts and relationships of which it is composed.
4. All verbal knowledge can be expressed in propositions.
5. Propositions provide the basis for most good objective achievement-test items.
6. Thinking necessarily produces knowledge, but knowledge must be present for a person to have something to think about.
7. The prerequisites for performing a cognitive task […]
8. […] statements of objectives derived from a set of educational goals. […] to an already existent structure of knowledge.
9. Nearly all questions that ask who? what? when? or where? are properly classified as information questions.
10. Items intended to test various aspects of achievement ordinarily can be classified […] the basis of overt item characteristics […] basis of the mental processes […] measure.
11. The taxonomy […] more useful […] than for evaluat[…]
[…] categories of Bloom's Taxonomy (cognitive […]
QUESTIONS FOR STUDY AND DISCUSSION
1. What does it mean to have command of knowledge?
2. What advantages does a person have who can understand […] rather than simply know some[…]?
3. […] propositions […] of greater interest than false propositions […] to those who teach […]?
4. […] outcomes from the affective or […]
5. What are some of the criteria high school learners […] use to decide that their understanding of a particular idea (for example, use of metaphor) is as deep as […] to be?
6. What do we mean when we say students should […] they care for it […] they read […]?
7. What features distinguish educational […] a two-category system with the labels […]
8. […] substantially in […] instructional […] a new taxonomy for categorizing cognitive learning outcomes […] so that your system […]
9. What factors often cause separate judges to disagree […]
Describing and Summarizing Measurement Results
There is a variety of statistical concepts and techniques that enable test users to interpret scores and that assist test developers in assessing the quality of their instruments. In particular, many of the methods of making norm-referenced score interpretations depend on the mean and standard deviation of the scores of the norm group. And when a test has been given, statistical procedures help to summarize and describe the performance of the class. These same statistical methods provide information about the effectiveness of the test instrument in providing norm-referenced or criterion-referenced interpretations, whichever the test developer had in mind. The purpose of this chapter is not to duplicate the content of a good statistics text, but to describe statistical ideas that form a foundation for the measurement concepts to be considered in subsequent chapters.
FREQUENCY DISTRIBUTIONS
A frequency distribution is a two-column list that describes a set of scores in a concise and systematic manner. One column lists all possible scores in the set from highest to lowest, and the other column, the frequency column, shows the number of examinees who obtained each score. Table 4-1 shows the scores of a class of 30 students on a spelling test of 25 words. A frequency distribution for these scores is shown in Table 4-2. This distribution of scores is a useful visual aid for identifying the relative position in the group of any one student and for obtaining a picture of overall group performance at a glance.
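The tallying just described can be sketched in a few lines of Python. The score list is an assumption: it holds the 30 spelling scores of Table 4-1, with the two entries that are illegible in this copy taken to be 20 and 24, the values implied by the cumulative frequencies in Table 4-6. The function name `frequency_distribution` is hypothetical.

```python
from collections import Counter

# Assumption: the 30 spelling scores of Table 4-1, with the two entries that
# are illegible in this copy taken to be 20 and 24.
SPELLING_SCORES = [
    16, 19, 20, 17, 21, 13, 18, 20, 16,
    20, 17, 18, 21, 14, 19, 22, 18, 21,
    20, 17, 23, 23, 19, 22, 21, 19, 21, 15,
    20, 24,
]


def frequency_distribution(scores):
    """List (score, frequency) pairs from the highest score to the lowest,
    including in-between scores that nobody obtained (frequency 0)."""
    counts = Counter(scores)  # a Counter returns 0 for scores never seen
    return [(x, counts[x]) for x in range(max(scores), min(scores) - 1, -1)]


if __name__ == "__main__":
    print("Score  Frequency")
    for score, freq in frequency_distribution(SPELLING_SCORES):
        print(f"{score:5d}  {freq:9d}")
```

Keeping zero-frequency rows matters: the definition above calls for all possible scores between the extremes, not just the scores someone happened to obtain.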
Table 4-1. Scores of 30 Students on a 25-Item Spelling Test
Aaron
Barbara
Barry
Ben
Brent
Camille
Donald
Doreen
Earl
Faith
16
19
20
17
21
13
18
20
16
Franco
Gary
Gaea
Helen
Jack
Jeff
Jerry
Joanne
Kelly
Ken
20
17
18
21
14
19
22
18
21
Kim
Lori
Marcia
Marcy
Nathen
Patrice
Richard
Scott
Travis
Wendy
20
17
23
23
19
22
21
19
21
15
Frequency Polygons and Histograms
The information summarized […] represented pictorially by a frequency […] polygon is also known as a line graph. […] polygon using the scores of the 30 examinees on the spelling test. The score scale is depicted along the […] is shown on the vertical line. An alternative representation is the histogram or bar graph, shown in Figure 4-2.
Frequency polygons and histograms are equally useful for describing a set of test scores efficiently. Detailed procedures for constructing both types of graphs can be found in most introductory statistics textbooks.
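When no graphing tool is at hand, a rough text histogram can stand in for a bar graph like Figure 4-2. This is a minimal sketch: the helper name `text_histogram` and the small score list are hypothetical, and the bars run sideways (one row per score, one mark per examinee) rather than upward.

```python
from collections import Counter


def text_histogram(scores, mark="*"):
    """Return one text row per score, highest score first; the length of
    each bar equals the frequency of that score."""
    counts = Counter(scores)
    rows = []
    for x in range(max(scores), min(scores) - 1, -1):
        rows.append(f"{x:3d} | {mark * counts[x]}")
    return rows


if __name__ == "__main__":
    # Hypothetical scores, for illustration only.
    for row in text_histogram([5, 4, 4, 3, 3, 3, 2, 2, 1]):
        print(row)
```

Rotating the output a quarter turn gives the familiar vertical bars; joining the tops of the bars gives a frequency polygon.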
Characteristics of Frequency Polygons
Frequency polygons come in all shapes and sizes, as evidenced by the variety shown in Figure 4-3. These curves are frequency polygons like the one in Figure 4-1, except that they have been smoothed. That is, the jagged lines have been replaced by a smooth curved line, and the vertical line used to determine frequencies has been omitted. Such modifications usually indicate that the polygon does not represent any one set of data precisely but depicts a general distribution having certain prominent characteristics.

Table 4-2. Frequency Distribution of 30 Spelling Scores
Score   Frequency
 24         1
 23         2
 22         2
 21         5
 20         5
 19         4
 18         3
 17         3
 16         2
 15         1
 14         1
 13         1

Figure 4-1. Sample Frequency Polygon
Often there is considerable economy in describing or sketching the frequency polygon for a set of test scores rather than enumerating each score or even preparing a frequency distribution. But to communicate such a general picture, one must know and understand the characteristics that distinguish frequency polygons from one another. Four of these important characteristics are illustrated in Figure 4-3 […] subsequent sections. […]

Figure 4-2. Sample Histogram

Figure 4-3. Frequency Polygons Illustrating Varying Characteristics
Modality is another distinguishing characteristic. The mode of a score distribution is the most frequently occurring score. A curve is unimodal if it has one mode, bimodal if it has two modes, and multimodal if it has many modes. When a frequency polygon has more than one peak, we must look to the tallest to describe its modality. Part (c) of Figure 4-3 shows a unimodal curve and a bimodal curve. Note that the curve with four peaks has two that are taller than the others, and those two are equally tall. How should the modality of the polygon in Figure 4-1 be described?
The last row of Figure 4-3 illustrates the kurtosis property of frequency polygons. Kurtosis relates to the relative flatness or peakedness of the curve. The names describing these curves (platykurtic, mesokurtic, and leptokurtic) can be remembered by associating the prefix of the term (platy, meso, lepto) with a visual image of the shape of the curve.
To test your understanding of the properties of frequency polygons and their interrelationships, try to draw a figure to verify that each of the following statements is true:
1. Not all skewed distributions are unimodal.
2. Some leptokurtic distributions are not symmetric.
3. A rectangular distribution is multimodal.
4. Not all bimodal distributions are symmetric.
5. A single distribution can be symmetric, unimodal, and mesokurtic.
6. Some platykurtic distributions are skewed.
DESCRIBING SCORE DISTRIBUTIONS
Central Tendency
The mode was defined previously in considering modality as a property of a frequency polygon. It is the most frequently occurring score, and it may have more than one value (as in Figure 4-1). The median is the score above which and below which exactly half of the scores are found: the middlemost score. For the scores 5, 4, 3, 2, 1, the median is 3; for the scores 9, 7, 5, 2, the median is 6, halfway between 7 and 5. In simple cases like these, if the distribution contains an even number of scores, the median is the average of the two middle scores.
Thus the mode must be a score actually obtained by an examinee, but the median need not be. When the score distribution […] with tied scores in the vicinity of the […] slightly more complicated. Table 4-3 […] median. Can you infer from these examples […] the median also is the 50th percentile, […] chapter for computing percentiles should be used for finding the median.
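The tied-score rule behind Table 4-3 can be sketched as below. The rule itself is an assumption reconstructed from the four worked examples, not a procedure quoted from the text: each integer score X is treated as spanning X − 0.5 to X + 0.5, and the median is found by interpolating into the interval where the halfway point falls; if exactly half the scores lie at or below some score, the median is taken halfway between that score and the next one obtained. The function name `interpolated_median` is hypothetical.

```python
from collections import Counter


def interpolated_median(scores):
    """Median of integer scores, interpolating within tied scores.

    Each score x is treated as occupying the interval x - 0.5 to x + 0.5.
    If exactly half the scores lie at or below some score x, the median is
    taken halfway between x and the next higher score that occurs.
    """
    counts = Counter(scores)
    values = sorted(counts)
    half = len(scores) / 2
    below = 0  # number of scores below the current value
    for i, x in enumerate(values):
        f = counts[x]
        if below + f > half:
            # The halfway point falls inside the interval for score x.
            return (x - 0.5) + (half - below) / f
        if below + f == half:
            # Exactly half the scores are at or below x.
            return (x + values[i + 1]) / 2
        below += f
    raise ValueError("scores must be a non-empty list")


if __name__ == "__main__":
    for data in ([1, 2, 8, 8, 9, 10], [1, 4, 6, 9, 9, 10],
                 [1, 2, 6, 6, 6, 10], [1, 2, 2, 2, 3, 10]):
        print(data, round(interpolated_median(data), 1))
```

Under this rule the simple examples above still give 3 and 6, and the four Table 4-3 sets give 8.0, 7.5, 5.8, and 2.2.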
The mean is the average score, obtained by summing all the scores and dividing that sum by the total number of scores. For the scores 5, 4, 3, 2, 1, the sum is 15 and the mean is 3. These operations are represented by the formula

$$\bar{X} = \frac{\sum X}{n} = \frac{15}{5} = 3 \qquad (4.1)$$

where $\bar{X}$ is the mean, $\sum X$ is the sum of the scores, and $n$ is the number of test scores.
Ordinarily, the median is easier to calculate than the mean, especially when the number of scores is small. If the score distribution is skewed, the median usually gives a more reasonable indication of the typical score than does the mean. Consider, for example, the set of scores 9, 9, 10, 11, 22. What are the values of the median and mean? Notice that the median is more indicative of the "typical" score and that the magnitude of the mean is pulled in the direction of the extreme score. Note also that four of the five scores are below the mean, but only two are below the median. If the 22 were changed to a score of 42, how would the median and mean each be affected?
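Equation 4.1 and the comparison above can be checked with a short sketch. The scores are the five from the text as read in this copy (9, 9, 10, 11, 22); running it answers the two questions just posed, so try them by hand first. The helper `mean` is written out to mirror equation 4.1, while the median comes from the standard library's `statistics.median`.

```python
import statistics


def mean(scores):
    """Equation 4.1: the sum of the scores divided by the number of scores."""
    return sum(scores) / len(scores)


scores = [9, 9, 10, 11, 22]
changed = [9, 9, 10, 11, 42]  # the 22 replaced by 42

for label, data in (("original", scores), ("22 -> 42", changed)):
    m, md = mean(data), statistics.median(data)
    below_mean = sum(x < m for x in data)
    below_median = sum(x < md for x in data)
    print(f"{label}: mean={m:.1f}, median={md}, "
          f"below mean={below_mean}, below median={below_median}")
```

Only the mean moves when the extreme score is made more extreme, which is the sense in which the median better reflects the typical score in a skewed distribution.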
For a variety of reasons, the mean generally is regarded by statisticians as a more precise and useful measure of central tendency than the median. However, we will find the median very useful for certain test-evaluation and test-score interpretation purposes.
So why are there three different ways to indicate central tendency? Why not just use the mean and forget the rest? The mode is easy to determine but not always unique; there may be two or more modes in a score distribution. The median generally is easier to determine than the mean, and in a very skewed distribution, the median is more like the typical score. Different situations suggest a need for one measure rather than the other, but in many cases it matters little which is used. For what kind of score distribution would the mean and median have the same value? Is there any circumstance in which all three measures are equal?
Variability
The range of a distribution is a number that indicates how many score points the distribution covers. For the scores 9, 7, 2, 2, 2, the range is 8. Note that this is one more than the difference between the highest and lowest scores. The range is a relatively gross indicator of the amount of dispersion in a set of scores because its value depends on only two scores, the most extreme scores in the entire distribution. The set of scores 10, 9, 9, 9, 3 also has a range of 8, but notice how different these two sets are in variability.

Table 4-3. Examples of Computing the Median
A. 1, 2, 8, 8, 9, 10    Md = 7.5 + 1/2 = 8.0
B. 1, 4, 6, 9, 9, 10    Md = 7.5 + 0 = 7.5
C. 1, 2, 6, 6, 6, 10    Md = 5.5 + 1/3 = 5.8
D. 1, 2, 2, 2, 3, 10    Md = 1.5 + 2/3 = 2.2
The most common and useful measure of variability is the standard deviation. A conceptual understanding of the standard deviation can be gained by learning how this statistic is computed. Calculating the standard deviation involves four steps.
1. Compute each person's deviation score by subtracting the mean from each person's test score.
2. Square each deviation score (multiply each deviation score by itself) and sum all of the squared deviation scores.
3. Divide this sum by the number of test scores.³ This yields a quantity called the variance.
4. Find the square root of the variance. This value is the standard deviation. (Remember to verify that your answer makes sense, that it is not logically larger or smaller than it should be.)
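These steps can be represented by this formula for finding the standard deviation:

$$s = \sqrt{\frac{\sum (X - \bar{X})^2}{n}} \qquad (4.2)$$

where $s$ is the standard deviation, $\sum$ is the symbol meaning "the sum of," $X$ is an individual's test score, and $\bar{X}$ and $n$ are the mean and the number of scores, as in equation 4.1.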
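The four steps translate directly into code. This sketch follows equation 4.2, dividing by n; the standard library's `statistics.pstdev` uses the same divisor, while `statistics.stdev` divides by n − 1, the distinction raised in footnote 3. The second helper uses an algebraically equivalent computational form, s = √(nΣX² − (ΣX)²)/n, which saves computing every deviation score. Both function names are hypothetical.

```python
import math
import statistics


def standard_deviation(scores):
    """Standard deviation by the four steps of equation 4.2 (divisor n)."""
    n = len(scores)
    mean = sum(scores) / n
    deviations = [x - mean for x in scores]   # step 1: deviation scores
    sum_sq = sum(d * d for d in deviations)   # step 2: sum of squared deviations
    variance = sum_sq / n                     # step 3: divide by n -> variance
    return math.sqrt(variance)                # step 4: square root


def standard_deviation_computational(scores):
    """Equivalent form: sqrt(n * sum(X^2) - (sum X)^2) / n."""
    n = len(scores)
    total = sum(scores)
    total_sq = sum(x * x for x in scores)
    return math.sqrt(n * total_sq - total * total) / n


scores = [5, 4, 3, 2, 1]
print(round(standard_deviation(scores), 2))                # 1.41
print(round(standard_deviation_computational(scores), 2))  # 1.41
print(round(statistics.pstdev(scores), 2))                 # 1.41 (divisor n)
print(round(statistics.stdev(scores), 2))                  # 1.58 (divisor n - 1)
```

The last line shows the effect described in the footnote: the n − 1 divisor gives a slightly larger value.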
The calculation of the standard deviation is illustrated in Table 4-4 using the scores 5, 4, 3, 2, and 1. The scores are listed in column 1, and their sum is used to determine the mean. The deviation scores are calculated and listed in […]

Table 4-4. Calculating the Standard Deviation
(1)     (2)         (3)
X       X − X̄       (X − X̄)²
5        2           4
4        1           1
3        0           0
2       −1           1
1       −2           4
Sum 15              10

$$s = \sqrt{\frac{10}{5}} = \sqrt{2} = 1.41$$
³Statistically, this division yields a "biased" estimate of the variance. An unbiased estimate would be obtained by dividing by (n − 1), one less than the total number of scores. Since most electronic calculators that are programmed to yield the variance or standard deviation use n − 1 as the divisor, the value they yield should be slightly larger than that obtained with equation 4.2. Check a statistics book to learn more about this distinction.
A formula that is equivalent to equation 4.2 and that is simpler computationally is

$$s = \frac{1}{n}\sqrt{n\sum X^2 - \left(\sum X\right)^2}$$

Normality

Figure 4-4. The Normal Distribution
[…] scores are between the values shown on the base line. For example, about 34 percent of all scores are between the mean score and the score that is one standard deviation above the mean. It is useful to remember that (1) about 68 percent of the scores are between −1s and +1s, (2) about 95 percent of the scores are between −2s and +2s, and (3) about 2.5 percent of the scores are in each tail beyond ±2s. These rounded percentage values are accurate enough for our purposes; more exact decimal values can be found in tables in an introductory statistics book.
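The rounded percentages can be checked against exact values with the standard library's `statistics.NormalDist` (Python 3.8 or later), whose `cdf` method gives the proportion of a normal distribution below a point expressed in standard deviation units. The helper name `percent_between` is hypothetical.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1


def percent_between(lo, hi):
    """Percentage of a normal distribution between lo and hi (in s units)."""
    return 100 * (z.cdf(hi) - z.cdf(lo))


print(f"mean to +1s: {percent_between(0, 1):.1f}%")    # about 34
print(f"-1s to +1s:  {percent_between(-1, 1):.1f}%")   # about 68
print(f"-2s to +2s:  {percent_between(-2, 2):.1f}%")   # about 95
print(f"beyond +2s:  {100 * (1 - z.cdf(2)):.1f}%")     # about 2.3
print(f"-3s to +3s:  {percent_between(-3, 3):.2f}%")   # about 99.7
```

The exact tail area beyond +2s is about 2.3 percent; the rounded 2.5 percent corresponds more closely to the tail beyond 1.96 standard deviations.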
The normal distribution is a theoretical curve that is assumed to include an unlimited (infinite) number of scores or observations. Therefore, it extends without limit on either side of the mean, well beyond ±3s. In practice and for convenience, it is often considered to extend from about three standard deviations below to three standard deviations above the mean. (Actually, about 99.7 percent of all scores comprising the distribution are within those limits.) But the distribution of scores from a class of, say, 30 students typically will not show a range of scores encompassing six standard deviation units. The figures shown below indicate the ratio of score range to standard deviation that can be expected for groups of the size shown (Hoel, 1947).
Sample Size    Typical Range in Standard Deviation Units
10             3.0
50             4.5
100            5.0
1000           6.5
The typical values shown are averages. For example, we should expect a set of 10 scores that form a shape like a normal distribution to range from about −1.5s to about +1.5s. There are too few scores in the distribution to expect that any one of them would be as far as 3 standard deviations above the mean. It might be a useful computational check to realize that in a distribution of 25 […] the highest score is more apt to be 2 than 3 standard deviations above the mean.
SCORE SCALES DESCRIBE PERFORMANCE
Since the scores on different tests, when taken t[…], widely different mea[ns …] have some kind of sta[ndard …] comparison purposes. Percen[…] scales. We will discuss […] Interpreting scores fr[…]

Percentiles and Percentile Ranks
The percentile rank of a particular test score can be defined in three different ways. It is the percentage of scores in a distribution that
1. is below the given score, or
2. is the same as or below the given score, or
3. is below the midpoint of the score interval of the given score.
Table 4-5 shows the effects of using each of these definitions on the percentile ranks of a set of hypothetical scores. […] a percentile rank of only 80 under definition 1. The lowest score […] rank of 20 under definition 2. The median score […] 60 under definition 2. But […] rank 50 […] in a symmetric distribution […] highest and lowest scores […] the same distance […] percentile rank scale, as they […] be. For these […] preferred, and we will use it […] a given score in a […]
Table 4-5. Effect of Different Definitions of Percentile Ranks

          Percentile Rank under Definition
Score       1       2       3
5          80     100      90
4          60      80      70
3          40      60      50
2          20      40      30
1           0      20      10
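The three definitions differ only in how the scores tied at the given value are counted: none of them, all of them, or half of them. This sketch (the function name `percentile_ranks` is hypothetical) reproduces the Table 4-5 values for the scores 5, 4, 3, 2, 1.

```python
def percentile_ranks(scores, x):
    """Percentile rank of score x under the three definitions in the text.

    Returns (def1, def2, def3): the percentage of scores below x, at or
    below x, and below the midpoint of x's score interval (all scores below
    x plus half of those at x).
    """
    n = len(scores)
    below = sum(s < x for s in scores)
    at = sum(s == x for s in scores)
    return (100 * below / n,
            100 * (below + at) / n,
            100 * (below + at / 2) / n)


scores = [5, 4, 3, 2, 1]
for x in sorted(scores, reverse=True):
    print(x, percentile_ranks(scores, x))
```

Only the third definition places the median at 50 and the extreme scores of a symmetric set at equal distances (10 and 90) from the ends of the scale.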
[…] percentile rank of […]
1. Prepare a frequency distribution.
2. Beginning with the lowest score, add successive frequency values to obtain a column of cumulative frequencies.
3. For the given score, identify the cumulative frequency up to, but not including, the score.
4. For the same given score, divide its frequency by 2.
5. Add the values from steps 3 and 4.
6. Divide the sum from step 5 by the total number of scores.
7. Multiply the result by 100 to obtain a percentile rank (rounded to the nearest whole number).
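The seven steps can be sketched as follows. The score list is an assumption: the 30 spelling scores of Table 4-1, with the two entries that are illegible in this copy taken to be 20 and 24, the values implied by the cumulative frequencies in Table 4-6. The function name `percentile_rank_table` is hypothetical. Rounding is done half up explicitly, because Python's built-in `round` rounds halves to the nearest even number.

```python
import math
from collections import Counter

# Assumption: the 30 spelling scores of Table 4-1, with the two entries
# that are illegible in this copy taken to be 20 and 24.
SPELLING_SCORES = [
    16, 19, 20, 17, 21, 13, 18, 20, 16,
    20, 17, 18, 21, 14, 19, 22, 18, 21,
    20, 17, 23, 23, 19, 22, 21, 19, 21, 15,
    20, 24,
]


def percentile_rank_table(scores):
    """Rows of (score, frequency, cumulative frequency, percentile rank),
    following the seven steps. The scale runs from one score below the
    lowest obtained score up to the highest obtained score."""
    counts = Counter(scores)                  # step 1: frequency distribution
    n = len(scores)
    rows = []
    cumulative = 0
    for x in range(min(scores) - 1, max(scores) + 1):
        f = counts[x]
        below = cumulative                    # step 3: up to, not including, x
        cumulative += f                       # step 2: running cumulative total
        # Steps 4-7: (below + f/2) / n * 100, multiplied before dividing to
        # limit floating-point error, then rounded half up.
        pr = math.floor(100 * (below + f / 2) / n + 0.5)
        rows.append((x, f, cumulative, pr))
    return rows[::-1]                         # highest score first


if __name__ == "__main__":
    print("Score  Freq  CumFreq  PR")
    for x, f, cf, pr in percentile_rank_table(SPELLING_SCORES):
        print(f"{x:5d}  {f:4d}  {cf:7d}  {pr:3d}")
```

Under these assumptions the rows for 19 and 21 come out as 43 and 75, matching the worked example in the text.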
Table 4-6 illustrates the computation of percentile ranks for the spelling scores from Table 4-1. Notice that the score scale extends from one score below the lowest score obtained to the highest score obtained. And the scale always includes all possible scores between the extremes, even when no student actually obtained some of those scores. The number of students who received each of the scores is shown in the second column of the table.

The third column gives a cumulative frequency for each score. It is calculated by counting the number of scores lower than the given score and adding the frequency at that score point. Consider the score 19, for example. There are 11 scores lower than 19 and four scores of 19, so we get a cumulative frequency of 15.

To obtain the percentile rank for a score of 19, the value in the fourth column, we proceed as follows. Take the cumulative frequency of scores below 19 (which is 11) and add to it half of the frequency at a score of 19 (one-half of 4 is 2). The sum, 13, is divided by 30 (the number of scores) to yield 0.433. That quotient is multiplied by 100 (43.3) and then rounded to obtain 43. In summary, we take half the scores at the given score value plus all the scores below, we divide that sum by the total number of scores, and we change the result to a rounded percentage value. Can you verify that Andy's score of 21 has a percentile rank of 75?
Table 4-6. Computation of Percentile Ranks
Score   Frequency   Cumulative Frequency   Percentile Rank
 24         1                30                  98
 23         2                29                  93
 22         2                27                  87
 21         5                25                  75
 20         5                20                  58
 19         4                15                  43
 18         3                11                  32
 17         3                 8                  22
 16         2                 5                  13
 15         1                 3                   8
 14         1                 2                   5
 13         1                 1                   2
 12         0                 0                   0
[…] score be[…] would be below the percentile […] score in question; […] the sixth score from the top […] 4. The 20th […] or 68.
Figure 4-5. Relation Between Normal and Rectangular Distributions
It is clear from this figure that percentile ranks magnify raw-score differences near the middle of the distribution but reduce raw-score differences toward the extremes. Stated in other words, a difference of 10 percentile rank units near the extremes corresponds to a much larger raw-score difference than does the same difference in percentile ranks near the mean. For example, for a set of 100 scores that form a normal distribution, the number of scores between the 50th percentile and the 55th percentile is the same as the number of scores between the 90th and 95th percentiles. But the raw-score difference between the 50th and 55th percentiles is smaller than the raw-score difference between the 90th and 95th percentiles. The score differences in standard deviation units are 0.13 and 0.37, respectively. (This can be verified using a normal distribution table, found in most statistics books.) In view of this property, does it make sense that percentile ranks should not be averaged?
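Instead of a printed normal table, the standard library's `statistics.NormalDist.inv_cdf` (Python 3.8 or later) returns the point, in standard deviation units, below which a given proportion of a normal distribution lies. The differences between percentile points follow directly.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Distance between two percentiles, in standard deviation units.
near_middle = z.inv_cdf(0.55) - z.inv_cdf(0.50)
near_top = z.inv_cdf(0.95) - z.inv_cdf(0.90)

print(f"50th to 55th percentile: {near_middle:.3f} s")  # about 0.126
print(f"90th to 95th percentile: {near_top:.3f} s")     # about 0.363
```

The second distance is nearly three times the first, even though each spans five percentile ranks. To more decimal places it rounds to 0.36; the 0.37 in the text is what rounded table entries such as 1.645 and 1.28 give.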
Standard Scores
Like percentile ranks, standard scores provide a standard scale, a common yardstick, by which scores on different tests by different groups may be […]
Linear standard scores. Raw scores are transformed into standard scores using the raw-score mean and standard deviation. The purpose of this transformation is to create a new score scale that has a predetermined mean and standard deviation. One basic type of standard score, the z-score, is found using this formula:
$$z = \frac{X - \bar{X}}{s} \tag{4.3}$$
where X is a raw score, $\bar{X}$ is the mean of the raw scores, and s is their standard deviation.
T-scores are computed using this formula:
$$T = 10(z) + 50 \tag{4.4}$$
Scores on the College Board's Scholastic Aptitude Test and Achievement Tests are reported as CEEB scores:
$$\text{CEEB score} = 100(z) + 500 \tag{4.5}$$
The original standard-score scale used to report the results from the Iowa Tests of Educational Development (ITED) was computed from this formula:
$$\text{ITED score} = 5(z) + 15 \tag{4.6}$$
Finally, stanines are computed with the formula
$$\text{Stanine} = 2(z) + 5 \tag{4.7}$$
with the result rounded to the nearest whole number.
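A minimal sketch of equations 4.3 through 4.7 for a single examinee. The raw-score mean (30) and SD (6) are hypothetical. The clamp to 1 through 9 reflects the fact that stanines run from 1 to 9, not anything stated in equation 4.7 itself. Rounding is done half-up, because Python's built-in `round` sends halves to the nearest even number.

```python
import math

def z_score(raw, mean, sd):
    """Equation 4.3: distance of a raw score from the mean, in SD units."""
    return (raw - mean) / sd

def linear_standard_scores(z):
    """Equations 4.4 to 4.7 applied to one z-score."""
    stanine = math.floor(2 * z + 5 + 0.5)  # round half up to a whole number
    stanine = min(9, max(1, stanine))      # stanines run from 1 to 9
    return {
        "T": 10 * z + 50,       # (4.4)
        "CEEB": 100 * z + 500,  # (4.5)
        "ITED": 5 * z + 15,     # (4.6)
        "stanine": stanine,     # (4.7)
    }

# Hypothetical raw-score distribution with mean 30 and SD 6.
z = z_score(39, mean=30, sd=6)
print(z, linear_standard_scores(z))
```

For a raw score of 39, z is 1.5, which gives T = 65, CEEB = 650, ITED = 22.5, and a stanine of 8.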
Table 4-7. Characteristics of Standard Scores in a Normal Distribution
The distribution of linear standard scores has the same shape as the raw-score frequency polygon from which the standard scores were derived. For example, the z-score distribution will be negatively skewed if the raw-score distribution was, the z-score distribution will be leptokurtic if the raw-score distribution was, and a bimodal, symmetric raw-score distribution will yield a standard-score distribution with those same properties.
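To see that a linear transformation leaves the shape unchanged, the sketch below computes a moment coefficient of skewness for a hypothetical, negatively skewed set of raw scores and for the z-scores and T-scores derived from it. The skewness index (the mean of the cubed z-scores) is one common definition, chosen here as an assumption. The chapter does not define one.

```python
from statistics import mean, pstdev

def skewness(values):
    """Moment coefficient of skewness: the mean of the cubed z-scores."""
    m, s = mean(values), pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

# Hypothetical raw scores with a long tail of low scores (negative skew).
raw = [12, 25, 30, 33, 35, 36, 37, 38, 38, 39]
m, s = mean(raw), pstdev(raw)
z_scores = [(x - m) / s for x in raw]       # equation 4.3
t_scores = [10 * z + 50 for z in z_scores]  # equation 4.4

for label, data in (("raw", raw), ("z", z_scores), ("T", t_scores)):
    print(f"{label:>3}: skewness = {skewness(data):.4f}")
```

All three lines print the same negative value. Only a nonlinear transformation, such as the normalizing one used for NCEs, can change the shape.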
Notice that the mean and standard deviation of each standard-score scale, as shown in Table 4-7, are readily apparent in the corresponding formula for computing each. To create a new standard-score scale, we simply multiply the z-score by the standard deviation desired for the new scale and then add the value of the mean desired for the new scale. For example, if we wanted a new score scale that would have a mean of 40 and a standard deviation of 12, the formula needed would be
$$J = 12(z) + 40 \tag{4.8}$$
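The general recipe in this paragraph can be sketched as a small function that converts a whole set of raw scores at once. The raw scores and the function name are hypothetical. The check at the end confirms that the new scale has exactly the requested mean of 40 and standard deviation of 12. `pstdev` divides by n, which matches this chapter's definition of variance as the average squared deviation.

```python
from statistics import mean, pstdev

def rescale(raw_scores, new_mean, new_sd):
    """Turn raw scores into z-scores, then into a scale with the chosen mean and SD."""
    m, s = mean(raw_scores), pstdev(raw_scores)  # pstdev divides by n
    return [new_sd * (x - m) / s + new_mean for x in raw_scores]

raw = [18, 22, 25, 27, 30, 31, 34, 37]           # hypothetical raw scores
j_scores = rescale(raw, new_mean=40, new_sd=12)  # equation 4.8: J = 12(z) + 40

print(f"mean = {mean(j_scores):.2f}, SD = {pstdev(j_scores):.2f}")
```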
If a teacher adds 3 points to every student's raw score, have the raw scores been changed to a linear standard score? If so, how would you describe the new set of scores?
Normal curve equivalents (NCEs). If we wish to assume that the trait being measured by a test is normally distributed, it is possible to transform the obtained raw scores in a nonlinear fashion so that the new distribution will be normal. Obviously, it is not desirable to perform such a transformation on a distribution of scores that does not resemble the shape of a normal distribution. The main reason for normalizing a set of scores is to permit norm-referenced interpretations that take advantage of the properties of the normal curve. Stanines and other standard scores reported by publishers of standardized tests typically are normalized. The procedures for normalizing scores are described in most introductory statistics textbooks.
The normal curve equivalent (NCE), a type of normalized standard score, has been made popular primarily because of its use in reporting evaluation results from Title I programs of the Elementary and Secondary Education Act (ESEA), also known as Chapter 1. NCEs are computed with this equation:
$$\text{NCE} = 21.06(z) + 50 \tag{4.9}$$
where z is a normalized z-score. The computed NCE value is rounded to the nearest whole number, and only values from 1 to 99 are used.
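One common way to obtain the normalized z in equation 4.9 is to take the z-score that cuts off the examinee's percentile rank under the normal curve. The sketch below assumes that route (the function name is hypothetical) and uses `statistics.NormalDist.inv_cdf` in place of a normal-table lookup. It also shows a property that follows from the constant 21.06: NCEs match percentile ranks at 1, 50, and 99, while being equally spaced in between.

```python
from statistics import NormalDist

def nce_from_percentile_rank(pr):
    """Equation 4.9, taking the normalized z from a percentile rank (0 < pr < 100)."""
    z = NormalDist().inv_cdf(pr / 100)  # z that cuts off pr percent of a normal curve
    nce = round(21.06 * z + 50)         # rounded to the nearest whole number
    return min(99, max(1, nce))         # only values from 1 to 99 are used

for pr in (1, 10, 25, 50, 75, 90, 99):
    print(pr, nce_from_percentile_rank(pr))
```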
What advantages might NCEs have over percentile ranks?
CORRELATION COEFFICIENTS
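Correlation coefficients are statistics that show the degree to which scores from one measure are related to scores from another measure for the same group of persons. For example, there is a need for an index of agreement between two sets of test scores when estimating some kinds of test reliability: if a test is given to the same group on different occasions, or if two equivalent forms are given to the same group, we use a correlation coefficient to describe the agreement between the two sets of scores. Correlation coefficients are also used to study questions of prediction, such as how well test scores forecast later job performance, or how well TOEFL (Test of English as a Foreign Language) scores predict success in graduate school.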
Scatterplots Describe Relationships
A scatterplot is a graphic presentation of the relationship between the scores on two measures for a single group of individuals. It is based on the same coordinate system used in algebra and geometry to plot straight lines, circles, and other figures. Each point that is plotted is a pair of scores (X, Y) for a person.
A correlation coefficient is a number that may range from +1.00 through 0 to −1.00. Perfect correlations, +1.00 or −1.00, rarely exist in practice, but both represent a perfect relationship between the two variables. A correlation of 0 indicates the absence of a relationship between the two variables.
A graphical representation of the pairs of scores from which a correlation coefficient can be computed is useful for two reasons. First, we can tell from a graph if the points tend to cluster about a single straight line. If they do not, a more complex method of computation is required. Second, we can estimate both the direction and magnitude of the relationship from visual inspection of the graph. The graph provides a means of checking the accuracy of calculations when the correlation coefficient has been computed.
Four scatterplots are shown in Figure 4-6 to illustrate the correlation of two sets of scores graphically. Each point in a graph represents a pair of scores, X and Y, for each person. In diagram a, the lower scores on test X are associated with low, moderate, and high scores on test Y. And the higher scores on test X also are associated with low, moderate, and high scores on test Y. Therefore, we can conclude there is little or no relationship between the two sets of scores. A similar analysis can be made of each of the other three diagrams to verify the value of the correlation coefficient estimates given. Another approach to interpreting the scatterplot is to ask how accurately we could predict a person's Y score when we know their score on X. When the relationship is weak, our prediction would be a mere guess. If, for a group of individuals, standings on X and Y are similar, the correlation is high and positive; if standings are nearly opposite on the two, the correlation is high and negative.
[Figure 4-6. Sample Scatterplots Showing Varying Relationships Between Two Sets of Scores]
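Computing Correlation Coefficients
There is a wide variety of correlation coefficients, each appropriate under certain conditions and each computed differently. The most common type, the Pearson product-moment correlation coefficient, is described here. Suppose we want the correlation between the interview scores and the test scores shown in Table 4-8. Sketch a scatterplot of these scores. What is your estimate of the correlation?
The correlation is computed using this formula:
$$r_{XY} = \frac{n\sum XY - (\sum X)(\sum Y)}{\sqrt{\left[n\sum X^2 - (\sum X)^2\right]\left[n\sum Y^2 - (\sum Y)^2\right]}} \tag{4.10}$$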
where n is the number of pairs of scores (or persons), Σ is the symbol meaning "the sum of," X is the score of a person on one measure, and Y is the score of the same person on a second measure. The various sums needed to compute the correlation coefficient are shown at the bottom of each column in Table 4-8.

Table 4-8. Measurements Used to Compute Correlation

Examinee   Interview Score (X)   Test Score (Y)   X²    Y²       XY
Daniel     10                    57               100   3249     570
Liz        6                     50               36    2500     300
Ben        3                     33               9     1089     99
Carmen     5                     44               25    1936     220
Ellen      8                     53               64    2809     424
Rosa       2                     25               4     625      50
Michael    4                     33               16    1089     132
Robert     5                     48               25    2304     240
Rhonda     10                    59               100   3481     590
Albert     3                     35               9     1225     105
Sum        56                    437              388   20,307   2730

We can substitute these values in equation 4.10 to complete the computation.
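$$r_{XY} = \frac{10(2730) - (56)(437)}{\sqrt{\left[10(388) - 56^2\right]\left[10(20{,}307) - 437^2\right]}} = \frac{2828}{\sqrt{(744)(12{,}101)}} = \frac{2828}{3001} = 0.94$$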
How does your estimate using the scatterplot compare with the computed result? What does the 0.94 mean?
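A direct transcription of equation 4.10 reproduces the hand computation. The two lists are copied from Table 4-8; the function name is ours. The intermediate sums can be compared against the bottom row of the table.

```python
from math import sqrt

# Table 4-8: interview scores (X) and test scores (Y) for the ten examinees.
x = [10, 6, 3, 5, 8, 2, 4, 5, 10, 3]
y = [57, 50, 33, 44, 53, 25, 33, 48, 59, 35]

def pearson_raw_score(x, y):
    """Equation 4.10: Pearson r computed from raw-score sums."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

r = pearson_raw_score(x, y)
print(f"r = {r:.2f}")  # r = 0.94
```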
The example used here is fairly simple, intended only to show what a correlation coefficient is and how it is calculated. Most situations in which a correlation …
$$r_{XY} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{n\, s_X s_Y} \tag{4.11}$$
$$r_{XY} = \frac{\sum z_X z_Y}{n} \tag{4.12}$$
Note that subscripts are used with s and z to distinguish between scores from the two measures (X and Y) for which the correlation is to be estimated.
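Equations 4.11 and 4.12 are algebraically equivalent to equation 4.10 when s is computed with n in the denominator, as in this chapter. The sketch checks that on the Table 4-8 data, using `statistics.pstdev` for s. Both forms should agree with each other and with the 0.94 found above.

```python
from statistics import mean, pstdev

# Table 4-8 data again, so this block runs on its own.
x = [10, 6, 3, 5, 8, 2, 4, 5, 10, 3]
y = [57, 50, 33, 44, 53, 25, 33, 48, 59, 35]
n = len(x)
mean_x, mean_y = mean(x), mean(y)
s_x, s_y = pstdev(x), pstdev(y)  # standard deviations with n in the denominator

# Equation 4.11: sum of deviation products over n * s_X * s_Y.
r_deviation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n * s_x * s_y)

# Equation 4.12: mean of the products of paired z-scores.
z_x = [(a - mean_x) / s_x for a in x]
z_y = [(b - mean_y) / s_y for b in y]
r_z = sum(a * b for a, b in zip(z_x, z_y)) / n

print(f"{r_deviation:.4f} {r_z:.4f}")
```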
Interpreting Correlation Coefficients
Equivalent forms of well-constructed achievement tests, for example, tend to yield correlations of about 0.70 to 0.85. Values of 0.90 to 0.95 would be considered particularly high for that situation, and values less than 0.65 might be considered somewhat low. On the other hand, ACT scores correlate about 0.50 with grade-point average at the end of the freshman year at most universities. Values of 0.60 would be considered somewhat high, and values of 0.35 would be regarded as fairly low.
Coefficients of correlation are widely used to study test scores, build theories, and make predictions. If calculated accurately, they provide precise estimates of the degree of relationship among the data on which they are based. Two cautions are in order, however. First, the relationship indicated by a correlation coefficient is seldom a causal one. (… could probably explain why X and Y are related.) … biology students correlate −0.60 …
SUMMARY PROPOSITIONS
6. The mean of a set of scores is found by adding all the scores and dividing the sum by the total number of scores.
7. A few extremely high or extremely low scores tend to pull the value of the mean away from the median and in the direction of the extreme scores.
8. The variance of a set of scores is the average of the squared deviations of the scores from the mean.
9. The standard deviation is the square root of the variance.
10. Conceptually, the standard deviation is a number that shows the average amount by which the scores in a distribution deviate from the mean.
11. The normal curve is a theoretical, symmetric, bell-shaped frequency polygon that has become a relative standard for describing certain types of test data.
12. The larger the number of scores in a group, the greater the expected range of scores in standard deviation units.
13. The use of percentile ranks permits comparison of performance from tests that may differ in the means, variability, or distribution of scores they yield.
14. The percentile rank of a given score is most appropriately defined as the percent of scores in the group that fall below the midpoint of the score interval in which that score is located.
15. A complete set of percentile ranks yields a frequency polygon that is rectangular in shape.
16. Percentile ranks are percent values between 0 and 100; percentiles are raw scores that may have any value on a given raw-score scale.
17. Conversion of normally distributed scores to percentile ranks increases apparent score differences near the center of the distribution and decreases them near either extreme of the raw-score distribution.
18. A z-score indicates the number of standard deviation units an individual has scored above (+) or below (−) the mean.
19. A standard score can be computed by using the values of the mean and standard deviation of the new scale to modify the z-score.
20. Normalized standard scores are provided by many test publishers so that the useful properties of the normal curve can be incorporated in the interpretation of the scores.
21. The correlation coefficient is a measure of the degree of relationship between two variables, based on paired values of the variables obtained from each of a number of persons or things.
22. Possible values of the correlation coefficient range from +1.00, expressing perfect positive (direct) relationship, through 0, expressing absence of relationship, to −1.00, expressing perfect negative (indirect) relationship.
23. A scatterplot is a graph that can be used to estimate the magnitude and direction of the correlation between the variables used in plotting it.
24. A relatively high correlation between two variables is not sufficient evidence for concluding that one variable causes the other.
25. Correlation coefficients obtained from small samples are subject to large sampling errors.
QUESTIONS FOR STUDY AND DISCUSSION
1. Why should score values be included in a frequency distribution even when no individuals obtained those scores?
2. What unique purposes might be served by frequency polygons? By histograms?
3. Why might the median be more useful than the mean for describing typical performance on a sixth-grade promotion test?
4. How are the average deviation score and the standard deviation different, conceptually and computationally?
5. From what kinds of measures might we expect to obtain scores as high as 4 or 5 standard deviations above the mean?
6. What is the essential difference between a percentile rank and a percent-correct score?
7. Under what circumstances might it be useful to know that the correlation between two measures is about zero?
8. If a scatterplot for two variables forms a straight, horizontal line, how should the relationship between the two variables be described?
The Reliability of Test Scores

THE MEANING OF RELIABILITY
Reliability is the term used to describe one of the most significant properties of a set of test scores: how consistent or error free the measurements are.¹ Scores that are highly reliable are accurate, reproducible, and generalizable to other testing occasions and other similar test instruments. For norm-referenced tests, this means that the use of a comparable test, under similar testing conditions on another occasion, will yield a distribution of scores that will place examinees in essentially the same rank ordering. The scatterplot of the scores from the two measures will be a set of points that cluster about a slanted straight line.
For criterion-referenced testing situations, reliability still refers to consistency, but placing examinees in the same order on two occasions is a less relevant goal. When the purpose is to estimate how much of a domain each student knows, testing with equivalent instruments on different occasions should yield the same percent-correct score for each student, not just the same ordering of test takers. When the purpose is objectives-referenced, leading to mastery interpretations, the concern is not so much with reproducing the same score as with replicating the original decision: mastery versus nonmastery. Thus, the notion of consistency is the basis for the meaning of reliability, but how we conceptualize consistency for a certain measurement situation depends very much on the kind of score interpretation to be made. Could a given set of scores be considered quite reliable for criterion-referenced purposes, but not so reliable for norm-referenced purposes?

¹Portions of this discussion of reliability were taken from an instructional module prepared for a series sponsored by the National Council on Measurement in Education (Frisbie, 1988).
In view of the differences in interpretation of reliability for norm-referenced and criterion-referenced situations, separate discussions of the two will be presented. Similarities will be described in the last section of this chapter, when criterion-referenced circumstances are addressed.
Definitions of Score Reliability
The classical definition of score reliability makes use of the idea of correlation and equivalent tests:
The reliability coefficient for a set of scores from a group of examinees is the coefficient of correlation between that set of scores and another set of scores on an equivalent test obtained independently from members of the same group.
Three aspects of this definition deserve comment. First, it states that reliability is a property of a set of test scores, not a property of the test itself. A nutrition test could yield fairly accurate scores on a certain day when given to a particular class, but could yield fairly inconsistent scores when given to a different class or when given to the same class on another occasion. The more appropriate the test is for the examinees in the group, the higher the reliability of their scores is likely to be. Also, the greater the range of achievement in a group, the higher the reliability of the scores obtained with a test of that achievement. Even when people refer to a test as "very reliable," what they mean (or should say) is that the scores obtained from the test when given to a particular group under certain testing conditions could be reproduced with an equivalent test. That is, the scores are highly consistent.
Second, the definition specifies the use of a correlation coefficient as a measure of reliability. One of the characteristics of the correlation coefficient is that it provides a relative rather than an absolute measure of agreement between pairs of scores for the same persons. That is, the scores do not need to be exactly the same numbers on the two occasions. If the differences between scores for the same person are small relative to the differences between scores for different persons, then the test will tend to show highly reliable scores. Conversely, if the differences between scores for the same person are large relative to the differences between persons, then the scores will show much lower reliability.
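The relative character of the coefficient can be seen in a small sketch with hypothetical scores. Adding the same 6 points to every first-testing score changes every number but leaves the correlation at 1.00. Small person-by-person shifts of one point lower it only slightly, because they are small relative to the differences between persons. The `pearson` helper follows equation 4.11; its name is ours.

```python
from statistics import mean, pstdev

def pearson(x, y):
    """Pearson r as in equation 4.11 (deviation products over n * s_X * s_Y)."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) * pstdev(x) * pstdev(y))

first = [12, 15, 18, 20, 25, 27, 30]     # hypothetical first testing
shifted = [s + 6 for s in first]         # everyone scores 6 points higher
jittered = [13, 14, 19, 19, 26, 26, 31]  # each score moves 1 point up or down

print(f"shifted:  r = {pearson(first, shifted):.3f}")
print(f"jittered: r = {pearson(first, jittered):.3f}")
```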
Third, the definition calls for two or more independent measures, obtained from equivalent tests of the same trait, for each member of the group. This is the heart of the definition. From this it follows that the various means of obtaining independent measurements of the same achievement will provide the basis for several distinct methods for estimating score reliability.
A Theoretical Representation
For theoretical purposes we assume that a test score can be partitioned into two components, a true score and an error score. The hypothetical true score of an individual is the average of the scores the person would obtain if tested many times with the same test. The relationship between these scores is
$$X = T + E \tag{5.1}$$
where X is the observed score, T is the true score, and E is the error score.
The variance of the observed (raw) scores can be represented as
$$s_X^2 = s_T^2 + s_E^2 \tag{5.2}$$
(5.3)
means that random errors, not true differences, mainly explain why examinees obtained different scores.
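A tiny numeric illustration of equations 5.1 and 5.2, using invented true scores and errors. The errors were chosen to have a mean of zero and to be uncorrelated with the true scores, which is the condition under which the variances add. The last line computes the ratio of true to observed variance, the classical-test-theory definition of reliability. That equation 5.3 expresses this ratio is our assumption, since it is not legible in this copy.

```python
from statistics import pvariance

true_scores = [40, 45, 50, 55, 60]  # hypothetical true scores (T)
errors = [2, -2, 0, -2, 2]          # hypothetical errors (E): mean 0, uncorrelated with T
observed = [t + e for t, e in zip(true_scores, errors)]  # equation 5.1: X = T + E

var_t = pvariance(true_scores)
var_e = pvariance(errors)
var_x = pvariance(observed)

print(f"s_X^2 = {var_x:.2f}, s_T^2 + s_E^2 = {var_t + var_e:.2f}")  # equation 5.2
print(f"s_T^2 / s_X^2 = {var_t / var_x:.3f}")
```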
Sources of Score Variability
A major goal of test makers is to maximize the true variance and minimize the error variance in the scores on their tests. What are the factors that influence each type of variance? What are the many reasons that explain why individuals in a group achieve different scores when they take a test that is appropriate for the ability level of the group?
As an example, Carlos and Marta might obtain different scores on the same geography test because Carlos knows more about the content covered by the test than Marta. We want our test scores to reflect this kind of difference, so no error is involved if this is the sole explanation for the score difference. However, all other possible explanations are potential sources of measurement error. Here are some examples:
1. Although Marta is an above-average reader, for some inexplicable reason she had to reread nearly everything on the test. She seemed unable to concentrate well enough to comprehend on the first reading. Carlos's attention did not seem to fluctuate in any unusual way during the test.
2. The teacher recognized both Carlos's and Marta's handwriting when … points for one of her responses.
3. Carlos was fortunate in that the two essay questions related closely to what he had most recently studied, but Marta had concentrated her study in several other areas instead. She might have been more successful had a different pair of essay questions been asked.
4. Marta did not read the instructions carefully and forgot to answer the five items on the back side of the last page. These items were marked as incorrect.
5. Carlos guessed correctly on four of five multiple-choice items, but Marta was correct on only two of the six guesses she made.
In these illustrations the errors that occurred affected Marta and Carlos differently and probably affected all others in the group in various other ways. These are called random errors because, if we were to give these students an equivalent test or give them the same test again, we would expect these kinds of errors to have a somewhat different effect on each examinee the second time. Each type of error might be present or absent in a specific testing situation for a given test taker. Sometimes the effect of an error will be fairly large, sometimes it will be fairly small, and sometimes it will be absent altogether.
Unlike random errors, systematic errors affect all examinees in the same way and cause all scores to be higher (or lower) than they ought to be. These kinds of errors do not contribute to score differences among test takers, but they do affect the absolute magnitude of each examinee's score. For example, there was a short-answer definition item that everyone got wrong (even though they all …
[Figure 5-1. Possible Sources of Variance in Scores on a Particular Test (Thorndike, 1951)]
Thorndike (1951) listed the possible sources of variance in test scores in an attempt to identify those that contribute to error and true-score variance. His categorization scheme is reproduced in Figure 5-1. Some factors explain why an individual might obtain different scores on the same test on two occasions, and some explain why examinees tested on the same occasion might obtain scores that differ from one another. A detailed discussion of each category can be found in Stanley (1971). For our purposes, several generalizations can be drawn from the listing in Figure 5-1:
1. All the sources of variance do not necessarily operate in every testing situation.
2. Some factors contribute to error scores in some testing situations but contribute to true scores in other situations.
3. Reliability is not simply an intrinsic trait of a test; its value depends on the nature of the group tested, the test content, and the conditions of testing.
METHODS OF ESTIMATING SCORE RELIABILITY
There is a need to estimate test-score reliability so that we can judge the extent to which measurement errors might interfere with the interpretability of the scores. Because reliability can be influenced by so many factors (the group tested, the test content, and testing conditions), it is not possible to settle on a single method for estimating reliability for all testing situations. At least five methods are used in practice to obtain the independent measurements necessary for estimating reliability. These methods yield coefficients of stability, equivalence, and internal consistency.
Stability Estimates
The test-retest method is essentially a measure of examinee reliability, an indication of how consistently examinees perform on the same set of tasks. The simplest and most obvious method of obtaining repeated measures of the same ability for the same individuals is to give the same test twice. This would provide two scores for each individual tested. The correlation between the set of scores obtained on the first administration of the test and that obtained on the second yields a test-retest reliability coefficient. Note that such temporary characteristics listed in Figure 5-1 as health, fatigue, memory fluctuations, and comprehension of the specific test task are likely to contribute to the error score when this method is used. The test-retest method is particularly useful in situations where the trait being measured is expected to be stable over time. Then, if the scores on the two occasions yield different rank orderings of the examinees, measurement error is the single most likely explanation for such differences.
A number of objections to the test-retest method have been raised, especially for use with achievement tests. One is that exactly the same test items are used both times. Since this set of items represents only one sample from what is ordinarily a very large population of possible test items, the scores on the retest provide no evidence on how much the scores might change if a different sample of questions were used (category II.B.1 from Figure 5-1). Another objection is that students' answers to the second test are not independent of their answers to the first. Answers on the retest undoubtedly are influenced to some extent by memory of the first test, and also by student discussion and individual or group study during the interval between testings. A third objection … test and retest … changes in student ability as a result of … Teachers are unlikely to regard giving the same test again, simply to determine how reliable the scores are, as an efficient use of time, and for most students … a repetition may make the second test a much poorer measure. In both cases the test-retest method is not recommended for estimating the reliability of scores from classroom achievement tests.
Equivalent Forms Estimates
If two (or more) forms of a test have been produced in such a way that scores on these alternate forms are likely to be equivalent, and if both forms are given to the same group of examinees, then the correlation between scores on the two forms provides an estimate of score reliability. A high coefficient indicates that the two forms can be used interchangeably. But a relatively low estimate is an indication that the forms are not equivalent measures of the content domain, probably because of errors in sampling items from that domain (category II.B.1 in Figure 5-1). … Equivalent-forms estimates are of particular value when equivalent forms are used to measure growth, for example to assess the effects of instruction over a school year. In such cases high reliability is essential so that true educational gains are not masked or artificially elevated by measurement errors.
Internal Analysis Methods
The difficulties associated with the determination of test-retest and equivalent-forms reliability have led to the use of information internal to a single test, and to the use of component subtests of the test, to estimate test-score reliability.
Split-half estimation. One common method of splitting a test has been to score the odd-numbered items and the even-numbered items separately. Then the correlation between scores on the odd- and even-numbered items is calculated. Of course, splitting a test in this way means that the scores on which the reliability is based are from half-length tests. To obtain an estimate of the reliability based on the full-length test, it is necessary to correct, or step up, the half-test correlation to the full-length correlation. (As you will see shortly, the length of a test has a very direct effect on the reliability of the scores we can obtain from it.) This is done with the help of the Spearman-Brown formula. When it is applied to a split-half correlation, the formula is
$$r_{xx} = \frac{2r}{1 + r} \tag{5.4}$$
where $r_{xx}$ is the reliability of the full-length test scores and r is the reliability of the original (half-test) scores. For example, if the odd-even correlation between two 25-item half-tests is 0.82, the reliability of the total test scores (50 items) is 1.64 divided by 1.82, which is approximately 0.90.
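A minimal sketch of the odd-even split followed by equation 5.4. The 0/1 item responses are invented for illustration, and the `pearson` helper is a direct transcription of equation 4.11; neither comes from the text. The last line reproduces the 0.82 example.

```python
from statistics import mean, pstdev

def pearson(x, y):
    """Pearson r as in equation 4.11 (n in the SD denominators)."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) * pstdev(x) * pstdev(y))

def spearman_brown_half(r_half):
    """Equation 5.4: step a half-test correlation up to full-test length."""
    return 2 * r_half / (1 + r_half)

# Hypothetical item responses (1 = right, 0 = wrong): six students, eight items.
responses = [
    [1, 1, 1, 1, 1, 1, 1, 0],
    [1, 1, 1, 0, 1, 1, 0, 1],
    [1, 0, 1, 1, 0, 1, 1, 0],
    [1, 0, 0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 0, 0, 0],
]
odd_totals = [sum(row[0::2]) for row in responses]   # items 1, 3, 5, 7
even_totals = [sum(row[1::2]) for row in responses]  # items 2, 4, 6, 8

r_half = pearson(odd_totals, even_totals)
r_full = spearman_brown_half(r_half)
print(f"odd-even r = {r_half:.3f}, stepped-up reliability = {r_full:.3f}")
print(f"text example: {spearman_brown_half(0.82):.2f}")
```

Slicing with `row[0::2]` and `row[1::2]` picks the odd- and even-numbered items because Python counts positions from zero.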
The general form of the Spearman-Brown formula is
$$r_n = \frac{n r}{1 + (n - 1) r} \tag{5.5}$$
where $r_n$ is the reliability of scores from the new, lengthened test, n is the number of times the original test is lengthened, and r is the reliability of the original test scores.
Suppose a given set of scores has a reliability of 0.50 and we wish to increase the original length of the test by nine times, adding new items equivalent in content and difficulty to the original items. The reliability of the scores from the new test is predicted to be
$$r_9 = \frac{9(0.50)}{1 + 8(0.50)} = \frac{4.50}{5.00} = 0.90$$
If the original test contained 20 items, the new test would need 180 equivalent items to yield a reliability of 0.90. Of course, for this prediction to hold, students should be expected to respond to 180 items without getting more bored or fatigued than they would get by responding to the original 20. And the added items should be similar to the original ones in terms of content, difficulty, and overall …
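Equation 5.5 and its rearrangement for the lengthening factor n can be sketched in a few lines. The rearranged formula is our own algebra (equation 5.5 solved for n), not a formula given in the text. The numbers reproduce the nine-fold, 20-to-180-item example, and they hold only under the same assumption the text states: the added items must be equivalent to the original ones.

```python
def spearman_brown(r, n):
    """Equation 5.5: predicted reliability when a test is made n times as long."""
    return n * r / (1 + (n - 1) * r)

def length_factor_needed(r, target):
    """Equation 5.5 solved for n: how many times longer the test must become."""
    return target * (1 - r) / (r * (1 - target))

predicted = spearman_brown(0.50, 9)        # nine times the original length
factor = length_factor_needed(0.50, 0.90)  # about 9
items_needed = round(20 * factor)          # original test: 20 items

print(f"predicted reliability = {predicted:.2f}")
print(f"lengthening factor = {factor:.2f}, items needed = {items_needed}")
```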
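Kuder-Richardson. Two of the most widely accepted methods for estimating reliability were developed by Kuder and Richardson (1937). Their formula 20, abbreviated K-R20, is
$$r = \frac{k}{k - 1}\left[1 - \frac{\sum pq}{s_X^2}\right] \tag{5.6}$$
where k is the number of items, p is the proportion of examinees answering an item correctly, q = 1 − p, and $s_X^2$ is the variance of the total test scores. Their formula 21, K-R21, requires only the number of items and the mean and variance of the total scores:
$$r = \frac{k}{k - 1}\left[1 - \frac{\bar{X}(k - \bar{X})}{k\, s_X^2}\right] \tag{5.7}$$
An illustration of the computations of K-R20 …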
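Coefficient alpha can be used with tests whose items are scored with intermediate response options, not just right or wrong. The formula resembles the one for K-R20 because K-R20 is actually a special case of the alpha procedure. The formula is
$$\alpha = \frac{k}{k - 1}\left[1 - \frac{\sum s_i^2}{s_X^2}\right] \tag{5.8}$$
where $s_i^2$ is the variance of a single test item. When alpha is used to estimate the reliability of essay test scores, the formula is
$$\alpha = \frac{k}{k - 1}\left[1 - \frac{\sum s_i^2}{s_X^2}\right] \tag{5.9}$$
where
k = number of separately scored essay test questions
$s_i^2$ = variance of students' scores on a particular item
$\sum s_i^2$ = sum of the item variances for all test items
$s_X^2$ = variance of the total essay scores
This method of estimating reliability of scores employs concepts from the statistical technique of analysis of variance. Reliability will be high when the relative standings of individuals across the items are quite similar. In such circumstances the separate essay items are consistent in identifying individual differences in the achievement measured by the essay test as a whole.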
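Both coefficients can be sketched directly from equations 5.6 and 5.8. The response matrices below are hypothetical. For right/wrong items the population variance of an item equals pq exactly, so K-R20 and alpha coincide. This illustrates the statement that K-R20 is a special case of alpha. The essay scores show alpha applied to items with intermediate score values.

```python
from statistics import pvariance

def kr20(responses):
    """Equation 5.6 for items scored 1 (right) or 0 (wrong)."""
    n, k = len(responses), len(responses[0])
    totals = [sum(row) for row in responses]
    p = [sum(row[i] for row in responses) / n for i in range(k)]  # proportion correct
    sum_pq = sum(pi * (1 - pi) for pi in p)
    return k / (k - 1) * (1 - sum_pq / pvariance(totals))

def coefficient_alpha(scores):
    """Equation 5.8: works for any item scores, including intermediate values."""
    k = len(scores[0])
    totals = [sum(row) for row in scores]
    item_variances = [pvariance([row[i] for row in scores]) for i in range(k)]
    return k / (k - 1) * (1 - sum(item_variances) / pvariance(totals))

# Hypothetical right/wrong responses: six students, eight items.
responses = [
    [1, 1, 1, 1, 1, 1, 1, 0],
    [1, 1, 1, 0, 1, 1, 0, 1],
    [1, 0, 1, 1, 0, 1, 1, 0],
    [1, 0, 0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 0, 0, 0],
]
# Hypothetical essay scores (0 to 10 points): five students, four questions.
essays = [
    [8, 7, 9, 8],
    [6, 6, 7, 5],
    [5, 4, 6, 5],
    [3, 4, 2, 3],
    [2, 1, 3, 2],
]

print(f"K-R20 = {kr20(responses):.3f}, alpha = {coefficient_alpha(responses):.3f}")
print(f"alpha for the essay scores = {coefficient_alpha(essays):.3f}")
```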
USING RELIABILITY INFORMATION
The reliability coefficient is an index of the amount of error associated with a set of scores, and it is useful in judging the meaningfulness and usefulness of the scores. Information about error can also be used to make estimates of the true scores of examinees and to assess the practical significance of the difference between scores of two or more test takers. How all this can be accomplished is our next set of concerns.
Interpreting Reliability Coefficients
There are no absolute standards to serve as criteria for determining how high a reliability coefficient must be, but informal standards have evolved over time for evaluating coefficients. How reliable the scores must be depends mostly on how the scores will be used: what kinds of decisions will be made and how much weight the test score will have in the decision. Experts in educational measurement have agreed informally that the reliability coefficient should be at least 0.85 if the scores will be used to make decisions about individuals. If the scores will be used to make decisions about groups of individuals, like a class, the generally accepted minimum standard is 0.65.

²An application of this method to a simple case is illustrated in Ebel (1979).
Usually, we can tolerate reliabilities around 0.50 for scores from teacher-made tests if each score will be combined with other information, such as other test scores, before a decision is made. It is the reliability of the combined measurements that should concern us the most; it is this total score, not the score from any single test, that will be used at the grading point in instruction for each student. … additional corroborating information … scores that will provide information for grading.
Standard Error of Measurement
The reliability coefficient is a useful indicator of the extent to which a set of test scores is error free or error laden, but it furnishes no direct assistance in estimating the true scores of examinees. In almost all practical measurement situations, the only information available is the set of observed scores of the persons measured. Their true scores and error scores are both unknown. However, given the standard deviation of the distribution of observed scores and the reliability coefficient of those scores, the standard deviation of the hypothetical error scores, called the standard error of measurement, can be estimated:
$$s_E = s_X\sqrt{1 - r} \tag{5.10}$$
[Table 5-1. Reliability and Errors of Measurement. Summary values recoverable from the table: r = 0.865; $s_E$ = 1.67 (direct calculation from the error scores); $s_E$ = 4.56 × 0.367 = 1.67 (from equation 5.10)]
When $s_x$ and $r_{xx'}$ are substituted in equation 5.10, the value $s_e$ = 1.67 is obtained. This shows that an estimate of the standard deviation of the errors of measurement can be obtained from the standard deviation of the observed scores and the reliability coefficient, without any information about the individual errors of measurement.
The standard error of measurement provides an indication of the absolute accuracy of the test scores on the observed score scale. For example, if the standard error of measurement for a set of scores is 3, then for slightly more than two-thirds of the observed scores (about 68 percent of them) the errors of measurement will be 3 or fewer score points. For the remainder of the scores, ... of scores within which the person's true score is expected to be. Using the values from Table 5-1, we could be about 68 percent sure that Dan's true score is in the interval 10 ± 1.67, or 8.33 to 11.67. To be 95 percent sure, we would say Dan's true score is within the interval 10 ± 2(1.67), or 6.66 to 13.34. Note that the percentages 68 and 95 correspond to the percentages under the normal curve within one and two standard deviations, respectively, of the mean.
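As a check on equation 5.10 and on the intervals for Dan, the short Python sketch below computes the standard error of measurement from an observed-score standard deviation and a reliability coefficient. It then builds the ±1 and ±2 standard-error bands around an observed score. The function names are illustrative only. At full precision the Table 5-1 values give 1.68, while the text's rounded factor (0.367) gives 1.67. The intervals below use the text's 1.67.

```python
import math

def standard_error_of_measurement(sd_observed: float, reliability: float) -> float:
    """Equation 5.10: s_e = s_x * sqrt(1 - r_xx')."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd_observed * math.sqrt(1.0 - reliability)

def true_score_band(observed: float, sem: float, n_sems: int = 1) -> tuple[float, float]:
    """Observed score plus or minus n_sems standard errors
    (1 -> about 68 percent sure, 2 -> about 95 percent sure, assuming normal errors)."""
    return observed - n_sems * sem, observed + n_sems * sem

# Values reported with Table 5-1: s_x = 4.56, r_xx' = 0.865.
sem = standard_error_of_measurement(4.56, 0.865)
print(f"SEM = {sem:.2f}")  # prints SEM = 1.68 (the text rounds 0.367 first and reports 1.67)

# Dan's observed score of 10, using the text's s_e = 1.67.
for k in (1, 2):
    low, high = true_score_band(10, 1.67, k)
    print(f"+/-{k} SEM: {low:.2f} to {high:.2f}")  # 8.33 to 11.67, then 6.66 to 13.34
```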
The standard error of measurement is the most common indicator of the amount of error contained in an observed test score. But its limitations have caused a number of researchers to consider additional ways of accounting for error when interpreting a test score. One shortcoming of the standard error of measurement is that it provides the same error estimate for everyone in the group, even though it is reasonable to expect individuals to have varying error scores. Methods that permit the computation of standard errors of measurement for each of several score ranges are described by Feldt and Brennan (1989). In addition, they describe procedures for obtaining error-score estimates for each individual in the tested group.
The Problem of Low Reliability
Suppose the K-R20 from the scores on a history unit test turns out to be 0.33, and the teacher decides this value is unsatisfactory. What should be done? After all, the scores seem to have some worth, though less than had been hoped originally. Perhaps the first action, when practical, should be to improve the conditions that contributed to the low reliability and then retest. But, ordinarily, test development and administration time constraints will preclude the do-over alternative. A second alternative would be to retain the scores but discount them; that is, assign less weight to them in the decision process than had been planned originally. For example, if the scores were to count as 25 percent of the final grade, their weight might be dropped to 20 percent, or a bit less.
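The discounting alternative changes a final grade through simple weighted-average arithmetic. The sketch below uses hypothetical numbers: a unit-test score of 60 and a combined score of 85 on all other work, both on a 100-point scale. It also assumes that the weight removed from the unit test is shifted to the other work. The text does not say how the freed weight is redistributed.

```python
def composite(unit_score: float, other_score: float, unit_weight: float) -> float:
    """Weighted final-grade composite; the remaining weight goes to all other work
    (hypothetical redistribution rule, not specified in the text)."""
    if not 0.0 <= unit_weight <= 1.0:
        raise ValueError("unit_weight must lie between 0 and 1")
    return unit_weight * unit_score + (1.0 - unit_weight) * other_score

# Hypothetical student: 60 on the low-reliability unit test, 85 on everything else.
planned = composite(60, 85, 0.25)     # 0.25 * 60 + 0.75 * 85 = 78.75
discounted = composite(60, 85, 0.20)  # 0.20 * 60 + 0.80 * 85 = 80.0
print(planned, discounted)
```

Here discounting moves the grade by a little more than a point. The direction and size of the change depend entirely on how the student did on the discounted test relative to the rest of the work.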
When discounting is employed, as described above, decision making can be affected in important (and usually negative) ways. For example, if the discounted scores related to significant prerequisites for later learning opportunities, the wisdom of the decision to discount the scores would be questionable. If the discounted scores were supposed to measure higher-order thinking skills in the content area, or problem solving, or application of content, subsequent decisions might be grossly misleading. In sum, the user must be aware of the tradeoffs involved in using discounted or undiscounted scores when either has relatively low reliability.
Low reliability is symptomatic of an unhealthy testing situation, just as a high fever indicates unhealthy body tissue. We cannot tell in either case what the problem is, but the symptom suggests where to look. Was it the test, some characteristic of the examinees, or some aspect of the testing conditions? Was it a combination of these? The user should determine plausible explanations so that a decision can be made about whether to use the scores for their intended use.
FACTORS INFLUENCING SCORE RELIABILITY
When we understand the various factors that can affect the reliability of a set of scores, we can interpret and use the scores more prudently, and we can attempt to manipulate those factors through test preparation and administration activities.
Test-Related Factors
The reliability of achievement-test scores is affected by the number of items in the test, the extent to which test content is homogeneous, and the characteristics of the individual items: their difficulty and discrimination capability.
1. Test length. The Spearman-Brown formula (equation 5.5) indicates the theoretical relationship between score reliability and test length. The effect of successive doublings of the length of an original five-item test, which yielded a reliability of 0.20, is shown in Table 5-2. The same data are shown graphically in Figure 5-2. ... with added test length. Adding 60 items to a 20-item test could increase the reliability from 0.50 to 0.80. But adding 80 more ...
Assumptions. ... one statistical ... the use of the Spearman-Brown formula. The statistical assumption is that the items added to the original test to increase its length have the same statistical ... the added items should have the same ... and their addition to the test should not ... like those in the test facilitates correct ... it, or if any other factors make the examinees ... lengthened test, reliability predictions could be erroneous.
2. Test content. Homogeneity of test content also tends to enhance score reliability (Guilford, 1936). A 100-item test about the Vietnam War era is likely to provide more reliable scores than a 100-item test covering American history after the Civil War. Also, the subject matter in some courses, such as mathematics, is more organized, with greater interdependence of elements, than is the subject matter of literature. ... of test content homogeneity that makes tests of mathematics and foreign languages ...
3. Item characteristics. The items in homogeneous tests also tend to discriminate between high and low achievers better than items in tests covering more diverse content and abilities. But the ability of an item to discriminate depends also on the technical quality of the item: on the soundness ... choice item, the adequacy of the correct ... examinees of lower ability. The nature and determination of indices of discrimination and their relation to reliability will be discussed in ... individual items in most classroom tests is probably the most effective means of improving score reliability and, hence, test quality.
The difficulty of a test item affects its contribution to score reliability. ... test score reliability than an item that is ... Items that more than 90 percent or fewer than 30 percent of the examinees answer correctly cannot possibly contribute as much. Contrary to popular belief, a good norm-referenced achievement test should not include items that vary widely in difficulty.

Table 5-2. Relation of Test Length to Score Reliability
  Number of items    Reliability
         5              0.20
        10              0.33
        20              0.50
        40              0.67
        80              0.80
       160              0.89
       320              0.94
       640              0.97
         ∞              1.00

Figure 5-2. Relation of Test Length to Score Reliability (graph of the Table 5-2 data)
4. Score range. Classroom tests are sometimes constructed and scored so that the range of scores obtained is much less than it could be theoretically. For example, an essay test with a 100-point maximum score may be graded with a view to making 75 a reasonable passing score. This usually limits the effective range of scores to about 30 points. A true-false test, scored only for the number of items answered correctly, has a useful score range of only about half the number of items. A multiple-choice test, on the other hand, may have a useful score range of three-fourths or more of the number of items in the test. Hence the scores from a 100-item multiple-choice test are usually more reliable than those from a 100-item true-false test. But students generally can respond to three true-false items in the time required to respond to a pair of content-parallel ... score ranges and are likely to produce scores of equal reliability.
... set in advance as the minimum passing score. The other two tests were scored by ... false test is 75, half of the 150 items. Notice that the expected variability of the true-false and multiple-choice ... tive of the results we could expect teachers to achieve when using tests of these ...
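The Table 5-2 values can be regenerated from the Spearman-Brown formula. This sketch assumes that equation 5.5 has its usual form, r_k = k r / (1 + (k - 1) r), where k is the factor by which the test is lengthened. It also includes the variance of a 0/1-scored item, p(1 - p). That is a standard psychometric result, not one derived in this section. The item variance is largest at p = 0.5, which is one common explanation for why very easy or very hard items add little to score reliability.

```python
def spearman_brown(reliability: float, k: float) -> float:
    """Predicted reliability when a test is made k times as long (k < 1 shortens it)."""
    return k * reliability / (1.0 + (k - 1.0) * reliability)

def item_score_variance(p: float) -> float:
    """Variance of a 0/1 item answered correctly by proportion p of examinees."""
    return p * (1.0 - p)

# Rebuild Table 5-2: an original 5-item test with reliability 0.20, doubled repeatedly.
base_items, base_r = 5, 0.20
for doublings in range(8):
    k = 2 ** doublings
    print(f"{base_items * k:4d} items  r = {spearman_brown(base_r, k):.2f}")

# Item variance for a range of difficulty levels (proportion answering correctly).
for p in (0.95, 0.90, 0.70, 0.50, 0.30):
    print(f"p = {p:.2f}  p(1 - p) = {item_score_variance(p):.4f}")
```

Like the formula itself, these predictions hold only if the added items share the statistical properties of the original ones.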
Examinee-Related Factors
Score reliability can be influenced by the amount of variability in ...

Table 5-3. Hypothetical Test Statistics for Three Tests
  Test type         Items or points   Expected mean   Expected standard deviation   Expected range   Reliability
  Essay                   100              87.5                   5                  25 (75-100)        0.51
  True-false              150             112.5                  15                  75 (75-150)        0.88
  Multiple-choice         100              62.5                  15                  75 (25-100)        0.91

... ability, but they can and should try to improve both testwiseness and motivation in ways that will improve reliability.
1. Group heterogeneity. The reliability coefficient for a set of test scores depends also on the range of talent in the group tested. If an achievement test ... reliability of any subset of scores from a single grade.
The reliability coefficient, as we have said, reflects the ratio of true score ... then the observed score variance will ... ingly higher, (2) their error variances will all be about the same, and (3) the reliabilities of their scores will be increasingly higher.
There are circumstances in which the students in a class are very similar to one another in their achievement of the objectives in an instructional unit. Then the standard deviation on the unit test is very small, and the reliability coefficient is quite low as well. This may be a situation in which a high-quality test, given properly, cannot yield scores of very high reliability. Dependable differences in achievement cannot be detected by most tests if those differences are negligible, too small to be of practical consequence. Though group homogeneity may be a plausible explanation for low reliability estimates, we should not be too quick to ignore the signs of faulty test items before settling on group homogeneity as the most likely reason.
2. Student testwiseness. When the amount of test-taking experience and levels of testwiseness vary considerably within a group, such backgrounds and skills may cause scores to be less reliable than they otherwise would be. When all examinees in the group are sophisticated test takers, or when all are relatively naive about test taking, such homogeneity probably will not lead to much random measurement error. The rank order of scores is likely to be influenced only when there is obvious variability in testwiseness within the group. Students who answer an item correctly only because of their testwiseness, rather than their achievement of content, cause the item to discriminate improperly. As we noted earlier, poor item discrimination contributes to lowered reliability estimates.
3. Student motivation. If students are not motivated to do their best on a test, their scores are not apt to represent their actual achievement levels very well. But when the consequences of scoring high or low are important to examinees, the scores are likely to be more accurate. Indifference, lack of motivation, or underenthusiasm, for whatever reasons, can depress test scores in the same way that anxiety or overenthusiasm may. When motivation influences individuals in the group differently and inconsistently across testing occasions, random errors are likely to influence the scores.
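The ratio definition of reliability behind the group-heterogeneity factor can be shown numerically. This sketch makes two assumptions that the text does not state: observed-score variance equals true-score variance plus error variance (that is, true and error scores are uncorrelated), and both groups have the same error standard deviation of 3 points. All numbers are hypothetical.

```python
def reliability_from_variances(true_sd: float, error_sd: float) -> float:
    """Reliability as true-score variance divided by observed-score variance."""
    true_var = true_sd ** 2
    error_var = error_sd ** 2
    return true_var / (true_var + error_var)

# Same test, same error SD (3 points), two groups that differ in true-score spread.
for label, true_sd in [("heterogeneous group", 9.0), ("homogeneous group", 3.0)]:
    r = reliability_from_variances(true_sd, 3.0)
    print(f"{label}: r = {r:.2f}")  # 81/90 = 0.90, then 9/18 = 0.50
```

The error variance is the same in both groups. Only the shrinking true-score spread lowers the coefficient, which is why a sound test given to a very homogeneous class can still yield a low reliability estimate.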
Administration-Related Factors
As with test-related effects and most examinee-related effects, test users ...
1. Time limits. Scores from a test given under highly speeded conditions ... the apparent increase in reliability that results from speeding up a test is usually ... one must have estimates for both ability and speed. By splitting a test into halves ... examinees are able to attempt all items.
2. Cheating opportunities. Occurrences of cheating by students during a test contribute random errors to the test scores. Some students are able to ...
CRITERION-REFERENCED SCORE RELIABILITY
... hold the mistaken notion that scores on criterion-referenced tests exhibit little ...
... ability are inappropriate for criterion ...
... devoted to discussing the variety ...
... measurement for several criterion-referenced ...
... applicability of the "norm-referenced" m...
Scores or Decisions
The reliability of scores from criterion-referenced tests can be understood better if we first examine the types of interpretations we might make with such scores. For convenience, these interpretations can be labeled as absolute-performance, domain-referenced, and mastery interpretations.
Absolute-performance interpretations make use of a number scale that shows levels of accomplishment or achievement ... Examples are a 7-point ... used to evaluate themes, a 10-point rating form for judging the quality of a woodworking project, a scale for judging the quality of a figure skating performance, and a 15-point scale ... to judge the quality of short stories written and presented by pairs of students. In each of these cases, the score obtains meaning from the behavior descriptions attached to each scale point. A ... a student has achieved, and no reference to the performance of others ... cannot be satisfied with a retest ... original scores. Correlation coefficients ... measurement are in the same relative position ... the same relative position on retesting.
There are no generally accepted indices of reliability for scores based on absolute-performance interpretations. However, appropriate evidence of score consistency would include a demonstration that the average discrepancy between test and retest scores was small for the group in question. "Small" would need to be defined by the user, based mainly on the "fineness" of the behavior descriptions. ... high agreement among independent raters of the ... products or projects or other performances is further evidence of the consistency of the scores. Again, absolute differences rather than relative differences must be examined.
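Here is a minimal sketch of the test-retest evidence described above, using hypothetical ratings for five students on a 7-point scale. It computes the mean absolute test-retest discrepancy. Whether that value counts as "small" is still the user's judgment. The second comparison adds 2 points to every rating. That shift leaves the rank order unchanged, so a correlation coefficient would still be 1.0, yet every absolute score has moved. This is why absolute rather than relative differences are examined.

```python
def mean_absolute_discrepancy(test: list[float], retest: list[float]) -> float:
    """Average absolute difference between paired test and retest scores."""
    if len(test) != len(retest) or not test:
        raise ValueError("need paired, non-empty score lists")
    return sum(abs(a - b) for a, b in zip(test, retest)) / len(test)

# Hypothetical ratings on a 7-point scale for five students.
first = [5, 6, 3, 7, 4]
second = [5, 5, 4, 7, 6]
print(mean_absolute_discrepancy(first, second))   # (0 + 1 + 1 + 0 + 2) / 5 = 0.8

# Same rank order on retest, but every rating is 2 points higher.
shifted = [s + 2 for s in first]
print(mean_absolute_discrepancy(first, shifted))  # 2.0
```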
Domain-referenced interpretations are made when the scores indicate the percentage of some clearly defined content or performance domain attained by the student. Some of the common domains of interest are found in the curriculum of our elementary schools: names of the letters of the alphabet, products from the multiplication of pairs of single-digit numbers, words on the third-grade spelling list, and name-location associations for the 50 states.
In each case, a test score obtains meaning from an understanding of what the "whole," the domain, constitutes. "Tommy knows 83 percent of his letters" is not ... told there are 780 words in all. Setting aside these shortcomings, how can the reliability of scores from a domain-referenced spelling test be determined? Since we are again interested in absolute interpretations, correlational methods will ...
As with absolute-performance interpretations, the most reasonable evidence of the reliability of domain-estimate scores is a small average discrepancy ... score based on retesting. Again, no generally accepted methods have been developed ... in domain scores. Such scores are descriptive, but they offer no prescriptions for further instruction and, unless a cutoff score is introduced, they do not help the ... performance standards (often called decision rules)
... used in mastery interpretations:
1. A posttest score of at least 85 percent is needed to proceed to the next unit.
2. A score of at least 22 out of 25 is needed to pass the driver's test.
3. Anyone with a score below 40 will be placed in the special reading program.
4. Scores higher than 3 on the 7-point scale will be regarded as acceptable.
When the interpretive goal is to make a decision about performance, the ... retesting. It should matter little whether Stephen earns a 7 or a 4 on the retest, as long as the decision about his performance is the same. Thus, for mastery interpretations, score reliability takes a back seat to decision consistency with regard to measurement error. Whenever criterion-referenced scores are used to make dichotomous decisions such as pass-fail, mastery-nonmastery, or proceed-remediate, any of several methods may be useful for estimating decision consistency.
Estimating Agreement
When dichotomous decisions such as mastery-nonmastery are the object of criterion-referenced interpretation, the consistency of classification is of interest. ... Table 5-4 show that 17 students were ...

$$ p_o = \frac{a + d}{N} \qquad (5.11) $$

... shows that 80 percent of the decisions about these 25 students were consistent from ... (20 percent) were classified differently on ... The highest value of $p_o$ is 1.00, but its lowest value is not likely ... to classifications made randomly, by coin toss.
Kappa, a second coefficient of agreement, ... how much the classifications that are based on ... consistency relative to random classification. For example, if $p_c$ turns out to be 0.60 for ...

$$ \kappa = \frac{p_o - p_c}{1 - p_c} \qquad (5.12) $$

where $p_c$, the proportion of agreement that could be expected by chance, is calculated as

$$ p_c = \frac{(a + b)(a + c) + (c + d)(b + d)}{N^2} \qquad (5.13) $$
Table 5-4. Example of Decision Consistency Estimation

                                  Other testing
                             Mastery   Nonmastery   Total
  One testing   Mastery      a = 15    b = 3          18
                Nonmastery   c = 2     d = 5           7
                Total          17        8            25

  p_o = (15 + 5)/25 = 0.80
The values of a, b, c, and d are represented in Table 5-4. Can you verify that $p_c$ = 0.579 for the data in Table 5-4? What does this value mean? It means that about 58 percent of the pairs of classifications would be consistent, by chance, if the outcome from the second test administration were independent of that from the first. For our illustration, $\kappa = (0.80 - 0.579)/(1 - 0.579) = 0.525$. If the value of kappa were close to zero, it would mean that our classifications were not any better than we could have made by chance, without testing. Thus $\kappa$ = 0.525 should be interpreted to mean that something other than random factors is accounting for the level of decision consistency obtained through the two test administrations. In other words, true scores rather than error scores are the primary reason that decisions were so consistent from one time to the next.
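Equations 5.11 to 5.13 can be checked against Table 5-4 with a few lines of Python. The cell counts a = 15, b = 3, c = 2, d = 5 come from Table 5-4, with a and d as the consistent cells. The guard for $p_c$ = 1 is an addition of this sketch and is not part of the text. When every examinee receives the same decision on both testings, kappa's denominator is zero and kappa is undefined.

```python
import math

def agreement_indices(a: int, b: int, c: int, d: int) -> tuple[float, float, float]:
    """Return (p_o, p_c, kappa) for a 2 x 2 decision-consistency table.

    a, d: consistent classifications (mastery both times, nonmastery both times)
    b, c: inconsistent classifications
    """
    n = a + b + c + d
    p_o = (a + d) / n                                        # equation 5.11
    p_c = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2   # equation 5.13
    if p_c == 1.0:
        # Everyone got the same decision both times: kappa is 0/0, undefined.
        return p_o, p_c, math.nan
    kappa = (p_o - p_c) / (1.0 - p_c)                        # equation 5.12
    return p_o, p_c, kappa

p_o, p_c, kappa = agreement_indices(15, 3, 2, 5)
print(f"p_o = {p_o:.2f}, p_c = {p_c:.3f}, kappa = {kappa:.3f}")
# prints: p_o = 0.80, p_c = 0.579, kappa = 0.525
```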
The illustration of obtaining an agreement coefficient parallels the test-retest method used with norm-referenced measures. However, when equivalent forms are available or when content sampling is a major concern, estimation procedures that parallel the norm-referenced equivalent-forms methods should be used. Berk (1984) and Subkoviak (1984) have dealt with these and other related methods in some detail. In addition, Subkoviak (1988) has provided tables to ease the computational burden of calculating $p_o$ and kappa.
Factors Affecting Decision Consistency
Many of the factors that affect the magnitude of reliability coefficients also affect agreement coefficients, because errors associated with examinees, the test itself, and the testing conditions can occur in any measurement situation. And, of course, these factors often operate simultaneously rather than in isolation.
1. The cutoff score location. When the cutoff score is nearer the highest score or the lowest score, decision consistency will be greater. A cutoff score at about the median is likely to produce the most inconsistent decisions. This idea conforms with the notion that extreme levels of performance are easy to pick out, but shades of difference among typical performers are difficult to detect.
2. The homogeneity of scores. If the scores are homogeneous and the cutoff score is somewhere among them, the number of inconsistent decisions could be quite high. Much greater decision consistency should be expected when scores are highly variable and the cutoff score is located somewhere within the distribution. For example, even if the cutoff score is at the median, more consistency is likely to be achieved with scores from a rectangular distribution than from one that is negatively skewed.
3. Test length. Intuitively, it seems logical that the more opportunities we give examinees to perform, the more confident we can be in deciding what their ranking is, how much they know, or whether they can perform a specific task well enough. If a mastery decision rule requires 75 percent for passing, greater decision consistency should result from twelve trials than from eight, or from eight items than from four.
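The cutoff-location and test-length effects can be illustrated with a simple binomial model. The model is an assumption of this sketch, not a method from the chapter. It treats each examinee as having a fixed true proportion of items they can answer, with each item answered correctly independently at that rate, and it treats the two testings as independent. Under the 75 percent rule from item 3, the sketch computes the chance that two testings give the same pass/fail decision. Examinees clearly above or below the cutoff get more consistent decisions as the test grows. For an examinee sitting right at the cutoff, the decision stays close to a coin flip whatever the length.

```python
import math

def pass_probability(true_prop: float, n_items: int, cutoff: float = 0.75) -> float:
    """P(observed proportion >= cutoff) under a binomial model of item responses."""
    needed = math.ceil(cutoff * n_items)  # 3 of 4, 6 of 8, 9 of 12 at 75 percent
    return sum(
        math.comb(n_items, k) * true_prop ** k * (1 - true_prop) ** (n_items - k)
        for k in range(needed, n_items + 1)
    )

def decision_consistency(true_prop: float, n_items: int, cutoff: float = 0.75) -> float:
    """Probability that two independent testings give the same pass/fail decision."""
    p = pass_probability(true_prop, n_items, cutoff)
    return p ** 2 + (1 - p) ** 2

# Hypothetical examinees clearly above (0.90) and below (0.60) the 75 percent cutoff.
for true_prop in (0.90, 0.60):
    row = ", ".join(f"{n} items: {decision_consistency(true_prop, n):.2f}" for n in (4, 8, 12))
    print(f"true proportion {true_prop:.2f} -> {row}")

# Cutoff location: an examinee near the cutoff versus one far above it, 8 items.
print(decision_consistency(0.75, 8), decision_consistency(0.95, 8))
```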
The theory and technical advances associated with the reliability of scores from criterion-referenced measures have been developed and implemented slowly. For a more advanced treatment of these topics and for additional computational illustrations, see Crocker and Algina (1986) and Kane and Brennan (1980).
SUMMARY PROPOSITIONS
1. Educational tests always yield less than perfectly consistent results because of content sampling errors, examinee performance errors, scoring errors, or administration errors.
2. The meaning of score consistency varies depending on the type of score interpretation: norm-, criterion-, or objectives-referenced.
3. Test score reliability can be defined as the correlation between scores on two equivalent forms of a test for a specified group of examinees.
4. Reliability can be defined theoretically as the proportion of observed score variance due to true-score variance.
5. The reliability of scores for a given test may be affected by random error sources but not by ... sources or systematic errors.
6. Neither test-retest nor equivalent-forms methods are practically useful for estimating the reliability of scores from a classroom test.
7. The Spearman-Brown prophecy formula is useful for estimating the reliability of scores from a lengthened or shortened test.
8. The Kuder-Richardson formulas yield estimates of score reliability from data on the variability of ...
9. ... reliability of scores from tests not scored dichotomously.
10. The more widely the items in a test vary in difficulty, the more seriously the Kuder-Richardson formula 21 may underestimate reliability.
11. The minimum acceptable level of score reliability is mainly a function of the intended use of the scores.
12. The standard error of measurement is an estimate of the general magnitude of errors, expressed in test-score units.
13. The standard error of measurement can be estimated by multiplying the standard deviation of the scores by the square root of a difference, which is 1 minus the reliability coefficient.
14. Longer tests composed of more discriminating items are likely to yield more reliable scores than shorter tests composed of less discriminating items.
15. Tests composed of homogeneous content are likely to produce more reliable scores than those containing heterogeneous content.
16. The more variable the scores obtained from a test, the higher their reliability is likely to be.
17. Scores obtained from groups heterogeneous in achievement are likely to be more reliable than those obtained from homogeneous groups.
18. Groups that are heterogeneous in testwiseness are likely to produce less reliable scores than homogeneous groups.
19. Internal consistency reliability coefficients are likely to be overestimates when the test is speeded.
20. ... domain-referenced scores.
21. It may be more important to estimate decision consistency than score reliability for tests used to make mastery decisions.
22. Percent of agreement and kappa are both methods of estimating decision consistency.
23. Decision consistency is affected by such factors as cutoff score location, homogeneity of score distribution, and test length.
QUESTIONS FOR STUDY AND DISCUSSION
1. Why might a set of scores be considered quite reliable for criterion-referenced purposes but low in reliability for norm-referenced purposes?
2. What misunderstandings are demonstrated by a person who states, "This test is not reliable"?
3. Why are systematic errors considered part of true-score variance rather than part of error?
4. How do systematic errors affect norm-referenced and criterion-referenced scores differently?
5. Why are stability estimates of reliability generally inappropriate for use with achievement tests?
6. What assumptions must be made when using the Spearman-Brown formula to estimate the reliability of scores from a shortened test?
7. What is the formula for computing the standard error of measurement for a set of scores?
8. How can the standard error of measurement be used to describe the error associated with a perfect score (100 percent) on a test?
9. Under what circumstances might doubling the length of a 30-item test produce a decrease in score reliability?
10. How does the difficulty level of items affect the reliability of scores from a multiple-choice test?
11. Why are correlation-based reliability estimates less useful with scores from domain-referenced tests than with scores from norm-referenced tests?
12. What happens to the values of $p_o$ and kappa when all examinees pass on both testing occasions?
Validity:
Interpretation and Use

There are good reasons why validity is such a ... concept among ... as you will see, ... understood ... The meaning of validity has evolved and continues to change slightly with each passing decade. Consequently, measurement specialists have been inconsistent in their use of the term, and many ideas have been proposed for bringing order to the confusion. Then, to ... relatively incomplete understanding of validity and its ... Despite the potential for confusion, it is ... to understand the concept of validity, because such understanding is itself the foundation for fair and proper use of tests and measurements of all kinds. All the precautions and special care taken in the test development process, in the administration of tests, and in the reporting and interpretation of scores are intended to enhance validity. The main goal of this chapter is to describe the several facets of validity and the related principles that can be applied to ensure appropriate use of test scores.
THE MEANING OF VALIDITY
The term validity, when applied to a set of test scores, refers to the consistency (accuracy) with which the scores measure a particular cognitive ability of interest. Thus, there are two aspects to validity: what is measured and how consistently it is measured. The cognitive abilities referred to are abilities to perform observable tasks, abilities that require a command of substantive knowledge. The consistency of measurement refers to the reliability of the scores. Reliability is a necessary ingredient of validity, but it is not sufficient to ensure validity. Unless the test scores measure what the test user intends to measure, no matter how reliably, the scores will not be very valid. Thus, from this perspective, validity refers to the meaning of the scores obtained from a test administered to a certain group. Does a high score mean that the individual is particularly competent with respect to the knowledge the test is supposed to be measuring? Does a low score mean the examinee has little ability? What else might a high or low score mean?
Validity has traditionally been regarded as a test characteristic, but the most current thinking of measurement experts has changed that. The most recent Standards for Educational and Psychological Testing (American Psychological Association, 1985) associates the term with a set of test scores rather than with the test used to produce them. In particular, validity has to do with the meaning of the scores and the ways we use the scores to make decisions. We ask such questions as "How well do these scores reflect physics achievement?" or "How appropriate is it to use these math test scores to decide who should take algebra next year and who should not?" or "How appropriate is it to infer that high scorers on this certification test will be excellent teachers?" Here are some situations that illustrate a variety of validity concerns.
1. The test is intended to measure higher-order thinking skills in earth science, but most items require only recall of facts, terms, and scientific principles. It would be inappropriate to infer that high scorers can apply, solve problems, or interpret earth science information; the nature of the test items did not require the demonstration of such skills. The meaning of the scores is somewhat different from what the test developer intended. For what other purpose might these scores be more valid?
2. The test is supposed to measure achievement of concepts and relationships associated with the democratic form of governance, but the questions require a very high level of reading skill and a highly developed vocabulary. The meaning of the scores from this test is compromised by the "extra" verbal skills that are required. Low scorers may be poor readers rather than low social studies achievers. These scores are not a very good representation of the kinds of achievement the teacher expected.
3. The scores on the persuasive essay that was given as a final exam in ninth-grade English were used by the curriculum committee to assess the effectiveness of the curriculum in addressing writing mechanics: spelling, grammar, punctuation, and capitalization. Unless writing mechanics was a scoring criterion for the essays, the scores will be influenced mainly by the amount and quality of evidence the writer offered to support the position taken. Even if writing mechanics was one of the scoring criteria, the scores will be "contaminated" by the other factors that were used to judge the quality of the essays. It is probably inappropriate to make curricular judgments about mechanics on the basis of these scores.
4. Students noticed that the multiple-choice items of the reading test contained qualifiers (most, some, usually) in the keyed responses and absolutes (all, never, every) in most distracters. This test was made easier than it was intended to be because of the item-writing methods of the teacher. The scores may well measure idiosyncratic testwiseness better than reading comprehension. Since the scores may not have the meaning intended by the teacher, it would be inappropriate to decide how well pupils are progressing in reading on the basis of those scores.
5. The test is a measure of word analysis skills (beginning sounds, ending sounds, and rhyming), but the teacher's enunciation and inflection helped lead students to the correct answers. The scores on this test may still be a measure of word-sound association skills, but their interpretation is complicated by the teacher's improper test-administration procedures. The value of the scores for their intended use (grouping for reading instruction) has been jeopardized because the meaning of the scores has been distorted.
6. The directions on the math test indicated that "+" and "0" should be used to respond true or false; answers using ... were scored as wrong, regardless of their correctness. Obviously, the scores of those who failed to heed the directions will not be a measure of their mathematics achievement. Since it is unclear what the scores mean, it also is unclear how they should be used.
These situations illustrate what validity is all about: the meaning of the scores we obtain from the administration of a particular test. There are many factors that can detract from the meaning we had hoped to attach to a set of scores. Once the meaning of the scores has been distorted, the scores become less appropriate for their intended use. That scores are more or less distorted and less appropriate means that validity is a matter of degree. Scores are not absolutely valid or invalid for a particular use; such perfection is ordinarily not possible to achieve with the educational measures available to us. The burden is chiefly on the users of test scores to justify (1) their interpretations of a set of scores and (2) the appropriateness of their use of the scores. In either case, some form of tangible evidence is required to support the justifications.
EVIDENCE USED TO SUPPORT VALIDITY
The appropriateness of using test scores for making particular interpretations or decisions should be judged from evidence gathered and presented by the test user. There is a variety of evidence that might be presented to demonstrate the valid use of a set of scores, and most could be grouped into one of these categories: content related, criterion related, and construct related. These are not types of validity, but types of validity evidence. The content type is concerned with how well the test content represents the domain of abilities the test is intended to measure. The criterion type is concerned with the relationship, usually expressed by a correlation coefficient, between the test scores and the scores on some criterion measure of relevant abilities. Finally, the construct type is concerned with the overall meaning of the scores: what the responses to the individual items, taken together, mean as a psychological construct.
It is convenient to discuss the three types of evidence separately, but doing so conveys the false impression that these are independent notions and that any one of them might be used, by itself, to support validity judgments. In fact, construct-related evidence incorporates the other two: both content- and criterion-related evidence are intended to support the meaning of the scores. The process of gathering validity evidence is meant to provide evidence for a particular interpretation of the scores and to weigh it against several other competing interpretations, for example, that the scores reflect the achievement of certain skills rather than being unduly influenced by certain irrelevant factors. [...]
It is important to realize that the validity of scores can change over time. [...] the typewriter. In fact, if Acme continues to use its typing test, it will likely [...] Problems are encountered in using scores contained in school records [...] making inferences about future performance may also change.
Content-related Validation Evidence
[...] the attainment of knowledge in the content area [...]. Such interpretations, whether norm-referenced or criterion-referenced, are based on [...] another different but similar set of items [...]. An inference must be made that relates [...] to another [...]. For example, with a written driver's license test, we infer that those who score higher on the test will be safer and more responsible drivers. [...] content must be based on a delineation of the knowledge, skills, and understandings required for safe driving. Some of the statements that might be offered to develop a safe-driver definition are:
1. Distinguish the meanings of road signs of different colors.
2. Describe the function of a carburetor.
3. Describe the procedure for gaining control of a car that begins to skid on ice or [...].
4. Find the shortest distance between two cities using a highway map.
5. Identify the meaning associated with signs of varying geometric shapes.
6. Describe the procedures for changing a flat tire.
Some statements could be excluded from the definition because they represent useful skills, but not skills essential for safe driving. The second and fourth statements probably fit this description. [...]
Once the domain has been defined explicitly, it is possible to compare each item with the domain definition to assess the relevance of the items. If the items have been written to match the domain definition precisely, the inferences we wish to make about safe and responsible driving can be largely valid. From this point of view, a major portion of the answer to the validity question is inherent in the test-development process. That is, content-related evidence is furnished simultaneously with test-development activities. The domain definition (the boundaries of content to be included), judgments about the relevance of test items, and steps taken to achieve representativeness of content serve the dual purpose of guiding test development and documenting validation evidence.
Written documentation of the domain specifications, the nature of the test tasks, and the reasons for using those tasks provides intrinsic rational validity evidence (Ebel, 1983). This evidence is intrinsic because it is built into the test; it is rational "because it is derived from rational inferences about the kind of tasks that will measure the intended ability" (p. 7). When the test maker is also the test user, evidence for appropriate score use is built into the product; that is, both the specifications for test construction and the items themselves are necessary evidence for the validation process.
Most test makers, including teachers, aim to produce tests that demonstrate intrinsic rational validity, but they seldom acknowledge this goal explicitly. They seldom regard the process of test construction as a process of validation; they seldom document in writing the reasons for particular decisions made in test development. Useful written documentation to support intrinsic rational validity provides answers to these questions.
1. About what ability are inferences to be made? As part of this description, it is sometimes useful to mention certain extraneous abilities that should be excluded intentionally from the main ability of interest. For example, in tests of chemistry problem solving, the influence of reading ability should be minimized.
2. What domain of knowledge, skills, or tasks provides a basis for such inferences? A content outline that describes the tasks of interest is needed. The outline should cover the entire universe of content to be measured, not just the content reflected in the specific test items that might be developed. Sometimes, however, the headings in a text are a good starting point for defining the domain.
3. What is the relative importance of the subdomains that comprise the domain definition? Are there sets of related tasks that are more important than others? If so, the test development plan should reflect the differences so that more test items will be included for the more important subdomains. One subdomain might receive a weight of 10 percent, for example, while a more important one is assigned 25 percent.
4. What kinds of test items have properties that will permit the testing of the whole range of the domain elements? For example, in view of the tasks outlined in step 2, are either essay or short-answer items more appropriate than multiple-choice?
5. Do the test items adequately reflect the domain knowledge, skills, and tasks? This question relates to the match between test item content and the content specified in the domain outline. How well did the item writer translate the task descriptions into test items?
6. Do the subsets of test items adequately represent the domain in terms of the relative importance of the subdomains? Is the content weighting in the test consistent with the decisions made in step 3?
7. What domain or subdomain, aside from the domain of interest, is being measured by the test? Are there extraneous factors that could interfere with the score interpretation? For example, is reading ability, vocabulary level, or computational skill, beyond that intended, required to answer items correctly?
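Steps 3 and 6 involve simple arithmetic: turning subdomain weights into item counts, then checking the finished test against that plan. The following is a minimal sketch, assuming a hypothetical 40-item test with four subdomains labeled A to D and weights that include the 10 percent and 25 percent figures from step 3. The function names and the largest-remainder rounding rule are illustrative choices, not part of the procedure described here.

```python
from __future__ import annotations


def allocate_items(weights: dict[str, float], total_items: int) -> dict[str, int]:
    """Step 3 (sketch): split total_items across subdomains in proportion to weight.

    Largest-remainder rounding keeps the counts summing exactly to total_items.
    """
    weight_sum = sum(weights.values())
    exact = {name: total_items * w / weight_sum for name, w in weights.items()}
    counts = {name: int(x) for name, x in exact.items()}  # round each share down
    leftover = total_items - sum(counts.values())
    # Give any leftover items to the subdomains with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda n: exact[n] - counts[n], reverse=True)
    for name in by_remainder[:leftover]:
        counts[name] += 1
    return counts


def weighting_gaps(planned: dict[str, int], actual: dict[str, int]) -> dict[str, int]:
    """Step 6 (sketch): actual minus planned item counts, for subdomains off the plan."""
    gaps = {}
    for name in planned.keys() | actual.keys():
        diff = actual.get(name, 0) - planned.get(name, 0)
        if diff != 0:
            gaps[name] = diff
    return gaps


if __name__ == "__main__":
    # Hypothetical weights (percent) for four subdomains of a 40-item test.
    plan = allocate_items({"A": 10, "B": 25, "C": 25, "D": 40}, 40)
    print("planned:", plan)
    # Hypothetical counts found when the finished test is reviewed.
    print("gaps:", weighting_gaps(plan, {"A": 6, "B": 10, "C": 8, "D": 16}))
```

A nonzero gap does not by itself make the test unrepresentative; it marks a place where the weighting decisions of step 3 and the finished test disagree and a judgment is needed.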
It may seem that intrinsic rational validity evidence is sufficient for the validation of achievement tests, but it is not. Such evidence focuses on the test: its domain, the relevance of its items, and the representativeness of its content. But since factors other than the test content can affect the magnitude of the scores, evidence beyond test content is needed to support score interpretation. Content-related evidence alone is not sufficient because it fails to take into account response consistency (reliability) or other aspects of the testing environment that might impair score interpretation. For example, norm-referenced test items that fail to discriminate between high and low achievers may be content relevant and technically adequate. But such items will not help to produce a rank order of scores that will permit useful norm-referenced interpretations for selection or classification purposes. Because the test scores obtained will be somewhat low in reliability, they also will be deficient in terms of validity.
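Whether an item discriminates between high and low achievers can be checked with an upper-lower discrimination index: the proportion of high scorers who answer the item correctly minus the proportion of low scorers who do. This index is not described in this section. The sketch below assumes items scored 0/1 and uses the top and bottom 27 percent of total scores, a widely used convention rather than a requirement. The function name is hypothetical.

```python
def discrimination_index(item_correct, total_scores, fraction=0.27):
    """Upper-lower index D = p(upper) - p(lower) for one item (sketch).

    item_correct: 0/1 scores on the item, one per examinee.
    total_scores: total test scores, in the same examinee order.
    fraction: share of examinees in each extreme group (assumed 27 percent).
    """
    n = len(total_scores)
    k = max(1, round(n * fraction))
    # Rank examinees by total score; ties keep their original order.
    order = sorted(range(n), key=lambda i: total_scores[i])
    lower, upper = order[:k], order[-k:]
    p_upper = sum(item_correct[i] for i in upper) / k
    p_lower = sum(item_correct[i] for i in lower) / k
    return p_upper - p_lower


if __name__ == "__main__":
    totals = [3, 5, 6, 8, 9, 11, 12, 14, 15, 17]  # invented total scores
    print(discrimination_index([0, 0, 0, 1, 1, 1, 1, 1, 1, 1], totals))
    print(discrimination_index([1] * 10, totals))
```

An item that everyone answers correctly gets D = 0. It may still be content relevant, but it contributes nothing to separating examinees in a norm-referenced rank order, which is the point made above.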
Content-related evidence for achievement tests also must be supplemented by information about the administration conditions, the scoring criteria, and the nature of the examinees. Previous examples have shown how the test administrator can provide clues about correct and incorrect responses and how scoring rules can distort the meaning of scores. But, in addition to these factors, examinee characteristics other than achievement can cause scores to be higher or lower than they ought to be. For example, Messick (1989) lists these alternative explanations for low achievement-test scores: lack of sufficient knowledge, high anxiety, visual impairment, low level of motivation, limited English skills, and low level of concentration. Though low achievement may be the most plausible explanation for low scores, the burden is on the user to show that these other factors are not influencing the scores unduly. Such evidence is not to be found in the test; intrinsic rational validity evidence must be supplemented by information found in examinee responses, in the testing conditions, and in the scoring process.
We have long known that validity depends on the purpose for which test scores are used, the group with which the test is used, and the conditions under which the test is used. Validity depends on more than the quality of the test. The responsibility of the test developer is to be as clear as possible about what is being measured and to produce a test that measures as accurately as possible. The responsibility of the test user is to make valid decisions using the test scores and all other available, relevant information, including documentation furnished by the test developer.
Criterion-related Validation Evidence
A criterion measure is an accepted standard against which some test is compared to validate the use of the test as a predictor. For example, scores on a dictation test are a generally accepted measure of spelling ability. If we were to build and give a true-false spelling test, we might compare the true-false scores with scores obtained on a comparable dictation test to demonstrate that the true-false test is an acceptable measure of spelling achievement. The dictation test is the standard used for comparison in trying to establish the legitimacy of the new spelling test.
Criterion-related evidence takes either of two forms: one relates to determining present standing on a criterion measure, and the other relates to predicting future performance on a criterion measure. The type of evidence needed for a given situation depends on how the scores from the test in question are intended to be used. For example, the true-false spelling test referred to above was intended to be used instead of the dictation test because of the increased efficiency in scoring afforded by the true-false test. Concurrent evidence would be useful to show that students appear in the same relative rank order on the two measures. A correlation coefficient of 0.80, for example, might be regarded as acceptable concurrent evidence.
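The concurrent evidence described above comes down to a correlation coefficient between two sets of scores for the same students. Below is a minimal sketch of the Pearson product-moment correlation, assuming paired scores on the true-false test and the dictation test. The score lists are invented for illustration. The 0.80 in the text is an example of an acceptable value, not a fixed cutoff. Python 3.10 and later also provide `statistics.correlation` for the same calculation.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between paired score lists x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired lists with at least two scores each")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a score list with no variation has no defined correlation")
    return sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    # Invented scores for eight students, listed in the same order on both tests.
    true_false = [12, 15, 18, 20, 22, 25, 27, 30]
    dictation = [10, 14, 15, 21, 19, 24, 28, 29]
    print(f"r = {pearson_r(true_false, dictation):.2f}")
```

A high r shows that the two tests rank students similarly. It does not show, by itself, that either test measures spelling well. That is why the criterion measure must be trustworthy in its own right.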
When test scores are used to select individuals for admission, employment, extraordinary educational opportunity, and the like, predictive evidence is needed. There is a need to show that a positive relationship exists between scores on the test (the predictor) and scores on some acceptable measure of future performance (the criterion). For example, a developmental screening-test score might be used to predict which five-year-olds are likely to succeed in kindergarten. If the criterion measure for "success" is "teacher's rating of social, emotional, and academic development at the end of kindergarten," the essential validity evidence might include the correlation between screening-test scores and teachers' ratings. If a sufficiently strong positive correlation is obtained, we might conclude that the test is a useful predictor of future performance; that is, there is support for using the scores to predict success in kindergarten.
The correlation between test and criterion scores has been regarded by many as the best kind of evidence to support valid achievement-test use. The correlation seems to provide an independent, objective validation of the subjective judgments and decisions that must be made during test development. But the validity of the scores from widely used tests of academic achievement has seldom been supported with impressive criterion-related evidence. This could mean that the tests are simply poor tests. But a more plausible explanation is that how well a test measures what it is intended to measure cannot be conveyed by the correlation between test scores and scores on the criterion measure. Why?
In some cases, appropriate criterion measures are simply unavailable. What should be used as a criterion measure for a test of ability in fifth-grade arithmetic or a test of ability to understand contemporary affairs? The tests themselves are usually intended to be the best measures of these abilities that can be devised. If better measures were available to serve the role of criterion, they also should be more valid than the test under validation. That many test developers have failed to present convincing empirical evidence of the validity of the scores from their tests is not for want of concern, effort, or skill. It is because correlational evidence for the validity of scores from most achievement tests is essentially unproducible. The same can be said of scores obtained from most professional licensure examinations (Kane, 1982).
In many cases, appropriate criteria turn out to be difficult or nearly impossible to measure accurately. On-the-job performance ought to be an appropriate criterion for an employee selection test. But for any except the simplest jobs, what constitutes satisfactory performance is hard to define, expensive to assess, and difficult to measure impartially. The relevance of performance ratings as criteria for the validity of a written test is open to question, also. A written test cannot possibly measure many of the characteristics that contribute to high ratings for job performance. Such a test, however, can measure desirable characteristics that are unlikely to show up clearly on a performance rating. In situations like these, there is little justification for presenting evidence on correlation with a criterion as the primary evidence of validity.
A major problem with empirical test validation is the imperfect or uncertain validity of the criterion scores. Criterion scores themselves should be highly valid measures of the ability being tested. This also means the criterion scores should be quite reliable, and their reliability coefficient should be included as validity evidence. After all, a standard used for judging the validity of test scores certainly ought to be at least as valid as the scores being judged against that standard. The validity of the scores from the criterion measure needs to be addressed as rigorously and as thoroughly as the validity of the test scores in question.
Correlation procedures hold little promise for providing the main evidence of validity, but they may be useful in providing secondary, confirming evidence. If ability A is related in some degree to abilities B, C, and D, then scores from a test of A should correlate to some degree with scores from tests of B, C, and D. If they do, the confidence that test A measures ability A is increased.
It is important to note that such secondary evidence of validity cannot take the place of content-related validity evidence. What test A measures is determined mostly by the tasks included in it. One cannot discover what test A measures only by studying the correlation of scores from test A with scores from tests B, C, and D. How do we know what these other tests measure? We would need to examine the tasks included in them, the conditions under which they were administered, the nature of the examinees, and the procedures for scoring. If these are the bases for the meaning of scores from tests B, C, and D, should they not be the bases for the meaning of scores from test A as well?
Concurrent and predictive evidence both require correlational data and, consequently, both situations are plagued by the problem of obtaining an appropriate criterion measure. The prediction of college freshman grade-point average using ACT scores illustrates the dilemma. Both measures reflect the ability to do college-level work, but certainly the criterion measure, grade-point average, is influenced by many other important factors: the nature of the coursework, student effort and motivation, grading policies in the courses, and the ability to establish supportive relationships among peers. And the criterion measure will represent achievement in English, mathematics, and science only to the extent that coursework in these areas was taken by each student in the validation sample. Correlations between ACT composite scores and freshman end-of-year grade-point averages tend to be about 0.50. There is much these two measures do not have in common. Can a more satisfactory criterion be identified, one that is practical to implement and fair to students regardless of the pattern of courses taken in their first year? The criterion problem demonstrates the need for additional validity evidence to supplement the information supplied by correlational evidence, which itself rests on criteria of questionable validity. A variety of evidence, all pointing to the same conclusion about score validity, is the most convincing justification for test-score use.
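One conventional way to make "there is much these two measures do not have in common" concrete is to square the correlation (the coefficient of determination). This step is not spelled out in this section. Under a linear model, the squared correlation gives the share of criterion variance associated with the predictor. For the correlation of about 0.50 between ACT composite scores and freshman grade-point average:

```latex
r^2 = (0.50)^2 = 0.25, \qquad 1 - r^2 = 0.75
```

In this linear sense, roughly a quarter of the variation in freshman grade-point average goes along with ACT scores. About three quarters is left to other influences, such as those listed above.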
Construct-related Validation Evidence
The term construct refers to a psychological construct, a theoretical conceptualization about an aspect of human behavior that cannot be measured or observed directly. Examples of constructs are intelligence, achievement motivation, anxiety, achievement, attitude, dominance, and reading comprehension. Construct validation is the process of gathering evidence to support the contention that a given test indeed measures the psychological construct the makers intend for it to measure. The goal is to determine the meaning of the scores from the test, to assure that the scores mean what we expect them to mean.
If our purpose is to measure mathematics problem-solving achievement, for example, the goal of construct validation is to gather evidence that will show that the items in the test require math problem-solving ability. The definition of the construct that was used for test development defines the construct. In this case, if the definition indicates that all four arithmetic operations may be included and that all problems should require at least two steps for solution, then each test item will need to be reviewed for compliance. Since neither reading comprehension nor math computation is to be measured (they are separate constructs), our validation should include evidence that these constructs have no appreciable impact on the magnitude of the scores. Judgments by reviewers and correlations between problem-solving scores and (1) reading scores and (2) computation scores would be useful evidence. Score reliability evidence would be needed to show that random errors due to examinee characteristics or to administration conditions were not too influential in the scores. In addition, the scoring criteria should be reviewed to determine their appropriateness, and the scoring key should be reviewed to check its accuracy. It should be clear from this illustration that construct-related evidence incorporates, as necessary, content-related and criterion-related evidence, because each bears on the meaning of a set of scores.
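The item-by-item review in the problem-solving example can be partly systematized once reviewers have recorded, for each item, which operations it requires and how many solution steps it takes. The following is a minimal sketch under those assumptions. The `Item` record, its field names, and the sample items are hypothetical. A script like this only tallies reviewer judgments; it does not replace them.

```python
from __future__ import annotations

from dataclasses import dataclass

# Rules from the construct definition in the example: any of the four arithmetic
# operations may appear, and each problem needs at least two solution steps.
ALLOWED_OPERATIONS = {"+", "-", "*", "/"}
MIN_STEPS = 2


@dataclass
class Item:
    item_id: str          # hypothetical item label
    operations: set[str]  # operations a reviewer judged the item to require
    steps: int            # solution steps a reviewer counted


def review_items(items: list[Item]) -> dict[str, list[str]]:
    """Return {item_id: reasons} for items that do not fit the construct definition."""
    flagged: dict[str, list[str]] = {}
    for item in items:
        reasons = []
        outside = item.operations - ALLOWED_OPERATIONS
        if outside:
            reasons.append(f"operations outside the definition: {sorted(outside)}")
        if item.steps < MIN_STEPS:
            reasons.append(f"only {item.steps} step(s); at least {MIN_STEPS} required")
        if reasons:
            flagged[item.item_id] = reasons
    return flagged


if __name__ == "__main__":
    sample = [
        Item("P1", {"+", "*"}, 2),
        Item("P2", {"-"}, 1),        # a one-step problem
        Item("P3", {"/", "**"}, 3),  # uses exponentiation
    ]
    for item_id, reasons in review_items(sample).items():
        print(item_id, reasons)
```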
The main threats to construct validity have been referred to by Messick (1989) as construct underrepresentation and irrelevant test variance. The idea of underrepresentation means that some important aspects of the ability (the construct) are not being measured. For example, the problem-solving test should have some problems requiring [...], but there are none; the test underrepresents the construct as we defined it. The idea of irrelevant test variance means that factors other than the construct are causing scores to be different from what they ought to be. Many of the variables that contribute to low reliability fit in this category. Some of these factors make the test easier than it should be: testwiseness, clues in the items, implausible wrong answers, and [...]. Some factors make the test more difficult than it should be: the need for well-developed reading or writing skills, unusually garbled speech by the test administrator, and unreasonable expectations. In the problem-solving example, irrelevant test variance could be introduced by the difficulty of the computations required, the use of novel problem settings that require unique experience or knowledge, severe time limits, or ambiguous item wording.
Concern for construct validity is at the heart of such questions as "Why did this student score so high on this test?" or "Why were all of the scores so low?" These questions raise doubts about whether the scores are measures of the construct the test maker had in mind or whether extraneous factors have overstated or understated true achievement.
Questions of construct validation have not always been raised with scores from achievement tests. But clearly they should be. The meaning of the scores from any test should be established before the scores are used to make decisions about examinees. Then questions of validity are appropriate to raise, and evidence for the proposed uses should be gathered.
As originally conceived, construct validity was concerned with the validity of a hypothetical construct purportedly measured by a particular test (Cronbach and Meehl, 1955). The idea was applied primarily to psychological variables or personality inventories rather than to achievement tests. The methods employed were, and still are, intended to show that the construct under investigation is related in predictable ways to other constructs, as explained by some theory. Some of these methods and their uses are explained and illustrated by Messick (1989) in his comprehensive treatment of validity.
Questions about construct validity have always been posed when there appeared to be a discrepancy between what a test was supposed to measure and what it seemed to measure. Is this a test of understanding of scientific principles, as the title suggests, or is it really an intelligence test? Is that a test of intelligence, or is it really a measure of verbal facility? Some test makers name their tests and describe what their tests are measuring, not in terms of the tasks they include, but in terms of the traits they presumably measure. That is why we have tests of rigidity, intelligence, persistence, creativity, tolerance, spatial relations, and many other traits. For tests like these, the question of whether the test really measures what it claims to measure does arise, as it should. Does the task of completing a figure analogy measure intelligence? Does the ability to list unconventional uses for a brick measure creativity?
APPLYING VALIDITY PRINCIPLES
Two illustrations of the validation process will be presented to show how various interpretations and uses of scores may require different kinds of validity evidence. In each case, it is assumed that the test development work was done in accord with generally accepted measurement principles, but the quality or success of that work needs to be examined through validation.
Kindergarten Readiness Test
Standardized achievement tests are available for use with kindergarten pupils to determine how well they have attained the academic skills taught in their kindergarten program. Some schools use the scores from a spring administration to determine which pupils should be promoted, which should be retained, and which should be placed in a transitional kindergarten program the next year. Is it appropriate to use such scores to make these kinds of placement decisions? How valid are the scores for this purpose? What do these scores mean?
Frisbie and Andrews (1990) set out to gather evidence related to the latter question by observing the administration of the Iowa Tests of Basic Skills in nearly 50 classrooms. The purpose was to observe teacher and pupil behavior during test administration to determine whether irrelevant sources of test variance might compromise the meaning of the scores. That is, if pupils called out answers, if pupils copied from one another, if some pupils burst into tears or became ill, or if teachers provided hints or read items improperly, then the scores would not be very meaningful indicators of achievement. If high scores or low scores could be explained mostly by factors other than skill attainment, the scores would not be useful for any purpose. The conclusions from this study were that (1) teachers and pupils had little difficulty using the test materials, (2) teachers were able to provide an atmosphere conducive to testing, (3) pupils who were properly supervised could provide useful responses, and (4) teachers nearly always followed published directions. In sum, it was determined that the readiness scores of groups of pupils can be very meaningful, but the scores of selected individuals might be questionable.
In view of these findings, if convincing content-related evidence for the readiness test scores were also available (that is, evidence in support of intrinsic rational validity), then the meaning of the scores from the readiness test could well be viewed as acceptable. However, the second question, the one regarding use, has not yet been addressed. What can be done to show that it is or is not appropriate to make first-grade placement decisions with the readiness scores? Since placement into first grade is the typical direct path from kindergarten, the test user must demonstrate that low scorers would benefit more from retention (in a particular program) than from normal promotion. If this evidence were to be gathered empirically, some low-scoring kindergarten pupils would need to be retained (before evidence in favor of retention is in hand) so that the outcomes could be observed. Unfortunately, this has been the scenario in some school districts. That is, some pupils have been retained without ample evidence that benefits will be derived for the child. In many instances, the second question has been addressed on theoretical and practical grounds. Most often, the conclusion has been that deficiencies in skills covered by a readiness test can be overcome, even in the short run, through intensive instructional engagement. It should not take a year to remediate pupils with low scores unless physical, emotional, or intellectual disabilities are implicated.
This validation example illustrates the need to separate the questions of "meaning" and "use" when gathering evidence. It further demonstrates that validation is more than gathering judgments and numbers; it usually requires a logical analysis of the relationship between several constructs.
Driver's License Written Test
Usually, the scores from a written examination, a performance test, and a vision-screening exam are used in combination to decide who is eligible to obtain an automobile driver's license. The primary purpose of licensing drivers is to protect the public from those who might endanger the life and property of others through unsafe use of a motor vehicle. For the written test, the validity question is: "How appropriate is it to infer that high scorers will be safer, more responsible drivers than low scorers?" What do the scores mean? What kind of evidence would support the intended use of the scores for classification purposes? (Another relevant question that we will not deal with at this time is the basis for choosing a particular passing score.)
The validation process might begin by examining (1) the definition of "safe driving ability" established by the test builders and (2) the elements of the domain of relevant knowledge the definition encompasses. Then the relevance of the test items can be assessed by matching item content with the domain definition. Items that require examinees to explain how to control a skidding car or to tell the meaning of road signs of various shapes or colors would probably be judged relevant. Items that require examinees to describe how a carburetor works or to find distances on a map are likely to be judged irrelevant. If no items deal with the differing meanings of solid and broken lines that define road lanes, the construct domain might be considered underrepresented. If most items deal with facts about laws to the exclusion of making judgments in certain driving situations, the representativeness of the content might be questioned.
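The relevance and representativeness review just described is, in effect, a cross-tabulation of items against the elements of the domain definition. The Python sketch below is one minimal way to keep that tally, assuming reviewers have already judged which domain elements (if any) each item measures. The element names, item labels, and the function `review_coverage` are hypothetical illustrations, not part of any actual licensing test.

```python
from collections import Counter

def review_coverage(domain_elements, item_matches):
    """Cross-tabulate reviewed items against a domain definition.

    domain_elements: element names taken from the domain definition.
    item_matches: item id -> elements reviewers judged the item to measure;
        an empty list means the item was judged irrelevant to the domain.
    Returns (irrelevant items, uncovered elements, item count per element).
    """
    counts = Counter()
    irrelevant = []
    for item_id, elements in item_matches.items():
        relevant = [e for e in elements if e in domain_elements]
        if not relevant:
            irrelevant.append(item_id)
        counts.update(relevant)
    # Elements no item touches are signs of possible underrepresentation.
    uncovered = [e for e in domain_elements if counts[e] == 0]
    per_element = {e: counts[e] for e in domain_elements}
    return irrelevant, uncovered, per_element

# Hypothetical domain elements and reviewer judgments, for illustration only.
domain = ["skid control", "road signs", "lane markings", "right of way"]
judgments = {
    "item1": ["skid control"],
    "item2": ["road signs"],
    "item3": [],                      # e.g., an item on how a carburetor works
    "item4": ["road signs", "right of way"],
}
irrelevant, uncovered, per_element = review_coverage(domain, judgments)
print(irrelevant)   # ['item3']
print(uncovered)    # ['lane markings']
```

The code only records the reviewers' judgments; deciding whether an item is relevant, or whether a lopsided count means the content is unrepresentative, remains a matter of human judgment.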
Next the technical adequacy of the test items might be reviewed to determine how well the items help achieve the purpose of distinguishing safe and unsafe drivers. Readability should be assessed to determine if the vocabulary is too advanced or if the syntax is too complex. (Safe drivers need not be able to read English prose.) The keyed response should be checked for correctness and (for multiple-choice items) the plausibility of wrong answers should be considered. Test instructions, instructions to examinees about coding or marking responses, and information provided to examinees about scoring (including whether to guess) should be checked for clarity and completeness.
The test administration and scoring conditions should be reviewed to determine if threats to the valid meaning of the scores have been controlled. Is supervision sufficient to prevent cheating by the examinees? If a computer terminal is used to present items, is ample instruction provided, and is there provision for returning to an item to reconsider or change a response? If clerical hand scoring is done, are there procedures in place to check the accuracy of scoring? Estimates of score reliability provide some evidence about the influence of random errors from administration and scoring. (Is it clear that decision consistency information is important here, too?)
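One simple index of decision consistency is the proportion of examinees who receive the same pass/fail classification on two parallel forms. The sketch below assumes hypothetical scores for eight examinees and a hypothetical cut score of 70; it computes only the raw agreement proportion, not chance-corrected indexes such as kappa.

```python
def decision_consistency(form_a, form_b, cut_score):
    """Proportion of examinees who get the same pass/fail decision on two forms.

    form_a, form_b: scores of the same examinees on two parallel forms,
        listed in the same order. A score >= cut_score counts as a pass.
    """
    if len(form_a) != len(form_b) or not form_a:
        raise ValueError("need two non-empty score lists of equal length")
    # True == True or False == False counts as one consistent decision.
    agree = sum((a >= cut_score) == (b >= cut_score) for a, b in zip(form_a, form_b))
    return agree / len(form_a)

# Hypothetical scores for eight examinees and a hypothetical cut score of 70.
form_a = [82, 75, 68, 71, 90, 64, 70, 59]
form_b = [80, 69, 66, 74, 88, 71, 72, 61]
p_agree = decision_consistency(form_a, form_b, cut_score=70)
print(p_agree)  # 0.75
```

Two of the eight examinees fall on opposite sides of the cut score on the two forms, which is exactly the kind of inconsistency that matters most for a licensing decision.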
What kind of criterion-related evidence should be gathered to support the use of the written test for licensing drivers? Is the performance test a useful concurrent criterion measure? It probably is not, because the driving test requires more than knowledge of the laws and rules of the road; performance behind the wheel also requires psychomotor abilities, abilities to see and to judge speed and distance, and mental alertness and concentration. There probably is no existing standard against which the quality of a written driver's test can be judged. Predictive evidence might be gathered if a suitable criterion for safe driving in the future could be determined. For example, if the scores on the test were found to correlate with the number of traffic violations during the first year after receiving a license, how germane is such evidence? Would number of accidents be a suitable criterion? How about number of accident-free hours of driving? The problem of defining an acceptable criterion suggests that criterion-related evidence is not likely to have major weight in deciding how appropriate the written test scores are for making licensure decisions.
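If a criterion such as number of accidents were accepted, the criterion-related evidence would usually be summarized as a correlation between test scores and criterion values. The sketch below computes a Pearson correlation for hypothetical written-test scores and accident counts; as the discussion above suggests, the coefficient can be no more meaningful than the criterion it is computed against.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between paired lists x and y."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a variable with no variance has no correlation")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: written-test scores and accidents in the first licensed year.
scores = [28, 35, 22, 40, 31, 26]
accidents = [1, 0, 2, 0, 1, 1]
r = pearson_r(scores, accidents)
print(round(r, 2))  # negative: higher scorers had fewer accidents in this made-up sample
```

A negative coefficient here would be the "favorable" direction, but with a criterion as noisy and rare as accidents, even a well-built test could show only a weak relationship.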
This validation illustration shows how important intrinsic rational validity evidence is for achievement tests and how documentation of that evidence during test development should expedite validation. It also demonstrates that content-related evidence alone is insufficient for making judgments about validity.
SUMMARY PROPOSITIONS

1. Validity is a property of a set of test scores rather than a property of a test instrument.
2. Valid test use requires good tests: those that conform to a clear specification of test content and yield highly reliable scores.
3. What a test of abilities measures is defined more clearly by the tasks it requires than by the name of the trait it is supposed to measure.
4. Evidence of valid use of tests of cognitive abilities is inherent in the test construction process, in the definition of the abilities, and in the rationales for including each of the test tasks.
5. Intrinsic rational validity evidence is needed but is not sufficient by itself to establish the validity of test score use.
6. The value of criterion-related evidence for validity is highly dependent on the quality (validity) of the criterion measure.
7. The value of correlational evidence to support the valid use of a set of achievement scores is secondary to the value of direct judgmental evidence.
8. No adequate criterion measure exists with which to compare achievement tests for the purpose of providing evidence of valid score use.
9. Construct-related evidence focuses on how well the items represent all dimensions of the relevant domain and how well irrelevant factors are excluded from the measurements.
10. Construct-related evidence necessarily includes, but is not limited to, content-related and criterion-related evidence.
QUESTIONS FOR STUDY AND DISCUSSION

1. What misunderstandings are demonstrated by a person who states, "This is a valid test"?
2. Under what circumstances might a set of scores be considered quite reliable but not very valid?
3. What kinds of things could happen during the administration or scoring of an objective test that would probably reduce the validity of the resulting scores?
4. Why is intrinsic rational validity not enough to support the usefulness of scores from a classroom test to be used for grading?
5. How is the idea of construct underrepresentation accounted for by the process of documenting intrinsic rational validity evidence?
6. What specific student characteristics does high school grade-point average probably measure?
7. Why might a supervisor's ratings not be a useful criterion for measuring employee performance?
Achievement Test Planning

ESTABLISHING THE PURPOSE FOR TESTING

The stages of the test development process begin with describing the purpose for testing. Why are we testing? What do we intend to measure? How will the test scores be used, or what kinds of score interpretations do we want to make? These are important questions to answer, but too often they are not answered prior to the item-writing phase. This is unfortunate because the answers lay the groundwork for subsequent decision making as test development or test selection activities proceed.

A good test rarely serves multiple purposes equally well. Tests designed mainly to measure achievement precisely probably also are motivating to students and may be instructional as well. However, tests designed primarily to motivate students to study or to serve as learning devices are not likely to be good summative assessments of student learning. Most teachers' tests are intended to provide precise measures of achievement that can be used to give feedback to students and to report their progress.
A significant aspect of establishing the purpose for testing is deciding how the scores should be interpreted. What referent will be used to obtain meaning from the scores? Content? Scores of a norm group? Statements of objectives? For classroom testing purposes, the answer should be tied closely to the course grading (reporting) system, the referent used to give meaning to the quarterly or semester grades. For statewide competency testing, the scores are likely to be referenced to the content domain from which the test was developed. For personnel selection that is based on hiring the most qualified of those who are at least minimally qualified, norm-referenced interpretations probably are needed. And testing for professional certification or licensure probably requires …
The implications of the decision about the type of score interpretation needed will become more apparent as we consider the separate aspects of test construction. The goal at each stage of construction is to do the things that will help to yield a distribution of the most valid scores, a distribution that has the characteristics that make possible the type of interpretations we had planned to make.
ALTERNATIVE TYPES OF TEST TASKS

The most commonly used types of tests are the essay, the objective (including short-answer), and the mathematical problem type. Performance tests and oral examinations are both less common, perhaps, but where they are used, the circumstances often favor their use over the other types. This section is devoted to a discussion of the characteristics of these various types and a description of the relative merits of each in situations where a choice is feasible.
Essay, Objective, and Numerical Problem

First, some common misconceptions need to be addressed. It is not true that luck is a large element in scores on one type and nearly or totally absent in another. On the contrary, all types can be written to require much the same kind and level of ability and, if handled carefully, can yield results of satisfactory reliability and validity (Coffman, 1966; Dressel, 1978). A good essay test or a good objective test can be constructed so that it will rank a group of students in nearly the same order as that resulting from a good problem test. But this is not to say that the various types can be used interchangeably with equal ease and effectiveness. (See Birenbaum and Tatsuoka, 1987, for an example in the area of diagnosing learner difficulties.)
Both essay and problem tests are less time consuming to prepare than objective tests. But the objective test generally can be scored more rapidly and more reliably than either of the other types, particularly the essay test. Where very large groups of students must be tested, the use of objective tests permits greater efficiency with no appreciable sacrifice in validity. But where classes are small, the efficiency is in the opposite direction, favoring essay or problem tests.

The numerical problem type has the apparent advantage of greater intrinsic relevance (of closer identity with on-the-job requirements) than either of the other types. It is sometimes claimed that ability to choose an answer is different from, and less significant than, ability to produce an answer. But most of the evidence indicates that these abilities are highly related (Ward, 1982; Sax and Collet, 1968).
Because of the length and complexity of the answers they require, and because the answers must be written by hand, neither essay nor problem type tests can cover the content … as comprehensively as an objective test … reading.

Whichever type examiners decide …
Performance: Process and Product
… accomplishing a task, or to determine the quality of a product. Teachers can evaluate drawings in art, … in home economics, tuned-up engines and table lamps in technical education, and penmanship or paragraph conciseness in language arts. In each case the goal is summative; evaluations of process are made as the student progresses toward completion of the project.
Simulations, the most co… contrived situations established for th… assessing speed, accuracy, and quality … outcome is achieved. Dance instructors … polka, music teachers listen for p…, … watch and listen to their counse… who have been certified in CPR a… are important, but the outcome is …

Performance tests can serve … present some unique m… same identification tasks and simu… be comparable. … be comparable, great care must be … equivalent testing for all students in … Scoring of performance tests tends … grading guide is prepared to pro… to prepare and administer, especially to large groups. On the … performance tests tend to be less reliable than objective tests. In many situations, the … questionable; the most realistic simulations tend … to administer. Even when simulation scores … is likely to be greater than their benefit. Despite … many circumstances under which performance tests … measurement guidelines for developing …
… In addition, Chapter … describes methods of using checklists and rating scales, both of which figure prominently in performance assessment.
TEST SPECIFICATIONS

1. Types of test items to be used
2. Number of items of each type needed
3. Kinds of tasks the items will present
4. Number of tasks of each kind needed
5. Descriptions of content areas to be sampled
6. Number of items from each area needed
7. Level and distribution of the difficulty of the items
Test specifications of this kind are useful for several reasons: (1) they guide the work of the test constructor, (2) they can inform examinees about test expectations and how they might prepare themselves, (3) they provide information to others who may want to select the test for their own particular use, and (4) they provide documentation as evidence for judging the validity of the scores obtained. (But since test specifications furnish only a plan for test development, the surest basis for judging the usefulness of a test is a careful examination of the test items themselves.)
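Defining Content Domains

How should the content to be measured by a test be described? The answer relates most directly to, and depends most on, the type of score interpretation the user wishes to make. Obviously, the instructional objectives of interest must be described if our goal is to make objectives-referenced interpretations. When our goal is norm-referenced interpretation, the content can be defined more generally, but still the boundaries need to be identified.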
In many cases, the content of certain book chapters, articles, novels, study guides, or other instructional materials can be used to define the item content. When our need is to estimate how much of a content domain has been learned, the separate elements that comprise that domain need to be described. This is the case when domain-referenced interpretations are needed.

Figure 7-1 shows the type of domain definitions that might be prepared for these three kinds of score interpretations. If our primary purpose was to obtain norm-referenced scores, for example, the universe of possible content could be described somewhat loosely. If two individuals, both familiar with the instructional program, were to build a test independently using the description provided, two quite different tests would emerge. Either set of items written from the instructional materials might seem acceptable, but there would be a need to establish the content limits (breadth and depth) of the test items.
Content specification for domain-referenced testing must be explicit because the goal is to estimate how many of the identified elements of the domain are held by, known by, or controlled by each examinee. In most cases, the domain will be large enough that only a sample of the elements can be tested at one time. If a representative sample is to be obtained, the individual elements must be listed or described in such a way that their selection is possible. The illustration in Figure 7-1 has been abbreviated to conserve space, but the entire domain could be described by the 27 propositions listed in Appendix B.
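Random selection is one way to obtain a representative sample of domain elements. The sketch below assumes the 27 propositions are represented by hypothetical labels, draws a simple random sample of 10, and uses the proportion of sampled elements an examinee controls as the estimate of the proportion of the whole domain controlled. The sample size, seed, examinee, and function names are all assumptions for illustration; stratifying the sample by subdomain would be a reasonable alternative not shown here.

```python
import random

def sample_domain(elements, k, seed=None):
    """Simple random sample of k domain elements, drawn without replacement."""
    rng = random.Random(seed)  # seeded generator so the sample can be reproduced
    return rng.sample(elements, k)

def estimate_domain_proportion(sampled, controlled):
    """Proportion of sampled elements the examinee controls, used as an
    estimate of the proportion of the whole domain the examinee controls."""
    hits = sum(1 for e in sampled if e in controlled)
    return hits / len(sampled)

# Hypothetical labels standing in for the 27 propositions of Appendix B.
domain = [f"proposition {i}" for i in range(1, 28)]
sampled = sample_domain(domain, k=10, seed=7)
# Hypothetical examinee who controls the first 18 propositions (18/27, about 0.67).
controlled = set(domain[:18])
estimate = estimate_domain_proportion(sampled, controlled)
print(sampled, estimate)
```

With only 10 of 27 elements sampled, the estimate moves in steps of 0.1 and will vary from one sample to another, which is one reason the sample size deserves attention when domain-referenced interpretations are planned.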
Finally, the specification of content for objectives-referenced testing requires the listing of the instructional objectives of interest. Each objective is considered a content domain by itself; several items will be written to measure achievement of each separate objective, and a score will be reported for each instructional objective. The three objectives in part C of Figure 7-1 were taken from Appendix C.

Figure 7-1 Sample Content Domain Definitions for Three Types of Score Interpretations

A. Norm-Referenced
The content domain of interest is physical fitness as described by Chapter 11 of the health text. The main areas are:
1. Exercise and its benefits
2. Designing an exercise program
3. The role of sleep in good health

B. Domain-Referenced
The content domain of interest is physical fitness as defined by a separate set of 27 propositions related to exercise, exercise programs, and the contribution of sleep. Here are three sample propositions, one from each subdomain, from the full domain listing (Appendix B):
1. Exercise can improve blood vessel capacity and increase heart strength and lung …
2. The benefits of aerobic exercise require a minimum of three 20-minute sessions …
3. … poor body position, too much …

C. Objectives-Referenced
The content domains of interest are these instructional objectives about physical fitness (Appendix C):
1. Distinguish the purposes and features of aerobic and anaerobic exercising.
2. Describe how nutrition and exercise jointly affect body weight.
3. Estimate the relative amount of sleep required by individuals who vary in age, activity level, and general health condition.

These examples illustrate the contrasting requirements of the domain definitions. When objectives-referenced interpretations are needed, as opposed to domain-referenced, no sampling of elements occurs and no inferences about content need to be made in score interpretation. With this type of test specification, all of the objectives will be tested, and consequently no generalizations are made about untested knowledge or skill the examinees may possess.
Tables of Specifications

Table 7-1. Table of Specifications for a 40-Item Test on Score Reliability
(Content areas are crossed with an ABILITIES dimension, with row and column totals; the individual cell entries are not legible in this copy. Total: 40 items.)
The content areas of Table 7-1 cover aspects of reliability: definition, types of error, estimating reliability, factors influencing reliability, and interpretation of coefficients. … items should require explanation … estimation … important … require an ability more complex than simply identifying or describing terms.
… For example, a one-dimensional classification scheme … instructional objectives … ability dimension … are both present in each statement … Table 7-2 depicts … Though each of the three content areas …, the projected composition of the test … instructional objectives are compound … is 7, 7, and 12, for a total of 26. (For … six separate objectives.) The percentages … be equally important. … be described shortly.
Categories of Ebel's Relevance Guide were used to describe the abilities dimension in Table 7-1 because Ebel's terms describe the type of ability required … The meanings of Bloom's categories are more susceptible to misinterpretation, and classification of items by them produces more disagreement among judges than does classification by Ebel's categories.

Table 7-2. Table of Specifications for a Domain-Referenced Test
(Percent of test by content area: 27, 27, and 46; the other entries are not legible in this copy.)
In situations where affective or psychomotor objectives are to be evaluated, categories of other taxonomies are appropriate to use to describe the abilities dimension. Regardless of the classification system used, … appearing in a table of specifications. Which of the categories of Ebel's Relevance Guide have been omitted from the plan in Table 7-1? How many items related to definition will require recommended action? (Does this make sense?)
The table of specifications provides … to be tested and indicates the relative … What factors should the test planner consider in weighting test content (total points or total items)? In the absence of instructional objectives, the relative weights of content areas can be gauged by considering the following factors:
1. Amount of … . … probably should have twice the weight of … .
2. Amount of instructional time devoted to each topic. A topic to which six class sessions were devoted probably should have three times the weight of a topic that required only two class sessions.
3. Relative importance. A topic regarded as essential background for a subsequent instructional unit is more important, and more deserving of weight, than an area …
4. Other opportunities to evaluate. … again, as on a comprehensive final exam, … an equally important area that will not be … For example, when a topic is tested by essay on a …, … be entirely objective items.
5. Need for subscores. When scores are needed for subtopics, content within subtopics must be weighted to ensure content representativeness of each subtopic for which a score will be reported.
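The instructional-time factor in the list above implies weights proportional to class sessions. One way to turn such weights into whole-number item (or point) counts is largest-remainder rounding, sketched below; the rounding rule and the function name `allocate` are our choices for illustration rather than a procedure given in the text, and the session counts are hypothetical.

```python
import math

def allocate(total, weights):
    """Split `total` items (or points) across areas in proportion to `weights`.

    Uses largest-remainder rounding so the whole-number parts sum to `total`.
    """
    weight_sum = sum(weights)
    if total < 0 or weight_sum <= 0 or any(w < 0 for w in weights):
        raise ValueError("need a non-negative total and non-negative weights with a positive sum")
    quotas = [total * w / weight_sum for w in weights]
    parts = [math.floor(q) for q in quotas]
    leftover = total - sum(parts)
    # Hand the remaining units to the areas with the largest fractional remainders.
    by_remainder = sorted(range(len(weights)), key=lambda i: quotas[i] - parts[i], reverse=True)
    for i in by_remainder[:leftover]:
        parts[i] += 1
    return parts

# Hypothetical: three topics taught for 6, 2, and 4 class sessions, 36-item test.
print(allocate(36, [6, 2, 4]))      # [18, 6, 12]
# Table 7-2 percentages (27, 27, 46) applied to a 26-item test.
print(allocate(26, [27, 27, 46]))   # [7, 7, 12]
```

The first call gives the six-session topic three times the items of the two-session topic, as the list suggests; the second recovers the 7, 7, and 12 items reported for the 26-item test of Table 7-2.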
The percentages in a table of specifications should be thought of as the percent of test points to be allocated rather than the percent of test items to be used. This is especially important when more than one type of item is to be used or when items differ in their maximum scores. For example, a short-answer item may be worth 1 point, but another short-answer item that requires an explanation might have a maximum score of 3. The two items would represent 25 and 75 percent, respectively, of a two-item test in terms of points.
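The distinction between percent of points and percent of items can be checked mechanically once each planned item has a content area and a maximum score. The sketch below assumes each item is recorded as a hypothetical (area, max_points) pair; to reproduce the two-item example, each item is treated as its own "area."

```python
from collections import defaultdict

def area_weights(items):
    """Percent of total points and percent of items carried by each content area.

    items: one (content_area, max_points) pair per planned test item.
    """
    if not items:
        raise ValueError("need at least one item")
    points = defaultdict(float)
    counts = defaultdict(int)
    for area, max_points in items:
        points[area] += max_points
        counts[area] += 1
    total_points = sum(points.values())
    if total_points <= 0:
        raise ValueError("total points must be positive")
    pct_points = {area: 100 * p / total_points for area, p in points.items()}
    pct_items = {area: 100 * c / len(items) for area, c in counts.items()}
    return pct_points, pct_items

# The two short-answer items from the text, worth 1 and 3 points.
pct_points, pct_items = area_weights([("item 1", 1), ("item 2", 3)])
print(pct_points)  # {'item 1': 25.0, 'item 2': 75.0}
print(pct_items)   # {'item 1': 50.0, 'item 2': 50.0}
```

Counting items would suggest an even split, while counting points shows that the 3-point item carries three times the weight, which is why the percentages in the table are best read as points.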
To guide test construction effectively and to inform prospective examinees adequately, the test specifications need to be fairly detailed. In response to the question "How detailed?" we might pose another question: "If these specifications were given to a competent item writer, would that writer be likely to produce an acceptable test?" Test specifications should be detailed enough to indicate what kinds of items should be written on what general areas of learning, but they should not be so detailed as to give away the actual questions that will appear on the test.
ITEM FORMAT SELECTION

With content specifications in hand, the test developer's next decision relates to the types of items to be used. When instructional objectives form the content base, the verb used in each statement supplies a strict standard for the type of item to consider or reject. Words like describe, design, graph, develop, and explain require some type of production on the part of the examinee, activity that cannot be demonstrated by multiple-choice, true-false, or other objective-item types. Often the ideal measurement procedure must be compromised because of practical considerations, as when an objective, machine-scorable test is used instead of a writing sample to evaluate writing abilities. The trade-offs associated with essay, objective, and problem-type tests will be examined further to reveal the relative merits of each.
Comparison of Essay and Objective Formats

The following statements summarize some of the similarities and differences of essay and objective tests.
1. Either an essay or an objective test can be used to measure almost any important educational achievement that any paper-and-pencil test can measure.
2. Either an essay or an objective test can be used to encourage students to study for understanding of principles, organization and integration of ideas, and application of knowledge to the solution of problems.
3. The use of either type necessarily involves the exercise of subjective judgment.
4. The value of scores from either type of test is dependent on their objectivity and reliability.
5. An essay test question requires students to plan their own answers and to express them in their own words. An objective test item requires examinees to choose among several designated alternatives.
6. An essay test consists of relatively few, more general questions that call for rather extended answers. An objective test ordinarily consists of many rather specific questions requiring only brief answers.
7. Students spend most of their time thinking and writing when taking an essay test. They spend most of their time reading and thinking when taking an objective test.
8. The quality of an objective test is determined largely by the skill of the test constructor. The quality of an essay test is determined largely by the skill of the test scorer.
9. An essay examination is relatively easy to prepare but rather tedious and difficult to score accurately. A good objective examination is relatively tedious and difficult to prepare but comparatively easy to score.
10. An essay examination affords students much freedom to express their individuality in the answers they give and much freedom for the examiner to be guided by his or her individual preferences in scoring the answer. An objective examination affords much freedom for the test constructor to express personal knowledge and values but allows students only the freedom to show, by the proportion of correct answers they give, how much or how little they know or can do.
11. In objective test items the student's task and the basis on which the examiner will judge the degree to which it has been accomplished are more clearly defined than they are in essay tests.
12. An objective test permits, and occasionally encourages, guessing. An essay test permits, and occasionally encourages, bluffing.
13. The distribution of numerical scores obtained from an essay test can be controlled to a considerable degree by the grader; that from an objective test is determined almost entirely by the test.
In view of these similarities and differences, when might it be most appropriate to use essay items? Essay tests are favored for measuring educational achievements when:

1. The group to be tested is small, and the test will not be reused.
2. The instructor wishes to provide for the development of student skill in written expression.
3. The instructor is more interested in exploring student attitudes than in measuring achievements. (Whether instructors should be more interested in attitudes than in achievements, and whether they should expect honest expression of attitudes under test conditions, are open questions.)
4. The instructor is more confident of his or her proficiency as a critical essay reader than as an imaginative writer of good objective test items.
5. Time available for test preparation is shorter than time available for test scoring.
Essay tests have important uses, but they also have some serious limitations. Teachers … claims that essay tests can measure hi… have not been defined. They also should … to determine how well students can an…
Comparison of Objective Formats

The most commonly used kinds of objective test items are multiple-choice, true-false, matching, classification, and … Many special varieties have been described in other treatments of objective testing. However, most of these special varieties … Their unique features often add more to the difficulty of using them than they improve the item as a measuring tool. The multiple-choice and true-false test items are widely applicable to a great variety of tasks. Because of this, and because of the importance of skill in using each one effectively, separate chapters are devoted to the true-false and multiple-choice item formats later in this text.
The multiple-choice form of test item is relatively high in ability to discriminate between high- and low-achieving students. It is somewhat more difficult to write than some other item types, but its advantages seem so apparent that it has become the type most widely used in tests constructed by specialists. Theoretically, and this has been verified in practice, a given multiple-choice test can be expected to show as much score reliability as a typical true-false test with nearly twice that number of items. Here is an example of the multiple-choice type:
Directions: Write the number of the best answer to the question on the line at the right of the question.

Example: Which is the most appropriate designation for a government in which control is in the hands of a few people?
1. Autonomy
2. Bureaucracy
3. Feudalism
4. Oligarchy
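The claim that a multiple-choice test can match the reliability of a true-false test with nearly twice as many items can be explored with the Spearman-Brown formula, which predicts the reliability of a test whose length is multiplied by k: ρ′ = kρ / (1 + (k − 1)ρ). The formula assumes the added items are parallel to the existing ones. The sketch below uses a hypothetical true-false reliability of .70; it illustrates the arithmetic of lengthening a test, not the empirical comparison of the two formats.

```python
def spearman_brown(reliability, k):
    """Predicted reliability after multiplying test length by k (parallel items assumed)."""
    if not 0 < reliability < 1 or k <= 0:
        raise ValueError("need 0 < reliability < 1 and k > 0")
    return k * reliability / (1 + (k - 1) * reliability)

def length_factor(reliability, target):
    """Factor by which test length must change to move from `reliability` to `target`."""
    if not (0 < reliability < 1 and 0 < target < 1):
        raise ValueError("reliabilities must lie strictly between 0 and 1")
    return target * (1 - reliability) / (reliability * (1 - target))

# Hypothetical: a true-false test with reliability .70, doubled in length.
doubled = spearman_brown(0.70, 2)
print(round(doubled, 3))                        # 0.824
print(round(length_factor(0.70, doubled), 3))   # 2.0
```

Read the other way, if a multiple-choice test reached about .82 where a true-false test of the same length reached .70, the true-false test would need roughly twice as many items to catch up.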
The true-false item is simpler to prepare and is also quite widely adaptable. It tends to be somewhat less discriminating, item for item, than the multiple-choice type, and somewhat more subject to ambiguity and misinterpretation. Although theoretically a high proportion of true-false items could be answered correctly by blind guessing, in practice the error introduced into true-false test scores by blind guessing tends to be small (Ebel, 1968). This is true because well-motivated examinees taking a reasonable test do very little blind guessing. They almost always find it possible and more advantageous to give a rational answer than to guess blindly. The problem of guessing on true-false test questions will be discussed in greater detail in Chapter 8. Here is an example of the true-false type:
Directions: If the sentence is essentially true, encircle the letter "T" at the right of the sentence. If it is essentially false, encircle the letter "F."

Example: A substance that serves as a catalyst in a chemical reaction may be recovered unaltered at the end of the reaction.   (T)  F
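The contrast between the theoretical and the practical impact of blind guessing can be made concrete with the binomial model. The sketch below is an illustration, not part of the original text: it assumes an examinee who guesses blindly on every item, each guess independently correct with probability 1/2, and a hypothetical test length of 100 true-false items. The helper names are likewise hypothetical.

```python
from __future__ import annotations

import math


def chance_score_stats(n_items: int, p_correct: float = 0.5) -> tuple[float, float]:
    """Mean and standard deviation of the number-right score for an examinee
    who guesses blindly on every item (binomial model, independent items)."""
    mean = n_items * p_correct
    sd = math.sqrt(n_items * p_correct * (1.0 - p_correct))
    return mean, sd


def prob_at_least(n_items: int, k: int, p_correct: float = 0.5) -> float:
    """Probability that blind guessing alone produces k or more correct answers."""
    return sum(
        math.comb(n_items, j) * p_correct**j * (1.0 - p_correct) ** (n_items - j)
        for j in range(k, n_items + 1)
    )


# Hypothetical 100-item true-false test answered entirely by blind guessing.
mean, sd = chance_score_stats(100)
print(mean, sd)  # 50.0 5.0
print(f"P(70 or more right by chance) = {prob_at_least(100, 70):.1e}")
```

Under these assumptions a purely blind guesser averages 50 with a standard deviation of 5, so scores well above the chance level are very unlikely to come from blind guessing alone. As the text notes, well-motivated examinees rarely guess blindly, so this model describes a worst case rather than typical behavior.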
Those critics who urge test makers to abandon the "traditional" multiple-choice and true-false formats and to invent new formats to measure a broader and more significant array of educational achievement are misinformed about two important points:

1. Any aspect of cognitive educational achievement can be tested by either the multiple-choice or the true-false format.
2. What a multiple-choice or true-false item measures is determined much more by its content than by its format.
The matching type is efficient in that an entire set of responses can be used with a cluster of related stimulus words. But this is also a limitation, since it is sometimes difficult to formulate clusters of questions or stimulus words that are sufficiently similar to make use of the same set of responses. Furthermore, questions whose answers can be no more than a word or a phrase tend to be somewhat superficial and to place a premium on purely verbalistic learning. An example of the matching type is given here:
[Matching example largely illegible in the source; legible entries include the title The Innocents Abroad and the authors William Shakespeare and Robert Louis Stevenson.]
The classification type is less familiar than the matching type, but possibly more useful in certain situations. Like the matching type, it uses a single set of responses but applies these to a large number of stimulus situations. An example of the classification type is the following:
Directions: In the following items you are to express the effects of exercise on various body processes and substances. Assume that the organism undergoes no change except those due to exercise. For each item, circle the appropriate number:
1. If the effect of exercise is to increase the quantity described in the item
2. If the effect of exercise is to decrease the quantity described in the item
3. If exercise should have no appreciable effect or an unpredictable effect on the quantity described in the item

27. …   (1)  2   3
28. …   (1)  2   3
29. Amount of glucose in the blood   1  (2)  3
30. Amount of residual air in the lungs   1   2  (3)
… have shown a very high correlation between scores on tests composed of parallel short-answer and multiple-choice items, when both members of each pair of parallel items are intended to test the same knowledge or ability (Eurich, 1931; Cook, 1955).

This means that students who are best at producing correct answers tend also to be best at identifying them among several alternatives. Accurate measures of how well students can identify correct answers tend to be somewhat easier to get than accurate measures of their ability to produce them. There may be special situations, of course, where the correlation would be much lower.
The disadvantages of the short-answer form are that it is limited to questions that can be answered by a word, phrase, symbol, or number and that its scoring tends to be subjective and tedious. Item writers often find it difficult to phrase good questions about principles, explanations, applications, or predictions that can be answered by one specific word or phrase. Here are some examples of short-answer items:

Directions: On the blank following each of the following questions, partial statements, or words, write the word or number that seems most appropriate.

What is the valence of oxygen?  2
The middle section of the body of an insect is called the  thorax.
What major river flows through or near each of these major cities?
   Cairo  Nile
   Quebec  St. Lawrence
… that a variety of item types be used in each … tasks presented to the examinee … the scores or make the test more interesting … should choose the particular item type … wish to examine. There is more merit in … widely adaptable. A test constructor … item type, such as multiple-choice, … when it becomes clearly more efficient. … depends much more on giving proper … and on writing good items of whatever type than on the choice of this or that type of item.
Item Complexity
There continues to be interest among some test developers in the use of items that present complex tasks, often based on … descriptions of … situations. Some require the interpretation of complex data, diagrams, or … found in … Complex items presented by Bloom and his colleagues (1956) … It is common to use items of this nature on licensure and certification written examinations.

… pool is not very large … complex items appear to be attractive … of knowledge, they provide an answer … questions test only recognition of isolated facts … situations and background materials used in … probably require the examinee to use higher … attractive to those who believe that education … developing a student's ability to think rather than … knowledge and thinking were independent.

However, these complex tasks have some undesirable features as test …
Figure 7-2. Descriptions of Complex Items
Some item writers are drawn by … cartoon, poem, or passage of test material, they are asked to apply their knowledge.
Items that require interpretation of materials often are referred to as context-dependent items. (They have no meaning outside the context of the material about which they are written.) They are widely used in tests of general educational development, tests whose purposes are to measure the abilities of students with widely different educational backgrounds. (Most succeed quite well in doing so.) However, they are less appropriate, convenient, and efficient in testing for achievement in learning specific subject matter. Test users should be skeptical of claims that context-dependent items measure abilities rather than knowledge, because the abilities they measure are almost wholly the results of knowledge.

Many of the indirect tests of knowledge, through special applications of the knowledge or the use of complex situations, can be presented in true-false, multiple-choice, short-answer, or matching form. Some are more conveniently presented in open-ended fashion, such as requiring the examinee to produce a diagram, sketch, or set of editorial corrections. The main point to be made here is that, while achievement can be tested most conveniently with one of the common item formats, there are situations when other means may be more convenient, satisfactory, or palatable to those who are charged with providing evidence for valid score use.
NUMBER OF ITEMS
The number of questions to include in a test is determined largely by the amount of time available for it. Many tests are limited to 50 minutes, more or less, because that is the scheduled length of the class period. Special examination schedules may provide periods of 2 hours or longer. In general, the longer the period and the examination, the more reliable the scores obtained from it. However, it is seldom practical or desirable to prepare a classroom test that will require more …
A reasonable goal is to make tests that include few enough questions so that most students have time to attempt all of them when working at their own normal rates. One reason for this is that speed of response is not a primary objective of instruction in most K-12 and college courses and hence is not a valid indication of achievement. In many areas of proficiency, speed and accuracy are not highly correlated. Consider the data in Table 7-3. The sum of the scores for the first ten students who finished the test was 965. The highest score in that group was 105; the lowest was 71. Thus, the range of scores in that group was 35 score units. Note that, though the range of scores varies somewhat from group to group, there is no clear tendency for students to do better or worse depending on the amount of time spent. One can conclude from these data that on this test there was almost no relation between time spent in taking the test and the number of correct answers given.
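The figures quoted above for the first ten students to finish can be checked with simple arithmetic. A minimal sketch (the helper names are hypothetical); it assumes the stated range of 35 counts both end scores, because 105 − 71 = 34 and the text does not say which convention it uses.

```python
def group_mean(score_sum: float, group_size: int) -> float:
    """Mean score of a group, given the sum of its members' scores."""
    return score_sum / group_size


def inclusive_range(highest: int, lowest: int) -> int:
    """Score units spanned by a group, counting both end scores."""
    return highest - lowest + 1


# First ten students to finish the test, using the values quoted in the text.
print(group_mean(965, 10))       # 96.5
print(inclusive_range(105, 71))  # 35
```

The same two calculations, applied group by group, are what a reader would use to look for any trend between finishing order and performance.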
Table 7-3. Relation Between Rate of Work and Test Scores(a)

Students, in order of finishing: 1-10, 11-20, 21-30, 31-40, 41-50, 51-60, 61-70, 71-80, 81-90, 91-100
Sum of scores (legible entries, in order): 965, 956, 943, 955, 965, 1010, 942, 968
Range of scores (legible entries, in order): 35, 32, 31, 32, 52, 25, 27, 30

(a) Based on a test taken by 100 students. The mean score on the test was 96.1.

A second reason for giving students ample time to work on a test is that examination anxiety, severe enough even in untimed tests, is accentuated when pressure to work rapidly as well as accurately is applied. A third is that efficient use of an instructor's painstakingly produced test requires that most students respond to all of it. In some situations, speeded tests may be appropriate and valuable. But these situations seem to be the exception, not the rule. Though there are no universal standards for judging speededness, measurement specialists have come to adopt this one: A test is speeded if fewer than 90 percent of the test takers are able to attempt all items.
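The 90 percent criterion for speededness translates directly into a check on a class's results. A sketch with a hypothetical function name and hypothetical counts; it treats exactly 90 percent as not speeded, following the wording "fewer than 90 percent."

```python
def is_speeded(n_attempted_all: int, n_examinees: int, threshold: float = 0.90) -> bool:
    """Speededness criterion quoted in the text: a test is speeded if fewer
    than 90 percent of the examinees attempt every item."""
    if n_examinees <= 0:
        raise ValueError("n_examinees must be positive")
    return n_attempted_all / n_examinees < threshold


# Hypothetical class of 100 examinees.
print(is_speeded(93, 100))  # False: 93 percent attempted every item
print(is_speeded(85, 100))  # True: only 85 percent did
```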
The number of questions that an examinee can answer per minute depends on the kind of questions used, the complexity of the thought processes required to answer them, and the examinee's work habits. The fastest student in a class may finish a test in half the time required by the slowest. For these reasons, it is difficult to specify precisely how many items to include in a given test. Rules such as "Use one multiple-choice item per minute" or "Allow 30 seconds per true-false item" are misleading and unsubstantiated generalizations. Only experience with similar tests in similar classes can provide useful test-length information.
Finally, the number of items needed depends also on how thoroughly the domain must be sampled. And that, of course, depends on the type of score interpretation desired. For example, a test covering 10 instructional objectives may require a minimum of 30 items when objectives-referenced interpretations are wanted, but 20 items might suffice for norm-referenced purposes.
Content Sampling Errors
If the amount of time available for testing does not determine the length of a test, the accuracy desired in the scores should determine it. In general, the larger the number of items included in a test, the more reliable the scores will be. In statistical terminology, the items that make up a test constitute a sample from a much larger collection, or population, of items that might have been used in that test. A 100-word spelling test might be constructed by selecting every fifth word from a list of the 500 words studied during the term. The 500 words constitute the population from which the 100-word sample was selected.
Consider now a student who, asked to spell all 500 words, spells 325 (65 percent) of them correctly. Of the 100 words in the sample, he spells 69 (69 percent) correctly. The difference between the 65 percent for the population and the 69 percent for the sample is known as a sampling error.
In the case of the spelling test, the population of possible questions is clear and definite. In most tests it is not; that is, there is almost no limit to the number of problems that could be invented for use in an algebra test or to the number of questions that could be formulated for a history test. Constructors of tests in these subjects, as in most other subjects, have … But their … from which to draw … samples … include only a fraction of the questions that could be asked in each case. A major problem of test constructors is thus to make their samples fairly represent the entire population of questions on the topic.

The larger the population of possible questions, the more likely it is that the content domain is heterogeneous, that is, that it includes diverse and separate … they happen to know is much greater … a 10-question test than from one of 100 … practically all educational test scores. Sampling errors are not caused by mistakes in sampling. A perfectly chosen random sample will still be subject to sampling errors simply because it is a sample.
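How large a sampling error to expect can be estimated with the standard error of a sample proportion. This sketch goes beyond the text: it assumes the sample is drawn at random without replacement (the every-fifth-word rule in the spelling example is systematic, so the formula is only an approximation) and applies the finite-population correction. The numbers come from the spelling example, a population of 500 words of which the student can spell 65 percent; the function name is hypothetical.

```python
import math


def se_sample_percent(p: float, n: int, population_size: int) -> float:
    """Standard error, in percentage points, of the percent correct on a random
    sample of n items drawn without replacement from a finite population in
    which the examinee can answer a proportion p correctly."""
    if not 0 < n <= population_size:
        raise ValueError("need 0 < n <= population_size")
    if population_size == 1:
        return 0.0
    # Finite-population correction: shrinks the error as n approaches the population size.
    fpc = (population_size - n) / (population_size - 1)
    return 100.0 * math.sqrt(p * (1.0 - p) / n * fpc)


# Spelling example: 500 words, 65 percent of which the student can spell.
for n in (10, 100):
    print(n, round(se_sample_percent(0.65, n, 500), 1))
# prints "10 14.9", then "100 4.3"
```

Under these assumptions the 4-point discrepancy in the example (69 versus 65 percent) is about one standard error, while a 10-word sample would routinely miss the population value by 15 points or so.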
LEVEL AND DISTRIBUTION OF DIFFICULTY
There are two ways in which the problem of test difficulty can be approached. One is to include in the test only those items that any student who has studied successfully should be able to answer. If this is done, most of the students can be expected to answer the majority of the items correctly. Put somewhat differently, … many of the items will not be effective in discriminating among … average, weak, and poor students. The score distribution … homogeneous, as reflected by a small … norm-referenced score interpretations … scores of disappointingly low reliability.
The other approach, for norm-referenced testing, is to choose items of appropriate content on the basis of their ability to reveal different levels of achievement among the students tested. … difficult questions. The ideal difficulty level is a point on the difficulty scale (percent correct) midway between perfect (100 percent correct response) and the chance level of difficulty (50 percent correct for true-false items, 25 percent correct for four-alternative multiple-choice items). Thus the ideal proportion of correct responses, the item p-value, should be about 75 percent correct for an ideal true-false item and about 62.5 percent correct for an ideal multiple-choice item. (The term p-value is used to refer to the difficulty of an item.) This second approach generally will yield more reliable scores than the
first, for a constant amount of testing time.

As we will see in the upcoming chapters on item writing, there are several methods item writers can use to manipulate the difficulty level of a test prepared for a specific group. For norm-referenced testing, such manipulations may be employed to create a test of the desired difficulty. Although it is possible to use the same methods to control the difficulty of items written for a criterion-referenced test, such manipulations would be inappropriate. For criterion-referenced measurement, the difficulty is built into the tasks or the knowledge descriptions that specify the content domain. When item writers manipulate item content to adjust perceived difficulty, they are in effect creating a mismatch between item content and the domain definition. These mismatches weaken content relevance by underrepresenting legitimate content and by introducing irrelevant (or less relevant) content. In turn, part of the reason for not specifying the norm-referenced content domain too precisely is that it gives the item writer license to create items of the most appropriate difficulty.
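The rule that the ideal difficulty lies midway between a perfect score and the chance level can be written as p = (1 + 1/k) / 2 for an item with k alternatives. A minimal sketch, assuming (as the text's examples do) that the chance level is 1/k; the function name is hypothetical.

```python
def ideal_p_value(n_alternatives: int) -> float:
    """Ideal proportion correct for an item: midway between a perfect score
    (1.0) and the chance level for blind guessing (1 / n_alternatives)."""
    if n_alternatives < 2:
        raise ValueError("an item needs at least two alternatives")
    chance = 1.0 / n_alternatives
    return (1.0 + chance) / 2.0


print(ideal_p_value(2))  # 0.75  (true-false)
print(ideal_p_value(4))  # 0.625 (four-alternative multiple-choice)
```

These reproduce the 75 and 62.5 percent figures given in the text.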
Some instructors believe that a good test should include some difficult items to test the better students and some easy items to give poorer students a chance. But neither of these kinds of items tends to affect the rank ordering of students' scores appreciably. The higher-scoring students generally should answer the harder items and, therefore, earn higher scores yet. Nearly everyone would answer the easy items. The effect of easy items is to add a constant amount to each examinee's score, to raise all scores, but without affecting the rank order of students' scores. For good norm-referenced achievement measures, items of moderate difficulty (not too hard and not too easy) contribute most to discriminating between students who have learned varying amounts of the content.
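Two claims above can be illustrated numerically: adding items that everyone answers correctly adds a constant to each score without changing the rank order, and items of moderate difficulty do the most to spread students apart. The sketch uses hypothetical scores and hypothetical helper names. It measures an item's contribution by its variance p(1 − p), a standard result for items scored 0 or 1 that is not stated in the text; that variance peaks at p = .50 when guessing is ignored, and the text's somewhat higher ideal values for true-false and multiple-choice items allow for the chance level.

```python
from __future__ import annotations


def ranks(scores: list[float]) -> list[int]:
    """Rank position of each examinee (0 = highest score); ties keep input order."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    result = [0] * len(scores)
    for position, i in enumerate(order):
        result[i] = position
    return result


def item_variance(p: float) -> float:
    """Variance of an item scored 0 or 1 that a proportion p answers correctly."""
    return p * (1.0 - p)


moderate = [31, 24, 38, 27]            # hypothetical scores on moderately difficult items
with_easy = [s + 5 for s in moderate]  # plus five easy items that everyone answers
print(ranks(moderate) == ranks(with_easy))  # True: rank order unchanged
print(max((0.1, 0.3, 0.5, 0.7, 0.9), key=item_variance))  # 0.5
```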
Tests designed to yield criterion-referenced score interpretations likely will be easier in difficulty level than their norm-referenced counterparts. When testing for minimum competency or for mastery, the expectation is that most students have reached the minimum level or have achieved mastery. The items in these tests should be easy for most students but should be difficult for those who have not mastered the content the items represent. It should be clear that a test item in isolation is not easy or difficult. The difficulty of an item relates to the nature of the group and depends on the extent to which those in the group possess the ability presented by the task.
SUMMARY PROPOSITIONS

1. The most important function of classroom tests is to obtain precise measures of students' achievement.
2. The form of a test (essay, objective, problem) gives no certain indication of the ability being measured.
3. Whatever form of test is used, examiners should attempt to make their measurements as objective as possible.
4. When a performance test and an objective test can be used to achieve essentially the same purpose, the objective test likely will be more efficient, be more relevant, and yield more reliable scores.
5. The precision with which the content specifications for a test should be described relates to the type of score interpretation desired.
6. A table of specifications is a planning guide for …
7. The relative importance of a content subdomain in a test depends on such factors as the amount of content it contains and the amount of instructional time devoted to it.
8. The examiner can control the distribution of test scores more easily with essay than with objective tests.
9. It is unlikely that students study more effectively in preparation for an essay test than for an objective test.
10. Essay tests can be efficient when the …
11. Multiple-choice and true-false items can be used to measure any aspect of cognitive educational achievement.
12. S… or interpretive test items tend to be … difficult to write, difficult to key objectively, and unconvincing as measures of higher mental processes.
13. Most classroom achievement tests should be short enough, in relation to the time available, that virtually all students have time to attempt all items.
QUESTIONS FOR STUDY AND DISCUSSION

2. What reasonable steps might a science or math teacher …
3. Why are performance tests often less efficient than objective tests when both are designed to serve the same purpose?
… definition for a norm-referenced test and a criterion-referenced …
9. How do content sampling errors cause score reliability to be lowered?
… might it be appropriate to use items that are highly difficult for …
True-False Test Items
From one point of view, true-false tests seem like a breeze, easier than they ought to be. From another, as many students would testify, they seem unnecessarily difficult, irrelevant, and frustrating. Some would say there are better ways of measuring achievement than by using true-false items. Yet this lack of endorsement is not universally shared among educators. A few, including the authors of this book, regard true-false items much more favorably (Ebel, 1975; Frisbie, 1973).
MERITS OF THE TRUE-FALSE FORMAT
The basic reason for using true-false test items is that they provide a simple and direct means of measuring the essential outcomes of formal education. The argument for the value of true-false items as measures of educational achievement can be summarized in four statements:

1. The essence of educational achievement is the command of useful verbal knowledge.
2. All verbal knowledge can be expressed in propositions.
3. A proposition is any sentence that can be said to be true or false.
4. The extent of students' command of a particular area of knowledge is indicated by their success in judging the truth or falsity of propositions related to it.
The rationale supporting the first statement was provided in Chapter 3. The second is almost self-evident. It is difficult to imagine any element of verbal knowledge that could not be expressed as a proposition. The third is merely a generally accepted definition. The fourth seems to be a logical consequence of the first three. It may, of course, be challenged on the basis of technical weaknesses in true-false items, but it is not likely to be rejected in principle.
To test a person's command of an idea or element of knowledge is to test his or her understanding of it. A student who can recognize an idea only when it is expressed in some particular set of words does not have command of it. Neither does the student who knows the idea only as an isolated fact, without seeing how it is related to other ideas. Knowledge one has command of is not a miscellaneous collection of separate elements, but an integrated structure that can be used to make decisions, draw logical inferences, or solve problems. It is usable knowledge.
Consider how one might test a student's understanding of Archimedes' principle. Clearly, to offer the student the usual expression of the principle as a true statement, or some slight alteration of it as a false statement, as has been done in items 1 and 2, is to misunderstand the true nature of knowledge.
(1) A body immersed in a fluid is buoyed up by a force equal to the weight of the fluid displaced. (T)
(2) A body immersed in a fluid is buoyed up by a force equal to half the weight of the fluid displaced. (F)
Instead, the student might be asked to recognize the principle in some alternative statement of it, as in items 3 and 4 below.
(3) If an object having a certain volume is surrounded by a liquid or gas, the upward force on it equals the weight of that volume of the liquid or gas. (T)
(4) The upward force on an object surrounded by a liquid or gas is equal to the surface area of the object multiplied by the pressure of the liquid or gas surrounding it. (F)
Or the student might be required to apply the principle in specific situations, such as those described in items 5 and 6 below.
(5) The buoyant force on a one-centimeter cube of aluminum is exactly the same as that on a one-centimeter cube of iron when both are immersed in water. (T)
(6) If an insoluble object is immersed successively in several fluids of different density, the buoyant force upon it in each case will vary inversely with the density of the fluids. (F)
Sometimes the use of an unconventional example can serve to test understanding:

(7) Distilled water is soft water. (T)
It is a popular misconception that true-false test items are limited to testing for simple factual recall. On the contrary, complex and difficult problems can be presented quite effectively in this form.
(8) The next term in the series 3, 4, 7, 11, 18 is 29. (T)
(9) If the sides of a trapezoid are consecutive whole numbers, and the shortest side is one of the two parallel sides, then the area of the trapezoid is 10 square units. (F)
The reason why true-false tests are often held in low esteem is not that there is anything inherently wrong with the item form. It is rather that the format is often used by unskilled item writers. It has also been alleged that true-false tests are especially susceptible to guessing and that they have harmful influences on student learning, beliefs that have not been checked against experimental data. These alleged weaknesses of true-false items will be dealt with more fully later in the chapter.
Efficiency of True-False Items
In addition to providing relevant measures of the essence of educational achievement, true-false items have the advantage of being quite efficient. The number of independently scorable responses per thousand words of test, or per unit of testing time, tends to be considerably higher than that for multiple-choice items. Research evidence has shown that students can attempt three true-false items in the time required to attempt two multiple-choice items (Frisbie, …). Offsetting this advantage in efficiency is a disadvantage in discriminating power. Item for item, true-false items tend to discriminate less well between high- and low-achieving students than multiple-choice items (Ebel, 1980). In sum, a good one-hour true-false test is likely to be as effective as a good one-hour multiple-choice test.
Compared with other item formats, true-false test items are relatively easy to write. They are simple declarative sentences of the kind that make up oral and written communication. It is true that the ideas they affirm or deny must be chosen judiciously. It is also true that the ideas chosen must be worded carefully, with a view to maximum precision and clarity, since they stand and must be judged in isolation. For this reason they must be self-contained in meaning, depending wholly on internal content, not on external context. But the basic skill involved in true-false item writing is no different from that required in any communication situation. Those who have difficulty in writing good true-false test items probably have trouble expressing themselves clearly and accurately in other forms of writing.
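The efficiency trade-off described earlier in this section (three true-false items in the time needed for two multiple-choice items, offset by lower discrimination per item) can be reasoned about with the Spearman-Brown prophecy formula, which predicts the reliability of a test whose length is multiplied by k using comparable items: r_k = k·r / (1 + (k − 1)·r). The formula is standard psychometrics but is not given in this section, and the starting reliability of 0.70 below is hypothetical.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predicted reliability after multiplying test length by length_factor
    with comparable items (Spearman-Brown prophecy formula)."""
    if not 0.0 <= reliability <= 1.0 or length_factor <= 0:
        raise ValueError("need 0 <= reliability <= 1 and length_factor > 0")
    k, r = length_factor, reliability
    return k * r / (1.0 + (k - 1.0) * r)


# Hypothetical: a true-false test of a given length has reliability 0.70.
# At three true-false items per two multiple-choice items, the same time
# allows about 1.5 times as many true-false items.
print(round(spearman_brown(0.70, 1.5), 3))  # 0.778
```

Under that assumption, filling the same testing time with 1.5 times as many true-false items would raise the predicted reliability from 0.70 to about 0.78. Whether that gain fully offsets the lower discrimination of true-false items is an empirical question, which the text answers from research rather than from this formula.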
Comparisons with Multiple-Choice Items
An obvious difference between true-false and multiple-choice items is in the number of alternatives generally offered the examinee. Another difference is in the extent of specificity of the task. It may be more difficult to decide whether a statement is true or false than to judge which of several alternatives is the best answer to a particular question. For example, students who mark a statement true may not be able to think of a counterexample, a situation that would make the proposition false. Their search for a counterexample may be bounded by time limits, by the length to which they can stretch their minds, or by the depth of the retrieval system to which … The multiple-choice item, however, limits the universe of … that the individual … there are substantial similarities … differences, … items are based on propositions.
Which of the following sentences is stated most emphatically?
a. If my understanding of the question is correct, this principle is one we cannot afford to accept.
b. One principle we cannot afford to accept is this one, if my understanding of the question is correct.
c. This principle, if my understanding of the question is correct, is one we cannot afford to accept.
d. This principle is one we cannot afford to accept, if my understanding of the question is correct.
… of Figure 8-1 involve some degree of … from propositions. Consequently, they can … format and are better posed initially as multiple-choice … basis of item 4 is more apparent than it …

(1) Changing the temperature of a mass of …
(2) More amendments were … ratification than during the next one hundred years. (T)
(3) An eclipse of the sun can occur only when the moon is full. (F)
(4) Shortening the length of a test is likely to decrease its standard error of measurement.
Figure 8-1. Multiple-Choice and Corresponding True-False Items

A. Equally Useful Formats
Multiple-choice version:
(1) The equation X² + Y² = 4 is represented …
(2) What is the main function of a corrective lens?
   *a. Change the image that falls on the retina
   b. Change the amount of light reaching the retina
   c. Remove the blind spot on the retina
True-false version:
(1a) X² + Y² = 4 is a circle. (T)
(1b) X² + Y² = 4 is an ellipse. (F)
(2a) The main function of a corrective lens is to change the image that falls on the retina. (T)
(2b) The main function of a corrective lens is to change the amount of light reaching the retina. (F)

B. Items Better Suited to True-False Format
Multiple-choice version:
(3) Which of these is not characteristic of a virus?
   a. It can live only in plant and animal cells.
   *b. It is composed of very large living …
   c. It can reproduce itself.
True-false version:
(3a) A virus can live only in plant and animal cells.
(3b) A virus is composed of very large living …

C. Items Better Suited to Multiple-Choice Format
Multiple-choice version:
(4) Which of these best describes a good citizen?
   a. Someone who pays taxes
   b. Someone who has a job
   *c. Someone who obeys the laws
True-false version:
(4a) A good citizen can be described better as a law-abider than as a taxpayer. (T)
(4b) Having a job is more characteristic of good citizenship than is obeying laws. (F)
… false items than from multiple-choice items formed by grouping one true statement with three that are false or by grouping one that is false with three that are true.
COMMON MISCONCEPTIONS ABOUT TRUE-FALSE ITEMS
There seems to be an uncritical unacceptance of the true-false format among educational researchers, writers, and testing personnel. The unfavorable attitudes about true-false items seem to be perpetuated by disappointing experiences of test takers, frustrating results of test makers, and hearsay evidence. There have been few careful empirical studies of the charges most often brought against them. An analysis of some of the most frequently heard indictments follows.
The Impact of Guessing
… against true-false tests that many make quite vigorously is that they are subject to gross error introduced by guessing. Several things can be said in response to this charge.

The first is that a distinction … and … guessing. … Blind guessing … Informed guessing, on the other hand, … The more a student knows, the more likely …

The second is that well-motivated examinees … difficulty with a generous time limit … on true-false tests. They know that … determining the correct answer. In one … responses that can be made … of 0.85 to … These values are about … classroom test, regardless of the form of test.

Some testing specialists believe that good true-false tests need not be … by correcting the scores for … were to be extensive enough to affect … that a guessing correction could …
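The text mentions correcting scores for guessing. One conventional form of that correction, not spelled out in this section, subtracts a fraction of the wrong answers: S = R − W / (k − 1), where R is the number right, W the number wrong (omitted items excluded), and k the number of alternatives per item; for true-false items (k = 2) it reduces to R − W. A sketch with a hypothetical function name and hypothetical counts.

```python
def corrected_score(right: int, wrong: int, n_alternatives: int) -> float:
    """Conventional correction for guessing: right - wrong / (k - 1).
    Omitted items are neither rewarded nor penalized."""
    if n_alternatives < 2:
        raise ValueError("need at least two alternatives per item")
    return right - wrong / (n_alternatives - 1)


# Hypothetical 100-item test: 80 right, 16 wrong, 4 omitted.
print(corrected_score(80, 16, 2))            # 64.0 (true-false)
print(round(corrected_score(80, 16, 4), 2))  # 74.67 (four-alternative multiple-choice)
```

When an examinee omits no items, W = n − R, so the corrected score is a linear function of R and the rank order of examinees is unchanged; the correction matters only insofar as examinees differ in how many items they omit.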
The Ambiguity Charge
Students who say, "If I interpret the statement this way, I'd say it is true. But if I interpret it that way, I'd have to say it is false," are complaining about apparent ambiguity. If experts in the field have the same difficulty in interpreting a particular statement, the trouble may be intrinsic ambiguity.
Apparent ambiguity may sometimes be due to inadequacies in the student's knowledge. Students have trouble interpreting a statement because the words mean something a little different to them than to the expert, or because the statement fails to evoke the necessary associations that would yield the intended interpretation. Hence apparent ambiguity is not only unavoidable, it may even be useful. By making the task of responding harder for the poorly prepared than for the well-prepared student, it can help to discriminate between the two. Thus a student's comment that a test question is unclear is not necessarily an indictment of the question. It may be, rather, an unintentional confession of his or her own lack of knowledge.
Intrinsic ambiguity, on the other hand, the kind of ambiguity that troubles the expert as much as or more than it troubles the novice, is a real concern. It probably can never be totally eliminated, since language is inherently somewhat abstract, general, and imprecise. But in the statements used in true-false test items it should be minimized.
Of course, there is sometimes truth in the charge that true-false test items are ambiguous and lack significance. One reason is that teachers sometimes try to excerpt textbook sentences for use as test items. Even in a well-written text, few of the sentences would actually make good true-false test items. Many statements serve only to keep readers informed of what the author is trying to do or to remind them of the structure and organization of the discussion. Some are so dependent for their meaning on the sentences that precede or follow them that they are almost meaningless out of context. Others are intended only to suggest an idea, not to state it positively and precisely. Still others comprise a whole logical argument, involving two or three propositions, in a single sentence. Another category of statements is intended not to describe what is true, but to prescribe what ought to be true. Finally, some are expressed so loosely and so tentatively that there is hardly any possible basis for doubting them. In all the writing we do to preserve the knowledge we have gained and to communicate it to others, there seem to be very few naturally occurring nuggets of established truth.

For this reason it is seldom possible to find in a text or reference work a sentence that can be copied directly for use as a true statement or transformed by a simple negation for use as a false statement. The writing of good true-false items is more a task of creative writing than of copying. This may be a fortunate circumstance, for it helps test constructors avoid the hazard of writing items that would encourage and reward rote learning.
A special source of ambiguity in true-false test items needs to be guarded against: uncertainty on the part of the examinee as to the examiner's standards of truth. If the statement is not perfectly true, if it has the slightest flaw, should it be considered false? Probably not. The item writer's task will be easier, and the test will be better, if the examinee is directed to consider as true any statement that has more truth than error in it, or any statement that is more true than …
Figure: Reward for Memorization. Recall items (e.g., The author of Don Quixote was Cervantes; The chemical formula for water is H2O) are contrasted with items that test knowledge of a functional relationship and items that test the ability to apply principles (e.g., The time from moonrise to moonset is usually longer than the time from sunrise to sunset).
Effects on Learning Outcomes
Critics of true-false tests sometimes charge that their use has harmful effects on learning: that the items (1) encourage students to concentrate on remembering isolated factual details and to rely heavily on rote learning, (2) encourage students to accept grossly oversimplified conceptions of truth, and (3) expose students undesirably to error. How defensible are these charges?

True-false items need not emphasize memory for isolated factual details. Good ones present novel problems to be solved and thus emphasize understanding and application. Even those that might require recall of factual details do not necessarily lead to rote learning, for facts are hard to remember in isolation. They are retained and can be recalled better if they are part of a structure of knowledge.
There is some reason to believe that rote learning is something of an educational bogeyman, often warned against and cited as the cause of educational failure, but seldom practiced or observed. Rote learning is not much fun, and it promises few lasting rewards. Most students and teachers properly shun it. Perhaps its supposed prevalence results from an error in inference. It is surely true that rote learning always results in incomplete learning (that is, lack of understanding), but it does not follow that all incomplete learning is the result of too much rote learning. It may simply be the result of too little learning of any sort.
What of the second charge, that the categorical way in which answers are both offered and scored is likely to give students false notions of the nature of truth? Evidence in support of this argument is seldom presented, and the argument itself is seldom advanced by those who have used true-false tests extensively. Test writers know that students will often defend answers that disagree with the writers' own. Often the students will point to the complexity of the entire subject and will insist that a case can be made for the alternative answer. Usually the author concedes that the statement in question is neither perfectly true nor totally false. The discussion that normally follows tends to emphasize, rather than to conceal, the complexity, the impurity, and the relativity of truth. On occasion it leads to the conclusion that the item in question was simply a bad item, poorly conceived or poorly expressed.
Now consider the third charge: that true-false test items are educationally harmful because they expose the student to error. The argument is that the presentation of false statements as if they were true may have a negative suggestion effect, causing students to believe and remember untruths. However, Ruch (1929) tentatively concluded that the negative suggestion effect in true-false tests is probably much smaller than is sometimes assumed and is fully offset by the net positive teaching effect. Other experimental studies confirm this conclusion, and as Ross (1947) pointed out:
Whether or not a false statement is dangerous depends largely upon the setting in which it appears. A false statement in the textbook, toward which the characteristic pupil attitude is likely to be one of passive, uncritical acceptance, might easily be serious. But the situation is different with the items in a true-false test. Here the characteristic attitude of the modern pupil is one of active, critical challenge. (p. 411)
In light of these findings, we conclude that well-conceived and well-developed true-false test items can contribute substantially to the measurement of educational achievement. The harm some fear they might do is trivial in comparison.
WRITING EFFECTIVE TRUE-FALSE ITEMS
The instructor who wishes to write a true-false item for a classroom test should begin by focusing attention on some segment of the knowledge that has been taught. It is assumed that the item writer is in firm command of that segment of knowledge and that it is something any capable student of the subject ought to understand. This segment of knowledge is, or easily could be, described in a single paragraph such as those found in any good textbook adopted for the class. Accordingly, item writers usually find it easier and more effective to use instructional materials as the source of ideas for test items than to derive those ideas directly from educational objectives.
Suppose now that an item writer singles out a specific paragraph of text intended to help the student develop some segment of knowledge. Take, for example, this paragraph:
The practice of giving examinees a choice among optional questions is usually unwise. Although special circumstances may sometimes make such options necessary, when examinees answer different questions, the basis for comparing their scores, their levels of performance, is eroded. Students who have answered different sets of questions actually have taken different tests. These different tests are not likely to be nearly the same in difficulty or in the achievements they measure. And certainly when students choose the questions on which they can perform best, their scores are likely to minimize the differences in achievement among examinees. That is, a narrow range of high scores will probably result. Dependable norm-referenced score interpretations will be difficult to make because the small score differences would more likely be due to measurement error than to differences in achievement.

Optional questions are sometimes justified on the ground that giving students a choice among the questions they are to answer makes the test "fairer." But if all the questions involve essential aspects of achievement in a course (as they ordinarily ought to), it is not unfair to any student to require answers to all of them. Furthermore, an opportunity to choose among optional questions may help the poorer student considerably but may actually distract the well-prepared student.
The first question the item writer must pose is, "What are the most important ideas presented in this paragraph?" Here are three of several propositions that can be identified:

1. The use of optional essay items interferes with interindividual score comparisons.
2. The use of optional essay items usually contributes to reduced test score variability.
3. The use of optional items usually results in reduced test score reliability.
The next question is how these ideas can be expressed as true-false test items. At this point, a very important suggestion can be offered: Always think of the opposite. Of course, only one member of each possible pair of true-false test items, one true and one false, is actually used in the test. However, unless a parallel but opposite statement can be made, the proposition is not likely to make a good true-false test item. Here are some item pairs derived from the ideas presented above:
1a. The use of optional rather than required essay items reduces the ability to make norm-referenced interpretations. (T)
1b. The use of optional rather than required essay items enhances the ability to make criterion-referenced interpretations. (F)
2a. The scores resulting from the use of optional rather than required essay items will exhibit reduced variability. (T)
2b. The scores resulting from the use of optional rather than required essay items will exhibit increased variability. (F)
3a. The reliability of scores from optional essay items is likely to be smaller than for scores based on required essay items. (T)
3b. The reliability of scores from optional essay items is likely to be larger than for scores based on required essay items. (F)
4a. The reliability advantage of using required versus optional essay items is due to differences in score variability. (T)
4b. The reliability advantage of using required versus optional essay items is due to differences in test lengths. (F)
Note that many variations can be developed from the propositions listed and that none of the items, or the propositions, is a reproduction of one of the original sentences. All items are designed to test for understanding, not simply for recall of sentences read or heard.
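Item pairs 3 and 4 rest on a relationship the paragraph leaves implicit: if measurement error stays about the same, shrinking the spread of true scores lowers reliability. A minimal numeric sketch, assuming the classical test theory model in which observed-score variance is true-score variance plus error variance; the variance figures are invented purely for illustration.

```python
def reliability(true_variance: float, error_variance: float) -> float:
    """Classical test theory: the share of observed-score variance that is
    true-score variance, sigma_T^2 / (sigma_T^2 + sigma_E^2)."""
    return true_variance / (true_variance + error_variance)


if __name__ == "__main__":
    error_var = 25.0  # assumed to be the same for both versions of the test
    required = reliability(true_variance=100.0, error_variance=error_var)  # wide spread
    optional = reliability(true_variance=25.0, error_variance=error_var)   # narrowed spread
    print(f"required items: {required:.2f}")  # 0.80
    print(f"optional items: {optional:.2f}")  # 0.50
```

With the error variance held fixed, cutting the true-score variance from 100 to 25 drops reliability from 0.80 to 0.50, which is the sense in which the reliability disadvantage of optional items comes from reduced score variability rather than from test length.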
Guidelines for Item Development
There are five general requirements for a good true-false test item:

1. It should test the examinee's knowledge of an important proposition, one that is likely to be significant and useful in coping with a variety of situations and problems. It should say something worth saying.
2. It should require understanding as well as memory. Simple recall of meaningless words, empty phrases, or sentences learned by rote should not be enough to permit a correct answer.
3. The intended correct answer (true or false) should be easy for the item writer to defend to the satisfaction of competent critics. The true statements should be true enough, and the false statements false enough, that an expert would have no difficulty distinguishing between them. Any explanation or qualification needed to justify an unconditional answer should be included in the item.
4. On the other hand, the intended correct answer should be obvious only to those who have good command of the knowledge being tested. It should not be a matter of common knowledge. It should not be given away by an unintended clue. The wrong answer should be made attractive to those who lack the desired command.
5. The item should be expressed as simply, as concisely, and above all as clearly as is consistent with the preceding four requirements. It should be based on a single proposition. Common words should be given preference over technical terms. Sentences should be short and simple in structure. Essentially true statements should not be made false by simply inserting the word not.
Here are some pairs of true-false test items that illustrate these requirements. The first of each pair is an acceptable item, while the second is poor.

1. The item tests an important idea.
(1) President Kennedy attempted to solve the missile crisis by threatening a blockade of Cuba. (T)
(2) President Kennedy was 12 years older than his wife. (T)

The difference in ages between President Kennedy and his wife might be a subject for comment in a casual conversation, but it has little to do with the important events of the time. The Cuban missile crisis, on the other hand, brought the United States and Russia to the brink of war. How this crisis was handled is a far more important element in world history than a difference in ages between a president and his wife.
(3) Words like some, usually, all, or never should be avoided in writing true-false test items. (F)
(4) Two pitfalls should be avoided in writing true-false test items. (F)

Item 4 is the type of textbook sentence that sets the stage for an important pronouncement but fails to make it. Item 3, on the other hand, tests the examinee's understanding of several important principles. Specific determiners like some and usually provide irrelevant clues when used only in true statements. If used in false statements, they tend to attract wrong answers from the ill-prepared student. Conversely, specific determiners like all or never are useful in attracting wrong answers from the uninformed when used in true statements.
(5) More salt can be dissolved in a pint of warm water than in a pint of cold water. (T)
(6) Some things dissolve in other things. (T)

A statement like that in item 6 is too general to say anything useful. Item 5, on the other hand, provides a test of the understanding of an important relationship.
2. The item tests understanding. It does not reward recall of memorized phraseology.
(7) When a hand pushes a door with a certain force, the door pushes back on the hand with the same force. (T)
(8) For every action there is an equal and opposite reaction. (T)
(9) If the hypotenuse of an isosceles right triangle is seven inches long, each of the two equal legs must be more than five inches long. (F)
(10) The square of the hypotenuse of a right triangle equals the sum of the squares of the other two sides. (T)
Both items 8 and 10 are word-for-word statements of important principles that could be learned by rote, without understanding. To test understanding, it is better to present novel, specific applications that avoid the stereotyped phrasing, as has been done in items 7 and 9.
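A quick check of the arithmetic shows why reciting the theorem in item 10 is not enough to answer item 9. In an isosceles right triangle the two legs are equal, so leg squared plus leg squared equals the hypotenuse squared, and each leg is the hypotenuse divided by the square root of 2. A minimal sketch using only the numbers given in item 9:

```python
import math

hypotenuse = 7.0  # inches, as stated in item 9
# Isosceles right triangle: leg**2 + leg**2 == hypotenuse**2,
# so each leg equals hypotenuse / sqrt(2).
leg = hypotenuse / math.sqrt(2)
print(f"each leg = {leg:.3f} inches")     # 4.950
print("more than five inches?", leg > 5)  # False, consistent with the (F) key
```

Each leg is just under five inches, so a student who applies the principle rather than merely remembering its wording will mark item 9 false.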
3. The correct answer to the item is defensible.
(11) Moist air is less dense than dry air. (T)
(12) Rain clouds are light in weight. (T)

Since a rain cloud seems to float in the air, it might reasonably be called light in weight. On the other hand, a large rain cloud may weigh more than 100,000 tons. One cubic foot of the cloud probably weighs more than a cubic foot of cloudless dry air, since the cloud contains droplets of water. On the other hand, moist air alone (item 11) weighs less per cubic foot than does dry air, other things (for example, pressure and temperature) being equal. In the absence of any mention of them, a reasonable person is justified in assuming that temperature and pressure are not to be taken as different.
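The defensibility of item 11 rests on molar masses: water vapor (about 18 g/mol) is lighter than the nitrogen and oxygen it displaces (dry air averages about 29 g/mol). Treating air as an ideal gas, density at equal pressure and temperature is proportional to the average molar mass of the mixture. A rough sketch under that ideal-gas assumption; the 2 percent vapor fraction and the sea-level, 20 degree C conditions are chosen only for illustration.

```python
M_DRY_AIR = 28.97  # g/mol, approximate mean molar mass of dry air
M_WATER = 18.02    # g/mol, molar mass of water vapor
R = 8.314          # J/(mol*K), molar gas constant


def gas_density(pressure_pa: float, temp_k: float, vapor_fraction: float) -> float:
    """Ideal-gas density (kg/m^3) of air containing the given mole fraction of water vapor."""
    molar_mass = (1 - vapor_fraction) * M_DRY_AIR + vapor_fraction * M_WATER  # g/mol
    return pressure_pa * (molar_mass / 1000) / (R * temp_k)


if __name__ == "__main__":
    p, t = 101_325.0, 293.15  # same pressure and temperature for both samples
    print(f"dry air:   {gas_density(p, t, 0.00):.4f} kg/m^3")
    print(f"moist air: {gas_density(p, t, 0.02):.4f} kg/m^3")
```

Holding pressure and temperature equal is exactly the "other things being equal" condition the discussion above assumes; without it, the comparison is no longer defensible.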
(13) The proposal that salary schedules for teachers ought to include skill in teaching as one of the determining variables is supported more strongly by teachers' organizations than it is by taxpayers. (F)
(14) Merit is an important factor affecting a teacher's salary. (F)

The first version is much more specific and much more clearly false than the second. Experts would agree on the answer to the first, but would be troubled by the ambiguity of the second. Across the country there is no doubt that the salaries of good teachers are higher than those of poor teachers. However, it is also true that the salary schedules of many school systems do not include merit as one of the determining factors.
(15) The twinkling of starlight is due to motion in the earth's atmosphere. (T)
(16) Stars send out light that twinkles. (T)

The answer to the second, unacceptable version of this item could be challenged by a reasonable, well-informed person on the following grounds: It is not the light sent out by the star that twinkles. That light is relatively steady. But because of disturbances in our atmosphere, the light that reaches our eyes from the star often appears to twinkle. That the second version is unacceptable is due either to the limited knowledge or to the carelessness in expression of the person who wrote it.

4. The answer is not obvious. It takes special knowledge to answer the item correctly.

(17) Frozen foods are usually cheaper than canned foods. (F)
(18) Frozen foods of the highest quality may be ruined in the kitchen. (T)
(19) Most local insurance agencies are owned and controlled by one of the major national insurance companies. (F)
(20) Insurance agencies may be either general or specialized. (T)
Who could doubt the possibility of cooking any kind of food badly? How plausible is the belief that only general or only specialized insurance agencies would be found? The correct answers to items 18 and 20 are too obviously true to discriminate high achievement from low. Both read like introductory sentences lifted from a textbook, sentences that set the stage for an important idea but do not themselves express important ideas.

5. To one who lacks the knowledge being tested, a wrong answer should appear more plausible than the correct one.
(21) By adding more solute, a saturated solution can be made supersaturated. (F)
(22) A supersaturated solution contains more solute per unit than a saturated solution. (T)
It appears reasonable to believe that adding more solute would turn a saturated solution into a supersaturated solution (item 21). But those who understand solutions know that it doesn't work that way. The added solute won't dissolve in a saturated solution. Only by evaporating some of the solvent, or by cooling the solution, can a saturated solution be made supersaturated. The student who tries to use common sense as a substitute for special knowledge is likely to give a wrong answer (which is all his knowledge entitles him to) to the first item. But the same common sense leads the student of low achievement to answer item 22 correctly. Thus the second version fails to function properly as a test of the student's command of knowledge.
6. The item is expressed clearly and based on a single idea.

(23) The salt dissolved in water can be recovered by evaporation of the solvent. (T)
(24) Salt can be dissolved in water and can be recovered by evaporation of the solvent. (T)
(25) At conception the sex ratio is approximately 3 boys to 2 girls. (T)
(26) Scientists have found that male-producing sperm are stronger and live longer than female-producing sperm, which accounts for the sex ratio at conception of approximately 3 boys to 2 girls. (T)

Another unacceptable item, one that combines two ideas inappropriately, might be: Salt dissolves in hot water, and sugar dissolves in cold water. (T)
An item based on a single idea is usually easier to understand than one based on two or more ideas. It is also more efficient. One can obtain a more complete measure of a student's achievement by testing separate ideas separately than by lumping them together.
(27) Individuals who deliberate before making choices seldom find themselves forced to sacrifice one good thing in order to attain another. (F)
(28) Life is a continuous process of choice making, sacrificing one human value for another, which goes through the following steps: spontaneous momentary selections reaching for everything we want, conflicting preferences hold each other in check, hesitation becomes deliberation as we weigh and compare values, and finally choice or preference
7. The item is worded concisely.
(29) The federal government pays practically the entire cost of constructing and maintaining highways that are part of the interstate highway system. (F)
(30) When you see a highway with a marker that reads "Interstate 80," you know that the construction and upkeep of that road are built and maintained by the state and federal governments. (T)

The wording of item 30 is careless and redundant. It is the road that is built and maintained, not its construction and upkeep. The phrases "When you see" and "you know that" add words without adding to what the item measures. Finally, including state as well as federal governments as supporters of the interstate highway system probably makes the item easier for the uninformed. Item 29 hits the intended mark more clearly because it is more straightforward and concise.
8. The item does not include an artificial or tricky negative.
(31) Columbus made only four voyages of exploration to the Western Hemisphere. (T)
(32) Columbus did not make four voyages of exploration to the Western Hemisphere. (F)

Some item writers try to turn textbook propositions into false statements simply by inserting the word not in the original statement. The result is seldom good. The item usually carries the clear birthmark of its unnatural origin: it reads awkwardly and invites suspicion, which, if the item is indeed false, may give away the answer. Furthermore, these items tend to be tricky. An unobtrusive not in an otherwise wholly true statement may be overlooked by even a well-prepared examinee. Such items put students at an unnecessary disadvantage. Negatively worded statements have been shown to be more difficult and to create more confusion and hostility in examinees than positively worded true-false statements (Barker and Ebel, 1982).
Reducing Ambiguity
Why are multiple-choice items seldom criticized for being ambiguous, while true-false items seem to be faulted with regularity? The answer, we think, is in the item formats themselves. With multiple-choice items, the response that appears to be most correct, relatively, is the one to be defended. There is relative comparison inherent in the item. With true-false items, however, the statement usually is absolute, and examinees search their knowledge structures for alternatives. A true-false item that states, "Mt. Palomar is a tall mountain," leads the examinee to a search for high peaks so that the tallness of Mt. Palomar can be judged relative to the heights of other mountains. A corresponding multiple-choice item may ask which is tallest and provide these choices: Mt. Whitney, Mt. St. Helens, Mt. Palomar, and Grand Teton. Though Mt. Palomar is the lowest of the four peaks, by itself it would certainly fit most conceptions of a tall mountain. This simple illustration of the format difference suggests a solution to the ambiguity problem in true-false items.
Most true-false items can be made essentially ambiguity-free by introducing a comparison within the item. Here are some sample pairs of poor and improved items:

1a. Open-book tests tend to be inefficient. (?)
1b. Open-book tests tend to be less efficient than closed-book tests. (T)
2a. Take-home exams usually are high in quality. (?)
2b. Take-home exams usually are higher in quality than in-class exams. (F)
3a. The use of closed-book, in-class tests contributes to high retention by students. (?)
3b. The use of closed-book, in-class tests contributes more to high retention by students than does the use of take-home tests. (T)
3c. The use of closed-book, in-class tests contributes more to high retention by students than to reduction of test anxiety. (T)
In each item an internal comparison is introduced so that "inefficient," "high in quality," and "high retention" can be judged by a common standard.
In other cases, statements may be ambiguous because of the imprecise wording chosen by the item writer. Instead of saying "a long test," for example, say "a 100-item test." Instead of referring to "an easy item," describe it as "an item that at least 85 percent of examinees answer correctly." Item writers should seek both brevity and precision in true-false items; when the two conflict, brevity with imprecision leads to worse consequences than does verbosity with clarity.
Enhancing Item Discrimination

The job of a test item is to discriminate between those who have command of some element of knowledge and those who lack it, so that valid interpretations can be made of the test results. The student who has that command should be able to answer the question correctly with little difficulty, whereas the student who lacks it should find a wrong answer attractive. To produce items that will discriminate in this way is one of the arts of item writing. Here are some suggestions on how true-false items can be produced to promote discrimination:
1. Use more false than true statements.

When in doubt, students seem more inclined to accept the propositions presented to them than to reject them. Perhaps as a result, false statements tend to discriminate more sharply between students of high and low achievement than do true statements (Barker and Ebel, 1982). This may be what is called an "acquiescence" response set: a tendency to accept rather than to reject a declarative statement one cannot judge.
Instructions for preparing true-false tests sometimes suggest including about the same number of false and true statements. But if false statements tend to be higher in discrimination, it would seem desirable to include a higher proportion of them. Even if students learn to expect a greater number of false items, the technique still seems to work. In one of the author's classes, students took a test on which, after answering the questions and counting how many they had marked true, they were told the correct number of true statements and were allowed to change their answers as they wished. Most of them changed some answers, but their scores changed very little, on the average. They changed about as many answers from right to wrong as from wrong to right.
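2. Word the item so that superficial logic suggests a wrong answer.

(1) A rubber ball weighing 100 grams floats on the surface of a pool of water with exactly half its volume submerged. An additional downward force of 50 grams will submerge it completely. (F)

On a superficial basis, since the ball's 100-gram weight submerges one-half of it, another 50 grams might seem enough to push it under. The true case, of course, is that if 100 grams submerges only half of the ball, another 100 grams would be required to submerge the other half. Superficial logic also would make the incorrect answer to the next item plausible.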
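The arithmetic the superficial reasoner skips can be made explicit. A sketch assuming, as the item implies, that buoyant force is proportional to the submerged volume (Archimedes' principle) and that forces are expressed in gram-weight as in the item; the function name is hypothetical.

```python
def extra_force_to_submerge(weight_g: float, fraction_submerged: float) -> float:
    """Additional downward force (gram-weight) needed to push a floating
    object fully under, assuming buoyancy is proportional to submerged volume."""
    # While floating, the buoyancy on the submerged fraction equals the weight,
    # so full submersion produces weight / fraction of buoyant force.
    full_buoyancy = weight_g / fraction_submerged
    return full_buoyancy - weight_g


if __name__ == "__main__":
    print(extra_force_to_submerge(100.0, 0.5))  # 100.0 grams, not 50
```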
(2) Since students show a wide range of individual differences, the ideal measurement situation would be achieved if each student could take a different test specially prepared for him or her. (F)
(3) The output voltage of a transformer is determined in part by the number of turns on the input coil. (T)
(4) A transformer that will increase the voltage of an alternating current can also be used to increase the voltage of a direct current. (F)
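Both items turn on the ideal-transformer relation V_out = V_in × N_out / N_in. The input-coil turn count appears in the formula, which is why item 3 is true even though superficial logic points only to the output coil. Item 4 is false because a transformer works through a changing magnetic flux, which a steady direct current does not produce. A sketch assuming an ideal, lossless transformer; the function is illustrative, not a circuit model.

```python
def transformer_output_voltage(v_in: float, n_in: int, n_out: int,
                               alternating: bool = True) -> float:
    """Ideal-transformer output (secondary) voltage.

    Induction requires a changing flux, so a steady direct current induces
    no voltage in the output coil once the current has settled.
    """
    if n_in <= 0 or n_out <= 0:
        raise ValueError("coil turn counts must be positive")
    if not alternating:
        return 0.0
    return v_in * n_out / n_in


if __name__ == "__main__":
    print(transformer_output_voltage(120.0, 100, 1000))  # 1200.0: step-up for AC
    print(transformer_output_voltage(120.0, 200, 1000))  # 600.0: more input turns, lower output
    print(transformer_output_voltage(12.0, 100, 1000, alternating=False))  # 0.0 for steady DC
```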
3. Make the wrong answer consistent with a popular misconception or a popular belief irrelevant to the question.

(5) The effectiveness of tests as tools for measuring achievement is lowered by the apprehension students feel for them. (F)

Many students do experience test anxiety, but for most of them it facilitates rather than impedes maximum performance.

(6) An achievement test should include enough items to keep every student busy during the entire test period. (F)

Keeping students busy at worthy educational tasks is usually commendable, but in this case it would make rate of work count too heavily as a determinant of the test score.
4. Use specific determiners in reverse to confound testwiseness.

(7) A 50-item test generally will be more reliable than a 25-item test. (T)
(8) In a positively skewed distribution the mean is always larger than the mode. (T)
(9) True statements usually are more discriminating than false statements. (F)
(10) A correlation of +0.28 is never considered to be higher than a correlation of -0.35. (T)
5. Use phrases in false statements that give them the "ring of truth."

(11) The use of better achievement tests will, in itself, contribute little or nothing to better achievement. (F)

The phrases "in itself" and "little or nothing" impart a tone of sincerity and rightness to the statement that conceals its falseness from the uninformed.
(i2) To onsurscompr€h€nstv€
m6r*ur.m€nrot.!ci rlp€d or.cht€voment.dtfi€rsnrHnd.
ol ltema ftsst be lpoctttcaly wr ton, In du€ propor{on., to ts3t o.clr dtsUnctm.nt.l
p.oc6ssrhe cour36t3 Intendodro d.votop. (D
[...] "in due proportion" [...] are expressions that testwise individuals associate mainly with true statements.
MULTIPLE TRUE-FALSE ITEMS
Multiple true-false items resemble multiple-choice items in their physical appearance. However, rather than selecting one best answer from several alternatives, the examinee responds to each alternative with a true or false judgment. Unlike in a multiple-choice item, any number of the clustered alternatives may be true. The number of alternatives per item or cluster need not remain constant throughout a test. Here is a sample item:
An ecologist losing weight by jogging and exercising is
1. increasing maintenance metabolism. (T)
2. decreasing net productivity. (T)
3. increasing biomass. (F)
4. decreasing energy lost to decomposition. (F)
5. increasing gross productivity. (F)
Notice that the alternatives are numbered consecutively throughout the test and that each item stem is introduced by some symbol that makes the item clusters readily identifiable. One item, for example, might contain choices 7 to 10, the next choices 11 to 14, and so on.
The multiple true-false form has several appealing features relative to the multiple-choice format (Frisbie and Sweeney, 1982). Examinees can make at least three multiple true-false item responses, on the average, in the time required to answer a single multiple-choice item. That is a decided advantage in an hour of testing time, because the longer test permits testing of greater breadth or depth of content. In addition, it has been shown that a multiple true-false test prepared by converting items from multiple-choice form yields a higher reliability estimate than the original multiple-choice test (Frisbie and Druva, 1986; Kreiter and Frisbie, 1989). Finally, two less critical outcomes can be noted. Students who were examined by both item types expressed an overwhelming preference for multiple true-false items and perceived them to be easier than multiple-choice items.
Multiple true-false items can be developed using any of several strategies. Existing multiple-choice items can be converted easily to multiple true-false clusters without significant rewording of the stem or the responses. Some items would need to be modified so that each cluster would not [...]
For which of the quantities below is it more reasonable to estimate the quantity by measuring a sample rather than the whole population?
I. The average life of a new brand of TV tubes
II. The percent of American voters who favor the [...]
III. The number of [...]
A. [...]
B. [...]
C. Both [...]
D. [...]
E. [...]

In multiple true-false form, the item might look like this:

It would be more reasonable to measure the whole population to estimate the
1. average life of a new brand of TV tubes. (F)
2. percent of American voters who [...] foreign policy. (F)
3. number of teachers in [...]
4. number of students in [...] school who usually ride the bus to school. (T)
Notice that it is possible [...] quite readily, but to do so [...] complexity for the item writer and [...] same cognitive tasks, but the second [...].
Finally, those who prepare multiple true-false items by the conversion method can be [...] for preparing multiple-choice items, because the item writer is not [...] of responses developed. Any concept [...] multiple-choice items can be measured [...].
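The conversion strategy and the consecutive numbering of alternatives can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' procedure: the names `MultipleChoiceItem`, `to_true_false_cluster`, and `score_cluster` are hypothetical, the three-option item is made up from the ecology example, and each alternative is scored as its own true-false response (one point per correct judgment).

```python
from dataclasses import dataclass


@dataclass
class MultipleChoiceItem:
    stem: str
    options: list[str]
    key: int  # index of the single keyed (correct) option


def to_true_false_cluster(item, start_number):
    """Turn each option into a consecutively numbered true-false statement.

    Returns (number, statement, keyed_true) tuples. A straight conversion keys
    exactly one statement true; the item writer may then revise options so
    that any number of them are true.
    """
    return [
        (start_number + offset, f"{item.stem} {option}", offset == item.key)
        for offset, option in enumerate(item.options)
    ]


def score_cluster(cluster, responses):
    """Award one point for each statement whose true/false mark matches the key."""
    return sum(1 for number, _, keyed in cluster if responses.get(number) == keyed)


# Hypothetical item; numbering starts at 7 to mimic a cluster mid-test.
item = MultipleChoiceItem(
    stem="An ecologist losing weight by jogging and exercising is",
    options=["increasing biomass.", "decreasing net productivity.", "increasing gross productivity."],
    key=1,
)
cluster = to_true_false_cluster(item, start_number=7)
for number, statement, keyed in cluster:
    print(number, statement, "(T)" if keyed else "(F)")
print(score_cluster(cluster, {7: False, 8: True, 9: True}))
```

Scoring each statement separately is what turns one multiple-choice item into several scorable responses; unanswered statements simply earn no point.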
SUMMARY PROPOSITIONS
1. True-false items provide a simple and direct means of measuring the essential outcomes of formal education.
2. The low esteem in which true-false tests are sometimes held is due to their misuse, not to inherent weaknesses.
3. True-false items provide information on essential achievement more efficiently than most other item forms.
4. Most important aspects of achievement can be tested equally well with either true-false or multiple-choice items.
5. It is less efficient to group true-false statements to produce a multiple-choice item than to require separate responses to each of the true-false statements.
6. Informed guesses, as opposed to blind guesses, provide useful indications of achievement.
7. Students do very little blind guessing on good true-false tests.
8. The probability of an examinee achieving a high score on a true-false test by guessing blindly is extremely low.
9. True-false items that appear ambiguous only to poorly prepared students are likely to be powerful discriminators.
10. Few textbook sentences are significant enough, and meaningful enough out of context, to be used as true statements in a true-false test.
11. Statements that are essentially (but not perfectly) true or essentially (but not totally) false can make good true-false items.
12. True-false items can test students' comprehension of important ideas and their ability to use them in solving problems.
13. There are no firm empirical data to support the notions that true-false tests encourage rote learning, oversimplified conceptions of the truth, or the learning of false or incorrect ideas.
14. Generally, it is easier to develop test items from instructional materials than from statements of instructional objectives.
15. A useful strategy for developing true-false items is to create pairs of statements, one true and one false, based on a single idea.
16. Good true-false items express single, not multiple, ideas.
17. Generally, a good false statement cannot be created by inserting a negative ("not") in a true statement.
18. Ambiguity in items can be minimized by writing statements that contain an internal comparison of [...]
19. False statements tend to be more highly discriminating than true statements.
20. Specific determiners can be used in ways that will hinder rather than help the poorly prepared test-taker.
21. False statements can be made to seem plausible by using familiar terms and phrases in seemingly straightforward factual statements.
22. Relative to multiple-choice items, multiple true-false items are more efficient, are easier to prepare, and yield more reliable scores.
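Proposition 8 can be checked with a binomial calculation. The sketch below is an illustration, not from the source: it assumes that a blind guess on a true-false item is a coin flip (probability 0.5) and that guesses on different items are independent. `prob_at_least` is a hypothetical helper name.

```python
from math import comb


def prob_at_least(n_items, min_correct, p_correct=0.5):
    """Chance of at least min_correct right answers from blind guessing.

    Models the number correct as Binomial(n_items, p_correct); p_correct = 0.5
    corresponds to coin-flip guessing on two-choice (true-false) items.
    """
    return sum(
        comb(n_items, k) * p_correct**k * (1 - p_correct) ** (n_items - k)
        for k in range(min_correct, n_items + 1)
    )


# Hypothetical 100-item true-false test: chance of 70 or more by pure guessing.
print(f"{prob_at_least(100, 70):.2e}")
```

For this hypothetical test the chance comes out well under one in ten thousand. Informed guessing (proposition 6) breaks the coin-flip assumption and is not modeled here.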
QUESTIONS FOR STUDY AND DISCUSSION
1. How could a good true-false item and the proposition on which it is based be distinguished?
2. What are some logical explanations for the general advantage of multiple-choice over true-false in terms of item discrimination?
3. Why might the task of responding to a true-false item be more difficult inherently than responding to a content-equivalent multiple-choice item?
4. What steps can a test constructor take to diminish the potential negative effects of guessing on true-false tests? On multiple-choice tests?
5. How does the use of verbatim textbook sentences as true-false items contribute separately to ambiguity and to measurement of triviality?
6. What are the arguments that support and refute the idea that true-false items implant "untruths" or misinformation in the minds of test takers?
7. How does the addition of an internal comparison in a true-false item make it more like a [...]?
8. Under what circumstances might the advice "Use more false than true items" have more negative than positive consequences for obtaining valid measures?
9. Why might it be bad advice to recommend that specific determiners not be used in true-false items?
Multiple-Choice Test Items
THE POPULARITY OF THE MULTIPLE-CHOICE FORMAT
Multiple-choice items have long been the [...] choice than in some other item forms. [...] less ambiguous than completion or [...] easier to defend correct answers [...] of guessing. [...] instructors and students [...] detrimental in [...] in multiple-choice than in true-false tests.
Multiple-choice items have withstood the test of time well. They have [...] criticism and they remain [...] other objective items: they are superficial, [...]
MULTIPLE-CHOICE TEST ITEMS 155
[...] critics have said or implied that the only good way to test [...] is some form of performance testing that is authentic, that is realistic, and that permits the assessment of higher-order cognitive skills. The implications are that we must use one method or the other, and that multiple-choice items measure only a [...] of behaviors [...] through some form of performance.
Few objective tests or items are so perfect as to be above reproach. But there are several weaknesses in the indictments that have been made against multiple-choice tests and items. First, the [...] supported by unbiased empirical data, despite the fact that [...] easy to obtain. The shortcomings mentioned by critics should [...] the discriminating power of the items and the reliability of [...]. [...] test specialists and expert item makers have demonstrated repeatedly [...] that scores from multiple-choice tests can be highly reliable.
[...] multiple-choice testing and [...] to be based on generalizations [...] misleading instances of poor test items. In the second place, [...] have seldom made a serious attempt to write [...]. [...] assumed [...] educational [...] assessment of performance [...] methods that would not require [...]. But such proposals seldom account for the limits [...] inherent in the testing tasks: the reliability of the scores assigned by [...], the costs in time and personnel required to carry out the processes. There are strengths and weaknesses associated with any form of assessment. [...] of these proposals [...] important [...]. [...] educational measurement is more appropriate [...] instructional [...].
Even the most ardent advocates [...]. They acknowledge, as we do, that [...] serious flaws and that, in general, they [...] discriminating as they should be. Most [...] with the observation that the scores they [...] be, or ought to be, for maximum value. But [...] multiple-choice testing until a substitute with less [...]. The [...] alternatives, essay or performance testing, are clearly less convenient and [...] to use in many situations.
Production versus Selection
It is sometimes suggested that objective tests are inevitably more superficial [...] and less realistic tests of students' knowledge than are essay tests. [...] answers to the student, the examiner has [...]. [...] most good objective test items require [...] hard thought, the [...] choice among [...] do not permit correct response on the [...] memory, or meaningless verbal association. [...] processes involved in selecting an answer [...]
A child buys jelly beans, which the grocer picks up, without regard to color, from a tray containing a mixture of jelly beans of three different colors. What is the smallest number of jelly beans the child can buy and still be certain of getting at least four jelly beans of the same color?
The answers provided are 4, 7, 10, and 12.
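Reading the item as asking for a guarantee of at least four beans of one color when there are three colors, the keyed answer follows from the pigeonhole principle: in the worst case the child gets three beans of each color, so 3 × 3 + 1 = 10 beans are needed. The Python sketch below is illustrative only; it assumes the tray holds plenty of each color, and the helper names are hypothetical. It checks the formula against a brute-force search over every way the beans could split among the colors.

```python
from itertools import product


def guarantees(colors, needed, n):
    """True if every way n beans can split among the colors gives some color >= needed."""
    return all(
        max(split) >= needed
        for split in product(range(n + 1), repeat=colors)
        if sum(split) == n
    )


def worst_case_minimum(colors, needed):
    # Worst case: needed - 1 beans of every color, so one more bean forces a match.
    return colors * (needed - 1) + 1


choices = [4, 7, 10, 12]
print([n for n in choices if guarantees(3, 4, n)])
print(worst_case_minimum(3, 4))
```

Both 10 and 12 guarantee four of a color, but only 10 is the smallest such number, which is why a student must reason rather than merely recognize a sufficient quantity.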
Assume that examinees are seeing this particular problem for the first time, so that they cannot answer it successfully by simply repeating an answer someone else has given them. Assume, too, that problems of this kind are not of sufficient practical importance to have been made the subject of a special unit of study. These assumptions call attention to an important general principle of educational measurement. What a test item measures, that is, what a successful response to it indicates, cannot be determined on the basis of the item alone. Consideration must also be given to the examinee's previous experiences. These may differ significantly for different examinees. But in the case of the foregoing problem, the assumptions mentioned above may be quite reasonable.
How much different would the thought processes be, and how much more difficult would the problem be, if no answers were suggested and the task required production of the answer rather than selection? Producing an answer is not necessarily a more complex or difficult task, or one more indicative of achievement, than choosing the best of the available alternatives (Quellmalz, Capell, and Chou, 1980).
Hogan's (1981) thorough review of the research involving comparisons between free-response and objective tests led him to these conclusions: "In most instances, free-response and choice-type measures are found to be equivalent or nearly equivalent, as defined by their intercorrelation, within the limits of their respective reliabilities. Further, the choice-type measure is nearly always more reliable than the free-response measure and is considerably easier to score." Patterson (1926) reached the same conclusion nearly 55 years earlier. But despite the overwhelming empirical support for Hogan's conclusions, many practitioners continue to ignore the research or persist in believing that in their own situation the two must yield measures of quite different abilities.
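Hogan's phrase "equivalent ... within the limits of their respective reliabilities" can be made precise with Spearman's correction for attenuation, which divides the observed correlation by the square root of the product of the two reliabilities. The source does not give this formula; the sketch below is a standard illustration with hypothetical figures, and `disattenuated_correlation` is a made-up name.

```python
from math import sqrt


def disattenuated_correlation(r_xy, r_xx, r_yy):
    """Spearman's correction for attenuation.

    Estimates the correlation between the two measures if both were perfectly
    reliable: r_xy / sqrt(r_xx * r_yy). Values near 1.0 mean the two formats
    correlate about as highly as their reliabilities allow.
    """
    if not (0 < r_xx <= 1 and 0 < r_yy <= 1):
        raise ValueError("reliabilities must lie in (0, 1]")
    return r_xy / sqrt(r_xx * r_yy)


# Hypothetical figures: choice-type reliability .81, free-response reliability .64,
# observed correlation .648 between the two formats.
print(round(disattenuated_correlation(0.648, 0.81, 0.64), 3))
```

An observed correlation of .648 looks modest, but relative to the ceiling of sqrt(.81 × .64) = .72 set by the reliabilities it corresponds to a corrected value of about .9.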
The Process of Elimination
Students may sometimes arrive at the correct answer to a multiple-choice test item through a process of elimination. Rejecting responses that seem unsatisfactory, they are finally left with one, termed the "right answer," not because they have any basis for choosing it directly, but simply because none of the others will [...].
The availability of this process of elimination is sometimes regarded as a weakness of the multiple-choice item form. It is charged that students get credit for knowing something they really don't know. Most specialists in test construction, however, do not disapprove of the process of answering by elimination and do not regard it as a sign of weakness in multiple-choice items in general, or in an item where the process is particularly useful. (It might be noted in passing that an item that uses the response "none of the above" as its correct answer invites the student to answer by a process of elimination.) There are two reasons why this process is not generally deplored by test specialists.
In the first place, the function of achievement test items is primarily to contribute to a measure of general achievement in an area of study. They are not intended primarily to provide an inventory of which particular bits of knowledge or skills a student has. The achievement of a student who answers items 1, 3, and 5 correctly but misses 2 and 4 is regarded as equal to the achievement of another student who answers items 2, 3, and 4 correctly but misses 1 and 5. Identifying exactly which things a student has achieved or failed to achieve is a matter of secondary importance except when objective-referenced interpretations are needed for mastery or diagnostic decisions.
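Number-right scoring, which the paragraph above assumes, keeps only the count of correct answers and discards the response pattern. A minimal Python sketch of the two students in the text (the helper name `number_right` is hypothetical):

```python
def number_right(correct_items, n_items):
    """Return the number-right score and the 0/1 response pattern behind it."""
    pattern = [1 if i in correct_items else 0 for i in range(1, n_items + 1)]
    return sum(pattern), pattern


score_a, pattern_a = number_right({1, 3, 5}, n_items=5)
score_b, pattern_b = number_right({2, 3, 4}, n_items=5)
print(score_a, pattern_a)
print(score_b, pattern_b)
```

The scores are equal while the patterns differ; the pattern matters only when an objective-referenced interpretation is needed.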
In the second place, the knowledge and ability required to properly eliminate incorrect alternatives can be, and usually is, closely related to the knowledge or ability that would be required to select the correct alternative. If education does not consist in the accumulation of unrelated bits of information, if the development of a meaningful network [...], then the fact that a student [...] choosing the best answer by rational [...] should be applauded rather [...]. [...] depends on the use of multiple-choice [...] choices for uninformed or misinformed [...]. [...] absurd or logically inappropriate choices is no proxy for a measure of useful verbal knowledge.
In practice, few multiple-choice test items are likely to be answered correctly merely by eliminating incorrect alternatives. Far more often the process of choice will involve comparative judgments of this alternative against that. It is unlikely that an examinee who is totally ignorant of the correct answer would have knowledge enough to eliminate with certainty the incorrect alternatives. This is especially likely to be true if the item is well enough constructed so that all the available alternatives, correct and incorrect, have some obvious basic similarity. For these reasons, it seems safe to conclude that the problem of correct choice by a process of distracter elimination need not be regarded as a serious [...].
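A rough way to quantify the elimination argument: if an examinee can rule out some distracters with certainty and then guesses blindly among the rest, the chance of hitting the key is one over the number of options left. This sketch assumes every eliminated option really is a distracter and that the final pick is random; as the text notes, real examinees usually compare the remaining options instead. The function name is hypothetical.

```python
from fractions import Fraction


def chance_correct_after_elimination(n_options, n_eliminated):
    """Probability of picking the key at random from the options left over.

    Assumes every eliminated option is truly a distracter and the remaining
    choice is a blind pick among what is left.
    """
    if not 0 <= n_eliminated < n_options:
        raise ValueError("must leave at least one option")
    return Fraction(1, n_options - n_eliminated)


for eliminated in range(4):
    print(eliminated, chance_correct_after_elimination(4, eliminated))
```

Each distracter an examinee can eliminate on the basis of real knowledge raises the chance of success, which is why elimination tends to reward relevant knowledge rather than pure luck.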
THE CONTENT BASIS FOR CREATING MULTIPLE-CHOICE ITEMS
Like true-false items, multiple-choice items are developed most conveniently and most appropriately on the basis of ideas expressed or implied in instructional materials. In Chapter 8 two paragraphs of text material were reproduced and these three propositions were extracted from them:
1. The use of optional essay items interferes with interindividual score comparisons.
2. The use of optional essay items usually contributes to reduced test-score variability.
3. The use of optional essay items usually results in reduced test-score reliability.
To develop multiple-choice test items on the basis of propositions like these, it is necessary to:
1. Formulate a question or an incomplete statement that clearly implies a question (the item stem).
2. Provide an acceptable answer to the question, stated in a few well-chosen words.
3. Produce several plausible (but incorrect) answers to the question (the distracters).
This sequence of steps was followed to develop a multiple-choice item corresponding to each of the propositions reproduced above.
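The three steps map onto a small record with a stem, a correct answer, and a list of distracters. The Python sketch below is a hypothetical illustration, not a procedure from the text: `DraftItem` and `assemble` are made-up names, and shuffling the option order with a fixed seed is an added design choice so that the keyed position varies. The example uses the test-score variability proposition.

```python
import random
from dataclasses import dataclass, field


@dataclass
class DraftItem:
    stem: str  # step 1: question or incomplete statement implying one
    correct: str  # step 2: acceptable answer in a few well-chosen words
    distracters: list[str] = field(default_factory=list)  # step 3: plausible but incorrect

    def assemble(self, seed=0):
        """Return lettered options and the key letter, with option order shuffled."""
        options = [self.correct, *self.distracters]
        if not 2 <= len(options) <= 26:
            raise ValueError("need one answer plus 1 to 25 distracters")
        random.Random(seed).shuffle(options)
        letters = "abcdefghijklmnopqrstuvwxyz"
        key = letters[options.index(self.correct)]
        return [f"{letters[i]}. {text}" for i, text in enumerate(options)], key


draft = DraftItem(
    stem="How would the use of optional rather than required essay items "
    "probably affect test-score variability?",
    correct="The standard deviation will be smaller when choices are permitted.",
    distracters=[
        "There will be fewer low scores when choices are permitted.",
        "Score variability will be increased when choices are permitted.",
        "Scores are more likely to be spread out through the moderate range "
        "when choices are permitted.",
    ],
)
options, key = draft.assemble(seed=1)
print(draft.stem)
print("\n".join(options))
print("key:", key)
```

Keeping the correct answer and distracters as separate fields makes it easy to review each distracter for plausibility before the options are lettered.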
(1) How would the use of optional rather than required essay items likely affect the interpretation of the scores obtained?
*a. It generally distorts norm-referenced interpretations.
b. It generally affects [...]-referenced interpretations more than norm-referenced ones.
c. It generally makes criterion-referenced interpretations more accurate.
d. It generally provides clarification of the domain for domain-referenced interpretations.

(2) How would the use of optional rather than required essay items probably affect test-score variability?
*a. The standard deviation will be smaller when choices are permitted.
b. There will be fewer low scores when choices are permitted.
c. Score variability will be increased when choices are permitted.
d. Scores are more likely to be spread out through the moderate range when choices are permitted.

(3) What is the probable effect on score reliability of using optional rather than required essays?
*a. The K-R20 should be noticeably lower.
b. The K-R20 could be only slightly higher or lower.
c. The test-retest coefficient should be noticeably lower.
d. The interrater coefficient should be unaffected.
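Item (3) refers to K-R20, the Kuder-Richardson formula 20 reliability coefficient for items scored right or wrong: KR-20 = (k / (k - 1)) * (1 - Σ p_j q_j / σ_X²), where k is the number of items, p_j the proportion passing item j, q_j = 1 - p_j, and σ_X² the variance of total scores. The formula is standard but is not stated in this chapter; the Python sketch below uses a hypothetical 5-by-4 score matrix, and `kr20` is an illustrative name. Population (divide-by-N) variances keep σ_X² on the same footing as the p_j q_j terms.

```python
def kr20(scores):
    """Kuder-Richardson formula 20 for items scored 0 or 1.

    scores[i][j] is examinee i's score on item j. Uses population (divide-by-N)
    variances so that p * q matches the item-variance definition.
    """
    n = len(scores)
    k = len(scores[0])
    if k < 2:
        raise ValueError("KR-20 needs at least two items")
    totals = [sum(row) for row in scores]
    mean = sum(totals) / n
    total_var = sum((t - mean) ** 2 for t in totals) / n
    if total_var == 0:
        raise ValueError("total scores do not vary")
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in scores) / n  # proportion passing item j
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / total_var)


# Hypothetical 5 examinees x 4 items.
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(kr20(scores), 3))
```

Because σ_X² sits in the denominator, anything that shrinks total-score variance while leaving Σ p_j q_j unchanged lowers KR-20, which is one way to see why reduced score variability tends to go with reduced reliability.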
In the remainder of this chapter a number of suggestions will be offered for writing good multiple-choice test items. Most of these reflect conclusions that item writers have reached as a result of their own efforts to produce items that will yield dependable indications of achievement, and many are supported by rational inference. Nonetheless, only a few have been tested in rigorous experiments, and the results have not always clearly supported the suggestions. Rigorous experiments in this area are difficult to manage, and the effect of violating one or a few suggestions is not likely to be great. On the whole, however, item writers are likely to produce better items if they know and follow the suggestions than if they are ignorant of them or disregard them. A comprehensive review of multiple-choice item-writing research and lore led Haladyna and Downing (1989a) to develop a taxonomy of item-writing rules and to examine the validity of the rules that have been offered by text authors and researchers (Haladyna and Downing, 1989b).
THE MULTIPLE-CHOICE ITEM STEM
The function of the item stem is to acquaint the examinee with the problem that is being posed. Ideally, it should state or imply a specific question. Although one can sometimes save words without loss of clarity by using an incomplete statement as the item stem, a direct question is often better. Not only does a direct question tend to present the examinee with a more specific problem, it also may focus the item writer's purpose more clearly and help him or her to avoid irrelevance or unrelatedness in the distracters.
Focus on Relevance
Irrelevant items fall short in contributing to the purpose of testing for any number of reasons: the stem fails to present a question or specific problem, the wording of the item is ambiguous, or the question presented is relatively insignificant. A lack of relevance results in frustration for examinees and contributes to unreliable measures. The sample items that follow illustrate poor techniques for beginning the multiple-choice item.

Physiology teaches us that
*a. the development of vital organs is dependent upon muscular activity.
b. strength is independent of muscle size.
c. the mind and body are not influenced by each other.
d. work is not exercise.
Here the subject of a sentence is used as the item stem and its predicate as the correct response. Obviously, [...] physiology could teach us a variety of things. If the stem were rephrased to read, "What does physiology teach us?" the item would be just as bad.
In comparing the period of heterosexual adjustment of our culture with those of other cultures, it must be concluded that
*a. there are tremendous differences that can only be explained on a cultural basis.
b. there are large differences that must be explained by the interaction of biology and the more influential culture.
c. although there are some differences, the biological foundation of puberty is fundamental.
d. in most cultures puberty is the period of heterosexual adjustment.

Here, again, there are any number of conclusions possible on the basis of a study of a particular period of human development. Until the examinee reads all the responses, she or he has no clear idea of what the question is asking. The item as a whole is not focused on any specific problem. This opens the way for confusing multiple interpretations.
Absolute and relative correctness. Ideally, the intended answer to a multiple-choice question should be a thoroughly correct answer, admitting no difference of opinion among adequately informed experts. This kind of absolute correctness, however, is difficult to achieve except in formal logical systems or in statements that simply reproduce other statements. Few, if any, inductive truths or experimental generalizations can be regarded as absolutely true. [...] of their items on propositions that are not absolutely true but are strongly probable. They should, however, guard against basing items on statements whose validity would be challenged by competent scholars.
Another guideline to follow is that the stem of a multiple-choice item should ask a question that has a definite answer. Indeterminate questions may provide interesting topics for discussion, but they do not make good items for testing achievement. For example,

Which event in the following list has been of the greatest importance in American history?
*a. Braddock's defeat
b. Burr's conspiracy
c. The Hayes-Tilden contest
d. The Webster-Hayne debate
It is unlikely that scholars could agree on which of these events is of the greatest importance in American history. The importance of an event depends on the point of view of the person making the judgment and the context in which that person is thinking of it.
While each multiple-choice item should have a definite answer, it may not always be an absolutely correct answer. Many good items ask the examinee to choose the best answer, as in this example:

Which statement best characterizes the man appointed by President Eisenhower to be Chief Justice of the United States Supreme Court?
a. An associate justice of the Supreme Court who had once been a professor of law
*b. A successful governor who had been an unsuccessful candidate for the Republican presidential nomination
c. A well-known New York attorney who successfully prosecuted the leaders of the Communist party in the United States
d. A Democratic senator from a southern state who had supported Eisenhower's campaign
Opinions and authoritative views. What about items that involve expressions of opinion? If it is an opinion on which most experts agree, then a reasonable multiple-choice item can be based on it.

Which of these statements is most consistent with Jefferson's concept of democracy?
a. Democracy is part of the divine plan for mankind.
b. Democracy requires a strong national government.
*c. The purpose of government is to promote the welfare of the people.
d. The purpose of government is to protect the people from radical or subversive minorities.
The responses to this question represent generalizations on the basis of Jefferson's speeches and writings. No authoritative sanction for one particular generalization is likely to be available, yet scholars familiar with Jefferson's work would probably agree on the best answer to this item. In such a case, the use of an item based on expert opinion is entirely justifiable. However, if the item asks the examinee for a personal opinion, it is subject to criticism. For example:

What do you consider the most important objective of staff meetings?
*a. To establish good working relations with your staff
b. To handle routine matters
c. To help teachers improve instruction
d. To practice and exemplify democracy in administration
There is one sense in which any answer to this item must be considered a correct answer. On the other hand, what the item writer obviously wanted to do was to test the examinee's judgment against that of recognized authorities in the field of interpersonal relations. It would have been better to ask students directly to choose the most important objective of staff meetings. [...] what they consider the most important objective, but [...] answers will be [...] of recognized experts.
But even experts disagree, [...] change due to new advances in [...] viewpoint or recommend a specific [...] or opposing position. When it is [...], it may be necessary to specify the authority [...]. Phrases like "According to your instructor" or "[...]," for example, may be needed to establish a frame of reference.
However, such situations probably ought to be quite rare. It seems more relevant for examinees to understand the rationale for a particular point of view than, for example, to remember Smith's viewpoint.
Good multiple-choice items deal with important, significant ideas, not with incidental details, as does the first item following, nor with unique organizations of subject matter, as does the second.

This question is based on the advertising campaign of the Naumkeag Steam Cotton Company [...] Pequot bed sheets. What was the competitive position of Pequot products in 192[...]?
a. Ahead of all competitors among all customers
*b. Strong with institutional buyers but weak with household consumers
c. Second only to Wamsutta among all customers
d. Weak with all groups of consumers
This advertising campaign may indeed provide an excellent illustration of the problems involved and the practices to follow in advertising campaigns. But it seems not entirely appropriate to measure students' ability to handle an advertising campaign by asking them to recall the details of one illustration used in instruction.

The second principle of education is that the individual
a. gathers knowledge.
b. makes mistakes.
c. responds to situations.
*d. resents domination.
The only person capable of answering this question is one who has studied a particular book or article. Whether a given principle of education is first or second is usually a matter of little importance. Educators have not agreed on any particular list of principles of education or any priority of principles. This item shows an undesirable close tie-up to the organization of subject matter used by a specific instructor or writer.
Instructional items. Informational preambles that serve only as window dressing and do not help the examinee understand the question being asked should ordinarily be avoided. Here are two examples:

While ironing her formal, Jane burned her hand accidentally on the hot iron. This was due to a transfer of heat by

The introductory sentence suggests that the item involves a practical problem. Actually the question asked calls only for knowledge of technical terminology.

In purifying water for a city water supply, one process is to have the impure water seep through layers of sand and fine and coarse gravel. Here many impurities are left behind. Below are four terms, one of which will describe this process better than the others. Select the correct [...]

The primary purpose of a test item is to measure achievement. While much learning may occur during the process of taking a test, deliberate inclusion of instructional materials may reduce its effectiveness as a test more than its instructional value is increased. It might be better to ask the purpose of filtration in purifying city water supplies or the type of filter used.
Introducing novelty. Novel questions and unique problem situations reward the [...] he or she has been taught [...].

If the radius of the earth were increased by 3 feet, its circumference at the equator would be increased by about how much?

J. B. Matthews, onetime employee of Senator McCarthy's subcommittee, charged that a large number of supporters of communism in the United States would be found in which of these?
a. Wall Street bankers
b. Newspaper editors
c. Professional gamblers
*d. Protestant clergymen
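The earth-radius item is novel because the answer does not depend on the size of the earth at all: since C = 2πr, a 3-foot increase in radius adds 2π × 3, or about 18.85 feet, of circumference to any sphere. The Python sketch below checks this; the earth-radius constant is a rough approximation supplied only for illustration, and the function names are hypothetical.

```python
from math import pi


def circumference(radius):
    return 2 * pi * radius


def circumference_increase(radius, radius_increase):
    """Change in circumference when the radius grows by radius_increase."""
    return circumference(radius + radius_increase) - circumference(radius)


EARTH_RADIUS_FT = 20_900_000  # rough equatorial radius in feet (approximation)
print(round(circumference_increase(EARTH_RADIUS_FT, 3), 2))
print(round(circumference_increase(1, 3), 2))  # same result for a 1-foot ball
```

An examinee who has memorized facts about the earth gains nothing here; the item rewards someone who can apply the circumference formula to an unfamiliar situation.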
Unintended clues. Multiple-choice items sometimes provide unintended hints about the correct answer that offer [...] the examinee. In some cases, key words from the [...] in the correct answer. In others the correct [...] logically or semantically with the stem that [...]. Sometimes the stem of one item will [...] item. Here are some examples of items that provide relevant clues in the stem:

When used in conjunction with the T-square, the left vertical edge of a triangle is used to draw
[...]
c. horizontal lines.
The use of the word "vertical" in both the stem and the correct response of this item provides an obvious clue.

Minor differences among organisms of the same kind are known as
[...]
d. natural selection.

The term "differences" in the stem calls for a plural response, which can only [...]

The major weakness of our government under the Articles of Confederation was that
a. there were no high officials.
*b. it lacked power.
c. it was very difficult to amend.
d. there was only one house in Congress.

There is an obvious relation between lack of power and weakness of government. If a person knew nothing about the Articles of Confederation, [...] would nonetheless dictate the correct response.
An item that is either much too easy or much too difficult for a group of examinees cannot provide much useful information about their relative levels of achievement. If on inspection or after tryout an item is found to be inappropriate in difficulty, some corrective action may be needed.
Manipulating difficulty. To some extent the difficulty of a multiple-choice item is inherent in the idea on which it is based. There are, however, techniques that give writers of multiple-choice test items some control over the difficulty of the items they produce on a given topic. In general, stem questions can be made easier by making them more general, or harder by making them more specific. The following pair of items is illustrative.
[...]
*b. goods brought into a country.
c. incomes of immigrants.
Only the most general notions about a tariff are required to respond successfully to this item, which is thus suitable for use at the lowest level of achievement. Much more knowledge of tariffs is required to respond successfully to the following:
A high protective tariff on Swiss watches in the United States is intended to most directly benefit
a. Swiss watchmakers.
b. United States citizens who buy Swiss watches.
c. United States government officials.
*d. United States watchmakers.
This pair illustrates how the generality or specificity of a question can be used to help control its difficulty.
Focus on clarity. It is desirable to express the stem of the item so that it requests the essential knowledge being tested as directly, accurately, and simply as possible. The stem of the following item seems needlessly complex:
Considered from an economic viewpoint, which of these proposals to maintain world peace derives the least support from the military potential of atomic energy?
a. An international police force should be established.
b. Permanent programs of universal military training should be adopted.
*c. Size of standing military forces should be increased.
d. The remaining democratic nations of the world should enter into a military alliance.
Even after careful reading, the meaning of this item stem is not clear. It takes a negative approach and seems to combine two dissimilar bases for judgment: economics and atomic energy. The wording of this item might seem to reflect lack of clarity in the thinking of the item writer.
Using negatives. It sometimes seems desirable to phrase the stem question to ask not for the correct answer, but for the incorrect answer. For example:
In the definition of a mineral, which of the following is incorrect?
a. It was produced by geologic processes.
b. It has distinctive physical properties.
c. It contains one or more elements.
*d. Its chemical composition is variable.
Items that are negatively stated, that is, that require an examinee to pick an answer that is not true or characteristic, tend to be somewhat confusing. They appear unusually attractive to examination writers because so much of the instruction is organized as lists of parallel subheadings under a main topic, and it is easy to ask for something that is not one of them. But such situations are rarely encountered outside the classroom, and negative items often lack the relevance that is usually desirable. At times, however, a negative stem may be needed to achieve both brevity and clarity in stating a problem.
Under which of these circumstances would a speaker at a political rally clearly NOT be protected by the First Amendment?
a. When asking the audience to join in a protest march
*b. When telling the audience to use violent action
c. When denouncing the president of the United States
d. When calling for the creation of a new political party
By capitalizing or underlining the negative word, the item writer can call the examinee's attention to it and reduce the chance that carelessness will lead a knowledgeable examinee to a wrong answer.
What change occurs in the composition of the air in a lighted, airtight room in which the only living things are growing green plants?
a. Carbon dioxide increases and oxygen decreases.
*b. Carbon dioxide decreases and oxygen increases.
c. Both carbon dioxide and oxygen increase.
d. Both carbon dioxide and oxygen decrease.
Introductory sentences. Item writers should [...]. At the same time, however, they should not avoid important questions simply because there is no absolutely and completely correct answer. If many descriptive or qualifying ideas are required, the clearest expression may be achieved by placing them in separate introductory sentences.
The term creeping socialism appeared frequently in political discussions in the early 1950s. Which of these is most often used to illustrate creeping socialism?
a. Generation and distribution of electric power by the federal government
b. Communist infiltration of labor unions
c. Gradual increase in sales and excise taxes
d. Participation of the United States in international organizations such as the United Nations
The use of two sentences, one to present background information and the other to ask the question, frequently adds to the clarity of the item stem. Combining these two elements into a single question sentence probably would make it more confusing.
In other situations a separate introductory sentence is necessary to establish the setting or context. Such statements differ from the instructional preambles and window dressing mentioned earlier. Here is an example:
"When we look at the world as a whole, it is clear that the problem of economic progress is really the most important." This statement is best classified as
a. a scientific conclusion.
[...]
Obviously, these statements could be merged to form a single question. But for examinees whose reading skills may not be well developed, greater clarity of task can be achieved by using the format illustrated.
PREPARING THE RESPONSE CHOICES
Obtaining Distracters
The purpose of a distracter in a multiple-choice item is to discriminate between those students who have command of a specific body of knowledge and those who do not. To do this, the distracter must be a plausible alternative. One way of obtaining plausible distracters is to use true statements that do not correctly answer the question presented in the stem.
For example:
What is the principal advantage of a battery of lead storage cells over a battery of dry cells for automobile starting and lighting?
a. The storage cell furnishes direct current.
b. The voltage of the storage cell is higher.
*c. The current from the storage cell is stronger.
d. The initial cost of the storage cell is less.
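Lead storage cells do furnish direct current, as do dry cells, but this is not the reason why they are preferred for automobile starting and lighting. Such an item calls for judgment concerning the relevance of knowledge rather than merely concerning its truth. Multiple-choice items of this kind can test an achievement that is sometimes thought to be measurable only by essay examinations.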
Another source of plausible distracters is familiar expressions, phrases that have been used in common parlance and that may seem plausible to students whose knowledge is merely superficial.
Which of these has effected the greatest change in domestic plants and animals?
a. Influence of environment on heredity
b. Organic evolution
*c. Selective breeding
d. Survival of the fittest
[...] the elementary level of discrimination for which this item is intended. [...] to generate good distracters [...]:
1. [...]
2. Think of things that have some association with the correct answer. [...]
3. [...]
How did (X) the estimated amount of petroleum discovered in new fields in the late 1970s compare with (Y) the amount extracted from producing fields in the same years?
a. X was practically zero.
*b. X was about half of Y.
c. X just about equaled Y.
d. X was greater than Y.
Some cases of lung cancer may be attributed to cigarette smoking. What was the status of this idea in the late 1960s?
a. The theory had been clearly established by medical evidence.
b. It was a controversial matter and some experts considered the evidence to be inconclusive.
c. The theory had been clearly disproved by surveys of smokers, former smokers, and nonsmokers.
d. The theory was too recent to have been subjected to any tests.
The responses to this item represent a scale of values from complete establishment to complete indefiniteness. The use of a qualitative scale of responses helps to systematize the process of test construction and to suggest desirable responses.
4. Phrase the question so that it calls for a "yes" or "no" answer plus an explanation. Here is an example:
Is the ratio of discretionary income to disposable income usually higher in a senior citizen household than in a young married household? Why?
a. Yes, because seniors have greater [...] income to spend.
b. Yes, because seniors have no major future expenses (house, college costs) to save for.
c. No, because social security payments seldom cover all fixed expenses of seniors.
d. No, because disposable income is always higher, by definition.
5. Use various combinations of two elements as the alternatives. [...] might occasionally assume this form:
1. Only A
2. Only B
3. Both A and B
4. Neither A nor B
Thus four responses [...]
An item illustrating this tactic is:
What was the general policy of the Eisenhower administration during 1953 with respect to government expenditures and taxes?
a. Reduction of both expenditures and taxes
*b. Reduction of expenditures, no change in taxes
c. Reduction in taxes, no change in expenditures
d. No change in either expenditures or taxes
If the two elements each have two different values, for example rise-fall and rapidly-slowly, they can be combined in this way to give four alternatives:
1. It rises rapidly.
2. It rises slowly.
3. It falls slowly.
4. It falls rapidly.
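Because the combination tactic is mechanical, it can be sketched in a few lines of code. The Python below is only an illustration; the function name `combine_elements` and its sentence template are hypothetical and do not come from the text. It crosses every value of one element with every value of the other, which is why two two-valued elements always yield exactly four alternatives.

```python
from itertools import product


def combine_elements(first, second, template="It {} {}."):
    """Cross two elements so every pairing of their values becomes one alternative.

    first, second: the values each element can take (e.g. rises/falls).
    template: a hypothetical sentence frame with one slot per element.
    """
    return [template.format(a, b) for a, b in product(first, second)]


if __name__ == "__main__":
    alternatives = combine_elements(["rises", "falls"], ["rapidly", "slowly"])
    for number, text in enumerate(alternatives, start=1):
        print(f"{number}. {text}")
```

`product` emits the pairings in a fixed order (rises rapidly, rises slowly, falls rapidly, falls slowly). The list above reverses the last two, and an item writer remains free to rearrange the alternatives, for example to follow an order of magnitude.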
[...] Another way to control difficulty is to vary the homogeneity of the alternatives: the more alike they are, the finer the content discrimination required to make the correct choice. Compare these two items:
An embargo is
*a. a law of regulation.
b. a kind of boat.
c. an embankment.
d. a foolish adventure.
Since the responses to this item vary widely, only an elementary knowledge of embargoes is required for successful response.
[...]
b. a customs duty.
*c. the stoppage of goods from entry and departure.
d. an admission of goods free of duty.
The homogeneity of responses in this second question makes it considerably more difficult.
Another means of making an item easier is to provide more than one basis for choosing the correct answer, as in this item:
Which of the following is known for their writings in colonial America?
*a. Thomas Paine and Ben Franklin
b. Mark Twain and Henry Clay
c. William Penn and Paul Revere
d. Robert Frost and Ernest Hemingway
The use of the names of two individuals fitting the specification in the item stem makes it somewhat easier. The examinee needs to recognize only one of the two writers, or to know that one person in each of the distracters was not known for his writings in the colonial period, to respond successfully.
It has occurred to some item writers that they might use as distracters the answers or completions [...] (1973).
Striving for Clarity
[...]
The chief difference between the surface features of Europe and North America is that
a. the area of Europe is larger.
b. Europe extends more to the south.
c. the Volga River is longer than the Mississippi-Missouri.
d. the greater highlands and plains of Europe extend in an east-west direction.
[...] feature of Europe. Either the [...] should all conform to that category.
Since multiple-choice responses are intended to be answers to the stem question, they should be parallel, that is, alike in type of content, grammatical structure, and general appearance, including length.
Slavery was first started
a. at the Jamestown settlement.
b. at the Plymouth settlement.
c. at the settlement of Rhode Island.
d. a decade before the Civil War.
The first three responses to this item are places; the fourth is a time. In questions of this type, it is not difficult to visualize [...]. Use of a direct-question stem might help to prevent [...] or ambiguity.
Since the alternative responses are intended to represent a set of distinct answers to the question, it is helpful to the examinee and to the effectiveness of the test item if they do indeed present clear choices.
Meat can be preserved in brine due to the fact that
a. salt is a bacterial poison.
*b. bacteria cannot withstand the osmotic action of the brine.
c. salt alters the chemical composition of the food.
d. brine protects the meat from contact with air.
Both responses a and b could be judged correct. Response b simply explains why response a is correct. In a case like this, it is undesirable to count only one of two almost equally correct responses.
Familiar expressions and phrases provide a useful source of plausible distracters, but obscure distracters are undesirable.
A chaotic condition is
a. asymptotic.
*b. confused.
c. gauche.
d. permutable.
[...] appropriate level of difficulty [...]. It is unreasonable to expect examinees to judge whether one of these obscure words might not be a better synonym for "chaotic" than the keyed answer.
The search for plausible distracters may sometimes induce an item writer to resort to trickery, as in this item.
Horace Greeley is known for his
a. advice to young men not to go West.
b. discovery of anesthetics.
*c. editorship of the New York Tribune.
d. humorous anecdotes.
Insertion of the "not" in the first response spoils what would otherwise be the correct answer to the question and thus makes the item more a test of students' alertness than of their knowledge of Horace Greeley. Such trickery reflects badly on the ethics of the item writer and is likely to reduce the discriminating power of the item. Such ploys tend to have detrimental effects even on examinees who are able to detect them. The message to them is, "Read carefully because someone is out to catch you off guard." [...] are likely to require more time to make their responses, and their levels of frustration [...].
Gaining Efficiency
The need for parallel structure between the stem and the responses sometimes requires that all the responses begin with the same word or words. When a word or group of words is repeated in each response, the item writer should consider the possibility of including that phrase in the stem.
Which is the best definition of a vein?
*a. A blood vessel carrying blood going to the heart
b. A blood vessel carrying blue blood
c. A blood vessel carrying impure blood
d. A blood vessel carrying blood away from the heart
This item could probably be improved by using an incomplete-statement item stem such as "A vein is a blood vessel carrying".
Occasionally, some [...] repetition seems excessive.
Another problem arises when the responses are long and complex, so that examinees have difficulty perceiving and keeping in mind the essential differences among the alternatives.
Systematic geography differs from regional geography mainly in that
a. systematic geography deals, in the main, with physical geography, whereas regional geography concerns itself essentially with the field of human geography.
b. systematic geography studies regions systematically, whereas regional geography is concerned only with a descriptive account of a region.
*c. systematic geography studies single phenomena in their distribution over the earth in order to supply generalizations for regional geography, whereas regional geography studies the interrelations of phenomena in one given area.
d. systematic geography is the modern scientific way of studying differentiation of the earth's surface, whereas regional geography is the traditional [...] studying distribution of phenomena in space.
[...] of systematic geography [...]? [...] simplifies the task for the examinee by removing [...]. Short responses also tend to focus attention on [...].
What is monogamy?
a. Refusal to marry
b. Marriage of one woman to more than one husband
c. Marriage of one man to more than one wife
*d. Marriage of one man to only one wife
A marriage in which one woman marries one man is called
[...]
It is usually desirable to list the responses to a multiple-choice item rather than to arrange them in tandem, as in this example:
The balance sheet report for the Ajax Canning Company would reveal (a) the company's profit for the previous fiscal year (b) the amount of money owed to its creditors (c) the amount of income tax paid (d) the amount of sales for the previous fiscal period.
Responses in tandem save some space but are much more difficult to compare than those placed in list form. Another good rule is that whenever the alternatives form a quantitative or qualitative scale, they normally should be arranged in order of magnitude from smallest to largest or largest to smallest. This may avoid some confusion on the part of the examinee and eliminate an irrelevant source of error.
The population of Denmark is about
[...]
*b. 4 million.
c. 7 million.
d. 15 million.
Common practice in writing multiple-choice tests calls for three or four distracters for each item. If good distracters are available, the larger the number of alternatives, the more highly discriminating the item is likely to be. However, as one seeks to write more distracters, each additional one is likely to be somewhat weaker. There is some merit in setting one's goal at three good distracters for each multiple-choice item and in persisting for a while to reach this goal. Not all good distracters are immediately apparent; some will emerge only after considerable thought.
On the other hand, there is no magic in four alternatives and no real reason why all items in a test should have the same number of alternatives. It is quite possible to write a good multiple-choice test item with only two distracters (three responses), and occasionally with only one distracter, as Smith (1958) and Williams and Ebel (1957) have shown. After tryout, one can actually improve some items by dropping those alternatives that don't distract poor students or that do distract good ones.
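The tryout check described above is usually made by comparing how high-scoring and low-scoring examinees respond to each alternative. Here is a minimal Python sketch of that comparison. The function name, the (total score, option chosen) data layout, and the default of 27 percent per extreme group (a commonly used convention in item analysis) are assumptions for illustration, not procedures given in this section. A distracter chosen by no one in the low group is not distracting poor students; one chosen more often in the high group than in the low group is distracting good ones.

```python
from collections import Counter


def distracter_analysis(responses, key, options="abcd", fraction=0.27):
    """Compare how high and low scorers respond to each alternative of one item.

    responses: list of (total_test_score, option_chosen) pairs from a tryout.
    key: the keyed (correct) option, e.g. "c".
    options: the item's option letters.
    fraction: share of examinees placed in each extreme group; values at or
        below 0.5 keep the groups from overlapping (given two or more examinees).
    Returns {option: (upper_count, lower_count, notes)}.
    """
    ranked = sorted(responses, key=lambda pair: pair[0], reverse=True)
    n = max(1, int(len(ranked) * fraction))  # size of each extreme group
    upper = Counter(choice for _, choice in ranked[:n])
    lower = Counter(choice for _, choice in ranked[-n:])

    report = {}
    for option in options:
        notes = []
        if option != key:
            if lower[option] == 0:
                notes.append("does not distract poor students")
            if upper[option] > lower[option]:
                notes.append("distracts good students")
        report[option] = (upper[option], lower[option], notes)
    return report


if __name__ == "__main__":
    # Hypothetical tryout data for a four-option item keyed "c".
    tryout = [(95, "c"), (90, "c"), (88, "c"), (85, "b"), (80, "c"),
              (60, "a"), (55, "b"), (50, "a"), (45, "b"), (40, "a")]
    for option, (high, low, notes) in distracter_analysis(tryout, key="c", fraction=0.5).items():
        print(option, high, low, "; ".join(notes))
```

With this made-up data, option d attracts nobody in either group and would be a candidate for replacement or removal, while options a and b draw more low scorers than high scorers, as working distracters should.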
Eliminating Unwanted Clues
A common device for adapting multiple-choice items to questions that seem to require several correct answers is to add as a final alternative the response "all of the above." But use of this response as the correct answer is appropriate only if all the preceding alternatives are absolutely correct answers to the stem question. It is not uncommon in some classroom tests to find "all of the above" as the correct answer for each or most of the items in which it is used. In other tests the opposite situation is found [...]. Either pattern provides an irrelevant clue to the correct or an incorrect answer. When "all of the above" is used, it should be [...], but never [...].
The response "none of the above" is also sometimes used, either as the intended correct answer or as a distracter. It is particularly appropriate in items on spelling or arithmetic, where the distinction between correct and incorrect responses is unequivocal. [...]
Here are examples of correct [...] of these responses:
Which word is misspelled?
[...]
What does the term [...] mean?
[...]
The overuse of both of these response alternatives probably derives from the mistaken conception that all multiple-choice items should have at least four or five response alternatives. These phrases are used as filler when the item writer encounters difficulty in finding a sufficient number of distracters. In such cases the overuse of either becomes a clue to the test-wise, underprepared student who recognizes that "all of the above" or "none of the above" is seldom an incorrect answer when it does appear. As was pointed out in the previous section, there is no need for all the items in a test to have the same number of alternatives.
The use of distracters that are less difficult than the correct answer is sometimes criticized because it permits a student to respond successfully by eliminating incorrect responses. However, students who can respond successfully on this basis usually are more knowledgeable than those who cannot. Hence the discriminating power of an item is not impaired by this characteristic. Of course, absurd or highly implausible distracters contribute little or nothing to the effectiveness of a test item.
Which of the following has helped most to increase the average length of human life?
a. [...]
b. Avoidance of overeating
c. Wider use of vitamins
*d. Wider use of inoculations
Some teachers may feel that the abilities of some of their students cannot possibly be underestimated, but they should not let this feeling lead them to employ such an unreasonable distracter as response [...].
A lack of parallelism in the alternatives may direct unprepared examinees to the correct answer. Item writers are inclined to express the correct answer more carefully than the other alternatives. Sometimes the correct answer is more extensive than any distracter. At other times [...] correct answer, allowing some students [...] vaguely that they had encountered [...]. Here are examples of items that provide unwanted clues:
How did styles in women's clothing in 1950 differ most from those in 1900?
a. They showed more beauty.
b. They showed more variety.
c. They were easier to clean.
*d. They were easier to live in, to work in, to move in, and were generally less restrictive.
The greater detail used in stating the correct response makes it undesirably obvious.
History tells us that all nations that have enjoyed [...]
[...]
*c. physical training of some sort.
[...]
Response c obviously provides a more reasonable completion to the stem than do the other responses. It represents [...]. This item illustrates one of the dangers inherent in the use of incomplete-statement item stems.
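All these irrelevant clues to the correct answer [...] and should be avoided. It is entirely appropriate, however, to place [...] in the distracters to mislead the test-wise examinee. Distinguishing the relevant clues, those useful to well-prepared examinees, from irrelevant clues is an important skill in [...].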
Reducing Complexity
In some medical and health-related content areas it has become popular to group answer choices and have examinees select the [...] choices. Such items have been referred to as "multiple multiple-choice," "complex multiple choice," and "K-type" items. Here is an example:
A form of exercise that is intended mainly to help build endurance is
[...]
jogging.
[...]
c. lifting weights.
[...] particular combinations of choices offered [...] examinees express greater preference for [...] true measure of what they know. (See [...] and Whitney, 1977; Albanese and [...]; [...] and Frisbie, 1989; and Haladyna and [...] for [...] these generalizations.)
As described in the previous chapter, [...] multiple-choice format noted above. It [...] scores, it samples the content domain [...], and examinees have shown a distinct preference for it (Frisbie and Sweeney, 1982). There seems to be no logical [...] for the continued use of complex multiple-choice items.
SUMMARY PROPOSITIONS
1. The most highly regarded and widely used form of objective test item is the multiple-choice form.
2. Critics of multiple-choice test items tend to exaggerate both the number of faulty items that appear on tests and the seriousness of the consequences of [...].
3. The important aspects of educational achievement that can be measured by objective tests are largely identical with those that can be measured [...].
4. A student who selects the correct response to a good multiple-choice item by eliminating responses she or he knows are incorrect demonstrates achievement of relevant subject matter.
5. Multiple-choice items should be based on sound, significant ideas that can be expressed as independent and meaningful propositions.
6. The stem of a multiple-choice item should state or clearly imply a specific direct question.
7. A multiple-choice item calling for a best answer can be as effective as one that contains only one absolutely correct answer.
8. Good multiple-choice items can be based on matters of opinion if most experts share that opinion or if the authoritative source is specified.
9. A good multiple-choice item ordinarily should not ask for the examinee's opinion.
10. Items testing recall of incidental details of instruction or special organizations of subject matter ordinarily are undesirable.
11. The item stem should pose the essence of its question as simply and accurately as possible.
12. Item stems including the word not, asking in effect for an incorrect answer, tend to be both confusing and [...].
13. The stem of a multiple-choice item should be ex[...].
[...]
17. All the responses to a multiple-choice item should be parallel in type of content, grammatical structure, and general appearance.
18. The responses to a multiple-choice item should be expressed simply enough to make clear the essential differences among them.
19. The responses to a multiple-choice item should be listed rather than written one after another in a compact paragraph.
20. While most multiple-choice items provide at least four alternative responses, good questions can be written using only two or three alternatives.
21. There is no compelling reason for all multiple-choice items in a test to have exactly the same number of response alternatives.
22. The responses "none of the above" and "all of the above" are appropriate only when the response choices given to the question are absolutely correct or incorrect (as in spelling or arithmetic).
23. The distracters in a multiple-choice item should be definitely less correct than the answer but plausibly attractive to the uninformed.
24. The intended answer to a multiple-choice item should be clear, concise, correct, and free of [...].
QUESTIONS FOR STUDY AND DISCUSSION
1. What is the most serious limitation of the multiple-choice format for measuring achievement (in your opinion), and how could that limitation be overcome?
2. How does the process of elimination inherent in the multiple-choice format contribute to less valid scores when objective-referenced rather than norm-referenced interpretations are made?
3. Why is it preferable for the stem of a multiple-choice item to be written as a question rather than an incomplete sentence?
4. What are some advantages to the item writer of being able to use a best answer rather than an absolutely correct answer?
5. How can the difficulty of a multiple-choice item for a given group be altered without changing the basic underlying proposition being measured?
6. What are some potential drawbacks to using compound choices (for example, A and B; A, B, and C; and so on) as multiple-choice alternatives?
7. Under what circumstances might the use of "none of the above" contribute most to obtaining [...]?
8. How can the use of multiple true-false items in place of multiple-choice items improve the measurement of achievement?
10
Other Objective-Item Formats
SHORT-ANSWER ITEMS
A short-answer test item aims to test knowledge by asking examinees to supply a word, phrase, or number that answers a question or completes a sentence. Completion and fill-in-the-blank are other common labels for short-answer items. Here are several examples:
(1) Who discovered the insulin treatment of diabetes?
(2) The name of the holy city of Islam is ________.
(3) In what year was the battle of Hastings fought? (A.D. 1066)
What is the common name of each of these chemical compounds?
(4) CaCO3
(5) NaCl
(6) C12H22O11 (sugar)
(7) NaOH
(8) NH3
Items 4 through 8 constitute a cluster of similar short-answer items based on the same question.
Short-answer items deal mainly with words and numbers. They ask for names of persons, places, things, processes, colors, and so forth. They may also ask for English words, foreign equivalents, and the symbols of shorthand, mathematics, and chemistry [...]. Numerical answers [...] include numbers representing dates, distances, [...]. When an item calls for a phrase, it is usually something short, such as "spontaneous combustion" or "discovery of America." Items calling for somewhat longer responses, for example, "Give three reasons why..." or "List the traits of...", are classified as short essay questions rather than as short-answer items.
This means that short-answer items test mainly for factual information. As the foundation of all reliable knowledge, facts constitute an important substratum. But there is much, much more to knowledge than the facts that can be reported in single words, short phrases, or numbers. What a short-answer test can test is much more limited than what true-false or multiple-choice items can test. Thus, while any short-answer item can be converted to a true-false or multiple-choice item, only a few true-false or multiple-choice items can be converted to the short-answer form.
Short-answer items are much less affected by guessing than are true-false or multiple-choice items. They also are supposed to test recall rather than recognition, which in the eyes of some instructors makes them more demanding and more valid as tests of achievement.
However, as we have already seen, not only is blind guessing a rather rare phenomenon, but the harm that it can do to the score on a reasonably good, reasonably long test is actually rather slight. And in response to the contention that recall is a more strenuous mental process than recognition, it may be said that good choice-type items seldom can be answered by simple recognition. In fact, they are more likely than are short-answer items to test understanding and to require reflective thinking.
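The claim that blind guessing does little harm on a reasonably long test can be checked with simple binomial arithmetic. The Python sketch below assumes, for illustration only, that an examinee guesses blindly on some number of items, that each guess is independent, and that every alternative is equally likely to be picked; the figure of 10 guessed items on a 100-item test is hypothetical, not from the text. The expected gain is the number of items guessed divided by the number of choices, and its spread is the binomial standard deviation.

```python
import math


def blind_guessing_effect(items_guessed, choices):
    """Expected number right and its standard deviation from pure blind guessing.

    Assumes independent guesses, with each of `choices` alternatives equally likely.
    """
    p = 1 / choices                      # chance of a lucky guess on one item
    expected = items_guessed * p         # binomial mean
    sd = math.sqrt(items_guessed * p * (1 - p))  # binomial standard deviation
    return expected, sd


if __name__ == "__main__":
    # Hypothetical case: blind guesses on 10 items of a 100-item test.
    for choices in (2, 4):
        mean, sd = blind_guessing_effect(10, choices)
        print(f"{choices}-choice items: expected gain {mean:.2f} points, SD {sd:.2f}")
```

Under these assumptions, ten blind guesses on four-choice items add about 2.5 points on average, with a standard deviation of roughly 1.4 points, which is small against the spread of scores on a 100-item test.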
Despite these limitations, short-answer items have a place in educational measurement. They yield many separate scorable responses per page or per unit of testing time. And if the group to be tested is reasonably small, the scoring that must be done by the teacher or a competent aide is not unreasonably burdensome. Short-answer items are justifiably popular in the primary and elementary grades, where basic vocabularies are being built in subjects such as arithmetic, and in those parts of science where symbols must be learned. When used simply, [...] examinees have identifiable reading or writing problems.
Writing Short-answer Items
1. Word the question or incomplete statement carefully enough to require a single, unique answer. A common problem with short-answer items is that a question the item writer thought would call for answer A elicits from some examinees equally defensible answers B, C, or D. For example, the question "What is coal?", to which the intended answer was "a fuel," might also elicit such answers as "petrified vegetable matter," "a burning ember," or "impure carbon." To prevent this dual problem of indefiniteness in what is tested and consequent difficulty in scoring, the question should be reworded so as to elicit a more specific answer.
For what purpose is most coal used?
From what substance was coal formed?
What name is applied to a glowing coal in a fire?
Coal consists mainly of what chemical element?
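The "consequent difficulty in scoring" becomes concrete once a scoring key is written down. The sketch below is hypothetical: the helper names, the normalization rules, and the key for the reworded item are all assumptions. It shows that a vague item needs a long list of acceptable answers and still rejects reasonable variants, while a specific item needs a one-entry key.

```python
def normalize(response: str) -> str:
    """Lower-case a response, trim it, drop a final period, collapse spaces."""
    return " ".join(response.strip().rstrip(".").lower().split())


def score_short_answer(response: str, acceptable: list[str]) -> int:
    """1 if the response matches any answer on the scoring key, else 0."""
    key = {normalize(answer) for answer in acceptable}
    return int(normalize(response) in key)


# Key for the vague item "What is coal?": every defensible answer must be listed.
VAGUE_KEY = ["a fuel", "petrified vegetable matter", "a burning ember", "impure carbon"]
# Hypothetical key for the reworded item "Coal consists mainly of what chemical element?"
SPECIFIC_KEY = ["carbon"]

print(score_short_answer("Impure carbon.", VAGUE_KEY))  # 1
print(score_short_answer("fuel", VAGUE_KEY))            # 0: "a fuel" was keyed
print(score_short_answer(" Carbon ", SPECIFIC_KEY))     # 1
```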
2. Think first of the intended answer, and then write a question to which that answer is the only appropriate response. The writer of a short-answer question should begin with the correct answer in mind and word the question to fit it. Writers who do so usually succeed in avoiding indefiniteness and ambiguity. A far inferior method of obtaining short-answer items is to find a textbook sentence from which a word can be deleted to make a short-answer item. For example:
Thunderstorms form when columns of ________ air rise to cooler altitudes.
Possible correct answers to this item include "warmer," "lower," and "moist." This example also serves to illustrate the next three suggestions for writing short-answer items.
3. If the item is an incomplete sentence, try to place the blank at the end. [...]
The name of the holy city of Islam is ________.
What is the name of the holy city of Islam?
However, answers to the question:
Why did the United States declare war on Japan in 1941?
are likely to be more variable and somewhat longer than completions of the sentence:
The immediate cause for the U.S. declaration of war on Japan in 1941 was the bombing of ________. (Pearl Harbor)
5. Avoid irrelevant clues to the correct answer. The word cooler in the item on thunderstorms suggests that the air must have been warmer before it rose. Or consider this item:
Steamboats are moved by engines that run on the pressure of ________.
It takes little knowledge or insight to guess that the correct answer to this item must be steam. Why is a question like this one being asked at all? Focusing on the answer before writing the question is likely to produce more important questions that have a more specifically unique answer. It is also important to remember that questions written with specific answers in mind are likely to be more relevant and more concise than sentences lifted from text material.
Another common but unwanted cue helps the examinee determine the length of the intended response. Each blank in a set of short-answer items should be exactly the same length. The short-answer directions should indicate whether only a single word, or either a word or a phrase, may be used as a valid response. Consider this item:
The names of the two rivers that meet at Cairo, Illinois, are the ____________ and the ____.
The long blank meant for "Mississippi" and the short blank meant for "Ohio" make this item easier for all, but particularly for students who are unsure [...].
6. Word the item as concisely as possible without losing specificity of response. Clear ideas are expressed in concise statements or questions. Excess words waste the examinee's time and may confuse the idea to be expressed.
7. Arrange space for recording answers in the right margin of the question page. This not only makes the items easier to score, which is its main justification, but also encourages the use of direct questions or the placement of blanks at the end of incomplete sentences.
8. Avoid using a conventional wording of an important idea as the basis for a short-answer item. Use of the usual wording may encourage and reward study aimed at memorizing rather than understanding. For example:
Gain or loss divided by the cost equals the gain or loss in ________.
Two lines perpendicular to the same line in the same plane are ________.
Better versions of these items would be:
To determine the percent of gain on a transaction, by what must the actual gain be divided?
If two lines are drawn perpendicular to the same line on a sheet of paper, the two lines [...]
MATCHING ITEMS
Matching test items occur in clusters made up of a list of premises, a list of responses, and directions for matching the two. In many clusters the distinction between premises and responses lies only in the names given to them, and the lists can be interchanged without difficulty. In other clusters, such as the following example, it is convenient to use descriptive phrases as the premises and shorter names as the responses.
Directions: On the blank before each of the following contributions to educational measurement, place the letter that precedes the name of the person responsible for it.

Premises
____ 16. Developed the Board of Examiners at the University of Chicago
____ 17. Developed high-speed electronic test-processing equipment
____ 18. Published the first textbook on educational measurement

Responses
[...]
E. F. Lindquist
E. L. Thorndike
A wide variety of premise-response combinations can be used as the basis for matching test items: dates and events; terms and definitions; writers and quotations; [...] quantities and formulas; [...]. [...] can be matched to parts shown on a [...] of the animal.
Closely related to the matching test item is the classification, or key-list, item. The responses for this item are a list of classes, such as the parts of speech, periods of history, classes of plants or animals, types of chemical reactions, cause-effect sequences, branches of government, nations, or states. The premises are names, descriptions, or examples that are to be classified among the responses provided. Here is an illustration:
Directions: After each event in the list below, put the number:
1. if it happened before the birth of Christ (4 B.C.)
2. if it happened after the birth of Christ but before the Magna Carta was signed (A.D. 1215)
3. if it happened after the Magna Carta was signed but before Columbus arrived in America (1492)
4. if it happened after Columbus arrived in America but before the Declaration of Independence (1776)
5. if it happened after the Declaration of Independence

____ 37. Eruption of Mt. Vesuvius
____ 38. Gutenberg Bible printed
____ 39. Pilgrims landed at Plymouth
____ 40. William Shakespeare was born
Apart from its use of classes or categories as responses, the key-list, or classification, item differs from typical matching items in two ways: the same response is matched to more than one premise, and the number of premises is usually greater than the number of responses. In typical matching items there are more responses than premises.
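The key for a key-list item like the one above is easy to derive mechanically, and doing so shows the defining property: several premises can share one response. The sketch below is hypothetical. It takes the boundary years from the directions and adds approximate dates for the four events (taking the eruption to be that of A.D. 79), none of which appear in the text.

```python
from bisect import bisect_right

# Boundary years from the directions; negative numbers are B.C.
# Response k means "after boundary k-1 but before boundary k".
BOUNDARIES = [-4, 1215, 1492, 1776]


def key_category(year: int) -> int:
    """Return the key-list response number (1-5) for an event in `year`.

    An event falling exactly on a boundary year is ambiguous for examinees;
    bisect_right assigns it to the later category.
    """
    return bisect_right(BOUNDARIES, year) + 1


def build_key(events: dict[str, int]) -> dict[str, int]:
    """Map each premise (event) to its keyed response number."""
    return {event: key_category(year) for event, year in events.items()}


# Approximate dates supplied for illustration; they are not given in the text.
EVENTS = {
    "Eruption of Mt. Vesuvius": 79,
    "Gutenberg Bible printed": 1455,
    "Pilgrims landed at Plymouth": 1620,
    "William Shakespeare was born": 1564,
}

for event, response in build_key(EVENTS).items():
    print(response, event)
```

Two of the four premises here share response 4, something that cannot happen in a one-to-one matching cluster.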
Matching items have something in common with multiple-choice items in offering explicit alternative answers. They also have something in common with short-answer items: they are usually limited to specific factual information (names, dates, labels, and so on). They are poorly suited for testing understanding [...] poorly adapted to [...].

[...]

____ 13. [...]
____ 14. [...]
____ 15. [...]
dark, hard wood
tool for smoothing
12" x 12" x 1'
[...]
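3. Do not attempt perfect matching, in which exactly one response is matched to each of the premises. In perfect matching, the last match is given away by the other matches. Or, if an examinee makes an error in one match, he or she is forced into an error in another. It is therefore better to provide more responses than premises.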
4. Provide directions that clearly explain the intended basis for matching.
5. Arrange the responses or premises [...]. [...] each premise and its response [...]; their sequence may [...]. Rearranging one or both lists, in scrambled or alphabetical order, [...]. However, if a logical order exists (as with numbers or dates), preserving that order will simplify the examinee's task.
6. If the responses are numerical quantities, arrange them in order from low to high.
7. Use the longer phrases as premises and the shorter ones as responses. Both of these suggestions will tend to simplify the examinee's task of finding the correct response and eliminate irrelevant difficulty.
NUMERICAL PROBLEMS
Although numerical problems can be presented as multiple-choice test items, they are often used in short-answer form. Numerical problems provide the [...] arithmetic and other branches of mathematics. [...] problems can easily be produced by [...] in which they are presented, and these [...] test understanding in contrast to mere [...], and they are easy to score, even in short-answer form. [...] problems as short-answer test items. But sometimes there are minor difficulties in using them. [...] Should partial credit be given if the process is correct but the answer is wrong because of computational errors? [...]
1. Use the simplest numbers possible. The purpose of the item is to test understanding [...].

1a. How many circles of radius 3/4" can be obtained from an 8 1/2" by 11" sheet of paper? ____
1b. How many circles of radius 1" can be obtained from an 8" by 12" sheet of paper? ____

Clearly the same problem-solving ability is measured by both items, but the computations required by the second are simpler.
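The arithmetic behind items 1a and 1b shows why the second is kinder to the examinee. The sketch below assumes that the circles are cut from a simple square grid, one per diameter-by-diameter cell. The text does not state this, and a staggered arrangement might fit more circles. Exact fractions keep the division visible.

```python
import math
from fractions import Fraction


def circles_from_sheet(width, height, radius) -> int:
    """Count circles of `radius` cut from a width x height sheet, assuming a
    square-grid layout (one circle per diameter x diameter cell)."""
    diameter = 2 * radius
    per_row = math.floor(Fraction(width) / diameter)
    per_column = math.floor(Fraction(height) / diameter)
    return per_row * per_column


# Item 1a: radius 3/4 inch, sheet 8 1/2 by 11 inches; 17/3 and 22/3 must be truncated.
print(circles_from_sheet(Fraction(17, 2), 11, Fraction(3, 4)))
# Item 1b: radius 1 inch, sheet 8 by 12 inches; both quotients are whole numbers.
print(circles_from_sheet(8, 12, 1))
```

Under this layout, 1a requires the examinee to divide by 3/2 and truncate two mixed numbers, giving 5 by 7, or 35. Item 1b divides evenly, giving 4 by 6, or 24, so computation no longer competes with the reasoning being tested.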
2. If possible, check the given quantities so that the answer will be a whole number. Doing this will help avoid uncertainty about how far a decimal fraction should be carried.
2a. What is the area in square feet of the largest rectangle that can be formed from an isosceles trapezoid having bases of 20" and 35" and an altitude of 17"? ____
2b. What is the area in square feet of the largest rectangle that can be formed from an isosceles trapezoid having bases of 24" and 30" and an altitude of 18"? ____
The numbers in the first problem yield a correct response of 2.36, which could also be expressed as 2.4 or 2. The second problem requires the same thought processes but yields a correct response of 3 square feet and requires no rounding.
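The figures 2.36 and 3 can be checked with a short computation. The sketch assumes that the rectangle stands on the trapezoid's longer base with its top corners on the legs. That reading is our assumption, but it reproduces both answers reported above. At height y the usable width is B - (B - b)y/h, so the area peaks at y* = Bh / (2(B - b)). The area is capped at the full altitude when y* exceeds it, which is what happens in both items.

```python
def largest_rectangle_area(long_base: float, short_base: float, altitude: float) -> float:
    """Largest rectangle inside an isosceles trapezoid, assuming the rectangle
    stands on the longer base with its top corners on the legs."""
    taper = long_base - short_base            # total narrowing from bottom to top
    if taper <= 0:                            # the figure is already a rectangle
        return long_base * altitude
    # Width at height y is long_base - taper * y / altitude; area peaks at:
    best_height = long_base * altitude / (2 * taper)
    height = min(best_height, altitude)       # cannot be taller than the trapezoid
    width = long_base - taper * height / altitude
    return width * height


SQ_IN_PER_SQ_FT = 144

area_2a = largest_rectangle_area(35, 20, 17) / SQ_IN_PER_SQ_FT  # 340 sq in
area_2b = largest_rectangle_area(30, 24, 18) / SQ_IN_PER_SQ_FT  # 432 sq in
print(f"2a: {area_2a:.4f} sq ft")
print(f"2b: {area_2b:.4f} sq ft")
```

Item 2a ends with 340/144 = 2.3611..., which forces a rounding decision. Item 2b ends with 432/144 = 3 exactly.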
3. Specify the degree of precision expected in the answer. If students are uncertain about what they are being asked to do, and if they guess wrongly, the measurement of what they are able to do will be made less accurate.
4. If a fully correct answer must specify the unit of measure in which it is expressed, tell the examinee that this is part of the problem. It is easy for a distracted examinee to forget to write the units in which an answer is expressed, even when he or she knows what the units should be. If units are an important part of the problem, ask for them separately:

4a. What number expresses the intensity of illumination on a white surface? ____
4b. In what units is this illumination intensity expressed? ____ (foot-candles)
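Guidelines 3 and 4 can both be built into a scoring rule. The sketch below is hypothetical: its function name and tolerance values are illustrative. It scores the number against a stated tolerance and scores the unit as a separate point. Using the 2.36-square-foot answer to item 2a, it shows that a response of 2.4 is right or wrong depending only on the precision the directions specified.

```python
import math


def score_numerical_answer(value: float, unit: str, key_value: float,
                           key_unit: str, abs_tol: float) -> tuple[int, int]:
    """Return (value point, unit point), scored independently.

    abs_tol stands for the precision stated in the item's directions.
    """
    value_point = int(math.isclose(value, key_value, rel_tol=0.0, abs_tol=abs_tol))
    unit_point = int(unit.strip().lower() == key_unit.strip().lower())
    return value_point, unit_point


# Directions asking for hundredths: 2.4 misses the key of 2.36.
print(score_numerical_answer(2.4, "square feet", 2.36, "square feet", abs_tol=0.005))
# Directions asking for tenths: the same response earns the value point.
print(score_numerical_answer(2.4, "square feet", 2.36, "square feet", abs_tol=0.05))
```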
5. If possible, divide a single complex multiple-step problem into a number of simpler single-step problems. It might seem that the more complex the problem, the better it will test the examinee's ability. But the reverse is usually true. Any complex problem involves a number of procedural choices as well as a number of quantitative calculations. Each of these can be made the basis of a separate test item. Success in solving the whole problem involves nothing more than success in making the separate choices and calculations. Consider these two items. The first is relatively complex, but the second is more efficient and will likely contribute more to high reliability.
3a. Last year Marcy sold 60 cars at an average price of $2,010.86. Her goal this year is to sell 50% more cars. If Marcy earns a commission of 10% on each sale, how much more will she earn this year than last year if she reaches her goal? ____

3b. Last year Marcy sold 60 cars at an average price of $2,000. Her goal this year is to sell 50% more cars. Marcy's commission is 10% on each sale.
1. How many cars does Marcy hope to sell this year? ____
2. How many dollars did Marcy earn last year from her sales? ____
3. How many dollars does Marcy hope to earn this year? ____
Breaking down a complex, multiple-step problem in this way minimizes the problem of partial credit. It yields more independent indications of achievement or the lack of it, and that improves the reliability of the test scores.
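The two Marcy items can be worked side by side to show what the decomposition buys. The sketch below is illustrative only. The function names are hypothetical, and the 3a price is the one printed in the item. Decimal keeps the cents in 3a exact.

```python
from decimal import Decimal


def marcy_3b(cars_last_year: int, avg_price: int, growth_pct: int,
             commission_pct: int) -> tuple[int, int, int]:
    """Answers to the three single-step questions of item 3b.

    Integer division is exact for the item's numbers, since each product
    divides evenly by 100.
    """
    cars_this_year = cars_last_year * (100 + growth_pct) // 100            # question 1
    earned_last_year = cars_last_year * avg_price * commission_pct // 100  # question 2
    hoped_this_year = cars_this_year * avg_price * commission_pct // 100   # question 3
    return cars_this_year, earned_last_year, hoped_this_year


def marcy_3a(cars_last_year: int, avg_price: Decimal, growth_pct: int,
             commission_pct: int) -> Decimal:
    """Single answer to item 3a: extra commission earned if the goal is met."""
    extra_cars = cars_last_year * growth_pct // 100
    return extra_cars * avg_price * commission_pct / 100


print(marcy_3b(60, 2000, 50, 10))                # (90, 12000, 18000)
print(marcy_3a(60, Decimal("2010.86"), 50, 10))  # 6032.58
```

Item 3b yields three separately scorable answers. Item 3a yields a single figure that is wrong if any one step slips, which is exactly the partial-credit problem the text describes.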
6. Express the numerical problem as clearly and as concisely as possible. Clarity requires full [...] and simple, direct statements. Conciseness, the elimination of [...]
SUMMARY PROPOSITIONS

1. Short-answer items are used mainly to test for factual information.
2. A much wider range of achievements can be tested with true-false or multiple-choice items than with short-answer items.
3. The difficulty examinees have in producing the correct answer to a short-answer item is an advantage of limited value.
4. Short-answer items do not provide a more valid measure of real achievement than do choice-type items.
5. Short-answer items are efficient and relatively [...]
6. Short-answer items need to be conceived and written carefully to avoid the possibility of multiple correct answers.
7. In writing short-answer items, it is advantageous to think first of the answer and then write the question that will elicit it.
8. A direct question generally will result in a less ambiguous short-answer item than will an incomplete sentence.
9. Item writers should avoid lifting exact sentences from textual materials as the basis for short-answer items.
10. Matching items, like short-answer items, usually are limited to testing for factual information.
11. Matching items are efficient and useful in emphasizing relationships between ideas.
12. Short, homogeneous lists should be used in any [...]
13. Perfect matching of the two lists on a one-to-one basis is undesirable.
14. Directions should be explicit about the basis to be used for matching.
15. The list of responses in a matching cluster should be presented in either scrambled or alphabetical order.
QUESTIONS FOR STUDY AND DISCUSSION

1. How might the responses to a set of short-answer items be used as the basis for developing multiple-choice items?
2. Why is it not possible to convert all true-false or multiple-choice items to useful short-answer items?
3. How can matching items be used to measure achievement beyond the recall level?
4. What can be done to reduce or eliminate the role of computational skill in testing numerical problem-solving ability?
5. What are some drawbacks associated with reducing a multiple-step numerical problem to a set of single-step problems?
11
Essay-Test Items
THE PREVALENCE OF ESSAY TESTING
Essay tests continue to be a very popular form, especially among scholars and at the higher levels of education. [...] (Coffman, 1971).
However, there are other reasons for their popularity. One is convenience. In contrast with objective tests, essay tests [...]. The difficult part of the job, usually grading students' answers, [...]. [...] security they provide to the examiner. Writers of essay questions are seldom required, as are composers of objective-test items, to defend the [...] or to demonstrate that none of the [...] answer. Essay questions require [...] the scorer can rate without describing [...] showing his or her own version of [...]. [...] essay question are seldom so readily [...] objective-test item.

It is also quite easy for the [...] level and distribution of scores. [...] points, or even seven points, for a [...] personal decision. Thus, no matter [...] an essay test, the grader can adjust [...] will receive scores below [...].
ESSAY-TEST ITEMS 189
[...] contributes in no small measure to the popularity of the essay test.

The distinction between [...] the writing itself [...]. [...] our purpose in this chapter [...] for measuring educational achievement [...] exercises, the scoring criteria, and [...] represent achievement in the [...] expression.
THE VALUE OF ESSAY TESTING
[...] deliberately choose indeterminate issues as questions. What the student concludes, they say, is unimportant. The evidence on which the examinee bases the conclusion, and the cogency of his or her argument in support of it, are said to be what matters. [...] critical thinking, originality, [...] ability to organize [...] clearly defined. Those characteristics of the [...] students have more and which have less [...] explicitly. When the scores awarded to essay answers are deductions from a maximum possible score, [...] combination of these deficiencies:
1. Incorrect statements were included in the answer.
2. Important ideas necessary to an adequate answer were omitted.
3. Correct statements having little or no relation to the question were included.
4. Unsound conclusions were reached, either because of mistakes in reasoning or because of misapplication of principles.
5. Bad writing obscured the development and exposition of the student's ideas.
6. There were errors in spelling and the mechanics of correct writing.
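Scoring by deduction, as described above, can be written as a small rule. The sketch below is hypothetical. The text names the six kinds of deficiency but assigns them no point values, so every weight here is an assumption made for illustration.

```python
# Hypothetical deduction weights for the six kinds of deficiency listed above.
DEDUCTIONS = {
    "incorrect statements": 2,
    "important ideas omitted": 3,
    "irrelevant statements": 1,
    "unsound conclusions": 3,
    "bad writing obscured ideas": 1,
    "errors in spelling and mechanics": 1,
}


def deduction_score(max_score: int, flagged: dict[str, int]) -> int:
    """Subtract the weight of each flagged deficiency (times how often it was
    flagged) from the maximum possible score, never going below zero."""
    total = sum(DEDUCTIONS[category] * count for category, count in flagged.items())
    return max(0, max_score - total)


# One omitted idea and two incorrect statements on a 20-point question.
print(deduction_score(20, {"important ideas omitted": 1, "incorrect statements": 2}))  # 13
```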
Mistakes in the first four categories can be attributed either to weaknesses in the student's command of knowledge or to a lack of clarity and specificity in the examiner's question. Mistakes in the last two categories either indicate a weakness in self-expression or reflect the difficulty of keeping the hand up with a mind racing ahead under the pressure of a time limit. As essay tests are typically used, the unique functions they serve beyond the scope of objective tests seem somewhat limited and indefinite. Odell's (1927) scales for rating essay-test answers strongly suggest that the length of a student's answer may be closely related to the score it receives: longer answers tend to receive higher ratings.
Influence of Writing Ability
Essay tests are also valued for the emphasis they place on writing. However, this is both an advantage and a disadvantage. Written expression is an important skill, and essay tests do encourage it. However, the practice in writing that essay tests give may be practice in hasty, ill-considered, and unpolished writing. Worse, skill in writing, or the lack of it, may influence the scorer's judgment of the content of the answer. Uniform, legible handwriting and fluent, graceful sentences can compensate for some deficiencies in content (Chase, 1979; Hughes, Keeling, and Tuck, 1983). On the other hand, flaws in spelling, grammar, or usage can detract from the scorer's evaluation of the content.
Students occasionally use writing skill to compensate for a lack of knowledge. Students who are hard put to answer the question asked can subtly transform it into a related question that is easier for them to answer. If they perform well on the substitute task, the reader may not even notice the substitution. Or the student may concentrate on form rather than content, giving an elegant presentation of a few rather simple ideas in the hope that it will divert the reader's attention from the lack of substantial content.

Not all readers of essay examinations are easy to bluff. Then, too, the students most in need of the kind of help that bluffing might give them are usually the least able to use such techniques. For this reason, bluffing on essay tests is hardly a more serious problem than guessing one's way to success on an objective test.
Influence on Examinee Preparation
That the kind of examination expected affects how students prepare for it is attested by experience, reason, and research (Meyer, 1935; Terry, 1933). Surveys of student opinion conducted about 50 years ago suggest that students then studied more thoroughly for essay examinations than for objective examinations. The absence of more recent research on this topic may reflect a lack of interest in it, a lack of awareness of the potential differences, or an implicit assumption that students obviously prepare differently for the two types of tests.
It is not uncommon to see students who have been handed a multiple-choice test begin by writing memorized notes, lists, [...] on the test pages [...]. These are likely the same kinds of notes these students [...] test. An inspection of objective-test booklets after the completion of testing often reveals significant note making, most often [...] be intelligible only to the maker.
Many potent factors other than examinations affect how, and with what effectiveness, students study. These factors interact in complex ways, so it is difficult to demonstrate clearly which form of examination, essay or objective, has the more beneficial influence on study and learning.
RELIABILITY OF ESSAY-TEST SCORES
The most serious limitation of essay tests as measures of achievement in classroom testing is the low reliability of the scores they typically yield. Low reliability means there is a good deal of inconsistency between scores obtained from successive administrations of the same test or of equivalent tests, or between independent scorings of the same test. On the whole, three factors contribute to low reliability: (1) the limited sampling of the content [...], (2) the indefiniteness of the tasks set by the essay questions, and (3) the subjectivity of the scoring of essay answers.
In general, the larger the number of independent elements in the sample of tasks chosen for an achievement test, the more accurately performance on those tasks will reflect overall achievement in the field. It is true that a single essay-test question often involves many separate elements. But the answer is dealt with as a more or less integrated whole by both the student and the grader, so these elements are not as independent [...].
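One standard way to quantify "more independent elements, more accurate measurement" is the Spearman-Brown formula, r_k = k*r / (1 + (k - 1)*r). It predicts the reliability r_k of a test lengthened k-fold with comparable, independent tasks. The formula is our addition, not something this passage presents. Its independence assumption is exactly what the parts of a single essay answer tend to violate, so it illustrates the principle rather than the behavior of essay tests. The starting reliability of 0.50 is hypothetical.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predicted reliability when a test is lengthened by `length_factor`
    using comparable, independent tasks (Spearman-Brown formula)."""
    k, r = length_factor, reliability
    return k * r / (1 + (k - 1) * r)


# Hypothetical: a short essay test with reliability 0.50, lengthened fourfold.
print(round(spearman_brown(0.50, 4), 2))  # 0.8
```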
Few, if any, experimental studies [...] relative to that of objective tests [...] sufficiently objective scoring of essay [...]. [...] there have been some theoretical analyses [...] a direct relation between the extent [...] and the precision with which different levels of achievement can be differentiated. Posey (1952) demonstrated that examinees' luck [...] is a much greater factor in the grade they receive on a 10-question test than on one of 100 items.
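The effect Posey describes can be illustrated, though not reproduced, with a simple sampling model. Suppose an examinee can answer a proportion p of all possible questions, and the test draws n questions at random. The percent-correct score then varies by luck alone with a standard deviation of 100 * sqrt(p(1 - p)/n). This binomial model and the value p = 0.7 are our assumptions, not Posey's analysis.

```python
import math


def sampling_sd_in_percent(p_known: float, n_questions: int) -> float:
    """Standard deviation, in percentage points, of the percent-correct score
    for an examinee who can answer a proportion p_known of all possible
    questions, when n_questions are drawn at random (binomial model)."""
    return 100 * math.sqrt(p_known * (1 - p_known) / n_questions)


for n in (10, 100):
    print(n, round(sampling_sd_in_percent(0.7, n), 1))
```

Under this model, luck alone moves the score by about 14.5 percentage points on a 10-question test but only about 4.6 points on a 100-item test, a ratio of sqrt(10).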
[...] the task and the basis for judging an examinee's answer [...] scoring directions [...] the more objective and [...] the measurements obtainable from an essay-test question [...] reliable.
The classic studies [...]. [...] measurement or an evaluation. It is useful [...] means something to the person who made the determination, but days or weeks later [...] same thing to the student who received it. To the degree that other qualified observers [...] scores to the same essay-test answer [...].
A particularly difficult distinction for students of educational measurement is that between (1) the reliability of essay-test scores and (2) the reliability of essay ratings from multiple raters. The first relates to the collection of questions [...] test must be estimated with coefficient alpha. The reliability of essay ratings [...] different raters assign the same relative [...]. The main question raised is, "Does the score depend on who does the scoring?" When multiple raters are consistent in applying the scoring criteria, [...] produced by each rater will be about the same [...] scores assigned by any pair of raters [...].
The most defensible method of estimating the reliability of ratings is a correlational method presented by Ebel (1951).
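The two kinds of reliability distinguished above can be computed in a few lines. The sketch below is not Ebel's (1951) procedure, which the text cites but does not describe. It shows coefficient alpha, which the text names for essay-test scores, computed over an examinee-by-question table. It also shows a plain Pearson correlation between two raters' scores on the same papers. All scores are hypothetical.

```python
import math
from statistics import mean, variance


def coefficient_alpha(scores: list[list[float]]) -> float:
    """Coefficient alpha for a score table (rows = examinees,
    columns = questions): k/(k-1) * (1 - sum of item variances / total variance)."""
    k = len(scores[0])
    columns = list(zip(*scores))
    sum_item_var = sum(variance(col) for col in columns)
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum_item_var / total_var)


def pearson_r(x: list[float], y: list[float]) -> float:
    """Correlation between the scores two raters gave the same papers."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical scores: 4 examinees x 3 essay questions.
table = [[2, 3, 3], [4, 4, 5], [3, 3, 4], [5, 5, 5]]
print(round(coefficient_alpha(table), 3))
# Hypothetical ratings of the same 5 papers by two raters.
print(round(pearson_r([12, 15, 9, 18, 14], [11, 16, 10, 17, 13]), 3))
```

Alpha addresses consistency across questions. The rater correlation addresses the question "Does the score depend on who does the scoring?"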
PREPARING ESSAY ITEMS
Implicit in what has been said in this chapter about the values and limitations of essay tests are a number of suggestions for improving essay-type questions.
1. Ask questions or set tasks that will require the student to demonstrate a command of essential knowledge. Such questions will not simply call for reproduction of material presented in the textbook or classroom. Instead of looking exclusively backward to the past course of instruction, they will also look forward to future applications of the things learned. The questions will be based on novel situations or problems, not on the same ones used for instructional purposes.
2. Ask questions that are determinate, in the sense that experts could agree that one answer is better than another. Indeterminate questions are likely to function only as exercises in exposition, whose relation to effective behavior may be quite remote. Such questions will probably not be especially relevant to measuring a student's useful command of essential knowledge. Furthermore, and most important, the absence of a good best answer may make it much more difficult for a reader to judge a given student's level of achievement. On controversial questions, which many indeterminate questions are, the reader's opinions and biases may considerably influence the evaluation of the student's answer.
3. Define the examinee's task as completely and specifically as possible without interfering with measurement of the achievement intended. The question should be phrased carefully so that examinees fully understand what they are expected to do. If the task is not clearly evident in the question itself, add an explanation of the basis on which answers will be evaluated. Do not allow students more freedom than is necessary to measure the desired achievement. If the question permits variation in the extent and detail of the answer, but this is not a desired variable, specify about how long the answer is expected to be.
4. In general, prefer more specific questions that can be answered more briefly. The larger the number of independently scorable questions, the more thoroughly the content domain can be sampled and, therefore, the more reliable the scores are likely to be. Specific questions are also likely to be less ambiguous to examinees, and their answers can be graded reliably. Occasionally an instructor may wish to base a test on only a few very broad questions. [...], and the instructor should be sure that the gain is sufficient to warrant the probable loss in score reliability.
5. Avoid giving the examinee a choice among optional questions unless special circumstances make such options necessary. If different examinees answer different questions, the basis for comparing their scores is weakened. Clearly, when students choose the questions they can answer best, the range of test scores is likely to be narrower, and hence the reliability of the scores would be expected to be somewhat lower. Research indicates that this expectation is justified. [...] recommended that, unless the various questions are weighted in some suitable fashion, the choice form of essay examination be discontinued. Stalnaker (1951) [...] the problems involved in the use of optional questions. [...] experimental evidence [...] reliabilities can be reduced [...] by the use of optional questions. Several studies have shown that [...]. [...] recommended that optional questions be avoided and that all examinees be asked to answer the same questions (p. 70).

Optional questions are sometimes justified on the ground that giving students a choice among the questions they are to answer makes the test fairer. But if all the questions involve essential aspects of achievement in a course (as they ordinarily might), it is not unfair to require students to answer all of them. [...]

Optional questions may be justifiable when a test of educational achievement must cover a broad area and the students who take it have received unequal training in different areas. Even in such a situation, however, the advantages of using optional questions are highly dubious. Optional tests, separately scored, might be preferable to a common test yielding a single score based on different sets of questions.
6. Test the question by writing an ideal answer to it. Writing the ideal answer at the time a question is drafted serves an immediate purpose: it gives the test constructor a check on the reasonableness of the question and on the adequacy of his or her own understanding. Perhaps some change in the question could make it easier, if that seems desirable, or more discriminating, which is always desirable. It is also useful, if it can be arranged, to have a colleague in the same field answer the question. Comparing such ideal answers might shed additional light on the question's suitability and suggest further ways of improving it.

The deferred purpose served by drafting an ideal answer to each essay-test question is to provide guidance and a point of reference for the later scoring of student answers. If someone other than the instructor is to score the questions or help with the grading, the ideal answer is almost indispensable to uniformity in grading.
SCORING ESSAY ITEMS
The decisions to be made when selecting a method for scoring essays involve the type of score interpretation desired (norm-referenced or criterion-referenced) and the amount of diagnostic information needed about individuals' responses.
Holistic and Analytical Methods
[...] procedures used for scoring essay responses [...] the holistic and the analytical methods. [...] It is not applicable to assigning scores to essays that are intended to measure [...]
[...] on the basis of the overall quality of the answer [...] assign a score. [...] whether it is compared [...] with an absolute standard, as represented by sample papers that represent predetermined grades [...] the holistic method [...]
Figure 11-1 Sample Sorting Procedure for Holistic Norm-Referenced Scoring
[Figure: the reader obtains scores based on a relative standard, reading each paper and placing it in one of three piles depending on its quality compared with others that have been read. After [...] the high stack is shuffled, reread, and [...]. The percentages in parentheses indicate [...].]
[...] to differentiate responses at each scale point. For example, in evaluating responses about westward expansion, the grading guide may indicate that social, political, and economic impact should be addressed and that examples of each should be given.
The scoring standards might be represented on a scale like this:
5 = All 3 aspects are included, all with relevant examples
4 = At least 2 of 3 aspects are included, both with relevant examples
3 = At least 2 of 3 aspects are included, at least one with relevant examples
2 = At least 1 of 3 aspects is included with relevant examples
1 = At least 1 of 3 aspects is included, without relevant examples
0 = No response, or an irrelevant response
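The six-level scale above amounts to a small decision rule over two counts. The sketch below writes it out in Python, under stated assumptions: the function name `analytic_score` and its two parameters (how many of the three required aspects an answer addresses, and how many of those are backed by a relevant example) are hypothetical, and the wording of levels 1 and 2 is partly reconstructed from a damaged scan.

```python
def analytic_score(aspects_included: int, aspects_with_examples: int) -> int:
    """Map one essay response to the 0-5 analytical scale (hypothetical helper).

    aspects_included: how many of the three required aspects (here, social,
    political, and economic impact) the answer addresses; an irrelevant
    response counts as 0.
    aspects_with_examples: how many of those aspects are supported by a
    relevant example.
    """
    if not 0 <= aspects_with_examples <= aspects_included <= 3:
        raise ValueError("need 0 <= aspects_with_examples <= aspects_included <= 3")
    a, e = aspects_included, aspects_with_examples
    if a == 3 and e == 3:
        return 5  # all 3 aspects, all with relevant examples
    if a >= 2 and e >= 2:
        return 4  # at least 2 aspects, both with relevant examples
    if a >= 2 and e >= 1:
        return 3  # at least 2 aspects, at least one with an example
    if a >= 1 and e >= 1:
        return 2  # at least 1 aspect, with a relevant example
    if a >= 1:
        return 1  # at least 1 aspect, no relevant examples
    return 0      # no response, or an irrelevant response


# An answer covering social and economic impact, only one of them illustrated.
print(analytic_score(2, 1))  # prints 3
```

Writing the rule out this way shows that each scale point is defined by what the answer contains, not by how it compares with other answers, which is what makes the analytical method an absolute-standard procedure.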
Note that the scorer is not looking for [...] must be effective.
The grading guide used with [...] parts, and thus provides scores that [...] Of course, once those scores [...] can be made. The quality of the [...] scores from the analytical method depends [...] created the grading guide definitions [...] are likely to elicit detailed, uniformly [...]
Techniques to Promote Objectivity
As has been mentioned, the [...] achievement depends primarily [...] competence of the scorer is crucial [...] inadvertently do things [...] they could be. Here are some suggestions [...] they are committed to making [...]
1. Score the answers question by question rather than student by student. [...] that the scorer will read the answers [...] going on to their responses to the next [...] required with the holistic method. It is [...] since concentration of attention on one [...] specialized skill and to foster independent [...] ([...] 1975).
2. If possible, conceal the identity of the student whose answer is being scored. The purpose of this procedure is to reduce the possibility that biases or halo effects will influence the scores assigned. Ideally, the answers to different questions would be written on separate sheets of paper, identified only by a code number. These sheets would be arranged into groups by question number for the scoring process and then recombined by student name for totaling and recording. This process can reduce the halo effect associated with the student's name and reputation or with the high or low scores on that student's preceding answers.
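The code-number procedure in suggestion 2 can be sketched in Python. This is illustrative only: the function names `make_batches` and `recombine`, the four-digit codes, and the reshuffling of each question's batch before reading are assumptions added here, not part of the procedure as the text states it. Each student's answers are assumed to arrive as a list in question order.

```python
import random
from collections import defaultdict


def make_batches(answers, seed=None):
    """Group anonymized answers by question for scoring.

    answers maps each student's name to that student's answers, in question
    order. Returns (batches, code_to_student): batches[q] is a list of
    (code, answer) pairs for question q, and code_to_student is the private
    key the teacher keeps for recombining scores afterward.
    """
    rng = random.Random(seed)
    names = list(answers)
    # One distinct 4-digit code number per student (assumes < 9000 students).
    codes = rng.sample(range(1000, 10000), len(names))
    code_to_student = dict(zip(codes, names))
    n_questions = max((len(a) for a in answers.values()), default=0)
    batches = []
    for q in range(n_questions):
        batch = [(code, answers[name][q])
                 for code, name in code_to_student.items()
                 if q < len(answers[name])]
        # Added precaution (not in the text): vary the reading order per question.
        rng.shuffle(batch)
        batches.append(batch)
    return batches, code_to_student


def recombine(scores_by_question, code_to_student):
    """scores_by_question[q] maps code -> score; returns student -> total."""
    totals = defaultdict(int)
    for q_scores in scores_by_question:
        for code, score in q_scores.items():
            totals[code_to_student[code]] += score
    return dict(totals)


answers = {"Ann": ["a1", "a2"], "Ben": ["b1", "b2"]}
batches, key = make_batches(answers, seed=7)
# The reader sees only (code, answer) pairs, one question at a time.
scores = [{code: 1 for code, _ in batch} for batch in batches]
print(sorted(recombine(scores, key).items()))  # [('Ann', 2), ('Ben', 2)]
```

Keeping the code-to-name key out of the reader's hands until totaling is what removes the name-and-reputation halo the text describes; scoring one question at a time also supports suggestion 1.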
3. If possible, arrange for independent scoring of the answers, or at least a sample of them. Independent scoring is the only real check on the objectivity, and hence the reliability, of the scoring. Since it is troublesome to arrange and time consuming to carry out, it is seldom utilized by classroom teachers. But if a school or college were to undertake a serious program for the improvement of essay examinations, a study of the reliability of essay test scoring would be an excellent way to begin.
To get independent scores, at least two competent readers would have to score each question, without consulting each other and without knowing what scores the other had assigned. At least 100, preferably 300, answers should be given this double, independent reading. (The answers need not all be to the same question. Reading the answers of 30 students to each of 10 questions would be quite satisfactory.) The correlation between pairs of scores on individual questions would indicate the reliability of the ratings.
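The reliability estimate described above is an ordinary Pearson correlation between the two readers' scores, computed over all the doubly read answers. A minimal Python sketch follows; the function name and the five sample scores are hypothetical, chosen only to make the arithmetic easy to follow.

```python
from math import sqrt


def interrater_correlation(reader_a, reader_b):
    """Pearson correlation between two readers' independent scores.

    reader_a[i] and reader_b[i] are the scores the two readers gave the same
    answer i. The answers may come from different questions, as the text
    allows (for example, 30 students x 10 questions = 300 pairs).
    """
    if len(reader_a) != len(reader_b) or len(reader_a) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    n = len(reader_a)
    mean_a = sum(reader_a) / n
    mean_b = sum(reader_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(reader_a, reader_b))
    ss_a = sum((a - mean_a) ** 2 for a in reader_a)
    ss_b = sum((b - mean_b) ** 2 for b in reader_b)
    if ss_a == 0 or ss_b == 0:
        raise ValueError("a reader gave every answer the same score")
    return cov / sqrt(ss_a * ss_b)


# Five answers read independently by two readers (illustrative data only).
print(interrater_correlation([5, 4, 3, 2, 1], [4, 5, 3, 1, 2]))  # 0.8
```

With real data the text calls for at least 100, preferably 300, pairs; five pairs are used here only to keep the arithmetic visible.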
SUMMARY PROPOSITIONS
1. The popularity of essay tests is due partly to their convenience in preparation, the freedom from dispute they provide the examiner, and the control of the score distribution they afford.
2. Essay questions are less vulnerable to examinee criticism than are objective questions.
3. An essay test may permit the examiner to assess the examinee's thought processes.
4. Essay tests usually do not provide valid measures of complex mental processes such as critical thinking, originality, or ability to organize and integrate.
5. The emphasis essay tests place on the ability to write is both advantageous and disadvantageous.
6. Essay scores tend to be low in reliability because of limited content sampling, indefinite test tasks, and subjective scoring.
7. Essay scores must possess significant amounts of objective meaning to be useful.
8. Good essay questions require the examinee to demonstrate a command of essential knowledge.
9. Essay score reliability can be enhanced by making the questions specific enough so that all good answers will be nearly identical.
10. Reliability can be enhanced more by using more questions that call for short answers than by using fewer questions that require long answers.
11. Optional questions should be avoided in essay tests.
12. Advance preparation of an ideal answer to each essay item facilitates reliable scoring and permits a check on the quality of the question prior to its use.
13. The holistic method of essay scoring involves the assessment of overall quality based on either relative or absolute standards.
14. The analytical method of scoring involves assigning scores to components of a response based on absolute standards.
QUESTIONS FOR STUDY AND DISCUSSION
1. How can the grader of an essay test control the distribution of test scores?
2. How can a social studies teacher obtain a measure of students' analytical abilities with an essay test while controlling the influence of both social studies knowledge and writing [...]?
3. Why is essay testing considered "bad practice in writing" by some educators?
4. What are some of the causes of low essay score reliability? What are some of the causes of low essay rater reliability?
5. Why is the use of optional essay items more problematic for norm-referenced than for criterion-referenced situations?
6. How can the analytical scoring process result in norm-referenced score interpretations?
7. Why is the "primary trait" method not useful for scoring essays?
8. What purposes are served by scoring an essay test item by item rather than student by student?
12 Test Administration and Scoring
Unless the class is very large, unless the classroom is poorly suited for test administration, or unless other special problems are encountered, test administration usually is the simplest phase of the whole testing process. In the administration of standardized tests, the golden rule for the test administrator is: Follow the directions in the manual precisely. In classroom testing there is usually no such manual, and the need for rigidly standardized conditions of test administration is much less. Nevertheless, here, as in most other areas, advance planning usually pays dividends. Also, there are some persistent problems associated with test administration, such as the questions of preparing test takers, of cheating, and of guessing on objective tests. These, together with a consideration of computer-assisted testing, will provide the subject matter of this chapter.
PREPARING THE STUDENTS
Preparing the students for the test goes hand in hand with preparing the test for the students. Though each can be accomplished separately, the neglect of either certainly will result in lost effort and less valid measures of achievement. As a start, students should know that a test is coming. Any important test should be announced well in advance. If a test is to have the desirable effects in motivating and directing efforts to learn, students need to know not only when the test is coming but what kinds of achievement the test will require them to demonstrate. This means the teacher should plan tests before the course begins, using the instructional objectives and learning materials prepared during the planning stages of instruction.
Test-Taking Skills
What are some of the legitimate [...]
[...] response will be scored [...]
4. They should put themselves in the best [...]
[...] is a heavy handicap. Examinees should realize that last-minute cramming [...] throughout the course [...]
[...] have time to consider and respond to all the questions. This means that they should not puzzle too long over any one question or problem, or write too extensively on [...] when a long answer seems easy to write [...] questions [...] penalty [...]
7. In answering an essay question, students should take time to reflect, to plan, and to organize their answer before starting to write. They should decide how much [...]
8. [...] on a separate answer sheet, the students should check frequently to be sure their [...] intended and that it is marked in the space provided for that question.
9. If possible, examinees should take time to reread their answers, to detect and correct any careless mistakes. It is a common misconception among teachers and students that the first answer given is more likely to be correct than a changed answer. However, [...] improves [...] scores when [...] (Mueller and Wasser, 1977; Crocker and Benson, 1980). [...] reconsider their original answers.
Since examinations do count, students and their teachers are well advised to spend some time considering how to cope with them skillfully. Some good books on the subject, giving more detailed help than we have suggested here, are available (Millman and Pauk, 1969; Divine and Kylen, 1979; Annis, 1983).
[...] called test-wiseness. Students who are richly [...] to be able to score well on any test, [...] subject or not. Furthermore, it is supposed [...] are better measures of students' test-wiseness [...]. There is some basis for this concern. Certain tests, especially some kinds of intelligence tests, include novel, unique, and highly [...] example, figure analogies or number series. For tests of this kind, [...] the main problem of the examinee is to [...] previous learning, nor is the skill developed [...] in classroom tests. But there are [...] an examinee to substitute test-wiseness [...] unintended clues to the correct answer [...]
[...] were discussed in the chapters on true-false and multiple-choice test items. They are outlined and discussed in greater detail in [...] many clues of this or any other kind. Given a test that measures [...] knowledge and is free of technical [...], [...] is likely to be little, rather than much [...]
Test Anxiety
The problem of test anxiety [...]
Anxiety is a frequent side effect of classroom [...] is at stake, anxious examinees may fear [...] or ridicule, loss of respect, or [...] teacher, parents, or friends [...] complex, and the situations in which they are [...] many simple, universal answers will be found [...] and cure of test anxiety. Some research has [...] seem reasonably safe.
1. [...] relation between level of ability and level of test anxiety.
2. [...] tend to be less anxious when facing [...]
3. Mild degrees of anxiety facilitate and enhance test performance. More extreme degrees are likely to interfere with and depress test performance.
4. [...]
5. Test anxiety can be educationally useful if it is distributed, at a relatively low level, throughout the course of instruction, instead of being concentrated at a relatively high level just prior to and during an examination. Skillful teaching involves the controlled release of the energy stimulated by test anxiety.
McKeachie (1988) has concluded, after more than 30 years of research related to anxiety and study strategies, that the poor performance of anxious students may be due to inferior study techniques:
    I have been concerned about students whose performance is impaired by excessive anxiety, particularly anxiety about achievement tests. Our research has revealed some techniques to help these students perform better on tests. Other researchers have also developed methods of reducing anxiety, but even when such students have learned to relax and control their feelings of anxiety, their performance has not improved. Our more recent research indicates that such students perform poorly on tests not simply because they are anxious but because they are poorly prepared. Highly anxious students study, but they study ineffectively, memorizing details and reading and rereading. (p. 7)
Evidence to support the belief that some students of good or superior achievement characteristically go to pieces and do poorly on every examination is hard to find. Since individuals differ in many respects, it is reasonable to suppose that they may differ also in their tolerance of the kind of stress that tests generate. On the other hand, it is conceivable that apparent instances of underachievement on tests may actually be instances of overrated ability in nontest situations. In other words, a student whose achievement is really quite modest may have cultivated the poise, the ready response, and the pleasing manners that would ordinarily mark the person as an accomplished and promising scholar.
TEST-PREPARATION CONSIDERATIONS
Objective tests generally are presented to students in printed or duplicated booklets. Sometimes the questions for essay or problem tests are written on the chalkboard as the test period begins. This saves duplication costs and helps to maintain test security, but it gives the teacher the double responsibility of copying the questions and of getting the students started to work on them, all at a time when minutes are precious and when everyone is likely to be somewhat anxious to begin working. Then, too, when the chalkboard has been erased, no one has a valid record of exactly how the questions were stated.
Oral dictation of test questions, especially short-answer or true-false items, can be accomplished with success, but most students prefer to look at each item while they are trying to decide on a response. This permits the student, rather than the teacher, to set the pace. Some instructors put test items on slides or transparencies and project them in a semidarkened room. This enables the examiner to pace the students and ensures that each examinee will give at least brief consideration to each item. Studies have indicated that examinees answer about as many items correctly when they are forced to hurry as when they choose their own pace (Curtis and Kropp, 1962; Heckman, Tiffin, and Snow, 1967). With
[...] measurement errors, some classroom [...]
[...] in which directions are printed [...] policy, however, [...] during [...] the students [...] the items [...] addressed by [...]
1. How to use the separate answer sheet [...] names and ID numbers
2. How many items there are and how many pages there are in the test booklet
3. Whether notes, textbooks, [...]
4. Whether [...] questions [...]
5. How much time is available
6. What special directions should be followed
7. How many points will be awarded for each of the separate types of [...]
8. Whether [...] should [...]
9. What to do when finished with the test and whether the test booklet needs to be [...]
[...] of the frequency with which each [...] must be made. With four-choice items, [...] correct answer for about one-fourth of the [...]
[...] attention from instructors and educators [...] seen in the strong incentive for students [...] rather than for ability simply to remember [...]
[...] instructors to eschew recall-type [...] application types. In this light the [...] examination. On the other hand, [...] they bring with them to class are [...] support. Looking up facts or [...] solving time.
An experimental comparison [...] examination, administered as an [...] book test in another section of the [...] was reported by Kalish (1958). He concluded that [...] affected by the examination approach [...] significantly different abilities [...] to the open-book examination:
1. Study efforts may be reduced.
2. Efforts to overlearn sufficiently to achieve full understanding may be discouraged.
3. Note-passing and copying from other students are less obvious.
4. Superficial knowledge is encouraged.
The take-home test has some [...] test with two important differences: [...] of time, which often defeats the very [...] disadvantage is the loss of assurance [...] their own achievements. For this reason [...] as a learning exercise than as an [...] even encouraged to collaborate [...] The efforts they sometimes [...] achieve under these conditions can [...]
If time limits for the test [...] achievement tests, the order of presentation [...] student scores, as shown by Sax and [...] probably should be arranged in order [...] suppose that to begin a test with [...] excessive test anxiety. It also seems [...] with the same area of subject matter [...] practices improve the validity of the [...]
TEST-ADMINISTRATION CONSIDERATIONS
As we have stated earlier, the actual administration of most tests involves relatively few and simple problems. Since the time available [...] is limited, and seldom as long as some of the students [...], it should be used to good advantage. By giving preliminary instructions the day before the test, by organizing test materials for efficient distribution, and by keeping last-minute oral directions and answers to questions as brief as possible, the teacher can ensure that students have the maximum amount of time to work on it. Corresponding provisions for efficient collection of materials and advance notice to the students that all work must stop when time is called help to conclude the test on time and in an orderly fashion.
times th€ dividing line is hard to determine.
Such questions as those stimulated by obvious but noncritical typographical errors should not even be asked. Since the process of asking and answering a question during the course of an examination is always disturbing to others, even if it is done as quietly and discreetly as possible, and since the answer to one student's question might possibly give that individual an advantage over the others, students should be urged to avoid all but the most necessary questions. Discussion of this point can well be undertaken prior to the day of the examination.
Special consideration may need to be given in settings where some examinees use English as a second or foreign language. In classroom testing situations, these students should be encouraged to ask questions related to general vocabulary or cultural situations presented in test items, information with which they may not be familiar. In some cases, special test administrations may be appropriate to permit additional testing time for slower readers. The general goal of good test administration is to present and maintain the conditions that will permit all examinees to demonstrate their true level of achievement without giving advantage to any examinee.
Reduce Opportunities for Cheating
[...] books and articles on testing. Cheating on examinations is commonly viewed as a sign of declining ethical standards or as an inevitable consequence of increased emphasis on test scores and grades.
Any activity of a student or group of students whose purpose is to give any of them higher grades than they would be likely to receive on the basis of their own achievements is cheating. Thus the term covers a wide variety of activities:
1. The sidelong glance at another student's answers
2. The preparation and use of a crib sheet
3. Collusion between two or more students to exchange information on answers
4. Unauthorized copying of questions or stealing of test booklets in anticipation that they may be used again later
5. Arranging for a substitute to take an examination
6. Stealing or buying copies of an examination before the test is given, or sharing such illicit advance copies with others
Although these various forms of cheating differ in seriousness, none should be viewed with indifference. The typical student has many opportunities to cheat, and the willingness to do so has been observed as early as kindergarten ([...] 1990). Some circumstances may even encourage examinees to cheat, but none justifies their doing so. Students may conclude, not without some justification, that the ethical standards of many of their peers are not very high so far as cheating on examinations is concerned. They may go on to infer that this fact requires them to lower their own standards or justifies them in doing so. Whatever other conditions may contribute to it, cheating should not [...]. Students need to recognize that it is always dishonest and usually [...]
Some acts of cheating are no doubt motivated by desperation. The more extreme the desperation, the more ambitious and obvious the attempt to cheat is likely to be. A major factor contributing to cheating is carelessness on the instructor's part in safeguarding the examination copy before it is administered and in supervising the students during the examination.
Emphasis on grades is sometimes blamed as a primary cause of cheating. But since grades are, or should be, symbols of educational achievement, we cannot indict grading as a cause of cheating without also indicting the goal of achievement in learning. Does anyone really want to do that? No doubt most students would find it easier to resist the temptation to cheat if no advantage of any consequence were likely to result from the cheating. But refusal to recognize and reward achievement may be as effective in reducing achievement as in reducing cheating. Such a price seems too heavy to pay.
Increased use of objective tests has also been cited as a cause of cheating. The mode of response to objective tests makes some kinds of cheating easier, but the multiplicity of questions makes other kinds of cheating more difficult. No form of test is immune to all forms of cheating. The quality of a test, however, may have a direct bearing on the temptation it offers to students to cheat. Demand for detailed, superficial knowledge encourages the preparation of crib sheets. If the examination seems to the students unlikely to yield valid measures of their real achievements, if it seems unfair to them in terms of the instruction they have received, if their scores seem likely to be determined by irrelevant factors anyway, the crime of cheating may seem less serious.
What cures are there for cheating? The basic cure is related to the basic cause. Students and their teachers must recognize that cheating is dishonest and unfair and that it deserves consistent application of appropriate penalties: failure in the course, loss of credit, suspension, or dismissal. Reports on the prevalence of cheating, no doubt sometimes exaggerated, should not be allowed to establish cheating as an acceptable norm for student behavior or to persuade [...] that cheating is inevitable and must be accommodated [...]
[...] instructor to avoid any conditions that might [...] seats. Alternate forms can easily be [...] different order. Finally, [...] their examinations as part of their [...] will not cheat and who should not be [...]
Teachers have considerable authority in their own classrooms. They should not overuse it under stress or underuse it when the situation demands it.
If a teacher is satisfied beyond any doubt that a student is cheating, [...]
1. Collecting the examination materials and quietly dismissing the student from the room
2. [...] the results of the examination
3. [...]
4. Bringing the incident to the attention of the school authorities [...] further action
One frequently mentioned [...] problem is the establishment of an honor [...] educational institutions of moderate [...] strong group identification and loyalty [...] seldom arises or maintains [...] personal honor and the honor of the [...] or by well-rehearsed tradition. The [...] honor system in such an environment [...] personal honor in a world where no [...] That such systems have worked [...] seems beyond doubt. That they [...] beyond doubt. The adoption of the [...] problem of cheating on examinations [...]
Issues of Test Security
Instructors and administrators [...] similarly [...] by rumors that [...] in advance of the scheduled administration [...] rumors are founded on fact. More often they result from misinformation [...] anxious students are only too eager to pass along [...] reaches the ears of [...] one or a number of anonymous telephone calls. What [...]
[...] rumors that a [...] test is out [...] to begin to circulate [...] or later [...]
SCORING PROCEDURES AND ISSUES
[...] almost always arranged so that the answers can be recorded in the test booklet. This avoids complicating the task of responding to the [...]. Ramseyer ([...]) found that [...] separate [...] second-grade students were lowered [...] were unaffected. These findings followed [...] Hieronymus [...] 1961) on this topic and have [...] the corrected test copy easier to use for instructional purposes. The use of a [...] used, the answers must be recorded on an answer sheet that the machine is designed to handle.
If the answers are to be recorded [...] answers should be provided near one margin [...] and minimize the possibility of errors [...] the columns of a separate answer key [...] answers and positioning the [...] answer spaces on the test copy.
In scoring.theanswersrecorded in resrbookters,the scorer
ma) nnd i!
,herptul
, . . ro
marl ft€ answeri,using a .otored pen(ir. A shorr r,.ri-rri-ii""
through the student's response can be used to indicate a correct response. Sometimes it is advantageous to mark all responses using, in addition to the horizontal line for correct responses, an x to indicate an incorrect response and a circle around the answer space to indicate an omitted response.

Responses are indicated on most separate answer sheets by marking one of the several response positions provided opposite the number of each item. Such answer sheets may be scored by hand, using a stencil key with holes punched to correspond to the correct responses. Transparent keys, which can be prepared on the film used to make transparencies for an overhead projector, have some advantages, as Gerlach (1966) has noted. When a separate answer sheet and a punched key are used, it is possible to indicate incorrect or omitted items by using a colored pencil to encircle the answer spaces that the student marked wrongly or did not mark at all. This kind of marking is useful when the answer sheets are returned with copies of the test for class discussion.
Most classroom tests of educational achievement are scored by the instructor. If the test is in essay form, the skill and judgment of the instructor or of someone equally competent are essential. The task of scoring an objective test is essentially clerical and can often be handled by someone whose time is less expensive than an instructor's time and whose skill and energy are less in demand for other educational tasks.

Some school systems and colleges maintain central scoring services. Usually, these services make use of small scoring machines, several of which are now available. But even if all the scoring is done by hand, a central service has the value of fostering the development of special skills that make for rapid, accurate scoring. Institutional test-scoring services often provide statistical and test-analysis services as well, and sometimes they even offer test-duplication services that provide expert assistance in the special problems of test production and in the maintenance of test security.
Instructors sometimes use the class meeting following the test for test scoring. Asking each student to check the answers of a classmate may on occasion be a reasonable and rewarding use of class time, but often the process tends to be slow and inaccurate. A difficulty encountered by one student on one test paper may interrupt and delay the whole operation. Most important, if the student scorers are concentrating on mechanical accuracy of scoring, as they probably should be, the circumstances will not favor much learning as a by-product of the […].
Optical Scanning Equipment
Recent advances in computing technology have contributed to the development of an array of electronic scoring machines that are practically useful and economically accessible to school districts and colleges of all sizes. These optical scanners can be operated independently by relatively unskilled workers, or they can be integrated into a variety of complex computer equipment configurations. They may be attached to a large computer directly, or they may send information to such a computer over transmission lines. They can be attached to a microcomputer or minicomputer. As a self-contained system, some scanners can read the
answer sheets, compute a score for […] sheet […]. Smaller machines do so at the […] scoring of educational tests, but their […]
Correction for Guessing
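Suppose a student […] guesses blindly on a true-false test. Since there are only two possible answers to each item, the student has reason to expect a score of about half the items right. A second student, knowing no less than the first but reluctant to guess, omits all the answers and thus receives a zero. Without correction, the score of the first student would be higher than that of the second, although their scores should be the same.

To correct a score for guessing, it is necessary to subtract the expected gain from blind guessing. Since blind guessing on true-false items should produce one wrong answer for every right answer, in this case the number of wrong responses is subtracted from the number right to correct for guessing. If multiple-choice items list four alternative answers to a question, only one of which is correct, the expected ratio of wrong to right answers from blind guessing is 3 to 1, and the guessing correction would subtract one-third of the number of wrong answers from the number of right answers.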
Logic of this kind leads to a general formula for correction for guessing:

$$S = R - \frac{W}{N - 1} \qquad (12.1)$$
where

S = score corrected for guessing
R = number of questions answered rightly
W = number of questions answered wrongly
N = number of possible alternative answers equally likely to be chosen in blind guessing
It is easy to see that this formula becomes

$$S = R - W \qquad (12.2)$$

in the case of two-alternative (true-false) items, or

$$S = R - \frac{W}{4} \qquad (12.3)$$

in the case of five-alternative multiple-choice items.
Logic of a similar kind leads to a second general formula for guessing correction:

$$S = R + \frac{O}{N} \qquad (12.4)$$

where

S = score corrected for guessing on the basis of items omitted
R = number of items answered correctly
O = number of items omitted
N = number of alternative answers whose choice is equally likely on the basis of blind guessing
Again, it is easy to see that this general formula becomes

$$S = R + \frac{O}{2} \qquad (12.5)$$

in the case of true-false items, or

$$S = R + \frac{O}{5} \qquad (12.6)$$

in the case of five-alternative multiple-choice test items.
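As a minimal sketch (the function names are hypothetical), formulas (12.1) and (12.4) can be computed with exact rational arithmetic so that fractional corrections such as W/4 or O/5 are not rounded. The test lengths and counts in the usage lines are invented for illustration.

```python
from fractions import Fraction


def score_wrong_corrected(right: int, wrong: int, n_alternatives: int) -> Fraction:
    """Formula (12.1): S = R - W / (N - 1)."""
    if n_alternatives < 2:
        raise ValueError("an item needs at least two alternatives")
    return Fraction(right) - Fraction(wrong, n_alternatives - 1)


def score_omit_corrected(right: int, omitted: int, n_alternatives: int) -> Fraction:
    """Formula (12.4): S = R + O / N."""
    if n_alternatives < 2:
        raise ValueError("an item needs at least two alternatives")
    return Fraction(right) + Fraction(omitted, n_alternatives)


# Hypothetical 40-item, five-alternative test: 28 right, 8 wrong, 4 omitted.
print(score_wrong_corrected(28, 8, 5))   # 26, i.e. formula (12.3): 28 - 8/4
print(score_omit_corrected(28, 4, 5))    # 144/5, i.e. formula (12.6): 28 + 4/5 = 28.8
# Hypothetical 100-item true-false test: 60 right, 40 wrong, formula (12.2).
print(score_wrong_corrected(60, 40, 2))  # 20
```

Using `Fraction` is only a presentational choice here; ordinary floating-point arithmetic would serve equally well for reporting scores.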
If the same set of test scores is corrected for guessing in two different ways, by subtracting a fraction of the wrong answers and by adding a fraction of the omitted answers, two different sets of corrected scores will be obtained. But although the two sets of scores will differ in their […], the omit-corrected scores being higher in all cases and the wrong-corrected scores being more variable, the two sets will be perfectly correlated. If student A makes a higher score than student B when a fraction of their wrong responses is subtracted from their right responses, A will also make a higher score when a fraction of their items omitted is added to their right responses. […] the correction formula that rests on it. No such assumption is made in the formula for guessing correction on the basis of items omitted, and yet the two formulas yield scores that agree perfectly in their relative rankings of students.

Scores corrected by subtraction may be regarded logically as sound in absolute value. Scores corrected by addition may be regarded logically as too high, but they are equally sound in relative value. With scores on tests of educational achievement, the absolute value is usually far less significant than the relative value.
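The claim that the two corrections rank students identically can be checked algebraically. If every item of a K-item test is counted as right, wrong, or omitted, so that R + W + O = K, then R + O/N = R + (K − R − W)/N = ((N − 1)/N)(R − W/(N − 1)) + K/N. The omit-corrected score is therefore a positive linear transformation of the wrong-corrected score S: it is higher by (K − S)/N, and its spread is only (N − 1)/N as large. The sketch below checks this for a few students whose counts are invented for illustration.

```python
from fractions import Fraction


def both_corrections(right, wrong, omitted, n):
    """Return (wrong-corrected, omit-corrected) scores, formulas (12.1) and (12.4)."""
    s_wrong = Fraction(right) - Fraction(wrong, n - 1)
    s_omit = Fraction(right) + Fraction(omitted, n)
    return s_wrong, s_omit


K, N = 40, 5  # hypothetical 40-item test with five alternatives per item
# Hypothetical (right, wrong, omitted) counts; each student's counts sum to K.
students = {"A": (30, 6, 4), "B": (25, 5, 10), "C": (20, 18, 2), "D": (12, 4, 24)}

for name, (r, w, o) in students.items():
    assert r + w + o == K
    s_wrong, s_omit = both_corrections(r, w, o, N)
    # Positive linear transformation: s_omit = (N-1)/N * s_wrong + K/N
    assert s_omit == Fraction(N - 1, N) * s_wrong + Fraction(K, N)
    # The omit-corrected score exceeds the wrong-corrected one by (K - s_wrong)/N >= 0
    assert s_omit - s_wrong == (K - s_wrong) / N
    print(name, float(s_wrong), float(s_omit))

rank_wrong = sorted(students, key=lambda s: both_corrections(*students[s], N)[0], reverse=True)
rank_omit = sorted(students, key=lambda s: both_corrections(*students[s], N)[1], reverse=True)
print(rank_wrong == rank_omit)  # True
```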
It is also worth noting here that if no items are omitted, scores corrected for guessing by subtracting a fraction of the wrong responses correlate perfectly with uncorrected scores, that is, with the number of right responses. This indicates that the magnitude of the effect of a guessing correction depends on the proportion of items omitted. Only when a considerable number of items are omitted by at least some students will application of either formula for correction for guessing have an appreciable effect.
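With no omissions, W = K − R, so formula (12.1) becomes S = R − (K − R)/(N − 1) = (NR − K)/(N − 1), a straight-line function of the number right, and its correlation with R must be exactly 1. The hypothetical sketch below contrasts that case with one in which students omit different numbers of items; all scores are invented for illustration.

```python
import math


def wrong_corrected(right, wrong, n_alternatives):
    """Formula (12.1): S = R - W / (N - 1)."""
    return right - wrong / (n_alternatives - 1)


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


K, N = 40, 4                          # hypothetical 40-item, four-option test
rights = [35, 31, 28, 22, 17, 12]     # hypothetical number-right scores

# Case 1: nobody omits, so W = K - R and the corrected score is linear in R.
no_omits = [wrong_corrected(r, K - r, N) for r in rights]
# Case 2: students omit different numbers of items (hypothetical wrong counts).
wrongs = [5, 1, 12, 2, 20, 4]
with_omits = [wrong_corrected(r, w, N) for r, w in zip(rights, wrongs)]

print(round(pearson(rights, no_omits), 6))    # 1.0
print(round(pearson(rights, with_omits), 3))  # below 1: the correction reorders some students
```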
Some considerations that should influence the test maker's decision regarding the use of a correction for guessing on objective achievement tests are listed here:

1. Scores corrected for guessing will usually rank students in about the same relative positions as do uncorrected scores.
2. The probability of obtaining a respectable score on a good objective test by blind guessing alone is extremely small.
3. Well-motivated examinees who have time to attempt all items guess blindly on few, if any, of them.
4. Seldom is any moral or educational evil involved in the encouragement of students to make the best rational guesses they can.
5. Students' rational guesses can provide useful information about their general […]
6. If a test is timed, a guessing correction removes the incentive for slower students […].
7. Scores corrected for guessing may include irrelevant measures of the examinee's testwiseness or willingness to gamble.
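The second consideration in the list can be made concrete with the binomial distribution, assuming each blind guess is independently right with probability 1/N. The test lengths and the "respectable" cutoffs below are hypothetical choices for illustration (`math.comb` requires Python 3.8 or later).

```python
from math import comb


def p_at_least(k_min, n_items, p_chance):
    """P(X >= k_min) when X ~ Binomial(n_items, p_chance), i.e. blind guessing on every item."""
    return sum(
        comb(n_items, k) * p_chance**k * (1 - p_chance) ** (n_items - k)
        for k in range(k_min, n_items + 1)
    )


# Hypothetical 40-item test, five options per item: chance expectation is 8 right.
print(f"{p_at_least(24, 40, 0.2):.2e}")   # chance of 24 or more right by guessing alone
# Hypothetical 100-item true-false test: chance expectation is 50 right.
print(f"{p_at_least(70, 100, 0.5):.2e}")  # chance of 70 or more right by guessing alone
```

Both probabilities are tiny, which is the arithmetic behind the claim that blind guessing alone is very unlikely to produce a respectable score on a test of reasonable length.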
Contrary to what students sometimes […], […] correction for guessing […] tends to eliminate the […] to omitting items. Testwise students […] something to gain by […] attempting to answer every item. The […] avoid taking chances may be inclined […] items on which his or her likelihood of […] level (Rowley and Traub, 1977; Wood, […]). […] for guessing give a special advantage to […] as measures of achievement suffer.
Differential Item Weighting

[…] for each right response, […] for each wrong response, and 0 for each omitted response.
Some test constructors believe that certain items in their test should carry […]: the more important items, items of better […] or difficulty, or items that are more […]. Reasonable as such differential […] to which they are applied […] do not ordinarily make them much better measures. […] ([…] and Examination Service, […]). […] weighting scheme when […] requested a weighting different […] the requested weights. The rank order of […] test score distributions, and the Kuder-Richardson […]. There is no obvious advantage to […] these cases. Sabers and White (1969) […].
[…] errors.

In […] to interpret […] achievement […] areas, one of which is judged to be twice as important as the other, twice as many items should be written for the more important area. This generally will result in more reliable and valid measures than using an equal number of items for each area and double-weighting those for the more important area.

Complex or time-consuming items should be made to yield more than one point […] parts, each of which can be independently scored as right or wrong. The advantages of multiple true-false items for such situations were described in Chapter 8. Very difficult items are likely to contribute less than moderately difficult items to score reliability. Giving the more difficult items extra weight lowers the average effectiveness of the items and thus lowers the effectiveness of the test […].

[…] differential weighting of […] items […] useful in improving score reliability or validity. For example, in a question like the following:
A child complains of severe pain and tenderness in the lower abdomen, with nausea. What should the child's mother do?
a. Give the child a laxative.
b. Put the child to bed.
c. Call the doctor.
[…] Choice of the first response might result in a score of −1, of the second in a score of 0, and of the third in a score of +1. In […] the scoring weights were […] determined a priori. It has also been suggested that they might be determined experimentally so as to maximize score reliability or validity.
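A minimal sketch of option weighting for the item above, contrasted with ordinary right/wrong scoring. The weights (−1 for the laxative, 0 for bed rest, +1 for calling the doctor) follow the passage as best the source can be read and are illustrative only; the function and variable names are hypothetical.

```python
# Hypothetical option weights for the three-choice item above:
# a (give a laxative) = -1, b (put the child to bed) = 0, c (call the doctor) = +1.
OPTION_WEIGHTS = {"a": -1, "b": 0, "c": 1}
KEY = "c"


def weighted_item_score(choice, weights=OPTION_WEIGHTS):
    """Option-weighted score; an omitted response (None) earns 0."""
    return 0 if choice is None else weights[choice]


def right_wrong_score(choice, key=KEY):
    """Conventional scoring: 1 for the keyed answer, 0 otherwise (including omits)."""
    return 1 if choice == key else 0


responses = ["c", "a", "b", None, "c"]  # five hypothetical examinees
print([weighted_item_score(r) for r in responses])  # [1, -1, 0, 0, 1]
print([right_wrong_score(r) for r in responses])    # [1, 0, 0, 0, 1]
```

Option weighting separates the examinee who chose the harmful response from the one who chose the merely unhelpful one, whereas right/wrong scoring gives both a 0. Whether that extra distinction actually improves reliability or validity is the question the surrounding discussion answers mostly in the negative.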
Table 12-1. Effect of Differential Item Weighting Applied to Four Tests
No. of Students
No. of Items
33
41
50
160
0.945
(1-140)
34
105
(1-160)
(1-70)
Weights = +2
(71-105)
21
0.923
90
(1-45)
Weights = +3
(46-90)
0.983
0.976
But in this case also […] (Downey, 1979). Seldom has any […] validity been found. […] one would need to write items with […]. Exceptions […]. […] weighting of items or of item responses […] classroom tests of educational achievement […] instructor of an educational achievement […].
COMPUTER-ASSISTED TEST ADMINISTRATION
[…] examinee performance. It can provide […] they respond to the last test item. The […] test administrations seems bounded […] or […] what it can do for us.

A new and promising test administration approach, adaptive testing, takes advantage of the […] of microcomputers. […] testing […]
that the trait being measured can be described by a single psychological continuum and that the responses of examinees to test items can be used to place the individuals on that continuum. The computer selects from the test-item bank an item that an average examinee would be expected to answer correctly. If the test taker answers correctly, a more difficult item is chosen for the next try. If the first answer was incorrect, an easier item is chosen for the second try. Since each item in the pool has been calibrated in advance to a particular location on the continuum, the examinee's position on the continuum can be located through successive selections of easier and harder items. A chief advantage of adaptive testing over conventional testing is that only about half the number of items are needed to obtain "equivalent" results (Green, 1983). There are problems yet unresolved with adaptive testing, but its anticipated advantages (shorter testing time, adaptability to more valid item types, and greater test security) make it one of the most promising developments for educational and psychological testing in the last decade of the century.
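The up-and-down selection rule described above can be sketched as follows. This is a simplified staircase procedure under stated assumptions: a bank of pre-calibrated difficulties on one continuum, a deterministic hypothetical examinee, and a step size that halves after each response. Operational adaptive tests usually locate examinees with item response theory models, which this sketch does not attempt; every name and value in it is hypothetical.

```python
def adaptive_test(answers_correctly, item_bank, n_items=10, start=0.0):
    """Hypothetical up-and-down adaptive test.

    item_bank: calibrated item difficulties on a single continuum (assumed values).
    answers_correctly(difficulty) -> bool: the examinee's response to that item.
    Returns the final difficulty target as a rough location estimate, plus the items given.
    """
    available = list(item_bank)
    target = start          # begin near the middle of the continuum (an average examinee)
    step = 1.0              # how far to move after each response
    history = []
    for _ in range(min(n_items, len(available))):
        item = min(available, key=lambda d: abs(d - target))  # nearest unused calibrated item
        available.remove(item)
        correct = answers_correctly(item)
        history.append((item, correct))
        # harder item after a right answer, easier item after a wrong one
        target = item + step if correct else item - step
        step = max(step / 2, 0.1)   # smaller moves as the location is narrowed down
    return target, history


bank = [round(0.05 * k, 2) for k in range(-80, 81)]   # hypothetical bank from -4.0 to +4.0
examinee = lambda d: d <= 1.3                          # hypothetical examinee located at 1.3
estimate, given = adaptive_test(examinee, bank)
print(round(estimate, 2), len(given))
```

Removing each used item from the pool mirrors the security concern raised in the text: with a small bank, the nearest unused items drift away from the examinee's level and the estimate degrades.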
There are problems to be overcome before mass testing by computer becomes commonplace. For example, there must be a large pool of test items in the computer's bank so that, for test security and other purposes, every examinee does not receive exactly the same items that those previously tested received. The items different examinees receive must be relatively equivalent in content and difficulty; otherwise their test scores will not be comparable. In addition, there is […] that a bank of test items can be maintained without permitting unauthorized access to items on computer. […] seem able to outfox […] to break security codes designed to limit access and preserve confidentiality. The old-fashioned lock and key still appear to be the safest way to store test items or booklets in preparation for test administration. Finally, research on computer-assisted test administration has drawn attention to additional concerns. Moe and Johnson (1988) found that the terminal screen presented a variety of problems to examinees: […] percent reported noticeable eye fatigue, 39 percent […] brightness, and 25 percent were bothered by glare. One-fourth of the examinees complained about the lack of opportunity to review items to which they had already responded. Sarvela and Noonan (1988) pointed out that limitations in the ability to review and reconsider responses, change answers, and recover from key-entry errors add an element of unfairness that reduces the reliability and validity of the […].
On the plus side, 91 percent of the 315 subjects in the Moe and Johnson study expressed a preference for taking an aptitude test by computer rather than by conventional paper-and-pencil procedures. In addition, computer test administrations show promise for providing more valid test scores with handicapped examinees than can be obtained from paper-and-pencil tests. […] through the availability of voice synthesizers, no reading may be needed. For those who lack the fine motor coordination required to use the keyboard of a standard computer terminal, a touch-sensitive screen on the monitor provides an alternative. There is no way to predict the magnitude of the impact of such devices in providing opportunities in education and employment for those whose handicaps have heretofore […] barriers. But there is every reason to […]
SUMMARY PROPOSITIONS
1. Students should be told in advance when an important test is to be given and what the nature of the content is to be.
2. Students at all educational levels should be taught essential test-taking skills.
3. The test developer should avoid clues in the test items that enable an examinee to substitute testwiseness for command of knowledge.
4. Test anxiety is seldom a major factor in determining a student's score on a test.
5. Research has suggested that the poor test performance of anxious students may be due to incomplete learning and poor study techniques.
6. Objective classroom tests usually should be presented in duplicated test booklets.
7. Many aspects of classroom management related to test administrations can be addressed effectively with thorough written instructions on the test booklet cover sheet.
8. The position of the correct answer in multiple-choice items should be distributed somewhat evenly so that overuse or underuse of a position does not provide a clue to examinees.
9. Both open-book and take-home tests offer advantages that are outweighed by their disadvantages relative to in-class, closed-book tests.
10. There is no conclusive research evidence that supports the ordering of items in a test according to difficulty level or on the basis of subject matter.
11. The test administrator should help students to adjust their rate of work on a test according to the amount of time remaining.
12. Special test administration procedures for classroom tests may be needed to accommodate students with language handicaps.
13. The instructor should be responsible for both the prevention and the punishment of cheating.
14. The development of an honor system is not a promising solution to the problem of cheating during tests.
15. The instructor should be responsible for preserving the security of a test prior to its administration.
16. The use of separate answer sheets facilitates rapid clerical or machine scoring of objective tests.
17. Recent advances in computer technology have made test-scoring machines more readily available for use by schools in scoring classroom tests.
18. The purpose of using a guessing correction is to reduce to zero the expected score gain from blind guessing.
19. Scores may be corrected for guessing by subtracting a fraction of the wrong responses from the number-right score or by adding a fraction of the omitted responses to the number-right score.
20. Scores corrected for guessing usually rank the examinees in about the same order as the corresponding uncorrected scores.
21. The probability of getting a respectable score on a good objective test by blind guessing alone is extremely small.
22. Students should be encouraged to make rational guesses about the answers to objective test items.
23. Giving different weights to different items in a test or to different correct or incorrect responses within an item seldom improves score reliability or valid score use appreciably.
24. Adaptive testing is a relatively new and promising method of computer test administration that has the potential for improving the efficiency, realism, and security aspects of the more traditional test administration.
QUESTIONS FOR STUDY AND DISCUSSION
1. What are the pros and cons of using surprise tests that are intended for summative evaluation?
2. How might a student's deficiency in test-taking skills lead to achievement scores of questionable validity? How do such students cause the reliability of the scores from their class to be lower than it should be?
3. How could a student's self-reports of test anxiety be verified through other means?
4. A teacher allows 40 minutes of testing time for students whose native language is not English, but allows only 30 minutes to all other students. Does this seem like an equitable policy? Why?
5. Why do students cheat on tests instead of preparing themselves thoroughly for […]?
6. One instructor allows students to keep their test copies when they leave the exam, and she develops a new test for the next time that exam is needed. What are the pros and cons of this procedure for present students, future students, and the instructor?
7. How can it be shown that the use of a correction-for-guessing formula does not penalize […]?
8. What kinds of controls would a teacher need to introduce to prevent cheating by students on a computer-administered test? (Answer for individualized testing and group testing separately.)
EVALUATING TEST AND ITEM CHARACTERISTICS
[…] Eventually, a large pool of high-quality test items should accumulate, and the ability […] should be enhanced in the process.
TEST CHARACTERISTICS TO EVALUATE
The characteristics to consider in evaluating the quality of an achievement test are the same as those to which the test developer attends in trying to build a good test. These important factors are relevance, balance, efficiency, specificity, difficulty, discrimination, variability, and reliability. Though some of these characteristics are evaluated with different […] for criterion-referenced or norm-referenced measures, each characteristic is important to consider regardless of the type of score interpretation […].
Relevance and Balance

Relevance indicates the extent to which […] specifications and contribute to achieving the […]. […] relevance of test items requires […] criteria. Are the test specifications and […] test reviewer to decide which items are […]. […] of the test purpose, does an item like this belong in this test? Items that are […] be relevant are not necessarily of high quality, but they […] appear to measure the abilities that the test […]. Relevance […] be judged by […] each item […]:

1. Content. […] Does the item content match an element of the […], and does the item content match that […] instructional objective? Can the task presented by the item be found in the […] instructional materials used by examinees?
2. Taxonomic level in terms of Ebel's Relevance Guide or Bloom's Taxonomy. Are the items written at the appropriate intellectual level? […] knowledge, application, and problem solving […]. Are the abilities required by each item either at, below, or within […] the cognitive demands on which instruction was focused?
3. Extraneous abilities. To what extent does each item require knowledge, skills, or abilities outside the content domain of interest? Do […], reading ability, or creativity play much of a role […]? How much background knowledge, outside the domain of instruction, must the examinee call upon to answer the item? To what extent do the […] of the majority culture influence item interpretation or the selection of the correct answer?
Most test constructors seek balance in their tests. They hope that the items they select for their test will sample representatively the
knowledge, skills, and understandings outlined in the plan. The table of specifications […] in the planning […] test […]
Efficiency and Specificity

[…]
Difficulty and Discrimination
How difficult a test should be relates to the purpose for testing and the kind of score interpretation desired. A good norm-referenced test should be harder, intentionally, than a good criterion-referenced test. But how hard a test turns out to be also depends on how well students learned the content required by the test tasks. If difficulty were strictly a characteristic of the test, a given test would be equally hard or easy for every group to which it was administered.
For norm-referenced purposes, tests that are too easy or too difficult for the group tested will produce score distributions that make it hard to identify reliable interindividual differences. Under these circumstances the goal of the test developer is to use items that will produce moderate difficulty, a mean score that is about halfway between a perfect score and the mean chance score. Thus, the ideal difficulty of a 40-item test composed of 5-option multiple-choice items is 24, halfway between 40 and 8 (one-fifth of 40). The difficulty of a test is obviously determined by the difficulty of the items that comprise it. The difficulty of an item is usually defined as the proportion of the group that responds correctly. The ideal difficulty of a 5-choice item is 0.60, halfway between 1.00 and 0.20. Considerable skill is required of item writers to develop and manipulate item content to achieve the appropriate level of difficulty.
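The "halfway" rule for ideal difficulty can be written as a small calculation. The function names are hypothetical; the rule is the one stated in the text, applied here also to true-false and four-option items purely as an illustration.

```python
from fractions import Fraction


def ideal_mean_score(n_items, n_options):
    """Halfway between a perfect score and the mean chance score (n_items / n_options)."""
    chance_score = Fraction(n_items, n_options)
    return (n_items + chance_score) / 2


def ideal_item_difficulty(n_options):
    """Halfway between 1.00 and the chance proportion correct, 1 / n_options."""
    return (1 + Fraction(1, n_options)) / 2


print(ideal_mean_score(40, 5))     # 24: halfway between 40 and 8
print(ideal_item_difficulty(5))    # 3/5, i.e. 0.60
print(ideal_item_difficulty(2))    # 3/4 for true-false items, by the same rule
```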
How difficult should tests intended for criterion-referenced interpretations be? Because the elements of the domain to be measured are made explicit by the domain definition, the notion of difficulty is built into the test specifications. In this case the item writer is not free to manipulate item content to influence difficulty directly. To the extent that difficulty is manipulated, relevance may suffer.
When a rating scale is developed to describe the absolute standards against which performance will be judged, difficulty is accounted for in describing the various scale points. The stimulus presented to the students, whether a theme prompt, a speech, or a laboratory skill, must be prepared by the evaluator to be consistent in difficulty with the demands inherent in the objectives of instruction. For example, an impromptu speech about the detriments of smoking would be easier for a high school student than one about how water softeners work. In this case inappropriate difficulty (too hard or too easy) would contribute to a lack of relevance.
Generally, we expect tests geared to criterion-referenced interpretations to be easier, in terms of mean score, than those used for norm referencing. But it is possible for a good criterion-referenced test to yield low scores. In the criterion-referenced situation the goal is not to make tests that are hard, moderate, or easy in difficulty. Instead, the purpose is to translate the test specifications (the domain definition) into relevant test tasks. High degrees of success at translation will automatically take care of difficulty.
The ability of a norm-referenced test to discriminate between high- and low-achieving students is a function of the ability of each item to do just that. If a large proportion of the good students get an item right, and a small proportion of the poor students get it right, that item has discriminated properly and has contributed to the test purpose. Discrimination is closely related to difficulty:
items that are too hard or too easy are not as capable of discriminating between high and low achievers as items of moderate difficulty. The […] as long as some students […] those items. But since the purpose […], items that fail to discriminate […] basis. Of course, if more low achievers than high achievers answer an item correctly, that item is a negative discriminator and […] purposes.
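One common way to quantify the discrimination just described is the difference between the proportions of an upper and a lower scoring group who answer the item correctly. The sketch below assumes the top and bottom 27 percent of examinees as the two groups (a widely used convention, not one specified in this passage) and uses an invented class of ten; the function name is hypothetical.

```python
def discrimination_index(item_correct, total_scores, fraction=0.27):
    """D = (proportion right in upper group) - (proportion right in lower group).

    item_correct: 1/0 for each examinee on one item.
    total_scores: each examinee's total test score.
    fraction: share of examinees placed in each extreme group (27% is an assumed convention).
    """
    n = len(total_scores)
    g = max(1, round(n * fraction))
    order = sorted(range(n), key=lambda i: total_scores[i], reverse=True)
    upper, lower = order[:g], order[-g:]
    p_upper = sum(item_correct[i] for i in upper) / g
    p_lower = sum(item_correct[i] for i in lower) / g
    return p_upper - p_lower


totals = [38, 35, 33, 30, 28, 25, 22, 20, 17, 12]   # hypothetical class of ten
good_item = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]
reversed_item = [0, 0, 1, 0, 1, 1, 1, 1, 1, 1]
print(discrimination_index(good_item, totals))      # 1.0
print(discrimination_index(reversed_item, totals))  # negative: a negative discriminator
```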
Variability and Reliability
As long as differences in student achievement […] the purpose for testing is to identify such […], the test should exhibit high variability. The […] the more successful the test constructor […] differences in achievement. […] should be appropriately matched […] distributions with relatively large standard deviations […] moderate difficulty […] achievement and producing highly reliable […].

Good criterion-referenced […] piano students […] where variability is quite small or even nonexistent. […] When norm-referenced score interpretation is […] relevance has not been established […] abilities quite accurately. […]

[…] mostly based on correlation coefficients […] variability in scores. When dichotomous […] made with scores, decision consistency […]. But in criterion-referenced contexts […] grading on the A-F scale, score reliability […] the traditional reliability estimate may be appropriate […] of goodness used in norm-referenced […] be too stringent. For example, a KR20 of […] would not […] but it may not be too low for certain criterion-referenced […].
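Because the traditional reliability estimates mentioned above depend on score variance, a sketch of the Kuder-Richardson formula 20 shows why they break down when variability is small: KR20 = (k/(k − 1))(1 − Σpq/σ²), where k is the number of items, p the proportion answering an item correctly, q = 1 − p, and σ² the variance of total scores. The implementation below is illustrative, uses the population variance of totals (one of the conventions in use), and runs on invented data.

```python
def kr20(item_scores):
    """Kuder-Richardson formula 20 for 0/1 item scores (rows = examinees, columns = items)."""
    n = len(item_scores)
    k = len(item_scores[0])
    if k < 2:
        raise ValueError("KR-20 needs at least two items")
    totals = [sum(row) for row in item_scores]
    mean = sum(totals) / n
    var_total = sum((t - mean) ** 2 for t in totals) / n   # variance of total scores
    if var_total == 0:
        # No variability: every examinee has the same total, so KR-20 is undefined.
        raise ValueError("total scores have zero variance")
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in item_scores) / n          # item difficulty (proportion right)
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / var_total)


# Hypothetical 4-item test taken by five examinees with a perfectly ordered response pattern.
scores = [[1, 1, 1, 1], [1, 1, 1, 0], [1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0]]
print(round(kr20(scores), 3))   # 0.8
```

If every examinee earns the same total, as can happen on a criterion-referenced test that everyone masters, the function refuses to return a value, which illustrates why correlation-based reliability estimates say little in that situation.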
The test characteristics we have reviewed in this section are important to examine in evaluating the quality of an achievement test. […] purpose […], the evaluation of each characteristic can provide […] suggestions regarding the ways in which the test items might be revised and improved. […] discussion of these characteristics and the criteria for judging […].
ITEM-ANALYSIS PROCEDURES
The analysis of student responses to objective test items is a powerful tool for test improvement and for accumulating a bank of high-quality items. The procedures in this section have been used traditionally with items from norm-referenced measures, but they can be used also for items from criterion-referenced measures. Procedures specifically designed for criterion-referenced measures are described later in this chapter. [...] and subsequent revision, item analyses can indicate which items may be too easy or too difficult and which may fail, for whatever reason, to discriminate properly between high and low achievers. Sometimes these procedures suggest why an item has not functioned effectively and how it might be improved. Most often, however, item analysis only identifies problem items, [and the item writer must look] for the probable causes and possible solutions.
Item analysis begins after the test has been scored. Of the many sets of analysis procedures in use, one has been chosen to illustrate how the process works. And though most microcomputers can do the calculations, the process is described here in detail to help you develop a complete understanding of the information that results. A classroom teacher who wished to carry out the procedures by hand would follow these six steps:
1. Arrange the scored test papers or answer sheets in score order from highest to lowest.
2. Identify an upper and a lower group separately. The upper group is the highest-scoring 27 percent of the total group. The lower group is an equal number of the lowest-scoring of the total group.
3. For each item, count the number of examinees in the upper group that chose each response alternative. Do a separate, similar tally for the lower group.
4. Record these counts on a copy of the test at the end of the corresponding response alternatives. (The use of colored pencil is recommended.)
5. Add the two counts for the keyed response and divide this sum by the total number of students in the upper and lower groups. Multiply this decimal value by 100 to form a percentage. The result is an estimate of the index of item difficulty.
6. Subtract the lower group count for the keyed response from the upper group count. Divide this difference by the number of examinees in one of the groups (either group, since both are the same size). The result, expressed as a decimal, is the index of discrimination.
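The six hand-scoring steps above can be sketched in Python. This is a minimal illustration, assuming each answer sheet is represented as a (total score, list of chosen alternatives) pair, with None for an omitted item; the function name `item_analysis` and the data layout are hypothetical. Ties in total score at the group boundaries are broken arbitrarily by the sort.

```python
from collections import Counter

def item_analysis(papers, key, fraction=0.27):
    """Upper-lower item analysis.

    papers: list of (total_score, responses) pairs; responses[i] is the
            alternative chosen on item i ('a', 'b', ...) or None if omitted.
    key:    list of keyed (correct) alternatives, one per item.
    """
    # Step 1: arrange papers in score order from highest to lowest.
    ordered = sorted(papers, key=lambda paper: paper[0], reverse=True)
    # Step 2: upper and lower groups of equal size (27 percent by default).
    n = max(1, round(fraction * len(ordered)))
    upper, lower = ordered[:n], ordered[-n:]

    results = []
    for i, keyed in enumerate(key):
        # Step 3: tally the alternatives chosen in each group.
        up = Counter(responses[i] for _, responses in upper)
        low = Counter(responses[i] for _, responses in lower)
        # Step 5: difficulty = percent of both groups choosing the keyed response.
        difficulty = 100 * (up[keyed] + low[keyed]) / (2 * n)
        # Step 6: discrimination = (upper correct - lower correct) / group size.
        discrimination = (up[keyed] - low[keyed]) / n
        # Step 4 is record keeping: the tallies are returned with each item.
        results.append({"upper": up, "lower": low,
                        "difficulty": difficulty,
                        "discrimination": discrimination})
    return results

# Hypothetical data: ten examinees, two items keyed 'a' and 'b'.
papers = [
    (10, ["a", "b"]), (9, ["a", "b"]), (8, ["a", "c"]), (7, ["b", "b"]),
    (6, ["a", "b"]), (5, ["c", "b"]), (4, ["a", "d"]), (3, ["b", "c"]),
    (2, ["c", "b"]), (1, ["d", "d"]),
]
for number, item in enumerate(item_analysis(papers, ["a", "b"]), start=1):
    print(number, item["difficulty"], round(item["discrimination"], 2))
# prints:
# 1 50.0 1.0
# 2 50.0 0.33
```

With 176 answer sheets, the default fraction gives groups of round(0.27 × 176) = 48 papers each.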
An Example
An illustration of the data obtained by this process for one item is presented in Figure 13-1. Answer sheets from a test were available for 176 students, so the upper and lower groups consisted of the 48 students having the highest and the 48 having the lowest scores. The keyed response is marked with an asterisk.
0.48   [...] has occurred?
      *a. It has been increasing. (47-24)
       b. [...] due to [...] rates of cancer and heart disease. (0-10)
       c. It has increased for young people but decreased for older people. (0-5)
       d. It has remained quite stable. (1-7)
       Omits (0-2)

Figure 13-1. Illustration of Item-Analysis Data
The numbers in parentheses following each response show how many students in the upper group chose that response (first figure) and how many in the lower group chose it (second figure). In the upper group, 47 students chose the keyed response and 1 chose the fourth response. In the lower group, 24 chose the first response, 10 the second, 5 the third, and 7 the fourth. Two students in the lower group omitted the item. (Note that [...] percent of scorers responded to this item.)
The moderate difficulty of the item is indicated by the proportion of correct responses in the two groups combined, which is computed as follows:
1. Add the two counts for the keyed response:
   47 + 24 = 71
2. Divide this sum by the total number of students in both groups:
   71 ÷ 96 = 0.74
3. Convert the decimal value to a percentage:
   0.74 × 100 = 74%
This is an estimate of the value of the difficulty index that would be obtained from the responses of the entire group, all 176 students. Such estimates will be quite satisfactory for large classes. For small classes, of course, the responses of all students can be used to compute the index. The discrimination of the item is indicated by the difference in proportions of correct responses in the upper and lower groups [(47 - 24) ÷ 48 = 0.48]. And each of the incorrect responses attracted some responses from the lower group. In sum, the moderate difficulty and high discrimination of this item indicate that it makes a useful contribution to the test.
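The arithmetic of this example can be checked with two small helper functions. The function names are hypothetical; the counts are the keyed-response counts from Figure 13-1 (47 of 48 in the upper group, 24 of 48 in the lower group).

```python
def difficulty_index(upper_correct, lower_correct, group_size):
    """Percent of the combined upper and lower groups answering correctly."""
    return 100 * (upper_correct + lower_correct) / (2 * group_size)

def discrimination_index(upper_correct, lower_correct, group_size):
    """Upper-lower difference in proportion answering correctly."""
    return (upper_correct - lower_correct) / group_size

# Keyed-response counts for the item in Figure 13-1.
print(round(difficulty_index(47, 24, 48)))         # 74
print(round(discrimination_index(47, 24, 48), 2))  # 0.48
```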
SELECTION OF THE UPPER AND LOWER GROUPS
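The type of item analysis we have described makes use of an internal criterion of achievement. That is, the total score on the test itself serves as the criterion rather than some other independent measure of achievement. In order to conclude that an [item that discriminates well by this criterion is a good] item, one must assume that the entire test [measures achievement validly]. Such an assumption is ordinarily [reasonable; most tests] come close enough to the mark on [this count to provide] a fairly dependable basis for distinguishing [levels of] achievement. However, it must be [recognized that item analysis with an internal] criterion can only make a test a better [measure of whatever it already measures; it cannot make] the test a better measure of what it [should measure]. [That would require a] better criterion than the total score, [namely an] external criterion. Yet an external criterion [is no better than an inter]nal criterion unless it is truly a better [measure].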
The use of total-test score as [the criterion] for item analysis has two important [implications]. [...] limits set by the wisdom and skill of [the test constructor ...] model test does come closer than any [...] what that person wished to measure [...] on the test whose items are being analyzed. [...] The selection of highly discriminating [items by this internal] criterion results in a test whose items [...]. In this sense, item analysis [...]. [The] kind of analysis and selection we [have described ...], and might not even improve, the validity [...] to the test as a whole, and this is [...] reliable, and thus probably more valid.
The process of item analysis called for the counting of responses in the upper and lower 27 percent groups. Why 27 percent? Why not upper and lower fourths (25 percent), thirds (33 percent), or halves (50 percent)? [The answer is that] 27 percent provides the best compromise between two desirable but inconsistent aims: (1) to make the extreme groups as large as possible and (2) to make the extreme groups as different as possible. Kelley (1939) demonstrated that when extreme groups [of 27 percent are used, the compromise is better than with the] upper and lower fourths or thirds. However, the intuitive feeling that 33 percent is [better because it yields] groups of larger size, or that 25 percent [is better because the] difference between the groups is greater, [is mistaken]. In each case the supposed advantage is slightly more than offset by the opposing disadvantage. The optimum value is 27 percent.
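The 27 percent figure can be checked numerically under a simplified version of the argument (an assumption here, not Kelley's own derivation). If total scores are normally distributed, the gap between the mean scores of the upper and lower tails shrinks as the tails are made larger, while the precision of each group's counts grows with the square root of group size. The sketch below multiplies those two quantities and searches for the tail fraction that maximizes the product; the function name `separation_efficiency` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def separation_efficiency(p):
    """Tail-mean gap times sqrt(p) for normally distributed scores.

    p: fraction of examinees placed in each extreme group (0 < p <= 0.5).
    """
    nd = NormalDist()                   # standard normal score distribution
    z = nd.inv_cdf(1 - p)               # cut point above which the top p fall
    tail_mean_gap = 2 * nd.pdf(z) / p   # upper-tail mean minus lower-tail mean
    return tail_mean_gap * sqrt(p)      # precision grows with sqrt(group size)

candidates = [i / 100 for i in range(5, 51)]   # 5 percent through 50 percent
best = max(candidates, key=separation_efficiency)
print(best)  # 0.27
for p in (0.25, 0.27, 0.33, 0.50):
    print(p, round(separation_efficiency(p), 4))
```

The values for fourths and thirds fall only slightly below the value at 27 percent, which agrees with the point that the supposed advantage of each alternative is only slightly more than offset.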
Counting the Responses
The counting of responses to the items is likely to be the most tedious and time-consuming part of the analysis. However, for many classroom tests the number of papers in each extreme group may be few enough to make the task seem less formidable. A chart can be developed that lists the item numbers down the left side and the response alternatives [across the top, so that each] response [can be] tallied. Such a chart helps to organize the work, and if many copies of it are duplicated at one time, a supply can be kept on hand for future tests or to share with colleagues.

Often clerks or aides can perform the [counting]. [...] these counts by a show of hands in class, using student volunteers. But neither of these [...]. Optical scanners and computers are the most efficient tools available for obtaining the item-analysis counts and indices, [and in many schools] scoring and computing facilities make such analysis available to teachers.
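A counting chart of the kind described above can also be produced by a short program. This is a minimal sketch, assuming each examinee's answers are stored as a string with one character per item (a hyphen for an omit); the name `tally_chart` is hypothetical. Each cell shows the upper-group count, a hyphen, and the lower-group count, the same "upper-lower" notation used for the item examples in this chapter.

```python
from collections import Counter

def tally_chart(upper, lower, alternatives="abcd"):
    """Build a text chart of response counts: items down the left side,
    alternatives across the top, each cell 'upper-lower'."""
    n_items = len(upper[0])
    lines = ["Item " + " ".join(f"{alt:>7}" for alt in alternatives)]
    for i in range(n_items):
        up = Counter(answers[i] for answers in upper)
        low = Counter(answers[i] for answers in lower)
        cells = " ".join(f"{up[alt]:>3}-{low[alt]:<3}" for alt in alternatives)
        lines.append(f"{i + 1:>4} {cells}")
    return "\n".join(lines)

# Hypothetical answer strings for two items (upper and lower groups).
upper_group = ["ab", "ac", "ab"]
lower_group = ["bb", "cd", "a-"]
print(tally_chart(upper_group, lower_group))
```

Omits are simply left out of the chart here; a column for them could be added by including "-" in `alternatives`.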
INDEX OF DIFFICULTY
Historically, two measures of item difficulty have [been used]. [...] [The smaller the value] of the index of difficulty, the more difficult the item. [...] related to achievement test [...].
The numerical value of the index of difficulty of a test item is not determined [...].
The Distribution of Difficulty Indices
It is quite natural to assume, as many test constructors do, that a good norm-referenced test must include some easy items to test the low achievers and some difficult items to test the high achievers. After all, it must discriminate among students over a fairly wide range of achievement levels. But the actual testing circumstances rarely warrant such an assumption. The items in most norm-referenced tests are not like a set of hurdles of different heights, all presenting the same task but varying in their difficulty. Such norm-referenced items do differ in difficulty, but they differ also in the kind of task they present.
Suppose a class of 20 students takes a test and 12 of the students answer item 6 correctly, but only 8 of them answer [another, harder item] correctly. [The natural assumpt]ion is that any student who answered [the harder item correctly also an]swered the easier question (6) correctly, [and that a student who missed item 6] also would be expected to have missed [the harder one]. [Such] expectations are often mistaken when [actual responses are examined].
Table 13-1 presents data on the responses of 11 students to six test items. A plus (+) in the table represents a correct response; a zero (0), an incorrect response. In this exhibit the students have been arranged in order of ability, and the items in order of difficulty. Note that the item missed by good student b was not one of the most difficult items. Poor student j missed all the easier items but managed correct answers to two of the more difficult items.
It is possible to imagine a test that would give highly consistent results when administered to a particular group. [...] Success on a particular item would practically guarantee success on all other items in the test that were easier for the group than that item. Correspondingly, failure on a particular item would almost guarantee failure on all harder [items]. [Such a test would be perfectly] consistent. But a test showing such a degree [of consistency] would also be characterized by much hi[gher reliability than is typical of tests] with the same number of items. Such tests are seldom met with in practice. This is another reason [...] include items ranging widely in difficulty.
Most item writers produce some items that are ineffective (nondiscriminating) because they are too difficult or too easy. Efforts to improve the accuracy with which [the test] measures [...] usually have the effect of reducing the range of item difficulty rather than increasing it. The differences in difficulty that remain among items highest in discrimination are usually more than adequate to make the test effective in discriminating different levels of achievement over the whole range of abilities for which the test is expected to be used.
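The perfectly consistent test imagined above corresponds to what is usually called a Guttman scale (a term not used in the text). A minimal sketch that counts departures from that pattern follows: items are ordered from easiest to hardest by number of correct responses, and each case of a student missing an easier item while answering a harder one counts as one inconsistency. The function name and both response matrices are hypothetical; they are not the data of Table 13-1.

```python
def guttman_errors(matrix):
    """Count departures from a perfectly consistent (Guttman) response pattern.

    matrix: one row per student, one column per item; 1 = right, 0 = wrong.
    """
    n_items = len(matrix[0])
    # Order items from easiest to hardest by the number of correct answers;
    # tied items keep their original column order (sorted() is stable).
    correct = [sum(row[j] for row in matrix) for j in range(n_items)]
    order = sorted(range(n_items), key=lambda j: -correct[j])
    errors = 0
    for row in matrix:
        pattern = [row[j] for j in order]
        # Each (easier item wrong, harder item right) pair is one inconsistency.
        for easier in range(n_items):
            for harder in range(easier + 1, n_items):
                if pattern[easier] == 0 and pattern[harder] == 1:
                    errors += 1
    return errors

perfect = [[1, 1, 1, 1], [1, 1, 1, 0], [1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0]]
mixed = [[1, 1, 0, 1], [1, 1, 1, 0], [0, 1, 1, 0], [1, 0, 0, 1], [1, 0, 0, 0]]
print(guttman_errors(perfect), guttman_errors(mixed))  # 0 5
```

Real classroom data, like those in Table 13-1, typically fall well short of the zero-error pattern.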
Table 13-1. Responses of 11 Students to Six Test Items

Some data from a simple experimental study of the relation between spread of item difficulty values, on the one hand, and spread of test scores and level of reliability coefficients, on the other, are presented in Figure 13-2.
Three synthetic tests of 16 items each were "constructed" by the selection of items from a 64-item trial form of a social science test. This trial form had been administered to over 300 college freshmen, and an item analysis was performed to yield indices of difficulty and discrimination for each item. The items constituting the three 16-item tests were selected so as to yield tests differing widely in difficulty distributions.
Figure 13-2. Relation of Distribution [...]
In Test C, the items selected were those that had difficulty values as near the middle of the entire distribution of difficulty values as possible.

In Test D, the items selected were distributed in difficulty values as uniformly as possible over the entire range of available difficulty values.

In Test E, the items were selected for extreme difficulty values, including the eight easiest and the eight most difficult items.
When these three 16-item tests were scored on a set of 263 answer sheets for the 64-item tryout form, the distributions of scores displayed in the histograms of Figure 13-2 were obtained. The distributions of item difficulties are indicated by the tally marks along the vertical scales to the left of each histogram.

Note the inverse relation between the spread of item difficulties and the spread of test scores. The wider the dispersion of difficulty values, the more concentrated the distribution of test scores. Note, too, the very low reliability of scores on the test composed only of very easy and very difficult items, and the somewhat higher reliability of the scores from those tests composed of items more nearly in the middle range of difficulty. In short, the findings of this study support the recommendation that items of middle difficulty be favored in the construction of achievement tests.
INDEX OF DISCRIMINATION
Upper-Lower Difference Index
The index of discrimination that results from step 6 was first described by Johnson (1951). Since then it has attracted considerable attention and approval. It is simpler to compute and to explain to others than such other indices of discrimination as the point-biserial correlation, the biserial correlation, Flanagan's coefficient (Flanagan, 1939), and Davis's coefficient (Davis, 1946). It has the very useful property, which most of the other correlation indices lack, of being biased in favor of items of middle difficulty. As we have already seen, it is precisely these items that provide the largest amount of information about differences in levels of achievement and that thus contribute most to score reliability. If the primary goal of item selection is to maximize reliability, as it should be for norm-referenced tests, the items having the highest discrimination in terms of this index should be chosen. Item difficulty need not be considered directly in item selection, since no item that is much too difficult or much too easy can possibly show good discrimination when the upper-lower difference index is used.
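The claim that a very easy or very hard item cannot show good discrimination on this index follows from simple arithmetic. With equal upper and lower groups, if p is the proportion of the two groups combined who answer correctly, the largest possible index is 2p when p ≤ 0.5 (all correct answers in the upper group) and 2(1 - p) when p > 0.5 (everyone in the upper group correct). A short Python check, with a hypothetical function name:

```python
def max_discrimination(p):
    """Largest possible upper-lower index when a proportion p of the
    combined (equal-sized) upper and lower groups answers correctly."""
    if not 0 <= p <= 1:
        raise ValueError("p must be a proportion between 0 and 1")
    return 2 * min(p, 1 - p)

for p in (0.10, 0.30, 0.50, 0.70, 0.90):
    print(f"p = {p:.2f}  maximum index = {max_discrimination(p):.2f}")
```

For the item in Figure 13-1 (p = 0.74), the ceiling is 2 × 0.26 = 0.52, so its observed index of 0.48 is close to the maximum possible.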
Item discrimination indices of all types are subject to considerable sampling error (Pyrczak, 1973). The smaller the sample of answer sheets used in the analysis, the larger the sampling errors. An item that appears highly discriminating in one small sample may appear weak or even negative in discrimination in another small sample. The values obtained for achievement test items are also sensitive to the kind of instruction the students received relative to the item. Hence the use of refined statistics to measure item discrimination seldom seems warranted.
But even though one cannot determine the discrimination indices of individual items reliably without using large samples of student responses, item analysis based on small samples is still worthwhile as a means of overall test improvement. How much better a revised test composed of the most discriminating items can be expected to be will depend on how large the samples and how small the sampling errors are.
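To get a feel for the sampling error mentioned above, one common approximation (an assumption here; the text gives no formula) treats the upper and lower groups as independent binomial samples of size n, so the standard error of the index is the square root of (p_U q_U + p_L q_L) / n. A minimal Python sketch with a hypothetical function name:

```python
from math import sqrt

def se_discrimination(p_upper, p_lower, n):
    """Approximate standard error of the upper-lower index, treating the two
    groups as independent binomial samples of n examinees each."""
    return sqrt((p_upper * (1 - p_upper) + p_lower * (1 - p_lower)) / n)

# Proportions from Figure 13-1 (47 of 48 and 24 of 48 correct).
p_u, p_l = 47 / 48, 24 / 48
for n in (48, 12):
    print(n, round(se_discrimination(p_u, p_l, n), 3))
```

Under this approximation, quartering the group size doubles the standard error, which is one reason indices from small classes should be read cautiously.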
Biserial and Point-Biserial Indices
The biserial and point-biserial correlation coefficients are presented as discrimination indices in some item-analysis reports generated by a computer. Their computation is too complex and time-consuming to warrant our attention, but because they are popular indices of discrimination, it is worth comparing each with the upper-lower difference index discussed above.
The biserial correlation coefficient describes the relationship between two variables: score on a test item and score on the total test for each examinee. High positive correlations are obtained for items that high-scoring students on the test tend to get right (item score = +1) and low-scoring students on the test tend to get wrong (item score = 0). Such items are interpreted to be high in discrimination. Negatively discriminating items show the opposite relationship: Most students with high test scores have scores of zero on the test item, and many with low test scores have scores of +1 on the item. The point-biserial correlation coefficient differs from the biserial coefficient computationally and theoretically, but for item-analysis purposes the two can be interpreted in essentially the same manner.
When both are computed with data from the same test item, the biserial coefficient will yield a value that is always at least one-fourth larger than the point-biserial (Guilford, 1965, p. 321). Neither coefficient is as biased in favor of items of moderate difficulty as is the upper-lower index. Thus, it is possible to obtain relatively high point-biserial or biserial discrimination indices for very hard or very easy items. This point is worth remembering when selecting items on the basis of their discrimination indices to build a test or to determine which items may be in need of revision.
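The relationship between the two coefficients can be illustrated with a short sketch. The point-biserial is computed directly from 0/1 item scores and total scores; the biserial is obtained from it with the usual conversion r_bis = r_pb × √(pq) / y, where y is the height of the standard normal curve at the point dividing it into proportions p and q. That conversion assumes a normally distributed trait underlies the item score. Function names and data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, pstdev

def point_biserial(item, total):
    """Point-biserial correlation between 0/1 item scores and total scores."""
    p = mean(item)
    m_right = mean(t for i, t in zip(item, total) if i == 1)
    m_wrong = mean(t for i, t in zip(item, total) if i == 0)
    return (m_right - m_wrong) / pstdev(total) * sqrt(p * (1 - p))

def biserial_from_point_biserial(r_pb, p):
    """Convert a point-biserial to a biserial, assuming an underlying normal trait."""
    nd = NormalDist()
    y = nd.pdf(nd.inv_cdf(p))  # normal ordinate at the p/q split
    return r_pb * sqrt(p * (1 - p)) / y

item = [1, 1, 1, 0, 1, 0, 0, 0]
total = [9, 8, 7, 7, 6, 5, 4, 3]
r_pb = point_biserial(item, total)
r_bis = biserial_from_point_biserial(r_pb, mean(item))
print(round(r_pb, 3), round(r_bis, 3))

# The ratio sqrt(pq)/y is smallest at p = 0.5, where it is about 1.25.
nd = NormalDist()
for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(p, round(sqrt(p * (1 - p)) / nd.pdf(nd.inv_cdf(p)), 3))
```

The second loop shows why the biserial is always at least about one-fourth larger than the point-biserial, and increasingly so for very easy or very hard items.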
ITEM SELECTION
One of the two direct uses that can be made of indices of discrimination is in the selection of the best, that is, the most highly discriminating, items for inclusion in an improved version of the test. How high should the index of discrimination be? Experience with a wide variety of classroom tests suggests that the indices of item discrimination for most of them can be evaluated in these terms:

0.40 and up     Very good items
0.30 to 0.39    Reasonably good but possibly subject to improvement
0.20 to 0.29    Marginal items, usually needing and being subject to improvement
Below 0.19      Poor items, to be rejected or improved by revision
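A minimal sketch applying the four bands (0.40 and up; 0.30 to 0.39; 0.20 to 0.29; below 0.20) to indices reported in this chapter: 0.48 for the item in Figure 13-1, 0.35 and 0.21 for two of the items discussed under item revision, and 0.09 for the original time-zone item. It assumes indices are rounded to two decimals, so the gap between 0.19 and 0.20 does not arise; the function name is hypothetical.

```python
def rate_discrimination(d):
    """Classify an upper-lower discrimination index rounded to two decimals."""
    if d >= 0.40:
        return "very good"
    if d >= 0.30:
        return "reasonably good, possibly subject to improvement"
    if d >= 0.20:
        return "marginal, usually needing improvement"
    return "poor, to be rejected or revised"

for d in (0.48, 0.35, 0.21, 0.09):
    print(d, rate_discrimination(d))
```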
[...] effort should be made to secure [...] the higher [...]. [...]

$$ s^2 \approx \frac{\left(\sum D\right)^2}{6} $$
This formula indicates that the score variance [is about one-sixth of the] square of the sum of the discrimination [indices of the items]. [Since] the larger the score variance, [the higher the] reliability of the scores [is likely to be], the formula [underscores the] value of [high] discrimination indices.
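Of course, discrimination sh[ould not be the only basis] for selecting the items for a norm-referenced test. [Maximizing score vari]ance while maximizing reliability [must be balanced against the need to choose items] that correspond to the content areas [of the table of specifications]. [If] available items can be [tried out] on a previous administration, [the most discriminating] items can be selected until the numbers [required] are obtained.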
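A minimal sketch of the variance approximation given above, assuming D is the upper-lower (27 percent) index for each item; the function name is hypothetical. Halving every item's discrimination cuts the estimated variance to one-fourth, which is why the formula favors highly discriminating items.

```python
from math import fsum

def estimated_score_variance(discriminations):
    """Approximate total-score variance as (sum of upper-lower D indices)^2 / 6."""
    return fsum(discriminations) ** 2 / 6

# Hypothetical 40-item tests: every item at D = 0.30 versus every item at D = 0.15.
print(estimated_score_variance([0.30] * 40))  # 24.0
print(estimated_score_variance([0.15] * 40))  # 6.0
```

`math.fsum` is used instead of `sum` so that adding many decimal fractions does not accumulate rounding error.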
ITEM REVISION
The second use that can be made of indices of discrimination is in [guiding the revision of items]. [...] selected, there appeared to be some that [...]. After making revisions, the items were tried out [again with another group of] students and reanalyzed. Results of the [two tryouts] are indicated in the following paragraphs.
The first item deals with the distinction between the terms climate and weather.

37%   What, if any, is the distinction between climate and weather?
0.13
      a. There is no important distinction. (1-6)
      b. Climate is primarily a matter of temperature and rainfall, while weather includes many other natural phenomena. (33-53)
     *c. Climate pertains to longer periods of time than weather. (43-30)
      d. Weather pertains to natural phenomena on a local rather than a regional scale. (23-11)
This item is somewhat too difficult for the group tested (only 73 correct responses out of 200) and low in discrimination. Examination of the response counts indicates that response b was attractive to a considerable number of good students and that response d was more attractive to good students than to poor. Since the sense of the question seemed basically clear and since the intended correct response [...], it appeared that response b should be made less plausible by making it simpler and somewhat more specific. Since response d seemed much too plausible to the better students in the group being tested, it was "spoiled" by substituting a more obviously incorrect response. The revised item (revisions in upper-case letters) reads:
62%   What, if any, is the distinction between climate and weather?
0.58
      a. There is no important distinction. (2-22)
      b. CLIMATE IS PRIMARILY A MATTER OF RAINFALL, WHILE WEATHER IS PRIMARILY A MATTER OF TEMPERATURE. (3-25)
     *c. Climate pertains to longer periods of time than weather. (91-33)
      d. WEATHER IS DETERMINED BY CLOUDS, WHILE CLIMATE IS DETERMINED BY WINDS. (4-20)
Analysis data of the revised item reveal that the revisions were effective. The changed item is much easier and much more highly discriminating than the original. Only nine of the good students chose distracters. Equally important is the fact that these revisions did not appreciably increase the number of poor students choosing the correct response. It is interesting to note that on the second tryout the number of poor students who chose response a increased markedly, even though this response had not been altered.
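The reasoning used for the climate item, checking whether a distracter draws more good students than poor ones, is easy to automate. The sketch below uses the upper-lower counts from the two tryouts (groups of 100 each, which is what the counts add up to) and flags any distracter whose upper count is at least as large as its lower count. The flagging rule and the function name are illustrative assumptions, not a procedure prescribed by the text.

```python
def review_item(counts, keyed, group_size):
    """counts: {alternative: (upper_count, lower_count)}; keyed: correct alternative."""
    up_right, low_right = counts[keyed]
    difficulty = 100 * (up_right + low_right) / (2 * group_size)
    discrimination = (up_right - low_right) / group_size
    # Distracters that attract at least as many good students as poor ones.
    suspect = [alt for alt, (up, low) in counts.items()
               if alt != keyed and up >= low]
    return difficulty, discrimination, suspect

original = {"a": (1, 6), "b": (33, 53), "c": (43, 30), "d": (23, 11)}
revised = {"a": (2, 22), "b": (3, 25), "c": (91, 33), "d": (4, 20)}
for label, counts in (("original", original), ("revised", revised)):
    diff, disc, suspect = review_item(counts, "c", 100)
    print(label, diff, round(disc, 2), suspect)
```

The rule catches response d but not response b, which the item writer also judged too attractive to good students; the counts support the writer's judgment rather than replace it.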
The next item deals with the common misconception that meteors are "falling stars."
36%   Do stars ever fall to the earth?
0.35
      a. Yes. They may be seen often, particularly during certain months. (12-[...])
      b. Yes. There are craters caused by falling stars in certain regions of the earth. (30-[...])
      c. No. The earth moves too rapidly for its gravitational force to act on the stars. (6-11)
     *d. No. The falling of a single average star would destroy the earth. (53-18)
This item again is somewhat too difficult, though its discriminating power is fairly good. The item might be made somewhat easier by revising response b. This response can be legitimately criticized as "tricky" because there are meteor craters. Hence in the revision, this response alone was changed.
42%   Do stars ever fall to the earth?
0.56
      a. Yes. They may be seen often, particularly during certain months. (20-60)
      b. NO. PLANETS LIKE THE EARTH HAVE NO ATTRACTION FOR STARS. (1-11)
      c. No. The earth moves too rapidly for its gravitational force to act on the stars. (9-14)
     *d. No. The falling of a single average star would destroy the earth. (70-14)
Note that the difficulty of the item improved only slightly, but the change obviously spoiled the attractiveness of the second response. However, the change did not increase the proportion of poor students choosing the correct answer. Apparently, most of those who shifted went to response a, which had not been modified.
The next item attempted to deal with the relationship between the number of time zones spanning a geographic area and the size of that area.
23%   There are eleven time zones in the U.S.S.R. This fact indicates that
0.09
      a. much of the area of the U.S.S.R. is above the Arctic Circle. (12-26)
      b. the U.S.S.R. is wider (east-west) than it is long (north-south). (56-40)
     *c. the U.S.S.R. occupies a large geographic area. (27-18)
      d. some areas of the U.S.S.R. are above the equator and some are below the equator. (5-16)
This item is much too difficult and is very low in discrimination. The major problem appears to be with choice b. It was a very attractive choice overall, but more attractive to good students than to poor ones. A new second response was written that was expected to be less closely related to the idea expressed by the keyed response.
49%   There are eleven time zones in the U.S.S.R. This fact indicates that
0.56
      a. much of the area of the U.S.S.R. is above the Arctic Circle. (4-32)
      b. MOST OF THE AREA OF THE U.S.S.R. IS IN THE EASTERN HEMISPHERE. (11-25)
     *c. the U.S.S.R. occupies a large geographic area. (78-20)
      d. some areas of the U.S.S.R. are above the equator and some are below the equator. (7-23)
This revision improved both the difficulty level and the discrimination of the item markedly. Most good students were able to decide on the correct response, but it appears that poor students distributed themselves nearly evenly across all four responses, much as would be expected if the examinees were blindly guessing.
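Whether the lower group's choices on the revised time-zone item (32, 25, 20, and 23 students) are spread about as evenly as blind guessing would produce can be checked with a chi-square goodness-of-fit test against equal expected counts. This test is an addition for illustration, not part of the book's procedure; the sketch uses the closed-form chi-square tail probability for 3 degrees of freedom, so it applies only to four-choice items. Function names are hypothetical.

```python
from math import erfc, exp, pi, sqrt

def chi_square_uniform(observed):
    """Chi-square statistic for observed counts against equal expected counts."""
    expected = sum(observed) / len(observed)
    return sum((o - expected) ** 2 / expected for o in observed)

def p_value_df3(x):
    """Upper-tail probability of a chi-square variable with 3 degrees of freedom."""
    return erfc(sqrt(x / 2)) + sqrt(2 * x / pi) * exp(-x / 2)

lower_counts = [32, 25, 20, 23]   # lower group, revised item, choices a-d
stat = chi_square_uniform(lower_counts)
print(round(stat, 2), round(p_value_df3(stat), 2))
```

A large tail probability (here well above 0.05) means the observed spread is quite consistent with equal choice of all four responses.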
The next item deals with the causes of shortages in the ground water supply.
48%   Water shortages in many localities have been caused by which, if any, of these factors?
0.17
      a. Removal of natural plant cover allowing faster run-off into streams (17-13)
      b. Increased demands for water in homes, businesses, and industry (15-26)
      c. Neither a nor b (12-22)
     *d. Both a and b (56-39)
This item is of appropriate difficulty but is not highly discriminating. In this case it appears that the fault [...] the question [...] framed in such a way [...]; hence it was necessary to include each of these as a single, supposedly incorrect response and to make "both" the correct response. This approach is apparently [...]. In the revision, one of the two true responses was moved into the stem of the item and three bona fide distracters were provided, as follows:
53%   WHAT FACTOR, OTHER THAN INCREASED WATER USE, HAS BEEN RESPONSIBLE FOR WATER SHORTAGES IN MANY LOCALITIES?
0.62
      a. RESTRICTION OF STREAMFLOW BY HYDROELECTRIC DAMS (3-22)
      b. DISTURBANCE OF NORMAL RAINFALL BY ARTIFICIAL RAINMAKING (3-18)
      c. INTENSIVE FARM CULTIVATION, WHICH PERMITS MOST RAINFALL TO SOAK INTO THE GROUND (10-36)
     *d. REMOVAL OF NATURAL PLANT COVER ALLOWING FASTER RUN-OFF INTO STREAMS (84-22)
The item was made somewhat easier and much more discriminating. In this case, the revision process worked in a way that gladdened the heart of the item writer.
The final item to be illustrated deals with knowledge of the type of information found on a physical map of a region.
12%   A physical map of a state would show
0.21
      a. the state's railway network. (20-25)
      b. average rainfall by months for the state. (20-34)
      c. the location of the largest cities in the state. (38-40)
     *d. the state's highest elevation. (22-1)
The item writer decided that this item called for too fine a discrimination. All responses were attractive to good students because no single response seemed best. Some physical maps do show major transportation systems, and some maps show rainfall patterns, though not usually monthly averages. Finally, major cities are occasionally used as points of reference on physical maps. Each distracter was modified to reduce its attractiveness to good students while maintaining a certain level of plausibility for poor students.
43%   A PHYSICAL MAP OF A STATE WOULD SHOW THE STATE'S
0.40
      a. AVERAGE SUMMER RAINFALL. (2-12)
      b. POPULATION DENSITY. (20-25)
      c. MOST IMPORTANT CITIES. (15-40)
     *d. HIGHEST ELEVATION. (63-23)
The revised item turned out to be reasonably discriminating and easier, but it is still a bit more difficult than most item writers would prefer. Either students are not clear about the unique features of a physical map, or the second and third distracters still represent legitimately correct answers relative to keyed response d.
These five items do not illustrate all the possible ways in which item-analysis data may be interpreted to aid in item revision. What they do indicate is the general nature of the process and the fact that it can be highly successful.
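The before-and-after indices for three of these items can be recomputed from the keyed-response counts shown with each version. This sketch assumes upper and lower groups of 100 examinees each, which is what the response counts for most items add up to; the function name is hypothetical. The percentages printed with the items are these values rounded to whole numbers.

```python
def indices(upper_right, lower_right, group_size=100):
    """Difficulty (percent correct in both groups) and upper-lower discrimination."""
    difficulty = 100 * (upper_right + lower_right) / (2 * group_size)
    return difficulty, (upper_right - lower_right) / group_size

# Keyed-response counts (upper, lower) from the first and second tryouts.
items = {
    "climate and weather": ((43, 30), (91, 33)),
    "water shortages":     ((56, 39), (84, 22)),
    "physical map":        ((22, 1), (63, 23)),
}
for name, (before, after) in items.items():
    d0, r0 = indices(*before)
    d1, r1 = indices(*after)
    print(f"{name:20} {d0:5.1f}% {r0:.2f} -> {d1:5.1f}% {r1:.2f}")
```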
OTHER CRITERION-REFERENCED PROCEDURES
The procedures for item analysis described in this chapter are equally useful for judging the quality of items from norm-referenced and criterion-referenced measures. However, the standards used to differentiate good and poor items in the two types of measures vary, and, consequently, an item earmarked for revision in one type of measure may be selected without change for use in the other type.
In the preparation of items for criterion-referenced measures, such as mastery tests, minimum competency tests, and some professional certification tests, item writers need not make a conscious decision to write items that will be [easy]. As has been said, the rigid content specifications of [such tests govern the] decision on what the test items should measure. [If students are] to be well prepared, the item writer should [expect difficulty indices] of 70 to 100 percent. By these standards [few items a]re judged to be too easy, but items can be [...]. However, items that, say, 85 percent of a group answers correctly are not automatically good items for a criterion-referenced measure. For example, those that contain several implausible distracters or that give internal clues suggesting the correct response are still bad items. The analysis of easy criterion-referenced items for appropriateness in difficulty should include a review of the items for technical adequacy. The reviewer should be convinced that a high proportion of the students actually knew the content measured by items that show high difficulty indices.
The upper-lower difference index can be used to assess the quality of criterion-referenced items as well, but generally it is much less useful in this [context]. [Any] item, regardless of its intended purpose, is [defective] if it has a negative discrimination index. But many good items from criterion-referenced measures may have discrimination indices of zero or only slightly higher. The explanation for this phenomenon relates to the fact that score distributions from criterion-referenced measures tend to be quite negatively skewed and low in variability. The upper and lower criterion groups tend to be very similar in terms of their total test [scores]. In fact, the [...] scores for the two groups may be [...]. [...]
Alternative indices to the upper-lower index or point-biserial correlation have been proposed for use with items from criterion-referenced measures. For example, Cox and Vargas (1966) suggested a pre-post difference index to judge the ability of items to discriminate: the proportion of students answering an item correctly prior to instruction (pre) is subtracted from the proportion of the same group responding correctly after instruction (post). The higher the value of the index, the more highly discriminating the item is judged to be. [...] "sensitivity."
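As a rough illustration of the pre-post difference index, the sketch below assumes that each student's response to a single item is scored 1 (right) or 0 (wrong) on both the preinstruction and the postinstruction administration, and that the two lists describe the same students in the same order. The function name `pre_post_difference_index` is hypothetical; it is not taken from Cox and Vargas.

```python
from typing import Sequence


def pre_post_difference_index(pre: Sequence[int], post: Sequence[int]) -> float:
    """Return p(post) - p(pre) for one item.

    pre[i] and post[i] are the 0/1 scores of the same student i before and
    after instruction (an assumption of this sketch).
    """
    if len(pre) != len(post):
        raise ValueError("pre and post must describe the same students")
    if not pre:
        raise ValueError("at least one student is required")
    if any(x not in (0, 1) for x in list(pre) + list(post)):
        raise ValueError("scores must be 0 (wrong) or 1 (right)")
    p_pre = sum(pre) / len(pre)     # proportion right before instruction
    p_post = sum(post) / len(post)  # proportion right after instruction
    return p_post - p_pre


# Hypothetical data for 10 students: 3 right before instruction, 8 right after.
pre_scores = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
post_scores = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
print(round(pre_post_difference_index(pre_scores, post_scores), 2))  # 0.5
```

An index near zero would suggest that instruction made little difference in how the group answered the item.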
Another index used primarily for items from criterion-referenced measures is the phi correlation coefficient. An item response (scored 0 or 1) is correlated with the mastery classification [...] using a four-fold frequency table like this one:

                 Master    Nonmaster
    Right          A           B
    Wrong          C           D

If masters tend to answer correctly (A is large) and nonmasters tend to answer incorrectly (D is large), the item discriminates between the two levels of achievement. When the values B and C are large, the item shows negative discrimination. The phi coefficient has [...] over the upper-lower difference index because it requires [...]. But unless the number of nonmasters is sufficiently large, the phi coefficient will provide a misleading indication of the discriminability of the item.
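The phi coefficient for the four-fold table can be computed with the standard formula phi = (AD - BC) / sqrt((A + B)(C + D)(A + C)(B + D)). The sketch below is a minimal illustration under the cell layout shown above (masters and nonmasters in columns, right and wrong in rows); the function name is hypothetical.

```python
import math


def phi_coefficient(a: int, b: int, c: int, d: int) -> float:
    """Phi for the four-fold table

                Master   Nonmaster
        Right     a          b
        Wrong     c          d
    """
    if min(a, b, c, d) < 0:
        raise ValueError("cell counts cannot be negative")
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denominator == 0:
        # A zero row or column total (for example, no nonmasters at all)
        # leaves phi undefined.
        raise ValueError("phi is undefined when any row or column total is zero")
    return (a * d - b * c) / denominator


print(phi_coefficient(20, 5, 5, 20))  # 0.6: masters mostly right, nonmasters mostly wrong
print(phi_coefficient(5, 20, 20, 5))  # -0.6: negative discrimination (B and C large)
```

The guard against a zero denominator reflects the caution in the text: with no nonmasters (B + D = 0) phi cannot be computed at all, and with only a handful it is easily distorted.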
The index of discrimination can be used to select the best items for inclusion in a criterion-referenced measure, also. To do so, items first must be grouped according to the content categories outlined in the table of specifications or according to the objectives being measured. Then the number of items required from each category can be selected on the basis of their discrimination indices. This procedure will ensure that the content balance required to make valid score interpretations will be achieved.
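The grouping-then-selecting procedure just described can be sketched as follows. The item tuples, category labels, and quota dictionary are hypothetical stand-ins for a real table of specifications; the sketch simply keeps, within each category, the required number of items with the highest discrimination indices.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Item = Tuple[str, str, float]  # (item_id, content_category, discrimination_index)


def select_items(items: List[Item], quota: Dict[str, int]) -> Dict[str, List[str]]:
    """Pick the quota[category] most discriminating items from each category."""
    by_category: Dict[str, List[Tuple[float, str]]] = defaultdict(list)
    for item_id, category, discrimination in items:
        by_category[category].append((discrimination, item_id))

    selected: Dict[str, List[str]] = {}
    for category, needed in quota.items():
        # Highest index first; sorted() keeps ties in their original order.
        pool = sorted(by_category.get(category, []), key=lambda pair: pair[0], reverse=True)
        if len(pool) < needed:
            raise ValueError(f"category {category!r} has only {len(pool)} items")
        selected[category] = [item_id for _, item_id in pool[:needed]]
    return selected


items = [
    ("q1", "Objective 1", 0.10),
    ("q2", "Objective 1", 0.35),
    ("q3", "Objective 1", 0.20),
    ("q4", "Objective 2", 0.05),
    ("q5", "Objective 2", 0.40),
]
quota = {"Objective 1": 2, "Objective 2": 1}
print(select_items(items, quota))  # {'Objective 1': ['q2', 'q3'], 'Objective 2': ['q5']}
```

Because the quota is fixed per category before any index is consulted, the content balance comes from the table of specifications, and the discrimination indices only decide which items fill each slot.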
The decision consistency procedures described in Chapter 5 provide alternative methods for assessing the quality of scores from a criterion-referenced test, especially when the traditional reliability analysis seems less appropriate.
POSTTEST DISCUSSIONS
Once the classroom test has been scored, the results can be used to promote additional learning or to contribute to the kind of overlearning that resists forgetting. Testing postmortems can be profitable to students as well as to teachers if
they are planned well and conducted in a businesslike manner. The feedback from students to teacher about the [...] can lead to item improvements as well. The main preparation by the [...] of item analysis data [...] about why certain items were too [...] analysis and reflection by students [...] papers, or if an answer key is displayed on a transparency, class time will not be needed to answer key-related questions.
The class discussion should focus on the items that were most difficult for the class; questions about other items can be handled on an individual basis, if necessary, after class or during some other out-of-class time. Students who missed the item under discussion should be encouraged to explain how they answered and to indicate ambiguities they may have detected. Disagreements between a student and the teacher that seem not to contribute substantively to class discussion should be suspended until a later time.
It is unlikely that postmortems should lead to the revision of the scoring key or to the deletion of any items from scoring. Obviously, clerical errors in scoring or in preparing the scoring key should be rectified, but controversial item keys should not be changed. There is room to take these and other types of measurement error into account in using the scores, that is, in grading and setting cutoff scores. Such methods of accounting for error should be explained to students so that they are aware that "errors" will be addressed in an equitable way.
SUMMARY PROPOSITIONS

1. Item analysis is a useful tool in the progressive improvement of achievement tests.
2. The relevance of a set of items is established by their relationship with instructional content, the appropriateness of their taxonomic level, and the potential for influence by extraneous factors.
3. The degree of match between a table of specifications and the test-item content is an indication of the degree of balance achieved in the test.
4. The most efficient test includes as many independently scorable responses per unit of testing time as possible without sacrificing relevance.
5. Persons who lack special competence in the subject covered by the test will obtain scores near the chance level if the test is appropriate in specificity.
6. A norm-referenced test is appropriate in difficulty if its mean is midway between the perfect score and the expected chance score.
7. All tests should discriminate achievers and nonachievers of the content they attempt to measure, no matter what the testing purpose may be.
8. The more variable the scores from a test, the more likely the test has succeeded in differentiating between examinees who possess different amounts of the abilities measured by the test.
9. The most significant statistical measure of the quality of an achievement test is the reliability of its scores.
10. Item analysis begins with the counting of the responses made by high- and low-achieving students to each of the items in the test.
11. While logical objections can be made to the use of the total score on a test as a criterion for analyzing the items in the test, the practical effect of these shortcomings is small and the practical convenience of disregarding them is great.
12. It is convenient and statistically defensible to consider as "good" students those whose scores place them in the upper 27 percent of the total group and to consider as "poor" students those whose scores place them in the lower 27 percent.
13. The proportion of correct responses to an item by the combined upper and lower 27 percent groups provides a satisfactory estimate of the difficulty of the item.
14. For most classroom tests, it is desirable that all the items be of middle difficulty, with none of them extremely easy or extremely difficult.
15. In general, the wider the distribution of item difficulty values in a classroom test, the more restricted the range of scores will be and the lower the reliability of those scores will be.
16. A convenient and highly satisfactory index of discrimination is simply the difference in the proportions of correct responses between the upper and lower 27 percent groups.
17. Good norm-referenced achievement test items should probably have indices of discrimination of 0.30 or higher.
18. The higher the average discrimination index for items in a test, the more variable the scores are likely to be and the more reliable the scores are likely to be.
19. The item-analysis procedures used with norm-referenced measures are appropriate for items from criterion-referenced measures also, but the standards for differentiating between good and poor items are likely to vary in the two situations.
20. The value of posttest discussions in class is highly dependent on advance preparation by the teacher, focused discussion, and active student participation.

QUESTIONS FOR STUDY AND DISCUSSION
1. What minimum qualifications should be met by those who are asked to judge the relevance of a given test?
2. In what sense might a very poor test have excellent balance?
3. What factors might cause content-parallel multiple-choice and true-false tests to be equally [...]?
4. Why might it be possible for a test to be judged highly relevant but low in specificity?
5. How could we decide if a 20-item, 4-choice multiple-choice test was about as difficult for the same group as a 40-item true-false test when both are used for norm-referenced purposes?
6. How could the relative size of the standard deviation be estimated for a specific norm-referenced test?
7. What factors influence the size of the difficulty index for a test item?
8. If the same test is given to three sections of the same class, why might it be preferable to conduct item analysis with the combined groups rather than doing three separate analyses? Why might separate analyses be useful?
9. What is meant by the statement, "The upper-lower index is biased in favor of items of [...]"?
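The upper-lower 27 percent procedure summarized in propositions 12, 13, and 16 can be sketched in a few lines. This is a minimal illustration, assuming one total score per student and a 0/1 score for the item being analyzed; the function name and the sample data are hypothetical, and ties at the group boundaries are simply broken by input order.

```python
from typing import Sequence, Tuple


def upper_lower_analysis(
    total_scores: Sequence[float],
    item_scores: Sequence[int],
    fraction: float = 0.27,
) -> Tuple[float, float]:
    """Return (difficulty, discrimination) for one item.

    total_scores[i] is student i's total test score; item_scores[i] is 1 if
    student i answered the item correctly, else 0.
    """
    n = len(total_scores)
    if n != len(item_scores):
        raise ValueError("one item score is needed per student")
    k = max(1, round(fraction * n))  # size of each criterion group
    if 2 * k > n:
        raise ValueError("too few students to form separate upper and lower groups")
    # Rank students from highest to lowest total score; ties keep input order.
    ranked = sorted(range(n), key=lambda i: total_scores[i], reverse=True)
    upper, lower = ranked[:k], ranked[-k:]
    p_upper = sum(item_scores[i] for i in upper) / k
    p_lower = sum(item_scores[i] for i in lower) / k
    difficulty = (p_upper + p_lower) / 2   # proportion right in the combined groups
    discrimination = p_upper - p_lower     # upper-lower difference index
    return difficulty, discrimination


totals = [48, 45, 44, 40, 38, 35, 33, 30, 27, 22]
item = [1, 1, 1, 0, 1, 0, 1, 0, 0, 0]
print(upper_lower_analysis(totals, item))  # (0.5, 1.0)
```

Because both groups contain the same number of students, averaging the two proportions gives the same value as the proportion right in the combined groups (proposition 13).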
Nontest and Informal
Evaluation Methods
Imagine this scene from a sixth-grade science classroom:

Ms. Franke is using the overhead projector to explain how the steps of the scientific method can be thought of as the main outline for preparing a laboratory report. [...] Glancing around the room, she noticed a puzzled look.

"Dino, can you differentiate the findings of an experiment from the conclusions?"

"I think so," he replied. "The findings are numbers but the conclusions are words."

"That's often true," Ms. Franke allowed, "but how do their purposes differ?"

"Well, the findings tell what happened, what the result was. [...] A conclusion is supposed to be a summary or general statement. Conclusions tell what we think will happen if we do the same thing again."

"That's a convenient way to describe the difference," the teacher noted, and the presentation continued.
This snapshot from Ms. Franke's science class demonstrates that teachers continually gather data and make judgments and decisions during instruction. It also illustrates the variety of techniques teachers use to make informal evaluations of class and student progress:

1. Observation of the class was used to detect such nonverbal indicators as lack of attention, positive nods of the head, or (in this case) expressions of [...] understanding.
2. Questioning was used to determine the nature and extent of misunderstanding.
3. A [...] was used by Ms. Franke to take a mental excursion down the ordered steps of the scientific method to determine which steps were [...].
4. A [...] quiz was created, again in the mind of the teacher, to help decide if the student's response was sufficient for questioning to cease.
Teachers spend considerable amounts of their professional time with assessment-related activities, as much as 30 percent by some estimates (Stiggins, 1988). Though this time includes the development, administration, and use of their own tests and the preparation for and giving of standardized tests, much of the time is no doubt devoted to less formal methods geared primarily to formative evaluation: observation, quizzes and inventories, checklists, rating scales, oral questioning, and the like. In fact, teachers at both the elementary and secondary levels regard the information obtained by their own observations [...] as important to a variety of instructional decisions they make (Dorr-Bremme and Herman, 1986).
In view of the frequency of their use and because of the importance teachers attach to the results, the quality of nontest and informal assessments is a significant matter. The accuracy of the results obtained and their validity for instructional decision making are just as important as for the more formal measures we have discussed in previous chapters. Some of the attributes characteristic of informal methods (lack of planning, lack of comparability of results across students, and failure to record outcomes) can contribute to information that is deficient in accuracy and relevance. But these shortcomings are not so inherent in the methods as they are in the historical use of the methods by teachers. That is, planning often can be done, characteristics to be judged can be defined to enhance comparability, and methods of recording accurately and conveniently can be devised and implemented.
Finally, despite the value of well-developed objective and essay achievement tests, there are many areas of the curriculum in which these test methods are inappropriate, or less appropriate than certain nontest methods might be. For example, instructional objectives that require speaking, writing, and listening, whether in English or in second-language learning, most often require communicative production. In addition, skills in such areas as physical education, home economics, industrial technology, science laboratory, and performing arts often require demonstrations of either processes or products. Many of the nontest methods and informal assessments are particularly useful for monitoring achievement in areas that cannot be measured directly by more formal measures.

The purpose of this chapter is to describe and illustrate procedures that can be used to supplement test information or to provide information when tests seem ill suited to the task. The main goal is to create a greater awareness of the need to think in terms of reliability and validity when creating such procedures or using the results from them.
OBSERVATIONAL TECHNIQUES
Observation is a fundamental means of obtaining information that, strictly speaking, cannot be acquired in any other way. Observation schedules and checklists are useful devices for directing our attention to certain behaviors we intend to observe; observational schedules, records, checklists, and rating scales all are devices for recording observations so that the events observed can be preserved as a relatively permanent account of their occurrence. All these devices can serve to ensure that the proper behavior is noted and that it is recorded in an accurate, reproducible fashion. That is, proper development of the aids to observation will contribute to highly reliable and valid outcomes. Of course, the very best checklists or tables cannot overcome severe deficiencies in the observation act itself. Observers who see things that are not there, miss things that are there, or miscategorize the behaviors they see should be considered just as hazardous as a multiple-choice test composed of items containing too many ambiguous or implausible distracters. Both forms of assessment are likely to provide highly misleading information.
Spontaneous Observation
While Mr. Voss was giving some individual assistance to [...], he noticed that Jana used a dictionary to look up the spelling of several words as she was finishing an impromptu theme. She always started at the beginning of the book and turned 5 to 10 pages at a time before locating the proper letter section. Then she turned [...] a page at a time, noting the [...] on each page, until she found the proper page.
This spontaneous observation can be the beginning step toward making Jana a more efficient user of the dictionary, but it must be remembered or recorded so that individualized help can be provided later at a more convenient time. Obviously, if the teacher can act on the observed information immediately, the need to record it is diminished.
Of course, an overreliance on spontaneous observation can result in many information voids. That is, planned and systematic observation will help ensure that significant activities are observed, that the most important aspects of those activities are noted, and that all pertinent individuals will be observed. Spontaneous observation often results in "tunnel vision": we see those students who are most demanding of our attention, and we may never get to see the reactions or performances of others in situations that are important but rarely occur. [...] spontaneous observations can be unexpected bonuses, and the information they provide may influence immediate judgments or subsequent decisions.
Just as first impressions can sometimes unknowingly [...] interpersonal relationships, [...] the outcomes [...]. [...] the failure to analyze conditions for possible causes [...] implications [...] conclusions. For example, Jana may know about dictionary guide words and how to use them but, for some reason that is not apparent, she does not use them. Here are some guidelines [...] to create and maintain [...] that observed behavior usually can be explained by multiple factors:
1. A behavior should not be considered typical unless verified in [...] or [...] rejection of other reasonable competing explanations.
2. A significant action should be observed again for verification. But if conditions are re-created intentionally so that the behavior will recur, the loss of the naturalistic [...], or it may promote a more socially desirable [...].
3. If the observation is [...] and subjective distinctions are [...] posing alternate explanations [...], it thus can be shared with several interpreters to obtain independent judgments regarding causes. [...] for insignificant information.
4. Since spontaneous observations are unplanned, by definition, a recording form [...] to accommodate such situations [...]. [...] preconceived notions the observer may have [...]. [...] most likely to notice those aspects of an event that best fit their existing knowledge and [...]. In other words, the expectations formed from our prior experience are more likely to be fulfilled than are events that [...].
Planned Observation
[...] proclaimed, "People watchers can easily overwhelm a listener with the peculiarities, unpredictable similarities, and newly discovered diversity observed in a park, on a street corner, or in a busy shopping mall." When we set out to watch particular events, actions, or objects, we seem to be more motivated to accomplish our purpose and more satisfied to have done so than if we just happened to have seen something unusual. So it is in the classroom. Though unexpected events can be exciting and interesting, planned observations can provide revealing, unique information about learners that can be used to manipulate the conditions for learning in a positive way. Such intentional observations are an efficient way to gather information about learning styles, methods of problem solving, motor skill development, frustration level, and cognitive ability.
[...] the process of planned observation [...]. The quality of [...] information [...]:

1. Observer subjectivity. "Looking" [...] "seeing" are common expressions that [...] experience unknowingly [...].
2. Observer interference. [...] believe that teacher absence from a class [...] become too cooperative. Usually, a small number of observations [...] overcome the negative effects of observer [...].

[...] characteristic to be observed is described in [...] will be required of the observer. [...] for example, is to determine [...] in a [...]-minute period, we are in trouble [...] he is off-task? And how will we know [...] occurrence of off-task behavior [...] the task behavior is [...].
Observation Schedules
Figure 14-1 Sample Observation Schedule (Seatwork Observational Record). Behavior categories listed: working on assignment; [...] with other student; doing assignment for other class; reading library book or magazine; talking with another student (nonwork); [...]. Behaviors are listed in a column so that they can be scanned quickly, and the descriptions are not too densely packed. There is room at the bottom or on the back side for supplementary notes or interpretations of the recorded data.
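Once an observation schedule like Figure 14-1 has been filled in, the tallies are usually summarized as counts and proportions per category. The sketch below assumes a recording rule not stated in the text: the observer writes down one behavior code per sweep of the room at fixed intervals. The code names in `CODES` are hypothetical labels loosely based on the figure's categories.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical codes loosely based on the categories in Figure 14-1.
CODES = ("assignment", "other_class", "reading", "social_talk", "idle")


def tally(observations: List[str]) -> Dict[str, Tuple[int, float]]:
    """Count each behavior code and its share of all recorded observations."""
    if not observations:
        raise ValueError("no observations were recorded")
    unknown = set(observations) - set(CODES)
    if unknown:
        raise ValueError(f"unrecognized codes: {sorted(unknown)}")
    counts = Counter(observations)
    total = len(observations)
    return {code: (counts[code], counts[code] / total) for code in CODES}


# Ten sweeps of the room for one student (assumed one code per sweep).
record = ["assignment"] * 6 + ["social_talk"] * 3 + ["idle"]
summary = tally(record)
print(summary["assignment"])  # (6, 0.6)
```

Adding the `social_talk` and `idle` shares gives the kind of "social talk or idle time" figure a teacher might share with a class.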
[...] in social talk or idle time. The teacher shared this information with the class and asked for their advice about whether the free-time period should be continued. What would your approach and decisions be?
Checklists
A checklist is a set of phrases or statements that describes either the essential steps in a procedure or the most important elements of a product. Ordinarily, the evaluator using a checklist will simply check the presence or absence of each step or element, but some checklists permit a rating of the quality of the act observed or the characteristic [...]. [...] certifying the readiness of their [...] checklist to ensure that the [...].
Yelon (1984) has shown how checklists can support many aspects of instruction [...]. [...] appropriate practice can be conducted. But most importantly, the checklist provides [...]. [...] should be able to evaluate their own performance more thoroughly [...] as the instrument [...].
One goal of a physical fitness unit in a sixth-grade health class is to have students develop and implement an exercise program. The evaluation [...] as a form [...] in assessing the completeness [...], the checklist in Figure 14-2 was developed. The checklist would be given to each student at the [...] help describe the essential ingredients [...] the teacher in evaluating the quality of [...] that were followed in the development [...].
Figure 14-2 Sample Health Report Checklist

Directions: On the line in front of each step, place a plus (+) if the step was completed satisfactorily. Place a minus (-) if the step was [...].

___ 1. A doctor's advice about exercising was obtained.
___ 2. Baseline fitness tests were taken:
        a. Pull-ups or arm hangs
        b. [...]
        c. Recovery index test
___ 3. Improvement goals consistent with the test results were established.
___ 4. Exercises appropriate for the goals were selected.
___ 5. A weekly time schedule was established.
___ 6. A two-week journal of program use was included.
___ 7. Personal reactions to program effectiveness, enjoyment, and needs for change were [...].

General quality of written expression: [...]
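A teacher working through a stack of completed checklists like Figure 14-2 may want a quick summary of which steps each student completed satisfactorily, unsatisfactorily, or not at all. The sketch below assumes three marks: "+", "-", and a blank line. The directions for blanks are partly illegible in the source, so treating a blank as a separate category is an assumption, and the short step labels in `STEPS` are hypothetical paraphrases of the figure.

```python
from typing import Dict, List

# Short labels paraphrasing the numbered steps of Figure 14-2 (hypothetical wording).
STEPS = [
    "doctor's advice obtained",
    "baseline fitness tests taken",
    "improvement goals set",
    "exercises selected",
    "weekly schedule set",
    "two-week journal included",
    "personal reactions described",
]


def summarize_checklist(marks: Dict[str, str]) -> Dict[str, List[str]]:
    """Group steps by mark: '+' satisfactory, '-' unsatisfactory, '' left blank."""
    unknown = set(marks) - set(STEPS)
    if unknown:
        raise ValueError(f"marks given for unknown steps: {sorted(unknown)}")
    summary: Dict[str, List[str]] = {"+": [], "-": [], "blank": []}
    for step in STEPS:
        mark = marks.get(step, "")  # a step with no entry counts as blank
        if mark == "+":
            summary["+"].append(step)
        elif mark == "-":
            summary["-"].append(step)
        elif mark == "":
            summary["blank"].append(step)
        else:
            raise ValueError(f"unexpected mark {mark!r} for step {step!r}")
    return summary


student = {"doctor's advice obtained": "+", "baseline fitness tests taken": "-"}
result = summarize_checklist(student)
print(result["-"])  # ['baseline fitness tests taken']
```

Listing the unsatisfactory and blank steps by name, rather than reporting only a count, keeps the diagnostic value that makes a checklist useful as feedback.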
1. Obtain or envision some examples of good and poor versions of the product to be evaluated.
2. Decide whether the judgment should be a simple yes-no, present-absent one or whether the quality of each attribute must be assessed. For example, in Figure 14-2 a judgment about quality, satisfactory (+) or unsatisfactory (-), is required in addition to present-absent. If finer distinctions in quality are needed, a rating scale would be more appropriate than a checklist.
3. Identify the product attributes that must be present and describe how good and poor examples should be expected to differ on each attribute. For example, in Figure 14-2, all students may list goals for improvement, but the consistency between the goals and the test results is likely to [...]. [...] attributes and subdivide those [...] provide diagnostic feedback. In Figure 14-2 [...] individually to help assess completeness [...].
4. When possible, try out the checklist draft on a few sample products to [...].
5. [...] the checklist for comprehensiveness and for the relevance of each item. If the checklist is distributed to students as a grading guide, it should not be [...] give students an opportunity to take the modifications into account in their product development.
6. When appropriate, revise the checklist [...] by observing and [...] as possible to uncover ambiguities [...].
The foundation upon which a performance checklist is built is a task analysis of the performance to be observed. The procedures for developing a performance checklist are similar to those listed above, but there are some important [...]:

1. Observe an expert performing the task and record the essential steps. Be sure the materials and conditions [...] are provided to the performer as part of the "givens."
2. Use a draft of the checklist to observe another expert so that differing [...] critical steps of the process can be detected [...] the steps listed are not due to idiosyncrasies inherent in the performance of [...] expert. Of course, if there are several efficient and effective ways to complete the task in question, a performance checklist may be an inappropriate assessment tool; the products are likely to be more worthy of evaluation.
3. When appropriate, attach the criterion of acceptable performance to the description of the step. For example, a checklist for rating someone's blood pressure-taking might have the statement "Wrap cuff snugly about upper arm; [...] can be inserted [...]." [...] omit the italicized portion of the statement and [...] what "snug enough" means.
4. Use the drafted checklist to observe a novice performer. This step will [...] the degree of difficulty experienced by [...] point out [...] and also point out the need to insert statements about what should not be done or how something should be done. [...] additional [...] prerequisites need to be incorporated in the [...].
Good checklists are time-consuming to develop, but they can pay instructional and assessment dividends in [...] situations. Yelon (1984) has [...] that a checklist is particularly [...]:

1. [...] is particularly important because it is a prerequisite to learning [...].
2. [...] is complex, because [...] the number of [...].
3. When the fine-tuning of a skill requires fairly detailed feedback to the learner.
4. When students depend on self-[...] rather than the judgments of an instructor to check their [...].
Rating Scales
Another tool of observation [...] how frequently a certain behavior occurs [...].

1. The trait must be described [...] points related to the trait definition.

1A Student is prompt.
   1. [...]
   2. Usually
   3. Seldom

1B Student turns homework papers in on time each week.
   1. [...]
   2. 4 out of 5 days
   3. 3 out of 5 days
   4. 2 or fewer days a week
[...] the time period to what is typical in a week. Note that the stem from item 1B and the choices from item 1A together form a fairly ambiguous scale item, also. Item 2A might appear on a scale for rating the quality of an end table made in a woodworking class:

2A Quality of the tabletop surface
   [...] joint [...]

The first set of scale points is ambiguous and imprecise regarding the attributes of the tabletop that should be assessed. The criteria of perfection are stated in observable terms with the second set of descriptors so that more objective, reliable measurements are likely to be produced with it.
2. Scale descriptors should represent [...] quantity or quality, not both. Item 3A, from a speech evaluation scale, is formed with a set of fairly observable scale points, but some points relate to frequency of behavior (2), some relate to eye contact directly (1 and 3), and some relate to use of notes (2, 4, and 5).
3A Maintains eye contact with audience.
   1. Spans the entire audience
   2. Occasionally refers to notes
   3. Tends to look at only 1 or 2 people
   4. Depends heavily on notes
   5. Tends to read
An improved set of responses that focus on eye-contact frequency is illustrated by item 3B.

3B Frequency of eye contact with audience
   1. At least once every sentence
   2. Once every 2 sentences
   3. Once in every 3 to 4 sentences
   4. Less than once in every 3 to 4 sentences
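One way to see why frequency-based descriptors such as those in item 3B are easier to apply consistently is to write them as a rule that turns observed counts into a scale point. The sketch below is a hypothetical illustration: the text gives the descriptors but not how to score rates that fall between them (for example, one contact every 1.5 sentences), so the cutoffs used here are assumptions.

```python
def eye_contact_rating(sentences: int, eye_contacts: int) -> int:
    """Map observed counts to the scale points of item 3B (1 = best).

    The cutoffs for in-between rates are assumptions of this sketch.
    """
    if sentences <= 0 or eye_contacts < 0:
        raise ValueError("need a positive sentence count and a non-negative contact count")
    if eye_contacts == 0:
        return 4  # no contact at all: less than once in every 3 to 4 sentences
    sentences_per_contact = sentences / eye_contacts
    if sentences_per_contact <= 1:
        return 1  # at least once every sentence
    if sentences_per_contact <= 2:
        return 2  # about once every 2 sentences
    if sentences_per_contact <= 4:
        return 3  # once in every 3 to 4 sentences
    return 4      # less than once in every 3 to 4 sentences


print([eye_contact_rating(20, c) for c in (25, 10, 6, 4, 0)])  # [1, 2, 3, 4, 4]
```

Two observers who count the same sentences and eye contacts will reach the same rating under this rule, which is the reliability advantage the improved descriptors are meant to provide.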
3. When normative ratings are sought, specify the reference group being used. Items like the following appear on rating forms for reporting school progress or for recommending individuals for admission or employment:

4. How well does the student follow directions?
5. How would you assess the applicant's chance of completing a graduate program in education?
6. How would you describe the candidate's writing skills?
   1. Well above average
   2. Above average
   3. Average
   4. Below average
   5. Well below average
[...] the reference group [...], for example, [...] particular [...] programs, others [...]. [...] the rater and the user of the ratings are [...] of the rating. In most cases it is more [...] by the raters than to allow the raters [...] they decide to use.
4. [...] Scale points need not be defined. Bipolar adjectives can be used to define only the end points. Then the number of intermediary points can be varied as deemed suitable by the scale maker.
5. Allow raters to indicate that they have had no opportunity to observe a behavior, so that a failure to rate under such circumstances can be recognized as such.
6. Give raters clear directions about the reference group and about how each trait should be rated. The following items, for example, are from a screening form that uses a 3-point rating scale:
7. Energy
9. Participates in running games
10. Throws a ball
The directions indicate that present classmates should be considered as the reference group. As exemplified here, clarity should never be sacrificed for efficiency.
Rating scales are prone to certain types of errors that can be minimized by making raters aware of them through rater training. In addition, the procedure for rating can be designed to reduce some errors without compromising the validity of the results. For example, if all 26 students in a class are to be rated on each of several characteristics, rating all students on one characteristic at a time will be preferable to student-by-student rating. That is, all students should be rated on attentiveness, then on cooperation, then on using time wisely, and so on, instead of rating Sarah on all five traits, then Michael, and then Chad. This procedure will help lessen halo-effect errors, the tendency to give more positive ratings on all traits to students who possess one or more positive traits. The principle is the same one recommended for scoring essay responses: all papers should be scored item by item rather than student by student.
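The item-by-item procedure amounts to a change in loop order. A minimal sketch, using the student names and trait labels from the example above; the function name and the `item_by_item` switch are hypothetical conveniences for comparing the two orders.

```python
def rating_order(students, traits, item_by_item=True):
    """Return (trait, student) pairs in the order a rater works through them.

    item_by_item=True rates every student on one trait before moving to the
    next trait, the order recommended above for lessening halo-effect errors;
    item_by_item=False gives the student-by-student order for comparison.
    """
    if item_by_item:
        return [(trait, student) for trait in traits for student in students]
    return [(trait, student) for student in students for trait in traits]


students = ["Sarah", "Michael", "Chad"]
traits = ["attentiveness", "cooperation", "using time wisely"]
for trait, student in rating_order(students, traits)[:3]:
    print(trait, "->", student)
```

Only the nesting of the two loops differs; the set of ratings to be made is identical, so the recommended order costs the rater nothing extra.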
Several other kinds of rating errors are related to the characteristics of individual raters rather than to the scales. For example, when rating a group or an individual, some raters use mostly the negative end of the scale because of their feelings about the ratees or for some other unknown reason. Others use mostly the positive end of the scale (generosity errors) because of an unwillingness to give low ratings, an inability to discriminate levels of quality, or an adherence to low standards. Finally, central tendency errors occur when raters avoid either extreme of the scale and use mainly moderate ratings.

Rater biases are more difficult to detect when they stem from stereotypes ("redheads are volatile persons"; "hangs out with students who smoke") or from family history factors ("his brother was a very poor student"). For all of these errors, the most effective means of prevention is training sessions that create an awareness of the potential for error and of the relative uselessness of scores that contain such errors.
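A teacher reviewing completed ratings can screen for the three tendencies just described by looking at where one rater's marks cluster. The sketch below is illustrative only: it assumes a numeric scale on which higher numbers are more favorable (the reverse of the 1-to-5 coding used in item 6 above), and its thresholds and flag labels are assumptions, not established criteria. A flag is a prompt to look more closely, not proof of error.

```python
from statistics import mean, pstdev


def screen_rater(ratings, low=1, high=5):
    """Flag possible rating tendencies for one rater (thresholds are assumptions).

    Assumes higher scale values are more favorable.
    """
    if not ratings:
        raise ValueError("no ratings to screen")
    m = mean(ratings)
    sd = pstdev(ratings)
    midpoint = (low + high) / 2
    span = high - low
    flags = []
    if m >= midpoint + span / 4:
        flags.append("generosity")  # marks pile up at the positive end
    if m <= midpoint - span / 4:
        flags.append("negative end")  # marks pile up at the negative end
    if sd < span / 8 and abs(m - midpoint) < span / 8:
        flags.append("central tendency")  # little spread, all near the middle
    return flags


print(screen_rater([5, 5, 4, 5, 4]))  # ['generosity']
print(screen_rater([3, 3, 3, 3, 3]))  # ['central tendency']
```

Such a screen cannot detect halo effects or stereotype-based bias, which leave no distinctive pattern in one rater's distribution; those still depend on the training described above.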
INFORMAL INVENTORIES
There are innumerable instructional situations in which additional information about the learner might help the teacher increase student motivation, activate prior knowledge, choose the most effective approach, or just develop better teacher-student rapport. Some of this information is gathered incidentally and unsystematically through the constant spontaneous observation that teachers do, as in sizing up a new group. Particularly when teachers need information about entering behavior, such as prior achievement, interests, attitudes, or preferences for learning activities, various informal and systematic devices can be used economically to build a store of descriptive information. This store of information can be tapped later as needed in a variety of teaching-learning situations.
Questionnaires
Classroom teachers who choose not to rely on last year's teacher for an assessment of the personality of their new class can take stock of the new group with an informal inventory or a questionnaire tailor-made to achieve the teacher's purpose. The inventory shown in Figure 14-3 provides sample items of the type that would allow a teacher to size up a new class efficiently. The use of open-ended items gives flexibility to respondents but requires the same care in development as the completion test items discussed in Chapter 10. When the teacher wants students to choose among specific options, the second part of the inventory can be used.

The results of an informal inventory may raise as many questions as the survey answers. Responses to items 8 to 15 are summarized as shown in Figure 14-4. Such results may well deserve further probing and additional analysis of specific student responses, prompting questions such as these:
1. Would the use of dyads be more effective than small-group or committee work?
2. Is this group oriented more to oral-aural stimulus material than to visual?
3. Are most students shy about oral reading, or do they lack confidence in their reading ability?
4. Have the writing experiences of these students been limited or unsuccessful?
5. What nonrecreational experience have students had with a microcomputer?
6. Would most of the group rather read a puzzle column than ...?
Figure 14-3. Class Sizing-Up Inventory

Directions: Please complete each of these sentences with a word or two that best describes you.
1. My favorite school subject is
2.
3. The kind of books I like to read most is
4. The radio station I most listen to is
5. My favorite summer activity is
6. I would rather do homework than
7.

Directions: Please complete each of these sentences by circling the words that best describe you.
9. I prefer to learn by
13. To solve a puzzle, I would
14. To write a paper I prefer
Figure 14-4. Summary of Class Inventory Responses
(Tally marks of class responses by category; the first section is headed Learning preference.)
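Informal inventories like these are relatively easy to develop once the teacher has thought through the purposes the information is to serve. Though the class analysis is important for the teacher, sharing it with the class may promote some self-assessment if it is presented in a way that will not draw attention to the responses of individual students. To keep track of individual responses, an index-card file system can be used that allows for easy updating after the next survey. (The interests of some students change rapidly, even daily.)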
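The class analysis in Figure 14-4 is a tally of how many students chose each option on each forced-choice item. A minimal sketch of that tally, assuming the responses have been transcribed into one record per student rather than kept on index cards; the function name and the option wordings in the usage example are hypothetical.

```python
from collections import Counter


def summarize_inventory(answer_sheets):
    """Tally forced-choice inventory responses item by item.

    answer_sheets: one dict per student mapping item number -> chosen option.
    Returns {item: Counter({option: count})}, a numeric form of a tally chart.
    """
    summary = {}
    for sheet in answer_sheets:
        for item, choice in sheet.items():
            summary.setdefault(item, Counter())[choice] += 1
    return summary


# Hypothetical responses to item 9 ("I prefer to learn by") and item 13.
sheets = [
    {9: "doing", 13: "work alone"},
    {9: "reading", 13: "ask a friend"},
    {9: "doing", 13: "work alone"},
]
summary = summarize_inventory(sheets)
print(summary[9].most_common())  # [('doing', 2), ('reading', 1)]
```

Keeping the per-student records alongside the summary preserves what the index-card file provides: the class totals for planning and the individual answers for follow-up after the next survey.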
Similar information can be obtained by the teacher through individual conversations with students. Though less efficient than questionnaires, these methods permit follow-up questioning and can elicit information without requiring the reading, writing, or vocabulary abilities that questionnaires depend upon.
Informal Reading Inventories
Informal inventories in areas like reading and math are needed occasionally to make decisions about individual students or to obtain information about entry levels in sequentially organized subjects (for example, math or foreign languages).
Most informal reading inventories consist of a graded word list, graded reading selections, and a set of comprehension questions for each selection.
9 8 ro 1 0 0
95
90 to100
75
The following suggestions are offered for developing informal reading inventories that will provide the most meaningful information for students in grades 1 through 12.
1. An existing graded word list may be used, or one may be developed by randomly selecting words from each text in a graded basal series. A list of 15 to 20 words from each level is printed on cards or in list form.
2. Using the same basal series, two passages should be selected from near the beginning of the book for each grade level for which the inventory is to be used. The passages selected should be representative of the subject matter, vocabulary, and language complexity of the grade-level text from which they are taken. Evans, Evans, and Mercer (1986) recommended varying passage lengths according to grade level: preprimer, 50 words; primer and grade 1, 100 words; and grade 2 and up, 200 words. Some inventories use introductory phrases or illustrations to activate prior knowledge, but such preparatory activities preclude the use of "main idea" comprehension questions later.
3. A set of eight to ten comprehension questions should be written for each passage. Since the questions are administered orally, the use of a free-response rather than a multiple-choice format would put a lighter load on short-term memory. Most importantly, the questions must require more than recall of facts or literal interpretation. A premium should be placed on items requiring inference and on generalization questions that get at "why," "how," "what if," or "what next?"
4. Review the graded word lists, passages, and items among a set of teachers representing the same grade levels as the materials. Check for passage representativeness, item ambiguity, item keys (the range of acceptable responses for open-ended items), and scoring criteria for basal-level placements.
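The random selection in step 1 is easy to make reproducible, so that the same lists can be regenerated later. The sketch below is illustrative only: the function name and the data layout are assumptions, the number of words per level is left as a parameter rather than fixed, and the two short texts in the usage example are invented stand-ins for basal readers.

```python
import random
import re


def build_graded_word_lists(texts_by_level, words_per_level, seed=None):
    """Randomly select words from each graded text (hypothetical helper).

    texts_by_level: {grade level: text of that level's basal reader}.
    words_per_level: how many words to draw for each level.
    seed: fix it to reproduce the same lists later.
    """
    rng = random.Random(seed)
    word_lists = {}
    for level, text in texts_by_level.items():
        # Distinct words, sorted so that a given seed always yields the same lists.
        vocabulary = sorted(set(re.findall(r"[a-z']+", text.lower())))
        if len(vocabulary) < words_per_level:
            raise ValueError(f"level {level!r} has only {len(vocabulary)} distinct words")
        word_lists[level] = rng.sample(vocabulary, words_per_level)
    return word_lists


texts = {
    "preprimer": "See the dog run. The dog can run fast.",
    "grade 1": "The little red hen found a grain of wheat in the farmyard.",
}
for level, words in build_graded_word_lists(texts, 3, seed=1).items():
    print(level, words)
```

Sampling from distinct words keeps a frequent word such as "the" from being drawn more than once; the reviewing teachers in step 4 would still check each list for representativeness.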
The materials developed for an informal reading inventory should be valid for use over a number of years. As long as students have not had an opportunity to read the same passages as part of their regular classroom instruction, inventory results should prove valuable for placement in a series, choosing general reading materials, or diagnosing decoding weaknesses. Teachers might also use the results to make decisions about out-of-level testing prior to the administration of an every-pupil achievement test battery. (Out-of-level standardized testing is discussed further in another chapter.)
ORAL QUESTIONING TECHNIQUES
The techniques of oral questioning serve the functions of fostering learning and assessing the extent of learning. But of course, these two purposes often overlap, particularly when the nature of the assessment is formative rather than summative. The purposes of this section are to demonstrate how oral questioning methods can be used in formal assessment and how methods of recording the outcomes of questioning can contribute to the collection of highly reliable information.
Purposes of Questioning
Though empirical evidence is lacking, both logic and experience suggest that oral questioning is the most frequently employed instructional technique. Why is this probably so? What functions does questioning serve that other techniques accomplish less effectively? In terms of the model of the teaching process described in Chapter 2, oral questioning can be used in gathering information about entering behavior, in carrying out instructional procedures, and in assessing performance. Thus oral questioning might very well be the first activity used to begin a unit of instruction and the last one used to end it.
The many purposes for oral questioning identified by Stiggins, Rubel, and Quellmalz (1986) and Wilen (1986) can be classified as primarily supporting either direct instruction or assessment. Both types of purposes are listed here to help differentiate them and to illustrate how inseparable the purposes are at times.
1. Monitor progress. Teachers frequently ask questions of the class or direct questions to particular students to make judgments about comprehension and the completeness of learning. The goal is to determine whether more examples, practice, or discussion are needed before moving on to the next learning objective. Often these questions are triggered by the teacher's reading of nonverbal cues on students' faces.
2. Encourage application of knowledge. The "So what?" question that students sometimes raise can be initiated by the teacher to focus on the use of new knowledge: to go beyond the statement of a principle or a general method of problem solving. Such questions stimulate higher-level thinking and heighten interest. The goal here is direct instruction rather than assessment.
3. Stimulate participation. Students can be drawn into a discussion through questioning and thus become active participants, one of the conditions necessary for effective learning.
4. Review and summarize. Review questions can be used to summarize a lesson and to provide distributed practice. Such questioning, whether relaxed or conducted in a fast-moving prosecutor style, serves to reinforce learning and to provide feedback to learner and teacher simultaneously. Thus these activities have both instructional and assessment functions.

Because students have a tendency to ask questions like those they themselves have been asked, probing questions that require thought should be more frequent than recall questions.
7. Diagnose individual problems. Probes can lead a teacher to the root of a student's difficulty, as with mathematics problems at the elementary level. Assessment is the main purpose served here.

A related instructional technique is used with students who need practice in problem solving. The teacher might ask: "If you were confined to a wheelchair, how would your exercise program need to be modified?" or "As a member of the volleyball team, how should your program be modified so that the new version has the benefits of aerobic dance?" Direct instruction is the main purpose of such questions.
When the primary purpose of oral questioning is to serve the assessment function, the methods of forming questions, delivering questions, and interpreting responses are fundamentally important. These determine how reliable and valid the information obtained will be and how sound the subsequent actions of the teacher will be.
Guidelines for Questioning
The issue related to oral questioning that has received the most attention from researchers is the taxonomic level of the questions teachers use. To date, the results of that research are clear: despite the rhetoric about fostering higher-order thinking skills, the vast majority of teachers' questions require recall, recognition, and literal comprehension. This is indeed an unfortunate statement about the nature of communication in the classrooms of our schools. Since questions play such a significant role in that communication, it appears that the intellectual level of most verbal interchanges is much lower than it ought to be. How can this sorry state be explained?
First, teachers have little, if any, direct instruction in their preservice programs related to questioning. When they do, the focus is less on taxonomic level and more on the mechanics of conducting a discussion. Second, since higher-level questioning was such an insignificant part of the teacher's own experience as a student, the teacher has not benefited from the modeling of higher-level questioning. Many techniques that teachers use, but did not learn directly during preservice instruction, were observed during their own schooling. High-quality oral questioning was not one of those. Third, oral questioning seems like such a narrow topic, with no perceived positive consequences, that it is not often proposed as an in-service education topic. Besides, the thinking goes, everyone knows how to ask questions: hard ones and easy ones.
Here are some suggestions for making oral questions more challenging for students and for obtaining meaningful information to support instructional decisions.

1. Be cognizant of the verb used in a question. The verb can require a simple yes-no response, or it can require a description, an explanation, a new plan, or a reasoned judgment. The explicit-implicit distinction made about instructional objectives in Chapter 3 pertains here also. In addition, low-level questions often include such words as who, what, and when; high-level questions tend to use how, why, and which. Here are some examples:
Select the most persuasive editorial.
Compare the persuasive quality of the two editorials.
Name the writer of Common Sense.
Cite the time period during which Poor Richard's Almanac was written.
Explain how Jefferson's ideals were expressed in the words of the Declaration.
Which editorial is most persuasive?
Why is this editorial more persuasive than that one?
Who wrote Common Sense?
When was Poor Richard's Almanac written?
How did Jefferson's ideals get expressed in the Declaration?
These sets of "questions" illustrate that oral questioning can be carried out with both declarative and interrogative statements. Behind each declarative statement is a question expressing the same content.
2. Wait for a response. The elapsed time between the end of a question and the teacher's next utterance averages about 1 second. Rowe (1974) has shown that the following kinds of benefits can accrue by increasing "wait time" to 3 to 5 seconds:
1. Students will give longer responses.
2. More unsolicited, appropriate responses will be given.
3. Fewer cases of nonresponse will occur.
4. Students will become more confident in responding.
5. More speculation and wondering aloud will occur.
6. Teaching will become more student centered.
7. Students will more often supply evidence to support their inferences.
8. Students will ask more questions.
9. Low-achieving students will contribute more.
10. Teacher questioning skills will improve over time.
Teachers tend not to wait very long before rephrasing or asking another question. Ordinarily, the new question is simpler, at a lower level than the original. The upshot is that longer wait time (think time for students) should preserve the taxonomic level of questioning the teacher intended.

3. Stay with a student who answers incompletely or incorrectly. If a lower-level follow-up question seems called for, use a sequence of questions that leads back to the level of the original one. Ask for clarification, restatement, explanation, or evidential support.
4. Ask a student to paraphrase or restate the response of another. This type of questioning not only demands constant attention from students but also fosters deeper understanding of the content and provides information for formative evaluation. Good questioning is likely to build on what students have already said. When it does not, shy students stay shy, inattentive students remain unattended, and the learning audience narrows to those who are persistent or self-motivated.
5. Plan key questions and write them down in advance. Some teachers develop the habit of writing their questions in lecture notes or on overhead transparencies. Spontaneous questioning is more likely to promote knowledge-level thinking than the application of knowledge.
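The cue words in guideline 1 can be turned into a rough screening aid for a teacher reviewing a list of planned questions. The sketch below is a heuristic under stated assumptions, not a validated classifier: the function name is hypothetical, the declarative verb sets (select, compare, explain versus name, cite) are inferred from the paired examples under guideline 1, and it will misread forms such as "What if...?", which the chapter treats as a higher-level generalization question.

```python
import re

# Cue words named in guideline 1 above.
LOW_LEVEL_CUES = {"who", "what", "when"}
HIGH_LEVEL_CUES = {"how", "why", "which"}
# Assumed from the paired declarative examples (e.g., "Name..." pairs with "Who...?").
LOW_LEVEL_VERBS = {"name", "cite"}
HIGH_LEVEL_VERBS = {"select", "compare", "explain"}


def question_level_hint(question: str) -> str:
    """Return 'high', 'low', or 'unclear' as a rough hint about taxonomic level."""
    words = re.findall(r"[a-z]+", question.lower())
    if not words:
        return "unclear"
    # A declarative "question" is judged by its opening verb.
    if words[0] in HIGH_LEVEL_VERBS:
        return "high"
    if words[0] in LOW_LEVEL_VERBS:
        return "low"
    # Otherwise use the first cue word that appears.
    for word in words:
        if word in HIGH_LEVEL_CUES:
            return "high"
        if word in LOW_LEVEL_CUES:
            return "low"
    return "unclear"


for q in ["Who wrote Common Sense?",
          "Why is this editorial more persuasive than that one?",
          "Cite the time period during which Poor Richard's Almanac was written."]:
    print(question_level_hint(q), "|", q)
```

Checking the opening verb first matters: the "Cite..." example contains "which" later in the sentence, yet it asks only for recall.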
Recording Questioning Data
If the responses to oral questioning are to be used for summative evaluation purposes, or if the responses are to be analyzed by the teacher to diagnose group strengths and weaknesses, a permanent record of some kind is needed. Documentation at the time of response is nearly always preferable to charting from memory at a later time. Of course, when questioning is used only for formative purposes, the teacher is likely to act on the responses immediately, and documentation is probably unnecessary.
The purposes for questioning and for recording the nature of the responses dictate the characteristics of the recording form to be used. Obviously, one dimension of the form must be the student names, but the second dimension is defined by the user's purpose. Figure 14-5 shows two charts that were designed for different purposes. Chart A has a column for each type of question asked of each student and a space for recording the quality of each response. The chart allows the teacher to examine (1) the distribution of questions throughout the group, (2) the extent to which each type of question was used, (3) the success rate of students on each type of question, and (4) the overall success of students. The teacher can see whether some students are being neglected or whether anyone is being singled out.

Chart B is intended to show both the quantity and quality of student participation in class recitation or discussion. The focus of this chart is more on the general quality of student responses than on the nature of the questions they were asked or were able to answer.
Note that for both charts in Figure 14-5, considerable judgment must be exercised by the teacher during the questioning activity. Did the question require an explanation? Was it a prediction? Was that response credible? Did Doug add any new slant to the issue, or did he mainly say the same thing? Was that comment on the topic, or did it open a new, but worthwhile, issue? Experience with any single recording form should increase its reliability and, consequently, its usefulness. Forms that require too much inference by the teacher will interfere with the questioning process in two ways: less time will be available for formulating high-quality questions, and more dead time will occur while response quality is transformed into a mark on the paper.
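Once each question in a Chart A record has been logged, the four things the chart is meant to reveal can be computed directly. The sketch below assumes each entry is a (student, question type, acceptable response) triple; the function name, the data layout, and every student name except Doug are hypothetical. It deliberately leaves the judgment of whether a response was acceptable to the teacher, as the paragraph above requires.

```python
from collections import Counter


def summarize_chart_a(roster, records):
    """Summarize a Chart A style record of oral questioning (hypothetical helper).

    roster: student names, one row of the chart each.
    records: (student, question_type, answered_acceptably) tuples, one per question.
    """
    asked = Counter()
    asked_by_type = Counter()
    correct_by_type = Counter()
    for student, qtype, ok in records:
        if student not in roster:
            raise ValueError(f"{student!r} is not on the roster")
        asked[student] += 1
        asked_by_type[qtype] += 1
        correct_by_type[qtype] += int(ok)
    total = sum(asked_by_type.values())
    return {
        # (1) distribution of questions throughout the group
        "questions_per_student": {s: asked[s] for s in roster},
        # (2) and (3) use of, and success rate on, each type of question
        "asked_by_type": dict(asked_by_type),
        "success_rate_by_type": {
            t: correct_by_type[t] / asked_by_type[t] for t in asked_by_type
        },
        # (4) overall success of students
        "overall_success": sum(correct_by_type.values()) / total if total else None,
        "not_yet_questioned": [s for s in roster if asked[s] == 0],
    }


roster = ["Doug", "Ann", "Lee"]
records = [("Doug", "explanation", True), ("Doug", "prediction", False),
           ("Ann", "explanation", True)]
print(summarize_chart_a(roster, records)["not_yet_questioned"])  # ['Lee']
```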
Figure 14-5. Alternative Examples of Student Oral Questioning Responses
A. TYPE OF QUESTION ASKED AND ANSWERED (columns include Explanation and Prediction; one row per student)
B. NATURE OF CONTRIBUTION (one row per student, e.g., Doug)
SUMMARY PROPOSITIONS
1. The most common deficiencies associated with informal assessments can be corrected or controlled through instrument development and advance planning.
2. The quality of observational data is a function of the observation act and the observational record.
3. The fineness of a rating scale is related to the amount of score variability it can produce and the level of score reliability that can be attained with it.
4. Results from spontaneous observation can influence subsequent judgments or decisions of the viewer in unintended or unknowing ways.
5. The generalizability of observed behavior is essentially a function of the representativeness of the behavior sampled.
6. The response options of a particular rating scale should describe either (a) frequency of occurrence or (b) quality of performance (or of a product).
QUESTIONS FOR STUDY AND DISCUSSION
1. How could the reliability of the results of oral questioning in a classroom be estimated?
2. What is informal about the methods characterized in this chapter as informal assessment?
3. How can anecdotal records be made more reliable and meaningful?
4. Why is it absurd to urge teachers to curtail spontaneous observation in favor of planned observation?
5. What are some examples of how spontaneous observations can inappropriately influence a teacher's judgments?
6. Why might intentional observer selectivity be more difficult to control than unintentional selectivity?
7. When those being observed know they are being observed and modify their behavior, how might the observations provide distorted information?
8. Why should frequency counts of different types of behaviors not be combined?
10. Why might the results from a checklist be more easily used for criterion-referenced than norm-referenced purposes?
11. How can novices and experts be used effectively in constructing a checklist?
12. What are the expected effects on score interpretation (norm-referenced and criterion-referenced, separately) of each of these kinds of rating errors: leniency, generosity, and central tendency?
13. What are the major advantages of using oral questioning for gathering summative achievement information from a class of students?
14. What kinds of questioning techniques used by the teacher seem to discourage deep thinking on the part of students?
15. Why might increasing wait time increase participation as well as the frequency of appropriate responses?
Grading and
Reporting Achievements
THE NEED FOR GRADES
The uses made of grades are numerous and often crucial. They are used as self-evaluative measures and also to report students' educational status to parents, future teachers, and prospective employers. They provide a basis for important decisions about educational plans and career options. Then, too, education is expensive. To make the best possible use of educational facilities and student talent, it is essential that each student's educational progress be watched carefully and reported as accurately as possible. Reports of school grades serve somewhat the same function in education that financial statements serve in business. In either case, if the reports are inaccurate or unavailable, the venture may become inefficient or the quality of the product may deteriorate.
Grades also provide an important means for stimulating, directing, and rewarding the educational efforts of students. This function of grades has been attacked on the ground that grades provide extrinsic, artificial, and hence undesirable stimuli and rewards. Indeed, grades are extrinsic, but so are most other cherished rewards for effort and achievement. Most workers, including those in the professions, are grateful for the intrinsic rewards that accompany their efforts. But most of them are even more grateful that these are not the only rewards. Few organized, efficient human enterprises can be conducted successfully on the basis of intrinsic rewards alone.
To serve effectively the purpose of stimulating, directing, and rewarding student efforts to learn, grades must be valid. The highest grades must go to those students who have demonstrated the highest levels of achievement with respect to course objectives. Grades must be based on sufficient evidence. They must report the degree of achievement as precisely as possible under the circumstances. If grades are assigned carelessly, their long-run effects on the educational efforts of students cannot be good.
Some students and teachers minimize the importance of grades, suggesting that what students learn is more important than the grades they get. Their conception rests on the assumption that there generally is not a close relationship between the amount of useful learning a student can demonstrate and the grade he or she receives. Others have made the same point by noting that grades should not be regarded as ends in themselves, and by questioning the use of examinations "merely for the purpose of assigning grades."

It is true that the grade a student receives is not in itself an important educational outcome. By the same token, neither is the degree or diploma toward which the student is working, nor the academic rank or professional reputation of those who teach that individual. But all these symbols can and should be valid indications of important educational attainments. It is desirable, and not impossibly difficult, to make the goal of maximum educational achievement compatible with the goal of highest possible grades. If these two goals are not closely related, the fault would seem to rest with those who teach the classes and assign the grades. From the point of view of students, parents, teachers, and employers, there is nothing "mere" about the grading process and the grades it yields. Stroud (1946) underscored this point.
If the marks earned in a course of study are made to represent progress toward getting an education, working for marks is, pro tanto, a furtherance of the purposes of education. If the marks are so bad that the student who works for and attains them misses an education, then working for marks is a practice to be eschewed. When marks are given, we are not likely to dissuade pupils from working for them and there is no sensible reason why we should. It simply does not make sense to grade pupils, to maintain institutional machinery for assembling and recording the gradings, while at the same time telling pupils marks do not amount to much. As a matter of fact they do amount to something and the pupil knows this. If we are dissatisfied with the results of working for marks we might try to improve the marks. (p. 632)
Grades are necessary. If they are inaccurate, invalid, or meaningless, the remedy lies less in de-emphasizing grades than in assigning them more carefully so that they more truly report the extent of important achievements. Instead of seeking to minimize their importance or seeking to find some less painful substitute, teachers should devote more attention to improving the validity and precision of the grades they assign and to minimizing misinterpretations of grades by the students, teachers, and others who use them.
SOME PROBLEMS OF GRADING
The problems of using grades to describe student achievement have been persistently troublesome at all levels of education. An important and fundamental reason why problems of grading are difficult to solve permanently is that grading systems tend to become issues in educational controversies. Odell (1950) noted that research on grading systems did not become significant until after the turn of the century. At about that same time, the development of objective tests was ushering in the somewhat controversial scientific movement in education. The rise of progressive education in the third and fourth decades of this century, with its emphasis on the uniqueness of the individual, the wholeness of mental life, freedom and democracy in the classroom, and the child's need for loving reassurance, led to criticism of the competitive pressures and the common standards of achievement for all students implicit in many grading systems. However, subsequent renewed emphasis on "back to basics" and on the pursuit of academic excellence has been accompanied by pleas for more formal evaluations of achievement and more rigorous standards of attainment (National Commission on Excellence in Education, 1983).
Such drifts and shifts in educational philosophy influence some educational leaders to espouse one philosophy, some another. Some teachers find it easy to accept one position, some another, even when they teach in the same educational institution. Since somewhat differing grading systems are implied by each of these different philosophical positions, it is not surprising that differences of opinion, dissatisfaction, and proposals for change tend to characterize teacher reactions to the entire grading enterprise.

Another reason why grading systems present perennial problems is that they require teachers, whose natural instincts incline them to be helpful counselors and advocates, to stand in judgment over the deeds of others. "Forbear to judge, for we are sinners all," said Shakespeare, echoing the sentiments of the Sermon on the Mount: "Judge not, that ye be not judged." It is never difficult to assign a student a good grade, particularly if it is higher than he or she really expected. But since the reach of many students exceeds their grasp, there are likely to be more occasions for disappointment than pleasure for both students and teachers.
The issues that contribute to making grading so problematic are primarily philosophical in nature. There are no research studies that can answer questions like: What should an A grade mean? What percent of the students in a class should receive a C? Should spelling and grammar be judged in assigning a grade to a paper? What should a course grade represent? These "should" questions require value judgments rather than an interpretation of research data; the answer to each may vary from teacher to teacher. But all teachers must ask similar questions and find acceptable answers to them in establishing their grading policies. With careful thought and periodic review, most teachers can develop satisfactory, defensible grading practices that will yield accurate measures of the achievements of their students. And by attending to the principles that enhance the reliability and validity of other achievement measures, grading policies and procedures can be developed to produce relevant, meaningful grades at all educational levels.
No system of grading is likely to be found that will make the process of grading easy, painless, and generally satisfactory. This is not to say that present grading practices are beyond improvement. It is only to say that no new grading system, no matter how cleverly devised and conscientiously followed, is likely to solve the basic problems of grading. The real need is not for some new system. Good systems already exist. The real need is in using the existing systems to produce the most valid grades possible for the limited set of purposes grades should serve.
Some Shortcomings of Grades
Two major deficiencies of grades, as they are assigned in many educational institutions, are (1) the lack of clear and generally accepted definitions of what the various grades mean and (2) the lack of sufficient, relevant, and objective evidence to use as a basis for assigning grades (Stiggins, Frisbie, and Griswold, 1989). One consequence of the first shortcoming is that grading standards and the meanings of grades tend to vary from teacher to teacher, from course to course, from department to department, and from school to school within districts (Terwilliger, 1971). Another consequence is that teacher biases and idiosyncrasies tend to reduce the validity of grades (Stiggins, Frisbie, and Griswold, 1989). One outcome of the second shortcoming is that the grades tend to be unreliable. Another is that grades can be inflated: their face value is higher than their actual value.
The absence of explicit definitions for each grade permits teachers to be influenced, either consciously or unknowingly, by extraneous factors in assigning grades. Research on this point from three or more decades ago probably is characteristic of present practice (Carter, 1952; Hadley, 1954; Palmer, 1962). Some teachers deliberately use high grades as rewards and low grades as punishments for behavior unrelated to the attainment of instructional objectives.
The studies of Starch and Elliott (1912, 1913) on the unreliability of teachers' grades on examination papers are classic demonstrations of the instability of judgments based on presumably absolute standards. Identical copies of an English test paper were given to 142 English teachers, with instructions to score it on the basis of 100 percent for a perfect paper. Since each teacher looked at only one paper, no relative basis for judgment was available. The scores assigned to the same paper ranged all the way from 98 to 50 percent. Similar results were obtained with test papers in geometry and in history.
Typically, grades such as those Starch and Elliott collected for single examination papers are not highly reliable. For semester grades, reliability estimates in the range of 0.70 to 0.80 should be common. Semester grades are based on much more extensive and comprehensive observations of student attainments, perhaps as many as 80 hours of observation. Even so, one hour of intensive "observation" under the controlled conditions of a well-standardized achievement test can yield measures with reliability estimates in excess of 0.90. If the tools of performance assessment are not well designed, their collective worth over a semester may be exceeded by a reliable and valid commercially prepared instrument that takes no time for the teacher to prepare and a small fraction of class time to administer. Our purpose here is not to argue for replacing teacher-made evaluation tools with standardized measures, but to dramatize the unfortunate state of affairs in which some teachers find themselves at grade assignment time. We are not facing utter chaos, but considerable room for improvement exists.
THE MEANING CONVEYED BY GRADES
A grading system is primarily a method of communicating measurements of achievement. It involves the use of symbols […]. Only to the degree that the grading symbols have the same meaning for all who use them is it possible for grades to serve the purposes of communication meaningfully. Ideally, the meaning of a grade should depend as little as possible on the instructor who issued it or the course to which it pertains. This means that the grading practices of an instructor, a department, or an entire institution are matters of legitimate concern to other departments and other institutions.

A particular grade carries three kinds of meaning. First, it reflects the comparison of a student's performance with either an absolute standard or a relative standard based on the performance of a specified group. Second, a grade represents either achievement or the amount of effort expended […]. Finally, a grade represents either the achievement attained at the end of instruction or the amount of learning that took place during instruction. The remainder of this section discusses the distinctions among these alternative meanings and how each contributes to the overall meaning of a grade.
Absolute and Relative Standards
A grade represents a teacher's judgment of how well a student has performed a set of tasks in one […] units. These judgments of goodness […] of comparison. Performance that is described as excellent […] or inferior obtains its qualitative meaning from a comparison of the performance in question with a performance standard, either absolute or relative.

[…] grading systems used in the United States since […] either absolute or relative grading standards. A definite percent of "perfection," usually […], was regarded as the minimum passing […] students' performances […] knowledge, skills, and understanding that were […]. Each grade presumably was assigned independently of the grades of other students in the course. Percent grading is therefore classified as absolute grading.
Another type of grading system is based on the use of a small number of letter grades, often five, to express various levels of achievement. In the five-letter system, truly outstanding performance is assigned a grade of A; B indicates above-average achievement; C is the average grade; D indicates below-average achievement; and F is used to report failure, achievement insufficient to earn credit for completing a course. The relative standard to which each student's performance is referenced is the distribution of performances of the other students in the class. Thus, letter grading is commonly characterized as relative grading.
Obviously, each letter in the grading system can be defined in absolute terms instead of relative terms. A grade of D may indicate achievement of the minimum essential knowledge and understanding; C may represent adequate, rather than average, achievement; B may indicate a level of advanced achievement with respect to course content; and A may be used to represent exceptional or meritorious achievement. Such general definitions would require elaboration to communicate the absolute levels of achievement in any particular course or grade-level subject. But the point is this: it is not the use of letter symbols that distinguishes absolute and relative grading; it is the nature of the standard against which performance is compared that differentiates the two.
The decision to use either an absolute or a relative grading standard is the fundamental decision a teacher must make with regard to grade assignment. When the absolute standard is chosen, all tests and other tools of evaluation must be designed to yield criterion-referenced interpretations. An absolute standard for grading must be established for each component that is to contribute to the course grade: tests, papers, quizzes, presentations, projects, and other assignments. If the decision is to use a relative standard, grading components must be geared to providing norm-referenced interpretations. Of course, in both cases, cutoff decisions need to be made regarding what level of performance is […] acceptable. The basis for determining the cutoff points is criterion-referenced in one case and norm-referenced in the other.
Though the majority of institutions now use letter grading with relative standards, percent grading is by no means obsolete. Some institutions still convert to letter grades from percent scores, and […] still prefer to define passing scores […]. In some cases, […] grading methodology. Some instructors favor absolute standards over relative grading for philosophical reasons […].
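The distinction drawn above can be made concrete: the same letter symbols, applied once against a fixed percent standard and once against the student's classmates. The percent cutoffs and the z-score bands below are hypothetical, chosen only for illustration; they are not values the chapter recommends.

```python
import statistics

# Hypothetical cutoffs, for illustration only.
ABSOLUTE_CUTOFFS = [(90, "A"), (80, "B"), (70, "C"), (60, "D")]        # percent correct
RELATIVE_CUTOFFS = [(1.5, "A"), (0.5, "B"), (-0.5, "C"), (-1.5, "D")]  # z-score within class


def band(value, cutoffs):
    """Return the letter of the first cutoff the value reaches, else F."""
    for cutoff, letter in cutoffs:
        if value >= cutoff:
            return letter
    return "F"


def absolute_grade(percent):
    """Compare the score with a fixed standard, ignoring everyone else."""
    return band(percent, ABSOLUTE_CUTOFFS)


def relative_grade(percent, class_scores):
    """Compare the score with the distribution of the student's classmates."""
    z = (percent - statistics.mean(class_scores)) / statistics.pstdev(class_scores)
    return band(z, RELATIVE_CUTOFFS)


low_class = [50, 55, 60, 62, 65, 78]
high_class = [78, 85, 88, 90, 92, 95]
print(absolute_grade(78))
print(relative_grade(78, low_class))
print(relative_grade(78, high_class))
```

A score of 78 percent earns a C under the absolute standard in either class, but under the relative standard it earns an A in the weaker class and an F in the stronger one.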
Achievement and Effort
After the decision has been made about the use of absolute or relative standards, the instructor must decide which aspects of performance and which forms of achievement will be included in the grade. Undoubtedly, some teachers base some of the grades they issue on factors other than the degree of achievement of instructional objectives (S[…]). […] likely will continue to do so because […] control in the class and because some […] teaching. But the use of grades […] leads to distorted meanings of the grades […] their social behavior […] of their school program.

We have argued that grades […] and reward student learning. Certainly, students who demonstrate greater desire to learn […] some form of recognition and reward.
Status and Growth
Some instructors believe that grades should reflect the amount of improvement students make: the difference between the achievement they demonstrate at the end of instruction and their initial status, as shown by a pretest or other preliminary observations. Because both the initial and the final scores contain errors of measurement, subtracting these scores from one another leads to an accumulation of errors rather than a cancellation of them. The differences are more error-laden than either of the scores from which they are obtained and may consist mainly of errors of measurement. […] provide reliable scores, instructors may […] pretest and posttest means. But few classroom achievement tests […].
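The accumulation of error described above can be quantified with the classical test theory formula for the reliability of a difference score. The formula is standard psychometrics rather than something stated in this chapter, and the reliabilities and correlations below are hypothetical. A minimal sketch:

```python
def gain_score_reliability(r_pre, r_post, r_pre_post, sd_pre=1.0, sd_post=1.0):
    """Reliability of gain = posttest - pretest under classical test theory.

    Assumes measurement errors on the two occasions are uncorrelated with
    each other and with true scores.
    """
    var_pre, var_post = sd_pre ** 2, sd_post ** 2
    cov = r_pre_post * sd_pre * sd_post      # observed covariance equals true covariance
    true_var = r_pre * var_pre + r_post * var_post - 2 * cov
    observed_var = var_pre + var_post - 2 * cov
    return true_var / observed_var


# Two respectable tests (reliability 0.80 each) whose scores correlate 0.60:
print(round(gain_score_reliability(0.80, 0.80, 0.60), 2))
# The same tests with a pretest-posttest correlation of 0.70:
print(round(gain_score_reliability(0.80, 0.80, 0.70), 2))
```

The first call gives 0.5 and the second about 0.33: the more closely pretest and posttest scores agree, the less reliable the gains become, even though each test is reasonably reliable on its own.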
In addition to the reliability concern, there are other problems with growth measures. One is that, for most educational purposes, knowledge that a student's achievement is good, average, or poor is more useful than knowledge that the student learned more or less than others during a grading period. Another is that students who score low on the pretest have a considerably greater chance to show large gains in achievement than their peers who score high. And students are quick to learn that under circumstances of grading on the basis of growth, their pretest scores should be as low as possible to permit the greatest possible observable gain.
It is true that status grading seems to condemn some students to low grades in most subjects, semester after semester. Low grades discourage effort, which in turn increases the probability of more low grades. So the vicious cycle continues, bringing dislike of learning and, possibly, early withdrawal from school. If students are taught to dislike school by constant reminders of their low achievement, the remedy probably is not to try to persuade them that their rate of growth toward achievement is more important than status achieved, for that may be an apparent falsehood. The remedy is probably to provide varied opportunities to excel in several kinds of worthwhile activities. The planning and implementation of such opportunities certainly would require an alert, versatile, and dedicated teacher. When it is accomplished, though, grading on the basis of status achieved will no longer mean that some students must always win while others must always lose. Instead, some students will be able to enjoy some of the rewards of excellence in their own specialties. Cohen (1983), for example, has described alternative procedures for grading the achievement of exceptional students who have been "mainstreamed."
ESTABLISHING A GRADING SYSTEM
The Grade Scale
[…] Many instructors seemed to agree with this view. Nonetheless, from time to time there has been renewed interest in refining the grading scale by adding plus and minus signs to the basic letter grades or decimal fractions to the basic numbers (for example, 4.0, 3.5, 3.0, 2.5).

The notion that grading problems can be simplified and grading errors reduced by using fewer categories is an attractive one. Its weakness can be exposed […]. Using fewer categories in grading does indeed reduce the […]. [Passage illegible in source.] […] reduces the information conveyed by the grade […].
Letters versus Numbers
[…] percent grading was aided by the substitution […]. [Remainder of passage illegible in source.]
Single or Multiple Grades
[…] are more explicit in communicating what students can do than are letter grades […] represents and that sufficient evidence […] for determining the separate grades. […] grades reported at one time, the […] influenced by considerable halo effect: […] student may influence each of the grades […] to the separate aspects of achievement. […] shortcomings of the single-symbol system.
There is an eclectic grading […] that has promise for satisfying the […] criterion-referencing […] minimum competency […] referenced measures to make relative […] passed. (One version of this method […].) Only a single grade is assigned to each […]. [The passing] grade (A, B, C, or D) is referenced to […] have been identified as minimally essential […] the relative standings of students in […] regarded as "beyond basics," or important to success in studying more advanced aspects of the subject matter. The […] system […]:

1. […] less likely to assign a barely passing grade (D) to students who have not mastered basic skills than they might be in a relative grading system.
2. Students who fail at first can be retested to improve their grade after they improve their skills. They are not relegated to a failing grade because on one occasion they demonstrated less learning than others.
3. Students who excel are rewarded according to their level of achievement […] an incentive to go beyond the minimum […].
4. The system represents a […].
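The eclectic system can be sketched as a two-stage procedure: a criterion-referenced check on the minimally essential objectives decides pass or fail, and relative standing on the "beyond basics" measure decides among A, B, C, and D for those who pass. This is one reading of the partly illegible description above; the mastery cutoff, the scores, and the percentage shares are all hypothetical.

```python
def eclectic_grades(scores, mastery_cutoff,
                    shares=(("A", 20), ("B", 30), ("C", 30), ("D", 20))):
    """Criterion-referenced pass/fail, then norm-referenced letters among passers.

    scores maps each student to (basics_score, beyond_basics_score).
    shares gives hypothetical percentages of passing students for each letter.
    """
    if sum(percent for _, percent in shares) != 100:
        raise ValueError("shares must add up to 100 percent")
    grades, passers = {}, []
    for name, (basics, _beyond) in scores.items():
        if basics >= mastery_cutoff:
            passers.append(name)
        else:
            grades[name] = "F"  # minimum essentials not yet mastered; may retest
    # Among passers, rank by the "beyond basics" measure, highest first.
    passers.sort(key=lambda name: scores[name][1], reverse=True)
    n, start, cumulative = len(passers), 0, 0
    for letter, percent in shares:
        cumulative += percent
        end = (cumulative * n + 50) // 100  # round half up to whole students
        for name in passers[start:end]:
            grades[name] = letter
        start = end
    return grades


scores = {"Ana": (18, 44), "Ben": (12, 40), "Cy": (17, 30),
          "Di": (20, 25), "Ed": (16, 38), "Flo": (19, 10)}
print(eclectic_grades(scores, mastery_cutoff=15))
```

Ben has the second-highest "beyond basics" score but receives an F because he has not yet mastered the essentials; under point 2 he could be retested once his basic skills improve. Every passing student receives at least a D.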
THREATS TO THE VALIDITY OF GRADES
A distinction should be made between the aspects of performance that a teacher evaluates and the subset of those components that […] grades […] students' competence with respect to the instructional objectives. The components of a grade should be academically oriented; grades should not be tools of discipline or rewards for pleasant personalities or good attitudes. A student who is assigned an A grade should have a firm grasp of the skills and knowledge taught. If the student is barely marginal academically but very industrious and congenial, an A grade would be misleading and would deal a blow to the motivation of the excellent students in the class. Instructors can and should give feedback to students about their attitudes and character, but only performance based on academic achievement should be used to determine grades. In its recommendations regarding grading practices and expectations, the National Commission on Excellence in Education (1983) stated that "grades should be indicators of academic achievement so they can be relied on as evidence of a student's readiness for further study." Grades contaminated by other factors give students a false sense of readiness and provide misinformation to those who seek to guide students in their future educational endeavors.
Several aspects of student performance have been labeled as potentially invalid grading components because they represent behaviors that do not reflect directly the attainment of the important objectives of instruction (Frisbie, 1977). Though some exceptions could be noted, these variables generally should not be used in determining course grades.
Neatness in written work, correctness in spelling and grammatical usage, and organizational ability are all worthy traits and are assets in most vocational endeavors. To this extent, it seems appropriate that teachers evaluate these aspects of performance and provide students with constructive comments about them. However, unless the course objectives include instruction in these skills, students should not be graded on them in the course. For example, students' essay examination scores should not be influenced directly by their spelling ability, and neither should their course grades. Students whose skills in written expression are weak can and do learn the important knowledge of science, social studies, literature, and other academic subjects. Their writing skills can and should be evaluated in such courses, but their course grades should not suffer directly because of their writing deficiencies. To the extent that they do, these grades are misleading to both students and parents and serve to moderate rather than stimulate interest in the subject area.
Most instructors are attracted to students who are agreeable, friendly, industrious, and kind. They try to ignore or may even reject those who display opposite characteristics. When it appears that certain personalities may interfere with classwork or have limited chances for employment in their field of interest, constructive feedback from the instructor may be necessary. But an argumentative or misbehaving student who receives a C grade should have only a moderate amount of knowledge about the course content. The C should not reflect the student's disposition or disruptive behavior directly (Bartlett, 1987).
Most small classes and college seminars depend on student participation to some degree for their success. When participation is an important ingredient in learning, participation grades may be appropriate. In such cases the instructor should ensure that all students have sufficient opportunity to participate and should maintain systematic notes regarding frequency and quality of participation. (See Chapter 14 for sample recording forms.) Waiting until the end of the grading period and relying strictly on memory causes a relatively subjective task to be even more subjective and unreliable. Participation probably should not be graded in most classes, however. Dominating, extroverted students tend to win, and introverted or shy students tend to lose. Instructors may want to provide evaluative information to students about their personalities, including willingness to participate, but grading should not be the means of doing so.
Students at all levels should be encouraged to attend classes because the lectures, demonstrations, and discussions presumably have been designed to facilitate their learning. If students miss several classes, then their performance on papers and tests likely will suffer. If the instructor also reduces their grade because of absences, such students are subjected to a form of double jeopardy. For example, a college instructor may say that class attendance accounts for 10 percent of the course grade. But for students who miss several classes, the effect may amount to 20 percent. Instructors who experience […] absences in their classes probably need to examine their classroom environment and instructional procedures to determine if changes are needed. There ought to be more productive means of encouraging students to attend classes than to threaten to lower their grade.
Some instructors are more generous in their grading than they ought to be […]. However, […] not defensible. The […] that any negative reaction is […] and (2) that evaluating a performance […] a person as a person. Not every […] and diligent one is good […]. Judgments about writing and speaking skills, personality traits, effort, […] are made by teachers constantly as they interact with their students. […] compromise is no easy task. But accurate and meaningful grades depend on it.
GRADING COURSE ASSIGNMENTS
[…] and seventh days he missed none. He missed only one out of twenty on the test given the eighth day. Which grade best describes Freddy's level of achievement? Though he may not have caught on as rapidly as some of his peers, Freddy appears to be able to identify prepositions. Some form of grading might be used to motivate and direct Freddy and his classmates, but such grades need not all enter into determining the final course or term grade.
Perhaps the most frequent shortcoming associated with grading assignments such as papers, reports, presentations, and projects is the failure of the teacher to specify and describe in advance what the important aspects of the final product should be like. The lack of "feed forward," as Sadler (1983) has labeled it, produces two undesirable outcomes: (1) some students present incomplete assignments because they misunderstood the teacher's intent, and (2) grading becomes a chore for the teacher because the criteria that distinguish better assignments from poorer ones have not been explicated. The grading guide that seems so logical to prepare for scoring essay items is equally beneficial to the teacher for grading assignments. It can help to accomplish these things:
1. When the guide is presented to the students at the time the assignment is made, potential misunderstandings about what to do can be overcome. The nature of the final product can be described completely, and the relative importance of its various aspects can be presented. Often an example of an A assignment from a previous class is a helpful model.
2. Opportunities for extraneous factors to influence grading are reduced because the relevant elements have been defined. Grading variables and evaluation variables can be separated so that the teacher can make a conscious effort to comment on the nongraded aspects of the work.
3. Grading can be done efficiently because little time is needed to decide which parts of the assignment to weight most heavily. Less time is needed to judge completeness as well.
4. Feedback to students can be somewhat diagnostic because missing segments and student misconceptions are more readily identified.
We discussed in Chapter 19 the importance of preparing students for examinations so that they know what to expect and can prepare themselves further. A grading guide, like the checklist shown in Figure 14.2, can serve this same useful function for assignments, and it also can contribute to more valid and reliable measures of achievement if used wisely by the grader.
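A grading guide can be represented as a list of weighted criteria announced with the assignment. In the sketch below, the criteria, point values, and ratings are hypothetical. It shows how such a guide supports the benefits listed above. Explicit weights speed grading. A nongraded aspect (here, spelling) receives comments without affecting the score. Unmet criteria become diagnostic feedback.

```python
from dataclasses import dataclass


@dataclass
class Criterion:
    description: str
    points: int = 0       # announced to students when the assignment is made
    graded: bool = True   # False: evaluated with comments only, not counted


def apply_guide(guide, ratings):
    """Score one assignment against a grading guide.

    ratings maps a criterion description to the fraction of it met (0.0 to 1.0);
    criteria the grader did not rate count as missing.
    """
    earned, possible = 0.0, 0
    missing, comment_only = [], []
    for c in guide:
        met = ratings.get(c.description, 0.0)
        if not c.graded:
            comment_only.append(c.description)
            continue
        possible += c.points
        earned += c.points * met
        if met == 0.0:
            missing.append(c.description)  # diagnostic feedback for the student
    return earned, possible, missing, comment_only


guide = [
    Criterion("States the research question", 10),
    Criterion("Describes the data sources", 20),
    Criterion("Draws conclusions supported by the data", 30),
    Criterion("Spelling and mechanics", graded=False),
]
ratings = {"States the research question": 1.0,
           "Draws conclusions supported by the data": 0.5,
           "Spelling and mechanics": 0.6}
print(apply_guide(guide, ratings))
```

The spelling rating is recorded for comment but never enters the score, in keeping with the argument that spelling should not affect grades unless the course objectives include it.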
COMBINING GRADE COMPONENTS
When teachers determine a course grade by combining grades or scores from tests, papers, demonstrations, and projects, each component may carry more or less weight than the others in determining the final grade. To obtain grades of maximum validity, teachers must give each component the proper weight, not too much and not too little. How can they determine what those weights ought to be and what they actually turn out to be? And if these two sets of figures are disparate, what can instructors do? It is not easy to give a firm, precise answer to the question of how much influence each component ought to have in determining the composite grade. […] In general, the use of several […] is better than use of only one, provided […] instructional objectives and […] reasonable accuracy. […] the most reliable scores should […].

How much weight a component actually carries is quite difficult to assess. As a first approximation, it is proportional to the standard deviation of the component's scores: if one set of scores is twice as variable as another, the first will carry about twice the weight of the second in their total. […] on a second, and lowest score on the third. […] the ranks of their total scores on the three tests are the same as their ranks on […]. The third section of the table gives the maximum possible scores, the mean scores, and the standard deviations […].
Table 15-1. Weighted Test Scores

[Table values illegible in source.]
[…] illustrated in the last section of the table. Scores on test X are multiplied by 4 to change their standard deviation from 2.5 to 10, the same as on test Z. Scores on test Y are multiplied by 2 to change their standard deviation to 10 also. With equal standard deviations, the tests carry equal weight, and the students having the same average rank on the tests have the same total scores.
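The equalizing step can be checked numerically. In this hypothetical three-student example, the scores are not the values of Table 15-1. They were chosen so that the standard deviations of tests X, Y, and Z stand in the same 1 : 2 : 4 ratio as the 2.5, 5, and 10 described above. The equalizing multipliers therefore come out at 4, 2, and 1.

```python
import statistics

# Hypothetical scores for three students on tests X, Y, and Z.
# Each student is highest on one test, middle on one, and lowest on one.
tests = {
    "X": [52.5, 50.0, 47.5],   # least variable
    "Y": [60.0, 70.0, 65.0],
    "Z": [30.0, 20.0, 40.0],   # most variable
}


def equalizing_multipliers(tests):
    """Multiply each test so its standard deviation matches the largest one."""
    sds = {name: statistics.pstdev(s) for name, s in tests.items()}
    target = max(sds.values())
    return {name: target / sd for name, sd in sds.items()}


def totals(tests, multipliers=None):
    """Sum each student's (optionally weighted) scores across tests."""
    multipliers = multipliers or {name: 1.0 for name in tests}
    n = len(next(iter(tests.values())))
    return [sum(multipliers[name] * s[i] for name, s in tests.items()) for i in range(n)]


raw = totals(tests)                        # ranks follow test Z
weights = equalizing_multipliers(tests)    # about 4, 2, and 1
weighted = totals(tests, weights)          # equal for all three students
print(raw)
print({name: round(m, 2) for name, m in weights.items()})
print([round(t, 2) for t in weighted])
```

The raw totals (142.5, 140, 152.5) rank the students exactly as test Z alone does. After weighting, every student totals 360, because all three have the same average rank.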
When the whole possible range of scores is used, score variability is closely related to the extent of the available score scale. This means that scores on a 40-item objective test are likely to carry about four times the weight of scores on a 10-point essay test question, provided that scores extend across the whole range in both cases. But if only a small part of the possible scale of scores is actually used, the length of that scale can be a very misleading guide to the variability of the scores.
[…] (desired weights of 20%, 20%, 10%, 30%, and 20% for five components) […] the T-scores of each component can be multiplied by 2, 2, 1, 3, and 2, respectively, to achieve the desired weighting. Oosterhof (1987) has described the weighting procedures for both criterion-referenced and norm-referenced grading situations.
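The T-score procedure can be sketched as follows. Converting every component to T-scores (mean 50, standard deviation 10) puts them on a common footing. Multiplying by the desired percentage divided by 10 then gives each component a standard deviation proportional to its intended weight. The component scores are hypothetical. As with any weighting based on standard deviations, the resulting weights are approximate, because intercorrelations among components also play a part.

```python
import statistics


def t_scores(raw):
    """Convert raw scores to T-scores (mean 50, standard deviation 10)."""
    mean, sd = statistics.mean(raw), statistics.pstdev(raw)
    return [50 + 10 * (x - mean) / sd for x in raw]


def weighted_composite(components, percentages):
    """Composite of T-scores, each multiplied in proportion to its desired weight."""
    multipliers = [p / 10 for p in percentages]   # 20% -> 2, 10% -> 1, 30% -> 3
    converted = [t_scores(c) for c in components]
    n = len(components[0])
    return [sum(m * t[i] for m, t in zip(multipliers, converted)) for i in range(n)]


# Hypothetical raw scores of four students on five components.
components = [
    [38, 31, 25, 40],   # test 1
    [18, 20, 12, 15],   # test 2
    [9, 7, 10, 6],      # paper
    [75, 88, 62, 70],   # project
    [44, 39, 41, 47],   # final examination
]
composite = weighted_composite(components, [20, 20, 10, 30, 20])
print([round(c, 1) for c in composite])
```

Because the percentages sum to 100, the multipliers sum to 10 and the composite has a mean of 500. Each weighted component's standard deviation equals its percentage, so the 30 percent project is three times as variable in the composite as the 10 percent paper.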
One final admonition regarding relative grading and combining scores: It is a mistake to convert test scores to letter grades, record these in a grade book, and then reconvert the letter grades to numbers (A = 4, B = 3) for purposes of computing final course grades. A better procedure is to record the test scores themselves and then add them, with whatever weighting has been adopted, to obtain a composite score that can be converted to a final grade. When letters are recorded instead, every B, regardless of the score on which it was based, is given the same value in the reconversion process (for example, B = 3.0). Some of the reliability the teacher struggled to achieve in developing each measure is lost in the process. For this reason it is desirable to record raw scores or standard scores rather than letters or their numerical equivalents.
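A small example shows how information is lost in the reconversion. The letter cutoffs and grade points are hypothetical. Pat's scores average higher than Quinn's, yet after conversion to letters and back, Quinn comes out ahead, because a score of 89 and a score of 80 both become B = 3.

```python
# Hypothetical letter cutoffs and grade points, for illustration only.
CUTOFFS = [(90, 4), (80, 3), (70, 2), (60, 1)]   # A = 4, B = 3, C = 2, D = 1


def grade_points(score):
    """Convert one test score to the grade points of its letter (F = 0)."""
    for cutoff, points in CUTOFFS:
        if score >= cutoff:
            return points
    return 0


def average_via_letters(scores):
    """Average after converting each score to a letter and back to a number."""
    return sum(grade_points(s) for s in scores) / len(scores)


def average_via_scores(scores):
    """Average of the recorded scores themselves."""
    return sum(scores) / len(scores)


pat = [89, 70, 89]    # two high B's and a low C
quinn = [80, 80, 80]  # three low B's
print(average_via_scores(pat), average_via_scores(quinn))
print(average_via_letters(pat), average_via_letters(quinn))
```

On recorded scores Pat averages about 82.7 and Quinn 80; through letters Pat averages about 2.67 and Quinn 3.0, reversing their order.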
METHODS OF ASSIGNING GRADES
The procedures a teacher follows for assigning term grades are dictated largely by the meaning the teacher has chosen to attribute to the symbols. The multitude of methods used in practice generally can be categorized in terms of their dependence on either absolute or relative standards (Frisbie, 1978). The popular variations of these two types and their corresponding strengths and weaknesses are described in this section.
Relative Grading Methods
One popular variety of relative grading is called grading on the curve. The "curve" referred to usually is the normal distribution curve or some symmetric variant of it. The norm-referenced basis for this type of grading is complicated […] are merely a part of the top group, including those who may have scored 20 points lower. The bottom 5 percent may each be assigned an F, even though the bottom 15 percent may be indistinguishable in achievement. Regardless of the quota-setting strategy used, this relative grading method seldom carries a defensible […].
The distribution gap method, another relative grading variation, is based on the relative ranking of students in the form of a frequency distribution of the composite scores. The frequency distribution is examined carefully for gaps: runs of several consecutive scores that no student obtained. A horizontal line is drawn at the top of the first gap ("Here are the As!"), and a second gap is sought. The process continues until all possible grade ranges (A to F) have been identified.
The major fault with this technique is its dependence on chance to form the gaps. The size and location of gaps may depend as much on random measurement error as on actual achievement differences between students. If the scores from an equivalent set of measures could be obtained from the same students, the gaps might appear in different locations, or the larger gaps might become somewhat smaller. Errors of measurement [...] the major [...] of the distribution gap method [...] appear to be [...] higher grade. Consequently, teachers receive requests from students for [...] points and are reluctant to reexamine test papers to search for that extra point. [...] When scores are highly variable, this grading method is likely to yield grades and rates similar to those assigned by some other relative grading method. However, when scores are relatively homogeneous, the gap distribution method actually may be [...] to students [...]
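The sensitivity of the gap method to measurement error can be illustrated with a small Python sketch. It walks down the distribution and draws a line wherever at least `min_gap` consecutive scores were unobtained; that threshold, the function names, and the scores are all hypothetical, since the text prescribes no specific gap size.

```python
def gap_groups(scores, min_gap=2, n_grades=5):
    """Split the score distribution at gaps, working down from the top.

    A "gap" here is a run of at least `min_gap` consecutive scores that no
    student obtained (an assumed threshold, chosen for illustration).
    """
    distinct = sorted(set(scores), reverse=True)
    groups, current = [], [distinct[0]]
    for higher, lower in zip(distinct, distinct[1:]):
        unobtained = higher - lower - 1
        if unobtained >= min_gap and len(groups) < n_grades - 1:
            groups.append(current)  # draw a line at the top of this gap
            current = []
        current.append(lower)
    groups.append(current)
    return groups


def gap_letters(scores, **kwargs):
    """Map each obtained score to a letter: A for the top group, and so on."""
    return {s: letter
            for letter, group in zip("ABCDF", gap_groups(scores, **kwargs))
            for s in group}


scores = [48, 47, 44, 43, 43, 40, 39, 35, 34, 30]
print(gap_letters(scores))
# One student's 44 becomes 45 (a one-point error of measurement):
perturbed = [48, 47, 45, 43, 43, 40, 39, 35, 34, 30]
print(gap_letters(perturbed))
```

In the first run the scores split cleanly into A through F. After the one-point change, the gaps at 45-46 and 41-42 merge differently: the two 43s move up to A, and every student below them drops one letter grade, even though no one else's score changed.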
[...] relative grading procedure might be labeled the standard deviation method. The method uses the standard deviation in determining the grade cutoff points. [...] the composite scores; then the median and standard deviation of the composite scores are computed. Next, the limits of the range of C performance are determined by adding one-half of the standard deviation to the median and subtracting one-half of the standard deviation from the median. Then add one standard deviation to the upper C cutoff to find the A-B cutoff score. Subtract the same amount from the lower cutoff of the Cs to find the D-F cutoff. Review borderline cases by using the number of assignments completed, the quality of assignments, or some other relevant achievement data to decide if any borderline grades should be raised or lowered. [...] all composite scores, also. A variation of this method that describes the use of relative grading on an institutional basis has been illustrated in considerable detail by Ebel (1972).
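The steps of the standard deviation method translate directly into a short Python sketch. The composite scores are hypothetical, and using the population standard deviation (`statistics.pstdev`) rather than the sample version is an assumption; the text does not say which is intended. The review of borderline cases is left to the teacher and is not modeled.

```python
import statistics


def sd_cutoffs(composites):
    """Minimum composite score for each grade under the standard deviation method.

    C spans the median +/- half a standard deviation; B and D each extend one
    further standard deviation beyond the C limits.
    """
    median = statistics.median(composites)
    sd = statistics.pstdev(composites)  # population SD: an assumption
    return {
        "A": median + 1.5 * sd,  # upper C cutoff + 1 SD
        "B": median + 0.5 * sd,  # upper C cutoff
        "C": median - 0.5 * sd,  # lower C cutoff
        "D": median - 1.5 * sd,  # lower C cutoff - 1 SD
    }


def sd_grade(score, cutoffs):
    """Highest grade whose minimum the score reaches; F otherwise."""
    for letter in "ABCD":
        if score >= cutoffs[letter]:
            return letter
    return "F"


composites = [60, 60, 80, 80]  # hypothetical: median 70, SD 10
cutoffs = sd_cutoffs(composites)
print(cutoffs)  # {'A': 85.0, 'B': 75.0, 'C': 65.0, 'D': 55.0}
print([sd_grade(s, cutoffs) for s in (90, 80, 70, 60, 50)])  # ['A', 'B', 'C', 'D', 'F']
```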
Absolute Grading Methods
Various methods that depend on percent scores as their basis have a long-standing history, but their popularity has diminished [...]. Percent scores from tests, papers, and other projects are interpreted as the percent of content, skills, or knowledge over which students have command, a domain-referenced interpretation. For example, a percent score of 83 means that the student knows 83 percent of the content represented by the instructional objectives from which the test items were prepared and sampled [...]. Grades are assigned by comparing the scores with performance standards: arbitrary standards are applied to percent scores [...] the curve. That is, students with scores in the range 93 to 100 receive an A, 85 to 92 a B, 78 to 84 a C, and so on. The restriction here is on the score ranges rather than on the number of students eligible to receive each of the possible grades. But what rationale should be used to determine each grade category cutoff score? Why should the cutoff for an A be 93 rather than 94 or 90? A major limitation of percent grading as used by some teachers is the use of fixed cutoff points that are applied to every grading component in the course. It seems indefensible to set grade cutoffs that remain constant throughout the course and over several consecutive offerings of the course. What does seem defensible is for the instructor to establish cutoffs for each grading component, independent of the others, depending on the content of each component. For example, the range for an A might be 93 to 100 for the first test, 88 to 100 for a term paper, 87 to 100 for the second test, and 90 to 100 for the final exam.
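Component-specific cutoffs amount to giving each grading component its own lookup table. In the Python sketch below, the A cutoffs are the ones from the example above; every B, C, and D cutoff is hypothetical, included only so that the function is complete.

```python
# A cutoffs come from the example in the text; B, C, and D cutoffs are hypothetical.
CUTOFFS = {
    "first test":  {"A": 93, "B": 85, "C": 78, "D": 70},
    "term paper":  {"A": 88, "B": 80, "C": 72, "D": 64},
    "second test": {"A": 87, "B": 79, "C": 71, "D": 63},
    "final exam":  {"A": 90, "B": 82, "C": 74, "D": 66},
}


def percent(points, maximum):
    """Percent score: points earned as a percentage of points possible."""
    return 100 * points / maximum


def component_grade(pct, component):
    """Letter for one component, judged against that component's own cutoffs."""
    for letter, minimum in CUTOFFS[component].items():  # checked in order A, B, C, D
        if pct >= minimum:
            return letter
    return "F"


pct = percent(89, 100)
print(component_grade(pct, "first test"), component_grade(pct, "term paper"))  # B A
```

The same 89 percent earns a B on the first test but an A on the term paper, because each component's standards were set from its own content rather than from one fixed scale.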
Those who use percent grading find themselves in a bind when the highest score obtained on a test was only 68 percent, for example. Was the test much too difficult, or did students prepare too little? Was instruction relatively ineffective? Some instructors proceed to adjust the scores by replacing the perfect score, 100 percent, with the highest score, 68 percent in this case. For example, if the highest score was 34 out of 50 points, each student's percent score would be recomputed using 34 as the maximum rather than 50. Though such an adjustment may cause all concerned to breathe easier, the new score can no longer be interpreted as originally intended: the proportion of the content domain the student knows, as sampled by the test. A new domain has been established. What useful interpretation can be made of the new scores? How can the new domain [...]
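The arithmetic of the adjustment is simple, which is part of its appeal. The Python sketch below uses the text's 50-point test with a highest score of 34; the raw scores 30 and 25 are hypothetical.

```python
def percent(points, maximum):
    """Percent score: points earned as a percentage of the chosen maximum."""
    return 100 * points / maximum


test_max, highest = 50, 34  # the example in the text
for raw in (34, 30, 25):    # 30 and 25 are hypothetical students
    original = round(percent(raw, test_max), 1)
    adjusted = round(percent(raw, highest), 1)
    print(raw, original, adjusted)
```

The three lines printed are `34 68.0 100.0`, `30 60.0 88.2`, and `25 50.0 73.5`. A student who answered 60 percent of the sampled content now appears to have 88 percent, and that new number no longer describes any proportion of the original domain.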
A final shortcoming of percent grading should be noted. The range of percent scores usually is limited to 70 to 100 because the passing score generally is 70 percent. The test constructor must exhibit great skill to prepare items that will yield scores distributed in this narrow range and that, at the same time, will measure relevant learning as reflected by the instructional objectives. Methods that allow for a lower passing score would permit a greater potential range of scores, likely would yield more reliable scores, and likely would result in more reliable grade assignments, assuming the full range of grades (A to F) is to be used.
A second method of absolute grading, called here the content-based method, depends heavily on the judgments of the teacher in deciding the type and amount of knowledge students must display to earn each grade on the A to F scale. It is the method most compatible with mastery or quasi-mastery teaching and learning strategies, but it need not be limited to pass-fail or satisfactory-unsatisfactory grading scales. The procedural steps for establishing performance standards and cutoff scores are outlined below for a 50-item test built to measure achievement in two units of instruction.
1. First, the grade to be assigned to those who demonstrate minimum [...] achievement must be established. We will use D for illustration purposes, but it could be C, as is common in graduate-level courses. The teacher must develop a description, preferably in writing, of the type of knowledge and understanding a student who barely passes should possess. Similar descriptions must be developed to describe C, B, and A performances.
2. With the descriptions [...] and decides if a student [...] answer it correctly. If so, a D is recorded [...] more than a single point, [...] record the minimum number of points [...] category.
3. This process continues, [...] the estimated cutoff score for [...] D symbols preceding the items. [...] the number of C symbols is tallied and [...] C performance. This process continues until [...] grade has been determined. The results [...]:
A = 48-50
B = 40-47
C = 29-39
D = 17-28
F = 0-16
[...] can be obtained by adjusting the estimated [...], depending on test length. The adjustment [...] measures are less than [...]
The content-based method [...] must exercise subjectivity in describing [...] example, must display. Instructors in th[...] are willing and able to define [...] able to supply a defensible rationale for [...] approach has been described by [...].
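The tallying step can be made concrete with a hypothetical Python sketch. It assumes that each item is tagged with the lowest grade whose borderline student the teacher judges would answer it correctly (or `None` if no borderline student would), and that a grade's minimum score is the number of items tagged at or below that grade. This is one reading of the partly illegible procedure above, not a definitive statement of it; the tag counts were chosen so that the result matches the ranges listed.

```python
from collections import Counter

LEVELS = ["D", "C", "B", "A"]  # lowest passing grade first


def content_cutoffs(item_tags):
    """Minimum score per grade from item-by-item judgments (hypothetical rule).

    A grade's minimum is the number of items tagged at or below that grade,
    i.e. the items a borderline student at that level is expected to answer.
    """
    counts = Counter(item_tags)
    cutoffs, running = {}, 0
    for grade in LEVELS:
        running += counts[grade]
        cutoffs[grade] = running
    return cutoffs


def content_grade(score, cutoffs):
    """Highest grade whose minimum the score reaches; F otherwise."""
    for grade in reversed(LEVELS):  # check A first
        if score >= cutoffs[grade]:
            return grade
    return "F"


# Hypothetical tags for a 50-item test.
tags = ["D"] * 17 + ["C"] * 12 + ["B"] * 11 + ["A"] * 8 + [None] * 2
cutoffs = content_cutoffs(tags)
print(cutoffs)  # {'D': 17, 'C': 29, 'B': 40, 'A': 48}
```

With these cutoffs, 48 to 50 is an A, 40 to 47 a B, 29 to 39 a C, 17 to 28 a D, and 0 to 16 an F.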
A final method relates to the use [...]
[...] over 100 reports describing contract grading and concluded that "contract grading appears to have a permanent place among the most appropriate current methods of assigning grades to students." [...] studies of [...] contracting generally showed that students like it, teachers assigned more high grades than when conventional methods were used, and student achievement was no higher than with conventional grading. Contract grading appears [...] suited to [...] classes and independent study courses [...] given the flexibility to pursue individual interests. In such cases a written agreement should be mandatory so that no misunderstanding with regard to what must be accomplished [...]
GRADING SOFTWARE
The time-consuming tasks associated with recording test scores in gradebooks and combining scores for final grades can be handled readily by a microcomputer and any of the numerous software packages available for grading. Some teachers use a spreadsheet program and devise their own grading application programs; others find the unique features of many of the commercial packages worth their relatively low cost.
Because software and hardware both change more rapidly than most other textbook content, we have chosen not to describe or evaluate specific grading software. However, current information can be located using such references as Data Sources [...]. In addition, software reviews and lists of new releases are printed frequently in such journals as [...] Electronic Learning, [...], Classroom Computer Learning, and The Computing Teacher. Here are some questions to raise when assessing the utility of a gradebook program:
1. How many students and grading components per student can be accommodated on a single data diskette?
2. For the elementary school level, can the system handle multiple classes for a single group of students?
3. How convenient is it to change grades or to replace scores?
4. Can test scores be imported as a data file so they do not need to be keyed in one by one?
5. Does the variety of reporting-printing options satisfy basic needs?
6. Can the data be stored on a data diskette separate from the program diskette so that backup copies can be made of the data files?
7. Are there any unusual hardware requirements regarding memory, drives, or [...]?
8. Can the program be returned for full refund after a reasonable trial period?
Of course, the main question to ask about any grading software is, "Will the program allow me to use the grading procedure and philosophy I have adopted?" For example, software that will not accommodate a teacher's criterion-referenced grading practices should not be considered for adoption, no matter how friendly the package seems to be.
SUMMARY PROPOSITIONS
1. There is nothing wrong [...] students [...]
[...]
4. Grading is frequently the subject of educational controversy because the grading process [...] measures of achievement [...]
[...] to represent summative [...]
16. The weight carried by each component [...] measure [...] components is too small.
17. The selection of either absolute or relative standards as a basis for grading will be influenced more by philosophical considerations than by em[...].
21. The use of contract grading may be advantageous for individualized instructional situations but not for grading classes of students.
22. [...] associated with grading.
QUESTIONS FOR STUDY AND DISCUSSION
1. For what uses are high school course grades most valid?
2. Under what circumstances could it be appropriate [...] grades issued by teachers in [...]?
3. What is grade inflation, and what kind of evidence is needed to show that it has or has not occurred?
4. When letter grades are used on report cards at the middle school level, what information should be furnished to communicate the meaning of each grade symbol?
5. What are some effective means of rewarding students for their superb effort to learn, particularly in the face of relatively low achievement?
6. Why is the use of plus and minus letter grading likely to yield more valid grades than simple letter grading?
7. What advantages does letter grading have over numerical grading?
8. What shortcomings, if any, are inherent in the eclectic grading system described in this chapter?
9. What incentives, other than grades, can teachers use to motivate students to participate in class activities and to complete homework or practice exercises?
10. What are the disadvantages of the "feed forward" concept that is recommended for use in grading assignments? How could each disadvantage you identified be overcome?
11. If the scores from three [...] tests are added together to form a composite for grading, why would each test not necessarily have equal influence (weight) in determining the rank order of individuals in the composite?
12. Under what circumstances might grading on the curve be particularly appropriate?
13. What drawbacks does the standard deviation method of grading have for relatively small classes?
14. Why might 50 be a more appropriate passing score than 70 in a percent grading system?
15. What are the ideal characteristics of a computer gradebook system for use at each of these grade levels: elementary, middle school, high school, college?
The Nature of Standardized Tests
CHARACTERISTICS OF STANDARDIZED TESTS
The term standardized test refers to a test that has been expertly constructed, usually with tryout, analysis, and revision; includes explicit instructions for uniform (standard) administration and scoring; and provides tables of norms for score interpretation purposes, derived from administering the test in uniform fashion to a defined sample of persons. Used loosely, the term [...] published test or inventory, whether [...] or not. Most precisely, tests or measures [...] means for making score comparisons [...] tasks under the same testing conditions and [...] with the same procedures. Of course, no [...]
[...] intended to yield norm-referenced comparisons [...] domain-referenced achievement tests, and some personality [...], all of which may be commercially prepared and [...], usually provide tables of norms. Standardized tests serve the same function in education and psychology as standard weights and measures do in commerce and science. [...] market had its own [...] concept of how much a pound [...] not be sure that a pound of ground beef purchased at one market would be more [...] a pound obtained at another. The same problem would face the consumer at the gas station, the fabric shop, and the candy counter. Without standardized tests, the achievement and abilities of students in different classrooms and schools could not be assessed readily with a common yardstick. For example, if each fifth-grade teacher in a district were to develop a geography test to measure student achievement, we would likely find tests that varied markedly in the breadth and depth of tasks required, the number of items, the amount of testing time allowed, the quality of test items, and the reliability of the scores obtained. Certainly, it would be illogical and inappropriate to make score comparisons among students from different classrooms and schools under such circumstances.
The distinction made in Chapter 2 between tests and measures will be followed here in detailing the characteristics of test batteries and single-subject tests. In addition, because standardized personality measures and inventories are used so rarely by most teachers and administrators, we have chosen to limit our treatment of standardized instruments to tests in the areas of achievement, cognitive ability, and aptitude.
Test Batteries
Some standardized tests are developed, published, and administered in coordinated sets known as test batteries. The number of tests in the set may vary from 3 or 4 to 10 or more, the number of items per test may vary from as few as 20 to 100 or more, and the administration time per test may range from about 10 minutes to more than an hour. The administration of batteries like the Iowa Tests of Educational Development or the Differential Aptitude Tests may take as many as five separate test sessions.
A primary advantage of using a battery over a collection of separate tests, whether for achievement or aptitude measurement, is that the battery provides comparable scores from the same norm group for all its tests. This is important, for example, if Mindy's achievement in mathematics is to be compared with her achievement in reading, language, and science. Her relative strengths and weaknesses cannot be assessed unless norm-referenced scores using a single reference group are available. If separate tests were used, Mindy might seem to do better on the reading test than on the math test simply because students of lower achievement were more prominent in the norm group of the reading test. This illustration explains why aptitude batteries are used so frequently in employment and vocational counseling to help the client understand his or her areas of strength and weakness. The use of separate tests would not permit useful intraindividual comparisons.
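A small Python sketch shows how the choice of norm group, rather than the student, can create an apparent strength. The percentile-rank definition used here (percent below, plus half of any ties) is one common convention; publishers' exact conventions vary. All norm data and Mindy's scores are hypothetical.

```python
def percentile_rank(score, norm_scores):
    """Percent of the norm group scoring below `score`, counting half of any ties."""
    below = sum(s < score for s in norm_scores)
    tied = sum(s == score for s in norm_scores)
    return 100 * (below + 0.5 * tied) / len(norm_scores)


# Hypothetical norms: in a battery, one group takes every test.
battery_reading = [10, 20, 30, 40, 50]
battery_math = [12, 22, 32, 42, 52]
# The same reading performance judged against a lower-achieving norm group.
separate_reading = [5, 10, 15, 20, 30]

mindy_reading, mindy_math = 30, 32
print(percentile_rank(mindy_reading, battery_reading),   # 50.0
      percentile_rank(mindy_math, battery_math),         # 50.0
      percentile_rank(mindy_reading, separate_reading))  # 90.0
```

Against the common battery norms, Mindy stands at the same relative level in both subjects. Against the weaker reading norm group, reading appears to be a marked strength, although her performance has not changed.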
An achievement battery is a survey of the subject matter covered by each test; coverage is broad and, therefore, relatively shallow. A battery can provide comprehensive coverage of most of the important aspects of achievement at the elementary school level, less at the secondary level, and least at the college level. The more uniform the educational programs of all students are, the more suitable a test battery will be for all of them.
A very practical advantage of a battery is that the scores from a battery are reported together on a single report. When separate tests are used, a score report is generated for each separate test, creating a most cumbersome accumulation of paper for the user.
The use of a battery of tests that was developed as an integrated whole thus offers substantial advantages. The main disadvantage is the lack of flexibility [...]. A battery may include some tests that are of little interest to the user and may omit others the user would have preferred. But this is part of the price that must be paid sometimes for the advantages of convenience in use, comprehensiveness of coverage, and comparability of scores. Most achievement [...]
Single-Subject Tests
Tests that measure achievement in one content area, or that measure a [...] a single-subject test. And because such [...] than the corresponding test found in a battery, they will contain more total items and more items per skill [...]
Single-subject tests tend to be used for particular purposes, to make a specific kind of instructional decision, rather than simply to describe students' relative achievement or aptitude levels. For example, readiness tests used at the primary level might help the teacher group students of similar reading or mathematics achievement levels for instructional purposes. A mathematics test might be used to decide which seventh graders most likely are candidates for eighth-grade algebra. Reading tests are used to help select reading materials that would be most appropriate for developing the reading skills of each student [...] course. Proficiency tests and graduation competency tests that are used to make promotion-retention [...]
Some single-subject tests resemble [...] or they provide skill scores. Some single[...] separate scores on vocabulary, spelling, [...] capitalization. A reading test may yield [...] comprehension score, and a total score. Often [...] with one another that their separate [...] the total score is probably a comprehensive indicator of achievement in the broad content domain defined by the test specifications.
Most of the standardized tests of cognitive abilities (intelligence) to be described more fully in Chapter 18 are most appropriately classified as single-subject tests. That is, the trait these tests attempt to measure generally is a single, unitary characteristic. Despite the differences among "intelligence" tests in what they purport to measure and in the theory on which they are based, and despite the fact that some yield subtest scores, most intelligence tests are less like batteries and more like single-subject tests.
TYPES OF STANDARDIZED TEST SCORES
Seldom are the raw scores (number correct) obtained by students on standardized tests interpreted directly. Instead, raw scores are converted to some other score scale to facilitate interpretation. These new score scales are designed to permit direct norm-referenced interpretations by referring to a single reference group (status scores) or to several reference groups that have been linked to the same score scale (developmental scores).
Status Scores
Status scores indicate how a student's test performance compares with those of others in a single reference group: a class, school, school district, or national group. Relative position in the group is the focus. Status or standing in the group generally is expressed as a percentile rank, but standard scores like those described in Chapter 4 frequently are used as well. In most cases stanines, T-scores, or normal curve equivalents (NCEs) are normalized standard scores derived from percentile ranks. The standard age scores or deviation IQ scores that come from cognitive abilities tests are status scores also.
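Normalized standard scores are derived from percentile ranks by finding the normal deviate (z) that has the given percentage of the distribution below it. The Python sketch below uses the conventional definitions T = 50 + 10z and NCE = 50 + 21.06z, with stanines as bands one-half standard deviation wide centred on 5 and limited to 1 through 9. The function name is hypothetical; a publisher's tables may round differently at band edges.

```python
import math
from statistics import NormalDist


def normalized_scores(percentile_rank):
    """Normalized standard scores implied by a percentile rank (0 < PR < 100)."""
    z = NormalDist().inv_cdf(percentile_rank / 100)  # normal deviate below which PR% lie
    return {
        "z": z,
        "T": 50 + 10 * z,
        "NCE": 50 + 21.06 * z,
        "stanine": min(9, max(1, math.floor(2 * z + 5.5))),  # half-SD bands, 1..9
    }


for pr in (14, 42, 50, 84):
    s = normalized_scores(pr)
    print(pr, round(s["T"], 1), round(s["NCE"], 1), s["stanine"])
```

Percentile ranks of exactly 0 or 100 have no finite normal deviate, so `inv_cdf` raises an error for them; reported percentile ranks normally run from 1 to 99.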
The primary purpose of status scores is to help in identifying intraindividual differences in achievement (or ability) across tests in a battery. For example, Vic's percentile rank of 14 in vocabulary indicates a relative weakness compared with a reading percentile rank of 42. Science might be considered a strength for Vic and math a weakness if his science stanine score is 7 and his math stanine is 4. Of course, such comparisons are legitimate only when the same reference group has been used.
Note that the use of status scores to monitor year-to-year progress can mask growth. For example, a student whose reading percentile rank is 87 this year will obtain a similar score next year if normal growth occurs. The sameness conveyed by status scores in this situation could be misinterpreted to mean that no change occurred. In fact, a score of about 87 next year would indicate the student's achievement changed as much as the achievements of others in the norm group. (See the guidelines shown in Chapter 17 for interpreting percentile ranks.)
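A Python sketch makes the masking effect concrete. It assumes, purely for illustration, that the norm group's raw scores are normally distributed with a mean of 40 this year and 48 next year (standard deviation 8 both years), and that the student gains exactly as much as the group.

```python
from statistics import NormalDist

# Hypothetical norm distributions for the same grade group in consecutive years.
this_year = NormalDist(mu=40, sigma=8)
next_year = NormalDist(mu=48, sigma=8)  # the whole norm group gains 8 points

raw_now = this_year.inv_cdf(0.87)  # raw score at the 87th percentile this year
raw_next = raw_now + 8             # the student gains as much as the group

pr_now = 100 * this_year.cdf(raw_now)
pr_next = 100 * next_year.cdf(raw_next)
print(round(raw_now, 1), round(raw_next, 1), round(pr_now), round(pr_next))
```

This prints `49.0 57.0 87 87`: the raw score rises by 8 points, yet the percentile rank does not move, which is exactly what normal growth looks like on a status scale.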
Developmental Scores
Developmental scores indicate how a student's test performance compares with those of others in a series of related reference groups (Hoover, 1983). These groups differ systematically and developmentally in average achievement and are defined in terms of school grade or chronological age. Score scales most frequently used to express developmental level include grade equivalents, age equivalents, and developmental standard scores (sometimes called expanded standard [...]).
Grade equivalents [...] appropriately used in grades K to [...] with school subjects that are studied continuously over several years at increasing levels of skill and complexity. To obtain a table of grade equivalents, the test must be given to a large number of students in each of the several grades for which it is intended. Then the median raw score of students in each grade is determined. The raw score is assigned a grade-equivalent score that expresses the grade level [...] a grade equivalent of 3.2 would be assigned to that raw score. If the median raw score obtained by fourth graders on the same test at the same time was 30.3, then a grade equivalent of 4.2 would be assigned to that raw score. (Does it make sense that a raw score of 26.0 would be assigned a grade equivalent of 3.7?) Grade equivalents usually are expressed to the nearest tenth, each tenth corresponding roughly to one month of schooling in a school year of approximately 10 months. A grade equivalent of 7.4, for example, represents the median performance of seventh graders at the end of the fourth month. Table 16-1 shows the grade equivalents assigned to the typical student in each grade for each of three testing times. Note that the average growth rate from year to year is 1.0 and that the same uniform growth is assumed throughout each year.
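Raw scores falling between two grade-group medians receive grade equivalents by linear interpolation. In the Python sketch below, the fourth-grade median of 30.3 comes from the text; the third-grade median of 21.7 is a hypothetical value, since the source does not give it. The function name is also hypothetical, and it deliberately refuses to extrapolate beyond the groups actually tested.

```python
def grade_equivalent(raw, medians):
    """Grade equivalent by linear interpolation between grade-group medians.

    `medians` maps a grade equivalent (e.g. 3.2) to the median raw score of
    the group tested at that time. Scores outside the tested range are
    rejected here rather than extrapolated.
    """
    points = sorted((r, ge) for ge, r in medians.items())
    for (r0, g0), (r1, g1) in zip(points, points[1:]):
        if r0 <= raw <= r1:
            return round(g0 + (raw - r0) * (g1 - g0) / (r1 - r0), 1)
    raise ValueError("raw score lies outside the normed range")


# 30.3 (fourth graders) is from the text; 21.7 (third graders) is hypothetical.
medians = {3.2: 21.7, 4.2: 30.3}
print(grade_equivalent(26.0, medians))  # 3.7
```

Under this assumption, a raw score of 26.0 lies halfway between the two medians and so receives a grade equivalent halfway between 3.2 and 4.2.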
Grade-equivalent scores can be used to describe a student's developmental level, in terms of school grades, and to measure growth from year to year. But they are less useful for examining relative strengths and weaknesses because, as Table 16-2 illustrates, variability in each test area is different for a given grade group. For example, all sixth graders whose raw scores are at the median in the fall have GE = 6.2, no matter which test area we consider. But performance at the 95th percentile corresponds to a GE of 9.2 for spelling and 8.0 for math computation. If we looked only at grade equivalents to make judgments about strengths and weaknesses, in this example we would erroneously consider spelling a strength, relative to math computation. Because sixth graders are more homogeneous in math computation achievement than in spelling, the range of grade equivalents needed to describe the bulk of this grade group is 4.2 to 8.0 and 3.2 to 9.2, respectively.
Developmental standard scores are similar to grade equivalents in function and have the same advantages and disadvantages of most other types of derived scores. The developmental standard scores shown in Table 16-3 have average growth rates that decrease as students progress through the grades. These particular scores, used with the Iowa Tests of Basic Skills, illustrate a significant limitation of all developmental standard score scales: there is no meaning or interpretation built into a score. What does a score of 120 mean for a fourth grader tested in April? Without access to a chart like Table 16-3, we would need to know these things: (1) median performance in fall of grade 3 is defined as 100, (2) median performance in fall of grade 8 is defined as 160, and (3) average annual growth for grades 3 to 8 is 12. Developmental standard scores are not widely used because of the extra baggage required to interpret them and because they are unfamiliar to teachers and parents.
Table 16-1. Grade-Equivalent Scores for Median Performance at Each of Three Times of Year in Each Grade
Grade   Fall   Midyear   Spring
K       K.2    K.5       K.8
1       1.2    1.5       1.8
2       2.2    2.5       2.8
3       3.2    3.5       3.8
4       4.2    4.5       4.8
5       5.2    5.5       5.8
6       6.2    6.5       6.8
7       7.2    7.5       7.8
8       8.2    8.5       8.8
Note: As will be seen in the examples used later, some publishers drop the decimal point when reporting a student's grade equivalent. For example, [...] and [...] would be interpreted [...]
[Table 16-2. Differences in Grade-Equivalent Distributions by Test Areas (grade-equivalent scores)]
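The "extra baggage" needed to read a developmental standard score can be written down as a small formula. The Python sketch below builds medians for grades 3 through 8 from the three facts listed above (100 at fall of grade 3, 160 at fall of grade 8, 12 points of annual growth). Splitting each year's growth into steps of 4 points for fall, midyear, and spring is an assumption taken from the grade 3 entries of Table 16-3 (100, 104, 108); the helper name is hypothetical.

```python
TIMES = {"fall": 0, "midyear": 1, "spring": 2}


def median_dss(grade, time):
    """Median developmental standard score, grades 3-8 only (hypothetical helper).

    100 at fall of grade 3 plus 12 points per year; the within-year steps of 4
    points are an assumption read from the grade 3 row of Table 16-3.
    """
    if not 3 <= grade <= 8:
        raise ValueError("this sketch covers grades 3 through 8 only")
    return 100 + 12 * (grade - 3) + 4 * TIMES[time]


print(median_dss(4, "spring"))  # 120
print(median_dss(8, "fall"))    # 160
```

So a fourth grader who scores 120 in April is performing at about the median for that grade and time of year, a conclusion that cannot be read from the number 120 itself.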
Grade-equivalent scores are fairly easy to interpret because they are defined on a scale that [...] individuals who have little sophistication in statistics. [...] misinterpretation, just as are status [...] that developmental scores are more [...] scores, but there is no convincing evidence [...] grossly misused or misinterpreted than are status scores (Hoover, 1983). The use of common sense and some basic knowledge about developmental scales are the key ingredients to responsible interpretation of grade-equivalent scores. An example will illustrate.
If Jonnette, a bright fifth-grade girl, gets a grade-equivalent score of 8.4 on an arithmetic test designed for grades 5 and 6, how should her score be interpreted? Chances are this test was not administered to eighth graders, so the value 8.4 is the estimated grade equivalent (by the process of extrapolation). The typical student in the eighth grade, fourth month would score about the same as Jonnette did on this test. However, this does not mean Jonnette can do the same arithmetic as the typical eighth grader. She would need to take a test designed for eighth graders for us to know how she would perform on arithmetic content studied by eighth graders. Students who obtain grade-equivalent scores significantly above or below their own grade level should be retested with a higher- or lower-level test form if the user wishes to obtain more precise indications of their developmental levels. Often the percentile rank, a status indicator, is helpful in making judgments about the value of out-of-level testing for a particular student.
Table 16-3. Developmental Standard Scores for Median Performance at Each of Three Times of Year in Each Grade
Grade   Fall   Midyear   Spring
K-2     [...]
3       100    104       108
4       112    116       120
5       124    128       132
6       136    140       144
7       148    152       156
8       160    164       168
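Jonnette's 8.4 can be reproduced with a Python sketch of linear extrapolation, which also shows why the value deserves caution. The fall medians for grades 5 and 6 and the raw score 39.2 are hypothetical, chosen only so that the result matches the example; the function name is hypothetical as well.

```python
def grade_equivalent_with_flag(raw, low, high):
    """Linear grade equivalent from two (grade_equivalent, median_raw) anchors.

    Returns (ge, extrapolated). `extrapolated` is True when the raw score lies
    outside the anchors, i.e. no norm group tested at that level produced it.
    """
    (g0, r0), (g1, r1) = low, high
    ge = g0 + (raw - r0) * (g1 - g0) / (r1 - r0)
    return round(ge, 1), not (r0 <= raw <= r1)


# Hypothetical fall medians for a test normed only in grades 5 and 6.
grade5, grade6 = (5.2, 20.0), (6.2, 26.0)
print(grade_equivalent_with_flag(39.2, grade5, grade6))  # (8.4, True)
print(grade_equivalent_with_flag(23.0, grade5, grade6))  # (5.7, False)
```

The flag marks exactly the case discussed above: the 8.4 is a projection of the grade 5 to 6 trend line, not a comparison with eighth graders who actually took the test.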
Score Profiles
Only if scores on the scveral tests used are comparable is a profile of
student scores meaningtul. Scores will be comparable if they arc expressed on
rh e s dm e \ r ar u\ s o re s (rl e rrl l p e r(F n rl e r d nt, or al l rl ,p \i me rvpc;t \r,,ndrrd
score) and if the "ame reference (norm) group is used for each one ,q.nexample
ofone student's score prolile is shown in Figure 16 1 The horizontal lines orr
the chart.epresent various percentile ranks, spaced as they would be if rhe rrair
beiilg Deasured b) the scores wa, Dormally dislnbuied There rs a vcrrical line
on the chart for each t€st in the battery. The percenrlle rank values shown ac'oss
the top of the chart for each res( are marked as dots on tbe corresponding verrrcal
scales and connected by lines to form ihe prohle l,arry Hill's perfornance s
about avemge, ovemll (His percentile rank for the total €sr is 52.) His highesr
achievemenr levels (rclative strengrhs) are in reading, vocabulary, and work,srudy
skills His lowest (.elative weaknesset are in language and mathematics
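The normal spacing of the horizontal lines can be sketched with the standard-library `statistics.NormalDist`. The function name is hypothetical, and the chart is assumed to use a unit-normal vertical scale. The sketch shows that equal differences in percentile rank do not cover equal distances on such a chart.

```python
from statistics import NormalDist


def profile_position(percentile_rank):
    """Height (in standard-deviation units) at which a percentile rank is drawn
    on a profile chart spaced as if the trait were normally distributed."""
    if not 0 < percentile_rank < 100:
        raise ValueError("percentile rank must lie strictly between 0 and 100")
    return NormalDist().inv_cdf(percentile_rank / 100)


# Ten percentile points near the middle cover far less chart distance
# than ten points near the top.
middle_gap = profile_position(60) - profile_position(50)
upper_gap = profile_position(99) - profile_position(89)
print(f"50->60: {middle_gap:.2f} SD, 89->99: {upper_gap:.2f} SD")
```

Because of this spacing, the profile does not exaggerate small percentile-rank differences near the middle, where most students score.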
Profiles are most useful for identifying individual needs of students and for vocational and educational planning. A profile also might be used to identify students who should be tested more extensively or to determine if impressions formed from classroom testing and observation are confirmed. Profiles represent a very compact form of visual communication that makes them convenient for reporting and explaining test results to both students and parents. (Additional examples of profiles can be found in Chapter 17.)
Percentile Bands
In an attempt to stress the fact that test scores are subject to error, some test publishers choose not to report an exact percentile rank for each test score. Instead, they provide a range of values within which the "true" percentile rank probably lies. This range is called a percentile band. For example, the test manual may show that the percentile rank for a test score of 63 is between the values 28 and 57. It may go on to stress that the exact percentile rank equivalent is unknown, since it depends on the unknown size and sign (positive or negative) of the error of measurement in the individual's score.
The principle employed in computing percentile bands is the same one involved in using the standard error of measurement (Chapter 5) to find the raw score range in which the true score probably lies. The width of the percentile band depends on two factors: the reliability of the scores and the degree of certainty that the band includes the true value. Low score reliability or high degrees of certainty lead to wide percentile bands. Unfortunately, the broader these percentile bands are, the less useful is the information the test provides.
Figure 16-1. Sample student profile chart (Iowa Tests of Basic Skills, Form G, H, or J).
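The two factors can be seen in a short sketch. It assumes the norm group's scores are normally distributed with a known mean and SD, and it centers the band on the obtained score, which is a common simplification. The test parameters are hypothetical. The standard error of measurement is SD × √(1 − reliability).

```python
from math import sqrt
from statistics import NormalDist


def percentile_band(score, mean, sd, reliability, confidence=0.68):
    """Percentile ranks bounding score +/- z * SEM, assuming the norm group's
    scores are normally distributed with the given mean and SD."""
    sem = sd * sqrt(1 - reliability)                 # standard error of measurement
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # two-sided multiplier (about 1 for 68%)
    norm_group = NormalDist(mean, sd)
    low, high = score - z * sem, score + z * sem
    return 100 * norm_group.cdf(low), 100 * norm_group.cdf(high)


def width(band):
    return band[1] - band[0]


# Hypothetical test: mean 50, SD 10. Same obtained score, different conditions.
print(width(percentile_band(55, 50, 10, reliability=0.91)))
print(width(percentile_band(55, 50, 10, reliability=0.75)))                   # lower reliability
print(width(percentile_band(55, 50, 10, reliability=0.91, confidence=0.95)))  # more certainty
```

Both the second and third bands come out wider than the first. This matches the statement that low reliability or high certainty widens the band.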
One use of percentile bands in a battery of tests is in deciding whether or not a difference between any two scores of an examinee is large enough not to be due solely to errors of measurement.
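This decision is often made with a simple overlap check between two bands. The sketch below assumes that rule of thumb and represents each band as a (low, high) pair of percentile ranks. The first band reuses the 28 to 57 example from the text, and the second bands are hypothetical.

```python
def bands_overlap(band_a, band_b):
    """True if two (low, high) percentile bands share any values."""
    (a_low, a_high), (b_low, b_high) = band_a, band_b
    return a_low <= b_high and b_low <= a_high


def difference_is_meaningful(band_a, band_b):
    """Rule of thumb: treat the difference between two scores as more than
    measurement error only when their percentile bands do not overlap."""
    return not bands_overlap(band_a, band_b)


print(difference_is_meaningful((28, 57), (65, 84)))  # True: bands are separate
print(difference_is_meaningful((28, 57), (50, 75)))  # False: bands overlap
```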
The score report in Figure 16-2 demonstrates the use of percentile bands on scores for Alison Babka from the Iowa Tests of Basic Skills. In the upper-right [...] vocabulary, for example, Alison's [...] which means there is a 50 percent pr[...] is in the range [...]. In the bottom of th[...] scores from each test area. Why [...] of 100 always have a percentile rank band at the top of the "HIGH" area?
There is a possibility of underinterpreting test scores using percentile [...] selecting high level [...] be better off relying [...] for decision making [...] the user can be [...] of confidence for inte[rpreting ...] on the percentile ran[k ...]. Generally, the larger a score difference, the more confident [...] a corresponding achievement difference actually exists. But [...] usually are made in a decision-making context using other [...] are more likely to help than to hinder the process.
Subtest Scores
Just as tests that constitute a test battery provide separate measures of different aspects of achievement, so it is possible to subdivide a single test into separately scored parts to obtain measures of several unique skills. The report in Figure 16-2 demonstrates this. The desire to obtain as much information as possible from a test sometimes leads the test developer to offer a large number of skill scores, each of which may be based on only a few test items. There are two cautions to note with regard to interpreting such skill or subtest scores. First, as the number of separate scores increases, the reliability of each probably diminishes. On many tests, a subtest score based on as few as 10 or 15 items may measure sampling error more than it does true achievement. The percentile bands in Figure 16-2 help alert the user to this possibility.
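The loss of reliability as a score is based on fewer items can be estimated with the Spearman-Brown formula. That formula is a standard result, but it is not stated in this passage. The sketch assumes the subtest items are parallel to those of the full test. The 60-item test with reliability .90 is hypothetical.

```python
def spearman_brown(reliability, length_factor):
    """Projected reliability when a test is lengthened or shortened by
    `length_factor` (new length / old length), assuming parallel items."""
    k, r = length_factor, reliability
    return k * r / (1 + (k - 1) * r)


# Hypothetical: a 60-item test with reliability .90, reported as 12-item skill scores.
print(round(spearman_brown(0.90, 12 / 60), 2))
```

A drop from about .90 to about .64 helps explain why a short skill score may reflect sampling error as much as achievement.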
The second caution relates to the validity of subtest scores. When subtest scores are provided, as is the case with many single-subject reading tests, the developer should provide subtest intercorrelations in the test manual to show how similar or different the subtests actually are. If the correlations are too high, the subtests are all likely to be measuring the same unitary trait or skill. In such cases the responsible test user should focus interpretations on total test scores and ignore the availability of the subtest scores.
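One way to judge when subtest intercorrelations are "too high" is the classical correction for attenuation. The text does not name this technique; it is brought in here only as an illustration. The correction estimates how strongly two subtests' true scores correlate by dividing the observed correlation by the square root of the product of their reliabilities. The reliabilities below are hypothetical.

```python
from math import sqrt


def disattenuated_correlation(r_xy, r_xx, r_yy):
    """Estimated true-score correlation: observed correlation divided by the
    square root of the product of the two subtests' reliabilities."""
    return r_xy / sqrt(r_xx * r_yy)


# Observed intercorrelation of .79, with hypothetical reliabilities .80 and .85.
print(round(disattenuated_correlation(0.79, 0.80, 0.85), 2))
```

An estimate this close to 1.0 suggests the two subtests measure nearly the same skill. Because of sampling error, such estimates can even exceed 1.0.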
NORMS
A norm, which reports how students actually do perform, should not be confused with a standard, which represents estimates of how well they should perform. For example, the standard of correctness in arithmetic calculation in most classes is 100 percent, but the norm (average) of student achievement on a given computation test may be only 85 percent. Often the average performance takes on the function of a standard. That is, the average becomes the criterion against which the scores of individuals are judged to determine their meaning and value. Consequently, few students are regarded as failures in an area of study if their performance is above the norm (average), and few are regarded as successes if their performance is below it.
Norms are sometimes confused with the various types of scores that are used to report them. Percentile ranks, stanines, grade equivalents, and standard scores are all types of scores, derived from raw scores, used to report normative performance; they are not norms themselves. Norms are differentiated by certain characteristics of the reference group that comprises them. There are age norms and grade norms, local norms and national norms, group norms and individual norms, to mention only a few. It is possible to combine the characteristics of a norm group in a variety of ways in an attempt to build highly differentiated norm groups. For example, the National Assessment of Educational Progress (NAEP) reports normative performance based on age, geographic region, race, gender, and community type. We could find out how white nine-year-old boys from the rural West score on test exercises, but seldom is it worthwhile to use so many variables in combination to describe test performance. (And, fortunately, NAEP does not use that many classification variables in a single comparison.)
[...]tions, and who take the test as seriously as will other students for whom the norms are needed. The three R's most often used to judge the appropriateness of a set of norms for a given testing situation are representativeness, relevance, and recency.
Norms obviously must be obtained from students in schools that are willing to take time out from their other responsibilities to help with the norming administration. That very willingness may make them somewhat atypical of the national population of schools and students. To get enough participation from schools to provide a reasonably large norm group is a difficult undertaking. To make it a representative sample is even harder. First, the developer must decide [...]
[...] students. That is, a more relev[ant ...] representative [...] sample [...] private [...] East, South, Midwest, and W[est ...]. The administration of tests to ob[tain ...]
Figure 16-3. [Caption partially illegible; the figure concerns dated norms for school averages on standardized achievement tests over a period of years.]
Individual versus Group Norms
A somewhat serious error that some test users make is to use norms [...] school buil[ding ...] scores [...] to interpret [...]. Although [...] aggregate [...] some other [...] lines, school division, or [...] for the [...] group should be about the same as the median [...] the school averages are [...] student scores. For example, the average score of a truly excellent [school ...] the scores obtained [...]; the [...] school may be lower than [...] in the norm group, and [...] school [...] be better than the scores of [...] school. When [...] interpretations are made, the percentile ranks of the
averages are not likely ever to be lower than 20 or higher than 80; the most extreme degrees of excellence or deficiency are likely to be underestimated drastically. The ideal basis for evaluating school averages (a type of [...]-referenced interpretation) is a separate table of norms for school averages. And the quality of school norms should be judged by the same criteria of relevance, representativeness, and recency as were recommended for individual student norms. Figure 16-3 exemplifies the predicament that can develop when dated norms for school averages are used.
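The reason individual-pupil norms compress school averages toward the middle can be sketched under stated assumptions. Both distributions are taken as normal with the same mean. The spread of school averages is taken as a fixed fraction (`sd_ratio`, here a hypothetical 0.35) of the spread of individual scores. The real ratio varies with the test, the grade, and the school population.

```python
from statistics import NormalDist


def pr_on_individual_norms(pr_among_schools, sd_ratio):
    """Percentile rank a school average would receive on norms built from
    individual pupils' scores, given its percentile rank among school averages."""
    z_among_schools = NormalDist().inv_cdf(pr_among_schools / 100)
    # A school average z SDs above the mean of school averages is only
    # z * sd_ratio SDs above the mean of individual pupils' scores.
    return 100 * NormalDist().cdf(z_among_schools * sd_ratio)


for pr in (1, 50, 99):
    print(pr, round(pr_on_individual_norms(pr, sd_ratio=0.35)))
```

Under these assumptions even the top and bottom 1 percent of schools land between roughly the 20th and 80th percentiles of individual norms. This is why a separate table of school-average norms is needed.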
SELECTION OF STANDARDIZED TESTS
Sources of Information
For those who wish to identify published and unpublished tests that measure a particular trait, or those who seek descriptive information or critical reviews of existing measures, a wide variety of sources is available. Most information will be found in print, but much of it is accessible through computer retrieval.
The Mental Measurements Yearbook (MMY) generally is regarded as the most comprehensive source of information about published tests. The tenth edition (Conoley and Kramer, 1990), the most current printed edition at the time of this writing, includes such descriptive information about each test as author, publication date, [...], number of scores reported, administration time required, and prices for tests and scoring services. In addition, it provides critical reviews by testing specialists and a bibliography identifying research studies in which the measure was used. Tests in Print III (Mitchell, 1983) is a summary reference to information detailed in all the MMYs published previously. (The fourth edition is scheduled to be published in 1991.)
The Buros Institute of Mental Measurements has made several changes to reduce the [...] between volumes of the MMY. First, a biennial publication schedule has begun, and paperback supplements are provided in the alternate years. This means information about current tests is updated in print every year. Second, all MMY information is accessible on-line using Bibliographic Retrieval Services (BRS). The system (MMYD) is updated continually so that a computer search will uncover the most recent descriptive and evaluative information about available tests.
Another source of descriptive information about published tests, Tests: A Comprehensive Reference for Assessments in Psychology, Education, and Business, is a cumulative listing that contains over 3100 entries (Keyser and Sweetland, 1990). A companion publication, Test Critiques, provides comprehensive reviews that include recommended applications, technical information, and an overall critique. At the date of this writing, seven volumes had been published (Keyser and Sweetland, 1985-1990).
The most current information about standardized tests is in publishers' catalogs and the tests themselves. Those who are charged with selecting tests for a school testing program should review a specimen set for each test under consideration. For a nominal fee, the publisher will provide a copy of one form of the test and the accompanying test manuals to individuals who are authorized to use standardized tests. Publisher representatives can answer questions about their tests and processing services through either telephone inquiries or school visits requested by the test-selection committee.
Some tests have been reviewed by measurement specialists in professional publications like the Journal of Educational Measurement or Measurement and Evaluation in Guidance. These reviews, as well as validity studies published in Educational and Psychological Measurement, can be identified readily through a computerized literature search of Current Index to Journals in Education (CIJE) for a very [...]
Finally, college and university faculty members in education and psychology departments often are available to consult with school personnel regarding test selection and use. Some universities are willing to provide consultation and test-scoring services to school districts through their campus measurement and test-scoring centers. Some state education departments supply the same services through area or regional centers established throughout the state to serve school districts.
Selection Criteria
Sources of information available to committees or individuals responsible for selecting standardized tests were described in the previous section. But what information should be sought from these sources, and how should the information be weighed in arriving at a selection decision?
Validity. Without question, test content (what the items or test tasks require the examinee to know) is the most important factor to assess. In selecting achievement tests, it must be determined how well the tests or subtests match the curriculum in terms of content coverage and emphasis. For tests of aptitude and intelligence, [...] make them easy for students to use. [...] should be legible in terms of size and clarity.
Technical adequacy. A test that has been judged to be sufficiently valid to allow the district to accomplish its purposes for testing should be scrutinized further for technical adequacy. The reliability of test and subtest scores should be assessed from data supplied in the technical manual and from comments made by reviewers of the test. The manual should provide data about the equivalency of any alternate test forms that may be available. When developmental scores are [...] are satisfactory.
Practical considerations. Tests that survive a validity and technical screening should be evaluated in terms of certain other important considerations. Schools [...]
SUMMARY PROPOSITIONS
[The propositions are largely illegible in the source. Legible fragments refer to selecting a standardized test, validity, specific strengths and weaknesses, and such practical considerations as time requirements and cost.]
QUESTIONS FOR STUDY AND DISCUSSION
1. Which characteristics of standardized tests are most significant for tests that will be used to make criterion-referenced interpretations?
2. Why might the national norm groups of two different publishers yield different scores (means and standard deviations) if the groups were given a common test?
3. Why are achievement batteries generally less useful at the high school level than at the elementary level?
4. If a math test providing three separate scores has intercorrelations of 0.79, 0.85, and 0.83 between subtests, what are the implications for score interpretation?
5. Why should T-scores be regarded as status scores rather than developmental scores?
6. Why might grade-equivalent scores be less useful for interpreting scores of high school seniors than those of third graders?
7. If a sixth grader is working at the level of the typical fourth grader, is it more appropriate to test the student with a test battery designed for grade 4 or grade 6? Explain your response.
8. What is the meaning of a score profile, using percentile ranks, that forms a straight horizontal line?
9. Why are percentile-rank bands particularly useful in interpreting skill or subtest scores?
10. What are the differences between national stanine norms and local percentile-rank norms?
11. During a period of national achievement score decline, what impact will the use of dated norms have on the interpretations of students' scores?
12. What does it mean when a school's average score of 33.5 for grade 2 science has a national percentile rank of 97?
13. In what sources are you likely to find the most current critical review of a standardized achievement test?
Using Standardized Achievement Tests
There are good reasons why this chapter focuses on use, rather than on selecting tests, describing sample test content, or administering and scoring tests. These are all important aspects of standardized achievement testing, but none is so critical as the use of tests and the scores derived from them. The climate of the 1990s threatens valid test use for two reasons. There is too much testing for purposes for which tests were not designed, and there is too little appreciation for the limited precision with which we are able to measure educational attainments. Consequently, attention in this chapter will be directed toward (1) an analysis of the legitimate uses of standardized achievement tests, (2) illustrations and explanations of score interpretation, (3) planning topics for in-service work in test selection, administration, and interpretation, and (4) an exploration of some issues that affect the quality of a school testing program.
THE STATUS OF STANDARDIZED ACHIEVEMENT TESTING
The great irony of standardized achievement testing today is that, while these tests are being overused to fulfill state and local legislative mandates, their results are being underutilized in serving the instructional needs of teachers and students. To be sure, many school districts have carefully developed testing programs with systematic procedures for interpreting and reporting their test results. But in too many cases the schools are required to give certain tests so that results can be made available for such purposes as interstate and interdistrict comparisons, teacher and administrator personnel decisions, and pupil retention judgments. The many information seekers (legislators, school board members, parent groups, business leaders, and school administrators and teachers) do not share the same agenda and do not, therefore, have need for the same types of [...]
Teachers and administrators generally make less use of standardized achievement-test results than they could, for two reasons. First, educators tend to understand far less about tests and test scores than would be desirable. Their educational preparation programs and in-service education seldom address the essentials of testing and evaluation. Consequently, teachers often can devote less [...] consequences because of low scores on mandated tests must invest the bulk of their energy and instructional time in preparing their students to do well in the areas (usually reading and math) covered by the accountability assessment. As a result, even if other test scores point to areas of weakness, these teachers cannot afford to divert their efforts from the "high-stakes" mandated assessment.
[...] has become so uncritically accepted among many educators and the general public that its validity as a measure of educational accomplishment is virtually unquestioned. To some extent, unfortunately, he may be correct. We [...] expectations. For example, when the bathroom scale gives a higher reading than we expect, how many of us first wonder if the scale is functioning properly? When stopped for a speeding violation, how many drivers first question the accuracy of the radar equipment? But when achieve[ment ...] frequently try to explain substandard test results in terms of teacher quality, funding, or physical resources and simply assume the appropriateness of the measures that provided those results.
Another explanation for the increased use of standardized achievement tests, particularly mandated assessment, is that the celebrated National Commission on Excellence in Education recommended more. In its report, A Nation at Risk (1983), [...] specific framework with explicit purposes was given. In outlining the evidence for concluding that we are a nation at risk, the commission reported 13 "indicators of risk," 11 of which depend on the use of standardized test scores as criteria.
Whether or not the achievement test has become an institution is debatable. What seems certain, however, is that good standardized achievement tests will continue to be needed. They help educators monitor the effectiveness of their efforts and report the outcomes of those efforts to local boards and parents. Careful test selection and wise test-score interpretation and use can make positive contributions to fulfilling these needs.
USES OF ACHIEVEMENT-TEST RESULTS
Standardized achievement-test scores provide a special kind of information on the extent of student learning. It is special because it is based on a consensus of expert teachers with respect to what ought to be learned in the study of a specific subject, a consensus external to and independent of the local teachers. It thus provides a basis for comparing local achievements with external norms of achievement in similar classes. It is useful information because it helps to inform students, teachers, administrators, and the public at large of the effectiveness of the educational efforts in their schools.
Schools sometimes have been criticized for setting up testing programs, giving and scoring the tests, and then doing nothing with the test scores except to file them in the principal's office. These criticisms are justifiable under three conditions: the school faculty and the individual teachers do not study the test results to identify levels and ranges of achievement in the school as a whole and within specific classes; they do not single out students of high and low achievement; and the scores are not reported and interpreted to students, parents, and the public. But the critics may mean that no coherent program of action, triggered specifically by the test results and designed to "do something" about them, emerged from the testing program. If so, the criticisms probably are not justified.
What a good school faculty "does" about standardized test scores is something like what good citizens do with information they glean from a newspaper. Having finished the evening paper, they do not lay it aside and ask themselves, "Now what am I going to do about all this, about the weather, the accidents, the crimes, the legislative decisions, the clothing sales, the stock market reports, the baseball games won and lost, and all the rest?" They may, of course, plan specific actions in response to one or two items. But most of what is memorable they simply add to their store of latent knowledge. In hundreds of unplanned ways it will affect the opinions they express later, the votes they cast, and the other decisions they make. Information can be very useful ultimately, even when it triggers no immediate response.
Educators who properly deplore judging teacher competence solely on the basis of students' test scores sometimes fail to see that it is equally unwise to take action on school or student problems solely on the basis of those same test scores. Seldom do standardized test scores by themselves provide sufficient guidance for wise and effective educational actions. It follows that these test scores should be regarded primarily as sources of useful information, not as major stimuli and guides to immediate action.
A school faculty or teacher who sees the need and has the opportunity should not hesitate to develop a program for action based partly on the scores provided by standardized tests of achievement. But neither should feel that the testing was a waste of time unless such a program is developed. The immediate purpose to be served by standardized test scores is the provision of instructional information, information that can contribute to the wisdom of a host of specific actions stimulated by other educational needs and developments.
Purposes for Testing
All achievement tests, whether standardized or teacher-made, are mainly tools of instruction. That is, they are designed on the basis of the goals of instruction, and their results are intended to show the extent of progress toward those goals. Standardized achievement test batteries provide surveys of the extent of learning in each of several curricular areas; their scores are expected to improve the decisions teachers make about students. The assumption is that teachers will make better instructional decisions about students with such test scores than they would without them (Hieronymus and Hoover, 1986). Scores are not intended to supplant teachers' judgments. Instead, they may help to confirm suspicions and expectations, they may provide conflicting information that should trigger reassessment, or they may point out the need for further, more detailed information.
The purposes outlined by the authors of the Iowa Tests of Basic Skills indicate the importance of serving instructional needs. By their absence, they also indicate the inappropriateness of using such tests for a variety of accountability functions (Hieronymus and Hoover, 1990, p. 1):
1. Describe the developmental level of students so that instructional materials and procedures can be adapted to individuals
2. Diagnose individual strengths and weaknesses in educational development across subject areas and skills within subject areas
3. Determine the extent of readiness to begin instruction, to proceed in an instructional sequence, or to move to an accelerated level of instruction
4. Inform administrative decisions in grouping individuals to accommodate individualized instruction
5. Diagnose group strengths and weaknesses for adjusting curricular content, emphasis, or approach
6. Determine the relative effectiveness of alternate methods or programs of instruction
7. Determine the effectiveness of innovative programs or experimental approaches
8. Provide a means for developing reasonable expectations for student achievement and for describing progress toward such goals
9. Describe student achievement in terms that are meaningful to parents, students, and the general public
Examples of some of these specific purposes will describe how achievement test scores can be used to select students for remedial attention or for enrichment opportunities, to determine readiness for planned instruction, or to diagnose difficulties.
Chapter 1 Selection and Evaluation
Talented and Gifted Selection
[...] national percentile rank of at least [...] on the [...] Skills [...]
[...] programs depends on such nonacademic variables as interest, motivation, persistence, and independence. The nature of the program, the demand for participation, and the extent of local resources may vary enough over time to warrant setting separate criteria for the various TAG programs [...] available.
Kindergarten Readiness
[...] year of kindergarten. Such placement decisions should be made using information that [...] provide[s ...]. Children who are not ready for kindergarten are those who have some kind of developmental deficit (physical, emotional, social) that needs extra attention or time, in the judgment [...] that the regular kindergarten program does not [...] have not had a preschool or home environment that nourished such skills. But these [...] abilities can be learned relatively quickly. [...] The main value of readiness tests is to provide a picture of student strengths and weaknesses at the skill level and to describe the readiness of the class in each subject area. Thus, readiness tests are most useful when given in the mid-fall, a time that allows plenty of [...]
Finally, for reasons similar to those cited immediately above, readiness scores obtained in the spring of kindergarten are not useful for making first-grade placement decisions. Decisions to retain, to place in a transitional program, or to promote are not likely to be aided by readiness-test scores. Shepard and Smith (1986) have argued that retention at this decision point has only negative consequences for most students, no matter what criterion is used. Students who show severe weaknesses in prereading skills (listening, letter recognition, letter-sound association, and language relational concepts) may need temporary individual-program plans to remediate their deficiencies. But retention decisions should certainly be informed by other considerations, not solely by the academic deficiencies of kindergartners.
Diagnosis of Learning Difficulties
Standardized achievement batteries are not designed to be diagnostic aids that provide detailed information for work with individual students. However, most do provide considerable group diagnostic information, particularly those that display results in special reports showing average test-item scores or average skill scores within tests. Instructional planning for a class can be enhanced by taking such data into account. Instructional materials can be selected or developed to improve learning in deficient areas, and time can be reallocated from topics on which students have demonstrated higher levels of accomplishment.
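A report of average skill scores for a class is essentially a grouped average of item scores. The sketch below uses a hypothetical mapping of items to skills and invented responses. Actual publisher reports are computed from their own item classifications and norms.

```python
from collections import defaultdict

# Hypothetical data: the skill each item measures, and each pupil's item
# scores (1 = correct, 0 = incorrect) for one class.
ITEM_SKILL = {"q1": "fractions", "q2": "fractions", "q3": "decimals",
              "q4": "decimals", "q5": "estimation"}
RESPONSES = {
    "pupil_a": {"q1": 1, "q2": 0, "q3": 1, "q4": 1, "q5": 0},
    "pupil_b": {"q1": 0, "q2": 0, "q3": 1, "q4": 1, "q5": 1},
    "pupil_c": {"q1": 1, "q2": 0, "q3": 1, "q4": 0, "q5": 1},
}


def class_skill_averages(item_skill, responses):
    """Average percent correct for the class on the items of each skill."""
    earned = defaultdict(int)
    possible = defaultdict(int)
    for item_scores in responses.values():
        for item, score in item_scores.items():
            skill = item_skill[item]
            earned[skill] += score
            possible[skill] += 1
    return {skill: 100 * earned[skill] / possible[skill] for skill in possible}


averages = class_skill_averages(ITEM_SKILL, RESPONSES)
for skill, pct in sorted(averages.items(), key=lambda kv: kv[1]):
    print(f"{skill:<11} {pct:5.1f}%")  # weakest skill listed first
```

Listing skills from weakest to strongest points to where instructional time might be reallocated. With only a few items per skill, though, such averages carry the reliability caution discussed for subtest scores.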
Any achievement test can provide "diagnostic" information of value to individual students if they are told which items they missed. With the teacher's help, these students can then correct the mistakes or misconceptions that led them astray. Highly specific "diagnosis" and "remediation" of this sort can be effective and is often accomplished with classroom achievement tests. But such feedback and discussion are impractical, if not impossible, with standardized tests.
One reason for the lack of success in educational diagnosis in most fields other than elementary reading and arithmetic is that most learning difficulties are not attributable to specific or easily correctable disorders. Instead, they usually result from accumulations of incomplete learning and of distaste for learning. Neither of these causes is hard to recognize; neither is easy to cure. Diagnosis is not the real problem, and diagnostic testing can do little to solve that problem.
Another reason for this lack of success in educational diagnosis is that effective diagnosis and remediation take a great deal more time than most teachers have or most students would be willing to devote. The diagnosing of reading difficulties is a well-developed skill, and remedial treatments can be very effective. Because reading is so basic to other learning, the time required for diagnosis and remediation is often spent ungrudgingly. But where the subject of study is more advanced and more specialized, the best solution to learning difficulties in an area, say algebra, physics, or German, may be to put off study in that area and cultivate learning in other areas that present fewer problems.
Standardized diagnostic tests in both reading and math are achievement tests used by reading or math specialists to gain information about the learning problems of individual students. These tests are built to allow test takers to demonstrate certain kinds of errors or misconceptions held by students who are having difficulties in reading (or arithmetic computation). Often the results of the subject-matter survey battery indicate a general problem, and the diagnostic test is administered to ascertain the specific deficits in terms of skills and subskills. Unfortunately, diagnostic tests, like other achievement tests, help to identify problem areas, but they seldom provide reasons for the difficulties and cannot prescribe solutions to overcome them. A major challenge to the teacher is to synthesize the entering-behavior information about a student so that instructional strategies and materials can be selected that will optimize that student's conditions for learning.
INTERPRETING SCORES OF INDIVIDUALS
Most test publishers offer such a wide variety of score reports and scoring services that schools sometimes have difficulty deciding which ones they should order. Schools can narrow the choice by reviewing who needs information (teachers, counselors, administrators, parents) and what kind of information is needed (pupil test and skill scores, building average scores, system-wide averages, classroom percent scores, and so on). The list of needs may seem almost endless, but the review process will help to rule out many reports that are either similar to one another or simply not needed. There is good reason to believe that the underutilization of scores by teachers is due in part to the inconvenient format in which the scores are reported to them. Of course, part of the reason for the inadequate reporting is that teachers are seldom consulted about report formats that would be most helpful to them.
When test results come back to a district, nearly every teacher will receive a list report, an alphabetical listing of students and their corresponding scores. At the middle school and high school levels, reports might be arranged by class period for each English, math, science, and social studies teacher. Figure 17-1 is a sample list report showing scores for Mrs. Newton's fifth-grade class on the Iowa Tests of Basic Skills. A review of the scores of Alison Babka will illustrate how scores of individual pupils might be interpreted. Here are some statements that might be made about Alison's performance in the fall of fifth grade:
1. Her Complete Composite grade-equivalent score (the average of the five main scores) is 5.5, the same as that of the typical student at the end of the fifth month of fifth grade.
2. Her Complete Composite percentile rank of 60 means that 60 percent of fifth graders nationally have composite scores lower than hers.
3. In sum, Alison's overall achievement seems about average compared with other fifth graders nationally.
4. Alison's relative strengths are in areas in which her percentile rank is noticeably above her Complete Composite percentile rank: punctuation and math computation.
5. Alison's relative weaknesses are in areas in which her percentile rank is noticeably lower than her Complete Composite percentile rank: language usage, math problem solving, social studies, and science.
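The comparison in statements 4 and 5 can be sketched as a simple rule: a test is a relative strength when its percentile rank is noticeably above the Complete Composite percentile rank, and a relative weakness when it is noticeably below. This is an illustrative sketch only. The 10-point margin standing in for "noticeably" is an assumption, and the percentile ranks are hypothetical, not Alison's actual scores.

```python
# Minimal sketch: flag relative strengths and weaknesses by comparing each
# test's percentile rank (PR) with the Complete Composite PR.
# Assumption: a 10-point margin stands in for "noticeably above/below".

def relative_profile(test_prs: dict[str, int], composite_pr: int, margin: int = 10):
    """Return (strengths, weaknesses) as lists of test names."""
    strengths = [t for t, pr in test_prs.items() if pr >= composite_pr + margin]
    weaknesses = [t for t, pr in test_prs.items() if pr <= composite_pr - margin]
    return strengths, weaknesses


# Hypothetical percentile ranks, for illustration only.
prs = {
    "Punctuation": 85,
    "Math Computation": 80,
    "Reading": 62,
    "Language Usage": 42,
    "Social Studies": 45,
}
strengths, weaknesses = relative_profile(prs, composite_pr=60)
print("Relative strengths:", strengths)
# Relative strengths: ['Punctuation', 'Math Computation']
print("Relative weaknesses:", weaknesses)
# Relative weaknesses: ['Language Usage', 'Social Studies']
```

Because every percentile rank carries measurement error, differences that fall just past the margin deserve the most caution.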
At this point we should be interested in the particular skills that may have contributed most to the strengths and weaknesses identified by the test scores. A report that makes such analysis possible and that provides percent-correct scores for criterion-referenced interpretation is the Student Skills Analysis report in Figure 17-2. (Note that the Individual Performance Profile report, Figure 16-2, also could be used nicely for this purpose.)
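A percent-correct skill score is the share of the items measuring a skill that the student answered correctly; setting it beside class or national averages is what permits norm-referenced comparisons at the skill level. The sketch below is a hedged illustration: the item-to-skill mapping, the responses, and the national averages are all hypothetical, and omitted items are counted as wrong by assumption.

```python
# Minimal sketch of percent-correct skill scores, as on a skills analysis
# report. All item numbers, skills, and averages below are hypothetical.

def skill_percent_correct(responses: dict[int, bool],
                          skill_items: dict[str, list[int]]) -> dict[str, int]:
    """Percent of each skill's items answered correctly (omits count as wrong)."""
    scores = {}
    for skill, items in skill_items.items():
        right = sum(1 for item in items if responses.get(item, False))
        scores[skill] = round(100 * right / len(items))
    return scores


def compare(student: dict[str, int], reference: dict[str, int]) -> dict[str, int]:
    """Student percent correct minus a reference (class or national) average."""
    return {skill: student[skill] - reference[skill] for skill in student}


skill_items = {"Terminal punctuation": [1, 2, 3, 4], "Commas": [5, 6, 7, 8, 9]}
responses = {1: True, 2: True, 3: True, 4: True,
             5: True, 6: False, 7: True, 8: True}  # item 9 omitted
national_avg = {"Terminal punctuation": 78, "Commas": 64}

student = skill_percent_correct(responses, skill_items)
print(student)                         # {'Terminal punctuation': 100, 'Commas': 60}
print(compare(student, national_avg))  # {'Terminal punctuation': 22, 'Commas': -4}
```

Skills measured by only a handful of items produce unstable percentages, so a single missed item can swing a skill score by 20 points or more.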
Across the top of Alison's skills report, the test scores (grade equivalents and percentile ranks) from the list report are reproduced for easy reference. The fourth column of numbers in the bottom section of the report is the percent correct for Alison, and the next two columns, the averages for her class and for the nation, permit norm-referenced comparisons. Here are some statements that might be made about Alison's performance based on skill scores:
1. Punctuation is a relative strength of Alison's, partly because of her performance with terminal punctuation and use of commas. Her other skill scores are much like those of the average student in her class.
2. Alison's math computation performance was bolstered by perfect scores on addition/subtraction of whole numbers and decimals. But whole-number multiplication/division seems to be a weak skill within this generally strong area.
3. The language usage and expression weakness seems to be explained mainly by usage skills, all of which need improvement.
4. The weakness in math problem solving [...] in percentile rank [...] subtraction. There may be some [...] problem-solving test items to determine [...].
5. [...] studies as anything.
6. Performance in science [...] in physics and chemistry [...] weak overall [...] but particularly with respect to topics [...].
7. Alison also had some trouble with the reading of graphs and tables in the visual [...].
8. Though reference materials was not [...] the specific problem [...].
[...] of interpreting the scores of an individual student [...] typical progress was made in each area, [...] same on the two occasions. This table [...] (Feldt, Forsyth, and Alnot, [...]).

[Table: typical growth by percentile-rank band. Only the band labels survive in the source: 85-99, 65-84, 35-64, 15-34.]

[...] one year or 10 months (or 1.0 when the [...]) [...] below average might gain only 6 to 8 months, [...] expected to gain 12 to 14 months. A profile of [...] several successive years can provide a useful [...] adequacy of growth, overall and in the areas previously noted as strengths or weaknesses.
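The growth check described above comes down to subtracting successive grade-equivalent scores and comparing each gain with the roughly 1.0 (ten months) expected of a typical student between testings a year apart. This sketch assumes annual testing at the same time of year; the score history is hypothetical, and the 0.2 tolerance around 1.0 is an arbitrary choice, not a value taken from the publisher's tables.

```python
# Minimal sketch: year-to-year growth from grade-equivalent (GE) scores.
# Assumptions: one testing per year at the same time of year, a typical
# gain of 1.0 GE (10 months), and an arbitrary +/-0.2 tolerance.

def annual_gains(ges: list[float]) -> list[float]:
    """Differences between successive GE scores, rounded to tenths."""
    return [round(later - earlier, 1) for earlier, later in zip(ges, ges[1:])]


def describe_gain(gain: float, typical: float = 1.0, tolerance: float = 0.2) -> str:
    if gain < typical - tolerance:
        return "below typical"
    if gain > typical + tolerance:
        return "above typical"
    return "about typical"


history = [3.4, 4.5, 5.5, 6.2]  # hypothetical fall GEs, grades 3 through 6
for gain in annual_gains(history):
    print(gain, describe_gain(gain))
# prints:
# 1.1 about typical
# 1.0 about typical
# 0.7 below typical
```

A fixed expectation of 1.0 is a simplification; as the text notes, students well below or above average typically gain less or more than ten months per year.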
INTERPRETING SCORES OF CLASSES
The bottom row of the list report shows the average grade-equivalent score and the corresponding [...]. Statements like these can be made:
1. [...] finished the second month. The percentile rank of [...] verifies this interpretation.
2. The relative strengths of the class are in areas in which the percentile rank [...].
3. The relative weaknesses are in the reference materials and reading areas.
4. [...] unexpected test performance [...] by certain students [...].
Thus far, in addition to identifying areas of group strength and weakness, we have tried to verify that the student scores most responsible for these extreme group performances are not due to incomplete testing, random responding, or some other erroneous factor. Students who were not motivated to take the tests seriously could have responded in unpredictable ways that would cause their scores to be incongruent with their typical classroom attainments. Such scores should be ignored temporarily so that subsequent group analysis and instructional planning will not be distorted.
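One way to carry out this screening is to flag scores that suggest incomplete testing (many omitted items) or random responding (a number-right score no higher than the chance score, that is, the number of items divided by the number of answer choices), and then compute group statistics without them. This is a sketch under stated assumptions: the 20 percent omit threshold is arbitrary, a low score is only a warning sign rather than proof of guessing, and the class data are hypothetical. Flagged scores are set aside, not deleted, in keeping with ignoring them temporarily.

```python
# Minimal sketch: screen student scores before a group analysis.
# Assumptions: a multiple-choice test; "suspect" means more than 20% of items
# omitted, or a number-right score at or below the chance score.

def chance_score(n_items: int, n_choices: int) -> float:
    """Expected number right from blind guessing on every item."""
    return n_items / n_choices


def flag_suspect(students: dict[str, tuple[int, int]], n_items: int,
                 n_choices: int, max_omit_share: float = 0.2) -> list[str]:
    """students maps name -> (number right, number omitted)."""
    chance = chance_score(n_items, n_choices)
    return [name for name, (right, omitted) in students.items()
            if omitted / n_items > max_omit_share or right <= chance]


def class_mean(students: dict[str, tuple[int, int]], exclude=()) -> float:
    """Mean number-right score, leaving out any names in `exclude`."""
    kept = [right for name, (right, _) in students.items() if name not in exclude]
    return sum(kept) / len(kept)


# Hypothetical 40-item, 4-choice test (chance score = 10).
scores = {"Ana": (31, 0), "Ben": (28, 2), "Cal": (9, 1), "Dee": (14, 15)}
suspect = flag_suspect(scores, n_items=40, n_choices=4)
print(suspect)                              # ['Cal', 'Dee']
print(class_mean(scores))                   # 20.5
print(class_mean(scores, exclude=suspect))  # 29.5
```

The flags only prompt a closer look; a teacher who knows that a flagged student genuinely struggles would keep that score in the analysis.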
The next step is to determine the skills that might explain the relative strengths and weaknesses noted above. The Group Item Analysis report in Figure 17-3 shows the Visual Materials and Reference Materials skill scores and corresponding item scores for Mrs. Newton's class. Since reference materials was a weakness previously identified, we should look at the scores in the right column to gauge performance in that area. The first column of numbers shows the test item number, and each of the next four columns shows the average percent-correct score for (1) all fifth graders in the nation, (2) Mrs. Newton's class, (3) all fifth graders in the building, and (4) all fifth graders in the school system. The last column, Diff, is the class average minus the national average. It is this column that can help isolate skill deficiencies and the particular item content that contributed to them.
Mrs. Newton's class seems to have had some trouble with alphabetizing [...]
[...] seem to be problematic because [...]; with the exception of item 58, they are sizable. Mrs. Newton will need to decide whether she should plan some instruction in this skill, whether she should incidentally introduce alphabetizing tasks in the course of presenting science, social studies, or other such lessons, or whether her upcoming plans will deal with this skill sufficiently. The weakness in general references might have been expected if the fifth graders had not been instructed in the use of atlases, almanacs, and certain book parts. The district curriculum guide would be a useful reference for deciding about reasonable expectations and possible needs for remediation in situations like this.
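The Diff column described above is a per-item subtraction: class average percent correct minus national average percent correct. A sketch of computing it and listing the items with sizable negative differences follows. The item numbers and percentages are hypothetical, and the -10 cutoff for "sizable" is an assumption rather than a rule printed on the report.

```python
# Minimal sketch of a group item-analysis "Diff" column.
# Item numbers, percentages, and the -10 cutoff are hypothetical.

def item_diffs(class_pct: dict[int, int], national_pct: dict[int, int]) -> dict[int, int]:
    """Class average percent correct minus national average, per item."""
    return {item: class_pct[item] - national_pct[item] for item in class_pct}


def problem_items(diffs: dict[int, int], cutoff: int = -10) -> list[int]:
    """Items whose difference is at or below the cutoff, in item order."""
    return sorted(item for item, d in diffs.items() if d <= cutoff)


class_pct = {21: 48, 22: 52, 23: 61, 24: 70}
national_pct = {21: 66, 22: 69, 23: 75, 24: 72}

diffs = item_diffs(class_pct, national_pct)
print(diffs)                 # {21: -18, 22: -17, 23: -14, 24: -2}
print(problem_items(diffs))  # [21, 22, 23]
```

Flagged items only point to content worth inspecting; whether the skill has been taught yet still has to be checked against the curriculum guide.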
When schools are departmentalized, as they usually are for middle school and high school grades, score reports can be prepared separately for each of a [...] one in Figure 17-1 [...]. This latter report shows the average skill scores of 14 twelfth-grade students on the Iowa Tests of Educational Development [...] and can be used much like the group item analysis report (Figure 17-3) to find skills that help explain areas of strength and weakness. For example, the Sources of Information class average percent-correct score of 58 was only 3 points less than the national average score. One skill, use of encyclopedias and almanacs, was a weak area, and another, use of the Readers' Guide, was a strong area. In science, Mr. White's major area of interest, these students performed slightly below the national average, but no skill seemed to be particularly weak or strong. In Quantitative Thinking, another area of interest to a physics teacher, these students performed slightly better than the national average, particularly in the skills of probability/statistics and exponents.
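Producing reports by class period for departmentalized teachers amounts to grouping the district list report by teacher and period and then summarizing each group. The sketch below assumes a hypothetical record layout (name, teacher, period, and one percent-correct field called `sources_pct`); actual score files from a scoring service will be organized differently.

```python
# Minimal sketch: split a list report into class-period reports and average
# one score field per class. The record layout is hypothetical.
from collections import defaultdict
from statistics import mean


def by_class_period(records: list[dict]) -> dict[tuple[str, int], list[dict]]:
    """Group records by (teacher, period), alphabetized within each class."""
    groups = defaultdict(list)
    for rec in records:
        groups[(rec["teacher"], rec["period"])].append(rec)
    return {key: sorted(recs, key=lambda r: r["name"]) for key, recs in groups.items()}


def class_averages(groups: dict[tuple[str, int], list[dict]], field: str) -> dict:
    """Mean of `field` for each (teacher, period) group."""
    return {key: mean([r[field] for r in recs]) for key, recs in groups.items()}


records = [
    {"name": "Cole", "teacher": "White", "period": 2, "sources_pct": 60},
    {"name": "Abel", "teacher": "White", "period": 2, "sources_pct": 54},
    {"name": "Diaz", "teacher": "White", "period": 5, "sources_pct": 71},
]
groups = by_class_period(records)
print([r["name"] for r in groups[("White", 2)]])  # ['Abel', 'Cole']
print(class_averages(groups, "sources_pct"))
```

Keeping each class's students in alphabetical order mirrors the layout of the district list report, so teachers see a familiar format limited to their own students.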
If teachers in departmentalized schools are expected to review the standardized achievement scores of their students, as they should be, reports by class period should be provided for them. It is unreasonable, for example, for middle school teachers to pore over a list report of 240 eighth graders to find the ones in their own classes.
REPORTING TO STUDENTS AND PARENTS
The most basic use of test scores is to report them to all who need to know, along with a simple interpretation of what they mean. They should be reported to students as well as to their parents because both are key ingredients in school learning. Parents who are informed are likely to be more involved, at home and at school, with their children's learning and are more likely to work cooperatively with the teacher.
Students, too, must be informed about their own test results because they make countless decisions about their own instructional involvement: whether to participate, how much to participate, how much effort to devote, and what kind of personal standards to adopt. And unless students are made aware of their
scores, they are likely to be less motivated to take the next standardized test seriously. When students develop the impression that test scores do not get used or that the scores are not important to others, efforts diminish and test scores lose their usefulness. Such is the unfortunate situation in some high schools where test scores tend to be used administratively by the district, but potential instructional uses are ignored.
Score reports that are marked by simplicity in visual presentation and explanation are best suited for reporting to others. For example, the Individual Performance Profile report (Figure 16-2) discussed in Chapter 16 is ideal for reporting to parents during a parent-teacher conference. This report has several strengths that permit a comprehensive and comprehensible interpretation of results:
1. Percentile ranks, the easiest type of derived score to understand, are used.
2. Percentile bands allow the idea of error to be incorporated into the interpretation.
3. The arrangement of test and skill profiles permits an easy identification of relative strengths and weaknesses.
4. [...]
5. Percent-correct scores allow for criterion-referenced interpretations of skill [...].
(Turn to Figure 16-2 and try to visualize how you would use this report in discussing [...].) [...] are the most useful.
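The percentile bands mentioned above can be sketched by converting the score minus and plus one standard error of measurement (SEM) into percentile ranks, so that the report shows a range rather than a single point. The norm-group scores and the SEM here are hypothetical, and publishers may build their bands differently (for example, with a different multiple of the SEM). The percentile rank follows the definition used earlier in this chapter: the percent of the norm group scoring lower.

```python
# Minimal sketch of a percentile band: percentile ranks of (score - SEM) and
# (score + SEM). Norm data and SEM are hypothetical.
import bisect


def percentile_rank(score: float, norm_scores: list[float]) -> int:
    """Percent of the norm group scoring below `score` (norm_scores must be sorted)."""
    below = bisect.bisect_left(norm_scores, score)
    return round(100 * below / len(norm_scores))


def percentile_band(score: float, sem: float, norm_scores: list[float]) -> tuple[int, int]:
    """Percentile ranks at one SEM below and one SEM above the score."""
    return (percentile_rank(score - sem, norm_scores),
            percentile_rank(score + sem, norm_scores))


norms = list(range(1, 101))  # hypothetical norm group: scores 1..100, one each
print(percentile_rank(61, norms))     # 60
print(percentile_band(61, 4, norms))  # (56, 64)
```

When two tests' bands overlap, a parent can be told that the difference between them may not be real, which is the main interpretive payoff of showing bands.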
School officials are sometimes reluctant, for many reasons, to report [...]
SOME INTERPRETATION PROBLEMS
Because the test results of a student, class, building, or district are influenced by a variety of factors, it is impossible to attribute high or low performance to any one factor. That is not to say that attribution should be ignored. But, as we shall point out, inept attempts to explain the test results of a group can lead to feelings of futility among teachers or to the destruction of test scores as a viable information source.
Judging Teacher Competence
Should the results of standardized achievement tests be used to evaluate the competence of teachers, either individually or as a group? Even when we recognize that test results never tell the whole achievement story, that standardized tests have limitations, and that factors other than teacher competence enter the picture, there is still a good case for arguing that poor achievement (or good) may be the result of poor teaching (or good). If we agree that the quality of teaching influences the quality of achievement, then we must agree also that good measures of achievement have something to contribute to the complex process of evaluating teacher competence. If we do not agree that good learning requires good teaching, why do we try to hire good teachers or try to train them in the first place?
The concern about whether to use test scores as a partial basis for teacher evaluation has intensified because of the many misuses of students' test scores in personnel decisions. The primary purpose of testing, to improve instruction, has been replaced in some cases by the use of scores to make salary decisions, decide promotions, or assign teachers to buildings. In some instances, teacher retention decisions are made by giving heavy weight to the most recent standardized test results. Giving inordinate emphasis to test results for such purposes has destroyed the instructional value of the scores in the affected schools and has narrowed the "instructed curriculum" to closely match test content, if not to include specific test items.
Too much emphasis on test scores in judging teacher competence also has slowed the search for acceptable alternative measures of effective teaching. The existence of objective, quantified indicators like test scores has contributed to complacency about developmental efforts. Research on teacher evaluation continues, but the prospects for more effective methods are no greater for the evaluation of teacher competence than for the assessment of student achievement.
The gross misuses of standardized achievement-test scores for judging teacher competence that we have witnessed over the past decade have caused us to qualify our response to the initial question raised. In the absence of other pertinent information, test results should not be used to make personnel decisions. Test scores that are compromised by attempts to influence personnel decisions are useless for any purpose. Any secondary use of standardized achievement-test scores that erodes the basic instructional purpose for testing should be discontinued promptly.
Judging School Quality
As long as the principal task of the school is to facilitate cognitive learning, any information that describes the extent of such learning seems admissible for judging school effectiveness. From this standpoint, the most effective schools [...] achievement tests can contribute to examining annual growth in the curriculum areas tapped by the tests' items.
The surge of interest in state or district report cards has helped to diversify the thinking of educators about a variety of school quality indicators: attendance, [...] not get assessed and factored into curriculum evaluation because acceptable assessment procedures are not available. Thus, judgments of school quality seldom are based on performance in such [...] physical education. Scores from [...] contribute to the assessment of school quality, but their focus is limited, as the emphasis on them should be.
School achievement, and consequently test scores, are influenced by a number of factors related to the students, the staff, the school, and the community [...] and community financial resources, support for the schools, and population mobility. Sometimes it is difficult to recognize that achievement is as high in a school as should be expected, given the resources, human and monetary, that have been expended. It is equally difficult to recognize that the high achievement observed in some schools is lower than it ought to be, given the nature of resources [...], parents, or students can interfere with [...] outcomes must be held jointly [...] as well.
SCHOOL TESTING PROGRAM ISSUES
A number of decisions face teachers and administrators in the various phases of test selection, preparation, administration, and score interpretation. Some of the guides and manuals that accompany standardized tests speak to these issues, but many do not. The remainder of this chapter is devoted to an analysis of each of several matters that affect test use but are not thoroughly dealt with in most [...]
Teacher In-Service Planning
A test-battery selection committee is faced with four major tasks: (1) review or assess the test-related information needs of the school system and its [...]; (2) develop a list of criteria to be used in evaluating the competing achievement batteries from which a selection will be made; (3) determine the procedures to be used to obtain evidence relevant to each selection criterion and to weigh the evidence from the various selection criteria; and (4) implement the procedures and make a recommendation. In view of the lack of preparation or experience of many educators with these tasks, an in-service program for most selection committees is essential. The topics listed in Appendix D form a comprehensive agenda from which local plans for in-service might be developed.
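Task (3), weighing the evidence from the various selection criteria, can be made explicit by rating each battery on each criterion and forming a weighted average. The sketch below shows only that arithmetic. The criteria, weights, 1-to-5 rating scale, and battery names are all hypothetical; a committee would set its own.

```python
# Minimal sketch: combine committee ratings (1-5) on several selection
# criteria into a weighted average for each battery. Criteria, weights,
# and ratings are hypothetical.

def weighted_scores(ratings: dict[str, dict[str, int]],
                    weights: dict[str, int]) -> dict[str, float]:
    """Weighted average rating per battery, using the same weights for all."""
    total_weight = sum(weights.values())
    return {battery: sum(weights[c] * r[c] for c in weights) / total_weight
            for battery, r in ratings.items()}


weights = {"content match": 4, "norms": 3, "score reports": 2, "cost": 1}
ratings = {
    "Battery A": {"content match": 5, "norms": 4, "score reports": 3, "cost": 2},
    "Battery B": {"content match": 3, "norms": 5, "score reports": 5, "cost": 4},
}
scores = weighted_scores(ratings, weights)
print(scores)                       # {'Battery A': 4.0, 'Battery B': 4.1}
print(max(scores, key=scores.get))  # Battery B
```

The outcome here is close enough that modest changes in the weights would reverse it, which is why the weights deserve as much committee discussion as the ratings themselves.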
When a standardized achievement battery is administered properly, we can be relatively confident that (1) students have responded to the tasks to the best of their abilities, (2) resources provided by the school have been expended judiciously, (3) score interpretations using the publisher's norms are appropriate (meaningful), and (4) year-to-year growth estimates can be represented accurately. To ensure proper preparation and administration, teacher in-service planned around the topics listed in Appendix E should be provided. The specific in-service needs of a district will depend on the extent of annual staff turnover and the extent of previous experience with the battery in current use, among other factors. In addition, when the staff is formed predominantly by experienced teachers, greater emphasis should be placed on the "why" of various procedures rather than on a rehash of the "what."
Finally, test scores need to be placed in the hands of teachers so that instructional decisions can be made about students, classes, or segments of the curriculum. Teachers must be able to recognize discrepant performance, interpret various kinds of scores, locate scores on particular reports, and relate the score information to expectations and previous achievement levels. In-service topics that address these skills, delineated in Appendix F, are the basis for program planning. Posttest in-service is necessary to ensure that the conscientious efforts of test selection and administration are brought to worthy conclusions.
Curriculum-Test Content Match
It is both unfair and illogical to administer a test to students when that test covers topics students have not had an opportunity to learn about. But the other extreme has its limitations, also, as Linn (1983) has pointed out: "Allowing the match between instructional materials and test items to be too close risks losing the capability to measure understanding." The greatest loss, however, is the ability to generalize about what students may be able to do:

Literal match of instruction and testing in the form of practice on the items that appear on the test destroys the measurement value of the test. Inferences about skills and knowledge that are made on the basis of test results become suspect.
The mismatch dilemma illustrates the [...] available from publishers, and local [...] curriculum. Most importantly, the [...] than is likely to result from custom [...] than objectives) are selected by the school [...] be measured, for example, when the [...]
Practice and Teaching to the Test
The preparation of students for testing is an issue [...] of curriculum-test [...] extensive that, in fact, it becomes the [...] test has become a major concern, particularly [...] much preparation is desirable [...] before [...] or practice (and malpractice) using this [...]:
1. General instruction without regard for specific objectives measured by the test in question
2. Teaching of test-taking skills
[...] that may include some [...] specifically chosen because they are known to be measured on [...] standardized tests
[...] Practice using the exact items from [...]
[...] the most questionable position is item 4 [...] or instruction, this form of practice would [...] (except perhaps in a structured [...]). From our perspective, the most reasonable [...] the teaching of test-taking skills (2) and instruction that combines positions 3 and 4.
Early School Testing
The use of standardized tests in the [...] is just as much need to monitor growth, identify [...] and estimate developmental levels [...] strength. However, [...] among the younger [...]
[...] orally, and responses are marked directly on the test booklet. The ability of five-year-olds to handle such testing in the fall of kindergarten has been documented in several studies (Frisbie and Andrews, 1990; Wodtke and others, 1989).
Many primary teachers have been convinced that testing in the early grades is a mistake, that the results are very unreliable, and that some students are placed in a traumatic situation by testing. No doubt some of these teachers have observed student behavior that supports their position. Others probably have been influenced by the misuse of achievement scores in making grade retention-promotion decisions or kindergarten admission judgments. With respect to reliability, teachers often do not have ready access to the supporting technical data for a given test, or they are uncertain about how to interpret the data.
It appears, however, that many primary teachers object to norm-referenced testing because it puts some students in a position of having to answer questions they are unprepared to answer. Consequently, they might say, the love of learning the teacher has tried so hard to instill is undone in a matter of minutes by a test. Obviously, a test with too many difficult questions should not be given to a child because it may produce unnecessary frustration and probably would provide little useful information. But a test with some hard questions is necessary to distinguish students of different achievement levels and to help identify relative strengths and weaknesses. Besides, students have all had experiences in their past that created frustration and some degree of failure: tying shoes, buttoning and zipping, riding a bicycle, or printing their names. When students are told by the test directions that some questions might be asked that they cannot answer, most of them understand and accept the conditions without negative consequences.
Perhaps too much early school testing is done because administrators have required it, rather than because teachers have found the results helpful. As long as legitimate purposes for testing exist, it is paramount that appropriate tests be selected, that the administration directions be followed exactly, and that students be encouraged to do their best. Otherwise the results may not be very valid for any purposes, even those of which the teacher may not be fully aware.
Frequency and Time of Year
The reasons for giving standardized achievement tests hold the answers to the questions "When should tests be given?" and "How often should tests be given?" Since different testing purposes do not point to the same answer, and because a number of practical factors enter in, answering these two questions means weighing trade-offs.
The costs in dollars and instructional time probably should preclude giving a full battery more than once in a school year. Some pre-post testing may be necessary at times for program evaluation, but ordinarily this should involve the readministration of only one or two tests from a battery.
Some districts have begun to look for ways to respond to the intrusion of the multiple national, state, and district testing programs that erode direct instructional time. One response has been to limit the administration of an [...]
[...] to evaluate [...]. Midyear testing [...] time for remediation [...] weaknesses. [...] year testing has an [...] whether intended or not [...] judgments about them may be influenced [...] teaching [...] the test can result [...] an assessment of the effects of year-long [...] of new materials. At the primary [...] information they often use to reconstitute classes for the [...].
Two other [...] merit brief [...] growth, it makes [...] difference which [...] interpretation, but not [...]
Out-of-Level Testing
[...] of adapting testing to the curriculum [...] upon individual [...] to be tested. For example, [...] the class to be tested in the fall should be [...] appropriate for spring testing in second [...] test too advanced in content coverage [...] drop back, is likely to yield more useful [...]. There are usually dramatic achievement differences among students in [...] the same classroom [...], and teachers ordinarily work hard to accommodate those differences by individualizing materials, activities, and [...]
[...] such accommodations, it makes sense that testing also be individualized so that those working at markedly lower or higher curricular levels than their classmates will be tested on the objectives to which their instruction has been directed. An [...]
Out-of-level testing can result in major gains for teacher and student with no loss in interpretability of scores. Such scores are likely to be more accurate because students will experience less frustration with foreign content and will be more motivated to complete the test. Their test and skill scores are [...] likely to demonstrate a pattern of strengths and weaknesses rather than a picture of all weaknesses, that is, undifferentiated performance levels. The grade-equivalent scores that result from out-of-level testing have the same meaning as those from in-level testing. That is, these scores are interpreted without regard to the test level taken. Also, the percentile ranks assigned to a pupil show how that pupil's grade-equivalent scores compare with those of others in the same grade. That is, a third grader is always compared with other third graders, no matter which test level was administered.
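The scoring rule just described, in which the percentile rank depends on the pupil's grade and not on the test level taken, can be sketched as a lookup keyed by grade alone. The norm tables below are hypothetical step functions invented for illustration (as is the floor value of 1 for scores below the table); real publisher norms are far more detailed.

```python
# Minimal sketch: percentile rank for a grade-equivalent (GE) score, looked
# up in the norms for the pupil's own grade. The test level is deliberately
# not an input. Norm tables are hypothetical.
import bisect

# grade -> list of (minimum GE, percentile rank), in ascending GE order
GRADE_NORMS = {
    3: [(2.0, 5), (2.6, 16), (3.1, 35), (3.6, 55), (4.2, 75), (5.0, 92)],
    4: [(2.8, 5), (3.5, 16), (4.1, 35), (4.6, 55), (5.3, 75), (6.2, 92)],
}


def percentile_rank_for(grade: int, ge: float) -> int:
    """PR for the highest table entry whose minimum GE does not exceed `ge`."""
    table = GRADE_NORMS[grade]
    cut_points = [min_ge for min_ge, _ in table]
    i = bisect.bisect_right(cut_points, ge) - 1
    return table[i][1] if i >= 0 else 1  # below the lowest entry: PR 1 (assumed)


# A GE of 3.7 earned on any test level:
print(percentile_rank_for(3, 3.7))  # 55 (compared with third graders)
print(percentile_rank_for(4, 3.7))  # 16 (compared with fourth graders)
```

Leaving the test level out of the function signature is the design point: the same grade-equivalent score is interpreted identically whichever level produced it.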
Testing Special Students
Individualizing testing is one method of accommodating students with special needs, but there are other students who, no matter which test level is selected for them, will require special testing conditions. Students with some form of learning disability, those with visual or auditory deficits, or those with physical handicaps may need extra time, a reader, an answer recorder, or some other form of assistance that requires departure from standard administration conditions. When the goal of testing is to obtain relevant information for individual program planning, all such accommodations should be made. Of course, score interpretations must take into account the special conditions and their effect on the applicability of norms. In such cases, norm-referenced scores are likely to be of little interest or value, except perhaps when local norms are available [...] different report forms may be of value. [...] and that provide item response information will be most useful for building individual programs of instruction. When such information is coupled with teacher observations from the test-administration sessions, the needs of special students can be addressed within the mainstream of the school testing program.
High School Testing
Much discussion about standardized achievement testing tends to focus on grades [...], where these batteries are most prominently used, but these tests are administered in virtually every high school, at least in some grades. There are a number of practical reasons why high school standardized testing with achievement batteries presents some unique problems, concerns, or issues.
First, the nature of the high school curriculum precludes the use of a [...] battery that presumes some kind of [...] continuous growth) in each subject matter [...] juniors take a math course, [...] extension or continuation of [...] social studies, some students are in [...], some are in sociology, and some may [...] such curriculum diversity and [...] students make the traditional achievement [...] nearly every high school student.
A logical response to this curriculum match problem is to focus assessment on more generalized skills that all students are [...] throughout the high school program. For [...] thinking [...] interpretation [...] in high school, college, and [...]. [...] batteries like the [...]
Thus, additional achievement data from [...] supplement the information available [...] causes the scores to be of questionable [...]
The outcomes from standardized [...] school level seem to be of less concern [...] they were at the lower grade levels. Other [...] as more important for making future [...] the results from the batteries are useful [...] curriculum-content coverage and [...] presented by course offerings. [...] development, the scores also are [...] strengths and weaknesses. In short, [...] as useful at the high school level as at [...] greater effort must be expended to convince [...] and to help them develop [...]
SUMMARY PROPOSITIONS
1. The current testing climate, dominated by excessive testing for various accountability purposes, is to the detriment of instructional improvement.
2. The primary and essential use of scores from standardized achievement tests is to provide information to all who are concerned with the […]
3. If scores obtained from a school testing program are to be reported and interpreted to teachers, students, and parents, […] program for using them is necessary.
4. Standardized achievement-test scores are useful primarily in facilitating instruction and in evaluat[…]
5. The use of an achievement composite score for selecting students for gifted educational programs may exclude many who excel in only one particular subject area.
6. Readiness-test scores provide information for instructional planning, but they have much less value for making program placement decisions.
7. Except in the fields of elementary reading and arithmetic, diagnostic testing has proved to be of little educational value.
8. The selection of report formats for standardized test results should be done on the basis of who needs what kind of information.
9. An individual's pattern of strengths and weaknesses can be determined by comparing each separate test score with the battery composite-score percentile rank.
10. Skill and subskill scores provide diagnostic information that may account for weaknesses identified at the test level.
11. The strengths and weaknesses of a class can be identified by treating the class averages as the scores of the "average pupil" and then using the interpretive procedures outlined for use with individuals.
12. It is a mistake to assume that students are unable to understand the meaning of their own test scores or that they care little about their own […]
13. Achievement-test scores provide information that can contribute to evaluations of teacher competence, but such scores never should be used as the sole or primary basis for evaluating teachers.
14. Expectations for achievement levels in a particular school should be developed through an analysis of the characteristics of the students, the school plant and program, and the socioeconomics of the community.
15. Schools must provide teachers with in-service education on the topics of test selection, test administration, and test-score interpretation because preservice educational opportunities on these topics are too rare.
16. When the content match between the test and curriculum is too close or when students are taught the test content too directly, the ability to generalize about what students are able to do is diminished greatly or lost altogether.
17. The use of standardized achievement tests in the early primary grades can accomplish the same purposes that testing at higher levels does.
18. Annual fall testing with an achievement battery is optimal for addressing the several instructional purposes that can be served by standardized achievement tests.
19. It is unreasonable to expect that a single test level can be used to measure achievement in a classroom populated by students whose academic developmental levels may span two to three grades.
20. The diverse nature of the high school curriculum and the varied enrollment patterns of the students make the testing of educational development more useful than the testing of achievement of basic skills at that level.
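Propositions 9 and 11 describe a simple comparison: set each separate test's percentile rank beside the composite percentile rank, and treat class averages as the scores of an "average pupil." The Python sketch below illustrates that comparison. The subtest names, the percentile ranks, and the 15-point margin used to call a difference meaningful are all hypothetical choices made for illustration. They are not values or rules taken from any published battery.

```python
# Hypothetical illustration of Propositions 9 and 11: compare each subtest
# percentile rank (PR) with the composite PR. The 15-point margin is an
# assumed, illustrative threshold, not a published interpretive rule.
MARGIN = 15


def profile(subtest_prs, composite_pr, margin=MARGIN):
    """Label each subtest as a relative strength, relative weakness, or in line with the composite."""
    labels = {}
    for name, pr in subtest_prs.items():
        diff = pr - composite_pr
        if diff >= margin:
            labels[name] = "relative strength"
        elif diff <= -margin:
            labels[name] = "relative weakness"
        else:
            labels[name] = "consistent with composite"
    return labels


# One hypothetical student (all names and numbers invented for the example).
student = {"Vocabulary": 62, "Reading": 58, "Language": 35, "Mathematics": 81}
for subtest, label in profile(student, composite_pr=60).items():
    print(f"{subtest:12} {label}")

# Proposition 11: class averages, treated as the scores of an "average pupil",
# go through exactly the same procedure (again, invented numbers).
class_averages = {"Vocabulary": 55, "Reading": 57, "Language": 54, "Mathematics": 36}
print(profile(class_averages, composite_pr=52))
```

A fixed margin ignores the measurement error of each subtest. A band based on each subtest's standard error would be a more defensible criterion in practice.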
QUESTIONS FOR STUDY AND DISCUSSION
1. What knowledge about standardized testing is needed by the general public to make individuals "informed consumers"?
2. What should a school district do, as a minimum, with its annual standardized achievement-[…]
3. Why should test scores supplement a teacher's judgment about student status and progress rather than teacher judgment supplementing the test scores?
4. What would be a reasonable selection rule for identifying sixth graders for a creative writing […]
5. Why might a child who cannot count or who does not know any letters of the alphabet be able to profit from beginning a traditional kindergarten program?
6. Why are there probably no diagnostic achievement tests in science or social studies?
7. What are the relative strengths and weaknesses of Marty Gerami, shown in Figure 17-1? (Note: the Complete Composite score is the average of scores V, R, L, W, and M.)
8. How can percentile ranks be used to describe and interpret year-to-year growth?
9. What procedures might a teacher follow to explain why the average score of the class is much lower than last year's class?
11. What can be said about relative strengths for a student whose composite-score percentile […]
12. How does a high mobility rate within the community interfere with some aspects of test-score […]
14. In what way might the teaching of test-taking skills be construed as "teaching the test"?
15. Why is it generally not possible to identify individual or group strengths and weaknesses with scores from a criterion-referenced test?
16. Why might October 28 be considered the most ideal day to begin administering an achievement battery in a school?
17. If a student is expected to obtain nearly the same grade-equivalent and percentile-rank scores when tested out of level, what is the point of doing out-of-level testing?
18. What incentives can be used to encourage high school students to perform at their best on standardized achievement tests?
19. What can be done to make the results of standardized achievement tests more useful to high school teachers in virtually all subject areas?
18
Standardized Intelligence and Aptitude Measures
THE CONCEPT OF INTELLIGENCE
Despite widespread acceptance of the idea that intelligence exists, there seems to be no consensus as to just what it is. It presumably has a biological basis in neuroanatomy or brain physiology. Various levels of mental deficiency have been associated with metabolic defects and certain types of prenatal environmental stress (for example, oxygen deficiency, viral infection, and injurious drugs). But thus far no biological basis for differences in intelligence among normal humans has been determined.
In its common and informal usage, intelligence is often characterized as "brightness" or "sharpness." These words suggest responsiveness, perceptiveness, cleverness, and the ability to cut through appearances and confusions to reach understanding. Lack of intelligence is associated with dullness, which suggests a lack of astuteness, awareness, or understanding. Despite their imprecise and informal characterizations, these verbal representations of intelligence are used constantly as we "size up" the abilities of others around us. The outcomes of such informal assessment no doubt have significant impacts on relationships formed, viewpoints entertained, and judgments followed.
Psychologists who study cognitive processes and mental development and functioning differ among themselves in their conceptions of intelligence (Weinberg, 1989). But they are in general agreement with the nonacademicians who perceive intelligence as a composite of mainly three elements: (1) ability to solve practical problems, (2) ability to verbalize, and (3) ability to adapt to varied
demands of the social environment. Some researchers call it the ability to learn to do work in school, the same ability that Alfred Binet was interested in detecting with his early tests. Others characterize it as the ability to reason, to solve problems, and to use the "higher mental processes." Still others emphasize original thinking and the ability to adapt to novel situations. In some discussions "creativity" is posited as a component of intelligence, as some see it, or as a separate but related psychological construct, as others see it.
Definitions: Operational and Analytical
One possible solution to the problem of defining intelligence is to use an operational definition, as is done with a variety of other personality measures. The test used to measure the trait defines what is being measured. That is, intelligence is whatever the test measures. But different tests measure different kinds of intelligence depending on the nature of the tasks in them. Obviously, this approach, whatever its virtues in helping us to think more concretely about what we mean by intelligence, is not going to yield a single, generally acceptable definition.
Another possible solution is to use the methods of factor analysis on the responses of a wide variety of persons to a wide variety of tasks (test items) designed to measure intelligence. Factor analysis is a statistical technique that involves examining the correlations among a large number of item responses to determine if certain homogeneous subsets of items, called factors, can be identified. This approach has shed much light on the extent to which proficiency on certain tasks tends to be related to, or independent of, proficiency on other tasks. But it has provided no compelling definition of intelligence. Different researchers have not used the same kinds of test tasks and, even when they have, they have interpreted their findings somewhat differently. Spearman (1927), for example, found a common, general intellectual factor, but Thurstone (1938) found seven primary mental abilities. The multidimensional "structure of intellect" model proposed by Guilford (1966) is quite elaborate but of mostly theoretical interest. The tasks he used to conceptualize the measurement of intelligence were subdivided finely into 120 aspects of intellectual functioning based on the process, product, and content characteristic of each aspect. Finally, Vernon (1971) is one of several factor analysts to propose a hierarchical theory of intelligence that helps to explain much of the correlational data that have accumulated on the structure of intelligence.
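The core computation behind this approach is a matrix of correlations among item or test scores, from which common dimensions are extracted. The Python sketch below is a simplified stand-in. It is not a full factor analysis, and it is not the procedure used by any of the researchers cited. It builds the correlation matrix for simulated scores and extracts the first principal component by power iteration. The data are synthetic and rest on one assumption: four hypothetical tests share a single general ability plus independent noise. Under that assumption, one large component (a Spearman-like general factor) should emerge.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """Correlation of every test (a column of scores) with every other test."""
    return [[pearson(a, b) for b in columns] for a in columns]


def first_component(r, iterations=200):
    """Largest eigenvalue of a correlation matrix and the matching loadings (power iteration)."""
    k = len(r)
    v = [1.0] * k
    for _ in range(iterations):
        w = [sum(r[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # The Rayleigh quotient of the unit vector v gives the eigenvalue.
    eigenvalue = sum(v[i] * sum(r[i][j] * v[j] for j in range(k)) for i in range(k))
    loadings = [math.sqrt(eigenvalue) * x for x in v]
    return eigenvalue, loadings


# Synthetic data: each of four hypothetical tests = shared general ability + its own noise.
rng = random.Random(1)
ability = [rng.gauss(0, 1) for _ in range(500)]
tests = [[g + rng.gauss(0, 1) for g in ability] for _ in range(4)]

r = correlation_matrix(tests)
eigenvalue, loadings = first_component(r)
print(f"first component accounts for {eigenvalue / len(r):.0%} of the total variance")
print("loadings:", [round(x, 2) for x in loadings])
```

A true factor analysis would also estimate each test's unique variance and often rotate the solution. Different extraction and rotation choices are one way investigators can arrive at different structures from similar data.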
More recently, cognitive psychologists have proposed information-processing models to describe what happens during intellectual functioning rather than studying what results from the process. For example, the triarchic theory offered by Sternberg (1985) is based on these three premises:
1. Intelligence explains the ability of persons to adapt to their environment, socially or culturally. More intelligent individuals are able to adapt in a wider range of social contexts.
2. Intelligent behavior is goal directed; there is a reason for it. Often the
reason relates to wanting to be able to perform cognitive tasks spontaneously or automatically (like an expert) or wanting to be able to handle a novel problem.
3. Intelligent behavior is bounded by the extent to which information-processing skills and metacognitive strategies have been developed.
The current work of cognitive psychologists seems promising for education because it suggests that the intellectual components of individuals may be isolated. This means the functions can be studied separately and the components can be developed to promote quicker learning, improved memory, or more extensive recall capability. Thus far, however, these theories have had little impact on the instruments that dominate the sales market for cognitive abilities tests.
Because the multitude of definitions proposed by psychologists provides no convergence or consensus, the measures of intelligence available for use in our schools do not have a common basis. The implication for those who must select intelligence tests for their school testing program is clear. The operational definition and theoretical bases of each test under consideration must be reviewed, and the test tasks must be examined in terms of the school's purpose for testing. In most cases the nature of the test items will provide a clearer indication of what is to be measured than will whatever criterion-related or construct-related evidence is supplied by the publisher to support the intended use of the test.
THE NATURE OF INTELLIGENCE TESTS
The different conceptions of the nature of intelligence have contributed to the development of a wide diversity of tasks for testing it. Examples of some of the types most widely used on group-administered tests are presented below. (The wide variety of open-ended questions and performance tasks used on some individually administered tests differ considerably from the tasks shown here.) As you read each of these items, try to describe the characteristics of individuals who likely would answer the items correctly and those who likely would not. When you have read all the items, try to synthesize your descriptions to arrive at a verbal description of intelligence.
1. Synonyms and antonyms
Identify the pair of words in each set that are either synonyms or antonyms.
a. accident
b. bad
c. evil
d. worry
2. Verbal analogies
snow : flake ::
a. cloud : fleecy    c. hail : storm
b. icicle : eaves    d. rain : drop
3. Verbal classification
pear   apple   peach
a. beet
b. grape
c. wheat
d. green
4. Which of these is most like a calf?
a. […]
b. cat
c. pony
d. […]
Sentence completion
[…]
c. praised
d. investigated
Sentence interpretation
Given sentence: The date must be advanced one day when one crosses the International Date Line in a westerly direction.
[…]
Number series
[…]
Quantitative relations
[…]
a. X is more
b. Y is more
c. X and Y are the same
[…]
Number sentence construction
30 + […]
a. 111   b. 113   d. 100
III. Abstract Processes
12. Figure classification
[Figures not reproduced.]
13. Matrix progression
Which figure belongs in the blank space?
A   B   C   D
[Figures not reproduced.]
School-Related Tasks
Some of the exercises used to test intelligence (giving synonyms, inter[…]) […]eral achievement test batteries. Abilities to handle other tasks, such as analogy problems, number sentence construction, and problems of classification, usually are learned incidentally (if at all) in school, at play, at home, or elsewhere.
It is sometimes assumed that what a student succeeds in learning incidentally is a better indication of intelligence than is the person's success in intentional learning in school. The assumption may be justified, but the evidence and logic needed to justify it are not obvious. Teaching does indeed assist learning, but it does not make learning automatic, nor does it eliminate the need for effort and ability on the part of students. Intelligence contributes to learning in school as well as out of it.
Obviously, if we wish to compare the intelligence of children who have been to school with those who have not, we should not use tasks that the school tries to teach. As a general principle, if we seek to infer basic ability to learn […] probably have been exposed. Yet, as Coleman and Cureton (1954) have pointed out, even if opportunities for in-school learning could be equalized, there would still remain great differences in the availability of incidental learning. These differences in environments and life-styles among different families, different neighborhoods, and different regions of the country cannot, and probably should not, be eliminated. Therefore, the prospects for equalizing opportunities to learn are essentially nonexistent. School-related tasks probably represent the greatest experiential common denominator for children and, thus, the most appropriate source of items for predicting potential for learning in school.
Nonverbal and Culture-Fair Tests
Some developers of intelligence tests have attempted to minimize, or to […] other form of manipulation. Sometimes even the instructions involve no words, but are given in pantomime. These tests are useful if students who do not all speak the same language must be tested with the same test, or if a student with a severe language handicap must be tested. They may be appealing to those who seek measures of intelligence that are less influenced directly by school learning, particularly language learning. But there is no good reason to believe that these nonverbal tests are more valid measures of intelligence than the verbal tests. Ability to do well on them is learned also. And since verbal facility is so important an element in school learning, and in most other areas of human achievement, the major application for nonverbal tests seems to be with individuals who have significant language problems or with those whose native language is not English.
Most intelligence tests not only require some degree of adeptness with a particular language, but also assume familiarity with a particular culture. This quality limits their usefulness in other cultures. However, attempts to build culture-free tests have failed because testing requires communication, and communication is impossible in the absence of culture and the symbols, concepts, and meanings it embodies.
Attempts to build a culture-fair test by eliminating items that discriminate […] individuals in their response to any test item that cannot be attributed to differences in culture, if culture is defined inclusively enough. Each of us lives in a somewhat different culture. Not only Eskimos and Africans, but also Vermonters and Virginians, farmers and city dwellers, boys and girls, even first-born and next-born in the same family live in somewhat different cultures. The differences are not equally great in all of these instances, but they exist as differences in all cases. […] item that discriminates is unfair […] or culture-fair test to discriminate among individuals, and there is no reason to use a test that does not discriminate between those who have more or less of an ability of interest to the user.
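The argument turns on item discrimination. An item is useful only if examinees who have more of the ability tend to answer it correctly more often than those who have less. One common classroom index of this is the upper-lower discrimination index: the proportion correct in a high-scoring group minus the proportion correct in a low-scoring group. The Python sketch below computes it. The 27 percent group size is a conventional choice rather than a requirement, and the scores are invented for illustration.

```python
def discrimination_index(total_scores, item_correct, fraction=0.27):
    """Upper-lower index: proportion correct in the top group minus the bottom group.

    total_scores: total test score for each examinee.
    item_correct: 1 if that examinee answered the item correctly, else 0.
    fraction: share of examinees placed in each extreme group (27% is a common convention).
    """
    n = len(total_scores)
    if n != len(item_correct):
        raise ValueError("one item response is needed per examinee")
    k = max(1, round(n * fraction))
    # Rank examinees from lowest to highest total score.
    order = sorted(range(n), key=lambda i: total_scores[i])
    lower, upper = order[:k], order[-k:]
    p_upper = sum(item_correct[i] for i in upper) / k
    p_lower = sum(item_correct[i] for i in lower) / k
    return p_upper - p_lower


# Invented data for ten examinees.
totals = [12, 15, 18, 20, 22, 25, 27, 30, 33, 35]
sharp_item = [0, 0, 0, 1, 0, 1, 1, 1, 1, 1]  # answered correctly mainly by high scorers
easy_item = [1] * 10                         # answered correctly by everyone
print(discrimination_index(totals, sharp_item))
print(discrimination_index(totals, easy_item))
```

An item that everyone answers correctly, like `easy_item`, yields an index of 0. It may be perfectly "fair," but it tells the user nothing about who has more or less of the ability.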
SCORES REPORTED FROM ABILITY TESTS
[…] and percentile ranks.
Standard Scores and Percentile Ranks
The raw scores obtained on intelligence tests require norms for interpretation, and these norms (usually age or grade level) are expressed as stanines, some other type of standard score, or percentile ranks. Though the standard scores used by various publishers are known by different names, virtually all are defined by a mean score of 100 and a standard deviation of 16 (or 15 in some cases). For example, when age norms are used, the standard score may be called a standard age score, mental age score, age-equivalent score, cognitive skills quotient, or deviation IQ score.
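Because these standard scores are normalized, converting among them is straightforward arithmetic on the normal curve. The Python sketch below makes three assumptions: a mean of 100, a standard deviation of 16 (pass `sd=15` for scales that use 15), and the usual 4-7-12-17-20-17-12-7-4 percent stanine split. The function names are illustrative and are not part of any publisher's scoring software.

```python
from statistics import NormalDist

# Assumed scale: normalized standard scores with mean 100 and SD 16.
MEAN, SD = 100.0, 16.0
# Cumulative percentages at the upper edge of stanines 1-8 (4-7-12-17-20-17-12-7-4 split).
STANINE_CUTS = [4, 11, 23, 40, 60, 77, 89, 96]


def standard_score(z, mean=MEAN, sd=SD):
    """Convert a z-score to a standard score on the assumed scale."""
    return mean + sd * z


def percentile_rank(score, mean=MEAN, sd=SD):
    """Percent of a normal reference group scoring below `score`, kept within 1-99."""
    pr = round(100 * NormalDist(mean, sd).cdf(score))
    return min(max(pr, 1), 99)


def stanine(pr):
    """Stanine (1-9) for a percentile rank, counting how many cut points it reaches."""
    return 1 + sum(pr >= cut for cut in STANINE_CUTS)


for z in (-2.0, -1.0, 0.0, 0.5, 1.0, 2.0):
    ss = standard_score(z)
    pr = percentile_rank(ss)
    print(f"z = {z:+.1f}   standard score = {ss:.0f}   PR = {pr}   stanine = {stanine(pr)}")
```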
The score ranges in Table 18-1 show the approximate equivalent values of scores commonly reported for intelligence tests. These are the same relationships, based on the normal curve, that were discussed in Chapter 4. The general descriptors are terms that might be used in a narrative report or during a parent-teacher conference to describe a level of performance for either age or grade norms.
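The standard-score ranges in Table 18-1 follow from the percentile boundaries under the normal-curve assumption. The Python sketch below inverts the normal distribution to find the standard score at each boundary, assuming a mean of 100 and a standard deviation of 16. It checks the arithmetic only and is not a substitute for a publisher's norms tables.

```python
from statistics import NormalDist

# Assumed scale: mean 100, SD 16, normally distributed.
SCALE = NormalDist(mu=100, sigma=16)


def score_at_percentile(pr, scale=SCALE):
    """Standard score below which `pr` percent of a normal reference group falls."""
    return scale.inv_cdf(pr / 100)


# Lower percentile boundaries of the grouped stanine bands 2-3, 4-6, 7-8, and 9.
for band, pr in (("2-3", 4), ("4-6", 23), ("7-8", 77), ("9", 96)):
    print(f"stanines {band:>3} begin near standard score {score_at_percentile(pr):.0f}")
```

Under these assumptions the boundaries fall near 72, 88, 112, and 128, so the two middle-outer bands are each 16 standard-score points (one standard deviation) wide.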
Many score reports list standard scores and percentile ranks for both age and grade norms. For students whose chronological age is typical for their grade level, their percentile ranks using either norm group should be the same. But for those who are older or younger than their grade mates, noticeable differences should be expected.
Table 18-1. Relationship between Standard Score, Percentile Rank, and Stanine Ranges Used in Cognitive Abilities Score Interpretation

Standard Score    Percentile Rank    Stanine
128 and above     96-99              9
112-127           77-95              7-8
88-111            23-76              4-6
72-87             4-22               2-3
71 and below      1-3                1

Note: The stanine values show approximate relationships. For example, by definition, 4 percent of the scores are found in each of stanines 1 and 9.

Subtest and Total Scores
In view of the wide range of definitions of intelligence discussed earlier, it should not be surprising to find multiple scores produced from some tests and only a single score furnished by others. Tests built on a unitary theory of intelligence should be expected to report a single score, but those based on a multifaceted theory should produce several, perhaps one for each facet. In addition, theories that promote the idea that intelligent behavior might vary in different content areas would require testing in each of the several distinct domains (for example, mathematical, verbal, abstract, social). In some cases it might be inconsistent with the theory to average the separate scores to obtain a meaningful total (overall) score. For example, when verbal and nonverbal scores are reported, what meaning should be attached to the average of such scores? The specific meaning might be impossible to determine, but most cognitive psychologists would probably accept the average as an indicator of general ability. There may be considerable diagnostic information in the pattern of an individual's intelligence-test battery scores. Consider three students whose standard scores are shown below:
[Table of the three students' standard scores; its layout is not recoverable from the scan.]
The patterns shown by the test scores indicate that important information would be concealed by using only the total (average) score. […] considerable verbal facility, while Zach[…] and a potential for verbal and quantitat[…] of scores would not be misinterpreted. […]tern of strengths and weaknesses appar[…]tions for the individualization of content […]. Of course, all scores that represent […] seem out of character […] verified through further examination or testing, perhaps with the assistance of a school psychologist.
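The point that a total can conceal a pattern is easy to demonstrate numerically. In the Python sketch below, three hypothetical students have identical average standard scores but very different spreads across three subtests. Their scores are invented and are not those of the students discussed above. Only the spread reveals the difference.

```python
from statistics import mean, pstdev

# Invented verbal, quantitative, and nonverbal standard scores.
students = {
    "Student A": (118, 95, 78),
    "Student B": (97, 97, 97),
    "Student C": (82, 76, 133),
}

for name, scores in students.items():
    spread = pstdev(scores)  # population SD across the three subtests
    print(f"{name}: average {mean(scores):.0f}, spread {spread:.1f}, "
          f"range {max(scores) - min(scores)}")
```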
Interpreting Scores of Individuals
[…] intelligence battery with all students in […] those who may have an overall ability level […] information for individualizing instruction. […] of scores from any test, the user should […]
1. How do the separate scores from tests within the battery compare?
2. How do the scores from this testing compare with those from the last testing?
3. How do the scores compare with in-class performance (e.g., behavior in verbal interaction and performance on written assignments)?
4. How do the scores compare with recent scores from standardized achievement tests?
[…] intelligence tests used in the schools are measures of developed skills obtained through experiences both in and out of school. Many of these skills can be nurtured through direct instruction; they need not wait for some kind of maturational unfolding. Consequently, when […] are noted, teachers can intervene with the intent to improve a child's learning […], short-term or long-term memory, ability to retrieve information, ability […] the attributes of concepts, or the ability to classify objects or ideas. Nearly […]
Exceptional students who demonstrate highly developed cognitive skills also require special attention. They might be able to learn faster, handle more complex ideas, and probe greater depths than most of their classmates. Supplemental materials and projects can be used to provide the enrichment they probably need. Extra, high-interest activities need to be kept on hand because these students may require fewer repetitions to learn and so are likely to have more idle time than their peers. The so-called troublemakers in a class are as likely to come from the top of the ability scale as the bottom, especially if individualization of instruction is inadequate.
The sample report shown in Figure 18-1 contains the scores from Mrs. Kessler's second-grade class on the Cognitive Abilities Test (CogAT) (Thorndike and Hagen, 1986). The procedures described in Chapter 17 for interpreting individual scores can be used with this list report, too. Here are some reasonable statements to make about the performance of Anna Aparicio, the first student listed:
1. In terms of the descriptors in Table 18-1, Anna's performance is above average compared to other eight-year-olds nationally.
2. Since the age of most beginning second graders (nationally) is closer to six years than seven, we should expect Anna's age PR and S scores to be higher than the corresponding grade scores (as they are).
3. Anna's verbal performance could be termed a relative weakness because the other two scores are so high. (In an absolute sense, no weaknesses are apparent.)
4. Quantitative reasoning is both a relative and an absolute strength for Anna. She may progress faster than her classmates in math, more so in math concepts and computational skills than in verbal problem solving.
The class averages in the last row of the report in Figure 18-1 indicate that Mrs. Kessler's class has a fairly even pattern of scores: 107.4, 108.3, and 105.8. This means there are no identifiable group strengths and weaknesses to which she might need to adjust. The typical student has scores in the sixth stanine with percentile ranks around 68. Fortunately, there are no students who, on the basis of their battery scores, seem to require further testing to explore the possibility of developmental disabilities.
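The percentile ranks and stanine quoted for Mrs. Kessler's class can be checked from the three class averages. The Python sketch below assumes the mean-100, standard-deviation-16 scale and the normal-curve conversion described earlier in the chapter. The averages are listed in the order given in the text, without battery labels.

```python
from statistics import NormalDist

SCALE = NormalDist(mu=100, sigma=16)            # assumed normalized standard-score scale
STANINE_CUTS = [4, 11, 23, 40, 60, 77, 89, 96]  # cumulative percent at each stanine boundary


def percentile_rank(score):
    """Percent of a normal reference group scoring below `score`."""
    return round(100 * SCALE.cdf(score))


def stanine(pr):
    """Stanine (1-9) for a percentile rank."""
    return 1 + sum(pr >= cut for cut in STANINE_CUTS)


class_averages = [107.4, 108.3, 105.8]  # from the class-average row of the report
prs = [percentile_rank(s) for s in class_averages]
print("percentile ranks:", prs)
print("stanines:", [stanine(pr) for pr in prs])
print("spread of averages:", round(max(class_averages) - min(class_averages), 1))
```

All three averages land in stanine 6, with percentile ranks of about 64 to 70. This is consistent with the description of an even class profile centered near the 68th percentile.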
APTITUDE TESTING
Aptitude tests, like intelligence tests, are not always easy to distinguish from achievement tests because, on the surface, the content seems interchangeable. […] developers of aptitude tests accomplish purposes that achievement tests ordinarily are not intended to serve.
Aptitude tests are measures of potential: abilities that foreshadow success on related tasks at some future time. Their purpose is predictive, and their focus often is narrowed to a single ability or a small collection of related abilities. Someone who has the aptitude to do clerical work, for example, has the prerequisite skills in manual dexterity, attention to detail, and speed with repetitive tasks
to complete many types of clerical work effectively and efficiently. Of course, if the person has performed clerical work previously, we would not need an aptitude test to predict his or her potential as a clerk. In most walks of life, past performance (achievement) is the best predictor of future performance in the same realm of activity.
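Aptitude tests are judged by how well they predict, so their usefulness is commonly summarized by a predictive validity coefficient. This coefficient is the correlation between test scores and a later criterion, such as course grades. The Python sketch below computes a Pearson correlation for an invented set of eight examinees. The data and variable names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of the same length (at least 2 values)")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Invented data: aptitude scores and the same examinees' later grade averages.
aptitude = [45, 50, 52, 58, 60, 63, 67, 70]
later_gpa = [2.1, 2.4, 2.2, 2.9, 2.7, 3.1, 3.4, 3.3]
print(f"predictive validity coefficient: {pearson_r(aptitude, later_gpa):.2f}")
```

The same calculation can be applied to prior achievement and the later criterion. Comparing the two coefficients is one way to check, in a given setting, whether past performance predicts better than an aptitude test does.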
The most common forms of aptitude tests are those used to judge scholastic promise and those used in employment and educational counseling. The American College Test (ACT) and the Scholastic Aptitude Test (SAT) are widely used to make predictions about who is likely to succeed in a college undergraduate program. Other similar tests are used for particular admission decisions: graduate school (GRE, Miller Analogies, GMAT), medical school (MCAT), and law school (LSAT). These tests tend to be highly verbal, but several of them also yield separate quantitative, verbal, and total scores. In addition to these, aptitude-test scores are used in counseling situations to assess promise in musical, mechanical, and artistic endeavors, among others.
It is common to include an aptitude battery in a middle school testing program to aid students, parents, and counselors in planning the most appropriate high school curriculum for students to pursue. However, in most cases the scores from standardized achievement tests and the grades from a variety of subject areas may provide an equally useful planning base. Thus, aptitude tests might be administered to an individual for whom past information is incomplete, conflicting, or clouded due to unusual intervening events in the student's life.
In sum, aptitude tests are designed to predict future performance and are based on content that may have been learned in or out of school. By contrast, achievement tests are intended to describe the current status of an examinee's learning. The content of an achievement test should represent a knowledge domain that we care to know about. The content of an aptitude test, however, need not be bound to a particular domain because the user will not want to make inferences about that domain. Instead, the user of aptitude scores wishes to make inferences about future behavior: what the examinee probably will be able to do, not what he or she can do now.
Intelligence is thought of by many as a general aptitude. In view of the competing theories of intelligence and the nature of the corresponding sets of tasks used to measure it, such aptitudes as verbal, visual-spatial, and quantitative reasoning seem to be fitting descriptors. In addition, the purposes for using intelligence tests are nearly always predictive rather than descriptive. More on the nature of aptitude tests and their relationship to intelligence tests can be found in Cronbach (1984) and Anastasi (1988).
SUMMARY PROPOSITIONS
1. Because of the variety of definitions proposed for intelligence and the lack of consensus regarding how to measure it, schools should not use the results from different tests as though they were measures of exactly the same construct.
2. Educators should choose tests whose content is relevant to the learning tasks of the school: tests that emphasize abilities developed in school rather than those that result from incidental learning.
3. Instead of choosing tests that purport to be culture-free or culture-fair, schools should prefer verbal to nonverbal intelligence tests, particularly for group testing, and […]
4. The age and grade norms used to interpret scores from intelligence tests are usually expressed as standard scores or percentile ranks.
5. The scores from an intelligence-test battery can convey an overall ability level, as well as a pattern of strengths and weaknesses.
6. The particular scores a student obtains on an intelligence test may have implications for deciding what skills the student needs to learn and which instructional procedures might be most effective.
7. Test users should not be surprised to get somewhat different scores from somewhat different intelligence tests, or to find that the same student's score shifts upward or downward from time to time when given the same test.
8. Teachers should regard intelligence tests as measures of general ability in school learning, an ability that is based on prior learning.
9. Aptitude tests are designed to forecast success in some future endeavor, presumably by measuring skills that are essential to successful performance in that endeavor.
10. The focus of achievement tests is what the examinee […] now, but the focus of aptitude tests is what the examinee will be able to do in […]
QUESTIONS FOR STUDY AND DISCUSSION
2. What are some examples of individuals adapting to their social environment?
3. How can a prospective user of an intelligence test determine just what the test actually measures?
4. Could meaningful criterion-referenced interpretations be made with scores from an intelligence test? Explain your answer.
5. How could a set of tasks be culture-fair without being culture-free?
6. Why might the tasks on a nonverbal intelligence test not be considered culture-fair?
8. Why are the intelligence-test scores of students generally more stable over time than are their achievement-test scores?
Sample Evaluation Planning Guide
Health Unit: Physical Fitness (Grade 6)
Assess Entering Behavior
1. Class oral questioning to determine group familiarity with these terms: physical fitness, endurance, body flexibility, body strength, stress, and fatigue.
2. Observation and classification of students as overweight, about right, or underweight; physically normal or handicapped; and typical response to physical exertion as tolerant, somewhat stressed, or overly stressed.
Formative Evaluation
1. Periodic short quizzes dealing with terms, concepts, and relationships between […]
2. Class oral questioning with examples and non-examples to check concept attain[…]
3. Review of draft copy of proposed exercise plan and program.
4. Review, after one week, of exercise and sleep journal entries.
Summative Evaluation
1. Objective test section covering concepts, relationships, and application of principles.
2. Short-answer test section dealing with explanations, descriptions, and problem […]
3. Short essay section requiring the evaluation of a hypothetical exercise program.
4. Development of a two-week journal with daily entries describing personal exercise and sleep activity.
Propositions Obtained from Instructional Materials
Health Unit: Physical Fitness (Grade 6)
1. Exercise is bodily exertion that contributes to developing and maintaining fitness.
2. Exercise can improve blood vessel capacity and heart strength and increase lung capacity.
3. Exercising to increase muscle strength provides protection from back pain and builds abdominal support.
4. Bone thickness and density can be increased by exercising.
5. Fat tissue is replaced by lean muscle tissue as a result of regular exercising.
6. Exercise can alleviate the symptoms of stress: muscle tension and inability to
7. Aerobic exercise requires a minimum of 10 minutes of continuous, rhythmic activity.
8. Aerobic exercise produces these body effects: increased cardiovascular and respiratory efficiency, lower blood pressure, and lower heart rate.
9. A minimum of three 20-minute sessions weekly is required to achieve the benefits of aerobic exercise.
10. Anaerobic exercise is brief, intense physical activity.
11. Anaerobic exercise improves body movement, strength, and speed, usually without conditioning the cardiopulmonary systems.
12. Planning an exercise program includes these components: physical exam, fitness ratings, goal definition, activity selection, and progress monitoring.
13. Fitness goals generally center on cardiopulmonary endurance, muscular endurance, muscular strength, flexibility, and alertness and concentration.
... for an exercise program accounts for frequency, intensity, and ...
Dehydration can be prevented by drinking cold water before and during exercise.
... the sleep cycle ...
... are used to repair body tissue and ...
... person requires depends on age, activity level, and ...
Sample Instructional Objectives
Health Unit: Physical Fitness (Grade 6)
At the completion of this unit the student should be able to:
1. Identify increased body-system efficiencies that result from regular exercise.
2. Explain how regular exercise relates to both bone stress and increased bone density.
3. Describe the effects of exercise on the specific symptoms of emotional stress.
4. Distinguish the purposes and features of aerobic and anaerobic exercising.
5. List the essential components of a plan for developing an exercise program.
8. Explain how to determine each of these rates: maximum heart rate, target heart rate, and resting heart rate.
9. Plan an exercise program consistent with his or her own goals and current health.
10. Keep a complete daily fitness journal for a two-week period of an exercise program.
11. Describe how nutrition and exercise jointly affect body weight.
12. ... the body systems of a person (muscular, circulatory, respiratory, ..., nervous, and excretory).
13. Draw a line graph that shows the number and frequency of dream episodes in one night for most individuals.
14. Estimate the relative amounts of sleep required by individuals who vary in age, activity level, and general health condition.
Test Selection Committee In-service Topics
The teachers and others who are responsible for test selection must have knowledge about (1) the system's curriculum and (2) the essentials of testing to fulfill their obligations. Some of the most significant topics for preparing a selection committee for its primary task are listed below.
A. Sources of Information about Tests
1. Publisher materials (catalogs, specimen sets, ...), ... textbooks
5. Consultants (publisher's test specialists, selected college education faculty)
B. Test Content and Curriculum Match
1. ... review toward assessing curriculum match
2. Judge gender and ethnic representation (names, roles, interpersonal relations depicted)
C. Technical Quality
1. Evaluate reliability evidence in view of user purposes (skill scores, equivalent forms, difference scores within the battery).
2. Assess the appropriateness of available norms (relevance of norm population, representativeness of norm sample, recency of data).
3. Review ... time limits to judge sufficiency and speededness.
4. ... difficulty, including floor and ceiling difficulty, for the time of year of testing.
5. Determine the reading load in nonreading tests (... placement, structure, and ...).
D. Practical Matters
1. Publisher consultative services (teacher in-service ...)
2. ... format and type size
3. Availability of practice materials
4. ... of cognitive skills is available
Teacher In-service Topics Regarding Preparation for Test Administration
The topics listed below are among the most relevant to be addressed in teacher in-service programs prior to test administration.
A. Selection of Appropriate Test Levels
1. Review ... to establish the most appropriate test level for each grade.
2. Assess the need for individualized (out-of-level) testing within classrooms by using individual students' current reading and developmental levels.
B. Test Scheduling and Advance Preparation
3. Outline procedures and develop a schedule for makeups.
C. Test Administration
1. Review the total time schedule and the need to adhere to it.
2. Review systematic procedures for materials distribution and collection.
D. ...
Checking documents for completion and for quality
Teacher In-service Topics on Achievement-test Score Interpretation
A posttesting in-service program can prepare teachers to understand the meanings of scores, the locations of scores on their reports, and methods of using the reports effectively. This list identifies key concepts and principles teachers must understand to interpret standardized achievement-test scores.
A. Selection of Publisher Score Reports and Services
1. Review the system's purposes for testing and identify report formats that will facilitate achieving each purpose.
2. Match the specific goals of teachers, counselors, and administrators with the types of reports that facilitate attaining the goals of each group.
B. Differentiate the Types of Scores Reported
1. Distinguish composite, test, skill, and item scores.
2. Describe the purposes of raw and percent scores.
3. Explain the meanings and purposes of developmental scores (GE, SS).
4. Explain the meanings and purposes of status scores (PR, NCE, stanine).
5. Describe class average scores in terms of the "average student."
C. Understanding the Norms Used in Reporting
1. Describe the nature of each norm group (local, national, Catholic, large ...).
2. When appropriate, differentiate norms for pupil scores and norms for school averages.
3. Explain the effect of time of year of testing on GEs and PRs.
D. Extracting Information from Test Reports
1. Assess annual growth of individual students using profile charts or cumulative records.
3. Estimate individual developmental levels with ...
4. Monitor annual growth of classes (or grades), comparing it with expectations and accounting for student migration.
5. Identify class (or grade) strengths and weaknesses in curriculum areas, using ... tests and skills.
6. Use group data to describe skill and item performance (norm- and criterion-referenced).
7. Explain test results to students and parents.
E. Basic Interpretive Considerations
... developmental and status scores
3. Describe how to set reasonable performance standards for use with criterion-referenced interpretations.
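The status scores this list asks teachers to explain (PR, NCE, stanine) all express a student's standing relative to a norm group. The short Python sketch below illustrates how the three are related when norm-group scores are assumed to be normally distributed. The raw score, norm mean, and standard deviation are hypothetical, the function names are illustrative, and published tests derive PRs from empirical norm tables rather than from the normal curve, so actual score reports may differ from these values.

```python
import math


def normal_cdf(z: float) -> float:
    """Proportion of a standard normal distribution lying below z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def percentile_rank(z: float) -> float:
    """Percentile rank (0-100), assuming normally distributed norm-group scores."""
    return 100.0 * normal_cdf(z)


def normal_curve_equivalent(z: float) -> float:
    """NCE scale: mean 50, standard deviation 21.06."""
    return 50.0 + 21.06 * z


def stanine(z: float) -> int:
    """Stanine scale: mean 5, SD 2, rounded half up and limited to 1-9."""
    return max(1, min(9, math.floor(2.0 * z + 5.5)))


def status_scores(raw: float, norm_mean: float, norm_sd: float) -> dict:
    """Convert a raw score into z, PR, NCE, and stanine for a hypothetical norm group."""
    z = (raw - norm_mean) / norm_sd
    return {
        "z": z,
        "PR": percentile_rank(z),
        "NCE": normal_curve_equivalent(z),
        "stanine": stanine(z),
    }


if __name__ == "__main__":
    # Hypothetical values: raw score 42 in a norm group with mean 36 and SD 6.
    for name, value in status_scores(raw=42, norm_mean=36, norm_sd=6).items():
        print(f"{name}: {value:.2f}")
```

With these hypothetical values the student stands one standard deviation above the norm-group mean, which corresponds to a PR of about 84, an NCE of about 71, and a stanine of 7. The NCE standard deviation of 21.06 is chosen so that NCEs of 1, 50, and 99 coincide with PRs of 1, 50, and 99; between those points the two scales diverge, which is one reason the two should not be read interchangeably.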
Author Index
Coifnan, W E, ll5, r8d
a nd re{s, K, II 0, 20?
Cro.ter L M,98,2Or
Cronb,ch, i- J.,84, t09, Sao
DaYis,F 8, 231
BImm, B, S,5I- 5S, 126
Bou ldin &K E,42
Divine,J H,,201
DorF8r€mne, D W.,241
350
Downing, S M, 3!t, 153, 177,20.1
Ebel R L, 52-55, 84 85, 104,124,|93, 135-56,Il3,
148-{9, r?!r, 202, 233, 230
Enioll E c., 192,26?
Ke l l t , E C , I t
Kib r e r ,R J , 5 2
leldt, L S-,88, 3I3
Kr e a e r ,C D , l 5 l ,
'7 7
t risbre,D A,91 , ll0, 1 55,135 36, l5l, I t l, 177,
207, 267, 210, 212, 27A
Cagna,R M, 52 53
Olaser,R L ,27 ,35 ,5l
Grissold, P A, 267,2?0
Linn, R L, 50, 322
Cu'lford,J P, 39, 252, 3.11
H adl€t ,S T,26 7
H aladt na T, M, lsa , 17 7, 204
Me3si.k,S, 10, 105, 109
Hi€ronynus, A N, 209, 306
H ills,J,R, 1 38
Hively, W., 35 36
Ho€I,P G,63
M ir r m a n , J , 3 4 ! 6 , 2 0 t - 2
Hoover. H D, ?89, 506
Hru, L M ,, 13 6
Idin, L K, 136
ode l l , C W, 1 9 0 , 2 6 6
ory,l C,171
Pauerson,D C., 156
swe€rland,R C,, 299
Tqlor, H,282
E s, 15 6, 2b?
Qu eUma lz,
TerwilliSer,J S, 267, 282
Thorndite, R L , 80, i39
Ric hrd so n,M W,83
'I i r f i n , J , 2 0 3
v € r n o n , P t , 1 9 2 ,3 3 r
S ab ers,
D L , t?? , 2 t. t
I
srig8ins,R.J, ll? . 2a 2,2. 17,267,2i0
SubjectIndex
\!6otur. 8hd ing. 2bE
6t
-|J]sare sEn.iard5. 94
k t u tab, t , L!.I
2 4 ,7, g
Il s nc _u qon
open.bool(204-5
", ^^,
zi.lfurn!
nerh;s. 32 33
.mpaed
Htrh apritu
j59
ryhn!
o[ lg-pU, S
r
@euoro. 9, St 39
j,i"ijilii..:,;.',?#'
..-".',
iliL:,i
s€parak ,nswer she.L
rp<r,r {udenG, ]]26 200
r.lttur
rer
l,3_2;
.!fur
q!s, I 2, SO!-s
qreic
6tins. qos 9
<F rdol, 308, 323-24
r+ k!6L 326-27
q:E!
rdlG.
jog 19,126
ElE.FFknce,3Os.
$'5j-;;;3'1'*
ffi,fl'-"'
. r.ait_ 245_46, 250.61
AnaJtaicat\.orjnR. I95_96
^rs€r ch:Dgrng.2nt
hpur udeG s 6 , j j 9 - 4 0
j2o-el
ni'o
tnn-'oo ''
Frp,Drion.2oa
Fojtari
206 c
-E
:-Fl'-{tud,216-17
-E
,
'o'-u
1.
463
304
SUB'ECT
INOEX
Bia3,253, 335
Bimodal diitribuLion, 59
Biierial corrcladon,232
Cu r o f f s c o r e , 3 73 3 , 9 t
de.bi.n consisrency,
9?
rnreryrehdoDs*ilh, 37 3a
C€ntral (end€n.t, 59 60
Ce.rin. ion resing, l2-13
Charting paii.rparion, 261, 274
Chear tn go n re is,9 3,2 06 9, 329
ch€crlish. t,12 247 50
Decision.onskrercl, 94 98 t2.l
D.rived scores(r.r SGndard scores,percenrjle
ranks,S.ore rnrerprerarion)
Dev€lopm€nel scores,239 !2
Diagnosri. teving, 308 9
Dim c u l l y , 9 0 ,1 3 0 3 1 2 2 3 , 2 j . ) 6Z t S 3 1 , 2 3 i 3 3
.riterion ret ren.ed r.n, t!r. 2t3, 2J7-33
disLribuuonof indi.es 228-31
pcrfortrran.eevaluation.t49 50
trodu.r flaluarion, 248 49
Puryoses,l4t .!3, 250
clasifi.ation trem (Jrr Mar.hing itens)
Cognni"e ouLcomes,It 19,41 -17,100
i r e n , 1 3 0 3 1 , 2 2 3 .2 2 6 2 2 8
problens in measuring,46 ,t
Conpa.abl€ scores,217
conpl€don itens (.", ShorLansweriLens)
Codpul€ r asnre d t€ srin g,l l: lt , 2lO ll, t l6- t ?,
23 3
rdm,nisLra Lto n,lll2 216 l7
Sradrngn,fiware 281
iLcmbanting, ll l2
.e.ordkeping, r2,283
r€porrn8 resurb, l2
3.orinF and analysis,210 11
Discrimrnauon,89-90, 223-2,1,226, 23r 32,
237 38
Concurrenrevid€nce,106, 103
Con$ndretar.d €lideDce,102, t08 10, l3l
Disoa.r€rs (rr. Muldple choi.e iLems)
Do h a i n d d i n n i . n , 1 0 4 5 l 1 8 - 1 9
Domain rererenced(.?, Crikrion fererenc.d)
€xtran€ors!arian.€, 100, r3l
trnderrepreenradon,r0!, 131
contentrelakd evid.nc., l0! 6, 221-22
correcrion aor guessin8,133,?0r,211 r:l
Cor..larion .oemcienk:
zppti. tlons,70,12-74, JJ, 106 7
i.rerprehiion, 70 74
P hi,23 8
Produccmomenr,72
domain ref€.enc€d,94
interpreatiotu, 195,Sl0-13, 318
nem anary$8Pro.edurd! 23?-33
obJ€ctivelrrferenced,
56-37, 118-t9
reliabiliV esrimadon,9.1-98
Crirerton.rrl2ted€viden.e, 102, tO6-a
C.iri.ishs of L.srs,3 7, 524
Cumularivet €qu€nct,65
cus6n'z.d t.6ring,8, S23
.rirerion ref€renc€d,22.1,237 3B
in d e x o i 2 2 6 , 2 3 r - 5 !
irem s€l€crion,232-233
potDrbiserialrndex,232
pre p.{ diff€.€n.c index, 238
upper lorer difc.en.e index, t26,2st
EtaecLtles.ore.ange,9l, I50, 2?,
Ern.iency, 127,222
!nrefing behavior,2? 20,253
Basic rea.hin8 Model, 27-t3
€laiuaiion plannin8,29, 3.12
m€dods of ase$in8,253
EquilalenLfortu n.rhods, 82
lsat resb, l0l, ll5 l?, rr2 23, t8S-9?
characr€rsa.5!139_90
.omparison with objecdr€ rens, l15 17, l2t 23
Burd€linestor preparing,193 9.1
reliabiht of.adngs, 192-93,197
reliabilny of scores,191-92
s.orin8 m€rhodR,194-9?
wrnrDgabilil/, 189-90
E a l u a d o n , 2 39 0 , 2 4 1 6 1 , 3 / , 2, 1 8
Basi. Tea.hiDgMod€1,27-23
fornarive, 24, 29, 24I 42, 3.12
inlormal meftods, 241,61
plannin& 28-30,,19,9.12-45
relaLedro rn(rucdon, 26 30
SLJBJECT
INDEX
.elrred ki n,ea\u.en,rnr,26
r.larcd r. Lcning. 2li
sunoxtivt, 21. 2! 3,12 13
Fredhr.k l(Dp (r.. Br\iL Tcr.ltng Model)
Ir, nr
, rc f v r h ' n ! . n
2 1 , 2 1 1 4 2 2 a i !1 2
Fr t aluen, r d a r r i b u r n ) n , 5 5 5 9
rlrtrr.tr1trti.i.50
59
'lc
r. ' ihed . 5 5 56 57
hisloAranB
k urlos t r.5 l r
skewed. alJ
stmnerril. 58
Global qual,L,rscolng. (r.. Hol,siic scorng)
Cr2deeqni v a l € n ' s . o r e s , 2 3 9 9 2 ,3 1 0 1 7
dc f rt rron s . l
2lj7 68
mean,nA\. 11i6.263 1l
nreLhods ol a$igning, 279-33
need tb., 261 65
purp.ses. 26,1
reliabilnv .l 267. '.1t0
shoncomrn8s 267
s our. es o a i n " r l i d i r v . : : 1 0 7 1,2 7 3 i6
G rrding 2 0 7 . 2 6 4 8 : l
abnt ure,2 6 l r 6 0 , 2 3 n 3 2
a$ignne n s n . d h o e w . ' k 2 7 5 - 7 ti
.onrrac(, t82 lr3
.ontra(ed $nh elrlu
nrg, 271 14
F,idc, 276
lcgal hsue!. 2i1
pa* liil ?31
problems ni. 20?. :ltr5-li7
rclauvc 268 6S, 277. 279 30
$f$are
233
{eighring codponcnc. 276-79
Fr
C' d. l'nc p' a , r d x ' c . . 2 7 '
- J b$f ii
rmFar
rbL,..r
co.renr based m€rhods, 281 82
disrribnLbn Fap merhod.279
F adrng o n t h c . u . v e , 2 7 9
p. rc enr g r a d i n g , 2 6 8 6 9 , 23 0 8 l
rel2Live meLbods. 279 30
sofrrare. ?45
iandard dclirtion merhod. 180
wei8hLin8 componenB, 2?ij 79
crading syne6s, 266. 268-69 1tl-?:l
combininS c.mponenh in, 276 t0
dual sy{em. 273
e.r€cric sysren, 2?3
sradc s.ales.2?l 72
365
Gfuup h.Le.oqeneirl 92
(i.up reti.cn.cd inrerpre(arons. 5,1 35, 2!ll 99
defin.d. !5
tro.m rete,OLed, 3.1 35
ir.atmcnr r.tcren.ed, 34 ll5, :l!3 99
(lue$in8, 79 133, 156-5?, 2l I 1.1
b l' nd. l :l s
( .r .e.ri on tor, 2l l -1,1
,nl()rn€d l3a
m flfl pl ccho'.. re$ 7.)
proles olelnniralon,
156 57
r m c l ah€ rers l l 3
tJalo ef€.t (ra observer problems, presen.e)
HiFh \chaol
326-tt
'e{ins.
Il,ghnak.s rcns, 2
Hig h erorder Lhi nki ng sl i l l r,5, 53, l 0t, l 2rj -2l J
23't'-60
Ho lisri . s.ori n8, l !15-96
lnfo.nril efrLL,aLionnelhods (r.? Nonr.r
.iquet
I^ serr.e trarnDg progranN, iJ.18-53
s.o.e inLerprerarion and use,352 53
ren rdminkLnrnrn. 150 5l
,rn s€|..'ion. 34a-49
r n sh cri onal obj €cLncs:
Ba s. l .a.l i rng Model ,27
d e r il al n,n of.13 49
d - "!cl op'i g {ai emcnrs,49 5l
eramples,lllli .17
re.h
expli.il vs impli.ir, 50
p".poses ot, 36. 43
pramid elfe.r,4F .19
tJxotror,v. r{l..tive, 52
tr\onony. .ognnive, 5l 53
raxononryj psl.homoior,52
Innruc&,nal procednres (r4 Basi. Tea.hing
Model)
In F lli8i ",' b r ' .",13.
dcfini.ons, r)30 lr2
devel{,pn€nl ol3:l?
h cr cdnan bash Ior 334 35
l tl 32
' n e annrgs,
mcasrremenr o1330-Jg
Intelligcntc ren s.o.e, 332 355-39
d.\iaiton IQ. 3116-37
inrelligen.e guorienr. 335
inr€rprerrron oa 331, 337 39
reporr oi 338
*andard score!, 335
uses ol, 3J6 t7
lnLelliBenie te{s, iand2rdized
iEiiEa-ia;,nb".. 2t2
pn$ fatl,281
apuitrder.{s.33e-.10
conceptrral
b2sis,2a8.350-32
singl€ vs mnlLiple gr2des 2t2 73
.riri.isms of, 14
366
SLJBJE'I]NDEX
I^r4 P. \ Wikm Riles,11
repor.ing re$ resulls,518
lnrelligen.c rc{s (.0'r )
LypA of asl*, 332 35
usesol r1 -r5,3 56 -5?
Inrernal anallsn €{imar€s Ga ReliabiliLre{inra
In0insic radonal ralidiq, 101 6
Invenk,r) (rtr Nonterr teLhnique$
l(2 terurg tua.Inteuigen.ele{s, $ndardired)
iLem analysnpro.edufes, 225 Jt
applilrlbns, 233 37
.rnerion group selecrion,227 28
.rilerion refercr.ed, 237 33
U.S v SDuthCamliiq 14
Li.ensure resrng, 12 I4
Manda!€da$e$ncnr programsi
m,nmrn.r cDmPetenc)/j
I
nainrnal as€$mcnl, 9-ll
sare by $ate .onpa.isons, l0-ll
irem dirficnlry-226, 228 3r
ftn discrnninari.n,226 ?31 52
lLen responserhcory, 210 It
adaprnerertin &21 6 l7
Mark and mrrking G4 G.ades cradin8)
Ma{ery resling,3, 238
MaLchjngnens, r24-25, r82 85
advartagesoC r$ 84
dasincarion rype, 133
Buidelinesfor w.itmg, 134-85
l Len s€leco!, 12 2 2 8,2 32 J 3
M€asuren€nr,25-26,.1r-32,287
reviri.rn ofirems, 233 37
I lcn weigh ring11
, ,21 .116
oprion se igh (lng 2
, r5 l6
I lem w.it ing ,6 11 2 51 ,15 7 71, l8t 87, 193 94
relaLed10evaluarion,26
relaLedLoresring,26j 247
.ompurarion of, 59-60, 66
mulLiple.h oi(e ,15 7 77
mulriple true lilsc, l5l-59
numeri.al problem. 185 37
s!o.t an:ler, 179-32
Mdtal M.asurtuts
Mulriple.hoi.e ilens, 124, 154-77
all of the above,1?5
consr.a{ed giLh informarionj 4l .12
.elated Lo performan.e,.11-45
rela(edr o L hin tin8 ,45 4 6
relaGd ro uidersLrnding,45-:16
school pufposcs,l7 r8, 53
rru.u.e. 4l-12,47, 53
(nder Richardsonformnlas,83 85
conpared wilh e$atrs,155 56
comparedwnh nulriple nefahe, r5r 52, r?3
.dmpared with true talse,tg5-37
c.iucismsol, 155-57
diss,.ler prepamidn, 167-?7
processof climin.don, r56-5?
le.hniqud tbr {ri!in8, 15?-77
Dninr.nd.d clu€s,r63-64, 175 ?6
Muhiple h€-f.l&
L.8al issue s,
l3 15 ,2? ,1,3 18
Bnhh. \ aatilami4 14
Dcbra P. e Ttnington, 14
Dim u stat Boa t aJF,hrnti6,
Colden mle de.tuion, l4-15
C,iggs I lrt . Poud h., 14
Ywbuh, 299
)4
itemr, 151 52,215
Nadonal Ass.ssn€nrof Educ2tioDalPmgress
(NAEP),I 11
NeSar'v€ru8g.{'on en€cL,141 42
Nonte* techniques,2t1-6r
SUBJECT
INDEX 367
obsen donal methods,245-4t
o.al que{ioning, 257-tjl
queslionnaires,25355
rating s.al€s,250-59
Nornal cuFe equivalenL,69?0
Nornral dislribution, 62-64, 66, 69-70, 336
.har^.lerisLi.sol lj2
frequencypdLenilges, 6?-63
*andrd der,arion uDns,63, 336
*andard sco.€,69 70
-*arnal[ed
conpaf€d wirh.rirerion ref€renced,54
inreryreLa rion s,670
8 ,80 , l14 I 5, 196, 2s 6- 99,
!10
!9I4t28?, 29n 99,325,996
).haract€risricsof, 296-c8
.ompared sth {andards, 2q6
B rDup\5 ind nid ua l,zq F 99
r-!n€.i..l
probleDs (r,, Shorlanswerneno
.ha.a.refis(ics,ll5-17
conpar€d uirh essayresr,ll5 71,122 23,156
Objectivetes iGms,6, r22-2b
.lassificadon,123 26
.odpartuotu, 123-26,156
cooplex nulLiple choice,r52
nlar.hing 124 25, 182 85
mulriplc.hoic€, l2il, 154-77
nurliP le irue fah e,15 1 52 , 215
shorranswer,125-26,179-3?
Oblccri!€$.etaren.ed,36-37, ll3 l9
coDpared w'd donain ref€renccd,3lj-i7
scoring ,ll6, 19 6-9 7
te$ developmenL,
1r6
obs€naLion,32 35, 241 53
@nnon probl€ms,245-46
ObsenaLiDnsch€dul$, 246 4?
Obs€rer problems, 245-46
recording meLhods,246
subje.l consnLen.y,246
Op€n-hoot k$,204-5
OpriDnalt€$ itens, r93-9a
Oral questioning,242,257-61
re.ording r.spons$, 260 6t
out ot:lcvelLesring,25?, 292, 325-26
tercenul€ mnks, 64-6?, 70, 289, 292-95,310-13.
356 39
.hra.leristics df, 67
.omPurarionalmerhodr,64-65
co.tra$€d wtrh perc€nriles,66
disribuflon ol 6lj-67
inrerpreradon,tj6-67, 70, 289, 310-17,337-39
perc€niilebandr, 292-95,5r3
Pefcenriles,60, 6il-67, 292 95, 3r8
computationalme6od, 66
conrrarted{ilh percemile ranlc, 66
Perfornanc€a$csnent Grr Basic teaching Model,
PertdrnanL€ Le{t
?erfornan.e rns, I9,29,92-33, n6 17
Plftr€d obs€nadon,243 46
.ontrar€d Hnh sPonaneoui 2.{5
Poin!biseriai corelarion, 232
Posttest discussion, 238-39
Predictive evidence, 107-8
Problem test characteristics (see Short-answer items)
Profile interpretation, 292-93
characteristics, 42-43
compared with objectives, 50-51
examples, 42-43, 344-45
quality standards, 33
spontaneous, 243-44
Questioning (see oral questioning)
Questionnaires, 253-55
Range, 60-61, 91
defined, 60
effective score range, 91
Rating scales, 242, 250-53
common errors with, 253
development, 250-52
examples,25o62
rtr.r prdrbon, 250-51
procedures for using, 252-53
purposes, 250
A eidxr .ssL€ i 1 10 lt,lio l,
Reading interests, 255-57
Rectangular distribution, 66-67
Relevance, 105, 221-22, 296-97
of norms, 296-97
test content, 221-22
validity, 105, 221-22
Rclela..c Crud r, 5 l-5:i, t 20- 21, 211
Reli: hilirr,irj l)8 , 10 0 l0l, l9l 93. 2?. 1, 2r r i,
! 92 !6
. rnei0n ..Irren .!d scores ,7t i- 7' l,9+ 94 221
definitions, 76-78
errc,s o fmca su ,c.re nL7113l
esa) s.ores.t9, 91. lll 03
factors influencing, 88-93
iupor (a D.. ol1 8, 85 .2 2, 1
internal analysis, 82
inrerp.errtion, 35 t3tl,2jl.l
methods of estimating, 81-85
related to validity, 100-101
riLtrgs,85, tCl 93,1.16
souf c esoI.fro r, 7 !r rll
standard error of measurement, 86-88, 292-94
subrc$ sco,cs,29+-90
oe scores,77 r_8
Reliability estimation methods:
alphr .o €fficien l 3,1a5
equivalent forms, 82
essay scores, 191-92
internal analysis, 82-85
Kuder-Richardson, 83-84
Spearman-Brown, 83
split halves, 82-83
test-retest, 81-82
Reliability, factors influencing:
cheating, 93
group homogeneity, 92
item content homogeneity, 89
item difficulty, 89-90
item discrimination, 89-90
motivation, 92
score variability, 90-91
speededness, 93
srudenLkn*sr.ess,92
test length, 89
time limits, 93
Reporting to parents, 317-19
dtrL.id .ep.r .ards, 321
Response count, 225
Re s p o n i er a r . s ,l l l l 2 9
Rerpon\esct, 1.11r
, r D p l n r 8 e , '. r , N 2 ,r 2 9 3 0
SLrtrerpl(n,tl) tl
S.hool re$rg proBmn$. 7 3, 12, 29! Jol JO5
j l 7 1 9 , 3 2 12 t
lprirrde renrDg.:J,1t)
n 'g l , s . l i . o l , 3 ! 6 2 ?
tnrell'8€n(c(cnnrg,lJ57
purposes, 7-8
feplrrng renrl6, i2, 3lt ltl
selern)n Lrlleni, 21rlrStrt
reach., r. r.rriLr, i121,318 i3
,^eolresolFnoxr,J05
Score interpretation:
J S c c q u 'l a l c r 6 2 3 9 : 1 3 6l J 7
t r.tasser.fiurt.ns, Sl4 17
.r i t r r i o r r r l e r e n ( e . t , 8 0 t. l . 1 1 5 , 3 1 0
d.velopmenrrl \.orcs ?8! 1r2
g r J d e e q u i y l e n 6 , 1 9 09 ! , 3 l l i
group norms, 298-99
norm rere.en.ed,03 70, u0 I 14 t5, 196,2Cti-E.,
Jlo
percentile bands, 292-95
profiles, 292-93
sratuss.ores ?39
s l b r e i s c o , . s , 2 9 . 19 6 . ! i l 6 J 7
rrerrmerl.rferen.cd 29a 99
Sc o n r g , t ! , 1 9 1 - 9 7 , I 0 4 , 2 0 9t l j
answer key preparation, 204
correction for guessing, 211-14
differential item weighting, 214-16
e$al Le{s, 1.1.11r7
formulas, 211-13
machines, 12, 210-11
option weighting, 215-16
Scoring machines, 12, 200, 210-11, 223
Short-answer items, 125-26, 179-82
advantages, 179-80
disadvantages, 180
guidelines for writing, 180-82
numr,,.al problems,lt5 t? ta5 37
Skewed frequency distribution, 53
Spearman-Brown formula, 83-84, 100
Specific determiners, 144, 150
Specificity, 222
Speededness, 93, 128
Stlnhalves D.rlrod, 3? 3.1
Spontaneous observation, 243-45
SGndard devradon 61 62, da, a6-37
computation of, 61-62
t i8o,n1,l6l i?- 3 t
u' nstec\rop1. e
6 r2a, 6
Standard error of measurement, 86-88, 292-94
computation, 86-87
pcf.cndlc brnds, 2e! !1
ro relj:b,tny, t6
:-i .r!.erarior 87_33
:::!?.rcnL -G
3 4,1 9 2n, Zt ) 3_21
posttest discussion, 238-39
lst 88 ,3 . 10
characteristics of, 286-88
- :.,.rs, 280-llt, 306. 337
school testing issues, 326-27
: rgen.e 948, 330 j!
:: ir !1 .1rrn ullm , il . t , i! 2_23, : l2t
l9rr !!, 336
r<'!rd rrh,, J ir ) c h J r r 9 t ; , ; 2u ?1.
_ ..rln .,,reria , 29 9 J 0t
sources of information, 299-300
- _r<!. r.ns :ti3, J.lil
-nr. rrau nrg 30 t . 3! : 2 3. 18! 3
r<\ or lu7 Il8, 3.10
i.rd! ror .esorK 15 t 6, l0r , lU6
bt 70 ,: a9 91, 116
:: .,.|rrenQl 2ll9 9r
normal curve equivalents, 69-70, 239
Test planning, 114-31, 199
.o,renr to h€6ure,
8_19,t29 90
dirruk' t€rrt, ll0 1t
number of items, 128-30
specifications, 117-21
testing purpose, 114-15
types of, 115-17, 122-28
d.lil_3!
T$Ln,{e-inr.rp
. b_7. U 58, 2a9_96,gO9
.r{e.io. r€ter€n.cd,6 ?, s5_3a
cutoff scores, 57-58
domaD r.lered.ed, i6
Tsr rrl(.|-n
-10t, g48_49
r e \ r r r r , n g r l j - ,\ G
, 2nutur .
I ens6tr€$, 92,l0l, i:01-2
...rne , ri3,t89 ,5Jti
, !98_99
anbrSrtrt ol r38_40
330
- i'nes, 63, !1111,
vJu200n, 24. 342 .13
:r;le ol spe .dn aro ns,t t lr t t , lt l
^Driln
!€a
oI €dkrional
obje.(ves, 5t _5!, t26
guidelines for improving, 148
improving discrimination, 149-51
internal comparison use, 143
learning effects, 141-42
misconceptions, 134, 137-42
multiple true-false, 151-52
negative suggestion effect, 141-42
rationale for using, 133-34
S ooxfr,5l,5 3, t:0,2 21
r],efs, 5i: 5ll, 1t0,2:1
.(ding b rhe ren,l .r, 3:t3
J/,r rr\rerJ J 5 t21 j 29 . 2lr 2 J
-8srL,i!r,5 6,2 53 ,33 5
. <)t evJuaLionproL€dnrts,2?0 !!
varidnt, r00 I l!, :tt, ?.12,296, 300
concurrent-related evidence, 106-8
construct-related evidence, 102, 108-10
content-related evidence, 102-6
!.rcrro. rrr€ren.ed, 2:J7 J9
dLlrn n:ro r,2!5
!.r
rnLrtrtrrc.3aonaterid€n.€, l0.t-6
nontest techniques, 242
predictive evidence, 107-8
related to reliability, 100-101, 112
written documentation, 104-5
Validation, 105, 107-8, 110-12
Variability, 60-62, 224, 252
extraneous, 101, 105, 109
related to reliability, 90-91, 224
standard deviation, 61-62
content categories, 105, 121
grading components, 276-79
Writing assessment, 101, 139, 195
ESSENTIALS OF EDUCATIONAL MEASUREMENT
Fifth Edition
ROBERT L. EBEL and DAVID A. FRISBIE
both of University of Iowa
This book provides a solid introduction to the fundamental concepts and principles of educational measurement. It is designed specifically for those individuals responsible for testing cognitive abilities. Through the use of practical discussions, readers examine the principles of test item writing; the reliability and validity of educational test scores; issues related to grading; procedures for assigning grades; recent developments in testing; and standardized tests of achievement and intelligence.
The text is appropriate for introductory testing and measurement courses and as a reference for practitioners. No previous study of educational measurement or statistics is assumed.
ISBN 0-87692-700-2