FIFTH EDITION

ESSENTIALS OF EDUCATIONAL MEASUREMENT

ROBERT L. EBEL
DAVID A. FRISBIE
University of Iowa

Prentice-Hall of India Private Limited
New Delhi-110001
1991

ESSENTIALS OF EDUCATIONAL MEASUREMENT, 5th Ed.
by Robert L. Ebel and David A. Frisbie

PRENTICE-HALL INTERNATIONAL, INC., Englewood Cliffs.
PRENTICE-HALL INTERNATIONAL, INC., London.
PRENTICE-HALL OF AUSTRALIA, PTY. LTD., Sydney.
PRENTICE-HALL CANADA, INC., Toronto.
PRENTICE-HALL OF JAPAN, INC., Tokyo.
PRENTICE-HALL OF SOUTHEAST ASIA (PTE.) LTD., Singapore.
EDITORA PRENTICE-HALL DO BRASIL LTDA., Rio de Janeiro.
PRENTICE-HALL HISPANOAMERICANA, S.A., Mexico City.

© 1991 by Prentice-Hall, Inc., Englewood Cliffs, N.J., U.S.A. All rights reserved. No part of this book may be reproduced in any form, by mimeograph or any other means, without permission in writing from the publishers.

ISBN 0-87692-700-2

The export rights of this book are vested solely with the publisher.

Reprinted in India by special arrangement with Prentice-Hall, Inc., Englewood Cliffs, N.J., U.S.A.

Printed by Bhuvnesh Seth at Rajkamal Electric Press, B-35/9, G.T. Karnal Road Industrial Area, Delhi-110033 and Published by Prentice-Hall of India Private Limited, M-97, Connaught Circus, New Delhi-110001.
Contents

Preface

1 The Status of Educational Measurement
    The Prevalence of Testing
    Some Chronic Complaints about Testing
    Some Current Issues and Developments
    The Principal Task of the School
    The Potential Value of Testing in Education
    Summary Propositions
    Questions for Study and Discussion

2 Measurement and the Instructional Process
    Evaluation, Measurement, and Testing
    Evaluation in the Teaching Process
    Functions of Achievement Tests
    Limitations of Achievement Tests
    Interpreting Measurements
    Summary Propositions
    Questions for Study and Discussion

3 Measuring Important Achievements
    The Cognitive Outcomes of Education
    Using Instructional Objectives
    Summary Propositions
    Questions for Study and Discussion

4 Describing and Summarizing Measurement Results
    Frequency Distributions
    Describing Score Distributions
    Score Scales Describe Performance
    Correlation Coefficients
    Summary Propositions
    Questions for Study and Discussion

5 The Reliability of Test Scores
    The Meaning of Reliability
    Methods of Estimating Score Reliability
    Using Reliability Information
    Factors Influencing Score Reliability
    Criterion-Referenced Score Reliability
    Summary Propositions
    Questions for Study and Discussion

6 Validity: Interpretation and Use
    The Meaning of Validity
    Evidence Used to Support Validity
    Applying Validity Principles
    Summary Propositions
    Questions for Study and Discussion

7 Achievement Test Planning
    Establishing the Purpose for Testing
    Alternative Types of Test Tasks
    Test Specifications
    Item Format Selection
    Number of Items
    Level and Distribution of Difficulty
    Summary Propositions
    Questions for Study and Discussion

8 True-False Test Items
    Merits of the True-False Format
    Common Misconceptions about True-False Items
    Writing Effective True-False Items
    Multiple True-False Items
    Summary Propositions
    Questions for Study and Discussion

9 Multiple-Choice Test Items
    The Popularity of the Multiple-choice Format
    The Content Basis for Creating Multiple-choice Items
    The Multiple-choice Item Stem
    Preparing the Response Choices
    Summary Propositions
    Questions for Study and Discussion

10 Other Objective-Item Formats
    Short-answer Items
    Matching Items
    Numerical Problems
    Summary Propositions
    Questions for Study and Discussion

11 Essay-Test Items
    The Prevalence of Essay Testing
    The Value of Essay Testing
    Reliability of Essay-test Scores
    Preparing Essay Items
    Scoring Essay Items
    Summary Propositions
    Questions for Study and Discussion

12 Test Administration and Scoring
    Preparing the Students
    Test-preparation Considerations
    Test-administration Considerations
    Scoring Procedures and Issues
    Computer-assisted Test Administration
    Summary Propositions
    Questions for Study and Discussion

13 Evaluating Test and Item Characteristics
    Test Characteristics to Evaluate
    Item-analysis Procedures
    Selection of the Upper and Lower Groups
    Index of Difficulty
    Index of Discrimination
    Item Selection
    Item Revision
    Other Criterion-referenced Procedures
    Posttest Discussions
    Summary Propositions
    Questions for Study and Discussion

14 Nontest and Informal Evaluation Methods
    Observational Techniques
    Informal Inventories
    Oral-questioning Techniques
    Summary Propositions
    Questions for Study and Discussion

15 Grading and Reporting Achievements
    The Need for Grades
    Some Problems of Grading
    The Meaning Conveyed by Grades
    Establishing a Grading System
    Threats to the Validity of Grades
    Grading Course Assignments
    Combining Grade Components
    Methods of Assigning Grades
    Grading Software
    Summary Propositions
    Questions for Study and Discussion

16 The Nature of Standardized Tests
    Characteristics of Standardized Tests
    Types of Standardized Test Scores
    Norms
    Selection of Standardized Tests
    Summary Propositions
    Questions for Study and Discussion

17 Using Standardized Achievement Tests
    The Status of Standardized Achievement Testing
    Uses of Achievement-test Results
    Interpreting Scores of Individuals
    Interpreting Scores of Classes
    Reporting to Students and Parents
    Some Interpretation Problems
    School Testing Program Issues
    Summary Propositions
    Questions for Study and Discussion

18 Standardized Intelligence and Aptitude Measures
    The Concept of Intelligence
    The Nature of Intelligence Tests
    Scores Reported from Ability Tests
    Aptitude Testing
    Summary Propositions
    Questions for Study and Discussion

[...]

E. Teacher In-service Topics Regarding Preparation for Test Administration
F. Teacher In-service Topics on Achievement-test Score Interpretation

References
Author Index
Subject Index

Preface

This fifth edition of Essentials of Educational Measurement, like the previous editions, has been designed as a textbook for introductory measurement courses and as a reference for practitioners engaged in the development and use of educational measures. The evaluation needs of teachers have been weighed heavily in making decisions about content coverage and emphasis, but consideration also has been [...] addition of a chapter on nontest evaluation methods and the deletion of the chapters on personality measures and recent developments.
Chapter 14, Nontest and Informal Evaluation Methods, describes the development and use of observation [...] The chapter-end projects and problems sections have been moved to the [...] to illustrate the application of test-development concepts and procedures. The last three appendixes provide lists of topics that are appropriate to consider when planning in-service instruction about standardized-test selection, administration, and score interpretation for teachers.

The first three chapters deal with fundamental educational measurement concepts and current testing issues. A [...] throughout the text, is a change in [...] more common usage found in the literature [...] discussion of classification systems (taxonomies) [...] The treatment of reliability has [...] detail on criterion-referenced situations [...] give proper emphasis to construct validation [...] intrinsic rational validity evidence. In addition [...] to planning for criterion-referenced tests [...] for all kinds of tests.

[...] have undergone only minor revision: [...] the discussion of holistic and analytical scoring [...] chapters on test administration, test evaluation, [...] most educational measurement texts do [...] and aptitude testing also has been revised [...] score interpretation and use.

[...] valuable and generous contributions to this [...] grateful to several colleagues who furnished [...] Tim Ansley, Doug Becker, Bob Forsyth, [...], Dave Lohman, Rick Stiggins, and Jon [...] for his support, Kim Volk for her [...] word-processing assistance, and Bob Jordan for his helpful library work. I also appreciate the efforts of Fred Finch and Bill Zwack of the Riverside Publishing Company in obtaining illustrations. Finally, thanks are due to my family for the patience and understanding they have shown and to Bob Ebel for the solid foundation he established with the first three editions of this book.

D. A. F.

The Status of Educational Measurement

THE PREVALENCE OF TESTING

As the last decade of this century unfolds, there is more educational testing occurring than we have ever witnessed before.
Like many other educational phenomena, however, testing seems to fall in and out of favor in cyclic fashion over time. Usually the era of peak demand is followed by a period of increasing criticism of the inadequacies of testing and of the inability of tests to address our most pressing educational problems. By decade's end we likely will have come full circle from the 1970s, when charges of racial/ethnic bias dominated our thoughts and contributed to a decrease in educational testing.

The most recent escalation in the use of tests gained impetus from such educational movements as "excellence," "effective schools," "public accountability," and "minimum competency." The pressure of these movements brought both more and different kinds of testing to the schools. Teachers continued as usual to give classroom tests to assess learning outcomes and to motivate their students to learn. And schools continued as usual to administer standardized testing programs to monitor the progress of each grade group and to assess curricular strengths and weaknesses. However, in many states this teacher and district testing was supplemented by a host of other testing programs mandated by the state or by the district itself. Thus, students and teachers alike continually found themselves in some phase of testing: preparing to take a test, administering or taking a test, or reviewing or explaining the results from some kind of a test.

Unfortunately, there has been too much testing: too little use of good tests and too little good use of well [...] more by the content of the upcoming mandated test than by the goals, values, and perceived needs of the local community. As a result, in many places the [...] consequences of testing has begun to shape more than it should. [...] policymakers and the public for test information [...] preparation and to ill-advised uses of existing [...] purposes other than what their makers intended. Unfortunately, too, some of these
tests are not of very high quality. Too many of them are produced under severe time constraints by individuals with little special training in test development and no special aptitude for the task.

All educators (teachers, administrators, counselors, curriculum coordinators, and instructional designers) need to know more about educational measurement than they have had an opportunity to learn. Most states have no teacher certification requirements that specify a test and measurement course [...] teacher certification may be a prerequisite to special [...] certification, [...] foundation in educational measurement as they prepare for and [...] their specialty. For these and other reasons, the practices of [...] other educators are influenced primarily [...] by lore: what they experienced as students, what they have seen or heard from colleagues, and what they have learned incidentally through related coursework or short-term professional [...] experiences.

The status of achievement tes[...] different. Employment testing by pe[...] for licensure or certification decisions [...] on the upswing. Tests of achievement [...], and the quality of these tests v[...] the next. The decisions made on the basis of scores from such tests are no less critical than many of those made in our schools. These are "strong" tests; the consequences for test takers are great because the decisions will influence career paths, economic opportunities, and social acceptance by peers. They are high-stakes tests because there is much to lose or much to gain, depending on the decision that results from the scores.

The excessive pressures brought on by high-stakes decisions should serve to reduce the amount of testing that takes place as we near the century's end. And the realization that a test score provides limited information of less-than-perfect accuracy should help to curb the tide also. But tests will not go away permanently or even sink to low obscurity. Nor should they. If we want to motivate and reward efforts to learn, if we want effective and productive [...], if we want to deal fairly with individuals on the basis of their abilities and accomplishments, we need more good testing, not simply less testing. Despite the current prevalence of educational testing, we are far from receiving the full benefits that could be obtained from the wise use of good tests.

SOME CHRONIC COMPLAINTS ABOUT TESTING

Even in periods of unprecedented amounts of testing, there are those who decry the use of tests and whose goals are to highlight the misuses of tests or the harm done to students by them. Of course, not all tests are skillfully prepared, not all test scores are used in prudent ways, and no test score is likely to be free of error. The wise use of test scores requires an understanding of the issues raised by critics and an ability to distinguish constructive criticism from emotional reaction or uninformed opinion. Because some of the most frequent charges have implications for test development and test-score interpretation and use, it is appropriate to consider their merits early on in the study of educational measurement.

1. Standardized test makers control what students learn. It is possible for the developers of a commercial achievement test to construct their instrument to include only those items that meet their own personal criteria of relevance, difficulty, and timeliness. But unless the content represents the essential topics of current textbooks, unless the items reflect the recommendations of national curriculum councils and committees, and unless the content is deemed relevant by district test selection committees, such tests will not be sold.
A test that might be considered ahead of its time or behind recent developments will die a quick death because many school personnel will fear that such a test will yield a set of very low scores and, more importantly, useless information. The success of commercial test developers is measured in the marketplace. The most successful tests are those that respond to curricular changes and emphasis rather than those that attempt to effect such changes.

To the extent that a school modifies its curriculum (what is taught, when it is taught, and how much effort is devoted to teaching it) so that test scores will improve, some may say the test is controlling the curriculum. However, the effects of such instructional modifications can be considered both positive and negative. Instruction that is measurement driven is purposeful instruction, whether or not the intentions are laudable. For example, when the domain of instruction is limited [...] boredom, more frustration, and lower achievement levels. Trade-offs abound. Teachers and administrators are faced with the decision about how much the test should influence instructional emphases.

Unfortunately, parents, school board members, superintendents, and other school personnel are reluctant to explain low test scores in terms of test-curriculum content mismatches. Instead, current sentiments are to treat low scores as an indication of failure on the part of the school. Most often the appropriateness and quality of the test instrument remain unquestioned and are assumed to be ideal. We are not in the habit of considering competing explanations for a set of test scores, whether they be high or low, but we are in the habit of uncritically accepting the quality and appropriateness of the test instrument.

A school that focuses instruction on what the tests measure surely should teach other things as well. Even in the basic skills areas that the test does sample, there should be ample class time and teacher time to venture into interesting and important areas not covered by the standardized tests. Standardized tests can dominate local curricula only to the extent that school administrators and teachers permit.

2. Test scores are inflated because of teaching to the test. If the effectiveness of instruction is to be judged on the basis of students' performance on a test, the temptation may be strong for the teacher to prepare students to answer the specific questions that will be included on the test. This is often referred to as teaching to the test. When the negative consequences of low scores are significant for the teacher (loss of job, low salary increase, reassignment to a less favorable setting), there is great urgency to ensure students' scores will not be too low. The morality of the decisions often takes a back seat to practicality and survival.

The pressures of accountability [...] but, more importantly, unrealistic [...] their associated rewards and punishments [...]. For example, teachers with average-a[...] requiring above-average achievement. [...]
Teiching the test questions and And as one state found out, erasing I rect ones in their places after the test in circumstances stakesare, fhe bolber the stakehotderswill become, especially requirements' or expecta'tions of unreasonable An important distinction should be made between teaching to the test qu€s(that is, attempting to fix in students' minds the answers to particular test give to (that is, attempting test by the covered be to iions) and teaifrin[ material uestions lihe those in the test on topics covered trehensible'The secondreflects purposeful rn for giving away the answersto particular for testing Performance on skills or Eeneral nrs ;,I';:t'J; enritled to knowwhat the srude J|T.,J::1 i'"t&t"'""::::::*l: to assess'Since a must be thoroughly relevant to the instruction it is intended much more will usually of performance, sample a than more test can r.n., .fi.it seldom go test should the However, be tested. can than (and learned) u. t".rgnt to^learn' beyo.ri what students have had an opportunity . harmful 3. Testsmake studcnts araious ind stressful.Claims that testing is some stu' students; upset and threaten tests forms: many taken have to students get a-low score dents even break down and cry when faced with a test; if students self-concepts student's try1ns; quit and on a tesr, they will become discouraged with educational incompatible is testing Proce' and will be damaged seriously; of students' supportive be to desigtt.d dures There is undoubtedly anecdotal evidence to supPort some of these HE ST,ATUS OF EDUCATIONAL MEASUREMENY 5 claims. Common sense suggests,however, that the majority o[ students are not harmed by testing. There are no substantial survey data that would contradict cornmon senseon this matter. Teachers seem concerned much more often rr'ith students who d<ln't care enough how well or how poorly they do on tests than with the relatively exceptional instancesof students who seem to care too much. 
It is normal and biologically helpful to be somewhat anxious when facing any real test of performance in life. But it is also a necessary part of growing up to learn to cope with the kind of tests that life inevitably brings. Of the many challenges to a child's peace of mind caused by such things as angry parents, playground bullies, bad dogs, shots from the doctor, and things that go bump in the night, tests surely must be among the least fearsome for most youngsters. Unwise parental pressure can in some cases elevate anxiety to harmful levels. But usually the child who breaks down in tears at the prospect of a test has problems of security, adjustment, and maturity that testing did not create and that cannot be solved by eliminating tests. Indeed, more frequent testing might help to solve the problem.

A student who consistently gets low test scores on material that the student has tried hard to learn is indeed likely to be discouraged. If this does happen, the school cannot claim to be offering a good educational program, and the teacher cannot claim to be doing a good job of teaching. Most low test scores, however, go to students who, for whatever reason, have not tried very hard to learn. In the opinion of the teachers of such students, it is the trying rather than the testing that is more in need of correction.

4. Standardized tests are biased against some students. Standardized tests of educational achievement have been attacked for their alleged bias against racial/ethnic minorities, against either males or females, or against students with poor reading skills. The reason for the attack, at least in part, is that such students tend to score lower on standardized tests than their age-mates. But surely lower scores alone do not signify bias. If they did, every spelling test would be biased against poor spellers, and every typing test against persons who never learned to type. A test is biased only if it yields measures that are consistently lower than they should be.
That students who do poorly on a particular test written in English might do better if the test were in Spanish, or if the questions were presented orally, does not mean that the original test is biased against them. It simply means that they have not learned enough of what the particular test measures. Its linguistic context is part of the test. The particularity of what the test does and should measure does not constitute bias.

The score of a student on an achievement test indicates how successfully the test questions were answered under the conditions of the test. The reasonable assumption usually is made that the student would be equally successful with other tasks requiring the same knowledge or ability. Consequently, if a test score is judged to be an inaccurate indication of the student's level of achievement in the domain covered by the test, bias is not the likely explanation. It is more likely that the conditions for testing were undesirable or that a reasonably good test was chosen for a purpose other than the one for which it was intended originally.

Suppose we present third graders with this math problem-solving item: "A football team scored two touchdowns, no extra points, and one field goal in the first quarter. How many points d[...]" Surely this item is biased against stu[...]. But it is also biased against those who [...] it probably should not be if it is intended [...]. The item probably favors Ameri[...] probably disadvantages those who know a game in which there are no touchdowns [...] item be biased?

5. Paper and pencil objective [...]. [...] would indeed be foolish to claim that [...] could be accomplished with a [...] test. In many situations we need to [...] characteristics of a product developed by the [...] the therapy session of a counselor [...] the draft of a one-act play, a persuasive [...] measurement of skill is no substitution for [...] can be observed directly and [...] and economy.

In situations where objective [...] performance measures, the ability to [...] in the hands of the test maker.
[...] unimportant detail are the easiest [...] writers. Thus, all of us who have [...] experienced test developers can attest to [...] difficult by the obscurity of their [...] were stated. Only a small collection [...] are tempted to draw the conclusion [...] (cannot) measure worthwhile [...] number of more positive experiences should elicit a [...]. Again, the skill of the test developer and the nature of the important content to be measured influence the quality and usefulness of a given test.

Finally, the increased interest in recent years in giving more curricular emphasis to higher-order thinking skills has raised [...] about how to measure the achievement of these skills. The ability to analyze, predict, evaluate, [...] cannot be measured, some have said, with objective-test [...] skill beyond the level of remembering often [...] or product development to be assessed [...] only the skill of the item writer [...] measure important higher-order abilities. [...] fundamental requirements of good item writing, as will be seen in later chapters.

6. Test scores need to tell what students can do rather than how students rank in a group. The fact is, both kinds of scores are important, and we need tests that will provide each type. For example, I may not [...] to know if [...] is the best swimmer in her class. Rather, I want to know if she can swim far enough to save herself in the pool. If she can swim halfway across the pool, that still may not be far enough, even if it is "best" in the class. When it comes to such activities as swimming, writing one's name, driving a motorcycle, or dispensing medication, being the best in some group may be quite inadequate: ranking is not enough.
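
The difference between the two kinds of information can be made concrete with a small sketch. The Python fragment below is an illustration only and does not come from the text: the function name, the 25-yard standard, and the class distances are all hypothetical. It reports both views of a single score, whether it meets a fixed standard and where it ranks in a comparison group.

```python
from dataclasses import dataclass


@dataclass
class Interpretation:
    meets_criterion: bool  # criterion view: does the score reach the fixed standard?
    rank: int              # norm view: 1 means no one in the group scored higher
    group_size: int


def interpret(score: float, group_scores: list[float], criterion: float) -> Interpretation:
    """Return both views of one score; higher scores are assumed to be better."""
    # Rank is 1 plus the number of group members who scored strictly higher.
    rank = 1 + sum(1 for s in group_scores if s > score)
    return Interpretation(score >= criterion, rank, len(group_scores))


# Hypothetical data: yards each child in one class can swim (the list includes
# the swimmer being interpreted); 25 yards is an assumed distance across the pool.
class_yards = [12.5, 8.0, 10.0, 5.0]
result = interpret(12.5, class_yards, criterion=25.0)
print(result)
```

With these made-up numbers the best swimmer in the class still falls short of the standard, so a rank of 1 says nothing about whether the distance is far enough.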
Scott can swim 100 yards and spell 310 words. Is he a better swimmer or speller? Knowing only what he can do is not sufficient for deciding about relative strengths and weaknesses. If it were important to decide which is better, I would need to know how other five-year-olds swim and spell. Suppose I learn that no other five-year-olds in Scott's YMCA swim class can swim 100 yards and three girls in his kindergarten class can spell more than 310 words. Is Scott a better swimmer? This is an issue to be explored in greater depth later, but for now it will be sufficient to point out that, as long as the comparison groups are different from one another, the question must remain open. If we learn that everyone else in Scott's swimming class can spell at least 325 words, what conclusion can be drawn about Scott's strengths and weaknesses?

In sum, we need different kinds of information for different kinds of instructional decisions: what students can do and how they rank with others are both important. Beyond that, knowing the type of information most needed in a given situation and how best to obtain it are key issues faced by teachers at all instructional levels.

SOME CURRENT ISSUES AND DEVELOPMENTS

Mandated Assessment

The continuing press for accountability has led to more testing, more emphasis on test results in policymaking, and new developments or advances in test-related methodology. Mandated assessment is the term that describes the collective testing programs organized at the state (or local) level in response to legislation enacted by state (or local) governments. The mandate varies from state to state regarding the flexibility accorded local school districts to implement an assessment program and to use the results. In some states, for example, local districts have the option of conducting an assessment, but in others the legislation dictates the grade levels, subject areas, and specific purposes of testing for every district.
Virtually every state has enacted some form of statewide assessment law, and new legislation is introduced annually in many states as assessment experiences uncover unanticipated gaps in implementation. The emphasis placed on testing in response to state and local mandates can be seen by reviewing this contrived, but realistic, testing program for one school district:

1. State assessment. Tests in mathematics and reading must be administered to all students in grades 2, 4, 6, 8, and 10 of every district during a two-week period in May. Results are summarized by the state and reported to each district in September.

2. Graduation competency test. High school sophomores are given tests in reading, math, health, and consumer education to determine if they have the minimum knowledge required for high school graduation as determined by the local district. Those who fail may be retested as juniors and, if necessary, as seniors. Test content coverage and standard setting decisions are made locally.

3. Middle school promotion test. The school district requires pupils at the end of fifth grade to pass tests in reading, mathematics, and language skills for promotion to grade 6. Those who fail any test must attend summer school and pass a comparable test before they may enroll in middle school.

4. Writing assessment. Writing samples are collected near the end of grade 6 and are scored locally to determine the extent to which writing skills are being developed. This local assessment does not provide individual student scores, but it does provide information by classroom to describe the nature of group achievement. Curriculum adjustments in terms of focus and time allocation are made on a building by building basis after examining the results.

5. End-of-course tests. High school students completing these courses are required to pass a district-wide comprehensive test to receive graduation credit for the course: U.S.
history, American government, algebra, geometry, biology, and chemistry. Students who fail to achieve minimum performance levels may opt for summer school study in the area of deficiency and retest with a comparable form of the test. These tests are intended to ensure equality in academic demands in the course within and between buildings and to enforce minimum standards of performance in each course.

Of course, other forms of assessment take place throughout the school year interspersed among the preparation, administration, and reporting activities associated with the mandated tests. In grades 1, 3, 5, 7, 9, and 11 the district administers a traditional achievement-test battery to help fulfill the need for administrative, instructional, and guidance information that mandated programs do not furnish. A test of cognitive abilities is administered in grades 2 and 5, an aptitude battery is given in grade 8, and many high school students take any number of college scholarship and admission tests.

The sheer volume of testing has led a number of measurement specialists to lobby for a consolidation of testing so that one test administration might effectively serve multiple purposes. Considerable experimentation is taking place to develop acceptable testing procedures and reporting methods that will accommodate the variety of needs with a reduced amount of testing. Two developments that warrant brief mention in this regard are customized testing and district report cards.

Customized testing, as the term implies, allows states or districts to have tests tailor-made to their own content specifications rather than to select one that was designed to represent the curriculum of the mythical "typical school in the nation." The district identifies its instructional objectives from a publisher catalog and specifies how many test items should be used to measure each one.
The publisher selects items that match the chosen instructional objectives from its large bank of test items. Custom test booklets are printed, tests are administered and scored, and results are reported in terms of mastery of the various instructional objectives. This mastery testing program may be a useful supplement to the district's traditional standardized testing program.

Another form of customized testing that has been tried in an effort to reduce the amount of time devoted to testing has serious limitations. In this method a district (or state) chooses a sample of math and reading items from a standardized achievement battery for administration in its assessment program. These items, along with a few others chosen by the district, are administered and then, using complex statistical procedures, the scores on the full-length standardized math and reading tests are estimated. The whole purpose here is to obtain nationally normed test scores without giving the entire tests. There are at least two problems with these procedures that make the norms inapplicable: (1) the district chose only items on which its students could do well and (2) the test taken by the district was different (context of the items, length) from the one taken by the norming schools. The effect of both of these conditions is to overestimate the performance of district pupils relative to the national group. (See Way, Forsyth, and Ansley, 1989, for further details associated with context effects in customized testing.)

The state or district report card is the most popular method under development for reporting the results of statewide assessment, among other information, at the school-district level. Typically, the report provides school-building achievement-test score averages by grade level and compares those with the averages of other buildings in the district and in the state.
To help the report reader develop an understanding of achievement in the school, student characteristics are described and compared with those in other buildings in the district and the state. For example, the distribution of racial/ethnic background, limited-English ability, and family income levels might be presented to help explain current student academic performance. Information about instructional resources, school finances, and student attendance and mobility might be presented for similar purposes.

The report card method shows promise for improving the interpretability of test scores for a school district. The wide range of information presented helps focus attention on factors that seem to relate to school achievement. But there is much we do not know about how such factors as family income level, per pupil expenditure, educational levels of teachers, and racial/ethnic backgrounds of students influence the achievement of individual pupils.

Legislated statewide testing has evolved over the past 20 years to address public accountability concerns. It is likely to remain with us in some form for the indefinite future, and it is likely to provide further motivation for improving test development and reporting practices.

National Assessment of Educational Progress

The quest for school accountability that emerged in the post-Sputnik era made both educators and legislators realize that no useful mechanism existed to provide information about how much young people nationwide had learned in school. No dependable guides existed for steering public policy regarding priorities for educational spending or needed curriculum reform. Plans were laid in the mid-1960s for the National Assessment of Educational Progress (NAEP), a project that would survey the knowledge, skills, and attitudes of young Americans in several subject areas and report this information to educational decision makers, practitioners, and the public.
Initial assessments in each of ten learning areas (science, writing, citizenship, reading, literature, music, social studies, mathematics, career and occupational development, and art) have been updated periodically to gauge progress. More limited assessments, called probes, have been conducted in such areas as basic life skills, health, and energy. The reports of each assessment include selected exercises (test items) and the proportion of the sample tested that chose each multiple-choice alternative.

Many of the factors that shaped the initial structure and goals of NAEP in the 1960s have changed. For example, there is less public confidence in the ability of the school to do its job, there are greater demands for some form of accountability, and the once modest role of the federal government in education has changed to a prominent one. Beginning in the late 1970s, charges were made that NAEP was failing to serve the audience that needed serving; its results needed to be more useful (Comptroller General, 1976; Wiley, 1981). Subsequently, the funding for NAEP to the Education Commission of the States was not renewed and the contract was awarded to Educational Testing Service, based on a redesign of the purposes and technical procedures proposed for future assessments (Messick, Beaton, and Lord, 1983). Included in those new plans for NAEP were assessment of functionally handicapped students, assessment of limited-English proficiency students, and computer-assisted assessment procedures.

The social/political climate of the 1980s left many congressional and educational leaders dissatisfied with the national assessment data available to them. A study committee appointed by the U.S.
Department of Education reviewed NAEP and made its recommendations in the report, "The Nation's Report Card" (Alexander and James, 1987). Another significant redesign of NAEP ensued, with considerable effort directed toward a testing plan that would permit state by state comparisons of achievement by 1990. Such comparisons, viewed as dangerous and inappropriate by the original designers of NAEP, are considered essential by current policymakers to respond to accountability needs and to motivate states to improve education in their jurisdictions.

The 1990s expansion of NAEP requires that all states administer NAEP tests to representative samples of their students in selected grades. (Add this form of "imposed" testing to the illustration of mandated assessment in the previous section.) No doubt the extra testing required by NAEP will raise issues related to school time, personnel requirements, and need for dollars to support state-level participation.

There are many good reasons to believe that the state by state comparisons made possible by NAEP are a bad idea. First, despite the apparent demand for these data, there is no useful plan in place or no explicit purpose given for using state-level results. A state's ranking, either within the 50 states or relative to its border neighbors, is more likely to provide fuel for political fires than to improve the quality of education in the state. Second, it is unlikely that states can agree on the essential content to be measured at each grade level for which comparisons are to be made. As a result, the content is likely to be low-level, minimum essentials that a majority of students have mastered. (If such compromising were unnecessary, more of the current statewide assessment programs would use the same or similar tests to conduct their state programs.) Third, content compromising will hamper the development of tests that are difficult enough to help show actual differences in achievement from state to state.
That is, if the state scores are all fairly high, the scores of the highest-scoring five states may not look much different from the scores of the lowest-scoring five states. Fourth, the meaningfulness and value of the scores from tests representing a low-level, "plain-vanilla" content domain are very questionable. For example, there will be much in the science curriculum of Missouri schools that students will learn but NAEP will not test. Why should Missouri be interested in the scores from such a test? The lowest common denominator curriculum represented by test content will necessarily be far lower than what we see in widely used standardized achievement test batteries. Fifth, the costs of gathering the state-level NAEP scores probably cannot be justified in terms of the benefits states can accrue. The hidden and indirect costs to states and school districts may exceed the direct costs funded by the federal government or allocated by each state legislature. If all these resources could be divided and channeled to instructional programs and school facilities in each state, the impact on educational quality would certainly be greater. Bold leadership on the state level is needed to redirect the state by state comparison efforts.

The Impact of Computers

Advances in computing, especially with regard to the microcomputer, continue to provide new opportunities for more efficient, more realistic, and more accurate measurement of educational achievement. A textbook description of new developments seems futile since the technological changes are likely to be regarded in historical terms by the time the print reaches the audience. Even so, these changes reach the classroom implementation phase at a snail's pace, especially those related to testing.

A major but somewhat hidden impact of computers on testing has been in the area of theoretical measurement developments.
The increased capacity and speed of computing have made it possible for researchers to perform simulations requiring complex and lengthy statistical analysis that previously were too cumbersome to carry out. The developments have been outlined and detailed by Bunderson and Inouye (1987) in terms of four generations of computing in educational measurement.

One of the most noteworthy theoretical advances made possible by technological improvements has been computerized adaptive testing. The computer creates a test for the examinee during the test administration process by "adapting" to the examinee's most recent response. If the last response was wrong, an easier item is selected; if it was correct, a harder item is selected. This continuous bouncing from easy to hard, or vice versa, allows the examinee's achievement level to be determined quickly with a relatively small number of items. Adaptive testing requires less than half the number of items and testing time relative to conventional methods. And though efficiency is the main advantage at this stage, this method is likely to prove to be more accurate, more versatile in terms of types of items that can be presented, and more economical than our traditional group test administration procedures. When the capabilities of video disk and voice synthesis are added, it is easy to see that adaptive testing can revolutionize the entire testing process in the near future.

New and revised software packages for microcomputers are making the classroom testing process more efficient for teachers and more fitting for individualization of instruction. Item banking allows teachers to reduce test preparation time and to access test questions that have been designed to accompany their other instructional materials. In some cases the test can be administered by the computer and scored by the time the student has finished.
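The item-selection rule described above for computerized adaptive testing (an easier item after a wrong response, a harder item after a correct one) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `run_adaptive_test`, the numeric difficulty values, and the `answer_item` callback are hypothetical, and the simple up-down rule shown here stands in for the statistical models that operational adaptive tests generally use to select and score items.

```python
def run_adaptive_test(item_difficulties, answer_item, max_items=10):
    """Give up to max_items items, moving to a harder item after a correct
    response and to an easier item after a wrong one (a simple up-down rule).

    item_difficulties: hypothetical numeric difficulty of each item in the bank.
    answer_item: hypothetical callback; takes a difficulty, returns True if the
                 examinee answers that item correctly.
    """
    bank = sorted(item_difficulties)      # easiest item first
    unused = list(range(len(bank)))       # indices of items not yet given
    position = len(bank) // 2             # begin near the middle of the bank
    history = []                          # (difficulty, was_correct) pairs

    while unused and len(history) < max_items:
        # Pick the unused item closest to the current target position.
        index = min(unused, key=lambda i: abs(i - position))
        unused.remove(index)
        correct = answer_item(bank[index])
        history.append((bank[index], correct))
        # Adapt: aim one step harder after a success, one step easier after a miss.
        position = index + 1 if correct else index - 1
    return history


# Example: a simulated examinee who can answer items of difficulty 6 or below.
print(run_adaptive_test(range(1, 10), lambda d: d <= 6, max_items=3))
# prints [(5, True), (6, True), (7, False)]
```

In this simulated run the sequence climbs from difficulty 5 to 7 and records the first miss at 7, which brackets the examinee's level between 6 and 7 after only three items. That is the sense in which "bouncing" between easier and harder items can locate an achievement level with relatively few questions.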
Responses and scores from all students can be stored in the computer and summarized in a convenient report for the teacher at a later time. In other cases, a desk-top scoring machine attached to a microcomputer can score answer sheets and provide a summary analysis report for the teacher within minutes.

Another significant impact of the computer has been in the processing and reporting of the results of standardized testing. Computers can easily aggregate scores for buildings, districts, and states, and they can disaggregate scores of subgroups to monitor the achievement of pupils in special programs. Laser printers can display test scores attractively and in ways that are most convenient and meaningful for each of the several different users. Since the way test scores are organized and formatted on a report has such an influence on whether the reports are even used, the flexibility made possible by computer changes may be among the most prominent factors impacting testing policy during this last decade of the century.

Despite the many positive contributions of computer technology to testing, there are some potential negative side effects worth contemplating. The most troublesome may be the fact that computers can provide teachers with more information about test results than they understand and are able to use. More in-service work and careful design of software both can address this problem. Second, the availability of an item bank means that teachers need to write fewer items themselves and, thus, they obtain less of the much needed practice required to nourish good item-writing skills. Of course, this potential problem is greatly diminished when teachers build and maintain their own banks. Third, the quality of the items in a commercially prepared purchased item bank is not necessarily higher than the caliber of items the teacher might prepare.
So increased reliance on item banks may yield poorer measures of achievement than what teachers could develop on their own. Finally, the use of computers to administer a test may impede the performance of some test takers, even though it may yield more valid results for others. (In what ways might some test takers be put at a disadvantage?) All these potential negative effects can be examined through empirical research, but a heightened awareness of their possible influence may be most effective in minimizing their impact.

Licensure and Certification Testing

The amount of testing done to license professionals or to certify individuals [...]

Though licensure or certification requirements may include a certain college degree, so many hours of related work experience, or letters from licensed practitioners, test scores most often carry the most significant weight in the decision. Thus it is reasonable to question, as many have done, whether a paper-pencil objective test can measure the possession of the skills deemed essential for safe and trustworthy practice. Should a mechanic be required to demonstrate ability to diagnose an engine problem? Should a dentist demonstrate a tooth restoration on a live patient before being licensed? Is a multiple-choice test score sufficient information for certifying an emergency medical technician? Probably not.

A number of the important issues [...] seen by considering some of the difficulties [...] licensure by state governments. If pros[...] teacher preparation program and obtain [...] to test them? If program quality actually varies so [...] does that mean the accreditation process should b[...] [...]ers are to be tested, what kinds of content should be covered? If writing skills are to be assessed, what standard should be used to define minimal [...] sound lesson plan is deemed to be an essen[...] test be used to check knowledge of the [...]rning?
If teachers need to demonstrate [...] requisite to renewing a license, should [...] required as evidence? In view of the purpose [...] a lifetime license to a teacher [...] would provide the "final" answer [...] questions, and some are [...] prototype procedures for teacher evaluation [...] No doubt computer simulations, video chronologies, and work-product portfolios will be used to capture the many facets of "teacher" that will need to be evaluated. We need to be able to measure teacher competence in such a way that we are convinced we have measured the proper characteristics and we could obtain nearly the same results if someone else did the measuring on another occasion.

The Impact of the Courts

As might be anticipated, while the amount of testing has increased and the importance of test scores in decision making [...] more legal challenges of test-dominated decisions have developed. The grievances seldom have been directed at the instruments themselves [...] questioned the relevance of the test scores [...] or they have argued that the use of the scores [...]

Scores from intelligence and cognitive ability tests have been used to group students in buildings and [...] have said that such ability grouping [...] in some cases (Stell v. Savannah, [...] (Hobson v. Hansen, 1967; Diana v. State [...] Larry P. v. Wilson Riles (1979), it was ruled [...] children in classes for the educably [...] it had a disproportionate effect on [...] question.

Discrimination or equal protection has been central to most court decisions involving educational testing. In Bakke v. California (1978), the plaintiff had been rejected by a medical school and then learned that a minority applicant with lower test scores had been admitted. The U.S. Supreme Court ruled that Bakke should be admitted, but it only implied that the use of different standards for applicants of different races was inappropriate. However, the court did indicate that it was proper for the race [...] admission decisions. In another case [...] high school graduation test, a class a[...] cause a disproportionate number of [...] bias were dismissed (Debra P. v. Turlington [...] of tests of minimum competence for [...] However, it suspended the use of the [...] all students who had received any [...] segregated schools would have had [...] Thus, opportunity to learn the [...] judged to be a significant criterion for [...] Perhaps the most significant outcome [...] to make between what is taught, what [...] taught. To the extent these three domains differ, only the first is a legitimate standard for establishing the relevance of the content of a graduation competency test.

Discrimination has been an issue with respect to court cases involving employment testing as well. In Griggs v. Duke Power Company (1971) the court ruled that requirements for employment, such as a passing test score, must be shown to be relevant to some aspect of success [...] decided that it is permissible for the use of te[...] selection from different racial groups [...] measures knowledge or skill required [...] the outcome of U.S. v. South Carolina [...] Examinations (NTE) by the state of South Carolina resulted in disproportionate failure rates for minority certification candidates. But because teacher educators provided evidence that the content of their teacher preparation programs [...] the court rejected racial discrimination as [...] failure rate of minority examinees.

[...] impact was also at the heart of another [...] the Golden Rule case, which was settled ou[...] Department of Insurance [...] Service (ETS), the state's testing consultant, because too many minority applicants for its insurance broker positions failed the state's test. The settlement requires ETS to follow certain rules in selecting test items for future versions of the Illinois test. For example, when selecting between an item that previously has been equally difficult for blacks and whites and an item that has been harder for blacks, the former must be chosen. (For further details about the controversial procedures required by the Golden Rule settlement, see several related articles in the second issue of Volume 6 of Educational Measurement: Issues and Practice, 1987.)

Finally, scholarship selection procedures were at issue in a suit brought by the American Civil Liberties Union (ACLU) against the New York Department of Education. The court said the purpose of the state scholarship program was to reward achievement in high school, not academic promise in college. Therefore, since the exclusive use of the Scholastic Aptitude Test (SAT) score for awarding scholarships resulted in a sizable disparity favoring males, the court said the procedure discriminated against females. (There is ample evidence that the high school grades of females are higher than those of males.) The judge concluded that the most appropriate resolution was to use a composite of high school grade-point average and SAT score as the selection criterion (Staff, 1989).

While all the cases cited above involved the use of tests, the crucial issue in most of them was a matter of social policy. And what was on trial was the fair and appropriate use of test scores in decision making rather than the tests themselves. Is desegregation in school important enough to justify some apparent sacrifice of optimum learning conditions (Stell, Hobson)? Is it proper for employers to specify employee qualifications that are not related directly to job requirements (Griggs)? Should a seemingly "good" test be disqualified as a selection device because of its adverse impact on minorities (Larry P., South Carolina, Debra P.)? In attempting to right old wrongs, should selection procedures discriminate to the advantage of minority test takers (Debra P., Bakke)?
The issue that underlies the testing controversies that have ended up in court has everything to do with what a test is perceived to measure as it relates to how the scores are to be used. And this is the fundamental issue that underlies the procedures we will address in subsequent chapters on item writing and test building: Most well-developed, technically sound tests were built with a particular purpose in mind and are less useful for other purposes for which we might consider them. Why might it be inappropriate, for example, to use the scores from a high school graduation competency test to select the recipients of three college scholarship awards?

Standards for Testing Practice

In view of the increases in the amount and the significance of testing, it seems reasonable to expect state and federal governments would regulate and control the development and use of tests. Shouldn't the public be protected from poorly made tests or inappropriate uses of them, just as they should be protected from inept insurance sellers, fraudulent lawyers, or low-grade beef? Aside from the truth-in-testing legislation from New York state in 1979, no other legislation, state or federal, has been directed at the control of the testing industry or at the protection of test takers' rights. Fortunately, the testing profession has been active in developing standards of practice, albeit unenforceable, for makers and users of tests.
The Standards for Educational and Psychological Testing (American Psychological Association, 1985), also referred to as the Standards, is the most recent form of a document that has been prepared and revised over a 30-year period by educational and psychological test specialists. Though they distinguish essential and important aspects of test development, the Standards are not intended primarily as prescriptions for commercial test publishers. Instead, they are intended to distinguish appropriate and inappropriate test use and to describe the types of evidence users should seek and developers should furnish to support a specific use of the scores from a test. There are no legal ramifications for violators, and no professional sanctions will be placed on those who fail to adhere to these standards. However, professionals can exert pressure on their colleagues to conform when the beliefs and values of the profession have been documented in writing and published widely as a consensus for reasonable practice. Thus, despite their lack of teeth, the Standards are essential to the profession and, not so indirectly, to consumers whose test-taking rights are seldom protected legally or formally.

A more recent effort of similar intent has been the preparation of the Code of Fair Testing Practices in Education (1988), a document more limited in both scope of content and intended audience than the Standards. The Code has separate lists of responsibilities for test developers and test users and is written to communicate with the general public rather than only testing professionals. Its sections on developing and selecting appropriate tests, interpreting scores, striving for fairness, and informing test takers are intended to highlight the proper use of tests rather than to broaden the existing Standards in any way.
The publication and mass distribution of the Code by the five sponsoring professional organizations is a clear indication that many testing professionals see the need for self-regulation of sorts. It also demonstrates a keen desire for test takers to be treated fairly and for tests to be used properly.

There has been some interest in creating a type of consumer protection agency for testing that would function much like the Consumer's Union, the Underwriter's Laboratory, or any of the various accrediting bodies for schools and universities. In fact, the Center for the Study of Testing Evaluation and Educational Policy at Boston College has received grant funds to explore the feasibility of creating such an organization. One possible outcome of the study could be the formation of an organization that would certify (1) the quality of existing test instruments, (2) the procedures used by testing companies to develop their instruments and to perform their statistical analysis for scale and norms development, and (3) the proposed uses of existing tests for certain specific selection, placement, certification, or licensure decisions. There is still considerable debate about whether a testing "watchdog" is needed and whether the cost passed on to test takers to support this protection is worth the potential benefit to consumers. No doubt when the stakes are very high, all relevant parties (makers, users, and takers) will be persuaded about the value of independent review and subsequent certification of the process. Opinions are likely to diverge more widely, however, as the stakes decrease.

THE PRINCIPAL TASK OF THE SCHOOL

When one considers the reasons why schools were built, the reasons why both children and adults attend them, and the activities that go on inside them, it seems apparent that the main purpose of the school is to facilitate cognitive learning.
However, this thesis has been challenged throughout the years by those who have argued that schools should be concerned primarily with, for example, development of moral character (Ligon, 1961), life adjustment (U.S. Office of Education, 1951), enhancing self-confidence (Kelly, 1962), or even the restructuring of society (Counts, 1932). Clearly, all these are worthy purposes. And since learning can contribute to the attainment of each, they are not actually alternatives to learning as much as they are reasons for learning. But should they be given primary emphasis in defining the task of the school? Should they form the foundation for the school curriculum? Don't they have more to do with ultimate, lifetime goals than with the means the school should use to help students achieve those goals?

Many educators disagree with those who espouse "higher" goals than cognitive learning for education. While most teachers would acknowledge the ultimate importance of character, adjustment, self-confidence, and the good society, there are several reasons they might give for why none of these should replace learning as the school's primary focus of attention. One reason is that the school is a special-purpose social institution. It was designed and developed to do a specific task: to facilitate learning. Other agencies are responsible for other aspects of the complex task of helping people to live good lives together. For example, there are families and churches, legislative assemblies and courts, factories and unions, publishers and libraries, and markets and moneylenders. To believe that the major responsibility for ethical character, life adjustment, social reconstruction, or personal happiness must rest on the schools is as presumptuous as it is foolish.
Even the private, parochial, and home schools that have emerged as alternatives to public school education have sought to make the transmission of knowledge their primary function, secondary to the other goals that may have led to their formation. The instructional methods they choose to use, the curricular supplements they choose to endorse, and the physical facilities and environment they choose for their setting do not overshadow the fostering of cognitive learning as their main function. The special responsibility of every school is to provide training, instruction, and education. The task of facilitating learning is challenging enough, and important enough, to occupy nearly all a school's time and to consume nearly all its energy and resources.

Another reason for believing that the schools should continue to emphasize learning is the basic, instrumental importance of learning to all human affairs. With their gift of language, human beings are specially equipped for verbal learning. Cognitive excellence is their unique excellence. The more they know and understand, the better, more effective, and happier they are likely to be. How better can schools help youngsters toward happiness than by increasing their knowledge and understanding of themselves and the world around them? How else can adjustment be facilitated, character developed, or ability to contribute to society increased?

Cognitive learning is effective in reaching all these goals, but is not the only means. The psychological process we call conditioning also can be used to achieve some of these same goals. It works by making use of rewards and punishments to establish specific, habitual responses to certain specific conditions. Much of our behavior was molded, especially [...] processes of conditioning, or behavior [...] much subject to its influences.
If the sc[...] and if their sole mission was to establish behavior patterns, then they should d[...] conditioning could probably get the jo[...] the cognitive learning process could. B[...] person flexibility and freedom in choo[...] [...]tioning is better suited to the training o[...] of human beings to live happy, useful l[...]

People who object to the emphasis on learning as the school's main function may do so because they think of learning as academic specialization, designed mainly to prepare a person for further learning, and remote from the practical concerns of living. There may be some justification for this view. But learning need not be, and ought not be, the learning of useless things. It can and should be the student's main road to effective living. And as long as cognitive learning is the focus of our schools, there is a need for ways to determine the extent and type of learning that has occurred. Tests can be, and should be, among the most useful instructional tools for planning new learning activities and for monitoring students' progress in attaining the learning goals presented to them.

The Role of Affective Outcomes

Teachers and test developers are sometimes accused of overemphasizing cognitive learning, with consequent neglect of the affective determiners of behavior. Some people believe most teachers are preoccupied with what their students know; students, they say, are most concerned with what they like or dislike and how they feel. Furthermore, they submit, the most profound challenges in our society are not cognitive. They are challenges to our social unity and our social righteousness, to our ethical standards and moral values, and to our courage and compassion. If our schools dwell too much on cognitive outcomes, they will fail to contribute as they should to meeting these other important challenges.

Such viewpoints are not without foundation. Feeling is as real and as important a part of human nature as is knowing.
How we feel is almost always more important to us than what we know, and how we behave is a paramount concern for those with whom we share our lives. And since behavior is often determined more by how we feel about a situation than by what we know about it, clearly the affective dimension will play a most significant role in meeting the challenges of society.

Should schools give up some of their concerns for cognitive learning in favor of affective outcomes? Such a reemphasis should not occur for a variety of reasons. Many affective goals can be reached, at least in part, through cognitive means. Affect and cognition are not independent aspects of the personality. How we feel about a problem or an event depends in part on what we know about it. Wisdom does not guarantee happ[...] unhappiness. The affective failures [...] the pushouts, can nearly always b[...] theirs or ours. Psychologists who tr[...] [...]ally use cognitive means. The psych[...] [...]tive process of fostering self-know[...] courses in human relations focus o[...] and attempt to create a new aware[...] [...]ships by expanding the knowledge [...]

No teacher can afford to ignore the affective side effects of efforts to promote cognitive learning. In fact, the affective disposition to learn (the willingness to attend and respond) must be considered by teachers in assessing the entering behaviors of their students for each instructional unit. But teachers should not use their concern for affect as an excuse for paying less attention to cognitive outcomes.

THE POTENTIAL VALUE OF TESTING IN EDUCATION

There is currently much testing in education, but tests seldom contribute as much as they could to effective instruction. How much is learned in any particular course of instruction depends largely on how much the students want to learn and on how hard the teacher works to help them to learn it.
These efforts by students and teachers depend, in turn, on the immediate and ultimate rewards or satisfaction that seem likely to result from their efforts. Tests can be used to provide recognitions and rewards for success in learning and teaching; they can be used to motivate and direct efforts to learn. In short, they can be used to contribute substantially to effective instruction.

Tests have sometimes been used very successfully to stimulate efforts to learn. For example, in the Iowa Academic Contest that began in 1929 (Lindquist, 1960), high school students were offered tests in each of the major subjects of study: English, history, geometry, physics, and so on. Those who received the highest scores on the local test were invited to a district contest where a similar but somewhat more difficult test was given. Those who scored highest on the district tests were invited to the state contest where they took a third level of tests. Those who scored highest on these tests were offered scholarships to the State University.

This academic contest was used by some high school principals to provide incentives for both students and teachers to work hard. In some schools the local contest winners were recognized at a school assembly and in news stories. In conferences with teachers whose students had done well on the tests, the principal offered congratulations and support for continued efforts to teach effectively. In conferences with teachers whose students had not done well, the principal tried to identify things that the principal or the teacher might do to make students more successful the next year. Thus the whole school was led to believe that learning was important and that successful efforts to learn would be rewarded. An environment conducive to learning was created in the school, and every student, not just the contest winners, benefited from it.

In many schools, unfortunately, tests are not used so effectively to stimulate and facilitate learning. Test scores do not matter very much, and unless they matter they cannot contribute much to effective instruction. There are several reasons, none of them very good, why some teachers and school administrators depreciate testing and do as little of it as possible. The tests are criticized as having little value or as being actually harmful. Doing a good job of testing demands skills that many educators know that they lack and requires work that their lives are more comfortable without. Testing involves comparison and competition. Even though these are facts of life, some teachers believe that schools should protect their students from competition as much as possible. Unless all students can win, none should be allowed to win. Thus some schools are content with a comfortable mediocrity as long as the public will tolerate it.

Heavily taxed citizens in many states are not willing to tolerate mediocrity in their schools. They are asking for evidence that their tax dollars are buying excellence in education. They are asking that the schools do something, or demanding that communities do something, to correct the conditions that educators blame for low achievement in learning. Since public school teachers and administrators are public employees, it is entirely proper for the public to hold their schools accountable for doing the best job possible under the circumstances.

There are two things, both involving the use of tests, that teachers and schools can do, and ought to do, to justify their stewardship to the community. Each teacher ought to present evidence periodically to the school administration that the students he or she has been teaching have made substantial progress in learning.
Each school ought to present evidence periodically to the community that the students in the school are making substantial progress in learning. It is not sufficient for teachers and schools to describe their processes of instruction and to claim that they know how to do a good job of educating children. The public is more interested in the product than in the process, and it would like to see evidence to support the claims.

Of course, not all evidence of learning can be, or should be, furnished in the form of test scores, whether from teacher-prepared or standardized tests. Public performances and displays of student creative efforts, portfolios of student work and products, and in-class observations by teachers and administrators can and should be used to support the positive efforts of students and teachers. It is possible to use tests effectively to promote and document learning also. However, such use requires that both teachers and administrators be knowledgeable about and skilled in the use of educational tests. The remaining chapters of this book are devoted to the presentation of concepts and principles that will contribute to the development of many of these essential skills.

SUMMARY PROPOSITIONS

1. [...] recent surge in educational testing is part of a historical cycle pattern of test use.
2. Mandated testing and the accompanying potential negative consequences have unduly shaped the curriculum and teaching practices of many [...] educators have not had an opportunity to learn as much as they need to know about educational measurement.
4. The influence standardized tests have on the local school curriculum is probably more beneficial than harmful.
5. "Teaching to the test" is deplorable if it means giving students answers to the particular questions on a test; it is commendable if it means helping students learn what they must know to answer questions like those on the test.
6. Claims that testing harms students tend to be exaggerated and seldom are substantiated.
7. Test bias may exist to some extent on some tests, but it cannot account for substantial differences in test scores between different cultural groups.
8. Objective tests can provide highly valid, precise, and convenient measurements of most of the important outcomes of education.
9. Comprehensive instruction can benefit from two kinds of test information: scores that tell what students can do and scores that show how students rank among their peers.
10. State and district assessment programs required by law are intended to evaluate the general quality of the educational program or to identify specific competencies held by students "ready" for high school graduation.
11. A district's customized testing program is no adequate replacement for a nationally standardized testing program.
12. One significant contribution of the National Assessment of Educational Progress was to provide a model that states might adapt for designing and operating statewide assessment programs.
13. Test assembly and administration can be accomplished efficiently with the use of computers without sacrificing test quality.
14. The use of purchased item banks by teachers could have a negative impact on test quality, as well as on the development of teachers' item-writing skills.
15. Testing to certify competence or to license practitioners often requires a demonstration of processes or products that a paper-and-pencil testing approach cannot accommodate.
16. Court cases of the last several decades that involved testing were focused nearly exclusively on the social consequences of testing and fair test use rather than on the tests themselves.
17. The Standards for Educational and Psychological Testing represent an effort by the testing profession to monitor itself and to provide a form of consumer protection to test takers and users.
18. There are good reasons for believing that the primary task of the school is to facilitate cognitive learning.
19. Among the limited means that schools can use to help students become effective and happy adults, cultivating their cognitive abilities is the most appropriate and desirable.
20. Schools should seek to attain affective ends only through cognitive means.
21. Tests could be used to promote learning better if both teachers and administrators systematically provided test results to the community as evidence of the educational progress of students.

QUESTIONS FOR STUDY AND DISCUSSION

1. What useful purposes could be served by a mandated national testing program?
2. Why is there no federal school curriculum, a common core for all schools in the United States?
3. Under what kinds of circumstances might "teaching the test" be appropriate and desirable?
4. What kind of evidence should be furnished to support someone's claim that a particular test is biased against a given ethnic group?
5. How should the standards for a high school graduation competency test be established?
6. What are the pros and cons of making available state-by-state comparisons of achievement test scores?
7. What functions might be served by a testing "watchdog" organization that would provide protection to test consumers?
8. Which individuals or agencies should be assigned major responsibility for developing interpersonal skills, societal values, and personal attitudes in young people?

Measurement and the Instructional Process

EVALUATION, MEASUREMENT, AND TESTING

The purpose of evaluation is to make a judgment about the quality or worth of something: an educational program, worker performance or proficiency, or student attainments.
That is what we attempt to do when we evaluate students' achievements, employees' productivity, or prospective practitioners' competencies. In each case the goal is not simply to describe what the students, employees, or other personnel can do. Instead we seek answers to such questions as: How good is the level of achievement? How good is the performance? Have they learned enough? Is their work good enough? These are questions of value that require the exercise of judgment.

To say simply that evaluation is the process of making value judgments understates the complexity and difficulty of the effort required. Once it has been determined that evaluation is needed, the evaluator must decide what kind of information is needed, how the information should be gathered, and how the information should be synthesized to support the outcome, the value judgment. Thus, evaluation is as concerned with information gathering as it is with making decisions. In addition, the term is used to refer to the product or outcome of the process. That is, we might, for example, submit our evaluation (the product) of Scott's school performance to his parents following our evaluation (the process) of his accomplishments. In this respect evaluation has a dual connotation.

Evaluation: Formative and Summative

The terms formative and sum[...] to describe the various roles of evalua[...]struction. Formative evaluation is condu[...] to determine whether learning is takin[...] conducted at the end of an instructio[...] sufficiently complete to warrant movi[...]struction. The distinctions between th[...]tions for test development and use in educational programs.
As will be noted [...] summative purposes may be used on o[...] The major function of formative [...] feedback to the teacher and to the st[...] feedback provides an opportunity for t[...]ods or materials to facilitate learning [...] going well. Formative evaluation requir[...]mation on frequent occasions. Informat[...]tion, classroom oral questioning, home[...] inventories. Much of what a teacher do[...]led as formative evaluation. The role of [...] highly systematized programs of individu[...]rently for formative evaluation. [...] the end of an instructional segment [...] Relative to formative evaluation, there is [...]tive evaluation. The information gathered is less detailed in nature but broader in scope of content or skills assessed. Figure 2-1 compares some of the distinguishing characteristics of the two types.

Obviously, both types of evaluation are necessary components of classroom instruction. In some cases, information gathered for summative purposes may be useful in a formative sense. For example, the scores on a unit test may be used to evaluate achievement at the end of that unit. At the same time the scores reflect progress in the course and in the broader instructional program. In such circumstances the tests should be designed to yield useful information for summative evaluation purposes, but the scores might be used incidentally as gross indicators of progress in the broader context.

Figure 2-1. Characteristics that Distinguish Classroom Formative and Summative Evaluation

                 Formative                          Summative
Purpose          Monitor progress                   Check final status
Content Focus    Detailed, narrow scope             General, broad scope
Methods          Observations, daily assignments    Tests, projects
Frequency        Daily                              Weekly, or every 2-3 weeks

Measurement: Assigning Numbers

Measurement is the process of assigning numbers to individuals or their characteristics according to specified rules.
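A minimal sketch of one such rule, written in Python: count the items a student answers correctly, and apply the identical rule to every student. The answer key, the student labels, and the function name `score_test` are hypothetical and serve only as illustration.

```python
# Hypothetical illustration: the "specified rule" is a count of correct answers.
def score_test(responses, answer_key):
    """Return the number of items answered correctly."""
    return sum(1 for given, correct in zip(responses, answer_key) if given == correct)

# Invented five-item true-false key and two invented response sets.
answer_key = ["T", "F", "T", "T", "F"]
class_responses = {
    "student_a": ["T", "F", "T", "F", "F"],
    "student_b": ["T", "T", "F", "T", "F"],
}

# The same rule assigns a number to every student in the class.
scores = {name: score_test(resp, answer_key) for name, resp in class_responses.items()}
print(scores)
```

The numbers produced this way describe performance; deciding whether a given number is "good enough" is a separate act of evaluation.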
Measurement requires the use of numbers but does not require that value judgments be made about the numbers obtained from the process. We measure achievement with a test by counting the number of test items a student answers correctly, and we use exactly the same rule to assign a number to the achievement of each student in the class. Measurements are useful for describing the amount of certain abilities that individuals have. For that reason, they represent useful information for the evaluation process.

But can we measure all the important outcomes of our instructional efforts? Education is an extensive, diverse, and complex enterprise, not only in terms of the achievements it seeks to develop, but also in terms of the means by which it seeks to develop them. Our understanding of the nature and process of education is far from perfect. Hence it is easy to agree that we do not now know how to measure all important educational outcomes. But, in principle, all important outcomes of education are measurable. They may not be measurable with the tests currently available. They may not even be measurable in principle, using only paper and pencil tests. But if they are known to be important, they must be measurable.

To be important, an outcome of education must make an observable difference. That is, at some time, under some circumstance, a person who has more of it must behave differently from a person who has less of it. If different degrees or amounts of an educational achievement never make any observable difference, what evidence can be found to show that it is in fact important? But if such differences can be observed, then the achievement is measurable, for all that measurement requires is verifiable observation of a more-less relationship. Can integrity be measured? It can if verifiable differences in integrity can be observed among individuals. Can mother love be measured? If observers can agree that a hen shows more mother love than a female trout, or that Mrs.
A shows more love for her children than Mrs. B, then mother love can be measured.

The argument, then, is this: To be important an educational outcome must make a difference. If it makes a difference, the basis for measurement exists.

To say that Rita shows more "spunk" than Ned may not seem like much of a measurement. Where are the numbers? Yet out of a series of such more-less comparisons, a scale for measuring people's spunk can be constructed. The Ayres scale for measuring the quality of handwriting is a familiar example of this (Ayres, 1912). If a sequence of numbers is assigned to the sequence of steps or intervals that make up the scale, then the scale can yield quantitative measurements. If used carefully by a skilled judge, it yields measurements that are reasonably objective (that is, free from errors associated with specific judges) and reliable (that is, free from errors associated with the use of a particular set of test items or tasks).

Are some outcomes of education essentially qualitative rather than quantitative? If so, is it reasonable to expect that these qualitative outcomes can be measured? It is certainly true that some differences between persons are not usu[...] as more-less dif[...] French [...]ences is a man; that one is a [...] in quantitative terms, too. This person has [...] man; that one has less. This person has more [...] person has more ability to speak French; that on[...] We may think of the weight of a man, [...] account as quantities, while regarding [...] his health, as qualities. And if they serve to differentiate hi[...] exhibits more or less of them than other men, they [...] It is difficult to think of any qualit[...]fied. "Whatever exists at all exists in some amoun[...] p. 16). And William A. McCall (1939) [...] can be measured" (p. l g).

Testing: A Form of Measurement

Tests represent one particular measurement technique. A test is a set of questions, each of which has a correct answer, that examinees usually answer orally or in writing. Test questions differ from those used in measures of attitudes, interests or preferences, or certain other aspects of personality. Ideally, the questions in tests of achievement or many tests of intelligence have answers that content experts can agree are correct; correctness is not determined by the particular values, preferences, or dislikes of a group of judges.

All tests are a subset of the quantitative tools or techniques that are classified as measurements. And all measurement techniques are a subset of the quantitative and qualitative techniques used in evaluation. Our major concern in this text, but certainly not the only one, will be with the development of tests that can contribute to summative evaluation of student learning. Other measurement and evaluation techniques are useful for other educational purposes, but tests that measure [...] school learning with precision are the most useful tools available to teachers for most classroom summative evaluation needs.

EVALUATION IN THE TEACHING PROCESS

The evaluation of learning takes place in an instructional context and, consequently, that learning environment shapes [...] we evaluate, influ[...] evaluating [...] evaluation [...] of instruction, it is not [...] teaching process. The [...] must be understood as [...] that end, the role of e[...] that explains how the teaching process works. [...] is an integral part [...] loosely attached to the [...] role of evaluation in it both [...]ucational measurement. To re[...] described using a model
To re described using a model M EA S U R E ME NATN D TH E IN S TR U C TION A PLR OC E S S 27 The Basic Teaching Model There are many models that describe the variety of approachesto teaching found in our schools, but the Basic Teaching Model (BTM), introduced by Glaser (1962), accounts for the fundamenral components of rnosr other specific teaching models, such as the Socratic approach, the individualized instruction approach, or the computer-dominated instructional approach (foyce and Weil, 1980).Few teachers probably follow the BTM steps explicitly to guide their instructional activities. And though we do not specifically epdorse the use of the BTM or any other particular rrrodel,we do advocateinstructional approaches,by whatever name, that account for the fundamental functions represented in the BTM as described next. The main purposes of the BTM are to identify the major activities of the teacher and to describe the relationships between activities.Figtrre 2-2 is a diagram of the model. Our primary interest is the Performance Assessmentcompo. nent, but we canrlot understand completely the role of evaluation without understanding how Performance Assessmentaffects,and is affected by, other teaching activities. Instructional objectiaes,the first componenr of the BTM, represents the teacher's starting point in providing instruction. What should students learn? What skills and knowledge should be the focus of instruction? What is the curric. ulum and how is it defined? The second componenr, Entning Behauior,indicates that the teacher must try to assessthe students' levels of achievement and readiness to learn prior to beginning instruction. What do the students know already and what are their cognitive skills like? How receptive to learning are they? Which ones seem self-motivated?This component indicatesa need for evaluation i nform ation beforeinstruction actual ly begin s. 
Once the teacher has decided what will be taught and to whom the teaching is to be directed, the "How?" must be determined. The Instructional Procedures component deals with the materials and methods of instruction the teacher selects or develops to facilitate student learning. Does the text need to be supplemented with illustrations? Should small group projects be developed? Is there computer software available to serve as a refresher for prerequisites? At this point instruction could begin, and often it does. But unless the teacher makes plans to evaluate students' performances, the students and teacher will never be [...]

Figure 2-2. The Basic Teaching Model (DeCecco and Crawford, 1974)

Evaluation Planning

Figure 2-3. Examples of Methods that Serve Varying Purposes in an Evaluation Planning Guide

LEVEL OF INSTRUCTION: Course
  Entering Behavior: Cumulative folders, questionnaires, observation, oral questioning
  Formative Evaluation: Unit tests, projects, papers, observation, participation patterns
  Summative Evaluation: Final examination, comprehensive project, research paper, performance ratings

LEVEL OF INSTRUCTION: Unit
  Entering Behavior: Pretest, oral questioning, checklist, observation
  Formative Evaluation: Quizzes, oral questioning, [...] results, participation records
  Summative Evaluation: Unit test, written project, work product, presentation, participation record, performance checklist

LEVEL OF INSTRUCTION: Daily Lesson
  Entering Behavior: Observation, oral questioning, homework results
  Formative Evaluation: Teacher questioning, student questioning, quizzes, activity observation, nonverbal observation
  Summative Evaluation: [Ordinarily not applicable]

[...] of school, teachers spend considerable time "sizing up" their class, both the group and the individuals in it (Airasian, 1989). Teachers might review cumulative record folders or solicit specific background information from students, but most data gathering is unplanned observation and questioning directed more toward social and emotional behaviors than academic ones.
Teachers should plan for their evaluation needs, no matter which level of instruction they happen to be considering. By deciding in advance what kind of information they need and how it might be obtained, teachers can see that evaluation is done efficiently and yields complete and helpful information.

Another reason for developing an evaluation planning guide for a unit, for example, is that the teacher will be forced to plan for assessing the achievement of some of the hard-to-measure outcomes of instruction. For example, the teacher may plan to evaluate the achievement of the 12 main objectives in a science unit by using an objective test, two essay items, and a laboratory observation checklist. Without the planning, however, last-minute attention to evaluation might result in the use of only an objective test. Test-planning activities that will be discussed in Chapter 7 will help to decide how "testable" objectives can be measured. But in the absence of an evaluation planning guide for the unit, nontestable outcomes may get lost in the shuffle. An assessment of science achievement should reflect learning of all relevant objectives, not just those that are most easily assessed.

Finally, it can be seen from an inspection of Figure 2-3 that evaluation activities vary depending on the level of instruction and the type of information needed. Note, for example, that standardized achievement-test scores, found in cumulative folders, are useful on the course level, but only for providing information on entering behavior. They are much less useful as summative evaluation information, and they are of little value at the unit and daily lesson levels. Note, too, that summative evaluation of daily lessons is not meaningful because such lessons are seldom ends in themselves. Which methods seem more helpful for formative than summative evaluation purposes? Why isn't homework included under the summative evaluation heading? In which [...] category do you find the components that are listed under formative evaluation at the course level?

FUNCTIONS OF ACHIEVEMENT TESTS

The major function of a cla[...]ment and thus to contribute to the ev[...]ments. This is a matter of considera[...] have, that what students know [...] opportunities, there is nothing "mere" [...] will be tested, if they know what the test will require, and if the test does a good job of measuring the achievement of the essential course objectives, then its motivating and guiding influence will be most wholesome.

Anticipated tests should be regarded as extrinsic motivators of learning efforts, and internal desires or needs to achieve should be regarded as intrinsic motivators. Since both kinds contribute to learning, the withdrawal of either would probably lessen the learning of most students. For a fortunate few, intrinsic motivation may be strong enough to stimulate all the effort to learn that the student ought to exert. For most of us, however, the motivation provided by tests and other influential factors is indispensable. What we stand to gain or what we might lose in a given situation is motivating to all of us. We live with such tradeoffs. Tests help us make many of our decisions about trying to learn: whether to try, how hard to try, and when to stop trying.

Classroom tests do serve other useful educational functions. The process of building them should cause instructors to think carefully about the objectives of instruction in a course. It should cause them to define their objectives operationally, that is, in terms of the kinds of tasks a student must be able to handle to demonstrate achievement of those objectives. And from the students' perspective, the process of taking a classroom test and discussing the outcome afterward can be a richly rewarding learning experience.
As Stroud (1946) put it long ago,

    It is probably not extravagant to say that the contribution made to a student's store of knowledge by taking of an examination is as great, minute for minute, as any other enterprise he engages in. (p. 476)

Hence, testing and teaching should not be regarded as mutually exclusive or as competitors for valuable instructional time. They are intimately related parts of the total teaching effort, as the BTM illustrates.

LIMITATIONS OF ACHIEVEMENT TESTS

It is easy to show that mental measurement falls far short of the standards of logical soundness that have been set for physical measurement. Ordinarily, the best it can do is provide an approximate rank order of individuals in terms of their ability to perform a more or less well defined set of tasks. Unlike the inch or pound, the units used in measuring this ability cannot be shown to be equal. The zero point on the ability scale is not clearly defined. Because of these limitations, some of the things we often do with test scores, such as finding means, standard deviations, and correlation coefficients, ought not to be done if strict mathematical logic holds sway. Nonetheless, we often find it practically useful to do them. When strict logic conflicts with practical utility, it is the utility that usually wins, as it probably should.

It is well for us to recognize the logical limitations of the units and scales used in educational measurement. But it is also important not to be so impressed by these limitations that we stop doing the useful things we can legitimately do. One of those useful things is to measure educational achievement.

Are some outcomes of education too intangible to be measured? No [...]

Alternatives to Tests

Teachers obtain information ab[...] greater amounts of some characteristic o[...] of students' abilities can be made on the basis of descriptive information. Because
Because are bound to be qualitative, not quantitative, they provide only these assessments limited and very imperfect indicators of achievement.Such descriptions are dom. inated by terms like excellent, mediocre, worthwhile, well writtetl, satisfactory, and quite good. Qualitative descriptions of direct or indirect behavior observations, however specific and objective, ma)/ have some value in assessingachievernerit, but they are no adequate replacement for a well-prepared classroom achievement test. Ratings of performance or products, on the other hand, do involve assigning numbers to things and, hence, do constitute measurements. Thus, they are useful in differentiating individuals who possessdifferent amounts of the traits measured by the ratings. These kinds of rating scales tend to measure certain aspects of achievement that tests are less well equipped to measure. Consequently, while they seldom can replace tests, they frequently provide useful supplements to the information provided by tests. Many teachersuse ratings of a student's discussionparticipation and of the student's written work as part of the basis for evaluating learning. It is important to note that the value of ratings, as well as of a test or any other measurement of achievement, depends on their reproducibility, accuracy,and appropriateness. Assessmentsof achievement ordinarily should not be limited to tests, but the alternativ'es and supplements that are availatrle must be used with full realization of their limitations and pitfalls. Chapter l4 is devoted to a discussion of the development and use of nontest alternatives for gathering achievernent information. INTERPRETINGMEASUREMENTS The result of measuring is a number, but that number has no inherent meaning and, consequently, is not a useful contributor to decision making. To make the number usqful, or meaningful, it is necessary to compare it with something. If Gail, a 2l-year old female, weighs 62 gauchos, what does that mean? 
If it is 8 bukas between your town and mine, what does that number mean? If Tonya got 15 right on an algebra test, what does that 15 tell us? My score of 23 on the Attitude Toward Computers Scale means nothing by itself. I need to reference it, or compare it, with something that has meaning in order to interpret my score. What are these "somethings"?

When we step down from the scale after weighing ourselves, we lend meaning to the number we read by referencing it to any of several other numbers: our expectation for how much we think we should weigh; the result from our last weighing; the weights of other individuals of our own height, gender, or age; or a listing of numbers that define such terms as obese, overweight, about right, underweight, and emaciated.

Similar kinds of comparisons can be made to interpret a test score. If my expectation for myself was a score of at least 85 on a midsemester exam, then actual scores of 67 or 92 each will have quite different meanings to me. If my score of 92 is 15 higher than my score on this same test a week ago, my score obtains meaning in terms of growth or change by referencing my first score. Knowing that three-fourths of my classmates obtained scores above 92 also supplies interpretative information. And, finally, if I know there were 115 [...] is important to underst[...] depending [...] mind. A close look at the [...] background for test deve[...] and inappropriate score [...]

Norm-referenced Interpretations

[...]als (or groups) to obtain [...] term "norm" relates to [...]s, norm-referenced inter[...] geometry score with the [...] to determine [...] of summative evaluation is to compare the achievement scores of the groups. Treatment-referenced interpretations are made when the score (average) of one group is compared with the scores (averages) of other groups that have experienced the same, rival, or no instructional treatment.
Tests designed to yield such interpretations contain items that are sensitive to instruction; that is, they are much easier for students who have been instructed than for those who have not been. Interpretations are made in reference to the varying methodological treatments or instructional levels. The mean score of a particular group takes on significance only when compared with the mean of some other group. For example, "The team-taught class scored higher than the control group" or "High-ability students who used calculators scored the same as low-ability students who did not use calculators." No direct reference is ordinarily made to test-item content or subject matter to derive meaning from the scores. Achievement scores obtained through most of the "methods" research in the education literature are interpreted in a treatment-referenced fashion. In addition, school norms, or norms for school averages, that are reported from the scoring of standardized achievement tests are best labeled treatment referenced.

Group referenced is the broad term we find convenient to use in referring to norm-referenced and treatment-referenced collectively. Both kinds of interpretation involve comparing a single score with a group of scores. In the first case, these are scores of individuals. In the second, these are scores of groups of individuals. The distinction is an important one because, when we want to interpret the performance of, say, a class of first graders, we should compare their average score with the averages of other first-grade classes, not with the scores of other individual first-grade pupils. The consequences of using the wrong norm group for making comparisons will be explained in greater detail in Chapter 16.

Criterion-referenced Interpretations

A criterion-referenced interpretation is made when we compare a person's score with scores that each represent distinct levels of performance in some specific content area or with respect to a behavioral task.
Meaning is obtained by describing what the person can do, in terms of various gradations, in an absolute sense. Glaser (1963) first used the term to highlight the need for tests that can describe the position of a learner on a performance continuum, rather than the learner's rank within a group of learners. For example, we may want to know if Bob can solve geometry problems that require an understanding of the properties of the rhombus and trapezoid, but we do not care so much how well or poorly his performance compares with that of his classmates. Such interpretations are important when the goal is to determine whether students have the prerequisites to profit from a new instructional unit or if they have learned the essential ideas in a unit before moving on to a new unit.

After over 25 years of using the term criterion referenced, there is much confusion, even among measurement specialists, about what the term means. Part of the confusion stems from the fact that "criterion" is used in several other ways by testing specialists. Another part of the confusion relates to the wide variety of interpretations that can be classified correctly as criterion referenced (Nitko, 1980). Both Hively (1974) and Millman (1974b) recognized this ambiguity and [...]

[...] domain. Items represent only a (random) sample [...] Some behaviors included in the domain [...] or may be underrepresented by the items [...] on the basis of both sampled and unsampled [...] course proficiency tests [...] prepared achievement batteries are [...] In many instructional settings [...] easy to describe because its constituent [...] domains in mathematics and foreign language [...] for example, than those in literature or social [...] interpretations are meaningful only to [...] be communicated to those who [...] score. It should be apparent, also, that [...] communicated clearly, misinterpretation [...] reflect the intended domain well. An example will illustrate several of these points.
Suppose we want to measure students' skills in using the dictionary. With that goal in mind we could begin to write test items, but the content domain of interest will be relatively ill defined. A [...] described by listing these particular uses [...] to determine word meaning, to determine [...] extent of mastery of that collection [...] such objectives-referenced interpretations [...] will be needed and separate subtests will need to be constructed to ensure that the subdomains are being measured thoroughly. Then four scores, one for each subdomain, would be obtained so that separate decisions about the mastery of each skill (objective) could be made.

Cutoff-score Interpretations

There is another score-interpretation situation that merits special consideration because it is so easy for the outcome to result in misinterpretation. Here are some examples first.

1. You need a score of 88 percent to earn a B grade.
2. You will be placed in German 202 if your score is in the range from 40 to 52.
3. A TOEFL score of at least 280 is needed for admission.
4. Those who score 16 or higher will be awarded five credit hours for Calculus I.
5. The passing score on the certification test is 120.

These are situations in which the score users are not particularly interested in domain scores or in norms. Instead, some minimal standard of performance is the most logical reference for obtaining score meaning. It appears, on the surface at least, that this is just another form of criterion-referenced interpretation because each cutoff score represents a performance standard. But does it? What is the basis for choosing 88 percent rather than 85 or 82 percent for the B cutoff score? What level of content proficiency does a calculus test score of 16 represent, or does a score of 16 simply "promote" the top 10 percent of the test takers out of Calculus I?
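The two rationales raised by these questions can be contrasted in a short sketch. It is only an illustration under stated assumptions: the function names, the 50-item test length, and the ten scores are hypothetical and do not come from the text. One function fixes a cutoff as a percent-correct standard; the other finds whatever score happens to separate the top share of a particular group.

```python
def percent_correct_cutoff(total_items, percent):
    """Lowest raw score that meets a fixed percent-correct standard."""
    # Integer ceiling of total_items * percent / 100, avoiding float rounding.
    return (total_items * percent + 99) // 100


def top_percent_cutoff(scores, percent):
    """Lowest score earned by the top `percent` of examinees in this group."""
    ranked = sorted(scores, reverse=True)
    # Select at least one examinee, even in a very small group.
    n_selected = max(1, len(ranked) * percent // 100)
    return ranked[n_selected - 1]


# Hypothetical data: a 50-item test taken by ten examinees.
scores = [12, 19, 16, 9, 14, 17, 11, 15, 13, 18]

absolute_cutoff = percent_correct_cutoff(total_items=50, percent=88)
relative_cutoff = top_percent_cutoff(scores, percent=20)

print("88 percent of 50 items:", absolute_cutoff)
print("score separating the top 20 percent:", relative_cutoff)
```

The first cutoff depends only on the test and the chosen standard. The second changes whenever the group of examinees changes, which is why the same kind of number can carry either an absolute or a relative meaning.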
Whenever a cutoff score is used as a basis for score interpretation, we must know the rationale for selecting the cutoff score in order to determine whether norm-referenced or criterion-referenced interpretation is being used. For example, a test might be given to identify the talented eighth graders who could benefit from an enriched algebra course next year. Those in the top 20 percent (the top 16 students out of the class of 80) might be selected. Whatever score separates the top 20 percent is the relevant performance standard for this group. This is obviously a norm-referenced interpretation because each student's outcome depended on his or her ranking in the group. [...] prerequisites. A cutoff score using the [...] be set. This method of setting the cutoff [...] standard and thus qualifies as criterion [...] bottom section, which statements are ex[...] interpretation?

Figure 2-4. Interpretive Statements Distinguishing Norm-Referenced and Criterion-Referenced Interpretations

Norm-Referenced Interpretations
1. Rico got the highest score in the class.
2. No other 5th grade class in the district has a lower average vocabulary score.
3. Sara's score of 77 is well above the class average of 58.
4. Prince won the "Best of Show" award at the pet show.
5. Ben's percentile rank on the listening test is 35.
6. The average score of the "Writing to Read" students was higher than the average of the other students.
7. My GRE score is 450.

Criterion-Referenced Interpretations
1. Erica can correctly name the capitals of 47 states.
2. Jody has achieved 3 of the 9 science goals.
3. Katie has a perfect score.
4. Billie missed 6 of the 8 items dealing with adding unlike fractions.
5. Andy correctly spelled 93% of the words from this quarter's list.
6. Bert can type 52 words/minute without errors.
7. I got half of the true-false capitalization items right.

SUMMARY PROPOSITIONS

1.
Evaluation is an information-gathering process that results in judgments about the quality or worth of a performance, product, process, or activity.
2. The results of formative evaluation are used primarily to monitor learning and improve the instructional process.
3. The results of summative evaluation are used primarily to make final judgments about the extent of learning or the quality of the instructional program.
4. Measures are tools of evaluation that require a quantification of information.
5. Any important outcome of education is necessarily measurable, but not necessarily by means of a paper and pencil test.
6. It is a mistake to believe that qualities cannot be measured.
7. All tests are measures, and all measures are included in the set of qualitative and quantitative techniques of evaluation.
8. The Basic Teaching Model is a conceptual description of the essential ingredients of the teaching process. Its components (instructional objectives, entering behavior, instructional procedures, performance assessment, and feedback loop) represent the general activities one would expect to find among the procedures of successful teachers, regardless of the specific teaching model they employ.
9. The relationship of evaluation activities to the other essential aspects of teaching can be described with the Basic Teaching Model.
10. An evaluation plan describes the methods to be used in an instructional segment to obtain information about entering behavior and information for formative and summative purposes.
11. The measurement of educational achievement is essential to effective formal education.
12. The primary function of a classroom test is to measure student achievement accurately.
13. Classroom tests can help motivate and direct student achievement and can contribute to learning directly.
14. The development of a good classroom test requires the teacher to define the course objectives in specific terms.
15.
The fact that educational measurements fail to meet high standards of mathematical soundness does not destroy their educational value.
16. Educational outcomes that are said to be intangible because they are not clearly defined are as difficult to attain through purposeful teaching as they are to measure.
17. The imperfect tests we now use serve us far better than we would be served by the use of qualitative assessments alone.
18. Criterion referenced and norm referenced more precisely describe kinds of test-score interpretations than types of tests.
19. Domain referenced and objectives referenced, both types of criterion-referenced interpretations, are applied in situations where the test content is either a sample of interest or the entire universe of interest, respectively.
20. Norm-referenced interpretations involve comparing one person's score with the scores of other individuals, but treatment-referenced interpretations compare the score of one group with the scores of other groups.
21. Criterion-referenced interpretations involve comparing one person's score with a set of absolute performance standards.
22. When a cutoff score is used, the underlying interpretation may be either absolute or relative, depending on the method used to establish the cutoff score.

QUESTIONS FOR STUDY AND DISCUSSION

1. How is the process of evaluation different from the process of measuring?
2. In what ways do formative and summative evaluation often take place in employee performance appraisal in various work settings?
3. What are some important educational outcomes that seemingly cannot be measured by any available means? What is the basis for establishing the importance of these outcomes?
4. Thinking back to a course you have taken recently, how were the components of the BTM evidenced in the teacher's behaviors? Which components seemed to have been missing, if any?
5. In which component of the BTM does the evaluation planning process probably best fit?
6.
For what reasons might the results of daily assignments be better categorized as formative rather than summative evaluation? What are the implications of the distinction?
7. If the use of paper and pencil tests were abolished at all educational levels, what might some of the important direct and indirect consequences be for students, teachers, and others?
8. What kinds of group-referenced interpretations do teachers and administrators make most frequently?
9. Under what circumstances might we be interested in knowing only whether a specific examinee scored above or below the average score of a certain group?
10. Why might it be difficult to make useful norm-referenced interpretations with scores from a test designed to provide criterion-referenced interpretations?
11. What are some practical examples of the use of cutoff scores that do not provide content-related interpretations, that is, indications of what examinees can do?

Measuring Important Achievements

THE COGNITIVE OUTCOMES OF EDUCATION

If we look at what actually goes on in our school and college classrooms, labs, libraries, and lecture halls, it is reasonable to conclude that the major goal of education is to develop in students a command of substantive knowledge. Achievement of this kind of cognitive mastery is certainly not the only concern of educators, parents, and students, but it is the central concern. What is this important knowledge and how does it relate to understanding, thinking, and performing? We need answers to these questions so that we can decide which achievements our educational tests should measure.

Knowledge Versus Information

Knowledge originates in information that can be received directly from observation or indirectly from reports of observations. Anything we hear, read, smell, or otherwise experience can become part of our knowledge. If it is remembered, it does become knowledge.
But if it is only remembered, without being thought about, it remains mere information, the most elementary and least useful form of knowledge. If, on the other hand, information becomes the subject of our reflective thought, if we ask ourselves, "What does it mean?" "How do we know?" "Why is it so?", we may come to understand the information. It can be integrated into a system of relations among concepts and ideas, all of which constitute a structure of knowledge. This process of encoding is essential to enable later retrieval; observations that are not encoded in some way cannot be recalled.

Information that is stored in our memory by semantic encoding, that is, by associating its meaning with information already stored, is permanent, useful, and satisfying relative to information stored through episodic encoding (Anderson, 1983). In the latter case, information is encoded by associating it with other information related to our personal experience. Telephone numbers, what we wore two days ago, and what we plan to do next weekend are examples of information that is episodic. We may not remember that we learned the meaning of "prognosticator" in seventh grade (episodic encoding), but we likely still remember what it means (semantic encoding). Information that has been integrated into an existing structure of knowledge is likely to be a more [...] possession than information that is simply remembered [...] (Boulding, [...]).

The source of our verbal knowledge [...] minds in the form of images (Polanyi, 196[...]) and as such is a purely private possession. But abstracted from these images and expressed in words, [...] information [...] can be communicated, it can be recorded and stored for future reference, and it can be manipulated in the process of reflective thinking. Verbal knowledge is a very powerful form of knowledge. The peculiar excellence of humans among all other earthly creatures is their ability to produce and use verbal knowledge.

If a structure of verbal knowledge consists entirely of a system of articulated relations among concepts and ideas, can it be described completely by listing the elements (propositions) that compose it? Might not a complex structure involve relations or dimensions that are not expressed by the constituent elements of the structure? Conceivably, a listing of the elements of a structure may lack some that have not been perceived or expressed in words. But the existence of such an unperceived element [...] and express it. It could then be added to the list. The [...] is that a structure of verbal knowledge can be described by listing the propositions that compose it [...] it appears to be logical. The whole in this case appears to be precisely equal to the sum of all the parts.

Propositions Represent Knowledge

If the primary goal of education is to help students build and use structures of verbal knowledge, it follows that tests designed to measure achievement should be composed [...] determine the extent to which students possess these structures of verbal knowledge. Two ideas [...] and Nagel (1934) about propositions are particularly germane in this context: (1) knowledge is of propositions and (2) a proposition is a statement that can be said to be true or false. (Our use of the term here is not limited to the basic "if-then" statements used in logical analysis in the field [...].) Propositions are expressed in sentences, but not all sentences are propositions. Those expressing questions or commands cannot be said to be true or false, nor can those that report purely subjective wishes or feelings.
Propositions are always declarative statements about objects or events in the external world. For example,

The earth is a planet in the solar system.
A body immersed in a fluid is buoyed up by a force equal to the weight of the fluid displaced.
As we consume or acquire additional units of any commodity, the satisfaction derived from each additional installment tends to diminish.
[...] J. Bryan failed in his bid for election to the presidency of the U.S. in the campaign of [...]
Rainfall in New York on December 1, [...]
The cost of living in Canada increased by two-fifths of a point during October, [...]
[...] mentioned on page 136 of Educational Measurement, edited by E. F. Lindquist.

[...] based on propositions such as these, but [...] test items. To be suitable, propositions need to meet [...]

1. They must be concise, worded as accurately and unambiguously as the precision of knowledge and language permit.
2. They must be true, as established by a preponderance of experts in the field.
3. They must be worthy of remembering, as judged by experts in the field.
4. They must represent knowledge unique to the field, that is, principles and concepts not generally known by those who have not studied the subject matter.

[...] propositions that meet these standards in [...] about the value of study in that field. [...] difficult to prepare, it may be because [...] is too ill defined or because the item [...] statements can only represent the verbal knowledge [...] Physical skills or affective outcomes that we may want students to acquire cannot be represented by propositions.

Performance Requires Knowledge

Our concern here is with the [...] education. The term cognitive ability [...] whatever particular kind of task [...] general mental ability, general numerical [...] are examples of generalized abilities [...] cognitive ability as used here.
Here are some [...] ability:

Ability to trace the route of the pilgrim voyage
Ability to calculate the square root of a number
Ability to outline the economic theories of J. M. Keynes
Ability to trace the circulation of blood
Ability to describe the origins of the Industrial Revolution
Ability to identify the parts of a flower by name
Ability to describe a method for removing tarnish from copper

These abilities indicate what a person can do. They require applications of knowledge to perform specific tasks or to answer particular questions. They can be taught specifically and are learned specifically. Most written tests used to measure school achievement, professional capabilities, or qualifications for effective performance on the job are tests of specific cognitive abilities like those listed above. To acquire any such ability, a person must learn how to do it. To perform a cognitive task, one must know how to do it. The basis of any cognitive ability [...] and practice may develop and perfect the ability, enabling [...] the tasks more efficiently and accurately. But the basis [...] person know how the task is to be [...] the ability to do something the person [...] reasonable. Knowledge is the key.

It is sometimes said that individuals possess knowledge they do not know how to use. They may indeed. Based on this supposition, the inference is sometimes drawn that knowledge alone is not enough; something more is needed. But such an inference is open to question. It may be that the individual lacks sufficient knowledge of the right kind. Or those who cannot apply knowledge they possess may simply lack the knowledge of how to apply it. The problem may not be the inadequacy of knowledge per se, but inadequacies in the specific knowledge possessed.

The contribution of knowledge to effective human behavior is sometimes questioned. Knowledge alone is not enough, says the businessman. It does not guarantee financial success. Knowledge alone is not enough, says the college president.
It does not guarantee scholarly achievement. Knowledge alone is not enough, says the religious leader. It does not guarantee virtue. Knowledge alone is not enough, says the philosopher. It does not guarantee wisdom.

They are all right, of course. Knowledge alone is not enough. But in our complex world of chance and change, no one thing or combination of things will ever be enough to guarantee financial success or scholarly achievement or virtue or wisdom. Although this is true, few would deny that the command of substantive knowledge does contribute greatly to the attainment of these and other ultimate goals.

Some have argued that knowing how does not always require knowing that (Ryle, 1949). But are the two really so distinct and unrelated? For cognitive tasks, would not a sufficient amount of relevant knowing that enable a person to know how? If you know that to find the quotient of two common fractions you must invert the divisor and multiply, and all that those words mean, do you not know how to divide common fractions? In general, if we wish to teach someone how to do something, is there any better way than to teach them that this, this, and this must be done?

Surely, knowing is not the same as doing. If the doing involves physical manipulation, it may require psychomotor skills that knowing cannot supply. Even in the realm of pure mental tasks, practice may increase facility. But facility aside, can doing any mental task require more than knowing perfectly well how to do it? If so, what is that "something more"? The best way to prepare learners to complete a cognitive task is to help them acquire the knowledge of how to complete it. The basis of that knowledge is necessarily verbal knowledge.
Given sufficient motivation to attempt to complete a task, sufficient verbal knowledge about how to complete it should enable learners to do so successfully.

Knowledge, Thinking, and Understanding

Thinking, understanding, and performing are among the significant goals of education, but none of these behaviors can be produced or nurtured without a substantive knowledge base. Thinking is a process and knowledge is a product, but the two are intimately related (Aaron, 1971). New knowledge cannot be produced internally or used without thinking, and thinking always involves knowledge. Thought processes are wholly dependent on the knowledge being processed. Knowing how to think can be distinguished from knowing what is so but cannot be separated from it. Acquiring knowledge and learning how to think thus would seem to be interdependent goals. To say that schools should teach students how to think instead of teaching them knowledge is to urge the impossible. In sum, the best way to teach people how to think is to help them acquire useful knowledge; the ability to think is necessarily dependent on having something to think about.

To assimilate new information, learners must incorporate it into their own structure of knowledge. They must relate it to what they already know. Relating is understanding. Thunder is understood better when it is related to lightning. Fermentation is understood better when it is related to bacteria. In general, the understanding of any separate thing involves seeing its relations to other known things. And knowledge that is understood is more useful than knowledge that is only information.

Teachers can give pupils information. But they cannot give them understanding, for a person's understanding is a private, personal possession created by the one who seeks it. We earn for ourselves the right to say "I understand."
How much we know about a subject depends not only on how much information we have obtained from others or from [...] much we have thought about that information [...] other elements of information we have [...] study. We ask students to study because [...] to think about relationships between what [...] learn. New information that can be assimilated [...] means will be remembered and [...] structure of knowledge with superficial [...] but it may be remembered. Learning and [...] understanding are correctly perceived [...]

To be understood, information must become part of a coherent structure of knowledge. When occasion for its use arises, we must be able to remember it and see its relevance. When all this is true, we can say we have command of the [...] bad name rote learning [...] much emphasis on possession and not enough emphasis on command. [...] we actually could understand; the cost [...] knowledge would be worth to us. No doubt [...] provides one of the greatest challenges to [...] student learning. How do we increase the [...] or decrease the "cost" of learning it?

Describing Cognitive Outcomes

The terms that some educators have used to identify or describe achievement are more impressionistic than [...] Nearly all important aspects of achievement (knowledge or abilities) can be described by the type of behavior required to demonstrate attainment of the achievement. Nearly every test item on a good classroom achievement test can be classified using one of these seven categories:

1. Understanding of terminology (or vocabulary)
2. Understanding of fact or principle (or generalization)
3. Ability to explain or illustrate (understand relationships)
4. Ability to calculate (numerical problems)
5. Ability to predict (what is likely to happen under specified conditions)
6. Ability to recommend appropriate action (in some specific, practical problem situation)
7. Ability to make an evaluation judgment

The usefulness of these categories in the classification of items testing various aspects of achievement depends on the fact that they are defined mainly in terms of overt behavior requirements, rather than in terms of presumed mental processes that may be required for successful response. Items belonging to the first category always designate a term to be defined or otherwise identified. Items dealing with facts and principles are based on descriptive statements of the way things are. If the question asks who? what? when? or where? it tests a person's factual information. Items testing explanations usually involve the words why or because, while items belonging to the fourth category require the student to use mathematical processes to get from given information to the required quantities. Items that belong in either of categories 5 or 6 are based on descriptions of specific situations. "Prediction" items specify all the conditions and ask for the future result; "action" items specify some of the conditions and ask what other conditions (or actions) will lead to a specified result.
In "judgment" items, the response options are statements whose appropriateness or quality is to be judged on the basis of criteria specified in the item itself.

The fundamental concern of test developers is the process of translating the relevant structure of knowledge into tasks (test items) that require a demonstration of the knowledge and abilities of that specific structure. To do so requires that the elements of the structure be identified so that test items can be written based on them. These elements can be represented in a variety of ways (propositions, instructional objectives, or goal statements) and with varying levels of specificity. To the extent that we are able to dissect the knowledge structure and describe its components precisely, the measurements of achievement that result will be most useful and most meaningful in describing the cognitive outcomes of education.

USING INSTRUCTIONAL OBJECTIVES

The knowledge and understanding on which the instructional efforts in our schools are focused is the same knowledge and understanding that tests of achievement ought to measure. The specific knowledge we expect students to learn is represented in the Instructional Objectives component of the Basic Teaching Model described in Chapter 2. The teacher's job is to define the structures of knowledge, the concepts and relationships that should form the basis of instruction. Statements of instructional objectives can be useful for instructional planning, for promoting intentional learning, and for developing tools for performance assessment. What are instructional objectives and where do they come from?
How can they be used to enhance our evaluation efforts?

The Derivation of Instructional Objectives

Instructional objectives are statements that describe the abilities students should be able to display to demonstrate that important concepts and principles have been incorporated into their own structures of knowledge. [...] indicate what the learner should be able to do at the end of an instructional sequence. Because the development of cognitive abilities ought to be the primary concern of our schools, the delineation of these important abilities is [...] matter. Particularly at the elementary and secondary school levels, deciding what students should learn, what they should know, should not be left to the classroom teacher alone. Most purposeful formal learning is organized in the context of a curriculum defined in terms of grade levels and subject matters. For example, the instructional objectives of a seventh-grade mathematics class must fit into the entire organizational plan; they should not be decided solely by the personal preferences, interests, or capabilities of each different seventh-grade mathematics teacher in the school district.

The derivation of instructional objectives [...] outlined in Figure 3-1. The pyramid [...] that indicate the purpose of the instruction [...] general statements that are the foundation [...] teachers, school board members, and [...] educational goals of the schools should be [...] goal, level objectives are prepared for each [...] level objective for grade 7 might be: "To use [...] processes to solve problems encountered in [...]"

The level objective for the seventh grade, one of several, suggests the need for a mathematics course. The purpose of the course is to address all the level objectives related to mathematics. Course [...] needed to define in more detail the [...] course objective might be to "compute with [...] word problems." Once course objectives [...] teachers must organize them logically and [...] in the formation of instructional units [...] of the abilities students should attain [...]
Figure 3-1. The Source of Instructional Objectives: The Pyramid Effect

The pyramid illustrates that instructional objectives are derived from a few broad educational goals through successive stages in hierarchical fashion. Each stage yields more statements, collectively, than the prior stage, and the statements generated at any one stage are more precise than those in the prior stage in indicating the nature of the ability to be achieved. In fact, the writing of instructional objectives can become a seemingly endless task if the writer attempts to separate cognitive abilities into increasingly finer components.

Stating Instructional Objectives

In contrast to educational goals and level objectives, instructional objectives should be prepared primarily by those who will do the teaching. The statements should be written in a form and at a level of specificity that will make them most useful for their intended purposes. Objectives that have been prepared to guide instructional planning or to communicate intended learning outcomes to students can also be used for evaluation planning and test development. For example, the cognitive abilities indicated by instructional objectives are prescriptive of the type of evaluation tool to use (observation, objective test, research paper, essay, or project) to assess achievement. And when a test seems most appropriate, each objective suggests the most appropriate type of test item (essay, multiple choice, problem type). The nature of the objectives also may suggest how frequently evaluation should occur and, perhaps, how much formative evaluation is needed.

Though there is general agreement among educators about the value of and the role of instructional objectives, there remains little agreement about how such statements should be prepared.
Most, however, will agree that explicit statements are more helpful than implicit statements, no matter how the objectives are intended to be used. Explicit statements contain a verb that indicates in operational, behavioral, or observable terms what the learner must do to demonstrate attainment of the objective. Examples of such verbs are listed in Figure 3-2. Contrast them with the verbs from implicit statements. It is not possible to tell when someone knows, thinks about, or comprehends, but we can observe them explaining, developing, and defining.

The approach to developing objectives recommended by Gronlund and Linn (1990) incorporates both implicit statements, what they call general learning outcomes, and explicit statements, what they call specific learning outcomes. Their method probably parallels the thought processes most of us would use to develop separate (explicit) instructional objectives. For example, "Knows where to use commas in writing" is a general outcome because it has an implicit verb in it. Some specific learning outcomes can be developed that indicate the kind of behaviors we are willing to accept as evidence for attainment of the general learning outcome. Here are some examples: separates names of city and state, sets off introductory clauses, separates quotation from rest of sentence, and ends complimentary close of a letter. Of course, it is the specific outcomes that are most useful for evaluation instrument development.

For which purposes might [...] preparing instructional objectives was [...] individualized instruction, but only the [...] behavioral (explicit) terms is widely applicable.

[...] do propositions relate to instructional objectives? Is it necessary or useful for teachers to have both? The anatomy of an instructional objective consists of a content portion, the underlying [...] the verb.
Consider this sample objective: [...] how frequent exercise can contribute to [...] The proposition is that exercise does enhance the efficiency of [...] circulatory systems. [...] The learner should merely recognize (1) that it happens, (2) that it affects lung capacity or heart strength, or (3) that aerobics are particularly useful for this purpose. Thus propositions are an essential ingredient in instructional objectives (an expression of the relevant content), but they are not one and the same. Propositions are statements of content knowledge; instructional objectives are statements of performance with respect to content.

Figure 3-2. Verbs That Distinguish Explicit and Implicit Statements of Instructional Objectives

Explicit, Behavioral, Observable: identify, explain, describe, rearrange, summarize, select, develop, predict, differentiate, define, compare, write

Implicit, Non-Behavioral, Inferential: know, consider, understand, enjoy, discuss, realize, remember, judge, perceive, think about, comprehend, imagine

The sample instructional objectives shown in Appendix C are based on some of the propositions listed in Appendix B. Consider how some of the propositions could be translated differently into instructional objectives, depending on how the learner is expected to "operate" on that content.

How instructional objectives should be prepared for a specific situation may be dictated by the teaching model adopted. For example, individualized approaches to instruction (Bloom, 1968; Glaser, 1968; Keller, 1968) require explicit statements of objectives to define and organize the curriculum, to plan instructional activities, to monitor learner progress, and to advance the learner through the curriculum. Domain-referenced or objectives-referenced tests are essential measures of achievement in these teaching models.
Regardless of the teaching model, instructional objectives can be useful to test constructors as guides to determining the nature of test content and the differential emphasis of topics within a test. Highly specific statements may even be useful in suggesting particular questions or types of questions to ask.

Taxonomies of Educational Achievements

A number of educators have devoted considerable effort to reducing the ambiguity associated with stating instructional objectives and translating these objectives into relevant test items. In doing so, some have divided learning outcomes into three nonoverlapping domains: cognitive, affective, and psychomotor. The first taxonomy of educational objectives stemming from this work, The Cognitive Domain: Handbook I (Bloom and others, 1956), commonly called "Bloom's Taxonomy," provides six categories for classifying cognitive behaviors: knowledge, comprehension, application, analysis, synthesis, and evaluation. The categories are intended to be hierarchical in terms of the intellectual demand required of the learner. That is, knowledge, the remembering of information, is less demanding than comprehension, the relating of concepts or the translation of ideas from one form to another. Evaluation, the most demanding, requires judgments using criteria remembered or formulated by the learner. Each major category is further subdivided, and test items are presented in the handbook to illustrate how achievement can be measured at each taxonomic level.

The taxonomy for the cognitive domain has received the most attention from test constructors because it has been available the longest and because it describes the kinds of abilities test constructors are most interested in measuring. A major contribution of this taxonomy has been the awareness it has created regarding the intellectual level at which instructional objectives and test items are written.
That is, teachers who may have written most of their objectives to require simple remembering or recall of information have come to realize that they actually intended for students to understand and apply knowledge. By using the taxonomy to classify objectives, teachers can reflect more readily on whether their expectations are appropriate.

Though the cognitive taxonomy can be somewhat useful for classifying [...]

1. Why is a fusible alloy better than its constituents for use in automatic fire sprinkling systems?
[...] melting point.
[...] greater water pressure.
[...] melting point.
[...] conductor of electricity than the alloy.

Figure 3-3. Comparison of Classification Systems of Bloom, Ebel, and Gagné

Bloom's Taxonomy: A. Knowledge; B. Comprehension; C. Application; D. Analysis; E. Synthesis; F. Evaluation
Ebel's Relevance Guide: Terminology; Factual Information; Explanation; Calculation; Prediction; Recommended Action; Evaluation
Gagné's Learning Outcomes: Verbal Information; Intellectual Skills; Cognitive Strategies; Attitudes; Motor Skills

[...] use some categories from each of these systems to achieve some special purpose? (Choose one of the classification systems and use it to classify the sample objectives provided in Appendix C. Compare your results with those of another member of your class to see how well you agreed.)

Interest in promoting critical or higher-order thinking skills (HOTS) has led several educators to try to develop a taxonomy of thinking skills. But there has been little agreement about what categories should be included or even whether such a classification system would be helpful. The kind of mental activity that most of us would consider higher thinking can be described by rows B through F of Figure 3-3. These are "beyond the knowledge or recall level," which itself is a useful way to describe HOTS.
As we have noted earlier, thinking can occur only when there is something to think about: new information or an existing knowledge structure. Even learning how to think requires "how to" knowledge. Consequently, an independent thinking curriculum seems to be illogical and unnecessary as long as "the use of verbal knowledge" is a prominent strand in each aspect of the school curriculum.

SUMMARY PROPOSITIONS

1. A major goal of education is to develop in the students a command of substantive knowledge.
2. Knowledge is information that has been integrated into a structure of relations between ideas.
3. A structure of verbal knowledge can be described by listing the concepts and relationships of which it is composed.
4. All verbal knowledge can be expressed in propositions.
5. Propositions provide the basis for most good objective achievement-test items.
6. Thinking necessarily produces knowledge, but knowledge must be present for a person to have something to think about.
7. The prerequisites for performing a cognitive task [...]
8. [...] statements or objectives derived from a set of educational goals [...] to an already existent structure [...]
9. Nearly all questions that ask who? what? when? or where? are properly classified [...] information questions.
10. Items intended to test various [...] ordinarily can be classified [...] the basis of overt item characteristics [...] basis of the mental processes [...] measure.
[...] The taxonom[...] more useful [...] than for evalu[...] categories of Bloom's Taxonomy (cognitive [...]

QUESTIONS FOR STUDY AND DISCUSSION

1. What does it mean to have command of knowledge?
2. What advantages does a person have who can understand, rather than simply know, some [...]
3. [...] propositions [...] of greater interest than false propositions to those who teach [...]
4. [...] educational outcomes from the affective or [...]
5. What are some of the criteria high school learners probably use to decide that their understanding of a particular idea (for example, use of metaphor) is as deep as they care for it to be?
6. What do we mean when we say students should [...] they read"?
7. What features distinguish educational [...]
8. [...] a two-category system with the labels [...] substantially [...] instructional [...]
9. What factors often cause separate judges to disa[...]
10. [...] a new taxonomy for categorizing cognitive learning outcomes [...] so that your system [...]

Describing and Summarizing Measurement Results

There is a variety of statistical concepts and techniques that enable test users to interpret scores and that assist test developers in assessing the quality of their instruments. In particular, many of the methods of making norm-referenced score interpretations depend on the mean and standard deviation of the scores of the norm group. And when a test has been given, statistical procedures help to summarize and describe the performance of the class. These same statistical methods provide information about the effectiveness of the test instrument in providing norm-referenced or criterion-referenced interpretations, whichever the test developer had in mind. The purpose of this chapter is not to duplicate the content of a good statistics text, but to describe statistical ideas that form a foundation for the measurement concepts to be considered in subsequent chapters.
FREQUENCY DISTRIBUTIONS

A frequency distribution is a two-column list that describes a set of scores in a concise and systematic manner. One column lists all possible scores in the set from highest to lowest, and the other column, the frequency column, shows the number of examinees that obtained each score. Table 4-1 shows the scores of a class of 30 students on a spelling test of 25 words. A frequency distribution for these scores is shown in Table 4-2. This distribution of scores is a useful visual aid for identifying the relative position in the group of any one student and for obtaining a picture of overall group performance at a glance.

Table 4-1. Scores of 30 Students on a 25-Item Spelling Test

Aaron, Barbara, Barry, Ben, Brent, Camille, Donald, Doreen, Earl, Faith: 16 19 20 17 21 13 18 20 16
Franco, Gary, Gaea, Helen, Jack, Jeff, Jerry, Joanne, Kelly, Ken: 20 17 18 21 14 19 22 18 21
Kim, Lori, Marcia, Marcy, Nathen, Patrice, Richard, Scott, Travis, Wendy: 20 17 23 23 19 22 21 19 21 15

Table 4-2. Frequency Distribution of 30 Spelling Scores

Score   Frequency
25      0
24      1
23      2
22      2
21      5
20      5
19      4
18      3
17      3
16      2
15      1
14      1
13      1
12      0

Frequency Polygons and Histograms

The information summarized in a frequency distribution can be represented pictorially by a frequency polygon, also known as a line graph. Figure 4-1 shows a frequency polygon using the scores of the 30 examinees on the spelling test. The score scale is depicted along the horizontal line, and the frequency is shown on the vertical line. An alternative representation is the histogram or bar graph, shown in Figure 4-2. Frequency polygons and histograms are equally useful for describing a set of test scores efficiently. Detailed procedures for constructing both types of graphs can be found in most introductory statistics textbooks.

Figure 4-1. Sample Frequency Polygon

Figure 4-2. Sample Histogram

Characteristics of Frequency Polygons

Frequency polygons come in all shapes and sizes, as evidenced by the variety shown in Figure 4-3. These curves are frequency polygons like the one in Figure 4-1, except these have been smoothed. That is, the jagged lines have been replaced by a smooth curved line, and the vertical line used to determine frequencies has been omitted. Such modifications usually indicate that the polygon does not represent any one set of data precisely, but it depicts a general distribution having certain prominent characteristics.

Often there is considerable economy associated with describing or sketching the frequency polygon for a set of test scores rather than enumerating each score or even preparing a frequency distribution. But to communicate such a general picture, the characteristics that help distinguish frequency polygons from one another must be known and understood. Four of these important characteristics [...] in Figure 4-3 [...] will be [...] subsequent sections [...]

Figure 4-3. Frequency Polygons Illustrating Varying Characteristics

[...] modality is another distinguishing characteristic. The mode of a score distribution is the most frequently occurring score.¹ A curve is unimodal if it has one mode, bimodal if it has two modes, and multimodal if it has many modes. When a frequency polygon has more than one peak, we must look to the tallest to describe its modality. Part (c) of Figure 4-3 shows a unimodal curve and a bimodal curve. Note that the curve with four peaks has two that are taller than the others and those two are equally tall. How should the modality of the polygon in Figure 4-1 be described?

The last row of Figure 4-3 illustrates the kurtosis property of frequency polygons. Kurtosis relates to the relative flatness or peakedness of the curve.
The names describing these curves (platykurtic, mesokurtic, and leptokurtic) can be remembered by associating the prefix of the term (platy, meso, lepto) with a visual image of the shape of the curve.

To test your understanding of the properties of frequency polygons and their interrelationships, try to draw a figure to verify that each of the following statements is true:

1. Not all skewed distributions are unimodal.
2. Some leptokurtic distributions are not symmetric.
3. A rectangular distribution is multimodal.
4. Not all bimodal distributions are symmetric.
5. A single distribution can be symmetric, unimodal, and mesokurtic.
6. Some platykurtic distributions are skewed.

DESCRIBING SCORE DISTRIBUTIONS

Central Tendency

The mode was defined previously in considering modality as a property of a frequency polygon. It is the most frequently occurring score, and it may have more than one value (as in Figure 4-1). The median is the score above which and below which exactly half of the scores are found: the middlemost score. For the scores 5, 4, 3, 2, 1, the median is 3; for the scores 9, 7, 5, 2, the median is 6, halfway between 7 and 5. In simple cases like these, if the distribution contains an even number of scores, the median is the average of the two middle scores. Thus the mode must be a score actually obtained by an examinee, but the median need not be. When the score distribution [...] with tied scores in the vicinity of the [...] slightly more complicated. Table 4-3 [...] median. Can you infer from these examples [...] the median also is the 50th percentile [...] chapter for computing percentiles should be used for finding the median.

The mean is the average score, obtained by summing all the scores and dividing that sum by the total number of scores. For the scores 5, 4, 3, 2, 1, the
T h e s e o p e rati ons are represented by the f< l rr' trl a FV 7\ =- n - t5 5 :' ' where F is the mean, DX is the sum of the scores,and n is the number of test scores. ordinarily, the median is easier to carculate than the mean, especiaily when the number of scores is small. If the score distribution is skewed, the median usually gives a more reasonabre indication of the typicar score than does the mean. consider, for exampre, the set of scoresg, g, l0:ll,'22. what are rhe values of the median and meai? Notice that the median indicative 1io)-i"-o.. ,!: "typical" score and that the magnitude of the ?f is influenced in riij the direction of the extreme score.Noti also that four or -.u"trre iive scoresare below tlr,T:u"' but only rwo are below the median. If the 22 was changed ro a score of 42, how would the median and mean each be affected? For a variety.of reasons,the mean generalry is regarded by statisticians as a more precise and useful measdre of cenlral tendency ihan the ,ir.ai;. H"*: ever,we will find the median very useful for certain test-ivaluation and test-score interpretation purposes. So why are there three different ways to indicate central tendency?why notjust use the mean and forget the rest?f'he mode i, ," a.termine but not always unique. There may be two or more modes in ""ry a siore distribution. The median generally is easier to determine than the mean, i., a very skewed distribution, the median is more.like the typical score. ".ra Different situations sug_ gest a need for one measure rather than the orher, bur i" casesir marters little which is used. For what kind of score distribution -;;; u;; ;;;.an and median the same value? Is there any circumstance in which alr three are equal? -.ur1rr.5 Variabillty . !h:.'"nge of a distribution is a number that indicates how many score points the distribution covers.For the scoresg,7,2,2, r,;h.-;;;g. i, s. Note that Table4-3. Examples of Computingthe Median A. 
1,2 ,8,8 ,9,1 0 Md = 7.5 + l = gO B 1,4,6,9,9,1 0 Md = 7.5 + 0 = 75 c . 1 ,2 ,6 ,6 ,6 , 10 Md = 5 .5 + i =5.s D 1 ,2 ,2 ,2 ,3 ,10 Md = 1 .5+ ? = zz I I AND S U MMA R IZIN ME G A S U B E ME NRTE S U LTS 61 DESCRIBING this is one more thon the dilfermce betweenthe highestond lowestscores.The range is a relatively gross indicator of the amount of dispersion in a set of scoresbecauSe its value depends only on two scores,the most extreme scoresin the entire distribution of scores.The set of scores 10, 9, 9, 9, 3 has a range of 8 also, but notice how different these two sets are in variability. The most common and useful measure of variatrility is the stand.arddeuiation. A conceptual understanding of the standard deviation can be gained by learning how this statistic is computed. Calculating the standard deviation involves four steps. l. Compute each person'sdaiation scoreby subtractingthe mean from each person'stestscore. 2. Squareeachdeviationscore(multiplyeachdeviationscorehryitself)and sum all of the squareddeviation scores. 3. Divide this sum by the number of test scores.'This yieldsa quantitycalledthe uarl,ance. 4. Find the square root of the variance. This value is the standard deviation. (Remember to verify that your answer makes sense, that it is not larger or smaller than it should be, logically.) These steps can be represented tion: by this formula for finding the standard devia- (4.2) where s is the standard deviation, E is the symbol meaning'lthe sum of," and X is an individual's test score. The calculation of the standard deviation is illustrated in Table 4-4 using t he s c or es5, 4, 3 ,2 , a n d l . T h e s c o re sa re l i sted i n col umn 1 and thei r sum i s used to determine the mean. The deviation scores are calculated and listed in Table4-4. Calculating StandardDeviation (2) (1) x -x x (3) (x -x r 24 4 3 2 11 00 -1 -2 4 010 ,l Sum 1 5 /# s = -v5 1 = .l Z = ' t.t' t Y 3Statistically,this division yields a "biased" estimate of the variance. 
An unbiased estimate would be obtained by dividing by (n − 1), one less than the total number of scores. Since most electronic calculators that are programmed to yield the variance or standard deviation use n − 1 as the divisor, the value they yield should be slightly larger than that obtained with equation 4.2. Check a statistics book to learn more about this distinction.

A formula that is equivalent to equation 4.2 and that is simpler computationally is

\[ s = \frac{1}{n}\sqrt{n\sum X^2 - \left(\sum X\right)^2} \]

Normality

Figure 4-4. The Normal Distribution

[...] scores are between the values shown on the base line. For example, about 34 percent of all scores are between the mean score and the score that is one standard deviation above the mean. It is useful to remember that (1) about 68 percent of the scores are between −1s and +1s, (2) about 95 percent of the scores are between −2s and +2s, and (3) about 2.5 percent of the scores are in each tail beyond ±2s. These rounded percentage values are accurate enough for our purposes; more exact decimal values can be found in tables in an introductory statistics book.

The normal distribution is a theoretical curve that is assumed to include an unlimited (infinite) number of scores or observations. Therefore, it extends without limit on either side of the mean, well beyond ±3s. In practice and for convenience, it is often considered to extend from about three standard deviations below to three standard deviations above the mean. (Actually, 99.73 percent of all scores comprising the distribution are within those limits.) But the distribution of scores from a class of, say, 30 students typically will not show a range of scores encompassing six standard deviation units. The figures shown below indicate the ratio of score range to standard deviation that can be expected for groups of the size shown here (Hoel, 1947).
Sample Size    Typical Range in Standard Deviation Units
10             3.0
50             4.5
100            5.0
1000           6.5

The typical values shown are averages. For example, we should expect a set of 10 scores that form a shape like a normal distribution to range from about −1.5s to about +1.5s. There are too few scores in the distribution to expect that any one of them would be as far as 3 standard deviations above the mean, for example. It might be a useful computational check to realize that in a distribution of 25 scores, the highest score is more apt to be 2 than 3 standard deviations above the mean.

SCORE SCALES DESCRIBE PERFORMANCE

Since the scores on different tests, when taken [...] widely different means [...] have some kind of [...] comparison purposes. Percentile [...] scales. We will discuss [...] interpreting scores [...]

Percentiles and Percentile Ranks

The percentile rank of a particular test score can be defined in three different ways. It is the percentage of scores in a distribution that [...]

1. is below the given score, or
2. is the same as or below the given score, or
3. is below the midpoint of the score interval of the given score.

Table 4-5 shows the effects of using each of [...] the percentile ranks [...] hypothetical scores. [...] The highest score has a percentile rank of only 80 under definition 1. The lowest score has a percentile rank of 20 under definition 2. The median score [...] 60 under definition 2. But [...] percentile rank 50, in a symmetric distribution [...] highest and lowest scores [...] the same distance [...] percentile rank scale, as they [...] be. For these [...] preferred and we will use it [...]

Table 4-5. Effect of Different Definitions of Percentile Ranks

         PERCENTILE RANK UNDER DEFINITION
Score      1       2       3
5         80     100      90
4         60      80      70
3         40      60      50
2         20      40      30
1          0      20      10

[...] a given score in a [...] percentile rank of [...]

1. Prepare a frequency distribution.
2.
Beginning with the lowest score, add successive frequency values to obtain a column of cumulative frequencies.
3. For the given score, identify the cumulative frequency up to, but not including, the score.
4. For the same given score, divide its frequency by 2.
5. Add the values from steps 3 and 4.
6. Divide the sum from step 5 by the total number of scores.
7. Multiply the result by 100 to obtain a percentile rank (rounded to the nearest whole number).

Table 4-6 illustrates the computation of percentile ranks for the spelling scores from Table 4-1. Notice that the score scale extends from one score below the lowest score obtained to the highest score obtained. And the scale always includes all possible scores between the extremes, even when no student may have actually obtained some of those scores. The number of students who received each of the scores is shown in the second column of the table. The third column gives a cumulative frequency for each score. It is calculated by counting the number of scores lower than the given score and adding the frequency at that score point. Consider the score 19, for example. There are 11 scores lower than 19 and four scores of 19. So we get a cumulative frequency of 15.

To obtain the percentile rank for a score of 19, the value in the fourth column, we proceed as follows. Take the cumulative frequency of scores below 19 (which is 11) and add to it half of the frequencies at a score of 19 (one-half of 4 is 2). The sum, 13, is divided by 30 (the number of scores) to yield 0.433. That quotient is multiplied by 100 (43.3) and then rounded to obtain 43. In summary, we take half the scores at the given score value plus all the scores below, we divide that sum by the total number of scores, and we change the result to a rounded percentage value. Can you verify that Andy's score of 21 has a percentile rank of 75?

Table 4-6.
Computation of Percentile Ranks

Score   Frequency   Cumulative Frequency   Percentile Rank
24      1           30                     98
23      2           29                     93
22      2           27                     87
21      5           25                     75
20      5           20                     58
19      4           15                     43
18      3           11                     32
17      3            8                     22
16      2            5                     13
15      1            3                      8
14      1            2                      5
13      1            1                      2
12      0            0                      0

[...] would be below the percentile score in question; [...] sixth score from the [...] The 20th [...] interval [...]

Figure 4-5. Relation Between Normal and Rectangular Distributions

It is clear from this figure that percentile ranks magnify raw-score differences near the middle of the distribution but reduce raw-score differences toward the extremes. Stated in other words, a difference of 10 percentile rank units near the extremes corresponds to a much larger raw-score difference than does the same difference in percentile ranks near the mean. For example, for a set of 100 scores that form a normal distribution, the number of scores between the 50th percentile and the 55th percentile is the same as the number of scores between the 90th and 95th percentiles. But the raw-score difference between the 50th and 55th percentiles is smaller than the raw-score difference between the 90th and 95th percentiles. The score differences in standard deviation units are about 0.13 and 0.37, respectively. (This can be verified using a normal distribution table, found in most statistics books.) In view of this property, does it make sense that percentile ranks should not be averaged?

Standard Scores

Like percentile ranks, standard scores provide a standard scale, a common yardstick, by which scores on different tests by different groups may be [...]

Linear standard scores. Raw scores are transformed into standard scores using the raw-score mean and standard deviation. The intent of this transformation is to create a new score scale that has a predetermined mean and standard deviation. One basic type of standard score, the z-score, is computed using this formula:

\[ z = \frac{X - \bar{X}}{s} \qquad (4.3) \]

[...] computed using this formula:

\[ T = 10(z) + 50 \qquad (4.4) \]

[...] achievement tests or scholas[...]

\[ \text{CEEB score} = 100(z) + 500 \qquad (4.5) \]

The original standard scores used to report the results from the Iowa Tests of Educational Development (ITED) [...] from this formula:

\[ \text{ITED score} = 5(z) + 15 \qquad (4.6) \]

Finally, stanines are computed with the formula

\[ \text{Stanine} = 2(z) + 5 \qquad (4.7) \]

[...] nearest whole number.

Table 4-7. Characteristics of Standard Scores in a Normal Distribution

[...] of the raw-score frequency polygon from which the standard scores were derived. For example, the T-score distribution will be negatively skewed if the raw-score distribution was, the z-score distribution will be leptokurtic if the raw-score distribution was, and a bimodal, symmetric raw-score distribution will yield a standard-score distribution with those same properties.

Notice that the mean and standard deviation of each standard-score scale as shown in Table 4-7 is readily apparent in the corresponding formula for computing each. To create a new standard-score scale, we simply multiply the z-score by the standard deviation
desired for the new scale and then add the value of the mean desired for the new scale. For example, if we wanted a J-score scale that would have a mean of 40 and a standard deviation of 12, the formula needed would be

\[ J = 12(z) + 40 \qquad (4.8) \]

If a teacher adds 3 points to every student's raw score, have the raw scores been changed to a linear standard score? If so, how would you describe the new set of scores?

Normal curve equivalents (NCEs). If we wish to assume that the trait being measured by a test is normally distributed, it is possible to transform the obtained raw scores in a nonlinear fashion so that the new distribution will be normal. Obviously, it is not desirable to perform such a transformation on a distribution of scores that does not resemble the shape of a normal distribution. The main reason for normalizing a set of scores is to permit norm-referenced interpretations that take advantage of the properties of the normal curve. Stanines and other standard scores reported by publishers of standardized tests typically are normalized. The procedures for [...] scores are described in most introductory [...] textbooks.

The normal curve equivalent (NCE), a type of normalized standard score, has been made popular primarily because of its use in reporting evaluation results from Title I programs of the Elementary and Secondary Education Act (ESEA), also known as "Chapter 1." NCEs [...] equation:

\[ \text{NCE} = 21.06(z) + 50 \qquad (4.9) \]

where the z is a normalized z. The computed NCE value is rounded to the nearest whole number and only values [...] from Table [...] percentile ranks? What advantages might NCEs have over percentile ranks?
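The linear scales in equations 4.3 through 4.8 all follow one pattern: compute z, multiply by the desired standard deviation, and add the desired mean. A minimal Python sketch of that pattern follows. The names z_scores, rescale, stanine, and SCALES are hypothetical, chosen here only for illustration. The standard deviation uses n as the divisor, as in equation 4.2, and limiting stanines to the range 1 to 9 is an assumption that the text shown here does not state. Normalized scores such as NCEs are left out because they require a normalized z rather than a linear one.

```python
import math

# (standard deviation, mean) of each linear scale, read from equations 4.4-4.6.
SCALES = {"T": (10, 50), "CEEB": (100, 500), "ITED": (5, 15)}


def z_scores(raw):
    """Return z-scores (equation 4.3) using the n-divisor standard deviation (equation 4.2)."""
    n = len(raw)
    mean = sum(raw) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in raw) / n)
    if sd == 0:
        raise ValueError("all scores are equal, so z-scores are undefined")
    return [(x - mean) / sd for x in raw]


def rescale(z, new_sd, new_mean):
    """Linear standard score: multiply z by the desired SD, then add the desired mean."""
    return new_sd * z + new_mean


def stanine(z):
    """Stanine = 2(z) + 5, rounded half up; clamping to 1..9 is an assumption."""
    value = math.floor(2 * z + 5 + 0.5)
    return max(1, min(9, value))


raw = [5, 4, 3, 2, 1]  # the scores used in Table 4-4
zs = z_scores(raw)
t_scores = [round(rescale(z, *SCALES["T"]), 1) for z in zs]
ceeb_scores = [round(rescale(z, *SCALES["CEEB"])) for z in zs]
stanines = [stanine(z) for z in zs]
j_scores = [round(rescale(z, 12, 40), 1) for z in zs]  # the scale of equation 4.8

print(t_scores)
print(ceeb_scores)
print(stanines)
print(j_scores)
```

Because every scale is a linear function of z, the rank order and the shape of the raw-score distribution are unchanged; only the mean and standard deviation are reset.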
CORRELATION COEFFICIENTS

Correlation coefficients are statistics that show the [...] one measure are related to scores from [...] measure. For example, there is a need for an index [...] test scores when estimating some kinds of test relia[...] to the same group on different occasions, or if two [...] to the same group, we use a correlation coefficient [...] agreement between th[...] also use correlation c[...] How do the [...] job performance after [...] Foreign Language) sco[...] graduate school at the Univ[...]

Scatterplots Describe Relationships

A scatterplot [...] presentation of the relationship between the score[...] single group of individuals. [...] is based on the same [...] algebra and geometry to plot straight lines, circles, a[...] point that is plotted is a pair of scores (X, Y) for a p[...]

A correlation coefficient is a number that may range from +1.00, through zero, to −1.00 [...] +1.00 or −1.00, rarely exist in practice, b[...] correlation between two [...] relationship between the two variables. A correlation [...] variables.

A graphical representation of the pairs of scores from which a correlation coefficient can be computed is useful for two reasons. First, we can tell from a graph if the points tend to cluster about a single straight line. If they do not, a more complex method of computation is required. Second, we can estimate both the direction and magnitude of the relationship from visual inspection of the graph. The graph provides a means of checking the accuracy of calculations when the correlation coefficient has been computed.

Four scatterplots are shown in Figure 4-6 to illustrate the correlation of two sets of scores graphically. Each point in a graph represents a pair of scores, X and Y, for each person.
In diagram a, the lower scores on test X are associated with low, moderate, and high scores on test Y. And the higher scores on test X also are associated with low, moderate, and high scores on test Y. Therefore, we can conclude there is little or no relationship between the two sets of scores. A similar analysis can be made of each of the other three diagrams to verify the value of the correlation coefficient estimates given.

Figure 4-6. Sample Scatterplots Showing Varying Relationships Between Two Sets of Scores

Another approach to interpreting the scatterplot is to ask how [...] Y score when we know their [...] score. [...] most accurate prediction of test Y scores [...] our prediction would be a mere [...] of individuals, scores on X and [...] similar on both, the correlation is high [...] nearly opposite on the two, the correlation [...]

Computing Correlation Coefficients

There is a wide variety of correlation coefficients [...] certain conditions and each [...]. The most common type, the Pearson product-moment [...] here. Suppose we have [...] scores and then sketch it. What is your [...]? The correlation is computed using this formula:

$$r_{xy} = \frac{n\sum XY - (\sum X)(\sum Y)}{\sqrt{[n\sum X^2 - (\sum X)^2][n\sum Y^2 - (\sum Y)^2]}} \tag{4.10}$$

where n is the number of pairs of scores (or persons), Σ is the symbol meaning "the sum of," X is the score of a person on one measure, and Y is the score of the same person on a second measure.

Table 4-8. Measurements Used to Compute Correlation

Examinee   Interview Score (X)   Test Score (Y)     X²       Y²      XY
Daniel             10                  57           100     3249     570
Liz                 6                  50            36     2500     300
Ben                 3                  33             9     1089      99
Carmen              5                  44            25     1936     220
Ellen               8                  53            64     2809     424
Rosa                2                  25             4      625      50
Michael             4                  33            16     1089     132
Robert              5                  48            25     2304     240
Rhonda             10                  59           100     3481     590
Albert              3                  35             9     1225     105
Sum                56                 437           388   20,307    2730

The various sums needed to compute the correlation coefficient are shown at the bottom of each column in Table 4-8. We can substitute these values in equation 4.10 to complete the computation.
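The substitution into equation 4.10 can also be checked with a short Python sketch. The scores are those of Table 4-8, checked against its column sums; the function name `pearson_r` is hypothetical and the code is an illustration of the raw-score formula, not a general statistics routine.

```python
import math

# Interview scores (X) and test scores (Y) for the ten examinees in Table 4-8.
x = [10, 6, 3, 5, 8, 2, 4, 5, 10, 3]
y = [57, 50, 33, 44, 53, 25, 33, 48, 59, 35]


def pearson_r(xs, ys):
    """Raw-score form of the Pearson correlation (equation 4.10)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(v * v for v in xs)
    sum_y2 = sum(v * v for v in ys)
    sum_xy = sum(a * b for a, b in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


r = pearson_r(x, y)
print(round(r, 2))  # 0.94
```

The numerator works out to 2828 and the denominator to about 3001, the same values that appear in the hand computation.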
$$r_{xy} = \frac{10(2730) - (56)(437)}{\sqrt{[10(388) - (56)^2][10(20{,}307) - (437)^2]}} = \frac{2828}{\sqrt{(744)(12{,}101)}} = \frac{2828}{3001} = 0.94$$

How does your estimate using the scatterplot compare with the computed result? What does the 0.94 mean?

The example used here is fairly simple, intended only to show what a correlation coefficient is and how it is calculated. Most situations in which a correlation [...]

$$r_{xy} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{n\,s_x s_y} \tag{4.11}$$

$$r_{xy} = \frac{\sum z_x z_y}{n} \tag{4.12}$$

Note that subscripts are used with s and z to distinguish between scores from the two measures (X and Y) for which the correlation is to be estimated.

Interpreting Correlation Coefficients

[...] equivalent forms of well-constructed achievement tests, for example, tend to yield correlations of about 0.70 to 0.85. Values of 0.90 to 0.95 would be considered particularly high for that situation, and values less than 0.65 might be considered somewhat low. On the other hand, ACT scores correlate about 0.50 with grade-point average at the end of the freshman year at most universities. Values of 0.60 would be considered somewhat high and values of 0.35 would be regarded as fairly low.

Coefficients of correlation are widely used to study test scores, build theories, and make predictions. If calculated accurately, they provide precise estimates of the degree of relationship among the data on which they are based. Two cautions are in order, however. First, [...] coefficient is seldom a causal one. [...] explain why X and Y [...] biology students correlate [...] 0.60 [...]

SUMMARY PROPOSITIONS

6. The mean of a set of scores is found by adding all the scores and dividing the sum by the total number of scores.
7. A few extremely high or extremely low scores tend to pull the value of the mean away from the median and in the direction of the extreme scores.
8. The variance of a set of scores is the average of the squared deviations of the scores (from the mean).
9. The standard deviation is the square root of the variance.
10. Conceptually, the standard deviation is a number that shows the average amount by which the scores in a distribution deviate from the mean.
11. The normal curve is a theoretical, symmetric, bell-shaped frequency polygon that has become a relative standard for describing certain types of test data.
12. The larger the number of scores in a group, the greater the expected range of scores in standard deviation units.
13. The use of percentile ranks permits comparison of performance from tests that may differ in the means, variability, or distribution of scores they yield.
14. The percentile rank of a given score is most appropriately defined as the percent of scores in a group that fall below the midpoint of the score interval in which that score is located.
15. A complete set of percentile ranks yields a frequency polygon that is rectangular in shape.
16. Percentile ranks are percent values between 0 and 100; percentiles are raw scores that may have any value on a given raw-score scale.
17. Conversion of normally distributed scores to percentile ranks increases apparent score differences near the center of the distribution and decreases them near either extreme of the raw-score distribution.
18. A z-score indicates the number of standard deviation units an individual has scored above (+) or below (-) the mean.
19. A standard score can be computed by using the values of the mean and standard deviation of the new scale to modify the z-score.
20. Normalized standard scores are provided by many test publishers so that the useful properties of the normal curve can be incorporated in the interpretation of the scores.
21. The correlation coefficient is a measure of the degree of relationship between two variables, based on paired values of the variables obtained from each of a number of persons or things.
22. Possible values of the
correlation coefficient range from 1.00, expressing perfect positive (direct) relationship, through 0, expressing absence of relationship, to -1.00, expressing perfect negative (indirect) relationship.
23. A scatterplot is a graph that can be used to estimate the correlation (magnitude and direction) between the variables used in plotting it.
24. A relatively high correlation between two variables is not sufficient evidence for concluding that one variable can predict the other in a causal relationship.
25. Correlation coefficients obtained from small samples are subject to large sampling errors.

QUESTIONS FOR STUDY AND DISCUSSION

1. Why should score values be included in a frequency distribution even when no individuals obtained those scores?
2. What unique purposes might be served by frequency polygons? histograms?
3. Why might the median be more useful than the mean for describing typical performance on a sixth-grade promotion test?
4. How are the average deviation score and the standard deviation different, conceptually and computationally?
5. From what kinds of measures might we expect to obtain scores as high as 4 or 5 standard deviations above the mean?
6. What is the essential difference between a percentile rank and a percent-correct score?
7. Under what circumstances might it be useful to know that the correlation between two measures is about zero?
8. If a scatterplot for two variables forms a straight, horizontal line, how should the relationship between the two variables be described?

The Reliability of Test Scores

THE MEANING OF RELIABILITY

Reliability is the term used to describe one of the most significant properties of a set of test scores: how consistent or error free the measurements are.¹ Scores that are highly reliable are accurate, reproducible, and generalizable to other testing occasions and other similar test instruments.
For norm-referenced tests, this means that the use of a comparable test, under similar testing conditions on another occasion, will yield a distribution of scores that will place examinees in essentially the same rank ordering. The scatterplot of the scores from the two measures will be a set of points that cluster about a slanted straight line.

For criterion-referenced testing situations, reliability still refers to consistency, but placing examinees in the same order on two occasions is a less relevant goal. When the purpose is to estimate how much of a domain each student knows, testing with equivalent instruments on different occasions should yield the same percent-correct score for each student, not just the same ordering of test takers. When the purpose is objectives-referenced, leading to mastery interpretations, the concern is not so much with reproducing the same score as with replicating the original decision: mastery versus nonmastery. Thus, the notion of consistency is the basis for the meaning of reliability, but how we conceptualize consistency for a certain measurement situation depends very much on the kind of score interpretation to be made. Could a given set of scores be considered quite reliable for criterion-referenced purposes, but not so reliable for norm-referenced purposes?

¹Portions of this discussion of reliability were taken from an instructional module prepared for a series sponsored by the National Council on Measurement in Education (Frisbie, 1988).

In view of the differences in interpretation of reliability for norm-referenced and criterion-referenced situations, separate discussions of the two will be presented.
Similarities will be described in the last section of this chapter when criterion-referenced circumstances are addressed.

Definitions of Score Reliability

The classical definition of score reliability makes use of the idea of correlation and equivalent tests:

The reliability coefficient for a set of scores from a group of examinees is the coefficient of correlation between that set of scores and another set of scores on an equivalent test obtained independently from members of the same group.

Three aspects of this definition deserve comment. First, it states that reliability is a property of a set of test scores, not a property of the test itself. A nutrition test could yield fairly accurate scores on a certain day when given to a particular class, but could yield fairly inconsistent scores when given to a different class or when given to the same class on another occasion. The more appropriate [...] in the group, the higher the reliability of [...] the range of achievement in a group, the [...] with a test of that achievement. Even [...] refer to a test as "very reliable," what they [...] (and should say) is that the scores obtained from [...] under certain testing conditions could be [...] test. That is, the scores are highly consistent.

Second, the definition specifies the use of a correlation coefficient as a measure of reliability. One of the characteristics of the correlation coefficient is that it provides a relative rather than an absolute measure of agreement between pairs of scores for the same persons. That is, the scores do not need to be the exact same numbers on the two occasions. If the differences between scores for the same person are small relative to the differences between scores for different persons, then the test will tend to show highly reliable scores. Conversely, if the differences between scores for the same person are large relative to the differences between persons, then the scores will show much lower reliability.

Third, the
definition calls for two or more independent measures, obtained from equivalent tests of the same trait, for each member of the group. This is the heart of the definition. From this it follows that the various means of obtaining independent measurements of the same achievement will provide the basis for several distinct methods for estimating score reliability.

A Theoretical Representation

For theoretical purposes we assume that a test score can be partitioned into two components, a true score and an error score. The hypothetical true score of an individual is the average of the scores the person would obtain if tested [...] with the same test. The relationship between these scores is

$$X = T + E \tag{5.1}$$

[...]

The variance of the observed (raw) scores can be represented as

$$s_x^2 = s_t^2 + s_e^2 \tag{5.2}$$

[...] (5.3)

[...] means that random errors, not true differences, mainly explain why examinees obtained different scores.

Sources of Score Variability

A major goal of test makers is to maximize the true variance and minimize the error variance in the scores on their tests. What are the factors that influence each type of variance? What are the many reasons that explain why individuals in a group achieve different scores when they take a test that is appropriate for the ability level of the group?

As an example, Carlos and Marta might obtain different scores on the same geography test because Carlos knows more about the content covered by the test than Marta. We want our test scores to reflect this kind of difference, so no error is involved if this is the sole explanation for the score difference. However, all other possible explanations are potential sources of measurement error.
Here are some examples:

1. Although Marta is an above-average reader, for some inexplicable reason she had to reread nearly everything on the test. She seemed unable to concentrate well enough to comprehend on the first reading. Carlos's attention did not seem to fluctuate in any unusual way during the test.
2. The teacher recognized both Carlos's and Marta's handwriting when [...] points for one of her responses.
3. Carlos was fortunate in that the two essay questions related closely to what he had most recently studied, but Marta had concentrated her study in several other areas instead. She might have been more successful had a different pair of essay questions been asked.
4. Marta did not read the instructions carefully and forgot to answer the five items on the back side of the last page. These items were marked as [...]
5. Carlos guessed correctly on four of five multiple-choice items, but Marta was correct on only two of the six guesses she made.

In these illustrations the errors that occurred affected Marta and Carlos differently and probably affected all others in the group in other various ways. These are called random errors because, if we were to give these students an equivalent test or give them the same test again, we would expect these kinds of errors to have a somewhat different effect on each examinee the second time. Each type of error might be present or absent in a specific testing situation for a given test taker.
Sometimes the effect of an error will be fairly large, sometimes it will be fairly small, and sometimes it will be absent altogether.

Unlike random errors, systematic errors affect all examinees in the same way and cause all scores to be higher (or lower) than they ought to be. These kinds of errors do not contribute to score differences among test takers, but they do affect the absolute magnitude of each examinee's score. For example, there was a short-answer definition item that everyone got wrong (even though they all [...]

[...] the possible sources of variance in scores in an attempt to identify those that contribute to error and true score variance. His categorization scheme is reproduced in Figure 5-1. Some factors explain why an individual might obtain different scores on the same test on two occasions, and some explain why examinees tested on the same occasion might obtain scores that differ from one another. A detailed discussion of each category can be found in Stanley (1971).

Figure 5-1. Possible Sources of Variance in Score on a Particular Test (Thorndike, 1951)
I. Lasting and general characteristics of the individual
   [...] Level of ability on one or more general traits [...]
[...]
[...] general characteristics of the individual (factors affecting performance on [...] tests at a particular time)
   [...]
   D. Emotional strain
   E. General test-wiseness
[...]
V. Systematic or chance factors affecting the administration of the test or the appraisal of test performance
   A. Conditions of testing: adherence to time limits, freedom from distractions, clarity of instructions, etc.
   B. Unreliability or bias in subjective rating of traits or performances
VI. Variance not otherwise accounted for (chance)
   A. Luck in selection of answers by guessing

For our purposes, several generalizations can be drawn from the listing in Figure 5-1:

1. All the sources of variance
do not necessarily operate in every testing situation.
2. Some factors contribute to error scores in some testing situations but contribute to true scores in other situations.
3. Reliability is not simply an intrinsic trait of a test; its value depends on the nature of the group tested, the test content, and the conditions of testing.

METHODS OF ESTIMATING SCORE RELIABILITY

There is a need to estimate test-score reliability so that we can judge the extent to which measurement errors might interfere with the interpretability of the scores. Because reliability can be influenced by so many factors (the group tested, the test content, and testing conditions), it is not possible to settle on a single method for estimating reliability for all testing situations. At least five methods are used in practice to obtain the independent measurements necessary for estimating reliability. These methods yield coefficients of stability, equivalence, and internal consistency.

Stability Estimates

The test-retest method is essentially a measure of examinee reliability, an indication of how consistently examinees perform on the same set of tasks. The simplest and most obvious method of obtaining repeated measures of the same ability for the same individuals is to give the same test twice. This would provide two scores for each individual tested. The correlation between the set of scores obtained on the first administration of the test and that obtained on the second yields a test-retest reliability coefficient. Note that such temporary characteristics listed in Figure 5-1 as health, fatigue, memory fluctuations, and comprehension of the specific test task are likely to contribute to the error score when this method is used. The test-retest method is particularly useful in situations where the trait being measured is expected to be stable over time. Then, if the scores on the two occasions yield different rank orderings of the examinees, measurement error is the single most likely explanation for such differences.

A number of objections to the test-retest method have been raised, especially for use with achievement tests. One is that exactly the same test items are used both times. Since this set of items represents only one sample from what is ordinarily a very large population of possible test items, the scores on the retest provide no evidence on how much the scores might change if a different sample of questions were used (category II.B.1 from Figure 5-1). Another objection is that students' answers to the second test are not independent of their answers to
::l[::T$f:!":.'la I::::i'i:?:':'.','Ll L test scores equ,va,en' iT :-:,0-i,J;1:';,:,il$;:;il.-l:.:'i ili erswhen the etlecb or I ':t-' school vear or to assess "-;', are not thal true educarional reliabilirv is essenrial'so f--' ;;;il;;; errorsm;sked or artificiallv elevated by 'neasuremcdt lntom.l AnalFlr sarns Melhod3 of test-retest and Th€ drfficulties assoctateo wnh the derelDrinalrcn i nf^rrndrrun i nrerndl l o ot a s ingl e l e s t d n d o n th e u r o f' o m Ponent ' ubre\rs t he t e\ t , to e s l i ma l e l e q l s c o re re l ra b rl rrr ti madon' One common method of spl i tti ng it€ms and the elen numbered'rtems numbered th€ odd a iest has been to score between scores on the odd and even numbered il;;;;"-"radon l"';.i;iy. THE FELIAB]LTYOF IESI SCOFES 8:I ir em s is . al c u l a re d Of (o u rs e , s p l ,tri n g a rcsr i n rhi s w l y reans rhar & c sco.es oD rvhich rhe reliabiliry is based are from halflength resrs.To obraiu an csumare of t he f elia b i l i rv b a s e do n th e fu l l l e n g Lhtesri r i s ne.essarrro.orrecr, or step rhe half:test correlarion ro rhc lull.lengrh correlarion. (As you {ill s.e shord}, lengr h of a tc s t h a s a a e ry d i .e c r e l i e c t on rhe rel i abi l i ry of rhe scorcsw e can fron i(.) l-his is done wirh rhe help c,f rhe SpearDan-Brown foDnula. When (5.r) f + | lr her e r is th e re l i a b i l n y o fth e o rj g i n a l scores l orcxarnpl e, i frheodd lation betNeen t$o 25.itcm half.resrs is 0 82, rhe reliabiliry of rhe roral rcsr scores (50 items) is I 6'1 divi(led bv 1.82, rvhrch is approxnnarely 0 90. (5 5) wltcrc r, is rhe retiabilirl of scores from rhe ue{, lengtheDed rcsr, n is rhe nunber of ti D e s rl re o ,;g h a l 'he re s r i s l engl heneo, and r rs rhe rel i abi i i ry of rhe original Lest s.orrs. S u p p o s e r g i v e n s e r o f s c o re shas a rel i abi l i r! 
of 0.50 and we wish to increase the original length of the test by nine times, adding new items equivalent in content and difficulty to the original items. The reliability of the scores from the new test is predicted to be

$$r_9 = \frac{9(0.50)}{(9 - 1)(0.50) + 1} = \frac{4.50}{5.00} = 0.90$$

If the original test contained 20 items, the new test would need 180 equivalent items to yield a reliability of 0.90. Of course, for this prediction to hold, students should be expected to respond to 180 items without getting more bored or fatigued than they would get by responding to the original 20. And the added items should be similar to the original ones in terms of content, difficulty, and overall [...]

Kuder-Richardson. Two of the most widely accepted methods for estimating reliability were developed by Kuder and Richardson (1937). Their formula 20, abbreviated K-R20, is

$$r = \frac{k}{k - 1}\left[1 - \frac{\sum pq}{s_x^2}\right] \tag{5.6}$$

where k is the number of items, p is the proportion of correct responses to an item, q is 1 - p, and $s_x^2$ is the variance of the total scores. [...] (5.7) [...] An illustration of the computations of K-R20 [...]

[...] with intermediate response options. The formula resembles the one for K-R20 because K-R20 is actually a special case of the alpha procedure. The formula is

$$r = \frac{k}{k - 1}\left[1 - \frac{\sum s_i^2}{s_x^2}\right] \tag{5.8}$$

where $s_i^2$ is the variance of a single test item. When alpha is used to estimate [...]

$$r = \frac{k}{k - 1}\left[1 - \frac{\sum s_i^2}{s_x^2}\right] \tag{5.9}$$

k = number of separately scored essay test questions
$s_i^2$ = variance of students' scores on a particular item
$\sum s_i^2$ = sum of the item variances for all test items
$s_x^2$ = variance of the total essay scores

This method of estimating reliability of scores employs concepts from the statistical [...] individuals across the items are quite similar. In such circumstances the separate essay items are consistent in identifying individual differences in the achievement measured by the essay test as a whole.²

²An application of this method to a simple case is illustrated in Ebel (1979).

USING RELIABILITY INFORMATION

The reliability coefficient is an index of the amount of error associated with a [...] the meaningfulness and usefulness [...] Information about error can be used to make estimates of the true scores of examinees and to assess the practical significance of the difference between scores of two or more test takers. How all this can be accomplished is our next set of concerns.

Interpreting Reliability Coefficients

There are no absolute standards to serve as criteria for determining [...] standards have evolved over time for [...] must be depends mostly on how the scores will be used: what kinds of decisions will be made and how much weight the test score will have in the decision. Experts in educational measurement have agreed informally that the reliability coefficient should be at least 0.85 if the scores will be used to make decisions about [...] groups of individuals, like a class, the generally accepted minimum standard is 0.65. Usually, we can tolerate reliabilities around 0.50 for scores from teacher-made tests if each score will be combined with other information (test scores, [...]) [...] measurements that should concern us the most; it is this total score, not the score [...] point in instruction for each student, [...] additional corroborating information [...] scores that will provide information for grading.
Standard Error of Measurement

The reliability coefficient is a useful indicator of the extent to which a set of test scores is error free or error laden, but it furnishes no direct assistance in estimating the true scores of examinees. In almost all practical measurement situations, the only information available is the set of observed scores of the persons measured. Their true scores and error scores are both unknown. However, given the standard deviation of the distribution of observed scores and the reliability coefficient of those scores, the standard deviation of the hypothetical error [...]

$$s_e = s_x\sqrt{1 - r} \tag{5.10}$$

Table 5-1. Reliability and Errors of Measurement
[...]
r = 0.865
$s_e$ = 1.67 (direct calculation)
$s_e$ = 4.56 × 0.367 = 1.67

[...] $s_x$ and r are substituted in equation 5.10, the value $s_e$ = 1.67 is obtained. This shows that an estimate of the standard deviation of the errors of measurement can be obtained with the standard deviation of the observed scores and the reliability coefficient, without any information about the individual errors of measurement.

The standard error of measurement provides an indication of the absolute accuracy of the test scores using the observed score scale. For example, if the standard error of measurement for a set of scores is 3, then for slightly more than two-thirds of the observed scores (about 68 percent of them) the errors of measurement will be 3 or less score points. For the remainder of the scores, [...] of scores within which the person's true score is expected to be. Using the values from Table 5-1, we could be about 68 percent sure that Dan's true score is in the interval 10 ± 1.67 or 8.33 to 11.67. To be 95 percent sure, we would say Dan's true score is within the interval 10 ± 2(1.67) or 6.66 to 13.34. Note that the percentages 68 and 95 correspond to the percentages under the normal curve within one and two standard deviations, respectively, of the mean.

The standard error of measurement is the most common indicator of the amount of error contained in an observed test score. But its limitations have caused a number of researchers to consider additional ways of accounting for score error when interpreting a test score. One shortcoming of the standard error of measurement is that it provides the same error estimate for everyone in the group, even though it is reasonable to expect individuals to have varying error scores. Methods that permit the computation of standard errors of measurement for each of several score ranges are described by Feldt and Brennan (1989). In addition, they describe procedures for obtaining error score estimates for each individual in the tested group.

The Problem of Low Reliability

Suppose the K-R20 from the scores on a history unit test turns out to be 0.33 and the teacher decides this value is unsatisfactory. What should be done? After all, the scores seem to have some worth, though less than had been hoped originally. Perhaps the first action, when practical, should be to improve the conditions that contributed to the low reliability and then retest. But, ordinarily, test development and administration time constraints will preclude the do-over alternative. A second alternative would be to retain the scores but discount them, that is, assign less weight to them in the decision process than had been planned originally. For example,
if the [...] to count as 25 percent of the final grade, their weight might be dropped to 20 percent, or a bit less.

When discounting is employed, as described above, decision making can be affected in important (and usually negative) ways. For example, if the discounted scores related to significant prerequisites for later learning opportunities, the wisdom of the decision to discount the scores would be questionable. If the discounted scores were supposed to measure higher-order thinking skills in the content area or problem solving or application of content, subsequent decisions might be grossly misleading. In sum, the user must be aware of the tradeoffs involved when using discounted or undiscounted scores, when either has relatively low reliability.

Low reliability is symptomatic of an unhealthy testing situation, just as high fever indicates unhealthy body tissue. We cannot tell in either case what the problem is, but the symptom suggests where to look. Was it the test, some characteristic of the examinees, or some aspect of the testing conditions? Was it a combination of these? The user should determine plausible explanations so that a decision can be made about whether to use the scores for their intended [...]

FACTORS INFLUENCING SCORE RELIABILITY

When we understand the various factors that can impact the reliability of a set of scores, we can interpret and use the scores more prudently, and we can attempt to manipulate those factors through test preparation and administration activities.

Test-Related Factors

The reliability of achievement-test scores is affected by the number of items in the test, the extent to which test content is homogeneous, and the characteristics of the individual items: their difficulty and discrimination capability.
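Before the individual factors are taken up, the standard error of measurement from equation 5.10 can be sketched in Python, since it is the quantity a user would recompute when judging whether scores with low reliability are still usable. The function names are hypothetical; the inputs 4.56 and 0.865 are the standard deviation and reliability reported with Table 5-1, and the observed score of 10 is Dan's.

```python
import math


def standard_error_of_measurement(sd_observed, reliability):
    """Equation 5.10: s_e = s_x * sqrt(1 - r)."""
    return sd_observed * math.sqrt(1 - reliability)


def score_band(observed, sem, multiplier=1):
    """Observed score plus or minus multiplier times the standard error."""
    return observed - multiplier * sem, observed + multiplier * sem


sem = standard_error_of_measurement(4.56, 0.865)
print(sem)

# Bands for an observed score of 10, using the tabled value of 1.67.
print(score_band(10, 1.67))      # about 68 percent band
print(score_band(10, 1.67, 2))   # about 95 percent band
```

The computed value agrees with the tabled 1.67 to within rounding of the intermediate figures, and the two bands reproduce the intervals 8.33 to 11.67 and 6.66 to 13.34 given in the text.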
1. Test length. The Spearman-Brown formula (equation 5.5) indicates the theoretical relationship between score reliability and test length. The effect of successive doublings of the length of an original five-item test, which yielded a reliability of 0.20, is shown in Table 5-2. The same data are shown graphically in Figure 5-2. [...] added test length. Adding 60 items to a 20-item test could increase reliability from 0.50 to 0.80. But adding 80 more items to an 80-item test would raise reliability only from 0.80 to 0.89 (Table 5-2).

Table 5-2. Relation of Test Length to Score Reliability

Items      Reliability
5          0.20
10         0.33
20         0.50
40         0.67
80         0.80
160        0.89
320        0.94
640        0.97
∞          1.00

Figure 5-2. Relation of Test Length to Score Reliability

[...] assumptions, one statistical [...], the use of the Spearman-Brown formula. The statistical assumption is that the items added to the original test to increase its length have the same statistical properties as the original items. [...] the added items should have the same [...], and their addition to the test should not [...]. If [...] items like those in the test facilitates correct [...], or if any other factors make the examinees [...] the lengthened test, reliability predictions could be erroneous.

2. Test content. Homogeneity of test content also tends to enhance score reliability (Guilford, 1936). A 100-item test about the Vietnam War era is likely to provide more reliable scores than a 100-item test covering American history after the Civil War. Also, the subject matter in some courses, such as mathematics [...], is more [...] organized, with greater interdependence [...], than is the subject matter of literature [...]. [...] test content homogeneity that makes [...] tests of mathematics and foreign languages [...].

3. Item characteristics. The items in homogeneous tests also tend to discriminate between high and low achievers better than items in tests covering more diverse content and abilities. But the ability of an item to discriminate [...] depends on the technical quality of the item: on the soundness [...] multiple-choice items, the adequacy of the correct [...] examinees of lower ability. The nature and determination of indices of discrimination and their relation to reliability will be discussed in [...]. [...] individual items in most classroom tests is probably the most effective means of improving score reliability and, hence, test quality.

The difficulty of a test item affects its contribution to score reliability. Items that more than 90 percent or fewer than 30 percent of the examinees answer correctly cannot possibly contribute as much to test score reliability as an item that is [...]. Contrary to popular belief, a good norm-referenced achievement test seldom should include items that vary widely in difficulty.

4. Score variability. Classroom tests are sometimes constructed and scored so that the range of scores obtained is much less than it could be, theoretically. For example, an essay test with a 100-point maximum score may be graded with a view to making 75 a reasonable passing score. This usually limits the effective range of scores to about 30 points. A true-false test, scored only for the number of items answered correctly, has a useful score range of only about half the number of items. A multiple-choice test, on the other hand, may have a useful score range of three-fourths or more of the number of items in the test. Hence the scores from a 100-item multiple-choice test are usually more reliable than those from a 100-item true-false test. But students generally can respond to three true-false items in the time required to respond to a pair of content-parallel [...] score ranges and are likely to produce scores of equal reliability.

[...] set in advance as the minimum passing score. The other two tests were scored by [...] false test is 75, half of the 150 items. Notice that the expected variability of the true-false and multiple-choice [...] of the results we could expect teachers to achieve when using tests of these [...].

Table 5-3. Hypothetical Test Statistics for Three Tests

Test type          Items or points    Mean     Expected standard deviation    Score range    Reliability
Essay              100                87.5     5.25                           75-100         0.51
True-false         150                112.5    15.75                          75-150         0.88
Multiple-choice    100                62.5     15.75                          25-100         0.91

Examinee-related Factors

Score reliability can be influenced by the amount of variability in [...] ability, but they can and should try to improve both testwiseness and motivation in ways that will improve reliability.

1. Group heterogeneity. The reliability coefficient for a set of test scores depends also on the range of talent in the group tested. If an achievement test [...] reliability of any subset of scores from a single grade. The reliability coefficient, as we have said, reflects the ratio of true score variance to observed score variance. [...] then the observed score variance will [...] higher, (2) their error variances will all be about the same, and (3) the reliabilities
of their scores will be increasingly higher.

There are circumstances in which the students in a class are very similar to one another in their achievement of the objectives in an instructional unit. Then the standard deviation on the unit test is very small, and the reliability coefficient is quite low also. This may be a situation in which a high-quality test, when given properly, cannot yield scores of very high reliability. Dependable differences in achievement cannot be detected by most tests if those differences are negligible, too small to be of practical consequence. Though group homogeneity may be a plausible explanation for low reliability at times, we should not be too quick to ignore the signs of faulty test items before settling on group homogeneity as the most likely reason.

2. Student testwiseness. When the amount of test-taking experience and levels of testwiseness vary considerably within a group, such backgrounds and skills may cause scores to be less reliable than they otherwise would be. When all examinees in the group are sophisticated test takers, or when all are relatively naive about test taking, such homogeneity probably will not lead to much random measurement error. The rank order of scores is likely to be influenced only when there is obvious variability in testwiseness within the group. Students who answer an item correctly only because of their testwiseness, rather than their achievement of content, cause the item to discriminate improperly. As we noted earlier, poor item discrimination contributes to lowered reliability estimates.

3. Student motivation. If students are not motivated to do their
best on a test, their scores are not apt to represent their actual achievement levels very well. But when the consequences of scoring high or low are important to examinees, the scores are likely to be more accurate. Indifference, lack of motivation, or underenthusiasm, for whatever reasons, can depress test scores in the same way that anxiety or overenthusiasm may. When motivation influences individuals in the group differently and inconsistently across testing occasions, random errors are likely to influence the scores.

Administration-related Factors

As with test-related effects and most examinee-related effects, test users [...]

1. Time limits. Scores from a test given under highly speeded conditions [...]. The apparent increase in reliability that results from speeding up a test is usually [...] one must have estimates for both ability and speed. By splitting a test into halves [...] examinees are able to attempt all items.

2. Cheating opportunities. Occurrences of cheating by students during a test contribute random errors to the test scores. Some students are able to [...]

CRITERION-REFERENCED SCORE RELIABILITY

[...] hold the mistaken notion that scores on criterion-referenced tests exhibit little [...] are inappropriate for criterion-[...] devoted to discussing the variety of [...] measurement for several criterion-referenced [...] applicability of the "norm-referenced" [...].

Scores or Decisions

The reliability of scores from criterion-referenced tests can be understood better if we first examine the types of interpretations that might be made with such scores. For convenience, these interpretations can be labeled as absolute performance, domain estimates, and mastery.

Absolute-performance interpretations make use of a number
scale that shows levels of accomplishment or achievement [...] scale position. Examples are a 7-point [...] used to evaluate the merit [...], a 10-point rating form for judging the quality of a woodworking [...], a [...] scale for judging the quality of a figure-skating performance, and a [...]-point scale for use by [...] teachers to judge the quality of short dialogue written and presented by pairs of students. In each of these cases, the score obtains meaning from the behavior descriptions attached to each scale point. [...] describe [...] a student has achieved, and no reference to the performance [...].

[...] cannot be satisfied with a retest [...] original scores. Correlation coefficients [...] measurement are in the same relative position [...] the same absolute position on retesting.

There are no generally accepted indices of reliability for scores based on absolute-performance interpretations. However, appropriate evidence of consistency would include a demonstration that the average discrepancy between test and retest scores was small for the group in question. "Small" would need to be defined by the user, based mainly on the fineness of the behavior [...]. In addition, high agreement among independent raters of the same [...] or projects or other performances is further evidence of the consistency of the scores. Again, absolute differences rather than relative differences must be examined.

Domain-estimate interpretations are made when the scores indicate the percentage of some clearly defined content or performance domain attained by the student. Some of the common domains of interest are found in the [...] curriculum of our elementary schools: names of the [...], products from the multiplication of pairs of single-digit numbers, all words on the [...]-grade spelling scale, and name-location associations for the 50 states.
In each case, a test score obtains meaning from an understanding of what the "whole," the domain, constitutes. "Tommy knows 83 percent of his letters" is not [...] told there are 780 words in all. Setting aside these shortcomings, how can the reliability of scores from a domain-referenced spelling test be determined? Since we are again interested in absolute interpretations, correlational methods will [...].

As with absolute-performance interpretations, the most reasonable evidence of the reliability of domain-estimate scores is a small average discrepancy [...] score based on retesting. Again, no generally accepted methods have been developed [...] in domain scores. Such scores are descriptive, but they offer no prescriptions for further instruction and, unless a cutoff score is introduced, they do not help the [...].

[...] performance standards. Often called decision rules, [...] used in mastery interpretations:

1. A posttest score of at least 85 percent is needed to proceed to the next unit.
2. A score of at least 22 out of 25 is needed to pass the driver's test.
3. Anyone with a score below 40 will be placed in the special reading program.
4. Scores higher than 3 on the 7-point scale will be regarded as acceptable.

When the interpretive goal is to make a decision about performance, the [...] retesting. It should matter little if Stephen earns a 7 or a 4 on the retest, as long as the decision about his performance is the same. Thus, for mastery interpretations, score reliability takes a back seat to decision consistency with regard to measurement error.
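The two kinds of evidence described in this section, a small average discrepancy between test and retest scores and consistent decisions under a cutoff rule, can both be computed directly from paired scores. The following is a minimal sketch only: the ten pairs of ratings are hypothetical, the function names are illustrative, and the decision rule is the fourth one listed above (scores higher than 3 on a 7-point scale are acceptable).

```python
from statistics import mean


def average_discrepancy(test, retest):
    """Mean absolute difference between paired test and retest scores."""
    return mean(abs(first - second) for first, second in zip(test, retest))


def decision_table(test, retest, cutoff):
    """Count paired decisions under the rule 'acceptable if score > cutoff'.

    Returns (acceptable both times, acceptable then not,
             not acceptable then acceptable, not acceptable both times).
    """
    counts = [0, 0, 0, 0]
    for first, second in zip(test, retest):
        index = (0 if first > cutoff else 2) + (0 if second > cutoff else 1)
        counts[index] += 1
    return tuple(counts)


def proportion_consistent(test, retest, cutoff):
    """Proportion of examinees who receive the same decision on both occasions."""
    both, _, _, neither = decision_table(test, retest, cutoff)
    return (both + neither) / len(test)


# Hypothetical ratings on a 7-point scale for ten students.
# The first student earns a 7 and then a 4: the score changes, the decision does not.
test_ratings = [7, 5, 4, 2, 6, 3, 5, 1, 4, 6]
retest_ratings = [4, 5, 5, 3, 6, 4, 4, 2, 3, 7]
CUTOFF = 3  # scores higher than 3 are regarded as acceptable

print("Average discrepancy:", average_discrepancy(test_ratings, retest_ratings))
print("Decision counts:", decision_table(test_ratings, retest_ratings, CUTOFF))
print("Proportion consistent:", proportion_consistent(test_ratings, retest_ratings, CUTOFF))
```

The average discrepancy speaks to absolute-performance and domain-estimate interpretations, where the size of the score difference matters. The proportion of consistent decisions speaks to mastery interpretations, where only the side of the cutoff matters.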
Whenever criterion-referenced scores are used to make dichotomous decisions such as pass-fail, mastery-nonmastery, or proceed-remediate, any of several methods may be useful for estimating decision consistency.

Estimating Agreement

When dichotomous decisions such as mastery-nonmastery are the object of criterion-referenced interpretation, the consistency of classification is of interest. [...] Table 5-4 show that 17 students were [...]. The proportion of consistent classifications is

$$P_A = \frac{a + d}{N} \tag{5.11}$$

[...] that 80 percent of the decisions about these 25 students were consistent from [...] percent) were classified differently on [...]. [...] is 1.00, but its lowest value is not likely to [...] classifications randomly, by coin [...].

Kappa, [...] second coefficient of [...] much the classifications that are based [...] consistency relative to random classification. For example, if $P_A$ turns out to be 0.60 for [...]

$$\kappa = \frac{P_A - P_C}{1 - P_C} \tag{5.12}$$

where $P_C$, the proportion of consistent classifications that could be expected by chance, is calculated as

$$P_C = \frac{(a + b)(a + c) + (c + d)(b + d)}{N^2} \tag{5.13}$$

Table 5-4. Example of Decision Consistency Estimation

                          Second administration
First administration      Master      Nonmaster      Total
Master                    a = 15      b = 3          18
Nonmaster                 c = 2       d = 5          7
Total                     17          8              25

$P_A = (15 + 5)/25 = 0.80$

The values of a, b, c, and d are represented in Table 5-4. Can you verify that $P_C = 0.579$ for the data in Table 5-4? What does this value mean? It means that about 58 percent of the pairs of classifications would be consistent, by chance, if the outcomes from the second test administration were independent of those from the first. For our illustration, $\kappa = (0.80 - 0.579)/(1 - 0.579) = 0.525$. If the value of kappa were close to zero, it would mean that our classifications were not any better than we could have done by chance, without testing. Thus $\kappa = 0.525$ should be interpreted to mean that something other than random factors is accounting for the level of decision consistency obtained through the two test administrations. In other words, true scores rather than error scores are the primary reason that decisions were so consistent from one time to the next.

The illustration of obtaining an agreement coefficient is parallel to the test-retest method used with norm-referenced measures. However, when equivalent forms are available or when content sampling is a major concern, estimation procedures that parallel the norm-referenced equivalent-forms methods should be used. Berk (1984) and Subkoviak (1984) have dealt with these and other related methods in some detail. In addition, Subkoviak (1988) has provided tables to ease the computational burden for calculating $P_A$ and kappa.

Factors Affecting Decision Consistency

Many of the factors that affect the magnitude of reliability coefficients also affect agreement coefficients because errors associated with examinees, the test itself, and the testing conditions can occur in any measurement situation. And, of course, these factors often operate simultaneously rather than in isolation.

1. The cutoff score location. When the cutoff score is nearer the highest score or the lowest score, decision consistency will be greater. A cutoff score at about the median is likely to produce the most inconsistent decisions. This idea conforms with the notion that extreme levels of performance are easy to pick out, but shades of difference among typical performers are difficult to detect.

2. The homogeneity of the scores. If the scores are homogeneous and the cutoff score is somewhere among them, the number of inconsistent decisions could be quite high. Much greater decision consistency should be expected when scores are highly variable and the cutoff score is located somewhere within the distribution. For example, even if the cutoff score is at the median, more consistency is likely to be achieved with scores from a rectangular distribution than from one that is negatively skewed.
3. Test length. Intuitively, it seems logical that the more opportunities we give examinees to perform, the more confident we can be in deciding what their ranking is, how much they know, or whether they can perform a specific task well enough. If a mastery decision rule requires 75 percent for passing, greater decision consistency should result from twelve trials than from eight, or from eight items than from four.

The theory and technical advances associated with the reliability of scores from criterion-referenced measures have been developed and implemented slowly. For more advanced treatment of these topics and for additional computational illustrations, see Crocker and Algina (1986) and Kane and Brennan (1980).

SUMMARY PROPOSITIONS

1. Educational tests always yield less than perfectly consistent results because of content sampling errors, examinee performance errors, scoring errors, or administration errors.
2. The meaning of score consistency varies depending on the type of score interpretation: norm, criterion, or objectives referenced.
3. Test score reliability can be defined as the correlation between scores on two equivalent forms of a test for a specified group of examinees.
4. Reliability can be defined theoretically as the proportion of observed score variance due to true-score variance.
5. The reliability of scores for a given test may be affected by random error sources but not by sources of systematic errors.
6. Neither test-retest nor equivalent-forms methods are practically useful for estimating the reliability of scores from a classroom test.
7. The Spearman-Brown prophecy formula is useful for estimating the reliability of scores from a lengthened or shortened test.
8. The Kuder-Richardson formulas yield estimates of score reliability from data on the variability of [...].
9. [...] reliability of scores from tests not scored dichotomously.
10. The more widely the items in a test vary in difficulty, the more seriously the Kuder-Richardson formula 21 may underestimate reliability.
11. The minimum acceptable level of score reliability is mainly a function of the intended use of the scores.
12. The standard error of measurement is an estimate of the general magnitude of errors expressed in test-score units.
13. The standard error of measurement can be estimated by multiplying the standard deviation of the scores by the square root of a difference, which is 1 minus the reliability coefficient.
14. Longer tests composed of more discriminating items are likely to yield more reliable scores than shorter tests composed of less discriminating items.
15. Tests composed of homogeneous content are likely to produce more reliable scores than those containing heterogeneous content.
16. The more variable the scores obtained from a test, the higher their reliability is likely to be.
17. Scores obtained from groups heterogeneous in achievement are likely to be more reliable than those obtained from homogeneous groups.
18. Groups that are heterogeneous in testwiseness are likely to produce less reliable scores than groups homogeneous [...].
19. Internal consistency reliability coefficients are likely to be overestimates when the test is [...].
20. [...] methods used [...] norm-referenced measures [...].
21. It may be more important to estimate decision consistency than score reliability for tests used to make mastery decisions.
22. Percent of agreement and kappa are both methods of estimating decision consistency.
23. Decision consistency is affected by such factors as cutoff score location, homogeneity of score distribution, and test length.

QUESTIONS FOR STUDY AND DISCUSSION

1. Why might a set of scores be considered quite reliable for criterion-referenced purposes but low in reliability for norm-referenced purposes?
2. What misunderstandings are demonstrated by a person who states, "This test is not reliable"?
3. Why are systematic errors considered part of true score variance rather than part of error [...]?
4. How do systematic errors affect norm-referenced and criterion-referenced scores differently?
5. Why are stability estimates of reliability generally inappropriate for use with achievement [...]?
6. What assumptions must be made when using the Spearman-Brown formula to estimate the reliability of scores from a shortened test?
7. What is the formula for computing the standard error of measurement for a set of scores?
8. How can the standard error of measurement be used to describe the error associated with a perfect score (100 percent) on a test?
9. Under what circumstances might the doubling in length of a 30-item test produce a decrease in score reliability?
10. How does the difficulty level of items affect the reliability of scores from a multiple-choice [...]?
11. Why are correlation-based reliability estimates less useful with scores from domain-referenced than norm-referenced tests?
12. What happens to the values of $P_A$ and kappa when all examinees pass on both testing [...]?

Validity: Interpretation and Use

There are good reasons why validity is such a misunderstood concept among [...]. The meaning of this idea has evolved and continues to change slightly with each passing decade. Consequently, measurement specialists have been inconsistent in their use of the term, and many ideas have been proposed for bringing order to the confusion. Then, too, [...] a relatively incomplete understanding of [...] validity and [...].
Despite the potential for confusion, it is important to understand the concept of validity because such understanding is itself the foundation for fair and proper use of tests and measurements of all kinds. All the precautions and special cares taken in the test development process, in the administration of tests, and in the reporting and interpretation of scores are intended to enhance validity. The main goal of this chapter is to describe the several facets of validity and the related principles that can be applied to ensure appropriate use of test scores.

THE MEANING OF VALIDITY

The term validity, when applied to a set of test scores, refers to the consistency (accuracy) with which the scores measure a particular cognitive ability of interest. Thus, there are two aspects to validity: what is measured and how consistently it is measured. The cognitive abilities referred to are abilities to perform observable tasks, abilities that require a command of substantive knowledge. The consistency of measurement refers to the reliability of the scores. Reliability is a necessary ingredient of validity, but it is not sufficient to ensure validity. Unless the test scores measure what the test user intends to measure, no matter how reliably, the scores will not be very valid. Thus, from this perspective, validity refers to the meaning of the scores obtained from a test administered to a certain group. Does a high score mean that the individual is particularly competent with respect to the knowledge the test is supposed to be measuring?
Does a low score mean the examinee has little ability? What else might a high or low score mean?

Validity has traditionally been regarded as a test characteristic, but the most current thinking of measurement experts has changed that. The most recent Standards for Educational and Psychological Testing (American Psychological Association, 1985) associates the term with a set of test scores rather than with the test used to produce them. In particular, validity has to do with the meaning of the scores and the ways we use the scores to make decisions. We ask such questions as "How well do these scores reflect physics achievement?" or "How appropriate is it to use these math test scores to decide who should take algebra next year and who should not?" or "How appropriate is it to infer that high scorers on this certification test will be excellent teachers?" Here are some situations that illustrate a variety of validity concerns.

1. The test is intended to measure higher-order thinking skills in earth science, but most items require only recall of facts, terms, and scientific principles. It would be inappropriate to infer that high scorers can apply, solve problems, or interpret earth science information; the nature of the test items did not require the demonstration of such skills. The meaning of the scores is somewhat different from what the test developer intended. For what other purpose might these scores be more valid?

2. The test is supposed to measure achievement of concepts and relationships associated with the democratic form of governance, but the questions require a very high level of reading skills and a highly developed vocabulary. The meaning of the scores from this test is compromised by the "extra" verbal skills that are required. Low scorers may be poor readers rather than low social studies achievers.
These scores are not a very good representation of the kinds of achievement the teacher expected.

3. The scores on the persuasive essay that was given as a final exam in ninth-grade English are used by the curriculum committee to assess the effectiveness of the curriculum in addressing writing mechanics: spelling, grammar, punctuation, and capitalization. Unless writing mechanics was a scoring criterion for the essays, the scores will be influenced mainly by the amount and quality of evidence the writer offered to support the position taken. Even if writing mechanics was one of the scoring criteria, the scores will be "contaminated" by the other factors that were used to judge the quality of the essays. It is probably inappropriate to make curricular judgments about mechanics on the basis of these scores.

4. Students noticed that the multiple-choice items of the reading test contained qualifiers (most, some, usually) in the keyed responses and absolutes (all, never, every) in most distracters. This test was made easier than it was intended to be due to the idiosyncratic item-writing methods of the teacher. The scores may well measure testwiseness better than reading comprehension. Since the scores may not have the meaning intended by the teacher, it would be inappropriate to decide how well pupils are progressing in reading on the basis of those scores.

5. The test is a measure of word analysis skills (beginning sounds, ending sounds, and rhyming), but the teacher's enunciation and [...] help lead students to the correct answers. The scores on this test may still be a measure of word-sound association skills, but their interpretation is complicated by the teacher's improper test-administration procedures. The value of the scores for their intended use, grouping for reading instruction, has been jeopardized because the meaning of the scores has been distorted.

6. The directions on
the math test indicated that "+" and "0" should be used to respond true or false; answers using [...] were scored as wrong, regardless of their correctness. Obviously the scores of those who failed to heed the directions will not be a measure of their mathematics achievement. Since it is unclear what the scores mean, it also is unclear how the scores should be used.

These situations illustrate what validity is all about: the meaning of the scores we obtain from the administration of a particular test. There are many factors that can detract from the meaning we had hoped to attach to a set of scores. Once the meaning of the scores has been distorted, the scores become less appropriate for their intended use. That scores are more or less distorted and less appropriate means that validity is a matter of degree. Scores are not absolutely valid or invalid for a particular use; such perfection is ordinarily not possible to achieve with the educational measures available to us. The burden is chiefly on the user of test scores to justify (1) their interpretations of a set of scores and (2) the appropriateness of their use of the scores. In either case, some form of tangible evidence is required to support the justifications.

EVIDENCE USED TO SUPPORT VALIDITY

The appropriateness of using test scores for making particular interpretations or decisions should be judged from evidence gathered and presented by the test user. There is a variety of evidence that might be presented to demonstrate the valid use of a set of scores, and most could be grouped into one of these categories: content related, criterion related, and construct related. These are not types of validity, but types of validity evidence. The content type is concerned with how well the test content represents the domain of abilities the test user is trying to measure.
The criterion type is concerned with the relationship, usually represented by a correlation coefficient, between the test scores and the scores on some criterion measure of relevant abilities. Finally, the construct type is concerned with the overall meaning of the scores, what the composition of responses to the individual items [...] means as a psychological construct.

It is convenient to discuss the three types of evidence separately, but doing so conveys the false impression that these are independent notions and that any one of them might be used, by itself, to support validity judgments. In fact, construct-related evidence incorporates the other two: both content- and criterion-[...] needed to support the meaning of the [...].

The process of gathering evidence [...] are to provide evidence for a particular [...] several other competing interpretations [...] scores reflect the achievement of certain [...] unduly influenced by certain irrelevant [...].

It is important to realize that the validation of scores can change over [...] the typewriter. In fact, if Acme continues to use its typing test, it will likely [...]. Problems are encountered in using scores contained in school [...] inferences about future performance may also arise.
Content-related Validation Evidence

[...] the attainment of knowledge in the content area of interest. The score interpretations, whether norm referenced or criterion referenced, are based on the [...] another different but similar set of items [...] we would expect [...] must be made that relates to another [...]. With a written driver's license test, we [...] on the test will be safer and more [...]. [...] test content must be based on [...] delineation of the knowledge, skills, and understandings required for safe driving. A sample of some of the statements that might be offered to develop a safe-driver definition follows:

1. Distinguish the meanings of road signs of different colors.
2. Describe the function of a carburetor.
3. Describe the procedure for gaining control of a car that begins to skid on ice or snow.
4. Find the shortest distance between two cities using a highway map.
5. Identify the meaning associated with signs of varying geometric shapes.
6. Describe the procedures for changing a flat tire.

Some statements could be excluded from the definition because they represent useful skills, but skills not essential for safe driving. The [...] statements probably [...]. On the other hand, the definition likely would be considered incomplete [...] including statements [...] differentiate between the meanings of [...]. Once the definition has been made
explicit, it is possible to compare test items with the definition to assess the relevance of the items. If the items have been written to match the domain definition precisely, the inferences we wish to make about safe and responsible driving can be highly valid. From this point of view, a major portion of the answer to the validity question is inherent in the test-development process. That is, content-related evidence is furnished simultaneously with test-development activities. The domain definition (boundaries of content to be included), judgments about the relevance of test items, and steps taken to achieve representativeness of content serve the dual purpose of guiding test development and documenting validation evidence.

Written documentation of the domain specifications, the nature of the test tasks, and the reasons for using those tasks provides intrinsic rational validity evidence (Ebel, 1983). The evidence is intrinsic because it is built into the test. It is rational "because it is derived from rational inferences about the kind of tasks that will measure the intended ability" (p. 7). When the test maker is also the test user, evidence for appropriate score use is built into the product; that is, both the specifications for test construction and the items themselves are necessary evidence for the validation process.

Most test makers, including teachers, aim to produce tests that demonstrate intrinsic rational validity evidence, but they seldom acknowledge this goal explicitly. They seldom regard the process of test construction as part of
validation; they seldom document in writing the reasons for particular decisions in test development. Useful written documentation to support intrinsic rational validity provides answers to these questions.

1. About what are inferences to be made? As part of this identification, it is sometimes useful to mention certain extraneous abilities that should be excluded intentionally from the main ability of interest. For example, in tests of chemistry problem solving, reading ability should be minimized.

2. What domain of knowledge, skills, or tasks provides a basis for such inferences? A content outline that describes the tasks of interest is needed. This outline should cover the entire universe of content to be measured, not just the content reflected by the specific test items that might be developed. Sometimes chapter headings in a text are a good starting point for defining the domain.

3. What is the relative importance of the subdomains that comprise the domain definition? Are there sets of related tasks that are more important than others? If so, the test development plan should reflect the differences so that more test items will be included for the more important subdomains. One subdomain might receive a weight of 10 percent, for example, while a more important one is assigned 25 percent.

4. What kinds of test items have properties that will permit the testing of achievement of the domain elements? For example, in view of the tasks outlined in step 2, are either essay or short-answer items more appropriate than multiple choice?

5. Do the test items adequately reflect the domain knowledge, skills, and tasks?
T his qu e s rl o n re l a L c sto th e m a rc h bcrw cen tesr i re r.onrenr and rhe.onrenr specitled in rhe doDain outlinc Hlw u'ell did lhe irem wrirer translare the rask des c r ip ti o n s i n to rc s t i te ms ? 6 Do th" subsels oJ tun nms alaquateu represmt the danain in t*ns oj the relattueimqarlanceof the surdrrmr? ? Is the conrcnr I'eighring in rhe resr consraEnt wiih rhe decisions made in srep 3? 7 Wh,lt d,omain o subtlohain, a silz the danain af inrdest, i br6nr in the l, , l? { r e rh e re e \rra n e o u \ In (ro r\ rh dl ,oul d i nrr' t.re w i rh rhe y orc i nrerprerd. r ion. r h c u \e r r!i q h (: ro rn a l .: I5 re adi 4g JL' rti r).ru,dbul ar) te\et. or rumpura. tional skill, beyond thar rnrended, required to answer irems correcrtl? It may seem thar inrrirsic rarional vatidiry evidence is sumcrcnt for rhe validarion of achievemenr rests,bur ir is nor Such evidence lbcuses on rhe tesr_ its domain, the relevance of its rrems, and rhe represerrariveness of irs contenr B ur , \ i o n g a ( l a ,ro r\ u rh e r rh ,n rl r. rc.r ," n rhe md{ni rude ofrhe " i n,' " n,. !dtrd s.or r \ ( or i s . ( r i d e n t e b e v o n d ,e .r .o n re n r i , needed ro \uppofl i nrerprek. uon- C u rrc n r.re l rl e d e v i d e rk e a l u n , i r nor sutn.i Fn' be, JU r( i , trrtr l o rake i nro account response consistency (reliabiliry) or orher rspects ot rhe resrins enliron. m enr r h a t mi g h t i m p a , I s c o re i n re ' p rerrri on. t or exampte. nor m.rereri nced Lesr items that fail to discriminare between high and low a€hievers may be conrenr r c le! anr a n d re , h n i .a l l v a d c q u a re .Bur eLrch emr b i tl nor hctp ro pruduce d ranl or der o[ s L o re srh a r w i l l p e rmi r u k fu l norm reteren,ed rn(erpnj ' ,on\ tor sel e(. 
tion or classification purposes- gecause rhe tesr scores obrain€d will be somewhar low in reliabiliry, they also will be d€ficient in rerms of validiry ( o n re n t.re l d re d e v i d e n ,e ,or i ,hi evemenr re" r, atso murr bF cLrD D l e. mented by informadon about rhe adm;nisrrarion condirions, scorins crireria: ;nd nar ur e o te \i m re r\. P' e ti o u , e \d mptes ha,..how n tros rhe k! j ami ni .r,aror can provide clues abou! correcr and incorrecr responses and how sconnq rules can disrort the meaning of scores. Bur, in addirion to rhesc facrors, examinee characreristrcs orher rhan achievemenr can cause scores ro be hisher or lower t han r he r o u g h r L ob e . F u r e i a m p l e . M essi rk( 198S ,l i \r. rhesedi rern;ri ve exptanaLr onsf or l o q a c h re v e m e n r.re (r v u re s: Id.t ot suffi ci enr knortedge, hi sh anxi ery, v iqudl im p a i rme n r. l o q l e v e l o l m o rivari un, l un ed Lnqti sh \ti tG, ana l os l €ver of.on.ennarion. Though low achievemenr may be rhe mo$r ptausible explana. 106 VALIDITY: INTEFPRFTAIION ANDUSE tion for low scores, rhe burden is on th€ user ro show rbat rhese orher facrors are not influencing rh€ scores unduly. Such elidence is nor ro be found in rhe rest: intrinsic rational validiry evidence l,usr be supplemenred bv inforjnarion found in examinee responses, in rhe resring condiri;ns, and in thi scorine process W e h a v e i o n 8 t o w n rh a r val i di ,) deperrd\ on rhe purpo,i .r" r " rr;,tr t es t s c o re sa re u s e d , L h eg ro u p q i rh w hi .h 1e\ri \ u,ed. anJ rhi , i ,, un.rdI,,^ ' he und€r which the test is nsed. Valdiry depends oD morc rhan rhe quatily of rhe r es r .T h € r€ s p o n s i b i l i tv o f rh e devel op.r i , ro be d\.teaf a. p.,,j Lt, Jh,,ul ' e .r hat is being measured and to produce alesr rhar measures accuareil as Dossi ^s bte. 
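The subdomain weighting described in question 3 above (for example, 10 percent for one subdomain and 25 percent for a more important one) can be turned into item counts for a test plan. The following is only a minimal sketch under stated assumptions: the function name, the subdomain labels, the 40 percent weight, and the 40-item test length are hypothetical illustrations, and the largest-remainder rounding rule is one reasonable choice, not a procedure given in the text.

```python
def allocate_items(weights, total_items):
    """Split total_items across subdomains in proportion to their weights.

    Uses largest-remainder rounding so the counts always sum to total_items.
    `weights` maps a subdomain label to its relative importance.
    """
    total_weight = sum(weights.values())
    # Exact (possibly fractional) share of items for each subdomain.
    exact = {name: total_items * w / total_weight for name, w in weights.items()}
    # Start from the whole-number part of each share.
    counts = {name: int(share) for name, share in exact.items()}
    leftover = total_items - sum(counts.values())
    # Hand remaining items to the subdomains with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda n: exact[n] - counts[n], reverse=True)
    for name in by_remainder[:leftover]:
        counts[name] += 1
    return counts


# Hypothetical weights (percent) for a written driver's test plan.
weights = {
    "road signs": 25,
    "skid recovery": 10,
    "traffic laws": 40,
    "lane markings": 25,
}
plan = allocate_items(weights, 40)
print(plan)
```

Comparing such a plan with the number of items actually written for each subdomain is one way to document an answer to question 6, which asks whether the content weighting in the test is consistent with the decisions made in step 3.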
The responsibility of the test user is to make valid decisions using the test scores and all other available, relevant information, including documentation furnished by the test developer.

Criterion-related Validation Evidence

A criterion measure is an accepted standard against which some test is compared to validate the use of the test as a predictor. For example, scores on a dictation test are a generally accepted measure of spelling achievement. If we were to build and give a true-false spelling test, we might compare the true-false scores with scores obtained on a comparable dictation test to demonstrate that the true-false test is an acceptable measure of spelling achievement. The dictation test is the standard used for comparison in trying to establish the legitimacy of the new spelling test.

Criterion-related evidence takes either of two forms: one relates to determining present standing on a criterion measure and the other relates to predicting future performance on a criterion measure. The type of evidence needed for a given situation depends on how the scores from the test in question are intended to be used. For example, the true-false spelling test referred to above was intended to be used instead of the dictation test because of the increased efficiency in scoring afforded by the true-false test. Concurrent evidence would be useful to show that students appear in the same relative rank order on the two measures. A correlation coefficient of 0.80, for example, might be regarded as acceptable concurrent evidence.

When test scores are used to select individuals for admission, employment, extraordinary education opportunity, and the like, predictive evidence is needed. There is a need to show that a positive relationship exists between scores on the test (the predictor) and scores on some acceptable measure of future performance (the criterion). For example, a developmental screening-test score might be used to predict which five-year-olds are likely to succeed in kindergarten. If the criterion measure for "success" is "teacher's rating of social, emotional, and academic development at the end of kindergarten," the essential validity evidence might include the correlation between screening-test scores and teachers' ratings. If a correlation of, say, 0.60 is obtained, we might conclude that the test is a useful predictor of future performance; that is, there is support for using the scores to predict success in kindergarten.

The correlation between test and criterion scores has been regarded by many as the best kind of evidence to support valid achievement-test use. The correlation seems to provide an independent, objective validation of the subjective judgments and decisions that must be made during test development. But the validity of the scores from widely used tests of academic achievement has seldom been supported with impressive criterion-related evidence. This could mean that the tests are simply poor tests. But a more plausible explanation is that how well a test measures what it is intended to measure cannot be conveyed by the correlation between test scores and scores on the criterion measure. Why?

In some cases appropriate criterion measures are simply unavailable. What should be used as a criterion measure for a test of ability in fifth-grade arithmetic or a test of ability to understand contemporary affairs?
The tests themselves are usually intended to be the best measures of these abilities that can be devised. If better measures were available to serve the role of criterion, they also should be more valid than the test under validation. That many test developers have failed to present convincing empirical evidence of the validity of the scores from their tests is not for want of concern, effort, or skill. It is because correlational evidence for the validity of scores from most achievement tests is essentially unproducible. The same can be said of scores obtained from most professional licensure examinations (Kane, 1982).

In many cases, appropriate criteria turn out to be difficult or nearly impossible to measure accurately. On-the-job performance ought to be an appropriate criterion for an employee selection test. But for any except the simplest jobs, what constitutes satisfactory performance is hard to define, expensive to assess, and difficult to measure impartially. The relevance of performance ratings as criteria for the validity of a written test is open to question, also. A written test cannot possibly measure many of the characteristics that contribute to high ratings for job performance. Such a test, however, can measure desirable characteristics that are unlikely to show up clearly on a performance rating.
In situations like these, there is little justification for presenting evidence on correlation with a criterion as the primary evidence of validity.

A major problem with empirical test validation is the imperfect or uncertain validity of the criterion scores. Criterion scores themselves should be highly valid measures of the ability being tested. This also means the criterion scores should be quite reliable, and their reliability coefficient should be included as validity evidence. After all, a standard used for judging the validity of test scores certainly ought to be at least as valid as the scores being judged against that standard. The validity of the scores from the criterion measure needs to be addressed as rigorously and as thoroughly as the validity of the test scores in question.

Correlation procedures hold little promise for providing the main evidence of validity, but they may be useful in providing secondary, confirming evidence. If ability A is related in some degree to abilities B, C, and D, then scores from a test of A should correlate to some degree with scores from tests of B, C, and D. If they do, the confidence that test A measures ability A is increased.

It is important to note that such secondary evidence of validity cannot take the place of content-related validity evidence. What test A measures is determined mostly by the tasks included in it. One cannot discover what test A measures only by studying the correlation of scores from test A with scores from tests B, C, and D. How do we know what these other tests measure? We would need to examine the tasks included in them, the conditions under which they were administered, the nature of the examinees, and the procedures for scoring. If these are the bases for the meaning of scores from tests B, C, and D, should they not be the bases for the meaning of scores from test A as well?
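The correlation coefficients cited in this section as concurrent or predictive evidence (for example, the correlation between true-false and dictation spelling scores) summarize how closely paired scores on two measures agree in relative standing. The sketch below assumes the Pearson product-moment coefficient is the statistic intended; the function name and the eight pairs of scores are invented purely for illustration and are not data from the text.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two paired score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares of deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one list has no variance")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical scores for the same eight students on the two spelling measures.
true_false = [12, 15, 9, 18, 14, 11, 17, 13]   # new test under validation
dictation = [10, 16, 8, 19, 12, 12, 15, 14]    # criterion measure

r = pearson_r(true_false, dictation)
print(f"correlation between test and criterion scores: {r:.2f}")
```

As the passage stresses, a coefficient computed this way is only as trustworthy as the criterion scores themselves, so it is better read as secondary, confirming evidence than as the main evidence of validity.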
Concurrent and predictive evidence both require correlational data and, consequently, both situations are plagued by the problem of obtaining an appropriate criterion measure. The prediction of college freshman grade-point average using ACT scores illustrates the dilemma. Both measures reflect the ability to do college-level work, but certainly the criterion measure, grade-point average, is influenced by many other important factors: nature of the coursework, student effort and motivation, grading policies in the courses, and ability to establish supportive social relationships among peers. And the criterion measure will represent achievement in English, mathematics, and science only to the extent that coursework in those areas was taken by each student in the validation sample. Correlations between ACT composite scores and freshman end-of-year grade-point averages tend to be about 0.50. There is much these two measures do not have in common. Can a more satisfactory criterion be identified, one that is practical to implement and fair to students regardless of the pattern of courses taken in their first year?

The criterion problem demonstrates the need for additional validity evidence to supplement the information supplied by correlational evidence, which itself is based on criteria of questionable validity. A variety of evidence, all pointing to the same conclusion about score validity, is the most convincing justification for test score use.

Construct-related Validation Evidence

The term construct refers to a psychological construct, a theoretical conceptualization about an aspect of human behavior that cannot be measured or observed directly. Examples of constructs are intelligence, achievement motivation, anxiety, achievement,
arrirude, dominance, and readins comDrehension. ( ur \ r u, t v d l ;d a ri u n i \rh c p ro ' F s s u fg a rheri ngeri denrercsupporr' rhe,onrcn. r ion r hJ r d g i v e n rr\r i n d e e d n rc a s u r e\ rhe ps!(hotog' (al .oni ri urr rhe maxen intend fbr it to measure The goal is io derermine rhe meaningof rhe scores from, the tesr, to assure rhar rhe scores mean whar we expecr riem ro mean Ifour purpose is 1()measure marhemarics problem.sotving ach,evement, - exanple, the goal of consrrucr validarron tor is to sarner evidencaL}lar wi show r hdr ' lr e ra ,l , rn rh , re { re q u i re rn a rh probl em.sohi ng dbi l i ry I he denni on of the construct that ivas used for rest developmenr defines the consrrucr In rhis case, if the (leiiniriorl indicares rhar all four a rhm€ric operarions mav be in, ludc d dn d rh d r J l l p ro h l e m\ rh o u l d requi re ar tea5rrw o q;pq ror sotu[on, rhen each test item will nced ro be reviewed for compliance. Since nrrila readinc comprehension Dor marh compurarion is ro be measured (they are separate coni \ r r u. r . \ . o u r ra l ;d J ri o n \h o u l d i n tl u d e evi dcnre rhrr rhese (onsrru;rs have no appreciable impact on the magnirude ofthe scores.Judgmenrs by revi€vers and coDelarions berween probtem.solving scores and (t) readiag scoies and (2) com, putation s.ores would be useful evidence. Score reliabilirt €vidence would be need, d r o ,h o k rh a ( rrn d o m e rro rs d u e ro etam;nee I haracreri sri cs o, ro i dmi n. 
istration conditions were nor roo influenrial rn the scores.ln addirion, rhe scorinE crite a should be reviewed ro darermine rheir appropriareness, and the sco n; VALIDITY NTERPFETATON ANDUSE 109 key should be reviewed ro check irs accuracy Ir should be clear from this illusrra lr on r hr r (o n ( ru r rr e l a te d e r i d e n (e rn , orpor dresa !d, reryut .on,cnr retJred and ( r r t elunr e l a l e d e rrd e n .e h e !a u (e rh e meani ng ot a (er ot r.orcs i , rctate.t ru The marn threats !o consrrucr vahdiry hrve been referred to by Messick (1989) as consrruct undenepresenrarion and irrelevanr resr variance. Tlie idea of unacrrepresenhrron means rhar some o rty (the construco, are not being measui solving tesrrroLld hrve some problems r t ion, but t h e re a re n o n e . rh e te i r' ,u n d e r we defined 'l he idea of iFelevanr rcst or her r hdn th F c o n s rru .r. a re i ru \i n g s i ores' o L,edi erenr trom w ha, rhp! ouqhr t o be. M dn \ u , rh e ra ri a b te \ r} l rr I o n rrj trure tos rr-trrbi ti rr t. i n rhi r ! !rpS or). ' n rr rhoutd 5or n( ur I n e s er/r to r 5 ma r. rh e re r c r\ i er rhrn he_re,rb i < ene* , sxe,\ ( lues rn rh (;re ms . i mp l ru \i h te w r, ,ng aI\her: aId \^mc fJ, ro,\ mri c rhe r' ng. r { m . r e d i l fi .u l L rh a n i r rh o u td b e need t.r sr .deretopedreJdi nRor kfl (l nq \ k ill\ . f ' ne ri s u J l r,u i r!, B d b te d \p e e .h by rhe resradmrni .rrarn,, a,.,! " " i ,.,v unr eas ona b tee x p e , ra l l o n (. In rh e p ru b l em \ot\ i ns e\amrrte, i r r eteranr i esr I rr r. anc e. (ould b e rn rro d u , e d b r rh e d ' 1 fi , ul r\ .ompurari on, r eqrri red,rhe use ul nor elpr ob l e m s e rri n g \ rh a r re q u i re u ni que" rpfl or i now tedS e.\;!cre ri me ti mi tl , or am Dr gu o u srre m w o rd rn q , Co n ,e rn to r r.n ,rru r ra t,d i ,v i s rr rhc herfl ot \u.h que,ri on\ ds,.w hl . dr c l- r hr ss ru d c n L r.o re \o h i g h u n rh i s or' \\hv H c,e dI ut rhe \,ure, ru ' cq? loq? 
These questions raise doubts about whether the scores are measures of the construct the test maker had in mind or whether extraneous factors have overstated or underrepresented true achievement. Questions of construct validation have not always been raised with scores from achievement tests. But clearly they should be. The meaning of the scores from any test should be established before the scores are used to make decisions about examinees. Then questions of validity are appropriate to raise, and evidence for the proposed uses should be gathered.

As originally conceived, construct validity was concerned with the validity of a hypothetical construct purportedly measured by a particular test (Cronbach and Meehl, 1955). The idea has been applied primarily to psychological variables or personality inventories rather than to achievement tests. The methods employed were, and still are, intended to show that the construct under investigation is related in predictable ways to other constructs, as explained by some theory. Some of these methods and their uses are explained and illustrated by Messick (1989) in his comprehensive treatment of validity.

Questions about construct validity have always been posed when there appeared to be a discrepancy between what a test was supposed to measure and what it seemed to measure. Is this a test of understanding of scientific principles, as the title suggests, or is it really an intelligence test? Is that a test of intelligence or is it really a measure of verbal facility? Some test makers name their tests and describe what their tests are measuring,
not in terms of the tasks they include but in terms of the traits they presumably measure. That is why we have tests of rigidity, intelligence, persistence, creativity, tolerance, spatial relations, and many other traits. For tests like these, the question of whether the test really measures what it claims to measure does arise, as it should. Does the task of completing a figure analogy measure intelligence? Does ability to list unconventional uses for a brick measure creativity?

APPLYING VALIDITY PRINCIPLES

Two illustrations of the validation process will be presented to show how various instruments and score uses may require different kinds of validity evidence. In each case it is assumed that the instruments were developed in accord with generally accepted measurement principles, but the quality or success of that work needs to be examined through validation.

Kindergarten Readiness Test

Standardized achievement tests are available for use with kindergarten pupils to determine how well they have attained the academic skills taught in their kindergarten program. Some schools use the scores from a spring administration to determine which pupils should be promoted, which should be retained, and which should be placed in a transitional kindergarten program the next year. Is it appropriate to use such scores to make these kinds of placement decisions? How valid are the scores for this purpose? What do these scores mean?

Frisbie and Andrews (1990) set out to gather evidence related to the latter question by observing the administration of the Iowa Tests of Basic Skills in nearly 50 classrooms.
T h e p u rp o s e w a s ro o bseN e reachcr and pupi l behavror duri n; |tsr adminisrration ro derermine whether irrelevanr souries ot test vartanci m r ghr c o mp ro mi s e rh e me a n i n g o f r he scores.Thar i s, tf D upi l s cal l ed our an. s wer s .if p u p i l s c o p i e d fro m o n e a n orher, j f some pupi l s buri t i n(o rears or be. r ll i fr,d ,h e r,p ro v i d e d h rn rs .o r rt rea.hc,\ r" ;d i r" m, i mproD erl v. rhen ' r m equr e s s o ,rl d n u r r h( h e v c rv me a n rn gfuti ndi .rr" r' ora,hi e." menr. i ri qh' s.ore* or lor scorcs could be explained mos y by factors orher rhan ski ar;inment, r hp . . or e \ h u u l d n n ' b e u .e tu l fo r a ,r , purpose.( on.l u\i ons Lom rhi \ \rucl l \ere r har r l) r e ,, h rr\ a n d p u p i l \ h a d l i l e di tfi , ul () usi ng rhe tesr mareri at\,(2) rea.h er s s c r ed b l e to p ro !i d e ,n a rm o s p h e re ,onduri re16g,,.6t..,rdtj nA ,LJ)D uD rt. s ho uere p ro p e ' l v \u p e n i s e d c o u l d prori de uscrut re,pon,es ,na i qr r.a,hers nearll always followed published direcrions In sum, ir was derermined rhar the readiness s.o.es of groups of pupits can be very meaningful, but lhe scores of s ele. ' F d i l d i \i d u d l s m i e h t b e q u e s ti onabl t In view of rhese-nndinls, rfconvincing conrcnt relared evidence for rhe read'ness rest scores were also available (thar is,'evidence rn support of inrrinsic t r t r onr l \a l i d i rv ,. rh e n rh e m e d n i n g ot rhe $ orfu from rhe readi nss ,e,, " outd l; lel! be \re b c d a r d (c e p ' a b l e .Il o w e rc' . rh€ \e(ond quesri on.rhe onp reqardi na us e. has n o r )e ' h e c n a d d re \\e d W h rr (an be done ro \how rhrr i t i r oi r noi appropriare to make firsr.grade placement decisions rvirh rhe readiness scores? 
S inc e pla c e me n ri n ro trrs r g ra d e i s rh e rtpi .al di re(r parh from ki nde,S arren.rhe r es t u( er m u q l d e mo n \Irre rh a r l o w scorers\oul d bencfi r more from rerendon (in a particular program) rhan from normat promotion If rhis evidence were to be gathered empiricalh,. some low.scoring kindergarten pupits would need m be VALDTY NTERPFETAT]ON ANOUSE 111 retaincd (before evidence in favor of rerendon is in hand) so rhar rhe ourcomes (ould be observed. Unfortunately, rhis has been rhe sc€nario ir some school dis tncts. That is, sone pupils hate been rerained withour anple evidence rhar bcne f ir s will be d e ' i !e d tu r rh e .h i l d . l n ma n) i nsranres. rhe qu..ri on hd\ bcen cond addressed on rheoredcal and prac0cal grounds Mosr 'ofren rhe conqtusio. has been that deficrencies in skills covered by a readiness resr can be orercome, e!en in rhe sho run, through inrensive Insrrucrional engagement. Ir should nor rake a year to remediate pupils wirh low scores unless physical, emotional, or jnrcllec tual disabilities are inlphcated This validadon example rllusrrares rhe n€ed ro separare rhe questions ot "meaning" and "ure" ro garher evrdence Ir fn her demonstnres thai valida(ron is more than Batheringjltdgments and Dumbers; ir usually requires a logical mal ysis of the relationship between several .ons$ucrs Driver's Llconse W ttsn Test Usually, the scores frorn a wrirren examinarion, a performance resr, and a v i\ ion. r r re e n i n tse ra m d re u \e d i n (u mbrn, on ro de(l de sho i \ eti s,htc ro obr r in an d u to m o b i l c d ' i v e r' s l rc e n \e I he p' i ma' r pur po\e ot l r, en' S d' i \eh is ro protect.he public from those who mighr endanger the tife and properrl of othcrs through unsate use of a motor v€h'cle lorlhe wrirren resi, rhe vatidily quer ' ion is : H o w r' p p ro p ri a te i \ i r ro i nl " errhJ hrgl ' \,uters qi tt Le.rter' . morc responsrble drivers than low scorers?" Whar do the scorcs mean? 
What kind of evidence would support the intended use of the scores for classification purposes? (Another relevant question that we will not deal with at this time is the basis for choosing a particular passing score.)

The validation process might begin by examining (1) the definition of "safe driving ability" established by the test builders and (2) the elements of the domain of relevant knowledge the definition encompasses. Then the relevance of the test items can be assessed by matching item content with domain definition. Items that require examinees to explain how to control a skidding car or to tell the meaning of road signs of various shapes or colors would probably be judged relevant. Items that require examinees to describe how a carburetor works or to find distances on a map are likely to be judged irrelevant. If no items deal with the differing meanings of solid and broken lines that define road lanes, the construct domain might be considered underrepresented. If most items deal with facts about laws to the exclusion of making judgments in certain driving situations, the representativeness of the content might be questioned.

Next the technical adequacy of the test items might be reviewed to determine how well the items help achieve the purpose of distinguishing safe and unsafe drivers. Readability should be assessed to determine if the vocabulary is too advanced or if the syntax is too complex. (Safe drivers need not be able to read English prose.) The keyed response should be checked for correctness and (for multiple-choice items) the plausibility of wrong answers should be considered. Test instructions, instructions to examinees about coding or marking responses, and information provided to examinees about scoring (including whether to guess) should be checked for clarity and completeness.
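The matching of item content with the domain definition described above can be recorded as a simple tally. The sketch below is only an illustration: the function name, the item identifiers, and the short element labels are hypothetical, and it assumes a reviewer has already judged which single domain element each item measures.

```python
def review_content(domain_elements, item_tags):
    """Compare reviewer judgments about items with the domain definition.

    item_tags maps an item id to the element the reviewer judged it to measure.
    Returns (irrelevant_items, uncovered_elements), both sorted.
    """
    domain = set(domain_elements)
    # Items whose judged content falls outside the domain definition.
    irrelevant = sorted(item for item, tag in item_tags.items() if tag not in domain)
    # Domain elements that at least one item addresses.
    covered = {tag for tag in item_tags.values() if tag in domain}
    # Domain elements with no items at all: possible underrepresentation.
    uncovered = sorted(domain - covered)
    return irrelevant, uncovered


# Hypothetical domain definition for "safe driving ability".
domain = ["road sign colors", "road sign shapes", "skid control", "lane markings"]

# Hypothetical reviewer judgments for five written-test items.
items = {
    "item01": "road sign colors",
    "item02": "skid control",
    "item03": "carburetor function",
    "item04": "map distances",
    "item05": "road sign shapes",
}

irrelevant, uncovered = review_content(domain, items)
print("judged irrelevant:", irrelevant)          # ['item03', 'item04']
print("elements with no items:", uncovered)      # ['lane markings']
```

A tally like this documents relevance and coverage only. Whether the items within a covered element are balanced between facts and judgments, as the passage asks, still calls for reviewer judgment.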
The test adminisrmtion and scoring condirions shoutd be reviewed ro 1I2 VAL/DITYNTEFPFETATON ANDUSE der er m i n c i f th rc a rs r() th e v a b d n e rni nq of rh€ s.ores have been conrro[ed. Is s uper v j s i o ns u i l i c i e n r L op re v e n r c h e ari ngby rhe cxaD ri neesitfaconpui erre.mi Dal is usc d to p rc s e n i i i e d rs ,i s a n rp l e i nsrrucri on provi ded and i s rh€re pro!i si on f or r er u rn i n g to a n i te m { o re .o n \i d(r ur chrnge I rcsp{,nsei Ir Lteri cat h.rrd s c or ing i s d o n e , a rc rh c rc p ro c e d u resi n pl :rcc ro check rhc ac.umcy of s.ori ng? Istimates of s.ore rcliabiliry pro!ide s., e cvidcncc atr(Nr rhe influen.e of ran dom er r o rs fro m a d mi .i s ri :Ii ,,n a n d sconng (ts i r ctcar rhar dcci si on consi sreD cy int or m a ri o n i s i mp o rra n r h e fe , ro o ?) Wh i r k i ..t ,)f.ri re ri o n re ta ted evi dcnce shoul d be garl reredto support r hc us e o f (b e N ri rrc n re s t fo r l i .c n si ng dri l ersi Is rhe pertbrman.e resra ;seful c onc ur re n r.ri tc i o rl me a s u re TIr p robabl y i s nor, bccause rhe dri vi nq test re. qur r es m o re rh a n k n o w l c d g e o t d re l aq Jnd rul es ot fi t r,)ad p€rfurmance bc. hindr he w b e e l a l s o re q u i rc s p s y .h omoror abi l i ri cs,abi ti ri es Lo see and roJudge s pc c d a n d d s ta n c e , a n d m c n ta l a l e rrnessand conccnrfari on.' fher.eorotabl v rs .o cxisting standard against lrhich tbe qualny of a w.jrren driver s iesr can be J udged Pre d i .tn e e v rd e n .e n ri g h r b e garheredi f! sui tabtecri rcri on for safc dri ving in rhe future could be derermined For example, ii the scores on rhe rcsr . uni 1. ,r.0 l o \ h r,,r,L F r u t Il d r, , JIron\ Ll uri !,srhe Ii r\r i \eJrr.,trerre.ei v. ing a lic e n s c ,h o N g e rma n e i s s u c h evi dencei W oul d nurnber of acci dentsbe a suitable criterion? 
How abour nuDber of a.cidenr.free hours of drivrnst The pr ublc m o f d e , i o i n E u n a .u rra b te , ti reri un .ugge,r\ rha' ,' i Leri ou.,etJr;d evi . dcnce is nor likelv ro hrve major weighr in deciding how appropriare rh€ wrrtten tesr scores arc lbr makiDg licensurc decrsions. 'l his lalidalion illusrration shows how imporranr inrrinsic rarional valicl. iry cvidence for achievcmenr rests and how documenurion of rhar evideirce dur ing tc s t d's e v e l o p m e n r s h o u l d e x pedi re val i dari on. Ir al so demonstrai esthar conrencrelarcd evidence alone is insufficienr for makingjudgments abour va. lidrry SUMMARY PROPOSITIONS Va dty is a pro por ly of as er oi les t s c or esr at her lhan a prope y ol a Iesl lnslrlnenl 2 Va dlest use req! res good lests lhose lhal con7 lorm ro a cleaf spec licaton oT rest content and yield highlyreliabe scores 3 What a lesl oi abirlies measures s delinedmore cbany by lhe lasks il requiresthan by the name I ol lhe lrarl t rs slpposed lo measure Ev de nceo l val ldus eolt 6s ls o' c ogn lv e abilit ies is nherenlin lhe tesl consrr!cliof process, nthe 9 delinirionol lhe abrilies €nd in the raltonaes fo. 
incrLdng each oi rhe lesr lasks 5 lnl.insc ralionalva dily evidenceis needed bul is nol sull c enr by ilse I to establishthe vatidilyot 10 l 6 Thevalue o' critef on{elaled evidencelorvalidily is hghly dependenlon the quality(vatidi(y)oithe Thevalueol cofte alional ev dence toslpporl lhe valid lse ol a sel ol ach evementscores is secondary to tho vale or direct tudqmenlat 6vi No adequslecr tenor measureerists with which to compareachievementtests lor the pLrposeol providingevidenceof vald score us6 Conslrucl .eialed evdence on howwel lhe lems repr€senla dimensions 'oclses otlhe relevanl domain and how w€ll i'elevant laclors are exc 0ded kom the measurements ConslrLcl-real6d evidencenecessarllyinc udes bul rs no1 m redto, conrent-relat6dand crlierion, VALIDITY: NTERPFETATON ANDUSE 113 FORSTUDYAND DISCUSSION OUESTIONS 1 Whal misunderslandngs are demons{raledbya pefsonwho slales This is a va d tesl ? 2 Under whal c rcurnslancesmqhl a sel ol scores be consdered qL te relable bul nol very Whal k nds ol lhinos could happend!r ng lhe administralonor scor ng ol an objeclivelesl lhar wourd probablyredlce the validilyof the res! ling scoies? why is Intr nsic ratonal va dly nol enolgh lo suppon the use,ulnessol scores iiom a crassroomtesl ro be used lor grading? How is the idea ol construct !ndenepresenlalionaccolnted lor by lhe process or documenr inginr r ns c r alionav aldt y ev id e n c e ? 6 What specilic sludenl characlerisucsdoes h gh schoolgrade-pointavefageprobaby meaWhym ghl a supervsor s fal ngs norbe a usef! cf lerionlof meas! rlnqemployeeperlorm- Achievement Test Planning ESTABLISHING THEPUBPOSE FORTESTING I be sngcs ol the test developnent process begiD sirh descibing rhe purpose fbr testing. Why are $.e resringT Whar do we inlend ro measure? How wrll rhe tesr scores be used, or whar kinds ofscorc hrcrpretarions do ue wanr ro make? These are important quesrions ro answer, bur roo ofren rhey are nor anslered Drior ro r he rrrm .s ri ri ' rB p h d \,. l h i r i . 
unfortunate because the answers lay the foundation for subsequent decision making as test development or test selection activities proceed.

A good test rarely serves multiple purposes equally well. Tests designed mainly to measure achievement precisely probably also are motivating to students and may be instructional as well. However, tests designed primarily to motivate students to study or to serve as learning devices are not likely to be good summative assessments of student learning. Most teacher-made tests are intended to provide precise measures of achievement that can be used to provide feedback to students and to report progress to their parents. And this should be their [...]

A significant aspect of establishing the purpose for testing is deciding how the scores should be interpreted. What referent will be used to obtain meaning from the scores? Content? Scores of a norm group? Statements of objectives? For classroom testing purposes, the answer should be tied closely to the grading (reporting) system, to the referent used to give meaning to the quarterly or semester grades. For statewide competency testing, the scores are likely to be referenced to the content domain from which the test was developed. For personnel selection that is based on hiring the most qualified of those who are at least minimally qualified, norm-referenced interpretations probably are needed. And, finally, testing for professional certification or licensure probably requires [...]

The implications of the decision about the type of score interpretation needed will become more apparent as we consider the separate aspects of test construction. The goal at each stage of construction is to do the things that will help to yield a distribution of the most valid scores, a distribution that has the characteristics that make possible the type of interpretations we had planned to make.

ALTERNATIVE TYPES OF TEST TASKS

The most commonly used types of tests are the essay, the objective (including short answer), and the mathematical problem type. Performance tests and oral examinations both are less common perhaps, but where they are used, the circumstances often favor their use over the other types. This section is devoted to a discussion of the characteristics of these various test types and a description of the relative merits of each in situations where a choice is feasible.

Essay, Objective, and Numerical Problem

First, some common misconceptions need to be addressed. It is not true that luck is a large element in scores on one type and nearly or totally absent in another. On the contrary, all types can be written to require much the same kind and level of ability and, if handled carefully, can yield results of satisfactory reliability and validity (Coffman, 1966; Dressel, 1978). A good essay test or a good objective test can be constructed so that it will rank a group of students in nearly the same order as that resulting from a good problem test. But this is not to say that the various types can be used interchangeably with equal ease and effectiveness. (See Birenbaum and Tatsuoka, 1987, for an example in the area of diagnosing learner difficulties.)

Both essay and problem tests are less time consuming to prepare than objective tests.
But the objective test generally can be scored more rapidly and more reliably than either of the other types, particularly the essay test. Where very large groups of students must be tested, the use of objective tests permits greater efficiency with no appreciable sacrifice in validity. But where classes are small, the efficiency is in the opposite direction, and essay or problem tests often [...]

The numerical problem type has the apparent advantage of greater intrinsic relevance, of greater identity with on-the-job requirements, than either of the other types. It is sometimes claimed that ability to choose an answer is different from, and less significant than, ability to produce an answer. But most of the evidence indicates that these abilities are highly related (Ward, 1982; Sax and Collet, 1968).

Because of the length and complexity of the answers they require, and because the answers must be written by hand, neither essay nor problem type [...] as comprehensively as an objective test [...] reading. Whichever type examiners [...]

Performance: Process and Product

[...] accomplishing a task, or to determine the quality of a product. Teachers can evaluate drawings and sculptures in art, [...] in home economics, tuned-up engines and table lamps in technical education, and penmanship or paragraph conciseness in language arts. In each case the goal is summative evaluation [...] of process are made as the student progresses toward [...] of the project.

Simulations, the most contrived situations established for assessing speed, accuracy, and quality [...] outcome is achieved. Dance instructors [...] polka, music teachers listen for [...] watch and listen to their [...] who have been certified in CPR [...] are important, but the outcome is [...]

Performance tests can serve [...] present some unique [...] same identification tasks and simulations [...] be comparable. Great care must [...] equivalent testing for all students in [...] scoring of performance tests tends [...] grading guide is prepared [...] to prepare and administer, especially to large groups. On the whole, performance tests tend to be less efficient than objective tests. In many situations, the [...] questionable, the most realistic simulations tend [...] administer. Even when simulation scores [...] is likely to be greater than their benefit. Despite [...] many circumstances under which performance tests [...] measurement. Guidelines for developing [...] In addition, Chapter [...] describes methods of using checklists and rating scales, both of which are used prominently in performance assessment.

TEST SPECIFICATIONS

[...]

1. Types of test items to be used
2. Number of items of each type needed
3. Kinds of tasks the items will present
4. Number of tasks of each kind needed
5. Descriptions of content areas to be sampled
6. Number of items from each area needed
7. Level and distribution of the difficulty of the items

Test specifications of this kind are useful for several reasons: (1) they guide the work of the test constructor, (2) they can inform examinees about expectations and how they might prepare themselves, (3) they provide information to others who may want to select the test for
their own particular use, and (4) they provide documentation as evidence for judging the validity of the scores obtained. (But since test specifications furnish a plan for test development, the surest basis for judging the usefulness of a test, or of its scores, is actually an examination of the test items themselves.)

Defining Content Domains

[...] to be measured by a test be described? [...] relates most directly and depends most [...] the user wishes to make. Obviously, [...] objectives of interest if our goal is to [...] When our goal is norm-referenced [...] be defined more generally, but still the boundaries need to be identified. In many cases, the content of certain book chapters, articles, novels, study guides, or other instructional materials sets the limits for eligible item content. When our need is to estimate how much of a content domain has been learned, the separate elements that comprise that domain need to be described. This is the case when domain-referenced interpretations [...]

Figure 7-1 shows the type of domain specifications that might be [...] for each of these kinds of score interpretations. [...] our primary purpose was to obtain norm-referenced scores, for example, the boundaries of permissible content could be described somewhat loosely. If two individuals, both familiar with the instructional program, were to build tests independently using the description provided, two quite different tests could emerge. [...] written instructional materials were not available, [...] by the teacher would be needed to establish the content limits (breadth and depth) of the test items.

Content specifications for domain-referenced testing must be explicit [...] the goal is to estimate how many of the identifiable elements are had by, known by, or controlled by each examinee. In most cases the domain will be large enough that only a sample of the elements can be tested at one time. [...] representative sample is to be obtained, the individual elements must be listed or described in such a way that their selection is possible. The illustration in Figure 7-1 has been abbreviated to conserve space, but the entire domain could be described by the 27 propositions listed in Appendix B.

Finally, the specifications for objectives-referenced testing require explicit listing of the instructional objectives of interest. Each objective is considered a content domain by itself; several items will be written to measure achievement of each separate objective, and a score will be reported for each instructional objective.

A. Norm-Referenced
The content domain of interest is physical fitness as described by Chapter 11 of the health text. The main areas are:
1. Exercise and its benefits
2. Designing an exercise program
3. The role of sleep in good health

B. Domain-Referenced
The content domain of interest is physical fitness as defined by a separate list of 27 propositions related to exercise, exercise programs, and the contribution of sleep. Here are three sample propositions, one from each subdomain, from the full domain listing (Appendix B):
1. Exercise can improve blood vessel capacity and increase heart strength and lung [...]
2. The benefits of aerobic exercise require a minimum of three 20-minute sessions [...]
3. [...] poor body position, too much [...]

C. Objectives-Referenced
The content domains of interest are these instructional objectives about physical fitness (Appendix C):
1. Distinguish the purposes and features of aerobic and anaerobic exercising.
2. Describe how nutrition and exercise jointly affect body weight.
3. Estimate the relative amounts of sleep required by individuals who vary in age, activity level, and general health condition.

Figure 7-1. Sample Content Domain Definitions for Three Types of Score Interpretations

The three objectives in part C of Figure 7-1 were taken from Appendix C to illustrate the contrasting requirements in the domain definition. Note that when objectives-referenced interpretations are needed, as opposed to domain-referenced, no sampling of elements occurs and no inferences about content need to be made in score interpretation. [...] all the skills of interest are tested and, consequently, no inferences are made about untested knowledge or skill the examinees may possess.

Tables of Specifications

Table 7-1. Table of Specifications for a 40-Item Test on Score Reliability
[Table body not recoverable: content areas by abilities, with row and column totals summing to 40 items.]

[...] the aspects of reliability: definition, types of error, [...] factors influencing, and interpretation of coefficients. [...] Three items should require explanation regarding types of [...] are similar in relative importance [...] require activity more complex than simply identifying or describing terms. [...]
;:i'":i,:li#5,;ilil1i,;;:.f5.'iy.i;.,jt :J#: ple, a one.dimensional classificarion schem ;'ucrionar obrecdves ro.,,,h".;;;;,;;::: il:t ff,::i::*:"1.,:;j:.Yffilii it y dim en s i o n a re b o rh p re s e n r i n a ch sra j.;ilill;,I,1,:.11?'iLii'fi'J:il:Ti":: rabre 7-2dcpic,. i,,i,,tii",er", , -.r, ti r C thori gh ea,h ofrhe rhree i ontenr r'es,the projecred composition ofr}le resr .c i ncrrucri onatobj e, ri vesare . omD ound ( i \ 7,7, and t2 for a roral ot2' 6. { For of' r!c\ six separate objecdves.) The percenr. beequil,imporan, H.",. ,00," o'n.lilill';jfi:ruiIr;:i:,:::;iT::'.li be des ( r i b e d s h o fl i v Caregories ofEbei's Relevance cuide were used ro d€sffibe rhc abiliries . drmension in Table ?-l because Ebet,s rerms g*" iJ.*., rhe_type ofabilityrequi..a.rt. uu.i""" r.".r, irrr..-ii",""..i'... ". "p*.,i."ri "i r.*,i',i,ii t ul be, J u.e me a n i n g \ o t B to o m s .a rego, resare more \us.eprj bl e ro mi n tassi ' h " c ta s ri fi .a ' i u n f i. a, iDn o r In d i s a g re e meniamong i udges rhdn a,e Lbel s A tre;Trbls 7-2. Tabteot Spectricarions tor a Oomatn-rer6renced Test 27 27 46 - rcNIEVEMENT.TEST PLANNING121 narelyj in siruarions whcre atfecrive or psvchomoror objecrrves are ro be I r aluara d .ra ' e p u Ie \ u f ri ro n ur, e\ Jre dupr.pri are L, u\. r. d, i ri bp rhe ' h o \. dbilir ie \ d i m rn .' ,,n R e s a rd tc \\ o r rhe , ta.,i fi ,.;" n .,.,." . ;fi ,,,.;. ;i l r ar et s u | e so r rh e .r h n \c n \\ \re m q , ,,ur ncL, \\i , i t) be u.c.r tu, a ,-i i " , r,, .l i . m i n grhrredrh,e dppea,i nsi nd,;bteot .D eri l i . ^r s o. r n c re r\Ii ' ' ra \r\l ^ rp rp : u r' u \e d . t4 h i i h , d rrA U rrF\ot l l ,,.t. R cl eudn;" C ,u,de hr\e beel om ' lr ed rrn n h e p td n i n T ,rL l e 7 _ t : H oq mJnr i ,en)\ rel Jrpd ru dFfi n i on rri l l r equir e re c o mme n d e d a c ri o n ?(D o es thi s nj ake senset) ' fh e ra b l e o f s p e c i fi c a ri o n spro. r o be rc s re da n c l i r i n a i c a re s rh e r.;ri ! 
W hat f a c ro rss h o u l d rh e re s rp l a n n er co t es t c on te n r (ro ra l p o ' n ts o r ro ta l i rems) ln t he a b s e n c eo f i n s rru c ri o n a l o b j e crn areas can be gauged b) considering rh€ l . A n o u n t o l ,.n l n t tn n td ,n .d .\n drc.ttotmerl b) .i A t prot-,o\i r,on. . . D rob. abr \ \ hu u td h a v e rq i , e (h e w e ,g h ru r d,, a, pd , omp,,* a .l nrr i " ,i "r z An n tn t t4 \tttu tttu t ti 4 " d.hohtt.4 rupi . ro \vtl l ,h ,i x l"t",." .,i,e_i ,1.i ,, .1, " n, . ^t were devored probably should have three tines ihe weighr of a ,"pi: ,i.;,;; quir ed o n l y trv o c l a s ss e s s i o n s . R o l p a tu tu tp p ,4 4 tu t!t. .It rn ,[.r i egarded J\ e\\pnri al bi (k. gr ound,t.i o r a s u^b s e q u e n rrfl \rru .ri u n al u and descrv'ng more weighr, rhan an ar€ 4 Other Wortltnities to dalLate.\ agarn, as on a comprehensjle final ex equally i m p o rra n t a re a th a r w ' tl n o (be r ple, when a ropic is resred by essayon a be enhre l v o b j e (ri v e i rc m \. tL r p ra .' l ,j t rei \uI.. 5 N p p df" r v rrrrl \/o rp \ \a h en \(ores dre nrpded tor \uhropi .\, ,onrFnl . wit hin s u b ro p i ,. mu \r h e b e rg h ' e d ro en\ure (onrenr repre\enrari venes ut (hc ,o r c h (h r t(o re w i l t b e !Fponed. T h e p e rc e n ra g e .i n a ra b tc or \pe, ,fi , ari onqshoutd be rhnuehr ot as rhF per , enr o t te s r p o ' n ' s ro b e .r o ta te d rarher rhan resr i rem. ro be u" sed.l hrs i s especially importanr when more than one rype of item t, t" b.;;;;;;';h. tems For example, a shorr.answer irem )i nt, bur another shorr.answ eri tem rhar rion Inighr have a maximum score of 3 , respe.rivety, ofa two-irem rest in terms g x i .l € re s r c o n s rru c ri o ne ffeni very ano ro i ntorm prorpe(ri vF exami . .lo nees ade q l a re tr,.rh e _ re srp r e c i fi .a ri o ns needi o u. r" i ,ry a.," i i .a. i" quer t ion " H o w d e ra i te d ? w e mi g h r pose anorher quesri onrl ,h" , ;.; " " ." i .i" i,-ti ;;; .,. €x ac L[ . 
[...] a competent item writer, would they be likely to produce an acceptable test? Obviously, specifications should be detailed enough to indicate what kinds of items should be written on what general areas of learning, but they should not be so detailed as to give away the actual questions that will appear on the test.

ITEM FORMAT SELECTION

With content specifications in hand, the test developer's next decision relates to the types of items to be used. When instructional objectives form the content base, the verb used in each statement supplies a strict standard for the type of item to consider or reject. Words like describe, design, graph, develop, and explain require some form of production on the part of the examinee, activity that cannot be demonstrated by multiple-choice, true-false, or other objective-item types. Often the ideal measurement procedure must be compromised because of practical considerations, as when an objective, machine-scorable test is used instead of a writing sample to measure writing abilities. The trade-offs associated with essay, objective, and problem-type tests will be examined further to reveal the relative merits of each.

Comparison of Essay and Objective Formats

The following statements summarize some of the similarities and differences of essay and objective tests.

1. Either an essay or an objective test can be used to measure almost any important educational achievement that any paper-and-pencil test can measure.
2. Either an essay or an objective test can be used to encourage students to study for understanding of principles, organization and integration of ideas, and application of knowledge to the solution of problems.
3. The use of either type necessarily involves the exercise of subjective judgment.
4. The value of scores from either type of test is dependent on their objectivity and reliability.
5. An essay test question requires students to plan their own answers and to express them in their own words. An objective test item requires examinees to choose among several designated alternatives.
6. An essay test consists of relatively few, more general questions that call for rather extended answers. An objective test ordinarily consists of many rather specific questions requiring only brief answers.
7. Students spend most of their time in thinking and writing when taking an essay test. They spend most of their time reading and thinking when taking an objective test.
8. The quality of an objective test is determined largely by the skill of the test constructor. The quality of an essay test is determined largely by the skill of the test [...]
9. An essay examination is relatively easy to prepare but rather tedious and difficult to score accurately. A good objective examination is relatively tedious and difficult to prepare but comparatively easy to score.
10. An essay examination affords students much freedom to express their individuality in the answers they give and much freedom for the examiner to be guided by his or her individual preferences in scoring the answer. An objective examination affords much freedom for the test constructor to express personal knowledge and values but allows students only the freedom to show, by the proportion of correct answers they give, how much or how little they know or can do.
11. In objective-test items the student's task and the basis on which the examiner will judge the degree to which it has been accomplished are stated more clearly than they are in essay tests.
12. An objective test permits, and occasionally encourages, guessing. An essay test permits, and occasionally encourages, bluffing.
13. The distribution of numerical scores obtained from an essay test can be controlled to a considerable degree by the grader; that from an objective test is determined almost entirely by the examination itself.

In view of these similarities and differences, when might it be most appropriate to use essay items? Essay tests are favored for measuring educational achievement when:

1. The group to be tested is small, and the test will not be reused.
2. The instructor wishes to provide for the development of student skill in written expression.
3. The instructor is more interested in exploring student attitudes than in measuring achievements. (Whether instructors should be more interested in attitudes than achievement, and whether they should expect an honest expression of attitudes, are open to question.)
4. The instructor is more confident of his or her proficiency as a critical essay reader than as an imaginative writer of good objective test items.
5. Time available for test preparation is shorter than time available for test scoring.

Essay tests have important uses [...] also have some serious limitations. Teachers [...] claims that essay tests can measure higher [...] have not been defined. They also should [...] to determine how well students can [...]

Comparison of Objective Formats

The most commonly used kinds [...] true-false, matching, classification, and [...] been described in other treatments of [...] However, most of these special varieties [...] Their unique features often do more to [...] difficulty of using it than to improve the item as a measuring tool. [...] the multiple-choice and true-false test items are widely applicable to a great variety of tasks. Because of this and because of the importance of developing skill in using each one effectively, separate chapters are devoted to true-false and multiple-choice item formats later in this text.

The multiple-choice form of test item is relatively high in ability to discriminate between high- and low-achieving students. It is somewhat more difficult to write than some other item types, but its advantages seem so apparent that it has become the type most widely used in tests constructed by specialists. Theoretically, and this has been verified in practice, a given multiple-choice test can be expected to show as much score reliability as a typical true-false test with nearly twice that number of items. Here is an example of the multiple-choice type.

Directions: Write the number of the best answer to the question on the line at the right of the question.

Example: Which is the most appropriate designation for a government in which control is in the hands of a few people?
1. Autonomy
2. Bureaucracy
3. Feudalism
4. Oligarchy

The true-false item is simpler to prepare and is also quite widely adaptable. It tends to be somewhat less discriminating, item for item, than the multiple-choice type, and somewhat more subject to ambiguity and misinterpretation. Although theoretically a high proportion of true-false items could be answered correctly by blind guessing, in practice the error introduced into true-false test scores by blind guessing tends to be small (Ebel, 1968). This is true because well-motivated examinees taking a reasonable test do very little blind guessing. They almost always find it possible and more advantageous to give a rational answer than to guess blindly. The problem of guessing on true-false test questions will be discussed in greater detail in Chapter 8. Here is an example of the true-false type.

Directions: If the sentence is essentially true, encircle the letter "T" at the right of the sentence. If it is essentially false, encircle the letter "F."

Example: A substance that serves as a catalyst in a chemical reaction may be recovered unaltered at the end of the reaction.
T F

Those critics who urge test makers to abandon the "traditional" multiple-choice and true-false formats and to invent new formats to measure a [...] and more significant array of educational achievement are misinformed about two important points:

1. Any aspect of cognitive educational achievement can be tested by either the multiple-choice or the true-false format.
2. What a multiple-choice or true-false item measures is determined much more by its content than by its format.

The matching type is efficient in that an entire set of responses can be used with a cluster of related stimulus words. But this is also a limitation since it is sometimes difficult to formulate clusters of questions or stimulus words that are sufficiently similar to make use of the same set of responses. Furthermore, questions whose answers can be no more than a word or a phrase tend to be somewhat superficial and to place a premium on purely verbalistic learning. An example of the matching type is given here.

[Example largely lost; surviving entries: The Innocents Abroad; William Shakespeare; Robert Louis Stevenson.]

The classification type is less familiar than the matching type, but possibly more useful in certain situations. Like the matching type, it uses a single set of responses but applies these to a large number of stimulus situations. An example of the classification type is the following.

Directions: In the following items you are to express the effects of exercise on various body processes and substances. Assume that the organism undergoes no change except those due to exercise. For each item circle the appropriate number.
1. If the effect of exercise is to increase the quantity described in the item
2. If the effect of exercise is to decrease the quantity described in the item
3. If exercise should have no appreciable effect or an unpredictable effect on the quantity described in the item

27. [...]  1 2 3
28. [...]  1 2 3
29. Amount of glucose in the blood  1 2 3
30. Amount of residual air in the lungs  1 2 3

[...] have shown a very high correlation between scores on tests composed of parallel short-answer and multiple-choice items, when both members of each pair of parallel items are intended to test the same knowledge or ability (Eurich, 1931; Cook, 1955). This means that students who are best at producing correct answers tend also to be best at identifying them among several alternatives. Accurate measures of how well students can identify correct answers tend to be somewhat easier to get than accurate measures of their ability to produce them. There may be special situations, of course, where the correlation would be much lower.

The disadvantages of the short-answer form are that it is limited to questions that can be answered by a word, phrase, symbol, or number and that its scoring tends to be subjective and tedious. Item writers often find it difficult to phrase good questions about principles, explanations, applications, or predictions that can be answered by one specific word or phrase. Here are some examples of short-answer items.

Directions: On the blank following each of the following questions, partial statements, or words, write the word or number that seems most appropriate.

1. What is the valence of oxygen? ____
2. The middle section of the body of an insect is called the thorax.
3. What major river flows through or near each of these [...]?
   Cairo ____
   Quebec St. Lawrence

[...] that a variety of item types be used in each [...] tasks presented to the examinee. The [...] the scores or make the test more interesting [...] should choose the particular item type [...] wish to examine. There is more merit in [...] widely adaptable. A test constructor [...] item type, such as multiple choice, [...] when it becomes clearly more efficient [...] depends much more on giving proper [...] and on writing good items of whatever type than on the choice of this or that type of item.
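The correlation evidence cited above for parallel short-answer and multiple-choice items can be made concrete with a small computation. What follows is only a minimal sketch: the text does not say which coefficient the cited studies used, so the product-moment (Pearson) coefficient is assumed here, and the function name pearson_r and both score lists are hypothetical, invented solely for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Product-moment correlation between two equally long score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two score lists of equal length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squared deviations about the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("scores must vary within each list")
    return sxy / sqrt(sxx * syy)


# Hypothetical scores for eight students on two parallel forms (invented data,
# not taken from the studies cited in the text).
short_answer_scores = [12, 15, 9, 20, 17, 11, 14, 18]
multiple_choice_scores = [14, 17, 10, 22, 18, 13, 15, 21]

r = pearson_r(short_answer_scores, multiple_choice_scores)
print(f"correlation between parallel forms: {r:.2f}")
```

A coefficient near 1.00 for data of this kind would mean that the two forms rank students in nearly the same order, which is the sense in which producing and identifying correct answers are described as highly related.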
It6m Complerlty There conrinues ro be an inreresr by some resr developcrs roward rhe use oI ir em j rh d r p re s e n rc o mp te \ ra s ks.orren bJsedon r." e,tr .. a.i ri i ;d d." .J" t r ons o, e d l o r r o n rr i \c d \i ru a rro n 5 .S ume requi re rhe i ni eri rerari on or romD l cr ' dar a.di d s ra m ' . o ' b d , rs ro u n d i n ro ,mdri un. i i g" ,. r_.t ,r,o" * ," .;.;:;;l j ;;i . om ple x i rc n ,s p re s e n re db v Bl o o m rnd hi r , ol tedsue, r956r t" ," " ,.fi ;,.d;i ,,; { common to use rtems ofrhis nature on licensure and cerrili.adon $,rirren exam. ool is nor very large comprex rrems appear ro be artracove 6, ofknowledge, rhey provide an answer ions rest only recogn(ion ofisolaled fac_ arions and background marcrials used i ),bty require (he examinee ro use hiqher ,ffactive ro rhose who beliele rhar edi,., ng a srudenfs abilir) ro think rather rhan )wledge and rhinkrng were independenr Howeaer, rhese complex rasks hale soDe undesirable fearures as rest ACHIEVEMENTTEST PLANNNG .I27 (1) rhe iten beSins trith a desLriprion of a drDr l;:x :l"".j..j*i*:L.iTii11J;r:;:,t::rj:["lT:]iii*ll.t,ff::f,il ti; 1;;lt1i::".1i*,ti[g;rit;;i{ tflfr$di.ffi (7i*ozozl (:) An unusuai chco,icar reacrion is dcscribed. E Pp 196 9?) :J't}?n,il'ii::X :,:;:fit}}:: ;l;h:{:r,*;#'ttj:;:":,T"::l;1 :J)ExanLkes aresncna.han on$,rri(rr (heexpen",,,,",,,,:."Tf;::,,].t:":il jn;j;,.,1:-il;' ffi:,1 l;:Ti:;l!:F::i::':.$:.1-'"'i..,T*lT l;S;lt :. (lfi?ru,na,r pp I lr-t9) Figure7-2 DescrpLons orc.mprex rems Some ircm wrirers are drawn b , i28 ACH EVEMENTTEST PLANNING cartoon! poem! or passageol lest material, they are asked ro apply Lheir knowl. edge. kems that require interpretaoon of marerials ofien are rctirred ro as c ont ex r.d e p e n d c n it te m s . (T h e y h a ve no neani ng oursi .l crhe conrexrof rhe mare r ial ab o u t w h rc h rh e y a re w rn re n .)They are w i del y use.t i n rcsrsof generaleduca. 
t ional d e v e l o p m e n t,re s tsw h o s ep urposesarc to mcrsure (he abi i i ri esofshrl e,rrs wir h w i d c l y d i ffe r€ n t e d u .a ti o D a lb ackgrounds (\tosr suc.eedqui re w cl l i n doi ng s o. ) Ho w e v c r th e l a re Ie s sa p p ro p ri !te, con!eni enr, and effi ci enr i n rcsrrngfor ac hiev e m e n th l e a m i n g s p e c i fi c subj ecrmarter l en users shoul d be skcpri cal of c lai s rh a r c o rte x t.d e p e n d e n t i t cms measurearl ti dr rather dran knori l edg. bec au s eth e a b i l i t' e s th e y me a s u rearc al mosr w hol l r rhe resul tsofkno\l edge Ma n y o f rh e i n d i re c r re s i sof krosl edge, rhrough speci atappl i .atuo' s {)t t he k n o rl e d g e o r th e u s e o fc o m p l ex si ruarj ons,can be presenl ed i n rrue fal se. m ult ip l e .c h o i c e ,s h o rt.a D s l e r, o r matchi D g forn S ome are more conveni endr pr es en re d i n o p e n .e d e d l h s h i o n , such as requi ri ng rhe cxami nee ro produce a dr agr a m,s k e rc h ,o r s e t o f e d i to ri a l correcti ons The mai n poi nr ro be madc hcre is t hat, w h i l e a c h i e v e me n tc a n b e resredmosl conveni eD rl vB i rh one of rhe .o,)r. m on i te m l o rm J r\. ,h c re J re n ,rd \i .n\ \hcn ^rhe' mcan\ r!' d) be morc , oI\c nient, saosfacrory.or palatable b those who are char ged wirh providmg cviden.c for valid score use. NUMAEROF ITEMS T he nu mb e r o fq u e s ti o n s to i n c l u d e i n a (esti s determi ncd l argel y by 1l )earnount of time available for it Many testsare liDrited ro 50 minurcs, morc or less,becansc that is the scheduied length of rhe class pcriod Special examinarion schedules nay pronde periods of 2 hours or longer In general, the Ionger rhe period and the examination, the more reliable the scores obrarned lioD ir However. 
it is seldom practical or desirable to prepare a classroom test that will require more [...]

A reasonable goal is to make tests that include few enough questions so that most students have time to attempt all of them when working at their own normal rates. One reason for this is that speed of response is not a primary objective of instruction in most K-12 and college courses and hence is not a valid indication of achievement. In many areas of proficiency, speed and accuracy are not highly correlated. Consider the data in Table 7-3. The sum of the scores for the first ten students who finished the test was 965. The highest score in that group was 105. The lowest was 71. Thus, the range of scores in that group was 35 score units. Note that, though the range of scores varies somewhat from group to group, there is no clear tendency for students to do better or worse depending on the amount of time spent. One can conclude from these data that on this test there was almost no relation between time spent in taking the test and the number of correct answers given.

Table 7-3. Relation Between Rate of Work and Test Scores(a)
Order of finish (groups of ten students): 1-10, 11-20, 21-30, 31-40, 41-50, 51-60, 61-70, 71-80, 81-90, 91-100
Sum of scores (legible entries, in order): 965, 956, 943, 955, 965, 1010, 942, 968, [...]
Range of scores (legible entries, in order): 35, 32, 31, 32, 52, 25, 27, 30, [...]
(a) Based on a test [...] by 100 students. The mean score on the test was 96.1. [...]

A second reason for giving students ample time to work on a test is that examination anxiety, severe enough even in untimed tests, is accentuated when pressure to work rapidly as well as accurately is applied. A third is that efficient use of an instructor's painstakingly produced test requires that most students respond to all of it. In some situations, speeded tests may be appropriate and valuable, but these situations seem to be the exception, not the rule. Though there are no absolute standards for judging speededness,
measurement specialists have come to adopt this one: A test is speeded if fewer than 90 percent of the test takers are able to attempt all items.

The number of questions that an examinee can answer per minute depends on the kind of questions used, the complexity of the thought processes required to answer them, and the examinee's work habits. The fastest student in a class may finish a test in half the time required by the slowest. For these reasons, it is difficult to specify precisely how many items to include in a given test. Rules such as "Use one multiple-choice item per minute" or "Allow 30 seconds per true-false item" are misleading and unsubstantiated generalizations. Only experience with similar tests in similar classes can provide useful test-length information.

Finally, the number of items needed depends also on how thoroughly the domain must be sampled. And that, of course, depends on the type of score interpretation desired. For example, a test covering 10 instructional objectives may require a minimum of 30 items when objectives-referenced interpretations are wanted, but 20 items might suffice for norm-referenced purposes.

Content Sampling Errors

If the amount of time available for testing does not determine the length of a test, the accuracy desired in the scores should determine it. In general, the larger the number of items included in a test, the more reliable the scores will be. In statistical terminology, the items that make up a test constitute a sample from a much larger collection, or population, of items that might have been used in that test. A 100-word spelling test might be constructed by selecting every fifth word from a list of the 500 words studied during the term. The 500 words constitute the population from which the 100-word sample was selected.

Consider now a student who, asked to spell all 500 words, spells 325 (65 percent) of them correctly. Of the 100 words in the sample, he spells 69 (69 percent) correctly. The difference between the 65 percent for the population and the 69 percent for the sample is known as a sampling error.

In the case of the spelling test, the population of possible questions is real and definite. For most tests it is not. That is, there is almost no limit to the number of problems that could be invented for use in an algebra test or to the number of questions that could be formulated for a history test. Constructors of tests in these subjects, as in most other subjects, have no predetermined limited list from which to draw representative samples of questions. But their tests are samples, nevertheless, because they include only a fraction of the questions that could be asked in each case. A major problem of test constructors is thus to make their samples fairly represent the total population of questions on the topic.

The larger the population of potential questions, the more likely it is that the content domain is heterogeneous; that is, it includes diverse and [...] they happen to know is a much [...] 10-question test than from one of [...] practically all educational test scores. [...] errors are not caused by mistakes in sampling. A perfectly chosen random sample will still be subject to sampling errors, simply because it is a sample.

LEVEL AND DISTRIBUTION OF DIFFICULTY

There are two ways in which the problem of test difficulty can be approached.
One is to include in the test only those items that any student who has studied successfully should be able to answer. If this is done, most of the students can be expected to answer the majority of the items correctly. Put somewhat differently, [...] are likely to be given that many of the items will not be effective in discriminating among average, weak, and poor [...]. The score [...] homogeneous, as reflected by a small [...] to make norm-referenced score [...] scores of disappointingly low reliability.

The other approach, for norm-referenced testing, is to choose items of appropriate content on the basis of their ability to reveal different levels of achievement among the students [...] fairly difficult questions. The ideal difficulty [...] difficulty scale (percent correct) midway [...] response) and the chance level difficulty [...] 25 percent correct for four-alternative [...] proportion of correct responses, the item p-value, should be about 75 percent correct for an ideal true-false item and about 62.5 percent correct for an ideal multiple-choice item. (The term p-value is used to refer to the difficulty of an item.) This second approach generally will yield more reliable scores than the first for a constant amount of testing time.

As we will see in the upcoming chapters on item writing, there are several methods item writers can use to manipulate the difficulty level of a test item prepared for a specific group. For norm-referenced testing, such manipulations may be employed to create items of the desired difficulty level. Though it is possible to use the same methods to control the difficulty of items written for a criterion-referenced test, such manipulations would be inappropriate. For criterion-referenced measurement, the difficulty is built into the tasks or the knowledge descriptions that specify the content domain. When item writers manipulate item content to adjust perceived difficulty, they are in effect creating a mismatch between item content and the domain definition. These mismatches impact content relevance by underrepresenting legitimate content and by introducing irrelevant (or less relevant) content. In turn, part of the reason for not specifying the norm-referenced content domain too precisely is that it gives license to the item writer to create items of the most appropriate difficulty.

Some instructors believe that a good test should include some difficult items to test the better students and some easy items to give poorer students a chance. But neither of these kinds of items tends to affect the rank ordering of student scores appreciably. The higher-scoring students generally would answer the harder items and, therefore, earn higher scores yet. Nearly everyone would answer the easy items. The effect of easy items is to add a constant amount to each examinee's score, to raise all scores, but without affecting the rank order of students' scores. For good norm-referenced achievement measures, items of moderate difficulty, not too hard and not too easy, contribute most to discriminating between students who have learned varying amounts of the content of [...]

Tests designed to yield criterion-referenced score interpretations likely will be easier in difficulty level than their norm-referenced counterparts.
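The ideal difficulty levels cited earlier in this section (about 75 percent correct for a true-false item and about 62.5 percent for a four-alternative multiple-choice item) follow from taking the midpoint between a perfect score and the chance level. The short sketch below only illustrates that arithmetic; the function name `ideal_p_value` is hypothetical, and the chance level is assumed to be 100 divided by the number of alternatives, which holds only if blind guesses are spread evenly over the alternatives.

```python
def ideal_p_value(num_alternatives: int) -> float:
    """Return the midpoint, in percent correct, between a perfect
    score (100 percent) and the chance-level score for an item with
    the given number of alternatives.

    Assumption: chance level = 100 / num_alternatives.
    """
    if num_alternatives < 2:
        raise ValueError("an item needs at least two alternatives")
    chance_level = 100.0 / num_alternatives
    return (100.0 + chance_level) / 2.0


if __name__ == "__main__":
    for label, k in [("true-false", 2),
                     ("four-alternative multiple-choice", 4)]:
        print(f"{label}: about {ideal_p_value(k):.1f} percent correct")
```

The same midpoint rule can be applied to other formats (for example, three alternatives), though the text itself reports values only for the two formats shown.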
When testing for minimum competency or for mastery, the expectation is that most students have reached the minimum level or have achieved mastery. The items in these tests should be easy for most students but should be difficult for those who have not mastered the content the items represent. It should be clear that a test item in isolation is not easy or difficult. The difficulty of an item relates to the nature of the group and depends on the extent to which those in the group possess the ability presented by the task.

SUMMARY PROPOSITIONS

1. The most important function of classroom tests is to obtain precise measures of students' achievement.
2. The form of a test (essay, objective, problem) gives no certain indication of the ability being [...]
3. Whatever form of test is used, examiners should attempt to make their measurements as objective [...]
4. When a performance test and an objective test can be used to achieve essentially the same purpose, the objective test likely will be more efficient, be more relevant, and yield more reliable [...]
5. The precision with which the content specifications for a test should be described relates to the type of score interpretation desired.
6. A table of specifications is a planning guide for [...] of content and mental processes.
7. The relative importance of a content subdomain in a test depends also on such factors as the amount of content it contains and the amount of instructional time devoted to it.
8. The examiner can control the distribution of test scores more easily with essay than with objective [...]
9. It is unlikely that students study more effectively in preparation for an essay test than for an objective [...]
10. Essay tests can be efficient when the group to be [...]
11. Multiple-choice and true-false items can be used to measure any aspect of cognitive educational [...]
12. Situational or interpretive test items tend to be inefficient, difficult to write, difficult to key objectively, and unconvincing as measures of higher [...]
13. Most classroom achievement tests should be short enough, in relation to the time available, that virtually all students have time to attempt [...]

QUESTIONS FOR STUDY AND DISCUSSION

1. [...]
2. What reasonable step might a science or math [...]
3. Why are performance tests often less efficient than objective tests when both are designed to serve the same purpose?
[...] definition for a norm-referenced test and a criterion-[...]
[...] appropriately exclude items [...]
9. How do content sampling errors cause score reliability to be lowered?
10. [...] might it be appropriate to use items that are highly difficult for [...]
11. [...] the statement, "Criterion-referenced tests [...]"

True-False Test Items

From one point of view, true-false tests seem like a breeze, easier than they ought to be. From another, as many students would testify, they seem unnecessarily difficult, irrelevant, and frustrating. Some would say there are better ways of measuring achievement than by using true-false items. Yet this lack of endorsement is not universally shared among educators. A few, including the authors of this book, regard true-false items much more favorably (Ebel, 1975; Frisbie, 1973).

MERITS OF THE TRUE-FALSE FORMAT

The basic reason for using true-false test items is that they provide a simple and direct means of measuring the essential outcomes of formal education. The argument for the value of true-false items as measures of educational achievement can be summarized in four statements:

1. The essence of educational achievement is the command of useful verbal knowledge.
2. All verbal knowledge can be expressed in propositions.
3. A proposition is any sentence that can be said to be true or false.
4. The extent of students' command of a particular area of knowledge is indicated by their success in judging the truth or falsity of propositions related to it.

The rationale supporting the first statement was provided in Chapter 3. The second is almost self-evident. Is it possible to imagine an element of verbal knowledge that could not be expressed as a proposition? The third is a generally accepted definition. The fourth seems to be a logical consequence of the first three. It may, of course, be challenged on the basis of technical weaknesses in true-false items, but it is not likely to be rejected in principle.

To test a person's command of an idea or element of knowledge is to test his or her understanding of it. A student who can recognize an idea only when it is expressed in some particular set of words does not have command of it. Neither does the student who knows the idea only as an isolated fact, without seeing how it is related to other ideas. Knowledge one has command of is not a miscellaneous collection of separate elements, but an integrated structure that can be used to make decisions, draw logical inferences, or solve problems. It is usable knowledge.

Consider how one might test a student's understanding of Archimedes' principle. Clearly, to offer the
student the usual expression of the principle as a true statement, or some slight alteration of it as a false statement, as has been done in items 1 and 2, is to misunderstand the true nature of knowledge.

(1) A body immersed in a fluid is buoyed up by a force equal to the weight of the fluid displaced. (T)
(2) A body immersed in a fluid is buoyed up by a force equal to half the weight of the fluid displaced. (F)

Instead the student might be asked to recognize the principle in some alternative statement of it, as in items 3 and 4 below.

(3) If an object having a certain volume is surrounded by a liquid or gas, the upward force on it equals the weight of that volume of the liquid or gas. (T)
(4) The upward force on an object surrounded by a liquid or gas is equal to the surface area of the object multiplied by the pressure of the liquid or gas surrounding it. (F)

Or the student might be required to apply the principle in specific situations, such as those described in items 5 and 6 below.

(5) The buoyant force on a one-centimeter cube of aluminum is exactly the same as that on a one-centimeter cube of iron when both are immersed in water. (T)
(6) If an insoluble object is immersed successively in several fluids of different density, the buoyant force upon it in each case will vary inversely with the density of the fluids. (F)

Sometimes the use of an unconventional example can serve to test understanding.

(7) Distilled water is soft water. (T)

It is a popular misconception that true-false test items are limited to testing for simple factual recall. On the contrary, complex and difficult problems can be presented quite effectively in this form.

(8) The next term in the series 3, 4, 7, 11, 18 is 29. (T)
(9) If the sides of a trapezoid are consecutive whole numbers, and if the shortest side is one of the two parallel sides, then the area of the trapezoid is 10 square units.
(F)

The reason why true-false tests are often held in low esteem is not that there is anything inherently wrong with the item form. It is rather that the format is often used by unskilled item writers. It has also been alleged that true-false tests are especially susceptible to guessing and that they have harmful effects on student learning, beliefs that have not been checked against experimental data. These alleged weaknesses of true-false items will be dealt with more fully later in the chapter.

Efficiency of True-False Items

In addition to providing relevant measures of the essence of educational achievement, true-false items have the advantage of being quite efficient. The number of independently scorable responses per thousand words of test or per [...] of testing time tends to be considerably higher than that for multiple-choice items. Research evidence has shown that students can attempt three true-false items in the time required to attempt a pair of multiple-choice items (Frisbie, 1973). Offsetting this advantage in efficiency is a disadvantage in item discriminating power. Item for item, true-false tends to discriminate less well between high- and low-achieving students than multiple-choice (Ebel, 1980). In sum, a good one-hour true-false test is likely to be as effective as a good one-hour multiple-choice test.

Compared with other item formats, true-false test items are relatively easy to write. They are simple declarative sentences of the kind that make up most oral and written communications. It is true that the ideas they affirm or deny must be chosen judiciously. It is also true that the ideas chosen must be worded carefully, with a view to maximum precision and clarity, since they stand and must be judged in isolation.
For this reason they must be self-contained in meaning, depending wholly on internal content, not on external context. But the basic skill involved in true-false item writing is no different from that required [...] communication situation. Those who have difficulty in writing good true-false test items probably have trouble expressing themselves clearly and accurately in other forms of writing.

Comparisons with Multiple-Choice Items

An obvious difference between true-false and multiple-choice items is in the number of alternatives generally offered the examinee. Another difference [...] is in the definiteness or specificity of the task. It may be more difficult to judge whether a statement should be [...] true or false than to judge which of several alternatives is the best answer to a particular question. For example, students who mark a statement true may not be able to think of a counterexample, a situation that would make the proposition false. Their search for a counterexample may be bounded by time limits or by the length to which they can stretch their mind or the depth of their retrieval system to which [...] penetrate. The multiple-choice item, however, limits the universe of [...] that the individual [...] these differences, there are substantial similarities [...] items are based on propositions. [...]

Which of the following sentences is stated most emphatically?
a. If my understanding of the question is correct, this principle is one we cannot afford to accept.
b. One principle we cannot afford to accept is this one, if my understanding of the question is correct.
c. This principle, if my understanding of the question is correct, is one we cannot afford to accept.
d. This principle is one we cannot afford to accept, if my understanding of the question is correct.

[...] of Figure 8-1 involve some degree of [...] from propositions. Consequently, they [...] format and are better posed initially as multiple-choice [...] basis of item 4 is more apparent than it [...]

(1) Changing the temperature of a mass of [...]
(2) More amendments were [...] ratification than during the next one hundred years. (T)
(3) An eclipse of the sun can occur only when the moon is full. (F)
(4) Increasing the length of a test is likely to decrease its standard error of measurement. (F)

Figure 8-1. Multiple-Choice and Corresponding True-False Items

A. Equally Useful Formats

Multiple-choice version:
(1) The equation X² + Y² = 4 is represented [...]
(2) What is the main function of a corrective lens?
   *a. Change the image that falls on the retina.
    b. Change the amount of light reaching [...]
    c. Remove the blind spot on the retina.

True-false version:
(1a) [...] + Y² = 4 is a circle. (T)
(1b) [...] + Y² = [...] is an ellipse. (F)
(2a) The main function of a corrective lens is to change the image that falls on the retina. (T)
(2b) The main function of a corrective lens is to change the amount of light reaching the retina. (F)

B. Items Better Suited to True-False Format

Multiple-choice version:
(3) Which of these is not characteristic of a [...]
    a. It can live only in plant and animal [...]
   *b. It is composed of very large living [...]
    c. It can reproduce itself.

True-false version:
(3a) A virus can live only in plant and animal [...] (T)
(3b) A virus is composed of very large living [...] (F)

C. Items Better Suited to Multiple-Choice Format

Multiple-choice version:
(4) Which of these best describes a good [...]
    a. Someone who pays taxes
    b. Someone who has a job
   *c. Someone who obeys the laws

True-false version:
(4a) A good citizen can be described better as a law abider than as a taxpayer. (T)
(4b) Having a job is more characteristic of good citizenship than is obeying laws. (F)

[...] false items than from multiple-choice items formed by grouping one true statement with three that are false or by grouping one that is false with three that are [...]

COMMON MISCONCEPTIONS ABOUT TRUE-FALSE ITEMS

There seems to be a critical unacceptance of the true-false format among educational researchers, writers, and testing personnel.
The unfavorable attitudes about true-false items seem to be perpetuated by disappointing experiences of test takers, frustrating results of test makers, and hearsay evidence. There have been few careful empirical studies of the charges most often brought against them. An analysis of some of the most frequently heard indictments follows.

The Impact of Guessing

[...] against true-false tests that many take quite seriously is that they are subject to gross error introduced by guessing. Several things can be said in response to this charge. The first is that a distinction [...] and [...] guessing. Blind guessing [...] on the other hand, [...] The more a student knows, the more [...] The second is that well-motivated [...] difficulty with a generous time limit, [...] on true-false tests. They know that [...] determining the correct answer [...] response that can be made [...] of 0.85 to 0.95. These values are about [...] classroom test, regardless of the form of test [...] that good true-false tests need not be [...]

Some testing specialists believe [...] dealt with by correcting the scores for [...] were to be extensive enough to affect [...] that a guessing correction could [...]

The Ambiguity Charge

[...] Students who say, "If I interpret the statement this way, I'd say it is true. But if I interpret it that way, I'd have to say it is false," are complaining about apparent ambiguity. If experts in the field have the same difficulty in interpreting a particular statement, the trouble may be intrinsic ambiguity.

Apparent ambiguity may sometimes be due to inadequacies in the students' knowledge. They have trouble interpreting a statement because the words mean something a little different to
them than to the expert, or because the statement fails to evoke the necessary associations that would yield the intended interpretation.

Hence apparent ambiguity is not only unavoidable, it may even be useful. By making the task of responding harder for the poorly prepared than for the well-prepared student, it can help to discriminate between the two. Thus a student's comment that a test question is unclear is not necessarily an indictment of the question. It may be, rather, an unintentional confession of his or her own [...]

Intrinsic ambiguity, on the other hand, the kind of ambiguity that troubles the expert as much as or more than it troubles the novice, is a real concern. It probably can never be totally eliminated, since language is inherently somewhat abstract, general, and imprecise. But in the statements used in true-false test items it should be minimized.

Of course, there is sometimes truth in the charge that true-false test items are ambiguous and lack significance: one reason is that teachers sometimes try to excerpt textbook sentences for use as test items. Even in a well-written text, few of the sentences would actually make good true-false test items. Many statements serve only to keep readers informed of what the author is trying to do or to remind them of the structure and organization of the discussion. Some are so dependent for their meaning on sentences that precede or follow them that they are almost meaningless out of context. Others are intended only to suggest an idea, not to state it positively and precisely. Still others comprise a whole logical argument, involving two or three propositions, in a single sentence. Another category of statements is intended not to describe what is true, but to prescribe what ought to be true.
Finally, some are expressed so loosely and so tentatively that there is hardly any possible basis for doubting them. In all the writing we do to preserve the knowledge we have gained and to communicate it to others, there seem to be very few naturally occurring nuggets of established [...]

For this reason it is seldom possible to find in a text or reference work a sentence that can be copied directly for use as a true statement or transformed by a simple negation for use as a false statement. The writing of good true-false items is more a task of creative writing than of copying. This may be a fortunate circumstance, for it helps test constructors avoid the hazard of writing items that would encourage and reward rote learning.

A special source of ambiguity in true-false test items needs to be guarded against. It is uncertainty on the part of the examinee as to the examiner's standards of truth. If the statement is not perfectly true, if it has the slightest flaw, should it be considered false? Probably not; the item writer's task will be easier, and the test will be better, if the examinee is directed to consider as true any statement that has more truth than error in it, or any statement that is more true [...] be the test builder's task then [...]

Reward for Memorization

[...]

The author of Don Quixote was Cervantes. (T)
The chemical formula for water is H2O. (T)
The Battle of Hastings was [...] 1066.
Christopher Columbus [...]
[...] planets in the solar system.

[...]

One can test knowledge of a functional relationship:

The more widely the items in a test vary in difficulty, the narrower the range of test [...]
[...] the temperature of the [...]

One can test the ability to apply principles:

[...] door open, the temperature [...]
The time from moonrise to moonset is usually longer than the time from sunrise to sunset.

Effects on Learning Outcomes

Critics of true-false tests sometimes charge that their use has harmful effects on learning: that they (1) encourage students to concentrate on remembering isolated factual details and to rely heavily on rote learning; (2) encourage students to accept grossly oversimplified conceptions of truth; and (3) expose students undesirably to error. How defensible are these charges?

True-false items need not emphasize memory for isolated factual details. Good ones present novel problems to be solved and thus emphasize understanding and application. Even those that might require recall of factual details do not necessarily reward rote learning, for facts are hard to remember in isolation. They are retained and can be recalled better if they are part of a structure of knowledge.

There is reason to believe that rote learning is something of an educational bogeyman, often warned against and indicted as the cause of educational failure, but seldom practiced or observed. Rote learning is not much fun, and it promises few lasting rewards. Most students and teachers properly shun it. Perhaps its supposed prevalence results from an error in inference. It is surely true that rote learning always results in incomplete learning (that is, lack of understanding), but it does not follow that all incomplete learning is the result of too much rote learning. It may simply be the result of too little learning of any sort.

What of the second charge, that the categorical way in which answers are both offered and scored is likely to give students false notions about the [...] of truth? Evidence in support of this argument is seldom presented, and the argument itself is seldom advanced by those who have used true-false tests extensively. Test writers know students will challenge answers that disagree with their own. Often they will point to the complexity of the entire subject and will insist that a case can be made for the alternative answer. Usually the author concedes that the statement in question is neither perfectly true nor totally false. The discussion that normally follows tends to emphasize, rather than to conceal, the complexity, the impurity, and the relativity of truth. On occasion it leads to the conclusion that the item in question was simply a bad item, poorly conceived or [...]

Now consider the third charge: that true-false test items are educationally harmful because they expose the student to error. The argument is that the presentation of false statements as if they were true may have a negative suggestion effect, causing students to believe and remember untruths. However, Ruch (1929) tentatively concluded that the negative suggestion effect in true-false tests is probably much smaller than is sometimes assumed and is fully offset by the net positive teaching effects.
Other experimental studies confirm this conclusion, and as Ross (1947) pointed out:

Whether or not a false statement is dangerous depends largely upon the setting in which it appears. A false statement in the textbook, toward which the characteristic pupil attitude is likely to be one of passive, uncritical acceptance, might easily be serious. But the situation is different with the items in a true-false test. Here the habitual attitude of the modern pupil is one of active, critical challenge. (p. [illegible])

In light of these findings, we conclude that well-conceived and well-developed true-false test items can contribute substantially to the measurement of educational achievement. The harm some fear they might do is trivial in comparison.

WRITING EFFECTIVE TRUE-FALSE ITEMS

The instructor who wishes to write a true-false item for a classroom test should begin by focusing attention on some segment of the knowledge that has been taught. It is assumed that the item writer is in firm command of that segment of knowledge, and that it is something any capable student of the subject ought also to understand. This segment of knowledge is, or easily could be, described in a single paragraph such as those found in any good textbook adopted for the class. Accordingly, item writers usually find it easier and more effective to use instructional materials as the source of ideas for test items than to derive those ideas directly from educational objectives.

Suppose now that an item writer singles out a specific paragraph of text intended to help the student develop some segment of knowledge. Take, for example, this paragraph:

Avoid giving examinees a choice among optional questions unless special circumstances make such options necessary. If different examinees answer different questions, the basis for comparing their
scores, their levels of performance, is eroded. Students who have answered different sets of questions actually have taken different tests. These different tests are not likely to be measures of the same achievement. And certainly when students choose the questions on which they can perform best, the scores will tend to understate the differences in achievement among examinees. That is, a narrow range of fairly high scores will probably result. Dependable norm-referenced score interpretations will be difficult to make because the small score differences would more likely be due to measurement error than to differences in achievement.

Optional questions are sometimes justified on the ground that giving students a choice among the questions they are to answer makes the test "fairer." But if all the questions involve essential aspects of achievement in a course (as they ordinarily might), it is not unfair to any student to require answers to all of them. Furthermore, an opportunity to choose among optional questions may help the poorer student considerably but may actually distract the well-prepared student.

The first question the item writer must pose is, "What are the most important ideas presented in this paragraph?" Here are three of several propositions that can be identified:

1. The use of optional essay items interferes with interindividual score comparisons.
2. The use of optional essay items usually contributes to reduced test score variability.
3. The use of optional items usually results in reduced test score reliability.

The next question is how these ideas can be expressed as true-false test items. At this point, a very important suggestion can be offered: Always think of the possible true-false test items in pairs, one true and the other false. Of course, only one member of the pair is actually used in the test. However, unless a parallel but opposite statement
can be made, the proposition is not likely to make a good true-false test item. Here are some item pairs derived from the ideas presented above:

1a. The use of optional rather than required essay items reduces the ability to make norm-referenced interpretations. (T)
1b. The use of optional rather than required essay items enhances the ability to make criterion-referenced interpretations. (F)
2a. The scores resulting from the use of optional rather than required essay items will exhibit reduced variability. (T)
2b. The scores resulting from the use of optional rather than required essay items will exhibit increased variability. (F)
3a. The reliability of scores from optional essay items is likely to be smaller than for scores based on required essay items. (T)
3b. The reliability of scores from optional essay items is likely to be larger than for scores based on required essay items. (F)
4a. The reliability advantage of using required versus optional essay items is due to differences in score variability. (T)
4b. The reliability advantage of using required versus optional essay items is due to differences in test lengths. (F)

Note that many variations can be developed from the propositions listed and that none of the items or the propositions is a reproduction of one of the original sentences. All items are designed to test for understanding, not simply for recall of sentences read or heard.

Guidelines for Item Development

There are five general requirements for a good true-false test item.

1. It should test the examinee's
knowledge of an important proposition, one that is likely to be significant and useful in coping with a variety of situations and problems. It should say something worth saying.
2. It should require understanding as well as memory. Simple recall of meaningless words, empty phrases, or sentences learned by rote should not be enough to permit a correct answer.
3. The intended correct answer (true or false) should be easy for the item writer to defend to the satisfaction of competent critics. The true statements should be true enough and the false statements false enough so that an expert would have no difficulty distinguishing between them. Any explanation or qualification needed to justify an unconditional answer should be included in the item.
4. On the other hand, the intended correct answer should be obvious only to those who have good command of the knowledge being tested. It should not be a matter of common knowledge. It should not be given away by an unintended clue. The wrong answer should be made attractive to those who lack the desired command.
5. The item should be expressed as simply, as concisely, and above all as clearly as is consistent with the preceding four requirements. It should be based on a single proposition. Common words should be given preference over technical terms. Sentences should be short and simple in structure. Essentially true statements should not be made false by simply inserting the word "not."

Here are some pairs of true-false test items that illustrate these requirements. The first of each pair is an acceptable item, while the second is poor.

1. The item tests an important idea.

(1) President Kennedy attempted to solve the missile crisis by threatening a blockade of Cuba. (T)
(2) President Kennedy was 12 years older than his wife. (T)

The difference in the ages between President Kennedy and his wife might be a subject for comment in a casual conversation, but it has little to do with the important events of the time.
The Cuban missile crisis, on the other hand, brought the United States and Russia to the brink of war. How this crisis was handled is a far more important element in world history than a difference in ages between a president and his wife.

(3) Words like "some," "usually," "all," or "never" should be avoided in writing true-false test items. (F)
(4) Two pitfalls should be avoided in writing true-false test items. (F)

Item 4 is the type of textbook sentence that sets the stage for an important pronouncement but fails to make it. Item 3, on the other hand, tests the examinee's understanding of several important principles. Specific determiners like "some" and "usually" provide irrelevant clues when used only in true statements. If used in false statements, they tend to attract wrong answers from the ill-prepared student. Conversely, specific determiners like "all" or "never" are useful in attracting wrong answers from the uninformed when used in true statements.

(5) More salt can be dissolved in a pint of warm water than in a pint of cold water. (T)
(6) Some things dissolve in other things. (T)

A statement like that in item 6 is too general to say anything useful. Item 5, on the other hand, provides a test of the understanding of an important relationship.

2. The item tests understanding. It does not reward recall of a stereotyped phraseology.

(7) When a hand pushes a door with a certain force, the door pushes back on the hand with the same force. (T)
(8) For every action there is an equal and opposite reaction. (T)
(9) If the hypotenuse of an isosceles right triangle is seven inches long, each of the two equal legs must be more than five inches long. (F)
(10) The square of the hypotenuse of a right triangle equals the sum of the squares of the other two sides. (T)

Both items 8 and 10 are word-for-word statements of important principles that could be learned by rote. To test a student's understanding, it is desirable to present specific applications that avoid the stereotyped phrases (as has
been done in items 7 and 9).

3. The correct answer to an item is defensible.

(11) Moist air is less dense than dry air. (T)
(12) Rain clouds are light in weight. (T)

Since a rain cloud seems to float in the air, it might reasonably be called light in weight. On the other hand, a single rain cloud may weigh more than 100,000 tons. One cubic foot of the cloud probably weighs about the same as a cubic foot of air. Since the cloud contains droplets of water, it may actually weigh more per cubic foot than cloudless dry air. On the other hand, moist air alone (item 11) weighs less per cubic foot than does dry air. Should the item specify other things being equal, for example, pressure and temperature? It might, but in the absence of mention, a reasonable person is justified in assuming that temperature and pressure should not be taken to be variables in this item.

(13) The proposal that salary schedules for teachers ought to include skill in teaching as one of the determining variables is supported more strongly by teachers' organizations than it is by taxpayers. (F)
(14) Merit is an important factor affecting a teacher's salary. (F)

The first version is much more specific and much more clearly false than the second. Experts could agree on the answer to the first, but would be troubled by the ambiguity of the second. Across the country it is no doubt true that the salaries of good teachers are higher than those of poor teachers. However, it is also true that the salary schedules of many school systems do not include merit as one of the determining factors.

(15) The twinkling of starlight is due to motion in the earth's atmosphere. (T)
(16) Stars send out light that twinkles. (T)

The answer to the second, unacceptable version of this item could be challenged by a reasonable, well-informed person on the following grounds. It is not the light sent out by the star that twinkles. That light is relatively steady. But because of disturbances in our atmosphere, the light that reaches our eyes from the star often appears to twinkle. That the second version is unacceptable is due either to the limited knowledge or to the carelessness in expression of the person who wrote it.

4. The answer to a good test item is not obvious to anyone. It tests special knowledge.

(17) Frozen foods are usually cheaper than canned foods. (F)
(18) Frozen foods of the highest quality may be ruined in the kitchen. (T)
(19) Most local insurance agencies are owned and controlled by one of the major national insurance companies. (F)
(20) Insurance agencies may be either general or specialized. (T)

Who could doubt the possibility of cooking any kind of food badly? How plausible is the belief that only general or only specialized insurance agencies could be found? The unacceptable versions, items 18 and 20, are too obviously true to discriminate high achievement from low. Both read like introductory sentences lifted from a textbook, sentences that set the stage for an important idea but do not themselves express important ideas.

5. To one who lacks the knowledge being tested, a wrong answer should appear more plausible than the correct one.

(21) By adding more solute, a saturated solution can be made supersaturated.
(F)
(22) A supersaturated solution contains more solute per unit than a saturated solution. (T)

It appears reasonable to believe that adding more solute would turn a saturated solution into a supersaturated solution (item 21). But those who understand solutions know that it doesn't work that way. The added solute won't dissolve in a saturated solution. Only by evaporating some of the solvent, or cooling it, can a saturated solution be made supersaturated. The student who tries to use common sense as a substitute for special knowledge is likely to give a wrong answer (which is all his knowledge entitles him to) to the first item. But the same common sense leads the student of low achievement to answer item 22 correctly. Thus the second version fails to function properly as a test of the student's command of knowledge.

6. The item is expressed clearly. It is based on a single idea.

(23) The salt dissolved in water can be recovered by evaporation of the solvent. (T)
(24) Salt can be dissolved in water and can be recovered by evaporation of the solvent. (T)*
(25) At conception the sex ratio is approximately 3 boys to 2 girls. (T)
(26) Scientists have found that male-producing sperm are stronger and live longer than female-producing sperm, which accounts for the sex ratio at conception of approximately 3 boys to 2 girls. (T)

*Another unacceptable item, one that inappropriately combines two ideas, might be: Salt dissolves in hot water; sugar dissolves in cold water. (T)

An item based on a single idea is usually easier to understand than one based on two or more ideas. It is also more efficient. One can obtain a more accurate picture of a student's achievement by testing separate ideas separately than by lumping them together and scoring a single composite answer.

(27) Individuals who deliberate before making choices seldom find themselves forced to sacrifice one good thing in order to attain another.
(F)
(28) Life is a continuous process of choice making, sacrificing one human value for another, which goes through the following steps: spontaneous mental selections regarding everything we want, conflicting preferences hold each other in check, hesitation becomes deliberation as we weigh and compare values, finally choice or preference [remainder illegible]

7. The item is worded concisely.

(29) The federal government pays practically the entire cost of constructing and maintaining highways that are part of the interstate highway system. (F)
(30) When you see a highway with a marker that reads "Interstate 80," you know that the construction and upkeep of that road are built and maintained by the state and federal government. (T)

The wording of item 30 is careless and redundant. It is the road that is built and maintained, not its construction and upkeep. The personal touch ("When you see") lends the item an appearance of practicality but little more. Finally, making item 30 true by including state as well as federal governments as supporters of the interstate highway system probably makes the item easier for the uninformed. Item 29 hits the intended mark more clearly because it is more straightforward and concise.

8. The item does not include an artificial, tricky negative.

(31) Columbus made only four voyages of exploration to the Western Hemisphere. (T)
(32) Columbus did not make four voyages of exploration to the Western Hemisphere. (F)

Some item writers try to turn textbook propositions into false statements simply by inserting the word "not" in the original statement. The result is seldom good. The item usually carries the clear birthmark of its unnatural origin: it reads awkwardly and invites suspicion, which, if the item is indeed false, may give away the answer. Furthermore, these items tend to be tricky. An unobtrusive "not" in an otherwise wholly true
statement may be overlooked by even a well-prepared examinee. Such items put students at an unnecessary disadvantage. Negatively worded statements have been shown to be more difficult and to create more confusion and hostility in examinees than positively worded true-false statements (Barker and Ebel, 1981).

Reducing Ambiguity

Why are multiple-choice items seldom criticized for being ambiguous, but true-false items seem to be faulted with regularity? The answer, we think, is in the item formats themselves. With multiple-choice items, the response that appears to be most correct, relative to the others, can be defended. There is a relative comparison inherent in the item. With true-false items, however, the statement usually is absolute and examinees search their knowledge structures for alternatives. A true-false item that declares, "Mt. Palomar is a tall mountain," leads the examinee to a search for high peaks so that the tallness of Mt. Palomar can be judged relative to the heights of other mountains. A corresponding multiple-choice item may ask which is tallest and provide these choices: Mt. Whitney, Mt. St. Helens, Mt. Palomar, and Grand Teton. Though Mt. Palomar is the lowest of the four peaks, by itself it certainly would fit most conceptions of "tall." This simple illustration of the format difference suggests a solution to the ambiguity problem in some true-false items.

Most true-false items can be made essentially ambiguity free by introducing a comparison within the item. Here are some sample pairs of poor and improved items:

1a. Open-book tests tend to be inefficient. (?)
1b. Open-book tests tend to be less efficient than closed-book tests. (T)
2a. Take-home exams usually are high in quality. (?)
2b. Take-home exams usually are higher in quality than in-class exams.
(F)
3a. The use of closed-book, in-class tests contributes to high retention by students. (?)
3b. The use of closed-book, in-class tests contributes more to high retention by students than does the use of take-home tests. (T)
3c. The use of closed-book, in-class tests contributes more to high retention by students than to reduction of test anxiety. (T)

In each item an internal comparison is introduced so that "inefficient," "high in quality," and "high retention" can be judged by a common standard.

In other cases, statements may be ambiguous because of the imprecise wording chosen by the item writer. Instead of saying "a long test," for example, say "a 100-item test." Instead of referring to "an easy item," describe it as "an item that at least 85 percent of examinees answer correctly." There is no choice between brevity and precision when writing true-false items. Though we seek both, brevity and imprecision lead to worse consequences than does verbosity with clarity.

Enhancing Item Discrimination

The job of a test item is to discriminate between those who have and those who lack command of some element of knowledge. Those who have achieved command should be able to answer the question correctly without difficulty. Those who lack it should find a wrong answer attractive. To produce items that will discriminate in this way is one of the arts of item writing. Here are some of the ways in which true-false items can be produced to promote discrimination.

1. Use more false than true statements. When in doubt, students seem more inclined to accept than to challenge propositions presented in a true-false test. False statements tend to discriminate more sharply between students of high and low achievement than do true statements (Barker and Ebel, 1981). This may be due to what is called an "acquiescent response set." In the absence of firm knowledge, students seem more likely to accept than to reject a declarative statement whose truth or falsity they must judge.

Instructions for preparing true-false tests sometimes suggest including about the same number of false and true statements. But if the false statements tend to be higher in discrimination, it would seem advantageous to include a higher proportion of them, perhaps as many as 67 percent. Even if students come to expect a greater number of false items, the technique still seems to work. In one of the author's classes, students took a test, answered the questions, and counted how many they had marked true. They were then told the correct number of true statements and were allowed to change any answers they wished. Most of them changed some answers, but this improved their scores very little. On the average, they changed as many answers from right to wrong as from wrong to right.

2. Word the item so that superficial logic suggests a wrong answer.

(1) A rubber ball weighing 100 grams is floating on the surface of a pool of water exactly half submerged. An additional downward force of 50 grams would be required to submerge it completely. (F)

The ball weighs 100 grams and is half submerged, which gives one-half of 100 grams, or 50 grams, a superficial basis as the answer. The true case is, of course, that if 100 grams submerges only half of it, another 100 grams would be required to submerge it completely.

(2) Since students show a wide range of individual differences, the ideal measurement situation would be achieved if each student could take a different test specially designed to test him or her. (F)

Superficial logic also would make the incorrect answer to this item seem plausible.

(3) The output voltage of a transformer is determined in part by the number of turns on the input coil. (T)
(4) A transformer that will increase the voltage of an alternating current can also be used to increase the voltage of a direct current. (F)

3. Make the wrong answer consistent with a popular misconception or a popular belief irrelevant to the question.

(5) The effectiveness of tests as tools for measuring achievement is lowered by the apprehension students feel for them.
(F)

Many students do experience test anxiety, but for most of them it facilitates rather than impedes maximum performance.

(6) An achievement test should include enough items to keep every student busy during the entire test period. (F)

Keeping students busy at worthy educational tasks is usually commendable, but in this case it would make rate of work count too heavily as a determinant of the test score.

4. Use specific determiners in reverse to confound testwiseness.

(7) A 50-item test generally will be more reliable than a 25-item test. (T)
(8) In a positively skewed distribution the mean is always larger than the mode. (T)
(9) True statements usually are more discriminating than false statements. (F)
(10) A correlation of +0.28 is never considered to be higher than a correlation of -0.48. (T)

5. Use phrases in false statements that give them the "ring of truth."

(11) The use of better achievement tests will, in itself, contribute little or nothing to better achievement. (F)

The phrases "in itself" and "little or nothing" impart a tone of sincerity and rightness to the statement that conceals its falseness from the uninformed.

(12) To ensure comprehensive measurement of each aspect of achievement, different kinds of items must be specifically written, in due proportions, to test each distinct mental process the course is intended to develop. (F)

[...] the elaborate statement and careful qualifications [...] that testwise individuals associate mainly with true statements.

MULTIPLE TRUE-FALSE ITEMS

Multiple true-false items resemble multiple-choice items in their physical appearance. However, rather than selecting one best answer from several alternatives, examinees respond to each of the several alternatives as a separate true-false statement. [...] has a stem, like a multiple-choice item, but the number of [...] alternatives may be [...]. The number of alternatives per item or cluster need not remain constant throughout a given test. Here is a sample item:

** An ecologist losing weight by jogging and exercising is
1. increasing maintenance metabolism. (T)
2. decreasing net productivity. (T)
3. increasing biomass. (F)
4. decreasing energy lost to decomposition. (F)
5. increasing gross productivity. (F)

Notice that the alternatives are numbered consecutively throughout the test and that each stem is introduced by an asterisk or some other symbol so that the item type is easily identifiable. The next item, for example, might contain choices 6 to 10, the next, choices 11 to 14, and so on.

The multiple true-false form has several appealing features relative to the multiple-choice format (Frisbie and Sweeney, 1982). Examinees can make at least three multiple true-false item responses, on the average, in the time required to answer a single multiple-choice item, a decided advantage in an hour of testing time. The longer test permits testing of greater breadth or depth of content. In addition, it has been shown that a multiple true-false test prepared by converting items from multiple-choice form yields a higher reliability estimate than the original multiple-choice test (Frisbie and Druva, 1986; Kreiter and Frisbie, 1989).

Finally, two less critical outcomes can be noted. Students that were examined by both item types expressed an overwhelming preference for multiple true-false items and perceived them to be easier than [multiple-choice items].

Multiple true-false items can be developed using any of several strategies. Existing multiple-choice items can be converted easily to multiple true-false clusters, sometimes without significant rewording of the stem or the responses. Some items would need to be modified so that each cluster would not have [remainder illegible]. Consider this multiple-choice item:

For which of the quantities below is it more reasonable to estimate the quantity by measuring a sample rather than the whole population?
I. The average life of a new brand of TV tubes
II. The percent of American voters who favor the [...] foreign policy
III. The number of teachers in [...] school who usually ride the bus to school
[response options A through E illegible]

In multiple true-false form, the item might look like this:

** It would be more reasonable to measure the whole population to estimate the
1. average life of a new brand of TV tubes. (F)
2. percent of American voters who favor [...] foreign policy. (F)
3. number of teachers in [...] school who usually ride the bus to school. (T)

Note that it is possible to add more alternatives quite readily, but to do so to the multiple-choice version would add complexity for the item writer and [...]. [...] same cognitive tasks, but the second [...].

Finally, those who prepare [multiple true-false items without using con]version methods can [...] for preparing multiple-choice items [...] because the item writer is not [...] number of responses developed. Any concept [that can be measured with] multiple-choice items can be measured [with multiple true-false items].

SUMMARY PROPOSITIONS

1. True-false items provide a simple and direct means of measuring the essential outcomes of education.
2. The low esteem in which true-false tests are sometimes held is due to inept use, not to inherent limitations.
3. True-false items provide information on essential achievement more efficiently than most other item formats.
4. Most important aspects of achievement can be tested equally well with either true-false or multiple-choice items.
5. It is less efficient to group true-false statements to produce a multiple-choice item than to require separate responses to each of the true-false statements.
6. Informed guesses, as opposed to blind guesses,
provide useful indications of achievement.
7. Students do very little blind guessing on good true-false tests.
8. The probability of an examinee achieving a high score on a true-false test by guessing blindly is very small.
9. True-false items that appear ambiguous only to poorly prepared students are likely to be powerful discriminators.
10. Few textbook sentences are significant enough, and meaningful enough out of context, to be used as true statements in a true-false test.
11. Statements that are essentially (but not perfectly) true or essentially (but not totally) false can make good true-false items.
12. True-false items can test students' comprehension of important ideas and their ability to use them in solving problems.
13. There are no firm empirical data to support the notions that true-false tests encourage rote learning, oversimplified conceptions of the truth, or the learning of false or incorrect ideas.
14. Generally, it is easier to develop test items from instructional materials than from statements of instructional objectives.
15. A useful strategy for developing true-false items is to create pairs of statements, one true and one false, based on a single idea.
16. Good true-false items express single, not multiple, ideas.
17. Generally, a good false statement cannot be created by inserting a negative ("not") in a true statement.
18. Ambiguity in items can be minimized by writing statements that contain an internal comparison.
19. False statements tend to be more highly discriminating than true statements.
20. Specific determiners can be used in ways that will hinder rather than help the poorly prepared test taker.
21. False statements can be made to seem plausible by using familiar terms and phrases in seemingly straightforward factual statements.
22. Relative to multiple-choice items, multiple true-false items are more efficient, are easier to prepare, and yield more reliable scores.

QUESTIONS FOR STUDY AND DISCUSSION

1. How could a good true-false item and the proposition on which it is based be distinguished?
2. What are some logical explanations for the general advantage of multiple-choice over true-false in terms of item discrimination?
3. Why might the task of responding to a true item be inherently more difficult than responding to a content-equivalent false item?
4. What steps can a test constructor take to diminish the potential negative effects of guessing on true-false tests? On multiple-choice tests?
5. How does the use of verbatim textbook sentences as true-false items contribute separately to ambiguity and to measurement of triviality?
6. What are the arguments that support and refute the idea that true-false items implant "untruths" or misinformation in the minds of test takers?
7. How does the addition of an internal comparison in a true-false item make it more like a multiple-choice item?
8. Under what circumstances might the advice "Use more false than true" have more negative than positive consequences for obtaining valid measures?
9. Why might it be bad advice to recommend that specific determiners not be used in true-false items?

Multiple-Choice Test Items

THE POPULARITY OF THE MULTIPLE-CHOICE FORMAT

Multiple-choice items have long been the [...]. [There tends to be less ...] in multiple-choice than in some other item forms. S[tudents often find multiple-choice ques]tions less ambiguous than completion or [true-false items, and instructors find it] easier to defend correct answers. [...] of guessing [seem to both] instructors and students [to be less] detrimental in multiple-choice than in true-false tests.

Multiple-choice items have withstood the test of time well. They have [...] criticism and they remain [...] other objective items: they are superficial, ambiguous, and [conducive] to guessing. Usually the critics have said or implied that the only good way to test is to use essay testing or some form of performance testing, which is more "authentic," realistic, and permits the assessment of higher-order cognitive skills. The implications are that we must use one method or the other, not both, and that multiple-choice items measure only artificial behaviors that are attained through some form of rote learning.

Few objective tests or item formats are so perfect as to be above reproach [...]. But there are several weaknesses in the indictments that have been pursued against multiple-choice tests and items. First, the [charges are seldom] supported by unbiased empirical data, despite the fact that such data are relatively easy to obtain. The shortcomings mentioned by critics should lower the discriminating power of the items and the reliability of the test scores. Yet researchers and expert test makers have demonstrated repeatedly and continually that scores from multiple-choice tests can be highly reliable. [Criticisms of] multiple-choice testing tend to be based on generalizations [...] from isolated instances of poor tests or items.

In the second place, most critics seldom make a serious attempt to [show] how [...] a better way of measuring educational outcomes. Of late, there has been more emphasis on performance, particularly methods that would not require the use of paper and pencil. But such proposals seldom account for the limitations inherent in the performance tasks, the reliability of the scores assigned by judges, or the costs in time and personnel required to carry out the process. There are strengths and weaknesses associated with all forms of assessment, and test users need to be aware of these pros and cons. Most importantly, test users should realize that no single approach to
educational measurement is most appropriate for all purposes of instruction.

Even the most ardent advocates [...]. They acknowledge, as we do, that [...] serious flaws and that, in general, they [...] discriminating as they should be. Most [...] with the observation that the scores they [...] be, or ought to be, for maximum value. [...] choice testing until a substitute with less [...] alternatives, essay or performance testing, are clearly less convenient and efficient to use in many situations.

Production versus Selection

It is sometimes suggested that objective tests are inevitably more superficial and less realistic tests of a student's knowledge than are essay tests. [...] answers to the student the examiner has [...]. Most good objective test items require [...] original thought, the basis for choice among [...] do not permit correct response on the [...] memory, or meaningless verbal association. [...] processes involved in selecting an answer [...]:

A child buys jelly beans which the grocer picks up, without regard to color, from a tray containing a mixture of jelly beans of three different colors. What is the smallest number of jelly beans the child can buy and still be certain of getting at least four jelly beans of the same color?

The answers provided are 4, 7, 10, and 12. Assume that examinees are seeing this particular problem for the first time, so that they cannot answer it successfully by simply repeating an answer someone else has given them. Assume, too, that problems of this kind are not of sufficient practical importance to have been made the subject of a special unit of study. These assumptions call attention to an important general principle of educational measurement. What a test item measures, that is, what a successful response to it indicates, cannot be determined on the basis of the item alone.
Consideration must also be given to the examinee's previous experiences. These may differ significantly for different examinees. But in the case of the foregoing problem, the assumptions mentioned above may be quite reasonable.

How much different would the thought processes be, and how much more difficult would the problem be, if no answers were suggested and the task required production of the answer rather than selection? Producing an answer is not necessarily a more complex or difficult task, or one more indicative of achievement, than choosing the best of the available alternatives (Quellmalz, Capell, and Chou, 1980). Hogan's (1981) thorough review of the research involving comparisons between free-response and objective tests led him to these conclusions: "In most instances, free-response and choice-type measures are found to be equivalent or nearly equivalent, as defined by their intercorrelation, within the limits of their respective reliabilities. Further, the choice-type measure is nearly always more reliable than the free-response measure and is considerably easier to score." Patterson (1926) reached the same conclusion nearly 55 years earlier. But despite the overwhelming empirical support for Hogan's conclusions, many practitioners continue to ignore the research or persist in believing that in their own situation the two must yield measures of quite different abilities.

The Process of Elimination

Students may sometimes arrive at the correct answer to a multiple-choice test item through a process of elimination. Rejecting responses that seem unsatisfactory, they are finally left with one termed the "right answer," not because they have any basis for choosing it directly, but simply because none of the others will [...].

The availability of this process of elimination is sometimes regarded as a weakness of the multiple-choice item form. It is charged that students get credit for knowing something they really don't know. Most specialists in test construction, however, do not disapprove of the process of answering by elimination and do not regard it as a sign of weakness in multiple-choice items in general, or in an item where the process is particularly useful. (It might be noted in passing that an item that uses the response "none of the above" as a correct answer requires the student to answer by a process of elimination.) There are two reasons why this process is not generally deplored by test specialists.

In the first place, the function of achievement-test items is primarily to contribute to a measure of general achievement in an area of study. They are not intended primarily to provide an inventory of which particular bits of knowledge or skills a student has. The achievement of a student who answers items 1, 3, and 5 correctly but misses 2 and 4 is regarded as equal to the achievement of another student who answers items 2, 3, and 4 correctly but misses 1 and 5. Identifying exactly which things a student has achieved or failed to achieve is a matter of secondary importance except when objectives-referenced interpretations are needed for mastery or diagnostic decisions.

In the second place, the knowledge and ability required to properly eliminate incorrect alternatives can be, and usually is, closely related to the knowledge or ability that would be required to select the correct alternative. If education does not consist in the accumulation of unrelated bits of information, if the development of a meaningful network [...], then the fact that a student [...] choosing the best answer by rational [...] should be applauded rather [...] depends on the use of multiple-choice [...] choices for uninformed or misinformed [...] absurd or logically inappropriate choices is no proxy for a measure of useful verbal knowledge.

In practice, few multiple-choice test items are likely to be answered correctly merely by eliminating incorrect choices. Far more often the process of choice will involve comparative judgments of this alternative against that. It is unlikely that an examinee who is totally ignorant of the correct answer would have knowledge enough to eliminate with certainty the incorrect alternatives. This is especially likely to be true if the item is well enough constructed so that all the available alternatives, correct and incorrect, have some obvious basic similarity. For these reasons, it seems safe to conclude that the problem of [...] choice by a process of distracter elimination need not be regarded as a serious [...].

THE CONTENT BASIS FOR CREATING MULTIPLE-CHOICE ITEMS

Like true-false items, multiple-choice items are developed most conveniently and most appropriately on the basis of ideas expressed or implied in instructional materials. In Chapter 8 two paragraphs of text material were reproduced and these three propositions were extracted from them:

1. The use of optional essay items interferes with interindividual score comparisons.
2. The use of optional essay items usually contributes to reduced test-score variability.
3. The use of optional essay items usually results in reduced test-score reliability.

To develop multiple-choice test items on the basis of propositions like these, it is necessary to:

1. Formulate a question or an incomplete statement that clearly implies a question (the item stem).
2. Provide an acceptable answer to the question, stated in a few well-chosen words.
3. Produce several plausible (but incorrect) answers to the question (the distracters).

This sequence of steps was followed to develop a multiple-choice item corresponding to each of the propositions reproduced above.

(1) How would the use of optional rather than required essay items likely affect the interpretation of the scores obtained?
*a. It generally distorts norm-referenced interpretations.
b. It generally affects treatment-referenced interpretations more than norm-referenced interpretations.
c. It generally makes criterion-referenced interpretations more accurate.
d. It generally provides clarification of the domain for domain-referenced interpretations.

(2) How would the use of optional rather than required essay items probably affect test-score variability?
*a. The standard deviation will be smaller when choices are permitted.
b. There will be fewer low scores when choices are permitted.
c. Score variability will be increased when choices are permitted.
d. Scores are more likely to be spread out through the moderate range when choices are permitted.

(3) What is the probable effect on score reliability of using optional rather than required essay items?
*a. The K-R20 should be noticeably lower.
b. The K-R20 could be only slightly higher or lower.
c. The test-retest coefficient should be noticeably lower.
d. The split-halves coefficient should be unaffected.

In the remainder of this chapter a number of suggestions will be offered for writing good multiple-choice test items. Most of these reflect conclusions that item writers have reached as a result of their own efforts to produce items that will yield dependable indications of achievement, and many are supported by rational inference. Nonetheless, only a few have been tested in rigorous experiments, and the results have not always clearly supported the suggestions. Rigorous experiments in this area are difficult to manage, and the effect of violating one or a few suggestions is not likely to be great. On the whole, however, item writers are likely to produce better items if they know and follow the suggestions than if they are ignorant of them or disregard them. A comprehensive review of multiple-choice item-writing research and lore led Haladyna and Downing (1989a) to develop a taxonomy of item-writing rules and to examine the validity of the rules that have been offered by text authors and researchers (Haladyna and Downing, 1989b).
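The three-step procedure described in this section (stem, acceptable answer, plausible distracters) can be pictured as a small data structure. The Python sketch below is only an illustration under stated assumptions: the class name MultipleChoiceItem and its methods are hypothetical and belong to no established testing package, the checks cover form only (plausibility of distracters still requires the item writer's judgment), and the sample content is item (2) from this section.

```python
from dataclasses import dataclass
import random


@dataclass(frozen=True)
class MultipleChoiceItem:
    """Hypothetical container for one item: stem, keyed answer, distracters."""

    stem: str           # step 1: question or incomplete statement
    answer: str         # step 2: acceptable answer, stated in a few words
    distracters: tuple  # step 3: plausible but incorrect answers

    def __post_init__(self):
        # Checks of form only; plausibility still needs human judgment.
        if not self.stem.strip():
            raise ValueError("the stem must state or imply a question")
        if len(self.distracters) < 2:
            raise ValueError("provide several distracters")
        if self.answer in self.distracters:
            raise ValueError("a distracter must not repeat the keyed answer")
        if len(set(self.distracters)) != len(self.distracters):
            raise ValueError("distracters must differ from one another")

    def options(self, seed=0):
        """Return answer and distracters in a reproducible shuffled order."""
        choices = [self.answer, *self.distracters]
        random.Random(seed).shuffle(choices)
        return choices

    def is_correct(self, response):
        """True when the chosen response matches the keyed answer."""
        return response == self.answer


item = MultipleChoiceItem(
    stem=("How would the use of optional rather than required essay items "
          "probably affect test-score variability?"),
    answer="The standard deviation will be smaller when choices are permitted.",
    distracters=(
        "There will be fewer low scores when choices are permitted.",
        "Score variability will be increased when choices are permitted.",
        "Scores are more likely to be spread out through the moderate range "
        "when choices are permitted.",
    ),
)

# Print the item with its options in shuffled order.
print(item.stem)
for letter, text in zip("abcd", item.options(seed=1)):
    print(f"{letter}. {text}")
```

Keeping the keyed answer separate from the distracters, rather than storing a fixed option order, is one possible design choice; it lets the position of the correct response vary from one form of the test to another.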
THE MULTIPLE-CHOICE ITEM STEM

The function of the item stem is to acquaint the examinee with the problem that is being posed. Ideally, it should state or imply a specific question. Although one can sometimes save words without loss of clarity by using an incomplete statement as the item stem, a direct question is often better. Not only does a direct question tend to present the examinee with a more specific problem, it also may focus the item writer's purpose more clearly and help him or her to avoid irrelevance or unrelatedness in the distracters.

Focus on Relevance

Irrelevant items fall short in contributing to the purpose for testing for any number of reasons: the stem fails to present a question or specific problem, the wording of the stem is ambiguous, or the question presented is relatively insignificant. A lack of relevance results in frustration for examinees and contributes to unreliable measures. The sample items that follow illustrate poor techniques for beginning the multiple-choice item.

Physiology teaches us that
*a. the development of vital organs is dependent upon muscular activity.
b. strength is independent of muscle size.
c. the mind and body are not influenced by each other.
d. work is not exercise.

Here the subject of a sentence is used as the item stem and its predicate as the correct response. Obviously, the predicate does not follow; physiology [...] could teach us a variety of things. If the stem were rephrased to read, "What does physiology teach us?" the item would be just as bad.

In comparing the period of heterosexual adjustment of our culture with those of other cultures, it must be concluded that
*a. there are tremendous differences that can only be explained on a cultural basis.
b. there are large differences that must be explained by the interaction of biology and the more influential culture.
c. although there are some differences, the biological foundation of puberty is fundamental.
d. in most cultures puberty is the period of heterosexual adjustment.

Here, again, there are any number of conclusions possible on the basis of a study of a particular period of human development. Until the examinee reads all the responses, she or he has no clear idea of what the question is asking. The item as a whole is not focused on any specific problem. This opens the way for confusing multiple interpretations.

Absolute and relative correctness. Ideally, the intended answer to a multiple-choice question should be a thoroughly correct answer, admitting no difference of opinion among adequately informed experts. This kind of absolute correctness, however, is difficult to achieve except in formal logical systems or in statements that simply reproduce other statements. Few, if any, inductive truths or experimentally based generalizations can be regarded as absolutely true. Test constructors must base many of their items on propositions that are not absolutely true but are strongly probable. They should, however, guard against basing items on statements whose validity would be challenged by competent scholars.

Another guideline to follow is that the stem of a multiple-choice item should ask a question that has a definite answer. Indeterminate questions may provide interesting topics for discussion, but they do not make good items for testing achievement. For example:

Which event in the following list has been of the greatest importance in American history?
*a. Braddock's defeat
b. Burr's conspiracy
c. The Hayes-Tilden contest
d. The Webster-Hayne debate

It is unlikely that scholars can agree on which of these events
is of the greatest importance in American history. The importance of an event depends on the point of view of the person making the judgment and the context in which that individual is thinking of it.

While each multiple-choice item should have a definite answer, it may not always be an absolutely correct answer. Many good items ask the examinee to choose the best answer, as in this example:

Which statement best characterizes the man appointed by President Eisenhower to be Chief Justice of the United States Supreme Court?
a. An associate justice of the Supreme Court who had once been a professor of law at [...]
b. A successful governor who had been an unsuccessful candidate for the Republican presidential nomination
c. A well-known New York attorney who successfully prosecuted the leaders of the Communist party in the United States
d. A Democratic senator from a southern state who had supported Eisenhower's campaign [...]

Opinions and authoritative sources. What about items that involve expressions of opinion? If it is an opinion on which most experts agree, then a reasonable multiple-choice item can be based on it.

Which of these statements is most consistent with Jefferson's concept of democracy?
a. Democracy is part of the divine plan for mankind.
b. Democracy requires a strong national government.
*c. The purpose of government is to promote the welfare of the people.
d. The purpose of government is to protect the people from radical or subversive minorities.

The responses to this question represent generalizations on the basis of Jefferson's speeches and writings. No authoritative sanction for one particular generalization is likely to be available. Yet scholars familiar with Jefferson's work would probably agree on a best answer to this item. In such cases, the use of an item based on expert opinion is entirely justifiable. However, if the item asks the examinee for a personal opinion, it is subject to criticism. For example:

What do you consider the most important objective of staff meetings?
*a. To establish good working relations with your staff
b. To handle routine matters
c. To help teachers improve instruction
d. To practice and exemplify democracy in administration

There is one sense in which any answer to this item must be considered a correct answer. On the other hand, what the item writer obviously wanted to do was to test the examinee's judgment against that of recognized authorities in the field of interpersonal relations. It would have been better to ask students directly to "choose the most important objective of staff meetings." Their only [...] be what they consider the most important objective, but [...] answers will be [...] of recognized experts.

But even experts disagree, [...] change due to new advancements in [...] viewpoint or recommend a specific [...] or opposing position. When it is [...] it may be necessary to specify the [...] like "According to your instructor" or [...] may be needed to establish a frame [...]. However, such situations probably ought to be quite rare. It seems more relevant for examinees to understand the rationale for a particular point of view than, for example, to remember Smith's viewpoint.

Good multiple-choice items deal with important, significant ideas, not with incidental details, as does the first item following, nor with unique organizations of subject matter, as does the second.

This question is based on the advertising campaign of Naumkeag Mills to retain the market leadership of Pequot bed linen. What was the competitive position of Pequot products in 1927?
a. Ahead of all competitors among all customers
*b. Strong with institutional buyers but weak with household consumers
c. Second only to Wamsutta among all customers
d. Weak with all groups of consumers

This advertising campaign may indeed provide an excellent illustration of the problems involved and the practices to follow in advertising campaigns. But it seems not entirely appropriate to measure students' ability to handle an advertising campaign by asking them to recall the details of one illustration used in instruction.

The second principle of education is that the individual
a. gathers knowledge.
b. makes mistakes.
c. responds to situations.
*d. resents domination.

The only person capable of answering this question is one who has studied a particular book or article. Whether a given principle of education is first or second is usually a matter of little importance. Educators have not agreed on any particular list of principles of education or any priority of principles. This item shows an undesirably close tie-up to the organization of subject matter used by a specific instructor or writer.

Instructional preambles. Informational preambles that serve only as window dressing and do not help the examinee understand the question being asked should ordinarily be avoided. Here are two examples.

While ironing her formal, Jane burned her hand accidentally on the hot iron. This was due to a transfer of heat by [...]

The introductory sentence suggests that the item involves a practical problem. Actually the question asked calls only for knowledge of technical terminology.

In purifying water for a city water supply, one process is to have the impure water seep through layers of sand and fine and coarse gravel. Here many impurities are left behind. Below are four terms, one of which will describe this process better than the others. Select the correct [...]

The primary purpose of a test item is to measure achievement. While much learning may occur during the process of taking a test, deliberate inclusion of instructional materials may reduce its effectiveness as a test more than its instructional value is increased. It might be better to ask the purpose of filtration in purifying city water supplies or the type of filter used.

Introducing novelty. Novel questions and unique problem situations reward the [...] what he or she has been taught [...] to understand [...].

If the radius of the earth were increased by 3 feet, its circumference at the equator would be increased by about how much?

J. B. Matthews, one-time employee of Senator McCarthy's subcommittee, charged that a large number of supporters of communism in the United States would be found in which of these [...]?
a. Wall Street bankers
b. Newspaper editors
c. Professional gamblers
*d. Protestant clergymen

Unintended clues. Multiple-choice items sometimes provide unintended hints about the correct answer that offer [...] examinee. In some cases, key words from [...] in the correct answer. In others the correct [...] or semantically with the stem [...] that sometimes the stem of one item will [...] another item. Here are some examples of items that provide relevant clues in the stem:

When used in conjunction with the T-square, the left vertical edge of a triangle is used to draw
[...]
c. horizontal lines.
[...]

The use of the word vertical in both the stem and the correct response of this item provides an obvious clue.

Minor differences among organisms of the same kind are known as
[...]
d. natural selection.

The plural term differences in the stem calls for a plural response, which can only [...].

The major weakness of our government under the Articles of Confederation was that
a. there were no high officials.
*b. it lacked power.
c. it was very difficult to amend.
d. there was only one house in Congress.
There is an obvious relation between lack of power and weakness of government. If a person knew nothing about the Articles of Confederation, common sense would nonetheless dictate the correct response.

Any test item that is either much too easy or much too difficult for a group of examinees cannot provide much useful information about their relative levels of achievement. If on inspection or after tryout an item is found to be inappropriate in difficulty, some corrective action may be needed.

Manipulating difficulty. To some extent the difficulty of a multiple-choice item is inherent in the idea on which it rests. There are, however, techniques that give the writers of multiple-choice test items some control over the difficulty of the items they produce on a given topic. In general, stem questions can be made easier by making them more general or harder by making them more specific. The following pair of items is illustrative.

[...]
*b. goods brought into a country.
c. income of immigrants.
[...]

Only the most general notions about a tariff are required to respond successfully to this item, which is thus suitable for use at the lowest level of achievement. Much more knowledge of tariffs is required to respond successfully to the following item:

A high protective tariff on Swiss watches in the United States is intended to most directly benefit
a. Swiss watchmakers.
b. United States citizens who buy Swiss watches.
c. United States government officials.
*d. United States watchmakers.

This pair of items illustrates how the generality or specificity of a question can be used to help control its difficulty.

Focus on Clarity

It is desirable to express the stem of the item so that it requests the essential [...] being tested as directly, accurately, and simply as possible. The
following item stem seems needlessly complex:

Considered from an economic viewpoint, which of these proposals to maintain world peace derives the least support from the military potential of atomic energy?
a. An international police force should be established.
b. Permanent programs of universal military training should be adopted.
*c. Sizes of standing military forces should be increased.
d. The remaining democratic nations of the world should enter into a military alliance.

Even on careful reading, the meaning of this item stem is not clear. It involves a negative approach and seems to combine two dissimilar bases for judgment, economics and atomic energy. The wording of this item might seem to reflect lack of clarity in the thinking of the person who wrote it.

Using negatives. It sometimes seems desirable to phrase the stem question to ask not for the correct answer, but for the incorrect answer. For example:

In the definition of a mineral, which of the following is incorrect?
a. It was produced by geologic processes.
b. It has distinctive physical properties.
c. It contains one or more elements.
*d. Its chemical composition is variable.

Items that are negatively stated, that is, that require an examinee to pick an answer that is not true or characteristic, tend to be somewhat confusing. They appear unusually attractive to examination writers because so much of the [...] subheadings under a main topic. Asking for something that is not one of [...] are rarely encountered outside the [...] relevance that is usually desirable. At times, [...] to achieve both brevity and [...] problem:

Under which of these circumstances would a speaker at a political rally NOT be protected by the First Amendment?
a. When asking the audience to join in a protest march
*b. When inciting the audience to take violent action
c. When denouncing the president of the United States
4 Wh.n calllng tor rho cr.iflon ot a now po Uc.t p.rry 166 MULTIPLECHOICE TESTITEMS B v ap i ra ti /i n g o r u n d e rti n i n g rh c negari vebord, rhc .l :,l ,s,trer.araE rrcm r{,| | .r dD H\\ ,ri c am , n e e ! a re n ri o n ro i r a n d e n s ure\,;,, ,;-.;,:,::._ (arFress* " d' ' s k er w i rr n o t ." .,i " " k ;;;;;' i :;.rh a ' k' .' ;;;;,'' r,e.^ ' he " h. WhEtchengeoccuruIn th€corhposttion o, rhal l" t llstttedalrtishtroomin rivlnsrttnss aresrowins whrchrh6onry ;,1, pi"ii.l' "- "r Cr.Dondiodde incEae.3 and oryggn d6cc!!es. ,.. -o, ca.bon diorld€ dec.eas.sand oxy@n Incr€asoc. c. noth carbondiorido .nd orygen incre.se. 4 aoth carbondioxtd€,nd oxygon decrcaso Irtro.lu?tory senteflces. hem wrirers shouid :::r'i .il:r';Tillr*:r*:i::i:f:I iriirfiSx H nf:,'l$: MULTIPLE€HOICE TEST ITEMS '67 A r Lhe s a m e ri mc . h u w e re r. rh e y rhoul d nor dvoi d i mporranr quesri onss;mpl y bc , : t u' e rh e re r' n o d b ,n l u re l v a n d , umpl erel ) correcra;sw er l f many des(ri pri ve or qualifying ideas are required, rhe clearesr erpression may be achieved by-placing them in separate introducrory Senrences. Th6lsrm cr66pirs sociarrsrnspp6aEdlrequsnrtyIn potiric.t dt.cu66tonsIn th6.a y i950e. Whlchol th6s€ i6 most olton us6dto ittuslratocr66ptngsoctal€m? Gensrationand disrrlbutlonol 6l6ct c pow€rby th6 t€d.rlt govornmont Comhunlsl Inliltrrtion ol labor uniong Gradualinc.ea6€in s.l6s.nd sxclsetax.s Particip.tion ol the Unibd Slates h tniorn. onat organir. ons 6uch as the Untt6d The use of nvo sentences-one to presenr background informarion and the other to ask rhe question-frequently adds to the clariry of rhe irem srcm. CombiniDE r hA e r $ o e l e me n r\ i n ro a \i n g l e .q ueq' on senrenre probdbtv $outd mate i r conl In other siru;dons a separare introducrory senr€nce is necessaryto esrab. lish rhe setting or conrexr. Such staremeDrsdiffer from rhe insrucrionat pream. bles and window dressing menrioned earlier. 
Here l, an example "Wh€n we look at rh€ world .s a whoto, ts ctearthat rhe probtomot sconomtcprogr€.. t6 reallylh€ mosl important,"This star€msntis be6tctassflsd a3 . scl€ntiticconcluslon, Obviously,rhesesraremenrscould be merged ro form a single question.Bur Ior exxminees{hose rcading rkills may nor b€ well developed,greaierctariryoftask can be achievedby using rhe formar illustrared. PREPARING THEFESPONSE CHOICES Obtaining Dlstracters The purpose of a dNtracrer in a mulhple.ahoice irem is to dBcriminat€ ber wee n L h o s es l u d e n ts w h o h d \e c o j nmand of a speci fi . body ot tnow tedse and t hus e w h o d o n o r. T o d o rh i s . rh e d i \| | arrer mus' be a pl ausi bl eal ernari v;. One 168 MULTIPLE€I]OJCE TEST ITEMS oblaining ptausibl€ dislracters.B to use true setements rhar do |lor cor 1ay,of recuy ansner the quesrion presenrcd in the stem. For example: Wharts th6prhciplt 6dvsnr.ge ot. battoryottodd storag€coth ov6ra betory o, dry ce s tol auromob a srarrrngand tighthg? ,. Th6 stor.go c. turntshosdtmcrcur€nr !. Th6 votr.g€ ot rhs erorogec6I ts htgh€r. .c. Tho corcnr trom th6 storag. cofi ts sr.ong.r. d. Ths lnlrratco€t ot rhe slorag. co ts r€ss. Lead srorage cetls do furnish direcr cells, bur rhis is nor rhe reason why t cerning fie relevance of knowledee ing irs rrurh. Mulripte.choice item"ss t es ling a n ,L h i e v e m e n r rh a t i s s o m e es s ayix a m i n a ri o n s . Another source of plausible distracrers are famitiar expressions, phrases r hdr ha v e b e e n u s e d i n ro m m o n p artanae and,hd, ma, ,ee;,,,,,,i i ,.' ;;:,; denr s w h o s e tn o s l e d g e i s me re ty i uperrrrrar. WhtchoJ rh.s. h!6 oflecrodtho grsargsrchan96In dom$ c ptanrsand antmat6? ,. Influencoot onvtronmonton hsrodny !. Organtcayotu on 'c. S.tecUv6bre€dtng d. Surutvrlot th. 
flnoer ilJ:T:lli';:;'"ll':::::::..::,-'j j:l:tri' :f ,henuesr,which as,uden, Dav p,.,rd;;.;i;;;;;i,*;i:;;;,i,il.":ix#x: ll,:,1;:11.:llj:l,lldf,srandin€ tary lev€l of discriminarion for wlirc'hrhrs nen ,, ,n,.,0.a. thari..m o'.,.".,.Y,1:,":";:T;sP€cincracdcs "rt"' ii**t'3,*r"tm 'un1'. to g"n.'ur. gooa be hn, Forexam w:x::fx#::!:an*:r:.m* ::::''i: #':::::'.::111ff 'l,i: ::t";:ffi :;:*; i,ly"*ili{ie,l, 2. Thi.nh of thingt tt at hdoe samea$ocidrion . ere(rri. r;rris;dror quJ,i;;:",b.;;;;;il'jff ,tr"'rl'J5,1f,Ljl; .li;lll: througha (omp,essedgas"or "etecrro;agnetic,t*,pi", '." "r i,.r, ...,rr11 MULTIPLE€HOICE TESTTEMS 169 How dld (X) lhs €otlmrted omountot p6rrct€umdiscoveredin new fl€td3in tho t6re t970€ comp6r6wlh m th6 amounroxkact€dkom p.oducins etds In rho 5am6v€ars? ,. X ws3 pr.ctlcllly z610. '0. X wa6 about hall ol Y c, x lust dboul6quat€dY d. X wls grcatd.than Y Som. c..6e ot tung crnc.r may bo atrrtbur€dro ctgarot€ smoking.Wh.t was rho srarus ol lhls lder In th€ lats t96o3? r. Th. thooryhad bo.n ctoarty$tablshed by m6dtc.t ovtdonc€. O, ll wr. ! controv.Ellt mrtt.r.nd .06o etP€.ts constd6r6dlhe ovtdonc€to bs lnconclu. Th.lh€ory hrd bsoncl€.rty dtaprov.dby 6urv6ysot.moker3, tomor sdok6r., and non. Th.lhlory wr6 too rccanrro havob6en.ubl.ct6d ro any t€sts. The responses ro rhi\ irem rep'€senra v ale of vatuesfrom (omDtetee\rdblish. menrro (omplereindefinireness. The useot a qualirar ives(ateof rl.ponseshetFs to s)srematize rheprocess of lest(onsrru(riontnd ro suggest de.irabieresponsis. 4. Ph,asethe y.stion tu that it cotltfot a,y's" or "no', onsun prl.,as .xptaMtio,,. Here rs an euhDle rr tr rarroorrrecraronrry Incomsro dt.po.!bt6 hcom! uaurly htgherIn a .€ntorct zsn nou!.hord rh.n In ! young.nrrtert hous.hotd?Why? r. Y.., b€c&r. .srlorc h.v. gr€lr.r livtry! Incom. to.p.nd. L Y.i b.c.u.6 !ento.! h!v. no malor tuturo .rpone.. (hous., colt€g€ cost6) to say. tor. .. Nq b.cru.o loclll socudry plynonr! ..tdon c,ov€.a[ fliod €xpone.! ot 66rtors. d. No, b.ca$. 
disposable income is always higher, by definition.

5. Use various combinations of two elements as the alternatives. Thus four responses might occasionally assume this form:

1. Only A
2. Only B
3. Both A and B
4. Neither A nor B

An item illustrating this tactic is:

What was the general policy of the Eisenhower administration during 1953 with respect to government expenditures and taxes?
a. Reduction of both expenditures and taxes
*b. Reduction of expenditures, no change in taxes
c. Reduction in taxes, no change in expenditures
d. No change in either expenditures or taxes

If the two elements each have two different values, for example, rise-fall and rapidly-slowly, they can be combined in this way to give four alternatives:

1. It rises rapidly.
2. It rises slowly.
3. It falls slowly.
4. It falls rapidly.

6. [...] consider [...] using a different approach in the item [...] the proposition on which it is [...] does not exist, the idea [...]

[...] the difficulty of the item [...] content discrimination required [...]

An embargo is
*a. a law or regulation.
b. a kind of boat.
c. an embankment.
d. a foolish adventure.

Because the responses to this item vary widely, only an elementary knowledge of embargoes is required for successful response.

An embargo is
a. [...]
b. a customs duty.
*c. the stoppage of goods from entry and departure.
d. an admission of goods free of duty.

The homogeneity of responses in this second question makes it considerably more difficult.

Another means of making an item easier is to provide more than one basis for choosing the correct answer, as in this item:

Which of the following are known for their writings in colonial America?
*a. Thomas Paine and Ben Franklin
b. Mark Twain and Henry Clay
c. William Penn and Paul Revere
d. Robert Frost and Ernest Hemingway

The use of the names of two individuals fitting the specification in the item stem makes it somewhat easier. The examinee need only know one of the two writers, or know that one in each of the distracters was not known for his writing in the colonial period, to respond successfully.

It has occurred to some item writers that they might use as distracters the [...] answers [...] completion [...] students' responses [...] (1973) [...]

Striving for Clarity

[...]

The chief difference between the surface features of Europe and North America is that
a. the area of Europe is larger.
b. Europe extends more to the south.
c. the Volga River is longer than the Missouri-Mississippi.
d. the greater highlands and plains of Europe extend in an east-west direction.

[...] feature of Europe. Either the [...] surface features [...] should all conform to that category. Since multiple-choice responses are [...] to be answers to the [...] be parallel (that is, [...] grammatical [...] in length, and in complexity). [...]

Slavery was first started
*a. at Jamestown settlement.
b. at Plymouth settlement.
c. at the settlement of Rhode Island.
d. a decade before the Civil War.

The first three responses to this item are places; the fourth is a time. In questions of this type, it is not difficult to visualize an [...]. The use of a direct question stem might help to prevent this kind of ambiguity. Since alternative responses are intended to represent a set of distinct [...] question, it is helpful to the examinee and to the effectiveness of the test item if they do indeed present clear choices.

Meat can be preserved in brine due to the fact that
a. salt is a bacterial poison.
*b. bacteria cannot withstand the osmotic action of the brine.
c. salt alters the chemical composition of the food.
d. brine protects the meat from contact with air.

Both responses a and b could be judged correct. Response b simply explains why response a is correct. In a case like this, it is undesirable to count only one of two almost equally correct responses.

Familiar expressions and phrases provide a useful source of plausible distracters, but obscure distracters are undesirable.

A chaotic condition is
a. asymptotic.
*b. confused.
c. gauche.
d. permutable.

[...] appropriate level of difficulty for [...] too difficult. It is unreasonable [...] of them might not be a better synonym for "chaotic" than the [...]

The search for plausible distracters may sometimes induce an item writer to resort to trickery, as in this item.

Horace Greeley is known for his
a. advice to young men not to go West.
b. discovery of anesthetics.
*c. editorship of the New York Tribune.
d. humorous anecdotes.

Insertion of the "not" in the first response spoils what would otherwise be the [...] to the question and thus makes the item more a test of students' alertness than of their knowledge of Horace Greeley. Trickery of this kind reflects badly on the ethics of the item writer and is likely to [...] power of the item. Such ploys tend to have detrimental [...] examinees who are able to detect them. The message to them is "Read [...] carefully because someone is out to catch you off-guard." [...] are likely to require more time to make their responses, and their levels of frustration [...]

Gaining Efficiency

The need for parallel structure between the stem and the responses sometimes requires that all responses begin with the same words. When the same group of words is repeated in each response, the possibility of including that phrase in the stem should be considered.

Which is the best definition for a vein?
*a. A blood vessel carrying blood going to the heart
b. A blood vessel carrying blue blood
c. A blood vessel carrying impure blood
d. A blood vessel carrying blood away from the heart

This item could probably be improved by using an incomplete statement stem: "A vein is a blood vessel carrying." Occasionally, some [...] convenient way of [...] repetition seems excessive. Another problem arises when [...] long and complex so that examinees have difficulty perceiving and keeping in mind the [...] differences among the alternatives.

Systematic geography differs from regional geography mainly in that
a. systematic geography deals, in the main, with physical geography, whereas regional geography concerns itself essentially with the field of human geography.
b. systematic geography studies a region systematically, while regional geography is concerned only with a descriptive account of a region.
*c. systematic geography studies a single phenomenon in its distribution over the earth in order to supply generalizations for regional geography, which studies [...] of phenomena in one given area.
d.
systematic geography is the modern scientific way of studying differentiation of the earth's surface, while regional geography is the traditional [...] distribution of phenomena in space.

[...] of systematic geography distin[...] geography? [...] the task for the examinee by removing [...] responses also tend to focus attention on [...]

What is monogamy?
a. Refusal to marry
b. Marriage of one woman to more than one husband
c. Marriage of one man to more than one wife
*d. Marriage of one man to only one wife

A marriage in which one woman marries one man is called [...]

It is usually desirable to list the responses to a multiple-choice item rather than to arrange them in tandem, as in this example:

The balance sheet report for the Ajax Canning Company would reveal (a) The company's profit for the previous fiscal year (b) The amount of money owed to its creditors (c) The amount of income tax paid (d) The amount of sales for the previous fiscal period.

Responses in tandem save some space but are much more difficult to compare than those placed in list form. Another good rule is that whenever the alternatives form a quantitative or qualitative scale, they normally should be arranged in order of magnitude from smallest to largest or largest to smallest. This may avoid some confusion on the part of the examinee and eliminate an irrelevant source of error.

The population of Denmark is about
a. [...]
*b. 4 million.
c. 7 million.
d. 15 million.

Common practice in writing multiple-choice tests calls for three or four distracters for each item. If good distracters are available, the larger the number of alternatives, the more highly discriminating the item is likely to be. However, as one seeks to write more distracters, each additional one is likely to be somewhat weaker. There is some merit in setting one's goal at three good distracters to each multiple-choice item and
in struggling temporarily to reach this goal. Not all good distracters are immediately apparent. Some will emerge only after considerable brain racking. On the other hand, there is no magic in four alternatives and no real reason why all items in a test should have the same number of alternatives. It is quite possible to write a good multiple-choice test item with only two distracters (three responses), and occasionally with only one distracter, as Smith (1958) and Williams and Ebel (1957) have shown. After tryout, one can actually improve some items by dropping those alternatives that don't distract poor students or that do distract good ones.

Eliminating Unwanted Clues

A common device for adapting multiple-choice items to questions that seem to require several correct [...] add as a final alternative the response, "all of the above." But use of this response as the correct answer is appropriate only if all preceding alternatives are [...] correct [...] stem question. It is not uncommon on some classroom tests to find "all of the above" as the correct answer for each or most of the items in which it [...] the opposite situation is found just as [...] irrelevant clue to the correct [...] an incorrect [...] when "all of the above" is used, and it should be used [...] answer [...] occasions, but never [...]

The response "none of the above" is also sometimes used, either as the intended answer or as a distracter. It is particularly [...] arithmetic or [...] where the distinction between [...] spelling and [...] unequivocal. But this [...], like "all of the above," [...] thoroughly [...] usage.

Which word is misspelled?
Here are examples of correct [...] of these responses.

What does the term growth mean?

[...]

The overuse of both of these response alternatives probably derives from the misconception that all multiple-choice items should have at least four (or five) response alternatives. These phrases are used as filler when the item writer encounters difficulty in finding a sufficient number of distracters. In such circumstances, the overuse of each becomes a clue to the testwise, underprepared student who recognizes that "all of the above" or "none of the above" is [...] the correct answer when it does appear. As was pointed out in the previous section, there is no compelling reason for all items in a test to have the same number [...]

The use of distracters that are less difficult than the correct answer is sometimes criticized because it permits a student to respond successfully by eliminating incorrect responses. However, students who can respond successfully on this basis usually have more knowledge than those who cannot. Hence the [...] of an item is not impaired by this characteristic. Of course, [...] or highly implausible [...] contribute little or nothing to the effectiveness of a test item.

Which of the following has helped most to increase the average length of human life?
a. [...]
b. Avoidance of overeating
c. Wider use of vitamins
*d. Wider use of inoculations

Some teachers may feel that the abilities of some of their students cannot possibly be underestimated, but they should not let this feeling [...] to employ such an unreasonable distracter as response a.

A lack of parallelism in the al[...] prepared examinees to the correct answer [...] writers to express the correct answer [...] the other alternatives. Sometimes the [...] sive than any distracter. At other times [...] correct answer, allowing some students
[...] vaguely that they had encountered [...] examples of items that provide unwanted clues [...]

How did styles in women's clothing in 1950 differ most from those in 1900?
a. They showed more beauty.
b. They showed more variety.
c. They were easier to clean.
*d. They were easier to live in, to work in, to move in, and were generally less restrictive.

The greater detail used in stating the correct response makes it undesirably obvious.

History tells us that all nations have enjoyed participation in
[...]
*c. physical training of some sort.
[...]

Response c obviously provides a more reasonable completion to the stem than [...] other responses. [...] represents a [...] one of the dangers inherent in the use of incomplete-statement item stems.

All these irrelevant clues to the [...] and should be avoided. It is entirely [...] in the distracters to mislead the testwise [...] the relevant clues, those useful to well[...] irrelevant clues is an important skill in [...]

Reducing Complexity

In some medical and health-related content areas it has become popular to group answer choices and have examinees select [...] choices. Such items have been referred to as "multiple multiple-choice," "complex multiple choice," and "K-type items." Here is [...]

A form of exercise that is intended mainly to help build endurance is
[...] jogging.
[...] lifting weights.
[...]

[...] particular combinations of choices offered [...] examinees express greater preference for [...] true measure of what they know. (See [...] and Whitney, 1977; Albanese and Sa[...]; [...]er and Frisbie, 1989; and Haladyna and [...] these generalizations.)

[...] described in the previous chapter, over[...] multiple-choice format noted above. [...] scores, it samples the content domain [...] shown a distinct preference for it (Frisbie and Sweeney, 1982). There seems to be no logical or empirical basis for the continued use of complex multiple-choice items.

SUMMARY PROPOSITIONS

1. The most highly regarded and widely used form of objective test is the multiple-choice form.
2. Critics of multiple-choice items tend to exaggerate both the number of faulty items that appear on tests and the seriousness of the consequences of [...]
3. The important aspects of educational achievement that can be measured by objective tests are largely identical with those that can be measured [...]
4. A student who selects the correct response to a good multiple-choice item by eliminating responses she or he knows are incorrect demonstrates achievement of relevant subject matter.
5. Multiple-choice items should be based on sound, significant ideas that can be expressed as independent and meaningful propositions.
6. The stem of a multiple-choice item should state or clearly imply a specific direct question.
7. A multiple-choice item calling for a best answer can be as effective as one that contains only one absolutely correct answer.
8. Good multiple-choice items can be based on matters of opinion if most experts share that opinion or if the authoritative source is specified.
9. A good multiple-choice item ordinarily should not ask for the examinee's opinion.
10. Items testing recall of incidental details of instruction or special organizations of subject matter ordinarily are undesirable.
11. The item stem should pose the essence of its question as simply and accurately as possible.
12. Item stems including the word not, asking in effect for an incorrect answer, tend to be both con[...]
13. The stem of a multiple-choice item should be ex[...]
[...]
17. All the responses to a multiple-choice item should be parallel in type of content, grammatical structure, and general appearance.
18. The responses to a multiple-choice item should be expressed simply enough to make clear the essential differences among them.
19. The responses to a multiple-choice item should be listed rather than written one after another in a compact paragraph.
20. While most multiple-choice items provide at least four alternative responses, good questions can be written using only two or three alternatives.
21. There is no compelling reason for all multiple-choice items in a test to have exactly the same number of response alternatives.
22. The responses "none of the above" and "all of the above" are appropriate only when the response choices given to the question are absolutely correct or incorrect (as in spelling or arithmetic).
23. The distracters in a multiple-choice item should be definitely less correct than the answer but plausibly attractive to the uninformed.
24. The intended answer to a multiple-choice item should be clear, concise, correct, and free of [...]

QUESTIONS FOR STUDY AND DISCUSSION

1. What is the most serious limitation of the multiple-choice format for measuring achievement (in your opinion), and how could that limitation be overcome?
2. How does the process of elimination, inherent in the multiple-choice format, contribute to less valid scores when objectives-referenced rather than norm-referenced interpretations [...]
3. Why is it preferable for the stem of a multiple-choice item to be written as a question rather than an incomplete sentence?
4. What are some advantages to the item writer of being able to use a "best" answer rather than an "absolute correct" answer?
5. How can the difficulty of a multiple-choice item for a given group be altered without changing the basic underlying proposition being measured?
6. What are some potential drawbacks to using compound choices (for example, A and B; A, B, and C; and so on) as multiple-choice alternatives?
7. Under what circumstances might the use of "none of the above" contribute most to obtain[...]
8. How can the use of multiple true-false items in place of multiple choice improve the measurement of achievement?

10
Other Objective-Item Formats

SHORT-ANSWER ITEMS

A short-answer test item aims to test knowledge by asking examinees to supply a word, phrase, or number that answers a question or completes a sentence. Completion and fill-in-the-blank are other common labels for short-answer items. Here are several examples:

(1) Who discovered the insulin treatment of diabetes? [...]
(2) The name of the holy city of Islam is [...]
(3) In what year was the Battle of Hastings fought? A.D. 1066

What is the common name of each of these chemical substances?
(4) CaCO3 [...]
(5) NaCl [...]
(6) C12H22O11 sugar
(7) NaOH [...]
(8) NH3 [...]

Items 4 through 8 constitute a cluster of similar short-answer items based on the [...]

Short-answer items deal mainly with words and numbers. They ask for names of persons, places, things, processes, colors, and so forth. They may also ask for English words, foreign equival[...] shorthand, mathematics, chemistry, [...] include numbers representing dates, dist[...] for a phrase, it is usually something sh[...] "spontaneous combustion" or "discovery of Am[...]" [...] somewhat longer responses, for example, "Give three reasons why ..." or "List the traits of ...," is classified as a short essay question, rather than a short-answer [...]

This means that short-answer items test mainly for factual information. As the foundation of all reliable knowledge, facts constitute an important substratum. But there is much, much more to knowledge than the facts that can be reported in single words, short phrases, or numbers. What short-answer items can test is much more limited than what true-false or multiple-choice items can test. Thus, while any short-answer item can be converted to a true-false or multiple-choice item,
only a few true-false items can be converted to the short-answer form.

Short-answer items are very much less affected by guessing than are true-false or multiple-choice items. They also are supposed to test recall rather than recognition, which in the eyes of some instructors makes them more demanding and more valid as tests of achievement. However, as we have already seen, not only is blind guessing a rather rare phenomenon, but the harm that it can do to the score on a reasonably good, reasonably long test is actually rather slight. And in response to the contention that recall is a more strenuous mental process than recognition, it may be said that good choice-type items seldom can be answered by simple recognition. In fact, they are more likely than are short-answer items to test understanding and to require reflective thinking.

Despite these limitations, short-answer items have a place in educational [...] many separately scorable responses per page or per unit of testing time. And if the group to be tested is reasonably small, the scoring that must be done by the teacher or a competent aide is not unreasonably burdensome. [...] popular in the primary [...] and justifiably [...] basic vocabularies are being built in sub[...] arithmetic, and in those parts of science where [...] symbols must be learned. When used simply, [...] examinees have identifiable reading or writing problems [...]

Writing Short-answer Items

1. Word the question or incomplete statement carefully enough to require a single, unique answer. A common problem with short-answer items is that a question that the item writer thought would call for answer A elicits from some of the examinees equally defensible answers B, C, or D. For example, the question "What is coal?" to which the intended answer was "a fuel," might also elicit such answers as "petrified vegetable matter," "a burning ember," or "impure carbon." To prevent this dual ambiguity (indefiniteness in what is tested and consequent difficulty in scoring) the question should be reworded so as to elicit a more specific answer.

For what purpose is most coal used?
From what substance was coal formed?
What name is applied to a glowing coal in a fire?
Coal consists mainly of what chemical element?

2. [...] write a question to which that answer is the [...] short-answer question should be on the [...] answer in mind and word their ques[...] succeed in avoiding indefiniteness and irrel[...] An inferior method of obtaining short-answer items is to find a textbook sentence from which a word [...] make a short-answer item. For example:

Thunderstorms form when columns of ______ air rise to cooler altitudes.

Possible correct answers to this item include "warmer," "lower," and "moist." This example also serves to illustrate the next three suggestions for writing short-answer [...]

3. If the item is an incomplete sentence, try to [...]

The name of the holy city of Islam is ______.
What is the name of the holy city of Islam?

However, answers to the question:

Why did the United States declare war on Japan in 1941?

are likely to be more variable and somewhat longer than completions of the sentence:

The immediate cause for the U.S. declaration of war on Japan in 1941 was the bombing of Pearl Harbor.

5. Avoid unintended clues to the correct answer. The word cooler in the item on thunderstorms suggests that the air before it rose must have been warmer. Or consider this item:

Steamboats are moved by engines that run on the pressure of ______.

It takes little knowledge or insight to guess that the correct answer to this item must be steam. For what purpose is a question like this one being asked at all? Focusing on the answer before writing the question is likely to result in more important questions that have more specifically unique answers.
It is also important to remember that questions written with a specific answer in mind are likely to be more relevant and more concise than sentences lifted from text material.

Another common but unwanted clue helps the examinee determine the length of the intended responses. Each blank used in a set of short-answer items should be exactly the same length. The short-answer directions should indicate if only a single word or if either a word or phrase may be used as a valid response. Consider this item:

The names of the two rivers that meet at Cairo, Illinois, are the ____________ and the ____.

The long blank to accommodate "Mississippi" and the short blank intended for "Ohio" make this item easier for all, but particularly for students who are unsure [...]

6. Word the item as concisely as possible without losing specificity of response. Clear ideas are expressed in concise statements or questions. Excess words waste the examinee's time and may confuse the idea to be expressed.

7. Arrange space for recording answers in the right margin of the question page. This not only makes the items easier to score, which is its main justification, but also encourages the use of direct questions or placement of blanks at the end of incomplete sentences.

8. Avoid using a conventional wording of an important idea as the basis for a short-answer item. Use of the usual wording may encourage and reward study to memorize rather than to understand. For example:

Gain or loss divided by the cost equals the gain or loss in ______.
Two lines perpendicular to the same line in the same plane are ______.

Better versions of these items would be:

To determine the percent of gain on a transaction, by what must the actual gain be divided?
If two lines are drawn perpendicular to the same line on a
sheet of paper, the two lines [...]

MATCHING ITEMS

Matching-test items occur in clusters composed of a list of premises, a list of responses, and directions for matching the two. In many clusters the distinction between premises and responses is simply in the names given to them. The two lists can be interchanged without difficulty. In other clusters, such as the following example, it is convenient to use descriptive phrases as the premises and shorter names as responses.

Directions: On the blank before each of the following contributions to educational measurement, place the letter that precedes the name of the person responsible for it.

Premises
16. Developed the Board of Examiners at the University of Chicago
17. Developed high-speed electronic test-processing equipment
14. Published the first textbook on educational mea[...]

Responses
[...]
E. F. Lindquist
E. L. Thorndike
[...]

A wide variety of premise-response combinations can be used as the basis for matching test items: dates and events; terms and definitions; writers and quotations; quantities and formulas; color samples and names of colors; and [...] structures can be matched to parts shown on a [...] of the animal [...]

Closely related to the matching test item is the classification or key-list item. Responses for this item consist of a list of classes such as the parts of speech, periods of history, classes of plants or animals, types of chemical reactions, cause-effect sequences, branches of government, or nations or states. The premises consist of names, descriptions, or examples that are to be classified among the responses provided. Here is an illustration:

Directions: After each event in the list below, put the number
1. If it happened before the birth of Christ (4 B.C.)
2.
If it happened after the birth of Christ but before the Magna Carta was signed (A.D. 1215)
3. If it happened after the Magna Carta was signed but before Columbus arrived in America (1492)
4. If it happened after Columbus arrived in America but before the Declaration of Independence
5. If it happened after the Declaration of Independence (1776)

37. Eruption of Mt. Vesuvius 2
38. Gutenberg Bible printed 3
39. Pilgrims landed at Plymouth [...]
40. William Shakespeare was born [...]

Apart from the use of classes or categories as responses, the key-list, or classification, item differs from typical matching items in that the same response is "matched" to more than one premise, and the number of premises is usually greater than the number of responses. In typical matching items there are more responses than premises.

Matching items have something in common with multiple-choice items in offering explicit alternative answers. They also have something in common with short-answer items: they are usually limited to specific factual information (names, dates, labels, and so on). They are poorly suited for testing under[...] poorly adapted for testing unique ideas. [...]
[...] however, [...] using it effectively [...]

13. [...]
14. [...]
15. [...]
[...] dark, hard wood [...] tool for smoothing [...] 12" x 12" x 1' [...]

3. Do not attempt [...] matching [...] to each of the premises. In perfect [...] away by the other matches. Or if an err[...] in another. Thus the [...] than premises [...] matching.

4. Provide directions that clearly explain [...]

5. Arrange responses or premises, or both, in [...] premise and its responses together [...] their sequence may [...]. Rearranging one or both lists [...]. However, if a logical order exists [...] or dates [...] preserving that order will [...] examinee's task.

6. If the responses are numerical quantities, arrange them in order from low to high.

7. Use the longer phrases as premises, the shorter as responses. Both of these actions will tend to simplify the examinee's task in finding the [...] and eliminate irrelevant difficulty.

NUMERICAL PROBLEMS

[...] presented as multiple-choice test items, they [...] form. Numerical problems provide the [...] arithmetic and other branches of [...]. [...] the problems can easily be produced by [...] in which they are presented, and these [...] test understanding in contrast to mere [...] and hence [...] easy to score, even in short[...] numerical problems as short-answer test items. But sometimes there
are minor difficulties in using them. For example, how much partial credit should be given if the process is correct but the answer is incorrect because of computational errors?

1. Use the simplest numbers possible. The purpose of the item is to test understanding of the problem-solving process rather than computational skill.

1a. How many circles of radius 1 5/8" can be obtained from an 8 1/2" by 11" sheet of paper? ____
1b. How many circles of radius 1" can be obtained from an 8" x 12" sheet of paper? ____

Clearly the same problem-solving ability is measured by both items, but the computation required by the second is simpler.

2. If possible, choose the given quantities so that the answer will be a whole number. To do this will help to avoid uncertainty about how far a decimal fraction should be carried.

2a. What is the area in square feet of the largest rectangle that can be formed from an isosceles trapezoid having bases of 20" and 35" and altitude of 17"?
2b. What is the area in square feet of the largest rectangle that can be formed from an isosceles trapezoid having bases of 24" and 30" and altitude of 18"?

The numbers in the first problem yield a rounded response of 2.36, which also could be expressed as 2.4 or 2. The second problem requires the same thought processes but yields a correct response of 3 square feet and requires no rounding.
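As a rough check on the arithmetic in items 2a and 2b, the following Python sketch computes both areas. It is only an illustration and not part of the original items. It assumes the largest rectangle has one side resting on the longer base of the trapezoid, and the function names are hypothetical.

```python
from fractions import Fraction


def largest_rectangle_sq_in(short_base, long_base, altitude):
    """Area in square inches of the largest rectangle with one side on the longer base.

    Assumption (not stated in the items): the rectangle rests on the longer
    base of the isosceles trapezoid and its top corners touch the slanted sides.
    """
    short_base, long_base, altitude = (
        Fraction(short_base), Fraction(long_base), Fraction(altitude)
    )
    if long_base == short_base:
        # The trapezoid is already a rectangle.
        return long_base * altitude
    # At height h the available width shrinks linearly from long_base to short_base.
    # Area h * width(h) is a downward parabola with its vertex at vertex_height.
    vertex_height = long_base * altitude / (2 * (long_base - short_base))
    height = min(altitude, vertex_height)
    width = long_base - (long_base - short_base) * height / altitude
    return height * width


def sq_in_to_sq_ft(area_sq_in):
    """Convert square inches to square feet (144 square inches per square foot)."""
    return Fraction(area_sq_in) / 144


area_2a = sq_in_to_sq_ft(largest_rectangle_sq_in(20, 35, 17))
area_2b = sq_in_to_sq_ft(largest_rectangle_sq_in(24, 30, 18))

print(f"Item 2a: {float(area_2a):.2f} square feet")  # 2.36, which needs rounding
print(f"Item 2b: {float(area_2b):.2f} square feet")  # 3.00, a whole number
```

With the quantities in 2a the examinee must decide how far to carry the decimal; with those in 2b the same reasoning gives exactly 3 square feet.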
3. Specify the degree of precision expected in the answer. If examinees are uncertain about what they are being required to do, and if they guess wrongly, the measurement of what they are able to do will be made less accurate.

4. If a fully correct answer must specify the unit of measure in which it is expressed, tell the examinee this is part of the problem. It is easy for a distracted examinee to forget to write the units in which an answer is expressed. If knowing what the units should be is an important part of the problem, ask for them separately.

4a. What number expresses the intensity of illumination on this surface? ____
4b. In what unit is this illumination intensity expressed? (foot-candles)

5. If possible, divide a single complex multiple-step problem into a number of simpler single-step problems. It is a mistake to believe that the more complex the problem, the better it will test the examinee's ability. Usually the reverse is true. Any complex problem involves a number of procedural choices, as well as a number of quantitative calculations. Each of these can be made the basis for a separate test item. Success in solving the whole problem involves nothing more than success in making the separate choices and calculations. Consider these two items. The first is relatively complex, but the second is more efficient and likely will contribute more to high reliability.

5a. Last year Marcy sold 60 cars at an average price of $2,000. Her goal this year is to sell 50% more cars. If Marcy earns a commission of 10% for each sale, how much more will she earn this year than last year if she reaches her goal?

5b. Last year Marcy sold 60 cars at an average price of $2,000. Her goal this year is to sell 50% more cars. Marcy's commission is 10% for each sale.
1. How many cars does Marcy hope to sell this year?
2. How many dollars did Marcy earn last year from her sales?
3. How many dollars does Marcy hope to earn this year?

Breaking down a complex, multiple-step problem in this way will minimize the problem of partial credit. It will result in more independent indications of achievement or lack of it. That will improve the reliability of the test scores.

6. Express the numerical problem as clearly and as concisely as possible. Clarity requires full information in simple, direct statements. Conciseness requires the elimination of nonessential material.

SUMMARY PROPOSITIONS

1. Short-answer items are used mainly to test for factual information.
2. A much wider range of achievements can be tested with true-false or multiple-choice items than with short-answer items.
3. The difficulty examinees have in producing the correct answer to a short-answer item is an advantage of limited value.
4. Short-answer items do not provide a more valid measure of real achievement than do choice-type items.
5. Short-answer items are efficient and relatively easy to prepare.
6. Short-answer items need to be conceived and written carefully to avoid the possibility of multiple correct answers.
7. In writing short-answer items, it is advantageous to think first of the answer and then write the question that will elicit it.
8. A direct question generally will result in a less ambiguous short-answer item than will an incomplete sentence.
9. Item writers should avoid lifting intact sentences from textual materials as the basis for short-answer items.
10. Matching items, like short-answer items, usually are limited to testing for factual information.
11. Matching items are efficient and useful in emphasizing relationships between ideas.
12. Short homogeneous lists should be used in any matching cluster.
13. Perfect matching of the two lists on a one-to-one basis is undesirable.
14. Directions should be explicit about the basis to be used for matching.
15. The list of responses in
a matching cluster should be presented in either scrambled or alphabetical order.

QUESTIONS FOR STUDY AND DISCUSSION

1. How might the responses to a set of short-answer items be used as the basis for developing multiple-choice items?
2. Why is it not possible to convert all true-false or multiple-choice items to useful short-answer items?
3. How can matching items be used to measure achievement beyond the recall level?
4. What can be done to reduce or eliminate the role of computational skill in testing numerical problem-solving ability?
5. What are some drawbacks associated with reducing a multiple-step numerical problem to a set of single-step problems?

11 Essay-Test Items

THE PREVALENCE OF ESSAY TESTING

Essay tests continue to be a very popular form, especially among scholars and at the higher levels of education (Coffman, 1971). There are several reasons for their popularity. One is convenience. In contrast with objective tests, essay tests are easy to prepare; the difficult part of the job is usually grading students' answers. Another is the security they provide to the examiner. Writers of essay questions are seldom required, as are composers of objective-test items, to defend the keyed answer or to demonstrate that none of the other answers is defensible. The scorer can rate answers without showing his or her own version of an ideal answer, so essay questions are seldom so readily challenged as objective-test items.

It is also quite easy for the grader to control the level and distribution of scores. Whether to allow five points, or even seven points, for a particular answer is a personal decision. Thus, no matter how difficult an essay test, the grader can adjust the scoring so that few examinees will receive scores below some chosen level. This control contributes in no small measure to the popularity of the essay test.

The distinction between essay tests and writing assessments is important. Both require an extended written response, but in a writing assessment the quality of the writing itself is what is measured. This chapter does not concern itself with writing assessment; our purpose in this chapter is to consider essay tests for measuring educational achievement. Essay tests and writing assessments differ in their purposes and scoring criteria: essay-test scores should represent achievement in the relevant content area rather than skill in written expression.

THE VALUE OF ESSAY TESTING

Some instructors deliberately choose indeterminate issues as essay questions. What the student concludes, they say, is unimportant. The evidence on which the examinee bases the conclusion and the cogency of his or her argument in support of it are said to be all-important. Essay tests are also said to measure such traits as critical thinking, originality, and ability to organize and integrate. But these traits are seldom clearly defined, and the characteristics of the answers that show which students have more and which have less of them are seldom identified explicitly. When the scores awarded to essay answers are explained, the explanation is usually in terms of deductions from a maximum possible score for some combination of these deficiencies:

1. Incorrect statements were included in the answer.
2. Important ideas necessary to an adequate answer were omitted.
3. Correct statements having little or no relation to the question were included.
4. Unsound conclusions were reached, either because of mistakes in reasoning or because of misapplication of principles.
5. Bad writing obscured the development and exposition of the student's ideas.
6. There were flagrant errors in spelling and the mechanics of correct writing.

Mistakes in the first four categories can be attributed either to weaknesses in the student's command of knowledge or to lack of clarity and specificity in the examiner's question. Mistakes in the last two categories either indicate a weakness in written self-expression or reflect the difficulties of the hand in keeping up with a mind racing ahead under the pressure of a time limit. As essay tests are typically used, the unique functions they have that are beyond the scope of objective tests seem somewhat limited and indefinite. Odell's (1927) scales for rating essay-test answers suggest strongly that the length of a student's answer may be closely related to the score it receives. Longer answers tend to receive higher ratings.

Influence of Writing Ability

Essay tests are also valued for the emphasis they place on writing. However, this is both an advantage and a disadvantage. Written expression is an important skill that essay tests do encourage. However, the practice that essay tests give in writing may be practice in bad writing: hasty, ill-considered, and unpolished. Worse, skill in writing, or lack of it, may influence the scorer's judgment regarding the content of the answer. Uniform, legible handwriting and fluent, graceful sentences can compensate for some deficiencies in content (Chase, 1979; Hughes,
Keeling, and Tuck, 1983). On the other hand, flaws in spelling, grammar, or usage can detract from the scorer's evaluation of the content.

Students occasionally use writing skill to compensate for lack of knowledge. Students who are hard put to answer adequately the question asked can transform it subtly into a related question that is easier for them to answer. If they perform well on the substitute task, the reader may not even notice the substitution. Or the student may concentrate on form rather than on content, on elegant presentation of a few rather simple ideas, in the hope that this may divert the reader's attention from the lack of substantial content.

Not all readers of essay examinations are easy to bluff. Then, too, students likely to be most in need of the kind of assistance that bluffing might give them are usually the least able to use such techniques. For this reason, bluffing on essay tests is hardly more serious a problem than guessing one's way to success on an objective test.

Influence on Examinee Preparation

That the nature of the examination expected affects the preparation students make for it is attested by experience, reason, and research (Meyer, 1935; Terry, 1933). Surveys of student opinion conducted about 50 years ago suggest that students then studied more thoroughly in preparation for essay examinations than for objective examinations. The absence of more recent research on this topic may suggest a lack of interest in the topic, a lack of awareness of the potential differences, or an implicit indication that students obviously prepare differently for the two types of tests.

It is not uncommon to see students who have been handed a multiple-choice test begin at once to write memorized notes or lists on a page of the test booklet.
These are likely the same kinds of notes these students would use in an essay test. An inspection of objective-test booklets after the completion of testing often reveals significant note making, most often in a form likely to be intelligible only to the maker.

Many potent factors other than examinations affect how and with what success students study. These factors interact in complex ways to facilitate or inhibit learning. It is difficult to demonstrate clearly which form of examination, essay or objective, has the more beneficial influence on study and learning.

RELIABILITY OF ESSAY-TEST SCORES

The most serious limitation of essay tests as measures of achievement in classroom settings is the low reliability of the scores they typically yield. Low reliability means that there is a good deal of inconsistency between scores obtained from successive administrations of the same test or equivalent tests, or from independent scorings of the same test. On the whole, there are three reasons for low reliability: (1) the limited sampling of the content covered, (2) the indefiniteness of the tasks set by the essay questions, and (3) the subjectivity of the scoring of essay answers.

In general, the larger the number of independent elements in the sample of tasks chosen for an achievement test, the more accurately performance on those tasks will reflect overall achievement in the field. It is true that a complex essay-test question often involves many separate elements of achievement. But these are dealt with as a more or less integrated whole by both the student and the grader, not as independent elements.

Few, if any, experimental studies of the sampling reliability of essay tests relative to that of objective tests have been made, because sufficiently objective scoring of essay answers is hard to obtain. But there have been some theoretical analyses that show a direct relation between the extensiveness of the sample of tasks in a test and the precision with which different levels of achievement can be differentiated. Posey (1952) demonstrated that examinees' luck, or lack of it, in being asked what they happen
"r,t, 192 TEST IIEMS ESSAY k) k!.rr rs.rrnlch e.erer facrorin r!_e 1::l flr'l:" sradethev re.ejvein a r0 resrthan in.!r,r ot 100r,.rrs 'rem rhecsk andrh. brsisforjudsins anexani :,":.:l:l :::1ll::j,rl:iions n'pr, i,' F ' n i, c r, rv" F . , , r; l, ; , ::: I ::: 1:ll'r' ".F " ;;;.;;,;. ;;:,; scorjrg dirc.tiols.di he ,ornn/r ,i.." ,.;;,.;^:, ;i",;: :l:l.i j.i :-::l,.lt::,:,{a,d,expric,, 'u.,.',uv thedjr.i,,.'.. ^0"'" ;.*"r,"1. .t"a._, H*:::.Ll:::.i.:ll.d r h€ nor e o b i c .rn ' e a n d rh ,j neasur(3nen,s obtai - l rrbl e fran an".a.i._,, essa).resr ques io n o ,,,h oues tlio ,,trbr , n ,) h .e ' e ti a b l e T h e c i a s s i .s n rd i e s .t s ra ..h an N pD k s ur em e n r o r a D e v i l u a ti o n Ir i s u s e ful m ean so n e rh i n g Id rh e p e rs L u l h o d( c let er m i n a rj a n ,b u i d ^ y s o r w e e k s !ater same thrng to rhe student rvho recelve, To rhe degree rhar orher quali6ed obs, 1l' ' 1" t 1 4 ' " " ' . scofes to rhe sams essayt€sr ansller on hcts q' ' " t /ri \rgree i n rhF rane \\r\' ' " ,,;;,,,,;. .--,;;,,;l ;;, rh ' s ..... s o ,rrd he \hi kpn i rd rh-,r A paricularly dim.ult disrincuon for studerr$ ot e.iucarional l'teasure ,erqecn /t) rhF reti rhi ri ,y ot es(av E .,r.\ el i dhi ti rv ot es(ay ,.ti nss trnm mLrl ri ote rFl ares,o rhc (otte,ri .n ot .,,| l ,.rns w e test must be used wirh .oemcient alDha The retiabitity ofessay rarin;s ,a different raters assign the sam€ reta;ve Da'n quesrion raised is, ..Does tbe score on wno does rhe scorinB?,' When muttip c ons r s r e n r a p p l y i n g l h e s c o ri n g c rirei 'n duced by ea.h rater wi be about rhe ! scores assigned by any Fair of lateB on ESSAYIEST ITEMS I93 mosrdefensibleDerhodof eslilnaring rhe .eliabili!yof ! alingsis a corretationat melhodpresenrcd by EbelLl95t) PREPARING ESSAYITEMS lm plic ir i n w h a r h a s b e e n s a i d i th i s chaprer about rhe val uesand l i D i rati ons of es s dyt e s rr d re a ru m L ,e r u t \u g g ..' 
i uns t,,r i mpr^\i ngf..ay,rpequ.\ri on, l-"Ash guesti.ou or.ler ruhs thar uitt r?quin thc shtd?ktto.tcnonstolc a eommond ot 4s ?in d r k n o u k d g 4 \u ,h !u e \rj ,, ns \.l l n,,r .rnrpt. (J Inr r.prudurri on or marerials presenrcd in the rcxrbo(,k or ctassroorn. rnsiead of bokiire exctusiveiv backward to rhe pasr course of i.srru.ridr, rhcy will also look forw"?a ," fri"." applic ati o n s o l th e rh i g s l e a n c d ' 1he quesri onsw i be based on no" el si tual trons or probtems, nor on rhe samc ones sed for insrnrcrional purposes. 2. Ash Nestiotu kat dre .letetui@te, ia the sensethat expdts could agree that one an&ta h bekel than dno.[e. ]ndere,minarc qucoons are l;t<etyto finction oniv as exercises in erpositron, Nhose relarion ro etfccrive behavio; may be quite rJ. m _n, eS u l h q u e \ri o n \q i l l p ,u b d b l !n u ' bpe\pF.;a 1 rel " .ant,oLhemeasuremenr or r s lu c l e n rs Is e l u l ri ,m l ra n d o te s(.n' ral knoqt" dge. Funhermore, dnd mo\l im no' ' an rl t, rh . a b \e n ,c u t r g n o d b . .r dn(w .r mr) mrte i r nuch morc di rfi cutL l. r n r e J d e r to j u d q e a g i rc n :ru d e n' s l eret ot dthi .,emcnr. On controtersi al q' , F s Lio n sw . h i c h m.rn v rn .l e rrrmi n a r e quer| | nn\ are. l he redder\ oD i nrons and hir qeqm a v , o n \i d e ra b l ) e a n) c\atua| | on o, rhe \rudenfs anrser ' n fl L re n i t. D.eJi"" the .mniwp\ talh 6 .onprer2t, and l{.eili.dltt 6 posibte uithout intarferins &trn n.Mfcrncrr ol thc.trhituemat iatendc.t. The qDesr,on ,houtd bi (arefriltv phrased so rhat €xamrnees fu y undersrand whar rh;y are expecred b do. Ifth; r a. k ir . D o r .l e a rl r F \i d p n r i n rh e q u s r i on i r.ett, add an expl anari on of| }l e bari s on whi. h a n rw e rsw i l l b e c v rl u a te c l .Do not al tou \rudenrs more Leedom than i s nec es v ry to m e a s u rerh e d e s i re d a . hi etemenr It rhe quesri on permi rs vari ari on in t he ex re n r a n d d c ra rl o f rh e a n s w er ei ven. bur rhi ! 
is not a relevant variable, specify about how long the answer is expected to be.

4. In general, give preference to more specific questions that can be answered more briefly. The larger the number of independently scorable questions, the more thoroughly the content domain can be sampled and, therefore, the more reliable the scores are likely to be. Narrower questions are likely to be less ambiguous to the examinee and easier to grade reliably. Occasionally, an instructor may need to base an essay test on only a few very broad questions. Such occasions are rare, and the instructor should be sure that the need is sufficient to warrant the probable loss in score reliability.

5. Avoid giving the examinee a choice among optional questions unless special circumstances make such options necessary. If different examinees answer different questions, the basis for comparing their scores is weakened. Clearly, when students choose the questions they can answer best, the range of test scores is likely to be narrower; hence the reliability of the scores would be expected to be somewhat less. Research indicates that this expectation is justified. One investigator found that examinees given a choice often omitted the questions on which they would have done best. He suggested that unless the various questions are weighted in some suitable fashion, the choice form of essay examination be discontinued. Stalnaker (1951) concluded a survey of the problems involved in the use of optional questions with these words:

No experimental evidence has been published to show that skills and abilities can be adequately sampled by the use of optional questions; on the other hand, several studies have shown that optional questions complicate
measurement and introduce factors of judgment which are extraneous to the ability being measured. For sound sampling, it is recommended that optional questions be avoided and that all examinees be asked to run the same race.

Optional questions are sometimes justified on the ground that giving students a choice among the questions they are to answer makes the test "fairer." But if all the questions involve essential aspects of achievement in a course (as they ordinarily ought to), it is not unfair to any student to require answers to all of them. Furthermore, an opportunity to choose among optional questions may help the poorer student considerably but may actually distract the well-prepared student.

Optional questions may be justifiable when a test of educational achievement must cover a broad area and when the students who take it have received unequal training in different areas. Even in such a situation, however, the advantages of using optional questions are highly dubious. Optional tests, separately scored, might be preferable to a common test, yielding a single score, based on different sets of questions.

6. Test the question by writing an ideal answer to it.
Writing the ideal answer at the time a question is drafted serves an immediate purpose. It gives the test constructor a check on the reasonableness of the question and on the adequacy of his or her own understanding. Perhaps some change in the question could make it easier, if that seems desirable, or more discriminating, which is always desirable. Also useful, if it can be arranged, is to have a colleague in the same field try to answer it. Comparison of such ideal answers might shed additional light on the question's suitability and might suggest additional ways of improving it.

The deferred purpose served by drafting an ideal answer to each essay-test question is to provide guidance and a point of reference for the later scoring of students' answers. If someone other than the instructor is to grade the questions or to help with the grading, the ideal answer is almost indispensable to uniformity in grading.

SCORING ESSAY ITEMS

The decisions to be made when selecting a method for scoring essays involve the type of score interpretation desired (norm-referenced or criterion-referenced) and the amount of diagnostic information needed about individuals' responses.

Holistic and Analytical Methods

Two common procedures are used to score essay responses: the holistic method, which depends on a general impression of overall quality, and the analytical method. The primary trait method, like some other methods used in large-scale writing assessments, is not applicable to assigning scores to essays that are intended to measure achievement in a content area. The holistic method involves assessing the overall quality of the answer; the analytical method involves assigning a score to each of the essential parts of the
answer. A holistic score can be given a norm-referenced interpretation when each answer is compared with the answers of the other students, or a criterion-referenced interpretation when each answer is compared with absolute standards, such as sample papers that represent predetermined grades. Figure 11-1 shows how the holistic method can be used to obtain scores based on relative standards. The scorer reads each answer and places it in one of three piles, depending on how it compares with the others that have been read. After the first sorting, the papers in the high stack are shuffled, reread, and sorted further. The percentages in parentheses in the figure indicate the proportion of answers expected in each group.

Figure 11-1. Sample Scoring Procedure for Holistic Norm-Referenced Scoring

The analytical method requires a grading guide that specifies the ideal answer and differentiates responses at each score value point. For example, in evaluating responses to a question about westward expansion, the grading guide may indicate that social, political, and economic impact all should be addressed and that examples of each should be given. The scoring standards might be represented on a scale like this:

5 = All 3 aspects are included, all with relevant examples
4 = At least 2 of 3 aspects are included, both with relevant examples
3 = At least 2 of 3 aspects are included, at least one with relevant examples
2 = At least 1 of 3 aspects is included, with relevant examples
1 = At least 1 of 3 aspects is included, without relevant examples
0 = No response, irrelevant response

The grading guide used with the analytical method is based on absolute standards and, thus, provides scores that have a criterion-referenced interpretation. Of course, once those scores are obtained, norm-referenced interpretations also can be made. The quality of the scores from the analytical method depends on how well the grading guide definitions have been created.

Techniques to Promote Objectivity

As has been mentioned, the effectiveness of essay tests as measures of achievement depends primarily on the quality of the scoring. The competence of the scorer is crucial to that quality, but even competent scorers may inadvertently do things that make the scores less reliable than they could be. Here are some suggestions scorers can follow if they are committed to making their scoring as objective as possible.

1. Score the answers question by question rather than student by student. This means
that the scorer will read the answers to one question on all students' papers before going on to their responses to the next question. This procedure is required with the holistic method. It is also advantageous with the analytical method, since concentration of attention on one question at a time helps to develop specialized skill and to foster independent judgment in scoring.

2. If possible, conceal from the scorer the identity of the student whose answer he or she is scoring. The purpose of this procedure is to reduce the possibility that biases or halo effects will influence the scores assigned. Ideally, the answers to different questions would be written on separate sheets of paper, identified only by a code number. These sheets would be arranged into groups by question number for the scoring process and then recombined by student name for totaling and recording. This process can reduce the halo effect associated with the student's name and reputation or with the high or low scores on that student's preceding answers.

3. If possible, arrange for independent scoring of the answers, or at least a sample of them. Independent scoring is the only real check on the objectivity, and hence the reliability, of the scoring. Since it is troublesome to arrange and time-consuming to carry out, it is seldom utilized by classroom teachers. But if a school or college were to undertake a serious program for the improvement of essay examinations, such a study of the reliability of essay-test scoring would be an excellent way to begin.

To get independent scores, at least two competent readers would have to score each question, without consulting each other and without knowing what scores the other had assigned. At least 100, preferably 300, answers should be given this double, independent reading. (The answers need not all be to the same question. Reading the answers of 30 students to each of 10 questions would be quite satisfactory.) The correlation between pairs of scores on individual questions would indicate the reliability of the ratings.

SUMMARY PROPOSITIONS

1. The popularity of essay tests is due partly to their convenience in preparation, the freedom from dispute they provide the examiner, and the control of the score distribution they afford.
2. Essay questions are less vulnerable to examinee criticism than are objective questions.
3. An essay test may permit the examiner to assess the examinee's thought processes.
4. Essay tests usually do not provide valid measures of complex mental processes such as critical thinking, originality, or ability to organize and integrate.
5. The emphasis essay tests place on the ability to write is both advantageous and disadvantageous.
6. Essay scores tend to be low in reliability because of limited content sampling, indefinite test tasks, and subjective scoring.
7. Essay scores must possess significant amounts of objective meaning to be useful.
8. Good essay questions require the examinee to demonstrate a command of essential knowledge.
9. Essay score reliability can be enhanced by making the questions specific enough so that all good answers will be nearly identical.
10. Reliability can be enhanced more by using more questions that call for short answers than by using fewer questions that require long answers.
11. Optional questions should be avoided in essay tests.
12. Advance preparation of an ideal answer to each essay item facilitates reliable scoring and permits a check on the quality of the question prior to its use.
13. The holistic method of essay scoring involves the assessment of overall quality based on either relative or absolute standards.
14. The analytical method of scoring involves assigning scores to components of a response based on absolute standards.

QUESTIONS FOR STUDY AND DISCUSSION

1. How can the grader of an essay test control the distribution of test scores?
2. How can a social studies teacher obtain a measure of students' analytical abilities with an essay test while controlling the influence of both social studies knowledge and writing ability?
3. Why is essay testing considered "bad practice in writing" by some educators?
4. What are some of the causes of low essay score reliability? What are some of the causes of low essay rater reliability?
5. Why is the use of optional essay items more problematic for norm-referenced than criterion-referenced situations?
6. How can the analytical scoring process result in norm-referenced score interpretations?
7. Why is the "primary trait" method not useful for scoring essays?
8. What purposes are served by scoring an essay test item by item rather than student by student?

12 Test Administration and Scoring

Unless the class is very large, unless the classroom is poorly suited for test administration, or unless other special problems are encountered, test administration usually is the simplest phase of the whole testing process. In the administration of standardized tests, the golden rule for the test administrator is: Follow the directions in the manual precisely. In classroom testing there is usually no such manual, and the need for rigidly standardized conditions of test administration is much less. Nevertheless, here, as in most other areas, advance planning usually pays dividends. Also, there are some persistent problems associated with test administration, such as the questions of preparing test takers, of cheating, and of guessing on objective tests. These, together with a consideration of computer-assisted testing, will provide the subject matter of this chapter.

PREPARING THE STUDENTS

Preparing the students for the test goes hand in hand with preparing the test for the students. Though each can be accomplished separately, the neglect of either certainly will result in lost effort and less valid measures of achievement. As a start, students should know that a test is coming. Any important test should be announced well in advance. If a test is to have the desirable effects in motivating and directing efforts to learn,
students need to know not only when the test is coming but what kinds of achievement the test will require them to demonstrate. This means the teacher should plan tests before the course begins, using the instructional objectives and learning materials prepared during the planning stages of instruction.

Test-Taking Skills

What are some of the legitimate and essential test-taking skills that an examinee ought to possess?

[...] responses will be scored. Will points be [...] errors in spelling, grammar, or punctuation? [...]

4. They should put themselves in the best [...] is a heavy handicap. Examinees should realize that last-minute cramming is a poor substitute for [...] effort throughout the course [...]
5. Students should pace themselves so as to have time to consider and respond to all the test questions. This means they must not puzzle too long over a difficult question or problem, or write too extensively on [...] even when a long answer seems called for.
6. [...]
7. In answering an essay question, students should take time to reflect, to plan, and to organize their answer before starting to write. They should decide how much they can afford to write in the time available. [...]
8. If the answers are to be put on a separate answer sheet, students should check frequently to be sure their mark actually indicates the response they intended and that it is marked in the spaces provided for that question.
9. If possible, examinees should take time to reread their answers, to detect and correct any careless mistakes.

It is a common misconception among teachers and students that the first answer given is more likely to be correct than a changed answer. However, research evidence indicates that [...] improve test scores when they change [...] (Mueller and Wasser, 1977; Crocker and Benson, 1980). [...] reconsider their original answers.

Since examinations do count, students and their teachers are well advised to spend some time considering how to cope with them most skillfully. Some good books on the subject, giving more detailed help than we have suggested here, are available (Millman and Pauk, 1969; Divine and Kylen, 1979; Annis, 1983).

[...] called test-wiseness. Students who are richly [...] to be able to score well on any test, [...] subject or not.
Furthermore, it is supposed [...] are better measures of students' test-wiseness [...]

There is some basis for this concern. Certain tests, especially some kinds of intelligence tests, include novel, unique, and highly specialized tasks, for example, figure analogies or number series. For test items of this nature, the main problem of the examinee is to [...] previous learning, nor is the skill developed in classroom tests. But there are common [...] an examinee to substitute test-wiseness [...] unintended clues to the correct answer [...] were discussed in the chapters on true-false and multiple-choice test items. They are outlined and discussed in greater detail in [...] many clues of this or another kind. Given a test that measures knowledge and is free of technical flaws, [...] error in measurement is likely to be due to too little, rather than too much, [...]

Test Anxiety

[...] the problem of test anxiety. Anxiety is a frequent side effect of [...] classroom, on the athletic field, in the conference room where a crucial [...]. Test-anxious examinees may fear [...] ridicule, loss of respect, or [...] teachers, parents, or friends. [...] complex and the situations in which they are [...] any simple, universal answers will be found [...] cure of test anxiety. Some research has [...] seem reasonably safe.

1. There is a negative correlation between level of ability and level of test anxiety. [...] tend to be less anxious when facing [...]
2. [...]
3. Mild degrees of anxiety facilitate and enhance test performance. More extreme degrees are likely to interfere with and depress test performance.
4. [...]
5. Test anxiety can be educationally useful if it is distributed, at a relatively low level, throughout the course of instruction, instead of being concentrated at a relatively high level just prior to and during an examination. Skillful teaching involves the controlled release of the energy stimulated by test anxiety.

McKeachie (1988) has concluded, after more than 30 years of research related to anxiety and study strategies, that the poor performance of anxious students may be due to inferior study techniques:

    I have been concerned about students whose performance is impaired by excessive anxiety, particularly anxiety about achievement tests. Our research has revealed some techniques to help these students perform better on tests. Other researchers have also developed methods of reducing anxiety, but even when such students have learned to relax and control their feelings of anxiety, their performance has not improved. Our more recent research indicates that such students perform poorly on tests not simply because they are anxious but because they are poorly prepared. Highly anxious students study, but they study ineffectively, memorizing details and reading and rereading. (p. 7)

Evidence to support the belief that some students of good or superior achievement characteristically go to pieces and do poorly on every examination is hard to find. Since individuals differ in many respects, it is reasonable to suppose that they may differ also in their tolerance of the kind of stress that tests generate. On the other hand, it is conceivable that apparent instances of underachievement on tests may actually be instances of overrated ability in nontest situations.
In other words, a student whose achievement is really quite modest may have cultivated the poise, the ready response, and the pleasing manners that would ordinarily mark the person as an accomplished and promising scholar.

TEST-PREPARATION CONSIDERATIONS

Objective tests generally are presented to students in printed or duplicated booklets. Sometimes the questions for essay or problem tests are written on the chalkboard as the test period begins. This saves duplication costs and helps to maintain test security, but it gives the teacher the double responsibility of copying the questions and of getting the students started to work on them, all at a time when minutes are precious and when everyone is likely to be somewhat anxious to begin working. Then, too, when the chalkboard has been erased, no one has a valid record of exactly how the questions were stated.

Oral dictation of test questions, especially short-answer or true-false items, can be accomplished with success, but most students prefer to look at each item while they are trying to decide on a response. This permits the student, rather than the teacher, to set the pace. Some instructors put test items on slides or transparencies and project them in a semidarkened room. This enables the examiner to pace the students and ensures that each examinee will give at least brief consideration to each item. Studies have indicated that examinees answer about as many items correctly when they are forced to hurry as when they choose their own pace (Curtis and Kropp, 1962; Heckman, Tiffin, and Snow, 1967).

With [...] measurement errors [...] some classroom [...] in which directions are printed [...] the items addressed by [...] comprehensive [...]:

1. How to use the separate answer sheet and how to write in names and ID numbers
2. How many items there are and how many pages there are in the test booklet
3. Whether notes, textbooks, [...]
4. Whether questions may [...]
5. How much time is available
6. What special directions should be followed for each of the separate types of [...]
7. How many points will be awarded for [...]
8. Whether [...]
9. What to do when finished with the test and whether the test booklet needs to be [...]

[...] of the frequency with which each [...] made. With four-choice items, [...] answer for about one-fourth of the [...]

[...] attention from instructors and educators. [...] seen in them a strong incentive for students [...] rather than for ability simply to remember [...] instructors to eschew recall-type [...] application types. In this light the [...] examination. On the other hand, [...] they bring with them to class are [...] support. Looking up facts or formulas [...] solving time. An experimental comparison [...] examination, administered as an [...] book test in another section of the [...] by Kalish (1958). He concluded that [...] affected by the examination approach [...] significantly different abilities [...] of the open-book examination:

1. Study efforts may be reduced.
2. Efforts to overlearn sufficiently to achieve full understanding may be discouraged.
3. Note-passing and copying from other students are less obvious.
4. [...] superficial knowledge is encouraged.

The take-home test has [...] test, with two important differences [...] of time, which often defeats the very [...] disadvantage is the loss of assurance [...] their own achievements. For this reason [...] as a learning exercise than as an [...] even encouraged
to collaborate in [...] The efforts they [...] sometimes achieve under these conditions can [...]

If time limits for the test [...] achievement tests, the order of presentation [...] student scores, as shown by Sax and Cromack [...] probably should be arranged in order [...] suppose that to begin a test with [...] excessive test anxiety. It also seems [...] with the same area of subject matter [...] practices improve the validity of the [...]

TEST-ADMINISTRATION CONSIDERATIONS

As we have stated earlier, the actual administration of most tests involves relatively few and simple problems. Since the time available for the test is usually limited, and seldom as long as some of the students desire, it should be used to good advantage. By giving preliminary instructions the day before the test, by organizing test materials for efficient distribution, and by keeping last-minute oral directions and answers to questions as brief as possible, the teacher can ensure that students have the maximum amount of time to work on it. Corresponding provisions for efficient collection of materials and advance notice to the students that all work must stop when time is called help to conclude the test on time and in an orderly fashion.

[...] sometimes the dividing line is hard to determine. Such questions as those stimulated by obvious but noncritical typographical errors should not even be asked. Since the process of asking and answering a question during the course of an examination is always disturbing to others, even if it is done as quietly and discreetly as possible, and since the answer to one student's question might possibly give that individual an advantage over the others, students should be urged to avoid all but the most necessary questions. Discussion of this point can well be undertaken prior to the day of the examination.
Special consideration may need to be given in settings where some examinees use English as a second or foreign language. In classroom testing situations, these students should be encouraged to ask questions related to general vocabulary or cultural situations presented in test items, information with which they may not be familiar. In some cases, special test administrations may be appropriate to permit additional testing time for slower readers. The general goal of good test administration is to present and maintain the conditions that will permit all examinees to demonstrate their true level of achievement without giving advantage to any examinee.

Reduce Opportunities for Cheating

[...] books and articles on testing. Cheating on examinations is commonly viewed as a sign of declining ethical standards or as an inevitable consequence of increased emphasis on test scores and grades. Any activity of a student or group of students whose purpose is to give any of them higher grades than they would be likely to receive on the basis of their own achievements is cheating. Thus the term covers a wide variety of activities:

1. The sidelong glance at another student's answers
2. The preparation and use of a crib sheet
3. Collusion between two or more students to exchange information on answers
4. Unauthorized copying of questions or stealing of test booklets in anticipation that they may be used again later
5. Arranging for a substitute to take an examination
6. Stealing or buying copies of an examination before the test is given or sharing such illicit advance copies with others

Although these various forms of cheating differ in seriousness, none should be viewed with indifference. The typical student has many opportunities to cheat, and the willingness to do so has been observed as early as kindergarten (Frisbie and Andrews, 1990). Some circumstances may even encourage examinees to cheat, but none justifies their doing so.
Students may conclude, not without some justification, that the ethical standards of many of their peers are not very high, at least where cheating on examinations is concerned. They may go on to infer that this fact requires them to lower their own standards or justifies them in doing so. Whatever other conditions may contribute to it, cheating would not occur if all students were to recognize that it is always dishonest and usually unfair.

Some acts of cheating are no doubt motivated by desperation. The more extreme the desperation, the more ambitious and serious the attempt to cheat is likely to be. A major factor contributing to cheating is carelessness on the instructor's part in safeguarding the examination copy before it is administered and in supervising the students during the examination.

Emphasis on grades is sometimes blamed as a primary cause of cheating. But since grades are, or should be, symbols of educational achievement, we cannot indict grading as a cause of cheating without also indicting the goal of achievement in learning. Does anyone really want to do that? No doubt most students would find it easier to resist the temptation to cheat if no advantage of any consequence were likely to result from the cheating. But refusal to recognize and reward achievement may be as effective in reducing achievement as in reducing cheating. Such a price seems too heavy to pay.

Increased use of objective tests has also been cited as a cause of cheating. The mode of response to objective tests makes some kinds of cheating easier, but the multiplicity of questions makes other kinds of cheating more difficult. No form of test is immune to all forms of cheating. The quality of a test, however, may have a direct bearing on the temptation it offers to students to cheat. Demand for detailed,
superficial knowledge encourages the preparation of crib sheets. If the examination seems to the students unlikely to yield valid measures of their real achievements, if it seems unfair to them in terms of the instruction they have received, if their scores seem likely to be determined by irrelevant factors anyway, the "crime" of cheating may seem less serious.

What cures are there for cheating? The basic cure is related to the basic cause. Students and their teachers must recognize that cheating is dishonest and unfair and that it deserves consistent application of appropriate penalties: failure in the course, loss of credit, suspension, or dismissal. Reports on the prevalence of cheating, no doubt sometimes exaggerated, should not be allowed to establish cheating as an acceptable norm for student behavior or to persuade [...] cheating is inevitable and must be accommodated as gracefully [...]

[...] instructor to avoid any conditions that [...] seats. Alternate forms can easily be [...] different order. Finally, [...] their examinations as part of their [...] will not cheat and who should not [...]

Teachers have considerable authority in their own classroom. They should not overuse it under stress or underuse it when the situation demands it. If a teacher is satisfied beyond any doubt that a student is cheating, [...]:

1. Collecting the examination materials and quietly dismissing the student from the room
2. [...] the results of the examination, or [...]
3. Bringing the incident to the attention of the school authorities [...] further action

One frequently mentioned [...] problem is the establishment of an honor [...] educational institutions of moderate [...] strong group identification and loyalty [...] depends [...] seldom arises or maintains [...] carefully and continuously. [...]
[...] honor and the honor of the [...] or by well-rehearsed tradition. The [...] honor system in such an environment [...] personal honor in a world where no [...] That such systems have worked to [...] institutions seems beyond doubt. That they [...] beyond doubt. The adoption of the [...] problem of cheating on examinations [...]

Issues of Test Security

Instructors and administrators [...] occasionally beset by rumors that [...] advance of the scheduled administration of the [...] rumors are founded on fact. More often they result from misinformation [...] anxious students are only too eager to pass along. [...] (distorted, of course) reaches the ears of the [...] one or a number of anonymous telephone calls. What is [...] rumors that a [...] test is out do begin to circulate, [...]

SCORING PROCEDURES AND ISSUES

[...] almost always arranged so that the answers can be recorded in [...]. This avoids complicating the task of responding for the [...]. Ramseyer (1969) found that the test scores of [...] to record their answers on separate answer [...] second-grade students were lowered; some [...] were unaffected. These findings followed [...] Hieronymus (1961) on this topic and have [...] answers [...] making the corrected test copy easier to use for instructional purposes.
The use of a [...] used, the answers must be recorded on an answer sheet that the machine is designed to handle. If the answers are to be recorded [...] answers should be provided near one margin [...] and minimize the possibility of errors, [...] the columns of a separate answer key [...] answers and positioning the [...] answer spaces on the test copy.

In scoring the answers recorded in test booklets, the scorer may find it helpful to mark the answers, using a colored pencil. A short horizontal line through the student's response can be used to indicate a correct response. Sometimes it is advantageous to mark all responses using, in addition to the horizontal line for correct responses, an X to indicate an incorrect response and a circle around the answer space to indicate an omitted response.

Responses are indicated on most separate answer sheets by marking one of the several response positions provided opposite the number of each item. Such answer sheets may be scored by hand, using a stencil key with holes punched to correspond to the correct responses. Transparent keys, which can be prepared on the film used to make transparencies for an overhead projector, have some advantages, as Gerlach (1966) has noted. When a separate answer sheet and a punched key are used, it is possible to indicate incorrect or omitted items by using a colored pencil to encircle the answer spaces that the student marked wrongly or did not mark at all. This kind of marking is useful when the answer sheets are returned with copies of the test for class discussion.

Most classroom tests of educational achievement are scored by the instructor.
If the test is in essay form, the skill and judgment of the instructor or of someone equally competent are essential. The task of scoring an objective test is essentially clerical and can often be handled by someone whose time is less expensive than an instructor's time and whose skill and energy are less in demand for other educational tasks.

Some school systems and colleges maintain central scoring services. Usually, these services make use of small scoring machines, several of which are now available. But even if all the scoring is done by hand, a central service has the value of fostering the development of special skills that make for rapid, accurate scoring. Institutional test-scoring services often provide statistical and test-analysis services as well, and sometimes they even offer test-duplication services that provide expert assistance in the special problems of test production and in the maintenance of test security.

Instructors sometimes use the class meeting following the test for test scoring. Asking each student to check the answers of a classmate may on occasion be a reasonable and rewarding use of class time, but often the process tends to be slow and inaccurate. A difficulty encountered by one student on one test paper may interrupt and delay the whole operation. Most important, if the student scorers are concentrating on mechanical accuracy of scoring, as they probably should be, the circumstances will not favor much learning as a by-product of the scoring.

Optical Scanning Equipment

Recent advances in computing technology have contributed to the development of an array of electronic scoring machines that are practically useful and economically accessible to school districts and colleges of all sizes. These optical scanners can be operated independently by relatively unskilled workers or they can be integrated into a variety of complex computer equipment configurations.
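The clerical side of objective-test scoring described above, whether done with a punched key or by a scoring machine, amounts to comparing each marked response with the key and tallying right, wrong, and omitted items. The following is only a minimal sketch of that tally; the function name `score_sheet`, the use of `None` to stand for an omitted item, and the sample key are assumptions made for illustration and do not describe any particular scoring machine.

```python
from typing import Optional, Sequence, Tuple


def score_sheet(key: Sequence[str],
                responses: Sequence[Optional[str]]) -> Tuple[int, int, int]:
    """Return (right, wrong, omitted) counts for one answer sheet.

    An omitted item is represented by None (an assumption of this sketch).
    """
    if len(key) != len(responses):
        raise ValueError("answer sheet and key must cover the same items")
    right = wrong = omitted = 0
    for correct, marked in zip(key, responses):
        if marked is None:
            omitted += 1      # the circled answer space in hand scoring
        elif marked == correct:
            right += 1        # the short horizontal line
        else:
            wrong += 1        # the X
    return right, wrong, omitted


# Hypothetical five-item key and one student's answer sheet.
key = ["b", "d", "a", "c", "a"]
sheet = ["b", "d", "c", None, "a"]
print(score_sheet(key, sheet))  # (3, 1, 1)
```

Keeping the three counts separate, rather than returning only the number right, preserves the information a scorer would otherwise mark on the answer sheet for class discussion.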
They may be attached to a large computer directly or they may send information to such a computer over transmission lines. They can be attached to a microcomputer or minicomputer. As a self-contained system, some scanners can read the answer sheets, compute a score for [...] sheet. Smaller machines do so at the [...] scoring of educational tests, but their [...]

Correction for Guessing

Suppose a student were to guess [...] Since there are only two possible answers [...] the student has reason to expect a score [...] knowing no less than the first but reluctant [...] answers and thus receive a zero. Without [...] first student would be higher than that of [...] should be the same. [...] for guessing, it is necessary to subtract [...] expected gain from blind guessing. Since [...] answers to every right answer, in this case [...] number of wrong responses from the number [...] guessing. If multiple-choice items list five [...] question, only one of which is correct, the expected ratio of wrong to right answers is 4 to 1, and the guessing correction would call for subtraction of one-fourth of the number of wrong answers from the number of right answers. Logic of this kind leads to a general formula for correction for guessing:

\[ S = R - \frac{W}{k - 1} \tag{12.1} \]

where

S = score corrected for guessing
R = number of questions answered rightly
W = number of questions answered wrongly
k = number of possible alternative answers equally likely to be chosen in blind guessing

It is easy to see that this formula becomes

\[ S = R - W \tag{12.2} \]

in the case of two-alternative (true-false) items, or

\[ S = R - \frac{W}{4} \tag{12.3} \]

in the case of five-alternative multiple-choice items.

[...] kind leads to a second general formula for guessing correction:

\[ S' = R + \frac{O}{k} \tag{12.4} \]

where

S' = score corrected for guessing on the basis of items omitted
R = number of items answered correctly
O = number of items omitted
k = number of alternative answers whose choice is equally likely on the basis of blind guessing

Again, it is easy to see that this general formula becomes

\[ S' = R + \frac{O}{2} \tag{12.5} \]

in the case of true-false items, or

\[ S' = R + \frac{O}{5} \tag{12.6} \]

in the case of five-alternative multiple-choice test items.

If the same set of test scores is corrected for guessing in two different ways, by subtracting a fraction of the wrong answers and by adding a fraction of the omitted answers, two different sets of corrected scores will be obtained. But, although the two sets of scores will differ in [...], the omit-corrected scores being higher in all [...] corrected scores being more variable, [...] perfectly correlated. If student A makes a higher [...] fraction of their wrong responses [...] responses, A will also make a higher score [...] of their items omitted is added to the [...] correction formula that rests on [...]. No such assumption is made in the formula for guessing correction on the basis of items omitted, and yet the two formulas yield scores that agree perfectly in their relative ranking of students. Scores corrected by subtraction may be regarded logically as too low in absolute value, just as those corrected by addition may be regarded logically as too high, but they are equally sound in relative value. With scores on tests of educational achievement, the absolute value is usually far less significant than the relative value.

It is also worth noting here that if no items are omitted, scores corrected for guessing by subtracting a fraction of the wrong responses correlate perfectly with the uncorrected scores, that is, with the numbers of right responses. This indicates that the magnitude of the effect of a guessing correction depends on the proportion of items omitted. Only if considerable numbers of items are omitted
by at least some of the students will the application of either formula for correction for guessing have an appreciable effect.

These are some considerations that should influence the test maker's decision regarding the use of a correction for guessing on objective achievement tests:

1. Scores corrected for guessing will usually rank students in about the same relative positions as do uncorrected scores.
2. The probability of obtaining a respectable score on a good objective test by blind guessing alone is extremely small.
3. Well-motivated examinees who have time to attempt all items guess blindly on few, if any, of them.
4. Seldom is any moral or educational evil involved in the encouragement of students to make the best rational guesses they can.
5. Students' rational guesses can provide useful information about their general [...]
6. If a test is timed, a guessing correction removes the incentive for slower students [...]
7. Scores corrected for guessing may include irrelevant measures of the examinee's test-wiseness or willingness to gamble.

Contrary to what students sometimes [...] correction for guessing applies [...] tends to eliminate the [...] omitting items. Test-wise students know [...] something to gain, by making use of [...] attempting to answer every item. The [...] avoid taking chances may be influenced [...] items on which his or her likelihood of [...] level (Rowley and Traub, 1977; Wood, [...]). [...] for guessing give a special advantage to [...] as measures of achievement suffers.

Differential Item Weighting

[...] response, and 0 for each omitted response.

Some test constructors believe that certain items in their test should carry [...] the more important items, items of better [...] complexity or difficulty, or items that are more [...]. Reasonable as such differential [...] rarely cause the tests to which they are applied to [...]. Nor do they ordinarily make the test a much worse measure. Like guessing [...] ([...] and Examination Service, 1982). [...] weighting scheme when [...] requested a [...] the requested weights. The rank order of [...] test score distributions and the Kuder-[...] identical. There is no obvious advantage to [...] these cases. Sabers and White (1969) [...]

[...] are probably easier to interpret. If a test covers two achievement areas, one of which is judged to be twice as important as the other, twice as many items should be written over the more important area. This generally will result in more reliable and valid measures than if an equal number of items is written for each area and those for the more important area are double weighted.

Complex or time-consuming items should be made to yield more than one response, each of which can be independently scored as right or wrong. The advantages of multiple true-false items for such situations were described in Chapter 8. Very difficult items are likely to contribute less than moderately difficult items to score reliability.
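Looking back at the two guessing corrections, S = R - W/(k - 1) (Formula 12.1) and S' = R + O/k (Formula 12.4), a short numerical check may help show why they rank students identically. Because right, wrong, and omitted items add up to the test length n, substituting O = n - R - W gives S' = n/k + ((k - 1)/k)S, so one corrected score is a linear function of the other. The sketch below verifies this for three hypothetical students; the function names, the 50-item test length, and the response counts are all invented for illustration.

```python
from fractions import Fraction


def corrected_by_subtraction(right: int, wrong: int, k: int) -> Fraction:
    """Formula 12.1: S = R - W/(k - 1)."""
    return Fraction(right) - Fraction(wrong, k - 1)


def corrected_by_addition(right: int, omitted: int, k: int) -> Fraction:
    """Formula 12.4: S' = R + O/k."""
    return Fraction(right) + Fraction(omitted, k)


# Hypothetical (right, wrong, omitted) counts on a 50-item, five-choice test.
students = {"A": (40, 10, 0), "B": (36, 4, 10), "C": (30, 8, 12)}
k, n_items = 5, 50

for name, (r, w, o) in students.items():
    s = corrected_by_subtraction(r, w, k)
    s_prime = corrected_by_addition(r, o, k)
    # Since R + W + O = n, S' equals n/k + ((k - 1)/k) * S exactly.
    assert s_prime == Fraction(n_items, k) + Fraction(k - 1, k) * s
    print(name, float(s), float(s_prime))
```

Exact fractions are used instead of floating-point numbers so that the linear identity can be checked with a plain equality. In this made-up example S' is the higher score for every student and S the more spread out, which agrees with the remark that subtraction yields lower and more variable scores while leaving relative standing unchanged.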
Giving the more difficult items extra weight lowers the average effectiveness of the items and thus lowers the effectiveness of the test.

[...] that differential weighting of responses to items might be useful in improving score reliability or validity. For example, in a question like the following:

A child complains of severe pain and tenderness in the lower abdomen, with nausea. What should the child's mother do?
a. Give the child a laxative.
b. Put the child to bed.
c. Call the doctor.

choice of the first response might result in a score of -1, of the second in a score of 0, and of the third in a score of +1. In the [...] scoring weights were determined [...]. It has also been suggested that they might be determined experimentally, so as to maximize score reliability or validity.

[Table 12-1. Effect of Differential Item Weighting Applied to Four Tests]

But in this case also [...] (Downey, 1979). Seldom have any [...] validity been found. [...] one would need to write items with [...]. Exceptions [...] be found, [...] weighting of items or of item responses [...] most tests of educational achievement [...] constructor of an educational achieve[...]

COMPUTER-ASSISTED TEST ADMINISTRATION

[...] examinee performance. It can provide [...] they respond to the last test item. The [...] test administrations seems bounded [...] computer what it can do for us.

A new and promising test administration approach, adaptive testing, uses the advantages of the microcomputer [...] that the trait being measured can be
described by a single psychological continuum and that the responses of examinees to test items can be used to place the individuals on that continuum. The computer selects from the test-item bank an item that an average examinee would be expected to answer correctly. If the test taker answers correctly, a more difficult item is chosen for the next try. If the first answer was incorrect, an easier item is chosen for the second try. Since each item in the pool has been calibrated in advance to a particular location on the continuum, the examinee's position on the continuum can be located through successive selections of easier and harder items. A chief advantage of adaptive testing over conventional testing is that only about half the number of items are needed to obtain "equivalent" results (Green, 1983). There are problems yet unresolved with adaptive testing, but its anticipated advantages (shorter testing time, adaptability to more valid types, and greater test security) make it one of the most promising developments for educational and psychological testing in the last decade of the century.

There are problems to be overcome before mass testing by computer becomes commonplace. For example, there must be a large pool of test items in the computer's bank so that, for test security or other purposes, every examinee does not receive exactly the same items that those previously tested received. The items different examinees receive must be relatively equivalent in content and difficulty; otherwise their test scores will not be comparable.
In addition, there is [...] that a bank of test items can be maintained without permitting unauthorized access to items. One computer whiz seems able to outfox the other to break security codes designed to limit access and preserve confidentiality. The old-fashioned lock and key still appear to be the safest way to store test items or booklets in preparation for test administration.

Finally, research on computer-assisted test administration has drawn attention to additional concerns. Moe and Johnson (1988) found that the terminal screen presented a variety of problems to examinees: [...] percent reported noticeable eye fatigue, 39 percent objected to [...] brightness, and 25 percent were bothered by glare. One-fourth of the examinees also complained about the lack of opportunity to review items to which they had already responded. Sarvela and Noonan (1988) pointed out that the [...] limited [...] to reconsider responses, change answers, and recover from key-entry errors adds an element of unfairness that reduces the reliability and validity of the [...].

On the plus side, 91 percent of the 315 subjects in the Moe and Johnson [...] study expressed a preference for taking an aptitude test by computer terminal rather than by conventional paper-and-pencil procedures. In addition, computer test administrations show promise for providing more valid test scores with handicapped examinees than can be obtained from paper-and-pencil tests. No writing is required and, with the availability of voice synthesizers, no reading may be needed. For those who lack the fine motor coordination required to use the keyboard of a standard computer terminal, a touch-sensitive screen [...] the monitor provides an alternative. There is no way to predict the magnitude of the impact
[...] in providing opportunities in education and employment for those who heretofore have [...] barriers. But there is every reason to [...]

SUMMARY PROPOSITIONS

1. Students should be told in advance when an important test is to be given and what the nature of the content is to be.
2. Students at all educational levels should be taught essential test-taking skills.
3. The test developer should avoid clues in the test items that enable an examinee to substitute testwiseness for command of knowledge.
4. Test anxiety is seldom a major factor in determining a student's score on a test.
5. Research has suggested that the poor test performance of anxious students may be due to incomplete learning and poor study technique.
6. Objective classroom tests usually should be presented in duplicated test booklets.
7. Many aspects of classroom management related to test administrations can be addressed effectively with thorough written instructions on the test-booklet cover sheet.
8. The position of the correct answer in multiple-choice items should be distributed somewhat evenly so that overuse or underuse of a position does not provide a clue to examinees.
9. Both open-book and take-home tests offer advantages that are outweighed by their disadvantages relative to in-class, closed-book tests.
10. There is no conclusive research evidence that supports the ordering of items in a test according to difficulty level or on the basis of subject matter.
11. The test administrator should help students to adjust their rate of work on a test according to the amount of time remaining.
12. Special test-administration procedures for classroom tests may be needed to accommodate students with language handicaps.
13. The instructor should be responsible for both the prevention and the punishment of cheating on [...]
14. The development of an honor system is not a promising solution to the problem of cheating during [...]
15. The instructor should be responsible for preserving the security of a test prior to its administration.
16. The use of separate answer sheets facilitates rapid clerical or machine scoring of objective tests.
17. Recent advances in computer technology have made test-scoring machines more readily available for use by schools in scoring classroom tests.
18. The purpose of using a guessing correction is to reduce to zero the expected score gain from blind guessing.
19. Scores may be corrected for guessing by subtracting a fraction of the wrong responses from, or by adding a fraction of the omitted responses to, the number-right score.
20. Scores corrected for guessing usually will rank the examinees in about the same order as the corresponding uncorrected scores.
21. The probability of getting a respectable score on a good objective test by blind guessing alone is [...]
22. Students should be encouraged to make rational guesses about the answers to objective-test items.
23. Giving different weights to different items in a test, or to different correct or incorrect responses within an item, seldom improves score reliability or valid score use appreciably.
24. Adaptive testing is a relatively new and promising method of computer test administration that has the potential for improving the efficiency, realism, and security aspects of the more traditional test [...]

QUESTIONS FOR STUDY AND DISCUSSION

1. What are the pros and cons of using surprise tests that are intended for summative [...]?
2. How might a student's deficiency in test-taking skills lead to achievement scores of questionable validity? How do such students cause the reliability of the scores from their class to be lower than it should be?
3. How could a student's self-reports of test anxiety be verified through other means?
4. A teacher allows 40 minutes of testing time for students whose native language is not English, but allows only 30 minutes to all other students. Does this seem like an equitable policy? Why?
5. Why do students cheat on tests instead of preparing themselves thoroughly for [...]?
6. One instructor allows students to keep their test copies when they leave the exam, and she develops a new test for the next time that exam is needed. What are the pros and cons of this procedure for present students, future students, and the instructor?
7. How can it be shown that the use of a correction-for-guessing formula does not penalize [...]?
8. What kinds of controls would a teacher need to introduce to prevent cheating by students on a computer-administered test? (Answer for individualized testing and group testing separately.)

EVALUATING TEST AND ITEM CHARACTERISTICS

[...] time [...] revise their test items [...] use. Eventually a large pool of high-quality test items should accumulate, and the ability to develop high-quality items [...] be enhanced in the process.

TEST CHARACTERISTICS TO EVALUATE

The characteristics to consider in evaluating the quality of an achievement test are the same as those to which the test developer attends in trying to build a good test. The most important of these are relevance, balance, efficiency, specificity, difficulty, discrimination, variability, and reliability. Though some of these characteristics are evaluated with different criteria for
criterion-referenced and norm-referenced measures, each characteristic is important to consider regardless of the type of score interpretation [...].

Relevance and Balance

Relevance indicates the extent [...] specifications and contribute to achieving [...] relevance of test items requires explicit criteria. Are the test specifications and [...] test reviewer to decide which items are [...] of the test purpose, does an item like this belong in this test? Items that are judged to be relevant are not necessarily of high quality, but they [...] appear to measure the abilities that the test constructor [...].

Relevance [...] be judged by an item-by-item review of test content with attention directed at these criteria:

1. [...] Does the item content [...] element of the [...] domain definition [...]? Does the item content match a pertinent instructional objective? Can the task presented by the item be found in [...] instructional materials used by examinees?
2. [...] in terms of labels [...] Relevance Guide or Bloom's Taxonomy [...] are the items written at the appropriate intellectual level? [...] of the use of knowledge, application, and problem solving [...]? Are the abilities required by each item either too far beyond or [...] the cognitive demands on which instruction has focused?
3. Extraneous abilities. To what extent does each item require knowledge, skills, or abilities outside the content domain of interest? [...] reading ability, or creativity play much of a role? How [...] background knowledge, outside the domain of instruction, must the examinee call upon to answer the item? To what extent do the
[...] of the majority culture influence item interpretation [...] or the selection of the correct answer?

Most test constructors seek balance in their tests. They hope that the items they select for their test will sample representatively [...] knowledge, skills, and understandings outlined in the plan. The table of specifications [...] developed in the planning [...]

[...]

Efficiency and Specificity

[...]

Difficulty and Discrimination

How difficult a test should be relates to the purpose for testing and the kind of score interpretation desired. A good norm-referenced test should be harder, intentionally, than a good criterion-referenced test. But how hard a test turns out to be also depends on how well students learned the content required by the test tasks. If difficulty were strictly a characteristic of the test, a given test would be equally hard or easy for every group to whom it was administered. For norm-referenced purposes,
tests that are too easy or too difficult for the group tested will produce score distributions that make it hard to identify reliable interindividual differences. Under these circumstances the goal of the test developer is to use items that will produce moderate difficulty: a mean score that is about halfway between a perfect score and the mean chance score. Thus, the ideal difficulty of a 40-item test composed of 5-option multiple-choice items is 24, halfway between 40 and 8 (one-fifth of 40). The difficulty of a test is obviously determined by the difficulty of the items that comprise it. The difficulty of an item, its p-value, is the proportion of the group that responds correctly. The ideal difficulty of a 5-choice item is 0.60, halfway between 1.00 and 0.20. Considerable skill is required by item writers to develop and manipulate item content to achieve the appropriate level of difficulty.

How difficult should tests intended for criterion-referenced interpretations be? Because the elements of the domain to be measured are made explicit by the domain definition, the notion of difficulty is built into the test specifications. In this case the item writer is not free to manipulate item content to influence difficulty directly. To the extent that difficulty is manipulated, relevance may suffer.

When a rating scale is developed to describe the absolute standards against which performance will be judged, difficulty is accounted for in describing the various scale points. The stimulus presented to the students, whether a theme prompt, a speech, or a laboratory skill, must be prepared by the evaluator to be consistent in difficulty with the demands inherent in the objectives of instruction.
For example, an impromptu speech about the detriments of smoking would be easier for a high school student than one about how water softeners work. In this case inappropriate difficulty, too hard or too easy, would contribute to a lack of relevance.

Generally, we expect tests geared to criterion-referenced interpretations to be easier, in terms of mean score, than those used for norm referencing. But it is possible for a good criterion-referenced test to yield low scores. In the criterion-referenced situation the goal is not to make tests that are hard, moderate, or easy in difficulty. Instead, the purpose is to translate the test specifications (the domain definition) into relevant test tasks. High degrees of success at translation will automatically take care of difficulty.

The ability of a norm-referenced test to discriminate between high- and low-achieving students is a function of the ability of each item to do just that. If a large proportion of the "good" students get an item right, and a small proportion of the "poor" students get it right, that item has discriminated properly and has contributed to the test purpose. Discrimination is closely related to difficulty: items that are too hard or too easy are not as capable of discriminating between high and low achievers as items of moderate difficulty.

[...] in a criterion-[...] students as long as some students [...] those items. But since the purpose [...] items that fail to discriminate [...]
[...] basis. Of course, if more low achiev[...] correctly, that item is a negative dis[...] measurement purposes.

Variability and Reliability

As long as differences in [...] purpose for testing is to identify such [...] should exhibit high variability. The [...] the more successful the test constr[...] differences in achievement. The [...] variability should be apparent. [...] score distributions with relatively [...] standard deviations [...] moderate difficulty [...] achievement and producing high [...]. Good criterion-referenced [...] piano students play without error [...] are examples [...] situations where variability is quite small, or even nonexistent.

When norm-referenced score interpretation [...] quality. [...] relevance has not been established [...] abilities quite accurately. The irrelevant [...] mostly based on correlation coefficie[...] variability in scores. When dichotom[...] made with scores, decision consisten[...]. But in criterion-referenced contexts [...] grading on the A-F scale, score reliabi[...] the traditional reliability estimate may be appropriate [...] of goodness used in norm-referenced [...] be too stringent. For example, a K-R20 of 0.[...] would no[...]
[...] but it may not be too low for certain criterion-referenced [...].

The test characteristics we have reviewed in this section are important to examine in evaluating the quality of an achievement test, [...] purpose. The evaluation of each characteristic can provide [...] regarding the ways in which the test items might be revised and improved. [...] discussion of these characteristics and the criteria for judg[...]

ITEM-ANALYSIS PROCEDURES

The analysis of student responses to objective test items is a powerful tool for test improvement and for accumulating a bank of high-quality items. The procedures in this section have been used traditionally with items from norm-referenced measures, but they can be used also for [...]. Procedures specifically designed for criterion-referenced [...] in a subsequent section. Item analysis can indicate which items may be too easy or difficult and which may fail, for whatever reason, to discriminate properly between high and low achievers. Sometimes these procedures suggest why an item has not functioned effectively and how it might be improved. But most often item analysis only identifies problems, and the item writer is left [...] for the probable causes and possible solutions.

Item analysis begins after the test has been scored. Of the many sets of analysis procedures in use, one has been chosen to illustrate how the process works. And though most microcomputers can do the calculations described below, the process is described in detail to help you develop a complete understanding of the information that results. A classroom teacher who [...] the procedures by hand would follow these six steps:

1. Arrange the scored test papers or answer sheets in score order from highest to lowest.
2. Identify an upper group and a lower group separately. The upper group is the highest-scoring 27 percent (about one-fourth) of the total group, and the lower group is an equal number of the lowest scoring of the total group.
3. For each item, count the number of examinees in the upper group that chose each response alternative. Do a separate, similar tally for the lower group.
4. Record these counts on a copy of the test at the end of the corresponding response alternatives. (The use of colored pencils is recommended.)
5. Add the two counts for the keyed response and divide this sum by the total number of students in the upper and lower groups. Multiply this decimal value by 100 to form a percentage. The result is an estimate of the index of item difficulty.
6. Subtract the lower-group count from the upper-group count for the keyed response. Divide the difference by the number of examinees in one of the groups (either group, since both are the same size). The result, expressed as a decimal, is the index of discrimination.

An Example

An illustration of the data obtained by this process for one item is presented in Figure 13-1. Answer sheets from a [...] test were available for 178 students, so the upper and lower groups consisted of the 48 students having the highest and the 48 having the lowest scores. The keyed response is marked [...]

[...] has been occurring?
a. It has been increasing. (47-24)
b. [...] due to rising rates of cancer and heart disease. (0-10)
c. It has increased for young people but decreased for older people. (0-5)
d. It has remained quite stable. (1-7)
Omits (0-2)

Figure 13-1.
Illustration of Item-Analysis Data

[...] parentheses following each response [...] (first figure) and how many of [...] the lower [...]. In the upper group, 47 chose the correct answer [...]. In the lower group, 24 chose the first response, 10 the second, 5 the third, and 7 the fourth. Two [...] omitted the item. (Note that [...] percent of [...] scorers responded to this item.)

The moderate degree of difficulty of the item is indicated by the [...] correct response [...] in the two groups [...] calculated as follows:

1. Add the two counts for the keyed response: 47 + 24 = 71
2. Divide this sum by the total number of students in both groups: 71 ÷ 96 = 0.74
3. Convert the decimal value to a percentage: 0.74 × 100 = 74

[...] value [...] of the difficulty index that would be [...] percentage of the entire group, all 178 students. [...] estimate will be quite satisfactory for [...] for smaller classes. Of course, for small [...] responses of all students to compute the [...]

[...] discrimination of the item is indicated by [...] difference in proportions of correct re[...] groups [(47 - 24) ÷ 48 = 0.48]. And each [...] attracted some responses [...] the lower group. In sum, the moderate [...] discrimination and a useful contribution to [...]

SELECTION OF THE UPPER AND LOWER GROUPS

The type of item analysis we [...] makes use of an internal criterion [...] achievement. That is, the total scor[...] criterion rather than some other [...]. In order to conclude that an [...] item, one must assume that the enti[...]. Such an assumption is ordin[...] come close enough to the mark [...] on a fairly dependable basis for distin[...] achievement. However, it must be [...] criterion can only make a test a better [...] the test a better measure of what [...] better criterion than the total score [...] external criterion. Yet an external cr[...] internal criterion unless it is truly a bett[...]

The use of total test score as [...] for item analysis has two important limi[...] both the wisdom and [...] ideal test does come closer than a[...] what that person wished to measure [...] on the test whose items are being a[...] the selection of highly dis[...] criterion, results in a test whose item[...] measures. In this sense, item analysis [...] kind of analysis and selection we [...] and might not even improve, the val[...] to the test as a whole, and this is [...] reliable, and thus probably more vali[...]

Step 3 [...] process of item analysis called for the counting [...] responses [...] the upper and lower 27 percent groups. Why 27 percent? Why not [...] fourths (25 percent), thirds (33 percent), or even halves (50 percent)? [...] 27 percent provides the best compromise between two desirable but inconsistent aims: (1) to make the [...] to make the extreme groups as diffe[...] (1939) demonstrated that when [...] upper and lower fourths or thirds. Ho[...] the intuitive feeling that 33 percent is [...] groups of larger size or that 25 perce[...] difference between the groups is greater. In each case the supposed advantage is slightly more than offset by the opposing disadvantage. The optimum [...]

Counting the Responses

The counting of responses to the items is likely to be the most tedious and time-consuming part of the analysis. However, for many classroom tests the number of papers in each extreme group may be [...], which makes the task seem less formidable. A chart can be developed that has item numbers [...] down the left side and alternatives labeled across the top. [...] response chart helps to organize the work and, if many copies
of it are duplicated at one time, a supply can be kept on hand for future tests or to share with colleagues. Often clerical staff or aides can perform the [...]. [...] response counts by a show of hands in class, as [...] using student volunteers. But neither of these [...] expected. Optical scanners and computers are the most efficient tools available for obtaining the item-analysis counts and indices. [...] scoring and computing facilities make such analysis available to teachers.

INDEX OF DIFFICULTY

Historically, two measures of item difficulty [...] of the index of difficulty is, the more difficult the item. [...] related to achievement testing.

The numerical value of the index of difficulty of a test item is not deter[...]

The Distribution of Difficulty Indices

It is quite natural to assume, as many test constructors do, that a good norm-referenced test must include some easy items to test the low achievers and some difficult items to test the high achievers. After all, it must discriminate among students over a fairly wide range of achievement levels. But the actual testing circumstances rarely warrant such an assumption. The items in most norm-referenced tests are not like a set of hurdles of different heights, all presenting the same task but varying in their difficulty.
Such norm-referenced items do differ in difficulty, but they differ also in the kind of task they present.

Suppose a class of 20 students takes a test and 12 of the students answer item 6 correctly, but only 8 of them answ[...]. [...] assumption is that any student who answere[...] answered the easier question (6) correctly, also would be expected to have missed [...] expectations are often mistaken when [...]

Table 13-1 presents data on the responses of 11 students to six test items. A plus (+) in the table represents a correct response; a zero (0) an incorrect response. In this exhibit the students have been arranged in order of ability, and the items in order of difficulty. Note that the item missed by good student B was not one of the most difficult items. Poor student J missed all the easier items but managed correct answers to two of the more difficult items.

It is possible to imagine a test that would give highly consistent results when administered to [...] particular group. [...] item and [...] student [...] would be [...] consistent [...] success by a particular student [...] particular item would practically guarantee success on all other items of the test that [...] easier for the group than that item. Correspondingly, failure on a particular item would almost guarantee failure on all harder [...] consistent. But a test showing such a deg[...] would also be characterized by much hig[...] with the same number of items. Such [...] with in practice. This is another reason [...] include items ranging widely in difficult[...]

Most item writers produce some items that are ineffective (nondiscriminating) because they are too difficult or too easy. Efforts to improve the accuracy with which [...] usually have the effect of reducing the range of item
difficulty rather than increasing it. The differences in difficulty that remain among items highest in discrimination are usually more than adequate to make the test effective in discriminating different levels of achievement over the whole range of abilities for which the test is expected to be used.

[Table 13-1. Responses of 11 Students to Six Test Items]

Some data from a simple experimental study of the relation between spread of item difficulty values, on the one hand, and spread of test scores and level of reliability coefficients, on the other, are presented in Figure 13-2. Three synthetic tests of 16 items each were "constructed" by the selection of items from a 61-item trial form of a social science test. This trial form had been administered to over 300 college freshmen and an item analysis performed to yield indices of difficulty and discrimination for each item. The items constituting the three 16-item tests were selected so as to yield tests differing widely in difficulty distributions.

[Figure 13-2]

In Test C, the items selected were concentrated in difficulty values as near the middle of the entire distribution of difficulty values as possible. In Test D, the items selected were distributed in difficulty values as uniformly as possible over the entire range of available difficulty values. In Test E, the items were selected for extreme difficulty values, including the eight easiest and the eight most difficult items.

When these three 16-item tests were scored on a set of 2[6]3 answer sheets for the 61-item tryout form, the distributions of scores displayed in the histograms of Figure 13-2 were obtained. The distributions of item difficulties are indicated by the tally marks along the vertical scales to the left of each histogram. Note the inverse relation between the spread of item difficulties and the spread of test scores. The wider the dispersion of difficulty values, the more concentrated the distribution of test scores. Note, too, the very low reliability of scores on the test composed only of very easy and very difficult items, and the somewhat higher reliability of the scores from those tests composed of items more nearly in the midrange of difficulty. In short, the findings of this study support the recommendation that items of middle difficulty be favored in the construction of achievement tests.

INDEX OF DISCRIMINATION

Upper-Lower Difference Index

The index of discrimination that results from step 6 was first described by Johnson (1951). Since then it has attracted considerable attention and approval. It is simpler to compute and to explain to others than such other indices of discrimination as the point-biserial correlation, biserial correlation, Flanagan's coefficient (Flanagan, 1939), and Davis's coefficient (Davis, 1946). It has the very useful property, which most of the other correlation indices lack, of being biased in favor of items of middle difficulty. As we have already seen, it is precisely these items that provide the largest amounts of information about differences in levels of achievement and that thus contribute most to score reliability. If the primary goal of item selection is to maximize reliability, as it should be for norm-referenced tests, the items having highest discrimination in terms of this index should be chosen. Item difficulty need not be considered directly in item selection,
since no item that is much too difficult or much too easy can possibly show good discrimination when the upper-lower difference index is used.

Item discrimination indices of all types are subject to considerable sampling error (Pyrczak, 1973). The smaller the sample of answer sheets used in the analysis, the larger the sampling errors. An item that appears highly discriminating in one small sample may appear weak or even negative in discrimination in another small sample. The values obtained for achievement-test items are also sensitive to the kind of instruction the students received relative to the item. Hence the use of refined statistics to measure item discrimination seldom seems warranted.

But even though one cannot determine the discrimination indices of individual items reliably without using large samples of student responses, item analysis based on small samples is still worthwhile as a means of overall test improvement. How much better a revised test composed of the most discriminating items can be expected to be will depend on how large the samples and how small the sampling errors are.

Biserial and Point-biserial Indices

The biserial and point-biserial correlation coefficients are presented as discrimination indices in some item-analysis reports generated by a computer. Their computation is too complex and time consuming to warrant our attention, but because they are popular indices of discrimination, it is worth comparing each with the upper-lower difference index discussed above.

The biserial correlation coefficient describes the relationship between two variables: score on a test item and score on the total test for each examinee.
High positive correlations are obtained for items that high-scoring students on the test tend to get right (item score = +1) and low-scoring students on the test tend to get wrong (item score = 0). Such items are interpreted to be high in discrimination. Negatively discriminating items show the opposite relationship: Most students with high test scores have scores of zero on the test item and many with low test scores have scores of +1 on the item. The point-biserial correlation coefficient differs from the biserial coefficient computationally and theoretically, but for item-analysis purposes the two can be interpreted in essentially the same manner.

When both are computed with data from the same test item, the biserial coefficient will yield a value that is always at least one-fourth larger than the point biserial (Guilford, 1965, p. 321). Neither coefficient is as biased in favor of items of moderate difficulty as is the case with the upper-lower index. Thus, it is possible to obtain relatively high point-biserial or biserial discrimination indices for very hard or very easy items. This point is worth remembering when selecting items on the basis of their discrimination indices to build a test or to determine which items may be in need of revision.
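
The indices discussed in this section can be made concrete with a short computational sketch. The Python code below is only an illustration under stated assumptions, not a procedure given in the text: the function names (`split_groups`, `upper_lower_indices`, `point_biserial`) and all of the sample numbers are hypothetical. It assumes that items are scored 0 or 1, that the upper and lower groups are of equal size (27 percent each by default), that item difficulty is expressed as the proportion of correct responses in the two groups combined, and that the point-biserial coefficient is computed with the usual textbook formula.

```python
from statistics import mean, pstdev


def split_groups(total_scores, fraction=0.27):
    """Return (upper, lower) lists of examinee positions ranked by total score."""
    n = len(total_scores)
    k = max(1, round(n * fraction))  # size of each criterion group
    ranked = sorted(range(n), key=lambda i: total_scores[i], reverse=True)
    return ranked[:k], ranked[-k:]


def upper_lower_indices(upper_correct, lower_correct, group_size):
    """Difficulty and upper-lower discrimination from counts of correct responses.

    Assumes both groups contain `group_size` examinees.
    """
    p_upper = upper_correct / group_size
    p_lower = lower_correct / group_size
    difficulty = (p_upper + p_lower) / 2   # proportion correct, groups combined
    discrimination = p_upper - p_lower     # upper-lower difference index
    return difficulty, discrimination


def point_biserial(item_scores, total_scores):
    """Point-biserial correlation between a 0/1 item score and the total score."""
    p = mean(item_scores)                  # proportion answering correctly
    sd = pstdev(total_scores)              # population standard deviation
    if p in (0, 1) or sd == 0:
        raise ValueError("coefficient is undefined without variation")
    right = [t for x, t in zip(item_scores, total_scores) if x == 1]
    wrong = [t for x, t in zip(item_scores, total_scores) if x == 0]
    return (mean(right) - mean(wrong)) / sd * (p * (1 - p)) ** 0.5


# Hypothetical item: 91 of 100 "good" and 33 of 100 "poor" students correct.
difficulty, discrimination = upper_lower_indices(91, 33, 100)
print(round(difficulty, 2), round(discrimination, 2))  # 0.62 0.58

# Hypothetical class of eight examinees: item scores and total test scores.
item = [1, 1, 1, 0, 1, 0, 0, 0]
totals = [9, 8, 7, 6, 5, 4, 3, 2]
print(split_groups(totals))
print(point_biserial(item, totals))
```

With a class this small the resulting values would be subject to large sampling errors, as noted earlier; the sketch is meant only to show how the counts enter each index.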
ITEM SELECTION

One of the two direct uses that can be made of indices of discrimination is in the selection of the best (that is, most highly discriminating) items for inclusion in an improved version of the test. How high should the index of discrimination be? Experience with a wide variety of classroom tests suggests that the indices of item discrimination for most of them can be evaluated in these terms:

0.40 and up     Very good items
0.30 to 0.39    Reasonably good but possibly subject to improvement
0.20 to 0.29    Marginal items, usually needing and being subject to improvement
Below 0.19      Poor items, to be rejected or improved by revision

[...] effort should be made to secure [...] the higher [...] will always [...]

$s^2 = (\sum D)^2 / 6$

This formula indicates that the score [variance is proportional to the] square of the sum of the discrimination [indices]. [...] that the larger the score variance [the higher the] reliability of the scores, the formula [...] value of the discrimination indices.

Of course, discrimination [should not be the sole basis] for selecting the items for a norm[-referenced test]. [...] [bal]ance while maximizing reliability is [...] that correspond to the content area [...] items can be [...] on a previous administration. [...] items can be selected until the nu[mber] [...] are obtained.

ITEM REVISION

The second use that can be made of in[dices of discrimination] [...] [se]lected there appeared to be some that c[ould be improved]. [After mak]ing revisions, the items were tried out [with a new group of stu]dents and reanalyzed. Results of the [...] in the following paragraphs. [...]

The first item deals [with] the distinction between the terms climate and weather.

37%   What, if any, is the distinction between climate and weather?
0.13    a. There is no important distinction. (1-6)
        b. Climate is primarily a matter of temperature and rainfall, while weather includes many other natural phenomena.
           (33-53)
       *c. Climate pertains to longer periods of time than weather. (43-30)
        d. Weather pertains to natural phenomena on a local rather than a national scale. (23-11)

This item is somewhat difficult for the group tested (only 73 correct responses [...]) and [...] (only 13 more good than poor students [...]). Examination of the response counts indicates that response b was attractive to a considerable number of good students and that response d was more attractive to good students than to poor. Since the stem of the question seemed basically clear and since the intended correct response seemed reasonably satisfactory, the item writer concentrated on changing distracters b and d. It appeared that response b could be made less attractive by making it simpler and somewhat more specific. Since response d seemed much too plausible to the better students in the group being tested, it was "spoiled" by substituting a more obviously incorrect response. The revised item (revisions in uppercase letters) reads:

62%   What, if any, is the distinction between climate and weather?
0.58    a. There is no important distinction. (2-22)
        b. CLIMATE IS PRIMARILY A MATTER OF RAINFALL, WHILE WEATHER IS PRIMARILY A MATTER OF TEMPERATURE. (3-25)
       *c. Climate pertains to longer periods of time than weather. (91-33)
        d. WEATHER IS DETERMINED BY CLOUDS, WHILE CLIMATE IS DETERMINED BY WINDS. (4-20)

Analysis data of the revised item reveal that the revisions were effective. The changed item is much easier and much more highly discriminating than the original. Only nine of the good students chose distracters.
Equally important is the fact that these revisions did not appreciably increase the number of poor students choosing the correct response. It is interesting to note that on the second tryout the number of poor students who chose response a increased markedly, even though this response had not been altered.

The next item deals with the common misconception that meteors are "falling stars."

36%   Do stars ever fall to the earth?
0.35    a. Yes. They may be seen often, particularly during certain months. (12-28)
        b. Yes. There are craters caused by falling stars in certain regions of the earth. (30-[...])
        c. No. The earth moves too rapidly for its gravitational force to act on the stars. (6-11)
       *d. No. The falling of a single average star would destroy the earth. (53-18)

This item again is somewhat too difficult, though its discriminating power is fairly good. The item might be made somewhat easier by revising response b. This response can be legitimately criticized as "tricky" because there are meteor craters. Hence in the revision, this response alone was changed.

42%   Do stars ever fall to the earth?
0.56    a. Yes. They may be seen often, particularly during certain months. (20-60)
        b. NO. PLANETS LIKE THE EARTH HAVE NO ATTRACTION FOR STARS. (1-11)
        c. No. The earth moves too rapidly for its gravitational force to act on the stars. (9-14)
       *d. No. The falling of a single average star would destroy the earth. (70-14)

Note that the difficulty of the item improved only slightly, but the change obviously spoiled the attractiveness of the second response. However, the change did not increase the proportion of poor students choosing the correct answer. Apparently, most of them shifted to response a, which had not been modified.

The next item attempted to deal with the relationship between the number of time zones spanning a geographic area and the size of that area.

23%   There are eleven time zones in the U.S.S.R. This fact indicates that
0.09    a. much of the area of the U.S.S.R. is above the Arctic Circle. (12-26)
        b. the U.S.S.R. is wider (east-west) than it is long (north-south). (56-40)
       *c. the U.S.S.R. occupies a large geographic area. (27-18)
        d. some areas of the U.S.S.R. are above the equator and some are below the equator. (5-16)

This item is much too difficult and is very low in discrimination. The major problem appears to be with choice b. It was a very attractive choice overall, but more attractive to good students than to poor ones. A new second response was written that was expected to be less closely related to the idea expressed by the keyed [response].

49%   There are eleven time zones in the U.S.S.R. This fact indicates that
0.58    a. much of the area of the U.S.S.R. is above the Arctic Circle. (4-32)
        b. MOST OF THE AREA OF THE U.S.S.R. IS IN THE EASTERN HEMISPHERE. (11-25)
       *c. the U.S.S.R. occupies a large geographic area. (78-20)
        d. some areas of the U.S.S.R. are above the equator and some are below the equator. (7-23)

This revision improved both the difficulty level and discrimination of the item markedly. Most good students were able to decide on the correct response, but it appears that poor students distributed themselves nearly evenly across all four responses, much as would be expected if the examinees were blindly guessing.

The next item deals with causes of shortage in the ground water supply.

48%   Water shortages in many localities have been caused by which, if any, of these factors?
0.17    a. Removal of natural plant cover allowing faster run-off into streams (17-13)
        b. Increased demands for water in homes, businesses, and industry (15-26)
        c. Neither a nor b (12-22)
       *d. Both a and b (56-39)

This item is of appropriate difficulty but not highly discriminating. In this
case it appears that the fault might lie [...]. The [...] the question stem is framed in such a way that each response [...]; hence it was necessary to include each of these as a single, supposedly incorrect response and to make "both" the correct response. This approach is apparently [...]. In the revision, one of the correct responses was moved to the stem of the item and three bona fide distracters were provided as follows:

53%   WHAT FACTOR, OTHER THAN INCREASED WATER USE, HAS BEEN RESPONSIBLE FOR WATER SHORTAGES IN MANY LOCALITIES?
0.62    a. RESTRICTION OF STREAM FLOW BY HYDROELECTRIC DAMS (3-22)
        b. DISTURBANCE OF NORMAL RAINFALL BY ARTIFICIAL RAINMAKING (3-18)
        c. INTENSIVE FARM CULTIVATION, WHICH PERMITS MOST RAINFALL TO SOAK INTO THE GROUND (10-36)
       *d. REMOVAL OF NATURAL PLANT COVER ALLOWING FASTER RUN-OFF INTO STREAMS (84-22)

The item was made somewhat easier and much more discriminating. In this case, the revision process worked in a way that gladdened the heart of the item writer.

The final item to be illustrated deals with knowledge of the type of information found on a physical map of a region.

12%   A physical map of a state would show
0.21    a. the state's railway network. (20-25)
        b. average rainfall by month for the state. (20-34)
        c. the location of the largest cities in the state. (38-40)
       *d. the state's highest elevation. (22-1)

The item writer decided that this item called for too fine a discrimination. All responses were attractive to good students because no single response seemed best. Some physical maps do show major transportation systems and some maps show rainfall patterns, though not usually monthly averages. Finally, major cities are occasionally used as points of reference on physical maps.
Each distracter was modified to reduce its attractiveness to good students while maintaining a certain level of plausibility for poor students.

43%   A PHYSICAL MAP OF A STATE WOULD SHOW THE STATE'S
0.40    a. AVERAGE SUMMER RAINFALL. (2-12)
        b. POPULATION DENSITY. (20-25)
        c. MOST IMPORTANT CITIES. (15-40)
       *d. HIGHEST ELEVATION. (63-23)

The revised item turned out to be reasonably discriminating and easier, but it is still a bit more difficult than most item writers would prefer. Either students are not clear about the unique features of a physical map or the second and third distracters still represent legitimate correct answers relative to keyed response d.

These five items do not illustrate all the possible ways in which item-analysis data may be interpreted to aid in item revision. What they do indicate is the general nature of the process and the fact that it may be highly successful.

OTHER CRITERION-REFERENCED PROCEDURES

The procedures for item analysis described in this chapter are equally useful for judging the quality of items from norm-referenced and criterion-referenced measures. However, the standards used to differentiate good and poor items in the two types of measures will vary and, consequently, an item earmarked for revision in one type of measure may be selected without change for use in the other type.

In the preparation of items for criterion-referenced measures such as mastery tests, minimum competency tests, and some professional certification tests, item writers need not make a conscious decision to write items that will be [...] the rigid content specifications of [...] decision what the [...] items should measure. [...] to be well prepared; the item writer should [...] of 70 to 100 percent. By these standards [...] are judged to be too easy, but items can be [...]. However, items that, say,
85 percent of a group answers correctly are not automatically good items for a criterion-referenced measure. For example, those that contain several implausible distracters or that give internal clues suggesting the correct response are still bad items. The analysis of easy criterion-referenced items for appropriateness in difficulty should include a review of the items for technical adequacy. The reviewer should be convinced that a high proportion of the students actually knew the content measured by items that show high difficulty indices.

The upper-lower difference index can be used to assess the quality of criterion-referenced items as well, but generally it is much less useful in this situation. [No] item, regardless of its intended purpose, is useful if it [yields] a negative discrimination index. But many good items [from criterion-referenced] measures may have discrimination indices of zero or only slightly higher. The explanation for this phenomenon relates to the fact that score distributions from criterion-referenced measures tend to be quite negatively skewed and low in variability. The upper and lower criterion groups tend to be very similar in terms of [...] the [...] scores for the two groups may be [...].

Alternative indices to the upper-lower index or point-biserial correlation have been proposed for use with items from criterion-referenced measures. For example, Cox and Vargas (1966) suggested a pre-post difference index to judge the ability of items to discriminate.
[...] the proportion of students [who answer] an item correctly prior to instruction (pre) is subtracted from the proportion of the same group responding correctly after instruction (post). The higher the value of the index, the more highly discriminating the item [...]. (A [...] computed [...] has been labeled [...] "sensitivity.")

Another index, used primarily for items from mastery tests, is the phi correlation coefficient. [...] item score (0 or +1) with the mastery decision [...] frequency table like that shown below.

Item score    Master    Nonmaster
   +1           A           B
    0           C           D

If "masters" tend to answer correctly (A is large) and "nonmasters" tend to answer incorrectly (D is large), the item discriminates between the two levels of achievement. When the values B and C are large, the item shows negative discrimination. The phi coefficient has [an advantage over] the pre-post difference index because [it] requires [only a single] test administration. Of course, unless the number of nonmasters is sufficiently large, the phi coefficient will provide a misleading indication of the discriminability of the items.

The index of discrimination can be used to select the best items for inclusion in a criterion-referenced measure, also. To do so, items first must be grouped according to the content categories outlined in the table of specifications or according to the objectives being measured.
Then the number of items required from each category can be selected on the basis of their discrimination indices. This procedure will ensure that the content balance required to make valid score interpretations will be achieved.

The decision consistency procedures described in Chapter 5 provide alternative methods for assessing the quality of scores from a criterion-referenced test, especially when the traditional reliability analysis seems less appropriate.

POSTTEST DISCUSSIONS

Once the classroom test has been scored, the results can be used to promote additional learning or to contribute to the kind of overlearning that resists forgetting. Testing postmortems can be profitable to students as well as to teachers if they are planned well and conducted in a businesslike manner. The feedback from students to teacher about the [...] can lead to item improvements as [well].

The main preparation by the [...] of item analysis [...] and [...] about why certain items were too [...] analysis and reflection by students [...] papers, or if an answer key is displayed on a transparency, class time will not be needed to answer key-related questions.

The class discussion should focus on the items that were most difficult for the class; questions about other items can be handled on an individual basis, if necessary, after class or during some other off-class time. Students who missed the item under discussion should be encouraged to explain how they answered and to indicate ambiguities they may have detected. Disagreements occurring between a student and the teacher that seem not to contribute substantively to class discussion should be suspended until a later time.

It is unlikely that postmortems should lead to the revision of the scoring key or to a deletion of any items from scoring. Obviously, clerical errors in scoring or in preparing the scoring key should be rectified, but controversial item keys should not be changed. There is room to take these and other types of mea[surement error] into account in using the scores, that is, in grading and setting cutoff scores. Such methods of accounting for error should be explained to students so that they are aware that "errors" will be addressed in an equitable way.

SUMMARY PROPOSITIONS

1. Item analysis is a useful tool in the progressive improvement of achievement tests.
2. The relevance of a set of items is established by their relationship with instructional content, the appropriateness of their taxonomic level, and the potential for influence by extraneous factors.
3. The degree of match between a table of specifications and the test-item content is an indication of the degree of balance achieved in the test.
4. The most efficient test includes as many independently scorable responses per unit of testing time as is possible without sacrificing relevance.
5. Persons who lack special competence in the subject covered by the test will obtain scores near the chance level if the test is appropriate in specificity.
6. A norm-referenced test is appropriate in difficulty if its mean is midway between the perfect score and the expected chance score.
7. All tests should discriminate achievers and nonachievers of the content they attempt to measure, no matter what the testing purpose may be.
8. The more variable the scores from a test, the more likely the test has succeeded in differentiating between examinees who possess different amounts of the abilities measured by the test.
9. The most significant statistical measure of the quality of an achievement test is the reliability of [its scores].
10. Item analysis begins with the counting of responses made by high- and low-achieving students to each of the items in the test.
11. While logical objections can be made to the use of the total score on a test as a criterion for analyzing the items in the test, the practical effect of these shortcomings is small and the practical convenience of disregarding them is great.
12. It is convenient and statistically defensible to consider as "good" students those whose scores place them in the upper 27 percent of the total group and to consider as "poor" students those whose scores place them in the lower 27 percent.
13. The proportion of correct responses to an item by the combined upper and lower 27 percent groups provides a satisfactory estimate of the difficulty [of the item].
14. For most classroom tests, it is desirable that all the items be of middle difficulty with none of them extremely easy or extremely difficult.
15. In general, the wider the distribution of item difficulty values in a classroom test, the more restricted the range of scores will be and the lower the reliability of those scores will be.
16. A convenient and highly satisfactory index of discrimination is simply the difference in the proportions of correct response between the upper and lower 27 percent groups.
17. Good norm-referenced achievement-test items should have indices of discrimination of 0.30 or [more].
18. The higher the average discrimination index for items in a test, the more variable the scores are likely to be and the more reliable the scores are [likely to be].
19. The item-analysis procedures used with norm-referenced measures are appropriate for items from criterion-referenced measures also, but the standards for differentiating between good and poor items are likely to vary in the two situations.
20. The value of posttest discussions in class is highly dependent on advance preparation by the teacher, focused discussion, and active student [participation].

QUESTIONS FOR STUDY AND DISCUSSION

1. What minimum qualifications should be met by those who are asked to judge the relevance of a given test?
2. In what sense might a very poor test have excellent balance?
3. What factors might cause content-parallel multiple-choice and true-false tests to be equally [...]?
4. Why might it be possible for a test to be judged highly relevant but low in specificity?
5. How could we decide if a 20-item, 4-choice multiple-choice test was about as difficult for the same group as a 40-item true-false test when both are used for norm-referenced purposes?
6. How could the relative size of the standard deviation be estimated for a specific norm[...]?
7. What factors influence the size of the difficulty index of a test item?
8. If the same test is given to three sections of the same class, why might it be preferable to conduct item analysis with the combined groups rather than doing three separate analyses? Why might separate analyses be useful?
9. What is meant by the statement, "The upper-lower index is biased in favor of items of [middle difficulty]"?

Nontest and Informal Evaluation Methods

Imagine this scenario from a sixth-grade science classroom:

Ms. Franke is using the overhead projector to explain how the steps of the scien[tific method can] be thought of as the main outline for preparing a laboratory [...]. In glancing around the room, she noticed a puzzled [...].

"Dino, can you differentiate the findings of an experiment from the conclu[sions]?"

"I think so," he replied. "The findings are numbers but the conclusions are w[ords]."

"That's often true," Ms. Franke allowed, "but how do their purposes differ?"

"Well,
the findings tell what happened, what the results [...] [s]upposed to be a summary or general statement. Conclusions tell what we think will happen if we do the same thing again."

"That's a convenient way to describe the difference," the teacher noted, and her presentation continued.

This snapshot from Ms. Franke's science class demonstrates that teachers continually gather data and make judgments and decisions during instruction. It also illustrates the variety of techniques teachers use to [...] evaluations of class and student progress:

1. Observation of the class was used to detect such nonverbal indicators as lack of attention, positive nods of the head, or (in this case) expressions [of mis]understanding.
2. Questioning was used to determine the nature and extent of misunderstanding of [...].
3. A checklist of sorts was used by Ms. Franke to take a mental excursion down the ordered steps of the scientific method to determine which steps were [...].
4. A rating scale was created, again in the mind of the teacher, as she decided if the quality of the student's response was sufficient [for] questioning to cease.

Teachers spend considerable amounts of their professional time with assessment-related activities, as much as 30 percent by some estimates (Stiggins, 1988). Though this time includes the development, administration, and use of their own tests and the preparation for and giving of standardized tests, much of the time is no doubt devoted to less formal methods geared primarily to formative evaluation: observation, quizzes and inventories, checklists, rating scales, oral questioning, and the like. In fact,
teachers at both the elementary and secondary level regard the information obtained by their own observations as crucial or important to a variety of instructional decisions they make (Dorr-Bremme and Herman, 1986).

In view of the frequency of their use and because of the importance teachers attach to the results, the quality of nontest and informal assessments is a significant matter. The accuracy of the results obtained and their validity for instructional decision making are just as important as for the more formal measures we have discussed in previous chapters. Some of the attributes characteristic of informal methods (lack of planning, lack of comparability of results across students, and failure to record outcomes) can contribute to information that is deficient in accuracy and relevance. But these shortcomings are not so inherent in the methods as they are in the historical use of the methods by teachers. That is, planning often can be done, characteristics to be judged can be defined to enhance comparability, and methods of recording accurately and conveniently can be devised and implemented.

Finally, despite the value of well-developed objective and essay achievement tests, there are many areas of the curriculum in which these test methods are inappropriate, or less appropriate than certain nontest methods might be. For example, instructional objectives that require speaking, writing, and listening, whether in English or second language learning, most often require communicative production. In addition, skills in such areas as physical education, home economics, industrial technology, science laboratory, and performing arts often require demonstrations of either processes or products. Many of the nontest methods and informal assessments are particularly useful for monitoring achievement in areas that
cannot be measured directly by more formal measures.

The purpose of this chapter is to describe and illustrate procedures that can be used to supplement test information or to provide information when tests seem ill suited to the task. The main goal is to create a greater awareness of the need to think in terms of reliability and validity when creating such procedures or using the results from them.

OBSERVATIONAL TECHNIQUES

Observation is a fundamental means for obtaining information that, strictly speaking, cannot be acquired in any other way. Observation schedules and checklists are useful devices for directing our attention to certain behaviors we intend to observe; observational schedules, records, checklists, and rating scales all are devices for recording observations so that the events observed can be preserved as a relatively permanent account of the occurrences. All these devices can serve to ensure that the proper behavior is noted and that it is recorded in an accurate, reproducible fashion. That is, proper development of the aids to observation will contribute to highly reliable and valid outcomes. Of course, the very best charts, lists, or tables cannot overcome severe deficiencies in the observation act itself. Observers who see things that are not there, miss things that are there, or miscategorize the behaviors they see should be considered just as hazardous as a multiple-choice test comprised of items containing too many ambiguous and implausible distracters. Both
forms of assessment are likely to provide highly misleading information.

Spontaneous Observation

While Mr. Voss was giving some individual assistance to another student, he noticed that Jana used a dictionary to look up the spelling of several words as she was writing an impromptu theme. She always started at the beginning of the book and turned 5 to 10 pages at a time before locating the proper letter section. Then she turned the pages one at a time, noting [...] on each page, until she found the proper page.

This spontaneous observation can be the beginning step toward making Jana a more efficient user of the dictionary, but it must be remembered or recorded so that individualized help can be provided later at a more convenient time. Obviously, if the teacher can act on the observed information immediately, the need to record it is diminished.

Of course, an overreliance on spontaneous observation can result in many information voids. That is, planned and systematic observation will help ensure that significant activities are observed, that the most important aspects of those activities are noted, and that all pertinent individuals will be observed. Spontaneous observation often results in "tunnel vision": we see those students who are most demanding of our attention, and we may never get to see the reactions or performances of others in situations that are important but rarely occur.

Spontaneous observations can be unexpected bonuses, and the information they provide may influence immediate judgments or subsequent decisions. Just as first impressions can sometimes unknowingly [...] interpersonal relationships, so the outcomes
from incidental observation can [...] unexpected bias. For this reason, the observer needs [...]. The failure to analyze observations for possible causes [...] implications can lead to faulty conclusions. For example, Jana may know what dictionary guide words are and how to use them but, for some unapparent reason, she does not use them. Here are some guidelines that might help to create and maintain an awareness that observed behavior usually can be explained by multiple factors:

1. A behavior should not be considered typical unless verified in a continuing way. [...] or rejection of other reasonable competing explanations.

2. A significant action should be observed again for verification. But if conditions are created intentionally so that the behavior will recur, the loss of the naturalistic [...], or it may promote a more socially desirable [...].

3. If the observation is [...] and objective description [...] proposing alternate explanations. Such [...] can be shared with several interpreters to get their independent judgments regarding causality. [...] are likely to provide a less complete, less detailed, or insignificant information.

4. Since spontaneous observations are unplanned, by definition, a recording form [...] accommodate such situations [...] preconceived notions the observer may have [...] notice those aspects of an event that best fit their existing knowledge base. In other words,
the expectations formed from our prior experience are more likely to be fulfilled than are events that [...].

Planned Observation

A self-proclaimed "people watcher" can easily overwhelm a willing listener with the peculiarities, unpredictable similarities, and newly discovered diversity observed in a park, on a street corner, or in a busy shopping mall. When we set out to watch particular events, actions, or objects, we seem to be more motivated to accomplish our purpose and more satisfied to have done so than if we just happened to have seen something unusual. So it is in the classroom. Though unexpected events can be exciting and interesting, planned observations can provide revealing, unique information about learning that can be used to manipulate the conditions for learning in a positive way. Such intentional observation is an efficient way to gather information about learning styles, methods of problem solving, motor-skill development, frustration level, and cognitive abilities.

[...] the process of planned observation [...]. The quality of [...] observed behavior. In addition, [...]

1. Observer subjectivity.
"Looking" and "seeing" are common expressions that [...] experience unknowingly [...].

2. Observer interference. [...] teacher absence from a class, [...] become too cooperative. Usually, a small number of observations, and more [...] overcome the negative effects of observer [...].

[...] characteristic to be observed is described in [...] will be required of the observer. [...], for example, is to determine [...] minute period, we are in trouble [...] he is off task? And how will we know [...] occurrence of off-task behavior can be [...] to be obtained [...] the task behavior is [...]

Observation Schedules

[Figure 14-1. Sample Observation Schedule (Seatwork Observational Record). Recoverable behavior categories: working on assignment with other student; doing assignment for other class; reading library book or magazine; talking with another student (nonwork). Tally marks not recoverable.]

[...] Behaviors are listed in a column that can be scanned quickly, and the descriptions are not too densely packed. (4) There is room at the bottom or on the back side for supplementary notes or interpretations of the recorded data.

[...] in social talk or idle time. The teacher shared this information with the class and asked for their advice about whether the free-time period should be continued. What would your approach and decisions be?

Checklists

A checklist is a set of phrases or statements that describes either the essential steps in a procedure or the most important elements of a product. Ordinarily, the evaluator using a checklist will simply check the presence or absence of each step or element, but some checklists permit a rating of the quality of the action observed or the characteristic [...] in certifying the readiness of their plane [...] checklist to ensure that they have [...].

Yelon (1984) has shown how checklists can support many aspects of [...] appropriate practice can be conducted. But most importantly, the checklist provides the [...] should be able to evaluate their own performance as thoroughly and accurately as the instructor.

One goal of a physical fitness unit in a sixth-grade health class is to have students develop and implement an exercise program. The evaluation planning guide [...] in Figure 14-2 was developed. The checklist would be [...] would be given to each student at the [...] help describe the essential ingredients [...] the teacher in evaluating the quality of [...] that were followed in the development [...].

Figure 14-2. Sample Health Report Checklist

Directions: On the line in front of each [...], place a plus (+) if the step was completed satisfactorily; place a minus (-) if the step was left incomplete or unsatisfactory.
[...]

___ 1. A doctor's advice about exercising was obtained.
___ 2. Baseline fitness tests were taken.
    a. Pull-ups or arm hangs
    b. [...]
    c. Recovery index test
___ 3. Improvement goals consistent with the test results were established.
___ 4. Exercises appropriate for the goals were selected.
___ 5. A weekly time schedule was established.
___ 6. A two-week journal of program use was included.
___ 7. Personal reactions to program effectiveness, enjoyment, and needs for change were [...].

General quality of written expression:

1. Obtain or envision examples of good and poor versions of the product to be evaluated.

2. Decide if each judgment should be a simple yes-no, presence or absence [...], or if the quality of each attribute must be assessed. [...] in Figure 14-2 [...] judgment about quality, satisfactory (+) or unsatisfactory (-), in addition to [...]. If finer distinctions in quality are needed, a rating scale would be [...] than a checklist.

3. Identify the product attributes that must be present and describe how good and poor examples should be expected to differ on each attribute. For example, in step 3 of Figure 14-2, all students may list goals for improvement, but the [...] between the goals and the test results is likely to [...].

4. [...] attributes and subdivide those [...] provide diagnostic feedback. In Figure 14-2, [...] individually to help assess completeness and to convey the criterion in advance to students.

5. When possible, try out a checklist draft on a few sample products to check its comprehensiveness and the relevance of each item. (If the checklist is distributed to students as a grading guide, it should not be modified [...] giving students an opportunity to take the modifications into account in their product development.)

6. When appropriate, let a novice use the checklist. By observing and asking [...] statements, it is possible to uncover ambiguities [...].

The foundation upon which a performance checklist is built is a task analysis of the performance to be observed. The procedures for developing a performance checklist are similar to those described above, but there are some important [...]:

1. Observe an expert perform the task and record the essential steps. [...] Be sure the materials and conditions [...] be provided to the performer as part of the "givens."

2. Use a draft of the checklist to observe another expert so that differences in [...] critical steps of the process can be detected [...] bias due to idiosyncrasies inherent in the performance of the first expert. Of course, if there are several efficient and effective ways to complete the task in question, a performance checklist may be an inappropriate assessment tool. In such a case, the product is likely to be more worthy of evaluating.

3. When appropriate, attach the criterion of acceptable performance to the description of the step. For example, a checklist for taking someone's blood pressure might have this statement: Wraps cuff snugly about upper arm [...]. The italicized portion of the statement indicates what "snug enough" means.

4. Use the drafted checklist to observe a novice performer. This step will [...] or difficulty experienced by [...] more detail. It also will point out the need to insert statements about what should not be done or how something should be done. In addition, it provides [...] prerequisites need to be incorporated in the [...].

Good checklists are time consuming to develop, but they can pay instructional and assessment [...] situations. Yelon (1984) has identified [...] which a checklist is particularly [...]:

1. [...] particularly important because it is a prerequisite to learning [...].

2. [...] task is complex, because of either the number of elements or [...].

3. When the fine-tuning of a skill requires fairly detailed feedback to the learner.

4. When students depend on self-[...] than the judgments of an instructor to check their [...].

Rating Scales

Another tool of observation [...] how frequently a certain behavior [...].

1. The trait must be described [...] related to trait definition.

1A Student is prompt.
   [...]
   2. Usually
   3. Seldom
   [...]

1B Student turns math papers in on time each week.
   [...]
   2. 4 out of 5 days
   3. 3 out of 5 days
   4. 2 or fewer days a week

[...] the time period to what is typical in a week. Note that the stem from item 1B and the choices from item 1A together form a fairly ambiguous scale item, also.
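
One sign that the scale points of item 1B are stated in observable terms is that they can be written down as a simple counting rule. The short Python sketch below illustrates this under stated assumptions: the function name `promptness_rating` is hypothetical, a five-day school week is assumed, and treating 5 on-time days out of 5 as scale point 1 is an assumption, since only points 2 through 4 are given for item 1B.

```python
def promptness_rating(days_on_time):
    """Map a weekly count of on-time papers (0-5) to an item 1B scale point.

    Points 2-4 follow item 1B; point 1 (5 out of 5 days) is an assumption.
    """
    if days_on_time not in range(0, 6):
        raise ValueError("days_on_time must be a whole number from 0 to 5")
    if days_on_time == 5:
        return 1  # assumed top scale point: on time every day
    if days_on_time == 4:
        return 2  # "4 out of 5 days"
    if days_on_time == 3:
        return 3  # "3 out of 5 days"
    return 4      # "2 or fewer days a week"


# Example: ratings for one week's counts for three students.
weekly_counts = {"Student A": 5, "Student B": 3, "Student C": 1}
ratings = {name: promptness_rating(n) for name, n in weekly_counts.items()}
print(ratings)
```

No comparable rule can be written for the choices of item 1A, because "Usually" and "Seldom" do not say how many days they cover; that is the ambiguity the item illustrates.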
Item 2A might appear on a scale for rating the quality of an end table made in a woodworking class.

2A Quality of the table top surface
   [...]

The first set of scale points is ambiguous and imprecise regarding the attributes of the table top that should be assessed. The criteria of perfection are stated in observable terms with the second set of descriptors so that more objective, reliable measurements are likely to be produced with it.

2. Scale descriptors should [...] a single dimension [...] quality or [...], but not both. Item 3A from a speech evaluation scale is formed with a set of fairly observable scale points, but some points relate to frequency of behavior (2), some relate to eye contact directly (1 and 3), and some relate to use of notes (2, 3, and 4).

3A Maintains eye contact with audience.
   1. Scans the entire audience
   2. Occasionally refers to notes
   3. Tends to look at only 1 or 2 people
   4. Depends heavily on notes
   5. Tends to read

An improved set of responses that focus on eye contact frequency is illustrated by item 3B.

3B Frequency of eye contact with audience
   1. At least once every sentence
   2. Once every 2 sentences
   3. Once in every 3 to 4 sentences
   4. Less than once in every 3 to 4 sentences

3. When normative ratings are sought, the rater should specify the reference group being used. Items like 4 to 6 appear on rating forms for reporting school progress or for recommending individuals for admission or employment.

4. How well does the student follow directions?
5. How would you assess the applicant's chance of completing a graduate program in education?
6. How would you describe the candidate's writing skills?
   1. Well above average
   2. Above average
   3. Average
   4. Below average
   5. Well below average

If the reference group to be used [...] particular program [...] the rater and the user of the
ratings are [...] interpretation of the rating. In most cases it is more [...] by the raters than to allow the [...] they decided to use.

4. [...] scale points need not be defined. Bipolar adjectives can be used to define only the end points. Then the number of intermediary points can be varied as deemed suitable by the scale maker.

5. [...] to indicate that they have [...] circumstances, a failure to rate can [...].

6. [...] screening form that uses a 3-point rating scale: above average, average, below average.

7. Energy
[...]
9. Participates in running games
10. Throws a ball

The directions indicate that present classmates should be considered as the reference group. What is above-average posture? What is above-average [...] throwing a ball [...] a yes-no response, not a norm-referenced rating. As exemplified here, clarity should never be sacrificed for efficiency.

Using Rating Scales

Rating scales are prone to certain types of errors that can be minimized by creating an awareness of such errors through rater training. In addition, the procedure for rating can be designed to [...] danger [...] compromise score validity. For example, [...] all 26 students in a class on each of five characteristics [...] will be preferable to student-by-student rating. That is, all students are
Thar i,, "ll'.,".1.;i,';;;;;-;: NONTESTANDINFORMALEVAiLJAIION METHODS 253 r at ed o n a rre n ri l l e n e s srh , e n o n c o uperari on, rhen un usrr* rrme w i sel v.and 50 on, in s re a d o f ra ri n tj S a ra h o n a I fi \e t| ai tj , rhen Mi chael ,;nd rhen C had. Ir,s pro(cdure litt help iL\err h.aroefe.! crrurs, rhe tendency ro sile rnore Dosihve r a||ng \ u n rtt rrd i r, ru \u h tc , | l s h o prui e., rn u!era posi (i rc-rL]ra.thi , rr,om m enn a l u n d n J rn g o u sr. o n e m.,dFregar.l i ng rhc ,.or i n8.t $sal sr l o prc ' \ I' u m h .u n s e rrom i n ! . nr rh e \o re o n . re'\p enci ;s,hcrori ;cot ,h,, D ;;,:" " or h. r , e \p u n s e \, .,l l p rp r,. s h o u l d be v o, rd i rem ri i rem rarhel rrun ,rJaenr u1 Sevcrai orher kinds of rad! r he ( h d rd L rrI i s rk s o t i n d i \ i d u d t r. scaies For exampte, some rarers ha\ r he v a l e s h e n r;' i n g d g ru u p o r i n d t he s .d te @ tn \.4 \d frtO)h ^ a u \( ul rarees, or some other unknow. reason. others use moslty rhe posirile end of rhe s.a,te(sd6asiE 6/ots) because of an un*lins,,"* . *,ig" an r na b i h ry ro d i s c ri m i n a rc l e v e l s ol quat,rj . ur an adheri nte "1rr1,*ii,i;;",,"s,, ," ' f" ,,r1 to" dards. Finally, ,'ro'r ,/.mttur tn.tenq oc.v shen nre6 avoid eirher ;xb.enc" o" _ oi rhe scale and use mainly moderar€ s Land a rd \a n d d rr u n c o rn to ,rrb l e t s, , r 4r is m o re d rmL u tr ro d e re c r R i r{ rarer (redieads are votarilc pe^on: hangs our wirh srudenrs who smoke) history facrors (his brorher was a very poor i erro$, rhe mosr effecrive means of pr;ven, ns that creale an awareness of rhe porenrial he relative uselessnessofscores thai contarr I NF O RM A LI NV EN T O R IES There are innumerable insrrucrional siruahons in which addirional rnformadon abour rh e l e a m e r m ;g h r h e tp rh e r er' (her i n, redqe\rudenr mori vari on, ri vare a, P f lo' k n o w re d g e .(h o o s e rh . mo sr ette.ri ve dpproa(h. 
or just develop better teacher-student rapport. Some of this information is gathered piecemeal and unsystematically through the constant spontaneous observation teachers do, as in "sizing up," for example. Particularly when teachers need entering behavior information such as prior achievement, interests, attitudes, or preferences for learning styles, various informal and systematic devices can be used economically to build a store of descriptive information. This store of information can be tapped later as needed in a variety of teaching-learning situations.

Questionnaires

Classroom teachers who choose not to rely on last year's teacher for an assessment of the personality of their new class can "take stock" of the new group with an inventory, a student questionnaire tailor-made to achieve the teacher's purpose. The inventory shown in Figure 14-3 provides sample items of the type that would allow a teacher to size up a new class efficiently. The use of open-ended items gives flexibility to respondents but requires the same care in development as the completion test items discussed in Chapter 10. When the teacher is interested in specific options, the second type of [...].

The results of an informal inventory [...] raise as many questions as the survey [...]. [...] to 15 are as shown in Figure 14-4. [...] may require further probing, additional analysis of specific student responses [...]:

1. Would the use of dyads be more effective than small-group or committee work?
2. Is this group oriented more to oral-aural stimulus material than visual?
3. Are most students shy about oral reading or do they lack confidence in their reading ability?
4. Have the writing experiences of these students been limited, unsuccessful, or [...]?
5. What nonrecreational experience have students had with a microcomputer?
6. Would most of the group rather be told a puzzle solution than [...]?

Figure 14-3. Class Sizing-Up Inventory

Directions: Please complete each of these sentences with a word or two that best describes [...].

1. My favorite school subject is
[...]
3. The kind of books I like to read most is
4. The radio station I listen to most is
5. My favorite summer activity is
6. I would rather do homework than
[...]

Directions: Please complete each of these sentences by circling the words that best describe [...].

9. I prefer to learn by
[...]
13. To solve a puzzle, I would
14. To write a paper, I prefer

[Figure 14-4. Summary of Class Inventory Responses. Tally table not recoverable; one legible row label: Learning preference.]

Informal inventories like those [...] develop, once the teacher has thought [...] class analysis is important for the teacher [...] may promote some self-assessment if it [...] will not draw attention to the responses [...] responses, an index-card file system can [...] of updating after the next survey. ([...] rapidly, even daily.)

Similar information can be obtained by the teacher through individual conversations with students. Though less efficient than questionnaires, these methods permit follow-up questioning [...] can elicit information without requiring reading, writing, or vocabulary abilities that questionnaires depend upon.

Informal Reading Inventories

Informal inventories in areas like reading and math are needed occasionally to [...] about individual students or to obtain [...] (for example, math or foreign languages).
Most informal reading inventories consist of a graded word list, graded reading selections, and a set of comprehension questions for each selection. The [...]

[Table of placement criteria not recoverable; legible values: 98 to 100, 95, 90 to 100, 75.]

The following suggestions are offered for developing informal reading inventories that will provide the most meaningful information [...] students in grades 1 through 12.

1. An existing graded word list may be used or one may be developed by randomly selecting words from each text in a graded basal series. A list of 15 to [...] words from each level is printed on cards or in list form so that [...].

2. Using the same basal series, two passages should be selected from near the beginning of the book for each grade level for which the inventory is to be used. The passages selected should be representative of the subject matter, vocabulary, and language complexity of the grade-level text from which they are taken. Evans, Evans, and Mercer (1986) recommended varying passage lengths according to grade level: preprimer, 50 words; primer and grade 1, 100 words; and grade 2 and up, 200 words. Some inventories use introductory phrases, titles, or illustrations to activate prior knowledge, but such preparatory aids preclude the use of "main idea" comprehension questions later.

3. A set of eight to ten comprehension questions should be written for each passage. Since the questions are administered orally, the use of free-response rather than multiple-choice format would put a lighter load on short-term memory demands. Most importantly, the questions must require more than recall of factual or literal interpretation. A premium should be placed on items requiring inference and on generalization questions that get at "why," "how," "what if," or "what next?"

4. Review the graded word lists, passages, and items among a set of teachers representing the same grade levels as the materials. Check for passage representativeness, item ambiguity, item keys (the range of acceptable responses for open-ended items), and scoring criteria for basal-level placements.

The materials developed for an informal reading inventory should be valid for use over a number of years. As long as students have not had an opportunity to read the same passages as part of their regular classroom instruction, inventory results should prove valuable for placement in a series, choosing general reading materials, or diagnosing decoding weaknesses. Teachers might also use the results to make decisions about out-of-level testing prior to the administration of an every-pupil achievement test battery. (See Chapter 17 for further details about out-of-level standardized testing.)

ORAL QUESTIONING TECHNIQUES

The techniques of oral questioning serve the functions of fostering learning and assessing the extent of learning. Of course, the two purposes often are intertwined, particularly when the nature of the assessment is formative rather than summative. The purpose of this section is to demonstrate how oral questioning methods can produce meaningful assessment data and how methods of recording the outcomes of questioning can contribute to the collection of highly reliable information.

Purposes of Questioning

Though empirical evidence is lacking,
both logic and experience suggest that oral questioning is the most frequently employed instructional technique. Why is this probably true? What functions does questioning serve that other techniques accomplish less effectively? [...] Model described in Chapter 2, oral questioning [...] information about entering behavior, [...] procedures, and in assessing performance. [...] oral questioning might very well be [...] begin a unit of instruction and the last [...].

The many purposes for oral questioning identified by Stiggins, Rubel, and Quellmalz (1986) and Wilen (1986) can be categorized primarily as supporting either direct instruction or assessment. Both types of purposes are listed here to help differentiate them and to illustrate how inseparable the purposes are at times.

1. Monitor progress. Teachers frequently ask questions of the class or direct questions to particular students to make judgments about comprehension and the completeness of learning. The goal is to determine if more examples, practice, or discussion are needed before moving on to the next learning objective. Often these questions are triggered by the teacher's reading of nonverbal responses emanating from student faces.

2. Encourage application of knowledge. The "So what?" question that students sometimes raise can be initiated by the teacher to focus on the use of new knowledge, to go beyond the statement of a principle or a general method of problem solving. Such questions stimulate higher-level thinking and heighten interest. The goal here is direct instruction rather than assessment.

3.
Stimulate participation. Students can be drawn into a discussion through questioning, and [...].

4. [...] Review sessions [...] in a fast-moving prosecutor style, serve to reinforce [...] (learner and teacher feedback) simultaneously. [...] these activities have both instruc[...].

[...] Because students have a tendency to ask questions like those they themselves have been asked, probing questions [...] thought should be more frequent than recall questions [...].

7. Diagnose [...] problems. [...] Probes can lead a teacher to the root of [...] main purpose served here. [...]matics problems at the elementary level. [...] instructional technique is used with stu[...] needs solving or who quickly tune into [...] ask: "If you were confined to a wheel[...] your exercise program need to be [...] a member of a volleyball team?" "How [...] be modified so that the new version has [...] aerobic dance?" Direct instruction is the [...].

When the primary purpose of oral questioning is to serve the assessment function, the methods of forming questions, delivering questions, and interpreting responses are fundamentally important. These factors determine how reliable and valid the information obtained will be and how sound the subsequent actions of the teacher will be.

Guidelines for Questioning

The issue related to oral questioning that has received the most attention
from researchers relates to the taxonomic level of the questions teachers use. To date, the results of that research are clear: despite the rhetoric about fostering higher-order thinking skills, the vast majority of teachers' questions require recall, recognition, and literal comprehension. This is indeed an unfortunate statement about the nature of communication in the classrooms of our schools. Since questions play such a significant role in that communication, it appears that the intellectual level of most verbal interchanges is much lower than it ought to be. How can this sorry state be explained?

First, teachers have little, if any, direct instruction in their preservice programs related to questioning. When they do, the focus is less on taxonomic level and more on the mechanics of conducting a discussion. Second, since higher-level questioning was such an insignificant part of the teacher's own experience as a student, the teacher has not benefited from the modeling of higher-level questioning. Many teaching techniques used by teachers, and not learned directly during preservice instruction, were observed during the teachers' own schooling. High-quality oral questioning was not one of those. Third, oral questioning seems like such a narrow topic, with no perceived positive consequences, that it is not often proposed as an in-service education topic. Besides, the thinking goes, everyone knows how to ask questions, hard ones and easy ones.

Here are some suggestions for making oral questions more challenging for students and for obtaining meaningful information to support instructional decisions.

1. Be cognizant of the verb used in a question. The verb can require a simple yes-no response or it can require a description, an explanation, a new plan, or a reasoned judgment.
The explicit-implicit distinction made about instructional objectives in Chapter 3 pertains here also. In addition, low-level questions often include such words as who, what, and when; high-level questions tend to use how, why, and which. Here are some examples:

Select the most persuasive editorial.
Compare the persuasive quality of the two editorials.
Name the writer of Common Sense.
Cite the time period during which Poor Richard's Almanac was written.
Explain how Jefferson's ideals were expressed in the words of the Declaration.

Which editorial is most persuasive?
Why is this editorial more persuasive than that one?
Who wrote Common Sense?
When was Poor Richard's Almanac written?
How did Jefferson's ideals get expressed in the Declaration?

These sets of "questions" illustrate that oral questioning can be carried out with both declarative and interrogative statements. Behind each declarative statement is a question expressing the same content.

2. Wait for a response. The elapsed time between the end of a question and the teacher's next utterance averages about 1 second. Rowe (1974) has shown that these kinds of benefits can accrue by increasing "wait time" to 3 to 5 seconds:

1. Students will give longer responses.
2. More unsolicited, appropriate responses will be given.
3. Fewer cases of nonresponse will occur.
4. Students will become more confident in responding.
5. More speculation and wondering aloud will occur.
6. Teaching will become more student centered.
7. Students will more often supply evidence to support their inferences.
8. Students will ask more questions.
9. Low-achieving students will contribute more.
10. Teacher questioning skills will improve over time.

Teachers tend not to wait very long before rephrasing or asking another question. Ordinarily, the new question is simpler, at a lower level, than the first. The upshot is, longer wait time,
think time actually, should preserve the taxonomic level of questioning the teacher intended to use.

3. Stay with a student who answers incompletely or incorrectly. If a lower-level follow-up question seems called for, use a sequence of questions to [...] the original one. Ask for clarification, restatement, explanation, or evidential support. Students who are abandoned after an incorrect response [...].

4. Ask a student to paraphrase or restate the response given by another. This type of questioning not only demands constant attention from students, it also requires deeper understanding [...] formative evaluation.

For higher-order thinking to occur, the [...] is that good questioning is likely to build [...] shy students stay shy, helps unattending [...] and narrows the learning audience to the persistent or self-motivated. [...] Some teachers develop [...] record their questions in lecture notes or on overhead transparencies. Spontaneous questioning is more likely to promote knowledge-level thinking than application of knowledge.

Recording Questioning Data

If the responses to oral questioning are to be used for summative evaluation purposes or if the responses are to be analyzed by the teacher to diagnose group strengths and weaknesses, a permanent record needs to be [...]. Documentation at the time of response is nearly [...] to charting from memory at a later time. Of course, if the [...] for formative purposes, the teacher is likely to [...] immediately, and documentation is probably unnecessary.

The purpose for questioning and for recording the nature of the responses dictates the characteristics of the recording form to be used. Obviously, one dimension of the form must be student names, but the second dimension is defined by the use or purpose. Figure 14-5 shows two charts that were designed for different purposes. Chart A has a [...] for each type of question asked of each student and a [...] for each appropriate [...] response. The chart allows the teacher to examine (1) [...] throughout the group, (2) the extent of [...] question, (3) the success rate of students [...], and (4) the overall success of students. The teacher [...] are being neglected or if anyone is [...].

Chart B is intended to show both the quantity and quality of student participation in class recitation or discussion. The focus of this chart is more on the general quality of students' responses than on the nature of the questions they were asked or were able to answer.

Note that for both charts in Figure 14-5, considerable judgment must be exercised by the teacher during the question and discussion period. Did that question require an explanation? Was it a prediction? Was that response acceptable? Did Doug add anything new to the issue or did he mainly say "me too"? How germane was that comment to the topic, or did it open a new but worthwhile issue? Experience with any single recording form should increase objectivity and, consequently, the usefulness of the data. Forms that require too much inference by the teacher will interfere with the questioning process in detrimental ways: less time will be available for formulating high-quality questions, and more dead time will occur as response quality is transformed into a tally mark on the paper.

Figure 14-5.
Alternative Examples of Student Oral Questioning Responses. [Figure: Chart A, Type of Question Asked and Answered, with columns including Explanation and Prediction; Chart B, Nature of Contribution; rows are student names, for example Doug.]

SUMMARY PROPOSITIONS

1. The most common deficiencies associated with informal assessments can be overcome by careful instrument development and advance planning.
2. The quality of observational data is a function of the observation [...] and the observational record.
3. Results from spontaneous observation can influence subsequent judgments or decisions of the viewer in unintended or unknowing ways.
[...] The fineness of a rating scale is related to the amount of score variability it can produce and the reliability level of scores that can be attained with it.
[...] [...] of observed behavior is [...] representative [...].
[...] The response options of a particular rating scale describe either (a) frequency of occurrence or (b) quality of performance (or of a product).

QUESTIONS FOR STUDY AND DISCUSSION

1. How could the reliability of the results of oral questioning in a classroom be estimated?
[...] about the methods characterized in this chapter as informal assessment [...]
[...] yield more reliable and meaningful [...]
4. Why is it absurd to urge teachers to curtail spontaneous observation in favor of planned [...]?
[...] examples of how spontaneous observations inappropriately influ[...]
6. Why might intentional observer selectivity be more difficult to control than unintentional [...]?
[...] likely to modify their behavior [...] provide distorted [...]
[...] frequency combine [...] of different types of behaviors to be [...]
[...] number [...]
10. Why might the results from a checklist be more easily used for criterion-referenced than norm-referenced purposes?
11. How can novices and experts be used effectively in constructing a checklist?
12. What are the expected effects on score interpretation (norm-referenced and criterion-referenced separately) of each of these kinds of rating errors: leniency, generosity, and central tendency?
13. What are the major advantages of using oral questioning for gathering summative achievement information from a class of students?
14. What kinds of questioning techniques by the teacher seem to discourage deep thinking on [...]?
15. Why might increased wait time increase participation as well as the frequency of appro[...]?

Grading and Reporting Achievements

THE NEED FOR GRADES

The uses made of grades are numerous and often crucial. They are used as self-evaluative measures and also to report students' educational status to parents, future teachers, and prospective employers. They provide a basis for important decisions about educational plans and career options. Then, too, education is expensive. To make the best possible use of educational facilities and student talent, it is essential that each student's educational progress be watched carefully and reported as accurately as possible. Reports of school grades serve somewhat the same function in education that financial statements serve in business. In either case, if the reports are inaccurate or unavailable, the venture may become inefficient or the quality of the product may deteriorate.

Grades also provide an important means for stimulating, directing, and rewarding the educational efforts of students. This function of grades has been attacked on the ground that they provide extrinsic, artificial, and hence undesirable stimuli and rewards. Indeed, grades are extrinsic, but so are most other cherished rewards for effort and achievement. Most workers, including those in the professions, are grateful for the intrinsic rewards that sometimes accompany their efforts.
But most of them are even more grateful that these are not the only rewards. Few organized, efficient human enterprises can be conducted successfully on the basis of intrinsic rewards alone.

To serve effectively the purpose of stimulating, directing, and rewarding student efforts to learn, grades must be valid. The highest grades must go to those students who have demonstrated the highest levels of achievement with respect to course objectives. Grades must be based on sufficient evidence. They must report the degree of achievement as precisely as possible under the circumstances. If grades are assigned carelessly, their long-run effects on the educational efforts of students cannot be good.

Some students and teachers minimize the importance of grades, suggesting that what students learn is more important than the grade they get. Their conception rests on the assumption that there generally is not a close relationship between the amount of useful learning a student can demonstrate and the grade he or she receives. Others have made the same point by noting that grades should not be regarded as ends in themselves, and by questioning the use of examinations "merely for the purpose of assigning grades."

It is true that the grade a student receives is not in itself an important educational outcome. By the same token, neither is the degree or diploma toward which the student is working, nor the academic rank or professional reputation of those who teach that individual. But all these symbols can be and should be valid indications of important educational attainments. It is desirable, and not impossibly difficult, to make the goal of maximum educational achievement compatible with the goal of highest possible grades.
If these two goals are not closely related, the fault would seem to rest with those who teach the classes and assign the grades. From the point of view of students, parents, teachers, and employers, there is nothing "mere" about the grading process and the grades it yields. Stroud (1946) underscored this point:

If the marks earned in a course of study are made to represent progress toward getting an education, working for marks is, ipso facto, a furtherance of the purposes of education. If the marks are so bad that the student who works for and attains them misses an education, then working for marks is a practice to be eschewed. When marks are given, we are not likely to dissuade pupils from working for them; and there is no sensible reason why we should. It simply does not make sense to grade pupils, to maintain institutional machinery for assembling and recording the gradings, while at the same time telling pupils marks do not amount to much. As a matter of fact they do amount to something and the pupil knows this. If we are dissatisfied with the results of working for marks we might try to improve the marks. (p. 632)

Grades are necessary. If they are inaccurate, invalid, or meaningless, the remedy lies less in de-emphasizing grades than in assigning them more carefully so that they more truly report the extent of important achievements. Instead of seeking to minimize their importance or seeking to find some less painful substitute, teachers should devote more attention to improving the validity and precision of the grades they assign and to minimizing misinterpretations of grades by the students, teachers, and others who use them.

SOME PROBLEMS OF GRADING

The problems of using grades to describe student achievement have been persistently troublesome at all levels of education. An important and fundamental reason why problems of grading are difficult to solve permanently is that grading systems tend to become issues in
educational controversies. Odell (1950) noted that research on grading systems did not become significant until after the turn of the century. At about that same time, the development of objective tests was ushering in the somewhat controversial scientific movement in education. The rise of progressive education in the third and fourth decades of this century, with its emphasis on the uniqueness of the individual, the wholeness of mental life, freedom and democracy in the classroom, and the child's need for loving reassurance, led to criticisms of the academic narrowness, the competitive pressures, and the common standards of achievement for all students implicit in many grading systems. However, subsequent renewed emphasis on "back to basics" and on pursuit of academic excellence has been accompanied by pleas for more formal evaluations of achievement and more rigorous standards of attainment (National Commission on Excellence in Education, 1983).

Such drifts and shifts in educational philosophy influence some educational leaders to espouse one philosophy, some another. Some teachers find it easy to accept one position, some another, even when they teach in the same educational institution. Since somewhat differing grading systems are implied by each of these different philosophical positions, it is not surprising that differences of opinion, dissatisfaction, and proposals for change tend to characterize teacher reactions to the entire grading enterprise.

Another reason why grading systems present perennial problems is that they require teachers, whose natural instincts incline them to be helpful counselors and advocates, to stand in judgment over the deeds of others. "Forbear to judge, for we are sinners all," said Shakespeare, echoing the sentiments of the Sermon on the Mount, "Judge not, that ye be not judged." It is never difficult to assign a student a good grade, particularly if it is higher than he or she really expected. But since the reach of many students exceeds their grasp, there are likely to be more occasions for disappointment than pleasure for both students and teachers.

The issues that contribute to making grading so problematic are primarily philosophical in nature. There are no research studies that can answer questions like: What should an A grade mean? What percent of the students in a class should receive a C? Should spelling and grammar be judged in assigning a grade to a paper? What should a course grade represent? These "should" questions require value judgments rather than an interpretation of research data; the answer to each may vary from teacher to teacher. But all teachers must ask similar questions and find acceptable answers to them in establishing their grading policies. With careful thought and periodic review, most teachers can develop satisfactory, defensible grading practices that will yield accurate measures of the achievements of their students. And by attending to the principles that enhance the reliability and validity of other achievement measures, policies and procedures can be developed to produce relevant, meaningful grades at all educational levels.

No system of grading is likely to be found that will make the process of grading easy, painless, and generally satisfactory. This is not to say that present grading practices are beyond improvement. It is only to say that no new grading system, no matter how cleverly devised and conscientiously followed, is likely to solve the basic problems of grading. The real need is not for some new system. Good systems already
exist. The real need is in using the existing systems to produce the most valid grades possible for the limited set of purposes grades should serve.

Some Shortcomings of Grades

Two major deficiencies of grades, as they are assigned in many educational institutions, are (1) the lack of clear and generally accepted definitions of what the various grades mean and (2) the lack of sufficient, relevant, and objective evidence to use as a basis for assigning grades (Stiggins, Frisbie, and Griswold, 1989). One consequence of the first shortcoming is that grading standards and the meanings of grades tend to vary from teacher to teacher, from course to course, from department to department, and from school to school within districts (Terwilliger, 1971). Another consequence is that teacher biases and idiosyncrasies tend to reduce the validity of grades (Stiggins, Frisbie, and Griswold, 1989). One outcome of the second shortcoming is that the grades tend to be unreliable. Another is that grades can be inflated: their face value is higher than their actual value.

The absence of explicit definitions for each grade permits teachers to be influenced, either consciously or unknowingly, by extraneous factors in assigning grades. Research on this point from three or more decades ago probably is characteristic of present practice (Carter, 1952; Hadley, 1954; Palmer, 1962). Some teachers deliberately use high grades as rewards and low grades as punishments for behavior unrelated to the attainment of instructional objectives.

The studies of Starch and Elliott (1912, 1913a, 1913b) on the unreliability of teachers' grades on examination papers are classic demonstrations of the instability of judgments based on presumably absolute standards. Identical copies of an English test paper were given to 142 English teachers, with instructions to score it on the basis of 100 percent for a perfect paper. Since each teacher looked at only one paper, no relative basis for judgment was available. The scores assigned to the same paper ranged all the way from 98 to 50 percent. Similar results were obtained with test papers in geometry and in history.

Typically, grades such as those Starch and Elliott collected for single examination papers are not highly reliable. For semester grades, however, reliabilities in the range of 0.70 to 0.80 should be common. Semester grades are based on much more extensive and comprehensive observations of student attainments, perhaps as many as 80 hours of observation. Even so, one hour of intensive "observation" under the controlled conditions of a well-standardized achievement test can yield measures with reliability estimates in excess of 0.90. If the tools of performance assessment are not well designed, their collective worth over a semester may be exceeded by a reliable and valid commercially prepared instrument that takes no time for the teacher to prepare and a small fraction of class time to administer. Our purpose here is not to argue for replacing teacher-made evaluation tools with standardized measures, but to dramatize the unfortunate state of affairs in which some teachers find themselves at grade assignment time. We are not facing utter chaos, but considerable room for improvement exists.

THE MEANING CONVEYED BY GRADES

A grading system is primarily a method of communicating measurements of achievement. It involves the use of [...]
[...] the degree that the grading symbols have the same meaning for [...] is it possible for grades to serve the purposes of communication meaningfully and precisely.

The meaning of a grade should depend as little as possible on the [...] issued it or the course to which it pertains. This [...] that the grading [...] instructor, of a department, [...] are matters of legitimate [...]ments, and other institutions. [...] A particular grade carries [...] the comparison of [...] absolute standard or a relative stan[...] fied group. Second, a grade repre[...] either [...] amount of effort expended [...]. [...]nally, a grade represents either the [...] instruction or the amount of learning [...]. The remainder of this section is a di[...]ing between the alternative meanings [...]ute to the overall meaning of a grade.

Absolute and Relative Standards

A grade represents a teacher [...] has performed a set of tasks in one [...] units. These judgments of goodness [...] of comparison. Performance that is d[...]lent, or inferior obtains its quali[...] performance in question with a per[...] absolute or relative.

[...]g systems used in the United States since [...] absolute or relative grading standards. A definite percent of "perfection," usu[...] was regarded as the minimum passing [...] students' performances were [...]edge, skills, and understanding that were [...] grade presumably assigned independently of the grades of other students in the course, percent grading is [...] characterized as absolute grading.

[...] other major type of grading system is based on the use of a small number of letter grades, often five, to express various levels of achievement. In the five-letter A-B-C-D-F system, truly outstanding performance is assigned a grade of A. A grade of B indicates above-average achievement; C is the average grade; D
indicates below-average achievement; and F is used to report failure, achievement insufficient to warrant credit for completing a course. The relative standard to which the student's performance is referenced is the distribution of performance of other students in the class. Thus, letter grading is sometimes characterized as relative grading.

Of course, each letter in the grading system can be defined in absolute terms instead of relative terms. A grade of D may indicate achievement of the minimum essential knowledge and understandings; C may represent adequate attainment rather than average achievement; B may indicate a level of advanced achievement with respect to course content; and A may be used to represent exceptional or meritorious achievement. Such general definitions would require further elaboration to communicate the absolute levels of achievement in any particular course or grade [...] subject matter. The point is, it is not the use of letter symbols that distinguishes absolute and relative grading; it is the nature of the standard against which performance is compared that differentiates the two.

The decision to use either an absolute or a relative grading standard is the most fundamental decision a teacher must make with regard to performance assessment. When the absolute standard is chosen, all methods and tools of evaluation must be designed to yield criterion-referenced interpretations. A performance standard for grading must be established for each component that is to contribute to the course grade: tests, papers, quizzes, presentations, projects, and other assignments. If the decision is to use a relative standard, each grading component must
be geared to providing norm-referenced interpretations. Of course, in both cases, [...] decisions need to be made [...]. The basis for determining the cutoff points [...] criterion-referenced in one case and norm-referenced in the other.

Though the vast majority of institutions now use letter grading with relative standards, percent grading is by no means obsolete. Some institutions still convert to letter grades from percent [...] still prefer to define passing scores [...] in some cases, the [...] grading methodology. Some instructors [...] over relative grading for philosophical [...] standards [...] or, in some cases, [...].

Achievement and Effort

After the decision has been made about the use of absolute or relative standards, the instructor must decide which performances or [...] forms of achievement will be included in the grade. Undoubtedly [...] base some of the grades they issue on factors other than the degree of achievement of instructional objectives ([...]) [...] likely will continue to do so because [...] control in the class and because [...] teaching. But the use of grades [...] leads to distorted meanings of the gr[...] that social behavior [...] of their school program.

We have argued that grades [...] reward student learning. [...] demonstrate greater desire to learn [...] some form of recognition and rewa[...].

Status and Growth

Some instructors believe that the amount of improvement studen[...] achievement they demonstrate at the [...] on other preliminary observations.
[...] initial status. The differences between [...], subtracting these scores from o[...] tion of errors rather than a cancella[...] more error laden than either of the s[...] may consist mainly of errors of measu[...] provide reliable scores, instructors ma[...] and posttest mean. But few classroom achievement tests [...] measurements of short-term [...].

In addition to the reliability concern, there are other problems with growth measures. One is that, for most educational purposes, knowledge that a student's achievement is good, average, [...] is more useful than knowledge tha[...] than others during a grading period. [...] on the pretest have a considerably g[...] gains in achievement than their pe[...]. Students are quick to learn that under circumstances of grading on the basis of growth, their pretest scores should be as low as possible to permit the greatest possible observable gain.

It is true that status grading seems to condemn some students to low grades in most subjects, semester after semester. Low grades discourage effort, which in turn increases the probability of more low grades. So the vicious cycle continues, bringing dislike of learning and, possibly, early withdrawal from school. If students are taught to dislike school by constant reminders of their low achievement, the remedy probably is not to try to persuade them that their rate of growth toward achievement is more important than status achieved, for that
is a transparent falsehood. The remedy is probably to provide varied opportunities to excel in several kinds of worthwhile activities. The planning and implementation of such [...] certainly would require an alert, versatile, and dedicated teacher. When it is accomplished, though, grading on the basis of status achieved will no longer mean that some students must always win while others must always lose. Instead some students will be able to enjoy some of the rewards of excellence in their own specialties. Cohen (1983), for example, has described alternative procedures for grading the achievement of exceptional students who have been "mainstreamed."

ESTABLISHING A GRADING SYSTEM

The Grade Scale

[...] Many instructors seemed to agree with this view. Nonetheless, from time to time there has been increased or renewed interest in refining the grading scale by adding plus and minus signs to the basic letters or decimal fractions to the basic numbers (for example, 4.0, 3.5, 3.0, 2.5).

The notion that grading problems can be simplified and grading errors reduced by using fewer categories is an attractive one. Its weakness can be ex[...] fewer grading categories in grading does indeed reduce the [...]
[...] reduces the information conveyed by the grade [...] categories.

Letters versus Numbers

[...] percent grading was aided by the substitu[tion] [...] For both these reasons [...]

Single or Multiple Grades

[...] are more explicit in communicating what students can do than are lett[er] [...] and that sufficient evidence [...] determining these separate grade[s] [...] grades reported at one time, the [...] [influ]enced by considerable halo effect [...] student may influence each of the gr[ades] [...] to the separate aspects of achievem[ent] [...] shortcomings of the single-symbol s[ystem] [...]

There is an eclectic grading [...] that has promise for satisfying the [...] [re]ferencing. [...] minimum competency [...] referenced measures to make rela[tive] [...] passed.
(One version of this method [...] Only a single grade is assigned to ea[ch] [...] grade (A, B, C, or D) are referenced t[o] [...] have been identified as minimally es[sential] [...] the relative standings of students in regard [...] as "beyond basics," or important to success in studying more advanced aspects of the subject matter. The eclectic system [...]:

1. [...] less likely to assign a barely passing grade (D) to students who have not mastered basic skills than they might be under any other relative grading system.
2. Students who fail at first can be retested to improve their grade after they improve their skills. They are not relegated to [...] on one occasion they demonstrated less learning than [...].
3. Students who excel are rewarded according to their level of achievement. [...] are incentives to [...] beyond the minimum [...] passing.
4. The system represents a [...]

THREATS TO THE VALIDITY OF GRADES

A distinction should be made between the aspects of performance that a teacher evaluates and the subset of [...] components [...] grades [...] students' competence with respect to the instructional objectives. The components of a grade should be academically oriented; grades should not be tools of discipline or rewards for pleasant personalities or good attitudes. A student who is assigned an A grade should have a firm grasp of the skills and knowledge taught. If the student is merely marginal academically but very industrious and congenial, an A grade would be misleading and would render a blow to the motivation of the excellent students in class. Instructors can and should give feedback to students on [...] traits
and characteristics, but only performance based on academic achievement should be used to determine grades. In their recommendations regarding standards and expectations, the National Commission on Excellence in Education (1983) stated that "grades should be indicators of academic achievement so they can be relied on as evidence of a student's readiness for further study." Grades contaminated by other factors give students a false sense of readiness and provide misinformation to those who seek to guide students in their future educational endeavors.

Several aspects of student performance have been labeled as potentially invalid grading components because they represent behaviors that do not reflect directly the attainment of the important objectives of instruction (Frisbie, 1977). Though some exceptions could be noted, these variables generally should not be used in determining course grades.

Neatness in written work, correctness in spelling and grammatical usage, and organizational ability are all worthy traits and are assets in most vocational endeavors. To this extent, it seems appropriate that teachers evaluate these aspects of performance and provide students with constructive comments about them. However, unless the course objectives include instruction in these skills, students should not be graded on them in the course. For example, students' essay examination scores should not be influenced directly by their spelling ability, and neither should their course grades. Students whose skills in written expression are weak can and do learn the important knowledge of science, social studies, literature, and other academic subjects.
Their writing skills can and should be evaluated in such courses, but their course grades should not suffer directly because of their writing deficiencies. To the extent that they do, these grades are misleading to both students and parents and serve to moderate rather than stimulate interest in the subject area.

Most instructors are attracted to students who are agreeable, friendly, industrious, and kind. They try to ignore or may even reject those who display opposite characteristics. When it appears that certain personalities may interfere with classwork or have limited chances for employment in their field of interest, constructive feedback from the instructor may be necessary. But an argumentative or misbehaving student who receives a C grade should have only a moderate amount of knowledge about the course content. The C should not reflect the student's disposition or disruptive behavior directly (Bartlett, 198?).

Most small classes and college seminars depend on student participation to some degree for their success. When participation is an important ingredient in learning, participation grades may be appropriate. In such cases the instructor should ensure that all students have sufficient opportunity to participate and should maintain systematic notes regarding frequency and quality of participation. (See Chapter 14 for sample recording forms.) Waiting until the end of the grading period and relying strictly on memory causes a relatively subjective task to be even more subjective and unreliable. Participation probably should not be graded in most classes, however. Dominating and [...] students tend to win, and introverted or shy students tend to lose. Instructors may want to provide
evaluative information to students about various aspects of the students' personalities, including willingness to participate, but grading should not be the means of doing so.

Students at all levels should be encouraged to attend classes because the lectures, demonstrations, and discussions presumably have been designed to facilitate their learning. If students miss several classes, then their performance on [...] papers and projects likely will suffer. If the instructor reduces their grade because of absence, such students are submitted to a form of double jeopardy. For example, a college instructor may say that class attendance counts 10 percent of the course grade, but for students who miss several classes this may in effect amount to 20 percent. Teachers who experience high [...] in their classes probably need to examine their classroom environment and instructional procedures to determine if changes are needed. There ought to be more productive means of encouraging students to attend classes than to threaten to lower their grade.

Some instructors are more generous in their grading than they ought to be [...] lower grades might hurt their students [...]. However, [...] not defensible. The [...] (1) that any negative reaction is [...] and (2) that evaluating a perfor[mance] [...] a person as a person. Not every [...] and diligent ones, is good, and [...]

Judgments about writing and speaking skills, personality traits, effort,
and motivation are made by teachers constantly as they interact with their students. [...] judgments made about grade [...] is no easy task. But accurate and meaningful grades depend on it.

GRADING COURSE ASSIGNMENTS

[...] and seventh days he missed none. He missed only one out of twenty on the test given the eighth day. Which grade best describes Freddy's level of achievement? Though he may not have caught on as rapidly as some of his peers, Freddy appears to be able to identify prepositions. Some form of grading might be used to motivate and direct Freddy and his classmates, but all such grades need not enter into determining the final course or term grade.

Perhaps the most frequent shortcoming associated with grading assignments such as papers, reports, presentations, and projects is the failure of the teacher to specify and describe in advance what the important aspects of the final product should be like. The lack of "feed forward," as Sadler (1983) has labeled it, produces two undesirable outcomes: (1) some students present incomplete assignments because they misunderstood the teacher's intent, and (2) grading becomes a chore for the teacher because the criteria that distinguish better assignments from poorer ones have not been explicated. The grading guide that seems so logical to prepare for scoring essay items is equally beneficial to the teacher for grading assignments. It can help to accomplish these things:

1. When presented to the students at the time the assignment is made, potential misunderstandings about what to do can be overcome. The nature of the final product can be described completely, and the relative importance of various aspects of it can be presented.
Often an example of an A assignment from a previous class is a helpful model.
2. Opportunities for extraneous factors to influence grading are reduced because the relevant elements have been defined. Grading variables and evaluation variables can be separated so that a conscious effort can be made by the teacher to make comments about the nongraded aspects of the work.
3. Grading can be done efficiently because little time is needed to decide which parts of the assignment to weight most heavily. Less time is needed to judge completeness as well.
4. Feedback to students can be somewhat diagnostic because missing segments and student misconceptions are more readily identified.

We discussed in Chapter 19 the importance of preparing students for examinations so that they know what to expect and can prepare themselves further. A grading guide, like the checklist shown in Figure 14-2, can serve this same useful function for assignments, and it also can contribute to more valid and reliable measures of achievement if used wisely by the grader.

COMBINING GRADE COMPONENTS

When teachers determine a course grade by combining grades or scores from tests, papers, demonstrations, and projects, each component may carry more or less weight than the others in determining the final grade. To obtain grades of maximum validity, teachers must give each component the proper weight, not too much and not too little. How can they determine what those weights ought to be and what they actually turn out to be? And if these two sets of figures are disparate, what can instructors do? It is not easy to give a firm, precise answer to the question of how much influence each component ought to have in determining the composite grade. [...] grading principles [...]
In general, the use of several different [...] is better than use of only one, provi[ded] [...] instructional objectives and provided [...] reasonable [...]. Other [...] the most reliable scores should [...] count [...] than component[s] [...] quite difficult to assess. As a first ap[proximation] [...] the standard deviation of its scores. [...] variable as another, the first [...] of the second in their total. [...] on a second, and lowest score on the th[ird] [...] the ranks of their total scores on the three tests are the same as their ranks on [...]. [The] third section of the table gives the maximum possible scores, [...] the mean scores, and the standard deviations [...]. Test [...] has [...] total points, [...] the highest mean, [...] variability [...]

Table 15-1. Weighted Test Scores
[table body not recoverable from the source]

[...] illustrated in the last section of the table. Scores on test X are multiplied by 4, to change their standard deviation from 2.5 to 10, the same as on test Z. Scores on test Y are multiplied by 2, to change their standard deviation to 10 also. With equal standard deviations the tests carry equal weight, and the students, having the same average rank on the three tests, have the same total scores.

When the whole possible range of scores is used, score variability is closely related to the extent of the available score scale. This means that scores on a 40-item objective test are likely to carry about four times the weight of scores on a 10-point essay test question, provided that scores extend across the whole range in both cases. But if only
a small part of the possible scale of scores is actually used, the length of that scale can be a very misleading guide to the variability of the scores.

[...] 20%, 20%, 10%, 30%, 20% [...] the T-scores of each component can be multiplied by 2, 2, 1, 3, and 2, respectively, to achieve the desired weighting. Oosterhof (1987) has described the weighting procedures for both criterion-referenced and norm-referenced grading situations.

One final admonition regarding relative grading and combining scores: It is a mistake to convert test scores to letter grades, record these in a grade book, and then reconvert the letter grades to numbers (A = 4, B = 3) for purposes of computing final averages. A better procedure is to record the test scores and other numerical measures directly. These can be added, with whatever weighting has been adopted, to obtain a composite score that can be converted to a final [...] which it was based, is given the same value in the reconversion process (for example, B = 3.0). Some of the reliability the teacher struggled to achieve in developing each measure is lost in the process. For this reason it is desirable to record raw scores or standard scores rather than letters or their numerical equivalents.

METHODS OF ASSIGNING GRADES

The procedures a teacher follows for assigning term grades are dictated largely by the meaning the teacher has chosen to attribute to the symbols. The multitude of methods used in practice generally can be categorized in terms of their dependence on either absolute or relative standards (Frisbie, 1978). The popular variations of these two types and their corresponding strengths and weaknesses are described in this section.

Relative Grading Methods

One popular variety of relative grading is called grading on the curve. The "curve" referred to usually is the normal distribution curve or some symmetric variant of it. The norm-referenced basis for this type of grading is complicated [...] merely an identified top group, including those who may have scored 20 points lower [...]. The bottom 5 percent may each be assigned an F, even though the bottom 15 percent may be indistinguishable in achievement. Regardless of the quota-setting strategy used, this relative grading method seldom carries a defensible [...].

The distribution gap method, another relative grading variation, is based on the relative ranking of students in the form of a frequency distribution of the composite scores. The frequency distribution is examined carefully for gaps, that is, several consecutive scores that no students obtained. A horizontal line is drawn at the top of the first gap ("Here are the As!") and a second gap is sought. The process continues until all possible grade ranges (A to F) have been identified. The major fallacy with this technique is the dependence on chance to form the gaps. The size and location of gaps may depend as much on random measurement error as on actual achievement differences between students. If the scores from an equivalent set of measures could be obtained from the same group, the gaps might appear in different locations, or the larger gaps might turn out to be somewhat small. Errors of measurement from different measures do not necessarily cancel each other out as they are expected to do in repeated measurement with the same instrument.

The major attraction of the distribution gap method is that, [...] few students appear to be right on the borderline of a higher grade. Consequently,
teachers receive fewer student compl[aints] [...] and are [...] to reexamine test papers to search for "that extra point" that would, for example, change a B grade to an A. [...] scores are highly variable, this grading method is likely to yield grades in a distribution similar to those assigned by some other relative grading methods. However, when scores are relatively homogeneous, the gap distribution method actually may be as inequitable to some students as it [...] to be.

[...] relative grading procedure might be labeled the standard deviation method. [...] standard deviation in determining the grade cutoff points [...] distribution of the composite scores. [...] the median and standard deviation of the composite scores are computed. [...] points for the range of C grades, [...] performance, are determined by adding one-half of the standard deviation to the median and subtracting one-half of the standard deviation from the median. Then add one standard deviation to the upper cutoff of the Cs to find the A-B cutoff score. Subtract the same amount from the lower cutoff of the Cs to find the D-F cutoff. Review borderline cases by using the [...] quality of assignments, or some other relevant achievement data to decide if any borderline grades should be raised or lowered. [...] composite scores, also. A variation of this method that describes the use of relative grading on an institutional basis has been illustrated in considerable detail by Ebel (197?).
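
The arithmetic of the standard deviation method can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a procedure given in the text: the function names are hypothetical, the population form of the standard deviation is assumed because the text does not say which form is intended, and a score that falls exactly on a cutoff is assumed to receive the higher grade. The review of borderline cases remains a matter of teacher judgment and is not automated here.

```python
from statistics import median, pstdev


def sd_method_cutoffs(composite_scores):
    """Compute grade cutoffs from the median and standard deviation.

    Assumption: the population standard deviation (pstdev) is used; the
    text does not say which form of the standard deviation is intended.
    """
    mid = median(composite_scores)
    sd = pstdev(composite_scores)
    c_upper = mid + 0.5 * sd  # top of the C range
    c_lower = mid - 0.5 * sd  # bottom of the C range
    return {
        "A-B": c_upper + sd,  # one SD above the upper C cutoff
        "B-C": c_upper,
        "C-D": c_lower,
        "D-F": c_lower - sd,  # one SD below the lower C cutoff
    }


def assign_grade(score, cutoffs):
    """Map one composite score to a letter grade.

    Assumption: a score exactly on a cutoff receives the higher grade.
    """
    if score >= cutoffs["A-B"]:
        return "A"
    if score >= cutoffs["B-C"]:
        return "B"
    if score >= cutoffs["C-D"]:
        return "C"
    if score >= cutoffs["D-F"]:
        return "D"
    return "F"


# Hypothetical composite scores for five students.
scores = [55, 65, 70, 75, 85]
cutoffs = sd_method_cutoffs(scores)
for s in scores:
    print(s, assign_grade(s, cutoffs))
```

For these hypothetical scores the median is 70 and the standard deviation is 10, so the C range runs from 65 to 75, the A-B cutoff is 85, and the D-F cutoff is 55. Scores near any of these cutoffs are the borderline cases the teacher would review against other achievement data.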
Absolute Grading Methods

[...] methods that depend on percent scores as the [...] have a [...] grading history, but their popularity has [...]. [...] percent scores from tests, papers, and other [...] are interpreted as the percent of content, skills, or knowledge over which students have command, [a] domain-referenced interpretation. For example, a test score of [...] percent [...] that the student knows 83 percent of the content represented by the instructional objectives from which the test items were prepared and sampled. Percent scores [...] the scores with performance standards [...] to percent scores using arbitrary standards [...] the curve. That is, students with scores in the [...] to 92 is a B, 78 to 84 is a C, and so on. The restriction here is on the score ranges rather than on the number of students eligible to receive each of the possible grades. But what rationale should be used to determine each grade category cutoff score? Why should the cutoff for an A be 93 rather than 94 or 90?

A major limitation of percent grading as used by some teachers is the use of fixed cutoff points that are applied to each grading component in the course. It seems indefensible to set grade cutoffs that remain constant throughout the course and over several consecutive offerings of the course. What does seem defensible is for the instructor to establish cutoffs for each grading component, independent of the others, depending on the content of each component. For example, the range for an A might be 93 to 100 for the first test, 88 to 100 for a term paper, 87 to 100 for the second test, and 90 to 100 for the final exam.

Those who use percent grading find themselves in a bind when the highest score obtained on a test was only 68 percent, for example.
Was the test much too difficult or did students prepare too little? Was instruction relatively ineffective? Some instructors proceed to adjust the scores by replacing the perfect score, 100 percent, with the highest score, 68 percent in this case. For example, if the highest score was 34 out of 50 points, each student's percent score would be recomputed using 34 as the maximum rather than 50. Though such an adjustment may cause all concerned to breathe easier, the new score can no longer be interpreted as originally intended: the proportion of the content domain the student knows, as sampled by the test. A new domain has been established. What useful interpretation can be made of the new scores? How can the new domain [...]

A final shortcoming of percent grading should be noted. The range of percent scores usually is limited to 70 to 100 because the passing score generally is 70 percent. The test constructor must exhibit great skill to prepare items that will yield scores distributed in this narrow range and that, at the same time, will measure relevant learning as reflected by the instructional objectives. Methods that allow for a lower passing score would permit a greater potential range of scores, likely would yield more reliable scores, and likely would result in more reliable grade assignments, assuming the full range of grades (A to F) is to be [...]

A second method of absolute grading, called here the content-based method, depends heavily on the judgments of the teacher in deciding the type and amount of knowledge students must display to earn each grade on the A to F scale. It is the method most compatible with mastery or quasi-mastery teaching and learning strategies, but it need not be limited to pass-fail or satisfactory-unsatisfactory grading scales. The procedural steps for establishing performance standards and cutoff scores are outlined below for a 50-item test built to measure achievement in two units of instruction.

1. First,
the grade to be assigned to those who demonstrate minimum [...] achievement must be established. We will use D for illustration purposes, but it could be C, as is common in graduate-level courses. The teacher must develop a description, preferably in writing, of the type of knowledge and understanding a student who barely passes should possess. Similar descriptions must be developed to describe C, B, and A performances.

2. With the descriptions [...] and decides if a student [...] only [...] [an]swer [it] correctly. If so, a D is recor[ded] [...] more than a single point [...] the minimum number of po[ints] [...] category.

3. This process continues [...] the estimated cutoff score for D [...] symbols preceding the items. As [...] the number of C symbols is tallied [...] performance. This process continues [...] grade has been determined. The res[ults] [...]

A = 48-50
B = 40-47
C = 29-39
D = 17-28
F = 0-16

[...] can be obtained by adjusting the estimated [...] depending on test length. [...]

The content-based method [...] must exercise subjectivity in describin[g] [...] [ex]ample, must display. Instructors [...] instructors are willing and able to define per[formance] [...] able to supply a defensible rationale for [...] approach has been described by [...].

A final method relates to the use [...] over 100 reports describing contract grading and concluded that "contract grading appears to have a permanent place among the most appropriate current methods of assigning grades to students." However, studies of [...] contracting generally showed that students like it, teachers assigned more high grades than when conventional methods were used, and student achievement was no higher than with conventional grading. Contract grading appears to be best suited to small classes or independent study courses in which students are given the flexibility to pursue individual interests. In such cases a written agreement should be mandatory so that misunderstandings in regard to what must be accomplished, [...] and by what deadline [...]

GRADING SOFTWARE

The time-consuming tasks associated with recording test scores in gradebooks and combining scores for final grades can be handled readily by a microcomputer and any of the numerous software packages available for grading. Some teachers use a spreadsheet program and design their own grading application program; others find the unique features of many of the commercial packages worth their relatively low cost.

Because software and hardware both change more rapidly than most other textbook content, we have chosen not to describe or evaluate specific grading software. However, current information can be located using such references as [...]. In addition, software reviews and lists of new releases are printed frequently in such journals as Electronic Learning, inCider, Classroom Computer Learning, and The Computing Teacher.

Here are some questions to raise when assessing the utility of a gradebook program:

1. How many students and grading components per student can be accommodated on a single data diskette?
2. For the elementary school level, can the system handle multiple classes for a single group of students?
3. How convenient is it to change grades or to replace scores?
4. Can test scores be imported as a data file so they do not need to be key entered one by one?
5. Does the variety of reporting and printing options satisfy basic needs?
6. Can the data be stored on a data diskette separate from the program diskette so that backup copies can be made of the data files?
7. Are there any unusual hardware requirements regarding memory, drives, or [...]
8. Can the program be returned for full refund after a reasonable trial period?

Of course, the main question to ask about any grading software is, "Will the program allow me to use the grading procedures and philosophy I have adopted?" For example, software that will not accommodate a teacher's criterion-referenced grading practices should not be considered for adoption, no matter how friendly the package seems to be.

SUMMARY PROPOSITIONS

1. There is nothing wrong with [...] students [...]
4. Grading is frequently the subject of educational controversy because the grading process is [...]
[...] measures of achievement [...] to represent summative [...]
16. The weight carried by each component [...] measure components is too small.
7. The selection of either absolute or relative standards as a basis for grading will be influenced more by philosophical considerations than by em[...]
21. The use of contract grading may be advantageous in individual instructional situations but not for grading classes of students.
2[...]. [...] the potential for [...] errors associated with grading.

QUESTIONS FOR STUDY AND DISCUSSION

1. For what uses are high school course grades most valid?
2. Under what circumstances would it be appropriate [...] grades issued by teachers in a school to evaluate [...] curriculum of that school?
3. What is grade inflation, and what kind of evidence is needed to show that it has or has not [...]
4. When letter grades are used on report cards at the middle school level, what information should be furnished to communicate the meaning of each grade symbol?
5. What are some effective means of rewarding students for their superb effort to learn, particularly in the face of relatively low achievement?
6. Why is the use of plus and minus letter grading likely to yield more valid grades than simple [...]
7. What advantages does letter grading have over numerical grading?
8. What shortcomings, if any, are inherent in the eclectic grading system described in this [...]
9. What incentives, other than grades, can teachers use to motivate students to participate in class activities and to complete homework or practice exercises?
10. What are the disadvantages of the "feed forward" concept that is recommended for use in grading assignments? How could each disadvantage you identified be overcome?
11. If the scores from three 90-point tests are added together to form a composite for grading, why would each test not necessarily have equal influence (weight) in determining the rank order of individuals in the composite?
12. Under what circumstances might grading on the curve be particularly appropriate?
13. What drawbacks does the standard deviation method of grading have for relatively small [...]
14. Why might 50 be a more appropriate passing score than 75 in a percent grading system?
15. What are the ideal characteristics of a computer gradebook system for use at each of these grade levels: elementary, middle school, high school, college?

The Nature of Standardized Tests

CHARACTERISTICS OF STANDARDIZED TESTS

The term standardized test refers to a test that has been expertly constructed, usually with tryout, analysis, and revision; includes explicit instructions for uniform (standard) administration and scoring; and provides tables of norms for score interpretation purposes, derived from administering the test in uniform fashion to a defined sample of persons. Used loosely, the term ... published test or inventory, whether or not ... Most precisely, ... tests or measures ... a means for making score comparisons ... tasks under the same testing conditions and ... with the same procedures. Of course, not all ... intended to yield norm-referenced comparisons. Criterion-referenced ... and domain-referenced achievement tests and some personality ..., all of which may be commercially prepared and ... usually provide tables of norms.

Standardized tests serve the same function in education and psychology as standard weights and measures do in commerce and science. If each market had its own type of scale and concept of how much a pound is, we could not be sure that a pound of ground beef purchased at one market would be more or less than a pound obtained at another. The same problem would face the consumer at the gas station, the fabric shop, and the candy counter. Without standardized tests, the achievement and abilities of students in different classrooms and schools could not be assessed readily with a common yardstick.
For example, if each fifth grade teacher in a district were to develop a geography test to measure student achievement, we would likely find tests that varied markedly in the breadth and depth of tasks required, the number of items, the amount of testing time allowed, the quality of test items, and the reliability of the scores obtained. Certainly, it would be illogical and inappropriate to make score comparisons among students from different classrooms and schools under such circumstances.

The distinction made in Chapter 2 between tests and measures will be followed here in detailing the characteristics of test batteries and single-subject tests. In addition, because standardized personality measures and inventories are used so rarely by most teachers and administrators, we have chosen to limit our treatment of standardized instruments to tests in the areas of achievement, cognitive ability, and aptitude.

Test Batteries

Some standardized tests are developed, published, and administered in coordinated sets known as test batteries. The number of tests in the set may vary from 3 or 4 to 10 or more, the number of items per test may vary from as few as 20 to 100 or more, and the administration time per test may range from about 10 minutes to more than an hour. The administration of batteries like the Iowa Tests of Educational Development or the Differential Aptitude Tests may take as many as five separate test sessions.
A primary advantage of using a battery over a collection of separate tests, whether for achievement or aptitude measurement, is that the battery provides comparable scores from the same norm group for all its tests. This is important, for example, if Mindy's achievement in mathematics is to be compared with her achievement in reading, language, and science. Her relative strengths and weaknesses cannot be assessed unless norm-referenced scores using a single reference group are available. If separate tests were used, Mindy might seem to do better on the reading test than on the math test simply because students of lower achievement were more prominent in the norm group of the reading test. This illustration explains why aptitude batteries are used so frequently in employment and vocational counseling to help the client understand his or her areas of strength and weakness. The use of separate tests would not permit useful intraindividual comparisons.

An achievement battery is a survey of the subject matter covered by each test; coverage is broad and, therefore, relatively shallow. A battery can provide comprehensive coverage of most of the important aspects of achievement at the elementary school level, many at the secondary level, and some at the college level. The more uniform the educational programs of all students are, the more suitable a test battery will be for all of them.

A very practical advantage of a battery is that the scores from a battery are reported together on a single report. When separate tests are used, a score report is generated for each separate test, creating a most cumbersome accumulation of paper for the user.

The use of a battery of tests that was developed as an integrated whole thus offers substantial advantages. The main disadvantage is the lack of flexibility it allows. A battery may include some tests that are of little interest to users and may omit others they would have preferred.
But this is part of the price that must be paid sometimes for the advantages of convenience in use, comprehensiveness of coverage, and comparability of scores. Most achievement ...

Single-Subject Tests

Tests that measure achievement in one content area, or that measure a ... a single-subject test. And because such ... than the corresponding test found in a battery, they will contain more total items and more items per skill.

Single-subject tests tend to be used for particular purposes, to make a specific kind of instructional decision, rather than simply to describe students' relative achievement or aptitude levels. For example, readiness tests used at the primary level might help the teacher group students of similar reading or arithmetic achievement levels for instructional purposes. A mathematics test might be used to decide which seventh graders are the most likely candidates for eighth-grade algebra. Reading tests are used to help select reading materials that would be most appropriate for developing the reading skills of each student. And, of course, proficiency tests and graduation competency tests are used to make promotion and retention decisions.

Some single-subject tests resemble ... or they provide skill scores. Some ... separate scores on vocabulary, spelling, capitalization, ... A reading test may yield ... comprehension score, and a total score. Often ... with one another that their separate ... the total score is probably a comprehensive indicator of achievement in the broad content domain defined by the test specifications.

Most of the standardized tests of cognitive abilities (intelligence) to be described more fully in Chapter 18 are most appropriately classified as single-subject tests. That is, the trait these tests attempt to measure generally is a single, unitary characteristic.
Despite the differences among "intelligence" tests in what they purport to measure and in the theory on which they are based, and despite the fact that some yield subtest scores, most intelligence tests are less like batteries and more like single-subject tests.

TYPES OF STANDARDIZED TEST SCORES

Seldom are the raw scores (number correct) obtained by students on standardized tests interpreted directly. Instead, raw scores are converted to some other score scale to facilitate interpretation. These new score scales are designed to permit direct norm-referenced interpretations by referring to a single reference group (status scores) or to several reference groups that have been linked to the same score scale (developmental scores).

Status Scores

Status scores indicate how a student's test performance compares with those of others in a single reference group: a class, school, school district, or national group. Relative position in the group is the focus. Status or standing in the group generally is expressed as a percentile rank, but standard scores like those described in Chapter 4 frequently are used as well. In most cases stanines, T-scores, or normal curve equivalents (NCEs) are normalized standard scores derived from percentile ranks. The standard age scores or deviation IQ scores that come from cognitive abilities tests are status scores also.

The primary purpose of status scores is to help in identifying intraindividual differences in achievement (or ability) across tests in a battery. For example, Vic's percentile rank of 14 in vocabulary indicates a relative weakness compared with a reading percentile rank of 42. Science might be considered a strength for Vic and math a weakness if his science stanine score is 7 and his math stanine is 4. Of course, such comparisons are legitimate only when the same reference group has been used.

Note that the use of status scores to monitor year-to-year progress can mask growth. For example, a student whose reading percentile rank is 87 this year will obtain a similar score next year if normal growth occurs. The sameness conveyed by status scores in this situation could be misinterpreted to mean that no change occurred. In fact, a score of about 87 next year would indicate the student's achievement changed as much as the achievements of others in the norm group. (See the guidelines shown in Chapter 17 for interpreting percentile ranks.)

Developmental Scores

Developmental scores indicate how a student's test performance compares with those of others in a series of related reference groups (Hoover, 1983). These groups differ systematically and developmentally in average achievement and are defined in terms of school grade or chronological age.
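The contrast between status and developmental scores can be sketched numerically. The following is a minimal illustration, not a publisher's scaling procedure: the developmental scale, the grade 4 and grade 5 norm-group means and standard deviations, and the function names are all hypothetical, and each norm group is assumed to be normally distributed. It shows that a student who holds the same percentile rank in two successive years has still moved up the developmental scale, and that a student who did not grow at all would drop in percentile rank.

```python
from statistics import NormalDist

# Hypothetical norm groups on a made-up developmental scale:
# grade -> (mean, standard deviation). These are NOT values from any real test.
NORMS = {4: (200.0, 20.0), 5: (212.0, 22.0)}


def percentile_rank(score, grade):
    """Status score: percent of the grade's norm group scoring below `score`."""
    mean, sd = NORMS[grade]
    return round(100 * NormalDist(mean, sd).cdf(score))


def score_at_percentile(pr, grade):
    """Developmental-scale score that sits at percentile rank `pr` in a grade."""
    mean, sd = NORMS[grade]
    return NormalDist(mean, sd).inv_cdf(pr / 100)


# A student at the 87th percentile in grade 4 and again in grade 5.
grade4_score = score_at_percentile(87, 4)
grade5_score = score_at_percentile(87, 5)

print("Same status both years: PR 87")
print("Growth on the developmental scale:", round(grade5_score - grade4_score, 1))

# If the grade 4 score were simply repeated a year later (no growth),
# the status score measured against grade 5 norms would fall.
print("PR in grade 5 with no growth:", percentile_rank(grade4_score, 5))
```

Under these assumed norms, an unchanged percentile rank corresponds to a full year of typical growth, which is why status scores alone can hide progress.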
Score scales most frequently used to express developmental level include grade equivalents, age equivalents, and developmental standard scores (sometimes called expanded standard scores). Grade equivalents are most appropriately used in grades K to 8 with school subjects that are studied continuously over several years at increasing levels of skill and complexity.

To obtain a table of grade equivalents, the test must be given to a large number of students in each of the several grades for which it is intended. Then the median raw score of students in each grade is determined. The raw score is assigned a grade-equivalent score that expresses the grade level ... a grade equivalent of 3.2 would be assigned to that raw score. If the median raw score obtained by fourth graders on the same test at the same time was 30.3, then a grade equivalent of 4.2 would be assigned to that raw score. (Does it make sense that a raw score of 26.0 would be assigned a grade equivalent of 3.7?) Grade equivalents usually are expressed to the nearest tenth, each tenth corresponding roughly to one month of schooling in a school year of approximately 10 months. A grade equivalent of 7.4, for example, represents the median performance of seventh graders at the end of the fourth month.¹ Table 16-1 shows the grade equivalents assigned to the typical student in each grade for each of three testing times. Note the average growth rate from year to year is 1.0 and that the same uniform growth is assumed throughout each year.

Table 16-1. Grade-Equivalent Scores for Median Performance at Each of Three Times of Year in Each Grade

Grade   Fall   Winter   Spring
K       K.2    K.5      K.8
1       1.2    1.5      1.8
2       2.2    2.5      2.8
3       3.2    3.5      3.8
4       4.2    4.5      4.8
5       5.2    5.5      5.8
6       6.2    6.5      6.8
7       7.2    7.5      7.8
8       8.2    8.5      8.8

¹As will be seen in the examples used later, some publishers drop the decimal point when reporting a student's grade equivalent. For example, ... would be interpreted in ...

Grade-equivalent scores can be used to describe a student's developmental level, in terms of school grades, and to measure growth from year to year. But they are less useful for examining relative strengths and weaknesses because, as Table 16-2 illustrates, variability in each test area is different for a given grade group. For example, all sixth graders whose raw scores are at the median in the fall have GE = 6.2, no matter which test area we consider. But performance at the 95th percentile corresponds to a GE of 9.2 for spelling and a GE of 8.0 for math computation. If we looked only at grade equivalents to make judgments about strengths and weaknesses, in this example we would erroneously consider spelling a strength, relative to math computation. Because sixth graders are more homogeneous in math computation achievement than in spelling, the range of grade equivalents needed to describe the bulk of this grade group is 4.2 to 8.0 and 3.2 to 9.2, respectively.

Table 16-2. Differences in Grade-Equivalent Distributions by Test Areas

Developmental standard scores are similar to grade equivalents in function and have the same advantages and disadvantages of most other types of derived scores. The developmental standard scores shown in Table 16-3 have average growth rates that decrease as students progress through the grades. These particular scores, used with the Iowa Tests of Basic Skills, illustrate a significant limitation of all developmental standard score scales:
there is no meaning or interpretation built into a score. What does a score of 120 mean for a fourth grader tested in April? Without access to a chart like Table 16-3, we would need to know these three things: (1) median performance in fall of grade 3 is defined as 100, (2) median performance in fall of grade 8 is defined as 160, and (3) average annual growth for grades 3 to 8 is 12. Developmental standard scores are not widely used because of the extra baggage required to interpret them and because they are so unfamiliar to teachers and parents.

Table 16-3. Developmental Standard Scores for Median Performance at Each of Three Times of Year in Each Grade

Grade-equivalent scores are fairly easy to interpret because they are ... a scale that is ... by individuals who have little sophistication with ... statistics. They are subject to misinterpretation, just as are status scores, but there is no convincing evidence that developmental scores are more grossly misused or misinterpreted than are status scores (Hoover, 1983). The use of common sense and some basic knowledge about developmental scales are the key ingredients to responsible interpretation of grade-equivalent scores. An example will illustrate.

If Jonnette, a bright fifth-grade girl, gets a grade equivalent score of 8.4 on an arithmetic test designed for grades 5 and 6, how should her score be interpreted? Chances are this test was not administered to eighth graders, so the value 8.4 is the estimated grade equivalent (by the process of extrapolation). The typical student in the eighth grade, fourth month would score about the same as Jonnette did on this test.
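The way a table of grade equivalents turns a raw score into a grade equivalent, including an extrapolated value like the one in this example, can be sketched as straight-line interpolation between grade medians. This is a simplified illustration rather than any publisher's actual scaling method. The function name is invented, and the third-grade median of 21.7 is a hypothetical value chosen so that the sketch agrees with the earlier statement that a raw score of 26.0 corresponds to a grade equivalent of 3.7; the fourth-grade median of 30.3 comes from the text.

```python
def grade_equivalent(raw_score, medians):
    """Estimate a grade equivalent by linear interpolation between grade medians.

    medians: list of (median_raw_score, grade_equivalent) pairs, at least two.
    Returns (grade equivalent rounded to the nearest tenth, extrapolated flag).
    """
    points = sorted(medians)
    if len(points) < 2:
        raise ValueError("need medians for at least two grades")
    # Walk the adjacent pairs and stop at the first segment that reaches
    # raw_score. Scores below the lowest median fall back on the first
    # segment and scores above the highest median on the last one; both
    # cases are extrapolation beyond the grades actually tested.
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if raw_score <= x1:
            break
    slope = (y1 - y0) / (x1 - x0)
    estimate = y0 + slope * (raw_score - x0)
    extrapolated = raw_score < points[0][0] or raw_score > points[-1][0]
    return round(estimate, 1), extrapolated


# (median raw score, grade equivalent); 21.7 is hypothetical, 30.3 is from the text.
MEDIANS = [(21.7, 3.2), (30.3, 4.2)]

print(grade_equivalent(26.0, MEDIANS))  # interpolated between the two medians
print(grade_equivalent(35.0, MEDIANS))  # beyond the tested grades: extrapolated
```

The returned flag marks values that lie outside the range of grades that actually took the test, which are estimates from the score scale rather than observed medians.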
However, this does not mean Jonnette can do the same arithmetic as the typical eighth grader. She would need to take a test designed for eighth graders for us to know how she would perform on arithmetic content studied by eighth graders. Students who obtain grade equivalent scores significantly above or below their own grade level should be retested with a higher or lower test form if the user wishes to obtain more precise indications of their developmental levels. Often the percentile rank, a status indicator, is helpful in making judgments about the value of out-of-level testing for a particular student.

Score Profiles

Only if scores on the several tests used are comparable is a profile of student scores meaningful. Scores will be comparable if they are expressed as the same status scores (all percentile ranks, or all the same type of standard score) and if the same reference (norm) group is used for each one. An example of one student's score profile is shown in Figure 16-1. The horizontal lines on the chart represent various percentile ranks, spaced as they would be if the trait being measured by the scores was normally distributed. There is a vertical line on the chart for each test in the battery. The percentile rank values shown across the top of the chart for each test are marked as dots on the corresponding vertical scales and connected by lines to form the profile. Larry Hill's performance is about average, overall. (His percentile rank for the total test is 52.)
His highest achievement levels (relative strengths) are in reading, vocabulary, and work-study skills. His lowest (relative weaknesses) are in language and mathematics.

Figure 16-1. Sample Student Profile Chart (Iowa Tests of Basic Skills, Form G, H, or J)

Profiles are most useful for identifying individual needs of students and for vocational and educational planning. A profile also might be used to identify students who should be tested more extensively or to determine if impressions formed from classroom testing and observation are confirmed. Profiles represent a very compact form of visual communication that makes them convenient for reporting and explaining test results to both students and parents. (Additional examples of profiles can be found in Chapter 17.)

Percentile Bands

In an attempt to stress the fact that test scores are subject to error, some test publishers choose not to report an exact percentile rank for each test score. Instead, they provide a range of values within which the "true" percentile rank probably lies. This range is called a percentile band. For example, the test manual may show that the percentile rank for a test score of 63 is between the values 28 and 57; it may go on to stress that the exact percentile rank equivalent is unknown, since it depends on the unknown size and sign (positive or negative) of the error of measurement in the individual's score.

The principle employed in computing percentile bands is the same one involved in using the standard error of measurement (Chapter 5) to find the raw score range in which the true score probably lies. The width of the percentile band depends on two factors, the reliability of the scores and the degree of certainty that the band includes the true value. Low score reliability or high degrees of certainty lead to wide percentile bands. Unfortunately, the broader these percentile bands are, the less useful is the information the test provides.

One use of percentile bands in a battery of tests is in deciding whether or not a difference between any two scores of an examinee is too large to be due solely to errors of measurement. The score report in Figure 16-2 demonstrates the use of percentile bands on scores for Alison Babka from the Iowa Tests of Basic Skills. In the upper-right ... vocabulary, for example, Alison's ... which means there is a 50 percent ... is in the range ... In the bottom of the ... scores from each test area. Why are ... of 100 always have a percentile rank band at the top of ... the "HIGH" area?

There is a possibility of underinterpreting test scores using percentile ... selecting high level ... be better off relying ... for decision making ... the user can be that ... of confidence for ... on the percentile rank. Generally, the larger a score difference, the more confident ... a corresponding achievement difference actually exists. But usually ... made in the decision-making context using other ... are more likely to help than to hinder the process.

Subtest Scores

Just as tests that constitute a test battery provide separate measures of different aspects of achievement, so it is possible to subdivide a single test into parts to obtain separately scored measures of several unique skills. The report in Figure 16-2 demonstrates this. The desire to obtain as much information as possible from a test sometimes leads the test developer to offer a large number of skill scores, each of which may be based on only a few test items. There are two cautions to note with regard to interpreting such skill or subtest scores. First, as the number of separate scores increases, the reliability of each probably diminishes. On many tests, a subtest score based on as few as 10 or 15 items may measure sampling error more than it does true
achievement. The percentile bands in Figure 16-2 help alert the user to this possibility by showing the size of a standard error.

The second caution relates to the validity of subtest scores. When subtest scores are provided, as is true with many single-subject reading tests, the developer should provide subtest intercorrelations in the test manual to show how similar or different the subtests actually are. If the correlations are too high, for example, the subtests are all likely to be measuring the same unitary trait or skill. The responsible test user should focus interpretations on total test scores in such cases and ignore the availability of the subtest scores.

NORMS

Norms, which report how students actually do perform, should not be confused with standards, which represent estimates of how well they should perform. For example, the standard of correctness in arithmetic calculation in most classes is 100 percent, but the norm (average) of student achievement on a given computation test may be only 85 percent. Often the average performance takes on the function of a standard. That is, the average becomes the criterion against which the scores of individuals are judged to determine the score meaning and value. Consequently, few students are regarded as failures in an area of study
if their performance is above the norm (or average), and few are regarded as successes if their performance is below it.

Norms are sometimes confused with the various types of scores that are used to report them. Percentile ranks, stanines, grade equivalents, and standard scores are all types of scores, derived from raw scores, to report normative performance; they are not norms themselves. Norms are differentiated by certain characteristics of the reference group that comprise them. There are age norms and grade norms, local norms and national norms, group norms and individual norms, to name only a few. It is possible to combine the characteristics of a norm group in a variety of ways in an attempt to build highly differentiated norm groups. For example, the National Assessment of Educational Progress (NAEP) reports normative performance based on age, geographic region, race, gender, and community type. We could find out how white nine-year-old boys from the rural West score on test exercises, but seldom is it worthwhile to use so many variables in combination to describe test performance. (And, fortunately, NAEP does not use that many classification variables in a single comparison.)

... and who take the test as seriously as will other students for whom the norms are needed. The three R's most often used to judge the appropriateness of a set of norms for a given testing situation are representativeness, relevance, and recency.

Norms obviously must be obtained from students in schools that are willing to take time out from their other responsibilities to help with the norming administration. That very willingness may make them somewhat atypical of the national population of schools and students. To get enough participation from schools to provide a reasonably large norm group is a difficult undertaking. To make it a representative sample is even harder. First, the developer must decide ... students. That is, a more ... representative sample ... private ... East, South, Midwest, and ... The administration of tests to ...

Figure 16-3. The Effect ... in a Period of ... Norms for School Averages ... Achievement ...

Individual versus Group Norms

A somewhat serious error that some test users make is to use norms ... school ... from scores ... Although ... some other ... school ... for the ... group should be about the same as the mean ... school averages are ... student scores ... For example, the average score in a truly excellent ... the school may be lower than scores ... obtained ... in the norm group, and ... school ... be better than the scores of one ... of school ... interpretations are made, the percentile ranks of the ... averages are not likely ever to be lower than 20 or higher than 80. The most extreme degrees of excellence or deficiency are likely to be underestimated drastically.

The most ideal basis for evaluation of school averages, a type of treatment-referenced interpretation, is a separate table of norms for school averages. And the quality
of school norms should be judged by the same criteria of relevance, representativeness, and recency as were recommended for individual student norms. Figure 16-3 exemplifies the predicament that can develop when dated norms for school averages are used.

SELECTION OF STANDARDIZED TESTS

Sources of Information

For those who wish to identify published and unpublished tests that measure a particular trait, or those who seek descriptive information or critical reviews of existing measures, a wide variety of sources is available. Most information will be found in print, but much of it is accessible through computer retrieval.

The Mental Measurements Yearbook (MMY) generally is regarded as the most comprehensive source of information about published tests. The tenth edition (Conoley and Kramer, 1990), the most current printed edition at the time of this writing, includes such descriptive information about each test as author, publication date, number of forms, ..., number of scores reported, administration time required, and prices for tests and scoring services. In addition, critical reviews by testing specialists and a bibliography identifying research studies in which the measure was used are provided. Tests in Print III (Mitchell, 1983) is a summary reference to information detailed in all the MMYs published previously. (The fourth edition is scheduled to be published in 1991.)

The Buros Institute of Mental Measurements has made several changes to reduce the severe publication lag that plagued earlier volumes of the MMY. First, a biennial publication schedule has begun, and paperback supplements are provided in
the alternate years. This means information about current tests is updated in print every year. Second, all MMY information is accessible on-line using Bibliographic Retrieval Services (BRS). The system (MMYD) is updated continually so that a computer search will uncover the most recent descriptive and evaluative information about available tests.

Another source of descriptive information about published tests, Tests: A Comprehensive Reference for Assessments in Psychology, Education, and Business, is a cumulative listing that contains over 3100 entries (Keyser and Sweetland, 1990). A companion publication, Test Critiques, provides comprehensive reviews that include recommended applications, technical information, and an overall critique. At the date of this writing, seven volumes had been published (Keyser and Sweetland, 1985-1990).

The most current information about standardized tests is in publisher catalogs and the tests themselves. Those who are charged with selecting tests for a school testing program should review a specimen set for each test under consideration.
For a nominal fee, the publisher will provide a copy of one form of the test and the accompanying test manuals to individuals who are authorized to use standardized tests. Publisher representatives can answer questions about their tests and processing services through either telephone inquiries or school visits requested by the test-selection committee.

Some tests have been reviewed by measurement specialists in professional publications like the Journal of Educational Measurement or Measurement and Evaluation in Guidance. These reviews, as well as validity studies published in Educational and Psychological Measurement, can be identified readily through a computerized literature search of Current Index to Journals in Education (CIJE) for a very ...

Finally, college and university faculty members in education and psychology departments often are available to consult with school personnel regarding test selection and use. Some universities are willing to provide consultation and test-scoring services to school districts through their campus measurement and test-scoring centers. The same services are supplied by some state education departments through area or regional centers established throughout the state to serve school districts.

Selection Criteria

Sources of information available to committees or individuals responsible for selecting standardized tests were described in the previous section. But what information should be sought from these sources, and how should the information be weighed in arriving at a selection decision?

Validity. Without question, test content (what the items or test tasks require the examinee to know) is the most important factor to assess. How well the tests or subtests match the curriculum in terms of content coverage and emphasis must be determined in selecting achievement tests. For tests of aptitude and intelligence ... make them easy for students to use. ... should be legible in terms of size and clarity.

Technical adequacy.
A test that has been judged to be sufficiently valid to allow the district to accomplish its purposes for testing should be scrutinized further for technical adequacy. The reliability of test and subtest scores should be assessed from data supplied in the technical manual and comments made by reviewers of the test. Data should be provided in the manual about the equivalency of alternate test forms that may be available. When developmental scores are ... are satisfactory.

Practical considerations. Tests that survive a validity and technical screening should be evaluated in terms of certain other important considerations. Schools ...

SUMMARY PROPOSITIONS

... selecting a standardized ... validity ... and such practical considerations as time requirements and cost.

QUESTIONS FOR STUDY AND DISCUSSION

1. Which characteristics of standardized tests are most significant for tests that will be used to make criterion-referenced interpretations?
2. Why might the national norm groups of two different publishers yield different scores (means and standard deviations) if the groups were given a common test?
3. Why are achievement batteries generally less useful at the high school than elementary level?
4. If a math test providing three separate scores has intercorrelations of 0.79, 0.85, and 0.83 between subtests, what are the implications for score interpretation?
5. Why should T-scores be regarded as status scores rather than developmental scores?
6. Why might grade-equivalent scores be less useful for interpreting scores of high school seniors than those of third graders?
7. If a sixth grader is working at the level of the typical fourth grader,
s I more appropriarelo tesl lhe stldentwnh a tes(ballery desgredTor grade 4 or!rade 6? &pa n your response Whal !s lhe meanrrgol a score pror e. !s ng percenlilefanks lhalforms a sLraghl horzon I Why ar e pef c e. l le{ ank baNdspadic ! a r y u s e f u i n n l e r p r e l . q s k r l o r s l b 1 e s is . o r e s t 10 Whalar e lhe d ller enc esbelweennat o n a l s t a n i n en o r m sa n d l o c a p e r c e n L l er a n kn o r m s t 11 Dlr nq a percd o' naliona achevemenl score dec ne whal impacl w the lse ol dated norms have on rhe nlerprelaronsof s1!de.ls scores, 12 Wnardoes t m ea. when a s c hools a v e r a g es c o r eo l 3 3 5 r o r g r a d e2 s c e n c e h a s a n a riona per c ehlier ank ol97? 13 In whal sour.es are yo! kely lo lindlhe mosr cu(e.r crricar revrewor a slafda.dized ach evemenl lesl? using Standardned Achievement Tests There ar€ good reasons why this chapter focuses on 1l.rr,rather than on selecring t€sts, describing sample test content, or administering and scoring tests. Ihese ar€ all important aspects of sBndardized achievement testrng, but none is so critical as the use of tesis and the scores deriled from them. The climarc of the 1990s drearcns valid test use because there is too much testing for purposes fbr which tests were not designed, and th€re is too li[le appreciation fof the lilnited precision wnh rvhich we are able ro measure educational anainmeDts Conse. quenrly, attention in ftis chaprer will be directed toward (l) an analysis of rhe legitrmate uses of standardized achievement tests, (2) illustrations and explana. tions ofscore interprctation, (3) ptanning topics fbr in.scNice work in rest selc.. 
tlon, administration, and int€rpretation, and (4) an exploration of some issu€s that affect the quality of a school testing program THE STATUSOF STANDARDIZEDACHIEVEMENT TESIING The great irony of standardued achrevement testing today is that, while these rcstsare b€ing ovensed to fulfill slate and local legisladve mandates, their results are being underutilized in serving the instructional needs of rcachers and sr. dents. To be sure, many school districts have caretully developed rcsting pro. F?ms with systematic procedures for interyreting and reporting th€ir test re' suhs. But in too many cases the schools are required to give certain tests so thar results can be made available for such DurDos€s as interstate and inrcrdistncr c om par is ons ,te a c h e ra n d a d mi n i s tra to rp ersonnel de.i si ons.and pupi l rerenti on 303 304 USINGSTANDAFD]ZED ACH EVEMENTTESTS judgmcnts. I he many informauon seekers leghlarors school board nembers, parent gloups! business leaders, and school adminrsraro.s and reachers-do nor share the samc agcnda an.l do Dor, therefore, havc need for rhe saDe rvDes of Teache.s and adminBtrarors gcnerally make less use of srandardized achievement-tesr resulrs rheD rhey could lbr rwo reasons.Firsr, educarors tend ro understand tar less about rcsrs and resr scores rhan would be desirable. 
Their educational preparation programs and in-service education seldom address the essentials of testing and evaluation. Consequently, teachers often can devote less [...] consequences because of low scores on mandated tests must invest the bulk of their energy and instructional time in preparing their students to do well in the areas (usually reading and math) covered by the accountability assessment. As a result, even if other test scores point to areas of weakness, these teachers cannot afford to split their efforts away from the "high-stakes" mandated assessment.

[...] accepted among many educators and the general public that its validity as a measure of educational accomplishment is virtually unquestioned. To some extent, unfortunately, he may be correct. We have become so uncritically [...] expectations. For example, when the bathroom scale gives a higher reading than we expect, how many of us first wonder if the scale is functioning properly? When stopped for a speeding violation, how many drivers first question the accuracy of the radar equipment? But when achievement [...] frequently try to explain substandard test results in terms of teacher quality, funding, or physical resources, and simply assume the appropriateness of the measure that provided those results.

Another explanation for the increased use of standardized achievement tests, particularly mandated assessment, is that the celebrated National Commission on Excellence in Education recommended more. In its report, A Nation at Risk (1983), a specific framework with explicit purposes was given. In outlining the evidence for concluding that we are a nation at risk, the commission reported 13 "indicators of risk," 11 of which depend on the use of standardized test scores as criteria.

Whether or not the achievement test has become an institution is debatable. What seems certain, however, is that good standardized achievement tests will continue to be needed to help educators monitor the effectiveness of their efforts and report the outcomes of their efforts to local boards and parents. Careful test selection and wise test-score interpretation and use can make positive contributions to fulfilling these needs.

USES OF ACHIEVEMENT-TEST RESULTS

Standardized achievement-test scores provide a special kind of information on the extent of student learning. It is special because it is based on a consensus of expert teachers with respect to what ought to be learned in the study of a specific subject, a consensus external to and independent of the local teachers. It thus provides a basis for comparing local achievements with external norms of achievement in similar classes. It is useful information because it helps to inform students, teachers, administrators, and the public at large of the effectiveness of the educational efforts in their schools.

Schools sometimes have been criticized for setting up testing programs, giving and scoring the tests, and then doing nothing with the test scores except to file them in the principal's office. If the school faculty and the individual teachers do not study the test results to identify levels and ranges of achievement in the school as a whole and within specific classes; if they do not single out students of high and low achievement; and if the scores are not reported and interpreted to students, parents, and the public, these criticisms are justifiable.
But if the critics mean that no coherent program of action triggered specifically by the test results and designed to "do something" about them emerged from the testing program, then the criticisms probably are not justified.

What a good school faculty "does" about standardized test scores is something like what good citizens do with information they glean from a newspaper. Having finished the evening paper they do not lay it aside and ask themselves, "Now what am I going to do about all this, about the weather, the accidents, the crimes, the legislative decisions, the clothing sales, the stock market reports, the baseball games won and lost, and all the rest?" They may, of course, plan specific actions in response to one or two items. But most of what is memorable they simply add to their store of latent knowledge. In hundreds of unplanned ways it will affect the opinions they express later, the votes they cast, and the other decisions they make. Information can be very useful ultimately, even when it triggers no immediate response.

Educators who properly deplore judging teacher competence solely on the basis of students' test scores sometimes fail to see that it is equally unwise to take action on school or student problems solely on the basis of those same test scores. Seldom do standardized test scores by themselves provide sufficient guidance for wise and effective educational actions. It follows that these test scores should be regarded primarily as sources of useful information, not as major stimuli and guides to immediate action.

A school faculty or teacher who sees the need and has the opportunity should not hesitate to develop a program for action based partly on the scores provided by standardized tests of achievement. But neither should feel that the testing was a waste of time unless such a program is developed. The immediate purpose to be served by standardized test scores is the provision of instructional information, information that can contribute to the wisdom of a host of specific actions stimulated by other educational needs and developments.

Purposes for Testing

All achievement tests, whether standardized or teacher made, are mainly tools of instruction. That is, they are designed on the basis of the goals of instruction, and their results are intended to show the extent of progress toward those goals. Standardized achievement test batteries provide surveys of the extent of learning in each of several curricular areas; their scores are expected to improve the decisions teachers make about students. The assumption is that teachers will make better instructional decisions about students with such test scores than they would without them (Hieronymus and Hoover, 1986). Scores are not intended to supplant teacher judgments. Instead, they may help to confirm suspicions and expectations, they may provide conflicting information that should trigger reassessment, or they may point out the need for further, more detailed information.

The purposes outlined by the authors of the Iowa Tests of Basic Skills indicate the importance of serving instructional needs and, by their absence, the inappropriateness of using such tests for a variety of accountability functions (Hieronymus and Hoover, 1990, p. 1):

1. Describe the developmental level of students so that instructional materials and procedures can be adapted to individuals.
2. Diagnose individual strengths and weaknesses in educational development across subject areas and skills within subject areas.
3. Determine the extent of readiness to begin instruction, to proceed in an instructional sequence, or to move to an accelerated level of instruction.
4. Inform administrative decisions in grouping individuals to accommodate individualized instruction.
5. Diagnose group strengths and weaknesses for adjusting curricular content, emphasis, or approach.
6. Determine the relative effectiveness of alternate methods or programs of instruction.
7. Determine the effectiveness of innovative programs or experimental approaches.
8. Provide a means for developing reasonable expectations for student achievement and for describing progress toward such goals.
9. Describe student achievement in terms that are meaningful to parents, students, and the general public.

Examples of some of these specific purposes will describe how achievement test scores can be used to select students for remedial attention or for enrichment opportunities, for readiness for planned instruction, or for diagnosing difficulties.

Chapter 1 Selection and Evaluation

[...]

Talented and Gifted Selection

[...] a national percentile rank of at least [...] on the [...] programs depends on such nonacademic variables as interest, motivation, persistence, and independence. The nature of the program, the demand for participation, and the extent of local resources may vary enough over time to warrant setting separate criteria for the various TAG programs [...] available.

Kindergarten Readiness

[...] kindergarten. Such placement decisions should be made using information that [...] provide. [...] who are not ready for kindergarten are those who have some kind of developmental deficit (physical, emotional, social) that needs special attention (or time) [...] the regular kindergarten program does not [...] often have not had a preschool or home environment that nourished such skills. But these are cognitive abilities that can be learned relatively quickly, given some concentrated effort by student and teacher. The main value of readiness scores is to provide a picture of student strengths and weaknesses, at the skill level, and to describe the readiness in each subject area of the class of students. Thus, readiness tests are most useful when given in the mid-fall, a time that allows plenty of [...]

Finally, for reasons similar to those cited immediately above, readiness scores obtained in the spring of kindergarten are not useful for making first-grade placement decisions. Decisions to retain, to place in a transitional program, or to promote are not likely to be aided by readiness-test scores. Shepard and Smith (1986) have argued that retention at this decision point has only negative consequences for most students, no matter what criterion is used. Students who show severe weaknesses in prereading skills (listening, letter recognition, letter-sound association, and language relational concepts) may need temporary individual-program plans to remediate their deficiencies. But certainly other considerations should inform the retention decisions, not solely the academic deficiencies of kindergarten.

Diagnosis of Learning Difficulties

Standardized achievement batteries are not designed to be diagnostic aids that provide detailed information for work with individual students. However, most do provide considerable group diagnostic information, particularly those that display results in special reports that show average test-item scores or average skill scores within tests. Instructional planning for a class can be enhanced by taking such data into account; instructional materials can be selected or developed to improve learning in deficient areas, and time can be reallocated from topics on which students have demonstrated higher levels of accomplishment.

Any achievement test can provide "diagnostic" information of value to individual students if they are told which items they missed. With the teacher's help, these students can then correct the mistakes or misconceptions that led them astray. Highly specific "diagnosis" and "remediation" of this sort can be effective and is often accomplished with classroom achievement tests. But such feedback and discussion are impractical, if not impossible, with standardized tests.

One reason for the lack of success in educational diagnosis in most fields other than elementary reading and arithmetic is that most learning difficulties are not attributable to specific or easily correctable disorders. Instead, they usually result from accumulations of incomplete learning and of distaste for learning. Neither of these causes is hard to recognize; neither is easy to cure. Diagnosis is not the real problem, and diagnostic testing can do little to solve that problem.

Another reason for this lack of success in educational diagnosis is that effective diagnosis and remediation take a great deal more time than most teachers have or most students would be willing to devote. The diagnosing of reading difficulties is a well-developed skill, and remedial treatments can be very effective. Because reading is so basic to other learning, the time required for diagnosis and remediation is often spent ungrudgingly. But where the subject of study is more advanced and more specialized, the best solution to learning difficulties in an area, say algebra, physics, or German, may be to put off study in that area and cultivate learning in other areas that present fewer problems.

Standardized diagnostic tests in both reading and math are achievement tests used by reading or math specialists to gain information about the learning problems of individual students. These tests are built to allow test takers to demonstrate certain kinds of errors or misconceptions held by students who are having difficulties in reading (or arithmetic computation). Often the results of the subject-matter test in a battery indicate a general problem, and the diagnostic test is administered to ascertain the specific deficits in terms of skills and subskills. Unfortunately, diagnostic tests, like other achievement tests, help to identify problem areas, but they seldom provide reasons for the difficulties and cannot prescribe solutions to overcome them. A major challenge to the teacher is to synthesize the entering behavior information about a student so that the instructional strategies and materials can be selected that will optimize that student's conditions for learning.

INTERPRETING SCORES OF INDIVIDUALS

Most test publishers offer such a wide variety of score reports and scoring services that schools sometimes have difficulty deciding which ones they should order. [...] counselors, administrators, parents, and what kind of information is needed: pupil test and skill scores, building average scores, system-wide averages, classroom percent scores, and so on. The list of needs may seem almost endless, but the review process will help to rule out many reports that are either similar to one another or simply not needed. There is good reason to believe that the underutilization of scores by teachers is due in part to the inconvenient format in which the scores are reported to them. Of course, part of the reason for the inadequate reporting is that teachers are seldom consulted about report formats that would be most helpful to them.

When test results come back to a district, nearly every teacher will receive a list report, an alphabetical listing of students and their corresponding scores. At the middle school and high school levels, reports might be arranged by class period for each English, math, science, and social studies teacher. Figure 17-1 is a sample list report showing scores for Mrs. Newton's fifth-grade class on the Iowa Tests of Basic Skills. A review of the scores of Alison Babka will illustrate how scores of individual pupils might be interpreted.
Here are some statements that might be made about Alison's performance in the fall of fifth grade:

1. Her Complete Composite grade-equivalent score (the average of the five main scores) is 5.5, the same as the typical student at the end of the fifth month of fifth grade.
2. Her Complete Composite percentile rank of 60 means that 60 percent of fifth graders nationally have composite scores lower than hers.
3. In sum, Alison's overall achievement seems about average compared with other fifth graders nationally.
4. Alison's relative strengths are in areas in which her percentile rank is noticeably above her Complete Composite percentile rank: punctuation and math computation.
5. Alison's relative weaknesses are in areas in which her percentile rank is noticeably lower than her Complete Composite percentile rank: language usage, math problem solving, social studies, and science.

At this point we should be interested in the particular skills that may have contributed most to the strengths and weaknesses identified by the test scores. A report that makes such analysis possible and that provides percent-correct scores for criterion-referenced interpretation is the Student Skills Analysis report in Figure 17-2. (Note that the Individual Performance Profile report, Figure 16-2, also could be used nicely for this purpose.)

Across the top of Alison's skills report, the test scores (grade equivalents and percentile ranks) from the list report are reproduced for easy reference. The fourth column of numbers in the bottom section of the report is the percent correct for Alison, and the next two columns, averages for the class and the nation, permit norm-referenced comparisons. Here are some statements that might be made about Alison's performance based on skill scores:

1. Punctuation is a relative strength of Alison's, partly because of her performance with terminal punctuation and use of commas. Her other skill scores are much like those of the average student in her class.
2. Alison's math computation performance was bolstered by perfect scores on addition/subtraction of whole numbers and decimals. But whole number multiplication/division seems to be a weak skill within this generally strong area.
3. The language usage and expression weakness seems to be explained mainly by usage skills, all of which need improvement.
4. The weakness in math problem solving [...] subtraction. There may be some [...] problem-solving test items to [...]
5. [...] studies as anything.)
6. Performance in science [...] particularly with respect to topics in physics and chemistry [...]
7. Alison also had some trouble with the reading of graphs and tables in the visual [...]
8. Though reference materials was not [...] reveal the specific problem [...]

[Figure 17-1: sample list report; Figure 17-2: Student Skills Analysis report]

[...] aspect of interpreting the scores of an individual student [...] typical progress [...] has made in each area [...] same on the two occasions. This table [...] (Feldt, Forsyth, and Alnot, [...]):

[Table: expected growth by percentile rank range (85-99, 65-84, 35-64, 15-34, ...); entries not legible]

[...] one year or 10 months (or 1.0 when the [...] below average might gain only 6 to 8 months [...] expected to gain 12 to 14 months. A [...] several successive years can provide a [...] adequacy of growth, overall and in the areas previously noted as strengths or weaknesses.
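
The comparison described earlier in this section, in which a relative strength or weakness is an area whose percentile rank is noticeably above or below the Complete Composite percentile rank, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `relative_standing` is hypothetical, the 15-point margin standing in for "noticeably" is an arbitrary choice rather than a value given by the test authors, and the percentile ranks shown are invented for the example rather than taken from Figure 17-1.

```python
from typing import Dict, List, Tuple


def relative_standing(
    percentile_ranks: Dict[str, int],
    composite_pr: int,
    margin: int = 15,  # assumed cutoff for "noticeably"; not from the source
) -> Tuple[List[str], List[str]]:
    """Return (relative strengths, relative weaknesses), each sorted by name.

    An area counts as a relative strength when its percentile rank exceeds
    the composite percentile rank by at least `margin`, and as a relative
    weakness when it falls below the composite by at least `margin`.
    """
    strengths = sorted(
        area for area, pr in percentile_ranks.items()
        if pr - composite_pr >= margin
    )
    weaknesses = sorted(
        area for area, pr in percentile_ranks.items()
        if composite_pr - pr >= margin
    )
    return strengths, weaknesses


# Hypothetical percentile ranks, for illustration only.
example_ranks = {
    "punctuation": 82,
    "math computation": 78,
    "vocabulary": 62,
    "language usage": 41,
    "science": 38,
}

strengths, weaknesses = relative_standing(example_ranks, composite_pr=60)
print("Relative strengths:", strengths)
print("Relative weaknesses:", weaknesses)
```

A fixed margin is a crude stand-in for judgment. Because every score carries some measurement error, a difference that looks sizable on one testing occasion may not be dependable, and the skill-level scores should be consulted before any conclusion is drawn.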

INTERPRETING SCORES OF CLASSES

[...] bottom row of the list report shows the average grade-equivalent score and the [...] like these can be made:

1. [...] finished the second month. The percentile rank of 51 verifies this interpretation.
2. The relative strengths of the class are in areas in which the percentile rank is [...]
3. The relative weaknesses are in the reference materials and reading areas.
4. [...] unexpected test performance by certain students [...]

Thus far, in addition to identifying areas of group strength and weakness, we have tried to verify that the student scores most responsible for these extreme group performances are not due to incomplete testing, random responding, or some other erroneous factor. Students who were not motivated to take the tests seriously could have responded in unpredictable ways that would cause their scores to be incongruent with their typical classroom attainments. Such scores should be ignored temporarily so that subsequent group analysis and instructional planning will not be distorted.

The next step is to determine the skills that might explain the relative strengths and weaknesses noted above. The Group Item Analysis report in Figure 17-3 shows the Visual Materials and Reference Materials skill scores and corresponding item scores for Mrs. Newton's class. Since reference materials was a weakness previously identified, we should look at the scores in the right column to gauge performance in that area. The first column of numbers shows the test item number, and each of the next four columns shows the average percent correct score for (1) all fifth graders in the nation, (2) Mrs. Newton's class, (3) all fifth graders in the building, and (4) all fifth graders in the school system. The last column, Diff, is the class average minus the national average. It is this column that can help isolate skill deficiencies and the particular item content that contributed [...]

Mrs. Newton's class seems to have had some trouble with alphabetizing [...]

[Figure 17-3: Group Item Analysis report]

[...] seem to be problematic because, with the exception of item 58, they are sizable. Mrs. Newton will need to decide if she should plan some instruction in this skill; if she should incidentally introduce alphabetizing tasks in the course of presenting science, social studies, or other such lessons; or if she thinks her upcoming plans will deal with this skill sufficiently. The weakness in general references might have been expected if these fifth graders had not been instructed in the use of atlases, almanacs, and certain book parts. The district curriculum guide would be a useful reference for deciding about reasonable expectations and possible needs for remediation in situations like this.

When schools are departmentalized, as they usually are for middle school and high school grades, score reports can be prepared separately for each [...] the one in Figure 17-4 [...] This latter report shows the average skill scores of 14 grade 12 students on the Iowa Tests of Educational Development. It can be used much like the group item analysis report (Figure 17-3) to find skills that help explain areas of strength and weakness. For example, the Sources of Information class average percent correct score of 58 was only 3 points less than the national average score.
One skill, use of encyclopedias and almanacs, was a weak area and another, use of the Readers' Guide, was a strong area. In science, Mr. White's major area of interest, these students performed slightly below the national average, but no skill seemed to be particularly weak or strong. In Quantitative Thinking, another area of interest to a physics teacher, these students performed slightly better than the national average, particularly in the skills of probability/statistics and exponents.

If teachers in departmentalized schools are expected to review the standardized achievement scores of their students, as they should be, reports by class period should be provided for them. It is unreasonable, for example, for middle school teachers to pore over a list report of 240 eighth graders to find the ones [...]

[Figure 17-4: class skill-score report]

REPORTING TO STUDENTS AND PARENTS

The most basic use of test scores is to report them to all who need to know, along with a simple interpretation of what they mean. They should be reported to students as well as to their parents because both are key ingredients in school learning. Parents who are informed are likely to be more involved, at home and at school, with their children's learning and are more likely to work cooperatively with the teacher.

Students, too, must be informed about their own test results because they make countless decisions about their own instructional involvement: whether to participate, how much to participate, how much effort to devote, and what kind of personal standards to adopt. And unless students are made aware of their scores, they are likely to be less motivated to take the next standardized test seriously. When students develop the impression that test scores do not get used or that the scores are not important to others, efforts diminish and test scores lose their usefulness. Such is the unfortunate situation in some high schools where test scores tend to be used administratively by the district, but potential instructional uses are ignored.

Score reports that are marked by simplicity in visual presentation and [...] explanation are most ideal for reporting to others. For example, the Individual Performance Profile report (Figure 16-2) discussed in Chapter 16 is ideal for reporting to parents during a parent-teacher conference. This report has several strengths that permit a comprehensive and comprehensible interpretation of results:

1. Percentile ranks, the easiest type of derived score to understand, are used.
2. Percentile bands allow the idea of error to be incorporated in the interpretation.
3. The arrangement of test and skill profiles permits an easy identification of relative strengths and weaknesses.
4. [...]
5. Percent-correct scores allow for criterion-referenced interpretations of skill [...]

(Turn to Figure 16-2 and try to visualize how you would use this report in [...] are the most useful.

School officials are sometimes reluctant, for many reasons, about reporting [...]

SOME INTERPRETATION PROBLEMS

Because the test results of a student, class, building, or district are influenced by a variety of factors, it is impossible to attribute high or low performance to any one factor. That is not to say that attribution should be ignored. But, as we shall point out, inept attempts to explain the test results of a group can lead to feelings of futility among teachers or to the destruction of test scores as a viable [...]

Judging Teacher Competence

Should the results of standardized achievement tests be used to evaluate the competence of teachers, either individually or as a group?
Even when we recognize that test results never tell the whole achievement story, that standardized tests have limitations, and that factors other than teacher competence enter the picture, there is still a good case for arguing that poor achievement (or good) may be the result of poor teaching (or good). If we agree that the quality of teaching influences the quality of achievement, then we must agree also that good measures of achievement have something to contribute to the complex process of evaluating teacher competence. If we do not agree that good learning requires good teaching, why do we try to hire good teachers or try to train them in the first place?

The concern about whether to use test scores as a partial basis for teacher evaluation has intensified because of the many misuses of students' test scores in personnel decisions. The primary purpose of testing, to improve instruction, has been replaced in some cases by the use of scores to make salary decisions, decide promotions, or assign teachers to buildings. In some instances, teacher retention decisions are made by giving heavy weight to the most recent standardized test results. Giving inordinate emphasis to test results for such purposes has destroyed the instructional value of the scores in the affected schools and has narrowed the "instructed curriculum" to closely match test content, if not to include specific test items.

Too much emphasis on test scores in judging teacher competence also has slowed the search for acceptable alternative measures of effective teaching. The existence of objective, quantified indicators like test scores has contributed to complacency about developmental efforts. Research on teacher evaluation continues, but the prospects for more effective methods are no greater for the evaluation of teacher competence than for the assessment of student achievement.

The gross misuses of standardized achievement-test scores for judging teacher competence we have witnessed over the past decade have caused us to qualify our response to the initial question raised. In the absence of other pertinent information, test results should not be used to make personnel decisions. Test scores that are compromised by attempts to influence personnel decisions are useless for any purpose. Any secondary use of standardized achievement test scores that erodes the basic instructional purpose for testing should be discontinued promptly.

Judging School Quality

As long as the principal task of the school is to facilitate cognitive learning, any information that describes the extent of such learning seems admissible for judging school effectiveness. From this standpoint, the most effective schools [...] achievement tests can contribute to examining annual growth in the curriculum areas tapped by the tests' items.

The surge of interest in state or district report cards has helped to diversify the thinking of educators about a variety of school quality indicators: attendance [...] get assessed and factored into curriculum evaluation because acceptable assessment procedures are not available. Thus, judgments of school quality seldom are based on performance in such [...] physical education. Scores from standardized [...] contribute to the assessment of school quality, but their focus is limited, as the emphasis on them also should be.

School achievement, and consequently test scores, are influenced by a number of factors related to the students, the staff, the school, and the community [...] and community financial resources, support for the schools, and population mobility. Sometimes it is difficult to recognize that achievement is as high in a school as should be expected, given the resources (human and monetary) that have been expended.
It is equally difficult to recognize that the high achievement observed in some schools is lower than it ought to be, given the nature of resources [...] parents, or students can interfere with [...] outcomes must be held jointly as well [...]

SCHOOL TESTING PROGRAM ISSUES

A number of decisions face teachers and administrators in the various phases of test selection, preparation, administration, and score interpretation. Some of the guides and manuals that accompany standardized tests speak to these issues, but many do not. The remainder of this chapter is devoted to an analysis of each of several matters that impact test use but are not thoroughly dealt with in most [...]

Teacher In-Service Planning

A test-battery selection committee is faced with four major tasks: (1) review or assess the test-related information needs of the school system and its [...], (2) develop a list of criteria to be used in evaluating the competing achievement batteries from which a selection will be made, (3) determine the procedures to be used to obtain evidence relevant to each selection criterion and to weight the evidence from the various selection criteria, and (4) implement the procedures and make a recommendation. In view of the lack of preparation or experience of many educators with these tasks, an in-service program for most selection committees is essential. The topics listed in Appendix D form a comprehensive agenda from which local plans for in-service might be developed.

When a standardized achievement battery is administered properly, we can be relatively confident that (1) students have responded to the tasks to the best of their abilities, (2) resources provided by the school have been expended judiciously, (3) score interpretations using the publisher's norms are appropriate (meaningful), and (4) year-to-year growth estimates can be represented accurately.
To ensure proper preparation and administration, teacher in-service planned around the topics listed in Appendix E should be provided. The specific in-service needs of a district will depend on the extent of annual staff turnover and the extent of previous experience with the battery in current use, among other factors. In addition, when the staff is formed predominantly by experienced teachers, greater emphasis should be placed on the "why" of various procedures rather than on a rehash of the "what."

Finally, test scores need to be placed in the hands of teachers so that instructional decisions can be made about students, classes, or segments of the curriculum. Teachers must be able to recognize discrepant performance, interpret various kinds of scores, locate scores on particular reports, and relate the score information to expectations and previous achievement levels. In-service topics that address these skills, delineated in Appendix F, are the basis for program planning. Posttest in-service is necessary to ensure that the conscientious efforts of test selection and administration are brought to worthy conclusions.

Curriculum-Test Content Match

It is both unfair and illogical to administer a test to students when that test covers topics students have not had an opportunity to learn about. But the other extreme has its limitations, also, as Linn (1983) has pointed out: "Allowing the match between instructional materials and test items to be too close risks losing the capability to measure understanding."
The greatest loss, however, is the ability to generalize about what students may be able to do:

Literal match of instruction and testing in the sense of practice on the items that appear on the test destroys the measurement value of the test. Inferred skills and knowledge that are made on the basis of test results become suspect.

The mismatch dilemma illustrates the [...] available from publishers, and local [...] curriculum. Most importantly, the [...] that is likely to result from [...] than objectives) are selected by the school [...] be measured, for example, when the [...]

Practice and Teaching to the Test

The preparation of students for testing is an issue of curriculum-test [...] extensive that, in fact, it becomes the [...] test has become a major concern [...] Mehrens and Kaminski (1989) have [...] of practice (and malpractice) using this [...]

1. General instruction without regard for specific objectives measured by the test in question
2. Teaching of test-taking skills
[...] that may include some specifically chosen because they are known to be measured on [...] standardized tests
7. Practice using the exact items from [...]

The most questionable position is item 4. [...] of instruction, this form of practice would [...] (except perhaps in a structured [...]). From our perspective, the most reasonable [...] the teaching of test-taking skills (2) and instruction that combines positions 3 and 4.

Early School Testing

The use of standardized tests in the [...] is just as much need to monitor growth, identify [...] strengths, and estimate developmental levels among the younger school students. However, [...] orally, and responses are marked directly on the test booklet. The ability of five-year-olds to handle such testing in the fall of kindergarten has been documented in several studies (Frisbie and Andrews, 1990; Wodtke and others, 1989).

Many primary teachers have been convinced that testing in the early grades is a mistake, that the results are very unreliable, and that some students are placed in a traumatic situation by testing. No doubt some of these teachers have observed student behavior that supports their position. Others probably have been influenced by the misuse of achievement scores in making grade retention-promotion decisions or kindergarten admission judgments. With respect to reliability, teachers often do not have ready access to the supporting technical data for a given test or they are uncertain about how to interpret the data.

It appears, however, that many primary teachers object to norm-referenced testing because it puts some students in a position of having to answer questions they are unprepared to answer. Consequently, they might say, the love of learning the teacher has tried so hard to instill is undone in a matter of minutes by a test. Obviously, a test with too many difficult questions should not be given to a child because it may produce unnecessary frustration and probably would provide little useful information. But a test with some hard questions is necessary to distinguish students of different achievement levels and to help identify relative strengths and weaknesses. Besides, students have all had experience in their past that created frustration and some degree of failure: tying shoes, buttoning and zipping, riding a bicycle, or printing their name. When students are told by the test directions that some questions might be asked that they cannot answer, most of them understand and accept the conditions without negative consequences.

Perhaps too much early school testing is done because administrators have required it, rather than because teachers have found the results helpful.
As long as legitimate purposes for testing exist, it is paramount that appropriate tests be selected, that the administration directions be followed exactly, and that students be encouraged to do their best. Otherwise the results may not be very valid for any purposes, even those of which the teacher may not be fully aware.

Frequency and Time of Year

The reasons for giving standardized achievement tests hold the answers to the questions: When should tests be given? and How often should tests be given? Since different testing purposes do not point to the same answer and because a number of practical factors enter in, answering these two questions means weighing trade-offs.

The costs in dollars and instructional time probably should preclude giving a full battery more than once in a school year. Some pre-post testing may be necessary at times for program evaluation, but ordinarily this should involve the readministration of only one or two tests from a battery. Some districts have begun to look for ways to respond to the intrusion of the multiple national, state, and district testing programs that erode direct instructional time. One response has been to limit the administration of an [...] to evaluate [...] Midyear testing [...] time for remediating weaknesses. End-of-year testing has an [...] whether intended or not [...] judgments about them may be influenced [...] as "teaching the test" can result [...] an assessment of the effects of year-long [...] of new materials. At the primary [...] achievement information they often use to reconstitute classes for the [...]

Two other [...] merit brief [...] growth, it makes a difference which [...] interpretations but not [...]

Out-of-Level Testing

[...] of adapting testing to the curriculum [...] individual to be tested. For example, [...] class to be tested in the fall should [...] appropriate for spring testing in second [...] is too advanced in content coverage [...] "drop back" is likely to yield more useful [...]

There are usually dramatic achievement differences among students in the same classroom, and teachers ordinarily work hard to accommodate those differences by individualizing materials, activities, and expectations. [...] such accommodations, it makes sense that testing also be individualized so that those working at markedly lower or higher curricular levels than their classmates will be tested on the objectives to which their instruction has been directed.

Out-of-level testing can result in major gains for teacher and student with no loss in interpretability of scores. Such scores are likely to be more accurate because students will experience less frustration with foreign content and will be more motivated to complete the test. The test and skill scores are likely to demonstrate a pattern of strengths and weaknesses rather than a picture of all weaknesses (undifferentiated performance levels). The grade-equivalent scores that result from out-of-level testing have the same meaning as those that result from in-level testing. That is, these scores are interpreted without regard to the test level taken. Also, the percentile ranks assigned to a pupil show how that pupil's grade-equivalent scores compare with those of others in the same grade. That is, a third grader is always compared with other third graders, no matter which test level was administered.

Testing Special Students

Individualizing testing is one method of accommodating students with special needs, but there are other students who, no matter which test level is selected for them, will require special testing conditions.
Students with some form of learning disability, those with visual or auditory deficits, or those with physical handicaps may need extra time, a reader, an answer recorder, or some other form of assistance that requires departure from standard administration conditions. When the goal of testing is to obtain relevant information for individual program planning, all such accommodations should be made. Of course, score interpretations must take into account the special conditions and their effect on the applicability of norms. In such cases, norm-referenced scores are likely to be of little interest or value, except perhaps when local norms are available. [...] different report forms may be of value [...] and that provide item response information will be most useful for building individual programs of instruction. When such information is coupled with teacher observations from the test-administration sessions, the needs of special students can be addressed within the mainstream of the school testing program.

High School Testing

Much discussion about standardized achievement testing tends to focus on grades K to 8, where these tests are more prominently used, but these tests are administered in virtually every high school, at least in some grades. There are a number of practical reasons why high school standardized testing with achievement batteries presents some unique problems, concerns, or issues.

First, the nature of the high school curriculum precludes the use of a battery that presumes some kind of continuous growth in each subject matter area. [...] juniors take a math course, sophomores [...] extension or continuation of freshman [...] social studies, some students are in [...], some are in sociology, and some may [...] Such curriculum diversity and [...] students make the traditional achievement [...] nearly every high school student.

A logical response to this curriculum match problem is to focus assessment on more generalized skills that all students are [...] throughout the high school program. For [...] thinking, interpretation [...] in high school, college, and [...] Batteries like the [...] Thus, additional achievement data from [...] program supplement the information available [...] causes the scores to be of questionable [...]

The outcomes from standardized [...] school level seem to be of less concern than they were at the lower grade levels. Other [...] as more important for making future [...] the results from the batteries are useful [...] curriculum-content coverage and [...] presented by course [...] development, the scores also are useful [...] strengths and weaknesses. In short, [...] as useful at the high school level as at [...] greater effort must be expended to convince [...] and to help them develop convenient [...]

SUMMARY PROPOSITIONS

1. The current testing climate is dominated by excessive testing for various accountability purposes to the detriment of instructional improvement.
2. The primary and essential use of scores from standardized achievement tests is to provide information to all who are concerned with the [...]
3. If scores obtained from a school testing program are reported and interpreted to teachers, students, and parents, no other formal or elaborate program for using them is necessary.
4. Standardized achievement-test scores are useful primarily in facilitating instruction and in evaluating [...]
5. The use of an achievement composite score for selecting students for gifted educational programs may exclude many who excel in only one particular subject area.
6. Readiness-test scores provide information for instructional planning, but they have much less value for making program placement decisions.
7. Except in the fields of elementary reading and arithmetic, diagnostic testing has proved to be of little educational value.
8. The selection of report formats for standardized-test results should be done on the basis of who needs what kind of information.
9. An individual's pattern of strengths and weaknesses can be determined by comparing each separate test score with the battery composite-score percentile rank.
10. Skill and subskill scores provide diagnostic information that may account for weaknesses identified at the test level.
11. The strengths and weaknesses of a class can be identified by treating the class averages as the scores of the "average pupil" and then using the interpretive procedures outlined for use with individuals.
12. It is a mistake to assume that students are unable to understand the meaning of their own test scores or that they care little about their own [...]
13. Achievement-test scores provide information that can contribute to evaluations of teacher competence, but such scores never should be used as the sole or primary basis for evaluating teachers.
14. Expectations for achievement levels in a particular school should be developed through an analysis of the characteristics of the students, the school plant and program, and the socioeconomics of the community.
15. Schools must provide teachers with in-service education on the topics of test selection, test administration, and test-score interpretation because preservice educational opportunities on these topics are too rare.
16. When the content match between the test and curriculum is too close or when students are taught the test content too directly, the ability to generalize about what students are able to do is diminished greatly or lost altogether.
17. The use of standardized achievement tests in the early primary grades can accomplish the same purposes that testing at higher levels does.
18. Annual fall testing with an achievement battery is optimal for addressing the several instructional purposes that can be served by standardized achievement tests.
19. It is unreasonable to expect that a single test level can be used to measure achievement in a classroom populated by students whose academic developmental levels may span two to three grades.
20. The diverse nature of the high school curriculum and the varied enrollment patterns of the students make the testing of educational development more useful than the testing of achievement of basic skills at that level.

QUESTIONS FOR STUDY AND DISCUSSION

1. What knowledge about standardized testing is needed by the general public to make individuals "informed consumers"?
2. What should a school district do, as a minimum, with its annual standardized achievement-[...]
3. Why should test scores supplement a teacher's judgment about student status and progress rather than teacher judgment supplementing the test scores?
4. What would be a reasonable selection rule for identifying sixth graders for a creative writing [...]
5. Why might a child who cannot count or who does not know any letters of the alphabet be able to profit from beginning a traditional kindergarten program?
6. Why are there probably no diagnostic achievement tests in science or social studies?
7. What are the relative strengths and weaknesses of Marty Gerami, shown in Figure 17-1? (Note: the Complete Composite score is the average of scores V, R, L, W, and M.)
8. How can percentile ranks be used to describe and interpret year-to-year growth?
9. What procedures might a teacher follow to explain why the average score of the class is much lower than last year's class?
11. What can be said about relative strengths of a student whose composite-score percentile [...]
12. How does a high mobility rate within the community interfere with some aspects of test-score [...]
14. In what way might the teaching of test-taking skills be construed as "teaching the test"?
15. Why is it generally not possible to identify individual or group strengths and weaknesses with scores from a criterion-referenced test?
16. Why might October 28 be considered the most ideal day to begin administering an achievement battery in a school?
17. If a student is expected to obtain nearly the same grade-equivalent and percentile rank scores when tested out of level, what is the point of doing out-of-level testing?
18. What incentives can be used to encourage high school students to perform at their best on standardized achievement tests?
19. What can be done to make the results of standardized achievement tests more useful to high school teachers in virtually all subject areas?

18 Standardized Intelligence and Aptitude Measures

THE CONCEPT OF INTELLIGENCE

Despite widespread acceptance of the idea that intelligence exists, there seems to be no consensus as to just what it is. It presumably has a biological basis in neuroanatomy or brain physiology. Various levels of mental deficiency have been associated with metabolic defects and certain types of prenatal environmental stress (for example, oxygen deficiency, viral infection, and injurious drugs). But thus far no biological basis for differences in intelligence among normal humans has been determined.

In its common and informal usage, intelligence is often characterized as "brightness" or "sharpness." These words suggest responsiveness, perceptiveness, cleverness, and ability to cut through appearances and confusions to reach understanding. Lack of intelligence is associated with dullness, which suggests a lack of attentiveness, awareness, or understanding. Despite their imprecise and informal characterizations, these verbal representations of intelligence are used constantly as we "size up" the abilities of others around us.
The outcomes of such informal assessment no doubt have significant impacts on relationships formed, viewpoints entertained, and judgments followed.

Psychologists who study cognitive processes and mental development and functioning differ among themselves in their conceptions of intelligence (Weinberg, 1989). But they are in general agreement with the nonacademicians who perceive intelligence as a composite of mainly three elements: (1) ability to solve practical problems, (2) ability to verbalize, and (3) ability to adapt to various demands of the social environment. Some researchers call it the ability to learn or to do work in school, the same ability that Alfred Binet [...] was interested in detecting with his early tests. Others characterize it as ability to reason, to solve problems, and to use the "higher mental processes." Still others emphasize original thinking and the ability to adapt to novel situations. In some discussions "creativity" is posited as a component of intelligence, as some see it, or as a separate, but related psychological construct, as others see it.

Definitions: Operational and Analytical

One possible solution to the problem of defining intelligence is to use an operational definition, as is done with a variety of other personality measures. The test used to measure the trait defines what is being measured. That is, intelligence is whatever the test measures. But different tests measure different kinds of intelligence depending on the nature of the tasks in them. Obviously, this approach, whatever its virtues in helping us to think more concretely about what we mean by intelligence,
is not going to yield a single, generally acceptable [...]

Another possible solution is to use the methods of factor analysis on the responses of a wide variety of persons to a wide variety of tasks (test items) designed to measure intelligence. Factor analysis is a statistical technique that involves examining the correlations between a large number of item responses to determine if certain homogeneous subsets of items, called factors, can be identified. This approach has shed much light on the extent to which proficiency on certain tasks tends to be related to, or independent of, proficiency on other tasks. But it has provided no compelling definition of intelligence. Different researchers have not used the same kinds of test tasks and, even when they have, they have interpreted their findings somewhat differently. Spearman (1927), for example, found a common, general intellectual factor, but Thurstone (1938) found seven primary mental abilities. The multidimensional "structure of intellect" model proposed by Guilford (1906) is quite elaborate but of mostly theoretical interest. The tasks he used to conceptualize the measurement of intelligence were subdivided finely into 120 aspects of intellectual functioning based on the process, product, and content characteristic of each aspect. Finally, Vernon (1971) is one of several factor analysts to propose a hierarchical theory of intelligence that helps to explain much of the correlational data that has accumulated on the structure of intelligence.

More recently, cognitive psychologists have proposed information-processing models to describe what happens during intellectual functioning rather than studying what results from the process. For example, the triarchic theory offered by Sternberg (1985) is based on these three premises:

1. Intelligence explains the ability of persons to adapt to their environment, socially or culturally. More intelligent individuals are able to adapt in a wider range of social contexts.
2.
Intelligent behavior is goal directed; there is a reason for it. Often the reason relates to wanting to be able to perform cognitive tasks spontaneously or automatically (like an expert) or wanting to be able to handle a novel problem.
3. Intelligent behavior is bounded by the extent to which information-processing skills and metacognitive strategies have been developed.

The current work of cognitive psychologists seems promising for education because it suggests that the intellectual components of individuals may be isolated. This means the functions can be studied separately and the components can be developed to promote quicker learning, improved memory, or more extensive recall capability. Thus far, however, these theories have had little impact on the instruments that dominate the sales market for cognitive abilities tests.

Because the multitude of definitions proposed by psychologists provides no convergence or consensus, the measures of intelligence available for use in our schools do not have a common basis. The implication for those who must select intelligence tests for their school testing program is clear. The operational definition and theoretical bases of each test under consideration must be reviewed and the test tasks must be examined in terms of the school's purpose for testing. In most cases the nature of the test items will provide a clearer indication of what is to be measured than will whatever criterion-related or construct-related evidence is supplied by the publisher to support the intended use of the test.

THE NATURE OF INTELLIGENCE TESTS

The different conceptions of the nature of intelligence have contributed to the development of a wide diversity of tasks for testing it.
Examples of some of the types most widely used on group-administered tests are presented below. (The wide variety of open-ended questions and performance tasks used on some individually administered tests differ considerably from the tasks shown here.) As you read each of these items, try to describe the characteristics of individuals who likely would answer the items correctly and those who likely would not. When you have read all the items, try to synthesize your descriptions to arrive at a verbal description of intelligence.

1. Synonyms (or antonyms)
Identify the pair of words in each set that are either synonyms or antonyms.
a. accident  b. bad  c. evil  d. worry

2. Verbal analogies
snow : flake ::
a. cloud : fleecy  b. icicle : eaves  c. hail : storm  d. rain : drop

3. Verbal classification
pear  apple  peach
a. beet  b. grape  c. wheat  d. green

4. Which of these is most like a calf?
[...]  b. cat  c. pony

[The remaining sample items are not recoverable from the source. Legible item types: sentence completion (options include "praised" and "investigated"); sentence interpretation (given sentence: "The date must be advanced one day when one crosses the International Date Line in a westerly direction."); number series; quantitative relations (options include "X is more," "Y is more," and "X and Y are the same magnitude"); number sentence construction; and, under Abstract Processes, figure classification and matrix progression ("Which figure belongs in the blank space?").]

School-Related Tasks

Some of the exercises used to test intelligence (giving synonyms, [...]) [...] general achievement test batteries. Abilities to handle other tasks such as analogy problems, number sentence construction, and problems of classification usually are learned incidentally (if at all) in school, at play, at home, or elsewhere.

It is sometimes assumed that what a student succeeds in learning incidentally is a better indication of intelligence than is the person's success in intentional learning in school. The assumption may be justified, but the evidence and logic needed to justify it are not obvious. Teaching does indeed assist learning, but it does not make learning automatic nor does it eliminate the need for effort and ability on the part of students. Intelligence contributes to learning in school as well as out of it.

Obviously, if we wish to compare the intelligence of children who have been to school with those who have not, we should not use tasks that the school tries to teach. As a general principle, if we seek to infer basic ability to learn [...] probably have been exposed. Yet, as Coleman and Cureton (1954) have pointed out, even if opportunities for in-school learning could be equalized, there would still remain great differences in the availability of incidental learning. These differences in environments and life-styles among different families, different neighborhoods, and different regions of the country cannot, and probably should not, be eliminated. Therefore, the prospects for equalizing opportunities to learn are essentially nonexistent. School-related tasks probably represent the greatest experiential common denominator for children and, thus, the most appropriate source of items for predicting potential for learning in school.

Nonverbal and Culture-Fair Tests

Some developers of intelligence tests have attempted to minimize, or to [...] of manipulation. Sometimes even the instructions involve no words, but are given in pantomime. These tests are useful if students who do not all speak the same language must be tested with the same test, or if a student with a severe language handicap must be tested. They may be appealing to those who seek measures of intelligence that are less influenced directly by school learning, particularly language learning. But there is no good reason to believe that these nonverbal tests are more valid measures of intelligence than the verbal tests. Ability to do well on them is learned also. And since verbal facility is so important an element in school learning, and in most other areas of human achievement, the major application for nonverbal tests seems to be with individuals who have significant language problems or with those whose native language is not English.

Most intelligence tests not only require some degree of adeptness with a particular language, but also assume familiarity with a particular culture. This quality limits their usefulness in other cultures. However, attempts to build culture-free tests have failed because testing requires communication, and communication is impossible in the absence of culture and the symbols, concepts, and meanings it embodies.

Attempts to build culture-fair tests by eliminating items that discriminate [...] individuals in their response to any test item that cannot be attributed to differences in culture, if culture is defined inclusively enough. Each of us lives in a somewhat different culture. Not only Eskimos and Africans, but also Vermonters and Virginians, farmers and city dwellers, boys and girls, even first-born and next-born in the same family live in somewhat different cultures. The differences are not equally great in all of these instances, but they exist as differences in all cases. [...] item that discriminates is unfair [...] culture-fair test
to discriminate among individuals, and there is no reason to use a test that does not discriminate between those who have more or less of an ability of interest to the user.

SCORES REPORTED FROM ABILITY TESTS

[...] and percentile ranks [...]

Standard Scores and Percentile Ranks

The raw scores obtained on intelligence tests require norms for interpretation, and these norms (usually age or grade level) are expressed as stanines, some other type of standard score, or percentile ranks. Though the standard scores used by various publishers are known by different names, virtually all are defined by a mean score of 100 and a standard deviation of 16 (or 15 in some cases). For example, when age norms are used, the standard score may be called a standard age score, mental age score, age-equivalent score, cognitive skills quotient, or deviation IQ score.

The score ranges in Table 18-1 show the approximate equivalent values of scores commonly reported for intelligence tests. These are the same relationships discussed in Chapter 4 based on the normal curve. The general descriptors are terms that might be used in a narrative report or during a parent-teacher conference to describe a level of performance for either age or grade norms.

Many score reports list standard scores and percentile ranks for both age and grade norms. For students whose chronological age is typical for their grade level, their percentile ranks using either norm group should be the same. But for those who are older or younger than their grade mates, noticeable differences should be expected.

Subtest and Total Scores

In view of the wide range of definitions of intelligence discussed earlier, it should not be surprising to find multiple scores produced from some tests and only a single score furnished by others.
Tests built on a unitary theory of intelligence should be expected to report a single score, but those based on a multifaceted theory should produce several, one for each facet perhaps. In addition, theories that promote the idea that intelligent behavior might vary in different content areas would require testing in each of the several distinct domains (for example, mathematical, verbal, abstract, social). In some cases it might be inconsistent with the theory to average the separate scores to obtain a meaningful total (overall) score. For example, when verbal and nonverbal scores are reported, what meaning should be attached to the average of such scores? The specific meaning might be impossible to determine, but most cognitive psychologists would probably accept the average as an indicator of general ability.

Table 18-1. Relationship between Standard Score, Percentile Rank, and Stanine Ranges Used in Cognitive Abilities Score Interpretation

Descriptor    Standard Score    Percentile Rank    Stanine
[...]         128 and above     96-99              9
[...]         112-127           77-95              7-8
[...]         88-111            23-76              4-6
[...]         72-87             4-22               2-3
[...]         71 and below      1-3                1

Note: The stanine values show approximate relationships. For example, by definition 4 percent of the scores are found in each of stanines 1 and 9.

There may be considerable diagnostic information in the [...] of an individual's intelligence-test battery. Consider three students whose standard scores [...]:

Verbal    Quantitative    Nonverbal    Total
121       90              80           97
98        97              96           97
76        80              135          97

The patterns shown by the test scores indicate that important information would be concealed by using only total (average) scores. [...] considerable verbal facility, while Zack [...] and a potential for verbal and quantitative [...] scores would not be misinterpreted [...] pattern of strengths and weaknesses apparent [...] implications for the individualization of content [...]. Of course, all scores that represent [...] seem out of character should be verified through further observation or testing, perhaps with the assistance of a school psychologist.
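The point that a total score can conceal a profile can be checked with simple arithmetic. The sketch below is illustrative only. It uses the three score profiles from the example above; the assignment of each value to the verbal, quantitative, and nonverbal columns, the labels `student_1` to `student_3`, and the helper name `summarize` are assumptions made for this illustration and are not part of any published scoring procedure.

```python
from statistics import mean

# Standard-score profiles in the order (verbal, quantitative, nonverbal).
# The column assignment and the student labels are assumed for illustration.
profiles = {
    "student_1": (121, 90, 80),
    "student_2": (98, 97, 96),
    "student_3": (76, 80, 135),
}

def summarize(scores):
    """Return the average of the subtest scores and their spread (max - min)."""
    return mean(scores), max(scores) - min(scores)

for name, scores in profiles.items():
    total, spread = summarize(scores)
    print(f"{name}: average = {total}, spread = {spread}")
```

Under these assumptions the three averages are identical while the spreads differ widely, which is the sense in which an average alone hides each student's pattern of strengths and weaknesses.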
Interpreting Scores of Individuals

[...] intelligence battery with all students in [...] who may have an overall [...] level [...] information for individualizing instruction. [...] of scores from any test, the user should [...]:

1. How do the separate scores from tests within the battery compare?
2. How do the scores from this testing compare with those from the last testing?
3. How do the scores compare with in-class performance (e.g., behavior in verbal interactions and performance on written assignments)?
4. How do the scores compare with recent scores from standardized achievement [...]?

[...] all intelligence tests used in the schools are measures of developed skills obtained through experiences both in and out of school. Many of these skills can be nurtured through direct instruction; they need not wait for some kind of maturational unfolding. Consequently, when deficiencies are noted, teachers can intervene with the intent to improve a child's learning [...] short-term or long-term memory, ability to retrieve information, ability [...] the attributes of concepts, or the ability to classify objects or ideas. Nearly [...]

[Figure 18-1: sample Cognitive Abilities Test list report]

Exceptional students who demonstrate highly developed cognitive skills require special attention, also. They might be able to learn faster, handle more complex ideas, and probe greater depths than most of their classmates. Supplemental materials and projects can be used to provide the enrichment they probably need. Extra, high-interest activities need to be kept on hand because, since they may require fewer repetitions to learn, these students are likely to develop more idle time than their peers. The so-called troublemakers in a class are as likely to come from the top of the ability scale as the bottom, especially if individualization of instruction is inadequate.

The sample report shown in Figure 18-1 contains the scores from Mrs. Kessler's second-grade class on the Cognitive Abilities Test (CogAT) (Thorndike and Hagen, 1986). The procedures described in Chapter 17 for interpreting individual scores can be used with this list report, too. Here are some reasonable statements to make about the performance of Anna Aparicio, the first student listed:

1. In terms of the descriptors in Table 18-1, Anna's performance is above average compared to other eight-year-olds nationally.
2. Since the age of most beginning second graders (nationally) is closer to six years than seven, we should expect Anna's age PR and S scores to be higher than the corresponding grade scores (as they are).
3. Anna's verbal performance could be termed a relative weakness because the other two scores are so high. (In an absolute sense, no weaknesses are apparent.)
4. Quantitative reasoning is both a relative and an absolute strength for Anna. She may progress faster than her classmates in math, more so in math concepts and computational skills than in verbal problem solving.

The class averages in the last row of the report in Figure 18-1 indicate that Mrs. Kessler's class has a fairly even pattern of scores: 107.4, 108.3, and 105.8. This means there are no identifiable group strengths and weaknesses to which she might need to adjust.
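As a rough check on statements like these, a standard score can be converted to an approximate percentile rank and stanine using the normal curve relationships behind Table 18-1. The following is a minimal sketch, assuming a mean of 100 and a standard deviation of 16 as described earlier. The function names `percentile_rank` and `stanine` are hypothetical, and a publisher's norm tables, which are empirical and not computed from a formula, should be preferred whenever they are available.

```python
from statistics import NormalDist

# Assumed scale: mean 100, standard deviation 16 (some tests use 15).
SCALE = NormalDist(mu=100, sigma=16)

# Stanine boundaries in standard-deviation units: each stanine is half a
# standard deviation wide, with stanine 5 centered on the mean.
STANINE_CUTS = [-1.75, -1.25, -0.75, -0.25, 0.25, 0.75, 1.25, 1.75]

def percentile_rank(standard_score):
    """Approximate percentile rank, limited to 1-99, under a normal curve."""
    pr = round(SCALE.cdf(standard_score) * 100)
    return min(max(pr, 1), 99)

def stanine(standard_score):
    """Approximate stanine (1-9) under a normal curve."""
    z = (standard_score - SCALE.mean) / SCALE.stdev
    # Count how many boundaries the score reaches or exceeds.
    return 1 + sum(z >= cut for cut in STANINE_CUTS)

# The three class averages reported for Mrs. Kessler's class.
for average in (107.4, 108.3, 105.8):
    print(average, percentile_rank(average), stanine(average))
```

Under these assumptions, each of the three class averages falls in the sixth stanine. The same functions reproduce the range boundaries of Table 18-1 only approximately, because published tables round to whole scores.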
The typical student has scores in the sixth stanine with percentile ranks around 68. Fortunately, there are no students who, on the basis of their battery scores, seem to require further testing to explore the possibility of developmental disabilities.

APTITUDE TESTING

Aptitude tests, like intelligence tests, are not always easy to distinguish from achievement tests because, on the surface, the content seems interchangeable. But developers of aptitude tests accomplish purposes that achievement tests ordinarily are not intended to serve. Aptitude tests are measures of potential: abilities that foreshadow success on related tasks at some future time. Their purpose is predictive and their focus often is narrowed to a single ability or small collection of related abilities. Someone who has the aptitude to do clerical work, for example, has the prerequisite skills in manual dexterity, attention to detail, and speed with repetitive tasks to complete many types of clerical work effectively and efficiently. Of course, if the person has performed clerical work previously, we would not need an aptitude test to predict his or her potential as a clerk. In most walks of life, past performance (achievement) is the best predictor of future performance in the same realm of activity.

The most common forms of aptitude tests are those used to judge scholastic promise and those used in employment and educational counseling. The American College Test (ACT) and Scholastic Aptitude Test (SAT) are widely used to make predictions about who is likely to succeed in a college undergraduate program. Other similar tests are used for particular admission decisions: graduate school (GRE, Miller Analogies, GMAT), medical school (MCAT), and law school (LSAT). These tests tend to be highly verbal, but several of them also yield separate quantitative, verbal, and total scores. In addition to these, aptitude-test scores are used in counseling situations to assess promise in musical, mechanical, and artistic endeavors, among others.

It is common to include an aptitude battery in a middle school testing program to aid students, parents, and counselors in planning the most appropriate high school curriculum for students to pursue. However, in most cases the scores from standardized achievement tests and the grades from a variety of subject areas may provide an equally useful planning base. Thus, aptitude tests might be administered to an individual for whom past information is incomplete, conflicting, or dated due to unusual intervening events in the student's life.

In sum, aptitude tests are designed to predict future performance and are based on content that may have been learned in or out of school. By contrast, achievement tests are intended to describe the current status of an examinee's learning. The content of an achievement test should represent a knowledge domain that we care to know about. The content of an aptitude test, however, need not be bound to a particular domain because the user will not want to make inferences about that domain. Instead, the user of aptitude scores wishes to make inferences about future behavior: what the examinee probably will be able to do, not what he or she can do now.

Intelligence is thought of by many as a general aptitude. In view of the competing theories of intelligence and the nature of the corresponding sets of tasks used to measure it, such aptitudes as verbal, visual-spatial, and quantitative reasoning seem to be fitting descriptors. In addition, the purposes for using intelligence tests are nearly always predictive rather than descriptive. More on the nature of aptitude tests and their relationship to intelligence tests can be found in Cronbach (1984) and Anastasi (1988).
SUMMARY PROPOSITIONS

1. Because of the variety of definitions proposed for intelligence and the lack of consensus regarding how to measure it, schools should not use the results from different tests as though they were measures of exactly the same construct.
2. Educators should choose tests whose content is relevant to the learning tasks of the school.
3. Instead of choosing tests that purport to be culture free or culture fair, schools should prefer verbal to nonverbal intelligence tests, particularly for group testing, and [...] tests that emphasize abilities developed in school rather than those that result from incidental [...].
4. The age and grade norms used to interpret scores from intelligence tests are usually expressed as standard scores or percentile ranks.
5. The scores from an intelligence-test battery can convey an overall ability level, as well as a pattern of strengths and weaknesses.
6. The particular scores a student obtains on an intelligence test may have implications for deciding what skills the student needs to learn and which instructional procedures might be most effective.
7. Test users should not be surprised to get somewhat different scores from somewhat different intelligence tests, or to find that the same student's score shifts upward or downward from time to time when given the same test.
8. Teachers should regard intelligence tests as measures of general ability in school learning, an ability that is based on prior learning.
9. Aptitude tests are designed to forecast success in some future endeavor, presumably by measuring skills that are essential to successful performance in that endeavor.
10. The focus of achievement tests is what the examinee can do now, but the focus of aptitude tests is what the examinee will be able to do in the future.

QUESTIONS FOR STUDY AND DISCUSSION

2. What are some examples of individuals adapting to their social environment?
3. How can a prospective user of an intelligence test determine just what the test actually [...]?
4. Could meaningful criterion-referenced interpretations be made with scores from an intelligence test? Explain your answer.
5. How could a set of tasks be culture fair without being culture free?
6. Why might the tasks on a nonverbal intelligence test not be considered culture fair?
8. Why are the intelligence-test scores of students generally more stable over time than are their achievement-test scores?

Sample Evaluation Planning Guide
Health Unit: Physical Fitness (Grade 6)

Assess Entering Behavior
1. Class oral questioning to determine group familiarity with these terms: physical fitness, endurance, body flexibility, body strength, stress, and fatigue.
2. Observation and classification of students as overweight, about right, underweight; physically normal or handicapped; and typical response to physical exertion as tolerant, somewhat stressed, or overly stressed.

Formative Evaluation
1. Periodic short quizzes dealing with terms, concepts, and relationships between [...].
2. Class oral questioning with examples and non-examples to check concept attainment.
3. Review of draft copy of proposed exercise plan and program.
4. Review, after one week, of exercise and sleep journal entries.

Summative Evaluation
1. Objective test section covering concepts, relationships, and application of principles.
2. Short-answer test section dealing with explanations, descriptions, and problem [...].
3. Short essay section requiring the evaluation of a hypothetical exercise program [...].
4. Development of a two-week journal with daily entries describing personal exercise and sleep activity.

Propositions Obtained from Instructional Materials
Health Unit: Physical Fitness (Grade 6)

1. Exercise is bodily exertion that contributes to developing and maintaining fitness.
2. Exercise can improve blood vessel capacity and heart strength and increase lung [...].
3. Exercising to increase muscle strength provides protection from back pain and builds abdominal support.
4. Bone thickness and density can be increased by exercising.
5. Fat tissue is replaced by lean muscle tissue as a result of regular exercising.
6. Exercise can alleviate the symptoms of stress: muscle tension and inability to [...].
7. Aerobic exercise requires a minimum of 10 minutes of continuous, rhythmic [...].
8. Aerobic exercise produces these body effects: increased cardiovascular and respiratory [...], lower blood pressure and heart rate, and improved [...].
9. A minimum of three 20-minute sessions weekly is required to achieve the benefits of aerobic exercise.
10. Anaerobic exercise is brief, intense physical activity.
11. Anaerobic exercise improves body movement, strength, and speed, usually without conditioning the cardiopulmonary systems.
12. Planning an exercise program includes these components: physical exam, fitness ratings, goal definition, activity selection, progress monitoring.
13. Fitness goals generally center on cardiopulmonary endurance, muscular endurance, muscular strength, flexibility, and alertness and concentration.
[...] for an exercise program accounts for frequency, intensity, and [...].
[...] dehydration can be prevented by drinking cold water before and during exercise.
[...] the sleep cycle are used to repair body tissue and [...].
[...] the person requires depends on age, activity level, and [...].

Sample Instructional Objectives
Health Unit: Physical Fitness (Grade 6)

At the completion of this unit the student should be able to:
1. Identify increased body-system efficiencies that result from regular exercise.
2. Explain how regular exercise relates to both bone stress and increased bone [...].
3. Describe the effects of exercise on the specific symptoms of emotional stress.
4. Distinguish the purposes and features of aerobic and anaerobic exercising.
5. List the essential components of a plan for developing an exercise program.
8. Explain how to determine each of these rates: maximum heart rate, target heart rate, and resting heart rate.
9. Plan an exercise program consistent with his or her own goals and current health [...].
10. Keep a complete daily fitness journal for a two-week period of an exercise program.
11. Describe how nutrition and exercise jointly affect body weight.
12. [...] body systems as a person begins sleep: muscular, circulatory, respiratory, nervous, and excretory.
13. Draw a line graph that shows the number and frequency of dream episodes in one night for most individuals.
14. Estimate [...] relative amounts of sleep required by individuals who vary in age, activity [...], and general health condition.

Test Selection Committee In-service Topics

The teachers and others who are responsible for test selection must have knowledge about (1) the school system's curriculum and (2) the essentials of [...] to fulfill their obligations. Some of the most significant topics [...] in preparing a selection committee for its primary task are listed below.

A. Sources of Information about Tests
1. Publisher materials (catalogs, specimen sets, [...])
[...]
5. Consultants (publisher's test specialist, selected college education faculty)

B. Test Content and Curriculum Match
[...] with an item-by-item review toward assessing curriculum match.
[...] Judge fairness: gender and ethnic representation (names, roles, interpersonal relations depicted).

C. Technical Quality
1. Evaluate reliability evidence in view of user purposes ([...] scores, equivalent forms, difference scores within the battery).
2. Assess the appropriateness of available norms (relevance of norm population, representativeness of norm sample, recency of data).
[...] time limits to judge efficiency and [...].
4. Assess difficulty for each [...] (floor and ceiling difficulty) [...] for time of year of testing.
5. Determine reading load in nonreading [...] ([...], sentence structure, and [...]).

D. Practical Matters
[...] publisher consultative services (teacher in-service [...]), [...] format and type size, availability of practice materials [...] of cognitive skills is available [...].

Teacher In-service Topics Regarding Preparation for Test Administration

The topics listed below are among the most relevant to be addressed in teacher in-service programs prior to test administration.

A. Selection of Appropriate Test Levels
1. Review [...] to establish the most appropriate test level for each grade.
2. Assess the need for individualized testing (out of level) within classrooms by using [...] current reading and developmental levels.

B. Test Scheduling and Advance Preparation
[...]
3. Outline procedures and develop schedule for make-ups.

C. Test Administration
1. Review total time schedule and the need to adhere to it.
[...] Review systematic procedures for materials distribution and collection.

D. [...] checking documents for completion and for quality.

Teacher In-service Topics on Achievement-test Score Interpretation

A posttesting in-service program can prepare teachers to understand the meanings of scores, the locations of scores on their reports, and methods of using the reports effectively. This list identifies key concepts and principles teachers must understand to interpret standardized achievement test scores.

A. Selection of Publisher Score Reports and Services
1. Review the system's purposes for testing and identify report formats that will facilitate achieving each purpose.
2. Match the specific goals of teachers, counselors, and administrators with the types of reports that facilitate attaining the goals of each group.

B. Differentiate the Types of Scores Reported
1. Distinguish composite, test, skill, and item scores.
2. Describe the purposes of raw and percent scores.
3. Explain the meanings and purposes of developmental scores (GE, SS).
4. Explain the meanings and purposes of status scores (PR, NCE, stanine).
5. Describe class average scores in terms of "average student."

C. Understanding the Norms Used in Reporting
1. Describe the nature of each group (local, national, Catholic, large [...]).
2. When appropriate, differentiate norms for pupil scores and norms for school averages.
3. Explain the effect of time of year of testing on GEs and PRs.

D. Extracting Information from Test Reports
1. Assess annual growth of individual students using profile charts or cumulative [...].
[...]
3. Estimate individual developmental level with [...].
4. Monitor annual growth of class or grade [...] expectations and accounting for student migration.
5. Identify class (or grade) strengths and weaknesses [...] curriculum areas [...] tests and skills.
6. Use group data to describe skill and item performance (norm and criterion [...]).
7. Explain test results to students and parents.

E. Basic Interpretive Considerations
[...] developmental and status scores.
3. Describe how to set reasonable performance standards for use with criterion-referenced interpretations.

References

A^ro\ R r 0e7r) dmdsDrla?twna4ttt6n A!h!\! l:$ trq3ql crrsroom Itr'p,ovrid, rnL ur a(lg\on *\y.,k: Le.t) Th, qtlnt;v teh1. A r8 ^ N r r L ,M a , , n d S A { r s.D L 0 { .1 7 3 ) r { u kip tc r 4 p o n r s mur' Ple he r.,r!c!orn8 A! R (1977) A omPanson or11,n.!rrl, krirbirnx ud !i,dry ofompt.x nutript. !h!!€ DUrtrPLeresPoN. (R*eard' R.p.n \o 95) r.M C'rv: Lr.iveBn) .rr)wa, colleg€ orltrdunr L , d r r j $ t r s ,H I t\e r r l) r h r M tih ' .4 r 1 @ d ^ rrxlnt { !
DnDas r R , tu a$8:N4t ot ludar uh;?llr,4/ walh'nson, DC Nii!,rxr i.ademy of frdu.auon r^ao\ 09lri) 514trl'rd /,/ .du.atb,qnana l\l.h,tasoL katY wshiDgr.n, Dc rh€ A\^shn .\ (r98rJ) Pryi'r,g,.ar rrr4 (6rh .d) Ncq yorr: cv\.hndse. J R 1r*r) tharchkluQotraear;d MA Hari.d LnN.rnt Pi.s ^!DrRso\ A\NA. L F (1933) S/u2r Dubuqu. rA: Wi un C '.rnru' Ayu L r(rerg i !4r./f wtrwDs rx q&1i5 oI hbnv;inE o/E/@r.r,,'a i.'ee Yor! R!$cl B&'rN I l. (le3l) Do.s n2ri.nalLr' normed reill' nonJjti Jaut n oJravdk^c MetvMt, 13. q7 'n.an 107 adb\^ t)rtt|u 15t1us 263 lte73) ( t9 3 t) B au . R , D , a o d r n r L . { r A.o ' r p r r ko n o fd im .u lr v 354 vduel 0r *lc.rd rru._Gke n d) tfpN a,nLmpm\ E/1ntb.4t pr.t t g,7 t5 40 B^Rrufr L 1r.,87) A.{lenic evatua on rnd {udenr dn.i t4o hwuLatr4w@A Bfr{, R K tr93!) Ser..ring thc index otreriabitny. rn R x a"tN t <t), A Etit b qrn,o,.4ln n4d m,ne Joh6 Hopkms uiir(,'] Pr$ B rN u,A (l 9l l ) N ouv.X $re.h.r rrerkduer.he, res 6arn$ de.ok A^e PryhotE.tli. x 11937)op.ncndedv.r susD,ur,ipr. oi.rr$Fonse iormr trdocsmake rdir ae.en(e ror dusarn. purp.$ ,4ttL.d plyhataE;.at M& srooM.R s 0968) kaniisrdn sctf EuLtutina4nBt, ,t dothis oe.6) r@iotu n!41v be\ I: Th. apnn, d6qin BorLb'Nc ,X f (1967) afut t .hMt.g. nvohn bgst ,,7 11 ( (r937) ComPuErided In R cls!. Gd) 16rru titut trhtut sl HtLkt e NJr l_rwEn.e Erlbrm Asoo ciffu. R. s (1952)HoN invatidalc Eark3 r$igftd Lr r..h .4) Jtu Eduatiannt Pr.habb, 4r, 2ts 2a 'd E (te6e) The us.orspaDre an5ser sheeb br pria,r] rge .bndrcn ku.,!tt al Edn ttfui MtuawL6, t5a a. CHAS. c. L tl973l. Trr. iEpa< o ,nd hordlrdns quabryon (orins esav r.ss /du.,,/./ E'}va|io|4lM^u|M|.16'1941 t,. . .. .tt,- \ |ijn..D f $(l]oi)fl rk^u4xnt, n1) 4r! t;l D r '\ {L P I 1 l ! '7 3A ) d l x n l {nt t i l c m e t r l r \ , r , D s< n a r i o nR .lL A$!!u!.i) ( r ! r ) 5 ) , r '4 r , n a .,11\ \t r,.nncHi|l. rr \rr l,\'ntol Lrlnrint tqnnt qi c \, d.l i l ^ t,\n4t't 1 R r ! I .e 6 6 ) A { o m p d t!D o t i( n r t( ( 4\ . 
[ | \] r2t \ N| r.ttuaiaat 1 i 0 7 3 r) r e , -- utuvlhht r.\2r s lt b:d)id@at jt1nN'nt., MntuhnL t2.tt tj t$nt,L1r L.r 44t4 'dl | ]ff\ N l c .r h. nen\ l F .n]P Il R l !]r ]k d) Br\vr\ r .1.ncso) .t,Y .u(^vu|d a,ul t:et@LM h (;;1aa! 1' ,anl lh^|r\ol n|ul tnlh' 1'7 ||| ^,a\ rrsr tr,rr4t' '|t t\thaaFtdt I n) lt u)|attqzt D-41'h a\\. B,inh r\d,t..t F N tr .s | ( . r r ur r ntn4! \ \tan Blod nl E t Ed,iil 0R( rl. P rl160).liorrd rk,inr. brrt7 Nr? bx ttsvttl ! No 5 ) Pr in .e r Dn .:..r ]il. j.rr.,rrd r{\Lrx. D.w. (tq7!) Eau to b@t& oa sq r-tgra q yv qoat \fu qha,h4t D.sNli 4 prw tun4 l. L (res6).isArie,!/,1r ol,rujun t,4k. rc5r !r,,;oorDh | ]) 1 4 \n *r t.r r n n .h n r o r R G (rs79) rkmoPrio pryhot!$ 2t. f; H@h44r \l ao{, nr !,r _" r ;B ,he _i 'n dhoit or.nimr;s rh. p(dud n'oDeD' @.m.'.nr fron rh. hrrs of$e dh;;hn. pn kt,E |o( ot Ltv4toturt ldqnt )a 614 30 r FFM n 4 l tqr 3r v ul i ph..ho.... ]tu. r .k . r _m D ._, t\nrt\r! hrd4t d.Fnr .drcMl M.6b4dL tO 297 !O4 _1r97.1). r'he e[..r.r neh ao - - kn\Ffudiq4]adP'yhDlaqwd' rrs;6r r\prnd<d Epq,h8 rumi (r973) rre,hodorognat ronnd.rrtions pd,ni! ,4 pnnd.r i! s,adinB r\r tiant Astn d .tcal.sa bd r.a16, ot D tttsnt ^sti\nurJervl. {l1r3c) Rerirbirnr or r..,rs n.r Ptuha,7trt,23 35 ( (1990) Kj.derya'r.n pupil rnl - ing Ebha,a, s.h@rlortut, e'tlt 135 +lt , n r r D r L v a c i ( l 9 8 b t tr $ ' m pr. uuc turlc r$L /4,81 o/ Druo Jliri, rqo I aln.rgo Rnar(k H nr\ l R and(raD \Lry bero{.hrrekn{d6 L, th, nntLd nn,ltb , r'd oRrJ . 1rq30) irhdiv\ .tun tn An dtnflata! .of,pan rd (Reserfth R.por LNo 370t urbam cl,ripiiSn,rL uiivc(nlot ,nos.olrice c ( r e lJ? ) ,rD(rsr'l\lr.D prehr rtrhe x.hi.vencn' h6 /dnu,t lt Ldtdiq'rl trt nntdhr /6; Me P.\1943\ lJ t om"s ^\dri.n\ NJ: F,.di.r t2nr e,t\ trngre{oad crirh, clr' {,H v s (lorjij) P'eparns rnnsparenrke}s ror insped n\1: \q shcd, l\mat aJLdv|l;,wt Me5trn1L ), tt2 cL:$R. 'nR L (1962)Ps)rholo8,vandinsruoional Le.hnology r R I crrfr (<L). 
?zotns,,?ath a./1?b@hant\.e 78) rrbbrrgh PA 1106lr)r nud(n tr.i,n.rog'l, a.d rh. ncaruremcnr or r.xrnitr3ouro,,es 1,,tr.tr rrt.hob!1i!, r3,5t9 2l lre68) adrPfns rhe erem in'rividutrlp.rlo' ntri.. PidpdrE\ il tt' te67 lryitatiqa) a,4b4o an 16nn[ Pr,rkn\ PnD.€o. NJ: !]ducuonat Malutl,r th@r alnann! publ,shi,rgco,n B (re63) P redtrungg'rd.sfruD rD ,ar4 Hrvlir W tr07.1)Inbodr n)n ro rn w H^crr cd l. ,.,'dr" ry',4trl bnns 15 t5) Erslc \ood crjffs, NJ LduxDmr rel ,orw Lln@ 269 ! SupP 406 lD D Lr t9li7) HoLr- P. c (le.l7) v Jl'hD $n'r & sdr, r,r Hoc^N r P.(re8l) nrdxrab, ,dM'r4 t\tb^v M,1.rod tP nr\ al uLnadLd| A ,n,tu lJ !,sr R ctno{l udon se\tre N o rrD 2i ?4311 HoovLR Ir D (rr.rlJ5)rhen^r tfilrtdt \aa la' ne'u\4 tuudetu ba.l)D,4t n ttE itw' pape' pr.trn'ed I' dr A nmral \ rh'i L ri (l s7et ordc!ngpo\eror3rp;rdk \,^ursror ped In! illl( !$G In'ert(ron.ftj-p Atuti?a P:yhokp.at \kburtn1t t, 3, end Tr.ix. B r (1933) rffeG ol ?(hievemenl expemrions o,l hand!.'unB quatnv .n Kdin{ as\"s hltnd 4 Dtvdm A S. rndLANDMAN,Ir (rq30t A$e$ G u r \ . B r ( r 9 r J 1 lA d ?r r le ,6 r t.rG'r) Ntu tndin,lq t4t;,8 ndL k(rvbs\ otu 1nnrrl^'i't rn ls humd a;ref r r'l. Po&4 .n $r us r2{, (r9?r) CRoNLU\DN E,3ndLN\ R L 1ts9a\Mtdtudat d,ntutt di@,fl z&rus(rjLh ed ) r"e! \brl Me.nrrran tubh5hrn( (nn k,rD I P 0e36) tq.r,4r,. .l oIrN so\,A P .(l e3l ) N d6on a ,d,t the \tL,\d"\ Jtutut ,t Ltvattu loFN$N, M s. ne6tl al r,1vd,o"al pr.h'tas!, 62, L,tamat ft0,1,4 d r{irl v llcqt)) Ma.t t: oI t.tu1|r1t2nr.a| Joycr B R EnFrewoodc1,ff5,NJ:Prenri.e lrru, ln( (re53) rrl6n, R A An.xPe.in _ Ldv&,tu P:\.h.hs\ |e, 0e{i1i) rn'ethsenc tdtrn t 't (i954) H^r'rIS KlNr,M f trelr2) a $mprins,,odd rq 1r,d,q A#r,?rr, I A {hoor Darr-fz.\r I^n.y1 duarotll otur srpnr,@ 4.). 
3o5 t2 .h.tagicdt Mffit tu| 6 ) 23-6a ^dDwbd'd s M (r$qtr) vxlidnr.ra L 0930) AgreenenL.oem.enr as uxo.ony or nux,pre.horce r.mvrn,n8 ruld At?ftd indi..s ord.pendabirny nf donain rtrer.r..d bc i, M^t tu|ia F,ttqtit4 2trl.5L 13 pt6.t P:yhnhsi.d M^,tMt, 1, ta1 26 (1939b) ordulriplechoi.enem rrNr, F. (re63) c.oirhre a{hd Jtuqtt { Ap?Lizr L.tu! ^'axonom' in Hvat@. 2l1t 11-50 rtni'xatct Applat lndur@t wd r,79-:!9 ^r4l):s, Harn, L rnd ,oiAr, !: 0975) lhe efieci or rhe qurhr or I(.Luy, E C (1962) The lulty nrncdoning s.tf rn P'.ru?y, pr..ediig rep.ns.s.. tl'. grrdd isisncd to lublcquenL ,.bury, ,. r'',s wrhi.grdn, Dc A$oqalon ror supe; lssponse5 o an esly qu.n,on /d,tur al D*ednn! M.a vkion rnd crdj.ulum Deletopnenr 1962y.,rbol Ruur'l L (1939) rhe sel.crionorupp.r end roq.rgroups HARRor. A J rreT?) A tMry ott4 n.taDdr drui^ A ror nrid an or kn irn5 /@,,r E Edwti'nzt p, 'be sLd. td drnb?tns h,haroat obi.t Issu, D J , and suEruND, R C i3rr) (r$5 m) rd; H[.KM{N.R w rfEN,J.andsNowR I (1e67) Efte.B oI res, voruE.5 r-vrrr xans.scD, Mo:,f.r co.poaron ia€D.nr En,ns alk& t" .J o Psr.horog t MtetMt 27, rr3 25 ?* @jtzrntu -.rnd-(€d!)(r0s0) 4m 16 Hu3u, R (r933) Cordacl.aus.\.fec6, and rEa'm.nr Mwt n p,rtubs,.e4dn, ^o,/ dtus (ird cd ) Krn. oI test Rtu! oJElvotvat R.w\ tslr), a1 71 ss ctry, Mo: Tcn corparr oo oaAnarcr ^ax'.lrN Oe6r) x6@' oa th, e alq v!6tu @w H'ERoNvvus,A tuu.R o4xtits 16 itttui@ @tttu J.aododc6oe3r) 4r''37d&J unpublished6anu3crip,,ro*aT6dnSPro ,ut@ (2nd .d ) Boron: rnd B(on, In. g ni Univc^nr or Iowa B LmM. B^IIn S ,rndMN h,B s (rs6{) t%' 0961) t:a/t4@un nthtu) in lr.h'las @!1.tr4tnn. qa 4 4tbd,i,B|,btrha t|atulb@h : rh? alha .r - ll{r5oJ in.l nrrr,ns sy{cms rD w s Monr ,." ,.,.."- ;.:l D ,\ 1/q|e) r:rred,r.n.$ olmulri oor,,Rnol. A. c. ti9n7) obh,nrna inr.n,ld {eigbr rhcn w lr cr 7 ) T lr $ ( r v i ,n,,g.,dc n.! 4'd FrrLrt, 6, 29 XJ or r c5 F .Sr p p q 2 6 ( ND.cr t r q r g). L r , l r \ E I t ( r s6 r l I n.r'\ tld \. Phtt64,t,b ,j auand \c! 
L r \ \ R L r q 8 3 ) T e n in sln d r n n.,' Ja,a,r'n lJ Lnvdrnal Mtq\vmt,2a. tbrr: tolrn w,lq Pai Lqltrh totndt, tt,4tj1 D (;. (lq2c) 7. .J hr ./l 9- i P' ne 3! j ,.,R i I r,.i r n,4.n,.l o4 ht -u,t t I 4e. Pr R ( 11{ r ] l Ic T :l ) ho\fe ol i'cm q,rrtrr /4d./ M.C\LL, w A (t939t Q Li " v N l '6r2'a vninB r s ,Li r M ',f rcdP(cn.e J r nd c r k r j . ofl .yd oley. hrndcoordnkn,n (1933) ,\d!ur a.Mdr m d.st t 16 4 ntu da ^ MtyrR, Ci (r035J Ao.xpenn€NJ iudr orrhe otd aDdnew ,rpes or.rtrm,nNons: n mdhod5orrh.nudy /,unut o/ af .d,ntiddt ?,o{4' rmtdad ( r s 3 s ) r h e .h o ir e .fq u .n ,o B o P:yhatas, to, t6t Jr JtuD'tr alnuaiwL vrLLM{\ J (le74a) P'osrn 65e$Denr.lneron ref.r.d.cd non,ft, )2 taa 92 e d m e r u r .m € n L ln w. I 8trlero cA Md ui (rq69t H,, b r4r rni ntn tl@hwL Ne* York [rc Jr. kd.) (re33) T66 '4tuint nr L'matn,NE !.L x c,andJoHNsN. M (te33) Paru.ipanc r.acuonso r.oplErn<t kiiq R. lwa4l al Edht@4t a^tuiry l@.,1r1r) 79 36. ll!trr. D.t. 3 w{ss.r, v 007?). rnrp tr om.f .hrns rn3 ?ni{ds obF(N. kn n.as Jtu nat ot Elud,o,ut \no oD d Nt tu tn!1diu M. (ls3r) fhe r.lduon5h,! and aas4n?ikinsro; &1uditu arn1,lLtetal rat M.dtrMt,,t t, 369_i 5 Ro$. (i . (re.17) r,4&uhr r dL1t vhad\ \2r<r.d, Ln rnc. srerood cl,ft, Nt. P(drceHa[, Rowl,M R (1974) wrr dn.{rd ahr.s, dru inu.n.c on r?n$ag. P.,,.t\\" \ \ne l o,% or R ' . (r9?7) R!.H, C M (r9?9). ?*? ot m4F Fomuta s.ori.g aatmtin a?n "rnrr RrLr c lts4et rh. <@4t nI nn s^rLE ,D L,andw H r'r,c w (re6s) Thc e|rea ofdi ..r.en L. (1s65).An a.dr!i5 or dnn pr.nob$at M@ut@4 2t, vrcHr,J.v @6 tl e30) Lfl i ( tr /dnd./rj,/ L I I H P s \ l L { in d ktvlNq I J cs and Pwie.3, t4 22 Mr*(( s lr9n9) v!L,d,q rn R L Lin.ted) Eduditun a (Jrd lcmar ed ) w*hmsron,D c .e in [ducai.o 093]) ,{ 16 dation2l r4an \Easbin* A J . ( l e 3 0r Dr u n F ish i.s rc.tEtjenred rc!.. 
A@@ ot EdlLdr'!1t R.edth, ta5t \tezit slot ,J6 nhrs puptts w< ta ntv t1p: l g u nfth E' fJi' } l' ka &zUm lE' jL :Un llro oat trnok Euleau oIEdu.,LioEr Rd.arch t.znijq humt aI uarh, t|.t;o:is. 5{P Ltb' P D i nd N oU \1{ ] rr,lhome,n! !.nndenLionl rr@ tidta Tetrchg! 23t6),17 2A rnd croM{(, T R (is66) T or tren raa.gcm.nb o. Len pft n ,o.e kunat ot Llju cot6d MwtMt. ), 309_rr. sL0v$ M 0967).Thc me'nodor.frotevaluln n rnR Tvt.r (ed.) Pd,pativ' at cuwurs Etttwh AILA M;no. I )skorj€,rr_ Snphsqi.soncum.utu,,tydtuanon(No -, sHFas, r A.rndsMnH,M L (rs36) syn$csn ofr.searh on jchool r.d'ne$ ?.d |ilnde.g ttu Dd.,:ht6 1111),13 36. sM'rH, I( (1e53) .h.i.e ftosinre!ins?.h'cvcu4n ltuaatolDltdnrct s E^rM ^ N c E ( 1 e 2 7 )T r .4 r ir Ao l,m s r^ rrl J C ( 1 9 ? l ) Edtdtihat Mvnatenn<d) (r1,5r) Rcr,rbiry rr !r l n ,n a ' o n In L I Ixri )N , (| s 1l l ||10) Ra,i, ,,trnudrqd r\wn tb rd,i , (let5a) and _ wor[ in hkrs s.ituJ rlaraq 2I,676-3r (19r3h) RduLiirirr of gridins hish {hoor rnd - -, tt s.hqi srldnlg p!rcLi..5:Dtrili'ns kiv, n\t Prk t n4 | \21, 4- | I R r D ' . r , a a l r I N{ r . F ( r q 3 6 ) ,ttA!,' ' r t,,r ,' a s'rorD J B (re46) Ir'.r'r,g) r n^.n.h. tlt 12 t 4' tdlultnl 1rrdhn la' ,,thD |t i i rhr4 | r\r, B,hr t,tn,t ' ,)tshnrnd r\.11t \{ rR r w (r (193?ra(on,tN '!r LcL$r\ r/,/rilArr!,@ rcprc\.ru'n( ( r g t r 3 ) R o m l t l i g d rso rnstu. ).rl pn.,rr Pri /'/k/ {al4n,6\'ttit \ no'td r r' {\r(,srrl I (r1l 3) r'nD rft' r - fh o n ,d i[e ( .d ) . wrsbi.ghn. Dc ADen S/rl v sodizi, 370 U s 93x{rq6a) ttua, al hurh srmNnrRc. R J (re35) Rryut I.l A hvin ttx,ga. N& Ydk crnbn,lse tr"ivdsi'! I',eis srrinNs. R I (re37) DcsiBn.'nd ne\er.pn'n' or p.iirr - ((r). /:brduen t' tt\)Nn) r\ant"i lln, fonn 1 (ih tuprffirrrn (i (lql2) Reliib'rri or sr{hDs _. Inl(nlu{ Ne Bn r [:M r n i ,en c, B'ardsar\rrcnt.oui,d..isi,).sof s'a*Oe30,wi use lorarard'nsaholrnhips ir norarnsnt lnLtry.ndt s rilN ^ x n . 
ESSENTIALS OF EDUCATIONAL MEASUREMENT
Fifth Edition
ROBERT L. EBEL and DAVID A. FRISBIE, both of University of Iowa

This book provides a solid introduction to the fundamental concepts and principles of educational measurement. It is designed specifically for those individuals responsible for testing cognitive abilities. Through the use of practical discussions, readers examine the principles of test item writing; the reliability and validity of educational test scores; issues related to grading; procedures for assigning grades; recent developments in testing; and standardized tests of achievement and intelligence.

The text is appropriate for introductory testing and measurement courses and as a reference for practitioners. No previous study of educational measurement or statistics is assumed.

ISBN 0-87692-700-2