The Place of Intended Impact in Assessment Use Arguments*

Lyle F. Bachman
Department of Applied Linguistics
U.C.L.A.
Los Angeles, California

Adrian Palmer
Department of Linguistics
University of Utah
Salt Lake City, Utah

*The material in this presentation and handout is based upon the books Language Testing in Practice, Lyle F. Bachman & Adrian Palmer, © Oxford University Press (1996), and Language Assessment in Action, Oxford University Press (forthcoming), as well as on various other articles by Lyle F. Bachman.

Updated 11/16/06 ©1996 & forthcoming, Bachman & Palmer & OUP

References
• Bachman, L. F. "Building and supporting a case for assessment use." Language Assessment Quarterly, 2(1). 2005.
• Bachman, Lyle F. and Adrian Palmer. Language Testing in Practice. Oxford University Press. 1996. http://www.oup.co.uk/
• Bachman, Lyle F. and Adrian Palmer. Language Assessment in Action. Oxford University Press. Forthcoming.
• Toulmin, S. E. The Uses of Argument. Cambridge: Cambridge University Press. 2003.
• Watson, Jenny Peterson & Sindhvananda, Kanchana. "Notes on the Thammasat University English Program". Bangkok: Thammasat University Faculty of Liberal Arts. 1972.
• Palmer, Adrian. "Procedures for student classification and grading in courses I-IV". Bangkok: Thammasat University Faculty of Liberal Arts. 1972.

Outline of Presentation
• How to make an Assessment Use Argument to justify using a test to have specific types of intended impact in a specific situation.
• How to use this argument to argue for two different testing options (different methods of testing).
• How to go about making a decision to use one option or the other.

Four Qualities of Useful Language Assessments
1. Reliability: consistency of measurement
2. Construct validity: the meaningfulness of the interpretations that we make on the basis of assessment scores
3. Authenticity: the degree of correspondence between the characteristics of a given assessment task and the characteristics of a relevant non-assessment language use task
4. Intended Impact: the intended effects that administering and taking an assessment, and using assessment results, have on students, teachers, educational systems, and society

Qualities of Usefulness Associated With Links in Assessment Use Argument
Bachman & Palmer (Forthcoming)

4. Uses/Decisions
      Authenticity Warrants; Intended Impact Warrants
3. Interpretation of Results
      Construct Validity Warrants
2. Results/Scores
      Reliability Warrants
1. Performance on Assessment Tasks

Summary of Reasoning in Example Assessment Use Argument

4. USE/DECISIONS: Assign grades at end of grammar unit.
      Authenticity: For the following reasons… the M-C task is appropriate for measuring the students' knowledge of grammar in this situation.
      Intended Impact: For the following reasons… using the interpretations of the students' knowledge of grammar to assign grades will have the intended impact on test takers and test users.
3. INTERPRETATION: Numbers are interpreted as students' knowledge of grammar.
      Construct Validity: For the following reasons… scores can be interpreted in terms of "knowledge of grammar."
2. RESULTS/SCORES: Numbers are assigned to performance.
      Reliability: For the following reasons… we can consistently associate grammar scores with students' performance on M-C tasks.
1. PERFORMANCE ON ASSESSMENT TASK: Students select answers on M-C Grammar Test Tasks.

Backing (Supporting Evidence) for Warrants (Reasoning)
2. RESULTS/SCORES: Scores (numbers) are assigned to performance.
      Reliability Warrants (reasons)
      Backing (supporting evidence)
1. PERFORMANCE ON ASSESSMENT TASKS: Students check answers on M-C answer sheet.

Kinds of Backing
• Prior research
• Evidence specifically collected for this purpose
• Accepted community social practice and values
• Government regulations
• Laws
• Legal precedents

Example of Backing (Evidence) for Specific Reliability Warrant (Reasoning)

2. RESULTS/SCORES: Scores (numbers) are assigned to performance.
      Reliability Warrant: Scores are consistent from one administration to another.
      Backing: On 2/34/06, measured test/retest reliability = .91
1. PERFORMANCE ON ASSESSMENT TASKS: Students mark answers on M-C grammar test.

Complete Assessment Use Argument
Bachman & Palmer (Forthcoming)

4. Uses/Decisions
      Authenticity Warrants + Backing; Intended Impact Warrants + Backing
3. Interpretation of Results
      Construct Validity Warrants + Backing
2. Results/Scores
      Reliability Warrants + Backing
1. Performance on Assessment Tasks

Thammasat University Proficiency Test (TUPT)
Kanchana Sindhvananda, J. Peterson, A. Palmer, and Thammasat Faculty of Liberal Arts Ajarns.
(1971)
• High-stakes test used to make decisions affecting all students in Thammasat University
• Purpose
  – Measure knowledge of
    • grammar
    • vocabulary
    • reading comprehension
  – To make decisions about
    • exemption from university ESL courses primarily involving reading
    • placement in required ESL courses primarily involving reading
    • grading in required ESL courses primarily involving reading

Criteria for Student Classification and Grading in Courses I-IV

Intended Impact & Options

Situation 1: Thammasat 1971
  Test method: Multiple-choice
  Intended impact: Efficient and hassle-free placement and grading in reading-based ESL program

Situation 2, Option 1: Thammasat 1973 (hypothet.)
  Test method: Multiple-choice
  Intended impact:
    1. Efficient and hassle-free placement and grading in reading and writing-based ESL program
    2. Washback: teachers and students

Situation 2, Option 2: Thammasat 1973 (hypothet.)
  Test method: Multiple-choice and essay
  Intended impact:
    1. Efficient and hassle-free placement and grading in reading and writing-based ESL program
    2. Washback: teachers and students

Intended Impact Argument Warrants

4. Use/decisions
  1. Exempt highly proficient students from ESL classes
  2. Place remaining students in appropriate ESL classes
  3. Assign grades of A and B in ESL courses (lower grades to be assigned using other measures)
3. Interpretations of results
  1. Knowledge of grammar, vocabulary, and reading comprehension

Intended Impact Warrants
1. Individuals
  a. Eliminating unnecessary instruction frees students to take other courses.
  b. Instruction at appropriate level is more effective.
  c. Regularized grading allows for systematic interpretation of grades and reduces complaints of unfairness.
2. Systems
  a. Relevance of construct to decisions: University courses focus on grammar, vocabulary, and reading comprehension, so measures of these constructs are needed to place students appropriately (common practice).
  b. Regularized instruction at different levels over time and across classes maximizes use of resources.

Intended Impact Argument Backing

Intended Impact Warrants
1. Individuals
  a. Eliminating unnecessary instruction frees students to take other courses.
  b. Instruction at appropriate level is more effective.
  c. Regularized grading allows for systematic interpretation of grades and reduces complaints of unfairness.
2. Systems
  a. Relevance of construct to decisions: University courses focus on grammar, vocabulary, and reading comprehension, so measures of these constructs are needed to place students appropriately.
  b. Regularized instruction at different levels over time and across classes maximizes use of resources.

Backing
1. Individuals
  a. Documented communication from advanced students (see…)
  b. Standard practice
  c. Documented communication from teachers and students on fairness of grades (see…)
2. Systems
  a. Standard practice
  b. Documented teacher feedback on time spent in class preparation and assessment (see…)

Authenticity Argument Warrants

Authenticity Warrants
1. Relevant instructional task selection: instructional materials consist to a large extent of reading passages and specific selections from passages illustrating grammar, vocabulary, and reading comprehension teaching points.
2. Correspondence of instructional task and test task characteristics: Reading passages are similar in difficulty and content to instructional passages. Many instructional tasks involve selected responses and limited constructed responses.

4. Use/decisions
  1. Exempt highly proficient students from ESL classes
  2. Place remaining students in appropriate ESL classes
  3. Assign grades of A and B in ESL courses (lower grades to be assigned using other measures)
3. Interpretations of results
  1. Knowledge of grammar, vocabulary, and reading comprehension

Authenticity Argument Backing

Authenticity Warrants
1. Relevant instructional task selection: instructional materials consist to a large extent of reading passages and specific selections from passages illustrating grammar, vocabulary, and rhetorical organization teaching points.
2. Correspondence of instructional task and test task characteristics: Reading passages are similar in difficulty and content to instructional passages. Many instructional tasks involve selected and limited constructed responses.

Backing
1. Examples of instructional reading passages and instructional tasks can be found in the following course texts (references here).
2. Reading difficulty formulas have been used to calculate difficulty of reading passages in instructional materials and calibrate difficulty of test passages (see TUPT manual). Both instructional and test passages are based upon topics involving general (nontechnical) background knowledge and selected and limited constructed responses.

Construct Validity Warrants

3. Interpretations of results
  1. Knowledge of grammar, vocabulary, and reading comprehension
2. Results/Scores
  Total number of correct responses

Construct Validity Warrants
1. The constructs "grammar, vocabulary, and reading comprehension" have been carefully defined.
2. The selected response grammar, vocabulary, and reading comprehension test tasks allow the test takers to demonstrate their knowledge of grammar, vocabulary, and reading comprehension.

Construct Validity Backing

Construct Validity Warrants
1. The constructs "grammar, vocabulary, and reading comprehension" have been carefully defined.
2. The selected response grammar, vocabulary, and reading comprehension test tasks allow the test takers to demonstrate their knowledge of grammar, vocabulary, and reading comprehension.

Backing
1. The construct definitions have been developed by a committee of teachers with a background in test design. (See definitions of constructs in test design statement.)
2. The test tasks have been designed to focus attention on the testing point in contexts that do not in and of themselves create additional difficulty for test takers. For example, tasks designed to test grammar do not also involve difficult vocabulary.

Reliability Warrants

2. Results/Scores
  Total number of correct responses
1. Performance on Assessment Tasks
  Test takers check M-C answers

Reliability Warrants
1. Scoring criteria and procedures are consistent across administrations and tasks.
2. Task characteristics are consistent across multiple tasks.
3. Scores are consistent across test administrations.

Reliability Backing

Reliability Warrants
1. Scoring criteria and procedures are consistent across administrations and tasks.
2. Task characteristics are consistent across multiple tasks.
3. Scores are consistent across administrations.

Backing
1. Single criterion is used for scoring each set of test tasks (vocab, gram, and reading comprehension). Test is machine scored, so procedures are identical for all test tasks.
2. All tasks in each section of the test consist of stems and alternatives with specified characteristics as described in test manual.
3. Measured test/retest reliability (March, 1971).
              Mean     SD
   Form A     86.21    20.49
   Form B     88.48    19.47
   N = 164; Pearson r = .93

Situation 2: Same as for Situation 1 With the Following Additions
• Purpose
  – Also to measure knowledge of the following constructs in tasks involving essay writing:
    • grammar
    • vocabulary
    • rhetorical organization
  – To make decisions about…
    • exemption from new university ESL writing courses
    • placement in new required ESL writing courses
    • grading in new required ESL writing courses
• Additional intended impact: promote positive washback on writing teachers and students in writing courses

Additional Intended Impact Argument Warrants

4. Additional Use/Decisions
  1. Exempt highly proficient students from ESL writing classes
  2. Place remaining students in appropriate ESL writing classes
  3. Assign grades of A and B in ESL writing courses (lower grades to be assigned using other measures)
3. Additional Interpretations of Results
  1. Knowledge of grammar, vocabulary, and rhetorical organization in tasks involving essay writing

Additional Intended Impact Warrants
1. Individuals
  a. No additional warrants
2. Systems
  a. Relevance of construct to decisions: New university writing courses focus on knowledge of grammar, vocabulary & rhetorical organization in essay writing tasks, so measures of these constructs in essay writing tasks are needed to place students appropriately.

Additional Intended Impact Argument Backing

Additional Intended Impact Warrants
1. Individuals
  a. No additional warrants
2. Systems
  a. Relevance of construct to decisions: New university writing courses focus on knowledge of grammar, vocabulary & rhetorical organization in essay writing tasks, so measures of these constructs in essay writing tasks are needed to place students appropriately.

Additional Backing
1. Individuals
2. Systems
  a. Documented feedback from instructors that students who control grammar and vocabulary in reading tasks cannot necessarily perform well on tasks involving essay writing

Additional Authenticity Argument Warrants

Additional Authenticity Warrants
1. Relevant instructional task selection: instructional materials also involve tasks involving essay writing.
2. Correspondence of assessment task / instructional task characteristics: Assessment essay topics are similar to topics involving general knowledge used in instructional tasks. Length of assessment essay tasks is similar to length of instructional essay tasks.

4. Use/decisions
  1. Exempt highly proficient students from new ESL essay writing classes
  2. Place remaining students in appropriate new ESL essay writing classes
  3. Assign grades of A and B in new essay writing courses (lower grades to be assigned using other measures)
3. Interpretations of results
  1. Knowledge of grammar, vocabulary, and rhetorical organization in tasks involving essay writing

Additional Authenticity Argument Backing

Additional Authenticity Warrants
1. Relevant instructional task selection: instructional materials also involve tasks involving essay writing.
2. Correspondence of assessment task / instructional task characteristics: Assessment essay topics are similar to topics involving general knowledge used in instructional tasks. Length of assessment essay tasks is similar to length of instructional essay writing tasks.

Backing
1. Description of curriculum.
2. Example instructional materials and proposed essay test blueprint.

Additional Construct Validity Warrants

3. Interpretations of results
  1. Knowledge of grammar, vocabulary, and rhetorical organization
2. Results/Scores
  Rating levels.

Construct Validity Warrants
1. The constructs "knowledge of grammar, vocabulary, and rhetorical organization" have been carefully defined.
2. The extended production essay writing test task allows the test takers to demonstrate their knowledge of grammar, vocabulary, and rhetorical organization.

Additional Construct Validity Backing

Construct Validity Warrants
1. The constructs "knowledge of grammar, vocabulary, and rhetorical organization" have been carefully defined.
2. The extended production essay writing test task allows the test takers to demonstrate their knowledge of grammar, vocabulary, and rhetorical organization.

Backing
1. The construct definitions have been developed by a committee of teachers with a background in test design. (See definitions of constructs in test design statement.)
2. The test tasks have been designed to focus attention on the testing point in contexts that do not in and of themselves create additional difficulty for test takers. For example, essay writing tasks involve topical knowledge common to all test takers.
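
The test/retest backing reported for the reliability warrants (Pearson r = .93, N = 164) is a correlation between the same test takers' scores on two forms. The sketch below shows one way such a coefficient could be computed. The function name pearson_r and the five score pairs are hypothetical illustrations; they are not TUPT data and not the authors' procedure.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from each mean.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each set of scores.
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("scores must vary for a correlation to be defined")
    return cross / sqrt(ss_x * ss_y)


# Hypothetical scores for five test takers on two forms (made up for illustration).
form_a = [62, 75, 88, 91, 104]
form_b = [65, 74, 90, 95, 101]

r = pearson_r(form_a, form_b)
print(f"test/retest r = {r:.2f}")
```

A coefficient near 1 means that test takers are ranked and spaced similarly on both administrations, which is the kind of evidence the warrant "scores are consistent across administrations" calls for.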
Comparative Assessment Use Arguments

Option #1: Assessment Use Argument For Option #1
4. Uses/Decisions
      Authenticity Warrants; Intended Impact Warrants
3. Interpretation of Results
      Construct Validity Warrants
2. Results/Scores
      Reliability Warrants
1. Performance on Assessment M-C Tasks

Option #2: Assessment Use Argument For Option #2
4. Uses/Decisions
      Authenticity Warrants; Intended Impact Warrants
3. Interpretation of Results
      Construct Validity Warrants
2. Results/Scores
      Reliability Warrants
1. Performance on Assessment M-C and Essay Tasks

How to Decide Between Alternatives
• Describe additional decisions and intended impact
  – Program directors need to make the following decision: Should they add an essay writing task to the English test given to all students entering Thammasat University?
  – Program directors want to increase students' ability to write essays because essay writing is an ability that students currently lack. This ability is needed both in instructional and real-life language use tasks that the students need to perform.
• To make this decision, they need to develop Assessment Use Arguments for two alternatives:
  1. Do not add an essay writing task. Continue to use only the M-C tasks to place and grade students in essay writing classes.
  2. Add an additional essay writing task and use this to place and grade students in essay writing classes.
• Then decide
  1. which argument they prefer and can live with…
  2. on the basis of whether developing the test according to the preferred argument is worth the cost.
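
One way to make the comparison of the two arguments concrete is to record each option's warrants and backing and then list the warrants that have no backing yet. The sketch below is a minimal illustration under stated assumptions: the class names (Warrant, Link, AssessmentUseArgument), the unsupported method, and the abbreviated warrant text are hypothetical, and the choice to leave Option #1's relevance warrant without backing is an assumption made for the example, not a claim from the handout.

```python
from dataclasses import dataclass, field


@dataclass
class Warrant:
    """One reason supporting a link, plus the evidence (backing) offered for it."""
    text: str
    backing: list[str] = field(default_factory=list)


@dataclass
class Link:
    """A step between two claims, labelled by the quality it argues for."""
    quality: str
    warrants: list[Warrant] = field(default_factory=list)


@dataclass
class AssessmentUseArgument:
    option: str
    links: list[Link] = field(default_factory=list)

    def unsupported(self) -> list[tuple[str, str]]:
        """Return (quality, warrant text) for every warrant with no backing."""
        return [
            (link.quality, warrant.text)
            for link in self.links
            for warrant in link.warrants
            if not warrant.backing
        ]


# Abbreviated from the additional Intended Impact warrant (Systems, a).
RELEVANCE = (
    "Measures of grammar, vocabulary, and rhetorical organization in essay "
    "writing tasks are needed to place students appropriately"
)

# Assumption for illustration: with M-C tasks only, no backing is recorded
# for the relevance warrant about essay writing.
option_1 = AssessmentUseArgument(
    "Option #1: M-C tasks only",
    [Link("Intended Impact", [Warrant(RELEVANCE)])],
)
option_2 = AssessmentUseArgument(
    "Option #2: M-C and essay tasks",
    [Link("Intended Impact", [Warrant(RELEVANCE, ["Documented instructor feedback"])])],
)

for argument in (option_1, option_2):
    print(argument.option, "- unsupported warrants:", len(argument.unsupported()))
```

Listing unsupported warrants does not make the decision; it only shows where each argument is weakest, which program directors would still weigh against the cost of developing the test.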