Assessment Use Arguments as Basis For Assessment Development

The Place of Intended Impact in
Assessment Use Arguments*
Lyle F. Bachman
Department of Applied Linguistics
U.C.L.A.
Los Angeles, California
Adrian Palmer
Department of Linguistics
University of Utah
Salt Lake City, Utah
*The material in this presentation and handout is
based upon the books Language Testing in Practice,
Lyle F. Bachman & Adrian Palmer. © Oxford
University Press (1996) and Language Assessment in
Action, Oxford University Press (forthcoming) as
well as on various other articles by Lyle F. Bachman.
Updated 11/16/06
©1996 & forthcoming, Bachman & Palmer & OUP
Page 1
References
• Bachman, L. F. "Building and supporting a case
for assessment use." Language Assessment
Quarterly, 2(1). 2005.
• Bachman, Lyle F and Adrian Palmer. Language
Testing In Practice. Oxford University Press.
1996. http://www.oup.co.uk/
• Bachman, Lyle F and Adrian Palmer. Language
Assessment In Action. Oxford University Press.
Forthcoming.
• Toulmin, S. E. The Uses of Argument. Cambridge:
Cambridge University Press. 2003.
• Watson, Jenny Peterson & Sindhvananda,
Kanchana. "Notes on the Thammasat University
English Program". Bangkok: Thammasat
University Faculty of Liberal Arts. 1972.
• Palmer, Adrian. "Procedures for student
classification and grading in courses I-IV".
Bangkok: Thammasat University Faculty of
Liberal Arts. 1972.
Outline of Presentation
• How to make an Assessment Use
Argument to justify using a test to
have specific types of intended
impact in a specific situation.
• How to use this argument to argue
for two different testing options
(different methods of testing).
• How to go about making a decision
to use one option or the other.
Four Qualities of
Useful Language Assessments
1. Reliability: consistency of measurement
2. Construct validity: the meaningfulness of the interpretations that we make on the basis of assessment scores
3. Authenticity: the degree of correspondence between the characteristics of a given assessment task and the characteristics of a relevant non-assessment language use task
4. Intended Impact: the intended effects that administering and taking an assessment, and using assessment results, have on students, teachers, educational systems, and society
Qualities of Usefulness Associated With
Links in Assessment Use Argument
Bachman & Palmer (Forthcoming)
4. Uses/Decisions
   Authenticity Warrants
   Intended Impact Warrants
3. Interpretation of Results
   Construct Validity Warrants
2. Results/Scores
   Reliability Warrants
1. Performance on Assessment Tasks
Summary of Reasoning in Example
Assessment Use Argument
4. USE/DECISIONS
Assign grades at end of grammar unit.
   Authenticity
   For the following reasons… the M-C task is appropriate for measuring the students' knowledge of grammar in this situation.
   Intended Impact
   For the following reasons… using the interpretations of the students' knowledge of grammar to assign grades will have the intended impact on test takers and test users.
3. INTERPRETATION
Numbers are interpreted as students' knowledge of grammar.
   Construct Validity
   For the following reasons… scores can be interpreted in terms of "knowledge of grammar".
2. RESULTS/SCORES
Numbers are assigned to performance.
   Reliability
   For the following reasons… we can consistently associate grammar scores with students' performance on M-C tasks.
1. PERFORMANCE ON ASSESSMENT TASK
Students select answers on M-C Grammar Test Tasks.
Backing (Supporting Evidence)
for Warrants (Reasoning)
2. RESULTS/SCORES
Scores (numbers) are assigned to performance.
   Reliability Warrants (reasons)
   Backing (supporting evidence)
1. PERFORMANCE ON ASSESSMENT TASKS
Students check answers on M-C answer sheet.
Kinds of Backing
• Prior research
• Evidence specifically collected
for this purpose
• Accepted community social
practice and values
• Government regulations
• Laws
• Legal precedents
Example of Backing (Evidence)
for Specific Reliability Warrant (Reasoning)
2. RESULTS/SCORES
Scores (numbers) are assigned to performance.
   Reliability Warrant
   Scores are consistent from one administration to another.
   Backing
   On 2/34/06, measured test/retest reliability = .91
1. PERFORMANCE ON ASSESSMENT TASKS
Students mark answers on M-C grammar test.
Complete Assessment Use Argument
Bachman & Palmer (Forthcoming)
4. Uses/Decisions
   Authenticity Warrants
      Backing
   Intended Impact Warrants
      Backing
3. Interpretation of Results
   Construct Validity Warrants
      Backing
2. Results/Scores
   Reliability Warrants
      Backing
1. Performance on Assessment Tasks
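The structure of the complete argument (claims linked by warrants, each warrant supported by backing) can be sketched as a small data structure. This is only an illustrative sketch, not part of the Bachman & Palmer framework: the class names `Warrant` and `Link` and the helper `unbacked` are hypothetical, and only two of the links from the grammar-test example are filled in.

```python
from dataclasses import dataclass, field


@dataclass
class Warrant:
    """A reason supporting one link in the argument, plus its backing."""
    statement: str
    backing: list[str] = field(default_factory=list)


@dataclass
class Link:
    """Connects a lower claim to the next claim up the chain."""
    lower: str
    upper: str
    quality: str  # quality of usefulness associated with this link
    warrants: list[Warrant] = field(default_factory=list)


def unbacked(links: list[Link]) -> list[tuple[str, str]]:
    """Return (quality, warrant statement) pairs that have no backing yet."""
    return [
        (link.quality, warrant.statement)
        for link in links
        for warrant in link.warrants
        if not warrant.backing
    ]


# Illustrative content based on the grammar-test example in the slides.
argument = [
    Link(
        lower="1. Performance on assessment tasks",
        upper="2. Results/Scores",
        quality="Reliability",
        warrants=[
            Warrant(
                "Scores are consistent from one administration to another.",
                backing=["Measured test/retest reliability = .91"],
            )
        ],
    ),
    Link(
        lower="2. Results/Scores",
        upper="3. Interpretation of results",
        quality="Construct validity",
        # No backing recorded yet for this warrant (deliberately left empty).
        warrants=[Warrant("Scores can be interpreted as knowledge of grammar.")],
    ),
]

# Lists the warrants that still need supporting evidence.
print(unbacked(argument))
```

A listing like this makes it easy to see which warrants still lack supporting evidence before the argument is presented to test users.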
Thammasat University
Proficiency Test (TUPT)
Kanchana Sindhvananda, J. Peterson, A. Palmer,
and Thammasat Faculty of Liberal Arts Ajarns. (1971)
• High-stakes test used to make
decisions affecting all students in
Thammasat University
• Purpose
– Measure knowledge of
• grammar
• vocabulary
• reading comprehension
– To make decisions about
• exemption from university ESL courses
primarily involving reading
• placement in required ESL courses
primarily involving reading
• grading in required ESL courses primarily
involving reading
Criteria for Student Classification
and Grading in Courses I-IV
Intended Impact & Options
Situation 1: Thammasat 1971
– Test method: Multiple-choice
– Intended impact: Efficient and hassle-free placement and grading in reading-based ESL program
Situation 2, Option 1: Thammasat 1973 (hypothetical)
– Test method: Multiple-choice
– Intended impact:
1. Efficient and hassle-free placement and grading in reading- and writing-based ESL program
2. Washback: teachers and students
Situation 2, Option 2: Thammasat 1973 (hypothetical)
– Test method: Multiple-choice and essay
– Intended impact:
1. Efficient and hassle-free placement and grading in reading- and writing-based ESL program
2. Washback: teachers and students
Intended Impact Argument
Warrants
4. Use/decisions
   1. Exempt highly proficient students from ESL classes
   2. Place remaining students in appropriate ESL classes
   3. Assign grades of A and B in ESL courses (lower grades to be assigned using other measures)
3. Interpretations of results
   1. Knowledge of grammar, vocabulary, and reading comprehension
Intended Impact Warrants
1. Individuals
   a. Eliminating unnecessary instruction frees students to take other courses.
   b. Instruction at appropriate level is more effective.
   c. Regularized grading allows for systematic interpretation of grades and reduces complaints of unfairness.
2. Systems
   a. Relevance of construct to decisions: University courses focus on grammar, vocabulary, and reading comprehension, so measures of these constructs are needed to place students appropriately (common practice).
   b. Regularized instruction at different levels over time and across classes maximizes use of resources.
Intended Impact Argument
Backing
Intended Impact Warrants
1. Individuals
   a. Eliminating unnecessary instruction frees students to take other courses.
   b. Instruction at appropriate level is more effective.
   c. Regularized grading allows for systematic interpretation of grades and reduces complaints of unfairness.
2. Systems
   a. Relevance of construct to decisions: University courses focus on grammar, vocabulary, and reading comprehension, so measures of these constructs are needed to place students appropriately.
   b. Regularized instruction at different levels over time and across classes maximizes use of resources.
Backing
1. Individuals
   a. Documented communication from advanced students (see …)
   b. Standard practice
   c. Documented communication from teachers and students on fairness of grades (see …)
2. Systems
   a. Standard practice.
   b. Documented teacher feedback on time spent in class preparation and assessment (see …)
Authenticity Argument Warrants
4. Use/decisions
   1. Exempt highly proficient students from ESL classes
   2. Place remaining students in appropriate ESL classes
   3. Assign grades of A and B in ESL courses (lower grades to be assigned using other measures)
3. Interpretations of results
   1. Knowledge of grammar, vocabulary, and reading comprehension
Authenticity Warrants
1. Relevant instructional task selection: instructional materials consist to a large extent of reading passages and specific selections from passages illustrating grammar, vocabulary, and reading comprehension teaching points.
2. Correspondence of instructional task and test task characteristics: Reading passages are similar in difficulty and content to instructional passages. Many instructional tasks involve selected responses and limited constructed responses.
Authenticity Argument Backing
Authenticity Warrants
1. Relevant instructional task selection: instructional materials consist to a large extent of reading passages and specific selections from passages illustrating grammar, vocabulary, and rhetorical organization teaching points.
2. Correspondence of instructional task and test task characteristics: Reading passages are similar in difficulty and content to instructional passages. Many instructional tasks involve selected and limited constructed responses.
Backing
1. Examples of instructional reading passages and instructional tasks can be found in the following course texts (references here).
2. Reading difficulty formulas have been used to calculate difficulty of reading passages in instructional materials and calibrate difficulty of test passages (see TUPT manual). Both instructional and test passages are based upon topics involving general (nontechnical) background knowledge and selected and limited constructed responses.
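The slides do not say which reading difficulty formulas were used for the TUPT. As one hedged illustration of how such a formula works, the sketch below computes the Flesch Reading Ease score, a widely used readability measure. The choice of formula is an assumption, the function names are hypothetical, and the syllable counter is a rough vowel-group heuristic rather than a dictionary-based count.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic: count groups of consecutive vowels (minimum 1)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: higher scores indicate easier text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (
        206.835
        - 1.015 * (len(words) / len(sentences))
        - 84.6 * (syllables / len(words))
    )


# Hypothetical passages: one instructional, one proposed for the test.
instructional = "The cat sat on the mat. The dog ran to the park."
test_passage = (
    "Institutional considerations necessitate comprehensive "
    "evaluation of instructional materials."
)

# A large gap between the two scores would suggest that the test passage
# is not calibrated to the difficulty of the instructional passage.
print(flesch_reading_ease(instructional), flesch_reading_ease(test_passage))
```

Comparing scores for instructional and test passages in this way is one possible form of the calibration described in the backing above.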
Construct Validity Warrants
3. Interpretations of results
   1. Knowledge of grammar, vocabulary, and reading comprehension
2. Results/Scores
   Total number of correct responses
Construct Validity Warrants
1. The constructs "grammar, vocabulary, and reading comprehension" have been carefully defined.
2. The selected response grammar, vocabulary, and reading comprehension test tasks allow the test takers to demonstrate their knowledge of grammar, vocabulary, and reading comprehension.
Construct Validity Backing
Construct Validity Warrants
1. The constructs "grammar, vocabulary, and reading comprehension" have been carefully defined.
2. The selected response grammar, vocabulary, and reading comprehension test tasks allow the test takers to demonstrate their knowledge of grammar, vocabulary, and reading comprehension.
Backing
1. The construct definitions have been developed by a committee of teachers with a background in test design. (See definitions of constructs in test design statement.)
2. The test tasks have been designed to focus attention on the testing point in contexts that do not in and of themselves create additional difficulty for test takers. For example, tasks designed to test grammar do not involve difficult vocabulary as well.
Reliability Warrants
2. Results/Scores
   Total number of correct responses
1. Performance on Assessment Tasks
   Test takers check M-C answers
Reliability Warrants
1. Scoring criteria and procedures are consistent across administrations and tasks.
2. Task characteristics are consistent across multiple tasks.
3. Scores are consistent across test administrations.
Reliability Backing
Reliability Warrants
1. Scoring criteria and procedures are consistent across administrations and tasks.
2. Task characteristics are consistent across multiple tasks.
3. Scores are consistent across administrations.
Backing
1. Single criterion is used for scoring each set of test tasks (vocab, gram, and reading comprehension). Test is machine scored, so procedures are identical for all test tasks.
2. All tasks in each section of the test consist of stems and alternatives with specified characteristics as described in test manual.
3. Measured test/retest reliability (March, 1971).

            Form A    Form B
   Mean     86.21     88.48
   SD       20.49     19.47
   N        164
   Pearson r  .93
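The Pearson r reported as reliability backing is the correlation between the scores the same test takers obtained on two administrations. A minimal sketch of that calculation follows. The function name `pearson_r` and the five pairs of scores are hypothetical illustrations and are not the 1971 TUPT data, which the slides report only in summary form.

```python
from math import sqrt


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation between two paired score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long lists with at least 2 scores")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from each mean.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("scores must vary within each administration")
    return cov / sqrt(var_x * var_y)


# Hypothetical scores for five test takers on two administrations.
first = [70.0, 85.0, 90.0, 60.0, 100.0]
second = [72.0, 88.0, 87.0, 65.0, 98.0]

print(round(pearson_r(first, second), 2))
```

A value close to 1 indicates that test takers are ranked in nearly the same order on both administrations, which is the kind of evidence the third reliability warrant calls for.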
Situation 2:
Same as for Situation 1
With The Following Additions
• Purpose
– Also to measure knowledge of the
following constructs in tasks involving
essay writing:
• grammar
• vocabulary
• rhetorical organization
– To make decisions about…
• exemption from new university ESL writing
courses
• placement in new required ESL writing
courses
• grading in new required ESL writing
courses
• Additional intended impact:
promote positive washback on
writing teachers and students in
writing courses
Additional Intended Impact
Argument Warrants
4. Additional Use/Decisions
   1. Exempt highly proficient students from ESL writing classes
   2. Place remaining students in appropriate ESL writing classes
   3. Assign grades of A and B in ESL writing courses (lower grades to be assigned using other measures)
3. Additional Interpretations of Results
   1. Knowledge of grammar, vocabulary, and rhetorical organization in tasks involving essay writing
Additional Intended Impact Warrants
1. Individuals
   a. No additional warrants
2. Systems
   a. Relevance of construct to decisions: New university writing courses focus on knowledge of grammar, vocabulary & rhetorical organization in essay writing tasks, so measures of these constructs in essay writing tasks are needed to place students appropriately.
Additional Intended Impact
Argument Backing
Additional Intended Impact Warrants
1. Individuals
   a. No additional warrants
2. Systems
   a. Relevance of construct to decisions: New university writing courses focus on knowledge of grammar, vocabulary & rhetorical organization in essay writing tasks, so measures of these constructs in essay writing tasks are needed to place students appropriately.
Additional Backing
1. Individuals
2. Systems
   a. Documented feedback from instructors that students who control grammar and vocabulary in reading tasks cannot necessarily perform well on tasks involving essay writing
Additional Authenticity
Argument Warrants
4. Use/decisions
   1. Exempt highly proficient students from new ESL essay writing classes
   2. Place remaining students in appropriate new ESL essay writing classes
   3. Assign grades of A and B in new essay writing courses (lower grades to be assigned using other measures)
3. Interpretations of results
   1. Knowledge of grammar, vocabulary, and rhetorical organization in tasks involving essay writing
Additional Authenticity Warrants
1. Relevant instructional task selection: instructional materials also involve tasks involving essay writing.
2. Correspondence of assessment task / instructional task characteristics: Assessment essay topics are similar to topics involving general knowledge used in instructional tasks. Length of assessment essay tasks is similar to length of instructional essay tasks.
Additional Authenticity Argument
Backing
Additional Authenticity Warrants
1. Relevant instructional task selection: instructional materials also involve tasks involving essay writing.
2. Correspondence of assessment task / instructional task characteristics: Assessment essay topics are similar to topics involving general knowledge used in instructional tasks. Length of assessment essay tasks is similar to length of instructional essay writing tasks.
Backing
1. Description of curriculum.
2. Example instructional materials and proposed essay test blueprint.
Additional Construct Validity
Warrants
3. Interpretations of results
   1. Knowledge of grammar, vocabulary, and rhetorical organization
2. Results/Scores
   Rating levels.
Construct Validity Warrants
1. The constructs "knowledge of grammar, vocabulary, and rhetorical organization" have been carefully defined.
2. The extended production essay writing test task allows the test takers to demonstrate their knowledge of grammar, vocabulary, and rhetorical organization.
Additional Construct Validity
Backing
Construct Validity Warrants
1. The constructs "knowledge of grammar, vocabulary, and rhetorical organization" have been carefully defined.
2. The extended production essay writing test task allows the test takers to demonstrate their knowledge of grammar, vocabulary, and rhetorical organization.
Backing
1. The construct definitions have been developed by a committee of teachers with a background in test design. (See definitions of constructs in test design statement.)
2. The test tasks have been designed to focus attention on the testing point in contexts that do not in and of themselves create additional difficulty for test takers. For example, essay writing tasks involve topical knowledge common to all test takers.
Comparative Assessment
Use Arguments
Assessment Use Argument For Option #1
4. Uses/Decisions
   Authenticity Warrants
   Intended Impact Warrants
3. Interpretation of Results
   Construct Validity Warrants
2. Results/Scores
   Reliability Warrants
1. Performance on Assessment M-C Tasks
Assessment Use Argument For Option #2
4. Uses/Decisions
   Authenticity Warrants
   Intended Impact Warrants
3. Interpretation of Results
   Construct Validity Warrants
2. Results/Scores
   Reliability Warrants
1. Performance on Assessment M-C and Essay Tasks
How to Decide
Between Alternatives
• Describe additional decisions and intended impact
– Program directors need to make the following decision: Should they add an essay writing task to the English test given to all students entering Thammasat University?
– Program directors want to increase students' ability to write essays because essay writing is an ability that students currently lack. This ability is needed both in instructional and real-life language use tasks that the students need to perform.
• To make this decision, they need to develop Assessment Use Arguments for two alternatives:
1. Do not add an essay writing task. Continue to use only the M-C tasks to place and grade students in essay writing classes.
2. Add an additional essay writing task and use this to place and grade students in essay writing classes.
• Then decide
1. which argument they prefer and can live with…
2. on the basis of whether developing the test according to the preferred argument is worth the cost.