Argument Substance and Argument Structure in Educational Assessment

Robert J. Mislevy
Department of Measurement, Statistics, & Evaluation
University of Maryland, College Park, MD
April 29, 2003

Presented at Conference on Inference, Culture, and Ordinary Thinking in Dispute Resolution, Benjamin N. Cardozo School of Law, Yeshiva University, New York, New York, April 27-29, 2003.

This work builds on research with Linda Steinberg and Russell Almond at Educational Testing Service on the structure of educational assessments.

April 29, 2003 Inference & Culture Slide 1

Central Points

Educational assessment has changed considerably over the last century. Why?
Strikingly different psychological perspectives on nature of learning and knowledge.
Can be seen as elaborations of same argument structure.
» Wigmore, Toulmin

Messick (1994) on assessment design:

[B]egin by asking what complex of knowledge, skills, or other attributes should be assessed, presumably because they are tied to explicit or implicit objectives of instruction or are otherwise valued by society. Next, what behaviors or performances should reveal those constructs, and what tasks or situations should elicit those behaviors? Thus, the nature of the construct guides the selection or construction of relevant tasks as well as the rational development of construct-based scoring criteria and rubrics.

Toulmin's (1958) structure for arguments

[Diagram: data (D) lead, "so", to claim (C), "since" warrant (W), "on account of" backing (B), "unless" alternative explanation (A), which rebuttal evidence (R) "supports".]

Reasoning flows from data (D) to claim (C) by justification of a warrant (W), which in turn is supported by backing (B). The inference may need to be qualified by alternative explanations (A), which may have rebuttal evidence (R) to support them.
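As an illustration only, the six elements of Toulmin's structure can be written down as a small record type. The sketch below is not part of the presentation: the class name `ToulminArgument`, its field names, and the `render` method are hypothetical choices, and the example values are taken from the Analytical Reasoning example the presentation uses for its diagrams.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToulminArgument:
    """Hypothetical container for the six elements of a Toulmin argument."""
    claim: str                         # C: what we want to conclude
    data: List[str]                    # D: observations the claim rests on
    warrant: str                       # W: why the data support the claim
    backing: Optional[str] = None      # B: what supports the warrant
    alternative: Optional[str] = None  # A: competing explanation
    rebuttal: Optional[str] = None     # R: evidence supporting the alternative

    def render(self) -> str:
        """Spell the argument out using the connectives from the diagram."""
        parts = [" and ".join(self.data), f"so {self.claim}", f"since {self.warrant}"]
        if self.backing:
            parts.append(f"on account of {self.backing}")
        if self.alternative:
            parts.append(f"unless {self.alternative}")
            if self.rebuttal:
                parts.append(f"(supported by: {self.rebuttal})")
        return "; ".join(parts)


arg = ToulminArgument(
    claim="Sue has a high value of Analytical Reasoning",
    data=[
        "Sue answered the Pet Shop item correctly",
        "the Pet Shop item has a given logical structure and contents",
    ],
    warrant="students high on Analytical Reasoning tend to do well on such puzzles",
    backing="empirical studies of AR test scores",
    alternative="Sue answered correctly as a result of a lucky guess",
    rebuttal="Sue spent less than 10 seconds on this item",
)
text = arg.render()
print(text)
```

Keeping the alternative explanation and its rebuttal evidence as optional fields mirrors the statement that an inference "may need to be qualified"; an argument with no stated alternative is still a complete record.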
Perspectives on learning and knowledge

Trait/differential (~1900 - )
Behaviorist (~1950 - 1980)
Information-processing (~1970 - )
Sociocultural (~1980 - )

Trait/Differential Perspective

A relatively stable characteristic of a person—an attribute, enduring process, or disposition—which is consistently manifested to some degree when relevant, despite considerable variation in the range of settings and circumstances. (Messick, 1989)

Interest in people's differential status on common traits
Useful in selection, prediction, and educational decisions—not so much for instruction

Spearman’s “Theorem of indifference of the indicator”

This means that, for the purpose of indicating the amount of g possessed by a person, any test will do just as well as any other, provided only that its correlation with g is equally high. ... Another consequence of the indifference of the indicator consists in the significance that should be attached to personal estimates of “intelligence” made by teachers and others. However unlike may be the kinds of observation from which these estimates may have been derived, still insofar as they have a sufficiently broad basis to make the influence of g dominate over that of the s’s [subjects], they will tend to measure precisely the same thing.

An Analytical Reasoning Item

Pet Shop Display

Arturo is planning the parakeet display for his pet shop. He has five parakeets: Alice, Bob, Carla, Diwakar, and Etria. Each is a different color; not necessarily in the same order, they are white, speckled, green, blue, and yellow. Arturo has two cages. The top cage holds three birds, and the bottom cage holds two. The display must meet the following additional conditions:

Alice is in the bottom cage.
Bob is in the top cage and is not speckled.
Carla cannot be in the same cage as the blue parakeet.
Etria is green.
The green parakeet and the speckled parakeet are in the same cage.

If Carla is in the top cage, which of the following must be true?
a) The green parakeet is in the bottom cage.
b) The speckled parakeet is in the bottom cage.
c) Diwakar is in the top cage.
d) Diwakar is in the bottom cage.
e) The blue parakeet is in the top cage.

LSAT on AR Items

LSAT's description of AR takes a trait perspective: "Analytical reasoning items are designed to measure the ability to understand a structure of relationships and to draw conclusions about the structure."

AR items are in the LSAT not because either lawyers or law students routinely have to solve problems just like these in their jobs or their studies, but because there is evidence that students who can solve these kinds of puzzles tend to perform better in law school than students who can't.

[Toulmin diagram for the Pet Shop item]
C: Sue has a high value of Analytical Reasoning.
W (since): Students who are high on Analytical Reasoning tend to do well on logical puzzles that query relations that follow from explicit relations and constraints.
B (on account of): Empirical studies show high correlations between AR test scores and college grades, open-ended problem solving tasks, and ratings of employees' reasoning skills on the job.
A (unless): Sue answered correctly as a result of a lucky guess.
R (supports A): Sue spent less than 10 seconds on this item.
D1: Sue answered the Pet Shop item correctly.
and
D2: Logical structure and contents of Pet Shop item.
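The reasoning that the Pet Shop item asks of an examinee can be checked by brute force. The sketch below is an addition, not part of the presentation, which does not state a keyed answer. It enumerates every cage and color assignment and keeps those that meet the stated conditions. It assumes that "Carla cannot be in the same cage as the blue parakeet" also rules out Carla being blue herself; the function and variable names are hypothetical.

```python
from itertools import combinations, permutations

BIRDS = ["Alice", "Bob", "Carla", "Diwakar", "Etria"]
COLORS = ["white", "speckled", "green", "blue", "yellow"]


def cage_of_color(cage, color, wanted):
    """Return the cage holding the parakeet of the wanted color."""
    bird = next(b for b, c in color.items() if c == wanted)
    return cage[bird]


def valid_displays():
    """Yield (cage, color) assignments that satisfy every stated condition."""
    for top in combinations(BIRDS, 3):        # the top cage holds three birds
        cage = {b: ("top" if b in top else "bottom") for b in BIRDS}
        for perm in permutations(COLORS):     # each bird is a different color
            color = dict(zip(BIRDS, perm))
            if cage["Alice"] != "bottom":
                continue
            if cage["Bob"] != "top" or color["Bob"] == "speckled":
                continue
            # Assumption: this also excludes Carla being the blue parakeet.
            if cage["Carla"] == cage_of_color(cage, color, "blue"):
                continue
            if color["Etria"] != "green":
                continue
            if cage_of_color(cage, color, "green") != cage_of_color(cage, color, "speckled"):
                continue
            yield cage, color


OPTIONS = {
    "a": lambda cage, color: cage_of_color(cage, color, "green") == "bottom",
    "b": lambda cage, color: cage_of_color(cage, color, "speckled") == "bottom",
    "c": lambda cage, color: cage["Diwakar"] == "top",
    "d": lambda cage, color: cage["Diwakar"] == "bottom",
    "e": lambda cage, color: cage_of_color(cage, color, "blue") == "top",
}

# Restrict to the displays in which Carla is in the top cage.
carla_top = [(cg, cl) for cg, cl in valid_displays() if cg["Carla"] == "top"]
must_be_true = [
    key for key, holds in OPTIONS.items()
    if all(holds(cg, cl) for cg, cl in carla_top)
]
print(must_be_true)  # prints ['d']
```

Under that reading, four displays remain once Carla is placed in the top cage, and Diwakar is in the bottom cage in all of them.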
1) Note that the warrant requires a conjunction of data about the nature of Sue's performance and the nature of the performance situation.

2) A closer look at “data”: Must reason from unique work products and item materials, to aspects addressed in the general warrant.

C: Sue has a high value of Analytical Reasoning.
D1: Sue answered the Pet Shop item correctly.
    W1 (since): Correspondence of darkest mark and keyed response means correct answer.
    D11: Sue's marks on the answer sheet for Pet Shop item.
    D12: Answer key for the Pet Shop item.
D2: Logical structure and contents of Pet Shop item.
    W2 (since): Elements in schemas for valid AR items.
    D22: Particular content of Pet Shop item.

Multiple pieces of evidence of the same kind

C: Sue has a high value of Analytical Reasoning.
W (since): Students who are high on Analytical Reasoning tend to do well on logical puzzles that query relations that follow from explicit relations and constraints.
B (on account of): ...
A (unless): ...
R (supports A): ...
D11: Sue's answer to Item 1 ... D1n: Sue's answer to Item n
D21: structure and contents of Item 1 ... D2n: structure and contents of Item n

Multiple pieces of evidence of different kinds

C: Sue has a high value of Analytical Reasoning.
A0 (unless): ...
W1 (since): [[Warrant re logic puzzles]]
    A (unless): [[Alternatives re logic puzzles]]
    D11: Sue's answer to Item 1
    D12: Structure & content of Pet Shop item
...
Wn (since): [[Warrant re recommendations]]
    A (unless): [[Alternatives re recommendations]]
    Dn1: Teacher recommendation about Sue
    Dn2: Conditions of observation for recommendation

Statistical Modeling of Assessment Data

Claims in terms of values of unobservable variables in student model (SM); these characterize student knowledge.
Data modeled as depending probabilistically on SM vars.
Estimate conditional distributions of data given SM vars.
Bayes theorem to infer SM variables given data.

[Diagram: student-model variable θ with prior p(θ), and observable variables X1, X2, X3 with conditional distributions p(X1|θ), p(X2|θ), p(X3|θ).]

Behaviorist Perspective

The educational process consists of providing a series of environments that permit the student to learn new behaviors or modify or eliminate existing behaviors and to practice these behaviors to the point that he displays them at some reasonably satisfactory level of competence and regularity under appropriate circumstances. … The evaluation of the success of instruction and of the student’s learning becomes a matter of placing the student in a sample of situations in which the different learned behaviors may appropriately occur and noting the frequency and accuracy with which they do occur.
D.R. Krathwohl & D.A. Payne, 1971, pp. 17-18.

The warrant encompasses definitions of the class of stimulus situations, response classifications, and sampling theory.
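The sampling-theory part of this warrant, reasoning from r correct responses in n targeted situations to a true proportion p, can be sketched numerically. The presentation does not say which procedure is meant; the Wilson score interval used here is an assumed, commonly used choice, and the function name and the figures 16 of 20 are hypothetical.

```python
import math


def estimate_proportion(r: int, n: int, z: float = 1.96):
    """Point estimate and Wilson score interval for a true proportion p.

    r -- number of correct responses observed
    n -- number of targeted situations (items) sampled
    z -- normal quantile; 1.96 corresponds to roughly 95% coverage
    """
    if n <= 0 or not 0 <= r <= n:
        raise ValueError("need n > 0 and 0 <= r <= n")
    p_hat = r / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return p_hat, max(0.0, center - half), min(1.0, center + half)


# Hypothetical data: 16 of 20 borrowing problems answered correctly.
p_hat, low, high = estimate_proportion(16, 20)
print(f"observed proportion {p_hat:.2f}, interval ({low:.2f}, {high:.2f})")
```

The interval narrows as n grows, which is one way to see why the behaviorist argument depends on sampling enough situations from a well-defined class.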
C: Sue's probability of correctly answering a 2-digit subtraction problem with borrowing is p.
W (since): Sampling theory machinery for reasoning from observed proportion of r correct responses in n targeted situations, to true proportion p.
A (unless): [e.g., observational errors, data errors, misclassification of responses or performance situations, distractions, etc.]
D1j: Sue's answer to Item j
and
D2j: structure and contents of Item j

The claim addresses the expected value of performance of the targeted kind in the targeted situations.
The task data address the salient features of the stimulus situations (i.e., tasks).
The student data address the salient features of the responses.

The Information-Processing Perspective

Epitomized in Newell and Simon’s (1972) Human Problem Solving
Examines the procedures by which people acquire, store, and use knowledge to solve problems.
Modeling problem-solving in terms of the capabilities and the limitations of human thought and memory.
Importance of knowledge structures, relationships, procedures in learning domains.
Use of rules, production systems, task decompositions, and means-ends analyses.

Responses consistent with the "subtract smaller from larger" bug

Problem      Response
821 - 285    664
885 - 221    664
63 - 15      52
17 - 9       12

C: Sue's configuration of production rules for operating in the domain (knowledge and skill) is K.
W0 (since): Theory about how persons with configurations {K1,...,Km} would be likely to respond to items with different salient features.
Data for C are the class-level claims:
    C: Sue's probability of answering a Class 1 subtraction problem with borrowing is p1.
        W (since): Sampling theory for items with feature set defining Class 1.
        D11j: Sue's answer to Item j, Class 1
        D21j: structure and contents of Item j, Class 1
    ...
    C: Sue's probability of answering a Class n subtraction problem with borrowing is pn.
        W (since): Sampling theory for items with feature set defining Class n.
        D1nj: Sue's answer to Item j, Class n
        D2nj: structure and contents of Item j, Class n

Like behaviorist inference at level of behavior in classes of structurally similar tasks.
Patterns among behaviorist claims are data for inferences about unobservable production rules that govern behavior.
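The idea that an unobservable production rule governs observable responses can be made concrete with the "subtract smaller from larger" bug. The sketch below is an illustration rather than the author's model: the function name is hypothetical, and it assumes non-negative integers with the top number having at least as many digits as the bottom one. It reproduces the four responses listed on the bug slide.

```python
def subtract_smaller_from_larger(top: int, bottom: int) -> int:
    """Column subtraction with the bug: in every column the smaller digit
    is taken from the larger one, so borrowing never happens."""
    top_digits = str(top)
    # Pad the bottom number with leading zeros so the columns line up.
    bottom_digits = str(bottom).rjust(len(top_digits), "0")
    result = "".join(
        str(abs(int(t) - int(b))) for t, b in zip(top_digits, bottom_digits)
    )
    return int(result)


problems = [(821, 285), (885, 221), (63, 15), (17, 9)]
for top, bottom in problems:
    buggy = subtract_smaller_from_larger(top, bottom)
    print(f"{top} - {bottom}: bug gives {buggy}, correct answer is {top - bottom}")
```

The bug yields the correct answer on 885 - 221, which needs no borrowing, and wrong answers on the other three. That pattern across classes of items, rather than any single response, is what carries the evidence about the underlying rule.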
Assessing inquiry processes: Time dependencies in a troubleshooting task.

Past behavior & consequences become part of setting for next action.

C: Sue's level of troubleshooting skill with [...] is K.
W (since): [theory about strategies and procedures people at various levels of troubleshooting expertise tend to employ when iteratively solving problems in the domain.]
D1,t+1: Sue's actions at time t+1
D2,t: Context after time t
D1,t: Sue's actions at time t
D2,t-1: Context after time t-1
D1,t-1: Sue's actions at time t-1
D2,t-2: Context after time t-2
D1,t-2: Sue's actions at time t-2
...

The Sociocultural Perspective

Stresses how knowledge is conditioned and constrained by the technologies, information resources, representation systems, and social situations ... Incorporates explanatory concepts that have proved useful in fields such as ethnography and sociocultural psychology to study collaborative work, … mutual understanding in conversation, and other characteristics of interaction that are relevant to the functional success of the participants’ activities.
Greeno, Collins, & Resnick, 1997, p. 7.

AP Studio Art Portfolios

C: The level of performance for the Concentration section is K.
W0 (since): [Specification of general rubric to the goals and approach the student describes in the narrative]
B: General rubric
D1: Student's learning in the course of carrying out the concentration.
D2: Conditions under which the work was carried out.
D3j: Art piece j in the concentration.
Statements in narrative explaining the concentration, its influences, goals, etc. (tailors W0)

Claim concerns level of performance represented by unique project, in socially-determined general evaluation scheme.
Data from student are (1) works of art and (2) explanation of project goals, approach, rationale.
Student text helps assure performance conditions meet the requirements of the warrant.
Student text contributes to how raters apply general evaluation rubric to this student’s work.

Conversational Competence

C: Sue's level of conversational competence is K.
W (since): [theory about how people at various levels of conversational competence will behave in contexts with specified features]
D1,t+1: Sue's speech act at time t+1
D3,t+1: Interlocutor's (I's) speech act at time t+1
D2,t: Context after time t
D1,t: Sue's speech act at time t
D3,t: I's speech act at time t
D2,t-1: Context after time t-1
D1,t-1: Sue's speech act at time t-1
D3,t-1: I's speech act at time t-1
D2,t-2: Context after time t-2
D1,t-2: Sue's speech act at time t-2
D3,t-2: I's speech act at time t-2
...

Challenges:
1) Time dependencies.
2) Interlocutor’s behavior affects context; it is required by warrant for evidence about certain aspects of competence.
3) How constrained? Naturalistic vs. interviewer.

Conclusion

What changes?
Developments in psychology, technology, and social factors (e.g., accommodations) continually place demands on assessment that outstrip familiar forms.

What doesn’t change?
We want to draw inferences about what students know and can do as seen from some perspective; that perspective tells us what kinds of things we need to see them do, in what kinds of situations, to ground those inferences.

Conclusion

We see elaborations, extensions, and specializations of enduring principles of evidentiary reasoning.
We find continued value in tools such as Toulmin diagrams, Wigmore charts, and Bayesian inference networks to understand yesterday's assessments, manage today's, and design the assessments of tomorrow.
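As a small illustration of the Bayesian reasoning named above and on the statistical modeling slide, the sketch below applies Bayes theorem to a single discrete student-model variable with three observable responses. It is an addition, not the author's model: the two levels "low" and "high", every probability, and the function name are hypothetical, and the responses are assumed to be conditionally independent given the student-model variable.

```python
from typing import Dict, List


def posterior(prior: Dict[str, float],
              p_correct: Dict[str, List[float]],
              responses: List[int]) -> Dict[str, float]:
    """Bayes theorem for a discrete student-model variable theta.

    prior[k]        -- p(theta = k)
    p_correct[k][j] -- p(X_j = 1 | theta = k)
    responses[j]    -- observed X_j (1 = correct, 0 = incorrect)
    """
    unnormalized = {}
    for level, p_level in prior.items():
        likelihood = 1.0
        # Conditional independence: multiply the per-item probabilities.
        for p, x in zip(p_correct[level], responses):
            likelihood *= p if x == 1 else 1.0 - p
        unnormalized[level] = p_level * likelihood
    total = sum(unnormalized.values())
    return {level: value / total for level, value in unnormalized.items()}


# Hypothetical numbers for three items X1, X2, X3.
prior = {"low": 0.5, "high": 0.5}
p_correct = {"low": [0.2, 0.3, 0.25], "high": [0.8, 0.7, 0.75]}
post = posterior(prior, p_correct, responses=[1, 1, 0])
print(post)
```

With these made-up values, two correct responses and one incorrect response shift belief toward the "high" level without making it certain, which is the kind of qualified claim a Toulmin diagram expresses with "unless".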