Integrating Measurement and Sociocognitive Perspectives in Educational Assessment
Robert J. Mislevy
University of Maryland

Robert L. Linn Distinguished Address, sponsored by AERA Division D. Presented at the Annual Meeting of the American Educational Research Association, Denver, CO, May 1, 2010. This work was supported by a grant from the Spencer Foundation.

May 1, 2010 AERA 2010 Robert L. Linn Lecture Slide 1

Slide 2. Messick, 1994
[W]hat complex of knowledge, skills, or other attributes should be assessed... Next, what behaviors or performances should reveal those constructs, and what tasks or situations should elicit those behaviors?

Slide 3. Snow & Lohman, 1989
Summary test scores, and factors based on them, have often been thought of as “signs” indicating the presence of underlying, latent traits. … An alternative interpretation of test scores as samples of cognitive processes and contents … is equally justifiable and could be theoretically more useful.

Slide 4. Roadmap
• Rationale
• Model-based reasoning
• A sociocognitive perspective
• Assessment arguments
• Measurement models & concepts
• Why are these issues important?
• Conclusion

Slide 5. Rationale

Slide 6. Rationale
An articulated way to think about assessment:
• Understand task & use situations in “emic” sociocognitive terms.
• Identify the shift into “etic” terms in task-level assessment arguments.
• Examine the synthesis of evidence across tasks in terms of model-based reasoning.
• Reconceive measurement concepts.
• Draw implications for assessment practice.

Slide 7. Model-Based Reasoning

Slide 8. [Figure: model-based reasoning. Labels: Real-World Situation; Reconceived Real-World Situation; Entities and relationships (Measurement concepts); Representational Form A, y = ax + b, and Representational Form B, (y - b)/a = x (Measurement models); Mappings among representational systems.]

Slide 9. [Figure: the same diagram with two model levels. Labels: Entities and relationships in lower-level model (Sociocognitive concepts); Reconceived entities and relationships in higher-level model (Measurement concepts); Representational Forms A and B (Measurement models); Mappings among representational systems; Real-World Situation; Reconceived Real-World Situation.]

Slide 10. A Sociocognitive Perspective

Slide 11. Some Foundations
• Themes from, e.g., cognitive psychology, linguistics, neuroscience, anthropology:
  » Connectionist metaphor, associative memory, complex systems (variation, stability, attractors)
• Situated cognition & information processing
  » E.g., Kintsch’s Construction-Integration (CI) theory of comprehension; diSessa’s “knowledge in pieces”
• Intrapersonal & extrapersonal patterns

Slide 12. Some Foundations
• Extrapersonal patterns:
  » Linguistic: Grammar, conventions, constructions
  » Cultural models: What ‘being sick’ means, restaurant script, apology situations
  » Substantive: F = ma, genres, plumbing, etc.
• Intrapersonal resources:
  » Connectionist metaphor for learning
  » Patterns from experience at many levels

Slide 13. [Figure: persons A and B interacting. Labels: Inside A (not observable); A and B (observable); Inside B (not observable).]

Slide 14. [Figure: A and B, Inside A and Inside B, within a Context.]
À la Kintsch: Propositional content of text / speech … and internal and external aspects of context …

Slide 15. [Figure: as Slide 14.]
The C in CI theory is Construction: Activation of both relevant and irrelevant bits from LTM (long-term memory), past experience. All L/C/S (linguistic / cultural / substantive) levels involved. Example: Chemistry problems in German.
• If a pattern hasn’t been developed in past experience, it can’t be activated (although it may get constructed in the interaction).
• A relevant pattern from LTM may be activated in some contexts but not others (e.g., physics models).

Slide 16. [Figure: as Slide 14.]
The I in CI theory, Integration:
• Situation model: synthesis of coherent / reinforced activated L/C/S patterns

Slide 17. [Figure: as Slide 14.]
Situation model is also the basis of planning and action.

Slide 18. [Figure: A and B, Inside A and Inside B, with four Context labels.]

Slide 19. [Figure: as Slide 18.]
Ideally, activation of relevant and compatible intrapersonal patterns…

Slide 20. [Figure: as Slide 18.]
… to lead to (sufficiently) shared understanding; i.e., co-constructed meaning.
• Persons’ capabilities, situations, and performances are intertwined
• Meaning co-determined, through L/C/S patterns

Slide 21. What can we say about individuals?
Use of resources in appropriate contexts in appropriate ways; i.e., attunement to targeted L/C/S patterns:
• Recognize markers of externally-viewed patterns?
• Construct internal meanings in their light?
• Act in ways appropriate to targeted L/C/S patterns?
• What are the range and circumstances of activation? (variation of performance across contexts)

Slide 22. Assessment Arguments

Slide 23. Messick, 1994
[W]hat complex of knowledge, skills, or other attributes should be assessed... Next, what behaviors or performances should reveal those constructs, and what tasks or situations should elicit those behaviors?

Slide 24. Toulmin’s Argument Structure
[Figure: Toulmin diagram. Labels: Data; "so" Claim; "since" Warrant; Backing; "unless" Alternative explanation.]

Slide 25. [Figure: assessment argument in Toulmin form. Labels: Claim about student; "unless" Alternative explanations; "since" Warrant concerning assessment; "on account of" Backing concerning assessment situation; "so" Data concerning student performance ("since" Warrant concerning evaluation); Data concerning task situation ("since" Warrant concerning task design); Other information concerning student vis a vis assessment situation; Student acting in assessment situation.]
Annotations on the figure:
• Note the move from the emic to the etic!
• Choice in light of assessment purpose and conception of capabilities.
• Concerns features of (possibly evolving) context as seen from the view of the assessor; in particular, those seen as relevant to targets of inference.
• Evaluation of performance seeks evidence of attunement to features of targeted L/C/S patterns.
• Depends on contextual features implicitly, since evaluated in light of targeted L/C/S patterns.

Slide 26. [Figure: assessment argument, as Slide 25.]
Annotations on the figure:
• “Hidden” aspects of context, not in test theory model but essential to argument: What attunements to linguistic / cultural / substantive patterns can be presumed or arranged for among examinees, to condition inference re targeted L/C/S patterns?
• Fundamental to situated meaning of student variables in measurement models; both critical and implicit.

Slide 27. [Figure: assessment argument, as Slide 25, with a Time axis. Additional labels: Macro features of performance; Micro features of performance; Unfolding situated performance; Micro features of situation as it evolves; Macro features of situation.]
Annotations on the figure:
• Features of performance evaluated in light of emerging context.
• Features of context arise over time as student acts / interacts.
• Especially important in simulation, game, extended performance contexts (e.g., Shute)

Slides 28-29. [Figure: Design Argument (the assessment argument of Slide 25) linked to a Use Argument (Bachman). Use Argument labels: Claim about student in use situation; "unless" Alternative explanations; "since" Warrant concerning use situation; "on account of" Backing concerning use situation; Data concerning use situation; Other information concerning student vis a vis use situation.]

Slide 30. [Figure: Design Argument and Use Argument, as Slides 28-29.]
Annotations on the figure:
• Claim about student is output of the assessment argument, input to the use argument.
• How it is cast depends on psychological perspective and intended use.
• When measurement models are used, the claim is an etic synthesis of evidence, expressed as values of student-model variable(s).

Slides 31-33. [Figure: Design Argument and Use Argument, as Slides 28-29, repeated with no additional text.]

Slide 34. [Figure: Design Argument and Use Argument, as Slides 28-29.]
Annotations on the figure:
• Warrant for inference: Increased likelihood of activation in use situation if it was activated in task situations.
• Empirical question: Degrees of stability, ranges and conditions of variability (Chalhoub-Deville)
What features do tasks and use situations share?
• Implicit in trait arguments
• Explicit in sociocognitive arguments

Slide 35. [Figure: Design Argument and Use Argument, as on the preceding slides.]
What features do tasks and use situations not have in common?
• Use situation features call for other L/C/S patterns that weren’t in task and may or may not be in examinee’s resources.
• Target patterns activated in task but not use context.
• Target patterns activated in use but not in task context.
Issues of validity & generalizability, e.g., “method factors”
• Knowing about relation of target examinees and use situations strengthens inferences
• “bias for the best” (Swain, 1985)

Slide 36. Multiple Tasks
[Figure: Claim about student supported by n task-level arguments A1, A2, …, An, each with its own data concerning performance (Dp1, …, Dpn), data concerning the task situation (Ds1, …, Dsn), and other information (OI1, …, OIn).]
• Synthesize evidence from multiple tasks, in terms of proficiency variables in a measurement model
• Snow & Lohman’s sampling
• What accumulates? L/C/S patterns, but variation
• What is similar from analyst’s perspective need not be from examinee’s.

Slide 37. Measurement Models & Concepts
• AS IF: Tendencies for certain kinds of performance in certain kinds of situations expressed as student model variables θ.
• Probability models for individual performances (X) modeled as probabilistic functions of θ: variability.
• Probability models permit sophisticated reasoning about evidentiary relationships in complex and subtle situations, BUT they are models, with all the limitations implied!

Slide 38. Measurement Models & Concepts
• Xs result from particular persons calling upon resources in particular contexts (or not, or how)
• Mechanically, θs simply accumulate info across situations
• Our choosing situations and what to observe drives their situated meaning.
• Situated meaning of θs is tendencies toward these actions in these situations that call for certain interactional resources, via L/C/S patterns.

Slide 39. Classical Test Theory
[Figure: the multiple-task diagram of Slide 36, with the symbols τ and X added at the claim about the student.]
• Probability model: “true score” = stability along implied dimension, “error” = variation
• Situated meaning from task features & evaluation
• Can organize around traits, task features, or both, depending on task sets and performance features.
• Profile differences unaddressed

Slide 40. Item Response Theory
[Figure: the multiple-task diagram of Slide 36, with θ at the claim about the student and item responses X1, X2, …, Xn at the tasks.]
• θ = propensity to act in targeted way, b_j = typical evocation, IRT function = typical variation
• Situated meaning from task features & evaluation
• Task features still implicit
• Complex systems concepts:
  » Attractors & stability: regularities in response patterns, quantified in parameters
  » Typical variation: probability model
• Will work best when most nontargeted L/C/S patterns are familiar…
• Item-parameter invariance vs. population dependence
• Profile differences / misfit highlights where the narrative doesn’t fit, for sociocognitive reasons (Tatsuoka, Linn, Tatsuoka, & Yamamoto, 1988)

Slide 41. Multivariate Item Response Theory (MIRT)
• θs = propensities to act in targeted ways in situations with different mixes of L/C/S demands.
• Good for controlled mixes of situations

Slide 42. Structured Item Response Theory
[Figure: the multiple-task diagram with θ at the claim about the student, item responses X1, …, Xn, and additional task-level variables attached to each task argument; the remaining labels are not recoverable from the extracted text.]
• Explicitly model task situations in terms of L/C/S demands.
• Links TD with sociocognitive view.
• Work explicitly with features in controlled and evolved situations (design / agents)
• Can use with MIRT; cognitive diagnosis models

Slide 43. Mixtures of IRT Models
[Figure: two alternative multiple-task structures joined by OR, each with its own θ, item responses X1, …, Xn, and task arguments A1, …, An.]
• Different IRT models for different unobserved groups of people
• Modeling different attractor states
• Can be theory driven or discovered in data

Slide 44. Measurement Concepts
• Validity
  » Soundness of model for local inferences
  » Breadth of scope is an empirical question
  » Construct representation in L/C/S terms
  » Construct irrelevant sources of variation in L/C/S terms
• Reliability
  » Through model, strength of evidence for inferences about tendencies, given variabilities … or about characterizations of variability.

Slide 45. Measurement Concepts
• Method Effects
  » What accumulates in terms of L/C/S patterns in assessment situations but not use situations
• Generalizability Theory (Cronbach)
  » Watershed in emphasizing evidentiary reasoning rather than simply measurement
  » Focus on external features of context; can be recast in L/C/S terms, & attend to correlates of variability

Slide 46. Why are these issues important?
• Connect assessment/measurement with current psychological research
  » Connect assessment with learning
• Appropriate constraints on interpreting large scale assessments
• Inference in complex assessments
  » Games, simulations, performances
  » Assessment modifications & accommodations
  » Individualized yet comparable assessments

Slide 47. Conclusion
• Communication at the interface
• We have work we need to do, together.
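
The IRT reading given on Slide 40 (a student-model variable as a propensity to act in the targeted way, an item parameter as what a task typically evokes, and the IRT function as typical variation) can be illustrated with a small sketch. This is only an illustration under stated assumptions: the slides do not name a specific IRT function, so the Rasch (one-parameter logistic) model is assumed here, and the function names, item difficulties, and response patterns are hypothetical.

```python
import math


def rasch_probability(theta: float, b: float) -> float:
    """P(X = 1 | theta, b) under the Rasch model (an assumed IRT function)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def log_likelihood(theta: float, responses: list[int], difficulties: list[float]) -> float:
    """Log-likelihood of a 0/1 response pattern, assuming local independence."""
    total = 0.0
    for x, b in zip(responses, difficulties):
        p = rasch_probability(theta, b)
        total += math.log(p) if x == 1 else math.log(1.0 - p)
    return total


def estimate_theta(responses: list[int], difficulties: list[float],
                   low: float = -4.0, high: float = 4.0, steps: int = 801) -> float:
    """Maximum-likelihood estimate of theta by a simple grid search."""
    grid = [low + i * (high - low) / (steps - 1) for i in range(steps)]
    return max(grid, key=lambda t: log_likelihood(t, responses, difficulties))


# Hypothetical item difficulties (easy, medium, hard) and two response
# patterns with the same raw score of 2.
difficulties = [-1.0, 0.0, 1.0]
expected_pattern = [1, 1, 0]    # misses only the hardest item
unexpected_pattern = [0, 1, 1]  # misses only the easiest item

theta_expected = estimate_theta(expected_pattern, difficulties)
theta_unexpected = estimate_theta(unexpected_pattern, difficulties)

# Same estimate, but the unexpected pattern is much less likely under the model.
ll_expected = log_likelihood(theta_expected, expected_pattern, difficulties)
ll_unexpected = log_likelihood(theta_unexpected, unexpected_pattern, difficulties)
```

Under the assumed Rasch model the two patterns yield essentially the same estimate, because only the raw score enters the estimate. The lower likelihood of the second pattern is the kind of profile difference or misfit that Slide 40 describes as showing where the narrative doesn't fit; the model itself says nothing about the sociocognitive reasons behind it.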