Individual Differences Marsha C. Lovett Carnegie Mellon University Differences Beyond Mere Noise Between-subject differences: systematic, reliable – Architectural (processing) differences e.g., processing speed, working memory capacity, decay – Knowledge-based differences • Knowledge contents (e.g., facts, strategies, etc.) • Same content, but differences in experience/practice e.g., different trial sequences, different real-world experiences – Representational differences • Features represented, knowledge structures Differences Beyond Mere Noise Within-subject differences: temporal, subtle – Knowledge/experience grows (learning) – Processing parameters change (e.g., fatigue) – Representation changes (insight) Why model Ind Diffs in ACT-R? Additional constraints on model/theory – Models can fit the averaged data and yet fail to fit individuals’ (or subbroups’) data (cf. Siegler, 1987) 0.6 0.5 J Global Average G Human Average JG 0.4 G J J G JG 0.3 0.7 G J G J J G 0.2 0.1 0 1 2 3 4 Block 5 6 Global Average Unaware Aware Human J 7 Proportion StrategyA Proportion StrategyA 0.7 0.6 0.5 G JG 0.4 G J J G JG 0.3 G J G J J G 0.2 0.1 0 1 2 3 4 Block 5 6 7 Why model Ind Diffs in ACT-R? Predicted variability often lower than observed 100 % Correct 80 60 40 20 0 3 4 5 6 Memory Size (from Lovett, Reder, & Lebiere, 1997) Why model Ind Diffs in ACT-R? ACT-R is a nonlinear system variation in input can have surprising consequences FIXED CASE fixed input W1 fixed input W1 Linear System f(W) = aW Nonlinear System g(W) = eW fixed output a1 = a fixed output e1 = e Why model Ind Diffs in ACT-R? VARYING CASE variation in input W~ Normal(1,s2) Linear System f(W) = aW variability in output Normal(a,as2) mean same as fixed variation in input W~ Normal(1,s2) Nonlinear System g(W) = eW variability in output 2 Normal(e1+.5s ,?#@!) change in mean! Why model Ind Diffs in ACT-R? • Adding Ind Diffs changes variability and mean Memory Size • This affects fitting of other global parameters Especially within unified theories • Unified theories like ACT-R use single set of mechanisms to capture data across tasks • Modeling Ind Diffs within ACT-R – Allows modeling of an individual across tasks – Allows testing of individual difference theories across tasks How to model Ind Diffs in ACT-R? • Architectural (processing) differences – Global parameters: G (motivation), W (WM), P/M … • Knowledge differences — content vs experience – Symbolic: Different sets of productions, chunks – Subsymbolic: Production utilities, chunk activations • Representational differences (cf. Lovett & Schunn, 1999) – Chunk types, production conditions, proc vs. decl ACT-R Ind Diff Models Architectural Parameters Varied Ind’l Subject Subgroup Param G egs d W W* W Task (Modeler) Strat choice in BST (Schunn) KA-ATC (Taatgen) Scheduling (Taatgen, Jongman) WM tasks (Lovett, Reder,Lebiere) List memory(Reder, Schunn) Exp’tal Design (Schunn) Strat choice in memory (Reder, Schunn) Scaling (Petrov) Device operation (Byrne) Digit symbol (Byrne) ACT-R Ind Diff Models Symbolic Knowledge Varied Knowledge productions productions productions Task (Modeler) User interface (Gray) 2-col subtraction (Young) Seriation length ( Young) productions chunks Exp’tal Design (Schunn) Ind’l Subject Subgroup ACT-R Ind Diff Models Other combinations varied ? varied egs p-utils knwldg wm Task (Modeler) Analogy in prob solving (Salvucci) Unix tutor, piloting (Doane) Ind’l Subject Subgroup p-utils Early algebra (Koedinger) rt egs p-conds TON (Jones, Ritter) Example: Arch’l Param Varied • WM capacity’s effect on performance – Model individuals in a WM task called MODS – Take MODS model and vary W parameter (Lovett, Reder, & Lebiere, 1999) Example cont’d • Can fit individual data at even finer level – Serial position effect (w/ W fit from set size effect) Subject 221 Subject 201 Proportion Correct W = 0.7 1.0 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0.0 W = 0.9 E B B B B E E 1 2 B E B E E 3 4 5 Serial Position 6 Subject 211 Proportion Correct B E B E E 1 2 B Data E Mode l B E B B E E B 3 4 5 Serial Position 6 Subject 203 W = 1.0 1.0 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0.0 1.0 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0.0 W = 1.1 E B E B B E 1 E B 2 E B E B 3 4 5 Serial Position 6 1.0 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0.0 B E B E E E E B B E B B 1 2 3 4 5 Serial Position 6 Example cont’d • Can other params account for ind’l patterns? W varying d varying Subject 221 W = 0.7 W = 0.9 d = 0.8 1.0 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0.0 1.0 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0.0 E B B B B E E 1 2 B E B E E 3 4 5 Serial Position 6 B E B E B B E E 1 2 B E E B 3 4 5 Serial Position Proportion Correct Subject 201 Proportion Correct Subject 221 1.0 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0.0 6 d = 0.6 E E B 1 1 2 3 4 5 Serial Position 6 3 4 5 Serial Position Data E Model 1.0 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0.0 B E B E E E E B B E B B 1 2 3 4 5 Serial Position 6 1.0 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0.0 E B E B E 1 6 B Proportion Correct Proportion Correct E B 2 Mode l d = 0.7 E B B Data W = 1.1 E B B B E Subject 203 B E E 1.0 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0.0 B W = 1.0 E B B B E Subject 211 E B E E Subject 226 1.0 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0.0 Subject 201 B E B E E B B 2 3 4 5 Serial Position 6 E E B Subject 203 d = 0.2 E B E B E E B E B B 1 2 E B 3 4 5 Serial Position 6 (Daily, Lovett, & Reder, 2001) 1.0 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0.0 E B E B E E B B B 1 2 3 4 5 Serial Position 6 Example cont’d • Cross-task predictions w/ no new parameters! – Subjects performed MODS & N-back tasks – W’s estimated from MODS to predict N-back (Lovett, Daily, & Reder, 2000) Summary: Architectural Ind Diffs • Vary global parameter(s) to represent Ind Diffs • Parameter values take on meaning – – – – Predict other measures within task Predict performance on other tasks Relate to other empirical measures Change across time • Can compare different theories of Ind Diffs Ex: Symbolic Knowledge Varied • Scientific Discovery in Microworld (Schunn & Anderson, 1998) – Task: reveal “truth” behind data by conducting experiments and interpreting data tables – Large performance differences in experiment designs and interpretation – Subgroups modeled with different sets of procedural & declarative knowledge Issues in Symbolic Knowledge Diffs • Varying procedural/declarative knowledge – Consider elements as qualitative parameters • Model is set of elements drawn/not from fixed set – Use other constraints to winnow possible model versions from the power set • Developmental progression • Learnable via symbolic learning mechanisms Ex: Subsymbolic Knowledge Diffs • Interface use (Gray et al., 2001) – Model runs get same experience as subjects – Use same priors for chunk activations and production utilities, but different experience across model runs leads outputs to diverge • Early algebra (Koedinger & Maclaren, 2001) – Vary priors for production utilities to account for subjects’ different pre-experimental experience Ex: Representational Ind Diffs • Tower of Nottingham (Jones, Ritter, Wood, 2000) – Goal: Capture developmental differences by implementing developmental theories in ACT-R – Besides global parameters (rt, egn), vary # of conditions in productions’ left hand sides • Analogical Problem Solving (Salvucci, 1998) – Goal: Capture wide variation in eye-fixation strategies when subjects refer to source problem – Besides varying parameters (egn, prod utilities), built decl-based and proc-based models Ind Diffs Model Fitting • Many Ind Diff parameters – Fit IndDiff parameters for each subject NPbig • Few Ind Diff (esp 1) parameters – Fit global params while IndDiff param(s) are drawn from distribution (i.e., not fixed) – Fit IndDiff param(s) for each subject G+NPsmall • Hierarchical modeling: best of both! – Fit global and IndDiff params together – IndDiff param-values from distribution NsmallP Concluding Bold Statement All ACT-R models should be Ind Diffs Models – – – – You have the data (simply omit averaging step) It’s just a few more fitting cycles (see prev slide) Avoids perils of averaging over subjects Increases model variability (closer to observed) – Default parameter settings would become distributions, not fixed values More on Why: Guess the Authors Computational models need to be able to account for both the commonalty across individuals’ processing as well as the variation between individuals’ performance. Cognitive models should be developed to predict the performance of individual participants across tasks and along multiple dimensions. Ideally, such a modeling effort would be able to predict individuals’ performance in a new task with no new free parameters, presumably after deriving an estimate of each individual’s processing parameters from previous modeling of other tasks. (Lovett, Daily, & Reder, 2001) A way to keep the multiple-constraint advantage offered by unified theories of cognition while making their development tractable is to do Individual Data Modelling (IDM). That is, to gather a large number or empircal/experimental observations on a single subject (or a few subjects analysed individually) using a variety of tasks that exercise multiple abilities (e.g., perception, memory, problem solving), and then to use these data to develop a detailed computational model of the subject that is able to learn while performing the tasks. (Gobet & Ritter, 2000) Example cont’d • Sensitivity analysis: do other params manage? W varying Subject 201 W = 0.7 W = 0.9 1.0 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0.0 1.0 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0.0 E B B B B E E 1 2 B E B E E 3 4 5 Serial Position 6 rt varying Subject 221 B E B E B B E E 1 2 B E E B 3 4 5 Serial Position 6 1.0 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0.0 RT = 0.694 E B B B 1 1 2 3 4 5 Serial Position 6 1 6 B Data E Model 1.0 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0.0 B E B E E E E B B E B B 1 2 3 4 5 Serial Position 6 Proportion Correc t Proportion Correct E B 3 4 5 Serial Position B Mode l RT = 0.194 E B E E E Data Subject 204 E B 2 B E E W = 1.1 B E B B E B W = 1.0 E B E B E Subject 203 E B 1.0 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0.0 E Subject 211 1.0 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0.0 Subject 201 RT = 1.194 Proportion Correc t Proportion Correct Subject 221 1.0 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0.0 E B B B E 2 E B 3 4 5 Serial Position 6 E B B E 3 4 5 Serial Position 6 Subject 203 RT = -0.056 E B E B E B E 1 2 B E E B B 3 4 5 Serial Position (Daily, Lovett, & Reder, 2001) 6 1.0 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0.0 E B B E B E B E 1 2