ppt

advertisement
Individual Differences
Marsha C. Lovett
Carnegie Mellon University
Differences Beyond Mere Noise
Between-subject differences: systematic, reliable
– Architectural (processing) differences
e.g., processing speed, working memory capacity, decay
– Knowledge-based differences
• Knowledge contents (e.g., facts, strategies, etc.)
• Same content, but differences in experience/practice
e.g., different trial sequences, different real-world experiences
– Representational differences
• Features represented, knowledge structures
Differences Beyond Mere Noise
Within-subject differences: temporal, subtle
– Knowledge/experience grows (learning)
– Processing parameters change (e.g., fatigue)
– Representation changes (insight)
Why model Ind Diffs in ACT-R?
Additional constraints on model/theory
– Models can fit the averaged data and yet fail to fit
individuals’ (or subbroups’) data (cf. Siegler, 1987)
0.6
0.5
J
Global Average
G
Human Average
JG
0.4
G
J
J
G
JG
0.3
0.7
G
J
G
J
J
G
0.2
0.1
0
1
2
3
4
Block
5
6
Global Average
Unaware
Aware
Human
J
7
Proportion StrategyA
Proportion StrategyA
0.7
0.6
0.5
G
JG
0.4
G
J
J
G
JG
0.3
G
J
G
J
J
G
0.2
0.1
0
1
2
3
4
Block
5
6
7
Why model Ind Diffs in ACT-R?
Predicted variability often lower than observed
100
% Correct
80
60
40
20
0
3
4
5
6
Memory Size
(from Lovett, Reder, & Lebiere, 1997)
Why model Ind Diffs in ACT-R?
ACT-R is a nonlinear system  variation in
input can have surprising consequences
FIXED CASE
fixed input
W1
fixed input
W1
Linear System
f(W) = aW
Nonlinear System
g(W) = eW
fixed output
a1 = a
fixed output
e1 = e
Why model Ind Diffs in ACT-R?
VARYING CASE
variation in input
W~
Normal(1,s2)
Linear System
f(W) = aW
variability in output
Normal(a,as2)
mean same as fixed
variation in input
W~
Normal(1,s2)
Nonlinear System
g(W) = eW
variability in output
2
Normal(e1+.5s ,?#@!)
change in mean!
Why model Ind Diffs in ACT-R?
• Adding Ind Diffs changes variability and mean
Memory Size
• This affects fitting of other global parameters
Especially within unified theories
• Unified theories like ACT-R use single set of
mechanisms to capture data across tasks
• Modeling Ind Diffs within ACT-R
– Allows modeling of an individual across tasks
– Allows testing of individual difference theories
across tasks
How to model Ind Diffs in ACT-R?
• Architectural (processing) differences
– Global parameters: G (motivation), W (WM), P/M …
• Knowledge differences — content vs experience
– Symbolic: Different sets of productions, chunks
– Subsymbolic: Production utilities, chunk activations
• Representational differences (cf. Lovett & Schunn, 1999)
– Chunk types, production conditions, proc vs. decl
ACT-R Ind Diff Models
Architectural Parameters Varied
Ind’l
Subject
Subgroup
Param
G egs d
W
W*
W
Task (Modeler)
Strat choice in BST (Schunn)
KA-ATC (Taatgen)
Scheduling (Taatgen, Jongman)
WM tasks (Lovett, Reder,Lebiere)
List memory(Reder, Schunn)
Exp’tal Design (Schunn)
Strat choice in memory (Reder, Schunn)
Scaling (Petrov)
Device operation (Byrne)
Digit symbol (Byrne)
ACT-R Ind Diff Models
Symbolic Knowledge Varied
Knowledge
productions
productions
productions
Task (Modeler)
User interface (Gray)
2-col subtraction (Young)
Seriation length ( Young)
productions
chunks
Exp’tal Design (Schunn)
Ind’l
Subject
Subgroup
ACT-R Ind Diff Models
Other combinations varied
? varied
egs p-utils
knwldg wm
Task (Modeler)
Analogy in prob solving (Salvucci)
Unix tutor, piloting (Doane)
Ind’l
Subject
Subgroup
p-utils
Early algebra (Koedinger)
rt egs p-conds TON (Jones, Ritter)
Example: Arch’l Param Varied
• WM capacity’s effect on performance
– Model individuals in a WM task called MODS
– Take MODS model and vary W parameter
(Lovett, Reder, & Lebiere, 1999)
Example cont’d
• Can fit individual data at even finer level
– Serial position effect (w/ W fit from set size effect)
Subject 221
Subject 201
Proportion Correct
W = 0.7
1.0
0.9
0.8
0.7
0.6
0.5
0.4
0.3
0.2
0.1
0.0
W = 0.9
E
B
B
B
B
E
E
1
2
B
E
B
E
E
3
4
5
Serial Position
6
Subject 211
Proportion Correct
B
E
B
E
E
1
2
B
Data
E
Mode l
B
E
B
B
E
E
B
3
4
5
Serial Position
6
Subject 203
W = 1.0
1.0
0.9
0.8
0.7
0.6
0.5
0.4
0.3
0.2
0.1
0.0
1.0
0.9
0.8
0.7
0.6
0.5
0.4
0.3
0.2
0.1
0.0
W = 1.1
E
B
E
B
B
E
1
E
B
2
E
B
E
B
3
4
5
Serial Position
6
1.0
0.9
0.8
0.7
0.6
0.5
0.4
0.3
0.2
0.1
0.0
B
E
B
E
E
E
E
B
B
E
B
B
1
2
3
4
5
Serial Position
6
Example cont’d
• Can other params account for ind’l patterns?
W varying
d varying
Subject 221
W = 0.7
W = 0.9
d = 0.8
1.0
0.9
0.8
0.7
0.6
0.5
0.4
0.3
0.2
0.1
0.0
1.0
0.9
0.8
0.7
0.6
0.5
0.4
0.3
0.2
0.1
0.0
E
B
B
B
B
E
E
1
2
B
E
B
E
E
3
4
5
Serial Position
6
B
E
B
E
B
B
E
E
1
2
B
E
E
B
3
4
5
Serial Position
Proportion Correct
Subject 201
Proportion Correct
Subject 221
1.0
0.9
0.8
0.7
0.6
0.5
0.4
0.3
0.2
0.1
0.0
6
d = 0.6
E
E
B
1
1
2
3
4
5
Serial Position
6
3
4
5
Serial Position
Data
E
Model
1.0
0.9
0.8
0.7
0.6
0.5
0.4
0.3
0.2
0.1
0.0
B
E
B
E
E
E
E
B
B
E
B
B
1
2
3
4
5
Serial Position
6
1.0
0.9
0.8
0.7
0.6
0.5
0.4
0.3
0.2
0.1
0.0
E
B
E
B
E
1
6
B
Proportion Correct
Proportion Correct
E
B
2
Mode l
d = 0.7
E
B
B
Data
W = 1.1
E
B
B
B
E
Subject 203
B
E
E
1.0
0.9
0.8
0.7
0.6
0.5
0.4
0.3
0.2
0.1
0.0
B
W = 1.0
E
B
B
B
E
Subject 211
E
B
E
E
Subject 226
1.0
0.9
0.8
0.7
0.6
0.5
0.4
0.3
0.2
0.1
0.0
Subject 201
B
E
B
E
E
B
B
2
3
4
5
Serial Position
6
E
E
B
Subject 203
d = 0.2
E
B
E
B
E
E
B
E
B
B
1
2
E
B
3
4
5
Serial Position
6
(Daily, Lovett, & Reder, 2001)
1.0
0.9
0.8
0.7
0.6
0.5
0.4
0.3
0.2
0.1
0.0
E
B
E
B
E
E
B
B
B
1
2
3
4
5
Serial Position
6
Example cont’d
• Cross-task predictions w/ no new parameters!
– Subjects performed MODS & N-back tasks
– W’s estimated from MODS to predict N-back
(Lovett, Daily, & Reder, 2000)
Summary: Architectural Ind Diffs
• Vary global parameter(s) to represent Ind Diffs
• Parameter values take on meaning
–
–
–
–
Predict other measures within task
Predict performance on other tasks
Relate to other empirical measures
Change across time
• Can compare different theories of Ind Diffs
Ex: Symbolic Knowledge Varied
• Scientific Discovery in  Microworld
(Schunn & Anderson, 1998)
– Task: reveal “truth” behind data by conducting
experiments and interpreting data tables
– Large performance differences in experiment
designs and interpretation
– Subgroups modeled with different sets of
procedural & declarative knowledge
Issues in Symbolic Knowledge Diffs
• Varying procedural/declarative knowledge
– Consider elements as qualitative parameters
• Model is set of elements drawn/not from fixed set
– Use other constraints to winnow possible model
versions from the power set
• Developmental progression
• Learnable via symbolic learning mechanisms
Ex: Subsymbolic Knowledge Diffs
• Interface use (Gray et al., 2001)
– Model runs get same experience as subjects
– Use same priors for chunk activations and
production utilities, but different experience across
model runs leads outputs to diverge
• Early algebra (Koedinger & Maclaren, 2001)
– Vary priors for production utilities to account for
subjects’ different pre-experimental experience
Ex: Representational Ind Diffs
• Tower of Nottingham (Jones, Ritter, Wood, 2000)
– Goal: Capture developmental differences by
implementing developmental theories in ACT-R
– Besides global parameters (rt, egn), vary # of
conditions in productions’ left hand sides
• Analogical Problem Solving (Salvucci, 1998)
– Goal: Capture wide variation in eye-fixation
strategies when subjects refer to source problem
– Besides varying parameters (egn, prod utilities),
built decl-based and proc-based models
Ind Diffs Model Fitting
• Many Ind Diff parameters
– Fit IndDiff parameters for each subject  NPbig
• Few Ind Diff (esp 1) parameters
– Fit global params while IndDiff param(s) are drawn
from distribution (i.e., not fixed)
– Fit IndDiff param(s) for each subject  G+NPsmall
• Hierarchical modeling: best of both!
– Fit global and IndDiff params together
– IndDiff param-values from distribution  NsmallP
Concluding Bold Statement
All ACT-R models should be Ind Diffs Models
–
–
–
–
You have the data (simply omit averaging step)
It’s just a few more fitting cycles (see prev slide)
Avoids perils of averaging over subjects
Increases model variability (closer to observed)
– Default parameter settings would become
distributions, not fixed values
More on Why: Guess the Authors
Computational models need to be able to account for both the commonalty across
individuals’ processing as well as the variation between individuals’ performance.
Cognitive models should be developed to predict the performance of individual
participants across tasks and along multiple dimensions. Ideally, such a modeling effort
would be able to predict individuals’ performance in a new task with no new free
parameters, presumably after deriving an estimate of each individual’s processing
parameters from previous modeling of other tasks. (Lovett, Daily, & Reder, 2001)
A way to keep the multiple-constraint advantage offered by unified theories of cognition
while making their development tractable is to do Individual Data Modelling (IDM).
That is, to gather a large number or empircal/experimental observations on a single
subject (or a few subjects analysed individually) using a variety of tasks that exercise
multiple abilities (e.g., perception, memory, problem solving), and then to use these data
to develop a detailed computational model of the subject that is able to learn while
performing the tasks. (Gobet & Ritter, 2000)
Example cont’d
• Sensitivity analysis: do other params manage?
W varying
Subject 201
W = 0.7
W = 0.9
1.0
0.9
0.8
0.7
0.6
0.5
0.4
0.3
0.2
0.1
0.0
1.0
0.9
0.8
0.7
0.6
0.5
0.4
0.3
0.2
0.1
0.0
E
B
B
B
B
E
E
1
2
B
E
B
E
E
3
4
5
Serial Position
6
rt varying
Subject 221
B
E
B
E
B
B
E
E
1
2
B
E
E
B
3
4
5
Serial Position
6
1.0
0.9
0.8
0.7
0.6
0.5
0.4
0.3
0.2
0.1
0.0
RT = 0.694
E
B
B
B
1
1
2
3
4
5
Serial Position
6
1
6
B
Data
E
Model
1.0
0.9
0.8
0.7
0.6
0.5
0.4
0.3
0.2
0.1
0.0
B
E
B
E
E
E
E
B
B
E
B
B
1
2
3
4
5
Serial Position
6
Proportion Correc t
Proportion Correct
E
B
3
4
5
Serial Position
B
Mode l
RT = 0.194
E
B
E
E
E
Data
Subject 204
E
B
2
B
E
E
W = 1.1
B
E
B
B
E
B
W = 1.0
E
B
E
B
E
Subject 203
E
B
1.0
0.9
0.8
0.7
0.6
0.5
0.4
0.3
0.2
0.1
0.0
E
Subject 211
1.0
0.9
0.8
0.7
0.6
0.5
0.4
0.3
0.2
0.1
0.0
Subject 201
RT = 1.194
Proportion Correc t
Proportion Correct
Subject 221
1.0
0.9
0.8
0.7
0.6
0.5
0.4
0.3
0.2
0.1
0.0
E
B
B
B
E
2
E
B
3
4
5
Serial Position
6
E
B
B
E
3
4
5
Serial Position
6
Subject 203
RT = -0.056
E
B
E
B
E
B
E
1
2
B
E
E
B
B
3
4
5
Serial Position
(Daily, Lovett, & Reder, 2001)
6
1.0
0.9
0.8
0.7
0.6
0.5
0.4
0.3
0.2
0.1
0.0
E
B
B
E
B
E
B
E
1
2
Download