Bayesian Knowledge Tracing Prediction Models

Bayesian Knowledge Tracing and Other Predictive
Models in Educational Data Mining
Zachary A. Pardos
PSLC Summer School 2011
Zach Pardos
Bayesian Knowledge Tracing & Other Models
PSLC Summer School 2011
Outline of Talk
• Introduction to Knowledge Tracing
  – History
  – Intuition
  – Model
  – Demo
  – Variations (and other models)
  – Evaluations (Baker work / KDD)
• Random Forests
  – Description
  – Evaluations (KDD)
• Time left?
  – Vote on next topic
Intro to Knowledge Tracing
History
• Introduced in 1995 (Corbett & Anderson,
UMUAI)
• Based on the ACT-R theory of skill knowledge
(Anderson, 1993)
• Computations based on a variation of
Bayesian calculations proposed in 1972
(Atkinson)
Intro to Knowledge Tracing
Intuition
• Based on the idea that practice on a skill leads
to mastery of that skill
• Has four parameters used to describe student
performance
• Relies on a KC model
• Tracks student knowledge over time
Intro to Knowledge Tracing
For some skill K:
Given a student's response sequence 1 to n, predict n+1

Chronological response sequence for student Y:
Opportunity: 1 … n, n+1
Responses:   0 0 1 0 … 1 1 ?
[0 = incorrect response, 1 = correct response]
Intro to Knowledge Tracing
Track knowledge over time (model of learning)
[Figure: knowledge estimate plotted over the response sequence 0 0 1 0 1 1 1]
Intro to Knowledge Tracing
Knowledge Tracing (KT) can be represented as a simple HMM

[Figure: a chain of latent K nodes with prior P(L0) and transitions P(T), each emitting an observed Q node through P(G)/P(S)]

Node representations:
K = Knowledge node (latent)
Q = Question node (observed)
Node states:
K = two state (0 or 1)
Q = two state (0 or 1)
Intro to Knowledge Tracing
Four parameters of the KT model:
P(L0) = Probability of initial knowledge
P(T) = Probability of learning
P(G) = Probability of guess
P(S) = Probability of slip

[Figure: the KT HMM with P(L0) on the first K node, P(T) on the K→K transitions, and P(G)/P(S) on the K→Q emissions]

Probability of forgetting assumed to be zero (fixed)
Intro to Knowledge Tracing
Formulas for inference and prediction

If Correct_n:
P(L_{n-1} | Correct_n) = [P(L_{n-1}) · (1 − P(S))] / [P(L_{n-1}) · (1 − P(S)) + (1 − P(L_{n-1})) · P(G)]   (1)

If Incorrect_n:
P(L_{n-1} | Incorrect_n) = [P(L_{n-1}) · P(S)] / [P(L_{n-1}) · P(S) + (1 − P(L_{n-1})) · (1 − P(G))]   (2)

P(L_n) = P(L_{n-1}) · (1 − P(F)) + (1 − P(L_{n-1})) · P(T)   (3)
(where P(L_{n-1}) in (3) is the posterior from (1) or (2), and P(F) = 0)
• Derivation: Reye (JAIED 2004)
• The formulas use Bayes' theorem to make inferences about the latent variable
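The three updates are straightforward to state in code. A minimal sketch in plain Python (the original analysis used MATLAB; function names are mine, not from the talk):

```python
# A minimal sketch of equations (1)-(3), with forgetting P(F) fixed at 0.
def posterior_given_obs(L, correct, G, S):
    """P(L_{n-1} | evidence): Bayes-rule update, Eq. (1) and (2)."""
    if correct:
        return L * (1 - S) / (L * (1 - S) + (1 - L) * G)
    return L * S / (L * S + (1 - L) * (1 - G))

def advance_knowledge(L_post, T, F=0.0):
    """P(L_n): knowledge after an opportunity to learn, Eq. (3)."""
    return L_post * (1 - F) + (1 - L_post) * T

def predict_correct(L, G, S):
    """Probability that the next response is correct."""
    return L * (1 - S) + (1 - L) * G
```

With L = 0.5, G = 0.14, S = 0.09, a correct response raises the knowledge estimate from 0.50 to about 0.87 before the learning step is applied.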
Intro to Knowledge Tracing
Model Training
Model training step: the values of the parameters P(L0), P(T), P(G) & P(S) are used to predict student responses
• Ad-hoc values could be used but will likely not be the best fitting
• Goal: find a set of values for the parameters that minimizes prediction error

Student A: 0 0 1 0 1 1
Student B: 1 1 0 1 1 1
Student C: 0 0 1 0 0 0
Intro to Knowledge Tracing
Model Prediction
Model prediction step – Skill: Subtraction
Student's last three responses to Subtraction questions (in the unit): 0 1 1
P(K): 10% → 45% → 75% → 79% → 83%
P(Q) for the test set questions: 71%, 74%
Intro to Knowledge Tracing
Influence of parameter values
Estimate of knowledge for a student with response sequence: 0 1 1 1 1 1 1 1 1 1
The student reached 95% probability of knowledge after the 4th opportunity
P(L0): 0.50  P(T): 0.20  P(G): 0.14  P(S): 0.09
Intro to Knowledge Tracing
Influence of parameter values
Estimate of knowledge for a student with response sequence: 0 1 1 1 1 1 1 1 1 1
With a higher guess rate, the student reached 95% probability of knowledge only after the 8th opportunity
P(L0): 0.50  P(T): 0.20  P(G): 0.14  P(S): 0.09
P(L0): 0.50  P(T): 0.20  P(G): 0.64  P(S): 0.03
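The effect is easy to reproduce. A sketch (plain Python; names are mine) that tracks P(L_n) over the sequence under both parameter sets, showing that the high-guess set crosses the 95% mastery threshold later:

```python
# Track P(L_n) for the sequence 0 1 1 1 1 1 1 1 1 1 under both parameter
# sets above; a high guess rate makes correct answers weaker evidence of
# knowledge, delaying the 95% crossing.
def knowledge_trajectory(responses, L0, T, G, S):
    L, traj = L0, []
    for correct in responses:
        if correct:
            L = L * (1 - S) / (L * (1 - S) + (1 - L) * G)
        else:
            L = L * S / (L * S + (1 - L) * (1 - G))
        L = L + (1 - L) * T          # learning step, no forgetting
        traj.append(L)
    return traj

seq = [0, 1, 1, 1, 1, 1, 1, 1, 1, 1]
low_guess = knowledge_trajectory(seq, 0.50, 0.20, G=0.14, S=0.09)
high_guess = knowledge_trajectory(seq, 0.50, 0.20, G=0.64, S=0.03)

def first_over(traj, threshold=0.95):
    """Index of the first opportunity at which the estimate reaches the threshold."""
    return next(i for i, p in enumerate(traj) if p >= threshold)
```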
Intro to Knowledge Tracing
( Demo )
Intro to Knowledge Tracing
Variations on Knowledge Tracing
(and other models)
Intro to Knowledge Tracing
Knowledge Tracing with Individualized P(L0)
Do all students enter a lesson with the same background knowledge?

Prior Individualization Approach
[Figure: the KT HMM with an observed student node S determining the prior P(L0|S) of the first K node]

Node representations:
K = Knowledge node
Q = Question node
S = Student node (observed)
Node states:
K = two state (0 or 1)
Q = two state (0 or 1)
S = multi state (1 to N)
Intro to Knowledge Tracing
Knowledge Tracing with Individualized Prior
Prior Individualization Approach – Conditional Probability Table of the Student node

CPT of Student node:
S value   P(S=value)
1         1/N
2         1/N
3         1/N
…         …
N         1/N

• The S value can represent a cluster or type of student instead of an ID
• The CPT of the observed student node is fixed
• It is possible to have an S value for every student ID
• This raises an initialization issue (where do these prior values come from?)
Intro to Knowledge Tracing
Knowledge Tracing with Individualized Prior
Prior Individualization Approach – CPT of the Individualized Prior node

CPT of Individualized Prior node:
S value   P(L0|S)
1         0.05
2         0.30
3         0.95
…         …
N         0.92

• Individualized L0 values need to be seeded
• This CPT can be fixed, or the values can be learned
• Fixing this CPT and seeding it with values based on a student's first response can be an effective strategy

This model, which individualizes only L0, is the Prior Per Student (PPS) model
Intro to Knowledge Tracing
Prior Individualization Approach – bootstrapping the prior
• If a student answers incorrectly on the first question, she gets a low prior
• If a student answers correctly on the first question, she gets a higher prior

S value   P(L0|S)
0         0.05
1         0.30
Intro to Knowledge Tracing
Knowledge Tracing with Individualized Prior
Prior Individualization Approach – what values to use for the two priors?

1. Use ad-hoc values
   S value 0: P(L0|S) = 0.10
   S value 1: P(L0|S) = 0.85
2. Learn the values
   S value 0: P(L0|S) = learned by EM
   S value 1: P(L0|S) = learned by EM
3. Link with the guess/slip CPT
   S value 0: P(L0|S) = Slip
   S value 1: P(L0|S) = 1 − Guess

With ASSISTments, PPS (ad-hoc) achieved an R2 of 0.301 (0.176 with KT)
(Pardos & Heffernan, UMAP 2010)
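The seeding logic above can be sketched in a few lines of Python (names and defaults are mine; the ad-hoc numbers are the option-1 values):

```python
# Sketch of the PPS bootstrap: seed each student's P(L0) from their first
# response. Defaults use the ad-hoc values from option 1; passing guess
# and slip switches to option 3 (linking with the guess/slip CPT).
PRIOR_IF_FIRST_WRONG = 0.10   # ad-hoc value (option 1)
PRIOR_IF_FIRST_RIGHT = 0.85   # ad-hoc value (option 1)

def individualized_prior(first_response, guess=None, slip=None):
    if guess is not None and slip is not None:           # option 3
        return (1 - guess) if first_response == 1 else slip
    return (PRIOR_IF_FIRST_RIGHT if first_response == 1
            else PRIOR_IF_FIRST_WRONG)
```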
Intro to Knowledge Tracing
Variations on Knowledge Tracing
(and other models)
Intro to Knowledge Tracing
1. BKT-BF
Learns values for the parameters P(L0), P(T), P(G) and P(S) by performing a grid search (0.01 granularity) and chooses the set of parameters with the best squared error
(Baker et al., 2010)
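The brute-force idea can be sketched as follows (plain Python; the slide's granularity is 0.01, the default here is coarsened to 0.05 only to keep the example fast, and function names are mine):

```python
import itertools

# Sketch of BKT-BF: exhaustive grid search over the four parameters,
# keeping the set with the lowest squared prediction error.
def sse(responses, L0, T, G, S):
    """Sum of squared prediction errors for one response sequence."""
    L, err = L0, 0.0
    for correct in responses:
        p = L * (1 - S) + (1 - L) * G          # predicted P(correct)
        err += (correct - p) ** 2
        if correct:                            # Bayes update on the evidence
            L = L * (1 - S) / (L * (1 - S) + (1 - L) * G)
        else:
            L = L * S / (L * S + (1 - L) * (1 - G))
        L = L + (1 - L) * T                    # learning step (no forgetting)
    return err

def bkt_bf(sequences, step=0.05):
    """Brute-force fit: try every grid point, return the best parameter set."""
    grid = [round(step * i, 2) for i in range(1, int(1 / step))]
    best = min(itertools.product(grid, repeat=4),
               key=lambda p: sum(sse(seq, *p) for seq in sequences))
    return dict(zip(("L0", "T", "G", "S"), best))
```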
Intro to Knowledge Tracing
2. BKT-EM
Learns values for the parameters with Expectation Maximization (EM). Maximizes the log likelihood fit to the data
(Chang et al., 2006)
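To make the objective concrete, here is a sketch of the quantity BKT-EM maximizes: the log likelihood of the observed response sequences, computed with the same forward updates used for prediction (EM then iterates expectation and maximization steps to climb this objective; plain Python, names are mine):

```python
import math

# Log likelihood of response sequences under a BKT parameter set.
def log_likelihood(sequences, L0, T, G, S):
    ll = 0.0
    for seq in sequences:
        L = L0
        for correct in seq:
            p = L * (1 - S) + (1 - L) * G      # P(correct | current belief)
            ll += math.log(p if correct else 1 - p)
            if correct:                        # Bayes update on the evidence
                L = L * (1 - S) / (L * (1 - S) + (1 - L) * G)
            else:
                L = L * S / (L * S + (1 - L) * (1 - G))
            L = L + (1 - L) * T                # learning step
    return ll
```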
Intro to Knowledge Tracing
3. BKT-CGS
Guess and slip parameters are assessed contextually using a regression on features generated from student performance in the tutor
(Baker, Corbett, & Aleven, 2008)
Intro to Knowledge Tracing
4. BKT-CSlip
Uses the student's averaged contextual slip parameter, learned across all incorrect actions
(Baker, Corbett, & Aleven, 2008)
Intro to Knowledge Tracing
5. BKT-LessData
Limits each student's response sequence to the most recent 15 responses (max) during EM training
(Nooraei et al., 2011)
Intro to Knowledge Tracing
6. BKT-PPS
Prior per student (PPS) model, which individualizes the prior parameter. Students are assigned a prior based on their response to the first question
(Pardos & Heffernan, 2010)
Intro to Knowledge Tracing
7. CFAR
Correct on First Attempt Rate (CFAR) calculates the student's percent correct on the current skill up until the question being predicted.
Student responses for Skill X: 0 1 0 1 0 1 _
The predicted next response would be 0.50
(Yu et al., 2010)
Intro to Knowledge Tracing
8. Tabling
Uses the student's response sequence (max length 3) to predict the next response by looking up the average next response among students with the same sequence in the training set.
Training set:
Student A: 0 1 1 0
Student B: 0 1 1 1
Student C: 0 1 1 1
Max table length set to 3; table size was 2^0 + 2^1 + 2^2 + 2^3 = 15
Test set student: 0 1 1 _
The predicted next response would be 0.66
(Wang et al., 2011)
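The lookup on the slide can be sketched directly (plain Python; names are mine):

```python
from collections import defaultdict

# Sketch of the tabling model: average the next response among training
# students who produced the same response prefix (up to length 3).
def build_table(training_sequences, max_len=3):
    table = defaultdict(list)
    for seq in training_sequences:
        for i in range(1, len(seq)):
            key = tuple(seq[max(0, i - max_len):i])   # last <= max_len responses
            table[key].append(seq[i])                 # the response that followed
    return {k: sum(v) / len(v) for k, v in table.items()}

train = [[0, 1, 1, 0],   # Student A
         [0, 1, 1, 1],   # Student B
         [0, 1, 1, 1]]   # Student C
table = build_table(train)
# table[(0, 1, 1)] averages the observed next responses 0, 1, 1
```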
Intro to Knowledge Tracing
9. PFA
Performance Factors Analysis (PFA). A logistic regression model which elaborates on the Rasch IRT model. It predicts performance based on the counts of the student's prior failures and successes on the current skill.
An overall difficulty parameter β is also fit for each skill or each item. In this study we use the variant of PFA that fits β for each skill. The PFA equation is:
m(i, j ∈ KCs, s, f) = Σ_{j ∈ KCs} (β_j + γ_j·S_ij + ρ_j·F_ij)
(Pavlik et al., 2009)
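A sketch of the PFA prediction for one item, assuming per-skill parameter dictionaries beta (difficulty), gamma (success weight) and rho (failure weight), with s and f holding the student's prior success and failure counts per skill (all names here are illustrative):

```python
import math

# m is the logit from the PFA equation; the logistic link maps it to
# a probability of a correct response.
def pfa_predict(skills, s, f, beta, gamma, rho):
    m = sum(beta[j] + gamma[j] * s[j] + rho[j] * f[j] for j in skills)
    return 1 / (1 + math.exp(-m))
```

For example, one skill with beta = -0.5, gamma = 0.4, rho = -0.2, three prior successes and one failure gives m = 0.5, i.e. a predicted probability of roughly 0.62.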
Study Evaluation
Methodology – Dataset
• Cognitive Tutor for Genetics
  – 76 CMU undergraduate students
  – 9 skills (no multi-skill steps)
  – 23,706 problem solving attempts
  – 11,582 problem steps in the tutor
  – 152 average problem steps completed per student (SD = 50)
  – Pre- and post-tests were administered with this assignment
Study Evaluation
Methodology – in-tutor model prediction
• Predictions were made by the 9 models using 5-fold cross-validation by student

            Skill A  Skill A  …  Skill A  Skill B  …  Skill B
            Resp 1   Resp 2      Resp N   Resp 1      Resp N
Student 1   0.10     0.51        0.77     0.55        0.41
Student 1   0.22     0.26        0.40     0.60        0.61
Actual      0        1           1        1           0

• Accuracy was calculated with A' for each student. Those values were then averaged across students to report the model's A' (higher is better)
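A' can be computed with a pairwise comparison, equivalent to the area under the ROC curve (a sketch in plain Python; the function name is mine):

```python
# A': the probability that a randomly chosen correctly-answered item
# received a higher prediction than a randomly chosen incorrectly-answered
# one (ties count half).
def a_prime(predictions, actuals):
    pos = [p for p, a in zip(predictions, actuals) if a == 1]
    neg = [p for p, a in zip(predictions, actuals) if a == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

As on the slide, this would be computed per student and then averaged across students.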
Study Evaluation
Results – in-tutor model prediction
A' results averaged across students:

Model          A'
BKT-PPS        0.7029
BKT-BF         0.6969
BKT-EM         0.6957
BKT-LessData   0.6839
PFA            0.6629
Tabling        0.6476
BKT-CSlip      0.6149
CFAR           0.5705
BKT-CGS        0.4857
Study Evaluation
Results – in-tutor model prediction
A' results averaged across students:
• No significant differences within the top BKT models: BKT-PPS (0.7029), BKT-BF (0.6969), BKT-EM (0.6957)
• Significant differences between these BKT models and PFA: BKT-LessData (0.6839), PFA (0.6629), Tabling (0.6476), BKT-CSlip (0.6149), CFAR (0.5705), BKT-CGS (0.4857)
Study Evaluation
Methodology – ensemble in-tutor prediction
• 5 ensemble methods were used, trained with the same 5-fold cross-validation folds
• Ensemble methods were trained using the 9 model predictions as the features and the actual response as the label

[Table: the per-response model predictions (features) alongside the actual responses (label), as in the previous slide]
Study Evaluation
Methodology – ensemble in-tutor prediction
• Ensemble methods used:
1. Linear regression with no feature selection (predictions bounded between {0,1})
2. Linear regression with feature selection (stepwise regression)
3. Linear regression with only BKT-PPS & BKT-EM
4. Linear regression with only BKT-PPS, BKT-EM & BKT-CSlip
5. Logistic regression
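Method 1 above can be sketched with ordinary least squares over the model predictions, clipping the output to [0, 1] (an illustrative Python sketch; the original analysis used MATLAB, and the function names are mine):

```python
import numpy as np

# Fit an unregularized linear ensemble over per-model prediction columns.
def fit_ensemble(model_preds, actuals):
    # model_preds: list of per-model prediction lists, one list per model.
    X = np.column_stack([np.ones(len(actuals))] + list(model_preds))
    w, *_ = np.linalg.lstsq(X, np.asarray(actuals, dtype=float), rcond=None)
    return w                                   # intercept first, then weights

def ensemble_predict(w, row):
    # row: one prediction per model; output bounded between 0 and 1.
    return float(np.clip(w[0] + np.dot(w[1:], row), 0.0, 1.0))
```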
Study Evaluation
Results – in-tutor ensemble prediction
A' results averaged across students:

Model                                                  A'
Ensemble: LinReg with BKT-PPS, BKT-EM & BKT-CSlip      0.7028
Ensemble: LinReg with BKT-PPS & BKT-EM                 0.6973
Ensemble: LinReg without feature selection             0.6945
Ensemble: LinReg with feature selection (stepwise)     0.6954
Ensemble: Logistic without feature selection           0.6854

No significant difference between ensembles
Study Evaluation
Results – in-tutor ensemble & model prediction
A' results averaged across students:

Model                                                  A'
BKT-PPS                                                0.7029
Ensemble: LinReg with BKT-PPS, BKT-EM & BKT-CSlip      0.7028
Ensemble: LinReg with BKT-PPS & BKT-EM                 0.6973
BKT-BF                                                 0.6969
BKT-EM                                                 0.6957
Ensemble: LinReg without feature selection             0.6945
Ensemble: LinReg with feature selection (stepwise)     0.6954
Ensemble: Logistic without feature selection           0.6854
BKT-LessData                                           0.6839
PFA                                                    0.6629
Tabling                                                0.6476
BKT-CSlip                                              0.6149
CFAR                                                   0.5705
BKT-CGS                                                0.4857
Study Evaluation
Results – in-tutor ensemble & model prediction
A' results calculated across all actions:

Model                                                    A'
Ensemble: LinReg with BKT-PPS, BKT-EM & BKT-CSlip        0.7451
Ensemble: LinReg without feature selection               0.7428
Ensemble: LinReg with feature selection (stepwise)       0.7423
Ensemble: Logistic regression without feature selection  0.7359
Ensemble: LinReg with BKT-PPS & BKT-EM                   0.7348
BKT-EM                                                   0.7348
BKT-BF                                                   0.7330
BKT-PPS                                                  0.7310
PFA                                                      0.7277
BKT-LessData                                             0.7220
CFAR                                                     0.6723
Tabling                                                  0.6712
Contextual Slip                                          0.6396
BKT-CGS                                                  0.4917
Random Forests
Random Forests in the KDD Cup
• Motivation for trying a non-KT approach:
  – The Bayesian method only uses KC, opportunity count and student as features; much information is left unutilized, so another machine learning method is required
• Strategy:
  – Engineer additional features from the dataset and use Random Forests to train a model
Random Forests
• Strategy: create rich feature datasets that include features created from features not included in the test set

[Diagram: the raw training dataset rows are split into validation set 1 (val1), validation set 2 (val2) and non-validation training rows (nvtrain); feature-rich versions of the validation sets (frval1, frval2) are built from them, and the raw test dataset rows become the feature-rich test set (frtest)]
Random Forests
• Created by Leo Breiman
• The method trains T separate decision tree classifiers (50–800)
• Each decision tree selects a random 1/P portion of the available features (1/3)
• Each tree is grown until there are at least M observations in the leaf (1–100)
• When classifying unseen data, each tree votes on the class; the popular vote wins (or the votes are averaged, for regression)
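To make the three knobs (T trees, the 1/P feature fraction, and the leaf size M) concrete, here is a deliberately tiny pure-Python illustration; a real analysis would use a library implementation (the talk used the MATLAB Statistics Toolbox), and all names below are mine:

```python
import random
from collections import Counter

def grow_tree(rows, labels, features, min_leaf):
    # Stop when the node is too small or pure -> majority-class leaf.
    if len(rows) < 2 * min_leaf or len(set(labels)) == 1:
        return Counter(labels).most_common(1)[0][0]
    feat = random.choice(features)                 # random feature to split on
    thresh = sum(r[feat] for r in rows) / len(rows)
    left = [i for i, r in enumerate(rows) if r[feat] <= thresh]
    right = [i for i in range(len(rows)) if i not in left]
    if len(left) < min_leaf or len(right) < min_leaf:
        return Counter(labels).most_common(1)[0][0]
    return (feat, thresh,
            grow_tree([rows[i] for i in left], [labels[i] for i in left],
                      features, min_leaf),
            grow_tree([rows[i] for i in right], [labels[i] for i in right],
                      features, min_leaf))

def tree_predict(node, row):
    while isinstance(node, tuple):
        feat, thresh, lo, hi = node
        node = lo if row[feat] <= thresh else hi
    return node

def random_forest(rows, labels, n_trees=50, feat_frac=1/3, min_leaf=2):
    n_feats = max(1, int(len(rows[0]) * feat_frac))
    forest = []
    for _ in range(n_trees):
        feats = random.sample(range(len(rows[0])), n_feats)
        boot = [random.randrange(len(rows)) for _ in rows]   # bootstrap sample
        forest.append(grow_tree([rows[i] for i in boot],
                                [labels[i] for i in boot], feats, min_leaf))
    return forest

def forest_predict(forest, row):
    votes = Counter(tree_predict(t, row) for t in forest)
    return votes.most_common(1)[0][0]              # popular vote wins
```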
Random Forests
Feature Importance
Features extracted from the training set:
• Student progress features (avg. importance: 1.67)
  – Number of data points [today, since the start of unit]
  – Number of correct responses out of the last [3, 5, 10]
  – Z-score sum for step duration, hint requests, incorrects
  – Skill-specific versions of all these features
• Percent correct features (avg. importance: 1.60)
  – % correct of unit, section, problem and step, and total for each skill and also for each student (10 features)
• Student Modeling Approach features (avg. importance: 1.32)
  – The predicted probability of correct for the test row
  – The number of data points used in training the parameters
  – The final EM log likelihood fit of the parameters / data points
Random Forests
• Features of the user were more important in Bridge to Algebra than in Algebra
• Student progress features / gaming the system (Baker et al., UMUAI 2008) were important in both datasets
Random Forests
Algebra
Rank  Feature set          RMSE    Coverage
1     All features         0.2762  87%
2     Percent correct+     0.2824  96%
3     All features (fill)  0.2847  97%

Bridge to Algebra
Rank  Feature set          RMSE    Coverage
1     All features         0.2712  92%
2     All features (fill)  0.2791  99%
3     Percent correct+     0.2800  98%
Random Forests
• The best Bridge to Algebra RMSE on the Leaderboard was 0.2777
• The Random Forest RMSE of 0.2712 here is exceptional
Random Forests
• Skill data for a student was not always available for each test row
• Because of this, many skill-related feature sets only had 92% coverage
Random Forests
Conclusions from the KDD Cup
• Combining user features with skill features was very powerful in both the modeling and classification approaches
• Model-tracing based predictions performed formidably against pure machine learning techniques
• Random Forests also performed very well on this educational data set compared to other approaches such as Neural Networks and SVMs; this method could significantly boost accuracy in other EDM datasets
Random Forests
Hardware/Software
• Software
  – MATLAB used for all analysis
    • Bayes Net Toolbox for Bayesian network models
    • Statistics Toolbox for the Random Forests classifier
  – Perl used for pre-processing
• Hardware
  – Two Rocks clusters used for skill model training
    • 178 CPUs in total; training of KT models took ~48 hours when utilizing all CPUs
  – Two 32 GB RAM systems for Random Forests
    • RF models took ~16 hours to train with 800 trees
Time left?
Choose the next topic
• KT: slides 1–35
• Prediction: slides 36–67
• Evaluation: slides 47–77
• Sig tests: slides 69–77
• Regression/sig tests: slides 80–112
Intro to Knowledge Tracing
Individualize Everything?
Intro to Knowledge Tracing
Fully Individualized Model
Student-Skill Interaction (SSI) Model

Model Parameters:
P(L0) = Probability of initial knowledge; P(L0|Q1) = individualized cold-start P(L0)
P(T) = Probability of learning; P(T|S) = student's individual P(T)
P(G) = Probability of guess; P(G|S) = student's individual P(G)
P(S) = Probability of slip; P(S|S) = student's individual P(S)

Node representations:
K = Knowledge node, Q = Question node, S = Student node, Q1 = first response node, T = Learning node, G = Guessing node, S = Slipping node
Node states:
K, Q, Q1, T, G = two state (0 or 1)
S (student) = multi state (1 to N), where N is the number of students in the training data

[Figure: the SSI model – the student node S drives the individual learning (T), guessing (G) and slipping (S) nodes, and the first response Q1 seeds P(L0|Q1); parameters in bold are learned from data while the others are fixed]
(Pardos & Heffernan, JMLR 2011)
In the SSI model, the S node identifies the student.
The T node contains the CPT lookup table of individual student learn rates.
P(T) is trained for each skill, which gives a learn rate for P(T|T=1) [high learner] and P(T|T=0) [low learner].
Intro to Knowledge Tracing
SSI model results

Dataset            New RMSE  Prev RMSE  Improvement
Algebra            0.2813    0.2835    0.0022
Bridge to Algebra  0.2824    0.2860    0.0036

[Figure: bar chart of RMSE for PPS vs SSI on Algebra and Bridge to Algebra, y-axis 0.279–0.286]

The average improvement is the difference between 1st and 3rd place; it is also the difference between 3rd and 4th place.
The differences between PPS and SSI are significant in each dataset at the p < 0.01 level (t-test of squared errors).
(Pardos & Heffernan, JMLR 2011)