Bayesian Knowledge Tracing and Other Predictive Models in Educational Data Mining
Zachary A. Pardos
PSLC Summer School 2011

Outline of Talk
• Introduction to Knowledge Tracing
– History
– Intuition
– Model
– Demo
– Variations (and other models)
– Evaluations (Baker work / KDD)
• Random Forests
– Description
– Evaluations (KDD)
• Time left?
– Vote on next topic

Intro to Knowledge Tracing: History
• Introduced in 1995 (Corbett & Anderson, UMUAI)
• Based on the ACT-R theory of skill knowledge (Anderson, 1993)
• Computations based on a variation of Bayesian calculations proposed in 1972 (Atkinson)

Intro to Knowledge Tracing: Intuition
• Based on the idea that practice on a skill leads to mastery of that skill
• Has four parameters used to describe student performance
• Relies on a KC (knowledge component) model
• Tracks student knowledge over time

Intro to Knowledge Tracing
For some skill K: given a student's response sequence 1 to n, predict response n+1

Responses 1 … n:  0 0 1 0 1 1   Response n+1:  ?
Chronological response sequence for student Y [0 = incorrect response, 1 = correct response]

Intro to Knowledge Tracing: Model
Knowledge is tracked over time (a model of learning) as the responses 0 0 1 0 1 1 arrive.

Knowledge Tracing (KT) can be represented as a simple HMM:
• A chain of latent knowledge nodes K, one per opportunity, linked by the learning transition P(T), with prior P(L0) on the first
• An observed question node Q at each opportunity, related to K through the guess P(G) and slip P(S) probabilities

Node representations: K = knowledge node, Q = question node
Node states: K = two state (0 or 1), Q = two state (0 or 1)

Four parameters of the KT model:
P(L0) = probability of initial knowledge
P(T) = probability of learning
P(G) = probability of guess
P(S) = probability of slip
The probability of forgetting, P(F), is assumed to be zero (fixed).

Formulas for inference and prediction (derivation: Reye, JAIED 2004). The formulas use Bayes' Theorem to make inferences about the latent variable:

If correct:
  P(L_{n-1} | correct) = P(L_{n-1}) * (1 - P(S)) / [ P(L_{n-1}) * (1 - P(S)) + (1 - P(L_{n-1})) * P(G) ]   (1)
If incorrect:
  P(L_{n-1} | incorrect) = P(L_{n-1}) * P(S) / [ P(L_{n-1}) * P(S) + (1 - P(L_{n-1})) * (1 - P(G)) ]      (2)
Learning step:
  P(L_n) = P(L_{n-1} | evidence) * (1 - P(F)) + (1 - P(L_{n-1} | evidence)) * P(T)                        (3)

Model Training step: the values of the parameters P(T), P(G), P(S) & P(L0) are used to predict student responses.
• Ad-hoc values could be used but will likely not be the best fitting
• Goal: find a set of values for the parameters that minimizes prediction error

Training data (one response sequence per student):
Student A: 0 0 1 0 1 1
Student B: 1 1 0 1 1 1
Student C: 0 0 1 0 0 0

Model Prediction (Model Tracing step). Skill: Subtraction. Given the student's last three responses to Subtraction questions in the unit, the knowledge estimate P(K) evolves 10% → 45% → 75% → 79% → 83%, and the predicted probability of a correct response, P(Q), on the two test set questions is 71% and 74%.

Influence of parameter values. Estimate of knowledge for a student with response sequence 0 1 1 1 1 1 1 1 1 1:
• With P(L0) = 0.50, P(T) = 0.20, P(G) = 0.14, P(S) = 0.09, the student reached a 95% probability of knowledge after the 4th opportunity
• With P(L0) = 0.50, P(T) = 0.20, P(G) = 0.64, P(S) = 0.03, the student reached a 95% probability of knowledge only after the 8th opportunity

( Demo )
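The inference formulas (1)-(3) above can be sketched directly in code. This is a minimal illustration, not the talk's implementation; function names are mine. With the first parameter set from the influence example (P(L0)=0.50, P(T)=0.20, P(G)=0.14, P(S)=0.09), a 0 1 1 1 … sequence drives the knowledge estimate past 0.95 within the first few opportunities.

```python
# A minimal sketch of the KT equations (1)-(3); names are illustrative.

def bkt_update(pL, correct, pT, pG, pS):
    """Posterior P(L) after one observed response, then the learning step."""
    if correct:  # equation (1)
        post = pL * (1 - pS) / (pL * (1 - pS) + (1 - pL) * pG)
    else:        # equation (2)
        post = pL * pS / (pL * pS + (1 - pL) * (1 - pG))
    # equation (3), with P(F) fixed at zero (no forgetting)
    return post + (1 - post) * pT

def bkt_trace(responses, pL0, pT, pG, pS):
    """P(L_n) after each response in a chronological 0/1 sequence."""
    estimates, pL = [], pL0
    for r in responses:
        pL = bkt_update(pL, r, pT, pG, pS)
        estimates.append(pL)
    return estimates

def bkt_predict(pL, pG, pS):
    """Predicted probability that the next response is correct."""
    return pL * (1 - pS) + (1 - pL) * pG
```

Note how an incorrect first response pulls the estimate below the prior before the correct responses drive it toward mastery.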
Variations on Knowledge Tracing (and other models)

Knowledge Tracing with Individualized P(L0): the Prior Individualization Approach
Do all students enter a lesson with the same background knowledge?

A student node S is added as a parent of the first knowledge node, making the prior P(L0|S).
Node representations: K = knowledge node, Q = question node, S = student node
Node states: K = two state (0 or 1), Q = two state (0 or 1), S = multi state (1 to N)

CPT of the student node:

S value   P(S = value)
1         1/N
2         1/N
3         1/N
…         …
N         1/N

• The CPT of the observed student node is fixed
• It is possible to have an S value for every student ID; the S value can instead represent a cluster or type of student
• This raises an initialization issue (where do these prior values come from?)

CPT of the Individualized Prior node:

S value   P(L0|S)
1         0.05
2         0.30
3         0.95
…         …
N         0.92

• Individualized L0 values need to be seeded
• This CPT can be fixed or the values can be learned
• Fixing this CPT and seeding it with values based on a student's first response can be an effective strategy

This model, which individualizes only L0, is the Prior Per Student (PPS) model.

Bootstrapping the prior:
• If a student answers incorrectly on the first question, she gets a low prior
• If a student answers correctly on the first question, she gets a higher prior

S value   P(L0|S)
0         0.05
1         0.30

What values to use for the two priors?

1. Use ad-hoc values:

S value   P(L0|S)
0         0.10
1         0.85

2. Learn the values:

S value   P(L0|S)
0         learned by EM
1         learned by EM

3. Link them with the guess/slip CPT:

S value   P(L0|S)
0         P(S)  (slip)
1         1 - P(G)

With ASSISTments data, PPS (ad-hoc) achieved an R² of 0.301, versus 0.176 for standard KT (Pardos & Heffernan, UMAP 2010).

Variations on Knowledge Tracing (and other models)

1. BKT-BF: learns values for these parameters by performing a grid search (0.01 granularity) and chooses the set of parameters with the best squared error
(Baker et al., 2010)

2. BKT-EM: learns values for these parameters with Expectation Maximization (EM), maximizing the log likelihood fit to the data (Chang et al., 2006)

3. BKT-CGS: guess and slip parameters are assessed contextually using a regression on features generated from student performance in the tutor (Baker, Corbett, & Aleven, 2008)

4. BKT-CSlip: uses the student's averaged contextual slip parameter learned across all incorrect actions (Baker, Corbett, & Aleven, 2008)
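The brute-force fitting behind BKT-BF (item 1 above) can be sketched as an exhaustive grid search. A 0.1-step grid is used here so the example runs quickly; the actual method used 0.01 granularity. Function names are illustrative.

```python
import itertools

# Sketch of BKT-BF-style brute force: evaluate every parameter combination
# on a grid and keep the one with the lowest summed squared prediction error.

def bkt_sse(seqs, pL0, pT, pG, pS):
    """Squared error of one-step-ahead predictions over all sequences."""
    sse = 0.0
    for seq in seqs:
        pL = pL0
        for r in seq:
            pred = pL * (1 - pS) + (1 - pL) * pG    # predict before observing r
            sse += (pred - r) ** 2
            if r:                                    # posterior, eq. (1)/(2)
                post = pL * (1 - pS) / (pL * (1 - pS) + (1 - pL) * pG)
            else:
                post = pL * pS / (pL * pS + (1 - pL) * (1 - pG))
            pL = post + (1 - post) * pT              # learning step, eq. (3)
    return sse

def bkt_brute_force(seqs):
    grid = [round(0.1 * i, 1) for i in range(1, 10)]  # 0.1 .. 0.9
    best, best_sse = None, float("inf")
    for params in itertools.product(grid, repeat=4):  # (pL0, pT, pG, pS)
        sse = bkt_sse(seqs, *params)
        if sse < best_sse:
            best, best_sse = params, sse
    return best, best_sse
```

By construction, the returned parameter set has a squared error no worse than any other grid point.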
5. BKT-LessData: limits students' response sequence length to the most recent 15 responses during EM training (Nooraiei et al., 2011)

6. BKT-PPS: the Prior Per Student (PPS) model, which individualizes the prior parameter. Students are assigned a prior based on their response to the first question (Pardos & Heffernan, 2010)

7. CFAR: Correct on First Attempt Rate. Calculates the student's percent correct on the current skill up until the question being predicted (Yu et al., 2010).

Student responses for skill X: 0 1 0 1 0 1 _
Predicted next response would be 0.50
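The BKT-PPS cold start (item 6 above) can be sketched by seeding the prior from the first response, then running the standard BKT updates. The 0.10 / 0.85 values are the ad-hoc priors from the earlier slides; treating the first response as consumed by the seeding step is one reasonable reading, not necessarily the exact implementation, and the names are mine.

```python
# Sketch of the BKT-PPS cold start: P(L0|S) is seeded from the student's
# first response, then standard BKT updates follow.

PRIOR_IF_FIRST_WRONG = 0.10   # ad-hoc prior values from the slides
PRIOR_IF_FIRST_RIGHT = 0.85

def pps_trace(responses, pT, pG, pS):
    """Return the final P(L) and per-step estimates under the PPS prior."""
    pL = PRIOR_IF_FIRST_RIGHT if responses[0] else PRIOR_IF_FIRST_WRONG
    estimates = []
    for r in responses[1:]:   # first response was consumed to set the prior
        if r:
            post = pL * (1 - pS) / (pL * (1 - pS) + (1 - pL) * pG)
        else:
            post = pL * pS / (pL * pS + (1 - pL) * (1 - pG))
        pL = post + (1 - post) * pT
        estimates.append(pL)
    return pL, estimates
```

Two students with the same later responses but different first responses end up with different knowledge estimates, which is the point of individualizing the prior.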
8. Tabling: uses the student's response sequence (max length 3) to predict the next response by looking up the average next response among students with the same sequence in the training set (Wang et al., 2011).

Training set:
Student A: 0 1 1 0
Student B: 0 1 1 1
Student C: 0 1 1 1

With the max table length set to 3, the table size was 2^0 + 2^1 + 2^2 + 2^3 = 15.

Test set student: 0 1 1 _
Predicted next response would be 0.66 (the average of A's 0 and B's and C's 1s)

9. PFA: Performance Factors Analysis, a logistic regression model which elaborates on the Rasch IRT model. It predicts performance based on the counts of the student's prior successes s and failures f on the current skill. An overall difficulty parameter beta is also fit for each skill or each item; this study uses the variant of PFA that fits beta for each skill. The PFA equation is:

m(i, j in KCs, s, f) = sum over j of [ beta_j + gamma_j * s_ij + rho_j * f_ij ],   with P(correct) = 1 / (1 + e^(-m))

(Pavlik et al., 2009)

Methodology: Dataset
• Cognitive Tutor for Genetics
– 76 CMU undergraduate students
– 9 skills (no multi-skill steps)
– 23,706 problem solving attempts
– 11,582 problem steps in the tutor
– 152 average problem steps completed per student (SD = 50)
– Pre- and post-tests were administered with this assignment

Methodology: In-tutor model prediction
• Predictions were made by the 9 models using a 5-fold cross-validation by student
• Each model produces a predicted probability of correct for every response; for example, for Student 1:

Response         Model 1  Model 2  …  Actual
Skill A Resp 1   0.10     0.22        0
Skill A Resp 2   0.51     0.26        1
Skill A Resp N   0.77     0.40        1
Skill B Resp 1   0.55     0.60        1
Skill B Resp N   0.41     0.61        0

• Accuracy was calculated with A′ for each student.
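A′ as used here can be computed as a pairwise ranking statistic: the probability that a randomly chosen correct response received a higher prediction than a randomly chosen incorrect one (ties counting as half), computed per student and then averaged. A small sketch, with names of my choosing:

```python
# Sketch of A' (equivalent to the area under the ROC curve) via pairwise
# comparison of predictions for correct vs. incorrect responses.

def a_prime(predictions, actuals):
    pos = [p for p, a in zip(predictions, actuals) if a == 1]
    neg = [p for p, a in zip(predictions, actuals) if a == 0]
    if not pos or not neg:
        return None  # undefined without both a correct and an incorrect response
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def mean_a_prime(per_student):
    """Average A' over (predictions, actuals) pairs, one pair per student."""
    scores = [a_prime(p, a) for p, a in per_student]
    scores = [s for s in scores if s is not None]
    return sum(scores) / len(scores)
```

Students whose responses are all correct (or all incorrect) have no defined A′ and are skipped in the average, one common way of handling that edge case.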
Those values were then averaged across students to report the model's A′ (higher is better).

Results: in-tutor model prediction (A′ averaged across students)

Model          A′
BKT-PPS        0.7029
BKT-BF         0.6969
BKT-EM         0.6957
BKT-LessData   0.6839
PFA            0.6629
Tabling        0.6476
BKT-CSlip      0.6149
CFAR           0.5705
BKT-CGS        0.4857

• No significant differences among the top BKT variants
• Significant differences between those BKT variants and PFA

Methodology: in-tutor ensemble prediction
• 5 ensemble methods were used, trained with the same 5-fold cross-validation folds
• Ensemble methods were trained using the 9 model predictions as the features and the actual response as the label
• Ensemble methods used:
1. Linear regression with no feature selection (predictions bounded between {0,1})
2. Linear regression with feature selection (stepwise regression)
3. Linear regression with only BKT-PPS & BKT-EM
4. Linear regression with only BKT-PPS, BKT-EM & BKT-CSlip
5. Logistic regression

Results: in-tutor ensemble prediction (A′ averaged across students)

Model                                                A′
Ensemble: LinReg with BKT-PPS, BKT-EM & BKT-CSlip    0.7028
Ensemble: LinReg with BKT-PPS & BKT-EM               0.6973
Ensemble: LinReg without feature selection           0.6945
Ensemble: LinReg with feature selection (stepwise)   0.6954
Ensemble: Logistic without feature selection         0.6854

• No significant difference between ensembles

Results: in-tutor ensemble & model prediction (A′ averaged across students)

Model                                                A′
BKT-PPS                                              0.7029
Ensemble: LinReg with BKT-PPS, BKT-EM & BKT-CSlip    0.7028
Ensemble: LinReg with BKT-PPS & BKT-EM               0.6973
BKT-BF                                               0.6969
BKT-EM                                               0.6957
Ensemble: LinReg without feature selection           0.6945
Ensemble: LinReg with feature selection (stepwise)   0.6954
Ensemble: Logistic without feature selection         0.6854
BKT-LessData                                         0.6839
PFA                                                  0.6629
Tabling                                              0.6476
BKT-CSlip                                            0.6149
CFAR                                                 0.5705
BKT-CGS                                              0.4857

Results: in-tutor ensemble & model prediction (A′ calculated across all actions)

Model                                                    A′
Ensemble: LinReg with BKT-PPS, BKT-EM & BKT-CSlip        0.7451
Ensemble: LinReg without feature selection               0.7428
Ensemble: LinReg with feature selection (stepwise)       0.7423
Ensemble: Logistic regression without feature selection  0.7359
Ensemble: LinReg with BKT-PPS & BKT-EM                   0.7348
BKT-EM                                                   0.7348
BKT-BF                                                   0.7330
BKT-PPS                                                  0.7310
PFA                                                      0.7277
BKT-LessData                                             0.7220
CFAR                                                     0.6723
Tabling                                                  0.6712
BKT-CSlip (Contextual Slip)                              0.6396
BKT-CGS                                                  0.4917

Random Forests in the KDD Cup
• Motivation for trying a non-KT approach:
– The Bayesian method only uses the KC, opportunity count, and student as features; much information is left unutilized, so another machine learning method is required
• Strategy:
– Engineer additional features from the dataset and use Random Forests to train a model
– Create rich feature datasets that include features created from features not included in the test set

Data organization: the raw training dataset rows were split into non-validation training rows (nvtrain), validation set 1 (val1), and validation set 2 (val2); feature-rich versions of the validation sets were built (frval1, frval2), along with a feature-rich test set (frtest) from the raw test dataset rows.

Random Forests
• Created by Leo Breiman
• The method trains T separate decision tree classifiers (50-800 here)
• Each decision tree selects a random 1/P portion of the available features (1/3 here)
• Each tree is grown until there are at least M observations in the leaf (1-100 here)
• When classifying unseen data, each tree votes on the class; the popular vote wins, or the votes are averaged (for regression)

Feature importance. Features extracted from the training set:
• Student progress features (avg. importance: 1.67)
– Number of data points [today, since the start of the unit]
– Number of correct responses out of the last [3, 5, 10]
– Z-score sum for step duration, hint requests, and incorrects
– Skill-specific versions of all of these features
• Percent correct features (avg. importance: 1.60)
– % correct of unit, section, problem, and step, and total for each skill and also for each student (10 features)
• Student Modeling Approach features (avg.
importance: 1.32)
– The predicted probability of correct for the test row
– The number of data points used in training the parameters
– The final EM log likelihood fit of the parameters / data points

Observations:
• Features of the user were more important in Bridge to Algebra than in Algebra
• Student progress / gaming-the-system features (Baker et al., UMUAI 2008) were important in both datasets

Results:

Algebra
Rank  Feature set          RMSE    Coverage
1     All features         0.2762  87%
2     Percent correct+     0.2824  96%
3     All features (fill)  0.2847  97%

Bridge to Algebra
Rank  Feature set          RMSE    Coverage
1     All features         0.2712  92%
2     All features (fill)  0.2791  99%
3     Percent correct+     0.2800  98%

• The best Bridge to Algebra RMSE on the Leaderboard was 0.2777; the Random Forest RMSE of 0.2712 here is exceptional
• Skill data for a student was not always available for each test row; because of this, many skill-related feature sets had only 92% coverage

Conclusions from the KDD Cup
• Combining user features with skill features was very powerful in both modeling and classification approaches
• Model tracing based predictions performed formidably against pure machine learning techniques
• Random Forests also performed very well on this educational data set compared to other approaches such as Neural Networks and SVMs; this method could significantly boost accuracy in other EDM datasets

Hardware/Software
• Software
– MATLAB used for all analysis: the Bayes Net Toolbox for the Bayesian network models and the Statistics Toolbox for the Random Forests classifier
– Perl used for pre-processing
• Hardware
– Two Rocks clusters (178 CPUs in total) used for skill model training; training of the KT models took ~48 hours when utilizing all CPUs
– Two 32 GB RAM systems for Random Forests; RF models took ~16 hours to train with 800 trees

Time left? Choose the next topic
• KT: slides 1-35
• Prediction: slides 36-67
• Evaluation: slides 47-77
• Sig tests: slides 69-77
• Regression/sig tests: slides 80-112

Individualize Everything?
The Fully Individualized Model: the Student-Skill Interaction (SSI) model (Pardos & Heffernan, JMLR 2011)

Model parameters:
P(L0) = probability of initial knowledge; P(L0|Q1) = individualized cold-start P(L0)
P(T) = probability of learning; P(T|S) = the student's individual P(T)
P(G) = probability of guess; P(G|S) = the student's individual P(G)
P(S) = probability of slip; P(S|S) = the student's individual P(S)

Node representations: K = knowledge node, Q = question node, S = student node, Q1 = first response node, T = learning node, G = guessing node, S = slipping node
Node states: K, Q, Q1, T, G, and the slipping node are two state (0 or 1); the student node S is multi state (1 to N, where N is the number of students in the training data)

• The S node identifies the student
• The T node contains the CPT lookup table of students' individual learn rates
• P(T) is trained for each skill, which gives a learn rate for P(T|T=1) [high learner] and P(T|T=0) [low learner]
• In the slide's diagram, the parameters shown in bold are learned from data while the others are fixed
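The effect of individualized learn rates can be sketched on top of plain BKT: the student indexes into the T node's lookup table, here reduced to the high/low learner distinction described above. The rate values themselves are invented for illustration, and this flat version is only a sketch of the full SSI network.

```python
# Sketch of the SSI idea grafted onto plain BKT: each student selects an
# individual learn rate from a lookup table.

LEARN_RATE = {True: 0.30, False: 0.08}  # assumed P(T|T=1) / P(T|T=0) values

def ssi_trace(responses, high_learner, pL0, pG, pS):
    """Trace knowledge with a per-student learn rate; returns the final P(L)."""
    pT = LEARN_RATE[high_learner]
    pL = pL0
    for r in responses:
        if r:
            post = pL * (1 - pS) / (pL * (1 - pS) + (1 - pL) * pG)
        else:
            post = pL * pS / (pL * pS + (1 - pL) * (1 - pG))
        pL = post + (1 - post) * pT
    return pL
```

Given the same response sequence, the high-learner rate always yields the higher knowledge estimate, which is what the individual learn rates are meant to capture.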
SSI model results (RMSE; lower is better):

Dataset             New RMSE (SSI)  Prev RMSE (PPS)  Improvement
Algebra             0.2813          0.2835           0.0022
Bridge to Algebra   0.2824          0.2860           0.0036

• The average improvement is the difference between 1st and 3rd place; it is also the difference between 3rd and 4th place
• The differences between PPS and SSI are significant in each dataset at the P < 0.01 level (t-test of squared errors)

(Pardos & Heffernan, JMLR 2011)