Uncertainty, Prediction, and Teacher Feedback Using an Online System that Teaches as it Assesses

All papers for this session are available at http://www.stat.cmu.edu/~brian/NCME07
Uncertainty, Prediction and
Teacher Feedback using an
Online System
that Teaches as it Assesses
Brian W. Junker
Thanks to Neil Heffernan, Ken Koedinger,
Mingyu Feng, Beth Ayers, Nathaniel Anozie,
Zach Pardos, and many others
http://www.assistment.org
Funding from US Department of Education, National
Science Foundation (NSF), Office of Naval Research,
Spencer Foundation, and the US Army
The ASSISTments Project
• Web-based 8th grade mathematics tutoring system
• ASSIST with, and ASSESS, progress toward
Massachusetts Comprehensive Assessment System
Exam (MCAS)
– Guide students through problem solving with MCAS released
items
– Predict students’ MCAS scores at end of year
– Provide feedback to teachers (what to teach next?)
• (Generalize to other States…)
• Over 50 workers at Carnegie Mellon, Worcester
Polytechnic Institute, Carnegie Learning, Worcester
Public Schools
The ASSISTment Tutor
• Main Items: Released MCAS or “morphs”
• Incorrect Main → “Scaffold” Items
  – “One-step” breakdowns of the main task
  – Buggy feedback, hints on request, etc.
• All items coded by transfer model (Q-matrix) for knowledge components (KC’s) (see the toy sketch after this list)
• Student records contain responses, timing data, bugs/hints, etc.
• System tracks students through time, provides teacher reports per student & per class
  – Predict MCAS Scores
  – KC Feedback: learned/not-learned, etc.
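To make the Q-matrix coding concrete, here is a toy sketch, assuming a hypothetical three-KC model and a made-up student mastery vector (the real WPI transfer models are far larger):

```python
import numpy as np

# Hypothetical KC names (borrowed from a later slide's diagram)
kcs = ["congruence", "perimeter", "equation-solving"]
q_matrix = np.array([
    [1, 0, 0],   # item 1 requires congruence only
    [0, 1, 1],   # item 2 requires perimeter AND equation-solving
    [1, 0, 1],   # item 3 requires congruence AND equation-solving
])

skills = np.array([1, 0, 1])   # made-up student: has KCs 1 and 3
# Conjunctive ("ideal response") rule: correct iff every required KC is mastered
ideal = (q_matrix @ skills == q_matrix.sum(axis=1)).astype(int)
print(ideal)   # -> [1 0 1]
```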
This talk draws on two recent
reports
Predicting MCAS Scores – Summary/Review:
Junker, B. W. (2006)."Using on-line tutoring records to predict end-of-year
exam scores: experience with the ASSISTments project and MCAS 8th
grade mathematics". To appear in Lissitz, R. W. (Ed.), Assessing and
modeling cognitive development in school: intellectual growth and standard
settings. Maple Grove, MN: JAM Press.
KC Feedback – Some Current Progress:
Anozie, N. & Junker, B. W. (2007). "Investigating the utility of a conjunctive
model in Q matrix assessment using monthly student records in an online
tutoring system". Paper to be presented at the Annual Meeting of the
National Council on Measurement in Education, April 12, Chicago IL (K4;
Thursday 8:15-10:15 Intercontinental Seville East).
(These and all papers for this session are available at
http://www.stat.cmu.edu/~brian/NCME07)
Challenges: Predicting MCAS
• The exact content of the MCAS exam is not known
until months after it is given
• The ASSISTments themselves are ongoing throughout
the school year as students learn (from teachers, from
ASSISTment interactions, etc.).
[Figure: average percent correct on the system per student, plotted by month (Sept through Mar)]
Methods: Predicting MCAS
• Regression approaches [Feng et al, 2006; Anozie & Junker, 2006;
Ayers & Junker, 2006/2007]:
– Percent Correct on Main Questions
– Percent Correct on Scaffold Questions
– Rasch proficiency on Main Questions
– Online metrics (efficiency and help-seeking; e.g. Campione et al., 1985; Grigorenko & Sternberg, 1998)
– Both end-of-year and “month-by-month” models
• Bayes Net (DINA Model) approaches:
– Predicting KC-coded MCAS questions from Bayes Nets (DINA model)
applied to ASSISTments [Pardos, et al., 2006];
– Regression on number of KC’s mastered in DINA model [Anozie 2006]
• HLM-style growth curve models
– At the KC level [Feng, Heffernan & Koedinger, 2006]
– At the total score level [Feng, Heffernan, Mani & Heffernan, 2006]
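A toy sketch of the “month-by-month” flavor of the regression approaches listed above (simulated data and hypothetical names; each month’s model uses only the summaries available so far):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(5)
n = 300
# Running monthly percent-correct summaries, Sept (col 0) through Mar (col 6)
monthly_pct = 20 + np.cumsum(rng.normal(3, 1, size=(n, 7)), axis=1)
mcas = 0.8 * monthly_pct[:, -1] + rng.normal(scale=6, size=n)

# Refit the MCAS prediction each month with the data accumulated so far
for month in range(1, 8):
    X = monthly_pct[:, :month]
    pred = LinearRegression().fit(X, mcas).predict(X)
    print(f"through month {month}: in-sample MAD = {np.abs(mcas - pred).mean():.2f}")
```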
Results: Predicting MCAS
Predictors                     df   CV-MAD   CV-RMSE   Remarks
PctCorrMain                     1    7.18     8.65     7 months, main questions only
#Skills of 77 learned (DINA)    1    6.63     8.62     3 months, mains and scaffolds
Rasch Proficiency               1    5.90     7.18     7 months, main questions only
PctCorrMain + 4 metrics        35    5.46     6.56     7 months; 5 summaries each month
Rasch Profic + 5 metrics        6    5.24     6.46     7 months, main questions only

• Feng et al. (in press) estimate best-possible MAD ≈ 6 points (11% of 54-pt raw score) from split-half experiments with MCAS
• Ayers & Junker (2007) reliability calculation suggests approximate bounds 1.05 ≤ MAD ≤ 6.53.
Conclusions: Predicting MCAS
• Tradeoff:
– Greater model complexity (DINA) can help [Pardos et
al, 2006; Anozie, 2006];
– Accounting for question difficulty (Rasch), plus online
metrics, does as well [Ayers & Junker, 2007]
• Limits of what we can accomplish for prediction
– MCAS reliability ≈ 0.91
– Typical ASSISTments reliability ≈ 0.81
– If ASSISTments were perfectly reliable, approx.
bound on MAD would be cut in half (3.40)
Goal: KC Feedback
• Providing feedback on
– individual students
– groups of students
• Current teacher report:
For each skill, report
percent correct on all
items for which that skill
is hardest.
• Can we do better?
Challenges: KC Feedback
• Different transfer models are used and expected by
different stakeholders:
– The MCAS itself is scaled using unidimensional IRT / Pct Correct
– Description and design of the MCAS is based on
• Five-strand model of mathematics (Number & Operations, Algebra,
Geometry, Measurement, Data Analysis & Probability)
• 39 “learning standards” nested within the five strands.
– ASSISTment researchers have developed a transfer model
involving up to 106 KC’s (WPI-106, Pardos et al., 2006) nested
within the 39 learning standards
• Scaffolding can be designed as optimal measures of
single KC’s; or as optimal tutoring aids
– When more than one transfer model is involved, scaffolds fail to
line up with at least one of them!
• Different students work through ASSISTments at
different rates
Methods: KC Feedback
Pardos et al (2006): tend to prefer more KC’s for prediction;
Anozie & Junker (2007): inference about KC mastery (106 KC’s)

[Diagram: skill nodes P(Congruence), P(Perimeter), P(Equation-Solving) feed conjunctive gates for three questions; at each question, P(correct | gate True) = 1 − s_j (slip) and P(correct | gate False) = g_j (guess), j = 1, 2, 3]
Conjunctive binary-skills Bayes Net (Macready & Dayton, 1977;
Haertel, 1989; Maris, 1999; Junker & Sijtsma DINA, 2001; etc.)
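A minimal sketch of the gate probabilities in the diagram, under the conjunctive DINA rule; the slip/guess values here are illustrative (they match the fixed values a later slide quotes from Pardos et al.):

```python
import numpy as np

def p_correct(q_row, skills, slip, guess):
    """DINA: P(correct) = 1 - slip if all required KCs are mastered, else guess."""
    gate = bool(np.all(skills >= q_row))   # conjunctive gate over required KCs
    return 1.0 - slip if gate else guess

# Hypothetical single-KC questions matching the diagram's three skills
Q = np.array([[1, 0, 0],    # question 1 needs Congruence
              [0, 1, 0],    # question 2 needs Perimeter
              [0, 0, 1]])   # question 3 needs Equation-Solving
skills = np.array([1, 1, 0])   # made-up mastery pattern
for j, q_row in enumerate(Q, start=1):
    p = p_correct(q_row, skills, slip=0.05, guess=0.10)
    print(f"P(question {j} correct) = {p:.2f}")
```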
Results: KC Feedback
• Average percent of
KC’s mastered:
30-40%
• February dip reflects
a recording error for
main questions
• Can also break this
down by individual KC
(next slide)
Results: KC Feedback

[Figure: mastery rates broken down by individual KC]
Results: KC Feedback
• Prediction based on ‘ideal response’ (P[guess] = P[slip] = 0)
• Split-half cross-val
accuracy 68-73%
• Enough to help
teachers decide what
to teach next.
Digression: Question & Transfer Model Characteristics

Main Item: Which graph contains the points in the table?

  X:  -2   -1    1
  Y:  -3   -1    3

Scaffolds:
1. Quadrant of (-2,-3)?
2. Quadrant of (-1,-1)?
3. Quadrant of (1,3)?
4. [Repeat main]

[Figures: posterior boxplots of the Guess and Slip parameters]
Conclusions
• Different transfer models for different purposes seem
necessary.
• For unidimensional prediction, unidimensional IRT,
augmented with “assistance metrics”, works well
– Account for question difficulty, help-seeking behavior
– We are close to best-possible prediction error
• A finer grained model like the DINA model is needed for
individual and group diagnostics
– Individual diagnosis uncertainty can be large
– Group diagnosis seems good enough to help teachers decide
what to teach next
– Scaffold questions: teaching or assessment?
Future Work
• Transfer model / KC’s
– Discovering and improving the transfer model?
– Different transfer models for different purposes – “play
together”?
• Experimental design to improve KC inferences
• Account for learning over time
– Prior distributions for skills based on past
performance?
– Markov Learning Model for each skill?
• Compare with crediting/blaming hardest KC
– Accuracy of inference?
– Speed of computation?
RMSE and MAD bounds
(Ayers & Junker, 2007)
[Equations: definitions and the resulting bounds on RMSE and MAD]
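The “Let/Then/And” formulas were images in the original deck and did not survive the text export. As a hedged reconstruction, a standard classical-test-theory argument of this shape would be (not necessarily Ayers & Junker’s exact algebra):

```latex
% Hedged reconstruction, not the slide's exact derivation.
% Classical test theory: observed MCAS score X = T + e, reliability rho_X.
\[
  \textbf{Let } X = T + e, \qquad \rho_X = \sigma_T^2 / \sigma_X^2 .
\]
% Any predictor built from tutor data is independent of the MCAS
% measurement error e, so the error variance is a floor:
\[
  \textbf{Then } \operatorname{RMSE}(\hat X) = \sqrt{E(X - \hat X)^2}
      \;\ge\; \sigma_X \sqrt{1 - \rho_X} .
\]
% And, for approximately normal prediction errors,
\[
  \textbf{And } \operatorname{MAD} \approx \sqrt{2/\pi}\,\operatorname{RMSE} .
\]
```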
Dynamic Models:
Anozie and Junker (2006)
• More months helps more than more metrics
• First 5 online metrics retained for final model(s)
Full Set of Online Metrics

[Table: the full set of online metrics used in the prediction models]
Dynamic Models:
Anozie and Junker (2006)
• Look at changing influence of online metrics on MCAS
prediction over time
– Compute monthly summaries of all online metrics (not just %correct)
– Build linear prediction model for each month, using all current
and prev. months’ summaries
• To enhance interpretation, variable selection is done
  – by metric, not by monthly summary
  – metrics are included/excluded simultaneously in all monthly models (see the sketch below)
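A sketch of what metric-level selection could look like (simulated data and hypothetical metric names; exhaustive subset search stands in for whatever selection procedure Anozie & Junker actually used):

```python
import numpy as np
from itertools import combinations
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_students, n_months = 200, 7
metrics = ["pct_correct", "hint_count", "attempt_count", "seconds_per_item"]
# X[s, m, k] = summary of metric k for student s in month m
X = rng.normal(size=(n_students, n_months, len(metrics)))
y = 40 + 20 * X[:, :, 0].mean(axis=1) + rng.normal(scale=5, size=n_students)

def cv_mad(keep):
    """CV-MAD of the end-of-year model using all months of the kept metrics."""
    Xk = X[:, :, list(keep)].reshape(n_students, -1)
    neg_mae = cross_val_score(LinearRegression(), Xk, y,
                              scoring="neg_mean_absolute_error", cv=10)
    return -neg_mae.mean()

# A metric enters or leaves ALL monthly models at once
subsets = [c for k in range(1, len(metrics) + 1)
           for c in combinations(range(len(metrics)), k)]
best = min(subsets, key=cv_mad)
print("metrics kept:", [metrics[i] for i in best])
```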
KC’s in DINA analysis

[List: the KC’s included in the DINA analysis]
Results: KC Feedback
• Top shows posterior CI’s
for one skill; middle and
bottom are ‘sample sizes’
• More data or consistent evidence → smaller CI
• Less data, or inconsistent evidence → larger CI
• Experimental Design?
How many questions?
Which skills? Etc.
Methods: KC Feedback
• Pardos et al (2006) first tried DINA for
MCAS prediction
– Compared the 1-KC, 5-KC, 39-KC and 106-KC models
– Found 39 KC’s did best; 106 KC’s 2nd best
• Anozie & Junker (2007) apply DINA with
an eye toward feedback to teachers etc.
Static Prediction Models
• Feng et al. (2006 & to appear):
– Online testing metrics
• Percent correct on main/scaffold/both items
• “Assistance score” = (errors + hints)/(number of scaffolds) (see the sketch after this list)
• Time spent on (in-)correct answers
• etc.
– Compare paper & pencil pre/post benchmark tests
• Ayers and Junker (2006):
– Rasch & LLTM (linear decomps of item difficulty)
– Augmented with online testing metrics
• Pardos et al. (2006); Anozie (2006):
– Binary-skills conjunctive Bayes nets
– DINA models (Junker & Sijtsma, 2001; Maris, 1999; etc.)
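For concreteness, the assistance score from the list above, computed over hypothetical per-scaffold records (the real logs are richer):

```python
# "Assistance score" = (errors + hints) / (number of scaffolds).
def assistance_score(scaffold_records):
    """scaffold_records: one dict per scaffold with 'errors' and 'hints' counts."""
    if not scaffold_records:
        return 0.0
    needed = sum(r["errors"] + r["hints"] for r in scaffold_records)
    return needed / len(scaffold_records)

print(assistance_score([{"errors": 1, "hints": 2},
                        {"errors": 0, "hints": 0}]))   # -> 1.5
```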
The ASSISTment Architectures
• Extensible Tutor Architecture
  – Scalable from simple pseudo-tutors with few users to model-tracing tutors and 1000’s of users
  – Curriculum Unit
    • Items organized into multiple curricula
    • Sections within curriculum: Linear, Random, Experimental, etc.
  – Problem & Tutoring Strategy Units
    • Task organization & user interaction (e.g. main item & scaffolds, interface widgets, …)
    • Task components mapped to multiple transfer models
  – Logging Unit
    • Fine-grained human-computer interaction trace
    • Abstracting/coarsening mechanisms
• Web-based Item Builder
  – Used by classroom teachers to develop content
  – Support for building curricula, mapping tasks to transfer models, etc.
• Relational Database and Network Architecture supports
  – User Reports (e.g., students, teachers, coaches, administrators)
  – Research Data Analysis
• Razzaq et al. (to appear): overview
Two Assessment Goals
• To predict end-of-year MCAS scores
• To provide feedback to teachers (what to
teach next?)
• But there are some complications…
2004-2005 Data
• Tutoring tasks
– 493 main items
– 1216 scaffold items
• Students
– 912 eighth-graders in two middle schools
• Skills Models (Transfer Models / Q Matrices)
– 1 “Proficiency”: Unidimensional IRT
– 5 MCAS “strands”: Number/Operations, Algebra,
Geometry, Measurement, Data/Probability
– 39 MCAS learning standards: nested in the strands
– 77 active skills: “WPI April 2005” (106 potential)
Static Models:
Feng et al. (2006 & to appear)
• What is related to raw MCAS (0-54 pts)?
• P&P pre/post benchmark tests
• Online metrics:
  – Pct Correct on Mains
  – Pct Correct on Scaffolds
  – Seconds Spent on Incorrect Scaffolds
  – Avg Number of Scaffolds per Minute
  – Number of Hints Plus Incorrect Main Items
  – etc.
• All annual summaries

Predictor                        Corr
P & P Tests
  SEP-TEST                       0.75
  MARCH-TEST                     0.41
ASSISTment Online Metrics
  MAIN_PERCENT_CORRECT           0.75
  MAIN_COUNT                     0.47
  TOTAL_MINUTES                  0.26
  PERCENT_CORRECT                0.76
  QUESTION_COUNT                 0.20
  HINT_REQUEST_COUNT            -0.41
  AVG_HINT_REQUEST              -0.63
  HINT_COUNT                    -0.39
  AVG_HINT_COUNT                -0.63
  BOTTOM_OUT_HINT_COUNT         -0.38
  AVG_BOTTOM_HINT               -0.55
  ATTEMPT_COUNT                  0.08
  AVG_ATTEMPT                   -0.41
  AVG_QUESTION_TIME             -0.12
  AVG_ITEM_TIME                 -0.39
Static Models:
Feng et al. (2006 & to appear)
• Stepwise linear regression
• Mean Abs Deviation: within-sample MAD = 5.533
• Raw MCAS = 0-54, so within-sample Pct Err = MAD/54 = 10.25% (uses Sept P&P Test)

Predictor          Coefficient
(Const)              26.04
Sept_Test             0.64
Pct_Correct_All      24.21
Avg_Attempts        -10.56
Avg_Hint_Reqs        -2.28
Static Models:
Ayers & Junker (2006)
• Compared two IRT models on
ASSISTment main questions:
– Rasch model for the 354 main questions
– LLTM: constrained Rasch model that decomposes main-question difficulty by the skills in the WPI April transfer model (77 skills)
• Replace “Percent Correct” with IRT
proficiency score in linear predictions of
MCAS
Static Models:
Ayers & Junker (2006)
• Rasch fits much better than LLTM
  – ΔBIC = -3,300
  – Δdf = +277
• Attributable to
  – Transfer model?
  – Linear decomp of item difficulties?
• Residual and difficulty
plots suggest transfer
model fixes.
Static Models:
Ayers & Junker (2006)
• Focus on Rasch; predict MCAS with the linear model
  MCAS ≈ β0 + β1·θ + Σj γj·Yj,  where θ = proficiency and the Yj are online metrics
• 10-fold cross-validation vs. 54-pt raw MCAS:

Predictors             Variables   CV-MAD   CV % Error
% Corr Main            1           7.18     13.30
θ (proficiency)        1           5.90     10.93
θ + 5 Online Metrics   6           5.24      9.70
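A sketch of the 10-fold cross-validation scoring reported in the table: CV-MAD on the 54-point raw MCAS scale, and CV % Error = CV-MAD / 54. Data are simulated; `theta` stands in for the estimated Rasch proficiency:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(1)
n = 500
theta = rng.normal(size=n)                  # Rasch proficiency per student
online = rng.normal(size=(n, 5))            # 5 online metrics per student
X = np.column_stack([theta, online])
mcas = np.clip(27 + 8 * theta + rng.normal(scale=6, size=n), 0, 54)

# Out-of-fold predictions from 10-fold CV, then MAD on the raw score scale
pred = cross_val_predict(LinearRegression(), X, mcas, cv=10)
cv_mad = np.abs(mcas - pred).mean()
print(f"CV-MAD = {cv_mad:.2f}  CV % Error = {100 * cv_mad / 54:.2f}%")
```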
Static Models:
Pardos et al. (2006)
• Compared nested versions of binary-skills models (coded both ASSISTments and MCAS)
• Fixed gi = 0.10, si = 0.05 for all items; skill prevalence πk = 0.5 for all skills
• Inferred skills from ASSISTments; computed expected score for a 30-item MCAS subset (see the sketch after the table)

MODEL                 Mean Absolute Deviation (MAD)   % ERROR (30 items)
39 MCAS standards     4.500                           15.00
106 skills (WPI Apr)  4.970                           16.57
5 MCAS strands        5.295                           17.65
1 Binary Skill        7.700                           25.67
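A sketch of the expected-score step, using the fixed g = 0.10 and s = 0.05 above; the Q-matrix and skill posteriors are simulated, and skills are treated as independent for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
n_items, n_skills = 30, 39
Q = (rng.random((n_items, n_skills)) < 0.05).astype(int)   # toy Q-matrix
p_mastered = rng.random(n_skills)                          # inferred P(skill mastered)

# Independent-skills approximation: P(gate on) = product over required skills
p_gate = np.prod(np.where(Q == 1, p_mastered, 1.0), axis=1)
p_right = p_gate * (1 - 0.05) + (1 - p_gate) * 0.10        # DINA item probability
print(f"expected score on 30 items: {p_right.sum():.1f}")
```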
Static Models: Anozie (2006)
• Focused on 77 active skills in WPI April Model
• Estimated πk’s, gi’s and si’s using flexible priors
• Predicted full raw 54-pt MCAS score as a linear function of the (expected) number of skills learned

Months of Data   CV MAD   CV % Err
1                8.11     15.02
2                7.38     13.68
3                6.79     12.58
Dynamic Prediction Models
• Razzaq et al. (to appear): evidence of
learning over time
• Feng et al. (to appear): student or item
covariates plus linear growth curves (a la
Singer & Willett, 2003)
• Anozie and Junker (2006): changing
influence of online metrics over time
Dynamic Models:
Razzaq et al. (to appear)
[Figure: average percent correct on the system per student, plotted by month (Sept through Mar)]

• ASSISTment system is sensitive to learning
• Not clear what is the source of learning here…
Dynamic Models:
Feng et al. (to appear)
• Growth-Curve Model I: Overall Learning
School was a better predictor (BIC) than Class or Teacher;
possibly because School demographics dominate the intercept.
• Growth-Curve Model II: Learning in Strands
Sept_Test is a good predictor of baseline proficiency.
Baseline and learning rates varied by Strand.
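A minimal sketch of a growth-curve model of this type (random intercept and slope per student, in the Singer & Willett style; simulated data and hypothetical column names):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
rows = []
for student in range(100):
    base = rng.normal(25, 5)          # baseline percent correct
    rate = rng.normal(1.5, 0.5)       # learning rate (points per month)
    for month in range(7):
        rows.append({"student": student, "month": month,
                     "pct_correct": base + rate * month + rng.normal(scale=3)})
df = pd.DataFrame(rows)

# Random intercept and random slope on month, grouped by student
model = smf.mixedlm("pct_correct ~ month", df,
                    groups=df["student"], re_formula="~month")
print(model.fit().summary())
```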
Dynamic Models: Anozie and Junker (2006)

[Figure: estimated coefficients of the monthly online-metric summaries over time]
Dynamic Models:
Anozie and Junker (2006)
• Recent main question performance
dominates – proficiency?
Dynamic Models:
Anozie and Junker (2006)
• Older performance on scaffolds similar to
recent – learning?
Summary of Prediction Models
Model                      Variables      CV-MAD   CV % Error   CV-RMSE
PctCorrMain                1              7.18     13.30        8.65
#Skills of 77 learned      1?             6.63     12.58        8.62
Rasch Proficiency          1?             5.90     10.93        7.18
PctCorrMain + 4 metrics    35 (= 5 x 7)   5.46     10.10        6.56
Rasch Profic + 5 metrics   6?             5.24      9.70        6.46

• Feng et al. (in press) compute the split-half MAD of the MCAS and estimate ideal % Error ~ 11%, or MAD ~ 6 points.
• Ayers & Junker (2006) compute reliabilities of the ASSISTment sets seen by all students and estimate upper and lower bounds for optimal MAD: 0.67 ≤ MAD ≤ 5.21.
New Directions
• We have some real
evidence of learning
– We are not yet modeling
individual student learning
• Current teacher report:
For each skill, report
percent correct on all
items for which that skill
is hardest.
– Can we do better?
• Approaches now getting
underway:
– Learning curve models
– Knowledge-tracing models
New Directions:
Cen, Koedinger & Junker (2005)
• Inspired by Draney, Pirolli &
Wilson (1995)
– Logistic regression for
successful skill uses
– Random intercept (baseline
proficiency)
– fixed effects for skill and
skill*opportunity
• Difficulty factor: skill but not
skill*opportunity
• Learning factor: skill and
skill*opportunity
– Part of Data Shop at
http://www.learnlab.org
• Feng et al. (to appear) fit
similar logistic growth curve
models to ASSISTment items
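A minimal sketch of this kind of logistic regression on simulated data (the per-student random intercept is omitted for brevity; the skill term gives the difficulty factor, the skill-by-opportunity term the learning factor):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
rows = []
for student in range(50):
    ability = rng.normal()
    for skill in ["congruence", "perimeter"]:                 # hypothetical skills
        learn = {"congruence": 0.15, "perimeter": 0.35}[skill]  # learning rates
        for opp in range(8):                                  # practice opportunities
            logit = ability - 0.5 + learn * opp
            p = 1.0 / (1.0 + np.exp(-logit))
            rows.append({"student": student, "skill": skill, "opp": opp,
                         "correct": int(rng.random() < p)})
df = pd.DataFrame(rows)

# C(skill): "difficulty factor"; C(skill):opp: "learning factor"
fit = smf.logit("correct ~ C(skill) + C(skill):opp", df).fit(disp=0)
print(fit.params.filter(like="skill"))
```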
New Directions:
Knowledge Tracing
• Combine knowledge tracing approach of
Corbett, Anderson and O’Brien (1995) with DINA
model of Junker and Sijtsma (2001)
• Each skill represented by a two state
(unlearned/learned) Markov process with
absorbing state at “learned”.
• Can locate time during school year when each
skill is learned.
• Work just getting underway (Jiang & Junker).
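A sketch of the two-state update this implies, with illustrative parameter values (a reconstruction of the general technique, not Jiang & Junker's implementation):

```python
# Each skill is a two-state Markov chain (unlearned -> learned, with
# "learned" absorbing). After each response, update P(learned) by Bayes'
# rule, then apply the learning transition.
def kt_update(p_learned, correct, guess=0.10, slip=0.05, p_learn=0.15):
    if correct:
        evid = p_learned * (1 - slip)
        post = evid / (evid + (1 - p_learned) * guess)
    else:
        evid = p_learned * slip
        post = evid / (evid + (1 - p_learned) * (1 - guess))
    return post + (1 - post) * p_learn   # absorbing: once learned, stays learned

p = 0.30                                 # prior P(learned) at start of year
for t, correct in enumerate([0, 1, 1, 1], start=1):
    p = kt_update(p, correct)
    print(f"after response {t}: P(learned) = {p:.3f}")
```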
Discussion
• ASSISTment system
– Great testbed for online cognitive modeling and
prediction technologies
– Didn’t mention reporting and “gaming detection”
technologies
– Teachers positive, students impressed
• Ready-Fire-Aim
– Important! Got system up and running, lots of user
feedback & buy-in
– But… e.g., lack of control over content and content rollout (content balance vs MCAS?)
– Given this, perhaps only crude methods
needed/possible for MCAS prediction?
Discussion
• Multiple skill codings for different purposes
– Exam prediction vs. teacher feedback; state to state.
• Scaffolds
– Dependence between scaffolds and main items
– Forced-scaffolding: main right → scaffolds right
– Content sometimes skills-based, sometimes tutorial
• We are now building some true one-skill decomps to
investigate stability of skills across items
• Student learning over time
– Clearly evidence of that!
– Some experiments not shown here suggest modest
but significant value-added for ASSISTments
– Starting to model learning, time-to-mastery, etc.
References
Anozie, N. (2006). Investigating the utility of a conjunctive model in Q-matrix assessment using monthly student records in an online tutoring
system. Proposal submitted to the National Council on Measurement in Education 2007 Annual Meeting.
Anozie, N.O. & Junker, B. W. (2006). Predicting end-of-year accountability assessment scores from monthly student records in an online
tutoring system. American Association for Artificial Intelligence Workshop on Educational Data Mining (AAAI-06), July 17, 2006, Boston,
MA.
Ayers, E. & Junker, B.W. (2006). Do skills combine additively to predict task difficulty in eighth-grade mathematics? American Association for
Artificial Intelligence Workshop on Educational Data Mining (AAAI-06), July 17, 2006, Boston, MA.
Ayers, E. & Junker, B. W. (2006). IRT modeling of tutor performance to predict end of year exam scores. Working paper.
Corbett, A. T., Anderson, J. R., & O'Brien, A. T. (1995) Student modeling in the ACT programming tutor. Chapter 2 in P. Nichols, S. Chipman, &
R. Brennan, Cognitively Diagnostic Assessment. Hillsdale, NJ: Erlbaum.
Draney, K. L., Pirolli, P., & Wilson, M. (1995). A measurement model for a complex cognitive skill. In P. Nichols, S. Chipman, & R. Brennan,
Cognitively Diagnostic Assessment. Hillsdale, NJ: Erlbaum.
Feng, M., Heffernan, N. T., & Koedinger, K. R. (2006). Predicting state test scores better with intelligent tutoring systems: developing metrics to
measure assistance required. In Ikeda, Ashley & Chan (Eds.) Proceedings of the Eighth International Conference on Intelligent Tutoring
Systems. Springer-Verlag: Berlin. pp 31-40.
Feng, M., Heffernan, N., Mani, M., & Heffernan, C. (2006). Using mixed effects modeling to compare different grain-sized skill models. AAAI-06
Workshop on Educational Data Mining, Boston MA.
Feng, M., Heffernan, N. T., & Koedinger, K. R. (in press). Addressing the testing challenge with a web-based E-assessment system that tutors
as it assesses. Proceedings of the 15th Annual World Wide Web Conference. ACM Press (Anticipated): New York, 2005.
Cen, H., Koedinger, K., & Junker, B. (2005). Automating Cognitive Model Improvement by A* Search and Logistic Regression. In Technical Report
(WS-05-02) of the AAAI-05 Workshop on Educational Data Mining, Pittsburgh, 2005.
Junker, B.W. & Sijtsma K. (2001). Cognitive assessment models with few assumptions, and connections with nonparametric item response
theory. Applied Psychological Measurement 25: 258-272.
Maris, E. (1999). Estimating multiple classification latent class models. Psychometrika 64, 187-212.
Pardos, Z. A., Heffernan, N. T., Anderson, B., & Heffernan, C. L. (2006). Using Fine Grained Skill Models to Fit Student Performance with
Bayesian Networks. Workshop in Educational Data Mining held at the Eighth International Conference on Intelligent Tutoring Systems.
Taiwan. 2006.
Razzaq, L., Feng, M., Nuzzo-Jones, G., Heffernan, N.T., Koedinger, K. R., Junker, B., Ritter, S., Knight, A., Aniszczyk, C., Choksey, S., Livak,
T., Mercado, E., Turner, T.E., Upalekar. R, Walonoski, J.A., Macasek. M.A., & Rasmussen, K.P. (2005). The Assistment Project: Blending
Assessment and Assisting. In C.K. Looi, G. McCalla, B. Bredeweg, & J. Breuker (Eds.) Proceedings of the 12th International Conference on Artificial
Intelligence in Education. Amsterdam: IOS Press. pp 555-562.
Razzaq, L., Feng, M., Heffernan, N. T., Koedinger, K. R., Junker, B., Nuzzo-Jones, G., Macasek, N., Rasmussen, K. P., Turner, T. E. &
Walonoski, J. (to appear). A web-based authoring tool for intelligent tutors: blending assessment and instructional assistance. In Nedjah,
N., et al. (Eds). Intelligent Educational Machines within the Intelligent Systems Engineering Book Series (see http://isebis.eng.uerj.br).
Singer, J. D. & Willett, J. B. (2003). Applied Longitudinal Data Analysis: Modeling Change and Occurrence. Oxford University Press, New York.
Websites:
http://www.assistment.org
http://www.learnlab.org
http://www.educationaldatamining.org