The Andes Intelligent Tutoring System: Five years of evaluations

The Andes Intelligent Tutoring System:
Five years of evaluations
Kurt VanLehn
Pittsburgh Science of Learning Center (PSLC)
University of Pittsburgh
The physics LearnLab course committee
• Andes development
– Anders Weinstein
– Brett van de Sande
– Kurt VanLehn (co-chair)
• U.S. Naval Academy
– Don Treacy (co-chair)
– Bob Shelby
– Mary Wintersgill
– Kay Schulze
• Experimenters
– Scotty Craig
– Sandy Katz
– Bob Hausmann
– Michael Ringenberg
• Meet weekly
– Thursdays, 3:30
Funding
• The U.S. Office of Naval Research, Cognitive Science Program
• The U.S. National Science Foundation, Pittsburgh Science of Learning Center
Research question
• Given
– Whole semester of instruction
– No change to content of course
– No change to lectures, labs, assignments
– Standard exams (not designed by experimenters)
• Can a homework helper increase learning?
Prior work with answer-only tutoring
• Web-based homework grading systems
– E.g., WebAssign, CAPA, Mastering Physics
– Provide feedback & hints on the answer only
• Compared to ordinary paper-based homework
– Positive benefits
• When paper-based homework is collected & graded
– No benefits (Pascarella, 2002; Dufresne, Mestre & Rath, 2002)
• Interpretation
– Motivating students to do their homework provides benefits, but the answer-only tutoring system provides no additional benefits
Prior work with tutoring systems that give feedback & hints on steps
• Lisp Tutor (Corbett, 2001) and many others
– Same homework problems & text
– Experimenter’s exams only
– But not a whole semester (only 5 lessons)
• Pump curriculum + Pat tutor (Koedinger et al.)
– Whole year of high-school algebra
– Both experimenter’s exams & standard exams
– But content confounded with tutoring system
• Earlier evaluations of Andes
– First half-semester only
– Experimenter’s exams only
Why does it matter?
• Ideally, an intelligent homework helper…
– can increase learning without changing the course, and
– the increase is strong enough to show in the final exam
» The diligent always do well & slackers always do poorly
» Cramming
• If not…
– still useful if it facilitates content upgrades, and
– the upgrades cause robust increases in learning
Outline
• Andes (next)
• Evaluation
• Discussion
What kind of physics?
• US university introductory physics courses
• US high school advanced physics courses
• A typical problem:
A 2000 kg car at the top of a 20 degree inclined driveway 20 m long slips its parking brake and rolls down. If we ignore friction and drag, what is the magnitude of the velocity of the car when it hits the garage door?
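As a worked check on the typical problem above: with friction and drag ignored, energy conservation gives m g L sin(θ) = ½ m v², so v = sqrt(2 g L sin θ) and the mass cancels. The sketch below assumes g = 9.8 m/s², that the car starts from rest, and that it travels the full 20 m before reaching the garage door; none of these is stated on the slide.

```python
import math

def speed_at_bottom(length_m, incline_deg, g=9.8):
    """Speed after rolling from rest down a frictionless incline."""
    # Vertical drop over the length of the driveway.
    height = length_m * math.sin(math.radians(incline_deg))
    # m*g*h = 0.5*m*v**2, so the mass drops out.
    return math.sqrt(2 * g * height)

v = speed_at_bottom(20, 20)
print(f"{v:.1f} m/s")  # about 11.6 m/s
```

The 2000 kg mass never enters the calculation, which is typical of the conceptual point such problems test.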
Andes user interface
Read a physics problem
Draw vectors
Type in equations
Type in answer
Andes feedback and hints
“What should I do next?”
“What’s wrong with that?”
Green means correct
Red means incorrect
Dialogue & hints
Major challenges
• Dealing with equations
– Giving red/green feedback
– Undoing algebraic combination
» For “what should I do next?”
– Analyzing errors in equations
• Scale-up
– 13 chapters, 500 textbook pages
– 350+ problems
– 300+ principles
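The slides do not describe how Andes decides whether a typed equation is green or red. Purely as an illustration, and not as Andes's actual algorithm, one simple approach is to substitute the known solution values for a problem into the student's equation and check that both sides agree numerically. The function name `is_green`, the tolerance, and the variable names below are all assumptions.

```python
import math

def is_green(lhs, rhs, solution, rel_tol=1e-3):
    """Return True if lhs == rhs holds at the problem's known solution values.

    lhs and rhs are functions of a dict of variable values (hypothetical API).
    """
    return math.isclose(lhs(solution), rhs(solution), rel_tol=rel_tol)

# Known values for the car-on-a-driveway problem (g = 9.8 m/s^2 assumed).
solution = {"m": 2000.0, "g": 9.8, "h": 20 * math.sin(math.radians(20))}
solution["v"] = math.sqrt(2 * solution["g"] * solution["h"])

# Student equation 1: m*g*h = 0.5*m*v^2 (correct energy conservation).
correct = is_green(lambda s: s["m"] * s["g"] * s["h"],
                   lambda s: 0.5 * s["m"] * s["v"] ** 2, solution)
# Student equation 2: m*g*h = m*v^2 (missing the factor 1/2).
wrong = is_green(lambda s: s["m"] * s["g"] * s["h"],
                 lambda s: s["m"] * s["v"] ** 2, solution)
```

A check at a single point like this can accept an incorrect equation that happens to balance by coincidence, and it says nothing about which principle the student intended, which is part of why analyzing errors in equations is listed as a separate challenge.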
Outline
• Andes
• Evaluation (next)
– Method
– Main results
– Which students benefited?
– Which knowledge benefited?
– Interpretation of results
• Discussion
Evaluations of Andes at the US Naval Academy
• Fall semesters 2000, 2001, 2002 & 2003
• Only the homework modality was varied: Andes vs. paper-based
– Same textbook
– Similar lectures, labs, recitations
– Similar homework problems
– Same exams
• Students were motivated to do paper-based homework
– Either collected and graded
– Or 1 homework problem on each quiz
Exams
• Midterm exam
– 1 hour, 4 problems
– Scored on derivation & answer
» Drawings (30%)
» Variable definitions (20%)
» Equations (40%)
» Answers (10%)
• Final exam
– 3 hours, 50 problems
– Multiple choice
Checking prior competence of Andes and control students
• Grade-point averages equal
• Distribution of majors equal
– Engineering majors vs. science majors vs. other majors
Midterm exam results
(All differences reliable, p < .01)
[Bar chart: midterm exam scores (axis 50 to 75) for Control vs. Andes in 2000, 2001, 2002 and 2003]
How to calculate effect size?
Calculating effect size over 4 different midterm exams
• Normalize each score
z_score(student) = [raw_score(student) - mean(exam)] / standard_deviation(exam)
• For each condition, pool z-scores across years
• Effect size = 0.61
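The normalize-then-pool procedure above can be sketched as follows. The slide does not spell out the final step, so this sketch assumes the effect size is the difference between the pooled mean z-scores of the two conditions (z-scores have a standard deviation of about 1, so no further division is applied). It also assumes each exam is normalized over all students who took it and that the population standard deviation is used. The scores are made up for illustration and are not data from the study.

```python
from statistics import mean, pstdev

def z_scores(raw_scores):
    """Normalize one exam's raw scores: (score - exam mean) / exam SD."""
    mu = mean(raw_scores)
    sigma = pstdev(raw_scores)  # population SD: an assumption
    return [(s - mu) / sigma for s in raw_scores]

def pooled_effect_size(exams):
    """exams: list of (andes_scores, control_scores), one pair per year."""
    andes_z, control_z = [], []
    for andes, control in exams:
        z = z_scores(andes + control)     # normalize against the whole exam
        andes_z.extend(z[:len(andes)])    # pool z-scores across years
        control_z.extend(z[len(andes):])
    return mean(andes_z) - mean(control_z)

# Made-up scores for two hypothetical exams (NOT data from the study).
toy_exams = [
    ([70, 80, 90], [60, 70, 80]),
    ([55, 65], [50, 60]),
]
effect = pooled_effect_size(toy_exams)
```

Normalizing within each exam before pooling is what lets four different exams, with different difficulty and spread, contribute to a single number.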
Final exam
• Exam covers 100% of course, but Andes didn’t
– Does now
• Use 2003 exam only; Andes covered 70%
– 89 Andes students
– 823 non-Andes students
Prior competence not equal
• Majors not equally distributed
– Andes group had more engineering majors
• GPAs not equally distributed
– Andes group had marginally higher GPAs
• Factor out prior competence statistically
– For each major, regress final exam score on GPA
– Residual_score(student) = raw_score(student) - predicted_score(student’s major, student’s GPA)
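The residual-score adjustment above can be sketched as a separate least-squares line per major, predicting final exam score from GPA. The slide does not say which regression model was used, so a simple straight-line fit is an assumption here, as are the function names and the tuple layout. The students below are made up for illustration and are not data from the study.

```python
from collections import defaultdict
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def residual_scores(students):
    """students: list of (major, gpa, raw_score) tuples (hypothetical layout)."""
    by_major = defaultdict(list)
    for major, gpa, score in students:
        by_major[major].append((gpa, score))
    # One regression line per major: exam score predicted from GPA.
    fits = {major: fit_line([g for g, _ in rows], [s for _, s in rows])
            for major, rows in by_major.items()}
    residuals = []
    for major, gpa, score in students:
        slope, intercept = fits[major]
        residuals.append(score - (slope * gpa + intercept))
    return residuals

# Made-up students (NOT data from the study).
toy_students = [
    ("engineering", 2.0, 50), ("engineering", 3.0, 60), ("engineering", 4.0, 76),
    ("other", 2.0, 40), ("other", 4.0, 60),
]
residuals = residual_scores(toy_students)
```

A positive residual means a student scored higher than their major and GPA would predict, so comparing mean residuals between conditions removes the advantage the Andes group had from more engineering majors and slightly higher GPAs.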
Final exam results
Difference is reliable (p = 0.028)
Effect size = 0.25
[Bar chart: Control vs. Andes; vertical axis from -1 to 2.5]
Outline
• Andes
• Evaluation
– Method
– Main results
– Which students benefited? (next)
– Which knowledge benefited?
– Interpretation of results
• Discussion
Benefits same regardless of GPA
[Scatter plot: z-score on exam (axis -3 to 3) vs. GPA (axis 1 to 4), with a linear fit for each condition]
Andes: y = 0.9473x - 2.4138, R² = 0.2882
Controls: y = 0.7956x - 2.5202, R² = 0.2048
Benefits varied by major on final exam but not on midterm exam
[Two bar charts by major (Engineers, Scientists, Others). Midterm exam results: Control vs. Andes. Final exam results: Non-Andes vs. Andes]
Outline
• Andes
• Evaluation
– Method
– Main results
– Which students benefited?
– Which knowledge benefited? (next)
– Interpretation of results
• Discussion
Effect sizes for subscores of midterm exam
[Bar chart: effect size (axis -1 to 1.5) for each subscore: Drawings, Variables, Equations, Answers]
Interpretation of results
• Engineering & science majors learned the red path and prefer it
– Andes does not increase their final exam scores
• They use the blue path on the midterm
– Andes increases their midterm exam scores
• Other majors do not have the red path, so they use the blue path on both exams
– Andes increases both exams’ scores
• On midterm exams, subscores measure components of the blue path separately
– Biggest benefit for diagrams & variables
– Smaller on equations; none on answer
[Diagram: red and blue paths from Problem to Answer. Labels: Problem, Diagram & variables, Equations, Answer, Andes, Prior physics, Prior math & physics]
Summary of results
• Main result: Andes provides benefits
– Midterm exam effect size: 0.61
– Final exam effect size: 0.25
• Andes helps students learn conceptual skills
– Effect sizes on conceptual subscores: 1.21 & 0.69
– Effect sizes on calculational subscores: 0.11 & -0.08
• Some students appear to have a non-conceptual method for solving problems
– Competes with the conceptual method taught by Andes
– They use it on the (answer-only) final exam
– This dilutes the benefit of Andes on the final exam
Outline
• Andes
• Evaluation
• Discussion
– Andes compared to others (next)
– Why is Andes effective?
Effect sizes on experimenter’s & standard exams of 3 tutoring systems
[Bar chart: effect sizes (axis 0 to 1.4) for Lisp, Pump+Pat and Andes on three exam types: Experimenter's 1, Experimenter's 2, Standard]
Interpretation of the comparison with other tutors
• Andes is about the same as other tutoring systems that give feedback and hints on steps
• Perhaps the Pump+Pat benefits are due solely to the tutoring system and not the content upgrade
Summary: Studies of homework helpers when content is controlled
• Ordinary paper-based homework → motivated paper-based homework: large benefits
• Motivated paper-based homework → feedback & hints on answer only: no benefits
• Feedback & hints on answer only → feedback & hints on steps: large benefits
Outline
• Andes
• Evaluation
• Discussion
– Andes compared to others
– Why is feedback & hints on steps so effective? (next)
Hypothesis: Andes increases the number of successful knowledge events
• Without feedback & hints on steps, students skip them
– Guess
– Copy similar example’s step & edit
– Copy & edit a higher goal’s outcome
• Doing a step correctly requires
– Figuring out how the first time (sense-making)
– Figuring out why the second & third times (refinement)
– Recalling why & how the other times (fluency building)
• This increases the number of successful knowledge events
– Wherein a student constructs or applies a knowledge component
Thanks for your attention!
• At www.andes.pitt.edu
– Download stand-alone version of Andes
– Try OLI version of Andes
– Download papers on Andes
• Sorry, but Andes only runs on Windows