Cox slides

Seeing how students really err: Data
mining the Grade Grinder corpus
Openproof Day 27/3/09
Richard Cox
Representation & Cognition Group
Department of Informatics
University of Sussex
richc@sussex.ac.uk
Talk plan
• The Context
• What We Want To Do
• A Closer Look at the Data
  – NL-FOL translation errors
  – NL-FOL mal-rules
  – Graphical translation
• Next Steps
Who's Involved
• Dave Barker-Plummer, CSLI, Stanford University
• Robert Dale,
Language, Proof and Logic
The LPL Courseware
• Tarski's World allows users to write sentences in first-order logic with a blocks-world interpretation; users can build worlds and examine how the truth of the sentences tracks changes in the world
• Grade Grinder accepts work and assesses it, returning a report on the work by email to the submitter and, optionally, to a specified instructor
Tarski’s World
A Translation Exercise
Tet(a) --> FrontOf(a,d)
Graphical checking
A current Grade Grinder report
EXERCISE-7.12.Sentences-7.12.error.1=*** Your first sentence, "FrontOf(a,d) --> Tet(a)", is not equivalent to any of the expected translations.
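The idea of a submission being "not equivalent to any of the expected translations" can be illustrated with a brute-force truth-table comparison. This is only a sketch under assumptions: it treats the two atomic sentences as independent propositional variables, and the helper names `implies` and `equivalent` are hypothetical. It does not describe how Grade Grinder itself checks answers.

```python
from itertools import product


def implies(p: bool, q: bool) -> bool:
    """Material conditional: p --> q."""
    return (not p) or q


def equivalent(f, g, n_atoms: int) -> bool:
    """True if f and g agree on every assignment of truth values to n_atoms atoms."""
    return all(f(*v) == g(*v) for v in product([False, True], repeat=n_atoms))


# Atoms: tet_a stands for Tet(a), front_a_d stands for FrontOf(a,d).
def expected(tet_a, front_a_d):
    return implies(tet_a, front_a_d)            # Tet(a) --> FrontOf(a,d)


def submitted(tet_a, front_a_d):
    return implies(front_a_d, tet_a)            # FrontOf(a,d) --> Tet(a)


def contrapositive(tet_a, front_a_d):
    return implies(not front_a_d, not tet_a)    # ~FrontOf(a,d) --> ~Tet(a)


print(equivalent(expected, submitted, 2))       # reversed conditional
print(equivalent(expected, contrapositive, 2))  # logically equivalent variant
```

The reversed conditional differs from the expected answer when a is a tetrahedron that is not in front of d, whereas the contrapositive agrees on every row.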
The Grade Grinder Dataset
The Grade Grinder
• can process solutions to 489 exercises in the LPL book
• has been used by more than 38,000 individual students over the last eight years, from around 100 institutions in around a dozen countries
• has assessed approximately 1.8 million individual submissions (each of which can contain zero or more exercises)
Research aims
• to characterise & taxonomise student errors and develop cognitive (information-processing) accounts of their origin, drawing on the human error, reasoning/problem solving and expertise development literatures
• to explore ‘educational data mining’ techniques for large TEL system datasets, an emerging area (www.educationaldatamining.org)
• to improve GG diagnosis and feedback, and add learner modelling
• to develop automatic natural language paraphrase capabilities that, given a student’s incorrect answer, can select and formulate a natural language expression that makes clear the difference between it and the correct answer
• to extend our approach to cases involving 3-way (NL, FOL, graphical) translations
Data Selection for Initial Exploration
• What we wanted: a natural language to FOL translation exercise that employs a TW blocks world and is of moderate difficulty, i.e. one that discriminates between students
• We computed the number of GG submissions per LPL exercise and rank-ordered them; Exercise 7.12 from Chapter 7 (which introduces conditionals) was selected
• 46,200 submitted solutions, of which 27,473 were erroneous (59%)
• The solutions were submitted by 4,912 students, an average of 5.6 erroneous sentences per student
• So Exercise 7.12 is an exercise of mid-range difficulty
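As a quick check on the figures above, the error rate and the per-student average can be recomputed from the three counts quoted on the slide. The variable names are illustrative; only the three input numbers come from the slide.

```python
# Counts quoted on the slide for Exercise 7.12.
submitted = 46_200   # submitted solutions
erroneous = 27_473   # erroneous solutions
students = 4_912     # students who submitted

error_rate = erroneous / submitted
errors_per_student = erroneous / students

print(f"error rate: {error_rate:.0%}")
print(f"erroneous sentences per student: {errors_per_student:.1f}")
```

Both derived values round to the figures given on the slide (59% and 5.6).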
Exercise 7.12: Sentences 1-10
1. If a is a tetrahedron then it is in front of d.
2. a is to the left of or right of d only if it's a cube.
3. c is between either a and e or a and d.
4. c is to the right of a, provided it (i.e., c) is small.
5. c is to the right of d only if b is to the right of c and left of e.
6. If e is a tetrahedron, then it's to the right of b if and only if it is also in front of b.
7. If b is a dodecahedron, then if it isn't in front of d then it isn't in back of d either.
8. c is in back of a but in front of e.
9. e is in front of d unless it (i.e., e) is a large tetrahedron.
10. At least one of a, c, and e is a cube.
Exercise 7.12: Sentences 11-20
11. a is a tetrahedron only if it is in front of b.
12. b is larger than both a and e.
13. a and e are both larger than c, but neither is large.
14. d is the same shape as b only if they are the same size.
15. a is large if and only if it's a cube.
16. b is a cube unless c is a tetrahedron.
17. If e isn't a cube, either b or d is large.
18. b or d is a cube if either a or c is a tetrahedron.
19. a is large just in case d is small.
20. a is large just in case e is.
Exercise 7.12, Sentence 1
• If a is a tetrahedron then it is in front of d.
  – Tet(a) --> FrontOf(a,d)
• 1547 incorrect answers out of 2181 submissions, in 201 different forms of error
• 10 incorrect forms account for 1062 of the 1547 (69%)
The first error taxonomy
• Logic Errors
  – Antecedent and Consequent Reversed (κ = 0.97)
• Substitution Errors
  – Incorrect Constants (κ = 0.98)
  – Incorrect Predicates (κ = 0.74)
  – Incorrect Connectives (κ = 0.94)
• Syntactic Errors
  – Paren Errors (κ = 0.76)
  – Case Errors (κ = 0.66)
  – Arity Errors (κ = 1.0)
• Other Errors (κ = 0.52)
Revised error taxonomy
• Structural errors
  – antecedent/consequent reversal: FrontOf(a,d) --> Tet(a)
  – added exclusivity: (A V B) & ~(A & B) instead of just (A V B)
• Connective errors
  – using one connective in place of another
  – --> for <-->, V for &
• Atomic errors
  – predicate errors: FrontOf(a,b) for SameShape(a,b)
  – argument errors: A for a, b for d
Automated coding
• pattern matching based on regular expressions
• classifier identifies fragments of the submitted answer that correspond to atomic subformulae of the submitted sentence
• on average, 85% of submissions correctly classified
  – range: from 62% (S7: "If b is a dodecahedron, then if it isn't in front of d then it isn't in back of d either.")
  – to 99% (S11: "a is a tetrahedron only if it is in front of b")
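A minimal sketch of what regex-based coding of this kind might look like, assuming answers are written with the ASCII arrow `-->` and consist of a single conditional between two atomic subformulae. The patterns, the function names and the label strings are all hypothetical illustrations, not the classifier used in the study.

```python
import re

# Hypothetical pattern for an atomic subformula such as Tet(a) or FrontOf(a,d).
ATOM = r"[A-Za-z]+\([A-Za-z](?:,[A-Za-z])*\)"
# A single top-level conditional between two atoms, after whitespace removal.
CONDITIONAL = re.compile(rf"^({ATOM})-->({ATOM})$")


def normalise(formula: str) -> str:
    """Remove all whitespace so that 'FrontOf(a, d)' matches 'FrontOf(a,d)'."""
    return re.sub(r"\s+", "", formula)


def classify(submitted: str, expected: str) -> str:
    """Return a coarse, illustrative error label for a submitted conditional."""
    sub = CONDITIONAL.match(normalise(submitted))
    exp = CONDITIONAL.match(normalise(expected))
    if sub is None or exp is None:
        return "unclassified"
    if sub.groups() == exp.groups():
        return "correct"
    if sub.groups() == exp.groups()[::-1]:
        return "antecedent/consequent reversal"
    if tuple(g.lower() for g in sub.groups()) == tuple(g.lower() for g in exp.groups()):
        return "case error"
    return "other"


print(classify("FrontOf(a,d) --> Tet(a)", "Tet(a) --> FrontOf(a,d)"))
```

A real coder would need many more patterns (nested conditionals, other connectives, parenthesis errors), which may be one reason accuracy varies so much between sentences.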
This talk: focus on three high-prevalence error types
• 1. sensitivity to word-order effects in posed NL sentence
  – (26%, count=25084)
• 2. distinguishing the conditional (-->) from the biconditional (<-->)
  – bicon for con (18%, n=17518)
  – con for bicon (12%, n=11362)
• 3. errors concerning naming of constants
  – 4 of the top 10 errors on some sentences, e.g. S10
  – Cube(a) V Cube(b) V Cube(c) instead of Cube(a) V Cube(c) V Cube(e)
Error 1/3: Two kinds of sentence
• 1. correct FOL preserves word order in posed NL sentence
  – S1 If a is a tet then it is in front of d
  – Tet(a) --> FrontOf(a,d)
• 2. correct FOL does not preserve NL word order
  – S4 c is to the right of a, provided it (c) is small
  – Small(c) --> RightOf(c,a)
• 12/20 sentences involve implication (-->)
  – 8/12 preserve word order
  – 4/12 do not
• interesting to compare sentences 1 and 4, as they are matched in syntactic complexity of the FOL translation (in terms of predicates, arity, connectives, arguments,...)
Error 1/3: Students & professors
• Clement, Lochhead & Monk (1981) studied translation from NL to algebraic expressions
• “Write an equation: ‘There are six times as many students as professors at this university.’ Use S for number of students and P for number of professors”
• Of 150 ‘calculus level’ students, 37% wrote 6S=P (the correct equation is S=6P)
• 57% in another sample
• Two postulated sources of error
  – word-order matching (‘superficial, syntactic’)
  – static comparison: ‘S’ is used not as a variable but as a label attached to a number, i.e. 6S=P is used to represent 6S:P, a wrongly written ratio (‘deep seated intuitive symbolization strategy’)
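A small numeric check makes the reversal concrete. The figure of 10 professors is an arbitrary illustrative value, not data from the study.

```python
# Illustrative value only: any positive number of professors works.
professors = 10
students = 6 * professors  # "six times as many students as professors"

correct_holds = (students == 6 * professors)   # S = 6P
reversed_holds = (6 * students == professors)  # 6S = P, the common wrong answer

print(correct_holds, reversed_holds)
```

With 10 professors and 60 students, S=6P holds while 6S=P would require 360 professors.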
Error 1/3: Two kinds of sentence
• 1. correct solution preserves word order in posed NL sentence
  – S1 If a is a tet then it is in front of d
  – Tet(a) --> FrontOf(a,d)
  – 43% antecedent and consequent reversals
• 2. correct solution does not preserve NL word order
  – S4 c is to the right of a, provided it (c) is small
  – Small(c) --> RightOf(c,a)
  – 66% antecedent and consequent reversals
• t = 2.23, df = 10, p < .05
• error probably conflates word-order effect with -->/<--> confusion, which brings us to...
Error 2/3: <--> for --> substitution
• S1 If a is a tetrahedron then it is in front of d
• S11 a is a tetrahedron only if it is in front of b
• 6 error types shared:
  – antecedent/consequent reversal
  – <-->/--> not distinguished
  – argument reversal
  – incorrect arguments
  – incorrect predicates
  – arity error
• For S1, <--> for --> ranked 4/6 and accounted for 12% of the errors in 1361 solutions
• For S11, <--> for --> ranked 1/6 and accounted for 58% of 8981 solutions
• hence the different surface structures of the two NL forms elicit very different error patterns…
Error 2/3: Biconditional for conditional errors
across all 20 sentences...
Students clearly have most difficulty with the ‘only if’ NL form of the conditional, probably due to its similarity to the ‘if and only if’ NL form of the biconditional (iff, <-->, ‘just in case’, ‘if and only if’)
Error 2/3: <--> for --> substitution
• LPL textbook: “..the expression only if introduces a necessary condition...” “you will pass the course only if you hand in all the homework”
• In NL, conditionals are naturally interpreted as biconditionals: “if you read this, I’ll buy you lunch” carries a heavy hint that ‘no reading, no lunch’... (Stenning & van Lambalgen, 2001; Geis & Zwicky, 1971)
• exemplifies negative interference between NL and other (formal) language systems: cooperative vs adversarial stances differ (cf. Grice). This applies to quantifiers too, e.g. ‘some’ in logic includes the possibility of ‘all’, but not so in NL. A major source of difficulty for students of formal subjects...
Error 3/3: Errors in naming of constants
• S10 At least one of a, c and e is a cube
  – Cube(a) V Cube(c) V Cube(e)
• in general we note that this error interacts with:
  – use of constant ‘a’ first in sentence
  – when ‘a’ is the first constant mentioned in sentence
  – constants used in sentence are ‘gappy’ <b,e,d> rather than alphabetically adjacent <a,b,c>
Error 3/3: Errors in naming of constants
N = no. of sentences
• a slip of the ‘capture error’ variety
  – (Norman, 1988; Reason, 1990)
Mal-rules
• Attempt to go beyond superficial error pattern description… link NL forms to error forms
• Again, considered Ex 7.12 sentences involving conditional forms (15/20)
• small number of mal-rules account for most of the data for every NL form
Mal-rules
• small number of mal-rules account for most of the data for every NL form
  – cf. algebra, arithmetic, programming
• except….
Mal-rules
• ‘if…then’
  – When conditional is at the top level
    • If a is a tetrahedron then it is in front of d.
    – if A then B, B ! A (46%)
    – if A then B, B " A (40%)
  – but within an embedded clause
    • If b is a dodecahedron, then if it isn't in front of d then it isn't in back of d either.
    – if A then B, A ! B (63%)
  – complex misunderstanding…
Mal-rules
• Implications for GG feedback to students
• Feedback needs to be context-sensitive
  – B ! A when translating only if
    • emphasise that the only if form of the conditional implies a necessary condition which, unlike the biconditional, does not have the property of commutativity.
  – B ! A when translating just in case
    • emphasise the restricted meaning that just in case has in logic compared to everyday language
Comparing logic & graphical translations
• This time we considered a similar kind of exercise (7.15)
• Requires NL->FOL translation, but the student also builds a ‘blocks world’ representation
• Again, chose an exercise of medium difficulty...
• Focus again on conditionals
• 2000-2007
  – 29,500 submissions
  – 18,609 erroneous
  – 7,918 missing a blocks world diagram
  – 372 lacking any FOL sentences
  – 10,319 with both FOL sentences and a diagram
  – from 5,176 different students
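The three sub-counts above appear to partition the erroneous submissions, which a short check confirms. Reading them as a breakdown of the 18,609 erroneous submissions is an inference from the arithmetic, not something the slide states; the dictionary keys simply repeat the slide's wording.

```python
# Counts quoted on the slide for Exercise 7.15 (2000-2007).
submissions = 29_500
erroneous = 18_609
breakdown = {
    "missing a blocks world diagram": 7_918,
    "lacking any FOL sentences": 372,
    "both FOL sentences and a diagram": 10_319,
}

# The three categories sum exactly to the number of erroneous submissions.
breakdown_total = sum(breakdown.values())
error_rate = erroneous / submissions

print(breakdown_total == erroneous)
print(f"error rate: {error_rate:.0%}")
```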
Exercise 7.15
FOL translations
Error rates
Flow of information through deduction
• start with S4
• combine with S1
• when b is known..
• combine with S2
• now S3 (a & c are Tets)
• from S4 we know c is not large
• so a is large (S3)
• 84% get this far...
• Tet(a) & Tet(b) & Tet(c) & Large(a)
• next is S7....
Flow of information through deduction
• Tet(a) & Tet(b) & Tet(c) & Large(a)
• S7... d is a small dodec unless a is small
<--> for --> error frequencies (Ex 7.12)
10 most frequent S7 errors
S7: Graphical errors involving d
• size 2346 (92%)
• shape 623 (24%)
• both 417 (16%)
• most FOL refers to both the size and shape of d
Graphic shows error that FOL conceals...
• possibly a scoping error?
  – “d is a small dodecahedron unless a is small”
  – conclusion: d is not small
  – d is (not a small) dodec
  – rather than
  – d is not (a small dodec)
• Working memory overload?
Difference between tasks?
• NL -> FOL is a translation task (one sentence at a time, sentences treated separately)
  – Immediate inference (Newstead)
• NL -> FOL -> BW is a production task
  – construct BW, integrate information across S1 – S12
  – Full inference (deduction)
Current & Future work
• 1. pursue syntactic/graphical error ‘asymmetry’
• 2. analysis of students’ submission trajectories
  – cf. difficulty measures: error rate & stickiness
  – Individual differences
• 3. LPL and KR
  – Characterise tasks and resources
• 4. OpenFace
1. pursue syntactic/graphical
error ‘asymmetry’
Ideas: Aaron Kalb
  – Related to part of speech?
• Refutation of ‘The Taj Mahal is a large animal’
  – Not ‘You’re wrong, the Taj Mahal is not a large animal’
  – Rather ‘Dude, the Taj Mahal is not an animal’
  – Adjective drops out …
• Study further via concurrent verbal ‘think aloud’ protocols… method/data triangulation
Expressiveness
• also, graphical representations are less expressive than linguistic representations
• cf ‘triangle’ and
Visual interference effect (Knauff)?
• Visual information, when not relevant to the reasoning at hand, can have negative effects on reasoning
• Not so true for spatial information
• Implications for abstract mental rule vs mental model theories…
2. analysis of students’
submission trajectories
Individual student’s submissions over time
Number of attempts it takes for a student to determine a correct answer once they have made their initial mistake.
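One way to make this measure concrete is sketched below. It assumes each student's history for a sentence is available as an ordered list of correct/incorrect outcomes; that data layout, the function name `attempts_to_correct` and the choice to count the eventual correct submission itself are all assumptions for illustration.

```python
from typing import Optional, Sequence


def attempts_to_correct(outcomes: Sequence[bool]) -> Optional[int]:
    """Count submissions made after the first mistake, up to and including
    the first correct one. `outcomes` is in submission order (True = correct).

    Returns None if the student never erred or never reached a correct answer.
    """
    outcomes = list(outcomes)
    try:
        first_error = outcomes.index(False)
    except ValueError:
        return None  # no mistake was ever made
    for offset, is_correct in enumerate(outcomes[first_error + 1:], start=1):
        if is_correct:
            return offset
    return None  # still incorrect at the last recorded submission


# A student who errs on the 2nd submission and recovers on the 4th.
print(attempts_to_correct([True, False, False, True]))
```

In the example the student needs two further attempts after the first mistake.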
3. LPL and KR
• Resources
  – NL
  – Graphical
• Tasks
  – Agency
  – Modality(ies)
• Deliverables
  – Ex. solutions
3. LPL and KR
• structure GG corpus in terms of the learning tasks posed to the learner
• structure of the logic curriculum (i.e., a hierarchy of conceptual pre- and co-requisites)
  – content analysis, various texts
  – Enderton, Hodges, Copi, Gamut, Hurley, LPL,…
  – KE domain experts…
4. OpenFace – sharing with EDM community
• user-friendly web-based front-end
• facilitate data filtering, sharing and re-use
• network with new & growing EDM community
• users will be encouraged to ‘grow’ the resource by submitting the results of their analyses, and ancillary materials
Future work
• Implications
  – revise FOL curriculum
  – enrich Grade Grinder’s feedback to students by providing it in the form of natural language
  – Your solution (incorrect):
    • Lib(j) V Home(j)
  – What your solution means:
    • John is either at home or at the library, or both.
    • (challenge is to minimise ‘paraphrase distance’)
• Mal-rule analyses show that context sensitivity is required!
Paraphrase ‘Distance’ From Source
• [Home(john) V Home(mary)] & ¬[Home(john) & Home(mary)]
  – Either John is home or Mary is home and it’s not the case that John is home and Mary is home
  – Either John or Mary is home and it’s not the case that John and Mary are both home
  – Either John or Mary is home but it’s not the case that John and Mary are both home
  – Either John or Mary is home but it’s not the case that both of them are home
  – Either John or Mary is home but not both
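All five paraphrases are meant to express the same formula, and the shortest one ("but not both") is simply exclusive-or. A brute-force truth table confirms this; the function name `source_formula` is illustrative.

```python
from itertools import product


def source_formula(john_home: bool, mary_home: bool) -> bool:
    """[Home(john) V Home(mary)] & not [Home(john) & Home(mary)]."""
    return (john_home or mary_home) and not (john_home and mary_home)


# Enumerate all four assignments and compare with exclusive-or (j != m).
rows = [(j, m, source_formula(j, m)) for j, m in product([True, False], repeat=2)]
matches_xor = all(value == (j != m) for j, m, value in rows)

for j, m, value in rows:
    print(f"john={j!s:5} mary={m!s:5} formula={value}")
print("same as 'one but not both':", matches_xor)
```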
End
• Acknowledgements:
  – Joint Social Science Research Council (US) & ESRC TLRP (UK) ‘Visiting Americas’ Fellowship to Cox
  – Australian Research Council support to Dale
• Albert Liu, Michael Murray, Dawit Meles, Aaron Kalb (CSLI)