REP Seeing how students really err: Data mining the Grade Grinder corpus Openproof Day 27/3/09 Richard Cox Representation & Cognition Group Department of Informatics University of Sussex richc@sussex.ac.uk 1 Talk plan •! The Context •! What We Want To Do •! A Closer Look at the Data –!NL-FOL translation errors –!NL-FOL mal-rules –!Graphical translation •! Next Steps Openproof Day 27/3/09 2 Who's Involved •! Dave Barker-Plummer, CSLI, Stanford University •! Robert Dale, Openproof Day 27/3/09 3 Language, Proof and Logic Openproof Day 27/3/09 4 The LPL Courseware •! Tarski's World allows users to write sentences in firstorder logic with a blocks-world interpretation; users can build worlds and examine how the truth of the sentences tracks changes in the world •! Grade Grinder accepts work and assesses it, returning a report on the work by email to the submitter, and optionally a specified instructor Openproof Day 27/3/09 5 Tarski’s World Openproof Day 27/3/09 6 A Translation Exercise 7 Openproof Day 27/3/09 Tet(a) ! FrontOf(a,d) Graphical checking Openproof Day 27/3/09 8 A current Grade Grinder report EXERCISE-7.12.Sentences-7.12.error.1=*** Your first sentence, "FrontOf(a,d)! Tet(a)", is not equivalent to any of the expected translations. Openproof Day 27/3/09 9 The Grade Grinder Dataset The Grade Grinder •! can process Solutions to 489 exercises in the LPL book •! has been used by more than 38,000 individual students over the last eight years, from around 100 institutions in around a dozen countries •! has assessed approximately 1.8 million individual submissions (each of which can contain zero or more exercises) Openproof Day 27/3/09 10 Talk progress •! The Context •! What We Want To Do •! A Closer Look at the Data –!NL-FOL translation errors –!Mal-rules –!Graphical translation •! Current & Future work Openproof Day 27/3/09 11 Research aims •! to characterise & taxonomise student errors and develop cognitive (information processing) accounts of their origin - appeal to human error, reasoning / problem solving, expertise development literatures Openproof Day 27/3/09 12 Research aims •! to characterise & taxonomise student errors and develop cognitive (information processing) accounts of their origin - appeal to human error, reasoning / problem solving, expertise development literatures •! explore ‘educational data mining’ techniques for large TEL system datasets - emerging area (www.educationaldatamining.org) Openproof Day 27/3/09 13 Research aims •! to characterise & taxonomise student errors and develop cognitive (information processing) accounts of their origin - appeal to human error, reasoning / problem solving, expertise development literatures •! explore ‘educational data mining’ techniques for large TEL system datasets - emerging area (www.educationaldatamining.org) •! to improve GG diagnosis and feedback, add learner modelling Openproof Day 27/3/09 14 Research aims •! to characterise & taxonomise student errors and develop cognitive (information processing) accounts of their origin - appeal to human error, reasoning / problem solving, expertise development literatures •! explore ‘educational data mining’ techniques for large TEL system datasets - emerging area (www.educationaldatamining.org) •! to improve GG diagnosis and feedback, add learner modelling •! to develop automatic natural language paraphrase capabilities that, given a student’s incorrect answer, are able to select and formulate an appropriate natural language expression that makes clear the difference between this and the correct answer Openproof Day 27/3/09 15 Research aims •! to characterise & taxonomise student errors and develop cognitive (information processing) accounts of their origin - appeal to human error, reasoning / problem solving, expertise development literatures •! explore ‘educational data mining’ techniques for large TEL system datasets - emerging area (www.educationaldatamining.org) •! to improve GG diagnosis and feedback, add learner modelling •! to develop automatic natural language paraphrase capabilities that, given a student’s incorrect answer, are able to select and formulate an appropriate natural language expression that makes clear the difference between this and the correct answer •! to extend our approach to cases involving 3-way (NL, FOL, graphical) translations Openproof Day 27/3/09 16 Talk progress •! The Context •! What We Want To Do •! A Closer Look at the Data –!NL-FOL translation errors –!Mal-rules –!Graphical translation •! Future work Openproof Day 27/3/09 17 Data Selection for Initial Exploration •! What we wanted: a natural language to FOL translation exercise that employed a TW blocks world and which is of moderate difficulty, i.e. one that discriminates between students •! We computed the number of GG submissions per LPL exercise and rank ordered them; Exercise 7.12 from Chapter 7 (which introduces conditionals) was selected •! 46,200 submitted solutions, of which 27,473 were erroneous (59%) •! The solutions were submitted by 4912 students representing an average of 5.6 erroneous sentences per student •! So, Exercise 7.12 represents an exercise of mid-range difficulty Openproof Day 27/3/09 18 Exercise 7.12: Sentences 1-10 1.! 2.! 3.! 4.! 5.! If a is a tetrahedron then it is in front of d. a is to the left of or right of d only if it's a cube. c is between either a and e or a and d. c is to the right of a, provided it (i.e., c) is small. c is to the right of d only if b is to the right of c and left of e. 6.! if e is a tetrahedron, then it's to the right of b if and only if it is also in front of b. 7.! If b is a dodecahedron, then if it isn't in front of d then it isn't in back of d either. 8.! c is in back of a but in front of e. 9.! e is in front of d unless it (i.e., e) is a large tetrahedron. 10.! At least one of a, c, and e is a cube. Openproof Day 27/3/09 19 Exercise 7.12: Sentences 11-20 11.! a is a tetrahedron only if it is in front of b. 12.! b is larger than both a and e. 13.! a and e are both larger than c, but neither is large. 14.! d is the same shape as b only if they are the same size. 15.! a is large if and only if it's a cube. 16.! b is a cube unless c is a tetrahedron. 17.! If e isn't a cube, either b or d is large. 18.! b or d is a cube if either a or c is a tetrahedron. 19.! a is large just in case d is small. 20.! a is large just in case e is. Openproof Day 27/3/09 20 Openproof Day 27/3/09 21 Exercise 7.12, Sentence 1 •! If a is a tetrahedron then it is in front of d. –! Tet(a) ! FrontOf(a,d) •! 1547 incorrect answers instances out of 2181 submissions, 201 different forms of error •! 10 incorrect forms account for 1062 of 1547 (69%) Openproof Day 27/3/09 22 The first error taxonomy •! Logic Errors –! Antecedent and Consequent Reversed (" = 0.97) •! Substitution Errors –! Incorrect Constants (" = 0.98) –! Incorrect Predicates (" = 0.74) –! Incorrect Connectives (" = 0.94) •! Syntactic Errors –! Paren Errors (" = 0.76) –! Case Errors (" = 0.66) –! Arity Errors (" = 1.0) •! Other Errors (" = 0.52) Openproof Day 27/3/09 23 Revised error taxonomy •! Structural errors –! antecedent/consequent reversal FrontOf(a,d)-->Tet(a) –! added exclusivity (A V B) & ~(A & B) instead of just (A V B) •! Connective errors –! using one connective in place of another –! ! for "!, V for & •! Atomic errors –! predicate errors FrontOf(a,b) for SameShape(a,b) –! argument errors A for a, b for d Openproof Day 27/3/09 24 Automated coding •! pattern matching based on regular expressions •! classifier identifies fragments of submitted answer that correspond to atomic subformulae of submitted sentence •! average correct classification of 85% of submissions –!range 62% (S7) –! If b is a dodecahedron, then if it isn't in front of d then it isn't in back of d either. –!to 99% (S11) –! a is a tetrahedron only if it is in front of b 25 Openproof Day 27/3/09 This talk – focus on three highprevalence error types •! 1. sensitivity to word-order effects in posed NL sentence –! (26%, count=25084) •! 2. distinguishing the conditional from the biconditional –! ! "! –! bicon for con (18%, n=17518) –! con for bicon (12%, n=11362) •! 3. errors concerning naming of constants –! 4 of the top 10 errors on some sentences e.g. S10 –! Cube(a) V Cube(b) V Cube(c) instead of –! Cube(a) V Cube(c) V Cube(e) Openproof Day 27/3/09 26 Error 1/3: Two kinds of sentence •! 1. correct FOL preserves word order in posed NL sentence –! S1 If a is a tet then it is front of d –! Tet(a) ! FrontOf(a,d) •! 2. correct FOL does not preserve NL word order –! S4 c is to the right of a, provided it (c) is small –! Small(c) ! RightOf(c,a) •! 12/20 sentences involve implication (!) –! 8/12 preserve word order –! 4/12 do not •! interesting to compare sentences 1 and 4 as they are matched in terms of syntactic complexity of FOL translation (i.t.o. predicates, arity, connectives, arguments,...) 27 Openproof Day 27/3/09 Openproof Day 27/3/09 28 Error 1/3: Students & professors •! Clement, Lochhead & Monk (1981) studied translation from NL to algebraic expressions •! “Write an equation: ‘There are six times as many students as professors at this university’ Use S for number of students and P for number of professors” Openproof Day 27/3/09 29 Error 1/3: Students & professors •! Clement, Lochhead & Monk (1981) •! “Write an equation: ‘There are six times as many students as professors at this university’ Use S for number of students and P for number of professors” •! 150 ‘calculus level’ students 37% wrote 6S=P •! 57% in another sample •! Two postulated sources for error –! word-order matching (‘superficial, syntactic’) –! static comparison (‘S’ not used as a var, but as a label attached to number i.e. 6S=P used to represent 6S:P ... a wrongly written ratio (‘deep seated intuitive symbolization strategy’) Openproof Day 27/3/09 30 Error 1/3: Two kinds of sentence •! 1. correct solution preserves word order in posed NL sentence –! S1 If a is a tet then it is front of d –! Tet(a) ! FrontOf(a,d) –! 43% antecedent and consequent reversals •! 2. correct solution does not preserve NL word order –! S4 c is to the right of a, provided it (c) is small –! Small(c) ! RightOf(c,a) –! 66% antecedent and consequent reversals •! t=2.23, df(10), p<.05 •! error probably conflates word-order effect with !/"! confusion, which brings us to... Openproof Day 27/3/09 31 Error 2/3: <--> for --> substitution If a is a tetrahedron then it is in front of d •! S11 a is a tetrahedron only if it is in front of b •! S1 •! 6 error types shared: –! antecedent/consequent reversal –! "!/! not distinguished –! argument reversal –! incorrect arguments –! incorrect predicates –! arity error –! For S1 "! for ! ranked 4/6 and accounted for 12% errors of 1361 solutions –! For S11 "! for ! ranked 1/6, accounted for 58% of 8981 solutions •! hence different surface structure of two NL forms elicits very different error patterns… Openproof Day 27/3/09 32 Error 2/3: Biconditional for conditional errors across all 20 sentences... students clearly have most difficulty with ‘only if’ NL form of the conditional probably due to similarity to ‘if and only if’ NL form of the biconditional (iff, "!, ‘just in case’, ‘if and only if’) Openproof Day 27/3/09 33 Error 2/3: "! for ! substitution •! LPL textbook “..the expression only if introduces a necessary condition...” “you will pass the course only if you hand in all the homework” •! In NL conditionals are naturally interpreted as biconditionals “if you read this, I’ll buy you lunch” carries a heavy hint that ‘no reading, no lunch’... (Stenning & van Lambalgen, 2001; Zweis & Zwicky, 1971) •! exemplifies negative interference between NL and other (formal) language systems - cooperative vs adversorial stances differ (cf Grice) - applies to quantifiers too e.g. ‘some’ in logic includes possibility of ‘all’ but not so in NL - major source of difficulty for students of formal subjects... Openproof Day 27/3/09 34 Error 3/3: Errors in naming of constants •! S10 At least one of a, c and e is a cube –! Cube(a) V Cube(c) V Cube(e) •! in general we note that this error interacts with: –! use of constant ‘a’ first in sentence –! when ‘a’ is the first constant mentioned in sentence –! constants used in sentence are ‘gappy’ <b,e,d> rather than alphabetically adjacent <a,b,c> Openproof Day 27/3/09 35 Error 3/3: Errors in naming of constants N = no. of sentences •! a slip of ‘capture error’ variety –!(Norman, 1988; Reason, 1990) Openproof Day 27/3/09 36 Talk progress •! The Context •! What We Want To Do •! A Closer Look at the Data –!NL-FOL translation errors –!Mal-rules –!Graphical translation •! Future work Openproof Day 27/3/09 37 Mal-rules •! Attempt to go beyond superficial error pattern description… link NL forms to error forms •! Again, considered Ex 7.12 sentences involving conditional forms (15/20) •! small number of mal-rules account for most of the data for every NL form Openproof Day 27/3/09 38 Openproof Day 27/3/09 39 Mal-rules •! small number of mal-rules account for most of the data for every NL form –!cf algebra, arithmetic, programming •! except…. Openproof Day 27/3/09 40 Mal-rules •! ‘if…then’ –!When conditional is at the top level •! If a is a tetrahedron then it is in front of d. –! if A then B, B ! A (46%) –! if A then B, B " A (40%) –! but within an embedded clause •! If b is a dodecahedron, then if it isn't in front of d then it isn't in back of d either. –! if A then B, A ! B (63%) –!complex misunderstanding… Openproof Day 27/3/09 41 Mal-rules •! Implications for GG feedback to students •! Feedback needs to be context-sensitive –! B ! A when translating only if •! emphasise only if form of the conditional implies a necessary condition which, unlike the biconditional, does not have the property of commutativity. –!B ! A when translating just in case •! emphasise the restricted meaning that just in case has in logic compared to everyday language Openproof Day 27/3/09 42 Talk progress •! The Context •! What We Want To Do •! A Closer Look at the Data –!NL-FOL translation errors –!Mal-rules –!Graphical translation •! Future work Openproof Day 27/3/09 43 Comparing logic & graphical translations •! This time we considered similar kind of exercise (7.15) •! Requires NL->FOL translation but also student builds a ‘blocks world’ representation too •! Again, chose a medium difficulty level exercise... •! Focus again on conditionals •! 2000-2007 –! 29,500 submissions –! 18,609 erroneous –! 7,918 missing a blocks world diagram –! 372 lacking any FOL sentences –! 10,319 with both FOL sentences and a diagram –! from 5,176 different students Openproof Day 27/3/09 44 Exercise 7.15 Openproof Day 27/3/09 45 FOL translations Openproof Day 27/3/09 46 Error rates Openproof Day 27/3/09 Openproof Day 27/3/09 47 48 Flow of information through deduction •! •! •! •! •! •! •! •! •! •! start with S4 combine with S1 when b is known.. combine with S2 now S3 (a & c are Tets) from S4 we know c is not large so a is large (S3) 84% get this far... Tet(a) & Tet(b) & Tet(c) & Large(a) next is S7.... 49 Openproof Day 27/3/09 Flow of information through deduction •! Tet(a) & Tet(b) & Tet(c) & Large(a) •! S7... d is a small dodec unless a is small "! for ! error frequencies (Ex 7.12) Openproof Day 27/3/09 50 10 most frequent S7 errors Openproof Day 27/3/09 51 S7 Graphical errors involving d size 2346 (92%) shape 623 (24%) both 417 (16%) Openproof Day 27/3/09 52 Graphical errors involving d size 2346 (92%) shape 623 (24%) both 417 (16%) most FOL refers to both the size and shape of d Openproof Day 27/3/09 53 Graphic shows error that FOL conceals... •! possibly a scoping error ? –!“d is a small dodecahedron unless a is small” –!conclusion d is not small –!d is (not a small) dodec –!rather than –!d is (is not a small dodec) •! Working memory overload? Openproof Day 27/3/09 54 Difference between tasks? •! NL ! FOL is a translation task (one sentence at a time, sentences treated separately) –!Immediate inference (Newstead) •! NL ! FOL ! BW is a production task –!(construct BW, integrate information across S1 – S12 –!Full inference (deduction) Openproof Day 27/3/09 55 Talk progress •! The Context •! What We Want To Do •! A Closer Look at the Data –!NL-FOL translation errors –!Mal-rules –!Graphical translation •! Future work Openproof Day 27/3/09 56 Current & Future work •! 1. pursue syntactic/graphical error ‘asymmetry’ •! 2. analysis of students’ submission trajectories –!cf difficulty measures error rate & stickiness –!Individual differences •! 3. LPL and KR –!Characterise tasks and resources •! 4.OpenFace Openproof Day 27/3/09 57 1. pursue syntactic/graphical error ‘asymmetry’ Openproof Day 27/3/09 58 Ideas: Aaron Kalb –!Related to part of speech? •! Refutation of ‘The Taj Mahal is a large animal’ –!Not ‘You’re wrong, the Taj Mahal is not a large animal’ –!Rather ‘Dude, the Taj Mahal is not an animal’ –!Adjective drops out … •! Study further via concurrent verbal ‘think aloud’ protocols…method/data triangulation Openproof Day 27/3/09 59 Expressiveness •! also graphical representations less expressive thanlinguistic representations •! cf ‘triangle’ and Openproof Day 27/3/09 60 Visual interference effect (Knauff)? •! Visual information when not relevant to reasoning at hand can have negative effects on reasoning •! Not so true for spatial information •! Implications for abstract mental rule vs mental model theories… Openproof Day 27/3/09 61 2. analysis of students’ submission trajectories Openproof Day 27/3/09 62 Individual student’s submissions over time number of attempts it takes for for a student determine a correct answer once they have made their initial mistake. Openproof Day 27/3/09 Openproof Day 27/3/09 63 64 Openproof Day 27/3/09 65 3. LPL and KR •! Resources –!NL –!Graphical •! Tasks –!Agency –!Modality(ies) •! Deliverables –!Ex. solutions Openproof Day 27/3/09 66 3. LPL and KR •! structure GG corpus in terms of the learning tasks posed to the learner •! structure of the logic curriculum (i.e., a hierarchy of conceptual pre- and co-requisites) –!content analysis, various texts –!Enderton, Hodges, Copi, Gamut, Hurley, LPL,… –!KE domain experts… Openproof Day 27/3/09 67 4. OpenFace – sharing with EDM community •! •! •! •! user-friendly web-based front-end facilitate data filtering, sharing and re-use network with new & growing EDM community users will be encouraged to `grow' the resource by submitting the results of their analyses, and ancillary materials Openproof Day 27/3/09 68 Future work •! Implications –! revise FOL curriculum –! enrich GradeGrinder’s feedback to students by providing it in the form of natural language –! Your solution (incorrect): •! Lib(j) # Home(j) –! What your solution means: •! John is either at home or at the library, or both. •! (challenge is to minimise ‘paraphrase distance’) •! Mal-rule analyses show that context sensitivity required! Openproof Day 27/3/09 69 Paraphrase ‘Distance’ From Source •! [Home(john) " Home(mary)] ! ¬[Home(john) ! Home(mary)] –!Either John is home or Mary is home and it’s not the case that John is home and Mary is home –!Either John or Mary is home and it’s not the case that John and Mary are both home –!Either John or Mary is home but it’s not the case that John and Mary are both home –!Either John or Mary is home but it’s not the case that both of them are home –!Either John or Mary is home but not both Openproof Day 27/3/09 70 End •! Acknowledgements: –!Joint Social Science Research Council (US) & ESRC TLRP (UK) ‘Visiting Americas’ Fellowship to Cox, –!Australian Research Council support to Dale •! Albert Liu, Michael Murray, Dawit Meles, Aaron Kalb (CLSI) Openproof Day 27/3/09 71