How good is a robot tutor? The effectiveness of Excel as a teaching resource multiplier in teaching statistics

Dave Nunez, Colin Tredoux, Susan Malcolm-Smith
ACSENT Lab, University of Cape Town
Jacob Jaftha
Dept. of Mathematics and Applied Mathematics, University of Cape Town

Context
- UCT Psychology has an extensive statistics teaching programme (1st year to honours)
  - Research focus makes this an imperative
  - By honours, students are expected to apply stats to a significant individual research project
- A mixed group of students entering
  - All have high-school maths, or have completed (or are concurrently completing) a year-long numeracy course
- Stats is largely disliked, and provokes significant anxiety

Context
- Large classes, few tutors
  - Typically 40:1 student:tutor ratio
- Excel-based tutorials developed to counter this (lab facilities can cope with numbers): “tutor in a can”; “tutorbot”; “tutortron-2000”
- Positive student feedback from Excel tuts
  - Liked that they could take them home
  - Seemed to compensate for poor lecture attendance
- BUT: very little interaction between teachers and students (how were explanations/queries handled? Was it necessary?)
The Excel-based tutorials used
- In development since 2003
  - Almost all technical glitches resolved
- Contain text, exercises and evaluation
  - Text supplements textbook (text and images); also includes animations and simulations
  - Teaches concepts and tools
  - Provides exercises which are immediately scored (feedback given for each question)
  - Each tut ends with a mini-test which must be submitted [electronically]
- Each tut takes 120-150 minutes to complete

The Excel-based tutorials used
- The tutorials aim to be more than simple exercises
  - Embed some teaching by interaction and feedback
- Raises the issue: can interactive, discovery-based learning surpass student-tutor interaction for learning statistics?
  - Some topics are well suited to discovery (sampling distribution of the mean)
  - Some topics are poorly suited to it (probability)
- Do the Excel tutorials lead to skill transfer?

Methods used in the past
- Pre-test/post-test
  - Without a control, cannot show the tutorial is the cause (even a bad tut teaches something)
- Voluntary assignment
  - No control for motivation variables
  - No control for repetition
- Performance often measured by means of psychological variables
  - Confidence, mastery, conceptual learning
  - No absolute task-based criteria

Deficits in past methods
- Poor controls
  - No proper control within subjects (natural learning)
  - No proper control across conditions (subjects self-assign to conditions)
  - These are often related to ethical concerns
- Measures are generally poor
  - Single measure of complex, time-dependent phenomenon
  - No criterion-based assessment (i.e. low ecological validity of findings)

Research questions
- Do Excel-based tutorials (EBTs) compare in performance (marks scored) to pen-and-paper tutorials (PnPs)?
- Is there a difference in terms of psychological variables (mastery, confidence) between EBTs and PnPs?
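The sampling distribution of the mean is named earlier as a topic well suited to discovery. As a rough illustration of what such a simulation could look like (written in Python rather than Excel, and not taken from the tutorials themselves), the sketch below draws repeated samples from a hypothetical population, a fair six-sided die, and compares the spread of the sample means with sigma/sqrt(n). The function name `sample_means`, the population and the sample sizes are all assumptions made for illustration.

```python
import random
import statistics


def sample_means(population, n, reps, seed=0):
    """Draw `reps` samples of size `n` (with replacement) and return their means."""
    rng = random.Random(seed)  # seeded so that the run is repeatable
    return [statistics.fmean(rng.choices(population, k=n)) for _ in range(reps)]


# Hypothetical population for illustration: the faces of a fair die.
population = [1, 2, 3, 4, 5, 6]
sigma = statistics.pstdev(population)  # population standard deviation

for n in (4, 25, 100):
    means = sample_means(population, n, reps=5000)
    observed = statistics.stdev(means)   # spread of the simulated sample means
    predicted = sigma / n ** 0.5         # standard error predicted by theory
    print(f"n={n:3d}  sd of sample means={observed:.3f}  sigma/sqrt(n)={predicted:.3f}")
```

Changing `n` and watching the two columns shrink together is the kind of interaction a discovery-based exercise might aim for.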
Strengths of the current study
- Two-group quasi-experiment
  - Pseudo-random assignment of students to Excel/pen-and-paper tutorials
  - Strong control/similarity of tutorials (we think)
- Semester long, continuous assessment
  - Standard test after each tutorial (criterion and psychological measures)
  - Final exam at the end of the semester

Sample
- The 2007 PSY2006F class
  - Statistics lecture each Friday; one stats tut a week
  - 172 students (only Humanities students)
  - Almost all have been through 3 tutorials in PSY1001W on using Excel for stats
  - 2007 cohort not significantly different from other years
- Not told about the study; simply told strange tutorial structure was due to logistical reasons

Materials
- PnP tuts are ‘traditional’, as done in the dept. before the advent of Excel tuts
  - Published in a textbook (we partly wrote) in 2001
- Choose tutors who excel (!) at statistics
  - They lead students (groups of 30-40) through worksheets and explain problems and theory as they go along
- Students are given 2 hour classroom sessions to complete tuts (mostly don’t finish)
- Students are required to submit the completed worksheet a week after the classroom session

Materials
- Excel tuts (latest versions)
  - Developed by us (2003-2007)
- 1 senior tutor in the lab for stats queries, junior tutors for technical problems
- Students are given 2 hour lab sessions (groups of 30-40) to complete tuts (mostly don’t finish)
- Students are required to submit the completed Excel worksheet a week after the lab session

Design
- Control for individual variation and cross-group effects
  - Each student does 4 EBTs, 4 PnPs (8 topics in the course)
  - Two ‘streams’: EPEPEPEP, PEPEPEPE
  - Within-subjects design, and cross-group comparison
- The non-statistics marks in the course (research methods, psychometrics and qualitative methods) can be used to validate (traditionally high R² between them)

Measures
- Exam at the end
  - 2 hour practical exam (given data, problem solving; no concepts)
  - Do each exam section in the same technology form as
the tuts were done in

Measures
- Monday assessments
  - Each tut has a set of MCQ items
  - 6 MCQ items: 3 concepts, 3 calculations; one each easy, moderate, hard
  - 5 Likert items about confidence with the material, usefulness of tut, degree of understanding, how much extra help is needed

Measures (3)
Two students, Able and Baker, want to get into the honours class, but they have taken different third year subjects. Able did the PSY300X course (which had a mean mark of 53% and a standard deviation of 11%) and he got a mark of 80%. Baker on the other hand did the PSY300Y course (mean mark of 57% and a standard deviation of 7.5%), and got a mark of 77%. If honours places are awarded to students who stand out the most in their courses, which one of the students should get into honours and why?
a) Able should get in, because he scored 27% above the course average
b) Baker should get in, because he scored 20% above the course average
c) Able should get in, because he scored proportionately higher above the course average
d) Baker should get in, because he scored proportionately higher above the course average

Distribution X is normally distributed; distribution Y has a standard normal distribution. Which of the following statements MUST BE FALSE?
a) The mean of distribution X is 2
b) The standard deviation of distribution Y is 1
c) Distribution Y must always give the same proportion of high scores as low scores when sampled randomly
d) Distribution X never gives scores lower than distribution Y when sampled randomly.

Validation (N=170)

                 Comp.   Paper
Quant. methods   0.15    0.33
Psychometrics    0.35    0.40
Qual. methods    0.25    0.36

Validation
[Figure: LS means by GROUP (A, B) for exam_quant, exam_psychom and exam_qual; vertical bars denote 0.95 confidence intervals. Wilks lambda=.99259, F(3, 165)=.41061, p=.74559]

Attitude results
[Figure: R1*GROUP LS means; positive attitude (0-5) for att1, att3, att4, att6, att7, att8; Paper first vs Comp. first; vertical bars denote 0.95 confidence intervals. Current effect: F(5, 370)=4.7192, p=.00034]

Preference effects
[Figure: eval_1 LS means; score for computer questions by stated preference (Classroom tuts, Both were helpful, Lab tuts); vertical bars denote 0.95 confidence intervals. Current effect: F(2, 150)=1.1240, p=.32769]

Preference effects
[Figure: eval_1 LS means; score for paper-based questions by stated preference (Classroom tuts, Both were helpful, Lab tuts); vertical bars denote 0.95 confidence intervals. Current effect: F(2, 151)=1.5940, p=.20651]

Monday assessments
[Figure: R1*GROUP LS means; test score (out of 6) for Topic1, Topic3, Topic4, Topic6, Topic7, Topic8; Computer first vs Paper first; vertical bars denote 0.95 confidence intervals. Current effect: F(5, 455)=1.5736, p=.16607]

Exam results
[Figure: R1*GROUP LS means; exam mark (0-1) for Q1std to Q8std; GROUP A vs GROUP B; vertical bars denote 0.95 confidence intervals. Current effect: F(7, 1169)=7.3499, p<.00001]

Testing effects
[Figure: R1*GROUP LS means; improvement from tut to exam for q1-t1, q3-t3, q4-t4, q6-t6, q7-t7, q8-t8; Paper first vs Comp. first; vertical bars denote 0.95 confidence intervals. Current effect: F(5, 840)=1.0067, p=.41255]

What the data shows
- The EBTs can function as a robot tutor
  - With small tutor team, marks at least as good as traditional tutorials, better in a few topics for some students
- Student preference/attitude is not associated with performance
  - Lack of significant findings
  - No patterned differences

What the data shows
- EBTs can show an advantage
  - At exam time rather than test time
  - May indicate poor test or that EBTs need repetition to take effect
  - It is a weak effect: does not generalize to the entire class easily (group B only)

What the data DOES NOT show
- Excel-based statistics teaching is better
  - Content is confounded with form
  - Tutor ability is confounded with form
- Students enjoy/get confidence from the EBTs
  - Only differences show the opposite
- Students can leverage existing computer skills for learning statistics
  - Skills were pre-existing and not manipulated
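
Worked example for the Able and Baker assessment item
The slides do not give an answer key for the sample Monday assessment item about Able and Baker. One way to check it, assuming that "stands out the most" is meant as distance from the course mean in standard deviation units, is to standardise each mark. The helper name `z_score` is an assumption for illustration; the marks, means and standard deviations are those given in the item.

```python
def z_score(mark, mean, sd):
    """Distance of a mark from the course mean, in standard deviation units."""
    return (mark - mean) / sd


# Values taken from the assessment item.
able = z_score(80, 53, 11)     # PSY300X: mean 53%, SD 11%
baker = z_score(77, 57, 7.5)   # PSY300Y: mean 57%, SD 7.5%

print(f"Able:  z = {able:.2f}")
print(f"Baker: z = {baker:.2f}")
```

Able is 27 points above his course mean but only about 2.45 standard deviations; Baker is 20 points above hers but about 2.67 standard deviations, so under this reading Baker stands out more.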