Research Methods for the Learning Sciences
Kenneth R. Koedinger
Philip I. Pavlik Jr
TA: Benjamin Shih

Lecture 2
Validity and Design

Management Issues
• In a few minutes, I will get started with our second lecture
• But first, I’d like to cover a few management issues

Management Issues
• Has everyone successfully purchased the book and accessed the online reading for today?
• Trochim & Donnelly Chapters 1 and 7

Management Issues
• Has anyone had any difficulty with accessing or posting to Google Wave?
• You should post for the next class

Your first assignment: part 1
• Are there any questions or concerns on part 1 of the first assignment?

Quibbles
• Discussing, or even quibbling, about details of examples is a good thing, since it helps us think about the concepts discussed
• Although…

The Trouble With Quibbles
• They Multiply!

Three Types of Study
• Descriptive Studies
• “Relational Studies”
  – Many people call them correlational studies
  – Like me
• Causal Studies
• Can you define each type?

Three Types of Study
• Descriptive Studies
• Correlational Studies
• Causal Studies
• Who here has done studies of each type? (say a little more?)

Feasibility and Validity
• Descriptive Studies
• Correlational Studies
• Causal Studies
[Figure labels: Feasibility, Validity]

Feasibility and Validity
• A tradeoff you will see many times

Issues With Correlational Studies
• Does the Dog Wag the Tail?
• Or Does the Tail Wag The Dog?

The Tail Wagging The Dog
• Fowler et al. (2005) report that “There [is] a 41% increase in risk of being overweight for every can or bottle of diet soft drink a person consumes each day.”
• People who drink more diet soda gain weight
• Therefore, diet drinks must be stimulating appetite, and making people eat more and gain weight, right?
The Tail Wagging The Dog
• Well, maybe…
• But maybe people drink more diet soda because they are gaining weight or are already overweight
• With just a correlation, you can’t tell
  – With a causal study, you can
  – So how would you make this study causal?

Free Soda
• “You get all the free diet soda you can drink, but you over there, you get all the free regular soda you can drink.”
• See the later section on Ethics

Issues with Correlational Studies
• The Third-Variable Problem
• Not always referred to by this name, but definitely always important

Let’s Consider a Possible Relationship
• We are handing out slips describing a correlational relationship
• Please write down a third variable that could directly lead to an increase in both variables (it could be a quantitative or a categorical variable)

Let’s read out what you’ve got
• Read out the original relationship and your third variable

As you can see…
• A lot of different third variables can explain the relationships found

The “Just So Story” Problem
• As a class, you were able to find reasonable explanations for two contradictory hypotheses
• This is called the “Just So Story” problem
• People can find a reasonable-sounding explanation for just about any finding
  – Which is why we should always question both our findings, and our reasonable-sounding explanations for them

Confirmation Bias
• A particular danger is when you find what you’re expecting to find
  – You may not double-check your results quite as carefully as when your results are surprising
  – Always double-check everything and keep records!
    • Coding errors, mis-copied data, eliminating subjects for good reasons but forgetting to propagate the change to the sample pool, using the wrong variable in an analysis, running the wrong test

Exception and Ecological Fallacies (from Chapter 1)
• Roughly opposites of each other
  – Ecological fallacy
    • Assuming that a property true of a group in general applies to every group member
    • Students who have used Cognitive Tutors know more math than students who used traditional curricula
    • Therefore Sheela (who used a Cognitive Tutor) knows more math than Indira (who used a traditional curriculum)
  – Exception fallacy
    • Assuming that a property found in one individual applies to the whole group
    • Roberto used a Cognitive Tutor and cannot distinguish categorical variables from numerical variables
    • Therefore all students who used a Cognitive Tutor will have this difficulty

Now, Validity!
• What are
  – Conclusion validity
  – Internal validity
  – Construct validity
  – External validity
  – Ecological validity

Sub-categories of External Validity
• Non-representative and/or nonrandom sample of users
• Inappropriate tasks
• Inappropriate measures

Ecological vs. External Validity
• Critical issue in studies of learning is
  – whether they generalize to people and places (have 'external validity')
  – that are representative of "real life" (an ecological validity concern)
• Ecological validity, in common usage
  – not about *generalization* to real-life situations
  – about whether the "methods, materials and settings" are similar (or identical) to real life
• One can separate the ideas
  – ecological validity is about real-world *relevance*
  – external validity is about generalizability

Examples of the ecological & external validity distinction
• Strong ecological validity, but lower external validity
  – Koedinger, Anderson, Hadley, & Mark study
  – Strong ecological validity because methods, materials, & setting are real classroom instruction in real schools
  – Not strong external validity because study was only done with urban students in Pittsburgh
• Strong external validity, but lower ecological validity
  – Lab study of “seductive details” finds that instruction that does not include interesting but ultimately irrelevant details leads to better learning, for students of a variety of ages, in studies performed at 2 universities with children of different socio-economic status (SES) & race
  – Strong external validity because it was demonstrated across a range of persons and places, but because it was done in the lab, it may not have high ecological validity
    • Maybe “seductive details” only have benefit in ecologically valid settings, with distractions, where they increase attention

Study features to consider for external & ecological validity
• External validity
  – Generalizability of study features
  – Trochim 2nd edition: persons, places, times
  – Brewer (2000) (see Wikipedia): settings (~places), procedures, participants (= persons)
  – Koedinger: procedures, materials
• Ecological validity
  – Relevance of study features to real-world
  – Brewer (2000) (see Wikipedia): methods (~procedures), materials, & setting
Ecological validity increases prob of external validity
• It is commonly conjectured that high ecological validity is likely to improve external validity
  – A study done in a classroom rather than the lab (more ecologically valid) is more likely to generalize to other classrooms (external validity) than a lab study
• Not clear that this common conjecture has been proven
  – How would one prove it?
  – But a good rule of thumb is: the more similar your study is to the context of application (ecological) and the more different contexts you study (external), the more likely your results will generalize to natural settings with other people, procedures, places, & times (ecological and external)

Example (Baker, D’Mello, Rodrigo, & Graesser, in preparation)
• Is boredom or frustration more persistent over time, as students use a learning environment?
• If we just did one study, you might ask:
  – Will this effect be general across contexts, student ages, cultures, learning systems, domains, etc.?

Example (Baker, D’Mello, Rodrigo, & Graesser, in preparation)
• So we ran studies analyzing this
  – USA, college students, lab study, AutoTutor, computer literacy domain
  – Philippines, 17-19 year olds, classroom study, The Incredible Machine, concrete problem-solving domain
  – Philippines, 12-15 year olds, classroom study, Aplusix, algebra
• And got the same result (boredom is much more persistent)

Example (Baker, D’Mello, Rodrigo, & Graesser, in preparation)
• Do these three studies have external validity?
• Do these three studies have ecological validity?

Another key feature
• Participant motivation, affect, & knowledge factors.
  – Example: Study with students in classroom, materials from course -> ecologically valid
  – But, students not getting a grade -> may approach task differently & results may differ
    • E.g., a treatment designed to enhance motivation may work better than it does when it is applied as an actual, graded part of a class

A quiz…

Let’s consider a few examples
• Vote on which type of validity is violated (any of the five, could be multiple, could even be none)
• Explain your reasoning

Which type of validity is violated?
• Students who read bug messages perform more poorly on post-test
• So bug messages hurt learning!
• Example bug message: “You have chosen a categorical variable for the X axis; however, scatterplot graphs can only contain numerical variables.” (Baker, Corbett, Koedinger, & Schneider, 2004)

Which type of validity is violated?
• I have proven that students learn more Calculus from my Calculus tutoring system
• Here is my test, used both pre and post
• How well do you know Calculus? 1 (Not well) 2 3 4 5 (Very well)

Which type of validity is violated?
• My new tutoring system is much better than the previous tutoring system!

Which type of validity is violated?
• I conducted a study comparing my new tutoring system to a previous one
• Students who completed the whole tutoring system performed significantly better on post-test in the experimental condition than control condition
• Oops… did I mention only 3% of students completed the whole tutoring system in the experimental condition?

Which type of validity is violated?
• Now that I have tested my new learning environment that responds to off-task behavior by giving it to single students in the guidance counselor’s office after school, we can be confident it will work in all school settings

Which type of validity is violated?
• Now that I have tested my new learning environment with a set of 10 8th graders in Tuktoyaktuk (Northwest Territories, Canada), all bilingual English-Inuvialuit, with parents who work in the mine nearby, we can be confident it will work for all students

Which type of validity is violated?
• Now that I have tested my new learning environment with a set of 41 8th graders in a predominantly upper-class Caucasian suburb of Pittsburgh, we can be confident it will work for all students

Threats to Validity
• Selection threat / Self-selection threat
• Internal validity (accuracy of cause-effect inference)
  – History threat
  – Maturation threat
  – Testing threat
  – Instrumentation threat
  – Mortality threat
  – Regression threat
• Social/Motivational threats
  – Diffusion of treatment
  – Compensatory rivalry / resentful demoralization
  – Compensatory equalization
  – Demand threat

Confounding
• What is a confounding variable?
• Examples?

Regression toward the mean example (From davidmlane.com)
"Consider an actual study that received considerable media attention. This study sought to determine whether a drug that reduces anxiety could raise SAT scores by reducing test anxiety. A group of students whose SAT scores were surprisingly low (given their grades) was chosen to be in the experiment. These students, who presumably scored lower than expected on the SAT because of test anxiety, were administered the anti-anxiety drug before taking the SAT for the second time. The results supported the hypothesis that the drug could improve SAT scores by lowering anxiety: the SAT scores were higher the second time than the first time.
Since SAT scores normally go up from the first to the second administration of the test, the researchers compared the improvement of the students in the experiment with nationwide data on how much students usually improve. The students in the experiment improved significantly more than the average improvement nationwide. The problem with this study is that by choosing students who scored lower than expected on the SAT, the researchers inadvertently chose students whose scores on the SAT were lower than their "true" scores. The increase on the retest could just as easily be due to regression toward the mean as to the effect of the drug. The degree to which there is regression toward the mean depends on the relative role of skill and luck in the test."

Any issues with this example?

Feasibility
• One of the big things you crash into, when planning a study or a program of research, is the need for feasibility
• It would be awesome if we all had access to unlimitedly large subject pools, in any setting we wanted

Feasibility
• It would be awesome if we all had access to unlimited research support for things like running studies and coding data

Feasibility
• Often, when a study we want to do is not quite feasible, we can find corners to cut to make it possible
• The key is finding the right corners to cut

That Said
• Being willing to do something painful that no one else has been willing to do so far can enable great new research
  – Like driving out to schools every morning at 7am for 2 months in 3 separate years (Ryan’s dissertation)

But…
• It’s even better to discover a new method that provides data which is verifiably “almost as good” with vastly less effort

Experimental Design Feasibility Considerations
• Cost of running experiment
  – Subjects, experimenter time, equipment
• Converting results into economic or practical terms
• Important trade-offs:
  – Lower cost for subjects vs. higher reliability/believability of results
  – More pilot subjects/time vs.
    faster/cheaper results but with greater risk

Ethics:
• This is a big issue
• It is not one that can be summarized in just a few minutes
• These days there is often a lot of paperwork
  – CMU is sometimes extremely reasonable about this
• But there have been real abuses in the past
  – And not just in the past

Ethics:
• I feel odd not saying much about ethics; it’s a very key subject
• But at some level, ethics is a key part of the “apprenticeship” model of graduate school
• I genuinely believe that it’s hard to teach out of context

Guidelines
• Protect people’s anonymity
• Enable people to give informed consent, as much as possible
• Give people an avenue for complaint
• Don’t use conditions known to be bad unless you’re going to compensate for it somehow
• If unexpected bad things happen, don’t ignore them
• The subject is always right (until they leave the scene)

Ethical Guidelines
• Does anybody want to disagree with any of these guidelines?
• Does anybody want to add in some other guidelines they think are important?

Thanks!
• Make sure to read Trochim chapters 8, 9, 10 for next week!
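
Supplementary sketch: regression toward the mean
• The SAT example quoted earlier from davidmlane.com can be illustrated with a small simulation. This is only a minimal sketch under assumptions that are not part of the quoted study: each observed score is a fixed "true" score plus independent, normally distributed luck; there is no drug and no practice effect; and the specific numbers (mean 500, SD 100, luck SD 50, cutoff of -40) and the function name simulate_retest are hypothetical choices made for illustration.

```python
import random
import statistics


def simulate_retest(n_students=20000, noise_sd=50.0, cutoff=-40.0, seed=0):
    """Simulate two test administrations with NO treatment effect.

    Hypothetical model: observed score = true score + random luck.
    Students are "selected" when their first score falls well below
    their true score, standing in for "surprisingly low given grades".
    Returns (mean gain for everyone, mean gain for selected students).
    """
    rng = random.Random(seed)
    all_gains = []
    selected_gains = []
    for _ in range(n_students):
        true_score = rng.gauss(500.0, 100.0)            # stable skill
        first = true_score + rng.gauss(0.0, noise_sd)   # skill + luck
        second = true_score + rng.gauss(0.0, noise_sd)  # skill + fresh luck
        gain = second - first
        all_gains.append(gain)
        # Selection step: keep only students who were unlucky the first time.
        if first - true_score < cutoff:
            selected_gains.append(gain)
    return statistics.mean(all_gains), statistics.mean(selected_gains)


mean_gain_all, mean_gain_selected = simulate_retest()
print("Mean gain, all students:     ", round(mean_gain_all, 1))
print("Mean gain, selected students:", round(mean_gain_selected, 1))
```

• In this sketch the selected students improve on the retest on average even though nothing was done to them, while the full sample shows roughly no change. Raising noise_sd relative to the spread of true scores makes the effect larger, which matches the quoted point that regression toward the mean depends on the relative role of skill and luck.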