Experimental_Methods - Pittsburgh Science of Learning Center

Research Methods for the
Learning Sciences
Kenneth R. Koedinger
Philip I. Pavlik Jr
TA: Benjamin Shih
Lecture 2
Validity and Design
Management Issues
• In a few minutes, I will get started with our
second lecture
• But first, I’d like to cover a few management
issues
Management Issues
• Has everyone successfully purchased the book
and accessed the online reading for today?
• Trochim & Donnelly Chapters 1 and 7
Management Issues
• Has anyone had any difficulty with accessing
or posting to Google Wave?
• You should post for the next class
Your first assignment: part 1
• Are there any questions or concerns on part 1
of the first assignment?
Quibbles
• Discussing, or even Quibbling, about details of
examples is a good thing, since it helps us
think about the concepts discussed
• Although…
The Trouble With Quibbles
They Multiply!
Three Types of Study
• Descriptive Studies
• “Relational Studies”
– Many people call them correlational studies
– Like me
• Causal Studies
• Can you define each type?
Three Types of Study
• Descriptive Studies
• Correlational Studies
• Causal Studies
• Who here has done studies of each type?
(say a little more?)
Feasibility and Validity
• Descriptive Studies
• Correlational Studies
• Causal Studies
[Diagram labels: Feasibility, Validity]
Feasibility and Validity
• A tradeoff you will see many times
Issues With Correlational Studies
• Does the Dog Wag the Tail?
• Or Does the Tail Wag The Dog?
The Tail Wagging The Dog
• Fowler et al. (2005) report that “There [is] a
41% increase in risk of being overweight for
every can or bottle of diet soft drink a person
consumes each day.”
• People who drink more diet soda gain weight
• Therefore, diet drinks must be stimulating
appetite, and making people eat more and
gain weight, right?
The Tail Wagging The Dog
• Well, maybe…
• But maybe people drink more diet soda
because they are gaining weight or are already
overweight
• With just a correlation, you can’t tell
– With a causal study, you can
– So how would you make this study causal?
Free Soda
• “You get all the free diet soda you can drink,
but you over there, you get all the free regular
soda you can drink.”
• See the later section on Ethics
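A minimal simulation sketch of why random assignment settles the direction question. Everything here is made up for illustration: the function `simulate_soda`, the noise levels, and the assumption that diet soda has no effect on weight at all while heavier people choose to drink more of it.

```python
import random
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate_soda(n=5000, seed=4):
    """Hypothetical world in which diet soda has NO effect on weight."""
    rng = random.Random(seed)
    weight = [rng.gauss(0, 1) for _ in range(n)]  # standardized starting weight
    # Later weight depends only on starting weight, never on soda.
    later_weight = [w + rng.gauss(0, 0.2) for w in weight]

    # Correlational study: heavier people choose to drink more diet soda.
    soda_chosen = [w + rng.gauss(0, 1) for w in weight]
    r_observed = pearson(soda_chosen, later_weight)

    # Causal study: a coin flip, not the person, decides who gets diet soda.
    gets_diet = [rng.random() < 0.5 for _ in range(n)]
    diet = [lw for lw, g in zip(later_weight, gets_diet) if g]
    regular = [lw for lw, g in zip(later_weight, gets_diet) if not g]
    return r_observed, mean(diet) - mean(regular)


r_observed, randomized_gap = simulate_soda()
print(f"observational correlation: {r_observed:.2f}")
print(f"randomized group difference: {randomized_gap:.2f}")
```

Under these assumptions the observational correlation is clearly positive even though soda does nothing, while the randomized comparison shows a gap near zero.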
Issues with Correlational Studies
• The Third-Variable Problem
• Not always referred to by this name, but
definitely always important
Let’s Consider a Possible
Relationship
• We are handing out slips describing a
correlational relationship
• Please write down a third variable that could
directly lead to an increase in both variables
(it could be a quantitative or a categorical
variable)
Let’s read out what you’ve got
• Read out the original relationship and your
third variable
As you can see…
• A lot of different third variables can explain
the relationships found
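A small sketch of the third-variable problem, using invented data. The variables `x`, `y`, and `z` and the function `simulate_third_variable` are hypothetical; the only assumption is that `z` drives both `x` and `y`, which never influence each other.

```python
import random
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate_third_variable(n=5000, seed=0):
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]  # hypothetical third variable
    x = [zi + rng.gauss(0, 1) for zi in z]   # z raises x
    y = [zi + rng.gauss(0, 1) for zi in z]   # z raises y; x never touches y
    r_raw = pearson(x, y)
    # Remove z's contribution (we know it exactly in a simulation; in a real
    # study this step would be a statistical control and only approximate).
    x_rest = [xi - zi for xi, zi in zip(x, z)]
    y_rest = [yi - zi for yi, zi in zip(y, z)]
    return r_raw, pearson(x_rest, y_rest)


r_raw, r_controlled = simulate_third_variable()
print(f"correlation of x and y: {r_raw:.2f}")
print(f"after removing z:       {r_controlled:.2f}")
```

In this sketch the raw correlation is sizable, and it all but vanishes once the third variable is taken out.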
The “Just So Story” Problem
• As a class, you were able to find reasonable
explanations for two contradictory hypotheses
• This is called the “Just So Story” problem
• People can find a reasonable-sounding
explanation for just about any finding
– Which is why we should always question both our
findings, and our reasonable-sounding
explanations for them
Confirmation Bias
• A particular danger is when you find what
you’re expecting to find
– You may not double-check your results quite as
carefully as when your results are surprising
– Always double-check everything and keep records!
• Common slips: coding errors, mis-copied data, subjects eliminated
for good reasons without propagating the change to the sample
pool, using the wrong variable in an analysis, running the
wrong test
Exception and Ecological Fallacies
(from Chapter 1)
• Roughly opposites of each other
– Ecological fallacy:
• Assuming that a property true of a group in general applies to every group member
• Students who have used Cognitive Tutors know more math than
students who used traditional curricula
• Therefore Sheela (who used a Cognitive Tutor) knows more math
than Indira (who used a traditional curriculum)
– Exception fallacy
• Assuming that a property found in one individual applies to the whole group
• Roberto used a Cognitive Tutor and cannot distinguish categorical
variables from numerical variables
• Therefore all students who used a Cognitive Tutor will have this
difficulty
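A rough sketch of why the ecological fallacy fails, with invented numbers. The half-standard-deviation group advantage (`group_gap=0.5`) and the function `pairwise_win_rate` are assumptions for illustration, not results from any Cognitive Tutor study.

```python
import random
from statistics import mean


def pairwise_win_rate(n=2000, group_gap=0.5, seed=2):
    rng = random.Random(seed)
    # Hypothetical math scores: the tutor group is better ON AVERAGE.
    tutor = [rng.gauss(group_gap, 1) for _ in range(n)]
    traditional = [rng.gauss(0, 1) for _ in range(n)]
    # Pair each tutor student with one traditional student and compare them.
    wins = sum(t > c for t, c in zip(tutor, traditional))
    return mean(tutor) - mean(traditional), wins / n


mean_gap, win_rate = pairwise_win_rate()
print(f"group mean difference: {mean_gap:.2f}")
print(f"share of pairs where the tutor student scores higher: {win_rate:.2f}")
```

Under these assumptions the group difference is real, yet in a large minority of pairs the student from the traditional curriculum scores higher, so the claim about Sheela and Indira does not follow.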
Now, Validity!
• What are
– Conclusion validity
– Internal validity
– Construct validity
– External validity
– Ecological validity
Sub-categories of External Validity
• Non-representative and/or nonrandom
sample of users
• Inappropriate tasks
• Inappropriate measures
Ecological vs. External Validity
• Critical issue in studies of learning is
– whether they generalize to people and places (have
'external validity')
– that are representative of "real life"
(an ecological validity concern)
• Ecological validity, in common usage
– not about *generalization* to real-life situations
– about whether the "methods, materials and settings"
are similar (or identical) to real life.
• One can separate the ideas
– ecological validity is about real-world *relevance*
– external validity is about generalizability
Examples of the ecological vs. external
validity distinction
• Strong ecological validity, but lower external validity
– Koedinger, Anderson, Hadley, & Mark study
– Strong ecological validity because methods, materials, & setting are
real classroom instruction in real schools
– Not strong external validity because study was only done with urban
students in Pittsburgh
• Strong external validity, but lower ecological validity
– Lab study of “seductive details” finds that instruction that does not
include interesting but ultimately irrelevant details leads to better
learning; the study was performed at 2 universities, with children of a
variety of ages and of different socio-economic status (SES) & race
– Strong external validity because it was demonstrated across a range of
persons and places, but because it was done in the lab, it may not
have high ecological validity
• Maybe “seductive details” only have benefit in ecologically valid settings,
with distractions, where they increase attention
Study features to consider for external &
ecological validity
• External validity
– Generalizability of study features
– Trochim 2nd edition: persons, places, times
– Brewer (2000) (see Wikipedia): settings (~places), procedures,
participants (= persons)
– Koedinger: procedures, materials
• Ecological validity
– Relevance of study features to real-world
– Brewer (2000) (see Wikipedia): methods (~procedures), materials, &
setting
Ecological validity increases prob of
external validity
• It is commonly conjectured that high ecological validity is
likely to improve external validity.
– A study done in a classroom rather than the lab (more ecologically
valid) is more likely to generalize to other classrooms (external
validity) than a lab study
• Not clear that this common conjecture has been proven
– How would one prove it?
– But a good rule of thumb is:
The more similar your study is to the context of application (ecological)
and the more varied the contexts you study (external)…
The more likely your results will generalize to natural
settings with other people, procedures, places, & times (ecological and
external)
Example
(Baker, d’Mello, Rodrigo, & Graesser,
in preparation)
• Is boredom or frustration more persistent over
time, as students use a learning environment?
• If we just did one study, you might ask:
– Will this effect be general across contexts, student
ages, cultures, learning systems, domains, etc.?
Example
(Baker, d’Mello, Rodrigo, & Graesser,
in preparation)
• So we ran studies analyzing this
– USA, college students, lab study, AutoTutor,
computer literacy domain
– Philippines, 17-19 year olds, classroom study, The
Incredible Machine, concrete problem-solving
domain
– Philippines, 12-15 year olds, classroom study,
Aplusix, algebra
• And got the same result (boredom is much
more persistent)
Example
(Baker, d’Mello, Rodrigo, & Graesser,
in preparation)
• Do these three studies have external validity?
• Do these three studies have ecological
validity?
Another key feature
• Participant motivation, affect, & knowledge
factors.
– Example: Study with students in classroom,
materials from course -> ecologically valid
– But, students not getting a grade -> may approach
task differently & results may differ
• E.g., a treatment designed to enhance motivation may
work better in an ungraded study than it does when it is
applied as an actual, graded part of a class
A quiz…
Let’s consider a few examples
• Vote on which type of validity is violated (any
of the five, could be multiple, could even be
none)
• Explain your reasoning
Which type of validity is violated?
• Students who read bug messages perform
more poorly on post-test
• So bug messages hurt learning!
Example bug message: “You have chosen a categorical variable for the X axis;
however, scatterplot graphs can only contain numerical variables.”
(Baker, Corbett, Koedinger, & Schneider, 2004)
Which type of validity is violated?
• I have proven that students learn more
Calculus from my Calculus tutoring system
• Here is my test, used both pre and post
• How well do you know Calculus?
1 (Not well)   2   3   4   5 (Very well)
Which type of validity is violated?
• My new tutoring system is much better than
the previous tutoring system!
Which type of validity is violated?
• I conducted a study comparing my new tutoring
system to a previous one
• Students who completed the whole tutoring system
performed significantly better on post-test in the
experimental condition than control condition
• Oops… did I mention only 3% of students completed
the whole tutoring system in the experimental
condition?
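One way to see why a completers-only comparison is risky, sketched as a simulation. All numbers are invented: the treatment has no effect at all, the 3% completion rate comes from the slide, the 90% completion rate for the control condition is an assumption, and so is the idea that the most able students are the ones who finish.

```python
import random
from statistics import mean


def simulate_completers(n=10000, seed=3):
    rng = random.Random(seed)

    def run_condition(completion_rate):
        ability = [rng.gauss(0, 1) for _ in range(n)]
        # Post-test depends on ability only: the tutoring system does nothing.
        post = [a + rng.gauss(0, 0.5) for a in ability]
        # Assumption: the most able students are the ones who finish.
        k = int(n * completion_rate)
        by_ability = sorted(range(n), key=lambda i: ability[i], reverse=True)
        finishers = by_ability[:k]
        return mean(post), mean(post[i] for i in finishers)

    exp_all, exp_done = run_condition(0.03)  # 3% finish (from the slide)
    ctl_all, ctl_done = run_condition(0.90)  # 90% finish (assumed)
    return exp_done - ctl_done, exp_all - ctl_all


completers_gap, everyone_gap = simulate_completers()
print(f"completers only: experimental minus control = {completers_gap:.2f}")
print(f"all students:    experimental minus control = {everyone_gap:.2f}")
```

In this sketch the completers-only comparison shows a large "advantage" for a system that does nothing, while comparing everyone who was assigned to each condition shows essentially no difference.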
Which type of validity is violated?
• Now that I have tested my new learning
environment that responds to off-task
behavior by giving it to single students in the
guidance counselor’s office after school, we
can be confident it will work in all school
settings
Which type of validity is violated?
• Now that I have tested my new learning
environment with a set of 10 8th graders in
Tuktoyaktuk (Northwest Territories,
Canada), all bilingual English-Inuvialuit, with
parents who work in the mine nearby, we can
be confident it will work for all students
Which type of validity is violated?
• Now that I have tested my new learning
environment with a set of 41 8th graders in a
predominantly upper-class Caucasian suburb
of Pittsburgh, we can be confident it will work
for all students
Threats to Validity
• Selection threat/ Self-selection threat
• Internal validity (Accuracy of cause-effect inference)
– History threat
– Maturation threat
– Testing threat
– Instrumentation threat
– Mortality threat
– Regression threat
• Social/Motivational threats
– Diffusion of treatment
– Compensatory rivalry/resentful demoralization
– Compensatory Equalization
– Demand threat
Confounding
• What is a confounding variable?
• Examples?
Regression toward the mean
example
(From davidmlane.com)
"Consider an actual study that received considerable media attention. This study sought
to determine whether a drug that reduces anxiety could raise SAT scores by reducing
test anxiety. A group of students whose SAT scores were surprisingly low (given their
grades) was chosen to be in the experiment.
These students, who presumably scored lower than expected on the SAT because of
test anxiety, were administered the anti-anxiety drug before taking the SAT for the
second time. The results supported the hypothesis that the drug could improve SAT
scores by lowering anxiety: the SAT scores were higher the second time than the first
time. Since SAT scores normally go up from the first to the second administration of the
test, the researchers compared the improvement of the students in the experiment with
nationwide data on how much students usually improve. The students in the experiment
improved significantly more than the average improvement nationwide. The problem
with this study is that by choosing students who scored lower than expected on the SAT,
the researchers inadvertently chose students whose scores on the SAT were lower than
their "true" scores. The increase on the retest could just as easily be due to regression
toward the mean as to the effect of the drug. The degree to which there is regression
toward the mean depends on the relative role of skill and luck in the test."
Any issues with this example?
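The regression-toward-the-mean argument in the quoted passage can be sketched as a simulation with no drug at all. The model is an assumption for illustration: each score is skill plus luck, skill and luck have equal spread, and "surprisingly low given their grades" is approximated as a first-test luck term below a hypothetical cutoff.

```python
import random
from statistics import mean


def simulate_retest(n=20000, cutoff=-1.0, seed=1):
    rng = random.Random(seed)
    skill = [rng.gauss(0, 1) for _ in range(n)]      # stable "true" score
    test1 = [s + rng.gauss(0, 1) for s in skill]     # skill + luck, first sitting
    test2 = [s + rng.gauss(0, 1) for s in skill]     # fresh luck, NO drug given
    # Choose students who scored far below what their skill (grades) predicts.
    chosen = [i for i in range(n) if test1[i] - skill[i] < cutoff]
    gain_chosen = mean(test2[i] - test1[i] for i in chosen)
    gain_everyone = mean(t2 - t1 for t1, t2 in zip(test1, test2))
    return gain_chosen, gain_everyone


gain_chosen, gain_everyone = simulate_retest()
print(f"average gain, selected low scorers: {gain_chosen:.2f}")
print(f"average gain, all students:         {gain_everyone:.2f}")
```

Under these assumptions the selected students improve far more than the population average with no treatment, which is the pattern the study attributed to the drug. Shrinking the luck term relative to skill shrinks the effect, matching the passage's last sentence.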
Feasibility
• One of the big things you crash into, when
planning a study or a program of research, is
the need for feasibility
• It would be awesome if we all had access to
unlimitedly large subject pools, in any setting
we wanted
Feasibility
• It would be awesome if we all had access to
unlimited research support for things like
running studies and coding data
Feasibility
• Often, when a study we want to do is not
quite feasible, we can find corners to cut to
make it possible
• The key is finding the right corners to cut
That Said
• Being willing to do something painful that no
one else has been willing to do so far can
enable great new research
– Like driving out to schools every morning at 7am
for 2 months in 3 separate years
(Ryan’s dissertation)
But…
• It’s even better to discover a new method that
provides data which is verifiably “almost as
good” with vastly less effort
Experimental Design Feasibility
Considerations
• Cost of running experiment
– Subjects, experimenter time, equipment
• Converting results into economic or
practical terms
• Important trade-offs:
– Lower cost for subjects vs. higher reliability/ believability of
results
– More pilot subjects/time vs. faster/cheaper results but with
greater risk
Ethics:
• This is a big issue
• It is not one that can be summarized in just a few
minutes
• These days there is often a lot of paperwork
– CMU is sometimes extremely reasonable about this
• But there have been real abuses in the past
– And not just in the past
Ethics:
• I feel odd not saying much about ethics; it’s a
key subject
• But at some level, ethics is a key part of the
“apprenticeship” model of graduate school
• I genuinely believe that it’s hard to teach out of
context
Guidelines
• Protect people’s anonymity
• Enable people to give informed consent, as much as
possible
• Give people an avenue for complaint
• Don’t use conditions known to be bad unless you’re
going to compensate for it somehow
• If unexpected bad things happen, don’t ignore them
• The subject is always right (until they leave the
scene)
Ethical Guidelines
• Does anybody want to disagree with any of
these guidelines?
• Does anybody want to add in some other
guidelines they think are important?
Thanks!
• Make sure to read Trochim chapters 8, 9, 10
for next week!