Think-Aloud Protocols

Advanced Study Design
February 19, 2010
Today’s Class
• Last Week’s Probing Question
• Advanced Study Design
• Assignments
Probing Question
• Let’s say you wanted to do a large-scale
research study on boredom
• Under what conditions would it be preferable
to use
– Questionnaire items
– Experience sampling method
– Quantitative field observations
Today…
• Validity
• Validity Threats
• Stratification
• Counterbalancing and Cross-over Designs
• Regression-Discontinuity Designs
Validity
• Useful jargon...
Validity
(Trochim & Donnelly, 2007)
• Conclusion validity
• Internal validity
• Construct validity
• External validity
• Ecological validity
Conclusion Validity
• The degree to which conclusions you reach
about relationships in your data are justified
Internal Validity
• Assuming that there is a relationship in the
study, can you justifiably infer that the
relationship is causal?
Construct Validity
• The degree to which inferences can
legitimately be made from the
operationalizations in your study, to the
theoretical constructs on which those
operationalizations were based
External Validity
• Do your results generalize to other people,
procedures, places, and times?
Ecological Validity
• What is the degree to which the methods,
materials, and settings of the study are
relevant to natural/legitimate settings?
Ecological vs. External Validity
• Ecological validity
– not about *generalization* to real-life situations
– about whether the "methods, materials and settings"
are similar (or identical) to real life.
• Ecological validity is about real-world
*relevance*
• External validity is about generalizability
Examples?
• High External Validity, Low Ecological Validity
• Low External Validity, High Ecological Validity
High External Validity,
Low Ecological Validity
• Lab studies of “seductive details” effect
• Instruction that omits interesting but ultimately irrelevant
details leads to better learning; shown for students of a variety
of ages, in lab settings at 2 universities, with children of
different socio-economic status (SES) and race
Low External Validity,
High Ecological Validity
• A classroom study, with real students, involving legitimate
educational tasks, presented in exactly the way a teacher
would present them…
• With 1 student in each condition
Let’s consider a few examples
• Vote on which type of validity is violated (any
of the five, could be multiple, could even be
none)
• Explain your reasoning
Which type of validity is violated?
• Students who read bug messages perform
more poorly on post-test
• So bug messages hurt learning!
• Example bug message: “You have chosen a categorical variable for
the X axis; however, scatterplot graphs
can only contain numerical variables.”
Which type of validity is violated?
• I have proven that students learn more
Calculus from my Calculus tutoring system
• Here is my test, used both pre and post
• How well do you know Calculus?
– 1 (Not well)   2   3   4   5 (Very well)
Which type of validity is violated?
• My new tutoring system is much better than
the previous tutoring system!
Which type of validity is violated?
• I conducted a study comparing my new tutoring
system to a previous one
• Students who completed the whole tutoring system
performed significantly better on post-test in the
experimental condition than control condition
• Oops… did I mention only 3% of students completed
the whole tutoring system in the control condition?
Which type of validity is violated?
• Now that I have tested my new learning
environment that responds to off-task
behavior by giving it to single students in the
guidance counselor’s office after school, we
can be confident it will work in all school
settings
Which type of validity is violated?
• Now that I have tested my new learning
environment with a set of 10 8th graders in
Tuktoyaktuk (Northwest Territories of
Canada), all bilingual English-Inuvialuit, with
fathers who work in the mine nearby, we can
be confident it will work for all students
Which type of validity is violated?
• Now that I have tested my new learning
environment with a set of 120 8th graders in a
predominantly middle-class Caucasian suburb
of Worcester, we can be confident it will work
for all students
Some Popular Threats to
Internal Validity
Maturation Threat
• Something happens between pre-test and
post-test, aside from your intervention, that
impacts student change
– I.e. the same thing would have happened
whether or not you ran your study
• Any horror stories from your research?
• One teacher taught the same material in class
during the same week as the study
Mortality Threat
• Common in urban classrooms
• Large numbers of participants systematically
drop out of the study
• Any horror stories from your research?
• Example: I ran a study with homeschool
students; response rates were different
between conditions
Regression to the Mean
• If you choose a group based on pre-test
performance
– The most frequent gamers
– The students who scored in the bottom 10% on
the pre-test
• Some of them were in that group by chance
• And can be expected to do better on the post-test,
even without any intervention
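The effect above can be seen in a small simulation. This is a minimal sketch, not real data: it assumes each observed score is a stable ability plus independent test-day noise, and the function name, score scale, and noise levels are all invented for illustration. No intervention is applied at all, yet the bottom 10% on the pre-test still improve on the post-test.

```python
import random


def simulate_regression_to_mean(n_students=10_000, seed=0):
    """Simulate pre/post scores with NO intervention and no true learning."""
    rng = random.Random(seed)
    # Hypothetical model: observed score = stable ability + independent noise.
    ability = [rng.gauss(50, 10) for _ in range(n_students)]
    pre = [a + rng.gauss(0, 10) for a in ability]
    post = [a + rng.gauss(0, 10) for a in ability]

    # Select the students who scored in the bottom 10% on the pre-test.
    cutoff = sorted(pre)[n_students // 10]
    selected = [i for i in range(n_students) if pre[i] < cutoff]

    mean_pre = sum(pre[i] for i in selected) / len(selected)
    mean_post = sum(post[i] for i in selected) / len(selected)
    return mean_pre, mean_post


mean_pre, mean_post = simulate_regression_to_mean()
print(f"bottom 10% on pre-test, pre-test mean:  {mean_pre:.1f}")
print(f"same students, post-test mean:          {mean_post:.1f}")
```

The apparent gain comes entirely from the selection rule: students picked for an unusually low pre-test score had, on average, unlucky noise that does not repeat.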
Diffusion of Treatment
• You assign kids to different conditions, but they see
each other’s screens (or talk in the hallway, etc.)
• You assign classes randomly to condition within-teacher,
but teachers learn strategies from the better
condition and use that knowledge in the other
condition
– A major study comparing curricula in Baltimore was called
into question because teachers took teaching strategies
from the experimental condition to the control condition
Compensatory rivalry/
resentful demoralization
• Students in condition A learn about condition B,
which is obviously better
• Resentful demoralization – “it’s no fair they got the
better software, let’s just quit”
– More common for students
• Compensatory rivalry – “we can beat them, even if
they got the better software”
– More common for teachers
Confounding
• You changed multiple things in your
intervention (often inadvertently), and it’s not
clear which change had the impact
• Some examples?
• Your meta-cognitive intervention takes longer
to go through
– Better learning, or just more time-on-task?
Comments? Questions?
Stratification
Pure random sampling
• Let’s say you have an intervention that you want
to test in 4 groups: urban, wealthy suburban,
working-class suburban, and rural students
– You have access to students in Worcester, Auburn,
Ashburnham, and Cambridge
– If you just randomly sample in your population, you
are going to get a lot more people from Worcester
than Ashburnham
– In fact, if you sample 100 people randomly, you have a
substantial chance of getting nobody at all from
Ashburnham
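The size of that chance is easy to compute. As a rough sketch: if a group makes up a fraction `share` of the population, the probability that a simple random sample of n people contains nobody from it is about (1 - share)^n. The shares below are hypothetical, not actual enrollment figures for any of these towns.

```python
def prob_group_missing(share, sample_size):
    """Chance that a simple random sample contains nobody from a group
    that makes up `share` of the population.

    Assumes sampling with replacement, which is a close approximation
    when the population is much larger than the sample.
    """
    return (1.0 - share) ** sample_size


# Hypothetical population shares for the smallest group.
for share in (0.01, 0.02, 0.05):
    p = prob_group_missing(share, 100)
    print(f"group share {share:.0%}: P(nobody sampled) = {p:.3f}")
```

With a 1% share and a sample of 100, the group is missed entirely a little over a third of the time.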
Stratification
• Your population has N groups
• Sample randomly within each group
Proportional Stratification
• Sample from each group in proportion to its
size
– e.g. randomly select
  • 5% of all students in Worcester
  • 5% of all students in Auburn
  • 5% of all students in Cambridge
  • 5% of all students in Ashburnham
Equalizing Stratification
(also called “Disproportionate”)
• Sample the same number from each group, so that
the groups in your sample are equal in size
– e.g. randomly select
  • 25 students in Worcester
  • 25 students in Auburn
  • 25 students in Cambridge
  • 25 students in Ashburnham
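The two stratification schemes can be sketched side by side. This is an illustrative sketch only: the enrollment counts are invented, and the function names `proportional_sample` and `equalizing_sample` are hypothetical helpers, not part of any library.

```python
import random


def proportional_sample(groups, fraction, rng):
    """Draw the same fraction of members from every group."""
    return {name: rng.sample(members, round(len(members) * fraction))
            for name, members in groups.items()}


def equalizing_sample(groups, per_group, rng):
    """Draw the same number of members from every group."""
    return {name: rng.sample(members, per_group)
            for name, members in groups.items()}


rng = random.Random(42)
# Hypothetical enrollment counts, invented purely for illustration.
sizes = {"Worcester": 2000, "Auburn": 400, "Cambridge": 1200, "Ashburnham": 100}
groups = {town: [f"{town}-{i}" for i in range(n)] for town, n in sizes.items()}

prop = proportional_sample(groups, 0.05, rng)
equal = equalizing_sample(groups, 25, rng)
for town in groups:
    print(f"{town}: proportional={len(prop[town])}, equalizing={len(equal[town])}")
```

Either way every group is guaranteed to appear in the sample; the schemes differ only in whether the sample mirrors the population's proportions or gives each group equal weight.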
What variables could you stratify on?
(in learning sciences)
• Gender
• Race/Ethnicity
• Prior knowledge (pre-test large group, then
choose intervention sample)
• Disabilities
Why?
• Why might you want to use
– Proportional Stratification
– Equalizing Stratification
– Good Old Random Sampling
Some Reasons for Stratification
• Guarantee of representing all groups of
interest in your sample
• Higher statistical power
• Discover inter-group differences in
intervention’s effect
Some Reasons against Stratification
• Need to account for multiple groups in your
statistical method
• Results will over-emphasize effects in rarer
groups
– e.g. what if an intervention is wonderfully
effective in major cities; stratification may make
that effect harder to see
• Much more complicated, especially if you
stratify on pre-test
Comments? Questions?
Counterbalancing
• Also called “Cross-over design”
Counterbalancing
• Split your sample into groups A and B
Group    Time 1 (Topic 1?)    Time 2 (Topic 2?)
A        Control              Experimental
B        Experimental         Control
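The assignment step of this design can be sketched in a few lines. This is a minimal illustration assuming a simple random split into two halves; the function name `counterbalance` and the participant IDs are hypothetical.

```python
import random


def counterbalance(participants, seed=None):
    """Randomly split participants into groups A and B and give each
    group the opposite order of conditions."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    schedule = {}
    for p in shuffled[:half]:  # group A: control first
        schedule[p] = {"group": "A", "time1": "control", "time2": "experimental"}
    for p in shuffled[half:]:  # group B: experimental first
        schedule[p] = {"group": "B", "time1": "experimental", "time2": "control"}
    return schedule


schedule = counterbalance([f"s{i}" for i in range(8)], seed=1)
for student, plan in sorted(schedule.items()):
    print(student, plan["group"], plan["time1"], "->", plan["time2"])
```

Every participant ends up experiencing both conditions, which is what makes a within-subjects comparison possible.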
Advantages? Disadvantages?
Advantages
• Enables you to do a within-subjects statistical
test
– More statistical power
• You can look at longer-term effects of your
intervention (by looking at time 2 behavior in
group that got experimental condition at time
1)
Disadvantages
• Statistical analysis will be complicated if there
is any carry-over effect from time 1 to time 2
• Longer study
• Usually requires two versions of all material,
such as two topics – if the topics differ in
difficulty or in how much is learned, there is increased
variance (and thus less statistical power)
Notes
• Enthusiastically recommended by Shaaron
Ainsworth in her AIED Evaluation tutorial
• For me, it has always been a disaster
Comments? Questions?
Regression-Discontinuity Design
Regression-Discontinuity Design
• The “that’s got to be invalid…” Design
Regression-Discontinuity Design
• You conduct a pre-test
• You choose a cut-off
– Below the cut-off, you give the experimental
condition
– Above the cut-off, you give the control condition
Regression-Discontinuity Design
• You plot the pre-test and post-test for each
condition on the same graph
[Figure: three plots of Post (y-axis) against Pre (x-axis), each with the Cut-off marked]
• No effect of intervention
• Positive effect of intervention
• Negative effect of intervention
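One simple way to quantify the jump at the cut-off is to fit a single regression with a treatment indicator. The sketch below is only an illustration under strong assumptions: the pre/post relationship is linear with the same slope on both sides of the cut-off, the data are simulated with a known effect, and the helper names are hypothetical. The coefficient on the indicator estimates the discontinuity.

```python
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def rd_estimate(pre, post, cutoff):
    """Least-squares fit of post = b0 + b1*(pre - cutoff) + b2*treated,
    where treated = 1 below the cut-off. Returns b2, the estimated jump."""
    rows = [[1.0, p - cutoff, 1.0 if p < cutoff else 0.0] for p in pre]
    k = 3
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, post)) for i in range(k)]
    return solve_linear_system(xtx, xty)[2]


# Simulated data (not real): students below the cut-off get the intervention,
# which adds a known effect to their post-test score.
rng = random.Random(0)
cutoff, true_effect = 50.0, 5.0
pre = [rng.uniform(0, 100) for _ in range(5000)]
post = [10 + 0.8 * p + (true_effect if p < cutoff else 0.0) + rng.gauss(0, 5)
        for p in pre]
estimate = rd_estimate(pre, post, cutoff)
print(f"estimated jump at the cut-off: {estimate:.2f} (simulated effect: {true_effect})")
```

Because treatment is determined entirely by the pre-test, the indicator is strongly correlated with the pre-test score, which is one reason this design needs more participants than a randomized experiment for the same precision.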
Why would you want to use
this study method?
• Cases where it is unethical to give the control
condition to some students
– Or where there is a real need for intervention for
students at the bottom of the distribution
• Cases where the experimental condition is
expected to have no effect on students who
don’t need it
What are some limitations of this
method?
• Complicated statistics; low statistical power
• Painful to explain to reviewers
Surprisingly…
• Regression to the mean isn’t a problem…
• It doesn’t cause discontinuities in the
regression line!
Comments? Questions?
Assignment #4
• Any questions?