Introducing Concepts of Statistical Inference Via

advertisement
Introducing Concepts of Statistical
Inference Via Randomization Tests
John Holcomb - Cleveland State University
Beth Chance, Allan Rossman, Emily Tietjen - Cal Poly State University
George Cobb - Mount Holyoke College
http://statweb.calpoly.edu/csi/
Introduction
 2005 USCOTS Cobb proposed the idea of a new
introductory curriculum based on randomization
methods.
 Cobb (2007) “Our curriculum is needlessly
complicated because we put the normal
distribution, as an approximate sampling
distribution for the mean, instead of putting the
core logic of inference at the center.”
 Why? “Tyranny of the computable.”
 How????
Curriculum Goal
 Use technology simulations to lead students
to develop an understanding of the concepts
of statistical significance and p-values.
 Focus on statistical process.
 Repeated exposure throughout course.
Learning Process
 Research study and data.
 Tactile simulation and class discussion of
results.
 Simulation using a tailored applet.
 Empirical p-value and discussion.
 Conclusion in context.
Classroom Activities (Example 1):
Naughty or Nice?
 Hamlin, Wynn, and Bloom (Nature, 2007)
 Videos available at
www.yale.edu/infantlab/socialevaluation/Helper-Hinderer.html
 Do the experimental results (14 out of 16)
provide convincing evidence that infants have
a genuine preference for the helper toy rather
than the result occurring by chance alone?
 Inference for a binomial proportion.
Randomization approach
• How likely such an extreme result would be
under the null model of no preference?
• Begin with each student flipping a coin 16
times.
• Combine results from class with a dotplot.
• Move to applet to simulate tossing 16
coins for 1000 repetitions.
• Determine the proportion of
repetitions14 or more flips were heads.
Example 2: Sleep Deprivation?
 Stickgold, James, & Hobson, (Nature Neuroscience,
2000)
 21 subjects randomly assigned to one of two
groups: sleep deprived group and unrestricted
sleep.
 Both groups then allowed as much sleep as
wanted on the following two nights.
 All subjects re-tested on the third day.
Example 2: Sleep Deprivation?
 Randomized experiment.
 Compared mean improvement in
response time to a visual stimulus on a
computer screen.
 Unrestricted sleep group: 19.82 ms.
 Sleep deprived group: 3.90 ms.
 Inference for the difference of two means
from independent samples.
Randomization approach
 How likely such an extreme result
(Diff=15.92 ms) would be under the null
model of no treatment effect?
 21 improvement scores written on index
cards.
 Randomly “deal” cards to two groups and
record difference of group means.
 Combine results of the class with a
dotplot.
Randomization approach
 Use an applet to simulate randomization
process 1000 times and explore the
frequency of a difference in means of 15.92
or more extreme by chance alone.
 http://www.rossmanchance.com/applets
Research Goal
 Make evidence-based curriculum decisions.
 Implement and collect data from small-scale
classroom experiments.
 Identify issues that create difficulties for
student understanding.
 Formulate questions about the most
pedagogically effective way to implement
this approach.
Curriculum Design Issue #1:
First Activity
 Should the first example that students
encounter be one where the result is
statistically significant or one where the
result is not significant at all?
Advantages
 Students may find it easier to judge when
an observed result is surprising under a
null model.
 Starting with an insignificant result may
reinforce students’ natural inclinations to
regard a p-value as the probability the null
model is true.
Classroom Experiment
 Four sections of introductory statistics at Cal
Poly.
 Half the students told 9 of the 16 infants chose
the helper toy (non-significant result group).
 Half the students told 14 of the 16 infants chose
the helper toy (significant result group).
 Students given the activity and told to work in
pairs.
Classroom Experiment
• Two instructors: one randomized across
sections and the other randomized by
individuals.
• Follow-up Quiz questions the next class
period.
Question 1
9 of 16
14 of 16
When I conducted the simulation using 1,000,000 repetitions,
I obtained a proportion (empirical p-value) of .402 (.002).
Based on this result, which assumes the null model of no
genuine preference, the actual result obtained by the
researchers,9 of 16 (14 of 16), choosing the helper is
a) impossible
c) somewhat surprising
b) very surprising
d) not at all surprising
Results
 60.6% (n = 71) students in the “9 of 16”
group answered correctly.
 77.5% (n = 71) students in the “14 of 16”
group answered correctly.
 Two-sided p-value is approximately 0.030.
 Interpretation: students find it easier to
interpret a surprising outcome than a nonsurprising one.
Question 2
Fill in the blanks in the following sentence to interpret this
proportion from part (l).
This proportion says that in about (1) % of ___(2)___,
the researchers would get __(3)___ who choose
the helper toy, assuming that ________(4)______.
Results
This proportion says that in about (1) % of
___(2)___, the researchers would get __(3)___ who
choose the helper toy, assuming that ______(4)______.
9 of 16
14 of 16
(1) 40.2% vs. 0.2%: Non-significant group
performed better, but may be an artifact of
the difficulty of expressing 0.002 as a
percentage.
Results
This proportion says that in about (1) % of
___(2)___, the researchers would get __(3)___ who
choose the helper toy, assuming that
______(4)______.
(2) Both groups about 50% correctly
answering 1,000,000 repetitions.
Results
This proportion says that in about (1) % of
___(2)___, the researchers would get __(3)___ who
choose the helper toy, assuming that
______(4)______.
(3) “14 of 16” group performed better
(54.2% vs. 38.9%, p-value = .066) in
citing the observed result or more
extreme.
Results
This proportion says that in about (1) % of
___(2)___, the researchers would get __(3)___ who
choose the helper toy, assuming that
______(4)______.
(4) Both groups about 76% correctly
answering “assuming no preference.”
Interpretation
 Although students seem to equally
understand the null model, they did differ
slightly in realizing what the simulation told
them.
Question 3
Based on your answer to (1) and (2), which of the following
would you consider the most appropriate conclusion from this
study? (choose one)
(a) These 16 infants have no genuine preference and therefore
there’s no reason to doubt that the researchers’ result is different
from .5 just by random chance.
(b) The researchers’ results would be very surprising if there was
no genuine preference for the helper and therefore I believe there
is a preference.
(c) There is a large (small) chance that there is a genuine
preference for the helper.
Results
 Approximately 77% in each group answered
the correct answer: (a) for the “9 of 16”
group and (b) for the “14 of 16.”
Curriculum Design Issue #2:
Tactile Simulations
 We have suggested beginning each
simulation with a tactile version before
turning to technology, but does the tactile
aspect really add value to the students’
learning experience?
Should we do tactile first?
 Potential advantages:
 Students are in a better position to
understand what the technology simulation is
doing if they have first performed a tactile
simulation themselves.
 We use applets that mirror the hands-on
activity as closely as possible so the
technology is not a “black box.”
 Potential disadvantage: tactile simulations take
valuable class time.
Classroom Experiment
 Randomly assigned 43 students to two
treatment groups.
 Class topic was investigating the sampling
distribution of a single proportion.
Classroom Experiment
 Tactile group (20 students) students
each determined the sample proportion
of orange candies among 25 actual
Reese’s Pieces.
Students created a class dotplot of
sample proportions.
Then turned to applet for simulation of
many samples of size 25.
Classroom Experiment
 Non-tactile group (23) immediately moved
to simulation with applet without working
with Reese’s Pieces and creating pooled
dotplot.
Results
 Students in both groups given a quiz the
next day.
 Five questions that involved a single
sample proportion.
 Independent and blinded statistics
instructor graded the quizzes.
 No statistically significant differences were
found.
Results
 Interesting outcome was that both groups
seemed to finish the activity in about the
same amount of time.
 May suggest the tactile aspect does not take
more time and does not hinder learning.
Results
 In a follow-up questionnaire to a different
class, approximately 50% of the students
indicated the tactile component of the
activity was helpful.
Curriculum Design Issue #3
 Should the first activity that students
encounter focus on inference for a single
proportion, as in the “Naughty or Nice”
example, or on a comparison of two groups,
as in the “Sleep Deprivation” example?
Curriculum Design Issue #4
 In the case of a simulation involving a single
proportion, as with the “Naughty or Nice”
example, how should the tactile simulation
be conducted?
 16 students each tossing one coin
 Each student tosses 1 coin 16 times
 Each student tosses 16 coins
Curriculum Design Issue #5
 In the case of a randomization test for a 2×2
table, what statistic should the students
calculate in their simulations?
 Difference in conditional proportions of
success.
 Ratio of conditional proportions (Relative
risk).
 Number of successes in group A.
Curriculum Design Issue #6
 How much of the work should the
technology do automatically (calculating
empirical p-values)?
 Push a button (e.g., based on two-way
table).
 Specify observed result to count beyond.
Curriculum Design Issue #7
 Should the type of randomness used in the
simulation always reflect the role of
randomness used in the actual data
collection process?
 Randomizing (assignment) when data
arises from experiments. Bootstrapping
and/or sampling from finite populations
when data arises from samples.
 Always randomizing group membership.
Summary
 Feasibility of random assignment in
classroom experiments.
 Focused and relevant research questions.
 Direct link between research and classroom
practice.
 Assessment instruments described in
Holcomb, Chance, Rossman, and Cobb (2010,
Proceedings).
References
 Cobb, George W. (2007). The Introductory Statistics Course: A
Ptolemaic Curriculum? Technology Innovations in Statistics Education,
1(1),www.escholarship.org/uc/item/6hb3k0nz
 Holcomb, J., Chance, B., Rossman, A., & Cobb, G., (2010).
Assessing student learning about statistical inference, Proceedings of
the 8th International Conference on Teaching Statistics.
 Hamlin, J. K., Wynn, K., & Bloom, P. (2007). Social evaluation by
preverbal infants. Nature, 450, 557-559
 Stickgold, R., James, L., & Hobson, J.A. (2000). Visual
discrimination learning requires post-training sleep. Nature
Neuroscience, 2, 1237-1238.
 http://statweb.calpoly.edu/csi
Thanks to National Science Foundation DUE/CCLI #0633349
Download