Trends in the Teaching of Inferential Statistics

Talk at RIT-2013, April, Mini-Conference-2
Trends and Updates in the
Teaching of Inferential Statistics
Yusuf K. Bilgic - bilgic@geneseo.edu
Visiting Assistant Professor in Statistics
2012-2014, SUNY-Geneseo
I wish Fisher knew it!
• ITALY’S highest court overturned the
acquittal of Amanda Knox, accused of the
2007 murder of Meredith Kercher...
Miscalculation (misinterpretations) by
judges and lawyers of probabilities, from
the odds of DNA matches to the chance
of accidental death, have sent innocent
people to jail, and, perhaps, let
murderers walk free... (NYT, 3/27)
• See next slide for the stat fallacy..
Reasoning ..
• By the time Ms. Knox’s appeal was decided in
2011, however, techniques had advanced
sufficiently to make a retest of the knife
possible, and the prosecution asked the judge
to have one done. But he refused. His
reasoning? If the scientific community
recognizes that a test on so small a sample
cannot establish identity beyond a reasonable
doubt, he explained, then neither could a
second test on an even smaller sample…
Talk Outline
• What is Inferential Statistics?
• History
• Logic in Hypothesis Testing and Interpretations
of the P-value
• Trends in Teaching Hypothesis Testing
• Reform Movements
• Position of Hypothesis Testing in Current and
Developing Curricula
• Conclusion
What is Inferential Statistics?
• Statistical inference is the process of drawing
conclusions/decisions/estimates/.. from data
that is subject to random variation.
• Big picture of statistical inference
History
• 1700s, first inferential statistics uses in astronomy
and geodesy
• Pierre-Simon Laplace
• P-value: Formally introduced by Karl Pearson,
1914
• Modern use of P-values and Null Hypothesis
Testing, by Fisher, 1920s.
• Neyman-Pearson approach to hypothesis testing
and debates with Fisher’s approach
The lady tasting tea
• A lady (Muriel Bristol) claimed to be able to distinguish by taste
how tea is prepared (milk added to the cup first, then the
tea, or tea first, then milk)
• She was sequentially presented with 8 cups: 4 prepared one
way, 4 prepared the other, and asked to determine the
preparation of each cup (knowing that there were 4 of each).
• In this case the null hypothesis was that she had no special
ability; the test was Fisher's exact test.
• In the actual experiment, Bristol correctly classified all 8 cups.
• The p-value was 1/C(8,4) = 1/70 ≈ .014, so Fisher rejected the null
hypothesis (considering the outcome highly unlikely to be due to
chance).
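The arithmetic behind the tea-tasting p-value can be checked with a short sketch. It assumes the design described above (8 cups, 4 of each kind, and the taster knows this), so that guessing amounts to picking 4 of the 8 cups at random. The function name and its parameters are illustrative only and do not come from Fisher's account.

```python
from math import comb


def tea_tasting_p_value(n_cups=8, n_milk_first=4, n_correct=4):
    """One-sided p-value: chance of identifying at least `n_correct`
    of the milk-first cups when the taster is purely guessing."""
    n_tea_first = n_cups - n_milk_first
    # Number of ways to choose which cups to call "milk first": C(8,4) = 70.
    total_ways = comb(n_cups, n_milk_first)
    # Count the selections with k correct milk-first cups, for k >= n_correct.
    favorable = sum(
        comb(n_milk_first, k) * comb(n_tea_first, n_milk_first - k)
        for k in range(n_correct, n_milk_first + 1)
    )
    return favorable / total_ways


p_all_correct = tea_tasting_p_value()                # 1/70, about .014
p_three_or_more = tea_tasting_p_value(n_correct=3)   # 17/70, about .243
```

Only a perfect classification falls below the conventional .05 level in this design; getting 3 of the 4 milk-first cups right would not.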
Inferential Stat Timeline
When   | What?                                     | So What?
..     | Statistics/Probability                    | ..
1740   | First use of inferential stat             | Error probability calculations
1770   | Laplace’s CLT and inference               | Excess of boys compared to girls
1839   | ASA founded                               | Did you know …?
1914   | P-value introduced                        | Pearson used in Chi-squared dist
1920   | Modern P-value with Fisher                | Null Hypothesis introduced
       | Neyman-Pearson approach                   | Null Hypothesis vs. Alternative Hypothesis introduced
1937   | Neyman introduced the confidence interval | In statistical testing
1980s  | MCMC/Gibbs/                               | Applications with technology
2005   | GAISE Report                              | Reform in Stat Ed
2010+  | Reform Projects                           | Concrete Materials being developed
…      |                                           | ..
Logic in Hypothesis Testing and
Interpretations of P-value
• A theory is proven, whereas a statistical hypothesis
is checked against data, knowing the limitations
of the data and the role of chance in any set of results.
• Falsification: a hypothesis is testable by
empirical experiment and thus conforms to
the standards of scientific method.
• Null hypothesis–based significance testing:
Most common way in which scientific
inferences are made
Different paradigms of statistical
inference
• Fisher’s approach on inductive inference about a
single hypothesis using pseudo-falsification
• The Neyman–Pearson approach on future
behavior based on a test using two
complementary hypotheses, associated decision
error rates, and a specified effect size
• The Bayesian approach on probabilities to
measure the belief in a particular hypothesis
warranted by evidence.
Three paradigms
• Fisher's approach does not involve any
alternative hypothesis
• A shortcoming of the NP approach is that the
in-the-long-run condition of such testing is a fiction
relative to actual scientific inquiry and decision
making.
• The Bayesian approach dominated statistical
thinking before Fisher, Neyman, and Pearson but
was pushed aside in the 1920s as being too
subjective.
Interpretations of probabilities
• ‘a measure of evidence’: Fisher suggested the p-value
as an informal measure of statistical evidence.
• ‘observed error rate’: Neyman dismissed the p-value as
a measure of evidence and proposed the formal
hypothesis test framework based on error rates
• These two methods on testing and interpretation of the
p-value are incompatible but mistakenly regarded as part
of a single, coherent approach to statistical inference
• ‘degree of belief’: In the Bayesian approach, p-value
suggests plausibility: it informs an investigator so that
his or her degree of belief in a hypothesis can be
adjusted based on evidence
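The idea of adjusting a degree of belief based on evidence can be illustrated with a minimal sketch of Bayes' rule over two competing hypotheses. Everything numeric here is a hypothetical assumption made for illustration: the two candidate success rates (0.5 and 0.8), the equal prior beliefs, and the data (8 successes in 10 trials). The function name is also illustrative.

```python
from math import comb


def posterior_probabilities(priors, success_rates, successes, trials):
    """Update prior beliefs in several hypotheses about a success rate,
    given binomial data, using Bayes' rule."""
    failures = trials - successes
    # Binomial likelihood of the observed data under each hypothesis.
    likelihoods = [
        comb(trials, successes) * rate**successes * (1 - rate) ** failures
        for rate in success_rates
    ]
    weighted = [prior * like for prior, like in zip(priors, likelihoods)]
    total = sum(weighted)
    return [w / total for w in weighted]


# Hypothetical setup: H0 says the success rate is 0.5, H1 says it is 0.8,
# and the investigator starts with equal belief in each.
posterior = posterior_probabilities(
    priors=[0.5, 0.5], success_rates=[0.5, 0.8], successes=8, trials=10
)
# posterior[1] is about 0.87: belief in H1 rises from 0.5 after seeing the data.
```

Unlike a p-value, the output is a probability statement about the hypotheses themselves, and it depends on the priors chosen.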
Trends in Teaching Hypothesis Testing
• Academic statistics vs. Practical statistics
• Dichotomous decision vs. Subjective decision
• Factors that shape teaching statistics:
– Vibrant Statistics
– Philosophical evolutions in stat
– Educational updates: Cognitivism/Constructivism
– High Speed Calculations
– Subjectivity
– Needs in ‘human/social-related inferences’ with
complexity
Who is responsible?
Me: Hey mom! You always force me into dichotomous
decisions.
Mom: Are you sure?
Me: 100% I am sure you do this.
Mom: You did it again.
Shifts in teaching inferential statistics
From                                  | To
Single p-value                        | Many p-values, Simulations, Meta-analysis
p-value                               | Estimate, CI, Power, Effect Size
Theoretical probability facts         | Noisy facts, Data-driven facts
Conventional wisdom                   | ‘It depends’ likelihood decisions
Traditional parametric tests          | Alternatives (Nonparam, Bayesian, Bootstrap)
Theoretical emphasis in data analysis | Empirical, Applied, Concept-based, Randomization-based
Behaviorism-based teaching            | Cognitivism-based teaching
Objectivity                           | Subjectivity
Single                                | Interdisciplinary inclusions
NHST with p-value                     | Seeking alternatives/broader ways
Reform Movements
• The disagreements among Fisher, Pearson, and
Neyman remain unresolved and are imperfectly
integrated into present-day applications
• GAISE Recommendations, the CATALST
Project, CAUSE (Consortium for the Advancement of
Undergraduate Statistics Education), and
Project MOSAIC
– Radical changes in content and pedagogy
– Simulation/empirical/randomization-based
activities, re-samplings, no procedural framework
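A randomization-based activity of the kind mentioned above can be sketched in a few lines. This is one common classroom version (a two-group permutation test on the difference in means), not a procedure taken from any of the projects listed. The data, the function name, and the choice to add 1 to both the count and the number of shuffles are assumptions made for illustration.

```python
import random


def permutation_test(group_a, group_b, n_shuffles=10_000, seed=0):
    """Approximate two-sided p-value for a difference in means,
    obtained by repeatedly reshuffling the group labels."""
    rng = random.Random(seed)
    n_a = len(group_a)
    observed = abs(sum(group_a) / n_a - sum(group_b) / len(group_b))
    pooled = list(group_a) + list(group_b)
    extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # random relabeling under the null hypothesis
        shuffled_a, shuffled_b = pooled[:n_a], pooled[n_a:]
        diff = abs(sum(shuffled_a) / n_a - sum(shuffled_b) / len(shuffled_b))
        if diff >= observed:
            extreme += 1
    # Count the observed labeling as one of the possible relabelings.
    return (extreme + 1) / (n_shuffles + 1)


# Hypothetical exam scores for two sections of a course.
group_a = [72, 85, 90, 66, 78, 88]
group_b = [60, 75, 70, 65, 80, 58]
p_value = permutation_test(group_a, group_b)
```

Students see the null distribution built from their own data, with no reference to a t-table or a formula for the standard error.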
Position of Hypothesis Testing in
Current and Developing Curricula
• Despite the growth of Bayesian research, most
undergraduate teaching is still based on
frequentist inference (the inference framework on
which the well-established methodologies of
statistical hypothesis testing and confidence
intervals are based).
• New/Developing
• Common Cores
• Journals, APA Guidelines
• Alternatives to Hypothesis/Significance testing
and/or p-value
• Confidence intervals, instead of hypothesis
tests, whenever possible (Newman;
Agresti & Franklin; Cumming; APA…)
• Arguments to replace hypothesis testing with
presentations of confidence limits are
increasing as a consequence of the confusion
surrounding ES, p-value, and error rates
(Newman)
• Alternatives to p-values in testing
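Reporting an interval instead of a test result can be sketched with a percentile bootstrap confidence interval for a mean. This is one simple way to obtain such an interval, chosen here for illustration and not a method prescribed by the authors cited above. The sample values, the function name, and the defaults (5000 resamples, 95% level) are hypothetical.

```python
import random


def bootstrap_ci(data, n_resamples=5000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean of `data`."""
    rng = random.Random(seed)
    n = len(data)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        sum(rng.choices(data, k=n)) / n for _ in range(n_resamples)
    )
    alpha = (1 - level) / 2
    lower_index = int(alpha * n_resamples)
    upper_index = int((1 - alpha) * n_resamples) - 1
    return means[lower_index], means[upper_index]


# Hypothetical sample of 10 measurements.
sample = [4.1, 5.3, 3.8, 6.0, 5.5, 4.7, 5.1, 4.4, 6.2, 5.0]
lower, upper = bootstrap_ci(sample)
```

The interval conveys both an estimate and its precision, which a bare "reject / fail to reject" decision does not.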
Conclusion / Comments / Q
• Since P values are not likely to soon disappear from the
pages of medical journals or from the toolbox of
statisticians, the challenge remains how to use them
and still properly convey the strength of evidence
provided by research data (L. Held).
• Work is needed on how to reflect current trends in undergraduate
teaching.
• I am looking for collaborators to write an article
on today’s topic. Please let me know at
bilgic@geneseo.edu
Do you agree?
Let’s argue it in the next conference...
• Bayarri and Berger (2004), ‘In a related vein,
we avoid the question of what is
“pedagogically correct.” If pressed, we would
probably argue that Bayesian statistics (with
emphasis on objective Bayesian methodology)
should be the type of statistics that is taught
to the masses, with frequentist statistics being
taught primarily to advanced statisticians.’
References
• Cumming, G. (2011). Understanding The New Statistics: Effect Sizes, Confidence
Intervals, and Meta-Analysis. Routledge Academic.
• Newman, M. C. (2008). ‘What exactly are you inferring?’ A closer look at
hypothesis testing. Environmental Toxicology and Chemistry, Vol. 27, No. 5, pp.
1013–1019.
• Kass, R. E. (2011). Statistical Inference: The Big Picture. Statistical Science,
Vol. 26, No. 1, pp. 1–9. DOI: 10.1214/10-STS337.
• Tishkovskaya, S., and Lancaster, G. A. (2012). Statistical Education in the 21st
Century: a Review of Challenges, Teaching Innovations and Strategies for Reform.
Journal of Statistics Education, Vol. 20, No. 2.
• Goodman, S. N. (1993). P values, hypothesis tests, and likelihood: Implications for
epidemiology of a neglected historical debate. Am J Epidemiol 137:485–496.
• Held, L. Biostatistician.
http://www.biostat.uzh.ch/aboutus/people/held/IFSPM.pdf