Talk at RIT-2013, April, Mini-Conference-2
Trends and Updates in the Teaching of Inferential Statistics
Yusuf K. Bilgic - bilgic@geneseo.edu
Visiting Assistant Professor in Statistics, 2012-2014, SUNY-Geneseo

I wish Fisher knew it!
• Italy's highest court overturned the acquittal of Amanda Knox, accused of the 2007 murder of Meredith Kercher... Miscalculations (misinterpretations) of probabilities by judges and lawyers, from the odds of DNA matches to the chance of accidental death, have sent innocent people to jail and, perhaps, let murderers walk free... (NYT, 3/27)
• See next slide for the stat fallacy..

Reasoning ..
• By the time Ms. Knox's appeal was decided in 2011, however, techniques had advanced sufficiently to make a retest of the knife possible, and the prosecution asked the judge to have one done. But he refused. His reasoning? If the scientific community recognizes that a test on so small a sample cannot establish identity beyond a reasonable doubt, he explained, then neither could a second test on an even smaller sample…

Talk Outline
• What is Inferential Statistics?
• History
• Logic in Hypothesis Testing and Interpretations of the P-value
• Trends in Teaching Hypothesis Testing
• Reform Movements
• Position of Hypothesis Testing in Current and Developing Curricula
• Conclusion

What is Inferential Statistics?
• Statistical inference is the process of drawing conclusions, decisions, or estimates from data that are subject to random variation.
• Big picture of statistical inference

History
• 1700s: first uses of inferential statistics, in astronomy and geodesy
• Pierre-Simon Laplace
• P-value: formally introduced by Karl Pearson, 1914
• Modern use of P-values and null hypothesis testing, by Fisher, 1920s
• Neyman-Pearson approach to hypothesis testing, and debates with Fisher's approach

The lady tasting tea
• A lady (Muriel Bristol) claimed to be able to distinguish by taste how a cup of tea was prepared (milk added to the cup first, then tea; or tea first, then milk).
• She was presented sequentially with 8 cups, 4 prepared one way and 4 the other, and asked to determine the preparation of each cup (knowing that there were 4 of each).
• In this case the null hypothesis was that she had no special ability; the test was Fisher's exact test.
• In the actual experiment, Bristol correctly classified all 8 cups.
• Under the null hypothesis, each of the C(8,4) = 70 ways of choosing 4 cups is equally likely, so the p-value was 1/C(8,4) = 1/70 ≈ 0.014. Fisher rejected the null hypothesis (he considered the outcome highly unlikely to be due to chance).

Inferential Stat Timeline (When / What? / So what?)
• 1740 / Statistics/Probability: first use of inferential stat / Error probability calculations
• 1770 / Laplace's CLT and inference / Excess of boys compared to girls
• 1839 / ASA founded / Did you know …?
• 1914 / P-value introduced / Pearson used it with the chi-squared distribution
• 1920 / Modern P-value with Fisher / Null hypothesis introduced
• Neyman-Pearson approach / Null hypothesis vs. alternative hypothesis introduced in statistical testing
• 1937 / Neyman introduced the confidence interval
• 1980s / MCMC/Gibbs / Applications with technology
• 2005 / GAISE Report / Reform in stat ed
• 2010+ / Reform projects / Concrete materials being developed ..

Logic in Hypothesis Testing and Interpretations of P-value
• Theory is proven, but statistical hypotheses are checked with data, knowing the limitations of data and the chance factor in any set of results.
• Falsification: a hypothesis is testable by empirical experiment and thus conforms to the standards of scientific method.
• Null hypothesis-based significance testing: the most common way in which scientific inferences are made

Different paradigms of statistical inference
• Fisher's approach: inductive inference about a single hypothesis using pseudo-falsification
• The Neyman-Pearson approach: future behavior based on a test using two complementary hypotheses, associated decision error rates, and a specified effect size
• The Bayesian approach: probabilities used to measure the belief in a particular hypothesis warranted by evidence

Three paradigms
• Fisher's approach does not involve any alternative hypothesis.
• A shortcoming of the NP approach is that the in-the-long-run condition of such testing is a fiction relative to actual scientific inquiry and decision making.
• The Bayesian approach dominated statistical thinking before Fisher, Neyman, and Pearson but was pushed aside in the 1920s as being too subjective.

Interpretations of probabilities
• 'a measure of evidence': Fisher suggested the p-value as an informal measure of statistical evidence.
• 'observed error rate': Neyman dismissed the p-value as a measure of evidence and proposed the formal hypothesis test framework based on error rates.
• These two methods of testing and of interpreting the p-value are incompatible but are mistakenly regarded as part of a single, coherent approach to statistical inference.
• 'degree of belief': In the Bayesian approach, the p-value suggests plausibility: it informs an investigator so that his or her degree of belief in a hypothesis can be adjusted based on evidence.

Trends in Teaching Hypothesis Testing
• Academic statistics vs. practical statistics
• Dichotomous decision vs. subjective decision
• Factors that shape teaching statistics:
– Vibrant statistics
– Philosophical evolutions in stat
– Educational updates: Cognitivism/Constructivism
– High-speed calculations
– Subjectivity
– Needs in 'human/social-related inferences' with complexity

Who is responsible?
Me: Hey mom! You always force me to dichotomous decisions.
Mom: Are you sure?
Me: 100% I am sure you do this.
Mom: You did it again.

Shifts in teaching inferential statistics (From → To)
• Single p-value → Many p-values, simulations, meta-analysis
• p-value → Estimate, CI, power, effect size
• Theoretical probability facts → Noisy facts, data-driven facts
• Conventional wisdom → 'It depends' likelihood decisions
• Traditional parametric tests → Alternatives (nonparametric, Bayesian, bootstrap)
• Theoretical emphasis in data analysis → Empirical, applied, concept-based, randomization-based
• Behaviorism-based teaching → Cognitivism-based teaching
• Objectivity → Subjectivity
• Single NHST with p-value → Interdisciplinary inclusions; seeking alternatives/broader ways

Reform Movements
• Disagreements among Fisher, Pearson, and Neyman remain unresolved and are imperfectly integrated into present-day applications
• GAISE Recommendations, the CATALST Project, the CAUSE organization, and Project MOSAIC
– Radical changes in content and pedagogy
– Simulation/empirical/randomization-based activities, resampling, no procedural framework

Position of Hypothesis Testing in Current and Developing Curricula
• Despite growth of Bayesian research, most undergraduate teaching is still based on frequentist inference (the inference framework on which the well-established methodologies of statistical hypothesis testing and confidence intervals are based).
• New/Developing
• Common Cores
• Journals, APA Guidelines
• Alternatives to hypothesis/significance testing and/or p-value
• Confidence intervals, instead of hypothesis tests, whenever possible (Newman; Agresti & Franklin; Cumming; APA…)
• Arguments to replace hypothesis testing with presentations of confidence limits are increasing as a consequence of the confusion surrounding ES, p-value, and error rates (Newman)
• Alternatives to p-values in testing

Conclusion / Comments / Q
• Since P values are not likely to soon disappear from the pages of medical journals or from the toolbox of statisticians, the challenge remains how to use them and still properly convey the strength of evidence provided by research data (L. Held).
• Work is needed on how to reflect current trends in undergraduate teaching.
• I need partnerships to write an article on today's topic. Please let me know at bilgic@geneseo.edu

Do you agree? Let's argue it in the next conference...
• Bayarri and Berger (2004): 'In a related vein, we avoid the question of what is “pedagogically correct.” If pressed, we would probably argue that Bayesian statistics (with emphasis on objective Bayesian methodology) should be the type of statistics that is taught to the masses, with frequentist statistics being taught primarily to advanced statisticians.'

References
• Geoff Cumming, 2011. Understanding The New Statistics: Effect Sizes, Confidence Intervals, and Meta-Analysis. Routledge Academic.
• Michael C. Newman, 2008. 'What exactly are you inferring?' A closer look at hypothesis testing. Environmental Toxicology and Chemistry, Vol. 27, No. 5, pp. 1013–1019.
• Robert E. Kass, 2011. Statistical Inference: The Big Picture. Statistical Science, Vol. 26, No. 1, pp. 1–9. DOI: 10.1214/10-STS337. Institute of Mathematical Statistics.
• Svetlana Tishkovskaya, Gillian A. Lancaster, 2012. Statistical Education in the 21st Century: a Review of Challenges, Teaching Innovations and Strategies for Reform. Journal of Statistics Education, Volume 20, Number 2. Lancaster University.
• Goodman SN, 1993. P values, hypothesis tests, and likelihood: Implications for epidemiology of a neglected historical debate. Am J Epidemiol 137:485–496.
• Leonhard Held. Biostatistician. http://www.biostat.uzh.ch/aboutus/people/held/IFSPM.pdf
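
Worked check for "The lady tasting tea" slide
The p-value on that slide can be reproduced by counting. The sketch below is illustrative only and not part of the original talk: the function name and its parameters are made up for this example. It assumes the one-sided Fisher's exact test setting described on the slide, where the lady must pick which 4 of the 8 cups had milk first, so that under the null hypothesis every choice of 4 cups is equally likely.

```python
from math import comb


def tea_tasting_p_value(cups_per_kind: int = 4, correct_picks: int = 4) -> float:
    """One-sided p-value for the tea-tasting design (hypothetical helper).

    Returns the probability, under pure guessing, of correctly picking
    at least `correct_picks` of the `cups_per_kind` milk-first cups when
    choosing `cups_per_kind` cups out of 2 * `cups_per_kind`.
    """
    # Number of equally likely ways to choose which cups are "milk first".
    total = comb(2 * cups_per_kind, cups_per_kind)
    # Hypergeometric count: k correct picks among the true milk-first cups,
    # and the remaining picks taken (wrongly) from the tea-first cups.
    favorable = sum(
        comb(cups_per_kind, k) * comb(cups_per_kind, cups_per_kind - k)
        for k in range(correct_picks, cups_per_kind + 1)
    )
    return favorable / total


# All 8 cups classified correctly: 1 favorable choice out of C(8,4) = 70.
print(round(tea_tasting_p_value(), 4))      # 0.0143
# At least 3 of the 4 milk-first cups identified: 17 of 70 choices.
print(round(tea_tasting_p_value(4, 3), 4))  # 0.2429
```

With 8 cups, only a perfect classification gives a p-value below the conventional 0.05; a single swapped pair already raises it to about 0.24.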