Understanding Lies, Damn Lies, and Statistics: A Look At Why So Many People Find Statistics Frustrating

John P. Holcomb, Jr.
Cleveland State University
Ohio MAA Section Meeting
April 1, 2005

Outline
• Why do statisticians find public reporting of statistics frustrating?
• Why does the public find statistics frustrating?
• Why do students find statistics frustrating?
• What are some major differences between statisticians and mathematicians?
• Emphasize our similarities

"There are Three Kinds of Lies: Lies, Damn Lies and Statistics."
• Attributed to Benjamin Disraeli (1804 - 1880)
• Prime Minister (1868, 1874 - 1880)
• Said to be popularized by Mark Twain in the United States

Statistics Affirming Quotations
• Frederick Mosteller (Harvard University)
• “It is easy to lie with statistics, but it is easier to lie without them.”

What Drives Statisticians Nuts?
Yahoo! News, (September 7, 2004)
Study Links TV to Teen Sexual Activity
• “Teenagers who watch a lot of television with sexual content are twice as likely to engage in intercourse than those who watch few such programs.” (Reuters)
• Rebecca Collins, “This is the strongest evidence yet that the sexual content of television programs encourages adolescents to initiate sexual intercourse and other sexual activities.”
• The problem is this is an Observational Study
• Did not sit 1,792 adolescents down and force them to watch television
• Adolescents chose their own “treatment”

Confounding
• Occurs when some other variable(s) affects both the independent variable (TV watching) and the dependent variable (Sexual Activity)
• Can be obvious and not-so-obvious
• This is hard for statistics students when it is covered in class, but for the public …

Problem with All Observational Studies
• Cannot assume there is no confounding
• So critics always have opportunity to criticize observational studies
• This is the defense of the Tobacco Industry for smoking

So why am I concerned?
• There is no mention of the role of parental supervision
• What is the consequence?
• The public is misguided about the meaning of the result

Experiments
• Allow researchers to make “causal” conclusions
• Randomly assign subjects to “treatments” and “control” to ensure balance
  – Control does not necessarily mean “sugar pill”
• Both groups are alike, on average, on every known variable as well as every unknown variable EXCEPT the treatment variable

Example II
• July 9, 2002, The Journal of the American Medical Association releases the results of the “Women’s Health Initiative (WHI)”
• Headlines Across America warned women about the risks from Hormone Replacement Therapy (HRT)
• New York Times: Study Is Halted Over Rise Seen In Cancer Risk

Belief: Estrogen and Progesterone would help women live healthier lives
Findings:
• Increased risk for breast cancer (26%)
• Increased risk of heart disease (29%)
• Increased risk of Stroke (41%)

Previous Good News
• 1962 – Observational study suggests estrogen therapy reduces risk of breast and genital cancers
• 1980 – A study shows that estrogen and progesterone together reduce risk for endometrial cancer
• 1985 – The Nurses’ Health Study, with 121,964 subjects, finds lower rate of heart disease in those taking progesterone
• 1995 – Same study finds that estrogen and progesterone reduce heart attack risk by 39%

Ethical Question
• For the WHI, can we deprive the control group of this great treatment?

What Went Wrong?
• One major issue – Nurses’ Health Study is observational
• WHI is a clinical trial
• One theory is the confounder is health – healthier nurses took the HRT and stayed on the HRT
• Another theory is the nature of the study – those who had some kind of heart ailment stopped taking medicine
• Even though WHI was a clinical trial (experiment), informed consent can add bias
• Also, women in WHI were older (most were 60 or older instead of going through menopause)

Caution
• Observational Studies are not useless
• Often point to issues needing further investigation
  – Experiments
  – Animal Studies

What Did Not Make the Headlines (or Even the Article)
• Recall the earlier increase:
  – Breast cancer (26%)
  – 8 more cases for every 10,000 women
  – For 8 more cases to equal a 26% increase, the baseline count X per 10,000 must satisfy:
    $\frac{X + 8}{X} = 1.26 \;\Rightarrow\; 8 = 0.26\,X \;\Rightarrow\; X \approx 30.77$
P(Breast Cancer in Placebo Group) = 31/10,000 = .0031
P(Breast Cancer in the HRT Group) = 39/10,000 = .0039
THESE ARE STILL VERY SMALL PROBABILITIES!

Frustrations:
1. Difference between observational studies and experiments is subtle
2. For statisticians, there is no contradiction, but for the public and even scientists, there is a glaring contradiction
3. Confirms the culture of disbelief – and who is blamed?
4. There is inherent uncertainty in the process

Statistics is Perfect for the Law
• Since all conclusions are based on probability – we can never say anything definitively
• Probabilities of exactly 0 and 1 are rarely achieved in practice

Implications for Teaching
• These are the topics we need to discuss
  – Study Design
  – Confounding and Causation
  – Treatment vs. Placebo
  – Absolute and Relative Risk
  – Uncertainty
• “All models are wrong, but some are useful” – George Box (University of Wisconsin)

Further Implications
• In the courses:
  – Introductory statistics
  – Statistical literacy
  – Mathematics for liberal arts
• Statistical thinking will one day be as necessary a qualification for efficient citizenship as the ability to read and write. – H.G.
Wells

Rational vs Emotional
• Statistics and Mathematics are perceived as rule enforcers
• People do not like being told what to do or what not to do
• We are constantly saying do not play the Lottery
  – My life is a personal failure

Mega Millions
• July 2, 2004
• Mega-Millions jackpot reaches $290,000,000
• Probability of winning is .000000007399 = 7.399 × 10^-9

Fox News Cleveland
Dr. Killjoy
• 57 times more likely to die from a motor vehicle accident that day than win MegaMillions
• 21 times more likely to die from a lightning strike in a year than win MegaMillions

Why Do Students Find Statistics Frustrating?
1. Stilted Language
  – Recall an earlier phrase
    • “Cannot assume there is no confounding”
  – We are the masters of the double negative

Confidence Intervals
• Students want to say
  – The probability the mean is in the interval is 95%
• What we require them to say
  – “We are 95% confident the interval (a,b) captures the unknown population mean”
  – When drawing random samples from a population, calculating the intervals in this manner captures the unknown mean 95% of the time.

Hypothesis Testing
• Want to say “Accept Null”
• Have to say “Fail to Reject Null”
  – (AND we make them put it in context)
• Again we statisticians can’t be certain (or accepting) of anything

2. Look At What We Make Them Do

3. Statistics Taught By Folks Who Are Not Trained Statisticians
• Statistics was added “on the side” to their training
• Not sure of the “why”, so it is difficult to motivate
• Teaching statistics is “scraping the bottom of the barrel” in classroom assignments
• “In God We Trust, All Others Bring Data”
• W. Edwards Deming (TQM Guru)
• At CSU, there are at least 7 different departments teaching some kind of introductory statistics, comprising over 100 faculty
• Only 4 faculty on campus have a Ph.D.
in Statistics
• At many schools that may be even lower

Differences Between Mathematics and Statistics
• Statistics is too dirty
• Mathematics is pure and pristine
• Mathematics is built on axioms, definitions, and theorems
• Statistics is built on “flawed” processes right from the very beginning

Inferential Statistics
Giant Leaps of Faith
• Assume the population is definable
• Assume the population is stable
• Assume the sample is representative (bias free)
• If all this is true, then we can rely on Mathematics for our confidence interval to capture the mean 95% of the time.
• Often mathematicians want “perfect” studies or nothing
• “If you do not know what to measure, measure anyway, you’ll learn what to measure next time.” – David Moore (Purdue University)
• Assessment

No Quod Erat Demonstrandum
• I get a representative sample
• The sample size is large enough to invoke the Central Limit Theorem
• I calculate $\bar{X} \pm 1.96\,\frac{s}{\sqrt{n}}$
• I still do not know if my interval contains the unknown mean

ERGO
• I have to wonder . . .
• Mathematicians do not like uncertainty

Difference #2
• Applied Statisticians have to communicate with other researchers
• These researchers often have limited statistical training
• (Present company excluded), mathematicians are not exactly known for their patience with those deemed less worthy
• The main challenge is to take a scientific hypothesis and turn it into a testable statistical hypothesis
• Have to convince researchers that input prior to collecting data is critical
  – Cleveland Cavaliers
• Have to educate them not to “Stone the Messenger”

Difference #3
• Statisticians make more money
• Statisticians have more job options
• Go to icrunchdata.com

Listings 1-50 of 119

Job No. | Job Title | Company Name | Date Posted | State | Exp. (years) | Salary
825 | Senior Marketing Analyst | Advanced Financial Services, Inc. | 3/28/2005 | RI | 5-8 | 80-89K
824 | Employment Systems Analyst & Researcher | University of Connecticut | 3/28/2005 | CT | 0-2 | --
823 | Senior Research Analyst Fortune 100 Company | UnitedHealth Group | 3/25/2005 | MN | 3-4 | --
822 | Manager, Statistical Analysis | The Brixton Group, Inc. | 3/24/2005 | VA | 5-8 | 100-109K, 110-119K, 120-129K, 130-139K, 140-149K
821 | Sr. Statistician | The Brixton Group, Inc. | 3/24/2005 | VA | 5-8 | 90-99K, 100-109K
820 | Business Analyst | The Brixton Group, Inc. | 3/24/2005 | VA | 0-2 | 50-59K, 60-69K
819 | Informatics Statistics Manager, Senior/Lead Informatics Analyst and Informatics Analyst | BSA Advertising for Aetna | 3/22/2005 | PA | 0-2 | --
818 | DATABASE MARKETING SPECIALIST | Home Shopping Network | 3/21/2005 | FL | 0-2 | --
810 | Statistician (Marketing) | Vistrio | 3/21/2005 | OH | 3-4 | open

• Try going to www.idoproofs.com
• Great Opportunities in Math
  – 101 Careers in Mathematics
  – http://www.maa.org

My Own History
• BS in Mathematics
• MS in Mathematics
• Took Prelims in Real Analysis, Topology, Complex Analysis, and Math Stat
• Would have gotten a Ph.D. in mathematics …
• I do love Mathematics and Mathematicians
• HONEST!

Why Can’t We Be Friends???
• Undergraduate Math Departments Need Math Majors
• Graduate Statistics Departments Need Applicants
• We need to offer mathematically talented students as many options as possible

Easier Said Than Done
• We need to let undergraduates know what statistics is
• Traditional Probability and Statistics sequence is NOT statistics
• Students need authentic experience working with data

Enrollment
• 264,000 students took Elementary Statistics according to the 2000 CBMS survey
• www.ams.org/cbms
• 77,000 to take AP STATS in 2005
• These people are NOT welcome in Mathematics Departments

If I were King of the World …
• Calculus I, II, III
• Linear Algebra
• Intro Proof/Discrete
• Differential Equations
• Real Analysis
• Probability
• Math Stat
• Applied Stats

Do Not Reinvent The Wheel
• The American Statistical Association has guidelines:
  – Majors
  – Concentrations
  – Minors
  – Google Search USEI Guidelines
  – Journal of Statistics Education
• www.amstat.org/jse

Shameless Plug …
• Check out an innovative statistics course for majors at www.rossmanchance.com (click ISCAT link)
• Beth Chance and Allan Rossman
• Investigating Statistical Concepts, Applications, and Methods (Duxbury)
• MAA PREP Workshop July 18-22
• www.maa.org/prep/2005

Goals
• Show specific examples of frustrating news stories involving statistics
• Discuss the importance of these “soft” ideas in low-level courses
• “Feel the Pain” of my own tortured statistics students
• Discuss the differences between statistics and mathematics
• Talk about how we need each other – desperately!!!

Last Quote
“To Understand God’s Thoughts We Must Study Statistics, for These Are the Measure of his Purpose”
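
Supplementary sketch
Two numerical points from the talk can be checked with a few lines of Python: the absolute versus relative risk arithmetic for the WHI breast cancer result, and the claim that intervals of the form sample mean ± 1.96 s/√n capture the unknown mean about 95% of the time. This is only an illustration under stated assumptions. The function names `baseline_cases` and `coverage_rate`, and the simulated population (normal, mean 50, standard deviation 10, samples of size 100), are invented here and do not come from the slides.

```python
import math
import random
import statistics


def baseline_cases(extra_cases, relative_increase):
    """Solve (X + extra_cases) / X = 1 + relative_increase for X."""
    return extra_cases / relative_increase


def coverage_rate(true_mean=50.0, true_sd=10.0, n=100, trials=2000, seed=1):
    """Fraction of intervals x_bar +/- 1.96 * s / sqrt(n) that contain true_mean.

    The population parameters are assumptions chosen for illustration.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        x_bar = statistics.fmean(sample)
        half_width = 1.96 * statistics.stdev(sample) / math.sqrt(n)
        if x_bar - half_width <= true_mean <= x_bar + half_width:
            hits += 1
    return hits / trials


# WHI figures from the slides: 8 extra cases per 10,000 women, a 26% increase.
placebo = baseline_cases(8, 0.26)      # about 30.77 cases per 10,000
hrt = placebo + 8                      # about 38.77 cases per 10,000
print(round(placebo, 2))               # 30.77
print(round(placebo / 10_000, 4))      # 0.0031
print(round(hrt / 10_000, 4))          # 0.0039

# Long-run coverage of the 95% interval; any single interval either
# contains the mean or it does not, which is why the wording is stilted.
print(coverage_rate())
```

The simulated coverage should land close to 0.95 but will vary slightly with the seed and the number of trials, which is itself a small demonstration of the uncertainty the talk describes.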