How to Lie with Statistics
Edward H. Freeman

Statisculation
Misinforming people by the use of statistical material.
The better you know the subject, the less likely you are to misuse it by mistake, or to be taken in by those who misuse statistics on purpose.

Lying With Statistics
• He uses statistics as a drunken man uses lampposts – for support rather than for illumination. Andrew Lang
• There are three kinds of lies: Lies, Damned Lies, and Statistics. British Prime Minister Benjamin Disraeli, British Economist Alfred Marshall, Mark Twain
• Definition of Statistics: The science of producing unreliable facts from reliable figures.
• We reach conclusions from our statistics – often incorrectly.

How To Lie With Statistics
• Written in 1954 by Darrell Huff (1913-2001), editor of Better Homes and Gardens.
• Huff had no formal training in statistics.
• Over 1,500,000 copies sold in English.
• Breezy, highly readable, informal style, copied by the Dummies books.
• 152 pages – lots of pictures.
I couldn’t find a photograph of Darrell Huff anywhere on the Internet.

Other books by Darrell Huff
• The Complete How To Figure It: Using Math in Everyday Life
• How to Take a Chance
• How to work with concrete and masonry
• Score: The strategy of taking tests
• Complete Book of Home Improvement
• How to figure the odds on everything
• Pictures by Pete, a career story of a young commercial photographer,…
• Twenty careers of tomorrow
• Woodpulp and Ink: The less reputable newsstand magazines, 1919-1939
• How to save on the home you want

Mark Twain’s Definition of a Classic
Something that everybody wants to have read and nobody wants to read.
Popular Books on Technical Subjects
• The Universe and Doctor Einstein by Lincoln Barnett – “among the clearest, most readable expositions of relativity theory.”
• Mathematics for the Nonmathematician by Morris Kline – “entertaining overview follows development of mathematics from ancient Greeks to present.”
• A Brief History of Time by Stephen Hawking – “easy, good-natured humor and an ability to illustrate highly complex propositions with analogies plucked from daily life.”

Chapter 5
The Gee Whiz Graph

West Hartford Real Estate Sales 2007
Agency       Sales
Coldwell     254
Raveis       187
Prudential   173
Re/Max       107

Let’s Make a Graph!!
[Bar chart: sales by agency, vertical axis from 0 to 300]

Let’s make another graph!
Year After Year Your #1 Team in West Hartford Real Estate!
[Bar chart: the same sales data, vertical axis from 100 to 250]
Chop off the bottom and stretch the top.

Nobody is Selling Anything!
[Bar chart: the same sales data, vertical axis from 0 to 10,000]
Squeeze everything to the bottom.

Points to Ponder
• Is the data true and accurate? The Excel Graphing Fallacy.
• Are there other factors?
– Prudential specializes in McMansions.
– There is another agency, not on the chart, that sold 1000 houses last year.
• Does a seller care how many houses a realtor sells, or only that she sells his house?

DATA DISTORTION
Year   % of Doctors Who Are Family Practitioners   Ratio to Population   Number of Doctors
1964   27.0%                                       1:2247                8023
1975   16.9%                                       1:3157                6064
1990   12.0%                                       1:4232                5212

UNREADABLE CHARTS (but don’t it look nice!?)
The Pentagon Spaghetti Slide
The Global Warming Chart

AVERAGES
I abhor averages. I like the individual case. A man may have six meals one day and none the next, making an average of three meals per day, but that is not a good way to live. Justice Louis D.
Brandeis

Chapter 2 – The Well-Chosen Average
Three Types of Averages:
• Mean – The traditional average ($5,700)
• Median – The one in the middle. 12 make more and 12 make less. ($3,000)
• Mode – The salary that occurs most often. ($2,000)

Salary     Employees
$45,000    1
$15,000    1
$10,000    2
$5,700     1    (Mean)
$5,000     3
$3,700     4
$3,000     1    (Median)
$2,000     12   (Mode)

It is written:
4 out of 5 dentists surveyed would recommend sugarless gum to their patients who chew gum.

Television Questions to Ponder
Mary Ann or Ginger?
What did Wilma Flintstone see in Fred?
What did the fifth dentist recommend?
Probably not
Probably not: Smoking is one of the leading causes of statistics. Fletcher Knebel

An Explanation
• The question, asked in a 1976 market research survey, was whether dentists would recommend sugared gum, sugarless gum, or no gum at all to their patients who chew gum.
• Out of about 1,200 dentists, 85% recommended sugarless gum, with the rest pretty much going to “no gum at all.”
• There is no hard evidence any dentist was in favor of sugared gum.
http://www.bookofodds.com/Daily-Life-Activities/Articles/A0471-No-Gum-at-All-1-in-10

An Explanation (Continued)
• Compare with: “Four out of five oncologists recommend low tar cigarettes for their patients who smoke.”
• Would any dentist say that their patients should chew gum – sugarless or not?

Two more quotes about statistics
• The theory of probabilities is at bottom nothing but common sense reduced to calculus. Laplace
• There are two kinds of statistics, the kind you look up and the kind you make up. Rex Stout

Sample with the Built-In Bias
• Time Magazine (circa 1950) “The average Yalesman, Class of 1924, makes $25,111 a year.”
• Four categories of alumni
– Those who responded
– Those who did not reply
– Those whose addresses are unknown
– Those who are dead

Those Who Responded
• Did they tell the truth?
• Will one multimillionaire skew the average? Outliers
• If they lied:
– Did they exaggerate?
(to impress their fellow graduates)
– Did they underreport? (to avoid problems with the IRS)
– Do the liars balance each other out?
– Do we know?

Those who didn’t reply
Those whose addresses are unknown
• Fact: Well known alumni are easy to locate.
• Who did not reply to the survey?
– Low Income Earners – “clerks, mechanics, tramps, unemployed alcoholics, barely surviving writers and artists…people of whom it would take half a dozen or more to add up to an income of $25,111.”
– Tax Cheats – Those who don’t want anybody (the IRS) to know their income.
– Private People – Those who don’t consider their income anybody’s business.

Chapter 10 – How to Talk Back to a Statistic
• Who Says So?
– The OK Name. Freeman Institute for Advanced Statistical Control and Organization. (FIASCO)
– The PowerPoint/Excel Syndrome
– “When an OK name is cited (e.g. the Harvard Institute), make sure that the authority stands behind the information, not merely somewhere alongside it.”

Some More Quotations
• Statistics show that many people watch our show from the bedroom and people you ask into your bedroom have to be more interesting than those you ask into your living room. Jack Paar, late night host
• Then there is the man who drowned crossing a stream with an average depth of six inches. W.I.E. Gates

Who Says So? (Continued)
– Chicago Journal of Commerce
• Sent out a survey to 1200 corporations asking about price gouging and hoarding during the Korean War.
• 169 responded (14%)
– 9% of those surveyed said they had not raised prices
– 5% of those surveyed said they had raised prices
– 86% didn’t answer at all.
• “The survey shows that corporations have done exactly the opposite of what the enemies of the American business have charged.” (Emphasis mine)

Who Says So? (Continued)
• Whom did they ask?
• Who responded?
• What did they say?
• What did you expect them to say?
• Can they be taken at their word?

Say you were standing with one foot in the oven and one foot in an ice bucket.
According to the percentage people, you should be perfectly comfortable. Bobby Bragan

How Does He Know?
• Is the sample accurate?
• Is the sample big enough to represent the entire population?
• Are the people in this room a fair representation of all voters in Connecticut? (Age, Race, Gender, etc.)
• Names Taken Out of the Telephone Book.

What’s Missing?
• 33% of the first class of women admitted to Johns Hopkins University married faculty members.
• Look Magazine – “A survey of 2800 mothers shows that over half of the mothers of children born with Down’s syndrome were over 35 years old.”

Torture numbers, and they'll confess to anything. Gregg Easterbrook

What’s Missing (Examples)
• You are three times more likely to be hit by lightning than you are to be attacked by a shark.
• “This is the first time I have ever seen you sober.”
• April Retail Sales were higher this year than last year.

Another Quotation
• USA Today has come out with a new survey – apparently, three out of four people make up 75% of the population. David Letterman

Did Somebody Change the Subject?
• Census Department – Half a million more farms in 1930 than in 1935. Definition of a farm was revised in 1932.
• 1950 Census – More people 65 – 70 than were 55 – 60 in the 1940 census.
– Not explained by immigration.
– Social Security
– Vanity

Did Somebody Change the Subject? (Continued)
• “We could take a prisoner from Alcatraz and board him at the Waldorf-Astoria cheaper.” Senator William Langer (R – ND)
• Later went to prison himself.
• Comparing cost of hotel room to total maintenance of a prisoner. (Food, Security)
Senator William Langer (R-ND) 1886 - 1959

How is the Question Phrased?
• Building in bias. Bias can be built into a questionnaire by little more than careless wording.
• Compare
– Should the government help people who face losing their homes to foreclosure?
with
– Should you be forced to pay more taxes to help people make the payments on their houses?
Both accurately describe what will happen.

Correlation vs. Causation
• Post Hoc
– Ice cream sales go up during the summer.
– Homicides increase during the summer.
– Therefore, ice cream leads to murder or murder leads to ice cream.
– Correlation is not causation. Perhaps neither of these things has produced the other, but both are a product of some third factor. (It’s hot.) Be careful when somebody says that A leads to B.
– Mudders, Tampa Bay and UConn Basketball

Non Representative Sample
• Practically all statistics are based on a sample of a population. So…
– how was the sample chosen?
– how big is the sample?
– what population does it claim to represent?
– what population does it actually represent?
– The Self-Selecting Sample

Other ways to Misuse Statistics
• Overgeneralizing. Example: Studying only men, and then generalizing conclusions to both men and women.
• Interpreting probability as certainty. Example:
– Finding that women are more likely than men to favor strict gun control only means that women have a higher probability of favoring strict gun control than men.
– It does not mean that all women favor strict gun control and all men do not favor it.

Other ways to Misuse Statistics (Continued)
• Faking data. A famous instance of this occurred in a study of separated identical twins.
– The researcher wished to show that despite separation, twins remained similar in certain traits.
– It was later shown that the data were fabricated.
– Lies, Lies, I can’t believe a word you say. The Knickerbockers, 1966
• Using data selectively. Sometimes a survey includes many questions, but the researcher reports on only a few of the answers.

Data Precision
Quoting specific numbers, especially ones with decimal points, can look authoritative. "Real estate values up 4.95%" Why would someone be so precise if they didn't know their stuff? The numbers can be wild guesses, but precision gives an air of authority.

Samples that went wrong (Part I)
1936 Presidential Election
• FDR v.
Alfred Landon – Republican Governor of Kansas
• Literary Digest – Survey of 2,500,000 voters concludes that Landon would win in a landslide
• Survey consisted of
– Subscribers
– Readily Available Lists
  • Automobile Owners
  • Telephones
• The magazine folded in 1938
President Alfred Landon (1887 – 1987)

Samples that went wrong (Part II)
1948 Presidential Election
• Virtually every poll indicated that incumbent President Harry S. Truman would be defeated by New York Governor Thomas E. Dewey. Truman won, overcoming a three-way split in his own party.

Six Questions to Ask About Any Statistic
1. Who created it? Do they have an agenda? (Republican Pollster)
2. Why was it created? For research or to persuade?
3. How was it created? What methodology was used?
4. What is missing? Is there some hidden context?
5. Is it relevant? Does it tend to mislead the reader?
6. Does it make sense? If it sounds ridiculous, it probably is.

How to Avoid Being Sucked In
• Be suspicious of any data that does not identify the number of cases sampled or does not provide the probable error.
• Be skeptical of the conclusions reached.
• Are they playing with your emotions?

How to Avoid Being Sucked In (Continued)
• Compared to what?
– Associated Press: Almost a third (29%) of all deaths among nuclear workers aged 44 to 65 were linked to cancer.
– An independent party observed that 35% of all deaths of those between 44 and 65 years of age are attributable to cancer; therefore, cancer accounted for a smaller share of deaths among the workers than among others of the same age.
– AP numbers don’t prove anything, one way or the other.

Conclusions
• Statistics are commonly used to support a biased position or an outright fabrication for two reasons.
– Few people understand statistics well enough to question them.
– Lying with statistics requires no actual lying.
If the most favorable data is highlighted and the most unfavorable data is suppressed, statistics can be manipulated to illustrate just about any point of view, allowing the manipulator’s hands to remain unsullied.

Finally
• Statistics are no substitute for judgment. Henry Clay

Questions or Comments?
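
For anyone who wants to check the three averages from the Well-Chosen Average salary table, here is a minimal Python sketch. It assumes the table is read as (salary, number of employees) pairs; the names `salary_table` and `expand` are made up for this illustration and do not come from Huff's book.

```python
from statistics import mean, median, mode

# Salary table from the "Well-Chosen Average" slide,
# read as (salary in dollars, number of employees) -- an assumed encoding.
salary_table = [
    (45_000, 1),
    (15_000, 1),
    (10_000, 2),
    (5_700, 1),
    (5_000, 3),
    (3_700, 4),
    (3_000, 1),
    (2_000, 12),
]


def expand(table):
    """Turn (salary, count) rows into one salary entry per employee."""
    return [salary for salary, count in table for _ in range(count)]


salaries = expand(salary_table)

print("employees:", len(salaries))
print("mean:     ", mean(salaries))    # the traditional average
print("median:   ", median(salaries))  # the one in the middle
print("mode:     ", mode(salaries))    # the salary that occurs most often
```

With 25 employees, the same payroll gives a mean of $5,700, a median of $3,000 and a mode of $2,000. All three are honest "averages", which is why it matters which one a speaker chooses to quote.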