How to Lie with Stats

How to Lie with Statistics
Edward H. Freeman
Statisculation: misinforming people by the use of
statistical material.
The better you know the subject, the less
likely you are to misuse it by mistake, or be
taken in by those who misuse statistics on
purpose.
Lying With Statistics
• He uses statistics as a drunken man uses
lampposts – for support rather than for
illumination. Andrew Lang
There are three kinds of lies:
Lies, Damned Lies, and Statistics.
British Prime Minister
Benjamin Disraeli
British Economist
Alfred Marshall
Mark Twain
Definition of Statistics: The science of producing unreliable
facts from reliable figures.
We reach conclusions from our
statistics – often incorrectly.
How To Lie With Statistics
• Written in 1954 by Darrell Huff (1913-2001), editor
of Better Homes and Gardens.
• Huff had no formal training in statistics.
• Over 1,500,000 copies sold in English.
• Breezy, highly-readable informal style copied by
Dummies books.
• 152 pages – lots of pictures.
I couldn’t find a photograph of Darrell Huff anywhere on the
Internet.
Other books by Darrell Huff
• The Complete How To Figure It: Using Math in
Everyday Life
• How to Take a Chance
• How to work with concrete and masonry
• Score: The strategy of taking tests
• Complete Book of Home Improvement
• How to figure the odds on everything
• Pictures by Pete, a career story of a young
commercial photographer,…
• Twenty careers of tomorrow
• Woodpulp and Ink: The less reputable newsstand
magazines, 1919-1939
• How to save on the home you want
Mark Twain’s Definition of a Classic
Something that everybody wants to have read
and nobody wants to read.
Popular Books on Technical Subjects
• The Universe and Doctor Einstein by Lincoln
Barnett – “among the clearest, most readable
expositions of relativity theory.”
• Mathematics for the Nonmathematician by
Morris Kline – “entertaining overview follows
development of mathematics from ancient
Greeks to present.”
• A Brief History of Time by Stephen Hawking –
“easy, good-natured humor and an ability to
illustrate highly complex propositions with
analogies plucked from daily life.”
Chapter 5
The Gee Whiz Graph
West Hartford Real Estate Sales 2007
Agency        Sales
Coldwell      254
Raveis        187
Prudential    173
Re/Max        107
Let’s Make a Graph!!
[Bar chart: sales by agency, vertical axis from 0 to 300.
Coldwell 254, Raveis 187, Prudential 173, Re/Max 107.]
Let’s make another graph!
Year After Year Your #1 Team in West Hartford Real Estate!
[Bar chart: the same sales figures, vertical axis from 100 to 250.]
Chop off the bottom and stretch the top.
Nobody is Selling Anything!
[Bar chart: the same sales figures, vertical axis from 0 to 10,000.]
Squeeze everything to the bottom.
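The two tricks can be put in numbers. What follows is a minimal sketch, assuming the axis ranges read off the three charts (0 to 300, 100 to 250, 0 to 10,000). The helper names drawn_fraction and bar_ratio are made up for this illustration.

```python
# Sales figures from the West Hartford table in the slides.
sales = {"Coldwell": 254, "Raveis": 187, "Prudential": 173, "Re/Max": 107}


def drawn_fraction(value, axis_min, axis_max):
    """Share of the plot height a bar fills (hypothetical helper).

    A result above 1.0 means the bar runs past the top tick.
    """
    return (value - axis_min) / (axis_max - axis_min)


def bar_ratio(axis_min, axis_max, top="Coldwell", bottom="Re/Max"):
    """How many times taller the top bar looks than the bottom bar."""
    return (drawn_fraction(sales[top], axis_min, axis_max)
            / drawn_fraction(sales[bottom], axis_min, axis_max))


# What the numbers actually say: Coldwell sold about 2.4 times as much.
true_ratio = sales["Coldwell"] / sales["Re/Max"]

# Axis starts at zero: bar heights keep the true ratio.
honest_ratio = bar_ratio(0, 300)

# Axis starts at 100: the bars are 154 and 7 units tall, so 22 to 1.
chopped_ratio = bar_ratio(100, 250)

# Axis runs to 10,000: even the tallest bar fills a sliver of the chart.
squeezed_top = drawn_fraction(sales["Coldwell"], 0, 10_000)

print(f"true ratio:          {true_ratio:.2f} to 1")
print(f"zero-baseline chart: {honest_ratio:.2f} to 1")
print(f"chopped chart:       {chopped_ratio:.2f} to 1")
print(f"squeezed chart: tallest bar fills {squeezed_top:.1%} of the height")
```

The data never changes. Only the axis does, and the eye reads bar height, not tick labels.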
Points to Ponder
• Is the data true and accurate? The Excel Graphing
Fallacy.
• Are there other factors?
– Prudential specializes in McMansions.
– There is another agency, not on the chart, that sold 1000
houses last year.
• Does a seller care how many houses a realtor sells, or
only that she sells his house?
DATA DISTORTION
Year    % of Doctors Who Are     Ratio to      Number of
        Family Practitioners     Population    Doctors
1964    27.0%                    1:2247        8023
1975    16.9%                    1:3157        6064
1990    12.0%                    1:4232        5212
UNREADABLE CHARTS
(but don’t it look nice!?)
The Pentagon Spaghetti Slide
The Global Warming Chart
AVERAGES
I abhor averages. I like the individual case. A
man may have six meals one day and none
the next, making an average of three meals
per day, but that is not a good way to
live. Justice Louis D. Brandeis
Chapter 2 – The Well-Chosen Average
Three Types of Averages:
• Mean – The traditional average ($5,700)
• Median – The one in the middle. 12 make more
and 12 make less. ($3,000)
• Mode – The salary that occurs most often.
($2,000)
Salary      Employees
$45,000     1
$15,000     1
$10,000     2
$5,700      1    (Mean)
$5,000      3
$3,700      4
$3,000      1    (Median)
$2,000      12   (Mode)
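The three averages can be checked directly from the salary table. This is a minimal sketch using Python's standard statistics module; the variable names are chosen here for illustration only.

```python
from statistics import mean, median, mode

# Salary -> number of employees, copied from the table above.
salary_counts = {
    45_000: 1,
    15_000: 1,
    10_000: 2,
    5_700: 1,
    5_000: 3,
    3_700: 4,
    3_000: 1,
    2_000: 12,
}

# Expand the table into one entry per employee (25 in all).
salaries = [salary for salary, count in salary_counts.items()
            for _ in range(count)]

average = mean(salaries)      # total payroll divided by headcount
middle = median(salaries)     # the 13th of 25 salaries when sorted
most_common = mode(salaries)  # the salary held by the most employees

print(f"employees: {len(salaries)}")
print(f"mean:   ${average:,.0f}")
print(f"median: ${middle:,.0f}")
print(f"mode:   ${most_common:,.0f}")
```

All three are honest "averages" of the same payroll. The owner quotes the mean, the union quotes the mode, and neither has lied.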
It is written:
4 out of 5 dentists surveyed would recommend sugarless gum to
their patients who chew gum.
Television Questions to Ponder
Mary Ann or Ginger?
What did Wilma Flintstone see in
Fred?
What did the fifth dentist
recommend?
Probably not:
Smoking is one of the leading causes of statistics.
Fletcher Knebel
An Explanation
• The question, asked in a 1976 market research
survey, was whether dentists would recommend
sugared gum, sugarless gum, or no gum at all to their
patients who chew gum.
• Out of about 1,200 dentists, 85% recommended
sugarless gum, with the rest pretty much going to
“no gum at all.”
• There is no hard evidence any dentist was in favor of
sugared gum.
http://www.bookofodds.com/Daily-Life-Activities/Articles/A0471-No-Gum-at-All-1-in-10
An Explanation (Continued)
• Compare with: “Four out of five oncologists recommend low tar
cigarettes for their patients who smoke.”
• Would any dentist say that their patients
should chew gum – sugarless or not?
Two more quotes about statistics
• The theory of probabilities is at bottom
nothing but common sense reduced to
calculus. Laplace
• There are two kinds of statistics, the
kind you look up and the kind you make
up. Rex Stout
Sample with the Built-In Bias
• Time Magazine (circa 1950) “The average
Yalesman, Class of 1924, makes $25,111 a year.”
• Four categories of alumni
– Those who responded
– Those who did not reply
– Those whose addresses are unknown
– Those who are dead
Those Who Responded
• Did they tell the truth? Will one multimillionaire
skew the average? (Outliers)
• If they lied:
– Did they exaggerate? (to impress their fellow graduates)
– Did they underreport? (to avoid problems with the IRS)
– Do the liars balance each other out?
– Do we know?
Those who didn’t reply
Those whose addresses are unknown
• Fact: Well known alumni are easy to locate.
• Who did not reply to the survey?
– Low Income Earners – “clerks, mechanics, tramps,
unemployed alcoholics, barely surviving writers
and artists…people of whom it would take half a
dozen or more to add up to an income of
$25,111.”
– Tax Cheats – Those who don’t want anybody
(the IRS) to know their income.
– Private People – Those who don’t consider
their income anybody’s business.
Chapter 10 – How to Talk Back to a
Statistic
• Who Says So?
– The OK Name. Freeman Institute for
Advanced Statistical Control and
Organization. (FIASCO)
– The PowerPoint/Excel Syndrome
– “When an OK name is cited (i.e. the Harvard
Institute), make sure that the authority
stands behind the information, not merely
somewhere alongside it.”
Some More Quotations
• Statistics show that many people watch our show from the
bedroom and people you ask into your bedroom have to be more
interesting than those you ask into your living room.
Jack Paar, late night host
• Then there is the man who
drowned crossing a stream
with an average depth of
six inches. W.I.E. Gates
Who Says So? (continued)
– Chicago Journal of Commerce
• Sent out a survey to 1200 corporations asking about
price gouging and hoarding during the Korean War.
• 169 responded (14%)
– 9% said they had not raised prices
– 5% said they had raised prices
– 86% didn’t answer at all.
• “The survey shows that corporations have done exactly
the opposite of what the enemies of the American
business have charged.” (Emphasis mine)
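The arithmetic behind the survey is worth a second look. This is a minimal sketch using only the figures on the slide. The "bounds" step is an added illustration of what the silent 86% could hide, not something the Journal reported.

```python
surveyed = 1200
responded = 169

# About 14% of the corporations answered at all.
response_rate = responded / surveyed

# Shares of ALL 1200 corporations, as given on the slide.
said_not_raised = 0.09
said_raised = 0.05
no_answer = 0.86

# Illustration: the silent firms might all have raised prices, or none.
# That leaves the true share anywhere between these two extremes.
lowest_possible = said_raised
highest_possible = said_raised + no_answer

print(f"response rate: {response_rate:.1%}")
print(f"share that raised prices: somewhere between "
      f"{lowest_possible:.0%} and {highest_possible:.0%}")
```

A result that could sit anywhere in so wide a range shows nothing about what corporations as a whole did.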
Who Says So? (Continued)
• Whom did they ask?
• Who responded?
• What did they say?
• What did you expect them to say?
• Can they be taken at their word?
Say you were standing with one foot in the oven and
one foot in an ice bucket. According to the
percentage people, you should be perfectly
comfortable. Bobby Bragan
How Does He Know?
• Is the sample accurate?
• Is the sample big enough to represent the
entire population?
• Are the people in this room a fair
representation of all voters in Connecticut?
(Age, Race, Gender, etc)
• Names Taken Out of the Telephone Book.
What’s Missing?
• 33% of the first class of women admitted to
Johns Hopkins University married faculty
members.
• Look Magazine – “A survey of 2800 mothers
shows that over half of the mothers of
children born with Down’s syndrome were
over 35 years old.”
Torture numbers, and they'll confess to anything. Gregg
Easterbrook
What’s Missing (Examples)
• You are three times more likely to be hit by
lightning than you are to be attacked by a shark.
• “This is the first time I have ever seen you sober.”
• April Retail Sales were higher this year than last
year.
Another Quotation
• USA Today has come out with a new survey –
apparently, three out of four people make up
75% of the population. David Letterman
Did Somebody Change the Subject?
• Census Bureau – Half a million more
farms in 1935 than in 1930. Definition of a
farm was revised in 1932.
• 1950 Census – More people 65 – 70 than were
55 – 60 in the 1940 census.
– Not explained by immigration.
– Social Security
– Vanity
Did Somebody Change the Subject?
(Continued)
• “We could take a prisoner from Alcatraz and board him
at the Waldorf-Astoria cheaper.” Senator William
Langer (R – ND)
• Later went to prison himself.
• Comparing cost of hotel room to total maintenance of
a prisoner. (Food, Security)
Senator William Langer (R-ND)
1886 - 1959
How is the Question Phrased?
• Building in bias. Bias can be built into a questionnaire
by little more than careless wording.
• Compare
– Should the government help people who face losing their
homes to foreclosure?
with
– Should you be forced to pay more taxes to help people
make the payments on their houses?
Both accurately describe what will happen.
Correlation vs. Causation
• Post Hoc
– Ice cream sales go up during the summer.
– Homicides increase during the summer.
– Therefore, ice cream leads to murder or murder
leads to ice cream.
– Correlation is not causation. Perhaps neither of
these things has produced the other, but both are
a product of some third factor. (It’s hot). Be
careful when somebody says that A leads to B.
– Mudders, Tampa Bay and UConn Basketball
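The "third factor" idea can be shown with made-up numbers. This is a minimal sketch on synthetic data. The temperatures, sales and homicide counts are all invented for the illustration, and both series are built to depend on temperature only, never on each other.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def residuals(ys, xs):
    """What is left of ys after removing a straight-line fit on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return [y - (mean_y + slope * (x - mean_x)) for x, y in zip(xs, ys)]


rng = random.Random(0)  # fixed seed so the run is repeatable

# Invented daily temperatures for one year (the hidden third factor).
temps = [rng.uniform(0, 35) for _ in range(365)]

# Each series depends on temperature plus its own noise, nothing else.
ice_cream = [50 + 10 * t + rng.gauss(0, 40) for t in temps]
homicides = [5 + 0.2 * t + rng.gauss(0, 2) for t in temps]

# Raw correlation: looks as if ice cream and murder go together.
raw_r = pearson(ice_cream, homicides)

# Remove the temperature trend from both, then correlate what remains.
adjusted_r = pearson(residuals(ice_cream, temps),
                     residuals(homicides, temps))

print(f"raw correlation:               {raw_r:.2f}")
print(f"after removing temperature:    {adjusted_r:.2f}")
```

Once the heat is accounted for, the link between the two series should all but vanish, which is the point of asking "what else could explain both?"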
Non Representative Sample
• Practically all statistics are based on a sample
of a population. So…...
– how was the sample chosen?
– how big is the sample?
– what population does it claim to represent?
– what population does it actually represent?
– The Self-Selecting Sample
Other ways to Misuse Statistics
• Overgeneralizing. Example: Studying only men, and
then generalizing conclusions to both men and
women.
• Interpreting probability as certainty.
Example:
– Finding that women are more likely than men to
favor strict gun control only means that women
have a higher probability of favoring strict gun
control than men.
– It does not mean that all women favor strict gun
control and all men do not favor it.
Other ways to Misuse Statistics
• Faking data. A famous instance of this occurred in a
study of separated identical twins.
– The researcher wished to show that despite
separation, twins remained similar in certain
traits.
– It was later shown that the data were fabricated.
– Lies, Lies, I can’t believe a word you say. The
Knickerbockers, 1966
• Using data selectively. Sometimes a survey includes
many questions, but the researcher reports on only a
few of the answers.
Data Precision
Quoting specific numbers, especially ones with
decimal points, can look authoritative: "Real
estate values up 4.95%." Why would someone
be so precise if they didn't know their stuff?
The numbers can be wild guesses, but the
apparent precision gives an air of authority.
Samples that went wrong (Part I)
1936 Presidential Election
• FDR v. Alfred Landon –Republican Governor of
Kansas
• Literary Digest – Survey of 2,500,000 voters
concludes that Landon would win in a landslide
• Survey consisted of
– Subscribers
– Readily Available Lists
• Automobile Owners
• Telephones.
• The Literary Digest folded in 1938
President Alfred Landon
(1887 – 1987)
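Why did a huge sample fail? A simulation makes the point. This is a minimal sketch with an invented electorate: the 40% "on a list" share and the voting preferences are assumptions picked for the illustration, not historical figures.

```python
import random

rng = random.Random(1936)  # fixed seed so the run is repeatable

POPULATION = 200_000

# Invented electorate. Each voter is (on_a_list, votes_fdr).
# Assumption: 40% appear on car or telephone lists and lean Landon;
# everyone else leans strongly toward FDR.
voters = []
for _ in range(POPULATION):
    on_a_list = rng.random() < 0.40
    chance_fdr = 0.40 if on_a_list else 0.75
    voters.append((on_a_list, rng.random() < chance_fdr))

true_share = sum(fdr for _, fdr in voters) / POPULATION

# Digest-style poll: enormous, but drawn only from the lists.
listed = [fdr for on_a_list, fdr in voters if on_a_list]
biased_sample = listed[:50_000]
biased_share = sum(biased_sample) / len(biased_sample)

# Small poll: only 1,000 voters, but picked at random from everyone.
random_sample = rng.sample(voters, 1_000)
random_share = sum(fdr for _, fdr in random_sample) / len(random_sample)

print(f"true FDR share:            {true_share:.1%}")
print(f"50,000 from the lists:     {biased_share:.1%}")
print(f"1,000 chosen at random:    {random_share:.1%}")
```

In this toy electorate the big list-based poll calls the wrong winner, while the small random one lands near the truth. Size does not cure a biased sample.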
Samples that went wrong (Part II)
1948 Presidential Election
• Virtually every poll indicated that incumbent President
Harry S. Truman would be defeated by New York Governor
Thomas E. Dewey. Truman won, overcoming a three-way
split in his own party.
Six Questions to Ask About Any
Statistic
1. Who Created it? Do They Have an Agenda?
(Republican Pollster)
2. Why was it created? For Research or to persuade?
3. How was it Created? What Methodology was used?
4. What is missing? Is there some hidden context?
5. Is it relevant? Does it tend to mislead the reader?
6. Does it make sense? If it sounds ridiculous, it
probably is.
How to Avoid Being Sucked In
• Be suspicious of any data that does not
identify the number of cases sampled or
does not provide the probable error.
• Be skeptical of the conclusions reached.
• Are they playing with your emotions?
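The number of cases matters because it sets how far a poll result can wander by chance alone. This is a minimal sketch using the textbook 95% margin of error for a sample proportion, z * sqrt(p(1 - p) / n). It assumes a simple random sample, and it uses the modern margin of error as a stand-in for the older "probable error". The function name is made up for this illustration.

```python
import math


def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a proportion p from n cases.

    Assumes a simple random sample; z=1.96 is the usual 95% multiplier.
    """
    return z * math.sqrt(p * (1 - p) / n)


# A 50/50 result is the worst case for a given sample size.
for cases in (25, 100, 1_000, 10_000):
    moe = margin_of_error(0.5, cases)
    print(f"n = {cases:>6}: 50% plus or minus {moe:.1%}")
```

A claim like "half of users prefer Brand X" means little if it rests on 25 people, and a report that hides the count hides that weakness too. None of this helps if the sample was biased to begin with.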
How to Avoid Being Sucked In
• Compared to what?
– Associated Press: Almost a third (29%) of all
deaths among nuclear workers aged 44 to 65 were
linked to cancer.
– An independent party observed that 35% of all
deaths of those between 44 and 65 years of age
are attributable to cancer; therefore, the workers
died from cancer at a lower rate than others.
– AP numbers don’t prove anything, one way or the
other.
Conclusions
• Statistics are commonly used to support a biased
position or an outright fabrication for two
reasons.
– Few people understand statistics well enough to
question them.
– Lying with statistics requires no actual lying. If the
most favorable data is highlighted and the most
unfavorable data is suppressed, statistics can be
manipulated to illustrate just about any point of view,
allowing the manipulator’s hands to remain unsullied.
Finally
• Statistics are no substitute for judgment.
Henry Clay
Questions or Comments?