How to Lie With Statistics
These worksheets were created by Mike Bryant at Santa Maria High School, 901 S.
Broadway, Santa Maria, CA 93455. Feel free to use them, modify them, and share them. But
PLEASE, PLEASE, PLEASE share with me any improvements you make. And if anyone out
there wants to do the same thing for The Lady Tasting Tea, I would love to not have to create
them myself.
mbryant@smjuhsd.org
How to Lie With Statistics
Introduction and Chapter 1
Read the Introduction and Chapter 1 to How to Lie With Statistics. Answer the following
questions and be ready to discuss the chapter in class.
1. Choose one of the quotations inside the front cover and discuss how it relates to the
Introduction.
2. What does the author mean in the fourth paragraph of Chapter 1, “It is quite
improbably salubrious?”
3. List as many sources of sample bias as you can that are mentioned in Chapter 1 and
provide an example of each.
4. Put the second paragraph on Page 18 (“A river cannot….”) into your own words.
5. What is the advantage of a stratified random sample and what difficulties does it
pose, according to this chapter?
6. On Page 26, the author suggests that most polls are biased in the direction of the
Literary Digest error.
a. What incident does this refer to?
b. That incident took place during (and arguably because of) the Great
Depression. Are the lessons learned from that still relevant for us today?
Why?
c. In what direction is that bias?
7. Comment on the last paragraph of Chapter 1.
How to Lie With Statistics
Introduction and Chapter 1
Read the Introduction and Chapter 1 to How to Lie With Statistics. Answer the following
questions and be ready to discuss the chapter in class.
1. Choose one of the quotations inside the front cover and discuss how it relates to the
Introduction.
a. Disraeli: In a fact-minded culture, those without honesty or understanding use
statistical language to sensationalize or oversimplify (Pg 8).
b. Wells: We are deluged with data (father-in-law’s newspaper) and claims
based on data (antihistamine pills, Pg 8).
c. Ward: Poorly designed procedures or improperly applied statistics can make
us confident in conclusions that are not true at all (newspaper, antihistamine,
Pg 7-8). It is harder to become disabused of these things we feel confident in
(after all, they had a basis in statistical analysis) than to deal with the things
we know we are ignorant about.
2. What does the author mean in the fourth paragraph of Chapter 1, “It is quite
improbably salubrious?” (Pg 11)
It is unlikely that this figure is healthy, that it will stand up to closer inspection.
3. List as many sources of sample bias as you can that are mentioned in Chapter 1 and
provide an example of each.
a. Response errors:
i. Inaccurate knowledge or memory of the respondent. Most people don’t
know their own annual income to the nearest dollar (Pg 12).
ii. Exaggeration or minimization. Some exaggerate their income out of
vanity or minimize it out of fear of the taxman (Pg 12). Harper vs.
True Story (Pg 16). Tooth brushing frequency (Pg 17).
iii. Giving an answer expected to be pleasing to the one asking. Japanese
vs. Nazis (Pg 24).
iv. Non-response. Many will choose not to respond to a questionnaire
perceived to be personal (Pg 14). Many of those that do not respond to
an income question will be those whose incomes are low (Pg 15).
b. Sampling procedure: Under-representation can come from several sources.
i. Less likely to be able to easily locate the Yale men who were less
successful (Pg 13). This makes our sample fail to be representative of
our population. The least successful people cannot afford to attend
reunions, so don’t bother to keep the college informed of their
whereabouts (Pg 15).
ii. Convenience samples. “All kinds of people can be found in a [railroad]
station.” (Pg 21)
iii. Selecting subjects you are more comfortable talking to (Pg 25, 26).
iv. Difficulties in designing or collecting a stratified sample (discussed
below).
c. The source of the bias (or even its existence) may not be readily apparent
(Landon/Roosevelt race, Pg 19-20).
4. Put the second paragraph on Page 18 (“A river cannot….”) into your own words.
If the data upon which a statistical study is based are not representative of the
population, then neither the statistic nor the conclusion can be accurate. But a
conclusion phrased in statistical language and referencing a “statistical study” sounds
so erudite that we tend to accept it at face value, without examining the solidity of
that foundation. See the quote by Artemus Ward inside the front cover.
Discussion: Cancer data based on reconstructed records didn’t include many who had
lived to move from the area (Pg 18). The psychiatrist, observing a steady diet of his
own patients, concludes that everyone is neurotic (Pg 19).
5. What is the advantage of a stratified random sample and what difficulties does it
pose, according to this chapter?
A purely random sample is difficult and expensive to obtain. But difficulties with a
stratified random sample include basing the strata on incorrect assumptions (Pg 21),
problems in identifying subjects who fit in your strata (Pg 22), and the question of
how to achieve randomization while maintaining the strata (Pg 22). The latter can be
especially trying if multiple variables are being considered.
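As a rough illustration of the idea behind a stratified random sample, the Python sketch below draws a random sample within each stratum, in proportion to that stratum's share of the population. The population, the stratum labels "A", "B" and "C", and the function name `stratified_sample` are all made up for this example; none of them come from the book.

```python
import random
from collections import defaultdict

def stratified_sample(population, stratum_of, sample_size, seed=0):
    """Draw a proportional random sample from each stratum.

    population  : list of records
    stratum_of  : function mapping a record to its stratum label
    sample_size : total number of records wanted
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for record in population:
        strata[stratum_of(record)].append(record)

    sample = []
    for label, members in strata.items():
        # Each stratum contributes in proportion to its size.
        quota = round(sample_size * len(members) / len(population))
        sample.extend(rng.sample(members, quota))
    return sample

# Hypothetical population: 600 records in stratum "A", 300 in "B", 100 in "C".
population = (
    [("A", i) for i in range(600)]
    + [("B", i) for i in range(300)]
    + [("C", i) for i in range(100)]
)

sample = stratified_sample(population, lambda rec: rec[0], sample_size=100)
counts = {label: sum(1 for rec in sample if rec[0] == label) for label in "ABC"}
print(counts)
```

Note that the sketch takes for granted that we already know which stratum each subject belongs to and that the strata are the right ones. Those are exactly the difficulties the chapter raises (Pg 21-22).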
6. On Page 26, the author suggests that most polls are biased in the direction of the
Literary Digest error.
a. What incident does this refer to?
Landon/Roosevelt race in 1936 (Pg 20).
b. That incident took place during (and arguably because of) the Great
Depression. Are the lessons learned from that still relevant for us today?
Why?
See Question c.
The prevalence of cell phones, which are not normally called in telephone
polls, became a subject of concern in polls during the 2004 Bush-Kerry
presidential race. Cell-phone-only users tend to be disproportionately young,
and many feared (or hoped) that the polls showing Bush in the lead were
under-representing young voters, who tend to be more liberal. As the number
of cell phone users continues to grow, this may turn into a real challenge for
pollsters.
c. In what direction is that bias?
The bias is “toward the person with more money, more education, more
information and alertness, better appearance, more conventional behavior and
more settled habits than the average of the population…” (Pg 26).
7. Comment on the last paragraph of Chapter 1.
Though the title of the book refers to lying with statistics, the greater danger is that
statistics can be misleading due to sample bias even with the best-intentioned study.
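To see how a well-intentioned survey can still mislead, here is a small Python simulation of the non-response problem from Question 3. It is only a sketch: the income distribution and the response rates (20% for people below the median income, 60% for everyone else) are assumptions invented for illustration, not figures from the book.

```python
import random

rng = random.Random(1)

# Hypothetical population of 10,000 annual incomes (made-up distribution).
incomes = [rng.lognormvariate(10, 0.6) for _ in range(10_000)]
true_mean = sum(incomes) / len(incomes)
cutoff = sorted(incomes)[len(incomes) // 2]  # roughly the median income

def responds(income):
    # Assumption for illustration: people below the median income answer
    # the questionnaire 20% of the time, everyone else 60% of the time.
    rate = 0.2 if income < cutoff else 0.6
    return rng.random() < rate

responses = [x for x in incomes if responds(x)]
survey_mean = sum(responses) / len(responses)

print(f"true mean:   {true_mean:,.0f}")
print(f"survey mean: {survey_mean:,.0f}")
```

Nobody in this simulation lies about income, yet the survey mean comes out higher than the true mean simply because of who chose to answer.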
How to Lie With Statistics
Chapter 2
Read Chapter 2 of How to Lie With Statistics. Answer the following questions and be ready
to discuss the chapter in class.
1. Chapter 2 begins with an illustration in which someone is snobbish about an income
of $15,000 a year. Why did the author choose such a small income figure when it
seems that a much higher one would fit the context better?
2. Summarize Chapter 2 in a few sentences.
3. What did the hypothetical business partners do in Chapter 2 to make the distribution
of earnings seem more palatable to the employees?
4. When we see an average reported, what do we need to ask besides which kind of
average is being used? Why?
5. What confidence level was used in the Census Bureau’s report referred to in this
chapter? How do you know? Is the way that this confidence level is described a
correct interpretation of the confidence level?
6. Which kind of “average” would best describe the height of students at our high
school? Why?
7. Which kind of “average” (statisticians call all three “measures of central tendency”)
would give me the best way to compare the performance of two classes of a required
math course? Why?
How to Lie With Statistics
Chapter 2
Read Chapter 2 of How to Lie With Statistics. Answer the following questions and be ready
to discuss the chapter in class.
1. Chapter 2 begins with an illustration in which someone is snobbish about an income
of $15,000 a year. Why did the author choose such a small income figure when it
seems that a much higher one would fit the context better?
The book was written in 1954!
2. Summarize Chapter 2 in a few sentences.
The term “average” is often used to hide that a particular measure (mean, median, or
mode) has been chosen in order to make the desired impression. “Average” is not a
helpful term because its meaning is too broad. Examples: neighborhood average
income (Pg 27-28), advertised average pay (Pg 31), average magazine reader (Pg
36).
3. What did the hypothetical business partners do in Chapter 2 to make the distribution
of earnings seem more palatable to the employees?
They took a large share of their income as bonuses, which were not reported as
“profit.” They included their own salaries in the “average wage or salary,” but not in
the “average profit of owners.” And they used the mean rather than the median, so
that their own considerable salaries would raise the average. Then, on the poster,
they reported the profit as a percentage of the company’s income, to make the wages
sound even higher (Pg 34).
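The effect of including the owners' salaries in the mean can be checked with a few lines of Python. The payroll below is hypothetical (3 owners and 90 employees with invented salaries) and is not taken from the book's example.

```python
from statistics import mean, median

# Hypothetical payroll, not the book's figures:
# 3 owners paid 110,000 each and 90 employees paid 22,000 each.
owners = [110_000] * 3
employees = [22_000] * 90

everyone = owners + employees

print("mean with owners:    ", round(mean(everyone)))
print("median with owners:  ", median(everyone))
print("mean, employees only:", mean(employees))
```

Three large salaries are enough to pull the mean well above what any employee actually earns, while the median does not move at all.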
4. When we see an average reported, what do we need to ask besides which kind of
average is being used? Why?
What is being included in the average? Examples: US Steel report of increase in
average weekly earnings that ignored a change in the number of partially employed
people (Pg 35). Average profit manipulation (Pg 34). Meaning of family (as well as
“average”) in census report (Pg 35).
5. What confidence level was used in the Census Bureau’s report referred to in this
chapter? How do you know? Is the way that this confidence level is described a
correct interpretation of the confidence level?
95%. It said there were 19 chances out of 20 that the estimate was correct within the
margin of error (Pg 35). This is not a correct interpretation, since the estimate is
either correct (100%) or not (0%). Our 95% confidence that the true median lies
within the stated interval means that our procedures will generate an interval that
captures the true median 95% of the time.
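The "95% of the time" interpretation can be illustrated by simulation. The sketch below repeatedly draws samples from a population whose mean we know and counts how often the interval captures it. To keep the arithmetic simple it uses a mean with a known standard deviation rather than the median from the Census report, and all of the numbers (population mean 100, standard deviation 15, samples of 50) are assumptions chosen for illustration.

```python
import random
from math import sqrt

rng = random.Random(42)

TRUE_MEAN = 100.0   # hypothetical population mean
SIGMA = 15.0        # hypothetical population standard deviation, assumed known
N = 50              # size of each sample
TRIALS = 2000       # number of repeated samples

captured = 0
for _ in range(TRIALS):
    sample = [rng.gauss(TRUE_MEAN, SIGMA) for _ in range(N)]
    estimate = sum(sample) / N
    margin = 1.96 * SIGMA / sqrt(N)  # 95% margin of error
    if estimate - margin <= TRUE_MEAN <= estimate + margin:
        captured += 1

coverage = captured / TRIALS
print(f"share of intervals capturing the true mean: {coverage:.3f}")
```

Each individual interval either contains the true value or it does not. The 95% describes the procedure over many repetitions, not any single interval.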
6. Which kind of “average” would best describe the height of students at our high
school? Why?
There would be little difference between the three measures, since heights tend to be
normally distributed (Pg 29).
7. Which kind of “average” (statisticians call all three “measures of central tendency”)
would give me the best way to compare the performance of two classes of a required
math course? Why?
The median would be a better measure, since most required courses see the majority
of the student grades falling within a relatively narrow passing range, but the failing
grades spread over a much larger range. The mean would be sensitive to the scores
of those few students who never showed up for class.
How to Lie With Statistics
Chapter 3
Read Chapter 3 of How to Lie With Statistics. Answer the following questions and be ready
to discuss the chapter in class.
1. Explain why advertisers often rely on a very small sample to substantiate their claims.
2. We have learned that a proper sample of only 1,000 individuals from the entire
population of the United States can give us results with only a 3% margin of error
(assuming p=.5, at a 95% confidence level). Why, then, does the author suggest that a
sample of over a thousand can be much too small in some situations?
3. Chapter 3 is titled “The Little Figures That Are Not There.” This chapter discusses at
least five different kinds of information that may be missing when we are faced with
a claim based in data. See how many of these missing pieces of information you can
find mentioned in the chapter.
4. On Page 44 the author uses two words that may be new to you: rotogravure and
invidious. Provide a brief definition of each and explain how the author is using the
word in this context.
5. What does the author mean on Page 45 when he says, “Hardly anybody is exactly
normal in any way…?”
6. How can the graph of the advertising agency’s business be misleading when the
graph clearly shows such an upward trend?
7. When reviewing scholarship applications, I was given a table that listed the GPA of
each applicant. What other information do I need in order to evaluate these
applicants’ academic performance?
How to Lie With Statistics
Chapter 3
Read Chapter 3 of How to Lie With Statistics. Answer the following questions and be ready
to discuss the chapter in class.
1. Explain why advertisers often rely on a very small sample to substantiate their claims.
With a small sample, the differences that occur due to chance create large variations in
the proportions, which the advertisers can exploit to give a false impression of the
effectiveness of their product. With large samples, the differences produced by
chance tend to be smaller proportions (Pg 39). Remember the Law of Large Numbers
(YMM Pg 390)?
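A quick coin-toss simulation in Python shows the same thing. This is only a sketch: the sample sizes (10 and 1000) and the number of repetitions (200) are arbitrary choices, and the helper `proportion_range` is a name made up for this example.

```python
import random

rng = random.Random(7)

def proportion_range(sample_size, repeats=200):
    """Return the lowest and highest share of heads over repeated samples."""
    shares = []
    for _ in range(repeats):
        heads = sum(rng.random() < 0.5 for _ in range(sample_size))
        shares.append(heads / sample_size)
    return min(shares), max(shares)

small = proportion_range(10)
large = proportion_range(1000)
print("samples of 10:  ", small)
print("samples of 1000:", large)
```

With samples of 10, an advertiser who keeps trying will soon find a sample that seems to show a dramatic result. With samples of 1000, the proportions stay close to one half.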
2. We have learned that a proper sample of only 1,000 individuals from the entire
population of the United States can give us results with only a 3% margin of error
(assuming p=.5, at a 95% confidence level). Why, then, does the author suggest that a
sample of over a thousand can be much too small in some situations?
When dealing with probabilities that are very small, a much larger sample may be
needed. The author used the example of the polio vaccine study, where the usual
incidence rate would only lead us to expect two cases in the sample of over 1100 that
was used (Pg 40).
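The 3% figure in the question, and the trouble with very small probabilities, can both be checked with the usual normal-approximation formula for the margin of error of a proportion. In the Python sketch below, the function name `margin_of_error` is my own, and the rare-event rate of 0.002 is a hypothetical value chosen for illustration, not the polio incidence rate from the book.

```python
from math import sqrt

def margin_of_error(p, n, z=1.96):
    """Approximate margin of error for a sample proportion.

    Uses the normal approximation z * sqrt(p * (1 - p) / n);
    z = 1.96 corresponds to a 95% confidence level.
    """
    return z * sqrt(p * (1 - p) / n)

# The situation in the question: p = 0.5, n = 1000.
moe = margin_of_error(p=0.5, n=1000)
print(f"margin of error at p = 0.5:   {moe:.3f}")

# A hypothetical rare event with the same sample size.
rare_p = 0.002
rare_moe = margin_of_error(p=rare_p, n=1000)
print(f"margin of error at p = 0.002: {rare_moe:.4f}")
```

For the rare event the margin of error is larger than the proportion being estimated, so a sample of 1000 tells us almost nothing about it.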
3. Chapter 3 is titled “The Little Figures That Are Not There.” This chapter discusses at
least five different kinds of information that may be missing when we are faced with
a claim based in data. See how many of these missing pieces of information you can
find mentioned in the chapter.
a. The degree of significance, how likely it is that a test figure represents
something produced by chance (Pg 42).
b. The range or deviation from the average (Pg 42).
i. Houses built for the 3.6-person average family, which only made up
45% of the population (Pg 43). (This was from the 1950 census. The
2000 census shows the average family size has fallen to 3.14, but
defines family as two or more related persons living together, so this
figure does not include homes with one individual. The average
household size is only 2.59.)
ii. Gesell’s norms (Pg 44).
iii. Oklahoma City average temperature, with a range of 130˚ (Pg 52).
c. What is being compared to, such as in the claim that a treatment made steel
three times harder (Pg 47).
d. What is meant by a word, such as in the claim that electric power was
“available” to ¾ of the farms (Pg 47).
e. Labels or scales on graphs. Grape Nuts energy released (Pg 50). Advertising
agency’s business trend (Pg 51).
4. On Page 44 the author uses two words that may be new to you: rotogravure and
invidious. Provide a brief definition of each and explain how the author is using the
word in this context.
Rotogravure is something printed using a particular kind of printing press,
particularly the color inserts in a Sunday newspaper. Invidious means unfair or unjust,
with respect to a comparison. To assume that there is something wrong with a child
when he doesn’t do something by the age listed in Gesell’s norms is an unfair
conclusion.
5. What does the author mean on Page 45 when he says, “Hardly anybody is exactly
normal in any way…?”
The majority of any measurements do not fall on the average. So knowing that
something is above or below average tells us very little unless we know something
about the range of values around that average.
6. How can the graph of the advertising agency’s business be misleading when the
graph clearly shows such an upward trend? (Pg 51)
a. Without a label for the vertical axis, we are not even sure what is being
measured. Is this total income before expenses? (The expenses may have risen
more than the income!) Is it adjusted for inflation? Is it number of clients?
(The company may have fallen on hard times and now be dealing with a larger
number of very small clients.)
b. What is the scale? Even the smallest increase can be made to have a slope that
looks quite dramatic if the scale is chosen carefully. (More on this in a later
chapter.)
7. When reviewing scholarship applications, I was given a table that listed the GPA of
each applicant. What other information do I need in order to evaluate these
applicants’ academic performance?
a. What is it that the average is of? That is, what courses did the student take?
One student might achieve a 4.0 GPA taking the easiest courses, while another
that only has a 3.6 average took all of the most difficult offerings.
b. What is the student’s distribution of grades? Did he achieve a 3.5 GPA by
getting all A’s and B’s, or did he have all A’s for the first three years and
several recent F’s?
The average only tells part of the story, and can often be misleading!
How to Lie With Statistics
Chapters 4 - 6
Read chapters 4, 5 and 6 of How to Lie With Statistics. Answer the following questions and
be ready to discuss the chapters in class.
1. Select one sentence in Chapter 4 that you believe best summarizes the main points of
this chapter.
2. The author suggests that some reported differences may not be real differences at all
and others, even though they can be shown to be real differences, should still be
ignored. Explain how each of these two situations can arise and give an example of
each.
3. What are the steps in producing a “gee-whiz” graph?
4. Here is a reproduction of a Chevy ad that appeared a few years ago.
LIKE A ROCK: More than 98%
of all Chevy trucks sold in the
last 10 years are still on the
road.
[Bar graph: vertical axis marked from 95 to 100; bars labeled Chevy, Ford, Toyota, Nissan]
The scale was printed in white on a gray background, so you had to be looking for it
to notice it. If you didn’t notice the scale, what percent would you assign to the
Nissan? Redraw this graph using a scale from 0 to 100. How would your version
look as an ad?
5. Why is it that making one picture twice as tall as another on a picture graph leaves an
impression that one quantity is eight times as big?
6. Chapter 5 talks about the wrong way to make a picture graph. Is there a right way? If
so, how do you suppose it could be constructed to avoid leaving a false impression?
7. Try to find an example of a misleading graph in a newspaper or magazine.
How to Lie With Statistics
Chapters 4 - 6
Read chapters 4, 5 and 6 of How to Lie With Statistics. Answer the following questions and
be ready to discuss the chapters in class.
1. Select one sentence in Chapter 4 that you believe best summarizes the main points of
this chapter.
My personal favorite is “This is in defiance of the fine old saying that a difference is a
difference only if it makes a difference” (Pg 58). Other good choices include “What
this comes down to is that the only way to think about IQs and many other sampling
results is in ranges,” “But comparisons between figures with small differences are
meaningless,” and “You must always keep that plus-or-minus in mind, even (or
especially) when it is not stated” (Pg 57).
2. The author suggests that some reported differences may not be real differences at all
and others, even though they can be shown to be real differences, should still be
ignored. Explain how each of these two situations can arise and give an example of
each.
With small differences, the variation to be expected in any sampling situation often
makes it impossible to tell whether the difference is real or is simply due to that
variation. Examples: IQ comparisons (Pg 53-57), magazine article readership (Pg 57).
Even if a difference can be demonstrated with certainty, it may be so small as to be
meaningless. Old Gold cigarette ads (Pg 58-59). “Statistically significant” does not
equate to “significant” in terms of importance.
3. What are the steps in producing a “gee-whiz” graph?
Truncate the bottom (Pg 62). Expand the scale on the y-axis (Pg 63).
4. Here is a reproduction of a Chevy ad that appeared a few years ago.
LIKE A ROCK: More than 98% of
all Chevy trucks sold in the
last 10 years are still on the
road.
[Bar graph: vertical axis marked from 95 to 100; bars labeled Chevy, Ford, Toyota, Nissan]
The scale was printed in white on a gray background, so you had to be looking for it
to notice it. If you didn’t notice the scale, what percent would you assign to the
Nissan? Redraw this graph using a scale from 0 to 100. How would your version
look as an ad?
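One way to put a number on the distortion is to compare how tall the bars appear with each scale. The Python sketch below does this for two bars. The percentages used (98 for Chevy, 96 for Nissan) are hypothetical, since the ad's exact figures for the other brands are not given here, and `bar_height` is a helper name made up for this example.

```python
def bar_height(value, baseline):
    """Height of a bar drawn from `baseline` up to `value`."""
    return value - baseline

# Hypothetical percentages still on the road; the ad's exact figures
# for the other brands are not given in the worksheet.
chevy, nissan = 98.0, 96.0

truncated = bar_height(chevy, 95.0) / bar_height(nissan, 95.0)
full = bar_height(chevy, 0.0) / bar_height(nissan, 0.0)

print(f"axis starting at 95: Chevy bar is {truncated:.2f} times as tall")
print(f"axis starting at 0:  Chevy bar is {full:.2f} times as tall")
```

With these made-up values the truncated graph shows one bar three times as tall as the other, while on a scale from 0 to 100 the two bars are nearly indistinguishable.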
5. Why is it that making one picture twice as tall as another on a picture graph leaves an
impression that one quantity is eight times as big?
When we double the height of the picture, in order to make it appear proportional we
normally double the width as well. This makes the area four times as large. But our
mind perceives the picture as representing a three-dimensional object, which would
then have eight times the volume (Pg 69-70).
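The arithmetic is just squaring and cubing, as this short Python sketch shows. The function name `scale_factors` is made up for the example.

```python
def scale_factors(height_ratio):
    """Return (height, area, volume) ratios when a picture is enlarged
    proportionally by `height_ratio` in every dimension."""
    return height_ratio, height_ratio ** 2, height_ratio ** 3

height, area, volume = scale_factors(2)
print(height, area, volume)  # 2 4 8
```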
6. Chapter 5 talks about the wrong way to make a picture graph. Is there a right way? If
so, how do you suppose it could be constructed to avoid leaving a false impression?
Instead of making one picture larger than another, use multiple copies of pictures of
the same size to construct the bars of the graph. Then if we double the length of the
bar it only doubles the area and volume, so is not misleading.
7. Try to find an example of a misleading graph in a newspaper or magazine.
How to Lie With Statistics
Chapter 7
Read Chapter 7 of How to Lie With Statistics. Answer the following questions and be ready
to discuss the chapter in class.
1. What is a semi-attached figure?
2. Name as many general strategies from this chapter as you can for using the semi-attached figure. (I count roughly half a dozen.)
3. Here is an excerpt from a guest commentary column in the Santa Maria Times by
Ron Fink (May 17, 2005). See how many semi-attached figures you can spot.
The North [Santa Barbara] County routinely meets state air quality standards
and the South Coast does not. Why is that? ... The South Coast doesn’t have any
more industrial pollution emitters than does the north. Failure to meet the standard
may be connected to the 71 tons of methane that is released daily from natural seeps
off of our coastline, not any human activities. You see, the instruments used to
measure air contaminates [sic] cannot discriminate between natural and man-made
pollutants.
Nature is not perfect when judged by man’s rules for polluting the
environment. Anyone with a pollen-associated sinus problem can readily attest to the
impact that nature has on our daily lives.
And what about those greenhouse gases, the purported cause of global
warming, where do they come from? A major source is volcanoes…. Satellite data
after the 1991 eruptions of Mt Pinatubo (the Philippines) and Mt Hudson (Chile)
showed a 15-20 percent ozone loss at high latitudes and a greater than 50 percent loss
over the Antarctic! … The U.S. Geological Survey determined that: globally, large
volcanic explosive eruptions that inject a tremendous volume of sulfur aerosols into
the stratosphere can lead to lower surface temperatures and promote depletion of the
Earth’s ozone layer. Ash from such events [as the Mt St Helens eruptions] can travel
100 miles or more downwind!
So despite what editorial writers, local environmentalists and movie makers
want you to believe, its [sic] nature that is responsible for the greatest changes in
nature, not human activity or President Bush’s environmental policies. I have often
wondered how we mere humans could equal the magnitude of pollution created by
nature no matter how many oil wells we drill, power plants we build, air-conditioners
we operate, hair spray we use, cars we paint or SUV’s we drive.
4. See if you can find an example of a semi-attached figure. They are generally easier to
find in writing that is meant to be persuasive, such as advertising or commentaries.
How to Lie With Statistics
Chapter 7
Read Chapter 7 of How to Lie With Statistics. Answer the following questions and be ready
to discuss the chapter in class.
1. What is a semi-attached figure?
A figure, fact, or argument that seems at first glance to be related to the point that is
being made, but upon closer inspection is not.
2. Name as many general strategies from this chapter as you can for using the semi-attached figure. (I count roughly half a dozen.)
a. Prove something that sounds like it has to do with your topic. (Number of
germs killed by a cold nostrum. Are 31,000 germs a lot? In what
concentration? What kind of germs? Do germs have anything to do with
colds? Pg 74-75.)
b. Present testimony from someone well known or respected. (What brand of
cigarettes do doctors smoke? Pg 76-77.) Are we more convinced Jello
Pudding is good because Bill Cosby likes it? What brand of motor oil does a
NASCAR driver use?
c. Present a figure that doesn’t mean what it sounds like. (Extracts 26% more
juice, Pg 77.) I was recently perplexed when I looked at the label on my 100%
juice container and saw all of the ingredients. Looking closer, the label reads,
“100% Juice and other ingredients.” There is obviously something more to the legal
definition of 100% juice than there is to the mathematical definition.
d. Make comparisons between things that aren’t really comparable. (Fatalities
morning vs. evening, Pg 78; death rate in Navy vs. New York City, Pg 83;
teacher salary in lowest district vs. those in New York City, Pg 85.)
e. Count something and report the total as something not quite the same. (Have a
complaint vs. opposed to the union, Pg 80.)
f. Use percentages of large or small sums to make quantities seem smaller or
larger (GM profit on sales vs. profit on investment, Pg 81.)
g. Confuse a change in the rate of reporting with a change in the rate of
incidence. (Influenza or malaria in South, Pg 82-83; increase in polio cases,
Pg 84.)
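Strategy (f) is easy to demonstrate with a little arithmetic. In the Python sketch below, the same profit is expressed as a percentage of two different bases. All three dollar figures are hypothetical values chosen only to illustrate the point; they are not GM's numbers.

```python
def percent(part, whole):
    """Express `part` as a percentage of `whole`."""
    return 100 * part / whole

# Hypothetical figures, chosen only to illustrate the point.
profit = 12_000_000
sales = 200_000_000
investment = 60_000_000

on_sales = percent(profit, sales)
on_investment = percent(profit, investment)
print(f"profit on sales:      {on_sales:.1f}%")
print(f"profit on investment: {on_investment:.1f}%")
```

The profit is the same in both lines. Only the choice of base changes, and with it the impression the percentage leaves.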
3. Here is an excerpt from a guest commentary column in the Santa Maria Times by
Ron Fink (May 17, 2005). See how many semi-attached figures you can spot.
The North [Santa Barbara] County routinely meets state air quality standards
and the South Coast does not. Why is that? ... The South Coast doesn’t have any
more industrial pollution emitters than does the north. Failure to meet the standard
may be connected to the 71 tons of methane that is released daily from natural seeps
off of our coastline, not any human activities. You see, the instruments used to
measure air contaminates [sic] cannot discriminate between natural and man-made
pollutants.
Nature is not perfect when judged by man’s rules for polluting the
environment. Anyone with a pollen-associated sinus problem can readily attest to the
impact that nature has on our daily lives.
And what about those greenhouse gases, the purported cause of global
warming, where do they come from? A major source is volcanoes…. Satellite data
after the 1991 eruptions of Mt Pinatubo (the Philippines) and Mt Hudson (Chile)
showed a 15-20 percent ozone loss at high latitudes and a greater than 50 percent loss
over the Antarctic! … The U.S. Geological Survey determined that: globally, large
volcanic explosive eruptions that inject a tremendous volume of sulfur aerosols into
the stratosphere can lead to lower surface temperatures and promote depletion of the
Earth’s ozone layer. Ash from such events [as the Mt St Helens eruptions] can travel
100 miles or more downwind!
So despite what editorial writers, local environmentalists and movie makers
want you to believe, its [sic] nature that is responsible for the greatest changes in
nature, not human activity or President Bush’s environmental policies. I have often
wondered how we mere humans could equal the magnitude of pollution created by
nature no matter how many oil wells we drill, power plants we build, air-conditioners
we operate, hair spray we use, cars we paint or SUV’s we drive.
The columnist’s thesis is that the lower air quality in southern Santa Barbara County
may be due to natural methane seepage. Nothing in the article proves this, and almost none of
the article even has anything to do with it.
• How does 71 tons of methane released daily compare to that released in areas
that meet the air quality standards? And, for that matter, how do we know it
is 71 tons? The quantity sounds very official, being so precise, though I
doubt we have knowledge of the true amount that is precise within even ten
tons.
• What does pollen have to do with it? Is the detection equipment unable to
distinguish between pollen and methane?
• Is he suggesting that the air quality in Santa Barbara is worse than that in the
northern end of the county due to volcanoes in the Philippines? How does
ash traveling 100 miles from Mt St Helens in Washington affect Santa Barbara on
the central coast of California?
• Is the ozone layer thinning over the Antarctic affecting the air quality in
Santa Barbara? And what were those percentages in comparison to? The year
before the eruption? Immediately before the eruption? The average
thickness? Or volume? Or screening ability?
• Has he presented any evidence at all that nature contributes more to the air
pollution than those things he lists in his conclusion? And, even if it does,
does that make those things not responsible for the difference between the
two ends of the county?
4. See if you can find an example of a semi-attached figure. They are generally easier to
find in writing that is meant to be persuasive, such as advertising or commentaries.
How to Lie With Statistics
Chapter 8
Read Chapter 8 of How to Lie With Statistics. Answer the following questions and be ready
to discuss the chapter in class.
1. What does post hoc mean?
2. List several kinds of correlation that might lead to post hoc reasoning.
3. The chapter also warns against the dangers of extrapolation. What is one example the
author uses in that discussion?
4. Name two variables that you suspect would have correlation if we collected data, but
for which a cause and effect relationship does not exist. (Hint: think of a common
causal factor first, then select your two variables.) Be creative and make it as
outrageous as possible.
5. Comment briefly on each of the following reports.
a. Teen drug use linked to truancy
Press Association
Tuesday July 29, 2003
Truants are more than five times as likely to take drugs than other schoolchildren,
according to research published today. The survey found that 35% of pupils who
had ever played truant admitted they had taken drugs in the last month, compared
with just 6% who had not skipped school. The study of 10,000 schoolchildren aged
11 to 15 also found that truants were far more likely to regularly drink alcohol and
smoke.
b. 20 November, 2003
Low self-esteem 'shrinks brain'
By Pallab Ghosh
BBC Science Correspondent
People with a low sense of self worth are more likely to suffer from memory loss as
they get older, say researchers. The study, presented at a conference at the Royal
Society in London, also found that the brains of these people were more likely to
shrink compared with those who have a high sense of self esteem. Dr Sonia Lupien,
of McGill University in Montreal surveyed 92 senior citizens over 15 years and
studied their brain scans. She found that the brains of those with low self-worth
were up to a fifth smaller than those who felt good about themselves. These people
also performed worse in memory and learning tests.
c. Historical data show that the candidate who raises the most money before the
primary season starts generally wins the nomination.
d. Winning World Cup lowers heart attack deaths
Apr 16, 2003 (Reuters) - Winning soccer's World Cup not only lifts a nation's spirits,
it lowers the death rate from heart attacks, doctors said on Tuesday. During the
1998 World Cup when France defeated Brazil in the final, deaths from heart attacks
in men and women dropped on the day of the match, which was watched by 26
million French TV viewers. Instead of about 33 deaths a day in the five days before
and after the match, 23 men died of a heart attack on match day. There were also
fewer deaths in women but the decrease was not as significant.
e. Church Attendance Boosts Immunity
Going to church may be good for the
body as well as the soul. In a study of 1700 older Americans, researchers at Duke
University Medical Center found that those who attended religious services had
stronger immune responses. About 60% of the men and women surveyed attended
religious services at least once a week. Blood tests showed that regular attendees
were less likely to have high levels of an immune-system protein involved in age-related
diseases. -- Article from Parade Magazine, 1997
How to Lie With Statistics
Chapter 8
Read Chapter 8 of How to Lie With Statistics. Answer the following questions and be ready
to discuss the chapter in class.
1. What does post hoc mean?
Short for post hoc ergo propter hoc (“after this, therefore because of this”): the logical
fallacy of believing that temporal (time) succession implies a causal relation.
2. List several kinds of correlation that might lead to post hoc reasoning.
a. Caused by chance. Another sample may have no correlation at all. (Cancer vs.
milk, Pg 95-96.)
b. Real relationship, but uncertain which is the cause and which the effect.
(Spinsterhood vs. college education? Pg 94).
c. Common cause. (Smoking vs. college grades, Pg 87; minister’s salary vs.
price of rum, Pg 90.)
3. The chapter also warns against the dangers of extrapolation. What is one example the
author uses in that discussion?
Rainfall vs. corn height; years of education vs. income. Pg 91.
4. Name two variables that you suspect would have correlation if we collected data, but
for which a cause and effect relationship does not exist. (Hint: think of a common
causal factor first, then select your two variables.) Be creative and make it as
outrageous as possible.
Or simply think of two variables that have trends in the same direction without regard
to the causes.
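One way to see the hint in action is a small simulation. The sketch below is only an illustration: the common cause (daily temperature), the two variables (ice cream sales and sunburn cases), and every coefficient are made up for the example and are not from the book. Neither variable influences the other in the code, yet they come out strongly correlated because both depend on temperature.

```python
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient, computed from the definition."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical common cause: daily temperature.
temperature = [rng.uniform(10, 35) for _ in range(500)]

# Two made-up variables, each driven by temperature plus its own independent noise.
ice_cream_sales = [20 * t + rng.gauss(0, 60) for t in temperature]
sunburn_cases = [3 * t + rng.gauss(0, 10) for t in temperature]

r = pearson_r(ice_cream_sales, sunburn_cases)
print(f"correlation between ice cream sales and sunburn cases: {r:.2f}")
```

Students can swap in their own outrageous pair by changing the two lines that build the variables.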
5. Comment briefly on each of the following reports.
a. Teen drug use linked to truancy
Press Association
Tuesday July 29, 2003
Truants are more than five times as likely to take drugs than other schoolchildren,
according to research published today. The survey found that 35% of pupils who had
ever played truant admitted they had taken drugs in the last month, compared with just
6% who had not skipped school. The study of 10,000 schoolchildren aged 11 to 15 also
found that truants were far more likely to regularly drink alcohol and smoke.
This doesn’t mean that either one causes the other. It is likely that a common
attitude increases the tendency toward both of these behaviors.
b. 20 November, 2003
Low self-esteem 'shrinks brain'
By Pallab Ghosh
BBC Science Correspondent
People with a low sense of self worth are more likely to suffer from memory loss as they
get older, say researchers. The study, presented at a conference at the Royal Society in
London, also found that the brains of these people were more likely to shrink compared
with those who have a high sense of self esteem. Dr Sonia Lupien, of McGill University
in Montreal surveyed 92 senior citizens over 15 years and studied their brain scans. She
found that the brains of those with low self-worth were up to a fifth smaller than those
who felt good about themselves. These people also performed worse in memory and
learning tests.
Is it also possible that a physiological problem with the brain leads to a
difficulty in thought processes or sense of well-being that in turn contributes
to a feeling of low self-worth?
c. Historical data show that the candidate who raises the most money before the primary
season starts generally wins the nomination.
Perhaps the candidate who is most popular (and is therefore going to win) is
able to raise more money due to this popularity. This is one where the cause
and effect are probably hopelessly entwined, with each variable contributing
to the other.
d. Winning World Cup lowers heart attack deaths
Apr 16, 2003 (Reuters) - Winning soccer's World Cup not only lifts a nation's spirits, it
lowers the death rate from heart attacks, doctors said on Tuesday. During the 1998
World Cup when France defeated Brazil in the final, deaths from heart attacks in men
and women dropped on the day of the match, which was watched by 26 million French
TV viewers. Instead of about 33 deaths a day in the five days before and after the match,
23 men died of a heart attack on match day. There were also fewer deaths in women but
the decrease was not as significant.
If this was the only instance examined, it could be coincidence. If many
took the day off to watch the finals, the reduced stress could be the cause,
without respect to the activity for which the day was being taken off. In any
case, I would question the headline’s suggestion that it was winning that
lowered the deaths, since there is no mention of having looked at the heart
attack death incidence rate for the losing team’s nation.
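A rough check on the "coincidence" possibility, as a sketch only. It assumes that daily heart attack deaths follow a Poisson distribution with a mean of 33, and that the 11 days (the match day plus five on either side) are independent. Neither assumption comes from the report, which gives only the figures 33 and 23.

```python
import math

def poisson_cdf(k, mean):
    """P(X <= k) for a Poisson variable with the given mean."""
    return sum(math.exp(-mean) * mean ** i / math.factorial(i) for i in range(k + 1))

usual_daily_deaths = 33   # from the report: about 33 a day around the match
match_day_deaths = 23     # from the report

# Chance that one ordinary day has 23 or fewer deaths.
p_one_day = poisson_cdf(match_day_deaths, usual_daily_deaths)
print(f"P(23 or fewer deaths on one given day) = {p_one_day:.3f}")

# Chance that at least one of n independent ordinary days dips this low.
n_days = 11  # assumption: the match day plus the five days on either side
p_any_day = 1 - (1 - p_one_day) ** n_days
print(f"P(some day out of {n_days} is this low) = {p_any_day:.3f}")
```

The second figure is the one to discuss: a low count that looks unusual for one pre-chosen day is much less surprising when any day in the window could have been singled out.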
e. Church Attendance Boosts Immunity
Going to church may be good for the
body as well as the soul. In a study of 1700 older Americans, researchers at Duke
University Medical Center found that those who attended religious services had stronger
immune responses. About 60% of the men and women surveyed attended religious
services at least once a week. Blood tests showed that regular attendees were less likely
to have high levels of an immune-system protein involved in age-related diseases.
-- Article from Parade Magazine, 1997
Or do those who feel better get out more, and therefore attend church more?
How to Lie With Statistics
Chapter 9
Read Chapter 9 of How to Lie With Statistics. Answer the following questions and be ready
to discuss the chapter in class.
1. Who do you think are the people who are most likely to statisticulate, and for what
purposes?
2. Which of the maps on Page 103 is drawn fairly, without statisticulation?
3. Why are percentages so often a source of statisticulation? (I’m starting to like this new
word.)
4. Explain how the difference between two students at the 90th and 95th percentiles, five
percentiles apart, can be (and usually is) greater than the difference between two
students at the 50th and 60th percentiles, ten percentiles apart.
5. Summarize the chapter in one or two sentences.
How to Lie With Statistics
Chapter 9
Read Chapter 9 of How to Lie With Statistics. Answer the following questions and be ready
to discuss the chapter in class.
1. Who do you think are the people who are most likely to statisticulate, and for what
purposes?
a. The media, to sensationalize.
b. Politicians, to win your vote.
c. Advertisers, to sell their product.
d. Anyone trying to convince you of something, especially if they stand to profit
from your becoming convinced.
2. Which of the maps on Page 103 is drawn fairly, without statisticulation?
Neither!
3. Why are percentages so often a source of statisticulation? (I’m starting to like this new
word.) (Pg 109)
a. They can mask very small and therefore largely meaningless samples by
hiding the counts.
b. They can mean very different things depending on what is chosen as the base,
which is often not even stated.
c. So many people don’t understand them in the first place, on both the writers’
and readers’ ends of the communication.
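Points (a) and (b) can be shown with a little arithmetic. The counts and the wage figure in this sketch are hypothetical, chosen only to make the two effects visible.

```python
def percent(part, whole):
    """Express part as a percentage of whole."""
    return 100 * part / whole

# (a) The same percentage can rest on very different amounts of evidence.
print(f"{percent(1, 3):.1f}%")        # one person out of three
print(f"{percent(1000, 3000):.1f}%")  # a thousand people out of three thousand

# (b) The base matters: a 50% cut followed by a 50% raise does not
# restore the original, because the raise is figured on the smaller base.
wage = 100.0
after_cut = wage * (1 - 0.50)
after_raise = after_cut * (1 + 0.50)
print(after_cut, after_raise)
```

Both percentages print the same, and the "restored" wage ends up at three quarters of where it started.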
4. Explain how the difference between two students at the 90th and 95th percentiles, five
percentiles apart, can be (and usually is) greater than the difference between two
students at the 50th and 60th percentiles, ten percentiles apart.
(Pg 115). Due to the normal distribution of most measures of achievement, the
performance levels of the students will be grouped most heavily about the mean. But
percentiles are a ranking system, so the achievement levels of students in the
middle range of percentiles are grouped more closely together than those at the
extremes. The 90th and 95th percentiles correspond roughly to z-scores of 1.28 and
1.64, respectively, a difference of 0.36 SD. The 50th and 60th percentiles correspond
to z-scores of about 0 and 0.25, a difference of only 0.25 SD.
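The z-scores quoted in this answer can be checked with Python's standard library. This is a minimal sketch that assumes scores are exactly normally distributed, which real achievement measures only approximate; the helper name z_gap is made up for the example.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, SD 1

def z_gap(lower_pct, upper_pct):
    """Distance in SDs between two percentiles of a normal distribution."""
    z_low = standard_normal.inv_cdf(lower_pct / 100)
    z_high = standard_normal.inv_cdf(upper_pct / 100)
    return z_high - z_low

print(f"90th to 95th: {z_gap(90, 95):.2f} SD")  # 0.36 SD
print(f"50th to 60th: {z_gap(50, 60):.2f} SD")  # 0.25 SD
```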
5. Summarize the chapter in one or two sentences.
Many use statistics to distort or hide the truth, sometimes through ignorance, but
often intentionally. Be wary when encountering statistics!
How to Lie With Statistics
Chapter 10
Read Chapter 10 of How to Lie With Statistics. Answer the following questions and be ready
to discuss the chapter in class.
1. As we look to see who is offering a statistic for our consumption, what should we look
for?
2. If a respectable organization is cited as a source of a statistic, what do we still need to
consider about that authority?
3. The example of the survey on prices during the Korean War brought out what can be a
subtle way in which a sample can be biased. What is that source of bias and what is
the name we give to samples likely to be biased in that manner?
4. What are some of the many critical things that are commonly missing when a statistic
is reported in the media?
5. One of the more subtle ways for somebody to “change the subject” has to do with
statistics based on what people say. What is the difficulty that it is essential we look
out for with such a statistic?
6. Summarize the five questions we can ask to defend ourselves as a consumer of
statistics. (Explain briefly what each question is about.)
How to Lie With Statistics
Chapter 10
Read Chapter 10 of How to Lie With Statistics. Answer the following questions and be ready
to discuss the chapter in class.
1. As we look to see who is offering a statistic for our consumption, what should we look
for?
Bias, conscious or unconscious (Pg 123).
2. If a respectable organization is cited as a source of a statistic, what do we still need to
consider about that authority?
That it stands behind the information, and not merely alongside it (Pg 125).
3. The example of the survey on prices during the Korean War brought out what can be a
subtle way in which a sample can be biased. What is that source of bias and what is
the name we give to samples likely to be biased in that manner?
The sample was biased by the large number that failed to respond. This is the
difficulty with a voluntary response sample.
4. What are some of the many critical things that are commonly missing when a statistic
is reported in the media?
a. Number of cases.
b. Confidence level or level of significance.
c. Variability.
d. What variety of average is being used.
e. Something to compare the figure to (mothers over 35; 2800 deaths during fog,
Pg 128).
f. Raw numbers to put percentages in perspective (33% of coeds married
faculty, Pg 128).
g. Some indication of distribution to go with averages (stock held by 3,003
persons with average of 660 shares each vs. three men held 3/4 of the stock,
Pg 129).
h. Base for an index (Pg 129-130).
i. The factor responsible for the change (Easter in a different month; a change in
reporting procedures, lifespan, populations, diagnostic techniques, Pg 130).
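The stock example in item (g) is worth working through with numbers. The sketch below uses only the figures cited there (3,003 holders, an average of 660 shares, three men holding 3/4). Splitting the remaining quarter evenly among the other 3,000 holders is a simplification for illustration.

```python
holders = 3003
mean_shares = 660

total_shares = holders * mean_shares       # about two million shares
big_three = total_shares * 3 / 4           # held by three men
everyone_else = total_shares - big_three   # held by the other 3,000 holders

print(f"total shares: {total_shares:,}")
print(f"average for each of the three: {big_three / 3:,.0f}")
print(f"average for each of the other 3,000: {everyone_else / (holders - 3):,.0f}")
```

The reported average of 660 is true, but it describes almost nobody: the three large holders average hundreds of thousands of shares each, and the rest average about a quarter of the reported figure.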
5. One of the more subtle ways for somebody to “change the subject” has to do with
statistics based on what people say. What is the difficulty that it is essential we look
out for with such a statistic?
People often either don’t know the information that they are reporting, and merely
estimate, or they lie for any of a number of reasons. (More people 65 to 70 years old
in 1950 than there were 55 to 60 ten years earlier. Pg 134.)
6. Summarize the five questions we can ask to defend ourselves as a consumer of
statistics. (Explain briefly what each question is about.)
a. Who says so? Is there likely to be bias in either the one analyzing the data or
the one reporting the statistic? Is the cited authority really standing behind the
statistic?
b. How does he know? Is there likely to be bias in the sample? Is it
representative?
c. What’s missing? Do we have everything we need to know in order to fully
understand the significance of the statistic that is being offered?
d. Did somebody change the subject? Are definitions of all terms fully
understood, and consistent for any comparisons? Is the data likely to be
accurate, or was there opportunity and reason for the subjects to lie? Is
correlation being represented as causation?
e. Does it make sense? Is it believable? Or are we being blinded by the
seemingly sophisticated analysis and scientific-sounding statistic? Is it
reasonable to extrapolate this far? (Mark Twain’s prediction for the
Mississippi River, Pg 142.)