Selective Statistics

Selective statistics may be defined as “not telling the whole truth with the intention of deception.”
Lies, Damn Lies, and Statistics
• Attributed to Mark Twain (Samuel Clemens)
• Everyone picks and chooses when they tell a story or display statistics.
• Do the statistics you choose attempt to tell the whole truth?
• When you hear statistics cited, can you identify a distortion?
Examples of selective statistics
• The following examples have been made up or chosen from publications or the Internet to show how statistics can be misused.
• Some, but not all, of the original sources have been confirmed.
Selective interval of measurement
Suppose I say that from November to June, temperature was 77% above the long-term average for the same eight months during the last thirty years. Thus we are in a period of global warming.
• Why did I exclude July to October? What if those months were cooler?
• We can change a measurement period so that numbers confirm our agenda.
– Skewing by selective choice of sample
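To see how the choice of interval alone can drive the conclusion, here is a minimal Python sketch. The monthly anomaly values are invented purely for illustration and are not real temperature data. `ANOMALIES` and `mean_anomaly` are hypothetical names. The same data gives a large average anomaly over November to June and a much smaller one over the full year.

```python
from statistics import mean

# HYPOTHETICAL monthly temperature anomalies (degrees above a long-term
# average). Invented for illustration only; not real measurements.
ANOMALIES = {
    "Jan": 1.0, "Feb": 0.8, "Mar": 0.6, "Apr": 0.4, "May": 0.5, "Jun": 0.7,
    "Jul": -1.2, "Aug": -1.5, "Sep": -1.0, "Oct": -0.9, "Nov": 0.9, "Dec": 1.1,
}

def mean_anomaly(months):
    """Average anomaly over the chosen months only."""
    return mean(ANOMALIES[m] for m in months)

# The "selected" interval: November to June (eight months).
selected = ["Nov", "Dec", "Jan", "Feb", "Mar", "Apr", "May", "Jun"]
full_year = list(ANOMALIES)

print(f"Nov-Jun only: {mean_anomaly(selected):+.2f}")   # +0.75
print(f"Full year:    {mean_anomaly(full_year):+.2f}")  # +0.12
```

Dropping the four cool months is enough to multiply the apparent warming roughly sixfold in this toy series.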
Selective statistics by sample size
• I write a reference for a student: “William Tell Burroughs is the best student that I have ever taught.”
• I omit to say that he is the only student that I ever taught.
– (This is an example of skewing by size of sample.)
• Most application forms today ask you to state the percentile rank of a student and the size of the student body used for comparison.
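Here is a minimal sketch of why application forms ask for both numbers. It assumes one common percentile-rank convention: the share of the cohort ranked at or below the student, with position 1 as the best. Other conventions exist. `percentile_rank` and `describe` are hypothetical helpers. Being top of a class of one and top of a class of 200 give the same percentile, so only the cohort size tells them apart.

```python
def percentile_rank(position: int, cohort_size: int) -> float:
    """Percent of the cohort ranked at or below a student.

    Assumed convention: position 1 means "best in the cohort".
    """
    if cohort_size < 1 or not 1 <= position <= cohort_size:
        raise ValueError("position must be between 1 and cohort_size")
    return 100.0 * (cohort_size - position + 1) / cohort_size

def describe(position: int, cohort_size: int) -> str:
    """Report the percentile together with the size of the comparison group."""
    pct = percentile_rank(position, cohort_size)
    return f"percentile {pct:.1f} out of {cohort_size} student(s)"

# "The best student I have ever taught" is literally true in both cases:
print(describe(1, 1))    # percentile 100.0 out of 1 student(s)
print(describe(1, 200))  # percentile 100.0 out of 200 student(s)
```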
Selective interval of measurement
• “Recently an Op-Ed in the New York Times (12/22/07) by Jonathan Cole and Stephen Sigler concluded that drugs don’t enhance performance and that conclusion would provide a more effective message than moral or ethical injunction. It was a perfect example of questionable statistics combined with wishful thinking. Using selective statistics to demonstrate no link between drugs and performance doesn’t mean no link exists (the statisticians’ dictum - absence of proof is not proof of absence). One example of selective statistics was their misleading comparison in the last six years of Barry Bonds’ career with those of Babe Ruth (the percentage of home runs the men hit during their careers was equal). The arbitrarily selected six year time frame included a year where Bonds was injured and hit only five home runs in fourteen games thereby omitting his incredible 73 home run season one year before that. How can experienced statisticians do that? Extending the interval one year (which includes the time Bonds, two years older than Ruth, is suspected of using steroids) or substituting that year for Bonds injury plagued season would change the conclusions about Bonds’ performance in relation to Ruth’s.”
http://politicalmavens.com/index.php/2008/01/01/sophistry-and-statistics by Cory Franklin
Selective interval of measurement
• “CNN contributor, talk radio host, and Bush-Cheney '04 campaign adviser Rev. Joe Watkins falsely claimed that the United States has experienced a "reduction" in poverty under President Bush. Misleadingly citing poverty statistics on the September 12 edition of CNN's Paula Zahn Now, Watkins stated that "under Bill Clinton, 15.1 percent of the population was poor; under President Bush, 12.7 percent of the population is poor. That's a reduction, that's a good thing." But the poverty rate declined every year Clinton was in office, from 15.1 percent when he took office in 1993 to a low of 11.3 percent in 2000; it has risen every year that Bush has been in office, from 11.7 percent in 2001 to 12.7 percent in 2004.”
http://www.democraticunderground.com/discuss/duboard.php?az=view_all&address=132x2087728
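The arithmetic behind the rebuttal can be checked in a short Python sketch. It uses only the poverty rates quoted in the passage above. `CLINTON`, `BUSH` and `change_in_points` are hypothetical names. Comparing one term's starting figure with the other's latest figure shows a drop. Measuring each term from its own start to its own end reverses the picture for the Bush years.

```python
# Poverty rates (percent of population) as cited in the quoted passage:
# (first year, rate), (last year, rate) for each administration.
CLINTON = ((1993, 15.1), (2000, 11.3))
BUSH = ((2001, 11.7), (2004, 12.7))

def change_in_points(term):
    """Percentage-point change from a term's first to last cited year."""
    (_, first), (_, last) = term
    return round(last - first, 1)

# Watkins' comparison: Clinton's starting figure vs. the latest Bush figure.
selective = round(BUSH[1][1] - CLINTON[0][1], 1)

print(f"Clinton start -> latest Bush: {selective:+.1f} points")                # -2.4
print(f"During Clinton's years:       {change_in_points(CLINTON):+.1f} points")  # -3.8
print(f"During Bush's years:          {change_in_points(BUSH):+.1f} points")     # +1.0
```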
Absolute risk vs relative risk
• Which drug would you rather take? One that reduces your risk of cancer by 50 percent, or another drug that only eliminates cancer in one out of 100 people? Most people would choose the drug that reduces their risk of cancer by 50 percent, but the fact is, both of these numbers refer to the same drug. They’re just two different ways of looking at the same statistic. One way is called relative risk; the other way is absolute risk.
http://randomjohn.wordpress.com/2006/05/03/lying-with-statistics-relative-risk-vs-absolute-risk/
“How to Lie with Statistics” Darrell Huff
Absolute and Relative Statistics
• The pharmaceutical industry is required to conduct extensive trials of all drugs before they are sold to the public. All trials must include control groups that do not take the drug.
• Suppose we conduct a clinical test of a new colon cancer drug with 100 patients each in a control group and in a group taking the drug. Two patients in the control group get colon cancer, but only 1 patient in the group taking the drug gets colon cancer.
• The reduction in cancer is one person out of a hundred.
– Absolute statistic
• The reduction in risk is 50%, because the risk for any person in the drug group is cut in half (from 2 in 100 to 1 in 100).
– Relative statistic
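Both statistics come from the same two counts, as the minimal Python sketch below shows. It uses the hypothetical trial above: 2 of 100 control patients and 1 of 100 treated patients develop colon cancer. `risk_summary` is a hypothetical helper. It also reports the number needed to treat (1 divided by the absolute risk reduction), a standard epidemiological measure that the slide does not mention. Exact fractions avoid floating-point noise.

```python
from fractions import Fraction

def risk_summary(events_control, n_control, events_treated, n_treated):
    """Absolute and relative risk reduction for a two-arm trial."""
    risk_control = Fraction(events_control, n_control)
    risk_treated = Fraction(events_treated, n_treated)
    if risk_control == 0:
        raise ValueError("relative reduction is undefined with no control events")
    arr = risk_control - risk_treated        # absolute risk reduction
    rrr = arr / risk_control                 # relative risk reduction
    nnt = 1 / arr if arr > 0 else None       # number needed to treat
    return arr, rrr, nnt

arr, rrr, nnt = risk_summary(2, 100, 1, 100)
print(f"Absolute risk reduction: {float(arr):.0%}")  # 1%
print(f"Relative risk reduction: {float(rrr):.0%}")  # 50%
print(f"Number needed to treat:  {nnt}")             # 100
```

A number needed to treat of 100 is another way of saying "the drug helps 1 in 100 people".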
Absolute and Relative Statistics
• How can the previous statistics be misused?
• Report the result of the test using the method that sounds more or less favorable depending on which side you favor:
– The drug helps ONLY 1 in 100 people.
– The drug reduces the risk of developing colon cancer by 50%.
• Which statistic promotes use of the drug?
Absolute and Relative Statistics
• Selective statistics may be used to omit information.
• Consider the previous example:
• If you are for the drug, omit or downplay the probability of side effects.
– Example: a calm voice drones through the side effects while the TV displays a patient enjoying life due to using the drug (implied).
• If you are against the drug, explain side effects in depth with appropriate images of patients in pain.
Absolute and Relative Statistics
USA Today
• Headline (McLean, Va.: Nov 20, 2007, p. A.3)
– FBI: Hate crimes escalate 8% in 2006: Hispanics, Muslims, gays among those under sharper attack
• Statistics within the article
– A 19% increase in crimes motivated by religious bias. Attacks on Muslims increased 22% over the previous year, to 156. Attacks on Catholics increased by almost a third, to 76. Almost seven in 10 were crimes against Jews, which were up 14%, to 967.
Absolute and Relative Statistics
• Analysis of USA Today’s headline
– “Muslims…among those under sharper attack.”
• But attacks against Catholics increased by almost a third, which is a greater percentage than the 22% cited for Muslims.
• Hate crimes against Jews increased to 967 cases, far more than the 156 directed against Muslims, although the percentage increase is not as great.
– Note the way the numbers are presented and the emphasis of the headline.
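The counts and percentages reported in the article can be combined to recover the absolute size of each increase. The Python sketch below does this. It assumes that "almost a third" for Catholics can be approximated as exactly 1/3, which makes that row slightly overstated. `REPORTED` and `implied_change` are hypothetical names. In relative terms Catholics lead (about 33%). In absolute terms the increase against Jews (roughly 119 incidents) dwarfs the increases against Muslims (roughly 28) and Catholics (roughly 19).

```python
# Counts and percentage increases as reported in the USA Today article.
# Assumption: "almost a third" for Catholics is approximated as exactly 1/3.
REPORTED = {
    "Muslims": (156, 0.22),
    "Catholics": (76, 1 / 3),
    "Jews": (967, 0.14),
}

def implied_change(count, pct_increase):
    """Back out last year's count and the absolute increase from this
    year's count and the reported percentage increase."""
    previous = count / (1 + pct_increase)
    return previous, count - previous

for group, (count, pct) in REPORTED.items():
    previous, added = implied_change(count, pct)
    print(f"{group:<9} +{pct:.0%}: about {previous:.0f} -> {count} "
          f"(+{added:.0f} incidents)")
```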
Choosing an endorsement group
• Cite a statistic to provide an endorsement: Four out of five teachers use this text.
• How is the group chosen?
– Which five teachers?
– Are the group members paid?
– Do they have an interest in the text?
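Even setting aside how the five teachers were chosen, a group that small says little. The minimal Python sketch below computes an approximate 95% Wilson score interval for 4 successes out of 5. The interval is a standard formula but is not part of the slide, and `wilson_interval` is a hypothetical helper. The interval also assumes a random sample, which is exactly what the questions above call into doubt. So it is best read as the most generous reading of the claim.

```python
from math import sqrt

def wilson_interval(successes, n, z=1.96):
    """Approximate Wilson score interval for a proportion (z=1.96 -> ~95%)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need 0 <= successes <= n and n > 0")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    margin = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - margin, centre + margin

low, high = wilson_interval(4, 5)
print(f"4 of 5 teachers: 80%, plausible range {low:.0%} to {high:.0%}")
```

For 4 of 5 the range runs from roughly 38% to 96%, so "four out of five" is compatible with anything from a minority of teachers to nearly all of them.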
Deceptive graphs and charts
• Op-ed article in the NY Times by Cox and Alm, “Scientists Are Made, Not Born,” 2/28/05
• A line graph shows an almost continuous growth of the % of women obtaining professional degrees in veterinary, pharmacy, law, medicine, dentistry and M.B.A. programs.
• A chart lists the percentage of women obtaining degrees in sciences for 1970-1971 and for 2001-2002, and includes the “hard sciences” of physics, electrical engineering, and computer science. The chart shows an increase in all of the listed fields.
• The implication: the number of women enrolled in the hard sciences has continuously increased.
• Conclusion: “The ‘feminization’ of the ranks will take place as a matter of simple math because the older, male-dominated groups will retire.” (in response to a controversy involving Harvard’s president at the time)
Selective use of charts and graphs
Analysis
– The percentage of women obtaining professional degrees in the hard sciences from 1970 to 2002 did increase, but that is because women were not admitted to most professional schools in hard sciences in 1970.
– The line graph, which shows periods between 1970 and 2002, does not include the hard sciences.
– The article uses selective statistics to pick samples (1970 and 2002) that support its thesis that the percentage of women in the hard sciences has continually increased, and it ignores the intermediate years of the hard sciences in its chart.
– The article uses the “soft” sciences to show a continuous increase in the percentage of women enrolled.
– The implication is that the pattern in the line graph applies to the chart that includes the hard sciences.
Selective statistics with charts and graphs
• Fact
– There has been a decline in the percentage of professional computer science degrees awarded to women over the last twenty years.
– NCES provides statistics for the percentage of undergraduate computer science degrees (including data processing technology and other information sciences) awarded to women: 1980-81 32.5%, 1985-86 35.7%, 1990-91 29.3%, with further decreases since. If we consider only computer science without IT, the drop is more critical. Percentages for computer and electrical engineering are also discouraging.
Statistics on women in CS
• “Computer science has the dubious distinction of being the only science field to see a fall in the share of its bachelor’s degrees granted to women between 1983 and 2002.”
http://www.cra.org/wp/index.php?p=83
Using charts and graphs
Selective choice of disciplines
• 1970 and 2001
[Figure: % of undergraduate degrees awarded to women, 1966-2002. Sources: http://www.cra.org/wp/index.php?p=83; Taulbee Survey 2006]
Conclusion
• These examples illustrate mechanisms for using statistics incorrectly.
• Take statistics courses to learn how to use and understand statistics.