Selective Statistics

Selective statistics may be defined as “not telling the whole truth with the intention of deception.”

Lies, Damn Lies, and Statistics
Attributed to Mark Twain (Samuel Clemens)

Everyone picks and chooses when they tell a story or display statistics.
Do the statistics you choose attempt to tell the whole truth?
When you hear statistics cited, can you identify a distortion?

Examples of selective statistics

The following examples have been made up or chosen from publications or the Internet to show how statistics can be misused.
Some, but not all, of the original sources have been confirmed.

Selective interval of measurement

Suppose I say that from November to June, temperature was 77% above the long-term average for the same eight months during the last thirty years. Thus we are in a period of global warming.
Why did I exclude July to October? What if those months were cooler?
We can change a measurement period so that numbers confirm our agenda.
– Skewing by selective choice of sample

Selective statistics by sample size

I write a reference for a student: “William Tell Burroughs is the best student that I have ever taught.”
I omit to say that he is the only student that I ever taught.
– (This is an example of skewing by size of sample.)
Most application forms today ask you to state the percentile rank of a student and the size of the student body of comparison.

Selective interval of measurement

“Recently an Op-Ed in the New York Times (12/22/07) by Jonathan Cole and Stephen Sigler concluded that drugs don’t enhance performance and that conclusion would provide a more effective message than moral or ethical injunction. It was a perfect example of questionable statistics combined with wishful thinking.
Using selective statistics to demonstrate no link between drugs and performance doesn’t mean no link exists (the statisticians’ dictum - absence of proof is not proof of absence). One example of selective statistics was their misleading comparison in the last six years of Barry Bonds’ career with those of Babe Ruth (the percentage of home runs the men hit during their careers was equal). The arbitrarily selected six year time frame included a year where Bonds was injured and hit only five home runs in fourteen games thereby omitting his incredible 73 home run season one year before that. How can experienced statisticians do that? Extending the interval one year (which includes the time Bonds, two years older than Ruth, is suspected of using steroids) or substituting that year for Bonds injury plagued season would change the conclusions about Bonds’ performance in relation to Ruth’s.”
http://politicalmavens.com/index.php/2008/01/01/sophistry-and-statistics by Cory Franklin

Selective interval of measurement

“CNN contributor, talk radio host, and Bush-Cheney '04 campaign adviser Rev. Joe Watkins falsely claimed that the United States has experienced a "reduction" in poverty under President Bush. Misleadingly citing poverty statistics on the September 12 edition of CNN's Paula Zahn Now, Watkins stated that "under Bill Clinton, 15.1 percent of the population was poor; under President Bush, 12.7 percent of the population is poor. That's a reduction, that's a good thing."
But the poverty rate declined every year Clinton was in office, from 15.1 percent when he took office in 1993 to a low of 11.3 percent in 2000; it has risen every year that Bush has been in office, from 11.7 percent in 2001 to 12.7 percent in 2004.”
http://www.democraticunderground.com/discuss/duboard.php?az=view_all&address=132x2087728

Absolute risk vs relative risk

Which drug would you rather take? One that reduces your risk of cancer by 50 percent, or another drug that only eliminates cancer in one out of 100 people? Most people would choose the drug that reduces their risk of cancer by 50 percent, but the fact is, both of these numbers refer to the same drug. They’re just two different ways of looking at the same statistic. One way is called relative risk; the other way is absolute risk.
http://randomjohn.wordpress.com/2006/05/03/lying-with-statistics-relative-risk-vs-absolute-risk/
“How to Lie with Statistics,” Darrell Huff

Absolute and Relative Statistics

The pharmaceutical industry is required to conduct extensive trials of all drugs before they are sold to the public.
All trials must include control groups that do not take the drug.
Suppose we conduct a clinical test of a new colon cancer drug with 100 patients each in a control group and in a group taking the drug.
2 patients in the control group get colon cancer, but only 1 patient in the group taking the drug gets colon cancer.
The reduction of cancer is one person out of a hundred.
– Absolute statistic
The reduction in risk is 50%, because the risk for any person in the drug group is reduced by half (2 to 1).
– Relative statistic

Absolute and Relative Statistics

How can the previous statistics be misused?
Report the result of the test using the method that sounds more or less favorable depending on which side you favor.
– The drug helps ONLY 1 in 100 people.
– The drug reduces the risk of developing colon cancer by 50%.
Which statistic promotes use of the drug?

Absolute and Relative Statistics

Selective statistics may be used to omit information. Consider the previous example:
If you are for the drug, omit or downplay the probability of side effects.
– Example: a calm voice drones out side effects while TV is displaying a patient enjoying life due to using the drug (implied).
If you are against the drug, explain side effects in depth with appropriate images of patients in pain.

Absolute and Relative Statistics

USA Today headline, McLean, Va.: Nov 20, 2007, p. A.3
– FBI: Hate crimes escalate 8% in 2006: Hispanics, Muslims, gays among those under sharper attack
Statistics within the article
– A 19% increase in crimes motivated by religious bias.
– Attacks on Muslims increased 22% to 156 the previous year.
– Attacks on Catholics increased by almost a third to 76.
– Almost seven in 10 were crimes against Jews, which were up 14% to 967.

Absolute and Relative Statistics

Analysis of USA Today’s headline
– “Muslims…among those under sharper attack.”
– But attacks against Catholics increased by almost a third, which is a greater percentage than the 22% cited for Muslims.
– Jewish hate crimes increased to 967 cases, which is a lot more cases than the 156 attributed towards Muslims, although the percentage increase is not as great.
– Note the way the numbers are presented and the emphasis of the headline.

Choosing an endorsement group

Cite a statistic to provide an endorsement: Four out of five teachers use this text.
How is the group chosen?
– Which five teachers?
– Are the group members paid?
– Do they have an interest in the text?
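Worked example: absolute vs relative statistics

Returning to the colon cancer trial described earlier (2 cases among 100 control patients, 1 case among 100 patients taking the drug), the short Python sketch below shows how both headline numbers come out of the same four counts. The function names are made up for illustration and are not taken from any statistics library.

```python
def absolute_risk_reduction(control_events, control_n, treated_events, treated_n):
    """Difference between the control-group risk and the treated-group risk."""
    control_risk = control_events / control_n
    treated_risk = treated_events / treated_n
    return control_risk - treated_risk


def relative_risk_reduction(control_events, control_n, treated_events, treated_n):
    """Absolute risk reduction expressed as a fraction of the control-group risk."""
    control_risk = control_events / control_n
    reduction = absolute_risk_reduction(
        control_events, control_n, treated_events, treated_n
    )
    return reduction / control_risk


# Counts from the slide example: 2 of 100 in the control group, 1 of 100 on the drug.
arr = absolute_risk_reduction(2, 100, 1, 100)
rrr = relative_risk_reduction(2, 100, 1, 100)

print(f"Absolute statistic: risk falls by {arr:.0%} (1 person in 100)")
print(f"Relative statistic: risk falls by {rrr:.0%}")
```

Both lines describe the same trial. Which one appears in a headline is a choice made by whoever reports the result.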
Deceptive graphs and charts

Op-ed article in the NY Times by Cox and Alm, “Scientists Are Made, Not Born,” 2/28/05
A line graph shows an almost continuous growth of the % of women obtaining professional degrees in veterinary, pharmacy, law, medicine, dentistry and M.B.A. programs.
A chart lists the percentage of women obtaining degrees in sciences for 1970-1971 and for 2001-2002, and includes the “hard sciences” of physics, electrical engineering, and computer science. The chart shows an increase in all of the listed fields.
The implication: the number of women enrolled in the hard sciences has continuously increased.
Conclusion: “The ‘feminization’ of the ranks will take place as a matter of simple math because the older, male-dominated groups will retire.” (in response to a controversy involving Harvard’s president at the time).

Selective use of charts and graphs

Analysis
– The percentage of women obtaining professional degrees in the hard sciences from 1970 to 2002 did increase, but that is because women were not admitted to most professional schools in hard sciences in 1970.
– The line graph, which shows periods between 1970 and 2002, does not include the hard sciences.
– The article uses selective statistics to pick samples that support its thesis (1970 and 2002) that the percentage of women in the hard sciences has continually increased and ignores the intermediate years of the hard sciences in its chart.
– The article uses the “soft” sciences to show a continuous increase in the percentage of women enrolled.
– Implication is that the pattern in the line graph applies to the chart that includes the hard sciences.

Selective statistics with charts and graphs

Fact
– There has been a decline in the percentage of professional degrees awarded to female computer scientists over the last twenty years.
– NCES provides statistics for the percentage of female undergraduate degrees in computer science (including data processing technology and other information sciences): 1980-81 32.5%, 1985-86 35.7%, 1990-1991 29.3%, with further decreases since.
– If we consider only computer science without IT, the drop is more critical.
– Percentages for computer and electrical engineering are also discouraging.

Statistics on women in CS

“Computer science has the dubious distinction of being the only science field to see a fall in the share of its bachelor’s degrees granted to women between 1983 and 2002.”
http://www.cra.org/wp/index.php?p=83

Using charts and graphs

Selective choice of disciplines
Selective statistics: 1970 and 2001
[Figure: % of undergraduate degrees awarded to women from 1966-2002. Source: http://www.cra.org/wp/index.php?p=83, Taulbee Survey 2006]

Conclusion

These examples illustrate mechanisms for using statistics incorrectly.
Take statistics courses to learn how to use and understand statistics.