Abuses of Statistics

Today, statistics, surveys, polls, charts and graphs are widely used in electronic and print media. They are also widely abused. The statistics may be accurate while the presentation is misleading. There are many ways to abuse statistics, and some people take advantage of them. Some of the ways that people lie with statistics are:

Intentional deception - sometimes people simply make up numbers and present them as fact when there isn't any truth in the values. A classic example occurred a couple of years ago when a researcher used a magic marker to color some dark spots on a white lab rat. With this deception, he claimed that he had made a great discovery in the treatment of cancer. He got the results that he wanted until his deception was discovered.

Selective use of data - simply throwing out the numbers that don't fit the researcher's needs. If someone wants to say that 9 out of 10 doctors prefer a certain medicine, they can conveniently 'lose' the data on the doctors who don't prefer this medicine.

Extrapolation - everyone knows that teenagers go through a growth spurt. If we plot that rapid growth and use the same rate to predict how tall the individual will be at age 60, we get some bizarre numbers. For example, consider a teenager who grows 3 inches a year (some teenagers have grown from 5 feet to 6 feet in 4 years). If this rate of growth continued, the individual would grow another 120 inches, or 10 feet, in the next 40 years. Obviously, this doesn't occur in real life. However, researchers sometimes extrapolate data in similar ways.

Creative graphics - everyone likes a picture. Sometimes, the picture can misrepresent the data.

Incompetence - sometimes, the researcher wants to do a good job with the statistics but just doesn't know what they are doing. Often, it would be best if they didn't do anything instead of doing something the wrong way.
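The extrapolation example can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function name is hypothetical, and the starting height of 6 feet (72 inches) is taken from the growth-spurt figures in the example.

```python
def extrapolate_height(current_height_in, growth_in_per_year, years):
    """Naively project a height by assuming the growth rate never changes."""
    return current_height_in + growth_in_per_year * years


# Figures from the example: a 6-foot (72-inch) teenager who has been
# growing 3 inches a year, projected 40 years into the future.
projected_in = extrapolate_height(72, 3, 40)
print(projected_in, "inches, or", projected_in / 12, "feet")  # 192 inches, or 16.0 feet
```

The arithmetic itself is correct. The error lies in assuming that a rate measured during a growth spurt still applies decades later.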
Here are some examples:

There are 3 commonly used measures of central tendency, and all of them are commonly called averages. Consider the following values:

10, 20, 20, 20, 50, 70, 120, 130, 170

One person can say that the average of these numbers is 20. Another can say that the average is 50. And yet another can say that the average is about 68. How can this be? Can they all be right? Can the average of these numbers be 20, 50 or 68? The answer is YES.

The 3 commonly used measures of central tendency are the MODE, MEDIAN and MEAN. The mode is the most common value. In this data set, it is the value of 20, which occurs 3 times. The median is the middle value after all of the values have been ordered. In this data set of 9 values, it is the 5th value, or 50. Finally, the mean is the total divided by the number of values. The total adds up to 610 and there are 9 values, so the mean is 610/9, or about 67.8.

Many people assume that when someone talks about 'the average', they are referring to the mean. However, this is not necessarily correct. Reputable reports will use the terms mean, median and mode when referring to the various averages. In general, the median is a better measure for populations that have wide variations, like salaries.

However, a statistic is only as good as the method used to collect the data. If the salary was self-reported, then someone might under-report because of the IRS, or they might over-report because of pride. The person may not know how much they make, so they may have guessed. There are various problems with collecting data in this way, so the data may be useless and the results equally useless. However, one will find this type of data reported in the media.

QUESTIONS

1. Suppose that these values represent salaries (in thousands). That is, 20 represents 20,000. Suppose that you were trying to convince someone that the average salary is low. Which measure of central tendency would you use? (10, 20, 20, 20, 50, 70, 120, 130, 170)

2. Suppose that you were trying to convince someone that the average salary is very high. Which measure would you use?

Consider four plots of the following values: 8989, 8930, 8950, 8979, 8999, 9030, 8940, 8901, 8934, 8987, 8950, 9009 and 8990.

1st plot: a plot of the values.

2nd plot: a plot of data that varies greatly. The data seems to be unstable.

3rd plot: a plot of data that seems rather stable but is low on the scale. It seems to indicate that there is a lot of room for positive growth.

4th plot: a plot of data that seems stable but is high on the scale. It seems to indicate that there isn't any room for growth - the only way for the values to go is DOWN.

Now, the point - all of these plots use the same data set. The only difference is that the scales change. In the first, the scale is 8800 to 9050. In the second, the scale is 8900 to 9030. In the third, the scale is 0 to 100,000. In the fourth, the scale is 0 to 10,000.

QUESTIONS

3. Which of these graphs (1st, 2nd, 3rd or 4th) would you use to convince a person that the stock market has topped out and they need to sell?

4. Which one would you use to convince a person that the stock market has bottomed out and they need to buy?

5. Which one would you use to convince a person that the stock market is highly unstable?

PICTOGRAPHS

As we have seen, graphs can be misleading. A special type of graph, called a 'pictograph', can be especially problematic. For example, consider a pictograph that shows the purchasing power of the American dollar. The pictograph shows how one American dollar in 1958 had shrunk to a value of 44 cents in 1978 (due to the effects of rising prices, or inflation). If you think carefully, this means that one American dollar in 1978 could buy just under half as much as it could in 1958.

So is there any problem with the depiction of statistics in the pictograph? The size or area (length by breadth) of the dollars shown is in fact misleading. The areas should reflect the statistics, that is, the actual purchasing power of the dollar in the year in question.
Because 44 cents is just under half of one dollar, the area of the 1978 dollar should be just under one half of the area of the 1958 dollar. This means that the 1978 dollar should be about twice as big as it is. The artist decreased the length by half and shrank the width in proportion, so the area decreased by a factor of 4.

You may argue that this problem goes unnoticed by people when they look at a pictograph like this one, so it is not particularly important. However, the fact is that subconsciously many people interpret the dollar to have lost far more of its value than is the case. Furthermore, it is worth noting that the pictograph appeared in a leading newspaper during an American presidential election campaign, and would have been looked at by many potential voters.

Consider another pictograph, in which a financial advisor claims that he can reduce your taxes by 25%. The $5 bill in his right hand is much larger than the one in his left hand. The length decreased to 25% of the original length, but the area of the bill was reduced to 1/16 of the original area. Again, it is deceptive.

One may argue that this problem goes unnoticed by people when they look at a pictograph, so it is not very important. However, in both of these examples, many people subconsciously interpret the dollar to be worth significantly less than it actually is.

Consider the following example:

QUESTION

7. Assuming that the graph was created by the same artists that created the 'dollar' pictographs, which set of data was most likely used to create this graph?

a. 2000 - $10,000 and 2002 - $40,000
b. 2000 - $10,000 and 2002 - $20,000
c. 2000 - $10,000 and 2002 - $10,000

MISLEADING/LOADED QUESTIONS

Another way to 'lie' with statistics is to ask misleading or loaded questions. For example, if the researcher wants to show that everyone speeds, then they could ask the following question:

Do you still drive faster than the speed limit?

How did you answer this question?
There isn't any way to answer the question without admitting that you speed. If you said "no", then the conclusion is that you used to speed but no longer do so. If you said "yes", then you are saying that you still speed. Obviously, it is an unfair question - but sometimes researchers use similar questions to draw invalid conclusions.

Sometimes, people compare inappropriate data - typically, this is called 'comparing apples to oranges'. We often accuse politicians of doing this. Suppose that someone asks:

Do students get a good education at Texas community colleges?

And this question is answered with:

Yes, the tuition is lower than at 4-year universities and the G.P.A. is 2.95.

The answer may be a good answer, but not to this question: neither the tuition nor the G.P.A. says whether the education is good.

Suppose that a researcher polls students who have dropped out of high school and finds that 75% dropped out because of finances. Further suppose that the researcher makes a statement such as:

We have found that 75% of high school students quit because of finances.

This statement confuses the population of high school students with the population of high school dropouts. The statement would have been clearer if it had said:

We found that 75% of high school dropouts quit because of finances.

However, the media will often report the former statement, especially in this age of 'investigative' reporting. This is reporting that is performed to FIND a problem - even if the problem doesn't really exist.

Another problem with questions involves emotion. Whenever articles deal with emotionally charged issues, one should be very suspicious. The bias of the researcher can have a great effect on the results obtained from the survey. Consider the issue of abortion, for example. Both pro-abortion groups and pro-life groups can produce polls that show that the majority of the public supports their view. How can this be?
Suppose that the researcher wants to show support for a pro-abortion view. They could ask, "Should women be allowed to make their own medical decisions without the intrusion of an overbearing federal bureaucracy?" On the other hand, to show support for a pro-life view, they could ask, "Should doctors be allowed to mercilessly butcher innocent little babies?" Obviously, the wording affects the response. Most people assume that these results were obtained by asking a question similar to "Are you pro-abortion or pro-life?" Unfortunately, that is not what happens in real life. When a researcher puts a question on a survey, they usually have a purpose in mind - they have something to prove.

QUESTION

8. War is another emotionally charged issue. Write a biased survey question on this issue.

Percentages can also be problematic. For example, if an investment returned 3% last year and its return rose to 6% this year, then one could say that the return increased by 3 percentage points or that the return increased by 100%. Both statements are true, but the percentages are very different.

Suppose that a sale advertises 25% off an item. Then the price is dropped another 25%. The store claims that they have dropped the price by 50%. Is this true? Let's look at an example. Suppose that the original price was $32, so 25% off would mean that the price was dropped by 25% of $32, or $8. The price of the item is now $24 and the price is dropped by another 25%, which would be $6. The resulting price would be $18. If the original price had been dropped by 50%, then the price drop would have been $16 and the resulting price would be $16. The actual drop is $14, or 43.75% of the original price. So, the store would not be correct.

Suppose that your company is having financial problems and they cut your salary by 50%. Times get better and they increase your salary by 50%. Are you back at your original salary? No. Suppose that you were making $40,000. The 50% drop would take you to $20,000, and the 50% increase would add $10,000, resulting in a salary of $30,000.

QUESTION

9. In the last example, instead of a 50% increase in the pay, what percentage increase would be necessary to bring the salary back to the original salary?

Surveys can be biased, either consciously or subconsciously. When considering the results of a survey, be sure to consider the following:

How many cases were considered? If the sample size is small, then be concerned about the results. A large sample size doesn't mean that the study is valid, but a small sample size is certainly a concern.

Who is telling you the information? Do they have a reason to 'stretch' the truth?

What question was asked? For example, suppose the following question was asked: "Do you favor protecting the environment or helping economic growth?" If the respondent could only choose one of these choices, then the results would be invalid because the choices are not mutually exclusive. That is, a person could favor protecting the environment and helping economic growth.

Does the poll involve highly technical matters? Such polls are usually useless. Most people do not know about the technical matters and cannot intelligently answer the questions.

When reading or listening to anything in the media regarding statistics, polls or surveys, it is helpful to understand that sometimes you can be misled. Statistics are only useful when the assumptions associated with the statistic being used are met. It is the responsibility of the person using the statistics to know these assumptions and to use the statistics properly and ethically. However, it is also important that the reader know about these statistics and be suspicious when appropriate.

QUESTION

10. Pick one of the concerns mentioned in this material (graphs, misleading questions, averages, percentages, etc.) that has deceived you in the past and write a couple of sentences about how you were misled.
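Several of the numerical claims in this material can be checked with a few lines of code. The sketch below is only an illustration and not part of the original material: it uses Python's standard statistics module, and the helper names (three_averages, apply_percent_changes, total_percent_change, area_factor) are hypothetical. It recomputes the three averages of the nine salary values, applies percentage changes one after another as in the sale and salary examples, and shows how area shrinks when a pictograph scales both length and width by the same factor.

```python
from statistics import mean, median, mode

# The nine values from the averages example.
values = [10, 20, 20, 20, 50, 70, 120, 130, 170]


def three_averages(data):
    """Return the mode, median and mean of a list of numbers."""
    return {"mode": mode(data), "median": median(data), "mean": mean(data)}


def apply_percent_changes(start, changes):
    """Apply percentage changes one after another (e.g. -25, then -25)."""
    amount = start
    for pct in changes:
        amount *= 1 + pct / 100
    return amount


def total_percent_change(start, end):
    """Overall change from start to end, as a percentage of start."""
    return (end - start) / start * 100


def area_factor(length_factor):
    """Area scale when length and width are scaled by the same factor."""
    return length_factor ** 2


averages = three_averages(values)
print(averages["mode"], averages["median"], round(averages["mean"], 1))  # 20 50 67.8

sale_price = apply_percent_changes(32, [-25, -25])
print(sale_price, total_percent_change(32, sale_price))  # 18.0 -43.75

print(apply_percent_changes(40000, [-50, 50]))  # 30000.0

print(area_factor(0.5), area_factor(0.25))  # 0.25 0.0625
```

Each percentage change is applied to the current amount rather than to the original one. That is why two 25% discounts do not add up to 50%, and why a 50% raise does not undo a 50% cut.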
