Abuses of Statistics

Today, statistics, surveys, polls, charts and graphs are widely used in electronic and print media. They
are also widely abused. The statistics may be accurate but the presentation is misleading. There are
many ways to abuse statistics and some folks take advantage of these. Some of the ways that people lie
with statistics are:





Intentionally deceptive - sometimes people simply make up numbers and use them as fact
when there isn't any truth in the values. A classic example occurred a couple of years ago when
a researcher used a magic marker to color some dark spots on a white lab rat. With this
deception, he claimed that he had made a great discovery in the treatment of cancer. He got the
results that he wanted until his deception was discovered.
Selective use of data - simply throwing out the numbers that don't fit the researcher's needs. If
someone wants to say that 9 out of 10 doctors prefer a certain medicine, then they conveniently 'lose'
the data on those doctors who don't prefer this medicine.
Extrapolation - everyone knows that teenagers go through a growth spurt. If we plot that rapid
growth and use that rapid rate to predict how tall the individual will be at age 60, then we
would get some bizarre numbers. For example, consider a teenager who grows 3 inches a year
(some teenagers have grown from 5 feet to 6 feet in 4 years). If this rate of growth
continued, then the individual would grow another 120 inches, or 10 feet, in the next 40 years.
Obviously, this doesn't occur in real life. However, sometimes researchers extrapolate data in similar ways.
Creative graphics - everyone likes a picture. Sometimes, the picture can misrepresent the data.
Incompetence - sometimes, the researcher wants to do a good job with the statistics but just
doesn't know what they are doing. Often, it would be better to do nothing than to do
something the wrong way.
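The extrapolation item above can be checked with a few lines of arithmetic. This is only a sketch: the starting age of 20 and the starting height of 6 feet are assumptions read off the example (age 60 minus "the next 40 years"), and the function name is made up for illustration.

```python
# Assumed starting point: 6 feet (72 inches) at age 20, after growing
# 3 inches a year for 4 years, as in the teenager example.
GROWTH_PER_YEAR_IN = 3
height_at_20_in = 6 * 12

def extrapolate_height(start_height_in, rate_in_per_year, years):
    """Project height by assuming the growth rate never changes."""
    return start_height_in + rate_in_per_year * years

height_at_60_in = extrapolate_height(height_at_20_in, GROWTH_PER_YEAR_IN, 40)
feet, inches = divmod(height_at_60_in, 12)
print(f"Projected height at age 60: {height_at_60_in} inches ({feet} ft {inches} in)")
```

The projection comes out at 16 feet, which shows why a rate measured over a short period should not be stretched far beyond it.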
Here are some examples:
There are 3 commonly used measures of central tendency, and all of them are commonly called averages.
Consider the following values:
10, 20, 20, 20, 50, 70, 120, 130, 170
One can say that the average of these numbers is 20. Another can say that the average is 50. And yet
another can say that the average is about 70. How can this be? Can they all be right? Can the average of these
numbers be 20, 50 or about 70?
The answer is YES. The 3 commonly used measures of central tendency are the MODE,
MEDIAN and MEAN. The mode is the most common value. In this data set, it is the value of 20 which
occurs 3 times. The median is the middle value after all of the values have been ordered. In this data
set of 9 values, it is the 5th value or 50. Finally, the total divided by the number of values is the mean.
The total adds up to 610 and there are 9 values, so the mean is about 67.8, or roughly 70.
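The three measures can be checked with Python's standard `statistics` module. This is a minimal sketch using the nine values listed earlier; the variable name is an assumption. As printed, the list sums to 610, so the exact mean is 610 / 9, or about 67.8.

```python
from statistics import mean, median, mode

values = [10, 20, 20, 20, 50, 70, 120, 130, 170]

print("mode:  ", mode(values))            # most common value
print("median:", median(values))          # middle value of the sorted list
print("total: ", sum(values))
print("mean:  ", round(mean(values), 1))  # total divided by the number of values
```

All three results are legitimate "averages" of the same data, which is why a report should say which one it is using.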
Many people assume that when someone talks about 'the average', they are actually referring to
the mean. However, this is not necessarily correct. Reputable reports will use the terms mean,
median and mode when referring to the various averages.
In general, the median is a better measure for populations that have wide variations, like salaries.
However, the usefulness of the statistic is only as good as the method used to collect the data. If the
salary was self-reported, then someone might under-report because of the IRS, or they might over-report
because of pride. The person may not know how much they make, so they may have guessed. There are
various problems with collecting data in this way, so the data may be useless and the results equally
useless. Nevertheless, one will find this type of data reported in the media.
QUESTIONS
1. Suppose that these values represent salaries (in thousands). That is, 20 represents 20,000. Suppose
that you were trying to convince someone that the average salary is low. Which measure of central tendency
would you use? (10, 20, 20, 20, 50, 70, 120, 130, 170)
2. Suppose that you were trying to convince someone that the average salary is very high. Which
measure would you use?
First plot: a plot of the values 8989, 8930, 8950, 8979, 8999, 9030, 8940, 8901, 8934, 8987, 8950, 9009 and
8990.
Second plot: a plot of data that varies greatly. Data seems to be unstable.
Third plot: a plot of data that seems rather stable but is low on the scale. Seems to indicate that there is a lot of
room for positive growth.
Fourth plot: a plot of data that seems stable but is high on the scale. Seems to indicate that there isn't any room for
growth - the only way for the values to go is DOWN.
Now, the point - all of these plots use the same data set. The only difference is that the scales
change. In the first, the scale is 8800 to 9050. In the second, the scale is 8900 to 9030. In the next, the
scale is 0 to 100,000. Finally, the scale is 0 to 10,000.
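The effect of the four scales can be quantified without drawing anything. The sketch below computes where the lowest and highest data points would fall on the vertical axis under each scale; the helper name `axis_usage` and the dictionary labels are assumptions made for illustration.

```python
values = [8989, 8930, 8950, 8979, 8999, 9030, 8940,
          8901, 8934, 8987, 8950, 9009, 8990]

# The four axis ranges described in the text, in order.
scales = {
    "8800 to 9050": (8800, 9050),
    "8900 to 9030": (8900, 9030),
    "0 to 100,000": (0, 100_000),
    "0 to 10,000": (0, 10_000),
}

def axis_usage(data, low, high):
    """Return where the lowest and highest data points fall on the axis,
    as fractions of the axis height (0.0 = bottom, 1.0 = top)."""
    span = high - low
    return (min(data) - low) / span, (max(data) - low) / span

usage = {label: axis_usage(values, low, high)
         for label, (low, high) in scales.items()}

for label, (bottom, top) in usage.items():
    print(f"scale {label:>12}: data sits between {bottom:.1%} and {top:.1%} of the axis")
```

On the 8900 to 9030 scale the data fills almost the whole axis, so it looks volatile. On the 0 to 100,000 scale it is squeezed into a thin band near the bottom, and on the 0 to 10,000 scale into a thin band near the top.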
QUESTIONS
3. Which of these graphs (1st, 2nd, 3rd or 4th), would you use to convince a person that the stock
market has topped out and they need to sell?
4. Which one would you use to convince a person that the stock market has bottomed out and they
need to buy?
5. Which one would you use to convince a person that the stock market is highly unstable?
PICTOGRAPHS
As we have seen, graphs can be misleading. A special type of graph, called a 'pictograph', can be
especially problematic. For example, this is a pictograph that shows the purchasing power of the American
dollar.
The pictograph above shows how one American dollar in 1958 had shrunk to a value of 44 cents in
1978 (due to the effects of rising prices or inflation). If you think carefully, this means that one
American dollar in 1978 could buy just under half as much as it could in 1958. So is there any problem
with the depiction of statistics in the pictograph?
The size or area (length by breadth) of the dollars shown is in fact misleading. The areas should reflect the
statistics, that is, the actual purchasing power of the dollar in the year in question. As 44 cents is just under half
of one dollar, the 1978 dollar area should be just under one half of the 1958 dollar area. This means
that the 1978 dollar should be about twice as big as it is.
The artist decreased the length by half and the breadth along with it, so the area shrank to one quarter of the original.
You may argue that this problem goes unnoticed by people when they look at a pictograph like this
one, so it is not particularly important. However, the fact is that subconsciously many people interpret
the dollar to have lost far more of its value than is the case. Furthermore, it is also worth noting that the
pictograph appeared during an American presidential election campaign in a leading newspaper, and
would have been looked at by many potential voters.
Consider the following pictograph, in which a financial advisor claims that he can reduce your taxes by 25%.
The $5 bill in his right hand is much larger than the one in his left hand. The length decreased to 25%
of the original length, and the width shrank in the same proportion, so the area of the bill was reduced
to 1/16 of the original area. Again, it is deceptive.
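Both pictographs rely on the same arithmetic: when a picture's length and width are scaled by the same factor, its area changes by the square of that factor. The following sketch assumes the artist scaled both dimensions equally; the function names are made up for illustration.

```python
import math

def area_ratio(side_scale):
    """Area of a picture after its length AND width are both multiplied by side_scale."""
    return side_scale ** 2

def honest_side_scale(value_ratio):
    """Side scale whose AREA matches the ratio of the two values being compared."""
    return math.sqrt(value_ratio)

print(area_ratio(0.5))    # dollar drawn at half the length and width
print(area_ratio(0.25))   # $5 bill drawn at 25% of the length and width
print(round(honest_side_scale(0.44), 3))  # side scale for a 44-cent dollar
```

Halving the sides leaves a quarter of the area, and quartering them leaves one sixteenth. For the area to show 44 cents on the dollar, each side would need to be drawn at about two thirds of the original size, not one half.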
One may argue that this problem goes unnoticed by people when they look at a pictograph, so it is not
very important. However, the fact is that subconsciously many people interpret the dollar to be
significantly less than it actually is in both of these examples.
Consider the following example:
QUESTION
7. Assuming that the graph was created by the same artists that created the 'dollar' pictographs, which
set of data was most likely used to create this graph?
a. 2000 - $10,000 and 2002 - $40,000
b. 2000 - $10,000 and 2002 - $20,000
c. 2000 - $10,000 and 2002 - $10,000
MISLEADING/LOADED QUESTIONS
Another way to 'lie' with statistics is to ask misleading or loaded questions. For example, if the
researcher wants to show that everyone speeds, then they could ask the following question:
Do you still drive faster than the speed limit?
How did you answer this question? There isn't any way to answer the question without admitting
that you speed. If you said "no" then the conclusion is that you used to speed but that you no longer do
so. If you said "yes" then you are saying that you still speed. Obviously, it is an unfair question - but
sometimes researchers use similar questions to draw invalid conclusions.
Sometimes, people compare inappropriate data - typically, this is called 'comparing apples to oranges'.
We often accuse politicians of doing this. Suppose that someone asks:
Do students get a good education at Texas community colleges?
And this question is answered with:
Yes, the tuition is lower than at 4-year universities and the G.P.A. is 2.95.
Obviously, the answer is a good answer but not to this question.
Suppose that a researcher polls students who have dropped out of high school and finds that 75%
dropped out because of finances. Further suppose that the researcher makes a statement such as:
We have found that 75% of high school students quit because of finances.
Well, they seem to be comparing the population of high school students with the population of high
school dropout students. Their statement would have been clearer if they had said:
We found that 75% of high school dropout students quit because of finances.
However, often the media will report the former statement, especially in this age of 'investigative'
reporting. This is reporting that is performed to FIND a problem - even if the problem doesn't really
exist.
Another problem with questions involves emotion. Whenever articles deal with emotionally charged
issues, one should be very suspicious. The bias of the researcher can have a great effect on the results
obtained from the survey. Consider the issue of abortion, for example. Both pro-abortion groups and
pro-life groups can show polls that show that the majority of the public supports their view. How can
this be? Suppose that the researcher wants to show support for a pro-abortion view. They could ask,
"Should women be allowed to make their own medical decisions without the intrusion of an
overbearing federal bureaucracy?" On the other hand, to show support for a pro-life view, they could
ask, "Should doctors be allowed to mercilessly butcher innocent little babies?" Obviously, the wording
affects the response. Most people assume that these results were obtained by asking a question
similar to "Are you pro-abortion or pro-life?" Unfortunately, that is not what happens in real life.
When a researcher puts a question on the survey, they usually have a purpose in mind - they have
something to prove.
QUESTION
8. War is another emotionally charged issue. Write a biased survey question on this issue.
Percentages can also be problematic. For example, if an investment made 3% last year and made 6%
this year, then one could say that the investment return increased by 3 percentage points or that the
investment return increased by 100%. Both statements are true, but the numbers are very different.
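The two ways of describing the same change can be computed side by side. This sketch assumes the return went from 3% in one year to 6% in the next, which are the figures that make both statements true; the function name is made up for illustration.

```python
def describe_change(old_rate_pct, new_rate_pct):
    """Return (change in percentage points, relative change in percent)."""
    points = new_rate_pct - old_rate_pct
    relative = (new_rate_pct - old_rate_pct) / old_rate_pct * 100
    return points, relative

points, relative = describe_change(3, 6)
print(f"The return rose by {points} percentage points.")
print(f"The return rose by {relative:.0f}% relative to last year.")
```

A reader who sees only "up 100%" cannot tell whether the return is now 6% or 60%, so it is worth asking which base the percentage refers to.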
Suppose that a sale advertises 25% off an item. Then the price is dropped another 25%. The store
claims that they have dropped the price by 50%. Is this true? Let's look at an example. Suppose that the
original price was $32, so 25% off would mean that the price was dropped by 25% of $32 or $8. The
price of the item is now $24 and the price is dropped by another 25%, which would be $6. The
resulting price would be $18. If the original price was dropped by 50% then the price drop would have
been $16. The actual drop is only $14, or 43.75% of the original price, so the store would not be correct.
Suppose that your company is having financial problems and they cut your salary by 50%. Times get
better and they increase your salary by 50%. Are you back at your original salary? Not hardly. Suppose
that you were making $40,000. The 50% drop would take you to $20,000 and the 50% increase would
add $10,000 or result in a salary of $30,000.
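The sale and salary examples share one rule: each percentage is applied to the current amount, not to the original one. A minimal sketch, with a helper name chosen for illustration:

```python
def apply_changes(start, percent_changes):
    """Apply each percentage change in turn to the CURRENT amount."""
    amount = start
    for pct in percent_changes:
        amount = amount * (1 + pct / 100)
    return amount

# Two successive 25% markdowns on a $32 item.
sale_price = apply_changes(32, [-25, -25])
total_drop_pct = (32 - sale_price) / 32 * 100
print(f"Sale price: ${sale_price:.2f}, total drop: {total_drop_pct:.2f}%")

# A 50% pay cut followed by a 50% raise on a $40,000 salary.
salary = apply_changes(40_000, [-50, +50])
print(f"Salary after cut and raise: ${salary:,.0f}")
```

Because the second change works from a smaller base, percentages applied one after another cannot simply be added together.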
QUESTION
9. In the last example, instead of a 50% increase in the pay, what percentage increase would be
necessary to bring the salary back to the original salary?
Surveys can be biased, either consciously or subconsciously. When considering the results of a survey,
be sure to consider the following:




How many cases were considered? If the sample size is small, then be concerned about the
results. Large sample size doesn't mean that the study is valid but small sample size is certainly
a concern.
Who is telling you the information? Do they have a reason to 'stretch' the truth?
The question that was asked. For example, suppose the following question was asked: "Do you
favor protecting the environment or helping economic growth?" If the respondent could only
choose one of these choices, then the results would be invalid because the choices are not
mutually exclusive. That is, a person could favor protecting the environment and helping
economic growth.
Polls that involve highly technical matters are usually useless. Most people do not know about
the technical matters and cannot intelligently answer the questions.
When reading or listening to anything in the media regarding statistics, polls or surveys, it is helpful to
understand that sometimes you can be misled. Statistics are only useful when the assumptions
associated with the statistic being used are met. It is the responsibility of the person using the statistics
to know these assumptions and to use them properly and ethically. However, it is also important that
the reader know about these statistics and be suspicious when appropriate.
QUESTION
10. Pick one of the concerns mentioned in this material (graphs, misleading questions, averages,
percentages, etc.) that has deceived you in the past and write a couple of sentences about how you were
misled.