Data Analysis Project
Misleading Statistics

Companies and media often manipulate and skew statistics to their advantage in order to deliver their messages more convincingly. Because statistics can be presented in many ways, it is quite easy to present the information in a misleading way. If the statistics are based on a survey, the sample must be selected properly (simple random sampling, cluster sampling, stratified random sampling, etc.). We must all be aware of misleading graphs and the forms in which they are presented to us, and be able to analyze and fix such graphs so that they are no longer misleading, to the benefit of our society.

Two different bar graphs are made from the same survey of favorite foods:

The same information can be accurately presented in a non-misleading way:

Figure (pie graph): Favorite Foods. Hamburgers 33%, Pizza 33%, Hot Dogs 34%.

If we take the same information and present it in a pie graph, we can see the result of the survey more accurately. Unlike the previous graph, which depicted hot dogs as the clear favorite by misrepresenting the origin on the y-axis, this pie graph shows that all three foods are almost equally preferred, a more realistic result.

Comparative Causes of Annual Deaths in the United States - Provided by CDC

A quick glance at this graph would lead us to conclude that smoking is the leading cause of death among Americans. However, a closer analysis shows that the graph is greatly misleading. Certain crucial information is missing. There is no way for us to know whether the CDC has counted smokers who died from diseases or accidents. There is a good chance that any smoker who died from a disease was counted as having died from smoking. Here is a question to ask the CDC: A person who smokes has died of heart disease. What was the cause of death?
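The effect of the y-axis origin in the favorite-foods bar graph can be checked with simple arithmetic. The sketch below is only an illustration: it uses the survey percentages from the pie graph, and the truncated baseline of 32 is an assumed value, since the slide does not state where the misleading graph's axis actually starts. The function names are hypothetical.

```python
def apparent_heights(values, baseline):
    """Return the drawn bar heights when the y-axis starts at `baseline`."""
    return {name: value - baseline for name, value in values.items()}


def apparent_ratio(values, baseline, a, b):
    """Ratio of the drawn height of bar `a` to the drawn height of bar `b`."""
    heights = apparent_heights(values, baseline)
    return heights[a] / heights[b]


# Survey percentages as shown on the pie graph.
survey = {"Hamburgers": 33, "Pizza": 33, "Hot Dogs": 34}

# Honest axis: baseline 0, so drawn heights match the data.
honest = apparent_ratio(survey, 0, "Hot Dogs", "Pizza")

# Truncated axis: baseline 32 is an assumption, not a value from the slide.
truncated = apparent_ratio(survey, 32, "Hot Dogs", "Pizza")

print(f"baseline 0:  hot dogs bar is {honest:.2f}x the pizza bar")
print(f"baseline 32: hot dogs bar is {truncated:.2f}x the pizza bar")
```

With an axis that starts at 0, the hot dogs bar is about 3% taller than the pizza bar. With the assumed baseline of 32, the same data draws a bar twice as tall, which is how a one-point difference can look like a clear favorite.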
Ways to fix misleading graphs - I

Figure (pie graph): Comparative Causes of Annual Deaths in the United States. AIDS, 30; Alcohol, 105; Motor Vehicle, 46; Fires, 4; Smoking, 418; Homicide, 25; Illicit Drugs, 9; Suicide, 31.

One way to fix a misleading graph is to present the data in a different way, as we did with the favorite-foods survey. However, the information does not change significantly even after we make a pie graph from the initial bar graph. We now have to question how the CDC constructed the graph.

Ways to fix misleading graphs - II

- How the CDC collected its data is very doubtful. The graph does not say how many of the deaths were actually caused by smoking, as opposed to deaths of people who smoked. If smoking is not the primary cause of a death, then it should not be recorded as the cause of death.
- The CDC may say that 418,000 of the people who died were smokers, but it cannot say that they died because of smoking.
- The graph presented on the next slide is a more accurate graph of the causes of annual deaths in the United States:

Revised Graph - Percentage of Smokers in Each Cause of Annual Death in the United States

Figure (bar graph): Percentage of Smokers in Each Cause of Death. Y-axis: Percent of Smokers, 0% to 120%. X-axis: AIDS, Alcohol, Motor Vehicle, Fires, Homicide, Illicit Drugs, Suicide, Cancer, Heart Disease.

Analysis of the Revised Graph

Figure (bar graph): Percentage of Smokers in Each Cause of Death. Y-axis: Percent of Smokers, 0% to 120%. X-axis: AIDS, Motor Vehicle, Homicide, Suicide, Heart Disease.

From the revised graph, we can tell that a certain percentage of the people who died from each cause smoked. This graph does not imply that smoking is the leading cause of death in the United States. It does, however, imply that smoking contributes to deaths in the United States.
For instance, we can assume that smoking is closely related to cardiovascular diseases such as heart disease, and to cancer, because chemicals in a cigarette, such as tar, can block blood vessels, ultimately causing heart disease. We can also assume that smoking is closely related to deaths caused by drug use.

We cannot draw a “fact” from most statistics. It is important to conduct the survey and construct the presentation in the most accurate, reliable, and non-misleading way possible, and the conclusions drawn from the presentation must not be general, but precisely specific.

Price Per Barrel of Light Crude Oil Leaving Saudi Arabia on Jan. 1

The pictograph on the left shows the increase in the price of crude oil leaving Saudi Arabia. The sizes of the barrels are not in proportion to the actual prices. The differences in barrel size exaggerate the increase or decrease in the price of a barrel of crude oil. It is, moreover, hard for readers to compare the prices in each year visually. Therefore, this pictograph is potentially misleading.

Year    Price     Increase (from previous year)
1973    $2.41
1974    $10.95    354.36%
1975    $10.46    -4.47%
1976    $11.51    10.04%
1977    $12.09    5.04%
1978    $12.70    5.05%
1979    $13.34    5.04%

Revised Graph

Instead of using barrels of different sizes to describe the increase in prices, a properly constructed bar graph would present the information more accurately.

Figure (bar graph): Price Per Barrel of Light Crude Oil Leaving Saudi Arabia on Jan. 1. Y-axis: Price Per Barrel, $0.00 to $16.00. X-axis: Years, 1973 to 1979.

Revised Graph - II

Figure (line graph): Price Per Barrel of Light Crude Oil Leaving Saudi Arabia on Jan. 1. Y-axis: Prices Per Barrel, $0.00 to $16.00. X-axis: Years, 1973 to 1979.

- Another adequate way of fixing the graph is a line graph, which shows the increase in oil prices over the years effectively.
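The "Increase" column in the price table can be reproduced from the prices themselves. The sketch below is a minimal check, not part of the original slides: it takes the years as 1973 through 1979 in order, as on the graph axis, and the function name is hypothetical.

```python
# Price per barrel on Jan. 1 of each year, from the table in the presentation.
prices = {
    1973: 2.41,
    1974: 10.95,
    1975: 10.46,
    1976: 11.51,
    1977: 12.09,
    1978: 12.70,
    1979: 13.34,
}


def percent_changes(series):
    """Return the percentage change from each year to the next, rounded to 2 places."""
    years = sorted(series)
    changes = {}
    for prev, curr in zip(years, years[1:]):
        change = (series[curr] - series[prev]) / series[prev] * 100
        changes[curr] = round(change, 2)
    return changes


changes = percent_changes(prices)
for year, change in changes.items():
    print(f"{year}: {change:+.2f}%")
```

Each value is the change relative to the previous year's price, so the large 1974 figure reflects the jump from $2.41 to $10.95, while the later years change by roughly 5% to 10% or fall slightly.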
Chevy Advertisement

This graph is misleading in order to serve a purpose: to suggest that Chevy is the most preferred car, thus possibly persuading others to purchase a Chevy. However, if we look at the graph closely, we can see that the y-axis does not start at zero. A viewer may take this graph at face value and come away with false and inaccurate information.

In order to fix this misleading graph, we would need precise and accurate data from which to construct a properly designed graph. The y-axis of the graph must also begin at 0 in order to display an accurate comparison. Without sufficient information, we can only be aware that graphs such as this are misleading because they obscure the origin on the y-axis.

What makes some statistical information accurate and reliable?

Statistics is a set of methods used to collect and analyze data. Because it helps people make good decisions in uncertain situations, many people tend to believe any statistic that a company presents to them. However, as you have seen in our presentation, statistics are very easy to manipulate; without adequate understanding and analysis of the statistical information, it is easy for us to take misleading statistics seriously. Accurate and reliable statistics come from a proper procedure of defining the problem, collecting the data, analyzing the data, and reporting the results. These four steps must be carried out rationally and as accurately as possible in order to prevent the statistics from becoming misleading. We will explain adequate ways to carry out the four steps and, ultimately, to produce accurate and reliable statistics.

Defining the Problem

Every word in a statistical problem must be defined specifically and accurately.
For example, if the problem were “counting the number of inhabitants of Kerrisdale, Vancouver, on a specific date,” we would have to define “inhabitants” to know whom to count in the survey. Kerrisdale must also be defined specifically in order to decide where the survey stops. Cases such as newborn babies in the hospital must be taken into consideration. If any one of these pieces of information is not clearly defined, it would be extremely difficult to begin gathering data.

Collecting the Data

Each kind of problem needs different information, and therefore a different method of collecting the data. One of the most important parts of producing a statistic is designing an effective way of collecting data. We collect data from a population or from a sample. When the survey uses a sample selected from the population, the sample must be able to provide exactly the information required for the purpose of the survey. The most exacting and informative form of data collection for comparisons is the randomized controlled experiment. The subjects are selected randomly and are divided at random into separate groups.

Analyzing the Data

~ Exploratory Methods ~
These methods often involve calculating averages and percentages and displaying the information on a graph. Although exploratory methods may provide many pieces of information, they may not answer specific questions or make definite statements about a problem.

~ Confirmatory Methods ~
These methods are used to draw conclusions from the results of the survey and the statistical information by answering specific questions. For example, using a confirmatory method, a statistician can say, “The price of oil leaving Saudi Arabia has been increasing, and will continue to increase.”

Neither of these methods should be overlooked. Both should be used extensively to analyze the results of a statistical activity, in order to reach a variety of specific conclusions with credibility and accuracy.
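As an illustration of the exploratory step, the sketch below calculates an average and a set of percentages from the figures on the CDC graph discussed earlier. It rests on two assumptions that are not stated on the graph itself: the values are in thousands of deaths (inferred from the 418,000 figure quoted in the presentation), and the percentages are shares of these eight categories only, not of all deaths. The function name is hypothetical.

```python
from statistics import mean

# Values as shown on the CDC graph; units assumed to be thousands of deaths.
deaths_thousands = {
    "AIDS": 30,
    "Alcohol": 105,
    "Motor Vehicle": 46,
    "Fires": 4,
    "Smoking": 418,
    "Homicide": 25,
    "Illicit Drugs": 9,
    "Suicide": 31,
}


def percentage_shares(counts):
    """Return each category's share of the listed total, as a percentage."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


shares = percentage_shares(deaths_thousands)
average = mean(list(deaths_thousands.values()))

# Print the categories from largest to smallest share.
for name, share in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<14}{share:>6.1f}%")
print(f"Average per category: {average:.1f} thousand")
```

A summary like this describes the numbers as reported. It cannot tell us how a death came to be attributed to a category, which is the kind of specific question the confirmatory step has to address.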
Reporting the Results

Inference is used to draw conclusions from a statistical activity; even from a small collection of observations or experimental results, careful and rational inference can produce an accurate and reliable generalization that can be used for the benefit of society. There are many forms of presentation, including bar graphs, pie graphs, tables, and sets of percentages. However, when drawing conclusions, one must take into consideration the fact that the survey was carried out on a specifically selected sample, not the entire population. Therefore, using probability, the conclusions must state the uncertainty that the statistics alone may leave out or misrepresent.

How We Would Conduct Statistical Activities

Everything presented in the previous slides must be considered when carrying out a statistical activity. When every step is done carefully, the statistics will be accurate and reliable.

Thank you for viewing our presentation!

Bibliography

Fienberg, Stephen E. “Statistics.” The World Book Encyclopedia. 2002 ed.

Goodman, Jeff. “Math and the Media: Deconstructing Graphs and Numbers.” Jeff Goodman. Modification Date: N/A. Appalachian State University. Access Date: 1 December 2003. <http://www.ced.appstate.edu/~goodmanj/workshops/ABS04/graphs/graphs.html>

Knox, Pattie. “Excel Activities for the Classroom.” North Canton City Schools. Modification Date: N/A. Access Date: 1 December 2003. <http://www.northcanton.sparcc.org/~technology/excel/files/Misleading_Graphs.xls>

“Misleading statistics by the CDC.” Jeremiah Project. Modification Date: 21 May 2003. Access Date: 1 December 2003. <http://www.jeremiahproject.com/smoke/cdcdeaths.html>