Data Analysis Project

Misleading Statistics




Companies and the media often manipulate and skew statistics
to their advantage in order to deliver their messages more
convincingly.
Because statistics can be presented in many ways, it is quite
easy to present the information in a misleading way.
If the statistics come from a survey, the sample must be selected
properly from the population (simple random sampling, cluster
sampling, stratified random sampling, etc.).
We must all be aware of misleading graphs and of the forms in which
they are presented to us, and be able to analyze and fix such
graphs so that they are no longer misleading, to the benefit of our
society.
Two different bar graphs are made from
the same survey of favorite foods:
[Figure: two bar graphs of the same favorite-foods survey]
The same information can be accurately
presented in a non-misleading way:
[Pie graph: Favorite Foods. Hamburgers 33%, Pizza 33%, Hot Dogs 34%]
If we take the same
information and present
it in a pie graph, we can
see a more accurate
picture of the survey.
Unlike the previous
graph, which depicted
hot dogs as the favorite
food by misrepresenting
the origin on the y-axis,
this pie graph shows
that all three foods are
almost equally preferred, a
more realistic result.
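The effect of a shifted origin can be put into numbers. The following is a minimal sketch, not a reconstruction of the original bar graph: the starting value of 32 for the y-axis is an assumption made for illustration, and `drawn_height` and `visual_ratio` are hypothetical helper names.

```python
# Survey shares from the pie graph (percent).
shares = {"Hamburgers": 33, "Pizza": 33, "Hot Dogs": 34}


def drawn_height(value, axis_start):
    """Height of a bar as drawn when the y-axis begins at axis_start."""
    return value - axis_start


def visual_ratio(a, b, axis_start):
    """How many times taller bar a looks than bar b on the page."""
    return drawn_height(a, axis_start) / drawn_height(b, axis_start)


# Assumed truncated axis starting at 32 (illustrative value only).
truncated = visual_ratio(shares["Hot Dogs"], shares["Pizza"], axis_start=32)
# Honest axis starting at 0.
honest = visual_ratio(shares["Hot Dogs"], shares["Pizza"], axis_start=0)

print(f"Truncated axis: hot dogs look {truncated:.2f}x as popular as pizza")
print(f"Axis from zero: hot dogs look {honest:.2f}x as popular as pizza")
```

Under this assumption a one-point lead is drawn as a bar twice as tall, while the same data on an axis starting at zero shows bars of nearly equal height.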
Comparative Causes of Annual Deaths in
the United States – Provided by CDC
A simple glance at this graph will
make us conclude that smoking
is the leading cause of death
among Americans. However, an
in-depth analysis of this graph
will easily tell us that it is greatly
misleading.
Certain crucial
information is missing.
There is no way for us
to know whether the
CDC has counted
smokers who
died from diseases or
accidents. There is a
good chance that any
smoker who died from
a disease has been
counted among those who
died from smoking.
Here is a question to
ask the CDC:
A person who smokes has died from heart disease. What was
the cause of death?
Ways to fix misleading graphs - I
Comparative Causes of Annual Deaths in the United States
[Pie graph; values in thousands of deaths per year]
AIDS, 30
Alcohol, 105
Motor Vehicle, 46
Fires, 4
Smoking, 418
Homicide, 25
Illicit Drugs, 9
Suicide, 31
One way to fix a
misleading graph is to
present it in a different
way, as we did for the
previous information
(favorite foods).
However, there is no
significant change in
the information even
after we have made a
pie graph from the
initial bar graph. We
now have to question
how the CDC
constructed the
graph.
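The reason a pie graph does not help here is that the proportions themselves are lopsided. A minimal sketch of the arithmetic, using the values labeled on the chart (assumed to be in thousands of deaths, as the 418,000 figure quoted for smoking suggests):

```python
# Values as labeled on the chart, assumed to be thousands of deaths per year.
deaths_thousands = {
    "AIDS": 30,
    "Alcohol": 105,
    "Motor Vehicle": 46,
    "Fires": 4,
    "Smoking": 418,
    "Homicide": 25,
    "Illicit Drugs": 9,
    "Suicide": 31,
}

total = sum(deaths_thousands.values())

# Share of each cause as a percentage of all deaths shown on the chart.
shares = {cause: count / total * 100 for cause, count in deaths_thousands.items()}

for cause, share in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(f"{cause:<14}{share:5.1f}%")
```

Smoking takes up well over half of the chart however it is drawn, so the question has to shift from the style of the graph to how the numbers were counted.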
Ways to fix misleading graphs - II
- How the CDC collected its data is very doubtful.
The graph does not provide any information about how many
deaths were actually caused by smoking. If smoking is not
the primary cause of a death, then it should not be recorded
as the cause of that death.
- The CDC may say that 418,000 people who died
were smokers, but it cannot say that they died
because of smoking.
- The graph presented on the next slide is a more
accurate graph of the causes of annual deaths in
the United States:
Revised Graph – Percentage of
Smokers in Each Cause of Annual
Death in the United States
[Bar graph: Percentage of Smokers in Each Cause of Death. Y-axis: Percent of Smokers, 0% to 120%. X-axis: AIDS, Alcohol, Motor Vehicle, Fires, Homicide, Illicit Drugs, Suicide, Cancer, Heart Disease]
Analysis of the Revised Graph
[Bar graph: Percentage of Smokers in Each Cause of Death. Y-axis: Percent of Smokers, 0% to 120%. X-axis: AIDS, Motor Vehicle, Homicide, Suicide, Heart Disease]
From the revised graph, we can tell that
a certain percentage of the people who
died from each cause were smokers. This
graph does not imply that smoking is the
leading cause of death in the United
States. It does, however, imply that
smoking contributes to deaths in the
United States.
For instance, we can assume that smoking is closely related to diseases
such as heart disease and cancer, because chemicals in a cigarette,
such as tar, may block blood vessels and ultimately cause heart
disease. We can also assume that smoking is closely related to deaths
caused by drug use.
We cannot draw a “fact” from most statistics. It is important to conduct the
survey and construct the presentation in the most accurate,
reliable, and non-misleading way possible, and the conclusions drawn from
the presentation must not be general, but highly specific.
Price Per Barrel of Light Crude Oil
Leaving Saudi Arabia on Jan. 1
The pictograph on the left shows the
increase in the price of light crude oil
leaving Saudi Arabia. The sizes of the
barrels are not in proportion to the
actual prices. The differences in
barrel size exaggerate the
increase or decrease in
the price of a barrel of crude oil. It is,
moreover, hard for readers to visually
compare the prices in each year. Therefore,
this pictograph is potentially misleading.
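One common way a pictograph exaggerates is by scaling the picture in both height and width. The sketch below assumes that is what happened here (the source does not state how the barrels were drawn) and uses the 1973 and 1979 prices from the table on this slide; `linear_ratio` and `area_ratio` are illustrative names.

```python
price_1973 = 2.41
price_1979 = 13.34

# True ratio between the two prices.
linear_ratio = price_1979 / price_1973

# If the barrel's height AND width are both scaled by the price ratio
# (an assumption about the drawing), the printed area grows by the square.
area_ratio = linear_ratio ** 2

print(f"Price grew {linear_ratio:.2f}x")
print(f"A barrel scaled in both directions looks {area_ratio:.1f}x bigger")
```

Under that assumption, a price that rose about five and a half times is shown by a picture roughly thirty times larger in area, which is why bars of equal width are the safer choice.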
Year   Price    Increase
1973   $2.41
1974   $10.95   354.36%
1975   $10.46   -4.47%
1976   $11.51   10.04%
1977   $12.09   5.04%
1978   $12.70   5.05%
1979   $13.34   5.04%
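The Increase column is the year-over-year percentage change, (new - old) / old x 100. A short sketch that recomputes it from the prices; the row between 1974 and 1976 is taken to be 1975, which is an assumption based on the year axis of the revised graphs.

```python
# (year, price per barrel in dollars); the third row is assumed to be 1975.
prices = [
    (1973, 2.41),
    (1974, 10.95),
    (1975, 10.46),
    (1976, 11.51),
    (1977, 12.09),
    (1978, 12.70),
    (1979, 13.34),
]


def percent_change(old, new):
    """Percentage change from old to new."""
    return (new - old) / old * 100


# Pair each year with the one before it and compute the change.
changes = {
    year: round(percent_change(previous_price, price), 2)
    for (_, previous_price), (year, price) in zip(prices, prices[1:])
}

for year, change in changes.items():
    print(f"{year}: {change:+.2f}%")
```

The recomputed values match the table, and they make the pattern plain: one very large jump in 1974, followed by changes of about 5% to 10% a year.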
Revised Graph
Price Per Barrel of Light Crude Oil Leaving Saudi Arabia on
Jan. 1
Instead of using barrels with different sizes to describe the
increase in prices, a properly constructed bar graph would
present the information more accurately.
[Bar graph. Y-axis: Price Per Barrel, $0.00 to $16.00. X-axis: Years, 1973 to 1979]
Revised Graph - II
Price Per Barrel of Light Crude Oil Leaving Saudi
Arabia on Jan. 1
[Line graph. Y-axis: Prices Per Barrel, $0.00 to $16.00. X-axis: Years, 1973 to 1979]
- Another adequate way of fixing the graph is a line graph, which
effectively shows the gradual increase in oil prices.
Chevy Advertisement
This misleading graph serves a purpose:
to suggest that Chevy is the most
preferred car, and thus possibly to
persuade many others to purchase a Chevy.
However, if we look at the graph closely, we can
see that the y-axis does not start at zero.
A viewer may take this
misleading graph at face value and
come away with false and inaccurate information.
In order to fix this misleading graph, we
would need precise and
accurate information with which we could
build a properly designed graph. The y-axis
of the graph must also begin at 0 in
order to display an accurate comparison.
Without sufficient information, we can
only be aware that graphs such as this are
misleading because they obscure the
origin on the y-axis.
What makes some statistical
information accurate and reliable?
Statistics is a set of methods used to collect and analyze data.
Because it helps people make good decisions about
uncertain situations, many people tend to believe any statistic that is
presented to them by a company. However, as you have seen in our
presentation, statistics are very easy to manipulate; without an adequate
understanding and analysis of the statistical information, it is easy for us to
take misleading statistics seriously. Accurate and reliable statistics come
from a proper procedure of defining the problem, collecting the data,
analyzing the data, and reporting the results.
These 4 steps must be carried out rationally and as accurately as possible
in order to prevent the statistics from becoming misleading.
We will explain adequate ways to conduct the 4 steps, and
ultimately to produce accurate and reliable statistics.
Defining the Problem
Every word in a statistical problem must be defined
specifically and accurately. For example,
if the problem were “counting the number of
inhabitants of Kerrisdale, Vancouver, on a specific
date,” we would have to define “inhabitants” to know
whom to count in the survey. Also, Kerrisdale must
be defined specifically in order to decide where to
stop the survey. Factors such as newborn babies in
the hospital must be taken into consideration. If one
of these pieces of information is not clearly defined, it
would be extremely difficult to begin gathering data.
Collecting the Data
For each kind of problem, different information is
needed, and so is a different method of collecting the data.
One of the most important parts of establishing a
statistic is to design an effective way of collecting
data. We collect data from a population or from a
sample. When a sample is selected from the
population for the survey, the selected sample
must be able to provide exactly the information required
for the purpose of the survey. The most exacting
and informative form of data collection for
comparisons is the randomized controlled experiment.
The subjects are divided into groups, and the group
each subject joins is selected randomly.
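Random assignment of this kind is simple to carry out in practice. The following is a minimal sketch using Python's standard `random` module; the function name `assign_groups`, the subject IDs, and the seed are all illustrative choices, not part of the original project.

```python
import random


def assign_groups(subjects, n_groups, seed=None):
    """Shuffle the subjects, then deal them into n_groups like cards."""
    rng = random.Random(seed)  # a seed makes the assignment repeatable
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    # Every n-th subject of the shuffled list goes to the same group.
    return [shuffled[i::n_groups] for i in range(n_groups)]


# Hypothetical study with 20 subjects numbered 1 to 20.
groups = assign_groups(range(1, 21), n_groups=2, seed=0)

for number, members in enumerate(groups, start=1):
    print(f"Group {number}: {sorted(members)}")
```

Because chance alone decides who lands in which group, differences between the groups at the end of the experiment are less likely to come from how the groups were formed.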
Analyzing the Data
~ Exploratory Methods ~
These methods often involve calculating averages and percentages and displaying
the information on a graph. Although exploratory methods may provide many pieces
of information, they may not answer specific questions or make definite statements
about a problem.
~ Confirmatory Methods ~
These methods are used to draw conclusions from the survey and the statistical information
by answering specific questions. For example, using a confirmatory method, a
statistician can say “The price of oil leaving Saudi Arabia has been increasing, and will
continue to increase.”
Neither of these methods should be overlooked. Both should be used
extensively to analyze the results of a statistical activity, so that a
variety of specific conclusions can be reached with credibility and accuracy.
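As a small illustration of the exploratory step, the sketch below summarizes the oil prices from the earlier slide with an average and a median, using Python's standard `statistics` module. It describes the data but, as noted above, does not by itself answer a specific question.

```python
import statistics

# Price per barrel on Jan. 1 for 1973 through 1979, from the oil price table.
prices = [2.41, 10.95, 10.46, 11.51, 12.09, 12.70, 13.34]

mean_price = statistics.mean(prices)
median_price = statistics.median(prices)

print(f"Mean price:   ${mean_price:.2f}")
print(f"Median price: ${median_price:.2f}")
```

The mean is pulled down by the single low 1973 price while the median is not, which is the kind of pattern exploratory work is meant to surface before any conclusion is drawn.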
Reporting the Results
Inference is used to draw conclusions from a statistical
activity; even from a small collection of observations or
experimental results, careful and rational inference can
create an accurate and reliable generalization that can
be used for the benefit of society.
There are many forms of presentation, including
bar graphs, pie graphs, tables, and sets of percentages.
However, when drawing conclusions, one must take into
consideration the fact that the survey was carried out on a
specifically selected sample, not the entire
population. Therefore, using probability, the conclusions
must reflect and include the uncertainty that may have been
excluded or misrepresented in the statistics.
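One standard way to report this uncertainty is a margin of error for a sample percentage. The sketch below uses the usual normal approximation for a 95% interval and applies it to the 34% hot dog share from the favorite foods survey. The sample size of 100 is hypothetical (the survey's real size is not given), and the formula assumes a simple random sample.

```python
import math


def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a sample proportion p
    from a simple random sample of size n (normal approximation)."""
    return z * math.sqrt(p * (1 - p) / n)


p_hot_dogs = 0.34  # share reported for hot dogs
n = 100            # hypothetical sample size, not from the source

moe = margin_of_error(p_hot_dogs, n)
low, high = p_hot_dogs - moe, p_hot_dogs + moe

print(f"Hot dogs: 34% plus or minus {moe * 100:.1f} points")
print(f"Interval: {low * 100:.1f}% to {high * 100:.1f}%")
```

With a sample of that size the interval comfortably contains 33%, so a one-point lead for hot dogs would not justify calling them the favorite food.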
How We Would Conduct
Statistical Activities
Everything that has been presented to
you in the previous slides must be
considered when carrying out a
statistical activity. When every step is
carefully done, the statistics will be
accurate and reliable.
Thank you for viewing our presentation!