Powerpoint Slides - Wharton Statistics Department

advertisement
Seeing something once
is worth hearing about it
a hundred times.
The Solution:
The double y-axis graph
The most powerful tool for misleading
graphics ever devised.
I gather, young man, that you wish to be a
Member of Parliament. The first lesson that you
must learn is, when I call for statistics about
the rate of infant mortality, what I want is
proof that fewer babies died when I was Prime
Minister than when anyone else was Prime
Minister. That is a political statistic.
Winston Churchill (1874-1965)
The Ithaca Times - October 20, 2000
How to display data badly
Howard Wainer
National Board of Medical Examiners
Three wonderful books by my favorite author:
Picturing the Uncertain World: How to Understand, Communicate and
Control Uncertainty through Graphical Display . Princeton, NJ:
Princeton University Press, 2009.
Graphic Discovery: A Trout in the Milk and Other Visual Adventures.
Princeton, N.J.: Princeton University Press, 2005.
Visual Revelations: Graphical Tales of Fate and Deception from
Napoleon Bonaparte to Ross Perot. (2nd edition) Hillsdale, N. J.:
Lawrence Erlbaum Associates, 2000.
“Closing averages on the human scene were mixed today.
Brotherly love was down two points,
while enlightened self-interest gained a half.
Vanity showed no movement, and
guarded optimism slipped a point in sluggish trading.
Overall, the status quo remained unchanged.”
Of good graphs it may be said what Mark Van
Doren observed about brilliant
conversationalists:
In their presence others speak well.
A good graph is quiet and lets the data tell their
story clearly and completely.
Along each line are numbers which are turning points in the race
between men and fire,
And if the lines are viewed as a race the numbers mark off legs of
the race,
If they also have religious significance they are stations of the
cross,
And if they have literary significance they mark off acts of
drama,…
But the acts are short, because modern wildfire allows no time for
soliloquies.
Norman Maclean (1992) Young Men and Fire. Chicago: University of Chicago Press.
The four purposes of graphs
1. Exploration - The data contain a message and we would like to
find out what it is.
2. Communication - We know something and we wold like to tell
others.
3. Calculation - Graphs can serve as visual algorithms (nomogaphs)
that enable us to determine at-a-glance what might
otherwise be tedious to calculate.
4. Decoration - Graphs are pretty and can be used to enliven what
might otherwise be a dull presentation.
“Some, to beautify their Halls, Parlors,
Chambers, Galeries, Studies, or Libraries….
Liketh, loveth, getteth and useth, Maps,
Charts, and Geographicall Globes.”
(John Dee, 1570)
The Aim of
Good Data Graphics
is to
Display Data
Accurately and Clearly.
I.
Include as few data as possible
(minimize the data density)
You can boost data density through the use of small multiples
II. Hide what data you do show
(minimize the data-ink ratio)
Hide the data in the grid
Another published version of the same data
improved the data-ink ratio up to about 0.7.
Relation of Actual Rates of Registration to Predicted Rates (104 cities 1960)
Hide the data in the scale
III. Ignore the Visual Metaphor
Show the Data Accurately
The Old
Goosing up the effect By
Squaring the Eye-Ball
Trick
IV. Only Order Matters
(maximize the lie factor)
V. Graph Data out of Context
Show the Data Clearly
VI. Change Scales in Mid-Axis
VII. Emphasize the Trivial
(Ignore the Important)
VIII. Jiggle the Baseline
IX. Alabama First
Even the best of graphics can sometimes use a little explanation
X. Label: Illegibly,
Incompletely,
Incorrectly,
Ambiguously
“Everyone spoke of an information overload,
but what there was, in fact,
was a non-information overload.”
(Wurman, 1976)
XI. More is Murkier:
More Decimal Places
More Dimensions
Economists state their GNP growth
projections to the nearest tenth of a
percentage point to prove they have a
sense of humor.
Edgar R. Fiedler
An example from
State Court Caseload Statistics: 1976
Ln(DIAC) = -.10729131 + 1.00716993 x Ln(FIAC)
Where DIAC = Number of Dispositions
FIAC = Number of Filings
with eight digits shown it implies a standard error of
.000000005
or
n ~ 4 x 1016
The actual n (the number of states)
allows one digit of accuracy at most.
Or
DIAC = .9 FIAC
or
“There are about 90% as many
dispositions as filings.”
My colleague Al Biderman, who knows
more about state courts than I do, said it
needs yet one more digit of rounding, or
There are about as many
dispositions as filings.
The worst graph ever to
find its way into print.
American Education
A Graphical Duck
from the Bureau of the Census
XII. If it has been done well in the
past, think of a new way to do it.
The World’s Champion Graph
Designed and drawn by Charles Joseph Minard
November 20, 1869
Download