Lecture Slides

Data Visualization
Lies, Damn Lies, and Bad Graphs
Visual Medium
• Reports
– Paper (static, with time)
– Web (dynamic and interactive)
• Presentations (static and dynamic)
“It’s not just about producing graphics for
publication,” Aldhous explains. “It’s about playing
around and making a bunch of graphics that help
you explore your data. This kind of graphical
analysis is a really useful way to help you
understand what you’re dealing with, because if
you can’t see it, you can’t really understand it. But
when you start graphing it out, you can really see
what you’ve got.”
Four sets of data with the same correlation of 0.816
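The slide does not name the four data sets. Assuming it refers to Anscombe's quartet (the classic four sets with near-identical summary statistics), the shared correlation can be checked with a small sketch. The `pearson` helper and the variable names are illustrative, not part of the original slides.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Anscombe's quartet (assumed to be the data behind the slide).
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (x4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

# All four correlations come out at roughly 0.816.
correlations = {name: pearson(xs, ys) for name, (xs, ys) in quartet.items()}
for name, r in correlations.items():
    print(f"Set {name}: r = {r:.4f}")
```

The numbers agree, yet scatter plots of the four sets look nothing alike, which is the point of graphing data before summarizing it.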
Figure: Percent Blue relative to Red? (two versions, 1 and 2)
Figure: three bar charts of Percent Water for body, brain, and blood, labeled Bad, Better, and Even Better*. The y-axis ticks run 0-80 and 0-85 in steps of 5, and 0-100 in steps of 25.
Figure: four versions of a chart titled National Spending to Deal with Drug Addiction, showing the responses Too Little, About Right, and Too Much. The later versions split the responses by Male and Female, and the last uses a 0%-100% scale.
#Introduction
#History of Plots
#The Explanatory Power of Graphics
#Basic Philosophy of Approach
#Graphical Integrity
#Data Densities
#Data Compression
#Multifunctioning Graphical Elements
#Maximize data-ink; minimize non-data ink
#Small Multiples
#Chartjunk
#Colors
#General Philosophy for Increasing Data Comprehension
#Techniques for Increasing Data Comprehension
#When NOT to Use Graphics
#Aesthetics
Chartjunk and Graphics Integrity
Types of chartjunk
• Chartjunk is non-data ink or redundant data-ink decoration
• Unintended Optical Art (Moiré vibration)
• The Grid
• The Duck: Self-promoting Graphics
Unintended Optical Art
• Relies mainly on moiré effects
– Distracting appearance of vibration and
movement
– The most common form of graphical clutter
Moiré Vibrations
The Grid
• Dark grid lines are chartjunk
• The grid should usually be muted or completely suppressed
The Grid (cont’d)
• Marey’s train schedule
The Duck
• Self-promoting graphics: when the data measures become design elements
Duck Examples
"In our excitement to produce what we could only make
before with great effort, many of us have lost sight of the
real purpose of quantitative displays — to provide the
reader with important, meaningful, and useful insight."
— Stephen Few
Graphical Integrity
• Graphical excellence begins with telling the
truth about the data
Some Examples of Lies
Two Principles
• The representation of numbers, as physically measured on the surface of the graphic, should be directly proportional to the numerical quantities represented
• Clear, detailed, and thorough labeling should be used to defeat distortion
Violating rule 1
18 miles/gallon: 0.6 inches; 27.5 miles/gallon: 5.3 inches
Lie Factor
• Violations of Rule 1 can be measured by the Lie Factor
Lie Factor = (size of effect shown in graphic) / (size of effect in data)
• A Lie Factor equal to one is ideal
• The previous slide has a Lie Factor of 14.8
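The 14.8 figure can be reproduced from the measurements on the fuel economy slide. This is a minimal sketch, assuming "size of effect" means relative change from the first value to the second, as in Tufte's definition. The function names are illustrative.

```python
def relative_change(first, second):
    """Change from first to second, as a fraction of first."""
    return (second - first) / first

def lie_factor(shown_first, shown_second, data_first, data_second):
    """Ratio of the effect shown in the graphic to the effect in the data."""
    return relative_change(shown_first, shown_second) / relative_change(data_first, data_second)

# Values from the fuel economy slide.
mpg_first, mpg_second = 18.0, 27.5        # data: miles per gallon
inches_first, inches_second = 0.6, 5.3    # graphic: line lengths in inches

factor = lie_factor(inches_first, inches_second, mpg_first, mpg_second)
print(f"Lie factor: {factor:.1f}")  # prints "Lie factor: 14.8"
```

The data grow by about 53%, while the lines drawn on the page grow by about 783%, so the graphic overstates the change roughly fifteenfold.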
Design and Data Variation
• Show data variation, not design variation
• 1973-1978: one vertical inch equals $8.00; in 1979, one vertical inch equals $3-4
• 1973-1978: one horizontal inch equals 3.7 years; in 1979, one horizontal inch equals 0.57 year
Example
• Lie factor: 9.5
• The oil prices are in inflated dollars, so they need to be adjusted for inflation.
Government Spending
• Tricks to exaggerate the growth of spending
Real Government Spending

• Tricks to exaggerate the growth of spending
Visual Area and Numerical Measure
• One way to trick the viewer with design variation is to use areas to show 1D data
Lie factor: 2.8
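A minimal sketch of why area encodings distort one-dimensional data. It uses hypothetical numbers, not the data behind the slide's 2.8, and assumes a pictogram whose width and height are both scaled by the data value.

```python
from math import sqrt

def lie_factor(shown_first, shown_second, data_first, data_second):
    """Ratio of relative change shown in the graphic to relative change in the data."""
    shown_effect = (shown_second - shown_first) / shown_first
    data_effect = (data_second - data_first) / data_first
    return shown_effect / data_effect

# Hypothetical 1D data: the quantity doubles.
data_first, data_second = 100.0, 200.0
scale = data_second / data_first

# Distorting design: width AND height both scale with the value,
# so the area the viewer sees grows with the square of the scale.
area_first = 1.0
area_distorted = area_first * scale ** 2
lf_distorted = lie_factor(area_first, area_distorted, data_first, data_second)

# Honest design: scale each side by the square root,
# so the area itself is proportional to the value.
side = sqrt(scale)
area_honest = area_first * side * side
lf_honest = lie_factor(area_first, area_honest, data_first, data_second)

print(f"Both sides scaled: lie factor {lf_distorted:.1f}")
print(f"Area scaled:       lie factor {lf_honest:.1f}")
```

With these numbers the doubled value is drawn with four times the area, giving a lie factor of 3. Scaling the area directly brings it back to 1.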
Context is Essential
• Graphics must not quote data out of context
On Using Color…
Rule #3: Use color only when needed to serve a
particular communication goal.
Figure: bar chart of 2008 Debt to GDP Ratio for OECD countries, listed from Japan to Australia; value axis 0 to 200.
Picking Color Schemes
http://colorbrewer2.org/
http://kuler.adobe.com
Stop Visually Assaulting Me
• http://fosslien.com/rules/
The principles
• The representation of numbers, as physically measured on the surface of the graphic, should be directly proportional to the numerical quantities represented
• Use clear and detailed labeling
• Show data variation, not design variation
• The number of information-carrying dimensions depicted should not exceed the number of dimensions in the data (show 2 dimensions of data in 2D, not 3D)
• Graphics should not quote data out of context
Why do graphics lie?
• Lack of quantitative skills of professional
artists
• The doctrine that statistical data are boring
• The doctrine that graphics are only for unsophisticated readers
Design is choice. The theory of the visual display of quantitative
information consists of principles that generate design options and
that guide choices among options. The principles should not be
applied rigidly or in a peevish spirit; they are not logically or
mathematically certain; and it is better to violate any principle than
to place graceless or inelegant marks on paper.
— Edward Tufte, The Visual Display of Quantitative Information
Word Cloud
Spine Plot / Matrix Chart
Bullet Graph
Data dense. Each bar communicates a piece of data.
Choropleth “Heat Map”
RED STATE
BLUE STATE
PURPLE STATE
Dynamic Charts
Leisure Time Spent
biking: 125
hiking: 40
reading: 30
singing: 25
dancing: 10
cleaning: 0
Avoid defaults in Excel
• Show the data
• Data dense
Maximize Data Ink Ratio
Minimize Non-Data Ink
Eliminate Chart Junk
Streamline Placement