MIS5101: Business Intelligence Principles of Data Visualization Sunil Wattal

What makes a good chart?
Wikipedia: Patriotic War of 1812
http://en.wikipedia.org/wiki/File:Patriotic_War_of_1812_ENG_map1.svg
What makes a good chart?
Minard’s map of Napoleon’s campaign into Russia, 1869
Reprinted in Tufte (2009), p. 41
What is Data Visualization?
• A picture can say a thousand words (Chinese proverb)
• Data visualization is the use of tools to represent data in
the form of charts, maps, tag clouds, animations, or any
graphical means that make content easier to understand.
– Creating a picture in your mind
• Interpreting data in visual terms
– Raw data is too complex
– Not a substitute for statistical analysis or other quantitative
methods
– Exploit visual perception to amplify cognition
Why visualization is effective
• Short attention spans
• Information overload
• Easy to grasp format
• More retention
Video Games do this well…
What can you learn from this map?
http://www.popvssoda.com/countystats/total-county.html
What makes a good chart?
This is from an academic conference paper.
What are the problems with this chart?
Zhang et al. (2010), “A case study of micro-blogging in the enterprise: use, value, and related issues,” Proceedings of the 28th International
Conference on Human Factors in Computing Systems.
Some basic principles (adapted from Tufte 2009)
1. The chart should tell a story
2. The chart should have graphical integrity
3. The chart should minimize graphical complexity
Tufte’s fundamental principle:
Above all else show the data
Principle 1: The chart should tell a story
Graphics should be clear on their own
The depictions should enable meaningful comparison
The chart should yield insight beyond the text
“If the statistics are boring, then you’ve got the wrong
numbers.” (Tufte 2009)
Do these tell a story?
http://www.evl.uic.edu/aej/491/week03.html
http://flowingdata.com/2009/11/26/fox-news-makes-the-best-pie-chart-ever/
Daylight Saving Time Explained
http://visual.ly/daylight-saving-time-explained
http://flowingdata.com/2011/01/19/states-with-the-most-and-least-firearms-murders/
http://economix.blogs.nytimes.com/2009/05/05/obesity-and-the-fastness-of-food/
Telling a story
The electoral result of the 2012 election
Under pre-1920 rules: men only
Under 1850 rules: white men only
http://www.buzzfeed.com/buzzfeedpolitics/what-the-2012-election-would-have-looked-like-with
Principle 2: The chart should have graphical integrity
• Basically, it shouldn’t “lie” (mislead the reader)
• Tufte’s “Lie Factor”:
– $\text{Lie Factor} = \dfrac{\text{size of effect shown in graphic}}{\text{size of effect in data}}$
– Should be ~1
– < 1 = understated effect
– > 1 = exaggerated effect
Examples of the “lie factor”
Reprinted from Tufte (2009), p. 57 & p. 62
– $LF = \dfrac{5.3/0.6}{27.5/18} = \dfrac{8.83}{1.53} = 5.77$
– $LF = \dfrac{4280\%\ \text{(change in volume)}}{454\%\ \text{(change in price)}} = 9.4$
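The two lie factors on this slide can be reproduced in a few lines of Python. This is a minimal sketch: the function names `lie_factor` and `relative_change` are assumptions made for illustration, not part of any library. The slide divides the ratio of the sizes in the graphic by the ratio of the data values. Measuring each effect as a percentage change, as Tufte's text does, would give a larger figure (about 14.8) for the same numbers.

```python
def relative_change(first, second):
    """Change from first to second, expressed as a fraction of first."""
    if first == 0:
        raise ValueError("first value must be non-zero")
    return (second - first) / abs(first)


def lie_factor(effect_in_graphic, effect_in_data):
    """Size of effect shown in graphic divided by size of effect in data."""
    if effect_in_data == 0:
        raise ValueError("effect in data must be non-zero")
    return effect_in_graphic / effect_in_data


# First example, computed the way the slide does (a ratio of ratios).
lf_ratio = lie_factor(5.3 / 0.6, 27.5 / 18)

# Same numbers, with each effect measured as a relative change instead.
lf_change = lie_factor(relative_change(0.6, 5.3), relative_change(18, 27.5))

# Second example: both effects are already given as percentages.
lf_volume = lie_factor(4280, 454)

print(round(lf_ratio, 2), round(lf_change, 2), round(lf_volume, 2))
```

The first value prints as 5.78 rather than the slide's 5.77 because the slide rounds the two intermediate ratios (8.83 and 1.53) before dividing.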
How is this deceptive?
http://www.guardian.co.uk/environment/interactive/2009/mar/09/food-carbon-emissions
How about this?
The original graphic from Real Clear
Politics, 2008.
The adjusted graphic.
(Look at the y-axis)
http://20bits.com/articles/politics-and-tuftes-lie-factor/
Or this?
The original graphic from Fox
Business, 2012.
The adjusted graphic.
http://mediamatters.org/research/2012/10/01/a-history-of-dishonest-fox-charts/190225
Other tips to avoid “lying”
• Adjust for inflation
– [Chart: Hypothetical Industries, Inc.; Revenue vs. Adjusted Revenue by Year, 2003 to 2010]
• Make sure the context is presented
– [Charts: Hypothetical City Crime, thefts per 100000 citizens; 2009 to 2010 only vs. 2003 to 2010]
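Adjusting for inflation amounts to restating each year's revenue in the prices of a single base year. The sketch below assumes a price index is available for every year. The function name `adjust_for_inflation` and every figure in it are hypothetical and are not the values behind the chart on this slide.

```python
def adjust_for_inflation(nominal, price_index, base_year):
    """Restate nominal amounts in base-year prices.

    nominal     -- {year: amount in that year's prices}
    price_index -- {year: price index value}, same years as nominal
    base_year   -- the year whose prices everything is converted to
    """
    base = price_index[base_year]
    return {year: amount * base / price_index[year]
            for year, amount in nominal.items()}


# Hypothetical figures, for illustration only.
revenue = {2003: 100.0, 2010: 130.0}
index = {2003: 100.0, 2010: 125.0}

adjusted = adjust_for_inflation(revenue, index, base_year=2003)

nominal_growth = revenue[2010] / revenue[2003] - 1    # growth before adjusting
real_growth = adjusted[2010] / adjusted[2003] - 1     # growth after adjusting
print(adjusted, nominal_growth, real_growth)
```

With these made-up numbers, revenue that appears to grow 30% grows only 4% once prices are held constant, which is the kind of gap an unadjusted chart hides.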
Principle 3: The chart should minimize graphical complexity
Generally, the simpler the better…
Key concepts
• Sometimes a table is better
• Data-ink
• Chartjunk
When a table is better than a chart
For a few data points, a table can do just as well…
[Bar chart: Total Sales by Salesperson, y-axis $0.00 to $250,000.00]
Total Sales by Salesperson
Salesperson    Total Sales
Peacock        $225,763.68
Leverling      $201,196.27
Davolio        $182,500.09
Fuller         $162,503.78
Callahan       $123,032.67
King           $116,962.99
Dodsworth       $75,048.04
Suyama          $72,527.63
Buchanan        $68,792.25
The table carries more information in less space
and is more precise.
The Ultimate Table: The Box Score
• Large amount of information in a very small space
• So why does this work?
– Depends on the reader’s knowledge of the data
The Business Box Score?
• Applying the same concept to our salesforce example.
• How does this help? How could it hurt?
Key:
TS – total sales
WD – worst day
BD – best day
NC – number of customers
DOR – days on the road
Sales Performance – March 2011
Salesperson    TS    WD    BD    NC    DOR
Peacock        225   3     40    20    28
Leverling      201   2     45    18    27
Davolio        182   5     38    22    28
Fuller         162   2     22    16    20
Callahan       123   1     15    14    15
King           116   0.5   20    12    18
Dodsworth      75    0.3   12    10    20
Suyama         72    0     8     10    8
Buchanan       68    0     8     8     12
Data Ink
• The amount of “ink” devoted to data in a chart
• Tufte’s Data-Ink ratio:
– $\text{Data-ink ratio} = \dfrac{\text{data-ink}}{\text{total ink used in graphic}}$
– Should be ~1
– < 1 = more non-data related ink in graphic
– = 1 implies all ink devoted to data
Tufte’s principle:
Erase non-data ink whenever possible
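The ratio can be made concrete by tallying how much ink each element of a chart uses. The sketch below is only an illustration: the function name `data_ink_ratio`, the element names, and the ink amounts are all hypothetical, and real ink is rarely measured this precisely.

```python
def data_ink_ratio(ink_by_element, data_elements):
    """Share of a chart's total ink that is spent on the data itself.

    ink_by_element -- {element name: amount of ink, in any consistent unit}
    data_elements  -- names of the elements that count as data-ink
    """
    total_ink = sum(ink_by_element.values())
    if total_ink <= 0:
        raise ValueError("chart must use some ink")
    data_ink = sum(amount for name, amount in ink_by_element.items()
                   if name in data_elements)
    return data_ink / total_ink


# Hypothetical ink amounts for a busy line chart.
busy_chart = {
    "data line": 20,
    "axes and labels": 10,
    "gridlines": 40,
    "background shading": 30,
}

# The same chart after erasing the gridlines and the shading.
clean_chart = {name: ink for name, ink in busy_chart.items()
               if name not in ("gridlines", "background shading")}

busy_ratio = data_ink_ratio(busy_chart, {"data line"})
clean_ratio = data_ink_ratio(clean_chart, {"data line"})
print(busy_ratio, round(clean_ratio, 2))
```

Erasing the two decorative elements raises the ratio from 0.2 to about 0.67 without touching the data line.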
Being conscious of data ink
[Charts: three versions of “Hypothetical City Crime” (thefts per 100000 citizens, 2003 to 2010), running from lower data-ink ratio (worse) to higher data-ink ratio (better)]
What makes a good chart?
[Two charts: “2011 Total Sales”, Sum of Extended Price by Order Date, y-axis 0 to 160000]
Sometimes it’s really a matter of preference.
These both minimize non-data ink.
Why isn’t a table better here?
3-D Charts
[3-D chart: Total Sales by Salesperson, y-axis $0.00 to $250,000.00]
Evaluate this from a data-ink perspective.
How does it affect the clarity of the chart?
Chartjunk: Data Ink “gone wild”
Unnecessary visual clutter that doesn’t
provide additional insight
Distraction from the story the chart is
supposed to convey
When the data-ink ratio is low, chartjunk
is likely to be high
Example: Moiré effects (Tufte 2009)
[Charts: Total Sales by Salesperson; Hypothetical City Crime, thefts per 100000 citizens, 2003 to 2010]
• Creates illusion of movement
• Stands out, in a bad way
Example: The Grid
[Charts: versions of Hypothetical City Crime, thefts per 100000 citizens, 2003 to 2010, drawn with grids]
Why are these examples of chartjunk?
What could you do to remedy it?
http://www.economist.com/node/21537909
http://www.economist.com/node/21537922
Data Ink Working Against Us
Data Ink Working For Us
Evaluate this chart in terms of Data Ink.
Imagine this as a bar chart. As a table!!
Tools
• Excel Tools – charts, histograms, pie charts
– One dimensional, and not useful to represent complex data
• Heat Maps
• Infographics
• Social Media Visualization
Other tools
• Google Trends
• Fusion Tables
• Tibco Spotfire
• Twitter Analytics for Excel 2013
• Visualizing World Bank Data
– http://data.worldbank.org/products/data-visualization-tools
Heat Map
• Graphical representation of data in a map or a
matrix
• Using colors instead of numbers
• Different colors denote different intensities or
categories
• Tools
– Open heat map (http://www.openheatmap.com/)
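The idea of using colors instead of numbers can be sketched without any plotting library by mapping each value to an intensity level and printing a shade character for it. This is a minimal illustration under stated assumptions: the names `to_levels`, `render` and `SHADES`, the linear scale, and the sample matrix are all hypothetical, and a real tool would use actual colors.

```python
SHADES = " .:*#"  # lightest to darkest; stands in for a color scale


def to_levels(matrix, n_levels):
    """Map every number to an intensity level from 0 to n_levels - 1."""
    values = [v for row in matrix for v in row]
    low, high = min(values), max(values)
    span = high - low
    levels = []
    for row in matrix:
        if span == 0:
            # All values are equal, so everything gets the lowest level.
            levels.append([0 for _ in row])
        else:
            levels.append([min(int((v - low) / span * n_levels), n_levels - 1)
                           for v in row])
    return levels


def render(matrix):
    """Return the matrix as text, one shade character per cell."""
    levels = to_levels(matrix, len(SHADES))
    return "\n".join("".join(SHADES[level] for level in row) for row in levels)


# Hypothetical data: rows and columns could be regions and months.
sample = [[0, 50],
          [75, 100]]
print(to_levels(sample, len(SHADES)))
print(render(sample))
```

For categories rather than intensities, the same structure works with a lookup from category name to color in place of the linear scale.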
Infographic
• Present information using visual tools
• According to visual.ly
– visualizations that present complex information quickly and
clearly
– visualizations that integrate words and graphics to reveal
information, patterns or trends
– visualizations that are easier to understand than words alone
– visualizations that are beautiful and engaging
• Tools
– Infogram
– Visual.ly
Check out more at www.coolinfographics.com
http://visual.ly/what-infographic-2
The Infographic
Three excerpts from an infographic comparing Hurricane Sandy to Katrina
http://www.huffingtonpost.com/2012/11/04/hurricane-sandy-vs-katrina-infographic_n_2072432.html
http://www.biztechmagazine.com/article/2012/08/how-make-sure-your-byod-plan-all-good-infographic
Another example
Social Media Visualization
• Big Data Analytics is only one aspect of social media
– Storytelling through visualizations is the key
– Easy to get lost in pure numbers
• Online conversations
– Sentiment analysis
– Reviews and ratings
– Feedback
• Online networks
– Influencers
Twitter Analytics
Social Networks
• Connections between people
– How many?
– Who?
– One way or two way?
– Intensity of connections?
• Applications
– LinkedIn InMaps (inmaps.linkedinlabs.com)
– Klout Score
– How did Google allot the first set of Google Glass devices?
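The four questions about connections (how many, who, one way or two way, how intense) map directly onto simple counts over a list of links. The sketch below assumes each link is recorded as a (source, target, weight) triple. The function name `summarize_network`, the people, and the weights are hypothetical, and this is not how LinkedIn or Klout compute their measures.

```python
def summarize_network(edges):
    """Answer the four connection questions for every person.

    edges -- list of (source, target, weight) directed links, where weight
             is any measure of intensity, such as a message count
    """
    weights = {}
    for source, target, weight in edges:
        # Repeated links between the same pair add up.
        weights[(source, target)] = weights.get((source, target), 0) + weight

    people = sorted({person for pair in weights for person in pair})
    summary = {}
    for person in people:
        outgoing = {t for (s, t) in weights if s == person}
        incoming = {s for (s, t) in weights if t == person}
        summary[person] = {
            "how_many": len(outgoing | incoming),
            "who": sorted(outgoing | incoming),
            "two_way": sorted(outgoing & incoming),
            "intensity": sum(w for pair, w in weights.items()
                             if person in pair),
        }
    return summary


# Hypothetical links, for illustration only.
links = [("Ann", "Bob", 3), ("Bob", "Ann", 1),
         ("Ann", "Cat", 2), ("Dan", "Ann", 5)]

network = summarize_network(links)
most_connected = max(network, key=lambda p: network[p]["how_many"])
print(most_connected, network[most_connected])
```

Ranking people by `how_many` or `intensity` is one simple way to look for influencers. A visualization would typically draw these same counts as node size and line thickness.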
Links
• TED Talk on Data Visualization
• http://videosubtitle.tedcdn.com/talk/podcast/2010G/None/DavidMcCandless_2010G-480p-en.mp4