MIS5401: Data Analytics for Management What is Analytics? Principles of Data Visualization

Sunil Wattal
Focus of this Course
Analyze
Communicate
…using data
Manage
Discuss (5 minutes)
• From reading Matchmaking with Math, what
is a good definition of analytics?
• From reading Unwin, what are some
guidelines for good data visualizations?
• From reading Davenport, how do analytics
and visualization facilitate storytelling?
Analytics is
• Extracting information from data
• Discovering meaningful patterns
What is Big Data?
What is the role of Hadoop and MapReduce?
By the way, you don’t need Big Data to do analytics!
The Three “V”s of Big Data
• Velocity
• Volume
• Variety
• Manufacturers are monitoring minute vibration data from their equipment,
which changes slightly as it wears down, to predict the optimal time to replace
or maintain it. Replacing it too soon wastes money; replacing it too late triggers
an expensive work stoppage.
• Manufacturers are also monitoring social networks, but with a different goal
than marketers: They are using it to detect aftermarket support issues before a
warranty failure becomes publicly detrimental.
• Financial Services organizations are using data mined from customer
interactions to slice and dice their users into finely tuned segments. This
enables these financial institutions to create increasingly relevant and
sophisticated offers.
• Advertising and marketing agencies are tracking social media to understand
responsiveness to campaigns, promotions, and other advertising mediums.
• Insurance companies are using Big Data analysis to see which home insurance
applications can be immediately processed, and which ones need a validating
in-person visit from an agent.
• Hospitals are analyzing medical data and patient records to predict those
patients that are likely to seek readmission within a few months of discharge.
The hospital can then intervene in hopes of preventing another costly hospital
stay.
• Web-based businesses are developing information products that combine data
gathered from customers to offer more appealing recommendations and more
successful coupon programs.
• Sports teams are using data for tracking ticket sales and even for tracking team
strategies.
• Pause: what other examples can you think of?
Case Study: Matchmaking with Math
• What did they do?
• How did they do it?
• What do you think of the result?
• What do you think of the process?
Communicating using data
Minard’s map of Napoleon’s campaign into Russia, 1869
Reprinted in Tufte (2009), p. 41
What can you learn from this map?
http://www.popvssoda.com/countystats/total-county.html
Some basic principles (adapted from Tufte 2009)
1. The chart should tell a story
2. The chart should have graphical integrity
3. The chart should minimize graphical complexity
Tufte’s fundamental principle:
Above all else show the data
Principle 1: Graphics should tell a story
Clear
Enable comparison
Yield insight
Roumeliotis’ steps for communicating analysis
My understanding of the business problem
How will I measure the business impact?
What’s the available data?
The initial solution hypothesis
The solution
The business impact of the solution
But avoid communicating:
• Technical terminology
• Step-by-step methodology
• Complex statistics
All of these inform the data visualization
From: Davenport, T. (2013). Telling a Story with Data. Deloitte University Press.
What story does this tell?
Causes of death in the
Crimean war:
Blue – preventable diseases
Red – wounds
Black – other
http://en.wikipedia.org/wiki/Florence_Nightingale
What story does this tell?
http://fivethirtyeight.com/datalab/most-of-the-u-s-saw-little-recovery-in-2013/
What story do these tell?
http://fivethirtyeight.com/features/the-potential-bubble-the-federal-reservecares-most-about/
Finding the next “bubble”…
Principle 2: The chart should have graphical integrity
• Basically, it shouldn’t “lie” (mislead the reader)
• Tufte’s “Lie Factor”:
– $\text{Lie Factor} = \dfrac{\text{size of effect shown in graphic}}{\text{size of effect in data}}$
– Should be ~ 1
– < 1 = understated effect
– > 1 = exaggerated effect
“Lie factor”: Visualizing Area
[Figure: two shapes, A ($500) and B ($250), drawn at two different scalings]
• WIDTH of A is twice the width of B, but AREA of A is 4x the area of B!
Lie factor of 2: (4 ÷ 2) = 2
vs.
• WIDTH of A is 1.4x the width of B, so AREA of A is now 2x the area of B!
Lie factor of 1 (no lie!): (2 ÷ 2) = 1
Adapted from Krum, R. (2014). Cool infographics: Effective
Communication with Data Visualization.
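The area arithmetic on this slide can be checked with a short sketch. It follows the slide's own simplification of the lie factor (ratio shown in the graphic divided by ratio in the data) and assumes the shapes are scaled uniformly, so area grows with the square of width. The function and variable names are illustrative, not from Tufte or Krum.

```python
import math

def lie_factor(effect_in_graphic: float, effect_in_data: float) -> float:
    """Ratio of the effect shown in the graphic to the effect in the data."""
    return effect_in_graphic / effect_in_data

value_a, value_b = 500.0, 250.0
data_effect = value_a / value_b  # A is 2x B in the data

# Scaling WIDTH by the data ratio: a uniformly scaled shape's area grows
# with the square of its width, so the reader sees a 4x difference.
naive_width_ratio = data_effect
naive_area_ratio = naive_width_ratio ** 2

# Scaling AREA by the data ratio: width grows by sqrt(2), about 1.4x.
fair_width_ratio = math.sqrt(data_effect)
fair_area_ratio = fair_width_ratio ** 2

print("width-scaled lie factor:", lie_factor(naive_area_ratio, data_effect))
print("area-scaled lie factor: ", round(lie_factor(fair_area_ratio, data_effect), 6))
```

The second case shows why the slide's "1.4x width" works: 1.4 is roughly the square root of 2, so the area, which is what the eye compares, matches the data.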
“Lie factor”: Axes should include “0”
[Figure: two charts of “2012 Presidential Run: GOP Candidates” compared side by side (Huckabee 63%, Palin 70%, Romney 60%)]
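To see why an axis that does not start at 0 distorts a comparison, here is a rough sketch using the three percentages from the chart. The axis starting point of 55% is a hypothetical value chosen for illustration, not something taken from the slide, and the helper names are made up for this example.

```python
shares = {"Huckabee": 63.0, "Palin": 70.0, "Romney": 60.0}

def drawn_height(value: float, axis_start: float) -> float:
    """Height of a bar as actually drawn when the axis begins at axis_start."""
    return value - axis_start

def lie_factor(larger: float, smaller: float, axis_start: float) -> float:
    """Ratio of bar heights shown, divided by the ratio in the data."""
    shown = drawn_height(larger, axis_start) / drawn_height(smaller, axis_start)
    actual = larger / smaller
    return shown / actual

# Hypothetical truncated axis starting at 55% versus an axis starting at 0%.
truncated = lie_factor(shares["Palin"], shares["Romney"], axis_start=55.0)
zero_based = lie_factor(shares["Palin"], shares["Romney"], axis_start=0.0)

print(f"axis starts at 55: lie factor = {truncated:.2f}")
print(f"axis starts at 0:  lie factor = {zero_based:.2f}")
```

With the truncated axis, Palin's bar is drawn three times as tall as Romney's even though the underlying values differ by about 17%.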
Other tips to avoid “lying”
• Adjust for inflation
[Figure: “Hypothetical Industries, Inc.”: Revenue and Adjusted Revenue by Year, 2003 to 2010]
• Make sure the context is presented
[Figure: “Hypothetical City Crime”: Thefts per 100000 citizens, shown for 2009 to 2010 only vs. for 2003 to 2010]
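The "adjust for inflation" tip amounts to dividing each year's figure by a price index relative to a base year. A minimal sketch follows; every number in it is invented for illustration (the slide's chart values are not recoverable), and `adjust_for_inflation` is a hypothetical helper, not a library function.

```python
# Hypothetical nominal revenue and price index by year (illustrative only).
nominal_revenue = {2003: 100.0, 2004: 104.0, 2005: 109.0, 2006: 115.0}
price_index = {2003: 100.0, 2004: 103.0, 2005: 106.0, 2006: 110.0}

def adjust_for_inflation(nominal: dict, index: dict, base_year: int) -> dict:
    """Express each year's value in base-year prices."""
    base = index[base_year]
    return {year: value * base / index[year] for year, value in nominal.items()}

adjusted_revenue = adjust_for_inflation(nominal_revenue, price_index, base_year=2003)

for year in sorted(nominal_revenue):
    print(f"{year}: revenue={nominal_revenue[year]:.1f}  "
          f"adjusted={adjusted_revenue[year]:.1f}")
```

Plotting both series together, as the slide's chart does, lets the reader see how much of the apparent growth is only price change.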
Principle 3: The chart should minimize graphical complexity
Generally, the simpler the better…
Key concepts
• Sometimes a table is better
• Data-ink
When a table is better than a chart
For a few data points, a table can do just as well…
Total Sales by Salesperson
[Figure: chart of the same data, with a sales axis from $0.00 to $250,000.00]

Salesperson   Total Sales
Peacock       $225,763.68
Leverling     $201,196.27
Davolio       $182,500.09
Fuller        $162,503.78
Callahan      $123,032.67
King          $116,962.99
Dodsworth     $75,048.04
Suyama        $72,527.63
Buchanan      $68,792.25

The table carries more information in less space
and is more precise.
Data Ink
• The amount of “ink” devoted to data in a chart
• Tufte’s Data-Ink ratio:
– $\text{Data-ink ratio} = \dfrac{\text{data-ink}}{\text{total ink used in graphic}}$
– Should be ~ 1
– < 1 = more non-data related ink in graphic
– = 1 implies all ink devoted to data
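As a rough illustration of the ratio, the sketch below tallies "ink" for the elements of an imaginary chart and recomputes the ratio after erasing some non-data elements. The element names and ink amounts are entirely hypothetical; in practice the ratio is a design heuristic rather than something measured this precisely.

```python
# Hypothetical inventory of chart elements: name -> (ink units, is it data ink?)
chart_ink = {
    "data line": (40.0, True),
    "data labels": (10.0, True),
    "gridlines": (25.0, False),
    "background fill": (20.0, False),
    "border": (5.0, False),
}

def data_ink_ratio(elements: dict) -> float:
    """Share of total ink that is devoted to showing the data."""
    total = sum(ink for ink, _ in elements.values())
    data = sum(ink for ink, is_data in elements.values() if is_data)
    return data / total

before = data_ink_ratio(chart_ink)

# "Erase ink whenever possible": drop the gridlines and background fill.
erased = {"gridlines", "background fill"}
cleaned = {name: v for name, v in chart_ink.items() if name not in erased}
after = data_ink_ratio(cleaned)

print(f"before: {before:.2f}, after: {after:.2f}")
```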
Tufte’s principle:
Erase ink whenever possible
Being conscious of data ink
[Figure: two versions of the “Hypothetical City Crime” chart (Thefts per 100000 citizens, 2003 to 2010)]
• Lower data-ink ratio (worse)
• Higher data-ink ratio (better)
Note the use of labels and legends
Evaluate these graphics.
http://www.nytimes.com/2014/09/11/world/middleeast/obama-speech-isis.html
http://fivethirtyeight.com/features/skeptical-footballmanning-vs-messi-and-andrew-luck-experiment/