MIS5101: Business Intelligence
Principles of Data Visualization
Sunil Wattal

What makes a good chart?
Wikipedia: Patriotic War of 1812
http://en.wikipedia.org/wiki/File:Patriotic_War_of_1812_ENG_map1.svg

What makes a good chart?
Minard’s map of Napoleon’s campaign into Russia, 1869
Reprinted in Tufte (2009), p. 41

What is Data Visualization?
• A picture can say a thousand words (Chinese proverb)
• Data visualization is the use of tools to represent data in the form of charts, maps, tag clouds, animations, or any graphical means that make content easier to understand.
  – Creating a picture in your mind
• Interpreting data in visual terms
  – Raw data is too complex
  – Not a substitute for statistical analysis or other quantitative methods
  – Exploit visual perception to amplify cognition

Why visualization is effective
• Short attention spans
• Information overload
• Easy to grasp format
• More retention

Video Games do this well…

What can you learn from this map?
http://www.popvssoda.com/countystats/total-county.html

What makes a good chart?
This is from an academic conference paper. What are the problems with this chart?
Zhang et al. (2010), “A case study of micro-blogging in the enterprise: use, value, and related issues,” Proceedings of the 28th International Conference on Human Factors in Computing Systems.

Some basic principles (adapted from Tufte 2009)
1. The chart should tell a story
2. The chart should have graphical integrity
3. The chart should minimize graphical complexity
Tufte’s fundamental principle: Above all else show the data

Principle 1: The chart should tell a story
• Graphics should be clear on their own
• The depictions should enable meaningful comparison
• The chart should yield insight beyond the text
“If the statistics are boring, then you’ve got the wrong numbers.” (Tufte 2009)

Do these tell a story?
http://www.evl.uic.edu/aej/491/week03.html
http://flowingdata.com/2009/11/26/fox-newsmakes-the-best-pie-chart-ever/

Daylight Savings Time Explained
http://visual.ly/daylight-saving-time-explained

http://flowingdata.com/2011/01/19/states-with-the-most-and-least-firearms-murders/
http://economix.blogs.nytimes.com/2009/05/05/obesityand-the-fastness-of-food/

Telling a story
• The electoral result of the 2012 election
• Under pre-1920 rules: men only
• Under 1850 rules: white men only
http://www.buzzfeed.com/buzzfeedpolitics/what-the-2012-election-would-have-looked-like-with

Principle 2: The chart should have graphical integrity
• Basically, it shouldn’t “lie” (mislead the reader)
• Tufte’s “Lie Factor”:
  – $\text{Lie Factor} = \dfrac{\text{size of effect shown in graphic}}{\text{size of effect in data}}$
  – Should be ~ 1
  – < 1 = understated effect
  – > 1 = exaggerated effect

Examples of the “lie factor”
Reprinted from Tufte (2009), p. 57 & p. 62
• $LF = \dfrac{5.3/0.6}{27.5/18} = \dfrac{8.83}{1.53} = 5.77$
• $LF = \dfrac{4280\%\ (\text{change in volume})}{454\%\ (\text{change in price})} = 9.4$

How is this deceptive?
http://www.guardian.co.uk/environment/interactive/2009/mar/09/food-carbon-emissions

How about this?
• The original graphic from Real Clear Politics, 2008.
• The adjusted graphic. (Look at the y-axis)
http://20bits.com/articles/politics-and-tuftes-lie-factor/

Or this?
• The original graphic from Fox Business, 2012.
• The adjusted graphic.
http://mediamatters.org/research/2012/10/01/a-history-of-dishonest-fox-charts/190225

Other tips to avoid “lying”
• Adjust for inflation
  [Chart: Hypothetical Industries, Inc., Revenue and Adjusted Revenue by Year, 2003–2010]
• Make sure the context is presented
  [Charts: Hypothetical City Crime, Thefts per 100000 citizens, 2009–2010 only vs. 2003–2010]
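The Lie Factor from Principle 2 can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function names and the 0.05 tolerance around 1 are assumptions introduced here, not part of Tufte's definition, and the inputs are the two examples quoted on the slide.

```python
def lie_factor(effect_in_graphic: float, effect_in_data: float) -> float:
    """Size of effect shown in graphic divided by size of effect in data."""
    if effect_in_data == 0:
        raise ValueError("effect in data must be non-zero")
    return effect_in_graphic / effect_in_data


def describe(lf: float, tolerance: float = 0.05) -> str:
    """Label a lie factor. The tolerance around 1 is an assumption for this sketch."""
    if abs(lf - 1.0) <= tolerance:
        return "roughly honest"
    return "exaggerated effect" if lf > 1.0 else "understated effect"


# First slide example: effects expressed as ratios
# (5.3/0.6 shown in the graphic vs. 27.5/18 in the data).
lf_ratio = lie_factor(5.3 / 0.6, 27.5 / 18)

# Second slide example: 4280% change in volume shown
# vs. 454% change in price in the data.
lf_percent = lie_factor(4280, 454)

print(f"{lf_ratio:.2f} -> {describe(lf_ratio)}")
print(f"{lf_percent:.2f} -> {describe(lf_percent)}")
```

Computed without intermediate rounding, the first example comes out near 5.78; the slide's 5.77 results from rounding the numerator and denominator to 8.83 and 1.53 first.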
Principle 3: The chart should minimize graphical complexity
Generally, the simpler the better…
Key concepts
• Sometimes a table is better
• Data-ink
• Chartjunk

When a table is better than a chart
For a few data points, a table can do just as well…
[Chart: Total Sales by Salesperson, bar chart, $0.00 to $250,000.00]

Salesperson    Total Sales
Peacock        $225,763.68
Leverling      $201,196.27
Davolio        $182,500.09
Fuller         $162,503.78
Callahan       $123,032.67
King           $116,962.99
Dodsworth       $75,048.04
Suyama          $72,527.63
Buchanan        $68,792.25

The table carries more information in less space and is more precise.

The Ultimate Table: The Box Score
• Large amount of information in a very small space
• So why does this work?
  – Depends on the reader’s knowledge of the data

The Business Box Score?
• Applying the same concept to our salesforce example.
• How does this help? How could it hurt?

Key:
TS – total sales
WD – worst day
BD – best day
NC – number of customers
DOR – days on the road

Sales Performance – March 2011

Salesperson    TS    WD    BD    NC    DOR
Peacock        225   3     40    20    28
Leverling      201   2     45    18    27
Davolio        182   5     38    22    28
Fuller         162   2     22    16    20
Callahan       123   1     15    14    15
King           116   0.5   20    12    18
Dodsworth      75    0.3   12    10    20
Suyama         72    0     8     10    8
Buchanan       68    0     8     8     12

Data Ink
• The amount of “ink” devoted to data in a chart
• Tufte’s Data-Ink ratio:
  – $\text{Data-ink ratio} = \dfrac{\text{data-ink}}{\text{total ink used in graphic}}$
  – Should be ~ 1
  – < 1 = more non-data related ink in graphic
  – = 1 implies all ink devoted to data
Tufte’s principle: Erase ink whenever possible

Being conscious of data ink
[Charts: successive versions of Hypothetical City Crime, Thefts per 100000 citizens, 2003–2010, from lower data-ink ratio (worse) to higher data-ink ratio (better)]

What makes a good chart?
[Charts: two versions of 2011 Total Sales, Sum of Extended Price by Order Date, 0 to 160000]
Sometimes it’s really a matter of preference.
These both minimize non-data ink. Why isn’t a table better here?

3-D Charts
[Chart: Total Sales by Salesperson, 3-D bar chart, $0.00 to $250,000.00]
Evaluate this from a data-ink perspective. How does it affect the clarity of the chart?

Chartjunk: Data Ink “gone wild”
• Unnecessary visual clutter that doesn’t provide additional insight
• Distraction from the story the chart is supposed to convey
• When the data-ink ratio is low, chartjunk is likely to be high

Example: Moiré effects (Tufte 2009)
[Charts: Total Sales by Salesperson; Hypothetical City Crime, Thefts per 100000 citizens, 2003–2010]
• Creates illusion of movement
• Stands out, in a bad way

Example: The Grid
[Charts: three versions of Hypothetical City Crime, Thefts per 100000 citizens, 2003–2010]
Why are these examples of chartjunk?
What could you do to remedy it?

http://www.economist.com/node/21537909
http://www.economist.com/node/21537922
Data Ink Working Against Us
Data Ink Working For Us

Evaluate this chart in terms of Data Ink. Imagine this as a bar chart. As a table!!
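To make the data-ink ratio from Principle 3 concrete, the sketch below computes it for a cluttered chart and for the same chart after erasing non-data ink. Everything numeric here is hypothetical: the ink amounts are arbitrary units invented for illustration, and which elements count as data-ink is an assumption rather than a rule from Tufte.

```python
from __future__ import annotations


def data_ink_ratio(ink_by_element: dict[str, float], data_elements: set[str]) -> float:
    """Data-ink divided by total ink used in the graphic."""
    total_ink = sum(ink_by_element.values())
    if total_ink <= 0:
        raise ValueError("the graphic must use some ink")
    data_ink = sum(
        amount for name, amount in ink_by_element.items() if name in data_elements
    )
    return data_ink / total_ink


# Assumption: only the bars and their value labels count as data-ink.
DATA_ELEMENTS = {"bars", "data labels"}

# Hypothetical ink amounts (arbitrary units) for a cluttered 3-D bar chart.
cluttered = {
    "bars": 40.0,
    "data labels": 10.0,
    "axis": 10.0,
    "gridlines": 20.0,
    "3-D shading": 15.0,
    "border": 5.0,
}

# "Erase ink whenever possible": drop gridlines, 3-D shading, and border.
erased = {"gridlines", "3-D shading", "border"}
cleaned = {name: ink for name, ink in cluttered.items() if name not in erased}

print(f"cluttered: {data_ink_ratio(cluttered, DATA_ELEMENTS):.2f}")
print(f"cleaned:   {data_ink_ratio(cleaned, DATA_ELEMENTS):.2f}")
```

With these made-up amounts the ratio moves from 0.50 toward 1 once the non-data ink is erased, which is the direction the slides recommend.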
Tools
• Excel Tools
  – charts, histograms, pie charts
  – One dimensional, and not useful to represent complex data
• Heat Maps
• Infographics
• Social Media Visualization
• Other tools
  • Google Trends
  • Fusion Tables
  • Tibco Spotfire
  • Twitter Analytics for Excel 2013
  • Visualizing World Bank Data
    – http://data.worldbank.org/products/data-visualization-tools

Heat Map
• Graphical representation of data in a map or a matrix
• Using colors instead of numbers
• Different colors denote different intensities or categories
• Tools
  – Open heat map (http://www.openheatmap.com/)

Infographic
• Present information using visual tools
• According to visual.ly
  – visualizations that present complex information quickly and clearly
  – visualizations that integrate words and graphics to reveal information, patterns or trends
  – visualizations that are easier to understand than words alone
  – visualizations that are beautiful and engaging
• Tools
  – Infogram
  – Visual.ly
Check out more at www.coolinfographics.com
http://visual.ly/what-infographic-2

The Infographic
Three excerpts from an infographic comparing Hurricane Sandy to Katrina
http://www.huffingtonpost.com/2012/11/04/hurricane-sandy-vskatrina-infographic_n_2072432.html

Another example
http://www.biztechmagazine.com/article/2012/08/how-make-sure-yourbyod-plan-all-good-infographic

Social Media Visualization
• Big Data Analytics is only one aspect of social media
  – Story telling through visualizations is the key
  – Easy to get lost in pure numbers
• Online conversations
  – Sentiment analysis
  – Reviews and ratings
  – Feedback
• Online networks
  – Influencers

Twitter Analytics
• Social Networks
• Connections between people
  – How many?
  – Who?
  – One way or two way?
  – Intensity of connections?
• Applications
  – LinkedIn InMaps (inmaps.linkedinlabs.com)
  – Klout Score
  – How Google allotted the first set of Google glasses?
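The questions about connections between people (how many, who, one way or two way, intensity) map onto simple computations over a list of connections. The sketch below is a minimal illustration with made-up names and interaction counts; the `summarize` function and the edge format are assumptions introduced here and are not taken from any of the tools listed above.

```python
from collections import defaultdict

# Hypothetical directed connections: (from_person, to_person, number_of_interactions).
edges = [
    ("ana", "ben", 5),
    ("ben", "ana", 2),
    ("ana", "cy", 1),
    ("dee", "ana", 7),
]


def summarize(edges):
    """Answer: how many, who, one way or two way, and intensity of connections."""
    weight = {(src, dst): count for src, dst, count in edges}

    # Who is connected to whom, ignoring direction.
    neighbors = defaultdict(set)
    for src, dst in weight:
        neighbors[src].add(dst)
        neighbors[dst].add(src)

    # A pair is two-way when the reverse connection also exists.
    two_way = {frozenset(pair) for pair in weight if (pair[1], pair[0]) in weight}
    one_way = {pair for pair in weight if (pair[1], pair[0]) not in weight}

    return {
        "how_many": {person: len(others) for person, others in neighbors.items()},
        "who": {person: sorted(others) for person, others in neighbors.items()},
        "two_way": two_way,
        "one_way": one_way,
        # Intensity: the single directed connection with the most interactions.
        "strongest": max(weight, key=weight.get),
    }


result = summarize(edges)
print(result["how_many"])
print(result["strongest"])
```

A network visualization would typically encode these same quantities visually, for example node size for the number of connections and line thickness for intensity.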
Links
• TED Talk on Data Visualization
• http://videosubtitle.tedcdn.com/talk/podcast/2010G/None/DavidMcCandless_2010G-480p-en.mp4