Graphics - an Ace up a Statistician's Sleeve
Heike Hofmann, hofmann@iastate.edu
IOWA STATE UNIVERSITY
2003 WNAR/IMS meeting

• Bad Graphics
• Beginnings of Statistical Graphics
• Milestones in Graphics
• Interactive Graphics

BAD Graphics
Guidelines for a bad graphic (Howard Wainer):
• don't show much data
• show the data inaccurately
• obfuscate the data
Criteria for bad graphics (Edward Tufte):
• Lie Factor = size of effect in graphic / size of effect in data

Lie Factor: Increase in Mileage
[Figure: The Lie Factor (from Tufte, 1983, p. 57); GIF image by Clay Helberg, "Pitfalls of Data Analysis"]
This graph, from the New York Times, purports to show the mandated fuel economy standards set by the US Department of Transportation. The standard required an increase in mileage from 18 to 27.5 miles per gallon, an increase of 53%. The increase shown in the graph is 783%, for a whopping lie factor of 783/53 = 14.8!

BAD Graphics
Goals for a bad graphic (Howard Wainer):
• don't show much data
• show the data inaccurately
• obfuscate the data
Criteria for bad graphics (Edward Tufte):
• Lie Factor = size of effect in graphic / size of effect in data
• Data-Ink Ratio = data ink / total ink used in graphic

"Worst Graphic ever ... in print" (Tufte)
[Figure: Age Structure of College Enrollment]
Art or Artifice? As a substitute for substance, one can try lots of color, 3D effects, or disguised redundancy.
This graph uses all three techniques to display just five numbers. Note the clever use of mirror-imaging (the top series is just 100 minus the bottom series) and the interesting use of curved lines, front and back, to avoid the appearance that there's a lot less here than meets the eye. Tufte (1983, p. 118) says, "This may well be the worst graphic ever to find its way into print."

Beginnings of Graphics

Beginnings of Statistical Graphics
William Playfair (1759 - 1823)
• Scottish economist
• author of "The Commercial and Political Atlas" (1786)
• includes 44 charts: time series plots and one bar chart
• simple in design, yet data rich
Joseph Minard (1781 - 1870)
• French civil engineer, École Nationale des Ponts et Chaussées (ENPC)
• 1844 - 1870: draws maps and data flow graphs

Playfair: Price of Wheat
Price of a quarter of wheat (28 pounds) from 1565 to 1821 compared with weekly wages, with a time line of the reigns of different rulers

Minard: Napoleon's Russian Campaign 1812
6-dimensional data on the army:
• geographic location (2 dimensions)
• size of army
• time
• temperature
• direction of movement

Overlaid Maps: Cholera Outbreak in Central London, September 1854
Dr John Snow plotted deaths as dots and water pumps as crosses.

Overlaid Maps: Armoring Airplanes during WWII
• Abraham Wald was challenged to add extra armor to airplanes based on the pattern of bullet holes in returning aircraft
• Wald determined where planes had been shot
• conclusion: put extra armor everywhere else (planes hit there did not make it back)!

Train Schedule: Paris - Lyon
[Figures: Marey's plot (1880); today's TGV]

Modern Dark Ages (1900-1949)
• only a few innovations
• rise of "classical" statistics: distributions, hypothesis tests, parameter estimates, ...
Re-Birth of Statistical Graphics (1950-1974)
• John W. Tukey: a variety of new simple graphics, Exploratory Data Analysis
• Jacques Bertin: Sémiologie Graphique, organizing the visual and perceptual elements of graphics
  http://viscog.beckman.uiuc.edu/djs_lab/demos.html
• computers available

High Dimensional, High Interaction Graphics
• with increasing computer power, processing of high-dimensional data becomes possible
• high-interaction graphics with new paradigms: selection, linked highlighting, brushing, logical zooming
• new methods
  for continuous variables: Scatterplot Matrix, Grand Tour + Projection Pursuit, Parallel Coordinate Plots
  for categorical variables: Mosaic Plots, Tree Maps
• ever-expanding application to new areas
• wide range of commercial and free software
  commercial: DataDesk, Spotfire, Statistica, JMP, Visual Insights
  free: GGobi, Manet, Mondrian

Biplots & Grand Tour
Biplots (Gabriel 1971)
• idea: scatterplot of the 1st & 2nd principal components, with the original variables added as lines
[Figure: biplot with axes "1. pc" and "2. pc"; variable lines %weaver 18, %unknown 18, %baker 18, %patrician 18, %merchants 18, %women 18, %widow 18, %goldsmith 18, %textiles 18]
Grand Tour (Asimov 1984)
• walk along a path through ALL POSSIBLE d-dimensional projections
• additional indices for optimization: Projection Pursuit

Parallel Coordinates
• allow high-dimensional visualization of data (Ed Wegman, Al Inselberg)
• non-Euclidean geometry: points map to lines and lines map to points

Mosaic Plots
• visualization of high-dimensional contingency tables (Hartigan & Kleiner)
• further development (Friendly) and variations (Hofmann)
• area-based plots: one rectangle for each cell in the table, area proportional to cell size
• variation: Double Decker Plot
[Figures: mosaic plot and double decker plot of Class (First, Second, Third, Crew), Sex (Female, Male) and Age (Child, Adult)]

Tree Maps (Ben Shneiderman)
• splits on the same level can be according to different variables
• not all cells are on the same level
• aspect ratio optimized (close to 1): squares are easier to compare than skewed rectangles
• green shading indicates the development of each stock

Where do new ideas come from?
• stimulated by applications and new data types; here: network data
Network Graphs (Graham Wills)
• problems: overview vs. close-ups, layout

Application: Gene Expression Data
Experimental setup:
• 2 genotypes: wildtype (WT), growth-impaired mutant
• 2 treatments: cure added to soil or not
• 2 replicates each
[Figure: 2 × 2 design, genotype (WT, mutant) by cure added? (no, yes)]
Goal: identify genes whose expression changes due to treatment, genotype, or both.
A classical ANOVA problem!

ANOVA Model
Model setup (for each gene):
$Y_{ijk} = \mu + \lambda_i^{C} + \lambda_j^{T} + \lambda_{ij}^{CT} + \varepsilon_{ijk}$
• $Y_{ijk}$: gene expression level
• $\mu$: base expression level (average)
• $\lambda_i^{C}$: effect of cure
• $\lambda_j^{T}$: effect of genotype
• $\lambda_{ij}^{CT}$: interaction effect of cure & genotype
• $\varepsilon_{ijk}$: random error
Compute F statistics and get p-values: p-values below 5% show significant effects ... or NOT??
[Figure: genotype (WT, mutant) by cure added (no, yes), with effects $\lambda_i^{C}$, $\lambda_j^{T}$, $\lambda_{ij}^{CT}$]

Summary
• statistical graphics have a beautiful and interesting past
• successful applications, in some cases saving human lives
• development goes through cycles; graphics seem once more to be in a highly productive phase
• stimulation from application areas: massive data sets, new areas with problems on a new scale, data mining / knowledge discovery

Sources
• Howard Wainer: "Visual Revelations: Graphical Tales of Fate and Deception from Napoleon Bonaparte to Ross Perot"
• Edward Tufte: "The Visual Display of Quantitative Information"
• Michael Friendly's Data Visualization Gallery, Milestones Project: http://www.math.yorku.ca/SCS/Gallery/milestone/