1. Introduction

  Multivariate data visualization concepts and tools
  Applications

"Zoology" of Multivariate Data Visualization

  => "taxonomy" of visualization principles (Buja et al, 96).
  Rendering: what to show in a plot.
  Manipulation: what to do with the plots.

Rendering

  Scatterplots: cases represented by locations of points, eg dotplots and scatterplots.
  Traces: cases represented as a function of a real parameter, eg parallel coordinate plots (Inselberg, 85; Wegman, 90), Andrews curves (Andrews, 72), tour curves (Peterson and Cook, 99).
  Glyphs: cases represented by complex symbols, eg trees and castles (Kleiner and Hartigan, 81), stars (Newton, 78), Chernoff faces (Chernoff, 73).

Manipulation

  Finding Gestalt - Focusing Individual Views: choosing variables, aspect ratio, zoom and pan, motion.
  Posing Queries - Linking Multiple Views: coloring or highlighting a subset, brushing in one view, observing the result in other views, eg linked brushing.
  Making Comparisons - Arranging Many Views: mechanisms for shifting and reformatting the layout of multiple plots, eg scatterplot matrix.

Focusing Individual Views

  For scatterplots: choose variables, aspect ratio, zoom and pan, ...
  For traces: choose variables, ordering of variables, scale and aspect ratio, ...
  For glyphs: choose variables, mapping of variables to glyph features, layout of glyphs on plotting surface, ...

Adding Motion - Tours

  (1) Grand Tour: overview of data, a continuous sequence of low-dimensional projections of high-d space, all projections equally likely to be shown (Asimov, 85; Buja and Asimov, 86).
      => information about the joint distribution
  (2) Guided Tour: increases the probability of visiting the more "interesting" views and decreases the probability of visiting less interesting views (Hurley and Buja, 90; Cook et al, 95).
  (3) Manual Control: provides mechanisms for manipulating the contribution of one or more variables to a plot. It gives fine-tuned "variable-centered" control (Cook and Buja, 97).
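The grand tour idea of showing low-dimensional projections with "all projections equally likely" can be illustrated with a minimal sketch. This is not the XGobi implementation: it only draws a single random 2-D projection frame (Gram-Schmidt applied to Gaussian vectors) and projects a small table onto it. A real tour would also interpolate smoothly between successive frames, which is omitted here. The function names and the toy data are assumptions made for illustration.

```python
import math
import random


def random_orthonormal_basis(p, d=2, rng=None):
    """Draw d orthonormal vectors in R^p via Gram-Schmidt on Gaussian vectors."""
    rng = rng or random.Random()
    basis = []
    while len(basis) < d:
        v = [rng.gauss(0.0, 1.0) for _ in range(p)]
        # Remove the components along the vectors already chosen.
        for b in basis:
            dot = sum(vi * bi for vi, bi in zip(v, b))
            v = [vi - dot * bi for vi, bi in zip(v, b)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        if norm > 1e-12:  # skip the (very unlikely) degenerate draw
            basis.append([vi / norm for vi in v])
    return basis


def project(data, basis):
    """Project each row of data (n x p) onto the basis vectors, giving n x d."""
    return [[sum(x * b for x, b in zip(row, vec)) for vec in basis]
            for row in data]


# Toy data (illustrative only): 4 cases, 3 variables.
toy_data = [[1.0, 2.0, 3.0],
            [2.0, 0.5, 1.0],
            [0.0, 1.5, 2.5],
            [3.0, 3.0, 0.0]]

frame = random_orthonormal_basis(p=3, d=2, rng=random.Random(0))
view = project(toy_data, frame)  # one 2-D view of the 3-D data
```

Calling `random_orthonormal_basis` repeatedly gives a sequence of unrelated views. The continuity of a tour comes from moving gradually from one frame to the next rather than jumping.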
Notation for Projection Methods

  Data (n observations, p variables) has matrix form as follows:

  $$
  X = [X_1 \; X_2 \; \cdots \; X_p] =
  \begin{bmatrix}
  X_{11} & X_{12} & \cdots & X_{1p} \\
  X_{21} & X_{22} & \cdots & X_{2p} \\
  \vdots & \vdots & \ddots & \vdots \\
  X_{n1} & X_{n2} & \cdots & X_{np}
  \end{bmatrix}_{n \times p}
  $$

Notation for Projection Methods

  A 1-D projection of the data onto a vector $\boldsymbol{\alpha}_{p \times 1} = (\alpha_1, \ldots, \alpha_p)'$ takes the form:

  $$
  X\boldsymbol{\alpha} = \alpha_1 X_1 + \alpha_2 X_2 + \cdots + \alpha_p X_p =
  \begin{bmatrix}
  \alpha_1 X_{11} + \alpha_2 X_{12} + \cdots + \alpha_p X_{1p} \\
  \vdots \\
  \alpha_1 X_{n1} + \alpha_2 X_{n2} + \cdots + \alpha_p X_{np}
  \end{bmatrix}
  $$

  where $\|\boldsymbol{\alpha}\| = \sqrt{\alpha_1^2 + \cdots + \alpha_p^2} = 1$.

  A 2-D projection of the data can be generated by expanding $\boldsymbol{\alpha}$ to $A_{p \times 2} = [\boldsymbol{\alpha}_1 \; \boldsymbol{\alpha}_2]$, where the columns are orthonormal: $\|\boldsymbol{\alpha}_1\| = \|\boldsymbol{\alpha}_2\| = 1$ and $\boldsymbol{\alpha}_1'\boldsymbol{\alpha}_2 = 0$. The projected data is then $XA$, an $n \times 2$ matrix.
  Similarly, this notation can be expanded to represent d-D projections.

Linking Multiple Views

  Conditioning operation: under the brush we have $(X, Y) \in B$, ie conditioning on variables X, Y (Newton, 78; McDonald, 82; Becker and Cleveland, 87).
  Sectioning: geometric sections with hyperplanes (Furnas and Buja, 94).
  Database query: brushing interpreted in the logic of query clauses, $a_1 < X < a_2$ and $b_1 < Y < b_2$ (Buja et al, 91; Shneiderman, 94).

Arranging Many Views

  Laying out marginal plots, eg scatterplot matrices.
  Arranging conditional plots, defined by levels of a discrete or discretized variable, eg trellis plots (Becker and Cleveland).
  Organizing material according to similarity, using a cluster algorithm or minimal spanning tree (Carr, 94; Eddy and Mockus, 95).

Software: XGobi

  Developed at Bellcore by Swayne, Cook, and Buja, beginning 1989 (Swayne et al, 98).
  Freely available from www.research.att.com/areas/stat/xgobi/.
  Data represented by scatterplots and connected lines.
  Linked brushing of points and lines across plots.
  Univariate and bivariate plots, parallel coordinate plots, scatterplot matrices.
  Dynamic plots - cycling, 3D rotations, tours.
  High-dimensional drawing program.
  Connections to other software using Remote Procedure Calls.
  X Window System application.

XGobi Layout

  [Figure: XGobi window layout]

Inputting Data into XGobi

  A host of input files, recognized by extension (filename is the file name stem):

  .dat   essential data file: all numeric, space-delimited

           191 131 53
           185 134 50
           200 137 52
           173 127 50

  .col   optional variable names: one label per line

           tars1
           tars2
           head

  .row   optional case names: one label per line

           Concinna
           Concinna
           Heptapot.
           Heikert.

Inputting Data into XGobi

  .colors   optional colors file: color specification for each case

           Green
           Green
           Yellow
           Red

  .glyphs   optional glyphs file: symbol specification for each case

           7
           7
           2
           22

  Usage:

    xgobi [-mono] [-subset] [-only n/N] [-only a,n] [-std mmx|msd|mmd] [-dev std_deviation] [-version] [-scatmat] filename

Broad Variety of Applications

  Many visualization programs are designed for special purposes.
  XGobi was designed to be very general and to have broad applicability.
  The use of simple point and line plots can create surprisingly complex pictures.
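The file conventions described above (a required space-delimited numeric .dat file, plus optional .col and .row label files with one label per line) can be checked outside XGobi with a short script. The following is a minimal sketch under those stated conventions only. It is not XGobi's own parser, it does not handle the .colors or .glyphs files or any missing-value coding, and the helper names and the stem "flea" are hypothetical.

```python
import tempfile
from pathlib import Path


def read_lines(path):
    """Return the non-empty, stripped lines of a text file, or None if absent."""
    if not path.exists():
        return None
    return [ln.strip() for ln in path.read_text().splitlines() if ln.strip()]


def read_xgobi_style(stem):
    """Read <stem>.dat plus optional <stem>.col and <stem>.row (hypothetical helper)."""
    stem = str(stem)
    rows = read_lines(Path(stem + ".dat"))
    if rows is None:
        raise FileNotFoundError(stem + ".dat is required")
    if not rows:
        raise ValueError("data file is empty")
    # All values are numeric and separated by whitespace.
    data = [[float(tok) for tok in ln.split()] for ln in rows]
    n, p = len(data), len(data[0])
    if any(len(r) != p for r in data):
        raise ValueError("rows of the data file differ in length")
    col_names = read_lines(Path(stem + ".col"))
    row_names = read_lines(Path(stem + ".row"))
    if col_names is not None and len(col_names) != p:
        raise ValueError("expected one variable name per column")
    if row_names is not None and len(row_names) != n:
        raise ValueError("expected one case name per row")
    return data, col_names, row_names


# Write the sample values from the slide to a temporary directory and read them back.
with tempfile.TemporaryDirectory() as tmp:
    stem = Path(tmp) / "flea"  # hypothetical stem name
    Path(f"{stem}.dat").write_text("191 131 53\n185 134 50\n200 137 52\n173 127 50\n")
    Path(f"{stem}.col").write_text("tars1\ntars2\nhead\n")
    Path(f"{stem}.row").write_text("Concinna\nConcinna\nHeptapot.\nHeikert.\n")
    data, col_names, row_names = read_xgobi_style(stem)
```

The length checks reflect the one-label-per-line rule: a .col file should have as many lines as the data has columns, and a .row file as many lines as the data has rows.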
Schematic Illustrating Projection Methods

  [Figure]

Scatterplot Matrix

  [Figure: variables tars1, tars2, head, aede1, aede2, aede3]

Parallel Coordinate Plot

  [Figure]

Dendrogram Linked to Tour View

  [Figure: dendrogram with axes Merge Level and Objects; tour view of palmitic, palmitoleic, oleic, linoleic]

Bonferroni vs Scheffé Confidence Intervals in 3D

  [Figure: axes Na, SweatRate, K]

Results of a 2^4 Experimental Design: Hand-drawn and XGobi Reconstruction

  [Figure]

Multiple Time Series

  [Figure: series SB, Profit, Gilts]

Regressions with 3 Explanatory Variables, Without and With Interaction Terms

  [Figure: two views with axes X1, X2, X3, Y]

6D Dynamical System, Stable and Unstable Trajectories (Qi et al, 98)

  [Figure: two views with axes Var 1 to Var 6]

Contours of Climate Rating Across the USA

  [Figure]

What We Can Find With Graphics

  With graphics we can often find features that we would not otherwise detect with numerical methods (small departures from the trend, sparse structure in high dimensions), and we can refine numerical results and make them more interpretable.
Tipping Behavior

  [Figure: histograms of Tips]

Tipping Behavior

  [Figure: Total Tip versus Total Bill in four panels: Male Smokers, Female Smokers, Male Non-smokers, Female Non-smokers]

CART vs Manual Tour Controls

  [Figure: points labeled by class 1, 2, 3; variables eicosenoic, linoleic, arachidic, oleic]

Sparse Structure in 7D

  [Figure: two views with axes X1 to X7]