Interactive and Dynamic Graphics for Data Analysis using XGobi

Dianne Cook, Statistics, Iowa State University
Deborah F. Swayne, Statistics Research, AT&T Labs
Andreas Buja, Statistics Research, AT&T Labs

Copyright 1999 D. Cook, D. F. Swayne, A. Buja

Objective

By the end of this course, I hope you will have gained some understanding of the power of visual tools in the process of data analysis.

What is data visualization?

Data: information in a table or list.
Visualization:
- abstract relationships between variables.
- beyond 3D, to arbitrary dimensions.
- applicable to many types of data.

Beyond a Flat Page

Augmented by:
- "Multiple views" paradigm.
- Focusing using zoom/pan/re-scale.
- Linking by queries, or motion.
- Rearranging to make multiple comparisons.

History of Statistical Graphics

- PRIM-9: Fisherkeller, Friedman, Tukey, 1974.
- Brushing: Newton, 1978; McDonald, 1982.
- Grand tour: Asimov, 1985.

What is "interactive"?

Direct manipulation in the plot:
- linked brushing of points, regions, or lines.
- querying the id of a point or group of points.
- dragging a scrollbar to change the value of a parameter.
- clicking a button to change the variables viewed in a plot.

What is "dynamic"?

Motion graphics:
- cycling between plots.
- 3D rotating plots.
- tour methods: grand/random, guided, manual.

What Makes Graphics Special?

The eye can absorb enormous amounts of information.
With graphics we can:
- often find features that we wouldn't otherwise detect from numerical methods:
  - small departures from the trend.
  - sparse structure in high dimensions.
- refine numerical results and make them more interpretable.

Intricate Features: Tipping Behavior

One waiter recorded 244 dining parties over 2.5 months in early 1990.
Recorded: total tip, total bill, sex of the payer, smoking or not, day of the week, time of day, size of the party.
What are the important factors in tipping behavior?
Reference: Bryant and Smith (1995)

Intricate Features: Tipping Behavior

[Figure: six panels, each with axis label Tips]

Intricate Features: Tipping Behavior

[Figure: Total Tip against Total Bill, in panels for Male Smokers, Female Smokers, Male Non-smokers, and Female Non-smokers]

Software: XGobi

- Developed at Bellcore by Swayne, Cook, and Buja, beginning 1989 (Swayne et al., 1998).
- Freely available from www.research.att.com/areas/stat/xgobi/.
- Data represented by scatterplots and connected lines.
- Linked brushing of points and lines across plots.
- Dynamic plots: cycling, 3D rotations, tours.
- Interprocess communication to other software.
- X Window System application.

Sparse Structure: 7D particle physics

[Figure: two projections of the 7D data, with axes X1 to X7]

Refining Results: Italian Olive Oils

Percentage composition of 8 fatty acids for 572 samples from 3 regions (and several areas) in Italy.
How do we distinguish the oils from different regions and areas in Italy based on their combinations of the fatty acids?
Reference: Forina et al. 1983

Refining Results: Italian Olive Oils

[Figure: plots of the samples labeled by region (1, 2, 3); axis labels include eicosenoic, linoleic, arachidic, and oleic]

Refining Results: Italian Olive Oils

[Figure: axis labels not recoverable]

New Work: Exploring Missingness

[Figure: plots of humidity, s.s.temp, and air.temp]

New Work: Exploring Missingness

[Figure: axis labels s.s.temp., air.temp., zon.winds, and mer.winds]

New Work: Inference

Summary

Applicable wherever data is collected: all areas of science, government, and the financial, retail, health, and telecommunications industries.

Web Pages

The authors can be contacted by email at:
dicook@iastate.edu
dfs@research.att.com
andreas@research.att.com
The XGobi software can be downloaded from the XGobi web site:
http://www.research.att.com/areas/stat/xgobi/
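
Sketch: a random tour

The tour methods listed under "dynamic" graphics show a smooth sequence of low-dimensional projections of high-dimensional data. The following is a minimal sketch of that idea in plain Python, not XGobi's implementation and not Asimov's construction: it picks random 2D target planes and moves toward each one by linear interpolation followed by re-orthonormalization, which is a simplification of the geodesic paths a real tour would use. All function names (`random_frame`, `orthonormalize`, `tour_frames`, `project`) are hypothetical, and the dimension 7 merely echoes the 7D example earlier in the slides.

```python
import math
import random


def orthonormalize(a, b):
    """Gram-Schmidt on two vectors: unit-length a, then b made orthogonal to a.

    Assumes a and b are not (nearly) parallel or zero; with random Gaussian
    targets that case has probability zero, so it is not handled here.
    """
    norm_a = math.sqrt(sum(x * x for x in a))
    a = [x / norm_a for x in a]
    dot = sum(x * y for x, y in zip(a, b))
    b = [y - dot * x for x, y in zip(a, b)]
    norm_b = math.sqrt(sum(y * y for y in b))
    b = [y / norm_b for y in b]
    return a, b


def random_frame(p, rng):
    """Two orthonormal p-vectors spanning a randomly chosen projection plane."""
    a = [rng.gauss(0.0, 1.0) for _ in range(p)]
    b = [rng.gauss(0.0, 1.0) for _ in range(p)]
    return orthonormalize(a, b)


def tour_frames(p, n_targets=3, steps=10, seed=0):
    """Yield frames that step from the current plane toward random targets."""
    rng = random.Random(seed)
    current = random_frame(p, rng)
    for _ in range(n_targets):
        target = random_frame(p, rng)
        for k in range(1, steps + 1):
            t = k / steps
            # Blend the two bases, then restore orthonormality.
            a = [(1 - t) * c + t * g for c, g in zip(current[0], target[0])]
            b = [(1 - t) * c + t * g for c, g in zip(current[1], target[1])]
            yield orthonormalize(a, b)
        current = target


def project(rows, frame):
    """Project each p-dimensional row onto the 2D plane spanned by the frame."""
    a, b = frame
    return [
        (sum(x * w for x, w in zip(row, a)), sum(x * w for x, w in zip(row, b)))
        for row in rows
    ]


# Placeholder data (not from the slides): the 7 coordinate axes as points.
axes = [[1.0 if i == j else 0.0 for j in range(7)] for i in range(7)]
views = [project(axes, frame) for frame in tour_frames(7)]
```

Each element of `views` is one 2D scatterplot of the data; drawing them in sequence gives the motion. Projecting the coordinate axes, as here, is also how the variable axes could be drawn alongside a tour view.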