Interactive and Dynamic Graphics for Data Analysis using XGobi
Dianne Cook, Statistics, Iowa State University
Deborah F. Swayne, Statistics Research, AT&T Labs
Andreas Buja, Statistics Research, AT&T Labs
Copyright 1999 D. Cook, D. F. Swayne, A. Buja
Objective
By the end of this course, I hope that you will have gained some understanding of the power of visual tools in the process of data analysis.
What is data visualization?
Data: information in a table or list
Visualization:
- shows abstract relationships between variables.
- extends beyond 3D, to arbitrary dimensions.
- applies to many types of data.
Beyond a Flat Page
"Multiple views" paradigm.
Focusing using zoom/pan/re-scale.
Linking by queries or by motion.
Rearranging to make multiple comparisons.
Augmented by:
History of Statistical Graphics
PRIM-9: Fisherkeller, Friedman, and Tukey, 1974.
brushing: Newton, 1978; McDonald, 1982.
grand tour: Asimov, 1985.
What is "interactive"?
Direct manipulation in the plot:
- linked brushing of points, regions, or lines.
- querying the id of a point or group of points.
- dragging a scrollbar to change the value of a parameter.
- clicking a button to change the variables viewed in a plot.
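Linked brushing can be sketched as one selection set shared by every view of the same table: brushing a rectangle in one scatterplot selects case indices, and every other plot highlights those same cases, whatever variables it displays. This is a minimal sketch, not XGobi's implementation; the class and method names are hypothetical, and the variable names (bill, tip, size) are borrowed from the tipping data for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class LinkedViews:
    """Hypothetical sketch: several scatterplot views of one table sharing one selection."""

    rows: list  # one dict per case, e.g. {"bill": 10.0, "tip": 2.0, "size": 2}
    selected: set = field(default_factory=set)  # brushed case indices, shared by all views

    def brush(self, x, y, xmin, xmax, ymin, ymax):
        """Brush a rectangle in the (x, y) view; this replaces the shared selection."""
        self.selected = {
            i
            for i, row in enumerate(self.rows)
            if xmin <= row[x] <= xmax and ymin <= row[y] <= ymax
        }
        return self.selected

    def highlighted(self, x, y):
        """Coordinates to highlight in any (x, y) view: same cases, possibly other variables."""
        return [(self.rows[i][x], self.rows[i][y]) for i in sorted(self.selected)]


if __name__ == "__main__":
    views = LinkedViews([
        {"bill": 10.0, "tip": 2.0, "size": 2},
        {"bill": 40.0, "tip": 4.0, "size": 4},
        {"bill": 15.0, "tip": 5.0, "size": 2},
    ])
    views.brush("bill", "tip", 0, 20, 0, 3)  # brush low bills with low tips
    print(views.highlighted("size", "tip"))  # prints [(2, 2.0)]
```

The key design choice is that the selection belongs to the data, not to any one plot, so a new view automatically shows the current brush.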
What is "dynamic"?
Motion graphics:
- cycling between plots.
- 3D rotating plots.
- tour methods: grand/random, guided, manual.
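A tour shows the data through a moving sequence of low-dimensional projections. As a minimal sketch in plain Python (all function names hypothetical), a random 2-D plane in p dimensions can be built by orthonormalizing two Gaussian vectors, and the view can be moved between two planes by blending their basis vectors and re-orthonormalizing at each step. The blending is a simplification assumed here for brevity; it is not the interpolation scheme used in real grand tour implementations.

```python
import math
import random


def _unit(v):
    """Scale a vector to length 1 (assumes v is not the zero vector)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def orthonormalize(u, v):
    """Gram-Schmidt: an orthonormal pair (e1, e2) spanning the same plane as u and v."""
    e1 = _unit(u)
    d = sum(a * b for a, b in zip(v, e1))
    e2 = _unit([b - d * a for a, b in zip(e1, v)])
    return e1, e2


def random_frame(p, rng):
    """A random 2-D plane in p dimensions, from two independent Gaussian vectors."""
    u = [rng.gauss(0.0, 1.0) for _ in range(p)]
    v = [rng.gauss(0.0, 1.0) for _ in range(p)]
    return orthonormalize(u, v)


def project(rows, frame):
    """Project p-dimensional rows onto the plane, giving (x, y) screen coordinates."""
    e1, e2 = frame
    return [
        (sum(a * x for a, x in zip(e1, r)), sum(b * x for b, x in zip(e2, r)))
        for r in rows
    ]


def tour_path(start, target, steps):
    """Crude path between two planes: blend basis vectors, then re-orthonormalize.

    Degenerate blends (opposite or parallel vectors) are possible in principle
    but have probability zero for randomly generated frames.
    """
    (a1, a2), (b1, b2) = start, target
    for k in range(steps + 1):
        t = k / steps
        u = [(1 - t) * x + t * y for x, y in zip(a1, b1)]
        v = [(1 - t) * x + t * y for x, y in zip(a2, b2)]
        yield orthonormalize(u, v)


if __name__ == "__main__":
    rng = random.Random(0)
    data = [[rng.gauss(0.0, 1.0) for _ in range(7)] for _ in range(100)]  # toy 7-D data
    frame = random_frame(7, rng)
    for _ in range(3):  # three legs of a random tour
        target = random_frame(7, rng)
        for f in tour_path(frame, target, steps=20):
            points = project(data, f)  # a plotting program would redraw these here
        frame = target
```

A 3D rotating plot is the special case where p = 3 and the planes are rotated within three dimensions.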
What Makes Graphics Special?
The eye can absorb enormous amounts of information and can pick up:
- small departures from the trend.
- sparse structure in high dimensions.
With graphics we can:
- often find features that we wouldn't otherwise detect with numerical methods.
- refine numerical results and make them more interpretable.
Intricate Features: Tipping Behavior
One waiter recorded 244 dining parties over 2.5 months in early 1990, noting the total tip, total bill, sex of the payer, whether the party smoked, day of the week, time of day, and size of the party.
What are the important factors in tipping behavior?
Reference: Bryant and Smith (1995)
Intricate Features: Tipping Behavior
[Figure: several plots of Tips, on a 0 to 10 scale.]
Intricate Features: Tipping Behavior
[Figure: Total Tip versus Total Bill in four panels: Male Smokers, Female Smokers, Male Non-smokers, Female Non-smokers.]
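The four panels condition the tip-versus-bill scatterplot on the sex of the payer and on smoking. A simple numerical counterpart, sketched here as an assumption rather than the analysis from the slides, groups the parties the same way and compares tip rates (tip divided by bill). The function name and the record keys (sex, smoker, bill, tip) are hypothetical, and the rows in the usage block are made up; they are not the Bryant and Smith data.

```python
from collections import defaultdict


def tip_rate_by_panel(parties):
    """Group parties by (sex, smoker), as in the four panels, and summarize tip/bill.

    Returns {(sex, smoker): (number_of_parties, mean_tip_rate)}.
    """
    rates = defaultdict(list)
    for p in parties:
        rates[(p["sex"], p["smoker"])].append(p["tip"] / p["bill"])
    return {key: (len(r), sum(r) / len(r)) for key, r in rates.items()}


if __name__ == "__main__":
    made_up = [  # illustrative rows only, NOT the Bryant and Smith data
        {"sex": "Female", "smoker": "No", "bill": 20.0, "tip": 4.0},
        {"sex": "Female", "smoker": "No", "bill": 10.0, "tip": 1.0},
        {"sex": "Male", "smoker": "Yes", "bill": 50.0, "tip": 5.0},
    ]
    for panel, (n, rate) in sorted(tip_rate_by_panel(made_up).items()):
        print(panel, n, round(rate, 3))
```

A single mean per panel cannot show how spread out the points are around the trend, which the scatterplots do show directly.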
Software: XGobi
Developed at Bellcore by Swayne, Cook, and Buja, beginning in 1989 (Swayne et al., 1998). Freely available from www.research.att.com/areas/stat/xgobi/.
Data are represented as scatterplots and connected lines.
Linked brushing of points and lines across plots.
Dynamic plots: cycling, 3D rotations, tours.
Interprocess communication with other software.
X Window System application.
Sparse Structure: 7D particle physics
[Figure: two projections of the 7-dimensional data, with variable axes X1 to X7.]
Refining Results: Italian Olive Oils
Percentage composition of 8 fatty acids for 572 samples
from 3 regions (and several areas) in Italy.
How do we distinguish the oils from different regions and areas in Italy based on their combinations of fatty acids?
Reference: Forina et al. 1983
Refining Results: Italian Olive Oils
[Figures: plots of eicosenoic, linoleic, arachidic, and oleic acid content, with points labeled by region (1, 2, 3).]
New Work: Exploring Missingness
[Figures: plots of air.temp, s.s.temp, and humidity, and of zon.winds and mer.winds.]
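One way to make missing values visible in plots like these, assumed here rather than taken from the slides, is to pair each variable with a missingness indicator (a "shadow" column) and to draw missing entries at a value a fixed fraction of the observed range below the observed minimum, so they line up along the edge of the plot instead of silently disappearing. The function name and the 10% default are illustrative choices, and the values in the usage block are made up.

```python
import math


def fill_below_minimum(column, frac=0.1):
    """Return (filled, shadow) for one numeric column that uses NaN for missing values.

    shadow[i] is True where column[i] was missing. Missing entries are placed
    frac * (observed range) below the observed minimum; if all observed values
    are equal, the fill value coincides with them.
    """
    observed = [v for v in column if not math.isnan(v)]
    if not observed:
        raise ValueError("column has no observed values")
    lo, hi = min(observed), max(observed)
    fill = lo - frac * (hi - lo)
    shadow = [math.isnan(v) for v in column]
    filled = [fill if missing else v for v, missing in zip(column, shadow)]
    return filled, shadow


if __name__ == "__main__":
    nan = float("nan")
    air_temp = [20.0, nan, 30.0, 25.0]  # made-up values
    humidity = [80.0, 85.0, nan, 90.0]  # made-up values
    x, x_missing = fill_below_minimum(air_temp)
    y, y_missing = fill_below_minimum(humidity)
    # Points with either coordinate missing could be brushed in a different color.
    colors = ["red" if a or b else "black" for a, b in zip(x_missing, y_missing)]
    print(list(zip(x, y, colors)))
```

Keeping the shadow columns as ordinary variables means they can themselves be plotted and brushed, linking the pattern of missingness to the observed values.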
New Work: Inference
Summary
Applicable wherever data are collected: all areas of science, government, and the financial, retail, health, and telecommunications industries.
Web Pages
The authors can be contacted by email at:
dicook@iastate.edu
dfs@research.att.com
andreas@research.att.com
and the XGobi software can be downloaded from the XGobi web site:
http://www.research.att.com/areas/stat/xgobi/