1. Introduction: "Zoology" of Multivariate Data Visualization

1. Introduction
Multivariate data visualization concepts and tools
Applications
1-1
"Zoology" of Multivariate Data Visualization
⇒ "taxonomy" of visualization principles (Buja et al, 96).
Rendering: what to show in a plot.
Manipulation: what to do with the plots.
1-2
Rendering
Scatterplots: cases represented by locations of points, eg
dotplots, and scatterplots
Traces: cases represented as a function of a real parameter,
eg parallel coordinate plots (Inselberg, 85; Wegman, 90),
Andrews curves (Andrews, 72), tour curves (Peterson and
Cook, 99).
Glyphs: cases represented by complex symbols, eg trees and
castles (Kleiner and Hartigan, 81), stars (Newton, 78), Chernoff faces (Chernoff, 73).
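As an illustration of a trace, an Andrews curve maps a case x = (x1, ..., xp) to the function f(t) = x1/sqrt(2) + x2 sin(t) + x3 cos(t) + x4 sin(2t) + x5 cos(2t) + ..., for t between -pi and pi. The sketch below evaluates that function in plain Python. The function names and the sample values are illustrative assumptions, not part of the slides.

```python
import math

def andrews_curve(case, t):
    """Evaluate the Andrews curve of one case (a sequence of p values) at t."""
    total = case[0] / math.sqrt(2.0)
    for j, x in enumerate(case[1:], start=1):
        k = (j + 1) // 2  # frequency runs 1, 1, 2, 2, 3, 3, ...
        total += x * (math.sin(k * t) if j % 2 == 1 else math.cos(k * t))
    return total

def andrews_trace(case, n_points=101):
    """Sample the curve on an even grid over [-pi, pi] as (t, f(t)) pairs."""
    step = 2.0 * math.pi / (n_points - 1)
    return [(-math.pi + i * step, andrews_curve(case, -math.pi + i * step))
            for i in range(n_points)]

# Illustrative values only (assumption, not data from the slides).
trace = andrews_trace([1.0, 2.0, 3.0])
```

Plotting one such trace per case, overlaid on the same axes, gives the Andrews curve display.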
1-3
Manipulation
Finding Gestalt - Focusing Individual Views: choosing
variables, aspect ratio, zoom and pan, motion.
Posing Queries - Linking Multiple Views: coloring or highlighting a subset, brushing in one view, observing the result
in other views, eg linked brushing.
Making Comparisons - Arranging Many Views: mechanisms for shifting and reformatting the layout of multiple
plots, eg scatterplot matrix.
1-4
Focusing Individual Views
For scatterplots: Choose variables, aspect ratio, zoom and
pan, ...
For traces: Choose variables, ordering variables, scale and
aspect ratio, ...
For glyphs: Choose variables, mapping variables to glyph
features, layout of glyphs on plotting surface, ...
1-5
Adding Motion - Tours
(1) Grand Tour: Overview of data, continuous sequence of
low-dimensional projections of high-d space, all projections
equally likely to be shown (Asimov, 85; Buja and Asimov,
86).
⇒ information about the joint distribution
(2) Guided Tour: increases the probability of visiting the more
"interesting" views and decreases the probability of visiting
less interesting views (Hurley and Buja, 90; Cook et al, 95).
(3) Manual Control: provides mechanisms for manipulating
the contribution of one or more variables to a plot. It gives
fine-tuned "variable-centered" control (Cook and Buja, 97).
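A rough sketch of the idea behind the grand tour, under a strong simplification: it only draws a sequence of random 2-D projection planes, each equally likely, by orthonormalizing Gaussian vectors. A real grand tour also interpolates smoothly between successive planes, which is omitted here. The names `random_frame` and `tour_targets` are hypothetical, not XGobi functions.

```python
import math
import random

def random_frame(p, d=2, rng=random):
    """Return d orthonormal vectors in R^p (Gram-Schmidt on Gaussian draws)."""
    if d > p:
        raise ValueError("cannot have more projection vectors than variables")
    frame = []
    while len(frame) < d:
        v = [rng.gauss(0.0, 1.0) for _ in range(p)]
        for u in frame:  # remove the components along earlier vectors
            dot = sum(a * b for a, b in zip(v, u))
            v = [a - dot * b for a, b in zip(v, u)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-12:  # skip the (very unlikely) degenerate draw
            frame.append([a / norm for a in v])
    return frame

def tour_targets(p, n_views, seed=0):
    """Draw n_views random 2-D projection planes for p-dimensional data."""
    rng = random.Random(seed)
    return [random_frame(p, 2, rng) for _ in range(n_views)]

frames = tour_targets(p=5, n_views=3)
```

A guided tour would replace the uniform draw with one that favors planes scoring well on some index of interest.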
1-6
Notation for Projection Methods
Data (n observations, p variables) has matrix form as follows:
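$$
X = [X_1 \; X_2 \; \cdots \; X_p] =
\begin{bmatrix}
X_{11} & X_{12} & \cdots & X_{1p} \\
X_{21} & X_{22} & \cdots & X_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
X_{n1} & X_{n2} & \cdots & X_{np}
\end{bmatrix}_{n \times p}
$$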
1-7
Notation for Projection Methods
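A 1-D projection of the data onto a vector $\alpha_{p \times 1}$ takes the
form:
$$
X\alpha = [X^{\alpha}_1 \; X^{\alpha}_2 \; \cdots \; X^{\alpha}_n]'
= [\alpha_1 X_{11} + \alpha_2 X_{12} + \cdots + \alpha_p X_{1p} \;\; \cdots \;\; \alpha_1 X_{n1} + \alpha_2 X_{n2} + \cdots + \alpha_p X_{np}]'
$$
where $\|\alpha\| = \sqrt{\alpha_1^2 + \cdots + \alpha_p^2} = 1$. A 2-D projection of the
data can be generated by expanding $\alpha$ to $A_{p \times 2} = [\boldsymbol{\alpha}_1 \; \boldsymbol{\alpha}_2]$,
where the columns are orthonormal, $\boldsymbol{\alpha}_1' \boldsymbol{\alpha}_2 = 0$. Similarly this
notation can be expanded to represent d-D projections.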
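The projection notation can be checked numerically. The following is a minimal sketch in plain Python, assuming the data are stored as a list of n rows with p values each and the projection is given as a list of d vectors of length p. The helper names and the example numbers are assumptions made for illustration.

```python
import math

def normalize(alpha):
    """Scale alpha to unit length, so that ||alpha|| = 1."""
    norm = math.sqrt(sum(a * a for a in alpha))
    return [a / norm for a in alpha]

def is_orthonormal(A, tol=1e-9):
    """Check unit length and mutual orthogonality of the vectors in A."""
    for i, u in enumerate(A):
        for j, v in enumerate(A):
            dot = sum(a * b for a, b in zip(u, v))
            if abs(dot - (1.0 if i == j else 0.0)) > tol:
                return False
    return True

def project(X, A):
    """Return the n x d projection: entry (i, k) is the sum over j of X[i][j] * A[k][j]."""
    return [[sum(x * a for x, a in zip(row, alpha)) for alpha in A] for row in X]

# Illustrative data: 2 cases, 3 variables.
X = [[191.0, 131.0, 53.0],
     [185.0, 134.0, 50.0]]
A = [normalize([1.0, 1.0, 0.0]), [0.0, 0.0, 1.0]]  # two orthonormal vectors
Y = project(X, A)  # 2-D coordinates of each case
```

With d = 1 the same function gives the 1-D projection; larger d only adds vectors to `A`.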
1-8
Linking Multiple Views
Conditioning operation: Under the brush we have (X, Y) ∈ B
⇒ conditioning on variables X, Y (Newton, 78; McDonald,
82; Becker and Cleveland, 87).
Sectioning: Geometric sections with hyperplanes (Furnas and
Buja, 94).
Database query: Brushing interpreted in the logic of query
clauses, a1 < X < a2 and b1 < Y < b2 (Buja et al, 91;
Shneiderman, 94).
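The database-query reading of brushing can be sketched as a filter on case indices: a rectangular brush selects the cases with a1 < X < a2 and b1 < Y < b2, and linking means the same indices are highlighted in every other view. The function names, the color labels, and the example numbers below are assumptions for illustration; this is not XGobi's implementation.

```python
def brush(cases, a1, a2, b1, b2, x=0, y=1):
    """Return indices of cases with a1 < X < a2 and b1 < Y < b2.

    x and y are the column positions of the two brushed variables.
    """
    return [i for i, c in enumerate(cases)
            if a1 < c[x] < a2 and b1 < c[y] < b2]

def linked_colors(n_cases, selected, on="highlight", off="default"):
    """Per-case color labels that any other view of the same cases can reuse."""
    chosen = set(selected)
    return [on if i in chosen else off for i in range(n_cases)]

# Illustrative data: 4 cases, 3 variables.
cases = [(191, 131, 53), (185, 134, 50), (200, 137, 52), (173, 127, 50)]
selected = brush(cases, a1=180, a2=195, b1=130, b2=135)
colors = linked_colors(len(cases), selected)
```

Because the selection is carried by case index rather than by plot coordinates, a view of entirely different variables can display it unchanged.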
1-9
Arranging Many Views
Laying out marginal plots, eg scatterplot matrices.
Arranging conditional plots, defined by levels of a discrete or
discretized variable, eg trellis plots (Becker and Cleveland, ).
Organizing material according to similarity, using a cluster
algorithm or minimal spanning tree (Carr, 94; Eddy and
Mockus, 95)
1 - 10
Software: XGobi
Developed at Bellcore by Swayne, Cook, and Buja, beginning
1989 (Swayne et al, 98). Freely available from
www.research.att.com/areas/stat/xgobi/.
Data represented by scatterplots, and connected lines.
Linked brushing of points and lines across plots.
Univariate and bivariate plots, parallel coordinate plots, scatterplot matrices.
Dynamic plots - cycling, 3D rotations, tours.
High-dimensional drawing program
Connections to other software using Remote Procedure Calls
X Window System application
1 - 11
XGobi Layout
1 - 12
Inputting Data into XGobi
A host of input files, recognized by extension (filename as
stem file name):
.dat essential data file: all numeric, space-delimited
191 131 53
185 134 50
200 137 52
173 127 50
.col optional variable names: one label per line
tars1
tars2
head
.row optional case names: one label per line
Concinna
Concinna
Heptapot.
Heikert.
1 - 13
Inputting Data into XGobi
.colors optional colors file: color specification for each case
Green
Green
Yellow
Red
.glyphs optional glyphs file: symbol specification for each case
7
7
2
22
Usage:
xgobi [-mono] [-subset] [-only n/N] [-only a,n]
[-std mmx|msd|mmd] [-dev std_deviation] [-version]
[-scatmat] filename
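The input files described above are simple enough to generate from a script. The sketch below writes the `.dat`, `.col` and `.row` files for a given stem, following the formats listed on the slides (space-delimited numbers, one label per line). The function name `write_xgobi_files` and the stem `example` are hypothetical; the data and labels are the ones shown above.

```python
import os
import tempfile

def write_xgobi_files(stem, data, col_names=None, row_names=None):
    """Write stem.dat plus optional stem.col and stem.row; return the paths written."""
    paths = [stem + ".dat"]
    with open(paths[0], "w") as f:
        for row in data:
            f.write(" ".join(str(v) for v in row) + "\n")  # space-delimited numbers
    if col_names is not None:
        if len(col_names) != len(data[0]):
            raise ValueError("need one variable name per column")
        paths.append(stem + ".col")
        with open(paths[-1], "w") as f:
            f.write("\n".join(col_names) + "\n")  # one label per line
    if row_names is not None:
        if len(row_names) != len(data):
            raise ValueError("need one case name per row")
        paths.append(stem + ".row")
        with open(paths[-1], "w") as f:
            f.write("\n".join(row_names) + "\n")  # one label per line
    return paths

if __name__ == "__main__":
    data = [[191, 131, 53], [185, 134, 50], [200, 137, 52], [173, 127, 50]]
    with tempfile.TemporaryDirectory() as d:
        print(write_xgobi_files(os.path.join(d, "example"), data,
                                ["tars1", "tars2", "head"],
                                ["Concinna", "Concinna", "Heptapot.", "Heikert."]))
```

The stem passed to the function is the `filename` argument in the usage line above.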
1 - 14
Broad Variety of Applications
Many visualization programs are designed for special purposes. XGobi was designed to be very general, and have
broad applicability. The use of simple point and line plots
can create surprisingly complex pictures.
1 - 15
Schematic Illustrating Projection Methods
1 - 16
Scatterplot Matrix
[Figure: scatterplot matrix of the variables tars1, tars2, head, aede1, aede2, aede3.]
1 - 17
Parallel Coordinate Plot
1 - 18
Dendrogram Linked to Tour View
[Figure: dendrogram (axes: Merge Level, Objects) linked to a tour view of the variables palmitic, palmitoleic, oleic, linoleic.]
1 - 19
Bonferroni vs Scheffé Confidence Intervals in 3D
[Figure: 3-D view with axes Na, SweatRate, K.]
1 - 20
Results of a 2^4 Experimental Design: Hand-drawn and
XGobi Re-construction
1 - 21
Multiple Time Series
[Figure: multiple time series; series labels SB, Profit, Gilts.]
1 - 22
Regressions with 3 Explanatory Variables, Without and
With Interaction Terms
[Figure: two panels (without and with interaction terms); axes X1, X2, X3, Y.]
1 - 23
6D Dynamical System, Stable and Unstable
Trajectories (Qi et al, 98)
[Figure: two panels (stable and unstable trajectories); axes Var 1 to Var 6.]
1 - 24
Contours of Climate Rating Across the USA
1 - 25
What We Can Find With Graphics
With graphics we can often find features that we wouldn't
otherwise detect from numerical methods - small departures
from the trend, sparse structure in high-dimensions - and
we can refine numerical results and make them more interpretable.
1 - 26
Tipping Behavior
[Figure: multiple panels, each with axis label Tips.]
1 - 27
Tipping Behavior
[Figure: four panels of Total Tip vs Total Bill: Male Smokers, Female Smokers, Female Non-smokers, Male Non-smokers.]
1 - 28
CART vs Manual Tour Controls
[Figure: points labeled by class (1, 2, 3); axis labels eicosenoic, linoleic, arachidic, oleic.]
1 - 29
Sparse Structure in 7D
[Figure: two panels; axes X1 to X7.]
1 - 30