part 1- scatter plots and parallel coordinates

advertisement
Envisioning Information
Lecture 3 – Multivariate Data
Exploration
Scatter plots and parallel coordinates
Ken Brodlie
ENV 2006
3.1
Data Tables
•
Multivariate datasets can be
expressed as a data table
– Each entry in table is an
observation
– An observation consists of
values of a set of variables, or
variates
•
variables
Exercise
– Create a data table from the
MSc class…
observations
ENV 2006
A
B
C
1
..
..
..
2
..
..
..
3.2
Scatter Plot
•
•
For two variates, we have
already met the scatter plot
technique
It is useful for showing what
happens to one variable as
another changes…
ENV 2006
3.3
Scatter Plot
•
•
•
•
Visicube from Datamology is a
useful free charting tool
Here is an example scatter plot,
visualizing the speed of the
(receding) galaxy NGC7531
relative to the earth,
measurements of speed being
taken at different points on
galaxy
Circles represent
measurements at 133o to
horizon; pluses at 43o
What can you observe?
http://www.datamology.com/sample-S2.shtml
ENV 2006
3.4
3D Scatter Plot
•
Visicube has a tool specifically
for 3D scatter plots
•
Third variate expressed as a
vertical axis and widget lets you
take slices at different heights
•
Here we have same dataset but
X and Y are positions, and Z
axis is velocity … ie layered by
velocity – here 3rd layer (1482 –
1519 km/sec)
•
Observations less than 1500
km/sec highlighted in yellow
(almost allowing 4D)
•
Conclusion?
http://www.datamology.com/sample-S3.shtml
ENV 2006
3.5
3D Scatter Plots
XRT/3d
•
Here is an alternative
approach, using 3D
plotting…
•
… does this work?
ENV 2006
http://www.ist.co.uk/XRT/xrt3d.html
3.6
Extending to Higher Numbers of Variables
•
•
Additional variables can be
visualized by colour and shape
coding
IRIS Explorer ( a scientific
visualization system!) used to
visualize data from BMW
–
–
•
Five variables displayed using
spatial arrangement for three,
colour and object type for
others
Notice the clusters…
But there are clearly limits to
how much this will scale
Kraus & Ertl, U Stuttgart
http://wscg.zcu.cz/wscg2001/Papers_2001/R54.pdf
ENV 2006
3.7
Multivariate Visualization Techniques
•

Software:
– Xmdvtool
Matthew Ward
Techniques designed for
any number of variables
– Scatter plot matrices
– Parallel co-ordinates
– Glyph techniques
http://davis.wpi.edu/~xmdv
Acknowledgement:
Many of images in following
slides taken from Ward’s work
ENV 2006
3.8
What are these?
ENV 2006
3.9
Multivariate Visualization
•
Example of iris data set
– 150 observations of 4
variables (length, width of
petal and sepal)
– Check wikipedia for
explanations of petals &
sepals
– Techniques aim to display
relationships between
variables – the analytical
task
Challenge in visualization is to design the
visualization to match the analytical task
ENV 2006
3.10
Scatter Plot Matrices
ENV 2006
3.11
Scatter Plot Matrices
• For table data of M variables, we can look at pairs in 2D scatter
plots
• The pairs can be juxtaposed:
C
.
B
.
A
.
.
.
.
.
A
. . .
.
.
.
B
.
.
.
.
With luck, you may spot
correlations between pairs
as linear structures… or
you may observe clusters
.
C
ENV 2006
3.12
Scatter Plot Matrix – Iris Data Set
ENV 2006
3.13
Scatter Plot Matrix – Car Data Set
Data represents
7 aspects of cars:
what relationships
can we notice?
For example, what
correlates with
high MPG?
ENV 2006
3.14
Parallel Coordinates
A
B
C
D
E
F
- create M equidistant vertical axes, each corresponding to a
variable
- each axis scaled to [min, max] range of the variable
- each observation corresponds to a line drawn through point on
each axis corresponding to value of the variable
ENV 2006
3.15
Parallel Coordinates
A
B
C
D
E
F
- correlations may start to appear as the observations are
plotted on the chart
- here there appears to be negative correlation between
values of A and B for example
- this has been used for applications with thousands of data
items
ENV 2006
3.16
Parallel Coordinates – Iris Data
ENV 2006
3.17
Parallel Coordinates Example
Detroit homicide
data
7 variables
13 observations
1961 -1973
ENV 2006
3.18
Parallel Coordinates
•
•
•
Concept due to Alfred Inselberg
Conceived the idea as a
research student in 1959…
… idea gradually refined over
next 40 years
http://www.math.tau.ac.il/~aiisreal/
ENV 2006
3.19
Parallel Coordinates
•
•
•
•
•
Parallel coordinates is a clever
mechanism for transforming
geometry from one space to
another
To get a handle on the idea,
consider two variables X,Y
In parallel coordinates, a point
(X,Y) becomes… what?
A line becomes… what?
•
Use this space to sketch the
answers…
Why is the ordering of the axes
important?
ENV 2006
3.20
The Screen Space Problem
•
•
All techniques, sooner or
later, run out of screen space
Parallel co-ordinates
– Usable for up to 150
variates
– Unworkable greater than
250 variates
Remote sensing: 5 variates, 16,384 observations)
ENV 2006
3.21
Brushing as a Solution
•
•
Brushing selects a restricted
range of one or more variables
Selection then highlighted
ENV 2006
3.22
Scatter Plot
Use of a
‘brushing’ tool
can highlight
subsets of data
..now we can see
what correlates
with high MPG
ENV 2006
3.23
Parallel Coordinates
Brushing picks
out the high MPG
data
Can you observe
the same relations
as with scatter
plots?
More or less easy?
ENV 2006
3.24
Parallel Coordinates
Here we highlight
high MPG and
not 4 cylinders
ENV 2006
3.25
Download