Data Visualization - A Very Rough Guide Ken Brodlie University of Leeds

SDMIV
What is This Thing Called Visualization?

• Visualization
  – “Use of computer-supported, interactive, visual representations of data to amplify cognition” (Card, Mackinlay, Shneiderman)
  – Born as a discipline in 1987 with publication of the NSF Report
  – Now widely used in computational science and engineering

Image: Vis5D

Visualization – Twin Subjects

• Scientific Visualization
  – Visualization of physical data
  – Image: ozone layer around earth
• Information Visualization
  – Visualization of abstract data
  – Image: automobile web site, visualizing links

Scientific Visualization – Another Characterisation

• Focus is on visualizing an entity measured in a multi-dimensional space
  – 1D
  – 2D
  – 3D
  – Occasionally nD
• Underlying field is recreated from the sampled data
• Relationship between variables well understood – some independent, some dependent

http://pacific.commerce.ubc.ca/xr/plot.html
Image from D. Bartz and M. Meissner

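As a minimal sketch of what recreating the underlying field from sampled data can mean, the Python fragment below interpolates a 1D field between its sample points. The sample positions and values are made up for illustration, and piecewise-linear interpolation is only one assumed choice of model.

```python
from bisect import bisect_right

# Hypothetical 1D samples: positions and the values measured there.
sample_x = [0.0, 1.0, 2.0, 3.0]
sample_f = [0.0, 2.0, 1.0, 3.0]

def reconstruct(x):
    """Estimate the field at x by piecewise-linear interpolation."""
    # Clamp to the sampled range rather than extrapolating.
    if x <= sample_x[0]:
        return sample_f[0]
    if x >= sample_x[-1]:
        return sample_f[-1]
    i = bisect_right(sample_x, x) - 1  # interval [x_i, x_{i+1}) containing x
    t = (x - sample_x[i]) / (sample_x[i + 1] - sample_x[i])
    return (1 - t) * sample_f[i] + t * sample_f[i + 1]

# Evaluate the reconstructed field on a finer grid than the samples.
fine_x = [0.5 * k for k in range(7)]
fine_f = [reconstruct(x) for x in fine_x]
```

The same idea carries over to 2D and 3D grids, where the interpolant becomes bilinear or trilinear.
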
Scientific Visualization Model

• Visualization represented as pipeline:
  – Read in data
  – Build model of underlying entity
  – Construct a visualization in terms of geometry
  – Render geometry as image
• Realised as modular visualization environment
  – IRIS Explorer
  – IBM Open Visualization Data Explorer (DX)
  – AVS

Diagram: data → model → visualize → render

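The four pipeline stages can be sketched as functions chained together, each consuming the output of the one before. This is an illustrative toy in Python, not how IRIS Explorer, DX or AVS are implemented; all function names and the hard-coded data are assumptions.

```python
def read_data():
    """Read stage: return raw samples (hard-coded here for illustration)."""
    return [(0, 1.0), (1, 3.0), (2, 2.0)]

def build_model(samples):
    """Model stage: wrap the samples as a lookup from position to value."""
    return dict(samples)

def visualize(model):
    """Visualize stage: turn the model into geometry (a list of 2D points)."""
    return [(float(x), y) for x, y in sorted(model.items())]

def render(geometry):
    """Render stage: produce an 'image' (here, a crude text bar per point)."""
    return ["#" * int(round(y)) for _, y in geometry]

def run_pipeline(stages):
    """Run the stages in order, feeding each output to the next stage."""
    result = stages[0]()
    for stage in stages[1:]:
        result = stage(result)
    return result

image = run_pipeline([read_data, build_model, visualize, render])
```

Because each stage only depends on the output of its predecessor, any one of them can be swapped for another module with the same input and output types.
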
Extending the SciVis Model

• The dataflow model has proved extremely flexible
• Provides basis of collaborative visualization
  – Implemented in IRIS Explorer as the COVISA toolkit
• Extensible
  – User code introduced as module in pipeline allows computational steering

Diagram (collaboration): data → model → visualize → render, with a collaborative server linking a second render stage over the internet
Diagram (steering): control → simulate → visualize → render

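Computational steering can be illustrated with a toy loop in which a control input changes a simulation parameter while the run is in progress. The sketch below is a Python illustration under assumed names (`simulate_step`, `run_steered`, `rate`); it does not reflect the IRIS Explorer module interface.

```python
def simulate_step(state, rate):
    """One step of a toy simulation: advance the state by the current rate."""
    return state + rate

def run_steered(steps, controls):
    """Run the simulation, applying any control change before each step.

    controls maps a step index to a new parameter value, standing in for
    a user adjusting the simulation while it runs.
    """
    state, rate = 0.0, 1.0
    history = []
    for step in range(steps):
        if step in controls:       # control module updates the parameter
            rate = controls[step]
        state = simulate_step(state, rate)
        history.append(state)      # would be passed on to visualize and render
    return history

history = run_steered(5, {2: 3.0})
```

In a real steering set-up the control values would arrive interactively rather than from a prepared dictionary.
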
An e-Science Demonstrator

• Emergency scenario: release of toxic chemical
  – Simulation launched on Grid resource, steered from desktop using IRIS Explorer
  – Collaborators linked in remotely using COVISA toolkit

Image: dispersion of pollutant studied under varying wind directions
Image: a collaborator links in over the network

Other Metaphors

• Other user interface metaphors have been suggested
• Spreadsheet interface becoming popular
• Allows audit trail of visualizations

Image: Jankun-Kelly and Ma

Information Visualization

• Focus is on visualizing set of observations that are multi-variate
• Example of iris data set
  – 150 observations of 4 variables (length, width of petal and sepal)
  – Techniques aim to display relationships between variables

Dataflow for Information Visualization

• Again we can express as a dataflow – but emphasis now is on data itself rather than underlying entity
• First step is to form the data into a table of observations, each observation being a set of values of the variables
• Then we apply a visualization technique as before

Diagram: data → data table → visualize → render

              variables
observations  A    B    C
1             ..   ..   ..
2             ..   ..   ..

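Forming the data table can be sketched as grouping raw records into one row per observation and one column per variable. The record layout and values below are hypothetical, chosen only to mirror the A, B, C table on the slide.

```python
# Hypothetical raw records: (observation id, variable name, value).
raw = [
    (1, "A", 5.1), (1, "B", 3.5), (1, "C", 1.4),
    (2, "A", 4.9), (2, "B", 3.0), (2, "C", 1.4),
]

def build_table(records):
    """Group records into one row per observation, one column per variable."""
    variables = sorted({name for _, name, _ in records})
    rows = {}
    for obs, name, value in records:
        rows.setdefault(obs, {})[name] = value
    # Each row lists the values in variable order; None marks a missing value.
    table = {obs: [vals.get(v) for v in variables]
             for obs, vals in sorted(rows.items())}
    return variables, table

variables, table = build_table(raw)
```

Once the data is in this form, any of the multivariate techniques can be applied to the rows without knowing where the values came from.
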
Multivariate Visualization

• Techniques designed for any number of variables
  – Glyph techniques
  – Parallel co-ordinates
  – Scatter plot matrices
  – Pixel-based techniques
• Software:
  – Xmdvtool (Matthew Ward)
  – …and also IRIS Explorer!

Acknowledgement: many of the images in the following slides are taken from Ward’s work

Glyph Techniques

• Star plots
  – Each observation represented as a ‘star’
  – Each spike represents a variable
  – Length of spike indicates the value
• Variety of possible glyphs
  – Chernoff faces

Image: crime in Detroit

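The star plot construction can be sketched as computing one spike end point per variable. The Python below is an illustration only: it assumes non-negative values, scales each spike by a supplied per-variable maximum, and spreads the spikes evenly round the centre; the function name and parameters are hypothetical.

```python
import math

def star_glyph(values, maxima, centre=(0.0, 0.0), radius=1.0):
    """Return the spike end points of a star glyph for one observation.

    values: one value per variable; maxima: the largest value of each
    variable, used to scale spike lengths into [0, radius].
    """
    m = len(values)
    cx, cy = centre
    points = []
    for k, (v, vmax) in enumerate(zip(values, maxima)):
        angle = 2.0 * math.pi * k / m  # spikes spread evenly round the centre
        length = radius * v / vmax     # spike length encodes the value
        points.append((cx + length * math.cos(angle),
                       cy + length * math.sin(angle)))
    return points

spikes = star_glyph([2.0, 1.0, 4.0, 3.0], maxima=[4.0, 4.0, 4.0, 4.0])
```

Drawing a line from the centre to each returned point gives the star for that observation; repeating this per observation gives the full plot.
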
Parallel Co-ordinates

• Each variate represented as vertical axis
• Axes laid out uniformly
• Observation represented as a polyline traversing all M axes, crossing each axis at the observed value of the variate

Image: Detroit homicide data (7 variables, 13 observations)

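The polyline for a single observation can be sketched as one vertex per axis. The Python below is a minimal illustration assuming at least two variates and a known minimum and maximum for each; the function and parameter names are hypothetical.

```python
def polyline(observation, minima, maxima, width=1.0, height=1.0):
    """Return the (x, y) vertices of one observation's polyline.

    Axis j is the vertical line x = j * width / (M - 1); the observation
    crosses it at a height proportional to its value on that variate.
    """
    m = len(observation)
    vertices = []
    for j, (v, lo, hi) in enumerate(zip(observation, minima, maxima)):
        x = j * width / (m - 1)            # axes laid out uniformly
        y = height * (v - lo) / (hi - lo)  # scale the value onto the axis
        vertices.append((x, y))
    return vertices

line = polyline([5.0, 0.0, 30.0],
                minima=[0.0, 0.0, 10.0],
                maxima=[10.0, 1.0, 50.0])
```

Scaling each axis by its own range is what lets variates with very different units share one display.
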
Scatter Plot Matrices

• Matrix of 2D scatter plots
  – Each plot shows projection of data onto a 2D subspace of the variates
  – Order M² plots

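To see where the M² count comes from, the panels can be enumerated as every ordered pair of variates. The Python sketch below assumes the data is a list of observations, each a list of M values; the names are illustrative.

```python
from itertools import product

def scatter_matrix_panels(table):
    """Return {(i, j): points} for every ordered pair of variates.

    table is a list of observations, each a list of M values. Panel (i, j)
    plots variate j on the horizontal axis against variate i on the vertical.
    """
    m = len(table[0])
    return {
        (i, j): [(row[j], row[i]) for row in table]
        for i, j in product(range(m), repeat=2)
    }

data = [[1.0, 10.0, 100.0], [2.0, 20.0, 200.0]]
panels = scatter_matrix_panels(data)
```

With M = 3 this yields 9 panels, including the 3 diagonal ones that plot a variate against itself.
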
The Screen Space Problem

• All techniques, sooner or later, run out of screen space
• Parallel coordinates
  – Usable for up to 150 variates
  – Unworkable for more than 250 variates

Image: remote sensing data (5 variates, 16,384 observations)

Brushing as a Solution

• Brushing selects a restricted range of one or more variables
• Selection then highlighted

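Brushing amounts to a range filter over the table of observations. The Python sketch below returns the indices of the observations to highlight; the dictionary-based table layout and the function name are assumptions made for illustration.

```python
def brush(table, ranges):
    """Return the indices of observations that fall inside every brushed range.

    table: list of observations, each a dict of variable name -> value.
    ranges: dict of variable name -> (low, high), for the brushed variables only.
    """
    selected = []
    for index, row in enumerate(table):
        if all(low <= row[name] <= high
               for name, (low, high) in ranges.items()):
            selected.append(index)
    return selected

observations = [
    {"A": 1.0, "B": 7.0},
    {"A": 4.0, "B": 2.0},
    {"A": 5.0, "B": 9.0},
]
highlighted = brush(observations, {"A": (3.0, 6.0)})
```

Brushing a second variable narrows the selection further, for example `brush(observations, {"A": (3.0, 6.0), "B": (5.0, 10.0)})` keeps only the last observation.
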
Clustering as a Solution

• Success has been achieved through clustering of observations
• Hierarchical parallel co-ordinates
  – Cluster by similarity
  – Display using translucency and proximity-based colour

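Clustering by similarity can be sketched with a naive agglomerative scheme that repeatedly merges the two closest clusters. The slides do not say which clustering method is used, so the single-linkage Euclidean distance below is an assumed stand-in, and the data points are invented.

```python
import math

def cluster(points, k):
    """Merge the two closest clusters repeatedly until k clusters remain.

    Closeness is single linkage: the smallest Euclidean distance between
    any member of one cluster and any member of the other.
    """
    clusters = [[i] for i in range(len(points))]

    def gap(a, b):
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > k:
        # Find the closest pair of clusters (p < q) and merge q into p.
        p, q = min(
            ((p, q) for p in range(len(clusters))
             for q in range(p + 1, len(clusters))),
            key=lambda pq: gap(clusters[pq[0]], clusters[pq[1]]),
        )
        clusters[p].extend(clusters.pop(q))
    return [sorted(c) for c in clusters]

points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (0.0, 0.2)]
groups = cluster(points, 2)
```

Recording the order of the merges, rather than stopping at k, would give the hierarchy that a hierarchical display navigates.
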
Hierarchical Parallel Coordinates
Reduction of Dimensionality of Variate Space

• Reduce number of variables, preserve information
• Principal Component Analysis
  – Transform to new coordinate system
  – Hard to interpret
• Hierarchical reduction of variate space
  – Cluster variables where distance between observations is typically small
  – Choose representative for each cluster

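The transform to a new coordinate system can be illustrated by finding the first principal component, the direction of greatest variance. The Python sketch below uses power iteration on the sample covariance matrix, which is one assumed way to obtain the leading eigenvector; the data values are invented and roughly follow y = 2x.

```python
import math

def first_principal_component(table, iterations=100):
    """Return the unit direction of greatest variance and the centred data."""
    n, m = len(table), len(table[0])
    means = [sum(row[j] for row in table) / n for j in range(m)]
    centred = [[row[j] - means[j] for j in range(m)] for row in table]
    # Sample covariance matrix of the variates.
    cov = [[sum(r[a] * r[b] for r in centred) / (n - 1) for b in range(m)]
           for a in range(m)]
    # Power iteration: repeated multiplication by the covariance matrix
    # converges to its leading eigenvector.
    v = [1.0] * m
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(m)) for a in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v, centred

def project(centred, direction):
    """Coordinate of each centred observation along the given direction."""
    return [sum(x * d for x, d in zip(row, direction)) for row in centred]

data = [[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.0]]
pc1, centred = first_principal_component(data)
scores = project(centred, pc1)
```

Here two variates collapse to a single score per observation. The new axis is a mixture of the original variables, which is why the result can be hard to interpret.
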
Using a Dataflow System for Information Visualization

• IRIS Explorer used to visualize data from BMW
  – Five variables displayed using spatial arrangement for three, colour and object type for others
  – Notice the clusters…
• More later…

Image: Kraus & Ertl

Scientific Visualization – Information Visualization

Scientific Visualization
• Focus is on visualizing an entity measured in a multi-dimensional space
• Underlying field is recreated from the sampled data
• Relationship between variables well understood

Information Visualization
• Focus is on visualizing set of observations that are multi-variate
• There is no underlying field – it is the data itself we want to visualize
• The relationship between variables is not well understood