A Rank-by-Feature Framework for Interactive Multi-dimensional Data Exploration

Jinwook Seo and Ben Shneiderman
Human-Computer Interaction Lab. &
Department of Computer Science
University of Maryland, College Park
Hierarchical Clustering Explorer (HCE)
“HCE enabled us to find important clusters
that we don’t know about yet.”
Goal: Find Interesting Features in
Multidimensional Data
• Finding correlations, clusters, outliers, gaps,
… is difficult in multidimensional data
– Cognitive difficulties in >3D
• Therefore utilize low-dimensional projections
– Perceptual efficiency in 1D and 2D
– Use Rank-by-Feature Framework to guide discovery
Do you see anything interesting?
Do you see any interesting feature?
[Figure: scatter plot; x-axis: Ionization Energy]
Correlation…What else?
[Figure: scatter plot; x-axis: Ionization Energy]
Outliers
[Figure: scatter plot; x-axis: Ionization Energy; points He and Rn labeled as outliers]
Demonstration
• Breakfast Cereals
– 77 cereals
– 8 dimensions (or variables) : sugar, potassium,
fiber, protein, etc.
• US counties census data
– 3138 counties
– 14 dimensions : population density, poverty
level, unemployment, etc.
Low-dimensional Projections
• Techniques
– General
• combination of variables for an axis (e.g., X1+2X2, -2X1+X2)
– Axis parallel
• a variable for an axis (e.g., X1, X3)
• Number of projections
• Interface for Exploration
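To make the two techniques and the "number of projections" point concrete, here is a minimal Python sketch. The helper names (`axis_parallel_projections`, `project`) and the sample point are illustrative assumptions, not part of HCE. With n variables there are n axis-parallel 1D projections and n(n-1)/2 axis-parallel 2D projections, whereas general projections are unbounded in number.

```python
from itertools import combinations

def axis_parallel_projections(variables):
    """List every 1D and 2D axis-parallel projection of the given variables."""
    one_d = list(variables)                   # one variable per axis
    two_d = list(combinations(variables, 2))  # unordered pairs of variables
    return one_d, two_d

def project(point, weights):
    """General linear projection: an axis value as a weighted sum of variables."""
    return sum(w * x for w, x in zip(weights, point))

variables = ["X1", "X2", "X3"]
one_d, two_d = axis_parallel_projections(variables)
print(len(one_d), len(two_d))  # n and n*(n-1)/2

point = (1.0, 2.0, 3.0)                    # hypothetical data item (X1, X2, X3)
general_axis = project(point, (1, 2, 0))   # general axis: X1 + 2*X2
parallel_axis = project(point, (0, 0, 1))  # axis-parallel axis: X3 only
```

For the 14-dimensional census data mentioned earlier, the same count gives 14 histograms and 91 scatterplots, which is small enough to enumerate and rank exhaustively.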
Exploration by Projections
• XGobi, GGobi – Scatterplot Browsing
www.research.att.com/areas/stat/xgobi/
www.ggobi.org
Exploration by Projections
• Spotfire DecisionSite – Scatterplots
www.spotfire.com
Exploration by Projections
• XGobi, GGobi – Grand Tour
Exploration by Projections
• XmdvTool – Scatterplot Matrix
Worcester Polytechnic Institute
Square Matrix Display
Corrgram by Michael Friendly
Dimension selection tool
in GeoVISTA studio
by Alan M. MacEachren
Exploration by Projections
• Spotfire DecisionSite – View Tip orders scatterplots
Design Considerations
• Hard to interpret arbitrary linear projections
→ Axis-parallel projections
• Interestingness depends on applications
→ Incorporate users’ interest
• Overview of all possible projections
• Rapid change of axis
Demonstration
• Breakfast Cereals
– 77 cereals
– 11 dimensions (or variables) : sugar, potassium,
fiber, protein, etc.
• US counties census data
– 3138 counties
– 14 dimensions : population density, poverty
level, unemployment, etc.
Rank-by-Feature Framework: 1D
• Ranking Criterion
• Rank-by-Feature Prism
• Score List
• Manual Projection Browser
Rank-by-Feature Framework: 2D
• Ranking Criterion
• Rank-by-Feature Prism
• Score List
• Manual Projection Browser
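The 2D workflow can be sketched roughly as follows: choose a ranking criterion, score every axis-parallel 2D projection with it, and sort the results into a score list. This is only an illustration under stated assumptions. The function names (`pearson`, `rank_2d`) and the tiny data table are made up for the example and do not reflect HCE's actual implementation.

```python
from itertools import combinations
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column has no defined correlation; score it 0
    return cov / (sx * sy)

def rank_2d(columns, criterion):
    """Score every pair of columns and return the pairs sorted by score, highest first."""
    scores = [((a, b), criterion(columns[a], columns[b]))
              for a, b in combinations(columns, 2)]
    return sorted(scores, key=lambda item: item[1], reverse=True)

# Hypothetical values; only the variable names come from the cereal example.
columns = {
    "sugar":     [1.0, 2.0, 3.0, 4.0],
    "potassium": [2.0, 4.0, 6.0, 8.0],
    "fiber":     [4.0, 3.0, 2.0, 1.0],
}
score_list = rank_2d(columns, pearson)
for pair, score in score_list:
    print(pair, round(score, 3))
```

Passing the criterion as a function keeps the ranking step independent of any one notion of interestingness, so other criteria can be swapped in without changing `rank_2d`.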
A Ranking Example
3138 U.S. counties with 17 attributes
Ranking Criterion: Uniformity (entropy) (6.7, 6.1, 4.5, 1.5)
Ranking Criterion: Pearson correlation (0.996, 0.31, 0.01, -0.69)
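A uniformity score for a 1D projection can be sketched as the entropy of its histogram: the more evenly the values spread across bins, the higher the score. The bin count, the base-2 logarithm, and the name `histogram_entropy` are assumptions made for this example; the slides do not state the exact formula HCE uses.

```python
from math import log2

def histogram_entropy(values, bins=8):
    """Entropy (in bits) of an equal-width histogram of the values."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # all values identical: a single occupied bin
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)  # clamp the maximum into the last bin
        counts[idx] += 1
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in counts if c)

uniform = histogram_entropy(list(range(16)), bins=4)     # evenly spread values
skewed = histogram_entropy([0] * 15 + [15], bins=4)      # nearly all in one bin
print(uniform > skewed)
```

With this definition the maximum possible score is log2(bins), so scores are comparable across variables only when the same bin count is used for all of them.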
Ongoing and Future Work
• Identify & implement more ranking criteria
– Gaps, outliers, etc.
• Ranking based on users’ selection of items
– Separability of the selected items
– Ranking by using only the selected items
• Scalability Issue
– How to handle a large number of dimensions
– Grouping by clustering dimensions
– Filtering uninteresting entries in the prism
More about HCE
• In collaboration with, and sponsored by, Eric
Hoffman (Children’s National Medical Center)
• Freely downloadable at
www.cs.umd.edu/hcil/hce
• Version 3.0 beta, May 2004
• About 2,000 downloads since April 2002
• Licensing to ViaLactia Biosciences (NZ) Ltd.
More Applications?
• Try HCE and the Rank-by-Feature
Framework with your problems and data
• Join the case studies on the use of HCE and
the Rank-by-Feature Framework
• Suggestions and comments are welcome
Thank you!