Enhancing Set-Analysis through Scalable Visualizations Hamid Haidarian Shahri Mudit Agrawal

Enhancing Set-Analysis through Scalable Visualizations
Presented by:
Hamid Haidarian Shahri (hamid@cs.umd.edu)
Mudit Agrawal (mudit@cs.umd.edu)
May 09, 2006
CMSC 838S
Information Visualization Spring 2006
Content

Problem Definition
Motivation
Dataset
Architecture
Visualization Methods
Interaction Tools
Demo
Future Work
Problem Definition

Analysis of sets by
    representing the clusters graphically
    depicting their internal and external links
Scaling the visualization
Motivation

Sets are encountered in various domains
    websites
    commodities
    publications
    anything that has attributes!
Visualization of sets to aid human perception is still an unsolved problem
    no direct relations between sets (or their elements) in the spatial domain
    sets can be grouped based on various attributes
Dataset

2700 law cases

Each case is identified by a numerical id ranging from 1000 to 3718

Each tuple in the dataset represents one case referencing another

The relation is unidirectional and not symmetric (the referencing also implies a temporal constraint on the cases)
Snapshot of the data
First 50 links (approximately 0.1 percent of whole dataset)
(1001,1105,'100 S.Ct. 318'),(1001,1612,'101
S.Ct. 2352'),(1001,1018,'107 S.Ct. 1232'),(1001,1016,'112
S.Ct. 2886'),(1001,2923,'113 S.Ct. 2264'),(1001,1016,'120 L.Ed.2d 798'),(1001,2923,'124 L.Ed.2d
539'),(1001,2286,'138 F.3d 1036'),(1001,2396,'238 F.3d 382'),(1001,3410,'438 U.S.
104'),(1001,1105,'444 U.S. 51'),(1001,1612,'452 U.S. 264'),(1001,1018,'480 U.S. 470'),(1001,1016,'505
U.S. 1003'),(1001,2923,'508 U.S. 602'),(1001,3410,'57 L.Ed.2d 631'),(1001,1105,'62 L.Ed.2d
210'),(1001,1612,'69 L.Ed.2d 1'),(1001,1789,'926 F.2d 1169'),(1001,1018,'94 L.Ed.2d
472'),(1001,3410,'98 S.Ct. 2646'),(1002,1276,'100 S.Ct. 2138'),(1002,1101,'105 S.Ct.
3108'),(1002,1018,'107 S.Ct. 1232'),(1002,1098,'107 S.Ct. 2378'),(1002,1016,'112 S.Ct.
2886'),(1002,1015,'114 S.Ct. 2309'),(1002,1016,'120 L.Ed.2d 798'),(1002,1013,'121 S.Ct.
2448'),(1002,1012,'122 S.Ct. 1465'),(1002,1015,'129 L.Ed.2d 304'),(1002,2316,'142 F.3d
1319'),(1002,1013,'150 L.Ed.2d 592'),(1002,1012,'152 L.Ed.2d 517'),(1002,1121,'266 F.3d
487'),(1002,3028,'306 F.3d 113'),(1002,3410,'438 U.S. 104'),(1002,1276,'447 U.S. 255'),(1002,1101,'473
U.S. 172'),(1002,1018,'480 U.S. 470'),(1002,1098,'482 U.S. 304'),(1002,1016,'505 U.S.
1003'),(1002,1015,'512 U.S. 374'),(1002,1013,'533 U.S. 606'),(1002,1012,'535 U.S. 302'),(1002,3410,'57
L.Ed.2d 631'),(1002,2091,'59 F.3d 852'),(1002,1276,'65 L.Ed.2d 106'),(1002,1889,'746 F.2d
135'),(1002,1101,'87 L.Ed.2d 126'),(1002,1018,'94 L.Ed.2d 472'),(1002,2319,'953 F.2d
1299'),(1002,1098,'96 L.Ed.2d 250'),(1002,3410,'98 S.Ct. 2646'),(1002,1022,'980 F.2d
84'),(1002,2670,'989 F.2d 362'),(1003,1104,'100 S.Ct. 383'),(1003,1611,'104 S.Ct.
2862'),(1003,1100,'106 S.Ct. 1018'),(1003,1099,'107 S.Ct. 2076'),(1003,1016,'112 S.Ct.
2886'),(1003,3110,'116 S.Ct. 2432'),(1003,1016,'120 L.Ed.2d 798'),(1003,1012,'122 S.Ct.
1465'),(1003,1881,'13 F.3d 1192'),(1003,3054,'133 F.3d 893'),(1003,3110,'135 L.Ed.2d
964'),(1003,1012,'152 L.Ed.2d 517'),(1003,1047,'18 F.3d 1560'),(1003,1886,'265 F.3d
1237'),(1003,2689,'271 F.3d 1090'),(1003,1358,'271 F.3d 1327'),(1003,1149,'28 F.3d
1171'),(1003,1040,'331 F.3d 891')
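A minimal sketch of how tuples like those in the snapshot could be turned into one reference set per case. The tuple layout (citing id, cited id, citation string) is an assumption read off the snapshot, and `build_reference_sets` is a hypothetical helper, not part of the presented system.

```python
import re
from collections import defaultdict

# Assumed layout, inferred from the snapshot: (citing_id, cited_id, 'citation string').
# [^']* also matches line breaks, so wrapped citation strings still parse.
TUPLE_RE = re.compile(r"\((\d+),(\d+),'([^']*)'\)")

def build_reference_sets(raw):
    """Map each citing case id to the set of case ids it references."""
    refs = defaultdict(set)
    for citing, cited, _citation in TUPLE_RE.findall(raw):
        # Parallel citations of the same case collapse into one set element.
        refs[int(citing)].add(int(cited))
    return dict(refs)

sample = (
    "(1001,1105,'100 S.Ct. 318'),(1001,1612,'101\nS.Ct. 2352'),"
    "(1001,1105,'444 U.S. 51'),(1002,1276,'100 S.Ct. 2138')"
)
reference_sets = build_reference_sets(sample)
```

Each case then becomes a set of referenced ids, which is the representation the similarity measures operate on.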
Architecture
[Diagram: Data → Clustering Module → Clustered Data → Visualization Module; the Similarity Metric feeds the Clustering Module]
Routine K-Means Clustering

Objective: $V = \sum_{i=1}^{k} \sum_{j \in S_i} \lVert x_j - \mu_i \rVert^2$
Data points are in vector space.
$x_j$ and $\mu_i$ are vectors.
This assumption does not hold for cases represented as sets.
Centroids are not simple geometric means.
In fact, a mean does not make any sense.
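To make the vector-space assumption concrete, here is a small sketch of the routine K-means objective on 2-D points. The function names and the toy data are illustrative assumptions. The `centroid` step is exactly what has no counterpart for a case represented as a set of referenced ids.

```python
def centroid(points):
    """Component-wise mean; only defined because the points are vectors."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def kmeans_objective(clusters):
    """Sum of squared distances from each point to the centroid of its cluster."""
    total = 0.0
    for points in clusters:
        mu = centroid(points)
        for x in points:
            total += sum((a - b) ** 2 for a, b in zip(x, mu))
    return total

# Two toy clusters of 2-D points (hypothetical data).
clusters = [[(0.0, 0.0), (2.0, 0.0)], [(5.0, 5.0), (5.0, 7.0)]]
objective = kmeans_objective(clusters)  # centroids are (1, 0) and (5, 6)
```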
Routine Self Organizing Map

$W_v$ and $D$ are assumed to be vectors.

$W_v(t + 1) = W_v(t) + \Theta(t)\,\alpha(t)\,[D(t) - W_v(t)]$

This assumption does not hold for cases represented as sets.
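A sketch of one step of the update rule above for a single map node, with the neighborhood value Θ(t) and learning rate α(t) passed in as plain numbers. `som_update` and its parameter names are hypothetical. The subtraction `d - w` is where the vector assumption enters.

```python
def som_update(weight, sample, neighborhood, learning_rate):
    """Move the node's weight vector toward the input sample by Theta * alpha."""
    step = neighborhood * learning_rate
    return [w + step * (d - w) for w, d in zip(weight, sample)]

# One update of a 2-D weight vector toward the input (1, 2).
updated = som_update([0.0, 0.0], [1.0, 2.0], neighborhood=1.0, learning_rate=0.5)
```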
Similarity Measures

Jaccard similarity: $J(A, B) = \frac{|A \cap B|}{|A \cup B|}$

Reference-based similarity: $S(A, B) = |A \cap B|$

Weighted reference-based similarity: $WS(A, B) = \frac{\sum_{x \in A \cap B} f(x)}{|A \circ B|}$ (the set operator $\circ$ in the denominator is not legible in the source slide)
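The three measures can be sketched on Python sets as follows. This is an illustration under assumptions: the reference-based measure is taken to be the number of shared references, the weight function `weight` stands in for the slide's f(x), and the weighted variant is left unnormalised because the normalisation is not clear from the slide.

```python
def jaccard(a, b):
    """Shared references divided by all references; 0.0 for two empty sets (a convention)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def reference_based(a, b):
    """Number of references the two cases share (assumed reading of S)."""
    return len(a & b)

def weighted_reference_based(a, b, weight):
    """Sum of a weight over the shared references; normalisation omitted."""
    return sum(weight(x) for x in a & b)

# Hypothetical reference sets and weights for two cases.
case_a = {1105, 1612, 1018, 1016}
case_b = {1276, 1018, 1016, 3410}
weights = {1016: 0.5, 1018: 0.25}

j = jaccard(case_a, case_b)
s = reference_based(case_a, case_b)
ws = weighted_reference_based(case_a, case_b, lambda x: weights.get(x, 0.0))
```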
Contribution to clustering

Applying K-means and SOM to produce better visualizations

It is not apparent at first glance, but these algorithms are not directly applicable to set visualization

They assume a 2-D or n-D vector representation for each data point (i.e. law case). More specifically, the attributes must form a vector space.

This assumption does not hold
    no clear geometric attribute corresponds to the dataset
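One common way to cluster items that only have a pairwise similarity is k-medoids, where each cluster is represented by one of its own members instead of a mean. The sketch below is that swapped-in technique with Jaccard distance, not necessarily the adaptation used in this project. The initialisation and the toy data are assumptions.

```python
def jaccard_distance(a, b):
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0

def assign(items, medoids):
    """Attach every item id to its nearest medoid (ties go to the first medoid)."""
    clusters = {m: [] for m in medoids}
    for i in sorted(items):
        nearest = min(medoids, key=lambda m: jaccard_distance(items[i], items[m]))
        clusters[nearest].append(i)
    return clusters

def k_medoids(items, k, max_iter=100):
    """items: dict of id -> set. Returns dict of medoid id -> member ids."""
    medoids = sorted(items)[:k]  # naive deterministic start (an assumption)
    for _ in range(max_iter):
        clusters = assign(items, medoids)
        # New medoid: the member with the smallest total distance to its cluster.
        new_medoids = sorted(
            min(members, key=lambda c: sum(
                jaccard_distance(items[c], items[o]) for o in members))
            if members else m
            for m, members in clusters.items()
        )
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return assign(items, medoids)

# Hypothetical cases, each given as the set of ids it references.
cases = {
    1: {10, 11, 12}, 2: {10, 11, 13}, 3: {10, 11, 12, 13},
    4: {20, 21}, 5: {20, 21, 22}, 6: {21, 22},
}
result = k_medoids(cases, k=2)
```

No mean is ever computed; only the similarity metric is needed, which matches the role it plays in the architecture.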
Similarity Metrics → Geometric Metrics

1-D Partitioning
2-D Partitioning
Sequential arrangement
Distance-based arrangement
[Figure: 4 x 4 grid of cells numbered 1 to 16, extracted as 1 2 5 9 / 3 4 7 12 / 6 8 11 14 / 10 13 15 16]
K-Means
SOM after K-Means
Various Interactive Tools

Referencing pattern (activating all links)
Local referencing
Density map
Representative element
Tool tip
Link follow-up
Search
Referencing Pattern
Local Referencing
Density Map
Representative Element
Link Follow-up
DEMO
Future Work

Other clustering algorithms can be explored:
    Spectral
    Fuzzy C-means
More similarity functions
Better initial posting of data
Zooming and panning
Thanks!