Enhancing Set-Analysis through Scalable Visualizations
Presented by: Hamid Haidarian Shahri (hamid@cs.umd.edu) and Mudit Agrawal (mudit@cs.umd.edu)
May 09, 2006
CMSC 838S Information Visualization Spring 2006

Content
- Problem Definition
- Motivation
- Dataset
- Architecture
- Visualization Methods
- Interaction Tools
- Demo
- Future Work

Problem Definition
- Analysis of sets by representing the clusters graphically and depicting their internal and external links
- Scaling the visualization

Motivation
- Sets are encountered in various domains: websites, commodities, publications; anything that has attributes!
- Visualization of sets to aid human perception is still an unsolved problem
- There are no direct relations between sets (or their elements) in the spatial domain
- Sets can be grouped based on various attributes

Dataset
- 2700 law cases
- Each case is identified by a numerical id ranging from 1000 to 3718
- Each tuple in the dataset denotes a reference from one case to another
- The relation is unidirectional and not symmetric (a reference also implies a temporal constraint: the citing case must be decided after the case it references)

Snapshot of the Data
First 50 links (approximately 0.1 percent of the whole dataset); each tuple is (citing case id, referenced case id, citation):
(1001,1105,'100 S.Ct. 318'),(1001,1612,'101 S.Ct. 2352'),(1001,1018,'107 S.Ct. 1232'),(1001,1016,'112 S.Ct. 2886'),(1001,2923,'113 S.Ct. 2264'),(1001,1016,'120 L.Ed.2d 798'),(1001,2923,'124 L.Ed.2d 539'),(1001,2286,'138 F.3d 1036'),(1001,2396,'238 F.3d 382'),(1001,3410,'438 U.S. 104'),(1001,1105,'444 U.S. 51'),(1001,1612,'452 U.S. 264'),(1001,1018,'480 U.S. 470'),(1001,1016,'505 U.S. 1003'),(1001,2923,'508 U.S. 602'),(1001,3410,'57 L.Ed.2d 631'),(1001,1105,'62 L.Ed.2d 210'),(1001,1612,'69 L.Ed.2d 1'),(1001,1789,'926 F.2d 1169'),(1001,1018,'94 L.Ed.2d 472'),(1001,3410,'98 S.Ct. 2646'),
(1002,1276,'100 S.Ct. 2138'),(1002,1101,'105 S.Ct. 3108'),(1002,1018,'107 S.Ct. 1232'),(1002,1098,'107 S.Ct. 2378'),(1002,1016,'112 S.Ct. 2886'),(1002,1015,'114 S.Ct. 2309'),(1002,1016,'120 L.Ed.2d 798'),(1002,1013,'121 S.Ct. 2448'),(1002,1012,'122 S.Ct. 1465'),(1002,1015,'129 L.Ed.2d 304'),(1002,2316,'142 F.3d 1319'),(1002,1013,'150 L.Ed.2d 592'),(1002,1012,'152 L.Ed.2d 517'),(1002,1121,'266 F.3d 487'),(1002,3028,'306 F.3d 113'),(1002,3410,'438 U.S. 104'),(1002,1276,'447 U.S. 255'),(1002,1101,'473 U.S. 172'),(1002,1018,'480 U.S. 470'),(1002,1098,'482 U.S. 304'),(1002,1016,'505 U.S. 1003'),(1002,1015,'512 U.S. 374'),(1002,1013,'533 U.S. 606'),(1002,1012,'535 U.S. 302'),(1002,3410,'57 L.Ed.2d 631'),(1002,2091,'59 F.3d 852'),(1002,1276,'65 L.Ed.2d 106'),(1002,1889,'746 F.2d 135'),(1002,1101,'87 L.Ed.2d 126'),(1002,1018,'94 L.Ed.2d 472'),(1002,2319,'953 F.2d 1299'),(1002,1098,'96 L.Ed.2d 250'),(1002,3410,'98 S.Ct. 2646'),(1002,1022,'980 F.2d 84'),(1002,2670,'989 F.2d 362'),
(1003,1104,'100 S.Ct. 383'),(1003,1611,'104 S.Ct. 2862'),(1003,1100,'106 S.Ct. 1018'),(1003,1099,'107 S.Ct. 2076'),(1003,1016,'112 S.Ct. 2886'),(1003,3110,'116 S.Ct. 2432'),(1003,1016,'120 L.Ed.2d 798'),(1003,1012,'122 S.Ct. 1465'),(1003,1881,'13 F.3d 1192'),(1003,3054,'133 F.3d 893'),(1003,3110,'135 L.Ed.2d 964'),(1003,1012,'152 L.Ed.2d 517'),(1003,1047,'18 F.3d 1560'),(1003,1886,'265 F.3d 1237'),(1003,2689,'271 F.3d 1090'),(1003,1358,'271 F.3d 1327'),(1003,1149,'28 F.3d 1171'),(1003,1040,'331 F.3d 891')

Architecture
Data -> Clustering Module (uses a Similarity Metric) -> Clustered Data -> Visualization Module

Routine K-Means Clustering
- Data points are in a vector space: $x_j$ and $\mu_i$ are vectors.
- This assumption does not hold for cases represented as sets.
- Centroids are not simple geometric means; in fact, a mean does not make any sense for sets.
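Representing each case as a set, as the K-Means slide requires, can be sketched by grouping the snapshot tuples by citing case. This is a minimal Python sketch under stated assumptions: the set elements are taken to be the ids of referenced cases, the helper names (`reference_sets`, `jaccard`, `medoid`, `assign`) are hypothetical, and replacing the centroid with a medoid is a swapped-in k-medoids technique chosen to illustrate one set-friendly alternative to the mean, not necessarily the presenters' method. Similarity is the Jaccard measure $J(A,B) = |A \cap B| / |A \cup B|$ listed under Similarity Measures.

```python
from collections import defaultdict

# A few (citing id, referenced id, citation) tuples copied from the data snapshot.
LINKS = [
    (1001, 1105, '100 S.Ct. 318'),
    (1001, 1018, '107 S.Ct. 1232'),
    (1001, 1016, '120 L.Ed.2d 798'),
    (1001, 1016, '505 U.S. 1003'),  # parallel citation of the same referenced case
    (1001, 3410, '438 U.S. 104'),
    (1002, 1018, '107 S.Ct. 1232'),
    (1002, 1016, '112 S.Ct. 2886'),
    (1002, 3410, '438 U.S. 104'),
    (1002, 1276, '100 S.Ct. 2138'),
    (1003, 1016, '112 S.Ct. 2886'),
    (1003, 1012, '122 S.Ct. 1465'),
]


def reference_sets(links):
    """Hypothetical helper: map each citing case id to the set of case ids it references."""
    refs = defaultdict(set)
    for citing, referenced, _citation in links:
        refs[citing].add(referenced)  # parallel citations collapse into one element
    return dict(refs)


def jaccard(a, b):
    """J(A, B) = len(A & B) / len(A | B); taken as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def medoid(cluster, refs):
    """Assumed k-medoids stand-in for a centroid: the member with the smallest
    summed Jaccard distance (1 - J) to all members of the cluster."""
    return min(cluster, key=lambda c: sum(1.0 - jaccard(refs[c], refs[o]) for o in cluster))


def assign(cases, medoids, refs):
    """One k-medoids-style assignment step: each case joins its most similar medoid."""
    return {c: max(medoids, key=lambda m: jaccard(refs[c], refs[m])) for c in cases}


if __name__ == "__main__":
    refs = reference_sets(LINKS)
    print(refs[1001])                       # four distinct referenced cases
    print(jaccard(refs[1001], refs[1002]))  # 3 shared references out of 5 in the union
    print(assign([1001, 1002, 1003], [1001, 1003], refs))
```

Grouping by case id matters for this data: in the snapshot, case 1001 references case 1016 under several reporter citations (for example '120 L.Ed.2d 798' and '505 U.S. 1003'), and these become a single set element. The medoid is always an actual case, so it stays meaningful where an arithmetic mean of sets does not.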
Routine k-means objective, minimized over the clusters $S_1, \dots, S_k$ with centroids $\mu_1, \dots, \mu_k$:
$$V = \sum_{i=1}^{k} \sum_{x_j \in S_i} \lVert x_j - \mu_i \rVert^2$$

Routine Self-Organizing Map
- The weight vector $W_v$ of node $v$ and the input $D$ are assumed to be vectors:
$$W_v(t+1) = W_v(t) + \Theta(t)\,\alpha(t)\,\bigl[D(t) - W_v(t)\bigr]$$
  where $\Theta(t)$ is the neighborhood function and $\alpha(t)$ is the learning rate at step $t$.
- This assumption does not hold for cases represented as sets.

Similarity Measures (for two cases represented as sets $A$ and $B$)
- Jaccard similarity: $J(A,B) = \dfrac{|A \cap B|}{|A \cup B|}$
- Reference-based similarity: $S(A,B) = |A \cap B|$
- Weighted reference-based similarity: $WS(A,B)$, which weights each shared reference $x \in A \cap B$ by a function $f(x)$

Contribution to Clustering
- Applying K-means and SOM to produce better visualizations
- It is not apparent at first glance, but the above algorithms are not directly applicable to set visualization
- They assume a 2-D or n-D (vector) representation for each data point (i.e., each law case); more specifically, the attributes must form a vector space
- This assumption does not hold: there is no clear geometric attribute corresponding to the dataset

Similarity Metrics
Geometric metrics:
- 1-D partitioning: sequential arrangement
- 2-D partitioning: distance-based arrangement

Example 2-D arrangement (4 × 4 grid; the numbers give the order of the cells):
 1  2  5  9
 3  4  7 12
 6  8 11 14
10 13 15 16

[Figures: K-Means (two slides); SOM after K-Means]

Various Interactive Tools
- Referencing pattern (activating all links)
- Local referencing
- Density map
- Representative element
- Tool tip
- Link follow-up
- Search

[Figures: Referencing Pattern; Local Referencing (two slides); Density Map]
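The numbers in the 4 × 4 example under Similarity Metrics are consistent with a simple rule, inferred here from the numbers themselves rather than stated on the slide: cells are filled anti-diagonal by anti-diagonal from the top-left corner (by row + column), cells nearer the main diagonal come first within an anti-diagonal, and remaining ties go to the upper row. A minimal Python sketch of that inferred rule follows; `distance_order` and `place` are hypothetical names, and mapping a 1-D sequential arrangement onto the grid with `place` is an assumption about how the two partitionings relate.

```python
def distance_order(rows, cols):
    """Rank every cell of a rows x cols grid, starting at 1 (0-based r, c).
    Inferred rule: sort by anti-diagonal r + c, then by distance from the
    main diagonal |r - c|, then by row (upper row first)."""
    cells = sorted(
        ((r, c) for r in range(rows) for c in range(cols)),
        key=lambda rc: (rc[0] + rc[1], abs(rc[0] - rc[1]), rc[0]),
    )
    grid = [[0] * cols for _ in range(rows)]
    for rank, (r, c) in enumerate(cells, start=1):
        grid[r][c] = rank
    return grid


def place(items, rows, cols):
    """Hypothetical use: put the k-th item of a 1-D (sequential) arrangement
    into the cell of rank k, so earlier items fill the anti-diagonals
    nearest the top-left corner."""
    if len(items) > rows * cols:
        raise ValueError("more items than grid cells")
    grid = distance_order(rows, cols)
    cell_of_rank = {grid[r][c]: (r, c) for r in range(rows) for c in range(cols)}
    return {item: cell_of_rank[k] for k, item in enumerate(items, start=1)}


if __name__ == "__main__":
    for row in distance_order(4, 4):
        print(" ".join(f"{n:2d}" for n in row))
```

Run on a 4 × 4 grid, `distance_order` reproduces the slide's numbering exactly. Sorting on integer keys (row + column and |row - column|) rather than floating-point Euclidean distances avoids rounding ties; note that plain Euclidean distance from the corner would not match the slide, since it would place cell (2, 2) before cells (0, 3) and (3, 0).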
[Figures: Density Map (second slide); Representative Element; Link Follow-up (eight slides)]

DEMO

Future Work
- Other clustering algorithms can be explored: spectral clustering, fuzzy C-means
- More similarity functions
- Better initial positioning of the data
- Zooming and panning

Thanks!