Measuring and Improving the Readability of Network Visualizations
Cody Dunne
cdunne@cs.umd.edu
ORNL – March 11, 2013

The Data Problem

Why Visualization?

Anscombe’s Quartet
          I               II              III             IV
     x       y       x       y       x       y       x       y
  10.00    8.04   10.00    9.14   10.00    7.46    8.00    6.58
   8.00    6.95    8.00    8.14    8.00    6.77    8.00    5.76
  13.00    7.58   13.00    8.74   13.00   12.74    8.00    7.71
   9.00    8.81    9.00    8.77    9.00    7.11    8.00    8.84
  11.00    8.33   11.00    9.26   11.00    7.81    8.00    8.47
  14.00    9.96   14.00    8.10   14.00    8.84    8.00    7.04
   6.00    7.24    6.00    6.13    6.00    6.08    8.00    5.25
   4.00    4.26    4.00    3.10    4.00    5.39   19.00   12.50
  12.00   10.84   12.00    9.13   12.00    8.15    8.00    5.56
   7.00    4.82    7.00    7.26    7.00    6.42    8.00    7.91
   5.00    5.68    5.00    4.74    5.00    5.73    8.00    6.89

Anscombe’s Quartet - Statistics
Property                                   Value               Equality
Mean of x in each case                     9                   Exact
Variance of x in each case                 11                  Exact
Mean of y in each case                     7.50                To 2 decimal places
Variance of y in each case                 4.122 or 4.127      To 3 decimal places
Correlation between x and y in each case   0.816               To 3 decimal places
Linear regression line in each case        y = 3.00 + 0.500x   To 2 and 3 decimal places, respectively

Anscombe’s Quartet - Scatterplots

Networks!
Edge List / Adjacency Matrix
Edge list:
Node 1   Node 2
Alice    Bob
Alice    Cathy
Cathy    Alice

Adjacency matrix:
         Alice   Bob   Cathy
Alice      0      1      1
Bob        0      0      0
Cathy      1      0      0

Tweets of the #Win09 Workshop
#     User 1           User 2
1     20andlife        barrywellman
2     20andlife        BrianDavidson
3     barrywellman     elizabethmdaly
4     barrywellman     informor
5     BrianDavidson    hcraygliangjie
6     BrianDavidson    informor
7     BrianDavidson    NetSciWestPoint
8     byaber           barrywellman
9     byaber           danielequercia
10    byaber           mcscharf
11    chrisnordyke     RebeccaBadger
12    danevans87       barrywellman
13    danevans87       BrianDavidson
14    danevans87       drewconway
15    danevans87       informor
16    danevans87       NetSciWestPoint
17    danielequercia   BrianDavidson
18    danielequercia   drewconway
19    danielequercia   ipeirotis
20    danielequercia   johnflurry
21    danielequercia   loyan
22    danielequercia   loyan
23    danielequercia   mcscharf
24    danielequercia   NetSciWestPoint
…     …                …
106   sechrest         Japportreport
107   sechrest         loyan
108   sechrest         RebeccaBadger
…     …                …

Tweets of the #Win09 Workshop

Who Uses Network Analysis
• Sociology
• Scientometrics
• Biology
• Urban Planning
• Politics
• Archaeology
• WWW

Some of my work…
• NodeXL (Smith et al., 2009; Dunne & Shneiderman, 2013; +5)
• GraphTrail (Dunne et al., 2012; Riche et al., 2011)
• STICK (Shneiderman et al., 2011; Gove et al., 2011)
• Action Science Explorer (Dunne et al., 2012; Gove et al., 2011)
• NetGrok (Blue et al., 2008)

NodeXL
smrfoundation.org
Collect data, Excel analysis, statistics, visualization, layout algorithms, filtering, clustering, attribute mapping…

NodeXL Graph Gallery

NodeXL as a Teaching Tool
I. Getting Started with Analyzing Social Media Networks
  1. Introduction to Social Media and Social Networks
  2. Social Media: New Technologies of Collaboration
  3. Social Network Analysis
II. NodeXL Tutorial: Learning by Doing
  4. Layout, Visual Design & Labeling
  5. Calculating & Visualizing Network Metrics
  6. Preparing Data & Filtering
  7. Clustering & Grouping
III. Social Media Network Analysis Case Studies
  8. Email
  9. Threaded Networks
  10. Twitter
  11. Facebook
  12. WWW
  13. Flickr
  14. YouTube
  15. Wiki Networks
http://www.elsevier.com/wps/find/bookdescription.cws_home/723354/description

NodeXL as a Research Tool
Bonsignore EM, Dunne C, Rotman D, Smith M, Capone T, Hansen DL and Shneiderman B (2009), "First steps to NetViz Nirvana: Evaluating social network analysis with NodeXL", In CSE '09. pp. 332-339. DOI:10.1109/CSE.2009.120
Mohammad S, Dunne C and Dorr B (2009), "Generating high-coverage semantic orientation lexicons from overtly marked words and a thesaurus", In EMNLP '09. pp. 599-608.
Smith M, Shneiderman B, Milic-Frayling N, Rodrigues EM, Barash V, Dunne C, Capone T, Perer A and Gleave E (2009), "Analyzing (social media) networks with NodeXL", In C&T '09. pp. 255-264. DOI:10.1145/1556460.1556497

Research in NodeXL

Node-Link Visualization is Hard

Alternate visualizations...
(Gove et al., 2011; Henry & Fekete, 2006; Freire et al., 2010; Dunne et al., 2012; Wattenberg, 2006)

Better Layouts…
(Hachul & Jünger, 2006)

Plan of attack
Readability metrics
• Global/local
• Taxonomy/layout aids
Motif simplifications
Meta-layouts
Evaluations
• Readability metrics
• User studies

Readability Metrics

Why measure readability?
(Lee et al., 2003)

Measuring Readability
• Simple rules or heuristics (Davidson & Harel, 1996)
• User performance (Huang et al., 2007)
• Global readability metrics (Purchase, 2002)
Source: Sugiyama, 2002, p. 14

Global Readability Metrics
• How understandable is the network drawing?
• Example: a journal may suggest
  • 0% node occlusion
  • <2% edge tunneling
  • <5% edge crossing

E.g., Node Overlap
• Global readability metric in [0,1], where 0 = complete overlap and 1 = no overlap
• Node readability metric: ratio of node area that overlaps other nodes

My metrics
Node overlap, edge tunnel, drawing space used, group overlap, edge crossing, angular resolution, edge crossing angle
(Chart legend: New, Local, Existing metrics)

Assisted Manipulation
• Real-time ranking & coloring by metrics
Figure: 14 edge tunnels vs. 0 edge tunnels (Images: Cody Dunne)

Discussion
• Raise awareness of readability issues
• Localized identification of where improvement is needed
• Optimization recommendations for tasks
• Interactive optimization
• Future optimization plans
Dunne C and Shneiderman B (2009), "Improving graph drawing readability by incorporating readability metrics: A software tool for network analysts". University of Maryland, Human-Computer Interaction Lab Tech Report No. HCIL-2009-13.

Motif Simplification
Lostpedia articles

Observations
1: There are repeating patterns in networks (motifs)
2: Motifs often dominate the visualization
3: Motif members can be functionally equivalent

Graph Summarization…
(Navlakha et al., 2008)

Motif Simplification
Fan Motif; 2-Connector Motif
Lostpedia articles

Glyph Design: Fan

Glyph Design: Connector

Cliques too!
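The node overlap metric above can be computed on a pixel grid. This is a minimal sketch under stated assumptions. Nodes are modeled as axis-aligned rectangles with whole-pixel coordinates, given as a hypothetical `(x, y, width, height)` tuple with positive width and height. The per-node value is the ratio of node area overlapped by other nodes, as the slide defines it. The global score is taken as 1 minus the overlapped share of total node area. The talk gives only the [0,1] range and its endpoints, so this aggregation is an assumption and may differ from the one used in the tool.

```python
from collections import Counter
from typing import Dict, Iterator, Tuple

# Hypothetical node shape: (x, y, width, height) in whole pixels, width and height > 0.
Rect = Tuple[int, int, int, int]


def cells(rect: Rect) -> Iterator[Tuple[int, int]]:
    """Yield each unit pixel cell covered by an axis-aligned rectangle."""
    x, y, w, h = rect
    for i in range(x, x + w):
        for j in range(y, y + h):
            yield (i, j)


def node_overlap(nodes: Dict[str, Rect]) -> Tuple[Dict[str, float], float]:
    """Return (per-node overlapped-area ratios, global score in [0, 1])."""
    coverage = Counter()  # how many nodes cover each pixel cell
    for rect in nodes.values():
        coverage.update(cells(rect))
    per_node = {}
    overlapped_total = 0
    area_total = 0
    for node, rect in nodes.items():
        area = rect[2] * rect[3]
        # A cell of this node is overlapped if at least one other node covers it too.
        overlapped = sum(1 for c in cells(rect) if coverage[c] > 1)
        per_node[node] = overlapped / area
        overlapped_total += overlapped
        area_total += area
    # Assumed aggregation: 0 when every node is fully covered, 1 when nothing overlaps.
    return per_node, 1 - overlapped_total / area_total


ratios, score = node_overlap({"a": (0, 0, 4, 4), "b": (2, 0, 4, 4), "c": (10, 10, 2, 2)})
print(ratios, score)  # a and b are each half covered, c is free; score = 1 - 16/36
```

Rasterizing avoids the more involved geometry of exact union areas. It is exact only for integer-aligned rectangles, and for other node shapes it becomes an approximation whose accuracy depends on grid resolution.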
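Edge crossings, and edge tunnels (an edge passing under a node that is not one of its endpoints), can be counted with standard segment geometry. This is a minimal sketch under stated assumptions. Node positions are points, every node is drawn as a circle with one shared radius (the hypothetical `radius` parameter), and the graph is simple and undirected, with no self-loops or parallel edges. The crossing score uses one common normalization, following Purchase (2002): 1 - c / c_max, where c_max counts the edge pairs that do not share an endpoint. The tool described in this talk may normalize differently.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Edge = Tuple[str, str]


def _orient(p: Point, q: Point, r: Point) -> int:
    """Sign of the cross product (q - p) x (r - p): 1 left turn, -1 right turn, 0 collinear."""
    v = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (v > 0) - (v < 0)


def segments_cross(a: Point, b: Point, c: Point, d: Point) -> bool:
    """True only for a proper crossing; touching or collinear overlap is not counted."""
    return (_orient(a, b, c) * _orient(a, b, d) < 0
            and _orient(c, d, a) * _orient(c, d, b) < 0)


def crossing_readability(pos: Dict[str, Point], edges: List[Edge]) -> Tuple[int, float]:
    """Count crossings and return (crossings, 1 - crossings / c_max)."""
    crossings = 0
    for (u1, v1), (u2, v2) in combinations(edges, 2):
        if {u1, v1} & {u2, v2}:
            continue  # adjacent edges are excluded from c_max, so they are skipped here too
        if segments_cross(pos[u1], pos[v1], pos[u2], pos[v2]):
            crossings += 1
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    m = len(edges)
    # All edge pairs minus pairs that share a node (valid for simple graphs).
    c_max = m * (m - 1) // 2 - sum(d * (d - 1) for d in degree.values()) // 2
    return crossings, (1.0 if c_max == 0 else 1 - crossings / c_max)


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Euclidean distance from point p to the closed segment ab."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    # Project p onto the segment and clamp the projection to its endpoints.
    t = max(0.0, min(1.0, ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / length_sq))
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))


def edge_tunnels(pos: Dict[str, Point], edges: List[Edge],
                 radius: float) -> List[Tuple[Edge, str]]:
    """List (edge, node) pairs where the edge passes within `radius` of a non-endpoint node."""
    tunnels = []
    for u, v in edges:
        for node, p in pos.items():
            if node in (u, v):
                continue
            if point_segment_distance(p, pos[u], pos[v]) < radius:
                tunnels.append(((u, v), node))
    return tunnels


pos = {"a": (0, 0), "b": (2, 2), "c": (0, 2), "d": (2, 0), "e": (1, 1)}
edges = [("a", "b"), ("c", "d")]
print(crossing_readability(pos, edges))  # (1, 0.0): the only possible crossing occurs
print(edge_tunnels(pos, edges, radius=0.3))  # both diagonals pass under node "e"
```

Both loops are quadratic (all edge pairs and all edge-node pairs). That cost is acceptable for a sketch, but large layouts would need a spatial index to stay interactive.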
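The sketch below shows how the fan, 2-connector, and clique motifs named above might be detected, assuming an undirected simple graph given as an edge list. It reads the motifs as follows:
• A fan is a head node with leaf nodes whose only neighbor is that head.
• A 2-connector is a set of span nodes whose neighbors are exactly the same two anchor nodes.
• A clique is a maximal complete subgraph.

The function names and minimum sizes (`min_leaves=2`, `min_spans=2`, `min_size=3`) are hypothetical. The sketch does not resolve overlapping connectors or choose among overlapping cliques. Cliques are found with Bron-Kerbosch with pivoting, a standard maximal-clique algorithm chosen here for illustration, not necessarily the one used in NodeXL.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Set, Tuple

Adjacency = Dict[str, Set[str]]


def build_adjacency(edges: List[Tuple[str, str]]) -> Adjacency:
    """Undirected adjacency sets; self-loops are dropped and duplicate edges merge."""
    adj: Adjacency = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def find_fans(adj: Adjacency, min_leaves: int = 2) -> Dict[str, List[str]]:
    """Map each head node to the degree-1 leaves attached only to it."""
    leaves_by_head = defaultdict(list)
    for node, nbrs in adj.items():
        if len(nbrs) == 1:
            (head,) = nbrs
            if len(adj[head]) > 1:  # skip isolated dyads of two degree-1 nodes
                leaves_by_head[head].append(node)
    return {h: sorted(ls) for h, ls in leaves_by_head.items() if len(ls) >= min_leaves}


def find_connectors(adj: Adjacency, span_degree: int = 2,
                    min_spans: int = 2) -> Dict[FrozenSet[str], List[str]]:
    """Map each anchor set to the span nodes linked to exactly those anchors."""
    spans_by_anchors = defaultdict(list)
    for node, nbrs in adj.items():
        if len(nbrs) == span_degree:
            spans_by_anchors[frozenset(nbrs)].append(node)
    return {a: sorted(s) for a, s in spans_by_anchors.items() if len(s) >= min_spans}


def maximal_cliques(adj: Adjacency, min_size: int = 3) -> List[List[str]]:
    """Bron-Kerbosch with pivoting; returns maximal cliques of at least min_size nodes."""
    cliques: List[List[str]] = []

    def expand(r: Set[str], p: Set[str], x: Set[str]) -> None:
        if not p and not x:
            if len(r) >= min_size:
                cliques.append(sorted(r))
            return
        # Pivot on the node with the most neighbors among the candidates.
        pivot = max(p | x, key=lambda n: len(adj[n] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


adj = build_adjacency([("Theory", "l1"), ("Theory", "l2"), ("Theory", "l3"),
                       ("Theory", "X"), ("Theory", "Y"),
                       ("X", "s1"), ("Y", "s1"), ("X", "s2"), ("Y", "s2")])
print(find_fans(adj))        # head "Theory" with leaves l1, l2, l3
print(find_connectors(adj))  # anchors {X, Y} with span nodes s1, s2
print(maximal_cliques(adj))  # no triangles in this graph, so no cliques of size >= 3
```

Fan and connector detection each look at every node's neighbor set once. A glyph-based simplification could then replace each detected group with a single fan, connector, or clique glyph.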
Interactivity
Fan motif: 133 leaf vertices with head vertex "Theory"

Senate Co-Voting: 65% Agreement
Senate Co-Voting: 70% Agreement
Senate Co-Voting: 80% Agreement
Senate Co-Voting: 85% Agreement

Voson Web Crawl

User Impressions
"I’m overwhelmed, … this is like one of those vision tests at the eye doctor"
"Now I can see the central pages… [and] pairwise connections"

Controlled Experiment - General
• Maximal motifs
  • 21s faster***, 68% more accurate***, 28% less size error (0%)***
• Estimating node count
  • 22s slower***, 39% less error***
• Finding plain labeled nodes
  • 20s faster**, 83% more accurate**
• Finding simplified labeled nodes
  • 15s slower*

Controlled Experiment - Topology
• Cut points
  • 35% more accurate**
• Path length
  • 10s slower*, 15% less error**/19% more error*
• Neighbor compare
  • 10s slower**, 19% less accurate*
• Shared neighbor count
  • 18s slower***, 11% more error*

Motif Detection Algorithms
• Fans
  • Straightforward
  • O(N * avg. neighbor count)
• Connectors
  • Handle overlapping connectors
  • O(N * avg. neighbor count)
• Cliques
  • Traditional clique-finding algorithms
  • Choice heuristics
  • O(3^(N/3))

Discussion
• Motif simplification is effective for
  • Reducing complexity
  • Understanding larger or hidden relationships
• However
  • Frequent motifs may not be covered
  • Glyph design has tradeoffs
  • May be challenging at first for some tasks
• Available now in NodeXL: nodexl.codeplex.com
Dunne C and Shneiderman B (2013), "Motif simplification: improving network visualization readability with fan, connector, and clique glyphs", In CHI '13.
Shneiderman B and Dunne C (2012), "Interactive network exploration to derive insights: Filtering, clustering, grouping, and simplification", In Graph Drawing '12. pp. 2-18.
DOI:10.1007/978-3-642-36763-2_2

Meta-Layouts

Analogy: Clusters Are Occluded
• Hard to count nodes, clusters

Separate Clusters Are More Comprehensible

Meta-Layouts
• Layout using groupings
  • Attributes
  • Topology
  • Manual
• Good for
  • Large or high density networks
  • Highlighting hidden relationships
  • Recursive nesting

Group-in-a-Box Meta-Layouts
• Squarified Treemap
  • See topology poorly, space-filling
• Fitted Rectangles
  • See topology better, slight space increase
• Force-Directed
  • See topology well, at the cost of space

Risk Movements: Plain Layout with Clusters
Risk Movements: GiB Treemap
GiB Fitted Rectangles: The Donut
GiB Fitted Rectangles: The Croissant
Risk Movements: GiB Fitted Rectangles (Croissant)
GiB Force-Directed
Risk Movements: GiB Force-Directed
Pennsylvania Innovation
Pennsylvania Innovation: GiB Treemap
Pennsylvania Innovation: GiB Fitted Rectangles
Pennsylvania Innovation: GiB Force-Directed

GiB Force-Directed: Algorithm
• Start with initial area usage (20%-50%)
• Generate initial positions
  • Harel & Koren, 2002
  • Better to use meta-edge weights
• Remove overlaps
  • Gansner & Hu, 2009
  • Minimize space used
  • Retain layout structure
• Scale the new layout to fit

Force-Directed GiB: Box Initial Positions
Force-Directed GiB: Overlap Removal, 20% Originally Filled
Force-Directed GiB: Overlap Removal, 50% Originally Filled

Putting It All Together
Layout depends on task requirements: space-filling vs. showing relationships
• Treemap
• Fitted Rectangles
• Force-directed
Automatic choices:
• Disconnected components
  • Treemap outer layout
  • Nested GiB layouts
• Two groups: Treemap
• Fitted rectangles
  • Donut for a few large groups
  • Croissant for more evenly distributed groups

Empirical Evaluation
• Compare techniques on 3564+ Twitter networks
• Measure readability metrics
  • Edges crossing boxes unnecessarily
• Forthcoming results…

Discussion
• Three Group-in-a-Box layouts for dissecting networks
• Improved group and overview visualization
• Tradeoffs: filling space vs. showing relationships
• Available in NodeXL: nodexl.codeplex.com
  • Treemap: available now!
  • Force-Directed & Fitted Rectangles: ~6 weeks
• Real-world application
Shneiderman B and Dunne C (2012), "Interactive network exploration to derive insights: Filtering, clustering, grouping, and simplification", In Graph Drawing '12. pp. 2-18. DOI:10.1007/978-3-642-36763-2_2
Dunne C, Chaturvedi S, Ashktorab Z, Zacharia R, and Shneiderman B (2013), "Fitted rectangles and force-directed group-in-a-box layouts for clustered network visualization", In preparation.
Rodrigues EM, Milic-Frayling N, Smith M, Shneiderman B, and Hansen DL (2011), "Group-in-a-Box layout for multi-faceted analysis of communities", In SocialCom '11. pp. 354-361.
DOI:10.1109/PASSAT/SocialCom.2011.139

Better Node-Link Visualizations
Readability metrics
• Global/local
• Taxonomy/layout aids
Motif simplifications
Meta-layouts
Evaluations
• Readability metrics
• User studies

Future Plans

Future of Readability Metrics: Multi-Criteria Optimization
• User-defined energy function
• Interactive view of task-by-metric taxonomy
• Simulated annealing
  • Metropolis et al., 1953; Kirkpatrick et al., 1983
  • Searches layout space
  • Hill climbing
• Expensive, but valuable especially for static images

Future: Network Overviews
• Identify high-level structures
  • Motifs
  • Clusters
  • Network backbone
• Ease display, especially online
  • Semantic zooming
  • Interactivity
  • Glyphs
(Wong et al., 2008)

Future: Network Evolution
• Line charts
• Dynamic filters
• Time bins
• Heatmap slices

Future: Medical Records
• Connections between patients and concepts
• Fast, approximate analyses
• Continuous, random stream of records
• Clustering
• Uncertainty visualization

Funders & Collaborators
Funding
• NSF grants SBE 0915645, IIS 0705832, IIS 0968521
• HHS SHARP grant 10510592
• Social Media Research Foundation, Connected Action Consulting Group, Microsoft External Research, Microsoft Research, National Cancer Institute
Co-Authors
• Ben Shneiderman, Marc Smith, Snigdha Chaturvedi, Zahra Ashktorab, Rajan Zacharia, Tony Capone, Eduarda Mendes Rodrigues, Natasa Milic-Frayling, Nathalie Riche, Bongshin Lee, Ron Metoyer, George Robertson, Robert Gove, Bonnie Dorr, Judith Klavans, Saif Mohammad, Puneet Sharma, Ping Wang, Awalin Sopan, Nick Gramsky, Rose Kirby, Emre Sefer, Meirav Taieb-Maimon, Vladimir Barash, Adam Perer, Eric Gleave, Derek Hansen, Elizabeth Bonsignore, Dana Rotman, Ryan Blue, Adam Fuchs, Kyle King, and Aaron Schulman
Collaborators
• Catherine Plaisant, Jon Froehlich, Leah Findlater, Yiyan Liu

Take Away Messages
Create effective node-link visualizations in NodeXL:
• Readability metrics to guide improvements
• Motif simplification to reduce complexity
• Meta-layouts to more clearly show ties and groups

Cody Dunne
cdunne@cs.umd.edu
www.cs.umd.edu/~cdunne/