Expanding and Evaluating Network Motif Simplification Cody Dunne and Ben Shneiderman {cdunne, ben}@cs.umd.edu 30th Annual Human-Computer Interaction Lab Symposium, May 22–23, 2013 College Park, MD Who Uses Network Analysis Sociology Scientometrics Biology Urban Planning Politics Archaeology WWW Network Analysis is Hard Better Layouts Alternate Visualizations Hachul & Jünger, 2006 Gove et al, 2011; Henry & Fekete, 2006; Dunne et al, 2012; Freire et al, 2010; Wattenberg, 2006 Graph Summarization Navlakha et al, 2008 Lostpedia articles Observations 1: There are repeating patterns in networks (motifs) 2: Motifs often dominate the visualization 3: Motifs members can be functionally equivalent Motif Simplification Fan Motif 2-Connector Motif Lostpedia articles Lostpedia articles Glyph Design: Fan Glyph Design: Connector Glyph Design: General D-Connector Connector Variants Cliques too! Senate Co-Voting: 65% Agreement Senate Co-Voting: 70% Agreement Senate Co-Voting: 80% Agreement Interactivity Fan motif: 133 leaf vertices with head vertex “Theory” Interactivity in NodeXL Motif Detection Algorithms • Fans • O(N) • Connectors • Handle overlapping connectors • O(N * avg. neighbor count) • Cliques • Overlapping clique-finding algorithms • Tomita et al., 2006; Eppstein & Strash, 2011 • Choice heuristics • O(3^(N/3)) Controlled Experiment 60 50 40 30 20 *** 10 0 Web *** Wiki *** Wiki Plain Clique Time 60 50 -1 40 30 *** 20 10 0 Web Time (s) 60 50 40 30 20 10 0 Connector Time Time (s) Time (s) Fan Time MS -3 -3 *** Senate *** Web Connector Acc. 80% 80% 60% 40% 60% 40% 0% 0% 0% Web Connector Error 1500% *** Error *** -50% 1000% 500% 0% -500% -100% * -1000% Wiki Web Wiki Web Senate Web Clique Error Error 50% 0% Wiki MS *** 40% 20% Web Plain 60% 20% Fan Error *** 100% *** 80% 20% Wiki Error Clique Acc. Accuracy 100% *** 100% Accuracy Accuracy Fan Acc. 4000% 3000% 2000% 1000% 0% -1000% -2000% ** Senate Web Plain 60 50 40 30 20 10 0 Node Count Error MS 50% *** *** Error Time (s) Node Count Time 0% *** *** -50% -100% Wiki Senate Web Wiki Senate Web Cutpoint Accuracy Cutpoint Time 100% 60 50 40 30 20 10 0 80% Accuracy Time (s) *** ** 60% 40% 20% 0% Wiki Web Wiki Web Plain 60 50 40 30 20 10 0 Plain Label Accuracy MS 100% -12 -3 *** *** Wiki Senate -18 60% 40% 0% Web Wiki Senate Web Hidden Label Accuracy 100% -13 -10 -5 ** -4 -16 -17 80% Accuracy Time (s) *** 20% Hidden Label Time 60 50 40 30 20 10 0 *** 80% Accuracy Time (s) Plain Label Time 60% 40% 20% 0% Wiki Senate Web Wiki Senate Web Plain Path Error MS 60 50 40 30 20 10 0 * 100% Error Time (s) Path Time * 0% -50% Senate Web1 Web2 Wiki Senate Web1 Web2 *** ** Nbr. Compare Accuracy Accuracy Nbr. Compare Time Time (s) * ** Wiki 60 50 40 30 20 10 0 50% 100% 80% 60% 40% 20% 0% * * Shared Nbr. Count Error MS 60 50 40 30 20 10 0 1500% 1000% -1 *** Error Time (s) Shared Nbr. Count Time Plain 0% -500% * -1000% Wiki Web1 Web2 Wiki Shared Nbr. Comp. Time 60 50 40 30 20 10 0 Web1 Web2 Shared Nbr. Comp. Accuracy 100% 80% Accuracy Time (s) 500% 60% 40% 20% 0% Wiki Web Wiki Web Motif Simplification Discussion • Motif simplification effective for • Reducing complexity • Understanding larger or hidden relationships • However • Frequent motifs may not be covered • Glyph design has tradeoffs • May be challenging at first for some tasks • Available now in NodeXL: nodexl.codeplex.com Funders • Social Media Research Foundation • Connected Action Consulting Group • National Science Foundation grant SBE 0915645 Motif Simplification Papers • Dunne C & Shneiderman B (2013). Motif simplification: improving network visualization readability with fan, connector, and clique glyphs. In CHI '13. • Shneiderman B & Dunne C (2012). Interactive network exploration to derive insights: Filtering, clustering, grouping, and simplification. In Graph Drawing '12. Cody Dunne, Ben Shneiderman {cdunne,ben}@cs.umd.edu Available now in NodeXL: nodexl.codeplex.com Motif Detection Algorithms • Fans • O(N) • Connectors • Handle overlapping connectors • O(N * avg. neighbor count) • Cliques • Overlapping clique-finding algorithms • Tomita et al., 2006; Eppstein & Strash, 2011 • Choice heuristics • O(3^(N/3)) Fan Motif Detection O(n) Connector Detection O(n*avg deg) D-min: 2 D-max: ∞ found: • a, b -> c • a, b -> c, d • a, b -> c, d, e Connector Problems D-min: 2 D-max: ∞ found: • a, b -> d D-min: 2 D-max: ∞ found: • a, b -> c, d, e • c, d, e -> a, b Connector Motif Detection (2) Clique Motif Detection • Find cliques using Tomita et al., 2006 • O(3^(N/3)) time • High memory cost for matrix multiplication • Alternatively use Eppstein & Strash, 2011 • Faster for large, sparse networks • Slower for dense networks within constant factor • Linear memory cost • Greedy heuristic • Choose largest non-overlapping cliques Controlled Experiment • Participants: 2 pilot, 36 main • Data: The Wiki, Senate, and Web networks • Two groups: control and motif simplification • 31 questions • 45 minutes Controlled Experiment - Tasks Based on Lee et al. 2006 taxonomy: 1. Node count: About how many nodes are in the network? 2. Cut point: Which individual node would we remove to disconnect the most nodes from the main network? 3. Largest motif & size: Which is the largest ( fan | connector | clique ) motif and how many nodes does it contain? 4. Labels: Which node has the label “XXX”? 5. Shortest path: What is the length of the shortest path between the two highlighted nodes? 6. Neighbors: Which of the two highlighted nodes has more neighbors? 7. Common Neighbors: How many common neighbors are shared by the two highlighted nodes? 8. Common Neighbors: Which of these two pairs of nodes has more common neighbors? Controlled Experiment - Results Task Time Accuracy Error Cliques -20.8s*** 0% vs -28%**(Senate) 0 vs. 3 Fan -7.8s*** Connectors -17.1s*** 21.8s*** (Wiki,Web) 92% vs. 24%*** 95% vs. 35%*** (Web) 79% vs. 6%***(Web) Node Count Label Plain -19.9s** Label Simp. Cut point 15.3s*(Senate) Path 10.1s*(Web) More neighbors Count shared neighbors Compare shared neighbors Skipped 2% vs -62%*** -5% vs. -17%*(Wiki) -8% vs. -47%*** 0 vs. 12 Wiki, 3 vs. 18 Web 98% vs. 15%**(Wiki,Web) 53% vs. 18%**(Web) -7% vs. 22%**(Senate), 20% vs. 1%*(Web) 10.9s**(Wiki), 60% vs. 79%*(Senate,Web) 9.26*(Senate) 17.7s*** -21% vs. -10%* (Wiki) (Web) Quantifying Effectiveness Senate Co-Voting: 85% Agreement Senate Co-Voting: 90% Agreement Senate Co-Voting: 95% Agreement Voson Web Crawl Voson Web Crawl Voson Web Crawl Voson Web Crawl Voson Web Crawl