Fitted Rectangles: A Visualization for Clustered Graphs

Snigdha Chaturvedi
Computer Science, University of Maryland
snigdhac@cs.umd.edu

Zahra Ashktorab
Information School, University of Maryland
parnia@umd.edu

Rajan Zacharia
Information School, University of Maryland
rajankz@umd.edu

ABSTRACT
Community structures and relationships within graphs are a significant facet of data analysis and data visualization. The existence and strength of relationships between communities in a graph is an important graph attribute, and presenting these relationships clearly can provide insights during the exploratory phase of data analysis. Currently, there is no effective way of visualizing a graph that ensures the relationships between clusters are clear. We propose the Fitted-Rectangle Layout (FRL), a meta-layout for graphs that visualizes clusters and their inter-group edges clearly. FRL takes the order of the clusters into consideration when laying them out and places them in order of decreasing connectivity of a group to all other groups. The most connected group is placed in the center of the graph, while all others are placed around it. The goal of FRL is to clearly visualize the inter-group edges and to decrease the overlap of inter-group edges that occurs in treemap-based layouts. By conducting usability studies, we found that users prefer the FRL layout to the Group-In-a-Box (GIB) layout when seeking inter-group relationship information in the visualization.

Author Keywords
network visualization; layout; meta-layout; force-directed; graph clustering visualization; communities

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. CHI’13, April 27 – May 2, 2013, Paris, France. Copyright 2013 ACM 978-1-XXXX-XXXX-X/XX/XX...$10.00.

1) INTRODUCTION
Network visualization is a challenging task that involves studying the properties of hundreds or thousands of entities and the relationships between them. With the recent surge in popularity of social networks, the growing size of these networks makes their analysis interesting but more difficult.

Graph clustering plays an important role in network analysis. It refers to topology-based clustering of vertices into several groups or communities such that the connections within groups are tighter than those between groups. For example, in a Twitter follower network, people who very frequently follow or re-tweet each other would form one community, such as the community of sports people, politicians or actors. Another popular approach is to group the vertices based on some common attribute such as geographical location or interests. Vertices belonging to the same community often behave similarly, so it can be useful to study individual communities in greater detail. In most visualization tools, viewing the communities is facilitated by using distinctive colors and shapes to represent group identity. For example, politicians can be represented by red circles and actors by blue triangles. A recent popular method, the 'Group-In-a-Box Layout' [4], enhances this further by assigning specific spatial regions to individual groups. The GIB method lays out individual groups in different well-separated rectangles or 'boxes' (from which it derives its name), revealing the local intra-group network structure and the attributes of the members.

However, apart from intra-group relationships, it is also interesting to study the inter-group relationships. This helps in understanding the nature of information exchange between groups and how the groups differ from each other. For example, in a given Twitter network, it would be interesting to see whether the sports community has a stronger relationship with the politics community or with the actors community. One way to visualize this in the GIB method is to 'combine' all the inter-group edges between a pair of groups into one thick edge running from the center of one group's box to the center of the other. Although this conveniently conveys the 'strength' of connections between groups, it leads to an abundance of overlap among these thick 'combined' inter-group edges, especially near groups that are highly connected to other groups. For example, in Figure 5(a) we see a lot of overlap between the thick grey combined edges around G1 and G2. This happens because the treemap algorithm used to lay out the group rectangles does not take the inter-group connectivity into account.

This paper addresses this problem by rearranging the group rectangles in a 'U' or a 'Donut' shape such that the most connected group is placed at the core of the 'U' or 'Donut' and is surrounded by the other groups. For example, Figure 5(b) shows the Donut layout. The edges around G1 and G2 are more discernible than those in Figure 5(a). Our user study with 9 subjects indicates that such an arrangement is more effective in visualizing the inter-group relationships than the GIB method.

In the following sections, we describe the related work, and specifically the GIB technique, in greater detail. We then explain our approach to the problem, followed by empirical results establishing its effectiveness. We finally conclude the paper by summarizing our contributions and suggesting directions for future work.

2) RELATED WORK
The ease with which communities and their inter-connections can be detected in a graph is instrumental in revealing insights. Providing a mechanism to visualize these communities within a visualization tool could aid in the detection of important insights within the data.

There has been much previous work on visualizing communities within a graph. Treemaps have been used in many studies to visualize communities and the sub-communities of which they consist. Frantz et al. describe the usefulness of treemaps for analyzing social networks. During the exploratory part of data visualization and analysis, treemaps can be insightful and revealing, particularly in social network analysis. The original treemap, introduced by Johnson et al., employed the 'slice and dice' method for representing hierarchical information in rectangular space in a space-filling manner.

The Group-In-a-Box meta-layout is an application of treemaps to aid graph cluster visualization. Rodrigues et al. introduce Group-In-a-Box as a meta-layout that displays each cluster in a treemap, enclosed within its own "box". A space-filling technique is employed to fill the space with clusters that have been grouped based on their network attributes or by clustering algorithms. Each box's size is proportional to the number of vertices its enclosed cluster includes.

Furthermore, much work has been done on creating boxes with lower aspect ratios in treemaps. The traditional treemap algorithms resulted in several thin elongated rectangles, which are difficult to point at in an interactive environment. Bruls et al. propose a method of organization that results in a treemap consisting of 'squarified' rectangles with lower aspect ratios. While the aspect ratios of the rectangles are significant in a treemap, our focus is to improve the readability of inter-group relationships between the clusters in the graph.

Because various clustering algorithms were used when determining the groups in our Fitted-Rectangle algorithm, selecting the right algorithm was an important facet of our research. One algorithm used was the Wakita-Tsurumi algorithm, an alternative to the Clauset-Newman-Moore (CNM) algorithm and one of the three clustering algorithms provided by NodeXL. Wakita et al. propose an alternative to the CNM algorithm that runs 70 times faster than CNM. This was a factor we considered when choosing our clustering algorithm, because speed is important in any visualization tool.

Much research has been conducted on the use of force-directed layouts for graphs. Such layouts remain important even for the vertices within individual clusters. Two algorithms commonly employed to lay out vertices are the Fruchterman-Reingold (FR) layout and the Harel-Koren (HK) layout. While both of these algorithms aim to lay out vertices to achieve an optimal graph structure, they use different methods. HK uses recursion to coarsen the graph until it reaches an optimal multi-level representation, while FR uses attractive and repulsive forces to achieve an optimal equilibrium in a graph.

We implement our meta-layout in NodeXL, a toolkit that runs as an add-in in Microsoft Excel. Smith et al. demonstrate the usefulness of NodeXL in data analysis and visualization by using its various sorting, clustering and filtering functionalities. The clustering algorithms available in NodeXL were particularly instrumental in our implementation of FRL.

While the GIB implementation of the squarified treemap is an improvement over the original slice-and-dice method in terms of the aspect ratio of the boxes, the resulting overlapping edges between the clusters in GIB increase the clutter in the graph and thus reduce the ease with which users can understand the visualized information and draw conclusions from it. The GIB algorithm does not consider the placement and overlap of inter-group edges, and thus produces overlapping edges that may be detrimental to the readability of the visualization.
Our proposed "Fitted-Rectangle Layout" visualization, which we will refer to as FRL for the remainder of this paper, addresses this problem by taking the inter-group edges into consideration when determining the ordering of the clusters on the graph.

3) METHOD: FITTED RECTANGLES
In this section we explain our Fitted Rectangles method in detail. The scope of our method is limited to a better visualization of the combined inter-group edges as produced by the GIB method. Our method does not aim at a better layout of the vertices of the graph. Instead, it proposes a meta-layout that attempts to reduce the inter-group edge overlap caused by the use of the treemap layout in the GIB method. In other words, this paper proposes an alternative to the treemap algorithm for arranging the group boxes.

We propose to alleviate the problem of indiscernible combined inter-group edges by arranging the groups in a U or a Donut layout instead of following a treemap algorithm. As in GIB, each group is allotted a box or rectangle on the 2D screen space whose area is proportional to the number of vertices in the group. The group rectangles are then arranged in a U or Donut layout (explained below). Then the vertices of each group are laid out in the box allotted to that group using any of the popular graph layout algorithms, such as the Fruchterman-Reingold algorithm or the Harel-Koren algorithm.

For the rest of this paper, we will be using the following two terms:

- Connectedness. A group's connectedness is defined as the number of other groups it is connected to. In other words, it is the number of outgoing combined inter-group edges.
- G-skewness. For a given graph, this refers to the fraction of vertices present in the two most connected groups (the groups with the highest connectedness values).

Our approach involves placing the group boxes in a Donut or U layout. In simple terms, in the Donut layout we place the most connected group at the center of the screen and arrange the remaining groups all around the center group (see step (viii) of Figure 1). This method should be preferred when there are a lot of small groups in the graph. However, when the graph contains one or two big clusters and a few small clusters, the Donut layout might waste a lot of space. In such a case, the U layout should be preferred, in which the most connected group is placed at the top center portion of the screen and the remaining groups are arranged on the three sides of that group, forming a U shape (see step (viii) of Figure 2).

In general, a Donut layout is more effective in reducing the edge overlap but is less space-filling than the U layout. Clearly, there is a tradeoff between the two desired effects. So, if the G-skewness is high, our method prefers a U layout over a Donut layout, and vice versa. Defining G to be the total number of groups in the graph, we proceed by dividing the problem into 3 cases:

1. Case1: G<=3 and G-skewness <0.1
   Lay out the group boxes using the treemap algorithm, as in the case of GIB.
2. Case2: G>3 and 0.1<= G-skewness <=0.45
   Lay out the group boxes using the Donut layout.
3. Case3: G>3 and G-skewness >0.45
   Lay out the group boxes using the U layout.

3.1) Donut Layout
Figure 1 shows an example of a Donut layout with 7 groups. The area occupied by each group is calculated as:

$$\text{Group Area} = \alpha \times \text{Screen Area} \times \frac{N_{group}}{N_{graph}}$$

where $N_{group}$ and $N_{graph}$ represent the number of vertices in the group and in the graph respectively, and $\alpha = 1$. The areas for the current example are specified in the table on the left in the figure. Let us assume that the groups are sorted in decreasing order of connectedness. The various steps of the process are marked as (i) to (viii). The figure also shows that each step is associated with an array which stores the groups left to be placed at that step.

We begin with an empty screen and all the groups in step (i). We then place G1, the most connected group, at the center with an aspect ratio proportional to that of the screen. This is represented as a blue box in step (ii). Placing G1 divides the screen into two horizontal (H1 and H2) and two vertical (V1 and V2) empty boxes. The remaining groups will be arranged in these boxes, alternating in the sequence H1, H2, V1, V2.

Since we only know the areas and not the dimensions of the groups, we use the orientation of the empty boxes to determine each group's width or height. While placing a group in a horizontal empty box, we set its height to be the same as that of the horizontal empty box. Its width is then determined by dividing its area by the height. Using these dimensions, the group is placed in the horizontal empty box, aligned with the empty box's left side. For example, in Figure 1, step (ii) has a horizontal empty box labeled H1. The result of placing a group, G2, in H1 is shown in step (iii), creating a new, smaller H1 in (iii). It is easy to see that G2 and the H1 of step (ii) have the same height and share a common left edge. Similarly, while placing a group in a vertical empty box, we set its width to be the same as that of the empty box, and its height is determined from its width and area. The group is aligned with the top edge of the vertical empty box. See the placement of G4 in V1 in steps (iv) and (v) of Figure 1 for an example.

After placing G1, we place the next group, G2, in H1, followed by G3 in H2, G4 in V1 and G5 in V2 (steps (ii) to (vi)). Step (vi) shows that we have G6 and G7 left. So, starting again at H1, we place G6 in H1 (step (vii)). We then try to place G7 in H2, but H2 has no space left. So we move on to V1 and place G7 there (step (viii)). This ends the algorithm, as there are no more groups left to be placed at step (viii).

The method proposed above is not space-filling, which might result in a situation where the algorithm still has some groups to place on the screen but none of the empty boxes is big enough. In such a situation, we restart the algorithm with alpha=0.9*previous alpha and repeat this process until all the groups have been placed on the screen. A detailed algorithm for the proposed method is shown in Figure \ref{}.

3.2) U Layout
Figure 2 shows an example of the U layout. The groups, sorted in decreasing order of connectedness, and their areas are shown in the table on the left in the figure. The algorithm is similar to the Donut layout. The only difference is that the first group, G1, is placed at the core of the U, resulting in one horizontal and two vertical empty boxes (namely H2, V1 and V2) instead of two of each (step (ii) of the figure). The remaining groups are placed around G1, alternating in the sequence H2, V1 and V2 (steps (iii) to (viii)).

In the following sections, we will explain the implementation of our method and then compare our method with the GIB meta-layout.

4) IMPLEMENTATION
The meta-layout presented above was implemented in NodeXL using C#. NodeXL is an open-source network analysis and visualization tool within Excel which allows easy computation of various network analysis metrics such as betweenness centrality, degree and closeness. Users can annotate the vertices and edges with these properties, or with user-defined labels or images, and visualize the results by drawing their graph algorithmically on a 2D screen space. Several layout algorithms, such as Fruchterman-Reingold, Harel-Koren, circular and spiral, are currently offered by NodeXL. For example, the graph in the background towards the right side of Figure 3 has been drawn using the Fruchterman-Reingold layout, and the vertices are sized by betweenness centrality.

Another relevant feature supported by NodeXL is the clustering option, which clusters the vertices of the graph into smaller groups. NodeXL supports various graph clustering mechanisms, such as clustering based on vertex attributes or connected components, or using standard graph clustering algorithms. Upon clustering, the vertices are by default colored by their group id so that it is easy to identify the vertices belonging to the same group while visualizing them. The background graph in Figure 3 shows vertices colored by cluster id after using the Clauset-Newman-Moore algorithm, while the Excel sheet on the left shows the various clusters labeled G1, G2, etc. After clustering, users can also use GIB to visualize the contents of individual groups in greater detail. The features briefly described above are a very small part of the rich functionality provided by NodeXL.

Since NodeXL is a very useful tool for insightful analysis of networks, including clustered graphs, we decided to implement our method as an additional meta-layout option alongside GIB. The dialog box in the screenshot in Figure 3 shows how we provide our method as a layout option (encircled in red). We also provide three options for the style of inter-group edges, as shown in the figure. Users can choose to 'show' all the individual inter-group edges (Figure 4(a)), to 'hide' them completely (Figure 4(b)), or to 'combine' those between two groups into one thick edge going from the center of one group to the center of the other (Figure 4(c)). All the figures in this report (except the simulations in Figures 1 and 2) have been generated by the NodeXL implementation of our method using the 'combined' inter-group edge option.
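
The layout-selection rules and the box-sizing step described in Section 3 can be illustrated with a short sketch. This is not the C# code used in NodeXL; it is a minimal Python illustration, and the names `Group`, `g_skewness`, `choose_layout`, `group_area` and `size_in_box` are hypothetical. The three cases listed in Section 3 do not cover every combination of G and G-skewness, so the sketch assumes that any uncovered combination falls back to the treemap layout.

```python
from dataclasses import dataclass


@dataclass
class Group:
    name: str
    n_vertices: int
    connectedness: int  # number of other groups this group is connected to


def g_skewness(groups):
    """Fraction of all vertices that lie in the two most connected groups."""
    total = sum(g.n_vertices for g in groups)
    top_two = sorted(groups, key=lambda g: g.connectedness, reverse=True)[:2]
    return sum(g.n_vertices for g in top_two) / total


def choose_layout(groups):
    """Pick a meta-layout from the number of groups G and the G-skewness."""
    g = len(groups)
    skew = g_skewness(groups)
    if g > 3 and skew > 0.45:
        return "u"
    if g > 3 and 0.1 <= skew <= 0.45:
        return "donut"
    # Assumption: combinations not covered by the listed cases use the treemap.
    return "treemap"


def group_area(group, n_graph, screen_area, alpha=1.0):
    """Group Area = alpha * Screen Area * N_group / N_graph."""
    return alpha * screen_area * group.n_vertices / n_graph


def size_in_box(area, box_width, box_height, horizontal):
    """Return (width, height) of a group placed in an empty box.

    In a horizontal box the group takes the box's height; in a vertical
    box it takes the box's width. The other dimension follows from the area.
    """
    if horizontal:
        return area / box_height, box_height
    return box_width, area / box_width


if __name__ == "__main__":
    # Seven illustrative groups (made-up sizes), sorted by connectedness.
    sizes = [20, 15, 15, 15, 15, 10, 10]
    groups = [Group(f"G{i + 1}", n, 6 - i) for i, n in enumerate(sizes)]
    n_graph = sum(sizes)
    print(choose_layout(groups))
    area = group_area(groups[1], n_graph, screen_area=800 * 600)
    print(size_in_box(area, box_width=800, box_height=200, horizontal=True))
```

If a placement pass runs out of room, the restart rule from Section 3.1 corresponds to calling `group_area` again with `alpha` multiplied by 0.9 and repeating the placement.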
Figure 1: Example of a Donut layout
Figure 2: Example of a U layout
Figure 3: Screenshot of NodeXL showing the Fitted Rectangle option
Figure 4: Screenshot of NodeXL showing the inter-group edge option
Figure 5: Comparison of the GIB and FRL (Donut) Layouts
Figure 6: Comparison of the GIB and FRL (U) Layouts

5) EVALUATION
Nine participants (6 females, 3 males) who hold or are pursuing a master's degree or higher in a technical field were recruited for a usability study. The participants self-reported that they had previously worked with network data, either biological or social. The experiment followed a within-subjects design, in which subjects were given a set of tasks and asked to answer questions using two layouts, one GIB layout and one FRL layout. Participants were asked to think aloud as they completed the tasks. Additionally, half of our participants looked at a dataset that fell under the "U-layout" FRL (Figure 6), and the other half looked at a dataset that produced a "Donut" (Figure 5). The order of experimental conditions was counterbalanced by alternating the order in which we presented the GIB and FRL layouts.

Each user was asked to complete a series of tasks given two layouts of the same data. These tasks were derived from questions that may arise about a network with regard to the relationships between its various communities. Subjects completed the tasks in their native work environments and were presented the two layouts on a screen, in the way the layouts would naturally be presented to them if they were using NodeXL. They were allowed to complete the tasks in any order they desired. If they consented, they were recorded as they "thought aloud."

The tasks asked users to count the number of outgoing combined edges from each group in a list of groups, to find the group with the maximum number of adjacent groups, to count the number of groups connected to each pair in a list of pairs of groups, and to ascertain whether there was an edge between each pair in a list of pairs of groups. Following each task, users were asked to rate each layout on a scale from 0 to 9 based on the layout's usefulness.

Figure 7: Average user ratings for the 4 tasks for the GIB and FRL (Donut) Layouts
Figure 8: Average user ratings for the 4 tasks for the GIB and FRL (U) Layouts

5.1) Results
Participants rated each of the layouts in terms of its usefulness in helping them complete the tasks. Aggregating the usefulness scores over all of the tasks, the FRL layout (Donut and U combined) received an average score of 6.94/9 (SD=1.47). The GIB layout received an average score of 4.61/9 (SD=1.59). Ratings by layout are shown in Figures 7 and 8.

Task 1: Given a list of groups, count the number of outgoing combined edges from each of them.
The Fitted Rectangle Layout received a score of 6.11/9 on this task (SD=1.90), while GIB received a score of 5/9 (SD=1.58). The Donut layout received an average usefulness score of 7.2/9 (SD=1.79), while the U-layout received a usefulness score of 4.75/9 (SD=.98). In this particular task, users found the GIB layout to be more useful than the U-layout, with a difference of 2.45 between the two average usefulness scores.

Task 2: Find the group with the maximum number of adjacent groups.
For this task, users rated both versions of the FRL layout significantly better than the GIB layout. 5 of the 9 users mentioned that because the most connected group was in the center of the FRL layout, their eyes were drawn to it and they knew immediately that it was the most connected group, while this was difficult to tell with GIB, where the group was located in the upper left corner.

Task 3: Given a list of pairs of groups, count the number of groups connected to each pair.
As in Task 2, users rated both versions of the FRL layout significantly better than the GIB layout. It was at this point that users started to voice their annoyance with the overlapping edges in GIB. At this step, 4 out of the 9 users said that GIB made this task difficult because of the overlapping edges.

Task 4: Given a list of pairs of groups, indicate if there is an edge between them.
Users were relieved that this task required only a "yes" or "no" response, and they completed it more quickly than any of the other tasks. For this task, users also rated the FRL layout more highly, for both the Donut layout and the U-layout.

Fitted Rectangles
We tested both versions of the FRL layout, the Donut layout and the U-layout. On average, the FRL layout was rated higher in usefulness than the GIB layout. With regard to the U-layout, users complained about the thinness of some of the edges and the resulting loss of visibility when they overlapped with a thicker edge. Users also made suggestions regarding the colors of the edges, complaining about the default gray-colored edges. Of the 9 users, 3 specifically suggested that a change of color in the inter-group edges would improve the edges' visibility.

GIB
When completing tasks and answering questions with the GIB layout, 6 out of the 9 users complained about its crowdedness compared to its FRL counterpart. One user mentioned that FRL was easier to analyze because the groups were more "scattered out", even though FRL and GIB used exactly the same area for the entire graph. To this user, it appeared that the groups had more space in which to be scattered.

6) FUTURE WORK AND CONCLUSIONS
In this report we have addressed the problem of visualizing inter-group relations in a clustered graph. Our approach is motivated by another successful layout method, Group-In-a-Box, and we propose an algorithm that arranges the groups on a 2D screen space so that the overlap among the inter-group edges is reduced compared to the GIB layout. Our user study with 9 subjects indicates that the proposed Fitted Rectangle method is indeed useful in visualizing and studying the inter-group relations in huge complex graphs.

However, there were complaints about the thin rectangles that enclosed groups with a small number of vertices. The FRL meta-layout does not take the aspect ratio of groups into consideration, and this is particularly obvious visually for groups with few vertices. In a future implementation of the FRL meta-layout, the aspect ratio of smaller groups would be considered as well.

Additionally, the proposed method is not space-filling. This happens because the area to be occupied by each group is pre-determined, and placing the first group at the core of the Donut or U segments the screen space, leading to an arrangement which might not be optimal. The fixed boundaries of the empty spaces can also contribute to wasted space. A future implementation would employ a technique that makes use of more of the available space.

ACKNOWLEDGEMENTS
We thank Cody Dunne and Dr. Ben Shneiderman for their valuable feedback throughout this project.

REFERENCES
1) Bruls, M., Huizing, K. and van Wijk, J. J. Squarified Treemaps. Proc. Joint Eurographics and IEEE TCVG Symposium on Visualization, IEEE Press, 2000, 33-42.
2) Frantz, T. L. and Carley, K. M. Treemaps as a Tool for Social Network Analysis. Technical report number CMU-ISRI-05-118, 2005. doi=10.1.1.150.1696
3) Johnson, B. and Shneiderman, B. Treemaps: A space-filling approach to the visualization of hierarchical information structures. Proc. 2nd International IEEE Visualization Conference, 1991, 284-291.
4) Rodrigues, E. M., Milic-Frayling, N., Smith, M., Shneiderman, B. and Hansen, D. Group-in-a-Box layout for multi-faceted analysis of communities. SocialCom '11: Proc. 2011 IEEE 3rd International Conference on Social Computing, 2011, 354-361. DOI:10.1109/PASSAT/SocialCom.2011.139
5) Smith, M. A., Shneiderman, B., Milic-Frayling, N., Mendes Rodrigues, E., Barash, V., Dunne, C., Capone, T., Perer, A. and Gleave, E. Analyzing (social media) networks with NodeXL. Proc. 4th International Conference on Communities and Technologies (C&T '09), 2009, 255-264.
6) Wakita, K. and Tsurumi, Y. Finding community structure in mega-scale social networks. Proc. 16th International Conference on World Wide Web, 2007, 1275-1276.