Fitted Rectangles: A Visualization for Clustered Graphs
Snigdha Chaturvedi
Computer Science, University
of Maryland
snigdhac@cs.umd.edu
Zahra Ashktorab
Information School, University
of Maryland
parnia@umd.edu
Rajan Zacharia
Information School,
University of Maryland
rajankz@umd.edu
politicians or actors etc. Another popular approach is to
group the vertices based on some common attribute such as
geographical location or interests. Often the vertices
belonging to the same community tend to behave similarly,
so it can be useful to study individual communities in
greater detail. In most visualization tools, viewing the
communities is facilitated by using distinctive colors and
shapes to represent group identity. For example, politicians
can be represented by red circles and actors by blue
triangles. A recent popular method, the 'Group-In-a-Box
Layout' [4] enhances this further by assigning specific
spatial regions to individual groups. The GIB method lays
out individual groups into different well-separated
rectangles or 'boxes' (from where it derives its name)
revealing the local intra-group network structure and
attributes of the members.
ABSTRACT
Community structures and relationships within graphs are a
significant facet of data analysis and data visualization. The
existence and strength of relationships between communities
are important graph attributes, and the visual clarity of these
relationships can provide insights during the exploratory phase
of data analysis. Currently, there is no effective way of
visualizing a graph that ensures that the relationships between
clusters are clear. We propose the Fitted-Rectangle Layout (FRL),
a meta-layout for graphs that clearly visualizes clusters and
their inter-group edges. FRL takes into
consideration the order of the clusters within a graph when
laying them out and places them in order of decreasing
connectivity of a group to all other groups. The most
connected group is placed in the center of the graph, while
all others are placed around it. The goal of the FRL is to
clearly visualize the inter-group edges and to decrease the
overlapping of inter-group edges that occurs in other treemap
layouts. By conducting a usability study, we found that
users prefer the FRL layout to the GIB layout when seeking
inter-group relationship information in the visualization.
However, apart from intra-group relationships, it is also
interesting to study the inter-group relationships. This helps
in understanding the nature of information exchange
between groups and how they are different from each other.
For example, in any given Twitter network, it would be
interesting to see if there is a stronger relationship between
the sports and the politics communities or between the sports
and the actors communities. One of the ways to visualize this
in the GIB method is to 'combine' all the inter-group edges
between a pair of groups into one thick edge from the center of
one group's box to another. Though this conveniently
conveys the 'strength' of connections between groups, it
leads to an abundance of overlap among these thick
'combined' inter-group edges, especially near groups that
are highly connected to other groups. For example, in
Figure 5(a) we see a lot of overlap between the thick grey
combined edges around G1 and G2. This happens because
the treemap algorithm used to lay out group rectangles does
not take the inter-group connectivity into account.
Author Keywords
network visualization; layout; meta-layout; force-directed;
graph clustering visualization; communities
1) INTRODUCTION
Network visualization is a challenging task involving
studying properties of hundreds and thousands of entities
and the relationships between them. With the recent surge
in popularity of social networks, the growing size of these
networks makes their analysis interesting but more difficult.
Graph clustering plays an important role in network
analysis. It refers to topology-based clustering of vertices
into several groups or communities such that the
connections within groups are tighter than those between
groups. For example, in a Twitter follower network, people
who very frequently follow/re-tweet each other would form
one community such as the community of sports people or
This paper addresses this problem by rearranging the group
rectangles in a 'U' or a 'Donut' shape such that the most
connected group is placed at the core of the 'U' or 'Donut'
and is surrounded by other groups. For example, Figure 5(b)
shows the Donut layout: the edges around
G1 and G2 are more discernible than those in Figure 5(a).
Our user study with 9 subjects indicates that
such an arrangement is more effective in visualizing the
inter-group relationships than the GIB method.
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise,
or republish, to post on servers or to redistribute to lists, requires prior
specific permission and/or a fee.
CHI’13, April 27 – May 2, 2013, Paris, France.
Copyright 2013 ACM 978-1-XXXX-XXXX-X/XX/XX...$10.00.
In the following sections, we describe the related work and
specifically the GIB technique in greater detail. We then
explain our approach to the problem followed by empirical
results establishing its effectiveness. We finally conclude
the paper by summarizing our contributions and suggesting
directions of future work.
Because various clustering algorithms can be used to
determine the groups in our Fitted-Rectangle algorithm,
selecting the right algorithm was an important facet of our
research. One algorithm used was the Wakita-Tsurumi
algorithm, an alternative to the Clauset-Newman-Moore (CNM)
algorithm and one of the three clustering algorithms
provided by NodeXL. Wakita and Tsurumi [6] report that their
algorithm finds connections 70 times faster
than the CNM algorithm. This was a factor we considered
when choosing our clustering algorithm because speed
is important in any visualization tool.
Much research has been conducted on the use of
force-directed layouts for graphs. Such layouts are
important even for the vertices within individual clusters. Two
algorithms commonly employed to lay out vertices are the
Fruchterman-Reingold (FR) layout and the Harel-Koren (HK)
layout. While both of these algorithms aim to lay out
vertices to achieve an optimal graph structure, they use
different methods. HK uses recursion to coarsen the graph
until it reaches an optimal multi-level representation, while
FR uses attractive and repulsive forces to achieve an
optimal equilibrium in a graph.
We implement our meta-layout in NodeXL, a toolkit that
runs as an add-in in Microsoft Excel. Smith et al. [5]
demonstrate the usefulness of NodeXL in data analysis and
visualization by using its various sorting, clustering and
filtering functionalities. The available clustering algorithms
in NodeXL were particularly instrumental in our
implementation of FRL.
2) RELATED WORK
The ease of detecting communities within graphs and
their inter-connectivities is instrumental in revealing
insights. Providing a mechanism to visualize these
communities within a visualization tool could aid in the
detection of important insights within the data. Much
previous work has addressed visualizing communities
within a graph. Treemaps have been used in many studies
to visualize communities and the sub-communities of which
they consist. Frantz and Carley [2] describe the usefulness of
treemaps for analyzing social networks. During the
exploratory part of data visualization and analysis, treemaps
can be particularly insightful and revealing, especially in
social network analysis.
The original treemap, introduced by Johnson and Shneiderman [3],
employed the 'slice and dice' method for representing
hierarchical information in rectangular space in a
space-filling manner. The Group-In-A-Box meta-layout is an
application of treemaps to aid graph cluster visualization.
Rodrigues et al. [4] introduce Group-In-A-Box as a meta-layout
that displays each cluster in a treemap enclosed within its
own “box”. A space-filling technique is employed to fill the
space with clusters that have been grouped based on their
network attributes or clustering algorithms. Each box size is
proportional to the number of vertices its enclosed cluster
includes.
Furthermore, much work has been done on creating boxes
with lower aspect ratios in treemaps. The traditional treemap
algorithms resulted in several thin, elongated
rectangles, which are difficult to point at in an interactive
environment. Bruls et al. [1] propose a method of
organization that results in a treemap consisting of
'squarified' rectangles with lower aspect ratios. While the
aspect ratios of the boxes are significant in a treemap, our
focus is to improve the readability of inter-group
relationships between the clusters in the graph.
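To illustrate why plain slicing yields elongated boxes, the following is a minimal sketch of a single slice step, assuming a flat list of cluster sizes and a 100 x 100 drawing area. The function names `slice_one_level` and `aspect_ratio` are hypothetical and are not taken from any treemap library; this is not the squarified algorithm of Bruls et al.

```python
def slice_one_level(sizes, width, height):
    """Split a width x height rectangle into vertical strips whose areas
    are proportional to the given sizes (one level of slicing only)."""
    total = sum(sizes)
    rects = []
    x = 0.0
    for size in sizes:
        strip_width = width * size / total
        rects.append((x, 0.0, strip_width, height))  # (x, y, w, h)
        x += strip_width
    return rects


def aspect_ratio(rect):
    """Ratio of the longer side to the shorter side (1.0 is a square)."""
    _, _, w, h = rect
    return max(w / h, h / w)


if __name__ == "__main__":
    # Illustrative sizes only: two large clusters and two small ones.
    for rect in slice_one_level([50, 30, 10, 10], 100, 100):
        print(rect, round(aspect_ratio(rect), 2))
```

In this illustrative input the two smallest clusters end up as strips ten times taller than they are wide, which is the kind of box the squarified approach is meant to avoid.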
While the GIB implementation of the squarified treemap is
an improvement over the original slice-and-dice method in
terms of the aspect ratio of boxes, the resulting
overlapping edges between the clusters in GIB increase the
clutter in the graph and thus decrease the ease with which
users can understand and draw conclusions
from the visualized information. The GIB algorithm does
not consider the placement and overlapping of inter-group
edges, which may be
detrimental to the readability of the visualization. Our
proposed Fitted-Rectangle Layout visualization, which
we will refer to as FRL for the remainder of this paper,
addresses this problem by taking the inter-group edges into
consideration when determining the ordering of the clusters
on the graph.
3) METHOD: FITTED RECTANGLES
In this section we explain our Fitted Rectangles method in
detail. The scope of our method is limited to a better
visualization of the combined inter-group edges as
produced by the GIB method. Our method does not aim at a
better layout of the vertices of the graph. Instead, it
proposes a meta-layout that attempts to reduce the inter-group
edge overlap caused by the use of the treemap layout in the
GIB method. In other words, this paper proposes an
alternative to the use of the treemap algorithm for arranging
the group boxes.
In this paper, we propose to alleviate this problem of
indiscernible combined inter-group edges by arranging the
groups in a U or a Donut layout instead of following a
treemap algorithm. As in GIB, each group is allotted a box
or rectangle on the 2D screen space whose area is
proportional to the number of vertices in the group. The
group rectangles are then arranged in a U or Donut layout
(explained below). Then the vertices of individual groups
are laid out in the box allotted to that group using any of the
popular graph layout algorithms, such as the Fruchterman-Reingold
algorithm or the Harel-Koren algorithm.
For the rest of this paper, we will be using the following
two terms:
• Connectedness. A group's connectedness is defined as the
number of other groups it is connected to. In other
words, it is the number of outgoing combined inter-group edges.
divides the screen into two horizontal (H1 and H2) and two
vertical (V1 and V2) empty boxes. The remaining groups
will be arranged in these boxes alternating in the sequence
H1, H2, V1, V2.
• G-skewness. For a given graph, this refers to the fraction
of vertices present in the two most connected groups
(groups with the highest connectedness values).
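As a minimal sketch of the two definitions, assume the graph is given as a list of undirected vertex pairs and a dictionary mapping each vertex to its group. The function names and the tie-breaking rule (alphabetical by group name) are assumptions made for illustration and are not part of the NodeXL implementation.

```python
def connectedness(edges, group_of):
    """Map each group to the number of other groups it is connected to."""
    neighbours = {g: set() for g in set(group_of.values())}
    for u, v in edges:
        gu, gv = group_of[u], group_of[v]
        if gu != gv:  # only inter-group edges count
            neighbours[gu].add(gv)
            neighbours[gv].add(gu)
    return {g: len(others) for g, others in neighbours.items()}


def g_skewness(edges, group_of):
    """Fraction of all vertices that lie in the two most connected groups."""
    conn = connectedness(edges, group_of)
    # Assumption: ties in connectedness are broken by group name.
    top_two = sorted(conn, key=lambda g: (-conn[g], g))[:2]
    in_top = sum(1 for g in group_of.values() if g in top_two)
    return in_top / len(group_of)


if __name__ == "__main__":
    group_of = {"a": "G1", "b": "G1", "c": "G2",
                "d": "G3", "e": "G3", "f": "G3"}
    edges = [("a", "c"), ("a", "d"), ("a", "b"), ("e", "f")]
    print(connectedness(edges, group_of))
    print(g_skewness(edges, group_of))
```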
Since we only know the areas and not the dimensions of the
groups, we use the orientation of the empty boxes to
determine the group's width or height. While placing a
group in a horizontal empty box, we set its height to be
the same as that of the horizontal empty box. Its width is then
determined by dividing its area by the height. Using the
dimensions calculated above, the group is finally placed in
the horizontal empty box, aligned with the empty
box's left side. For example, in Figure 1, step (ii) has a
horizontal empty box labeled H1. The result of placing a
group, G2, in H1 is shown in step (iii), creating a new
smaller H1 in (iii). It is easy to see that G2 and H1 of step
(ii) have the same height and a common left edge.
Similarly, while placing a group in a vertical empty box, we
set its width to be the same as that of the empty box, and its
height is determined using its width and area. The group
and the vertical empty box share the top edge of the empty
box. See the placement of G4 in V1 in steps (iv) and (v) of
Figure 1 for an example.
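The placement rule for a single group can be sketched as follows. This is an illustration under stated assumptions: rectangles are (x, y, w, h) tuples in screen coordinates with y growing downward, and the names `Rect`, `place_in_horizontal` and `place_in_vertical` are hypothetical rather than taken from the NodeXL code.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    x: float
    y: float
    w: float
    h: float


def place_in_horizontal(box, area):
    """Place a group of the given area at the left end of a horizontal
    empty box. Returns (group_rect, remaining_empty_box)."""
    height = box.h                 # group takes the box's full height
    width = area / height          # width follows from the area
    if width > box.w:
        raise ValueError("group does not fit in this empty box")
    group = Rect(box.x, box.y, width, height)
    rest = Rect(box.x + width, box.y, box.w - width, box.h)
    return group, rest


def place_in_vertical(box, area):
    """Place a group of the given area at the top of a vertical empty box.
    Returns (group_rect, remaining_empty_box)."""
    width = box.w                  # group takes the box's full width
    height = area / width          # height follows from the area
    if height > box.h:
        raise ValueError("group does not fit in this empty box")
    group = Rect(box.x, box.y, width, height)
    rest = Rect(box.x, box.y + height, box.w, box.h - height)
    return group, rest


if __name__ == "__main__":
    print(place_in_horizontal(Rect(0, 0, 100, 20), 400))
    print(place_in_vertical(Rect(0, 20, 10, 60), 200))
```

Returning the shrunken empty box alongside the group rectangle mirrors the way H1 becomes a "new smaller H1" after G2 is placed.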
Our approach involves placing the group-boxes in a Donut
or U layout. In simple terms, in the Donut layout, we place the
most connected group at the center of the screen and
arrange the remaining groups all around the center group
(see step (viii) of Figure 1). This method should be
preferred when there are a lot of small groups in the graph.
However, when the graph contains one or two big clusters
and a few small clusters, the Donut layout might waste a lot
of space. In such a case, the U layout should be
preferred, in which the most connected group is placed at
the top center portion of the screen and the remaining
groups are arranged on the three sides of that group,
forming a U shape (see step (viii) of Figure 2).
In general, a Donut layout is more effective in reducing the
edge overlap but is less space filling than the U layout.
Clearly, there is a tradeoff between the two desired effects.
So, if the G-skewness is high, then our method prefers a U
layout over Donut layout and vice versa. Defining G to be
the total number of groups in the graph, we proceed by
dividing the problem into 3 cases:
After placing G1, we place the next group, G2, in H1,
followed by G3 in H2, G4 in V1 and G5 in V2 (steps (ii) to
(vi)). Step (vi) shows that we have G6 and G7 left. So,
starting again at H1, we place G6 in H1 (step (vii)). We then
try to place G7 in H2, but H2 has no space left. So, we
move on to V1 and place G7 there (step (viii)). This ends
the algorithm, as there are no more groups left to be
placed at step (viii).
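The alternating placement with skipping of full boxes can be sketched as below. The sketch tracks only the free length of each empty box along its long side; the box dimensions and group areas in the usage example are invented for illustration and are not the values from Figure 1. The function name `round_robin_place` is hypothetical.

```python
def round_robin_place(group_areas, boxes, order=("H1", "H2", "V1", "V2")):
    """Assign groups to empty boxes in alternating order.

    group_areas: list of (name, area), sorted by decreasing connectedness,
                 excluding the already placed G1.
    boxes:       dict name -> (thickness, free_length), where thickness is
                 the fixed side (height of an H box, width of a V box).
    Returns a dict mapping each group name to the box it was placed in.
    """
    free = {name: list(dims) for name, dims in boxes.items()}
    placement = {}
    turn = 0
    for name, area in group_areas:
        for attempt in range(len(order)):
            box = order[(turn + attempt) % len(order)]
            thickness, length = free[box]
            needed = area / thickness      # extent along the box's long side
            if needed <= length:
                free[box][1] = length - needed
                placement[name] = box
                turn = (turn + attempt + 1) % len(order)
                break
        else:
            # No empty box is big enough for this group.
            raise ValueError(f"no empty box can hold {name}")
    return placement


if __name__ == "__main__":
    boxes = {"H1": (10, 100), "H2": (10, 50), "V1": (10, 100), "V2": (10, 100)}
    groups = [("G2", 300), ("G3", 400), ("G4", 200),
              ("G5", 200), ("G6", 100), ("G7", 150)]
    print(round_robin_place(groups, boxes))
```

With these made-up numbers the assignment follows the same pattern as the walkthrough: G7 skips H2, which is full, and lands in V1.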
1. Case 1: G <= 3 and G-skewness < 0.1
Lay out the group-boxes using the treemap algorithm, as
in GIB
2. Case 2: G > 3 and 0.1 <= G-skewness <= 0.45
The method proposed above is not space-filling, which
might result in a situation where the algorithm still has
some groups to place on the screen but none of the empty
boxes is big enough. In such a situation, we restart the
algorithm with alpha = 0.9 * previous alpha and repeat this
process until all the groups get placed on the screen. A
detailed algorithm for the proposed method is shown in
Figure \ref{}.
Lay out the group-boxes using the Donut layout
3. Case 3: G > 3 and G-skewness > 0.45
Lay out the group-boxes using the U layout
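The three cases can be written as a small selection function. This is a sketch of the cases exactly as listed; the function name `choose_layout` is hypothetical, and combinations that none of the three cases covers return `None` rather than guessing a layout.

```python
def choose_layout(num_groups, g_skewness):
    """Return 'treemap', 'donut' or 'u' following the three listed cases,
    or None when the inputs fall outside all of them."""
    if num_groups <= 3 and g_skewness < 0.1:
        return "treemap"   # Case 1: same arrangement as GIB
    if num_groups > 3 and 0.1 <= g_skewness <= 0.45:
        return "donut"     # Case 2
    if num_groups > 3 and g_skewness > 0.45:
        return "u"         # Case 3
    return None            # not covered by the cases as written


if __name__ == "__main__":
    print(choose_layout(7, 0.30))
    print(choose_layout(7, 0.60))
```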
3.1) Donut Layout
Figure 1 shows an example of a Donut layout with 7
groups. The area occupied by each group is calculated as:
\[ \text{Group Area} = \alpha \times \text{Screen Area} \times \frac{N_{group}}{N_{graph}} \]
where N_group and N_graph represent the number of
vertices in the group and in the graph respectively, and alpha = 1.
The areas for the current example are specified in the table
on the left in the figure. Let us assume that the groups are
sorted in decreasing order of connectedness. The various
steps of the process are marked as (i) to (viii). The figure
also shows that each step is associated with an array
that stores the groups left to be placed at that step.
3.2) U Layout
Figure 2 shows an example of the U layout. The groups,
sorted in decreasing order of connectedness, and their areas
are shown in the table on the left in the figure. The
algorithm is similar to the Donut layout. The only difference is
that the first group, G1, gets placed at the core of the U,
resulting in one horizontal and two vertical empty boxes
(H2, V1 and V2) instead of two of each (step (ii) of the
figure). The remaining groups are placed around G1,
alternating in the sequence H2, V1 and V2 (steps (iii) to
(viii)).
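Both layouts start from the same inputs: the groups sorted by decreasing connectedness, each with an area given by the group-area formula from the Donut layout description. A minimal sketch of that preparation step follows, assuming every vertex belongs to exactly one group so that the group sizes sum to N_graph. The function names are hypothetical, and the alphabetical tie-break is an assumption.

```python
def group_areas(group_sizes, screen_area, alpha=1.0):
    """Area per group: alpha * screen_area * N_group / N_graph."""
    n_graph = sum(group_sizes.values())
    return {g: alpha * screen_area * n / n_graph
            for g, n in group_sizes.items()}


def placement_order(connectedness):
    """Group names sorted by decreasing connectedness.
    Assumption: ties are broken by group name."""
    return sorted(connectedness, key=lambda g: (-connectedness[g], g))


if __name__ == "__main__":
    sizes = {"G1": 50, "G2": 30, "G3": 20}   # illustrative vertex counts
    conn = {"G1": 2, "G2": 1, "G3": 2}       # illustrative connectedness
    print(placement_order(conn))
    print(group_areas(sizes, screen_area=10000))
    # A restart with a smaller alpha shrinks every group by the same factor.
    print(group_areas(sizes, screen_area=10000, alpha=0.9))
```

Passing alpha as a parameter makes the restart rule (alpha = 0.9 * previous alpha) a matter of calling the same function again with a smaller value.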
In the following sections, we will explain the
implementation of our method and then compare our
method with the GIB meta-layout.
We begin with an empty screen and all the groups in step
(i). We then place G1, the most connected group, at the
center with an aspect ratio proportional to that of the screen.
This is represented as a blue box in Step (ii). Placing G1
The background graph in Figure 3 shows vertices colored
by cluster id after using the Clauset-Newman-Moore
algorithm, while the Excel sheet on the left shows the
various clusters labeled G1, G2, etc. After clustering,
users can also use GIB to visualize the contents of individual
groups in greater detail.
4) IMPLEMENTATION
The meta-layout presented above was implemented in
NodeXL using C#. NodeXL is an open-source network
analysis and visualization tool within Excel which allows
easy computation of various network analysis metrics such
as betweenness centrality, degree, closeness etc. The users
can annotate the vertices and edges with these properties or
with user-defined labels or images and visualize the results
by drawing their graph algorithmically on a 2D screen
space. Several layout algorithms, such as Fruchterman-Reingold,
Harel-Koren, circular and spiral, are currently
offered by NodeXL. For example, the graph in the
background towards the right side in Figure 3 has been
drawn using the Fruchterman-Reingold layout and the
vertices are sized by betweenness centrality. Another
relevant feature supported by NodeXL is the clustering
option which clusters the vertices of the graph into smaller
groups. NodeXL supports various graph clustering
mechanisms such as clustering based on vertex attributes or
connected components or using standard graph clustering
algorithms. Upon clustering, the vertices, by default, get
colored by their group id so that it is easy to identify the
vertices belonging to the same group while visualizing them.
The features briefly described above are a very small part of
the huge range of rich functionalities provided by NodeXL.
Since NodeXL is a very useful tool for insightful analysis
of networks including clustered graphs, we decided to
implement our method as an additional meta-layout option
like GIB. The dialog box in the screenshot in Figure 3
shows how we provide our method as a layout option
(encircled in red). We also provide three options for the
style of inter-group edges as shown in the figure. The users
can choose to either 'show' all the individual inter-group
edges (Figure 4(a)) or 'hide' them completely (Figure 4(b))
or 'combine' those between two groups into one thick edge
going from center of one group to another (Figure 4(c)).
All the figures (except the simulations in Figures 1 and 2) in
this report have been generated by the NodeXL implementation
of our method using the 'combined' inter-group edge option.
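The 'combine' option can be illustrated with a short sketch that collapses all edges between a pair of groups into one weighted edge, whose count could then drive the drawn thickness. This is not NodeXL's API: the function name is hypothetical, and edges are assumed to be undirected vertex pairs.

```python
from collections import Counter


def combine_intergroup_edges(edges, group_of):
    """Count the edges between each unordered pair of distinct groups."""
    combined = Counter()
    for u, v in edges:
        gu, gv = group_of[u], group_of[v]
        if gu != gv:                       # intra-group edges are ignored
            combined[tuple(sorted((gu, gv)))] += 1
    return dict(combined)


if __name__ == "__main__":
    group_of = {"a": "G1", "b": "G1", "c": "G2", "d": "G3"}
    edges = [("a", "c"), ("b", "c"), ("c", "a"), ("a", "d"), ("a", "b")]
    print(combine_intergroup_edges(edges, group_of))
```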
Figure 1: Example of a Donut layout
Figure 2: Example of a U layout
Figure 3: Screenshot of NodeXL showing the Fitted Rectangle option
Figure 4: Screenshot of NodeXL showing the inter-group edge option
Figure 5: Comparison of the GIB and FRL (Donut) Layouts
Figure 6: Comparison of the GIB and FRL (U) Layouts
5) EVALUATION
Nine participants (6 female, 3 male) who had
pursued or were pursuing a master's degree or higher in a
technical field were recruited for a usability study.
The participants self-reported that they had previously worked with
network data, either biological or social.
The experiment followed a within-subjects design, where
subjects were given a set of tasks and asked to use two
layouts, one produced by GIB and another by FRL,
to answer questions. Participants were asked to
think aloud as they completed the tasks. Additionally, half
of our participants looked at a dataset that produced a
“U-layout” FRL (Figure 6), and the other half of the
participants looked at a dataset that produced a “Donut” FRL
(Figure 5). The order of experimental conditions was
counterbalanced by alternating the order in which we
presented the GIB and FRL layouts.
Each user was asked to complete a series of tasks given two
layouts of the same data. These tasks were derived from
questions that may arise about a network with regard to the
relationships between the various communities in the
network.
Subjects completed the tasks in their native work
environments and were presented the two layouts on a
screen, in the way the layouts would naturally be
presented to them if they were using NodeXL. They were
asked to solve a series of tasks and were allowed to
complete them in any order they desired. If they
consented, they were recorded as they thought aloud. The
tasks asked users to count the number of outgoing
combined edges from a list of groups, to find the group
which has the maximum number of adjacent groups, to find
the number of groups connected to each of a list of pairs of groups,
and to ascertain whether there was an edge between a series
of pairs of groups. Following each task, users were asked to
rate each layout on a scale from 0 to 9 based on the
layout’s usefulness.
Figure 7: Average user ratings for the 4 tasks for the
GIB and FRL (Donut) Layouts
5.1) Results
Participants rated each of the layouts in terms of usefulness
in helping them complete the tasks. Aggregating the
usefulness scores of all of the tasks, the FRL layout
(combined Donut and U) received an average score of
6.94/9 (SD=1.47). The GIB layout received an average
score of 4.61/9 (SD=1.59). Ratings by layout are shown in
Figures 7 and 8.
Task 1: Given a list of groups, count the number of
outgoing combined edges from each of them.
The Fitted Rectangle Layout received a score of 6.11/9 on
this task (SD=1.90), while the GIB layout received a score of
5/9 (SD=1.58). The Donut layout received an average
usefulness score of 7.2/9 (SD=1.79),
while the U-layout received a usefulness score of 4.75/9
(SD=.98). In this particular task, users found the GIB layout
to be more useful than the U-layout, with a difference of 2.45
between the two average usefulness scores.
Task 2: Find the group with the maximum number of
adjacent groups.
For this task, users rated both versions of the FRL layout
substantially higher than the GIB layout. 5 of the
9 users mentioned that because the most connected group
was in the center in the FRL layout, their eyes were
drawn to it and they knew immediately that it was the most
connected group, while this was difficult to tell with GIB,
since that group was located in the upper left corner.
Task 3: Given a list of pairs of groups, count the
number of groups connected to each pair.
Similarly to Task 2, users rated both versions of the FRL
layout substantially higher than the GIB layout. It
was at this point that users started to voice their
annoyance with the overlapping edges in GIB: 
4 out of the 9 users said that GIB made this task
difficult due to the overlapping edges.
Task 4: Given a list of pairs of groups, indicate if there
is an edge between them.
Users were relieved that this task required only a “yes” or “no”
response and completed this task more quickly than any of
Figure 8: Average user ratings for the 4 tasks for the
GIB and FRL (U) Layouts
the other tasks. For this task, users also rated the FRL
layout more highly for both the Donut layout and the U-layout.
In a future implementation of the FRL meta-layout, the aspect
ratio of smaller groups would be considered as well.
Additionally, the proposed method is not space-filling. This
happens because the area to be occupied by each group is
pre-determined, and placing the first group at the core of the
Donut or U segments the screen space, leading to an
arrangement which might not be optimal. The fixed
boundaries of the empty spaces can also contribute to
wastage of space. A future implementation would employ a
technique that uses more of the available screen space.
Fitted Rectangles
We tested both versions of the FRL layout, the Donut
layout and the U-layout. Collectively, on average, the FRL
layout was rated higher in usefulness than the GIB layout.
With regard to the U-layout, users complained about the
thinness of some of the edges and the resulting decrease in
their visibility if they overlapped with a thicker edge. Users
also made suggestions regarding the colors of the edges,
complaining about the default gray-colored edges. Of the 9
users, 3 specifically suggested that a change of color in the
inter-group edges would improve the edges’ visibility.
ACKNOWLEDGEMENTS
We thank Cody Dunne and Dr. Ben Shneiderman for their
valuable feedback throughout this project.
REFERENCES
1) Bruls, M.; Huizing, K. and van Wijk, J. J. Squarified
Treemaps. Proc. Joint Eurographics and IEEE TCVG
Symposium on Visualization, IEEE Press, 2000, 33-42.
DOI:10.1109/PASSAT/SocialCom.2011.139
2) Frantz, T. L. and Carley, K. M. Treemaps as a Tool for
Social Network Analysis, Technical report number
CMU-ISRI-05-118, 2005, doi=10.1.1.150.1696
3) Johnson, B. and Shneiderman, B. Treemaps: A
space-filling approach to the visualization of hierarchical
information structures, Proc. 2nd International IEEE
Visualization Conference, 1991, 284-291.
4) Rodrigues, E. M.; Milic-Frayling, N.; Smith, M.;
Shneiderman, B. and Hansen, D. Group-in-a-Box layout for
multi-faceted analysis of communities. SocialCom '11:
Proc. 2011 IEEE 3rd International Conference on Social
Computing, 2011, 354-361.
5) Smith, M. A., Shneiderman, B., Milic-Frayling, N.,
Mendes Rodrigues, E., Barash, V., Dunne, C., Capone, T.,
Perer, A. and Gleave, E. (2009). Analyzing (social media)
networks with NodeXL. Proc. of the 4th International
Conference on Communities and Technologies (C&T '09),
255-264.
6) Wakita, K. and Tsurumi, Y. (2007). Finding community
structure in mega-scale social networks, Proc. 16th
International Conference on World Wide Web, 1275-1276.
GIB
When completing tasks and answering questions with the
GIB layout, 6 out of the 9 users complained about its
crowdedness when compared to its FRL counterpart. One
particular user mentioned that FRL was easier to analyze
because the groups were more “scattered out”, even though
FRL and GIB were using exactly the same area for the entire
graph. To this particular user, it appeared that the groups
had more space to be scattered.
6) FUTURE WORK AND CONCLUSIONS
In this report we have addressed the problem of
visualizing inter-group relations in a clustered graph.
Our approach is motivated by another successful layout
method, Group-In-a-Box, and we propose an algorithm to
arrange the groups on a 2D screen space so that the overlap
among the inter-group edges is reduced compared to the
GIB layout. Our user study with 9 subjects indicates that
the proposed Fitted Rectangle method is useful in
visualizing and studying the inter-group relations in large,
complex graphs.
However, there were complaints about the thin rectangles
which encompassed groups with a small number of vertices.
The FRL meta-layout does not take into consideration the
aspect ratio of groups, and this proves to be particularly
visually obvious in groups in which there are few vertices.