Talk Transcript (Word)

>> Bongshin Lee: Good morning, everyone. Thanks for making time during one of the busiest
days of the year. Today we have two speakers from the University of Maryland. Cody Dunne is a
Ph.D. student in the computer science department at the University of Maryland, working under
Ben Shneiderman. Elizabeth Bonsignore is a Ph.D. student in the I School there. She
serves as an operations engineer and intelligence analyst for the Department of Defense, and she
holds Master's degrees in computer science and education.
They are actually on their way to the IEEE conference on social computing in Vancouver, Canada.
So first they will talk about the paper they will be presenting at the conference on Sunday
morning, and then they will talk more about NetViz Nirvana and readability metrics for graph
drawing. Here comes the talk.
>> Elizabeth Bonsignore: Thank you very much, Bongshin. As she said, I'm Elizabeth
Bonsignore, and with Cody's help today I'd like to share with you the results of a study we did
this past spring on the teachability of social network analysis using NodeXL.
As you can see, we both worked on the project, but so did Dana Rotman, another Ph.D. student at
the I School; Marc Smith, a sociologist with Telligent; and Tony Capone, who is here with us.
He's a developer. And our professors are Derek Hansen and Ben Shneiderman.
First, a quick overview of what we're going to cover today: the motivation and research goals
for the study; an evaluation of NodeXL as a teaching tool for social network analysis; the graph
layout principles we dubbed NetViz Nirvana, along with readability metrics; the research methods
we used; samples of student work; and some lessons learned.
The motivation for our study is the observation that sophisticated network analysis tools
basically remain the domain of computer scientists who develop algorithms for drawing
graphs. Most of these tools have steep learning curves, use command-line interfaces, or
require some level of programming expertise. Yet, as most of us know, social network analysis
is important in academic, commercial and social media contexts.
In short, SNA is not just for scientists anymore, because community managers also have a stake
in learning how to use SNA tools to help cultivate their communities.
So our long-term goal is to develop accessible tools and educational strategies to help reach a
broader spectrum of users.
With that overarching goal in mind, the focus of our talk today is an evaluation of NodeXL as a
teaching tool for social network analysis across a broader set of users, as well as the graph
layout principles we've dubbed NetViz Nirvana, which Cody will go over.
First, a little bit about NodeXL. NodeXL is an open source plug-in for Microsoft Excel. The name
stands for Network Overview, Discovery and Exploration for Excel. Because it is an Excel plug-in,
it takes advantage of the spreadsheet format to store your network data.
It also takes advantage of standard spreadsheet manipulation tools such as sorting, filtering
and creating formulas, which you can actually use in your displays.
If we take a quick walk around the layout: on the left we have the spreadsheet, with the data for
your nodes and your edges, and we have the visualization. Worksheet tabs let you switch back and
forth between edges, nodes and clusters.
The visualization and the data are very tightly integrated. If you click a node or an edge in your
graph, it is highlighted in the cell data in your spreadsheet, and vice versa.
You can import your network data from existing spreadsheets, of course, or from several common
social network data sources such as Twitter, or from the output of tools like Pajek.
NodeXL also provides a starter library of basic network metrics, such as centrality and degree
measures, which you can select as needed based on your performance requirements or on the
questions you're asking.
You also have multiple ways to map your data to display attributes, such as using degree to set
the size, shape or opacity of the nodes, and you can use the thickness of the edges as well.
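As a rough illustration of that kind of metric-to-attribute mapping (not NodeXL's own code), the following Python sketch computes each vertex's degree from an edge list and rescales it linearly onto a node-size range. The function names, the toy edge list and the 10-to-40 size range are hypothetical.

```python
from collections import Counter


def degrees(edges):
    """Count the degree of every vertex in an undirected edge list."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def map_to_size(values, min_size=10.0, max_size=40.0):
    """Linearly rescale metric values onto [min_size, max_size] (hypothetical range)."""
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        # Every vertex has the same value, so give them all the midpoint size.
        mid = (min_size + max_size) / 2
        return {k: mid for k in values}
    scale = (max_size - min_size) / (hi - lo)
    return {k: min_size + (v - lo) * scale for k, v in values.items()}


edges = [("ann", "bob"), ("ann", "cat"), ("ann", "dan"), ("bob", "cat")]
sizes = map_to_size(degrees(edges))
print(sizes)  # {'ann': 40.0, 'bob': 25.0, 'cat': 25.0, 'dan': 10.0}
```

The same rescaling could just as well drive opacity or edge thickness.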
And on to the NetViz principles.
>> Cody Dunne: As part of our study, we introduced information science students, who had had a
couple of weeks of class on network analysis, to these principles for creating better network
diagrams. We call them NetViz Nirvana.
NetViz Nirvana means, first, that every node in your network should be visible, so you can
count the number of nodes in a particular area. If the nodes are large enough, you should be able
to read their labels without them occluding each other. We also want the degree of every node to
be countable, because degree ends up being a very good measure of how important an individual
node is, or of the status of an actor in the network.
We also want to be able to follow each edge from its source node to its destination node,
because knowing that relationships exist isn't as useful as being able to find out who they're
between and look at more details about them. Finally, we want to be able to easily identify
clusters and outliers, because that's what we really care about finding. As network analysts, we
want to look at the interesting attributes of our network.
These overall principles were our introduction to creating good network diagrams, but they
don't tell you in fine detail where the problems are in your drawing, and they don't provide a
quantitative measure of how good or bad your drawings are.
So we've been working on readability metrics, which we also introduced to the students during
the course; they are measurements of how readable or understandable a drawing is.
Some of these are common ones like edge crossings, which we've seen before; we also have a node
occlusion metric for overlapping nodes, and edge tunnels for edges that travel underneath nodes.
We define all of these on a continuous scale from zero to one, where zero is the worst case we
think you could have and one is the best case.
In the past these have often been called aesthetic metrics, but we believe that is a bit of a
misnomer, because we're focusing on the readability of the drawing, not necessarily its beauty.
Fortunately, beauty is often correlated with readability.
Now, in the past, people like Purchase focused on global metrics, which give you a single
number for various aspects of your entire visualization, saying it scores 80 percent on node
occlusion or 80 percent on edge crossings, things like that. But a single number is often not
sufficient to guide users to problem areas so they can improve their drawings.
Think of an analogy to Microsoft Word. If Word just told you that you had 20 misspellings in your
document rather than pointing you to where those misspellings were, it would take you
substantially more time to improve your document and get it ready for publication.
We've created node and edge readability metrics to give us this finer granularity and allow us to
pinpoint problems. First, let's look at node occlusion. In the simple drawing on the right we
have nodes A, C and D, plus a fourth node that we can't see because A is occluding it.
There could be additional nodes hiding there that we can't count because they're completely
hidden. But by moving A off to the side, we can read the label on node B, and we see that A is
not connected to node B. It's just a much better representation of the network.
Our occlusion readability metric is proportional to the area lost if you compress all the nodes into
a single static image, like flattening layers in Photoshop or any other image processing application.
So when you flatten all these nodes, you count how much area is lost in the flattening and in a
perfect situation every node is uniquely distinguishable. They have a white border around them
or whatever the background color is and you don't lose any area. Worst case scenario they end
up in a pile together.
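A minimal Python sketch of the flattening idea, assuming nodes are drawn as axis-aligned rectangles (circular nodes would need a different area computation). Dividing the lost area by the lost area of the worst case, every node stacked inside the largest one, is our assumption about how to reach the zero-to-one scale, not necessarily the authors' exact normalization; all names here are hypothetical.

```python
def union_area(rects):
    """Exact area of the union of axis-aligned rectangles (x0, y0, x1, y1),
    computed by coordinate compression: split the plane into cells at every
    rectangle edge and add up the cells whose centre lies inside some rectangle."""
    xs = sorted({x for r in rects for x in (r[0], r[2])})
    ys = sorted({y for r in rects for y in (r[1], r[3])})
    area = 0.0
    for x_lo, x_hi in zip(xs, xs[1:]):
        for y_lo, y_hi in zip(ys, ys[1:]):
            cx, cy = (x_lo + x_hi) / 2, (y_lo + y_hi) / 2
            if any(r[0] < cx < r[2] and r[1] < cy < r[3] for r in rects):
                area += (x_hi - x_lo) * (y_hi - y_lo)
    return area


def occlusion_readability(rects, border=0.0):
    """1.0 when no node overlaps another, 0.0 when all nodes sit in one pile.
    `border` pads each node, mimicking a background-coloured margin."""
    padded = [(x0 - border, y0 - border, x1 + border, y1 + border)
              for x0, y0, x1, y1 in rects]
    areas = [(x1 - x0) * (y1 - y0) for x0, y0, x1, y1 in padded]
    lost = sum(areas) - union_area(padded)        # area hidden by flattening
    worst = sum(areas) - max(areas, default=0.0)  # lost area if every node were stacked
    return 1.0 if worst <= 0 else 1.0 - lost / worst


# Two overlapping nodes plus one isolated node: about 0.8
print(occlusion_readability([(0, 0, 2, 2), (1, 1, 3, 3), (5, 5, 6, 6)]))
```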
Our edge crossing readability metric is the same one used by Purchase: the number of crossings
you count in the drawing, like this intersection here, scaled by an approximate upper bound on
how many there could be.
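The sketch below counts proper crossings in a straight-line drawing and scales them by an upper bound of the kind used in Purchase's work, as we understand it: all pairs of edges minus the pairs that share an endpoint, which can never cross. The orientation test is a standard computational-geometry technique; the function names and the K4 example are hypothetical.

```python
from collections import Counter


def _orient(p, q, r):
    """Sign of the cross product (q - p) x (r - p): 1 left turn, -1 right turn, 0 collinear."""
    val = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (val > 0) - (val < 0)


def segments_cross(a, b, c, d):
    """True if segments ab and cd properly cross (touching or collinear overlap is ignored)."""
    return (_orient(a, b, c) * _orient(a, b, d) < 0
            and _orient(c, d, a) * _orient(c, d, b) < 0)


def crossing_readability(pos, edges):
    """Return (crossing count, readability in [0, 1]) for a straight-line drawing."""
    m = len(edges)
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    c_all = m * (m - 1) / 2                                    # every pair of edges
    c_impossible = sum(k * (k - 1) for k in deg.values()) / 2  # pairs sharing an endpoint
    c_max = c_all - c_impossible
    crossings = 0
    for i in range(m):
        u1, v1 = edges[i]
        for j in range(i + 1, m):
            u2, v2 = edges[j]
            if {u1, v1} & {u2, v2}:
                continue  # adjacent edges cannot properly cross in a straight-line drawing
            if segments_cross(pos[u1], pos[v1], pos[u2], pos[v2]):
                crossings += 1
    return crossings, (1.0 if c_max <= 0 else 1.0 - crossings / c_max)


k4 = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A"), ("A", "C"), ("B", "D")]
square = {"A": (0, 0), "B": (1, 0), "C": (1, 1), "D": (0, 1)}
moved = dict(square, D=(0.4, 0.3))       # D pulled inside triangle ABC
print(crossing_readability(square, k4))  # 1 crossing, readability about 0.67
print(crossing_readability(moved, k4))   # (0, 1.0)
```

Pulling D inside the triangle mirrors the "move it" fix in the talk: the network is unchanged, but the crossing disappears.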
This is bad because crossings like this add extra visual complexity to the graph. If you don't
need the crossings, you should eliminate them, so that users can do things like path-finding
tasks better without having to deal with that extra complexity.
So by moving it down, we get the same network without these intersections and the extra
cognitive load. Edge tunnels are when an edge travels underneath a node without actually
connecting to it. This is the exact same network as before, but because of the positioning, it
looks as if C is not directly connected to B, and it looks as if A is connected to B.
But, in fact, the edge is just hidden directly underneath that node. So this is kind of a trivial
example where things are perfectly aligned; but oftentimes it's very hard to determine whether an
edge is actually connected to a node especially if you have noncircular nodes.
So, just like edge crossings, we create a metric based on the number of tunnels, scaled by an
approximate upper bound on how many there could be. But we also define local edge tunnel and
triggered edge tunnel metrics: how many edge tunnels there are underneath a particular node, and
how many edge tunnels are caused by the edges coming from a particular node.
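One way to sketch the three tunnel counts in Python, assuming circular nodes with known radii and straight-line edges: an edge tunnels under a node when it passes within that node's radius without having the node as an endpoint. The upper bound used here, every edge passing under every node other than its own two endpoints, is our assumption rather than a formula stated in the talk; the names are hypothetical.

```python
import math
from collections import Counter


def point_segment_distance(p, a, b):
    """Shortest distance from point p to the segment ab."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the line through a and b, clamped to the segment.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def edge_tunnels(pos, radius, edges):
    """Return (total, readability, local, triggered).
    local[w]: tunnels passing underneath node w.
    triggered[u]: tunnels caused by edges incident on node u."""
    local, triggered, total = Counter(), Counter(), 0
    for u, v in edges:
        for w, center in pos.items():
            if w in (u, v):
                continue  # an edge legitimately touches its own endpoints
            if point_segment_distance(center, pos[u], pos[v]) < radius[w]:
                total += 1
                local[w] += 1
                triggered[u] += 1
                triggered[v] += 1
    upper = len(edges) * max(len(pos) - 2, 0)  # assumed upper bound
    readability = 1.0 if upper == 0 else 1 - total / upper
    return total, readability, local, triggered


pos = {"A": (0, 0), "B": (1, 0), "C": (2, 0), "D": (1, 1)}
radius = dict.fromkeys(pos, 0.2)
edges = [("A", "C"), ("B", "D")]  # A-C runs straight underneath B
total, readability, local, triggered = edge_tunnels(pos, radius, edges)
print(total, readability, dict(local), dict(triggered))
# 1 0.75 {'B': 1} {'A': 1, 'C': 1}
```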
The first is good for identifying nodes that are problems, and the second is good for identifying
areas around nodes that we might want to manipulate as a whole. We also look at text
readability. The first thing we examined was label height, the height of the text within the
label. Guidelines for proper text height depend on the font and so on, but usually the text should
subtend between 20 and 22 minutes of arc at the viewer's eye.
So no matter what the medium is, we have to know the distance of the user from the medium and
either the pixels per inch of the display or the height of the font in inches on paper, and
compute this visual angle. We then create the metric from approximate lower and upper bounds
based on work other people have done.
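The visual angle itself is simple trigonometry. This hypothetical helper converts a label's pixel height to minutes of arc and checks it against the 20 to 22 arcminute range mentioned above; it does not reproduce the lower and upper bounds behind the actual metric. For print, the font height in inches can be passed with `pixels_per_inch=1`.

```python
import math


def visual_angle_arcmin(text_height_px, pixels_per_inch, viewing_distance_in):
    """Visual angle subtended by text of a given pixel height, in minutes of arc."""
    height_in = text_height_px / pixels_per_inch  # physical text height in inches
    angle_rad = 2 * math.atan(height_in / (2 * viewing_distance_in))
    return math.degrees(angle_rad) * 60           # degrees -> arcminutes


def meets_guideline(arcmin, low=20.0, high=22.0):
    """Check against the 20-22 arcminute range mentioned in the talk."""
    return low <= arcmin <= high


for px in (12, 14):
    angle = visual_angle_arcmin(px, pixels_per_inch=96, viewing_distance_in=24)
    print(px, round(angle, 1), meets_guideline(angle))
# 12 17.9 False
# 14 20.9 True
```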
Finally, label distinctiveness is a measure of how unique each label is in the drawing. If you have
Department of Computer Science, Department of Sociology and so on as your node labels, and
you truncate after eight characters, you won't be able to distinguish between them.
So we create the distinctiveness metric using a prefix tree: you cut it off at your truncation point
for all your labels, and then in each of the subtrees you count the number of nodes that have
identical labels.
Now, you might ask why we don't just use something like Levenshtein distance to measure the
distance between the labels of every pair of nodes. But this would in fact encourage users to
create arbitrarily complex and unnecessarily convoluted node names, rather than names that differ
by just a couple of characters.
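A small Python sketch of the prefix-tree approach: labels are cut at the truncation point and inserted into a character trie, and labels that end at the same trie node are indistinguishable. The talk describes counting identical labels per subtree; the final zero-to-one formula below (distinct truncated labels minus one, over labels minus one) is our own hypothetical normalization, and the function names are illustrative.

```python
def build_truncated_trie(labels, k):
    """Insert each label, cut to its first k characters, into a character trie.
    Children are keyed by single characters; the multi-character key '#count'
    records how many labels end at that node, so it never clashes with a child."""
    root = {}
    for label in labels:
        node = root
        for ch in label[:k]:
            node = node.setdefault(ch, {})
        node["#count"] = node.get("#count", 0) + 1
    return root


def group_sizes(node):
    """Yield the number of labels ending at each terminal node of the trie."""
    for key, child in node.items():
        if key == "#count":
            yield child
        else:
            yield from group_sizes(child)


def label_distinctiveness(labels, k):
    """Hypothetical 0-1 score: 1 if every truncated label is unique, 0 if all collide."""
    n = len(labels)
    if n <= 1:
        return 1.0
    distinct = sum(1 for _ in group_sizes(build_truncated_trie(labels, k)))
    return (distinct - 1) / (n - 1)


labels = ["Department of Computer Science", "Department of Sociology", "History"]
print(label_distinctiveness(labels, 8))   # 0.5 ("Departme" is shared by two labels)
print(label_distinctiveness(labels, 20))  # 1.0
```

For plain grouping, a `Counter` over `label[:k]` gives the same result; the trie becomes more useful when you want to inspect the subtrees where collisions happen.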
>> Elizabeth Bonsignore: Thanks, Cody. We tried to frame our methods along two axes, if you'd
like to think about it that way.
One is depth and one is diversity. We say depth because we wanted to use mixed methods and
a long time frame to study these users: we wanted to learn about the information science
graduate students' process of learning social network analysis, their sense-making process.
We also wanted to document the discovery process that practitioners and experts might use as
they try to find patterns in social structures. We found a qualitative framework known as the
multi-dimensional in-depth long-term case studies (MILCs) approach. It assumes at the outset
that exploration of complex datasets cannot be effectively tested by traditional predetermined,
timed usability tests, which usually last on the order of one to three hours, because the
sense-making process is really not that predictable.
So we felt it was ideal for studying both the learners and the experts. To meet our diversity goal
of supporting a broad base of users, we ran a two-pronged user study.
We used a core set of methods, interview techniques, a pre-survey and a post-survey, and applied
them to two different groups:
computer scientists who had a background in graph drawing and information visualization, and
information science students who were interested in social networks and online communities and
were in fact studying online communities, but weren't necessarily as technical as computer
scientists and didn't know much about graph drawing.
First, a little background on our information science graduate students. We had 15 of them, a
mixture of information science, library science and information management students.
They were all taking a communities of practice course to study online communities, to determine
their social structures and relationships, and to find ways to help cultivate these online
communities.
Each was studying a community of their choice. These ranged from a depression support group, a
Serious Eats gourmet group, a weight loss group, a wedding planning group and an arts and crafts
group to a records management listserv and so on. The time frame for our study was about five
weeks within a 16-week semester.
For data collection, we observed the students in class and in the lab. We transcribed some of
their online discussions and coded and categorized what they were talking about as they were
learning social network analysis. We also observed 13 students individually, nearly all of them,
while they completed certain assignments using NodeXL, and they used a think-aloud protocol,
saying things like "I'm thinking about this" or "I'm really frustrated about that."
We looked at their coursework and assignments, and they kept diaries with journal entries about
where they were in their visualization process, what they were thinking about, and what their
hypotheses were. We did a pre-survey to determine their level of experience and a post-survey to
determine what they thought was most important.
For a few selected students we also did in-depth interviews. We used a grounded theory
approach for the analysis, which basically means that after you have all this data transcribed,
you code and categorize all the salient features.
The study of the computer science graduate students was primarily conducted by Cody. He had
six computer science graduate students, all experienced at some level in graph theory, social
network analysis and information visualization techniques.
He took the core of what we used for the information science students: a pre-survey, and then
just under two hours with each participant to give them a tutorial and observe them interacting
with their own dataset, not a prescribed dataset, but a community or social network they were
interested in.
Part of that observation process was an in-depth interview. Again, we used a grounded theory
approach, and Cody also used Spotfire and some quantitative analysis of the surveys.
As we were coding and categorizing the transcripts, especially the online discussions and the
journal entries, some salient features of teaching social network analysis and of the learning
process emerged. Basically, the students really enjoyed being able to map certain metrics to
display attributes. In other words, they wanted to make node sizes big if the degree was high,
and they wanted to change the shape of nodes that had certain characteristics, like community
moderators.
We found that NodeXL supports this very effectively, letting them do it dynamically, pretty much
through a menu item. And the students became almost obsessive after they learned about the
graph layout principles known as NetViz Nirvana. Although the capability wasn't built into NodeXL
yet, they found it really important to try to make meaningful graphs, and it helped them develop
relatively sophisticated graphs in a short period.
Now some student examples. I'm going to go through three of them. The first two share the
same goal: the students were interested in finding out whether boundary spanners, or bridges,
occurred across different subgroups or forums. Both used degree and tie strength as their
metrics to represent that.
This student studied the Subaruowners.com community. Again, she identified boundary spanners
and wanted to show the level of participation in different forums. These spheres here are the
forums she was interested in, and the smaller nodes, triangles and diamonds are the people who
are part of that community.
What you can see right away is that the members and cafe forums have a lot more connectivity, a
lot more common participants. This is probably because their discussions are more relaxed or
broader than those in the problems and solutions group.
In the problems and solutions group, many people just go in, ask a question, and they're out of
there; they don't keep coming back and making friends.
Another thing she found interesting was that the hosts, who are ostensibly the people supposed
to keep the discussion going, were in fact some of the lower-level participants in the group, with
the exception of this one right here. Who in fact came out as community leaders were these two
little triangles: they didn't necessarily have any formal leadership role, but they emerged based on
their participation.
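Counting "common participants" between forums amounts to projecting the person-forum network onto the forums. A minimal Python sketch, with invented participation data loosely named after the forums mentioned above:

```python
from collections import defaultdict
from itertools import combinations


def shared_participants(posts):
    """posts: iterable of (person, forum) pairs.
    Returns {(forum_a, forum_b): number of people active in both}, omitting zero counts."""
    members = defaultdict(set)
    for person, forum in posts:
        members[forum].add(person)
    shared = {}
    for a, b in combinations(sorted(members), 2):
        common = len(members[a] & members[b])
        if common:
            shared[(a, b)] = common
    return shared


posts = [("p1", "members"), ("p1", "cafe"), ("p2", "members"), ("p2", "cafe"),
         ("p3", "problems"), ("p3", "members"), ("p4", "problems")]
print(shared_participants(posts))
# {('cafe', 'members'): 2, ('members', 'problems'): 1}
```

The resulting counts could serve as tie-strength weights on edges between forum vertices.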
In the second example, the student again wanted to identify boundary spanners across different
subgroups, but she also wanted to find out about Ravelers, people who knit and crochet: what are
the properties of those who completed more knitting or crocheting projects, and what factors led
them to complete more projects? She picked the top 20 posters and contributors from three
different groups.
One was Lazy, Stupid and Godless, the LSG community; another was Project Spectrum. What you
find in this first drawing is that they're pretty much tight, distinct and separate subgroups.
There aren't many boundary spanners across the groups; they tend to stay among themselves.
You also see that the LSG community is much more social, because its members contribute a lot
more posts, whereas Project Spectrum is focused on completing projects and perhaps just
highlighting that fact in their posts.
In the next visualization she kept only the bloggers within those communities; the community
moderators or leaders of each subgroup are shown by these spheres. This confirmed the relatively
simple hypothesis that those who were moderators, those who participated more, actually
completed more projects.
But it's still an interesting visualization. The third student didn't really achieve NetViz Nirvana
per se, because there are lots of edge crossings, but his work is really interesting in that he tried
to model a community management problem using NodeXL, and NodeXL was pretty effective at
doing that. He was studying a records management listserv and trying to identify which features
of the different members would be reflected in their leadership and experience levels.
His hypothesis was that those with high betweenness centrality and those connected to
important people, that is, those with high eigenvector centrality, would turn out to be the leaders
in the community. That held true: in this first graph on the left, you see that the admin of the
listserv community in fact had the highest betweenness and eigenvector centrality. Those with
lower betweenness, who were less connected or didn't bridge different subgroups, are hanging
out on the periphery, and they're shown in red as well.
Once he confirmed that his hypothesis was correct, what he tried to model was who would be the
best candidate if he skipped, or took out, the existing admin: if that admin left the community,
would there be anybody with the requisite skills, experience and connections to come in and fill
that person's place?
He skipped the node in NodeXL, which is very easy, a single step, and was able to show that a
new admin sort of popped up.
So, a relatively sophisticated model for somebody who had only been learning for about three to
five weeks.
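For readers who want to reproduce this kind of analysis outside NodeXL, here is an independent Python sketch (not NodeXL's implementation): Brandes' algorithm for betweenness centrality, power iteration for eigenvector centrality, and a helper that deletes a vertex, which we assume is analytically equivalent to skipping it. The toy listserv graph and its member names are invented.

```python
import math
from collections import deque


def betweenness(adj):
    """Brandes' algorithm for an unweighted, undirected graph given as {node: [neighbours]}."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack, pred = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)  # number of shortest paths from s
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:                   # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:                   # accumulate dependencies, farthest nodes first
            w = stack.pop()
            for v in pred[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: b / 2 for v, b in bc.items()}  # each undirected path was counted twice


def eigenvector(adj, iterations=200):
    """Power iteration on (A + I); adding I avoids oscillation on bipartite graphs
    such as trees without changing the eigenvectors."""
    x = dict.fromkeys(adj, 1.0)
    for _ in range(iterations):
        nxt = {v: x[v] + sum(x[w] for w in adj[v]) for v in adj}
        norm = math.sqrt(sum(val * val for val in nxt.values())) or 1.0
        x = {v: val / norm for v, val in nxt.items()}
    return x


def skip(adj, node):
    """Return a copy of the graph with `node` and its edges removed."""
    return {v: [w for w in nbrs if w != node] for v, nbrs in adj.items() if v != node}


edges = [("admin", "a1"), ("admin", "a2"), ("admin", "a3"),
         ("admin", "bea"), ("bea", "b1"), ("bea", "b2")]
adj = {}
for u, v in edges:
    adj.setdefault(u, []).append(v)
    adj.setdefault(v, []).append(u)

bc = betweenness(adj)
ev = eigenvector(adj)
print(max(bc, key=bc.get), max(ev, key=ev.get))  # admin admin
after = betweenness(skip(adj, "admin"))
print(max(after, key=after.get))                 # bea
```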
As for our lessons learned: what we found primarily was that when you promote awareness of
NetViz Nirvana, people learn it. When they know they have to make readable graph layouts, they
learn it pretty well, and in fact they get sort of obsessive about it.
One thing that NodeXL didn't support very well, and for which the students devised their own
workarounds, was scaffolding their learning with interaction history. They would save different
versions and the different paths they followed with their visualizations and hypotheses as they
went along. But what they kept saying was that a history, or a way to undo actions, would be
really helpful for learning: "I thought of this idea; it didn't quite work." It might also be good to
have a library of those sorts of histories, so that other people who are learning social network
analysis could learn from them as well.
There were also some pacing issues: the tutorial went from low numbers of nodes very quickly to
lots of nodes, and it got a little bit hard for them to take it all in.
And many of them said they would have liked more advanced Excel experience, if not with Excel
itself, then with the Excel 2007 interface. For researchers, what we found is that MILCs are a
very effective way to represent the discovery process that subject matter experts follow as they
analyze these complex datasets. They also map the sense-making process that students follow.
In fact, consider that of the 132 reports or articles published, I guess, between 2005 and 2008,
and 2007 and 2008, in InfoVis and VAST, only 39 had any user evaluation at all, and all of those
lasted between one and three hours. They were all standard timed analyses of predetermined
datasets. So this was probably a good example of trying to map the actual process of learning
SNA as well as the discovery process.
Of course, it's kind of obvious but it merits mention that it does require more data collection and
analysis.
>> Cody Dunne: From our CS students in particular, but also from our information science
students, we learned a lot of more specific things about our tool, and about tools in general, and
how to improve them. We've shown again that multiple coordinated views, having our tabular
view and visualization with brushing and linking between them, are very effective in allowing
people to analyze the data.
And we've also shown that users enjoy being able to code all these visual attributes based on the
data that they have.
More interestingly, we showed that teaching users about the readability metrics helps them
produce better drawings than they do without them. Also, more specific to Excel, we found that
having the tabular view, with all the formulas and macros you can apply in Excel, allows for very
extensive and novel data manipulation techniques, like writing formulas based on constants
defined elsewhere, manually tweaking those constants, and seeing the visual updates reflected.
However, being inside Excel, we're limited to what Excel templates allow us to do. So, as
Elizabeth said, having undo functionality, or a history of previous actions that you can go back
to, would be very helpful for these tools.
And finally, the big key point is that users would really like to be able to aggregate nodes and
edges. Currently NodeXL has limited edge aggregation functionality, but users constantly
requested aggregation techniques, not just for communities but for user-defined groupings as
well. For example, if you're looking at an e-mail network, you might want to combine e-mail
addresses that you know belong to a single person.
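A minimal sketch of user-defined aggregation in Python: each vertex is mapped to a group (here, e-mail addresses to a person), edges are re-pointed at the groups, and parallel edges are counted. Treating edges as undirected and dropping edges inside a group are our design assumptions; the function name and the addresses are made up.

```python
from collections import Counter


def aggregate(edges, group_of):
    """Collapse vertices into user-defined groups.
    edges: iterable of (u, v) pairs, treated as undirected.
    group_of: vertex -> group name; unmapped vertices keep their own name.
    Returns a Counter of (group_a, group_b) -> number of original edges."""
    merged = Counter()
    for u, v in edges:
        gu, gv = group_of.get(u, u), group_of.get(v, v)
        if gu == gv:
            continue  # edge internal to one group, e.g. mail between one person's addresses
        merged[tuple(sorted((gu, gv)))] += 1
    return merged


emails = [("alice@work.example", "bob@example.org"),
          ("alice@home.example", "bob@example.org"),
          ("alice@work.example", "alice@home.example"),
          ("carol@example.net", "alice@home.example")]
person = {"alice@work.example": "Alice", "alice@home.example": "Alice"}
print(dict(aggregate(emails, person)))
# {('Alice', 'bob@example.org'): 2, ('Alice', 'carol@example.net'): 1}
```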
In conclusion, we studied two diverse groups: one through a multi-dimensional in-depth
long-term case study, and one through shorter, focused, intense assignments using their own
datasets. All of them were invested in accomplishing their analyses.
First, on network analysis education: our information science students showed that NodeXL can
be very useful for teaching basic network analysis techniques because of its shallow learning curve.
As for NodeXL usability and design, feedback from our CS and IS users gave us lots of feature
requests and bug reports that allowed substantial improvement to the program even over the
course of the user study, and we're just finishing a large redesign of the user interface based on
some of these requests.
If you have any questions, you can ask us now, or you can look at the 50-page NodeXL tutorial
here on the Catsky [phonetic] website. Here's the NodeXL website, and here are more
visualization links at the University of Maryland.
>> Elizabeth Bonsignore: Thank you.
[applause]
>>: Cody, what are you doing to actually compute these readability metrics today? Are you
doing computational analysis of the geometry of the graph?
>> Cody Dunne: Yes, we just look at the layout provided by the layout algorithm, and from that
we find the number of edge intersections and where nodes occlude each other. The current
implementation, which I'll talk about a little more, is in SocialAction, another network analysis
tool, and it uses fairly naive approaches for computing these intersections.
But if you look at the work by, oh, University of Maryland, I'll send you the link, there's a large
reference library of fast edge intersection, line intersection and rectangle intersection
algorithms that you can use to substantially speed up this process.
Because in the best case you would be computing these in real time and giving feedback
to the user about where the problems are in the network.
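As one example of the faster techniques mentioned here, the following Python sketch uses sweep-and-prune, a standard broad-phase method, to find overlapping node rectangles without testing every pair, and checks it against a brute-force version. The box sizes and random layout are hypothetical, and the sweep assumes every rectangle has positive width.

```python
import random


def overlapping_pairs(rects):
    """Sweep-and-prune over x: index pairs of axis-aligned rectangles (x0, y0, x1, y1)
    with positive width whose interiors overlap. Still O(n^2) in the worst case,
    but skips pairs that are far apart along the sweep axis."""
    order = sorted(range(len(rects)), key=lambda i: rects[i][0])
    active, pairs = [], []
    for i in order:
        x0, y0, x1, y1 = rects[i]
        # Rectangles ending at or before x0 cannot overlap this or any later rectangle.
        active = [j for j in active if rects[j][2] > x0]
        for j in active:
            if rects[j][1] < y1 and y0 < rects[j][3]:  # y-intervals overlap too
                pairs.append((min(i, j), max(i, j)))
        active.append(i)
    return sorted(pairs)


def brute_force_pairs(rects):
    """Reference O(n^2) check used to validate the sweep."""
    n = len(rects)
    return sorted((i, j) for i in range(n) for j in range(i + 1, n)
                  if rects[i][0] < rects[j][2] and rects[j][0] < rects[i][2]
                  and rects[i][1] < rects[j][3] and rects[j][1] < rects[i][3])


rng = random.Random(42)
nodes = []
for _ in range(300):
    x, y = rng.uniform(0, 100), rng.uniform(0, 100)
    nodes.append((x, y, x + 3, y + 2))  # hypothetical 3 x 2 node boxes
print(len(overlapping_pairs(nodes)), overlapping_pairs(nodes) == brute_force_pairs(nodes))
```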
>>: I have a question about that. So the current tool doesn't give them feedback on the
readability metrics?
>> Cody Dunne: No, NodeXL currently doesn't, but it will soon. I have a follow-up addendum
presentation that talks more in depth about the readability questions, and it will address that.
>>: I missed the beginning. Is it a manual layout tool, or does it use some edge crossing removal
that somehow sort of --
>> Cody Dunne: So NodeXL provides a couple of algorithms. We have the [inaudible] algorithm,
an older force-directed approach, and we now have the [inaudible] multi-scale approach for
laying out the graph. There are also circle layouts, and various others like sine waves, grids,
spirals and things like that.
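For readers unfamiliar with force-directed layout, here is a compact Python sketch in the Fruchterman-Reingold style: every pair of nodes repels, connected nodes attract, and the maximum step shrinks each iteration. The constants (unit frame, cooling factor 0.95, 100 iterations) are illustrative assumptions, and this is not NodeXL's implementation.

```python
import math
import random


def force_directed(nodes, edges, width=1.0, height=1.0, iterations=100, seed=0):
    """Fruchterman-Reingold-style layout. nodes: list of ids; edges: list of (u, v)."""
    rng = random.Random(seed)
    pos = {v: [rng.uniform(0, width), rng.uniform(0, height)] for v in nodes}
    k = math.sqrt(width * height / max(len(nodes), 1))  # ideal edge length
    temp = width / 10                                   # maximum move per iteration
    for _ in range(iterations):
        disp = {v: [0.0, 0.0] for v in nodes}
        # Every pair of nodes repels with force k^2 / d.
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
                d = math.hypot(dx, dy) or 1e-9
                f = k * k / d
                disp[u][0] += dx / d * f
                disp[u][1] += dy / d * f
                disp[v][0] -= dx / d * f
                disp[v][1] -= dy / d * f
        # Connected nodes attract with force d^2 / k.
        for u, v in edges:
            dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
            d = math.hypot(dx, dy) or 1e-9
            f = d * d / k
            disp[u][0] -= dx / d * f
            disp[u][1] -= dy / d * f
            disp[v][0] += dx / d * f
            disp[v][1] += dy / d * f
        # Move each node by at most `temp`, keeping it inside the frame.
        for v in nodes:
            dx, dy = disp[v]
            d = math.hypot(dx, dy)
            if d > 0:
                step = min(d, temp)
                pos[v][0] = min(width, max(0.0, pos[v][0] + dx / d * step))
                pos[v][1] = min(height, max(0.0, pos[v][1] + dy / d * step))
        temp *= 0.95  # cool down so the layout settles
    return {v: tuple(p) for v, p in pos.items()}


layout = force_directed(["a", "b", "c", "d"], [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")])
for node, (x, y) in layout.items():
    print(node, round(x, 3), round(y, 3))
```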
>>: Can a user then somehow make it better if the algorithm doesn't do a satisfactory --
>> Cody Dunne: That's what all of our I School students did.
>> Elizabeth Bonsignore: In fact, they spent lots of time on it. Although NodeXL didn't, or at
least doesn't yet, automatically remove all these edge crossings for them, once they were aware
of NetViz Nirvana, once they had read Cody's paper and Ben's paper and learned about all these
different factors, node occlusions, edge crossings and edge tunnels, they spent a lot of time in
their diaries and in all the interviews we had with them saying, I want to get rid of them and make
my drawing more meaningful. In a relatively short time they developed a really sophisticated
idea of how they should be drawing graphs manually.
>>: They did that via dragging?
>> Elizabeth Bonsignore: Yes, they really liked locking certain sections of the graph and then
experimenting with other ones. So, yeah, it was really -- they got obsessive about it, actually.
>>: So you said that when they learned, they were more obsessed. So when did you teach
them?
>> Elizabeth Bonsignore: It was right -- let's see, they had a tutorial in NodeXL. They read Cody
and Ben's tech report, and then they discussed it in class. Around that same time they were just
starting to input the data for their communities, saying "this is what I think will happen," and they
started realizing "this looks really crappy," or they didn't like the way it looked after the first
automatic layout, and then they would play around with it.
>>: So what was the exact test you guys gave to them? What did you ask them to do?
>> Elizabeth Bonsignore: They had these online communities of interest that they had started to
study, and they were using various communities of practice ideas to go beyond participation
metrics and learn more about how to keep these communities growing, so they wouldn't die off
the way many online social networks do.
And for the NodeXL section, they wanted to see how social network analysis tools, statistics and
visualizations, could help them give community managers more insight into these sorts of
things.
So they learned NodeXL. They were given two assignments: state a hypothesis, either about a
management problem, like the third student did, or about the interests of subgroups, and then
make two visualizations based on it. The first visualization would show whether it confirms your
hypothesis or shows you something else, and the second would either model some sort of
problem or look at it from another facet.
Does that answer your question?
>>: And what was the average size of the graphs they worked with?
>> Elizabeth Bonsignore: Oh, that's a good question. It ranged from -- well, there was one
student studying loss PD. I'm not sure; she may have had two subgroups, I don't remember. That
was probably the top end. But they were on the order of hundreds.
>> Cody Dunne: Some of our CS users dealt with much larger datasets. One analyzed a
protein-protein interaction network with about 40,000 edges, and another looked at the VAST
2009 Challenge dataset of Flitter relationships; Flitter is a Twitter clone created for the dataset.
The version of NodeXL we used at the time didn't handle these very large datasets very well.
We got a lot of good feedback on how to reduce the burden on users of sitting there waiting for
things to finish, such as disabling some of the real-time updating, adding more progress bars,
and being able to cancel actions.
So currently NodeXL is more of an introductory social network analysis tool, but as it progresses, it scales to larger and larger networks. We actually have a research assistant working now on porting some of these algorithms, like betweenness centrality, over to NVIDIA's CUDA architecture for graphics processors to improve their speed. And we have Vladimir and Virash working on porting them to a MapReduce framework so you can offload large networks into the cloud.
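Betweenness centrality comes up here as a candidate for GPU and MapReduce ports. As an illustration only (not NodeXL's code), here is a minimal serial sketch of Brandes' algorithm for unweighted, undirected graphs; the function name and the adjacency-dict input format are assumptions. The outer loop over source nodes does independent work for each source, which is one reason this computation is a natural candidate for parallel hardware.

```python
from collections import deque


def betweenness_centrality(adj):
    """Brandes' algorithm for unweighted graphs (illustrative sketch).

    adj maps every node to an iterable of its neighbours. For an undirected
    graph each edge must appear in both directions; scores are halved at the
    end so each unordered pair of endpoints is counted once.
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:  # each source is independent work
        # Single-source BFS recording shortest-path counts and predecessors.
        stack = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies in order of non-increasing distance from s.
        delta = {v: 0.0 for v in adj}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: c / 2 for v, c in bc.items()}
```

For a path a-b-c, only b lies between another pair of nodes, so it alone gets a nonzero score.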
>>: So for the first user group you gave us three examples. Do you have any examples for the graduate students?
>> Cody Dunne: Not in the slide show, but I can bring them up if you like. They didn't learn --
>>: I can see it later.
>> Cody Dunne: And so they had different tasks. It wasn't about the communities-of-practice approach. I gave them more directed tasks, like finding an interesting community and an interesting actor and highlighting them in the final visualization, but they didn't have nearly as much time to explore the data fully and arrange the final layout for publication.
>>: I have one question. Actually, this NetViz Nirvana thing, to me none of it is very surprising. Even if you don't teach users, they may still try to do [inaudible]. For example, if someone gives me a graph, the first thing I'll do is remove the edge crossings and then position the nodes in the right locations so the tree is [indiscernible]. So how much do you think telling them about it actually affected them?
>> Elizabeth Bonsignore: That's a great question, because around the same time, a little after Derek Hansen started this communities-of-practice course, a colleague in the business school gave a two-week assignment -- I think he had two courses -- to his business students, who were pretty well versed in social network analysis and metrics, the statistics end of it.
But they didn't really care so much about the graph and the information visualization part. They were not told anything about NetViz Nirvana, and in their assignments, their write-ups, and some of the surveys we gave them, they didn't try to do anything with their graphs other than the standard layout in NodeXL. And they said, yeah, we think the statistics are more important, because I can look at a number and it means more to me than looking at the structure, because this is kind of a hairball right now and I don't care about spending time working on it. Part of it might be that they had only two weeks instead of the four to five weeks the other students had. But we also like to think part of it was that they didn't care whether it looked pretty or meaningful.
>> Cody Dunne: We're also trying to teach social network analysis to students who have no
graph theory background whatsoever. They don't know what a node or an edge is when they
start. Giving them this leg up on creating better visualizations saves them time later on.
>> Elizabeth Bonsignore: Most of the IS students, the Library Science students, really didn't know what a node or an edge was when they started.
>>: I have a related question. So you said people get obsessed trying to --
>> Elizabeth Bonsignore: At least the 15 that we worked with.
>>: So do you have an idea how long they spent?
>> Elizabeth Bonsignore: Yes, we did. On average, per assignment, they would spend two to three hours sitting and playing with the diagrams and testing things.
>>: This is for exploration steps where they --
>> Elizabeth Bonsignore: This was probably for exploration and the steps to complete the final visualization they submitted for the assignment. Now, overall, for all the assignments, I had some numbers. They spent about 10 to 12 hours on average -- I guess the fastest person spent about three hours on his visualization, and at the high end, or the median, it was probably more like eight to ten hours.
>>: So that is a big chunk of time. Do you envision eventually being able to cut that significantly
through the use of your metrics?
>> Cody Dunne: Yeah. The demo video I'll show you in a little bit gives you visual attributes that show where the problems are. It's still manual manipulation, unfortunately, but you can select groups of nodes to manipulate together.
And we're currently exploring a snap-to-grid feature that lets users pull nodes to local minima of the readability metrics as they move them around, as well as feeding the metrics back into automatic layout algorithms, although the layout algorithms tend to become very expensive once you start optimizing these things.
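The snap-to-grid idea can be sketched as a small local search: snap the dragged node to a grid, try nearby grid points, and keep whichever scores best on a readability metric. The speakers describe the feature only at a high level, so the helper name `snap_to_best`, the grid `step`, the search `radius`, and the pluggable `cost` function are all hypothetical.

```python
import itertools


def snap_to_best(pos, node, cost, step=10.0, radius=2):
    """Hypothetical snap-to-grid helper: local search over nearby grid points.

    pos maps node -> (x, y); cost(pos) returns a number where lower is better
    (for example, a crossing or occlusion count). Returns a new position dict
    and leaves the input untouched.
    """
    x0, y0 = pos[node]
    # Snap the dragged position to the nearest grid point first.
    gx, gy = round(x0 / step) * step, round(y0 / step) * step
    best_pos, best_cost = None, float("inf")
    # Evaluate every grid point within `radius` steps of the snapped point.
    for dx, dy in itertools.product(range(-radius, radius + 1), repeat=2):
        candidate = dict(pos)
        candidate[node] = (gx + dx * step, gy + dy * step)
        c = cost(candidate)
        if c < best_cost:  # strict: ties keep the first candidate found
            best_pos, best_cost = candidate, c
    return best_pos
```

Because only a small neighbourhood is searched, this pulls a node toward a local minimum of the chosen metric but cannot escape a poor global arrangement.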
Are we ready to move on?
>> Elizabeth Bonsignore: Yeah.
>> Cody Dunne: Now, let's talk a little bit more about these readability metrics. Again, I've been working on these with Ben Shneiderman back at the University of Maryland. We have our network. We can encode attributes of the network in color, shape, size, and so on; here we just have colors. This is in the SocialAction network analysis tool. And here is a refresher on our NetViz Nirvana: we want all of these to be satisfied by our final drawing. And we've created readability metrics that measure how understandable these drawings are.
Now, SocialAction is a social network analysis tool created by Adam Perer at the University of Maryland with Ben Shneiderman. Like NodeXL, it incorporates statistical measures such as centrality and degree, along with an attribute ranking system that lets you see a ranked tabular view of these measures alongside the network visualization. Again, you have brushing and linking between them.
It uses a multiple-coordinated-views approach. This attribute ranking system lets us very easily incorporate the readability metrics, because we can just feed them in as additional node attributes, and we automatically get the coloring for them and a ranked table that shows where the crossings are in your graph. We've added global readability metrics for the entire graph to SocialAction, as well as node and edge readability metrics that pinpoint the problems in the graph as nodes are moved.
Okay. So now I'm going to show you a brief demo of the tool in action. This is SocialAction right here. It has a continuous layout: as the user modifies the layout, it continues to reposition the nodes around it. We're loading in a small dataset from the Alberta Politics discussion newsgroup of people replying to each other, and then by tweaking some layout parameters we can get a good starting point for our visualization.
What the user is doing here is changing the repulsion forces and the forces pulling nodes together to try to create a good initial layout. Once they've done that, they can rank by out-degree, in-degree, or any of the statistical measures we have right here; the nodes show up colored in the rank table on the side and are highlighted here in those colors.
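The repulsion and attraction parameters being tuned here correspond to a spring-electrical (force-directed) model. SocialAction's actual layout engine is not described in the talk, so the following is a generic sketch under that assumption: every pair of nodes repels with strength `repulsion / d**2`, and each edge acts as a spring with stiffness `spring` and rest length `rest`. The function and parameter names are illustrative.

```python
import math


def spring_layout(pos, edges, spring=0.1, repulsion=1.0, rest=1.0,
                  step=0.1, iterations=200):
    """Generic spring-electrical layout (illustrative, not SocialAction's)."""
    pos = {v: list(p) for v, p in pos.items()}  # work on a copy
    nodes = list(pos)
    for _ in range(iterations):
        force = {v: [0.0, 0.0] for v in nodes}
        # Coulomb-style repulsion pushes every pair of nodes apart.
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx = pos[u][0] - pos[v][0]
                dy = pos[u][1] - pos[v][1]
                d = max(math.hypot(dx, dy), 1e-6)
                f = repulsion / d ** 2
                force[u][0] += f * dx / d
                force[u][1] += f * dy / d
                force[v][0] -= f * dx / d
                force[v][1] -= f * dy / d
        # Hooke springs pull connected nodes toward the rest length.
        for u, v in edges:
            dx = pos[v][0] - pos[u][0]
            dy = pos[v][1] - pos[u][1]
            d = max(math.hypot(dx, dy), 1e-6)
            f = spring * (d - rest)
            force[u][0] += f * dx / d
            force[u][1] += f * dy / d
            force[v][0] -= f * dx / d
            force[v][1] -= f * dy / d
        # Move each node a small step along its net force.
        for v in nodes:
            pos[v][0] += step * force[v][0]
            pos[v][1] += step * force[v][1]
    return {v: tuple(p) for v, p in pos.items()}
```

In this model, lowering `spring` weakens the pull along edges, so connected nodes settle further apart; that is the same direction of effect the speaker relies on later when reducing the global spring coefficient to spread out an occluded drawing.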
So the user can move on to betweenness centrality or closeness centrality, compute those measurements, and do their entire analysis. When they're ready to publish the image, they freeze the layout and start using the graph readability metrics.
As they drag the nodes around, the tool updates the readability metrics for the entire drawing as well as for the individual node they have selected, and they can rank by these computed metrics. Here we have no node occlusion, but we have a fair number of local edge tunnels, where edges travel underneath nodes without connecting to them. The worst nodes are highlighted in red and the ones that are not so good in dark blue. As the user moves them around, they reach a light blue color that indicates everything's okay with that node.
And so you can grab groups of nodes with control-click and move them together, or you can create communities with the community-finding algorithm and drag them together; but, unfortunately, this currently involves a fair amount of manual manipulation to get to a good drawing.
And as you see, the colors change as the scale changes as well. All right, that's the end of the demo. But let's go through a static example of one of these analyses. Here we have a nice, tight network colored by node occlusion; the bright red nodes are the ones occluding the most other nodes. In the top right we have counts of how many occluding nodes, edge tunnels, and edge crossings we have, so you can get a visual idea of the number of these problems.
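Node occlusion can be sketched the same way, again assuming equal-radius circular nodes (a simplification; real nodes vary in size and shape and may carry labels): two nodes occlude each other when their centers are closer than twice the radius. The function name is hypothetical.

```python
import itertools
import math


def node_occlusions(pos, radius):
    """Per-node count of other nodes whose discs overlap it."""
    counts = {v: 0 for v in pos}
    for u, v in itertools.combinations(pos, 2):
        (x1, y1), (x2, y2) = pos[u], pos[v]
        if math.hypot(x1 - x2, y1 - y2) < 2 * radius:
            counts[u] += 1
            counts[v] += 1
    return counts
```

Scaling every coordinate up can only widen the gaps between centers, which is why spreading the drawing out is the easiest fix when space allows.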
The easiest way to reduce node occlusion, if you have the room, is to spread the graph out. Unfortunately, in publications you often don't, if you want the labels to be readable. By reducing the global spring coefficient by an order of magnitude, we let things spread further apart with the continuous algorithm and substantially reduce node occlusions, from four to 10, and another order of magnitude gives a nice layout. This might not be suitable for a two-column ACM-format paper, because it takes up a lot more space, and if you compress it, you won't be able to read the labels. So in those cases you often end up doing manual manipulation again.
Coloring by local edge tunnels, we see those problem areas. We have 14 edge tunnels in total, and again, through manual manipulation, we don't have to move the nodes a whole lot to completely eliminate the edge tunnels, unless there's a hairball in the center. In some tight graphs it becomes impossible, and you just have to start reducing their number rather than completely eliminating them.
Edge crossings are a little bit more tricky, because reducing them often requires gross manipulations, like taking highly central nodes and pulling them off to the side, or using good edge routing techniques; but currently SocialAction only allows the straight edges we have right here. So we're limited to fitting nodes between these arcs of edges from the high-degree nodes. This improves our ability to do things like path finding, because we can easily follow the edges without crossings, but it reduces our understanding of how important any particular node is in the network.
This node looks a little less central because it's not in the center of the graph and because its edges all come out from one side instead of from all around it.
If we just want to find paths, that's not a problem, but if you want to identify how important people are, being central and having your edges spread around you is a lot more important. So this illustrates some of the trade-offs we often face when optimizing these metrics.
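Edge crossings in a straight-line drawing can be counted with the standard orientation test for segment intersection. To turn the raw count into a global score, the sketch below normalizes it against the number of crossings that are possible at all: edges sharing an endpoint cannot properly cross in a straight-line drawing, so those pairs are subtracted from the total number of edge pairs. This normalization follows the general shape of Purchase's edge-crossing metric; treat the exact formula and the function names as assumptions rather than the definition used in the speakers' tech report.

```python
import itertools


def _orient(p, q, r):
    """Sign of the cross product (q - p) x (r - p): +1, 0 or -1."""
    v = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (v > 0) - (v < 0)


def segments_cross(a, b, c, d):
    """True if segments ab and cd properly cross (collinear overlaps ignored)."""
    return (_orient(a, b, c) * _orient(a, b, d) < 0
            and _orient(c, d, a) * _orient(c, d, b) < 0)


def crossing_readability(pos, edges):
    """Return (crossing count, readability in [0, 1]; 1 means no crossings)."""
    crossings = 0
    for (u1, v1), (u2, v2) in itertools.combinations(edges, 2):
        if {u1, v1} & {u2, v2}:
            continue  # adjacent edges cannot properly cross
        if segments_cross(pos[u1], pos[v1], pos[u2], pos[v2]):
            crossings += 1
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    m = len(edges)
    all_pairs = m * (m - 1) // 2
    adjacent_pairs = sum(k * (k - 1) // 2 for k in degree.values())
    max_crossings = all_pairs - adjacent_pairs
    if max_crossings <= 0:
        return crossings, 1.0
    return crossings, 1 - crossings / max_crossings
```

For K4 drawn on a square, only the two diagonals cross, and only three pairs of edges are non-adjacent, so the score is 1 - 1/3.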
We've looked at a number of additional readability metrics. Angular resolution is the spread of edges around a node. The angle of edge crossings can also substantially affect path-finding tasks, because users are more likely to follow the wrong path when edges cross at shallow angles.
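Angular resolution and crossing angle are both simple geometric quantities. One common way to measure them is assumed here (definitions vary between papers, and the function names are hypothetical): a node's angular resolution is the smallest angle between consecutive incident edges, and a crossing angle is the acute angle between two crossing segments, where angles near 90 degrees are generally considered easiest to follow.

```python
import math


def angular_resolution(pos, adj, node):
    """Smallest angle (degrees) between consecutive edges around `node`."""
    x0, y0 = pos[node]
    angles = sorted(math.atan2(pos[w][1] - y0, pos[w][0] - x0)
                    for w in adj[node])
    if len(angles) < 2:
        return 360.0  # zero or one edge: nothing to crowd
    gaps = [b - a for a, b in zip(angles, angles[1:])]
    gaps.append(2 * math.pi - (angles[-1] - angles[0]))  # wrap-around gap
    return math.degrees(min(gaps))


def crossing_angle(a, b, c, d):
    """Acute angle (degrees) between the directions of segments ab and cd."""
    t1 = math.atan2(b[1] - a[1], b[0] - a[0])
    t2 = math.atan2(d[1] - c[1], d[0] - c[0])
    diff = abs(t1 - t2) % math.pi
    return math.degrees(min(diff, math.pi - diff))
```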
We've looked at things like node size, color, and shape variance, especially when you have to scale your visualization, because beyond a certain point you don't want to show the text on the node, since nobody will be able to read it. You can compress it, but not too far, otherwise you won't be able to see the node's color or shape.
Additionally, we can look at orthogonality, or how well the nodes and edges line up on an imaginary grid. For UML diagrams or hierarchical structures it's useful, because there's meaning to those levels; but for social networks, citation networks, and the like, forcing nodes onto these arbitrary grids implies relationships that don't exist. You can scan the drawing more easily, but it suggests relationships that aren't there.
Spatial layout is very important, and there are also various measurements for the paths through the network, because you want these paths to be fairly continuous and easy to follow. You don't want them to be very bendy. And you don't want paths heading off toward nodes they don't actually connect to; that's geometric path tendency. Finally, you usually want edge length to be uniform throughout the graph, except between disparate groupings, which users often request be placed further apart.
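Edge-length uniformity can be summarized by the coefficient of variation of edge lengths (standard deviation divided by mean), where 0 means every edge has the same length. The choice of statistic and the function name are assumptions; a refinement in the spirit of the talk would compute it within groups only, since disparate groupings are often deliberately placed further apart.

```python
import math
import statistics


def edge_length_variation(pos, edges):
    """Coefficient of variation of straight edge lengths; 0 means uniform."""
    lengths = [math.hypot(pos[u][0] - pos[v][0], pos[u][1] - pos[v][1])
               for u, v in edges]
    if not lengths:
        return 0.0
    mean = statistics.mean(lengths)
    return statistics.pstdev(lengths) / mean if mean > 0 else 0.0
```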
So for future work, as we've talked about, we're working on the snap-to-grid feature as well as feedback into automatic layout algorithms. We've done this evaluation of NetViz Nirvana for teaching with NodeXL, the paper we presented earlier, and we've started implementing these readability metrics in NodeXL so we can look at longer-term usage and see how well they improve users' end results.
So in conclusion, we've talked about global readability metrics as well as individual node and edge readability metrics for identifying problem areas. We hope these will help network analysts become more aware of these problems and how to fix them, and help tool designers give their users guidance on fixing the problems in their graph drawings, so we can get better publications for everybody.
And if you're interested in reading the paper, we have a tech report here, and we're submitting that paper to CHI 2010. Thank you. Any questions?
[applause]
>>: So, yeah, you mentioned edge routing. How do you think rounded edges would affect -- do you think the readability metrics would still apply to rounded edges, or would that change the strategy at all?
>> Cody Dunne: Rounding edges would substantially reduce the number of these blatant edge intersections and the edge tunnels underneath nodes. That is very helpful for path-finding tasks and for making sure you can tell that an edge isn't connected to a particular node, because it goes around the edge of it.
All these metrics are applicable to straight-edge drawings, but with rounded-edge or curved-edge drawings they become more difficult. With rounded edges, the bendiness of the route and the angles of all the junctions in the path, if you have line segments connecting it, start to become more important. If you have a lot of edges routed through the same narrow channel between nodes, that also becomes an issue, because instead of being able to encode attributes in the edges, you have to start using arbitrary colors just to distinguish edges within those bundles.
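For routed or curved edges approximated as polylines, one simple proxy for how bendy a route is would be the total absolute turning angle at its junctions. This is an assumed proxy rather than a metric the speakers define, and the function name is hypothetical.

```python
import math


def total_turning(route):
    """Sum of absolute turning angles (degrees) along a polyline route."""
    total = 0.0
    for p, q, r in zip(route, route[1:], route[2:]):
        a1 = math.atan2(q[1] - p[1], q[0] - p[0])  # heading into q
        a2 = math.atan2(r[1] - q[1], r[0] - q[0])  # heading out of q
        # Wrap the heading change into [-pi, pi) before taking its size.
        turn = (a2 - a1 + math.pi) % (2 * math.pi) - math.pi
        total += abs(turn)
    return math.degrees(total)
```

A straight route scores 0, and a zigzag that turns left 90 degrees and then right 90 degrees scores 180, even though it ends up heading in its original direction.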
Currently our edge intersection metric would count edges lying on top of each other as an intersection. To handle those properly, we would have to extend it a little and measure how distinguishable each edge in a bundle actually is.
>>: Do you have any thoughts on edge labels? It's interesting that you show only nodes with labels, right? Some graphs have edge labels as well. If you put labels on the edges, it becomes a more complex problem?
>> Cody Dunne: We just implemented edge labels in NodeXL, and our current approach is to line the label up along the edge and lay it directly on top of the edge, with a lightly opaque bounding box around it. You can still see the edge behind the label, but it's not nearly as strong, and it lets you see intersections behind labels, even though hopefully you wouldn't have any of those. And it's gone.
>>: Edge readability is also about the mapping between the actual edge and its label -- it's not just about [inaudible]. When there are many labels around some node, sometimes you are just not sure which link a label refers to?
>> Cody Dunne: In NodeXL, because the text is tilted to the angle of the edge and laid directly on top of it, it becomes a lot easier to associate the two. It becomes a little harder to read because it's tilted, but you have far fewer problems with it interfering with other edges, and you can identify which label belongs to which edge.
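Tilting a label to match its edge comes down to computing the edge's angle. The sketch below also flips labels that would otherwise read upside down, a common convention that the talk does not confirm NodeXL uses. It returns an angle in (-90, 90] degrees in the same coordinate system as the input points; the function name is hypothetical.

```python
import math


def label_angle(a, b):
    """Rotation (degrees) aligning a label with edge ab, kept upright."""
    angle = math.degrees(math.atan2(b[1] - a[1], b[0] - a[0]))
    # Rotate by 180 degrees when the text would otherwise be upside down.
    if angle > 90:
        angle -= 180
    elif angle <= -90:
        angle += 180
    return angle
```

Because of the flip, an edge gets the same label angle whichever endpoint is listed first.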
What becomes more of an issue is long edges: do you stretch long labels out along them, or compress them? And if you have short edges where there isn't room for a label, what do you show there? We looked at various approaches for simplifying edge labels that we don't have enough room to show, making them unique so you can identify them and read them properly.
And then we also have a label-if-possible approach, where you rank edges by importance, based on their betweenness centrality or something like that, and it tries to label them greedily down that list. If it can't show a label for an edge without completely destroying the readability of that section, it just won't, and it will tell you that it can't show a label there, maybe with a little marker or something like that.
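The label-if-possible approach is a greedy pass over edges sorted by importance. Here is a minimal sketch, assuming axis-aligned label bounding boxes (NodeXL's tilted labels would need rotated boxes) and an importance score supplied by the caller, such as edge betweenness; `greedy_labels` and its `label_box` callback are hypothetical names.

```python
def greedy_labels(edges, importance, label_box):
    """Place edge labels in order of importance, skipping any that overlap.

    importance maps edge -> score; label_box(edge) returns the label's
    axis-aligned box as (x0, y0, x1, y1). Returns (shown, hidden) lists.
    """
    placed = []  # boxes already drawn
    shown, hidden = [], []
    for e in sorted(edges, key=lambda e: importance[e], reverse=True):
        x0, y0, x1, y1 = label_box(e)
        overlaps = any(x0 < bx1 and bx0 < x1 and y0 < by1 and by0 < y1
                       for bx0, by0, bx1, by1 in placed)
        if overlaps:
            hidden.append(e)  # candidate for a small "label omitted" marker
        else:
            placed.append((x0, y0, x1, y1))
            shown.append(e)
    return shown, hidden
```

The `hidden` list is where a tool could attach the little marker mentioned above.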
>> Bongshin Lee: Okay. Any further questions? Let's thank our speakers one more time.
[applause]