>> Bongshin Lee: Good morning, everyone. Thanks for making time during one of the busiest days of the year. Today we have two speakers from the University of Maryland. Cody Dunne is a Ph.D. student in the computer science department at the University of Maryland, working under Ben Shneiderman. Elizabeth Bonsignore is a Ph.D. student in the iSchool there; she serves as an operations engineer and intelligence analyst for the Department of Defense, and she holds Master's degrees in computer science and education. They are on their way to the IEEE conference on social computing in Vancouver, Canada. So today they will first talk about the paper they will be presenting at the conference on Sunday morning, and then later they will talk more about network visualization Nirvana and readability metrics for graph drawing. Here comes the talk. >> Elizabeth Bonsignore: Thank you very much, Bongshin. As she said, I'm Elizabeth Bonsignore, and with Cody's help today I'd like to share with you the results of a study we ran this past spring on the teachability of social network analysis using NodeXL. As you can see, we both worked on the project, but Dana Rotman, another Ph.D. student at the iSchool, was with us, as were Marc Smith, a sociologist with Telligent, and Tony Capone, a developer, who is here with us. Our professors are Derek Hansen and Ben Shneiderman. First, a quick overview of what we're both going to cover today: the motivation and research goals for the study, an evaluation of NodeXL as a teaching tool for social network analysis, the graph layout principles that we dubbed NetViz Nirvana, and readability metrics; then the research methods we used, samples of student work, and some lessons learned. The motivation for our study is the observation that sophisticated network analysis tools largely remain the domain of computer scientists who are developing algorithms for drawing graphs: most of them have steep learning curves, use command line interfaces, or require some level of programming expertise. Yet, as most of us know, social network analysis is important in academic, commercial, and social media contexts. In short, SNA is not just for scientists anymore, because community managers also have a stake in learning how to use SNA tools to help cultivate their communities. So our long-term goal is to develop accessible tools and educational strategies to reach a broader spectrum of users. With that overarching goal in mind, the focus for our talk today is the evaluation of NodeXL as a teaching tool for social network analysis across a broader user set, as well as the graph layout principles that we've dubbed NetViz Nirvana, which Cody will go over. First, a little bit about NodeXL. NodeXL is an open source plug-in for Microsoft Excel; it stands for Network Overview, Discovery and Exploration for Excel. As an Excel plug-in, it takes advantage of the spreadsheet format to store your network data, and of standard spreadsheet manipulation tools such as sorting, filtering, and creating formulas, which you can use in your displays. If we take a little walk around the layout: on the left we have the spreadsheet with the data for your nodes and your edges, and next to it we have the visualization, with worksheet tabs that let you switch back and forth between the tabs for edges, nodes, and clusters.
The visualization and the data are very tightly integrated: if you click a node or an edge in your graph, it will be highlighted in the cell data in your spreadsheet, and vice versa. You can import your network data from existing spreadsheets, of course, or from several common social network data sources such as Twitter, or from the output of tools like Pajek. NodeXL also provides a starter library of basic network metrics, such as centrality and degree measures, which you can compute as needed based on your performance requirements or on what questions you're asking. You also have multiple ways to map your data to display properties or attributes, such as using degree to set the size, shape, or opacity of the nodes, or even the thickness of the edges.
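To make that attribute mapping concrete, here is a minimal sketch of the same idea outside of NodeXL, using the open source networkx and matplotlib Python libraries. This is an illustrative analogue, not NodeXL's implementation, and the example graph is just a stand-in for a student's community data:

import networkx as nx
import matplotlib.pyplot as plt

G = nx.karate_club_graph()  # small stand-in social network

# Map each node's degree to its drawn size, as the students did in NodeXL.
sizes = [100 * G.degree(v) for v in G]

# Map tie strength (edge weight) to line thickness, defaulting to 1 if absent.
widths = [G[u][v].get("weight", 1) for u, v in G.edges()]

pos = nx.spring_layout(G, seed=42)  # force-directed starting positions
nx.draw_networkx(G, pos, node_size=sizes, width=widths)
plt.show()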
And now, on to the NetViz principles. >> Cody Dunne: As part of our study, we introduced the information science students, who had a multi-week class on network analysis, to these principles for creating better network diagrams, which we call NetViz Nirvana. NetViz Nirvana says, essentially, that every node in your network should be visible, so you can count the number of nodes in a particular area, and, if the nodes are large enough, read their labels without them occluding each other. We also want the degree of every node to be countable, because degree ends up being a very good measurement of how important any individual node is, of the status of any actor in the network. We want to be able to follow each edge from its source node to its destination node, because knowing relationships exist isn't as useful as being able to find out who they're between and look at more details about them. And we want to be able to easily identify clusters and outliers, because that's what we really care about finding: if we're network analysts, we want to look at interesting attributes of our network. These overall principles were the introduction we gave the students to creating good network diagrams, but they don't give you fine detail about where the problems are in your drawing, and they don't provide a quantitative measurement of how good or bad your drawings are. So we've been working on these readability metrics, which we also introduced to the students during the course; they are simply measurements of how readable or understandable the drawing is. Some of these are common things like edge crossings, which we've seen before; we also have a node occlusion metric for nodes overlapping, and edge tunnels for edges that travel underneath nodes. We're defining all of these on a continuous scale from zero to one, where zero is the worst possible case we think you could have and one is the best case. In the past these have often been called aesthetic metrics, but we believe that's a bit of a misnomer, because we're focusing on the readability of the drawing, not necessarily its beauty, although fortunately beauty is oftentimes correlated with readability. Now, in the past, researchers like Purchase focused on global metrics, which give you a single number for various aspects of your entire visualization, saying it's 80 percent on node occlusion or 80 percent on edge crossings, things like that. But oftentimes that's not sufficient to guide users to problem areas so they can improve their drawings. Think of an analogy to Microsoft Word: if Word just told you that you had 20 misspellings in your document, rather than pointing you to where those misspellings were, it would take you substantially more time to improve your document and get it ready for publication. So we've created these node and edge readability metrics to give us this fine granularity and allow us to pinpoint problems. First, let's look at node occlusion. In the simple drawing on the right we have nodes A, C, and D. There is a fourth node we can't see, because A is occluding it, and there could be additional nodes hiding there that we're not able to count because they're completely hidden. By moving A off to the side, we can read the label on node B and see that A is not connected to node B; it's just a much better representation of the drawing. Our occlusion readability metric is proportional to the area lost if you compress all the nodes into a single static image, like flattening layers in Photoshop or any other image processing application. When you flatten all the nodes, you count how much area is lost in the flattening. In a perfect situation every node is uniquely distinguishable, with a border of the background color around it, and you don't lose any area; in the worst case scenario, they all end up in a pile together. Our edge crossing readability metric is the same one used by Purchase: the number of crossings you count in the drawing, like this intersection here, scaled by an approximate upper bound on how many there could be. Crossings are bad because they add extra visual complexity to the graph; if you don't need them, you should eliminate them so that users can do things like path-finding tasks better, without the extra cognitive load. So by moving this node down, again we get the same representation, but without the intersections. Edge tunnels are when you have an edge traveling underneath a node without actually connecting to it. This is the exact same network as before, but because of the positioning it looks like C is not directly connected to B, and it looks like A is connected to B; in fact, the edge is just hidden directly underneath that node. This is a somewhat trivial example where things are perfectly aligned, but oftentimes it's very hard to determine whether an edge is actually connected to a node, especially if you have noncircular nodes. So, just like edge crossings, we create a metric for this based on the number of tunnels, scaled by an approximate upper bound on how many there could be. But we also define local edge tunnel and triggered edge tunnel metrics: how many edge tunnels there are underneath any particular node, and how many edge tunnels are caused by the edges coming from a particular node. The first is good for identifying nodes that are problems, and the second is good for identifying areas around nodes that we might want to manipulate as a whole.
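As a rough illustration of how two of these measurements can be computed from a layout (the precise formulas are in our tech report, so treat this as a brute-force sketch assuming straight-line edges and circular nodes of a fixed radius; positions are (x, y) pairs keyed by node):

import itertools
import math

def _orient(a, b, c):
    # twice the signed area of triangle abc; its sign gives the turn direction
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

def segments_cross(p1, p2, p3, p4):
    # proper crossing of two open segments, via orientation tests
    d1, d2 = _orient(p3, p4, p1), _orient(p3, p4, p2)
    d3, d4 = _orient(p1, p2, p3), _orient(p1, p2, p4)
    return d1 * d2 < 0 and d3 * d4 < 0

def edge_crossing_metric(pos, edges, degrees):
    # 1 = no crossings, 0 = the approximate upper bound is reached;
    # edge pairs sharing an endpoint cannot properly cross, so the
    # bound discounts them
    c = sum(
        1
        for (a, b), (u, v) in itertools.combinations(edges, 2)
        if len({a, b, u, v}) == 4
        and segments_cross(pos[a], pos[b], pos[u], pos[v])
    )
    m = len(edges)
    c_max = m * (m - 1) / 2 - sum(d * (d - 1) for d in degrees) / 2
    return 1.0 if c_max <= 0 else 1.0 - c / c_max

def _point_segment_dist(p, a, b):
    dx, dy = b[0] - a[0], b[1] - a[1]
    if dx == dy == 0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))

def count_edge_tunnels(pos, edges, radius):
    # (edge, node) pairs where an edge passes under a non-endpoint node
    return sum(
        1
        for (a, b) in edges
        for v in pos
        if v not in (a, b)
        and _point_segment_dist(pos[v], pos[a], pos[b]) < radius
    )

The node occlusion metric would, in the same spirit, compare the total drawn node area against the area of the flattened union of the nodes.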
We also look at text readability. The first metric we examined was label height, or the height of the text within the label. The guidelines for proper text height depend on the font and so forth, but usually it should be between 20 and 22 minutes of arc from the viewer's eye. So, no matter what the medium is, we calculate the distance of the user from the medium, along with the pixels per inch of the medium or the height of the font in inches on paper, and compute this visual angle; then we create the metric based on approximate lower and upper bounds drawn from work other people have done. Finally, label distinctiveness is a measure of how unique each label is in the drawing. If you have Department of Computer Science, Department of Sociology, and so on as your node labels and you truncate after eight characters, you're going to have all of these labels that you cannot distinguish between. So we create the distinctiveness metric based upon a prefix tree, where you cut off at your truncation point for all your labels and then, in each of the subtrees, count the number of nodes that have identical labels. Now, you might wonder why we don't just use something like Levenshtein distance to measure the distance between the labels of every pair of nodes, but that would in fact encourage users to create arbitrarily complex and unnecessarily convoluted node names, rather than names that differ by just a couple of characters.
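A sketch of these two label measurements, assuming pixel-height text on a screen of known resolution (the 20-to-22-arc-minute guideline comes from prior work; the exact scoring bounds in our metric are simplified away here):

import math
from collections import Counter

def label_visual_angle_arcmin(height_px, pixels_per_inch, distance_in):
    # visual angle subtended by text of a given height at a given distance
    height_in = height_px / pixels_per_inch
    return 60 * math.degrees(2 * math.atan(height_in / (2 * distance_in)))

def label_distinctiveness(labels, truncate_at=8):
    # fraction of labels still unique after truncation; equivalent to
    # cutting a prefix tree at the truncation depth and counting collisions
    prefix_counts = Counter(label[:truncate_at] for label in labels)
    unique = sum(1 for label in labels
                 if prefix_counts[label[:truncate_at]] == 1)
    return unique / len(labels) if labels else 1.0

# e.g. 12-pixel labels on a 96 PPI screen viewed from 24 inches:
# label_visual_angle_arcmin(12, 96, 24) -> about 17.9 arc minutes, too small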
>> Elizabeth Bonsignore: Thanks, Cody. We tried to frame our methods along two axes, if you'd like to think about it that way: one is depth and one is diversity. We say depth because we wanted to use mixed methods and a long time frame to study these users: we wanted to learn the information science graduate students' process of learning social network analysis, their sense-making process, and we also wanted to map out or document the discovery process that practitioners and experts might use as they try to find patterns in social structures. What we found was a qualitative framework known as multi-dimensional in-depth long-term case studies, or MILCs. It assumes at the outset that exploration of complex datasets cannot be effectively tested by traditional predetermined, timed usability tests, which usually last on the order of one to three hours, because the sense-making process is really not that predictable. So we felt that was ideal for studying both the learners and the experts. To meet our diversity need, to support a broad base of users, we ran a two-pronged user study. We used a core set of methods (interview techniques, a presurvey, and a post-survey) with two different groups: computer scientists who had a background in graph drawing and information visualization, and information science students who were interested in social networks and online communities, and were in fact studying online communities, but weren't necessarily as technical as the computer scientists and didn't know that much about graph drawing. First, a little background on our information science graduate students. We had 15 of them, a mixture of information science, library science, and information management. They were all participating in a communities-of-practice course to study online communities and to determine social structures, relationships, and ways in which they could help cultivate these online communities. They were each studying a community of their choice, ranging from a depression support group to a Serious Eats gourmet group, a weight loss group, a wedding planning group, an arts and crafts group, a records management listserv, and so on. The time frame for our study was about five weeks within a 16-week semester. For data collection, we observed in class and in lab, and we transcribed some of their online discussions and coded and categorized what they were talking about as they learned social network analysis. We had individual observations of just about all of the students, 13 of them, while they completed certain assignments using NodeXL; they would use a think-aloud protocol, saying "I'm thinking about this" or "I'm really frustrated about that." We looked at their course work and their assignments, and they kept diaries, journal entries saying: this is where I'm at in my visualization process; this is what I'm thinking about; these are my hypotheses. We did a presurvey to determine their level of experience and a post-survey to determine what they thought was most important, and for a few selected students we did in-depth interviews. We used a grounded theory approach for analysis, which basically says that after you have all this data transcribed, you code and categorize all the salient features. The computer science graduate student study was primarily conducted by Cody. He had six computer science graduate students, all experienced at some level in graph theory, social network analysis, and information visualization techniques. He took a core of what we used for the information science students: he used a presurvey, and then took just under two hours with each participant to give them a tutorial and observe them interacting with their own dataset, not a prescribed dataset, but a community or social network they were interested in. Part of that observation process was an in-depth interview, again with a grounded theory approach, and Cody also used Spotfire and some quantitative analysis of the surveys. As we were coding and categorizing the transcripts, and especially the online discussions and journal entries, some salient features about teaching social network analysis and the learning process emerged. Basically, the students really enjoyed being able to map metrics to display attributes: they wanted to make the node sizes big if the degree was high, and they wanted to change the shape of nodes with certain characteristics, like community moderators. We found that NodeXL supports this very effectively, letting them do it dynamically and set it pretty much via a menu item. And the students became almost obsessive after they learned about the graph layout principles known as NetViz Nirvana: although the metrics weren't built into NodeXL yet, they found it really important to try to make meaningful graphs, and it helped them develop relatively sophisticated graphs in a short period. Now for some student examples; I'm going to go through three of them. The first two have the same goal, in that the students were interested in finding whether boundary spanners or bridges occurred across different subgroups or different forums, and both used degree and tie strength as the metrics to represent that. The first student studied the Subaruowners.com community.
She identified boundary spanners and wanted to show the level of participation in different forums. The spheres here are the actual forums she was interested in, and the smaller nodes, the triangles and diamonds, are the actual people who are part of that community. What you can see right away is that the members and cafe forums have a lot more connectivity, a lot more common participants. This is probably because of their more relaxed nature, or the broader nature of their discussions, compared to the problems-and-solutions group: there, many people just go in, ask a question, and they're out; they don't keep coming back and making friends. Another thing she found interesting was that the hosts, who ostensibly are the people who are supposed to keep the discussion going, were in fact some of the lower-level participants in the group, with the exception of this one right here. The people who came out as community leaders were these two little triangles: they didn't necessarily have any real leadership role, but they emerged based on their participation. For the second example, this student again wanted to identify boundary spanners across different subgroups, but she also wanted to find out, for these Ravelers, people who knit and crochet, what the properties were of those who completed more knitting or crocheting projects. What were the factors that led them to complete more projects? So she picked the top 20 posters and contributors from three different groups: one was Lazy and Godless, which is the LSG community; the others were Fiber Optics and Project Spectrum. What you find in this first drawing is that they're pretty much tightly knit and separate subgroups; there are not very many boundary spanners across the groups, and they mostly stay among themselves. You also see that the Lazy and Godless community is much more social, because they contribute a lot more to the posts, whereas Project Spectrum is focused on completing projects and maybe just highlighting that fact in their posts. In the next visualization she took only the bloggers within that community, with the community moderators or leaders of each subgroup shown as spheres, and she confirmed the relatively simple hypothesis that those who were moderators, and those who participated more, actually completed more projects. It's still an interesting visualization. And while the third student didn't really achieve NetViz Nirvana per se, because there are lots of edge crossings, his example is really interesting in that he tried to model a community management problem using NodeXL, and it was pretty effective at that. He has a records management listserv, and he was trying to identify which features of the different people would be reflected in their leadership and experience levels. His hypothesis was that those with high betweenness centrality, and those connected to important people, those with high eigenvector centrality, would in fact turn out to be the leaders in the community. That held true: in this first graph on the left, you see that the admin of the listserv community did in fact have the highest betweenness and eigenvector centrality. Those with lower betweenness, or who were less connected or bridging across different subgroups, are hanging out on the periphery, and they're in red as well. What he tried to model, once he confirmed that his hypothesis was correct, was who would be the best candidate if he could skip or take out that existing admin: if that admin left the community, would there be anybody with the same requisite skills, experience, and connections who could come in and fill that person's place? He did in fact skip the node, which is very easy in NodeXL, one step, and was able to show that a new admin sort of popped up there. So that's a relatively sophisticated model for somebody who had only been learning for about three to five weeks.
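The kind of what-if analysis this student did can be sketched with the networkx library; here a built-in graph stands in for his listserv data (this is our reconstruction of the idea, not his actual workflow):

import networkx as nx

G = nx.les_miserables_graph()  # stand-in for the listserv reply network

betweenness = nx.betweenness_centrality(G)
eigenvector = nx.eigenvector_centrality(G, max_iter=1000)
admin = max(G, key=betweenness.get)  # current top bridge in the network
print("top by betweenness:", admin,
      "top by eigenvector:", max(G, key=eigenvector.get))

# "Skip" the admin, as the student did in NodeXL, and recompute to see
# who emerges as the candidate successor.
H = G.copy()
H.remove_node(admin)
bc_after = nx.betweenness_centrality(H)
print("if", admin, "leaves, the emerging leader is", max(H, key=bc_after.get))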
So, as to our lessons learned: what we found primarily was that when you promote awareness of NetViz Nirvana, people learn it. When they know they have to make graph layouts that are readable, they learn it pretty well, and in fact they get sort of obsessive about it. One thing that NodeXL didn't support very well, and that the students worked around themselves, is that they tried to scaffold their learning with a sort of interaction history: they would save different versions and different paths they followed with their visualizations and hypotheses as they went along. What they kept saying was, if we could have a history, or a way to undo actions ("I thought of this idea; it didn't quite work"), that would be really helpful for learning. It might also be good to have a library of those sorts of histories, so that other people who are learning social network analysis could learn from them as well.
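A minimal sketch of the interaction history the students improvised, assuming a layout is just a mapping from node to position (this is a hypothetical helper, not a NodeXL feature):

class LayoutHistory:
    # keep snapshots of node positions so an analyst can back out of a change
    def __init__(self):
        self._snapshots = []

    def save(self, positions):
        # store a copy so later dragging doesn't mutate the snapshot
        self._snapshots.append(dict(positions))

    def undo(self):
        # return the most recent snapshot, or None if nothing was saved
        return self._snapshots.pop() if self._snapshots else None

# usage: save before trying an idea, undo if it "didn't quite work"
history = LayoutHistory()
positions = {"A": (0.0, 0.0), "B": (1.0, 0.5)}
history.save(positions)
positions["A"] = (0.2, 0.1)          # the user drags node A
positions = history.undo() or positions  # revert to the saved layout

A shared library of such histories would then just be a named collection of these snapshot sequences.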
There were also some pacing issues: the students went from a nice tutorial with low numbers of nodes very quickly to lots of nodes, and it got a little hard for them to take it all in. And many of them said they would have liked more Excel experience, if not with Excel itself, then with the Excel 2007 interface. For researchers, what we found is that MILCs are a very effective way to represent the discovery process that subject matter experts follow as they go through their analysis of these complex datasets, and they map the sense-making process that students follow as well. In fact, if you consider the 132 reports or articles published between 2005 and 2008 in InfoVis and VAST, only 39 of those had any user evaluation at all, and all of those were between one and three hours: your standard timed, predetermined dataset analyses. So this was probably a good example of trying to map to the actual process of learning SNA, as well as the discovery process. Of course, it's kind of obvious, but it merits mention that it does require more data collection and analysis. >> Cody Dunne: From our CS students in particular, but also our information science students, we learned a lot of more specific things about our tool, and about tools in general and how to improve them. We've shown again that multiple coordinated views, having our tabular view and visualization with brushing and linking between them, are very effective in allowing people to analyze the data. And we've shown that users enjoy being able to code all these visual attributes based on the data they have. More interestingly, we showed that adding readability metric interactions, or at least knowledge of the readability metrics, is more helpful in getting these users to better end products than going without them. Also, more specific to Excel, we found that having that tabular view, with all the formulas and macros you can apply in Excel, allows for very extensive and novel data manipulation techniques, like writing formulas based on constants defined elsewhere, manually tweaking them, and seeing the visual updates reflected. However, being inside Excel, we're limited to what Excel templates allow us to do. So, like Elizabeth said, having undo functionality, or a history of previous actions and the ability to go back to them, would be very helpful for these tools. And finally, the big key point is that users would really enjoy being able to aggregate nodes and edges. Currently NodeXL has limited edge aggregation functionality, but the users constantly requested node aggregation techniques, not just for communities but for user-defined groupings as well: if you're looking at an e-mail network, you might want to combine e-mail addresses that you know belong to a single person.
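That e-mail example can be sketched with networkx's node contraction, merging two addresses known to belong to one person (an analogue of the requested feature, not NodeXL's aggregation):

import networkx as nx

G = nx.Graph()
G.add_edges_from([
    ("alice@work.example", "bob@example"),
    ("alice@home.example", "carol@example"),
])

# Merge the two "alice" addresses into a single node; their edges combine.
G = nx.contracted_nodes(G, "alice@work.example", "alice@home.example",
                        self_loops=False)
print(sorted(G.edges()))  # both bob and carol now attach to one node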
In conclusion, we used two different and diverse groups: one with a nice multi-dimensional in-depth long-term case study, and one with more short-term, focused, intense assignments using their own datasets, and they were all invested in accomplishing their analyses. First, for network analysis education, our information science students showed that NodeXL can be very useful for teaching basic network analysis techniques because of its shallow learning curve. And as far as NodeXL usability and design, the feedback from our CS users and our IS users gave us lots of information about feature requests and bugs for NodeXL that allowed substantial improvement in the program, even over the course of the user study, and we're just finishing a large redesign of the user interface based on some of these requests. If you have any questions, you can ask them now, or you can look at the 50-page tutorial for NodeXL on the CASCI [phonetic] website. Here's the NodeXL website, and here are more visualization links at the University of Maryland. >> Elizabeth Bonsignore: Thank you. [applause] >>: Cody, what are you doing to actually compute these readability metrics today? Are you doing computational analysis of the geometry of the graph? >> Cody Dunne: Yes. We just look at the layout provided to us by the layout algorithm, and then we're able to find the number of edge intersections and where every node is occluding. The current implementation we have, which I'll talk about a little more, is in SocialAction, which is another network analysis tool, and it uses fairly naive approaches for computing these intersections. But if you look at the work out of the University of Maryland (I'll send you the link), there's a large reference library of fast edge intersection, line intersection, and rectangle intersection algorithms that you can use to substantially improve this process, because in the best case scenario you would be computing these in real time and giving feedback to the user about where the problems are in the network. >>: I have a question about that. So the current tool doesn't give them feedback on these readability metrics? >> Cody Dunne: No, NodeXL currently doesn't, but it will soon. I have a follow-up addendum presentation that talks more in depth about the readability questions, and it will address that. >>: I missed the beginning. Is it a manual layout tool that doesn't do edge crossing removal, that somehow sort of -- >> Cody Dunne: NodeXL provides a couple of algorithms. We have the [inaudible] algorithm, an older force-directed approach; we now have the [inaudible] multi-scale approach for laying out the graph; and circle layouts, various sine waves and grids and spirals and things like that. >>: Can a user then somehow make it better, if the algorithm doesn't do a satisfactory -- >> Cody Dunne: That's what all of our iSchool students did. >> Elizabeth Bonsignore: In fact, they spent lots of time on it. Although it wasn't done for them automatically, at least not yet in NodeXL, to remove all these edge crossings, they spent a lot of time in their diaries and in all the interviews we had with them, once they were aware of NetViz Nirvana, once they read Cody and Ben's paper and learned about all these different factors, node occlusions and edge crossings and edge tunnels, saying: I want to get rid of them and make my drawing more meaningful. In a relatively short time they developed a really sophisticated sense of how they should be drawing graphs manually. >>: They did that via dragging? >> Elizabeth Bonsignore: Yes. They really liked locking certain sections of the graph and then experimenting with other ones. So, yeah, they got obsessive about it, actually. >>: You said that when they learned, they were more obsessed. So when did you teach them? >> Elizabeth Bonsignore: Let's see: they had a tutorial in NodeXL, they read Cody and Ben's tech report, and then they discussed it in class. Around that same time they were just starting to input the data for their communities, saying "I think this is what will happen," and they started realizing it looked really crappy, or they didn't like the way it looked when they did the first automatic layout, and then they would play around with it. >>: So what was the exact test you gave them? What did you ask them to do? >> Elizabeth Bonsignore: They had these online communities of interest that they had started to study, and they were using various communities-of-practice ideas to determine, better than participation metrics alone, how to keep these communities growing, so they wouldn't die off as we see many online social networks die. The NodeXL section was about seeing how social network analysis tools, statistics and visualizations, could give community managers more of an idea of these sorts of things. So they learned NodeXL, and they were given two assignments: state a hypothesis about something, either a management problem, like that third student did, or ways of highlighting the interests of subgroups, and then make two visualizations. The initial visualization is to see whether it confirms your hypothesis or shows you something else, and the second visualization either models some sort of problem or looks at it from another facet. Does that answer your question? >>: And what was the average size of the graphs? >> Elizabeth Bonsignore: Oh, that's a good question. It was anywhere from -- well, there was one student studying loss PD; I'm not sure, she may have had two subgroups, I don't remember. That was probably the top end.
But they were on the order of hundreds of nodes. >> Cody Dunne: Some of our CS users dealt with much larger datasets. We had one analyze a protein-protein interaction network with some 40,000 edges, and another look at the VAST 2009 challenge dataset of Flitter relationships; Flitter is a Twitter clone created for the dataset. The version of NodeXL we used at the time didn't handle these very large datasets very well, and we got a lot of good feedback on how to reduce the burden placed on the user of sitting there waiting for things to finish. So we received feedback on disabling some of the real-time updating, adding more progress bars, being able to cancel actions, and the like. Currently NodeXL is more your introduction-to-social-network-analysis tool, but as we progress, it scales more and more to larger networks. We actually have a research assistant working now on porting some of these algorithms, like betweenness centrality and the like, over to Nvidia's CUDA architecture for graphics processors to improve their speed, and we have Vladimir and Virash working on porting them to a MapReduce framework so you can offload the computation into the cloud for large networks. >>: For the first user group you gave us three examples. Do you have any examples for the graduate students? >> Cody Dunne: Not in the slide show, but I can bring them up if you like. They didn't learn -- >>: I can see it later. >> Cody Dunne: They were different tasks; it wasn't about the communities-of-practice approach. I gave them more directed tasks, like find an interesting community, find an interesting actor, and highlight them in the end visualization, but they didn't have nearly as much time to go through and explore it fully and arrange the final layout for publication. >>: I have one question. Actually, this NetViz Nirvana thing: to me, nothing in it is very surprising. Even if you don't teach the user, the user may still try to do [inaudible]; for example, if someone gives me a graph, the first thing I'll do is remove the edge crossings and then position the nodes in the right locations so the tree is [indiscernible]. So how much do you think it affected them that you told them? >> Elizabeth Bonsignore: That's a great question, because around the same time, a little after Derek Hansen had started this communities-of-practice course, a colleague in the business school there had a two-week assignment, over two courses I think, for his business students. They were pretty well versed in the statistics end of social network analysis and its metrics, but they didn't really care so much about the graph drawing and information visualization part. They were not told anything about NetViz Nirvana, and in their assignments, in their write-ups, and in some of the surveys we had of them, they didn't try to do anything with their graphs beyond the standard layout in NodeXL. And they said: yeah, we think the statistics are more important, because I can look at a number and it means more to me than looking at the structure, because this is kind of a hairball right now and I don't care about spending time working on it. Part of it might be that they had only two weeks instead of the four to five weeks the other students had, but we also like to think part of it was that they didn't care whether it looked pretty or meaningful.
>> Cody Dunne: We're also trying to teach social network analysis to students who have no graph theory background whatsoever; they don't know what a node or an edge is when they start. Giving them this leg up on creating better visualizations saves them time later on. >> Elizabeth Bonsignore: Most of the IS students, the library science students, really didn't know what a node or an edge was when they started. >>: I have a related question. You said people get obsessed trying to -- >> Elizabeth Bonsignore: At least the 15 that we worked with. >>: So do you have an idea how long they spent? >> Elizabeth Bonsignore: Yes, we did. It was somewhere on the average of two to three hours per assignment that they would sit and play with the diagrams and test things. >>: Is this for the exploration steps where they -- >> Elizabeth Bonsignore: This was probably for exploration and the steps to complete the final visualization they submitted for the assignment. Overall, for all the assignments, I had some numbers: they spent about 10 to 12 hours. I'd say the fastest person spent about three hours on his visualization, and the high end, or the median, was probably more like eight to ten hours. >>: So that is a big chunk of time. Do you envision eventually being able to cut that significantly through the use of your metrics? >> Cody Dunne: Yeah. The demo video I'll show you in a little bit actually gives you visual attributes that say where the problems are. It's still manual manipulation, unfortunately, but you can select groups of nodes to manipulate together, and we're currently exploring a snap-to-grid feature that lets users pull nodes to local minima of the readability metrics as they move them around, as well as feeding the metrics back into automatic layout algorithms, although the layout algorithms tend to be very expensive when you start optimizing these things. Are we ready to move on? >> Elizabeth Bonsignore: Yeah. >> Cody Dunne: Now let's talk a little more about these readability metrics. Again, I've been working on these with Ben Shneiderman back at the University of Maryland. We have our network; we can code attributes of the network in colors and shape and size and all these things. Here we just have colors. This is in the SocialAction network analysis tool. And here's a refresher on NetViz Nirvana: we want all of these attributes to be satisfied by our final drawing, and we've created these readability metrics, which measure how understandable the drawings are. SocialAction is a social network analysis tool created by Adam Perer at the University of Maryland with Ben Shneiderman. It incorporates statistical measures like NodeXL does, centrality and degree and the like, with an attribute ranking system that lets you see a tabular, ranked view of these measures alongside the network visualization, again with brushing and linking between them; it has the same multiple coordinated views approach. Now, this attribute ranking system allows us to very easily incorporate our readability metrics, because we can just feed them in as additional attributes of the nodes: we automatically get the coloring for them, and we automatically get a ranked table that shows you where the crossings are in your graph.
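A sketch of that pattern, with readability scores fed in as ordinary node attributes and the ranked table being just a sort over them (the scores here are placeholders for the real per-node metrics):

import networkx as nx

G = nx.karate_club_graph()

# Placeholder per-node readability scores on the 0 (worst) to 1 (best) scale;
# in SocialAction these would come from metrics like local edge tunnels.
scores = {v: 1.0 - min(1.0, 0.05 * G.degree(v)) for v in G}
nx.set_node_attributes(G, scores, "readability")

# The ranked table is the nodes sorted worst-first by that attribute,
# which is what steers the user to the problem areas.
ranked = sorted(G, key=lambda v: G.nodes[v]["readability"])
print(ranked[:5])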
We've added global readability metrics for the entire graph to SocialAction, as well as our node and edge readability metrics, which pinpoint the problems in the graph as nodes are moved. Okay, now I'm going to show you a brief demo of the tool in action. This is SocialAction right here. You have a continuous layout: as the user modifies the layout, it continues to reposition the nodes, and we're loading in a small dataset from the Alberta politics discussion newsgroup, of people replying to each other. By tweaking some layout parameters we can get to a good starting point for our visualization; what the user is doing here is changing the repulsion forces, and the forces pulling nodes together, to try to create a good initial layout. Once they've done that, they can rank by out-degree or in-degree or any of these statistical measures we have right here; the nodes show up colored in the rank table on the side and then highlighted here in those colors. The user can move between betweenness centrality and closeness centrality, compute these measurements, and do their entire analysis. And when they're ready to publish the image, they freeze it and start using the graph readability metrics. As they drag nodes around, it updates the readability metrics for the entire drawing, as well as for the individual node they have selected, and they can rank by the computed metrics. Here we have no node occlusion, but we have a fair number of local edge tunnels, edges traveling underneath nodes without connecting to them. The worst nodes are highlighted in red, and the ones that are not so good are highlighted in dark blue; as the user moves them around, they reach this nice light blue color that indicates everything is okay with that node. You can grab groups of nodes and move them together with control-click, or you can create communities with your community-finding algorithm and drag them together; but, unfortunately, this currently involves a fair amount of manual manipulation to get to a good drawing. And as you see, the colors change as the scale changes as well. All right, that's the end of the demo. But let's go through a static process of doing one of these analyses. Here we have this nice tight network colored by node occlusion; the bright red nodes are the ones occluding the most other nodes. In the top right we have measurements of how many occluding nodes, edge tunnels, and edge crossings we have, so you can get a visual idea of the count of these problems. The easiest way to reduce node occlusion, if you have the room, is to spread the graph out; unfortunately, in publications you often don't, if you want to be able to read the labels. By reducing the global spring coefficient by an order of magnitude, we allow things to spread further apart with the continuous algorithm and substantially reduce the node occlusions, and with another order of magnitude we have a nice layout. This might not be suitable for putting in a two-column ACM format paper, because it takes up a lot more space, and if you compress it, you're not going to be able to read the labels. So oftentimes, in those cases, you end up doing the manual manipulation again. Coloring by local edge tunnels, we see the problem areas: we have 14 edge tunnels in total, and again, through manual manipulation (we don't have to move the nodes a whole lot unless there's a nice hairball in the center), we can completely eliminate the edge tunnels. In some tight graphs that becomes impossible, and you just have to reduce the number of them rather than completely eliminating them.
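In networkx terms, the spreading step looks something like this sketch: the k parameter sets the preferred node spacing, so raising it plays the role of lowering the spring coefficient (illustrative only; SocialAction's continuous layout exposes different knobs):

import networkx as nx

G = nx.karate_club_graph()

# A tight layout, then one spread out by raising the preferred spacing k;
# the looser layout trades drawing area for fewer node occlusions.
tight = nx.spring_layout(G, k=0.1, seed=7)
loose = nx.spring_layout(G, k=0.5, seed=7)

# Feeding "loose" into the occlusion metric sketched earlier should
# score closer to 1 than "tight" does.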
Edge crossings are a little trickier, because reducing them often means doing gross manipulations, like taking these highly central nodes and pulling them off to the side, or using good edge routing techniques; but currently SocialAction only allows the straight edges we have right here, so we're limited to fitting nodes between these arcs of edges from the high-degree nodes. This improves our ability to do things like path finding, because we can easily follow the edges without the crossings, but it reduces our understanding of how important any particular node is in the network. This node looks a little less central, because it's not in the center of the graph and because its edges all come out from one side instead of from all around it. If we just want to find paths, that's not a problem; but if you want to identify how important people are, being central and having your edges spread all around you is a lot more important. So this illustrates some of the trade-offs we often face when optimizing these metrics. We've looked at a bunch of additional readability metrics. Angular resolution is the spread of edges around a node. The angle of edge crossings can also substantially impact path-finding tasks and the like, because users are more likely to follow the wrong path. We've looked at things like node size and color and shape variance, especially when you have to scale your visualization: beyond a certain point you don't want to show the text on the node, because nobody will be able to read it, so you can compress it, but you can't compress it too far or you won't be able to see its color or shape. Additionally, we can look at orthogonality, or how well the nodes and edges line up on an imaginary grid. For UML diagrams or hierarchical structures these grids are useful, because there's meaning to those levels; for social networks, citation networks, and the like, forcing them onto arbitrary grids implies relationships that don't exist. You can scan the drawing more easily, but you see relationships that aren't there, so spatial layout is very important. There are also various measurements for the paths through the network, because you want these paths to be fairly continuous and easy to follow: you don't want them to be very bendy, and you don't want arbitrary paths heading off toward nodes they don't actually connect to, which is geodesic path tendency. And finally, you usually want edge length to be uniform throughout the graph, except for disparate groupings, which users oftentimes request be placed further apart. As for future work, like we've talked about, we have the snap-to-grid feature we're working on, as well as feedback into the automatic layout algorithms. We've done this evaluation of NetViz Nirvana for teaching using NodeXL, the paper we presented earlier, and we've started implementing these readability metrics in NodeXL so we can look at more long-term usage of them and see how well they improve users' end results.
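For instance, angular resolution at a node can be sketched as the smallest gap between the directions of its incident edges (positions again as (x, y) pairs; this returns degrees, not a normalized metric):

import math

def angular_resolution_deg(pos, edges, v):
    # directions of the edges incident to v, sorted around the circle
    angles = sorted(
        math.atan2(pos[u][1] - pos[v][1], pos[u][0] - pos[v][0])
        for (a, b) in edges if v in (a, b)
        for u in ((b,) if a == v else (a,))
    )
    if len(angles) < 2:
        return 360.0  # nothing to separate
    gaps = [b - a for a, b in zip(angles, angles[1:])]
    gaps.append(2 * math.pi - (angles[-1] - angles[0]))  # wrap-around gap
    return math.degrees(min(gaps))

Larger minimum angles make the edges around a node easier to tell apart, which is the sense in which a wide spread of edges signals importance.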
And so, in conclusion: we have global readability metrics, as well as individual node and edge readability metrics for identifying problem areas. We hope these will help network analysts become more aware of these problems and how to fix them, and help tool designers give guidance to their users on fixing the problems in their graph drawings, so we can get better publications for everybody. If you're interested in reading more, we have a tech report here, and we're submitting that paper to CHI 2010. Thank you. Any questions? [applause] >>: So, yeah, you mentioned edge routing. How do you think routed edges would affect this? Do you think the readability metrics for rounded edges would still apply, or would they change the strategy at all? >> Cody Dunne: Routing edges would substantially reduce the number of these blatant edge intersections and the edge tunnels underneath nodes, and that is very helpful for path-finding tasks and for making sure you can tell that an edge isn't connected to a particular node just because it passes right around the edge of it. All these metrics are applicable to straight-edge drawings as well as rounded- or curved-edge drawings, but they become more difficult to compute. With rounded edges, the bendiness of the route, and the angle of all the junctions in the path if you have line segments connecting it, start to become more important. If you have a lot of edges routed through the same narrow channel between nodes, that also becomes a bit of an issue, because instead of being able to code attributes onto the edges, you have to start using arbitrary colors to distinguish the edges within those bundles. Currently our edge intersection metric would actually count those edges lying on top of each other as intersections, so to properly handle them we would have to extend it a little and make a measurement of how distinguishable any edge in a bundle actually is. >>: Do you have any thoughts on edge labels? It's interesting that you show only nodes with labels, but in some graphs the edges have labels too. Compared to putting a label on a node, does it become a more complex problem? >> Cody Dunne: We just implemented edge labels in NodeXL, and our current approach is to align the label along the edge and actually lay it on top of the edge, with a light-opacity bounding box around it, so you can see the edge behind the label, but not so strongly that you can't read the text. It lets you see intersections behind labels, even though hopefully you wouldn't have any of those. >>: For edge readability there's also the mapping between the actual edge and its label; it's not just about [inaudible]. There are many names, and sometimes you are just not sure which link a label actually refers to. >> Cody Dunne: In NodeXL, because the text is tilted to the angle of the edge and laid directly on top of it, it becomes a lot easier to associate those things. It becomes a little harder to read because it's tilted, but you have a lot fewer problems with it interacting with other edges, and with identifying which label goes with which edge. What becomes more of an issue is long edges (do you stretch long labels out along them, or compress them?) and close edges, where you don't have room for a label: what do you show there? We've looked at various approaches for simplifying edge labels that we don't have room to show, making them nice and unique so you can identify them and read them properly. And then we also have a label-importance approach, where you rank edges according to importance, based on their betweenness centrality or something like that, and the tool greedily tries to label them along that list. If it can't show a label for an edge without completely destroying the readability of that section, it just won't, and it will tell you that it can't show a label there, maybe with a little marker or something like that.
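A sketch of that labeling approach with matplotlib: rotate the text to the edge's angle and back it with a translucent box, so the edge stays visible beneath the label (illustrative, not NodeXL's rendering code):

import math
import matplotlib.pyplot as plt

def draw_edge_label(ax, p1, p2, text):
    # midpoint placement, tilted to the edge angle, translucent backing box
    (x1, y1), (x2, y2) = p1, p2
    angle = math.degrees(math.atan2(y2 - y1, x2 - x1))
    ax.annotate(text, ((x1 + x2) / 2, (y1 + y2) / 2),
                rotation=angle, ha="center", va="center",
                bbox=dict(facecolor="white", alpha=0.6, edgecolor="none"))

fig, ax = plt.subplots()
ax.plot([0, 1], [0, 0.6])  # the edge itself
draw_edge_label(ax, (0, 0), (1, 0.6), "replies-to")
plt.show()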
>> Bongshin Lee: Okay. Any further questions? Let's thank our speakers one more time. [applause]