>> George Robertson: It's my pleasure to introduce Nathalie Henry, who's a graduate student at -- she's getting a PhD jointly from Université Paris-Sud and INRIA, and the University of Sydney. And she's going to talk to us about her thesis work on exploring social networks with matrix-based representations. Nathalie.
>> Nathalie Henry: Thank you for the introduction. So yeah, I'll try to explain my PhD work in 50 minutes, so that's three years of work. It's about social networks and matrix-based representations. So basically my background spans three different axes. There's graph theory: I work with Peter Eades at the University of Sydney. Information visualization: my co-advisor in France is Jean-Daniel Fekete. And the application domain is actually social network analysis: I work mainly with sociologists, historians and analysts from French institutions. So let's talk about social networks. So what is it? Social networks are actually a set of actors, which are people, connected by relations. So like me and my mom: we're a social network, two actors and one relation. A simple example is Facebook. So here is my friend network. I'm in the center, I'm the red circle in this network, and I'm connected to all my friends on Facebook. So this is a social network. And we call this representation a node-link diagram. So basically you've got nodes, which are the circles, and you've got links, which are the relations between the people. Social networks are actually a very hot topic these days. You've got all the online communities that are becoming very popular, such as Facebook, Flickr, Friendster. You've also got online collaboration websites, such as Wikipedia, where users collaborate to create an encyclopedia, and SourceForge, where people share software code. And there is also lots of work on scientific collaboration: how researchers collaborate with each other.
So basically a collaboration is when two researchers co-sign a paper. Researchers are interested in this to get a kind of overview of the field. And there are lots of other social networks. The web is one. People are desperately in need of visualizing -- sorry, the internet. So people are connected through the internet, and you've got terrorist networks, and more generally communication networks, and also in biology you've got disease transmission networks: how HIV is propagating, for example. So those are all examples of social networks. My approach to analyzing them is to use information visualization, and exploratory data analysis more particularly. So I'll explain more about that just now. Let's take Anscombe's numbers. So, four series of X and Y coordinates, four series of points. If I compute statistics on these, I actually get the same statistics for all four series. For example, I take the average, the mean, of X and Y and I get the same numbers. So from this point I would say: statistics says all those four series are actually equivalent. But if I transform them into a visual representation, say using basic scatterplots and representing each point by its X and Y coordinates in space, then I can see very quickly that those four series are actually very different. And this is the main purpose of exploratory data analysis: you answer a question you did not know you had. You didn't know what you were looking for. You just wanted to get insight about those numbers, and you want to look at the data from different perspectives. So it's not replacing statistics, it's complementing it. You want to look at the raw data from different points of view to try to get insight from it. So that's basically my approach. Now let's talk about network visualization. Usually social scientists start with a node-link diagram. As I explained earlier, it is just nodes, like those squares here, connected by links.
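The four-series example mentioned in this part of the talk (Anscombe's quartet) can be checked numerically. The following is only a minimal sketch: the data values are the commonly published quartet, and the names `ANSCOMBE` and `summarize` are chosen here for illustration, not taken from the talk.

```python
from statistics import mean

# Anscombe's quartet as commonly published; series I-III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
ANSCOMBE = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Return the means of x and y, rounded to two decimals."""
    return round(mean(xs), 2), round(mean(ys), 2)


# Same summary statistics for every series...
summaries = {name: summarize(xs, ys) for name, (xs, ys) in ANSCOMBE.items()}
# ...even though the raw y values are all different.
distinct_series = len({tuple(ys) for _, ys in ANSCOMBE.values()})

for name, (mean_x, mean_y) in summaries.items():
    print(f"series {name}: mean x = {mean_x}, mean y = {mean_y}")
```

Plotting each series as a scatterplot (not shown here) is what reveals the differences that these identical means hide.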
Here is, for example, a node-link diagram of the (indiscernible) research community, the largest component: it's about 100 researchers, connected when they co-sign a paper. So here you can actually see something. But if you go to larger node-link diagrams, let's say that one, which is four to seven (phonetic) actors, then it's not the same problem. You can't really read the names and you can't really see who's connected to whom. After a lot of work filtering and assigning visual variables, most of the time you can achieve a very nice representation such as this one. So that's the same network as the previous one, except someone manually placed the nodes in space and assigned colors and sizes. So here it is actually researchers of (indiscernible). The size means how many papers they published and the color is how many references they got. And what you can see very fast is that big, red node here. Who's that guy? It is actually Ben Shneiderman. And then you've got the ones that are strongly connected. So the length of the links means the number of collaborations. And you can just see George here, though unfortunately you can't really read the name. And he's connected to Jock Mackinlay and (inaudible). So basically what social scientists want to see in those networks is the communities, so people densely connected, such as this area here or here, and the central actors, such as those people who publish a lot or are cited a lot, or both. So this picture was obtained manually, right? People just moved the nodes and assigned visual colors and sizes. So what do you do with that big one here? First, the nodes: you can place them manually, but then it will take a huge amount of time. On one standard screen you can see 100 nodes. So imagine four seven (phonetic): you have to spend a lot of time, but you can still do it if it is just large. However, if it is dense, meaning that lots of links occur between nodes, then it's not even possible.
You won't manage to get something readable enough. So what are the solutions? The first solution: I remove data. So I sample. I take a random sample of my network and I visualize it. Of course it is less data, so it might be easier to read. However, in social networks what people want to see is central actors. So if I randomly remove central actors, say I remove Ben or George, then I have a completely misleading view of my network, so I don't want to do that. Another solution is filtering. In that case you also remove data, but according to a given criterion. Let's say I remove researchers who have only published one paper. In that case you can still get a better understanding of the network, though you are not looking at the complete data, so you might miss something. You need to explore to be sure you have not missed some important trend in the data. So you can alter the structure, which is a problem. Another solution would be to cluster. So I take several nodes that are, for example, densely connected and I just squash them into one single node. So it's compacting the data. In that case the problem is (inaudible) information again. Let's say I have MSR as one big node: then I don't see what is going on inside that node and how people are connected to the outside. So it's also a problem of altering the data and the insight that you can get from it. So the last solution is to look for an alternative representation, and for example you can use adjacency matrices. So what are they? Here is a simple example. You've got a node-link diagram on the left, so the people are in orange and they are linked by directed links. So we (inaudible) here, and it means basically that A is connected to C. Now this representation can be as a (inaudible), so this graph or network has a (indiscernible) representation: node-link diagram and adjacency matrix. So it is very simple.
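As a minimal sketch of the adjacency-matrix representation introduced here, with rows as link sources and columns as link targets: only the link from A to C is mentioned in the talk, so the other nodes and links, and the helper name `adjacency_matrix`, are assumptions made for illustration.

```python
def adjacency_matrix(nodes, edges):
    """Build a 0/1 matrix: rows are link sources, columns are link targets."""
    index = {node: i for i, node in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for source, target in edges:
        matrix[index[source]][index[target]] = 1  # place a mark in the cell
    return matrix


# Hypothetical directed graph; only A -> C comes from the talk.
nodes = ["A", "B", "C", "D"]
edges = [("A", "C"), ("B", "C"), ("C", "D")]
matrix = adjacency_matrix(nodes, edges)

# Print the matrix as a small table with row and column labels.
print("  " + " ".join(nodes))
for label, row in zip(nodes, matrix):
    print(label + " " + " ".join(str(cell) for cell in row))
```

Because the links are directed, the mark for A to C sits in row A, column C, and the mirrored cell (row C, column A) stays empty.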
A matrix is like a table, and you've got the actors or nodes in both rows and columns, rows being the sources of the edges or relations and columns being the targets. So for example, if A is connected to C, in the matrix I go to row A and column C and I place a mark there. And now let's take a concrete example. This is a year of -- between my research collaborators in Australia. So it's 400 people, and you take the aggregation of who sent an e-mail to whom over a year. What you can see from this is that, oh, lots of people send e-mail to each other, almost everyone, except maybe those two people here who only sent e-mail to one guy. If I now represent that as a matrix, you will see here that gray means no connection, no e-mail, and a black point means one e-mail. And if I reorder rows and columns according to an algorithm I'll describe later, then you can see the same matrix gives you a completely different insight into the data. First, you can see there is a big amount of gray in that matrix. So compared to the node-link diagram you say, no, it's not that dense actually. In a year people don't send e-mail to each other all that much. You can see small groups here, small black blocks. And if you drill down and read the names of the people, you realize that they are members of the same research group, and of course they send e-mail to each other. Over a year they are all connected to each other. Now what is interesting is that you can see that one group, for example those rows, sent e-mail to its own group and also maybe to another group. So you can also see trends and collaboration between groups. The other thing that you might see is very long horizontal and vertical lines here, and what it is is actually one -- or a group of people that (inaudible) a lot of other people.
If you drill down again and see who is sending e-mail to everyone, it is actually the financial service that reimburses people when they go on a mission somewhere. They send e-mails to lots of other teams. So my point is that with a node-link diagram you might get misleading information and you can't really say much about it. If you transform it into a matrix representation you get much more insight, and very quickly. So from that point my approach was to try to see the users, the social scientists, and work with them: what are their problems and how can we solve them? So I used participatory design with social scientists from different French institutions. Basically it's four phases. Observation: I go to the place and see how they work. What are the problems? What are the data sets? What do they do every day? Then we come together in one workshop or several workshops and try brainstorming together. So what would be the perfect solution for you guys? What do you want? What could help? From this brainstorming session we get a set of ideas, and then we prototype and evaluate with them. So I actually ran two workshops with them. They are pretty busy people and experts, but I got several (inaudible) feedback. So we went back and forth during the three years of this PhD. The main outcome was that they use analysis tools with lots of menus, and basically they have lots of statistics. They wouldn't know which method is the best one, so they would just try everything. They would systematically try all menus, all parameters, because they needed to find something; they just didn't know what they wanted to find. And what we also found out is that they draw pictures by hand, because using the current visualization tools is too difficult for them. So most of the time they would draw lots of node-link diagrams. So they use mainly node-link diagrams, small ones. But the interesting finding was that they also use matrices.
So from those two points, the potential of matrices and users who actually know what this representation is, I decided to work on this and make it my PhD topic. Yeah?
>> Question: Do they label their nodes with the links?
>> Nathalie Henry: Yeah, basically the matrices they use are usually very small ones, because you can imagine it takes a lot of time. They label by name or identifier, and then usually in the matrix they put values. They do that when they've got several (inaudible) links, so for example you've got three kinds of relation between people, or you can imagine distances and stuff like that. They use it to think about how the different relations compare to each other most of the time -- with my users, again. So my approach was information visualization, along three different axes. The first is perception. We don't really know how people perceive matrices, so I wanted to work on this and understand: okay, how do they perceive it, what kinds of features are important? Always keeping in mind that I'm working with social scientists: what is it for them to visualize a social network with matrices? The second point was exploration. So okay, I know how they perceive matrices; how do they interact with them to get to the data? And the third part is the communication part. So they find something, a theory or findings about the network. How do we communicate it to the world? How can we help with that? And here is the itinerary of my PhD. I will tell you about each point later. So first I was interested in perception and how people perceive matrices, and as I briefly said earlier, you can permute rows and columns of a matrix, and the permutations give you different visual perceptions. So the idea was: okay, I'll try to make matrices usable, because if you randomly order a matrix it is (indiscernible) unusable. You will have dots everywhere and you won't be able to see anything.
So that was my first step: trying to collect all the ordering algorithms that exist and trying to assess them. The second piece of work that I did was trying to combine matrix and node-link diagram. The idea was that the matrix would be used for exploring, because you start with a large and dense network. Then you manage to find whatever insight you need to find, you filter it, and then you communicate with a node-link diagram, because that's the most popular representation they use to communicate information. So I did MatrixExplorer. I will talk a bit more about it later. Then I found out that there were actually problems when exploring matrices, so I fixed that by augmenting them, with MatLink. And the last work that I did in my PhD was to actually merge matrix and node-link diagram, providing a way of exploring and also a way of communicating. We'll talk about it later. So, how to make matrices usable. Jacques Bertin, a French guy who wrote Semiology of Graphics and similar theories, showed us very interesting things about tables, numerical tables, very early, in 1967. So it is a very simple table. You've got five different countries and five different kinds of meat, and each cell represents which meat is produced by which country. What he showed us is that if you replace each value by a visual variable, so a small rectangle, and if you permute rows and columns, you can actually find things very quickly in the data. You can find groups. So here you've got A, B and C. They are very small groups, of course: A represents just two countries. But you can see those two countries have the same profile of production. So you can see trends in the data. And you can also see that the B group, which is France and Italy, also shares a profile of production, but the opposite of group A's. And you can see that very quickly.
You can also see the outlier, which is actually Belgium, which doesn't really have a profile of production. And from this very simple example you can go further and say: okay, let's say there is a law to be voted on against the production of, let's say, horse meat here. Then France and Italy won't really agree with that law. So should they go to (inaudible), the Netherlands or Belgium to try to convince them to vote with them against that law? Basically they should go to Belgium, because the group with the Netherlands doesn't really care about it. They're more interested in the first kind of meat, whereas Belgium can be easy to convince because it doesn't really have a profile of production. So it's a very simple example. It's a small table. You can transform it visually, and if you order it then you can get real information about the data. And it is pretty fast when you know the domain and the data. So from this I tried to look at what kinds of ordering methods there are in the world. And there are (indiscernible) methods to permute rows and columns, because it is related to lots of problems. So I'll just divide them into two categories, to keep it easy. First you've got table-based ordering methods. Principally they come from the field of biology. You've got a gene expression table, with genes in rows and different experimental conditions in columns, and you want to reorder rows and columns so you can see genes that are expressed the same way in the same conditions. And of course there are many ways to compute those permutations of rows and columns. The second category is related to graph linearization. The idea is you've got a graph and you want to put all the nodes on one single line, and you want to optimize some function, say minimizing the number of crossings. So there are also lots of methods for that. And what I did is I tried to mix both.
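To illustrate why permuting rows and columns matters, here is a small sketch under stated assumptions. It scores an ordering by the total distance between linked nodes along the line, which is one simple linearization objective standing in for the many methods mentioned in the talk, and it searches all orderings by brute force, which is only feasible for a toy matrix. The data and the function names are hypothetical.

```python
from itertools import permutations


def permute(matrix, order):
    """Apply the same ordering to rows and columns of a square matrix."""
    return [[matrix[r][c] for c in order] for r in order]


def arrangement_cost(matrix, order):
    """Sum, over all links, of the distance between the endpoints in the ordering."""
    position = {node: i for i, node in enumerate(order)}
    size = len(matrix)
    return sum(
        abs(position[r] - position[c])
        for r in range(size)
        for c in range(r + 1, size)
        if matrix[r][c]
    )


# Hypothetical undirected network: two groups of three, interleaved in the
# initial order (nodes 0, 2, 4 form one group; nodes 1, 3, 5 the other).
GROUPS = [{0, 2, 4}, {1, 3, 5}]
MATRIX = [
    [1 if r != c and any(r in g and c in g for g in GROUPS) else 0 for c in range(6)]
    for r in range(6)
]

initial_order = tuple(range(6))
# Exhaustive search over 720 orderings; real matrices need heuristics instead.
best_order = min(permutations(range(6)), key=lambda o: arrangement_cost(MATRIX, o))

for row in permute(MATRIX, best_order):
    print(" ".join("#" if cell else "." for cell in row))
```

In the reordered matrix the two groups show up as two solid blocks along the diagonal, whereas the initial ordering scatters the same marks in a checkerboard.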
So I tried to take a graph, transform it into a table and then apply ordering methods to it. I won't explain it now because it is a bit complicated, but I can talk about it later. So from that the question is: okay, we've got lots of methods. What is a good ordering for social networks, and for social scientists in particular? So I went through sessions with social scientists. I showed them a node-link diagram and a matrix: okay, reorder it, and tell me what is good and what is bad. And I came up with this representation. So basically what social scientists want to see is those large crosses here, which are central actors. In a node-link diagram you would see one as a person connected to several communities or having many links. And then they want to see groups, which are blocks in the matrix and densely connected communities in the node-link diagram. So they want to see a clique, for example: here you can see they are all connected, except on the diagonal, which would mean being connected to itself. And here again, see, the block is not complete, so it is a community and not -- there are some missing connections here. But this was a manual ordering, right? I went to see them and I asked them. So the question from there is: what is a good automatic algorithm? How can I choose my automatic algorithm to fit their needs? All the algorithms are designed from formal measures. Like, I try to minimize the distance between rows or between columns, or I try to minimize the crossings. But the question is: can we characterize those orderings according to the visual features they produce? Because I don't care about the formal measures. I want to see the communities. I want to see the crosses and so on. So my approach to this was to perform empirical studies with them and try to understand how they perceive groups and how they perceive different orderings. So I ran several experiments with users on digital pen and paper.
What happens is, here you've got the Master's grades at the University of Paris: you have all the students in rows, and in columns you've got the courses they chose for their Master's. And I reordered them differently with different algorithms and tried to characterize what visual features would appear. So then I asked people, mainly students, to draw the groups on the visual matrix and try to name them. So they had to perform very long experiments, trying to figure out not only the visual groups but also to give them meaningful names. So from maybe 30 of those people working on several data sets, I just took all the results and superimposed them on top of each other to try to see if there were common groups. So here what you can see is that there are no common groups actually. The whiter it is, the more often the group has been chosen by people. But when it's ordered, you can actually see there is some consensus. For example, those groups here have been chosen by almost all the people. And if you look at the names, you can see that the ones circled in red have the same name: those are students who chose cognitive psychology courses, and most of the users, maybe 80% of them, were able to find this group and give it a meaningful name. So from this experiment the question is: is there a good algorithm, a good automatic one, to find groups, or one for a particular type of data? And what we found out is that with different algorithms, so these are different algorithms, always on the same data set with students, people find the same group. So here it's split in three, and here it's actually a single group. And they are able to identify it as cognitive science students. So the question is actually more complicated than finding one group, because maybe finding a group that is split in three is good, because people can identify another group. So the question of whether there is a good algorithm is very hard to answer. Yep?
>> Question: Just to be clear.
Your algorithm, the labeling that is happening from the people that you're (inaudible) --
>> Nathalie Henry: Yes.
>> Question: So they're just saying this is a group (inaudible) --
>> Nathalie Henry: Yep. Yep. Exactly, and then you code it, because of course some will say psychology and some others will say cognitive science. So you have to code it and say, okay, I consider it's the same notion. But yeah, they give them (inaudible). Or you've also got people who call them group one, group two, group three. But then you try to avoid those ones. And so there is no single good algorithm, even mine. So what can you do about it? First, it saves a huge amount of time to start from somewhere, because if you look at people who started with random or alphabetical ordering, they couldn't do anything basically. They spent twice the time trying to figure out which groups were there, and they couldn't find anything. Whereas with the different ordering algorithms we found that they found different groups. First, they found something, and it was sometimes quicker with one ordering than with another. And what we found out is that analysts actually need several orderings to find a consensus in the data, because sometimes something might appear as a group with one ordering, but it's just an artifact and it will be split in others. And so you need more than readability: you need interpretation, which is very dependent on the data, and there you can't do anything automatically, basically. So the human has to go in and decide. So I focused on two things. First, helping people reorder, because most of the time they start from somewhere, but they're not really happy with the ordering and they would like to try other things. So I developed a set of interactions to move groups and interact with the matrix in a more interactive way than just sorting by items or things like that.
So more of a direct manipulation kind of interaction. And I also focused on interaction to find a consensus in the data, to try to get at the truth of the data. So just a simple example. There are many features, but I'll just show that one because I think it's an important and simple one. Basically you give people the possibility to cluster, to recognize groups in a matrix, with interactive clustering: you just use lasso selection and they decide on the group. And then if you offer the opportunity to reorder, that is the first step towards finding consensus in the data. So for example you can see that those cognitive science students, the purple ones, if I reorder with a second, different ordering, they will be split in three. So you can wonder why, and you can drill down and maybe decide whether it's a group or not. And you can see for example that the pink ones stay grouped, except maybe those ones here. So it's the first step towards trying to find what is really in the data and what is not. And the message here is: try to give several perspectives on the data. So from this work on ordering and interaction with matrices, I combined the best of both worlds, basically, with MatrixExplorer. I decided to give my users a matrix and a node-link diagram, and of course I concentrated on the matrix representation, giving lots of interaction with it. Just a simple example showing that once you have recognized groups in one representation, you can interact simply and see how they look in the other representation. Very simply, the person selects a group in the matrix and sees the related data in the node-link diagram. So it's very simple. And people usually understand it very easily because they're used to the representation. And of course you can synchronize both representations by colors and all kinds of visual variables.
So just to summarize: I provided coordinated views and interaction to explore, and my assumption was that people would use the matrix to explore and the node-link diagram to communicate. But actually what we found out is that people use node-link for certain tasks and matrix for others, and usually they switch back and forth. Most of the time -- I mean, it's better if they get two screens, one with the matrix and one with the node-link. And it is more comfortable if you have three, actually. And what I figured out is that switching from one representation to the other, from a cell in a matrix to a link in a node-link diagram, is very cognitively demanding. If you switch back and forth, you're really tired at the end of the day. So that was the main finding: it's very hard to switch back and forth. However, they use both representations to be able to figure out whether something is a group or not. They use different layouts and orderings, so basically they would use all those perspectives to find out if a group really exists. So it means that we really needed both. So I tried to explore why they would use matrix or node-link, and when. Social networks can be sparse, such as genealogical (inaudible). So it's very sparse. Sometimes you can have loops, but better not to talk about it. It's like (indiscernible) marrying someone or something. When networks are sparse, people use the node-link diagram, because the matrix is mainly empty, so you would just spend your time scrolling. And the node-link diagram doesn't have lots of crossings, so you don't really need to go to the matrix. In that case they use only the node-link diagram. (Inaudible), they use mainly the matrix. Like with e-mail (phonetic): they try, they wait for ages, and then they decide it is not okay and work mainly with the matrix. But they still switch back and forth. When exploring a large and dense network, they switch back to the node-link diagram.
And this actually exposes one of the main weaknesses of matrices, which is path-related tasks. This is very simple; I'll explain it just now. In a social network, of course, your focus is connectivity: how people are connected. So for example, if you want to see how B is connected to D, you will see that the path goes through C. C is potentially interesting for you. And some people actually perform lots of those tasks: how people are connected, who is on the path between two people.
>> Question: (Inaudible).
>> Nathalie Henry: Yeah.
>> Question: I'd like to know, when you're doing most of these it looks like there is no direction to your links.
>> Nathalie Henry: Yeah.
>> Question: All of your matrices look like they are symmetric.
>> Nathalie Henry: Yeah, they are.
>> Question: And -- but in the case of, like, e-mail, that would have --
>> Nathalie Henry: Yes.
>> Question: -- a directed link, right? So I'm just curious if your exploration here is mostly of symmetric connections, or are you also looking at directed connections?
>> Nathalie Henry: No. So I could do it. The problem is that almost all the algorithms are designed for undirected graphs. It's much more complicated for directed graphs, because the matrix is not symmetric. And so the distance between rows is different from the distance between columns, and they might be linked: if you move two rows then you might change the distance between columns. So it's much more complicated. So you're right, I didn't plan a version for directed graphs. You can visualize it, but you can't really reorder it, so forget it.
>> Question: Okay. It just -- it occurred to me that especially in a matrix visualization that would be really interesting.
>> Nathalie Henry: It is.
>> Question: You can pick up on the asymmetry.
>> Nathalie Henry: Yeah. It's a big piece of research, trying to figure out how people perceive it. So very interesting. Because groups are not groups like blocks, so -- but, you know, I didn't investigate that.
(Inaudible). So, back to paths. Here I always show a directed one because it is easier to explain. So here -- okay, the good thing about finding a path in the matrix is that it's always possible. Because if you look at this, how can you find a path now? When it's too dense, too large, then it's kind of madness. Of course you can add highlighting or filtering, but it's very difficult to perform those tasks. However, the problem with matrices is that it has (indiscernible). So I'll show you a very small example, the same one. The rows are the sources of the links and the columns are the targets. So let's go from B to C. I see one link is here. Then C is my target, so I have to go back to C as a source node. So basically I have to go back to the rows. And then from there I can go to D. And you see that it's actually much more complicated when the matrix is large, because you have to scroll down, and imagine you've got more than one intermediate. Then you have to figure it out by scanning the full matrix, basically. So it's almost impossible. I mean, you could do it, but then you will go mad pretty soon. So the idea of my third PhD contribution is to augment matrices so people don't have to switch back to the node-link diagram, especially for those tasks. And the idea is very simple again. You have all the links displayed as a linear static graph on the sides of the matrix, and in addition you add interaction. So on mouse-over you can actually see -- oops, sorry. Going to stop it. So on (inaudible) you can actually see all the links departing from one node. So you have an idea of the connectivity. If you click on one, now you can see one shortest path. Here you get an idea of how they are connected. So here the hop count is like three: they are connected through three intermediate people. And you can quickly read who the connecting people are. And you can go on: you can select one and do multiple paths.
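The manual procedure described here (read a row, find a marked column, jump back to that node's row, repeat) is what a breadth-first search does systematically. The sketch below is not the tool's implementation; it only illustrates shortest-path finding over an adjacency matrix, and the small graph and the name `shortest_path` are assumptions for illustration.

```python
from collections import deque


def shortest_path(matrix, labels, source, target):
    """Breadth-first search over a matrix whose rows are sources and columns targets."""
    index = {label: i for i, label in enumerate(labels)}
    start, goal = index[source], index[target]
    previous = {start: None}  # node -> node it was reached from
    queue = deque([start])
    while queue:
        row = queue.popleft()
        if row == goal:
            break
        # Scan the row for marks; each marked column becomes a row to visit later.
        for col, linked in enumerate(matrix[row]):
            if linked and col not in previous:
                previous[col] = row
                queue.append(col)
    if goal not in previous:
        return None  # no directed path exists
    path, node = [], goal
    while node is not None:
        path.append(labels[node])
        node = previous[node]
    return path[::-1]


# Hypothetical directed graph: A -> C, B -> C, C -> D.
labels = ["A", "B", "C", "D"]
matrix = [
    [0, 0, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
print(shortest_path(matrix, labels, "B", "D"))
print(shortest_path(matrix, labels, "D", "A"))
```

The number of intermediate nodes on the returned path (here just C, between B and D) corresponds to the intermediate people the speaker reads off the highlighted path.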
So the idea is to have static links and interactive links, and the interesting thing is that the linear static links give you a feeling for what's in the matrix. For example, here you can see this lattice kind of pattern, and it's actually a community. Of course, when you are used to it you see it much faster. So it means that down there, where the matrix is hidden, you can see that there is a community. And for large matrices it is even cooler, because you can see those links here leaving your viewport. And you can feel like, oh, there are people I could be interested in outside of the viewport. So it gives you things that in a traditional matrix are hidden: what you find is that you are potentially interested in someone outside the viewport, and you can see here, for example, that there is a community down there. So you might want to explore. So you have a feel for the structure of the graph. Now the problem is when you've got one person here and one person here and you display the shortest path: one intermediate guy might be outside of the viewport. The good thing is you know it; it's very simple, you look at that and you see there is someone important you should be interested in. But say, for example, this matrix is 2,000 people. Imagine that guy is 2,000 rows away. Then you have to scroll for a huge amount of time, and if you have to do that several times you become completely lost in the matrix, it takes time, and it's very frustrating. So we designed, with my (inaudible) and (inaudible), a method to fold the space. The idea is to fold the matrix like a piece of paper. I'll show you a quick video of that. So you can see more than one point of focus. Basically it's very simple: you just scroll down, but instead of losing the first focus point you see the second one as well.
And then if you want you can adjust the zoom level of one part or the other, if you want to be able to see, let's say, the communities. And we showed that it works for (inaudible) social network tasks. So let's go back to that question now. People, do they use matrix or node-link now? So the good thing is that for sparse networks they mainly use node-link, so you don't change anything. But for large (inaudible) now they use mainly matrices. So I say that with a small group of people, but I actually witnessed that they would switch less often to node-link diagrams. Now the bad news is there is an intermediate category of networks that are called small-world networks. Those ones are globally sparse, so basically they have only a few links. Let's use node-link then. No. Bad luck. They are also locally dense. That means they have dense communities that are tightly connected, and the communities are connected together by a few links. So should we use a matrix then? Well, unfortunately this is a very common category of social networks. So let's see what people do currently. So one of what we consider the best representations at this point is this one. So it's a node-link diagram. And you can see here these blue, densely connected parts that are communities, and between them you can see pictures here that are actually (inaudible). So the good thing is you can see that if I remove this central actor, those two communities are disconnected in the graph. So you can have structural information. However, you don't know what is happening inside the communities. Is there a missing link here? You can't see it, so you probably have to zoom. Can you see what is happening? Are all the people in the community connected to the central actor? You can't see it either. So our solution for that is, okay, let's merge both representations. So as node-link diagrams are good for sparse parts, let's use node-link there. But for dense parts let's use matrices, because they are better for dense parts. And so here comes NodeTrix.
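To make "globally sparse but locally dense" concrete, here is a small sketch that compares the link density of a whole network with the density inside one community. Everything in it is an assumption for illustration: the `density` and `suggested_view` helpers, the two four-person communities, and the 0.5 threshold are not from the talk, and the threshold in particular is arbitrary.

```python
from itertools import combinations

def density(nodes, links):
    """Fraction of possible undirected links that exist among `nodes`."""
    nodes = set(nodes)
    n = len(nodes)
    if n < 2:
        return 0.0
    inside = sum(1 for a, b in links if a in nodes and b in nodes)
    return inside / (n * (n - 1) / 2)

def suggested_view(d, threshold=0.5):
    """Hypothetical rule of thumb: matrices for dense parts, node-link for sparse ones."""
    return "matrix" if d >= threshold else "node-link"

# Two fully connected communities joined by a single link.
paris = ["p1", "p2", "p3", "p4"]
sydney = ["s1", "s2", "s3", "s4"]
links = (list(combinations(paris, 2))
         + list(combinations(sydney, 2))
         + [("p1", "s1")])

global_density = density(paris + sydney, links)   # 13 of 28 possible links
paris_density = density(paris, links)             # 6 of 6 possible links

overall_view = suggested_view(global_density)
community_view = suggested_view(paris_density)
```

On this toy network the whole graph falls under the threshold while each community sits well above it, which is the situation where mixing the two representations is attractive. With more communities the global density drops much further while the local density stays high.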
You select a dense part of the graph and you transform it into a matrix. Oops. I did it again. Whoa. So what you see here is that there is a small animation to go smoothly from the node-link diagram to the matrix, and we actually found out that it helps people understand what is going on. It is (inaudible) nice idea. And the good thing with this representation is you can interact with it very easily. So it is very easy to use drag and drop and move people out of the matrix. So I'll show you with the point of it. So basically it is like using a pen. Right? And you just draw the communities and you can see them. Oops. So you can circle them with the pen and then transform them into a matrix. So what you can see here is you have one matrix which is a clique, except for the diagonal. So everyone is connected to everyone. And the other one actually has some missing links. So blue means link and white means no link. And so you see there is a cross in the matrix which is almost empty. It simply means that this guy in the matrix is not connected to anyone, but he was placed close to the group so you could assume that he's part of the group, but he's not. So here again you really quickly see it. And simply by dragging and dropping out of the matrix you can adjust the community. So it's very interactive. So from that we tried to explore larger graphs. So of course when you have got a very large network you don't want to circle all the groups one by one because it takes time. You can use an automatic algorithm or you can start from the matrix. So here is an example where I start from the matrix and I select the communities I'm interested in, and I want to see how they're connected to each other. So I just drag them into a matrix representation. So I just select a group of those and drag it into the matrix version. So you are filtering the graph at the same time.
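The two visual patterns mentioned above, a fully filled matrix (a clique, diagonal aside) and an empty "cross" for someone who is not actually connected to the group, can also be checked directly on the community's matrix. The sketch below is only an illustration of what those patterns mean; the helper names and the four-actor example are assumptions, not part of the tool.

```python
def missing_links(matrix, labels):
    """Pairs of distinct actors with no link in either direction."""
    n = len(labels)
    return [(labels[i], labels[j])
            for i in range(n) for j in range(i + 1, n)
            if not matrix[i][j] and not matrix[j][i]]

def is_clique(matrix, labels):
    """True when everyone is linked to everyone, ignoring the diagonal."""
    return not missing_links(matrix, labels)

def unconnected_members(matrix, labels):
    """Actors whose row and column are both empty: the empty 'cross' in the matrix."""
    n = len(labels)
    return [labels[i] for i in range(n)
            if not any(matrix[i][j] or matrix[j][i] for j in range(n) if j != i)]

# Hypothetical community: A, B, C all know each other; D was only drawn nearby.
labels = ["A", "B", "C", "D"]
matrix = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
]

gaps = missing_links(matrix, labels)
outsiders = unconnected_members(matrix, labels)

# Dragging D out of the matrix leaves the 3x3 block for A, B, C.
core = [row[:3] for row in matrix[:3]]
core_is_clique = is_clique(core, labels[:3])
```

Removing the actor with the empty cross, as in the drag-and-drop step, turns the remaining block into a clique.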
So in that case what is interesting is that you've got two communities that are very close to each other in the matrix and you could imagine they were connected, but when you drag them out you see there is no connection. So that is when you can (inaudible) it also the matrix visualization. So if you do that again then you can see the links, and the good thing is you can use it for communicating. So of course you can use it for exploring, because you drag and drop and interact with your matrices. But you can also use it to communicate your findings. Once you have found out what the communities are, you can simply remove the names, and the beauty of this is that you can minimize the matrices so they become very small. But you can still see the pattern of connections inside the matrix. Yeah? >> Question: (Inaudible) labeling -- >> Nathalie Henry: Yep. >> Question: So for each little matrix there is -- >> Nathalie Henry: Yeah, there are four. >> Question: Four -- does that mean that each name is repeated four times? >> Nathalie Henry: Yes. So why I did that is because I (inaudible) and usually there are people around, so they can see the names from all around. Even one is enough when you know how to read it. But usually two are enough. And so from this representation you can represent the (inaudible) research community. It could be that one here, for example. So it is all the nodes, and it fits into a very small space. So it is also easy to communicate with that. So from this I have the problem of central actors. Okay. So I have got central actors, such as me, who's collaborating with people from Paris, but also people from Sydney. So where should I be placed? In which community should I be placed, in Paris or in Sydney or between them, or maybe in overlapping communities? So you can imagine, if I'm coming here I will have three communities, so we have three overlapping communities.
So imagine that with many more central actors in many communities, and you can't see anything. Or should I have a hierarchy of communities? Then it becomes pretty hard to read. So the idea I got with Anastasia (inaudible) from University of Sydney was, why not duplicate those people? So I should be both in Paris and in Sydney. I'm duplicated. And now the question is, what is the user's understanding of this? Is it misleading? What happens? Is it good? Is it bad? Basically. So I won't go through this experiment because it is a huge one. But mainly we thought that duplication will give you a more accurate view of the communities. An example. So this is the same network, and you see central actors are defined as people connected to everyone in both communities, and you have them in yellow. One yellow one here and three blue ones here. And if you choose to -- so they're put in one community, whichever, you just pick a random one. If you change their community, then you get a very different visual feeling of the network. For example, here you can say that the two communities are kind of equivalent in size. But now if you change it then you will say, oh, this one is very small. So if you duplicate people, which is symbolized by gray links here, then you can imagine you have a better view of the communities. Now the question is, does it impact the understanding of the network? Do I get the feeling that there are fewer connections between the communities or not? And so the result is that you minimize the misleading effects by visualizing the link between the duplicates, and by offering interaction. So users create their own duplication and you offer the visualization, and you minimize the misleading effects and actually help in performing some of the tasks. So that was the last part of my work. So this is four different stages, and now at the end of my PhD I wonder, what about the evaluation? How can I evaluate all of those things? So first I did some (inaudible).
So it's mostly formal and it's during the whole process. So you could say it's informing the design, so it's a kind of validation already. Then you can go further and say, okay, I'm doing controlled experiments. So the problem with controlled experiments is that they are for a specific task, with a specific data set, with specific representations. So basically you have got very specific results. So you can prove that one representation is better than the other one, but it is very limited. So I decided to run a case study. More realistic settings, real data sets, trying to see if we can find something about the data. Real findings. And also a longitudinal study, but it's not included in my PhD. So I ran a case study over five months -- actually longer than that, but five actual months working on it. And it looks at how researchers collaborate in four conferences over 20 years of collaboration, so it's a pretty big network. You've got 27 thousand actors, which are researchers actually. And you have got more than hundreds of (indiscernible) relations. And I used mainly MatrixExplorer, and then later, because this was in wrong direction, later I used MatLink and NodeTrix. And actually I ran into a problem I didn't think about at the beginning, which is how to present such findings. So I thought, okay, we can filter the graph and present a node-link diagram. A filtered one. But how do you present an overview of the field? So I did some manual presentation, so basically you have this huge matrix here, which is like 30 thousand by 30 thousand people, and you annotate it manually. And you show some of the groups and you can have some closer views if you want details. And from here I actually design metrics -- matrix, sorry. So the idea came from here. You will have the communities and see how they are connected. So I designed this representation where you can have a very fast overview and some details within it.
So here you've got the matrix overview with small matrices without the names of the people, just the name of the community. And here is the closer view where you have all the details. And I just had fun doing other stuff. A matrix which would have some magic lenses where you would see the inside of the matrix as a NodeTrix. So lots of merging things. So basically what have I done? So I looked at the problem from three different perspectives, well, the third one, communication, came a bit later. But the idea was to work on perception: how do you perceive matrices? Exploration: how do you explore them? And how do you communicate? So now what do I want to do next? So there are very interesting questions about the evolution of networks, how they evolve over time. How groups are becoming groups. Do they split? Do they become -- what is the history of one community? And this of course is linked to perception and to exploration, because now you have got much larger data sets and so you need interaction techniques to do that. And also communication is now very hard because a static picture is not enough. Maybe you need a sequence of pictures, or you need a movie or video. So that is one of my focuses. So my goal in research is to stay within the flow. So I won't try to pronounce the name of the guy who came up with this, but the idea is very simple. So in research -- well, in life in general -- you have challenges, and if you have got too much of that it is very hard. You are very scared that you can't manage it. But there are also the things you know, your skills. But if you only do what you know then it's very boring. So the idea is to stay between both. You want a bit of challenge and a bit of things that you know how to do. And to do that my plan is to have a larger perspective. Let's say doing time-related data (indiscernible) and understanding the evolution of networks.
That is, while keeping small projects where I know I can actually do something about it. So the first step that I started, because I finished my PhD a few months ago, is to try to see the evolution of social networks. One project I'm involved in is the visualization of how people collaborate in (inaudible), so my French research institute. So this is a very old one, and they created teams at the beginning, and now what happens is that people collaborate all over the place and sometimes not even with their own team. So they want to create a new architecture and new teams. And so I'm helping on this project. And also another one on Wikipedia, where people want to see what is going on and to supervise who is working on what page and how that thing evolves. And other types of data, so there are people from Institute by (inaudible) Paris that are very interested in using one of the tools that I created, because they have got huge networks and they don't know what to do with them, basically. It's lots of data and they can't see anything, so they are very interested in matrices to do that. And more generally there is this project with MSR on reactivity and how can we show user activity? Can we reflect the activity to users? As a special case I'm really interested in supporting longitudinal and qualitative studies, particularly because it is very important in (inaudible). We don't really know -- so as I said at the very beginning, we want to answer questions we didn't know we had. So how would you evaluate that? So basically the idea is, okay, we give the tool to real users. And we study how they use it and try to get the benefit of this. But there are many problems in doing that: you need to log the data, and you need to know what you want to log and how to log it, and there is this new -- this very potential, this very creative way, but how do you visualize this data?
You need to analyze the data collected to see how people use the representation and the visualization, but you could also create a representation to reflect it to users, to help them explore, to help them communicate on their exploration process. So there is also all that part that is very interesting. So to conclude very fast. My research approach is involving users. You have to talk to users before designing stuff for them. It gives a better design. You have to combine several perspectives: data mining, statistics, visual analysis. And the research directions would be time-related data and evaluation in general. And here is a small network. In 1976, I'm here in this matrix here, and you can see there are more than 8,000 people. And you can see -- so those small matrices of different colors are actually the teams. You can see here that you have got problems. Here you can see that (inaudible) at all in 2006. (Inaudible) -- >> Question: (Inaudible). >> Nathalie Henry: Here is actually distributed network people. >> Question: (Inaudible). >> Nathalie Henry: No, no, no. They are collaborating. The -- so the problem is they are collaborating very strongly outside. So these other people are not in their own team. For example, I'm in the information visualization team. So a research group. And this team is with software engineering. So we've got a tiny connection, but we collaborate more with biologists, actually. So what happens is we've got the same (inaudible) and projects and stuff like that, the same resources. But we don't really collaborate with each other, so the idea is to create new research teams that make sense. So basically, for example, those two collaborate quite a lot, but they might be very far away from each other, or it might be more difficult to exchange people from one team to the other one. So the idea is to try to have a new visualization, new communities, new research organizations.
Also the thing is, it is going very fast, so the research director had no idea of what the current themes are. So he's got a set of themes like HCI, but he doesn't really know what people are working on, so he's also trying to find clusters or how people collaborate. For example, we're very close with the data mining people now. So he would like to be aware of that, for example, just to interact with the rest of the world. Yeah. Well, thanks for your attention. If you have got questions. (Applause) -- >> Question: (Inaudible) -- >> Nathalie Henry: Very good question. So actually I worked on a multi-level matrix representation, but it's very hard. So okay, the -- so the issue is that already for a few thousand nodes it's hard, and we don't really have a solution for that. So even with the matrix it's very difficult, because even if you have interaction techniques you are dealing with a huge amount of data. Now you can see those scaled representations that I tried. But it's always the same problem, right? You are losing information because you are aggregating somehow. And also it takes a lot of processing time. So for example the data of biologists, you can't load them in the (inaudible). So you have to have lots of infrastructure, architecture, hardware to be (inaudible). So yeah, it's still a very hard problem. And that is what I'm going to run into when I work with time-related data. The plan is to try to scale a bit more, and then we'll see later about the terabytes of data that biologists have. I think it is also a completely different kind of problem, because what happens when you have got a very, very huge amount of data is that people work on different parts of the data, so it would maybe be more collaborative than it is. And the rest of the time they use data mining. I think you can't really visualize this huge amount, so you have to have other stages. This is different. And this is a very small network, actually. But it's already a challenge right now. So yeah?
>> Question: (Inaudible) -- of adding additional dimensions to the matrix? I mean -- >> Nathalie Henry: Uh-huh. Yeah. So actually we got the idea a few weeks ago. So with a collaborator in Canada. He is at Calgary. He's working on this idea of manipulating 3D with shallow depth on a table, so the idea basically is to have cubes instead of simple matrices, and then you have to think about interaction with them. But yeah. For the moment I must say that the problem of the matrix, so this representation here, is that you have so many visual variables in play that you can't use more of them, because you have the links, you have the inside, you have the labels. So you can't use them anyway. So if you have a third dimension I just wonder how effective it will be, and so one of the paths is to be able to interact with just one or a subset of them, and then they become actual cubes that you can manipulate. And basically the matrix is already very interactive, so you just grab one. It becomes a cube, and so you can turn it so you can see the different dimensions. But there is work to do on the perception of this, and on how it can be misleading, and on what kind of data you want to represent. And so another suggestion is to use Mélange, which is the folding thing. But instead of folding, you imagine that you are actually cutting through different layers, and here you could visualize other data, but I think there is also a problem with perception. But yeah, currently you just change the colors manually and -- that's the only thing you can do. >> Question: Do you have any way of maintaining a sort of perception of the state as you are changing the colors or whatever? Because one of the big problems here is making sure that you are staying (inaudible) -- >> Nathalie Henry: Yeah. >> Question: (Inaudible). >> Nathalie Henry: Yeah. So that is a very complicated problem. It is funny because I worked a bit on it when I wanted to visualize how ordering (inaudible) worked, and basically I wanted to show the process.
So now the problem is that it takes a huge amount of time if you want people to actually follow what is going on, because you can't move everything, and when you have a (inaudible) it is standard for -- people just want to skip it. And I'm afraid it could be the same for changing the attributes, though I didn't think of it. But maybe animation might be a good way of -- >> Question: (Inaudible) -- design, how long do they live in the data set? >> Nathalie Henry: Whhh. Very different. Starting all their life. So, well, that is for French historians, right. They're going in the streets and they take maybe several years for collecting data. So they actually use pen and paper for people. And then in the night they enter it in Excel. Right? So I mean they realize everything. So the first thing when you show them something is they want to see the data. Where is my data? Or what did you do with it? So they spend a very, very long time with it. >> Question: So that being the case, I would assume that these people would be spending some time working on a set of hypotheses or a set of investigations. >> Nathalie Henry: Yeah. >> Question: And then annotating their data, and then working on a different set of hypotheses and annotating their data. So the data gets richer over time, even if it is a static data set, through the -- >> Nathalie Henry: Yeah. >> Question: -- through the iterative evaluation. How do you persist that step -- the kind of hypotheses and labeling and grouping that the investigator is doing over these vast -- long or vast time scales? >> Nathalie Henry: So during the participatory design, in the last session, there actually came a very nice idea. So the problem is I worked three years on this. So I couldn't do anything. But the idea was you have something that shows you the history and also that can help you plan things and annotate. So we call it the history planner. But -- so basically currently what they do is, they have hypotheses.
So much of the time it's even before they collect the data, so they can ask the right questions. They could ask all the questions, but -- and what happens is that they test the hypotheses through statistics. So they don't really do exploratory analysis, so for most of them their hypothesis is like, I don't know, most of the time women collaborate more with men, and then they will test it statistically and it will say yes or no. And what happened at the beginning is that they said, well, it is okay, we can try everything. So that is what happened, right? They try all the hypotheses. Because the body of literature has, whatever, hundreds of hypotheses, so they will try everything. Oh, in my domain that works. But when you provide them with the visualization they see other things. So what happens is that they have hypotheses, but they are also very interested in seeing the data just to raise new questions. So basically you can't remove the current tools, because they need to test everything. But they are really interested in seeing the data in a new way, to try to find maybe new hypotheses. And once they get their new hypotheses then they will try to go to statistical measures and what else to do that. So that is the way they work currently. Also -- so as I said before, they really love their data, so they're just having faster -- it is nice to just talk about it together. So usually they're very solitary people, contrary to us. I mean, I'm used to collaborating with lots of people, but most of the time they're doing like -- I don't know, women in the 15th century in Britain, and then you have got the other one who is doing men in the 16th century in Paris. So they don't really talk to each other. And the fact that you have a visualization helps them also to say, oh look, you have this hypothesis and stuff like that. So it is also a source of communication.
That is why I like the story-telling part, when you have to explain what is going on with your network. And of course when they see something interactive like a pen-based tablet, they all interact. Even if that is not their data they all go and say, ah, look, I did that with mine and it worked. So it's a blend of techniques. Well, I'm talking mainly about historians, but there are many other people that have many other ways of working, so it's difficult to generalize. >> George Robertson: Thanks. >> Nathalie Henry: Thank you. (Applause)