>> George Robertson: It's my pleasure to introduce Nathalie Henry, who's a graduate student -- she's
getting a PhD jointly from Université Paris-Sud and INRIA and the University of Sydney. And she's going
to talk to us about her thesis work on exploring social networks with matrix-based representations.
Nathalie.
>> Nathalie Henry: Thank you for the introduction. So yeah, I'll try to explain my PhD work in 50 minutes,
so that's three years of work. It's about social networks and matrix-based representations. So basically my
background covers three different axes. First, graph theory: I work with Peter Eades at the University of
Sydney. Second, information visualization: my co-advisor in France is Jean-Daniel Fekete. And the application
domain is social network analysis: I work mainly with sociologists, historians and analysts from French
institutions.
So let's talk about social networks. What are they? A social network is a set of actors, which are
people, connected by relations. So like me and my mom: we're a social network, two
actors and one relation.
A simple example is Facebook. Here is my friend network. I'm in the center, the red circle
in this network, and I'm connected to all my friends on Facebook. So this is a social network. And we call
this representation a node link diagram. Basically you've got nodes, which are the circles, and you've
got links, which are the relations between the people.
So social networks are actually a very hot topic these days. You've got all the online communities that
are becoming very popular, such as Facebook, Flickr, Friendster. You've also got online collaboration
websites, such as Wikipedia, where users collaborate to create an encyclopedia, and SourceForge, where
people share software code.
And there is also a lot of work on scientific collaboration: how researchers collaborate with each other.
Basically a collaboration is when two researchers co-sign a paper. We have
researchers interested in this to get an overview of the field.
And there are lots of other social networks that people are desperately in need of visualizing. The web is
one -- sorry, the internet: people are connected through the internet. You have also got terrorist networks
and, more generally, communication networks. And in biology you have got disease transmission networks,
so how HIV is propagating, for example. Those are all examples of social networks.
My approach to analyzing them is to use information visualization, and exploratory data analysis more
particularly. I'll explain more about that just now. Let's take Anscombe's numbers: four
series of X and Y coordinates, so four series of points. If you compute statistics on them you actually get
the same statistics for all four series. For example, I take the average, the mean, of X and Y and I get the same
numbers. So from this point of view, statistics says all those four series are equivalent. But if I
transform them into a visual representation, say using basic scatter plots, representing each point by its X and
Y coordinates in space, then I can see very quickly that those four series are actually very different.
And this is the main purpose of exploratory data analysis: you answer a question you did not
know you had. You didn't know what you were looking for; you just wanted to get insight about those
numbers. And you want to look at the data from different perspectives. So it's not replacing statistics, it's
complementing it. You want to look at the raw data from different points of view to try to get insight from it.
So that's basically my approach.
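The four series mentioned above appear to be Anscombe's quartet. As a minimal sketch, assuming that identification is right, the following computes the means of X and Y for each of the four published series; the names `QUARTET`, `summarize` and `SUMMARIES` are illustrative only.

```python
from statistics import mean

# Anscombe's quartet (published values); series I-III share the same x values.
X_SHARED = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X_SHARED, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_SHARED, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_SHARED, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Return the means of x and y, rounded to two decimals."""
    return round(mean(xs), 2), round(mean(ys), 2)


SUMMARIES = {name: summarize(xs, ys) for name, (xs, ys) in QUARTET.items()}

for name, (mean_x, mean_y) in SUMMARIES.items():
    # Every series reports the same rounded means, although the raw points differ.
    print(f"series {name}: mean x = {mean_x}, mean y = {mean_y}")
```

The summary lines come out identical for all four series, which is the point of the example: only a plot of the raw points shows how different they are.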
Now let's talk about network visualization. Usually social scientists start with a node link diagram. As I
explained earlier, it is just nodes, like those squares here, connected by links. Here is for example a node link
diagram of the (indiscernible) research community. The largest component is about 100 researchers,
connected when they co-sign a paper.
So here you can actually see something. But if you go to larger node link diagrams, let's say
that one, which is four to seven actors, then it's not the same problem. You can't really read the names and
you can't really see who's connected to whom.
So after a lot of work filtering and assigning visual variables, most of the time you can achieve a very nice
representation such as that one. That's the same network as the previous one, except someone
manually placed the nodes in space and assigned colors and sizes. So here it is actually researchers of
(indiscernible). The size means how many papers they published and the color is how many
references they got. And what you can see very fast is that big, red node here. Who's that guy? It
is actually Ben Shneiderman. And then you've got the ones that are strongly connected:
the length of the links reflects the number of collaborations.
And you can just see George here; unfortunately you can't really read the name. And he's connected to Jock
Mackinlay and (inaudible). So basically what social scientists want to see in those networks is the
communities, so people densely connected such as this area here or here, and central actors, such as those
people who publish a lot or are cited a lot or both.
So this picture was obtained manually, right? People just moved the nodes, assigned visual colors and
size. So what do you do with that big one here? First on the nodes. You can do it manually, but then it
will take a huge amount of time. On one standard screen you can see 100 nodes. So imagine
four seven: you have to spend a lot of time, but you can still do it if it is only large. However, if it is dense,
which means that lots of links occur between nodes, then it's not even possible. You won't manage to get
something readable enough.
We have several solutions now. First solution: I remove data. So I sample. I take a random sample of my
network and I visualize it. Of course it is less data, so it might be easier to read. However, in
social networks what people want to see is central actors. So if I randomly remove one central actor, say I
remove Benjamin or George, then I have a completely misleading view of my network, so I don't want to
do that.
Another solution is filtering. In that case you also remove data, but according to a given criterion.
Let's say I remove researchers that have published only one paper. In that case you can still get a better
understanding of the network, though you are not looking at the complete data. So you might miss
something, and you need to explore to be sure you have not missed some important trait in the data.
So you alter the structure, which is a problem. Another solution would be to cluster: I take
several nodes that are, for example, densely connected and I just squash them into one single node. So it's
compacting the data. In that case the problem is (inaudible) information again. Let's say I have
MSR as one big node; then I don't see what is going on inside that node and how its people are connected
to the outside. So it's also a problem: you alter the data and the insight that you can get from it.
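To make the trade-off concrete, here is a minimal sketch of filtering and clustering on a toy edge list. Everything in it is hypothetical: the node names, the use of link count as the filtering criterion (the talk mentions paper count instead), and the helper names `filter_by_degree` and `collapse`.

```python
from collections import Counter

# Hypothetical co-authorship links; the names are placeholders.
EDGES = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E")]


def degrees(edges):
    """Count how many links each node takes part in."""
    counts = Counter()
    for u, v in edges:
        counts[u] += 1
        counts[v] += 1
    return counts


def filter_by_degree(edges, min_degree):
    """Filtering: keep only links whose two endpoints have at least min_degree links."""
    deg = degrees(edges)
    return [(u, v) for u, v in edges if deg[u] >= min_degree and deg[v] >= min_degree]


def collapse(edges, group, label):
    """Clustering: replace every member of `group` by a single aggregate node."""
    def rename(node):
        return label if node in group else node

    merged = {tuple(sorted((rename(u), rename(v)))) for u, v in edges}
    # Links inside the group become self-links and disappear from view.
    return sorted(edge for edge in merged if edge[0] != edge[1])


FILTERED = filter_by_degree(EDGES, 2)
CLUSTERED = collapse(EDGES, {"A", "B", "C"}, "ABC")
print(FILTERED)
print(CLUSTERED)
```

Filtering drops the weakly connected node E entirely, and clustering hides the three links inside the group, which is the loss of information described above.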
So the last solution is to look for an alternative representation, and for example you can use adjacency
matrices. What are they? Here is a simple example. You've got a node link diagram on the left, so
people are the orange nodes and they are linked by directed links. So we (inaudible) here and it means
basically that A is connected to C. So this graph or network
has (indiscernible) representations: the node link diagram and the adjacency matrix. It is very simple. A
matrix is like a table, and you have got the actors, or nodes, in both rows and columns: rows
being the sources of the edges or relations and columns being the targets. So for example if A is
connected to C, in the matrix I go to row A and column C and I place a mark in that cell of
the matrix.
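The row-is-source, column-is-target convention can be sketched in a few lines. This is a minimal illustration, not the speaker's implementation: only the link from A to C comes from the talk, and the other links and the helper name `adjacency_matrix` are assumptions.

```python
def adjacency_matrix(nodes, edges):
    """Build a 0/1 matrix where rows are link sources and columns are link targets."""
    index = {name: i for i, name in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for source, target in edges:
        matrix[index[source]][index[target]] = 1  # place a mark at (row source, column target)
    return matrix


NODES = ["A", "B", "C", "D"]
# A -> C is the link named in the talk; B -> C and C -> D are hypothetical.
DIRECTED_EDGES = [("A", "C"), ("B", "C"), ("C", "D")]
MATRIX = adjacency_matrix(NODES, DIRECTED_EDGES)

print("  " + " ".join(NODES))
for name, row in zip(NODES, MATRIX):
    print(name, " ".join(str(cell) for cell in row))
```

Because the links are directed, the mark in row A, column C has no counterpart in row C, column A.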
And now let's take a concrete example. This is a year of e-mail between my research collaborators in Australia.
So it's 400 people. And you take the aggregation of who sent an e-mail to whom over a year. What you can see
from this is that, oh, lots of people send e-mail to each other, almost everyone, except maybe those two
people here who only sent an e-mail to one guy.
So if I now represent that as a matrix, you will see here that gray means no connection, no e-mail,
and a black point means one e-mail. And if I reorder rows and columns according to an algorithm I describe
later, then you can see the same matrix giving you a completely different insight on the data. First you can see
there is a big amount of gray in that matrix. So compared to the node link diagram you say, no, it's not that
dense actually: in a year people don't send e-mail to each other all that much. You can see small groups
here, small black blocks. And if you drill down and read the names of the people you actually
realize that they are members of the same research group, and of course they send e-mail to each other. Over a
year they are all connected to each other.
Now what you can also see, and what is interesting, is that one group, which would be for
example these rows, sent e-mail to its own group and also maybe to another group. So you can also see trends and
collaboration between groups. The other thing that you might see is these very long horizontal and
vertical lines here, and what it is is actually one person or a group of people that (inaudible) a lot of other people. If
you drill down again and see who is sending e-mail to everyone, it is actually the financial service that
refunds people when they go on a mission somewhere. They send e-mails to lots of other teams.
So what you can see here, and my point, is that with a node link diagram you might get misleading
information and you can't really say much about it. If you transform it into a matrix representation you
get much more insight, and very quickly.
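A small sketch of why the order of rows and columns matters: the same matrix is rendered twice, once in an arbitrary order and once with group members placed next to each other. The six-node network with two interleaved groups is made up for illustration, and `reorder` and `render` are hypothetical helper names.

```python
def reorder(matrix, order):
    """Apply the same permutation to rows and columns."""
    return [[matrix[i][j] for j in order] for i in order]


def render(matrix):
    """Draw marks as '#' and empty cells as '.'."""
    return ["".join("#" if cell else "." for cell in row) for row in matrix]


# Hypothetical network: nodes 0, 2, 4 all e-mail each other, and so do nodes 1, 3, 5.
GROUPS = [{0, 2, 4}, {1, 3, 5}]
SIZE = 6
SCATTERED = [
    [1 if i != j and any(i in g and j in g for g in GROUPS) else 0 for j in range(SIZE)]
    for i in range(SIZE)
]
BLOCKED = reorder(SCATTERED, [0, 2, 4, 1, 3, 5])

print("\n".join(render(SCATTERED)))
print()
print("\n".join(render(BLOCKED)))
```

The first rendering looks like scattered dots; after the permutation the two groups show up as two blocks along the diagonal, with empty cells between them.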
So from that point my approach was to try to see the users, the social scientists, and work with them:
what are their problems and how can we solve them? So I used (inaudible) redesign with social scientists
from different French institutions. Basically it's four phases. Observation: I go to the place and see how
they work. What are the problems? What are the data sets?
What do they do every day? Then we come together in one or several workshops and try
brainstorming together. So what would be the perfect solution for you guys? What do you want? What
could be helping?
From this brainstorming session we get a set of ideas and then we prototype and evaluate with them. So I
actually ran two workshops with them. They are pretty busy people and experts, but I got several (inaudible)
feedback. So we went back and forth during the three years of this PhD. The main outcome was that they
use analysis tools with lots of menus, and basically they have lots of statistics. They wouldn't know which
method is the best one, so they would systematically try everything, all
menus, all parameters, because they needed to find something; they just didn't know what they wanted to
find. And what we also found out is that they draw pictures by hand because using the current visualization
tools is too difficult for them. Most of the time they draw node link diagrams,
small ones. But the interesting finding was that they also use matrices. So from
those two points, the potential of matrices and users that actually know what this representation is, I decided
to work on this and make it my PhD topic.
Yeah?
>> Question: Do they label their nodes with the links?
>> Nathalie Henry: Yeah. Basically the matrices they use are usually very small ones, because you can imagine it
takes a lot of time. They label them by name or identifier, and then usually in the matrix they put values. They
do that when they have got several (inaudible) links, so for example you have got three kinds of relations between
actors, or you can imagine the distance and stuff like that. They use it to think about how the different relations
compare to each other most of the time -- with my users, again.
So my approach was information visualization along three different axes. The first is perception. We don't really
know how people perceive matrices, so I wanted to work on this and understand, okay, how do they
perceive them, what kinds of features are important? Always keeping in mind that I'm working with social
scientists: what does it mean for them to visualize social networks with matrices?
The second point was exploration. So okay, I know how they perceive matrices; how do they interact with them to get
to the data? And the third part is the communication part. They find some things, theories or
findings about the network. So how do they communicate them to the world? How can we help with that?
And here is the itinerary of my PhD. I will tell you about each point later. So first I was interested in perception
and how people perceive matrices, and as I briefly said earlier you can permute the rows and columns of a
matrix, and the permutation gives you a different visual perception. So the idea was, okay, I'll try to
make matrices usable, because if you randomly order a matrix it is (indiscernible) unusable. You will
have dots everywhere and you won't be able to see anything. So that was my first step, trying to collect all
the ordering algorithms that exist and trying to assess them.
The second work that I did was trying to combine the matrix and the node link diagram. The
idea was that the matrix would be used for exploring, because you start with a large and dense network. Then
you manage to find whatever insight you need to find, you filter it, and then you communicate with a
node link diagram, because that's the most popular representation that they use to communicate
information. So I did MatrixExplorer. I will talk a bit more about it later.
Then I found out that there were actually problems when exploring matrices. So I fixed that by augmenting them
with MatLink. And the last work that I did in my PhD was to actually merge the matrix and the node link
diagram, providing a way of exploring and also a way of communicating. We'll talk about it later.
So how to make matrices usable. Jacques Bertin, a French guy who wrote Semiology of Graphics,
on semiology theories, showed us very interesting things about tables, numerical tables, very early, in 1967.
So it is a very simple table. You have got five different countries and five different kinds of
meat, and each cell represents how much of which meat is produced by which country. What he showed us is that if you
replace each value by a visual variable, so a small rectangle, and if you
permute rows and columns, you can actually find things very quickly in the data. You can find groups.
So here you have got A, B and C. They are very small groups, of course: A represents
two countries. But you can see those two countries have the same profile of production. So you can see
trends in the data. And you can also see that the B group, which is France and Italy, also has a common
trend of production, but opposite to group A. And you can see that very quickly. You can also see
the outlier, which is Belgium, which doesn't really have a production profile.
And from this very simple example you can go further. Okay, let's
say there is a law to be voted against the production of, let's say, horses here. Then France and Italy won't
really agree with that law. So should they go to (inaudible), the Netherlands or Belgium to try to convince
them to vote with them against that law? Basically they should go to Belgium, because (inaudible) and the
Netherlands don't really care about it. They're more interested in the first kind of meat, whereas Belgium
can be easy to convince because it doesn't really have a production profile.
So it's a very simple example. It's a small table. You can transform it visually, and if you order it then you can
get real information about the data. And it is pretty fast when you know the domain and the data.
So from this I tried to look at what kinds of ordering methods there are in the world. And there are
(indiscernible) methods to permute rows and columns, because it is related to lots of problems. So I'll just
divide them into two categories. First, you have got table-based ordering methods. Principally they come from the
field of biology. You have got a gene expression table, with genes in rows
and different experimental conditions in columns. And you want to reorder rows and columns so
you can see genes that are expressed the same way in the same conditions.
And of course there are many ways to compute those permutations of rows and columns.
The second category is related to graph linearization. The idea is you have got a
graph and you want to put all the nodes on one single line, and you want to optimize some function,
say, minimizing the number of crossings. There are also lots of methods for that. And what I did
is I tried to mix both: I take a graph, transform it into a table and then apply ordering
methods to it. I won't explain it because it is a bit complicated, but I can talk about it later.
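As one illustration of the table-based family, here is a minimal sketch that treats each row of an adjacency matrix as a profile and greedily chains similar rows together. This is a generic nearest-neighbor heuristic chosen for brevity, not the speaker's algorithm; the toy network and the names `hamming` and `greedy_order` are assumptions.

```python
def hamming(row_a, row_b):
    """Number of columns in which two rows differ."""
    return sum(a != b for a, b in zip(row_a, row_b))


def greedy_order(matrix):
    """Start at row 0, then repeatedly append the unplaced row closest to the last one placed."""
    remaining = list(range(1, len(matrix)))
    order = [0]
    while remaining:
        last_row = matrix[order[-1]]
        closest = min(remaining, key=lambda i: hamming(last_row, matrix[i]))
        order.append(closest)
        remaining.remove(closest)
    return order


# Hypothetical graph turned into a table: nodes 0, 2, 4 form one group and 1, 3, 5 another.
TABLE = [
    [0, 0, 1, 0, 1, 0],
    [0, 0, 0, 1, 0, 1],
    [1, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 1],
    [1, 0, 1, 0, 0, 0],
    [0, 1, 0, 1, 0, 0],
]
ORDER = greedy_order(TABLE)
print(ORDER)
```

On this toy input the heuristic places the members of each group next to each other. Real ordering methods optimize a global measure and cope with noise far better than this greedy chain does.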
So from that the question is, okay, we have got lots of methods. What is a good ordering for social
networks and for social scientists in particular? So I went through sessions with social scientists. I showed them
a node link diagram and a matrix: okay, we reorder it, and tell me what is good and what is bad. And I came up with
this representation. Basically what social scientists want to see is those large crosses here, which are
central actors. In a node link diagram you would see it as a person connected to several
communities or having many links. And then they want to see groups, which are blocks in the matrix and
densely connected communities in the node link diagram. So they want cliques: for example here you can see they
are all connected, except on the diagonal, which would be a node connected to itself. And here again you see the
block is not complete, so it is a community and not a clique: there are some missing connections here.
But this was a manual ordering, right? I went to see them and I asked them. So the question from
there is: what is a good automatic algorithm? How can I choose my automatic algorithm to fit their needs?
All the algorithms are designed from formal measures. Like, I try to minimize the distance between rows
or between columns, or I try to minimize the crossings. But the question is, can we characterize those
orderings according to the visual features they produce? Because I don't care about the formal measures. I
want to see the communities. I want to see the crosses and stuff. So my approach on this was to perform
empirical studies with users and try to understand how they perceive groups and how they perceive different
orderings.
So I ran several experiments with users on digital pen and paper. Here you have got the
master's grades at Universite of Paris: in rows you have all the students and in columns you have got the courses
they chose for their Master's. And I reordered them with different algorithms to try to
characterize what visual features would appear. Then I asked people, mainly students, to draw on
the visual matrix the groups they saw and try to name them. So they had to perform very long experiments,
trying to figure out not only the visual groups but also to give them meaningful names.
So from maybe 30 people performing on several data sets I took all the results and
superimposed them on top of each other to see if there were common groups. Here what you can
see is that there are no common groups, actually. The whiter it is, the more often a group has been chosen
by people. But when it's ordered you can actually see there is some consensus. For example, those
groups here have been chosen by almost all the people. And if you look at the names you can see that
those ones circled in red have the same name: that's students that chose cognitive
psychology courses. And most users, maybe 80% of them, were able to find this group and
give it a meaningful name.
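The superimposition step can be sketched as a simple count per matrix cell. This is a minimal illustration under stated assumptions: each participant's drawn group is approximated by an inclusive rectangle `(first_row, last_row, first_col, last_col)`, and the three participants below are invented, not data from the study.

```python
def overlay(selections, n_rows, n_cols):
    """Count, for every cell, how many participants included it in one of their groups."""
    counts = [[0] * n_cols for _ in range(n_rows)]
    for rectangles in selections:  # one entry per participant
        covered = set()
        for first_row, last_row, first_col, last_col in rectangles:
            for r in range(first_row, last_row + 1):
                for c in range(first_col, last_col + 1):
                    covered.add((r, c))
        for r, c in covered:  # a participant counts at most once per cell
            counts[r][c] += 1
    return counts


# Hypothetical selections from three participants on a 4 x 3 matrix.
SELECTIONS = [
    [(0, 1, 0, 1)],
    [(0, 1, 0, 1)],
    [(0, 2, 0, 1)],
]
COUNTS = overlay(SELECTIONS, 4, 3)
for row in COUNTS:
    print(row)
```

Cells with a count close to the number of participants correspond to the bright, high-consensus regions of the overlay; cells with low counts are groups only a few people drew.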
So from this experiment the question is, is there a good algorithm, a good automatic one, to find groups or
for a particular type of data? And what we found out is that with different algorithms, so these are different
algorithms, always on the same data set with students, people find the same group. Here it's split
in three and here it's a single group, and they are able to identify it as cognitive science students. So the
question is actually more complicated than finding one group, because maybe finding a group that is split in
three is good, because people can then identify another group. So the question of whether there is a good
algorithm is very hard to answer. Yep?
>> Question: Just to be clear. Your algorithm, the labeling that is happening from the people that you're
(inaudible) --
>> Nathalie Henry: Yes.
>> Question: So they're just saying this is a group (inaudible) --
>> Nathalie Henry: Yep. Yep. Exactly, and then you code it, because of course some will say psychology
and some others will say cognitive science. So you have to code it and say, okay, I consider it's the same
notion. But yeah, they give them (inaudible). Or you have also got people that call them group one, group two,
group three. But then you try to avoid those ones.
And so there is not one good algorithm, even mine. So what can you do about it? First, it saves a huge
amount of time to start from somewhere, because if you look at people that started with random or
alphabetical ordering, they couldn't do anything basically: they spent twice the
time trying to figure out what groups there were, and they couldn't find anything. Whereas with the
different ordering algorithms they found different groups, but at least they found
something, and they were sometimes quicker with one ordering than with another.
And what we found out is that analysts actually need several orderings to find a consensus in the data,
because sometimes a group might appear with one ordering, but it's just an artifact and it will be split
in others. And then you need more than readability, you need interpretation, which is very
dependent on the data, and you can't do anything about that basically. So the human has to go in and
decide.
So I focused on two things. First, helping people reorder, because most of the time they start from somewhere, but
they're not really happy with the ordering and they would like to try other things. So I developed a set of
interactions to move groups and interact with the matrix in a more iterative way than just
sorting with menu items or stuff like that. So a more direct manipulation kind of interaction. And I also focused on
interaction to find a consensus in the data, to try to get at the truth in the data.
So just a simple example. There are many features; I'll just show that one because I think it's an
important and simple one. Basically you give people the possibility to cluster, to mark groups in a
matrix. With interactive clustering you just use lasso selection and they decide on the group. And
then if you offer the opportunity to reorder, it's a first step towards finding
consensus in the data. For example you can see that those cognitive students, the purple ones: if I
reorder with a second, different ordering they will be split in three. So you can wonder why, and you can
drill down and maybe decide if it's a group or not. And you can see for example that the pink ones stay
grouped, except maybe those ones here. So it's a first step towards trying to find what is really in the
data and what is not. And the message here is: try to give several perspectives on the data.
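One simple way to quantify whether a marked group "stays grouped" under another ordering is to count how many contiguous runs its members form. The sketch below is an illustration of that idea only; the student identifiers, the two orderings and the helper name `fragments` are hypothetical.

```python
def fragments(order, group):
    """Count the contiguous runs that members of `group` form in `order`."""
    runs = 0
    inside_run = False
    for item in order:
        if item in group:
            if not inside_run:
                runs += 1  # a new run of group members starts here
            inside_run = True
        else:
            inside_run = False
    return runs


# Hypothetical group marked by a user with a lasso selection.
MARKED_GROUP = {"s1", "s2", "s3"}
ORDERING_A = ["s1", "s2", "s3", "s4", "s5", "s6"]
ORDERING_B = ["s1", "s4", "s2", "s5", "s3", "s6"]

print(fragments(ORDERING_A, MARKED_GROUP))
print(fragments(ORDERING_B, MARKED_GROUP))
```

A group that forms a single run in every ordering is a strong candidate for a real group; one that splits into several runs in some orderings deserves a closer look before deciding.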
So from this ordering and interaction on matrices, I combined the best of both worlds, basically, with MatrixExplorer.
I decided to give my users a matrix and a node link diagram, and of course I concentrated on the matrix
representation, giving lots of interaction with it. Just a simple example showing that once you have
recognized groups in one representation you can interact simply and see how they are in the other
representation. Very simply, the person selects a group in the matrix and sees the related data in the node
link diagram.
So it's very simple. And people usually understand it very easily because they're used to the
representation. And of course you can synchronize both representations by colors and all kinds of visual
variables.
So just to summarize, I provided correlated views and interaction to explore, and my assumption was that people
would use the matrix to explore and the node link to communicate. But actually what we found out is that people use node
link for certain tasks and matrix for others, and usually they switch back and forth. Most of the time, I
mean, it's better that they get two screens, one with the matrix and one with the node link. And it is more
comfortable when you have three, actually. And what I figured out is that switching from one to the other,
switching from one representation, which is a cell in a matrix, to a link in the node link diagram, is
very cognitively demanding. If you switch back and forth, you're really tired at the end of
the day. So that was the main finding: it's very hard to switch back and forth. However, they used
both representations to be able to figure out if there was a group or not. They used different layouts and
orderings, so basically they would use all those perspectives to find out if a group really existed. So
it means that we really needed both.
So I tried to explore why they would use matrix or node link, and when. Social networks can be sparse,
such as general (inaudible). So it's very sparse. Sometimes you can have loops, but better not talk about
it. It's like (indiscernible) marrying someone or something. When they are sparse, people use the node link
diagram, because the matrix is mainly empty so you would just spend your time scrolling. And the node
link diagram doesn't have lots of crossings, so you don't really need to go to the matrix. In that case they
use only the node link diagram.
(Inaudible), they use mainly the matrix. They try the node link, they wait for ages, they
don't think it is okay, and they work mainly with the matrix. But they still switch back and forth: when
exploring large and dense networks, they switch back to the node link diagram. And this actually exposes
one of the main weaknesses of matrices, which is path-related tasks. This is very simple; I'll explain just now.
So in social networks of course your focus is connectivity: how people are connected. For example, if
you want to see how B is connected to D, you will see the path goes through C. So C is potentially
interesting for you. And people actually perform lots of those tasks: how people are connected, who
is on the path between two people.
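A path task on an adjacency matrix amounts to repeatedly reading a row, picking a marked column, and jumping to the row with the same label. The sketch below does this with a standard breadth-first search; it is an illustration only, with a hypothetical four-node network in which B reaches D through C, and `shortest_path` is an assumed helper name.

```python
from collections import deque


def shortest_path(matrix, labels, source, target):
    """Breadth-first search over a matrix whose rows are sources and columns are targets."""
    index = {name: i for i, name in enumerate(labels)}
    start, goal = index[source], index[target]
    previous = {start: None}
    queue = deque([start])
    while queue:
        row = queue.popleft()
        if row == goal:
            path = []
            while row is not None:  # walk back through the recorded predecessors
                path.append(labels[row])
                row = previous[row]
            return path[::-1]
        for col, mark in enumerate(matrix[row]):
            if mark and col not in previous:
                previous[col] = row  # the target column becomes the next source row
                queue.append(col)
    return None


LABELS = ["A", "B", "C", "D"]
# Hypothetical directed links: A -> C, B -> C, C -> D.
LINKS = [
    [0, 0, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
PATH_B_TO_D = shortest_path(LINKS, LABELS, "B", "D")
print(PATH_B_TO_D)
```

The number of intermediate people is `len(path) - 2`. A program does the row-to-column jumps instantly, whereas a reader has to scan and scroll for each one, which is why these tasks are tiring on a large matrix.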
>> Question: (Inaudible).
>> Nathalie Henry: Yeah.
>> Question: I'd like to know when you're doing most of these it looks like there is no direction to your links.
>> Nathalie Henry: Yeah.
>> Question: All of your matrixes look like they are symmetric.
>> Nathalie Henry: Yeah, they are.
>> Question: And -- but in the case of like e-mail that have --
>> Nathalie Henry: Yes.
>> Question: -- the directed links, right? So I'm just curious if your exploration here is mostly symmetric
connections or are you also looking at directed connections.
>> Nathalie Henry: No. So I could do it. The problem is that almost all the algorithms are designed for
undirected graphs.
It's much more complicated for directed graphs because the matrix is asymmetric. And
so the distance between rows is different from the distance between columns, and they might be linked: if
you move two rows then you might change the distance between columns. So it's much more complicated
than that. So you're right, I didn't handle the directed version. You can visualize it, but you can't really reorder it,
so forget it.
>> Question: Okay. It just -- it occurred to me that especially in a matrix visualization that would be really
interesting.
>> Nathalie Henry: It is.
>> Question: You can pick up on the asymmetry.
>> Nathalie Henry: Yeah. It's a big area of research, trying to figure out how people perceive it. So
very interesting. Because groups are not groups like blocks, so -- but you know I didn't investigate that.
(Inaudible).
So let's go back to paths. Here I always show a directed one because it is easier to explain. So here -- okay,
the good thing about finding a path in the matrix is that it's always possible. Because if you look at this,
how can you find a path now? When it's too dense, too large, then it's kind of madness. Of course you
can add highlighting or filtering. But it's very difficult to perform those tasks.
However, the problem with matrices is that it has (indiscernible). So I'll show you a very small example, the same
one. The rows are the sources of the links and the columns are the targets. Let's go from B to C:
I see one link is here. Then, C was my target, so I have to go back to C as a source node.
So basically I have to go back to the rows. And then from there I can go to D. And you see that
it's much more complicated when the matrix is large, because you have to scroll down, and imagine
you have got more than one intermediate: then you have to figure it out by scanning the full matrix basically.
So it's almost impossible. I mean, you could do it, but then you will become mad pretty soon. So the idea
of my third PhD contribution is to augment matrices so people don't have to switch back to the node link
diagram, especially for those tasks. And the idea is very simple again. You have all the links
displayed as static linear links on the sides of the matrix, and in addition you add interaction.
So on mouse-over you can see -- oops, sorry. Going to stop it. So on (inaudible) you can actually
see all the links departing from one node, so you have an idea of the connectivity. If you click on one, now
you can see the shortest path. Up here you can have an idea of how they are connected. So here the
hop count is like three: they are connected through three intermediate people. And you can read quickly
who the connecting people are. And you can go on: you can select another one and show multiple paths.
So the idea is to have static links and interactive links, and the interesting thing about this is that the static linear
links give you a feeling of what's in the matrix. For example, here you can see this light lattice kind of
pattern, and it's actually a community. Of course when you are used to it you see it much faster.
It means that down there, where the matrix is hidden, you can see that there is
a community. And for large matrices it is even cooler because you can see those links here leaving your
viewport, and you can feel like, oh, there are people that I could be interested in outside of the viewport.
So it gives you things that a traditional matrix doesn't: you find out that you are potentially interested
in someone outside the viewport, and you can see here for example that there is a community down there,
so you might want to explore it. So you have a feel of the structure of the graph.
Now the problem is when you have got one person here and one person here and you display the shortest
path, one intermediate guy might be outside of the viewport. So the good thing is you know it, actually it's
very simple. You look at that. There is someone important you should be interested in. But now if, for
example, this matrix is 2,000 people, imagine that guy is 2,000 rows this way on the right. Then you have
to scroll a huge amount of time, and if you have to do that several times you become completely lost in
the matrix and it takes time and it's very frustrating.
So we designed, with my (inaudible) and (inaudible), a method to fold the space. So the idea is to fold the
matrix as a piece of paper. I'll show you a quick video of that. So you can see more than one point of focus. So
basically it's very simple. You just scroll down, but instead of losing the first focus point you see the second
one. And then if you want you can adjust the zoom level of one part or the other, if you want to be able to
see, let's say, the communities. And we proved that it works for (inaudible) social network tasks.
So let's go back to that question now. People, do they use matrix or node link now? So the good thing is
for sparse networks they use node link mainly, so you don't change anything. But for large (inaudible) now
they use mainly matrixes. So I say that with a small group of people, but I actually witnessed that they
would switch less often to node-link diagrams. Now the bad news is there is an intermediate category of
networks that are called small-world networks. Those ones are globally sparse, so basically they have got
few links. Let's use node link then. No. Bad luck. They are also locally dense. It means they have dense
communities that are tightly connected, and those are connected together by few links. So should we
use matrix then?
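To make "globally sparse but locally dense" concrete, here is a small editorial sketch. It assumes a symmetric 0/1 adjacency matrix and uses two standard measures, overall link density and the local clustering coefficient; the helper names and the toy network (four fully connected groups of four, chained by single links) are hypothetical and not taken from the talk's data.

```python
def density(matrix):
    """Fraction of possible links that exist (undirected, no self-links)."""
    n = len(matrix)
    links = sum(matrix[i][j] for i in range(n) for j in range(i + 1, n))
    return links / (n * (n - 1) / 2)


def local_clustering(matrix, node):
    """Fraction of a node's neighbor pairs that are themselves linked."""
    neighbors = [j for j, linked in enumerate(matrix[node]) if linked and j != node]
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(matrix[a][b] for i, a in enumerate(neighbors) for b in neighbors[i + 1:])
    return links / (k * (k - 1) / 2)


def chained_groups(groups=4, size=4):
    """Hypothetical toy network: dense groups joined by one link each."""
    n = groups * size
    m = [[0] * n for _ in range(n)]
    for g in range(groups):
        members = range(g * size, (g + 1) * size)
        for a in members:
            for b in members:
                if a != b:
                    m[a][b] = 1            # everyone linked inside a group
        if g > 0:
            a, b = g * size - 1, g * size  # single bridge to the previous group
            m[a][b] = m[b][a] = 1
    return m


toy = chained_groups()
global_density = density(toy)              # low: few links overall
inside_group = local_clustering(toy, 0)    # high: neighbors all know each other
```

On this toy network the overall density is well below the clustering inside a group, which is the tension described above: node link suits the sparse global structure, matrices suit the dense parts.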
Well, unfortunately this is a very common category of social networks. So let's see what people do
currently. So one of what we consider the best representations at this point is this one. So it's a node-link
diagram. And you can see here like blue dense connected parts that are communities, and between them
you can see like pictures here that are actually (inaudible). So the good thing is you can see that if I
remove this central actor those two communities are disconnected from the graph. So you can have
structural information. However, you don't know what is happening inside the communities. Is there a
missing link here? Can't see it, so you have to zoom probably.
Can you see what is happening in the communities? Are all the people in the community connected to the
central actor? You can't see it either. So our solution for that is okay, let's merge both representations. So
as node link is good for sparse parts, let's use node link. But for dense parts let's use matrixes because they are
better for dense parts.
And so here comes NodeTrix. You select a dense part of the graph and you transform it into a matrix. Oops. I
did it again. Whoa.
So what you see here is actually there is a small animation to go smoothly from node-link diagram to matrix,
and we actually found out that it helps people understand what is going on. It is (inaudible) nice idea. And
the good thing with this representation is you can interact with it; it is very interactive. So it is very easy to
drag and drop and remove people from the matrix. So I'll show you the point of it. So basically it is
like using a pen. Right? And you just draw the communities and you can see them. Oops.
So you can circle with the pen and then transform them into a matrix. So what you can see here is you have
one matrix which is a clique, except the diagonal. So everyone is connected to everyone. And the other one
actually has some missing links. So the blue means link and the white means no link. And so you see there is
a cross in the matrix which is almost empty. It simply means that the guy in the matrix is not connected to
anyone, but he was placed close by the group so you could assume that he's part of the group, but he's
not. So here again you really quickly see it. And simply by dragging and dropping out of the matrix you can
adjust the community.
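The two patterns just described, a full block for a clique and an almost empty cross for someone who does not belong, can be checked mechanically. The following is an editorial sketch under the assumption that each community is a small symmetric 0/1 matrix with a list of labels; the function names and the four-person example are hypothetical.

```python
def is_clique(matrix):
    """True when every pair is linked (the diagonal is ignored)."""
    n = len(matrix)
    return all(matrix[i][j] for i in range(n) for j in range(n) if i != j)


def weakly_attached(matrix, labels, threshold=0.0):
    """Return labels whose row is (almost) empty: the 'cross' pattern."""
    n = len(matrix)
    if n < 2:
        return []
    result = []
    for i, name in enumerate(labels):
        links = sum(1 for j in range(n) if j != i and matrix[i][j])
        # Share of the other group members this person is linked to.
        if links / (n - 1) <= threshold:
            result.append(name)
    return result


# Hypothetical group: A, B, C all know each other; D was drawn nearby
# but has no link to anyone in the group.
labels = ["A", "B", "C", "D"]
group = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
]
outsiders = weakly_attached(group, labels)
```

Dropping the reported person from the group corresponds to dragging them out of the matrix; the remaining three-person block is then a clique.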
So it's very interactive. So from that we tried to explore larger graphs. So of course when you have got a very
large network you don't want to circle all the groups one by one because it takes time. You can use
an automatic algorithm or you can start from the matrix. So here is an example where I start from the matrix and I
select the communities I'm interested in, and I want to see how they're connected to each other. So I just drag
them into a matrix representation.
So I just select a group of those and drag it into the matrix version. So you are filtering the graph at the
same time. So in that case what is interesting is that you've got two communities that are obstructed
from -- that are very close to each other in the matrix and you could imagine they were connected, but
when you drag them you see there is no connection. So that is when you can (inaudible) it also the matrix
visualization.
So if you do that again then you can see the links, and the good thing is you can use it for communicating.
So of course you can use it for exploring because you drag and drop and edit your matrixes. But you
can also use it to communicate your findings. Simply, once you have found out what the communities
are, you just remove the names, and the beauty of this is that you can minimize the matrixes so they
become very small. But you can still see the pattern of connection inside the matrix.
Yeah?
>> Question: (Inaudible) labeling --
>> Nathalie Henry: Yep.
>> Question: So for each little matrix there is --
>> Nathalie Henry: Yeah, there is four.
>> Question: Four -- does that mean that each name is repeated four times?
>> Nathalie Henry: Yes. So why I did that is because I (inaudible) and usually there's people around so
they can see the names from all around. The two are, even one is enough when you know how to read it.
But usually two are enough.
And so from this representation you can represent the (inaudible) research community. Could be that one
here for example. So it is nodes and it takes it into very small space. So you can communicate easy also
to do that.
So from this I have the problem of central actors. Okay. So I have got central actors, such as me, who's
collaborating with people from Paris, but also people from Sydney. So where do I -- where should I be
placed? In which community should I be placed, in Paris or in Sydney or between them or maybe in
overlapping communities? So you can imagine if I'm coming here I will have three communities, so we
have three overlapping communities. So imagine with many more central actors in many communities, and
you can't see anything. Or should I have a hierarchy of communities? Then it becomes pretty hard to read.
So the idea I got with Anastasia (inaudible) from University of Sydney: why not duplicate those people? So
I should be both in Paris and in Sydney. I'm duplicated. And now the question is what are the user --
what is the user's understanding of this? Is it misleading? What happened? Is it good? Is it bad?
Basically.
So I won't go through this experiment because it is a huge one. But mainly we thought that duplication will
give you a more accurate community view. Example. So this is the same network and you see central actors
are defined as people connected to everyone in both communities, and you have yellow. One yellow here
and three blue ones here. And if you choose to -- so they're put in one community, whatever, you just
pick a random one. If you change their community, then you see a very different visual feeling of the network.
For example, here you can say that the two communities are kind of equivalent in size. But now if you
change it then you will say oh, this one is very small.
So if you duplicate people, which is symbolized by gray links here, then you can imagine you have a better
view of the communities. Now the question is does it impact the understanding of the network? Am I
feeling that there are fewer connections between the communities or not? And so the result is that you
minimize the misleading effects by visualizing the link duplicate -- duplicates -- the link between the
duplicates. Sorry. And by offering interaction. So users create their own duplication and you offer the
visualization, and you minimize the misleading effects and actually help performing some of the tasks.
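As an editorial sketch of the duplication idea, the snippet below places a copy of a central actor in every community where that actor has a collaborator, and records links between the copies (the gray links mentioned above). This is one possible reading under stated assumptions, not the procedure used in the study: the data layout (a dict of community name to member set, plus an edge list), the function name `duplicate_actor`, the `name@community` labels, and the example actors are all hypothetical.

```python
def duplicate_actor(communities, edges, actor):
    """Return (new communities, links between the duplicates of `actor`)."""
    neighbors = {b for a, b in edges if a == actor} | {a for a, b in edges if b == actor}
    new_communities = {}
    copies = []
    for name, members in communities.items():
        members = set(members) - {actor}     # remove the original placement
        if members & neighbors:              # actor collaborates with this group
            copy = f"{actor}@{name}"
            members.add(copy)
            copies.append(copy)
        new_communities[name] = members
    # Chain the copies so a reader can see they are the same person.
    duplicate_links = [(copies[i], copies[i + 1]) for i in range(len(copies) - 1)]
    return new_communities, duplicate_links


# Hypothetical example: actor "N" works with P1 in Paris and S1 in Sydney,
# but was arbitrarily placed in Paris only.
communities = {"Paris": {"N", "P1"}, "Sydney": {"S1"}}
edges = [("N", "P1"), ("N", "S1")]
new_communities, duplicate_links = duplicate_actor(communities, edges, "N")
```

Showing the link between the copies is the part reported above as reducing the misleading impression that the two communities are less connected than they are.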
So that was the last part of my work. So this is four different stages, and now at the end of my PhD I
wonder what about the evaluation? How can I evaluate all of those things? So first I did some (inaudible).
So it's mostly informal and it's during the whole process. So you could say it's informing the design, so it's a kind of
validation already. Then you can go further and say okay, I'm doing controlled experiments. So the problem
with controlled experiments is that it is for a specific task, with a specific data set, with specific representations. So
basically you have got very specific results. So you can prove that one representation is better than the
other one, but it is very limited.
So I decided to run a case study. More realistic settings, real data sets, trying to find -- trying to see if we
can find something about the data. Real findings. And also a longitudinal study, but it's not included in my
PhD.
So I ran a case study over five months and -- actually longer than that, but five actual months working on it.
And the data set is how researchers collaborate in four conferences over 20 years of collaboration, so it's a pretty big
network. It's got 27 thousand actors, which are researchers actually. And you have got more than hundreds
of (indiscernible) relations. And I used mainly MatrixExplorer and then later, because this was in
wrong direction, later I used MatLink and NodeTrix. And actually I ran into a problem I didn't think about at the
beginning, which is how to represent such findings. So I thought okay, we can filter the graph and present
a node-link diagram. A filtered one. But how do you represent an overview of the field? So I did some manual
presentation, so basically you have this huge matrix here, which is like 30 thousand by 30 thousand people,
and you annotate it manually. And you show some of the groups and you can have some closer views
if you want details. And from here I actually designed NodeTrix. So the idea came from here.
You will have the communities and see how they are connected.
So I designed this representation where you can have a very fast overview and some details in
it. So here you've got the matrix overview with small matrixes without the names of the people, just the name
of the community. And here is the closer view where you have all the details. And I just had fun doing
other stuff. A matrix which would have some magic lenses where you would see the inside of the matrix as a
NodeTrix. So lots of merging things.
So basically what have I done? So I looked at the problem from three different perspectives; well, the third
one, communication, came a bit later. But so the idea was to work on -- and realize what was perception.
How do you perceive matrixes? Exploration:
how do you explore them? How do you communicate? So now what do I want to do next? So there are very
interesting questions about the evolution of the network, how they evolve over time. How groups are
becoming groups. Do they split? Do they become -- what is the history of one community? And this of
course is linked to perception and also to exploration because now you have got much larger data sets
and so you have to, you need interaction techniques to do that. And also communication is now very hard
because a static picture is not enough. Maybe you need a sequence of pictures or you need a movie or
video. So that is one of my focuses.
So my goal in research is to stay in the flow. So I won't try to pronounce the name of the guy who came
up with this, but the idea is very simple. So in research you have -- well, in life in general you have
challenges, and if you have got too much it is very hard. You are very scared that you can't manage it.
And there are things you know, your skills. But if you only do what you know then it's very boring. So the
idea is to stay between both. You want a bit of challenge and a bit of things that you know how to do. And
to do that my plan is to have a larger perspective. Let's say doing time-related data (indiscernible) and
understanding evolution of networks. That is, while keeping small projects where I know I can actually do
something about it.
So the first step that I started, because I finished my PhD a few months ago, is to try to see the evolution of
social networks. One project I'm involved in is the visualization of how people collaborate in (inaudible), so
my French research institute. So this is a very old one, and they created teams at the beginning, and now
what happens is that people collaborate all over the place and sometimes not even in their own team. So they
want to create a new architecture and new teams. And so I'm helping on this project.
And also another one on Wikipedia where people want to be able -- want to see what is going on, and they
supervise who is working on what page and how that thing evolves. And other types of data, so there
are people from Institute by (inaudible) Paris that are very interested in using one of the tools that I created
because they have got huge networks and they don't know what to do with them, basically. It's lots of data and they
can't see anything, so they are very interested in matrixes to do that.
And more generally there is this project with MSR on reactivity and how people, how can we show user
activity? Can we reflect the activity to users? As a special case I'm really interested in supporting longitudinal
and qualitative studies, particularly because they are very important in (inaudible). We don't really know -- so as
I said at the very beginning, we want to answer questions we didn't know we had. So
how would you evaluate that? So basically the idea is okay, we give the tool to users with real data. And we study
how they use it and try to get the benefit of this. But there are many problems. To do that you need
to log the data, and you need to know what you want to log, how to log it, and there is this new -- this very
potential, this very creative way, but how do you visualize this data? You need to analyze the data collected
to see how people use the representation and the visualization, but you could also create representations to
reflect it to users, to help them explore, to help them communicate on their exploration process. So there is also all
that part that is very interesting.
So to conclude very fast. My research approach is involving users. You have to talk to users before
designing stuff for them. It is better design. Several perspectives you have to combine: data mining,
statistics, visual analysis. And research directions would be time-related data and evaluation in general.
And here is small network. In 1976, I'm here in this matrix here and you can see there is more than 8,000
people. And you can see -- so those are small matrixes of different colors are actually the teams. They
can see here that you have got problems. Here you can see that (inaudible) at all in 2006. (Inaudible) ->> Question: (Inaudible).
>> Nathalie Henry: Here is actually distributed network people.
>> Question: (Inaudible).
>> Nathalie Henry: No, no, no. They are collaborating. The -- so the problem is they are collaborating
very strongly outside.
So these other people are not in their own team. For example, I'm in the information visualization team.
So research group. And this team is with software engineering. So we've got a tiny connection, but
we collaborate more with biologists actually. So what happens is we're in the same, we've got the same
(inaudible) and projects and stuff like that, same resources. But we don't really collaborate with
each other, so the idea is to create new research teams that make sense. So basically, for example, those two
ones collaborate quite a lot, but they might be very far away from each other or it might be more difficult
to exchange people from one team to the other one. So the idea is to try to have new visualization, new
communities, new research organizations.
Also the thing is it's going very fast, so the research director had no idea of what the current themes are. So
he's got a set of themes like HCI, but he doesn't really know what people are working on, so he's looking
also for -- trying to find clusters or how people collaborate. For example, we're very close with data mining
people now. So he would like to be aware of that, for example, just to interact with the rest of the world.
Yeah. Well thanks for your attention. If you have got questions. (Applause)
>> Question: (Inaudible) --
>> Nathalie Henry: Very good question. So actually I work on the multi-level matrix representation, but it's
very hard. So okay the -- so the question is already for like a few thousand nodes it's hard and we don't
really have a solution for that. So even with the matrix, it's very difficult because even if you have interaction
techniques you are dealing with a huge amount of data. Now you can see those multi-scale representations that I
tried. But it's always the same problem, right? You are losing information because you are aggregating
somehow. And also it takes a lot of processing time. So for example the data of biologists, you can't load
them in the (inaudible). So you have to have lots of infrastructure, architecture, hardware to be (inaudible).
So yeah, it's still a very hard problem. And that is what I'm going to run into when I work with time-related
data.
The question is to try to scale a bit more, and then we'll see later the terabytes of data that biologists have. I
think it is also a completely different kind of problem because maybe, so what happens when you have got a
very, very huge amount of data, people work on different parts of the data, so it would be more maybe
collaborative than it is. And the rest of the time they use data mining. I think you can't really visualize this huge
amount, so you have to have other stages. This is different. And this is a very small network actually. But it's
already a challenge right now. So yeah?
>> Question: (Inaudible) -- of adding additional dimensions to matrix? I mean --
>> Nathalie Henry: Uh-huh. Yeah. So actually we got the idea a few weeks ago. So with a collaborator in
Canada. He is at Calgary. He's working on this idea of manipulating 3D shallow depth on a table, so the
idea basically is to have cubes instead of a simple matrix, and then you have to think about interaction with it.
But yeah. For the moment I must say that the problem of the matrix, so this representation here, is that you
have already so many visual variables that you can't use more of them because you have the links, you have the inside, you
have the labels. So you can't use them anyway. So if you have a third dimension I just wonder how it will
be effective, and so the idea and one of the paths is to be able just to interact with one or a subset of
them, then they become actual cubes that you can manipulate. And basically already the matrix is very interactive,
so you just grab one. It becomes a cube and so you can turn it so you can see the different dimensions.
But there is work on the perception of this and how it can be misleading and what kind of data you want to
represent. And so another suggestion is to use Mélange, which is the folding thing. But instead of folding
you imagine that you are actually cutting through different layers and here you could visualize other data,
but I think it is also a problem with perception. But yeah, currently you just change the colors manually
and -- that's the only thing you can do.
>> Question: Do you have any way of maintaining sort of perception of the state as you are changing the
colors or whatever? Because one of the big problems here is making sure that you are staying (inaudible) --
>> Nathalie Henry: Yeah.
>> Question: (Inaudible).
>> Nathalie Henry: Yeah. So that is a very complicated problem. It is funny because I worked a bit on it
when I wanted to visualize how ordering (inaudible) worked, and basically I wanted to show the process.
So now the problem is that it takes a huge amount of time if you want people to actually follow what is going on
because you can't move everything, and when you have a (inaudible) it is standard for -- people just want to
skip it. And I'm afraid it could be the same for changing the attributes, though I didn't think of it. But maybe
animation might be a good way of --
>> Question: (Inaudible) -- design, how long do they live in the data set?
>> Nathalie Henry: Whhh. Very different. Starting all their life. So well, that is for French historians, right.
They're going in the streets and they take maybe several years for collecting data. So they use actually
paper -- pen and paper for people. And then in the night they enter it in Excel. Right? So I mean they
realize everything. So the first thing when you show them something is they want to see the data. Where
is my data? Or what did you do with it? So a very, very long time with it.
>> Question: So that being the case, I would assume that these people would be spending some time
working on a set of hypotheses or a set of investigations.
>> Nathalie Henry: Yeah.
>> Question: And the -- annotating their data and then working on a different set of hypotheses and
annotating their data. So the data gets richer over time even if it is a static data set through the --
>> Nathalie Henry: Yeah.
>> Question: -- through the iterative evaluation. How do you persist that step -- the kind of hypothesis
labeling and grouping that the investigator is doing over these vast -- long or vast time scales?
>> Nathalie Henry: So during the participatory design, in the last session, there actually came a very nice idea.
So the problem is I worked three years on this. So I couldn't do everything. But the idea was you have
something that shows you the history and also that can help you plan things and annotate. So we call it
history planner. But -- so basically currently what they do, they have hypotheses. So much of the time it's
even before they collect the data so they can ask the right questions. Could ask all the questions, but --
and what happens is that they try -- they test the hypotheses through statistics. So they don't really do
exploratory analysis, so for most of them their hypothesis is like, I don't know, most of the time women
collaborate more with men, and then they will test it statistically and it will say yes or no.
And what happens is that at the beginning they said well, it is okay, we can try everything.
So that is what happened, right? They try all the hypotheses. Because you have got the body of literature
that has whatever, hundreds of hypotheses, so they will try everything. Oh, in my domain that works. But when
you provide them with the visualization they see other things. So what happens is that they have
hypotheses, but they would like also -- they are very interested in seeing the data just to raise new
questions. So basically you can't remove the current tools because they need to test everything. But they
are really interested in seeing the data in a new way, maybe to try to find new hypotheses. And once they
get their new hypotheses then they will try to go to statistical measures and whatever else to do that. So that is
the way they work currently.
Also -- so as I said before they really love their data, so they're just having faster -- it is nice to just talk
about it together. So usually they're very alone people contrary to us. I mean I'm used to collaborating with
lots of people, but most of the time they're doing like -- I don't know, woman in the 15th century in Britain
and then you have got the other one who is doing a man in the 16th century in Paris. So they don't really
talk to each other. And the fact you have visualization to help them also to say, oh look, you have this
hypothesis and stuff like that. So it is also a source of communication. That is why I like the story-telling
parts when you have to explain what is going on with your network.
And of course when they see something interactive like a pen-based tablet, they all interact. Even if that is
not their data they all go and do ah, look, I did that with mine and it worked. So it's a blend of techniques.
Well, I'm talking mainly about historians, but there are many other people that have many other ways of
working so it's difficult to generalize.
>> George Robertson: Thanks.
>> Nathalie Henry: Thank you. (Applause)