>> Eric Horvitz: It's an honor to have Jure Leskovec with us visiting from Stanford
University. Jure is an assistant Professor of Computer Science at Stanford. We know
him as one of our star interns and former MSR Fellow.
And he's been pursuing a very, very interesting and challenging but rich area of probing
large scale networks, both social communication networks as well as information
networks, aspects of the Web, understanding the evolution, how information diffuses on
these structures, understanding large-scale structure that you can typically only gain
insights about through a large-scale analysis.
Jure has won numerous awards even during his graduate work, and I'm sure more to
come given how wonderful his papers are and his research and deep thinking. He's
already had three Best Paper Awards at tier one conferences and won the ACM KDD
dissertation prize. He won the KDD Cup in 2003. And I guess he, Andreas, and
Carlos Guestrin were involved in the Battle of the Sensor Networks, which they
won in 2007.
I see in his CV he says he holds three patents. They're all Microsoft patents, I have to
say. And he co-chairs the machine learning and data mining track at the upcoming
WWW conference. So go ahead, Jure.
>> Jure Leskovec: Thanks a lot. So the work I'll be presenting today will basically talk
about networks and networks where edges are positive and negative, right?
And this is a joint work with Daniel Huttenlocher and Jon Kleinberg from Cornell. And
we have two papers. And this one is coming at this year's CHI and the other at WWW,
and I'll try to cover both of them.
This is basically our idea, the plan for the talk. So if you look at how we look at network
structures, most of the work usually just focuses on positive edges. Meaning friends,
followers, things like that.
I analyze the network treating all the edges in the same sense, that they all say
something positive to me, right? But generally the edges or links in these networks can
be classified into at least two classes. You could say some of
them represent some kind of friendship and the others represent something
that is sort of the opposite of friendship.
And what we want to study is these types of networks. So now if you say I want a
network where I will have these pluses and minuses, the question is how can I go infer
these pluses and minuses, where can I get the data.
And one way is to say that users will express these attitudes, these positive and
negative attitudes implicitly. So here's an example. For example, the way users
are using these social networking or online social applications could give me clues
about whether a particular connection or action is positive or negative.
So, for example, product ratings would be one such thing on Amazon and rating other
people's reviews that's helpful or not would be one such thing. For example, I'll show
you examples of this thing where on Wikipedia you have editors who then go up for
promotion to become administrators, and in order to do this, there is a public vote.
A positive edge means I voted in favor of your promotion and a negative edge means I
opposed your promotion. So this is one way.
The other way to infer, let's say, the sign of the edge would be through text, taking
the text and trying to do some kind of sentiment analysis. You could create
signed networks from text, and one way to look at this is the
well-known picture from [indiscernible] where it's conservative and liberal bloggers, or
the other way around, and this is how they link to one another. You see these clusters
that don't talk to each other much.
Those are networks with implicit edges. What we looked at were networks with explicitly
signed edges. What I mean by that is I wanted to find data
that's relatively large and where every edge is naturally signed. I want a graph where
every edge has a positive sign or a negative sign.
Here are three such datasets I will be talking about, and I will tell you what they are. So
the first one is this product review website called Epinions. On Epinions people
write product reviews, and then there is also a social network where you
create connections to other people, and every connection is basically
a trust/distrust connection: do I trust this person's product reviews or do I distrust
them. So when a user creates a connection they have to put a plus
1, meaning I trust this person, or minus 1, I don't trust this person. This is given, not
something I'm inferring; the user is telling me whether they trust or not. The reason
this social network exists is that these trust/distrust relations are used for ranking
the product reviews.
And the more trustworthy you are, the higher your product reviews are rated, the more
people see your product reviews, and Epinions gives you more money. So Epinions is
actually sharing its revenue with the people who write these reviews.
So it's important that you get a lot of trust edges. That's one. The second one is the
Wikipedia voting network that I was telling you about, where the voting is
completely public. There is a public record of how people have voted. Here a signed
edge means I was in favor of or against somebody becoming an administrator; casting
a vote creates an edge to that person.
The last dataset that we look at comes from Slashdot, an online technology blog,
with a social network associated with it. You can tag every person as a
friend or foe. The idea is, do you like the person's comments or not, that is,
the comments on the articles that are posted on this website. So these are now
three very different social applications on the Web where all of them have some kind of
network behind them and this network is naturally signed.
And users explicitly assign signs to the edges they create. So this is sort of what
is in common. But what is different is, for example, in this case, who trusts whom.
Here, other people can only see your trust edges, but nobody knows what your
distrust edges are.
The distrust edges are private to yourself while everyone can see your trust edges. On
Wikipedia, it's again very different because here everything is public. And the way how
you vote really has some consequence that can ruin someone's life if you like.
And then Slashdot is basically a bunch of geeks, and it's not clear what these things
really are and what it means to tag someone as a foe and so on.
So this is what is similar and what is different. Okay. Now I should tell you what I want.
Or before that, just quick statistics: I have between 7,000 and roughly 100,000 people,
and between 100,000 and almost a million edges in these networks. And the fraction of
positive edges is around 80 percent. So this is the data that I'll be looking at.
>>: For the first two networks, it seems like most of the trust/distrust edges could be
directed towards a small subset of people. So it's
more bipartite.
>> Jure Leskovec: I'll come to all these questions. It's an excellent question. I'll come
to exactly how bipartite these things are. But it's an excellent question.
Okay. So now I told you where the data is. I told you I'll work with signed networks.
What do I want to study? I want to study the signs and the network structure: how do
they interact with one another? Are the signs randomly scattered across the network or is
there some pattern?
More interestingly, I want to study or evaluate what kind of social theories could explain
how these network structures and signs interact. And the last thing I want to do is
see whether I can accurately predict, whether I can build various machine learning models
that will tell me whether there is a positive or negative relationship between a pair of
people. So that's basically the plan for the talk.
Okay. So now suppose you ask how I could go about trying to relate network
structure with signs. One basic intuition would be something like this. So I could say,
you know, friend of my friend is my friend. So if me and Eric are friends and Eric and
Paul are friends, then me and Paul are likely to be friends. So that's sort of the first
thing.
The second one says an enemy of my friend is my enemy, which also makes sense. Or
a friend of my enemy is also my enemy. Or an enemy of my enemy is my friend. This is
something that makes lots of sense. And what this is basically doing is telling me about
three pairs of people. It's telling me what is the structure -- how can I complete, how can
I infer an edge based on the intermediate person.
And this goes -- so in the social sciences this is called the theory of structural balance. It
goes back to Heider in '46, and he was the first to reason about these things. So basically
what I want to reason now is three pairs of connected nodes. So this is -- if now I say,
okay, here are three pairs of connected nodes, turns out I have four ways to label them
with signs.
And then if you remember the previous slide, now I can go and start asking myself about
these four different triads. And I ask about this one. So what this one is telling me is a
friend of my friend is my friend. So everything is good.
This one is telling me if I close this one, it's saying enemy of my enemy is my friend. Or
if I close this one, then I say enemy of my friend is also my enemy. So for this reason I
will call these two balanced because this is sort of capturing the friend of a friend is my
friend and this one is capturing the other three cases that I showed you in previous
slides. And then these two I will call unbalanced, because they are basically
inconsistent with intuitions I had on the previous slide.
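As a minimal sketch (my own illustration, not code from the talk): a triad, or more generally any signed cycle, is balanced exactly when the product of its signs is positive, i.e. when it has an even number of minus edges.

```python
def is_balanced(signs):
    """A signed cycle is balanced when the product of its signs is
    positive, i.e. it contains an even number of minus edges."""
    product = 1
    for s in signs:
        product *= s
    return product > 0

# The four ways to sign an undirected triangle:
# +++ and +-- are balanced; ++- and --- are unbalanced.
for triad in [(+1, +1, +1), (+1, -1, -1), (+1, +1, -1), (-1, -1, -1)]:
    print(triad, "balanced" if is_balanced(triad) else "unbalanced")
```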
So, for example, if you think of real life, maintaining such a relationship in your social
network will be quite hard for you, because you are friends with two people that don't like
one another. Right? And here the theory says that if you have three people that don't
like one another, sooner or later two of them will join and go after the third one.
So that's what structural balance says. So now I can go into my network data, and what I
would expect, if this is true, is that I will see more of such
structures than I would if the world were random, and I would expect to see fewer of
these structures because they're sort of harder to maintain, if you like, in the real data.
So that's the first intuition. What is nice about this theory of structural
balance is that you can show the following theorem. If your network is
composed only of balanced triads, meaning if your network is composed only of
such structures, then what the earlier question suggested will happen: you can
show that your network has to be like this. It sort of has to be bipartite in the sense
that you have two coalitions, A and B, where you only have positive edges
inside each of the two coalitions and negative edges crossing between the coalitions.
So this is what this balance implies. So if my network is composed of balanced triads, if
my network obeys this friend is my friend and the other three rules, then globally my
network will look like that.
So that's the cool thing.
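The balance theorem can be checked constructively. A sketch (illustrative, with made-up function names): treat a positive edge as "same coalition" and a negative edge as "different coalition" and try to 2-color the nodes; a conflict means some cycle is unbalanced.

```python
from collections import deque

def two_coalitions(nodes, signed_edges):
    """Try to split nodes into two coalitions so every positive edge
    stays inside a coalition and every negative edge crosses between
    them. Returns the side assignment, or None if impossible.
    signed_edges: iterable of (u, v, sign) with sign in {+1, -1}."""
    adj = {u: [] for u in nodes}
    for u, v, s in signed_edges:
        adj[u].append((v, s))
        adj[v].append((u, s))
    side = {}
    for start in nodes:
        if start in side:
            continue
        side[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v, s in adj[u]:
                want = side[u] if s > 0 else 1 - side[u]
                if v not in side:
                    side[v] = want
                    queue.append(v)
                elif side[v] != want:
                    return None   # some cycle is unbalanced
    return side

# Coalitions {a, b} and {c, d}: positive inside, negative across.
edges = [("a", "b", 1), ("c", "d", 1), ("a", "c", -1), ("b", "d", -1)]
print(two_coalitions(["a", "b", "c", "d"], edges))
```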
>>: That's one of the results that came out of Cartwright and Harary, the structure
theorem, too.
>> Jure Leskovec: No, it's easy -- if your network has such structures, then you can
always put the nodes in two sets such that either this will happen; or if you have no
negative edges, then you just have a network of positive edges. The idea is that from
the local rules, global, you can say something about the global network structure.
>>: I notice all the links are undirected.
>> Jure Leskovec: Sorry? Right now I'm assuming the edges are undirected. This is
saying the edges are undirected.
Actually, the way this was developed was for a complete graph, where everyone
has an edge to everyone else. Then you get something like that. But it
generalizes even if your graph is not complete.
Okay. So this is this. So let me now tell you about one thing that is a bit more
specific. All our datasets, the domains we study, are naturally directed. So one
way to apply the theory that I just showed you would be to just forget edge directions,
apply balance theory, and see how the signs and edges correspond.
There's also a different interpretation of these positive and negative edges, which you can
trace back to this paper by Guha et al., where you can say the following thing:
You can say every positive edge basically tells me something about the status of people.
If A gives a positive edge to B, this means that B has a higher status than A.
If A gives a negative edge to B, this means that B has lower status than A. So it's not
really about friend and foe, but it's about status, how good is someone. This says B is
better than A and this says B is worse than A, whatever, in some kind of status sense.
So that's another way how this positive and negative edges could appear in networks.
And just a small example. Here A points negatively to X, and X points negatively to B.
I can ask: what do I think would happen here? What structural balance would say is
put a plus here. Because we are friends and we have a common enemy. So enemy of
enemy is my friend. While status theory would say A is bigger than X. X is bigger than
B. So A is bigger than B. So I should put a minus here according to the status theory.
Balance would say put a plus. Status would say put a minus, because the status of A is
higher than the status of B, just by transitivity. And the other way around, where B
has a plus to X and X has a plus to A. By balance theory I would put a plus
here, because then all three of us are friends. If I put a minus, this says there is a person
who has two friends who don't like one another, and we said before that is bad. So
balance would put a plus here and status would put a minus.
Why? Because X is bigger than B and A is bigger than X, so A is bigger than B. So A
looks down on B. That would be the prediction of the status theory.
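The two competing predictions can be made mechanical. Here is an illustrative sketch (the encoding of directions and the helper names are my own, not from the talk): balance ignores direction and predicts the product of the two signs, while status fixes X at status 0, propagates +1/-1 along the two edges, and predicts the sign of the status comparison.

```python
def balance_prediction(sign_ax, sign_xb):
    """Balance ignores edge directions: the closing edge gets the
    product of the two existing signs."""
    return sign_ax * sign_xb

def status_prediction(edge_a, edge_b):
    """Status assigns X a status of 0. edge_a and edge_b describe each
    node's edge with X as ("out"|"in", sign), seen from X: an edge
    X -> u with sign s gives u status s; an edge u -> X is flipped,
    giving u status -s. A evaluates B: plus if B outranks A."""
    def status(direction, s):
        return s if direction == "out" else -s
    a, b = status(*edge_a), status(*edge_b)
    return 1 if b > a else -1 if b < a else 0

# A -(-)-> X and X -(-)-> B: balance closes the triangle with a plus
# (enemy of my enemy), while status says A outranks B, so a minus.
print(balance_prediction(-1, -1), status_prediction(("in", -1), ("out", -1)))
# B -(+)-> X and X -(+)-> A: balance again says plus, status says minus.
print(balance_prediction(1, 1), status_prediction(("out", 1), ("in", 1)))
```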
What I want to do for the rest of the talk is sort of go reason about these things and see
how do people create these edges. Because these are two fundamentally different ways
of how people assign edges here.
So that's --
>>: So I imagine an extended balance theory where you have larger structures that are
stable.
>> Jure Leskovec: Sure, yes. So it's a good point.
>>: If you could identify those structures, that would provide a kind of scaffolding that
holds friendships in place even if people were enemies, and you could learn those
scaffolds.
>> Jure Leskovec: Actually, I will show you a bit about this. But one way to
generalize to larger structures is just to say I want these larger structures to
have a positive product of signs. So minus times minus times plus
is positive. That's the way we generalize to bigger structures.
So that's what I want to do.
So for the rest of the talk I basically want to look at how these two theories align with
the network data. I don't want to just tell you which one is right; I want to go into a bit
more detail on how these theories are reflected in the data and compare them. And why
do I want to do this? Because it gives me insight into how
people use these linking systems. Nobody tells them this is about status or this
is about balance.
It's about: is this a friend or a foe, would you like this person to become an administrator
or not, do you like this person's reviews. So I want to see if these theories say anything
about how we use these systems. And as far as we know, this is one of
the first studies that actually looked at signed networks at this scale.
So that's the plan. So first I want to show you about the balanced case. So I'm
evaluating structural balance. I'm considering my network to be undirected, and so what
is this big table showing me? So I have the four, sorry, the three datasets. I have the
probability of a particular triad, so each row here is a particular triad. I'll show you that.
This is the fraction of those triads in the network. So these numbers should sum to 1.
And this is the fraction of such triads in the network if I would randomly assign pluses
and minuses to the network.
So I keep the network fixed. I keep the total number of pluses and minuses the same; I
just shuffle them across the network. This gives the proportion of each triad if the
signs were randomly distributed over the network, keeping their overall proportions.
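As a quick illustrative sketch (not from the talk), this shuffle test could look like the following, with a small toy complete graph standing in for the real data:

```python
import random
from itertools import combinations

def triad_fractions(sign):
    """sign: dict mapping an undirected pair frozenset({u, v}) -> +1/-1.
    Returns the fraction of triangles having 3, 2, 1, or 0 plus edges."""
    nodes = set()
    for pair in sign:
        nodes |= pair
    counts = {3: 0, 2: 0, 1: 0, 0: 0}
    total = 0
    for trio in combinations(sorted(nodes), 3):
        pairs = [frozenset(p) for p in combinations(trio, 2)]
        if all(p in sign for p in pairs):       # triangle exists
            plusses = sum(1 for p in pairs if sign[p] > 0)
            counts[plusses] += 1
            total += 1
    return {k: v / total for k, v in counts.items()} if total else counts

def shuffled(sign, rng):
    """Keep the graph and the total number of pluses and minuses,
    shuffle which edge carries which sign."""
    values = list(sign.values())
    rng.shuffle(values)
    return dict(zip(sign.keys(), values))

# Toy network with ~75% plus edges; on the real data one would compare
# these two tables row by row.
rng = random.Random(0)
sign = {frozenset(p): rng.choice([1, 1, 1, -1])
        for p in combinations(range(12), 2)}
print(triad_fractions(sign))
print(triad_fractions(shuffled(sign, rng)))
```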
So the first one, all right. This is the stable, the balanced triad. And I see it's much
more -- I see it much more than what I would expect to see by chance. So balance
gives me a good prediction.
Then this is the second balanced triad: two friends with a common enemy. And
I see it overexpressed. Actually, all of this is statistically significant. The only
place where this is not overexpressed is on Slashdot, so here I put almost a
checkmark. And these would be the two unbalanced cases. Again, for example,
this one means I have two friends who don't like one another, and I see it much less than
I would see by chance, here and here.
So again this triad is much less expressed than what I would expect if the world were
random. So that's good. And this is now the last one, all three minuses, and actually this
one is more expressed than what I would expect.
So there are more triples of enemies than chance would tell me. So this is sort of
inconsistent with balance theory. So this is now if I think about balance. So now I
can go and try -- here I just threw away the directions of the edges. One way to go
now would be to think about what happens if I also consider the directions of the edges.
Now I will change the setting a bit and I will be thinking about evolving the directed
networks. So here is what I want to do. So I want to think about directed networks
where now I can go and again ask, okay, how many of these directed triangles are
explained by balance.
So here are the 16 possible directed signed triangles. So why 16? For each of these
two edges I have to decide the direction and I have to have the sign. So it's two times
two, times two, times two, so it's 16. Now, this is the edge that we look at. The setting
is that this node is creating an edge to that node, and this is the context in which it is
created, so there are 16 different contexts.
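As a quick sanity check (illustrative), the 16 contexts are just the choices of direction and sign for the two edges through X, and balance collapses them into 4 sign patterns because it ignores direction:

```python
from itertools import product

# Each of the two edges through X has a direction (from or toward X)
# and a sign, so there are 2 * 2 * 2 * 2 = 16 contexts for the edge
# from A to B.
contexts = list(product(["out", "in"], [+1, -1], ["out", "in"], [+1, -1]))
print(len(contexts))

# Balance ignores the directions, so it collapses the 16 contexts into
# 4 sign patterns and predicts the product of the two signs for each.
by_signs = {}
for d1, s1, d2, s2 in contexts:
    by_signs.setdefault((s1, s2), []).append((d1, d2))
print(sorted((signs, len(dirs)) for signs, dirs in by_signs.items()))
```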
Now I can use structural balance theory the way it is suggested: forget about the
direction of the edge and start predicting.
So in this case I would predict a plus. This is what balance would say. Now I can go
into the data and ask whether this directed triad is more or less expressed than
what I would expect.
And I would find that it is. Then for the second one here I would put a minus. I have a
plus and a minus, so I would put another minus here. These are now the predictions of
balance theory.
So makes sense, right? And now I can go into the data and start checking how many of
these predictions are correct. And it turns out that only half of them are correct. What
I mean by that is, for example, this thing is overrepresented in the data. Meaning
this is a stable triad and it occurs more often than what I would expect. But, for example,
some other one -- for example, this one is underrepresented. It turns out when you have
two minuses going this way, people like to put a minus here and not a plus.
Here's another example, right, where I have pluses going this way and then people like
to put a minus here, not a plus. Which seems to be something that aligns with the
status theory.
It says: you are better, you are better, so yes, I'm better than you. And at least if you look
at the data, here are two examples where status theory makes mistakes.
These are the other cases where status theory gives you wrong predictions.
>>: One could imagine balance theory being extended a little bit, where you fold in
the notion of separating the actual reality of plus/minus from communicative intent.
It may be more stable to have a friend not like people who are your friends, or vice versa,
as long as you don't communicate it. So there would be a bias to not express in the graph
information that exists in reality.
So you can dig into a bunch of these things and come up with stories that involve
extra-network or non-communicated intention.
>> Jure Leskovec: So I agree that there are all these evaluations that are just not
present, that people are just not willing to give, because there is some
cost to doing this.
>>: Right, a biased distribution. It comes down to certain biases in which edges you're
not getting; it's not a random deletion of the edges.
>> Jure Leskovec: I agree with what you say, but on the other hand, for example, with
Wikipedia, it's very costly to say, yes, I oppose you, because everyone else sees that.
You would say people only give it if they feel strongly about it. On Epinions, no
one sees who you distrust; implicitly you could distrust everybody and nobody would
look at you differently. So if these two things are similar, if the results from Epinions and
Wikipedia are similar, then that would answer your question.
>>: Where do you get the negative edges when they're not public?
>> Jure Leskovec: Sorry?
>>: In cases where they're not public, where did you get the data for the negative?
>> Jure Leskovec: From the database. From Epinions. So I was able to crawl
Wikipedia and crawl Slashdot, but I had to get the data from Epinions.
>>: From them.
>> Jure Leskovec: Yeah. You can't obtain it through the Web interface. Correct.
So eight out of 16. The question is can we do better or can the status theory do better.
Now the answer is yes and now let me tell you about the status theory. Here's how we'll
think now. All my edges are directed and edges are created over time. We are still
reasoning about triples of nodes, and the way I will be reasoning is: X creates some
edges and now node A evaluates node B. This is what I would like to figure out now,
and what I'll be asking is: what can I say about the behavior of this edge given the
signs that X gave? So I want to consider what happens here in the context of
X. X is giving me the context. What I mean by that is: the context is the
directions of these two edges and the signs on these two edges. And I would
like to know how the sign between A and B depends on what X did before.
And the point here is basically that instead of studying this as an independent event,
I will condition on the action of X. I will say X did this particular thing, now A
evaluated B; what can I say about the behavior of A?
So that's the idea. These edges are embedded; I have this context. And the
same 16 triads I showed you before are now the 16 different contexts. So this is
the edge that gets created, here's the node X, and this is the context: the directions
and the signs. And here are the 16 different contexts.
Okay. And I'll call these contextualized links, because this red link that gets created is
put into this particular context. That's how I look at this. And now I need to make a few
definitions.
So the first thing is that different users give signs in different proportions. So
I will define what I call the generative baseline of a user: the
fraction of pluses that the user gave. This is how positive they are on average,
if you like. And then I have something I call the receptive baseline, which says how
many pluses they received. So it's how popular they are, in some sense. These are
the baselines. I just take a node and count what fraction of the edges pointing to
it are positive, and what fraction of the edges it gives are positive. These are
the baselines.
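A sketch of the two baselines (my own illustration; the edge-list format is assumed):

```python
def baselines(edges):
    """edges: list of (src, dst, sign) with sign in {+1, -1}.
    Generative baseline of u: fraction of plus edges u creates.
    Receptive baseline of u: fraction of plus edges u receives."""
    given, received = {}, {}
    for u, v, s in edges:
        given.setdefault(u, []).append(s)
        received.setdefault(v, []).append(s)
    def frac(ss):
        return sum(1 for s in ss if s > 0) / len(ss)
    gen = {u: frac(ss) for u, ss in given.items()}
    rec = {v: frac(ss) for v, ss in received.items()}
    return gen, rec

edges = [("a", "b", 1), ("a", "c", -1), ("d", "b", 1), ("c", "b", -1)]
gen, rec = baselines(edges)
print(gen["a"])   # a gave one plus out of two edges
print(rec["b"])   # b received two pluses out of three
```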
Now, what I want to know is how does your baseline or how does your behavior change
from the baseline based on the context? If I know what X did, how will your behavior
change from this baseline behavior? So what is important to note here is that I'm not
interested in modeling what will you do but I will model how will your behavior change
from this baseline behavior.
And the measure that I introduce, we call it surprise, and it will be something
that tells me how much the user's behavior deviates from the baseline when they
are in this particular context. So I want to characterize how much the
behavior deviates from the baseline.
The idea is: what does A tend to do in this situation versus what does A
do when I don't care about the directions and signs here. So I want to know how the
behavior changes in this particular context versus over all possible contexts.
So here's how I compute things. Let this be all the instances of a particular
contextualized link t. The idea is, if this is the contextualized link, I take all
the edges that participate in such a structure. The generative baseline
for t is then the sum of the generative baselines of all the nodes A, which is
the expected number of pluses I would see in this context if everyone
behaved according to their baseline. And likewise the receptive baseline for t is
the sum of the receptive baselines of the nodes B here.
Okay. So this is if people would be behaving independent of the context. So now what I
define as a surprise is basically I also count how many positive edges I really see in this
case and then I want to quantify how far away is the observed thing from the
expectation. So I will be measuring basically the number of standard deviations by
which the number of positive links differs from the generic baseline that I define here.
Similarly, I have a reciprocative surprise just the number of standard deviations that
positive links deviate from the baseline. So I'm quantifying how much baseline is what is
expected and the unit, the unit in which I measure this is standard deviations.
Okay. So if the surprise is positive, this means I see more positive links
than what I expect; if the surprise is negative, I see fewer positive links than I expect;
and if the surprise values are around 6, they're six sigmas away from the
expectation. So everything that has surprise of more than six is super
significant.
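The generative surprise just described can be sketched as follows (illustrative; it treats each edge as an independent coin flip with the creator's baseline probability, so the deviation is measured in standard deviations of a sum of Bernoullis):

```python
import math

def generative_surprise(instances, gen_baseline):
    """instances: one (source_node, sign) pair per edge closing this
    context. gen_baseline: node -> fraction of pluses that node gives
    overall. Treating each edge as an independent coin flip with the
    creator's baseline probability, return how many standard
    deviations the observed plus count is above the expectation."""
    observed = sum(1 for _, s in instances if s > 0)
    expected = sum(gen_baseline[u] for u, _ in instances)
    variance = sum(gen_baseline[u] * (1 - gen_baseline[u])
                   for u, _ in instances)
    return (observed - expected) / math.sqrt(variance)

# 100 closing edges, all plus, from creators whose baseline is 80% plus:
inst = [("u%d" % i, 1) for i in range(100)]
base = {"u%d" % i: 0.8 for i in range(100)}
print(generative_surprise(inst, base))   # (100 - 80) / sqrt(16) = 5 sigmas
```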
So let me now show you two examples. I think I've shown them. Here they are. So this
is the first example, right? So here's X. This is now the context two negative edges one
pointing up, one pointing this way, and I want to reason what would happen here, right?
So what I expect here is the following. I expect A to be more negative than the
generative baseline of A. Why? Because A is high status and B is low status, so if my
status theory is correct, A will be more negative than usual. And if I now ask
about the receptive baseline, looking from the viewpoint of B: because B is low
status, B is more likely to receive a minus than a plus. So the receptive
surprise should also be negative. So the generative surprise of A should be negative
and the receptive surprise of B should be negative. They should both be more negative
than usual: from the viewpoint of A, A is more negative than usual; from the viewpoint
of B, B is more likely to get a negative edge than usual. Why? Because A is high status.
Okay. And here's the same example flipped around, where I would expect the same
kind of reasoning. Imagine that the evaluated edge is flipped around, so B is
the lowest status and A is the highest status. If the edge were
created in that direction, B is more likely to give a plus, so the generative surprise
should be positive. And because A is high status, A is more likely to receive that plus,
so the receptive surprise should be positive too. So these are two
basic examples. But here is a different example that shows why this thing is a bit
complicated.
So let's consider what we call joint positive endorsement. I'm considering this
type of case: there is an X that gives pluses to both A and B. And now I
want to figure out what will happen here. What we see in our data is the following.
We see that A is more likely to be positive than the baseline. So the generative surprise
of A is positive, meaning A is more positive than usual. But the receptive surprise of B is
negative.
So what this means is that, for B, this edge is more
negative than what B usually gets. So in some sense I almost get a contradiction.
Here I'm saying A is more positive than usual, and here I'm saying this edge is
more negative than usual.
>>: How do you know which one is A and which one is B?
>> Jure Leskovec: Because I have the direction of the edge.
>>: Oh, I see.
>> Jure Leskovec: So A is the guy giving the edge. B is the guy receiving the edge.
Now, let me give you a story and tell you how this is consistent. Okay. So here's the
story. Okay. So think of a soccer team.
And now I can create a signed network on the soccer team: I ask every person how
their skill relates to the skill of some other person. This way I can build a
signed directed network where I put pluses to people who are better than me and
minuses to people who are worse than me. That gives me a network. I'm
assuming there's an ordering on the skill of the players. I ask every player: who is better
than you, who is worse than you? This gives me a signed network. Now, imagine that I
haven't yet asked what A thinks about B. All I know is what X thinks about the two of
them. And now the question is what can I infer about A and B from basically the
information that X gave me about them.
Okay. So that's the setting.
So what can I infer about the answer of A? Okay, so here's the first inference;
I can think this way. I can say: because B
has a positive incoming edge, B is high status. B is a good soccer player. Then
A's evaluation should be more likely to be positive than if A were evaluating a random
guy.
So here's a small picture. If this is the whole span, here is the best player, here is
the worst player, and the average player sits in between.
Because B got a plus, it means B is better than average. So B is somewhere here. Now
A asks: okay, how will I evaluate B? Yes, A is more likely to be positive than
toward a random guy. That's what all this is saying. Because B got a plus, B is a
good player. So, yes, A will be more positive than A would be compared to the
baseline, compared to an average player. So this is from the generative side of
things.
Now from the receptive side of things: how will B see this? B sees the
following. B says: because A has a positive edge, A is a good player. So the
evaluation A will give me is less likely to be positive than what I would get from a
random guy. So again the same picture. Here's the average player, the
baseline. I know A is better than average, so A is here.
And now, no matter where on this axis B is, B is less likely to receive a positive
edge from A than from an average player. From the point of view of the
receiver of the edge, this evaluation is less likely to be positive than
what they would get from a random guy.
So that's the story of how you can get different predictions based on the viewpoint: do
you look at the edge from the point of view of the receiver, or from the point of view of
the generator?
So now I have to tell you how this status theory makes predictions. Here's the idea. I
will assign X a status of 0. Then if X points positively to a node, I assign that node a
status of 1; otherwise I assign it a status of minus 1. And if the edge is in the other
direction, then I can flip the direction of the edge and flip the sign. So a plus this way
means a minus that way. Okay. And so in this case both A and B would have status of 1.
They are both better than X; that's why they have status of 1.
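The status-assignment rule just described can be sketched in a few lines. The edge representation here, triples of (source, target, sign), is an illustrative assumption, not the speaker's actual code:

```python
# Sketch of the status-assignment rule: pick an anchor node X with status 0,
# normalize every edge to point away from X, and read off +1/-1 statuses.

def normalize_edge(src, dst, sign, anchor):
    """Rewrite an edge so it points away from `anchor`, flipping the sign
    when the direction is reversed ("plus this way means minus that way")."""
    if src == anchor:
        return src, dst, sign
    return dst, src, -sign

def status_from_anchor(edges, anchor="X"):
    """Assign the anchor status 0; a node the anchor points to positively
    gets status +1, negatively gets -1 (after normalizing direction)."""
    status = {anchor: 0}
    for src, dst, sign in edges:
        src, dst, sign = normalize_edge(src, dst, sign, anchor)
        status[dst] = sign
    return status

# The example triad: X -> A with a plus, B -> X with a minus.
edges = [("X", "A", +1), ("B", "X", -1)]
print(status_from_anchor(edges))  # {'X': 0, 'A': 1, 'B': 1}
```

As in the talk, both A and B end up with status 1 relative to X.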
And now I need to say when the surprise value, the deviation from the baseline, is
consistent with status. Here are the two rules. From the generative side of things, from
the viewpoint of A, I reason as follows: the status of B has to have the same sign as the
generative surprise of A. Basically it says that if B is high status, A should be more
positive, and if B is low status, A should be more negative than they usually are.
So this is what this is saying. And then from the receptive side, from the viewpoint of B,
I'm saying the status of A should have the opposite sign of the receptive surprise. So if
I'm B and A is high status, I'm more likely to receive a minus; if A is low status, below
me, I'm more likely to receive a plus. So here I get opposite signs and here I get the
same signs. That's how the whole thing works. So in this case, this is what I would
expect.
The generative surprise of A has to be positive, and the receptive surprise of B has to be
negative. Right? They both have status of 1. For this to be consistent I have to have
equal signs here, so a plus here, and opposite signs there, so a minus there.
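The two consistency rules can be written down as a tiny sketch. Encoding statuses and surprises as bare signs in {+1, -1} is an assumption made here for illustration:

```python
# Sketch of the two status-theory rules:
#   generative rule: sign(generative surprise of A) == sign(status of B)
#   receptive rule:  sign(receptive surprise of B)  == -sign(status of A)

def status_predictions(status_A, status_B):
    """Return the signs status theory predicts for (generative surprise
    of A, receptive surprise of B), given the statuses of A and B."""
    predicted_generative = status_B   # same sign as B's status
    predicted_receptive = -status_A   # opposite sign of A's status
    return predicted_generative, predicted_receptive

# In the example both A and B have status +1 relative to X, so the
# generative surprise of A should be positive and the receptive
# surprise of B negative.
print(status_predictions(+1, +1))  # (1, -1)
```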
So now here's a huge table of results; bear with me and I'll explain it. These are all the
16 different triads. This is the number of times each triad occurs. This is the probability
that the triad is closed with a plus; remember, the baseline is about 80 percent. And
then these are my surprise values: the number of standard deviations the behavior
deviates from the baseline. Minus means, you know, 50 standard deviations more
negative than expected; plus means that many standard deviations more positive than
expected. So these are the numbers. And here are the predictions: these are the
predictions from structural balance, and these are the predictions based on status
theory. Okay, so let me show you an example. Here are the 16 cases that I have. If I
take this particular triad, number 11, I can go to row 11 here, right? I see that A is more
negative than they usually are, and B sees this more negatively than they usually do. So
the generative surprise of A is negative and the receptive surprise of B is negative.
And this is the opposite of what balance would say: balance would say both have to be
positive, while our theory says they should both be negative. Why? Because balance
says: here's a plus, here's a plus, so here should be a plus; they both should be more
positive than they usually are. What I see from the data is that they're both more
negative than they usually are, so this triad gets closed by a minus. That's something
status predicts but balance doesn't. Okay, do people see that? What this is saying is: B
is greater than X, and X is greater than A. So the edge going this way should be a minus,
right? Because this is the lowest guy and this is the highest guy. This way should be the
minus, which is against what balance says, but it is what shows up in the data. So my
status theory gives me the right prediction and balance gives me the wrong prediction.
And the point I made before is that balance gets around seven or eight of these cases
correct, while status gets 13 or 14 correct. So that's one example. And here's another
example I showed you before. What it says is B is less than A, so status would say put a
minus here, while balance would say put a plus; it would say two friends with a common
enemy. If you go to this particular row, again you find that people put a minus in this
example. They behave more like the status theory and not like the balance theory.
Okay. And then the last thing that I want to show you, okay, what are the mistakes?
>>: About the status semantics and the applications you're looking at. These edges are
plausibly built on opinions. So what's the mapping to the status semantics there?
>> Jure Leskovec: That's a very good question. So, for example, the last slide I will show
you is about voting on Wikipedia. Why should voting be about status? You'd think there
should be some bar: if you're above it, you're above the administrator bar; if you're
below, you're below it. I'll show you a slide where I have some kind of proxy for status
and how people vote based on the difference in status. So, yes, it's a good question: why
should there be status if these things are not about status? They're about opinions or
preferences or trust, and there's no inherent status in there. But my point is that in
these systems people behave like they're assessing the status of one another, and not
necessarily using this friend-of-a-friend type of reasoning.
>>: You can imagine a mixed model with a parameter, right?
>> Jure Leskovec: Yes, exactly; I'll come to that. So here are the different mistakes that
I'm making. This one and this one are T3 and T15, these are T2 and T14, and this is the
last mistake. What's worth noting is that all the examples up here are basically variants
of the same thing: I can get from this triad to this one by flipping the direction here and
flipping the sign. If you look at all these examples, from the status point of view they're
all the same example in the end. They're all telling me the same thing, right? Because
these two pluses here I can flip around and make minuses, and then I'm in this case. It's
again telling me X is the best, and A and B are worse than X. So from this point of view
these are the mistakes I'm making here, and this last mistake is the case where balance
does well: I should be putting a plus here while what I'm predicting is a minus. These
are the five mistakes I'm making.

So going to Eric's question, consider the following. I told you about status; what does
status tell me about this application? So A creates an edge to B, and now B has to create
an edge back. If my status theory is correct, then B should be negative, because B is
greater than A, so this way should be a negative edge. And similarly this way: there
should be a plus here. So the question is, when people reciprocate, do they obey this
status theory or balance theory? Based on balance, the edge should have the same sign:
if you're my friend, I should be a friend back. Based on status, the edge should have a
different sign. Basically, what I'm interested in is the sign of this edge given that the
first edge was positive or that it was negative.
And if I look at this, what I find is that reciprocation strongly follows balance: people are
more likely to reply with the same sign than with a different sign. And the point is also
that the strongest signal is just reciprocation; second come all these triadic effects.
What you also find, from the balance point of view, is that people are more likely to
reverse the sign if they participate in an unbalanced triad: if I'm in an unbalanced triad, I
am more likely to try to make it balanced. That's something that happens in the data.
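The reciprocation measurement just described can be approximated with a simple count. The timestamped edge-list format here is assumed for illustration, not taken from the datasets:

```python
# Sketch: given signed directed edges with timestamps, count how often a
# reciprocating edge repeats the sign of the earlier edge (balance-like
# behavior) versus flipping it (status-like behavior).

def reciprocation_agreement(edges):
    """edges: list of (time, src, dst, sign). Returns the fraction of
    reciprocated pairs where the later edge repeats the earlier sign,
    or None if no pair is reciprocated."""
    first_sign = {}
    same = total = 0
    for _, src, dst, sign in sorted(edges):
        if (dst, src) in first_sign:           # this edge reciprocates one we saw
            total += 1
            same += (sign == first_sign[(dst, src)])
        first_sign.setdefault((src, dst), sign)
    return same / total if total else None

edges = [(0, "A", "B", +1), (1, "B", "A", +1),   # same-sign reply
         (2, "C", "D", -1), (3, "D", "C", +1)]   # sign flipped
print(reciprocation_agreement(edges))  # 0.5
```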
So going back to the question Alice had at the beginning: how does this global network
structure interact with the links? Now I want to understand what this network looks like
globally from the point of view of these pluses and minuses. Let me skip a bit. Here is
what I want to say: both theories make predictions about the global network structure.
For structural balance, I already showed you: it tells me there will be coalitions in my
network, so my network will look something like this, right, pluses within and minuses
across. What status theory tells me is that there should be a global status, meaning I
should be able to assign every node a number such that positive edges only point from
nodes with low numbers to nodes with high numbers, and negative edges point the other
way around. I can now go into the data and ask whether globally my network looks more
like this or more like that.
And, to scare you with a big table, this is what's going on. This is the fraction of edges
that obey the balance criterion, and this is the fraction that obey the status criterion.
Yes, it's less, but here are two ways I can randomize my data. Here the idea is: if I
randomly reassign the signs in my network, you can see that the network becomes more
bipartite than it is in reality, while if I randomize the network, it obeys status less than
it does in reality. So what this tells me is that globally, networks obey status more than
they obey balance. Even at the global level status is more expressed; there's more
evidence for the status theory than for the balance theory.
And then the last thing. Now that I've shown you these two theories, the next thing I
want to do is say: okay, given the signs of the edges, can I hide the sign of an edge and
try to predict it? Given how this edge is embedded, and given the signs of the edges
adjacent to this particular edge, can I figure out what I should predict here?
And the way I'll go about this is very simple. For every edge I will create a feature
vector that just counts the types of triads this edge is embedded in, and then I'll train a
logistic regression classifier. These are the 16 features I'll be using. If I want to predict
this edge, I count how often it occurs in this context, in this context, in this context, and
so on, and I have a feature vector of length 16.
And here is the result of the logistic regression. I subsampled my data so that random
guessing would give me 50 percent, and the point is that using these features I can
predict with almost 90 percent classification accuracy. Basically, with between 80 and
90 percent classification accuracy I can predict whether a particular edge will be
positive or negative.
So that's basically the first point. And let me show you the last thing. The last question I
want to ask is: if I can do this with 90 percent accuracy, and I also know that these
applications are very different, what if I train my model on Epinions, where people rate
product reviewers, and then use this model to predict how people will vote on
Wikipedia? Can I do that? The idea is I will train on Slashdot and predict how people
trust each other on Epinions. And if I do this, here is what this table shows: I'm training
on the row dataset and evaluating on the column dataset.
Okay. So if you look at, say, the first line: when I train on Epinions and evaluate on
Epinions, I do best, right? But even when I evaluate on Epinions a model trained on
Slashdot or on Wikipedia, I do almost as well. So what is the point? The point is that
these models have amazing generalization performance. I get almost the same
performance on Wikipedia regardless of what I use for training; on Slashdot I do about
the same regardless of whether I use the Slashdot dataset to learn how people create
signs, or whether I use Epinions or Wikipedia; and similarly here.
I lose about one percent in classification accuracy. What this is basically telling me is
that even in these very different applications, where in one people vote and everything
is public, and in another only the positive stuff is public and nobody sees the negative
stuff, the model is the same. Right? The model does as well regardless of the application
I train it on.
>>: Did you do [indiscernible] to see what [indiscernible] how the 16 features are related?
>> Jure Leskovec: Yes, I skipped these slides.
>>: Can you tell me from your memory?
>> Jure Leskovec: So it turns out that -- let me tell you offline, okay? So the surprising
result here is that regardless of which dataset I train on, I can do as well on a dataset as
if I had trained on it itself.
>>: But how do you explain the variation between the columns? Why is Wikipedia so
much less predictable than Epinions?
>> Jure Leskovec: That's a great question. I don't have a good answer, but the point is
exactly as you said. Wikipedia just seems to be a harder problem than Epinions and
Slashdot, because I always do worse on it. I noticed that regardless of what kind of
model I was using, the performance on Wikipedia was worse. Now, why is that?
>>: Because it does not apply as well.
>> Jure Leskovec: So here I skipped a few things; I can go back if you like. One thing is
that when I do this logistic regression, I can train these coefficients or I can take the
coefficients directly from the theory. For example, I can apply the balance theory and
say these are the logistic regression coefficients balance theory would dictate: if the
edge I'm predicting is embedded in a plus/plus or a minus/minus context, put a plus, so
I put 1s here; if my edge is embedded in an unbalanced case, put a minus, so I put
minuses here. What I'm showing you here are the coefficients the theory suggests, and
these are the learned coefficients. And you see that the signs agree. The only case
where they don't is this minus/minus, the all-minus triad, where we also saw that people
put minuses. And similarly for status theory, I can compare what the handcrafted
logistic regression model does with the learned coefficients. And the slide that I also
skipped is the following. Here is the point: this is deterministic balance, where I set the
logistic regression coefficients by hand, and this is where I set them by learning.
And you see that I gain very little.
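The "deterministic" variant just described, with logistic regression weights set by hand to plus or minus 1 according to a theory, can be sketched as follows. The feature names here are illustrative assumptions:

```python
import math

# Sketch of a deterministic logistic predictor: instead of learning weights
# over the triad-context counts, fix each weight to +1 or -1 according to
# what a theory (here, a balance-style rule) predicts for that context.

def predict_sign(feature_counts, theory_weights, bias=0.0):
    """Logistic score over triad-context counts; predict a positive edge
    when the resulting probability exceeds 0.5."""
    z = bias + sum(theory_weights.get(t, 0) * c
                   for t, c in feature_counts.items())
    p = 1.0 / (1.0 + math.exp(-z))
    return (+1 if p > 0.5 else -1), p

# Balance-style handcrafted weights: (+,+) and (-,-) contexts vote plus,
# mixed-sign contexts vote minus.
weights = {("+", "+"): +1, ("-", "-"): +1, ("+", "-"): -1, ("-", "+"): -1}
counts = {("+", "+"): 3, ("+", "-"): 1}   # the edge is seen in 4 triads
print(predict_sign(counts, weights))      # predicts +1 with p around 0.88
```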
>>: What are the different colored bars?
>> Jure Leskovec: That's a good question. The different colored bars show how well
embedded the edge is, sort of how rich my feature vector is. This 25 means that the
edge participates in at least 25 different triads, so my feature counts are very good,
while here at 0 most of the features are zero because the edges don't participate in
many triangles. It was to see how much of this effect was there. What's the point of this
slide? When it says deterministic, it means I set the coefficients by hand to either plus
1 or minus 1; learned means I learned the coefficients.
And the point is that there's not much difference between deterministic and learned, so
even these plain theories do well. And you can compare the predictions from the status
theory with those from the balance theory: this is deterministic balance, this is
deterministic status, and, for example, up here and also up there, status does a bit worse
at predicting than balance does.
>>: Is it true that Epinions and Slashdot votes are based on less information, and people
give them less thought, whereas people take Wikipedia very seriously and make it their
life's mission, so it's harder to predict based on superficial --
>> Jure Leskovec: That could be an explanation. So the last slide, which goes to Eric's
question, is the following: how do people vote on Wikipedia? What am I asking here?
How likely am I to vote positively on someone? And here I have some notion of status.
I'll tell you exactly what this means, but here it will be when A and B have about the
same status, here when A has lower status than B, and here when A has higher status
than B. And A is evaluating B.
So here is what I see: the probability of voting positively versus the difference in status.
For example, one way to quantify status on Wikipedia is the number of edits you have
made. And what this says is the following: when B, the person up for adminship, has
made many more edits than A, A is very likely to be positive, more positive than the
baseline. When they have made about the same number of edits, A is the most negative.
And when A has made many more edits than B, again A is more likely to be positive.
And here is a different way to think about status on Wikipedia: people can give one
another these barnstars. It means I come to some user, I edit their page, and I say, I
give you a barnstar.
I just put an image there; that's called a barnstar. So now I can say the number of
barnstars is a measure of your status, and here I'm plotting the barnstar difference. And
I see the same behavior, right? When A is evaluating someone who is clearly better than
they are, they're very positive. When they evaluate someone who is clearly worse, they
are up here, but when they evaluate someone of exactly the same status, they're the
most negative. So both these curves have the same shape: very positive at the
extremes, most negative when A and B are about the same status, and then bouncing
back, whether you measure status in edits or in barnstar difference.
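The binned analysis behind these curves can be sketched as follows, assuming a simple record format of (status of A, status of B, vote sign) that is mine, not the dataset's:

```python
# Sketch: bin votes by the voter/candidate status difference (e.g., edit
# counts) and compute the fraction of positive votes in each bin; the
# U-shape described above would show up as high rates at the extremes
# and a dip near zero difference.

def positive_rate_by_status_gap(votes, bins):
    """votes: list of (status_A, status_B, sign) with sign in {+1, -1}.
    bins: list of (lo, hi) half-open intervals over status_A - status_B.
    Returns {bin: fraction of positive votes in that bin}."""
    rates = {}
    for lo, hi in bins:
        in_bin = [s for a, b, s in votes if lo <= a - b < hi]
        if in_bin:
            rates[(lo, hi)] = sum(s == +1 for s in in_bin) / len(in_bin)
    return rates

# Toy data showing the dip near equal status.
votes = [(10, 100, +1), (50, 55, -1), (120, 20, +1), (48, 50, -1)]
print(positive_rate_by_status_gap(votes, [(-200, -10), (-10, 10), (10, 200)]))
# {(-200, -10): 1.0, (-10, 10): 0.0, (10, 200): 1.0}
```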
Right? And, again, this is something we also see a lot in conference reviewing. If I
review someone who is at about the same level as me, I'm the most critical. If I review
someone very senior, I'm generous. And if I review someone who is clearly not at my
level, I'm again generous, right? And this is the baseline. So what this says is that, at
least on Wikipedia, it's not that there is a fixed bar for whether you're good enough to
be an administrator; people tend to evaluate others by comparing them to themselves.
What this shows is that it's the relative assessment that matters, not the absolute
assessment, at least on Wikipedia. These are just two different proxies for status, but
you get the same thing in both cases: very positive, most critical when we're of the
same status, and then this bouncing back.
So that's all I wanted to say. What was I telling you? I was telling you how the network
structure and the signs interact. I showed you some examples of structural balance
theory, which is present if you look at the relations as undirected, and then I showed
you this status theory, which gives very interesting predictions about people's behavior
and explains my data much better; and also at the global level, the structure of the
network obeys status theory more than structural balance theory.
And then two other things I think were interesting. The first is that with simple models
you can predict with almost 90 percent classification accuracy whether an edge will be
positive or negative, just based on the other edges in the network. The other interesting
point is that these models have amazing generalization performance: even though I
have three very different datasets, with three very different applications and very
different mechanisms by which the edges are generated, in the end the same models
seem to apply.
And one thing I haven't talked much about: even for link prediction, it's good to know
who your enemies are. Even if you just want to predict whether there will be an edge,
regardless of sign, it's good to know the negative edges.
Okay. So this concludes my talk. I would be very happy to take questions.
>>: What are your thoughts about applying this to citation databases?
>> Jure Leskovec: Oh, wow. The question there is how I could get at least a bit of
labeled data, right? If I want to do these predictions, I would want a bit of labeled data.
But, yeah, you could think that there are citations I make willingly and citations I make
unwillingly. It would be interesting to see what happens there.
>>: For example, there might be a way to figure out what are some classic papers that
most people refer to and some people don't as a proxy for a negative link.
>> Jure Leskovec: This point you're opening is also very interesting. There are all these
links that are not there. It's not that I'm in a fight with everyone on Facebook I'm not
connected to; I'm not in a fight with 400 million people while being friends with 600. So
the question is, when does the missing information mean I just haven't expressed my
opinion, versus, no, I really don't like you? So I think that's --
>>: People on Facebook who have many shared friends but no link between them for
two years. What does that mean?
>> Jure Leskovec: Yeah -- no, I agree. It's a good point.
>>: To follow up on what Eric was suggesting, there's research done by Simone Teufel
in Cambridge called argumentative zoning. She did an analysis of when you positively
cite someone and when you negatively cite someone, so that could be good --
>> Jure Leskovec: Okay. Thanks.
>>: [indiscernible] citation because they have a mistake and many people rush to
correct it. [laughter].
>>: Negative link.
>>: Right. One other quick thing: you had the point about the triple-minus triads. And
there's another kind of theory about that, which I wondered if you noticed: the fighting-
in-the-mud theory. You pass two people fighting in the mud, and maybe one of them is
completely at fault and pulled the other into the mud; the other one is just trying to get
out. If you were asked to express your opinion about these two people, and all you saw
were two people fighting in the mud, you'd have a negative opinion of both.
You see this in some forums, where two people on Slashdot have been exchanging
snipes for a long time and get more and more bitter toward each other. You joined the
network late, and all you see are these two very negative people fighting. Maybe one is
blameless, but you just give them both negative signs. You don't like the negativity.
>> Jure Leskovec: That's a good point. Here's one thing, another slide that I skipped.
One thing that none of these theories explains is the following. Here I define the
embeddedness of an edge as the number of triads the edge participates in. What I'm
showing you here, for Epinions and Wikipedia, is that if my edge signs were random,
then the fraction of positive edges as a function of embeddedness would be constant.
What I find instead is that when the endpoints of an edge have fewer friends in common,
the edge is more likely to be negative, and when they have more friends in common, it is
more likely to be positive.
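The embeddedness statistic can be sketched like this. Here embeddedness is computed as the number of common neighbors in the undirected version of the graph, which is an illustrative simplification:

```python
# Sketch: embeddedness of an edge (u, v) = number of common neighbors of
# u and v, ignoring direction; then look at the fraction of positive edges
# at each embeddedness level (random signs would make this flat).

def positive_fraction_by_embeddedness(edges):
    """edges: dict (u, v) -> sign. Returns {embeddedness: positive fraction}."""
    neighbors = {}
    for (u, v) in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    buckets = {}
    for (u, v), sign in edges.items():
        emb = len((neighbors[u] - {v}) & (neighbors[v] - {u}))
        buckets.setdefault(emb, []).append(sign)
    return {e: sum(s == +1 for s in signs) / len(signs)
            for e, signs in buckets.items()}

edges = {("A", "B"): +1, ("X", "A"): +1, ("X", "B"): +1,  # embedded edges
         ("C", "D"): -1}                                   # isolated edge
print(positive_fraction_by_embeddedness(edges))  # {1: 1.0, 0: 0.0}
```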
This says that it's much less costly for me to add a negative edge to someone with whom
I have no friends in common than to someone with whom I do. You can also see that on
Wikipedia this effect is much more pronounced than on Epinions. What this basically
says is that it's not so much a consequence of balance but of social capital and
embeddedness: negative edges are easy to give to people you have nothing in common
with, and you keep positive edges closer together.
And similarly with what you're suggesting: yes, if I see a negative fight here, I just say I
don't like negativity, and I push it away. And, yeah, that's probably also the reason;
especially on the Web there's no problem with three people being enemies with one
another. Whereas, for example, if you study international relations, there you have to
have an opinion about everyone, so these all-minus triads are less common. But I think
it's a good way to think about it.
>>: Imagine a recommendation engine some day running on these sites: when you are
about to say or do something, or to not accept a friend, it gives you an overlay of what
the implications are; if you accept that friend, what it says about the pluses and minuses
across your network, who you like and who you don't like so much anymore.
>> Jure Leskovec: That's a --
>>: Recommendation.
>> Jure Leskovec: Good thought. Basically, to tell you that if you don't accept this
friendship, you'll make enemies of all these people.
>>: Felt a little bit down here and so on.
>>: This seems to relate to things like [indiscernible], which would be a good example,
where you're actually in a competition to upvote things: upvoted enough, you get to the
front page. So a strategy people take is that when they submit things, anyone who
upvotes your submission, you go to their link and you upvote it too, so you start working
in coalitions to create a critical mass of upvoting, of popularity. And there's not a
reciprocal thing on the negative side; there's no reason to downvote people into
oblivion. There's an example, it's anecdotal, and I don't know if anyone's looked at it:
sometimes you can look at someone's history and see all the items they voted on, and
you can vote on their history. So when someone leaves a snarky comment in their
column, they'll downvote that person's comments on everything, without any sense or
reason to it, simply as something malicious. It seems like Reddit is very social. Where
these sites differ, in that they have different motivations, might be interesting.
>> Jure Leskovec: I spent a lot of work trying to find coalitions in Wikipedia. The idea
would be that out there are these sets of people who all vote positively on one another
and who don't like that other set of people, and we found very little --
>>: What's the motivation? On Wikipedia, what's the motivation to upvote everybody
around you or to review them in some way? Reddit has a daily leaderboard; every day
it's a cycle, and whoever gets the most votes, boom, they get the immediate reward for
having been well embedded in the network and having this kind of activity.
>> Jure Leskovec: Okay, so I agree; that would be an interesting thing to study. Maybe
we would see the same thing, especially on the positive-reinforcement side. Even on
Wikipedia, I want people who think like me; I want my editorial policies enforced, and
so on. I think that would be some --
>>: How long do these elections go on? Does this happen at a regular point in time?
>> Jure Leskovec: We had about 2,500 elections, I think from 2004 to 2008; so 2,500,
almost 3,000 elections. Half of them ended with a promotion and half without. And, yes,
though I haven't talked about this, you can study how people make decisions. There is
this person they have to decide on, and there is now a collective decision-making
process; people come and say yes/no, yes/no. You can basically think of it as people in
a room: you ask, what do you think about this person, and let's figure out whether we
accept them or not. So we can study how the outcome of the election changes as
people express their opinions, and so on. We also looked at that, but that's --
>>: Anyone can vote?
>> Jure Leskovec: Anyone can vote. Mostly admins vote, but anyone can. You also find
people who vote in, like, 400 different elections and so on; people who vote in many
different elections.
>>: So I'm wondering if you thought about, when you do your study of the network and
the statistics, the possible effects of double counting, and whether that resulted in any
bias. The one question I thought about during your talk is: you have the same triad, and
I'm wondering, if it generates, say, positive, positive, negative, whether for balance
theory it would generate more negative-negative examples than for status theory. Or
maybe other kinds of bias.
>> Jure Leskovec: When I was looking at this, I knew exactly when each edge appeared,
so I was able to look at it in the evolutionary sense, and I was counting exactly the same
triads in both cases. I always say: this is what has happened, here is a new edge, what
will this new edge be? So I wasn't looking at the static network but at the evolving
network.
>>: Same triad only using that once.
>> Jure Leskovec: Exactly, exactly. So I didn't reuse it. And then of course the question
is what to do with the reciprocated edges; in those cases I would just take the first edge,
and usually the reciprocated edge has the same sign, so it doesn't really matter. But we
tried to be careful about these effects. It's a good point. Okay, thanks a lot.
[applause]