>> Eric Horvitz: It's an honor to have Jure Leskovec with us visiting from Stanford University. Jure is an Assistant Professor of Computer Science at Stanford. We know him as one of our star interns and a former MSR Fellow. He's been pursuing a very interesting, challenging, and rich area: probing large-scale networks, both social communication networks and information networks, aspects of the Web; understanding the evolution of these structures and how information diffuses on them; understanding large-scale structure that you can typically only gain insights about through large-scale analysis. Jure has won numerous awards even during his graduate work, and I'm sure more are to come given how wonderful his papers, his research, and his deep thinking are. He's already had three Best Paper Awards at tier-one conferences and won the ACM KDD dissertation award. He won the KDD Cup in 2003. And I guess he, Andreas, and Carlos Guestrin were involved in the Battle of the Sensor Networks; they won the Battle of the Sensor Networks in 2007. I see in his CV he says he holds three patents. They're all Microsoft patents, I have to say. And he co-chairs the machine learning and data mining track at the upcoming WWW conference. So go ahead, Jure. >> Jure Leskovec: Thanks a lot. The work I'll be presenting today is about networks where edges are positive and negative. This is joint work with Daniel Huttenlocher and Jon Kleinberg from Cornell. We have two papers; one is appearing at this year's CHI and the other at WWW, and I'll try to cover both of them. Here is basically the plan for the talk. If you look at how we study network structure, most work usually just focuses on positive edges, meaning friends, followers, things like that: I analyze the network treating all the edges the same, as if they all say something positive to me.
But generally, edges or links in these networks can be classified into at least two classes. Some of them can represent some kind of friendship, and the others can represent something that is sort of the opposite of friendship. These are the types of networks we want to study. So now if you say, I want a network where I will have these pluses and minuses, the question is how can I infer these pluses and minuses; where can I get the data? One way is to say that users will express these attitudes, these positive and negative attitudes, implicitly. So here's an example. The way users are using these social networking or online social applications could give me clues as to whether a particular connection or action is positive or negative. For example, product ratings on Amazon would be one such thing, and rating other people's reviews as helpful or not would be another. I'll show you examples from Wikipedia, where you have editors who go up for promotion to become administrators, and in order to do this there is a public vote. A positive edge means I voted positively on your promotion, and a negative edge means I opposed your promotion. So this is one -- [beeping]. The other way to infer the sign of an edge would be through text: taking the text and trying to do some kind of sentiment analysis. You could create signed networks from text, and maybe one way to look at this would be this well-known picture from [indiscernible], where conservative and liberal bloggers, or the other way around, link to one another. You see these clusters that don't talk to each other much. Those are networks with implicit edges. What we looked at were networks with explicitly signed edges.
What I mean by that is I wanted to find data that's relatively large and where every edge is naturally signed. I want a graph like that: for every edge I have a positive sign or a negative sign. Here are three such datasets I will be talking about, and I will tell you what they are. The first one is the product review website Epinions. On Epinions people write product reviews, and there is also a social network where you create connections to other people, and every connection is a trust/distrust connection: do I trust this person's product reviews or do I distrust them? So here, when a user creates a connection, they have to put a plus 1, meaning I trust this person, or a minus 1, I don't trust this person. This is given, not something I'm inferring; the user is telling me whether they trust or not. The reason this social network exists is that these trust/distrust relations are used for the rating of the product reviews. The more trustworthy you are, the higher your product reviews are rated, the more people see your product reviews, and the more money Epinions gives you -- Epinions actually shares its revenue with the people who write these reviews. So it's important that you get a lot of trust edges. That's one. The second one is the Wikipedia dataset I was telling you about, where I have this voting network that is completely public. There is a public record of how people have voted. Here a signed edge means: was I in favor of or against somebody becoming an administrator? Voting creates an edge to that person. The last dataset comes from Slashdot, an online technology site, where there's a social network associated with it. You can tag every person as a friend or foe; the idea is, do you like the person's comments or not -- comments on the articles that are posted on the website.
So these are three very different social applications on the Web, where all of them have some kind of network behind them, this network is naturally signed, and users explicitly assign signs to the edges they create. That is what is in common. What is different is, for example, who can see what. On Epinions, other people can only see your trust edges; nobody knows what your distrust edges are. The distrust edges are private to you, while everyone can see your trust edges. On Wikipedia it's again very different, because here everything is public, and the way you vote really has consequences that can ruin someone's standing, if you like. And then Slashdot is basically a bunch of geeks, and it's not clear what these things really are and what it means to tag someone as a foe, and so on. So this is what is similar and what is different. Okay. Now I should tell you what I want. But before that, just quick statistics: I have between 7,000 and 100,000 people and between 100,000 and almost a million edges in these networks. The fraction of positive edges is around 80 percent. So this is the data that I'll be looking at. >>: In the first two networks, it seems like you could also get a situation where most of the trust/distrust edges are directed towards a small subset of people. So it's more bipartite. >> Jure Leskovec: I'll come to all these questions. It's an excellent question; I'll come to exactly how bipartite these things are. Okay. So now I told you where the data is from, and that I'll work with signed networks. What do I want to study? I want to study how the signs and the network structure interact with one another. Are the signs randomly scattered across the network, or is there some pattern?
More interestingly, I want to evaluate what kind of social theories could explain how these network structures and signs interact; and the last thing I want to do is see whether I can accurately predict -- can I build various machine learning models that will tell me whether there is a positive or negative relationship between a pair of people? That's basically the plan for the talk. Okay. So if you now ask how I could go about relating network structure with signs, one basic intuition would be something like this. I could say, you know, a friend of my friend is my friend. So if Eric and I are friends and Eric and Paul are friends, then Paul and I are likely to be friends. That's the first rule. The second one says an enemy of my friend is my enemy, which also makes sense. Or a friend of my enemy is my enemy. Or an enemy of my enemy is my friend. This is something that makes lots of sense. What this is doing is telling me about triples of people: it's telling me how I can complete, how I can infer an edge based on the intermediate person. In social science this is called the theory of structural balance; it goes back to Heider in '46, who was the first to reason about these things. So what I want to reason about now is triples of connected nodes. If I take three connected nodes, it turns out I have four ways to label the triangle with signs, up to symmetry. And if you remember the previous slide, I can now go and ask about these four different triads. So this one is telling me a friend of my friend is my friend: everything is good. This one is telling me, if I close this triangle, an enemy of my enemy is my friend. Or if I close this one, an enemy of my friend is also my enemy.
For this reason I will call these two balanced, because this one captures "a friend of my friend is my friend" and this one captures the other three cases that I showed you on the previous slide. These two I will call unbalanced, because they are inconsistent with the intuitions from the previous slide. For example, if you think about real life, maintaining such a relationship in your social network will be quite hard for you, because you are friends with two people that don't like one another. Right? And here the theory says that if you have three people that don't like one another, sooner or later two of them will join up and go after the third one. That's what structural balance says. So now I can go into my network data, and what I would expect, if this is true, is that I will see more of the balanced structures than I would if the world were random, and fewer of the unbalanced structures, because they're harder to maintain in real data. That's the first intuition. What is nice about the theory of structural balance is that you can show the following theorem. If your network is composed only of balanced triads -- only of such structures -- then, as the earlier question was getting at, your network has to look like this: it essentially has to be bipartite in the sense that you have two coalitions, A and B, where you only have positive edges inside each of the two coalitions and negative edges crossing between them. This is what balance implies. So if my network is composed of balanced triads -- if my network obeys "a friend of my friend is my friend" and the other three rules -- then globally my network will look like that. That's the cool thing. >>: One of the details that came out of Cartwright and Harary, the structure, the parameters, too.
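The balanced/unbalanced split can be checked mechanically: a triangle is balanced exactly when the product of its three edge signs is positive. Here is a minimal Python sketch (the function name is mine, not from the talk):

```python
def is_balanced(signs):
    """signs: the three edge signs of a triangle, each +1 or -1.
    Balanced iff the product of the signs is positive: +++ (all friends)
    and +-- (two friends with a common enemy) are balanced;
    ++- and --- are unbalanced."""
    return signs[0] * signs[1] * signs[2] > 0

# The four sign assignments, up to symmetry:
for signs in [(1, 1, 1), (1, 1, -1), (1, -1, -1), (-1, -1, -1)]:
    print(signs, "balanced" if is_balanced(signs) else "unbalanced")
```

This reproduces exactly the classification above: the two configurations with an even number of minus edges are balanced, the other two are not.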
>> Jure Leskovec: No, it's easy -- if your network has such structures, then you can always put the nodes in two sets such that either this happens, or, if you have no negative edges, you just have a network of positive edges. The idea is that from the local rules you can say something about the global network structure. >>: I notice all the links are undirected. >> Jure Leskovec: Sorry? Right now I'm assuming the edges are undirected. Actually, the way this was developed assumed the graph is complete, so everyone has an edge to everyone else, and then you get something like that. But it generalizes even if your graph is not complete. Okay. So let me now tell you about one thing that is a bit more specific. All our datasets, the domains we study, are naturally directed. One way to apply the theory I just showed you would be to forget edge directions, apply balance theory, and see how the signs and edges correspond. But there's also a different interpretation of these positive and negative edges, which you can trace back to this paper by Guha et al., where you say the following: every edge tells me something about the status of people. If A gives a positive edge to B, this means that B has a higher status than A. If A gives a negative edge to B, this means that B has lower status than A. So it's not really about friend and foe; it's about status, how good someone is. This says B is better than A, and this says B is worse than A, in some kind of status sense. That's another way these positive and negative edges could arise in networks. And here's a small example. A points negatively to X, and X points negatively to B. I can ask what I think would happen here. What structural balance would say is: put a plus here.
Because we have a common enemy, so an enemy of my enemy is my friend. While status theory would say A is bigger than X, and X is bigger than B, so A is bigger than B; so I should put a minus here according to status theory. Balance says plus; status says minus, because the status of A is higher than the status of B, just by transitivity. And the other way around: B has a plus to X, and X has a plus to A. By balance theory I would put a plus here, because all three of us are friends; if I put a minus, there is a person with two friends who don't like one another, and we said before that is bad. So balance would put a plus here, and status would put a minus. Why? Because X is bigger than B, and A is bigger than X, so A is bigger than B: A looks down on B. That would be the prediction of status theory. What I want to do for the rest of the talk is reason about these things and see how people create these edges, because these are two fundamentally different ways of assigning edges. >>: So I imagine an extended balance theory where you have larger structures that are stable. >> Jure Leskovec: Sure, yes. It's a good point. >>: Identify those structures; that would provide a kind of scaffolding that would hold friendships in place even if people were enemies, and you could learn those scaffolds. >> Jure Leskovec: Actually, I will show you a bit about this. But one way to generalize to larger structures is to say I want these larger structures -- longer cycles -- to have a positive product of signs. So minus times minus times plus is a positive thing. That's how we generalize to bigger structures. So for the rest of the talk I want to look at how these two theories align with the network data.
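The two contrasting predictions for a two-hop path can be written down directly. Assuming edge signs of +1/-1 along the path A → X → B, balance predicts the closing sign that makes the triangle's sign product positive, while status sums the signs along the path to compare statuses. A sketch (the function names are mine):

```python
def balance_prediction(s1, s2):
    """Predicted sign of the closing edge A-B: the sign that makes
    the triangle's product of signs positive."""
    return s1 * s2

def status_prediction(s1, s2):
    """Predicted sign of the edge A -> B given the path
    A --s1--> X --s2--> B. A positive edge raises the target's status
    by 1, a negative edge lowers it; A gives B a plus iff B's status
    exceeds A's. Returns 0 when status theory is indifferent."""
    diff = s1 + s2                      # status(B) - status(A)
    return (diff > 0) - (diff < 0)      # sign of the difference

# The example from the talk: A -(-)-> X -(-)-> B.
print(balance_prediction(-1, -1))   # balance: enemy of my enemy -> +1
print(status_prediction(-1, -1))    # status: A > X > B -> -1
```

The two theories disagree exactly when the two path signs are equal in the minus case or mixed, which is what makes the comparison against data informative.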
I don't want to just tell you which one is right; I want to go into a bit more detail on how these theories are reflected in the data and compare them. Why do I want to do this? Because it gives me insights into how people use these linking systems. Nobody tells them this is about status or this is about balance; it's "is this a friend or a foe," or "do you want this person to become an administrator," or "do you like this person's reviews." So I want to see if these theories say anything about how we use these systems, and as far as we know this is one of the first studies of signed networks that are a bit bigger. That's the plan. So first let me show you the balance case. I'm evaluating structural balance, considering my network to be undirected. What is this big table showing? I have the three datasets, and each row here is a particular triad. This column is the fraction of those triads in the network, so these numbers sum to 1. And this column is the fraction of such triads in the network if I were to randomly assign pluses and minuses: I keep the network fixed, I keep the total number of pluses and minuses the same, and I just shuffle them across the network. This is the proportion of the triads if the signs were randomly distributed over the network, keeping the overall proportions. The first row is the all-plus balanced triad, and I see it much more than what I would expect by chance. So balance gives me a good prediction. Then this is the second balanced triad: two friends with a common enemy. And it is also overexpressed. All of this is statistically significant. The only place where this one is not overexpressed is on Slashdot.
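The comparison against chance can be sketched as a triad census over an undirected signed graph, together with the null model from the talk: keep the graph fixed and shuffle the signs. A rough sketch (my own data layout, not the paper's code):

```python
import random
from itertools import combinations

def triad_census(adj):
    """adj: dict mapping frozenset({u, v}) -> sign (+1 or -1) for an
    undirected signed graph. Returns counts of closed triangles keyed
    by their number of minus edges (0..3)."""
    nodes = set()
    for e in adj:
        nodes |= e
    counts = {0: 0, 1: 0, 2: 0, 3: 0}
    for a, b, c in combinations(sorted(nodes), 3):
        edges = [frozenset({a, b}), frozenset({b, c}), frozenset({a, c})]
        if all(e in adj for e in edges):
            counts[sum(1 for e in edges if adj[e] == -1)] += 1
    return counts

def shuffled_signs(adj, rng=random):
    """Null model: keep the graph fixed, permute the signs across edges,
    preserving the total number of pluses and minuses."""
    keys = list(adj)
    signs = [adj[k] for k in keys]
    rng.shuffle(signs)
    return dict(zip(keys, signs))
```

Comparing `triad_census(adj)` with the census averaged over many `shuffled_signs(adj)` draws gives the observed-versus-chance fractions in the table.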
So here I can almost put a checkmark. These are now the two unbalanced cases; for example, this one means I have two friends who don't like one another, and I see it much less than I would by chance, here and here. So this triad is much less expressed than it would be if the world were random. That's good. And this is the last one, all three minuses, and this one is actually more expressed than I would expect: there are more triples of mutual enemies than chance would predict. So this one is inconsistent with balance theory. That's balance. So far I just threw away the directions of the edges. One way to go now is to also consider the directions of the edges. Now I will change the setting a bit and think about evolving directed networks. Here is what I want to do. I want to think about directed networks, where I can again ask how many of these directed triangles are explained by balance. Here are the 16 possible directed signed triangles. Why 16? For each of the two context edges I have to decide the direction and the sign, so it's two times two, times two times two: 16. This is the edge that we look at: the setting is that this node is creating an edge to this node, and the two edges through X are the context in which it is created, so there are 16 different contexts. Now, if I use structural balance theory the way it is suggested -- forget about the direction of the edges and predict -- in this case I would predict a plus. This is what balance would say. I can then go to the data and ask whether this directed configuration is more or less expressed than expected, and I would find that it is. For the second one here I would put a minus; I have a plus and a minus, so I would put another minus here. These are the predictions of balance theory.
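The count of 16 contexts follows from two binary choices per context edge, and balance's direction-blind prediction is just the product rule. A quick enumeration (the labels are my own):

```python
from itertools import product

# Each of the two context edges between {A, B} and X has a direction
# (2 choices) and a sign (2 choices): 2*2 * 2*2 = 16 contexts.
contexts = list(product(['A->X', 'X->A'], [1, -1],
                        ['X->B', 'B->X'], [1, -1]))
print(len(contexts))  # 16

def balance_prediction(context):
    """Balance ignores edge direction: predict the closing sign that
    makes the product of all three signs in the triangle positive."""
    _, s1, _, s2 = context
    return s1 * s2
```

Checking each of the 16 predictions against the data is exactly the eight-out-of-sixteen comparison described next.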
So it makes sense, right? Now I can go into the data and start checking how many of these predictions are correct, and it turns out that only half of them are. What I mean by that is, for example, this configuration is overrepresented in the data -- it is a balanced triad and it occurs more often than expected -- but some other one, for example this one, is underrepresented. It turns out that when you have two minuses going this way, people like to put a minus here, not a plus. Here's another example, where pluses go this way and people like to put a minus here, not a plus -- which seems to align with status theory: it says you are better, you are better, so yes, I'm better than you. So at least if you look at the data, here are two examples where balance theory makes mistakes, and these are the other cases where balance gives you wrong predictions. >>: One could imagine balance theory being extended a little bit, folding in notions that separate the actual reality of plus/minus from communicative intent. It's more stable to have a friend not like people who are your friends, and vice versa, if you don't communicate it. So there would be a bias not to express in the graph information that holds in reality. You can dig into a bunch of these things and come up with stories that involve extra-network or non-communicated intentions. >> Jure Leskovec: So I agree that there are all these evaluations that people are just not willing to give, because there is some cost to doing so. >>: Right, a biased distribution -- it comes down to certain biases; it's not a random deletion of the edges. >> Jure Leskovec: I agree with what you say, but on the other hand, for example, with Wikipedia it's very costly to say, yes, I oppose you, because everyone else sees that.
You would say people only give a negative vote if they feel strongly about it. On Epinions, no one sees whom you distrust; implicitly you could distrust everybody and nobody would know. So if these two things are similar -- if the results from Epinions and Wikipedia are similar -- then that would answer your question. >>: Where do you get the negative edges when they're not public? >> Jure Leskovec: Sorry? >>: In the cases where they're not public, where did you get the data for the negative edges? >> Jure Leskovec: From the database, from Epinions. I was able to crawl Wikipedia and crawl Slashdot, but I had to get the data from Epinions directly. >>: From them. >> Jure Leskovec: Yeah. You can't obtain it through the Web interface. Correct. So, eight out of 16. The question is whether we can do better, or whether status theory can do better. The answer is yes, and now let me tell you about the status theory. Here's how we'll think now: all my edges are directed, and edges are created over time. We are still reasoning about triples of nodes, and the way I will reason is: X creates some edges, and then node A evaluates node B. This is what I want to figure out now: what can I say about the sign of this edge given the signs that X gave? I want to consider what happens here in the context of X; X is giving me the context. By context I mean the directions of these two edges and the signs on these two edges, and I would like to know how the sign from A to B depends on what X did before. The point is that instead of studying this as an independent event, I will condition on the action of X. I will say X did this particular thing, now A evaluates B: what can I say about the behavior of A? That's the idea. These edges are embedded; I have this context. And the same 16 triads I showed you before are now the 16 different contexts.
So this is the edge that gets created, here's the node X, and this is the context: the directions and the signs. And here are the 16 different contexts. Okay. I'll call these contextualized links, because this red edge is put into this particular context. That's how I look at this. Now I need to make a few definitions. The first thing is that different users give different proportions of signs. So I will define what I call the generative baseline of a user: the fraction of pluses that the user gave -- how positive they are on average, if you like. And then I have something I call the receptive baseline, which says what fraction of pluses they received -- how popular they are, in some sense. These are the baselines: for each node I just count what fraction of the edges pointing to them are positive and what fraction of the edges they give are positive. Now, what I want to know is how your behavior changes from the baseline based on the context. If I know what X did, how will your behavior change from this baseline behavior? What is important to note here is that I'm not interested in modeling what you will do, but in modeling how your behavior will change from this baseline. The measure that I introduce, we call it surprise, and it will tell me how much a user's behavior deviates from the baseline when they are in this particular context. The idea is: what does A tend to do in this situation versus what does A do overall, when I don't care about the directions and signs here? So I want to know how the behavior changes in this context versus over all possible contexts. So here's how I compute things.
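The two baselines can be computed in one pass over a signed edge list. A minimal sketch (the data layout and names are my assumptions):

```python
from collections import defaultdict

def baselines(edges):
    """edges: iterable of (src, dst, sign) with sign in {+1, -1}.
    Returns two dicts: the generative baseline (fraction of pluses
    each user gives) and the receptive baseline (fraction of pluses
    each user receives)."""
    given = defaultdict(lambda: [0, 0])      # user -> [pluses, total]
    received = defaultdict(lambda: [0, 0])
    for src, dst, sign in edges:
        given[src][0] += (sign == 1)
        given[src][1] += 1
        received[dst][0] += (sign == 1)
        received[dst][1] += 1
    gen = {u: p / n for u, (p, n) in given.items()}
    rec = {u: p / n for u, (p, n) in received.items()}
    return gen, rec
```

With per-user baselines in hand, the surprise measure below compares a user's behavior in a specific context against these context-free averages.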
Let this be all the instances of a particular contextualized link t. The idea is, if this is the contextualized link, I take all pairs of edges that participate in such a structure. The generative baseline for t is the sum of the generative baselines of all the nodes A, which is the expected number of pluses I would see in this context if everyone behaved according to their baseline. And likewise, the receptive baseline for t is the sum of the receptive baselines of all the nodes B: again, the expected number of pluses that would occur in this particular case. This is what would happen if people behaved independently of the context. Now, what I define as surprise is this: I also count how many positive edges I really see in this case, and then I quantify how far the observed count is from the expectation. I measure the number of standard deviations by which the number of positive links differs from the generative baseline defined here. Similarly, the receptive surprise is the number of standard deviations by which the number of positive links deviates from the receptive baseline. So I'm quantifying how far the observation is from what is expected, and the unit in which I measure this is standard deviations. If the surprise is positive, I see more positive links than I expect; if the surprise is negative, I see fewer positive links than I expect; and if a surprise value is around 6, it is six sigmas away from the expectation -- everything with surprise more than six is highly significant. So let me now show you two examples. I think I've shown them before; here they are. This is the first example: here's X, and the context is two negative edges, one pointing up and one pointing this way, and I want to reason about what would happen here. What I expect here is the following.
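Under an independent-Bernoulli model for each closing edge, the generative surprise is the observed count of pluses minus the baseline expectation, divided by the standard deviation; the receptive surprise is analogous with receptive baselines. A sketch under those assumptions (names are my own):

```python
import math

def generative_surprise(instances, gen_baseline):
    """instances: list of (A, observed_sign) pairs for one contextualized
    link type t, where A is the node giving the closing edge.
    gen_baseline: dict of per-user fractions of pluses given.
    Returns the number of standard deviations by which the observed
    count of positive closing edges deviates from the baseline
    expectation, treating each edge as an independent coin flip."""
    observed = sum(1 for _, sign in instances if sign == 1)
    expected = sum(gen_baseline[a] for a, _ in instances)
    variance = sum(gen_baseline[a] * (1 - gen_baseline[a])
                   for a, _ in instances)
    if variance == 0:
        return 0.0
    return (observed - expected) / math.sqrt(variance)
```

Receptive surprise is the same computation with (B, observed_sign) pairs and the receptive baselines; the six-sigma remark above corresponds to a return value of about 6.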
I expect A to be more negative than the generative baseline of A. Why? Because A is high status and B is low status; if my status theory is correct, A will be more negative than usual. And if I now ask about the receptive baseline, looking from the viewpoint of B: because B is low status, B is more likely to receive a minus than a plus. So the receptive surprise should also be more negative than the receptive baseline of B. So the generative surprise of A should be negative and the receptive surprise of B should be negative: they both should be more negative than usual. From the viewpoint of A, A is more negative than usual; from the viewpoint of B, B is more likely to get a negative edge than usual. Why? Because A is high status. Okay. And here's the same example flipped around, where I would expect the opposite. Imagine that this edge is flipped, so B is the lowest status and A is the highest status. If it were evaluated this way, B is more likely to give a plus, so the generative surprise should be positive; and because A is high status, A should be more likely to receive this plus. So these are two basic examples. But here is a different example that shows why this is a bit complicated. Let's consider what we call joint positive endorsement: X gives pluses to both A and B, and now I want to figure out what will happen here. What we see in our data is the following. We see that A is more likely to be positive than the baseline -- the generative surprise of A is positive, meaning A is more positive than usual. But the receptive surprise of B is negative. What this means is that, for B, this edge is more negative than what B usually gets.
So in some sense I almost get a contradiction: here I'm saying A is more positive than usual, and here I'm saying this edge is more negative than usual. >>: How do you know which one is A and which one is B? >> Jure Leskovec: Because I have the direction of the edge. >>: Oh, I see. >> Jure Leskovec: A is the guy giving the edge; B is the guy receiving the edge. Now, let me give you a story that shows how this is consistent. Think of a soccer team. I can create a signed network on the soccer team by asking every person how their skill relates to the skill of every other person. This way I can build a signed directed network where I put pluses to people who are better than me and minuses to people who are worse than me. That gives me a network. I'm assuming there's an ordering on the skill of the players; I ask every player who is better than you and who is worse than you, and this gives me a signed network. Now, imagine that I haven't yet asked what A thinks about B. All I know is what X thinks about the two of them, and the question is what I can infer about A and B from the information that X gave me about them. That's the setting. So what can I infer about the answer of A? Here's the first inference; I can think this way. Because B has a positive incoming edge, B is high status -- B is a good soccer player. So A's evaluation should be more likely to be positive than if A were evaluating a random guy. Here's a small picture: if this is the whole span, here is the best player and here is the worst player, and the average player sits in between. Because B got a plus, B is better than average, so B is somewhere here. Now A is asked: how will I evaluate B? Yes, A is more likely to be positive toward B than toward a random guy. That's what all this is saying.
Because B got a plus, B is a good player. So, yes, A will be more positive than the baseline, compared to evaluating an average player. That's the generative side of things. Now the receptive side of things: how B sees this. B reasons as follows: because A has a positive edge, A is a good player, so the evaluation A gives to me is less likely to be positive than what I would get from a random guy. Again the same picture: here's the average player, the baseline. I know A is better than average, so A is here. And no matter where on this axis B is, B is less likely to receive a positive edge from this one than from this one. From the point of view of the receiver of the edge, this evaluation is less likely to be positive than what they would get from a random guy. So that's the story of how you can get different predictions based on the viewpoint -- whether you look at the edge from the point of view of the receiver or of the giver. So now I have to tell you how status theory makes predictions. Here's the idea. I will assign X a status of 0. If X points positively to a node, I assign that node a status of 1; otherwise I assign it a status of minus 1. And if the edge is in the other direction, I can reverse the direction of the edge and flip the sign: a plus this way means a minus that way. So in this case both A and B have status 1 -- they are both better than X; that's why they have status 1. And now I need to say when the surprise value, the deviation from the baseline, is consistent with status. Here are the two rules. From the generative side of things, from the viewpoint of A, this is how I think.
I say the status of B has to have the same sign as the generative surprise of A. Basically it says if B is high status, A should be more positive. If B is low status, A should be more negative than they usually are. So this is what this is saying. And then from the receptive side, from the viewpoint of B, I'm saying the status of A should have the opposite sign as the receptive surprise. So I'm saying if I'm B and A is high status, that means I'm more likely to receive a minus. If A is low status, I'm more likely to receive a plus, because it comes from someone who is below me. And if A is above me, I'm likely to receive a minus. So here I get opposite signs, and here I get the same signs. So this is how the whole thing works. Right, so in this case this is what I would expect. The generative surprise of A has to be positive. And the receptive surprise of B has to be negative. Right? They both have status of 1. So for this to be consistent I have to have equal signs here, so a plus here, and a different sign there, so a minus here. So now here's a huge table with results. So bear with me and I'll explain it nicely. These are all the 16 different triads. This is the number of times this triad occurs. This is the probability that the triad is closed with a plus. Remember, the baseline is about 80 percent. And then these are my surprise values. This is the number of standard deviations the behavior deviates from the baseline. Minus means, you know, 50 standard deviations more negative than what is expected. Plus means this many standard deviations more positive than what is expected. And then these are the numbers. And here are the predictions. These are the predictions from structural balance. These are the predictions based on status theory. Okay. So let me show you an example. So here are now the 16 things that I have.
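The surprise values in the table are z-scores against a binomial baseline. A minimal sketch of that computation; the 80 percent baseline rate and the counts below are illustrative, not from the actual datasets:

```python
import math

def surprise(num_pos, num_total, baseline):
    """How many standard deviations the observed number of positive
    closures deviates from a binomial baseline with rate `baseline`."""
    expected = num_total * baseline
    std = math.sqrt(num_total * baseline * (1 - baseline))
    return (num_pos - expected) / std
```

A positive value means the triad closes with a plus more often than the baseline predicts; a negative value, less often.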
So, for example, if I take this particular triad, this is number 11, I can go to line 11 here, right? I see that A is more negative than usual and B sees this more negatively than usual. So the generative surprise of A is negative and the receptive surprise of B is negative. And this is the opposite of what balance would say. Balance would say both have to be positive, while our theory says they should both be negative. Why? Because balance says, here's a plus, here's a plus, so here should be a plus. They both should be more positive than they usually are. What I see from here is that they're both more negative than they usually are. So this edge gets closed by a minus. This is something that status theory predicts but balance theory doesn't. Okay. Do people see that? What this is saying is B is greater than X, and X is greater than A. So the edge going this way, it should be -- wow, wrong -- the edge going this way should get a minus, right? Because this is the lowest guy and this is the highest guy. This way should be the minus, which is against what balance says, but it is what we see in the data. So my status theory gives me the right prediction. Balance gives me the wrong prediction. And the point I made before was that balance gets around seven or eight of these things correct. Status gets 13 or 14 of them correct. So this is one example. And here's another example that I was showing you before, right? It's this way. What it says is B is less than A. So status would say put a minus here. Balance would say put a plus; it would say two friends with a common enemy. If you go to this particular line here, again, you would find that people put a minus in this example. They behave more like the status theory and not like the balance theory. Okay. And then the last thing that I want to show you, okay, what are the mistakes?
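The per-triad predictions of the two theories can be sketched as follows. This is my own illustrative encoding, not code from the talk: statuses are relative to X (status 0), and an edge pointing back toward X flips both direction and sign, as described above.

```python
def node_status(sign, points_away_from_x):
    """Status of a node relative to X. An edge X -> node with '+'
    gives status +1; an edge node -> X flips direction and sign,
    so node -> X with '+' gives the node status -1."""
    return sign if points_away_from_x else -sign

def status_prediction(status_a, status_b):
    """For the closing edge A -> B: the generative surprise of A
    should share the sign of B's status; the receptive surprise of B
    should have the opposite sign of A's status."""
    return {"generative": status_b, "receptive": -status_a}

def balance_prediction(sign_xa, sign_xb):
    """Structural balance, signs treated as undirected: close the
    triad so the product of the three signs is positive."""
    return sign_xa * sign_xb
```

For the triad where X gives both A and B a plus (both have status 1), this yields a positive generative surprise for A and a negative receptive surprise for B, matching the worked example above, while balance predicts a plus.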
>>: The status and the applications that you're looking at -- it's about ability, it's about opinions. So what's the match to the semantics there? >> Jure Leskovec: That's a very good question. So, for example, the last slide I will show you is about voting on Wikipedia. Why should voting be about status? There should be some bar: if you're above that bar, you're above the administrator bar; if you're below, you're below it. I'll basically show you a slide where I have some kind of proxy for status and show how people vote based on the difference in status. So, wait, yeah, it's a good question. Why should there be status if these things are not about status? They're about opinions or preferences or trust, and there's no inherent status in there. But my point is that in these systems people behave like they're assessing the status of one another, and not necessarily using this friend-of-a-friend type of reasoning. >>: You can imagine a model mixed with a parameter, right? >> Jure Leskovec: Both examples, yes, exactly. I'll come to that. So here are the four different mistakes that I'm making. This one and this one are t3 and t15, these are t2 and t14, and this is the last mistake. What's worth noticing is that all the examples up here are basically variants of the same thing. I can get from this triad to this one by flipping the direction here and flipping the sign, and I land in this case. If you look at all these examples, they're all the same example in the end from the status point of view. They're all telling me the same thing, right? Because these two pluses here, I can flip them around and make them minuses, and I'm in this case. It's again telling me X is the best, and A and B are worse than X. So from this point of view these are the mistakes that I'm making here, and this last mistake is the case where balance does well.
It's like I should be putting a plus here while what I'm predicting is a minus. These are sort of the mistakes I'm making. So going to Eric's question, the question is, for example, the following. I told you about status. What does status tell me in this situation? So A creates an edge to B. Now B creates an edge back. If my status theory is correct, then B should be negative, because B is greater than A. So this edge should be negative. And similarly this way, right? There should be a plus here. So the question is, when people reciprocate, do they obey this status theory, or balance? Based on balance, the edge should have the same sign: if you're my friend, I should be a friend back. In the status theory, the edge should have a different sign. And basically what I'm interested in is, what is the sign of this edge given that the first edge was positive or that it was negative? And if I look at this, what I find is that reciprocation strongly follows balance. People are more likely to reply with the same sign than with a different sign. And the point here is also that the strongest signal is just reciprocation, and then on top of that you get all these triadic effects. What you also find, from the balance point of view, is that people are more likely to reverse the sign if they participate in an unbalanced triad. If I'm in an unbalanced triad, I'm more likely to try to make it balanced. That's something that happens in the data. So going to the question that Alice had at the beginning, how does this global network structure interact with the links? Now I want to understand what this network globally looks like from the pluses-and-minuses point of view. And let me skip a bit. So here is what I want to say. Basically both theories make predictions about the global network structure.
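Checking which theory reciprocation follows is a one-liner over the reciprocated edge pairs. A hypothetical sketch; the pair data in the test is invented for illustration:

```python
def same_sign_fraction(pairs):
    """pairs: (first_sign, reply_sign) for each reciprocated edge.
    Balance predicts replies with the same sign; status theory
    predicts replies with the opposite sign."""
    same = sum(1 for first, reply in pairs if first == reply)
    return same / len(pairs)
```

A fraction well above one half, as the talk reports, favors balance on reciprocation.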
For structural balance, I already showed you: it tells me there will be coalitions in my network. So my network will look something like that, right? Pluses inside the coalitions, and minuses across. What status theory tells me is that there should be a global status, meaning I should be able to assign every node a number such that positive edges only point from nodes with low numbers to nodes with high numbers, and negative edges point the other way around. I can now go into the data and ask: globally, is my network more like this or more like that? And to sort of scare you with a big table, this is what's going on. This is the fraction that obeys the balance criterion and this is the fraction that obeys the status criterion. Yes, it's less, but here are two ways I can make my data random. Here the idea is: what if I randomly assign the signs in my network? And you can see that then the network is more bipartite than it is in reality. While if I make my network random, it obeys status less than the real network does. So what this basically tells me is that globally, networks obey status more than they obey balance. Even globally, status is more expressed; there's more evidence for the status theory than there is for the balance theory. And then the last thing -- so now I showed you these two theories. One thing that I want to do next is to say, okay, can I take the sign of an edge, hide it, and try to predict it? Given how this edge is embedded and what the signs of the edges adjacent to this particular edge are, can I figure out what I should predict here? And the way I will go about this is very simple. For every edge I will create a feature vector, and this feature vector will just count the types of triads this edge is embedded in. Then I'll train a logistic regression classifier. These are the 16 features I'll be using.
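The global status criterion can be checked mechanically: flip every negative edge into a reversed positive edge, then test whether the resulting all-positive digraph is acyclic, so a consistent status ordering exists. A small self-contained sketch, my own formulation of the criterion described above:

```python
def obeys_global_status(edges):
    """edges: (u, v, sign) triples. A minus from u to v acts like a
    plus from v to u; after flipping, a status ordering exists iff
    the all-positive digraph is acyclic (checked via Kahn's algorithm)."""
    adj = {}
    nodes = set()
    for u, v, s in edges:
        if s < 0:
            u, v = v, u  # flip direction and sign
        adj.setdefault(u, []).append(v)
        nodes.update((u, v))
    indeg = {n: 0 for n in nodes}
    for u in adj:
        for v in adj[u]:
            indeg[v] += 1
    stack = [n for n in nodes if indeg[n] == 0]
    seen = 0
    while stack:
        u = stack.pop()
        seen += 1
        for v in adj.get(u, []):
            indeg[v] -= 1
            if indeg[v] == 0:
                stack.append(v)
    return seen == len(nodes)  # acyclic iff every node was popped
```

For example, a plus from a to b, a plus from b to c, and a minus from c back to a is perfectly consistent with a global status ordering, while the same triangle with three pluses is not.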
If I want to predict this edge, then I count how often it occurs in this context, in this context, in this context and so on, and I get a feature vector of length 16. And here is an example of the logistic regression. I subsampled my data so that random would give me 50 percent, and the point is that using these features I can predict with almost 90 percent classification accuracy. So basically with between 80 and 90 percent classification accuracy I can predict whether a particular edge will be positive or negative. That's the first point here. And let me show you the last thing. The last thing I want to ask is, okay, if I can do this with 90 percent, I also know that all these applications are very different. So what I want to ask is: what if I train my model on Epinions, where people rate product reviews, and then use this model to predict how people will vote on Wikipedia? Right, can I do that? Or I will train on Slashdot and predict how people trust each other on Epinions. And if I do this, here is what this table is showing me. I'm training on the row dataset and I'm evaluating on the column dataset. Okay. So if you look at, let's say, the first line: when I train on Epinions and I evaluate on Epinions, I do best. Right? But even if I evaluate the Epinions model on Slashdot or on Wikipedia, I do almost as well as if I had trained the model on Slashdot or on Wikipedia. So what is the point? The point is that these models have amazing generalization performance. I get almost the same performance on Wikipedia regardless of what I use for training, and on Slashdot I do about the same regardless of whether I use the Slashdot dataset to infer how people create signs or whether I use Epinions or Wikipedia, and similarly here. I lose about one percent in classification accuracy.
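The 16-dimensional feature vector can be built by typing each common neighbor's two links by direction and sign (2 directions times 2 signs for each of the two links gives 16 triad types). A minimal sketch under my own edge representation of (source, target, sign) triples; the talk's actual feature ordering may differ:

```python
from collections import Counter

def triad_features(u, v, edges):
    """Triad-type counts for the directed signed edge (u, v): for each
    common neighbor w, classify how u and v each connect to w by
    direction (out/in) and sign (+/-), giving 4 x 4 = 16 types."""
    signs = {(a, b): s for a, b, s in edges}

    def link_type(a, b):
        # how node a is linked to node b, e.g. 'out+' or 'in-'
        if (a, b) in signs:
            return "out+" if signs[(a, b)] > 0 else "out-"
        if (b, a) in signs:
            return "in+" if signs[(b, a)] > 0 else "in-"
        return None

    feats = Counter()
    nodes = {a for a, b, s in edges} | {b for a, b, s in edges}
    for w in nodes - {u, v}:
        tu, tv = link_type(u, w), link_type(v, w)
        if tu and tv:  # w is a common neighbor of both endpoints
            feats[(tu, tv)] += 1
    return feats
```

These counts are exactly what would be fed, per edge, into a logistic regression classifier as in the talk.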
And what this is basically telling me is that even in these very different applications -- here people vote and everything is public, and here only positive stuff is public and nobody sees the negative stuff -- the model is the same. Right? The model does as well regardless of what application I train it on. >>: Did you do [indiscernible] to see what [indiscernible] what the 16 is related? >> Jure Leskovec: Yes, I skipped these slides. >>: Can you tell me from your memory? >> Jure Leskovec: So it turns out that -- let me tell you offline, okay? So this is the first -- the surprising result here is that regardless of which dataset I train on, I can do as well on a dataset as if I had trained on that dataset itself. >>: But how do you explain the variation between the columns? Why is Wikipedia so much less predictable than Epinions? >> Jure Leskovec: That's a great question. I don't have a good answer. But the point is exactly as you said. It just seems, yes, Wikipedia is a harder problem than Epinions and Slashdot, because I always do worse there. I noticed that regardless of what kind of model I was using, the performance on Wikipedia was worse. Now, why is that? >>: Because it does not apply as well. >> Jure Leskovec: So here I was using also -- I skipped a few things. I can go back, if you like. One thing is that when I do this logistic regression, I can train these coefficients, or I can take the coefficients directly from the theory. For example, I can apply the balance theory and say these are the logistic regression coefficients that balance theory would suggest. If the edge I'm predicting is embedded in a plus/plus or in a minus/minus context, put a plus. So I put 1s here. If my edge is embedded in a mixed case, put a minus, because these are the unbalanced cases. So I put minuses there. Now what I'm showing you here, for example: these are the coefficients the theory suggests and these are the learned coefficients.
Okay. And you see that the signs are correct. The only case where it's not correct is this minus/minus, the all-minus triad, where we also saw that people put minuses. And similarly for status theory, I can compare what the handcrafted logistic regression model does and what the learned coefficients are. And the slide that I also skipped is the following. So here is the point. For example, this is deterministic balance, where I set the logistic regression coefficients by hand, and this is where I set them by learning. And you see that I gain very little. >>: What are the different colored bars? >> Jure Leskovec: That's a good question. The different colored bars are how well embedded the edge is -- how rich my feature vector is. The more embedded, the richer. So this 25 means that the edge participates in at minimum 25 different triads, so my feature counts are very good. While here at 0, most of the feature counts are 0 because the edge doesn't participate in many triads. The idea was to see how much of that effect was there. What's the point of this slide? When I have this here, it means deterministic: I set the coefficients by hand, either plus 1 or minus 1. Learned means I've learned the coefficients. And the point is that there's not much difference between deterministic and learned. So even the plain theories do well. And you can go and compare the predictions from the status theory and the balance theory. This is deterministic balance, this is deterministic status. And, for example, up here and also up there, status does a bit worse than balance in terms of prediction.
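Hand-setting the balance coefficients amounts to scoring an edge's triad contexts with fixed plus-or-minus-one weights instead of learned ones. An illustrative sketch, assuming features keyed by the (direction, sign) types of the two links from a common neighbor, e.g. ('in+', 'out-'); this encoding is mine, not from the talk:

```python
def deterministic_balance_score(features):
    """features: {(type_u, type_v): count}, where each type string ends
    in '+' or '-' giving that link's sign. Balance theory weight:
    +1 when the two signs agree (balanced closure with a plus),
    -1 when they disagree. The sign of the total is the prediction."""
    score = 0
    for (type_u, type_v), count in features.items():
        sign_u = 1 if type_u.endswith("+") else -1
        sign_v = 1 if type_v.endswith("+") else -1
        score += sign_u * sign_v * count
    return score
```

Replacing these fixed weights with learned logistic regression coefficients is exactly the comparison on the slide; the talk's point is that the two perform almost the same.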
>>: Is it true that Epinions and Slashdot are based on less information, people give less thought to them, while Wikipedia people take very seriously and make it their life's mission, so it's harder to predict based on superficial -- >> Jure Leskovec: That could be an explanation. So the last slide, which goes to Eric's question, is the following: how do people vote on Wikipedia? What am I asking here? How likely am I to vote positively on someone? And here I'm using some notion of status. I'll tell you exactly what this means. But here it will be when A and B have about the same status, here when A has lower status than B, and here when A has higher status than B. And A is evaluating B. So here is what I see. This is the probability of voting positively versus the difference in status, and one way to quantify status on Wikipedia is to say how many edits you have made. And what this says is the following: when B, the guy who goes up for administrator, made many more edits than A, A is very likely to be positive -- more positive than the baseline. When they made about the same number of edits, A is the most negative. And when A made many more edits than B, again A is more likely to be positive. And here is a different way to think about status on Wikipedia. People can give one another these barn stars. It means that I come to some user, I edit his page, and I say, I give you a barn star. I just put some image there. That's called a barn star. So now I can say the number of barn stars is a measure of your status. And here I'm plotting the barn star difference. And I see the same behavior, right? When A is evaluating someone who is clearly better than they are, they're very positive. When they evaluate someone who is clearly worse than they are, they're up here. But when they evaluate someone who is of exactly the same status, they're the most negative.
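The curves described here can be reproduced by binning votes by the status gap and computing the fraction of positive votes per bin. A hypothetical sketch where status is an edit count and the vote data is invented for illustration:

```python
from collections import defaultdict

def fraction_positive_by_gap(votes, bin_width=100):
    """votes: (evaluator_status - candidate_status, sign) pairs,
    e.g. status = number of Wikipedia edits. Returns the fraction of
    positive votes for each status-gap bin."""
    bins = defaultdict(lambda: [0, 0])  # bin -> [positives, total]
    for gap, sign in votes:
        b = gap // bin_width
        bins[b][1] += 1
        if sign > 0:
            bins[b][0] += 1
    return {b: pos / tot for b, (pos, tot) in bins.items()}
```

Plotting the returned fractions against the bins would show the dip the talk describes: lowest positivity where the gap is near zero, bouncing back on both sides.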
So both these curves, if I emphasize, have the same shape. They're very positive, they get the most negative when A and B are about the same status, and then they bounce back. Both in terms of status measured by edits, and in terms of status measured by barn star difference. Right? And, again, this is something that we also see a lot in conference reviewing. If I review someone who is at about the same level as me, I'm the most critical. If I review someone very senior, I'm generous. And if I review someone who is clearly not at my level, I'm again generous, right? And this is the baseline. So what this again says is that, at least on Wikipedia, it's not that there is a bar -- are you good enough to be an administrator or not. Basically, people tend to evaluate other people by comparing them to themselves. So what this shows is that it's the relative assessment that matters, not the absolute assessment. At least on Wikipedia. These are just two different proxies for status, but you get the same thing in both cases: very positive, the most critical when we're of the same status, and then this bouncing back. So that's all I wanted to say. So basically, what was I telling you? I was telling you how the network structure and the signs interact. I showed you some examples of the structural balance theory, which is present if you look at the relations as undirected, and then I also showed you this status theory, which gives very interesting predictions about the behavior of people and explains my data much better. Also at the global level, the global structure of the network obeys status more than it obeys structural balance.
And then the other two things that I think were interesting. The first one is that with simple models you can predict, with almost 90 percent classification accuracy, whether an edge will be positive or negative, just based on the other edges in the network. Another interesting point is that these models have basically amazing generalization performance, meaning that even though I have these three very different datasets, with three very different applications and very different mechanisms of how the edges are generated, in the end the same models seem to apply. And one thing that I haven't talked much about: even if you do link prediction, it's good to know who your enemies are. So even if you want to predict whether there will just be an edge, regardless of the sign, it's good to know what the negative edges are. Okay. So this concludes my talk. I would be very happy to take questions. >>: What are your thoughts about applying this to, say, citation databases? >> Jure Leskovec: Oh, wow. So the question there is how I could get at least a bit of labeled data, right? If I want to do these predictions I would want to have a bit of labeled data. But, yeah, you could think that there are citations that I make willingly and there are citations that I make unwillingly. So it would be interesting to see what happens there. >>: For example, there might be a way to figure out what are some classic papers that most people refer to and some people don't, as a proxy for a negative link. >> Jure Leskovec: So this point that you're opening I think is also very interesting. The question is, there are all these links that are not there. I'm not in a fight with everyone on Facebook that I'm not connected to. I'm not in a fight with 400 million people just because I'm friends with 600. So the question is, when does the missing information mean I just haven't expressed my opinion, versus, no, I really don't like you?
So I think that's -- >>: People on Facebook who have many shared friends but no link between them for two years. What does that mean? >> Jure Leskovec: Yeah -- no, I agree. It's a good point. >>: To follow up on what Eric was suggesting, there's research done by Simone Teufel in Cambridge called argumentative zoning. She did analysis of when you positively cite someone and when you negatively cite someone, so that could be good -- >> Jure Leskovec: Okay. Thanks. >>: [indiscernible] citation because they have a mistake and many people rush to correct it. [laughter]. >>: Negative link. >>: Right. One other quick thing. You had the abundance of triple minuses earlier. And there's another kind of theory about that which I wondered if you noticed, which is the fighting-in-the-mud theory. You pass two people fighting in the mud, and maybe one of them is completely at fault and pulled the other one into the mud; the other one is just trying to get out. If you were asked to express your opinion about these two people, and all you saw were two people fighting in the mud, you'd have a negative opinion of both. You have it in some forums where you have two people on Slashdot who have been exchanging snipes for a long time and get more bitter towards each other. You joined the network late and all you see are these two very negative people fighting. Maybe one is -- and you just give them both negative signs. You don't like the negativity. >> Jure Leskovec: That's a good point. Here's one thing. This is another slide that I skipped. But basically, one thing that none of these theories explain is the following example. What I'm asking here: embeddedness of an edge is just the number of triads the edge participates in. What I'm showing you here, for Epinions and Wikipedia: if my edges were random, then the fraction of positive edges as a function of their embeddedness would be constant.
What I find is that when an edge has fewer friends in common, it's more likely to be negative, and when an edge has more friends in common, it's more likely to be positive. This says that it's much less costly for me to give a negative edge to someone with whom I have no friends in common than to someone with whom I have friends in common. And you can also see that, for example, on Wikipedia this is much more expressed than on Epinions. So what this basically says is that it's not so much a consequence of balance; it's a consequence of social capital and embeddedness. These negative edges are easy to give to people you don't have anything in common with, and you keep positive edges closer together. And, similarly, the way you are suggesting it is to say, yes, if I see a negative fight here, I just say I don't like negativity, I push it away. And, yeah, that's probably also the reason. Especially on the Web, there's no problem with three people being enemies with one another. While, for example, if you study international relations, there you have to have an opinion about everyone, so these all-minus triangles are less expressed. But I think it's a good way to think about it. >>: Imagine a recommendation engine some day running on these sites where, when you are ready to say something or do something -- say you would not accept a friend -- it gives you an overlay of what the implications are: if you accept that friend, what it says about the pluses and minuses across your network, who you like and don't like so much anymore. >> Jure Leskovec: That's a -- >>: Recommendation. >> Jure Leskovec: Good thought. Basically to tell you: if you don't accept this friendship, you'll make enemies with all these people. >>: Felt a little bit down here and so on. >>: This seems to relate to things like [indiscernible], which would be a good example where you're actually in a competition to upvote things.
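Going back to the embeddedness plot for a moment: it can be sketched directly by counting each edge's common neighbors (ignoring direction) and computing the fraction of positive edges per embeddedness level. A self-contained illustration with my own (source, target, sign) edge representation:

```python
from collections import defaultdict

def fraction_positive_by_embeddedness(edges):
    """edges: (u, v, sign) triples. Embeddedness of an edge = number
    of common neighbors of its endpoints, directions ignored.
    Returns {embeddedness: fraction of positive edges}."""
    nbrs = defaultdict(set)
    for u, v, _ in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    stats = defaultdict(lambda: [0, 0])  # embeddedness -> [pos, total]
    for u, v, s in edges:
        e = len((nbrs[u] & nbrs[v]) - {u, v})
        stats[e][1] += 1
        if s > 0:
            stats[e][0] += 1
    return {e: pos / tot for e, (pos, tot) in stats.items()}
```

Under random sign assignment this curve would be flat; the talk's finding is that it rises with embeddedness.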
Upvote it enough and it gets to the front page. So a strategy people take is they submit things, and anyone who upvotes your submission, you go to their link and you upvote it, too, so you start working in coalitions to create a critical mass of upvoting, of popularity. And there's not a reciprocal thing on the negative side; there's no reason for this downvoting of people into oblivion. There's an example -- it's anecdotal, I don't know if anyone's looked at it -- because on Reddit you can look at someone's history and see all the comments they voted on, and you can vote on their history. So when someone leaves a snarky comment in their column, they'll downvote that person's comments on everything, without any kind of sense or reason to it, simply as a sort of malicious act. It seems like Reddit is very social. Where these sites are different, how they have different motivations, might be interesting. >> Jure Leskovec: I spent lots of work on trying to find coalitions in Wikipedia. So you know the idea would be that out there are these sets of people who all vote positively on one another and don't like that other set of people, and we found very little of that -- >>: What's the motivation? On Wikipedia, what's the motivation to upvote everybody around you or review them in some way? Reddit has a daily leaderboard; every day it's a cycle, and whoever gets the most votes, boom, they get the immediate reward for having been well embedded in the network and having this kind of activity. >> Jure Leskovec: Okay. So I agree. That would be an interesting thing to study. Maybe we would see the same thing, especially on the positive reinforcement side. Even on Wikipedia I want people who think like me, who enforce my editorial policies and so on. I think that would be some -- >>: How long do they go on? Does this happen at a regular point in time? >> Jure Leskovec: We had 2,500 elections. I think we have 2004 to 2008.
So we have 2,500, almost 3,000 elections. Half of them ended up with a promotion, half ended up without a promotion. And, yes -- I haven't talked about this -- you can then study how people make decisions. There is this person they have to decide on, and there is this collective decision-making process. People come and say yes/no, yes/no. You can basically think of it as people in a room: you say, what do you think about this person, let's now figure out, do we accept them or not. So we can study how the outcome of the election changes as people express their opinions and so on. We also looked at that, but that's -- >>: Anyone can vote? >> Jure Leskovec: Anyone can vote. Mostly admins vote, but anyone can vote. And you find people who vote in like 400 different elections and so on. So you find people that vote in many different elections. >>: So I'm wondering if you thought about -- when you did your study of the network and the statistics, did you look into the possible effects of double counting, whether that resulted in any bias? The question I thought about during your talk is: you have the same triad, and I'm wondering whether it generates, like, positive, positive, negative, and whether for balance theory it would generate more negative-negative examples than for status theory? Or maybe other kinds of bias. >> Jure Leskovec: When I was looking at this, I really knew when the edges appeared. So I was really able to look at it in the evolutionary sense. I was counting exactly the same triads in both examples. I always say: this is what happened so far, and here is the new edge -- what will this new edge be? So I wasn't really looking at the static network but at the evolving network. >>: Same triad, only using that once? >> Jure Leskovec: Exactly. Exactly. So I only used it once. Yes. Exactly.
So, yeah, and then of course the question is what you do with the reciprocated edges, and I think in those cases I would just take the first edge. Usually the reciprocated edges are of the same sign, so it doesn't really matter. But we tried to be careful about these effects. It's a good point. Okay. Thanks a lot. [applause]