>> Nikhil Devanur Rangarajan: Welcome everyone. It's my pleasure to introduce Yaron Singer, who is a grad student at Berkeley advised by Christos Papadimitriou. Yaron got the MSR Fellowship and also the Facebook Fellowship. He has a fellowship double. And he's going to tell us about mechanism design and social networks. Fascinating.
>> Yaron Singer: Thanks. Can you guys hear me all right, in the room and also -- thanks. So thanks to Nikhil, and to Eric, who unfortunately couldn't be here, for hosting the visit. I've wanted to come and give this talk for a while, so it's really great to finally do it. So I'm going to talk about how to win friends and influence people truthfully. The title is a bit scary. For those of you who don't know, this is the title of a book from the '30s. It's kind of a classic book for MBA students. And the book is called "How to Win Friends and Influence People." So I didn't come up with it on my own. So basically we'll talk about the influence maximization problem. We'll start off by giving some examples and then defining it, and then we'll talk about why there's an inherent incomplete information aspect to the problem that we'd like to address. Once we do that, I'll try and convince you that taking a mechanism design approach to this problem is a good idea. We'll see that we can actually design mechanisms that give us good theoretical guarantees, and we'll talk about some of the difficulties that the setting introduces to mechanism design and how we can overcome them. And then finally we'll talk about some experimental evaluation that we did on these mechanisms, and I'll try and convince you that not only do we get these theoretical guarantees, but they actually perform pretty well in practice. So just to jump to an example.
So in the influence maximization problem we're asking ourselves this question that seems very important right now: how do we select a subset of individuals to recommend, say, a product to their friends so that the word of mouth effect, or the influence, is maximized in the social network. And what's really striking is that this question was asked by Domingos and Richardson, where Richardson is Matt Richardson from here, already back in 2001. If you think about it, that's remarkable, because this is before the time of online social networks, but they identified this as an important problem and worthwhile for computer scientists. So think of a hypothetical scenario. Suppose we have some sort of site that has a thousand dollars' worth of coupon codes to give. Let's say it's Zynga money. And this site is willing to give out the coupon codes to its users so they recommend the site to their friends. Okay. Let's say you recommend to your friends by posting on Facebook. Just a hypothetical example. So the question is how do we decide how to pay, how to give people the Zynga money to recommend to their friends. To continue with the hypothetical example, suppose we have Eric, Nikhil and Peter coming to our site and using it. We know that they have friends. We can maybe look at their Facebook profiles, and we see that they're connected not only to the people in this building, but their connections span wide to New England and also SBC. And so we now have to decide who we're giving the money to and how we're going to distribute it. Even in this example it's not that trivial, because we see that Eric has a very high degree, he has four friends on Facebook, and then both Nikhil and Peter have two friends.
So it seems logical that we would choose Eric, but then we have to decide between Peter and Nikhil. And we somehow have to make a good decision, right? Okay. So keep this image in mind and we can talk about it in a more formal scenario. We can talk about this in the simplest model, which is the coverage model. Okay. So in the coverage model, we have N agents. An agent would be someone like Eric or Nikhil or Peter from the previous example, and each agent now has a certain cost that's associated with recommending the product. They also have a reachable subset, and the reachable subset in this case was just their friends. And we're given some budget B. So this is the thousand dollars of Zynga money that we had. And what we want is to find some subset of people to recommend the product so that the union of the reachable sets is maximized. Okay. So we want to reach out to as many people as we can with the Facebook recommendations.
>>: The c_i is the cost of [inaudible] the whole set or.
>> Yaron Singer: Right. So the question is what does c_i represent? We assume that each agent has a certain cost that is associated with them. So for Nikhil, for example, the cost for recommending could be $500, and maybe for Peter the cost would be 50. So it's individual to each agent.
>>: That wasn't the question. Is it the cost of reaching the entire subset of friends or just one element.
>> Yaron Singer: I'm sorry. So it's just the cost of posting a recommendation on Facebook. Just making a single.
>>: The whole.
>> Yaron Singer: In this example, reaching the whole subset. Yeah. Okay. So this is the coverage model. And more generally we can think about the general influence maximization problem, where we have N agents and again each one has a cost.
But now, instead of a reachable set, we have an entire social network and there is what we call an influence model. An influence model is simply some probabilistic rule that tells us how messages propagate from one person to another. Okay. So it's the probability that you're influenced given that your friend is influenced or made a recommendation. Okay. And the goal in this probabilistic model is similar to the coverage model: we're given a budget B and we want to find a subset that will maximize the expected number of influenced people in the social network. So what I'll do is I'll stick to the coverage model, and then towards the middle of the talk I'll tell you that most of the things that we talked about here extend to the more general settings, and I'll explain which settings and how. But just for simplicity, it's good to think about the coverage model, because it's a simpler setting.
>>: So you're assuming there's no propagation of [inaudible].
>> Yaron Singer: Yeah, so right now, just for the purpose of making things concrete, I'm thinking about the coverage model where there's no propagation. There's no propagation and everybody that you're connected to is immediately influenced. Everything I'm going to talk about will also extend to the more general settings where things do propagate. But just for concreteness we can think about the simple setting. Good. So let's just try to give an overview of what's been done over the past decade. The main research directions thus far, I can think about them in three categories. The first one is about characterization of influence models, right? And the fundamental question is what sort of influence models allow for tractable optimization solutions?
And a very seminal work by Kempe, Kleinberg and Tardos in 2003 basically said, well, in general this problem is not only NP-hard but it's also inapproximable. But they identified a class of influence models, which were very fundamental and had been studied in the literature, that have these submodular properties. What they showed is that when you have an influence model that's submodular, you can just apply a standard greedy algorithm which gives a 1 minus 1 over e approximation. So the problem is hard, but you have a lot of models, which you call submodular influence models, where you can apply a simple algorithm and get a good approximation ratio, one minus one over e, which is roughly 63 percent. And you can actually show there's a lower bound, so you can't really do better than that. Okay. And these models have been extended even further to show that there's a larger class of these submodular influence models. And I'll talk about the main models --
>>: [inaudible].
>> Yaron Singer: The lower bound is for the coverage model, which is the basic, fundamental one. That's it.
>>: By influence models, for instance, the coverage model is an example of an influence model.
>> Yaron Singer: Yes, exactly. The coverage model is like a deterministic influence model. And then another direction of research is about actually inferring influence models. I have a lot of data, and now I want to try to fit the right model that will explain how people really influence one another. And then the third direction is about computational techniques, saying how do we solve this problem at large scale? We want to apply this to online social networks. It requires a lot of computational power, so how do we get these methods to really scale.
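The greedy algorithm mentioned above can be sketched for the coverage model as follows. This is a minimal illustration rather than the code behind the talk: it assumes every agent has the same cost, so the budget reduces to a count `k`, and the friend labels `a` to `f`, the function names, and the exact overlap between the three agents are hypothetical.

```python
def coverage(selected, reachable):
    """Number of distinct people reached by the selected agents."""
    covered = set()
    for agent in selected:
        covered |= reachable[agent]
    return len(covered)


def greedy_max_coverage(reachable, k):
    """Pick up to k agents, each time adding the largest marginal gain."""
    chosen = []
    for _ in range(min(k, len(reachable))):
        base = coverage(chosen, reachable)
        best, best_gain = None, 0
        for agent in reachable:
            if agent in chosen:
                continue
            gain = coverage(chosen + [agent], reachable) - base
            if gain > best_gain:
                best, best_gain = agent, gain
        if best is None:  # nobody adds anything new
            break
        chosen.append(best)
    return chosen


# Hypothetical reachable sets: Eric has four friends, the others two each.
reachable = {
    "Eric": {"a", "b", "c", "d"},
    "Nikhil": {"d", "e"},
    "Peter": {"e", "f"},
}
picked = greedy_max_coverage(reachable, 2)
print(picked, coverage(picked, reachable))
```

Coverage is submodular (an agent's marginal gain can only shrink as the selected set grows), which is the property the greedy guarantee relies on.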
Now the fundamental assumption in this problem, which is required for all these techniques, is that individuals have costs. These are costs for making a recommendation to their friends in the social network. And there's inherently an incomplete information aspect to having this cost, because it's very hard to extract. All the techniques that we're talking about rely on either the cost being explicit, so it's explicitly given to you as part of your input: your algorithm receives the social network, the cost of each person to make a recommendation in the social network, and an influence model, and based on that you compute an outcome. Or they somehow assume that it's implicit: that there's some way to learn from data what the right threshold cost would be for people to make a recommendation, and based on that compute some sort of solution. Okay.
>>: How would you calculate that?
>> Yaron Singer: Right. So, first of all, I don't know. Right? But this is an assumption. I guess the assumption is that maybe you can do a survey. You do a survey of people's costs and then you somehow find a threshold cost, the optimal cost that people would have. And then you decide, okay, I'm giving everybody this much money and you can either take it or leave it.
>>: Otherwise you're just getting into this other game, the problem where people have switched [inaudible].
>> Yaron Singer: Uh-huh. So what are the challenges that we'd expect when we don't know people's costs? So, first of all, the first thing you might say is, well, maybe we can predict costs.
Maybe we can profile people, right, and then based on their profiles try to guess what their costs might be. Well, the problem with that kind of approach is that it's just going to be very, very hard to do. As we'll see later, people have very different costs for making a recommendation, and those costs might depend on many parameters that are very hard to extract, like the personal history of a person, their affiliation with the product, and many other things. That can have an extreme influence on how much their cost is going to be, and it will be very hard to predict. The second approach would be doing something like a survey, where you would survey people and try to determine the threshold price out of that. And the problem is that even if you can do a good survey, you can actually show that this kind of approach gives you poor solutions, both theoretically and practically, when you compare it to what you would be able to do if you actually knew the costs. I'll talk about that later. And finally you can just naively go to people, ask them how much they want to be rewarded, and pay them. But if you do that, then you run the risk of people actually lying to you, right, because if I know you're going to pay me what I ask for, maybe my cost for making a recommendation is 20 Zynga dollars, but maybe I'll say it's 50 or a thousand, right. So that's the thing we need to get around. So in this talk the question that I'm interested in asking is: can we design algorithms in this environment, where we have incomplete information, that will still give us good solutions? Okay. We'll do this using a mechanism design approach. And what does that mean? It means that we're going to assume that angel -- sorry, agents. Shows you where my mind is -- that agents are rational.
And all that means is that people want to maximize payment minus cost. That's the standard game theoretic assumption. So if we assume that they're rational, and they have these private costs that we just don't know, then what mechanism design says is: let's design mechanisms that incentivize everyone to declare their true costs. We'll do it in some magical way so that we pay people the right amount and we select people in the right way, so that everybody tells us exactly what their real cost is. And a mechanism is just a big word for an algorithm plus a payment rule. Okay. So what we need to do is design algorithms and say what the payment rule is going to be. And we're going to have some restrictions on these algorithms so that we can actually get the solution that we want. The procedure would look something like this: we'll have a social network, people will tell us how much they want to be rewarded, and we'll incorporate all that into our mechanism. Our mechanism is going to eat all that information and then select the lucky few and give them payments, give them Zynga dollars, so they can tell their friends. Okay. So for the rest of this talk, I'll talk about mechanisms for the coverage model. I'll tell you where the difficulties are in designing these mechanisms, but I won't leave you hanging. I'll also tell you the main ideas on how to solve the interesting problems there. And then, like I promised, I'll tell you how they extend to other models as well, so it's not just the coverage model. And once we do that, we'll show some experimental evaluation of these mechanisms that shows that they actually work well in practice. Good. We're good with time. So just again, the coverage model in the mechanism design setting: we have N agents, that didn't change.
Each agent has a cost, and we're saying now that the cost is private, and what that means is that we don't have any information on their cost. Our goal again is: we're given some budget B, and we want to select a subset of agents that maximizes the coverage under the budget. And we're going to restrict ourselves to mechanisms that are computationally efficient. So we're still in the algorithmic perspective, where a computationally efficient mechanism means that its algorithm and its payment rule need to be computable in polynomial time. It needs to be truthful, and that's just another word for incentive compatible: it needs to provably be able to elicit the true costs from the agents. And then the important thing is that it needs to be budget feasible. Budget feasibility here means the sum of the payments needs to be under the budget. The fine point is that in this setting the payments are always going to be more than the costs, and we need to design mechanisms where the sum of the payments does not exceed the budget, not the sum of the costs. We'll talk about it in the next slide. And then finally our mechanisms also need to be good, in the sense that they approximate the optimal solution well, where the optimal solution is the one that we would get if we knew everybody's real costs and we didn't have any computational limits. So this is the standard optimal solution that we would compare against. Okay. So these are the most strict conditions that we can impose.
>>: Can you comment on why you insist on truthfulness? It seems reasonable, but you might say all you care about is maximizing F with something?
>> Yaron Singer: Right. That's great. So the question is why should we care about truthfulness, and why should we insist on it? Right?
So, first of all, that's a philosophical question. But I think the short answer would be: because we can. I mean, if I came up and said, well, I have an impossibility result, then that would be one thing. But if we can actually design it in such a way that we get at the true values, that would be great. But to give you a better answer, if we don't insist on truthfulness, then we'd have to think about a different solution concept, right? And we do know there are problems that we can't design truthful solutions for. Then we have to resort to different solution concepts that are weaker than truthfulness. And the biggest problem with concepts like Nash equilibrium and things like that is that it's very hard to predict what people would do. And it's hard to justify that you have a solution that's comparable or good relative to what you would get otherwise.
>>: And the definition of good there is in sort of the strong sense: you were hoping to get something that was truthful and it's even almost as good as nontruthful.
>> Yaron Singer: Exactly. Exactly. Okay. So what's hard about this? The first hard part is having the budget constraint on the payments. And that's a challenge because, first of all, you can show that the optimal solution just exceeds the budget by enormous factors. You can construct very simple examples that show that. And when you look at approximations, you can also construct these simple natural settings where you show that once you restrict truthful payments to be under the budget, you get just horrible approximation guarantees. So the budget is a really big challenge in this setting that needs to be overcome. Okay.
And then the second challenge that I'm going to talk about is an approximation challenge: in all these settings, to get an approximation you need to be able to run two different solutions and take the maximum between them. That's just something that naturally comes from the fact that we have a budget constraint, not in the incentive compatibility sense, but just in the standard algorithmic sense. And the problem is that once you try to apply the maximum to solutions in this setting, it breaks incentive compatibility. These are two major challenges that need to be overcome in order to design incentive compatible mechanisms for this. To address the payment constraints, we'll talk about what's called budget feasible mechanisms. Those are mechanisms that give you truthful approximations that can actually get us what we want: they guarantee that the payments will never exceed the budget, they run in polynomial time, and they're incentive compatible. And for the maximum issue, we'll talk about how you can use nonlinear programming relaxations in order to circumvent that problem. So those are the two things that we'll talk about. First we'll talk about how we deal with the payment constraints, and then I'll talk about the incentive compatibility of the maximum operator. Good. So in our mechanism, to address the issue with payments, we'll use the proportional share allocation rule. The proportional share allocation rule is something very, very simple. It just has three stages. In the first round, round zero, we start with the empty set. Then in the second round we're going to choose an agent to add to our set, and that agent is going to be the one that maximizes this quantity.
And what this says is: we evaluate the function on the set S, which in the first round is the empty set, together with the agent, minus the function on the set that's already been selected, divided by the agent's cost. So this is really the marginal contribution of the agent divided by their cost. The intuition for why we want to use this kind of marginal contribution ordering is that we know this approach is what gets us good approximations in the greedy algorithm. In the greedy algorithm, in the non-incentive-compatible setting, we just greedily choose agents according to their marginal contributions, and stop once we exhaust the budget. So at each stage we're going to choose the agent that maximizes the marginal contribution per cost. And we'll decide whether to add it to the set or not based on whether their cost is less than or equal to what we call the proportional contribution. The proportional contribution is their marginal contribution divided by the value of the entire set, if they're included. The intuition for where this comes from is cost sharing: we want to add people to our solution only if they can justify it, in the sense that their contribution over the value of the entire set is larger than their actual cost. If they contribute more than they account for, intuitively.
>>: [inaudible].
>> Yaron Singer: There's no budget. The assumption is that everything is normalized and the budget is one.
>>: [inaudible] I take the damage of the left side where is the budget normalization.
>> Yaron Singer: Here. The budget should multiply.
>>: And B --
>> Yaron Singer: C, yes.
>>: Why is it a denominator? If F minus F is bigger than C then you pay money to A to increase the function. Doesn't matter how big F itself is, right?
>> Yaron Singer: Well, there's no relationship between how much happiness we have and how much cost the agents have. I mean, those are like two different worlds. We can think of it in two different ways.
So yeah, this is the proportional share allocation rule. We start with the empty set, order agents according to their marginal contribution per cost, and then take them greedily if their cost is less than their proportional contribution. Okay. The reason why this allocation rule is so wonderful is that we get two nice lemmas from using it. The first is that it has a good approximation guarantee. For any submodular function, using this proportional share allocation rule, we can show that it's a constant factor approximation of the optimal solution if we remove the item with the largest value from the set. So under this assumption the proportional share allocation rule is great. It gives us a constant factor approximation guarantee. The second and more important reason for using it is that we can show that for any submodular function, the proportional share allocation rule is, first of all, incentive compatible. And second, we can show this nice property about the payments that this allocation rule leads to. The payment property is this: if the payment is theta, then we can show that it's only a constant factor away from the proportional contribution of that agent. So why is this good? Because when we sum up all the payments, when we sum up all the thetas here, we know they're bounded by a constant factor times the budget. We just move the valuation of the entire subset here, then we sum everything up, and it gives us a constant factor times the budget. So the proportional share allocation rule leads to payments that will exceed the budget, but not by a lot. Only by a constant factor. And the way to prove that is to characterize the payments, which is a bit more involved, and once we characterize the payments we can show this bound on them.
When we sum them up, they're bounded by a constant factor times the budget. So what we can do is run the proportional share allocation rule not with the budget itself, but with the budget shrunk by this constant factor. We still get a constant factor approximation, because we only shrank the budget by a constant factor, and we guarantee that the payments don't exceed the budget. Does this make any sense? I mean, I'm not proving these lemmas, right, so it's not immediate to see why this is true. You'll just have to take my word for it. The basic conclusion is that the proportional share rule gives us a good approximation guarantee and the payments don't exceed the budget.
>>: This constant depends on what?
>> Yaron Singer: This constant is -- I mean, it's a constant. It's like --
>>: It's universal.
>> Yaron Singer: It's a universal constant. And I can give you an intuition later if you want.
>>: [inaudible].
>> Yaron Singer: They're just ugly and they vary by the different settings.
>>: So [inaudible].
>> Yaron Singer: So for any submodular function you can show global constants. I think this constant is 2, or -- well, depending on the setting. Okay? Good. So the conclusion is that it's good because the payments are bounded and it gives us a good approximation guarantee. But we still threw away the agent that had the largest value. So we need to somehow be able to decide, in an incentive compatible way, which solution is better. Once we're able to decide whether the proportional share allocation rule or the agent with the highest value is better, we'll have a complete constant factor approximation guarantee and we'll be done with the theory. To show how to do that, I'll first explain what the problem is with taking the maximum in this sort of setting.
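The proportional share allocation rule described above can be sketched for the coverage model as follows. This is a hedged sketch of the allocation step only: it does not compute the payments, it assumes strictly positive declared costs, and stopping at the first agent who fails the test is one reading of "take them greedily". The function names, the friend labels, and the example budgets are hypothetical.

```python
def coverage(selected, reachable):
    """Number of distinct people reached by the selected agents."""
    covered = set()
    for agent in selected:
        covered |= reachable[agent]
    return len(covered)


def proportional_share(reachable, cost, budget):
    """Greedy selection with the proportional-share test (allocation only)."""
    chosen = []
    remaining = set(reachable)
    while remaining:
        base = coverage(chosen, reachable)

        def gain_per_cost(agent):
            gain = coverage(chosen + [agent], reachable) - base
            return gain / cost[agent]  # assumes cost[agent] > 0

        # sorted() only makes tie-breaking deterministic
        best = max(sorted(remaining), key=gain_per_cost)
        new_value = coverage(chosen + [best], reachable)
        marginal = new_value - base
        if marginal == 0:
            break
        # accept only if cost <= budget * (marginal gain / value of new set)
        if cost[best] <= budget * marginal / new_value:
            chosen.append(best)
            remaining.remove(best)
        else:
            break
    return chosen


# Hypothetical instance: Nikhil and Peter share one friend, "b".
reachable = {"Nikhil": {"a", "b"}, "Peter": {"b", "c"}}
cost = {"Nikhil": 2.0, "Peter": 1.5}
print(proportional_share(reachable, cost, budget=4))
print(proportional_share(reachable, cost, budget=10))
```

With a budget of 4, Peter passes the test (1.5 ≤ 4 · 2/2) but Nikhil, whose marginal gain is now 1 out of a total of 3, does not (2 > 4 · 1/3). With a budget of 10 both pass.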
So an incentive compatible mechanism in this setting can essentially be characterized by a rule on the algorithm itself. Okay. So Myerson, in 1981, has this amazing theorem that tells us that in what we call single parameter settings, where the private information is only a single number, like in our case, a mechanism is incentive compatible if and only if its algorithm is monotone. And monotonicity basically says that if I have an agent with some cost that's been selected into the solution, then as long as nobody else changes their cost declaration, the agent that was selected can reduce their cost and still be selected by the algorithm. So in our case reporting a lower cost can't hurt you. That's the monotonicity property. Does that make sense? No? Okay. So in our setting we have people declaring costs to the mechanism, right? And if you think about it, the lower somebody's cost is, the better it is for us, right? We can put more people into our solution. So the monotonicity rule says that an algorithm is monotone if, when you have some agent that is selected by the algorithm, and that agent declares a lower cost while everybody else declares the same cost they declared before, then that agent must still get allocated into the solution. My algorithm can't hurt you: if you give me a lower cost then you're only helping yourself in the competition. So in order for an algorithm to be incentive compatible, it must respect this monotonicity property. And the problem is that when we apply the maximum, that just violates the monotonicity property of the algorithm. So I'll give you just a quick example.
So if we go back to our friends here, okay, what we want to do is run this algorithm with the maximum and see how that hurts us. So we put Eric aside, because he has the highest degree. So he's the maximum. And now we're going to run the proportional share, or basically any greedy algorithm, on Peter and Nikhil. The way that happens is Nikhil comes up and says my cost is $2. And then Peter comes up and says, well, my cost is $1.5, right? In this case Nikhil gets selected and then Peter gets selected. The value that they produce together is as large as Eric's, so we're going to select them into the solution and they're both selected. Now, if Peter declares a lower cost of 0.5, we need to recompute the mechanism. And when the mechanism gets recomputed, you can come up with an example where Nikhil can no longer be in the solution, because Nikhil now comes into the solution second. Before, when Nikhil came into the solution first, his marginal contribution was two. If Peter gets selected first, then Nikhil's marginal contribution drops to one, which is no longer justified by the proportional share allocation rule, and you can also show this for other algorithms. Then Peter is stuck in the solution alone, and when he competes with Eric, Eric has more friends. So Eric wins. So Peter lowered his cost and lost. And you can try to play with these sorts of examples and put some weights on them, but you're still going to end up in the same sort of situation. So somehow we need to circumvent that. What we can do for the coverage problem is actually formulate it as an integer program. Okay. So this is what I wrote here. It's not that important how we formulate it.
But the important thing is that the variables here describe the people that get covered, and that's what we're trying to maximize. So you can take this integer program, and you can write a relaxation for it in the following form. This relaxation is different from the integer program in two ways. The first is that the objective function that we're trying to maximize is different. So it's not just a linear programming relaxation; we're actually changing the objective function. But you can show that when your solutions are integral, the solutions being the x's that you choose here, this program gives you the same solution as that program. And the second place where they differ, of course, is that in the integer program we're restricting our solutions to 0/1, and in the relaxation we're allowing ourselves to have solutions over the reals. So the idea is to think about coverage as an integer program and then think about a relaxation. And the nice thing about the relaxation is that we can compute its optimal solution in polynomial time. But it's going to be a fractional solution. So what we're going to do is find the optimal solution of the relaxation over the subset of the agents. Then if the relaxation's value is greater than the value of the agent that has the highest value, we're going to take the proportional share allocation rule. If not, we're going to take the agent with the highest value. Okay. So instead of comparing directly between the proportional share and the agent with the highest value, we'll compare with the relaxation, and if the relaxation has higher value, we'll take the proportional share.
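The slide with the two programs is not reproduced in the transcript. As an assumption, one standard way to write the budgeted coverage problem and a relaxation of the kind described (a changed objective that agrees with the original on integral points) is the following, where $R_i$ is agent $i$'s reachable set, $c_i$ the declared cost, $B$ the budget, and $x_i$, $z_j$ are hypothetical variable names:

```latex
% Integer program: z_j = 1 if person j is covered, x_i = 1 if agent i is selected
\begin{align*}
\max\ & \sum_{j} z_j \\
\text{s.t.}\ & z_j \le \sum_{i \,:\, j \in R_i} x_i \quad \text{for all } j, \\
& \sum_{i} c_i x_i \le B, \\
& x_i \in \{0,1\},\ z_j \in \{0,1\}.
\end{align*}

% Relaxation: the objective is rewritten in terms of x alone and x is fractional
\begin{align*}
\max\ & \sum_{j} \min\Bigl\{1,\ \sum_{i \,:\, j \in R_i} x_i\Bigr\} \\
\text{s.t.}\ & \sum_{i} c_i x_i \le B, \\
& 0 \le x_i \le 1.
\end{align*}
```

When every $x_i$ is 0 or 1, the term $\min\{1, \sum_{i : j \in R_i} x_i\}$ is 1 exactly when some selected agent reaches person $j$, so the two objectives coincide on integral solutions.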
And the reason why this works is that the mechanism is now monotone, the reason being that if an agent decreases their cost, then the value of the relaxation is only going to go up. And that's because we're computing the optimal solution, and it's fractional. So decreasing your cost can only help the optimal solution in this case. So we don't run into the problem that we ran into before. And the second thing is that you can show that this mechanism still gives you a constant factor approximation, and the reason is basically that you can show, using pipage rounding, that the solution of the relaxation and the proportional share allocation rule are close. And if they're close, then when you're comparing against the relaxation rather than the proportional share, you're only losing a constant factor. So you show with pipage rounding that the relaxation is a constant factor away from the optimal integral solution, and we know from the previous result that the proportional share is an approximation to the integral solution, so we're good. Okay. So the main theorem says we can get a mechanism for the coverage model which gives us roughly a 22 approximation, which is budget feasible, incentive compatible, and what else? Runs in polynomial time. So, great, we're there. So we have a good solution. And now one slide for why this is good, why this talk is not called "On Maximizing Coverage Models." Well, basically there are two main models that are studied in the literature, the independent cascade model and the linear threshold model. So the independent cascade model is an influence model, and the rule is very simple. Basically each agent has some probability of being influenced by each of the other agents in the social network.
So if Nikhil recommends something to me, then the probability of me being influenced is, say, 0.3, and if Peter recommends something to me, then I'm sold, it's probability 0.99 that I'll be influenced. Then there's the linear threshold model, which is a bit more intricate. There, each edge in the social network has a weight. Nodes draw a threshold at random. And then if the sum of the weights of my influenced friends exceeds the threshold, I'm influenced, and if it doesn't, I'm not. So these are just two probabilistic influence models, the main ones that have been studied in the literature. And in both models we can show that in every realization of the probabilities the model reduces to the coverage model. So we can go back and apply the mechanism that we just saw on these two models and get the same guarantees. And just in general, the set of feasible influence models that we can handle are the submodular influence models, and for those we can show randomized mechanisms that give you constant factor approximation guarantees in expectation, and for some cases like the voter model we can actually get mechanisms that have better approximation guarantees than the one that I showed you before, like approximation ratio four. So the conclusion is for various influence models, we can tweak the mechanism one way or another and get a good approximation guarantee. >>: Not really following the instruction, you needed the RIs in order to run the algorithm, right? >> Yaron Singer: I'm sorry, what? >>: When you ran your algorithm on the coverage model, you took as input the RIs, but you can't really compute those, right? >> Yaron Singer: So what I need to do is flip a coin, and then basically I have a realization of the probabilities. >>: An example. >> Yaron Singer: Yeah, I have a sample. And then on the sample I construct the reachable set.
These are the RIs, and then I reduce the two back to the coverage model. >>: And that whole thing will be polynomial time? >> Yaron Singer: Yeah. >>: Because if that was polynomial. It wasn't clear with one sample you're going to get -- >> Yaron Singer: No. That's why -- I mean, it's a subtle point, because you sample once, and you're right, you want to argue that it's not clear that with one sample -- in expectation, right, you would be able to sample once and get a good approximation guarantee. But if you can't, or if you want to avoid that path altogether, then you can choose randomly between the maximum element and the proportional share solution and get a constant factor approximation in expectation. >>: Okay. I'll take it off line. >> Yaron Singer: Okay. So I've convinced maybe some of you that this works for other models. Okay. So now that we have kind of good theoretical guarantees, I think the nice question is whether this actually holds in practice, whether the guarantees that we get are meaningful in practice, or maybe there are just much simpler methods that we could have applied that would save us all the trouble and be better, or just be comparable. So how good is good and how bad is bad, right? So basically we want to evaluate these mechanisms in practice. So for network data, we're very lucky right now because we live in an age where getting network data is rather straightforward, and it's so straightforward that even a person in a theory group can actually go and get data. And here I just took samples of the Facebook graph. This is a public dataset from a group at Irvine where they sampled the Facebook graph through random walks, and they got a graph of almost a million nodes and 72 million edges. So it's a huge graph. And it's a representation of a social network.
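[The sampling step from the exchange above (flip the coins once, then build each agent's reachable set on that one realization) can be sketched in Python as follows. This is an illustration under assumptions, not the code used for the talk: the function names, the toy edge probabilities, and the choice to count a seed as reaching itself are all hypothetical.]

```python
import random
from collections import deque


def sample_live_edges(edge_probs, rng):
    """One realization of the independent cascade model.

    edge_probs maps a directed edge (u, v) to the probability that u
    influences v. Each edge is kept ("live") with that probability.
    """
    live = {}
    for (u, v), p in edge_probs.items():
        if rng.random() < p:
            live.setdefault(u, []).append(v)
    return live


def reachable_set(live, source):
    """Everyone reachable from `source` over live edges (source included)."""
    seen = {source}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in live.get(u, []):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def coverage(live, seeds):
    """Coverage value of a seed set in this realization: size of the union."""
    covered = set()
    for s in seeds:
        covered |= reachable_set(live, s)
    return len(covered)


# Hypothetical probabilities, in the spirit of the 0.3 / 0.99 example.
edge_probs = {("Nikhil", "me"): 0.3, ("Peter", "me"): 0.99, ("me", "friend"): 0.5}
live = sample_live_edges(edge_probs, random.Random(0))
reach = {agent: reachable_set(live, agent) for agent in ("Nikhil", "Peter")}
print(reach)
```

[Once the coins are fixed, each agent is just a set of people, so any mechanism written for the coverage model can be run on `reach` unchanged.]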
And for influence models, we're lucky to have strong computers that can basically run simulations: we can feed them influence models and then generate these simulations that will tell us how many people are influenced in different models. The crux of doing something like this is basically how do we get reflective costs. We started this talk by saying that getting costs is something that's so difficult to do and hard to predict, blah, blah, blah, so we want to somehow have some sort of meaningful distribution that would reflect people's real costs for making recommendations in the social network. So how many people are familiar with Mechanical Turk? Everybody? Nobody? Somebody. So for those who are too shy, Mechanical Turk is this nice platform where requesters, people like me, create tasks. So they have an interface like this where you can change some parameters and basically create these tasks for people to work on. Then you have workers who log in. And they work on multiple tasks. And this is what their world looks like. They have this list of tasks and they can go off and choose which tasks they want to work on. Traditionally it's been used for, well, anything, but people use it a lot for things like image labeling and these sorts of tasks where you need some sort of human intelligence in the process and you want to get it done at large scale. So this is the task that I posted on Mechanical Turk. And I'll just give you a second to read it. So this is what it looks like. It has the title. It says we're giving away bonus money. This is a competition we ran with people on Mechanical Turk, basically asking them how much would you like to be rewarded for posting something like this on your Facebook profile. So they gave us bids, and the bids were from 1 to 20, that was the range that we were restricted to. Then we asked them how many friends on Facebook do you have? And that was it.
That's called the HIT, that was the Mechanical Turk HIT. They click on Submit and then we get that information. So the goal was to establish a real distribution that we can match with the network data and run the algorithms on. So the experiment ran for a period of about three months. And I paid people in the range of five cents to 50 cents. I kind of played with it to see -- >>: Cost from $1 to $20 where you get the -- >> Yaron Singer: Right. So I asked people to bid on how much they want to be rewarded for posting the message on Facebook. But I needed to pay them to work on the actual HIT. For the fact that they click on Submit, I need to pay them something. But they understood that in order to get something that's as high or higher than what they bid, they need to actually post this on Facebook. So we paid people within this range. And the average payment was about $0.26. The question was whether they are affected by the payments. And the answer was no. We got the same distribution no matter how much we paid them. We collected roughly 1500 bids. And about 1100 of them came from unique Mechanical Turk users. So over the period of time I continuously posted the task, so people came and worked on the task again and again. Then the profile was restricted to U.S. residents who had a quality score of 97 percent or more. If you're using Mechanical Turk then maybe you have some sort of estimate of what that quality score is. So now I'm going to show you some plots and just don't worry, no theoreticians were hurt making these. I did them but I wasn't hurt. Okay. So the first question was, is there any strategic kind of correlation between the number of friends you have and costs? And if it turned out that there was some correlation between how much people bid and how many friends they have, that would somehow have to be reflected in the experiment.
If in the experiment we got a node that has a high degree, we would somehow have to adjust the cost distribution that we're assigning for it. So the answer was no. This is the plot. I played with it, I tried to bend it in different ways. There was really no apparent correlation between the number of friends and the bids, the people's cost basis. I'm sure there are more statistically significant ways to say this. But I'm saying this with a plot. >>: Something weird about this, you don't have very many friends. >> Yaron Singer: This is the friend category. I binned them, right, because everybody declares a different number of friends. So I binned them into categories, 0 to 100. >>: Dumps it well because it doesn't look like it's 1,000 -- >> Yaron Singer: I'm sorry. >>: Doesn't look like 1,000 [inaudible]. >> Yaron Singer: When they're binned, I just averaged. >>: Some look like drops of the right weight. >> Yaron Singer: Yeah, I know. I can show you the regular plot. >>: [inaudible]. >> Yaron Singer: I mean, yeah, there's -- I wish there was actually, and then the question is maybe people lie about their friends. But actually when you look at the distribution of the number of friends, this is pretty much what you get when you run BFS crawls on Facebook. So regardless of this, I think the thing that's interesting is you ask people how many friends they have, and you have no way to determine if a particular person lies to you about how many friends they have, but you add it together and you get a distribution that seems to be reflective of what really seems to be going on. So the other interesting thing was how complicated is it for people to bid? So on average it is completed within 32 seconds. And this is what the distribution looks like. So people kind of I think -- my version is they got it pretty quickly. Somebody's version might be that they didn't care pretty quickly. I mean, okay. Both stories seem plausible.
But really there's no -- I looked at whether there's any correlation between bidding times and costs, whether people that bid, say, $5, $15, $20 spend less time on the task. But there's no correlation. So -- by the way, people emailed me back, asking when is it going to be decided? I already posted on Facebook, blah, blah, blah. All this sort of stuff. So that was interesting. You also get to make friends on this Mechanical Turk, it's not just research. Finally, what does the cost distribution look like? It looks something like this. This is something that if I generated this without telling you that I did the Mechanical Turk experiment, you just wouldn't know what to make of it. But it is what it is, right? And roughly a fourth of the people bid the maximum bid, which is $20. A fourth of the people bid below $5 and everybody else was in between. >>: Because they bid 5, 10, 25, popular number. >> Yaron Singer: Yeah, but you get a lot of them here, below five. And, yeah, there's just no correlation between time and bid value. So we've got a distribution. So now using this distribution we can actually run the experiments. So basically we ran four different algorithms. The first is the greedy algorithm. The second is the proportional share algorithm, where basically the tweak was that you use almost your entire budget. This is in the characterization of the payments, which determines how much of the budget you can use, and it has these dependencies, so that when you know the network, you can allow yourself more budget because you know you're not going to have overpayments. And then we compared this to a near optimal fixed price mechanism. Which by the way is not truthful. So except for the proportional share allocation rule nothing was truthful. So that's to give it better competition.
So the fixed price is basically saying -- what I did is I gave the algorithm the distribution. I allowed it to find the optimal threshold value to give to people, and then it would just make this take-it-or-leave-it offer to people. You can show it gives you a logarithmic approximation, and you can also show there's a lower bound of log n over log log n for any fixed price mechanism. So this mechanism is almost optimal, theoretically. And the question is can you do almost as well or better by using something as simple as just a fixed price strategy. Can you just decide that $7 should be your ideal value, and give everybody $7 worth of Zynga money instead of collecting bids, all that stuff. Lastly there's just the random assignment, to give us kind of a lower bound, what would happen if you just randomly gave it to people. So this is what we got. So we're the green. Okay. And this is just a look at different models, so you can get a quick picture of what's happening. Basically the blue is the greedy algorithm. So that's an algorithmic upper bound. And basically in many models the green is almost a factor of two away from the blue. And there are reasons for why that's happening. I can explain that off line if somebody's interested. But that's the way it is in reality, except for some cases you'll see in a minute. So the proportional share allocation rule gives you roughly half of what you would get if you were running in an environment where you didn't have these incentive constraints. In the coverage model, the red line is the uniform or the fixed price mechanism. And that seems to be comparable with the proportional share allocation rule when you're looking at the coverage model. These are small budgets, from 0 to 500.
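[A rough Python sketch of the fixed price baseline described above, under assumptions. The talk only says the algorithm was given the cost distribution and allowed to pick the best threshold for a take-it-or-leave-it offer; here that is simplified to trying every declared cost as a candidate price on the instance itself. The arrival order, the rule of stopping once the next payment would exceed the budget, and all names and numbers are hypothetical.]

```python
def fixed_price_outcome(price, costs, reach, budget, order):
    """Offer the same price to agents in arrival order.

    An agent accepts if their cost is at most the price. Every accepting
    agent is paid exactly `price`, so offers stop once the next payment
    would exceed the budget.
    """
    accepted, covered, spent = [], set(), 0.0
    for agent in order:
        if spent + price > budget:
            break
        if costs[agent] <= price:
            accepted.append(agent)
            covered |= reach[agent]
            spent += price
    return accepted, len(covered)


def best_fixed_price(costs, reach, budget, order):
    """Pick the candidate price that yields the largest coverage."""
    candidates = sorted(set(costs.values()))
    return max(candidates,
               key=lambda p: fixed_price_outcome(p, costs, reach, budget, order)[1])


# Hypothetical instance.
costs = {"A": 1, "B": 3, "C": 7}
reach = {"A": {1, 2}, "B": {2, 3, 4}, "C": {5, 6, 7, 8, 9}}
order = ["A", "B", "C"]

price = best_fixed_price(costs, reach, budget=7, order=order)
print(price, fixed_price_outcome(price, costs, reach, 7, order))
# 3 (['A', 'B'], 4)
```

[Note what the single price cannot do: at a price of 7 the cheap agent A arrives first, takes the whole budget, and the high-value agent C is never reached. Collecting bids lets a mechanism tell those agents apart.]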
When you're looking at other models, you can see the uniform price mechanism really drops. And the difference in the other models is that there is some propagation. So when there's some propagation, all of a sudden the reachable subsets become much, much larger, and then the fixed price mechanism all of a sudden has very bad performance, and in some cases, this turquoise is the random mechanism, you can see it has performance that's even lower than just a random assignment. So the message is fixed price mechanisms are bad theoretically and practically. And you can also do this with more money. So we did this before with 0 to 500. And these are 0 to 5,000. And the interesting thing that you see there is that in some cases, independent cascade models where we have low probabilities, the proportional share allocation rule and the greedy algorithm have almost similar performance, and then again you see that in the coverage model the fixed price mechanism does reasonably well, and where there's propagation it does really, really badly. Then just finally, an interesting anecdote: the question is how many people you have in your system. So this is what you see. The blue now is our mechanism. The red is the fixed price mechanism. This is how they perform when there's a thousand people in the system, and they're roughly a factor of two away. We're increasing the number of bidders in the system; we're staying with the same budget but we're just adding more people. So this is with 2,000 people. This is with 3,000. 4,000 and 5,000, almost went to linear, almost becomes just a straight line. Sorry. Anyway, then you see that there's a huge gap that's opening between these two mechanisms.
So basically increasing the number of bidders doesn't really help fixed price mechanisms, and I have my beliefs for why and I can talk about that off line, whereas for the proportional share mechanism it really helps. Okay. So that's pretty much it. So just to conclude, my interest was to take mechanism design and look at different settings, in this case the social network setting. And it seems like it contributes to the theory of mechanism design and can introduce interesting problems. We see that using mechanism design helps with problems in social networks, in this case the influence maximization problem, and in particular the incomplete information aspect of it. And finally we can see that with experimental evaluation we can not only get good theoretical guarantees but also get mechanisms that seem to work pretty well in practice and perform well in comparison to other things that we may try. So that's pretty much it. Thanks. [applause] So any questions? >>: In your theorem it said that it's guaranteed to be a constant factor, 22 approximation or something, and then in practice it looks like it does much better than that. Any guess as to -- are there some particular bad [inaudible]. >> Yaron Singer: So the question is why is it that the theoretical guarantee is a factor of 22 but in practice it does much better. So in the worst case analysis you run into all these corner cases where you can have one guy that has really high value and that can really throw you off, and that creates these gaps of a factor of 2, a factor of 4. And then you have other cases where one person can really influence the outcome of the mechanism. That also has kind of a large effect. And in reality those things just don't happen. So you can get away with good solutions. >> Nikhil Devanur Rangarajan: All right. Thank you.
>> Yaron Singer: Thanks. [applause]