>> Peter Key: Well, welcome, everybody, and welcome to our remote viewers. So it's a pleasure to be here in MSR Redmond, and also a pleasure to introduce Mohamed Mostagir, who's currently at Caltech, where he's doing his Ph.D. in economics, having previously done a master's of science, SM, at MIT. So he's got very broad-based skills in computer science, and he's also gone to the dark side, economics. Mohamed, it's a pleasure. Thank you.
>> Mohamed Mostagir: Thank you. Thank you very much for inviting me and for coming. It seems very difficult today with the TechFest thing. So I'm going to be talking about exploiting myopic learning. And this is designed for a broad audience. Please feel free to stop me at any point and ask me questions. Actually, I would prefer if you do that instead of me just talking to myself. So before I start, let me give you a brief idea of the kind of things I'm interested in. I'm interested in how people learn and how they make decisions. And I think about these questions in the context of applications like consumers learning about product quality and deciding whether to buy a product or not, or voters learning [inaudible] politician, voting for him or not. And if we think about learning in the broader sense of social learning instead of individual learning, then we can also ask questions about how information propagates in a population, how rumors are started, how they spread, and then how social behavior arises from that transmission of information between individuals. And hopefully I don't have to convince you of why this is important. If we have a good idea of how people learn, then it would help us develop better marketing strategies for our products. It would help us develop better political campaigns. And, in general, and this is the key point that I want to bring up during the talk, and I will do this at many points, I want to think about the fact that people learn and do not make the optimal decision instantaneously. I want to think about this as a resource that a system designer can use. And this will be clear in a moment. But for now, let me just give you a quick story before I begin about one of my favorite industries in Los Angeles, the music industry. This is a multi-billion dollar enterprise that was affectionately described by Hunter S. Thompson in this quote: "The music business is a cruel and shallow money trench, a long plastic hallway where thieves run free and good men die like dogs. There's also a negative side." So the music industry, around the turn of the century, was making a massive amount of revenue. This started to change because, as all of you know, of file sharing. The biggest and most publicized case of that was Napster and the Napster lawsuit in 2001, which the music industry successfully won, forcing Napster to close down its servers. And part of the reason they were able to do that was that Napster was a centralized service: they actually hosted music files illegally on their servers, and so it was easy to focus on them and close the service down. But what happened then is that there was an explosion of peer-to-peer file sharing, where it becomes very tricky to actually find the perpetrators of the file sharing crime. What happened next is that the Recording Industry Association of America responded to this file sharing by starting a massive lawsuit campaign. They were just suing people left and right, college kids. It's documented that they sued more than one grandmother who didn't even own a computer.
And this campaign continued for five years, from 2003 to 2008. After that point, they just came out and said, okay, we're going to stop doing this, it doesn't look like it's working, and that was the end of the story. So looking back at this, it would appear at first glance that the RIAA saw a huge dip in profits, they panicked, they started lashing out at everyone with these lawsuits, they found out that it was not working, and that was it. But I will try to argue in this talk, based on some of the results that I present, that their course of action is actually what they should have done if we assume that people behave in a certain way that I will discuss further. So you can keep this example in mind throughout the talk, but there are more applications that I will also bring up later.
>>: [inaudible] successful [inaudible]?
>> Mohamed Mostagir: I'm saying that it was successful, even though it doesn't look like it was. And we'll talk about more details. So let's go back to the beginning and ask why we're thinking about learning. If you're an economist, you know that the Nash equilibrium is the thing that everyone bows to. A quick reminder of what a Nash equilibrium is: say that all of us in this room are playing a game and each one is playing a certain action; our actions are in equilibrium if no one can unilaterally change their action and obtain a better payoff. Right? So the Nash equilibrium has been at the center of many economic applications in system design. First and foremost is mechanism design. Mechanism design asks the question of how to design an economic or social system such that players play in a way that gives rise to certain equilibria that have desirable properties, either for the system designer or for the population. At no point does the question of how people actually come to play that equilibrium arise. The assumption is that people are so smart that they are able to immediately play the equilibrium of any game that they find themselves in. Of course a more realistic way to think about this is to think that people learn how to play games. So if I just give you a game right now, it's not very plausible that you'll be able to immediately play the optimal actions. But if you play it long enough, then after a while you learn and you converge to an equilibrium. So what I'm arguing is that we should think about equilibria as static limit points of what is naturally a dynamic process of learning, where people learn and converge to the equilibrium underneath. And then the central questions I want to address are: if we start thinking about games in this sense instead of the standard way in which economists have thought about them for so long, thinking about them in terms of learning dynamics instead of static limit points, then do we get better predictions of observed behavior? Because sometimes the equilibria do not really describe what we see in real life when it comes to certain games. And if we're able to do that, if we're able to verify that incorporating learning actually brings us a step closer to reality, then is it possible to interfere with the learning process so that we implement outcomes which, again, are not possible under the standard assumption of economics that people play the equilibrium immediately? So you can think of this as: we end up in an equilibrium, but we don't arrive there immediately, it takes us a while to get there.
So if I'm a system designer, can I meddle with what's happening in this period, from when we start the game to the point where we reach the equilibrium, to get better outcomes for myself?
>>: [inaudible]
>> Mohamed Mostagir: Possibly not. Excellent question. So the first result is that there's a wide class of empirically plausible learning dynamics, which I will describe shortly, that agree more with phenomena that we see in the real world than if we were using the standard economic approach. And if the system designer or principal is sophisticated enough, they can manipulate these learning dynamics and obtain gains that improve on the standard results [inaudible]. So here's an outline of what I'm going to do. I'm going to first talk about myopic and non-myopic learning models, a quick background on each. Then I'll introduce a class of games called Cheat-Audit games. It's a very simple game, and I will derive most of the results in its context, but I will later also talk about other games to which these results apply. So first I will talk about the analysis of the game in terms of standard economic theory with the Nash equilibrium, and then I will ask what happens if everyone in the game is myopic. This game will again be played by a population of agents against a principal. So what happens if both the players and the principal are myopic? And when I've answered this question, and we'll see why this is interesting when the time comes for it, I ask what happens if the principal is now sophisticated and can take advantage of the population learning. When I'm done with this, I will talk about how to generalize these results to other games and other learning models, then conclude, and then I'll quickly talk about other projects that I'm working on that also have the theme of learning in one form or another. So let's first talk about learning models. The standard model is Bayesian learning. Bayesian learning assumes that we start the game and I have some beliefs about your beliefs, and you have some beliefs about my beliefs, and so on up the hierarchy. And under some technical conditions on these beliefs, the play in the game converges to a Nash equilibrium. There's also very recent work when it comes to social learning, not just individual learning. So Bayesian social learning is something that has received some attention over the last couple of years. Again, it looks at individuals on a network learning in a Bayesian fashion, getting information from their neighbors, updating in a Bayesian sense, and trying to find out the true state of the world, where that could be whether a product is good or bad, something like this. The problems with these approaches: well, first, the Bayesian model is very useful and important in the sense that it sets a benchmark against which we can measure any other learning model. But the problem is that the prior space can be huge. I have to have priors over many, many possible scenarios. And any scenario that I started with a zero probability on I can never learn, because I didn't even think it was possible. And then the Bayesian updating, given such a large state space, can be computationally infeasible. As soon as the problem becomes slightly complicated, a normal human agent will not be able to do the Bayesian updating. So Bayesian updating is fine if we're talking about machine learning, because machines will be able to do that.
But for humans it doesn't seem like a plausible way to explain behavior. So we turn to myopic models, which have been introduced in the literature at various points in time. The first, or one of the first, is fictitious play. This basically assumes that I'm playing against you, I look at the frequency with which you play your actions, and I just respond to that. I keep updating each period depending on your play. So this is myopic in the sense that it's strategically myopic: I don't think about how my action this period will affect how you'll play in the next period. I'm thinking of you as a static player who plays according to some fixed distribution. A similar model is adaptive play. Here memory is limited to, say, the last K periods. In each period a fraction of the population is replaced by new agents, so this is either people changing their minds or just exiting the system and new people coming in. They look at these last K periods, and they make a decision that they stick to until they themselves exit the system. So this is also characterized by this myopia of choosing an action that you do not change during the course of your play.
>>: [inaudible]
>> Mohamed Mostagir: Of a Bayesian object?
>>: Yeah.
>> Mohamed Mostagir: How so?
>>: [inaudible] update -- if you update [inaudible] of the history [inaudible] if you're updating time [inaudible].
>> Mohamed Mostagir: So one thing that at least fictitious play can do that Bayesian learning cannot is that you're open to any action that might be played. If I'm learning in a Bayesian fashion and you play something that I didn't think you would play, then I wouldn't know what to do. And then the last model, which I'm going to focus on for the rest of the talk, is replicator dynamics. This is a very simple kind of social learning. The way it works is that we're playing a game repeatedly. Say I'm taking a certain route to work every day. I get up in the morning, I drive to work, and at the end of the day I meet Peter and I tell him this is the route I took and this is how long it took me. He tells me his experience, and if he did better than I did, then I consider switching to his strategy in the next period, with a probability that depends on how much better he did. So I'm just imitating my more successful friends, and I'm doing this probabilistically. I'm going to talk about this in more depth a little later if it's unclear. So far so good? All right. So let's start with the model. We have a very large population. And this large population is playing a game repeatedly against a principal. There's only a single principal playing against the large population. And I assume anonymity, which means that every member of the population looks the same to the principal, like [inaudible] based on anything. And like we mentioned, the agents learn in accordance with replicator dynamics. The way this works is just as I described. After each round, agents are paired randomly. They compare their strategies and payoffs, and if agent i did better than agent j, then j switches to i's strategy with a probability that's proportional to the difference in payoffs. So if our payoffs are comparable, I'm not very inclined to switch. If the other agent did much, much better than I did, then it becomes more likely that I would switch to his strategy.
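To make this switching rule concrete, here is a minimal sketch in Python of one round of pairwise imitation. It is not code from the talk: the uniform random pairing and the normalizing constant max_gap are illustrative assumptions, since the talk only specifies that the switching probability is proportional to the payoff difference.

```python
import random

def imitation_step(strategies, payoffs, max_gap):
    """One round of pairwise imitation (replicator-style learning).

    strategies: each agent's current strategy (e.g. 'cheat' or 'honest')
    payoffs:    each agent's realized payoff this round
    max_gap:    normalizing constant so switching probabilities lie in [0, 1]
                (an assumption; the talk only says 'proportional to the difference')
    """
    n = len(strategies)
    order = list(range(n))
    random.shuffle(order)                       # pair agents uniformly at random
    new_strategies = strategies[:]
    for i, j in zip(order[0::2], order[1::2]):
        # the worse-off agent of the pair considers imitating the better-off one
        lo, hi = (i, j) if payoffs[i] < payoffs[j] else (j, i)
        switch_prob = (payoffs[hi] - payoffs[lo]) / max_gap
        if random.random() < switch_prob:
            new_strategies[lo] = strategies[hi]
    return new_strategies
```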
So the nice thing about these learning dynamics is that they are perhaps the simplest learning dynamics that more often than not converge to a Nash equilibrium of the underlying game being played. This provides a nice middle ground between a myopic way of behaving and the rational outcome that traditional economic theory prescribes. So even though agents are updating their actions in this fairly naive fashion, they actually do end up at the equilibrium they would have played from the beginning if all of them were sophisticated enough. So that's one nice aspect of it, this wisdom-of-the-crowds effect. And another nice behavioral feature is that the probabilistic switching captures the idea that there's a cost to switching your strategy. It's not like you can do something different every day; there's a cost of switching, and that is captured by the fact that you switch probabilistically. So this is the learning model. And we'll apply it to the following game. This is called the Cheat-Audit game. It's a 2-by-2 game that's played, like we said, by an infinite population against a principal. Each member of the population is the row player, and he has one of two actions. Let me quickly grab this. So each member of the population can either cheat or be honest. And the principal also has one of two actions. He can either audit the agent or ignore the agent. But auditing is costly, and I'll describe that in the payoffs in a second. So if I'm the principal and I'm auditing the agent, I would prefer that they were actually doing something wrong instead of me just expending audit costs needlessly. So here an agent in the population is trying to maximize his payoff, while the principal is trying to minimize his cost. And each cell here has two numbers. The first is the payoff to the row player, the agent. The second is the payoff to the principal. So if we look at what happens if an agent cheats and the principal audits, the agent gets zero, which is the lowest possible outcome he can get, because he was trying to cheat and got caught, and the principal gets two. But if the principal was auditing an honest agent, then he gets five, which is worse for him because he's minimizing, not maximizing. The agent is fine.
>>: So wait. For your row player it's the utility, and for the [inaudible] player it's the cost?
>> Mohamed Mostagir: The cost. Yes. Yeah. Sorry if this is a little confusing. This guy is minimizing and this guy is maximizing. So let's look at the best outcome for the column player. The best outcome, because he's minimizing, is when he gets zero, and this happens when the agent is honest and the principal is not expending any effort on auditing. So no one's stealing and I'm also not incurring any auditing expenditures. What is the worst outcome for the auditor? It's this one, when someone was cheating and was being ignored. Conversely, for the agent, the worst outcome is if they cheat and get caught, which is zero, because they're maximizing. The best outcome is if they cheat and get away with it, where they get this value. The other two cells are in between, so it's a modeling question whether, if I'm honest and I get audited, this bothers me or not. If I'm indifferent to it, then these two numbers are the same, two. If being audited while not doing anything wrong is bad for me, then I would just increase this number. So is the game clear? The next slide is just rephrasing this with symbols; we don't have to spend too much time on that.
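For readers following along without the slide, here is a rough reconstruction of the payoff matrix as described, with each cell showing (agent payoff, principal cost). It uses the talk's simplifying case where an honest agent gets the same payoff, two, whether audited or not; V3 and C3 stand for the two values that are not stated numerically, with V3 the agent's best payoff and C3 the principal's worst cost.

```
                    Principal: Audit    Principal: Ignore
Agent: Cheat        (0, 2)              (V3, C3)
Agent: Honest       (2, 5)              (2, 0)
```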
The principal is minimizing his cost, and the payoffs satisfy this relationship: C3 is the highest cost, the worst, and zero is the lower bound for the cost. The agents are maximizing: V3 is the maximum that they can get, and zero is the lower bound, which is the worst outcome.
>>: [inaudible]
>> Mohamed Mostagir: So because the principal is playing against a large population and there's anonymity, his action each period just consists of choosing a fraction of the population that he will audit in this period. So some quick facts about this game. If we play this game once, there is no pure strategy equilibrium, because if you're cheating and I'm not auditing you, then I should switch to auditing. But if I audit you, it's in your best interest to switch to being honest, and so on and so forth. So we keep chasing each other. Instead there is a unique equilibrium in mixed strategies, where each agent cheats with some probability and the principal audits a fixed fraction of the population. So if we're playing this game once, this is how the game should be played if I were the principal. Now, sometimes in economics when you start playing the same game repeatedly, other equilibria can arise. The reason for this is that if I play against you today and I know that we're meeting again tomorrow, then I know that my action today might influence your action tomorrow. And so with threats of punishment and things like that, you can get other equilibria to arise; this is called the folk theorem. But here, because of the assumptions of an infinite population and anonymity, this does not apply either. And as a quick reasoning for why this is the case, first let me state this quick result: the equilibrium that we talked about for the stage game is exactly the equilibrium of this game when it is repeated infinitely.
>>: Can you go back to the [inaudible] game? Just a quick question about what the [inaudible] look like -- so suppose we started out at the top left corner, right? You cheat and I audit. Then you move to being -- I'm assuming you move down, right?
>> Mohamed Mostagir: So --
>>: I'm trying to figure out whether it would be a cycle going across all four or --
>> Mohamed Mostagir: I'm going to talk about this if you want to wait.
>>: Okay. All right.
>> Mohamed Mostagir: Yeah, yeah. But here the point that I want to make is that if we're operating in the standard economic universe, then this game has a single solution. This is the solution. Every day we wake up, I decide whether I will cheat or not with the same probability that I used yesterday and that I will use tomorrow, and the principal does the same thing: every day he audits the same fraction of the population. And the reason this is true is that because the population is large and there is the anonymity assumption, I cannot single out anyone for punishment. And conversely, your individual action is just so small compared to everyone else's that it doesn't even matter what you do.
>>: [inaudible] if you learn that [inaudible] auditing or not auditing a particular probability if you have an estimate of that probability, if you have a [inaudible] probability your action might change, right? At the beginning you might have a [inaudible].
>> Mohamed Mostagir: So one result that I will not talk about is that if we assume Bayesian learning, where I start with a prior on the probability that you audit, then we will converge to this equilibrium.
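To see where that mixed equilibrium comes from, here is a minimal sketch using the usual indifference conditions. The symbols follow the payoff structure sketched above; the numbers plugged in at the end reuse the values mentioned in the talk (0, 2, and 5), while V3 and C3 are assumed values, so the output is only illustrative.

```python
def mixed_nash(v1, v2, v3, c1, c2, c3):
    """Mixed Nash equilibrium of the one-shot Cheat-Audit game.

    Agent payoffs:   (cheat, audit) = 0,  (honest, audit) = v1,
                     (honest, ignore) = v2, (cheat, ignore) = v3 (his best)
    Principal costs: (cheat, audit) = c1, (honest, audit) = c2,
                     (honest, ignore) = 0, (cheat, ignore) = c3 (his worst)
    Each side mixes so as to make the other side indifferent.
    """
    # Principal indifferent between auditing and ignoring when the fraction
    # of cheaters x satisfies  x*c1 + (1 - x)*c2 = x*c3:
    x_cheat = c2 / (c2 + c3 - c1)

    # Agent indifferent between cheating and honesty when the audit rate a
    # satisfies  (1 - a)*v3 = a*v1 + (1 - a)*v2:
    a_audit = (v3 - v2) / (v3 - v2 + v1)

    return x_cheat, a_audit

# v1 = v2 = 2, c1 = 2, c2 = 5 are the example numbers from the talk;
# v3 = 5 and c3 = 8 are assumptions just to make the call run.
print(mixed_nash(v1=2, v2=2, v3=5, c1=2, c2=5, c3=8))
```

With these stand-in numbers, agents cheat with probability about 0.45 and the principal audits about 60 percent of the population; the point is only that each side's mixing probability is pinned down by making the other side indifferent.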
So whether people are learning in a Bayesian fashion or not learning at all and being completely irrational, this is the equilibrium that we have. And that's it. Okay. All right. So let's introduce learning into this. I'll talk about replicator dynamics, but I'm also happy to answer any questions related to fictitious play. Here I will assume that the agent is not bothered by auditing. It's the same to him if he's honest, whether he's audited or not. Of course, if he's cheating, then it's a problem. So under this assumption there are only two possible scenarios for switching. The first is that an agent who was cheating and got caught, and then later meets someone who was not cheating, considers switching to their strategy because they did better than him. And the opposite scenario is when an agent was not cheating and then meets somebody who was cheating and got away with it, so he says, okay, well, let's give it a try in the next round. I'll assume that switching happens in the first scenario with probability P and in the second with probability Q. And now I'll discuss the myopic principal. The myopic principal plays in a way that is similar to how the population plays. He just looks at what happened in the last period and adjusts his reaction accordingly. He's not taking the whole future into account. And there are many contexts that actually encourage this kind of short-sighted principal. Most of them are in politics, where your term is ending in a few years, in a couple of years, so you don't care about anything beyond that point. So here what we'll do is denote the fraction of cheaters at any point in time by X of T and the fraction that the principal audits by alpha of T. And then I will write the equations that describe the evolution of these two quantities, X of T and alpha of T. They describe a dynamical system. We solve the dynamical system and we get the following theorem, which we also get if both parties were playing according to fictitious play. It says that the phase diagram of this game, with both parties, the population and the principal, learning in this myopic fashion, is a closed orbit that looks like this, with the mixed Nash equilibrium that we talked about at the center of the orbit. So what this means is the following. Say we start somewhere around here. This is the fraction of cheaters, and this is the audit rate of the principal. If we start at a point where there are not that many people cheating and not too much auditing, then what happens is that people start to meet, and those few who were actually cheating and getting away with it start telling everyone about it, and then everyone starts to cheat, and it grows and grows and grows until some point where the principal starts cracking down on the population because there has been a very high amount of crime.
>>: So when you're talking about the fraction of cheaters, it has to be the expectation of the number of cheaters, right? Because it depends on how you pair them. If you always pair --
>> Mohamed Mostagir: So because the population is infinite, everything happens with probability 1.
>>: Oh, okay.
>> Mohamed Mostagir: Yeah.
>>: So does the time --
>> Mohamed Mostagir: And this is -- again, this is -- I'm sorry. This is also an excellent question, because once the population is finite, then all sorts of problems arise, which I also talk about. Yes.
>>: Does the time integral over the [inaudible] somehow -- is the average the same as --
>> Mohamed Mostagir: Yeah.
Yeah, yeah, yeah. Yeah. Yeah, yeah. This is the next slide. Yeah. You guys are great. But first let's give a couple of examples of this cyclical behavior. So what's happening here is that at some point there's little crime, but then crime keeps increasing. And then the authority says, okay, this is not working, let's crack down on these guys, and this brings the fraction of criminals back to a low level, after which the auditing stops again. This is a very common feature. It's in almost all anti-corruption campaigns. China has a perfect example of this. One campaign was in the late -- early '50s, another was in the mid '60s, another was in the early '80s, I think. And if you actually look at the numbers, you would see that just prior to the crackdown the corruption was very, very high, and then it was driven down to a certain point, after which, I think after the first crackdown, this was called the Golden Age in China, because there was very little crime and also very little policing. But eventually crime starts to grow again, and then there's another crackdown. Another example is the LA metro. In LA you just go and take the metro. You don't have to buy a ticket. Well, you should buy a ticket, but there's nothing that prevents you from getting on the metro. So if you take the metro every day and you actually don't buy a ticket, there is a good chance that you'll get away with it for a while. But every so often you see people coming on and looking at everybody's tickets, and they do this for a few days, and then they disappear completely, and then after a while they come back again. So, again, it has this cyclical nature to it. And then the point that you made about the time average: if people are playing in this fashion, then the payoff averaged over time, for the principal and for an agent in the population, is the same as in the Nash equilibrium, which goes back to the point we made earlier, which says that even when you behave in this very naive fashion, you are able to approximate over time the outcome of everyone being rational. So fine so far? So then the natural question afterwards is --
>>: [inaudible]
>> Mohamed Mostagir: Yeah.
>>: Okay.
>> Mohamed Mostagir: So what happens when the principal is sophisticated? The principal knows that the population is learning in such a myopic fashion. Can he do better? The answer should be yes. But then how should he do that? How should he play this game? To answer this, I will formulate an optimal control problem. Again, I will denote the fraction of cheaters at any time T by X of T. The principal's action, which is the fraction that he audits, is alpha of T. And then the cost at any time is given by three terms. The first is those agents who were cheating and actually got caught, which costs you one. The second is those agents who were not cheating and were still audited, and this gives me a cost, C2. This is just the cost that I get from the table. And the third possibility is those that were cheating and I failed to audit, and then I incur the highest cost of the three. So this is my cost at any moment in time. And my action at any point in time also affects the evolution of the population of cheaters. Let's not get into how this happens, but as we described already, the more you crack down on people, the less you'd expect to see people cheating in the next period.
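Putting the three verbal cost terms into a single expression, here is a hedged reconstruction of the per-period cost; the label c1 for the first term is an editorial stand-in (the talk just says that case "costs you one"), while c2 and c3 follow the talk's notation, with c3 the largest of the three.

```latex
G\bigl(x(t),\alpha(t)\bigr) \;=\;
    \underbrace{\alpha(t)\,x(t)\,c_1}_{\text{cheaters who are audited}}
  + \underbrace{\alpha(t)\,\bigl(1-x(t)\bigr)\,c_2}_{\text{honest agents who are audited}}
  + \underbrace{\bigl(1-\alpha(t)\bigr)\,x(t)\,c_3}_{\text{cheaters who are missed}}
```

Note that this reconstructed G is linear in alpha(t), which is consistent with the answer later in the question-and-answer session that the bang-bang structure comes from linearity in the audit fraction.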
So my action at time T not only affects the payoff that I will immediately get from here but also affects my future payoffs through the dependence of the rate of change on alpha. So the principal's objective then is to minimize his long-run discounted cost over the entire horizon. So I just take this function G, which is the cost per stage or per time period, and we're [inaudible] this over the entire horizon. This is the discount factor. Such that the evolution of the population is governed by the equation of motion from the last slide. And the way we solve this is that we take a Hamiltonian approach. So basically we're [inaudible] Hamiltonian, which reduces the problem to a single-period problem. And what this means is that I want to solve a problem where I want to find the fraction for this period, and there is a price associated with the entire future that is also affected by my action in this period. And I would like to minimize this function. Solving the Hamiltonian gives me a necessary but not sufficient condition for the optimal solution. When we do this, we find that the optimal control for the problem is a bang-bang solution. This means that I either audit everybody, or I don't do anything. There is a case that lies in between, which I will get into. This happens when alpha basically disappears from the Hamiltonian. So that's one condition that the optimal policy has to fulfill. To get another condition, what I will do is formulate the problem as a variational calculus problem. So I will just replace alpha of T by this quantity from the equation of motion. And then my problem is to minimize this. I need to find the function X of T that minimizes this entire expression. And it turns out that if we use the Euler-Lagrange equation to solve this problem, which also gives us another condition for optimality, what we get is that X of T is actually a constant. So X of T does not depend on time. [inaudible] says that if X star of T is the minimizer of the variational calculus problem, then it is equal to C, where C is a constant that depends on the parameters of the problem. So this tells you that any optimal policy will keep the fraction of cheaters constant at a certain value. When I pair this necessary condition with the condition that I had on alpha, I'm able to completely characterize the optimal policy. And the optimal policy looks like this. There's a value X bar such that the optimal policy audits everybody when the fraction of cheaters is more than X bar, and it does nothing when the fraction of cheaters is less than X bar. But if X is exactly equal to X bar, then this is the third condition here, where I set alpha equal to this quantity, which I'll talk about later. And then the system stays in this state forever.
>>: So now I'm not sure I'm following. So you're saying you would keep the fraction that cheats constant.
>> Mohamed Mostagir: I would like to.
>>: But you can't, right, because they will move depending on how much you audit? You don't even know that unless you audit, right?
>> Mohamed Mostagir: Well, it turns out there is an audit rate for which you're able to keep this fraction constant. How does this work? It works because I'm balancing the number of people who will decide to switch from cheating to being honest in each period with the number of people switching in the opposite direction. So you can show that there is an audit rate that will achieve this balance. And actually it has this value, and it's unique.
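To see the two regimes side by side, here is a minimal simulation sketch. The mean-field imitation dynamics and the myopic principal's adjustment rule are illustrative assumptions (the talk does not spell out the equations of motion), and the threshold x_bar, the switching probabilities, and the other numbers are stand-ins rather than the closed-form expressions from the paper.

```python
def step_population(x, a, p=0.5, q=0.5, dt=0.1):
    """One mean-field step of the assumed imitation dynamics: caught cheaters
    switch to honesty with probability p, honest agents who meet an uncaught
    cheater switch to cheating with probability q."""
    x += dt * x * (1 - x) * (q * (1 - a) - p * a)
    return min(max(x, 0.0), 1.0)

def myopic_principal(a, x, x_target=0.3, k=0.4):
    # Reacts only to last period's cheating level (an assumed adjustment rule).
    return min(max(a + k * (x - x_target), 0.0), 1.0)

def threshold_policy(x, x_bar=0.18, p=0.5, q=0.5, tol=0.01):
    """The sophisticated policy described in the talk: audit everyone above
    x_bar, no one below it, and hold a balancing rate once x sits at x_bar."""
    if x > x_bar + tol:
        return 1.0                  # crack down on everybody
    if x < x_bar - tol:
        return 0.0                  # do nothing
    return q / (p + q)              # inflow and outflow of cheaters cancel

x, a = 0.35, 0.1
print("myopic principal:")
for t in range(300):
    a = myopic_principal(a, x)
    x = step_population(x, a)
    if t % 50 == 0:
        print(f"  t={t:3d}  cheaters={x:.2f}  audit={a:.2f}")

x = 0.35
print("sophisticated principal:")
for t in range(300):
    a = threshold_policy(x)
    x = step_population(x, a)
    if t % 50 == 0:
        print(f"  t={t:3d}  cheaters={x:.2f}  audit={a:.2f}")
```

Run as is, the first loop keeps cycling between high-crime, high-audit phases and low-crime, low-audit phases, while the second drives the cheating fraction down to the threshold and then holds it there with a constant, lower audit rate, which is the qualitative behavior described above.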
So then the policy is: I have this job now, I look at the fraction of cheaters, I see that it's very high, I crack down on everybody indiscriminately, I drive it all the way down to X bar, and once I reach X bar, I keep it there by auditing at this value. So if you go back to the Recording Industry Association example, what happens is that in the year 2003 the fraction of file sharers in the total Internet population of the U.S. was 35 percent. The crackdown started then, as you can see in the graph. And by the end of the crackdown, actually, just before the jump to 35 percent, the fraction of file sharers was around 18 percent.
>>: [inaudible]
>> Mohamed Mostagir: Oh, I'm sorry. I'm very sorry. So this is the number of copyright infringement lawsuits filed by the Recording Industry Association of America, and this is time. Just prior to the green bars, the fraction of file sharers was around, I think, 14 percent. And then this jump started, initiating this crackdown, which lasted for five years. It wound down in 2008, and it drove the percentage of file sharers down to 16 percent, which, three years later, is still where it's at. So there's also another thing, and that's a bit of anecdotal evidence. At Caltech I'm a resident associate, which means I live with undergraduate students. We get to have dinner together every night. And every other week, invariably, you would find someone at the table saying, hey, I was just downloading so-and-so and I got a letter from the [inaudible] lawyer saying stop doing that. So there's evidence that they're still auditing at a very low rate, just like the optimal policy prescribes. And, of course, I'm not claiming that this is exactly a description of how the world works, but how the story fits the optimal policy is very, very interesting, I thought, where you start with this huge crackdown and now they just try to maintain the fraction of file sharers at a fixed value. So this answered the first question that I was interested in, if you remember, which is whether, when we account for learning in our problems, we see outcomes that actually resemble observed behavior. The second question was: do these policies perform better than if everyone were rational? And the answer is a resounding yes. So basically, for this problem that we just talked about, the audit rate that we converge to, which I denote by alpha B, is always less than the Nash audit rate. At the same time, depending on the discount factor, I'm able to drive the fraction of cheaters, or file sharers in this example, to very low levels, if I choose to. If we look at the Nash view of this problem, it is completely insensitive to discounting. No matter how much discounting there is, what will happen in the next period is exactly what happened in this period. But here I'm able to drive that fraction down, and do it with less auditing.
>>: [inaudible]
>> Mohamed Mostagir: The discounting is in the objective function of the principal. It's e to the minus rT.
>>: [inaudible]
>> Mohamed Mostagir: Yeah. So if we take the same objective function when everyone is rational, and, again, put this e to the minus rT at the beginning, then it doesn't have any effect. So what this is saying is that not only can I obtain a better outcome by reducing the fraction of crime, but I can do that by exerting less effort than the Nash solution. These are the details of why this is true; I'll just skip them.
So let's quickly talk about some other --
>>: So this -- this makes it better for the principal but worse for the agents.
>> Mohamed Mostagir: Great question. So this of course makes it better for the principal and worse for the agents. But assume now that we also include another term in our game to account for the welfare of the agents. So say that I want to penalize myself every time that I audit somebody who was not doing anything wrong. And so the objective now is to minimize crime in the population, and also not to bother people who are not doing anything wrong. So this is more of a social objective, kind of like my objective as a police chief or something. If you do that, you get the same results, but at different convergence rates.
>>: [inaudible] actually play the game [inaudible] switch it around, it would seem essentially that massive agents would [inaudible].
>> Mohamed Mostagir: If you switch it around, then you get the mirror image of these results. So say that the principal is the myopic one, okay, and the agents are sophisticated. Then they would give the principal the impression that everything's fine, there's very little crime, and then once he sits back, they start cheating. When he catches on to it, they stop, and so on. So let's talk about a few more examples. Advertising is a very prominent example. Let's assume that advertising actually increases agents' propensity to buy a product. But the problem is that advertising is costly. If we assume that people talk to each other, that people who receive the [inaudible] advertising talk to other people who have not, and that this has an effect kind of similar to what advertising would do, it doesn't have to be the full effect but at least a fraction of it, then the optimal policy would again give you cyclical waves of advertising. So what happens is that there is a big advertising splash, and afterwards there is very little advertising, or at least not as much. During that time, the advertiser is making use of the fact that word of mouth is doing the work for him for free. And as soon as this wears off, they start doing it again. And actually this happens in contexts where we wouldn't think it's happening. Like, for example, Coca-Cola. Coca-Cola is advertising all the time, and that's a very steady level of advertising. But I think once every three years or something they have to have a new song, they have to have a new logo or something. Sometimes it's bigger, even redesigning the whole bottle, a new shape for the bottle or for the logo, things like that. Another example is traffic management. An example of this is when the Bay Bridge was closed in San Francisco. What happened is that people adjusted their driving habits. They started taking different routes. And during that time congestion actually decreased. But when the bridge was reopened again, it's not like everyone started taking the old route the next day; they slowly converged back to driving on their old routes. So the point is you can make use of periodic closures of roads, or even adjust tolls dynamically, to exploit this transition period when people adjust their behavior, in order to buy yourself more gains. Finally, another application is equilibrium selection. So I'm Apple and I just released OS4. I want everyone to move to OS4. So I start making it very difficult for them to use OS3.
And the principle behind this is essentially the same thing that we talked about: you don't dictate to people what they should do, but you slowly make one outcome much more favorable than the other and eventually push everyone towards the equilibrium that you prefer, which is everyone using the new OS, or no one using Windows 98 anymore, et cetera. So let's talk about other learning models. Let me highlight the features that make these results work. There are two things, two main points. The first, which we brought up again and again, is the lack of instantaneous reaction to changes in the environment, like the driving example or like the cheating thing. The other is strategic myopia, which means that whatever way I use to make my decision, whether I'm looking at history or whether I'm talking to someone and just imitating [inaudible] more successful, I'm not thinking about how my action will influence the principal's action in the future. I'm just pretending that the principal is a machine that plays with a fixed probability. So learning models that fit this are adaptive play, which we talked about, fictitious play, and replicator dynamics, which we already talked about. These will all have policies that are similar in structure but maybe different in convergence rates or guarantees. But all of them will still perform better than the Nash equilibrium. And a model for which this will not work is, say, one where you respond to last period's play immediately. So if a change happens in the system, then in the next round, even though you're still myopic, you're responding to that change, and we are unable to derive similar results. And obviously if there is a Bayesian population that starts with correct priors, then they'll be able to spot the cyclical behavior and act against it, which brings us back to the static Nash equilibrium case.
>>: [inaudible] public list [inaudible].
>> Mohamed Mostagir: So that's a good question. Then depending on the probability you'll have like [inaudible] cases. If you fully respond to last period's play, then you're unable to do anything.
>>: Are these models also different from information [inaudible] because in order to respond to the [inaudible] the principal one would have to know it rather than just talking to people [inaudible].
>> Mohamed Mostagir: So it's enough for you to say, for example, you're the principal, I'm the agent. In this period, you audit me. So I'll just respond to your action against me, and that's all I need to do.
>>: Oh, I see.
>> Mohamed Mostagir: So in the next round I know that you're auditing, so let me not do this. So let me conclude before I talk about some other learning projects. What I wanted to convince you of is that myopic learning models are powerful enough to predict observed behavior in many real-world scenarios; that there are contexts in which the principal being myopic will lead the whole system to converge to something that's very similar to the Nash equilibrium even though no one is sophisticated enough; and that this cyclical behavior is [inaudible] of many phenomena that we see in the [inaudible], for example. For a large class of games, what I did is completely characterize the unique optimal policy that the principal should follow, and I showed that the optimal policy strictly improves on the Nash solution. And this goes back to the point on the very first slide, which is that we should think of population learning as a resource for the principal to utilize.
The fact that agents are learning is something that I can use to my advantage, as long, of course, as they are learning in a particular fashion. This slide I think I forgot to complete, but I already talked about it. And then, just as we discussed, these results generalize to other learning models, as long as they have the two core features that we talked about, and generalize to games in advertising and traffic management and [inaudible] selection. So before I go into this, do you want to ask any general questions?
>>: This solution is a bang-bang solution. Is it a pretty robust picture throughout the [inaudible]?
>> Mohamed Mostagir: Yeah, yeah.
>>: [inaudible] what property is it that makes the [inaudible]?
>> Mohamed Mostagir: You mean technically or --
>>: [inaudible]
>> Mohamed Mostagir: Technically, it's that the [inaudible] is just a linear function of the fraction with which you audit. But if you want to think about it on a higher level, it's the fact that if you make your decision based on, say, history, then if I want to convince you that things look a certain way, the fastest way for me to do this is to do a bang-bang thing. So I want to convince you that we are auditing everyone who cheats. So instead of auditing at, like, you know, 75 percent, if I audit at a hundred percent for a long time, then if you learn in this fashion, this is what you'll believe. It will take a while for you to realize this is not what's happening. Yeah.
>>: So you were talking about the replicator dynamics, and I was wondering -- so here's one alternative. Alternative one is you randomly pair a bunch of these guys up. As they talk to each other, each pair decides what's better, cheat or not cheat. An alternative is I don't talk to just one individual, say I talk to a hundred individuals, or whatever, ten individuals. And then I sample essentially, and I know what proportion of the people are getting audited, and then I respond to that. So I'm thinking to myself, okay, 5 percent of the people are audited or, you know, 8 percent of the people are getting audited, and actually [inaudible] according to them. So will this change any of the results and how?
>> Mohamed Mostagir: It will not change the results structurally, but it will change the convergence rates for sure. Because now the more people you talk to, the better idea you have about what's actually going on. And so you end up switching more frequently. And ultimately, if you know everything, then we go back to the case where there's a Bayesian thing going on where you converge to the Nash equilibrium.
>>: It's just how much uncertainty I have regarding the portion that gets audited.
>> Mohamed Mostagir: Yeah.
>>: [inaudible] dynamics but studied on different topologies? Because it would seem that --
>> Mohamed Mostagir: You mean like a network topology?
>>: Yeah, that I only talk to my neighbors --
>> Mohamed Mostagir: So --
>>: [inaudible] be the same.
>> Mohamed Mostagir: So this is not a replicator dynamics result, but it's similar in spirit. I think it's a paper by Golub [phonetic] and Jackson from a year ago. What they do is that there's a network topology and each agent just takes the average of his neighbors' actions. And then they show that this naive behavior leads to an equilibrium in many games. So it's similar in the sense that I'm not really sophisticated about what I'm doing, I'm just updating myopically by taking an average. And the results are very similar to the paper that I mentioned in the beginning where people are actually doing Bayesian updating.
So maybe with a large number of people you don't need to be sophisticated to arrive at the rational outcome of a game. All right. So any other questions before I move on?
>>: So one question. If you think about other games, like the beauty contest game, you're assigned [inaudible] what would be the analog [inaudible] game playing beauty contest where people are -- everybody and their agent try to [inaudible] principals.
>> Mohamed Mostagir: So [inaudible] beauty contest game would look like what?
>>: [inaudible] so people look at how people are learning about -- so they start off, right, and they see what was [inaudible] and then they realize, okay, people are sort of going towards the lower number, so then next time their direct influence --
>> Mohamed Mostagir: So I think what will happen is that instead of immediately, you know, playing the lower bound, they will start slowly moving towards it. So I guess if you are one of the players and you know that this is going on, then you can probably exploit that to your advantage. I have not thought about this, but it would make sense to me. Yeah. Because, again, there is this feature that in the beauty contest you know exactly what you should do if everyone is rational. But if it takes you a while to get there, then someone can use this to trick you in the middle.
>>: Do you have any anecdotal [inaudible]?
>> Mohamed Mostagir: So for the stuff in the final paper, the hope is that at least the RIAA example will be a complete, fully data-based example with numbers and everything. But something that I'm working on now with someone else at Ohio State is a lab experiment to see if people actually behave in this way in the lab, by playing the Cheat-Audit game, for example. And of course the first difficulty that you run into is that all these results are for a large population. In the lab you have 30 people or something. So how large is large? There are some games where a large population consists of six people, and then everything that works on paper actually works in the lab. So the two questions that we're trying to deal with here are what the threshold is after which these procedures work, and whether people are really behaving in this non-Bayesian fashion. So hopefully this will shed some light on that. Another thing -- one of the reasons that got me started working on this is that one of my advisors, who works at another company that does ad auctions, was telling me that sometimes they change the rules by which they bill the advertisers, but they don't publicize this change. And then they look at the data and they see that the advertisers are actually not instantaneously responding to this change. So with this massive amount of data, I'm interested in looking for patterns that suggest similar behavior is going on in terms of, you know, the lag in how you respond to things, and whether you respond in a myopic fashion or in a more strategic one. So this is more under the future-work aspect of it.
>>: So isn't there a bias of [inaudible] for example, in the music industry case, when the auditing is not happening, when people are not being audited, maybe whole cheating events might go unnoticed and might not even appear in the data? So [inaudible].
>> Mohamed Mostagir: [inaudible] the best thing to do that I can think of is just to estimate based on -- I mean, the assumption is that the population is very large, so whatever sample you have will give you very good information, with high probability, about the overall composition of the population.
>> Peter Key: [inaudible].
>> Mohamed Mostagir: Okay. So another thing -- we have only talked about a single principal. This would make sense for, like, a policing application. But in advertising, what happens when there are multiple principals and they all act as if the agents are behaving in this fashion? Then what are the results, what are the equilibria of these games between the principals? That's something that I am thinking about. A more general line of research is to understand the benefits and the shortcomings of how consumers learn. So this is another project that I'm working on: centralized recommendation systems versus word-of-mouth learning. Think of something like Yelp, where you go into a new city that you don't know anything about, you want to find the best place to get a steak, you go on Yelp, it tells you this place, 98 percent of the people like it, you go there. And an opposite model is that you just ask around, which was the model up until centralized recommendation systems came around. So I'm doing this with [inaudible], and the model we have is that we have consumers on a Hotelling line. What this means is that we have a line with a product vendor at each end, and we have consumers who are randomly distributed along this line. And then the payoff to the consumer is a function of the distance between them and the vendor and the quality of the product that they get. But the quality of the product is unknown. So you experiment with products. And naturally, if you are near one end of the line, you'd experiment with the closest product first. And then we assume that after enough experimentation there is a breakthrough where the quality of a product is revealed. So then what we show is that for a recommendation system like Yelp, this information is propagated immediately, and so everyone converges to this product. And this might not be the socially efficient outcome, because it's possible that the other product actually has higher quality, but there is not enough experimentation to enable the population to find that out. And so the question is how do you design the recommendation system so that you take something like that into account. And here the learning component of course is obvious: you learn by talking to other people, or you just get everything aggregated from a centralized recommendation system. This other project is actually very, very interesting. It again has to do with the tension between standard economics and more recent ways of thinking about how people make decisions. In economics, if I give you a menu of 100,000 items and I ask you to choose the one that you prefer the most, the optimal item, then the assumption is that you can choose it. Well, it's not clear that this is actually possible. Most of the time subjects actually end up satisficing. And what this means is that they choose something which is above a reservation value that they have, once they reach it. If it's good enough for them, they don't have to find the optimal option in the menu. And there has been a recent eye-tracking experiment done at Caltech.
So this is an experiment that actually started at NYU, where a subject is presented with a menu of simple arithmetic operations that vary in difficulty. The subject needs to choose one of these options, and the outcome of the arithmetic operation is the payoff that they will get in dollars. So ideally you would choose the option that gives you the highest payoff. This is of course a timed experiment, so unless you are very smart, you are unable to evaluate all the options. And it shows very concrete evidence that once people reach a certain reservation utility, they actually stop. This was confirmed by the eye-tracking experiment in that they actually don't even look at the rest of the menu. So then the question is how do we design menus to take advantage of this satisficing. Say that I want to sway your choice in a certain direction because I know that you're unable to process so much information. Then how do I design the menu such that you choose certain things that I would prefer you choose instead of choosing what's best for you? And can I use this both for my gain as, say, a firm, or to have a more fair distribution of resources, which is a social outcome instead of self [inaudible]. And so that's all I have to say.
>> Peter Key: Okay. Thanks very much.
>> Mohamed Mostagir: Thank you.
[applause]