>> Peter Key: Well, welcome, everybody, and welcome to our remote viewers. So it's a
pleasure to be here in MSR Redmond, and also a pleasure to introduce Mohamed Mostagir,
who's currently at Caltech and he's doing his Ph.D. in economics there, having previously done a
master's of science, SM, at MIT. So he's got very broad-based skills in both computer science, and he's also gone over to the dark side, economics.
Mohamed, it's a pleasure. Thank you.
>> Mohamed Mostagir: Thank you. Thank you very much for inviting me and for coming.
It seems very difficult today with the TechFest thing.
So I'm going to be talking about exploiting myopic learning. And this is designed for a broad audience. Please feel free to stop me at any point and ask me questions. Actually, I would
prefer if you do that instead of me just talking to myself.
So before I start, let me give you a brief idea of the kind of things I'm interested in. So I'm
interested in how people learn and how they make decisions. And I think about these questions
in the context of applications like consumers learning about product quality, deciding whether to buy a product or not, voters learning [inaudible] a politician, voting for him or not.
And if we think about learning in the more broad sense of social learning instead of individual
learning, then we can also ask questions about how information propagates in a population, how
rumors are started, how they spread, and then how social behavior arises from that transmission
of information between individuals.
And hopefully I don't have to convince you of why this is important. If we have a good idea of
how people learn, then it would help us develop better marketing strategies for our products. It
would help us develop better political campaigns.
And, in general -- and this is the key point that I want to bring up during the talk, and I will do this at many points -- I want to think about the fact that people learn and do not make the optimal decision instantaneously. I want to think about this as a resource that a system designer can use. And this will be clear in a moment.
But for now, let me just give you a quick story before I begin about one of my favorite industries
in Los Angeles, the music industry.
This is a multi-billion dollar enterprise that was affectionately described by Hunter S. Thompson
in this quote: "The music business is a cruel and shallow money trench, a long plastic hallway
where thieves run free and good men die like dogs. There's also a negative side."
So the music industry, around the turn of the century, was making a massive amount of revenue.
This started to change because, as all of you know, of file sharing. The biggest, most publicized case of that being Napster -- the Napster lawsuit in 2001, which the music industry successfully won and was able to force Napster to shut down its servers.
And part of the reason they were able to do that was that Napster was a centralized service. They
actually hosted music files illegally on their servers. And so that was easy to focus on and to
close the service down.
But what happened is that there was an explosion of peer-to-peer file sharing, where it becomes
very tricky to actually find out the perpetrators of the file sharing crime.
And what happened then is that the Recording Industry Association of America responded to this
file sharing by starting a massive lawsuit campaign. They were just suing people left and right,
college kids. It's documented that they sued more than one grandmother who didn't even own a computer.
And this campaign continued for five years, from 2003 to 2008. After that point, they just came
out and said, okay, we're going to stop doing this, it doesn't look like it's working, and that was
the end of the story.
So looking back at this, it would appear at first glance that what happened is that the RIAA saw a huge dip in profits. They panicked. They started lashing out at everyone with these lawsuits. They found out that it was not working, and that was it.
But I will try to argue in this talk based on some of the results that I present that their course of
action is actually what they should have done if we assume that people behave in a certain way
that I will discuss further. So you can keep this example in mind throughout the talk, but there
are more applications that I will also bring up later.
>>: [inaudible] successful [inaudible]?
>> Mohamed Mostagir: I'm saying that it was successful, even though it doesn't look like it was.
And we'll talk about more details.
So let's go back to the beginning and say why we're thinking about learning. So if you're an economist, you know that the Nash equilibrium is the thing that everyone bows to. And as a quick reminder of what a Nash equilibrium is: say that all of us in this room are playing a game and each one is playing a certain action; our actions are in equilibrium if I cannot unilaterally change my action and obtain a better payoff. Right?
So the Nash equilibrium has been at the center of many economic applications in system design. First and foremost is mechanism design. So mechanism design asks the question of how to design an economic or social system such that players play in a way that will give rise to certain equilibria that have desirable properties, either for the system designer or for the population.
At no point does the question arise of how people actually come to play that equilibrium. The assumption is that people are so smart that they are able to immediately play the equilibrium of any game that they find themselves in.
And of course a more realistic way to think about this is to think that people learn how to play
games. So if I just like give you a game right now, it's not very plausible that you'll be able to
immediately play the optimal actions. But if you play it long enough, then after a while you
learn and you play and you converge to an equilibrium.
So what I'm arguing is that we should think about the equilibria as static limit points of what is
naturally a dynamic process of learning where people learn and converge to the equilibrium
underneath.
And then the central questions I want to address is if we start thinking about games in this sense
instead of the standard way in which economists have thought about them for so long, thinking
about them in terms of learning dynamics instead of static limit points, then do we get a better prediction of observed behavior? Because sometimes the equilibria do not really describe what we see in real life when it comes to certain games.
And if we're able to do that, if we're able to verify that incorporating learning actually brings us a step closer to reality, then is it possible to interfere with the learning process so that we implement outcomes which, again, are not possible under the standard assumption of economics that people play that equilibrium immediately?
So you can think of this as we are in an equilibrium. We don't arrive here immediately, but it
takes us a while to get there. So if I'm a system designer, can I meddle with what's happening in this period, from when we start the game to the point where we reach the equilibrium, to get better outcomes for myself?
>>: [inaudible]
>> Mohamed Mostagir: Possibly not. Excellent question.
So the first result is that there's a wide class of empirically plausible learning dynamics, which I
will describe shortly, that agree more with phenomena that we see in the real world than if we
were using the standard economic approach.
And if we are sophisticated enough, if the system designer or principal is sophisticated enough,
they can manipulate these learning dynamics and obtain gains that improve on the standard results [inaudible].
So here's an outline of what I'm going to do. I'm going to first talk about myopic and
non-myopic learning models, a quick background on each. And then I'll introduce a class of
games called Cheat-Audit games. It's a very simple game in whose context I will derive most of the results, but I will later also talk about other games to which these results apply.
So first I will talk about an analysis of the game in terms of standard economic theory, with the Nash equilibrium, and then I will ask what happens if everyone in the game is myopic. So this will again be a game played by a population of agents against a principal.
So what happens if both the players and the principal are myopic? And when I answer this question -- and we'll see why this is interesting when the time comes for it -- I ask what happens if now the principal is sophisticated and can take advantage of the population learning.
When I'm done with this, I will talk about how to generalize these things, these results to other
games and other learning models, conclusion, and then I'll quickly talk about other projects that
I'm working on that also have the theme of learning in one form or another.
So let's first talk about learning models. The standard model is Bayesian learning. So Bayesian learning assumes that we start the game and I have some beliefs about your beliefs, and you have some beliefs about my beliefs, and I have beliefs about those beliefs, et cetera. And under some technical conditions on these beliefs, the play in the game will converge to a Nash equilibrium.
There's also very recent work when it comes to social learning, not just individual learning. So
Bayesian social learning is something that received some attention over the last couple of years.
And, again, it looks at individuals on a network learning in a Bayesian fashion, getting
information from their neighbors, updating in a Bayesian sense. And trying to find out the true
state of the world, where that could be that the product is bad or the product is good, something
like this.
The problems with these approaches: well, first, the Bayesian model is very useful and important in the sense that it sets a benchmark against which we can measure any other learning model. But the problem is that the prior space can be huge. I have to have priors over many, many possible scenarios. And any scenario that I started with a zero probability on will never actually be learned, because I didn't even think it was possible.
And then the Bayesian updating, given such a large state space, can be computationally infeasible. And as soon as the problem becomes slightly complicated, then a normal human agent will not be able to do the Bayesian updating.
So Bayesian updating is fine if we're talking about machine learning, because machines will be
able to do that. But in terms of humans, it doesn't seem like a plausible way to explain behavior and how people act.
So we turn to myopic models, which have been introduced in the literature at various points in time. The first -- or one of the first -- is fictitious play. The basic assumption here is that I am learning: I'm playing against you, I just look at the frequency with which you play certain actions, and I just respond to that. I keep updating each period depending on your play.
So this is myopic in the sense that it's strategically myopic. I don't think about how my action this period will affect how you'll play in the next period. I'm thinking of you as a static person who plays with some fixed distribution.
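To make this concrete, here is a rough sketch of a fictitious-play update in a generic 2-by-2 game; it is only an illustration, and the payoff matrices are made up for the example rather than taken from the talk:

    # Sketch of fictitious play in a 2-by-2 game (illustrative payoffs, not from the talk).
    import numpy as np

    A = np.array([[3.0, 0.0], [1.0, 2.0]])   # row player's payoffs A[i, j]
    B = np.array([[2.0, 1.0], [0.0, 3.0]])   # column player's payoffs B[i, j]

    counts_row = np.ones(2)   # how often the row player has played each action
    counts_col = np.ones(2)   # how often the column player has played each action

    for t in range(1000):
        # Each player best-responds to the empirical frequency of the opponent's play.
        freq_col = counts_col / counts_col.sum()
        freq_row = counts_row / counts_row.sum()
        a_row = np.argmax(A @ freq_col)      # row player's best response
        a_col = np.argmax(freq_row @ B)      # column player's best response
        counts_row[a_row] += 1
        counts_col[a_col] += 1

    print("empirical frequencies:", counts_row / counts_row.sum(),
          counts_col / counts_col.sum())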
A similar model is adaptive play. Here memory is limited to, say, the last K periods. In each period a fraction of the population is replaced by new agents -- this is either you changing your mind or just exiting the system and new people coming in. They look at these last K periods, and they make a decision that they stick to until they themselves exit the system.
So this is also characterized by this myopia of choosing an action that you do not change during the course of your play.
>>: [inaudible]
>> Mohamed Mostagir: Of a Bayesian object?
>>: Yeah.
>> Mohamed Mostagir: How so?
>>: [inaudible] update -- if you update [inaudible] of the history [inaudible] if you're updating
time [inaudible].
>> Mohamed Mostagir: So one thing that at least fictitious play can do that Bayesian learning cannot is that you're open to any action that can be played. If you're learning in a Bayesian fashion and you play something that I didn't think you would play, then I wouldn't know what to do.
And then the last model, which I'm going to focus on for the rest of the talk, is replicator
dynamics. And this is a very simple kind of social learning. So the way this works is that we're
playing a game repeatedly. Say I'm taking a certain route to work every day. So I'll get up in the
morning I drive to work, and at the end of the day I meet Peter and I say that this is the route I
took and this is how long it took me. He tells me his experience, and if he did better than I did, then I consider switching to his strategy in the next period, but the probability depends on how much better he did.
So I'm just imitating my more successful friends, and I'm doing this probabilistically. I'm going to talk about this in depth a little later. Is that clear? So far so good?
All right. So let's start with the model. We have a very large population. And this large
population is playing a game repeatedly against the principal. There's only a single principal
playing against a large population. And I assume anonymity, and this means that every member
in the population looks the same to the principal, like [inaudible] based on anything.
And like we mentioned, the agents learn in accordance with replicator dynamics. And the way
this works is just as I described. After each round, agents are paired randomly. They compare
the strategies and payoffs. And if agent I did better than agent J, then J switches to I's strategy with a probability that's proportional to the difference in payoffs.
So if their payoffs are comparable, J is not very inclined to switch. If I did much, much better than J did, then it becomes more likely that J would switch to I's strategy.
So the nice thing about these learning dynamics is that they are perhaps the simplest learning
dynamics that more often than not converge to a Nash equilibrium of the underlying game being
played.
So this provides a nice middle ground between a myopic way of behaving and the rational outcome that traditional economic theory prescribes. So even though agents are updating their actions in this fairly naive fashion, they actually do end up at the equilibrium that they would have played from the beginning if all of them were sophisticated enough.
So that's one nice aspect of it, this wisdom of the crowds effect. And another nice behavioral
effect is that the probabilistic switching implies that there's a cost to switching your strategy. It's
not like you can always every day do something different. But there's a cost of switching. And
so this is captured by the fact that you switch probabilistically.
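To fix ideas, here is a minimal sketch of one round of that pairwise imitation step; the strategies, realized payoffs, and the scaling of the switching probability are all placeholders, with only the rule itself (copy the better-off partner with probability proportional to the payoff gap) coming from the description above:

    # Sketch of one imitation step of the replicator-style dynamics described above.
    import random

    def imitation_step(strategies, payoffs, scale=1.0):
        """Randomly pair agents; the worse-off agent in each pair copies the better
        strategy with probability proportional to the payoff difference."""
        idx = list(range(len(strategies)))
        random.shuffle(idx)
        new = list(strategies)
        for i, j in zip(idx[::2], idx[1::2]):
            if payoffs[i] == payoffs[j]:
                continue
            lo, hi = (i, j) if payoffs[i] < payoffs[j] else (j, i)
            p_switch = min(1.0, scale * (payoffs[hi] - payoffs[lo]))
            if random.random() < p_switch:
                new[lo] = strategies[hi]
        return new

    # Illustrative round: 'C' = cheat, 'H' = honest, with made-up realized payoffs.
    strategies = ['C', 'H', 'C', 'H']
    payoffs    = [0.0, 2.0, 10.0, 2.0]   # caught cheater, honest, lucky cheater, honest
    print(imitation_step(strategies, payoffs, scale=0.1))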
So this is the learning model. And we'll apply that to the following game. This is called the
Cheat-Audit game. It's a 2-by-2 game that's played, like we said, by an infinite population against a principal. Each member of the population is a row player, and he has one of two actions. Let me quickly grab this.
quickly grab this.
So each member of the population can either cheat or be honest. And the principal also has one
of two actions. He can either audit the agent or ignore the agent. But auditing is costly, and I'll
describe that in the payoffs in a second.
So if I'm the principal and I'm auditing the agent, I would prefer that they were actually doing something wrong instead of me just expending auditing costs needlessly.
So here an agent in the population is trying to maximize his payoff, while the principal is trying to minimize his cost. And each cell here has two numbers. The first is the payoff to the row player, to the agent. The second is the payoff to the principal. So if we look at what happens if
an agent cheats and the principal audits, what happens is that the agent gets zero, which is the
lowest possible outcome you can get, because he was trying to cheat and got caught, and the
principal gets two.
But if the principal was auditing an honest agent, then he gets five, which is worse for him
because he's minimizing, not maximizing. The agent is fine.
>>: So wait. For your row player it's the utility, and for the [inaudible] player it's the cost?
>> Mohamed Mostagir: The cost. Yes. Yeah. Sorry if this is a little confusing. This guy is minimizing and this guy is maximizing.
So let's look at the best outcome for the column player. The best outcome, since he's minimizing, is when he gets zero, and this happens when the agent is honest and the principal is not expending any effort on auditing. So no one's stealing and I'm also not incurring any auditing expenditures.
What is the worst outcome for the auditor? It's this one: when someone was cheating and was being ignored.
Conversely, for the agent, the worst outcome is if they cheat and get caught, which is zero, because they're maximizing. The best outcome is if they cheat and get away with it, where they get the highest payoff.
These two other cells are in between, so it's a modeling question whether, if I'm honest and I get audited, this bothers me or not. If I'm indifferent to it, then these two numbers are the same, two. If being audited while I'm not doing anything wrong is bad for me, then I will just increase this number.
So is the game clear? This is just rephrasing things with symbols; we don't have to spend too much time on that. The principal is minimizing his cost, and the payoffs satisfy this relationship: C3 is the highest cost, the worst, and zero is the lower bound for the cost. The agents are maximizing: V3 is the maximum that they can get, and zero is the lower bound, which is the worst outcome.
>>: [inaudible]
>> Mohamed Mostagir: So because the principal's playing against a large population and there's
anonymity, the action each period just consists of me choosing a fraction of the population that I
will decide to audit in this period.
So some quick facts about this game. If we play this game once, there is no pure strategy equilibrium, because if you're cheating and I'm not auditing you, then I should switch to auditing. But if I audit you, it's in your best interest to switch to being honest, and so on and so forth. So we keep chasing each other.
Instead there is a unique equilibrium in mixed strategies, where each agent cheats with some probability and the principal audits a fixed fraction of the population. So if we're playing this game once, this is how this game should be played if I were the principal.
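For what it's worth, the mixed equilibrium of a game with this structure falls out of the two indifference conditions. The sketch below uses the payoff structure from the slide; C1 = 2, C2 = 5, and V1 = 2 are the numbers mentioned for the table, while C3 and V3 are illustrative placeholders:

    # Sketch: mixed Nash equilibrium of the Cheat-Audit stage game via indifference.
    # Agent payoffs: cheat&audited = 0, cheat&ignored = V3, honest = V1 either way.
    # Principal costs: cheat&audited = C1, honest&audited = C2, cheat&ignored = C3,
    #                  honest&ignored = 0.
    C1, C2, C3 = 2.0, 5.0, 8.0     # C3 > C2 > C1 > 0; C3 = 8 is a placeholder
    V1, V3 = 2.0, 10.0             # V3 > V1 > 0; V3 = 10 is a placeholder

    # Principal mixes so the agent is indifferent between cheating and being honest:
    #   (1 - alpha) * V3 = V1
    alpha_star = 1 - V1 / V3

    # Agents mix so the principal is indifferent between auditing and ignoring:
    #   x*C1 + (1 - x)*C2 = x*C3
    x_star = C2 / (C3 - C1 + C2)

    print(f"Nash audit rate alpha* = {alpha_star:.3f}, cheating fraction x* = {x_star:.3f}")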
So sometimes in economics when you start playing the same game repeatedly, other equilibria
can arise. And the reason for this is if I play against you today and I know that we're meeting
again tomorrow, then I know that my action today might influence your action tomorrow. And
so with threats of punishment and things like that, you can enforce other equilibria to arise.
But here -- and this is called the Folk theorem -- because of the assumptions of the infinite population and anonymity, this also does not apply. As a quick indication of why this is the case, first let me state this quick result: the equilibrium that we talked about for the stage game is exactly the equilibrium of this game when it is repeated infinitely.
>>: Can you go back to the [inaudible] game? Just a quick question about how the [inaudible]
they look like -- so suppose we started out at the top left corner. Right? You cheat on I. Then
you move to being -- I'm assuming you move down. Right?
>> Mohamed Mostagir: So --
>>: I'm trying to figure out whether it would be a cycle going across all four or --
>> Mohamed Mostagir: I'm going to talk about this if you want to wait.
>>: Okay. All right.
>> Mohamed Mostagir: Yeah, yeah. But here the point that I want to make is that if we're operating in the standard economic universe, then this game has a single solution. This is the solution. Every day we wake up, I decide whether I will cheat or not with the same probability that I decided with yesterday and that I will use tomorrow, and the principal does the same thing: every day he audits the same fraction of the population.
And the reason this is true is that because the population is large and there is the anonymity assumption, I cannot single out anyone for punishment. And conversely your action does not change -- it is just so small compared to everyone else's. So it doesn't even matter what you do.
>>: [inaudible] if you learn that [inaudible] auditing or not auditing a particular probability if
you have an estimate of that probability, if you have a [inaudible] probability your action might
change, right? At the beginning you might have a [inaudible].
>> Mohamed Mostagir: So one result that I will not talk about is that if we assume that we have Bayesian learning, where I start with a prior on the probability that you audit this much, then we will converge to this equilibrium.
So whether people are learning in a Bayesian fashion or not learning at all and being completely irrational, this is the equilibrium that we have. And that's it. Okay.
All right. So let's introduce learning into this. And I'll talk about replicator dynamics, but I'm
also happy to answer any questions related to fictitious play.
So here I will assume that the agent is not bothered by auditing. It's the same to him, if he's honest, whether he's audited or not. Of course, if he's cheating, then it's a problem.
So under this assumption there are only two possible scenarios for switching. The first is that an agent who was cheating and got caught, and then later meets someone who was not cheating, considers switching to their strategy because they did better than him.
And the opposite scenario is when an agent was not cheating and then meets somebody who was cheating and got away with it, so he says, okay, well, let's give it a try in the next round. I'll assume that switching happens in the first scenario with probability P and in the second with probability Q.
And now I'll discuss the myopic principal. So the myopic principal plays in a way that is similar
to how the population plays. He just looks at what happened in the last period and adjusts his
reaction accordingly. He's not taking the whole future into account.
And there are many contexts that actually encourage this kind of short-sighted principal. Most of them are in politics, where you only care about your term, which is ending in a few years, in a couple of years, so you don't care about anything beyond that point.
So here what we'll do is that we'll denote the fraction of cheaters at any point in time by X of T
and the fraction with which the principal audits by alpha T. And then I will write the equations
that describe the evolution of these two quantities, X of T and alpha T. They describe a
dynamical system. And we solve the dynamical system and we get the following theorem, which we also get if both parties were playing according to fictitious play. It says that the phase diagram of this game, with both parties -- the population and the principal -- learning in the myopic fashion, is a closed orbit that looks like this, with the mixed Nash equilibrium that we talked about as the center of the orbit.
So what this means is the following. Say we start somewhere around here. This is the fraction
of cheaters. And this is the audit rate of the principal. If we start at a point where there are not
that many people cheating and not too much auditing, then what happens is that people start to
meet and those few who were actually cheating and getting away with it, they start telling
everyone about it, and then everyone starts to cheat, and it grows and grows and grows until
some point where the principal starts cracking down on the population because there has been a
very high amount of crime.
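The exact equations of motion are not reproduced here, but a standard two-population replicator form gives the same qualitative picture: a closed orbit around the mixed equilibrium. In the sketch below the payoff parameters, the principal's adjustment speed k, the starting point, and the step size are all assumptions made for illustration:

    # Sketch: myopic population (replicator) vs. myopic principal, cycling around
    # the mixed Nash equilibrium. The functional form is a standard illustration,
    # not the exact system from the talk.
    C1, C2, C3 = 2.0, 5.0, 8.0      # principal's costs (placeholders, as before)
    V1, V3 = 2.0, 10.0              # agent's payoffs (placeholders)
    k, dt, T = 0.5, 0.001, 60.0     # adjustment speed, step size, horizon (assumed)

    x, alpha = 0.1, 0.1             # few cheaters, little auditing to start
    traj = []
    for _ in range(int(T / dt)):
        gain_cheat = (1 - alpha) * V3 - V1                  # cheating vs. honest payoff
        cost_gap = x * C3 - (x * C1 + (1 - x) * C2)         # ignoring vs. auditing cost
        dx = x * (1 - x) * gain_cheat                       # replicator update for agents
        dalpha = k * alpha * (1 - alpha) * cost_gap         # myopic principal adjustment
        x += dt * dx
        alpha += dt * dalpha
        traj.append((x, alpha))

    xs = [p[0] for p in traj]
    print("fraction of cheaters oscillates between",
          round(min(xs), 3), "and", round(max(xs), 3))

The simple forward-Euler step here is only meant to show the oscillation; a finer integrator would keep the orbit closed more exactly over long horizons.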
>>: So when you're talking about the fraction of cheaters, it has to be the expectation of the next number of cheaters, right? Because it depends on how you pair them. If you always pair --
>> Mohamed Mostagir: So because the population is infinite, everything happens with probability 1.
>>: Oh, okay.
>> Mohamed Mostagir: Yeah.
>>: So does the time --
>> Mohamed Mostagir: And this is -- again, this is -- I'm sorry. This is also an excellent question, because once the population is finite, then all sorts of problems arise, which I also talk about. Yes.
>>: Does the time integral over the [inaudible] somehow -- is the average the same as --
>> Mohamed Mostagir: Yeah. Yeah, yeah. This is the next slide. Yeah. You guys are great.
So but first let's give a couple of examples about the cyclical behavior. So what's happening here
is that at some point there's little crime but then crime keeps increasing. And then the authority
says, okay, this is not working, let's crack down on these guys, this brings the fraction of
criminals back to a low level, after which the auditing stops again.
This is a very common feature. It's in almost all anti-corruption campaigns. China is a perfect example of this. One campaign was in the late -- early '50s, another was in the mid '60s, another was in the early '80s, I think.
And if you actually look at the numbers, you would see that just prior to each crackdown the corruption was very, very high, and then it was driven down to a certain point -- after the first crackdown, I think, this period was called the Golden Age in China, because there was very little crime and also very little policing. But eventually crime starts to grow again, and then there's another crackdown.
Another example is the LA metro. In LA you just go and take the metro. You don't have to buy a ticket. Well, you should buy a ticket, but there's nothing that prevents you from getting on the metro.
So if you take the metro every day and you actually don't buy a ticket, there is a good chance that
you'll get away with it for a while.
But every so often you see people coming up and looking at everybody's tickets, and they do this
for a few days, and then they disappear completely, and then after a while they come back again.
So, again, it has this cyclical nature to it.
And then the point that you made about the time average: if people are playing in this fashion, then the payoff averaged over time, for the principal and for an agent in the population, is the same as in the Nash equilibrium, which goes back to the point we made earlier, which says that even when you behave in this very naive fashion, you are able to approximate over time the outcome of everyone being rational.
So fine so far?
So then the natural question afterwards is --
>>: [inaudible]
>> Mohamed Mostagir: Yeah.
>>: Okay.
>> Mohamed Mostagir: So what happens when the principal is sophisticated? The principal
knows that the population is learning in such a myopic fashion. Can he do better? And the
answer should be yes. But then how should he do that? How should he play this game?
And to do this, I will formulate an optimal control problem. Again, I will denote the fraction of cheaters at any time T by X of T. The principal's action, which is the fraction that he audits, is alpha of T. And then the cost at any time is given by three terms. The first is those agents who were cheating and actually got caught, which gives me a cost C1. The second is those agents who were not cheating and were still audited, and this gives me a cost C2. These are just the costs from the table. And the third possibility is those who were cheating and I failed to audit, and then I incur the highest cost of the three, C3.
So this is my cost at any moment in time.
And my action at any point in time also affects the evolution of the population of cheaters. Let's
not get into how this happens, but as we described already, the more you crack down on people,
the less you'd expect to see people cheating in the next period.
So my action at time T not only affects the payoff that I will immediately get from it but also affects my future payoffs through the dependence of the rate of change on alpha.
So the principal's objective then is to minimize his long-run discounted cost over the entire horizon. So I just put this function G, which is the cost per stage or per time period, and we're [inaudible] this over the entire horizon. This is the discount factor. And this is such that the evolution of the population is governed by the equation of motion from the last slide.
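Written out, the per-period cost from the three terms above (assuming the audited fraction alpha is spread uniformly over the population) and the discounted objective look roughly like this; the equation of motion f comes from the replicator dynamics on the previous slide and is left abstract here:

    G(x,\alpha) \;=\; C_1\,\alpha x \;+\; C_2\,\alpha(1-x) \;+\; C_3\,(1-\alpha)\,x,

    \min_{\alpha(\cdot)}\;\int_0^{\infty} e^{-rt}\, G\big(x(t),\alpha(t)\big)\,dt
    \qquad\text{subject to}\qquad \dot{x}(t) = f\big(x(t),\alpha(t)\big),\;\; \alpha(t)\in[0,1].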
And the way we solve this is that we take a Hamiltonian approach. So basically we [inaudible] the Hamiltonian, which reduces the problem to a single-period problem. And what this means is that I want to solve a problem where I want to find the audit fraction for this period, and there is a price associated with the entire future that is also affected by my action this period. And I would like to minimize this function.
And solving the Hamiltonian gives me a necessary but not sufficient condition for the optimal solution. When we do this, we find that the optimal control for the problem is a bang-bang solution. So this means that I either audit everybody, or I don't do anything. There is a case that lies in between, which I will get into. This happens when alpha basically disappears from the Hamiltonian.
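A sketch of that step: with lambda(t) as the co-state, the "price" of the future mentioned above, the Hamiltonian collects the current cost plus the priced effect of alpha on the state. Because the Hamiltonian turns out to be linear in alpha, minimizing it pushes alpha to an extreme whenever the coefficient of alpha is nonzero, which is exactly the bang-bang structure (phi and psi below are just shorthand for that coefficient and the remaining terms):

    H(x,\alpha,\lambda) \;=\; G(x,\alpha) + \lambda\, f(x,\alpha) \;=\; \phi(x,\lambda)\,\alpha + \psi(x,\lambda),

    \alpha^*(t) \;=\;
    \begin{cases}
      1 & \text{if } \phi(x,\lambda) < 0,\\
      0 & \text{if } \phi(x,\lambda) > 0,\\
      \text{interior (singular)} & \text{if } \phi(x,\lambda) = 0.
    \end{cases}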
So that's one condition that the optimal policy has to fulfill. To get another condition, what I will do is formulate the problem as a variational calculus problem. So I will just replace alpha of T by this quantity from the equation of motion. And then my problem is to minimize this: I need to find the function X of T that minimizes this entire expression.
And it turns out that if we use the Euler-Lagrange equation to solve this problem, which also gives us another condition for optimality, what we get is that X of T is actually a constant. So X of T does not depend on time. [inaudible] says that if X star of T is the minimizer of the variational calculus problem, then it is equal to C, where C is a constant that depends on the parameters of the problem.
So this tells you that any optimal policy will hold the fraction of cheaters constant at a certain value. When I pair this necessary condition with the condition that I had on alpha, I'm able to completely characterize the optimal policy. And the optimal policy looks like this. There's a value X bar such that the optimal policy audits everybody when the fraction of cheaters is more than X bar, and it does nothing when the fraction of cheaters is less than X bar. But if X is exactly equal to X bar, then this is the third condition here, where I set alpha equal to this quantity, which I'll talk about later. And then the system stays in this state forever.
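In code, the structure of the policy is just a threshold rule; the threshold x_bar and the interior rate alpha_bar come out of the analysis and are treated here as given numbers (the ones below are invented for the example):

    # Sketch of the optimal policy's structure: bang-bang with a singular arc at x_bar.
    def optimal_audit_rate(x, x_bar, alpha_bar):
        """Audit everyone above the threshold, no one below it,
        and hold the system at x_bar with the interior rate alpha_bar."""
        if x > x_bar:
            return 1.0          # crack down on everybody
        if x < x_bar:
            return 0.0          # do nothing
        return alpha_bar        # keep the fraction of cheaters fixed at x_bar

    print(optimal_audit_rate(0.35, x_bar=0.16, alpha_bar=0.3))   # illustrative numbers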
>>: So now I'm not sure I'm following. So you're saying you would keep the fraction that cheats
the constant.
>> Mohamed Mostagir: I would like to.
>>: But you can't, right, because they will move depending on how much you audit? You don't
even know that unless you audit, right?
>> Mohamed Mostagir: Well, there is -- it turns out there is an audit rate for which you're able to keep this fraction constant. How does this work? This works because I'm balancing the number of people who will decide to switch from cheating to being honest in each period with the number who switch in the opposite direction. So you can show that there is an audit rate that will achieve this balance. And actually it has this value and it's unique.
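One back-of-the-envelope way to see where such a balancing rate comes from, under the random-pairing story from earlier: the flow of agents switching from cheating to honest has to equal the flow switching the other way. Treating p and q as the full switching probabilities (so this is only a sketch of the argument, not necessarily the exact expression from the paper),

    \underbrace{\alpha\,x\,(1-x)\,p}_{\text{caught cheaters who meet an honest agent}}
    \;=\;
    \underbrace{(1-\alpha)\,x\,(1-x)\,q}_{\text{honest agents who meet a lucky cheater}}
    \quad\Longrightarrow\quad
    \alpha_B \;=\; \frac{q}{p+q}.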
So then the policy is, I just have this job now: I look at the fraction of cheaters, I see that it's very high, I crack down on everybody indiscriminately, I drive it all the way down to X bar, and once I reach X bar, I keep it there by auditing at this value.
So if we go back to the Recording Industry Association example, what happened is that in the year 2003 the fraction of file sharers out of the total Internet population of the U.S. was 35 percent. The crackdown started then, as you can see in the graph. And by the end of the crackdown -- actually, just before the jump to 35 percent -- the fraction of file sharers was around 18 percent.
>>: [inaudible]
>> Mohamed Mostagir: Oh, I'm sorry. I'm very sorry. So this is the number of copyright infringement lawsuits filed by the Recording Industry Association of America, and this is time. Just prior to the green bars, the fraction of file sharers was around, I think, 14 percent. And then this jump started, which initiated this crackdown that lasted for five years. It went down in 2008, and it drove the percentage of file sharers down to 16 percent, which, three years later, is still where it's at.
So there's also another thing, and that's a bit of anecdotal evidence. So at Caltech I'm a resident
associate, which means I live with undergraduate students. We get to have dinner together every
night. And so every other week invariably you would find someone at the table saying that, hey, I was just downloading so-and-so and I got a letter from the [inaudible] lawyer saying stop doing that.
So there's evidence that they're still auditing at a very low rate, just like the optimal policy prescribes. And, presumably -- I mean, of course, I'm not claiming that this is exactly a description of how the world works, but how the story fits the optimal policy is very, very interesting, I thought: you start with this very huge crackdown, and now they just try to maintain the fraction of file sharers at a fixed value.
So this answered the first question that I was interested in, if you remember, which is whether, when we account for learning in our problems, we see outcomes that actually resemble observed behavior.
The second question was: do these policies perform better than if everyone were rational? And the answer is a resounding yes. So basically for this problem that we just talked about, the audit rate that we converge to, which I denote by alpha B, is always less than the Nash audit rate.
At the same time, depending on the discount factor, I'm able to drive the fraction of cheaters -- or file sharers, in this example -- to very low levels, if I choose to. If we look at the Nash view of this problem, it is completely insensitive to discounting. No matter how much discounting there is, what will happen next period is exactly what happened this period. But here I'm able to drive that fraction down, and do it with less auditing.
>>: [inaudible]
>> Mohamed Mostagir: The discounting is in the objective function of the principal. It's E to the minus RT.
>>: [inaudible]
>> Mohamed Mostagir: Yeah. So if we take the same objective function when everyone is rational, and, again, put this E to the minus RT at the beginning, then it doesn't have any effect.
So what this is saying is that not only can I obtain a better outcome by reducing the fraction of crime, but I can do that by exerting less effort than in the Nash solution.
These are the details of why this is true; I'll just skip them.
So let's quickly talk about some other --
>>: So this is -- makes it better for the principal but worse for the agents.
>> Mohamed Mostagir: Great question. So this of course makes it better for the principal and worse for the agents. But assume now that we also include another term in our objective to account for the welfare of the agents. So say that I want to penalize myself for every time that I audit somebody who was not doing anything wrong. And so the objective now is to minimize crime in the population and also not to bother other people. So this is more of a social objective, kind of like my objective as a police chief or something.
Then if you do that, you get the same results, but at different convergence rates.
>>: [inaudible] actually play the game [inaudible] switch it around, it would seem essentially
that massive agents would [inaudible].
>> Mohamed Mostagir: If you switch it around, then you get the mirror image of these results. So say that the principal is the myopic one, okay, and the agents are sophisticated. Then they would give the principal the impression that everything's fine, there's very little crime, and then once he sits back, they start cheating. When he catches on to it, they stop, and so on.
So let's talk about a few more examples. Advertising is a very prominent example. Basically -- let's assume that advertising actually increases the propensity of agents to buy a product. But the problem is that advertising is costly. But if we assume that people talk to each other -- people who receive the [inaudible] advertising talk to other people who have not, and this has an effect that is kind of similar to what advertising would do; it doesn't have to be the full effect, but at least a fraction of it -- then the optimal policy would again give you cyclical waves of advertising.
So what happens is that there is a big advertising splash. Afterwards there is very little
advertising, or at least not as huge. And during the time, the advertiser is making use of the fact
that word of mouth is doing the work for him for free.
And as soon as this is over, they start doing it again.
And actually this happens in contexts where we wouldn't think it's happening. Like, for
example, Coca-Cola. Coca-Cola's advertising all the time, and that's a very steady level of
advertising. But I think once every three years or something they have to have a new song, they have to have a new logo or something. Sometimes it's bigger -- redesigning even the whole bottle, a new shape for the bottle or for the logo, things like that.
Another example is traffic management. An example of this is when the Bay Bridge was closed in San Francisco. What happened is that people adjusted their driving habits. They started taking different routes. And during that time congestion actually decreased. But when the bridge was reopened again, it's not like the next day everyone started taking the old route; they slowly converged back to driving on their old routes.
So the point is you can make use of periodic closures of roads or even adjusting tolls
dynamically to make use of this transition period when people adjust their behavior in order to
buy yourself more gains.
Finally, another application is equilibrium selection. So say I'm Apple and I just released OS4. I want everyone to move to OS4, so I start making it very difficult for them to use OS3.
And the principle behind this is essentially the same thing that we talked about: you don't dictate to people what they should do, but you slowly make one outcome much more favorable than the other and eventually push everyone towards the equilibrium that you prefer, which is everyone using the new OS, or no one using Windows 98 anymore, et cetera.
So let's talk about other learning models. Let me highlight the features that make these results work. There are two things, two main points. The first, which we brought up again and again, is the lack of instantaneous reaction to changes in the environment, like in the driving example or the cheating example.
The other thing is strategic myopia, which is that whatever way I use to make my decision -- whether I'm looking at history or whether I'm talking to someone and just imitating [inaudible] more successful -- I'm not thinking about how my action will influence the principal's action in the future. I'm just pretending that this principal is a machine that's playing with a fixed probability.
So learning models that would fit this are adaptive play, which we talked about; fictitious play is another one; and replicator dynamics, which we already talked about. These will all have policies that are similar in structure but maybe different in convergence rates or guarantees. But all of them will still perform better than the Nash equilibrium.
And a model for which this will not work is, say, one where you respond to last period's play immediately. So a change happens in the system, and in the next round -- you're still myopic, but you're responding to this change -- then we are unable to derive similar results.
And obviously if there is a Bayesian population that starts with correct priors, then they'll be able to spot the cyclical behavior and act against it, which brings us back to the static Nash equilibrium case.
>>: [inaudible] public list [inaudible].
>> Mohamed Mostagir: So that's a good question. Then depending on the probability you'll
have like [inaudible] cases. If you fully respond to last period's play, then you're unable to do
anything.
>>: Are these models also different from information [inaudible] because in order to respond to
the [inaudible] the principal one would have to know it rather than just talking to people
[inaudible].
>> Mohamed Mostagir: So it's enough for you to say, for example, you're the principal, I'm the
agent. In this period, you audit me. So I'll just respond to your action against me, and that's all I
need to do.
>>: Oh, I see.
>> Mohamed Mostagir: So in the next round I know that you're auditing now, let me not do this.
So let me conclude before I talk about some other learning projects. What I wanted to convince you of is that myopic learning models are powerful enough to predict observed behavior in many real-world scenarios; that there are contexts in which the principal being myopic will lead the whole system to converge to something that's very similar to the Nash equilibrium even though no one is sophisticated enough; and that this cyclical behavior is [inaudible] of many phenomena that we see in the [inaudible], for example.
For a large class of games, what I did is that I completely characterized the unique optimal policy
that the principal should follow, and I showed that the optimal policy strictly improves on the
Nash solution. And this goes back to the point on the very first slide which is that we should
think of population learning as a resource for the principal to utilize. The fact that agents are
learning is something that I can use to my advantage, as long, of course, as they are learning in a
particular fashion.
This -- I think I forgot to complete this, but I already talked about it. And then, just as we discussed, these results generalize to other learning models, as long as they have these two core features that we talked about, and they generalize to games in advertising and traffic management and equilibrium selection.
So before I go into this, do you want to ask any general questions?
>>: This solution is a bang-bang solution. Is it a pretty robust picture throughout the
[inaudible]?
>> Mohamed Mostagir: Yeah, yeah.
>>: [inaudible] what property is it that makes the [inaudible]?
>> Mohamed Mostagir: You mean technically or --
>>: [inaudible]
>> Mohamed Mostagir: Technically, it's that the [inaudible] is just a linear function in the fraction with which you audit. But if you want to think about it on a higher level, it's the fact that if you make your decision based on, say, history, and I want to convince you that things look a certain way, the fastest way for me to do this is to do a bang-bang thing.
So say I want to convince you that we are auditing everyone who cheats. Instead of auditing at, like, you know, 75 percent, if I audit at a hundred percent for a long time, then if you learn in this fashion, this is what you'll believe. It will take a while for you to realize this is not what's happening. Yeah.
>>: So you were talking about the replicator dynamics, and I was wondering -- so here's one
alternative. Alternative one is you randomly pair a bunch of these guys up. As they talk to each
other, each pair decides what's better, cheat or not cheat. An alternative is I don't talk to just one
individual, say I talk to a hundred individuals, or whatever I -- ten individuals. And then I
sample essentially and I know what proportion of the people are getting audited, and then I
respond to that. So I'm thinking to myself, okay, 5 percent of the people are audited or, you
know, 8 percent of the people are getting audited, and actually [inaudible] according to them. So
will this change any of the results and how?
>> Mohamed Mostagir: It will not change the results structurally, but it will change the convergence rates for sure. Because now the more people you talk to, the better idea you have about what's actually going on. And so you end up switching more frequently.
And so ultimately if you know everything, then we go back to the case where there's a Bayesian
thing going on where you converge to the Nash equilibrium.
>>: It's just how much uncertainty I have regarding the portion that gets audited.
>> Mohamed Mostagir: Yeah.
>>: [inaudible] dynamics but studied on different topologies? Because it would seem that --
>> Mohamed Mostagir: You mean like a network topology?
>>: Yeah, that I only talk to my neighbors --
>> Mohamed Mostagir: So --
>>: [inaudible] be the same.
>> Mohamed Mostagir: So this is not a replicator dynamics result, but it's similar in spirit. I think it's a paper by Golub and Jackson from a year ago. What they do is that there's a network topology and I just take the average of my neighbors' actions. And then they show that this naive behavior leads to an equilibrium in many games.
So it's similar in the sense that I'm not really sophisticated about what I'm doing, but I'm just
updating myopically by taking an average. And the results are very similar to the paper that I
mentioned in the beginning where people are actually doing Bayesian updating. So maybe with
a large number of people you don't need to be sophisticated enough to arrive at the rational
outcome for a game.
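For reference, here is a tiny sketch of that kind of naive neighbor-averaging, in the spirit of the DeGroot-style model that Golub and Jackson study; the weight matrix and the initial beliefs are made up for the example:

    # Sketch: naive learning by repeatedly averaging neighbors' beliefs.
    import numpy as np

    # Row-stochastic matrix: row i gives the weights agent i puts on its neighbors.
    W = np.array([[0.5,  0.5,  0.0],
                  [0.25, 0.5,  0.25],
                  [0.0,  0.5,  0.5]])
    beliefs = np.array([0.0, 0.4, 1.0])   # illustrative initial beliefs

    for _ in range(50):
        beliefs = W @ beliefs             # everyone averages their neighbors each round

    print("consensus belief:", np.round(beliefs, 3))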
All right. So any other questions before I move on?
>>: So one question. So if you think about other games, like the beauty contest game, you're
assigned [inaudible] what would be the analog [inaudible] game playing beauty contest where
people are -- everybody and their agent try to [inaudible] principals.
>> Mohamed Mostagir: So [inaudible] beauty contest game would look like what?
>>: [inaudible] so people look at how people are learning about -- so they start off, right, and they see what was [inaudible] and then they realize, okay, people are sort of going towards the lower number, so then next time their direct influence --
>> Mohamed Mostagir: So I think what will happen is that instead of immediately, you know, playing the lower bound, they will start slowly moving towards it.
So I guess if you are one of the players and you know that this is going on, then you can
probably exploit that to your advantage. I have not thought about this, but this would make
sense to me. Yeah. Because, again, there is this feature of -- because on beauty contest you
know exactly what you should do if everyone is rational. But if it takes you a while to get there,
then someone can use this to trick you in the middle.
>>: Do you have many anecdotal [inaudible]?
>> Mohamed Mostagir: So for the stuff in the final paper, the hope is that the example, at least for the RIAA, will be a complete and full data-based example with numbers and everything.
But right now something that I'm working on with someone else at Ohio State is doing a lab experiment to see if people actually really behave in this way in the lab, where they play the Cheat-Audit game, for example.
And of course the first difficulty that you would run into is that all these results are for a large
population. In the lab you have 30 people or something.
So how large is large? There are some games where a large population consists of six people, and then everything that works on paper actually works in the lab. So the two questions that we're trying to deal with here are what is the threshold after which these procedures will work, and are people really behaving in this non-Bayesian fashion. So hopefully this will shed some light on that.
Another thing -- so one of the reasons that got me started working on this is that one of my
advisors who works at another company that does ad auctions was telling me that sometimes
they change the rules by which they are billing the advertisers, but they don't publicize this change. And then they look at the data and they see that the advertisers are actually not instantaneously responding to this change.
So with this massive amount of data, I'm interested in looking into patterns that seem to suggest there is similar behavior going on in terms of, you know, the lag in how you respond to things and how you respond to them -- is it in a myopic fashion or is it more strategic. So this is more under the future-work aspect of it.
>>: So isn't there a bias of [inaudible]? For example, in the music industry case, when the auditing is not happening, when people are not being audited, maybe whole cheating events might go unnoticed and might not even appear in the data? So [inaudible].
>> Mohamed Mostagir: [inaudible] the best thing to do that I can think of is just to estimate based on -- I mean, the assumption is that the population is very large, so whatever sample you have will give you very good information, with high probability, about the overall composition of the population.
>> Peter Key: [inaudible].
>> Mohamed Mostagir: Okay. So another thing -- we have only talked about a single principal. This would make sense for, like, a policing application. But in advertising, what happens when there are multiple principals and they all act as if the agents are behaving in this fashion? Then what are the results, what are the equilibria of these games between the principals? That's something that I am thinking about. A more general line of research is to understand the benefits and the shortcomings of how consumers learn.
So this is another project that I'm working on, centralized recommendation system versus
word-of-mouth learning. So think of something like Yelp, where you go into a new city that you
don't know anything about, you want to find the best place to get a steak, you go on Yelp, it tells
you this place, 98 percent of the people like it, you go there.
And an opposite model is that you just ask around, which was the model up until centralized
recommendation systems were around.
So I'm doing this with [inaudible], and the model we have is that we have consumers on a Hotelling line. What this means is that we have a line, we have two product vendors, one at each end of the line, and we have consumers who are randomly distributed on this line.
And then the payoff to a consumer is a function of the distance between them and the vendor and the quality of the product that they get. But the quality of the product is unknown. So you experiment with products. And naturally, if you are near one end of the line, you'd experiment with the closest product first.
And then we assume that after enough experimentation there is a breakthrough where the quality
of a product is revealed. So then what we show is that for a recommendation system like Yelp,
what happens is that this information is propagated immediately, and so everyone converges to
this product. And this might not be the socially efficient outcome, because it's possible that the
other products actually have higher quality, but there is not enough experimentation to enable the
population to find that out. And so the question is how do you design the recommendation
system so that you take something like that into account.
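A minimal sketch of the consumer's side of that Hotelling model; the linear transport cost and the specific numbers are assumptions for illustration, not the exact specification of the project:

    # Sketch: consumer on a Hotelling line [0, 1] with vendors at the endpoints.
    def consumer_payoff(position, vendor_location, expected_quality, transport_cost=1.0):
        """Payoff = expected quality minus the cost of traveling to the vendor."""
        return expected_quality - transport_cost * abs(position - vendor_location)

    # A consumer near the left end tries the closer vendor first when priors are equal.
    position = 0.2
    prior_quality = {0.0: 0.5, 1.0: 0.5}     # unknown qualities, equal priors
    first_try = max(prior_quality,
                    key=lambda loc: consumer_payoff(position, loc, prior_quality[loc]))
    print("experiments first with the vendor at", first_try)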
And here the learning component of course is obvious in how you learn by talking to other
people or if you just get everything aggregated from a centralized recommendation system.
This other project is actually very, very interesting. It again has to do with the tension between standard economics and more recent ways of thinking about how people make decisions. So in economics, if I give you a menu of 100,000 items and I ask you to choose the one that you prefer the most, the optimal item, then you can choose it.
Well, it's not clear that this is actually possible. Most of the time subjects actually end up satisficing. And what this means is that they choose the first thing they reach which is above a reservation value that they have. If it's good enough for them, they don't have to find the optimal option in the menu.
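A quick sketch of that satisficing rule next to full optimization; the menu values and the reservation level are invented for the example:

    # Sketch: satisficing -- scan the menu in order and stop at the first option
    # whose value clears the reservation level; compare with the true optimum.
    def satisfice(menu_values, reservation):
        for i, v in enumerate(menu_values):
            if v >= reservation:
                return i, v               # good enough: stop searching here
        best = max(range(len(menu_values)), key=lambda i: menu_values[i])
        return best, menu_values[best]    # nothing cleared the bar: take the best seen

    menu = [3.1, 4.8, 7.2, 9.9, 6.0]      # illustrative dollar payoffs
    print("satisficer picks:", satisfice(menu, reservation=7.0))
    print("optimizer picks:", max(menu))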
And there has been a recent eye-tracking experiment done at Caltech. This is an experiment that actually started at NYU, where a subject is presented with a menu of simple arithmetic operations that vary in difficulty amongst themselves. And what they do is that they need to choose one of these options, and the outcome of the arithmetic operation is the payoff that they will get in dollars.
And so ideally what you would want to do is choose the option that will give you the highest payoff. This is of course a timed experiment. So, unless you are very smart, you are unable to evaluate all the options. And it shows very concrete evidence that once people reach a certain reservation utility, they actually stop. And this was confirmed by the eye-tracking experiment, in that they actually don't even look at the rest of the menu.
So then the question is how do we design menus to enforce this satisficing. Say that I want to sway your choice in a certain direction because I know that you're unable to process so much information. Then how do I design the menu such that you choose certain things that I would prefer you choose, instead of you choosing what's best for you? And can I use this both for my own gain as, say, a firm, or to have a more fair distribution of resources, which is a social outcome instead of self [inaudible].
And so that's all I have to say.
>> Peter Key: Okay. Thanks very much.
>> Mohamed Mostagir: Thank you.
[applause]