Lecture 9, 10: Game Theory

Using “Games” to Understand Incentives
• Game theory models strategic behavior: agents think about how other agents may behave and about how that should influence their own choices. It is the study of settings in which players think strategically.
• Useful to study:
– Company behavior in imperfectly competitive markets (such as Coke vs. Pepsi)
– Military strategies
– Bargaining/negotiations
– Biology
• Aumann: Interactive Decision Theory
• In perfectly competitive settings, we typically do not have to think
strategically because everyone is “small” and no one can affect the
economic environment.
• But in settings where players are “large”, their own actions can impact
the economic environment – as can their consideration of others’
actions (and reactions).
• Our basic notion of equilibrium will not change all that much: an equilibrium is reached when everyone is doing the best he/she can given his/her circumstances and given what everyone else is doing.
Types of “Games”
We can broadly categorize games into four categories, with games differing across two
dimensions:
                          simultaneous moves          sequential moves
complete information      “rock, paper, scissors”     chess
incomplete information    sealed bid art auction      ascending bid art auction
Complete information games are games in which players know how other players
evaluate the possible outcomes of the game.
Simultaneous move games are games in which all players must choose their actions at
the same time (and thus without observing any other player’s action).
Sequential move games are games in which at least some players observe the actions
of another player prior to having to choose their own actions.
Incomplete information games are games in which at least some players do not know
at least some other player’s evaluation of possible outcomes.
The Structure of a Complete Information Game
Any complete information game has 4 elements:
Players
Who are the players?
How many are there?
Actions
What action(s) can each of the players “play”?
Is there a discrete number of possible actions? or
Are the actions defined by some continuous interval?
Sequence and Information
Do all the players choose their action simultaneously?
Do the players play in sequence?
What do players know about earlier moves by others?
Payoffs
What do players get at the end of the game?
How does this depend on how the game is played?
Example
Players: Player 1 is an employer; Player 2 is a job applicant.
Actions: the employer offers a dollar wage w (w from a continuum); the applicant accepts or rejects the offer (accept/reject – discrete).
Sequence and Information: the employer moves first; the applicant moves next, knowing the wage offer from stage 1.
Payoffs: the employer gets (MRP – w) if the offer is accepted and 0 otherwise; the applicant gets w if she accepts and her next best alternative otherwise.
2-Player Simultaneous Move Game
A 2-player simultaneous move game is often represented in a payoff matrix.
Players
Player 1 is represented on the left and Player 2 at the top of the matrix.
Actions
Player 1 has 2 possible actions (the rows of the matrix), and Player 2 also has 2 possible actions (the columns).
Sequence and Information
Players move simultaneously, with no information about the other’s chosen action.
Payoffs
Each player’s payoff depends on her own as well as her opponent’s action. In each cell of the matrix, Player 1’s payoff from the corresponding pair of actions is listed first and Player 2’s payoff second.
Battle of the Bismarck Sea
• We want to model the Battle of the Bismarck Sea.
• Two Admirals: Imamura (Japan) and Kenny (US).
• Japan is in retreat.
• Imamura wants to transport troops in a convoy
from Rabaul to Lae
• Kenny wants to bomb the Japanese troops.
• North route is two days, Southern route is three days.
• It takes one day for Kenny to switch searching routes.
Imamura wants to run the convoy from Rabaul to Lae, via either the North or the South route.
Battle of the Bismarck Sea

Payoffs are listed as (Kenny, Imamura): Kenny chooses the row (where to search), Imamura the column (where to sail). Each day of bombing is worth +1 to Kenny and -1 to Imamura.

                        Imamura
                   North        South
Kenny    North     2, -2        2, -2
         South     1, -1        3, -3

This representation is called a normal form game.
• Imamura wants to transport troops.
• Kenny wants to bomb the Japanese troops.
• The North route takes two days, the South route three days.
• It takes one day for Kenny to switch search routes.
Battle of the Bismarck Sea

                           Imamura
                     sail North    sail South
Kenny  search North     2, -2         2, -2
       search South     1, -1         3, -3

For a 2x2 game, always draw the best-response arrows!
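The “arrows” can also be traced out numerically. Below is a minimal sketch (my own Python, not part of the lecture) that uses the Bismarck Sea payoffs above and prints each player’s best response to each action of the opponent; ties are reported as a list.

```python
# Payoffs (Kenny, Imamura); rows = Kenny's search route, columns = Imamura's route.
payoffs = {
    ("North", "North"): (2, -2), ("North", "South"): (2, -2),
    ("South", "North"): (1, -1), ("South", "South"): (3, -3),
}
routes = ["North", "South"]

# Kenny's best responses (the vertical arrows in the matrix).
for sail in routes:
    vals = {search: payoffs[(search, sail)][0] for search in routes}
    best = [s for s, v in vals.items() if v == max(vals.values())]
    print(f"Imamura sails {sail}: Kenny's best response(s): {best}")

# Imamura's best responses (the horizontal arrows).
for search in routes:
    vals = {sail: payoffs[(search, sail)][1] for sail in routes}
    best = [s for s, v in vals.items() if v == max(vals.values())]
    print(f"Kenny searches {search}: Imamura's best response(s): {best}")
```

With these payoffs the arrows point into (North, North): searching North is a best response to sailing North, and sailing North is a (weak) best response to searching North.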
Example: A Coordination Game
Suppose player 1 and player 2 have to choose a side
of the road on which to drive.
Neither has an inherent preference for one side of the road over the other – they just don’t want to crash into each other by driving on opposite sides.
Players
The game then has 2 players.
Actions
Both players can choose from the actions Left and Right.
Sequence and Information
The players make their choices at the same time, without knowing what the other is doing.
Payoffs
The players get 10 each if they choose the same side and 0 each if they choose opposite sides.
A game of this kind is called a coordination game because the main
objective of the two players is to coordinate their actions to be the same.
2-Player Sequential Move Game
A 2-player sequential move game is often represented in a game tree.
Players Player 1 and Player 2.
Actions
Player 1 can play one of two actions, and Player 2 can play one of two actions.
Sequence and Information
Player 1 makes her decision first. Player 2 observes Player 1’s choice before choosing.
Payoffs
Each player’s payoff depends on her own as well as her opponent’s action. At the end of each branch of the game tree, Player 1’s payoff is listed first and Player 2’s payoff second.
Information Nodes and Sets
In a sequential move game, players that move later in the game have multiple
information nodes or information sets.
In our game, player 2 makes his decision from one of two possible information nodes –
where the information is the action by player 1 that has been observed.
If player 2 were to make his choice without knowing player 1’s action, both these
nodes would be in the same information set – implying player 2 cannot tell which
node he is playing from when making his choice.
This would be indicated in the game
tree by encircling the two nodes into
one information set.
Such a game would
then be equivalent to
a simultaneous move
game.
A Sequential Version of the Right/Left Game
In a sequential move version of our game in which players choose on which side of the
road to drive, player 1 chooses her action first.
After observing player 1 on the road, player 2 then chooses his side,…
… choosing from the left information
node if observing player 1 on the left
… and choosing from the
right information node if
observing player 1 on the
right.
The payoffs are
then again 10
for both if they
choose the
same side and
0 if they choose
opposite sides.
Strategies
Caution:
The two games below are VERY DIFFERENT.
The bimatrix game corresponding to the extensive
form game on the right is a 4x2, not a 2x2 game!
Strategies
A strategy is a complete plan for how to play the game prior to the beginning of the game.
But in a sequential move game, a complete plan of action (prior to the beginning of the
game) implies a plan for what action to take from each information set.
Player 2’s strategy must therefore specify
what he plans to do if he observes player 1
going Left …
… AND what he plans to do if he
observes player 1 going Right!
Strategies in Sequential Move Games
Because he moves first (and thus has no information about what player 2 does when he makes his choice), player 1 still has only two possible pure strategies: Left and Right.
But player 2 has two possible actions from two possible information sets – implying four possible pure strategies:
(Left, Left) – a plan to go Left from both nodes
(Left, Right) – a plan to go Left from the left node and Right from the right node
(Right, Left) – a plan to go Right from the left node and Left from the right node
(Right, Right) – a plan to go Right from both nodes
As we next define equilibrium, it will become important to keep in mind the difference between the actions and the strategies for a player.
Best Response Strategies
A player is playing a best response strategy to the strategies played by others if and only if
he has no other possible strategy that would result in a higher payoff for him.
For instance, if player 2 plays the strategy Left,
player 1 best responds by also playing the strategy Left.
But if player 2 plays the strategy Right,
player 1 best responds by also playing the strategy Right.
In the sequential version,
if player 2 plays the strategy (Left, Left),
player 1 best responds by playing the
strategy Left.
If player 2 instead plays the
strategy (Left, Right),
both Left and Right are
best responses for
player 1.
Nash Equilibrium
A Nash equilibrium is a set of strategies – one for each player – such that every player is
playing a best response strategy given the strategies played by others.
A less formal way of saying the same thing is: A Nash equilibrium occurs when everyone
is doing the best he/she can given what everyone else is doing.
To find a pure strategy Nash equilibrium in a 2-player payoff matrix,
– Begin with player 2’s first strategy and ask
“what is player 1’s best response?”
– Then, for this best response by player 1, ask
“what is player 2’s best response to it?”
If player 2’s first strategy is a best response to player 1’s best response to
player 2’s first strategy, then we have found a Nash equilibrium.
– Then move to player 2’s second strategy and go through the same steps again.
In our Left/Right game, we then find a second pure strategy Nash equilibrium.
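The two-step check just described is easy to automate. The sketch below (my own Python, not from the slides) applies it to the Left/Right coordination game by testing every cell of the payoff matrix for mutual best responses.

```python
actions = ["Left", "Right"]
# Payoffs (player 1, player 2): 10 each if they coordinate, 0 each otherwise.
u = {(a1, a2): (10, 10) if a1 == a2 else (0, 0) for a1 in actions for a2 in actions}

def is_nash(a1, a2):
    best1 = max(u[(b1, a2)][0] for b1 in actions)  # player 1's best payoff against a2
    best2 = max(u[(a1, b2)][1] for b2 in actions)  # player 2's best payoff against a1
    return u[(a1, a2)] == (best1, best2)

print([cell for cell in u if is_nash(*cell)])
# -> [('Left', 'Left'), ('Right', 'Right')]: the two pure strategy Nash equilibria.
```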
Nash Equilibria NOT Always Efficient
Now suppose that both players were raised in England where people drive on the left side
of the road.
As a result, the players have an inherent preference for driving on the left, but driving on
the right is still better than crashing into one another.
In our payoff matrix, this implies a lower payoff – (5,5) – for both players in the lower right box.
Both players playing Left is still a Nash equilibrium – and it is efficient.
But if player 2 plays Right, player 1’s best response is
to also play Right. And if player 1 plays Right, player 2’s best response is to still play Right.
Both players playing Right therefore is also still a Nash equilibrium –
but it is not efficient.
Would these two pure strategy Nash equilibria remain if one person has an inherent
preference for the left side of the road and one has an inherent preference for the right?
Yes. The basic incentives of the coordination game would remain unchanged unless someone
prefers crashing into the other over driving on his less-preferred side.
Dominant Strategy Nash Equilibrium
In some games, a single strategy is always the best response to any strategy played by others.
Such a strategy is called a dominant strategy.
Consider the Up/Down game – which we construct by beginning
with the last Left/Right game (where we found two pure strategy
Nash equilibria).
Now suppose the (0,0) payoffs are changed to (7,7).
Both players playing Up remains a pure strategy equilibrium.
But when player 2 plays Down, it is now not the case that Down is player 1’s best response.
Rather, player 1’s best response is to play Up. And player 2’s best response to that is to go
Up as well, leading us back to the one remaining Nash equilibrium.
This results directly from Up being a dominant strategy for both players: Regardless of
what the other player does, it is always a best response to play Up.
When every player in a game has a dominant strategy,
there is only a single Nash equilibrium in the game.
In our case here, this Nash equilibrium is efficient, but that is not always the case …
Inefficient Dominant Strategy Equilibrium
Let’s begin again with the coordination game where both players playing Up as well as both
players playing Down are Nash equilibria.
But then suppose we change the (0,0) payoffs to (15,0) on the
bottom left and to (0,15) on the top right.
We can now check whether the efficient outcome
(10,10) can still arise in a Nash equilibrium.
If player 2’s strategy is Up, player 1’s best response is
to play Down. So (10,10) cannot arise in equilibrium.
Down is in fact a dominant strategy for player 1, as it is for player 2.
The unique Nash equilibrium of this game is thus for both players to play Down,
which results in the inefficient payoff pair (5,5).
Note that the reason each player is playing Down is NOT because each player anticipates
that the other will play Down. Rather, regardless of what the other player does, each
player’s best response is to play Down.
We will later call a game of this type a Prisoner’s Dilemma.
Mixed Strategies
So far, we have dealt only with pure strategies – i.e. strategies that involve choosing an
action with probability 1 at each information set.
A mixed strategy is a game plan that settles on a probability distribution that governs the
choice of an action at an information set.
Pure strategies are in fact mixed strategies that select a degenerate probability distribution
that places all weight on one action.
As we will see, games with multiple pure strategy equilibria (like the Right/Left game)
also have (non-degenerate) mixed strategy equilibria.
And games that have no pure-strategy equilibrium always have a mixed strategy
equilibrium.
It is often not easy to interpret what it is that we really mean by a mixed strategy equilibrium.
One way to interpret the concept of a mixed strategy equilibrium in a complete
information game is to recognize it as a pure strategy equilibrium in an incomplete
information game that is almost identical to the complete information game.
Matching Pennies
Consider a game in which Me and You can
choose Heads or Tails in a simultaneous move
setting.
If we choose the same action, I get a
payoff of 1 and You get a payoff of –1.
If we choose different actions, I get a
payoff of –1 and You get a payoff of 1.
If You choose Heads, my best response is to also choose Heads.
But if I choose Heads, Your best response is to choose Tails.
But if You choose Tails, my best response is to also choose Tails.
But if I choose Tails, Your best response is to choose Heads.
Thus, there is no pure strategy Nash Equilibrium in this game.
This game, called the Matching Pennies game, is then often used to motivate the idea
of a mixed strategy equilibrium.
Best Response Functions with Mixed Strategies
Suppose I believe you will play Heads with probability l, and I try to determine my best response in terms of the probability r with which I play Heads.
My goal is to maximize the chance that I will match what you do …
… which means I will choose Tails if you are likely to choose Tails and Heads if you are likely to choose Heads.
The only time I am indifferent between Heads and Tails is if l = 0.5.
My best response probability r to your l is then:
r = 0 if 0 ≤ l < 0.5
r ∈ [0, 1] if l = 0.5
r = 1 if 0.5 < l ≤ 1
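A quick numerical sketch of this best-response correspondence (my own Python, not from the lecture); with payoffs of +1 for matching and −1 for mismatching, my expected payoff from Heads is l − (1 − l) and from Tails is (1 − l) − l.

```python
def best_response_r(l):
    eu_heads = l * 1 + (1 - l) * (-1)   # I win 1 if we match, lose 1 otherwise
    eu_tails = l * (-1) + (1 - l) * 1
    if eu_heads > eu_tails:
        return 1.0                      # put all weight on Heads
    if eu_heads < eu_tails:
        return 0.0                      # put all weight on Tails
    return "any r in [0, 1]"            # indifferent exactly at l = 0.5

for l in (0.2, 0.5, 0.8):
    print(l, "->", best_response_r(l))
```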
Mixed Strategy Nash Equilibrium
Your goal is to maximize the chance that you will NOT match what I do, so your best response l to my r is:
l = 1 if 0 ≤ r < 0.5
l ∈ [0, 1] if r = 0.5
l = 0 if 0.5 < r ≤ 1
In a Nash equilibrium, we have to best-respond to one another …
… which means our best response functions must intersect.
At the intersection, r = l = 0.5.
The only Nash equilibrium of the “Matching Pennies” game is a mixed strategy Nash equilibrium with both of us playing each of our actions with probability 0.5.
Existence of Nash equilibrium
• Nash showed that every game with finitely many
players and strategies has a Nash equilibrium,
possibly in mixed strategies.
Penalty Kick

Payoffs are listed as (Kicker, Goalie); the Kicker chooses the row, the Goalie the column.

                       Goalie
                  Dive L      Dive R
Kicker   Kick L   -1, 1        1, -1
         Kick R    1, -1      -1, 1

• A Kicker can kick a ball left or right.
• A Goalie can dive left or right.
Mixed Strategy Equilibrium
• Happens in the Penalty kick game.
• Notice that if the Kicker kicks 50-50 (.5L+.5R), the
Goalie is indifferent to diving left or right.
• If the Goalie dives 50-50 (.5L+.5R), the Kicker is
indifferent to kicking left or right.
• Thus, (.5L+.5R,.5L+.5R) is a mixed-strategy N.E.
• Nash showed that there always exists a Nash
equilibrium.
Do you believe it?
• Do they really choose only L or R? Yes: Kickers 93.8% and Goalies 98.9% of the time.
• Kickers are either left- or right-footed. Assume R means kicking in the “easier” direction. Below is the percentage of kicks scored:

              Dive L      Dive R
   Kick L      58.3        94.97
   Kick R      92.91       69.92

• Nash prediction: (Kicker, Goalie) = (38.54L + 61.46R, 41.99L + 58.01R)
• Actual data: (39.98L + 60.02R, 42.31L + 57.69R)
The equilibrium strategy for player 2

Let q be the probability that the Goalie dives R (and 1-q the probability of diving L), and let p be the probability that the Kicker kicks R. In equilibrium, the Goalie’s q must make the Kicker indifferent between Kick L and Kick R:

u1(L, (1-q)L + qR) = u1(R, (1-q)L + qR)
58.3(1-q) + 94.97q = 92.91(1-q) + 69.92q
58.3 + (94.97 - 58.3)q = 92.91 + (69.92 - 92.91)q
58.3 + 36.67q = 92.91 - 22.99q
59.66q = 34.61
q* = 34.61/59.66 = 0.5801
[Figure: the Kicker’s expected payoff u1 from Kick L and Kick R as a function of q; the two lines cross at q* ≈ 0.58.]
The equilibrium strategy for player 1

The Goalie’s payoffs are the negatives of the scoring percentages. In equilibrium, the Kicker’s p must make the Goalie indifferent between Dive L and Dive R:

u2((1-p)L + pR, L) = u2((1-p)L + pR, R)
-58.3(1-p) - 92.91p = -94.97(1-p) - 69.92p
-58.3 - 34.61p = -94.97 + 25.05p
59.66p = 36.67
p* = 36.67/59.66 = 0.6146
[Figure: the Goalie’s expected payoff u2 from Dive L and Dive R as a function of p; the two lines cross at p* ≈ 0.61.]
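The two indifference conditions can be solved directly from the scoring table. Below is a small sketch of that calculation (my own Python, not part of the lecture), using the scoring percentages above.

```python
# score[(kick, dive)] = probability (in %) that the kick scores; the game is
# zero-sum, so the goalie's payoff is minus this number.
score = {("L", "L"): 58.3, ("L", "R"): 94.97, ("R", "L"): 92.91, ("R", "R"): 69.92}

# q = Prob(goalie dives R) that makes the kicker indifferent between L and R:
#   score[L,L](1-q) + score[L,R]q = score[R,L](1-q) + score[R,R]q
q = (score[("R", "L")] - score[("L", "L")]) / (
    (score[("R", "L")] - score[("L", "L")]) + (score[("L", "R")] - score[("R", "R")]))

# p = Prob(kicker kicks R) that makes the goalie indifferent between diving L and R:
#   -score[L,L](1-p) - score[R,L]p = -score[L,R](1-p) - score[R,R]p
p = (score[("L", "R")] - score[("L", "L")]) / (
    (score[("L", "R")] - score[("L", "L")]) + (score[("R", "L")] - score[("R", "R")]))

print(f"q* = {q:.4f}, p* = {p:.4f}")   # -> q* ≈ 0.5801, p* ≈ 0.6146
```

These match the Nash prediction quoted earlier (about 61.46% of kicks and 58.01% of dives to the “easier” side).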
Parking Enforcement Game
Payoffs are listed as (University, Student); the University chooses the row, the Student the column.

                               Student Driver
                          Park OK      Park in Staff
University    Check        -5, -5          5, -95
              Don't         5, -5          0, 5

• The Student can decide to park in staff parking.
• The University can check cars in the staff parking lot.
What happens?
• If the University checks, what do the students do?
• If the students park ok, what does the Uni do?
• If the uni doesn’t check, what do the students do?
• If the students park in the staff parking, what does
the uni do?
• What is the equilibrium of the game?
• What happens if the university makes the punishment less harsh, say only -10? Who benefits from this? Who is hurt by this?
Best replies of students
• Suppose the university checks with probability 1-p.
• Payoff for the student:
– If he parks OK: -5
– If he parks in the staff area: -95(1-p) + 5p
• It is better to park OK if
-5 > -95(1-p) + 5p, i.e. 90 > 100p, i.e. p < 0.9, i.e. 1-p > 0.1,
i.e. if the student gets checked with more than 10% probability.
[Figure: the student’s expected payoff u2 from “ok” (flat at -5) and “not” (park in the staff area) as a function of p; the lines cross at p = 0.9.]
Best replies of students
[Figure: the student’s best-reply correspondence – park OK (q = 0) for p < 0.9, park in the staff area (q = 1) for p > 0.9, and any q at p = 0.9.]
Terminology
• In IO: response function
• Since it is not really a function: best reply correspondence
Best replies of university
• Suppose student parks in staff area with
probability q.
• Payoff for university
– If university checks: -5(1-q)+5q
– If not: 5(1-q)
• It is better to check if
-5(1-q)+5q>5(1-q); 5q>10(1-q);
15q>10; q>2/3
i.e. if student parks in staff area with a probability
greater than 66.6%
[Figure: the university’s expected payoff u1 from “watch” (check) and “not” as a function of q; the lines cross at q = 2/3.]
Best replies of university
It is optimal for the university
• to check if and only if q>2/3;
• not to check if and only if q<2/3
• to randomize between the two options in any
way if and only if q=2/3
Best replies of university
[Figure: the university’s best-reply correspondence, with q (the probability the student parks in the staff area) on the vertical axis – don’t check for q < 2/3, check for q > 2/3, and any mix at q = 2/3.]
Nash equilibrium
[Figure: the two best-reply correspondences intersect at p = 0.9, q = 2/3.]
Nash equilibrium in mixed strategies: (0.1C + 0.9NC, (1/3)OK + (2/3)S)
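The equilibrium mix can be recovered from the two indifference conditions. The sketch below (my own Python, not from the slides) does this with exact fractions, using the payoff matrix above.

```python
from fractions import Fraction as F

# Payoffs (University, Student): rows Check / Don't, columns OK / Staff.
u = {
    ("Check", "OK"): (-5, -5), ("Check", "Staff"): (5, -95),
    ("Don't", "OK"): (5, -5),  ("Don't", "Staff"): (0, 5),
}

# c = Prob(Check) that leaves the student indifferent between OK and Staff:
#   c*u2(C,OK) + (1-c)*u2(D,OK) = c*u2(C,Staff) + (1-c)*u2(D,Staff)
a, b = u[("Check", "OK")][1], u[("Don't", "OK")][1]
x, y = u[("Check", "Staff")][1], u[("Don't", "Staff")][1]
c = F(y - b, (a - b) - (x - y))

# q = Prob(Staff) that leaves the university indifferent between Check and Don't:
A, X = u[("Check", "OK")][0], u[("Check", "Staff")][0]
B, Y = u[("Don't", "OK")][0], u[("Don't", "Staff")][0]
q = F(B - A, (X - A) - (Y - B))

print(c, q)   # -> 1/10 2/3: the NE (0.1C + 0.9NC, (1/3)OK + (2/3)S)
```

Re-running the same script with the -95 entry replaced by -10 gives c = 2/3 and q = 2/3 – the comparison made on the next slides.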
Nash Equilibrium
• The student parks legally 1/3 of the time and the university checks 1/10 of the time.
• Penalty: 95
Change in penalty?
• Suppose that, because of students parking illegally, the penalty has recently been increased from 10 to 95. To find out how much this has improved the situation, we have to calculate the previous Nash equilibrium at the penalty of 10 …
Parking Enforcement Game

Payoffs are listed as (University, Student); the penalty for being caught is now only 10.

                               Student Driver
                          Park OK      Park in Staff
University    Check        -5, -5          5, -10
              Don't         5, -5          0, 5

• The Student can decide to park in staff parking.
• The University can check cars in the staff parking lot.
[Figure: with the penalty of 95, the student’s expected payoffs from “ok” and “not” as a function of p cross at p = 0.9, as before.]
[Figure: with the penalty of 10, the student’s payoff lines cross at p = 1/3 – the student now parks OK only if checked with probability greater than 2/3.]
Nash equilibrium
[Figure: the best-reply correspondences with the penalty of 10 intersect at q = 2/3 and a check probability of 2/3.]
Nash equilibrium in mixed strategies: ((2/3)C + (1/3)NC, (1/3)OK + (2/3)S)
Answer
• With the lower penalty, the student parked legally 1/3 of the time and the university checked 2/3 of the time.
• Whose expected payoff changes? No one’s.
• Raising the penalty only made the parking “inspectors” more “lazy”; the probability of “illegal” parking has not changed …
The watchman and the thief
• Experimental observations are consistent with these comparative statics, but there are strong own-payoff effects.
Back to the Right/Left Game
In the Right/Left game, my goal continues to be to match what you do.
Letting r and l be the probabilities you and I place on Left,
my best response function then looks as it did for
“Matching Pennies”.
But your best response function now also has you trying to match what I do …
… resulting in three intersections of our best response functions:
r = l = 0 and r = l = 1 – our previous pure strategy Nash equilibria – and
r = l = 0.5 – a new mixed strategy Nash equilibrium.
Whenever there are multiple pure strategy Nash equilibria, there is also at least one mixed strategy equilibrium.
Mixed Strategy NE with Different Probabilities
Now suppose that the payoffs from coordinating on Right are 5 instead
of 10, with r and l still the probabilities you and I place on Left.
My expected payoff from going Left is then 10l, and my
expected payoff from going Right is 5(1 – l).
These are equal to one another when l=1/3, implying I am
indifferent between Left and Right when l=1/3, will best-respond
with Left when l > 1/3, and with Right when l < 1/3.
You face the same incentives, giving rise to the
same best response l to my r.
We still see the same pure strategy Nash
equilibria, but now …
The mixed strategy equilibrium now has us both playing Left with probability 1/3.
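A one-line check of that indifference condition (my own Python, not from the slides): the expected payoff from Left is 10·l and from Right is 5·(1 − l), and the two are equal at l = 1/3.

```python
from fractions import Fraction as F

l = F(5, 10 + 5)                 # solving 10*l = 5*(1 - l)  =>  l = 1/3
print(l, 10 * l, 5 * (1 - l))    # -> 1/3 10/3 10/3: indifferent, as claimed
```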
Stag hunt

Payoffs are listed as (Hunter 1, Hunter 2).

                         Hunter 2
                    Stag        Rabbit
Hunter 1   Stag    40, 40        5, 25
           Rabbit  25, 5        25, 25
Example based on Rousseau
3 Nash equilibria: (S,S), (R,R) and ((7/9)S+(2/9)R, (7/9)S+(2/9)R)
Pure coordination

Amelia chooses Top (help) or Bottom (fun); Harry chooses L (help) or R (fun).
[Payoff matrix: the slide’s entries are x, x, 14, 4, 4, 14, 0, 0.]
Coordination Problem

Payoffs are listed as (Sean, Jim); * marks the pure strategy Nash equilibria.

                      Jim
                 VHS          Beta
Sean    VHS     1, 1 *       0, 0.5
        Beta    0.5, 0       2, 2 *

• Jim and Sean want to have the same VCR.
• Beta is a better technology than VHS.
Battle of the Sexes

Payoffs are listed as (Alice, Bob): each gets 2 when they coordinate on his or her preferred event, 1 when they coordinate on the other’s preferred event, and 0 if they fail to coordinate.

                      Bob
                 Boxing      Ballet
Alice   Boxing    1, 2        0, 0
        Ballet    0, 0        2, 1

All these games also have mixed strategy equilibria. Here, both play their preferred strategy with probability 2/3; then each party gets the same expected payoff from each strategy, and so all strategies are optimal.
Chicken

Payoffs are listed as (Teen 1, Teen 2).

                        Teen 2
                  Chicken      Dare
Teen 1  Chicken    5, 5        4, 7
        Dare       7, 4        0, 0

Example by Bertrand Russell, motivated by the film with James Dean (“Halbstarke” ≈ teddy boys?).
Iterated elimination of dominated strategies

Payoffs are listed as (row player, column player); the game is zero-sum.

              L           M           R
    T       5, -5       3, -3       2, -2
    C       6, -6       4, -4       3, -3
    B       1, -1       6, -6       0, 0
T is dominated by C – eliminate T.
After eliminating T, M is dominated by R – eliminate M.
After eliminating M, B is dominated by C – eliminate B.
After eliminating B, L is dominated by R – eliminate L.
The unique Nash equilibrium is (C, R), with payoffs (3, -3).
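The procedure can be automated. The sketch below (my own Python, not from the slides) runs iterated elimination of strictly dominated strategies on the matrix as reconstructed above; the order in which dominated strategies are found may differ from the slide-by-slide order, but the unique survivor (C, R) is the same.

```python
u1 = {("T", "L"): 5, ("T", "M"): 3, ("T", "R"): 2,
      ("C", "L"): 6, ("C", "M"): 4, ("C", "R"): 3,
      ("B", "L"): 1, ("B", "M"): 6, ("B", "R"): 0}
u2 = {k: -v for k, v in u1.items()}          # zero-sum game
rows, cols = ["T", "C", "B"], ["L", "M", "R"]

def strictly_dominated(own, opp, payoff):
    """Return (dominated, dominator) if some strategy in `own` is strictly
    dominated against every remaining opponent strategy in `opp`."""
    for s in own:
        for t in own:
            if t != s and all(payoff(t, o) > payoff(s, o) for o in opp):
                return s, t
    return None

changed = True
while changed:
    changed = False
    hit = strictly_dominated(rows, cols, lambda r, c: u1[(r, c)])
    if hit:
        print(f"{hit[0]} dominated by {hit[1]} -> eliminate {hit[0]}")
        rows.remove(hit[0]); changed = True
    hit = strictly_dominated(cols, rows, lambda c, r: u2[(r, c)])
    if hit:
        print(f"{hit[0]} dominated by {hit[1]} -> eliminate {hit[0]}")
        cols.remove(hit[0]); changed = True

print("Survivors:", rows, cols)              # -> ['C'] ['R']
```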
Extensive Games: The frog and the scorpion

An extensive form game is represented by a game tree:
– The frog (player 1) moves first and chooses L (carry the scorpion across the river) or R (do not carry).
– If the frog carries, the scorpion (player 2) chooses l (sting) or r (do not sting).
– Payoffs (frog, scorpion): (carry, sting) = (-10, 5); (carry, do not sting) = (5, 3); (do not carry) = (0, 0).
The subgame-perfect equilibrium point (Selten 1965)

At the scorpion’s node, sting (payoff 5) beats do not sting (payoff 3), so the scorpion stings whenever it is carried. Anticipating this, the frog compares -10 from carrying with 0 from not carrying. The subgame-perfect equilibrium is (do not carry, sting), with payoffs (0, 0).
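Backward induction on this small tree can be written out directly. Below is a minimal sketch (my own Python, not from the lecture) using the payoffs above, written as (frog, scorpion).

```python
payoffs = {
    ("carry", "sting"):     (-10, 5),
    ("carry", "not sting"): (5, 3),
    ("do not carry", None): (0, 0),
}

# Step 1: at the scorpion's node (reached only if the frog carries),
# the scorpion picks the action that maximizes its own payoff.
scorpion = max(["sting", "not sting"], key=lambda a: payoffs[("carry", a)][1])

# Step 2: the frog anticipates this and compares carrying with not carrying.
frog = ("carry" if payoffs[("carry", scorpion)][0] > payoffs[("do not carry", None)][0]
        else "do not carry")

print(frog, "/", scorpion)   # -> do not carry / sting
outcome = payoffs[("carry", scorpion)] if frog == "carry" else payoffs[("do not carry", None)]
print(outcome)               # -> (0, 0)
```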
Nash Equilibria in Sequential Move Games

Now consider a sequential version of the Left/Right game where both players are British and thus have an inherent preference for driving on the left side of the road.
Recall that player 2 has four possible strategies, and a Nash equilibrium is a set of strategies such that both players are best-responding to one another.
Suppose player 2 plays (Left, Left) – i.e. she always plans to go Left. Player 1 then best-responds by playing Left, and (Left, Left) is a best response for player 2 to Left. The outcome (10,10) is then a Nash equilibrium outcome.
But suppose player 2 plays (Right, Right) – i.e. she always plans to go Right. Then player 1 best-responds by going Right, and (Right, Right) is a best response for player 2 to Right … giving us a second possible Nash equilibrium outcome.
Finding all NE in Sequential Games
One way to find all the pure-strategy Nash equilibria in a sequential game is to depict the
game in a payoff matrix – with strategies on the axes of the matrix.
When players end up driving on opposite sides of the road, they get 0 payoff each.
When both players end up driving on the right side of the road, they get a payoff of 5 each.
And when both players drive on the left, they both get a payoff of 10.
If player 2 picks either (Left,Left) or (Left,Right), player 1 best-responds with Left,
and the original player 2 strategies are best responses to Left – giving us two Nash
equilibrium outcomes.
But if player 2 plays (Right,Right), player 1 best-responds with Right, to which the original (Right,Right) is a best response for player 2 – giving us a third Nash equilibrium.
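The same enumeration can be done in a few lines. The sketch below (my own Python, not from the slides) builds the 4x2 strategic form of this sequential game – player 2’s strategy being a pair (plan after observing Left, plan after observing Right) – and checks every profile for mutual best responses.

```python
def outcome_payoff(a1, a2):
    """Payoffs once player 1 drives on side a1 and player 2 on side a2."""
    if a1 != a2:
        return (0, 0)
    return (10, 10) if a1 == "Left" else (5, 5)

p1_strats = ["Left", "Right"]
p2_strats = [(x, y) for x in ("Left", "Right") for y in ("Left", "Right")]

def payoff(s1, s2):
    a2 = s2[0] if s1 == "Left" else s2[1]   # player 2 carries out the relevant plan
    return outcome_payoff(s1, a2)

nash = [(s1, s2) for s1 in p1_strats for s2 in p2_strats
        if payoff(s1, s2) == (max(payoff(t, s2)[0] for t in p1_strats),
                              max(payoff(s1, t)[1] for t in p2_strats))]
print(nash)
# -> [('Left', ('Left', 'Left')), ('Left', ('Left', 'Right')), ('Right', ('Right', 'Right'))]
```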
Non-Credible Plans and Threats
There is, however, a problem with the Nash equilibrium under which all players end up
choosing Right.
Although every player is indeed best-responding, the equilibrium requires that player 1 think that player 2 plans to choose Right from her left node – i.e. even after observing player 1 choosing Left!
Since player 2 never reaches that node in the equilibrium, planning to always go Right is still a best response if player 1 plays Right. So far, player 2 has no incentive to threaten such a plan …
… but that changes if player 2 is American and not British! Player 2’s “plan” to play (Right, Right) is now intended as a threat to player 1 …
… and player 1 best-responds by playing Right IF he believes the threat.
But such a player 2 plan should be viewed as non-credible by player 1 – the threat is still just as non-credible.
Eliminating Non-Credible Threats
We can eliminate Nash equilibria that are based on non-credible threats by assuming player 1
will only consider credible strategies by player 2.
Player 1 therefore looks down the game tree and knows player 2 will play Left after observing
Left from player 1, and he similarly knows player 2 will play Right after observing Right.
The only credible strategy for player 2 is then (Left,Right). Such credible strategies are called
subgame-perfect.
The name “subgame-perfect” is motivated by the fact that we are considering subgames that begin at each information node … and we insist that the “sub-strategies” from each node are best responses from that node.
Subgame Perfect Nash Equilibrium
Subgame-perfect strategies (Selten 1965) are then strategies that involve best responses at
every information node – whether that node is reached in equilibrium or not.
(If a player cannot tell which node he is playing from, we can extend this definition by
defining subgames as starting at an information node that is also an information set.)
As a result, all non-credible threats are eliminated when we focus only on subgame-perfect
strategies.
A subgame-perfect Nash equilibrium is a Nash equilibrium
in which only subgame-perfect strategies are played.
When a sequential game has a finite number of stages, it is easy to solve for the
subgame-perfect Nash equilibrium:
– Begin with the last stage (at the bottom of the tree) and ask what action the player will take from each of her information nodes.
– Go to the second-to-last stage and ask what actions the player will take from each information node, given that he knows the next player’s credible strategy.
– Keep working backwards up the tree in the same way.
Another example
Consider the case of an Existing Firm that is threatened by competition from a Potential Firm
that might choose to enter the market.
The Existing Firm moves first and sets either a Low or a High price. The Potential Firm then either Enters or Doesn’t Enter. If it Enters after the Existing Firm sets a Low price, its expected profit is negative (because of a fixed entry cost) while the first firm’s profit is positive.
If it Doesn’t Enter at a low price, it makes zero profit while the first firm has higher profits.
If the Potential Firm Enters following a High
price set by the Existing Firm, it can price
lower and make positive profit while
causing the first firm to make losses.
But if the Potential
Firm Doesn’t Enter,
the Existing Firm
gets the highest
possible profit (and
the second firm gets
zero.)
Solving for the Subgame-Perfect Equilibrium
To solve for the subgame-perfect Nash equilibrium, we solve from the bottom up.
We begin at the Potential Firm’s first node and see that, following a Low price, it does
better by not entering. And from its second node (following High), the Potential Firm
does better by Entering.
The Potential Firm’s subgame-perfect strategy is then (Don’t Enter, Enter).
The Existing Firm can anticipate this
and therefore chooses between a
payoff of 20 for Low and a payoff of 10 for High.
In the subgame–
perfect equilibrium,
the Existing Firm sets
a Low price and the
Potential Firm stays
out of the market.
The Prisoner’s Dilemma
The Prisoner’s Dilemma is the most famous game in game theory because it applies to an
extraordinarily large number of real world settings.
It derives its name from the following scenario:
– Two prisoners were caught committing a minor crime,
but the prosecutor is quite convinced that they
also committed a more major crime together.
– The prosecutor puts them in separate rooms and tells
each of them that they can either Deny the major
crime or Confess to it.
– If they both Deny, they will both be convicted of the minor crime and receive a 1-year
jail sentence.
– If they both Confess, the prosecutor will agree to a plea bargain that will get both of
them a 5-year jail sentence.
– If prisoner 1 Confesses while prisoner 2 Denies, the prosecutor will use the confession
to get a 20-year jail sentence for prisoner 2 while letting prisoner 1 go free.
– If prisoner 2 Confesses while prisoner 1 Denies, prisoner 1 gets the 20-year jail
sentence while prisoner 2 goes free.
Nash Equilibrium in the Prisoner’s Dilemma
Now consider prisoner 1’s incentives:
– If prisoner 2 Denies, prisoner 1’s best response is to
Confess and get no jail.
– If prisoner 2 Confesses, prisoner 1’s best response is
to Confess and get 5 years in jail (rather than 20).
Prisoner 2 has the same incentives to Confess
regardless of what prisoner 1 does.
Confessing is therefore a dominant strategy for both prisoners.
In the only Nash equilibrium to the game, both prisoners therefore confess and
both end up with 5-year jail sentences …
… but both would much prefer the outcome of only a 1-year jail sentence in the
top left corner of the payoff matrix.
Cooperating with one another by Denying the crime is the efficient
outcome in this game – but it cannot be sustained in an equilibrium.
Hiring an Enforcer
Suppose the two prisoners entered into an agreement prior to committing the major crime – a cooperative agreement to Deny the crime if questioned.
In the absence of an enforcement mechanism, such an agreement cannot be sustained in equilibrium …
… because it remains a dominant strategy to Confess when the time comes.
It then becomes in each player’s best interest to hire an enforcer of their agreement. The two could, for instance, join a mafia organization that changes the payoff to Confessing to some X that is much worse than 20 years in jail.
It then becomes a dominant strategy to Deny – with “efficiency” restored.
In many real-world applications of the Prisoner’s Dilemma incentives, the “outside enforcer” we hire is the government.
Examples: taxes for public goods; combating negative externalities.
In other cases, social norms and social pressures can take the role of altering the incentives of a Prisoner’s Dilemma.
“Unraveling of Cooperation” in Repeated PD
It may seem that repeated interactions may also induce cooperation. Consider then a
different version of the Prisoner’s Dilemma, with Me and You
as players and with payoffs in terms of utility or dollar values.
The only Nash equilibrium is again
for neither of us to cooperate –
thus getting payoffs (10,10) when
we could have (100,100) by
cooperating.
Now suppose we play this game N different times – giving us a sequential move game
in which we repeatedly play a simultaneous move game at each of N stages.
In the final stage, the game becomes the same as if we only played once – implying a
dominant strategy of NOT cooperating in the final stage.
To find the subgame-perfect equilibrium, we now have to work backwards.
In the second-to-last stage, we know there will be no cooperation in the last stage – which
makes it a dominant strategy to NOT cooperate in the second-to-last stage.
Continuing the same reasoning, we see that cooperation unravels from the bottom of
the sequential game up.
“Infinitely Repeated” Prisoner’s Dilemmas
But now suppose that the game is repeated infinitely many times, or, more realistically, that
each time we encounter each other we know that there is a good chance we’ll encounter
each other again to play the same game.
We can then no longer “solve the
game from the bottom up” –
because there is no “bottom”.
Suppose that I then propose we
play the following strategy:
I will Cooperate the first time, and I will continue to Cooperate as
long as you cooperated in all previous stages. But if you ever Don’t
Cooperate, I will NEVER cooperate again.
Is it a best response for you to play the same strategy?
Yes, assuming the chance of us meeting again is sufficiently large so that the present discounted
value of future cooperation outweighs the immediate one-time benefit from not cooperating.
Such a strategy is called a trigger strategy because it uses an action by the opposing
player to “trigger” a change in behavior.
Repeated Games

Payoffs are listed as (Venus, Mars); * marks the Nash equilibrium of the one-shot game.

                           Mars
                   Not Shoot      Shoot
Venus  Not Shoot    -5, -5       -15, -1
       Shoot        -1, -15      -10, -10 *
Experiments on PD
• Pure one shot game versus random
matching: cooperation dies out quickly
• Mild gender effects
• Does Studying Economics Inhibit Cooperation? Frank, Gilovich, and Regan claim that economics students are less cooperative than other students.
• Now: THEORY of repeated PD’s with fixed
matching
Repeated games
• 1. if game is repeated with same players,
then there may be ways to enforce
a better solution to prisoners’ dilemma
• 2. suppose PD is repeated 10 times and
people know it
– then backward induction says it is a dominant
strategy to cheat every
round
• 3. suppose that PD is repeated an
indefinite number of times
– then it may pay to cooperate
• 4. Axelrod’s experiment: tit-for-tat
Continuation payoff
• Your payoff is the sum of your payoff
today plus the discounted “continuation
payoff”
• Both depend on your choice today
• If you get punished tomorrow for bad
behaviour today and you value the future
sufficiently highly, it is in your self-interest
to behave well today
• You trade off short-run gains against long-run gains.
Infinitely repeated PD
• Discounted payoff, with discount factor 0 < d < 1 (d^0 = 1).
• Normalized payoff: (d^0·u0 + d^1·u1 + d^2·u2 + … + d^t·ut + …)(1 - d)
• Geometric series:
(d^0 + d^1 + d^2 + … + d^t + …)(1 - d)
= (d^0 + d^1 + d^2 + … + d^t + …) - (d^1 + d^2 + d^3 + … + d^(t+1) + …) = d^0 = 1
Infinitely repeated PD
• A constant “income stream” u0 = u1 = u2 = … = u each period yields a total normalized payoff of u.
• Grim Strategy: choose “Not shoot” until someone chooses “Shoot”; always choose “Shoot” thereafter.
• It is non-forgiving; problem: it is not “renegotiation proof”.
• Payoff if nobody shoots:
(-5d^0 - 5d^1 - 5d^2 - … - 5d^t - …)(1 - d) = -5(1 - d) - 5d = -5
• Maximal payoff from shooting in the first period (once the other shoots forever, shooting (-10) beats not shooting (-15)):
(-1·d^0 - 10d^1 - 10d^2 - … - 10d^t - …)(1 - d) = -1(1 - d) - 10d
• -1(1 - d) - 10d < -5(1 - d) - 5d iff 4(1 - d) < 5d, i.e. 4 < 9d, i.e. d > 4/9 ≈ 0.44.
• Cooperation can be sustained if d > 4/9, i.e. if players weight the future sufficiently highly.
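A quick numerical check of this threshold (my own Python, not from the lecture): compare the normalized payoff from cooperating forever with the normalized payoff from the best one-period deviation followed by permanent punishment, for a few discount factors d.

```python
coop = -5          # per-period payoff when nobody shoots
deviate_now = -1   # payoff in the period of the deviation
punish = -10       # per-period payoff once both shoot forever after

def normalized(first, later, d):
    """(1-d) * (first + later*d + later*d^2 + ...) = (1-d)*first + d*later."""
    return (1 - d) * first + d * later

for d in (0.3, 4/9, 0.5, 0.9):
    cheat = normalized(deviate_now, punish, d)
    print(f"d = {d:.2f}: cooperate = {coop:.2f}, deviate = {cheat:.2f}, "
          f"cooperation sustainable: {coop >= cheat}")
# Deviating stops paying off exactly at d = 4/9 ≈ 0.44.
```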
Selten / Stöcker,1986
• Students play a 10-round fixed-pair repeated PD five times.
• New, random assignment for each play of
the repeated game.
• Results initially: chaos, players learn to
cooperate and use punishments
• With experience: cooperation emerges
• With more plays: players learn to defect in
the last periods (end-effect)
• Final periods of defection get longer
Equilibrium Trigger Strategies
Each of us playing this trigger strategy is therefore a Nash Equilibrium to the infinitely
repeated Prisoner’s Dilemma – with cooperation emerging in equilibrium
Is it subgame-perfect?
Since we can no longer solve for the subgame-perfect equilibrium from the bottom up,
we have to return to a more general definition of subgame-perfect:
A Nash equilibrium is subgame-perfect if all subgames – whether they are
reached in equilibrium or not – also involve Nash equilibrium strategies.
Note that each subgame of the infinitely repeated Prisoner’s Dilemma is also an
infinitely repeated Prisoner’s Dilemma!
Since each of us playing our trigger strategy is a Nash equilibrium to the infinitely
repeated Prisoner’s Dilemma, it is therefore also a Nash equilibrium in every subgame.
Cooperation can therefore be sustained as a subgame-perfect Nash
equilibrium in infinitely repeated Prisoner’s Dilemma games.
More “Forgiving” Trigger Strategies
The trigger strategy that punishes one act of non-cooperation with eternal non-cooperation is the most unforgiving trigger strategy that can lead to cooperation.
The same outcome can be achieved with more forgiving trigger strategies that punish
deviations from cooperation with some period of non-cooperation but give a chance to the
opposing player to recover cooperation at a cost.
The most famous such strategy is known as Tit-for-Tat:
I will Cooperate the first time, and from then on I will mimic what you
did the last time we met.
A branch of game theory known as evolutionary game theory has given strong reasons to
expect Tit-for-Tat to be among the most “evolutionarily stable” trigger strategies in
infinitely repeated Prisoner’s Dilemma games.
In the study of how cooperation may evolve in systems governed by mechanisms akin to
evolutionary biology, Tit-for-Tat is a likely candidate for a strategy that evolves as a social
norm within civil society.