EC941 - Game Theory
Lecture 7
Prof. Francesco Squintani
Email: f.squintani@warwick.ac.uk
Structure of the Lecture

Infinitely Repeated Games

Nash and Subgame-Perfect Equilibrium

Finitely Repeated Games
Repeated Games

Repeated games are a special class of interactions,
represented as extensive form games.

A simultaneous move game, represented as a normal
form game, is repeated over time.

Repetition enlarges the set of equilibria if players
are sufficiently patient. For example, cooperation can be sustained in a
subgame perfect equilibrium of the infinitely repeated Prisoner’s Dilemma.
Definition Let G = (N, A, u) be a strategic game. Let T
be finite or infinite. The T-repeated game of G for the
discount factor δ is the extensive game in which:
- the set of players is N
- the set of terminal histories is the set of sequences
$(a^1, a^2, \dots)$ of T action profiles in G (infinite sequences when T is infinite)
- the player function assigns the set of all players to every
proper sub-history of every terminal history
- the set of actions of player i after any history is $A_i$
- each player i evaluates each terminal history $(a^1, a^2, \dots)$
according to its discounted average
$(1 - \delta) \sum_{t=1}^{T} \delta^{t-1} u_i(a^t)$.
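The discounted average in the definition can be evaluated numerically. The following is a minimal sketch, assuming a finite list of per-period payoffs stands in for the stream (a long list approximates an infinite one); the function name `discounted_average` is an illustrative choice, not part of the lecture.

```python
def discounted_average(payoffs, delta):
    """Return (1 - delta) * sum over t of delta**(t-1) * u_t for a finite payoff stream."""
    # enumerate starts at 0, which matches the exponent t - 1 for t = 1, 2, ...
    return (1 - delta) * sum(delta ** t * u for t, u in enumerate(payoffs))


# A long constant stream stands in for the infinite stream (2, 2, ...).
cooperate_forever = [2] * 2000
print(discounted_average(cooperate_forever, 0.9))  # very close to 2
```

The factor (1 − δ) normalises the sum so that a constant stream (c, c, . . .) has discounted average c.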
Repeated Prisoner’s Dilemma

Suppose that the following game is infinitely repeated
with discount factor δ.

          C        D
  C     2, 2     0, 3
  D     3, 0     1, 1
Strategies

A player’s strategy in an extensive game specifies her
action after all possible histories after which it is her
turn to move.

A strategy of player i in an infinitely repeated game of
the strategic game G specifies an action of player i (a
member of $A_i$) for every finite sequence $(a^1, \dots, a^T)$ of
outcomes of G.
Grim Trigger Strategy


Consider the repeated prisoner’s dilemma.
The strategy prescribes that the player initially
cooperates, and continues to do so if both players
cooperated at all previous times.
$s_i(a^1, \dots, a^T) = D$ if $a^t \neq (C, C)$ for some $t = 1, \dots, T$.
$s_i(a^1, \dots, a^T) = C$ otherwise.

Note that a player defects if either she or her opponent
defected in the past.
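As an illustration, the grim trigger strategy can be written as a function from histories to actions. This is a sketch under the assumption that a history is a Python list of action profiles (tuples of the two players' actions); the name `grim_trigger` is illustrative.

```python
def grim_trigger(history):
    """Action prescribed after `history`, a list of profiles such as [("C", "C"), ("C", "D")]."""
    # Cooperate only if every past profile was (C, C); the empty history also qualifies.
    return "C" if all(profile == ("C", "C") for profile in history) else "D"


print(grim_trigger([]))                                      # initial period
print(grim_trigger([("C", "C"), ("D", "C"), ("C", "C")]))    # a past defection is never forgiven
```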
Automaton Representation
An automaton for player i is $(X, x_0, f, g)$.
1. X is a set of states.
2. $x_0$ is the initial state of the automaton.
3. $f : X \times A \to X$ is the transition across states, as a
function of the play.
4. $g : X \to A_i$ is the play output at each state.
The automaton of the Grim Trigger Strategy is as follows:
- There are two states: C, in which C is chosen, and D, in
which D is chosen.
- The initial state is C.
- If the play is not (C,C) in any period, then the state
changes to D.
- If the automaton is in state D, it remains there forever.
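[Diagram: state C (initial state, marked *) loops to itself on (C,C) and moves to state D on (C,D), (D,C) or (D,D); state D is absorbing.]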
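The automaton (X, x0, f, g) translates directly into a small data structure. The sketch below is one possible encoding, assuming states and actions are strings and action profiles are tuples; the class name `Automaton` and its field names are illustrative choices.

```python
from dataclasses import dataclass
from typing import Callable, Hashable


@dataclass
class Automaton:
    initial: Hashable        # x0, the initial state
    transition: Callable     # f: (state, action profile) -> next state
    output: Callable         # g: state -> action of the player

    def play(self, history):
        """Action chosen after a history of action profiles."""
        state = self.initial
        for profile in history:
            state = self.transition(state, profile)
        return self.output(state)


# Grim trigger: stay in state C only while the play is (C, C); state D is absorbing.
grim = Automaton(
    initial="C",
    transition=lambda x, a: "C" if x == "C" and a == ("C", "C") else "D",
    output=lambda x: x,
)

print(grim.play([("C", "C"), ("C", "D"), ("C", "C")]))
```

Keeping the state set implicit (only the states reachable from the initial state matter) keeps the sketch short; a fuller version could store X explicitly.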
Tit for Tat


The player initially cooperates.
In subsequent rounds, she plays the action played by
the opponent in the previous round.
$s_i(\varnothing) = C$.
$s_i(a^1, \dots, a^T) = C$ if $a^T_j = C$.
$s_i(a^1, \dots, a^T) = D$ if $a^T_j = D$.
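[Diagram: state C (initial state, marked *) loops to itself on (·,C) and moves to state D on (·,D); state D loops to itself on (·,D) and returns to state C on (·,C).]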
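To see tit for tat in action, one can simulate the play of two strategies against each other. The sketch below assumes each strategy is a function of the player's own past actions and the opponent's past actions; the helper names `simulate` and `always_defect` are hypothetical and serve only as illustration.

```python
def tit_for_tat(own_history, opp_history):
    # Cooperate in the first round, then copy the opponent's previous action.
    return "C" if not opp_history else opp_history[-1]


def always_defect(own_history, opp_history):
    return "D"


def simulate(strategy1, strategy2, periods):
    """Return the list of action profiles generated by the two strategies."""
    h1, h2 = [], []
    for _ in range(periods):
        a1, a2 = strategy1(h1, h2), strategy2(h2, h1)
        h1.append(a1)
        h2.append(a2)
    return list(zip(h1, h2))


print(simulate(tit_for_tat, tit_for_tat, 3))
print(simulate(tit_for_tat, always_defect, 3))
```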
Grim Trigger Nash Equilibrium

Suppose that player j adopts the grim trigger strategy.

If player i plays grim trigger, then the outcome is (C, C)
in every period with payoffs (2, 2, . . .).
The discounted average is 2.

If i deviates from the grim trigger strategy, then there is
one period (at least) in which she chooses D.

In all subsequent periods player j chooses D, so the best
deviation for player i is to choose D in every subsequent
period (because D is her unique best response to D).

If i can increase her payoff by deviating then she can do
so by deviating to D in the first period.

She obtains the stream of payoffs (3, 1, 1, . . .) with
discounted average
$(1 - \delta)[3 + \delta + \delta^2 + \delta^3 + \cdots] = 3(1 - \delta) + \delta$.

Thus player i cannot increase her payoff by deviating if
and only if
$2 \geq 3(1 - \delta) + \delta$, or $\delta \geq 1/2$.

Hence, if $\delta \geq 1/2$, then playing the grim trigger strategy
by both players is a Nash equilibrium of the infinitely
repeated Prisoner’s Dilemma.
Tit for Tat Nash Equilibrium

Suppose that player j adopts the tit for tat strategy.

If player i plays tit for tat, then the outcome is (C, C) in
every period with payoffs (2, 2, . . .).

If i deviates, then there is one period (at least) in which
she chooses D.

In the subsequent period player j chooses D.
If player i plays D again, she triggers one further D by j.
If i chooses C, then j returns to C in the following period.

If i can increase her payoff by deviating, then she can
do so by deviating to D in the first period.

Player i can obtain either the stream (3, 1, 1, 1, …), with
discounted average $3(1 - \delta) + \delta$, or
the stream (3, 0, 3, 0, . . .), with discounted average
$(1 - \delta)[3 + 0\delta + 3\delta^2 + 0\delta^3 + \cdots] = 3(1 - \delta)\sum_{t=0}^{\infty} \delta^{2t} = 3(1 - \delta)/(1 - \delta^2) = 3/(1 + \delta)$.

Thus player i cannot increase her payoff by deviating if
and only if
$2 \geq 3(1 - \delta) + \delta$ and $2 \geq 3/(1 + \delta)$, or $\delta \geq 1/2$.
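The two closed forms can be checked against truncated sums. The sketch below assumes 2000 periods are enough to approximate the infinite streams; the function names are illustrative.

```python
def discounted_average(payoffs, delta):
    return (1 - delta) * sum(delta ** t * u for t, u in enumerate(payoffs))


def deviation_payoffs(delta, periods=2000):
    """Truncated discounted averages of the two deviation streams against tit for tat."""
    defect_forever = [3] + [1] * (periods - 1)   # (3, 1, 1, ...)
    alternate = [3, 0] * (periods // 2)          # (3, 0, 3, 0, ...)
    return (discounted_average(defect_forever, delta),
            discounted_average(alternate, delta))


for delta in (0.4, 0.5, 0.6):
    defect, alternate = deviation_payoffs(delta)
    # The last column reports whether neither deviation beats the cooperative payoff 2.
    print(delta, round(defect, 4), round(alternate, 4), max(defect, alternate) <= 2 + 1e-9)
```

Both deviation payoffs fall to 2 exactly at δ = 1/2, which is why the two inequalities yield the same threshold.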
Nash Folk Theorem
in the Prisoner’s Dilemma

Definition The set of feasible payoff profiles of a
strategic game is the set of all weighted averages of
payoff profiles in the game.

For any feasible pair $(x_1, x_2)$ of payoffs there is a finite
sequence $(a^1, \dots, a^k)$ of outcomes for which each player
i’s average payoff is close to $x_i$:
$\frac{1}{k}[u_i(a^1) + \cdots + u_i(a^k)] - \varepsilon_1 < x_i < \frac{1}{k}[u_i(a^1) + \cdots + u_i(a^k)] + \varepsilon_1$.

The discounted average payoff of the repeated sequence is close to
$x_i$ when the discount factor is close enough to 1:
$(1 - \delta)\sum_{t=1}^{\infty} \delta^{t-1} u_i(a^t) - \varepsilon_2 < x_i < (1 - \delta)\sum_{t=1}^{\infty} \delta^{t-1} u_i(a^t) + \varepsilon_2$.

Consider the feasible payoff pair $(x_1, x_2)$, and the
outcome path b that consists of repetitions of the
sequence $(a^1, \dots, a^k)$: $b^{nk+l} = a^l$ for $n = 0, 1, \dots$ and $l = 1, \dots, k$.

Consider the strategy
$s_i(h^1, \dots, h^{T-1}) = b^T_i$ if $h^t = b^t$ for $t = 1, \dots, T - 1$
$s_i(h^1, \dots, h^{T-1}) = D$ otherwise.

As long as $x_1 > u_1(D, D)$ and $x_2 > u_2(D, D)$, the pair of these “grim
trigger” strategies is a Nash equilibrium when δ is close enough to 1.

We conclude that any feasible payoff pair $(x_1, x_2)$ such
that $x_1 > u_1(D, D)$ and $x_2 > u_2(D, D)$ is (approximately) a Nash
equilibrium payoff of the infinitely repeated Prisoner’s Dilemma
when δ is close enough to 1.
Nash Folk Theorem

Consider a one-shot game. Suppose that each player i
can guarantee herself a “minimum” payoff mi.

We will show that every feasible payoff profile w such
that wi > mi can be achieved as the discounted average
payoff profile of a Nash equilibrium in the infinitely
repeated game, when d is close to 1.

This payoff can be achieved with strategies similar to
grim trigger strategies. Deviation from path by player i
is punished by minimizing i’s payoff forever.

For the Prisoner’s Dilemma, the minimum payoff of
player i supported by a Nash equilibrium is ui(D, D).

Player j can ensure (by choosing D) that player i’s
payoff does not exceed ui(D, D), and there is no lower
payoff with this property.

Hence, ui(D, D) is the lowest payoff that player j can
force upon player i.

What is this minimum payoff for player i in an arbitrary
game?

For any collection $a_{-i}$ of the other players’ mixed
actions, player i’s highest possible payoff is
$\max_{a_i \in A_i} u_i(a_i, a_{-i})$.

As $a_{-i}$ changes, this maximal payoff changes.
The collection $a_{-i}$ of “punishments” that makes
this maximum as small as possible is the solution of
$\min_{a_{-i} \in \Delta(A_{-i})} \max_{a_i \in A_i} u_i(a_i, a_{-i})$.

This payoff is known as player i’s minmax payoff.
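For a two-player game with few actions, the minmax payoff can be computed by enumeration. The sketch below is a simplification: it restricts the punisher to pure actions, whereas the definition allows mixed actions, so in general it gives only an upper bound on the minmax payoff. In the Prisoner’s Dilemma used in this lecture the two coincide. The function name `pure_minmax` and the dictionary encoding of payoffs are illustrative assumptions.

```python
def pure_minmax(payoff, player):
    """Pure-action minmax payoff of `player` (0 = row player, 1 = column player).

    payoff[(row_action, col_action)] = (u_row, u_col)
    """
    rows = sorted({a for a, _ in payoff})
    cols = sorted({b for _, b in payoff})
    if player == 0:
        # The column player picks the punishment; the row player best-responds to it.
        return min(max(payoff[(a, b)][0] for a in rows) for b in cols)
    # The row player picks the punishment; the column player best-responds to it.
    return min(max(payoff[(a, b)][1] for b in cols) for a in rows)


PD = {("C", "C"): (2, 2), ("C", "D"): (0, 3),
      ("D", "C"): (3, 0), ("D", "D"): (1, 1)}

print(pure_minmax(PD, 0), pure_minmax(PD, 1))
```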
Theorem (Nash Folk Theorem). Let G be a strategic
game. Let w be a feasible payoff profile of G for which each player’s
payoff exceeds her minmax payoff. Then, for all ε > 0, there exists
δ* < 1 such that if the discount factor exceeds δ*, then the
infinitely repeated game of G has a Nash equilibrium whose
discounted average payoff profile w’ satisfies |w’ − w| < ε.

For any discount factor δ with 0 < δ < 1, the discounted
average payoff of every player in any Nash equilibrium of
the infinitely repeated game of G is at least her minmax
payoff.



Let x be the payoff profile induced by an action profile a. By
hypothesis, each $x_i$ exceeds player i’s minmax payoff.
For each player i, let $p_{-i}$ be a profile of mixed actions
for the players other than i that holds player i down to
her minmax payoff.
Define each player i’s strategy as follows.
In each period, play $a_i$ as long as the play was a in every
previous period.
Otherwise play $(p_{-j})_i$, where j is the player who
deviated in the first period in which the play was not a.

Let H∗ be the set of histories in which there is a period
in which exactly one player j chose an action different
from aj.

Refer to j as a lone deviant.

The strategy of player i is defined as follows:
$s_i(\varnothing) = a_i$,
$s_i(h) = a_i$ if h is not in H∗,
$s_i(h) = (p_{-j})_i$ if h ∈ H∗ and j is the first lone deviant in h.

We now show that the profile s is a Nash equilibrium.

If each player i adheres to si, then her payoff is xi in
every period.

If player i deviates from si, then she may gain in the
period in which she deviates, but she loses in every
subsequent period, obtaining at most her minmax
payoff, rather than xi.

Thus for a discount factor close enough to 1, si is a best
response to s-i for every player i.
Subgame Perfect Equilibrium
Theorem (One-Shot Deviation Property of subgame
perfect equilibria of infinitely repeated games)
A strategy profile in an infinitely repeated game is a subgame
perfect equilibrium if and only if no player can gain by changing
her action after any history, given both the strategies of the other
players and the remainder of her own strategy.

The One-Shot Deviation Principle is deceptively simple.
Its application is often not straightforward.

First, it requires that players cannot gain by deviating once,
after any history of play.
But there are infinitely many histories, so they cannot be
checked one by one.
They must be grouped according to the prescriptions of the
strategy profile we are considering.



Second, the one-shot deviation may change future play,
according to the strategy that we are considering.
The deviation is one shot from a strategy, not from a play.
Grim Trigger Strategy


Consider the repeated prisoner’s dilemma.
The strategy prescribes that the player initially cooperates,
and continues to do so if both players cooperated at all
previous times. Otherwise, they should defect forever.
$s_i(a^1, \dots, a^T) = D$ if $a^t \neq (C, C)$ for some $t = 1, \dots, T$.
$s_i(a^1, \dots, a^T) = C$ otherwise.
Grim Trigger SPE

Suppose that both players adopt the grim trigger
strategy.

There are two “groups” of histories: those for which the
grim trigger strategy prescribes that the players play
(C,C), and those for which it
prescribes that they play (D,D).

In the first set of histories, if player i plays grim trigger,
then the outcome is (C, C) in every period with payoffs
(2, 2, . . .), whose discounted average is 2.

If i deviates only once, she plays D. Then she reverts to
the grim trigger strategy, which prescribes D in all
subsequent periods.

The opponent, playing the grim trigger strategy, plays D
forever as a consequence of i’s one-shot deviation.

The one-shot deviation (OSD) yields the stream of payoffs (3, 1, 1, . . .) with
discounted average
$(1 - \delta)[3 + \delta + \delta^2 + \delta^3 + \cdots] = 3(1 - \delta) + \delta$.

Thus player i cannot increase her payoff by deviating if
and only if
$2 \geq 3(1 - \delta) + \delta$, or $\delta \geq 1/2$.

In the second set of histories, if player i plays grim
trigger, then the outcome is (D, D) in every period with
payoffs (1, 1, . . .), whose discounted average is 1.

If i deviates only once, she plays C. Then she reverts to
the grim trigger strategy, which prescribes D in all
subsequent periods.

The opponent, playing the grim trigger strategy, plays D
forever regardless of i’s one-shot deviation.

The one-shot deviation yields the stream of payoffs (0, 1, 1, . . .)
with discounted average
$(1 - \delta)[0 + \delta + \delta^2 + \delta^3 + \cdots] = \delta$.


Player i cannot increase her payoff by deviating, because $1 \geq \delta$ for every discount factor.
We conclude that if $\delta \geq 1/2$, then the strategy pair in
which each player’s strategy is the grim trigger strategy
is a Subgame-Perfect Equilibrium of the infinitely
repeated Prisoner’s Dilemma.
SPE Folk Theorem
Theorem (Simplified Subgame Perfect Folk Theorem for
Two-Player Games) Let G be a two-player strategic game. Let w
be a feasible payoff profile of G for which each player’s payoff exceeds
her (pure-strategy) minmax payoff. Then for all ε > 0 there exists
δ* < 1 such that if the discount factor exceeds δ* then the infinitely
repeated game of G has a subgame perfect equilibrium whose
discounted average payoff profile w’ satisfies |w’ − w| < ε.

Take an outcome a such that both players’ payoffs $u_i(a)$
exceed their pure-strategy minmax payoffs.

Let $p_j$ be an action of player i that holds player j down
to her minmax payoff, and let $p = (p_2, p_1)$.

If the minmax profile p is a Nash Equilibrium of the
stage game, then consider a modified grim trigger strategy such
that both players play $a_t$ at every time t, and
if either player deviates, p is played forever.

Because both players’ payoffs from a exceed
their minmax payoffs, if the discount factor δ is
sufficiently close to one, the players will follow the
modified grim trigger strategy, yielding the outcome a.

If p is not a Nash Equilibrium, the proof is as follows.

Let $s_i$ be a strategy of player i that starts off choosing
$a_{i,0}$, and continues to choose $a_{i,t}$ so long as the previous
outcome was the prescribed one; otherwise, it chooses the action $p_j$ that
holds player j to her minmax payoff.

Once punishment begins, it continues for k periods, as
long as both players choose their punishment actions,
and then players revert to a.

If any player j deviates from the assigned punishment
action, then the punishment is restarted, and player j
is now punished.

To prove that (s1, s2) is a subgame perfect equilibrium,
we now find δ’ and k(δ’) such that if δ > δ’ then the
strategy pair (s1, s2) is a subgame perfect equilibrium of
the infinitely repeated game.

Suppose that player j adheres to sj.

If player i adheres to $s_i$ in any history with no
deviations, then her discounted average payoff is $u_i(a)$.

If she deviates, she obtains at most her maximal payoff
in the game, say $u_i^*$, in the period of her deviation, then
$u_i(p)$ for k periods, and subsequently $u_i(a)$ in the future.

Her discounted average payoff from the deviation is at most
$(1 - \delta)[u_i^* + \delta u_i(p) + \cdots + \delta^k u_i(p)] + \delta^{k+1} u_i(a)$
$= (1 - \delta) u_i^* + \delta(1 - \delta^k) u_i(p) + \delta^{k+1} u_i(a)$.

Hence, she does not deviate if
$u_i(a) \geq (1 - \delta) u_i^* + \delta(1 - \delta^k) u_i(p) + \delta^{k+1} u_i(a)$.

If player i adheres to $s_i$ in any history where the players play
p, she gets $u_i(p)$ for at most k periods, then $u_i(a)$ in every
subsequent period.

This yields a discounted average payoff of at least $(1 - \delta^k) u_i(p) + \delta^k u_i(a)$.

Note that $u_i(p) \leq m_i$, her minmax payoff, and $u_i(a) > m_i$.

If she deviates from $s_i$, she obtains at most her minmax
payoff in the period of her deviation, then $u_i(p)$ for k
periods, then $u_i(a)$ in the future.

This yields a discounted average payoff of at most
$(1 - \delta) m_i + \delta(1 - \delta^k) u_i(p) + \delta^{k+1} u_i(a)$.

She does not deviate if
$(1 - \delta^k) u_i(p) + \delta^k u_i(a) \geq (1 - \delta) m_i + \delta(1 - \delta^k) u_i(p) + \delta^{k+1} u_i(a)$,
or equivalently $(1 - \delta^k) u_i(p) + \delta^k u_i(a) \geq m_i$.

For each value of δ sufficiently close to 1 we can find k(δ)
such that (δ, k(δ)) satisfies the two no-deviation inequalities:
$u_i(a) \geq (1 - \delta) u_i^* + \delta(1 - \delta^k) u_i(p) + \delta^{k+1} u_i(a)$,
$(1 - \delta^k) u_i(p) + \delta^k u_i(a) \geq m_i$.
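The search for a punishment length k(δ) can be illustrated numerically. The sketch below looks for the smallest k that satisfies both no-deviation inequalities for one player. The payoff numbers used in the example (maximal payoff 5, punishment payoff 0, path payoff 3, minmax payoff 1) are hypothetical values chosen only for illustration, and the function name `punishment_length` is an assumption.

```python
def punishment_length(delta, u_star, u_p, u_a, m, k_max=1000):
    """Smallest k satisfying both no-deviation inequalities, or None if none is found."""
    for k in range(1, k_max + 1):
        # No profitable deviation from the equilibrium path.
        on_path_ok = u_a >= ((1 - delta) * u_star
                             + delta * (1 - delta ** k) * u_p
                             + delta ** (k + 1) * u_a)
        # No profitable deviation at the start of a punishment phase.
        punishment_ok = (1 - delta ** k) * u_p + delta ** k * u_a >= m
        if on_path_ok and punishment_ok:
            return k
    return None


# Hypothetical payoffs: u_star = 5, u_p = 0, u_a = 3, m = 1.
print(punishment_length(0.9, 5, 0, 3, 1))
print(punishment_length(0.5, 5, 0, 3, 1))
```

The first inequality pushes k up (the punishment must be long enough to deter deviations from the path) while the second pushes k down (the punishment must not be so long that the punished player prefers to deviate from it), which is why a suitable k exists only when δ is large enough.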
Finitely Repeated Games




Consider any Subgame Perfect Equilibrium of a finitely
repeated game.
In the final stage, a Nash Equilibrium of the stage game
must be played.
Hence, the set of equilibria is enlarged only if there are
multiple Nash equilibria in the stage game.
Otherwise, in the unique Subgame Perfect Equilibrium of
the repeated game, the unique Nash Equilibrium of
the stage game is played in every period.
Prisoner’s Dilemma

The following game is repeated for T periods.

          C        D
  C     2, 2     0, 3
  D     3, 0     1, 1

Proceeding by backward induction, in the last period,
the unique Nash equilibrium is (D,D).


Because in the last period players play (D,D) regardless
of the previous play, in the second to last period future
payoffs do not depend on current play.
It is as if players were playing the following game.

          C        D
  C     2, 2     0, 3
  D     3, 0     1, 1

The unique Nash Equilibrium is (D,D).
Proceeding by backward induction, the unique
subgame-perfect equilibrium is (D,D) in every period.
Expanded Prisoner’s Dilemma
The following game is repeated for T periods.

          C         D         E
  C     2, 2      0, 3     -2, -2
  D     3, 0      1, 1     -2, -2
  E    -2, -2    -2, -2    -1, -1

There are two pure-strategy Nash equilibria: (D, D) and (E, E).
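The pure-strategy equilibria of both stage games can be confirmed by checking every action profile for profitable unilateral deviations. This is a brute-force sketch for symmetric action sets; the function name `pure_nash` and the dictionary encoding are illustrative choices.

```python
from itertools import product


def pure_nash(payoff, actions):
    """Pure-strategy Nash equilibria of a two-player game with a common action set."""
    equilibria = []
    for a1, a2 in product(actions, repeat=2):
        u1, u2 = payoff[(a1, a2)]
        # Neither player can gain by unilaterally switching to another action.
        best1 = all(u1 >= payoff[(b, a2)][0] for b in actions)
        best2 = all(u2 >= payoff[(a1, b)][1] for b in actions)
        if best1 and best2:
            equilibria.append((a1, a2))
    return equilibria


PD = {("C", "C"): (2, 2), ("C", "D"): (0, 3),
      ("D", "C"): (3, 0), ("D", "D"): (1, 1)}

EXPANDED_PD = dict(PD)
EXPANDED_PD.update({("C", "E"): (-2, -2), ("D", "E"): (-2, -2),
                    ("E", "C"): (-2, -2), ("E", "D"): (-2, -2),
                    ("E", "E"): (-1, -1)})

print(pure_nash(PD, ["C", "D"]))
print(pure_nash(EXPANDED_PD, ["C", "D", "E"]))
```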
Expanded Prisoner’s Dilemma

In period T, a Nash Equilibrium is played, either (D, D),
with payoffs (1, 1), or (E, E), with payoffs (-1, -1).

We construct a Subgame Perfect Equilibrium as follows.
1. In period T, the profile (D, D) is played.
2. In all periods t = 1, …, T-1, the profile (C, C) is played
with payoffs (2, 2).
3. If either player deviates to D, then the future play
switches to (E, E) in all remaining periods.

This is a SPE if and only if players do not have an
incentive to deviate in the period before the last.

This is because the punishment (E, E) is more severe when
there are more periods left to play.

In period T − 1, each player must prefer to play C, with payoff $2 + \delta$,
to playing D, with payoff $3 - \delta$.
Hence, the strategies are a SPE if and only if
$3 - \delta \leq 2 + \delta$, i.e. $\delta \geq 1/2$.
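The claim that the period before the last is the binding one can be checked numerically. The sketch below compares, for every period before the last, the continuation payoff from following the constructed strategies with the payoff from deviating to D. It assumes, as the comparison of 2 + δ with 3 − δ does, that payoffs are unnormalised discounted sums, and it checks only deviations to D from the cooperative path; the function name `cooperation_sustainable` is illustrative.

```python
def cooperation_sustainable(delta, T, tol=1e-12):
    """True if no player gains by deviating to D in any of the periods 1, ..., T - 1."""
    for remaining in range(1, T):  # number of periods left after the current one
        # Follow: (C, C) now and until period T - 1, then (D, D) in period T.
        follow = 2 + sum(2 * delta ** s for s in range(1, remaining)) + delta ** remaining * 1
        # Deviate: payoff 3 now, then (E, E) with payoff -1 in all remaining periods.
        deviate = 3 + sum(-1 * delta ** s for s in range(1, remaining + 1))
        if deviate > follow + tol:
            return False
    return True


for delta in (0.4, 0.5, 0.9):
    print(delta, cooperation_sustainable(delta, T=5))
```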
Summary of the Lecture

Infinitely Repeated Games

Nash and Subgame-Perfect Equilibrium

Finitely Repeated Games
Preview of the Next Lecture

Coalitional Games and the Core

Ownership and the Distribution of Wealth

Horse Trading and House Exchanges

Voting and Matching