working paper
department of economics

NASH AND PERFECT EQUILIBRIA
OF DISCOUNTED REPEATED GAMES

By Drew Fudenberg and Eric Maskin

No. 499    July 1988

massachusetts institute of technology
50 memorial drive
Cambridge, Mass. 02139
NASH AND PERFECT EQUILIBRIA OF DISCOUNTED REPEATED GAMES

By D. Fudenberg* and E. Maskin**

February 1987
Revised July 1988

* Massachusetts Institute of Technology
** Harvard University and St. John's College, Cambridge

We thank two referees for helpful comments. Research support from the U.K. Social Science Research Council, National Science Foundation Grants SES-8607229 and SES-8520952, and the Sloan Foundation is gratefully acknowledged.
ABSTRACT

The "perfect Folk Theorem" for discounted repeated games establishes that the sets of Nash and subgame-perfect equilibrium payoffs are equal in the limit as the discount factor δ tends to one. We provide conditions under which the two sets coincide before the limit is reached. That is, we show how to compute δ̲ such that the Nash and perfect equilibrium payoffs of the δ-discounted game are identical for all δ > δ̲.
1. Introduction

The "Folk Theorem" for infinitely repeated games with discounting asserts that any feasible, individually rational payoffs (payoffs that strictly Pareto dominate the minmax point) can arise as Nash equilibria if the discount factor δ is sufficiently near one. Our [1986] paper established that under a "full-dimensionality" condition (requiring the interior of the feasible set to be nonempty) the same is true for perfect equilibria, so that in the limit as δ tends to 1 the requirement of subgame-perfection does not restrict the set of equilibrium payoffs. (Even in the limit, perfection does, of course, rule out some Nash equilibrium strategies.)
This paper shows that when the minmax point is in the interior of the feasible set and a second mild condition holds, the Nash and perfect equilibrium payoffs coincide before the limit. That is, for any repeated game, there is a δ̲ < 1 such that for all δ ∈ (δ̲, 1), the Nash and perfect equilibrium payoffs of the δ-discounted game are identical. Our proof is constructive, and gives an easily computed expression for the value of δ̲. The payoff-equivalence result holds even though for any fixed δ < 1 there will typically be individually rational payoffs that cannot arise as equilibria. In other words, the payoff sets coincide before attaining their limiting values. Payoff-equivalence is not a consequence of the Folk Theorem; indeed, it requires the additional conditions that we impose.
The key to our argument is the construction of "punishment equilibria," one for each player, that hold a player to exactly his reservation value. As in our other work on discounted repeated games ([1986], [1987a]), we interweave dynamic programming arguments and "geometrical" heuristics.
Section 2 introduces our notation for the repeated game model. Section 3 presents the main results. Section 4 provides counterexamples to show that the additional restrictions we impose are necessary, and that these restrictions are not so strong as to imply that the Folk Theorem itself obtains for a fixed discount factor less than one. Through Section 4 we make free use of the possibility of public randomization. That is, we suppose that there exists some random variable (possibly devised by the players themselves) whose realizations are publicly observable. Players can thus use the random variable to (perfectly) coordinate their actions. In Section 5, however, we show that our results do not require public randomization.
2. Notation

We consider a finite n-player game in normal form g: A_1 × ... × A_n → R^n, g = (g_1,...,g_n), where g_i(a_1,...,a_n) is player i's payoff from the vector of actions (a_1,...,a_n). Player i's mixed strategies, i.e., the probability distributions over A_i, are denoted Σ_i. For notational convenience, we will write "g(σ)" for the expected payoffs corresponding to σ.
In the repeated version of g, each player i maximizes the normalized discounted sum of his per-period payoffs, with common discount factor δ:

    π_i = (1-δ) Σ_{t=1}^∞ δ^{t-1} g_i(σ(t)),

where σ(t) is the vector of mixed strategies chosen in period t. Player i's strategy in period t can depend on the past actions of all players, that is, on the sequence (a(τ))_{τ<t}, but not on the past choices of randomizing probabilities σ(τ). Each period's play can also depend on the realization of a publicly observed random variable such as sunspots. Although we feel that allowing public randomization is as reasonable as prohibiting it, our results do not require it. Section 5 explains how, for δ close to 1, the effect of sunspots can be duplicated by deterministic cycles of play.
To give a formal definition of the strategy spaces, let w(t) be a sequence of independent random variables with the uniform distribution on [0,1]. The history at time t is h_t = (a(1),...,a(t-1), w(1),...,w(t)), and a strategy for player i is a sequence s_i, where s_{it}: H_t → Σ_i and H_t is the space of time-t histories.
For each player j, choose minmax strategies m^j = (m^j_1,...,m^j_n) such that

    m^j_{-j} ∈ arg min_{m_{-j}} [max_{a_j} g_j(a_j, m_{-j})]

and

    g_j(m^j_j, m^j_{-j}) = max_{a_j} g_j(a_j, m^j_{-j}),

where m_{-j} is a mixed-strategy selection for the players other than j. Let v_j = g_j(m^j_1,...,m^j_n). We call v_j player j's reservation value. Since one feasible strategy for player j is to play in each period a static best response to that period's play of his opponents, player j's average payoff must be at least v_j in any equilibrium of g, whether or not g is repeated. Note that any Nash equilibrium path of the repeated game can be enforced by the threat that any deviation by j will be punished by the other players' minmaxing j (i.e., playing m^j_{-j}) for the remainder of the game.
Henceforth we shall normalize the payoffs of the game g so that (v_1,...,v_n) = (0,...,0). Call (0,...,0) the minmax point, and take v̄_i = max_a g_i(a). Let

    U = {(v_1,...,v_n) | there exists (a_1,...,a_n) ∈ A_1×...×A_n with g(a_1,...,a_n) = (v_1,...,v_n)},

    V = Convex Hull of U, and

    V* = {(v_1,...,v_n) ∈ V | v_i > 0 for all i}.

The set V consists of feasible payoffs, and V* consists of feasible payoffs that strictly Pareto dominate the minmax point. That is, V* is the set of feasible, strictly individually rational payoffs.
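The reservation values just defined are straightforward to compute in small games. The sketch below is our own illustration, not the paper's: it restricts attention to pure-strategy minmaxing (the m^j above may be mixed, which requires a linear program instead) and uses a made-up two-player payoff table.

```python
from itertools import product

def pure_minmax(payoffs, j, n_actions):
    """Player j's pure-strategy minmax value: opponents jointly pick pure
    actions to minimize the best payoff j can secure in reply."""
    n = len(n_actions)
    others = [i for i in range(n) if i != j]
    best = None
    for a_others in product(*[range(n_actions[i]) for i in others]):
        a = [None] * n
        for i, ai in zip(others, a_others):
            a[i] = ai
        # j best-responds to this fixed profile of the others
        reply = max(payoffs[tuple(a[:j] + [aj] + a[j + 1:])][j]
                    for aj in range(n_actions[j]))
        best = reply if best is None else min(best, reply)
    return best

# A made-up 2x2 game: payoffs[(a1, a2)] = (player 0's payoff, player 1's).
g = {(0, 0): (2, 1), (0, 1): (-1, 0),
     (1, 0): (0, -1), (1, 1): (1, 2)}
v = [pure_minmax(g, j, [2, 2]) for j in range(2)]   # reservation values
```

Here each player can be held to a payoff of 1 but no lower; normalizing would subtract these values from every entry.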
3. Nash and Perfect Equilibrium

Any feasible vector of payoffs (v_1,...,v_n) that gives each player i at least (1-δ)v̄_i is attainable in a Nash equilibrium, since Nash strategies can specify that any deviator from the actions sustaining (v_1,...,v_n) will be minmaxed forever. In a subgame-perfect equilibrium, however, the punishments must themselves be consistent with equilibrium play, so that the punishers must be given an incentive to carry out the prescribed punishments. One way to try to arrange this is to specify that players who fail to minmax an opponent will be minmaxed in turn. However, such strategies may fail to be perfect, because minmaxing an opponent may be more costly than being minmaxed oneself. Still, even in this case, one may be able, as in our [1986] paper, to induce players to minmax by providing "rewards" for doing so. In fact, the present paper demonstrates that under certain conditions these rewards can be devised in such a way that the punished player is held to exactly her minmax level. When this is possible, the sets of Nash and perfect equilibrium payoffs coincide, as the following lemma asserts.
Lemma 1: For discount factor δ, suppose that, for each player i, there is a perfect equilibrium of the discounted repeated game in which player i's payoff is exactly zero. Then the sets of Nash and perfect equilibrium payoffs (for δ) coincide.

Proof: Fix a Nash equilibrium s, and construct a new strategy ŝ that agrees with s along the equilibrium path, but specifies that, if player i is the first to deviate from s, play switches to the perfect equilibrium that holds player i's payoff to zero. (If several players deviate simultaneously, the deviations are ignored.) Since zero is the worst punishment that player i could have faced in s, he will not choose to deviate from the new strategy ŝ. By construction, ŝ is a perfect equilibrium with the same payoffs as s. Q.E.D.
Remark: Note that the lemma does not conclude that all Nash equilibrium strategies are perfect.

A trivial application of Lemma 1 is to a game, like the prisoners' dilemma, in which there is a one-shot equilibrium that gives all players their minmax values. An only slightly more complex case is a game in which each player prefers to minmax than to be minmaxed, i.e., a game in which g_i(m^j) > 0 for all i ≠ j. In such a game we need not reward punishers to ensure their compliance but can simply threaten them with future punishment if they fail to punish an opponent.
Theorem 1: Suppose that for all i and j, i ≠ j, m^j_i is a pure strategy, and that g_i(m^j) > 0. Let δ̲ satisfy v̄_j(1-δ̲) < min_i g_j(m^i) for all j. Then for all δ ∈ (δ̲, 1), the sets of Nash and perfect equilibrium payoffs of the repeated game exactly coincide.

Proof: For each player i, define the i-th "punishment equilibrium" as follows. Players play according to m^i until some player j deviates. If this occurs, they switch to the punishment equilibrium for j. Player i has no incentive to deviate from the i-th punishment equilibrium because in every period he is playing his one-shot best response. Player j ≠ i may have a short-run gain to deviating, but doing so results in his being punished, so that the maximum payoff to deviation is v̄_j(1-δ), which is less than min_i g_j(m^i) by assumption. So the hypotheses of Lemma 1 are satisfied. Q.E.D.

Remark 1: If the minmax strategies are mixed instead of pure, the construction above is inadequate because player j may not be indifferent among all actions in the support of m^i_j. Example 1 of Section 4 shows that in this case Theorem 1 need not hold.

Remark 2: The proof of Theorem 1 actually shows that all feasible payoff vectors that give each player j at least min_i g_j(m^i) can be attained in equilibrium if δ exceeds the δ̲ defined above. (Recall that we are expressing players' payoffs in the repeated game as normalized discounted payoffs and not as present values.) Moreover, in two-player games we can sharpen Theorem 1 by replacing its hypotheses with the condition that for all i and j, i ≠ j, and all a_i in the support of m^j_i, g_i(a_i, m^j_{-i}) is positive. Note that this condition reduces to that of Theorem 1 if all the m^j_i are pure strategies.
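The paper emphasizes that δ̲ is easily computed; for Theorem 1 it is just the largest of the values 1 - min_i g_j(m^i)/v̄_j. A numeric sketch with hypothetical payoff bounds (not taken from any game in the text):

```python
# Hypothetical numbers (not from any game in the text): vbar[j] is player
# j's maximum feasible payoff, min_g[j] = min_i g_j(m^i) is the smallest
# flow payoff j earns while minmaxing an opponent (positive, as Theorem 1
# requires).
vbar = {1: 5.0, 2: 4.0}
min_g = {1: 1.0, 2: 2.0}

# vbar_j * (1 - delta) < min_g[j] for all j  <=>  delta > delta_bar:
delta_bar = max(1 - min_g[j] / vbar[j] for j in vbar)

delta = 0.9
assert delta > delta_bar
# The one-shot deviation gain is then outweighed by the punishment flow.
assert all(vbar[j] * (1 - delta) < min_g[j] for j in vbar)
```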
Although the hypotheses of Theorem 1 are not pathological (i.e., they are satisfied by an open set of payoffs in nontrivial normal forms), they do not apply to many games of interest. We now look for conditions that apply even when minmaxing an opponent gives a player less than her reservation utility. In this case, to induce a player to punish an opponent we must give him a "reward" afterwards, as we explained earlier. To construct equilibria of this sort, it must be possible to reward one player without also rewarding the player he punishes. This requirement leads to the "full-dimensionality" requirement we introduced in our earlier paper: the dimensionality of V should equal the number of players. However, full dimensionality is not sufficient for the stronger results of this paper, as we show in Section 4. We must strengthen it to require that the minmax point (0,...,0) is itself in the interior of V. Moreover, we need assume that each player i has an action â_i such that g_i(â_i, m^i_{-i}) < 0, so that when minmaxed, a player has an action for which he gets a strictly negative payoff. (From our normalization, his maximum payoff when minmaxed is zero.)
Theorem 2: Assume that (i) the minmax point is in the interior of V, and (ii) for each player i there exists â_i such that g_i(â_i, m^i_{-i}) < 0. Then there exists δ̲ < 1 such that for all δ ∈ (δ̲, 1), the sets of Nash and perfect equilibrium payoffs of the repeated game exactly coincide.

Corollary: Under the conditions of Theorem 2, for δ > δ̲, any feasible payoff vector v with v_i > v̄_i(1-δ) for all i can be attained by a perfect equilibrium.

Remark: Hypothesis (ii) of the theorem is satisfied by generic payoffs in normal forms with three or more actions per player. The interiority condition (i) is, however, not generic: the minmax point can be outside V for an open set of payoffs. The condition is important because it ensures that, in constructing equilibria to reward a punisher, the ratio of his payoff to that of the deviator can be made as large as we like.
Proof of Theorem 2: We begin in part (A) with the case in which all the minmax strategies m^i_j are pure or, equivalently, each player's choice of a mixed strategy is observable. Part (B) explains how to use the methods of our [1986] paper to extend the construction to the case where the minmax strategies are mixed.

(A) Assume that each m^i_j is a pure strategy. For each player i, choose an action â_i such that g_i(â_i, m^i_{-i}) = -x_i < 0, and for j ≠ i let y^i_j = -g_j(â_i, m^i_{-i}).
The equilibrium strategies will have 2n "states," where n is the number of players. States 1 through n are the "punishment states," one for each player; states n+1 to 2n are "reward states." In punishment state i, the strategies are: Play (â_i, m^i_{-i}) today. If there are no deviations, switch to reward state n+i tomorrow with probability p_i(δ) (to be determined), and remain in state i tomorrow with complementary probability 1 - p_i(δ). If player j deviates, then switch to state j. In reward state n+i, play actions to yield payoffs v^i = (v^i_1,...,v^i_n), which are to be determined. If player j deviates in a reward state, switch to punishment state j.

Choose v^i in V so that, for j ≠ i,

    x_i v^i_j - v^i_i y^i_j > 0

(this is possible because (0,...,0) ∈ int V). Now set p_i(δ) = (1-δ)x_i/δv^i_i, and note that p_i(δ) < 1 whenever δ > x_i/(v^i_i + x_i). This choice of p_i(δ) sets player i's payoff starting in state i equal to zero if she plays as specified. If player j ≠ i does not deviate, his payoff starting in state i, which we denote w^i_j, solves the functional equation

(1)    w^i_j = (1-δ)(-y^i_j) + δp_i(δ)v^i_j + δ(1-p_i(δ))w^i_j,

so that

(2)    w^i_j = (x_i v^i_j - y^i_j v^i_i)/(v^i_i + x_i).
By construction, the numerator in (2) is positive. The interiority condition has allowed us to choose the payoffs in the reward states so as to compensate the punishing players for punishing player i without raising i's own payoff above zero.

Choose δ̂ < 1 large enough that, for all i and j, v^i_j > v̄_j(1-δ̂) and w^i_j > v̄_j(1-δ̂), and set δ̲ = max(max_i x_i/(v^i_i + x_i), δ̂). We claim that for all δ ∈ (δ̲, 1), the specified strategies are a perfect equilibrium.

First consider punishment state i. In this state, player i receives payoff zero by not deviating. If player i deviates once and then conforms, she receives at most zero today (since she is being minmaxed) and has a normalized payoff of zero from tomorrow on. Thus player i cannot gain by a single deviation, and the usual dynamic programming argument implies that she cannot gain by multiple deviations. Player j's payoff in state i is w^i_j (which exceeds v̄_j(1-δ)). A deviation could yield as much as v̄_j today, but will shift the state to state j, where j's payoff is zero, so player j cannot profit from deviating in state i. Finally, in reward state n+i, each player k obtains payoff v^i_k exceeding v̄_k(1-δ), and so the threat of switching to punishment state k prevents deviations. The theorem now follows from Lemma 1.
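Equations (1)-(2) are easy to check numerically. The sketch below uses hypothetical values of x_i, y^i_j, and the reward payoffs (none of them from the text) and verifies that the switching probability p_i(δ) = (1-δ)x_i/δv^i_i holds the punished player to exactly zero while the punisher's value matches the closed form (2).

```python
# Hypothetical numbers (not from the text): x = player i's per-period loss
# in his own punishment state, y = punisher j's per-period loss there,
# (v_ii, v_ij) = reward-state payoffs for i and j.
delta, x, y, v_ii, v_ij = 0.95, 1.0, 0.5, 2.0, 3.0

p = (1 - delta) * x / (delta * v_ii)       # switching probability p_i(delta)
assert 0 < p < 1                           # needs delta > x / (x + v_ii)

# Player i's value in state i: w = (1-d)(-x) + d*(p*v_ii + (1-p)*w).
w_ii = ((1 - delta) * (-x) + delta * p * v_ii) / (1 - delta * (1 - p))
assert abs(w_ii) < 1e-12                   # punished player held to zero

# Player j's value from the functional equation (1) ...
w_ij = ((1 - delta) * (-y) + delta * p * v_ij) / (1 - delta * (1 - p))
# ... equals the closed form (2).
assert abs(w_ij - (x * v_ij - y * v_ii) / (v_ii + x)) < 1e-12
```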
(B) The strategies in part (A) punish player j if, in state i, he fails to use his minmax strategy m^i_j. The players can detect all deviations from m^i_j only if it is a pure strategy, or if the players' choices of mixed strategies are observable. (Otherwise, player j would not be detected if he deviated to a different mixed strategy with the same support.) However, following our [1986] paper, we can readily modify the construction of (A) to allow for mixed minmax strategies. The idea is to specify that player j's payoff in reward state n+i depend on the last action he took in punishment state i in such a way that player j is exactly indifferent among all the actions in the support of m^i_j.

To begin, let {a^i_j(k)} be the pure strategies in the support of m^i_j, where the indexation is chosen so that

    y^i_j(k) ≡ -g_j(â_i, a^i_j(k), m^i_{-i-j}) ≤ -g_j(â_i, a^i_j(k+1), m^i_{-i-j}),

where m^i_{-i-j} is the vector of minmax strategies against i by the players other than i and j. Thus -y^i_j(k) is player j's expected payoff in punishment state i if she plays her k-th-best strategy in the support of m^i_j. Next define

    x̄ = max_{i, a_i, a'_i, a_{-i}} [g_i(a_i, a_{-i}) - g_i(a'_i, a_{-i})].

This is the maximum variation a player's choice of action can induce in his own payoff. Also, let ε > 0 be such that all payoff vectors v with 0 < v_i ≤ 3ε for all i are in the interior of V. (This is possible because (0,...,0) ∈ int V.)
As in part (A), our strategies will have n punishment states and n reward states, with a probability p_i(δ) of switching from state i to state n+i if each player j played an action in the support of m^i_j. However, when play switches to state n+i, player j's payoff depends on the action that she took in the preceding period. Denote these payoffs by v^i_j(k_j), where k_j is the index (as defined in the preceding paragraph) of the action last played by player j. Thus the vector of payoffs in state n+i is

    (v^i_1(k_1),...,v^i_{i-1}(k_{i-1}), v^i_i, v^i_{i+1}(k_{i+1}),...,v^i_n(k_n)).

This vector is defined as follows.
First choose v^i_i and v^i_j(1) for each j ≠ i to satisfy:

(3)    x_i v^i_j(1) - v^i_i y^i_j(1) > 0,

(4)    v^i_i < εx_i/x̄, and

(5)    0 < v^i_j(1) < 2ε.

These conditions can be satisfied because (0,...,0) ∈ int V. As in part (A), set p_i(δ) = (1-δ)x_i/δv^i_i. Now for each j and k_j, let

(6)    v^i_j(k_j) = v^i_j(1) + (1-δ)[y^i_j(k_j) - y^i_j(1)]/δp_i(δ).

With this specification of the reward payoffs, player j's payoff in state i is the same for each strategy in the support of m^i_j, and equals [x_i v^i_j(1) - y^i_j(1)v^i_i]/(v^i_i + x_i), which is positive from inequality (3).

Now we must check that the specified payoffs for state n+i are all feasible, and that, for δ near one, no player wishes to deviate. Substituting the definition of p_i(δ) into (6), we have

(7)    v^i_j(k_j) = v^i_j(1) + [y^i_j(k_j) - y^i_j(1)](v^i_i/x_i).

Referring to the bounds (4) and (5), we see that 0 < v^i_j(k_j) < 3ε for all j, and thus the vector of payoffs in state n+i is feasible for all values of the k_j's. In state i, player j ≠ i obtains the same positive payoff from any strategy in the support of m^i_j, and she will be punished with zero for deviating from the support; thus, for δ close to 1, player j will be willing to play m^i_j and will not deviate in state i. The argument that player i will not deviate in state i is exactly as in part (A). Finally, no player will wish to deviate in the reward states n+j, since each player's payoff in state n+j is bounded away from zero for all δ, while a deviation results in a payoff of zero. Q.E.D.
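The indifference built into equation (6) can likewise be checked numerically; the values of δ, x_i, v^i_i, v^i_j(1), and the y^i_j(k) below are hypothetical.

```python
# Hypothetical numbers: in punishment state i, player j's per-period payoff
# from her k-th-best support action is -y[k]; equation (6) adjusts her
# reward-state payoff v_j(k) so that her total value is independent of k.
delta, x_i, v_ii = 0.9, 1.0, 0.5
y = [0.2, 0.4, 0.7]                       # y(k), nondecreasing in k
v1 = 1.0                                  # v_j(1), the base reward payoff

p = (1 - delta) * x_i / (delta * v_ii)    # p_i(delta), as in part (A)

# Equation (6), and its delta-free form (7): (1-d)/(d*p) = v_ii/x_i.
v = [v1 + (1 - delta) * (yk - y[0]) / (delta * p) for yk in y]
v_alt = [v1 + (yk - y[0]) * (v_ii / x_i) for yk in y]
assert all(abs(a - b) < 1e-12 for a, b in zip(v, v_alt))

def value(k):
    # w = (1-d)(-y[k]) + d*(p*v[k] + (1-p)*w), solved for w
    return ((1 - delta) * (-y[k]) + delta * p * v[k]) / (1 - delta * (1 - p))

vals = [value(k) for k in range(len(y))]
assert max(vals) - min(vals) < 1e-12      # indifferent across the support
```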
The interiority hypothesis of Theorem 2 implies that the set V has full dimension, i.e., that dim V = n. Let us briefly consider the connection between Nash and perfect equilibrium when V has lower dimension. When the number of players n exceeds two, our [1986] article shows by example that the Nash and perfect equilibrium payoff sets need not coincide even in the limit as δ tends to 1. Thus, for such examples, these sets do not coincide for any δ < 1. The story is quite different, however, for two-player games.

Theorem 3: In a two-player game where dim V < 2, there exists δ̲ < 1 such that, for all δ ∈ (δ̲, 1), the sets of Nash and perfect equilibrium payoffs coincide.
Proof: Let (m^2_1, m^1_2) be a pair of minmax strategies. (If there are multiple such pairs, choose one that maximizes player 1's payoff.) If g(m^2_1, m^1_2) = (0,0), then (m^2_1, m^1_2) forms a Nash equilibrium of the one-shot game. In this case, as infinite repetition of a one-shot Nash equilibrium constitutes a perfect equilibrium of the repeated game, application of Lemma 1 establishes the theorem.

Suppose, therefore, that g_i(m^2_1, m^1_2) < 0 for some i. Then g cannot be a constant-sum game, and so, since dim V < 2, we can normalize the players' payoffs so that, for all (a_1, a_2), g_1(a_1, a_2) = g_2(a_1, a_2). Take v = g_1(m^2_1, m^1_2) (= g_2(m^2_1, m^1_2)). Because v < 0, there must exist (a*_1, a*_2) such that g(a*_1, a*_2) = (v*, v*) with v* > 0 (otherwise there would be a minmax pair for which the payoffs are (0,0), contradicting the choice of (m^2_1, m^1_2)).

We will show that, for δ near 1, there exists a perfect equilibrium of the repeated game in which both players' payoffs are zero. When m^2_1 and m^1_2 are pure strategies this is easily done by choosing p(δ) such that

(8)    (1-δ)v + δp(δ)v* = 0.

The equilibrium strategies consist of playing (m^2_1, m^1_2) in the first period and then either switching (permanently) to (a*_1, a*_2) with probability p(δ) or else starting again with probability 1-p(δ). Deviations from this path are punished by restarting the equilibrium.

Suppose now that the minmax strategies are mixed, that the support of m^2_1 is {a_1(1),...,a_1(R)} and that of m^1_2 is {a_2(1),...,a_2(S)}, and that the probability of a_1(i) is q_1(i) and that of a_2(j) is q_2(j). By definition,

(9)    Σ_i Σ_j q_1(i)q_2(j)g_1(a_1(i), a_2(j)) = v,

and since the m's are minmax strategies,

(10)    Σ_i q_1(i)g_2(a_1(i), a_2(j)) ≤ 0 for all j, and

(11)    Σ_j q_2(j)g_1(a_1(i), a_2(j)) ≤ 0 for all i.

Now, in Lemma 2 below, we show that, for all i and j, there exists c_{ij} ≥ 0 such that

(12)    Σ_i q_1(i)[g_2(a_1(i), a_2(j)) + c_{ij}] = 0 for all j, and

(13)    Σ_j q_2(j)[g_1(a_1(i), a_2(j)) + c_{ij}] = 0 for all i.

Take

    p_{ij}(δ) = (1-δ)c_{ij}/δv*.

Then, if players play (m^2_1, m^1_2) in the first period and switch (forever) to (a*_1, a*_2) with probability p_{ij}(δ) if the outcome is (a_1(i), a_2(j)), their expected payoffs are (0,0) (from (12) and (13)). Furthermore, (12) and (13) imply that the players are indifferent among actions in the supports of their minmax strategies. Q.E.D.
Remark: An immediate corollary of Theorem 3 is that the Folk Theorem holds for two-player games of less than full dimension even when mixed strategies are not observable. (This case was not treated in our [1986] paper.)

Lemma 2: Suppose that B = (b_{ij}) is an R×S matrix and that p = (p(1),...,p(R)) and q = (q(1),...,q(S)) are probability vectors such that

(14)    p∘B ≤ 0 and B∘q ≤ 0.

Then there exists an R×S matrix C = (c_{ij}) with c_{ij} ≥ 0 for all i, j, such that p∘(B+C) = 0 and (B+C)∘q = 0.

Proof: Consider a column j such that p∘b_{·j} < 0. If no element b_{ij} of this column could be increased without violating (14), then b_{i·}∘q = 0 for every row i with p(i) > 0; but then (14) implies p∘B∘q = 0, whereas p∘b_{·j} < 0 together with q(j) > 0 implies p∘B∘q < 0, a contradiction. (If q(j) = 0, any element of the column can be increased freely.) Hence there exists an element b_{ij} that we can increase while (14) continues to hold, and indeed we can increase it until either p∘b_{·j} = 0 or b_{i·}∘q = 0. Continuing by increasing other elements of B in the same way, we eventually obtain p∘B = 0 and B∘q = 0. Let C be the matrix of the increases that we make to B. Q.E.D.
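The proof of Lemma 2 is constructive and translates directly into an algorithm: repeatedly raise an entry of B until the corresponding row or column constraint in (14) binds. The sketch below is one way to implement it (our own implementation, with a tolerance parameter added for floating-point comparisons); each pass zeroes at least one slack row or column, so it terminates in at most R+S effective steps.

```python
def lemma2_C(B, p, q, tol=1e-12):
    """Return a nonnegative matrix C with p.(B+C) = 0 columnwise and
    (B+C).q = 0 rowwise, given probability vectors p, q and an R x S
    matrix B satisfying p.B <= 0 and B.q <= 0, by raising entries of B
    one at a time as in the proof of Lemma 2."""
    R, S = len(B), len(B[0])
    A = [row[:] for row in B]                      # working copy of B
    col = lambda j: -sum(p[i] * A[i][j] for i in range(R))   # column slack
    row = lambda i: -sum(A[i][j] * q[j] for j in range(S))   # row slack
    for _ in range(2 * (R + S)):                   # each pass zeroes a slack
        cols = [j for j in range(S) if col(j) > tol]
        rows = [i for i in range(R) if row(i) > tol]
        if not cols and not rows:
            break
        if cols:
            j = cols[0]
            if q[j] <= tol:                        # column unused by q:
                i = next(i for i in range(R) if p[i] > tol)
                A[i][j] += col(j) / p[i]           # fix it with no side effects
            else:                                  # a slack row with p[i] > 0
                i = next(i for i in range(R) if p[i] > tol and row(i) > tol)
                A[i][j] += min(col(j) / p[i], row(i) / q[j])
        else:                                      # only row slack: p[i] ~ 0
            i = rows[0]
            j = next(j for j in range(S) if q[j] > tol)
            A[i][j] += row(i) / q[j]
    return [[A[i][j] - B[i][j] for j in range(S)] for i in range(R)]

# A small instance: p.B = (-1, -1) <= 0 and B.q = (-1, -1) <= 0.
B = [[0.0, -2.0], [-2.0, 0.0]]
p, q = [0.5, 0.5], [0.5, 0.5]
C = lemma2_C(B, p, q)
```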
4. Counterexamples

We turn next to a series of examples designed to explore the roles of the hypotheses of Theorems 1 and 2. Example 1 shows that Theorem 1 need not hold when the minmax strategies are mixed. Example 2 establishes that the hypotheses of Theorems 1 and 2 do not imply that all individually rational payoffs can be attained for some δ strictly less than one. Examples 3 and 4 show that both hypotheses (i) and (ii) of Theorem 2 are necessary. The detailed arguments can be found in our [1987b] working paper.

Example 1: Consider the game in Table 1.
[Table 1: a 2×2 game in which player 1 chooses between U and D and player 2 between L and R; the payoff entries are illegible in this copy.]
Note that player 1 is minmaxed if and only if player 2 mixes with equal probabilities between L and R. Thus, it is easy to see that the hypotheses of Theorem 1 are satisfied except for the assumption that 2's minmax strategy be pure. Notice that, for any δ, player 1's payoff must be positive in any perfect (indeed, in any Nash) equilibrium, because 1's payoff could be zero only were he minmaxed every period, in which case player 2's payoff would be negative. This already suggests that the set of perfect equilibrium payoffs will be a proper subset of the Nash payoffs, since a zero payoff can be used as a punishment in a Nash equilibrium, but the most severe punishment in a perfect equilibrium is positive.

To see that this suggestion is correct, let w̲ be the infimum of player one's payoff over all perfect equilibria where player two's payoff is zero, and let e be a perfect equilibrium where player two's payoff is zero and player one's payoff w is near w̲. In the first period of e, player 1 must play D with probability one, or player 2 could obtain a positive payoff. Moreover, it can be shown (see our working paper) that player 2 randomizes between L and R in the first period if δ is near enough to 1. Let us modify e by slightly raising the probability that player 2 places on L. Player 2's payoff is unchanged by this alteration, since he is indifferent between L and R. Player 1's payoff, however, declines, and for w near w̲, player one's payoff w* will now be less than w̲, so that the payoff (w*, 0) is not supportable in a perfect equilibrium. Now further modify the strategies so that if player 1 deviates to U in the first period he is minmaxed forever. Since, in e, player 1 was deterred from deviating to U with a positive punishment, the threat of a punishment payoff of zero will deter player 1 from deviating to U when the probability that player 2 plays L in the first period is slightly greater than in e. The modified e is, therefore, a Nash equilibrium whose payoffs cannot arise in a perfect equilibrium.
Example 2: Consider the game in Table 2. It satisfies the assumptions of both Theorems 1 and 2, so that for sufficiently large discount factors the Nash and perfect equilibrium payoffs coincide. However, as we shall see, this is not a consequence of the Folk Theorem, since, for such discount factors, not all individually rational payoffs can arise in equilibrium.

[Table 2: player 1 chooses between U and D and player 2 among columns that include L and M; in particular g(D, L) = (3, 0). The remaining payoff entries are illegible in this copy.]

Now, the feasible point (3,0) is contained in the limit of the Nash equilibrium payoff sets as δ tends to 1. However, for any fixed discount factor δ < 1, the Nash equilibrium payoffs are bounded away from (3,0). To see this, observe that if, for δ < 1, there existed a Nash equilibrium with average payoffs (3,0), players would necessarily play (D,L) every period. But in any given period, player two could deviate to M and thereby obtain a positive payoff, a contradiction.
Example 3: Consider the two-player game in Table 3a, in which (0,0) lies on the boundary of V instead of the interior. This game satisfies hypothesis (ii) but not hypothesis (i) of Theorem 2, and we shall see that the theorem fails.

Table 3a (rows U, D for player 1; columns L, R for player 2; entries reconstructed from the payoff points shown in Figure 3b):

         L        R
    U   0, 0    -1, -4
    D   2, 1     0, -5

[Figure 3b: the feasible set V, the convex hull of (2,1), (-1,-4), (0,-5), and the minmax point (0,0), which lies on the boundary of V.]
As in Example 1, player 1's payoff cannot be exactly zero in a Nash equilibrium. (The only way that player 1's payoff could be zero along a path in which player 2's payoff is nonnegative would be for the players to choose (U,L) with probability one in every period. But then player 1 could deviate to D in the first period and guarantee himself 2(1-δ).) Indeed, as we show in our [1987b] paper, player one's Nash equilibrium payoff must be at least 14(1-δ)/9.

Consider the perfect equilibrium where player 1's payoff is lowest. If q is the probability that player 2 plays L in the first period, then player 1 can obtain at least q(1-δ) + δ·14(1-δ)/9 by playing D in the first period. Playing R is costly for player 2, and so if q is less than 1, player 2 must be rewarded in future periods. But, as Figure 3b shows, assigning player 2 a positive payoff induces an even higher payoff for player 1. (This is because of the failure of the interiority condition.) The need to reward player 2 implies that player 1's equilibrium payoff exceeds 7(1-δ)(1-q). Together the two constraints imply that it exceeds 7(1-δ)/3, but one can readily construct a Nash equilibrium where player 1's payoff is 2(1-δ). Hence the Nash and perfect equilibrium payoffs do not coincide.
Example 4: Consider the game in Table 4.

Table 4 (rows U, D for player 1; columns L, R for player 2; entries as legible in this copy):

         L        R
    U   0, -1   -1, 0
    D   0, -1    1, 1

When player two plays L, player one obtains zero regardless of what he does, violating hypothesis (ii) of Theorem 2. Note, however, that hypothesis (i) (interiority) holds. Once again, player one's payoff exceeds zero in any Nash equilibrium. (Player 1 can be held to zero only if player 2 plays L every period, which is not individually rational.) From arguments similar to those for Example 1, we can show that, for δ near 1, the payoffs in the Nash equilibrium that minimizes player 1's payoff among those where 2's is zero cannot be attained in any perfect equilibrium.
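Since the payoff entries of Table 4 are only partly legible in this copy, the following check is conditional on the reconstruction above; it confirms the two claims just made: player 1's payoff against L is identically zero, and the minmax point (0,0) (after normalization) lies in the interior of V.

```python
# Payoff entries as reconstructed in Table 4 (rows U, D; columns L, R);
# each entry is (player 1's payoff, player 2's payoff).
g = {('U', 'L'): (0, -1), ('U', 'R'): (-1, 0),
     ('D', 'L'): (0, -1), ('D', 'R'): (1, 1)}

# Hypothesis (ii) fails for player 1: against L he gets zero regardless.
assert all(g[(r, 'L')][0] == 0 for r in ('U', 'D'))

# Hypothesis (i) holds: (0,0) is interior to the convex hull of the
# distinct payoff vectors, checked here via barycentric coordinates of
# the triangle (-1,0), (1,1), (0,-1).
(x1, y1), (x2, y2), (x3, y3) = (-1, 0), (1, 1), (0, -1)
det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
l1 = ((y2 - y3) * (0 - x3) + (x3 - x2) * (0 - y3)) / det
l2 = ((y3 - y1) * (0 - x3) + (x1 - x3) * (0 - y3)) / det
l3 = 1 - l1 - l2
assert min(l1, l2, l3) > 0    # strictly inside, so (0,0) is interior to V
```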
5. No Public Randomization

In the proof of Theorem 2, we constructed strategies in which play switches probabilistically from a "punishment" phase to a "reward" phase, with the switching probability chosen to make the punished player's payoff equal to zero. This switch is coordinated by a public randomizing device. The reward phase also relies on public randomization when the vector v^i = (v^i_1,...,v^i_n) lies outside the set U of payoffs attainable with pure strategies. Although public randomizing devices help simplify our arguments by convexifying the set of payoffs, they are not essential to Theorem 2. Convexification can alternatively be achieved with deterministic cycles over pure-strategy payoffs when the discount factor is close to one.
Our [1988] paper showed that public randomization is not needed for the proof of the perfect equilibrium Folk Theorem, even if players' mixed strategies are not observable. Lemma 2 of that paper established that, for all ε > 0, there exists δ̲ < 1 such that for all δ > δ̲ and all vectors v ∈ V with v_i > ε for all i, there is a sequence (a(t)), where a(t) is a vector of actions in period t, whose corresponding normalized payoffs are v (i.e., (1-δ)Σ_{t=1}^∞ δ^{t-1}g(a(t)) = v) and for which the continuation payoffs at any time τ are within ε of v (i.e., ||(1-δ)Σ_{t=τ}^∞ δ^{t-τ}g(a(t)) - v|| < ε for all τ). This result implies that, for δ large enough so that v̄_i(1-δ) < ε, we can sustain the vector v as the payoffs of a perfect equilibrium where the equilibrium path is a deterministic sequence of action vectors whose continuation payoffs are always within ε of v, and where deviators are punished by assigning them a subsequent payoff of zero. Hence, attaining v, even when it does not belong to U, does not require public randomization.

Nor do we need public randomization to devise punishments whose payoffs are exactly zero: we can replace the punishment phase of Theorem 2 with one of deterministic length.
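The deterministic-cycle idea can be seen in a deliberately stripped-down, one-player version (our toy example, not the [1988] construction): with per-period payoffs 0 and 1 and a target v, a greedy choice of today's payoff keeps every continuation value within roughly (1-δ) of v.

```python
# Toy illustration: per-period payoffs are 0 or 1, target normalized
# payoff v = 1/3.  Greedily pick today's payoff so that the value still
# "owed" tomorrow, w_{t+1} = (w_t - (1-delta)*g_t) / delta, stays near v.
delta, v, T = 0.99, 1.0 / 3.0, 3000

w, seq, w_max_dev = v, [], 0.0
for _ in range(T):
    g = min((0, 1), key=lambda g: abs((w - (1 - delta) * g) / delta - v))
    seq.append(g)
    w = (w - (1 - delta) * g) / delta
    w_max_dev = max(w_max_dev, abs(w - v))

# Realized normalized discounted payoff; the truncation error is delta^T.
realized = (1 - delta) * sum(delta ** t * g for t, g in enumerate(seq))
assert abs(realized - v) < 1e-6        # the sequence hits the target
assert w_max_dev < 0.05                # continuation values stay near v
```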
Proof of Theorem 2 without Public Randomization: (A) As in the earlier proof, we begin with the case of pure minmax strategies. Let x_i and y^i_j be as before. Because (0,...,0) ∈ int V, we can choose ε > 0 and, for each i, a vector v^i in V with v^i_i > 2ε and v^i_j x_i > y^i_j v^i_i + 2ε for all j ≠ i. For δ near enough 1, we can choose a function v^i_i(δ) so that

(8)    |v^i_i(δ) - v^i_i| < ε,

(9)    v^i_j x_i > y^i_j v^i_i(δ) + ε for all j ≠ i,

and such that there exists an integer t(δ) with

(10)    (1-δ^{t(δ)})(-x_i) + δ^{t(δ)}v^i_i(δ) = 0.

Let

(11)    w^i_j = (1-δ^{t(δ)})(-y^i_j) + δ^{t(δ)}v^i_j.

Substituting using (9) and (10), we have

(12)    w^i_j > ε/x̄.

Take δ̲ close enough to one so that, for all i and j, v̄_j(1-δ) < min(ε, ε/x̄).
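Condition (10) pins down an integer punishment length t(δ): read it as δ^{t(δ)} = x_i/(x_i + v^i_i(δ)), pick the integer t nearest the solution for the target v^i_i, and back out v^i_i(δ) so the equality is exact. A numeric sketch with hypothetical x_i and v^i_i:

```python
import math

# Hypothetical values: x is player i's per-period loss while being
# punished, v_target the intended reward-state payoff v_i^i.
x, v_target = 1.0, 0.5

def punishment_length(delta):
    # From (10), delta^t = x / (x + v_i(delta)).  Take the integer t
    # nearest the solution for v_target, then back out v_i(delta) so
    # that (10) holds with equality.
    t = max(1, round(math.log(x / (x + v_target)) / math.log(delta)))
    v_delta = x * (1 - delta ** t) / delta ** t
    return t, v_delta

for delta in (0.9, 0.99, 0.999):
    t, v_delta = punishment_length(delta)
    # (10) holds exactly for this (t, v_i(delta)) pair; as delta -> 1,
    # v_i(delta) -> v_target, which is condition (8).
    assert abs((1 - delta ** t) * (-x) + delta ** t * v_delta) < 1e-9
```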
Now consider the following strategies. In state i, play (â_i, m^i_{-i}) for t(δ) periods, and then switch to state n+i, where play follows a deterministic sequence whose payoffs are (v^i_i(δ), v^i_{-i}). For the play in state n+i we appeal to Lemma 2 of our [1988] paper, which guarantees that the continuation payoffs in state n+i can be kept within ε of this vector. If player i ever deviates from his strategy, switch to the beginning of state i. By construction, player i's payoff is exactly zero at the beginning of state i and increases thereafter, and since he is minmaxed in state i, he cannot gain by deviating. If player j ≠ i deviates in state i, he can obtain at most v̄_j(1-δ), and from not deviating obtains at least w^i_j, which is larger. Finally, since payoffs at every date of each reward state are bounded below by ε, and deviations result in a future payoff of zero, no player will wish to deviate in the reward states.
(B) To deal with mixed minmax strategies, we must make player j's payoff in state n+i depend on how he played in state i, as in the earlier proof of Theorem 2. We will be a bit sketchier here than before because the argument is essentially the same. As before, let -y^i_j(k) be player j's expected payoff from his k-th-best action in the support of m^i_j, j ≠ i, and let k(r) denote the index of the action he actually played in period r of state i. Let

(13)    R^i_j = Σ_{r=1}^{t(δ)} δ^{r-1}[y^i_j(k(r)) - y^i_j(1)];

(1-δ)R^i_j is the amount that player j sacrifices in state i relative to always playing a^i_j(1). Now take

    v^i_j(δ) = v^i_j + R^i_j(1-δ)/δ^{t(δ)}.

With these payoffs in state n+i, each player j is indifferent among all the actions in the support of m^i_j. If v^i_i and ε are taken small enough, t(δ) (as defined by (10)) will also be small, and so the right-hand side of this equation will be feasible. Thus, the reward payoffs can be generated by deterministic sequences. Q.E.D.
References

Fudenberg, D. and E. Maskin [1986], "The Folk Theorem in Repeated Games with Discounting or with Incomplete Information," Econometrica, 54, 533-554.

Fudenberg, D. and E. Maskin [1987a], "Discounted Repeated Games with Unobservable Actions, I: One-Sided Moral Hazard," mimeo.

Fudenberg, D. and E. Maskin [1987b], "Nash and Perfect Equilibria of Discounted Repeated Games," Harvard University Working Paper #1301.

Fudenberg, D. and E. Maskin [1988], "On the Dispensability of Public Randomization in Discounted Repeated Games," mimeo.