Document 11163011

advertisement
Digitized by the Internet Archive
in
2011 with funding from
Boston Library Consortium Member Libraries
http://www.archive.org/details/ondispensabilityOOfude
|HB31
mi 5
working paper
department
of economics
ON THE DISPENSABILITY OF PUBLIC RANDOMIZATION
IN DISCOUNTED REPEATED GAMES
Drew Fudenberg
Eric Ma skin
Number 467
August 1987
massachusetts
institute of
technology
50 memorial drive
Cambridge, mass. 02139
ON THE DISPENSABILITY OF PUBLIC RANDOMIZATION
IN DISCOUNTED REPEATED GAMES
Drew Fudenberg
Eric Maskin
Number 467
August 1987
Dewoy
On
the Dispensability of Public Randomization
in
Discounted Repeated Games
by
Drew Fudenberg
*
and
Eric Maskin
August 1987
Massachusetts Institute of Technology
**
Harvard University
We would like to thank David Levine for encouraging us to study this
topic.
Research support from National Science Foundation Grants
SES-8607229 and SES-8520952 is gratefully acknowledged.
2G0SS23
1
.
I
n
t
roduct ion
The Folk Theorem for repeated games asserts that any feasible,
individually rational payoffs for
a
one-shot game can arise as Nash
equilibrium average payoffs when the game
In
our
[1986]
is
infinitely repeated.
paper, which extends this result to subgame perfect
equilibrium and discounting, we assumed that the players can
condition their play on the realization of
random variable.
assumption would lead to only
viz.,
any feasible,
approximated by
a
a
publicly observed
that abandoning the
slight weakening of the results;
individually rational payoffs can be
perfect equilibrium where there is sufficiently
little discounting.
This note shows that,
the Folk Theorem holds
randomization:
however,
We asserted,
a
in a
in
fact,
our extension of
strong sense even without public
all feasible individually rational payoffs can be
exactly attained in equilibrium.
Although this stronger result
is
of some interest by itself,
its
true significance appears in connection with mixed strategies.
Early analyses of repeated games with little or no discounting
(Aumann-Shapley [1976], Friedman [1971] and Rubinstein [1979])
restricted players to pure strategies, or equivalent ly
a
player's choice of
his fellow players.
a
,
assumed that
mixed strategy in any period is observable by
The assumption of pure strategies is restric-
tive because typically the range of individually rational payoffs is
greater when players are allowed to use mixed strategies to punish
their opponents.
The alternative hypothes is--that a player's
randomizations are ex post observable
Section
6
of our
[1986]
— is,
likewise,
strong.
paper showed how to extend the Folk
Theorem to allow for mixed strategies when only
a
player's realized
actions,
and not his choices of randomizing probabilities,
The key was the observation that
observable.
induced to use
a
are
player can be
mixed strategy to ininimax an opponent by making her
a
continuation payoff depend on her current action in
a
way that
renders her exactly indifferent among the various choices in the
mixed strategy's support.
Our argument relied on public randomization to ensure that any
individually rational continuation payoffs can be exactly attained.
If,
without public randomization,
merely be approximated,
a
the continuation payoffs could
minimaxing player might not be exactly
indifferent over the support of his mixed strategy, and our
construction would fail.
Thus,
if we obtain only an approximate
version of the Folk Theorem without public randomization, our
construction cannot accommodate unobservable mixed strategies.
Attaining payoffs exactly
our
is
also essential for the argument in
paper, which provided sufficient conditions for the sets
[1987a]
of Nash and perfect equilibrium payoffs to coincide for discount
Although the body of that paper assumed the
factors less than one.
possibility of public randomization, our results here imply that
this assumption,
2.
as
in
the Folk Theorem paper,
is
unnecessary.
The Model
We consider a finite n-player game in normal form:
g:
A-R
n
,
where A=A,x...xA
1
n
and A.
is
player i's action space.
1
set of player i's mixed strategies,
i.e.,
Let Z. be the
i
the probability
distributions over
A.
In
and set Z=Z,
,
l
we will write g
notation,
,
(
o
*.
.
.
*Z
To simplify
.
for player i's payoff given
)
the mixed
strategy vector o€£.
repeated versions of
In
over actions at time
each player's probability mixture
g,
can depend on the actions chosen at all
t
More formally,
previous times.
let h(t)
e
A
-
realized actions from time zero through time t-1.
strategy
Z.
is
be the
Player i's
sequence of maps (one for each period) from H(t) to
a
Note that,
.
H(t)
any time
at
player i's strategy does not depend
t,
on the past randomizing probabilities of his
opponents, but only on
their realized actions.
the infinitely repeated game G
In
,
each player i's payoff is
o
the average discounted sum
discount factor
(1-5)
T,.=
1
period
with common
6:
t_1
Z
£
g.(a(t)),
X
t=l
where a(t)
of his per-period payoffs,
-n-^
the probability distribution of actions chosen in
is
t
For each player
choose "minimax strategies" m =(m,,...,m
j,
so
)
that
m
arg min max g.(o.,o
e
.
-J
a
a
-J
J
J
.),
"J
.
J
and
v.
max g.(a.,m
=
J
J
„
J
=
.)
"J
g.(m
).
J
J
(Here "m^." is a mixed strategy selection for players other than
and
g
'
.
j
(a
.
j
,m J
-j
.
'
)
=g (m^
.
j
1
,
.
.
.
,
m^j-1'
.
a
,
.
,
j'
m Jj+1'
.^ 1
mf
) )
.
j,
We call v. player
Clearly,
j's reservation value.
least
at
v
any equilibrium of
in
.
player j's average payoff must be
g,
whether or not
g
repeated.
is
Henceforth we shall normalize the payoffs of the game
(
v
t
......
v
1
v
.
t
n
=
)
(
.
.
.
)
.
Call
.
Moreover,
=max g.(a).
l
,
(0.....0)
the mi n imax point
g
so
that
Take
.
let
l
a
U
=
{(v.,...,v
V
=
Convex Hull of
)
there exists a€Z with g(
|
a) =( v
.,...,
v
)},
U,
and
=
V
{(v.,...,v )€V
n
1
3
|
v.>0 for all
i}
l
'
The Folk Theorem without Public Randomization
.
Our [1986]
paper showed that if public randomization is allowed
and either n=2 or the dimension of
vector veV
Se(_S,l),
,
there exists
there is
a
a
V
equals
discount factor
perfect equilibrium of
n,
S_<1
then for any payoff
such that,
with payoffs
G
for all
v.
We
6
now demonstrate that public randomization is inessential for this
resul t
Lemma
in
V
1
establishes that, for
5
sufficiently large, all points
are feasible and can be obtained without using mixed
strategies.
of actions
That is,
{a(t)}"_.
there is a deterministic sequence
for any veV
for which
v
is
the payoff vector.
This is not
sufficient to establish the Folk Theorem, however, because, even if
1. Of course, for low discount factors, public randomization
does make a difference.
If & is near zero, the payoff vector for
the sequence {a(t)} is approximately g(a(l)), and
so, quite apart
from equilibrium considerations, many payoffs in V
are not
f eas ib le
veV
the sequence
,
period
V
In
.
T
{
a
(
t
)
might have the property that,
}
the continuation payoffs beginning at
,
that case,
for some
belong
do not
t
to
some player would prefer to deviate from the
even if so doing caused his opponents to minimax him
sequence,
thereafter
Building on Lemma
generated by
Lemma
1,
shows that payoffs in
2
deterministic sequence in such
a
continuation payoffs always lie
in
V
way that the
a
Following Lemma
.
can be
V
2,
we
explain how our results allow us to do without public randomization
in
the proof of the Folk Theorem.
Write A={a ,...,a
{w
,...,w
strategies
Lemma
for each
j,
let w
= g( a
)
Thus,
.
the set of payoff vectors corresponding to pure
is
}
and,
}
.
1
:
If
<S> 1
—
then for any v€V there is
,
a
sequence {a(t)}
of pure strategies whose average payoff is v.
m
Proof
Let v=Ix J w J
:
where 0<x J <l,
,
.
and £ x J = l.
We construct
j=l
(a(t)}
is
1
as
follows.
if a(t)=a J
let N J (t)
=
Let
(t)
I
be an index variable,
Set N J (1)=0 for all
otherwise.
and
t-1
T
I (1-S)S
T=l
a
I
j
(t)
for t>l.
N J (t)
discounted weight" given to strategy vector
C(t)
=
(j|x J -NvJJ (t)
j*(t)
=
5
arg max
{x J jec(t)
and set a(t)=a J
2.
t-1
>
ft)
.
a
is
j,
and
the "average
before time
t.
Let
Now define
(1-6)}.
N j (t)},
which
2
This defines an algorithm for computing a(t)
If there is a tie,
make
a
deterministic selection.
Claim
1
The algorithm is well-defined,
:
i.e.,
the set C(t)
is
assume to the contrary that at some time(s)
t,
never empty.
To prove the claim,
C(t)
N
is
(s)
empty,
for all j.
m(l-5)S
(1)
and let
S
Summing over
m.
_,
l
be the first such time.
s
ll- Z N (s)
=
1-1
1
Z (1-S)5
s-1
=
(1)
s-1
(1-6)>X
j
T
_,.J
I
(t)
j=l T=l
j = l
But
5
we have
j,
ms —
J
Then
1- Z (1-«S)5
T
_,
l
S
=
_.
L
S
contradicts our assumption that m(l-S)<l, establishing the
claim
Let N J
(oo)
=
lim N J (t).
t
bounded,
(Because N J (t)
is
increasing and
~*00
this limit exists.)
Claim
2
For all
:
N J (<»)=\ V
j,
m
To establish Claim 2,
note first that, by construction,
Z N
(<») = 1.
j=l
Moreover, N («) cannot exceed x
(by
S
t-1
(l-S))
t )<
J
Z x =1,
N
only when N J
m
for each j,
and,
since
,
(
because
- St
'
l
(
increases
N
1- 5)
.
Thus N j („)ix J
.
(«)=x
,
proving the claim.
j=l
Now,
by construction,
the payoffs corresponding to {a(t)}
are
(1-S)
l
I
t
(1-5)
X
6
g(a(t))
=
=l
Z
S
t_1
Z
[
I
J
Z
j = l
t = l
j = l
t = l
t_1
(t)w J ]= Z w J (l-5)
6
I
J
(t)-Zw j N j (co)=Zx j w J
=
v
Q.E.D.
Roughly speaking,
definition,
is
v
payoff vectors
convex combination Zx w
a
w
,
the algorithm of Lemma
.
.
.
,w
1
works as follows.
of the pure strategy
To generate v as a discounted average
.
payoff over time, choose that pure strategy vector
which the difference between
been used up until
t
x
are simply
Z
a
at
and the fraction of times
time
a
t
for
has
(suitably weighted for discounting) is largest.
The continuation payoffs at time
(a(t)}
By-
t
—S
5
g(a(t)),
associated with the sequence
s
the discounted sum of
i.e.
t =s
per-period payoffs starting at time
Lemma
2
min v.>5£,
t
and discounted to time
For every £>0 and closed set VcV
:
there exists
_S< 1
such that,
for all
s.
with
£€(_£, 1)
i.vev
and every v£V,
there is
a
sequence {a(t)} of pure strategies whose
discounted average payoffs are
any time
are at least
t
Remark
€
v,
and whose continuation payoffs at
for each player.
To prove that public randomization is inessential for
:
the Folk Theorem with observable mixed strategies,
to show
that,
such that,
if
for any individually rational payoff
8
exceeds
_5,
v
can be generated by
a
it would suffice
v,
there exists
deterministic
_£
8
sequence whose continuation payoffs are individually rational.
Lemma
establishes
2
stronger property;
a
works uniformly for the entire set
asserts that
it
a
fixed
_S
We provide the stronger
V.
result because it is needed to show that public randomization is
inessential with unobservable mixed strategies and in our [1987]
paper
.
Proof
Let
:
of the set
be the polygon corresponding to the intersection
Z
with the inequality constraints v.23£, and let
V
the J vertices of Z.
vertices
{z
}
e{w ,...,w
z
Clearly,
V
cZ
such that (i) each
z
and (iii)
};
average Zx k (j)w k
z
}
be
Let Z be a polygon with
.
is
within
of
£
z
;
(ii)
z
=z
if
can be expressed as a weighted
k
where each weight x (j)
,
{z
is
a
rational number.
Observe that VcZ
Because the x k,(j)'s are rational, we can find integers
.
(r
(j))u_i and
such that for all
d
and
j
k,
x
(j)=r (j)/d.
Let
"cycle j" be the d-period sequence of pure strategies in which
played for the first
periods;
w
k=
r
k
Let
z
j
(
S)
corresponding to cycle
z
R
k
=
I
s=l
then
2
periods;
a"
3
is
played for the next
and m.
r
S
(j),
r
is
2
(j)
(Recall that
be the discounted average payoffs
j.
Note that if
z
will be on the boundary as well.
(5)
(j)
(j)
and so on for all k between
g(a ))•
then
1
a
with R°(j)
=
0,
is
on
the boundary of
If we set
V,
z
R
Z
(S)
k
(j)-1
Z
(1-5)
k-1,
.v
D
s=R
(j)
m
.
J
k=l
Choose
S_
so
that,
within
E
of
z
for all
5
,
S
5
K
.
/(l-S a
greater than
By construction,
.
w
(
5)
We now apply the algorithm of Lemma
1
deterministic sequence of the
z
(
5)
'
z
(
contained in the
generate each v€V by
where 5=max
5>.S,
Earlier, when the payoffs w
were called for in
set a(t)=a
application, we replace the w
the
z
(
S)
'
In our current
.
s
a
(.5,
j_
the actions for the next d periods.
as
a
1-1/J)
given period
t,
.
we
with
s
Moreover, when the algorithm calls for payoffs
.
we assign cycle
1
is
5)
s
to
for
s
j,
is
V
5>.S,
'
z
.
and all
_5
for all
polygon Z(S) whose vertices are the
)
z
(
5)
The Lemma
algorithm so modified guarantees that we can generate each of the
payoffs in
V
by
a
each cycle is of length
of at
least 2£
,
d
and each
z
(
5)
gives each player
a
payoff
the continuation payoffs starting at any time
each player at least
satisfy (1-5
Because
deterministic sequence of these cycles-
)g_+5
£
if
S
is
t
give
taken large enough to
(2£)>£, where £=min g.(a)
the lowest possible
is
i, a
value of any player's payoff.
Q.E.D.
To summarize,
the algorithm of Lemma
veV by a deterministic sequence of w
s.
1
shows how to attain any
Lemma
2
replaces each w
that is not individually rational with a payoff vector
itself can be attained through
obtain v€V
through
a
a
finite cycle of w
deterministic sequence,
(i)
's.
z
(
5)
that
Hence,
to
apply the Lemma
1
in
algorithm using the
z
(
5)
the algorithm calls
for
z
'
s
instead of the
(6),
replace
it
w
's;
and
whenever
(ii)
with the corresponding
d-period cycle
To see how Lemma
in
2
enables us to do without public randomization
the proof of the Folk Theorem,
strategies in our [1986] paper.
we first recall the form of the
To obtain
the point v£V
players use publicly correlated action (generating
as
long as no player deviates.
If player
"punishment equilibrium" in which,
periods,
(b)
for
a
the players revert to
mode in which their payoffs are
to
deviates, we provided
i
a
certain number of
the player's opponents minimax him and he responds
optimally and then
as
(a)
each period)
in
v
we had
,
v
,
a
more "cooperative"
where this vector
is
chosen so
induce i's punishers to go through with their minimaxing and
so player i's overall payoff is
less than
£.
Like
v,
From Lemma
generated by publicly correlated actions.
replace the publicly correlated actions yielding
v
and
it
is
2,
we can
v
with
deterministic sequences whose continuation payoffs are greater than
£.
Because deviation leads to payoffs less than
wish to deviate from such
strategies are observable,
a
sequence.
In
e,
no player will
the case where mixed
this is the only change to the proof
required to eliminate public randomization.
The case where only players'
realized actions (and not the
randomizations themselves) are observable presents an additional
complication.
If,
to
minimax player
i,
player
j
uses a mixed
strategy, he must must be indifferent among the various actions over
11
which he randomizes.
Our
[1986]
proof ensured this indifference by
making j's continuation payoff after the punishment phase contingent
on his
actions during the phase.
It
is
important here that
precisely specified values for the continuation payoffs be
attainable;
2
shows,
it
would not suffice merely to approximate them.
however,
these exact values can,
in
fact,
be attained.
Thus public randomization is inessential in this case too.
—
Lemma
3
4
The s t r ong vers ion o f Le mma 2
guaranteeing a uniform
choic e of 1 -" is need ed he re b ecause t he particular values of the
Without
cont i nuat i on p ayoffs w ill d epen d on the discount factor.
unif o rmi ty we woul d ru n in t o th e di f f ic ulty that if 5 were chosen
large enou gh t o invoke Lemm a 2 for part icular continuation payoffs,
the c ont in uat i on payof f s re qui r ed to en sure indifference would
thems elves cha nge nee ess i t at in g a new S, etc
4
We shou Id point out that our [19 86] paper was misleading in
its a ssert i on that the Folk The orem hoi ds approximately when public
rando mizat ion is not f eas ib le.
Without the possibility of attaining
certa in co nt in uat ion p ayof f s ex actly, i t is not clear that even an
appro ximat e ve rs ion of the Folk Theorem holds for the case of
unobs ervab le m ixed s tr ategi es
3
.
,
.
.
12
References
Aumann, R., and L. Shapley [1976], "Long Term Competition:
Theoretic Analysis," mimeo, Hebrew University.
A
Game
Friedman, J. [1971], "A Noncooperat i ve Cooperative Equilibrium For
Supergames," Review of Economic Studies 38, 1-12.
,
Fudenberg, D. and E. Maskin [1986], "The Folk Theorem in Repeated
Games With Discounting or With Incomplete Information,"
Econometrica Vol. 54, No. 3, May 1986, pp. 533-554.
,
and
[1987], "Nash and Perfect Equilibria of
Discounted Repeated Games," mimeo, Harvard University.
Rubinstein, A. [1979], "Equilibrium in Supergames with the
Overtaking Criterion," Journal of Economic Theory No.
,
36!
i
096
21,
1-9,
3
TOAD QD5 130 312
Download