working paper
department of economics

REPEATED GAMES WITH LONG-RUN AND SHORT-RUN PLAYERS

By
Drew Fudenberg
David Kreps
Eric Maskin

No. 474
January 1988

massachusetts institute of technology
50 memorial drive
Cambridge, Mass. 02139
M.I.T., Department of Economics; Stanford University, Graduate School of
Business; and Harvard University, Department of Economics, respectively.
ABSTRACT
This paper studies the set of equilibrium payoffs in games with long- and short-run players and little discounting. Because the short-run players are unconcerned about the future, equilibrium outcomes must always lie on their static reaction (best response) curves. The obvious extension of the Folk Theorem to games with this constraint would simply include the constraint in the definitions of the feasible payoffs and of the minmax values. This extension does obtain under the assumption that each player's choice of a mixed strategy for the stage game is publicly observable, but, in contrast to standard repeated games, the limit value of the set of equilibrium payoffs is different if players can observe only their opponents' realized actions.
1. Introduction
The "folk theorem" for repeated games with discounting says that (under mild conditions) each individually-rational payoff can be attained in a perfect equilibrium for a range of discount factors close to one. It has long been realized that results similar to the folk theorem can arise if some of the players play the constituent game infinitely often and others play the constituent game only once, so long as all of the players are aware of all previous play.
A standard example is the infinitely-repeated version of Selten's [1977] chain-store game, where a single incumbent faces an infinite sequence of short-run entrants in the game depicted in Figure 1. Each entrant cares only about its one-period payoff, while the incumbent maximizes its net present value. For discount factors close to one there is a perfect equilibrium in which entry never occurs, even though this is not a perfect equilibrium if the game is played only once or even a fixed finite number of times. In this equilibrium, each entrant's strategy is "stay out if the incumbent has fought all previous entry; otherwise, enter," and the incumbent's strategy is "fight each entry as long as entry has always been fought in the past; otherwise acquiesce."
Other examples of games with long- and short-run players are the papers of Dybvig-Spatt [1980] and Shapiro [1982] on a firm's reputation for producing high-quality goods and the papers of Simon [1951] and Kreps [1984] on the nature of the employment relationship.
This paper studies the set of equilibrium payoffs in games with long- and short-run players and little discounting. This set differs from what it would be if all players were long-run, as demonstrated by the prisoner's dilemma with one enduring player facing a sequence of short-run opponents. Because the short-run players will fink in every period, the only equilibrium is the static one, no matter what the discount factor. In general, because the short-run players are unconcerned about the future, equilibrium outcomes must always lie on their static reaction (best response) curves. This is also true off of the equilibrium path, so the reservation values of the long-run players are higher when some of their opponents are short-run, because their punishments must be drawn from a smaller set.
The perfect folk theorem for discounted repeated games (Fudenberg-Maskin [1986]) shows that, under a mild full-dimensionality condition, any feasible payoffs that give all players more than their minmax values can be attained by a perfect equilibrium if the discount factor is near enough to one. The obvious extension of this result to games with the constraint that short-run players always play static best responses would simply include that constraint in the definitions of the feasible payoffs and of the minmax values. Propositions 1 and 2 of Section 2 show that this extension does obtain under the assumption that each player's choice of a mixed strategy for the stage game is publicly observable.
We then turn to the more realistic case in which players observe only their opponents' realized actions and not their opponents' mixed strategies. While in standard repeated games the folk theorem obtains in either case, when there are some short-run players the set of equilibria can be strictly smaller if mixed strategies are not observed. The explanation for this difference is that in ordinary repeated games, while mixed strategies may be needed during punishment phases, they are not necessary along the equilibrium path. In contrast, with short-run players some best responses, and thus some of the feasible payoffs, can only be obtained if the long-run players use mixed strategies. If the mixed strategies are not observable, inducing the long-run players to randomize may require that "punishments" occur with positive probability even if no player has deviated. For this reason the set of equilibrium payoffs may be bounded away from the frontier of the feasible set.
Proposition 3 of Section 3 provides a complete characterization of the limiting value of the set of equilibrium payoffs for a single long-run player. This characterization, and the results of Section 2, assume that players have access to a publicly observable randomizing device. The device is used to implement strategies of the form: if player i deviates, then players jointly switch to a "punishment equilibrium" with some probability p < 1. While the assumption of public randomizations is not implausible, it is interesting to know whether it leads to a larger limit set of equilibrium payoffs.
Proposition 4 in Section 4 shows that it does not: We construct "target strategies" in which a player is punished with probability one whenever his discounted payoff to date exceeds a target value, and show that these strategies can be used to obtain as an equilibrium any of the equilibrium payoffs that were obtained via public randomizations in Proposition 3. Proposition 3 shows that not all feasible payoffs can be obtained in equilibrium, so in particular we know that some payoffs cannot be obtained with the target strategies of Proposition 4. Inspection of that construction shows that it fails for payoffs that are higher than what the long-run player can obtain with probability one given the incentive constraints of the short-run players: For payoffs this high, there is a positive probability that player 1 will suffer a run of "bad luck" after which no possible sequence of payoffs could draw his discounted normalized value up to the target.
As this problem does not arise under the criterion of time-average payoffs, one might wonder if the set of equilibrium payoffs is larger under time-averaging. Proposition 5 shows that the answer is yes. In fact, any feasible incentive-compatible payoffs can arise as equilibria with time-averaging, so that we obtain the same set of payoffs as in the case where the players' private mixed strategies are observable. This discontinuity of the equilibrium set in passing from discounting to time averaging is reminiscent of a similar discontinuity that has been established for the equilibria of repeated partnership games (Radner [1986], Radner-Myerson-Maskin [1986]). The relationship between the two models is discussed further in Section 5.
We have not solved the case of several long-run players and unobservable mixed strategies. Section 5 gives an indication of the additional complications that this case presents.
2. Observable Mixed Strategies

Consider a finite n-player game in normal form, g: S_1 × ... × S_n → R^n. We denote player i's mixed strategies by σ_i ∈ Σ_i, and write g(σ) for the expected value of g under the distribution σ.
In this section we assume that a player can observe the others' past mixed strategies. This assumption (or a restriction to pure strategies) is standard in the repeated games literature, but as Fudenberg-Maskin [1986], [1987a] have shown, it is not necessary there. (Here it matters; see the next section.) We will also assume that the players can make their actions contingent on the outcome of a publicly observable randomizing device.
Label the players so that players 1 to j are long-run and players j+1 to n are short-run. Let B: Σ_1 × ... × Σ_j → Σ_{j+1} × ... × Σ_n be the correspondence that maps any strategy selection (σ_1, ..., σ_j) for the long-run players to the corresponding Nash equilibrium strategy selections for the short-run players. If there is only one short-run player, B(σ) is his best-response correspondence.
For each i from 1 to j, choose m^i = (m^i_1, ..., m^i_n) so that m^i solves

    min_{m ∈ graph(B)} max_{σ_i} g_i(σ_i, m_{-i}),

and set

    v̲_i = max_{σ_i} g_i(σ_i, m^i_{-i}).

(This minimum is attained because the constraint set graph(B) is compact and the function max_{σ_i} g_i(σ_i, m_{-i}) is continuous in m_{-i}.)

The strategies m^i_{-i} minimize long-run player i's maximum attainable payoff over the graph of B. The restriction to this set reflects the constraint that the short-run players will always choose actions that are short-run optimal. Given this constraint, no equilibrium of the repeated game can give player i less than v̲_i. (In general, the short-run players could force player i's payoff even lower using strategies that are not short-run optimal.) Note that m^i specifies player i's strategy m^i_i, which need not be a best response to m^i_{-i}: player i must play in a certain way to induce the short-run players to attain the minimum in the definition of m^i. In order to construct equilibria in which player i's payoff is close to v̲_i, player i will need to be provided with an incentive to cooperate in his own punishment.
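In small games the constrained min-max value can be computed by brute force. The sketch below uses a hypothetical 2×2 example (the payoffs are ours, not the paper's) with one long-run and one short-run player, and searches a grid of mixed strategies for profiles in graph(B).

```python
# Hypothetical 2x2 payoffs: rows are the long-run player's actions,
# columns the short-run player's actions.
A = [[3.0, 0.0],   # long-run player's payoffs
     [2.0, 1.0]]
B = [[1.0, 0.0],   # short-run player's payoffs
     [0.0, 1.0]]

def minmax_over_graph(A, B, grid=200, tol=1e-9):
    """min over (m_1, m_2) in graph(B) of the long-run player's best
    attainable payoff against m_2 (a grid approximation)."""
    worst = float("inf")
    for i in range(grid + 1):
        p = i / grid                       # long-run mixed action m_1 = (p, 1-p)
        col_pay = [p * B[0][j] + (1 - p) * B[1][j] for j in range(2)]
        best = max(col_pay)
        for k in range(grid + 1):
            q = k / grid                   # candidate short-run mixture (q, 1-q)
            m2 = [q, 1 - q]
            # (m_1, m2) is in graph(B) iff every column played with positive
            # probability is a best response to m_1
            if any(m2[j] > tol and col_pay[j] < best - tol for j in range(2)):
                continue
            # long-run player's best deviation payoff against m2
            dev = max(sum(A[s][j] * m2[j] for j in range(2)) for s in range(2))
            worst = min(worst, dev)
    return worst

print(minmax_over_graph(A, B))   # 1.0 for these payoffs
```

The inner check enforces exactly the restriction discussed above: only short-run mixtures whose support consists of static best responses are admitted.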
In the repeated version of g, we suppose that long-run players maximize the discounted normalized sum of their single-period payoffs, with common discount factor δ. That is, long-run player i's payoff is (1-δ) Σ_{t=0}^{∞} δ^t g_i(σ(t)). Short-run players in each period act to maximize that period's payoff. All players, both long- and short-run, can condition their play on all previous actions.
Let

    U = { v = (v_1, ..., v_j) | there is a σ ∈ graph(B) with g_i(σ) = v_i for all i from 1 to j }.

Let V = convex hull of U, and let

    V* = { v ∈ V | v_i > v̲_i for all i from 1 to j }.

We call payoffs in V attainable payoffs for the long-run players. Only payoffs in V* can arise in equilibrium. We begin with the case of a single long-run player.
Proposition 1: If only player one is a long-run player, then for any v_1 ∈ V* there exists a δ̲ ∈ (0,1) such that for all δ ∈ (δ̲,1), there is a subgame-perfect equilibrium of the infinitely repeated game with discount factor δ in which player one's discounted normalized payoff is v_1.
Proof: Fix v_1 ∈ V* and consider the following strategies. Begin in Phase A, where players play a σ ∈ graph(B) (or a public randomization over such σ's) that gives player one payoff v_1. Deviations by the short-run players are ignored. If player one deviates, he is punished by the players switching to the punishment strategy m^1 for T(δ) periods, after which play returns to Phase A; if T(δ) is large enough, deviations in Phase A are unprofitable. Now m^1_1 need not be a best response against m^1_{-1}, so we must ensure that player one does not prefer to deviate during the punishment phase. This is done by specifying that a deviation in this phase restarts the punishment. Since the most that player one can obtain in any period of the punishment phase is v̲_1, he will prefer not to deviate so long as T(δ) is short enough that player one's normalized payoff at the start of the punishment phase is at least v̲_1. Let v̂_1 = max_{σ ∈ graph(B)} g_1(σ). The two constraints on T(δ) will be satisfied if:

(1)   (1-δ)v̂_1 + δ(1-δ^{T(δ)})g_1(m^1) + δ^{T(δ)+1}v_1 ≤ v_1, or equivalently

(1')  δ^{T(δ)+1} ≤ (v_1 - δg_1(m^1) - (1-δ)v̂_1) / (v_1 - g_1(m^1)), and

(2)   (1-δ^{T(δ)})g_1(m^1) + δ^{T(δ)}v_1 ≥ v̲_1, or equivalently

(2')  δ^{T(δ)} ≥ (v̲_1 - g_1(m^1)) / (v_1 - g_1(m^1)).

The right-hand sides of inequalities (1') and (2') have the same denominator, and for δ close to 1 the numerator of (1') exceeds the numerator of (2'). Then since δ^T is approximately continuous in T for δ close to 1, we can find a δ̲ such that for all greater δ there is a T(δ) satisfying (1') and (2'). Q.E.D.
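The two constraints can be solved for T(δ) numerically; the values of v_1, v̲_1, v̂_1 and g_1(m^1) below are illustrative, not taken from the paper.

```python
# Search for a punishment length T satisfying (1') and (2'), with
# illustrative values: v1 (target payoff), vlow (constrained min-max),
# vhat (best payoff in graph(B)), g1m (payoff under the punishment m^1).

def find_T(delta, v1=2.0, vlow=1.0, vhat=3.0, g1m=0.0, T_max=10_000):
    den = v1 - g1m
    rhs1 = (v1 - delta * g1m - (1 - delta) * vhat) / den   # (1'): delta^(T+1) <= rhs1
    rhs2 = (vlow - g1m) / den                              # (2'): delta^T    >= rhs2
    for T in range(1, T_max):
        if delta ** (T + 1) <= rhs1 and delta ** T >= rhs2:
            return T
    return None

for delta in (0.9, 0.99):
    print(delta, find_T(delta))
```

The punishment must be long enough to erase the deviation gain (1') yet short enough that the punished player is willing to sit through it (2'); for δ near one the window between the two bounds is nonempty.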
In repeated games with three or more players, a full-dimensionality condition is required for all feasible individually rational payoffs to be enforceable when δ is near enough to one. The corresponding condition here is that the dimensionality of V* equals the number of long-run players.

Proposition 2: Assume that the dimensionality of V* equals j, the number of long-run players. Then for each v in V*, there is a δ̲ ∈ (0,1) such that for all δ ∈ (δ̲,1) there is a subgame-perfect equilibrium of the infinitely repeated game with discount factor δ in which each long-run player i's normalized payoff is v_i.
Remark: The proof of Proposition 2 follows that of Fudenberg-Maskin's Theorem 2: If a (long-run) player deviates, he is punished long enough to wipe out the gain from deviation. To induce the other (long-run) players to punish him, they are given a "reward" at the end of the punishment phase. One small complication not present in Fudenberg-Maskin is that, as in Proposition 1, the player being punished must take an active role in his punishment. This, however, can be arranged with essentially the same strategies as before.
Proof: Choose v' in the interior of V* and an ε > 0 so that for each i from 1 to j, the payoff vector (v'_1+ε, ..., v'_{i-1}+ε, v'_i, v'_{i+1}+ε, ..., v'_j+ε) is in V* and v'_i + ε < v_i. Let σ (or a public randomization over several σ's) be such that g(σ) = v. Also choose, for each i, a joint strategy σ^i that yields v'_k + ε to every long-run player k other than i and yields v'_i to i. Let w^i_j = g_j(m^i) be player j's period payoff when i is being punished with the strategies m^i. For each i, choose an integer N_i so that

    ĝ_i + N_i v̲_i < (N_i + 1) v'_i,

where ĝ_i = max_σ g_i(σ) is i's greatest one-period payoff.
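The choice of N_i is mechanical; the numbers below are illustrative stand-ins for ĝ_i, v̲_i and v'_i.

```python
import math

def smallest_N(ghat, vlow, vprime):
    # The condition ghat + N*vlow < (N+1)*vprime rearranges to
    # N > (ghat - vprime) / (vprime - vlow), so take the least such
    # nonnegative integer (vprime > vlow is guaranteed since v' is in V*).
    N = max(0, math.floor((ghat - vprime) / (vprime - vlow)) + 1)
    assert ghat + N * vlow < (N + 1) * vprime
    return N

print(smallest_N(ghat=5.0, vlow=0.0, vprime=2.0))   # 2, since 5 + 2*0 < 3*2
```

Intuitively, N_i periods of being held to the min-max value must outweigh the best one-shot deviation gain relative to the reward payoff v'_i.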
Consider the following repeated-game strategy for long-run player i:

(0) Obey the following rules regardless of how the short-run players have played in the past.

(A) Play σ_i each period as long as all long-run players played σ last period, or if σ had been played until last period and two or more long-run players failed to play σ last period. If long-run player j deviates from (A), then

(B_j) Play m^j_i for N_j periods, and then

(C) Play σ^j_i thereafter.

If long-run player k deviates in phase (B_j) or (C), then begin phase (B_k) again with j = k. (As in phase A, players ignore simultaneous deviations by two or more long-run players.)
As usual, it suffices to check that in every subgame no player can gain from deviating once and then conforming. The condition on N_j ensures that for δ close to one, the gain from deviating in Phase A or Phase C is outweighed by Phase B's punishment. If player j conforms in B_j (i.e., when she is being punished), her payoff is at least (1-δ^{N_j})w^j_j + δ^{N_j}v'_j, which exceeds v̲_j if δ is close enough to one. If she deviates once and then conforms, she receives at most v̲_j in the period she deviates, and postpones the payoff v'_j, which lowers her payoff. If long-run player k deviates in Phase B_j, she is minmaxed for the next N_k periods, and Phase-C play will then give her v'_k instead of v'_k + ε. Thus it is easy to show that such a deviation is unprofitable. (See Fudenberg-Maskin for the missing computations.)
3. Unobservable Mixed Strategies

We now drop the assumption that players can observe their opponents' mixed strategies, and instead assume they can only observe their opponents' realized actions. In ordinary repeated games, (privately) mixed strategies are needed during punishment phases, because in general a player's minmax value is lower when his opponents use mixed strategies. However, mixed strategies are not required along the equilibrium path, since desired play along the path can be enforced by the threat of future punishments. Fudenberg-Maskin showed that, under the full-dimension condition of Proposition 2, players can be induced to use mixed strategies as punishments by making the continuation payoffs at the end of a punishment phase depend on the realized actions in that phase in such a way that each action in the support of the mixed strategy yields the same overall payoff.
In contrast, with short-run players some payoffs (in the graph of B) can only be obtained if the long-run players privately randomize, so that mixed strategies are in general required along the equilibrium path. As a consequence, the set of equilibrium payoffs in the repeated game can be strictly smaller when mixed strategies are not observable. This is illustrated by the following example of a game with one long-run player, Row, and one short-run player, Col.
Let p be the probability that Row plays D. Col's best response is M if p ≤ 1/2, L if 1/2 ≤ p ≤ 100/101, and R if p ≥ 100/101. There are three static equilibria: the pure-strategy equilibrium (D,R); a second in which p = 1/2 and Col mixes between M and L; and a third in which p = 100/101 and Col mixes between L and R. Row's maximum attainable payoff is 3, which occurs when p = 1/2 and Col plays L.

[Figure: the payoff matrix of the Row-Col stage game; Row chooses U or D, Col chooses L, M, or R.]

If Row's mixed strategy is observable, she can attain this payoff in the infinitely repeated game if δ is near enough to 1. If, however, Row's mixed strategy is not observable, her highest equilibrium payoff is at most 2, regardless of δ. To see this, fix a discount factor δ, and let v*(δ) be the supremum of Row's payoff over all Nash equilibria. Suppose that for some δ, v*(δ) > 2, and choose an ε > 0 and an equilibrium in which Row's payoff is v(δ) ≥ v*(δ) - ε > 2. It is easy to see that the set of equilibrium payoffs is stationary: any equilibrium payoff is an equilibrium payoff for any subgame, and conversely. Thus, the highest payoff Row can obtain starting from period 2 is also bounded by v*(δ). Since v(δ) is the weighted average of Row's first-period payoff and her expected continuation payoff, Row's first-period payoff must be at least v*(δ) - ε/(1-δ). For ε sufficiently small, this implies that Row's first-period payoff must exceed 2.

In order for Row's first-period payoff to exceed 2, Col must play L with positive probability in the first period. As Col will only play L if Row randomizes between U and D, Row must be indifferent between her first-period choices, and in particular must be willing to play D. Let v_D be Row's expected payoff from period 2 on if she plays D in the first period. Then we must have

(3)   2(1-δ) + δv_D ≥ v*(δ) - ε.

But since v_D ≤ v*(δ), (3) gives v*(δ) - ε ≤ 2(1-δ) + δv*(δ); letting ε go to zero, we conclude that v*(δ) ≤ 2.
While Row cannot do as well as when her mixed strategies are observable, she can still gain by using mixed strategies. For δ near enough to one there is an equilibrium which gives Row a normalized payoff of 2, while Row's best payoff when restricted to pure strategies is the static equilibrium yielding 1. To induce Row to mix between U and D, specify that following periods in which Col expects mixing and Row plays U, play switches with probability p to (D,R) for ten periods and then reverts to Row randomizing and Col playing L. The probability p is chosen so that Row is just indifferent between receiving 2 for the next eleven periods, or receiving 4 today and risking punishment with probability p. This construction works quite generally, as shown in the following proposition.
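The indifference condition pinning down p can be checked directly from the payoffs 4, 2, and 1 given above:

```python
def punishment_value(delta):
    # Ten periods of the static-equilibrium payoff 1, then back to the
    # mixing phase worth 2.
    return (1 - delta ** 10) * 1.0 + delta ** 10 * 2.0

def switch_prob(delta):
    # Indifference between D (2 today, continuation 2) and U (4 today,
    # punished with probability p): (1-d)*4 + d*(p*W + (1-p)*2) = 2.
    W = punishment_value(delta)
    return 2 * (1 - delta) / (delta * (2.0 - W))

delta = 0.99
p = switch_prob(delta)
W = punishment_value(delta)
payoff_U = (1 - delta) * 4 + delta * (p * W + (1 - p) * 2)
payoff_D = (1 - delta) * 2 + delta * 2
print(p, payoff_U, payoff_D)
```

For δ near one, p tends to 1/5: the extra payoff of 2 from playing U is taxed away by a punishment that costs roughly 10 periods of one unit each.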
Proposition 3: Consider a game with a single long-lived player, player one, and let

    v̄_1 = max_{σ ∈ graph(B)} min_{s_1 ∈ supp(σ_1)} g_1(s_1, σ_{-1}).

Then for any v_1 ∈ [v̲_1, v̄_1] there exists a δ' < 1 such that for all δ ∈ (δ',1), there is an equilibrium in which player one's normalized payoff is v_1. For no δ is there an equilibrium in which player one's payoff exceeds v̄_1.
Proof: We begin by constructing a "punishment equilibrium" in which player one's normalized payoff is exactly v̲_1. If v̲_1 is player one's payoff in a static equilibrium, this is immediate, so assume all the static equilibria give player one more than v̲_1. The strategies we will use have two phases. The game begins in phase A, where the players use m^1, a strategy which holds player one's maximum one-period payoff to v̲_1. If player one plays s_1, the players publicly randomize between remaining in phase A and switching to a static Nash equilibrium for the remainder of the game. If e_1 is player one's payoff in this static equilibrium, set the probability of switching after s_1, p(s_1), to be

(4)   p(s_1) = (1-δ)(v̲_1 - g_1(s_1, m^1_{-1})) / (δ(e_1 - v̲_1)).

(If δ is near enough to one, p(s_1) is between 0 and 1.) The switching probability has been constructed so that player one's normalized payoff is v̲_1 for all actions, including those in the support of m^1_1, so she is indifferent between these actions.
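The indifference built into (4) is easy to verify numerically; vlow, e1 and the action payoffs g below are illustrative values with g ≤ vlow < e1.

```python
def p_switch(delta, vlow, e1, g):
    # Equation (4): probability of switching to the static equilibrium
    # after an action worth g against m^1_{-1}.
    return (1 - delta) * (vlow - g) / (delta * (e1 - vlow))

delta, vlow, e1 = 0.95, 1.0, 2.0
for g in (-1.0, 0.0, 1.0):                 # any phase-A payoff g <= vlow
    p = p_switch(delta, vlow, e1, g)
    assert 0.0 <= p <= 1.0
    # The value V of phase A solves V = (1-d)g + d*(p*e1 + (1-p)*V);
    # check that it equals vlow for every action.
    V = ((1 - delta) * g + delta * p * e1) / (1 - delta * (1 - p))
    assert abs(V - vlow) < 1e-9
print("indifference verified")
```

Actions that pay less today trigger the (attractive) static equilibrium with higher probability, which is exactly what equalizes the overall payoffs.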
Next we construct strategies yielding v̄_1. Let σ* attain the maximum in the definition of v̄_1. Play begins in phase A with the players following σ*. If player one deviates to an action outside the support of σ*_1, play switches to the "punishment equilibrium" constructed above. If player one plays an action s_1 in the support of σ*_1, play switches to the punishment equilibrium with probability p(s_1), and otherwise remains in phase A. The probability p(s_1) is chosen so that player one's payoff to all actions in the support of σ*_1 is v̄_1. As above, this probability exists if δ is near enough to one. These strategies are clearly an equilibrium for large δ.

Equilibrium payoffs between v̲_1 and v̄_1 are obtained by using public randomizations between those two values. The argument that player one's payoff cannot exceed v̄_1 is exactly as in the example.
4. No Public Randomizations

The equilibria that we constructed in the proofs of Propositions 1 through 3 relied on our assumption that players can condition their play on the outcome of a publicly observed random variable. While that assumption is not implausible, it is also of interest to know whether the assumption is necessary for our results. For this reason, Proposition 4 below extends Proposition 3 to games without public randomizations. (We have not thought about the possible extension of Propositions 1 and 2, because we think the situation without public randomizations but where private randomization can be verified ex post is without interest.) The intuition, as explained in Fudenberg-Maskin [1987c], is that public randomizations serve to convexify the set of attainable payoffs, and when δ is near to 1 this convexification can be achieved by sequences of play which vary over time in the appropriate way. Fudenberg-Maskin [1987c] shows that public randomizations are not necessary for the proof of the perfect Folk Theorem. However, as we have already seen, there are important differences between classic repeated games and repeated games with some short-run players, so the fact that public randomizations are not needed for the folk theorem should not be thought to settle the question here.
Proposition 4: Consider a game with a single long-run player, player one, where public randomizations are not available. As in Proposition 3, let

    v̄_1 = max_{σ ∈ graph(B)} min_{s_1 ∈ supp(σ_1)} g_1(s_1, σ_{-1}),

and let σ* be a strategy that attains this max. Then, for any v_1 ∈ (v̲_1, v̄_1], there exists a δ' < 1 such that for all δ ∈ (δ',1) there is a subgame-perfect equilibrium in which player one's discounted normalized payoff is v_1.
Remark: Fix a static Nash equilibrium σ̂ with payoff e_1 to player one. For each v_1, the proof constructs strategies that keep track of the agent's total realized payoff to date, J_t, and compares it to the "target" value (1-δ^t)v_1, which is what the payoff to date would be if the agent received v_1 in every period. If v_1 exceeds e_1, then play initially follows the (possibly mixed) strategy σ*, and whenever the realized total is sufficiently greater than the target value, the agent is "punished" by reversion to the static equilibrium. If the target is less than e_1, then play starts out at the (possibly mixed) strategy m^1, with intermittent "rewards" of the static equilibrium whenever the realized payoff drops too low.
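The bookkeeping described in the Remark can be simulated; the per-period payoffs drawn below are hypothetical stand-ins for the realized payoffs of actions in the support of σ* (each at least v_1), with the static equilibrium normalized to pay zero.

```python
import random

# A simulation sketch of the "target strategy": the index J_t tracks
# realized discounted payoffs, R_t = (1 - delta**t) * v1 is the target,
# play is sigma* while J_t < R_{t+1}, and the static equilibrium
# (normalized payoff 0) otherwise.

def simulate(delta=0.95, v1=1.0, T=400, seed=0):
    rng = random.Random(seed)
    J = 0.0
    for t in range(T):
        R_next = (1 - delta ** (t + 1)) * v1
        if J < R_next:                    # sigma* phase
            g = rng.choice([1.0, 3.0])    # hypothetical support payoffs >= v1
        else:                             # punishment: static equilibrium
            g = 0.0
        J += (1 - delta) * delta ** t * g
        # the index never falls below the target along the prescribed play
        assert J >= (1 - delta ** (t + 1)) * v1 - 1e-12
    return J

J = simulate()
print(J)   # close to v1 = 1: lucky high realizations are "taxed" by punishment
```

Every realized payoff above v_1 pushes J_t over the target and triggers a punishment spell, so the realized normalized payoff converges to v_1 on every history, which is what makes the long-run player willing to randomize.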
Proof:

(A) It is trivial to obtain e_1 as an equilibrium payoff: the players simply play σ̂ in every period.

(B) To attain any payoff v_1 between e_1 and v̄_1, we proceed as follows. Renormalize the payoffs so that e_1 = 0, and take δ large enough that (1-δ)ĝ_1 < δv_1, where ĝ_1 = max_σ g_1(σ) is player one's greatest one-period payoff. Define J_0 = 0, and for each time t > 0 define an index J_t and the prescribed strategy profile φ(h_t) as follows:

    J_t = J_{t-1} + (1-δ)δ^{t-1} g_1(s_1(t-1), φ_{-1}(h_{t-1})),

where s_1(t-1) is player one's realized action in period t-1 (as opposed to his choice of a mixed strategy), and let R_t = (1-δ^t)v_1. If player one's payoff were v_1 in every period, his accrued payoff J_t would equal R_t. The equilibrium strategies "punish" the agent whenever J_t exceeds the target. More precisely, we define

    φ(h_t) = σ*   if J_t < R_{t+1},
    φ(h_t) = σ̂    if J_t ≥ R_{t+1}.

Note that since J_t is a discounted sum, for each infinite history h_∞ the index J_t converges to a limit J_∞. Moreover, as long as the other players use φ_{-1}, player one's payoff to any strategy is simply the expected value of J_∞, and his expected payoff in any subgame starting at time t is δ^{-t} times the expected value of (J_∞ - J_t).
We will now argue that (i) if player one uses strategy φ_1, then J_t ≥ R_t for all times t and histories h_t, which implies that J_∞ ≥ v_1; (ii) that regardless of how player one plays, J_∞ ≤ v_1 for all histories h_∞, so that player one's payoff in the subgame starting at time t is bounded by δ^{-t}(v_1 - J_t); and (iii) that it is a best response for all players to follow the prescribed strategies, so that φ is a subgame-perfect equilibrium. (The condition that the short-run players not wish to deviate is incorporated in the construction of φ: both σ* and σ̂ lie in the graph of B.) Conditions (i) and (ii) imply that it is a best response for player one to play φ_1, and that player one's equilibrium payoff is v_1.
Proof of (i): We must show that if player one follows φ_1, then J_t ≥ R_t for all t. Since J_0 = 0 = R_0, this is true for t = 0. Assume it is true for t = τ. At period τ, either (a) φ(h_τ) = σ̂ or (b) φ(h_τ) = σ*. In case (a), every pure strategy in the support of σ̂_1 earns the static-equilibrium payoff of zero, so J_{τ+1} = J_τ ≥ R_{τ+1}, since case (a) arises only when J_τ ≥ R_{τ+1}. In case (b), min{ g_1(s_1(τ), σ*_{-1}) | s_1(τ) ∈ supp(σ*_1) } = v̄_1 ≥ v_1, so

    J_{τ+1} ≥ J_τ + (1-δ)δ^τ v_1 ≥ (1-δ^τ)v_1 + (1-δ)δ^τ v_1 = (1-δ^{τ+1})v_1 = R_{τ+1},

where the second inequality comes from the inductive hypothesis. Thus J_t ≥ R_t for all t, and so if player one follows φ_1, then J_∞ ≥ v_1.
Proof of (ii): Next we claim that regardless of how player one plays, J_t ≤ R_t + (1-δ)δ^{t-1}ĝ_1 for all t ≥ 1. Since J_1 ≤ (1-δ)ĝ_1, this is true for t = 1. Assume it is true for t = τ. If J_τ < R_{τ+1}, then φ(h_τ) = σ*, and any action earns at most ĝ_1, so J_{τ+1} ≤ J_τ + (1-δ)δ^τ ĝ_1 < R_{τ+1} + (1-δ)δ^τ ĝ_1. If J_τ ≥ R_{τ+1}, then φ(h_τ) = σ̂, and since no action earns more than zero against σ̂_{-1}, J_{τ+1} ≤ J_τ ≤ R_τ + (1-δ)δ^{τ-1}ĝ_1 ≤ R_{τ+1} + (1-δ)δ^τ ĝ_1, where the last inequality uses R_{τ+1} - R_τ = (1-δ)δ^τ v_1 and our bound (1-δ)ĝ_1 < δv_1. Letting t → ∞ gives J_∞ ≤ v_1. The same argument, begun at time t rather than time zero, shows that player one's payoff in the subgame starting at time t is bounded by δ^{-t}(v_1 - J_t), and from part (i) this payoff can be attained by following φ_1.
(C) Next we show how to construct equilibria (for large enough δ) that yield payoffs v_1 between v̲_1 and the static equilibrium payoff of zero. Pick a v_1 ∈ (v̲_1, 0), and choose δ large enough that (1-δ)g̲_1 > δv_1, where g̲_1 = min_σ g_1(σ) is player one's worst one-period payoff. Set J_0 = 0 and φ(h_0) = m^1, and define J_t and φ(h_t) as follows:

    J_t = J_{t-1} + (1-δ)δ^{t-1} g_1(s_1(t-1), φ_{-1}(h_{t-1})),   R_t = (1-δ^t)v_1,

with

    φ(h_t) = m^1   if J_t ≥ R_{t+1},
    φ(h_t) = σ̂    if J_t < R_{t+1}.

Proceeding as above, we claim that (i) regardless of how player one plays, J_t ≤ R_t for all times t and histories h_t, which implies that J_∞ ≤ v_1, so that player one's payoff in the subgame starting at time t is no greater than δ^{-t}(v_1 - J_t); (ii) that if player one follows φ_1, then J_t ≥ R_t + (1-δ)δ^{t-1}g̲_1 for all t ≥ 1, so that J_∞ ≥ v_1; and (iii) that it is a best response for all players to follow φ.

Proof of (i): Since J_0 = 0 = R_0, the claim is true for t = 0. Assume it is true for t = τ. If J_τ ≥ R_{τ+1}, then φ(h_τ) = m^1, and by the definition of v̲_1 no action of player one earns more than v̲_1 < v_1 against m^1_{-1}, so J_{τ+1} ≤ J_τ + (1-δ)δ^τ v_1 ≤ R_τ + (1-δ)δ^τ v_1 = R_{τ+1}. If J_τ < R_{τ+1}, then φ(h_τ) = σ̂, no action earns more than zero, and J_{τ+1} ≤ J_τ < R_{τ+1}. Thus J_t ≤ R_t for all t, and so regardless of how player one plays, J_∞ ≤ v_1.

Proof of (ii): Since J_1 ≥ (1-δ)g̲_1 ≥ R_1 + (1-δ)g̲_1 (recall that R_1 = (1-δ)v_1 < 0), the claim is true for t = 1. Assume it is true for t = τ. If φ(h_τ) = m^1, then J_{τ+1} ≥ J_τ + (1-δ)δ^τ g̲_1 ≥ R_{τ+1} + (1-δ)δ^τ g̲_1, since in this case J_τ ≥ R_{τ+1}. If φ(h_τ) = σ̂, then every action in the support of σ̂_1 earns exactly zero, so J_{τ+1} = J_τ ≥ R_τ + (1-δ)δ^{τ-1}g̲_1 ≥ R_{τ+1} + (1-δ)δ^τ g̲_1, where the last inequality uses R_τ - R_{τ+1} = -(1-δ)δ^τ v_1 and our bound (1-δ)g̲_1 > δv_1. Letting t → ∞ gives J_∞ ≥ v_1.

Proof of (iii): By (i), no strategy gives player one more than δ^{-t}(v_1 - J_t) in the subgame starting at time t, and by the argument of (ii), begun at time t, following φ_1 attains this bound; so following φ_1 is a best response, and any strategy that differs from it can only lower player one's payoff. The short-run players are best-responding in both phases, since m^1 lies in the graph of B and σ̂ is a static equilibrium. Q.E.D.
Proposition 4 shows how to attain any payoff between v̲_1 and v̄_1 by means of "target strategies." From Proposition 3 we know that such strategies cannot be used to attain higher payoffs. We think it is interesting to note where an attempted proof would fail. In part (B), we proved that if player one follows φ_1, then for every sequence of realizations player one's payoff is at least v_1. Imagine that we try to attain a payoff v_1 > v̄_1 by setting the target R_t = (1-δ^t)v_1. Then in the phases where σ* is played, it might be that player one's realized payoff is less than v_1. (Recall that by definition it cannot be lower than v̄_1.) After a sufficiently long sequence of these outcomes, player one's realized payoff J_t would be so far below the target that even receiving the best possible payoff at every future date would not bring his discounted normalized payoff up to it.

This problem of falling so far below the target that a return is impossible does not arise with the criterion of time-average payoffs, since the outcomes in any finite number of periods are then irrelevant. For this reason we can attain payoffs above v̄_1 under time averaging, as we show in the next section.
Time Averaging
The reason that player 1's payoff is bounded by what she obtains when she plays her least favorite strategy in the support of σ_1 is that every time she plays a different action she must be "punished" in a way that makes all of the actions in σ_1 equally attractive. A similar need for "punishments" along the equilibrium path occurs in repeated partnership games, where two players make an effort decision that is not observed by the other, and the link between effort and output is stochastic. Since shirking by either player increases the probability of low output, low output must provoke punishment, even though low output can occur when neither player shirks. This is why the best equilibrium outcome is bounded away from efficiency when the payoff criterion is the discounted normalized value (Radner-Myerson-Maskin [1986], Fudenberg-Maskin [1987a]).
However, Radner [1986] has shown that efficient payoffs can be attained in partnerships with time averaging. His proof constructed strategies so that (1) players never cheat, punishment occurs only finitely often, and thus is negligible, and (2) an infinite number of deviations is very likely to trigger a substantial punishment. Since no finite number of deviations can increase the time-average payoff, in equilibrium no one cheats yet the punishment costs are negligible.
Since the inefficiencies in repeated partnerships and games with short-run players both stem from the need for punishments along the equilibrium path, it is not surprising that the inefficiencies in our model also disappear when players are completely patient. We prove this with a variant of the "target strategies" we used in Section 4. These strategies differ from Radner's in that even if player 1 plays the equilibrium strategy, she will be punished infinitely often with probability one. However, along the equilibrium path the frequency of punishment converges to zero, so that as in Radner the punishment imposes zero cost.
Proposition 5: Imagine that player 1 evaluates payoff streams with the criterion

lim inf_{T→∞} (1/T) E[ Σ_{t=0}^{T-1} g_1(s(t)) ].

Then for all v ∈ V there is a subgame-perfect equilibrium with payoffs v.
Remark: The proof is based on a strong law of large numbers for martingales with bounded increments, 2/ which we extend to cover the difference between a supermartingale and its lowest value to date. The relevant limit theory is developed in the Appendix.
Proof: As in Proposition 4, we use different strategies for payoffs above and below some fixed static equilibrium σ̂. Imagine that v_1 exceeds player 1's payoff in this equilibrium, and normalize g_1(σ̂) = 0. Let σ* be the (possibly mixed) strategy in graph(B) that maximizes player one's expected payoff, and define g_1(σ*) = v̄_1.

Define J_0 = 0 and J_t = Σ_{τ=0}^{t-1} [g_1(s_1(τ), s_{-1}(τ)) - v_1], where s(τ) is the realized action profile in period τ. (This differs from the definition in Section 4, where we used player 1's realized action and the mixed strategy of her opponents in defining J_t.)
Note that play depends on the history only through the sign of J_t. Set

σ(h_t) = σ̂ if J_t > 0, and σ(h_t) = σ* if J_t ≤ 0;

that is, player 1 is "punished" whenever her realized payoff is ahead of the target, and "rewarded" whenever it is not. We claim that (i) no matter how player 1 plays, her payoff is bounded by v_1, and (ii) that by following s_1 player 1 can attain payoff v_1; that is, lim inf (1/T)J_T ≥ 0 almost surely (and hence in expectation).
To prove this, let σ_1(h_t) be an arbitrary strategy for player 1, and fix the associated probability distribution over infinite-horizon histories. For each history, let R_t(h_t) = {τ ≤ t-1 : σ(h_τ) = σ*} be the "reward" periods, and let P_t(h_t) = {τ ≤ t-1 : σ(h_τ) = σ̂} be the "punishment" ones. Then let

M_t(h_t) = Σ_{τ ∈ R_t} [g_1(x(τ)) - v_1]

be the sum of player 1's payoffs in the good periods, measured relative to the target, and set

N_t(h_t) = Σ_{τ ∈ P_t} [g_1(x(τ)) - v_1].

Note that the reward and punishment sets and the associated scores are defined pathwise, i.e. they depend on the history h_t; henceforth, though, we will omit the history h from the notation. Finally define M̄_t = max_{τ≤t} M_τ and N̲_t = min_{τ≤t} N_τ, and let v̂_1 and ṽ_1 denote the smallest and largest possible one-period increments g_1(s) - v_1.
We claim that for all t,

(5)  v̂_1 + (M_t - M̄_t) ≤ J_t ≤ ṽ_1 + (N_t - N̲_t).

This is clearly true for t = 0. Assume (5) holds for all τ ≤ t. At the start of period t, either (a) J_t > 0 or (b) J_t ≤ 0. In case (a), period t is a punishment period, so J_{t+1} = J_t + N_{t+1} - N_t and M_{t+1} = M_t. Then J_{t+1} ≤ ṽ_1 + (N_t - N̲_t) + (N_{t+1} - N_t) ≤ ṽ_1 + (N_{t+1} - N̲_{t+1}); and since J_t > 0 and N_{t+1} - N_t ≥ v̂_1, also J_{t+1} ≥ v̂_1 ≥ v̂_1 + (M_{t+1} - M̄_{t+1}), so (5) is satisfied. In case (b), period t is a reward period, so J_{t+1} = J_t + M_{t+1} - M_t and N_{t+1} = N_t. Since J_t ≤ 0 and M_{t+1} - M_t ≤ ṽ_1, J_{t+1} ≤ ṽ_1 ≤ ṽ_1 + (N_{t+1} - N̲_{t+1}); and by the induction hypothesis J_{t+1} ≥ v̂_1 + (M_t - M̄_t) + (M_{t+1} - M_t) ≥ v̂_1 + (M_{t+1} - M̄_{t+1}), so once again (5) is satisfied.
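Inequality (5) can be sanity-checked by simulation. The sketch below uses our own stylized numbers (target v_1 = 1, reward-phase payoffs 0 or 2, punishment-phase payoff 0), tracking the running max M̄_t and running min N̲_t as in the text:

```python
import random

# Simulate the regime-switching process and verify (5) at every step:
#   vhat + (M_t - M_bar_t) <= J_t <= vtil + (N_t - N_low_t),
# where vhat and vtil bound the one-period increments g - v1.
random.seed(2)
v1 = 1.0
vhat, vtil = -1.0, 1.0            # increment bounds for these stage payoffs
J = M = N = M_bar = N_low = 0.0
violations = 0
for t in range(100_000):
    if J > 0:                     # punishment period: payoff 0, increment -1
        inc = 0.0 - v1
        N += inc
    else:                         # reward period: payoff 0 or 2, increment -1 or +1
        inc = random.choice([0.0, 2.0]) - v1
        M += inc
    J += inc
    M_bar = max(M_bar, M)
    N_low = min(N_low, N)
    if not (vhat + (M - M_bar) - 1e-9 <= J <= vtil + (N - N_low) + 1e-9):
        violations += 1
print(violations)                 # 0: (5) holds along the whole path
```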
Lemma A.3 in the Appendix shows that (N_t - N̲_t)/t converges to zero almost surely: {N_t} is a supermartingale with uniformly bounded increments, since in punishment periods the static equilibrium σ̂ is played, so however player 1 plays, the conditional expectation of g_1 - v_1 is at most -v_1 < 0. By (5), then, limsup (1/T)J_T ≤ 0 almost surely, and since the per-period payoffs are uniformly bounded, limsup (1/T)E(J_T) ≤ 0 as well, which is claim (i). Lemma A.4 shows that if player 1 plays so that {M_t} is a submartingale, then (M̄_t - M_t)/t converges to zero almost surely as well. This is true when player 1 follows s_1, since in reward periods σ* is played and the conditional expectation of g_1 - v_1 is v̄_1 - v_1 ≥ 0. Claim (ii), and with it the result, follows. Q.E.D.
We can show that with our strategies, player 1 is punished infinitely often (J_t > 0) with probability one. This contrasts with Radner's construction of efficient equilibria for symmetric time-average partnership games, where the probability of infinite punishment is zero. It seems likely that our "target-strategy" approach provides another way of constructing efficient equilibria for those games; it would be interesting to know whether this could be extended to asymmetric partnerships. Our approach has the benefit of making more clear why the construction cannot be extended to the discounting case. It also avoids the need to invoke the law of the iterated logarithm, which may make the proof more intuitive, although we must use the strong law for martingales in its place.
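A simulation (again with our own toy numbers, not the paper's game) illustrates both claims: punishment recurs forever, yet its frequency and the average deviation from the target both vanish.

```python
import random

# Target v = 1.  Reward periods (J <= 0): payoff 0 or 2 with equal probability,
# so increments are +/-1 with mean 0.  Punishment periods (J > 0): payoff 0,
# so the increment is -1.
random.seed(0)
T = 200_000
J, punishments = 0.0, 0
for t in range(T):
    if J > 0:
        punishments += 1
        J -= 1.0
    else:
        J += random.choice([0.0, 2.0]) - 1.0

print(abs(J) / T)          # time-averaged gap from the target: near zero
print(punishments / T)     # punishment frequency: small
print(punishments > 0)     # yet punishment does occur (in fact infinitely often)
```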
6. Several Long-Run Players with Unobservable Mixed Strategies

The case of several long-run players is more complex, and we have not completely solved it. As before, we can construct mixed-strategy equilibria in which the long-run players do better than in any pure-strategy equilibrium, and once again they cannot do as well as if their mixed strategies were directly observable. However, we do not have a general characterization of the enforceable payoffs. Instead, we offer an example of payoffs that cannot be enforced, and a very restrictive condition that suffices for enforceability.
Figure 2 presents a 3-player version of the game in Figure 1. Row's and Col's choices and payoffs are exactly as before. The third player, DUMMY, who is a long-run player, receives 3 if Col plays L and receives 2 otherwise. The feasible payoffs for Row and DUMMY are depicted in Figure 3. Consider the feasible point at which p = 1/2 and Col plays L. Here Row and Dummy both receive 3. The argument of Section 3 shows that Row's best equilibrium payoff is not 3 but 2, which is the minimum of payoff over the actions in the support of her mixed strategy. Dummy is not mixing, so Dummy's minimum payoff over the support of her strategy is 3. (Indeed this is the minimum over the support of the product of the two strategies.) Thus one might hope that, by analogy to the proof of Proposition 3, we could show that the payoffs (2,3) were enforceable. But these payoffs are not even feasible! The highest Dummy's payoff can be when Row's payoff is 2 is strictly less than 3. (See Figure 3, which depicts the feasible set.) The problem is that an equilibrium in which Row usually randomizes must sometimes have Col play M or R to "tax away" Row's "excess gains" from playing U instead of D, and this "tax" imposes a cost on Dummy.
[Figure 2: the three-player stage game. Player two is a "dummy"; player three chooses COLs.]

[Figure 3: Feasible set (Player 1's payoff vs. Player 2's payoff) when three plays a SR best response.]
Next, consider the game in Figure 4.

[Figure 4: a three-matrix stage game; player 1 chooses rows U and D, player 2 chooses columns, and player 3 chooses one of the matrices A, B, and C.]
In this game, player 1 chooses Rows, player 2 chooses Columns, and player 3 chooses matrices; players 1 and 2 are long-lived while player 3 is short-lived. The unique one-shot equilibrium is (U,L,A) with payoff (1,1,1). The long-lived players can obtain a higher payoff if they induce player 3 to choose C, which requires both long-run players to use mixed strategies. [Let p = prob U and q = prob L; then player 3 chooses C if (1-p)(1-q) ≥ 1/10 and pq ≥ 1/100.] For example, if both long-run players use 50-50 randomization the payoffs are (13,3,0). Call this strategy σ̂ = (σ̂_1, σ̂_2).
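Player 3's best-response condition is simple arithmetic in (p,q); here is a minimal check for the 50-50 profile (our code, reading both inequalities as "≥" so that the 50-50 example satisfies them):

```python
# Player 3 chooses C when (1-p)(1-q) >= 1/10 and pq >= 1/100 (our reading of
# the condition).  At the 50-50 profile both products equal 1/4.
p = q = 0.5
print((1 - p) * (1 - q) >= 1/10)   # True: 0.25 >= 0.1
print(p * q >= 1/100)              # True: 0.25 >= 0.01
```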
Now let us explain how to enforce (2,2) as an equilibrium payoff. It will be clear that the construction we develop is somewhat more general than the example; we do not give the general version because it does not lead to a complete characterization of the equilibrium payoffs. As in the proofs of Propositions 1 and 2, the strategies we construct depend on the history of the game only through a number of "state variables," with the current state determined by last period's state and last period's outcome through a (commonly known) transition rule. Let D and R be the "first" strategies, denoted s_1(1) and s_2(1), and let U and L be the second, s_1(2) and s_2(2). Play begins in state 0. In this state, each player plays σ̂, which gives equal probability weight to his two actions. If player 1 plays action j and player 2 plays action k, the next period state is (j,k). The payoffs when beginning play in state (j,k) are denoted α(j,k) = (α_1(j,k), α_2(j,k)). In our construction, each player's continuation payoff will be independent of his opponents' last move, so that α_1(j,k) = α_1(j) and α_2(j,k) = α_2(k). Finally, the α's and the specified transition rule will be such that each player is indifferent between his pure strategies and thus is willing to randomize.
We will find it convenient to first define the α's, and then construct the associated strategies. Set the α's so that

(6)  v_i = (1-δ) g_i(s_i(j), σ̂_{-i}) + δ α_i(j).

In our example, α_1(1) solves 2 = (1-δ)(12) + δα_1(1), so α_1(1) = 12 - 10/δ. Similar computations yield α_1(2) = 14 - 12/δ, α_2(1) = 4 - 2/δ, and α_2(2) = 2.
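The closed forms above follow from solving (6) for α_i(j); a quick numerical check (our code, with the example's stage payoffs and the target v_1 = v_2 = 2):

```python
def alpha(v, g, delta):
    # Solve equation (6), v = (1 - delta) * g + delta * alpha, for alpha.
    return (v - (1 - delta) * g) / delta

delta = 0.9          # any delta in (0, 1) works; 0.9 is an arbitrary choice
v1 = v2 = 2.0        # the target payoffs being enforced

assert abs(alpha(v1, 12, delta) - (12 - 10 / delta)) < 1e-12   # alpha_1(1)
assert abs(alpha(v1, 14, delta) - (14 - 12 / delta)) < 1e-12   # alpha_1(2)
assert abs(alpha(v2, 4,  delta) - (4  - 2  / delta)) < 1e-12   # alpha_2(1)
assert abs(alpha(v2, 2,  delta) - 2.0) < 1e-12                 # alpha_2(2)
print("closed forms for the alphas check out")
```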
If the observed play is (s_1(1), s_2(1)), next period's state is (1,1), with payoffs α(1,1). Here play depends on a public randomization as follows. Choose a point w in the set P of payoffs attainable without private randomizations, and a probability p ∈ (0,1), such that

(7)  p(1-δ)w + (1-p)v = (1-δp)α(1,1).

(This is possible for δ sufficiently near to one. The general version of this construction imposes a requirement that guarantees (7) can be satisfied.) With probability p, players play the strategies that yield w, and the state remains at (1,1). With complementary probability, they play the mixed strategies (σ̂_1, σ̂_2), and the continuation payoffs are exactly as at state 0. Thus, the payoff to player i of choosing strategy s_i(j) is

(8)  p[(1-δ)w_i + δα_i(1)] + (1-p)[(1-δ)g_i(s_i(j), σ̂_{-i}) + δα_i(j)]
     = p[(1-δ)w_i + δα_i(1)] + (1-p)v_i = α_i(1)

for all strategies s_i(j) ∈ supp(σ̂_i), where the first equality uses (6) and the second uses (7). Once again, if any long-run player chooses an action not in the support, play reverts to the static Nash equilibrium.
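Given (6) and (7), the chain of equalities in (8) can be verified numerically; the sketch below (our choices for δ, p, and player 1's stage payoffs) confirms that both of player 1's actions yield α_1(1):

```python
# Numerical check that the payoff in (8) is independent of the chosen action
# and equals alpha_1(1), given (6) and (7).
delta, p, v1 = 0.95, 0.5, 2.0
g = {1: 12.0, 2: 14.0}                                         # stage payoffs
alpha = {j: (v1 - (1 - delta) * g[j]) / delta for j in (1, 2)}  # from (6)

# Choose w1 to satisfy player 1's component of (7):
w1 = ((1 - delta * p) * alpha[1] - (1 - p) * v1) / (p * (1 - delta))

for j in (1, 2):
    payoff = p * ((1 - delta) * w1 + delta * alpha[1]) \
           + (1 - p) * ((1 - delta) * g[j] + delta * alpha[j])
    assert abs(payoff - alpha[1]) < 1e-9    # both actions yield alpha_1(1)
print("indifference at state (1,1) verified")
```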
Now we must specify strategies at states other than (1,1). The state (2,2) occurs if the players chose their most preferred actions (U,L). Choose a point w' in P and a p' ∈ (0,1) such that

(9)  p'(1-δ)w' + (1-p')v = (1-δp')α(2,2).

(Once again this is possible for δ near to one.) And again, play depends on a public randomization, switching to w' with probability p' and otherwise following (σ̂_1, σ̂_2). At state (1,2), play switches to a point in P for one period with a probability chosen to normalize payoffs out to α(1,2); with complementary probability, players once again play the mixed strategies σ̂ and the same continuation payoffs are used. State (2,1) is symmetric.
Now let us argue that the constructed strategies are an equilibrium for δ sufficiently large. First, if there are no deviations, the payoffs starting in state (j,k) are α(j,k). If player i deviates to an action outside of the support of σ̂_i, or if either player deviates when the strategies say to play a pure strategy point, the deviation is detected, and play reverts to a static equilibrium. For δ near to one all of the α's exceed the static equilibrium payoff, and so for sufficiently large δ no player will choose such a deviation. And, by construction, players are indifferent between the actions in the support of their mixed strategy if they plan always to conform in the future. Then by the principle of optimality, no arbitrary sequence of unilateral deviations is profitable.
APPENDIX

In this appendix we consider discrete-parameter martingales {x_n, F_n}, n = 0, 1, ..., where {F_n} is a filtration on an underlying probability space. We assume that x_0 = 0.

Lemma A.1: Let {x_n, F_n} be a martingale sequence with bounded increments. (That is, for some number B, |x_n - x_{n-1}| ≤ B almost surely.) Then lim_{n→∞} x_n/n = 0 almost surely.

A proof of this lemma can be found in Hall and Heyde (1980, page 36ff).
We also use the following standard adaptation of this strong law:

Lemma A.2: For {x_n, F_n} as above, let X_n = min_{i≤n} x_i. Then lim_{n→∞} X_n/n = 0 almost surely.
Proof: Fix a sample path of the stochastic process. Since x_0 = 0, X_n ≤ 0 for all n, so we only have to show that lim inf X_n/n = 0. Suppose, instead, that {n_i} is a subsequence along which the limit is less than 0. For each n_i, there is m_i ≤ n_i with x_{m_i} = X_{n_i}, and thus x_{m_i}/m_i ≤ X_{n_i}/n_i. Hence, along the subsequence {m_i}, x_{m_i}/m_i is bounded away from zero, which violates the strong law; this can happen only on a null set. Q.E.D.
Lemma A.3: Let {x_n, F_n} be a supermartingale with bounded increments and with x_0 = 0. Let X_n be defined from {x_n} as in Lemma A.2. Then lim_{n→∞} (x_n - X_n)/n = 0 almost surely.
Proof: Since {x_n} is a supermartingale, E(x_n | F_{n-1}) - x_{n-1} is nonpositive. For n = 1, 2, ..., let ε_n = x_{n-1} - E(x_n | F_{n-1}) ≥ 0, let K_n = Σ_{i=1}^n ε_i (with K_0 = 0 by convention), and let y_n = x_n + K_n. Then {y_n, F_n} is a martingale sequence with bounded increments, and Lemmas A.1 and A.2 tell us that lim y_n/n = lim Y_n/n = 0, where Y_n = min{y_i : i = 1, ..., n}. Then, immediately, lim (y_n - Y_n)/n = 0. We are done, therefore, once we show that x_n - X_n ≤ y_n - Y_n pointwise. But this is easily done by induction. It is clearly true for n = 0. Assume it holds for n-1. If X_n = x_n, then x_n - X_n = 0 ≤ y_n - Y_n, and we are done. While if X_n = X_{n-1}, then

x_n - X_n = (x_n - x_{n-1}) + (x_{n-1} - X_{n-1}) ≤ (y_n - y_{n-1}) + (y_{n-1} - Y_{n-1}) = y_n - Y_{n-1} ≤ y_n - Y_n,

where the first inequality uses x_n - x_{n-1} = y_n - y_{n-1} - ε_n ≤ y_n - y_{n-1} together with the induction hypothesis, and the last inequality uses Y_n ≤ Y_{n-1}. Q.E.D.
A symmetrical argument completes the proof, and we obtain:

Lemma A.4: Let {x_n, F_n} be a submartingale with bounded increments and x_0 = 0. Let X_n = max{x_i : i = 1, ..., n}. Then lim_{n→∞} (X_n - x_n)/n = 0 almost surely.
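Lemma A.3 is easy to visualize by simulation; the sketch below (our parameters) tracks a bounded-increment supermartingale and its running minimum:

```python
import random

# A random walk with increments of +/-1 shifted by -0.01 is a supermartingale
# with bounded increments.  Lemma A.3 says (x_n - min_{i<=n} x_i)/n -> 0 a.s.
random.seed(1)
n_steps = 200_000
x, running_min = 0.0, 0.0
for _ in range(n_steps):
    x += random.choice([1.0, -1.0]) - 0.01
    running_min = min(running_min, x)

print((x - running_min) / n_steps)   # small, consistent with the lemma
```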
FOOTNOTES

1. The required discount factor can depend on the payoffs to be attained.

2. We thank Ian Johnstone for pointing us to this result.
REFERENCES

Abreu, D. [1986], "On the Theory of Infinitely Repeated Games with Discounting," mimeo.

Dybvig, P. and C. Spatt [1980], "Does it Pay to Maintain a Reputation?," mimeo.

Fudenberg, D. and D. Levine [1983], "Subgame-Perfect Equilibria of Finite and Infinite Horizon Games," Journal of Economic Theory, 31, 227-256.

Fudenberg, D. and E. Maskin [1986], "The Folk Theorem in Repeated Games with Discounting or With Incomplete Information," Econometrica, 54, 533-556.

Fudenberg, D. and E. Maskin [1987a], "Discounted Repeated Games with One-Sided Moral Hazard," mimeo, Harvard University.

Fudenberg, D. and E. Maskin [1987b], "Nash and Perfect Equilibria of Discounted Repeated Games," mimeo.

Fudenberg, D. and E. Maskin [1987c], "On the Dispensability of Public Randomizations in Discounted Repeated Games," mimeo.

Hall, P. and C.C. Heyde [1980], Martingale Limit Theory and Its Application, Academic Press, New York.

Kreps, D. [1984], "Corporate Culture," mimeo, Stanford Business School.

Radner, R. [1986], "Repeated Partnership Games with Imperfect Monitoring and No Discounting," Review of Economic Studies, 53, 43-58.

Radner, R., R. Myerson, and E. Maskin [1986], "An Example of a Repeated Partnership Game with Discounting and With Uniformly Inefficient Equilibria," Review of Economic Studies, 53, 59-70.

Selten, R. [1977], "The Chain-Store Paradox," Theory and Decision, 9, 127-159.

Shapiro, C. [1982], "Consumer Information, Product Quality, and Seller Reputation," Bell Journal of Economics, 13, 20-35.

Simon, H. [1951], "A Formal Theory of the Employment Relationship," Econometrica, 19, 293-305.