On the Equilibrium Payoff Set in Repeated Games with Imperfect Private Monitoring

Takuo Sugaya and Alexander Wolitzky
Stanford Graduate School of Business and MIT
April 19, 2015
Abstract

We provide simple sufficient conditions for the existence of a tight, recursive upper bound on the sequential equilibrium payoff set at a fixed discount factor in two-player repeated games with imperfect private monitoring. The bounding set is the sequential equilibrium payoff set with perfect monitoring and a mediator. We show that this bounding set admits a simple recursive characterization, which nonetheless necessarily involves the use of private strategies. Under our conditions, this set describes precisely those payoff vectors that arise in equilibrium for some private monitoring structure.
This replaces our earlier paper, “Perfect Versus Imperfect Monitoring in Repeated Games.” For helpful
comments, we thank Dilip Abreu, Gabriel Carroll, Henrique de Oliveira, Glenn Ellison, Drew Fudenberg,
Michihiro Kandori, Hitoshi Matsushima, Stephen Morris, Ilya Segal, Tadashi Sekiguchi, Andrzej Skrzypacz,
Satoru Takahashi, Juuso Toikka, and Thomas Wiseman.
1 Introduction
Like many dynamic economic models, repeated games are typically studied using recursive methods. In an incisive paper, Abreu, Pearce, and Stacchetti (1990; henceforth APS) recursively characterized the perfect public equilibrium payoff set at a fixed discount factor in repeated games with imperfect public monitoring. Their results (along with related contributions by Fudenberg, Levine, and Maskin (1994) and others) led to fresh perspectives on problems like collusion (Green and Porter, 1984; Athey and Bagwell, 2001), relational contracting (Levin, 2003), and government credibility (Phelan and Stacchetti, 2001). However, other important environments, like collusion with secret price cuts (Stigler, 1964) or relational contracting with subjective performance evaluations (Levin, 2003; MacLeod, 2003; Fuchs, 2007), involve imperfect private monitoring, and it is well known that the methods of APS do not easily extend to such settings (Kandori, 2002). Whether the equilibrium payoff set in repeated games with private monitoring exhibits any tractable recursive structure at all is thus a major question.
In this paper, we do not make any progress toward giving a recursive characterization of the sequential equilibrium payoff set in a repeated game with a given private monitoring structure. Instead, working in the context of two-player games, we provide simple conditions for the existence of a tight, recursive upper bound on this set.¹ Equivalently, under these conditions we give a recursive characterization of the set of payoffs that can be attained in equilibrium for some private monitoring structure. Thus, from the perspective of an observer who knows the monitoring structure, our results give an upper bound on how well players can do in a repeated game; while from the perspective of an observer who does not know the monitoring structure, our results exactly characterize how well the players can do.
Throughout the paper, the set we use to upper-bound the equilibrium payoff set with private monitoring is the equilibrium payoff set with perfect monitoring and a mediator (or, more precisely, the closure of the strict equilibrium payoff set with this information structure). We do not take a position on the realism of allowing for a mediator, and instead view the model with a mediator as a purely technical device that, as we show, is useful for bounding the equilibrium payoff set with private monitoring. We thus show that the equilibrium payoff set with private monitoring has a simple, recursive upper bound by establishing two main results:

1. Under some conditions, the equilibrium payoff set with perfect monitoring and a mediator is indeed an upper bound on the equilibrium payoff set with any private monitoring structure.

2. The equilibrium payoff set with perfect monitoring and a mediator has a simple recursive structure.

¹ The ordering on sets of payoff vectors here is not the usual one of set inclusion, but rather dominance in all non-negative Pareto directions. Thus, we say that $A$ upper-bounds $B$ if every payoff vector $b \in B$ is Pareto dominated by some payoff vector $a \in A$.
At first glance, it might seem surprising that any conditions at all are needed for the first of these results, as one might think that improving the precision of the monitoring structure and adding a mediator can only expand the equilibrium set. But this is not the case: giving a player more information about her opponents' past actions splits her information sets and thus gives her new ways to cheat, and indeed we show by example that (unmediated) imperfect private monitoring can sometimes outperform (mediated) perfect monitoring. Our first result provides sufficient conditions for this not to happen. Thus, another contribution of our paper is pointing out that perfect monitoring is not necessarily the optimal monitoring structure in a repeated game (even if it is advantaged by giving players access to a mediator), while also giving sufficient conditions under which perfect monitoring is indeed optimal.

Our sufficient condition for mediated perfect monitoring to outperform any private monitoring structure is that there is a feasible continuation payoff vector $v$ such that no player $i$ is tempted to deviate if she gets continuation payoff $v_i$ when she conforms and is minmaxed when she deviates. This is a joint restriction on the stage game and the discount factor, and it is essentially always satisfied when players are at least moderately patient. (They need not be extremely patient: none of our main results concern the limit $\delta \to 1$.) The reason why this condition is sufficient for mediated perfect monitoring to outperform private monitoring in two-player games is fairly subtle, so we postpone a detailed discussion.
Our second main result also involves some subtleties. In repeated games with perfect monitoring without a mediator, all strategies are public, so the sequential (equivalently, subgame perfect) equilibrium set coincides with the perfect public equilibrium set, which was recursively characterized by APS. On the other hand, with a mediator, who makes private action recommendations to the players, private strategies play a crucial role, and APS's characterization does not apply. We nonetheless show that the sequential equilibrium payoff set with perfect monitoring and a mediator does have a simple recursive structure. Under the sufficient conditions for our first result, a recursive characterization is obtained by replacing APS's generating operator $B$ with what we call a minmax-threat generating operator $\tilde{B}$: for any set of continuation payoffs $W$, the set $\tilde{B}(W)$ is the set of payoffs that can be attained when on-path continuation payoffs are drawn from $W$ and deviators are minmaxed.² To see intuitively why deviators can always be minmaxed in the presence of a mediator, and also why private strategies cannot be ignored, suppose that the mediator recommends a target action profile $a \in A$ with probability $1 - \varepsilon$, while recommending every other action profile with probability $\varepsilon / (|A| - 1)$; and suppose further that if some player $i$ deviates from her recommendation, the mediator then recommends that her opponents minmax her in every future period. In such a construction, player $i$'s opponents never learn that a deviation has occurred, and they are therefore always willing to follow the recommendation of minmaxing player $i$.³ (This construction clearly relies on private strategies: if the mediator's recommendations were public, players would always see when a deviation occurs, and they then might not be willing to minmax the deviator.) Our recursive characterization of the equilibrium payoff set with perfect monitoring and a mediator takes into account the possibility of minmaxing deviators in this way.

² The equilibrium payoff set with perfect monitoring and a mediator has a recursive structure whether or not the sufficient conditions for our first result are satisfied, but the characterization is somewhat more complicated in the general case. See Section 9.1.

³ In this construction, the mediator virtually implements the target action profile. For other applications of virtual implementation in games with a mediator, see Lehrer (1992), Mertens, Sorin, and Zamir (1994), Renault and Tomala (2004), Rahman and Obara (2010), and Rahman (2012).
We also consider several extensions of our results. Perhaps most importantly, we establish two senses in which the equilibrium payoff set with perfect monitoring and a mediator is a tight upper bound on the equilibrium payoff set with any private monitoring structure. First, mediated perfect monitoring is itself an example of a nonstationary monitoring structure, meaning that the distribution of signals can depend on everything that has happened in the past, rather than only on current actions. Thus, our upper bound is trivially tight in the space of nonstationary monitoring structures. Second, with a standard, stationary monitoring structure, where the signal distribution depends only on the current actions, we show that the mediator can be replaced by ex ante correlation and cheap talk. Hence, our upper bound is also tight in the space of stationary monitoring structures when an ex ante correlating device and cheap talk are available.
This paper is certainly not the first to develop recursive methods for private monitoring repeated games. In an early and influential paper, Kandori and Matsushima (1998) augment private monitoring repeated games with opportunities for public communication among players, and provide a recursive characterization of the equilibrium payoff set for a subclass of equilibria that is large enough to yield a folk theorem. Tomala (2009) gives related results when the repeated game is augmented with a mediator rather than only public communication. However, neither paper provides a recursive upper bound on the entire sequential equilibrium payoff set at a fixed discount factor in the repeated game.⁴ Amarante (2003) does give a recursive characterization of the equilibrium payoff set in private monitoring repeated games, but the state space in his characterization is the set of repeated game histories, which grows over time. In contrast, our upper bound is recursive in payoff space, just like in APS.

⁴ Ben-Porath and Kahneman (1996) and Compte (1998) also prove folk theorems for private monitoring repeated games with communication, but they do not emphasize recursive methods away from the $\delta \to 1$ limit. Lehrer (1992), Mertens, Sorin, and Zamir (1994), and Renault and Tomala (2004) characterize the communication equilibrium payoff set in undiscounted repeated games. These papers study how imperfect monitoring can limit the equilibrium payoff set without discounting, whereas our focus is on how discounting can limit the equilibrium payoff set independently of the monitoring structure.
A different approach is taken by Phelan and Skrzypacz (2012) and Kandori and Obara (2010), who develop recursive methods for checking whether a given finite-state strategy profile is an equilibrium in a private monitoring repeated game. Their results do not give a recursive characterization or upper bound on the equilibrium payoff set. The type of recursive methods used in their papers is also different: their methods for checking whether a given strategy profile is an equilibrium involve a recursion on the sets of beliefs that players can have about each other's states, rather than a recursion on payoffs.
Recently, Awaya and Krishna (2014) and Pai, Roth, and Ullman (2014) derive bounds on payoffs in private monitoring repeated games as a function of the monitoring structure. The bounds in these papers come from the observation that, if an individual's actions can have only a small impact on the distribution of signals, then the shadow of the future can have only a small effect on her incentives. In contrast, our payoff bounds apply for all monitoring structures, including those in which individuals' actions have a large impact on the signal distribution.⁵
Finally, we have emphasized that our results can be interpreted either as giving an upper bound on the equilibrium payoff set in a repeated game for a particular private monitoring structure, or as characterizing the set of payoffs that can arise in equilibrium in a repeated game for some private monitoring structure. With the latter interpretation, our paper shares a motivation with Bergemann and Morris (2013), who characterize the set of payoffs that can arise in equilibrium in a static incomplete information game for some information structure. Yet another interpretation of our results is that they establish that information is valuable in mediated repeated games, in that, under our sufficient conditions, players cannot benefit from imperfections in the monitoring technology. This interpretation connects our paper to the literature on the value of information in static incomplete information games (e.g., Gossner, 2000; Lehrer, Rosenberg, and Shmaya, 2010; Bergemann and Morris, 2013).

⁵ A working paper by Cherry and Smith (2011) is the first to consider the issue of upper-bounding the equilibrium payoff set with imperfect private monitoring with the equilibrium set with perfect monitoring and correlating devices. However, the result of theirs which is most relevant for us, their Theorem 3, appears to be contradicted by our example in Section 3 below.
games with imperfect private monitoring and repeated games with perfect monitoring and a
mediator, which are standard. Section 3 gives an example showing that private monitoring
can sometimes outperform perfect monitoring with a mediator.
Section 4 develops some
preliminary results about repeated games with perfect monitoring and a mediator. Section 5
presents our …rst main result, which gives su¢ cient conditions for such examples not to exist:
the proof of this result is lengthy and is deferred to Section 8. Section 6 presents our second
main result: a simple recursive characterization of the equilibrium payo¤ set with perfect
monitoring and a mediator.
Combining the results of Sections 5 and 6 gives the desired
recursive upper bound on the equilibrium payo¤ set with private monitoring.
Section 7
illustrates the calculation of the upper bound with an example. Section 9 discusses partial
versions of our results that apply when our su¢ cient conditions do not hold, as in the case
of more than two players, as well as the tightness of our upper bound. Section 10 concludes.
2 Model

A stage game $G = \langle I, (A_i)_{i \in I}, (u_i)_{i \in I} \rangle$ is repeated in periods $t = 1, 2, \ldots$, where $I = \{1, \ldots, |I|\}$ is the set of players, $A_i$ is the finite set of player $i$'s actions, and $u_i : A \to \mathbb{R}$ is player $i$'s payoff function. Players maximize expected discounted payoffs with common discount factor $\delta$. We compare the equilibrium payoff sets in this repeated game under private monitoring and mediated perfect monitoring.
2.1 Private Monitoring

In each period $t$, the game proceeds as follows: Each player $i$ takes an action $a_{i,t} \in A_i$. A signal $z_t = (z_{i,t})_{i \in I} \in (Z_i)_{i \in I} = Z$ is drawn from distribution $p(z_t \mid a_t)$, where $Z_i$ is the finite set of player $i$'s signals and $p(\cdot \mid a)$ is the monitoring structure. Player $i$ observes $z_{i,t}$.

A period $t$ history of player $i$'s is an element of $H_i^t = (A_i \times Z_i)^{t-1}$, with typical element $h_i^t = (a_{i,\tau}, z_{i,\tau})_{\tau=1}^{t-1}$, where $H_i^1$ consists of the null history $\emptyset$. A (behavior) strategy of player $i$'s is a map $\sigma_i : \bigcup_{t=1}^{\infty} H_i^t \to \Delta(A_i)$. Let $H^t = (H_i^t)_{i \in I}$.
We do not impose the common assumption that a player's payoff is measurable with respect to her own action and signal (i.e., that players observe their own payoffs), because none of our results need this assumption. All examples considered in the paper do, however, satisfy this measurability condition.
The solution concept is sequential equilibrium. More precisely, a belief system of player $i$'s is a map $\beta_i : \bigcup_{t=1}^{\infty} H_i^t \to \bigcup_{t=1}^{\infty} \Delta(H^t)$ satisfying $\mathrm{supp}\,\beta_i(h_i^t) \subseteq \{h_i^t\} \times H_{-i}^t$ for all $t$; we also write $\beta_i(h^t \mid h_i^t)$ for the probability of $h^t$ under $\beta_i(h_i^t)$. We say that an assessment $(\sigma, \beta)$ constitutes a sequential equilibrium if the following two conditions are satisfied:

1. [Sequential rationality] For each player $i$ and history $h_i^t$, $\sigma_i$ maximizes player $i$'s expected continuation payoff at history $h_i^t$ under belief $\beta_i(h_i^t)$.

2. [Consistency] There exists a sequence of completely mixed strategy profiles $(\sigma^n)$ such that the following two conditions hold:

(a) $\sigma^n$ converges to $\sigma$ (pointwise in $t$): For all $\varepsilon > 0$ and $t$, there exists $N$ such that, for all $n > N$,
$$\left| \sigma_i^n(h_i^t) - \sigma_i(h_i^t) \right| < \varepsilon \quad \text{for all } i \in I,\; h_i^t \in H_i^t.$$

(b) Conditional probabilities converge to $\beta$ (pointwise in $t$): For all $\varepsilon > 0$ and $t$, there exists $N$ such that, for all $n > N$,
$$\left| \frac{\Pr^{\sigma^n}(h_i^t, h_{-i}^t)}{\sum_{\tilde{h}_{-i}^t} \Pr^{\sigma^n}(h_i^t, \tilde{h}_{-i}^t)} - \beta_i(h_i^t, h_{-i}^t \mid h_i^t) \right| < \varepsilon \quad \text{for all } i \in I,\; h_i^t \in H_i^t,\; h_{-i}^t \in H_{-i}^t.$$

We choose this relatively permissive definition of consistency (requiring that strategies and beliefs converge only pointwise in $t$) to make our results upper-bounding the sequential equilibrium payoff set stronger. The results with a more restrictive definition (requiring uniform convergence) would be essentially the same.
2.2 Mediated Perfect Monitoring

In each period $t$, the game proceeds as follows: A mediator sends a private message $m_{i,t} \in M_i$ to each player $i$, where $M_i$ is a finite message set for player $i$. Each player $i$ takes an action $a_{i,t} \in A_i$. All players and the mediator observe the action profile $a_t \in A$.

A period $t$ history of the mediator's is an element of $H_m^t = (M \times A)^{t-1}$, with typical element $h_m^t = (m_\tau, a_\tau)_{\tau=1}^{t-1}$, where $H_m^1$ consists of the null history. A strategy of the mediator's is a map $\mu : \bigcup_{t=1}^{\infty} H_m^t \to \Delta(M)$. A period $t$ history of player $i$'s is an element of $H_i^t = (M_i \times A)^{t-1} \times M_i$, with typical element $h_i^t = ((m_{i,\tau}, a_\tau)_{\tau=1}^{t-1}, m_{i,t})$, where $H_i^1 = M_i$.⁶ A strategy of player $i$'s is a map $\sigma_i : \bigcup_{t=1}^{\infty} H_i^t \to \Delta(A_i)$.

The definition of sequential equilibrium is the same as with private monitoring, except that sequential rationality is imposed (and beliefs are defined) only at histories consistent with the mediator's strategy. The interpretation is that the mediator is not a player in the game, but rather a "machine" that cannot tremble. Note that with this definition, an assessment (including the mediator's strategy) $(\mu, \sigma, \beta)$ is a sequential equilibrium with mediated perfect monitoring if and only if $(\sigma, \beta)$ is a sequential equilibrium with the "nonstationary" private monitoring structure where $Z_i = M_i \times A$ and $p(\cdot \mid h_m^{t+1})$ coincides with perfect monitoring of actions with messages given by $\mu(h_m^{t+1})$ (see Section 9.2).

As in Forges (1986) and Myerson (1986), any equilibrium distribution over (infinite paths of) actions arises in an equilibrium of the following form:

1. [Messages are action recommendations] $M = A$.

2. [Obedience/incentive compatibility] At history $h_i^t = ((r_{i,\tau}, a_\tau)_{\tau=1}^{t-1}, r_{i,t})$, player $i$ plays $a_{i,t} = r_{i,t}$.

Without loss of generality, we restrict attention to such obedient equilibria throughout.⁷

⁶ We also occasionally write $h_i^t$ for $(m_{i,\tau}, a_\tau)_{\tau=1}^{t-1}$, omitting the period $t$ message $m_{i,t}$.

⁷ Dhillon and Mertens (1996) show that the revelation principle fails for trembling-hand perfect equilibria. Nonetheless, with our "machine" interpretation of the mediator, the revelation principle applies for sequential equilibrium by precisely the usual argument of Forges (1986).
Finally, we say that a sequential equilibrium with mediated perfect monitoring is on-path strict if following the mediator's recommendation is strictly optimal for each player $i$ at every on-path history $h_i^t$. Let $E^{med}(\delta)$ denote the closure of the set of on-path strict sequential equilibrium payoffs. For the rest of the paper, we slightly abuse terminology by omitting the qualifier "on-path" when discussing such equilibria.
3 An Illustrative (Counter)Example

The goal of this paper is to provide sufficient conditions for the equilibrium payoff set with mediated perfect monitoring to be a (recursive) upper bound on the equilibrium payoff set with private monitoring. Before giving these main results, we provide an illustrative example showing why, in the absence of our sufficient conditions, private monitoring (without a mediator) can outperform mediated perfect monitoring. Readers eager to get to the results can skip this section without loss of continuity.
Consider the repetition of the following stage game, with $\delta = \frac{1}{6}$:

$$\begin{array}{c|ccc}
 & L & M & R \\ \hline
U & 2,\,2 & 1,\,0 & 1,\,0 \\
D & 3,\,0 & 0,\,0 & 0,\,0 \\
T & 0,\,3 & 6,\,3 & 0,\,3 \\
B & 0,\,3 & 0,\,3 & 6,\,3
\end{array}$$
We show the following:

Proposition 1 In this game, there is no sequential equilibrium where the players' per-period payoffs sum to more than 3 with perfect monitoring and a mediator, while there is such a sequential equilibrium with some private monitoring structure.

Proof. See appendix.

In the constructed private monitoring structure in the proof of Proposition 1, players' payoffs are measurable with respect to their own actions and signals. In addition, a similar argument shows that imperfect public monitoring (with private strategies) can also outperform mediated perfect monitoring. This shows that some conditions are also required to guarantee that the sequential equilibrium payoff set with mediated perfect monitoring is an upper bound on the sequential equilibrium payoff set with imperfect public monitoring. Since imperfect public monitoring is a special case of imperfect private monitoring, our sufficient conditions for the private monitoring case are enough.
Here is a sketch of the proof of Proposition 1. Note that $(U, L)$ is the only action profile where payoffs sum to more than 3. Because $\delta$ is so low, player 1 (row player, "she") can be induced to play $U$ in response to $L$ only if action profile $(U, L)$ is immediately followed by $(T, M)$ with high enough probability: specifically, this probability must exceed $\frac{3}{5}$. With perfect monitoring, player 2 (column player, "he") must then "see $(T, M)$ coming" with probability at least $\frac{3}{5}$ following $(U, L)$, and this probability is so high that player 2 will deviate from $M$ to $L$ (regardless of the specification of continuation play). This shows that payoffs cannot sum to more than 3 with perfect monitoring.

On the other hand, with private monitoring, player 2 may not know whether $(U, L)$ has just occurred, and therefore may be unsure of whether the next action profile will be $(T, M)$ or $(B, M)$, which can give him the necessary incentive to play $M$ rather than $L$. In particular, suppose that player 1 mixes $\frac{1}{3} U + \frac{2}{3} D$ in period 1, and the monitoring structure is such that player 2 gets signal $m$ ("play $M$") with probability 1 following $(U, L)$, and gets signals $m$ and $r$ ("play $R$") with probability $\frac{1}{2}$ each following $(D, L)$. Suppose further that player 1 plays $T$ in period 2 if she played $U$ in period 1, and plays $B$ in period 2 if she played $D$ in period 1. Then, when player 2 sees signal $m$ in period 1, his posterior belief that player 1 played $U$ in period 1 is
$$\frac{\frac{1}{3}(1)}{\frac{1}{3}(1) + \frac{2}{3}\left(\frac{1}{2}\right)} = \frac{1}{2}.$$
Player 2 therefore expects to face $T$ and $B$ in period 2 with probability $\frac{1}{2}$ each, so he is willing to play $M$ rather than $L$. Meanwhile, player 1 is always rewarded with $(T, M)$ in period 2 when she plays $U$ in period 1, so she is willing to play $U$ (as well as $D$) in period 1.

To summarize, the advantage of private monitoring is that pooling players' information sets (in this case, player 2's information sets after $(U, L)$ and $(D, L)$) can make providing incentives easier.⁸
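As a quick numerical sanity check on the belief calculation above, the following Python sketch (our own illustration, not part of the paper) applies Bayes' rule to the construction just described; player 1's period-1 mixture and the signal probabilities are taken from the text, and the variable names are ours.

```python
# Bayes' rule for player 2's posterior after observing signal m in period 1,
# in the private monitoring construction described above.

prob_U = 1 / 3          # player 1's period-1 probability of U
prob_D = 2 / 3          # player 1's period-1 probability of D

prob_m_given_UL = 1.0   # signal m always follows (U, L)
prob_m_given_DL = 0.5   # signal m follows (D, L) half the time

posterior_U = (prob_U * prob_m_given_UL) / (
    prob_U * prob_m_given_UL + prob_D * prob_m_given_DL
)

print(posterior_U)  # 0.5: player 2 expects T and B next period with equal probability
```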
To preview our results, we will show that private monitoring cannot outperform mediated perfect monitoring when there exists a feasible payoff vector that is appealing enough to both players that neither is tempted to deviate when it is promised to them in continuation if they conform. This condition is violated in the current example because, for example, no feasible continuation payoff for player 2 is high enough to induce him to respond to $T$ with $M$ rather than $L$. Specifically, our condition is satisfied in the current example if and only if $\delta$ is greater than $\frac{19}{25}$. Thus, in this example, private monitoring can outperform mediated perfect monitoring if $\delta = \frac{1}{6}$, but not if $\delta > \frac{19}{25}$.⁹
⁸ As far as we know, the observation that players can benefit from imperfections in monitoring even in the presence of a mediator is original. Examples by Kandori (1991), Sekiguchi (2002), Mailath, Matthews, and Sekiguchi (2002), and Miyahara and Sekiguchi (2013) show that players can benefit from imperfect monitoring in finitely repeated games. However, in their examples this conclusion relies on the absence of a mediator, and is thus due to the possibilities for correlation opened up by private monitoring. The broader point that giving players more information can be bad for incentives is of course an old one.

⁹ We do not find the lowest possible discount factor $\underline{\delta}$ such that mediated perfect monitoring outperforms private monitoring for all $\delta > \underline{\delta}$.

4 Preliminary Results about $E^{med}(\delta)$

We begin with two preliminary results about the equilibrium payoff set with mediated perfect monitoring. These results are important for both our result on when private monitoring cannot outperform mediated perfect monitoring (Theorem 1) and our characterization of the equilibrium payoff set with mediated perfect monitoring (Theorem 2).

Let $\underline{u}_i$ be player $i$'s correlated minmax payoff, given by
$$\underline{u}_i = \min_{\alpha_{-i} \in \Delta(A_{-i})} \max_{a_i \in A_i} u_i(a_i, \alpha_{-i}).$$
Let $\alpha^*_{-i} \in \Delta(A_{-i})$ be a solution to this minmax problem. Let $d_i$ be player $i$'s greatest possible gain from a deviation at any recommendation profile, given by
$$d_i = \max_{r \in A,\; a_i \in A_i} \left[ u_i(a_i, r_{-i}) - u_i(r) \right].$$
Let $\underline{w}_i$ be the lowest continuation payoff such that player $i$ does not want to deviate at any recommendation profile when she is minmaxed forever if she deviates, given by
$$\underline{w}_i = \underline{u}_i + \frac{1-\delta}{\delta} d_i.$$
Let
$$W_i = \left\{ w \in \mathbb{R}^{|I|} : w_i \geq \underline{w}_i \right\}.$$
Finally, let $u(A)$ be the convex hull of the set of feasible payoffs, and denote the interior of $\bigcap_{i \in I} W_i \cap u(A)$, taken as a subspace of $u(A)$, by
$$W \equiv \mathrm{int}\left( \bigcap_{i \in I} W_i \cap u(A) \right), \qquad \text{with } \overline{W} \equiv \bigcap_{i \in I} W_i \cap u(A).$$
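For readers who want to compute these objects, the following Python sketch (our own illustration, not part of the paper) shows how $\underline{u}_i$ and $d_i$ can be obtained for a finite two-player game: the correlated minmax payoff is the value of a small linear program over the opponent's mixed actions, and the maximal deviation gain is found by enumeration. The payoff arrays at the bottom are placeholders to be replaced with the game of interest.

```python
import numpy as np
from scipy.optimize import linprog

def correlated_minmax(u_i):
    """Minmax payoff of player i in a two-player game.

    u_i[a_i, a_j] is player i's stage payoff.  We minimize t over the
    opponent's mixed action alpha subject to
        sum_j alpha[j] * u_i[a_i, j] <= t   for every a_i.
    """
    n_i, n_j = u_i.shape
    c = np.zeros(n_j + 1)
    c[-1] = 1.0                                   # objective: minimize t
    A_ub = np.hstack([u_i, -np.ones((n_i, 1))])   # payoff of each reply <= t
    b_ub = np.zeros(n_i)
    A_eq = np.hstack([np.ones((1, n_j)), np.zeros((1, 1))])   # probs sum to 1
    b_eq = np.array([1.0])
    bounds = [(0, None)] * n_j + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.fun

def max_deviation_gain(u_i):
    """d_i = max over profiles r and deviations a_i of u_i(a_i, r_-i) - u_i(r)."""
    best_reply = u_i.max(axis=0)                  # best payoff against each column
    return float((best_reply[None, :] - u_i).max())

# Placeholder stage game (rows = own actions, columns = opponent's actions):
u1 = np.array([[1.0, 0.0], [2.0, 0.5]])
u2 = np.array([[1.0, 2.0], [0.0, 0.5]])           # indexed by (a1, a2)

print(correlated_minmax(u1), max_deviation_gain(u1))
print(correlated_minmax(u2.T), max_deviation_gain(u2.T))
```

With more than two players the same idea applies, except that the minimization runs over correlated profiles of the opponents' actions; the LP above would then have one variable per opponent action profile.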
Our first preliminary result is that all payoffs in $W$ are attainable in equilibrium with mediated perfect monitoring. See Figure 1. The intuition is that, as discussed in the introduction, the mediator can virtually implement any payoff vector in $W$ by minmaxing deviators.
Figure 1: $W \subseteq E^{med}(\delta)$.

We will actually prove the slightly stronger result that all payoffs in $W$ are attainable in a strict "full-support" equilibrium with mediated perfect monitoring. Formally, we say that an equilibrium has full support if for each player $i$ and history $h_i^t = (r_{i,\tau}, a_\tau)_{\tau=1}^{t-1}$ such that there exists $(r_{-i,\tau})_{\tau=1}^{t-1}$ with $\Pr\left(r_\tau \mid (r_{\tau'}, a_{\tau'})_{\tau'=1}^{\tau-1}\right) > 0$ for each $\tau = 1, \ldots, t-1$, there exists $(r_{-i,\tau})_{\tau=1}^{t-1}$ such that for each $\tau = 1, \ldots, t-1$ we have
$$\Pr\left(r_{i,\tau}, r_{-i,\tau} \mid (r_{i,\tau'}, r_{-i,\tau'}, a_{\tau'})_{\tau'=1}^{\tau-1}\right) > 0 \quad \text{and} \quad r_{-i,\tau} = a_{-i,\tau}.$$
That is, any history $h_i^t$ consistent with the mediator's strategy is also consistent with $i$'s opponents' equilibrium strategies (even if player $i$ herself has deviated, noting that we allow $r_{i,\tau} \neq a_{i,\tau}$ in $h_i^t$). This is weaker than requiring that the mediator's recommendation has full support at all histories (on- and off-path), but stronger than requiring that the recommendation has full support at all on-path histories only. Note that, if the equilibrium has full support, player $i$ never believes that any of the other players has deviated.
Lemma 1 For all $v \in W$, there exists a strict full-support equilibrium with mediated perfect monitoring with payoff $v$. In particular, $W \subseteq E^{med}(\delta)$.

Proof. For each $v \in W$, there exists $\alpha \in \Delta(A)$ such that $u(\alpha) = v$ and $\alpha(r) > 0$ for all $r \in A$. On the other hand, for each $i \in I$ and $\varepsilon \in (0,1)$, approximate the minmax strategy $\alpha^*_{-i}$ by the full-support strategy $\alpha^\varepsilon_{-i} \equiv (1-\varepsilon)\,\alpha^*_{-i} + \varepsilon \sum_{a_{-i} \in A_{-i}} \frac{a_{-i}}{|A_{-i}|}$. Since $v \in \mathrm{int} \bigcap_{i \in I} W_i$, there exists $\varepsilon \in (0,1)$ such that, for each $i \in I$, we have
$$v_i > \max_{a_i \in A_i} u_i(a_i, \alpha^\varepsilon_{-i}) + \frac{1-\delta}{\delta} d_i. \qquad (1)$$

Consider the following recommendation schedule: The mediator follows an automaton strategy whose state is identical to a subset of players $J \subseteq I$. Hence, the mediator has $2^{|I|}$ states. In the following construction of the mediator's strategy, $J$ will represent the set of players who have ever deviated from the mediator's recommendation.

If the state $J$ is equal to $\emptyset$ (no player has deviated), then the mediator recommends $\alpha$. If there exists $i$ with $J = \{i\}$ (only player $i$ has deviated), then the mediator recommends $r_{-i}$ to players $-i$ according to $\alpha^\varepsilon_{-i}$, and recommends some best response to $\alpha^\varepsilon_{-i}$ to player $i$. Finally, if $|J| \geq 2$ (several players have deviated), then for each $i \in J$, the mediator recommends the best response to $\alpha^\varepsilon_{-i}$ to player $i$, while she recommends each profile $a_{-J} \in A_{-J}$ to the other players $-J$ with probability $\frac{1}{|A_{-J}|}$. The state transitions as follows: if the current state is $J$ and players $J'$ deviate, then the state transitions to $J \cup J'$.¹⁰

Player $i$'s strategy is to follow her recommendation $r_{i,t}$ in period $t$. She believes that the mediator's state is $\emptyset$ if she herself has never deviated, and believes that the state is $\{i\}$ if she has deviated.

Since the mediator's recommendation has full support, player $i$'s belief is consistent. (In particular, no matter how many times player $i$ has been instructed to minmax some player $j$, it is always infinitely more likely that these instructions resulted from randomization by the mediator rather than a deviation by player $j$.) If player $i$ has deviated, then (given her belief) it is optimal for her to always play a static best response to $\alpha^\varepsilon_{-i}$, since the mediator always recommends $\alpha^\varepsilon_{-i}$ in state $\{i\}$. Given that a unilateral deviation by player $i$ is punished in this way, (1) implies that on path player $i$ has a strict incentive to follow her recommendation $r_{i,t}$ at any recommendation profile $r_t \in A$. Hence, she has a strict incentive to follow her recommendation when she believes that $r_{-i,t}$ is distributed according to $\Pr(r_{-i,t} \mid h_i^t)$.

¹⁰ We thank Gabriel Carroll for suggestions which helped simplify this construction.
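To make the automaton in this proof concrete, here is a short Python sketch of the mediator's recommendation rule and state transitions in the two-player case (our own illustration; the distributions $\alpha$ and $\alpha^\varepsilon_{-i}$ and the static best replies are inputs supplied by the user, and players are indexed 0 and 1).

```python
import random

class Mediator:
    """Mediator automaton from the proof of Lemma 1 (two-player case).

    alpha: dict mapping action profiles (a0, a1) to probabilities (full support).
    alpha_eps: alpha_eps[i] is a dict over the actions of player i's opponent,
        the full-support approximation of the minmax strategy against player i.
    best_reply: best_reply[i] is player i's static best reply to alpha_eps[i].
    """

    def __init__(self, alpha, alpha_eps, best_reply):
        self.alpha = alpha
        self.alpha_eps = alpha_eps
        self.best_reply = best_reply
        self.state = frozenset()       # set J of players who have ever deviated

    def recommend(self):
        if self.state == frozenset():
            profiles, probs = zip(*self.alpha.items())
            return random.choices(profiles, weights=probs)[0]
        rec = [None, None]
        for i in (0, 1):
            if i in self.state:
                # A past deviator is told to best-respond to her punishment.
                rec[i] = self.best_reply[i]
            else:
                # A non-deviator is told to minmax the (unique) deviator.
                deviator = 1 - i
                actions, probs = zip(*self.alpha_eps[deviator].items())
                rec[i] = random.choices(actions, weights=probs)[0]
        return tuple(rec)

    def update(self, recommendation, action_profile):
        deviators = {i for i in (0, 1) if action_profile[i] != recommendation[i]}
        self.state = self.state | frozenset(deviators)
```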
The condition that $W \neq \emptyset$ can be more transparently stated as a lower bound on the discount factor. In particular, $W \neq \emptyset$ if and only if there exists $v \in u(A)$ such that
$$v_i > \underline{u}_i + \frac{1-\delta}{\delta} d_i \quad \text{for all } i \in I,$$
or equivalently
$$\delta > \delta^* \equiv \min_{v \in u(A)} \max_{i \in I} \frac{d_i}{d_i + v_i - \underline{u}_i}.$$
For instance, it can be checked that $\delta^* = \frac{19}{25}$ in the example of Section 3. Note that $\delta^*$ is strictly less than 1 if and only if the stage game admits a feasible and strictly individually rational payoff vector (relative to correlated minmax payoffs).¹¹ For most games of interest, $\delta^*$ will be some "intermediate" discount factor that is not especially close to either 0 or 1.
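The algebra behind this equivalence is a one-line verification using the definitions above: for a feasible $v$ and a player $i$ with $v_i > \underline{u}_i$,
$$v_i > \underline{u}_i + \frac{1-\delta}{\delta}\, d_i
\;\Longleftrightarrow\;
\delta\,(v_i - \underline{u}_i) > (1-\delta)\, d_i
\;\Longleftrightarrow\;
\delta > \frac{d_i}{d_i + v_i - \underline{u}_i},$$
so $W \neq \emptyset$ exactly when some feasible $v$ satisfies the last inequality for both players simultaneously, which is the displayed condition $\delta > \delta^*$.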
Our second preliminary result is that, if a strict full-support equilibrium exists, then any payoff vector that can be attained by a mediator's strategy that is incentive compatible on path is (virtually) attainable in strict equilibrium. The intuition is that mixing such a mediator's strategy with an $\varepsilon$ probability of playing any strict full-support equilibrium yields a strict equilibrium with nearby payoffs.

Lemma 2 With mediated perfect monitoring, fix a payoff vector $v$, and suppose there exists a mediator's strategy $\mu$ that (1) attains $v$ when players obey the mediator, and (2) has the property that obeying the mediator is optimal for each player at each on-path history, when she is minmaxed forever if she deviates: that is, for each player $i$ and on-path history $h_m^{t+1}$,
$$(1-\delta)\, \mathbb{E}\left[ u_i(r_t) \mid h_m^t, r_{i,t} \right] + \mathbb{E}\left[ (1-\delta) \sum_{\tau=t+1}^{\infty} \delta^{\tau-t} u_i(\mu(h_m^\tau)) \,\Big|\, h_m^t, r_{i,t} \right] \geq \max_{a_i \in A_i} (1-\delta)\, \mathbb{E}\left[ u_i(a_i, r_{-i,t}) \mid h_m^t, r_{i,t} \right] + \delta\, \underline{u}_i. \qquad (2)$$
Suppose also that there exists a strict full-support equilibrium. (For example, such an equilibrium exists if $W \neq \emptyset$, by Lemma 1.) Then $v \in E^{med}(\delta)$.

Proof. See appendix.

¹¹ Recall that a payoff vector $v$ is strictly individually rational if $v_i > \underline{u}_i$ for all $i \in I$.
5 A Sufficient Condition for $E^{med}(\delta)$ to Give an Upper Bound

Our sufficient condition for mediated perfect monitoring to outperform private monitoring in two-player games is that $\delta > \delta^*$. In Section 9, we discuss what happens when there are more than two players or the condition that $\delta > \delta^*$ is relaxed.

Let $E(\delta, p)$ be the set of (possibly weak) sequential equilibrium payoffs with private monitoring structure $p$. The following is our first main result. Note that $E(\delta, p)$ is closed, as we use the product topology on assessments (Fudenberg and Levine, 1983), so the maxima in the theorem exist.

Theorem 1 If $|I| = 2$ and $\delta > \delta^*$, then for every private monitoring structure $p$ and every non-negative Pareto weight $\lambda \in \Lambda^+ \equiv \{\lambda \in \mathbb{R}^2_+ : \|\lambda\| = 1\}$, we have
$$\max_{v \in E(\delta, p)} \lambda \cdot v \;\leq\; \max_{v \in E^{med}(\delta)} \lambda \cdot v.$$

Theorem 1 says that, in games involving two players of at least moderate patience, the Pareto frontier of the (closure of the strict) equilibrium payoff set with mediated perfect monitoring extends farther in any non-negative direction than does the Pareto frontier of the equilibrium payoff set with any private monitoring structure.¹² We emphasize that, like all our main results, Theorem 1 concerns equilibrium payoff sets at fixed discount factors, not in the $\delta \to 1$ limit.
We describe the idea of the proof of Theorem 1, deferring the proof itself to Section 8. Let $E(\delta)$ be the equilibrium payoff set in the mediated repeated game with the following universal information structure: the mediator directly observes the recommendation profile $r_t$ and the action profile $a_t$ in each period $t$, while each player $i$ observes nothing beyond her own recommendation $r_{i,t}$ and her own action $a_{i,t}$.¹³ With this monitoring structure, the mediator can clearly replicate any private monitoring structure $p$ by setting $\mu(h_m^t)$ equal to $p(\cdot \mid a_{t-1})$ for every history $h_m^t = (r_\tau, a_\tau)_{\tau=1}^{t-1}$. In particular, we have $E(\delta, p) \subseteq E(\delta)$ for every $p$,¹⁴ so to prove Theorem 1 it suffices to show that
$$\max_{v \in E(\delta)} \lambda \cdot v \;\leq\; \max_{v \in E^{med}(\delta)} \lambda \cdot v. \qquad (3)$$
(The maximum over $v \in E(\delta)$ is well defined since $E(\delta)$ is closed, by the same reasoning as for $E(\delta, p)$.)

¹² We do not know if the same result holds for negative Pareto weights.

¹³ This information structure may not result from mediated communication among the players, as actions are not publicly observed. Again, we simply view $E(\delta)$ as a technical device.
To show this, the idea is to start with an equilibrium in $E(\delta)$, where players only observe their own recommendations, and then show that the players' recommendations can be "publicized" without violating anyone's obedience constraints.¹⁵ To see why this is possible (when $|I| = 2$ and $\delta > \delta^*$, or equivalently $W \neq \emptyset$), first note that we can restrict attention to equilibria with Pareto-efficient on-path continuation payoffs, as improving both players' on-path continuation payoffs improves their incentives (assuming that deviators are minmaxed, which is possible when $W \neq \emptyset$, by Lemma 2). Next, if $|I| = 2$ and $W \neq \emptyset$, then if a Pareto-efficient payoff vector $v$ lies outside of $W_i$ for one player (say player 2), it must then lie inside of $W_j$ for the other player (player 1). Hence, at each history $h^t$, there can be only one player, here player 2, whose obedience constraint could be violated if we publicized both players' past recommendations.

Now, suppose that at history $h^t$ we do publicize the entire vector of players' past recommendations $r^t = (r_\tau)_{\tau=1}^{t-1}$, but the mediator then issues period $t$ recommendations according to the original equilibrium distribution of recommendations conditional on player 2's past recommendations $r_2^t = (r_{2,\tau})_{\tau=1}^{t-1}$ only. We claim that doing this violates neither player's obedience constraint: Player 1's obedience constraint is easy to satisfy, as we can always ensure that continuation payoffs lie in $W_1$. And, since player 2 already knew $r_2^t$ in the original equilibrium, publicizing $h^t$ while issuing recommendations based only on $r_2^t$ does not affect his incentives.

¹⁴ In light of this fact, the reader may wonder why we do not simply take $E(\delta)$ rather than $E^{med}(\delta)$ to be our bound on equilibrium payoffs with private monitoring. The answer is that, as far as we know, $E(\delta)$ does not admit a recursive characterization, while we recursively characterize $E^{med}(\delta)$ in Section 6.

¹⁵ More precisely, the construction in the proof both publicizes the players' recommendations and modifies the equilibrium in ways that only improve the players' $\lambda$-weighted payoffs.
An important missing step in this proof sketch is that, in the original equilibrium in $E(\delta)$, at some histories it may be player 1 who is tempted to deviate when we publicize past recommendations (while it is player 2 who is tempted at other histories). For instance, it is not clear how we can publicize past recommendations when ex ante equilibrium payoffs are very good for player 1 and so $v \in W_1$ (so player 2 is tempted to deviate in period 1), but continuation payoffs at some later history are very good for player 2 (so then player 1 is tempted to deviate). The proof of Theorem 1 shows that we can ignore this possibility, because, somewhat unexpectedly, equilibrium paths like this one are never needed to sustain Pareto-efficient payoffs. In particular, to sustain an ex ante payoff that is very good for player 1 (i.e., outside $W_2$), we never need to promise continuation payoffs that are very good for player 2 (i.e., outside $W_1$). The intuition is that, rather than promising player 2 a very good continuation payoff outside $W_1$, we can instead promise him a fairly good continuation inside $W_1$, while compensating him for this change by also occasionally transitioning to this fairly good continuation payoff at histories where the original promised continuation payoff is less good for him. Finally, since the feasible payoff set is convex, the resulting "compromise" continuation payoff vector is also acceptable to player 1.
6 Recursively Characterizing $E^{med}(\delta)$

We have seen that $E^{med}(\delta)$ is an upper bound on $E(\delta, p)$ for two-player games satisfying $\delta > \delta^*$. As our goal is to give a recursive upper bound on $E(\delta, p)$, it remains to recursively characterize $E^{med}(\delta)$. Our characterization assumes that $\delta > \delta^*$, but it applies for any number of players.

Recall that APS characterize the perfect public equilibrium set with imperfect public monitoring as the iterative limit of a generating operator $B$, where $B(W)$ is defined as the set of payoffs that can be sustained when on- and off-path continuation payoffs are drawn from $W$. What we show is that the sequential equilibrium payoff set with mediated perfect monitoring is the iterative limit of a generating operator $\tilde{B}$, where $\tilde{B}(W)$ is the set of payoffs that can be sustained when on-path continuation payoffs are drawn from $W$, and deviators are minmaxed off path. Intuitively, there are two things to show: (1) we can indeed minmax deviators off path, and (2) on-path continuation payoffs must themselves be sequential equilibrium payoffs. The first of these facts is Lemma 2. For the second, note that, in an obedient equilibrium with perfect monitoring, players can perfectly infer each other's private history on path. Continuation play at on-path histories (but not off-path histories) is therefore "common knowledge," which gives the desired recursive structure.

In the following formal development, we assume familiarity with APS (see also Section 7.3 of Mailath and Samuelson, 2006), and focus on the new features that emerge when mediation is available.
Definition 1 For any set $V \subseteq \mathbb{R}^{|I|}$, a correlated action profile $\alpha \in \Delta(A)$ is minmax-threat enforceable on $V$ by a mapping $\gamma : A \to V$ if, for each player $i$ and action $a_i \in \mathrm{supp}\,\alpha_i$,
$$\mathbb{E}\left[ (1-\delta)\, u_i(a_i, a_{-i}) + \delta\, \gamma_i(a_i, a_{-i}) \mid a_i \right] \;\geq\; \max_{a_i' \in A_i} \mathbb{E}\left[ (1-\delta)\, u_i(a_i', a_{-i}) \mid a_i \right] + \delta \min_{\alpha_{-i} \in \Delta(A_{-i})} \max_{a_i \in A_i} u_i(a_i, \alpha_{-i}).$$

Definition 2 A payoff vector $v \in \mathbb{R}^{|I|}$ is minmax-threat decomposable on $V$ if there exists a correlated action profile $\alpha \in \Delta(A)$ which is minmax-threat enforced on $V$ by a mapping $\gamma$ such that
$$v = \mathbb{E}\left[ (1-\delta)\, u(a) + \delta\, \gamma(a) \right].$$
Let $\tilde{B}(V) = \left\{ v \in \mathbb{R}^{|I|} : v \text{ is minmax-threat decomposable on } V \right\}$.

We show that the following algorithm recursively computes $E^{med}(\delta)$: let $W^1 = u(A)$, $W^n = \tilde{B}(W^{n-1})$ for $n > 1$, and $W^\infty = \lim_{n \to \infty} W^n$.

Theorem 2 If $\delta > \delta^*$, then $E^{med}(\delta) = W^\infty$.
With the exception of the following two lemmas, the proof of Theorem 2 is entirely standard, and omitted. The lemmas correspond to facts (1) and (2) above. In particular, Lemma 3 follows directly from APS and Lemma 2, while Lemma 4 establishes on-path recursivity. For both lemmas, assume $\delta > \delta^*$.

Lemma 3 If a set $V \subseteq \mathbb{R}^{|I|}$ is bounded and satisfies $V \subseteq \tilde{B}(V)$, then $\tilde{B}(V) \subseteq E^{med}(\delta)$.

Proof. See appendix.

Lemma 4 $E^{med}(\delta) = \tilde{B}\left(E^{med}(\delta)\right)$.

Proof. See appendix.

Combining Theorems 1 and 2 yields our main conclusion: in two-player games with $\delta > \delta^*$, the equilibrium payoff set with mediated perfect monitoring is a recursive upper bound on the equilibrium payoff set with any imperfect private monitoring structure.
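As a rough illustration of how the iteration $W^n = \tilde{B}(W^{n-1})$ can be put on a computer, the following Python sketch (our own, not from the paper) iterates a deliberately simplified version of the minmax-threat operator for a two-player game: it restricts attention to pure recommendation profiles, represents each candidate set by a finite collection of payoff points snapped to a grid, and starts from the pure-action payoff vectors rather than the full convex hull $u(A)$. Because of these restrictions (and up to grid rounding), the output should be read as a coarse approximation of the exact operator, which allows correlated recommendations and arbitrary continuation payoffs; the minmax payoffs and payoff arrays are inputs.

```python
import numpy as np
from itertools import product

def minmax_threat_step(points, u1, u2, minmax, delta, grid=0.5):
    """One step of a simplified minmax-threat operator.

    points: current candidate set, as a set of (v1, v2) tuples.
    u1, u2: stage payoffs, indexed by (a1, a2).
    minmax: (underline_u1, underline_u2).
    Only pure recommendations r and continuation payoffs w in `points`
    are considered; generated payoffs are snapped to a grid.
    """
    n1, n2 = u1.shape
    new_points = set()
    for (r1, r2), w in product(product(range(n1), range(n2)), points):
        # Minmax-threat constraints at recommendation (r1, r2).
        ok1 = (1 - delta) * u1[r1, r2] + delta * w[0] >= \
              (1 - delta) * u1[:, r2].max() + delta * minmax[0]
        ok2 = (1 - delta) * u2[r1, r2] + delta * w[1] >= \
              (1 - delta) * u2[r1, :].max() + delta * minmax[1]
        if ok1 and ok2:
            v1 = (1 - delta) * u1[r1, r2] + delta * w[0]
            v2 = (1 - delta) * u2[r1, r2] + delta * w[1]
            new_points.add((round(v1 / grid) * grid, round(v2 / grid) * grid))
    return new_points

def iterate_operator(u1, u2, minmax, delta, n_iter=50):
    # Start from the pure-action payoff vectors (a crude stand-in for u(A)).
    points = {(float(u1[a]), float(u2[a]))
              for a in product(range(u1.shape[0]), range(u1.shape[1]))}
    for _ in range(n_iter):
        points = minmax_threat_step(points, u1, u2, minmax, delta)
    return points
```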
7 The Upper Bound in an Example

We illustrate our results with an application to a simple repeated Bertrand game. Specifically, in such a game we find the greatest equilibrium payoff that each firm can attain for any private monitoring structure.

Consider the following Bertrand game: there are two firms $i \in \{1, 2\}$, and each firm $i$'s possible price level is $p_i \in \{W, L, M, H\}$ (price war, low price, medium price, high price). Given $p_1$ and $p_2$, firm $i$'s profit is determined by the following payoff matrix:

$$\begin{array}{c|cccc}
 & W & L & M & H \\ \hline
W & 15,\,15 & 30,\,25 & 50,\,15 & 80,\,0 \\
L & 25,\,30 & 40,\,40 & 60,\,35 & 90,\,15 \\
M & 15,\,50 & 35,\,60 & 55,\,55 & 85,\,35 \\
H & 0,\,80 & 15,\,90 & 35,\,85 & 65,\,65
\end{array}$$

Note that $L$ (low price) is a dominant strategy in the stage game, $W$ (price war) is a costly action that hurts the other firm, and $(H, H)$ maximizes the sum of the firms' profits. The feasible payoff set is given by
$$u(A) = \mathrm{co}\left\{(0, 80), (15, 15), (15, 90), (35, 85), (65, 65), (80, 0), (85, 35), (90, 15)\right\}.$$
In addition, each firm's minmax payoff $\underline{u}_i$ is 25, so the feasible and individually rational payoff set is given by
$$\mathrm{co}\left\{(25, 25), (25, 87.5), (35, 85), (65, 65), (85, 35), (87.5, 25)\right\}.$$
In particular, the greatest feasible and individually rational payoff for each firm is 87.5.

In this game, each firm's maximum deviation gain $d_i$ is 25. Since the game is symmetric, the critical discount factor $\delta^*$ above which we can apply Theorems 1 and 2 is given by plugging the best symmetric payoff of 65 into the formula for $\delta^*$, which gives
$$\delta^* = \frac{25}{25 + 65 - 25} = \frac{5}{13}.$$
To illustrate our results, we find the greatest equilibrium payoff that each firm can attain for any private monitoring structure when $\delta = \frac{1}{2} > \frac{5}{13}$.

When $\delta = \frac{1}{2}$, we have
$$\underline{w}_i = 25 + \frac{1 - \frac{1}{2}}{\frac{1}{2}} \cdot 25 = 50.$$
Hence, Lemma 1 implies that
$$\left\{ v \in \mathrm{int}\, u(A) : v_i > 50 \text{ for each } i \right\} = \mathrm{int}\, \mathrm{co}\left\{(50, 50), (75, 50), (50, 75), (65, 65)\right\} \subseteq E^{med}(\delta).$$

We now compute the best payoff vector for firm 1 in $\tilde{B}(u(A))$. By Theorems 1 and 2, any Pareto-efficient payoff profile not included in $\tilde{B}(u(A))$ is not included in $E(\delta, p)$ for any $p$.
In computing the best payoff vector for firm 1, it is natural to conjecture that firm 1's incentive compatibility constraint is not binding. We thus consider a relaxed problem with only firm 2's incentive constraint, and then verify that firm 1's incentive constraint is satisfied. Note that playing $L$ is always the best deviation for firm 2. Furthermore, the corresponding deviation gain decreases as firm 1 increases its price from $W$ to $L$, and (weakly) increases as it increases its price from $L$ to $M$ or $H$. On the other hand, firm 1's payoff increases as firm 1 increases its price from $W$ to $L$ and decreases as it increases its price from $L$ to $M$ or $H$. Hence, in order to maximize firm 1's payoff, firm 1 should play $L$.

Suppose that firm 2 plays $H$. Then, firm 2's incentive compatibility constraint is
$$(1-\delta) \underbrace{25}_{\text{maximum deviation gain}} \;\leq\; \delta \big( w_2 - \underbrace{25}_{\text{minmax payoff}} \big),$$
where $w_2$ is firm 2's continuation payoff. That is, $w_2 \geq 50$. By feasibility, $w_2 \geq 50$ implies that $w_1 \leq 75$. Hence, if $r_2 = H$, the best minmax-threat decomposable payoff for firm 1 is
$$(1-\delta)\begin{pmatrix} 90 \\ 15 \end{pmatrix} + \delta\begin{pmatrix} 75 \\ 50 \end{pmatrix} = \begin{pmatrix} 82.5 \\ 32.5 \end{pmatrix}.$$
Since 82.5 is larger than any payoff that firm 1 can get when firm 2 plays $W$, $M$, or $L$, firm 2 should indeed play $H$ to maximize firm 1's payoff. Moreover, since $75 \geq \underline{w}_1$, firm 1's incentive constraint is not binding. Thus, we have shown that 82.5 is the best payoff for firm 1 in $\tilde{B}(u(A))$.

On the other hand, with mediated perfect monitoring it is indeed possible to (virtually) implement an action path in which firm 1's payoff is 82.5: play $(L, H)$ in period 1 (with payoffs $(90, 15)$), and then play $\frac{1}{2}(M, H) + \frac{1}{2}(H, H)$ forever (with payoffs $\frac{1}{2}(85, 35) + \frac{1}{2}(65, 65) = (75, 50)$), while minmaxing deviators.

Thus, when $\delta = \frac{1}{2}$, each firm's greatest feasible and individually rational payoff is 87.5, but the greatest payoff it can attain with any imperfect private monitoring structure is only 82.5. In this simple game, we can therefore say exactly how much of a constraint is imposed on each firm's greatest equilibrium payoff by the firms' impatience alone, independently of the monitoring structure.
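The arithmetic in this section is easy to check mechanically. The following Python sketch (ours, with the payoff matrix transcribed from the table above) recomputes the minmax payoff, the maximal deviation gain, the critical discount factor, the continuation threshold at $\delta = \frac{1}{2}$, and the bound of 82.5 for firm 1.

```python
import numpy as np

# Firm 1's payoffs; rows and columns are ordered (W, L, M, H), and entry [i, j]
# is firm 1's profit when it prices at row i and firm 2 prices at column j.
u1 = np.array([[15, 30, 50, 80],
               [25, 40, 60, 90],
               [15, 35, 55, 85],
               [ 0, 15, 35, 65]], dtype=float)
u2 = u1.T  # the game is symmetric

delta = 0.5

# Minmax payoff: column W holds the rival to 25; since L guarantees the rival
# at least 25 against any mixture, 25 is also the correlated minmax payoff.
minmax = min(u1[:, j].max() for j in range(4))     # 25

# Maximal deviation gain d_i = max_r max_{a_i} [u_i(a_i, r_-i) - u_i(r)].
d = (u1.max(axis=0)[None, :] - u1).max()           # 25

# Critical discount factor, using the best symmetric payoff of 65.
delta_star = d / (d + 65 - minmax)                 # 5/13

# Continuation threshold at delta = 1/2.
w_bar = minmax + (1 - delta) / delta * d           # 50

# Best minmax-threat decomposable payoff for firm 1: (L, H) today,
# then the frontier continuation point (75, 50).
v1 = (1 - delta) * 90 + delta * 75                 # 82.5

print(minmax, d, delta_star, w_bar, v1)
```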
8 Proof of Theorem 1

8.1 Preliminaries and Plan of Proof

We wish to establish (3) for every Pareto weight $\lambda \in \Lambda^+$. Note that Lemma 1 implies that $W \subseteq E^{med}(\delta)$. Therefore, for every Pareto weight $\lambda \in \Lambda^+$, if there exists $v \in \arg\max_{v' \in E(\delta)} \lambda \cdot v'$ with $v \in \overline{W}$, then there exists $\bar{v} \in E^{med}(\delta)$ such that $\lambda \cdot \bar{v} \geq \lambda \cdot v$, as desired. Hence, we are left to consider $\lambda \in \Lambda^+$ with
$$\arg\max_{v' \in E(\delta)} \lambda \cdot v' \;\cap\; \overline{W} = \emptyset. \qquad (4)$$

Since we consider two-player games, we can order $\Lambda^+$ as follows: $\lambda'$ is steeper than $\lambda$, written $\lambda' > \lambda$, if and only if $\frac{\lambda_1'}{\lambda_2'} > \frac{\lambda_1}{\lambda_2}$. For each player $i$, let $w^i$ be the Pareto-efficient point in $W_i$ satisfying
$$w^i \in \arg\max_{v \in W_i \cap u(A)} v_{-i}.$$
Note that the assumption that $W \neq \emptyset$ implies that $w^i \in \overline{W}$. Let $\alpha^i \in \Delta(A)$ be a recommendation that attains $w^i$: $u(\alpha^i) = w^i$. Let $\Lambda^i$ be the (non-empty) set of Pareto weights $\lambda^i$ such that $w^i \in \arg\max_{v \in u(A)} \lambda^i \cdot v$:
$$\Lambda^i = \left\{ \lambda^i \in \mathbb{R}^2_+ : \|\lambda^i\| = 1,\; w^i \in \arg\max_{v \in u(A)} \lambda^i \cdot v \right\}.$$
As $u(A)$ is convex, if $\lambda$ satisfies (4) then either $\lambda < \lambda^1$ for each $\lambda^1 \in \Lambda^1$ or $\lambda > \lambda^2$ for each $\lambda^2 \in \Lambda^2$. See Figure 2. We focus on the case where $\lambda > \lambda^2$. (The proof for the $\lambda < \lambda^1$ case is symmetric and thus omitted.)

Figure 2: Setup for the construction.

Fix $v \in \arg\max_{v' \in E(\delta)} \lambda \cdot v'$. Let $(\mu, (\sigma_i)_{i \in I})$ be an equilibrium that attains $v$ with the universal information structure (where players do not observe each other's actions). By Lemma 2, it suffices to construct a mediator's strategy $\mu^*$ yielding payoffs $v^*$ such that (2) ("perfect monitoring incentive compatibility") holds and $\lambda \cdot v^* \geq \lambda \cdot v$. The rest of the proof constructs such a strategy.

The plan for constructing the strategy $\mu^*$ is as follows: First, from $\mu$, we construct a mediator's strategy $\bar{\mu}$ that yields payoffs $v$ and satisfies perfect monitoring incentive compatibility for player 2, but possibly not for player 1. The idea is to set the distribution of recommendations under $\bar{\mu}$ equal to the distribution of recommendations under $\mu$ conditional on player 2's information only. Second, from $\bar{\mu}$, we construct a mediator's strategy $\mu^*$ that yields payoffs $v^*$ with $\lambda \cdot v^* \geq \lambda \cdot v$ and satisfies perfect monitoring incentive compatibility for both players.
8.2 Construction and Properties of $\bar{\mu}$

For each on-path history of player 2's recommendations, denoted by $r_2^t = (r_{2,\tau})_{\tau=1}^{t-1}$, let $\Pr^{\mu}(\cdot \mid r_2^t)$ be the conditional distribution of recommendations in period $t$, and let $w(r_2^t)$ be the continuation payoff vector from period $t$ onward conditional on $r_2^t$:
$$w(r_2^t) = \mathbb{E}\left[ (1-\delta) \sum_{\tau=0}^{\infty} \delta^{\tau} u(r_{t+\tau}) \,\Big|\, r_2^t \right].$$
Define $\bar{\mu}$ so that, for every on-path history $r^t = (r_\tau)_{\tau=1}^{t-1}$, the mediator draws $r_t$ according to $\Pr^{\mu}(r_t \mid r_2^t)$:
$$\Pr^{\bar{\mu}}(r_t \mid r^t) \equiv \Pr^{\mu}(r_t \mid r_2^t). \qquad (5)$$

We claim that $\bar{\mu}$ yields payoffs $v$ and satisfies (2) for player 2. To see this, let $\bar{w}(r^t)$ be the continuation payoff vector from period $t$ onward conditional on $r^t$ under $\bar{\mu}$, and note that $\bar{w}(r^t) = w(r_2^t)$. In particular, $\bar{w}(r^1) = w(r_2^1) = v$. In addition, the fact that $\mu$ is an equilibrium with the universal information structure implies that, for every on-path history $r^{t+1}$,
$$(1-\delta)\, \mathbb{E}\left[ u_2(r_t) \mid r_2^{t+1} \right] + \delta\, w_2(r_2^{t+1}) \;\geq\; \max_{a_2 \in A_2} (1-\delta)\, \mathbb{E}\left[ u_2(r_{1,t}, a_2) \mid r_2^{t+1} \right] + \delta\, \underline{u}_2.$$
As $\bar{w}_2(r^{t+1}) = w_2(r_2^{t+1})$ and $\Pr^{\bar{\mu}}(r_t \mid r^t, r_{2,t}) = \Pr^{\mu}(r_t \mid r_2^{t+1})$, this implies that (2) holds for player 2.
8.3 Construction of $\mu^*$

The mediator's strategy $\mu^*$ will involve mixing over continuation payoffs at certain histories $r^{t+1}$, and we will denote the mixing probability at history $r^{t+1}$ by $p(r^{t+1})$. Our approach is to first construct the mediator's strategy $\mu^*$ for an arbitrary function $p : \bigcup_{t=1}^{\infty} A^{t-1} \to [0,1]$ specifying these mixing probabilities, and to then specify the function $p$.

Given a function $p : \bigcup_{t=1}^{\infty} A^{t-1} \to [0,1]$, the mediator's strategy $\mu^*$ is defined as follows: In each period $t = 0, 1, 2, \ldots$, the mediator is in one of two states, $\omega_t \in \{S_1, S_2\}$ (where "period 0" is purely notational, and as usual the game begins in period 1). Given the state, recommendations in period $t \geq 1$ are as follows:

1. In state $S_1$, at history $r^t = (r_\tau)_{\tau=1}^{t-1}$, the mediator recommends $r_t$ according to $\Pr^{\bar{\mu}}(r_t \mid r^t)$.

2. In state $S_2$, the mediator recommends $r_t$ according to some $\alpha^1 \in \Delta(A)$ such that $u(\alpha^1) = w^1$.

The initial state is $\omega_0 = S_1$. State $S_2$ is absorbing: if $\omega_t = S_2$ then $\omega_{t+1} = S_2$. Finally, the transition rule in state $S_1$ is as follows:

1. If $\bar{w}(r^{t+1}) \notin W_1$, then $\omega_{t+1} = S_2$ with probability one.

2. If $\bar{w}(r^{t+1}) \in W_1$, then $\omega_{t+1} = S_2$ with probability $1 - p(r^{t+1})$.

Thus, strategy $\mu^*$ agrees with $\bar{\mu}$, with the exception that $\mu^*$ occasionally transitions to an absorbing state where actions yielding payoffs $w^1$ are recommended forever. In particular, such a transition always occurs when continuation payoffs under $\bar{\mu}$ lie outside $W_1$, and otherwise this transition occurs with probability $1 - p(r^{t+1})$.

To complete the construction of $\mu^*$, it remains only to specify the function $p$. To this end, it is useful to define an operator $F$, which maps functions $w : \bigcup_{t=1}^{\infty} A^{t-1} \to \mathbb{R}^2$ to functions $F(w) : \bigcup_{t=1}^{\infty} A^{t-1} \to \mathbb{R}^2$. The operator $F$ will be defined so that its unique fixed point is precisely the continuation value function in state $S_1$ under $\mu^*$ for a particular function $p$, and this function will be the one we use to complete the construction of $\mu^*$.

Given $w : \bigcup_{t=1}^{\infty} A^{t-1} \to \mathbb{R}^2$, define $w^*(w) : \bigcup_{t=1}^{\infty} A^{t-1} \to \mathbb{R}^2$ so that, for every $r^t \in A^{t-1}$, we have
$$w^*(w)(r^t) = (1-\delta)\, u^{\bar{\mu}}(r^t) + \delta\, \mathbb{E}\left[ w(r^{t+1}) \mid r^t \right], \qquad (6)$$
where $u^{\bar{\mu}}(r^t)$ denotes the expected flow payoff at history $r^t$ under $\bar{\mu}$. On the other hand, given $w^*(w) : \bigcup_{t=1}^{\infty} A^{t-1} \to \mathbb{R}^2$, define $F(w) : \bigcup_{t=1}^{\infty} A^{t-1} \to \mathbb{R}^2$ so that, for every $r^t \in A^{t-1}$, we have
$$F(w)(r^t) = \mathbf{1}_{\{\bar{w}(r^t) \in W_1\}} \left\{ p(w)(r^t)\, w^*(w)(r^t) + \left(1 - p(w)(r^t)\right) w^1 \right\} + \mathbf{1}_{\{\bar{w}(r^t) \notin W_1\}}\, w^1, \qquad (7)$$
where, when $\bar{w}(r^t) \in W_1$, $p(w)(r^t)$ is the largest number in $[0,1]$ such that
$$p(w)(r^t)\, w_2^*(w)(r^t) + \left(1 - p(w)(r^t)\right) w_2^1 \;\geq\; \bar{w}_2(r^t). \qquad (8)$$
That is, if $w_2^*(w)(r^t) \geq \bar{w}_2(r^t)$, then $p(w)(r^t) = 1$; and otherwise, since $\bar{w}(r^t) \in W_1$ implies that $\bar{w}_2(r^t) \leq w_2^1$, $p(w)(r^t) \in [0,1]$ solves
$$p(w)(r^t)\, w_2^*(w)(r^t) + \left(1 - p(w)(r^t)\right) w_2^1 = \bar{w}_2(r^t).$$
(Intuitively, the term $\mathbf{1}_{\{\bar{w}(r^t) \notin W_1\}}\, w^1$ in (7) reflects the fact that we have replaced continuation payoffs outside of $W_1$ with player 2's most favorable continuation payoff within $W_1$, namely $w^1$. This replacement may reduce player 2's value below his original value of $\bar{w}_2(r^t)$. However, (8) ensures that, by also replacing continuation payoffs within $W_1$ with $w^1$ with high enough probability, player 2's value does not fall below $\bar{w}_2(r^t)$.)

To show that $F$ has a unique fixed point, it suffices to show that $F$ is a contraction.

Lemma 5 For all $w$ and $\tilde{w}$, we have $\|F(w) - F(\tilde{w})\| \leq \delta \|w - \tilde{w}\|$, where $\|w - \tilde{w}\| \equiv \sup_{r^t} \|w(r^t) - \tilde{w}(r^t)\|$.

Proof. See appendix.

Let $w$ be the unique fixed point of $F$. Given this function $w$, let $w^* = w^*(w)$ (given by (6)) and let $p = p(w)$ (given by (8)). This completes the construction of the mediator's strategy $\mu^*$.
8.4 Properties of $\mu^*$

Observe that
$$w^*(r^t) = (1-\delta)\, u^{\bar{\mu}}(r^t) + \delta\, \mathbb{E}\left[ w(r^{t+1}) \mid r^t \right] \qquad (9)$$
and
$$w(r^t) = \mathbf{1}_{\{\bar{w}(r^t) \in W_1\}} \left[ p(r^t)\, w^*(r^t) + \left(1 - p(r^t)\right) w^1 \right] + \mathbf{1}_{\{\bar{w}(r^t) \notin W_1\}}\, w^1. \qquad (10)$$
Thus, for $i = 1, 2$, $w_i^*(r^t)$ is player $i$'s expected continuation payoff from period $t$ given $r^t$ and $\omega_t = S_1$ (before she observes $r_{i,t}$), and $w_i(r^t)$ is player $i$'s expected continuation payoff from period $t$ given $r^t$ and $\omega_{t-1} = S_1$ (before she observes $r_{i,t}$). In particular, recalling that $\omega_0 = S_1$ and $\bar{w}(\emptyset) = v \in W_1$, (8) implies that the ex ante payoff vector $v^*$ is given by
$$v^* = w(\emptyset) = p(\emptyset)\, w^*(\emptyset) + \left(1 - p(\emptyset)\right) w^1.$$

We prove the following key lemma in the appendix.

Lemma 6 For all $t \geq 1$, if $\bar{w}(r^t) \in W_1$, then $p(r^t)\, w^*(r^t) + \left(1 - p(r^t)\right) w^1$ Pareto dominates $\bar{w}(r^t)$.

Here is a graphical explanation of Lemma 6: By (9), $w^*(r^t) - \bar{w}(r^t)$ is parallel to $\mathbb{E}\left[ w(r^{t+1}) - \bar{w}(r^{t+1}) \mid r^t \right]$. To evaluate this difference, consider (10) for period $t+1$. The term $\mathbf{1}_{\{\bar{w}(r^{t+1}) \notin W_1\}}\, w^1$ indicates that we construct $w(r^{t+1})$ by replacing some continuation payoffs not included in $W_1$ with $w^1$. Hence, $w(r^{t+1}) - \bar{w}(r^{t+1})$ (and thus $w^*(r^t) - \bar{w}(r^t)$) is parallel to $w^1 - \hat{w}(r^{t+1})$ for some $\hat{w}(r^{t+1}) \in u(A) \setminus W_1$. See Figure 3 for an illustration. Recall that $p(r^t)$ is determined by (8). Since the vector $w^*(r^t) - \bar{w}(r^t)$ is parallel to $w^1 - \hat{w}(r^{t+1})$ for some $\hat{w}(r^{t+1}) \in u(A) \setminus W_1$ and $u(A)$ is convex, we have $w_1^*(r^t) \geq \bar{w}_1(r^t)$. Hence, if we take $p(r^t)$ so that the convex combination of $w_2^*(r^t)$ and $w_2^1$ is equal to $\bar{w}_2(r^t)$, then player 1 is better off compared to $\bar{w}_1(r^t)$. See Figure 4.

Figure 3: The vector from $\bar{w}(r^t)$ to $w^*(r^t)$ is parallel to the one from $\hat{w}(r^{t+1})$ to $w^1$.

Figure 4: $p(r^t)\, w^*(r^t) + (1 - p(r^t))\, w^1$ and $\bar{w}(r^t)$ have the same value for player 2.

Given Lemma 6, we show that $\mu^*$ satisfies perfect monitoring incentive compatibility ((2)) for both players, and $\lambda \cdot v^* \geq \lambda \cdot v$.

1. Incentive compatibility for player 1: It suffices to show that, conditional on any on-path history $r^t$ and period $t$ recommendation $r_{1,t}$, the expected continuation payoff from period $t+1$ onward lies in $W_1$. If $\omega_t = S_2$, then this continuation payoff is $w^1 \in W_1$. If $\omega_t = S_1$, then it suffices to show that $w(r^{t+1}) \in W_1$ for all $r^{t+1}$. If $\bar{w}(r^{t+1}) \in W_1$, then, by Lemma 6, $w(r^{t+1}) = p(r^{t+1})\, w^*(r^{t+1}) + (1 - p(r^{t+1}))\, w^1$ Pareto dominates $\bar{w}(r^{t+1}) \in W_1$, so $w(r^{t+1}) \in W_1$. If $\bar{w}(r^{t+1}) \notin W_1$, then $w(r^{t+1}) = w^1 \in W_1$. Hence, $w(r^{t+1}) \in W_1$ for all $r^{t+1}$.

2. Incentive compatibility for player 2: Fix an on-path history $r^t$ and a period $t$ recommendation $r_{2,t}$. If $\omega_t = S_2$, or if both $\omega_t = S_1$ and $\bar{w}(r^{t+1}) \notin W_1$, then the expected continuation payoff from period $t+1$ onward conditional on $(r^t, r_{2,t})$ is $w^1 \in \overline{W}$, so (2) holds. If instead $\omega_t = S_1$ and $\bar{w}(r^{t+1}) \in W_1$, then $w_2(r^{t+1}) = p(r^{t+1})\, w_2^*(r^{t+1}) + (1 - p(r^{t+1}))\, w_2^1 \geq \bar{w}_2(r^{t+1})$ by (8). As $\mu$ is an equilibrium with the universal information structure and $\Pr^{\mu^*}(r_t \mid r^t, r_{2,t}) = \Pr^{\mu}(r_t \mid r_2^{t+1})$ in state $S_1$, this implies that (2) holds for player 2, by the same argument as in Section 8.2.

3. $\lambda \cdot v^* \geq \lambda \cdot v$: Immediate from Lemma 6 with $t = 1$.

9 Extensions
This section discusses what happens when the conditions for Theorem 1 are violated, as well as the extent to which the payoff bound is tight.
9.1 What if $W = \emptyset$?

The assumption that $W \neq \emptyset$ guarantees that all action profiles are supportable in equilibrium, which in turn implies that deviators can be held to their correlated minmax payoffs. This fact plays a key role both in showing that $E^{med}(\delta)$ is an upper bound on $E(\delta, p)$ for any private monitoring structure $p$ and in recursively characterizing $E^{med}(\delta)$. However, the assumption that $W \neq \emptyset$ is restrictive, in that it is violated when players are too impatient. Furthermore, it implies that the Pareto frontier of $E^{med}(\delta)$ coincides with the Pareto frontier of the feasible payoff set for some Pareto weights $\lambda$ (but of course not for others), so this assumption must also be relaxed for our approach to be able to give non-trivial payoff bounds for all Pareto weights.

To address these concerns, this subsection shows that even if $W = \emptyset$, $E^{med}(\delta)$ may still be an upper bound on $E(\delta, p)$ for any private monitoring structure $p$, and $E^{med}(\delta)$ can still be characterized recursively. The idea is that, even if not all action profiles are supportable, our approach still applies if a condition analogous to $W \neq \emptyset$ holds with respect to the subset of action profiles that are supportable.

Recall that $E(\delta)$ is the equilibrium payoff set with the universal monitoring structure where a mediator observes the recommendation profile $r_t$ and action profile $a_t$ in each period $t$, while each player $i$ only observes her own recommendation $r_{i,t}$ and her own action $a_{i,t}$. Denote this monitoring structure by $p^*$. We provide a sufficient condition, which is more permissive than $W \neq \emptyset$, under which we show that the Pareto frontier of $E^{med}(\delta)$ dominates the Pareto frontier of $E(\delta)$ and characterize $E^{med}(\delta)$.
Let supp( ) be the set of supportable actions with monitoring structure p :
8
>
>
with monitoring structure p ,
>
<
supp( ) = a 2 A : there exist an equilibrium strategy
>
>
>
:
and history htm with a 2 supp( (htm ))
Note that in this de…nition htm can be an o¤-path history.
On the other hand, given a product set of action profiles 𝒜 = ∏_{i∈I} 𝒜_i ⊆ A, let S_i(𝒜) be the set of actions a_i ∈ A_i such that there exists a correlated action α_{−i} ∈ Δ(𝒜_{−i}) such that

(1 − δ) u_i(a_i, α_{−i}) + δ max_{a∈𝒜} u_i(a) ≥ (1 − δ) max_{â_i∈A_i} u_i(â_i, α_{−i}) + δ min_{α̂_{−i}∈Δ(𝒜_{−i})} max_{a_i∈A_i} u_i(a_i, α̂_{−i}).   (11)

(That is, a_i ∈ S_i(𝒜) if there exists α_{−i} ∈ Δ(𝒜_{−i}) such that, if her opponents play α_{−i}, player i's reward for playing a_i is the best payoff possible among those with support in 𝒜, and player i's punishment for deviating from a_i is the worst possible among those with support in 𝒜_{−i}, then player i plays a_i.) Let S(𝒜) = ∏_{i∈I} S_i(𝒜) ∩ 𝒜. Let A^1 = A, let A^n = S(A^{n−1}) for n > 1, and let A^∞ = lim_{n→∞} A^n. Note that the problem of computing A^∞ is tractable, as the set S(𝒜) is defined by a finite number of linear inequalities.
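To illustrate that this is indeed a finite sequence of linear programs, here is a minimal sketch, not taken from the paper, of iterating S for a hypothetical two-player game; the payoff matrices, the discount factor, and all variable names are made up for illustration, and the two linear programs simply implement the two sides of inequality (11).

```python
# Hypothetical example: iterate the operator S to compute the limit set A^infty.
import numpy as np
from scipy.optimize import linprog

delta = 0.1
u = [np.array([[2.0, 0.0, -1.0],
               [3.0, 1.0, -1.0],
               [0.0, 0.0,  0.0]]),   # player 1's payoffs, rows = a1, cols = a2
     np.array([[2.0, 3.0,  0.0],
               [0.0, 1.0,  0.0],
               [-1.0, -1.0, 0.0]])]  # player 2's payoffs

def own_payoff(i, ai, aj):
    """u_i as a function of (own action, opponent's action)."""
    return u[i][ai, aj] if i == 0 else u[i][aj, ai]

def restricted_minmax(i, opp_support):
    """min over alpha_{-i} in Delta(opp_support) of max over all a_i of u_i(a_i, alpha_{-i})."""
    k, n_i = len(opp_support), u[i].shape[i]
    c = [0.0] * k + [1.0]                       # minimize the upper bound m
    A_ub = [[own_payoff(i, ai, s) for s in opp_support] + [-1.0] for ai in range(n_i)]
    res = linprog(c, A_ub=A_ub, b_ub=[0.0] * n_i,
                  A_eq=[[1.0] * k + [0.0]], b_eq=[1.0],
                  bounds=[(0, 1)] * k + [(None, None)], method="highs")
    return res.fun

def S_i(i, support):
    """Player i's component of S applied to the product set 'support'."""
    own, opp = support[i], support[1 - i]
    best = max(own_payoff(i, a, b) for a in own for b in opp)
    rhs = (delta / (1 - delta)) * (restricted_minmax(i, opp) - best)
    n_i, k = u[i].shape[i], len(opp)
    kept = []
    for ai in own:
        # maximize t s.t. t <= sum_s alpha_s [u_i(ai, s) - u_i(ahat, s)] for every deviation ahat
        c = [0.0] * k + [-1.0]
        A_ub = [[-(own_payoff(i, ai, s) - own_payoff(i, ahat, s)) for s in opp] + [1.0]
                for ahat in range(n_i)]
        res = linprog(c, A_ub=A_ub, b_ub=[0.0] * n_i,
                      A_eq=[[1.0] * k + [0.0]], b_eq=[1.0],
                      bounds=[(0, 1)] * k + [(None, None)], method="highs")
        if -res.fun >= rhs - 1e-9:              # inequality (11) holds for some alpha_{-i}
            kept.append(ai)
    return kept

support = (list(range(u[0].shape[0])), list(range(u[0].shape[1])))   # A^1 = A
while True:
    nxt = (S_i(0, support), S_i(1, support))
    if nxt == support:
        break
    support = nxt
print("A^infty =", support)
```

With these made-up payoffs and δ = 0.1, the iteration removes each player's first action and stops at A^∞ given by the remaining two actions for each player.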
Finally, in analogy with the definition of w_i from Section 5, let

w_i^{A^∞} = min_{α_{−i}∈Δ(A^∞_{−i})} max_{a_i∈A_i} u_i(a_i, α_{−i}) + ((1 − δ)/δ) max_{r∈A^∞, a_i∈A_i} {u_i(a_i, r_{−i}) − u_i(r)}

be the lowest continuation payoff such that player i does not want to deviate to any a_i ∈ A_i at any recommendation profile r ∈ A^∞, when she is minmaxed forever if she deviates, subject to the constraint that punishments are drawn from A^∞. In analogy with the definition of W_i from Section 5, let

W_i^{A^∞} = { w ∈ R^{|I|} : w_i ≥ min_{α_{−i}∈Δ(A^∞_{−i})} max_{a_i∈A_i} u_i(a_i, α_{−i}) + ((1 − δ)/δ) max_{r∈A^∞, a_i∈A_i} {u_i(a_i, r_{−i}) − u_i(r)} }.
We show the following.

Proposition 2 Assume that |I| = 2. If

int(∩_{i∈I} W_i^{A^∞} ∩ u(A^∞)) ≠ ∅   (12)

in the topology induced from u(A^∞), then for every private monitoring structure p and every non-negative Pareto weight λ, we have

max_{v∈E(δ,p)} λ · v ≤ max_{v∈E^med(δ)} λ · v.

In addition, supp(δ) = A^∞.

Proof. Given that supp(δ) = A^∞, the proof is analogous to the proof of Theorem 1, everywhere replacing u(A) with u(A^∞) and replacing "full support" with "full support within A^∞." The proof that supp(δ) = A^∞ whenever int(∩_{i∈I} W_i^{A^∞} ∩ u(A^∞)) ≠ ∅ is deferred to the appendix.
Note that Proposition 2 only improves on Theorem 1 at low discount factors: for a high enough discount factor, A = S(A) = A^∞, so (12) reduces to W ≠ ∅. However, as W can be empty only for low discount factors, this is precisely the case where an improvement is needed.16

In order to be able to use Proposition 2 to give a recursive upper bound on E(δ, p) when W = ∅, we must characterize E^med(δ) under (12). Our earlier characterization generalizes easily. In particular, the following definitions are analogous to Definitions 1 and 2.
Definition 3 For any set V ⊆ R^{|I|}, a correlated action profile α ∈ Δ(supp(δ)) is supp(δ)-enforceable on V by a mapping w : supp(δ) → V such that, for each player i and action a_i ∈ supp(α_i),

E[(1 − δ) u_i(a_i, a_{−i}) + δ w_i(a_i, a_{−i}) | a_i] ≥ max_{a'_i∈A_i} E[(1 − δ) u_i(a'_i, a_{−i}) | a_i] + δ min_{α̂_{−i}∈Δ(supp_{−i}(δ))} max_{â_i∈A_i} u_i(â_i, α̂_{−i}).

Definition 4 A payoff vector v ∈ R^{|I|} is supp(δ)-decomposable on V if there exists a correlated action profile α ∈ Δ(supp(δ)) which is supp(δ)-enforced on V by some mapping w such that

v = E[(1 − δ) u(a) + δ w(a)].

Let B̃^{supp(δ)}(V) = { v ∈ R^{|I|} : v is supp(δ)-decomposable on V }.
Let W^{supp(δ),1} = u(supp(δ)), let W^{supp(δ),n} = B̃^{supp(δ)}(W^{supp(δ),n−1}) for n > 1, and let W^{supp(δ),∞} = lim_{n→∞} W^{supp(δ),n}. We have the following.

Proposition 3 If int(∩_{i∈I} W_i^{A^∞} ∩ u(A^∞)) ≠ ∅, then E^med(δ) = W^{A^∞,∞}.

Proof. Given that supp(δ) = A^∞ by Proposition 2, the proof is analogous to the proof of Theorem 2.

16 To be clear, it is possible for W to be empty while A^∞ = A. Theorem 1 and Proposition 2 only give sufficient conditions: we are not claiming that they cover every possible case.
As an example of how Propositions 2 and 3 can be applied, one can check that applying the operator S in the Bertrand example in Section 7 for any δ between 1/4 and 5/18 yields A^∞ = {W, L, M} × {W, L, M}, ruling out the efficient action profile (H, H), and that int(∩_{i∈I} W_i^{A^∞} ∩ u(A^∞)) ≠ ∅. We can then compute E^med(δ) by applying the operator B̃^{A^∞}.

Finally, we mention that E^med(δ) can be characterized recursively even if int(∩_{i∈I} W_i^{A^∞} ∩ u(A^∞)) = ∅. This fact is not directly relevant for the current paper, so we omit the proof. It is available from the authors upon request.
9.2 Tightness of the Bound

There are at least two senses in which E^med(δ) is a tight bound on the equilibrium payoff set with private monitoring.

First, thus far our model of repeated games with private monitoring has maintained the standard assumption that the distribution of period t signals depends only on period t actions: that is, that this distribution can be written as p(· | a_t). In many settings, it would be desirable to relax this assumption and let the distribution of period t signals depend on the entire history of actions and signals up to period t, leading to a conditional distribution of the form p_t(· | a^t, z^t), as well as letting players receive signals before the first round of play. (Recall that a^t = (a_τ)_{τ=1}^{t−1} and z^t = (z_τ)_{τ=1}^{t−1}.) For example, colluding firms do not only observe their sales in every period, but also occasionally get more information about their competitors' past behavior from trade associations, auditors, tax data, and the like.17 In the space of such nonstationary private monitoring structures, the bound E^med(δ) is clearly tight: E^med(δ) is an upper bound on E(δ, p) for any nonstationary private monitoring structure p, because the equilibrium payoff set with the universal monitoring structure p*, E(δ, p*), remains an upper bound on E(δ, p); and the bound is tight because mediated perfect monitoring is itself a particular nonstationary private monitoring structure.

17 Rahman (2014, p. 1) quotes from the European Commission decision on the amino acid cartel: a typical cartel member "reported its citric acid sales every month to a trade association, and every year, Swiss accountants audited those figures."

Second, maintaining the assumption that the monitoring structure is stationary, the bound E^med(δ) is tight if the players can communicate through cheap talk, as they can then "replicate" the mediator among themselves. For this result, we also need to slightly generalize our definition of a private monitoring structure by letting the players receive signals before the first round of play. This seems innocuous, especially if we take the perspective of an outside observer who does not know the game's start date. Equivalently, the players have access to a mediator at the beginning of the game only. The monitoring structure is required to be stationary thereafter. We call such a monitoring structure a private monitoring structure with ex ante correlation.
Proposition 4 Let E^talk(δ, p) be the sequential equilibrium payoff set in the repeated game with private monitoring structure with ex ante correlation p and with finitely many rounds of public cheap talk before each round of play. If |I| = 2 and δ > δ̄, then there exists a private monitoring structure with ex ante correlation p such that E^talk(δ, p) = E^med(δ).

The proof is long and is deferred to the online appendix. The main idea is as in the literature on implementing correlated equilibria without a mediator (see Forges (2009) for a survey). More specifically, Proposition 4 is similar to Theorem 9 of Heller, Solan, and Tomala (2012), which shows that communication equilibria in repeated games with perfect monitoring can always be implemented by ex ante correlation and cheap talk. As the private monitoring structure used in the proof of Proposition 4 is in fact perfect monitoring, the main difference between the results is that theirs is for Nash rather than sequential equilibrium, so they are concerned only with detecting deviations rather than providing incentives to punish deviations once detected. In our model, when δ > δ̄, incentives to minmax the deviator can be provided (as in Lemma 1) if her opponent does not realize that the punishment phase has begun. The additional challenge in the proof of Proposition 4 is thus that we sometimes need a player to switch to the punishment phase against her opponent without herself realizing that this switch has occurred.

If one insists on stationary monitoring and does not allow communication, then we believe that there are some games in which our bound is not tight, in that there are points in E^med(δ) which are not attainable in equilibrium for any stationary private monitoring structure. We leave this as a conjecture.18

18 Strictly speaking, since our maintained definition of a private monitoring structure does not allow ex ante correlation, if δ = 0 then there are points in E^med(δ) which are not attainable with any private monitoring structure whenever the stage game's correlated equilibrium payoff set strictly contains its Nash equilibrium payoff set. The non-trivial conjecture is that the bound is still not tight when ex ante correlation is allowed but communication is not, or when discount factors are moderately high.
9.3 What if There are More Than Two Players?

The condition that W ≠ ∅ no longer guarantees that mediated perfect monitoring outperforms private monitoring when there are more than two players. We record this as a proposition.

Proposition 5 There are games with |I| > 2 where W ≠ ∅ but max_{v∈E(δ,p)} λ · v > max_{v∈E^med(δ)} λ · v for some private monitoring structure p and some non-negative Pareto weight λ.

Proof. We give an example in the appendix.

To see where the proof of Theorem 1 breaks down when |I| > 2, recall that the proof is based on the fact that, for any Pareto-efficient payoff v, if v ∉ W_i for one player i, then it must be the case that v ∈ W_j for the other player j. This implies that incentive compatibility is a problem only for one player at a time, which lets us construct an equilibrium with perfect monitoring by basing continuation play only on that player's past recommendations (which she necessarily knows in any private monitoring structure). On the other hand, if there are more than two players, several players' incentive compatibility constraints might bind at once when we publicize past recommendations. The proof of Theorem 1 then cannot get off the ground.
We can, however, say some things about what happens with more than two players. First, the argument in the proof of Proposition 2 that supp(δ) = A^∞ whenever int(∩_{i∈I} W_i^{A^∞} ∩ u(A^∞)) ≠ ∅ does not rely on |I| = 2. Thus, when int(∩_{i∈I} W_i^{A^∞} ∩ u(A^∞)) ≠ ∅, we can characterize the set of supportable actions for any number of players. This is sometimes already enough to imply a non-trivial upper bound on payoffs.

Second, Proposition 6 below shows that if a payoff vector v ∈ int(u(A)) satisfies v_i > u_i + ((1 − δ)/δ) d_i for all i ∈ I, then v ∈ E^med(δ). This shows that private monitoring cannot do "much" better than mediated perfect monitoring when the players are at least moderately patient (e.g., it cannot do more than order 1 − δ better).

Finally, suppose there is a player i whose opponents −i all have identical payoff functions: ∃i : ∀j, j′ ∈ I∖{i}, u_j(a) = u_{j′}(a) for all a ∈ A. Then the proof of Theorem 1 can be adapted to show that private monitoring cannot outperform mediated perfect monitoring in a direction λ where the extremal payoff vector v lies in ∩_{j≠i} W_j (but not necessarily in W_i). For example, if the game involves one firm and many identical consumers, then the consumers' best equilibrium payoff under mediated perfect monitoring is at least as good as under private monitoring. We can also show that the same result holds if the preferences of players −i are sufficiently close to each other.
The Folk Theorem with Mediated Perfect Monitoring
The point of this paper is bounding the equilibrium payo¤ set with private monitoring
at a …xed discount factor.
Nonetheless, it is worth pointing out that— as a corollary of
our results— a strong version of the folk theorem holds with mediated perfect monitoring,
without any conditions on the feasible payo¤ set (e.g., full-dimensionality, non-equivalent
utilities), and with a rate of convergence of 1
. One reason why this result may be of
interest is that it shows that, if players are fairly patient, they cannot do “much”better with
private monitoring than with mediated perfect monitoring, even if the su¢ cient conditions for
Theorem 1 fail. Another reason is that the mediator can usually be replaced by unmediated
communication among the players (we spell out exactly when below), and it seems worth
knowing that the folk theorem holds with unmediated communication without conditions on
the feasible payo¤ set.
Recall that ui is player i’s correlated minmax payo¤ and di is player i’s greatest possible
37
deviation gain. Let u = (ui )i2I and d = (di )i2I .
Proposition 6 If a payo¤ vector v 2 int (u (A)) satis…es
v >u+
1
(13)
d;
then v 2 Emed ( ).
Proof. Fix
2
(A) such that v = u ( ). If v satis…es (13), then v 2 W (so in particular
W 6= ;), and the in…nite repetition of
satis…es (2). Lemma 1 then gives v 2 Emed ( ).
The folk theorem is a corollary of Proposition 6.19

Corollary 1 (Folk Theorem) For every strictly individually rational payoff vector v ∈ u(A), there exists δ̄ < 1 such that if δ > δ̄ then v ∈ E^med(δ).

Proof. It is immediate that v ∈ E^med(δ) for high enough δ: let δ̄ = max_{i∈I} d_i/(d_i + v_i − u_i) and apply Proposition 6. The proof that taking the closure is unnecessary is in the appendix.

19 The restriction in Corollary 1 to strictly individually rational payoff vectors cannot be dropped. In particular, the counterexample to the folk theorem due to Forges, Mertens, and Neyman (1986), in which no payoff vector is strictly individually rational, remains valid in the presence of a mediator.
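To spell out where the threshold δ̄ in the proof of Corollary 1 comes from: for a player i with d_i > 0, rearranging condition (13) gives

v_i > u_i + ((1 − δ)/δ) d_i  ⟺  (1 − δ)/δ < (v_i − u_i)/d_i  ⟺  δ > d_i/(d_i + v_i − u_i),

while for a player with d_i = 0 the condition holds for every δ; taking the maximum over i therefore yields the stated δ̄.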
In earlier work (Sugaya and Wolitzky, 2014), we have shown that the mediator can usually
be dispensed with in Proposition 6 and Corollary 1 if players can communicate through cheap
talk. In particular, this is possible unless there are exactly three players, of whom exactly
two have equivalent utilities in the sense of Abreu, Dutta, and Smith (1994).
10 Conclusion

This paper gives a simple sufficient condition (δ > δ̄) under which the equilibrium payoff set in a two-player repeated game with perfect monitoring and a mediator is a tight, recursive upper bound on the equilibrium payoff set in the same game with any imperfect private monitoring structure. There are at least three perspectives from which this result may be of interest. First, it shows that simple, recursive methods can be used to upper-bound the equilibrium payoff set in a repeated game with imperfect private monitoring at a fixed discount factor, even though the problem of recursively characterizing this set seems intractable. Second, it characterizes the set of payoffs that can arise in a repeated game for some monitoring structure. Third, it shows that information is valuable in mediated repeated games, in that players cannot benefit from imperfections in the monitoring technology when δ > δ̄.

These different perspectives on our results suggest different questions for future research. Do moderately patient players always benefit from any improvement in the monitoring technology, or only from going all the way to perfect monitoring? Is it possible to characterize the set of payoffs that can arise for some monitoring structure even if δ < δ̄? If we do know the monitoring structure under which the game is being played, is there a general way of using this information to tighten our upper bound? Answering these questions may improve our understanding of repeated games with private monitoring at fixed discount factors, even if a full characterization of the equilibrium payoff set in such games remains out of reach.
11 Appendix

11.1 Proof of Proposition 1: Mediated Perfect Monitoring

As the players' stage game payoffs from any profile other than (U, L) sum to at most 3, it follows that the players' per-period payoffs may sum to more than 3 only if (U, L) is played in some period t with positive probability. For this to occur in equilibrium, player 1's expected continuation payoff from playing U must exceed her expected continuation payoff from playing D by more than (1 − δ)·1 = 5/6, her instantaneous gain from playing D rather than U. In addition, player 1 can guarantee herself a continuation payoff of 0 by always playing D, so her expected continuation payoff from playing U must exceed (1/δ)(5/6) = 5. This is possible only if the probability that (T, M) is played in period t + 1 when U is played in period t exceeds the number p such that

(1 − 1/6)[p(6) + (1 − p)(3)] + (1/6)(6) = 5,

or p = 3/5. In particular, there must exist a period t + 1 history h_2^{t+1} of player 2's such that (T, M) is played with probability at least 3/5 in period t + 1 conditional on reaching h_2^{t+1}. At such a history, player 2's payoff from playing M is at most

(1 − 1/6)[(3/5)(−3) + (2/5)(3)] + (1/6)(3) = 0.

On the other hand, noting that player 2 can guarantee himself a continuation payoff of 0 by playing (1/2)L + (1/2)M, player 2's payoff from playing L at this history is at least

(1 − 1/6)[(3/5)(3) + (2/5)(−3)] + (1/6)(0) = 1/2.

Therefore, player 2 has a profitable deviation, so no such equilibrium can exist.
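The arithmetic in this argument can be checked mechanically; the fragment below is a small verification sketch, using the discount factor 1/6 that appears in the displays above, and reproduces the three computations.

```python
from fractions import Fraction as F

delta = F(1, 6)

# p solves (1 - delta) * (6p + 3(1 - p)) + 6*delta = 5
p = (5 - 6 * delta - 3 * (1 - delta)) / (3 * (1 - delta))
assert p == F(3, 5)

# Player 2's payoff from M at a history where (T, M) has probability p ...
payoff_M = (1 - delta) * (p * (-3) + (1 - p) * 3) + delta * 3
# ... and from L, given that 1/2 L + 1/2 M guarantees him a continuation payoff of 0.
payoff_L = (1 - delta) * (p * 3 + (1 - p) * (-3)) + delta * 0
assert payoff_M == 0 and payoff_L == F(1, 2)
```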
11.2 Proof of Proposition 1: Private Monitoring

Consider the following imperfect private monitoring structure. Player 2's action is perfectly observed. Player 1's action is perfectly observed when it equals T or B. When player 1 plays U or D, player 2 observes one of two possible private signals, m and r. Whenever player 1 plays U, player 2 observes signal m with probability 1; whenever player 1 plays D, player 2 observes signals m and r with probability 1/2 each.

We now describe a strategy profile under which the players' payoffs sum to 23/7 ≈ 3.29.

Player 1's strategy: In each odd period t = 2n + 1 with n = 0, 1, ..., player 1 plays (1/3)U + (2/3)D. Let a_1(n) denote the realization of this mixture. In the even period t = 2n + 2, if the previous action a_1(n) equals U, then player 1 plays T; if the previous action a_1(n) equals D, then player 1 plays B.

Player 2's strategy: In each odd period t = 2n + 1 with n = 0, 1, ..., player 2 plays L. Let y_2(n) denote the realization of player 2's private signal. In the even period t = 2n + 2, if the previous private signal y_2(n) equals m, then player 2 plays M; if the previous signal y_2(n) equals r, then player 2 plays R.

We check that this strategy profile, together with any consistent belief system, is a sequential equilibrium.
In an odd period, player 1's payoff from U is the solution to v = (5/6)(2) + (1/6)(5/6)(6) + (1/36)v. On the other hand, her payoff from D is (5/6)(3) + (1/6)(5/6)(0) + (1/36)v. Hence, player 1 is indifferent between U and D (and clearly prefers either of these to T or B).

In addition, playing L is a myopic best response for player 2, player 1's continuation play is independent of player 2's action, and the distribution of player 2's signal is also independent of player 2's action. Hence, playing L is optimal for player 2.

Next, in an even period, it suffices to check that both players always play myopic best responses, as in even periods continuation play is independent of realized actions and signals. If player 1's last action was a_1(n) = U, then she believes that player 2's signal is y_2(n) = m with probability 1 and thus that he will play M. Hence, playing T is optimal. If instead player 1's last action was a_1(n) = D, then she believes that player 2's signal is equal to m and r with probability 1/2 each, and thus that he will play (1/2)M + (1/2)R. Hence, both T and B are optimal.
On the other hand, if player 2 observes signal y_2(n) = m, then his posterior belief that player 1's last action was a_1(n) = U is

(1/3)(1) / [(1/3)(1) + (2/3)(1/2)] = 1/2.

Hence, player 2 is indifferent among all of his actions. If player 2 observes y_2(n) = r, then his posterior is that a_1(n) = D with probability 1, so that M and R are optimal.
Finally, expected payoffs under this strategy profile in odd periods sum to (1/3)(4) + (2/3)(3) = 10/3, and in even periods sum to 3. Therefore, per-period expected payoffs sum to

(1 − 1/6)[10/3 + (1/6)(3)](1 + 1/6^2 + 1/6^4 + ···) = 23/7.
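As a further check, the indifference condition, the posterior, and the per-period payoff sum can be verified with exact fractions, using the stage payoffs exactly as they appear in the displays above.

```python
from fractions import Fraction as F

d = F(1, 6)                                   # discount factor used in the example
# Player 1's value from U in an odd period solves v = (1-d)*2 + d*(1-d)*6 + d^2 * v.
v = ((1 - d) * 2 + d * (1 - d) * 6) / (1 - d ** 2)
# Her payoff from D, given the same continuation value v from the next odd period on:
payoff_D = (1 - d) * 3 + d * (1 - d) * 0 + d ** 2 * v
assert payoff_D == v                          # indifferent between U and D

# Posterior that a_1(n) = U after observing signal m, when player 1 mixes 1/3 U + 2/3 D:
posterior = F(1, 3) / (F(1, 3) + F(2, 3) * F(1, 2))
assert posterior == F(1, 2)

# Per-period expected payoff sum: 10/3 in odd periods, 3 in even periods.
total = (1 - d) * (F(10, 3) + d * 3) / (1 - d ** 2)
assert total == F(23, 7)
```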
Three remarks on the proof: First, the various indifferences in the above argument arise only because we have chosen payoffs to make the example as simple as possible. One can modify the example to make all incentives strict.20 Second, players' payoffs are measurable with respect to their own actions and signals. In particular, the required realized payoffs for player 2 are as follows:

(Action, Signal) pair:  (L, m)  (L, r)  (M, m)  (M, r)  (R, m)  (R, r)
Realized payoff:           2      −2      0       0       0       0

Third, a similar argument shows that imperfect public monitoring with private strategies can also outperform mediated perfect monitoring.21

20 The only non-trivial step in doing so is giving player 1 a strict incentive to mix in odd periods. This can be achieved by introducing correlation between the players' actions in odd periods.

21 Here is a sketch: Modify the current example by adding a strategy L′ for player 2, which is an exact duplicate of L as far as payoffs are concerned, but which switches the interpretation of signals m and r. Assume that player 1 cannot distinguish between L and L′, and modify the equilibrium by having player 2 play (1/2)L + (1/2)L′ in odd periods. Then, even if the signals m and r are publicly observed, their interpretations will be private to player 2, and essentially the same argument as with private monitoring applies.
11.3 Proof of Lemma 2

Fix such a strategy and any strict full-support equilibrium. We construct a strict equilibrium that attains a payoff close to v.

In period 1, the mediator draws one of two states, R^v and R^perturb, with probabilities 1 − ε and ε, respectively. In state R^v, the mediator's recommendation is determined as follows: If no player has deviated up to period t, the mediator recommends r_t according to the given strategy at the mediator's history h^t_m. If only player i has deviated, the mediator recommends actions to players −i according to a correlated action profile that minmaxes player i, and recommends some best response to this profile to player i. Multiple deviations are treated as in the proof of Lemma 1. On the other hand, in state R^perturb, the mediator follows the strict full-support equilibrium.

Player i follows the recommendation r_{i,t} in period t. Since the constructed recommendation schedule has full support, player i never believes that another player has deviated. Moreover, since the strict full-support equilibrium has full support, player i believes that the mediator's state is R^perturb with positive probability after any history. Therefore, by (2) and the fact that it is a strict equilibrium, it is always strictly optimal for each player i to follow her recommendation on path. Taking ε → 0 yields a sequence of strict equilibria with payoffs converging to v.
11.4 Proof of Lemma 3

Standard arguments imply that, for every v ∈ B̃(V), there exists an on-path recommendation strategy that yields payoff v and satisfies (2). Under the assumption that δ > δ̄, the conclusion of the lemma follows from Lemma 2.
11.5 Proof of Lemma 4

By Lemma 3 and boundedness, we need only show that E^med(δ) ⊆ B̃(E^med(δ)).

Let E^weak(δ) be the set of (possibly weak) sequential equilibrium payoffs with mediated perfect monitoring. Note that, in any sequential equilibrium, player i's continuation payoff at any history h^t_i must be at least u_i. Therefore, if an on-path recommendation strategy is part of a (possibly weak) sequential equilibrium, then it must satisfy (2). Hence, under the assumption that W ≠ ∅, we have E^weak(δ) ⊆ E^med(δ).

Now, for any v ∈ E^med(δ), consider a corresponding equilibrium mediator's strategy. In the corresponding equilibrium, if some player i deviates in period 1 while her opponents are obedient, player i's continuation payoff must be at least u_i. Hence, we have

E[(1 − δ) u_i(a_i, a_{−i}) + δ w_i(a_i, a_{−i}) | a_i] ≥ max_{a'_i∈A_i} E[(1 − δ) u_i(a'_i, a_{−i}) | a_i] + δ u_i,

where w_i(a_i, a_{−i}) is player i's equilibrium continuation payoff when action profile (a_i, a_{−i}) is recommended and obeyed in period 1. Finally, since action profile (a_i, a_{−i}) is in the support of the mediator's recommendation in period 1, each player assigns probability 1 to the true mediator's history when (a_i, a_{−i}) is recommended and played in period 1. Therefore, continuation play from this history is itself at least a weak sequential equilibrium. In particular, we have w(a_i, a_{−i}) ∈ E^weak(δ) ⊆ E^med(δ) for every action profile (a_i, a_{−i}) in the support of the period 1 recommendation. Hence, v is minmax-threat decomposable on E^med(δ) by the period 1 recommendation distribution and the continuation payoff function w, so in particular v ∈ B̃(E^med(δ)).

We have shown that E^med(δ) ⊆ B̃(E^med(δ)); as E^med(δ) is compact and B̃ preserves compactness, the same inclusion holds after taking closures.
11.6 Proof of Lemma 5

By (6), ‖w^*(w) − w^*(w̃)‖ ≤ ‖w − w̃‖. By (7),

‖F(w)(r^t) − F(w̃)(r^t)‖ = 1{w^*(r^t) ∈ W^1} ‖{p(w)(r^t) w^*(w)(r^t) + (1 − p(w)(r^t)) w^1} − {p(w̃)(r^t) w^*(w̃)(r^t) + (1 − p(w̃)(r^t)) w^1}‖ ≤ ‖w^*(w) − w^*(w̃)‖.

Combining these inequalities yields ‖F(w) − F(w̃)‖ ≤ ‖w − w̃‖.

11.7 Proof of Lemma 6
It is useful to introduce a family of auxiliary value functions wT
1
T =1
will converge to w and w pointwise in rt as T ! 1. For periods t
wT (rt ) = w (rt ) and w
On the other hand, for periods t
T
;T
(rt 1 ) = w (rt 1 ):
1, de…ne w
;T
and w
;T 1
,
T =1
which
T , de…ne
(14)
(rt ), pT (rt ), and wT (rt ) given wT (rt+1 )
recursively, as follows. First, de…ne
w
;T
(rt ) = (1
)u
(rt ) + E wT (rt+1 )jrt :
(15)
Note that, for t = T
1, this de…nition is compatible with (14). Second, given w
;T
(rt ),
de…ne
wT (rt ) = 1fw
pT (rt )w
(rt )2W1 g
;T
(rt ) + 1
pT (rt ) w1 + 1fw
(rt )2W
= 1gw
1
(16)
;
where, when w (rt ) 2 W1 , pT (rt ) is the largest number in [0; 1] such that
pT (rt )w2;T (rt ) + 1
We show that w
;T
w2 (rt ):
(rt ) = w (rt ) for all rt 2 At .
Proof. By Lemma 5, it su¢ ces to show that F (wT ) = wT +1 . For t
that w
;T +1
(17)
converges to w .
;T
Lemma 7 limT !1 w
pT (rt ) w21
T + 1, (14) implies
(rt 1 ) = w (rt 1 ). On the other hand, given wT , w (w2T ) is the value calculated
according to (6). Since wT (rt ) = w (rt ) by (14), we have w (w2T )(rt 1 ) = w (rt 1 ) by (6).
Hence,
w (w2T )(rt 1 ) = w
For t
;T +1
(rt 1 ).
(18)
T , by (15), we have
w
;T +1
(rt ) = (1
)u
(rt ) + E wT +1 (rt+1 )jrt :
By (16),
wT +1 (rt ) = 1fw
(rt )2W1 g
= 1fw
(rt )2W1 g
pT +1 (rt )w
;T +1
(rt ) + 1
pT +1 (rt )w (wT )(rt ) + 1
pT +1 (rt ) w1 + 1fw
pT +1 (rt ) w1 + 1fw
(rt )2W
= 1gw
1
(rt )2W
= 1gw
1
;
where the second equality follows from (18) for t = T , and follows by induction for t < T .
Recall that pT +1 is de…ned by (17). On the other hand, p(wT )(rt ) is de…ned in (8) using
w = w (wT ). Since w
wT +1 (rt ) = 1fw
;T +1
(rt )2W1 g
(rt ) = w (w2T )(rt ), we have pT +1 (rt ) = p(wT )(rt ). Hence,
p(wT )(rt )w (wT )(rt ) + 1
p(wT )(rt ) w1 + 1fw
(rt )2W
= 1gw
1
:
= F (wT )(rt );
as desired.
As w
;T
(rt ), wT (rt ), and pT (rt ) converge to w (rt ), w(rt ), and p(rt ) by Lemma 7, the
following lemma implies Lemma 6:
Lemma 8 For all t = 1; :::; T
1, if w (rt ) 2 W1 , then pT (rt )w
;T
(rt ) + 1
pT (rt ) w1
Pareto dominates w (rt ).
Proof. For t = T
1, the claim is immediate since w
Suppose that the claim holds for each period
period t. By construction, pT (rt )w2;T (rt ) + 1
show that pT (rt )w1;T (rt ) + 1
pT (rt ) w11
;T
(rt ) = w (rt ) and so pT (rt ) = 1.
t + 1. We show that it also hold for
pT (rt ) w21
w2 (rt ). Thus, it su¢ ces to
w1 (rt ).
Note that
w
;T
= (1
= (1
(rt )
)u
)u
(rt ) + E wT (rt+1 )jrt
8
t+1 t
< P t+1
jr ) pT (rt+1 )w ;T (rt+1 ) + 1
r
:w (rt+1 )2W1 Pr (r
t
(r ) +
P
:
+
Pr (rt+1 jrt ) w1
t+1
t+1
r
while
w (rt ) = (1
(rt ) +
)u
X
:w (r
Pr
rt+1
T
p (r
t+1
) w
=
r
:
:w (r
)2W1
t+1
t
T
t+1
;T
)2W
= 1
rt+1 jrt w (rt ):
t+1
T
t+1
Pr (r jr ) p (r )w (r ) + 1 p (r ) w
P
t+1 t
+ rt+1 :w (rt+1 )2W
jr ) fw1 w (rt+1 )g
= 1 Pr (r
1
w (r
9
=
;
Hence,
w ;T (rt ) w (rt )
8
< P t+1
t+1
1
t+1
)
9
=
;
:
;
When w (rt+1 ) 2 W1 , the inductive hypothesis implies that
pT (rt+1 )w
;T
(rt+1 ) + 1
pT (rt+1 ) w1
w (rt+1 )
0:
On the other hand, note that
X
rt+1 jrt
Pr
rt+1 :w (rt+1 )2W
= 1
for some number l(rt )
w1
w (rt+1 ) = l(rt )(w1
w(r
~ t ))
0 and vector w(r
~ t) 2
= W1 . In total, we have
w
;T
(rt ) = w (rt ) + l(rt )(w1
w(r
^ t ))
(19)
w^1 (rt ), if w11
0 and vector w(r
^ t)
w(r
~ t) 2
= W1 . Since w11
n
o
w1 (rt ) then (19) implies that min w1;T (rt ); w11
w1 (rt ), and therefore pT (rt )w1;T (rt ) +
for some number l(rt )
pT (rt ) w11
1
w1 (rt ).
In addition, if w (rt+1 ) 2 W1 with probability one, then the
inductive hypothesis implies that w2;T (rt )
pT (rt )w1;T (rt ) + 1
w2 (rt ), and therefore pT (rt ) = 1 and
pT (rt ) w11 = w1;T (rt )
= w1 (rt ) + l(rt )(w11
w^1 (rt ))
w1 (rt ):
Hence, it remains only to consider the case where w11 < w1 (rt ) and l(rt ) > 0.
In this case, take a normal vector
have
1
1
0 and
1
2
1
of the supporting hyperplane of u (A) at w1 . We
> 0, and in addition (as w^ (rt )
1
1
w1
w1
w~ (rt ) 2 u (A) and w (rt ) 2 u (A))
w(r
^ t)
0;
w (rt )
0:
As w11
w^1 (rt ) > 0 and w11
w1 (rt ) < 0, we have
w^2 (rt ) w21
w11 w^1 (rt )
1
1
1
2
w21 w2 (rt )
:
w1 (rt ) w11
Next, by (19), the slope of the line from w (rt ) to w
;T
(rt ) equals the slope of the line
from w(r
^ t ) to w1 . Hence,
w2 (rt ) w2;T (rt )
w1;T (rt ) w1 (rt )
w21 w2 (rt )
:
w1 (rt ) w11
In this inequality, the denominator of the left-hand side and the numerator of the right-hand
side are both positive: w1;T (rt ) > w1 (rt ) by (19) and l(rt ) > 0, while w21 > w2 (rt ) because
w (rt ) 2 W1 and w (rt ) 6= w21 . Therefore, the inequality is equivalent to
w2 (rt ) w2;T (rt )
w21 w2 (rt )
w1;T (rt ) w1 (rt )
:
w1 (rt ) w11
Now, let q 2 [0; 1] be the number such that
qw1;T (rt ) + (1
Note that
q) w11 = w1 (rt ):
w2 (rt ) w2;T (rt )
p (r ) =
;
w21 w2 (rt )
T
1
while
1
t
w1;T (rt ) w1 (rt )
:
q=
w1 (rt ) w11
Hence, pT (rt ) > q. Finally, we have seen that w11
pT (rt )w1;T rt + 1
w1 (rt )
qw1;T rt + (1
pT (rt ) w11
w1;T (rt ), so we have
q) w11 = w1 (rt ):
11.8 Proof of Proposition 2

We wish to show that supp(δ) = A^∞ whenever int(∩_{i∈I} W_i^{A^∞} ∩ u(A^∞)) ≠ ∅. The proof consists of two steps: we show that supp(δ) ⊆ A^∞ in Section 11.8.1, and that supp(δ) ⊇ A^∞ in Section 11.8.2.

11.8.1 Proof of supp(δ) ⊆ A^∞

Recall that A^∞ is the largest fixed point of the monotone operator S. Hence, to prove that supp(δ) ⊆ A^∞, it suffices to show that there exists a set supp_{ε↓0}(δ) such that supp(δ) ⊆ supp_{ε↓0}(δ) and supp_{ε↓0}(δ) = S(supp_{ε↓0}(δ)).
To this end, it will be useful to consider "-sequential equilibria in the repeated game with
monitoring structure p . We say that an assessment is an "-sequential equilibrium if it is
consistent, and, for each player i and (on- or o¤-path) history hti , player i’s deviation gain
at history hti is no more than ". Let
supp1" ( ) =
8
<
:
0
a2A:@
there exist an "-sequential equilibrium
such that a 2 supp ( (;))
19
=
A
;
be the on-path support of actions in the initial period in "-sequential equilibrium with
monitoring structure p . In addition, let
suppt" ( ) =
8
<
:
0
a2A:@
there exist an "-sequential equilibrium
and history
such that a 2 supp ( (htm ))
be the (possibly o¤-path) support of actions in period t
1.
htm
19
=
A
;
We will establish the following equality in Lemma 10:
\
">0
[
t
suppt" ( ) =
\
">0
supp1" ( ):
(20)
As a preliminary result toward the proof of Lemma 10, we will show in Lemma 9 that
T
1
">0 supp" ( ) is a product set.
T
supp1" ( ). Since a sequential equilibrium is an "-sequential equiT S
t
librium for every ", we have supp( )
">0
t supp" ( ). Together with (20), this implies
Let supp"#0 ( )
that supp( )
">0
supp"#0 ( ). Hence, it remains to show that supp"#0 ( ) = S(supp"#0 ( )).
Finally, as supp"#0 ( ) is a product set by Lemma 9 and S(A)
A, it su¢ ces to show that supp"#0 ( )
S(supp"#0 ( )).
A for every product set
This is established in Lemma 11,
completing the proof.
We now prove Lemmas 9, 10, and 11.
Given a mediator’s strategy , let
i
denote the marginal distribution of player i’s rec-
ommendation, and let
supp1i;" ( ) =
Lemma 9
T
">0
Proof. Since
T
1
">0 supp" ( ).
T
8
<
:
0
ai 2 Ai : @
there exists an "-sequential equilibrium
such that ai 2 supp ( i (;))
supp1" ( ) is a product set:
">0
supp1" ( )
T
">0
Q
T
">0
supp1" ( ) =
T
">0
Q
19
=
A :
;
supp1i;" ( ).
i2I
supp1i;" ( ), we are left to show
i2I
T
">0
Q
supp1i;" ( )
i2I
To this end, …x an arbitrary "-sequential equilibrium strategy .
It suf-
…ces to show that, for each " > 0, there exists a 2"-sequential equilibrium strategy product
Q
such that supp product (;) =
supp( i (;)). (This is su¢ cient because it implies that
i2I
T
T
T
Q
1
1
supp1i;" ( )
">0 supp" ( ).)
">0 supp2" ( ) =
">0
i2I
For
> 0, de…ne the period 1 recommendation distribution under
product (;)(a1 )
=
8
>
>
>
>
>
>
>
<
(1
Q
>
>
supp(
>
>
i2I
>
>
>
:
) (;)(a1 )
product (;)
by
if a1 2 supp( (;));
Q
if a1 2
supp( i (;))
i2I
i (;))
jsupp( (;))j
but a1 2
= supp( (;));
otherwise.
0
Note that the mediator recommends action a1 with positive probability if ai;1 is in the
support of the marginal distribution of player i’s recommendation under . In particular,
Q
supp product (;) =
supp( i (;)).
i2I
In subsequent periods t
2, if a1 2 supp( (;)) was recommended in period 1, then the
Q
supp( i (;)) but
mediator recommends actions according to product (htm ) = (htm ). If a1 2
i2I
a1 2
= supp( (;)) was recommended, then the mediator “resets”the history and recommends
actions according to
as if the game started from period 2: formally,
t
product (hm )
~ t 1)
= (h
m
~ t 1 = ht .
with (r1 ; a1 ); h
m
m
As
! 0, player i’s belief about a
i;1
given ai;1 2 supp( i (;)) under
her belief given ai;1 in the original equilibrium . Since
product
product
converges to
is an "-sequential equilibrium,
this implies that player i’s deviation gain in period 1 converges to at most " as
! 0. In
addition, if a1 2 supp( (;)), then continuation play from period 2 on is the same under
Q
supp( i (;)) but a1 2
= supp( (;)), then continuation
product and . Furthermore, if a1 2
i2I
play from period 2 on under
product
for any " > 0, there exists
> 0 such that player i’s deviation gain at any history under
product
is at most 2", so
Lemma 10
T
">0
S
t
product
suppt" ( ) =
is the same as play from period 1 on under . In sum,
is an 2"-sequential equilibrium strategy.
T
">0
supp1" ( ).
T S
T
T S
t
1
Proof. Since ">0 t suppt" ( )
">0
t supp" ( )
">0 supp" ( ) by de…nition, we are left to show
T
1
together with a consistent belief
">0 supp" ( ). Fix an "-sequential equilibrium strategy
system
.
As in Lemma 9, it su¢ ces to show that, for each t and ", there exists a 2"-
sequential equilibrium strategy with consistent belief system such that supp ( (;)) =
[
supp( (htm )), where the union is taken over all histories htm , including o¤-path histories.
htm
Fix t and " arbitrarily.
Since ( ; ) is an "-sequential equilibrium, there exists
such that, for each strategy-belief system pair
;
, if
;
>0
satis…es
(21)
(hm ) = (hm )
and
(hm jhi ) 2 (1
)2 (hm jhi );
1
(1
)2
(hm jhi )
for each i, , hi , and hm — that is, if the recommendation schedule is the same as
(22)
and the
posterior is close to
an
after each history— then
is 2"-sequentially rational. Fix such
;
> 0.
Since
is an "-sequential equilibrium, there exists a sequence of completely mixed strat-
egy pro…les
k 1
k=1
and associated belief systems
obedient strategy pro…le at
strategy
n
and
k
converges to
k 1
k=1
.
such that
k
converges to the
Consider the following mediator’s
. In period 1, the mediator draws a “…ctitious”history htm according to
the probability of htm given
n
n
(htm ),
. Given htm , the mediator draws r according to (htm ). In
period 1, the mediator sends (hti ; ri ) to each player i, where hti is player i’s component in htm .
[
Note that, since n is completely mixed, we have supp ( (;)) =
supp( (htm )).
htm
In period
2, let
the beginning of period
h2;
m
= a1 ; (r 0 ; a 0 )
1
0 =2
with
2;2
hm
= a1 be the mediator’s history at
except for the recommendation pro…le r in period 1. The mediator
t
recommends actions as if the history h2;
m happened after (hm ; r) in the original equilibrium :
2;
t
that is, she recommends action pro…les according to (htm ; r; h2;
m ), where (hm ; r; hm ) denotes
t 1
t
the concatenation of htm , r, and h2;
m . Similarly, let hi = (ri;s ; as )s=1 be player i’s …ctitious
history, and let h2;
= a1 ; (ri; 0 ; a 0 )
i
1
0 =2
be her history at the beginning of period .
2;
t
When we view hm = (htm ; r; h2;
m ) and hi = hi ; ri ; hi
as the history (including the
…ctitious one drawn at the beginning of the game), this new strategy
Hence, it su¢ ces to show that we can construct a consistent belief system
n
satis…es (21).
which satis…es
(22). To this end, denote player i’s belief about the mediator’s history hm when player i’s
history in period
is hi = hti ; ri ; h2;
i
by Pr (hm j hi ).
We construct the following sequence of perturbed full-support strategies: each player i
plays
k
t
i (hi ; ri )
k 1
k=1
in period 1 and then plays
2;
k
t
i (hi ; ri ; hi
; ri; ) in period
2. Recall that
is the sequence converging to the original equilibrium .
Let Pr
n; k
(hm j hi ) be player i’s belief about hm conditional on her history hi , given
that the …ctitious history htm is drawn from
n
at the beginning of the game and the players
take
Pr
k
n; k
in the subsequent periods. We have
k
Pr
(hm j hi ) =
k
Pr
t
t
(h2;
m j hm ; r) Pr (r j hm ) Pr
h2;
j hti ; ri Pr (ri j hti ) Pr
i
~ t ;~
h
m r
i
Pr
k
h2;
i
(htm )
n
(hti )
k
t
t
(h2;
m j hm ; r) Pr (r j hm )
~ t ; ri ; r~ i Pr ri ; r~ i j h
~t
jh
Pr
= P
n
m
m
n
Pr
n
~ t j ht
h
m
i
By consistency of ( ; ), for su¢ ciently large n, we have
n
(1
htm
)
j
Pr (htm )
=
n
Pr (hti )
hti
n
htm j hti
1
htm j hti
1
for all htm and hti . Moreover,
(1
)
~ t j ht
h
m
i
Pr
n
~ t j ht =
h
m
i
n
1
~ t j ht
h
m
i
1
~ t j ht
h
m
i
~ t and ht .
for each h
i
m
Fix such a su¢ ciently large n, and de…ne
(hm j hi ) = lim Pr
n; k
k!1
Since the belief system
(hm j hi ) :
is consistent, we have
lim Pr
k
k!1
t
2;
t
h2;
m j hm ; r = (hm j hm ; r)
Hence, for each hm and hi , we have
n
(hm j hi ) = P
=
That is,
~ t ;~
h
m r
(1
i
t
t
(h2;
(htm j hti )
m j hm ; r) Pr (r j hm ) Pr
~ t ; ri ; r~ i Pr ri ; r~ i j h
~ t Pr n h
~ t j ht
h2;
jh
m
m
m
i
i
)2 (hm j hi ) ;
1
(1
satis…es (22), as desired.
)2
(hm j hi ) :
Pr (htm )
:
n
Pr (hti )
Lemma 11 supp"#0 ( )
S supp"#0 ( ) :
Since supp"#0 ( ) = lim"!0 supp1" ( ) by de…nition,
Proof. Fix a 2 supp"#0 ( ) arbitrarily.
there exists a sequence of "-sequential equilibria f
a 2 limn!1 supp(
supp(
"n
"n
(f;g)).
"n 1
gn=1
with limn!1 "n = 0 such that
Taking a subsequence if necessary, we can assume that a 2
(f;g)) for each n.
For each n, incentive compatibility of a player i who is recommended ai in period 1
implies the following three conditions:
n
1. Let
i
2
supp1"n ( )
be the conditional distribution of player i’s opponents’
i
actions when ai is recommended to player i. In addition, let win (^
ai jai ) be player i’s
expected continuation payo¤ when she is recommended ai and plays a
^i . Then
(1
) ui ai ;
n
i
+ win (ai jai ) + "n
max (1
a
^i 2Ai
) ui a
^i ;
n
i
+ win (^
ai jai ) :
2. Player i’s continuation payo¤ does not exceed her best feasible payo¤ with the support
S
of actions restricted to t suppt"n ( ):
win (ai jai )
ui
S max
a2 t suppt"n ( )
(a) :
3. For each a
^i 2 Ai , by "-sequential rationality for player i after taking a
^i , player i’s continuation payo¤ after a
^i is at least her minmax payo¤ with the support of punishment
S
t
actions restricted to
:
t supp"n ( )
i
win (^
ai jai )
Since
limn!1
min
^
i2
(
S
t
t supp"n ( ))
max ui (~
ai ; ^ i )
i
a
~i 2Ai
"n :
(A i ) and u(A) are compact, taking a subsequence if necessary, there exist
n
i
i
=
and wi (^
ai jai ) = limn!1 win (^
ai jai ) for each a
^i . Moreover, since A is …nite, there
exists N such that for each n
8
>
>
>
<
>
>
>
:
N , we have
supp1"n ( ) = limn!1 supp1"n ( ) = lim"!0 supp1" ( ) = supp"#0 ( ) =
S
t
suppt"n ( ) = limn!1
Hence, for n
n
i
S
t
suppt"n ( ) = lim"!0
S
Y
supp"#0;i ( );
i2I
t
suppt" ( )
=
|{z}
supp"#0 ( ) =
Y
supp"#0;i ( ):
i2I
by Lemma (10)
(23)
N , we have
supp1"n ( )
2
win (ai jai )
a2
win (^
ai jai )
ui
S maxt
t supp"n ( )
^
i2
=
i
(a) =
min
Therefore,
i
S
(
max
a2supp"#0 ( )
n
i
i
=
i
supp"#0; i ( ) ;
ui (a) ;
max ui (~
ai ; ^ i )
t
t supp"n ( ))
= limn!1
supp"#0 ( )
a
~i 2Ai
"n =
^
i2
min
(supp"#0;
max ui (~
ai ; ^ i )
i(
~i 2Ai
)) a
and wi (^
ai jai ) = limn!1 win (^
ai jai ) satisfy the following
three conditions:
1. If ai is recommended, player i’s opponents play
i,
player i’s on-path continuation
payo¤ is wi (ai jai ), and player i’s continuation payo¤ after deviating to a
^i is wi (^
ai jai ),
then player i wants to take ai :
(1
) ui (ai ;
i)
+ wi (ai jai )
max f(1
a
^i 2Ai
) ui (^
ai ;
i)
+ wi (^
ai jai )g ;
(24)
2. Player i’s continuation payo¤ does not exceed her best feasible payo¤ with the support
of actions restricted to supp"#0 ( ):
wi (ai jai )
max
a2supp"#0 ( )
ui (a)
3. Player i’s continuation payo¤ following a deviation is at least her minmax payo¤ with
"n :
the support of punishment actions restricted to supp"#0; i ( ):
wi (^
ai jai )
^
i2
min
(supp"#0;
max ui (~
ai ; ^ i ) :
i(
~i 2Ai
)) a
Therefore, (24) implies that (11) holds with A = supp"#0 ( ). Hence, ai 2 Si supp"#0 ( )
for all i 2 I, so a 2 S supp"#0 ( ) .
11.8.2
Proof of supp( )
We show that if int
T
i2I
A1
Wi \ u (A1 ) 6= ; then. A1
to the proof of Lemma 1.
T
Fix v 2int i2I Wi \ u (A1 ) 6= ;, and let
(r) > 0 for all r 2 A1 . Let
i
^
2
supp( ). The argument is similar
(A1 ) be such that u ( ) = v and
be a solution to the problem
min 1 max ui (^
ai ;
i2
(A ) ai 2Ai
i) :
Let " i be the following full-support (within A1 ) approximation of i : " i = (1 ") i +
P
T
" a i 2A1 Aa 1i . Since v 2int i2I Wi , there exists " > 0 such that, for each i 2 I, we
i j
ij
have
1
vi > max ui ai ; " i +
fui (ai ; r i ) ui (r)g :
(25)
max
1
ai 2Ai
r2A ;ai 2Ai
In addition, choose " > 0 small enough such that, for each player i, some best response to
"
i
is included in A1
i : this is always possible, as every best response to
i
is included in
A1
i .
We can now construct an equilibrium strategy
with supp (
struction is identical to that in the proof of Lemma 1, except that
and player i’s deviations are punished by recommending
"
i
(;)) = A1 . The conis recommended on path,
to her opponents. Incentive
compatibility follows from (25), just as it follows from (1) in the proof of Lemma 1.
11.9 Proof of Proposition 5

Consider the following five-player game with common discount factor δ = 3/4. Player 1 has two actions {1_1, 2_1}, players 2 and 4 have three actions {1_2, 2_2, 3_2} and {1_4, 2_4, 3_4}, and player 3 has four actions {1_3, 2_3, 3_3, 4_3}. Player 5 is a dummy player with one action, and her utility is always equal to −u_1(a).

When player 3 plays 1_3, the payoff matrix for players 1 and 2 is as follows, regardless of player 4's actions:

         1_2        2_2        3_2
1_1     0, 1      20, 7      20, 1
2_1    −3, 0      20, 0      20, 6

In addition, player 3's payoff and player 4's payoff are 0, regardless of (a_1, a_2).

When player 3 plays 2_3, 3_3, or 4_3, the payoff matrix (entries are payoffs to players 1, 2, 3, 4) is as follows, regardless of (a_1, a_2):

         1_4             2_4             3_4
2_3    0, 1, 0, 0      1, 0, 1, 1      1, 0, 1, −1
3_3    1, 1, 0, 0      1, 0, 1, −1     1, 0, 1, 1
4_3    1, 1, 0, 0      1, 0, 1, 1      10, 3, 1, 1
Note that the minmax payoff is zero for every player except player 5 (whose minmax payoff is −20), while the players' maximum deviation gains are d_1 = 3, d_2 = 6, d_3 = 1, d_4 = 2, and d_5 = 0. Letting v = (10, 3, 1, 1, −10) in the formula for δ̄ shows that δ̄ is at most

max{ 3/(3 + 10), 6/(6 + 3), 1/(1 + 1), 2/(2 + 1), 0/(0 − 10 + 20) } = 2/3.

Thus δ = 3/4 > δ̄, so W ≠ ∅.

Consider the Pareto weight λ = (0, 0, 0, 0, 1): that is, we want to maximize player 5's payoff, or equivalently minimize player 1's payoff. We show that player 5's best payoff is strictly less than zero with mediated perfect monitoring, while her best payoff equals zero with some private monitoring structure.
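The maximum in the last display is easy to verify; the following check uses the deviation gains and minmax payoffs just listed (the list `gap` holds v_i minus player i's minmax payoff).

```python
# Deviation gains d_i and gaps v_i - minmax_i for v = (10, 3, 1, 1, -10),
# with minmax payoffs (0, 0, 0, 0, -20) as stated above.
d = [3, 6, 1, 2, 0]
gap = [10 - 0, 3 - 0, 1 - 0, 1 - 0, -10 - (-20)]
threshold = max(di / (di + g) for di, g in zip(d, gap))
assert abs(threshold - 2 / 3) < 1e-12
print(threshold)   # about 0.667, below the example's discount factor of 3/4
```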
11.9.1
Mediated Perfect Monitoring
We show that Emed ( ) does not include a payo¤ vector v with v1 = 0. Since Emed ( ) coincides
with the set of (possibly weak) sequential equilibrium payo¤s when W 6= ; (see the proof
of Lemma 4), it su¢ ces to show player 1’s payo¤ is strictly positive in every sequential
equilibrium with mediated perfect monitoring.
To do so, we …x a putative sequential
equilibrium with v1 = 0, and establish a series of claims leading to a contradiction.
Claim 1: At any on-path history where player 1 has always played 11 in the past (where
this includes the null history), player 3 must play 13 with probability 1.
Proof: Fix an on-path history htm where player 1 has always played 11 . Let
denote the
ex ante probability of reaching history htm , and let Pr (ai ) be the probability that action ai
is played in period t conditional on reaching history htm .
Note that player 1’s equilibrium payo¤ is bounded from below by
(1
)
t
fPr(23 ) Pr(24 or 34 j 23 ) + Pr(33 ) + Pr(43 )g ;
as if she always plays 11 she gets payo¤ at least 0 in every period and reaches history htm
with probability at least . Hence, Pr(33 ) = Pr(43 ) = 0.
Given Pr(33 ) = Pr(43 ) = 0, we must also have Pr(23 ) = 0.
To see why, suppose
otherwise. Then, we must have Pr(24 or 34 j 23 ) = 0, that is, Pr(14 j 23 ) = 1. Hence, player
4’s deviation gain from 14 to 24 is no less than
Pr(23 j 14 ) =
Pr(14 j 23 ) Pr(23 )
Pr(23 )
=
:
Pr(14 )
Pr(14 )
Therefore, to ensure that player 4 does not deviate, we must guarantee him a payo¤ 1
Pr(23 )
Pr(14 )
whenever 14 is recommended. But then player 4’s ex ante payo¤ at history htm (before his
recommendation is drawn) is at least
Pr(14 ) (1
)0 +
1
Pr(23 )
Pr(14 )
+ Pr(a4 6= 14 )0 > 0;
since player 4 can always guarantee a payo¤ of 0 after any recommendation by playing
14 forever.
Hence, player 4’s equilibrium continuation payo¤ is strictly positive.
But,
by feasibility, this implies that player 1’s equilibrium continuation payo¤ is also strictly
positive, which is a contradiction. Therefore, we must have Pr(23 ) = Pr(33 ) = Pr(43 ) = 0,
or equivalently Pr (13 ) = 1.
Claim 2: At any on-path history where player 1 has always played 11 in the past, player
2 must play 12 with probability 1.
Proof: Given that Pr (13 ) = 1, this follows by the same argument as Claim 1: if Pr (22 )
or Pr (32 ) were positive, then player 1 could guarantee a positive payo¤ by always playing
11 .
Claim 3: At any on-path history where player 1 has always played 11 in the past, player
1 must play each of 11 and 21 with strictly positive probability.
Proof: Given that Pr (13 ) = 1 and Pr (21 ) = 1, player 2’s deviation gain is 6 if player 1
plays a pure action (either 11 or 21 ). Therefore, to ensure that player 2 does not deviate, his
continuation payo¤ must be at least
1
1’s continuation payo¤ must be at least
(20; 7) has slope
10
).
3
(6) = 2. But, by feasibility, this implies that player
10
3
(noting that the line segment between (0; 1) and
Hence, even if player 1 receives payo¤ 3 in period t, her payo¤ from
period t on is at least (1
) ( 3) +
10
3
=
7
4
> 0. So player 1 could guarantee a positive
payo¤ by playing 11 up to period t and then following her equilibrium strategy.
Claim 4: At any on-path history where player 1 has always played 11 in the past, player
1’s continuation payo¤ must equal 0 after 11 is recommended and must equal 1 after 21 is
recommended.
Proof: Player 1’s continuation payo¤ cannot be less than her minmax payo¤ of 0. If her
continuation payo¤ after 11 is recommended were strictly positive, then she could guarantee a
positive payo¤ by always playing 11 . Hence, her continuation payo¤ after 11 is recommended
must equal 0. If her continuation payo¤ after 21 is recommended were greater than 1, then
she could guarantee a positive payo¤ by playing 11 up to period t and then following her
equilibrium strategy (as (1
) ( 3) + (1) = 0).
If her continuation payo¤ after 21 is
recommended were less than 1, she would deviate from 21 to 11 .
Claim 5: Let v be the supremum of player 2’s continuation payo¤ over all on-path
4
.
3
histories where player 1’s continuation payo¤ equals 1. Then v
Proof: Let v be the supremum of player 2’s continuation payo¤ over all on-path histories
where player 1’s continuation payo¤ equals 0. Let P be the set of mixing probabilities p such
that, at some on-path history htm where player 1 has always played 11 in the past, player 1
plays p11 + (1
p) 21 . As Pr (13 ) = 1 and Pr (21 ) = 1 at such a history, player 2’s deviation
gain is at least
1
max f6p; 6
6pg = max f2p; 2
2pg :
Therefore, to ensure that player 2 does not deviate, his continuation payo¤ at such a history
must be at least max f2p; 2
2pg. Hence,
v
max f2p; 2
2pg for all p 2 P:
On the other hand, by the de…nition of v and v and Claim 4, we have
v
sup (1
) p + pv + (1
p) v :
p2P
Hence, there exists a number p 2 [0; 1] such that
(1
) p + pv + (1
max f2p; 2
p) v
2pg ;
or equivalently
v
1
(1
p
max f2p; 2
p)
2pg
(1
(1
)p
:
p)
It is straightforward to show that the right-hand side of this inequality is minimized at p = 12 ,
which yields
v
1
4
= :
3
Finally, note that the payo¤ vector (1; v ) is not in the feasible set for any v
4
,
3
as
13
10
is the largest payo¤ of player 2’s that is consistent with player 1 receiving payo¤ 1. We thus
obtain a contradiction.
11.9.2 Private Monitoring

Suppose that players 1, 2, 3, and 5 observe everyone's actions perfectly, while player 4 only observes her own action and payoff and player 2's action. We claim that (0, 1, 0, 0, 0) is a sequential equilibrium payoff with this private monitoring structure.

On the equilibrium path, in period 1, players take (1_1, 1_2, 1_3, 1_4) and (2_1, 1_2, 1_3, 1_4) with probabilities 1/2, 1/2. From period 2 on, player 3 takes 2_3 if player 1 took 1_1 in period 1, and she takes 3_3 if player 1 took 2_1. Players 1, 2, and 4 take (1_1, 1_2, 1_4) with probability one.

Off the equilibrium path, all players' deviations are ignored, except for deviations by player 2 in period 1 and deviations by player 4 in period t ≥ 2. If player 2 deviates in period 1 (resp., player 4 deviates in period t ≥ 2) then, in each period τ ≥ 2 (resp., each period τ ≥ t + 1), players 1 and 2 play 1_1 and 1_2 with probability one, player 3 mixes 2_3 and 3_3 with probabilities 1/2, 1/2, independently across periods, and player 4 mixes 2_4 and 3_4 with probabilities 1/2, 1/2, independently across periods. Note that this action profile is a static Nash equilibrium.

It remains only to check incentive compatibility. Player 3 always plays a static best response. Players 1 and 2 play static best responses from period 2 onward, on and off the equilibrium path. Consider player 4's incentives. In period 1, player 4's payoff equals 0 regardless of her action. Hence, in period 2 player 4 believes that player 1 played (1/2)1_1 + (1/2)2_1 in period 1, and thus that Pr(2_3) = Pr(3_3) = 1/2 in period 2. In addition, player 4's payoff remains constant at 0 as long as she plays 1_4, so in every period t she continues to believe that Pr(2_3) = Pr(3_3) = 1/2. Hence, playing 1_4 is a static best response, and player 4 is minmaxed forever if she deviates, so following her recommendation is incentive compatible on path. Furthermore, if player 2 deviates in period 1 (or player 4 herself deviates in period t ≥ 2), then player 4 sees this and anticipates that player 3 will thenceforth play (1/2)2_3 + (1/2)3_3 independently across periods, so (1/2)2_4 + (1/2)3_4 is a best response.
We are left to check players 1 and 2's incentives in period 1. For player 1, recalling that δ = 3/4, we have

payoff of taking 1_1 = (1 − δ)(0) + δ(0)   [after 1_1, player 3 takes 2_3 with probability one]
                     = (1 − δ)(−3) + δ(1)   [after 2_1, player 3 takes 3_3 with probability one]
                     = payoff of taking 2_1.

Hence, player 1 is willing to mix between 1_1 and 2_1. For player 2,

payoff of taking 1_2 = (1 − δ)[(1/2)(1) + (1/2)(0)] + δ(1)   [whether player 3 takes 2_3 or 3_3, player 2 obtains 1]
                     = (1 − δ)[(1/2)(7) + (1/2)(0)] + δ(0) = payoff of taking 2_2
                     = (1 − δ)[(1/2)(1) + (1/2)(6)] + δ(0) = payoff of taking 3_2.

Hence, player 2 is willing to play 1_2.
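These period-1 indifference conditions are easy to verify numerically; the snippet below plugs in the stage payoffs and continuation values from the two displays above, with δ = 3/4.

```python
from fractions import Fraction as F

d = F(3, 4)
# Player 1: action 1_1 (stage 0, continuation 0) versus 2_1 (stage -3, continuation 1).
assert (1 - d) * 0 + d * 0 == (1 - d) * (-3) + d * 1 == 0

# Player 2: stage payoffs against 1/2 * 1_1 + 1/2 * 2_1, continuation 1 after obedience,
# continuation 0 after a deviation (the punishment phase).
take_1 = (1 - d) * (F(1, 2) * 1 + F(1, 2) * 0) + d * 1
take_2 = (1 - d) * (F(1, 2) * 7 + F(1, 2) * 0) + d * 0
take_3 = (1 - d) * (F(1, 2) * 1 + F(1, 2) * 6) + d * 0
assert take_1 == take_2 == take_3 == F(7, 8)
```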
11.10 Proof of Corollary 1

Since u(A) is convex and v > u, there exists v′ ∈ u(A) and ε > 0 such that v − 2ε > v′ > u. Taking δ_1 = max_{i∈I} d_i/(d_i + v′_i − u_i) and applying Proposition 6 yields v′ ∈ E^med(δ) for all δ > δ_1. Thus, for every δ > δ_1, there exists v(δ) ∈ E^med(δ) such that v − ε > v(δ) > u.

Let δ_2 = max_{i∈I} d_i/(d_i + ε) and let δ̄ = max{δ_1, δ_2}. Then, for every δ > δ̄, the following is an equilibrium: On path, the mediator recommends a correlated action profile that attains v. If anyone deviates, play transitions to a sequential equilibrium yielding payoff v(δ). Such an equilibrium exists as v(δ) ∈ E^med(δ) for δ > δ_1. Finally, since this punishment results in a loss of continuation payoff of at least ε for each player, the mediator's on-path recommendations are incentive compatible for δ > δ_2.
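To make the last step explicit: a one-shot deviation gains at most (1 − δ) d_i in the current period and, by construction, costs more than δ ε in continuation value, so obedience is guaranteed whenever

(1 − δ) d_i ≤ δ ε,  i.e.,  δ > d_i/(d_i + ε);

taking the maximum over i gives δ_2.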
References
[1] Abreu, D., P.K. Dutta, and L. Smith (1994), “The Folk Theorem for Repeated Games:
a NEU Condition,”Econometrica, 62, 939-948.
[2] Abreu, D., D. Pearce, and E. Stacchetti (1990), “Toward a Theory of Discounted Repeated Games with Imperfect Monitoring,”Econometrica, 58, 1041-1063.
[3] Amarante, M. (2003), “Recursive Structure and Equilibria in Games with Private Monitoring,”Economic Theory, 22, 353-374.
[4] Athey, S. and K. Bagwell (2001), “Optimal Collusion with Private Information,”RAND
Journal of Economics, 32, 428-465.
[5] Awaya, Y. and V. Krishna (2014), “On Tacit versus Explicit Collusion,”mimeo.
[6] Ben-Porath, E. and M. Kahneman (1996), “Communication in Repeated Games with
Private Monitoring,”Journal of Economic Theory, 70, 281-297.
[7] Bergemann, D. and S. Morris (2013), “Robust Predictions in Games with Incomplete
Information,”Econometrica, 81, 1251-1308.
[8] Cherry, J. and L. Smith (2011), "Unattainable Payoffs for Repeated Games of Private Monitoring," mimeo.
[9] Compte, O. (1998), “Communication in Repeated Games with Imperfect Private Monitoring,”Econometrica, 66, 597-626.
[10] Dhillon, A. and J.-F. Mertens, (1996) “Perfect Correlated Equilibria,” Journal of Economic Theory, 68, 279-302.
[11] Forges, F. (1986), “An Approach to Communication Equilibria,” Econometrica, 54,
1375-1385.
[12] Forges, F. (2009), “Correlated Equilibrium and Communication in Games,”Encyclopedia of Complexity and Systems Science, edited by R. Meyers, Springer: New York.
[13] Forges, F., J.-F. Mertens, and A. Neyman (1986), “A Counterexample to the Folk
Theorem with Discounting,”Economics Letters, 20, 7.
[14] Fuchs, W. (2007), “Contracting with Repeated Moral Hazard and Private Evaluations,”
American Economic Review, 97, 1432-1448.
[15] Fudenberg, D. and D. Levine (1983), "Subgame-Perfect Equilibria of Finite- and Infinite-Horizon Games," Journal of Economic Theory, 31, 251-268.
[16] Fudenberg, D., D. Levine, and E. Maskin (1994), “The Folk Theorem with Imperfect
Public Information,”Econometrica, 62, 997-1039.
[17] Gossner, O. (2000), “Comparison of Information Structures,” Games and Economic
Behavior, 30, 44-63.
[18] Green, E.J. and R.H. Porter (1984), “Noncooperative Collusion under Imperfect Price
Information,”Econometrica, 52, 87-100.
[19] Heller, Y., E. Solan, and T. Tomala (2012), "Communication, Correlation and Cheap Talk in Games with Public Information," Games and Economic Behavior, 74, 222-234.
[20] Kandori, M. (1991), “Cooperation in Finitely Repeated Games with Imperfect Private
Information,”mimeo.
[21] Kandori, M. (2002), “Introduction to Repeated Games with Private Monitoring,”Journal of Economic Theory, 102, 1-15.
[22] Kandori, M. and H. Matsushima (1998), “Private Observation, Communication and
Collusion,”Econometrica, 66, 627-652.
[23] Kandori, M. and I. Obara (2010), “Towards a Belief-Based Theory of Repeated Games
with Private Monitoring: An Application of POMDP,”mimeo.
[24] Lehrer, E. (1992), “Correlated Equilibria in Two-Player Repeated Games with Nonobservable Actions,”Mathematics of Operations Research, 17, 175-199.
[25] Lehrer, E., D. Rosenberg, and E. Shmaya (2010), “Signaling and Mediation in Games
with Common Interests,”Games and Economic Behavior, 68, 670-682.
[26] Levin, J. (2003), “Relational Incentive Contracts,” American Economic Review, 93,
835-857.
[27] MacLeod, B.W. (2003), “Optimal Contracting with Subjective Evaluation,” American
Economic Review, 93, 216-240.
[28] Mailath, G.J., S.A. Matthews, and T. Sekiguchi (2002), “Private Strategies in Finitely
Repeated Games with Imperfect Public Monitoring,”Contributions in Theoretical Economics, 2.
[29] Mailath, G.J. and L. Samuelson (2006), Repeated Games and Reputations, Oxford University Press: Oxford.
[30] Mertens, J.-F., S. Sorin, and S. Zamir (1994), “Repeated Games. Part A,” CORE DP
9420.
[31] Miyahara, Y. and T. Sekiguchi (2013), “Finitely Repeated Games with Monitoring
Options,”Journal of Economic Theory, 148, 1929-1952.
[32] Myerson, R.B. (1986), “Multistage Games with Communication,” Econometrica, 54,
323-358.
[33] Pai, M.M, A. Roth, and J. Ullman (2014), “An Anti-Folk Theorem for Large Repeated
Games with Imperfect Monitoring,”mimeo.
[34] Phelan, C., and A. Skrzypacz (2012), “Beliefs and Private Monitoring,”Review of Economic Studies, 79, 1637-1660.
[35] Phelan, C. and E. Stacchetti (2001), “Sequential Equilibria in a Ramsey Tax Model,”
Econometrica, 69, 1491-1518.
[36] Rahman, D. (2012), “But Who Will Monitor the Monitor?,” American Economic Review, 102, 2767-2797.
[37] Rahman, D. (2014), “The Power of Communication,”American Economic Review, 104,
3737-51.
[38] Rahman, D. and I. Obara (2010), “Mediated Partnerships,”Econometrica, 78, 285-308.
[39] Renault, J. and T. Tomala (2004), "Communication Equilibrium Payoffs in Repeated Games with Imperfect Monitoring," Games and Economic Behavior, 49, 313-344.
[40] Sekiguchi, T. (2002), “Existence of Nontrivial Equilibria in Repeated Games with Imperfect Private Monitoring,”Games and Economic Behavior, 40, 299-321.
[41] Stigler, G.J. (1964), “A Theory of Oligopoly,”Journal of Political Economy, 72, 44-61.
[42] Sugaya, T. and A. Wolitzky (2014), “Perfect Versus Imperfect Monitoring in Repeated
Games,”mimeo.
[43] Tomala, T. (2009), “Perfect Communication Equilibria in Repeated Games with Imperfect Monitoring,”Games and Economic Behavior, 67, 682-694.
Online Appendix: Proof of Proposition 4
We prove that $E^{\mathrm{talk}}(\delta; p) = E^{\mathrm{med}}(\delta)$. Since we consider two-player games, whenever we say players $i$ and $j$, we assume that they are different players: $i \neq j$.

The structure of the proof is as follows: take any strategy of the mediator, $\tilde{\sigma}$, that satisfies (2) (perfect monitoring incentive compatibility), and let $\tilde{v}$ be the value when the players follow $\tilde{\sigma}$. Since each $\hat{v} \in E^{\mathrm{med}}(\delta)$ has a corresponding $\hat{\sigma}$ that satisfies (2), it suffices to show that, for each $\varepsilon > 0$, there exists a sequential equilibrium whose equilibrium payoff $v$ satisfies $\|v - \tilde{v}\| < \varepsilon$ in the following environment:
1. At the beginning of the game, each player $i$ receives a message $m_i^{\mathrm{mediator}}$ from the mediator.

2. In each period $t$, the stage game proceeds as follows:

(a) Given player $i$'s history $(m_i^{\mathrm{mediator}}, (m_\tau^{\mathrm{1st}}, a_\tau, m_\tau^{\mathrm{2nd}})_{\tau=1}^{t-1})$, each player $i$ sends the first message $m_{i,t}^{\mathrm{1st}}$ simultaneously.

(b) Given player $i$'s history $(m_i^{\mathrm{mediator}}, (m_\tau^{\mathrm{1st}}, a_\tau, m_\tau^{\mathrm{2nd}})_{\tau=1}^{t-1}, m_t^{\mathrm{1st}})$, each player $i$ takes action $a_{i,t}$ simultaneously.

(c) Given player $i$'s history $(m_i^{\mathrm{mediator}}, (m_\tau^{\mathrm{1st}}, a_\tau, m_\tau^{\mathrm{2nd}})_{\tau=1}^{t-1}, m_t^{\mathrm{1st}}, a_t)$, each player $i$ sends the second message $m_{i,t}^{\mathrm{2nd}}$ simultaneously.

We call this environment “perfect monitoring with cheap talk.”
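To fix ideas, the within-period timing can be sketched as follows. This is only an illustration of the message-action-message structure above; the Player objects and method names are our own stand-ins, and we omit each player's private initial message $m_i^{\mathrm{mediator}}$, on which she would also condition.

import copy

# One period of "perfect monitoring with cheap talk": simultaneous first
# messages, then simultaneous actions, then simultaneous second messages,
# all of which become publicly observed.
def play_period(players, public_history):
    first = {i: p.first_message(public_history) for i, p in players.items()}
    public_history.append(("m1st", dict(first)))

    actions = {i: p.action(public_history) for i, p in players.items()}
    public_history.append(("actions", dict(actions)))

    second = {i: p.second_message(public_history) for i, p in players.items()}
    public_history.append(("m2nd", dict(second)))
    return first, actions, second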
To this end, from $\tilde{\sigma}$, we first create a strict full-support equilibrium $\sigma$ with mediated perfect monitoring that yields payoffs close to $\tilde{v}$. We then move from $\sigma$ to a similar equilibrium $\sigma^{*}$, which will be easier to transform into an equilibrium with perfect monitoring with cheap talk. Finally, from $\sigma^{*}$, we create an equilibrium with perfect monitoring with cheap talk with the same on-path action distribution.

Construction and Properties of $\sigma$

In this subsection, we consider mediated perfect monitoring throughout. Since $W \neq \emptyset$, by Lemma 2, there exists a strict full-support equilibrium $\sigma^{\mathrm{strict}}$ with mediated perfect monitoring. As in the proof of Lemma 2, consider the following strategy of the mediator: In period 1, the mediator draws one of two states, $R^{\tilde{v}}$ and $R^{\mathrm{perturb}}$, with probabilities $1-\eta$ and $\eta$, respectively. In state $R^{\tilde{v}}$, the mediator's recommendation is determined as follows: If no player has deviated up to period $t$, the mediator recommends $r_t$ according to $\tilde{\sigma}(h_m^t)$. If only player $i$ has deviated, the mediator recommends $r_{-i,t}$ to player $j$ according to $\alpha_j$, and recommends some best response to $\alpha_j$ to player $i$. Multiple deviations are treated as in the proof of Lemma 1. On the other hand, in state $R^{\mathrm{perturb}}$, the mediator follows the equilibrium $\sigma^{\mathrm{strict}}$. Let $\sigma$ denote this strategy of the mediator. From now on, we fix $\eta > 0$ arbitrarily.
With mediated perfect monitoring, since $\sigma^{\mathrm{strict}}$ has full support, player $i$ believes that the mediator's state is $R^{\mathrm{perturb}}$ with positive probability after any history. Therefore, by (2) and the fact that $\sigma^{\mathrm{strict}}$ is a strict equilibrium, it is always strictly optimal for each player $i$ to follow her recommendation. This means that, for each period $t$, there exist $\varepsilon_t > 0$ and $T_t < \infty$ such that, for each player $i$ and on-path history $h_m^{t+1}$, we have
\[
(1-\delta)\,E\big[u_i(r_t) \mid h_m^t, r_{i,t}\big] + \delta\,E\Big[(1-\delta)\sum_{\tau=t+1}^{\infty}\delta^{\tau-t-1}\, u_i\big(\sigma(h_m^{\tau})\big) \,\Big|\, h_m^t, r_{i,t}\Big]
> \max_{a_i \in A_i}(1-\delta)\,E\big[u_i(a_i, r_{-i,t}) \mid h_m^t, r_{i,t}\big]
+ (\delta - \delta^{T_t})\Big[(1-\varepsilon_t)\max_{\hat a_i} u_i\big(\hat a_i, \alpha_j^{\varepsilon_t}\big) + \varepsilon_t \max_{a \in A} u_i(a)\Big]
+ \delta^{T_t}\max_{a \in A} u_i(a). \tag{26}
\]
That is, suppose that if player $i$ unilaterally deviates from an on-path history, then player $j$ virtually minmaxes player $i$ for $T_t - 1$ periods with probability $1 - \varepsilon_t$. (Recall that $\alpha_j$ is the minmax strategy and $\alpha_j^{\varepsilon}$ is a full-support perturbation of $\alpha_j$.) Then player $i$ has a strict incentive not to deviate from any recommendation in period $t$ on the equilibrium path. Equivalently, since $\sigma$ is a full-support recommendation, player $i$ has a strict incentive not to deviate unless she herself has deviated.
Moreover, for sufficiently small $\varepsilon_t > 0$, we have
\[
(1-\delta)\,E\big[u_i(r_t) \mid h_m^t, r_{i,t}\big] + \delta\,E\Big[(1-\delta)\sum_{\tau=t+1}^{\infty}\delta^{\tau-t-1}\, u_i\big(\sigma(h_m^{\tau})\big) \,\Big|\, h_m^t\Big]
> (1 - \delta^{T_t})\Big[(1-\varepsilon_t)\max_{\hat a_i} u_i\big(\hat a_i, \alpha_j^{\varepsilon_t}\big) + \varepsilon_t \max_{a \in A} u_i(a)\Big]
+ \delta^{T_t}\max_{a \in A} u_i(a). \tag{27}
\]
That is, if a deviation is punished with probability $1-\varepsilon_t$ for $T_t$ periods including the current period, then player $i$ believes that the deviation is strictly unprofitable.²²

²² If the current on-path recommendation schedule $\Pr(r_{j,t} \mid h_m^t, r_{i,t})$ is very close to $\alpha_j$, then (27) may be more restrictive than (26).

For each $t$, we fix $\varepsilon_t > 0$ and $T_t < \infty$ satisfying (26) and (27). Without loss, we can take $\varepsilon_t$ decreasing: $\varepsilon_t \geq \varepsilon_{t+1}$ for each $t$.
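To see how $\varepsilon_t$ and $T_t$ can be chosen jointly, the following back-of-the-envelope computation checks an inequality of the shape of (27) for made-up payoff numbers; all values below are illustrative assumptions, not quantities from the model.

# Illustrative check of an inequality of the form (27).
delta = 0.9   # discount factor (illustrative)
v_eq  = 1.0   # player i's per-period payoff on the equilibrium path (illustrative)
m     = 0.0   # best-response payoff against the perturbed minmax strategy (illustrative)
M     = 2.0   # max_{a in A} u_i(a), the best stage payoff (illustrative)

def punishment_bound(eps, T):
    # (1 - delta^T) * [(1 - eps) * m + eps * M] + delta^T * M, the right-hand side of (27)
    return (1 - delta**T) * ((1 - eps) * m + eps * M) + delta**T * M

# With a constant equilibrium payoff, the left-hand side of (27) is just v_eq.
for eps, T in [(0.5, 5), (0.1, 20), (0.01, 60)]:
    print(eps, T, v_eq > punishment_bound(eps, T))
# Prints False, then True, then True: only a sufficiently long punishment phase
# (large T_t) with a small failure probability (small eps_t) makes every deviation
# strictly unprofitable.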
Construction and Properties of $\sigma^{*}$

In this subsection, we again consider mediated perfect monitoring. We further modify $\sigma$ and create the following mediator's strategy $\sigma^{*}$: At the beginning of the game, for each $i$, $t$, and $a^t$, the mediator draws $r_{i,t}^{\mathrm{punish}}(a^t)$ according to $\alpha_i^{\varepsilon_t}$. In addition, for each $i$ and $t$, she draws $\omega_{i,t} \in \{R, P\}$ such that $\omega_{i,t} = R$ (regular) and $P$ (punish) with probability $1 - p_t$ and $p_t$, respectively, independently across $i$ and $t$. We will pin down $p_t > 0$ in Lemma 12. Moreover, given $\omega_t = (\omega_{1,t}, \omega_{2,t})$, the mediator chooses $r_t(a^t)$ for each $a^t$ as follows: If $\omega_{1,t} = \omega_{2,t} = R$, then she draws $r_t(a^t)$ according to $\sigma(a^t)(r)$. If $\omega_{i,t} = R$ and $\omega_{j,t} = P$, then she draws $r_{i,t}(a^t)$ from $\Pr\big(r_i \mid r_{j,t} = r_{j,t}^{\mathrm{punish}}(a^t)\big)$ while she draws $r_{j,t}(a^t)$ randomly from $\sum_{a_j \in A_j} \frac{a_j}{|A_j|}$.²³ Finally, if $\omega_{1,t} = \omega_{2,t} = P$, then she draws $r_{i,t}(a^t)$ randomly from $\sum_{a_i \in A_i} \frac{a_i}{|A_i|}$ for each $i$ independently. Since $\sigma$ has full support, $\sigma^{*}$ is well defined.

²³ As will be seen below, if $\omega_{j,t} = P$, then player $j$ is supposed to take $r_{j,t}^{\mathrm{punish}}(a^t)$. Hence, $r_{j,t}(a^t)$ does not affect the equilibrium action. We define $r_{j,t}(a^t)$ so that, when the mediator sends a message only at the beginning of the game (in the game with perfect monitoring with cheap talk), she sends a “dummy recommendation” $r_{j,t}(a^t)$ so that player $j$ does not realize that $\omega_{j,t} = P$ until period $t$.

As will be seen, we will take $p_t$ sufficiently small. In addition, recall that $\eta > 0$ (the perturbation of $\tilde{\sigma}$ to $\sigma$) is arbitrary. In the next subsection and onward, we construct an equilibrium with perfect monitoring with cheap talk that has the same equilibrium action distribution as $\sigma^{*}$. Since $p_t$ is small and $\eta > 0$ is arbitrary, constructing such an equilibrium suffices to prove Proposition 4.
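The mediator's randomization defining $\sigma^{*}$ can be summarized in the following sketch. The helper functions draw_from_sigma and draw_from_sigma_given_punish stand in for the conditional recommendation distributions of $\sigma$, which we do not model; everything else follows the construction above.

import random

def sigma_star_draws(p_t, A, alpha_eps, draw_from_sigma, draw_from_sigma_given_punish, a_hist):
    """One round of the mediator's draws in sigma*, for a fixed action history a_hist.

    p_t       : small punishment-state probability (pinned down in Lemma 12)
    A         : dict {1: A_1, 2: A_2} of action sets
    alpha_eps : dict {i: callable} drawing from the perturbed minmax alpha_i^{eps_t}
    """
    r_punish = {i: alpha_eps[i]() for i in (1, 2)}
    omega = {i: ("P" if random.random() < p_t else "R") for i in (1, 2)}

    if omega[1] == "R" and omega[2] == "R":
        r = draw_from_sigma(a_hist)                         # r_t(a^t) drawn as under sigma
    elif omega[1] == "R":                                    # player 2 is in the punish state
        r1 = draw_from_sigma_given_punish(a_hist, punished=2, r_punish=r_punish[2])
        r = {1: r1, 2: random.choice(A[2])}                  # uniform dummy draw for player 2
    elif omega[2] == "R":                                    # player 1 is in the punish state
        r2 = draw_from_sigma_given_punish(a_hist, punished=1, r_punish=r_punish[1])
        r = {1: random.choice(A[1]), 2: r2}
    else:                                                    # both players in the punish state
        r = {1: random.choice(A[1]), 2: random.choice(A[2])}
    return omega, r, r_punish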
At the beginning of the game, the mediator draws $\omega_t$, $r_{i,t}^{\mathrm{punish}}(a^t)$, and $r_t(a^t)$ for each $i$, $t$, and $a^t$. Given them, the mediator sends messages to the players as follows:

1. At the beginning of the game, the mediator sends $\big(r_{i,t}^{\mathrm{punish}}(a^t)\big)_{a^t \in A^{t-1},\, t \geq 1}$ to player $i$.

2. In each period $t$, the stage game proceeds as follows:

(a) The mediator decides $\omega_t(a^t) \in \{R, P\}^2$ as follows: if there is no unilateral deviator (defined below), then the mediator sets $\omega_t(a^t) = \omega_t$. If instead player $i$ is a unilateral deviator, then the mediator sets $\omega_{i,t}(a^t) = R$ and $\omega_{j,t}(a^t) = P$.

(b) Given $\omega_{i,t}(a^t)$, the mediator sends $\omega_{i,t}(a^t)$ to player $i$. In addition, if $\omega_{i,t}(a^t) = R$, then the mediator sends $r_{i,t}(a^t)$ to player $i$ as well.

(c) Given these messages, player $i$ takes an action. In equilibrium, if player $i$ has not yet deviated, then player $i$ takes $r_{i,t}(a^t)$ if $\omega_{i,t}(a^t) = R$ and takes $r_{i,t}^{\mathrm{punish}}(a^t)$ if $\omega_{i,t}(a^t) = P$. For notational convenience, let
\[
r_{i,t} =
\begin{cases}
r_{i,t}(a^t) & \text{if } \omega_{i,t}(a^t) = R, \\
r_{i,t}^{\mathrm{punish}}(a^t) & \text{if } \omega_{i,t}(a^t) = P
\end{cases}
\]
be the action that player $i$ is supposed to take if she has not yet deviated. Her strategy after her own deviation is not specified.

We say that player $i$ has unilaterally deviated if there exist $\tau \leq t-1$ and a unique $i$ such that (i) for each $\tau' < \tau$, we have $a_{n,\tau'} = r_{n,\tau'}$ for each $n \in \{1, 2\}$ (no deviation happened until period $\tau - 1$) and (ii) $a_{i,\tau} \neq r_{i,\tau}$ and $a_{j,\tau} = r_{j,\tau}$ (player $i$ deviates in period $\tau$ and player $j$ does not deviate).

Note that $\sigma^{*}$ is close to $\sigma$ on the equilibrium path for sufficiently small $p_t$. Hence, on-path strict incentive compatibility for player $i$ follows from (26). Moreover, the incentive compatibility condition analogous to (27) also holds.
Lemma 12 There exists $\{p_t\}_{t=1}^{\infty}$ with $p_t > 0$ for each $t$ such that it is strictly optimal for each player $i$ to follow her recommendation: For each player $i$ and history
\[
h_i^t = \Big( \big(r_{i,t}^{\mathrm{punish}}(a^t)\big)_{a^t \in A^{t-1},\, t \geq 1},\; a^t,\; (\omega_\tau(a^\tau))_{\tau=1}^{t-1},\; \omega_{i,t}(a^t),\; (r_{i,\tau})_{\tau=1}^{t} \Big),
\]
if player $i$ herself has not yet deviated, we have the following two inequalities:

1. If a deviation is punished by $\alpha_j^{\varepsilon_t}$ for the next $T_t$ periods with probability $1 - \varepsilon_t - \sum_{\tau=t}^{t+T_t-1} p_\tau$, then it is strictly unprofitable:
\[
(1-\delta)\,E\big[u_i(r_{i,t}, a_{j,t}) \mid h_i^t\big] + \delta\,E\Big[(1-\delta)\sum_{\tau=t+1}^{\infty}\delta^{\tau-t-1}\, u_i(r_{i,\tau}, a_{j,\tau}) \,\Big|\, h_i^t, a_{i,t} = r_{i,t}\Big]
> \max_{a_i \in A_i}(1-\delta)\,E\big[u_i(a_i, a_{j,t}) \mid h_i^t\big]
+ (\delta - \delta^{T_t})\Big[\Big(1-\varepsilon_t-\sum_{\tau=t}^{t+T_t-1} p_\tau\Big)\max_{\hat a_i} u_i\big(\hat a_i, \alpha_j^{\varepsilon_t}\big) + \Big(\varepsilon_t + \sum_{\tau=t}^{t+T_t-1} p_\tau\Big)\max_{a \in A} u_i(a)\Big]
+ \delta^{T_t}\max_{a \in A} u_i(a). \tag{28}
\]

2. If a deviation is punished by $\alpha_j^{\varepsilon_t}$ from the current period with probability $1 - \varepsilon_t - \sum_{\tau=t}^{t+T_t-1} p_\tau$, then it is strictly unprofitable:
\[
(1-\delta)\,E\big[u_i(r_{i,t}, a_{j,t}) \mid h_i^t\big] + \delta\,E\Big[(1-\delta)\sum_{\tau=t+1}^{\infty}\delta^{\tau-t-1}\, u_i(r_{i,\tau}, a_{j,\tau}) \,\Big|\, h_i^t, a_{i,t} = r_{i,t}\Big]
> (1 - \delta^{T_t})\Big[\Big(1-\varepsilon_t-\sum_{\tau=t}^{t+T_t-1} p_\tau\Big)\max_{\hat a_i} u_i\big(\hat a_i, \alpha_j^{\varepsilon_t}\big) + \Big(\varepsilon_t + \sum_{\tau=t}^{t+T_t-1} p_\tau\Big)\max_{a \in A} u_i(a)\Big]
+ \delta^{T_t}\max_{a \in A} u_i(a). \tag{29}
\]

Moreover, $E$ does not depend on the specification of player $j$'s strategy after player $j$'s own deviation, for each history $h_i^t$ such that player $i$ has not deviated.
Proof. Since $\sigma^{*}$ has full support on the equilibrium path, a player $i$ who has not yet deviated always believes that player $j$ has not deviated. Hence, $E$ is well defined without specifying player $j$'s strategy after player $j$'s own deviation.

Moreover, since $p_t$ is small and $\omega_{j,t}$ is independent of $(\omega_\tau)_{\tau=1}^{t-1}$ and $\omega_{i,t}$, given $(\omega_\tau(a^\tau))_{\tau=1}^{t-1}$ and $\omega_{i,t}(a^t)$ (which are equal to $(\omega_\tau)_{\tau=1}^{t-1}$ and $\omega_{i,t}$ on path), player $i$ believes that $\omega_{j,t}(a^t)$ is equal to $\omega_{j,t}$ and $\omega_{j,t}$ is equal to $R$ with a high probability, unless player $i$ has deviated. Since
\[
\Pr\big(r_{j,t} \mid \omega_{i,t}(a^t), \omega_{j,t}(a^t) = R, h_i^t\big) = \Pr\big(r_{j,t} \mid a^t, r_{i,t}\big),
\]
we have that the difference
\[
E\big[u_i(r_{i,t}, a_{j,t}) \mid h_i^t\big] - E\big[u_i(r_{i,t}, a_{j,t}) \mid r_i^t, a^t, r_{i,t}\big]
\]
is small for small $p_t$.

Further, if $p_\tau$ is small for each $\tau \geq t+1$, then since $\omega_\tau$ is independent of $\omega_{\tau'}$ with $\tau' \leq \tau - 1$, regardless of $(\omega_\tau(a^\tau))_{\tau=1}^{t}$, player $i$ believes that $\omega_{i,\tau}(a^\tau) = \omega_{j,\tau}(a^\tau) = R$ with high probability for $\tau \geq t+1$ on the equilibrium path. Since the distribution of the recommendation given $\sigma$ is the same as that under $\sigma^{*}$ given $a^\tau$ and $\omega_{i,\tau}(a^\tau) = \omega_{j,\tau}(a^\tau) = R$, we have that
\[
E\Big[(1-\delta)\sum_{\tau=t+1}^{\infty}\delta^{\tau-t-1}\, u_i(r_{i,\tau}, a_{j,\tau}) \,\Big|\, h_i^t, a_{i,t} = r_{i,t}\Big] - E\Big[(1-\delta)\sum_{\tau=t+1}^{\infty}\delta^{\tau-t-1}\, u_i(r_{i,\tau}, a_{j,\tau}) \,\Big|\, r_i^t, a^t, r_{i,t}\Big]
\]
is small for small $p_\tau$ with $\tau \geq t+1$.

Hence, (26) and (27) imply that there exists $\bar{p}_t > 0$ such that, if $p_\tau \leq \bar{p}_t$ for each $\tau \leq t$, then the claims of the lemma hold. Hence, if we take $p_t \leq \min_{\tau \leq t} \bar{p}_\tau$, then the claims hold.

We fix $\{p_t\}_{t=1}^{\infty}$ so that Lemma 12 holds. This fully pins down $\sigma^{*}$ with mediated perfect monitoring.
Construction with Perfect Monitoring with Cheap Talk

Given $\sigma^{*}$ with mediated perfect monitoring, we define the equilibrium strategy with perfect monitoring with cheap talk such that the equilibrium action distribution is the same as under $\sigma^{*}$. We must pin down the following four objects: at the beginning of the game, what message $m_i^{\mathrm{mediator}}$ player $i$ receives from the mediator; what message $m_{i,t}^{\mathrm{1st}}$ player $i$ sends at the beginning of period $t$; what action $a_{i,t}$ player $i$ takes in period $t$; and what message $m_{i,t}^{\mathrm{2nd}}$ player $i$ sends at the end of period $t$.
Intuitive Argument

As in $\sigma^{*}$, at the beginning of the game, for each $i$, $t$, and $a^t$, the mediator draws $r_{i,t}^{\mathrm{punish}}(a^t)$ according to $\alpha_i^{\varepsilon_t}$. In addition, with $p_t > 0$ pinned down in Lemma 12, she draws $\omega_t \in \{R, P\}^2$ and $r_t(a^t)$ as in $\sigma^{*}$ for each $t$ and $a^t$. She then defines $\omega_t(a^t)$ from $a^t$, $r_t(a^t)$, and $\omega_t$ as in $\sigma^{*}$. Intuitively, the mediator sends all the information about
\[
\Big(\omega_t(a^t),\ r_t(a^t),\ r_{1,t}^{\mathrm{punish}}(a^t),\ r_{2,t}^{\mathrm{punish}}(a^t)\Big)_{a^t \in A^{t-1},\, t \geq 1}
\]
through the initial messages $(m_1^{\mathrm{mediator}}, m_2^{\mathrm{mediator}})$. In particular, the mediator directly sends $\big(r_{i,t}^{\mathrm{punish}}(a^t)\big)_{a^t \in A^{t-1},\, t \geq 1}$ to player $i$ as a part of $m_i^{\mathrm{mediator}}$. Hence, we focus on how we replicate the role of the mediator in $\sigma^{*}$ of sending $(\omega_t(a^t), r_t(a^t))$ in each period, depending on the realized history $a^t$.
The key features to establish are (i) player $i$ does not know the instructions for the other player, (ii) before player $i$ reaches period $t$, player $i$ does not know her own recommendations for periods $\tau \geq t$ (otherwise, player $i$ would obtain more information than in the original equilibrium $\sigma^{*}$ and thus might want to deviate), and (iii) no player wants to deviate (in particular, if player $i$ deviates in actions or cheap talk, then the strategy of player $j$ is as if the state were $\omega_{j,t} = P$ in $\sigma^{*}$, for a sufficiently long time with a sufficiently high probability).

The properties (i) and (ii) are achieved by the same mechanism as in Theorem 9 of Heller, Solan and Tomala (2012, henceforth HST). In particular, without loss, let $A_i = \{1_i, \dots, n_i\}$ be player $i$'s action set. We can view $r_{i,t}(a^t)$ as an element of $\{1, \dots, n_i\}$. The mediator at the beginning of the game draws $r_t(a^t)$ for each $a^t$.
Instead of sending $r_{i,t}(a^t)$ directly to player $i$, the mediator encodes $r_{i,t}(a^t)$ as follows: For a sufficiently large $N^t \in \mathbb{Z}$ to be determined, we define $p^t = N^t n_i n_j$. This $p^t$ corresponds to $p^h$ in HST. Let $Z_{p^t} \equiv \{1, \dots, p^t\}$. The mediator draws $x_{i,t}^j(a^t)$ uniformly and independently from $Z_{p^t}$ for each $i$, $t$, and $a^t$. Given them, she defines
\[
y_{i,t}^i(a^t) \equiv x_{i,t}^j(a^t) + r_{i,t}(a^t) \pmod{n_i}. \tag{30}
\]
Intuitively, $y_{i,t}^i(a^t)$ is the “encoded instruction” of $r_{i,t}(a^t)$, and to obtain $r_{i,t}(a^t)$ from $y_{i,t}^i(a^t)$, player $i$ needs to know $x_{i,t}^j(a^t)$. The mediator gives $\big(y_{i,t}^i(a^t)\big)_{a^t \in A^{t-1},\, t \geq 1}$ to player $i$ as a part of $m_i^{\mathrm{mediator}}$. At the same time, she gives $\big(x_{i,t}^j(a^t)\big)_{a^t \in A^{t-1},\, t \geq 1}$ to player $j$ as a part of $m_j^{\mathrm{mediator}}$. At the beginning of period $t$, player $j$ sends $x_{i,t}^j(a^t)$ by cheap talk as a part of $m_{j,t}^{\mathrm{1st}}$, based on the realized action history $a^t$, so that player $i$ does not know $r_{i,t}(a^t)$ until period $t$. (Throughout the proof, the superscript of a variable represents who is informed about the variable, and the subscript represents whose recommendation the variable is about.)
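Display (30) is a one-time pad: the key $x_{i,t}^j(a^t)$, known only to player $j$, masks the recommendation, and $y_{i,t}^i(a^t)$ alone reveals nothing about it. A minimal sketch follows (ours; actions are relabelled $0, \dots, n_i - 1$ and the key is drawn directly modulo $n_i$, whereas the proof draws it from the larger set $Z_{p^t}$, whose size is a multiple of $n_i$, with the same effect).

import random

n_i = 3                      # player i's actions, relabelled 0, ..., n_i - 1
r = 2                        # the recommendation r_{i,t}(a^t) to be hidden

x = random.randrange(n_i)    # key x^j_{i,t}(a^t), given to player j only
y = (x + r) % n_i            # encoded instruction y^i_{i,t}(a^t) from (30), given to player i

# One-time-pad property: given y alone, every recommendation is equally likely;
# once player j reveals x in period t, player i decodes the recommendation exactly.
assert (y - x) % n_i == r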
In order to incentivize player $j$ to tell the truth, the equilibrium should embed a mechanism that punishes player $j$ if she tells a lie. In HST, this is done as follows: The mediator draws $\zeta_{i,t}^i(a^t)$ and $\xi_{i,t}^i(a^t)$ uniformly and independently from $Z_{p^t}$, and defines
\[
u_{i,t}^j(a^t) \equiv \zeta_{i,t}^i(a^t)\, x_{i,t}^j(a^t) + \xi_{i,t}^i(a^t) \pmod{p^t}. \tag{31}
\]
The mediator gives $x_{i,t}^j(a^t)$ and $u_{i,t}^j(a^t)$ to player $j$ while she gives $\zeta_{i,t}^i(a^t)$ and $\xi_{i,t}^i(a^t)$ to player $i$. In period $t$, player $j$ is supposed to send $x_{i,t}^j(a^t)$ and $u_{i,t}^j(a^t)$ to player $i$. If player $i$ receives $x_{i,t}^j(a^t)$ and $u_{i,t}^j(a^t)$ with
\[
u_{i,t}^j(a^t) \neq \zeta_{i,t}^i(a^t)\, x_{i,t}^j(a^t) + \xi_{i,t}^i(a^t) \pmod{p^t}, \tag{32}
\]
then player $i$ interprets that player $j$ has deviated. For a sufficiently large $N^t$, since player $j$ does not know $\zeta_{i,t}^i(a^t)$ and $\xi_{i,t}^i(a^t)$, if player $j$ tells a lie about $x_{i,t}^j(a^t)$, then with a high probability, player $j$ creates a situation where (32) holds.
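The pair $(\zeta_{i,t}^i, \xi_{i,t}^i)$ acts as a one-time authentication key: (31) is a line evaluated at $x$, and a sender who does not know the key can satisfy the check only by luck. The simulation below (our illustration; the modulus is an arbitrary prime stand-in for $p^t$, and we index $Z_{p^t}$ from 0) estimates the detection probability when player $j$ reports a false $x$ but reuses the true tag $u$.

import random

p = 101   # stand-in for p^t = N^t * n_i * n_j; a large N^t makes detection more likely

def lie_is_detected():
    zeta, xi = random.randrange(p), random.randrange(p)   # player i's secret check key
    x = random.randrange(p)                               # true x^j_{i,t}(a^t)
    u = (zeta * x + xi) % p                               # tag from (31), handed to player j
    x_fake = (x + random.randrange(1, p)) % p             # player j misreports x
    # Not knowing (zeta, xi), player j's best attempt is to resend the true tag u;
    # the check (32) then fails unless zeta * (x_fake - x) = 0 (mod p).
    return u != (zeta * x_fake + xi) % p

trials = 100_000
print(sum(lie_is_detected() for _ in range(trials)) / trials)   # close to 1 - 1/p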
Since HST consider Nash equilibrium, they let player $i$ minmax player $j$ forever after (32) holds. On the other hand, since we consider sequential equilibrium, as in the proof of Lemma 2, we will create a coordination mechanism such that, if player $j$ tells a lie, then with high probability player $i$ minmaxes player $j$ for a long time, while player $i$ assigns probability zero to the event that she is punishing player $j$.
To this end, we consider the following coordination: First, if and only if $\omega_{i,t}(a^t) = R$, the mediator defines $u_{i,t}^j(a^t)$ as in (31). Otherwise, $u_{i,t}^j(a^t)$ is randomly drawn. That is,
\[
u_{i,t}^j(a^t) \equiv
\begin{cases}
\zeta_{i,t}^i(a^t)\, x_{i,t}^j(a^t) + \xi_{i,t}^i(a^t) \pmod{p^t} & \text{if } \omega_{i,t}(a^t) = R, \\
\text{uniformly distributed over } Z_{p^t} & \text{if } \omega_{i,t}(a^t) = P.
\end{cases} \tag{33}
\]
Since both $\omega_{i,t}(a^t) = R$ and $\omega_{i,t}(a^t) = P$ happen with a positive probability, player $i$, after receiving $u_{i,t}^j(a^t)$ with $u_{i,t}^j(a^t) \neq \zeta_{i,t}^i(a^t)\, x_{i,t}^j(a^t) + \xi_{i,t}^i(a^t) \pmod{p^t}$, interprets that $\omega_{i,t}(a^t) = P$. For notational convenience, let $\hat{\omega}_{i,t}(a^t) \in \{R, P\}$ be player $i$'s interpretation of $\omega_{i,t}(a^t)$. After $\hat{\omega}_{i,t}(a^t) = P$, she takes the period-$t$ action according to $r_{i,t}^{\mathrm{punish}}(a^t)$. Given this inference, if player $j$ tells a lie about $u_{i,t}^j(a^t)$ with $\omega_{i,t}(a^t) = R$, then with a high probability, she induces a situation with $u_{i,t}^j(a^t) \neq \zeta_{i,t}^i(a^t)\, x_{i,t}^j(a^t) + \xi_{i,t}^i(a^t) \pmod{p^t}$, and player $i$ punishes player $j$ in period $t$ (without noticing player $j$'s deviation).

Second, switching to $r_{i,t}^{\mathrm{punish}}(a^t)$ only for period $t$ may not be enough, if player $j$ believes that player $i$'s action distribution given $\omega_{i,t}(a^t) = R$ is very close to the minmax strategy. Hence, we make sure that, once player $j$ deviates, player $i$ will take $r_{i,\tau}^{\mathrm{punish}}(a^\tau)$ for a sufficiently long time.
To this end, we change the mechanism so that player $j$ does not always know $u_{i,t}^j(a^t)$. Instead, the mediator draws $p^t$ independent random variables $v_{i,t}^j(n, a^t)$ with $n = 1, \dots, p^t$ uniformly from $Z_{p^t}$. In addition, she draws $n_{i,t}^i(a^t)$ uniformly from $Z_{p^t}$. The mediator defines $u_{i,t}^j(n, a^t)$ for each $n = 1, \dots, p^t$ as follows:
\[
u_{i,t}^j(n, a^t) =
\begin{cases}
u_{i,t}^j(a^t) & \text{if } n = n_{i,t}^i(a^t), \\
v_{i,t}^j(n, a^t) & \text{otherwise,}
\end{cases}
\]
that is, $u_{i,t}^j(n, a^t)$ corresponds to $u_{i,t}^j(a^t)$ with (33) only if $n = n_{i,t}^i(a^t)$. For other $n$, $u_{i,t}^j(n, a^t)$ is completely random.

The mediator sends $n_{i,t}^i(a^t)$ to player $i$, and sends $\{u_{i,t}^j(n, a^t)\}_{n \in Z_{p^t}}$ to player $j$. In addition, the mediator sends $n_{i,t}^j(a^t)$ to player $j$, where
\[
n_{i,t}^j(a^t) =
\begin{cases}
n_{i,t}^i(a^t) & \text{if } \omega_{i,t-1}(a^{t-1}) = P, \\
\text{uniformly distributed over } Z_{p^t} & \text{if } \omega_{i,t-1}(a^{t-1}) = R
\end{cases}
\]
is equal to $n_{i,t}^i(a^t)$ if and only if last period's $\omega_{i,t-1}(a^{t-1})$ is equal to $P$.
In period $t$, player $j$ is asked to send $x_{i,t}^j(a^t)$ and $u_{i,t}^j(n, a^t)$ with $n = n_{i,t}^i(a^t)$, that is, to send $x_{i,t}^j(a^t)$ and $u_{i,t}^j(a^t)$. If and only if player $j$'s messages $\hat{x}_{i,t}^j(a^t)$ and $\hat{u}_{i,t}^j(a^t)$ satisfy
\[
\hat{u}_{i,t}^j(a^t) = \zeta_{i,t}^i(a^t)\, \hat{x}_{i,t}^j(a^t) + \xi_{i,t}^i(a^t) \pmod{p^t},
\]
player $i$ interprets $\hat{\omega}_{i,t}(a^t) = R$. If player $i$ has $\hat{\omega}_{i,t}(a^t) = R$, then player $i$ knows that player $j$ needs to know $n_{i,t+1}^i(a^{t+1})$ to send the correct $u_{i,t+1}^j(n, a^{t+1})$ in the next period. Hence, she sends $n_{i,t+1}^i(a^{t+1})$ to player $j$. If player $i$ has $\hat{\omega}_{i,t}(a^t) = P$, then she believes that player $j$ knows $n_{i,t+1}^i(a^{t+1})$ and does not send $n_{i,t+1}^i(a^{t+1})$.
Given this coordination, once player $j$ creates a situation with $\omega_{i,t}(a^t) = R$ but $\hat{\omega}_{i,t}(a^t) = P$, then player $j$ cannot receive $n_{i,t+1}^i(a^{t+1})$. Without knowing $n_{i,t+1}^i(a^{t+1})$, with a large $N^{t+1}$, with a high probability, player $j$ cannot know which $u_{i,t+1}^j(n, a^{t+1})$ she should send. Then, again, she will create a situation with
\[
\hat{u}_{i,t+1}^j(a^{t+1}) \neq \zeta_{i,t+1}^i(a^{t+1})\, \hat{x}_{i,t+1}^j(a^{t+1}) + \xi_{i,t+1}^i(a^{t+1}) \pmod{p^{t+1}},
\]
that is, $\hat{\omega}_{i,t+1}(a^{t+1}) = P$. Recursively, player $i$ has $\hat{\omega}_{i,\tau}(a^\tau) = P$ for a long time with a high probability if player $j$ tells a lie.
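The index $n_{i,t}^i(a^t)$ is what makes the punishment persistent: once player $j$ has lied, she no longer learns which of the $p^t$ tags is the meaningful one, so her next message fails the check as well. The following rough simulation (ours, with small arbitrary parameters, restricted to the case $\omega_{i,\tau}(a^\tau) = R$ throughout) counts how many of the next $T$ periods end with $\hat{\omega}_{i,\tau}(a^\tau) = P$.

import random

p, T, trials = 53, 5, 2000

def punished_periods():
    punished = 0
    j_knows_index = False          # after the first lie, player j no longer receives the index
    for _ in range(T):
        zeta, xi = random.randrange(p), random.randrange(p)
        x = random.randrange(p)
        true_index = random.randrange(p)
        tags = [random.randrange(p) for _ in range(p)]
        tags[true_index] = (zeta * x + xi) % p            # only one tag is the valid one
        guess = true_index if j_knows_index else random.randrange(p)
        if tags[guess] == (zeta * x + xi) % p:            # the check passes
            break                                          # player i reads omega-hat = R
        punished += 1                                      # omega-hat = P: r^punish is played
        j_knows_index = False                              # player i babbles; j stays uninformed
    return punished

print(sum(punished_periods() for _ in range(trials)) / trials)   # close to T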
Finally, if player $j$ takes a deviant action in period $t$, then the mediator has drawn $\omega_{i,\tau}(a^\tau) = P$ for each $\tau \geq t+1$ and each $a^\tau$ corresponding to the realized history. With $\omega_{i,\tau}(a^\tau) = P$, in order to avoid $\hat{\omega}_{i,\tau}(a^\tau) = P$, player $j$ needs to create a situation
\[
\hat{u}_{i,\tau}^j(a^\tau) = \zeta_{i,\tau}^i(a^\tau)\, \hat{x}_{i,\tau}^j(a^\tau) + \xi_{i,\tau}^i(a^\tau) \pmod{p^\tau}
\]
without knowing $\zeta_{i,\tau}^i(a^\tau)$ and $\xi_{i,\tau}^i(a^\tau)$, while the mediator's message does not tell her what $\zeta_{i,\tau}^i(a^\tau)\, x_{i,\tau}^j(a^\tau) + \xi_{i,\tau}^i(a^\tau) \pmod{p^\tau}$ is, by (33). Hence, for a sufficiently large $N^\tau$, player $j$ can avoid $\hat{\omega}_{i,\tau}(a^\tau) = P$ only with negligible probability. Hence, player $j$ will be minmaxed from the next period on with a high probability.

The above argument in total shows that, if player $j$ deviates, whether in communication or in actions, then she will be minmaxed for a sufficiently long time. Lemma 12 ensures that player $j$ does not want to tell a lie or take a deviant action.
Formal Construction

Let us formalize the above construction: As in $\sigma^{*}$, at the beginning of the game, for each $i$, $t$, and $a^t$, the mediator draws $r_{i,t}^{\mathrm{punish}}(a^t)$ according to $\alpha_i^{\varepsilon_t}$; then she draws $\omega_t \in \{R, P\}^2$ and $r_t(a^t)$ for each $t$ and $a^t$; and then she defines $\omega_t(a^t)$ from $a^t$, $r_t(a^t)$, and $\omega_t$ as in $\sigma^{*}$. For each $t$ and $a^t$, she draws $x_{i,t}^j(a^t)$ uniformly and independently from $Z_{p^t}$. Given them, she defines
\[
y_{i,t}^i(a^t) \equiv x_{i,t}^j(a^t) + r_{i,t}(a^t) \pmod{n_i},
\]
so that (30) holds.

The mediator draws $\zeta_{i,t}^i(a^t)$, $\xi_{i,t}^i(a^t)$, $\tilde{u}_{i,t}^j(a^t)$, $v_{i,t}^j(n, a^t)$ for each $n \in Z_{p^t}$, $n_{i,t}^i(a^t)$, and $\tilde{n}_{i,t}^j(a^t)$ from the uniform distribution over $Z_{p^t}$ independently for each player $i$, each period $t$, and each $a^t$.
As in (33), the mediator defines
\[
u_{i,t}^j(a^t) \equiv
\begin{cases}
\zeta_{i,t}^i(a^t)\, x_{i,t}^j(a^t) + \xi_{i,t}^i(a^t) \pmod{p^t} & \text{if } \omega_{i,t}(a^t) = R, \\
\tilde{u}_{i,t}^j(a^t) & \text{if } \omega_{i,t}(a^t) = P.
\end{cases}
\]
In addition, the mediator defines
\[
u_{i,t}^j(n, a^t) =
\begin{cases}
u_{i,t}^j(a^t) & \text{if } n = n_{i,t}^i(a^t), \\
v_{i,t}^j(n, a^t) & \text{otherwise}
\end{cases}
\]
and
\[
n_{i,t}^j(a^t) =
\begin{cases}
n_{i,t}^i(a^t) & \text{if } t = 1 \text{ or } \omega_{i,t-1}(a^{t-1}) = P, \\
\tilde{n}_{i,t}^j(a^t) & \text{if } t \neq 1 \text{ and } \omega_{i,t-1}(a^{t-1}) = R,
\end{cases}
\]
as explained above.
Let us now define the equilibrium:

1. At the beginning of the game, the mediator sends
\[
m_i^{\mathrm{mediator}} = \Big( y_{i,t}^i(a^t),\ \zeta_{i,t}^i(a^t),\ \xi_{i,t}^i(a^t),\ r_{i,t}^{\mathrm{punish}}(a^t),\ n_{i,t}^i(a^t),\ n_{j,t}^i(a^t),\ \big(u_{j,t}^i(n, a^t)\big)_{n \in Z_{p^t}},\ x_{j,t}^i(a^t) \Big)_{a^t \in A^{t-1},\, t \geq 1}
\]
to each player $i$.
2. In each period $t$, the stage game proceeds as follows: In equilibrium,
\[
m_{j,t}^{\mathrm{1st}} =
\begin{cases}
\big(u_{i,t}^j(m_{i,t-1}^{\mathrm{2nd}}, a^t),\ x_{i,t}^j(a^t)\big) & \text{if } t \neq 1 \text{ and } m_{i,t-1}^{\mathrm{2nd}} \neq \{\text{babble}\}, \\
\big(u_{i,t}^j(n_{i,t}^j(a^t), a^t),\ x_{i,t}^j(a^t)\big) & \text{if } t = 1 \text{ or } m_{i,t-1}^{\mathrm{2nd}} = \{\text{babble}\}
\end{cases} \tag{34}
\]
and
\[
m_{j,t}^{\mathrm{2nd}} =
\begin{cases}
n_{j,t+1}^j(a^{t+1}) & \text{if } \hat{\omega}_{j,t}(a^t) = R, \\
\{\text{babble}\} & \text{if } \hat{\omega}_{j,t}(a^t) = P.
\end{cases}
\]
Note that, since $m_{j,t}^{\mathrm{2nd}}$ is sent at the end of period $t$, the players know $a^{t+1} = (a_1, \dots, a_t)$.
(a) Given player $i$'s history $(m_i^{\mathrm{mediator}}, (m_\tau^{\mathrm{1st}}, a_\tau, m_\tau^{\mathrm{2nd}})_{\tau=1}^{t-1})$, each player $i$ sends the first message $m_{i,t}^{\mathrm{1st}}$ simultaneously. If player $i$ herself has not yet deviated, then
\[
m_{i,t}^{\mathrm{1st}} =
\begin{cases}
\big(u_{j,t}^i(m_{j,t-1}^{\mathrm{2nd}}, a^t),\ x_{j,t}^i(a^t)\big) & \text{if } t \neq 1 \text{ and } m_{j,t-1}^{\mathrm{2nd}} \neq \{\text{babble}\}, \\
\big(u_{j,t}^i(n_{j,t}^i(a^t), a^t),\ x_{j,t}^i(a^t)\big) & \text{if } t = 1 \text{ or } m_{j,t-1}^{\mathrm{2nd}} = \{\text{babble}\}.
\end{cases}
\]
Let $m_{i,t}^{\mathrm{1st}}(u)$ be the first element of $m_{i,t}^{\mathrm{1st}}$ (that is, either $u_{j,t}^i(m_{j,t-1}^{\mathrm{2nd}}, a^t)$ or $u_{j,t}^i(n_{j,t}^i(a^t), a^t)$ on the equilibrium path); and let $m_{i,t}^{\mathrm{1st}}(x)$ be the second element ($x_{j,t}^i(a^t)$ on the equilibrium path). As a result, the profile of the messages $m_t^{\mathrm{1st}}$ becomes common knowledge. If
\[
m_{j,t}^{\mathrm{1st}}(u) \neq \zeta_{i,t}^i(a^t)\, m_{j,t}^{\mathrm{1st}}(x) + \xi_{i,t}^i(a^t) \pmod{p^t}, \tag{35}
\]
then player $i$ interprets $\hat{\omega}_{i,t}(a^t) = P$. Otherwise, $\hat{\omega}_{i,t}(a^t) = R$.
(b) Given player $i$'s history $(m_i^{\mathrm{mediator}}, (m_\tau^{\mathrm{1st}}, a_\tau, m_\tau^{\mathrm{2nd}})_{\tau=1}^{t-1}, m_t^{\mathrm{1st}})$, each player $i$ takes action $a_{i,t}$ simultaneously. If player $i$ herself has not yet deviated, then player $i$ takes $a_{i,t} = r_{i,t}$ with
\[
r_{i,t} =
\begin{cases}
y_{i,t}^i(a^t) - m_{j,t}^{\mathrm{1st}}(x) \pmod{n_i} & \text{if } \hat{\omega}_{i,t}(a^t) = R, \\
r_{i,t}^{\mathrm{punish}}(a^t) & \text{if } \hat{\omega}_{i,t}(a^t) = P.
\end{cases} \tag{36}
\]
Recall that $y_{i,t}^i(a^t) \equiv x_{i,t}^j(a^t) + r_{i,t}(a^t) \pmod{n_i}$ by (30). By (34), therefore, player $i$ takes $r_{i,t}(a^t)$ if $\omega_{i,t}(a^t) = R$ and $r_{i,t}^{\mathrm{punish}}(a^t)$ if $\omega_{i,t}(a^t) = P$ on the equilibrium path, as in $\sigma^{*}$.
(c) Given player $i$'s history $(m_i^{\mathrm{mediator}}, (m_\tau^{\mathrm{1st}}, a_\tau, m_\tau^{\mathrm{2nd}})_{\tau=1}^{t-1}, m_t^{\mathrm{1st}}, a_t)$, each player $i$ sends the second message $m_{i,t}^{\mathrm{2nd}}$ simultaneously. If player $i$ herself has not yet deviated, then
\[
m_{i,t}^{\mathrm{2nd}} =
\begin{cases}
n_{i,t+1}^i(a^{t+1}) & \text{if } \hat{\omega}_{i,t}(a^t) = R, \\
\{\text{babble}\} & \text{if } \hat{\omega}_{i,t}(a^t) = P.
\end{cases}
\]
As a result, the profile of the messages $m_t^{\mathrm{2nd}}$ becomes common knowledge. Note that $\omega_t(a^t)$ becomes common knowledge as well on the equilibrium path.
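Putting the pieces together, one on-path period of the protocol, seen from player $i$'s side, can be traced as follows. This is a self-contained sketch with illustrative numbers; it only checks that, when the verification (35) passes, the decoding rule (36) recovers $r_{i,t}(a^t)$, as claimed in part (b). As before, action labels and $Z_{p^t}$ are indexed from 0.

import random

n_i, p = 3, 102                      # illustrative sizes; note n_i divides p

# Mediator's initial draws (parts of m_i^mediator and m_j^mediator).
r_rec    = random.randrange(n_i)     # r_{i,t}(a^t), the recommendation to be implemented
x_j      = random.randrange(p)       # x^j_{i,t}(a^t), given to player j
y_i      = (x_j + r_rec) % n_i       # y^i_{i,t}(a^t), given to player i, as in (30)
zeta, xi = random.randrange(p), random.randrange(p)          # check key, given to player i
omega    = "R"                       # regular state in this on-path illustration
u_j      = (zeta * x_j + xi) % p if omega == "R" else random.randrange(p)   # as in (33)

# Period t: player j truthfully reports (u, x); player i verifies and decodes.
m1st_u, m1st_x = u_j, x_j
omega_hat = "R" if m1st_u == (zeta * m1st_x + xi) % p else "P"      # check (35)
action = (y_i - m1st_x) % n_i if omega_hat == "R" else "r_punish"   # rule (36)

assert omega_hat == "R" and action == r_rec    # on path, player i plays r_{i,t}(a^t)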
Incentive Compatibility

The above equilibrium has full support: Since $\omega_t(a^t)$ and $r_t(a^t)$ have full support, $(m_1^{\mathrm{mediator}}, m_2^{\mathrm{mediator}})$ have full support as well. Hence, we are left to verify player $i$'s incentive not to deviate from the equilibrium strategy, given that player $i$ believes that player $j$ has not yet deviated after any history of player $i$.
Suppose that player $i$ followed the equilibrium strategy until the end of period $t-1$. First, consider player $i$'s incentive to tell the truth about $m_{i,t}^{\mathrm{1st}}$. In equilibrium, player $i$ sends
\[
m_{i,t}^{\mathrm{1st}} =
\begin{cases}
\big(u_{j,t}^i(m_{j,t-1}^{\mathrm{2nd}}, a^t),\ x_{j,t}^i(a^t)\big) & \text{if } m_{j,t-1}^{\mathrm{2nd}} \neq \{\text{babble}\}, \\
\big(u_{j,t}^i(n_{j,t}^i(a^t), a^t),\ x_{j,t}^i(a^t)\big) & \text{if } m_{j,t-1}^{\mathrm{2nd}} = \{\text{babble}\}.
\end{cases}
\]
The random variables possessed by player $i$ are independent of those possessed by player $j$ given $(m_\tau^{\mathrm{1st}}, a_\tau, m_\tau^{\mathrm{2nd}})_{\tau=1}^{t-1}$, except that (i) $u_{i,t}^j(a^t) = \zeta_{i,t}^i(a^t)\, x_{i,t}^j(a^t) + \xi_{i,t}^i(a^t) \pmod{p^t}$ if $\omega_{i,t}(a^t) = R$, (ii) $u_{j,t}^i(a^t) = \zeta_{j,t}^j(a^t)\, x_{j,t}^i(a^t) + \xi_{j,t}^j(a^t) \pmod{p^t}$ if $\omega_{j,t}(a^t) = R$, (iii) $n_{i,\tau}^j(a^\tau) = n_{i,\tau}^i(a^\tau)$ if $\omega_{i,\tau-1}(a^{\tau-1}) = P$ while $n_{i,\tau}^j(a^\tau) = \tilde{n}_{i,\tau}^j(a^\tau)$ if $\omega_{i,\tau-1}(a^{\tau-1}) = R$, and (iv) $n_{j,\tau}^i(a^\tau) = n_{j,\tau}^j(a^\tau)$ if $\omega_{j,\tau-1}(a^{\tau-1}) = P$ while $n_{j,\tau}^i(a^\tau) = \tilde{n}_{j,\tau}^i(a^\tau)$ if $\omega_{j,\tau-1}(a^{\tau-1}) = R$. Since $\zeta_{i,t}^i(a^t)$, $\xi_{i,t}^i(a^t)$, $\tilde{u}_{i,t}^j(a^t)$, $v_{i,t}^j(n, a^t)$, $n_{i,t}^i(a^t)$, and $\tilde{n}_{i,t}^j(a^t)$ are uniform and independent, player $i$ cannot learn $\omega_{i,\tau}(a^\tau)$, $r_{i,\tau}(a^\tau)$, or $r_{j,\tau}(a^\tau)$ with $\tau \geq t$. Hence, player $i$ believes at the time when she sends $m_{i,t}^{\mathrm{1st}}$ that her equilibrium value is equal to
\[
(1-\delta)\,E\big[u_i(a_t) \mid h_i^t\big] + \delta\,E\Big[(1-\delta)\sum_{\tau=t+1}^{\infty}\delta^{\tau-t-1}\, u_i(a_\tau) \,\Big|\, h_i^t\Big],
\]
where $h_i^t$ is as if player $i$ observed $\big(r_{i,t}^{\mathrm{punish}}(a^t)\big)_{a^t \in A^{t-1},\, t \geq 1}$, $a^t$, $(\omega_\tau(a^\tau))_{\tau=1}^{t-1}$, and $r_i^t$, and believed that $r_\tau(a^\tau) = a_\tau$ for each $\tau = 1, \dots, t-1$, with $\sigma^{*}$ with mediated perfect monitoring.
On the other hand, for each $e > 0$, for a sufficiently large $N^t$, if player $i$ tells a lie in at least one element of $m_{i,t}^{\mathrm{1st}}$, then with probability $1 - e$, player $i$ creates a situation
\[
m_{i,t}^{\mathrm{1st}}(u) \neq \zeta_{j,t}^j(a^t)\, m_{i,t}^{\mathrm{1st}}(x) + \xi_{j,t}^j(a^t) \pmod{p^t}.
\]
Hence, (35) (with indices $i$ and $j$ reversed) implies that $\hat{\omega}_{j,t}(a^t) = P$.

Moreover, if player $i$ creates a situation with $\hat{\omega}_{j,t}(a^t) = P$, then player $j$ will send $m_{j,t}^{\mathrm{2nd}} = \{\text{babble}\}$ instead of $n_{j,t+1}^j(a^{t+1})$. Unless $\omega_{j,t}(a^t) = P$, since $n_{j,t+1}^j(a^{t+1})$ is independent of player $i$'s variables, player $i$ believes that $n_{j,t+1}^j(a^{t+1})$ is distributed uniformly over $Z_{p^{t+1}}$. Hence, for each $e > 0$, for a sufficiently large $N^{t+1}$, if $\omega_{j,t}(a^t) = R$, then player $i$ will send $m_{i,t+1}^{\mathrm{1st}}$ with
\[
m_{i,t+1}^{\mathrm{1st}}(u) \neq \zeta_{j,t+1}^j(a^{t+1})\, m_{i,t+1}^{\mathrm{1st}}(x) + \xi_{j,t+1}^j(a^{t+1}) \pmod{p^{t+1}}
\]
with probability $1 - e$. Then, by (35) (with indices $i$ and $j$ reversed), player $j$ will have $\hat{\omega}_{j,t+1}(a^{t+1}) = P$.

Recursively, if $\omega_{j,\tau}(a^\tau) = R$ for each $\tau = t, \dots, t+T_t-1$, then player $i$ will induce $\hat{\omega}_{j,\tau}(a^\tau) = P$ for each $\tau = t, \dots, t+T_t-1$ with a high probability. Hence, for $\varepsilon_t > 0$ and $T_t$ fixed in (26) and (27), for a sufficiently large $N^t$, if $N^\tau \geq N^t$ for each $\tau \geq t$, then player $i$ will be punished for the subsequent $T_t$ periods, regardless of player $i$'s continuation strategy, with probability no less than $1 - \varepsilon_t - \sum_{\tau=t}^{t+T_t-1} p_\tau$. ($\sum_{\tau=t}^{t+T_t-1} p_\tau$ represents the maximum probability of having $\omega_{j,\tau}(a^\tau) = P$ for some $\tau$ among the subsequent $T_t$ periods.) (29) implies that telling a lie gives a strictly lower payoff than the equilibrium payoff. Therefore, it is optimal to tell the truth about $m_{i,t}^{\mathrm{1st}}$. (In (29), we have shown interim incentive compatibility after knowing $\omega_{i,t}(a^t)$ and $r_{i,t}$, while here we consider $h_i^t$ before knowing $\omega_{i,t}(a^t)$ and $r_{i,t}$. Taking the expectation with respect to $\omega_{i,t}(a^t)$ and $r_{i,t}$, (29) ensures incentive compatibility before knowing $\omega_{i,t}(a^t)$ and $r_{i,t}$.)
Second, consider player $i$'s incentive to take the action $a_{i,t} = r_{i,t}$ according to (36) if player $i$ follows the equilibrium strategy until she sends $m_{i,t}^{\mathrm{1st}}$. If she follows the equilibrium strategy, then player $i$ believes at the time when she takes an action that her equilibrium value is equal to
\[
(1-\delta)\,E\big[u_i(a_t) \mid h_i^t\big] + \delta\,E\Big[(1-\delta)\sum_{\tau=t+1}^{\infty}\delta^{\tau-t-1}\, u_i(a_\tau) \,\Big|\, h_i^t\Big],
\]
where $h_i^t$ is as if player $i$ observed $\big(r_{i,t}^{\mathrm{punish}}(a^t)\big)_{a^t \in A^{t-1},\, t \geq 1}$, $a^t$, $(\omega_\tau(a^\tau))_{\tau=1}^{t-1}$, $\omega_{i,t}(a^t)$, and $r_{i,t}$, and believed that $r_\tau(a^\tau) = a_\tau$ for each $\tau = 1, \dots, t-1$, with $\sigma^{*}$ with mediated perfect monitoring. (Compared to the time when player $i$ sends $m_{i,t}^{\mathrm{1st}}$, player $i$ now knows $\omega_{i,t}(a^t)$ and $r_{i,t}$ on the equilibrium path.)
If player $i$ deviates from $a_{i,t} = r_{i,t}$, then $\omega_{j,\tau}(a^\tau) = P$ by definition for each $\tau \geq t+1$ and each $a^\tau$ that is compatible with $a^t$ (that is, $a^\tau = (a^t, a_t, \dots, a_{\tau-1})$ for some $a_t, \dots, a_{\tau-1}$). To avoid being minmaxed in period $\tau$, player $i$ needs to induce $\hat{\omega}_{j,\tau}(a^\tau) = R$ although $\omega_{j,\tau}(a^\tau) = P$. Given $\omega_{j,\tau}(a^\tau) = P$, since $\zeta_{j,\tau}^j(a^\tau)$, $\xi_{j,\tau}^j(a^\tau)$, $\tilde{u}_{j,\tau}^i(a^\tau)$, $v_{j,\tau}^i(n, a^\tau)$, $n_{j,\tau}^j(a^\tau)$, and $\tilde{n}_{j,\tau}^i(a^\tau)$ are uniform and independent (conditional on the other variables), regardless of player $i$'s continuation strategy, by (35) (with indices $i$ and $j$ reversed), player $i$ will send $m_{i,\tau}^{\mathrm{1st}}$ with
\[
m_{i,\tau}^{\mathrm{1st}}(u) \neq \zeta_{j,\tau}^j(a^\tau)\, m_{i,\tau}^{\mathrm{1st}}(x) + \xi_{j,\tau}^j(a^\tau) \pmod{p^\tau}
\]
with a high probability.

Hence, for a sufficiently large $N^t$, if $N^\tau \geq N^t$ for each $\tau \geq t$, then player $i$ will be punished for the next $T_t$ periods, regardless of player $i$'s continuation strategy, with probability no less than $1 - \varepsilon_t$. By (28), deviating from $r_{i,t}$ gives a strictly lower payoff than her equilibrium payoff. Therefore, it is optimal to take $a_{i,t} = r_{i,t}$.
Finally, consider player $i$'s incentive to tell the truth about $m_{i,t}^{\mathrm{2nd}}$. Regardless of $m_{i,t}^{\mathrm{2nd}}$, player $j$'s actions do not change. Hence, we are left to show that telling a lie does not improve player $i$'s deviation gain by giving player $i$ more information.

On the equilibrium path, player $i$ knows $\omega_{i,t}(a^t)$. If player $i$ tells the truth, then $m_{i,t}^{\mathrm{2nd}} = n_{i,t+1}^i(a^{t+1}) \neq \{\text{babble}\}$ if and only if $\omega_{i,t}(a^t) = R$. Moreover, player $j$ sends
\[
m_{j,t+1}^{\mathrm{1st}} =
\begin{cases}
\big(u_{i,t+1}^j(m_{i,t}^{\mathrm{2nd}}, a^{t+1}),\ x_{i,t+1}^j(a^{t+1})\big) & \text{if } \omega_{i,t}(a^t) = R, \\
\big(u_{i,t+1}^j(n_{i,t+1}^j(a^{t+1}), a^{t+1}),\ x_{i,t+1}^j(a^{t+1})\big) & \text{if } \omega_{i,t}(a^t) = P.
\end{cases}
\]
Since $n_{i,t+1}^j(a^{t+1}) = n_{i,t+1}^i(a^{t+1})$ if $\omega_{i,t}(a^t) = P$, in total, if player $i$ tells the truth, then player $i$ learns $u_{i,t+1}^j(n_{i,t+1}^i(a^{t+1}), a^{t+1})$ and $x_{i,t+1}^j(a^{t+1})$. This is sufficient information to infer $\omega_{i,t+1}(a^{t+1})$ and $r_{i,t+1}(a^{t+1})$ correctly.

If she tells a lie, then player $j$'s messages are changed to
\[
m_{j,t+1}^{\mathrm{1st}} =
\begin{cases}
\big(u_{i,t+1}^j(m_{i,t}^{\mathrm{2nd}}, a^{t+1}),\ x_{i,t+1}^j(a^{t+1})\big) & \text{if } m_{i,t}^{\mathrm{2nd}} \neq \{\text{babble}\}, \\
\big(u_{i,t+1}^j(n_{i,t+1}^j(a^{t+1}), a^{t+1}),\ x_{i,t+1}^j(a^{t+1})\big) & \text{if } m_{i,t}^{\mathrm{2nd}} = \{\text{babble}\}.
\end{cases}
\]
Since $\zeta_{i,t+1}^i(a^{t+1})$, $\xi_{i,t+1}^i(a^{t+1})$, $\tilde{u}_{i,t+1}^j(a^{t+1})$, $v_{i,t+1}^j(n, a^{t+1})$, $n_{i,t+1}^i(a^{t+1})$, and $\tilde{n}_{i,t+1}^j(a^{t+1})$ are uniform and independent conditional on $\omega_{i,t+1}(a^{t+1})$ and $r_{i,t+1}(a^{t+1})$, the values $u_{i,t+1}^j(n, a^{t+1})$ and $x_{i,t+1}^j(a^{t+1})$ are not informative about player $j$'s recommendation from period $t+1$ on or player $i$'s recommendation from period $t+2$ on, given that player $i$ knows $\omega_{i,t+1}(a^{t+1})$ and $r_{i,t+1}(a^{t+1})$. Since telling the truth informs player $i$ of $\omega_{i,t+1}(a^{t+1})$ and $r_{i,t+1}(a^{t+1})$, there is no gain from telling a lie.