On the Equilibrium Payoff Set in Repeated Games with Imperfect Private Monitoring

Takuo Sugaya and Alexander Wolitzky
Stanford Graduate School of Business and MIT

April 19, 2015

Abstract

We provide simple sufficient conditions for the existence of a tight, recursive upper bound on the sequential equilibrium payoff set at a fixed discount factor in two-player repeated games with imperfect private monitoring. The bounding set is the sequential equilibrium payoff set with perfect monitoring and a mediator. We show that this bounding set admits a simple recursive characterization, which nonetheless necessarily involves the use of private strategies. Under our conditions, this set describes precisely those payoff vectors that arise in equilibrium for some private monitoring structure.

This replaces our earlier paper, "Perfect Versus Imperfect Monitoring in Repeated Games." For helpful comments, we thank Dilip Abreu, Gabriel Carroll, Henrique de Oliveira, Glenn Ellison, Drew Fudenberg, Michihiro Kandori, Hitoshi Matsushima, Stephen Morris, Ilya Segal, Tadashi Sekiguchi, Andrzej Skrzypacz, Satoru Takahashi, Juuso Toikka, and Thomas Wiseman.

1 Introduction

Like many dynamic economic models, repeated games are typically studied using recursive methods. In an incisive paper, Abreu, Pearce, and Stacchetti (1990; henceforth APS) recursively characterized the perfect public equilibrium payoff set at a fixed discount factor in repeated games with imperfect public monitoring. Their results (along with related contributions by Fudenberg, Levine, and Maskin (1994) and others) led to fresh perspectives on problems like collusion (Green and Porter, 1984; Athey and Bagwell, 2001), relational contracting (Levin, 2003), and government credibility (Phelan and Stacchetti, 2001).

However, other important environments, like collusion with secret price cuts (Stigler, 1964) or relational contracting with subjective performance evaluations (Levin, 2003; MacLeod, 2003; Fuchs, 2007), involve imperfect private monitoring, and it is well known that the methods of APS do not easily extend to such settings (Kandori, 2002). Whether the equilibrium payoff set in repeated games with private monitoring exhibits any tractable recursive structure at all is thus a major question.

In this paper, we do not make any progress toward giving a recursive characterization of the sequential equilibrium payoff set in a repeated game with a given private monitoring structure. Instead, working in the context of two-player games, we provide simple conditions for the existence of a tight, recursive upper bound on this set.[1] Equivalently, under these conditions we give a recursive characterization of the set of payoffs that can be attained in equilibrium for some private monitoring structure. Thus, from the perspective of an observer who knows the monitoring structure, our results give an upper bound on how well players can do in a repeated game; while from the perspective of an observer who does not know the monitoring structure, our results exactly characterize how well the players can do.

Footnote 1: The ordering on sets of payoff vectors here is not the usual one of set inclusion, but rather dominance in all non-negative Pareto directions. Thus, we say that A upper-bounds B if every payoff vector b ∈ B is Pareto dominated by some payoff vector a ∈ A.

Throughout the paper, the set we use to upper-bound the equilibrium payoff set with private monitoring is the equilibrium payoff set with perfect monitoring and a mediator (or, more precisely, the closure of the strict equilibrium payoff set with this information structure). We do not take a position on the realism of allowing for a mediator, and instead view the model with a mediator as a purely technical device that, as we show, is useful for bounding the equilibrium payoff set with private monitoring.

We thus show that the equilibrium payoff set with private monitoring has a simple, recursive upper bound by establishing two main results:

1. Under some conditions, the equilibrium payoff set with perfect monitoring and a mediator is indeed an upper bound on the equilibrium payoff set with any private monitoring structure.

2. The equilibrium payoff set with perfect monitoring and a mediator has a simple recursive structure.

At first glance, it might seem surprising that any conditions at all are needed for the first of these results, as one might think that improving the precision of the monitoring structure and adding a mediator can only expand the equilibrium set. But this is not the case: giving a player more information about her opponents' past actions splits her information sets and thus gives her new ways to cheat, and indeed we show by example that (unmediated) imperfect private monitoring can sometimes outperform (mediated) perfect monitoring. Our first result provides sufficient conditions for this not to happen. Thus, another contribution of our paper is pointing out that perfect monitoring is not necessarily the optimal monitoring structure in a repeated game (even if it is advantaged by giving players access to a mediator), while also giving sufficient conditions under which perfect monitoring is indeed optimal.

Our sufficient condition for mediated perfect monitoring to outperform any private monitoring structure is that there is a feasible continuation payoff vector v such that no player i is tempted to deviate if she gets continuation payoff v_i when she conforms and is minmaxed when she deviates. This is a joint restriction on the stage game and the discount factor, and it is essentially always satisfied when players are at least moderately patient. (They need not be extremely patient: none of our main results concern the limit δ → 1.) The reason why this condition is sufficient for mediated perfect monitoring to outperform private monitoring in two-player games is fairly subtle, so we postpone a detailed discussion.

Our second main result also involves some subtleties. In repeated games with perfect monitoring without a mediator, all strategies are public, so the sequential (equivalently, subgame perfect) equilibrium set coincides with the perfect public equilibrium set, which was recursively characterized by APS. On the other hand, with a mediator, who makes private action recommendations to the players, private strategies play a crucial role, and APS's characterization does not apply. We nonetheless show that the sequential equilibrium payoff set with perfect monitoring and a mediator does have a simple recursive structure.
Under the su¢ cient conditions for our …rst result, a recursive characterization is obtained by replacing APS’s generating operator B with what we call a minmax-threat generating ~ for any set of continuation payo¤s W , the set B ~ (W ) is the set of payo¤s that operator B: can be attained when on-path continuation payo¤s are drawn from W and deviators are minmaxed.2 To see intuitively why deviators can always be minmaxed in the presence of a mediator— and also why private strategies cannot be ignored— suppose that the mediator recommends a target action pro…le a 2 A with probability 1 other action pro…le with probability "= (jAj ", while recommending every 1); and suppose further that if some player i deviates from her recommendation, the mediator then recommends that her opponents minmax her in every future period. In such a construction, player i’s opponents never learn that a deviation has occurred, and they are therefore always willing to follow the recommendation of minmaxing player i.3 (This construction clearly relies on private strategies: if the mediator’s recommendations were public, players would always see when a deviation occurs, and they then might not be willing to minmax the deviator.) Our recursive characterization of the equilibrium payo¤ set with perfect monitoring and a mediator takes into account the 2 The equilibrium payo¤ set with perfect monitoring and a mediator has a recursive structure whether or not the su¢ cient conditions for our …rst result are satis…ed, but the characterization is somewhat more complicated in the general case. See Section 9.1. 3 In this construction, the mediator virutally implements the target action pro…le. For other applications of virtual implementation in games with a mediator, see Lehrer (1992), Mertens, Sorin, and Zamir (1994), Renaul and Tomala (2004), Rahman and Obara (2010), and Rahman (2012). 3 possibility of minmaxing deviators in this way. We also consider several extensions of our results. Perhaps most importantly, we establish two senses in which the equilibrium payo¤ set with perfect monitoring and a mediator is a tight upper bound on the equilibrium payo¤ set with any private monitoring structure. First, mediated perfect monitoring is itself an example of a nonstationary monitoring structure, meaning that the distribution of signals can depend on everything that has happened in the past, rather than only on current actions. Thus, our upper bound is trivially tight in the space of nonstationary monitoring structures. Second, with a standard, stationary monitoring structure, where the signal distribution depends only on the current actions, we show that the mediator can be replaced by ex ante correlation and cheap talk. Hence, our upper bound is also tight in the space of stationary monitoring structures when an ex ante correlating device and cheap talk are available. This paper is certainly not the …rst to develop recursive methods for private monitoring repeated games. In an early and in‡uential paper, Kandori and Matsushima (1998) augment private monitoring repeated games with opportunities for public communication among players, and provide a recursive characterization of the equilibrium payo¤ set for a subclass of equilibria that is large enough to yield a folk theorem. Tomala (2009) gives related results when the repeated game is augmented with a mediator rather than only public communication. 
However, neither paper provides a recursive upper bound on the entire sequential equilibrium payo¤ set at a …xed discount factor in the repeated game.4 Ama- rante (2003) does give a recursive characterization of the equilibrium payo¤ set in private monitoring repeated games, but the state space in his characterization is the set of repeated game histories, which grows over time. In contrast, our upper bound is recursive in payo¤ space, just like in APS. 4 Ben-Porath and Kahneman (1996) and Compte (1998) also prove folk theorems for private monitoring repeated games with communication, but they do not emphasize recursive methods away from the ! 1 limit. Lehrer (1992), Mertens, Sorin, and Zamir (1994), and Renault and Tomala (2004) characterize the communication equilibrium payo¤ set in undiscounted repeated games. These papers study how imperfect monitoring can limit the equilibrium payo¤ set without discounting, whereas our focus is on how discounting can limit the equilibrium payo¤ set independently of the monitoring structure. 4 A di¤erent approach is taken by Phelan and Skrzypacz (2012) and Kandori and Obara (2010), who develop recursive methods for checking whether a given …nite-state strategy pro…le is an equilibrium in a private monitoring repeated game. Their results do not give a recursive characterization or upper bound on the equilibrium payo¤ set. The type of recursive methods used in their papers is also di¤erent: their methods for checking whether a given strategy pro…le is an equilibrium involve a recursion on the sets of beliefs that players can have about each other’s states, rather than a recursion on payo¤s. Recently, Awaya and Krishna (2014) and Pai, Roth, and Ullman (2014) derive bounds on payo¤s in private monitoring repeated games as a function of the monitoring structure. The bounds in these papers come from the observation that, if an individual’s actions can have only a small impact on the distribution of signals, then the shadow of the future can have only a small e¤ect on her incentives. In contrast, our payo¤ bounds apply for all monitoring structures, including those in which individuals’actions have a large impact on the signal distribution.5 Finally, we have emphasized that our results can be interpreted either as giving an upper bound on the equilibrium payo¤ set in a repeated game for a particular private monitoring structure, or as characterizing the set of payo¤s that can arise in equilibrium in a repeated game for some private monitoring structure. With the latter interpretation, our paper shares a motivation with Bergemann and Morris (2013), who characterize the set of payo¤s that can arise in equilibrium in a static incomplete information game for some information structure. Yet another interpretation of our results is that they establish that information is valuable in mediated repeated games, in that— under our su¢ cient conditions— players cannot bene…t from imperfections in the monitoring technology. This interpretation connects our paper to the literature on the value of information in static incomplete information games (e.g., Gossner, 2000; Lehrer, Rosenberg, and Shmaya, 2010; Bergemann and Morris, 2013). The rest of the paper is organized as follows. Section 2 describes our models of repeated 5 A working paper by Cherry and Smith (2011) is the …rst to consider the issue of upper-bounding the equilibrium payo¤ set with imperfect private monitoring with the equilibrium set with perfect monitoring and correlating devices. 
However, the result of theirs which is most relevant for us–their Theorem 3–appears to be contradicted by our example in Section 3 below. 5 games with imperfect private monitoring and repeated games with perfect monitoring and a mediator, which are standard. Section 3 gives an example showing that private monitoring can sometimes outperform perfect monitoring with a mediator. Section 4 develops some preliminary results about repeated games with perfect monitoring and a mediator. Section 5 presents our …rst main result, which gives su¢ cient conditions for such examples not to exist: the proof of this result is lengthy and is deferred to Section 8. Section 6 presents our second main result: a simple recursive characterization of the equilibrium payo¤ set with perfect monitoring and a mediator. Combining the results of Sections 5 and 6 gives the desired recursive upper bound on the equilibrium payo¤ set with private monitoring. Section 7 illustrates the calculation of the upper bound with an example. Section 9 discusses partial versions of our results that apply when our su¢ cient conditions do not hold, as in the case of more than two players, as well as the tightness of our upper bound. Section 10 concludes. 2 Model A stage game G = I; (Ai )i2I ; (ui )i2I is repeated in periods t = 1; 2; : : :, where I = f1; : : : ; jIjg is the set of players, Ai is the …nite set of player i’s actions, and ui : A ! R is player i’s payo¤ function. Players maximize expected discounted payo¤s with common discount factor . We compare the equilibrium payo¤ sets in this repeated game under private monitoring and mediated perfect monitoring. 2.1 Private Monitoring In each period t, the game proceeds as follows: Each player i takes an action ai;t 2 Ai . A signal zt = (zi;t )i2I 2 (Zi )i2I = Z is drawn from distribution p (zt jat ), where Zi is the …nite set of player i’s signals and p ( ja) is the monitoring structure. Player i observes zi;t . A period t history of player i’s is an element of Hit = (Ai Zi )t 1 , with typical element hti = (ai; ; zi; )t =11 , where Hi1 consists of the null history ;. A (behavior) strategy of player S t i’s is a map i : 1 (Ai ). Let H t = (Hit )i2I . t=1 Hi ! 6 We do not impose the common assumption that a player’s payo¤ is measurable with respect to her own action and signal (i.e., that players observe their own payo¤s), because none of our results need this assumption. All examples considered in the paper do however satisfy this measurability condition. The solution concept is sequential equilibrium. More precisely, a belief system of player S1 S t hti ; H t i for all t; we also (H t ) satisfying supp i (hti ) i’s is a map i : 1 t=1 t=1 Hi ! write i (ht jhti ) for the probability of ht under i (hti ). We say that an assessment ( ; ) constitutes a sequential equilibrium if the following two conditions are satis…ed: 1. [Sequential rationality] For each player i and history hti , pected continuation payo¤ at history hti under belief i i maximizes player i’s ex- (hti ). 2. 
[Consistency] There exists a sequence of completely mixed strategy pro…les ( n ) such that the following two conditions hold: (a) n converges to (pointwise in t): For all " > 0 and t, there exists N such that, for all n > N , n t i (hi ) t i (hi ) < " for all i 2 I; hti 2 Hit : (b) Conditional probabilities converge to (pointwise in t): For all " > 0 and t, there exists N such that, for all n > N , n Pr (hti ; ht i ) P n t ~t ~ t Pr (hi ; h i ) h t t i (hi ; h i j hti ) < " for all i 2 I; hti 2 Hit ; ht i 2 H t i : i We choose this relatively permissive de…nition of consistency (requiring that strategies and beliefs converge only pointwise in t) to make our results upper-bounding the sequential equilibrium payo¤ set stronger. The results with a more restrictive de…nition (requiring uniform convergence) would be essentially the same. 7 2.2 Mediated Perfect Monitoring In each period t, the game proceeds as follows: A mediator sends a private message mi;t 2 Mi to each player i, where Mi is a …nite message set for player i. Each player i takes an action ai;t 2 Ai . All players and the mediator observe the action pro…le at 2 A. t A period t history of the mediator’s is an element of Hm = (M A)t 1 , with typical 1 consists of the null history. A strategy of the mediator’s element htm = (m ; a )t =11 , where Hm S1 t ! (M ). A period t history of player i’s is an element of Hit = is a map : t=1 Hm Mi , with typical element hti = (mi; ; a )t =11 ; mi;t , where Hi1 = Mi .6 S t (Ai ). strategy of player i’s is a map i : 1 t=1 Hi ! (Mi A)t 1 A The de…nition of sequential equilibrium is the same as with private monitoring, except that sequential rationality is imposed (and beliefs are de…ned) only at histories consistent with the mediator’s strategy. The interpretation is that the mediator is not a player in the game, but rather a “machine” that cannot tremble. Note that with this de…nition, an assessment (including the mediator’s strategy) ( ; ; ) is a sequential equilibrium with mediated perfect monitoring if and only if ( ; ) is a sequential equilibrium with the “nonstationary” private monitoring structure where Zi = Mi perfect monitoring of actions with messages given by A and p ( jht+1 m ) coincides with (ht+1 m ) (see Section 9.2). As in Forges (1986) and Myerson (1986), any equilibrium distribution over (in…nite paths of) actions arises in an equilibrium of the following form: 1. [Messages are action recommendations] M = A. 2. [Obedience/incentive compatibility] At history hti = ((ri; ; a )t =11 ; ri;t ), player i plays ai;t = ri;t . Without loss of generality, we restrict attention to such obedient equilibria throughout.7 6 t 1 We also occasionally write hti for (mi; ; a ) =1 , omitting the period t message mi;t . 7 Dhillon and Mertens (1996) show that the revelation principle fails for trembling-hand perfect equilibria. Nonetheless, with our “machine”interpretation of the mediator, the revelation principle applies for sequential equilibrium by precisely the usual argument of Forges (1986). 8 Finally, we say that a sequential equilibrium with mediated perfect monitoring is onpath strict if following the mediator’s recommendation is strictly optimal for each player i at every on-path history hti . Let Emed ( ) denote the set of on-path strict sequential equilibrium payo¤s. For the rest of the paper, we slightly abuse terminology by omitting the quali…er “on-path”when discussing such equilibria. 
3 An Illustrative (Counter)Example The goal of this paper is to provide su¢ cient conditions for the equilibrium payo¤ set with mediated perfect monitoring to be a (recursive) upper bound on the equilibrium payo¤ set with private monitoring. Before giving these main results, we provide an illustrative example showing why, in the absence of our su¢ cient conditions, private monitoring (without a mediator) can outperform mediated perfect monitoring. Readers eager to get to the results can skip this section without loss of continuity. Consider the repetition of the following stage game, with L M R 1; 0 1; 0 U 2; 2 D 3; 0 0; 0 T 0; 3 6; 3 B 0; 3 0; 3 = 16 : 0; 0 6; 3 0; 3 We show the following: Proposition 1 In this game, there is no sequential equilibrium where the players’per-period payo¤s sum to more than 3 with perfect monitoring and a mediator, while there is such a sequential equilibrium with some private monitoring structure. Proof. See appendix. In the constructed private monitoring structure in the proof of Proposition 1, players’ payo¤s are measurable with respect to their own actions and signals. In addition, a similar 9 argument shows that imperfect public monitoring (with private strategies) can also outperform mediated perfect monitoring. This shows that some conditions are also required to guarantee that the sequential equilibrium payo¤ set with mediated perfect monitoring is an upper bound on the sequential equilibrium payo¤ set with imperfect public monitoring. Since imperfect public monitoring is a special case of imperfect private monitoring, our su¢ cient conditions for the private monitoring case are enough. Here is a sketch of the proof of Proposition 1. Note that (U; L) is the only action pro…le where payo¤s sum to more than 3. Because is so low, player 1 (row player, “she”) can be induced to play U in response to L only if action pro…le (U; L) is immediately followed by (T; M ) with high enough probability: speci…cally, this probability must exceed 35 . With perfect monitoring, player 2 (column player, “he”) must then “see (T; M ) coming” with probability at least 3 5 following (U; L), and this probability is so high that player 2 will deviate from M to L (regardless of the speci…cation of continuation play). This shows that payo¤s cannot sum to more than 3 with perfect monitoring. On the other hand, with private monitoring, player 2 may not know whether (U; L) has just occurred, and therefore may be unsure of whether the next action pro…le will be (T; M ) or (B; M ), which can give him the necessary incentive to play M rather than L. In particular, suppose that player 1 mixes 13 U + 23 D in period 1, and the monitoring structure is such that player 2 gets signal m (“play M ”) with probability 1 following (U; L), and gets signals m and r (“play R”) with probability 1 2 each following (D; L). Suppose further that player 1 plays T in period 2 if she played U in period 1, and plays B in period 2 if she played D in period 1. Then, when player 2 sees signal m in period 1, his posterior belief that player 1 played U in period 1 is 1 3 1 3 (1) (1) + 32 1 2 1 = : 2 Player 2 therefore expects to face T and B in period 2 with probability 1 2 each, so he is willing to play M rather than L. Meanwhile, player 1 is always rewarded with (T; M ) in period 2 when she plays U in period 1, so she is willing to play U (as well as D) in period 1. 
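The posterior computation in the construction above can be checked mechanically. The following short snippet is an editorial illustration (not part of the authors' construction); it simply applies Bayes' rule to the mixing and signal probabilities stated in the preceding paragraph.

    # Player 2's posterior that player 1 played U, after seeing signal m in period 1:
    # player 1 mixes (1/3)U + (2/3)D against L; signal m has probability 1 after (U, L)
    # and probability 1/2 after (D, L).
    from fractions import Fraction

    prior_U, prior_D = Fraction(1, 3), Fraction(2, 3)
    p_m_given_U, p_m_given_D = Fraction(1), Fraction(1, 2)

    posterior_U = (prior_U * p_m_given_U) / (prior_U * p_m_given_U + prior_D * p_m_given_D)
    print(posterior_U)  # 1/2, so player 2 expects T and B next period with equal probability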
To summarize, the advantage of private monitoring is that pooling players' information sets (in this case, player 2's information sets after (U, L) and (D, L)) can make providing incentives easier.[8]

To preview our results, we will show that private monitoring cannot outperform mediated perfect monitoring when there exists a feasible payoff vector that is appealing enough to both players that neither is tempted to deviate when it is promised to them in continuation if they conform. This condition is violated in the current example because, for example, no feasible continuation payoff for player 2 is high enough to induce him to respond to T with M rather than L. Specifically, our condition is satisfied in the current example if and only if δ is greater than 19/25. Thus, in this example, private monitoring can outperform mediated perfect monitoring if δ = 1/6, but not if δ > 19/25.[9]

Footnote 8: As far as we know, the observation that players can benefit from imperfections in monitoring even in the presence of a mediator is original. Examples by Kandori (1991), Sekiguchi (2002), Mailath, Matthews, and Sekiguchi (2002), and Miyahara and Sekiguchi (2013) show that players can benefit from imperfect monitoring in finitely repeated games. However, in their examples this conclusion relies on the absence of a mediator, and is thus due to the possibilities for correlation opened up by private monitoring. The broader point that giving players more information can be bad for incentives is of course an old one.

Footnote 9: We do not find the lowest possible discount factor δ* such that mediated perfect monitoring outperforms private monitoring for all δ > δ*.

4 Preliminary Results about Emed(δ)

We begin with two preliminary results about the equilibrium payoff set with mediated perfect monitoring. These results are important for both our result on when private monitoring cannot outperform mediated perfect monitoring (Theorem 1) and our characterization of the equilibrium payoff set with mediated perfect monitoring (Theorem 2).

Let u̲_i be player i's correlated minmax payoff, given by

u̲_i = min_{α_{-i} ∈ Δ(A_{-i})} max_{a_i ∈ A_i} u_i(a_i, α_{-i}).

Let α_{-i} ∈ Δ(A_{-i}) be a solution to this minmax problem. Let d_i be player i's greatest possible gain from a deviation at any recommendation profile, given by

d_i = max_{r ∈ A, a_i ∈ A_i} [u_i(a_i, r_{-i}) - u_i(r)].

Let w̄_i be the lowest continuation payoff such that player i does not want to deviate at any recommendation profile when she is minmaxed forever if she deviates, given by

w̄_i = u̲_i + ((1 - δ)/δ) d_i.

Let W_i = {w ∈ R^{|I|} : w_i ≥ w̄_i}. Finally, let u(A) be the convex hull of the set of feasible payoffs, and denote the interior of ∩_{i∈I} W_i ∩ u(A), viewed as a subspace of u(A), by

W ≡ int(∩_{i∈I} W_i ∩ u(A)), with W̄ ≡ ∩_{i∈I} W_i ∩ u(A).

Our first preliminary result is that all payoffs in W are attainable in equilibrium with mediated perfect monitoring. See Figure 1. The intuition is that, as discussed in the introduction, the mediator can virtually implement any payoff vector in W by minmaxing deviators. We will actually prove the slightly stronger result that all payoffs in W are attainable in a strict "full-support" equilibrium with mediated perfect monitoring.
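Before turning to this result, it may help to compute the objects just defined in the simplest possible case. The following worked illustration uses a standard prisoner's dilemma whose payoffs are chosen purely for this purpose and do not appear elsewhere in the paper: each player chooses C or D, with u_i(C, C) = 1, u_i(D, D) = 0, a payoff of 2 for playing D against C, and a payoff of -1 for playing C against D. Then the correlated minmax payoff is u̲_i = 0 (the opponent plays D), the greatest deviation gain is d_i = 1 (attained by playing D when the recommendation profile is (C, C), or when player i is recommended C against D), and

w̄_i = u̲_i + ((1 - δ)/δ) d_i = (1 - δ)/δ.

Since the best feasible payoff vector with equal components is (1, 1), the set W = int(∩_i W_i ∩ u(A)) is non-empty if and only if (1 - δ)/δ < 1, that is, δ > 1/2. This agrees with the formula δ̄ = min_{v ∈ u(A)} max_i d_i/(d_i + v_i - u̲_i) introduced below, which here equals 1/(1 + 1 - 0) = 1/2 at v = (1, 1).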
Formally, we say that an equilibrium has full support if for each player i and history hti = (ri; ; a )t =11 such that there exists (r i; )t =11 with Pr (r j(r 0 ; a 0 ) 12 1 0 =1 ) > 0 for each = 1; :::; t 1, there exists Figure 1: W (r i; )t =11 such that for each Pr (ri; ; r 1 we have = 1; :::; t i; j(ri; 0 ; r Emed ( ) i; 0 ; a 0) 1 0 =1 ) > 0 and r i; =a i; : That is, any history hti consistent with the mediator’s strategy is also consistent with i’s opponents’equilibrium strategies (even if player i herself has deviated, noting that we allow ri; 6= ai; in hti ). This is weaker than requiring that the mediator’s recommendation has full support at all histories (on- and o¤-path), but stronger than requiring that the recommendation has full support at all on-path histories only. Note that, if the equilibrium has full support, player i never believes that any of the other players has deviated. Lemma 1 For all v 2 W , there exists a strict full-support equilibrium with mediated perfect monitoring with payo¤ v. In particular, W Proof. For each v 2 W , there exists 2 Emed ( ). (A) such that u( ) = v and (r) > 0 for all r 2 A. On the other hand, for each i 2 I and " 2 (0; 1), approximate the minmax strategy P T " (1 ") i +" a i 2A i jAa ii j . Since v 2 int i2I Wi , i by the full-support strategy i 13 there exists " 2 (0; 1) such that, for each i 2 I, we have vi > max ui (ai ; ai 2Ai " i) + 1 (1) di : Consider the following recommendation schedule: The mediator follows an automaton strategy whose state is identical to a subset of players J I. Hence, the mediator has 2jIj states. In the following construction of the mediator’s strategy, J will represent the set of players who have ever deviated from the mediator’s recommendation. If the state J is equal to ; (no player has deviated), then the mediator recommends . If there exists i with J = fig (only player i has deviated), then the mediator recommends r i to players i according to Finally, if jJj i. " i, " i to player 2 (several players have deviated), then for each i 2 J, the mediator recommends the best response to other players and recommends some best response to J with probability " i, while she recommends each pro…le a 1 jA Jj . J 2A J to the The state transitions as follows: if the current state is J and players J 0 deviate, then the state transitions to J [ J 0 .10 Player i’s strategy is to follow her recommendation ri;t in period t. She believes that the mediator’s state is ; if she herself has never deviated, and believes that the state is fig if she has deviated. Since the mediator’s recommendation has full support, player i’s belief is consistent. (In particular, no matter how many times player i has been instructed to minmax some player j, it is always in…nitely more likely that these instructions resulted from randomization by the mediator rather than a deviation by player j.) If player i has deviated, then (given her belief) it is optimal for her to always play a static best response to recommends " i " i, since the mediator always in state fig. Given that a unilateral deviation by player i is punished in this way, (1) implies that on path player i has a strict incentive to follow her recommendation ri;t at any recommendation pro…le rt 2 A. Hence, she has a strict incentive to follow her recommendation when she believes that r i;t is distributed according to Pr (r t i;t jhi ). 
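The mediator's strategy in the proof of Lemma 1 is an automaton on the 2^{|I|} states J ⊆ I. The sketch below is a schematic illustration added here, not the authors' code: the recommendation distributions (the full-support profile with payoff v, the approximate minmax profiles, and the recommendations used when two or more players have deviated) are placeholders that would have to be supplied for a concrete game, and only the state space and the transition rule follow the construction in the text.

    class Lemma1Mediator:
        """Schematic mediator automaton from the proof of Lemma 1.

        The state is the set J of players who have ever deviated from their
        recommendations. `rec_rules` maps each state (a frozenset of player
        indices) to a distribution over recommendation profiles; these are
        placeholders to be filled in for a concrete game: in state set() the
        mediator draws from a full-support profile with payoff v, in state {i}
        it draws the (approximate) minmax profile against player i together
        with a best response for i, and with two or more deviators it treats
        each deviator analogously while recommending the remaining players'
        profiles uniformly.
        """

        def __init__(self, players, rec_rules):
            self.players = frozenset(players)
            self.rec_rules = rec_rules      # dict: frozenset J -> distribution over A
            self.state = frozenset()        # initially, no player has deviated

        def recommend(self):
            # The period-t recommendation depends only on the current state J.
            return self.rec_rules[self.state]

        def update(self, deviators):
            # If the players in `deviators` disobeyed this period, move to J union J'.
            self.state = self.state | frozenset(deviators)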
Footnote 10: We thank Gabriel Carroll for suggestions which helped simplify this construction.

The condition that W ≠ ∅ can be more transparently stated as a lower bound on the discount factor. In particular, W ≠ ∅ if and only if there exists v ∈ u(A) such that

v_i > u̲_i + ((1 - δ)/δ) d_i for all i ∈ I,

or equivalently

δ > δ̄, where δ̄ ≡ min_{v ∈ u(A)} max_{i ∈ I} d_i / (d_i + v_i - u̲_i).

For instance, it can be checked that δ̄ = 19/25 in the example of Section 3. Note that δ̄ is strictly less than 1 if and only if the stage game admits a feasible and strictly individually rational payoff vector (relative to correlated minmax payoffs).[11] For most games of interest, δ̄ will be some "intermediate" discount factor that is not especially close to either 0 or 1.

Footnote 11: Recall that a payoff vector v is strictly individually rational if v_i > u̲_i for all i ∈ I.

Our second preliminary result is that, if a strict full-support equilibrium exists, then any payoff vector that can be attained by a mediator's strategy that is incentive compatible on path is (virtually) attainable in strict equilibrium. The intuition is that mixing such a mediator's strategy with an ε probability of playing any strict full-support equilibrium yields a strict equilibrium with nearby payoffs.

Lemma 2 With mediated perfect monitoring, fix a payoff vector v, and suppose there exists a mediator's strategy μ that (1) attains v when players obey the mediator, and (2) has the property that obeying the mediator is optimal for each player at each on-path history, when she is minmaxed forever if she deviates: that is, for each player i and on-path history h_m^{t+1},

(1 - δ) E[u_i(r_t) | h_m^t, r_{i,t}] + δ E[(1 - δ) Σ_{τ=t+1}^∞ δ^{τ-t-1} u_i(μ(h_m^τ)) | h_m^t, r_{i,t}]
  ≥ max_{a_i ∈ A_i} (1 - δ) E[u_i(a_i, r_{-i,t}) | h_m^t, r_{i,t}] + δ u̲_i.     (2)

Suppose also that there exists a strict full-support equilibrium. (For example, such an equilibrium exists if W ≠ ∅, by Lemma 1.) Then v ∈ Emed(δ).

Proof. See appendix.

5 A Sufficient Condition for Emed(δ) to Give an Upper Bound

Our sufficient condition for mediated perfect monitoring to outperform private monitoring in two-player games is that δ > δ̄. In Section 9, we discuss what happens when there are more than two players or the condition that δ > δ̄ is relaxed.

Let E(δ, p) be the set of (possibly weak) sequential equilibrium payoffs with private monitoring structure p. The following is our first main result. Note that E(δ, p) is closed, as we use the product topology on assessments (Fudenberg and Levine, 1983), so the maxima in the theorem exist.

Theorem 1 If |I| = 2 and δ > δ̄, then for every private monitoring structure p and every non-negative Pareto weight λ ∈ Λ+ ≡ {λ ∈ R^2_+ : ||λ|| = 1}, we have

max_{v ∈ E(δ, p)} λ · v ≤ max_{v ∈ Emed(δ)} λ · v.

Theorem 1 says that, in games involving two players of at least moderate patience, the Pareto frontier of the (closure of the strict) equilibrium payoff set with mediated perfect monitoring extends farther in any non-negative direction than does the Pareto frontier of the equilibrium payoff set with any private monitoring structure.[12] We emphasize that, like all our main results, Theorem 1 concerns equilibrium payoff sets at fixed discount factors, not in the δ → 1 limit.

We describe the idea of the proof of Theorem 1, deferring the proof itself to Section 8.
Let E( ) be the equilibrium payo¤ set in the mediated repeated game with the following universal information structure: the mediator directly observes the recommendation pro…le rt and the action pro…le at in each period t, while each player i observes nothing beyond her own recommendation ri;t and her own action ai;t .13 With this monitoring structure, the 12 We do not know if the same result holds for negative Pareto weights. This information structure may not result from mediated communication among the players, as actions are not publicly observed. Again, we simply view E ( ) as a technical device. 13 16 mediator can clearly replicate any private monitoring structure p by setting (htm ) equal to p ( jat 1 ) for every history htm = (r ; a )t =11 . It particular, we have E( ; p) E( ) for every p,14 so to prove Theorem 1 it su¢ ces to show that max v2E( ) v max v: (3) v2Emed ( ) (The maximum over v 2 E( ) is well de…ned since E( ) is closed, by the same reasoning as for E( ; p).) To show this, the idea is to start with an equilibrium in E( )— where players only observe their own recommendations— and then show that the players’ recommendations can be “publicized”without violating anyone’s obedience constraints.15 To see why this is possible (when jIj = 2 and > , or equivalently W 6= ;), …rst note that we can restrict attention to equilibria with Pareto-e¢ cient on-path continuation payo¤s, as improving both players’on-path continuation payo¤s improves their incentives (assuming that deviators are minmaxed, which is possible when W 6= ;, by Lemma 2). Next, if jIj = 2 and W 6= ;, then if a Pareto-e¢ cient payo¤ vector v lies outside of Wi for one player (say player 2), it must then lie inside of Wj for the other player (player 1). Hence, at each history ht , there can be only one player— here player 2— whose obedience constraint could be violated if we publicized both players’past recommendations. Now, suppose that at history ht we do publicize the entire vector of players’past recommendations rt = (r )t =11 , but the mediator then issues period t recommendations according to the original equilibrium distribution of recommendations conditional on player 2’s past recommendations r2t = (r2; )t =11 only. We claim that doing this violates neither player’s obedience constraint: Player 1’s obedience constraint is easy to satisfy, as we can always ensure that continuation payo¤s lie in W1 . And, since player 2 already knew r2t in the 14 In light of this fact, the reader may wonder why we do not simply take E ( ) rather than Emed ( ) to be our bound on equilibrium payo¤s with private monitoring. The answer is that, as far as we know, E ( ) does not admit a recursive characterization, while we recursively characterize Emed ( ) in Section 6. 15 More precisely, the construction in the proof both publicizes the players’recommendations and modi…es the equilibrium in ways that only improve the players’ -weighted payo¤s. 17 original equilibrium, publicizing ht while issuing recommendations based only on r2t does not a¤ect his incentives. An important missing step in this proof sketch is that, in the original equilibrium in E( ), at some histories it may be player 1 who is tempted to deviate when we publicize past recommendations (while it is player 2 who is tempted at other histories). 
For instance, it is not clear how we can publicize past recommendations when ex ante equilibrium payoffs are very good for player 1, so that v ∉ W_2 (and player 2 is tempted to deviate in period 1), but continuation payoffs at some later history are very good for player 2 (so that player 1 is then tempted to deviate). The proof of Theorem 1 shows that we can ignore this possibility, because, somewhat unexpectedly, equilibrium paths like this one are never needed to sustain Pareto-efficient payoffs. In particular, to sustain an ex ante payoff that is very good for player 1 (i.e., outside W_2), we never need to promise continuation payoffs that are very good for player 2 (i.e., outside W_1). The intuition is that, rather than promising player 2 a very good continuation payoff outside W_1, we can instead promise him a fairly good continuation payoff inside W_1, while compensating him for this change by also occasionally transitioning to this fairly good continuation payoff at histories where the original promised continuation payoff is less good for him. Finally, since the feasible payoff set is convex, the resulting "compromise" continuation payoff vector is also acceptable to player 1.

6 Recursively Characterizing Emed(δ)

We have seen that Emed(δ) is an upper bound on E(δ, p) for two-player games satisfying δ > δ̄. As our goal is to give a recursive upper bound on E(δ, p), it remains to recursively characterize Emed(δ). Our characterization assumes that δ > δ̄, but it applies for any number of players.

Recall that APS characterize the perfect public equilibrium set with imperfect public monitoring as the iterative limit of a generating operator B, where B(W) is defined as the set of payoffs that can be sustained when on- and off-path continuation payoffs are drawn from W. What we show is that the sequential equilibrium payoff set with mediated perfect monitoring is the iterative limit of a generating operator B̃, where B̃(W) is the set of payoffs that can be sustained when on-path continuation payoffs are drawn from W, and deviators are minmaxed off path.

Intuitively, there are two things to show: (1) we can indeed minmax deviators off path, and (2) on-path continuation payoffs must themselves be sequential equilibrium payoffs. The first of these facts is Lemma 2. For the second, note that, in an obedient equilibrium with perfect monitoring, players can perfectly infer each other's private history on path. Continuation play at on-path histories (but not off-path histories) is therefore "common knowledge," which gives the desired recursive structure.

In the following formal development, we assume familiarity with APS (see also Section 7.3 of Mailath and Samuelson, 2006), and focus on the new features that emerge when mediation is available.

Definition 1 For any set V ⊆ R^{|I|}, a correlated action profile α ∈ Δ(A) is minmax-threat enforceable on V by a mapping γ : A → V if, for each player i and action a_i ∈ supp α_i,

E[(1 - δ) u_i(a_i, a_{-i}) + δ γ_i(a_i, a_{-i}) | a_i] ≥ max_{a'_i ∈ A_i} E[(1 - δ) u_i(a'_i, a_{-i}) | a_i] + δ min_{α_{-i} ∈ Δ(A_{-i})} max_{a_i ∈ A_i} u_i(a_i, α_{-i}).

Definition 2 A payoff vector v ∈ R^{|I|} is minmax-threat decomposable on V if there exists a correlated action profile α ∈ Δ(A) which is minmax-threat enforced on V by a mapping γ such that

v = E[(1 - δ) u(a) + δ γ(a)].

Let B̃(V) = {v ∈ R^{|I|} : v is minmax-threat decomposable on V}.

We show that the following algorithm recursively computes Emed(δ): let W^1 = u(A), W^n = B̃(W^{n-1}) for n > 1, and W^∞ = lim_{n→∞} W^n.

Theorem 2 If δ > δ̄, then Emed(δ) = W^∞.
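To give a rough sense of how the operator B̃ can be put on a computer, the sketch below checks a restricted form of minmax-threat decomposability for two players: it considers only pure recommendation profiles and continuation payoffs drawn from a finite list of candidate vectors. This restriction is an editorial simplification rather than the paper's algorithm; it yields an inner approximation of one application of B̃, and handling general correlated recommendations and arbitrary mappings into a convex set V would instead call for linear programming.

    import itertools

    def minmax_threat_decomposable_points(u1, u2, minmax, delta, continuation_set):
        """Return payoff vectors (1-delta)*u(a) + delta*w that are minmax-threat
        decomposable in the restricted sense above (pure recommendation a, constant
        continuation payoff w drawn from `continuation_set`).

        u1, u2           : dicts mapping pure profiles (a1, a2) to stage payoffs
        minmax           : pair of correlated minmax payoffs (one per player)
        delta            : discount factor
        continuation_set : finite list of candidate continuation payoff vectors (w1, w2)
        """
        profiles = list(u1.keys())
        actions1 = sorted({a for a, _ in profiles})
        actions2 = sorted({b for _, b in profiles})
        decomposable = []
        for (a1, a2), (w1, w2) in itertools.product(profiles, continuation_set):
            # Obedience: conforming today and then receiving w_i must beat the best
            # one-shot deviation followed by being minmaxed forever (Definition 1).
            best_dev1 = max(u1[(b1, a2)] for b1 in actions1)
            best_dev2 = max(u2[(a1, b2)] for b2 in actions2)
            ok1 = (1 - delta) * u1[(a1, a2)] + delta * w1 >= (1 - delta) * best_dev1 + delta * minmax[0]
            ok2 = (1 - delta) * u2[(a1, a2)] + delta * w2 >= (1 - delta) * best_dev2 + delta * minmax[1]
            if ok1 and ok2:
                decomposable.append(((1 - delta) * u1[(a1, a2)] + delta * w1,
                                     (1 - delta) * u2[(a1, a2)] + delta * w2))
        return decomposable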
With the exception of the following two lemmas, the proof of Theorem 2 is entirely standard, and omitted. The lemmas correspond to facts (1) and (2) above. In particular, Lemma 3 follows directly from APS and Lemma 2, while Lemma 4 establishes on-path recursivity. For both lemmas, assume δ > δ̄.

Lemma 3 If a set V ⊆ R^{|I|} is bounded and satisfies V ⊆ B̃(V), then B̃(V) ⊆ Emed(δ).

Proof. See appendix.

Lemma 4 Emed(δ) = B̃(Emed(δ)).

Proof. See appendix.

Combining Theorems 1 and 2 yields our main conclusion: in two-player games with δ > δ̄, the equilibrium payoff set with mediated perfect monitoring is a recursive upper bound on the equilibrium payoff set with any imperfect private monitoring structure.

7 The Upper Bound in an Example

We illustrate our results with an application to a simple repeated Bertrand game. Specifically, in such a game we find the greatest equilibrium payoff that each firm can attain for any private monitoring structure.

Consider the following Bertrand game: there are two firms i ∈ {1, 2}, and each firm i's possible price level is p_i ∈ {W, L, M, H} (price war, low price, medium price, high price). Given p_1 and p_2, firm i's profit is determined by the following payoff matrix:

           W          L          M          H
  W     15, 15     30, 25     50, 15     80, 0
  L     25, 30     40, 40     60, 35     90, 15
  M     15, 50     35, 60     55, 55     85, 35
  H      0, 80     15, 90     35, 85     65, 65

Note that L (low price) is a dominant strategy in the stage game, W (price war) is a costly action that hurts the other firm, and (H, H) maximizes the sum of the firms' profits. The feasible payoff set is given by

u(A) = co{(0, 80), (15, 15), (15, 90), (35, 85), (65, 65), (80, 0), (85, 35), (90, 15)}.

In addition, each firm's minmax payoff u̲_i is 25, so the feasible and individually rational payoff set is given by

co{(25, 25), (25, 87.5), (35, 85), (65, 65), (85, 35), (87.5, 25)}.

In particular, the greatest feasible and individually rational payoff for each firm is 87.5. In this game, each firm's maximum deviation gain d_i is 25. Since the game is symmetric, the critical discount factor δ̄ above which we can apply Theorems 1 and 2 is given by plugging the best symmetric payoff of 65 into the formula for δ̄, which gives

δ̄ = 25 / (25 + 65 - 25) = 5/13.

To illustrate our results, we find the greatest equilibrium payoff that each firm can attain for any private monitoring structure when δ = 1/2 > 5/13. When δ = 1/2, we have

w̄_i = 25 + ((1 - 1/2)/(1/2)) · 25 = 50.

Hence, Lemma 1 implies that

{v ∈ int u(A) : v_i > 50 for each i} = int co{(50, 50), (75, 50), (50, 75), (65, 65)} ⊆ Emed(δ).

We now compute the best payoff vector for firm 1 in B̃(u(A)). By Theorems 1 and 2, any Pareto-efficient payoff profile not included in B̃(u(A)) is not included in E(δ, p) for any p.

In computing the best payoff vector for firm 1, it is natural to conjecture that firm 1's incentive compatibility constraint is not binding. We thus consider a relaxed problem with only firm 2's incentive constraint, and then verify that firm 1's incentive constraint is satisfied. Note that playing L is always the best deviation for firm 2. Furthermore, the corresponding deviation gain decreases as firm 1 increases its price from W to L, and (weakly) increases as it increases its price from L to M or H. On the other hand, firm 1's payoff increases as firm 1 increases its price from W to L and decreases as it increases its price from L to M or H. Hence, in order to maximize firm 1's payoff, firm 1 should play L.
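The numbers used in this example so far (a minmax payoff of 25, a maximum deviation gain of 25, the threshold 5/13, and w̄_i = 50 at δ = 1/2) can be verified by brute force over the payoff matrix above. The following check is an editorial illustration rather than part of the paper.

    from fractions import Fraction

    prices = ["W", "L", "M", "H"]
    # profit[(p1, p2)] = (firm 1's profit, firm 2's profit), from the matrix above
    profit = {
        ("W", "W"): (15, 15), ("W", "L"): (30, 25), ("W", "M"): (50, 15), ("W", "H"): (80, 0),
        ("L", "W"): (25, 30), ("L", "L"): (40, 40), ("L", "M"): (60, 35), ("L", "H"): (90, 15),
        ("M", "W"): (15, 50), ("M", "L"): (35, 60), ("M", "M"): (55, 55), ("M", "H"): (85, 35),
        ("H", "W"): (0, 80),  ("H", "L"): (15, 90), ("H", "M"): (35, 85), ("H", "H"): (65, 65),
    }
    u1 = lambda p1, p2: profit[(p1, p2)][0]

    # L is a dominant strategy for firm 1 (and, by symmetry, for firm 2).
    assert all(u1("L", p2) >= u1(p1, p2) for p1 in prices for p2 in prices)

    # Minmax: against W, firm 1's best response earns 25; since L guarantees at least 25
    # against every mixture, mixing cannot hold firm 1 below 25, so the minmax is 25.
    minmax = min(max(u1(p1, p2) for p1 in prices) for p2 in prices)

    # Maximum deviation gain d_i over all recommendation profiles.
    d = max(max(u1(b, p2) for b in prices) - u1(p1, p2) for p1 in prices for p2 in prices)

    # Critical discount factor, evaluated at the best symmetric payoff (65, 65), which
    # by symmetry attains the minimum in the formula for the threshold discount factor.
    delta_bar = Fraction(d, d + 65 - minmax)

    # Threshold continuation payoff at delta = 1/2.
    delta = Fraction(1, 2)
    w_bar = minmax + (1 - delta) / delta * d

    assert (minmax, d, delta_bar, w_bar) == (25, 25, Fraction(5, 13), 50)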
Suppose that firm 2 plays H. Then, firm 2's incentive compatibility constraint is

(1 - δ) · 25 ≤ δ · (w_2 - 25),

where the 25 on the left-hand side is firm 2's maximum deviation gain, the 25 on the right-hand side is its minmax payoff, and w_2 is firm 2's continuation payoff. That is, w_2 ≥ 50. By feasibility, w_2 ≥ 50 implies that w_1 ≤ 75. Hence, if r_2 = H, the best minmax-threat decomposable payoff for firm 1 is

(1 - δ) (90, 15) + δ (75, 50) = (82.5, 32.5).

Since 82.5 is larger than any payoff that firm 1 can get when firm 2 plays W, M, or L, firm 2 should indeed play H to maximize firm 1's payoff. Moreover, since 75 ≥ w̄_1, firm 1's incentive constraint is not binding. Thus, we have shown that 82.5 is the best payoff for firm 1 in B̃(u(A)).

On the other hand, with mediated perfect monitoring it is indeed possible to (virtually) implement an action path in which firm 1's payoff is 82.5: play (L, H) in period 1 (with payoffs (90, 15)), and then play (1/2)(M, H) + (1/2)(H, H) forever (with payoffs (1/2)(85, 35) + (1/2)(65, 65) = (75, 50)), while minmaxing deviators.

Thus, when δ = 1/2, each firm's greatest feasible and individually rational payoff is 87.5, but the greatest payoff it can attain with any imperfect private monitoring structure is only 82.5. In this simple game, we can therefore say exactly how much of a constraint is imposed on each firm's greatest equilibrium payoff by the firms' impatience alone, independently of the monitoring structure.

8 Proof of Theorem 1

8.1 Preliminaries and Plan of Proof

We wish to establish (3) for every Pareto weight λ ∈ Λ+. Note that Lemma 1 implies that W̄ ⊆ Emed(δ). Therefore, for every Pareto weight λ ∈ Λ+, if there exists v ∈ arg max_{v' ∈ E(δ)} λ · v' with v ∈ W̄, then there exists v* ∈ Emed(δ) such that λ · v* ≥ λ · v; that is, the vector v* is as desired. Hence, we are left to consider λ ∈ Λ+ such that

arg max_{v ∈ E(δ)} λ · v ∩ W̄ = ∅.     (4)

Since we consider two-player games, we can order Λ+ as follows: λ is steeper than λ' if and only if λ_1/λ_2 ≥ λ'_1/λ'_2.

For each player i, let w^i be the Pareto-efficient point in W_i satisfying w^i ∈ arg max_{v ∈ W_i ∩ u(A)} v_{-i}. Note that the assumption that W ≠ ∅ implies that w^i ∈ W̄. Let μ^i ∈ Δ(A) be a recommendation that attains w^i: u(μ^i) = w^i. Let Λ^i be the (non-empty) set of Pareto weights λ^i such that w^i ∈ arg max_{v ∈ u(A)} λ^i · v:

Λ^i = {λ^i ∈ R^2_+ : ||λ^i|| = 1, w^i ∈ arg max_{v ∈ u(A)} λ^i · v}.

As u(A) is convex, if λ satisfies (4) then either λ_1/λ_2 < λ^1_1/λ^1_2 for each λ^1 ∈ Λ^1, or λ_1/λ_2 > λ^2_1/λ^2_2 for each λ^2 ∈ Λ^2. See Figure 2. We focus on the case where λ_1/λ_2 > λ^2_1/λ^2_2. (The proof for the case λ_1/λ_2 < λ^1_1/λ^1_2 is symmetric and thus omitted.)

Figure 2: Setup for the construction.

Fix v ∈ arg max_{v' ∈ E(δ)} λ · v'. Let (μ, (σ_i)_{i∈I}) be an equilibrium that attains v with the universal information structure (where players do not observe each other's actions). By Lemma 2, it suffices to construct a mediator's strategy μ̃ yielding payoffs ṽ such that (2) ("perfect monitoring incentive compatibility") holds and λ · ṽ ≥ λ · v. The rest of the proof constructs such a strategy.

The plan for constructing the strategy μ̃ is as follows: First, from μ, we construct a mediator's strategy μ̂ that yields payoffs v and satisfies perfect monitoring incentive compatibility for player 2, but possibly not for player 1. The idea is to set the distribution of recommendations under μ̂ equal to the distribution of recommendations under μ conditional on player 2's information only. Second, from μ̂, we construct a mediator's strategy μ̃ that yields payoffs ṽ with λ · ṽ ≥ λ · v and satisfies perfect monitoring incentive compatibility for both players.
24 8.2 Construction and Properties of For each on-path history of player 2’s recommendations, denoted by r2t = (r2; )t =11 , let Pr ( jr2t ) be the conditional distribution of recommendations in period t; and let w (r2t ) be the continuation payo¤ vector from period t onward conditional on r2t : w (r2t ) = E " # 1 X u(rt+ ) j r2t : =0 so that, for every on-path history rt = (r )t =11 , the mediator draws rt according De…ne to Pr (rt jr2t ): Pr (rt jrt ) We claim that Pr (rt jr2t ): (5) yields payo¤s v and satis…es (2) for player 2. To see this, let w (rt ) be the continuation payo¤ vector from period t onward conditional on rt under , and note that w (rt ) = w (r2t ). In particular, w (r1 ) = w (r21 ) = v. In addition, the fact that is an equilibrium with the universal information structure implies that, for every on-path history rt+1 , (1 )E u2 (rt ) jr2t+1 + w As w2 r2t+1 = w2 (rt+1 ) and Pr r2t+1 max (1 a2 2A2 )E u2 (r1;t ; a2 ) jr2t+1 + u2 : rt jr2t+1 = Pr (rt jrt ; r2;t ), this implies that (2) holds for player 2. 8.3 Construction of The mediator’s strategy will involve mixing over continuation payo¤s at certain histories rt+1 , and we will denote the mixing probability at history rt+1 by p (rt+1 ). Our approach 1 S is to …rst construct the mediator’s strategy for an arbitrary function p : At 1 ! [0; 1] t=1 specifying these mixing probabilities, and to then specify the function p. 1 S Given a function p : At 1 ! [0; 1], the mediator’s strategy is de…ned as follows: t=1 25 In each period t = 0; 1; 2; : : :, the mediator is in one of two states, ! t 2 fS1 ; S2 g (where “period 0” is a purely notational, and as usual the game begins in period 1). state, recommendations in period t Given the 1 are as follows: 1. In state S1 , at history rt = (r )t =11 , the mediator recommends rt according to Pr (rt jrt ). 1 2. In state S2 , the mediator recommends rt according to some u( 1 2 (A) such that ) = w1 . The initial state is ! 0 = S1 . State S2 is absorbing: if ! t = S2 then ! t+1 = S2 . Finally, the transition rule in state S1 is as follows: 1. If w (rt+1 ) 62 W1 , then ! t+1 = S2 with probability one. 2. If w (rt+1 ) 2 W1 , then ! t+1 = S2 with probability 1 Thus, strategy p(rt+1 ). agrees with , with the exception that occasionally transitions to an absorbing state where actions yielding payo¤s w1 are recommended forever. In particular, such a transition always occurs when continuation payo¤s under otherwise this transition occurs with probability 1 lie outside W1 , and p (rt+1 ). To complete the construction of , it remains only to specify the function p. To this 1 S end, it is useful to de…ne an operator F , which maps functions w : At 1 ! R2 to functions F (w2 ) : 1 S t=1 At 1 t=1 ! R2 . The operator F will be de…ned so that its unique …xed point is precisely the continuation value function in state S1 under for a particular function p, and this function will be the one we use to complete the construction of . 1 1 S S Given w : At 1 ! R2 , de…ne w (w) : At 1 ! R so that, for every rt 2 At 1 , we t=1 t=1 have w (w)(rt ) = (1 On the other hand, given w (w) : 1 S t=1 )u At 1 (rt ) + E w(rt+1 )jrt : ! R, de…ne F (w) : 26 1 S t=1 At (6) 1 ! 
R so that, for every rt 2 At 1 , we have F (w) rt = 1fw 9 8 < p(w)(rt ) w (w)(rt ) = + 1fw (rt )2W1 g : + (1 p(w)(rt )) w1 ; (rt )2W = 1gw 1 (7) ; where, when w (rt ) 2 W1 , p(w)(rt ) is the largest number in [0; 1] such that p(w)(rt ) That is, if w2 (w) (rt ) that w2 (rt ) w2 (w)(rt ) + 1 p(w)(rt ) w2 (rt ): w21 (8) w2 (rt ), then p(w)(rt ) = 1; and otherwise, since w (rt ) 2 W1 implies w21 , p(w)(rt ) 2 [0; 1] solves p(w)(rt ) (Intuitively, the term 1fw w2 (w)(rt ) + 1 (rt )2W = 1gw 1 p(w)(rt ) w21 = w2 (rt ): in (7) re‡ects the fact that we have replaced contin- uation payo¤s outside of W1 with player 2’s most favorable continuation payo¤ within W1 , namely w1 . This replacement may reduce player 2’s value below his original value of w2 (rt ). However, (8) ensures that, by also replacing continuation payo¤s within W1 with w1 with high enough probability, player 2’s value does not fall below w2 (rt ).) To show that F has a unique …xed point, it su¢ ces to show that F is a contraction. Lemma 5 For all w and w, ~ we have kF (w) suprt kw(rt ) F (w)k ~ kw wk, ~ where kw wk ~ w(r ~ t )k. Proof. See appendix. Let w be the unique …xed point of F . Given this function w, let w = w (w) (given by (6)) and let p = p (w) (given by (8)). This completes the construction of the mediator’s strategy . 27 8.4 Properties of Observe that w (rt ) = (1 )u (rt ) + E w(rt+1 )jrt (9) and w rt = 1fw (rt )2W1 g p(rt )w (rt ) + 1 p(rt ) w1 + 1fw (rt )2W = 1gw 1 (10) : Thus, for i = 1; 2, wi (rt ) is player i’s expected continuation payo¤ from period t given rt and ! t = S1 (before she observes ri;t ), and wi (rt ) is player i’s expected continuation payo¤ from period t given rt and ! t 1 = S1 (before she observes ri;t ). In particular, recalling that ! 1 = S1 and v = w (;) 2 W1 , (8) implies that the ex ante payo¤ vector v is given by v = w (;) = p(;)w (;) + (1 p(;)) w1 : We prove the following key lemma in the appendix. Lemma 6 For all t 1, if w (rt ) 2 W1 , then p(rt )w (rt ) + (1 p(r2t )) w1 Pareto dominates w (rt ). Here is a graphical explanation of Lemma 6: w(rt+1 ) 1fw 1 indicates that we construct w(rt+1 ) by replacing some continuation payo¤ not included in W1 with w1 . Hence, w(rt+1 ) w (rt+1 ) (and thus w (rt ) w (rt )) is parallel w(r ^ t+1 ) for some w(r ^ t+1 ) 2 u(A) n W1 . See Figure 3 for an illustration. Recall that p(rt ) is determined by (8). w1 w (rt ) is parallel to w (rt+1 ). To evaluate this di¤erence, consider (10) for period t + 1. The term (rt+1 )2W = 1gw to w1 By (9), w (rt ) Since the vector w (rt ) w (rt ) is parallel to w(r ^ t+1 ) for some w(r ^ t+1 ) 2 u(A) n W1 and u(A) is convex, we have w1 (rt ) w1 (r2t ). Hence, if we take p(rt ) so that the convex combination of w2 (rt ) and w21 is equal to w2 (rt ), then player 1 is better o¤ compared to w1 (rt ). See Figure 4. Given Lemma 6, we show that ((2)) for both players, and v satis…es perfect monitoring incentive compatibility v . 28 Figure 3: The vector from w (rt ) to w (rt ) is parallel to the one from w(r ^ t+1 ) to w1 . Figure 4: p(rt )w (rt ) + (1 p(rt )) w1 and w (rt ) have the same value for player 2. 29 1. Incentive compatibility for player 1: It su¢ ces to show that, conditional on any onpath history rt and period t recommendation r1;t , the expected continuation payo¤ from period t + 1 onward lies in W1 . If ! t = S2 , then this continuation payo¤ is w1 2 W1 . If ! t = S1 , then it su¢ ces to show that w (rt+1 ) 2 W1 for all rt+1 . 
If w (rt+1 ) 2 W1 , then, by Lemma 6, w(rt+1 ) = p(rt+1 )w (rt+1 ) + (1 p(rt+1 )) w1 Pareto dominates w (rt+1 ) 2 W1 , so w(rt+1 ) 2 W1 . If w (rt+1 ) 2 = W1 , then w(rt+1 ) = w1 2 W1 . Hence, w (rt+1 ) 2 W1 for all rt+1 . 2. Incentive compatibility for player 2: ommendation r2;t . Fix an on-path history rt and a period t rec- If ! t = S2 , or if both ! t = S1 and w (rt+1 ) 2 = W1 , then the expected continuation payo¤ from period t + 1 onward conditional on (rt ; r2;t ) is w1 2 W1 , so (2) holds. p(rt+1 )w (rt+1 ) + (1 If instead ! t = S1 and w (rt+1 ) 2 W1 , then w(rt+1 ) = p(rt+1 )) w1 = w (rt+1 ) by (8). As universal information structure and Pr is an equilibrium with the rt jr2t+1 = Pr (rt jrt ; r2;t ), this implies that (2) holds for player 2, by the same argument as in Section 8.2. 3. 9 v v : Immediate from Lemma 6 with t = 1. Extensions This section discusses what happens when the conditions for Theorem 1 are violated, as well as the extent to which the payo¤ bound is tight. 9.1 What if W = ;? The assumption that W 6= ; guarantees that all action pro…les are supportable in equilibrium, which in turn implies that deviators can be held to their correlated minmax payo¤s. This fact plays a key role both in showing that Emed ( ) is an upper bound on E ( ; p) for any private monitoring structure p and in recursively characterizing Emed ( ). However, the assumption that W 6= ; is restrictive, in that it is violated when players are too impatient. 30 Furthermore, it implies that the Pareto frontier of Emed ( ) coincides with the Pareto frontier of the feasible payo¤ set for some Pareto weights (but of course not for others), so this assumption must also be relaxed for our approach to be able to give non-trivial payo¤ bounds for all Pareto weights. To address these concerns, this subsection shows that even if W = ;, Emed ( ) may still be an upper bound on E ( ; p) for any private monitoring structure p, and Emed ( ) can still be characterized recursively. The idea is that, even if not all action pro…les are supportable, our approach still applies if a condition analogous to W 6= ; holds with respect to the subset of action pro…les that are supportable. Recall that E( ) is the equilibrium payo¤ set with the universal monitoring structure where a mediator observes the recommendation pro…le rt and action pro…le at in each period t, while each player i only observes her own recommendation ri;t and her own action ai;t . Denote this monitoring structure by p . We provide a su¢ cient condition— which is more permissive than W 6= ;— under which we show that the Pareto frontier of Emed ( ) dominates the Pareto frontier of E( ) and characterize Emed ( ). Let supp( ) be the set of supportable actions with monitoring structure p : 8 > > with monitoring structure p , > < supp( ) = a 2 A : there exist an equilibrium strategy > > > : and history htm with a 2 supp( (htm )) Note that in this de…nition htm can be an o¤-path history. 
On the other hand, given a product set of action pro…les A = Q the set of actions ai 2 Ai such that there exists a correlated action (1 ) ui (ai ; i) + max ui (a) (1 a2A (That is, ai 2 Si A if there exists i 2 ) max ui (^ ai ; a ^i 2Ai i) + ^ i2I 9 > > > = > > > ; : Ai A, let Si A be 2 (A i ) such that i min max ui (ai ; ^ i ): (A i ) ai 2Ai (11) i2 (A i ) such that, if her opponents play i, player i’s reward for playing ai is the best payo¤ possible among those with support in A, and player 31 i’s punishment for deviating from ai is the worst possible among those with support in A i , Q then player i plays ai .) Let S A = i2I Si (Ai ) A. Let A1 = A, let An = S (An 1 ) for n > 1, and let A1 = limn!1 An . Note that the problem of computing A1 is tractable, as the set S A is de…ned by a …nite number of linear inequalities. Finally, in analogy with the de…nition of wi from Section 5, let min max ui (ai ; 1 a 2Ai i 2 (A i ) i i) + 1 max r2A1 ;ai 2Ai fui (ai ; r i ) ui (r)g : be the lowest continuation payo¤ such that player i does not want to deviate to any ai 2 Ai at any recommendation pro…le r 2 A1 , when she is minmaxed forever if she deviates, subject to the constraint that punishments are drawn from A1 . In analogy with the de…nition of Wi from Section 5, let Wi = ( jIj w2R : wi min max ui (ai ; (A1i ) ai 2Ai i) + 1 i2 max r2A1 ;ai 2Ai fui (ai ; r i ) ) ui (r)g : We show the following. Proposition 2 Assume that jIj = 2. If int \ i2I Wi \ u(A1 ) 6= ; (12) in the topology induced from u(A1 ), then for every private monitoring structure p and every non-negative Pareto weight 2 +, we have max v2E( ;p) v max v: v2Emed ( ) In addition, supp( ) = A1 . Proof. Given that supp( ) = A1 , the proof is analogous to the proof of Theorem 1, everywhere replacing u(A) with u(A1 ) and replacing “full support”with “full support within 32 A1 .” The proof that supp( ) = A1 whenever int appendix. T i2I Wi \ u(A1 ) 6= ; is deferred to the Note that Proposition 2 only improves on Theorem 1 at low discount factors: for a high enough discount factor, A = S (A) = A1 , so (12) reduces to W 6= ;. However, as W can be empty only for low discount factors, this is precisely the case where an improvement is needed.16 In order to be able to use Proposition 2 to give a recursive upper bound on E ( ; p) when W 6= ;, we must characterize Emed ( ) under (12). Our earlier characterization generalizes easily. In particular, the following de…nitions are analogous to De…nitions 1 and 2. RjIj , a correlated action pro…le De…nition 3 For any set V enforceable on V by a mapping ai 2 supp E [(1 2 (supp( )) is supp( ) : supp( ) ! V such that, for each player i, and action i, ) ui (ai ; a i ) + (ai ; a i )] max E [(1 a0i 2Ai ) ui (a0i ; a i )]+ i2 min max ui (^ ai ; ^i 2Ai (supp( )) a De…nition 4 A payo¤ vector v 2 RjIj is supp( ) decomposable on V if there exists a correlated action pro…le 2 (supp( )) which is supp( ) enforced on V by some mapping such that v = E [(1 ) u (a) + (a)] : ~ supp( ) (V ) = v 2 RjIj : v is supp( ) decomposable on V . Let B Let W supp( W supp( );1 );1 = u (supp( )), let W supp( = limn!1 W supp( Proposition 3 If int T i2I );n );n ~ supp( = B ) W supp( );n 1 for n > 1, and let . We have the following. 1 ;1 Wi \ u(A1 6= ;, then Emed ( ) = W A . Proof. Given that supp( ) = A1 by Proposition 2, the proof is analogous to the proof of Theorem 2. 16 To be clear, it is possible for W to be empty while A1 = A. 
As an example of how Propositions 2 and 3 can be applied, one can check that applying the operator S in the Bertrand example in Section 7 for any δ between 1/5 and 4/18 yields A^∞ = {W, L, M} × {W, L, M}, ruling out the efficient action profile (H, H), and int(∩_{i∈I} W_i) ∩ u(A^∞) ≠ ∅. We can then compute E_med(δ) by applying the operator B̃^{A^∞}.

Finally, we mention that E_med(δ) can be characterized recursively even if int(∩_{i∈I} W_i) ∩ u(A^∞) = ∅. This fact is not directly relevant for the current paper, so we omit the proof. It is available from the authors upon request.

9.2 Tightness of the Bound

There are at least two senses in which E_med(δ) is a tight bound on the equilibrium payoff set with private monitoring.

First, thus far our model of repeated games with private monitoring has maintained the standard assumption that the distribution of period-t signals depends only on period-t actions: that is, that this distribution can be written as p(· | a_t). In many settings, it would be desirable to relax this assumption and let the distribution of period-t signals depend on the entire history of actions and signals up to period t, leading to a conditional distribution of the form p_t(· | a^t, z^t), as well as letting players receive signals before the first round of play. (Recall that a^t = (a_τ)_{τ=1}^{t−1} and z^t = (z_τ)_{τ=1}^{t−1}.) For example, colluding firms do not only observe their sales in every period, but also occasionally get more information about their competitors' past behavior from trade associations, auditors, tax data, and the like.17 In the space of such non-stationary private monitoring structures, the bound E_med(δ) is clearly tight: E_med(δ) is an upper bound on E(δ, p) for any non-stationary private monitoring structure p, because the equilibrium payoff set with the universal monitoring structure p*, E(δ), remains an upper bound on E(δ, p); and the bound is tight because mediated perfect monitoring is itself a particular non-stationary private monitoring structure.

17 Rahman (2014, p. 1) quotes from the European Commission decision on the amino acid cartel: a typical cartel member "reported its citric acid sales every month to a trade association, and every year, Swiss accountants audited those figures."

Second, maintaining the assumption that the monitoring structure is stationary, the bound E_med(δ) is tight if the players can communicate through cheap talk, as they can then "replicate" the mediator among themselves. For this result, we also need to slightly generalize our definition of a private monitoring structure by letting the players receive signals before the first round of play. This seems innocuous, especially if we take the perspective of an outside observer who does not know the game's start date. Equivalently, the players have access to a mediator at the beginning of the game only. The monitoring structure is required to be stationary thereafter. We call such a monitoring structure a private monitoring structure with ex ante correlation.

Proposition 4 Let E_talk(δ, p) be the sequential equilibrium payoff set in the repeated game with private monitoring structure with ex ante correlation p and with finitely many rounds of public cheap talk before each round of play. If |I| = 2 and δ > δ̄, then there exists a private monitoring structure with ex ante correlation p such that E_talk(δ, p) = E_med(δ).

The proof is long and is deferred to the online appendix.
The main idea is as in the literature on implementing correlated equilibria without a mediator (see Forges (2009) for a survey). More specifically, Proposition 4 is similar to Theorem 9 of Heller, Solan, and Tomala (2012), which shows that communication equilibria in repeated games with perfect monitoring can always be implemented by ex ante correlation and cheap talk. As the private monitoring structure used in the proof of Proposition 4 is in fact perfect monitoring, the main difference between the results is that theirs is for Nash rather than sequential equilibrium, so they are concerned only with detecting deviations rather than providing incentives to punish deviations once detected. In our model, when δ > δ̄, incentives to minmax the deviator can be provided (as in Lemma 1) if her opponent does not realize that the punishment phase has begun. The additional challenge in the proof of Proposition 4 is thus that we sometimes need a player to switch to the punishment phase for her opponent without realizing that this switch has occurred.

If one insists on stationary monitoring and does not allow communication, then we believe that there are some games in which our bound is not tight, in that there are points in E_med(δ) which are not attainable in equilibrium for any stationary private monitoring structure. We leave this as a conjecture.18

18 Strictly speaking, since our maintained definition of a private monitoring structure does not allow ex ante correlation, if δ = 0 then there are points in E_med(δ) which are not attainable with any private monitoring structure whenever the stage game's correlated equilibrium payoff set strictly contains its Nash equilibrium payoff set. The non-trivial conjecture is that the bound is still not tight when ex ante correlation is allowed but communication is not, or when discount factors are moderately high.

9.3 What if There are More Than Two Players?

The condition that W ≠ ∅ no longer guarantees that mediated perfect monitoring outperforms private monitoring when there are more than two players. We record this as a proposition.

Proposition 5 There are games with |I| > 2 where W ≠ ∅ but max_{v∈E(δ,p)} λ · v > max_{v∈E_med(δ)} λ · v for some private monitoring structure p and some non-negative Pareto weight λ.

Proof. We give an example in the appendix.

To see where the proof of Theorem 1 breaks down when |I| > 2, recall that the proof is based on the fact that, for any Pareto-efficient payoff v, if v ∉ W_i for one player i, then it must be the case that v ∈ W_j for the other player j. This implies that incentive compatibility is a problem for only one player at a time, which lets us construct an equilibrium with perfect monitoring by basing continuation play only on that player's past recommendations (which she necessarily knows in any private monitoring structure). On the other hand, if there are more than two players, several players' incentive compatibility constraints might bind at once when we publicize past recommendations. The proof of Theorem 1 then cannot get off the ground.

We can however say some things about what happens with more than two players. First, the argument in the proof of Proposition 2 that supp(δ) = A^∞ whenever int(∩_{i∈I} W_i) ∩ u(A^∞) ≠ ∅ does not rely on |I| = 2. Thus, when int(∩_{i∈I} W_i) ∩ u(A^∞) ≠ ∅, we can characterize the set of supportable actions for any number of players. This is sometimes already enough to imply a non-trivial upper bound on payoffs.
Second, Proposition 6 below shows that if a payoff vector v ∈ int(u(A)) satisfies v_i > u_i + ((1 − δ)/δ) d_i for all i ∈ I, then v ∈ E_med(δ). This shows that private monitoring cannot do "much" better than mediated perfect monitoring when the players are at least moderately patient (e.g., it cannot do more than order 1 − δ better).

Finally, suppose there is a player i whose opponents −i all have identical payoff functions: for all j, j′ ∈ I \ {i}, u_j(a) = u_{j′}(a) for all a ∈ A. Then the proof of Theorem 1 can be adapted to show that private monitoring cannot outperform mediated perfect monitoring in a direction λ where the extremal payoff vector v lies in ∩_{j∈−i} W_j (but not necessarily in W_i). For example, if the game involves one firm and many identical consumers, then the consumers' best equilibrium payoff under mediated perfect monitoring is at least as good as under private monitoring. We can also show that the same result holds if the preferences of players −i are sufficiently close to each other.

9.4 The Folk Theorem with Mediated Perfect Monitoring

The point of this paper is bounding the equilibrium payoff set with private monitoring at a fixed discount factor. Nonetheless, it is worth pointing out that, as a corollary of our results, a strong version of the folk theorem holds with mediated perfect monitoring, without any conditions on the feasible payoff set (e.g., full-dimensionality, non-equivalent utilities), and with a rate of convergence of 1 − δ. One reason why this result may be of interest is that it shows that, if players are fairly patient, they cannot do "much" better with private monitoring than with mediated perfect monitoring, even if the sufficient conditions for Theorem 1 fail. Another reason is that the mediator can usually be replaced by unmediated communication among the players (we spell out exactly when below), and it seems worth knowing that the folk theorem holds with unmediated communication without conditions on the feasible payoff set.

Recall that u_i is player i's correlated minmax payoff and d_i is player i's greatest possible deviation gain. Let u = (u_i)_{i∈I} and d = (d_i)_{i∈I}.

Proposition 6 If a payoff vector v ∈ int(u(A)) satisfies

v > u + ((1 − δ)/δ) d,    (13)

then v ∈ E_med(δ).

Proof. Fix α ∈ Δ(A) such that v = u(α). If v satisfies (13), then v ∈ W (so in particular W ≠ ∅), and the infinite repetition of α satisfies (2). Lemma 1 then gives v ∈ E_med(δ).

The folk theorem is a corollary of Proposition 6.19

Corollary 1 (Folk Theorem) For every strictly individually rational payoff vector v ∈ u(A), v ∈ E_med(δ) for all sufficiently high δ < 1.

Proof. It is immediate that v ∈ E_med(δ) for high enough δ: take any δ above max_{i∈I} d_i/(d_i + v_i − u_i) and apply Proposition 6. The proof that taking the closure is unnecessary is in the appendix.

19 The restriction in Corollary 1 to strictly individually rational payoff vectors cannot be dropped. In particular, the counterexample to the folk theorem due to Forges, Mertens, and Neyman (1986), in which no payoff vector is strictly individually rational, remains valid in the presence of a mediator.

In earlier work (Sugaya and Wolitzky, 2014), we have shown that the mediator can usually be dispensed with in Proposition 6 and Corollary 1 if players can communicate through cheap talk. In particular, this is possible unless there are exactly three players, of whom exactly two have equivalent utilities in the sense of Abreu, Dutta, and Smith (1994).
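As a small illustration of the patience threshold appearing in the proof of Corollary 1, the following snippet (ours; the payoff numbers are made up and are not an example from the paper) computes max_i d_i/(d_i + v_i − u_i) and checks condition (13) at a given discount factor.

def delta_bar(d, u, v):
    # max_i d_i / (d_i + v_i - u_i): the cutoff above which (13) holds for v.
    return max(di / (di + vi - ui) for di, ui, vi in zip(d, u, v))

def satisfies_13(delta, d, u, v):
    # Does v > u + ((1 - delta)/delta) * d hold coordinate by coordinate?
    return all(vi > ui + (1 - delta) / delta * di for di, ui, vi in zip(d, u, v))

u = [0.0, 0.0]   # correlated minmax payoffs (hypothetical numbers)
d = [2.0, 1.0]   # greatest deviation gains (hypothetical numbers)
v = [3.0, 3.0]   # target strictly individually rational payoff (hypothetical numbers)
print(delta_bar(d, u, v))            # 0.4
print(satisfies_13(0.5, d, u, v))    # True
print(satisfies_13(0.3, d, u, v))    # False

For these made-up data the cutoff is 0.4, so (13) holds at δ = 0.5 but fails at δ = 0.3.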
10 Conclusion

This paper gives a simple sufficient condition (δ > δ̄) under which the equilibrium payoff set in a two-player repeated game with perfect monitoring and a mediator is a tight, recursive upper bound on the equilibrium payoff set in the same game with any imperfect private monitoring structure. There are at least three perspectives from which this result may be of interest. First, it shows that simple, recursive methods can be used to upper-bound the equilibrium payoff set in a repeated game with imperfect private monitoring at a fixed discount factor, even though the problem of recursively characterizing this set seems intractable. Second, it characterizes the set of payoffs that can arise in a repeated game for some monitoring structure. Third, it shows that information is valuable in mediated repeated games, in that players cannot benefit from imperfections in the monitoring technology when δ > δ̄.

These different perspectives on our results suggest different questions for future research. Do moderately patient players always benefit from any improvement in the monitoring technology, or only from going all the way to perfect monitoring? Is it possible to characterize the set of payoffs that can arise for some monitoring structure even if δ < δ̄? If we do know the monitoring structure under which the game is being played, is there a general way of using this information to tighten our upper bound? Answering these questions may improve our understanding of repeated games with private monitoring at fixed discount factors, even if a full characterization of the equilibrium payoff set in such games remains out of reach.

11 Appendix

11.1 Proof of Proposition 1: Mediated Perfect Monitoring

As the players' stage game payoffs from any profile other than (U, L) sum to at most 3, it follows that the players' per-period payoffs may sum to more than 3 only if (U, L) is played in some period t with positive probability. For this to occur in equilibrium, player 1's expected discounted continuation payoff from playing U must exceed her expected discounted continuation payoff from playing D by more than (1 − δ) · 1 = 5/6, her instantaneous gain from playing D rather than U. In addition, player 1 can guarantee herself a continuation payoff of 0 by always playing D, so her expected continuation payoff from playing U must exceed (1/δ)(5/6) = 5. This is possible only if the probability that (T, M) is played in period t + 1 when U is played in period t exceeds the number p such that

(1 − 1/6)[p(6) + (1 − p)(3)] + (1/6)(6) = 5,

or p = 3/5. In particular, there must exist a period-(t + 1) history h_2^{t+1} of player 2's such that (T, M) is played with probability at least 3/5 in period t + 1 conditional on reaching h_2^{t+1}. At such a history, player 2's payoff from playing M is at most

(1 − 1/6)[(3/5)(−3) + (2/5)(3)] + (1/6)(3) = 0.

On the other hand, noting that player 2 can guarantee himself a continuation payoff of 0 by playing (1/2)L + (1/2)M, player 2's payoff from playing L at this history is at least

(1 − 1/6)[(3/5)(3) + (2/5)(−3)] + (1/6)(0) = 1/2.

Therefore, player 2 has a profitable deviation, so no such equilibrium can exist.
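The cutoffs in this argument can be verified with a few lines of arithmetic. The check below is ours, taking δ = 1/6, the value consistent with the 1/6 and 5/6 weights in the displays above.

from fractions import Fraction as F

delta = F(1, 6)
# Reward equation for player 1: (1 - delta)*[p*6 + (1 - p)*3] + delta*6 = 5, solved for p.
p = (5 - delta * 6 - (1 - delta) * 3) / ((1 - delta) * 3)
# Player 2's payoff from M, and from L, at a history where (T, M) has probability 3/5.
payoff_M = (1 - delta) * (F(3, 5) * (-3) + F(2, 5) * 3) + delta * 3
payoff_L = (1 - delta) * (F(3, 5) * 3 + F(2, 5) * (-3)) + delta * 0
print(p, payoff_M, payoff_L)   # 3/5 0 1/2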
11.2 Proof of Proposition 1: Private Monitoring

Consider the following imperfect private monitoring structure. Player 2's action is perfectly observed. Player 1's action is perfectly observed when it equals T or B. When player 1 plays U or D, player 2 observes one of two possible private signals, m and r. Whenever player 1 plays U, player 2 observes signal m with probability 1; whenever player 1 plays D, player 2 observes signals m and r with probability 1/2 each.

We now describe a strategy profile under which the players' payoffs sum to 23/7 ≈ 3.29.

Player 1's strategy: In each odd period t = 2n + 1 with n = 0, 1, ..., player 1 plays (1/3)U + (2/3)D. Let a_1(n) denote the realization of this mixture. In the even period t = 2n + 2, if the previous action a_1(n) equals U, then player 1 plays T; if the previous action a_1(n) equals D, then player 1 plays B.

Player 2's strategy: In each odd period t = 2n + 1 with n = 0, 1, ..., player 2 plays L. Let y_2(n) denote the realization of player 2's private signal. In the even period t = 2n + 2, if the previous private signal y_2(n) equals m, then player 2 plays M; if the previous signal y_2(n) equals r, then player 2 plays R.

We check that this strategy profile, together with any consistent belief system, is a sequential equilibrium. In an odd period, player 1's payoff from U is the solution to

v = (5/6)(2) + (1/6)(5/6)(6) + (1/6)^2 v.

On the other hand, her payoff from D is

(5/6)(3) + (1/6)(5/6)(0) + (1/6)^2 v.

Hence, player 1 is indifferent between U and D (and clearly prefers either of these to T or B). In addition, playing L is a myopic best response for player 2, player 1's continuation play is independent of player 2's action, and the distribution of player 2's signal is also independent of player 2's action. Hence, playing L is optimal for player 2.

Next, in an even period, it suffices to check that both players always play myopic best responses, as in even periods continuation play is independent of realized actions and signals. If player 1's last action was a_1(n) = U, then she believes that player 2's signal is y_2(n) = m with probability 1 and thus that he will play M. Hence, playing T is optimal. If instead player 1's last action was a_1(n) = D, then she believes that player 2's signal is equal to m and r with probability 1/2 each, and thus that he will play (1/2)M + (1/2)R. Hence, both T and B are optimal. On the other hand, if player 2 observes signal y_2(n) = m, then his posterior belief that player 1's last action was a_1(n) = U is

(1/3)(1) / [(1/3)(1) + (2/3)(1/2)] = 1/2.

Hence, player 2 is indifferent among all of his actions. If player 2 observes y_2(n) = r, then his posterior is that a_1(n) = D with probability 1, so that M and R are optimal.

Finally, expected payoffs under this strategy profile in odd periods sum to (1/3)(4) + (2/3)(3) = 10/3, and in even periods sum to 3. Therefore, per-period expected payoffs sum to

(1 − 1/6) [10/3 + (1/6)(3)] [1 + (1/6)^2 + (1/6)^4 + ...] = 23/7.

Three remarks on the proof: First, the various indifferences in the above argument result only because we have chosen payoffs to make the example as simple as possible. One can modify the example to make all incentives strict.20 Second, players' payoffs are measurable with respect to their own actions and signals. In particular, the required realized payoffs for player 2 are as follows:

(Action, Signal) pair:  (L, m)  (L, r)  (M, m)  (M, r)  (R, m)  (R, r)
Realized payoff:           2       2       0       0       0       0

Third, a similar argument shows that imperfect public monitoring with private strategies can also outperform mediated perfect monitoring.21

20 The only non-trivial step in doing so is giving player 1 a strict incentive to mix in odd periods. This can be achieved by introducing correlation between the players' actions in odd periods.

21 Here is a sketch: Modify the current example by adding a strategy L′ for player 2, which is an exact duplicate of L as far as payoffs are concerned, but which switches the interpretation of signals m and r. Assume that player 1 cannot distinguish between L and L′, and modify the equilibrium by having player 2 play (1/2)L + (1/2)L′ in odd periods. Then, even if the signals m and r are publicly observed, their interpretations will be private to player 2, and essentially the same argument as with private monitoring applies.
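As with the mediated-monitoring argument, the arithmetic in this example is easy to verify. The snippet below is our own check, again taking δ = 1/6: player 1's values from U and from D in odd periods coincide, and per-period payoffs sum to 23/7.

from fractions import Fraction as F

d = F(1, 6)
# Odd-period value v solves v = (1 - d)*stage + d*(1 - d)*even_period_payoff + d**2 * v.
value_U = ((1 - d) * 2 + d * (1 - d) * 6) / (1 - d**2)
value_D = ((1 - d) * 3 + d * (1 - d) * 0) / (1 - d**2)
# Per-period sum of the two players' payoffs.
total = (1 - d) * (F(10, 3) + d * 3) / (1 - d**2)
print(value_U, value_D, total)   # 18/7 18/7 23/7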
11.3 Proof of Lemma 2

Fix such a strategy and any strict full-support equilibrium σ^strict. We construct a strict equilibrium that attains a payoff close to v. In period 1, the mediator draws one of two states, R_v and R_perturb, with probabilities 1 − ε and ε, respectively.

In state R_v, the mediator's recommendation is determined as follows: If no player has deviated up to period t, the mediator recommends r_t according to the given strategy at the history h_m^t. If only player i has deviated, the mediator recommends r_{−i,t} to players −i according to α_{−i}, and recommends some best response to α_{−i} to player i. Multiple deviations are treated as in the proof of Lemma 1. On the other hand, in state R_perturb, the mediator follows the equilibrium σ^strict.

Player i follows the recommendation r_{i,t} in period t. Since the constructed recommendation schedule has full support, player i never believes that another player has deviated. Moreover, since σ^strict has full support, player i believes that the mediator's state is R_perturb with positive probability after any history. Therefore, by (2) and the fact that σ^strict is a strict equilibrium, it is always strictly optimal for each player i to follow her recommendation on path. Taking ε → 0 yields a sequence of strict equilibria with payoffs converging to v.

11.4 Proof of Lemma 3

Standard arguments imply that, for every v ∈ B̃(V), there exists an on-path recommendation strategy that yields payoff v and satisfies (2). Under the assumption that δ > δ̄, the conclusion of the lemma follows from Lemma 2.

11.5 Proof of Lemma 4

By Lemma 3 and boundedness, we need only show that E_med(δ) ⊆ B̃(E_med(δ)). Let E_med^weak(δ) be the set of (possibly weak) sequential equilibrium payoffs with mediated perfect monitoring. Note that, in any sequential equilibrium, player i's continuation payoff at any history h_i^t must be at least u_i. Therefore, if an on-path recommendation strategy is part of a (possibly weak) sequential equilibrium, then it must satisfy (2). Hence, under the assumption that W ≠ ∅, we have E_med^weak(δ) ⊆ E_med(δ).

Now, for any v ∈ E_med(δ), fix a corresponding equilibrium strategy for the mediator. In the corresponding equilibrium, if some player i deviates in period 1 while her opponents are obedient, player i's continuation payoff must be at least u_i. Hence, we have

E[(1 − δ) u_i(a_i, a_{−i}) + δ w_i(a_i, a_{−i}) | a_i] ≥ max_{a′_i∈A_i} E[(1 − δ) u_i(a′_i, a_{−i}) | a_i] + δ u_i,

where w_i(a_i, a_{−i}) is player i's equilibrium continuation payoff when action profile (a_i, a_{−i}) is recommended and obeyed in period 1. Finally, since action profile (a_i, a_{−i}) is in the support of the mediator's recommendation in period 1, each player assigns probability 1 to the true mediator's history when (a_i, a_{−i}) is recommended and played in period 1. Therefore, continuation play from this history is itself at least a weak sequential equilibrium. In particular, we have w_i(a_i, a_{−i}) ∈ E_med^weak(δ) ⊆ E_med(δ) for all (a_i, a_{−i}) in the support of the mediator's period-1 recommendation. Hence, v is minmax-threat decomposable on E_med(δ) by this period-1 recommendation and the continuation payoff
function w, so in particular v 2 B ~ preserves ~ Emed ( ) . As Emed ( ) is compact and B B We have shown that Emed ( ) ~ Emed ( ) = B ~ Emed ( ) . B compactness, taking closures yields Emed ( ) 11.6 Proof of Lemma 5 By (6), kw (w) F (w) rt w (w)k ~ F (w) ~ rt kw wk. ~ By (7), = 1fw (rt )2W 1g kw (w) p(w)(rt )) w1 g fp(w)(r ~ t )w (w)(r ~ t ) + (1 w (w)k ~ : Combining these inequalities yields kF (w) 11.7 fp(w)(rt )w (w)(rt ) + (1 F (w)k ~ kw p(w)(r ~ t )) w1 g wk. ~ Proof of Lemma 6 It is useful to introduce a family of auxiliary value functions wT 1 T =1 will converge to w and w pointwise in rt as T ! 1. For periods t wT (rt ) = w (rt ) and w On the other hand, for periods t T ;T (rt 1 ) = w (rt 1 ): 1, de…ne w ;T and w ;T 1 , T =1 which T , de…ne (14) (rt ), pT (rt ), and wT (rt ) given wT (rt+1 ) recursively, as follows. First, de…ne w ;T (rt ) = (1 )u (rt ) + E wT (rt+1 )jrt : 44 (15) Note that, for t = T 1, this de…nition is compatible with (14). Second, given w ;T (rt ), de…ne wT (rt ) = 1fw pT (rt )w (rt )2W1 g ;T (rt ) + 1 pT (rt ) w1 + 1fw (rt )2W = 1gw 1 (16) ; where, when w (rt ) 2 W1 , pT (rt ) is the largest number in [0; 1] such that pT (rt )w2;T (rt ) + 1 We show that w ;T w2 (rt ): (rt ) = w (rt ) for all rt 2 At . Proof. By Lemma 5, it su¢ ces to show that F (wT ) = wT +1 . For t that w ;T +1 (17) converges to w . ;T Lemma 7 limT !1 w pT (rt ) w21 T + 1, (14) implies (rt 1 ) = w (rt 1 ). On the other hand, given wT , w (w2T ) is the value calculated according to (6). Since wT (rt ) = w (rt ) by (14), we have w (w2T )(rt 1 ) = w (rt 1 ) by (6). Hence, w (w2T )(rt 1 ) = w For t ;T +1 (rt 1 ). (18) T , by (15), we have w ;T +1 (rt ) = (1 )u (rt ) + E wT +1 (rt+1 )jrt : By (16), wT +1 (rt ) = 1fw (rt )2W1 g = 1fw (rt )2W1 g pT +1 (rt )w ;T +1 (rt ) + 1 pT +1 (rt )w (wT )(rt ) + 1 pT +1 (rt ) w1 + 1fw pT +1 (rt ) w1 + 1fw (rt )2W = 1gw 1 (rt )2W = 1gw 1 ; where the second equality follows from (18) for t = T , and follows by induction for t < T . Recall that pT +1 is de…ned by (17). On the other hand, p(wT )(rt ) is de…ned in (8) using 45 w = w (wT ). Since w wT +1 (rt ) = 1fw ;T +1 (rt )2W1 g (rt ) = w (w2T )(rt ), we have pT +1 (rt ) = p(wT )(rt ). Hence, p(wT )(rt )w (wT )(rt ) + 1 p(wT )(rt ) w1 + 1fw (rt )2W = 1gw 1 : = F (wT )(rt ); as desired. As w ;T (rt ), wT (rt ), and pT (rt ) converge to w (rt ), w(rt ), and p(rt ) by Lemma 7, the following lemma implies Lemma 6: Lemma 8 For all t = 1; :::; T 1, if w (rt ) 2 W1 , then pT (rt )w ;T (rt ) + 1 pT (rt ) w1 Pareto dominates w (rt ). Proof. For t = T 1, the claim is immediate since w Suppose that the claim holds for each period period t. By construction, pT (rt )w2;T (rt ) + 1 show that pT (rt )w1;T (rt ) + 1 pT (rt ) w11 ;T (rt ) = w (rt ) and so pT (rt ) = 1. t + 1. We show that it also hold for pT (rt ) w21 w2 (rt ). Thus, it su¢ ces to w1 (rt ). 
Note that w ;T = (1 = (1 (rt ) )u )u (rt ) + E wT (rt+1 )jrt 8 t+1 t < P t+1 jr ) pT (rt+1 )w ;T (rt+1 ) + 1 r :w (rt+1 )2W1 Pr (r t (r ) + P : + Pr (rt+1 jrt ) w1 t+1 t+1 r while w (rt ) = (1 (rt ) + )u X :w (r Pr rt+1 T p (r t+1 ) w = r : :w (r )2W1 t+1 t T t+1 ;T )2W = 1 rt+1 jrt w (rt ): t+1 T t+1 Pr (r jr ) p (r )w (r ) + 1 p (r ) w P t+1 t + rt+1 :w (rt+1 )2W jr ) fw1 w (rt+1 )g = 1 Pr (r 46 1 w (r 9 = ; Hence, w ;T (rt ) w (rt ) 8 < P t+1 t+1 1 t+1 ) 9 = ; : ; When w (rt+1 ) 2 W1 , the inductive hypothesis implies that pT (rt+1 )w ;T (rt+1 ) + 1 pT (rt+1 ) w1 w (rt+1 ) 0: On the other hand, note that X rt+1 jrt Pr rt+1 :w (rt+1 )2W = 1 for some number l(rt ) w1 w (rt+1 ) = l(rt )(w1 w(r ~ t )) 0 and vector w(r ~ t) 2 = W1 . In total, we have w ;T (rt ) = w (rt ) + l(rt )(w1 w(r ^ t )) (19) w^1 (rt ), if w11 0 and vector w(r ^ t) w(r ~ t) 2 = W1 . Since w11 n o w1 (rt ) then (19) implies that min w1;T (rt ); w11 w1 (rt ), and therefore pT (rt )w1;T (rt ) + for some number l(rt ) pT (rt ) w11 1 w1 (rt ). In addition, if w (rt+1 ) 2 W1 with probability one, then the inductive hypothesis implies that w2;T (rt ) pT (rt )w1;T (rt ) + 1 w2 (rt ), and therefore pT (rt ) = 1 and pT (rt ) w11 = w1;T (rt ) = w1 (rt ) + l(rt )(w11 w^1 (rt )) w1 (rt ): Hence, it remains only to consider the case where w11 < w1 (rt ) and l(rt ) > 0. In this case, take a normal vector have 1 1 0 and 1 2 1 of the supporting hyperplane of u (A) at w1 . We > 0, and in addition (as w^ (rt ) 1 1 w1 w1 w~ (rt ) 2 u (A) and w (rt ) 2 u (A)) w(r ^ t) 0; w (rt ) 0: 47 As w11 w^1 (rt ) > 0 and w11 w1 (rt ) < 0, we have w^2 (rt ) w21 w11 w^1 (rt ) 1 1 1 2 w21 w2 (rt ) : w1 (rt ) w11 Next, by (19), the slope of the line from w (rt ) to w ;T (rt ) equals the slope of the line from w(r ^ t ) to w1 . Hence, w2 (rt ) w2;T (rt ) w1;T (rt ) w1 (rt ) w21 w2 (rt ) : w1 (rt ) w11 In this inequality, the denominator of the left-hand side and the numerator of the right-hand side are both positive: w1;T (rt ) > w1 (rt ) by (19) and l(rt ) > 0, while w21 > w2 (rt ) because w (rt ) 2 W1 and w (rt ) 6= w21 . Therefore, the inequality is equivalent to w2 (rt ) w2;T (rt ) w21 w2 (rt ) w1;T (rt ) w1 (rt ) : w1 (rt ) w11 Now, let q 2 [0; 1] be the number such that qw1;T (rt ) + (1 Note that q) w11 = w1 (rt ): w2 (rt ) w2;T (rt ) p (r ) = ; w21 w2 (rt ) T 1 while 1 t w1;T (rt ) w1 (rt ) : q= w1 (rt ) w11 Hence, pT (rt ) > q. Finally, we have seen that w11 pT (rt )w1;T rt + 1 w1 (rt ) qw1;T rt + (1 pT (rt ) w11 48 w1;T (rt ), so we have q) w11 = w1 (rt ): 11.8 Proof of Proposition 2 We wish to show that supp( ) = A1 whenever int consists of two steps: we show that supp( ) T i2I Wi \ u (A1 ) 6= ;. The proof A1 in Section 11.8.1, and show that supp( ) A1 in Section 11.8.2 11.8.1 A1 Proof of supp( ) Recall that A1 is the largest …xed point of the monotone operator S. Hence, to prove that supp( ) A1 , it su¢ ces to show that there exists a set supp"#0 ( ) such that supp( ) supp"#0 ( ) and supp"#0 ( ) = S(supp"#0 ( )). To this end, it will be useful to consider "-sequential equilibria in the repeated game with monitoring structure p . We say that an assessment is an "-sequential equilibrium if it is consistent, and, for each player i and (on- or o¤-path) history hti , player i’s deviation gain at history hti is no more than ". Let supp1" ( ) = 8 < : 0 a2A:@ there exist an "-sequential equilibrium such that a 2 supp ( (;)) 19 = A ; be the on-path support of actions in the initial period in "-sequential equilibrium with monitoring structure p . 
In addition, let suppt" ( ) = 8 < : 0 a2A:@ there exist an "-sequential equilibrium and history such that a 2 supp ( (htm )) be the (possibly o¤-path) support of actions in period t 1. htm 19 = A ; We will establish the following equality in Lemma 10: \ ">0 [ t suppt" ( ) = \ ">0 supp1" ( ): (20) As a preliminary result toward the proof of Lemma 10, we will show in Lemma 9 that T 1 ">0 supp" ( ) is a product set. 49 T supp1" ( ). Since a sequential equilibrium is an "-sequential equiT S t librium for every ", we have supp( ) ">0 t supp" ( ). Together with (20), this implies Let supp"#0 ( ) that supp( ) ">0 supp"#0 ( ). Hence, it remains to show that supp"#0 ( ) = S(supp"#0 ( )). Finally, as supp"#0 ( ) is a product set by Lemma 9 and S(A) A, it su¢ ces to show that supp"#0 ( ) S(supp"#0 ( )). A for every product set This is established in Lemma 11, completing the proof. We now prove Lemmas 9, 10, and 11. Given a mediator’s strategy , let i denote the marginal distribution of player i’s rec- ommendation, and let supp1i;" ( ) = Lemma 9 T ">0 Proof. Since T 1 ">0 supp" ( ). T 8 < : 0 ai 2 Ai : @ there exists an "-sequential equilibrium such that ai 2 supp ( i (;)) supp1" ( ) is a product set: ">0 supp1" ( ) T ">0 Q T ">0 supp1" ( ) = T ">0 Q 19 = A : ; supp1i;" ( ). i2I supp1i;" ( ), we are left to show i2I T ">0 Q supp1i;" ( ) i2I To this end, …x an arbitrary "-sequential equilibrium strategy . It suf- …ces to show that, for each " > 0, there exists a 2"-sequential equilibrium strategy product Q such that supp product (;) = supp( i (;)). (This is su¢ cient because it implies that i2I T T T Q 1 1 supp1i;" ( ) ">0 supp" ( ).) ">0 supp2" ( ) = ">0 i2I For > 0, de…ne the period 1 recommendation distribution under product (;)(a1 ) = 8 > > > > > > > < (1 Q > > supp( > > i2I > > > : ) (;)(a1 ) product (;) by if a1 2 supp( (;)); Q if a1 2 supp( i (;)) i2I i (;)) jsupp( (;))j but a1 2 = supp( (;)); otherwise. 0 Note that the mediator recommends action a1 with positive probability if ai;1 is in the support of the marginal distribution of player i’s recommendation under . In particular, Q supp product (;) = supp( i (;)). i2I 50 In subsequent periods t 2, if a1 2 supp( (;)) was recommended in period 1, then the Q supp( i (;)) but mediator recommends actions according to product (htm ) = (htm ). If a1 2 i2I a1 2 = supp( (;)) was recommended, then the mediator “resets”the history and recommends actions according to as if the game started from period 2: formally, t product (hm ) ~ t 1) = (h m ~ t 1 = ht . with (r1 ; a1 ); h m m As ! 0, player i’s belief about a i;1 given ai;1 2 supp( i (;)) under her belief given ai;1 in the original equilibrium . Since product product converges to is an "-sequential equilibrium, this implies that player i’s deviation gain in period 1 converges to at most " as ! 0. In addition, if a1 2 supp( (;)), then continuation play from period 2 on is the same under Q supp( i (;)) but a1 2 = supp( (;)), then continuation product and . Furthermore, if a1 2 i2I play from period 2 on under product for any " > 0, there exists > 0 such that player i’s deviation gain at any history under product is at most 2", so Lemma 10 T ">0 S t product suppt" ( ) = is the same as play from period 1 on under . In sum, is an 2"-sequential equilibrium strategy. T ">0 supp1" ( ). T S T T S t 1 Proof. Since ">0 t suppt" ( ) ">0 t supp" ( ) ">0 supp" ( ) by de…nition, we are left to show T 1 together with a consistent belief ">0 supp" ( ). Fix an "-sequential equilibrium strategy system . 
As in Lemma 9, it su¢ ces to show that, for each t and ", there exists a 2"- sequential equilibrium strategy with consistent belief system such that supp ( (;)) = [ supp( (htm )), where the union is taken over all histories htm , including o¤-path histories. htm Fix t and " arbitrarily. Since ( ; ) is an "-sequential equilibrium, there exists such that, for each strategy-belief system pair ; , if ; >0 satis…es (21) (hm ) = (hm ) and (hm jhi ) 2 (1 )2 (hm jhi ); 1 (1 )2 (hm jhi ) for each i, , hi , and hm — that is, if the recommendation schedule is the same as 51 (22) and the posterior is close to an after each history— then is 2"-sequentially rational. Fix such ; > 0. Since is an "-sequential equilibrium, there exists a sequence of completely mixed strat- egy pro…les k 1 k=1 and associated belief systems obedient strategy pro…le at strategy n and k converges to k 1 k=1 . such that k converges to the Consider the following mediator’s . In period 1, the mediator draws a “…ctitious”history htm according to the probability of htm given n n (htm ), . Given htm , the mediator draws r according to (htm ). In period 1, the mediator sends (hti ; ri ) to each player i, where hti is player i’s component in htm . [ Note that, since n is completely mixed, we have supp ( (;)) = supp( (htm )). htm In period 2, let the beginning of period h2; m = a1 ; (r 0 ; a 0 ) 1 0 =2 with 2;2 hm = a1 be the mediator’s history at except for the recommendation pro…le r in period 1. The mediator t recommends actions as if the history h2; m happened after (hm ; r) in the original equilibrium : 2; t that is, she recommends action pro…les according to (htm ; r; h2; m ), where (hm ; r; hm ) denotes t 1 t the concatenation of htm , r, and h2; m . Similarly, let hi = (ri;s ; as )s=1 be player i’s …ctitious history, and let h2; = a1 ; (ri; 0 ; a 0 ) i 1 0 =2 be her history at the beginning of period . 2; t When we view hm = (htm ; r; h2; m ) and hi = hi ; ri ; hi as the history (including the …ctitious one drawn at the beginning of the game), this new strategy Hence, it su¢ ces to show that we can construct a consistent belief system n satis…es (21). which satis…es (22). To this end, denote player i’s belief about the mediator’s history hm when player i’s history in period is hi = hti ; ri ; h2; i by Pr (hm j hi ). We construct the following sequence of perturbed full-support strategies: each player i plays k t i (hi ; ri ) k 1 k=1 in period 1 and then plays 2; k t i (hi ; ri ; hi ; ri; ) in period 2. Recall that is the sequence converging to the original equilibrium . Let Pr n; k (hm j hi ) be player i’s belief about hm conditional on her history hi , given that the …ctitious history htm is drawn from n 52 at the beginning of the game and the players take Pr k n; k in the subsequent periods. We have k Pr (hm j hi ) = k Pr t t (h2; m j hm ; r) Pr (r j hm ) Pr h2; j hti ; ri Pr (ri j hti ) Pr i ~ t ;~ h m r i Pr k h2; i (htm ) n (hti ) k t t (h2; m j hm ; r) Pr (r j hm ) ~ t ; ri ; r~ i Pr ri ; r~ i j h ~t jh Pr = P n m m n Pr n ~ t j ht h m i By consistency of ( ; ), for su¢ ciently large n, we have n (1 htm ) j Pr (htm ) = n Pr (hti ) hti n htm j hti 1 htm j hti 1 for all htm and hti . Moreover, (1 ) ~ t j ht h m i Pr n ~ t j ht = h m i n 1 ~ t j ht h m i 1 ~ t j ht h m i ~ t and ht . 
for each h i m Fix such a su¢ ciently large n, and de…ne (hm j hi ) = lim Pr n; k k!1 Since the belief system (hm j hi ) : is consistent, we have lim Pr k k!1 t 2; t h2; m j hm ; r = (hm j hm ; r) Hence, for each hm and hi , we have n (hm j hi ) = P = That is, ~ t ;~ h m r (1 i t t (h2; (htm j hti ) m j hm ; r) Pr (r j hm ) Pr ~ t ; ri ; r~ i Pr ri ; r~ i j h ~ t Pr n h ~ t j ht h2; jh m m m i i )2 (hm j hi ) ; 1 (1 satis…es (22), as desired. 53 )2 (hm j hi ) : Pr (htm ) : n Pr (hti ) Lemma 11 supp"#0 ( ) S supp"#0 ( ) : Since supp"#0 ( ) = lim"!0 supp1" ( ) by de…nition, Proof. Fix a 2 supp"#0 ( ) arbitrarily. there exists a sequence of "-sequential equilibria f a 2 limn!1 supp( supp( "n "n (f;g)). "n 1 gn=1 with limn!1 "n = 0 such that Taking a subsequence if necessary, we can assume that a 2 (f;g)) for each n. For each n, incentive compatibility of a player i who is recommended ai in period 1 implies the following three conditions: n 1. Let i 2 supp1"n ( ) be the conditional distribution of player i’s opponents’ i actions when ai is recommended to player i. In addition, let win (^ ai jai ) be player i’s expected continuation payo¤ when she is recommended ai and plays a ^i . Then (1 ) ui ai ; n i + win (ai jai ) + "n max (1 a ^i 2Ai ) ui a ^i ; n i + win (^ ai jai ) : 2. Player i’s continuation payo¤ does not exceed her best feasible payo¤ with the support S of actions restricted to t suppt"n ( ): win (ai jai ) ui S max a2 t suppt"n ( ) (a) : 3. For each a ^i 2 Ai , by "-sequential rationality for player i after taking a ^i , player i’s continuation payo¤ after a ^i is at least her minmax payo¤ with the support of punishment S t actions restricted to : t supp"n ( ) i win (^ ai jai ) Since limn!1 min ^ i2 ( S t t supp"n ( )) max ui (~ ai ; ^ i ) i a ~i 2Ai "n : (A i ) and u(A) are compact, taking a subsequence if necessary, there exist n i i = and wi (^ ai jai ) = limn!1 win (^ ai jai ) for each a ^i . Moreover, since A is …nite, there 54 exists N such that for each n 8 > > > < > > > : N , we have supp1"n ( ) = limn!1 supp1"n ( ) = lim"!0 supp1" ( ) = supp"#0 ( ) = S t suppt"n ( ) = limn!1 Hence, for n n i S t suppt"n ( ) = lim"!0 S Y supp"#0;i ( ); i2I t suppt" ( ) = |{z} supp"#0 ( ) = Y supp"#0;i ( ): i2I by Lemma (10) (23) N , we have supp1"n ( ) 2 win (ai jai ) a2 win (^ ai jai ) ui S maxt t supp"n ( ) ^ i2 = i (a) = min Therefore, i S ( max a2supp"#0 ( ) n i i = i supp"#0; i ( ) ; ui (a) ; max ui (~ ai ; ^ i ) t t supp"n ( )) = limn!1 supp"#0 ( ) a ~i 2Ai "n = ^ i2 min (supp"#0; max ui (~ ai ; ^ i ) i( ~i 2Ai )) a and wi (^ ai jai ) = limn!1 win (^ ai jai ) satisfy the following three conditions: 1. If ai is recommended, player i’s opponents play i, player i’s on-path continuation payo¤ is wi (ai jai ), and player i’s continuation payo¤ after deviating to a ^i is wi (^ ai jai ), then player i wants to take ai : (1 ) ui (ai ; i) + wi (ai jai ) max f(1 a ^i 2Ai ) ui (^ ai ; i) + wi (^ ai jai )g ; (24) 2. Player i’s continuation payo¤ does not exceed her best feasible payo¤ with the support of actions restricted to supp"#0 ( ): wi (ai jai ) max a2supp"#0 ( ) ui (a) 3. Player i’s continuation payo¤ following a deviation is at least her minmax payo¤ with 55 "n : the support of punishment actions restricted to supp"#0; i ( ): wi (^ ai jai ) ^ i2 min (supp"#0; max ui (~ ai ; ^ i ) : i( ~i 2Ai )) a Therefore, (24) implies that (11) holds with A = supp"#0 ( ). Hence, ai 2 Si supp"#0 ( ) for all i 2 I, so a 2 S supp"#0 ( ) . 
11.8.2 Proof of supp( ) We show that if int T i2I A1 Wi \ u (A1 ) 6= ; then. A1 to the proof of Lemma 1. T Fix v 2int i2I Wi \ u (A1 ) 6= ;, and let (r) > 0 for all r 2 A1 . Let i ^ 2 supp( ). The argument is similar (A1 ) be such that u ( ) = v and be a solution to the problem min 1 max ui (^ ai ; i2 (A ) ai 2Ai i) : Let " i be the following full-support (within A1 ) approximation of i : " i = (1 ") i + P T " a i 2A1 Aa 1i . Since v 2int i2I Wi , there exists " > 0 such that, for each i 2 I, we i j ij have 1 vi > max ui ai ; " i + fui (ai ; r i ) ui (r)g : (25) max 1 ai 2Ai r2A ;ai 2Ai In addition, choose " > 0 small enough such that, for each player i, some best response to " i is included in A1 i : this is always possible, as every best response to i is included in A1 i . We can now construct an equilibrium strategy with supp ( struction is identical to that in the proof of Lemma 1, except that and player i’s deviations are punished by recommending " i (;)) = A1 . The conis recommended on path, to her opponents. Incentive compatibility follows from (25), just as it follows from (1) in the proof of Lemma 1. 56 11.9 Proof of Proposition 5 Consider the following …ve-player game with common discount factor = 3 . 4 Player 1 has two actions f11 ; 21 g, players 2 and 4 have three actions f12 ; 22 ; 32 g and f14 ; 24 ; 34 g, and player 3 has four actions f13 ; 23 ; 33 ; 43 g. Player 5 is a dummy player with one action, and her utility is always equal to u1 (a). When player 3 plays 13 , the payo¤ matrix for players 1 and 2 is as follows, regardless of player 4’s actions: 12 11 0; 1 21 22 32 20; 7 20; 1 3; 0 20; 0 20; 6 In addition, player 3’s payo¤ and player 4’s payo¤ are 0, regardless of (a1 ; a2 ; a4 ). When player 3 plays 23 , 33 , or 43 , the payo¤ matrix is as follows, regardless of (a1 ; a2 ): 14 23 0; 1; 0; 0 24 34 1; 0; 1; 1 1; 0; 1; 1 33 1; 1; 0; 0 1; 0; 1; 1 1; 0; 1; 1 43 1; 1; 0; 0 1; 0; 1; 1 10; 3; 1; 1 Note that the minmax payo¤ is zero for every player except player 5 (whose minmax payo¤ is 20), while the players’ maximum deviation gains are d1 = 3, d2 = 6, d3 = 1, d4 = 2, and d5 = 0. Letting v = (10; 3; 1; 1; 10) in the formula for bound on is max Thus, > , we see that a lower 3 6 1 2 ; ; ; ; 3 + 10 6 + 3 1 + 1 2 + 1 0 0 10 + 20 2 = : 3 , so W 6= ;. Consider the Pareto weight = (0; 0; 0; 0; 1): that is, we want to maximize player 5’s payo¤, or equivalently minimize player 1’s payo¤. We show that player 5’s best payo¤ is strictly less than zero with mediated perfect monitoring, while her best payo¤ equals zero with some private monitoring structure. 57 11.9.1 Mediated Perfect Monitoring We show that Emed ( ) does not include a payo¤ vector v with v1 = 0. Since Emed ( ) coincides with the set of (possibly weak) sequential equilibrium payo¤s when W 6= ; (see the proof of Lemma 4), it su¢ ces to show player 1’s payo¤ is strictly positive in every sequential equilibrium with mediated perfect monitoring. To do so, we …x a putative sequential equilibrium with v1 = 0, and establish a series of claims leading to a contradiction. Claim 1: At any on-path history where player 1 has always played 11 in the past (where this includes the null history), player 3 must play 13 with probability 1. Proof: Fix an on-path history htm where player 1 has always played 11 . Let denote the ex ante probability of reaching history htm , and let Pr (ai ) be the probability that action ai is played in period t conditional on reaching history htm . 
Note that player 1’s equilibrium payo¤ is bounded from below by (1 ) t fPr(23 ) Pr(24 or 34 j 23 ) + Pr(33 ) + Pr(43 )g ; as if she always plays 11 she gets payo¤ at least 0 in every period and reaches history htm with probability at least . Hence, Pr(33 ) = Pr(43 ) = 0. Given Pr(33 ) = Pr(43 ) = 0, we must also have Pr(23 ) = 0. To see why, suppose otherwise. Then, we must have Pr(24 or 34 j 23 ) = 0, that is, Pr(14 j 23 ) = 1. Hence, player 4’s deviation gain from 14 to 24 is no less than Pr(23 j 14 ) = Pr(14 j 23 ) Pr(23 ) Pr(23 ) = : Pr(14 ) Pr(14 ) Therefore, to ensure that player 4 does not deviate, we must guarantee him a payo¤ 1 Pr(23 ) Pr(14 ) whenever 14 is recommended. But then player 4’s ex ante payo¤ at history htm (before his recommendation is drawn) is at least Pr(14 ) (1 )0 + 1 Pr(23 ) Pr(14 ) 58 + Pr(a4 6= 14 )0 > 0; since player 4 can always guarantee a payo¤ of 0 after any recommendation by playing 14 forever. Hence, player 4’s equilibrium continuation payo¤ is strictly positive. But, by feasibility, this implies that player 1’s equilibrium continuation payo¤ is also strictly positive, which is a contradiction. Therefore, we must have Pr(23 ) = Pr(33 ) = Pr(43 ) = 0, or equivalently Pr (13 ) = 1. Claim 2: At any on-path history where player 1 has always played 11 in the past, player 2 must play 12 with probability 1. Proof: Given that Pr (13 ) = 1, this follows by the same argument as Claim 1: if Pr (22 ) or Pr (32 ) were positive, then player 1 could guarantee a positive payo¤ by always playing 11 . Claim 3: At any on-path history where player 1 has always played 11 in the past, player 1 must play each of 11 and 21 with strictly positive probability. Proof: Given that Pr (13 ) = 1 and Pr (21 ) = 1, player 2’s deviation gain is 6 if player 1 plays a pure action (either 11 or 21 ). Therefore, to ensure that player 2 does not deviate, his continuation payo¤ must be at least 1 1’s continuation payo¤ must be at least (20; 7) has slope 10 ). 3 (6) = 2. But, by feasibility, this implies that player 10 3 (noting that the line segment between (0; 1) and Hence, even if player 1 receives payo¤ 3 in period t, her payo¤ from period t on is at least (1 ) ( 3) + 10 3 = 7 4 > 0. So player 1 could guarantee a positive payo¤ by playing 11 up to period t and then following her equilibrium strategy. Claim 4: At any on-path history where player 1 has always played 11 in the past, player 1’s continuation payo¤ must equal 0 after 11 is recommended and must equal 1 after 21 is recommended. Proof: Player 1’s continuation payo¤ cannot be less than her minmax payo¤ of 0. If her continuation payo¤ after 11 is recommended were strictly positive, then she could guarantee a positive payo¤ by always playing 11 . Hence, her continuation payo¤ after 11 is recommended must equal 0. If her continuation payo¤ after 21 is recommended were greater than 1, then she could guarantee a positive payo¤ by playing 11 up to period t and then following her equilibrium strategy (as (1 ) ( 3) + (1) = 0). 59 If her continuation payo¤ after 21 is recommended were less than 1, she would deviate from 21 to 11 . Claim 5: Let v be the supremum of player 2’s continuation payo¤ over all on-path 4 . 3 histories where player 1’s continuation payo¤ equals 1. Then v Proof: Let v be the supremum of player 2’s continuation payo¤ over all on-path histories where player 1’s continuation payo¤ equals 0. 
Let P be the set of mixing probabilities p such that, at some on-path history htm where player 1 has always played 11 in the past, player 1 plays p11 + (1 p) 21 . As Pr (13 ) = 1 and Pr (21 ) = 1 at such a history, player 2’s deviation gain is at least 1 max f6p; 6 6pg = max f2p; 2 2pg : Therefore, to ensure that player 2 does not deviate, his continuation payo¤ at such a history must be at least max f2p; 2 2pg. Hence, v max f2p; 2 2pg for all p 2 P: On the other hand, by the de…nition of v and v and Claim 4, we have v sup (1 ) p + pv + (1 p) v : p2P Hence, there exists a number p 2 [0; 1] such that (1 ) p + pv + (1 max f2p; 2 p) v 2pg ; or equivalently v 1 (1 p max f2p; 2 p) 2pg (1 (1 )p : p) It is straightforward to show that the right-hand side of this inequality is minimized at p = 12 , which yields v 1 4 = : 3 Finally, note that the payo¤ vector (1; v ) is not in the feasible set for any v 60 4 , 3 as 13 10 is the largest payo¤ of player 2’s that is consistent with player 1 receiving payo¤ 1. We thus obtain a contradiction. 11.9.2 Private Monitoring Suppose that players 1, 2, 3, and 5 observe everyone’s actions perfectly, while player 4 only observes her own action and payo¤ and player 2’s action. We claim that (0; 1; 0; 0; 0) is a sequential equilibrium payo¤ with this private monitoring structure. On the equilibrium path, in period 1, players take (11 ; 12 ; 13 ; 14 ) and (21 ; 12 ; 13 ; 14 ) with probabilities 21 ; 12 . From period 2 on, player 3 takes 23 if player 1 took 11 in period 1, and she takes 33 if player 1 took 21 . Players 1, 2, and 4 take (11 ; 12 ; 14 ) with probability one. O¤ the equilibrium path, all players’ deviation are ignored, except for deviations by player 2 in period 1 and deviations by player 4 in period t period 1 (resp., player 4 deviates in period t period 2. If player 2 deviates in 2) then, in each period 2 (resp., each t + 1), players 1 and 2 play 11 and 12 with probability one, player 3 mixes 23 and 33 with probabilities 21 ; 12 , independently across periods, and player 4 mixes 24 and 34 with probabilities 12 ; 12 , independently across periods. Note that this action pro…le is a static Nash equilibrium. It remains only to check incentive compatibility. Player 3 always plays a static best response. Players 1 and 2 play static best responses from period 2 onward, on and o¤ the equilibrium path. Consider player 4’s incentives. In period 1, player 4’s payo¤ equals 0 regardless of her action. Hence, in period 2 player 4 believes that player 1 played 12 11 + 12 21 in period 1, and thus that Pr (23 ) = Pr (33 ) = 1 2 in period 2. In addition, player 4’s payo¤ remains constant at 0 as long as she plays 14 , so in every period t she continues to believe that Pr (23 ) = Pr (33 ) = 21 . Hence, playing 14 is a static best response, and player 4 is minmaxed forever if she deviates, so following her recommendation is incentive compatible on path. Furthermore, if player 2 deviates in period 1 (or player 4 herself deviates in period t 2), then player 4 sees this and anticipates that player 3 will thenceforth play 21 23 + 12 33 independently across periods, so 12 24 + 12 34 is a best response. 61 We are left to check players 1 and 2’s incentives in period 1. For player 1, recalling that = 43 , we have payo¤ of taking 11 = (1 )0 + 0 |{z} after 11 , player 3 takes 23 with probability one = (1 ) ( 3) + 1 |{z} after 21 , player 3 takes 33 with probability one = payo¤ of taking 21 : Hence, player 1 is willing to mix between 11 and 21 . 
For player 2, payo¤ of taking 12 = (1 ) 1 2 1+ 1 2 0 + whether player 3 takes 23 or 33 , player 2 obtains 1 1 7 + 2 = payo¤ of taking 22 1 1 = (1 ) 1+ 2 2 = payo¤ of taking 32 . = (1 1 |{z} ) 0 6 + 0 Hence, player 2 is willing to play 12 . 11.10 Proof of Corollary 1 Since u (A) is convex and v > u, there exists v 0 2 u (A) and " > 0 such than v Taking 1 = maxi2I Thus, for every Let 2 > = maxi2I di di +vi0 ui 1, di di +" 2" > v 0 > u. and applying Proposition 6 yields v 0 2 Emed ( ) for all there exists v ( ) 2 Emed ( ) such that v and let = max 1; 2 > 1. " > v ( ) > u. . Then, for every > , the following is an equilibrium: On path, the mediator recommends a correlated action pro…le that attains v. If anyone deviates, play transitions to a sequential equilibrium yielding payo¤ v ( ). Such an equilibrium exists as v ( ) 2 Emed ( ) for > 1. Finally, since this punishment results in a loss of continuation payo¤ of at least " for each player, the mediator’s on-path recommendations are incentive compatible for 62 > 2. References [1] Abreu, D., P.K. Dutta, and L. Smith (1994), “The Folk Theorem for Repeated Games: a NEU Condition,”Econometrica, 62, 939-948. [2] Abreu, D., D. Pearce, and E. Stacchetti (1990), “Toward a Theory of Discounted Repeated Games with Imperfect Monitoring,”Econometrica, 58, 1041-1063. [3] Amarante, M. (2003), “Recursive Structure and Equilibria in Games with Private Monitoring,”Economic Theory, 22, 353-374. [4] Athey, S. and K. Bagwell (2001), “Optimal Collusion with Private Information,”RAND Journal of Economics, 32, 428-465. [5] Awaya, Y. and V. Krishna (2014), “On Tacit versus Explicit Collusion,”mimeo. [6] Ben-Porath, E. and M. Kahneman (1996), “Communication in Repeated Games with Private Monitoring,”Journal of Economic Theory, 70, 281-297. [7] Bergemann, D. and S. Morris (2013), “Robust Predictions in Games with Incomplete Information,”Econometrica, 81, 1251-1308. [8] Cherry, J. and L. Smith (2011), “Unattainable Payo¤s for Repeated Games of Private Monitoring,”mimeo. [9] Compte, O. (1998), “Communication in Repeated Games with Imperfect Private Monitoring,”Econometrica, 66, 597-626. [10] Dhillon, A. and J.-F. Mertens, (1996) “Perfect Correlated Equilibria,” Journal of Economic Theory, 68, 279-302. [11] Forges, F. (1986), “An Approach to Communication Equilibria,” Econometrica, 54, 1375-1385. [12] Forges, F. (2009), “Correlated Equilibrium and Communication in Games,”Encyclopedia of Complexity and Systems Science, edited by R. Meyers, Springer: New York. [13] Forges, F., J.-F. Mertens, and A. Neyman (1986), “A Counterexample to the Folk Theorem with Discounting,”Economics Letters, 20, 7. [14] Fuchs, W. (2007), “Contracting with Repeated Moral Hazard and Private Evaluations,” American Economic Review, 97, 1432-1448. [15] Fudenberg, D. and D. Levine (1983), “Subgame-Perfect Equilibria of Finite- and In…nite-Horizon Games,”Journal of Economic Theory, 31, 251-268. [16] Fudenberg, D., D. Levine, and E. Maskin (1994), “The Folk Theorem with Imperfect Public Information,”Econometrica, 62, 997-1039. 63 [17] Gossner, O. (2000), “Comparison of Information Structures,” Games and Economic Behavior, 30, 44-63. [18] Green, E.J. and R.H. Porter (1984), “Noncooperative Collusion under Imperfect Price Information,”Econometrica, 52, 87-100. [19] Heller, Y., E. Solan, and T. Tomala, (2012) “Communication, Correlation and CheapTalk in Games with Public Information,”Games and Economic Behavior, 74, 222-234. [20] Kandori, M. 
(1991), “Cooperation in Finitely Repeated Games with Imperfect Private Information,”mimeo. [21] Kandori, M. (2002), “Introduction to Repeated Games with Private Monitoring,”Journal of Economic Theory, 102, 1-15. [22] Kandori, M. and H. Matsushima (1998), “Private Observation, Communication and Collusion,”Econometrica, 66, 627-652. [23] Kandori, M. and I. Obara (2010), “Towards a Belief-Based Theory of Repeated Games with Private Monitoring: An Application of POMDP,”mimeo. [24] Lehrer, E. (1992), “Correlated Equilibria in Two-Player Repeated Games with Nonobservable Actions,”Mathematics of Operations Research, 17, 175-199. [25] Lehrer, E., D. Rosenberg, and E. Shmaya (2010), “Signaling and Mediation in Games with Common Interests,”Games and Economic Behavior, 68, 670-682. [26] Levin, J. (2003), “Relational Incentive Contracts,” American Economic Review, 93, 835-857. [27] MacLeod, B.W. (2003), “Optimal Contracting with Subjective Evaluation,” American Economic Review, 93, 216-240. [28] Mailath, G.J., S.A. Matthews, and T. Sekiguchi (2002), “Private Strategies in Finitely Repeated Games with Imperfect Public Monitoring,”Contributions in Theoretical Economics, 2. [29] Mailath, G.J. and L. Samuelson (2006), Repeated Games and Reputations, Oxford University Press: Oxford. [30] Mertens, J.-F., S. Sorin, and S. Zamir (1994), “Repeated Games. Part A,” CORE DP 9420. [31] Miyahara, Y. and T. Sekiguchi (2013), “Finitely Repeated Games with Monitoring Options,”Journal of Economic Theory, 148, 1929-1952. [32] Myerson, R.B. (1986), “Multistage Games with Communication,” Econometrica, 54, 323-358. 64 [33] Pai, M.M, A. Roth, and J. Ullman (2014), “An Anti-Folk Theorem for Large Repeated Games with Imperfect Monitoring,”mimeo. [34] Phelan, C., and A. Skrzypacz (2012), “Beliefs and Private Monitoring,”Review of Economic Studies, 79, 1637-1660. [35] Phelan, C. and E. Stacchetti (2001), “Sequential Equilibria in a Ramsey Tax Model,” Econometrica, 69, 1491-1518. [36] Rahman, D. (2012), “But Who Will Monitor the Monitor?,” American Economic Review, 102, 2767-2797. [37] Rahman, D. (2014), “The Power of Communication,”American Economic Review, 104, 3737-51. [38] Rahman, D. and I. Obara (2010), “Mediated Partnerships,”Econometrica, 78, 285-308. [39] Renault, J. and T. Tomala (2004), “Communication Equilibrium Payo¤s in Repeated Games with Imperfect Monitoring,”Games and Economic Behavior, 49, 313-344. [40] Sekiguchi, T. (2002), “Existence of Nontrivial Equilibria in Repeated Games with Imperfect Private Monitoring,”Games and Economic Behavior, 40, 299-321. [41] Stigler, G.J. (1964), “A Theory of Oligopoly,”Journal of Political Economy, 72, 44-61. [42] Sugaya, T. and A. Wolitzky (2014), “Perfect Versus Imperfect Monitoring in Repeated Games,”mimeo. [43] Tomala, T. (2009), “Perfect Communication Equilibria in Repeated Games with Imperfect Monitoring,”Games and Economic Behavior, 67, 682-694. 65 Online Appendix: Proof of Proposition 4 We prove that Etalk ( ; p) = Emed ( ). Since we consider two-player games, whenever we say players i and j, we assume that they are di¤erent players: i 6= j. The structure of the proof is as follows: take any strategy of the mediator, ~ , that satis…es (2) (perfect monitoring incentive compatibility); and let v~ be the value when the players follow ~ . Since each v^ 2 Emed ( ) has a corresponding ^ that satis…es (2), it su¢ ces to show that, for each " > 0, there exists a sequential equilibrium whose equilibrium payo¤ v satis…es kv v~k < " in the following environment: 1. 
At the beginning of the game, each player i receives a message mmediator from the i mediator. 2. In each period t, the stage game proceeds as follows: (a) Given player i’s history (mmediator ; (m1st ; a ; m2nd )t =11 ), each player i sends the …rst i message m1st i;t simultaneously. ; (m1st ; a ; m2nd )t =11 ; m1st (b) Given player i’s history (mmediator t ), each player i takes i action ai;t simultaneously. ; (m1st ; a ; m2nd )t =11 ; m1st (c) Given player i’s history (mmediator t ; at ), each player i sends i 2nd the second message mi;t simultaneously. We call this environment “perfect monitoring with cheap talk.” To this end, from ~ , we …rst create a strict full-support equilibrium with mediated perfect monitoring that yields payo¤s close to v~. We then move from to a similar equilibrium , which will be easier to transform into an equilibrium with perfect monitoring with cheap talk. Finally, from , we create an equilibrium with perfect monitoring with cheap talk with the same on-path action distribution. Construction and Properties of In this subsection, we consider mediated perfect monitoring throughout. Since W 6= ;, by Lemma 2, there exists a strict full support equilibrium strict with mediated perfect monitoring. As in the proof of Lemma 2, consider the following strategy of the mediator: In period 1, the mediator draws one of two states, Rv~ and Rperturb , with probabilities 1 and , respectively. In state Rv~, the mediator’s recommendation is determined as follows: If no player has deviated up to period t, the mediator recommends rt according to ~ (htm ). If only player i has deviated, the mediator recommends r i;t to player j according to j , and recommends some best response to j to player i. Multiple deviations are treated as in the proof of Lemma 1. On the other hand, in state Rperturb , the mediator follows the equilibrium strict . Let denote this strategy of the mediator. From now on, we …x > 0 arbitrarily. 66 With mediated perfect monitoring, since strict has full support, player i believes that the mediator’s state is Rperturb with positive probability after any history. Therefore, by (2) and the fact that strict is a strict equilibrium, it is always strictly optimal for each player i to follow her recommendation. This means that, for each period t, there exist "t > 0 and Tt < 1 such that, for each player i and on-path history ht+1 m , we have # " 1 X t 1 ) ui ( (hm )) j htm ; ri;t (1 )E ui (rt ) j htm ; ri;t + E (1 =t+1 > max (1 )E ui (ai ; r ai 2Ai +( Tt ) (1 i;t ) j htm ; ri;t "t ) max ui (^ ai ; a ^i "t j ) + "t max ui (a) + Tt max ui (a): a2A a2A (26) That is, suppose that if player i unilaterally deviates from on-path history, then player j virtually minmaxes player i for Tt 1 periods with probability 1 "t . (Recall that j is the minmax strategy and "j is a full support perturbation of j .) Then player i has a strict incentive not to deviate from any recommendation in period t on equilibrium path. Equivalently, since is an full support recommendation, player i has a strict incentive not to deviate unless she herself has deviated. Moreover, for su¢ ciently small "t > 0, we have " # 1 X t 1 (1 )E ui (rt ) j htm ; ri;t + E (1 ) ui ( (hm )) j htm =t+1 > (1 Tt ) (1 "t ) max ui (^ ai ; a ^i "t j ) + "t max ui (a) + a2A Tt max ui (a): a2A (27) That is, if a deviation is punished with probability 1 "t for Tt periods including the current period, then player i believes that the deviation is strictly unpro…table.22 For each t, we …x "t > 0 and Tt < 1 with (26) and (27). 
Without loss, we can take "t decreasing: "t "t+1 for each t. Construction and Properties of In this subsection, we again consider mediated perfect monitoring. We further modify and create the following mediator’s strategy : At the beginning of the game, for each i, t, and punish at , the mediator draws ri;t (at ) according to "i t . In addition, for each i and t, she draws ! i;t 2 fR; P g such that ! i;t = R (regular) and P (punish) with probability 1 pt and pt , respectively, independently across i and t. We will pin down pt > 0 in Lemma 12. Moreover, given ! t = (! 1;t ; ! 2;t ), the mediator chooses rt (at ) for each at as follows: If ! 1;t = ! 2;t = R, then she draws rt (at ) according to (at ) (r). If ! i;t = R and ! j;t = P , then she draws ri;t (at ) 22 If the current on-path recommendation schedule Pr (rj;t j htm ; ri;t ) is very close to more restrictive than (26). 67 j, then (27) may be P a punish from Pr (ri j rj;t (at )) while she draws rj;t (at ) randomly from aj 2Aj jAjj j .23 Finally, if P ! 1;t = ! 2;t = P , then she draws ri;t (at ) randomly from ai 2Ai jAaii j for each i independently. Since has full support, is well de…ned. As will be seen, we will take pt su¢ ciently small. In addition, recall that > 0 (the perturbation of ~ to ) is arbitrarily. In the next subsection and onward, we construct an equilibrium with perfect monitoring with cheap talk that has the same equilibrium action distribution as . Since pt is small and > 0 is arbitrary, constructing such an equilibrium su¢ ces to prove Proposition 4. punish At the beginning of the game, the mediator draws ! t , ri;t (at ), and rt (at ) for each i, t, and at . Given them, the mediator sends messages to the players as follows: 1. At the beginning of the game, the mediator sends punish ri;t (at ) i. 1 at 2At 1 to player t=1 2. In each period t, the stage game proceeds as follows: (a) The mediator decides ! t (at ) 2 fR; P g2 as follows: if there is no unilateral deviator (de…ned below), then the mediator sets ! t (at ) = ! t . If instead player i is a unilateral deviator, then the mediator sets ! i;t (at ) = R and ! j;t (at ) = P . (b) Given ! i;t (at ), the mediator sends ! i;t (at ) to player i. In addition, if ! i;t (at ) = R, then the mediator sends ri;t (at ) to player i as well. (c) Given these messages, player i takes an action. In equilibrium, if player i has punish (at ) if not yet deviated, then player i takes ri;t (at ) if ! i;t (at ) = R and takes ri;t t ! i;t (a ) = P . For notational convenience, let ri;t = ri (at ) if ! i;t (at ) = R; punish ri;t (at ) if ! i;t (at ) = P be the action that player i is supposed to take if she has not yet deviated. Her strategy after her own deviation is not speci…ed. We say that player i has unilaterally deviated if there exist t 1 and a unique i such 0 0 0 that (i) for each < , we have an; = rn; for each n 2 f1; 2g (no deviation happened until period 1) and (ii) ai; 6= ri; and aj; = rj; (player i deviates in period and player j does not deviate). Note that is close to on the equilibrium path for su¢ ciently small pt . Hence, onpath strict incentive compatibility for player i follows from (26). Moreover, the incentive compatibility condition analogous to (27) also holds. punish As will be seen below, if ! j;t = P , then player j is supposed to take rj;t (at ). Hence, rj;t (at ) does t not a¤ect the equilibrium action. 
Lemma 12 There exists {p_t}_{t=1}^∞ with p_t > 0 for each t such that it is strictly optimal for each player i to follow her recommendation: for each player i and history

h_i^t = ( (r^punish_{i,τ}(a^τ))_{a^τ∈A^{τ−1}, τ=1}^∞ , a^t , (ω_τ(a^τ))_{τ=1}^{t−1} , ω_{i,t}(a^t) , (r_{i,τ})_{τ=1}^{t} ),

if player i herself has not yet deviated, we have the following two inequalities.

1. If a deviation is punished by α_j^{ε_t} for the next T_t periods with probability 1 − ε_t − Σ_{τ=t}^{t+T_t−1} p_τ, then it is strictly unprofitable:

(1−δ) E[u_i(r_{i,t}, a_{j,t}) | h_i^t] + δ E[(1−δ) Σ_{τ=t+1}^∞ δ^{τ−t−1} u_i(r_{i,τ}, a_{j,τ}) | h_i^t, a_{i,t} = r_{i,t}]
  > max_{a_i∈A_i} (1−δ) E[u_i(a_i, a_{j,t}) | h_i^t]
  + (δ − δ^{T_t}) [(1 − ε_t − Σ_{τ=t}^{t+T_t−1} p_τ) max_{â_i} u_i(â_i, α_j^{ε_t}) + (ε_t + Σ_{τ=t}^{t+T_t−1} p_τ) max_{a∈A} u_i(a)] + δ^{T_t} max_{a∈A} u_i(a).   (28)

2. If a deviation is punished by α_j^{ε_t} from the current period on with probability 1 − ε_t − Σ_{τ=t}^{t+T_t−1} p_τ, then it is strictly unprofitable:

(1−δ) E[u_i(r_{i,t}, a_{j,t}) | h_i^t] + δ E[(1−δ) Σ_{τ=t+1}^∞ δ^{τ−t−1} u_i(r_{i,τ}, a_{j,τ}) | h_i^t, a_{i,t} = r_{i,t}]
  > (1 − δ^{T_t}) [(1 − ε_t − Σ_{τ=t}^{t+T_t−1} p_τ) max_{â_i} u_i(â_i, α_j^{ε_t}) + (ε_t + Σ_{τ=t}^{t+T_t−1} p_τ) max_{a∈A} u_i(a)] + δ^{T_t} max_{a∈A} u_i(a).   (29)

Moreover, the expectation E[·] does not depend on the specification of player j's strategy after player j's own deviation, for each history h_i^t at which player i has not deviated.

Proof. Since μ** has full support on the equilibrium path, a player i who has not yet deviated always believes that player j has not deviated. Hence, E[·] is well defined without specifying player j's strategy after her own deviation. Moreover, since p_t is small and ω_{j,t} is independent of (ω_τ)_{τ=1}^{t−1} and ω_{i,t}, given (ω_τ(a^τ))_{τ=1}^{t−1} and ω_{i,t}(a^t) (which equal (ω_τ)_{τ=1}^{t−1} and ω_{i,t} on path), player i believes that ω_{j,t}(a^t) equals ω_{j,t} and that ω_{j,t} = R with high probability, unless player i has deviated. Since

Pr(r_{j,t} | ω_{i,t}(a^t), ω_{j,t}(a^t) = R, h_i^t) = Pr_{μ*}(r_{j,t} | a^t, r_{i,t}),

the difference

E[u_i(r_{i,t}, a_{j,t}) | h_i^t] − E_{μ*}[u_i(r_{i,t}, a_{j,t}) | r_i^t, a^t, r_{i,t}]

is small for small p_t. Further, if p_τ is small for each τ ≥ t + 1, then, since ω_τ is independent of ω_t, regardless of (ω_τ(a^τ))_{τ=1}^{t−1} player i believes that ω_{i,τ}(a^τ) = ω_{j,τ}(a^τ) = R with high probability for each τ ≥ t + 1 on the equilibrium path. Since the distribution of the μ** recommendation given ω_{i,τ}(a^τ) = ω_{j,τ}(a^τ) = R is the same as that of the μ* recommendation given a^τ, the difference

E[(1−δ) Σ_{τ=t+1}^∞ δ^{τ−t−1} u_i(r_{i,τ}, a_{j,τ}) | h_i^t, a_{i,t} = r_{i,t}] − E_{μ*}[(1−δ) Σ_{τ=t+1}^∞ δ^{τ−t−1} u_i(r_{i,τ}, a_{j,τ}) | r_i^t, a^t, r_{i,t}]

is small for small p_τ with τ ≥ t + 1. Hence, (26) and (27) imply that there exists p̄_t > 0 such that, if p_τ ≤ p̄_t for each τ, then the claims of the lemma hold. Taking p_t ≤ min_{τ≤t} p̄_τ, the claims hold.

We fix {p_t}_{t=1}^∞ so that Lemma 12 holds. This fully pins down μ** with mediated perfect monitoring.

Construction with Perfect Monitoring with Cheap Talk

Given μ** with mediated perfect monitoring, we define an equilibrium with perfect monitoring with cheap talk whose action distribution coincides with that of μ**. We must pin down four objects: the message m_i^mediator that player i receives from the mediator at the beginning of the game; the message m^1st_{i,t} that player i sends at the beginning of period t; the action a_{i,t} that player i takes in period t; and the message m^2nd_{i,t} that player i sends at the end of period t.
Intuitive Argument

As in μ**, at the beginning of the game, for each i, t, and a^t, the mediator draws r^punish_{i,t}(a^t) according to α_i^{ε_t}. In addition, with the p_t > 0 pinned down in Lemma 12, she draws ω_t ∈ {R, P}^2 and r_t(a^t) as in μ** for each t and a^t, and she defines ω_t(a^t) from a^t, r_t(a^t), and ω_t as in μ**. Intuitively, the mediator sends all the information about

(ω_t(a^t), r_t(a^t), r^punish_{1,t}(a^t), r^punish_{2,t}(a^t))_{a^t∈A^{t−1}, t=1}^∞

through the initial messages (m_1^mediator, m_2^mediator). In particular, the mediator directly sends (r^punish_{i,t}(a^t))_{a^t∈A^{t−1}, t=1}^∞ to player i as part of m_i^mediator. Hence, we focus on how we replicate the mediator's role in μ** of sending (ω_t(a^t), r_t(a^t)) in each period, depending on the realized history a^t.

The key features to establish are (i) player i does not know the instructions for the other player; (ii) before player i reaches period t, player i does not know her own recommendations for periods τ ≥ t (otherwise, player i would obtain more information than in the original equilibrium and thus might want to deviate); and (iii) no player wants to deviate (in particular, if player i deviates in actions or in cheap talk, then player j's strategy is as if the state were ω_{j,t} = P in μ**, for a sufficiently long time with sufficiently high probability).

Properties (i) and (ii) are achieved by the same mechanism as in Theorem 9 of Heller, Solan, and Tomala (2012, henceforth HST). In particular, without loss of generality, let A_i = {1_i, ..., n_i} be player i's action set, so that we can view r_{i,t}(a^t) as an element of {1, ..., n_i}. The mediator draws r_t(a^t) for each a^t at the beginning of the game. Instead of sending r_{i,t}(a^t) directly to player i, the mediator encodes r_{i,t}(a^t) as follows. For a sufficiently large N^t ∈ Z, to be determined, define p^t = N^t n_i n_j (this p^t corresponds to p^h in HST), and let Z_{p^t} = {1, ..., p^t}. The mediator draws x^j_{i,t}(a^t) uniformly and independently from Z_{p^t} for each i, t, and a^t. Given these, she defines

y^i_{i,t}(a^t) ≡ x^j_{i,t}(a^t) + r_{i,t}(a^t) (mod n_i).   (30)

Intuitively, y^i_{i,t}(a^t) is the "encoded instruction" for r_{i,t}(a^t): to obtain r_{i,t}(a^t) from y^i_{i,t}(a^t), player i needs to know x^j_{i,t}(a^t). The mediator gives (y^i_{i,t}(a^t))_{a^t∈A^{t−1}, t=1}^∞ to player i as part of m_i^mediator. At the same time, she gives (x^j_{i,t}(a^t))_{a^t∈A^{t−1}, t=1}^∞ to player j as part of m_j^mediator. At the beginning of period t, player j reveals x^j_{i,t}(a^t) by cheap talk as part of m^1st_{j,t}, based on the realized action history a^t, so that player i does not learn r_{i,t}(a^t) until period t. (Throughout the proof, the superscript of a variable indicates who is informed of the variable, and the subscript indicates whose recommendation the variable concerns.)

In order to give player j an incentive to report truthfully, the equilibrium must embed a mechanism that punishes player j if she lies. In HST, this is done as follows. The mediator draws ζ^i_{i,t}(a^t) and ξ^i_{i,t}(a^t) uniformly and independently from Z_{p^t}, and defines

u^j_{i,t}(a^t) ≡ ζ^i_{i,t}(a^t) x^j_{i,t}(a^t) + ξ^i_{i,t}(a^t) (mod p^t).   (31)

The mediator gives x^j_{i,t}(a^t) and u^j_{i,t}(a^t) to player j, while she gives ζ^i_{i,t}(a^t) and ξ^i_{i,t}(a^t) to player i. In period t, player j is supposed to send x^j_{i,t}(a^t) and u^j_{i,t}(a^t) to player i. If player i receives reports x^j_{i,t}(a^t) and u^j_{i,t}(a^t) with

u^j_{i,t}(a^t) ≠ ζ^i_{i,t}(a^t) x^j_{i,t}(a^t) + ξ^i_{i,t}(a^t) (mod p^t),   (32)

then player i interprets this as a deviation by player j.
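The encoding (30) and the verification (31)-(32) amount to a one-time pad plus an authentication tag. The sketch below (hypothetical Python, with 0-indexed actions and illustrative variable names mirroring the text) shows how player i recovers r_{i,t}(a^t) from a truthful report and how a lie about x^j_{i,t}(a^t) trips the check (32) with high probability.

```python
import random

n_i = 3                      # |A_i|; actions are identified with {0, ..., n_i - 1}
N_t = 1000                   # the large integer N^t
p_t = N_t * n_i * n_i        # the modulus p^t (taking n_j = n_i for simplicity)

# Mediator's draws.
r_i  = random.randrange(n_i)     # recommendation r_{i,t}(a^t)
x_j  = random.randrange(p_t)     # pad x^j_{i,t}(a^t), given to player j
zeta = random.randrange(p_t)     # authentication keys (the two uniform draws in (31)),
xi   = random.randrange(p_t)     # given to player i

# (30): the encoded instruction, given to player i; uninformative about r_i
# until player i learns x_j from player j's cheap talk.
y_i = (x_j + r_i) % n_i

# (31): the tag, given to player j together with x_j.
u_j = (zeta * x_j + xi) % p_t

def receive(x_hat, u_hat):
    """Player i's check (32) and, if it passes, the decoding of r_i."""
    if (zeta * x_hat + xi) % p_t != u_hat:
        return "deviation detected"      # condition (32) holds
    return (y_i - x_hat) % n_i           # recovers r_i

print(receive(x_j, u_j))                 # truthful report: returns r_i
print(receive((x_j + 1) % p_t, u_j))     # lie about x_j: detected w.h.p.
```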
For sufficiently large N^t, since player j does not know ζ^i_{i,t}(a^t) and ξ^i_{i,t}(a^t), if player j lies about x^j_{i,t}(a^t), then with high probability she creates a situation in which (32) holds. Since HST consider Nash equilibrium, they let player i minmax player j forever once (32) holds. Since we instead consider sequential equilibrium, as in the proof of Lemma 2 we create a coordination mechanism such that, if player j lies, then with high probability player i minmaxes player j for a long time while assigning probability zero to the event that she is punishing player j.

To this end, we use the following coordination. First, if and only if ω_{i,t}(a^t) = R, the mediator defines u^j_{i,t}(a^t) by (31); otherwise, u^j_{i,t}(a^t) is drawn uniformly at random. That is,

u^j_{i,t}(a^t) ≡ ζ^i_{i,t}(a^t) x^j_{i,t}(a^t) + ξ^i_{i,t}(a^t) (mod p^t) if ω_{i,t}(a^t) = R;  uniformly distributed over Z_{p^t} if ω_{i,t}(a^t) = P.   (33)

Since both ω_{i,t}(a^t) = R and ω_{i,t}(a^t) = P occur with positive probability, a player i who receives u^j_{i,t}(a^t) with u^j_{i,t}(a^t) ≠ ζ^i_{i,t}(a^t) x^j_{i,t}(a^t) + ξ^i_{i,t}(a^t) (mod p^t) interprets this as ω_{i,t}(a^t) = P. For notational convenience, let ω̂_{i,t}(a^t) ∈ {R, P} denote player i's interpretation of ω_{i,t}(a^t). When ω̂_{i,t}(a^t) = P, she takes her period-t action according to r^punish_{i,t}(a^t). Given this inference, if player j lies about u^j_{i,t}(a^t) when ω_{i,t}(a^t) = R, then with high probability she induces a situation with u^j_{i,t}(a^t) ≠ ζ^i_{i,t}(a^t) x^j_{i,t}(a^t) + ξ^i_{i,t}(a^t) (mod p^t), and player i punishes player j in period t (without noticing player j's deviation).

Second, switching to r^punish_{i,t}(a^t) for period t only may not be enough if player j believes that player i's action distribution given ω_{i,t}(a^t) = R is already very close to the minmax strategy. Hence, we make sure that, once player j deviates, player i takes r^punish_{i,τ}(a^τ) for a sufficiently long time. To this end, we change the mechanism so that player j does not always know u^j_{i,t}(a^t). Instead, the mediator draws p^t independent random variables v^j_{i,t}(n, a^t), n = 1, ..., p^t, uniformly from Z_{p^t}. In addition, she draws n^i_{i,t}(a^t) uniformly from Z_{p^t}. The mediator defines u^j_{i,t}(n, a^t) for each n = 1, ..., p^t as follows:

u^j_{i,t}(n, a^t) = u^j_{i,t}(a^t) if n = n^i_{i,t}(a^t);  v^j_{i,t}(n, a^t) otherwise.

That is, u^j_{i,t}(n, a^t) coincides with the u^j_{i,t}(a^t) of (33) only for n = n^i_{i,t}(a^t); for every other n, u^j_{i,t}(n, a^t) is completely random. The mediator sends n^i_{i,t}(a^t) to player i and sends {u^j_{i,t}(n, a^t)}_{n∈Z_{p^t}} to player j. In addition, the mediator sends n^j_{i,t}(a^t) to player j, where

n^j_{i,t}(a^t) = n^i_{i,t}(a^t) if ω_{i,t−1}(a^{t−1}) = P;  uniformly distributed over Z_{p^t} if ω_{i,t−1}(a^{t−1}) = R;

that is, n^j_{i,t}(a^t) equals n^i_{i,t}(a^t) if and only if last period's ω_{i,t−1}(a^{t−1}) equals P. In period t, player j is asked to send x^j_{i,t}(a^t) and u^j_{i,t}(n, a^t) with n = n^i_{i,t}(a^t), that is, to send x^j_{i,t}(a^t) and u^j_{i,t}(a^t). If and only if player j's messages x̂^j_{i,t}(a^t) and û^j_{i,t}(a^t) satisfy

û^j_{i,t}(a^t) = ζ^i_{i,t}(a^t) x̂^j_{i,t}(a^t) + ξ^i_{i,t}(a^t) (mod p^t)

does player i interpret ω̂_{i,t}(a^t) = R. If player i has ω̂_{i,t}(a^t) = R, then player i knows that player j needs to know n^i_{i,t+1}(a^{t+1}) in order to send the correct u^j_{i,t+1}(n, a^{t+1}) next period; hence, she sends n^i_{i,t+1}(a^{t+1}) to player j. If player i has ω̂_{i,t}(a^t) = P, then she believes that player j already knows n^i_{i,t+1}(a^{t+1}) and does not send it.
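The role of the secret index n^i_{i,t}(a^t) can be seen in the following sketch (hypothetical Python; the table size is kept tiny for readability, whereas in the construction p^t = N^t n_i n_j is large): the genuine tag is hidden at a secret position of an otherwise random table handed to player j, so a player j who has lost track of the pointer can only guess and therefore fails player i's check with high probability.

```python
import random

p_t = 12                                   # tiny modulus/table size, for readability only

# Mediator's draws for one period.
zeta, xi = random.randrange(p_t), random.randrange(p_t)
x_j      = random.randrange(p_t)
u_true   = (zeta * x_j + xi) % p_t         # genuine tag, as in (31)/(33)
n_secret = random.randrange(p_t)           # pointer n^i_{i,t}, given to player i

# Table given to player j: the genuine tag sits at the secret index,
# every other entry is an independent uniform draw v^j_{i,t}(n, a^t).
table = [random.randrange(p_t) for _ in range(p_t)]
table[n_secret] = u_true

def player_j_report(known_index):
    """Player j reports (x_j, u) using whatever index she currently knows."""
    if known_index is None:                # j never learned (or lost) the pointer
        known_index = random.randrange(p_t)
    return x_j, table[known_index]

def player_i_check(x_hat, u_hat):
    """Player i's interpretation: R if the tag verifies, P otherwise."""
    return "R" if (zeta * x_hat + xi) % p_t == u_hat else "P"

print(player_i_check(*player_j_report(n_secret)))  # pointer known: verifies, "R"
print(player_i_check(*player_j_report(None)))      # pointer unknown: "P" w.h.p.
```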
Given this coordination, once player j creates a situation with ω_{i,t}(a^t) = R but ω̂_{i,t}(a^t) = P, player j does not receive n^i_{i,t+1}(a^{t+1}). Without knowing n^i_{i,t+1}(a^{t+1}), for large N^{t+1}, with high probability player j does not know which u^j_{i,t+1}(n, a^{t+1}) she should send. Then, again, she will create a situation with

û^j_{i,t+1}(a^{t+1}) ≠ ζ^i_{i,t+1}(a^{t+1}) x̂^j_{i,t+1}(a^{t+1}) + ξ^i_{i,t+1}(a^{t+1}) (mod p^{t+1}),

that is, ω̂_{i,t+1}(a^{t+1}) = P. Recursively, if player j lies, then player i has ω̂_{i,τ}(a^τ) = P for a long time with high probability.

Finally, if player j takes a deviant action in period t, then the mediator has drawn ω_{i,τ}(a^τ) = P for each τ ≥ t + 1 and each a^τ corresponding to the realized history. With ω_{i,τ}(a^τ) = P, in order to avoid ω̂_{i,τ}(a^τ) = P, player j would need to create a situation with û^j_{i,τ}(a^τ) = ζ^i_{i,τ}(a^τ) x̂^j_{i,τ}(a^τ) + ξ^i_{i,τ}(a^τ) (mod p^τ) without knowing ζ^i_{i,τ}(a^τ) and ξ^i_{i,τ}(a^τ), and, by (33), the mediator's message to her does not reveal what ζ^i_{i,τ}(a^τ) x^j_{i,τ}(a^τ) + ξ^i_{i,τ}(a^τ) (mod p^τ) is. Hence, for sufficiently large N^τ, player j cannot avoid ω̂_{i,τ}(a^τ) = P with non-negligible probability, and she will be minmaxed from the next period on with high probability.

In total, the above argument shows that if player j deviates, whether in communication or in actions, then she will be minmaxed for a sufficiently long time. Lemma 12 then ensures that player j does not want to lie or to take a deviant action.

Formal Construction

Let us formalize the above construction. As in μ**, at the beginning of the game, for each i, t, and a^t, the mediator draws r^punish_{i,t}(a^t) according to α_i^{ε_t}; she then draws ω_t ∈ {R, P}^2 and r_t(a^t) for each t and a^t; and she defines ω_t(a^t) from a^t, r_t(a^t), and ω_t as in μ**. For each t and a^t, she draws x^j_{i,t}(a^t) uniformly and independently from Z_{p^t}. Given these, she defines y^i_{i,t}(a^t) ≡ x^j_{i,t}(a^t) + r_{i,t}(a^t) (mod n_i), so that (30) holds.

The mediator also draws ζ^i_{i,t}(a^t), ξ^i_{i,t}(a^t), ũ^j_{i,t}(a^t), v^j_{i,t}(n, a^t) for each n ∈ Z_{p^t}, n^i_{i,t}(a^t), and ñ^j_{i,t}(a^t) from the uniform distribution over Z_{p^t}, independently across players i, periods t, and histories a^t.

As in (33), the mediator defines

u^j_{i,t}(a^t) ≡ ζ^i_{i,t}(a^t) x^j_{i,t}(a^t) + ξ^i_{i,t}(a^t) (mod p^t) if ω_{i,t}(a^t) = R;  ũ^j_{i,t}(a^t) if ω_{i,t}(a^t) = P.

In addition, the mediator defines

u^j_{i,t}(n, a^t) = u^j_{i,t}(a^t) if n = n^i_{i,t}(a^t);  v^j_{i,t}(n, a^t) otherwise;

and

n^j_{i,t}(a^t) = n^i_{i,t}(a^t) if t = 1 or ω_{i,t−1}(a^{t−1}) = P;  ñ^j_{i,t}(a^t) if t ≠ 1 and ω_{i,t−1}(a^{t−1}) = R;

as explained above.

Let us now define the equilibrium:

1. At the beginning of the game, the mediator sends

m_i^mediator = ( ( y^i_{i,t}(a^t), ζ^i_{i,t}(a^t), ξ^i_{i,t}(a^t), r^punish_{i,t}(a^t), n^i_{i,t}(a^t), n^i_{j,t}(a^t), (u^i_{j,t}(n, a^t))_{n∈Z_{p^t}}, x^i_{j,t}(a^t) )_{a^t∈A^{t−1}} )_{t=1}^∞

to each player i.

2. In each period t, the stage game proceeds as follows. In equilibrium,

m^1st_{j,t} = (u^j_{i,t}(m^2nd_{i,t−1}, a^t), x^j_{i,t}(a^t)) if t ≠ 1 and m^2nd_{i,t−1} ≠ {babble};  (u^j_{i,t}(n^j_{i,t}(a^t), a^t), x^j_{i,t}(a^t)) if t = 1 or m^2nd_{i,t−1} = {babble};   (34)

and

m^2nd_{j,t} = n^j_{j,t+1}(a^{t+1}) if ω̂_{j,t}(a^t) = R;  {babble} if ω̂_{j,t}(a^t) = P.

Note that, since m^2nd_{j,t} is sent at the end of period t, the players know a^{t+1} = (a_1, ..., a_t) when these messages are sent.
(a) Given player i's history (m_i^mediator, (m^1st_τ, a_τ, m^2nd_τ)_{τ=1}^{t−1}), each player i sends the first message m^1st_{i,t} simultaneously. If player i herself has not yet deviated, then

m^1st_{i,t} = (u^i_{j,t}(m^2nd_{j,t−1}, a^t), x^i_{j,t}(a^t)) if t ≠ 1 and m^2nd_{j,t−1} ≠ {babble};  (u^i_{j,t}(n^i_{j,t}(a^t), a^t), x^i_{j,t}(a^t)) if t = 1 or m^2nd_{j,t−1} = {babble}.

Let m^1st_{i,t}(u) denote the first element of m^1st_{i,t} (that is, either u^i_{j,t}(m^2nd_{j,t−1}, a^t) or u^i_{j,t}(n^i_{j,t}(a^t), a^t) in equilibrium), and let m^1st_{i,t}(x) denote the second element (x^i_{j,t}(a^t) in equilibrium). As a result, the profile of first messages m^1st_t becomes common knowledge. If

m^1st_{j,t}(u) ≠ ζ^i_{i,t}(a^t) m^1st_{j,t}(x) + ξ^i_{i,t}(a^t) (mod p^t),   (35)

then player i interprets ω̂_{i,t}(a^t) = P; otherwise, ω̂_{i,t}(a^t) = R.

(b) Given player i's history (m_i^mediator, (m^1st_τ, a_τ, m^2nd_τ)_{τ=1}^{t−1}, m^1st_t), each player i takes action a_{i,t} simultaneously. If player i herself has not yet deviated, then player i takes a_{i,t} = r_{i,t} with

r_{i,t} = y^i_{i,t}(a^t) − m^1st_{j,t}(x) (mod n_i) if ω̂_{i,t}(a^t) = R;  r^punish_{i,t}(a^t) if ω̂_{i,t}(a^t) = P.   (36)

Recall that y^i_{i,t}(a^t) ≡ x^j_{i,t}(a^t) + r_{i,t}(a^t) (mod n_i) by (30). By (34), therefore, on the equilibrium path player i takes r_{i,t}(a^t) if ω_{i,t}(a^t) = R and r^punish_{i,t}(a^t) if ω_{i,t}(a^t) = P, as under μ**.

(c) Given player i's history (m_i^mediator, (m^1st_τ, a_τ, m^2nd_τ)_{τ=1}^{t−1}, m^1st_t, a_t), each player i sends the second message m^2nd_{i,t} simultaneously. If player i herself has not yet deviated, then

m^2nd_{i,t} = n^i_{i,t+1}(a^{t+1}) if ω̂_{i,t}(a^t) = R;  {babble} if ω̂_{i,t}(a^t) = P.

As a result, the profile of second messages m^2nd_t becomes common knowledge. Note that ω_t(a^t) also becomes common knowledge on the equilibrium path.
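The per-period exchange (34)-(36) can be condensed into a single sketch (hypothetical Python; it fixes one period, suppresses the dependence on the realized history a^t, and treats the regular state ω_{i,t}(a^t) = R, so the genuine tag is defined by (33)).

```python
import random

n_i, p_t = 2, 10_000
zeta, xi = random.randrange(p_t), random.randrange(p_t)
x_j      = random.randrange(p_t)
r_i      = random.randrange(n_i)            # regular recommendation r_{i,t}(a^t)
r_punish = random.randrange(n_i)            # punishment action r^punish_{i,t}(a^t)
y_i      = (x_j + r_i) % n_i                # encoded instruction held by player i
u_j      = (zeta * x_j + xi) % p_t          # genuine tag held by player j (omega = R)
n_next   = random.randrange(p_t)            # pointer n^i_{i,t+1}, held by player i

def period(m1st_u, m1st_x):
    """Player i's side of one period: check (35), act by (36), then send m2nd."""
    omega_hat = "R" if (zeta * m1st_x + xi) % p_t == m1st_u else "P"
    action = (y_i - m1st_x) % n_i if omega_hat == "R" else r_punish
    m2nd = n_next if omega_hat == "R" else "babble"
    return omega_hat, action, m2nd

print(period(u_j, x_j))               # truthful (34): decodes r_i and reveals the pointer
print(period(u_j, (x_j + 1) % p_t))   # a lie by player j: read as P, punishment action
```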
Incentive Compatibility

The above equilibrium has full support: since ω_t(a^t) and r_t(a^t) have full support, (m_1^mediator, m_2^mediator) has full support as well. Hence, it remains to verify player i's incentive not to deviate from the equilibrium strategy, given that after every history player i believes that player j has not yet deviated. Suppose that player i has followed the equilibrium strategy up to the end of period t − 1.

First, consider player i's incentive to tell the truth in m^1st_{i,t}. In equilibrium, player i sends

m^1st_{i,t} = (u^i_{j,t}(m^2nd_{j,t−1}, a^t), x^i_{j,t}(a^t)) if m^2nd_{j,t−1} ≠ {babble};  (u^i_{j,t}(n^i_{j,t}(a^t), a^t), x^i_{j,t}(a^t)) if m^2nd_{j,t−1} = {babble}.

The random variables possessed by player i are independent of those possessed by player j given (m^1st_τ, a_τ, m^2nd_τ)_{τ=1}^{t−1}, except that (i) u^j_{i,t}(a^t) = ζ^i_{i,t}(a^t) x^j_{i,t}(a^t) + ξ^i_{i,t}(a^t) (mod p^t) if ω_{i,t}(a^t) = R; (ii) u^i_{j,t}(a^t) = ζ^j_{j,t}(a^t) x^i_{j,t}(a^t) + ξ^j_{j,t}(a^t) (mod p^t) if ω_{j,t}(a^t) = R; (iii) n^j_{i,τ}(a^τ) = n^i_{i,τ}(a^τ) if ω_{i,τ−1}(a^{τ−1}) = P, while n^j_{i,τ}(a^τ) = ñ^j_{i,τ}(a^τ) if ω_{i,τ−1}(a^{τ−1}) = R; and (iv) n^i_{j,τ}(a^τ) = n^j_{j,τ}(a^τ) if ω_{j,τ−1}(a^{τ−1}) = P, while n^i_{j,τ}(a^τ) = ñ^i_{j,τ}(a^τ) if ω_{j,τ−1}(a^{τ−1}) = R. Since ζ^i_{i,t}(a^t), ξ^i_{i,t}(a^t), ũ^j_{i,t}(a^t), v^j_{i,t}(n, a^t), n^i_{i,t}(a^t), and ñ^j_{i,t}(a^t) are uniform and independent, player i cannot learn ω_{i,τ}(a^τ), r_{i,τ}(a^τ), or r_{j,τ}(a^τ) for τ ≥ t. Hence, at the time she sends m^1st_{i,t}, player i believes that her equilibrium value equals

(1−δ) E[u_i(a_t) | h_i^t] + δ E[(1−δ) Σ_{τ=t+1}^∞ δ^{τ−t−1} u_i(a_τ) | h_i^t],

where h_i^t is as if player i had observed (r^punish_{i,τ}(a^τ))_{a^τ∈A^{τ−1}, τ=1}^∞, a^t, (ω_τ(a^τ))_{τ=1}^{t−1}, and r_{i,t}(a^t), and believed that r_τ(a^τ) = a_τ for each τ = 1, ..., t − 1, under μ** with mediated perfect monitoring.

On the other hand, for each e > 0 and sufficiently large N^t, if player i lies in at least one element of m^1st_{i,t}, then with probability 1 − e player i creates a situation with

m^1st_{i,t}(u) ≠ ζ^j_{j,t}(a^t) m^1st_{i,t}(x) + ξ^j_{j,t}(a^t) (mod p^t).

Hence, (35) (with the indices i and j reversed) implies that ω̂_{j,t}(a^t) = P. Moreover, if player i creates a situation with ω̂_{j,t}(a^t) = P, then player j sends m^2nd_{j,t} = {babble} instead of n^j_{j,t+1}(a^{t+1}). Unless ω_{j,t}(a^t) = P, since n^j_{j,t+1}(a^{t+1}) is independent of player i's variables, player i believes that n^j_{j,t+1}(a^{t+1}) is uniformly distributed over Z_{p^{t+1}}. Hence, for each e > 0 and sufficiently large N^{t+1}, if ω_{j,t+1}(a^{t+1}) = R, then player i will send m^1st_{i,t+1} with

m^1st_{i,t+1}(u) ≠ ζ^j_{j,t+1}(a^{t+1}) m^1st_{i,t+1}(x) + ξ^j_{j,t+1}(a^{t+1}) (mod p^{t+1})

with probability 1 − e. Then, by (35) (with the indices i and j reversed), player j will have ω̂_{j,t+1}(a^{t+1}) = P. Recursively, if ω_{j,τ}(a^τ) = R for each τ = t, ..., t + T_t − 1, then player i induces ω̂_{j,τ}(a^τ) = P for each τ = t, ..., t + T_t − 1 with high probability. Hence, for the ε_t > 0 and T_t fixed in (26) and (27), for sufficiently large N^t, if N^τ ≥ N^t for each τ, then player i will be punished for the subsequent T_t periods, regardless of her continuation strategy, with probability no less than 1 − ε_t − Σ_{τ=t}^{t+T_t−1} p_τ. (Here Σ_{τ=t}^{t+T_t−1} p_τ is the maximum probability of having ω_{i,τ}(a^τ) = P for some τ among the subsequent T_t periods.) Inequality (29) then implies that lying yields a strictly lower payoff than the equilibrium payoff. Therefore, it is optimal to tell the truth in m^1st_{i,t}. (In (29) we established interim incentive compatibility after player i learns ω_{i,t}(a^t) and r_{i,t}; here we consider the history h_i^t before ω_{i,t}(a^t) and r_{i,t} are known. Taking expectations over ω_{i,t}(a^t) and r_{i,t}, (29) ensures incentive compatibility before they are known.)

Second, consider player i's incentive to take the action a_{i,t} = r_{i,t} prescribed by (36), given that she has followed the equilibrium strategy up to and including m^1st_{i,t}. If she continues to follow the equilibrium strategy, then at the time she takes her action player i believes that her equilibrium value equals

(1−δ) E[u_i(a_t) | h_i^t] + δ E[(1−δ) Σ_{τ=t+1}^∞ δ^{τ−t−1} u_i(a_τ) | h_i^t],

where h_i^t is now as if player i had observed (r^punish_{i,τ}(a^τ))_{a^τ∈A^{τ−1}, τ=1}^∞, a^t, (ω_τ(a^τ))_{τ=1}^{t−1}, ω_{i,t}(a^t), and r_{i,t}, and believed that r_τ(a^τ) = a_τ for each τ = 1, ..., t − 1, under μ** with mediated perfect monitoring. (Compared with the time at which she sends m^1st_{i,t}, player i now also knows ω_{i,t}(a^t) and r_{i,t} on the equilibrium path.)

If player i deviates from a_{i,t}, then by definition ω_{j,τ}(a^τ) = P for each τ ≥ t + 1 and each a^τ compatible with a^t (that is, a^τ = (a^t, a_t, ..., a_{τ−1}) for some a_t, ..., a_{τ−1}). To avoid being minmaxed in period τ, player i would need to induce ω̂_{j,τ}(a^τ) = R even though ω_{j,τ}(a^τ) = P. Given ω_{j,τ}(a^τ) = P, since ζ^j_{j,τ}(a^τ), ξ^j_{j,τ}(a^τ), ũ^i_{j,τ}(a^τ), v^i_{j,τ}(n, a^τ), n^j_{j,τ}(a^τ), and ñ^i_{j,τ}(a^τ) are uniform and independent (conditional on the other variables), regardless of player i's continuation strategy, by (35) (with the indices i and j reversed) player i will send m^1st_{i,τ} with

m^1st_{i,τ}(u) ≠ ζ^j_{j,τ}(a^τ) m^1st_{i,τ}(x) + ξ^j_{j,τ}(a^τ) (mod p^τ)

with high probability. Hence, for sufficiently large N^t, if N^τ ≥ N^t for each τ, then player i will be punished for the next T_t periods, regardless of her continuation strategy, with probability no less than 1 − ε_t. By (28), deviating from r_{i,t} yields a strictly lower payoff than her equilibrium payoff.
Therefore, it is optimal to take a_{i,t} = r_{i,t}.

Finally, consider player i's incentive to tell the truth in m^2nd_{i,t}. Regardless of m^2nd_{i,t}, player j's actions do not change. Hence, it remains to show that lying does not increase player i's deviation gains by giving her more information.

On the equilibrium path, player i knows ω_{i,t}(a^t). If player i tells the truth, then m^2nd_{i,t} = n^i_{i,t+1}(a^{t+1}) ≠ {babble} if and only if ω_{i,t}(a^t) = R. Moreover, player j then sends

m^1st_{j,t+1} = (u^j_{i,t+1}(m^2nd_{i,t}, a^{t+1}), x^j_{i,t+1}(a^{t+1})) if ω_{i,t}(a^t) = R;  (u^j_{i,t+1}(n^j_{i,t+1}(a^{t+1}), a^{t+1}), x^j_{i,t+1}(a^{t+1})) if ω_{i,t}(a^t) = P.

Since n^j_{i,t+1}(a^{t+1}) = n^i_{i,t+1}(a^{t+1}) if ω_{i,t}(a^t) = P, in total, if player i tells the truth, then she learns u^j_{i,t+1}(n^i_{i,t+1}(a^{t+1}), a^{t+1}) and x^j_{i,t+1}(a^{t+1}). This is sufficient information to infer ω_{i,t+1}(a^{t+1}) and r_{i,t+1}(a^{t+1}) correctly.

If she lies, then player j's messages change to

m^1st_{j,t+1} = (u^j_{i,t+1}(m^2nd_{i,t}, a^{t+1}), x^j_{i,t+1}(a^{t+1})) if m^2nd_{i,t} ≠ {babble};  (u^j_{i,t+1}(n^j_{i,t+1}(a^{t+1}), a^{t+1}), x^j_{i,t+1}(a^{t+1})) if m^2nd_{i,t} = {babble}.

Since ζ^i_{i,t+1}(a^{t+1}), ξ^i_{i,t+1}(a^{t+1}), ũ^j_{i,t+1}(a^{t+1}), v^j_{i,t+1}(n, a^{t+1}), n^i_{i,t+1}(a^{t+1}), and ñ^j_{i,t+1}(a^{t+1}) are uniform and independent conditional on ω_{i,t+1}(a^{t+1}) and r_{i,t+1}(a^{t+1}), the variables u^j_{i,t+1}(n, a^{t+1}) and x^j_{i,t+1}(a^{t+1}) are not informative about player j's recommendations from period t + 1 on, or about player i's recommendations from period t + 2 on, given that player i knows ω_{i,t+1}(a^{t+1}) and r_{i,t+1}(a^{t+1}). Since telling the truth already informs player i of ω_{i,t+1}(a^{t+1}) and r_{i,t+1}(a^{t+1}), there is no gain from lying.