Digitized by the Internet Archive in 2011 with funding from Boston Library Consortium IVIember Libraries http://www.archive.org/details/approximatefolktOOfude working paper department of economics AN APPROXIMATE FOLK THEOREM WITH IMPERFECT PRIVATE INFORMATION Drew Fudenberg and David K. Levine No. March 1989 525 massachusetts institute of I ^ technology J' 50 memorial drive Cambridge, mass. 02139 AN APPROXIMATE FOLK THEOREM WITH IMPERFECT PRIVATE INFORMATION Drew Fudenberg and David K. Levine No. 525 March 1989 AN APPROXIMATE FOLK THEOREK WITH IMPERFECT PRIVATE INFORMATION by Drew Fudenberg and David K. Levine *** March 1989 * We are grateful to Andreu Mas-Colell. NSF Grants 87-08616 and 86-09697 and a grant from the UCLA Academic Senate provided financial support. ** Department of Economics, Massachusetts Institute of Technology *** Department of Economics, University of California, Los Angeles; part of this research was conducted while at the University of Minnesota. Abstract We give a partial folk theorem for approximate equilibria of a class of discounted repeated games where each player receives a private signal of the play of his opponents "informationally connected," . Our meaning condition that each player that can be statistically detected by player any third player k. is j that i the has a game deviation regardless of the action of Under the same condition, we obtain a partial folk theorem for the exact equilibria of the game with time-average payoffs, JEL Classification numbers = 022, 026 Keywords: Repeated games, folk theorem, private information, approximate equilibrium be 1. Introduction We give a partial folk theorem for the approximate equilibria of a class of repeated games with imperfect information satisfying an informa- tional linkage condition. Every payoff vector which exceeds a mutual threat point and is generated by one-shot mixed strategies from which no player can profitably deviate without having some effect on other players, is approximately an approximate sequential equilibrium payoff if the discount factor is close enough to one. The class of repeated games we consider has complete but imperfect information. Each period, player his own payoff and also a signal i z chooses an action a , then observes of the play of his opponents. This model includes as a special case games of imperfect public information as defined by Fudenberg-Levine-Maskin [1989]. publicly observed variable y, In these games, there is a and each player i observes z. - (y,a.). The public information case includes the literature on repeated oligopoly (Green-Porter [1983], Abreu-Pearce Stacchetti [1986]), repeated principalagent games (Radner [1981, 1985], Rubinstein-Yaari [1983]), repeated partnerships (Radner [1986], Radner-Myerson-Maskin [1986]), as well as the classic case of observable actions (Auman-Shapley Rubinstein [1981], Fudenberg-Maskin [1986]). , Friedman [1971], It also includes the examples studied in the literature of repeated games with "semi-standard information", as discussed in Sorin [1988]. While examples of repeated games that have been formally studied have public information, players have private information in some situations of economic interest. Indeed, the central feature in Stigler's [1961] model of secret price-cutting is that each firm's sales depend on the prices of its rivals, but firms only observe their own demand. The goal of this paper is to show that a partial folk theorem extends to these cases, provided we relax the notion of equilibrium slightly. The key element in any folk theorem is that deviators must be punished. This requires, with three or more players, that non- deviators coordinate their punishments, and may even require a player to cooperate in his own punishment. With public information, the necessary coordination can be accomplished by conditioning play on the commonly observed outcome. When players receive different private signals they may disagree on the need for punishment. This raises the possibility that some players may believe that punishment is required while others do not realize this. The most straight- forward way to prevent such confusion from dominating play is to periodically "restart" the strategies at commonly known times, ending confusion, and permitting players to recoordinate their play. This, however, makes it difficult to punish deviators near the point at which play restarts and forces us toward approximate equilibrium. The use of approximate equilibrium leads to an important simplification, because it allows the use of review strategies of the type introduced by Radner [1981]. Under these strategies, each player calculates a statistic indicating whether a deviation has occurred. If the player's statistic crosses a threshold level at a commonly known time, he communicates this fact during a communication stage of the type introduced by Lehrer [1986]. Our informational linkage condition ensures that all players cor- rectly interpret this communication, so that the communications stage allows coordination of punishments. The importance of approximate equilibrium is that Radner-type review strategies do not punish small deviations, i.e., those that do not cause players statistics to cross the threshold level. Consequently, there will typically be small gains to deviating. A second consequence of approximate equilibrium is that sequentiality loses its force, and the review strategies are perfect whenever they are Nash. This is because punishments end when the strategies are restarted so the cost to punishers is small. Thus the punishment are "credible" in the sense that carrying them out is an approximate equilibrium. Our results are closely connected to those of Lehrer [1986, 1988a, b,c,d, 1989] who considers time-average Nash equilibrium of various classes of games with private information and imperfect monitoring but with non-stochastic outIn addition to the deterministic nature of outcomes, Lehrer 's work has comes. a different emphasis than ours, focusing on completely characterizing equilibrium payoffs under alternative specifications of what happens when a time average fails to exist. Our work provides a partial characterization of payoffs for a special but important class of games. A secondary goal of our paper is to clarify the correction between Lehrer' s work and other work on repeated time-average games, and work on discounted games such as ours. In particular we exposit. the connection between approximate discounted equilibrium and exact time-average equilibrium, which is to some extent already known to those who have studied repeated games. We emphasize the fact that the time-average equilibrium payoff set includes limits of approximate as well as exact discounted equilibria. Although we show that the converse is false in general, the type of construction used by Lehrer (and us) yields both an approximate discounted and a time-average theorem. For economists, who are typically hostile to the notion of time-average equilibrium, we hope to make clear that results on repeated games with time -average payoffs are relevant, provided the logic of approximate equilibrium is accepted. If we consider the set of equilibrium payoffs as potential contracts in there is an interpretation of approximate equi- a mechanism design problem, librium that deserves emphasis. The set of equilibrium payoffs in our theorem in some cases strictly exceeds that for exact equilibria. This means that exact equilibrium may be substantially too pessimistic. The efficiency frontier for contracts may be substantially improved if people can be persuaded to forego the pursuit of very small private benefits. For example, even if ethical standards have only a small impact on behavior, they may significantly improve contracting possibilities. Section 2 of the paper lays out the repeated games model. Section 3 establishes a connection between approximate sequential discounted equilibria with discount factor near one, and time-average equilibrium. In particular, both types of equilibria can be constructed from finite-horizon approximate Nash equilibria. Section 4 describes our informational linkage condition, and proves a partial folk theorem, 2. The Model In the stage game chooses an action , each player from a finite set a. 11 player observes an outcome ' let z - (z, z 1 n and ) z. Z - x. , 1-1 realized payoff A. 1 to N, simultaneously with m. elements. a finite set with 6 Z., induces a probability distribution i's i i, Each We elements. M. 1 Z . Each action profile . a e i tt (a) over outcomes z. A X. , A, 1-1 1 Each player depends on his own observed outcome only; the r (z.) opponents' actions matter only in their influence on the distribution over outcomes. Player i's expected payoff to an action profile g. (a) "i - S _ zeZ TT z (a)r. (z.) 1 1 . a is . d - We also define g.(a.,a max i ,a. ,a: ,a 1 1 g.(a', ,a - .) to be the maximum .) . -1 one-period gain any player could obtain by playing one of his actions instead of another al a . We will also wish to consider various types of correlated strategy profiles. A correlated action profile tion over A. A , H X. A . We write and the vector i plays write a (a. ,a probability distribution, is such a i represents the correlated action in which player ) and the other players correlate according to for the marginal induced over a simply a probability distribu- is for the induced probability distribution over a A belief by player . a we refer to is independent, by A a . is characterized by and may be identified with . We also . If the play of players as a mixed action profile a a a (a. in which case , the vector of ), marginals For any correlated action profile we can calculate the induced a, distribution over outcomes TT (a) - 2 aGA (a)a(a). TT We can also calculate the marginal distribution over player TT (z ,q:) - S g (a) - E TT z.eZ, 1 In the repeated game, . ,Q)r (z (z ^ ^ - ) ^ i in each period t Z ZGZ TT i (a)r.(z - 1,2 i at time t is h.(t) - (a.(l),z.(l) ). ^ ^ the stage game is played, and the corresponding outcome is then revealed. player outcomes i's T (q). Finally, we may calculate the expected payoff to player a. (t) The history for , a z. (t) ) . We also let denote the null history existing before play begins. h.(0) for player h.(t-l) is a sequence of maps i A system of beliefs h.(t), private history b, a mapping his private history a.(t) to probability distributions over for player A i . specifies for each time probability distribution histories of other players of length t. b over private b for all truncation of a in This requires that there exists for each a sequence of truncated strategy profiles T and t if for every finite time a, is consistent with the the sense of Kreps-Wilson [1982]. truncation b.(h.(t)) A profile of beliefs payers is consistent with the strategy profile horizon the truncation of A strategy a T -* a T putting , strictly positive probability on every action, such that the unique sequence of truncated belief profiles derived from Bayes law, Given a system of beliefs b and a history induced over the history of all players play. the strategy profile at all times player i, r . b T - T b . a distribution is h (t) Given this distribution and a corresponding distribution is induced over play a, In turn this gives an expected payoff in period which we denote by G. (r ,h. (t) ,b. ,a) normalized present value to player i r to The corresponding . < at discount factor 5 Wj^(h^(t).b^,a.5) - (1-5) 2"_^ 5^'^ G(r ,h^(t) ,b^,a) < 1 is . ^ While if 5-1 wj(hj^(t).b.,a) - (1/T) Sy_^ G^(r,h^(t),b^,a) and W (h (t),b ^ ,cr,l) ^ - lira sup W^Ch (t) ,b ,a) . T-*« Notice that in the case of the initial null history h. (0) , the present . value of time average expected payoff is the same for all beliefs consistent with W The emphasize this, we write a. For 5 each player < W^(h^(0) ,b^ and W^(h^(0),b^,a,6) and strategy i is an e-Nash a equilibrium if for a' W^((aj^.a_^),fi) ^ W^(£7,5) + (2.1) in place of .a) a strategy profile 1, T W. (a) and (cr,5) £ There is also a corresponding notion of a truncated e-Nash equilibrium , where we require W^(aj^,a_^) < W^(a) + An £ e. -sequential equilibrium is defined in a similar way, except that we require there exist beliefs b consistent with a W^(h^(t),b^,(a^,a_^),5) < W^(h^(t) (2.2) for all players and all histories i, Notice that history). rather than time t. h. (t) such that .b^^.a, 5) + 8^''^ e (not merely the null initial is measured as a present value at time W, 1, Consequently, an €-sequential equilibrium is defined so that the gain to deviating at time is of order t e, in time-t units. Notice that this definition is stronger than the usual version, found in Fudenberg and Levine [1983] at time For t , ordinarily the gain to deviating for example: is measured in time-0 units, not time-t units. 5-1, we will impose a regularity condition on the equilibrium. A uniform Nash equilibrium is a profile T (2.3) W.(cr) (2.4) for all wT(cr: ,a 1 1 a such that converges, and p > 0, .) < wT(cr) 1 -1 3r, such that + p. T > r implies A uniform sequential equilibrium is a profile T W. (h. (t) ,b (2.5) converges for all histories ,a) consistent beliefs b, 3r T > such that strategies a . r h.(t) and and there exist consistent beliefs (2.6) such that a b such that for all implies for all histories p > 0, and h.(t) , W^(h.(t),b.,(al^,a_.) < wj(h^(t),b.,a) + p. When a satisfies condition (2.3) we call it repular we say it is sequentially regular : if it satisfies (2.5) . Our definitions of time average equilibrium require that the appropriate regularity condition be satisfied. In other words, we have required that the time-average payoff exist if no player deviates, and that there be a uniform bound on the gain to deviating in any sufficiently long subgame. Both conditions are responses to our unease in comparing lim infs or lim sups of payoffs when the limit does not exist. We impose the condi- tions because we interpret time averaging as the idealized version of a game with long finite horizon or very little discounting. Below we show that non-uniform equilibria cannot always be interpreted in this way. 3 . Discounting and Time Averaging Out goal is to characterize the set of equilibrium payoffs when players are very patient. In the next section we characterize truncated €-Nash equilibrium payoffs with small £ as containing a certain set V*. Since this notion of equilibrium is not the most economically meaningful, we first show that this is in fact a strong characterization of payoffs. Specifically we show Theorem 3.1 numbers Suppose there exists a sequence of times : and strategy profiles -^ e such that a T non-negative , is a T -truncated a T time-average -Nash equilibrium and e W (a - v. ) There exists a sequence of discount factors (A) numbers t n_, - ,-.-, and strategy profiles sequential equilibrium for and 6 a -I, 5 ,, n W. (a Then . non-negative such that ,5 ) -* v. a n.is an W that Remark 1 (cr.l) 2 a such - V It is well known that the hypotheses of the theorem imply that : s equilibrium (see for example, Sorin [1988]). is a time-average Nash Remark n- . There exists a uniform sequential equilibrium strategy profile (B) £ The hypothesis is weak in the sense that only approximate Nash : equilibria are required. However, as long as we consider approximate equi- librium, or infinite time -average equilibrium, sequentlality has little force. and (B) show that (A) Roughly, the reason is that after a deviation, no further deviations are anticipated, so the cost of punishment must be only paid once. patient. This has negligible cost if players are very It is easily seen in the following example, which is a special case of the main theorem in the next section. Consider a game in which players perfectly observe each others past play, and suppose there is a pure action profile players. periods a for which Clearly there is a discount factor K, such that if by his opponents for deviating from a. K 5 > £ i 1 > 5, > and number of the loss to each player of being minraaxed periods exceeds any possible one-shot gain of Consider then a strategy profile consisting of a review phase in which all players play the strictly exceeds the minmax for all g(a) of which, player i is a, and n minmaxed for different punishment phases in K periods. Initially, the , 10 game is in the review phase, and the review phase continues as long as no Whenever player deviates, or more than one player deviates simultaneously. a single player deviates, begins. a punishment phase against that player immediately The punishment phase continues periods, regardless of whether K any further deviation occurs, then the game restarts with a new review phase. 6 > £ , Since no player can profitably deviate during a review phase for these strategies form a Nash equilibrium with payoffs g(a) It is . equally clear that they are not generally sequential (subgame perfect) because players may profitably deviate during punishment phases. In proving the folk theorem, Fudenberg-Maskin (1986a) construct strategies that induce the players to carry out punishments by providing "rewards" for doing so. strategies. But such rewards are not provided by our review the Fudenberg-Maskin construction is only possible if the Indeed, game satisfies a "full -dimensionality" condition that we have not imposed. Thus the conclusion of (A) cannot in general be strengthened to The review strategies do, however, form an with £(5) as -+ 5 -* 1. € (5) £ - 0. -sequential equilibrium (Note that with observable actions, sequential equilibrium and subgame -perfect equilibrium are equivalent.) To show this, we need only calculate the greatest gain to deviating during a punishment phase. Recall that any profile. to zero as is the greatest one -shot gain to any d Consequently, the gain is at most 6 -> 1. player of deviating from which. clearly goes (l-5)Kd, The point is that the punishment lasts only K periods, and players never expect to engage in punishment again. Consequently, with extreme patience, the cost of the punishment is negligible. In particular, with time -averaging this is an exact equilibrium. Proof of (A) : Consider the strategy then starting over and playing a a from of playing T +1 to from a 2T , 1 to T and so forth. , 11 We call period Fix < 5 t" to 1 t" + "round 1", and any beliefs consistent with 1 Let a history . Let us calculate an upper bound on any be given. h (t)) (h^ (t) a "round 2" and so on. 2t" to 1 player's gain to deviating in time-t normalized present value. and a history t < kT t (h^ (t) < (k+l)T , . . . so that , equilibrium, at most During round (1-6)T at most k+2 (1-5)T Further, d. on is independent of what happened during the Since round previous rounds. Since the maximum of per- k. can gain more than k, +1 (k+l)T play from time so that k we know that regardless of the history, no d, player by deviating in round in round is t period gain to deviating is and choose (t)), ,h Fix a time is a truncated time-average k+1 -Nash e can be gained by deviating in this round. e 5(1-5)T can be gained, and so forth. e Conse- T quently, no player can gain more than this quantity approaches choose by I'Hopital's rule. e close enough to S 11 (1-5 )T (d+e /(1-fi that 1 is a 2e a )). 5-+1, As Clearly, then, we may -sequential equilibrium. In addition, because of the repeated structure of a , it is clear that T W.(a",5") W.'^Ca") -* Proof of (B) a 2 from with T as ^ so 1 W. Consider the strategy : 1 -t- 1 to T Let a history a. s"" 12 + T , (h.. (t) (a"", fi"") i a of playing and so forth. , . . . (t)) ,h I . from a to 1 k -* € T W. k (a i can gain at most ) is at most k+1 e , ' in round k+2, k+2 e this implies that the time average gain is zero. , k k+1 -> V. , s is a uniform sequential equilibrium, desired conclusion. , be in round t period as play in the future is independent of play in this round. period gain at T Fix any beliefs consistent be given and let By deviating in the current round, player k. ^ v, 1 . d per The per- Since Since and this yields the I 12 Each of the two conclusions of Theorem 3.1 has strengths and weaknesses. The limit of discounting is more appealing than time -averaging, which suggests the interpretation (A) but exact equilibrium is more appealing than , approximate equilibrium, which suggests the interpretation (B) . In this it is worth emphasizing reiterating that the conclusion of (A) cannot context, be strengthened to e =0. One interpretation of this fact is that the set of time -average equilibria includes not only the limits of exact discounted equilibria, but the limits of approximate discounted equilibria as well. If we weaken definition (2.6) to allow the gain to deviating, depend on T and t, than a uniform one. to p, we have a time-averape sequential equilibrium , rather That these equilibria include all limits of discounted sequential equilibria as shown by Proposition 3.2 If : a is sequentially regular, equilibrium for discount factor 5 with e(5) -+ and is an 0, then e a (5) -sequential is a time- average sequential equilibrium. Proof 5 - 1 : , This effectively follows from the fact that the limit points, as of the normalized present value of a sequence of payoffs are contained in the set of limit points of the finite time averages. implies that if a is sequentially regular, This then the discounted value of payoffs along the continuation equilibrium approaches the value with time averaging. Deviations from the continuation payoff when evaluated with the discounting criterion yield values whose limits are no greater than the lira sup of the time average. It is worth noting that the converse of this proposition is not true. An example of a time -average equilibrium that is not even an approximate equilibrium with discounting can be found in the following "guru" game: I 13 Player 1 period. He chooses one of three actions is the guru. Player 2 or 0,1, each 2, He also chooses one of two actions. is a disciple. Each period the guru receives zero, while the disciple receives an amount equal to the action chosen by the guru. In other words, the guru is comp- letely indifferent, while the disciple cares only about what the guru does. Notice that sequentiality reduces to subgame perfection in this example, and Any strategy by the guru is optimal that subgame perfection has no force. in any subgame, while any Nash equilibrium need only be adjusted for the disciple so that he chooses an optimum in subgames not actually reached with (There is a trivial complication in positive probability in equilibrium. that the guru has strategies such that no optimum for the disciple exists.) Consider the following equilibrium with time averaging. disciple never played in the past, 2 the guru plays the first period in which the disciple plays period (t+1) to 2t, plays 0, in periods the guru reverts back to 3t+2 disciple always plays 1. 1 As long as the Let 1. denote t Then the guru plays 2. 2t+l, . . . , 2 in 3t+l, and in period Regardless of the history the forever. With time averaging the disciple is completely indifferent between all his deviations, since they all yield a time average of 1. Similarly the guru is clearly indifferent between all his actions. Consequently this is a time-average equilibrium. (In fact, it is also an equilibrium when the players' time preference is represented by the overtaking criterion.) On the other hand, with discount factor we can always find a time t such that by deviating at such a time player 2 is S between 1/2 always gains at least Consequently this configuration is only an £ -equilibrium for S and > 1/2, 1/4, and 1/32. e > 1/32, and 14 in particular, does not converge to zero as e goes to S 1. We would argue that time -averaging is of interest only insofar as it captures some sort of limit of discounting. Consequently, we argue that this equilibrium of the guru game does not make good economic sense. not know whether Proposition If so, average equilibrium. 3 . We do or its converse holds for uniform time- 2 it is a strong argument in favor of restricting attention to uniform equilibria. A Folk Theorem 4. We now restrict attention to a limited, but important class of games, which we call informationallv connected games informationally connected. connected to player action profile TT. ( • ,a. ,a' ,a . other words, at ,) r* n.(',a) player a, j p, to j, k. is directly such that i of player a' k. In has a deviation that can be potentially i , . . . i k We say that plays. if for every player j In i, and such that player despite player for player regardless of how player exists a sequence of players ^ ' for any a. i if there exists a mixed k?*i,j regardless of the play is connected to player i All two-player games are we say that player despite player and a mixed action a detected by player player j?^i N > 2, If . , i with k ^ i.j. In-' i, — i, i — i and there i ?«k p is directly connected to player In other words, a message can always be passed from regardless of which single other player might try to interfere. i .. i A game is informationally connected if every player is connected to every Note that the counterexample is to the strategy profile o not being an approximate equilibrium with discounting. From the folk theorem, we know that there are other strategies that are exact equilibria and yield approximately the same payoff. this is not true in games with a Markov structure and absorbing states, as shown by the example of Sorin [1986]. In fact, his example has equilibrium payoff with time averaging that are not ever approximately equilibrium payoffs with discounting. 15 other player. For an example of such a game, suppose that each player "Average" price is a strictly determines an output level of zero or one. decreasing function of total output, and each player observes an "own price" that is an independent random function of average price, with the property that the distribution corresponding to a strictly higher average price strictly stochastically dominates that corresponding to the lower price. In this case, all players can choose zero output levels, and if any player produces a unit of output, this changes the distribution of prices for all other players, with a deviation by a second player merely enhancing the signal by lowering the distribution of prices still further. v A mutual threat point mutual punishment action and mixed actions a'. . is a payoff vector for which there exists a such that a g.(a!,a ) ^ v for all players With three or more players vector v* i of rainmax values need not be a mutual threat point, as there may not be a single action profile that simultaneously holds all of the players to their minmax When such a profile exists, the game is said to satisfy the "mutual values. minmax property" (Fudenberg-Maskin [1986a]). One example where this property obtains is a repeated quantity- setting oligopoly with capacity constraints where the profile "all players produce to capacity" serves to minmax all of the players. A payoff vector that weakly Pareto- dominates a mutual threat point is called mutually punishable is the the closure of the convex hull of such payoffs : mutually punishable set . We say that a payoff vector exists a mixed action profile player i player j^i, and mixed action n . (» ,a'. ,a .) ?^ a a'., tt. v is with (independently) enforceable if there g(cx) e,.(a'.,a (•,«). .) and such that if for some - v, > v. , then for some other In other words, any improving . . 16 deviation for player The enforceable set i V can potentially be detected by some other player is the closure of the convex hull of the j enforceable Notice that in addition to the static Nash equilibrium, which is payoffs. clearly enforceable, every extremal Pareto efficient payoff is. This is because any extremal payoff is generated by a pure action profile, and any if a action profile is not efficient pure action profile is enforceable: enforceable, one player can strictly improve himself without anyone else knowing or caring. equilibrium at time Note also that if the unconditional play in any Nash t is a mixed (rather than correlated) action profile, it is clear that the profile must be enforceable (except in the infinite time average case, for a negligible fraction of periods). However, it is possible that unenforceable payoffs can be achieved through correlation. Finally, we define the enforceable mutually punishable set V* - V n V, which is closed, convex, and contains at least the convex hull of static Nash equilibrium payoffs. Theorem 4.1 v e V*, £ - : (Folk Theorem) We can now prove: In an informationally connected game, there exists a sequence of times and strategy profiles T of non-negative numbers , such that a if a is a T -truncated time- T average Remark : e -Nash equilibrium and W. (a ) -»• v. . This is a partial folk theorem in that, as remarked earlier, the mutually enforceable payoffs may be a strictly smaller set than the individually rational socially feasible ones. Moreover, in games with imperfect observation of the opponents' actions and three or more players, even the individually rational payoffs may not be a lower bound on the set of payoffs that can arise as equilibria (see Fudenberg-Levine-Maskin for an example) Finally, we do not know whether the set of equilibria can include payoffs 17 not in V*. In review The proof constructs strategies that have three stages. stages, players play to obtain the target payoff At the end of review v. stages players "test" to see if their own payoff was sufficiently close to the target level. Then follows a "communication stage" where players use their actions to "communicate" whether or not the review was passed; the assumption of information connectedness is used to ensure that such communication is possible. Finally, if players learn that the test was failed, they revert to a "punishment state" for the remainder of the game. The idea of using strategies with reviews and punishments was introduced by Radner [1981, 1986] who did not need the communication phase because he studied models with publicly observed outcomes. Lehrer [1986] introduces the idea of a communications phase to coordinate play. is £ Our proof in some ways simpler than Radner 's as we establish only the existence of -equilibria of the finite-horizon games and the appeal to Proposition 3.1. Lehrer 's [1986] proof is more complex than ours because he obtains a larger set of payoffs. The key to proving Theorem 4.1 is a lemma proven in the Appendix: Lemma 4.2 : If a is enforceable, then for Tr^L there exists m. -vectors J of weights and A such that for all a^, S A tt ( • ,a^,Q:_^) > g^(a^,a_^) + S.^.A.7r.(-,a)-g.(a)+r7. J?'! J 1 J This shows how we can reduce the problem of detecting profitable deviations by player deter a.'s 1 i to a one -dimensional linear test. that lead to the information vectors n . (• ,a. ,a l' J^ above the half- space whose existence is asserted in the ' We need only ,) -r Leijima. the problem very similar to one with publicly observed signals. that lie This makes rj 18 Proof of Theorem 4.1 how to find and T such that a Nash equilibrium, and is given. We show The proof First we construct a class of strategy profiles determined by the game and L The constants that are free parameters. i > This clearly suffices. . that depends on a vector of constants and constants f T-truncated time-average lOOe- is a a |w.(a)-v.| < lOOt proceeds in several steps. s and suppose v e V*, Fix : the relative lengths of various phases of play, and the v, will index L will determine i's both the absolute length of the phases and the number of times the phases The length of the game are repeated. equilibrium yielding payoffs within 1 is implicitly We then show how to choose the constants constants. Step T (Payoffs): v e V*, v 6 V Since of lOOe determined by these to make H v. v e V. and This means we can find finite L' -dimensional vectors of non-negative coefficients - r- r- -h to T 1, and of payoffs v V is enforceable and v , ,v are mixed action profiles a h . _L' , and a. a , and non-negative L that sum to L and such that |2:^1i(lV^)Y^-v| < Step 2 L' where v h _,L' summing ,\i /i -v, Z,..iivh mutually punishable. is is a mutual punishment that enforces can find -h-h S,../iv with , - v yields payoff a .. , , such that Corresponding to these v and Moreover, it is clear that we . -dimensional vectors of integers |ZL a lOOe- s (L /L )v -v| < L ,L and e e. (Temporal Structure) We first describe the temporal structure of : the game, which consists of communications stage. Set i' L C repetitions of a review stage followed by a - N(N-2) (N-1) periods, and a communications stage lasts ! . A review stage lasts C C i periods. L H R R L Each stage is p further subdivided into phases; a review stage into h = 1 L R "' , and a communications stage into L C L phases indexed by phases, indexed as 19 The temporal structure is outlined in Figure 3.1; the game described below. (2W2^L^) T = 2' lasts periods. The length of a phase of a review stage depend on whether it is rega- rded as a reward or punishment stage; if it is a reward stage, the h lasts periods; if it is a punishment stage it lasts L i Each phase of a communications stage lasts Q have indices (i,j,k) phases have indices Fixing and so forth. (i,j,2), (i,j,l); k, the next the third index, calculating every permutation of the set, giving rise to Fix a permutation, periods. (N-2) in the block is _,i , ,k) n-2 n-1 ^i the second (i-,i„,k); • i-o ^vj t- The (N-2)(N-1)! indices are determined by taking the set of all players but (i,j) (i corresponding (Each index generally occurs more than once.) to a triple of players. (N-2)(N-1)! periods. L periods. 2 Each communications phase is assigned an index first 2 phase (N-1) ! the k, and blocks of Then the first index and so forth up to (i„,i„,k) The point of this rather complex structure is that because . "^ '^ the game is informationally connected, each player has an opportunity to send a signal to all other players, without being blocked by any other single player. Using the fact that the game is informationally connected, we associate distinct triples ted to a , despite j with certain mixed strategies. we say that k, (i,j,k) i that enables him to communicate with despite r:. define Step 3 a k we say that ijk^,-l to be a (Strategies): , (i,j,k) a.-" We now describe player i j . is connec- and let. the deviation for pla- a. If i is an inactive link, j^ijk^,-l to be a and If is an active link, be the action allowing communication and -^ yer j (i,j,k) is not connected to and arbitrarily . i's strategy. one of two states, a reward state or a punishment state. He may be in He begins the game 20 oiy\ir^ " ^r<^ sVa^i V t- \ -X /I Figure 3.1: Temporal Structure i ^ . 21 The punishment state is absorbing, so once reached, in the reward state. player We must describe how a player plays in rema-ins there forever. i each state, and when he moves from the reward to punishment state. In the reward state and review stage, player divided into h phases, with phase L' phase he plays Z R-h L periods. i' he plays (i',j,k) a. ik -^ . In the punishment state and review stage, player as subdivided into stage, phases, with phase L' phase he plays ing the h q. if a. Player i i = lasting h regards the stage i i R h L periods. i he plays (i',j,k) ' a. ik if i that is an active link. (j,i,k) h, to be the fraction of the player i Itt. (•)-".(• a I i periods in which L \n The . (•) -it . (• ,a )| > e' calz. for any communications phase corresponding to a triple that is an active link, player fraction of the periods in which £ )| > e' , player In all other cases player calculates i occurred. z n.(z.) to be the If switches to the punishment state. i state does not change; i's in particular the punishment state is absorbing. We must show how to choose payoffs within and switches to the punishment state. At the end of the p (j,i,k) i If at the end of the review stage occurred. , ^1 e'. At the end of each reward phase of the review stage player 11 i' can change states only at the end of a review stage, or at transition is determined by a parameter 7r.(z.) ?^ i' the end of a communications phase culates Dur- In the punishment state and communications . in the phase indexed by he plays During the In the reward state and communications stage, in a.. the phase indexed by lasting h regards the stage as i lOOe of v, 2' R , S. , 2 C and e' and no greater than an to give expected lOOe gain to any . 22 player from deviating. First we consider the equilibrium payoffs. Step 4 (Payoffs): Let den- n' ote the probability that no player ever enters the punishment state and let v' be the expected payoff vector conditional on this. V = S, ., (L /L )v The payoff . v differs from v' Let solely due to communi- cations phases, yielding r r /, .x I-, -I |v'-v| (4.1) i < L d d A^+/l^ where recall that < |v-v| d l+(//i^)(LV^) is the greatest difference between any payoffs. Since by construction, we conclude e |w^(a)-v^| < (4.2) + a £ + d-TT') Ll+(iVS(LVS This shows we must take 2 R C /£. very large and close to one. n' show that the weak law of large numbers implies that i R and Step 5 C i are both sufficiently large (relative to (Gain to Deviating): by deviating. n' We will close to one if e'). Next, we consider how much a player might gain During a communications stage (or in any period for that matter) the greatest per period gain is d. However, only R R C C L /{2 L +2 L ) C C £ of the periods in the game lie in communications phases so we have the greatest per period gain over the whole game due to deviations during communications is [communications] (4.3) l+(Ji^/f)l^/lP) Next we consider review stages. possibilities: Fixing a deviator, there are three all other players are in the punishment state; all are in the review state, or they disagree on the state. Of course a player 23 contemplating deviating may not be certain which of these is true, but since he can only benefit from this extra information, we may suppose he knows which case he faces. If all other players are in the punishment state they will remain there regardless of player play. i's punishment phase, player gets at most i T the largest gain over Regardless of how he plays during a that player W. (a) |S(L /L )v. -v. Since v.. | < e, can obtain by deviating i throught the punishment state is (punishment) (4.4) T + v e - W. (a) < + (l-:r') 2£ + d Ll+(iVS(LVS where the inequality follows from (4.2). Next, suppose that other players disagree about the state (so in particular, TT N > 2) Then player . i can possibly get per period. d Let be a lower bound on the probability that all opponents agree on a punishment state at the end of the subsequent communications stage (and therefore agree for the rest of the game) best hope to get d Then the deviating player can at . for one stage with probability periods with probability (I-tt) . Since there are per period for one phase is actually worth [confusion] (4.5) — d/£' n, £' and d in all review phases, d yielding + (l-7r)d. This leaves review stages where all opponents are in the reward state. Let I-tt' more than stage. be an upper bound on the probability that player 2e i both gains and no opponent enters the punishment state at the end of the Then player i can at best gain 2e in all periods, and can gain . 24 more than and remain with all opponents in the reward state next stage 2f with probability no more than i I-tt' Since at best he can gain . d, player gains at most [reward] (4.6) 2e + j;^ + (l-7r')d. in reward stages. Adding these bounds, we find an upper bound on the gain to deviating [total] (4.7) r> ^r ^^ + r + u TT + (l-^)d + l+(iV^ )lVl^ Our goal is to find i'.i smaller than lOOf Step Fix 2' so that smaller than ^2 where 6 : R C d/2' < — e + d(l-7r') + dd-^^'). to simultaneously make ,(.' ,2 ^ C Let e. 2 (2 R ) (4.2) and (4.6) be the largest integer . 1+(1/7)(lV^) We may then simplify (4.2) to |wT(a)-v.| < 2e + (l-^')d, (4.2') and (4.7) to 8£ + (4.7') (l-^)d + (l-7r')d + (l-:;^')d. Consequently, it suffices to show that we can choose C when - 2 C 2 {2 R - ) , (I-tt'), (I-tt) (I-tt') , 2 and e' so that are all less than or equal to e/d. By Lemma A.l in the Appendix, for the h and such that for all 2 either i e' < e gains less than or equal and le 2 reward phase there exists > 2 the probability that or at least one opponent switches c 25 - to punishment at the end of the stage is at least Consequently, for By Lemma A. 2 e' < min e in the Appendix, < "" and e-^ iki play > £ phase is at least 2 > max i^ - 2 we have , i^(/) (1-e/d) / and Fix then e' > 1 > i**, tt' > 1 - e/d. such that for all and Note that e/d. - I players k' i ?^ < min e' > max £ e 2 -^ , , provided . By the > (l+7)i^JV7. - min( - lei i-^ plays a. j e'^, e^^^) i* - max{ and , 2** weak law of large numbers, there exists an 2' i and > n' that are active (j,k,i) Consequently, for . tt we have , 1/L^ switching to punishment at the end of the k the probability of ar., lei e-' if player i^ i for the i communications links there exist e' > max i , (1-e/d) Choosing e/d. 2' i'^ , (l+7)i^^ /-y) such that for = max{i*,^**) then completes the proof. Remark : How frequently does punishment occur? As T «, -+ the proof shows Examining the proof of that the probability of punishment goes to zero. Theorem 3.1, we see that infinite equilibria are constructed by splicing For approximate discounted together a sequence of truncated equilibria. equilibria, the construction repeats the same equilibrium over and over, so that the probability of punishment in each round is constant. By the zero- one law, this means punishment occurs infinitely often. In the case of time average equilibria, it is irrelevant whether punishment occurs infinitely often or not. The truncated equilibria that are spliced together have decreasing probability of punishment, say »r..,7r„,... -»• 0. By the zero-one law, infinitely often, while if finite time. if Z .. tt = co, punishment occurs CO S - tt < ", punishment almost surely stops in However, we can choose a subsequence [n' ) such that 26 2 tt' < 00. The corresponding truncated equilibria when spliced together form a time average equilibrium in which punishment almost surely ceases. On the other hand, we may form a sequence in which the equilibrium is repeated 1/t times. truncated t Splicing this sequence together clearly yields a time-average equilibrium with CO S tt' - <», and so punishment occurs infinite often. This discussion should be contrasted with the results of Radner [1981] in which punishment is infinitely often, and Fudenberg.Kreps and Maskin [1986a], in which it almost surely ceases. 27 APPENDIX Proof of Lemma 4.2 We may regard : as having components tt . and j?*i , Z. g + .M. Define . space such that there exists a mixed strategy = 7r.(»,a.,Q n. L J^ j and .) -1 with components < g.(a.,a .). '^1 g. ^1 and let 7r.(',a ), enforceability implies A > 0, for player . be the tt E. (7r,g.+A) Then ). vector .M. J {n,g.) G F Moreover, F. with 1 Jr^l - g. (a g. to be the subset of this F /i Let -L 1 dimensional Euclidean space 1 F and for is a convex polyhedral set and its extreme points correspond to pure strategies for player i. Since is convex polyhedral, F linear inequalities > where +77 g. where X(n-n) - {n,g e F ) is a matrix, A Since - r? G (7r,g.) Att - g. and , Att - g. + rj , and b Then the desired conclusion. a feasible solution to the primal exists, F, by the subject to this (g.-g.) Suppose this problem has a solution equal to zero. constraint. Att + b(g.-g.) > c, A(7r-7r) Consider minimizing are vectors. c we may characterize so a minimum of zero exists, if and only if the dual has a feasible plan yielding The dual is to maximize zero. - (A,-l) (cok,dh) tj In other words, A — wA is . A > 0, bA w>0, wc-0, cob--l, 3^ (T.g.) 6 F, A(7r-7r) , c + > c. so b(g'. -g.) > c Moreover, if We conclude for some component . p. - w q q ?^ p and - w P 1. A(7r-7r) > b. implying that + b(g.-g.) > c Finally, for - 0, b c P then then the desired solution. > g. g'. > 0. if we can find Recall that and subject to aic < 0. Choose P This is the desired solution. I 1 28 Lemma A. Suppose players : are in the reward state and follow their ji^i equilibrium strategies in review phase an and e such that for i. Prob{3j^i|^^(-)-7r.(-,a'^)| < J < e' e' and e and For every h. (1/L^i^) is player P roof -? , r.(t) > g.(a^) + 2e} < ^ (Recall that i. (t) r = weights m. -vectors of ii^i such that rj .A.7r.(«,a.,Q: E. J*^^ J V\ S. .A.-7r.(',Q x(t) = S. .) -1 J > g.(a,,a ''i 1 + .) -1 r? and 1-1 ) — B.(a + ) ri Consider then the random variable . where .A.(z.(t)), J'^i J random variable is the component of A.(z.(t)) J corresponding to players. 1 J and A. J a scalar /3 ^ realized payoff in period t.) i's From Lemma 4.2 there exist for : ^ S t=l J regardless of the strategy used by player r.(z.(t)) i there exists P > J z.(t) J J The idea of this construction is to use a single z.. to summarize the information received by all other x(t) Let L'V x(t) = X. S X.n^(-) " (1/L^i^) 2 j^i ^ t=l -" It is clear then that there exists then |7r.-7r.(«,Q; )| > for some e x > g. (a + e such that if j. That is, if the sample mean far from its theoretical distribution under observe that his empirical information tx . a ) then some player r? x j?^i i had played a Consequently, it suffices to show with . L^/ p. " = 2 t-1 r.(t) that ^ Prob{x < g.(a Fix a strategy for player i ) + r? + e and and consider £ is will is far from what it would be if J player + p. > g.(a ) + 2£) < 1-/9. 2 29 i(t) - x(t) E[x(t)|x(t-1) - r(t-l) ]. These are uncorrelated random variables with zero mean and are bounded independent of the particular strategy of player r(t) = r(t) tions apply to E[r(t) |x(t-l) - , . the weak law of large numbers shows that as ity uniformly over strategies for player mixed action at conditional on t Z R r > g. (a Lemma A . ) -r - r} < -i) implies x - r < Prob{x + 2e Suppose player : is an active -1 < e^^^ Prob ^j.'^^ ( I Proof : tt, k tt, k may find a Att m, (..a-^ (•,q-' i ,a. ,a there exists x(t) = x(t) £-' - =0. A such that if E[x(t) |x(t-l) , . 11 k'r'i r, rj. and (j,k,i) follow their equii lei. there exists an /3 lei . i lei- and e-^ i-" /3. for different ,) iki tt . . ] (•,q-' and a scalar x(t) -A(z, (t)). Set > g (a (t),a'^ ) + -1 punishment state, and that -J-i 1 -vector of weights ^) i's x<g(a)+r7+e 7r^(.,aJ^^)| > £') > - i Ici. = be player this gives the desired conclusion. convex, and by assumption does not contain and ) This is similar to, but simpler than Lemma A.l. set of vectors in probabil- i^ > i^^^ and ( . ) > Consequently . r(t-l)....] + If all players 1 ] Then 1 Since , is in the j librium strategies then for every e' e . Q:.(t) (.,a (t),a^ J . - rj communication phase. such that for -^ . x,r->0 - E[r(t)|x(t-1) Consequently . r(t-l) TT J , — — -", Let i. A Js^l r(t-l) , . x(t-l) r(t-l),...] - S E[x(t)|x(t-1) . Similarly considera- i. x > r?/2 has x , ). > r; -tt . is compact a. i ( • such that ,a • Consequently, we A^^^^(.) Again, \t\-^ Observe that the ) | > -> £-^ Att x, . > so Again converging uniformly to zero in rj 30 probability. Since E[x(t) x(t-l) ,...]= | get the desired conclusion. Att^C . ,aJ^\a. (t) .a^^f^ )> ,7 , we , 31 REFERENCES Abreu, D., D. Pearce and E. "Toward a Theory of Stacchetti (1987), Discounted Repeated Games With Imperfect Monitoring," unpublished ms . Harvard. "Optimal Cartel Monitoring With Imperfect Information," (1986), , Journal of Economic Theory Friedman, J. . "A Non-Cooperative Equilibrium for Supergames," Review (1972), of Economic Studies . 38, Fudenberg, D., D. Kreps and 1-12. E. Maskin (1989), "Repeated Games With Long Run and Short Run Players," mimeo Fudenberg, D., Levine D. 251-69. 39, , and E. , MIT. Maskin (1989), "The Folk Theorem With Imperfect Public Information," mimeo. Fudenberg, D. , and E. Maskin (1986a), "Folk Theorems for Repeated Games With Discounting and Incomplete Information," Econometrica (1986b), unpublished ms (1989) . , , 54, 533-54. "Discounted Repeated Games With One-Sided Moral Hazard," MIT. "The Dispensability of Public Randomizations in Discounted Repeated Games," unpublished ms Fudenberg, D., and . D. . , MIT. Levine [1983], "Subgame Perfect Equilibria of Finite- and Infinite-Horizon Games," Journal of Economic Theory . 31 (December), 251-63. Green, E. , and R. Porter (1984), "Noncooperative Collusion Under Imperfect Price Information," Econometrica Kreps, D. , 863-94. . 52, 87-100. and R. Wilson (1982), "Sequential Equilibrium," Econometrica . 50, 32 Lehrer, "Two-Player Repeated Games With Non-Observable Actions (1985), E. and Observable Payoffs," mimeo, Northwestern. (1988a) "Extensions of the Folk Theorem to Games With Imperfect , Monitoring," mimeo, Northwestern. "Internal Correlation in Repeated Games," Northwestern, (1988b), CMSEMS Discussion Paper #800. "Nash Equilibria of n-Player Repeated Games With Semi- (1988c), Standard Information," mimeo, Northwestern. "Two Players' Repeated Games With Non-Observable (1988d), Actions," mimeo, Northwestern. (1989) "Lower Equilibrium Payoffs in Two-Players Repeated Games , With Nonobservable Actions," forthcoming, International Journal of Game Theory . Porter, R. "Optimal Cartel Trigger-Price Strategies," Journal of (1983), Economic Theory Radner, R. (1981), 29, . 313-38. "Monitoring Cooperative Agreements in a Repeated Principal-Agent Relationship," Econometrica (1985), Econometrica . (1986) ,' R. 1127-48. 1173-98. "Repeated Partnership Games With Imperfect Monitoring and No Discounting," Review of Economic Studies Radner, R. 49, "Repeated Principal Agent Games With Discounting," 53, , . Myerson and E. . 53, 43-58. Maskin (1986), "An Example of a Repeated Partnership Game With Discounting and With Uniformly Inefficient Equilibria," Review of Economic Studies . 53, 59-70. Rubinstein, A. and M. Yaari (1983), "Repeated Insurance Contracts and Moral Hazard," Journal of Economic Theory . 30, 74-97. 33 Sorin, S. (1988), "Asymptotic Properties of a Non-Zero Sum Stochastic Game," International Journal of Game Theory (1986), Stigler, G. Economy . (1961), 69, "Supergames," mimeo , . 15, 101-107. CORE. "The Economics of Information," Journal of Political 213-25. jb' U U0 7 Date Due 24 m\ Lib-26-67 MIT LIBRARIES DUPL 3 TDflD I Da5b7m4 5