REPEATED GAMES WITH LONG-RUN AND SHORT-RUN PLAYERS

By Drew Fudenberg, David Kreps, and Eric Maskin

Working Paper No. 474, January 1988
Massachusetts Institute of Technology, Department of Economics, 50 Memorial Drive, Cambridge, Mass. 02139

M.I.T., Department of Economics; Stanford University, Graduate School of Business; and Harvard University, Department of Economics, respectively.

ABSTRACT

This paper studies the set of equilibrium payoffs in games with long- and short-run players and little discounting. Because the short-run players are unconcerned about the future, equilibrium outcomes must always lie on their static reaction (best-response) curves. The obvious extension of the Folk Theorem to games with this constraint would simply include the constraint in the definitions of the feasible payoffs and of the minmax values. This extension does obtain under the assumption that each player's choice of a mixed strategy for the stage game is publicly observable, but, in contrast to standard repeated games, the limit value of the set of equilibrium payoffs is different if players can observe only their opponents' realized actions.

1. Introduction

The "folk theorem" for repeated games with discounting says that (under mild conditions) each individually rational payoff can be attained in a perfect equilibrium for a range of discount factors close to one. It has long been realized that results similar to the folk theorem can arise if some of the players play the constituent game infinitely often and others play the constituent game only once, so long as all of the players are aware of all previous play.
A standard example is the infinitely repeated version of Selten's [1977] chain-store game, where a single incumbent faces an infinite sequence of short-run entrants in the game depicted in Figure 1. Each entrant cares only about its one-period payoff, while the incumbent maximizes its net present value. For discount factors close to one there is a perfect equilibrium in which entry never occurs, even though this is not a perfect equilibrium if the game is played only once or even a fixed finite number of times. In this equilibrium, each entrant's strategy is "stay out if the incumbent has fought all previous entry; otherwise, enter," and the incumbent's strategy is "fight each entry as long as entry has always been fought in the past; otherwise, acquiesce." Other examples of games with long- and short-run players are the papers of Dybvig-Spatt [1980] and Shapiro [1982] on a firm's reputation for producing high-quality goods and the papers of Simon [1951] and Kreps [1984] on the nature of the employment relationship.

This paper studies the set of equilibrium payoffs in games with long- and short-run players and little discounting. This set differs from what it would be if all players were long-run, as demonstrated by the prisoner's dilemma with one enduring player facing a sequence of short-run opponents. Because the short-run players will fink in every period, the only equilibrium is the static one, no matter what the discount factor. In general, because the short-run players are unconcerned about the future, equilibrium outcomes must always lie on their static reaction (best-response) curves. This is also true off of the equilibrium path, so the reservation values of the long-run players are higher when some of their opponents are short-run, because their punishments must be drawn from a smaller set.
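The incumbent's incentive in the chain-store example above can be sketched numerically. Since Figure 1 is not reproduced here, the stage payoffs below are assumptions chosen for illustration: the incumbent earns 2 per period of deterred entry, 0 when entry is accommodated, and -1 in a period it fights.

```python
# Sketch of the incumbent's one-shot deviation constraint in the repeated
# chain-store game.  Payoff values are hypothetical (Figure 1 is not
# reproduced): monopoly = 2 per period without entry, fight = -1 in a
# fighting period, accommodate = 0 when entry occurs and is accommodated.

def fight_is_credible(delta, monopoly=2.0, fight=-1.0, accommodate=0.0):
    """Off the equilibrium path, fighting today (and deterring all future
    entry) must beat acquiescing today (and facing entry forever after)."""
    v_fight = (1 - delta) * fight + delta * monopoly       # normalized value
    v_acquiesce = (1 - delta) * accommodate + delta * 0.0  # entry forever
    return v_fight >= v_acquiesce

# With these payoffs the constraint reduces to delta >= 1/(1 + monopoly).
threshold = 1.0 / (1.0 + 2.0)
print(fight_is_credible(0.5), fight_is_credible(0.2), threshold)
```

The threshold 1/(1 + monopoly) illustrates the "discount factors close to one" qualifier: the no-entry equilibrium exists exactly when the incumbent is patient enough.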
The perfect folk theorem for discounted repeated games (Fudenberg-Maskin [1986]) shows that, under a mild full-dimensionality condition, any feasible payoffs that give all players more than their minmax values can be attained by a perfect equilibrium if the discount factor is near enough to one. The obvious extension of this result to games with the constraint that short-run players always play static best responses would simply include that constraint in the definitions of the feasible payoffs and of the minmax values. Propositions 1 and 2 of Section 2 show that this extension does obtain under the assumption that each player's choice of a mixed strategy for the stage game is publicly observable.

We then turn to the more realistic case in which players observe only their opponents' realized actions and not their opponents' mixed strategies. While in standard repeated games the folk theorem obtains in either case, when there are some short-run players the set of equilibria can be strictly smaller if mixed strategies are not observed. The explanation for this difference is that in ordinary repeated games, while mixed strategies may be needed during punishment phases, they are not necessary along the equilibrium path. In contrast, with short-run players some best responses, and thus some of the feasible payoffs, can only be obtained if the long-run players use mixed strategies. If the mixed strategies are not observable, inducing the long-run players to randomize may require that "punishments" occur with positive probability even if no player has deviated. For this reason the set of equilibrium payoffs may be bounded away from the frontier of the feasible set. Proposition 3 of Section 3 provides a complete characterization of the limiting value of the set of equilibrium payoffs for a single long-run player. This characterization, and the results of Section 2, assume that players have access to a publicly observable randomizing device.
The device is used to implement strategies of the form: if player i deviates, then players jointly switch to a "punishment equilibrium" with some probability p < 1. While the assumption of public randomizations is not implausible, it is interesting to know whether it leads to a larger limit set of equilibrium payoffs. Proposition 4 in Section 4 shows that it does not: we construct "target strategies" in which a player is punished with probability one whenever his discounted payoff to date exceeds a target value, and show that these strategies can be used to obtain as an equilibrium any of the equilibrium payoffs that were obtained via public randomizations in Proposition 3.

Proposition 3 shows that not all feasible payoffs can be obtained in equilibrium, so in particular we know that some payoffs cannot be obtained with the target strategies of Proposition 4. Inspection of that construction shows that it fails for payoffs that are higher than what the long-run player can obtain with probability one given the incentive constraint of the short-run players: for payoffs this high, there is a positive probability that player 1 will suffer a run of "bad luck" after which no possible sequence of payoffs could draw his discounted normalized value up to the target. As this problem does not arise under the criterion of time-average payoffs, one might wonder if the set of equilibrium payoffs is larger under time-averaging. Proposition 5 shows that the answer is yes. In fact, any feasible incentive-compatible payoff can arise as an equilibrium with time-averaging, so that we obtain the same set of payoffs as in the case where the players' private mixed strategies are observable. This discontinuity of the equilibrium set in passing from discounting to time averaging is reminiscent of a similar discontinuity that has been established for the equilibria of repeated partnership games (Radner [1986], Radner-Myerson-Maskin [1986]).
The relationship between the two models is discussed further in Section 5. We have not solved the case of several long-run players and unobservable mixed strategies. Section 5 gives an indication of the additional complications that this case presents.

2. Observable Mixed Strategies

Consider a finite n-player game in normal form, g: S_1 × ... × S_n → R^n. We denote player i's mixed strategies by σ_i ∈ Σ_i, and write g(σ) for the expected value of g under the distribution σ. In this section we assume that a player can observe the others' past mixed strategies. This assumption (or a restriction to pure strategies) is standard in the repeated-games literature, but as Fudenberg-Maskin [1986, 1987a] have shown, it is not necessary there. (Here it matters — see the next section!) We will also assume that the players can make their actions contingent on the outcome of a publicly observable randomizing device.

Label the players so that players 1 to j are long-run and j+1 to n are short-run. Let B: Σ_1 × ... × Σ_j → Σ_{j+1} × ... × Σ_n be the correspondence that maps any strategy selection (σ_1, ..., σ_j) for the long-run players to the corresponding Nash equilibrium strategy selections for the short-run players. If there is only one short-run player, B is his best-response correspondence. For each i from 1 to j, choose m^i = (m^i_1, ..., m^i_n) ∈ graph(B) so that m^i solves

    min_{m ∈ graph(B)} max_{σ_i} g_i(σ_i, m_{-i}),

and set

    v̲_i = max_{σ_i} g_i(σ_i, m^i_{-i}).

(This minimum is attained because the constraint set graph(B) is compact and the function max_{σ_i} g_i(σ_i, m_{-i}) is continuous in m_{-i}.) The strategies m^i_{-i} minimize long-run player i's maximum attainable payoff over the graph of B. The restriction to this set reflects the constraint that the short-run players will always choose actions that are short-run optimal. Given this constraint, no equilibrium of the repeated game can give player i less than v̲_i. (In general, the short-run players could force player one's payoff even lower using strategies that are not short-run optimal.)
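The constrained minmax v̲_i can be computed by brute force in small games. The sketch below uses a hypothetical product-quality game (a long-run firm choosing High or Low quality against a short-run consumer choosing Buy or Not); the payoff numbers are assumptions for illustration only, not taken from the text.

```python
# A minimal sketch of the constrained minmax over graph(B): sweep the
# long-run player's mixed strategies, collect every short-run best
# response, and minimize the long-run player's best-reply payoff.
# The payoffs are hypothetical, chosen only to illustrate the definition.

import numpy as np

# g_firm[a_firm][a_cons], g_cons[a_firm][a_cons]; actions: 0 = H/Buy, 1 = L/Not
g_firm = np.array([[1.0, -1.0], [2.0, 0.0]])
g_cons = np.array([[1.0, 0.0], [-1.0, 0.0]])

best = None  # will hold (v_lower, p_H at a minimizing point of graph(B))
for p_h in np.linspace(0.0, 1.0, 1001):        # firm's probability of High
    mix = np.array([p_h, 1.0 - p_h])
    cons_payoffs = mix @ g_cons                # consumer's payoff to Buy / Not
    # every short-run best response at this mix belongs to graph(B)
    for b in np.flatnonzero(np.isclose(cons_payoffs, cons_payoffs.max())):
        firm_best = g_firm[:, b].max()         # firm's best-reply payoff
        if best is None or firm_best < best[0]:
            best = (float(firm_best), float(p_h))

print(best)  # the constrained minmax and a minimizing firm strategy
```

In this hypothetical game the consumer refuses to buy whenever the firm's probability of High quality is below 1/2, so the firm's constrained minmax is 0: the punishing point of graph(B) has the consumer staying out.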
Note that m^i specifies player i's strategy m^i_i, which need not be a best response to m^i_{-i}: player i must play in a certain way to induce the short-run players to attain the minimum in the definition of m^i. In order to construct equilibria in which player i's payoff is close to v̲_i, player i will need to be provided with an incentive to cooperate in his own punishment.

In the repeated version of g, we suppose that long-run players maximize the discounted normalized sum of their single-period payoffs, with common discount factor δ. That is, long-run player i's payoff is (1-δ) Σ_{t=0}^∞ δ^t g_i(σ(t)). Short-run players in each period act to maximize that period's payoff. All players, both long- and short-run, can condition their play on all previous actions.

Let U = {v = (v_1, ..., v_j) | ∃σ ∈ graph(B) with g(σ) = v}; let V be the convex hull of U; and let V* = {v ∈ V | v_i > v̲_i for all i from 1 to j}. We call payoffs in V attainable payoffs for the long-run players. Only payoffs in V* can arise in equilibrium. We begin with the case of a single long-run player.

Proposition 1: If only player one is a long-run player, then for any v_1 ∈ V* there exists a δ̲ ∈ (0,1) such that for all δ ∈ (δ̲,1), there is a subgame-perfect equilibrium of the infinitely repeated game with discount factor δ in which player one's discounted normalized payoff is v_1.

Proof: Fix v_1 ∈ V* and consider the following strategies. Begin in Phase A, where players play a σ ∈ graph(B) (or a public randomization over such σ's) that gives player 1 payoff v_1. Deviations by the short-run players are ignored. If player one deviates, he is punished by players switching to the punishment strategy m^1 for T(δ) periods, after which play returns to Phase A; if T(δ) is large enough, deviations in Phase A are unprofitable. Now m^1_1 need not be a best response against m^1_{-1}, so we must ensure that player one does not prefer to deviate during the punishment phase.
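The proof turns on choosing a punishment length T(δ) long enough to deter deviations in the normal phase, yet short enough that the punished player is willing to sit through his own punishment. The two constraints can be explored numerically; the payoff values below are hypothetical (target payoff 2, minmax 1, best one-period payoff 3, payoff 0 during punishment).

```python
# Numerical sketch of the two constraints on the punishment length T(delta)
# in the proof of Proposition 1, with hypothetical payoff values:
#   v1 = 2 (target), vlow = 1 (minmax), vbar = 3 (best one-period payoff),
#   g1m = 0 (player one's payoff while the punishment m^1 is played).

def valid_lengths(delta, v1=2.0, vlow=1.0, vbar=3.0, g1m=0.0, t_max=500):
    """Return the punishment lengths T that (c1) make a Phase-A deviation
    unprofitable and (c2) keep the punishment-phase value at least vlow,
    so the punished player prefers conforming to restarting punishment."""
    ok = []
    for T in range(1, t_max):
        # c1: best deviation payoff today + punishment + return <= target
        c1 = (1 - delta) * vbar + delta * (1 - delta**T) * g1m \
             + delta**(T + 1) * v1 <= v1
        # c2: value at the start of the punishment phase >= minmax
        c2 = (1 - delta**T) * g1m + delta**T * v1 >= vlow
        if c1 and c2:
            ok.append(T)
    return ok

lengths = valid_lengths(0.99)
print(lengths[0], lengths[-1])  # a non-empty range of lengths exists
```

As δ rises toward 1, the admissible window of punishment lengths widens, which is the content of the final step of the proof.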
This is done by specifying that a deviation in this phase restarts the punishment. Since the most that player 1 can obtain in any period of the punishment phase is v̲_1, he will prefer not to deviate so long as T(δ) is short enough that player 1's normalized payoff at the start of the punishment phase is at least v̲_1. Let v̄_1 = max_{σ ∈ graph(B)} g_1(σ). The two constraints on T(δ) will be satisfied if:

(1)    (1-δ)v̄_1 + δ(1-δ^{T(δ)})g_1(m^1) + δ^{T(δ)+1} v_1 ≤ v_1,

or equivalently

(1')   δ^{T(δ)+1} ≤ (v_1 - δ g_1(m^1) - (1-δ)v̄_1) / (v_1 - g_1(m^1)),

and

(2)    (1-δ^{T(δ)})g_1(m^1) + δ^{T(δ)} v_1 ≥ v̲_1,

or equivalently

(2')   δ^{T(δ)} ≥ (v̲_1 - g_1(m^1)) / (v_1 - g_1(m^1)).

The right-hand sides of inequalities (1') and (2') have the same denominator, and for δ close to 1 the numerator of (1') exceeds the numerator of (2'). Then, since δ^T is approximately continuous in T for δ close to 1, we can find a δ̲ < 1 such that for all greater δ there is a T(δ) satisfying (1') and (2'). Q.E.D.

In repeated games with three or more players, a full-dimensionality condition is required for all feasible individually rational payoffs to be enforceable when δ is near enough to one. The corresponding condition here is that the dimensionality of V* equals the number of long-run players.

Proposition 2: Assume that the dimensionality of V* equals j, the number of long-run players. Then for each v in V*, there is a δ̲ ∈ (0,1) such that for all δ ∈ (δ̲,1) there is a subgame-perfect equilibrium of the infinitely repeated game with discount factor δ in which player i's normalized payoff is v_i.

Remark: The proof of Proposition 2 follows that of Fudenberg-Maskin's Theorem 2: if a (long-run) player deviates, he is punished long enough to wipe out the gain from deviation. To induce the other (long-run) players to punish him, they are given a "reward" at the end of the punishment phase. One small complication not present in Fudenberg-Maskin is that, as in Proposition 1, the player being punished must take an active role in his punishment.
This, however, can be arranged with essentially the same strategies as before.

Proof: Choose v' in the interior of V* and ε > 0 so that for all i from 1 to j, v'_i + ε < v_i and the payoff vector (v'_1+ε, ..., v'_{i-1}+ε, v'_i, v'_{i+1}+ε, ..., v'_j+ε) is in V*. Let σ (or a public randomization over several σ's) be a joint strategy such that g(σ) = v, and let σ^i be a joint strategy that yields v'_k + ε to each long-run player k ≠ i and yields v'_i to i. Let w^i_k = g_k(m^i) be player k's per-period payoff when i is being punished with the strategies m^i. For each i, choose an integer N_i so that

    v̄_i + N_i v̲_i < (N_i + 1) v'_i,

where v̄_i = max_s g_i(s) is i's greatest one-period payoff. Consider the following repeated-game strategy for player i:

(0) Obey the following rules regardless of how the short-run players have played in the past.

(A) Play σ_i each period as long as all long-run players played σ last period, or if σ had been played until last period and two or more long-run players failed to play σ last period. If long-run player j deviates from (A), then begin phase (B_j).

(B_j) Play m^j_i for N_j periods, and then

(C) play σ^j_i thereafter. If long-run player k deviates in phase (B_j) or (C), then begin phase (B_k), i.e., restart with j = k. (As in phase (A), players ignore simultaneous deviations by two or more long-run players.)

As usual, it suffices to check that in every subgame no player can gain from deviating once and then conforming. The condition on N_j ensures that for δ close to one, the gain from deviating in Phase (A) or Phase (C) is outweighed by Phase (B)'s punishment. If player j conforms in (B_j) (i.e., when she is being punished), her payoff is at least (1-δ^{N_j})w^j_j + δ^{N_j}v'_j, which exceeds v̲_j if δ is close enough to one. If she deviates once and then conforms, she receives at most v̲_j the period she deviates and postpones the payoff v'_j, which lowers her payoff. If player k ≠ j deviates in Phase (B_j), she is minmaxed for the next N_k periods, and Phase-(C) play will then give her v'_k instead of v'_k + ε. Thus it is easy to show that such a deviation is unprofitable. (See Fudenberg-Maskin for the missing computations.) Q.E.D.
3. Unobservable Mixed Strategies

We now drop the assumption that players can observe their opponents' mixed strategies, and instead assume they can only observe their opponents' realized actions. In ordinary repeated games, (privately) mixed strategies are needed during punishment phases, because in general a player's minmax value is lower when his opponents use mixed strategies. However, mixed strategies are not required along the equilibrium path, since desired play along the path can be enforced by the threat of future punishments. Fudenberg-Maskin showed that, under the full-dimension condition of Proposition 2, players can be induced to use mixed strategies as punishments by making the continuation payoffs at the end of a punishment phase depend on the realized actions in that phase in such a way that each action in the support of the mixed strategy yields the same overall payoff.

In contrast, with short-run players some payoffs (in the graph of B) can only be obtained if the long-run players privately randomize, so that mixed strategies are in general required along the equilibrium path. As a consequence, the set of equilibrium payoffs in the repeated game can be strictly smaller when mixed strategies are not observable. This is illustrated by the following example of a game with one long-run player, Row, and one short-run player, Col.

Let p be the probability that Row plays D. Col's best response is M if p < 1/2, L if 1/2 ≤ p < 100/101, and R if p ≥ 100/101. There are three static equilibria: the pure-strategy equilibrium (D,R), a second in which p = 1/2 and Col mixes between M and L, and a third in which p = 100/101 and Col mixes between L and R. Row's maximum attainable payoff is 3, which occurs when p = 1/2 and Col plays L.
              L          M           R
    U       4, 0      -1, 1      -1, -100
    D       2, 2       0, 1       1, 3

                   Figure 3

If Row's mixed strategy is observable, she can attain this payoff in the infinitely repeated game if δ is near enough to 1. If, however, Row's mixed strategy is not observable, her highest equilibrium payoff is at most 2 regardless of δ. To see this, fix a discount factor δ, and let v*(δ) be the supremum over all Nash equilibria of Row's equilibrium payoff. Suppose that for some δ, v*(δ) > 2, and choose ε > 0 and an equilibrium in which player 1's payoff is at least v*(δ) - ε > 2. It is easy to see that the set of equilibrium payoffs is stationary: any equilibrium payoff is an equilibrium payoff for any subgame, and conversely. Thus, the highest payoff player 1 can obtain starting from period 2 is also bounded by v*(δ). Since player 1's payoff is the weighted average of her first-period payoff and her expected continuation payoff, player 1's first-period payoff must be at least v*(δ) - ε/(1-δ). For ε sufficiently small, this implies that player 1's first-period payoff must exceed 2.

In order for Row's first-period payoff to be at least 2, Col must play L with positive probability in the first period. As Col will only play L if Row randomizes between U and D, Row must be indifferent between her first-period choices, and in particular must be willing to play D. Let v_D be Row's expected payoff from period 2 on if she plays D in the first period. Then we must have

(3)    2(1-δ) + δ v_D ≥ v*(δ) - ε.

But since v_D ≤ v*(δ), and ε was arbitrary, we conclude that v*(δ) ≤ 2.

While Row cannot do as well as if her mixed strategies were observable, she can still gain by using mixed strategies. For δ near enough to one there is an equilibrium which gives Row a normalized payoff of 2, while Row's best payoff when restricted to pure strategies is the static equilibrium yielding 1. To induce Row to mix between U and D, specify that following periods when
Col expects mixing and Row plays U, play switches with probability p to (D,R) for ten periods and then reverts to Row randomizing and Col playing L. The probability p is chosen so that Row is just indifferent between receiving 2 for each of the next eleven periods, or receiving 4 today and risking punishment with probability p. This construction works quite generally, as shown in the following proposition.

Proposition 3: Consider a game with a single long-run player, player one, and let

    v*_1 = max_{σ ∈ graph(B)} min_{s_1 ∈ supp(σ_1)} g_1(s_1, σ_{-1}).

Then for any v_1 with v̲_1 < v_1 ≤ v*_1 there exists a δ' < 1 such that for all δ ∈ (δ',1), there is an equilibrium in which player one's normalized payoff is v_1. For no δ is there an equilibrium where player one's payoff exceeds v*_1.

Proof: We begin by constructing a "punishment equilibrium" in which player one's normalized payoff is exactly v̲_1. If v̲_1 is player one's payoff in a static equilibrium, this is immediate, so assume all the static equilibria give player one more than v̲_1. The strategies we will use have two phases. The game begins in phase A, where the players use m^1, a strategy which holds player one's maximum one-period payoff to v̲_1. If player one plays s_1, players publicly randomize between remaining in phase A and switching to a static Nash equilibrium for the remainder of the game. If e_1 is player one's payoff in this static equilibrium, set the probability of switching after s_1, p(s_1), to be

(4)    p(s_1) = (1-δ)(v̲_1 - g_1(s_1, m^1_{-1})) / (δ(e_1 - v̲_1)).

(If δ is near enough to one, p(s_1) is between 0 and 1.) The switching probability has been constructed so that player one's normalized payoff is v̲_1 for all actions, including those in the support of m^1_1, so she is indifferent among these actions.

Next we construct strategies yielding v*_1. Let σ* be the corresponding mixed strategies.
Play begins in phase A with players following σ*. If player one deviates to an action outside the support of σ*_1, then switch to the "punishment equilibrium" constructed above. If player one plays an action s_1 in the support of σ*_1, then switch to the punishment equilibrium with probability p*(s_1), and otherwise remain in phase A. The probability p*(s_1) is chosen so that player one's payoff to every action in the support of σ*_1 is v*_1. As above, this probability exists if δ is near enough to one. These strategies are clearly an equilibrium for large δ.

Equilibrium payoffs between v̲_1 and v*_1 are obtained by using public randomizations between those two values. The argument that player one's payoff cannot exceed v*_1 is exactly as in the example.

4. No Public Randomizations

The equilibria that we constructed in the proofs of Propositions 1 through 3 relied on our assumption that players can condition their play on the outcome of a publicly observed random variable. While that assumption is not implausible, it is also of interest to know whether the assumption is necessary for our results. For this reason, Proposition 4 below extends Proposition 3 to games without public randomizations. (We have not thought about the possible extension of Propositions 1 and 2, because we think the situation without public randomizations, but where private randomization can be verified ex post, is without interest.) The intuition, as explained in Fudenberg-Maskin [1987c], is that public randomizations serve to convexify the set of attainable payoffs, and when δ is near to 1 this convexification can be achieved by sequences of play which vary over time in the appropriate way. Fudenberg-Maskin [1987c] shows that public randomizations are not necessary for the proof of the perfect Folk Theorem.
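The Row-Col example and the switching-probability construction of Proposition 3 can be checked numerically. The payoff matrix below is consistent with the facts stated in the text (Col's best-response cutoffs at p = 1/2 and p = 100/101, Row's best attainable payoff 3, the pure equilibrium (D,R) giving Row 1, and U paying 4 against L); the exact entries are reconstructed rather than quoted, and the discount factor is an illustrative choice.

```python
# Numerical check of the Row-Col example and the indifference probability
# in the mixed-strategy construction.  Matrix entries are reconstructed to
# match the text's description; delta = 0.99 is illustrative.

import numpy as np

# rows: U, D; columns: 0=L, 1=M, 2=R  (payoff to Row, payoff to Col)
g_row = np.array([[4.0, -1.0, -1.0], [2.0, 0.0, 1.0]])
g_col = np.array([[0.0, 1.0, -100.0], [2.0, 1.0, 3.0]])

def col_best(p):                # p = probability that Row plays D
    payoffs = (1 - p) * g_col[0] + p * g_col[1]
    return int(payoffs.argmax())

print(col_best(0.4), col_best(0.6), col_best(0.999))   # M, L, R regions

# Row's best attainable payoff: p = 1/2 with Col playing L gives 3.
print(0.5 * g_row[0, 0] + 0.5 * g_row[1, 0])

# The Proposition 3 bound v* = 2: the worst support action against L is D.
print(min(g_row[0, 0], g_row[1, 0]))

# Indifference probability q: after Row plays U (payoff 4), punish with
# probability q by ten periods of (D,R) (payoff 1) before returning, so
# that U and D (steady payoff 2) give the same normalized value.
delta = 0.99
v_norm = 2.0
v_pun = (1 - delta**10) * 1.0 + delta**10 * 2.0
q = (1 - delta) * (4.0 - 2.0) / (delta * (v_norm - v_pun))
print(round(q, 4))              # strictly between 0 and 1 for delta near 1
```

The last computation solves the recursive indifference condition (1-δ)·4 + δ[q·v_pun + (1-q)·2] = 2, which is one way to express "2 for eleven periods versus 4 today plus the risk of punishment."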
However, as we have already seen, there are important differences between classic repeated games and repeated games with some short-run players, so the fact that public randomizations are not needed for the folk theorem should not be thought to settle the question here.

Proposition 4: Consider a game with a single long-run player, player 1, where public randomizations are not available. As in Proposition 3, let

    v*_1 = max_{σ ∈ graph(B)} min_{s_1 ∈ supp(σ_1)} g_1(s_1, σ_{-1}),

and let σ* be a strategy that attains this max. Fix a static Nash equilibrium σ̂ with payoff v̂_1. Then, for any v_1 with v̲_1 < v_1 ≤ v*_1, there exists a δ' < 1 such that for all δ ∈ (δ',1) there is a subgame-perfect equilibrium where player 1's discounted normalized payoff is v_1.

Remark: For each v_1 the proof constructs strategies that keep track of the agent's total realized payoff to date t and compare it to the "target" value (1-δ^t)v_1, which is what the payoff to date would be if the agent received v_1 in every period. If v_1 exceeds the payoff in the static equilibrium, then play initially follows the (possibly mixed) strategy σ*, and whenever the realized total is sufficiently greater than the target value, the agent is "punished" by reversion to the static equilibrium. If the target is less than the static equilibrium payoff, play starts out at the (possibly mixed) strategy m^1, with intermittent "rewards" of the static equilibrium whenever the realized payoff drops too low.

Proof: (A) It is trivial to obtain v̂_1 as an equilibrium payoff.

(B) To attain any payoff v_1 between v̂_1 and v*_1 we proceed as follows. Renormalize the payoffs so that v̂_1 = 0, and take δ large enough that (1-δ)v̄_1 < δ v_1. Define an index J_t and the strategies σ_{-1}(h_t) as follows. Set J_0 = 0 and, for each time t > 0,

    J_t = J_{t-1} + (1-δ)δ^{t-1} g_1(s_1(t-1), σ_{-1}(h_{t-1})),

where s_1(t-1) is player 1's realized action in period t-1 (and σ_{-1}(h_{t-1}) is his opponents' mixed strategy), and let R_t = (1-δ^t)v_1.
If player 1's payoff were v_1 in every period, his accrued payoff J_t would equal R_t. The equilibrium strategies will "punish" the agent whenever J_t (as opposed to his choice of action each period) exceeds the target by too large a margin. More precisely, we define

    σ_{-1}(h_t) = σ*_{-1}   if J_t < R_{t+1},
    σ_{-1}(h_t) = σ̂_{-1}   if J_t ≥ R_{t+1}.

Note that since J_t is a discounted sum, J_t converges to a limit J_∞ for each infinite history h_∞. Moreover, as long as the other players use σ_{-1}, player 1's payoff to any strategy is simply the expected value of J_∞, and his expected payoff in any subgame starting at time t is δ^{-t} E(J_∞ - J_t).

We will now argue: (i) that if player 1 uses the strategy σ*_1 (playing a static best response in periods where σ_{-1}(h_t) = σ̂_{-1}), then J_t ≥ R_t for all times t and histories h_t, and so J_∞ ≥ v_1; (ii) that regardless of how player 1 plays, J_∞ ≤ v_1, which implies that player 1's payoff in the subgame starting at time t is bounded by δ^{-t}(v_1 - J_t); and (iii) that in any subgame where J_τ ≥ R_{τ+1} at some τ ≤ t, it is a best response for all players to follow the prescribed strategy of playing the static equilibrium σ̂. Conditions (i) and (ii) imply that it is a best response for player 1 to play σ*_1 and that player 1's equilibrium payoff is v_1, so that σ is a subgame-perfect equilibrium. (The condition that the short-run players not wish to deviate is incorporated in the construction of B.)

Proof of (i): We must show that if player 1 follows σ*_1, then for all t, J_t ≥ (1-δ^t)v_1 = R_t. Since J_0 = 0 = R_0, this is true for t = 0. Assume it is true for t = τ. At period τ, either (a) σ_{-1}(h_τ) = σ̂_{-1}, or (b) σ_{-1}(h_τ) = σ*_{-1}. In case (a), J_τ ≥ R_{τ+1}, and player 1 plays a static best response with payoff zero, so J_{τ+1} = J_τ ≥ R_{τ+1}.
In case (b), every pure strategy s_1(τ) in the support of σ*_1 satisfies g_1(s_1(τ), σ*_{-1}(h_τ)) ≥ min_{s_1 ∈ supp(σ*_1)} g_1(s_1, σ*_{-1}) = v*_1 ≥ v_1, so

    J_{τ+1} = J_τ + (1-δ)δ^τ g_1(s_1(τ), σ*_{-1}) ≥ (1-δ^τ)v_1 + (1-δ)δ^τ v_1 = (1-δ^{τ+1})v_1 = R_{τ+1},

where the first inequality comes from the inductive hypothesis. Thus J_t ≥ (1-δ^t)v_1 for all t, and so if player 1 follows σ*_1, then J_∞ ≥ v_1.

Proof of (ii): Next we claim that regardless of how player 1 plays, J_∞ ≤ v_1. Suppose that for some history h_∞ there is a τ' such that J_t ≥ R_{t+1} = (1-δ^{t+1})v_1 for all t > τ'. Then σ_{-1}(h_t) = σ̂_{-1} for all t > τ', and since the most player 1 can get when his opponents play σ̂_{-1} is zero, we have J_∞ ≤ J_T, where T is the smallest such τ'. Then

    J_∞ ≤ J_T ≤ J_{T-1} + (1-δ)δ^{T-1} v̄_1 < (1-δ^T)v_1 + (1-δ)δ^{T-1} v̄_1 ≤ v_1,

(since T is the smallest such time, J_{T-1} < R_T), where the last inequality substitutes in our bound on δ. On any other history, J_t < R_{t+1} for infinitely many t, so J_∞ = lim J_t ≤ lim R_t = v_1. This argument also shows that player 1's payoff in the subgame starting at time t is bounded by δ^{-t}(v_1 - J_t), and from part (i) this payoff can be attained by following σ*_1.

(C) Next we show how to construct equilibria (for large enough δ) that yield payoffs v_1 between v̲_1 and the static equilibrium payoff of zero. Pick v_1 ∈ (v̲_1, 0), and choose δ large enough that (1-δ) min_σ g_1(σ) > v_1. Then set J_0 = 0 and σ_{-1}(h_0) = m^1_{-1}, define J_t as before,

    J_t = J_{t-1} + (1-δ)δ^{t-1} g_1(s_1(t-1), σ_{-1}(h_{t-1})),

with targets R_t = (1-δ^t)v_1, and set

    σ_{-1}(h_t) = σ̂_{-1}   if J_t < R_{t+1},
    σ_{-1}(h_t) = m^1_{-1}  if J_t ≥ R_{t+1}.

Proceeding as above, we claim: (i) that regardless of how player 1 plays, J_t ≤ R_t for all times t and histories h_t, so that J_∞ ≤ v_1 and player 1's payoff in the subgame starting at time t is no greater than δ^{-t}(v_1 - J_t); (ii) that by following the prescribed strategy, player 1 attains J_∞ = v_1; and (iii) that in subgames where J_t < R_{t+1}, it is a best response for player 1 to play a static best response to σ̂_{-1}.
Proof of (i): Since J_0 = 0 = R_0, the claim is true for t = 0. Assume it is true for t = τ. At period τ, either (a) J_τ < R_{τ+1} or (b) J_τ ≥ R_{τ+1}. In case (a), σ_{-1}(h_τ) = σ̂_{-1}, and g_1(s_1(τ), σ̂_{-1}) ≤ 0 for every action s_1(τ), so J_{τ+1} ≤ J_τ < R_{τ+1}. In case (b), σ_{-1}(h_τ) = m^1_{-1}, and g_1(s_1(τ), m^1_{-1}) ≤ v̲_1 < v_1 for every action, so by the inductive hypothesis

    J_{τ+1} ≤ J_τ + (1-δ)δ^τ v_1 ≤ (1-δ^τ)v_1 + (1-δ)δ^τ v_1 = R_{τ+1}.

Thus J_t ≤ R_t for all t, and so J_∞ ≤ v_1 regardless of how player 1 plays.

Proof of (ii): Suppose player 1 follows the prescribed strategy: a static best response in reward periods, which yields exactly zero, and a best response to m^1_{-1} in punishment periods, which yields exactly v̲_1. In reward periods J_t is unchanged while R_t falls toward v_1, and each punishment period leaves J_t below R_{t+1} by at most (1-δ)δ^t(v_1 - v̲_1); since δ^t → 0, this shortfall vanishes, so J_t - R_t → 0 and J_∞ = v_1.

Conditions (i) and (ii) show that in any subgame player 1 can attain the upper bound on his continuation payoff by following the prescribed strategy, so it is clearly a best response for player 1 to do so; the short-run players are playing best responses by construction. If player 1 instead chooses a strategy which assigns positive probability to the event that J_t < R_t for all t beyond some date, he can only lower his payoff: the payoff for histories with J_t < R_t for all large t is less than v_1, and the payoff for the histories with J_t ≥ R_t infinitely often is bounded above by v_1. Q.E.D.

Proposition 4 shows how to attain any payoff between v̲_1 and v*_1 by means of "target strategies." From Proposition 3 we know that such strategies cannot be used to attain higher payoffs. We think that it is interesting to
note where an attempted proof would fail. In part (B), we proved that if player 1 followed σ*_1, then for every sequence of realizations player 1's payoff is at least v_1. Imagine that we try to attain a payoff v_1 > v*_1 by setting the target R_t = (1-δ^t)v_1. Then in the "reward" phases where σ* is played, it might be that player 1's realized payoff is less than v_1. (Recall that by definition it cannot be lower than v*_1.) After a sufficiently long sequence of these outcomes, player 1's realized payoff J_t would be so much lower than the target that even receiving the best possible payoff at every future date would not bring his discounted normalized payoff up to it. This problem of going so far below the target that a return is impossible does not arise with the criterion of time-average payoffs, since the outcomes in any finite number of periods are then irrelevant. For this reason we can attain payoffs above v*_1 under time averaging, as we show in the next section.

5. Time Averaging

The reason that player 1's payoff is bounded by what she obtains when she plays her least favorite strategy in the support of σ*_1 is that every time she plays a different action she must be "punished" in a way that makes all of the actions in the support of σ*_1 equally attractive. A similar need for "punishments" along the equilibrium path occurs in repeated partnership games, where two players each make an effort decision that is not observed by the other, and the link between effort and output is stochastic. Since shirking by either player increases the probability of low output, low output must provoke punishment, even though low output can occur when neither player shirks. This is why the best equilibrium outcome is bounded away from efficiency when the payoff criterion is the discounted normalized value (Radner-Myerson-Maskin [1986], Fudenberg-Maskin [1987a]). However, Radner [1986] has shown that efficient payoffs can be attained in partnerships with time averaging.
His proof constructed strategies so that (1) players never cheat, punishment occurs only finitely often, and thus is negligible, and (2) an infinite number of deviations is very likely to trigger a substantial punishment. Since no finite number of deviations can increase the time-average payoff, in equilibrium no one cheats, yet the punishment costs are negligible.

Since the inefficiencies in repeated partnerships and in games with short-run players both stem from the need for punishments along the equilibrium path, it is not surprising that the inefficiencies in our model also disappear when players are completely patient. We prove this with a variant of the "target strategies" we used in Section 4. These strategies differ from Radner's in that even if player 1 plays the equilibrium strategy, she will be punished infinitely often with probability one. However, along the equilibrium path the frequency of punishment converges to zero, so that as in Radner the punishment imposes zero cost.

Proposition 5: Imagine that player 1 evaluates payoff streams with the criterion lim inf_{T→∞} E[(1/T) Σ_{t=0}^{T-1} g₁(s(t))]. Then for all v ∈ V there is a subgame-perfect equilibrium with payoffs v.

Remark: The proof is based on a strong law of large numbers for martingales with independent increments,2/ which we extend to cover the difference between a supermartingale and its lowest value to date. The relevant limit theory is developed in the Appendix.

Proof: As in Proposition 4, we use different strategies for payoffs above and below some fixed static equilibrium α. Imagine that v₁ exceeds player 1's payoff in this equilibrium, and normalize g₁(α) = 0. Let σ̂ be the (possibly mixed) strategy in graph(B) that maximizes player one's expected payoff; define g̃₁(σ′) = g₁(σ′) - v₁, so that the target payoff is normalized to zero, and ṽ₁ = g₁(σ̂). Define J_0 = 0 and J_T = Σ_{t=0}^{T-1} g̃₁(s(t)). (This differs from the definition in Section 4, where we used player 1's realized action and the mixed strategy of her opponents in defining J_t.)
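The role of the score J_T can be previewed with a stylized simulation. Everything below is invented for illustration (a fair-coin reward payoff whose mean equals the target, a punishment payoff normalized to zero, and punishment whenever the running score is positive); it is a stand-in for the construction in the proof, not the construction itself. Punishment recurs forever, but with vanishing frequency, so the time-average payoff converges to the target.

```python
import random

random.seed(0)
T = 200_000
target = 0.5            # hypothetical target payoff v_1
J = 0.0                 # running score: realized payoff minus target
punished = 0
total_payoff = 0.0

for _ in range(T):
    if J > 0:
        g = 0.0         # punishment period: static-equilibrium payoff (normalized to 0)
        punished += 1
    else:
        g = float(random.random() < 0.5)  # reward period: mean payoff equals the target
    total_payoff += g
    J += g - target

# Punishment happens infinitely often, but its frequency shrinks,
# and the time-average payoff approaches the target.
print(punished / T, total_payoff / T)
```

In this stylized version the score random-walks below zero and is knocked back whenever it turns positive, so punishment periods have vanishing density while the time average is held at the target, which is the mechanism the proposition formalizes.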
Set σ*(h_t) = α if J_t > 0, and σ*(h_t) = σ̂ if J_t ≤ 0. We claim that (i) no matter how player 1 plays, her payoff is bounded by v₁, and (ii) lim inf E(1/T)J_T ≥ 0 almost surely (and hence in expectation), so that by following σ̂₁ player 1 can attain the payoff v₁.

To prove this, let σ₁(h_t) be an arbitrary strategy for player 1, and fix the associated probability distribution over infinite-horizon histories. For each history, let R_t(h_t) = {τ ≤ t-1 | σ*(h_τ) = σ̂} be the "reward" periods, and let P_t(h_t) = {τ ≤ t-1 | σ*(h_τ) = α} be the "punishment" ones. Then let M_t(h_t) = Σ_{τ∈R_t} g̃₁(s(τ)) be the sum of player 1's (normalized) payoffs in the good periods, and set N_t(h_t) = Σ_{τ∈P_t} g̃₁(s(τ)). Note that the reward and punishment sets and the associated scores are defined pathwise, i.e., they depend on the history h_t; henceforth, though, we will omit the history h_t from the notation. Finally, define m_t = min_{τ≤t} M_τ and n_t = min_{τ≤t} N_τ.

We claim that for all t,

(5) J_t ≤ (M_t - m_t) + (N_t - n_t).

This is clearly true for t = 0. Assume (5) holds for all τ ≤ t. At the start of period t, either (a) J_t > 0 or (b) J_t ≤ 0. In case (a), period t is a punishment period, so J_{t+1} = J_t + (N_{t+1} - N_t) and M_{t+1} = M_t; since n_{t+1} ≤ n_t, (5) is satisfied at t+1. In case (b), period t is a reward period, so J_{t+1} = J_t + (M_{t+1} - M_t) ≤ M_{t+1} - M_t; since m_{t+1} ≤ M_t and N_{t+1} - n_{t+1} ≥ 0, once again (5) is satisfied.

Lemma A.3 in the Appendix shows that (N_T - n_T)/T converges to zero almost surely, and Lemma A.4 shows that (M_T - m_T)/T does so as well; hence (5) implies that limsup (1/T)J_T ≤ 0 almost surely, and since the per-period payoffs are uniformly bounded, limsup (1/T)E(J_T) ≤ 0 as well. This establishes (i). For (ii), note that if player 1 plays σ̂₁, then M_T is a submartingale, and the same limit theory shows that (1/T)J_T converges to zero almost surely; since this is true when player 1 plays σ̂₁, the result follows. Q.E.D.

We can show that with our strategies, player
1 is punished infinitely often (J_t > 0) with probability one. This contrasts with Radner's construction of efficient equilibria for symmetric time-average partnership games, where the probability of infinite punishment is zero. It seems likely that our "target-strategy" approach provides another way of constructing efficient equilibria for those games; it would be interesting to know whether this could be extended to asymmetric partnerships. Our approach has the benefit of making clearer why the construction cannot be extended to the discounting case. It also avoids the need to invoke the law of the iterated logarithm, which may make the proof more intuitive, although we must use the strong law for martingales in its place.

6. Several Long-Run Players with Unobservable Mixed Strategies

The case of several long-run players is more complex, and we have not completely solved it. As before, we can construct mixed-strategy equilibria in which the long-run players do better than in any pure-strategy equilibrium, and once again they cannot do as well as if their mixed strategies were directly observable. However, we do not have a general characterization of the enforceable payoffs. Instead, we offer an example of payoffs that cannot be enforced, and a very restrictive condition that suffices for enforceability.

Figure 2 presents a 3-player version of the game in Figure 1. Col's choices and payoffs are exactly as before. DUMMY, who is a long-run player, receives 3 if Col plays L and receives 0 otherwise. The feasible payoffs for Row and DUMMY are depicted in Figure 3. Consider the feasible point at which p = 1/2 and Col plays L. Here Row and Dummy both receive 3. The argument of Section 3 shows that Row's best equilibrium payoff is not 3 but 2, which is the minimum of Row's payoff over the actions in the support of her mixed strategy. Dummy is not mixing, so Dummy's minimum payoff over the support of her strategy is 3.
(Indeed this is the minimum over the support of the product of the two strategies.) Thus one might hope that, by analogy to the proof of Proposition 3, we could show that the payoffs (2,3) were enforceable. But these payoffs are not even feasible! The highest Dummy's payoff can be when Row's payoff is 2 is less than 3. (See Figure 3, which depicts the feasible set.) The problem is that an equilibrium in which Row usually randomizes must sometimes have Col play M or R to "tax away" Row's "excess gains" from playing U instead of D, and this "tax" imposes a cost on Dummy.

[Figure 2: payoff matrix for the three-player game; player two is a "dummy," and player three chooses columns. Figure 3: the feasible set, in the space of player 1's and player 2's payoffs, when player three plays a short-run best response.]

Next, consider the game in Figure 4.

[Figure 4: payoff matrices for the three-player game; player 3 chooses the matrix.]

In this game, player 1 chooses Rows, player 2 chooses Columns, and player 3 chooses Matrices; players 1 and 2 are long-lived while player 3 is short-lived. The unique one-shot equilibrium is (U,L,A), with payoff (1,1,1). The long-lived players can obtain a higher payoff if they induce player 3 to choose C, which requires both long-run players to use mixed strategies. [Let p = prob(U) and q = prob(L); then player 3 chooses C if (1-p)(1-q) ≥ 1/10 and pq ≥ 1/100.] For example, if both long-run players use 50-50 randomizations, the payoffs are (13,3,0). Call this strategy profile σ̂ = (σ̂₁, σ̂₂).

Now let us explain how to enforce (2,2) as an equilibrium payoff. It will be clear that the construction we develop is somewhat more general than the example; we do not give the general version because it does not lead to a complete characterization of the equilibrium payoffs. As in the proofs of
Propositions 1 and 2, the strategies we construct depend on the history of the game only through a number of "state variables," with the current state determined by last period's state and last period's outcome through a (commonly known) transition rule. Let D and R be the "first" strategies, denoted ŝ₁(1) and ŝ₂(1), and let U and L be the second, ŝ₁(2) and ŝ₂(2). Play begins in state 0. In this state, each player i plays σ̂ᵢ, which gives equal probability weight to his two actions. If player 1 plays action j and player 2 plays action k, the next period's state is (j,k). The payoffs when beginning play in state (j,k) are denoted α(j,k) = (α₁(j,k), α₂(j,k)). In our construction, each player's continuation payoff will be independent of his opponent's last move, so that in each state α₁(j,k) = α₁(j) and α₂(j,k) = α₂(k). Finally, the α's and the specified transition rule will be such that each player is indifferent between his pure strategies and thus is willing to randomize. We will find it convenient to first define the α's, and then construct the associated strategies. Set the α's so that

v_i = (1-δ)g_i(ŝ_i(j), σ̂_{-i}) + δα_i(j).   (6)

In our example, α₁(1) solves 2 = (1-δ)(12) + δα₁(1), so α₁(1) = 12 - 10/δ. Similar computations yield α₁(2) = 14 - 12/δ, α₂(1) = 4 - 2/δ, and α₂(2) = 2.

If the observed play is (ŝ₁(1), ŝ₂(1)), next period's state is (1,1), with payoffs α(1,1). Here play depends on a public randomization, as follows. Choose a point w in the set P of payoffs attainable without private randomizations, and a probability p ∈ (0,1), such that

p(1-δ)w + (1-p)v = (1-δp)α(1,1).   (7)

(This is possible for δ sufficiently near to one. The general version of this construction imposes a requirement that guarantees (7) can be satisfied.) With probability p, players play the strategies that yield w, and the state remains at (1,1). With complementary probability, they play the mixed strategies (σ̂₁, σ̂₂). The continuation payoffs are exactly as at state 0.
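The indifference arithmetic in (6) and (7) can be checked mechanically. In the sketch below, the α's are those computed above; the discount factor, the continuation value α(1,1), and the point w are hypothetical scalars chosen only so that the algebra can be verified:

```python
import math

delta = 0.95   # hypothetical discount factor near one
v = 2.0        # target payoff for each long-run player

# Equation (6): v_i = (1 - delta) g_i + delta alpha_i(j), with the stage
# payoffs g_i and continuation values alpha_i(j) computed in the text.
cases = [
    (12.0, 12.0 - 10.0 / delta),   # alpha_1(1)
    (14.0, 14.0 - 12.0 / delta),   # alpha_1(2)
    (4.0,  4.0 - 2.0 / delta),     # alpha_2(1)
    (2.0,  2.0),                   # alpha_2(2)
]
for g, alpha in cases:
    assert math.isclose((1 - delta) * g + delta * alpha, v)

# Equation (7): p (1-delta) w + (1-p) v = (1 - delta p) alpha(1,1).
# Solve for p given hypothetical w and alpha(1,1), then check that choosing
# any action in the support earns exactly alpha(1,1): by (6), the payoff of
# the "mixing" branch, (1-delta) g_i + delta alpha_i(j), equals v.
alpha11, w = 2.2, 6.0
p = (alpha11 - v) / ((1 - delta) * w - v + delta * alpha11)
assert 0 < p < 1
payoff = p * ((1 - delta) * w + delta * alpha11) + (1 - p) * v
assert math.isclose(payoff, alpha11)
print(p, payoff)
```

The first loop confirms that the α's satisfy (6) identically (the δ's cancel), and the final check shows that any (w, p) solving (7) makes every supported action worth exactly α(1,1), which is what leaves the long-run players willing to randomize.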
Thus, the payoff to player i of choosing strategy ŝ_i(j) is

p[(1-δ)w_i + δα_i(1)] + (1-p)[(1-δ)g_i(ŝ_i(j), σ̂_{-i}) + δα_i(j)] = p[(1-δ)w_i + δα_i(1)] + (1-p)v_i = α_i(1)   (8)

for all strategies ŝ_i(j) ∈ supp(σ̂_i). Once again, if any long-run player chooses an action not in the support, play reverts to the static Nash equilibrium.

Now we must specify strategies at states other than (1,1). The state (2,2) occurs if the players chose their most preferred strategies (U,L). Choose a point w′ in P and a p′ ∈ (0,1) such that

p′(1-δ)w′ + (1-p′)v = (1-δp′)α(2,2).   (9)

(Once again this is possible for δ near to one.) And again, play depends on a public randomization, switching to w′ with probability p′ and otherwise following (σ̂₁, σ̂₂). At state (1,2), play switches to a point in P for one period with probability p chosen so that normalized payoffs come out to α(1,2). With complementary probability, players once again play the mixed strategies σ̂, and the same continuation payoffs are used. State (2,1) is symmetric.

Now let us argue that the constructed strategies are an equilibrium for δ sufficiently large. First, if there are no deviations, the payoffs starting in state (j,k) are α(j,k). If player i deviates to an action outside of the support of σ̂_i, or if either player deviates when the strategies say to play a pure-strategy point, the deviation is detected, and play reverts to a static equilibrium. For δ near to one all of the α's exceed the static equilibrium payoff, and so for sufficiently large δ no player will choose such a deviation. And, by construction, players are indifferent between the actions in the support of their mixed strategy if they plan always to conform in the future. Then, by the principle of optimality, no arbitrary sequence of unilateral deviations is profitable.

APPENDIX

In this appendix we consider discrete-parameter martingales {x_n, F_n}, n = 0, 1, ..., where {F_n} is a filtration on an underlying probability space. We assume that x_0 = 0.
Lemma A.1: Let {x_n, F_n} be a martingale sequence with bounded increments. (That is, for some number B, |x_n - x_{n-1}| ≤ B almost surely.) Then lim_{n→∞} x_n/n = 0 almost surely.

A proof of this lemma can be found in Hall and Heyde (1980, page 36ff). We also use the following standard adaptation of this strong law:

Lemma A.2: For {x_n, F_n} as above, let X_n = min_{i≤n} x_i. Then lim_{n→∞} X_n/n = 0 almost surely.

Proof: Fix a sample path of the stochastic process. Since x_0 = 0, X_n ≤ 0 for all n. Since X_n/n ≤ 0, we only have to show that lim inf X_n/n ≥ 0. Suppose, instead, that {n_i} is a subsequence along which the limit is less than 0. For each n_i there is m_i ≤ n_i with x_{m_i} = X_{n_i}, and thus x_{m_i}/m_i ≤ X_{n_i}/n_i. Hence, along the subsequence {m_i}, x_{m_i}/m_i violates the strong law, which can happen only on a null set. Q.E.D.

Lemma A.3: Let {x_n, F_n} be a supermartingale with bounded increments and with x_0 = 0. Let {X_n} be defined from {x_n} as in Lemma A.2. Then lim_{n→∞} (x_n - X_n)/n = 0 almost surely.

Proof: Since {x_n} is a supermartingale, E(x_n - x_{n-1} | F_{n-1}) is nonpositive. For n = 1, 2, ..., let ζ_n = x_n - x_{n-1} - E(x_n - x_{n-1} | F_{n-1}), let y_n = Σ_{i=1}^n ζ_i, and let K_n = y_n - x_n, so that K_n is nondecreasing in n, with K_0 = 0 by convention. Let Y_n = inf{y_i : i = 1, ..., n}. Then {y_n, F_n} is a martingale sequence with bounded increments, and Lemmas A.1 and A.2 tell us that lim y_n/n = lim Y_n/n = 0, so that lim (y_n - Y_n)/n = 0. We are done, therefore, once we show that x_n - X_n ≤ y_n - Y_n pointwise. But this is easily done by induction. It is clearly true for n = 0. Assume it holds for n-1. If X_n = x_n, then x_n - X_n = 0 ≤ y_n - Y_n, and we are done. While if X_n = X_{n-1}, then x_n - X_n = (x_{n-1} - X_{n-1}) + (x_n - x_{n-1}) ≤ (y_{n-1} - Y_{n-1}) + (y_n - y_{n-1}) - (K_n - K_{n-1}) ≤ y_n - Y_{n-1} ≤ y_n - Y_n. Q.E.D.

A symmetrical argument completes the proof, and we obtain:

Lemma A.4: Let {x_n, F_n} be a submartingale with bounded increments and x_0 = 0. Let X_n = max{x_i : i = 1, ..., n}. Then lim_{n→∞} (X_n - x_n)/n = 0 almost surely.
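Lemma A.2 is easy to visualize numerically. The sketch below simulates a martingale with increments ±1 (so B = 1): the running minimum of such a walk grows only on the order of √n, so both x_n/n and X_n/n are driven to zero.

```python
import random

random.seed(1)
n = 100_000
x = 0            # martingale with bounded increments (a fair +/-1 random walk)
running_min = 0  # X_n = min over i <= n of x_i
for _ in range(n):
    x += random.choice((-1, 1))
    running_min = min(running_min, x)

# Lemma A.1: x_n / n -> 0;  Lemma A.2: X_n / n -> 0.
print(x / n, running_min / n)
assert abs(x / n) < 0.05
assert abs(running_min / n) < 0.05
```

The tolerance 0.05 is loose: for a walk of this length the typical scale of both quantities is √n / n ≈ 0.003, which is what the 1/n normalization in the lemmas exploits.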
FOOTNOTES

1. The required discount factor can depend on the payoffs to be attained.

2. We thank Ian Johnstone for pointing us to this result.

REFERENCES

Abreu, D. [1986], "On the Theory of Infinitely Repeated Games with Discounting," mimeo.

Dybvig, P. and C. Spatt [1980], "Does it Pay to Maintain a Reputation?," mimeo.

Fudenberg, D. and D. Levine [1983], "Subgame-Perfect Equilibria of Finite and Infinite Horizon Games," Journal of Economic Theory, 31, 227-256.

Fudenberg, D. and E. Maskin [1986], "The Folk Theorem in Repeated Games with Discounting or With Incomplete Information," Econometrica, 54, 533-556.

Fudenberg, D. and E. Maskin [1987a], "Discounted Repeated Games with One-Sided Moral Hazard," mimeo, Harvard University.

Fudenberg, D. and E. Maskin [1987b], "Nash and Perfect Equilibria of Discounted Repeated Games," mimeo.

Fudenberg, D. and E. Maskin [1987c], "On the Dispensability of Public Randomizations in Discounted Repeated Games," mimeo.

Hall, P. and C.C. Heyde [1980], Martingale Limit Theory and Its Application, Academic Press, New York.

Kreps, D. [1984], "Corporate Culture," mimeo, Stanford Business School.

Radner, R. [1986], "Repeated Partnership Games with Imperfect Monitoring and No Discounting," Review of Economic Studies, 53, 43-58.

Radner, R., R. Myerson, and E. Maskin [1986], "An Example of a Repeated Partnership Game with Discounting and With Uniformly Inefficient Equilibria," Review of Economic Studies, 53, 59-70.

Selten, R. [1977], "The Chain-Store Paradox," Theory and Decision, 9, 127-159.

Shapiro, C. [1982], "Consumer Information, Product Quality, and Seller Reputation," Bell Journal of Economics, 13, 20-35.

Simon, H. [1951], "A Formal Theory of the Employment Relationship," Econometrica, 19, 293-305.