working paper
department of economics

NASH AND PERFECT EQUILIBRIA
OF DISCOUNTED REPEATED GAMES

By Drew Fudenberg and Eric Maskin

No. 499    July 1988

massachusetts institute of technology
50 memorial drive
Cambridge, Mass. 02139
NASH AND PERFECT EQUILIBRIA OF DISCOUNTED REPEATED GAMES

By D. Fudenberg* and E. Maskin**

February 1987
Revised July 1988

* Massachusetts Institute of Technology
** Harvard University and St. John's College, Cambridge

We thank two referees for helpful comments. Research support from the U.K. Social Science Research Council, National Science Foundation Grants SES-8607229 and SES-8520952, and the Sloan Foundation is gratefully acknowledged.
ABSTRACT

The "perfect Folk Theorem" for discounted repeated games establishes that the sets of Nash and subgame-perfect equilibrium payoffs are equal in the limit as the discount factor δ tends to one. We provide conditions under which the two sets coincide before the limit is reached. That is, we show how to compute δ̲ such that the Nash and perfect equilibrium payoffs of the δ-discounted game are identical for all δ > δ̲.
1. Introduction

The "Folk Theorem" for infinitely repeated games with discounting asserts that any feasible, individually rational payoffs (payoffs that strictly Pareto dominate the minmax point) can arise as Nash equilibria if the discount factor δ is sufficiently near one. Our [1986] paper established that under a "full-dimensionality" condition (requiring the interior of the feasible set to be nonempty) the same is true for perfect equilibria, so that in the limit as δ tends to 1 the requirement of subgame-perfection does not restrict the set of equilibrium payoffs. (Even in the limit, perfection does, of course, rule out some Nash equilibrium strategies.)
This paper shows that when the minmax point is in the interior of the feasible set and a second mild condition holds, the Nash and perfect equilibrium payoffs coincide before the limit. That is, for any repeated game, there is a δ̲ < 1 such that for all δ ∈ (δ̲, 1), the Nash and perfect equilibrium payoffs of the δ-discounted game are identical. Our proof is constructive, and gives an easily computed expression for the value of δ̲. The payoff-equivalence result holds even though for any fixed δ < 1 there will typically be individually rational payoffs that cannot arise as equilibria. In other words, the payoff sets coincide before attaining their limiting values. Payoff-equivalence is not a consequence of the Folk Theorem; indeed, it requires the additional conditions that we impose.
The key to our argument is the construction of "punishment equilibria," one for each player, that hold a player to exactly his reservation value. As in our other work on discounted repeated games ([1986], [1987a]), we interweave dynamic programming arguments and "geometrical" heuristics.
Section 2 introduces our notation for the repeated game model. Section 3 presents the main results. Section 4 provides counterexamples to show that the additional restrictions we impose are necessary, and that these restrictions are not so strong as to imply that the Folk Theorem itself obtains for a fixed discount factor less than one. Through Section 4 we make free use of the possibility of public randomization. That is, we suppose that there exists some random variable (possibly devised by the players themselves) whose realizations are publicly observable. Players can thus use the random variable to (perfectly) coordinate their actions. In Section 5, however, we show that our results do not require public randomization.
2. Notation

We consider a finite n-player game in normal form g: A_1 × ... × A_n → R^n, g = (g_1,...,g_n), where g_i(a_1,...,a_n) is player i's payoff from the vector of actions (a_1,...,a_n). Player i's mixed strategies, i.e., the probability distributions over A_i, are denoted Σ_i. For notational convenience, we will write "g(σ)" for the expected payoffs corresponding to σ.
In the repeated version of g, each player i maximizes the normalized discounted sum of his per-period payoffs, with common discount factor δ:

    π_i = (1-δ) Σ_{t=1}^∞ δ^{t-1} g_i(σ(t)),

where σ(t) is the vector of mixed strategies chosen in period t. Player i's strategy in period t can depend on the past actions of all players, that is, on the sequence (a(τ))_{τ<t}, but not on the past choices of randomizing probabilities σ(τ). Each period's play can also depend on the realization of a publicly observed random variable such as sunspots. Although we feel that allowing public randomization is as reasonable as prohibiting it, our results do not require it. Section 5 explains how, for δ close to 1, the effect of sunspots can be duplicated by deterministic cycles of play.
To give a formal definition of the strategy spaces, let w(t) be a sequence of independent random variables with the uniform distribution on [0,1]. The history at time t is h_t = (a(1),...,a(t-1), w(1),...,w(t)), and a strategy for player i is a sequence s_i, where s_{it}: H_t → Σ_i and H_t is the space of time-t histories.
For each player j, choose minmax strategies m^j = (m^j_1,...,m^j_n) such that

    m^j_{-j} ∈ arg min_{m_{-j}} [max_{a_j} g_j(a_j, m_{-j})]

and

    g_j(m^j_j, m^j_{-j}) = max_{a_j} g_j(a_j, m^j_{-j}),

where m_{-j} is a mixed-strategy selection for the players other than j. Let v_j = g_j(m^j_1,...,m^j_n). We call v_j player j's reservation value. Since one feasible strategy for player j is to play in each period a static best response to that period's play of his opponents, player j's average payoff must be at least v_j in any equilibrium of g, whether or not g is repeated. Note that any Nash equilibrium path of the repeated game can be enforced by the threat that any deviation by j will be punished by the other players' minmaxing j (i.e., playing m^j_{-j}) for the remainder of the game.
Henceforth we shall normalize the payoffs of the game g so that (v_1,...,v_n) = (0,...,0). Call (0,...,0) the minmax point, and take v̄_i = max_a g_i(a). Let

    U = {(v_1,...,v_n) | there exists (a_1,...,a_n) ∈ A_1×...×A_n with g(a_1,...,a_n) = (v_1,...,v_n)},

    V = Convex Hull of U, and

    V* = {(v_1,...,v_n) ∈ V | v_i > 0 for all i}.

The set V consists of feasible payoffs, and V* consists of feasible payoffs that strictly Pareto dominate the minmax point. That is, V* is the set of feasible, strictly individually rational payoffs.
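The reservation values just defined are straightforward to compute in small games. The sketch below is our own illustration, not the paper's: it restricts attention to pure-strategy minmaxing (the m^j above may be mixed, which requires a linear program instead) and uses a made-up two-player payoff table.

```python
from itertools import product

def pure_minmax(payoffs, j, n_actions):
    """Player j's pure-strategy minmax value: opponents jointly pick pure
    actions to minimize the best payoff j can secure in reply."""
    n = len(n_actions)
    others = [i for i in range(n) if i != j]
    best = None
    for a_others in product(*[range(n_actions[i]) for i in others]):
        a = [None] * n
        for i, ai in zip(others, a_others):
            a[i] = ai
        # j best-responds to this fixed profile of the others
        reply = max(payoffs[tuple(a[:j] + [aj] + a[j + 1:])][j]
                    for aj in range(n_actions[j]))
        best = reply if best is None else min(best, reply)
    return best

# A made-up 2x2 game: payoffs[(a1, a2)] = (player 0's payoff, player 1's).
g = {(0, 0): (2, 1), (0, 1): (-1, 0),
     (1, 0): (0, -1), (1, 1): (1, 2)}
v = [pure_minmax(g, j, [2, 2]) for j in range(2)]   # reservation values
```

Here each player can be held to a payoff of 1 but no lower; normalizing would subtract these values from every entry.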
3. Nash and Perfect Equilibrium

Any feasible vector of payoffs (v_1,...,v_n) that gives each player i at least (1-δ)v̄_i is attainable in a Nash equilibrium, since Nash strategies can specify that any deviator from the actions sustaining (v_1,...,v_n) will be minmaxed forever. In a subgame-perfect equilibrium, however, the punishments must themselves be consistent with equilibrium play, so that the punishers must be given an incentive to carry out the prescribed punishments. One way to try to arrange this is to specify that players who fail to minmax an opponent will be minmaxed in turn. However, such strategies may fail to be perfect, because minmaxing an opponent may be more costly than being minmaxed oneself. Still, even in this case, one may be able, as in our [1986] paper, to induce players to minmax by providing "rewards" for doing so. In fact, the present paper demonstrates that under certain conditions these rewards can be devised in such a way that the punished player is held to exactly her minmax level. When this is possible, the sets of Nash and perfect equilibrium payoffs coincide, as the following lemma asserts.
Lemma 1: For discount factor δ, suppose that, for each player i, there is a perfect equilibrium of the discounted repeated game in which player i's payoff is exactly zero. Then the sets of Nash and perfect equilibrium payoffs (for δ) coincide.

Proof: Fix a Nash equilibrium s, and construct a new strategy ŝ that agrees with s along the equilibrium path, but specifies that, if player i is the first to deviate from s, play switches to the perfect equilibrium that holds player i's payoff to zero. (If several players deviate simultaneously, the deviations are ignored.) Since zero is the worst punishment that player i could have faced in s, he will not choose to deviate from the new strategy ŝ. By construction, ŝ is a perfect equilibrium with the same payoffs as s. Q.E.D.
Remark: Note that the lemma does not conclude that all Nash equilibrium strategies are perfect.

A trivial application of Lemma 1 is to a game, like the prisoners' dilemma, in which there is a one-shot equilibrium that gives all players their minmax values. An only slightly more complex case is a game in which each player prefers to minmax than to be minmaxed, i.e., a game in which g_i(m^j) > 0 for all i ≠ j. In such a game we need not reward punishers to ensure their compliance but can simply threaten them with future punishment if they fail to punish an opponent.
Theorem 1: Suppose that for all i and j, i ≠ j, m^j_i is a pure strategy, and that g_i(m^j) > 0. Let δ̲ satisfy v̄_j(1-δ̲) < min_i g_j(m^i) for all j. Then for all δ ∈ (δ̲, 1), the sets of Nash and perfect equilibrium payoffs of the repeated game exactly coincide.

Proof: For each player i, define the i-th "punishment equilibrium" as follows. Players play according to m^i until some player j deviates. If this occurs, they switch to the punishment equilibrium for j. Player i has no incentive to deviate from the i-th punishment equilibrium because in every period he is playing his one-shot best response. Player j ≠ i may have a short-run gain to deviating, but doing so results in his being punished, so that the maximum payoff to deviation is v̄_j(1-δ), which is less than min_i g_j(m^i) by assumption. So the hypotheses of Lemma 1 are satisfied. Q.E.D.

Remark 1: If the minmax strategies are mixed instead of pure, the construction above is inadequate because player j may not be indifferent among all actions in the support of m^i_j. Example 1 of Section 4 shows that in this case Theorem 1 need not hold.

Remark 2: The proof of Theorem 1 actually shows that all feasible payoff vectors that give each player j at least min_i g_j(m^i) can be attained in equilibrium if δ exceeds the δ̲ defined above. (Recall that we are expressing players' payoffs in the repeated game as normalized discounted payoffs and not as present values.) Moreover, in two-player games we can sharpen Theorem 1 by replacing its hypotheses with the condition that for all i and j, i ≠ j, and all a_i in the support of m^j_i, g_i(a_i, m^j_{-i}) is positive. Note that this condition reduces to that of Theorem 1 if all the m^j_i are pure strategies.
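The paper emphasizes that δ̲ is easily computed; for Theorem 1 it is just the largest of the values 1 - min_i g_j(m^i)/v̄_j. A numeric sketch with hypothetical payoff bounds (not taken from any game in the text):

```python
# Hypothetical numbers (not from any game in the text): vbar[j] is player
# j's maximum feasible payoff, min_g[j] = min_i g_j(m^i) is the smallest
# flow payoff j earns while minmaxing an opponent (positive, as Theorem 1
# requires).
vbar = {1: 5.0, 2: 4.0}
min_g = {1: 1.0, 2: 2.0}

# vbar_j * (1 - delta) < min_g[j] for all j  <=>  delta > delta_bar:
delta_bar = max(1 - min_g[j] / vbar[j] for j in vbar)

delta = 0.9
assert delta > delta_bar
# The one-shot deviation gain is then outweighed by the punishment flow.
assert all(vbar[j] * (1 - delta) < min_g[j] for j in vbar)
```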
Although the hypotheses of Theorem 1 are not pathological (i.e., they are satisfied by an open set of payoffs in nontrivial normal forms), they do not apply to many games of interest. We now look for conditions that apply even when minmaxing an opponent gives a player less than her reservation utility. In this case, to induce a player to punish an opponent we must give him a "reward" afterwards, as we explained earlier. To construct equilibria of this sort, it must be possible to reward one player without also rewarding the player he punishes. This requirement leads to the "full-dimensionality" requirement we introduced in our earlier paper: the dimensionality of V should equal the number of players. However, full dimensionality is not sufficient for the stronger results of this paper, as we show in Section 4. We must strengthen it to require that the minmax point (0,...,0) is itself in the interior of V. Moreover, we need assume that each player i has an action â_i such that g_i(â_i, m^i_{-i}) < 0, so that when minmaxed, a player has an action for which he gets a strictly negative payoff. (From our normalization, his maximum payoff when minmaxed is zero.)
Theorem 2: Assume that (i) the minmax point is in the interior of V, and (ii) for each player i there exists â_i such that g_i(â_i, m^i_{-i}) < 0. Then there exists δ̲ < 1 such that for all δ ∈ (δ̲, 1), the sets of Nash and perfect equilibrium payoffs of the repeated game exactly coincide.

Corollary: Under the conditions of Theorem 2, for δ > δ̲, any feasible payoff vector v with v_i > v̄_i(1-δ) for all i can be attained by a perfect equilibrium.

Remark: Hypothesis (ii) of the theorem is satisfied by generic payoffs in normal forms with three or more actions per player. The interiority condition (i) is, however, not generic: the minmax point can be outside V for an open set of payoffs. The condition is important because it ensures that, in constructing equilibria to reward a punisher, the ratio of his payoff to that of the deviator can be made as large as we like.
Proof of Theorem 2: We begin in part (A) with the case in which all the minmax strategies m^i_j are pure or, equivalently, each player's choice of a mixed strategy is observable. Part (B) explains how to use the methods of our [1986] paper to extend the construction to the case where the minmax strategies are mixed.

(A) Assume that each m^i_j is a pure strategy. For each player i, choose an action â_i such that g_i(â_i, m^i_{-i}) = -x_i < 0, and for j ≠ i let y^i_j = -g_j(â_i, m^i_{-i}).
The equilibrium strategies will have 2n "states," where n is the number of players. States 1 through n are the "punishment states," one for each player; states n+1 to 2n are "reward states." In punishment state i, the strategies are: Play (â_i, m^i_{-i}) today. If there are no deviations, switch to reward state n+i tomorrow with probability p_i(δ) (to be determined), and remain in state i tomorrow with complementary probability 1 - p_i(δ). If player j deviates, then switch to state j. In reward state n+i, play actions to yield payoffs v^i = (v^i_1,...,v^i_n), which are to be determined. If player j deviates in a reward state, switch to punishment state j.

Choose v^i in V so that, for j ≠ i,

    x_i v^i_j - v^i_i y^i_j > 0

(this is possible because (0,...,0) ∈ int V). Now set p_i(δ) = (1-δ)x_i/δv^i_i, and note that p_i(δ) < 1 whenever δ > x_i/(v^i_i + x_i). This choice of p_i(δ) sets player i's payoff starting in state i equal to zero if she plays as specified. If player j ≠ i does not deviate, his payoff starting in state i, which we denote w^i_j, solves the functional equation

(1)    w^i_j = (1-δ)(-y^i_j) + δp_i(δ)v^i_j + δ(1-p_i(δ))w^i_j,

so that

(2)    w^i_j = (x_i v^i_j - y^i_j v^i_i)/(v^i_i + x_i).
By construction, the numerator in (2) is positive. The interiority condition has allowed us to choose the payoffs in the reward states so as to compensate the punishing players for punishing player i without raising i's own payoff above zero.

Choose δ̂ < 1 large enough that, for all i and j, v^i_j > v̄_j(1-δ̂) and w^i_j > v̄_j(1-δ̂), and set δ̲ = max(max_i x_i/(v^i_i + x_i), δ̂). We claim that for all δ ∈ (δ̲, 1), the specified strategies are a perfect equilibrium.

First consider punishment state i. In this state, player i receives payoff zero by not deviating. If player i deviates once and then conforms, she receives at most zero today (since she is being minmaxed) and has a normalized payoff of zero from tomorrow on. Thus player i cannot gain by a single deviation, and the usual dynamic programming argument implies that she cannot gain by multiple deviations. Player j's payoff in state i is w^i_j (which exceeds v̄_j(1-δ)). A deviation could yield as much as v̄_j today, but will shift the state to state j, where j's payoff is zero, so player j cannot profit from deviating in state i. Finally, in reward state n+i, each player k obtains payoff v^i_k exceeding v̄_k(1-δ), and so the threat of switching to punishment state k prevents deviations. The theorem now follows from Lemma 1.
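Equations (1)-(2) are easy to check numerically. The sketch below uses hypothetical values of x_i, y^i_j, and the reward payoffs (none of them from the text) and verifies that the switching probability p_i(δ) = (1-δ)x_i/δv^i_i holds the punished player to exactly zero while the punisher's value matches the closed form (2).

```python
# Hypothetical numbers (not from the text): x = player i's per-period loss
# in his own punishment state, y = punisher j's per-period loss there,
# (v_ii, v_ij) = reward-state payoffs for i and j.
delta, x, y, v_ii, v_ij = 0.95, 1.0, 0.5, 2.0, 3.0

p = (1 - delta) * x / (delta * v_ii)       # switching probability p_i(delta)
assert 0 < p < 1                           # needs delta > x / (x + v_ii)

# Player i's value in state i: w = (1-d)(-x) + d*(p*v_ii + (1-p)*w).
w_ii = ((1 - delta) * (-x) + delta * p * v_ii) / (1 - delta * (1 - p))
assert abs(w_ii) < 1e-12                   # punished player held to zero

# Player j's value from the functional equation (1) ...
w_ij = ((1 - delta) * (-y) + delta * p * v_ij) / (1 - delta * (1 - p))
# ... equals the closed form (2).
assert abs(w_ij - (x * v_ij - y * v_ii) / (v_ii + x)) < 1e-12
```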
(B) The strategies in part (A) punish player j if, in state i, he fails to use his minmax strategy m^i_j. The players can detect all deviations from m^i_j only if it is a pure strategy, or if the players' choices of mixed strategies are observable. (Otherwise, player j would not be detected if he deviated to a different mixed strategy with the same support.) However, following our [1986] paper, we can readily modify the construction of (A) to allow for mixed minmax strategies. The idea is to specify that player j's payoff in reward state n+i depend on the last action he took in punishment state i in such a way that player j is exactly indifferent among all the actions in the support of m^i_j.

To begin, let {a^i_j(k)} be the pure strategies in the support of m^i_j, where the indexation is chosen so that

    y^i_j(k) ≡ -g_j(â_i, a^i_j(k), m^i_{-i-j}) ≤ -g_j(â_i, a^i_j(k+1), m^i_{-i-j}),

where m^i_{-i-j} is the vector of minmax strategies against i by the players other than i and j. Thus -y^i_j(k) is player j's expected payoff in punishment state i if she plays her k-th-best strategy in the support of m^i_j. Next define

    x̄ = max_{i, a_i, a'_i, a_{-i}} [g_i(a_i, a_{-i}) - g_i(a'_i, a_{-i})].

This is the maximum variation a player's choice of action can induce in his own payoff. Also, let ε > 0 be such that all payoff vectors v with 0 < v_i ≤ 3ε for all i are in the interior of V. (This is possible because (0,...,0) ∈ int V.)
As in part (A), our strategies will have n punishment states and n reward states, with a probability p_i(δ) of switching from state i to state n+i if each player j played an action in the support of m^i_j. However, when play switches to state n+i, player j's payoff depends on the action that she took in the preceding period. Denote these payoffs by v^i_j(k_j), where k_j is the index (as defined in the preceding paragraph) of the action last played by player j. Thus the vector of payoffs in state n+i is

    (v^i_1(k_1),...,v^i_{i-1}(k_{i-1}), v^i_i, v^i_{i+1}(k_{i+1}),...,v^i_n(k_n)).

This vector is defined as follows.
First choose v^i_i and v^i_j(1) for each j ≠ i to satisfy:

(3)    x_i v^i_j(1) - v^i_i y^i_j(1) > 0,

(4)    v^i_i < εx_i/x̄, and

(5)    0 < v^i_j(1) < 2ε.

These conditions can be satisfied because (0,...,0) ∈ int V. As in part (A), set p_i(δ) = (1-δ)x_i/δv^i_i. Now for each j and k_j, let

(6)    v^i_j(k_j) = v^i_j(1) + (1-δ)[y^i_j(k_j) - y^i_j(1)]/δp_i(δ).

With this specification of the reward payoffs, player j's payoff in state i is the same for each strategy in the support of m^i_j, and equals [x_i v^i_j(1) - y^i_j(1)v^i_i]/(v^i_i + x_i), which is positive from inequality (3).

Now we must check that the specified payoffs for state n+i are all feasible, and that, for δ near one, no player wishes to deviate. Substituting the definition of p_i(δ) into (6), we have

(7)    v^i_j(k_j) = v^i_j(1) + [y^i_j(k_j) - y^i_j(1)](v^i_i/x_i).

Referring to the bounds (4) and (5), we see that 0 < v^i_j(k_j) < 3ε for all j, and thus the vector of payoffs in state n+i is feasible for all values of the k_j's. In state i, player j ≠ i obtains the same positive payoff from any strategy in the support of m^i_j, and she will be punished with zero for deviating from the support; thus, for δ close to 1, player j will be willing to play m^i_j and will not deviate in state i. The argument that player i will not deviate in state i is exactly as in part (A). Finally, no player will wish to deviate in the reward states n+j, since each player's payoff in state n+j is bounded away from zero for all δ, while a deviation results in a payoff of zero. Q.E.D.
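The indifference built into equation (6) can likewise be checked numerically; the values of δ, x_i, v^i_i, v^i_j(1), and the y^i_j(k) below are hypothetical.

```python
# Hypothetical numbers: in punishment state i, player j's per-period payoff
# from her k-th-best support action is -y[k]; equation (6) adjusts her
# reward-state payoff v_j(k) so that her total value is independent of k.
delta, x_i, v_ii = 0.9, 1.0, 0.5
y = [0.2, 0.4, 0.7]                       # y(k), nondecreasing in k
v1 = 1.0                                  # v_j(1), the base reward payoff

p = (1 - delta) * x_i / (delta * v_ii)    # p_i(delta), as in part (A)

# Equation (6), and its delta-free form (7): (1-d)/(d*p) = v_ii/x_i.
v = [v1 + (1 - delta) * (yk - y[0]) / (delta * p) for yk in y]
v_alt = [v1 + (yk - y[0]) * (v_ii / x_i) for yk in y]
assert all(abs(a - b) < 1e-12 for a, b in zip(v, v_alt))

def value(k):
    # w = (1-d)(-y[k]) + d*(p*v[k] + (1-p)*w), solved for w
    return ((1 - delta) * (-y[k]) + delta * p * v[k]) / (1 - delta * (1 - p))

vals = [value(k) for k in range(len(y))]
assert max(vals) - min(vals) < 1e-12      # indifferent across the support
```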
The interiority hypothesis of Theorem 2 implies that the set V has full dimension, i.e., that dim V = n. Let us briefly consider the connection between Nash and perfect equilibrium when V has lower dimension. When the number of players n exceeds two, our [1986] article shows by example that the Nash and perfect equilibrium payoff sets need not coincide even in the limit as δ tends to 1. Thus, for such examples, these sets do not coincide for any δ < 1. The story is quite different, however, for two-player games.

Theorem 3: In a two-player game where dim V < 2, there exists δ̲ < 1 such that, for all δ ∈ (δ̲, 1), the sets of Nash and perfect equilibrium payoffs coincide.
Proof: Let (m^2_1, m^1_2) be a pair of minmax strategies. (If there are multiple such pairs, choose one that maximizes player 1's payoff.) If g(m^2_1, m^1_2) = (0,0), then (m^2_1, m^1_2) forms a Nash equilibrium of the one-shot game. In this case, as infinite repetition of a one-shot Nash equilibrium constitutes a perfect equilibrium of the repeated game, application of Lemma 1 establishes the theorem.

Suppose, therefore, that g_i(m^2_1, m^1_2) < 0 for some i. Then g cannot be a constant-sum game, and so, since dim V < 2, we can normalize the players' payoffs so that, for all (a_1, a_2), g_1(a_1, a_2) = g_2(a_1, a_2). Take v = g_1(m^2_1, m^1_2) (= g_2(m^2_1, m^1_2)). Because v < 0, there must exist (a*_1, a*_2) such that g(a*_1, a*_2) = (v*, v*) with v* > 0 (otherwise there would be a minmax pair for which the payoffs are (0,0), contradicting the choice of (m^2_1, m^1_2)).

We will show that, for δ near 1, there exists a perfect equilibrium of the repeated game in which both players' payoffs are zero. When m^2_1 and m^1_2 are pure strategies this is easily done by choosing p(δ) such that

(8)    (1-δ)v + δp(δ)v* = 0.

The equilibrium strategies consist of playing (m^2_1, m^1_2) in the first period and then either switching (permanently) to (a*_1, a*_2) with probability p(δ) or else starting again with probability 1-p(δ). Deviations from this path are punished by restarting the equilibrium.

Suppose now that the minmax strategies are mixed, that the support of m^2_1 is {a_1(1),...,a_1(R)} and that of m^1_2 is {a_2(1),...,a_2(S)}, and that the probability of a_1(i) is q_1(i) and that of a_2(j) is q_2(j). By definition,

(9)    Σ_i Σ_j q_1(i)q_2(j)g_1(a_1(i), a_2(j)) = v,

and since the m's are minmax strategies,

(10)    Σ_i q_1(i)g_2(a_1(i), a_2(j)) ≤ 0 for all j, and

(11)    Σ_j q_2(j)g_1(a_1(i), a_2(j)) ≤ 0 for all i.

Now, in Lemma 2 below, we show that, for all i and j, there exists c_{ij} ≥ 0 such that

(12)    Σ_i q_1(i)[g_2(a_1(i), a_2(j)) + c_{ij}] = 0 for all j, and

(13)    Σ_j q_2(j)[g_1(a_1(i), a_2(j)) + c_{ij}] = 0 for all i.

Take

    p_{ij}(δ) = (1-δ)c_{ij}/δv*.

Then, if players play (m^2_1, m^1_2) in the first period and switch (forever) to (a*_1, a*_2) with probability p_{ij}(δ) if the outcome is (a_1(i), a_2(j)), their expected payoffs are (0,0) (from (12) and (13)). Furthermore, (12) and (13) imply that the players are indifferent among actions in the supports of their minmax strategies. Q.E.D.
Remark: An immediate corollary of Theorem 3 is that the Folk Theorem holds for two-player games of less than full dimension even when mixed strategies are not observable. (This case was not treated in our [1986] paper.)

Lemma 2: Suppose that B = (b_{ij}) is an R×S matrix and that p = (p(1),...,p(R)) and q = (q(1),...,q(S)) are probability vectors such that

(14)    p∘B ≤ 0 and B∘q ≤ 0.

Then there exists an R×S matrix C = (c_{ij}) with c_{ij} ≥ 0 for all i, j, such that p∘(B+C) = 0 and (B+C)∘q = 0.

Proof: Consider a column j such that p∘b_{·j} < 0. If no element b_{ij} of this column could be increased without violating (14), then b_{i·}∘q = 0 for every row i with p(i) > 0; but then (14) implies p∘B∘q = 0, whereas p∘b_{·j} < 0 together with q(j) > 0 implies p∘B∘q < 0, a contradiction. (If q(j) = 0, any element of the column can be increased freely.) Hence there exists an element b_{ij} that we can increase while (14) continues to hold, and indeed we can increase it until either p∘b_{·j} = 0 or b_{i·}∘q = 0. Continuing by increasing other elements of B in the same way, we eventually obtain p∘B = 0 and B∘q = 0. Let C be the matrix of the increases that we make to B. Q.E.D.
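The proof of Lemma 2 is constructive and translates directly into an algorithm: repeatedly raise an entry of B until the corresponding row or column constraint in (14) binds. The sketch below is one way to implement it (our own implementation, with a tolerance parameter added for floating-point comparisons); each pass zeroes at least one slack row or column, so it terminates in at most R+S effective steps.

```python
def lemma2_C(B, p, q, tol=1e-12):
    """Return a nonnegative matrix C with p.(B+C) = 0 columnwise and
    (B+C).q = 0 rowwise, given probability vectors p, q and an R x S
    matrix B satisfying p.B <= 0 and B.q <= 0, by raising entries of B
    one at a time as in the proof of Lemma 2."""
    R, S = len(B), len(B[0])
    A = [row[:] for row in B]                      # working copy of B
    col = lambda j: -sum(p[i] * A[i][j] for i in range(R))   # column slack
    row = lambda i: -sum(A[i][j] * q[j] for j in range(S))   # row slack
    for _ in range(2 * (R + S)):                   # each pass zeroes a slack
        cols = [j for j in range(S) if col(j) > tol]
        rows = [i for i in range(R) if row(i) > tol]
        if not cols and not rows:
            break
        if cols:
            j = cols[0]
            if q[j] <= tol:                        # column unused by q:
                i = next(i for i in range(R) if p[i] > tol)
                A[i][j] += col(j) / p[i]           # fix it with no side effects
            else:                                  # a slack row with p[i] > 0
                i = next(i for i in range(R) if p[i] > tol and row(i) > tol)
                A[i][j] += min(col(j) / p[i], row(i) / q[j])
        else:                                      # only row slack: p[i] ~ 0
            i = rows[0]
            j = next(j for j in range(S) if q[j] > tol)
            A[i][j] += row(i) / q[j]
    return [[A[i][j] - B[i][j] for j in range(S)] for i in range(R)]

# A small instance: p.B = (-1, -1) <= 0 and B.q = (-1, -1) <= 0.
B = [[0.0, -2.0], [-2.0, 0.0]]
p, q = [0.5, 0.5], [0.5, 0.5]
C = lemma2_C(B, p, q)
```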
4. Counterexamples

We turn next to a series of examples designed to explore the roles of the hypotheses of Theorems 1 and 2. Example 1 shows that Theorem 1 need not hold when the minmax strategies are mixed. Example 2 establishes that the hypotheses of Theorems 1 and 2 do not imply that all individually rational payoffs can be attained for some δ strictly less than one. Examples 3 and 4 show that both hypotheses (i) and (ii) of Theorem 2 are necessary. The detailed arguments can be found in our [1987b] working paper.

Example 1: Consider the game in Table 1.
[Table 1: a 2×2 game in which player 1 chooses between U and D and player 2 between L and R; the payoff entries are illegible in this copy.]
Note that player 1 is minmaxed if and only if player 2 mixes with equal probabilities between L and R. Thus, it is easy to see that the hypotheses of Theorem 1 are satisfied except for the assumption that 2's minmax strategy be pure. Notice that, for any δ, player 1's payoff must be positive in any perfect (indeed, in any Nash) equilibrium, because 1's payoff could be zero only were he minmaxed every period, in which case player 2's payoff would be negative. This already suggests that the set of perfect equilibrium payoffs will be a proper subset of the Nash payoffs, since a zero payoff can be used as a punishment in a Nash equilibrium, but the most severe punishment in a perfect equilibrium is positive.

To see that this suggestion is correct, let w̲ be the infimum of player one's payoff over all perfect equilibria where player two's payoff is zero, and let e be a perfect equilibrium where player two's payoff is zero and player one's payoff w is near w̲. In the first period of e, player 1 must play D with probability one, or player 2 could obtain a positive payoff. Moreover, it can be shown (see our working paper) that player 2 randomizes between L and R in the first period if δ is near enough to 1. Let us modify e by slightly raising the probability that player 2 places on L. Player 2's payoff is unchanged by this alteration, since he is indifferent between L and R. Player 1's payoff, however, declines, and for w near w̲, player one's payoff w* will now be less than w̲, so that the payoff (w*, 0) is not supportable in a perfect equilibrium. Now further modify the strategies so that if player 1 deviates to U in the first period he is minmaxed forever. Since, in e, player 1 was deterred from deviating to U with a positive punishment, the threat of a punishment payoff of zero will deter player 1 from deviating to U when the probability that player 2 plays L in the first period is slightly greater than in e. The modified e is, therefore, a Nash equilibrium whose payoffs cannot arise in a perfect equilibrium.
Example 2: Consider the game in Table 2. It satisfies the assumptions of both Theorems 1 and 2, so that for sufficiently large discount factors the Nash and perfect equilibrium payoffs coincide. However, as we shall see, this is not a consequence of the Folk Theorem, since, for such discount factors, not all individually rational payoffs can arise in equilibrium.

[Table 2: player 1 chooses between U and D and player 2 among columns that include L and M; in particular g(D, L) = (3, 0). The remaining payoff entries are illegible in this copy.]

Now, the feasible point (3,0) is contained in the limit of the Nash equilibrium payoff sets as δ tends to 1. However, for any fixed discount factor δ < 1, the Nash equilibrium payoffs are bounded away from (3,0). To see this, observe that if, for δ < 1, there existed a Nash equilibrium with average payoffs (3,0), players would necessarily play (D,L) every period. But in any given period, player two could deviate to M and thereby obtain a positive payoff, a contradiction.
Example 3: Consider the two-player game in Table 3a, in which (0,0) lies on the boundary of V instead of the interior. This game satisfies hypothesis (ii) but not hypothesis (i) of Theorem 2, and we shall see that the theorem fails.

Table 3a (rows U, D for player 1; columns L, R for player 2; entries reconstructed from the payoff points shown in Figure 3b):

         L        R
    U   0, 0    -1, -4
    D   2, 1     0, -5

[Figure 3b: the feasible set V, the convex hull of (2,1), (-1,-4), (0,-5), and the minmax point (0,0), which lies on the boundary of V.]
As in Example 1, player 1's payoff cannot be exactly zero in a Nash equilibrium. (The only way that player 1's payoff could be zero along a path in which player 2's payoff is nonnegative would be for the players to choose (U,L) with probability one in every period. But then player 1 could deviate to D in the first period and guarantee himself 2(1-δ).) Indeed, as we show in our [1987b] paper, player one's Nash equilibrium payoff must be at least 14(1-δ)/9.

Consider the perfect equilibrium where player 1's payoff is lowest. If q is the probability that player 2 plays L in the first period, then player 1 can obtain at least q(1-δ) + δ·14(1-δ)/9 by playing D in the first period. Playing R is costly for player 2, and so if q is less than 1, player 2 must be rewarded in future periods. But, as Figure 3b shows, assigning player 2 a positive payoff induces an even higher payoff for player 1. (This is because of the failure of the interiority condition.) The need to reward player 2 implies that player 1's equilibrium payoff exceeds 7(1-δ)(1-q). Together the two constraints imply that it exceeds 7(1-δ)/3, but one can readily construct a Nash equilibrium where player 1's payoff is 2(1-δ). Hence the Nash and perfect equilibrium payoffs do not coincide.
Example 4: Consider the game in Table 4.

Table 4 (rows U, D for player 1; columns L, R for player 2; entries as legible in this copy):

         L        R
    U   0, -1   -1, 0
    D   0, -1    1, 1

When player two plays L, player one obtains zero regardless of what he does, violating hypothesis (ii) of Theorem 2. Note, however, that hypothesis (i) (interiority) holds. Once again, player one's payoff exceeds zero in any Nash equilibrium. (Player 1 can be held to zero only if player 2 plays L every period, which is not individually rational.) From arguments similar to those for Example 1, we can show that, for δ near 1, the payoffs in the Nash equilibrium that minimizes player 1's payoff among those where 2's is zero cannot be attained in any perfect equilibrium.
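Since the payoff entries of Table 4 are only partly legible in this copy, the following check is conditional on the reconstruction above; it confirms the two claims just made: player 1's payoff against L is identically zero, and the minmax point (0,0) (after normalization) lies in the interior of V.

```python
# Payoff entries as reconstructed in Table 4 (rows U, D; columns L, R);
# each entry is (player 1's payoff, player 2's payoff).
g = {('U', 'L'): (0, -1), ('U', 'R'): (-1, 0),
     ('D', 'L'): (0, -1), ('D', 'R'): (1, 1)}

# Hypothesis (ii) fails for player 1: against L he gets zero regardless.
assert all(g[(r, 'L')][0] == 0 for r in ('U', 'D'))

# Hypothesis (i) holds: (0,0) is interior to the convex hull of the
# distinct payoff vectors, checked here via barycentric coordinates of
# the triangle (-1,0), (1,1), (0,-1).
(x1, y1), (x2, y2), (x3, y3) = (-1, 0), (1, 1), (0, -1)
det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
l1 = ((y2 - y3) * (0 - x3) + (x3 - x2) * (0 - y3)) / det
l2 = ((y3 - y1) * (0 - x3) + (x1 - x3) * (0 - y3)) / det
l3 = 1 - l1 - l2
assert min(l1, l2, l3) > 0    # strictly inside, so (0,0) is interior to V
```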
5. No Public Randomization

In the proof of Theorem 2, we constructed strategies in which play switches probabilistically from a "punishment" phase to a "reward" phase, with the switching probability chosen to make the punished player's payoff equal to zero. This switch is coordinated by a public randomizing device. The reward phase also relies on public randomization when the vector v^i = (v^i_1,...,v^i_n) lies outside the set U of payoffs attainable with pure strategies. Although public randomizing devices help simplify our arguments by convexifying the set of payoffs, they are not essential to Theorem 2. Convexification can alternatively be achieved with deterministic cycles over pure-strategy payoffs when the discount factor is close to one.
Our [1988] paper showed that public randomization is not needed for the proof of the perfect equilibrium Folk Theorem, even if players' mixed strategies are not observable. Lemma 2 of that paper established that, for all ε > 0, there exists δ̲ < 1 such that for all δ > δ̲ and all vectors v ∈ V with v_i > ε for all i, there is a sequence (a(t)), where a(t) is a vector of actions in period t, whose corresponding normalized payoffs are v (i.e., (1-δ)Σ_{t=1}^∞ δ^{t-1}g(a(t)) = v) and for which the continuation payoffs at any time τ are within ε of v (i.e., ||(1-δ)Σ_{t=τ}^∞ δ^{t-τ}g(a(t)) - v|| < ε for all τ). This result implies that, for δ large enough so that v̄_i(1-δ) < ε, we can sustain the vector v as the payoffs of a perfect equilibrium where the equilibrium path is a deterministic sequence of action vectors whose continuation payoffs are always within ε of v, and where deviators are punished by assigning them a subsequent payoff of zero. Hence, attaining v, even when it does not belong to U, does not require public randomization.

Nor do we need public randomization to devise punishments whose payoffs are exactly zero: we can replace the punishment phase of Theorem 2 with one of deterministic length.
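The deterministic-cycle idea can be seen in a deliberately stripped-down, one-player version (our toy example, not the [1988] construction): with per-period payoffs 0 and 1 and a target v, a greedy choice of today's payoff keeps every continuation value within roughly (1-δ) of v.

```python
# Toy illustration: per-period payoffs are 0 or 1, target normalized
# payoff v = 1/3.  Greedily pick today's payoff so that the value still
# "owed" tomorrow, w_{t+1} = (w_t - (1-delta)*g_t) / delta, stays near v.
delta, v, T = 0.99, 1.0 / 3.0, 3000

w, seq, w_max_dev = v, [], 0.0
for _ in range(T):
    g = min((0, 1), key=lambda g: abs((w - (1 - delta) * g) / delta - v))
    seq.append(g)
    w = (w - (1 - delta) * g) / delta
    w_max_dev = max(w_max_dev, abs(w - v))

# Realized normalized discounted payoff; the truncation error is delta^T.
realized = (1 - delta) * sum(delta ** t * g for t, g in enumerate(seq))
assert abs(realized - v) < 1e-6        # the sequence hits the target
assert w_max_dev < 0.05                # continuation values stay near v
```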
Proof of Theorem 2 without Public Randomization: (A) As in the earlier proof, we begin with the case of pure minmax strategies. Let x_i and y^i_j be as before. Because (0,...,0) ∈ int V, we can choose ε > 0 and, for each i, a vector v^i in V with v^i_i > 2ε and v^i_j x_i > y^i_j v^i_i + 2ε for all j ≠ i. For δ near enough 1, we can choose a function v^i_i(δ) so that

(8)    |v^i_i(δ) - v^i_i| < ε,

(9)    v^i_j x_i > y^i_j v^i_i(δ) + ε for all j ≠ i,

and such that there exists an integer t(δ) with

(10)    (1-δ^{t(δ)})(-x_i) + δ^{t(δ)}v^i_i(δ) = 0.

Let

(11)    w^i_j = (1-δ^{t(δ)})(-y^i_j) + δ^{t(δ)}v^i_j.

Substituting using (9) and (10), we have

(12)    w^i_j > ε/x̄.

Take δ̲ close enough to one so that, for all i and j, v̄_j(1-δ) < min(ε, ε/x̄).
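Condition (10) pins down an integer punishment length t(δ): read it as δ^{t(δ)} = x_i/(x_i + v^i_i(δ)), pick the integer t nearest the solution for the target v^i_i, and back out v^i_i(δ) so the equality is exact. A numeric sketch with hypothetical x_i and v^i_i:

```python
import math

# Hypothetical values: x is player i's per-period loss while being
# punished, v_target the intended reward-state payoff v_i^i.
x, v_target = 1.0, 0.5

def punishment_length(delta):
    # From (10), delta^t = x / (x + v_i(delta)).  Take the integer t
    # nearest the solution for v_target, then back out v_i(delta) so
    # that (10) holds with equality.
    t = max(1, round(math.log(x / (x + v_target)) / math.log(delta)))
    v_delta = x * (1 - delta ** t) / delta ** t
    return t, v_delta

for delta in (0.9, 0.99, 0.999):
    t, v_delta = punishment_length(delta)
    # (10) holds exactly for this (t, v_i(delta)) pair; as delta -> 1,
    # v_i(delta) -> v_target, which is condition (8).
    assert abs((1 - delta ** t) * (-x) + delta ** t * v_delta) < 1e-9
```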
Now consider the following strategies. In state i, play (â_i, m^i_{-i}) for t(δ) periods, and then switch to state n+i, where play follows a deterministic sequence whose payoffs are (v^i_i(δ), v^i_{-i}). For the play in state n+i we appeal to Lemma 2 of our [1988] paper, which guarantees that the continuation payoffs in state n+i can be kept within ε of this vector. If player i ever deviates from his strategy, switch to the beginning of state i. By construction, player i's payoff is exactly zero at the beginning of state i and increases thereafter, and since he is minmaxed in state i, he cannot gain by deviating. If player j ≠ i deviates in state i, he can obtain at most v̄_j(1-δ), and from not deviating obtains at least w^i_j, which is larger. Finally, since payoffs at every date of each reward state are bounded below by ε, and deviations result in a future payoff of zero, no player will wish to deviate in the reward states.
(B) To deal with mixed minmax strategies, we must make player j's payoff in state n+i depend on how he played in state i, as in the earlier proof of Theorem 2. We will be a bit sketchier here than before because the argument is essentially the same. As before, let -y^i_j(k) be player j's expected payoff from his k-th-best action in the support of m^i_j, j ≠ i, and let k(r) denote the index of the action he actually played in period r of state i. Let

(13)    R^i_j = Σ_{r=1}^{t(δ)} δ^{r-1}[y^i_j(k(r)) - y^i_j(1)];

(1-δ)R^i_j is the amount that player j sacrifices in state i relative to always playing a^i_j(1). Now take

    v^i_j(δ) = v^i_j + R^i_j(1-δ)/δ^{t(δ)}.

With these payoffs in state n+i, each player j is indifferent among all the actions in the support of m^i_j. If v^i_i and ε are taken small enough, t(δ) (as defined by (10)) will also be small, and so the right-hand side of this equation will be feasible. Thus, the reward payoffs can be generated by deterministic sequences. Q.E.D.
References

Fudenberg, D. and E. Maskin [1986], "The Folk Theorem in Repeated Games with Discounting or with Incomplete Information," Econometrica, 54, 533-554.

Fudenberg, D. and E. Maskin [1987a], "Discounted Repeated Games with Unobservable Actions, I: One-Sided Moral Hazard," mimeo.

Fudenberg, D. and E. Maskin [1987b], "Nash and Perfect Equilibria of Discounted Repeated Games," Harvard University Working Paper #1301.

Fudenberg, D. and E. Maskin [1988], "On the Dispensability of Public Randomization in Discounted Repeated Games," mimeo.