Working Paper
Department of Economics

ON THE DISPENSABILITY OF PUBLIC RANDOMIZATION IN DISCOUNTED REPEATED GAMES

by Drew Fudenberg* and Eric Maskin**

Number 467
August 1987

Massachusetts Institute of Technology
50 Memorial Drive
Cambridge, Mass. 02139

* Massachusetts Institute of Technology
** Harvard University

We would like to thank David Levine for encouraging us to study this topic. Research support from National Science Foundation Grants SES-8607229 and SES-8520952 is gratefully acknowledged.

1. Introduction

The Folk Theorem for repeated games asserts that any feasible, individually rational payoffs for a one-shot game can arise as Nash equilibrium average payoffs when the game is infinitely repeated. In our [1986] paper, which extends this result to subgame perfect equilibrium and discounting, we assumed that the players can condition their play on the realization of a publicly observed random variable. We asserted, however, that abandoning the assumption would lead to only a slight weakening of the results; viz., any feasible, individually rational payoffs can be approximated by a perfect equilibrium where there is sufficiently little discounting. This note shows that, in fact, our extension of the Folk Theorem holds in a strong sense even without public randomization: all feasible individually rational payoffs can be exactly attained in equilibrium.

Although this stronger result is of some interest by itself, its true significance appears in connection with mixed strategies.
Early analyses of repeated games with little or no discounting (Aumann-Shapley [1976], Friedman [1971] and Rubinstein [1979]) restricted players to pure strategies, or, equivalently, assumed that a player's choice of a mixed strategy in any period is observable by his fellow players. The assumption of pure strategies is restrictive because typically the range of individually rational payoffs is greater when players are allowed to use mixed strategies to punish their opponents. The alternative hypothesis, that a player's randomizations are ex post observable, is likewise strong.

Section 6 of our [1986] paper showed how to extend the Folk Theorem to allow for mixed strategies when only a player's realized actions, and not his choices of randomizing probabilities, are observable. The key was the observation that a player can be induced to use a mixed strategy to minimax an opponent by making her continuation payoff depend on her current action in a way that renders her exactly indifferent among the various choices in the mixed strategy's support. Our argument relied on public randomization to ensure that any individually rational continuation payoffs can be exactly attained. If, without public randomization, the continuation payoffs could merely be approximated, a minimaxing player might not be exactly indifferent over the support of his mixed strategy, and our construction would fail. Thus, if we obtain only an approximate version of the Folk Theorem without public randomization, our construction cannot accommodate unobservable mixed strategies.

Attaining payoffs exactly is also essential for the argument in our [1987] paper, which provided sufficient conditions for the sets of Nash and perfect equilibrium payoffs to coincide for discount factors less than one. Although the body of that paper assumed the possibility of public randomization, our results here imply that this assumption, as in the Folk Theorem paper, is unnecessary.
2. The Model

We consider a finite $n$-player game in normal form, $g: A \to \mathbb{R}^n$, where $A = A_1 \times \cdots \times A_n$ and $A_i$ is player $i$'s action space. Let $\Sigma_i$ be the set of player $i$'s mixed strategies, i.e., the probability distributions over $A_i$, and set $\Sigma = \Sigma_1 \times \cdots \times \Sigma_n$. To simplify notation, we will write $g_i(\sigma)$ for player $i$'s payoff given the mixed strategy vector $\sigma \in \Sigma$.

In repeated versions of $g$, each player's probability mixture over actions at time $t$ can depend on the actions chosen at all previous times. More formally, let $h(t) \in A^{t} \equiv H(t)$ be the realized actions from time zero through time $t-1$. Player $i$'s strategy is a sequence of maps (one for each period) from $H(t)$ to $\Sigma_i$. Note that, at any time $t$, player $i$'s strategy does not depend on the past randomizing probabilities of his opponents, but only on their realized actions.

In the infinitely repeated game $G$, each player $i$'s payoff is the average discounted sum $\pi_i$ of his per-period payoffs, with common discount factor $\delta$:

$$\pi_i = (1-\delta) \sum_{t=1}^{\infty} \delta^{t-1} g_i(a(t)),$$

where $a(t)$ is the probability distribution of actions chosen in period $t$.

For each player $j$, choose "minimax strategies" $m^j = (m^j_1, \ldots, m^j_n)$ so that

$$m^j_{-j} \in \arg\min_{\sigma_{-j}} \max_{\sigma_j} g_j(\sigma_j, \sigma_{-j}),$$

and

$$v_j^* = \max_{a_j} g_j(a_j, m^j_{-j}) = g_j(m^j).$$

(Here "$m^j_{-j}$" is a mixed strategy selection for players other than $j$, and $g_j(a_j, m^j_{-j}) = g_j(m^j_1, \ldots, m^j_{j-1}, a_j, m^j_{j+1}, \ldots, m^j_n)$.) We call $v_j^*$ player $j$'s reservation value. Clearly, in any equilibrium of $g$, whether or not $g$ is repeated, player $j$'s average payoff must be at least $v_j^*$. Henceforth we shall normalize the payoffs of the game $g$ so that $(v_1^*, \ldots, v_n^*) = (0, \ldots, 0)$. Call $(0, \ldots, 0)$ the minimax point. Take $\bar v_i = \max_a g_i(a)$. Moreover, let

$$U = \{(v_1, \ldots, v_n) \mid \text{there exists } \sigma \in \Sigma \text{ with } g(\sigma) = (v_1, \ldots, v_n)\},$$

$$V = \text{Convex Hull of } U,$$

and

$$V^* = \{(v_1, \ldots, v_n) \in V \mid v_i > 0 \text{ for all } i\}.$$

3. The Folk Theorem without Public Randomization
Our [1986] paper showed that if public randomization is allowed and either $n = 2$ or the dimension of $V^*$ equals $n$, then for any payoff vector $v \in V^*$, there exists a discount factor $\underline{\delta} < 1$ such that, for all $\delta \in (\underline{\delta}, 1)$, there is a perfect equilibrium of $G$ with payoffs $v$. We now demonstrate that public randomization is inessential for this result.

Lemma 1 establishes that, for $\delta$ sufficiently large,¹ all points in $V$ are feasible and can be obtained without using mixed strategies. That is, for any $v \in V$ there is a deterministic sequence of actions $\{a(t)\}_{t=1}^{\infty}$ for which $v$ is the payoff vector. This is not sufficient to establish the Folk Theorem, however, because, even if $v \in V^*$, the sequence $\{a(t)\}$ might have the property that, for some period $T$, the continuation payoffs beginning at $T$ do not belong to $V^*$. In that case, some player would prefer to deviate from the sequence, even if so doing caused his opponents to minimax him thereafter. Building on Lemma 1, Lemma 2 shows that payoffs in $V^*$ can be generated by a deterministic sequence in such a way that the continuation payoffs always lie in $V^*$. Following Lemma 2, we explain how our results allow us to do without public randomization in the proof of the Folk Theorem.

¹ Of course, for low discount factors, public randomization does make a difference. If $\delta$ is near zero, the payoff vector for the sequence $\{a(t)\}$ is approximately $g(a(1))$, and so, quite apart from equilibrium considerations, many payoffs in $V$ are not feasible.

Write $A = \{a^1, \ldots, a^m\}$ and, for each $j$, let $w^j = g(a^j)$. Thus, $\{w^1, \ldots, w^m\}$ is the set of payoff vectors corresponding to pure strategies.

Lemma 1: If $\delta \ge 1 - 1/m$, then for any $v \in V$ there is a sequence $\{a(t)\}$ of pure strategies whose average payoff is $v$.

Proof: Let $v = \sum_{j=1}^{m} x^j w^j$, where $0 \le x^j \le 1$ and $\sum_{j=1}^{m} x^j = 1$. We construct $\{a(t)\}$ as follows. Let $I^j(t)$ be an index variable: $I^j(t) = 1$ if $a(t) = a^j$, and $I^j(t) = 0$ otherwise. Set $N^j(1) = 0$ for all $j$, and let

$$N^j(t) = \sum_{\tau=1}^{t-1} (1-\delta)\, \delta^{\tau-1} I^j(\tau) \quad \text{for } t > 1.$$
Thus $N^j(t)$ is the "average discounted weight" given to strategy vector $a^j$ before time $t$. Now define

$$C(t) = \{\, j \mid x^j - N^j(t) \ge \delta^{t-1}(1-\delta) \,\}.$$

Let

$$j^*(t) = \arg\max_{j \in C(t)} \{x^j - N^j(t)\},$$

and set $a(t) = a^{j^*(t)}$.² This defines an algorithm for computing $a(t)$.

² If there is a tie, make a deterministic selection.

Claim 1: The algorithm is well-defined, i.e., the set $C(t)$ is never empty.

To prove the claim, assume to the contrary that at some time(s) $t$, $C(t)$ is empty, and let $s$ be the first such time. Then $\delta^{s-1}(1-\delta) > x^j - N^j(s)$ for all $j$. Summing over $j$, we have

$$m(1-\delta)\,\delta^{s-1} > 1 - \sum_{j=1}^{m} N^j(s). \qquad (1)$$

But

$$\sum_{j=1}^{m} N^j(s) = \sum_{j=1}^{m} \sum_{\tau=1}^{s-1} (1-\delta)\,\delta^{\tau-1} I^j(\tau) = \sum_{\tau=1}^{s-1} (1-\delta)\,\delta^{\tau-1} = 1 - \delta^{s-1},$$

so (1) reduces to $m(1-\delta) > 1$. This contradicts our assumption that $m(1-\delta) \le 1$, establishing the claim.

Let $N^j(\infty) = \lim_{t \to \infty} N^j(t)$. (Because $N^j(t)$ is increasing and bounded, this limit exists.)

Claim 2: For all $j$, $N^j(\infty) = x^j$.

To establish Claim 2, note first that, by construction, $\sum_{j=1}^{m} N^j(\infty) = 1$. Moreover, $N^j(\infty)$ cannot exceed $x^j$, because $N^j$ increases (by $\delta^{t-1}(1-\delta)$) only when $x^j - N^j(t) \ge \delta^{t-1}(1-\delta)$. Thus $N^j(\infty) \le x^j$ for each $j$, and, since $\sum_{j=1}^{m} x^j = 1$, $N^j(\infty) = x^j$, proving the claim.

Now, by construction, the payoffs corresponding to $\{a(t)\}$ are

$$(1-\delta) \sum_{t=1}^{\infty} \delta^{t-1} g(a(t)) = (1-\delta) \sum_{t=1}^{\infty} \delta^{t-1} \Big[ \sum_{j=1}^{m} I^j(t)\, w^j \Big] = \sum_{j=1}^{m} w^j (1-\delta) \sum_{t=1}^{\infty} \delta^{t-1} I^j(t) = \sum_{j=1}^{m} w^j N^j(\infty) = \sum_{j=1}^{m} x^j w^j = v.$$

Q.E.D.

Roughly speaking, the algorithm of Lemma 1 works as follows. By definition, $v$ is a convex combination $\sum_j x^j w^j$ of the pure strategy payoff vectors $w^1, \ldots, w^m$. To generate $v$ as a discounted average payoff over time, choose at time $t$ that pure strategy vector $a^j$ for which the difference between $x^j$ and the fraction of times $a^j$ has been used up until $t$ (suitably weighted for discounting) is largest.

The continuation payoffs at time $s$ associated with the sequence $\{a(t)\}$ are simply $(1-\delta) \sum_{t=s}^{\infty} \delta^{t-s} g(a(t))$, i.e., the discounted sum (normalized by $1-\delta$) of per-period payoffs starting at time $s$ and discounted to time $s$.
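The selection rule in the proof of Lemma 1 can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the function names `lemma1_sequence` and `discounted_average`, the payoff vectors `w`, the weights `x`, and the discount factor are hypothetical choices and do not come from the paper. Exact rational arithmetic is used so that the comparison defining $C(t)$ is not disturbed by rounding, and ties are broken in favor of the lowest index, which is one possible deterministic selection.

```python
from fractions import Fraction

def lemma1_sequence(x, delta, horizon):
    """Greedy rule from the proof of Lemma 1, truncated at `horizon` periods.

    x     -- convex weights x^j (Fractions summing to 1)
    delta -- discount factor with 1 - 1/m <= delta < 1
    Returns (seq, N): seq[t-1] is the chosen index j*(t) (0-based),
    and N[j] is the weight N^j(horizon + 1) accumulated by action j.
    """
    m = len(x)
    if sum(x) != 1 or any(xj < 0 for xj in x):
        raise ValueError("x must be convex weights")
    if delta < 1 - Fraction(1, m) or delta >= 1:
        raise ValueError("need 1 - 1/m <= delta < 1")
    N = [Fraction(0)] * m
    seq = []
    for t in range(1, horizon + 1):
        step = delta ** (t - 1) * (1 - delta)        # weight of period t
        C = [j for j in range(m) if x[j] - N[j] >= step]
        assert C, "Claim 1: C(t) is never empty"
        j_star = max(C, key=lambda j: x[j] - N[j])   # ties -> lowest index
        seq.append(j_star)
        N[j_star] += step
    return seq, N

def discounted_average(seq, w, delta, start=1):
    """(1 - delta) * sum over t >= start of delta^(t - start) * w[seq[t-1]],
    truncated at the end of seq; start > 1 gives a continuation payoff."""
    n = len(w[0])
    total = [Fraction(0)] * n
    for k, j in enumerate(seq[start - 1:]):
        for i in range(n):
            total[i] += (1 - delta) * delta ** k * w[j][i]
    return total

# Hypothetical two-player example with three pure-strategy payoff vectors.
w = [(2, -1), (-1, 2), (1, 1)]
x = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
delta = Fraction(9, 10)
seq, N = lemma1_sequence(x, delta, horizon=60)
v = [sum(xj * wj[i] for xj, wj in zip(x, w)) for i in range(2)]
approx = discounted_average(seq, w, delta)
```

After $T$ periods the unused weights $x^j - N^j(T+1)$ are nonnegative and sum to $\delta^T$, so in this sketch the truncated payoff `approx` differs from the target `v` by at most $\delta^T \max |w|$ in each coordinate. Calling `discounted_average` with `start=s` gives the (truncated) continuation payoff at time $s$.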
Lemma 2: For every $\varepsilon > 0$ and closed set $\bar V \subset V^*$ with $\min_{i,\, v \in \bar V} v_i > 5\varepsilon$, there exists $\underline{\delta} < 1$ such that, for all $\delta \in (\underline{\delta}, 1)$ and every $v \in \bar V$, there is a sequence $\{a(t)\}$ of pure strategies whose discounted average payoffs are $v$, and whose continuation payoffs at any time $t$ are at least $\varepsilon$ for each player.

Remark: To prove that public randomization is inessential for the Folk Theorem with observable mixed strategies, it would suffice to show that, for any individually rational payoff $v$, there exists $\underline{\delta}$ such that, if $\delta$ exceeds $\underline{\delta}$, $v$ can be generated by a deterministic sequence whose continuation payoffs are individually rational. Lemma 2 establishes a stronger property; it asserts that a fixed $\underline{\delta}$ works uniformly for the entire set $\bar V$. We provide the stronger result because it is needed to show that public randomization is inessential with unobservable mixed strategies and in our [1987] paper.

Proof: Let $Z$ be the polygon corresponding to the intersection of the set $V$ with the inequality constraints $v_i \ge 3\varepsilon$, and let $\{z^1, \ldots, z^J\}$ be the vertices of $Z$. Clearly, $\bar V \subset Z$. Let $\hat Z$ be a polygon with vertices $\{\hat z^j\}$ such that (i) each $\hat z^j$ is within $\varepsilon$ of $z^j$; (ii) $\hat z^j = z^j$ if $z^j \in \{w^1, \ldots, w^m\}$; and (iii) $\hat z^j$ can be expressed as a weighted average $\sum_k x^k(j)\, w^k$, where each weight $x^k(j)$ is a rational number. Observe that $\bar V \subset \hat Z$.

Because the $x^k(j)$'s are rational, we can find integers $(r^k(j))_{k=1}^{m}$ and $d$ such that for all $j$ and $k$, $x^k(j) = r^k(j)/d$. Let "cycle $j$" be the $d$-period sequence of pure strategies in which $a^1$ is played for the first $r^1(j)$ periods; $a^2$ is played for the next $r^2(j)$ periods; and so on for all $k$ between 1 and $m$. (Recall that $w^k = g(a^k)$.) Let $\hat z^j(\delta)$ be the discounted average payoffs corresponding to cycle $j$. Note that if $\hat z^j$ is on the boundary of $V$, then $\hat z^j(\delta)$ will be on the boundary as well. If we set $R^k(j) = \sum_{s=1}^{k} r^s(j)$, with $R^0(j) = 0$, then

$$\hat z^j(\delta) = \sum_{k=1}^{m} \; \sum_{s=R^{k-1}(j)}^{R^k(j)-1} (1-\delta)\, \delta^{s}\, w^k \big/ (1-\delta^{d}).$$
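As a numerical check on the cycle payoff formula, the following Python sketch evaluates the discounted average payoff of a single cycle. The function name `cycle_average` and the two-action example (payoff vectors `w`, repetition counts `r`) are hypothetical and chosen only for illustration; exact fractions are used to avoid rounding.

```python
from fractions import Fraction

def cycle_average(r, w, delta):
    """Discounted average payoff from repeating one d-period cycle forever.

    r[k] -- number of consecutive periods the (k+1)-th action is played
    w[k] -- payoff vector of the (k+1)-th action
    The weight on cycle period s (s = 0, ..., d-1) is
    (1 - delta) * delta**s / (1 - delta**d); these weights sum to 1.
    """
    d = sum(r)
    n = len(w[0])
    z = [Fraction(0)] * n
    s = 0                              # period index within the cycle
    for rk, wk in zip(r, w):
        for _ in range(rk):            # periods R^{k-1}, ..., R^k - 1
            weight = (1 - delta) * delta ** s / (1 - delta ** d)
            for i in range(n):
                z[i] += weight * wk[i]
            s += 1
    return z

# Hypothetical example: alternate between two actions (d = 2).
w = [(2, -1), (-1, 2)]
r = [1, 1]
z_half = cycle_average(r, w, Fraction(1, 2))
z_near_one = cycle_average(r, w, Fraction(99, 100))
```

In this example the rational-weight average $\sum_k (r^k/d)\, w^k$ is $(1/2, 1/2)$. With $\delta = 1/2$ the cycle payoff is far from it, because early periods carry most of the weight, while with $\delta = 99/100$ it lies within $0.01$ of it in each coordinate, illustrating that the cycle payoff approaches the weighted average as $\delta \to 1$.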
Choose $\underline{\delta}$ so that, for all $\delta$ greater than $\underline{\delta}$ and all $j$, the cycle payoff $\hat z^j(\delta)$ is within $\varepsilon$ of $z^j$. By construction, for all $\delta > \underline{\delta}$, $\bar V$ is contained in the polygon $\hat Z(\delta)$ whose vertices are the $\hat z^j(\delta)$'s.

We now apply the algorithm of Lemma 1 to generate each $v \in \bar V$ as a deterministic sequence of the $\hat z^j(\delta)$'s for $\delta > \underline{\delta}'$, where $\underline{\delta}' = \max(\underline{\delta},\, 1 - 1/J)$. Earlier, when the payoffs $w^j$ were called for in a given period $t$, we set $a(t) = a^j$. In our current application, we replace the $w^j$'s with the $\hat z^j(\delta)$'s. Moreover, when the algorithm calls for payoffs $\hat z^j(\delta)$, we assign cycle $j$ as the actions for the next $d$ periods. The Lemma 1 algorithm so modified guarantees that we can generate each of the payoffs in $\bar V$ by a deterministic sequence of these cycles. Because each cycle is of length $d$ and each $\hat z^j(\delta)$ gives each player a payoff of at least $2\varepsilon$, the continuation payoffs starting at any time $t$ give each player at least $\varepsilon$ if $\delta$ is taken large enough to satisfy

$$(1-\delta^{d})\, \underline{g} + \delta^{d} (2\varepsilon) > \varepsilon,$$

where $\underline{g} = \min_{i,a} g_i(a)$ is the lowest possible value of any player's payoff.

Q.E.D.

To summarize, the algorithm of Lemma 1 shows how to attain any $v \in V$ by a deterministic sequence of $w^j$'s. Lemma 2 replaces each $w^j$ that is not individually rational with a payoff vector $\hat z^j(\delta)$ that itself can be attained through a finite cycle of $w^k$'s. Hence, to obtain $v \in \bar V$ through a deterministic sequence, (i) apply the Lemma 1 algorithm using the $\hat z^j(\delta)$'s instead of the $w^j$'s; and (ii) whenever the algorithm calls for $\hat z^j(\delta)$, replace it with the corresponding $d$-period cycle.

To see how Lemma 2 enables us to do without public randomization in the proof of the Folk Theorem, we first recall the form of the strategies in our [1986] paper. To obtain the point $v \in V^*$, we had players use publicly correlated actions (generating $v$ in each period) as long as no player deviated.
If player $i$ deviated, we provided a "punishment equilibrium" in which, (a) for a certain number of periods, the player's opponents minimax him and he responds optimally, and then (b) the players revert to a more "cooperative" mode in which their payoffs are $v^i$, where this vector is chosen so as to induce $i$'s punishers to go through with their minimaxing and so that player $i$'s overall payoff is less than $\varepsilon$. Like $v$, $v^i$ is generated by publicly correlated actions. From Lemma 2, we can replace the publicly correlated actions yielding $v$ and $v^i$ with deterministic sequences whose continuation payoffs are greater than $\varepsilon$. Because deviation leads to payoffs less than $\varepsilon$, no player will wish to deviate from such a sequence. In the case where mixed strategies are observable, this is the only change to the proof required to eliminate public randomization.

The case where only players' realized actions (and not the randomizations themselves) are observable presents an additional complication. If, to minimax player $i$, player $j$ uses a mixed strategy, he must be indifferent among the various actions over which he randomizes. Our [1986] proof ensured this indifference by making $j$'s continuation payoff after the punishment phase contingent on his actions during the phase. It is important here that precisely specified values for the continuation payoffs be attainable; it would not suffice merely to approximate them. Lemma 2 shows, however, that these exact values can, in fact, be attained.³ Thus public randomization is inessential in this case too.⁴

³ The strong version of Lemma 2 guaranteeing a uniform choice of $\underline{\delta}$ is needed here because the particular values of the continuation payoffs will depend on the discount factor. Without uniformity we would run into the difficulty that if $\delta$ were chosen large enough to invoke Lemma 2 for particular continuation payoffs, the continuation payoffs required to ensure indifference would themselves change, necessitating a new $\delta$, etc.

⁴ We should point out that our [1986] paper was misleading in its assertion that the Folk Theorem holds approximately when public randomization is not feasible. Without the possibility of attaining certain continuation payoffs exactly, it is not clear that even an approximate version of the Folk Theorem holds for the case of unobservable mixed strategies.

References

Aumann, R., and L. Shapley [1976], "Long Term Competition: A Game Theoretic Analysis," mimeo, Hebrew University.

Friedman, J. [1971], "A Noncooperative Equilibrium for Supergames," Review of Economic Studies 38, 1-12.

Fudenberg, D., and E. Maskin [1986], "The Folk Theorem in Repeated Games With Discounting or With Incomplete Information," Econometrica, Vol. 54, No. 3, May 1986, pp. 533-554.

Fudenberg, D., and E. Maskin [1987], "Nash and Perfect Equilibria of Discounted Repeated Games," mimeo, Harvard University.

Rubinstein, A. [1979], "Equilibrium in Supergames with the Overtaking Criterion," Journal of Economic Theory 21, 1-9.