Digitized by the Internet Archive in 2011 with funding from Boston Library Consortium Member Libraries http://www.archive.org/details/learningmixedequOOfude working paper department of economics LEARNING MIXED EQUILIBRIA Drew Fudenberg David M. Kreps No. 92-13 June 1992 massachusetts institute of technology 50 memorial drive Cambridge, mass. 02139 LEARNING MIXED EQUILIBRIA Drew Fudenberg David M. Kreps No. 92-13 June 1992 M.I.T. DEC LIBRARY 7 1992 Abstract This paper explores learning models in the spirit of the method of fictitious play. We extend the previous literature by generalizing the classes of behavior rules, and by considering games with more than two players. Most importantly, we reformulate the study of convergence to mixed strategy equilibrium, using Harsanyi's notion of perturbed games. Keywords: game theory, mixed strategies, learning JEL Classification: C72, D83 LEARNING MIXED EQUILIBRIADrew Fudenberg Department of Economics, Massachusetts Institute of Technology and David M. Kreps Graduate School of Business, Stanford University, and Department of Economics, Tel Aviv University June 1992 1. Introduction Nash equilibrium describes a situation in which players have identical and exactly correct beliefs about the strategies each player will choose. might the players come enough One have correct beliefs, or at least beliefs that are close being correct that the outcome corresponds to a Nash equilibrium? to explanation beliefs to come to is that the players play the game over and be correct as the result of learning from past has been explored at some length in the recent number How and when of different forms and over, and that their play. This explanation literature, in models that take a that stress different aspects of the problem. ] This paper explores learning models that are in the spirit of the model or method of "We Li Calzi, like to play (Brown, 1951; Robinson, 1951) in which players choose fictitious are grateful to Robert Anderson, Robert Ramon Marimon, Tom Aumann, Hugo Hopenhayn, David Sargent, and Jose Scheinkman for helpful Levine, comments. We Marco would thank IDEI, Toulouse and the Institute for Advanced Studies, Tel-Aviv University, for their hospitality while this research was being conducted. The financial assistance of the National Science Foundation (Grants SES 88-08204, SES 90-08770 and SES 89-08402) and the John Simon Guggenheim Foundation is gratefully acknowledged. 1 list To provide a guide of recent references: (1991), Ellison (1991), would take too long, so we will be content with a partial Crawford and Haller (1990), Eichberger, Haller, and Milne to the literature Canning (1991), Fudenberg and Levine (1992), Fudenberg and Kreps (1988), Hendon et al. (1991), Jordan (1991), Kalai and Lehrer (1990), Kandori, Mailath, and Rob (1991), Milgrom and Roberts (1990, 1991), Nyarko when they bear directly on our (1991), and Young (1991). own We will comment on some analysis. 1 of these papers as we proceed, their strategies to sumption maximize that their opponents historical frequency. its We (1) their current period's We will play expected payoff on the as- each strategy with probability equal to extend the previous literature in three ways: provide some minor extensions to the basic model of fictitious play, by generalizing the classes of rules by which players form their beliefs and use them to choose their actions. (2) We study models in the spirit of fictitiour play for games with more than two players. (3) Most importantly, we reformulate equilibria. We the study of convergence to mixed-strategy argue that the notion of convergence used previously in the ature, that the empirical marginal distributions converge, notion of what it means is to play a mixed-strategy profile, not an approproate and we suggest and We show analyze the stronger criterion of the convergence of intended behavior. that all this Nash mode equilibria and only Nash of convergence. Finally, we liter- equilibria are possible limit points under investigate the global stability of mixed equilibria in the setting of Harsanyi's (1973) purification theorem. Section 2 gives a general formulation of learning in a strategic-form game. This formulation, and our subsequent analysis, supposes that the play each other repeatedly (as opposed to a model with a large l's, player 2's, etc.) strategies chosen by and that in each the questions addressed two groups. sense, what play, players First, if model of by fictitious of player observe the (pure) fictitious play "settles play for two players. down" or converges in is separate some appropriate play guaranteed to converge? to the first question, recall that there are gence in the literature on We play (and models of learning in general) are the possible limit points? Second, With regard number their rivals. Section 3 reviews the into round of same players two modes fictitious play. In *he first, there is a finite of conver- time T such that a single strategy profile see that any such profile convergence, play cycles must be among a Nash equilibrium. In the different strategy profiles in a empirical frequencies of each player's choices converge to The corresponding strategy egy. traditional sense in which T played in every period from is profile fictitious also a is play is equilibria. Section 3 briefly reviews results Nash on; second mode way such some easy to it is of that the (mixed) equilibrium; this strat- the is said to converge to mixed-strategy from the literature that use these two convergence notions. In Section 4, we sumptions about the ways choose their immediate in which players construct We actions. of actions are asymptotically myopic; that maximizes their will i.e., are adaptive (following we that players' choice long run, players choose in a -way in the As provide. Milgrom and Roberts But more is required convergence of empirical frequencies Nash assessments and then assume throughout for players' assessments, [1991]), then verges to a pure-strategy profile, the profile must be a of players. their as- immediate payoffs. This assumption requires some expla- nation and rationale, which number more general generalize fictitious play by considering equilibria as limit points. asymptotically empirical, to a if mixed-strategy profile this for any form of convergence the second — is to — hav^ only that assessment rules are that players' assessments with empirical frequencies. Moreover, they intended play con- Nash equilibrium, A sufficient condition is which means if if converge together condition suffices only for two-player games. Section 5 presents several objections to convergence of empirical frequencies as an appropriate file. (if In mode of convergence summary, these objections for learning to play a mixed-strategy pn^- are: (1) Although assessments are converging they are asymptotically empirical), the strategies that are chosen are not. (2) In examples, correlations will be observed (over time) in the actions of players who choose their actions independently. (3) Because of (2), convergence in this sense for games with more than two players For these reasons, in Section 6 namely convergence When we is propose problematic. a stronger This raises of (intended) behavior. mode some of convergence, technical difficulties: players use mixed strategies, the realized distribution of play need not equal the intended one, which These abilistic. difficulties are makes notions attended equilibria are possible limit points behavior is and then to, under of convergence inherently prob- this mode we show that only of convergence, as long as asymptotically myopic and assessments are asymptotically empirical. Moreover, for any game and Nash equilibrium of the game, there of asymptotically one as desired. The problem with a model difficulty is the for is a limit point of intended behavior with probability as 2 These results are not limited to two-player this alternative vergence to mixed behavior The is myopic behavior and asymptotically empirical assessments which the equilibrium close to Nash is mode possible, it of convergence is hard to see standard one with mixed strategies: is that, why it games. while con- should occur. based on their If, as- sessments, players choose their actions to maximize precisely their expected payoffs, then (unless their assessments are precisely those of the mixed equilibrium) their intended behavior will not converge. If players are not restricted to precise maximization, then any behavior (that puts weight on actions in the equilib- rium mixture) will considerations is be satisfactory. Something outside of payoff-maximization required to lead players to the precise mixtures needed in the equilibrium. In our basic formulation, As games a in way around this we see no natural way problem, in Sections 7 and 8 which each player's payoff is we of doing this. consider learning in subject to a sequence of i.i.d. random shocks that are observed only by that player, as suggested by Harsanyi's (1973) purification theorem. In this context, a mixture over two strategies does not To the extent that some Nash equilibria seem unreasonable, such as those where players use weakly dominated strategies, this last result indicates that are be discussed as well in Section 6. assumptions are too weak. This will correspond to mixing by a player each period) strictly prefers who is indifferent, but rather to a one of the two strategies, who do know not Section mixed play we 8, and we show made sense to be approximation (Arthur Soderstrom, 1983). et that of the overall story. nation (see Section We hope games with a any learning process in Section 7 that Then, in unique equilibrium in difficulty. that is close to fictitious precise) will converge with probability we one to the use and adapt results from the theory of stochastic al., 1987; let Kushner and Clark, 1978; and Lyung and us note that the results given here are only a small part Among other 4), we things, same opponents repeatedly and rivals in (period's) 3 Before setting out, by We show go through without earlier results unique equilibrium. Here the random. specialize to the class of 2 x 2 strategies, (in a our (in the precise value of this period's perturbation for the player, the actions of the first player are in this context all of depending on the who from the perspective of other realization of the player's payoff perturbation. Yet players, player are assuming that players encounter yet act myopically, a fairly unsavory combi- they observe the full (stage-game pure) strategies chosen game over and each round of play, and they play the same to return to over. each of these three simplifying assumptions in subsequent work. 2. Formulation Fix an /-player, finite, strategic- form game. The players are indexed players; set of ' i.e., -i = {1,2,... pure strategies Thus our work is ,1 - i game, hereafter referred = 1,2,...,/, and we l,i + l,...,/} (or actions) for player similar to that of Marcet .* i , Let let S let { , i to as the stage -2' denote the "other" = ...,/, be the finite 5= S x 1, 1 . . . x S 1 be the set of and Sargent (1989a, 1989b) on learning rational expectations equilibrium. j We use male pronouns for players in general, and for players numbered 2, 4, 6, etc. and We use female pronouns for players numbered 1,3, etc., and for players lettered and -j :' . lettered and k. . pure-strategy profiles, and mixed fashion, let J? be the v let S — 1 : . give player i's payoffs. In the usual 7? strategies for player i , mixed-strategy profiles, and extend the domain of set of for each S~* denote let i distributions over S~' expected utility Yljj, for ; : S 3 ' , and -1 X1- ', £ she chooses pure strategy if u' x . . x . X" 7 be the to 17. Also, set of probability tA-s'^ let and her 1 s 1 from 5 U~' denote the let £ S' and a s' J = JT let - denote ') i 's rivals act according to the (possibly correlated) distribution o~' Imagine that these players play the game repeatedly, Imagine that by after the pure strategy that i.e., chosen is chooses his action using a mixed strategy, the mixing up history of play = play {s-i , up . . . to , s -i) time t to time where , t st or (5)' , t , denoted € S for > _1 is , t' (< possible infinite histories; The how = 1, . . , . - * 5 denoted by 2, Z i.e., basic object of this paper is observed. = 1,2, If a player not observed. Then a 1 The . set of all histories of By convention, . = (S)°° with , Z Z will x denote denote the will typical element and behavior, a model of learning behave and what they believe the players t string of (pure) strategy profiles is a , is is the (singleton) set consisting of the null history. Also, all dates each round of play, players observe the actual actions chosen opponents; their at ( = (5 l5 5 2 , which as time passes. A set of . . .) specifies model of learning and behavior consists formally of t .vo pieces, behavior rules and assessment rules for We each player. take these in turn. Behavior rules We will denote by repeated game. That is, . behavior rules date . . Insofar as possible, players. When we <f>' ! (for a profile (<£\ at f <f> , 4> ) t we write ()' the behavior rule that playei = 6' (<f>{, 2 ,...), where <f>\ : ? Z — E t of behavior rules for the players), as a function of (t ), and uses in the infinitely <?,((<) 4> t l . The notation <f> (for the profile of will all be used. follow the convention that subscripts refer to time and superscripts to , however, we mean the usual within the parentheses. (We will try to avoid this as much t fold Cartesian as possible.) product of the argument . Fix a profile of behavior rules q> Given any . > t and 1 to contruct a conditional probability distribution (conditional the path of play in the usual fashion: st (the actual play at date t) via Z M and (f> t and Ct Z t+2 the conditional distribution to We that this s M given by part of this construction plays i s' 5 1 , 2 plays s 2 Y(s = t That is, <p (t ) for the rest of M (( M ) , we and so on. These then give , Z by can extend a probability Kolmogorov extension the keeping in mind , must be emphasized. Given history time at and so on. [ (s ,...,s players randomize ( , . When we the , (, construct the con- specify the joint probability that insist that = ^(G)(^)x...x^(G)(3 |c.) (in their ^J(Ct)(s') we must We I ) is t on ditional probability distribution plays <f> will write P(-|(<) for this conditional probability, probability that 1 , for a fixed profile of behavior strategies. is One t give a probability distribution on <t> distribution over the space of complete histories theorem. on Z we can use This gives us conditional probabilities on (( t ) with transition probabilities for , € (, 7 ). behaviors) independently. Assessments To model the behavior rules of players, we malisms. Specifically, we will want , and partial history (, ) what be played. Formally, for each range ancillary for- and contingent on each possible t £i For the analysis in this paper, t employ some speak of what each player assesses con- to cerning the behavior of her rivals, at each date history will E~ x , , let full /ij denote do each player in the , time round about with domain a function i Z t to and representing i's assessment over the possible pure strategy profiles that her rivals will choose at date denote a will suffice to specify (for believes her rivals will i t it t , as a function of system of assessments or assessment rule for (t . we Also, l z ; i.e., fi use is a l /i to sequence Note well (/Xj',/4,---)- ments her for that cr"' e -T behavior; rivals' is i - encodes more than ' allowed z's marginal afsess- make an assessment concerning to the joint behavior of her rivals that admits correlations in their play. seem troubling when contrasted with at first This measures P There . no conflict, that governs is measure reflects the objective probability We do by the players. the evolution of play, as a function of the behavior rules may made the independence assumption in the earlier construction of the probability however. The measure P 6 not allow players to correlate their (mixed) strategies at any date, hence P On constructed with independence at each date. what her resent a player's subjective assessment of and until knows what behavior i gies and a player and 3/4 , 2 has a choice mixes between a and b to do; unless which player in between and a' 6' has a choice between pure 1 = is, player 3 believes that player = )(a) = 3/4 (where the strate- Imagine that player 3 thinks that . with probabilities 3/4 and 1/4 or with probabilities 1/4 with the same mixing probabilities used at *nch date, irrespective of what of play. That 4>\(£t)(a) and player b either 1 about rivals are rep- fi)(Ct) rules her rivals are using, correlation in her game For example, imagine a three-player the other hand, the is 1 either using the behavior rule ft is means, irrespective of the values of and t Ci )/ is the history that is given by or she believes that 1 uses ft given by ft (C 1/4 Player 3 entertains similar beliefs about the behavior rule used by player 2; we use ft and ft to denote the two possibilities. Moreover, player 3 believes that her rivals randomize at each date independently; i.e., if player 1 is using c and player 2 is using ft t t . , then player 3 assesses that the probability of player 3 initially believes that if player 1 is Specifically, player 3's prior belief is thar. and use ft .05; and 1 2 will use will and use ft using ft 2 will use ft1 .05; 1 then , will use 1 with probability ft any round in (a, a') c 1 will and 2 will do at rivals. with probability ( , given any history C., . It is 2 will .4. the correlation in her initial beliefs a at date 1, this makes it at more . Imagine likely that player 2 is And, will that using ft use o" with probability .4; 1 . will use ft with probability finally, imagine that 3 uses the about the joint behavior rule profile to find 3's assessment about what evident that although 3 believes that are randomizing independently, her initial uncertainty about assessments about their play (3/4X3/4) = 9/16 and 2 beliefs With these data, we can integrate out any date more use ft sequence of observed play and Bayes' rule to update her used by her it is is 1 1 and and 2 what behavior rules they are using and about their behavior rule profile imply that she will be making each date that reflect correlation. likely that player 1 is using ft If we condition on player than ft , which makes it 1 playing more likely more likely. Note as well that even though player 3 do not change their behavior from date to date as a function of what happens in the course of play, her assessments n\(C very much depend on <i since the history of play up to date gives player 3 information about what behavior rule profile that player 2 is using ft , which makes a' believes (with probability one) that her rivals t t her rivals are in fact using. ) , assessments can reflect her strategic uncertainty. Fictitious play 3. The model of fictitious a 7 model of learning and behavior. then we discuss its — setting as "not i ?"; as in Section is just 2. of learning and behavior. players; = 3 ? give the details of fictitious play, and model two — i.e., we First interpretation as a In fictitious play, there are interpret viewed as play (Brown, 1951; Robinson, 1951) can be — for ? 1 = 2. i.e., In this setting, we = 1,2. Otherwise, the general 2 The behavior and assessment up rules are build as follows. (A) For each player number (B) of times that For each player that i, i , i strategy played there 1 s an is s l € S' in the "initial , and history t — (, let , k((<)(-s ') be the observations that comprise 1 weight" function S~ —+ l 77' : [0, 00) 8 (, . such Ej-'-gs-Vk"'') >°- (C) For each player i, date r?j(G)(s _i ) = riKs-*) + Then player z 's t > 1 , history (t (Note that ^(CiX* fc(Ct)(s-*") assessment rule MGKs and strategy s~ e S~' l , is //' ) = - *') = ^(s-'') given by normalizing the , we define for all r?' ; 5- .) i.e., — Eves-^HCt)^') (D) For player i, at each date t with history (t , ^}(C<) is a maximizer of uV^-MCtXs-') (3.1) a-'es- over all We Q cr' G £" are grateful to We do Bob Aumann not bother to write k\ and the player whose strategy is , for convincing us of since the being counted how important this is two arguments determine the length of the history . In (D), we have down not pinned more than one maximizer of the definition of We do (3.1). require that not say what is 7;' it is. Formally, such that (C) holds as but it trust that may play fictitious rules most readers will when make there is a particular but strategy), we do say that a model of learning and behavior if there are initial weight functions assessment rules a definition of the on the behavior as a condition fi' and (D) holds <j>' be familiar with the model of fictitious play, help the uninitiated to give a simple example. Imagine two players who repeatedly play a model of consistent with the We we would <£}(C<) mixed prescription in such cases (which, of course, can be a <f>\(( t ) row and player 2 the strategic-form a column. game We assume in Figure 3-1, with player that the game begins with 1 choosing the players holding "beliefs" 1 7? 7 =(1,0,4.32) and r, = (3,5.7), where we write these functions as vectors with the understanding component 1 of 77 corresponds to column 1, the second component that the first to column 2, and so on. Player 2 Column *- w Column 3 Column 2 1 Row 1 5,1 8,4.7 2,3 Row 2 2.3 2,1 4,2 CD re a. Fie. 3-1. Refer to the 1. The second player 2 will first A three lines of Table 3-1, line gives data for do strategic-form game. in the first player round 1; first 2 gives the higher payoff, and that her relative beliefs about what the vector (i.e., payoffs she will accrue given those beliefs Row which are labeled round number is 10 if 1 77 ); and next the expected she chooses row 1 wrtten down as her and then row 2. choice. Similarly, Round number Player 1 "Beliefs" 1 Player 2 3 Round number 2 Player 2 Player 2 3 about about 5.32 2.49 8.7 about what player will 1 2.04 Choice row 2 2.26 column 1 1.95 of fictitious play. do (the vector second three to the lines, labeled round number about the actions of player 2 are changed to round. Since player 2 chose column 3 in the 3 in l's beliefs increased by is expected payoffs to player and we see finds that choice of that ), 2's best choice is 1 is 2 in the (That optimal, to Fictitious play was not would behave (and instead as a method learn) for -q\ be player when what happened of learning The connection will We = (1,0,5.32).) recompute the when in the 3, advanced as playing a computing Nash changed now to reflect player l's second round row 2 and column round number originally But player 2 l's best choice. game and so on. a model of how individuals repeatedly; equilibria of the preplay thought process of individual players. model in the first round, the entry for column first his beliefs are Hence period. first is, reflect Plaver l's beliefs 2. of playing either row, using these reassessed beliefs, row 2 continues column row 1 1. are chosen. This gives beliefs for as a 2 tj 3. Move 1 Choice row 2 2.28 column 1 Expected payoffs 3.08* 3.28 rival An example Table 3-1. 2's beliefs 2.44 7.7 3 3 2.28 Expected payoffs 2.82* 3.45 rival 5.32 'Beliefs" 1 Player 2 column 6.7 "Beliefs" Choice row 2 2.34 column 3 Expected payoffs Choice 2.47 3.68 row 2 2.38 2.14 2.31 column 1 rival 5.32 1 Round number 4 given about 3 Round number 3 Player 2.31 1 Player 2 Expected payoffs 2.56* 3.62 rival 5.7 "Beliefs" 1 Player about 4.32 1 9 it was advanced or perhaps as a model How well does it stand and behavior? The following two questions are raised become dear in a bit. 11 immediately. (1) Is there that the any particular sense assessment rules player believes her rival //' is to how assessments are being formed? are consistent with a Bayesian It model can be shown in which each playing the same (unknown) mixed strategy in each round, independent of what came before, and where each player's prior assess- ment concerning (2) Is it this unknown behavior sensible or realistic to given their assessments? so for in Section 4, does not respond preceding Behavior that now we possibility arises in the round play as, at least, a we if in this sense will in in player 2's is that is example of Figure 3-1 for the its long-run implications. and Table 3-1; if we 3.0. In Nash equilibrium, then model of Proposition 3.1. fictitious play. Suppose that pure-strategy profile profile must be a follow this out row 1-column 2. Since game, increasing the weight on row Nash any history generated by fictitious Or, speaking very loosely, strict to the One assessment and increasing the weight on column 2 in player "gets stuck" at this pure-strategy a strict to question warranted. assessment only increases the optimality of column 2 and row Proposition rival very specific but interesting parameteri- can ask about Nash equilibrium be discussed our answer in this fashion then in round 8 play reaches the profile 8, this pair is a strict Thus play in the sense each player believes that his — as posited — then myopic behavior zation of learning and behavior, 1 would behave myopically, myopic is only note that to the history of Accepting the model until that players round, they choose a strategy that maximizes their immediate expected payoff, that, in each (1) just assume strategy has a Dirichlet distribution. is Nash in played for all Nash A all respectively. equilibrium. In general, play, if a strategy profile subsequent play will be that strategy is played profile. equilibria are absorbing for play according related observation some 1, l's is the following. history generated by fictitious play, a particular but a finite number of periods. Then that strategy equilibrium. 12 We refrain which from giving the proof proved is possibility; must be so, this profile judicious choice of the at any a Nash play might "stick" at some pure-strategy equilibrium. (It weight functions initial equilibrium. strict an easy corollary to Proposition is 4.1, later. Thus we see one If here; this Depending on how profile. goes almost without saying that will ties allow fictitious are broken, this play to stick is true as well of any equilibrium, even those in weakly dominated strategies.) Proposition 3.1 games strategy profile in in games on implies that fictitious play cannot converge to a single pure- do have pure-strategy that to a single pure-strategy profile. and change the entry 3-2. have no pure-strategy that Note 4.7 in play with the same initial equilibria, fictitious 2 play For example, take the row 1-column row 1-column that equilibria. Nash may game 2 to a 4, giving the is still a strict Moreover, even fail to lock in Figure 3-1, game in Figure equilibrium. Begin fictitious weight vector as before, and it turns out that column 2 will never be played. Instead, play "cycles" around the best response cycle row 1-column 1 "cycles" is row 1-column to put is, player player 2 plays column Hence 1 1 plays two row one 1 fifths of the the players' beliefs about corresponding mixed strategies. mixed strategies constitute a where and fifths of the each other will be playing converge It is straightforward to see that these mixed Nash equilibrium. More generally, we have 3.2. Proposition 32. Suppose that in some history generated by fictitious play, the empirical frequencies of pure-strategy choices converge to strategy profile is 3, third of the time in the limit, time and column 3 three how to the The proof row 1-column quotes because the periods of the cycles increase through time. in converges. That Proposition 3 to however, the relative frequencies of play of the various strategies In the limit, time. row 2-column 3 to is a Nash some (mixed) strategy profile. Then that equilibrium. omitted for now; this is a corollary of 13 Proposition 4.2. Note that this proposition implies Proposition 3.0 as a special case. natural to ask whether, in every It is convergence ditions, der way connected with the However ken. convergence if due which in care reasons not ensured; the every set of initial con- ties, then unfail, choices) are brois it known that ensured for zero-sum games (Robinson, 1951) and (nontrivial) first why convergence may (among optimal strategy taken in dealing with is in this sense is ties trivial two-by-two games (?vliyasawa, 1961). However is for at least in the sense of Proposition 3.2 will take place There are entirely fictitious play. game and example is for general games convergence given in Shapley (1964). Extensions of fictitious play 4. One problem with ification. model of the fictitious Assessments are formed according play (up to the initially play is its very rigid, ad hoc spec- to the empirical frequencies of past given weight vectors), and actions are chosen to maxi- mize precisely immediate expected payoffs. Neither part of essential to the results given above; we this specification is can obtain similar results for a broad class of models of learning and behavior. In this section, we some present results of this sort. Myopic behavior Definitions. Given an assessment rule the behavior rule G / <?{((«) 1 <t> = maximizes , n'(4> (C t ),^(Ct)) t l (d>\,4> 2 i 's is, t of i is myopic relative to p* if, i, we say that for every immediate expected payoff given assessment , 1 <f> is asymptotically myopic of strictly positive numbers e for (/xj,^,...) for player //}(Ci)- t and That is, = max s ^ s ^'(s\ fi](Q). The behavior rule within ) = u' maximizing u'(<p't (( t )^\(( ( )) + et i 's \t t ) relative to with limit zero, for every t x p and if for (, some sequence , immediate expected payoff given assessment >max , 3 . €S ,u'(s',fi\(Ct)). 14 <j>)(Ci) fi\((t) comes That . The behavior rule 4>' is strongly asymptotically myopic relative some sequence of strictly positive every support of in the s' payoff, given assessment support of in the for all s' Note {e t comes within d>\(( t ) p\(Q numbers That . is, with limit zero, for every } tt of maximizing u'iP.p^Ct)) + s i p' to t if and for Q , immediate expected U > max 3 eS ; u'(s' , , fi\(( t )) <fj\(( t ) that in asymptotically myopic behavior, the player can use slightly sub- optimal pure strategies with large probability, or he can use grossly suboptimal pure strategies with small probability, or both, as long as the "average" suboptimality, averaged according to the probabilities with played, is Even some: is at least asymptotically this less restrictive It all. work throughout with models will behavior implicitly of learning myopic with respect and behavior to the assumption has one feature that and imagine is assessment which rules. supposes that players do not try (asymptotically) to influence that player 2 selects actions according to the game row play. In this for potentially trouble- is the future play of their opponents. To see this, consider the 1, strategies are small enough. In strong asymptotic myopia, grossly suboptimal pure strategies cannot be used at We which the pure 2 is dominant for player 1, and so if game in Figure 4- model of fictitious player asymptotically myopic for any assessment rule, she will play row l's 1 behavior eventually. Since player 2 uses the assessment and behavior rules of fictitious play, he will eventually choose column row 2-column 1. instead chooses 2. If player 1 But row 1 if 1; player play converges to the pure-strategy equilibrium, 1 does not behave asymptotically myopically and each time, then player 2 would eventually choose column discounts her payoffs with a discount factor close to one, this gives her a higher overall payoff. The point is a simple one. As long as player 2 playing according to the model of fictitious play, player manipulate 2's beliefs in Fudenberg and Levine, 1 can exploit this order to receive her "Stackelberg leader" outcome 1989). 15 is and (cf. Player 2 Column Z Rowl Column 2 1 1.0 3,2 2,1 4,0 >> to Row 0- Fig. 4-1. A strategic-form 2 game illustrating the possibility of Stackelberg leadership. In light of this example, our assumption of asymptotically requires that some defense and combine two We explanation. justifications in myopic behavior defend the assumption with stories varying proportions: possible influence on opponents' future play is First, even if a player's may discount may believe that their large, the player the future sufficiently that the effects are unimportant. Second, even the players are relatively patient, they if current action will have little, if Suppose, for example, that player i which unknown 's as 's behavior actions. 10 is will not affect subsequently myopic) Weakening this slightly, if i what We is i what learns, will not affect it i i chooses and thus 's own to do, (as long subsequent believes that her rivals will be playing a fixed strategy asymptotically, then asymptotically reasons) in the future. not influenced by the actions of other players. Moreover, because immediate choice of action i happen (and possibly mixed) strategy learns her rivals' actual play at each date regardless of i i is will believes that her rivals chooses actions in each period according to some fixed but profile, on what any, effect myopic behavior (for the same warranted. are not very happy with either of these two justifications on its own. In order to permit learning to take place, play must be repeated "frequently," more 10 The point of this remark may affects the information she receives be natural, for example, if not be apparent. Imagine that :' 's choice of action in round about the strategy choices of her rivals in that period. This we imagined that the stage only observe the outcome of each round of play. Then game «' 's is an extensive-form game, and players choice of action today might affect her own subsequent actions; and she might choose to invest in information today by taking an action that (myopically) suboptimal but that may generate useful information for guiding future choices. 16 i would is . frequently than would be suggested by And extraordinarily impatient players. a substantial discount rate, except for the story that players regard their rivals should a player imagine that his (justifying asymptotic myopia) consequence, more down settles rivals are so different down that one's rivals will settle is to repeated more to repeated palatable story, each player down more and combining the two who number 1, between the same precise, Story 1. at date is chosen and 2. at t random 3. who in this that his rivals settle myopia can be given by enriching our Rather than thinking of is , a and so on, 2, and one group of players is 1 there + is reported to is a it is group plays the game. t how whom that repeated meetings is To be rivals acted in the past. stories holds. selected to play the game. They Those who play and another group a the of game the players, so that is played. At the end that is, at twenty percent of the Is chose row 1, is never revealed. random matching Each player all the entire population played. (That particular player there five whenever a player meets that random matching announced The play of any At each date all we have 1 assigned to a group with it is small group of Imagine that interact repeatedly. unaware of how these t story think of situations in which there are a large thousand players for date end of the period, Story assuming imagine that one of the following three At each date on.) belief plaver, in play of a single strategy. But even are then returned to the pool of potential players, t of the period, and so when each their actions are revealed to all the potential players. each player the palatable, especially sets of players are rare, At each date so, Story five of rivals, he more do we interact repeatedly, thousand players of justifications. of (potential) players some group A quickly than the player does himself. More convincing justifications players from himself? play of a single strategy profile (effectively) is why from internal inconsistency; as playing fixed strategies repeatedly suffers recalls at date of the players, and each t what happened in the previous encounters in which he was involved, without knowing anything about the identity or experiences of his current rivals. myopic behavior seems In each of these stories, "sensible," for reasons that varying degrees the two basic justifications given above. In the justification is mostly at work. Although the game any single individual plays very infrequently, and immediate payoff considerations and the second third stories, with good reason) that his how will it is believe that how a is any reasonable discount actions will have In story 2, this little on the player's immediate plausible intuition has some In is little impact on may because each player influence on the reported aggregate it any time soon, or even by the player's own immediate play rivals, of individuals) affects future rivals. These stories plausible intuitively, although rate, because each player attaches low probability to that future rivals will indirectly be affected effect first played relatively frequently, the possibility that his current rivals will be future rivals through an to matter of each player believing (now, own immediate he behaves will have distribution; in story 3, this at the dominate any long-run considerations. more his future rivals will behave. is first story, mix who then (through some chain make myopic behavior more remains a question worth exploring whether this firm, formal basis. Adaptive assessments Once we assume to specify the that behavior assessment rules /j' is (asymptotically) myopic, the next step used by the players and, in particular, how these assessments are revised as the players observe the actions of others. the models we In will consider, players believe that, at least asymptotically, the past choices of opponents are to fairly is weak property some extent representative of future choices. that captures this idea is A suggested by Milgrom and Roberts (1991)." Milgrom and Roberts define adaptivebehavior that our formal definition is just a gloss on as opposed to assessments, but theirs. IS it will be apparent , ^ Definition. The assessment rule there is some T(e,t) such probability no played at all more than that for all on the e between times and t is T(e,t) and pure strategies by puts very i for every if (according to t' have not been played for assessment rules > t' set of In words, the definition says that rivals that adaptive is > e and histories i 's £<> for every /x{,(G') , 1 puts opponents that were not (<< ). little weight on strategies by her long (enough) time. The class of adaptive a very broad, including, for example, assessments that take a weighted average of the history of past plays by one's opponent, as long as the weight put on any the segment. initial segment of history can be made small by lengthening Four examples of adaptive assessment rules one's rivals will play in period that to whatever was played in t go with Cournotian dynamics); (b) assess that t are: —1 (a) assess that (the assessments one's rivals will play according an exponentially weighted average of past plays; (c) assess that one's rivals are equally likely to play any action that has been played at least one percent of the time, with zero probability for scheme all of fictitious play (where While the class of adaptive other actions; and (d) assess according to the all previous observations are equally weighted). assessment rules that restricting attention to this class is broad, there are arguments too restrictive. is See, for example, the discussion in Milgrom and Roberts (1991, page 89ff) concerning sophisticated learning. Convergence to a pure-strategy profile We u 1 are now in a position to generalize Proposition 3.1. Fix and behavior l rules 4> for our two players. To assessment rules state the result, we require a definition. Definition. The infinite history ( rules if for each t = 1 , 2, . . . = and (s, for , s2 i = is •) , 1, m,). 19 . . , . said to be I , compatible with behavior the action s\ is in the support of That something that could be observed with positive probability over ( is is, time horizons for players finite who behave all according to the behavior rules that are given. Proposition 4.1. and for some T rules , 12 s - t and behavior //' ( be Let an infinite history s. for all z rules t > T If ( . such that for some (s-,,s 2 ,...) is G S compatible with adaptive assessment myopic that are strongly asymptotically <f> sm relative to the assessment rules, then s» must be a Nash equilibrium of the stage game. Normalize the payoffs Proof. player ( is in the game so that the range of payoffs for each no greater than one. Suppose Q is the partial history (si,s 2 ,..- ,s t ) of Maintain the hypotheses of the proposition; . components are eventually C's Then (without Let s 1 be s, . Suppose loss of generality) player better response, 1 's and 1 compatible with the ( is that s„ Nash not a is 4>* , and equilibrium. has a better response to s~ } than s\ . set 4 Because the play of are adaptive we , can find . T a assessment of player ity s\ s* x \i 1 , and because history eventually sufficiently large so that for all t > T , puts probability at least 1 — on the play p.](Ct) Thus the expected payoff assessment of her worse than against sZ 1 's l to > 1 for all t > T from payoff from playing S > T which t , 13 1 . Thus implies that s\ 1 's is e playing rivals' strategy choices) is at least 4e(l for all on repeated settles — e) the probabil- (against s\ - e more than behavior rule of 1 's - 3e - 4c2 > suboptimal e is e not strongly asymptotically myopic, a contradiction. 12 1 Compare with Milgrom and Roberts This is 4f = ti'Cs'.sT 1 no more than computed as - uHs\,sZ ( , the normalization times the is Theorem the probability assessed ] ) (1991, ) , less the probability that maximum that 3[ii]). -1 plays sf 1 , at least possible difference in payoffs playing S one. 20 1 -1 plays anything other than s^ 1 and s\ — c 1 , , , times which is which by Note that the proposition myopic. totically If we assumes that behavior rules are strongly deleted the modifier strongly, the result would be ity 1/t and to defect behavior rules. And each player chooses to cooperate with probabil- t, with probability where each player cooperates — (t But the strategy profile that probability. Nash equilibrium. The problem each finite history Consider the is infinite history compatible with the rules) the behavior rules are asymptoti- myopic, because they involve playing cally \)/t. in each period. This history any assessment (for false dilemma, and as stated. Consider, for example, repeated play of the prisoners' behavior where, at period asymp- of course suboptimal strategy with vanishing a "repeated" in each period is is is not a that compatibility only requires that has positive probability for the behavior rules. For the given behavior rules, a history where players cooperate in each period has prior probability zero. " In order to obtain a result in the spirit of Proposition 4.1, we must either be more asymptotic myopia instead of strong asymptotic myopia, how we make careful about histories consistent but with with the given behavior rules or study not the actual history of play but the intended strategies of the players. We provide one result along these lines Convergence to mixed strategies Next we proceed frequencies of observations to For the remainder of i.e., Let in We 1 = 2. d(( t ) S along : S — end of Section where we look '[0, some oo) up for = 2 play and convergence convergence of the empirical assume the case of that the game has two more than two players in Section (, ; i.e., o(£ t )(s) gives the number of times For the given behavior rules, no single complete history has positive probability, is players 5. give the vector of proportions of strategy profiles the partial history probability one, there in (possibly mixed) strategy profile. this section, will take 6. in empirical frequencies for 1 to generalizations of fictitious the second sense of Section 3, only; at the m fact, .s with never a complete cessation of cooperation by one side or the other, although eventually cooperate-eooperate is no longer observed. point. 21 All of which is quite beside the immediate , was played in periods 1 through t - _ ' 1 tf((t)(s\.s Then, '). , on the marginal frequency distribution Hs-'es- 1 divided by t — 1 , We induced by o(( S' in the spirit of Proposition 3.2, t write ct'(G) for i.e., ) ; we ©-'(CtX^') = are looking for conditions on assessment and behavior rules that guarantee, Suppose C for 2 = 1,2. To get is an infinite history a. Tfaen this result, is it is a Nash equilibrium of the stage insufficient that behavior £" some a, e (si,s 2 ,...) such that for game. 15 opic with respect to adaptive assessment rules. Consider, for example, the matching pennies, and suppose that at dates probabilities for any strategy by t > their rival that 4 , my- strongly asymptotically is the game two players assess equal has occurred at least ten percent two of the time in the past; until date 4,they assess equal probabilities for the strategies. As for behavior, players with the following specification If is t is divisible by 3, in all instances, the assessment leaves the player indifferent: then play "heads"; otherwise play that the sequence of plays players, if behave myopically optimally is tails, tails, and each always assesses equal heads, "tails." tails, tails, What happens heads, probabilities for his rival's . . two for , . strategies. Empirical frequencies converge to (1/3,2/3) for each, which (of course) Nash equilibrium The difficulty, of the stage it game. should be is not a 16 clear, of being an adaptive assessment rule. is both comes from the When fairly weak requirements only one strategy choice by rivals eventually observed, adaptive assessment rules converge together with 1 Please note carefully, this isn't quite the same as asking that lim a((t) = a. ( . We behind 16 this observation, to At the which we return in the two players eventually play the mixed gence of their intended strategies is not the issue. 99 is a next section. cost of complicating the description of the assessment this so that the he are only asking that the marginal frequencies converge, and not the joint frequency distribution. There lot f strategies and behavior rules, we can modify (1/3,2/3) at all dates; nonconver- (degenerate) empirical frequencies of observations. But when rivals use more than one (pure) strategy with nonvanishing frequency (or even with vanishing frequency that vanishes sufficiently slowly), adaptive decision rules can assign probability to that strategy that To obtain the result that imposed on assessment we unrelated to is limiting empirical frequency. we must sharpen seek, rules. its The simplest and most considerably the criterion works direct criterion that runs as follows. Definition. The assessment rule //' It is Q are subhistories of the fixed easy to see if for every ( Z , 17 ( . any asymptotically empirical assessment rule that: e \U(Ct)-o-'(Q\\=0. Urn where the asymptotically empirical is is adaptive; there are adaptive assessment rules that are not asymptotically empirical; the assessment rule in the model of Is it fictitious play asymptotically empirical. reasonable to insist that assessment rules are asymptotically empirical? This property is rival is playing natural if one's picture of a rival's some (unknown) But if you think that time — then some assessment scheme in response observations or that would be more to your that the one supposes may state variable that puts relatively repeatedly through shift such as a sunspot, say more weight on more uncover the probabilistic structure of the regime — recent shifts reasonable. Proposition 42. Let ( bean infinite history lim a'(C«) 1 Whenever we if is independent play of some (unknown) rival's strategy some Markov tries to dynamic behavior strategy repeatedly, or even that one's rival will converge to repeated, strategy. is — oc (s, ,.s 2 ,...) = such that for some a. e ai, are dealing with finite dimensional vectors as here, 23 E, || || denotes the sup norm. for i = l,2. 1 behavior rules <j> am is rules, then compatible with asymptotically empirical assessment rules // and is // C a Nash equilibrium The proof resembles ments. assessments of player mixed strategy a~' strategy I' s' 7 5. By . o\ If i is that a standard be worse against will will not be s . for player support of a\ the proof of Proposition 4.1 with the following by (given i relative to the assessment of the stage game. since the assessment rules First, myopic that are strongly asymptotically //' ) /j' are asymptotically empirical, the at the partial histories not a best response to o~ is strictly than is x , (, s' > e and by more than i 's is to the some pure some s' in the sufficiently large e, for all played eventually (by asymptotic myopia). But being in the support of the limiting frequencies of converge then there better against a~' than is argument, for some /i'(C<) amend- this t>T. would Thus T , ±' contradict strategy choices. 18 Objections to convergence of the empirical distributions as a convergence criterion Notwithstanding the results of the previous section, the convergence employed fails to strategies." such as Our capture what Mixed jumping from one pure strategy rebuttal to this equilibria are The conclusion assessment rules; e.g., pirical frequencies suppose that of history, for fi' a model of "learning to play players are (almost) never playing mixed strategies. ever-increasing length, so behavior The for a mixed objections begin with the obvious observation that in examples fictitious play, are instead we want criterion is is to another, (typically) in cycles of not converging. that while behavior is not converging, beliefs are. sometimes interpreted as equilibria of Proposition 4.2 the conclusion does not require the still They full power in beliefs; each side of asymptotically empirical holds for assessment rules that do not approach the em- along histories where the empirical frequencies don't converge. More concretely, reports the empirical frequencies of strictly — £'s choices over the most recent a percent between zero and one-hundred. This assessment rule empirical per our formal definition, but it is is not asymptotically empirical enough so that Proposition 4.2 holds. 24 believes the other to be acting in a among Under several actions. in the previous section and their believe they is fairly natural // (1,V5) would be ignored. Consider, (nearly) indifferent first convergence criterion used players ignore the cycles in their own left, round will And play of either top-left or bottom-right. results about fictitious play, Nash equilibrium perfect correlation in the game using the precise this we know V5) . relative beliefs vec- player fact, chooses top 1 with the numbers relative beliefs going The symmetry again implies so on, inductively. ,9 From general that empirical frequencies will converge probabilities (2/3,1/3). two if and each player's (2, not battle of the and vice versa. In be given by we do example, a symmetric of the situation implies that round, 2 will choose into the second striking that where each player begins with the The symmetry • for Imagine play of are given, top-left will be played, to the the phenomena so these cycles can lead to of fictitious play, in the first we makes this interpretation, the sexes as depicted in Figure 5-1. tor that opponent's play. However method manner But this will be realized with players' choices: Top-left will be played two-thirds of the time, and bottom-right one-third. Players will get zero round after round, there will be perfect correlation in their actions, and yet according to the the- ory they will persist in believing that they are "converging" to the mixed Nash equilibrium. Player 2 Column "Z. 1 Column 2 Row 1 0,0 2,1 Row 2 1,2 0,0 a> TO Q- Fie 5-1. 19 The battle of the sexes. Because the payoffs are rational and the relative weights have an irrational never be a tie; each player will have a unique best response at so all times. ratio, there will Moreover, for the case I 3, this > 2 . example shows Imagine game, a three-player from the perspective of players 1 run into that Proposition 4.2 will and 2, in which the actions of player are irrelevant. Players and 2 simply 1 play the battle of the sexes against each other in each round. Player an optimal strategy, murt forecast the to a main-diagonal cell 3, to choose joint actions of her rivals; for the sake of suppose her optimal action definiteness, difficulties is tic if she believes that they will play with probability 2/3 or more, and her optimal action is tac otherwise. What should the particular 3 conclude, asymptotically, model tic is optimal? the time, 2 will play always playing along the main diagonal, Or should she conclude left accordance with act in Should she conclude that of fictitious play given above? their actions are perfectly correlated, hence and 2 if 1 that will play top two-thirds of 1 two-thirds of the time, and hence top-left has asymptotic and thus probability four-ninths, bottom-right has probability one-ninth, tac is optimal? There are we (at least) two different ways we could proceed, depending on how extend the definition of asymptotically empirical assessments. ble definition is precisely the definition given before, interpreting marginal frequency distribution along nition, i 's assessment (asymptotically) rivals that are (r S of profiles from reflects ' . any correlations how observed empirically. The example shows One _i ct possi- (G) as the Under this defi- in the play of her this definition per- mits convergence (under fictitious play) to non-Nash (correlated) assessments, so that Proposition 4.2 fails. 2C An alternative definition suppose that players asymptotically assess independent play by their rivals, regardless of the empirfrequencies. ical One can 21 Then we obtain Proposition 1 I > 2. However this repair the proposition in this case as follows: Stipulate that along the history C empirical joint frequencies are the products seems a 4.2 for (in the limit) of the empirical margins. But the this repair bit cheesy. One way to formalize this is to define, for each 2G t , (' , and s € S , r)(0)(s) = Y[,ci <7'(CiH ') seems us to be somewhat unnatural; to feel that it is 1 and if it is even correlation asymptotically, is 2, then we it. unnatural for player 3 to ignore correlation in the choices of equally unnatural for player isn't it and in her choice of strategy that there unnatural to assume that players ignore Moreover, players if for two-player 1 to ignore correlation that of player 2? If so, then the example indicates games, asymptotic empiricism as formulated may be dubious. All these objections (past our example; in the battle-of-the-sexes objections have less force. In fact, for 2x2 games: In a two players (under cted. However we be found in larger 2x2 game, the if it that example shown can be for generically model of nongeneric, perhaps these is that the example be asymptotically uncorre- this conjecture is the game must converge game is to the game initial cells a neighborhood sure of the asymptotic frequencies of the easily, though, Because if we we move from will cells. is, fj(Ci) first fre- we Robust examples can be created we do not leave neat-and-tidy our objection — in empirical frequencies. that this mode of convergence gives the "frequency distribution" obtained by using the marginal frequencies a' and forcing independence. Let fj~'(Ci) give the empiricism in Nash equilibrium be nonzero sum) nor are secondary objections based on asymptotic correlation That conjecture that the strictures of exact fictitious play. cannot verify our conjecture, Nonetheless in our view the We games around rock-paper-scissors, are unable to prove either convergence to the quencies (since most games in weight vectors, along the main diag- onal. (Each of the other cells has asymptotic frequency 1/6.) these properties hold for a neighborhood of rock-scissors- unique Nash equilibrium zero sum. And, for most happens while (asymptotically) avoiding the three we nongeneric chosen payoffs the actions of the fictitious play) will games. The basis for (1/3,1/3,1/3), since the although is conjecture that robust examples of asymptotic correlation can paper. Fictitious play in this this and most basic objection) are grounded first this second sense is S - ' marginal distribution of f?(G). Then asymptotic the condition limi ||/i'(Cr) - -, *7 (Ct)|l = along every history C • . does not correspond to learning to modes research into stronger play mixed strategies With of convergence. — suffices this to motivate motivation, then, we proceed. Convergence of behavior strategies 6. Rather than look for convergence of empirical frequencies (and hence as- sessments about the actions of others), employed by strategies to some o\ 6 -' Because we , for each player we wish in the easiest fashion, interpreting we study convergence (in — z games with I as the set of i's rivals. That Two problems mixed strategies. myopic that is > 2 In the surface immediately. model of their rivals. z How must be likely is it If of <p]((, t ) , we must first as it specify how We proceed was given earlier, but assessments asymptotically in the choices of <f>\(Ct) is meant her rivals. to converge willing to play one of several pure we insisted that players choose only the basis of their assessments of the actions that a player, assess for his rival precisely the player) indifferent? i's First, if of fictitious play, computed on best replies, is, observed empirically strategy, then player i ) ? by using the definition precisely any correlation exhibit would to consider is, look for convergence of the behavior adapt the definition of asymptotically empirical assessments. will to a plavers. That we this is unlikely, based on some history of play, mixed strategy how can we that makes him (the first ever have players willfully randomizing? It is here that the asymptotic parts of asymptotic myopia and asymptotic empiricism come into play. We do myopic best responses; they can play not insist in general that players play only degree of suboptimality vanishes as time to the equilibrium mixed suboptimal responses, as long as the slightly ( t ) strategies quickly the allowable suboptimality vanishes, we 28 passes. So enough if assessments converge relative to the rate at which can sustain mixed strategies even if assessments do not match precisely the equilibrium mixtures. At the same time, we do not insist that players' assessments are precisely empirical; some equilibrium mixed frequencies of play converge to mixed beliefs can sit at precisely that limit strategies This even if means behavior as convergence to both at once. we mixed That is, strategies lo! of to The formal "nonconvergence" The second problem that concerned. is , in As we shall we precisely is mixed -strategy you may is raised is if we beliefs that are to hope convergence To see what in is at terms issue matching pennies game repeatedly. Suppose that along object, is two rows and the two columns very unlikely. (1/2,1/2)? and Unlikely — any history ( is players mixing strictly in each round this history we don't require myopic and that statements of probabilistic statements. strategies be converging to (1/3,2/3) that prevents this see, profiles. the empirical frequencies of the are approaching story, at least insofar need one or the other, will approach (1/2,1/2), but the behavior rules converge (1/3,2/3). This, our criterion: Unstable strategy profiles must be here, imagine playing the history £ power our results obtain with asymptotic myopia and beliefs that convergence of behavior some mixed from precisely myopic behavior and precisely allow carry a are asymptotically empirical. But of behavior strategies strategy, then players' strategy, justifying the play of are precisely empirical, or with behavior that for the empirical precisely myopic. is that the divergence empirical beliefs which if is at the How to the mixed strategies could players' behavior same time empirical frequencies just the right word. There is nothing compatible with behavior rules that have — but by the strong law of large numbers, belongs to an event of probability zero. If behavior strategies are con- verging to (1/3.2/3), then the strong law of large numbers says that empirical frequencies will converge to (1/3,2/3) with probability one. Given asymptotically empirical assessment rules, anything close to the (1/3,2/3) this would strategies. 29 rule out players continuing to play when Accordingly, we giving results in the spirit of Propositions give results of the following form: there is a. If is not a 4.1 and 4.2, Nash equilibrium, then probability zero that behavior will remain forever in a small-enough neighborhood of <7» , no matter what The are the initial conditions. formalities run as follows. Fix a set of behavior rules l rules that fi , although for the time being only the behavior rules are needed). Recall represents the objective conditional probability distribution on P(-|C<) Z space created by starting at A Definition. for all Note t (which will be accompanied by assessment <f>' strategy profile a. G and (t that V(\\<j>ACv) , here e Proposition (, 6.1. is - <r.\\ and using the behavior rules U unstable is < e for all Fix behavior rules Nash equilibrium Note totically and >t there exists \ Ct) l 4> that are asymptotically is p.' . some (< 4.2, in e > such that . myopic Then every strategy relative to profile a, some that is unstable. that in this proposition, behavior rules are required to myopic thereafter. =0. independent of the starting conditions asymptotically empirical assessment rules not a t' if th.? assessment relative to the rules. be (only) asymp- Compare with which strong asymptotic myopia was assumed. Propositions 4.1 We return to this point at the end of this section. Although the is fairly simple. profile a. too, , If details of the proof of this proposition are tedious, the idea behavior lies forever in a small neighborhood of the strategy then empirical frequencies will eventually and thus the players' assessments equilibrum, then (eventually) lie in a small neighborhood will too. If the strategy profile isn't a some player will want to move far Nash away from the strategy profile, a contradiction to the supposition. The key which we technical result state in the is form of the application of the strong law of large numbers, a lemma. 30 1 Lemma 1 {x 6.2. Let = t t ; A space with range some finite set e > 0, and A let , denoted that = t z<-i) , hm for all a Proof of = /' < + 77(a) sequence {u distributed t } Fix any a € 6.2. is i? la 7r(a)j (z t <); that and e A A . hm and an x all conditional on t is, < e. r {a) the is t number of times inf, > ^ 77(a) - t 22 Construct a "standard" probability space = [0,1]^' 2, -) and, writing u as the t t th component of u> , the an independent sequence of random variables, each uniformly on the unit the cardinality of — t £ A, almost surely conditional on {Q,F,P} where A !,...,/. Then _ sup. Lemma a probability satisfies x -\) t for , iTtialxi T {a) be the random variable 5I|» =1 Xf = a cr. - on Fix a probability distribution . 1,2,..., the distribution of each 7r t (-|xi,... max Let variables be the (measurable) subset of the probability space consisting of sample points such that for {xi,... ,zt-i} random be a sequence of 1,2,...} A ), with a : A Enumerate interval. = a . Now standard probability space as follows: For {a] ,a2 ,...,ajv} as random define t = 1 , yi(uj) (where A variables y t on 7 is this = a„ for that index n such that Then, inductively in t , n— n m=l m=l let y,(u.') = a n for that index n such that n— n ^7r<(am|yiM,---,y«-i(u;)) <u < t m=l *)T 7r t sequence of random variables whose If A has zero pnor . . . , y,-^)). m=l This construction uses the uniformly distributed a (q m \y^{u), probability, the lemma random variables to construct joint distribution is identical to the joint is 31 taken to be vacuous. distribution of the original sequence {x Accordingly, } t terms of the constructed probability space, and on the A limits superior one for each a taken (finitely and if y ( {u t t u , for t < > z(a) + e) for points in A for This . w, < — n(a) so because is A by : = a, the a, — 7r(a) = and < e 7r t v {-(a) t . Then — e.u) on for all the is sequence and the t u G A for which contained within the set (a|y,, — ,y ( _i) < 7r(a) + for e be the number of times that t 1 ,... ,t bounds compare with how we determine those definition; then t t' y set of points e) A in the desired result. for the y {u) = a. For r 6 [0,1], let u (r,u) which r set contains the set {u a u < : all u = (a,') we Because easy. is that the stated then they hold almost surely on at a time, many) as simultaneously, which then gives a A can define inferior of Tt (a)/t hold almost surely conditional But showing that the two bounds hold on fixed we prove we the estimate < < r (a,u) t v t {r;{a) + t.u) for u £ A follows from the asserted set inclusions. By the strong law of large numbers, hm — oc — v,(r,u) with probability one for each = - and = z(a) + r on on A, gives precisely the desired r (a) t Proof of Proposition Then equilibrium. is strictly ' 6.1. there r Suppose is we and a 1 e This, . that a. than is can find an e > that are within t of a\ a\ result. strategy profile that and i so that for , a pure strategy Since v'(a\a~') . both all in the a e < 32 ' is P u'dVi. is not a Nash such that s 1 continuous in both that are within sup norm (on respectively), v'(a\a-') + one combined with the previous bounds is a some player better against a~' arguments, <7. e r individually, so this holds with probability r both for 7r(a) = t t E~ l It of and Z" , , . o~ (Interpret x here as any element of ~E~ pendent play by i 's rivals. rivals according to the We claim that for this sessment and e l needn't be based on inde- composed of independent choices by l is profile a, We From for all asymptotically empirical behavior rules myopic relative to these as- M< - for e all t' to the contrary that for > t some = (t ) | C< 0. this probability is strictly / proceed to derive a contradiction. lemma, we know the on Q) where (/-l)e of a~ l . 3 IIjV: <? i'((t')(s on the that — ||<Pr(Ct') o* < || A set for all e J ) - < J J f^,(C«')(s ')-<ri(s ')| (If rij/i <7 «(5 J )| < (7 - l)e of positive probability (con- > t' the assessments of player lie i the limits inferior and t, A , which contradicts the Two remarks (1) Note that the the behavior and assessment rules cr~ is 2 . a T such that for all t' > T But then asymptotic myopia be more than e away A in order. neighborhood of a, that depends only on the strategy there definition of about the proof are then for any s~> = (^) jVl-, ' , will eventually <f>\,((t) within lie Since assessments are asymptotically within Je of implies that along every ( G A, ^ for all c .) empirical, along every infinite history in from a\ 's .) superior of the empirical frequency distribution a -, ((<') almost surely | ? rules, To see why, suppose ditional i.e., that are asymptotically /j. P(||<MCf) positive. ; components of the strategy and assessment rules <f> But a~ a~ l is that are profile a, used in the proof assumed to is independent of be given; the value of e not a Nash not required for this proof. What and the extent to which it is equilibrium. (2) is The full essential strength of asymptotic empiricism is that if behavior lies forever in is some small neighborhood 33 of a strategy, then assessments come to lie in strong law of large in a small numbers to show neighborhood, and then that the empirical frequencies we were empirical character of assessment rules. the following form: is /*)(£) the strong a is p)(Qt) as would But suppose the assessment rule took \/i periods. Since segment of history grows without bound, we can law of large numbers to come to the desired conclusion. still enlist Or suppose weighted average of past observations, with greatest weight on the t numbers and goes to infinity, we that "greatest weight" falls off to zero get the desired result. (Of course, this rules out exponential will stick to asymptotic empiricism for the remainder of expositionally the easiest thing to deal with. But it is a bit more than restrictive we fast can enlist a variation of the strong law of large moving averages, where the weight on the most recent observation doesn't vanish at We lie asymptotically equal to the empirical frequency of most recent observation. As long as enough used the able to enlist the asymptotically observed strategy choices by one's rivals over the most recent the length of this We another small neighborhood of that strategy. all.) this paper, since you should note that it is actually need. Locally stable strategy profiles Proposition librium 6.1 unstable. is shows We know that are not unstable. unstable that every strategy profile that The answer natural to It is is by example no; that there are A no Nash equilibrium strategy profile a. is not a some locally stable Nash equi- strategy profiles wonder whether any Nash equilibria are profile is unstable. To avoid double negatives, we make the following Definition. is if definition. there exists some asymptotically empirical assessment rules and behavior rules that are asymptotically myopic with respect to the assessment rules such that for every e > 0, we can find such that P( Urn <MG') = a, — t' oo 34 | Ct) >l-c some t and Qt e Z t We do not insist that behavior converges to the target strategy with probability one, but only that the probability can be model of behavior and assessment couldn't have pure (except there is players some choice of one as Lemma one as 6.3. we wish is not as we some am profile some behavior t' and t — oc let A be conditional on £ t be made If Since « A has (< ff.| to the e Z t make it as we as close some asymptotically myopic with , CO >0. for t < a 0<'((,V) -field = c«} one as desired, t' , t , generated by the this assertion f,i on 35 the , (2 ar, (t , t' for approaches some . . .} A in- (positive A can result. with what you find in is {£, the conditional probability of which gives the so conditioning on By the usual 1974, p.341), the probability of positive probability conditional that are continuations of ( (,< as close to C. t can approaches the indicator function of A as you have trouble squaring 'contains" we there exists {linv_ 00 the event measurable with respect this is probability) that As long appear. rules that are asymptotically Thus by Paul Levy's zero-or-one law (Chung, a may the probability locally stable. To prove the lemma, finity. as we know P( Urn &'(Cf) = guments, which would lead make be able to demanding Suppose that for a strategy respect to those rules, such that for is not is wish. empirical assessment rules and Then a. We the target strategy. the probability strictly positive, we conditions. initial degenerate cases), since as long as players use mixed strategies, the other hand, the requirement that make (for a fixed probability one statement as long as the target strategy away from as close to to for rules) for one arbitrarily close to positive probability of a very long run of "bad luck," On can a made Chung (1974), recall that same as conditioning on {Ci. — i Cf } - (,- Proposition Fix the Proof. need 6.4. Every Nash equilibrium Nash equilibrium and t a, profile ( locally stable. is By . myopic behavior to find asymptotically assessment rules, and a profile a, virtue of the rules lemma, we only and asymptotically empirical such that t P( lim 4>AQv) =o.\ G) >0. We provide two constructions that work, one in detail and one sketched. In the first construction, totic part of we use assessment rules that and we precisely empirical, In the second, we rely (after the first period) are on the asymptotic part of asymptotic myopia. use behavior that precisely is asymptotic empiricism. We myopic and on the asymp- rely discuss the relative merits of these two constructions at the end of the proof. For the first construction, create a probability space on which sequence of random strategy profiles {5i,5 2 ,...} where each and identically distributed according to a. . defined a is independently s t is By the strong law of large numbers, with probability one the empirical frequencies of strategies and joint strategy profiles all converge the support of o\ e(C<) to the and , 1=1 where up to a~'(C<) time which any t s' is Write SI for max u\s\a *(Cr)) — mirv u'is'.a '(<;<)) ; shorthand for the empirical frequency distribution of the s~' along the history in the probability one, e(( be . let max = corresponding probabilities under a, ( ) (( . That support of a\ is goes t to zero as a positive integer sufficiently large c(Ci) < — m e(G) is, is the maximum amount by suboptimal against a goes to infinity so that the event 1 for all 3G t > K, For m -, Then with (C<)- = 1.2, . . . , let A'„, . - V,/!™^ +1 has probability at least equal to (2™ A'] < < t We Km > that A' m+] A"2 For each . * and so on. Note , = For ease of exposition, assume . 1,2,..., let m(t) = m(0 = that limf—oc if < t A", let , m{t) = 1 if oo claim that by construction, the event e(G)<-rr m(i)v < = 1,2,.. has probability at least one-half. To see why, note that this event can be written as P| and note Morgan's law of first < — f° r a 11 * such that ?n(0 = m = 1 , 7/8 (or more) for to see that the complement of which has probability the third probability most, and the probability of interested in — is at Now we ment fj.\(£t) event Hence on. That m = 0, Apply De the union of events, no more than 1/4, the probability of this complement its define the assessment and behavior rules. <i((,t) is for — the event we are least 1/2. rules are very simple. For = one = 2 , and so on. ?77 this no more than 1/8, and so at is I, the second probability 0, 1/2 union m that the probability of the events in this intersection are 3/4 (or more) for the e(G) \ is, t - 1 except for the , define first ^ As promised, arbitrarily, and the assess- for k > 2 , let period, the players' assessments are empirical. As for the that m(t) i to n\(( t behavior rules, for = m, ) otherwise. let d',((<) = o[ It it if t such that m{t) = 0, e((<) < 1/m and <^j(Ct) = a\ . For t such = any best response by relative to the assessment rules given previously. for these behavior (f>\ obvious that these behavior rule are asymptotically myopic (even strongly asymptotically myopic), And let and assessment P(lim Mtt) = t—>oo rules, a* | ' Ci) ' > \Z To see note that this, < e((f) if 1/m(t') for {e(0 < l/m(0, the probability of the event induced by the behavior rules event under the probability distribution where independently and identically according this it l,...,t, then = t has the same probability behavior rules 4> = a, Thus . as the probability of this strategy profiles are all to the distribution a, event has probability at least 1/2 under the <t> t under the measure 1.2,...} same precisely the is <j> = t' . drawn By construction, generated measure, so i.i.d. 1/2) under the measure generated by the (at least And, of course, . {e(Cf)<l/m(0, * = 1,2,...} This completes the proof by the = = {&(0 a., < = 1,2,...}. construction. first For the second construction, go back to the standard probability space on which is to o, and , random defined a sequence of strategy profiles that are i.i.d. according let 5(0= max ||&-(0-*r That <5(0 is, and the is sup norm) between the empirical frequencies the difference (in the From target strategy. the strong law of large numbers, lim,_ oc,<5(0 = almost surely. So for n = 1,2.. .. , let L„ be a positive integer sufficiently large ror a11 * so that the event £CO < n has probability at least equal to = = each t Note that lim,—,^ n(t) - oo -. 1 , 2, Now we . . . , let n(t) , a~ 7 the i < L^ , l)/2 let n+1 n(t) Assume . = 1 if L, that < t L n+] > L„ < L2 and , . For so on. . if a otherwise. '((r), d , let ll*"%) - *V\ < Vn(t), and ^-', f { is, * n+1 give the assessment rules. For any history „, That if (2 > Ln believes that her rivals are playing (independently) according to unless and until the accumulated evidence against this hypothesis becomes 3S severe (as measured by \/n{t)), at which point the player reverts to empirical beliefs. These assessment rules are promised, player does that can for precisely is /i)(C<) cf~ make any some myopic behavior; but when more = l clearly asymptotically empirical. Behavior, as than one strategy is it still remains optimal. In such cases, for the player should use the strategy c\ , what each to specify ; such ;*, otherwise, the player selection desired (say, choose the optimal strategy of lowest index fixed enumeration of S' )• With these behavior and assessment the sequence {L n } begin with a, rules, the players has been constructed precisely so half or more, the players continue to use a. that, and with probability one- The proof forever. , of this is like that for the first construction. Remarks on the two constructions. In both of these constructions, players use precisely the equilibrium strategy for tion has the no positive reason many) given flexibility all. The second construc- comparative advantage of not relying on the by asymptotic myopia, as the mixed strategy of at the beliefs. However is flexibility an optimal choice provided (albeit the second construction does rely one on the afforded by asymptotic empiricism: For no obvious reason, players be- lieve that their opponents are playing exactly the equilibrium continue to believe this strategy, and they unless and until the evidence to the contrary becomes overwhelming. Thus, although the constructions show that convergence equilibrium is possible, neither to a mixed-strategy one seems particularly plausible. This concurs with our intuition, as mixed-strategy equilibria are hard stark environment of this chapter. Instead, we (for us) to defend in the tend to follow Harsanyi (1973) in interpreting mixed-strategy equilibria as a shorthand description of pure-strategy equilibria in games where parameters are subject to small random of the game (such as the players' payoffs) perturbations which are private information. 39 Sec- tions 7 and 8 consider learning in this context and argue outcomes are indeed plausible when interpreted in this that "mixed-strategy" way. Asymptotically myopic behavior and compatible histories moving Before we have on development, to this piece of pending business to take care of. and In both Propositions 4.1 4.2, were strongly asymptotically myopic Proposition we assumed 6.1, earlier indicated why myopic behavior; is weak. (too) that behavior We would not work with asymptotically 4 without we must work strict with the sort of probabilistic convergnece this section. we do not assume On Propositions 4.1 we must Also, because strategies are not players. (only) asymptotically myopic. in compatibility of a history and a profile of behavior rules verge, to avoid problems of correlation two was assessment rules, whereas In order to get results in the spirit of Section used in Because that the players' behavior rules relative to their the results in Section 4 viz., asymptotic myopia, criteria we assumed and 4.2) that strategies con- restrict attention to the case of assumed to converge, we cannot use the definitions of unstable and locally stable strategy profiles given above. Instead, for fixed behavior and assessment rules, we make the following defini- tion. A Definition. that for all The t E is xunstable crl\\ < e strategy profile a, € and G, P(||a*"(Ci') x in front of xunstable is - for all not a typo; this is t' there exists if > t and i = 1,2 | to empirically a. . it = > such 0. from which intended (as observed) play has probability zero of remaining in a small neighborhood of a. tion 4.2, in that C<) e to distinguish this definition the definition of an unstable strategy profile given earlier, in opposed some . Note that this captures some of the spirit of Proposi- asks for empirical frequencies to remain close to a target profile At the same time, it is a probabilistic event. 40 statement about the likelihood of this , Proposition that are asymptotically <p' a. that profile We to For asymptotically empirical assessment rules // and behavior rules 6.5. not a is 1 Nash equilibrium omit the proof. The idea <r» , will is then so must beliefs. But equilibrium, then for s myopic with respect some i that if xunstable. is if empirical (marginal) frequencies beliefs are close to a, and some pure strategy s' must fall to zero, that empirical frequencies stay close to a\ 7. And be used with vanishing probability. numbers) the frequency of Learning in assessment rules, every strategy to the s' , and a, in the is lie close not a Nash support of a\ then (by the strong law of large which contradicts the hypothesis (which puts positive weight on s 1 ). games with randomly perturbed payoffs Although Proposition 6.2 shows that including equilibria in mixed strategies, all Nash we have equilibria are locally stable, suggested that convergence of intended behavior to a mixed-strategy profile in the standard model seems implausible, as it requires that players use just the right they are indifferent, and to do so. Of course, this it is not apparent why mixed strategy whenever they should or would choose apparent drawback of mixed-strategy equilibria special to our learning-theoretic approach, but arises whenever mixed are considered. In response to this problem, Harsanyi (1973) mixed -strategy equilibria of a libria of a related is game game not strategies proposed that the could be interpreted as pure-strategy equi- of incomplete information, in randomly perturbed by is shock which a stochastic which each player's payoff is private information. For example, a mixed strategy placing probability 1/3 on one pure strategy and 2/3 on another corresponds strategies are when to a situation in which the payoff shocks and opponents' such that the player has a strict preference for the first strategy his payoff perturbation preference for the second plementary set. comes from when a set with probability 1/3 and his payoff perturbation Harsanyi showed that comes from the for generic strategic-form payoffs, 41 strict comevery mixed-strategy equilibrium can be "purified" by any small and well-behaved payoff perturbations. work, In the spirit of Harsanyi's behavior and notions of convergence are subject to an i.i.d. convergence to a games to which the players' payoffs in As we sequence of payoff perturbations. more allows us to construct a strategy equilibrium. extends our assumptions on this section satisfactory The concluding mixed equilibrium model of learning will see, this mixed- to play a section then examines the question of global in 2 x 2 games. The model Consider / players where 1,2,..., are subject to player This is i random and playing 1,... .1 the action spaces from the action i = .4' are the profile a = (a',a~') by p e\ RA x , and we denote = period is t = vKaj + u\(a) which We u" . We call assume its We ej e'tia'). is the = (e^a'))^^' the that for each the {e\;t = i that the perturbations denote the probability distribution of support, which we suppose compact, by is ' . Each period, each player does not observe the shocks i to observes the shock to consider mixed To model learning c\ to her own strategies in the in the stage is a i map from E' to .4' payoffs, but Hence her opponents' payoff functions. stage game, a pure strategy for player knows t each period, but the payoffs and identically distributed, and of different players are independent. need in the payoff functions are simply the are independent 1,2,...} C in at times privately observed shocks. Specifically, the payoff to date-t perturbation of player i's payoffs. E' same game the augmented or perturbed version of the underlying game, game where each a strategic- form . (We in the will not augmented game.) game, we suppose that at date t player i the sequence of (pure) action profiles that have occurred in the past, the past shocks to her own payoffs, and also her currently payoff perturbation 42 e't ; » . player We does not learn the past payoff shocks of her opponents. ? adopt the following notation: We (a) ment A use a = J~[ I=1 A' denote the space of action to by Profiles of actions . players except all by player Probability distributions over actions by are denoted i are denoted i. with typical profiles, by a a 2 - A € A~ e ' ele- 1 , l . and probability distributions over A~' (which can reflect correlations in the actions of i's opponents) are denoted by a~' £ A~' (b) Histories of actions (.4)'" We . Complete write a (( t ) to along the history (, (c) t = Z, 1 up t action profiles by i 's (d) In addition to up turbations subscript t £' players at all as in and (e) .4' (f) all A £* denote the empirical , at we up time Z by Q € denoted by to time player t f , of i € ( (r Z = (a, , a -i) 6 t . up distribution of action profiles t i.e., t i or , fj along the history knows her own (ej , . ej) , . . looks like . 's We (Ct, (e{, denotes a complete history for player all i.e., t ; to time use a~'(C«) to denote the empirical distribution of information; dates and and and and including time to all this i , , are denoted t histories of actions are opponents (t the vector of , to time i C, • history of payoff per- use • • • , £\ ep) € X\ . to denote Dropping the of action profiles by all payoff perturbations; dropping the superscript denotes (respectively) a time and complete history t of plays the players' payoff perturbations. behavior rule for player i is denoted by <?' = (<p\,<t>\ ), where <j>) : X] — 24 An assessment rule for player i is denoted by /i" = (/ij,^,...) , where ^ : Z -*A-\ t All of this i 's is a straightforward extension of our earlier model. In particular, assessment rule gives her predictions Measurabiiity of the behavior rules is how her opponents will play at each always assumed. 43 date, based not X\ . on history so We far. assuming are perturbations, so date some that of all i domain £] Given behavior rules fi\ never observe Z, and is 's payoff would not have assessments of the ? i we , could assume that as long as asymptotic empiricism for all the players, i is 's assessments at at properly defined. and given the exogenous probability on payoff perturbations, we can construct the induced conditional distributions probability distribution, conditional on ft, on the space denote of depending on her payoff perturbations. Nonetheless, notational complexity, depend on t that players other than seems sensible it actions of her opponents the cost of In this regard, note that the X We . use P(-|£«) to this probability distribution. Definition. For augmented games: (a) The assessment rule fx' is asymptotically empirical if limJ\ri(Q-cx7%)\\ = for every (b) £ e Z . The behavior rule of nonnegative is <f>' numbers asymptotically myopic relative to {e,} converging to zero, if for some sequence choice of action at every i 's E,\ is at suboptimal against /^(C*)- 2526 most et Nash equilibria of the augmented game Before examining learning in the context of this model, ture of ^ Nash A Nash equilibria in the augmented (stage) such that each player's chosen strategy Suboptimaliry here is of interest changes with a measured given the period weaker definition in is, (:£'—» s'(-) t review the struc- game. game equilibrium of the augmented we as usual, a strategy profile A') maximizes her expected payoff perturbation. which suboptimality is We believe that nothing measured averaging over e', , but the proofs are somewhat more involved. Throughout, denotes the date I we are loose in our notation, taking as understood things such as: subhistory of the fixed ( ; and 44 in (b), (, is in (a), the actions-profile part of £| . C< . payoff given the strategies of her opponents, or equivalently that for every 1 e , = a' maximizes the expectation (over c~') of v s'(e') , (a , -almost p' l ,s~ (c~ t )) + ! e'(o ). Assumption to 7.1. For each i RA Lebesgue measure on the distribution p , l is absolutely continuous with respect ' . This assumption simplifies the analysis, because tion of the opponents' actions, p' -almost every Lemma 7.2. For every o a singleton has We - t A~' ' when a -1 e' the set of c' for £ , and a 's i 7 . - ', b'(e\a~') specify let is e' , and J determined It is Lemma shows some 1 i 's rivals. best response for i 3 (a~' ) be the distribution let this is t to a~ that b l 7 actions: ^(q-'Xq') = p'{t e 7.2 + e'V^a-'Co-') [t'(a',a-') independent or correlated play by reflects induces on player which based on the observation that the complement of is her payoff perturbation (Lemma preference for one of her actions at union of lower-dimensional hyperplanes. Note that this set lies in a finite For each , measure one under p omit the proof, which true whether a strict e' argmax ai£A is has i implies that for any distribu- it that 8 x 1 : &V,a-') = c'}. well defined, since for every o~' 1 for p -almost E is every e' , ! 6 is uniquely .) straightforward to prove the following technical result. 73. The function 3 For each induced by is and strategy i s* 7 ; i.e., 7T'(s')(a continuous. profile l ) = 5' p'{e' , t'(s') let € 45 E l l : denote the distribution on s (e') = a 1 }. For s~' .4' a profile of a . opponents strategies for ?'s -1 . Note _, is strategy profile s is 7r independent of each Lemma if, A 7.4. (a) for each player (b) If understood that a for j f ' is 1 is E — A 3 : 3 — let use the strategies a product measure (since the various a Nash equilibrium -almost A x ... all 1 e' s'ie , 1 ) b , e3 are a game if and only 1 (e\7r~ (s~')). = /?'(a~') satisfies the product measure in = of the stage a' for all A~' whose margins l ) (where i it = a'(e\a ') for all is a3 are the various and i Nash equilibrium. largely a matter of marshalling definitions, hence the proof is This e' s3 i's rivals if x p -almost every ted. induced x where (s-7 )^, then every strategy profile s such that s {e i), This p* € A* x ) = other). and i 1 1 (a ,..., (5~') € A~' that s~* is, A~ denote the distribution on tt~'(s~') in s — that lemma shows that to analyze Nash equilibria the induced marginal distributions over actions, it suffices to is omit- work with which motivates the following definition. 1 Definition. The vector of marginal distributions a = (a ,..., a*) € A* x a Nash - distribution if /?'(a ') = a" for all . . . x A 1 is i Local stability As one would expect, our results about the relationship between Nash equi- librium and local stability carry over to the context of augmented games. Since all that each player observes about the others, decisions, are the actions chosen, rules <f> in all that matters for a player's define stability and stability of beha -ior terms of the induced distributions on actions ^'(^(fi)). A Definition. that for all we and t a, 6 profile and £r A 1 x . . . x A 1 is unstable if there exists some , P(lkW(fJ'))-ai|| < t for all t' > t and i =0. £f J 46 e > such Proposition 75. Fix asymptotically empirical assessment rules p that are asymptotically a. is myopic relative to the p' Then . myopic behavior converges totically Lemma any 7.6. that s' > For any 6 -maximizes player e Proof. We set of e' (under p will for show ) which player make that that for no greater than l player an there exists - i any #.4' responses a is uniform . To see x e such that for a~' all this, fix some a - and note ' 1 is 1 , the , that the set .4' (at Put . ! e most) (#.4*) a "sleeve" of where player 2 lies diameter e t upper bound on the Lebesgue measure of the union of bound goes to zero as e goes , -best there (#.4') bound on the o -measure of these sets, going to zero as l goes to zero, which establishes the Proof of Proposition 7 .5 . ^(oj is e result. Fix an a. of generality, suppose that 2 to zero. absolutely continuous with respect to Lesbesgue measure, there thus a uniform upper a around has multiple i on such hyperplanes, contained in the union of these sleeves. In the compact set E' a and for p' -almost every c ' between any two given actions indifferent denotes the cardinality of J (in -1 < L there exists an sleeves of this sort, and this uniform upper Because p - a beliefs has more than one e-best response has measure i 6 6 asymp- in the sense that the such that, for any /?'(a-')|! each of these hyperplanes, and the set of is distribution, close. > e lower-dimensional hyperplane, and there are where Nash that as time passes, payoff against a i 's Htt'V) e' not a myopic behavior, to induced distributions over outcomes become of is rules unstable. The proof uses the following lemma, which shows for a, if and behavior l that 1 ) is f a\ . not a Nash distribution. Without loss Let 8 = ^'(q: 1 ) - a\\\. Since ] l3 is , we continuous, a all -1 within We It' show will so that for can find some \\0 l (a ) - < 6/4 for Suppose not, 1 /? (ai,)|| 1 . that profile q» history l sufficiently small that e' a^ of . unstable for is e = min(e'/7, 6/4). f< P(lkW(£-))-a||| < e for alii' > and t for 2 = 1,...,/ > & 0. J Lemma on 6.2 then implies that (almost surely this event of positive probability) the empirical marginal frequencies of actions eventually Then because assessments surely on this event), distribution is are asymptotically empirical, - a7 ||/zj.(Cr) l \\ < It < e lie within of the e we conclude a'„ that (almost This * n turn implies that the ' on actions /^(/^.(GO) induced by the myopic best response to /j],((t') -1 within 6/4 of /^(a. From Lemma /iJ-(G') is ). 7.6, there is one such that the within 6/4 of ^(/^. ((/')) for any the suboptimization allowed for The e" by player l's (,- . set of e" -best Let t' responses to be large enough that decision rule is less than this e" . triangle inequality then implies that the marginal distribution over player l's actions is within 6/1 of ^(q^ 1 ) , and hence at least 6/1 > e away from a\ , which contracts the hypothesis. The next step is in parallel locally stable for myopic behavior with Section 6 is to note that every Nash equilibrium some asymptotically empirical assessments and asymptotically rules. This can be most easily construction in the proof of Proposition 6.2, in shown by adapting the second which players believe that the distribution over their rivals' actions corresponds to the equilibrium unless and until they receive sufficient evidence otherwise. With these nature of the players' behavior rules mix in the for the way is beliefs, the arbitrary eliminated; rather than just happening to the equilibrium prescribes, the players have a strict preference behavior they choose. Of course, the players' beliefs are 48 still cooked to favor the equilibrium, so how we do a special class of Global convergence This section will rules final section two-player games. in a class of 2 x 2 show by example how do not build in games learning in augmented games can an arbitrary predilection for equilibrium will restict attention to behavior and assessments do more than show that convergence to a even when the equilibrium games we is not this end, form Moreover, 3. we the behavior rules: In the artificially built in (augmented version) of the mixed consider, play will converge to the do not aim To mixed equilibrium can occur equilibrium with probability one, regardless of the We play. that take precisely the of fictitious play, as specified in (A) through (D) of Section will provides mixed equilibrium even when the assumed behavior and assessment lead to a we The players might learn to play a mixed equilibrium. such an explanation for S. not yet have a really satisfactory explanation of for very general results here. initial beliefs Rather, we of the players. content ourselves with the special case of 2 x 2 games that (before being augmented) have a unique Nash equilibrium, which moreover is completely mixed. At the end we speculate about possible extensions that for local stability of games. eral We mixed under will that an play in other augmented augmented version of Shapley's provide the desired counterexample, but The remainder Proposition fictitious a sufficient condition suspect, however, that convergence cannot be guaranteed for gen- augmented games; we conjecture example 8.1. of this section will discuss Take any equilibrium, and consider If equilibria would provide of the section, 2x2 game we have not verified and prove the following that has a unique, completely any augmentation that satisfies Assumption this. result. mixed Nash 8.3 given below. behavior rules and assessments are as in the model of fictitious play, the induced marginal distributions on actions converge, with probability one, to the unique Nash distributions of the augmented game. 49 , Previous results about the global convergence of behavior in learning pro- games cesses have focussed on by that are solvable iterated strict dominance (Moulin, 1984; Guesnerie, 1991; Milgrom and Roberts, 1990; Borgers and Janssen, augmented games we consider 1991). In contrast, the are not dominance solvable. Preliminaries augmented game with expected payoffs (v\v 2 ) and payoff 2 x 2 Fix a perturbation vectors so e and 1 e example, r 2 (l,2) that, for chooses row 2 Write the action sets for each player . 2's playoff is row (Player l's choice of 1. (v\v 2 ) has a unique Nash equilibrium if is that Assume a strict best-response cycle. best-response cycle 2 i< < ] (2.2), r (2,2) Let ,1 t (2,2)) FHz) be < z . is 1 i.' u 2 (2,l)) F < 2 (z) 2 . We want if 2 and (1,2), y (l,2) That if is necessary, so that this < i^O.l) is, completely mixed, r 2 (2.1) (2.1), < 2 u 0,l). F 1 Note that F 2 compute to l (c aV(l,l) + aV(2.1) + (l - a 2 )u 1 (2,2) + continuous on is she assesses probability a 2 is that, R on any given , 1 1 in the 1 . 2 2 2 date, (e (l)-e (2))/(i- (2.2)- 2 that 2 plays (1 1 e (2) the marginal probability that , ) . - a 2 )v y column 1. If si.e 1 2 (l,2)-vK2,2) 1^(2,1) -1^(1.1) . V f (l,2)-t.'H2.2) ] Define r x = 1 plays row plays row Simple algebra shows that the former _ A, + <l-aMl (2)-e (D 1 + eHD, while row 2 nets (l,2) if e 1 continuous. is /3 1 derived from the distribution function p is be the probability u 1 ?' 1 This probability expected payoff < game the probability that, on any given date, (e (2)-f (l))/(r (1.2)- obvious fashion. Note that Let Rearrange rows, counter-clockwise. that the 1 completely mixed. Because (r\r 2 ) has a unique Nash equilibrium that (v\v 2 ) has = {1,2) he chooses column 2 and player listed first.) is A 1 (2.1)-r (l,D 1 (1.2) + i' , 50 -i-H2.2) is 1, 1 her for her greater Then row 1 is chosen whenever (e'(2) - 1 1 < c'(D)/(t' (1.2)- d (2,2)) -xa 1 2 . This gives 1 3 (q 2 Note that x A > and 1 (thus) that similar computation will = ) / is /?' show , (1 -xq 2 ). 2 a nonincreasing function of a that we if . define 2 (l,l)-v 2 (l,2) r 2 (2.2)-u 2 (2,l)' f + V F then 2 /3 Note that y Lemma S.2. > and 1 Fix any (Q = 1 (thus) that ) 2 /? 2x2 game iw'f/i l-F is a a distributions, Nash Proof. / is to these a continuous function use Brouwer's fixed point theorem. To and (a\,a 2m ) are two Nash a\ f a\ and, in a2 > a2 . And fact, > a\ because 1 Z? is . 7.1 where /S'(q~') = al has unique for the unit square into Without i = 1,2. (a^,a 2 ) that show uniqueness, suppose 1 . strategies that are essentially unique. mapping Because 1 strictly mixed. is Assumption two equations, note distributions. a\ ). that satisfies distributions are pairs (a\,a2J 5 1 (q 2 ),/5 2 (q 1 )) 1 unique Nash equilibrium that and thus has Nash equilibrium To show existence of a solution ( (l-ya nondecreasing function of a Then every augmented version of the game Nash 2 itself, >-* and that (a\,al) loss of generality, assume nondecreasing, this implies that is nonincreasing, this implies that a\ < a. , a contra- diction. We Nash distributions are both strictly has a row hereafter denote the probabilities of by a\ and a 2 . In general, between zero and one, even if it is between zero and one are sufficiently small. 51 and column if 1 in the unique not the case that a\ and a 2 the original unique, completely mixed equilibrium. probabilities are strictly 1 However (unaugmented) game the Nash distribution the supports of the perturbations Assumption 83. For its 1 support. F is 1 is F 2 the density function for This assumption is on some neighborhood of strictly positive on some neighborhood of implicit form, since density functions of the distribution functions of the original distribution functions p A F F 2 Restating . it terms in assumption 1 and F 2 are connected is is that and small between zero and and p2 are bounded and 1 involves the strictly positive interior of their supports. The proof details. We of Proposition 8.1 So before giving those fix involved, and is fairly we details, 8.1 easy to get lost in the it is sketch the intuition behind the proof. behavior and assessment rules where behavior is myopic with respect assessments and the assessment rules conform precisely to the model of fictitious play, as by —i and that the equilibrium marginal distributions are strictly The intuition for the proof of Proposition to the 1 set of sufficient conditions for this one, and the density functions for p on the F it and p2 of the perturbation vectors 1 the supports of the perturbation vectors enough uniformly bounded on is strictly positive somewhat stated in tedious but not difficult. F' the density function of The density function for — xa\ and , = 1,2, i at variable, date given in Section t that player We 3. i will write a\ for the probability assessed will play her first action. depending on the history of play up limi_oo(aJ,Q-j) = (q^q 2 Nash equilibrium implies that behavior converges to the For notational simplicity, all we suppose dates t > 2 weights identically equal to zero in the allowing for nonzero qm = initial (ta't + ta\/(t t . We with probability one; since behavior ) empirical distributions at to date This 1), + 1), that myopic, this strategies. which corresponds fictitious play model. to the case of initial It will be clear that alter the analysis. In this case if i plays action round t, if z plays action 2 in round t, 52 random show will is a that the players' assessments equal the weight does not l)/(t + , is 1 in and . . which more conveniently written is (1 — —a\/(i + The key thing note here to + a\)/(f as 1), 1). if i plays action round /, if 7 plays action 2 in round 1. that the size of the is in 1 and changes of the o' vanishes asymptotically, at rate 0(1/1) The theory of stochastic approximation (Arthur et Lyung and Soderstrom, Clark, 1978; 1983) al., 1987; Kushner and shows when such asymptotically van- ishing stochastic variations can be ignored; i.e., when the successsive random variables will almost surely evolve according to the evolution of their expected values. Starting from some value of values (q ^, a 2uz ) so on. , for fix some it - (a] F 2 (l - will , a\ , ya\) , and if certain regularity conditions are met, then a 2,) and compute the conditional expected values of a bit more transparent, let us the probability that player 2 will play his , compute E[oJ +] — first strategy so the expectation of the difference between -»<.!)) + £l*l-va;). This simplifies to E[a^ +] - a]\it] = ^(1 - F*(l - yo)) - a 2 ). similar calculation gives E[a] + (a] +: , - a\\Z t ] = 1 —(F\l 53 - xa]) - a\). is a2M and 2 i3 a] , a 2t+] ) aj|£,] denotes expectation taken with respect to P(|£<) E[-|£ ( ] £f(l-«1 A and almost surely approach (Q»,a^) makes matters = 1,2, where 2 Given 1 to (a\<al) random process Or, since , for every starting value of (a], a]), this "successive expected values" If So the (conditional) expected . sequence converges the compute , 2 then use these to compute expected values of (a) +7 a u2 ) 1 , (a], a]) . (a)) is = The equations fact that the step size in these difference that the evolution of the "successive expected values" going is zero suggests to approximates that in the related differential equation system ^ where we = F'(l -*a>) -a], and a reinterpret the 1 ^ = 1 s as the successive - J*(l - ya\) - a?, expected values, and the time index has been changed: Because the amount of change between time t + in the differential equations is 1 independent of more and more t, and t steps of the difference equations are compressed per unit of time of the differential equation system as The t gets larger. trajectories of this system of differential equations are most easily studied by comparing them with those of the system ^ We show below = F'(l - xa\) - ai, and that the ^ = 1 - F 2 - ya\) - a2 . second system has closed, convex orbits around the point (a\,al), and that relative to the second system, the inward. (See Figure (l 8-1.) Thus the first first system spirals always points strictly towards (q».q») in . This suggests that the "successive expected value" sequence approaches (a\,a 2,) from any starting point, and then the methods of the theory of stochastic approximation will yield the almost sure convergence that In relating this intuition to the proof watch for. First, we we desire. we now will use the closed orbit trajectories of the of differential equations as level curves for a Lyapunov Lyapunov function we construct is a bit less regular general results as they are stated in the literature. Specifically, our Lyapunov function is than we is Second, we need, because required for the 27 not twice continuously differentiable in general. 54 to second system function. derive parts of the theory of stochastic approximation that the two things give, there are . a' Dynamics Fie. 8-1. The of expected beliefs. solid curves represent trajectories of the second in the text. These closed orbits give the these level sets are not restricted to stay inside the the trajectories of the system of differential equations given Lyapunov function L (Note that unit square.) The dashed arrows show level sets of the . system of differential equations, the system that describes the first dynamics of "expected beliefs." (These do stay within the unit square.) Since the dashed arrows always point inwards relative to the closed orbits of the second system, the first system gives trajectories which spiral in towards (a\,a\) Proof of Proposition For (a 1 , a2) € L(a\a 2 )= The function L (a). [0,1] x [0,1], let L(a\a 2 ) be [l-F {l-yF)-al]dF- the function defined 2 / (a mnemonic for Lyapunov) has [F / 1 (1 - x/? L (a^o^CaiXhthen L(a\a )>0. 2 is continuously differentiable with gradient vector VI = (1-F 2 -ya^-Q^Q -F 1 (1 . 55 2 ) by - a\}d0 the following properties: L(a\,a 2.) = 0. (b) If (c) 8.1 ] (l -xa 2 )). 2 . . This gradient vector (d) L is convex. In ^ that (a) is = F - 1 wind around trajectories that ya\) = a 2 1 - F 2 — ya (l neighborhood of a\ 1 is ) F and , F and , 8.3 to note that in a ^ - (l — xa 2 (l is ) Property . yaj) - a2 . Then F ] (l - xa2 ) convexity; the rest Note that For part just noted. L is Assumption nonincreasing, and it is strictly then follows. Property (b) 1 - F (c) is — 2 compute the Hessian of L (d), first to ya^) show an exercise in integration. separable; is note strictly increasing virtually a matter of definitions, together with the properties of and (b), enlist it is . ) For property . nondecreasing, and ] decreasing in a neighborhood of a 2 2 the point (a\ .a 2 - xa\) = a\ ] else. are trajectories of immediate from the definition of L — F^O - 1 L the level curves of fact, = J«(l - xal) - a\ and which gives closed Property zero at (q'.,q») and nonzero everywhere is L\*>) = / L(a\a 2 = Via i.e., [1 1 ) - F 2 (l - 1 y/? ) ) - L 2 (a 2 where we ) define - a 2 ]d/?\ and Jol L Now fix 2 (a 2 = ) For every (' , is t precisely > 1 , two players. t [F'{1-x/37 )-a\]d!3 myopic with 2 . fictitious play, and sup- respect to these assessments. define L«(C«) L / Jal 2 assessment rules according to the model of pose that behavior In words, r = L{a\,a\) = Via)) + L 2 (a 2 ). t the value of the function I at the vector of assessments by the (Q) is We will have proved the theorem with probability one, since this will if we prove imply that the 5G that lim,_oo 1,(0) = beliefs are converging (with , . , probability one) to a\ and a\ , hence (by and myopic behavior) earlier analysis the marginal distributions over actions are converging to those values. The step in proving that lim, first - E[Z, +] ((< + i) such that we Li(Ct)|&]- Specifically, continuous function A' if t — (( t ) show will bound on an upper bound for to derive is * on the unit square with l the uniform is L that there exists a nonpositive 1 i(a' ,oc 2 < ) 1 for (a' ^ 2 )^ (a\,al) two perturbation the density of the scalars, E[L l+1 (C, +i) - < X-f(Ci>KiJ We That obtain this we is, bound by looking t write - E[L 2 (a 2M ) - L 2 (a]) fc] t t we (+) 1)2 2{t 1 two (separable) pieces of L (Q) at the E[L M (CM)-L (Q\i ]=-E[V{a'M )-V{a\) and ~ + + t | ( \ ( j begin with estimates of each of the two expectations on the right-hand side. Consider equal (ta] + 1)/(t + 1) ta)/(t + V) if player 1 a2 1 will , , player V the term involving first if Given . player 1 plays row row 2. Since l's beliefs plays play row 1 in round 1 a\ we have , round in t about (+] F 1 (l — will will equal q] +] 2's actions are with probability t and , q' that iq') given by . Now rdoj +D/U+1) [l-F2 (l-yp)-ai}df3\ which bounded above by is 'ta) t where 28 K This is is + + 1 ~a < l-F*(l- 2 l the uniform ta) ya))-a] bound on + 1 a, y t + the density functions of 1 F 1 and F 28 2 the length of the interval of integration, times the value of the integrand at Sim- . 3 1 = a) plus a bound on the integral of the difference between the true integrand and the integrand at a This latter bound comes from noting that P •— constant Ky F 2 . 01 (l - y/3) is , 1 , . Lipschitz continuous with Lipschitz 1 1 plifving, this , is 1-°1 l-f^l-yaj) -a 2 . + * y 2 l Similarly, l/ tQ \ -J-T-ifS a. ' s I -a < Hence E [X 1 (a 1 ,.,., + + xA'y 1 - -^(a ,) 1 ) 1-F t 2 + - fr] | a: is bounded above by - al)F' - qJO - F F - - a2 2 Kxj ) ' F + 2(y l) 2 '2/ F 1 a! + 2(f the l 1 1 1 where we suppress + * l) 2 ' 1— :m 2 and 1— yaj arguments (respectively ) of F 1 and 2 . Similar calculations show E[X 2 (q 2 +1 ) - I 2 (a 2 ) that is 6] | bounded below by F -q 1 1 t + F2 - a - A' 2 2(* 1 + l) 2 Thus E[WC, + i)-L,(Ct) I = E[L 1 (al +1 &] = E[L(q| +1 .g^) -1(a), a )-I 1 (a]) I is 6] 2 ) | -E[L 2 (a 2 +1 )-L 2 (a 2 6] ) G I bounded above by 1 - F2 - Q2 F -a,0 1 1 F 1 a) 1 t + E[X. two expressions m (Cw)-Li(Ci) t-f t 2 + -q: into the it] F2 - a 2 Jy'Gt 2(* 1 [1-F2 -q 2 Write [F'-a 1,] as [F'-ai+ai-Q;] ard write substitute these - ] as upper bound, and + + y) l) 2 [l-F2 -a 2,+a 2m -a 2 t simplify. This gives < I F -q 1 M i 58 1 A'(x + y) 1 . a. - ] a, 2(i + D 2 , . Define ,(a\a 7 ) = The signs — a2 then (a of 1 F - [l . (l - - 1 ) J/Q - F - a 2 and 2 -q i is a 1 1 - F - a 2 ± 0, and 2 a 2 ) ^ (a ., a 2 ) o if -a a;] [a\ q. a 2 are the same, so that 1 ] - [F(l - xo 2 - . Next, for each and / (r I'ACt) a\) [a ) are opposite, - 2 F - }. and 1 if a\ f 0, so i(q\q 2 ) 1 the^ , 2 a F -a\ and the signs of nonpositive function. Moreover, 2 ^ a 2 1 bound Putting everything together, this gives the 1 1 2 a ^ a 1, 1 < for (*) let , = E[I,(C«) - Lf_i(C»-i) | 6-i]. and LtiO = LAG) - ^max{^(G-),0}. By construction, {0} • By so {!<} bound the is {!,((,)} a forms supermartingale over the information sequence (*), bounded supermartingale, and hence has an immediate consequence, L Finally, a not possible it is t (( t ) that, itself has a limit almost surely. with positive probability, greater than zero. To see this, suppose that along From . the construction of the function that, in this case, lim T 0. a if , da), a]) < f < sup^^a^a Increase show matter of algebra to we define for those ( L t (Ct) as where L t T that L,(Q - (( t ) if 2 ) < i some < has nonzero i/(2(t limit. 59 history ( 0, so that for ]T', =1 ibviQv) this limit is a given previously, necessary so that ri»,(C») almost surely. As a limit , + 1)) all T> t we know that lim,—^ L,(Q) > it is > T A'(x for all , t value easy to show some large for + y)/|f > T . |. Then it is Accordingly, lim,— oo -WCt) = °° But {L t is (Ct)} bounded below. a martingale that is L^d) has nonzero If with positive probability, L/(C<) has limit oc with positive probability. In limit lim,^ooE[Z,((<)] = oc, which contradicts the fact that this case, {£,((,"<)} a is martingale. Extensions of Proposition 8.1 Proposition 8.1 gives a relatively restricted global convergence result. restricted in that the behavior specific; and assessment rules It is that are permitted are quite behavior must be precisely myopic with respect to assessments that are formed according considers only game has to the model of fictitious play 2x2 games, and then only 2x2 games a single equilibrium that which the augmentation Thinking It is satisfies is it which the unaugmented in completely mixed, and then only those for Assumption 8.3. about the behavior and assessment first further restricted in that rules, it is clear that exten- sions are possible. (Indeed, extensions along these lines are suggested by Arthur et al. (1987).) totically Assessments can be asymptotically empirical and behavior asymp- myopic, as long as the "rates of convergence" to opic behavior are sufficiently fast. All we need which involves the two possible values of more for i = 2) and — bounds on E[L M -L,\£ a\ the probabilities of those values; are uniformly 0(1 /t 1 ) or smaller. a rate of convergence of asymptotically delicacy is |q' - ( Here q test One (two for we i = t ] , and two 1 can tolerate changes can control the probabilities by imposing on the sequence myopic behavior. But called for. convergence 9 M a' play and my- terms of these differences and probabilities, contribute differences that that, in that are the fictitious One's {e<} that governs the asymptotic part for the differences q| +1 first instinct might be to impose to empirical assessments; the natural condition q~'(G)| _, (Ci) is is at most 0(1 /i 2 ). 29 the fraction of the time that 60 But this is a — a\ , uniform first strategy. is more rate of would seem stronger than -i has played his a bit to be needed, since this controls deviations of u! know to far that a\ is "close" to empirical frequencies, but only that from empirical frequencies, aJ +] new (from the 0{\/t). we If is going empirical frequency) in the that for fictitious play beliefs with is from empirical frequencies; we don't need insisted that \a) - nonzero a~''((,)\ to is fairly be about the same distance away same initial direction. In this regard, note weight vectors, 0(1/t 2 ) or smaller, is a\ if - o _1 (G)| |ai we would rule out these assessment rules, for which the result does hold. Extending our results to other classes of 2 x 2 games or beyond 2x2 games We (and to games with more than two players) seem to offer greater challenges. conjecture that results on can be derived, along the following local stability line. Take an augmented game and any equilibrium distribution for that game. Write down the continuous-time quencies, as in equations dyanmics (*) above. for the expected values of the empirical fre30 If librium values by the usual eigenvalue system this test, is locally stale at the equi- then the equilibrium will be locally stable in the sense of this paper for fictitious-play learning dynamics. with Arthur et al. [1987].) It is interesting to speculate whether the continuous- time system that goes with the Shapley (1964) counterexample equilibrium. might If so, then its instability fictitious is unstable at the play (for augmentations) follow. Regardless of these conjectures, managed is under (Compare to derive plausible that in here indicate some we hope that, that the limited results we have with Harsanyi's notion of purification, cases players would learn to play a it mixed strategy equilibrium. This reason we is conceptually straightforward, but it is offer conjectures instead of results. 61 not a simple exercise in practice, which is one References Arthur, W. proportions of Borgers, M. Ermoliev, and Yu. M. Kaniovski (1987), "Limit theorems balls in a generalized urn scheme/' mimeo, I1ASA. Yu. B., and M. Janssen T., (1991), "On the dominance for Cournot solvability of large games," mimeo, University College, London. Brown, George W. C. Koopmans New (1951), "Iterative solutions of (ed.), Activity games by fictitious play," in T. Analysis of Production and Allocation, Wiley and Sons, York. Canning, D. (1991), "Social equilibrium," mimeo, Cambridge University. Chung, New K.-L. (1974), Course in Probability Theory, 2nd edition, Academic Press, York. and H. Haller (1990), "Learning how to cooperate: Optimal play repeated coordination games," Econometrica, Vol. 58, 571-96. Crawford, in A V., Eichenberger, J., H. Haller, and F. Milne (1991), "Naive Bayesian learning in 2 x 2 matrix games," mimeo, University of Melbourne. Ellison, G. (1991), "Learning, Local Interaction, and Coordination," mimeo, MIT. Fudenberg, D., and D. Kreps (1988), "A theory of learning, experimentation, and equilibrium in games," mimeo, Stanford University. Fudenberg, D., and D. Levine (1989), "Reputation and equilibrium selection games with a patient player," Econometrica, Vol. 57, 759-78. in Fudenberg, D., and David Levine (1992), "Self-confirming equilibrium," mimeo, MIT. Guesnerie, R. (1991), "An exploration of the eductive justification of the rational expectations hypothesis," forthcoming, American Economic Review. Harsanyi, J. C. (1973), "Games with randomly disrrubed payoffs: for mixed-strategy equilibrium points," International journal of 2, A new Game rationale Theory, Vol. 1-23. Hendon, E., H. Jacobsen, and B. Sloth (1991), "Fictitious play in extensive form games and sequential equilibrium," mimeo, University of Copenhagen. Jordan, J. (1991), "Bayesian learning in normal form games," Games and Economic Behavior, Vol. 3, 60-81. Kalai, E., and E. Lehrer (1990), "Rational learning leads to Nash equilibrium," mimeo, Northwestern University. 62 Kandori, M., G. Mailath, and R. Rob (1991), "Learning, mutation, and long run games," mimeo, University of Pennsylvania. equilibria in Kushner, H., and D. Clark (1978), Stochastic Approximation Methods for Constrained and Unconstrained Systems, Springer- Verlag, New and T. Soderstrom MIT Press, Cambridge, Mass. and Practice of Recursive Lyung, L., Thomas Marcet, Albert, and mechanisms learning (1983), Theory J. York, Identification, Sargent (1989a), "Convergence of least squares in self-referential linear stochastic models," journal of Eco- nomic Theory, 48:337-68. Thomas Marcet, Albert, and in environments with hidden Political Sargent (1989b), "Convergence of least-squares state variables and private information," journal of Economy, Vol. 97, 1306-22. Milgrom, in J. and P., games with Milgrom, Roberts (1990), "Rationalizability, learning, and equilibrium strategic complementarities," Econometrica, Vol. 58, 1255-78. and P., J. J. Roberts (1991), "Adaptive and sophisticated learning peated normal form games," Games and Economic Behavior, Vol. Miyasawa, K. non-zero "On (1961), 3, 82-100. the convergence of the learning process in a sum two-person game," mimeo, in re- 2x2 Princeton University. Moulin, H. (1984), "Dominance solvability and Cournot stability," Mathematical Social Sciences, Vol. 7, 83-103. Nyarko, mimeo, "Learning and agreeing Y. (1991), New Robinson, J. to disagree without common priors," York University. (1951), An iterative method of salving a game, Annals of Mathematics, Vol. 54, 296-301. Shapley, ley L. (1964), "Some and A. W. Tucker topics in two-person games", in (eds.), Advances in Game M. Dresher, L. S. Shap- Theory, Princeton University Press, Princeton, NJ. Young, H. P. (1991), "The evolution of conventions," mimeo, University of Mary- land. G3 wbb vJ J72 Date Due »CN- JMl B 3 ' / (:: Lib-26-67 Hilt 3 =1060 0077lbbb 4