AN AXIOMATIC APPROACH TO BOUNDED RATIONALITY IN REPEATED INTERACTIONS: THEORY AND EXPERIMENTS
Laurent Mathevet∗
This paper proposes an axiomatic approach to study repeated interactions between
two boundedly rational players. Several axioms are presented: individual security; (strong
and weak) collective intelligence; and internal stability. We characterize the axioms’ solutions and payoffs, classify stage games by the degree of sophistication required to satisfy
strong collective intelligence, and present experimental evidence broadly consistent with
the axioms’ implications. Of particular interest are the dynamic predictions in Buchanan
(1977)’s Samaritan’s Dilemma: our main solution predicts that players will reach efficient
agreements in which payoffs depend on the Samaritan’s ability to build a reputation via
punishments; yet these punishments vanish over time, as if the reputation became common knowledge. Experiments support these predictions.
Keywords: sequences of play, axioms, pattern selection, complexity, learning, bounded
rationality, experiments.
∗Special thanks are due to Jonathan Lhost, David Pearce, Julian Romero, Philippe Solal, Dale Stahl, Max Stinchcombe, Ina Taneva, and Tom Wiseman for many discussions and suggestions. I also wish to thank Matthias Blonski, In-Koo Cho, David Frank, Guillaume Frechette, Drew Fudenberg, Herve Moulin, Mike Peters, Ariel Rubinstein, Sahotra Sarkar, Rick Szostak, and Peyton Young for their helpful comments as well as the seminar/conference participants at NYU, Purdue, Rice, UBC, UCR and the 2012 North American Summer Meeting of the Econometric Society.
Dept. of Economics, University of Texas at Austin, 1 University Station, C3100, Austin, TX 78712, U.S.A. (lmath@austin.utexas.edu)
1. INTRODUCTION
In game theory, the analyses of bounded rationality often appear as heterogeneous or fragmented. The difficulty comes from the many ways to be less than rational. As a result, if
one writes down a precise model of exactly how players are limited in rationality, then the
model must be specific to avoid becoming cumbersome. In repeated interactions, the learning literature faces this problem. On the one hand, the model can shed light on a specific
facet of bounded rationality (e.g., adaptive learning (Milgrom and Roberts (1990)), reinforcement learning (Erev and Roth (1998)), aspiration-based learning (Karandikar et al. (1998)),
pattern recognition learning (Sonsino (1997)), regret-based learning (Hart and Mas-Colell
(2000)), etc).1 On the other hand, the model can combine multiple features of bounded rationality (e.g., Camerer and Ho (1999)), but tractability quickly becomes a concern: theoretical
predictions might require computer simulations. Besides, most learning models assume that
all players follow the same learning rule. Allowing for different behaviors is important, but
tractability often prevents it. In this context, this paper departs from the standard approach
and asks the question: are there plausible principles that describe the outcomes of a repeated
interaction between two unsophisticated players?
This paper proposes an axiomatic approach to study repeated interactions between two
boundedly rational players. Our objective is to explore an abstract approach to bounded
rationality that is complementary to the descriptive method of the learning literature. In
general, the researcher can either model boundedly rational behaviors precisely, for example,
by specifying learning algorithms, or she can describe the implications or the outcomes of
boundedly rational behaviors. In this paper, we explore the latter option. In other words, our
¹In these models, the lack of sophistication can be interpreted as bounded rationality in a standard repeated game, or as a more “rational” response in a random-matching protocol. Brown (1951), for example, introduced fictitious play using a bounded rationality justification: “One might naturally expect a statistician to keep track of the opponent’s past plays and, in the absence of a more sophisticated calculation, perhaps to choose at each play the optimum pure strategy against the mixture . . . ”
objective is not to propose an alternative precise model of bounded rationality in repeated
settings, but rather to propose a plausible recipe for less than rational play. The desire
to dispense with a model calls for a model-free approach, and in the context of repeated
interactions, the infinite sequences of play are natural primitives for this exercise.
Formally, a repeated interaction is an infinite sequence of repetitions of a stage game. A
solution is a description of all the sequences of play that are likely to arise in a repeated
interaction given its stage game. In technical terms, a solution is a correspondence that maps
the set of stage games into the set of infinite sequences of action profiles. Our approach
consists in writing axioms on the infinite sequences of play and then characterizing the
solution and/or the solution payoffs.
Our first objective is to introduce an axiomatic framework for repeated interactions and to
illustrate it with specific axioms. Although our axioms are not the only reasonable ones, they
are plausible principles that boundedly rational players might satisfy in many situations. Our
second objective, after characterizing solutions, is to supply empirical evidence that supports
the axioms’ implications.
Before continuing the discussion, we illustrate our approach in examples. The main question
is: are there patterns of play that seem more plausible than others?
Stag Hunt
        S      H
 S     4,4    0,3
 H     3,0    2,2

Dinner Game
        P̄      P
 Ē     2,3    0,2
 E     3,1    1,2
Consider the stag hunt game. Two individuals go out on a hunt every day. Each can choose
to hunt a stag or a hare. Suppose the history of play becomes common knowledge after each
period and players are completely patient. To illustrate the first axiom, consider the sequence
of play ((S, H)(H, S) . . .), which is a scenario of repeated miscoordination.2 This sequence
will be eliminated by individual security. In this sequence, each player earns less on average
than what she could have secured by switching at any point in time and forever to action H.
This axiom can be interpreted as imposing some minimal amount of rationality. To illustrate
the second axiom, consider the sequence ((H, H)(S, S) . . .) in which both players alternate
forever between hunting hare and hunting stag. This sequence will be eliminated by collective
intelligence. Unsophisticated players like to do simple things: if they could, they would play
a single action profile forever. One of the main reasons for alternating between profiles is to
increase payoffs. Thus, two boundedly rational players would not play a complicated pattern,
if within that pattern, there were a simpler one that gives them both higher payoffs. In other
words, since it is costly for them to coordinate on complex intertemporal patterns, they
would not increase their coordination effort to both earn less.
Let us illustrate our third axiom in the dinner game. The row player is a child and the column player is the parent. The game is played every day before dinner. The child chooses to eat a cookie (E) or to not eat it (Ē). The parent decides to punish (P) or to not punish (P̄) the child. The child has a dominant action to eat the cookie, and the parent only prefers to punish the child if she eats the cookie. Consider the sequence ((E, P̄)(Ē, P̄) . . .), in which the child eats the cookie one time out of two and the parent never punishes the
child. This sequence will be eliminated by internal stability. The intuition is that a player
has to justify to herself why she is not trying to be greedier, but she is unable to process
all the counterfactuals; thus, she bases her reasoning on direct observation of the path. In
this sequence, the child may be unable to justify to herself why she only eats the cookie fifty
percent of the time, because she never suffers the negative consequence of eating it. Thus,
²This sequence is produced by Cournot dynamics and by many sophisticated learning dynamics (Milgrom and Roberts (1991)) when the process is initiated at (S, H).
the sequence is destabilized by the child who will eventually increase her eating frequency.
One virtue of this approach is that it allows us to build a multifaceted picture of boundedly
rational play by combining various independent principles. The multifaceted solutions that
emerge from this framework deliver insightful conclusions. Our main solution, for example,
will unearth reputation building phenomena (Section 4.2). Furthermore, this approach makes
it possible to explore the many ways to be less than rational within the same framework, by
comparing different axioms and their solutions. In the conclusion, we suggest several avenues
for future axioms and characterizations. It is also worth mentioning that the combination of
axioms need not jeopardize tractability. Our characterizations, for instance, will be easy to
compute. Finally, as a pattern-based approach, it is pragmatic in nature and experimentally
testable (Section 4).
However, the main limitation of our analysis is the lack of an answer to the question ‘how
exactly do players produce the desired patterns?’ Abstraction is the raison d’être of our
approach, and the absence of a model of players’ behavior is deliberate. Yet, if an axiomatic
solution is valid, then one can build a model that explains it. In this sense, our approach
serves a different purpose from the existing literature and it is complementary to it.
Our main theoretical results are characterizations. The first result characterizes the solution under individual security and strong collective intelligence. This result allows us to
characterize the solution payoffs, which take the form of line segments. In 2 × 2 games, these
payoffs are reminiscent of the equilibrium payoffs of Abreu and Rubinstein (1988); we compare both characterizations in examples. Subsequently, we introduce internal stability and
characterize the payoffs under this axiom; once again, they are represented by line segments.
We then combine these characterizations and describe the sequences and the payoffs that
survive all axioms. In general, our solutions offer sharp predictions. In certain games, the
payoff prediction is even unique.
The empirical results evaluate the performance of our main solution, based on the experiments by Mathevet and Romero (2012). The data consist of over 400 sequences generated by
human subjects, who played eight different stage games, including the Prisoners’ Dilemma,
for more than 100 periods on average. Based on payoffs, type I and type II errors, our solution is at least as predictive as the Experience Weighted Attraction (EWA) learning model
of Ho et al. (2007) and the reinforcement learning model of Erev and Roth (1998). The
comparison is important, because these models have been shown to be highly descriptive of
human behaviors.3 Moreover, we use two different pattern detection algorithms to extract
all sequences that exhibit a pattern. We obtain two subsets that each represent about 2/3
of the data. In each subset, the axioms hold more than 90% of the time. To give some order
of magnitude, the first algorithm considers all patterns of play whose length is at most 10, which represents more than 10⁶ patterns. In several games, less than 0.04% of those 10⁶ patterns belong to our solution, yet 90% of observed patterns fall into our solution. Of particular
interest are the predictions in the Samaritan’s Dilemma (Buchanan (1977)). Our solution
predicts that players will reach efficient agreements in which payoffs depend on the Samaritan’s ability to build a reputation via punishments; yet these punishments vanish over time, as if reputation became common knowledge. The experiments do suggest the establishment of a reputation that persists after punishments have vanished.
The importance of bounded rationality in repeated interactions has long been recognized
by the learning literature (Fudenberg and Levine (1998)), but the axiomatic method has
rarely been used in this context (see Thomson (2001) for a survey). In a recent paper,
Blonski et al. (2011) proposed an axiomatic method for selecting equilibria in the repeated
Prisoners’ Dilemma.
³These models might not be intended for repeated games. Yet, their empirical success makes them a valuable benchmark for repeated games, inasmuch as the alternatives to this type of models are scant.
Our execution of the axiomatic method leads to pattern selection. The fact that patterns of observable behavior can be the outward expression of inward motives, and thus,
can convey important information of psychological nature, is the tenet of frequent pattern
mining (Han et al. (2007)) and behaviorism (see Moore (2011) for a sketch). Applications of
pattern mining include market basket analysis to understand co-occurrences among products (Agrawal et al. (1993)) and web usage mining to analyze users’ browsing activities
(Eirinaki and Vazirgiannis (2003)). The importance of patterns has been recognized by the
learning literature (Sonsino (1997) and Foster and Young (2003)), but we take it further by
considering patterns of observed behavior as primitives of the analysis.
The paper is organized as follows. The next section introduces the framework and the basic
definitions. Section 3 provides the axioms and the characterizations. Section 4 presents the
experiments and the empirical results. Our approach raises important conceptual questions
that we discuss in Section 5. Section 6 concludes. All proofs are relegated to the appendix.
2. PRELIMINARIES
2.1. Model and Solution
In this paper, we consider repeated interactions between two players. Let G = (A1 , A2 , u1, u2 )
be a finite two-person game in normal form where Ai is player i’s finite action set; A = A1 ×A2
is the set of action profiles; and ui : A → R is i’s payoff function. Let Σi be i’s set of
mixed actions with typical element αi . The payoff functions are extended to mixed actions by taking expectations. Let G be the set of all finite two-person games. For G ∈ G,
Π(G) = {π ∈ R2 : ∃a ∈ A s.t. π = (u1 (a), u2 (a))} is the set of pure payoff profiles. Let Co
denote the convex hull operator.
A repeated interaction consists of an infinite sequence of repetitions of a stage game G at
discrete time periods t = 1, 2, . . . The stage game is common knowledge among the players.
In every period t, the players make simultaneous moves, denoted by ati ∈ Ai , that become
common knowledge. These choices can be the realizations of randomization devices.
The dynamic payoffs will emphasize long run phenomena. Let S(G) = A^∞ (or, simply S) be the set of all infinite sequences for a given stage game G. A sequence of play is written as s = (s^1 s^2 . . .) where s^t = (a^t_1, a^t_2). For s ∈ S, player i's dynamic payoff is⁴

\[
\pi_i(s) = \liminf_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} u_i(s^t).
\]

The liminf criterion, which implies complete patience (Marinacci (1998)), is a modeling choice motivated by simplicity. By using this criterion, we abstract away from patience issues, in much the same way as the standard learning literature.⁵ In practice, we can implement this in experiments by using large discount factors δ, as subjects do not make a significant difference between δ = .99 and δ → 1. Moreover, we can interpret eventual behaviors in finite horizon as those emerging late in the game and persisting (see Section 4 and Footnote 21).

⁴The liminf is the infimum of the cluster points of the sequence of average payoffs from 1 to T. Terms that appear finitely many times in a sequence do not influence the liminf.

⁵There is no reason why discounting and patience would not play a role in studies of bounded rationality, but these issues are secondary to the subject matter. For example, many learning models are agnostic about patience-related issues. In aspiration-based learning (e.g., Karandikar et al. (1998)), which concerns repeated games, players satisfice myopically. In some models of adaptive learning (Milgrom and Roberts (1990)), players can be seen as maximizing myopically. Of course, myopia can also be interpreted as discounting with δ = 0.
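To make the criterion concrete, here is a minimal sketch (ours, not from the paper; the profile and payoff encodings are illustrative) of the dynamic payoff of an eventually cyclic sequence. For such sequences, the liminf reduces to the average payoff over one cycle, since finitely many early periods do not affect the liminf.

```python
# Minimal sketch (illustrative, not from the paper): the dynamic payoff of an
# eventually cyclic sequence is the average payoff over one cycle.
def dynamic_payoff(cycle, u):
    """cycle: list of action profiles repeated forever; u: profile -> (u1, u2)."""
    n = len(cycle)
    return tuple(sum(u[a][i] for a in cycle) / n for i in (0, 1))

# Stag hunt of the introduction: alternating (H,H) and (S,S) forever.
u = {('S', 'S'): (4, 4), ('S', 'H'): (0, 3), ('H', 'S'): (3, 0), ('H', 'H'): (2, 2)}
print(dynamic_payoff([('H', 'H'), ('S', 'S')], u))   # (3.0, 3.0)
```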
A solution S is a function that assigns to each G ∈ G a subset S(G) ⊂ S. The set S(G)
should be thought of as the set of infinite sequences that we expect to arise in the repeated
interaction with stage game G.
2.2. Complexity
This paper will pay particular attention to cyclic sequences. A sequence s ∈ S is cyclic if each action profile a occurs with frequency w_a(s)/ℓ(s) in the sequence, where ℓ(s) is the length of the cycle. Formally, s has a cycle if there exists ℓ(s) ∈ N such that (i) for each a ∈ A, there is a non-negative number w_a(s) ∈ N such that

\[
\lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} \mathbf{1}\{s^t = a\} = \frac{w_a(s)}{\ell(s)},
\]

where 1 is the indicator function, (ii) ∑_a w_a(s) = ℓ(s), and (iii) there is no ℓ < ℓ(s) for which (i) and (ii) hold. In this context, a cycle is a cyclic sequence whose terms all appear infinitely often. Note that the order does not matter in a cycle: letting a, b ∈ A, (abaaba . . .) and (aabaab . . .) are two instances of the same pattern of length 3. Note also that an action profile can have zero frequency in a cyclic sequence and yet appear infinitely often. Examples of processes that generate cycles are mixed strategies, public randomization devices, regular Markov chains, etc.
A sequence s is strongly cyclic if it has a strong cycle, i.e., if there exist T and ℓ(s) such that s^t = s^{t+ℓ(s)} for all t ≥ T and there is no ℓ < ℓ(s) for which s^t = s^{t+ℓ} for all t ≥ T. In words, there is a time after which the pattern (s^t, . . . , s^{t+ℓ(s)−1}) repeats itself forever. Strong cycles are cycles. In repeated games, the use of automaton strategies produces strong cycles (Abreu and Rubinstein (1988)).
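As an illustration (our sketch, not part of the paper), the length ℓ(s) of a strong cycle can be computed from one repetition of its repeating block:

```python
# Sketch (illustrative): the minimal cycle length ell(s) of a strongly cyclic
# sequence, computed from one repetition of its repeating block.
def cycle_length(block):
    n = len(block)
    for ell in range(1, n + 1):
        # block has period ell iff it is invariant under a cyclic shift by ell
        if n % ell == 0 and block == block[ell:] + block[:ell]:
            return ell
    return n

print(cycle_length(['a', 'b', 'a', 'b']))  # 2: (abab...) has a cycle of length 2
print(cycle_length(['a', 'b', 'b']))       # 3: (abb) has no shorter period
```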
The length ℓ of a cycle is a natural albeit coarse measure of complexity. Intuitively, it
is simpler for players to play a single action profile rather than alternate between two. In
general, it seems reasonable to assume that longer cycles are harder intertemporal patterns
to generate than shorter cycles. If we consider the processes that induce cycles, then complex
cycles suggest complex processes. For example, ℓ is a coarse indicator of the complexity of a
pair of automata.6
⁶The length of a cycle gives an upper bound on the number of states that a pair of automata can have in order to generate the cycle. Automata also specify off-equilibrium behaviors that are not taken into account here.

3. THE AXIOMS

Our approach raises conceptual questions about the interpretation of the axioms, the role of observables, and “off-path” considerations. We discuss them in Section 5. Regardless of interpretation, our approach amounts to selecting cycles, whether these cycles should be understood as the limit points of a learning process or as the starting points of an “equilibrium” analysis. For now, think of our axioms as selecting cycles: if two boundedly rational players adopt some cycle s as a convention, what might this convention plausibly (not) look like?
For any π, π ′ ∈ R2 , π ≫ π ′ means π1 > π1′ and π2 > π2′ , i.e., π (strictly) Pareto dominates
π ′ . Moreover, π > π ′ means π1 ≥ π1′ and π2 ≥ π2′ and at least one inequality holds strictly,
i.e., π weakly Pareto dominates π ′ .
3.1. Individual Security
Although players are boundedly rational, their knowledge of the stage game suggests the
minimal amount of rationality described by the first axiom. Let
ui (G) = max min ui (αi , aj )
αi ∈Σi aj ∈Aj
be player i’s maxmin payoff in G. The maxmin is a player’s security level, i.e., the player
can secure at least this payoff in the stage game. A maxmin action is any mixture αi that
guarantees ui to be at least as large as i’s maxmin payoff.
Axiom 1 (Individual Security) For any G ∈ G, if s ∈ S is such that π_i(s) < u_i(G) for some player i, then s ∉ S(G).
This axiom eliminates all conventions for which there exists a player whose dynamic payoff is strictly lower than her maxmin level. In many situations, Axiom 1 is uncontroversial. If we restrict attention to sequences with convergent payoffs,⁷ then any player can switch at any time and forever to her maxmin action, and obtain a larger payoff than the sequence s described in the axiom. Indeed, the player's continuation payoffs after the switch, and hence her overall dynamic payoffs, will be weakly larger than her maxmin level.

⁷Payoff-convergent sequences are such that \(\lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} u_i(s^t)\) exists. Cyclic sequences are payoff-convergent.
As a remark, note that since the minmax is larger than the maxmin, using the former in
the axiom would lead to a more restrictive axiom.
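Computationally, the maxmin of a finite two-player game is a small linear program. The sketch below (ours, not from the paper) computes it with scipy:

```python
# Sketch (illustrative, not from the paper): computing player i's maxmin payoff
# u_i(G) = max_{alpha_i} min_{a_j} u_i(alpha_i, a_j) by linear programming.
import numpy as np
from scipy.optimize import linprog

def maxmin(U):
    """U[ai, aj] = payoff of player i. Returns (value, mixed action)."""
    n, m = U.shape
    # Variables (alpha_1, ..., alpha_n, v); maximize v (linprog minimizes).
    c = np.zeros(n + 1); c[-1] = -1.0
    # Security constraints: v - sum_ai alpha_ai * U[ai, aj] <= 0 for each aj.
    A_ub = np.hstack([-U.T, np.ones((m, 1))])
    b_ub = np.zeros(m)
    A_eq = np.hstack([np.ones((1, n)), np.zeros((1, 1))])  # probabilities sum to 1
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * n + [(None, None)])
    return res.x[-1], res.x[:-1]

# Stag hunt of the introduction: row player's payoffs.
value, alpha = maxmin(np.array([[4.0, 0.0], [3.0, 2.0]]))
print(value)   # 2.0: hunting hare (H) secures 2 regardless of the opponent
```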
3.2. Collective Intelligence
We introduce two versions of collective intelligence and then characterize the solution in
conjunction with individual security. The idea behind collective intelligence is that boundedly
rational players would not play a complicated pattern, if within that pattern, there were a
simpler one that gives them both higher payoffs. Since it is costly for them to coordinate on
complex intertemporal patterns, they would not increase their coordination effort to both
earn less. Or to put it in yet another way, we would expect an increase in pattern complexity
to benefit at least one player.
3.2.1. Weak Collective Intelligence
Let us start with a standard yet central definition:
Definition 1 A sequence s′ is a subsequence of s if there exists a strictly increasing function h : N → N such that s′^t = s^{h(t)} for all t.
Limit attention to strongly cyclic sequences s. Given a sequence s, say that its cycle starts from T₀(s) if T₀(s) is the smallest time T such that s^t = s^{t+ℓ(s)} for all t ≥ T. A truncation of s is a strongly cyclic subsequence ŝ of s that has a cycle of length ℓ(ŝ) < ℓ(s) and satisfies the following condition: either (i) ŝ^t = ŝ^{t+1} for all t ≥ T₀(ŝ), or (ii) ŝ^{T₀(ŝ)} = s^{T₀(s)} and for all t ≥ T₀(ŝ), ŝ^t = s^k implies ŝ^{t+1} = s^{k+1}. In words, a truncation is a pattern obtained by interrupting a cycle and repeating its last element forever, or obtained by always interrupting a cycle at the same point and re-starting it from the beginning. For example, letting a, b and c be action profiles in A, (bb . . .) and (abab . . .) are truncations of (abcabc . . .), but (bcbc . . .) is not a truncation of it.
Axiom 2 (Weak Collective Intelligence) For any G ∈ G, if a strongly cyclic sequence s ∈ S has a truncation s′ such that π(s′) ≫ π(s), then s ∉ S(G).
The notion of truncation is perhaps the most conservative requirement that can be used
to rank patterns according to complexity. If two players can conceive of a pattern, then they
should be able to conceive of its truncations. A truncation is not only simpler, but it is
also mechanically easy to obtain from a cycle, because it repeats one element or an initial
section of it. Therefore, it is natural to consider that truncations involve lower complexity
costs or coordination efforts. In this context, the second axiom expresses the fact that two
unsophisticated players would not rule out a truncation to earn less.
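The truncation test is mechanical enough to automate. Below is a minimal sketch (ours, not from the paper; segment length is used as a proxy for cycle length) that enumerates the truncations of a strong cycle and applies Axiom 2:

```python
# Sketch (hypothetical helpers, not from the paper): truncations of a strong
# cycle as defined in Section 3.2.1, and the weak collective intelligence test.
def payoff(pattern, u):
    n = len(pattern)
    return tuple(sum(u[a][i] for a in pattern) / n for i in (0, 1))

def truncations(pattern):
    consts = [(a,) for a in set(pattern)]          # interrupt and repeat one element
    inits = [tuple(pattern[:k]) for k in range(1, len(pattern))]  # restart from start
    return {t for t in consts + inits if len(t) < len(pattern)}

def survives_axiom2(pattern, u):
    p = payoff(pattern, u)
    return not any(all(q > r for q, r in zip(payoff(t, u), p))
                   for t in truncations(pattern))

# Prisoners' Dilemma of Section 3.3: ((D,C)(C,C)(D,C)(C,D)) has no
# Pareto-improving truncation (Footnote 8), so it survives Axiom 2.
u = {('D','C'): (4, 1), ('C','C'): (3, 3), ('C','D'): (1, 4), ('D','D'): (2, 2)}
print(survives_axiom2([('D','C'), ('C','C'), ('D','C'), ('C','D')], u))  # True
```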
3.2.2. Strong Collective Intelligence
This section extends the notion of collective intelligence in several directions. The weak
axiom is limited to strong cycles, but a general notion of collective intelligence should apply
to all cycles, especially to randomizations. Furthermore, it is important to use a weaker
criterion than truncations to capture complexity. This will lead to sharper solutions and
permit a looser interpretation of bounded rationality.
Axiom 3 (Strong Collective Intelligence) For any G ∈ G, if s ∈ S has a subsequence s′ such that π(s′) ≫ π(s), then s ∉ S(G).
The axiom rules out any convention containing another convention that strictly benefits
both players, for they would play it instead and the original convention would not emerge.
In the Prisoners’ Dilemma of Section 3.3, the infinite repetition of pattern ((D, C)(C, C)(D, C)(C, D)), denoted as sequence s, has no Pareto-improving truncation,⁸ and as such, it survives Axiom 2. According to Definition 1, however, the infinite repetition of ((D, C)(C, C)(C, C)) is a subsequence of s. Denote it by ŝ. Since ŝ is Pareto-improving over s, convention s is ruled out by Axiom 3. As another example, the infinite sequence of defections, ((D, D)(D, D) . . .), survives Axioms 2 and 3.
Axiom 3 has a subtle but close relationship to bounded rationality. Suppose complexity is measured by the length of a cycle (Section 2.2). In the previous example, subsequence
ŝ was not only Pareto improving, but also less complex than s.9 In general, if players can
conceive of a convention, then they should also be able to conceive of the other conventions
that are less complex than it and that use the same profiles, such as the truncations. Since
it is costly for boundedly rational players to coordinate on complex intertemporal patterns,
they will not adopt a more complex convention that results in lower payoffs for both. Or to
put it in another way, we would expect an increase in pattern complexity to benefit at least
one player. This argument relies on the existence, for every possible sequence s, of a Pareto
improving subsequence of lower complexity than s. This problem is non-trivial, because the
definition of a subsequence allows its pattern to be more complex than that of the sequence
from which it is extracted (see Footnote 9). We study this next.
In Section B of the appendix, we establish Proposition 5, which is the keystone of the relationship between Axiom 3 and bounded rationality. This proposition requires technicalities
that would interrupt the flow, which is why we offer a brief exposition here. This result connects families of games to the existence of a subsequence of bounded complexity. According
⁸The reader can check that (D, C), ((D, C)(C, C)), and ((D, C)(C, C)(D, C)) do not yield Pareto improvements.

⁹A subsequence could be more complex than the sequence from which it is extracted. Let a, b be action profiles in A. Sequence s = (abab . . .) has a cycle of length 2 and s′ = (abbabb . . .), constructed by removing one a out of two from s, is a subsequence of s but it has a cycle of length 3.
to Definitions 4 to 6 of Section B, we can classify all stage games into families Fn , where
n ≥ 1 is a natural number. Suppose we have done so. Proposition 5 then states: For stage
games in Fn , if a cyclic sequence s admits a (strictly) Pareto-improving subsequence, then it
also admits one whose cycle is at most n times the length of the cycle of s.
This result has important implications. For games in F1 , which includes Battle of the
Sexes, Stag Hunt, Chicken, and many others, every sequence discarded by Axiom 3 contains
a simpler Pareto-improving subsequence.10 Since simpler is better for both players, simplicity
would probably prevail among boundedly rational players. For games in Fn with n ≥ 2, which
includes the Prisoners’ Dilemma, a convention s may admit Pareto-improving subsequences
that are all more complex than s, and thus convention s can only be eliminated by players
who, despite their limitations, can consider cycles of greater complexity. The proposition
provides a bound on the necessary additional complexity. In Fn , in order to eliminate any
convention s that violates Axiom 3, the players need the ability to implement cycles that
are at most n times the length of the cycle of s.
Strong collective intelligence is not equal to Pareto efficiency, as it allows for inefficiencies.
In the Prisoners’ Dilemma, for instance, the sequence of infinite defections, ((D, D)(D, D) . . .),
survives the axiom. This axiom imposes no restrictions on constant sequences; its only purpose is to select among cycles containing at least two profiles. Even then, it permits inefficiencies, such as ((D, C)(C, D) . . .) in the Prisoners’ Dilemma. Ultimately, our endeavor is
to select cycles that express a form of equilibrium or stability. Nothing precludes chaos or
the presence of punishments on the way to equilibrium, but in equilibrium, if punishments
complexify a convention while harming both players, the plausibility of this agreement may
be questioned. This implication echoes Rubinstein (1986), in which the equilibrium concept
¹⁰By “simpler,” we mean weakly simpler: given a sequence whose cycle has length ℓ, the simplest Pareto-improving subsequence may also have a cycle of length ℓ. In Section B, we define a family F₁* ⊂ F₁ for which all sequences that violate Axiom 3 admit a strictly simpler subsequence.
rules out automata whose states are not all used infinitely often, because such states — that
may be used for punishments at the beginning — eventually become useless and complexify
the machine without contributing to play. In summary, our axiom excludes those conventions
in which mutually harmful behaviors have significant weight and co-exist indefinitely with
beneficial outcomes:
Proposition 1 If a cyclic sequence s has no subsequence s′ such that π(s′) ≫ π(s), then the frequency of play of all a ∈ A such that u(a) ≪ π(s) is w_a(s)/ℓ(s) = 0.
As we will see in Section 4.2, certain profiles, such as punishments, can have zero frequency
and yet play a crucial role in the solution.
3.3. Intermediate Characterizations
We now characterize the set of sequences and payoffs that survive individual security and
collective intelligence. The results are illustrated in examples.
In preparation for the results, we define several terms. The set of individually secure payoffs is given by

\[
\Pi^{IS}(G) = \{\pi \in Co(\Pi(G)) : \pi_i \geq u_i(G) \;\; \forall i = 1, 2\}.
\]

For any X ⊂ R^2, let

\[
P(X) = \{\pi \in X : \nexists x \in X \text{ s.t. } x \gg \pi\}
\]

be the set of weakly Pareto efficient elements in X relative to X, and letting c stand for complement, define

\[
U(X) = \bigcap_{x \in X} \{\pi \in \mathbb{R}^2 : \pi \ll x\}^c
\]

to be the set of real vectors that are dominated by no point in X. For any s ∈ S, define

\[
R(s) = \{u \in \mathbb{R}^2 : \exists a \in A \text{ s.t. } u = (u_1(a), u_2(a)) \text{ and } (\forall T \in \mathbb{N}_+)(\exists t \geq T) \text{ s.t. } s^t = a\} \tag{3.1}
\]

to be the set of recurrent payoffs in s, i.e., the set of payoff profiles that appear infinitely many times in the sequence.¹¹
Let S^w be the solution that assigns to each game the set of all strongly cyclic sequences surviving Axioms 1 and 2. First, we characterize the payoffs associated with S^w.

Proposition 2 For all G ∈ G, π(S^w(G)) = {π ∈ R^2 : there is a strongly cyclic s ∈ S s.t. π = π(s) ∈ U(R(s))} ∩ Π^IS(G).
According to the proposition, the solution payoffs are vectors π that are produced by
sequences, none of whose recurrent terms dominate π. Otherwise, playing a recurrent term
would be a Pareto-improving truncation.
This result suggests a simple procedure to determine whether a payoff is in the solution
set. Choose a feasible payoff π and ask whether it is possible to generate it with profiles
whose payoffs do not Pareto dominate π. π is a solution payoff if and only if the answer is
yes. For example, consider the Prisoners’ Dilemma and the rightmost figure in Figure 1. Any payoff π in the white triangular area requires playing (C, C), but u(C, C) ≫ π.
Therefore, π is excluded. For the black triangles directly above and to the right of the white
triangle, π can be produced by a combination of (C, C), (D, C) and (C, D), none of which
Pareto dominates π.
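This procedure is easy to automate. The sketch below (ours, not from the paper) checks membership by a feasibility linear program: π qualifies if it is a convex combination of stage payoffs none of which strictly Pareto dominates it (individual security is checked separately, e.g., with the maxmin sketch of Section 3.1).

```python
# Sketch (illustrative, not from the paper): the procedure below Proposition 2.
import numpy as np
from scipy.optimize import linprog

def generated_without_dominating_profiles(pi, payoffs):
    """payoffs: list of pure payoff pairs u(a); pi: candidate payoff pair."""
    usable = [u for u in payoffs if not (u[0] > pi[0] and u[1] > pi[1])]
    if not usable:
        return False
    U = np.array(usable, dtype=float).T           # 2 x k matrix
    k = U.shape[1]
    # Find weights w >= 0, sum w = 1, with U w = pi (a feasibility LP).
    res = linprog(np.zeros(k),
                  A_eq=np.vstack([U, np.ones((1, k))]),
                  b_eq=np.append(np.asarray(pi, float), 1.0),
                  bounds=[(0, None)] * k)
    return res.success

pd = [(3, 3), (1, 4), (4, 1), (2, 2)]             # Prisoners' Dilemma payoffs
print(generated_without_dominating_profiles((2.9, 2.9), pd))  # False: needs (C,C)
print(generated_without_dominating_profiles((3.5, 2.0), pd))  # True: mix (4,1),(3,3)
```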
Second, we characterize the sequences that survive Axioms 1 and 3, and their payoffs. Two notions play a central role: individual security and internal efficiency.
¹¹This interpretation of R(s) is accurate, because A is finite.
Definition 2 A sequence s is individually secure if π_i(s) ≥ u_i(G) for all i.

Definition 3 A sequence s is internally efficient if π(s) ∈ P(Co(R(s))).
Internal efficiency is a sequential notion that is weaker than Pareto efficiency. For instance,
an internally efficient sequence may be inefficient, such as constant defection in the Prisoners’
Dilemma. Pareto efficiency requires dynamic payoffs to lie on the overall Pareto frontier,
i.e., π(s) ∈ P(Co(Π(G))), whereas internal efficiency only requires a sequence to generate
payoffs that lie on the Pareto frontier of (the convex hull of) its own set of recurrent payoffs.
Internal efficiency takes players’ abilities into account: when players are contemplating some
arrangement, they should be aware of the profiles that compose it, and thus, they may be
aware of the internal opportunities for improvement. Pareto efficiency, however, requires
players to take whatever action necessary to reach the overall Pareto frontier.
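Internal efficiency is also straightforward to check numerically. In the sketch below (ours, not from the paper), a sequence's payoff is internally efficient iff no convex combination of its recurrent payoffs strictly dominates it:

```python
# Sketch (illustrative, not from the paper): internal efficiency test by LP.
import numpy as np
from scipy.optimize import linprog

def internally_efficient(pi, recurrent_payoffs):
    U = np.array(recurrent_payoffs, dtype=float).T    # 2 x k
    k = U.shape[1]
    # Variables (w, eps): maximize eps subject to U w >= pi + eps, sum w = 1.
    c = np.zeros(k + 1); c[-1] = -1.0
    A_ub = np.hstack([-U, np.ones((2, 1))])           # pi_i + eps - (Uw)_i <= 0
    b_ub = -np.asarray(pi, dtype=float)
    A_eq = np.hstack([np.ones((1, k)), np.zeros((1, 1))])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * k + [(None, None)])
    return res.x[-1] <= 1e-9                          # no strictly dominating mix

# Constant defection in the Prisoners' Dilemma: R(s) = {(2,2)}, pi = (2,2).
print(internally_efficient((2, 2), [(2, 2)]))                # True
# Alternating (C,C)/(D,D): pi = (2.5, 2.5) is dominated by (3,3) in R(s).
print(internally_efficient((2.5, 2.5), [(3, 3), (2, 2)]))    # False
```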
Theorem 1 A solution S satisfies Axioms 1 and 3, if and only if, for all G ∈ G, S(G) is a subset of all internally efficient and individually secure sequences.
Theorem 1 is intuitive. If a sequence survives Axiom 3, then it cannot offer Pareto improvements within its recurrent payoffs, for this would amount to building a Pareto improving
subsequence. Likewise, if a sequence is internally efficient, then it cannot be Pareto improved by any of its subsequences, and thus, no convex combination of its recurrent terms
can dominate it.
A consequence of Theorem 1 is the special structure of solution payoffs. Let L(G) be the
set of all pure payoffs and all line segments connecting two pure payoffs in game G. That
is, L(G) is the set of all g = Co({uk , uh }), where uk and uh are in Π(G) but not necessarily
different.
Let S st be the solution that associates to each game the set of all sequences surviving
Axioms 1 and 3.
Theorem 2 For all G ∈ G, π(S^st(G)) = (∪_{g∈L(G)} P(g)) ∩ Π^IS(G).
The solution payoffs must be individually secure and either (i) lie on line segments connecting two (Pareto unranked) pure payoffs or (ii) be pure payoffs. This representation as
a union of segments dramatically restricts the possible payoffs. In certain common interest
games, it implies uniqueness (see Section C.8 in the appendix). These segments, however,
do not imply that players only play two action profiles, but rather that only two profiles
appear with non-vanishing frequency in the sequence. This point is crucial, because some
profiles may be vanishing and yet serve an important purpose, such as building a reputation
(Section 4.2).
Figure 1 illustrates our solution payoffs in examples and includes the Nash equilibrium
payoffs from Abreu and Rubinstein (1988)’s machine game.
3.4. Internal Stability
In this section, we present the axiom of internal stability and characterize the dynamic
payoffs that survive it.
As the game unfolds, each player gathers information about her opponent. Information
processing is a difficult task, in particular for a boundedly rational player trying to interpret
the actions of another player. The idea behind internal stability is to select conventions
according to whether each player can justify to herself why the surplus is divided the way it
is in these conventions. But this justification is required to come from what a player observes
directly, which emphasizes her naiveté or her inability to process certain counterfactuals.
[Figure 1 appears here; its three payoff diagrams per game are omitted. The four stage games are:]

Prisoners' Dilemma          Battle of the Sexes
        C      D                    A      B
 C     3,3    1,4            A     4,2    1,1
 D     4,1    2,2            B     1,1    2,4

Stag Hunt                   Chicken
        S      H                    A      B
 S     3,3    0,2            A     3,3    1,4
 H     2,0    1,1            B     4,1    0,0

Figure 1.— Solution Payoffs under Individual Security and Collective Intelligence (indicated by thick dots, thick solid lines, and black areas. Dotted lines delineate the individually secure payoffs. Left: Abreu and Rubinstein (1988)’s equilibrium payoffs; middle: π(S^st(·)); right: π(S^w(·))).
3.4.1. The Axiom
Fix a sequence s ∈ S. The set of recurrent action profiles in s is defined by

\[
R^*(s) = \{a \in A : (\forall T \in \mathbb{N}_+)(\exists t \geq T) \text{ s.t. } s^t = a\}. \tag{3.2}
\]

Let A_i(s) = {a_i ∈ A_i : ∃a_j ∈ A_j s.t. a ∈ R^*(s)} denote player i's recurrent actions. For any a_i ∈ A_i(s), let A_j(a_i, s) = {a_j ∈ A_j : (a_i, a_j) ∈ R^*(s)} denote the actions of player j that form recurrent profiles with a_i. The observed minmax, defined for i ≠ j as¹²

\[
\pi_{i,j}(s) = \min_{a_i \in A_i(s)} \; \max_{a_j \in A_j(a_i,s)} u_j(a), \tag{3.3}
\]

is the smallest payoff that i shows the ability to impose on j repeatedly, taking j's reaction into account. The observed minmax uses the standard definition of minmax, except that min and max are taken over actual observations. For any s and a_i, let b_j(a_i, s) = argmax_{a_j ∈ A_j(a_i,s)} u_j(a) be j's restricted best-response to a_i. Let A_i^{mm}(s) = {a_i ∈ A_i(s) : u_j(a) = π_{i,j}(s) for all a_j ∈ b_j(a_i, s)} be the set of (observed) minmax actions for i. Now let

\[
\pi_{i,i}(s) = \max_{a_i \in A_i^{mm}(s)} \; \min_{a_j \in b_j(a_i,s)} u_i(a) \tag{3.4}
\]

be the utility level that i can secure when minmaxing j in the sequence.

Interpersonal comparisons of observed minmaxes and minmaxing utilities are only meaningful when players' payoffs are normalized on the same scale. In this regard, π_{i,j}(s)/π_j(s) represents the worst damage caused by i to j as a percentage of j's payoff, and π_{i,i}(s)/π_i(s) is the percentage of her dynamic payoff that i retains while minmaxing her opponent.

¹²This definition does not allow for mixing over A_i(s), because many mixtures are incompatible with the observed frequencies of play; so mixing would present i with punishment abilities she may not be aware of.
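Definitions (3.2)–(3.4) can be computed directly from the recurrent profiles. The following sketch (ours, not from the paper; the dictionary encoding of R^*(s) is illustrative) reproduces the numbers used in the examples below:

```python
# Sketch (ours, not from the paper): the observed minmax (3.3) and the
# minmaxing utility (3.4), computed from the recurrent profiles R*(s).
# u maps each recurrent profile (a1, a2) to its payoff pair (u1, u2).
def observed_minmax(u, i):
    j = 1 - i
    Ai = {a[i] for a in u}                               # i's recurrent actions
    Aj = lambda ai: [a for a in u if a[i] == ai]         # profiles pairing with ai
    # (3.3): the worst payoff i is observed imposing on j, given j's reaction.
    pi_ij = min(max(u[a][j] for a in Aj(ai)) for ai in Ai)
    def best_resp(ai):                                   # j's restricted best response
        m = max(u[a][j] for a in Aj(ai))
        return [a for a in Aj(ai) if u[a][j] == m]
    mm = [ai for ai in Ai if all(u[a][j] == pi_ij for a in best_resp(ai))]
    # (3.4): the payoff i secures while minmaxing j.
    pi_ii = max(min(u[a][i] for a in best_resp(ai)) for ai in mm)
    return pi_ij, pi_ii

# Repetition of ((D,C)(C,C)) in the Prisoners' Dilemma of Section 3.3:
u = {('D', 'C'): (4, 1), ('C', 'C'): (3, 3)}
print(observed_minmax(u, 0))   # (1, 4): player 1 can push 2 down to 1, at no cost
print(observed_minmax(u, 1))   # (4, 1): player 2 never harms player 1
```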
Axiom 4 (Internal Stability) For any G ∈ G, if s ∈ S is such that

\[
\left( \frac{\pi_{i,j}(s)}{\pi_j(s)}, \frac{\pi_{j,j}(s)}{\pi_j(s)} \right) < \left( \frac{\pi_{j,i}(s)}{\pi_i(s)}, \frac{\pi_{i,i}(s)}{\pi_i(s)} \right) \tag{3.5}
\]

and π_i(s) < max_{a∈R^*(s)} u_i(a) for some player i, then s ∉ S(G).¹³

¹³Use the conventions: 0/0 = 1, x/0 = ∞ for all x > 0, and x/0 > y/0 iff x > y.

If player i observes that she can impose on j a smaller relative payoff than j can impose on her (first dimension in (3.5)) while incurring a smaller relative loss (second dimension in (3.5)), and if there is a recurrent profile a that player i likes more than s, then i might think that she should be able to enforce a more frequently, and thus, she will destabilize convention s by playing a_i more often.

In the Prisoners’ Dilemma of Section 3.3, the infinite repetition of pattern ((D, C)(C, C)), denoted as s, is discarded by Axiom 4. In this sequence, player 1 alternates between D and C, but her opponent never defects. Therefore, it is tempting for player 1 to play D more often in order to try increasing her payoff. In the language of the axiom, π_1(s) = 7/2 < u_1(D, C) = 4 and (3.5) is given by (1/2, 1/2) < (8/7, 8/7). That is, player 1 shows that she can cause player 2 to lose 1/2 of her dynamic payoff at no cost, whereas 2 never causes any harm. In the dinner game of the Introduction, the infinite repetition of ((E, P̄)(Ē, P̄)), denoted as ŝ, is also ruled out by Axiom 4. The child may think that she can increase her eating frequency since she never gets punished. In formal terms, π_1(ŝ) = 5/2 < u_1(E, P̄) = 3 and (3.5) is given by (1/2, 1/2) < (6/5, 6/5).

Internal stability assumes a strong form of bounded rationality. Unless a player observes once in a while that her opponent can cause more damage than she can, or at a smaller cost, she cannot justify to herself why she is not trying to be greedier, and thus, she upsets the convention. This emphasizes her finite memory, as punishments are eventually forgotten if
not renewed, as well as a form of naiveté or myopia. The repetition of pattern ((D, C)(C, C))
in the Prisoners’ Dilemma is a good example. Since player 1 never witnesses a defection from
player 2, she does not foresee that playing D more often is likely to prompt 2 to defect.
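For concreteness, here is a sketch of the test in (3.5) (ours, not from the paper; it reuses observed_minmax from the earlier sketch, reads the vector inequality as strict in both coordinates, and ignores the zero-denominator conventions of Footnote 13):

```python
# Sketch (ours): the destabilization test (3.5), reusing observed_minmax.
def violates_internal_stability(u, pi):
    for i in (0, 1):
        j = 1 - i
        pi_ij, pi_ii = observed_minmax(u, i)
        pi_ji, pi_jj = observed_minmax(u, j)
        harsher = pi_ij / pi[j] < pi_ji / pi[i]      # i punishes relatively harder
        cheaper = pi_jj / pi[j] < pi_ii / pi[i]      # ...at a smaller relative loss
        greedy = pi[i] < max(u[a][i] for a in u)     # and i wants more
        if harsher and cheaper and greedy:
            return True
    return False

u = {('D', 'C'): (4, 1), ('C', 'C'): (3, 3)}         # ((D,C)(C,C)) repeated
print(violates_internal_stability(u, (3.5, 2.0)))    # True: player 1 destabilizes
```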
More generally, this axiom divides the surplus relative to punishment and cost values
calibrated from what happens on the path. This has several implications. First, it imposes
no restrictions on constant sequences: all cycles of length 1 survive the axiom, as players have
no contradictory evidence. Second, any convention in which a player experiences occasional
benefits, while showing a greater ability to punish and at a lower cost, is unstable. Third, in
a symmetric stage game, if all action profiles appear infinitely often, then the axiom implies
that players will eventually settle on a convention in which they receive equal payoffs.
Internal stability is perhaps our most controversial axiom, yet its content is complementary to the earlier axioms. The theoretical benefit is worthwhile: the characterizations will be sharp and sensible. Importantly, this axiom is a source of interesting phenomena, such as reputation building, which are evidenced by experiments (Section 4.2).
3.4.2. Intermediate Characterization
Let S^is be the solution that assigns to each game the set of all sequences surviving Axiom 4. In this section, we characterize the payoffs associated with S^is.
Let W_s = {π_{1,2}(s)/π_{2,1}(s), π_{2,2}(s)/π_{1,1}(s)} contain the ratio of players' minmax utilities and the ratio of their utilities when minmaxing, where min W_s and max W_s represent the smaller and the larger of the two ratios. Define¹⁴

\[
L(s) = \left\{ \pi \in \mathbb{R}^2_+ : \pi_2 = \alpha \pi_1 \text{ where } \alpha \in (\min W_s, \max W_s) \text{ if } \frac{\pi_{1,2}(s)}{\pi_{2,1}(s)} \neq \frac{\pi_{2,2}(s)}{\pi_{1,1}(s)}, \text{ and } \alpha = \frac{\pi_{2,2}(s)}{\pi_{1,1}(s)} \text{ otherwise} \right\} \tag{3.6}
\]

to be the set of line segments starting from the origin (although the origin may be excluded) and whose slopes lie between the above ratios. Note that if two sequences s and s′ have the same recurrent profiles, R^*(s) = R^*(s′), then L(s) = L(s′). Therefore, there are as many sets L as there are subsets of action profiles.

¹⁴Recall the conventions: 0/0 = 1, x/0 = ∞ for all x > 0, and x/0 > y/0 if and only if x > y.
The next proposition characterizes the payoffs associated with internal stability. Let G′ be the family of games for which, if u_i(a) = u_i(a′) for some i, then a = a′. Let G⁺ be the family of games with nonnegative payoffs.

Proposition 3 For any G ∈ G⁺ ∩ G′, π(S^is(G)) = {π ∈ R^2 : ∃a ∈ A s.t. π = u(a) or ∃s ∈ S s.t. π = π(s) ∈ L(s)}.
This result states that the solution payoffs must lie on line segments or be pure payoff
profiles. This is reminiscent of the segmental structure described by Theorem 2, except that
the segments from Proposition 3 have positive slopes and start from the origin.
In Figure 2, the proposition is applied to the Prisoners’ Dilemma and Battle of the Sexes.
The dashed lines represent the payoffs compatible with internal stability. In the Prisoners’
Dilemma, there are three relevant sets R∗ (·): (i) those that contain (D, D), (ii) those that
do not contain (D, D) but contain (D, C) and (C, D), (iii) the others. In cases (i) and (ii),
the observed minmax and the utility when minmaxing are the same for each player, and
thus α = 1. This corresponds to the middle dashed line in Figure 2. In case (iii), if R∗ (·) =
{(C, C)}, then α = 1; otherwise, α = 0 or α = ∞ (for instance, if R∗ (·) = {(C, C), (C, D)},
then α = ∞, which corresponds to the vertical dashed line in Figure 2).
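Since L(s) depends only on R^*(s), its slope set can be computed mechanically. The sketch below (ours, not from the paper; it reuses observed_minmax from Section 3.4.1's sketch) anticipates the Samaritan's Dilemma computation of Section 4.2:

```python
# Sketch (ours, reusing observed_minmax): the slope set of L(s) in (3.6),
# which depends only on the recurrent profiles.
def slope_set(u):
    pi_12, pi_11 = observed_minmax(u, 0)
    pi_21, pi_22 = observed_minmax(u, 1)
    w1, w2 = pi_12 / pi_21, pi_22 / pi_11
    return w2 if w1 == w2 else (min(w1, w2), max(w1, w2))

# Anticipating Section 4.2: the Samaritan's Dilemma with recurrent profiles
# {(Hbar, W), (H, Wbar), (H, W)} ('bar' marks the barred actions).
u = {('Hbar', 'W'): (2, 2), ('H', 'Wbar'): (3, 5), ('H', 'W'): (5, 3)}
print(slope_set(u))   # (0.666..., 2.5): the alpha-range [2/3, 5/2] of Figure 4
```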
3.5. The Characterization Theorem
The Characterization Theorem describes the sequences, and especially the payoffs, that
survive individual security, strong collective intelligence, and internal stability. Considering that constant (individually secure) sequences pass all requirements, the axioms are mutually compatible in all games for which there exists at least one individually-secure pure-action profile. The solution is also nonempty in many other games for which this condition is violated, as in the mixed Nash game of Figure 3.

[Figure 2 appears here; its payoff diagrams are omitted.]
Figure 2.— Solution Payoffs under Individual Security, Strong Collective Intelligence, and Internal Stability (Solution payoffs are represented by thick dots in the second and the fourth figure. The first and the third figure represent the payoffs associated with Axioms 1 and 3 (solid thick lines) and Axiom 4 (dashed lines)).
Theorem 3 If S satisfies Axioms 1, 3, and 4, then for any G ∈ G⁺ ∩ G′, S(G) consists of individually secure and internally efficient sequences, and for all π ∈ π(S(G)), either π = u(a) for a ∈ A or there is (an individually secure and internally efficient) s such that π = π(s) ∈ L(s).
This theorem is a direct product of earlier results. Figure 2 and Section A.1 of the appendix
show the conclusions of Theorem A.1 in eight different games. Section 4.2 offers a detailed
application.
4. THE EXPERIMENTS
Experiments were run by Mathevet and Romero (2012) at the Vernon Smith Experimental
Economics Laboratory at Purdue University. The subjects were drawn from a pool of undergraduate students. All sessions were computerized, using an interface programmed with
z-Tree (Fischbacher (2007)). Subjects were randomly assigned to a computer and given a
handout with instructions. Instructions were read aloud. Each subject completed a quiz to
make sure that they understood the instructions. Each session lasted about 1 hour. During
the experiments, all payoffs were displayed in experimental Francs, and the exchange rate
to U.S. dollars was announced during the instructions. Subjects’ final payoff was the sum of
earnings of all rounds.
There were a total of 16 sessions consisting of 314 subjects. In each session, subjects
played a total of two or three supergames. Each subject was randomly paired with one
other subject and they played a 2 × 2 game repeatedly with the same partner. After each
supergame, subjects were rematched with a new partner that they had not been matched
with before, and played a game that they had not played before. The length of the supergame
was determined randomly (see Roth and Murnighan (1978)). We considered two treatments.
In the first treatment, each supergame started with 30 rounds with certainty, after which
the probability of playing an additional round was δ = 0.9. In the second treatment, the
continuation probability was δ = 0.99 from the first period (with no initial block of rounds played with certainty). The second treatment represents 2/3 of all the data points. As a result,
most sequences of play were more than 100 periods long. The reason for having these two
treatments was to observe the impact of increasing the discount factor when it is large. The
differences between the two data sets were minor, and thus, we merged them in our analysis.15
Our interpretation is that there may be a discount factor above which subjects do not make
a significant difference. Since our theoretical approach is concerned with complete patience,
which might be impossible to re-create exactly in the lab, our data set is a reasonable
approximation of this scenario.
¹⁵This operation is particularly innocuous here, because our approach only deals with the set of outcomes and not the probabilities over outcomes. As we move from one treatment to the other, there might be small changes (e.g., in the probability of cooperation vs. defection), but as long as the same outcomes appear and no new ones emerge, the evaluation of our approach is unaffected.
Prisoners' Dilemma (70 obs)      Battle of the Sexes (70 obs)
        A      B                         A      B
 A     3,3    1,4                 A     1,1    2,4
 B     4,1    2,2                 B     4,2    1,1

Stag Hunt (50 obs)               Chicken (70 obs)
        A      B                         A      B
 A     3,3    0,2                 A     3,3    1,4
 B     2,0    1,1                 B     4,1    0,0

Common Interest (18 obs)         Samaritan's Dilemma (60 obs)
        A      B                         W      W̄
 A     3,3    2,1                 H̄     2,2    2,2
 B     1,2    0,0                 H     5,3    3,5

Ultimatum Game (60 obs)          Mixed Game (36 obs)
        A      B                         A      B
 A     2,2    2,2                 A     4,1    1,2
 B     3,1    0,0                 B     1,2    2,1

Figure 3.— 2 × 2 Games Tested and Number of Observations
Our experiments generated 434 sequences of play. A total of eight 2 × 2 games were tested
(see Figure 3). In each round, subjects were asked to make a choice between two actions as
well as a guess about their partner’s next move.16 Correct guesses were randomly rewarded
by monetary bonuses.17
4.1. General Results
This section evaluates the empirical performance of our main solution (Axioms 1, 3, and
4). We run the analysis based on the last 30 periods of interaction between subjects so as to
capture long run behaviors. We discuss alternative specifications at the end of the section.
First, based on the subjects’ average payoffs, we show that the solution is at least as predictive as the Experience Weighted Attraction (EWA) learning model of Ho et al. (2007) and
the reinforcement learning model of Erev and Roth (1998). The comparison is important,
¹⁶For a small number of experiments, we could not obtain players' guesses.

¹⁷For each correct guess, a subject earned a raffle ticket. At the end of each supergame, one raffle ticket was randomly selected, and the corresponding subject received a bonus.
TABLE I
Empirical Performance Based on Payoffs

                        Axiomatic          Reinf.            EWA            Folk Thm
Game                  Type I  Type II  Type I  Type II  Type I  Type II  Type I  Type II
Prisoners' Dilemma      36%      0%      83%      0%      83%      0%       6%     51%
Battle of the Sexes     61%    .06%      93%    .06%      93%    .06%      30%     75%
Stag Hunt               12%      0%      12%      0%      12%      0%       2%     69%
Chicken                 46%      3%     100%    .05%     100%    .05%      16%     68%
Common Interest          6%      0%       6%      0%       6%      0%       0%     18%
Samaritan's Dilemma     56%      4%      83%      0%      83%      0%      47%     71%
Ultimatum Game          52%     .9%      53%      0%      98%      0%      38%     28%
Mixed-Nash Game        100%    .25%      94%      0%      94%      0%      69%     11%

Note. Type I is the percentage of experimental observations (average payoffs over the last 30 periods) that fall outside the prediction set of a theory. Thus, Type I measures the fraction of incorrect rejection of observations. Type II is the percentage of all feasible payoffs that did not occur in the experiments despite being predicted by a theory. Thus, Type II measures the failure to reject “false” observations. Since several theories make infinitely many predictions, we discretized the set of feasible payoffs (into squares of width/height .05 units) and counted the errors.
because these models are a valuable benchmark for human behavior in repeated settings.
Section A.1 presents the outcomes of our experiments side-by-side with the theoretical predictions. Second, we use pattern detection algorithms to filter the data, and we show that
our axioms are broadly satisfied within large subsets of the data.
Table I describes empirical performances based on payoffs. Based on Type I and Type II
errors, the axiomatic solution performs better than EWA and reinforcement learning in the
Prisoners’ Dilemma and Battle of the Sexes, while it does worse in the Mixed-Nash game.
All three approaches perform equally well in Stag Hunt and Common Interest. In the other
games, the comparison is more ambiguous. The axiomatic solution always accounts for a
larger fraction of the data (Type I) at the expense of Type II errors. In certain contexts,
such as the Samaritan’s Dilemma, this allows us to identify important phenomena that would
otherwise remain unexplained (see Section 4.2).
Table II reports the satisfaction rates of the axioms in large subsets of the data. We run
pattern detection algorithms to filter the data (The Supplement gives details).18 For each
observed sequence, the algorithms determine whether it has a pattern, and if so, they output
it. Once all patterns are extracted, it is relatively straightforward to determine which ones
survive the axioms.
The first algorithm (Algo1 ) extracts any deterministic pattern, i.e., strong cycle, of length
10 or less within the last 30 periods of interaction. This algorithm is applied to the observed
sequences of action profiles (Algo1-A) and payoff profiles (Algo1-P).19 A sequence survives
the algorithm, if and only if, Algo1 detects a pattern. Column All in Table II gives the total
number of observed sequences. Columns Nb give the number of sequences that survive an
algorithm. Quick calculation shows that about 2/3 of all sequences survive Algo1. In the
data, there are patterns of length 1 to 10, although lengths 1 and 2 are the most common.
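As an illustration of the idea (our sketch; the actual implementation is described in the Supplement), a strong cycle of bounded length can be detected in a window of play as follows:

```python
# Sketch (ours, mirroring the spirit of Algo1, not its exact implementation):
# detect a deterministic pattern of length at most 10 in a window of play.
def detect_pattern(window, max_len=10):
    for ell in range(1, max_len + 1):              # shorter patterns found first
        if all(window[t] == window[t + ell] for t in range(len(window) - ell)):
            return window[:ell]                     # the repeating block
    return None

last30 = ['AB', 'BA'] * 15                          # alternation, e.g., in Battle of the Sexes
print(detect_pattern(last30))                       # ['AB', 'BA']
```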
The second algorithm (Algo2 ) uses a different principle. In each round of experiments,
subjects were asked to make a guess about their partner’s next move. A sequence of action
profiles survives Algo2 if each player makes at most 3 errors in her last 30 guesses. The idea
is that players’ ability to predict each other’s next move captures a form of stability. Column
All ∗ gives the number of observed sequences for which we obtained players’ guesses. About
2/3 of them survive Algo2.
The Rate columns of Table II display the percentage of observed (stable) sequences that
satisfy Axioms 1, 3, and 4. The data show strong regularities: at least 90% of patterns survive
our axioms. Let us explain this result. The sequences that survive an algorithm are called
stable. To decide whether a stable sequence satisfies the axioms, we assumed that its pattern
¹⁸These algorithms were developed by Julian Romero and the present author.

¹⁹Some sequences have patterns at the payoff level but not at the action level. Such sequences may still be relevant, which is why we include them in Algo1-P.
TABLE II
Satisfaction Rates of the Axioms Among Stable Sequences

                            Algo1-A        Algo1-P             Algo2
Game                 All   Nb    Rate     Nb    Rate    All*   Nb    Rate
Prisoners' Dilemma    70   49    100%     49    100%     57    40    100%
Battle of the Sexes   70   40     88%     40     88%     58    31     93%
Stag Hunt             50   45    100%     45    100%     38    35     92%
Chicken               70   46     91%     46     91%     58    36     89%
Common Interest       18   18    100%     18    100%     18    17     94%
Samaritan's Dilemma   60   35     80%     38     74%     48    28     75%
Ultimatum Game        60   33     97%     37     97%     47    26     92%
Total/Average        398  266     94%    277     92%    324   214     91%
would have continued to be repeated forever.20,21 If a stable sequence, so extended, survives
the axioms, then we say that it satisfies them. We want to emphasize that nothing in these
algorithms favors one pattern over another, except length (shorter patterns are found first).
For example, Algo1-A considers over 10⁶ patterns of action profiles for each game. In the Prisoners’ Dilemma and Battle of the Sexes, less than 0.04% of those 10⁶ patterns belong to our solution,²² yet 100% and 88% of the observed patterns fall into our solution. In Stag Hunt, only 2 of those 10⁶ patterns are predicted by the solution, yet 100% of the observed patterns are among these 2.
The analysis relies on the last 30 periods, but our conclusions are robust to various specifications, as shown in Section A.2 of the appendix. On the one hand, looking at longer continuations typically increases the “richness” of the data and hence type I errors. On the other hand, more data points are then filtered out by our algorithms, and the satisfaction rates of the axioms increase or are not significantly affected. This is because longer continuations yield a subset of the current patterns. We illustrate these remarks in the last 50 periods of interaction (Tables III and IV in Section A.2). Furthermore, the last n periods of interaction represent very different experiences depending on the length of a sequence. Thus, we can also run the analysis based on constant experience by looking at the interaction from period T until the end. We do it for T = 30 in Tables V and IV.

²⁰For those sequences that are stable under Algo2, we assumed that the last 30 periods would have been repeated forever. See the Supplement.

²¹The assumption of infinite horizon is technical in nature and not meant to be realistic. But it compels us to translate the meaning of finite events in reality into an infinite framework. Although we do not claim that subjects would have continued playing the same pattern forever, a situation in which two subjects play a pattern for more than 100 periods may not be far removed from that of two “theoretical” players playing that same pattern forever.

²²There exist 338 distinct patterns of length less than 10 that put weight 1/2 on each of two profiles.
4.2. Reputation Building in the Samaritan’s Dilemma: Theory and Experiments
In the Samaritan’s Dilemma (Buchanan (1977)), our solution makes predictions of particular interest. In this section, we derive them in detail and present experimental evidence.
In this game, player 1 chooses to help (H) or not to help (H̄) player 2 accomplish a task.
Player 2 chooses to work (W) or not to work (W̄). The payoffs are given in Figure 3. The
dilemma is that player 1’s help is crucial to both players’ welfare, but if 1 helps, then 2
prefers not working.
4.2.1. Theoretical Predictions
By Theorem 2, the solution payoffs must be a subset of the Pareto frontier. Therefore, all the solution sequences must play (H, W), or (H, W̄), or both, almost all the time. Since constant sequences survive Axiom 4, the extremities of the Pareto frontier are solution payoffs.
The first main conclusion of our solution is that if player 1 always helps player 2, then 2 stops working. Formally, if R^*(s) = {(H, W), (H, W̄)}, then L(s) = {π ∈ R^2_+ : π2 = 5π1/3}
(see equation (3.6)). The only feasible payoff in this set is (3, 5). Intuitively, if player 1 always
helps player 2, then player 2 exploits 1’s apparent weakness. Again, this corresponds to an
extremity of the Pareto frontier.
The most important conclusion is that player 1 can obtain payoffs in the interior of the
Pareto frontier only if she builds a reputation by occasionally not helping. This conclusion
is an apparent paradox: by the previous conclusion, player 1 cannot help player 2 all the
time if she wants to get more than 3, but she must help player 2 most of the time in virtue
of Axioms 1 and 3. Before clarifying, note that when {(H̄, ·), (H, W), (H, W̄)} ⊂ R∗(s),
L(s) = {π ∈ R2+ : π2 = απ1 s.t. α ∈ [2/3, 5/2]}, which is the gray area in Figure 4.

[Figure 4 about here: two panels plotting π2 against π1 with the origin at (2, 2); the point (4.8, 3.2), labeled (H, W), is marked in both panels, and the right panel labels the reputation-building region.]

Figure 4.— Solution Payoffs in the Samaritan's Dilemma (indicated by the black dots
and the dashed line in the right figure. Note: the origin is (2, 2).)

Therefore, solution payoffs in the interior of the Pareto frontier are possible if player 1
plays H̄ infinitely often. But if player 1 does so, how can the payoffs stay on the Pareto
frontier? This is the apparent paradox. The key is that player 1 must play H̄ with vanishing
frequency. That is, player 1 can earn more than 3 only if, once in a while, she decides not
to help player 2, but not so often as to destroy efficiency. Therefore, player 1 must build
a reputation through punishments, but her punishments must vanish over time, as if her
reputation became common knowledge and persisted even though punishments became rare.
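To see the mechanics of the vanishing-frequency requirement, consider the following minimal simulation. It is only an illustration: the payoffs u(H, W) = (4.8, 3.2) and u(H, W̄) = (3, 5) are read off Figure 4, while the punishment payoff u(H̄, W̄) = (2, 2) (the origin of Figure 4) is an assumption made for this sketch, and the power-of-two punishment schedule is one arbitrary way to punish infinitely often with vanishing frequency.

    # Hypothetical payoffs: the two frontier points appear in Figure 4; the
    # punishment payoff (2, 2) is an assumption made only for this sketch.
    U = {('H', 'W'): (4.8, 3.2),
         ('H', 'Wbar'): (3.0, 5.0),
         ('Hbar', 'Wbar'): (2.0, 2.0)}

    def play(T):
        # Alternate (H, W) and (H, Wbar), but punish with Hbar at periods
        # 1, 2, 4, 8, ...: infinitely many punishments, vanishing frequency.
        seq, next_punish = [], 1
        for t in range(1, T + 1):
            if t == next_punish:
                seq.append(('Hbar', 'Wbar'))
                next_punish *= 2
            else:
                seq.append(('H', 'W') if t % 2 == 0 else ('H', 'Wbar'))
        return seq

    for T in (100, 10_000, 1_000_000):
        s = play(T)
        avg = tuple(sum(U[a][i] for a in s) / T for i in (0, 1))
        freq = sum(a[0] == 'Hbar' for a in s) / T
        print(T, avg, freq)

As T grows, the average payoffs approach (3.9, 4.1), a point in the interior of the Pareto frontier (here, the midpoint of the two extremities), while the punishment frequency goes to 0 even though the number of punishments grows without bound.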
4.2.2. Experimental Results
In the data, the most frequent observations are sequences in which players eventually play
the static Nash equilibrium, and sequences in which players eventually alternate between
(H, W) and (H, W̄) (see Section A.1). We refer to them as the static Nash and the alternation
sequences.

[Figure 5 about here: histogram of the number of punishments (3 to 45, in bins of 3) for the static Nash (not work) and the alternation (work–not work) outcomes; the vertical axis counts the number of sequences (0 to 4).]

Figure 5.— Distribution of Punishments in the Static Nash and in the Alternation Outcome.
The main experimental finding is the structure of the alternation sequences. To induce
the alternations between W and W̄, the Samaritan subjects (player 1) often established
a reputation by punishing their partner heavily early in the game. After a while, they
stopped punishing completely, but their partner kept alternating until the end, as if the
reputation persisted, which echoes our theoretical conclusion.
According to our solution, alternations require punishments to vanish but never stop. In
this form, the claim cannot be tested in a finite horizon,23 but it suggests that there should be
more punishments in the alternation sequences than in the static Nash sequences, which is
testable.
23 In Table II, we considered that if there were at least one punishment prior to player 2's alternations, then the sequence satisfied the axioms. Of course, this is the minimal requirement. In general, we could require at least N punishments, for any 1 ≤ N ≤ ∞.
Figure 5 represents the distributions of punishments for the static Nash and the alternation
outcomes.24 The alternation distribution first-order stochastically dominates the static Nash
distribution, which supports our theoretical claim. In particular, the average number of punishments in the alternation distribution is larger (16 vs. 7), a difference that is statistically
significant at the 5% level.
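Because the claim concerns entire distributions rather than means, it can also be checked directly with a first-order stochastic dominance test. A minimal sketch follows; the arrays are placeholders standing in for the punishment counts per sequence, not the experimental data.

    def fosd(x, y):
        # x first-order stochastically dominates y if the empirical CDF of x
        # lies weakly below that of y at every threshold.
        support = sorted(set(x) | set(y))
        cdf = lambda data, t: sum(v <= t for v in data) / len(data)
        return all(cdf(x, t) <= cdf(y, t) for t in support)

    # Placeholder counts (NOT the experimental data):
    alternation = [9, 12, 15, 18, 21, 24, 45]
    static_nash = [3, 3, 6, 6, 9, 12, 15]
    print(fosd(alternation, static_nash))   # True for these placeholder values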
The data suggest the existence of other phenomena, such as altruism (alternations with
no punishment) and exploitation. In the latter, subjects in the role of player 2 started by
playing W frequently, but in the absence of any punishment, they eventually stopped, and
the pair converged to the static Nash. This conforms to our solution.
5. CONCEPTUAL DISCUSSION
5.1. Equilibrium or Learning Approach?
There exist two interpretations of our approach, depending on the meaning of the infinite
sequences of play. In a “learning” interpretation, the sequences of play represent the real
time interaction between the players. In an “equilibrium” interpretation, the sequences of
play are not a real time account of players’ interaction, but they are the end result of an
unspecified process.
The learning interpretation comes naturally, but it faces important challenges. In this
interpretation, the content of the axioms holds eventually, in virtue of the liminf criterion.
But how does a player know in real time that the interaction will never satisfy some principle,
unless she observes the entire sequence? For example, how does a player know today that
if she does not change her behavior the final outcome will be individually irrational, unless
she sees the future? The next result reinforces the argument.

24 Observed sequences may have different lengths due to randomness. Nevertheless, the distributions of lengths across the static Nash and the alternation sequences are nearly identical, and so comparing the number of punishments is relevant. Moreover, the length of a sequence did not seem to affect the number of punishments in our data set, as most subjects settled on a pattern in the first 90 periods, regardless of the actual total length.
Proposition 4   For any game G ∈ G′,25 if a sequence s ∈ S satisfies Axiom 3, then
lim_{T→∞} (1/T) Σ_{t=1}^{T} u(st) exists.
Players’ interaction under Axiom 3 must be relatively stable, because average payoffs must
converge. How does this stability emerge? The learning interpretation is tempting, because
it requires the desired properties to eventually emerge from within a sequence, as if driven
by evolution, but the process by which the axioms are satisfied is unspecified.
Standard equilibrium theory prompts similar interrogations. If a Nash equilibrium of a
repeated game represents players’ interaction in real time, then how does each player know
all the future consequences of her move today? How do players know the repeated game
strategies of their opponents? And if repeated games are not played in real time, then what
is the process that leads to Nash equilibria? Nachbar (2005) shows that there is no obvious
answer.
In conclusion, we encourage an equilibrium interpretation of our approach. In this interpretation, there is no claim of convergence to a cycle, unlike what Proposition 4 nearly implies;
we take cycles as the starting point of the analysis.26 That is, we construct an equilibrium
solution within the space of cycles by selecting among them (say that a cycle belongs to our
solution if and only if it survives the axioms). As such, we do not explain the emergence
of cycles, much less guarantee it, but if cycles do emerge, then our ambition is to capture
them, and only those, into a solution. Given this objective, our experimental results (Table
II) give credit to our solution.
25 In Section 3.4.2, we defined G′ as the family of stage games for which, if ui(a) = ui(a′) for some i, then a = a′.
26 Abreu and Rubinstein (1988) take finite automata, and hence strong cycles, as the starting point of their analysis. Among the strong cycles, they determine which ones emerge in equilibrium.
5.2. Observables vs. Unobservables
Detailed descriptions of bounded rationality, as in the standard approaches, often require
assumptions on unobservable variables, while their conclusions concern observables. For example, learning models often specify how beliefs are formed or how satisfaction thresholds
evolve. But beliefs and satisfaction thresholds are not directly observable.
Our approach, however, focuses on outcomes and its axioms apply to observable data.
Since the assumptions and the conclusions live in the same space, they must be clearly
distinguishable. In our case, the axioms carry independent behavioral contents — though
behavior is not exactly specified — and their combination results in substantive conclusions.
Reputation building is an example.
Assumptions on observables require the analyst to discard sequences based solely on what is
observed to happen. Therefore, this approach abstracts away from “off-path” considerations
and emphasizes actual events and the self-sustainability of sequences. This perspective is
common in the learning literature. In most learning models, the sequences of play are self-generated from one period to the next by various rules, and "off-equilibrium" considerations
are completely absent. This simplifying feature excludes valuable information, but empirical
work shows that these approaches can still be useful.
From a methodological standpoint, starting from observables streamlines the dialogue between theory and experiments. On the one hand, given that sequences of play are malleable
yet rich primitives, the theory can efficiently produce solutions. These solutions are directly
testable, although the infinite horizon can be problematic. On the other hand, the data from
repeated interactions contain regularities to be mined. These regularities can be introduced
into this framework to study their theoretical implications. Once a valid solution is reached,
it is natural to build a model that explains it.
6. CONCLUSION
This paper has proposed an axiomatic approach to study repeated interactions between
boundedly rational players. On the one hand, there is the approach of specifying a model
of behavior (preferences, feasible actions, beliefs, etc). This approach describes and predicts
behavior. On the other hand, there is the axiomatic approach that suggests properties of
unmodeled behaviors that seem reasonable for the domain of application (e.g. equity, fairness,
stability, etc.). This paper has taken the latter approach. In doing so, we have assumed
that patterns of observed behaviors have meaning in the sense that they are the outward
expression of inward motives. If this is the case, then we can encode behavioral principles
implicitly into axioms on the sequences of play. The malleability of these sequences allows
us to combine multiple axioms and to derive solutions that are multifaceted representations
of unsophisticated behavior.
The relevance of our approach is supported by strong experimental evidence. In the games
tested, our solution performs at least as well as widely accepted learning models (Ho et al.
(2007) and Er’ev and Roth (1998)). Moreover, the patterns of play extracted from the data
overwhelmingly satisfy our axioms (Table II). These quantitative conclusions are complemented by qualitative evidence. In the Samaritan’s Dilemma, our solution makes particularly interesting predictions that point to reputation building. Experiments suggest that
reputation building indeed plays an important role in this game.
Our approach opens new prospects. There exist many ways to be less than rational, and
the possibility to explore them within the same framework is appealing. Our axioms apply within games and to all games. In this category, there exist other axioms formalizing
various principles, and hence other solutions can be derived. For example, our axioms do
not discriminate among (individually secure) constant patterns. Clearly, this leaves room for
further selection. There are also other categories of axioms. For example, we can think of
axioms that establish relationships across games. If players adopt some behavior in a game,
then they may adopt similar behaviors in related games. Thus, a solution may satisfy some
invariance or isomorphism properties. Another category consists of context-specific axioms.
These axioms are designed to hold in certain classes of games, but not necessarily in others.
For example, assuming that a solution outputs strong cycles is acceptable in many games,
but it may not be appropriate in zero-sum games.
APPENDIX A: EXPERIMENTAL RESULTS
A.1. Overview
In the experimental plots, the center of each circle is a data point (average payoffs over the last 30
periods) and larger circles represent more frequent observations.
[Experimental plots about here. For each game studied — the Prisoners' Dilemma, Battle of the Sexes, Stag Hunt, Ultimatum Game, Chicken, Mixed-Nash Game, Common Interest, and the Samaritan's Dilemma — five panels plot Payoff 2 against Payoff 1 under, respectively, the Folk Theorem, (Self-Tuning) EWA, Reinforcement, the Experimental Data, and the Axiomatic Approach.]
A.2. Robustness Analysis
TABLE III
Empirical Performance Based on Payoffs (Last 50 periods)

                        Axiomatic         Reinf.            EWA               Folk Thm
Game                    Type I  Type II   Type I  Type II   Type I  Type II   Type I  Type II
Prisoners' Dilemma        43%    .08%       85%    0%         85%    0%          4%    52%
Battle of the Sexes       59%    .06%       95%    .06%       95%    .06%       17%    75%
Stag Hunt                 17%    0%         17%    0%         17%    0%          4%    69%
Chicken                   45%    3%        100%    .05%      100%    .05%       18%    68%
Common Interest           40%    0%         40%    0%         40%    0%          0%     2%
Samaritan's Dilemma       57%    4%         83%    0%         83%    0%         55%    71%
Ultimatum Game            57%    1%         57%    0%        100%    0.1%       45%    28%
Mixed-Nash Game          100%    .25%       92%    0%         92%    0%         72%    12%

Note. The analysis of the last 50 periods excludes all sequences of length less than 50. In certain
games, such as the common interest game, this rules out many sequences, and hence small noise
in the remaining data points can have a large impact on type I errors.
TABLE IV
Satisfaction Rates of the Axioms Among Stable Sequences (Last 50 periods)

                              Algo1-A        Algo1-P               Algo2
Game                    All   Nb   Rate      Nb   Rate      All*   Nb   Rate
Prisoners' Dilemma       70   30   100%      30   100%       57    26   100%
Battle of the Sexes      70   25    92%      25    92%       58    17    94%
Stag Hunt                50   20   100%      20   100%       38    35   100%
Chicken                  70   24    96%      24    96%       58    16    94%
Common Interest          18    4   100%       5   100%       18     5   100%
Samaritan's Dilemma      60   26    81%      29    72%       48    19    79%
Ultimatum Game           60   25   100%      31   100%       47    21    95%
Total/Average           398  155    95%     164    93%      324   139    94%
TABLE V
Empirical Performance Based on Payoffs (Period 30 to end)

                        Axiomatic         Reinf.            EWA               Folk Thm
Game                    Type I  Type II   Type I  Type II   Type I  Type II   Type I  Type II
Prisoners' Dilemma        36%    .08%       77%    0%         77%    0%          6%    51%
Battle of the Sexes       70%    0%         87%    0%         87%    0%         24%    75%
Stag Hunt                 10%    0%         10%    0%         10%    0%          2%    69%
Chicken                   49%    3%        100%    .06%      100%    .06%       14%    68%
Common Interest            6%    0%          6%    0%          6%    0%          0%    18%
Samaritan's Dilemma       68%    4%         87%    0%         87%    0%         42%    71%
Ultimatum Game            67%    1%         67%    0%        100%    .1%        55%    28%
Mixed-Nash Game          100%    .25%       94%    0%         94%    0%         81%    12%
TABLE VI
Satisfaction Rates of the Axioms Among Stable Sequences (Period 30 to end)

                              Algo1-A        Algo1-P               Algo2
Game                    All   Nb   Rate      Nb   Rate      All*   Nb   Rate
Prisoners' Dilemma       47   29   100%      29   100%       34    23   100%
Battle of the Sexes      41   20    95%      20    95%       29    12   100%
Stag Hunt                23   19   100%      19   100%       11    10   100%
Chicken                  40   20   100%      20   100%       28    11   100%
Common Interest           6    6   100%       6   100%       18     5   100%
Samaritan's Dilemma      60   20    75%      21    71%       48    14    71%
Ultimatum Game           60   22   100%      27   100%       47    18   100%
Total/Average           277  136    96%     142    95%      215    93    96%

Note. To produce this table, we have ruled out all sequences of length less than 50 so as
to allow most patterns to appear at least 3 times between period 30 and the end of the
sequence (otherwise we can hardly consider it a pattern).
APPENDIX B: CLASSIFICATION OF GAMES
In this section, we classify all stage games, and indirectly all repeated interactions, into families
based on the degree of sophistication required to satisfy strong collective intelligence.
B.0.1. Families of Games
This subsection presents the technical definitions and the classification. The next subsection contains the main result.
A triangle is a convex set ∆ = Co({u1, u2, u3}) where u1, u2 and u3 are three (non-collinear)
payoffs in Π(G),27 called vertices. Let V(∆) be the set of vertices of ∆. Let N(∆) = {u ∈ V(∆) :
u ∈ P(V(∆))} be the set of vertices of ∆ that are pairwise Pareto undominated. Let T(G) be the
set of all triangles in G.
Given a game G ∈ G, the next definition classifies the areas of the feasible payoffs into sets. This
will determine the family to which the game belongs. Let Q[0, 1] be the set of rational numbers
between 0 and 1. Given a rational number q, d(q) denotes its denominator and ν(q) its numerator.
In the definition, k is a natural number.
Definition 4   A payoff π ∈ Co(Π(G)) is in Uk if for all triangles ∆ ∈ T(G) such that π ∈
∆ and |N(∆)| = 3, we can write ∆ = Co({u1, u2, u3}) such that, for some q ∈ Q[0, 1], either
(u2 − u1) + q(u2 − u3) ≫ 0 and d(q) ≤ k, or (u1 − u2) + q(u3 − u2) ≫ 0 and d(q) + ν(q) ≤ k.28 If
there is no triangle containing π such that |N(∆)| = 3, then π ∈ U1.
A feasible payoff is in Uk if every triangle that contains it, and whose vertices are pairwise Pareto
unordered, can be written such that a movement along one of its edges, followed by a fraction q of
a movement along another, leads to a Pareto improvement, where q satisfies certain properties with
respect to k. Once all feasible payoffs are placed into some Uk , the largest of these k values is the
family to which G belongs.
Definition 5   A game G ∈ G is in Fn, where n ∈ N+, if for every π ∈ Co(Π(G)), π ∈ ∪_{k=1}^{n} Uk.
Definition 6   A game G ∈ G is in F1∗ if for all ∆ ∈ T(G) and π ∈ ∆, either (i) |N(∆)| = 1 or
(ii) |N(∆)| = 2 and ∆ = Co({u1, u2, u3}) such that u2 ≫ u1 and u3 ≫ u1.
27 Three points x, y, z ∈ R2 are collinear if (y2 − x2)/(y1 − x1) = (z2 − x2)/(z1 − x1), i.e., if they are on the same line.
28 The requirement q ∈ [0, 1] is inconsequential because we can relabel the vertices of a triangle: if (u2 − u1) + 2(u2 − u3) ≫ 0, then (1/2)(u2 − u1) + (u2 − u3) ≫ 0; exchanging the labels of u1 and u3 gives the desired form.
Every stage game belongs to a family Fn and every family Fn is nonempty.
[Figure 6 about here: two panels in the (π1, π2) plane, each showing triangles with vertices u1, u2, u3 drawn inside the feasible payoff set of the Prisoners' Dilemma; the left panel illustrates the triangles placing payoffs in U1, the right panel those placing payoffs in U2.]

Figure 6.— Triangles in the Prisoners' Dilemma
In the Prisoners’ Dilemma of Section 3.3, there are four possible triangles, as shown by Figure
6. For every triangle in the left panel, |N (∆)| = 2 and thus, if these were the only triangles, the
game would be in F1 . However, triangles in the right panel satisfy |N (∆)| = 3. In the bottom
triangle, a movement along (u1 − u2 ) + (u3 − u2 ) leads to a Pareto improvement. Therefore, q = 1
and hence d(q) + ν(q) = 2. In words, players should replace two occurrences of (D, D) by one (C, D)
and one (D, C), which requires twice as much coordination. In the top triangle, a movement along
(u2 − u1 ) + (u2 − u3 ) gives a Pareto improvement. Therefore, q = 1 and hence d(q) = 1. Putting these
observations together, we obtain G ∈ F2 .
Battle of the Sexes and Chicken are in F1 . As shown by Figure 1 of Section 3.3, the set of feasible
payoffs in Battle of the Sexes has only one triangle and it satisfies |N (∆)| = 2. Thus, Battle of the
Sexes is in F1 . For Chicken, only one triangle satisfies |N (∆)| = 3, and we can label its vertices so
that (u2 − u1 ) + (u2 − u3 ) ≫ 0. Thus, Chicken is in F1 .
Among the games from Section A.1, Battle of the Sexes, Stag Hunt, the Ultimatum Game, Common
Interest, and the Samaritan’s Dilemma are in F1∗ .
In order to give further examples and to assist the reader with our classification, we provide an
online program that classifies 2 × 2 games.29
29 The program is available at webspace.utexas.edu/lm26875/www/research. The user specifies the payoffs and the program outputs Fn if n < 10,000. I thank my research assistant, Jonathan Lhost, for creating this program. For example, the reader can verify that u(c, c) = u(d, d) = (6, 1), u(c, d) = (1, 8), and u(d, c) = (2, 7) define a game in F5.
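To make the classification concrete, the following sketch implements the search behind Definitions 4 and 5 for 2 × 2 games. It is not the program of footnote 29, only a minimal reimplementation under stated assumptions: the Prisoners' Dilemma payoffs are the hypothetical normalization drawn in Figure 6, and the search over rationals q is capped at an arbitrary denominator bound.

    from fractions import Fraction
    from itertools import combinations, permutations

    def dominates(u, v):
        # u >> v: strictly larger for both players
        return u[0] > v[0] and u[1] > v[1]

    def collinear(a, b, c):
        return (b[0]-a[0])*(c[1]-a[1]) == (c[0]-a[0])*(b[1]-a[1])

    def min_k(tri, max_den=50):
        # Smallest k validating one of the two conditions of Definition 4
        # for a triangle with |N| = 3, over all labelings of its vertices.
        best = None
        for u1, u2, u3 in permutations(tri):
            for den in range(1, max_den + 1):
                for num in range(0, den + 1):
                    q = Fraction(num, den)
                    if q.denominator != den:
                        continue  # consider each reduced q exactly once
                    m1 = tuple(u2[i]-u1[i] + q*(u2[i]-u3[i]) for i in (0, 1))
                    if m1[0] > 0 and m1[1] > 0:      # requires d(q) <= k
                        best = den if best is None else min(best, den)
                    m2 = tuple(u1[i]-u2[i] + q*(u3[i]-u2[i]) for i in (0, 1))
                    if m2[0] > 0 and m2[1] > 0:      # requires d(q) + v(q) <= k
                        best = den + num if best is None else min(best, den + num)
        return best

    def family(payoffs, max_den=50):
        # The game is in F_n for the largest k needed over its triangles.
        n = 1
        for tri in combinations(set(payoffs), 3):
            if collinear(*tri):
                continue  # not a triangle
            if any(dominates(v, u) for u in tri for v in tri):
                continue  # |N| < 3: Definition 4 imposes no constraint
            k = min_k(tri, max_den)
            if k is None:
                return None  # denominator bound reached; increase max_den
            n = max(n, k)
        return n

    # Hypothetical Prisoners' Dilemma payoffs, as drawn in Figure 6:
    print(family([(2, 2), (1, 1), (0, 3), (3, 0)]))   # 2, as in the text
    # The example game of footnote 29:
    print(family([(6, 1), (6, 1), (1, 8), (2, 7)]))   # 5, as in the footnote

The triangle search mirrors the geometric reading given after Definition 4: each triangle with three pairwise-unordered vertices must admit a Pareto-improving edge movement whose rational fraction q is cheap enough.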
B.0.2. Result: Properties of Families
The complexity of a cyclic sequence s is measured by the length of its cycle, ℓ(s). Given a cyclic
sequence s, say that sequence s′′ is at most n times more complex than s if ℓ(s′′ ) ≤ nℓ(s). When
n = 1, s′′ is said to be simpler than s. When ℓ(s′′ ) < ℓ(s), s′′ is said to be strictly simpler than s.
Proposition 5
1. For any game G ∈ Fn , if a cyclic sequence s contains a subsequence s′ such that π(s′ ) ≫ π(s),
then there is a cyclic subsequence s′′ of s such that s′′ is at most n times more complex than s
and π(s′′ ) ≫ π(s).
2. For any game G ∈ F1∗ , if a strongly cyclic sequence s contains a subsequence s′ such that
π(s′ ) ≫ π(s), then there is a strongly cyclic subsequence s′′ of s that is strictly simpler than s
and such that π(s′′ ) ≫ π(s).
This result states that if a Pareto-improving subsequence exists, then there must also be one
whose complexity cannot exceed a certain limit compared to the original sequence, and this limit is
determined by the family. Next we give the intuition behind the result.
When players play a cycle, the resulting payoffs lie in some triangle, each vertex of which corresponds to a payoff of a profile from the cycle. Definition 4 describes how to substitute these profiles to
yield Pareto improvements. In the Prisoners’ Dilemma, for payoffs in the top triangle, the definition
says that replacing one occurrence of (C, D) and one of (D, C) by two occurrences of (C, C) increases
players’ payoffs. Clearly, this does not increase the length of the cycle. In the bottom triangle, the
definition says to replace two occurrences (or points of frequency) of (D, D) by one of (C, D) and
one of (D, C). In some situations, e.g. when the cycle places weight 1/2 on (D, C) and 1/4 on (C, D)
and (D, D), we must first double the cycle to be able to implement the changes.
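A minimal sketch of this substitution, using the hypothetical Prisoners' Dilemma payoffs from Figure 6, shows both the doubling step and the resulting Pareto improvement:

    from fractions import Fraction

    U = {('C', 'C'): (2, 2), ('D', 'D'): (1, 1),   # hypothetical PD payoffs
         ('C', 'D'): (0, 3), ('D', 'C'): (3, 0)}

    def avg(cycle):
        # Average payoffs of a cycle, in exact arithmetic.
        n = len(cycle)
        return tuple(sum(Fraction(U[a][i]) for a in cycle) / n for i in (0, 1))

    def substitute(cycle):
        # Replace two occurrences of (D, D) by one (C, D) and one (D, C),
        # doubling the cycle first when it holds a single (D, D).
        if cycle.count(('D', 'D')) == 1:
            cycle = cycle * 2     # doubling leaves average payoffs unchanged
        out, replaced = [], 0
        for a in cycle:
            if a == ('D', 'D') and replaced < 2:
                out.append(('C', 'D') if replaced == 0 else ('D', 'C'))
                replaced += 1
            else:
                out.append(a)
        return out

    s = [('D', 'C'), ('D', 'C'), ('C', 'D'), ('D', 'D')]  # weights 1/2, 1/4, 1/4
    print(avg(s))              # (7/4, 1)
    t = substitute(s)
    print(len(t), avg(t))      # 8 and (15/8, 9/8): Pareto improvement, doubled length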
APPENDIX C: PROOFS
Several proofs use the following observation.
Observation 1   By definition of a subsequence (Definition 1), given any sequence s, if we build
a sequence s′ by only using action profiles that appear infinitely often in s, then s′ is a subsequence
of s. As a result, each convex combination (i.e., using rational numbers as coefficients) of recurrent
payoffs can be achieved by some subsequence of s.
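As a hypothetical illustration, a rational convex combination putting weights 2/5 and 3/5 on two recurrent payoffs is realized by the length-5 cycle that plays the first payoff twice and the second three times, in the spirit of the payoff cycle (C.2) in the proof of Proposition 2 below:

    from fractions import Fraction

    def cycle_for(weights):
        # Realize rational weights w_m / l: w_1 copies of payoff 0 are
        # followed by w_2 copies of payoff 1, etc., as in (C.2).
        return [m for m, w in enumerate(weights) for _ in range(w)]

    weights = [2, 3]            # weights 2/5 and 3/5, cycle length l = 5
    payoffs = [(4, 1), (1, 3)]  # hypothetical recurrent payoffs
    cyc = cycle_for(weights)
    avg = tuple(sum(Fraction(payoffs[m][i]) for m in cyc) / len(cyc)
                for i in (0, 1))
    print(cyc, avg)             # [0, 0, 1, 1, 1] and (11/5, 11/5)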
C.1. Proposition 1
Proof: Suppose that a cyclic sequence s admits no subsequence s′ such that π(s′ ) ≫ π(s) but
there is a profile â such that wâ (s) > 0 and u(â) ≪ π(s). It follows from the definition of a cyclic
sequence that

π(s) = Σ_{a∈A} [wa(s)/ℓ(s)] u(a).
Since u(â) ≪ π(s), there must be a1 and a2, not necessarily different, such that wa1(s), wa2(s) > 0
and u(a1) − u(â) + q(u(a2) − u(â)) ≫ 0, where q is a rational number in [0, 1]. To see this, note
that if u(â) ≪ π(s), then either (i) there is a profile (played with positive frequency) that Pareto
dominates â or (ii) there are two distinct profiles (played with positive frequency) whose convex
combination Pareto dominates â. Let q = ν(q)/d(q) where ν(q) and d(q) in N are the numerator and
the denominator of q. For every w∗ > 0, let

π∗ = π(s) + [w∗/ℓ(s)] (u(a1) − u(â) + q(u(a2) − u(â))),
and note that π ∗ ≫ π(s). We demonstrate that w∗ can be chosen so that π ∗ is a convex combination
of pure payoffs. For w∗ = wâ(s)/(1 + q), we have

π∗ = Σ_{a≠a1,a2,â} [wa(s)(d(q) + ν(q))/(ℓ(s)(d(q) + ν(q)))] u(a)
     + [(wa1(s)(d(q) + ν(q)) + d(q)wâ(s))/(ℓ(s)(d(q) + ν(q)))] u(a1)
     + [(wa2(s)(d(q) + ν(q)) + ν(q)wâ(s))/(ℓ(s)(d(q) + ν(q)))] u(a2),   (C.1)
which is a well-defined convex combination. This convex combination corresponds to a sequence built
from s by appropriately re-allocating the weight on â to a1 and a2 . By Observation 1, this convex
combination is the payoff of a subsequence of s. Since π ∗ ≫ π(s), s admits a Pareto improving
subsequence, which is a contradiction.
Q.E.D.
C.2. Proposition 2
Proof: For every G ∈ G, Axiom 2 implies that π(s) ∈ U (R(s)) for any s ∈ S w (G), because s
cannot be dominated by any single element in R(s) (otherwise a constant truncation would be Pareto
improving). Individual security is immediate, and thus it is clear that π(S w (G)) ⊂ {π ∈ R2 : ∃s ∈
S such that π = π(s) and π(s) ∈ U (R(s))} ∩ ΠIS (G). Next we show the inclusion holds in the other
direction. Take any π ∈ {π ∈ R2 : ∃s ∈ S such that π = π(s) and π(s) ∈ U (R(s))} ∩ ΠIS (G). Thus,
there is a sequence s such that π = π(s) and π ∈ U (R(s)). Write the elements of R(s) as {u1 , . . . , un }
where a higher index indicates a larger payoff for player 2 (i.e., if k > h, then uk2 ≥ uh2 ). For any
ǫ > 0, there is a rational convex combination of elements of R(s), Σ_{m=1}^{n} (wm/ℓ) um with
Σ_m wm = ℓ, that is ǫ-close to π. Define sequence s′ such that (i) all its elements are in R(s) and
(ii) it has a cycle that corresponds to the following payoff cycle

u1 u1 · · · u2 u2 · · · un   (C.2)
where w1 u1 ’s are followed by w2 u2 ’s, etc. Clearly, π(s′ ) = π(s). Further, given the way the cycle
of s′ is ordered in (C.2), any truncation of it must (weakly) decrease player 2’s payoff because of
our indexing. Therefore, no truncation can strictly benefit both players. Since π(s) ∈ U (R(s)) and
R(s) = R(s′ ), s′ ∈ S w (G) and hence π = π(s′ ) ∈ π(S w (G)).
Q.E.D.
C.3. Theorem 1
Proof: Suppose that solution S satisfies the axioms.
Step 1. Internal efficiency. Take any G and any sequence s ∈ S(G). By Observation 1, for every
π ∈ Co(R(s)), there is a subsequence s′ of s such that π(s′ ) = π. By Axiom 3, there is no subsequence
s′ such that π(s′ ) ≫ π(s). Thus, there is no π ∈ Co(R(s)) such that π ≫ π(s). By definition of P(·),
this implies that π(s) ∈ P(Co(R(s))).
Step 2. Individual security. This part follows immediately.
Suppose now that S(G) is a subset of all internally efficient and individually secure sequences. Take
any G and any sequence s ∈ S(G). Then π(s) ∈ P(Co(R(s))). By Observation 1, there cannot be
a subsequence s′ of s such that π(s′ ) ≫ π(s), for otherwise there would be a convex combination
(in particular, one using rational numbers as coefficients) of elements of R(s) that Pareto dominates
π(s). Therefore, Axiom 3 holds. Axiom 1 is trivially satisfied.
Q.E.D.
C.4. Theorem 2
Proof: By definition, for any π ∈ π(S st (G)), there is a sequence s ∈ S st (G) such that π(s) = π.
Sequence s must be individually secure by Theorem 1, and hence π ∈ ΠIS (G). By the same theorem,
s must also be internally efficient, and hence π ∈ P(Co(R(s))). Recall the definition of R∗ , defined
in (3.2) (on p. 20) as the recurrent action profiles. If R∗ (s) = {a} for some profile a, then internal
efficiency immediately implies π ∈ P(g) where g = {u(a)}. If R∗ (s) = {a, a′ }, then internal efficiency
immediately implies π ∈ P(g) where g = Co({u(a), u(a′)}). Suppose now that R∗ (s) ≥ 3. Every point
in P(Co(R(s))), π in particular, must lie on the boundary of (i.e., a segment in) Co(R(s)), for if π
lied in the interior, then there would be a Pareto improvement over π and s would not be internally
efficient. Since π must lie on a segment g ⊂ Co(R(s)), it follows from internal efficiency that π ∈ P(g).
This proves that π(S st (G)) ⊂ (∪g∈L(G) P(g)) ∩ ΠIS (G). Let us show that the inclusion holds in the
other direction. Take any π ∈ (∪g∈L(G) P(g))∩ΠIS (G). We build a sequence that satisfies both axioms
and generates payoffs π. By definition, there exist u1, u2 ∈ Π(G) such that g = Co({u1, u2}) and
π ∈ P(g). If u1 and u2 are Pareto ordered, then assuming without loss of generality that u1 ≫ u2, we
have P(g) = {u1}, and hence the sequence that only plays a profile with payoff u1 = π is internally
efficient. If u1 and u2 are not Pareto ordered, then it is possible to build a sequence s such that
R(s) = {u1, u2} and whose payoff is π(s) = π. This sequence must be internally efficient, for otherwise
π would not lie in P(Co({u1, u2})). Thus, π ∈ π(S st (G)).
Q.E.D.
C.5. Proposition 3
Proof: Take an arbitrary game G in G+ ∩ G′. Choose any π ∈ π(S is (G)). Then either (i) π = u(a)
for some a ∈ A or (ii) π ≠ u(a) for all a ∈ A. Case (i) immediately satisfies the requirement. So,
suppose that π satisfies case (ii). For any sequence s′, if πi(s′) = ui(a∗) ≡ max_{a∈R∗(s′)} ui(a), then
πj(s′) = uj(a∗), because G ∈ G′.30 Let sequence s ∈ S is (G) satisfy π(s) = π. From the previous
observation, it cannot be that πi(s′) = max_{a∈R∗(s′)} ui(a) for any i, because π(s) ≠ u(a) for all a by
assumption. This implies πi(s) < max_{a∈R∗(s)} ui(a) for all i. Thus, Axiom 4 requires that (3.5) hold
for no i. Therefore, if
π2(s)/π1(s) < π1,2(s)/π2,1(s)   (C.3)

then s survives the axiom if and only if

π2(s)/π1(s) > π2,2(s)/π1,1(s).   (C.4)

The converse also holds: if (C.4) holds, then s survives the axiom if and only if (C.3) holds. To
summarize, under Axiom 4, (C.3) holds if and only if (C.4) holds. It follows from similar arguments30
that

π2(s)/π1(s) > π1,2(s)/π2,1(s)   (C.5)

holds if and only if

π2(s)/π1(s) < π2,2(s)/π1,1(s)   (C.6)

holds. As a corollary, π2(s)/π1(s) = π1,2(s)/π2,1(s) if and only if π2(s)/π1(s) = π2,2(s)/π1,1(s).

(a) Suppose π2,2(s)/π1,1(s) > π1,2(s)/π2,1(s). Consider the following three cases: π2(s)/π1(s)
can be above, below, or in between both ratios. That is,

π2(s)/π1(s) ≥ π2,2(s)/π1,1(s), or π2,2(s)/π1,1(s) > π2(s)/π1(s) > π1,2(s)/π2,1(s), or π1,2(s)/π2,1(s) ≥ π2(s)/π1(s).

Given our previous observations, only π2,2(s)/π1,1(s) > π2(s)/π1(s) > π1,2(s)/π2,1(s) is compatible with Axiom 4.

(b) Suppose π1,2(s)/π2,1(s) > π2,2(s)/π1,1(s). A similar reasoning to Case (a) establishes that only
π1,2(s)/π2,1(s) > π2(s)/π1(s) > π2,2(s)/π1,1(s) is compatible with Axiom 4.

(c) Suppose π1,2(s)/π2,1(s) = π2,2(s)/π1,1(s). A similar reasoning to Cases (a) and (b) establishes that only
π2(s)/π1(s) = π1,2(s)/π2,1(s) = π2,2(s)/π1,1(s) is compatible with Axiom 4.

Putting Cases (a)-(c) together, we conclude that π(s) ∈ L(s).

30 Consider a game for which (2, 4) and (2, 2) are pure payoffs but not (2, 3). Note that this game is not in G′. Take a sequence s′ along which payoffs (2, 4) and (2, 2) appear with frequency 1/2. Then, π1(s′) = 2 = max_{a∈R∗(s′)} u1(a) but π2(s′) = 3 and there is no a such that u1(a) = 2 and u2(a) = 3.
The first part of the proof has established that π(S is (G)) ⊂ {π ∈ R2 : ∃a ∈ A s.t. π = u(a) or ∃s ∈
S s.t. π = π(s) ∈ L(s)}. We now prove that the inclusion holds in the other direction. First, choose
any π = u(a) for some a ∈ A. Any sequence s ∈ S such that R∗ (s) = {a} trivially survives internal
stability (because for no i it holds that πi (s) < maxa∈R∗ (s) ui (a)), and thus π ∈ π(S is (G)). Suppose
now that there is s such that π = π(s) ∈ L(s). Then, one of Cases (a)-(c) must apply. In each case,
we know that (3.5) can hold for no i. Thus, s survives Axiom 4 and π(s) ∈ π(S is (G)).
Q.E.D.
C.6. Proposition 4
Proof: By way of contradiction, suppose that sequence s satisfies Axiom 3 but
lim_{T→∞} (1/T) Σ_{t=1}^{T} u(st) does not exist. Define

zT = (1/T) Σ_{t=1}^{T} u(st).
By assumption, {zT} is divergent, but since it lives in a compact space, it has at least two distinct cluster points z′ and z′′ in Co(R(s)). By definition of payoffs as liminf, z ≡ (min{z1′, z1′′},
min{z2′, z2′′}) ≥ π(s). Suppose first {z′, z′′} ⊄ P(Co(R(s))). Then it is clear that z ∉ P(Co(R(s))),
and thus, by Observation 1, there is a subsequence s′ of s such that π(s′) ≫ z. This violates
Axiom 3. Suppose now {z′, z′′} ⊂ P(Co(R(s))). Then, there exist a and a′ in A such that (i)
u(a), u(a′) ∈ P(Co(R(s))) and (ii) both z′ and z′′ lie on the line segment Co({u(a), u(a′)}). Since
the stage game is such that no two distinct action profiles yield either player the same payoff,
u1(a) ≠ u1(a′) and u2(a) ≠ u2(a′), it must be that z ∉ P(Co(R(s))). Again, this is a violation of
Axiom 3.
Q.E.D.
C.7. Proposition 5
Proof: 1. Suppose s has a cycle. Consider two cases: (a) |R(s)| ≤ 2, (b) |R(s)| ≥ 3.
Case (a). If |R(s)| = 1, there is no subsequence s′ of s such that π(s′ ) ≫ π(s) and this violates the
assumption. If |R(s)| = 2, then either the elements of R(s) are not Pareto ranked, i.e., R(s) = {u, u′ }
with neither u ≫ u′ nor u′ ≫ u, in which case the assumption is violated because there is no subsequence
s′ of s such that π(s′ ) ≫ π(s); or these elements are Pareto ranked (say u ≫ u′ , without loss
of generality). Since a Pareto-improving subsequence exists, u′ must be played with strictly positive
frequency. Hence the subsequence s′′ of s, which always plays the profile yielding u, strictly dominates
s (if s is strongly cyclic, then s′′ is also strictly simpler than s, since s′′ has a cycle of length 1).
Case (b). Suppose |R(s)| ≥ 3. If π(s) ∈ P(Co(R(s))), then there is no subsequence s′ of s such
that π(s′) ≫ π(s), which violates the assumption. Therefore, suppose π(s) ∉ P(Co(R(s))). Then
there must be a triangle ∆ = Co({u1, u2, u3}) such that (i) π(s) ∈ ∆, (ii) {u1, u2, u3} ⊂ R(s), and
(iii) π(s) ∉ P(∆). There are two sub-cases:
Case (b1). Suppose |N (∆)| ≤ 2. This implies that there exist u, u′ ∈ {u1 , u2 , u3 } such that u ≪ u′
and u is played with strictly positive frequency in s. Let a′ be an action profile that appears infinitely
often in s and for which u(a′ ) = u′ . Construct the subsequence s′′ of s by transferring the frequency
attached to all profiles yielding u to a′ . Clearly, π(s′′ ) ≫ π(s) and ℓ(s′′ ) ≤ ℓ(s).
Case (b2). Suppose |N (∆)| = 3. Since s is cyclic, we can write π(s) as
π(s) = Σ_{m=1}^{M} [wm(s)/ℓ(s)] um

where M ≥ 3, {um} ⊂ R(s), wm(s) ≥ 0 for all m, and ℓ(s) = Σ_{m=1}^{M} wm(s). Recall that π(s) ∈ ∆ =
Co({u1, u2, u3}). For any w∗ > 0, define

π∗ = π(s) + [w∗/ℓ(s)] ((u2 − u1) + q(u2 − u3))   (C.7)

and

π∗∗ = π(s) + [w∗/ℓ(s)] ((u1 − u2) + q(u3 − u2)),   (C.8)
where q ∈ Q[0, 1] is written as q = ν(q)/d(q). Since G is in family Fn , either (u2 −u1 )+q(u2 −u3 ) ≫ 0
and d(q) ≤ n, or (u1 −u2 )+q(u3 −u2 ) ≫ 0 and d(q)+ν(q) ≤ n. Thus, for all w∗ > 0, either π ∗ ≫ π(s)
or π ∗∗ ≫ π(s) holds. In each case, we can choose w∗ to obtain a well-defined convex combination.
For w∗ = min{w1(s), w3(s)}, (C.7) gives

π∗ = Σ_{m≠1,2,3} [wm(s)d(q)/(d(q)ℓ(s))] um + [((w1(s) − w∗)d(q))/(d(q)ℓ(s))] u1
     + [(w2(s)d(q) + w∗(ν(q) + d(q)))/(d(q)ℓ(s))] u2 + [(w3(s)d(q) − w∗ν(q))/(d(q)ℓ(s))] u3,
which is a well-defined convex combination implementable by a sequence of length at most d(q)ℓ(s)
(which is less than nℓ(s)). For w∗ = w2 (s)/(1 + q), (C.8) gives
π∗∗ = Σ_{m≠1,2,3} [wm(s)(d(q) + ν(q))/(ℓ(s)(d(q) + ν(q)))] um
      + [(w1(s)(d(q) + ν(q)) + d(q)w2(s))/(ℓ(s)(d(q) + ν(q)))] u1
      + [(w3(s)(d(q) + ν(q)) + ν(q)w2(s))/(ℓ(s)(d(q) + ν(q)))] u3,   (C.9)
which is a well-defined convex combination implementable by a sequence of length at most (d(q) +
ν(q))ℓ(s) (which is less than nℓ(s)). This completes the proof of part 1.
2. The proof follows similar steps to part 1. Case (a) has already been addressed and only Case
(b1) applies. Case (b1) described a procedure that delivers a Pareto-improving subsequence (by
abandoning some action profiles). However, the resulting subsequence might also be Pareto improved
in a similar fashion. Therefore, this procedure can be used iteratively by eliminating profiles in R(s),
and it produces one Pareto-improving subsequence after another without increasing complexity along
the way. Since A is finite, there must be a point at which the sequence under consideration, call it s,
satisfies R(s) = {u1 , u2 , u3 }, i.e., only three payoff profiles appear in s. From here, the next argument
shows that a strictly simpler Pareto-improving subsequence can be found. Let Ai ⊂ A be the set
of action profiles that appear infinitely often in s such that u(ai) = ui for all ai ∈ Ai. Define
∆ = Co({u1, u2, u3}). If |N(∆)| = 1, then there is ui ∈ {u1, u2, u3} such that ui ≫ uj for j ≠ i.
Thus, the constant sequence s′′ in which players always play ai ∈ Ai satisfies π(s′′) ≫ π(s) and it
is a subsequence of s. Given s is strongly cyclic and |R(s)| = 3, s′′ must be strictly simpler than
s, because ℓ(s′′) = 1. Now suppose |N(∆)| = 2. It follows from the definition of F1∗ that u2 ≫ u1
and u3 ≫ u1. We construct a subsequence s′′ of s and then show that it is Pareto-improving and
strictly simpler than s. Let Wi(s) = Σ_{a∈Ai} wa(s) be the weight assigned to payoff ui in s, where
ℓ(s) = W1(s) + W2(s) + W3(s) is the length of the cycle of s. Construct s′′ from s as follows:
1. Pick a∗ ∈ A2 and a∗∗ ∈ A3; a∗ and a∗∗ are the only profiles in A2 ∪ A3 that are played in s′′.
2. No profile in A1 is played in s′′.
3. a∗ is played with frequency [W2(s) + (W2(s)/(W2(s) + W3(s))) W1(s)] / ℓ(s).
4. a∗∗ is played with frequency [W3(s) + (W3(s)/(W2(s) + W3(s))) W1(s)] / ℓ(s).
In words, s′′ re-allocates all of W2(s) to a∗, all of W3(s) to a∗∗, and all of W1(s) to a∗ and a∗∗.
Since u2 ≫ u1 and u3 ≫ u1, s′′ is necessarily Pareto-improving. Simple calculations show that a∗ is
played with frequency W2(s)/(W2(s) + W3(s)) and a∗∗ with frequency W3(s)/(W2(s) + W3(s)). Hence,
ℓ(s′′) = W2(s) + W3(s) < ℓ(s).
Q.E.D.
C.8. Uniqueness Result
This section defines a class of common interest games in which our solution payoffs form a singleton
set. Let u∗i = max_{a∈A} ui(a) for i = 1, 2. A two-player game is a common interest game if ui(a′) = u∗i
for some player i implies that uj (a′ ) = u∗j for player j.
Let u(G) = (u1 (G), u2 (G)) be the profile of maxmin payoffs in game G.
Proposition 6   In game G ∈ G, if there is π∗ ∈ Π(G) such that (i) π∗ ≫ u(G) ≥ π for all π ∈
Π(G)\{π∗} and u(G) ∉ Π(G), or (ii) π∗ = u(G) ≫ π for all π ∈ Π(G)\{π∗}, then π(S st (G)) = {π∗}.
Proof: Suppose first that there exists π∗ ∈ Π(G) such that π∗ ≫ u(G) ≥ π for all π ∈ Π(G)\{π∗}
and u(G) ∉ Π(G). By Theorem 2, π(S st (G)) ⊂ (∪_{g∈L(G)} P(g)) ∩ ΠIS(G). Since u(G) ∉ Π(G) and
π∗ ≫ π for all π ∈ Π(G)\{π∗}, the only segments g ∈ L(G) such that g ∩ ΠIS(G) ≠ ∅ must be such
that π∗ ∈ g. For all segments g with π∗ ∈ g, we have P(g) = {π∗}. Therefore, π(S st (G)) ⊂ {π∗}.
Moreover, note that the sequence that always plays profile a∗ with u(a∗) = π∗ is in S st (G). Since
S st (G) is nonempty, π(S st (G)) = {π∗}. Suppose now that π∗ = u(G) ≫ π for all π ∈ Π(G)\{π∗}.
Then, L(G) ∩ ΠIS(G) = {π∗}. The sequence in which players always play their maxmin action yields
payoffs π∗ and is in S st (G).
Q.E.D.
REFERENCES
Abreu, D., Rubinstein, A., 1988. The structure of Nash equilibrium in repeated games with finite automata.
Econometrica 56 (6), 1259–81.
Agrawal, R., Imieliński, T., Swami, A., 1993. Mining association rules between sets of items in large databases.
In: Proceedings of the 1993 ACM SIGMOD international conference on Management of data. SIGMOD
’93. ACM, New York, NY, USA, pp. 207–216.
Blonski, M., Ockenfels, P., Spagnolo, G., 2011. Equilibrium selection in the repeated prisoner’s dilemma:
Axiomatic approach and experimental evidence. American Economic Journal: Microeconomics 3 (3), 164–
192.
Brown, G. W., 1951. Iterative solutions of games by fictitious play. In: Koopmans, T. C. (Ed.), Activity
Analysis of Production and Allocation. John Wiley and Sons, Chichester, pp. 374–376.
Buchanan, J. M., 1977. Freedom in Constitutional Contract. Texas A&M University Press.
Camerer, C., Ho, T.-H., 1999. Experience-weighted attraction learning in normal form games. Econometrica
67 (4), 827–874.
Eirinaki, M., Vazirgiannis, M., 2003. Web mining for web personalization. ACM Transactions on Internet
Technology 3, 1–27.
Er’ev, I., Roth, A. E., 1998. Predicting how people play games: Reinforcement learning in experimental
games with unique, mixed strategy equilibria. American Economic Review 88 (4), 848–81.
Fischbacher, U., 2007. z-Tree: Zurich toolbox for ready-made economic experiments. Experimental Economics
10 (2), 171–178.
Foster, D. P., Young, H. P., 2003. Learning, hypothesis testing, and Nash equilibrium. Games and Economic
Behavior 45 (1), 73–96.
Fudenberg, D., Levine, D. K., 1998. The Theory of Learning in Games. MIT Press, Cambridge, Massachusetts.
Han, J., Cheng, H., Xing, D., Yan, X., 2007. Frequent pattern mining: current status and future directions.
Data Mining and Knowledge Discovery 15, 55–86.
Hart, S., Mas-Colell, A., 2000. A simple adaptive procedure leading to correlated equilibrium. Econometrica
68 (5), 1127–1150.
Ho, T. H., Camerer, C. F., Chong, J.-K., 2007. Self-tuning experience weighted attraction learning in games.
Journal of Economic Theory 133 (1), 177–198.
Karandikar, R., Mookherjee, D., Ray, D., Vega-Redondo, F., 1998. Evolving aspirations and cooperation.
Journal of Economic Theory 80 (2), 292–331.
Marinacci, M., 1998. An axiomatic approach to complete patience and time invariance. Journal of Economic
Theory 83 (1), 105–144.
Mathevet, L., Romero, J., 2012. Predictive repeated game theory: Measures and experiments.
Milgrom, P., Roberts, J., 1990. Rationalizability, learning, and equilibrium in games with strategic complementarities. Econometrica 58 (6), 1255–1277.
Milgrom, P., Roberts, J., 1991. Adaptive and sophisticated learning in normal form games. Games and
Economic Behavior 3 (1), 82–100.
Moore, J., 2011. Behaviorism. The Psychological Record 61 (3), 449–464.
Nachbar, J. H., 2005. Beliefs in repeated games. Econometrica 73 (2), 459–480.
Roth, A., Murnighan, J. K., 1978. Equilibrium behavior and repeated play of the prisoners' dilemma. Journal
of Mathematical Psychology 17 (2), 189–198.
Rubinstein, A., 1986. Finite automata play the repeated prisoner’s dilemma. Journal of Economic Theory
39 (1), 83–96.
Sonsino, D., 1997. Learning to learn, pattern recognition, and Nash equilibrium. Games and Economic Behavior 18 (2), 286–331.
Thomson, W., 2001. On the axiomatic method and its recent applications to game theory and resource
allocation. Social Choice and Welfare 18 (2), 327–386.