AN AXIOMATIC APPROACH TO BOUNDED RATIONALITY IN REPEATED INTERACTIONS: THEORY AND EXPERIMENTS
Laurent Mathevet∗
This paper proposes an axiomatic approach to study repeated interactions between
two boundedly rational players. Several axioms are presented: individual security; (strong
and weak) collective intelligence; and internal stability. We characterize the axioms’ solutions and payoffs, classify stage games by the degree of sophistication required to satisfy
strong collective intelligence, and present experimental evidence broadly consistent with
the axioms’ implications. Of particular interest are the dynamic predictions in Buchanan
(1977)’s Samaritan’s Dilemma: our main solution predicts that players will reach efficient
agreements in which payoffs depend on the Samaritan’s ability to build a reputation via
punishments; yet these punishments vanish over time, as if the reputation became common knowledge. Experiments support these predictions.
Keywords: sequences of play, axioms, pattern selection, complexity, learning, bounded
rationality, experiments.
∗Special thanks are due to Jonathan Lhost, David Pearce, Julian Romero, Philippe Solal, Dale Stahl, Max Stinchcombe, Ina Taneva, and Tom Wiseman for many discussions and suggestions. I also wish to thank Matthias Blonski, In-Koo Cho, David Frank, Guillaume Frechette, Drew Fudenberg, Herve Moulin, Mike Peters, Ariel Rubinstein, Sahotra Sarkar, Rick Szostak, and Peyton Young for their helpful comments as well as the seminar/conference participants at NYU, Purdue, Rice, UBC, UCR and the 2012 North American Summer Meeting of the Econometric Society.
Dept. of Economics, University of Texas at Austin, 1 University Station, C3100, Austin, TX 78712, U.S.A. (lmath@austin.utexas.edu)
1. INTRODUCTION
In game theory, the analyses of bounded rationality often appear as heterogeneous or fragmented. The difficulty comes from the many ways to be less than rational. As a result, if
one writes down a precise model of exactly how players are limited in rationality, then the
model must be specific to avoid becoming cumbersome. In repeated interactions, the learning literature faces this problem. On the one hand, the model can shed light on a specific
facet of bounded rationality (e.g., adaptive learning (Milgrom and Roberts (1990)), reinforcement learning (Erev and Roth (1998)), aspiration-based learning (Karandikar et al. (1998)),
pattern recognition learning (Sonsino (1997)), regret-based learning (Hart and Mas-Colell
(2000)), etc).1 On the other hand, the model can combine multiple features of bounded rationality (e.g., Camerer and Ho (1999)), but tractability quickly becomes a concern: theoretical
predictions might require computer simulations. Besides, most learning models assume that
all players follow the same learning rule. Allowing for different behaviors is important, but
tractability often prevents it. In this context, this paper departs from the standard approach
and asks the question: are there plausible principles that describe the outcomes of a repeated
interaction between two unsophisticated players?
This paper proposes an axiomatic approach to study repeated interactions between two
boundedly rational players. Our objective is to explore an abstract approach to bounded
rationality that is complementary to the descriptive method of the learning literature. In
general, the researcher can either model boundedly rational behaviors precisely, for example,
by specifying learning algorithms, or she can describe the implications or the outcomes of
boundedly rational behaviors. In this paper, we explore the latter option. In other words, our
¹In these models, the lack of sophistication can be interpreted as bounded rationality in a standard repeated game, or as a more “rational” response in a random-matching protocol. Brown (1951), for example, introduced fictitious play using a bounded rationality justification: “One might naturally expect a statistician to keep track of the opponent’s past plays and, in the absence of a more sophisticated calculation, perhaps to choose at each play the optimum pure strategy against the mixture . . . ”
objective is not to propose an alternative precise model of bounded rationality in repeated
settings, but rather to propose a plausible recipe for less than rational play. The desire
to dispense with a model calls for a model-free approach, and in the context of repeated
interactions, the infinite sequences of play are natural primitives for this exercise.
Formally, a repeated interaction is an infinite sequence of repetitions of a stage game. A
solution is a description of all the sequences of play that are likely to arise in a repeated
interaction given its stage game. In technical terms, a solution is a correspondence that maps
the set of stage games into the set of infinite sequences of action profiles. Our approach
consists in writing axioms on the infinite sequences of play and then characterizing the
solution and/or the solution payoffs.
Our first objective is to introduce an axiomatic framework for repeated interactions and to
illustrate it with specific axioms. Although our axioms are not the only reasonable ones, they
are plausible principles that boundedly rational players might satisfy in many situations. Our
second objective, after characterizing solutions, is to supply empirical evidence that supports
the axioms’ implications.
Before continuing the discussion, we illustrate our approach in examples. The main question
is: are there patterns of play that seem more plausible than others?
Stag Hunt
        S      H
 S     4,4    0,3
 H     3,0    2,2

Dinner Game
        P̄      P
 Ē     2,3    0,2
 E     3,1    1,2
Consider the stag hunt game. Two individuals go out on a hunt every day. Each can choose
to hunt a stag or a hare. Suppose the history of play becomes common knowledge after each
period and players are completely patient. To illustrate the first axiom, consider the sequence
of play ((S, H)(H, S) . . .), which is a scenario of repeated miscoordination.2 This sequence
will be eliminated by individual security. In this sequence, each player earns less on average
than what she could have secured by switching at any point in time and forever to action H.
This axiom can be interpreted as imposing some minimal amount of rationality. To illustrate
the second axiom, consider the sequence ((H, H)(S, S) . . .) in which both players alternate
forever between hunting hare and hunting stag. This sequence will be eliminated by collective
intelligence. Unsophisticated players like to do simple things: if they could, they would play
a single action profile forever. One of the main reasons for alternating between profiles is to
increase payoffs. Thus, two boundedly rational players would not play a complicated pattern,
if within that pattern, there were a simpler one that gives them both higher payoffs. In other
words, since it is costly for them to coordinate on complex intertemporal patterns, they
would not increase their coordination effort to both earn less.
Let us illustrate our third axiom in the dinner game. The row player is a child and the column player is the parent. The game is played every day before dinner. The child chooses to eat a cookie (E) or to not eat it (Ē). The parent decides to punish (P) or to not punish (P̄) the child. The child has a dominant action to eat the cookie, and the parent only prefers to punish the child if she eats the cookie. Consider the sequence ((E, P̄)(Ē, P̄) . . .), in which the child eats the cookie one time out of two and the parent never punishes the
child. This sequence will be eliminated by internal stability. The intuition is that a player
has to justify to herself why she is not trying to be greedier, but she is unable to process
all the counterfactuals; thus, she bases her reasoning on direct observation of the path. In
this sequence, the child may be unable to justify to herself why she only eats the cookie fifty
percent of the time, because she never suffers the negative consequence of eating it. Thus,
²This sequence is produced by Cournot dynamics and by many sophisticated learning dynamics (Milgrom and Roberts (1991)) when the process is initiated at (S, H).
the sequence is destabilized by the child who will eventually increase her eating frequency.
One virtue of this approach is that it allows us to build a multifaceted picture of boundedly
rational play by combining various independent principles. The multifaceted solutions that
emerge from this framework deliver insightful conclusions. Our main solution, for example,
will unearth reputation building phenomena (Section 4.2). Furthermore, this approach makes
it possible to explore the many ways to be less than rational within the same framework, by
comparing different axioms and their solutions. In the conclusion, we suggest several avenues
for future axioms and characterizations. It is also worth mentioning that the combination of
axioms need not jeopardize tractability. Our characterizations, for instance, will be easy to
compute. Finally, as a pattern-based approach, it is pragmatic in nature and experimentally
testable (Section 4).
However, the main limitation of our analysis is the lack of an answer to the question ‘how
exactly do players produce the desired patterns?’ Abstraction is the raison d’être of our
approach, and the absence of a model of players’ behavior is deliberate. Yet, if an axiomatic
solution is valid, then one can build a model that explains it. In this sense, our approach
serves a different purpose from the existing literature and it is complementary to it.
Our main theoretical results are characterizations. The first result characterizes the solution under individual security and strong collective intelligence. This result allows us to
characterize the solution payoffs, which take the form of line segments. In 2 × 2 games, these
payoffs are reminiscent of the equilibrium payoffs of Abreu and Rubinstein (1988); we compare both characterizations in examples. Subsequently, we introduce internal stability and
characterize the payoffs under this axiom; once again, they are represented by line segments.
We then combine these characterizations and describe the sequences and the payoffs that
survive all axioms. In general, our solutions offer sharp predictions. In certain games, the
payoff prediction is even unique.
The empirical results evaluate the performance of our main solution, based on the experiments by Mathevet and Romero (2012). The data consist of over 400 sequences generated by
human subjects, who played eight different stage games, including the Prisoners’ Dilemma,
for more than 100 periods on average. Based on payoffs, type I and type II errors, our solution is at least as predictive as the Experience Weighted Attraction (EWA) learning model
of Ho et al. (2007) and the reinforcement learning model of Erev and Roth (1998). The
comparison is important, because these models have been shown to be highly descriptive of
human behaviors.3 Moreover, we use two different pattern detection algorithms to extract
all sequences that exhibit a pattern. We obtain two subsets that each represent about 2/3
of the data. In each subset, the axioms hold more than 90% of the time. To give some order
of magnitude, the first algorithm considers all patterns of play whose length is at most 10, which represents more than 10⁶ patterns. In several games, less than 0.04% of those 10⁶ patterns belong to our solution, yet 90% of observed patterns fall into our solution. Of particular
interest are the predictions in the Samaritan’s Dilemma (Buchanan (1977)). Our solution
predicts that players will reach efficient agreements in which payoffs depend on the Samaritan’s ability to build a reputation via punishments; yet these punishments vanish over time, as if reputation became common knowledge. The experiments do suggest the establishment of a reputation that persists after punishments have vanished.
The importance of bounded rationality in repeated interactions has long been recognized
by the learning literature (Fudenberg and Levine (1998)), but the axiomatic method has
rarely been used in this context (see Thomson (2001) for a survey). In a recent paper,
Blonski et al. (2011) proposed an axiomatic method for selecting equilibria in the repeated
Prisoners’ Dilemma.
³These models might not be intended for repeated games. Yet, their empirical success makes them a valuable benchmark for repeated games, inasmuch as the alternatives to this type of models are scant.
Our execution of the axiomatic method leads to pattern selection. The fact that patterns of observable behavior can be the outward expression of inward motives, and thus,
can convey important information of psychological nature, is the tenet of frequent pattern
mining (Han et al. (2007)) and behaviorism (see Moore (2011) for a sketch). Applications of
pattern mining include market basket analysis to understand co-occurrences among products (Agrawal et al. (1993)) and web usage mining to analyze users’ browsing activities
(Eirinaki and Vazirgiannis (2003)). The importance of patterns has been recognized by the
learning literature (Sonsino (1997) and Foster and Young (2003)), but we take it further by
considering patterns of observed behavior as primitives of the analysis.
The paper is organized as follows. The next section introduces the framework and the basic
definitions. Section 3 provides the axioms and the characterizations. Section 4 presents the
experiments and the empirical results. Our approach raises important conceptual questions
that we discuss in Section 5. Section 6 concludes. All proofs are relegated to the appendix.
2. PRELIMINARIES
2.1. Model and Solution
In this paper, we consider repeated interactions between two players. Let G = (A1 , A2 , u1, u2 )
be a finite two-person game in normal form where Ai is player i’s finite action set; A = A1 ×A2
is the set of action profiles; and ui : A → R is i’s payoff function. Let Σi be i’s set of
mixed actions with typical element αi . The payoff functions are extended to mixed actions by taking expectations. Let G be the set of all finite two-person games. For G ∈ G,
Π(G) = {π ∈ R2 : ∃a ∈ A s.t. π = (u1 (a), u2 (a))} is the set of pure payoff profiles. Let Co
denote the convex hull operator.
A repeated interaction consists of an infinite sequence of repetitions of a stage game G at
discrete time periods t = 1, 2, . . . The stage game is common knowledge among the players.
In every period t, the players make simultaneous moves, denoted by ati ∈ Ai , that become
common knowledge. These choices can be the realizations of randomization devices.
The dynamic payoffs will emphasize long run phenomena. Let S(G) = A^∞ (or, simply S) be the set of all infinite sequences for a given stage game G. A sequence of play is written as s = (s^1 s^2 . . .) where s^t = (a^t_1, a^t_2). For s ∈ S, player i's dynamic payoff is⁴

\[
\pi_i(s) = \liminf_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} u_i(s^t).
\]

The liminf criterion, which implies complete patience (Marinacci (1998)), is a modeling choice motivated by simplicity. By using this criterion, we abstract away from patience issues, in much the same way as the standard learning literature.⁵ In practice, we can implement this in experiments by using large discount factors δ, as subjects do not make a significant difference between δ = .99 and δ → 1. Moreover, we can interpret eventual behaviors in finite horizon as those emerging late in the game and persisting (see Section 4 and Footnote 21).

⁴The liminf is the infimum of the cluster points of the sequence of average payoffs from 1 to T. Terms that appear finitely many times in a sequence do not influence the liminf.

⁵There is no reason why discounting and patience would not play a role in studies of bounded rationality, but these issues are secondary to the subject matter. For example, many learning models are agnostic about patience-related issues. In aspiration-based learning (e.g., Karandikar et al. (1998)), which concerns repeated games, players satisfice myopically. In some models of adaptive learning (Milgrom and Roberts (1990)), players can be seen as maximizing myopically. Of course, myopia can also be interpreted as discounting with δ = 0.
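To make the criterion concrete, here is a minimal sketch (ours, not from the paper; the profile and payoff encodings are illustrative) of the dynamic payoff of an eventually cyclic sequence. For such sequences, the liminf reduces to the average payoff over one cycle, since finitely many early periods do not affect the liminf.

```python
# Minimal sketch (illustrative, not from the paper): the dynamic payoff of an
# eventually cyclic sequence is the average payoff over one cycle.
def dynamic_payoff(cycle, u):
    """cycle: list of action profiles repeated forever; u: profile -> (u1, u2)."""
    n = len(cycle)
    return tuple(sum(u[a][i] for a in cycle) / n for i in (0, 1))

# Stag hunt of the introduction: alternating (H,H) and (S,S) forever.
u = {('S', 'S'): (4, 4), ('S', 'H'): (0, 3), ('H', 'S'): (3, 0), ('H', 'H'): (2, 2)}
print(dynamic_payoff([('H', 'H'), ('S', 'S')], u))   # (3.0, 3.0)
```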
A solution S is a function that assigns to each G ∈ G a subset S(G) ⊂ S. The set S(G)
should be thought of as the set of infinite sequences that we expect to arise in the repeated
interaction with stage game G.
2.2. Complexity
This paper will pay particular attention to cyclic sequences. A sequence s ∈ S is cyclic if each action profile a occurs with frequency w_a(s)/ℓ(s) in the sequence, where ℓ(s) is the length of the cycle. Formally, s has a cycle if there exists ℓ(s) ∈ N such that (i) for each a ∈ A, there is a non-negative number w_a(s) ∈ N such that

\[
\lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} \mathbf{1}\{s^t = a\} = \frac{w_a(s)}{\ell(s)},
\]

where 1 is the indicator function, (ii) ∑_a w_a(s) = ℓ(s), and (iii) there is no ℓ < ℓ(s) for which (i) and (ii) hold. In this context, a cycle is a cyclic sequence whose terms all appear infinitely often. Note that the order does not matter in a cycle: letting a, b ∈ A, (abaaba . . .) and (aabaab . . .) are two instances of the same pattern of length 3. Note also that an action profile can have zero frequency in a cyclic sequence and yet appear infinitely often. Examples of processes that generate cycles are mixed strategies, public randomization devices, regular Markov chains, etc.
A sequence s is strongly cyclic if it has a strong cycle, i.e., if there exist T and ℓ(s) such that s^t = s^{t+ℓ(s)} for all t ≥ T and there is no ℓ < ℓ(s) for which s^t = s^{t+ℓ} for all t ≥ T. In words, there is a time after which the pattern (s^t, . . . , s^{t+ℓ(s)−1}) repeats itself forever. Strong cycles are cycles. In repeated games, the use of automaton strategies produces strong cycles (Abreu and Rubinstein (1988)).
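As an illustration (our sketch, not part of the paper), the length ℓ(s) of a strong cycle can be computed from one repetition of its repeating block:

```python
# Sketch (illustrative): the minimal cycle length ell(s) of a strongly cyclic
# sequence, computed from one repetition of its repeating block.
def cycle_length(block):
    n = len(block)
    for ell in range(1, n + 1):
        # block has period ell iff it is invariant under a cyclic shift by ell
        if n % ell == 0 and block == block[ell:] + block[:ell]:
            return ell
    return n

print(cycle_length(['a', 'b', 'a', 'b']))  # 2: (abab...) has a cycle of length 2
print(cycle_length(['a', 'b', 'b']))       # 3: (abb) has no shorter period
```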
The length ℓ of a cycle is a natural albeit coarse measure of complexity. Intuitively, it
is simpler for players to play a single action profile rather than alternate between two. In
general, it seems reasonable to assume that longer cycles are harder intertemporal patterns
to generate than shorter cycles. If we consider the processes that induce cycles, then complex
cycles suggest complex processes. For example, ℓ is a coarse indicator of the complexity of a
pair of automata.6
⁶The length of a cycle gives an upper bound on the number of states that a pair of automata can have in order to generate the cycle. Automata also specify off-equilibrium behaviors that are not taken into account here.

3. THE AXIOMS

Our approach raises conceptual questions about the interpretation of the axioms, the role of observables, and “off-path” considerations. We discuss them in Section 5. Regardless of interpretation, our approach amounts to selecting cycles, whether these cycles should be understood as the limit points of a learning process or as the starting points of an “equilibrium” analysis. For now, think of our axioms as selecting cycles: if two boundedly rational players adopt some cycle s as a convention, what might this convention plausibly (not) look like?
For any π, π ′ ∈ R2 , π ≫ π ′ means π1 > π1′ and π2 > π2′ , i.e., π (strictly) Pareto dominates
π ′ . Moreover, π > π ′ means π1 ≥ π1′ and π2 ≥ π2′ and at least one inequality holds strictly,
i.e., π weakly Pareto dominates π ′ .
3.1. Individual Security
Although players are boundedly rational, their knowledge of the stage game suggests the
minimal amount of rationality described by the first axiom. Let
ui (G) = max min ui (αi , aj )
αi ∈Σi aj ∈Aj
be player i’s maxmin payoff in G. The maxmin is a player’s security level, i.e., the player
can secure at least this payoff in the stage game. A maxmin action is any mixture αi that
guarantees ui to be at least as large as i’s maxmin payoff.
Axiom 1 (Individual Security) For any G ∈ G, if s ∈ S is such that π_i(s) < u_i(G) for some player i, then s ∉ S(G).
This axiom eliminates all conventions for which there exists a player whose dynamic payoff is strictly lower than her maxmin level. In many situations, Axiom 1 is uncontroversial. If we restrict attention to sequences with convergent payoffs,⁷ then any player can switch at any time and forever to her maxmin action, and obtain a larger payoff than the sequence s described in the axiom. Indeed, the player's continuation payoffs after the switch, and hence her overall dynamic payoffs, will be weakly larger than her maxmin level.

⁷Payoff-convergent sequences are such that \(\lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} u_i(s^t)\) exists. Cyclic sequences are payoff-convergent.
As a remark, note that since the minmax is larger than the maxmin, using the former in
the axiom would lead to a more restrictive axiom.
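Computationally, the maxmin of a finite two-player game is a small linear program. The sketch below (ours, not from the paper) computes it with scipy:

```python
# Sketch (illustrative, not from the paper): computing player i's maxmin payoff
# u_i(G) = max_{alpha_i} min_{a_j} u_i(alpha_i, a_j) by linear programming.
import numpy as np
from scipy.optimize import linprog

def maxmin(U):
    """U[ai, aj] = payoff of player i. Returns (value, mixed action)."""
    n, m = U.shape
    # Variables (alpha_1, ..., alpha_n, v); maximize v (linprog minimizes).
    c = np.zeros(n + 1); c[-1] = -1.0
    # Security constraints: v - sum_ai alpha_ai * U[ai, aj] <= 0 for each aj.
    A_ub = np.hstack([-U.T, np.ones((m, 1))])
    b_ub = np.zeros(m)
    A_eq = np.hstack([np.ones((1, n)), np.zeros((1, 1))])  # probabilities sum to 1
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * n + [(None, None)])
    return res.x[-1], res.x[:-1]

# Stag hunt of the introduction: row player's payoffs.
value, alpha = maxmin(np.array([[4.0, 0.0], [3.0, 2.0]]))
print(value)   # 2.0: hunting hare (H) secures 2 regardless of the opponent
```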
3.2. Collective Intelligence
We introduce two versions of collective intelligence and then characterize the solution in
conjunction with individual security. The idea behind collective intelligence is that boundedly
rational players would not play a complicated pattern, if within that pattern, there were a
simpler one that gives them both higher payoffs. Since it is costly for them to coordinate on
complex intertemporal patterns, they would not increase their coordination effort to both
earn less. Or to put it in yet another way, we would expect an increase in pattern complexity
to benefit at least one player.
3.2.1. Weak Collective Intelligence
Let us start with a standard yet central definition:
Definition 1 A sequence s′ is a subsequence of s if there exists a strictly increasing function h : N → N such that s′^t = s^{h(t)} for all t.
Limit attention to strongly cyclic sequences s. Given a sequence s, say that its cycle starts from T₀(s) if T₀(s) is the smallest time T such that s^t = s^{t+ℓ(s)} for all t ≥ T. A truncation of s is a strongly cyclic subsequence ŝ of s that has a cycle of length ℓ(ŝ) < ℓ(s) and satisfies the following condition: either (i) ŝ^t = ŝ^{t+1} for all t ≥ T₀(ŝ), or (ii) ŝ^{T₀(ŝ)} = s^{T₀(s)} and for all t ≥ T₀(ŝ), ŝ^t = s^k implies ŝ^{t+1} = s^{k+1}. In words, a truncation is a pattern obtained by interrupting a cycle and repeating its last element forever, or obtained by always interrupting a cycle at the same point and re-starting it from the beginning. For example, letting a, b and c be action profiles in A, (bb . . .) and (abab . . .) are truncations of (abcabc . . .), but (bcbc . . .) is not a truncation of it.
Axiom 2 (Weak Collective Intelligence) For any G ∈ G, if a strongly cyclic sequence s ∈ S has a truncation s′ such that π(s′) ≫ π(s), then s ∉ S(G).
The notion of truncation is perhaps the most conservative requirement that can be used
to rank patterns according to complexity. If two players can conceive of a pattern, then they
should be able to conceive of its truncations. A truncation is not only simpler, but it is
also mechanically easy to obtain from a cycle, because it repeats one element or an initial
section of it. Therefore, it is natural to consider that truncations involve lower complexity
costs or coordination efforts. In this context, the second axiom expresses the fact that two
unsophisticated players would not rule out a truncation to earn less.
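The truncation test is mechanical enough to automate. Below is a minimal sketch (ours, not from the paper; segment length is used as a proxy for cycle length) that enumerates the truncations of a strong cycle and applies Axiom 2:

```python
# Sketch (hypothetical helpers, not from the paper): truncations of a strong
# cycle as defined in Section 3.2.1, and the weak collective intelligence test.
def payoff(pattern, u):
    n = len(pattern)
    return tuple(sum(u[a][i] for a in pattern) / n for i in (0, 1))

def truncations(pattern):
    consts = [(a,) for a in set(pattern)]          # interrupt and repeat one element
    inits = [tuple(pattern[:k]) for k in range(1, len(pattern))]  # restart from start
    return {t for t in consts + inits if len(t) < len(pattern)}

def survives_axiom2(pattern, u):
    p = payoff(pattern, u)
    return not any(all(q > r for q, r in zip(payoff(t, u), p))
                   for t in truncations(pattern))

# Prisoners' Dilemma of Section 3.3: ((D,C)(C,C)(D,C)(C,D)) has no
# Pareto-improving truncation (Footnote 8), so it survives Axiom 2.
u = {('D','C'): (4, 1), ('C','C'): (3, 3), ('C','D'): (1, 4), ('D','D'): (2, 2)}
print(survives_axiom2([('D','C'), ('C','C'), ('D','C'), ('C','D')], u))  # True
```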
3.2.2. Strong Collective Intelligence
This section extends the notion of collective intelligence in several directions. The weak
axiom is limited to strong cycles, but a general notion of collective intelligence should apply
to all cycles, especially to randomizations. Furthermore, it is important to use a weaker
criterion than truncations to capture complexity. This will lead to sharper solutions and
permit a looser interpretation of bounded rationality.
Axiom 3 (Strong Collective Intelligence) For any G ∈ G, if s ∈ S has a subsequence s′ such that π(s′) ≫ π(s), then s ∉ S(G).
The axiom rules out any convention containing another convention that strictly benefits
both players, for they would play it instead and the original convention would not emerge.
In the Prisoners’ Dilemma of Section 3.3, the infinite repetition of pattern ((D, C)(C, C)(D, C)(C, D)), denoted as sequence s, has no Pareto-improving truncation,⁸ and as such, it survives Axiom 2. According to Definition 1, however, the infinite repetition of ((D, C)(C, C)(C, C)) is a subsequence of s. Denote it by ŝ. Since ŝ is Pareto-improving over s, convention s is ruled out by Axiom 3. As another example, the infinite sequence of defections, ((D, D)(D, D) . . .), survives Axioms 2 and 3.
Axiom 3 has a subtle but close relationship to bounded rationality. Suppose complexity is measured by the length of a cycle (Section 2.2). In the previous example, subsequence
ŝ was not only Pareto improving, but also less complex than s.9 In general, if players can
conceive of a convention, then they should also be able to conceive of the other conventions
that are less complex than it and that use the same profiles, such as the truncations. Since
it is costly for boundedly rational players to coordinate on complex intertemporal patterns,
they will not adopt a more complex convention that results in lower payoffs for both. Or to
put it in another way, we would expect an increase in pattern complexity to benefit at least
one player. This argument relies on the existence, for every possible sequence s, of a Pareto
improving subsequence of lower complexity than s. This problem is non-trivial, because the
definition of a subsequence allows its pattern to be more complex than that of the sequence
from which it is extracted (see Footnote 9). We study this next.
In Section B of the appendix, we establish Proposition 5, which is the keystone of the relationship between Axiom 3 and bounded rationality. This proposition requires technicalities
that would interrupt the flow, which is why we offer a brief exposition here. This result connects families of games to the existence of a subsequence of bounded complexity. According
⁸The reader can check that (D, C), ((D, C)(C, C)), and ((D, C)(C, C)(D, C)) do not yield Pareto improvements.

⁹A subsequence could be more complex than the sequence from which it is extracted. Let a, b be action profiles in A. Sequence s = (abab . . .) has a cycle of length 2 and s′ = (abbabb . . .), constructed by removing one a out of two from s, is a subsequence of s but it has a cycle of length 3.
to Definitions 4 to 6 of Section B, we can classify all stage games into families Fn , where
n ≥ 1 is a natural number. Suppose we have done so. Proposition 5 then states: For stage
games in Fn , if a cyclic sequence s admits a (strictly) Pareto-improving subsequence, then it
also admits one whose cycle is at most n times the length of the cycle of s.
This result has important implications. For games in F1 , which includes Battle of the
Sexes, Stag Hunt, Chicken, and many others, every sequence discarded by Axiom 3 contains
a simpler Pareto-improving subsequence.10 Since simpler is better for both players, simplicity
would probably prevail among boundedly rational players. For games in Fn with n ≥ 2, which
includes the Prisoners’ Dilemma, a convention s may admit Pareto-improving subsequences
that are all more complex than s, and thus convention s can only be eliminated by players
who, despite their limitations, can consider cycles of greater complexity. The proposition
provides a bound on the necessary additional complexity. In Fn , in order to eliminate any
convention s that violates Axiom 3, the players need the ability to implement cycles that
are at most n times the length of the cycle of s.
Strong collective intelligence is not equal to Pareto efficiency, as it allows for inefficiencies.
In the Prisoners’ Dilemma, for instance, the sequence of infinite defections, ((D, D)(D, D) . . .),
survives the axiom. This axiom imposes no restrictions on constant sequences; its only purpose is to select among cycles containing at least two profiles. Even then, it permits inefficiencies, such as ((D, C)(C, D) . . .) in the Prisoners’ Dilemma. Ultimately, our endeavor is
to select cycles that express a form of equilibrium or stability. Nothing precludes chaos or
the presence of punishments on the way to equilibrium, but in equilibrium, if punishments
complexify a convention while harming both players, the plausibility of this agreement may
be questioned. This implication echoes Rubinstein (1986), in which the equilibrium concept
¹⁰By “simpler,” we mean weakly simpler: given a sequence whose cycle has length ℓ, the simplest Pareto-improving subsequence may also have a cycle of length ℓ. In Section B, we define a family F₁* ⊂ F₁ for which all sequences that violate Axiom 3 admit a strictly simpler subsequence.
rules out automata whose states are not all used infinitely often, because such states — that
may be used for punishments at the beginning — eventually become useless and complexify
the machine without contributing to play. In summary, our axiom excludes those conventions
in which mutually harmful behaviors have significant weight and co-exist indefinitely with
beneficial outcomes:
Proposition 1 If a cyclic sequence s has no subsequence s′ such that π(s′) ≫ π(s), then the frequency of play of all a ∈ A such that u(a) ≪ π(s) is w_a(s)/ℓ(s) = 0.
As we will see in Section 4.2, certain profiles, such as punishments, can have zero frequency
and yet play a crucial role in the solution.
3.3. Intermediate Characterizations
We now characterize the set of sequences and payoffs that survive individual security and
collective intelligence. The results are illustrated in examples.
In preparation for the results, we define several terms. The set of individually secure payoffs is given by

\[
\Pi^{IS}(G) = \{\pi \in Co(\Pi(G)) : \pi_i \geq u_i(G) \;\; \forall i = 1, 2\}.
\]

For any X ⊂ R^2, let

\[
P(X) = \{\pi \in X : \nexists x \in X \text{ s.t. } x \gg \pi\}
\]

be the set of weakly Pareto efficient elements in X relative to X, and letting c stand for complement, define

\[
U(X) = \bigcap_{x \in X} \{\pi \in \mathbb{R}^2 : \pi \ll x\}^c
\]

to be the set of real vectors that are dominated by no point in X. For any s ∈ S, define

\[
R(s) = \{u \in \mathbb{R}^2 : \exists a \in A \text{ s.t. } u = (u_1(a), u_2(a)) \text{ and } (\forall T \in \mathbb{N}_+)(\exists t \geq T) \text{ s.t. } s^t = a\} \tag{3.1}
\]

to be the set of recurrent payoffs in s, i.e., the set of payoff profiles that appear infinitely many times in the sequence.¹¹
Let S^w be the solution that assigns to each game the set of all strongly cyclic sequences surviving Axioms 1 and 2. First, we characterize the payoffs associated with S^w.

Proposition 2 For all G ∈ G, π(S^w(G)) = {π ∈ R^2 : there is a strongly cyclic s ∈ S s.t. π = π(s) ∈ U(R(s))} ∩ Π^IS(G).
According to the proposition, the solution payoffs are vectors π that are produced by
sequences, none of whose recurrent terms dominate π. Otherwise, playing a recurrent term
would be a Pareto-improving truncation.
This result suggests a simple procedure to determine whether a payoff is in the solution
set. Choose a feasible payoff π and ask whether it is possible to generate it with profiles
whose payoffs do not Pareto dominate π. π is a solution payoff if and only if the answer is
yes. For example, consider the Prisoners’ Dilemma and the rightmost figure in Figure 1. Any payoff π in the white triangular area requires playing (C, C), but u(C, C) ≫ π.
Therefore, π is excluded. For the black triangles directly above and to the right of the white
triangle, π can be produced by a combination of (C, C), (D, C) and (C, D), none of which
Pareto dominates π.
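This procedure is easy to automate. The sketch below (ours, not from the paper) checks membership by a feasibility linear program: π qualifies if it is a convex combination of stage payoffs none of which strictly Pareto dominates it (individual security is checked separately, e.g., with the maxmin sketch of Section 3.1).

```python
# Sketch (illustrative, not from the paper): the procedure below Proposition 2.
import numpy as np
from scipy.optimize import linprog

def generated_without_dominating_profiles(pi, payoffs):
    """payoffs: list of pure payoff pairs u(a); pi: candidate payoff pair."""
    usable = [u for u in payoffs if not (u[0] > pi[0] and u[1] > pi[1])]
    if not usable:
        return False
    U = np.array(usable, dtype=float).T           # 2 x k matrix
    k = U.shape[1]
    # Find weights w >= 0, sum w = 1, with U w = pi (a feasibility LP).
    res = linprog(np.zeros(k),
                  A_eq=np.vstack([U, np.ones((1, k))]),
                  b_eq=np.append(np.asarray(pi, float), 1.0),
                  bounds=[(0, None)] * k)
    return res.success

pd = [(3, 3), (1, 4), (4, 1), (2, 2)]             # Prisoners' Dilemma payoffs
print(generated_without_dominating_profiles((2.9, 2.9), pd))  # False: needs (C,C)
print(generated_without_dominating_profiles((3.5, 2.0), pd))  # True: mix (4,1),(3,3)
```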
Second, we characterize the sequences that survive Axioms 1 and 3, and their payoffs. Two notions play a central role: individual security and internal efficiency.
¹¹This interpretation of R(s) is accurate, because A is finite.
Definition 2 A sequence s is individually secure if π_i(s) ≥ u_i(G) for all i.

Definition 3 A sequence s is internally efficient if π(s) ∈ P(Co(R(s))).
Internal efficiency is a sequential notion that is weaker than Pareto efficiency. For instance,
an internally efficient sequence may be inefficient, such as constant defection in the Prisoners’
Dilemma. Pareto efficiency requires dynamic payoffs to lie on the overall Pareto frontier,
i.e., π(s) ∈ P(Co(Π(G))), whereas internal efficiency only requires a sequence to generate
payoffs that lie on the Pareto frontier of (the convex hull of) its own set of recurrent payoffs.
Internal efficiency takes players’ abilities into account: when players are contemplating some
arrangement, they should be aware of the profiles that compose it, and thus, they may be
aware of the internal opportunities for improvement. Pareto efficiency, however, requires
players to take whatever action necessary to reach the overall Pareto frontier.
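Internal efficiency is also straightforward to check numerically. In the sketch below (ours, not from the paper), a sequence's payoff is internally efficient iff no convex combination of its recurrent payoffs strictly dominates it:

```python
# Sketch (illustrative, not from the paper): internal efficiency test by LP.
import numpy as np
from scipy.optimize import linprog

def internally_efficient(pi, recurrent_payoffs):
    U = np.array(recurrent_payoffs, dtype=float).T    # 2 x k
    k = U.shape[1]
    # Variables (w, eps): maximize eps subject to U w >= pi + eps, sum w = 1.
    c = np.zeros(k + 1); c[-1] = -1.0
    A_ub = np.hstack([-U, np.ones((2, 1))])           # pi_i + eps - (Uw)_i <= 0
    b_ub = -np.asarray(pi, dtype=float)
    A_eq = np.hstack([np.ones((1, k)), np.zeros((1, 1))])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * k + [(None, None)])
    return res.x[-1] <= 1e-9                          # no strictly dominating mix

# Constant defection in the Prisoners' Dilemma: R(s) = {(2,2)}, pi = (2,2).
print(internally_efficient((2, 2), [(2, 2)]))                # True
# Alternating (C,C)/(D,D): pi = (2.5, 2.5) is dominated by (3,3) in R(s).
print(internally_efficient((2.5, 2.5), [(3, 3), (2, 2)]))    # False
```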
Theorem 1 A solution S satisfies Axioms 1 and 3, if and only if, for all G ∈ G, S(G) is a subset of all internally efficient and individually secure sequences.
Theorem 1 is intuitive. If a sequence survives Axiom 3, then it cannot offer Pareto improvements within its recurrent payoffs, for this would amount to building a Pareto improving
subsequence. Likewise, if a sequence is internally efficient, then it cannot be Pareto improved by any of its subsequences, and thus, no convex combination of its recurrent terms
can dominate it.
A consequence of Theorem 1 is the special structure of solution payoffs. Let L(G) be the
set of all pure payoffs and all line segments connecting two pure payoffs in game G. That
is, L(G) is the set of all g = Co({uk , uh }), where uk and uh are in Π(G) but not necessarily
different.
Let S st be the solution that associates to each game the set of all sequences surviving
Axioms 1 and 3.
Theorem 2 For all G ∈ G, π(S^st(G)) = (∪_{g∈L(G)} P(g)) ∩ Π^IS(G).
The solution payoffs must be individually secure and either (i) lie on line segments connecting two (Pareto unranked) pure payoffs or (ii) be pure payoffs. This representation as
a union of segments dramatically restricts the possible payoffs. In certain common interest
games, it implies uniqueness (see Section C.8 in the appendix). These segments, however,
do not imply that players only play two action profiles, but rather that only two profiles
appear with non-vanishing frequency in the sequence. This point is crucial, because some
profiles may be vanishing and yet serve an important purpose, such as building a reputation
(Section 4.2).
Figure 1 illustrates our solution payoffs in examples and includes the Nash equilibrium
payoffs from Abreu and Rubinstein (1988)’s machine game.
3.4. Internal Stability
In this section, we present the axiom of internal stability and characterize the dynamic
payoffs that survive it.
As the game unfolds, each player gathers information about her opponent. Information
processing is a difficult task, in particular for a boundedly rational player trying to interpret
the actions of another player. The idea behind internal stability is to select conventions
according to whether each player can justify to herself why the surplus is divided the way it
is in these conventions. But this justification is required to come from what a player observes
directly, which emphasizes her naiveté or her inability to process certain counterfactuals.
[Figure 1 appears here; its three payoff diagrams per game are omitted. The four stage games are:]

Prisoners' Dilemma          Battle of the Sexes
        C      D                    A      B
 C     3,3    1,4            A     4,2    1,1
 D     4,1    2,2            B     1,1    2,4

Stag Hunt                   Chicken
        S      H                    A      B
 S     3,3    0,2            A     3,3    1,4
 H     2,0    1,1            B     4,1    0,0

Figure 1.— Solution Payoffs under Individual Security and Collective Intelligence (indicated by thick dots, thick solid lines, and black areas. Dotted lines delineate the individually secure payoffs. Left: Abreu and Rubinstein (1988)’s equilibrium payoffs; middle: π(S^st(·)); right: π(S^w(·))).
3.4.1. The Axiom
Fix a sequence s ∈ S. The set of recurrent action profiles in s is defined by

\[
R^*(s) = \{a \in A : (\forall T \in \mathbb{N}_+)(\exists t \geq T) \text{ s.t. } s^t = a\}. \tag{3.2}
\]

Let A_i(s) = {a_i ∈ A_i : ∃a_j ∈ A_j s.t. a ∈ R^*(s)} denote player i's recurrent actions. For any a_i ∈ A_i(s), let A_j(a_i, s) = {a_j ∈ A_j : (a_i, a_j) ∈ R^*(s)} denote the actions of player j that form recurrent profiles with a_i. The observed minmax, defined for i ≠ j as¹²

\[
\pi_{i,j}(s) = \min_{a_i \in A_i(s)} \; \max_{a_j \in A_j(a_i,s)} u_j(a), \tag{3.3}
\]

is the smallest payoff that i shows the ability to impose on j repeatedly, taking j's reaction into account. The observed minmax uses the standard definition of minmax, except that min and max are taken over actual observations. For any s and a_i, let b_j(a_i, s) = argmax_{a_j ∈ A_j(a_i,s)} u_j(a) be j's restricted best-response to a_i. Let A_i^{mm}(s) = {a_i ∈ A_i(s) : u_j(a) = π_{i,j}(s) for all a_j ∈ b_j(a_i, s)} be the set of (observed) minmax actions for i. Now let

\[
\pi_{i,i}(s) = \max_{a_i \in A_i^{mm}(s)} \; \min_{a_j \in b_j(a_i,s)} u_i(a) \tag{3.4}
\]

be the utility level that i can secure when minmaxing j in the sequence.

Interpersonal comparisons of observed minmaxes and minmaxing utilities are only meaningful when players' payoffs are normalized on the same scale. In this regard, π_{i,j}(s)/π_j(s) represents the worst damage caused by i to j as a percentage of j's payoff, and π_{i,i}(s)/π_i(s) is the percentage of her dynamic payoff that i retains while minmaxing her opponent.

¹²This definition does not allow for mixing over A_i(s), because many mixtures are incompatible with the observed frequencies of play; so mixing would present i with punishment abilities she may not be aware of.
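Definitions (3.2)–(3.4) can be computed directly from the recurrent profiles. The following sketch (ours, not from the paper; the dictionary encoding of R^*(s) is illustrative) reproduces the numbers used in the examples below:

```python
# Sketch (ours, not from the paper): the observed minmax (3.3) and the
# minmaxing utility (3.4), computed from the recurrent profiles R*(s).
# u maps each recurrent profile (a1, a2) to its payoff pair (u1, u2).
def observed_minmax(u, i):
    j = 1 - i
    Ai = {a[i] for a in u}                               # i's recurrent actions
    Aj = lambda ai: [a for a in u if a[i] == ai]         # profiles pairing with ai
    # (3.3): the worst payoff i is observed imposing on j, given j's reaction.
    pi_ij = min(max(u[a][j] for a in Aj(ai)) for ai in Ai)
    def best_resp(ai):                                   # j's restricted best response
        m = max(u[a][j] for a in Aj(ai))
        return [a for a in Aj(ai) if u[a][j] == m]
    mm = [ai for ai in Ai if all(u[a][j] == pi_ij for a in best_resp(ai))]
    # (3.4): the payoff i secures while minmaxing j.
    pi_ii = max(min(u[a][i] for a in best_resp(ai)) for ai in mm)
    return pi_ij, pi_ii

# Repetition of ((D,C)(C,C)) in the Prisoners' Dilemma of Section 3.3:
u = {('D', 'C'): (4, 1), ('C', 'C'): (3, 3)}
print(observed_minmax(u, 0))   # (1, 4): player 1 can push 2 down to 1, at no cost
print(observed_minmax(u, 1))   # (4, 1): player 2 never harms player 1
```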
Axiom 4 (Internal Stability) For any G ∈ G, if s ∈ S is such that

\[
\left( \frac{\pi_{i,j}(s)}{\pi_j(s)}, \frac{\pi_{j,j}(s)}{\pi_j(s)} \right) < \left( \frac{\pi_{j,i}(s)}{\pi_i(s)}, \frac{\pi_{i,i}(s)}{\pi_i(s)} \right) \tag{3.5}
\]

and π_i(s) < max_{a∈R^*(s)} u_i(a) for some player i, then s ∉ S(G).¹³

¹³Use the conventions: 0/0 = 1, x/0 = ∞ for all x > 0, and x/0 > y/0 iff x > y.

If player i observes that she can impose on j a smaller relative payoff than j can impose on her (first dimension in (3.5)) while incurring a smaller relative loss (second dimension in (3.5)), and if there is a recurrent profile a that player i likes more than s, then i might think that she should be able to enforce a more frequently, and thus, she will destabilize convention s by playing a_i more often.

In the Prisoners’ Dilemma of Section 3.3, the infinite repetition of pattern ((D, C)(C, C)), denoted as s, is discarded by Axiom 4. In this sequence, player 1 alternates between D and C, but her opponent never defects. Therefore, it is tempting for player 1 to play D more often in order to try increasing her payoff. In the language of the axiom, π_1(s) = 7/2 < u_1(D, C) = 4 and (3.5) is given by (1/2, 1/2) < (8/7, 8/7). That is, player 1 shows that she can cause player 2 to lose 1/2 of her dynamic payoff at no cost, whereas 2 never causes any harm. In the dinner game of the Introduction, the infinite repetition of ((E, P̄)(Ē, P̄)), denoted as ŝ, is also ruled out by Axiom 4. The child may think that she can increase her eating frequency since she never gets punished. In formal terms, π_1(ŝ) = 5/2 < u_1(E, P̄) = 3 and (3.5) is given by (1/2, 1/2) < (6/5, 6/5).

Internal stability assumes a strong form of bounded rationality. Unless a player observes once in a while that her opponent can cause more damage than she can, or at a smaller cost, she cannot justify to herself why she is not trying to be greedier, and thus, she upsets the convention. This emphasizes her finite memory, as punishments are eventually forgotten if
not renewed, as well as a form of naiveté or myopia. The repetition of pattern ((D, C)(C, C))
in the Prisoners’ Dilemma is a good example. Since player 1 never witnesses a defection from
player 2, she does not foresee that playing D more often is likely to prompt 2 to defect.
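For concreteness, here is a sketch of the test in (3.5) (ours, not from the paper; it reuses observed_minmax from the earlier sketch, reads the vector inequality as strict in both coordinates, and ignores the zero-denominator conventions of Footnote 13):

```python
# Sketch (ours): the destabilization test (3.5), reusing observed_minmax.
def violates_internal_stability(u, pi):
    for i in (0, 1):
        j = 1 - i
        pi_ij, pi_ii = observed_minmax(u, i)
        pi_ji, pi_jj = observed_minmax(u, j)
        harsher = pi_ij / pi[j] < pi_ji / pi[i]      # i punishes relatively harder
        cheaper = pi_jj / pi[j] < pi_ii / pi[i]      # ...at a smaller relative loss
        greedy = pi[i] < max(u[a][i] for a in u)     # and i wants more
        if harsher and cheaper and greedy:
            return True
    return False

u = {('D', 'C'): (4, 1), ('C', 'C'): (3, 3)}         # ((D,C)(C,C)) repeated
print(violates_internal_stability(u, (3.5, 2.0)))    # True: player 1 destabilizes
```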
More generally, this axiom divides the surplus relative to punishment and cost values
calibrated from what happens on the path. This has several implications. First, it imposes
no restrictions on constant sequences: all cycles of length 1 survive the axiom, as players have
no contradictory evidence. Second, any convention in which a player experiences occasional
benefits, while showing a greater ability to punish and at a lower cost, is unstable. Third, in
a symmetric stage game, if all action profiles appear infinitely often, then the axiom implies
that players will eventually settle on a convention in which they receive equal payoffs.
Internal stability is perhaps our most controversial axiom, yet its content is complementary to the earlier axioms. The theoretical benefit is worthwhile: the characterizations will be sharp and sensible. Importantly, this axiom is a source of interesting phenomena, such as reputation building, which are evidenced by experiments (Section 4.2).
3.4.2. Intermediate Characterization
Let S^is be the solution that assigns to each game the set of all sequences surviving Axiom 4. In this section, we characterize the payoffs associated with S^is.
Let W_s = {π_{1,2}(s)/π_{2,1}(s), π_{2,2}(s)/π_{1,1}(s)} contain the ratio of players' minmax utilities and the ratio of their utilities when minmaxing, where min W_s and max W_s represent the smaller and the larger of the two ratios. Define¹⁴

\[
L(s) = \left\{ \pi \in \mathbb{R}^2_+ : \pi_2 = \alpha \pi_1 \text{ where } \alpha \in (\min W_s, \max W_s) \text{ if } \frac{\pi_{1,2}(s)}{\pi_{2,1}(s)} \neq \frac{\pi_{2,2}(s)}{\pi_{1,1}(s)}, \text{ and } \alpha = \frac{\pi_{2,2}(s)}{\pi_{1,1}(s)} \text{ otherwise} \right\} \tag{3.6}
\]

to be the set of line segments starting from the origin (although the origin may be excluded) and whose slopes lie between the above ratios. Note that if two sequences s and s′ have the same recurrent profiles, R^*(s) = R^*(s′), then L(s) = L(s′). Therefore, there are as many sets L as there are subsets of action profiles.

¹⁴Recall the conventions: 0/0 = 1, x/0 = ∞ for all x > 0, and x/0 > y/0 if and only if x > y.
The next proposition characterizes the payoffs associated with internal stability. Let G′ be the family of games for which, if u_i(a) = u_i(a′) for some i, then a = a′. Let G⁺ be the family of games with nonnegative payoffs.

Proposition 3 For any G ∈ G⁺ ∩ G′, π(S^is(G)) = {π ∈ R^2 : ∃a ∈ A s.t. π = u(a) or ∃s ∈ S s.t. π = π(s) ∈ L(s)}.
This result states that the solution payoffs must lie on line segments or be pure payoff
profiles. This is reminiscent of the segmental structure described by Theorem 2, except that
the segments from Proposition 3 have positive slopes and start from the origin.
In Figure 2, the proposition is applied to the Prisoners’ Dilemma and Battle of the Sexes.
The dashed lines represent the payoffs compatible with internal stability. In the Prisoners’
Dilemma, there are three relevant sets R∗ (·): (i) those that contain (D, D), (ii) those that
do not contain (D, D) but contain (D, C) and (C, D), (iii) the others. In cases (i) and (ii),
the observed minmax and the utility when minmaxing are the same for each player, and
thus α = 1. This corresponds to the middle dashed line in Figure 2. In case (iii), if R∗ (·) =
{(C, C)}, then α = 1; otherwise, α = 0 or α = ∞ (for instance, if R∗ (·) = {(C, C), (C, D)},
then α = ∞, which corresponds to the vertical dashed line in Figure 2).
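Since L(s) depends only on R^*(s), its slope set can be computed mechanically. The sketch below (ours, not from the paper; it reuses observed_minmax from Section 3.4.1's sketch) anticipates the Samaritan's Dilemma computation of Section 4.2:

```python
# Sketch (ours, reusing observed_minmax): the slope set of L(s) in (3.6),
# which depends only on the recurrent profiles.
def slope_set(u):
    pi_12, pi_11 = observed_minmax(u, 0)
    pi_21, pi_22 = observed_minmax(u, 1)
    w1, w2 = pi_12 / pi_21, pi_22 / pi_11
    return w2 if w1 == w2 else (min(w1, w2), max(w1, w2))

# Anticipating Section 4.2: the Samaritan's Dilemma with recurrent profiles
# {(Hbar, W), (H, Wbar), (H, W)} ('bar' marks the barred actions).
u = {('Hbar', 'W'): (2, 2), ('H', 'Wbar'): (3, 5), ('H', 'W'): (5, 3)}
print(slope_set(u))   # (0.666..., 2.5): the alpha-range [2/3, 5/2] of Figure 4
```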
3.5. The Characterization Theorem
The Characterization Theorem describes the sequences, and especially the payoffs, that
survive individual security, strong collective intelligence, and internal stability. Considering that constant (individually secure) sequences pass all requirements, the axioms are mutually compatible in all games for which there exists at least one individually-secure pure-action profile. The solution is also nonempty in many other games for which this condition is violated, as in the mixed Nash game of Figure 3.

[Figure 2 appears here; its payoff diagrams are omitted.]
Figure 2.— Solution Payoffs under Individual Security, Strong Collective Intelligence, and Internal Stability (Solution payoffs are represented by thick dots in the second and the fourth figure. The first and the third figure represent the payoffs associated with Axioms 1 and 3 (solid thick lines) and Axiom 4 (dashed lines)).
Theorem 3 If S satisfies Axioms 1, 3, and 4, then for any G ∈ G⁺ ∩ G′, S(G) consists of individually secure and internally efficient sequences, and for all π ∈ π(S(G)), either π = u(a) for a ∈ A or there is (an individually secure and internally efficient) s such that π = π(s) ∈ L(s).
This theorem is a direct product of earlier results. Figure 2 and Section A.1 of the appendix
show the conclusions of Theorem A.1 in eight different games. Section 4.2 offers a detailed
application.
4. THE EXPERIMENTS
Experiments were run by Mathevet and Romero (2012) at the Vernon Smith Experimental
Economics Laboratory at Purdue University. The subjects were drawn from a pool of undergraduate students. All sessions were computerized, using an interface programmed with
z-Tree (Fischbacher (2007)). Subjects were randomly assigned to a computer and given a
handout with instructions. Instructions were read aloud. Each subject completed a quiz to
make sure that they understood the instructions. Each session lasted about 1 hour. During
the experiments, all payoffs were displayed in experimental Francs, and the exchange rate
to U.S. dollars was announced during the instructions. Subjects’ final payoff was the sum of
earnings of all rounds.
There were a total of 16 sessions consisting of 314 subjects. In each session, subjects
played a total of two or three supergames. Each subject was randomly paired with one
other subject and they played a 2 × 2 game repeatedly with the same partner. After each
supergame, subjects were rematched with a new partner that they had not been matched
with before, and played a game that they had not played before. The length of the supergame
was determined randomly (see Roth and Murnighan (1978)). We considered two treatments.
In the first treatment, each supergame started with 30 rounds with certainty, after which
the probability of playing an additional round was δ = 0.9. In the second treatment, the
continuation probability was δ = 0.99 from the first period (with no initial block of rounds played with certainty). The second treatment represents 2/3 of all the data points. As a result,
most sequences of play were more than 100 periods long. The reason for having these two
treatments was to observe the impact of increasing the discount factor when it is large. The
differences between the two data sets were minor, and thus, we merged them in our analysis.15
Our interpretation is that there may be a discount factor above which subjects do not make
a significant difference. Since our theoretical approach is concerned with complete patience,
which might be impossible to re-create exactly in the lab, our data set is a reasonable
approximation of this scenario.
¹⁵This operation is particularly innocuous here, because our approach only deals with the set of outcomes and not the probabilities over outcomes. As we move from one treatment to the other, there might be small changes (e.g., in the probability of cooperation vs. defection), but as long as the same outcomes appear and no new ones emerge, the evaluation of our approach is unaffected.
Prisoners' Dilemma (70 obs)      Battle of the Sexes (70 obs)
        A      B                         A      B
 A     3,3    1,4                 A     1,1    2,4
 B     4,1    2,2                 B     4,2    1,1

Stag Hunt (50 obs)               Chicken (70 obs)
        A      B                         A      B
 A     3,3    0,2                 A     3,3    1,4
 B     2,0    1,1                 B     4,1    0,0

Common Interest (18 obs)         Samaritan's Dilemma (60 obs)
        A      B                         W      W̄
 A     3,3    2,1                 H̄     2,2    2,2
 B     1,2    0,0                 H     5,3    3,5

Ultimatum Game (60 obs)          Mixed Game (36 obs)
        A      B                         A      B
 A     2,2    2,2                 A     4,1    1,2
 B     3,1    0,0                 B     1,2    2,1

Figure 3.— 2 × 2 Games Tested and Number of Observations
Our experiments generated 434 sequences of play. A total of eight 2 × 2 games were tested
(see Figure 3). In each round, subjects were asked to make a choice between two actions as
well as a guess about their partner’s next move.16 Correct guesses were randomly rewarded
by monetary bonuses.17
4.1. General Results
This section evaluates the empirical performance of our main solution (Axioms 1, 3, and
4). We run the analysis based on the last 30 periods of interaction between subjects so as to
capture long run behaviors. We discuss alternative specifications at the end of the section.
First, based on the subjects’ average payoffs, we show that the solution is at least as predictive as the Experience Weighted Attraction (EWA) learning model of Ho et al. (2007) and
the reinforcement learning model of Erev and Roth (1998). The comparison is important,
¹⁶For a small number of experiments, we could not obtain players' guesses.

¹⁷For each correct guess, a subject earned a raffle ticket. At the end of each supergame, one raffle ticket was randomly selected, and the corresponding subject received a bonus.
TABLE I
Empirical Performance Based on Payoffs

                        Axiomatic          Reinf.            EWA            Folk Thm
Game                  Type I  Type II  Type I  Type II  Type I  Type II  Type I  Type II
Prisoners' Dilemma      36%      0%      83%      0%      83%      0%       6%     51%
Battle of the Sexes     61%    .06%      93%    .06%      93%    .06%      30%     75%
Stag Hunt               12%      0%      12%      0%      12%      0%       2%     69%
Chicken                 46%      3%     100%    .05%     100%    .05%      16%     68%
Common Interest          6%      0%       6%      0%       6%      0%       0%     18%
Samaritan's Dilemma     56%      4%      83%      0%      83%      0%      47%     71%
Ultimatum Game          52%     .9%      53%      0%      98%      0%      38%     28%
Mixed-Nash Game        100%    .25%      94%      0%      94%      0%      69%     11%

Note. Type I is the percentage of experimental observations (average payoffs over the last 30 periods) that fall outside the prediction set of a theory. Thus, Type I measures the fraction of incorrect rejection of observations. Type II is the percentage of all feasible payoffs that did not occur in the experiments despite being predicted by a theory. Thus, Type II measures the failure to reject “false” observations. Since several theories make infinitely many predictions, we discretized the set of feasible payoffs (into squares of width/height .05 units) and counted the errors.
because these models are a valuable benchmark for human behavior in repeated settings.
Section A.1 presents the outcomes of our experiments side-by-side with the theoretical predictions. Second, we use pattern detection algorithms to filter the data, and we show that
our axioms are broadly satisfied within large subsets of the data.
Table I describes empirical performances based on payoffs. Based on Type I and Type II
errors, the axiomatic solution performs better than EWA and reinforcement learning in the
Prisoners’ Dilemma and Battle of the Sexes, while it does worse in the Mixed-Nash game.
All three approaches perform equally well in Stag Hunt and Common Interest. In the other
games, the comparison is more ambiguous. The axiomatic solution always accounts for a
larger fraction of the data (Type I) at the expense of Type II errors. In certain contexts,
such as the Samaritan’s Dilemma, this allows us to identify important phenomena that would
otherwise remain unexplained (see Section 4.2).
Table II reports the satisfaction rates of the axioms in large subsets of the data. We run
pattern detection algorithms to filter the data (The Supplement gives details).18 For each
observed sequence, the algorithms determine whether it has a pattern, and if so, they output
it. Once all patterns are extracted, it is relatively straightforward to determine which ones
survive the axioms.
The first algorithm (Algo1 ) extracts any deterministic pattern, i.e., strong cycle, of length
10 or less within the last 30 periods of interaction. This algorithm is applied to the observed
sequences of action profiles (Algo1-A) and payoff profiles (Algo1-P).19 A sequence survives
the algorithm, if and only if, Algo1 detects a pattern. Column All in Table II gives the total
number of observed sequences. Columns Nb give the number of sequences that survive an
algorithm. Quick calculation shows that about 2/3 of all sequences survive Algo1. In the
data, there are patterns of length 1 to 10, although lengths 1 and 2 are the most common.
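As an illustration of the idea (our sketch; the actual implementation is described in the Supplement), a strong cycle of bounded length can be detected in a window of play as follows:

```python
# Sketch (ours, mirroring the spirit of Algo1, not its exact implementation):
# detect a deterministic pattern of length at most 10 in a window of play.
def detect_pattern(window, max_len=10):
    for ell in range(1, max_len + 1):              # shorter patterns found first
        if all(window[t] == window[t + ell] for t in range(len(window) - ell)):
            return window[:ell]                     # the repeating block
    return None

last30 = ['AB', 'BA'] * 15                          # alternation, e.g., in Battle of the Sexes
print(detect_pattern(last30))                       # ['AB', 'BA']
```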
The second algorithm (Algo2 ) uses a different principle. In each round of experiments,
subjects were asked to make a guess about their partner’s next move. A sequence of action
profiles survives Algo2 if each player makes at most 3 errors in her last 30 guesses. The idea
is that players’ ability to predict each other’s next move captures a form of stability. Column
All ∗ gives the number of observed sequences for which we obtained players’ guesses. About
2/3 of them survive Algo2.
The Rate columns of Table II display the percentage of observed (stable) sequences that
satisfy Axioms 1, 3, and 4. The data show strong regularities: at least 90% of patterns survive
our axioms. Let us explain this result. The sequences that survive an algorithm are called
stable. To decide whether a stable sequence satisfies the axioms, we assumed that its pattern
¹⁸These algorithms were developed by Julian Romero and the present author.

¹⁹Some sequences have patterns at the payoff level but not at the action level. Such sequences may still be relevant, which is why we include them in Algo1-P.
TABLE II
Satisfaction Rates of the Axioms Among Stable Sequences

                            Algo1-A        Algo1-P             Algo2
Game                 All   Nb    Rate     Nb    Rate    All*   Nb    Rate
Prisoners' Dilemma    70   49    100%     49    100%     57    40    100%
Battle of the Sexes   70   40     88%     40     88%     58    31     93%
Stag Hunt             50   45    100%     45    100%     38    35     92%
Chicken               70   46     91%     46     91%     58    36     89%
Common Interest       18   18    100%     18    100%     18    17     94%
Samaritan's Dilemma   60   35     80%     38     74%     48    28     75%
Ultimatum Game        60   33     97%     37     97%     47    26     92%
Total/Average        398  266     94%    277     92%    324   214     91%
would have continued to be repeated forever.20,21 If a stable sequence, so extended, survives
the axioms, then we say that it satisfies them. We want to emphasize that nothing in these
algorithms favors one pattern over another, except length (shorter patterns are found first).
For example, Algo1-A considers over 10⁶ patterns of action profiles for each game. In the Prisoners’ Dilemma and Battle of the Sexes, less than 0.04% of those 10⁶ patterns belong to our solution,²² yet 100% and 88% of the observed patterns fall into our solution. In Stag Hunt, only 2 of those 10⁶ patterns are predicted by the solution, yet 100% of the observed patterns are among these 2.
The analysis relies on the last 30 periods, but our conclusions are robust to various specifications, as shown in Section A.2 of the appendix. On the one hand, looking at longer continuations typically increases the “richness” of the data and hence type I errors. On the other hand, more data points are then filtered out by our algorithms, and the satisfaction rates of the axioms increase or are not significantly affected. This is because longer continuations yield a subset of the current patterns. We illustrate these remarks in the last 50 periods of interaction (Tables III and IV in Section A.2). Furthermore, the last n periods of interaction represent very different experiences depending on the length of a sequence. Thus, we can also run the analysis based on constant experience by looking at the interaction from period T until the end. We do it for T = 30 in Tables V and IV.

²⁰For those sequences that are stable under Algo2, we assumed that the last 30 periods would have been repeated forever. See the Supplement.

²¹The assumption of infinite horizon is technical in nature and not meant to be realistic. But it compels us to translate the meaning of finite events in reality into an infinite framework. Although we do not claim that subjects would have continued playing the same pattern forever, a situation in which two subjects play a pattern for more than 100 periods may not be far removed from that of two “theoretical” players playing that same pattern forever.

²²There exist 338 distinct patterns of length less than 10 that put weight 1/2 on each of two profiles.
4.2. Reputation Building in the Samaritan’s Dilemma: Theory and Experiments
In the Samaritan’s Dilemma (Buchanan (1977)), our solution makes predictions of particular interest. In this section, we derive them in detail and present experimental evidence.
In this game, player 1 chooses to help (H) or not to help (H̄) player 2 accomplish a task.
Player 2 chooses to work (W) or not to work (W̄). The payoffs are given in Figure 3. The
dilemma is that player 1’s help is crucial to both players’ welfare, but if 1 helps, then 2
prefers not working.
4.2.1. Theoretical Predictions
By Theorem 2, the solution payoffs must be a subset of the Pareto frontier. Therefore, all the solution sequences must play (H, W), or (H, W̄), or both, almost all the time. Since constant sequences survive Axiom 4, the extremities of the Pareto frontier are solution payoffs.
The first main conclusion of our solution is that if player 1 always helps player 2, then 2 stops working. Formally, if R^*(s) = {(H, W), (H, W̄)}, then L(s) = {π ∈ R^2_+ : π2 = 5π1/3}
(see equation (3.6)). The only feasible payoff in this set is (3, 5). Intuitively, if player 1 always
helps player 2, then player 2 exploits 1’s apparent weakness. Again, this corresponds to an
extremity of the Pareto frontier.
The most important conclusion is that player 1 can obtain payoffs in the interior of the
Pareto frontier only if she builds a reputation by occasionally not helping. This conclusion
is an apparent paradox: by the previous conclusion, player 1 cannot help player 2 all the
time if she wants to get more than 3, but she must help player 2 most of the time in virtue
of Axioms 1 and 3. Before clarifying, note that when {(H̄, ·), (H, W), (H, W̄)} ⊂ R∗(s),
L(s) = {π ∈ R2+ : π2 = απ1 s.t. α ∈ [2/3, 5/2]}, which is the gray area in Figure 4.

[Figure 4 about here: two panels plotting π2 against π1 with the origin at (2, 2); the point (4.8, 3.2), labeled (H, W), is marked in both panels, and the right panel labels the reputation-building region.]

Figure 4.— Solution Payoffs in the Samaritan's Dilemma (indicated by the black dots
and the dashed line in the right figure. Note: the origin is (2, 2).)

Therefore, solution payoffs in the interior of the Pareto frontier are possible if player 1
plays H̄ infinitely often. But if player 1 does so, how can the payoffs stay on the Pareto
frontier? This is the apparent paradox. The key is that player 1 must play H̄ with vanishing
frequency. That is, player 1 can earn more than 3 only if, once in a while, she decides not
to help player 2, but not so often as to destroy efficiency. Therefore, player 1 must build
a reputation through punishments, but her punishments must vanish over time, as if her
reputation became common knowledge and persisted even though punishments became rare.
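To see the mechanics of the vanishing-frequency requirement, consider the following minimal simulation. It is only an illustration: the payoffs u(H, W) = (4.8, 3.2) and u(H, W̄) = (3, 5) are read off Figure 4, while the punishment payoff u(H̄, W̄) = (2, 2) (the origin of Figure 4) is an assumption made for this sketch, and the power-of-two punishment schedule is one arbitrary way to punish infinitely often with vanishing frequency.

    # Hypothetical payoffs: the two frontier points appear in Figure 4; the
    # punishment payoff (2, 2) is an assumption made only for this sketch.
    U = {('H', 'W'): (4.8, 3.2),
         ('H', 'Wbar'): (3.0, 5.0),
         ('Hbar', 'Wbar'): (2.0, 2.0)}

    def play(T):
        # Alternate (H, W) and (H, Wbar), but punish with Hbar at periods
        # 1, 2, 4, 8, ...: infinitely many punishments, vanishing frequency.
        seq, next_punish = [], 1
        for t in range(1, T + 1):
            if t == next_punish:
                seq.append(('Hbar', 'Wbar'))
                next_punish *= 2
            else:
                seq.append(('H', 'W') if t % 2 == 0 else ('H', 'Wbar'))
        return seq

    for T in (100, 10_000, 1_000_000):
        s = play(T)
        avg = tuple(sum(U[a][i] for a in s) / T for i in (0, 1))
        freq = sum(a[0] == 'Hbar' for a in s) / T
        print(T, avg, freq)

As T grows, the average payoffs approach (3.9, 4.1), a point in the interior of the Pareto frontier (here, the midpoint of the two extremities), while the punishment frequency goes to 0 even though the number of punishments grows without bound.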
4.2.2. Experimental Results
In the data, the most frequent observations are sequences in which players eventually play
the static Nash equilibrium, and sequences in which players eventually alternate between
(H, W) and (H, W̄) (see Section A.1). We refer to them as the static Nash and the alternation
sequences.

[Figure 5 about here: histogram of the number of punishments (3 to 45, in bins of 3) for the static Nash (not work) and the alternation (work–not work) outcomes; the vertical axis counts the number of sequences (0 to 4).]

Figure 5.— Distribution of Punishments in the Static Nash and in the Alternation Outcome.
The main experimental finding is the structure of the alternation sequences. To induce
the alternations between W and W̄, the Samaritan subjects (player 1) often established
a reputation by punishing their partner heavily early in the game. After a while, they
stopped punishing completely, but their partner kept alternating until the end, as if the
reputation persisted, which echoes our theoretical conclusion.
According to our solution, alternations require punishments to vanish but never stop. In
this form, the claim cannot be tested in a finite horizon,23 but it suggests that there should be
more punishments in the alternation sequences than in the static Nash sequences, which is
testable.
23 In Table II, we considered that if there were at least one punishment prior to player 2's alternations, then the sequence satisfied the axioms. Of course, this is the minimal requirement. In general, we could require at least N punishments, for any 1 ≤ N ≤ ∞.
Figure 5 represents the distributions of punishments for the static Nash and the alternation
outcomes.24 The alternation distribution first-order stochastically dominates the static Nash
distribution, which supports our theoretical claim. In particular, the average number of punishments in the alternation distribution is larger (16 vs. 7), a difference that is statistically
significant at the 5% level.
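Because the claim concerns entire distributions rather than means, it can also be checked directly with a first-order stochastic dominance test. A minimal sketch follows; the arrays are placeholders standing in for the punishment counts per sequence, not the experimental data.

    def fosd(x, y):
        # x first-order stochastically dominates y if the empirical CDF of x
        # lies weakly below that of y at every threshold.
        support = sorted(set(x) | set(y))
        cdf = lambda data, t: sum(v <= t for v in data) / len(data)
        return all(cdf(x, t) <= cdf(y, t) for t in support)

    # Placeholder counts (NOT the experimental data):
    alternation = [9, 12, 15, 18, 21, 24, 45]
    static_nash = [3, 3, 6, 6, 9, 12, 15]
    print(fosd(alternation, static_nash))   # True for these placeholder values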
The data suggest the existence of other phenomena, such as altruism (alternations with
no punishment) and exploitation. In the latter, subjects in the role of player 2 started by
playing W frequently, but in the absence of any punishment, they eventually stopped, and
the pair converged to the static Nash. This conforms to our solution.
5. CONCEPTUAL DISCUSSION
5.1. Equilibrium or Learning Approach?
There exist two interpretations of our approach, depending on the meaning of the infinite
sequences of play. In a “learning” interpretation, the sequences of play represent the real
time interaction between the players. In an “equilibrium” interpretation, the sequences of
play are not a real time account of players’ interaction, but they are the end result of an
unspecified process.
The learning interpretation comes naturally, but it faces important challenges. In this
interpretation, the content of the axioms holds eventually, in virtue of the liminf criterion.
But how does a player know in real time that the interaction will never satisfy some principle,
unless she observes the entire sequence? For example, how does a player know today that
if she does not change her behavior the final outcome will be individually irrational, unless
she sees the future? The next result reinforces the argument.

24 Observed sequences may have different lengths due to randomness. Nevertheless, the distributions of lengths across the static Nash and the alternation sequences are nearly identical, and so comparing the number of punishments is relevant. Moreover, the length of a sequence did not seem to affect the number of punishments in our data set, as most subjects settled on a pattern in the first 90 periods, regardless of the actual total length.
Proposition 4   For any game G ∈ G′,25 if a sequence s ∈ S satisfies Axiom 3, then
lim_{T→∞} (1/T) Σ_{t=1}^{T} u(st) exists.
Players’ interaction under Axiom 3 must be relatively stable, because average payoffs must
converge. How does this stability emerge? The learning interpretation is tempting, because
it requires the desired properties to eventually emerge from within a sequence, as if driven
by evolution, but the process by which the axioms are satisfied is unspecified.
Standard equilibrium theory prompts similar interrogations. If a Nash equilibrium of a
repeated game represents players’ interaction in real time, then how does each player know
all the future consequences of her move today? How do players know the repeated game
strategies of their opponents? And if repeated games are not played in real time, then what
is the process that leads to Nash equilibria? Nachbar (2005) shows that there is no obvious
answer.
In conclusion, we encourage an equilibrium interpretation of our approach. In this interpretation, there is no claim of convergence to a cycle, unlike what Proposition 4 nearly implies;
we take cycles as the starting point of the analysis.26 That is, we construct an equilibrium
solution within the space of cycles by selecting among them (say that a cycle belongs to our
solution if and only if it survives the axioms). As such, we do not explain the emergence
of cycles, much less guarantee it, but if cycles do emerge, then our ambition is to capture
them, and only those, into a solution. Given this objective, our experimental results (Table
II) give credit to our solution.
25 In Section 3.4.2, we defined G′ as the family of stage games for which, if ui(a) = ui(a′) for some i, then a = a′.
26 Abreu and Rubinstein (1988) take finite automata, and hence strong cycles, as the starting point of their analysis. Among the strong cycles, they determine which ones emerge in equilibrium.
5.2. Observables vs. Unobservables
Detailed descriptions of bounded rationality, as in the standard approaches, often require
assumptions on unobservable variables, while their conclusions concern observables. For example, learning models often specify how beliefs are formed or how satisfaction thresholds
evolve. But beliefs and satisfaction thresholds are not directly observable.
Our approach, however, focuses on outcomes and its axioms apply to observable data.
Since the assumptions and the conclusions live in the same space, they must be clearly
distinguishable. In our case, the axioms carry independent behavioral contents — though
behavior is not exactly specified — and their combination results in substantive conclusions.
Reputation building is an example.
Assumptions on observables require the analyst to discard sequences based solely on what is
observed to happen. Therefore, this approach abstracts away from “off-path” considerations
and emphasizes actual events and the self-sustainability of sequences. This perspective is
common in the learning literature. In most learning models, the sequences of play are self-generated from one period to the next by various rules, and "off-equilibrium" considerations
are completely absent. This simplifying feature excludes valuable information, but empirical
work shows that these approaches can still be useful.
From a methodological standpoint, starting from observables streamlines the dialogue between theory and experiments. On the one hand, given that sequences of play are malleable
yet rich primitives, the theory can efficiently produce solutions. These solutions are directly
testable, although the infinite horizon can be problematic. On the other hand, the data from
repeated interactions contain regularities to be mined. These regularities can be introduced
into this framework to study their theoretical implications. Once a valid solution is reached,
it is natural to build a model that explains it.
6. CONCLUSION
This paper has proposed an axiomatic approach to study repeated interactions between
boundedly rational players. On the one hand, there is the approach of specifying a model
of behavior (preferences, feasible actions, beliefs, etc). This approach describes and predicts
behavior. On the other hand, there is the axiomatic approach that suggests properties of
unmodeled behaviors that seem reasonable for the domain of application (e.g. equity, fairness,
stability, etc.). This paper has taken the latter approach. In doing so, we have assumed
that patterns of observed behaviors have meaning in the sense that they are the outward
expression of inward motives. If this is the case, then we can encode behavioral principles
implicitly into axioms on the sequences of play. The malleability of these sequences allows
us to combine multiple axioms and to derive solutions that are multifaceted representations
of unsophisticated behavior.
The relevance of our approach is supported by strong experimental evidence. In the games
tested, our solution performs at least as well as widely accepted learning models (Ho et al.
(2007) and Er’ev and Roth (1998)). Moreover, the patterns of play extracted from the data
overwhelmingly satisfy our axioms (Table II). These quantitative conclusions are complemented by qualitative evidence. In the Samaritan’s Dilemma, our solution makes particularly interesting predictions that point to reputation building. Experiments suggest that
reputation building indeed plays an important role in this game.
Our approach opens new prospects. There exist many ways to be less than rational, and
the possibility to explore them within the same framework is appealing. Our axioms apply within games and to all games. In this category, there exist other axioms formalizing
various principles, and hence other solutions can be derived. For example, our axioms do
not discriminate among (individually secure) constant patterns. Clearly, this leaves room for
further selection. There are also other categories of axioms. For example, we can think of
axioms that establish relationships across games. If players adopt some behavior in a game,
then they may adopt similar behaviors in related games. Thus, a solution may satisfy some
invariance or isomorphism properties. Another category consists of context-specific axioms.
These axioms are designed to hold in certain classes of games, but not necessarily in others.
For example, assuming that a solution outputs strong cycles is acceptable in many games,
but it may not be appropriate in zero-sum games.
APPENDIX A: EXPERIMENTAL RESULTS
A.1. Overview
In the experimental plots, the center of each circle is a data point (average payoffs over the last 30
periods) and larger circles represent more frequent observations.
[Experimental plots about here. For each game studied — the Prisoners' Dilemma, Battle of the Sexes, Stag Hunt, Ultimatum Game, Chicken, Mixed-Nash Game, Common Interest, and the Samaritan's Dilemma — five panels plot Payoff 2 against Payoff 1 under, respectively, the Folk Theorem, (Self-Tuning) EWA, Reinforcement, the Experimental Data, and the Axiomatic Approach.]
A.2. Robustness Analysis
TABLE III
Empirical Performance Based on Payoffs (Last 50 periods)

                        Axiomatic         Reinf.            EWA               Folk Thm
Game                    Type I  Type II   Type I  Type II   Type I  Type II   Type I  Type II
Prisoners' Dilemma        43%    .08%       85%    0%         85%    0%          4%    52%
Battle of the Sexes       59%    .06%       95%    .06%       95%    .06%       17%    75%
Stag Hunt                 17%    0%         17%    0%         17%    0%          4%    69%
Chicken                   45%    3%        100%    .05%      100%    .05%       18%    68%
Common Interest           40%    0%         40%    0%         40%    0%          0%     2%
Samaritan's Dilemma       57%    4%         83%    0%         83%    0%         55%    71%
Ultimatum Game            57%    1%         57%    0%        100%    0.1%       45%    28%
Mixed-Nash Game          100%    .25%       92%    0%         92%    0%         72%    12%

Note. The analysis of the last 50 periods excludes all sequences of length less than 50. In certain
games, such as the common interest game, this rules out many sequences, and hence small noise
in the remaining data points can have a large impact on type I errors.
TABLE IV
Satisfaction Rates of the Axioms Among Stable Sequences (Last 50 periods)

                              Algo1-A        Algo1-P               Algo2
Game                    All   Nb   Rate      Nb   Rate      All*   Nb   Rate
Prisoners' Dilemma       70   30   100%      30   100%       57    26   100%
Battle of the Sexes      70   25    92%      25    92%       58    17    94%
Stag Hunt                50   20   100%      20   100%       38    35   100%
Chicken                  70   24    96%      24    96%       58    16    94%
Common Interest          18    4   100%       5   100%       18     5   100%
Samaritan's Dilemma      60   26    81%      29    72%       48    19    79%
Ultimatum Game           60   25   100%      31   100%       47    21    95%
Total/Average           398  155    95%     164    93%      324   139    94%
TABLE V
Empirical Performance Based on Payoffs (Period 30 to end)

                        Axiomatic         Reinf.            EWA               Folk Thm
Game                    Type I  Type II   Type I  Type II   Type I  Type II   Type I  Type II
Prisoners' Dilemma        36%    .08%       77%    0%         77%    0%          6%    51%
Battle of the Sexes       70%    0%         87%    0%         87%    0%         24%    75%
Stag Hunt                 10%    0%         10%    0%         10%    0%          2%    69%
Chicken                   49%    3%        100%    .06%      100%    .06%       14%    68%
Common Interest            6%    0%          6%    0%          6%    0%          0%    18%
Samaritan's Dilemma       68%    4%         87%    0%         87%    0%         42%    71%
Ultimatum Game            67%    1%         67%    0%        100%    .1%        55%    28%
Mixed-Nash Game          100%    .25%       94%    0%         94%    0%         81%    12%
TABLE VI
Satisfaction Rates of the Axioms Among Stable Sequences (Period 30 to end)

                              Algo1-A        Algo1-P               Algo2
Game                    All   Nb   Rate      Nb   Rate      All*   Nb   Rate
Prisoners' Dilemma       47   29   100%      29   100%       34    23   100%
Battle of the Sexes      41   20    95%      20    95%       29    12   100%
Stag Hunt                23   19   100%      19   100%       11    10   100%
Chicken                  40   20   100%      20   100%       28    11   100%
Common Interest           6    6   100%       6   100%       18     5   100%
Samaritan's Dilemma      60   20    75%      21    71%       48    14    71%
Ultimatum Game           60   22   100%      27   100%       47    18   100%
Total/Average           277  136    96%     142    95%      215    93    96%

Note. To produce this table, we have ruled out all sequences of length less than 50 so as
to allow most patterns to appear at least 3 times between period 30 and the end of the
sequence (otherwise we can hardly consider it a pattern).
APPENDIX B: CLASSIFICATION OF GAMES
In this section, we classify all stage games, and indirectly all repeated interactions, into families
based on the degree of sophistication required to satisfy strong collective intelligence.
B.0.1. Families of Games
This subsection presents the technical definitions and the classification. The next subsection contains the main result.
A triangle is a convex set ∆ = Co({u1, u2, u3}) where u1, u2 and u3 are three (non-collinear)
payoffs in Π(G),27 called vertices. Let V(∆) be the set of vertices of ∆. Let N(∆) = {u ∈ V(∆) :
u ∈ P(V(∆))} be the set of vertices of ∆ that are pairwise Pareto undominated. Let T(G) be the
set of all triangles in G.
Given a game G ∈ G, the next definition classifies the areas of the feasible payoffs into sets. This
will determine the family to which the game belongs. Let Q[0, 1] be the set of rational numbers
between 0 and 1. Given a rational number q, d(q) denotes its denominator and ν(q) its numerator.
In the definition, k is a natural number.
Definition 4   A payoff π ∈ Co(Π(G)) is in Uk if for all triangles ∆ ∈ T(G) such that π ∈
∆ and |N(∆)| = 3, we can write ∆ = Co({u1, u2, u3}) such that, for some q ∈ Q[0, 1], either
(u2 − u1) + q(u2 − u3) ≫ 0 and d(q) ≤ k, or (u1 − u2) + q(u3 − u2) ≫ 0 and d(q) + ν(q) ≤ k.28 If
there is no triangle containing π such that |N(∆)| = 3, then π ∈ U1.
A feasible payoff is in Uk if every triangle that contains it, and whose vertices are pairwise Pareto
unordered, can be written such that a movement along one of its edges, followed by a fraction q of
a movement along another, leads to a Pareto improvement, where q satisfies certain properties with
respect to k. Once all feasible payoffs are placed into some Uk , the largest of these k values is the
family to which G belongs.
Definition 5   A game G ∈ G is in Fn, where n ∈ N+, if for every π ∈ Co(Π(G)), π ∈ ∪_{k=1}^{n} Uk.
Definition 6   A game G ∈ G is in F1∗ if for all ∆ ∈ T(G) and π ∈ ∆, either (i) |N(∆)| = 1 or
(ii) |N(∆)| = 2 and ∆ = Co({u1, u2, u3}) such that u2 ≫ u1 and u3 ≫ u1.
27 Three points x, y, z ∈ R2 are collinear if (y2 − x2)/(y1 − x1) = (z2 − x2)/(z1 − x1), i.e., if they are on the same line.
28 The requirement q ∈ [0, 1] is inconsequential because we can relabel the vertices of a triangle: if (u2 − u1) + 2(u2 − u3) ≫ 0, then (1/2)(u2 − u1) + (u2 − u3) ≫ 0; exchanging the labels of u1 and u3 gives the desired form.
Every stage game belongs to a family Fn and every family Fn is nonempty.
[Figure 6 about here: two panels in the (π1, π2) plane, each showing triangles with vertices u1, u2, u3 drawn inside the feasible payoff set of the Prisoners' Dilemma; the left panel illustrates the triangles placing payoffs in U1, the right panel those placing payoffs in U2.]

Figure 6.— Triangles in the Prisoners' Dilemma
In the Prisoners’ Dilemma of Section 3.3, there are four possible triangles, as shown by Figure
6. For every triangle in the left panel, |N (∆)| = 2 and thus, if these were the only triangles, the
game would be in F1 . However, triangles in the right panel satisfy |N (∆)| = 3. In the bottom
triangle, a movement along (u1 − u2 ) + (u3 − u2 ) leads to a Pareto improvement. Therefore, q = 1
and hence d(q) + ν(q) = 2. In words, players should replace two occurrences of (D, D) by one (C, D)
and one (D, C), which requires twice as much coordination. In the top triangle, a movement along
(u2 − u1 ) + (u2 − u3 ) gives a Pareto improvement. Therefore, q = 1 and hence d(q) = 1. Putting these
observations together, we obtain G ∈ F2 .
Battle of the Sexes and Chicken are in F1 . As shown by Figure 1 of Section 3.3, the set of feasible
payoffs in Battle of the Sexes has only one triangle and it satisfies |N (∆)| = 2. Thus, Battle of the
Sexes is in F1 . For Chicken, only one triangle satisfies |N (∆)| = 3, and we can label its vertices so
that (u2 − u1 ) + (u2 − u3 ) ≫ 0. Thus, Chicken is in F1 .
Among the games from Section A.1, Battle of the Sexes, Stag Hunt, the Ultimatum Game, Common
Interest, and the Samaritan’s Dilemma are in F1∗ .
In order to give further examples and to assist the reader with our classification, we provide an
online program that classifies 2 × 2 games.29
29 The program is available at webspace.utexas.edu/lm26875/www/research. The user specifies the payoffs and the program outputs Fn if n < 10,000. I thank my research assistant, Jonathan Lhost, for creating this program. For example, the reader can verify that u(c, c) = u(d, d) = (6, 1), u(c, d) = (1, 8), and u(d, c) = (2, 7) define a game in F5.
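To make the classification concrete, the following sketch implements the search behind Definitions 4 and 5 for 2 × 2 games. It is not the program of footnote 29, only a minimal reimplementation under stated assumptions: the Prisoners' Dilemma payoffs are the hypothetical normalization drawn in Figure 6, and the search over rationals q is capped at an arbitrary denominator bound.

    from fractions import Fraction
    from itertools import combinations, permutations

    def dominates(u, v):
        # u >> v: strictly larger for both players
        return u[0] > v[0] and u[1] > v[1]

    def collinear(a, b, c):
        return (b[0]-a[0])*(c[1]-a[1]) == (c[0]-a[0])*(b[1]-a[1])

    def min_k(tri, max_den=50):
        # Smallest k validating one of the two conditions of Definition 4
        # for a triangle with |N| = 3, over all labelings of its vertices.
        best = None
        for u1, u2, u3 in permutations(tri):
            for den in range(1, max_den + 1):
                for num in range(0, den + 1):
                    q = Fraction(num, den)
                    if q.denominator != den:
                        continue  # consider each reduced q exactly once
                    m1 = tuple(u2[i]-u1[i] + q*(u2[i]-u3[i]) for i in (0, 1))
                    if m1[0] > 0 and m1[1] > 0:      # requires d(q) <= k
                        best = den if best is None else min(best, den)
                    m2 = tuple(u1[i]-u2[i] + q*(u3[i]-u2[i]) for i in (0, 1))
                    if m2[0] > 0 and m2[1] > 0:      # requires d(q) + v(q) <= k
                        best = den + num if best is None else min(best, den + num)
        return best

    def family(payoffs, max_den=50):
        # The game is in F_n for the largest k needed over its triangles.
        n = 1
        for tri in combinations(set(payoffs), 3):
            if collinear(*tri):
                continue  # not a triangle
            if any(dominates(v, u) for u in tri for v in tri):
                continue  # |N| < 3: Definition 4 imposes no constraint
            k = min_k(tri, max_den)
            if k is None:
                return None  # denominator bound reached; increase max_den
            n = max(n, k)
        return n

    # Hypothetical Prisoners' Dilemma payoffs, as drawn in Figure 6:
    print(family([(2, 2), (1, 1), (0, 3), (3, 0)]))   # 2, as in the text
    # The example game of footnote 29:
    print(family([(6, 1), (6, 1), (1, 8), (2, 7)]))   # 5, as in the footnote

The triangle search mirrors the geometric reading given after Definition 4: each triangle with three pairwise-unordered vertices must admit a Pareto-improving edge movement whose rational fraction q is cheap enough.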
B.0.2. Result: Properties of Families
The complexity of a cyclic sequence s is measured by the length of its cycle, ℓ(s). Given a cyclic
sequence s, say that sequence s′′ is at most n times more complex than s if ℓ(s′′ ) ≤ nℓ(s). When
n = 1, s′′ is said to be simpler than s. When ℓ(s′′ ) < ℓ(s), s′′ is said to be strictly simpler than s.
Proposition 5
1. For any game G ∈ Fn , if a cyclic sequence s contains a subsequence s′ such that π(s′ ) ≫ π(s),
then there is a cyclic subsequence s′′ of s such that s′′ is at most n times more complex than s
and π(s′′ ) ≫ π(s).
2. For any game G ∈ F1∗ , if a strongly cyclic sequence s contains a subsequence s′ such that
π(s′ ) ≫ π(s), then there is a strongly cyclic subsequence s′′ of s that is strictly simpler than s
and such that π(s′′ ) ≫ π(s).
This result states that if a Pareto-improving subsequence exists, then there must also be one
whose complexity cannot exceed a certain limit compared to the original sequence, and this limit is
determined by the family. Next we give the intuition behind the result.
When players play a cycle, the resulting payoffs lie in some triangle, each vertex of which corresponds to a payoff of a profile from the cycle. Definition 4 describes how to substitute these profiles to
yield Pareto improvements. In the Prisoners’ Dilemma, for payoffs in the top triangle, the definition
says that replacing one occurrence of (C, D) and one of (D, C) by two occurrences of (C, C) increases
players’ payoffs. Clearly, this does not increase the length of the cycle. In the bottom triangle, the
definition says to replace two occurrences (or points of frequency) of (D, D) by one of (C, D) and
one of (D, C). In some situations, e.g. when the cycle places weight 1/2 on (D, C) and 1/4 on (C, D)
and (D, D), we must first double the cycle to be able to implement the changes.
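A minimal sketch of this substitution, using the hypothetical Prisoners' Dilemma payoffs from Figure 6, shows both the doubling step and the resulting Pareto improvement:

    from fractions import Fraction

    U = {('C', 'C'): (2, 2), ('D', 'D'): (1, 1),   # hypothetical PD payoffs
         ('C', 'D'): (0, 3), ('D', 'C'): (3, 0)}

    def avg(cycle):
        # Average payoffs of a cycle, in exact arithmetic.
        n = len(cycle)
        return tuple(sum(Fraction(U[a][i]) for a in cycle) / n for i in (0, 1))

    def substitute(cycle):
        # Replace two occurrences of (D, D) by one (C, D) and one (D, C),
        # doubling the cycle first when it holds a single (D, D).
        if cycle.count(('D', 'D')) == 1:
            cycle = cycle * 2     # doubling leaves average payoffs unchanged
        out, replaced = [], 0
        for a in cycle:
            if a == ('D', 'D') and replaced < 2:
                out.append(('C', 'D') if replaced == 0 else ('D', 'C'))
                replaced += 1
            else:
                out.append(a)
        return out

    s = [('D', 'C'), ('D', 'C'), ('C', 'D'), ('D', 'D')]  # weights 1/2, 1/4, 1/4
    print(avg(s))              # (7/4, 1)
    t = substitute(s)
    print(len(t), avg(t))      # 8 and (15/8, 9/8): Pareto improvement, doubled length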
APPENDIX C: PROOFS
Several proofs use the following observation.
Observation 1   By definition of a subsequence (Definition 1), given any sequence s, if we build
a sequence s′ by only using action profiles that appear infinitely often in s, then s′ is a subsequence
of s. As a result, each convex combination (i.e., using rational numbers as coefficients) of recurrent
payoffs can be achieved by some subsequence of s.
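As a hypothetical illustration, a rational convex combination putting weights 2/5 and 3/5 on two recurrent payoffs is realized by the length-5 cycle that plays the first payoff twice and the second three times, in the spirit of the payoff cycle (C.2) in the proof of Proposition 2 below:

    from fractions import Fraction

    def cycle_for(weights):
        # Realize rational weights w_m / l: w_1 copies of payoff 0 are
        # followed by w_2 copies of payoff 1, etc., as in (C.2).
        return [m for m, w in enumerate(weights) for _ in range(w)]

    weights = [2, 3]            # weights 2/5 and 3/5, cycle length l = 5
    payoffs = [(4, 1), (1, 3)]  # hypothetical recurrent payoffs
    cyc = cycle_for(weights)
    avg = tuple(sum(Fraction(payoffs[m][i]) for m in cyc) / len(cyc)
                for i in (0, 1))
    print(cyc, avg)             # [0, 0, 1, 1, 1] and (11/5, 11/5)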
C.1. Proposition 1
Proof: Suppose that a cyclic sequence s admits no subsequence s′ such that π(s′ ) ≫ π(s) but
there is a profile â such that wâ (s) > 0 and u(â) ≪ π(s). It follows from the definition of a cyclic
sequence that

π(s) = Σ_{a∈A} [wa(s)/ℓ(s)] u(a).
Since u(â) ≪ π(s), there must be a1 and a2, not necessarily different, such that wa1(s), wa2(s) > 0
and u(a1) − u(â) + q(u(a2) − u(â)) ≫ 0, where q is a rational number in [0, 1]. To see this, note
that if u(â) ≪ π(s), then either (i) there is a profile (played with positive frequency) that Pareto
dominates â or (ii) there are two distinct profiles (played with positive frequency) whose convex
combination Pareto dominates â. Let q = ν(q)/d(q) where ν(q) and d(q) in N are the numerator and
the denominator of q. For every w∗ > 0, let

π∗ = π(s) + [w∗/ℓ(s)] (u(a1) − u(â) + q(u(a2) − u(â))),
and note that π ∗ ≫ π(s). We demonstrate that w∗ can be chosen so that π ∗ is a convex combination
of pure payoffs. For w∗ = wâ(s)/(1 + q), we have

π∗ = Σ_{a≠a1,a2,â} [wa(s)(d(q) + ν(q))/(ℓ(s)(d(q) + ν(q)))] u(a)
     + [(wa1(s)(d(q) + ν(q)) + d(q)wâ(s))/(ℓ(s)(d(q) + ν(q)))] u(a1)
     + [(wa2(s)(d(q) + ν(q)) + ν(q)wâ(s))/(ℓ(s)(d(q) + ν(q)))] u(a2),   (C.1)
which is a well-defined convex combination. This convex combination corresponds to a sequence built
from s by appropriately re-allocating the weight on â to a1 and a2 . By Observation 1, this convex
combination is the payoff of a subsequence of s. Since π ∗ ≫ π(s), s admits a Pareto improving
subsequence, which is a contradiction.
Q.E.D.
C.2. Proposition 2
Proof: For every G ∈ G, Axiom 2 implies that π(s) ∈ U (R(s)) for any s ∈ S w (G), because s
cannot be dominated by any single element in R(s) (otherwise a constant truncation would be Pareto
improving). Individual security is immediate, and thus it is clear that π(S w (G)) ⊂ {π ∈ R2 : ∃s ∈
S such that π = π(s) and π(s) ∈ U (R(s))} ∩ ΠIS (G). Next we show the inclusion holds in the other
direction. Take any π ∈ {π ∈ R2 : ∃s ∈ S such that π = π(s) and π(s) ∈ U (R(s))} ∩ ΠIS (G). Thus,
there is a sequence s such that π = π(s) and π ∈ U (R(s)). Write the elements of R(s) as {u1 , . . . , un }
where a higher index indicates a larger payoff for player 2 (i.e., if k > h, then uk2 ≥ uh2 ). For any
ǫ > 0, there is a rational convex combination of elements of R(s), Σ_{m=1}^{n} (wm/ℓ) um with
Σ_m wm = ℓ, that is ǫ-close to π. Define sequence s′ such that (i) all its elements are in R(s) and
(ii) it has a cycle that corresponds to the following payoff cycle

u1 u1 · · · u2 u2 · · · un   (C.2)
where w1 u1 ’s are followed by w2 u2 ’s, etc. Clearly, π(s′ ) = π(s). Further, given the way the cycle
of s′ is ordered in (C.2), any truncation of it must (weakly) decrease player 2’s payoff because of
our indexing. Therefore, no truncation can strictly benefit both players. Since π(s) ∈ U (R(s)) and
R(s) = R(s′ ), s′ ∈ S w (G) and hence π = π(s′ ) ∈ π(S w (G)).
Q.E.D.
C.3. Theorem 1
Proof: Suppose that solution S satisfies the axioms.
Step 1. Internal efficiency. Take any G and any sequence s ∈ S(G). By Observation 1, for every
π ∈ Co(R(s)), there is a subsequence s′ of s such that π(s′ ) = π. By Axiom 3, there is no subsequence
s′ such that π(s′ ) ≫ π(s). Thus, there is no π ∈ Co(R(s)) such that π ≫ π(s). By definition of P(·),
this implies that π(s) ∈ P(Co(R(s))).
Step 2. Individual security. This part follows immediately.
Suppose now that S(G) is a subset of all internally efficient and individually secure sequences. Take
any G and any sequence s ∈ S(G). Then π(s) ∈ P(Co(R(s))). By Observation 1, there cannot be
a subsequence s′ of s such that π(s′ ) ≫ π(s), for otherwise there would be a convex combination
(in particular, one using rational numbers as coefficients) of elements of R(s) that Pareto dominates
π(s). Therefore, Axiom 3 holds. Axiom 1 is trivially satisfied.
Q.E.D.
C.4. Theorem 2
Proof: By definition, for any π ∈ π(S st (G)), there is a sequence s ∈ S st (G) such that π(s) = π.
Sequence s must be individually secure by Theorem 1, and hence π ∈ ΠIS (G). By the same theorem,
s must also be internally efficient, and hence π ∈ P(Co(R(s))). Recall the definition of R∗ , defined
in (3.2) (on p. 20) as the recurrent action profiles. If R∗ (s) = {a} for some profile a, then internal
efficiency immediately implies π ∈ P(g) where g = {u(a)}. If R∗ (s) = {a, a′ }, then internal efficiency
immediately implies π ∈ P(g) where g = Co({u(a), u(a′)}). Suppose now that R∗ (s) ≥ 3. Every point
in P(Co(R(s))), π in particular, must lie on the boundary of (i.e., a segment in) Co(R(s)), for if π
lied in the interior, then there would be a Pareto improvement over π and s would not be internally
efficient. Since π must lie on a segment g ⊂ Co(R(s)), it follows from internal efficiency that π ∈ P(g).
This proves that π(S st (G)) ⊂ (∪g∈L(G) P(g)) ∩ ΠIS (G). Let us show that the inclusion holds in the
other direction. Take any π ∈ (∪g∈L(G) P(g))∩ΠIS (G). We build a sequence that satisfies both axioms
and generates payoffs π. By definition, there exist u1, u2 ∈ Π(G) such that g = Co({u1, u2}) and
π ∈ P(g). If u1 and u2 are Pareto ordered, then assuming without loss of generality that u1 ≫ u2, we
have P(g) = {u1}, and hence the sequence that only plays a profile with payoff u1 = π is internally
efficient. If u1 and u2 are not Pareto ordered, then it is possible to build a sequence s such that
R(s) = {u1, u2} and whose payoff is π(s) = π. This sequence must be internally efficient, for otherwise
π would not lie in P(Co({u1, u2})). Thus, π ∈ π(S st (G)).
Q.E.D.
C.5. Proposition 3
Proof: Take an arbitrary game G in G+ ∩ G′. Choose any π ∈ π(S is (G)). Then either (i) π = u(a)
for some a ∈ A or (ii) π ≠ u(a) for all a ∈ A. Case (i) immediately satisfies the requirement. So,
suppose that π satisfies case (ii). For any sequence s′, if πi(s′) = ui(a∗) ≡ max_{a∈R∗(s′)} ui(a), then
πj(s′) = uj(a∗), because G ∈ G′.30 Let sequence s ∈ S is (G) satisfy π(s) = π. From the previous
observation, it cannot be that πi(s′) = max_{a∈R∗(s′)} ui(a) for any i, because π(s) ≠ u(a) for all a by
assumption. This implies πi(s) < max_{a∈R∗(s)} ui(a) for all i. Thus, Axiom 4 requires that (3.5) hold
for no i. Therefore, if
π2(s)/π1(s) < π1,2(s)/π2,1(s)   (C.3)

then s survives the axiom if and only if

π2(s)/π1(s) > π2,2(s)/π1,1(s).   (C.4)

The converse also holds: if (C.4) holds, then s survives the axiom if and only if (C.3) holds. To
summarize, under Axiom 4, (C.3) holds if and only if (C.4) holds. It follows from similar arguments30
that

π2(s)/π1(s) > π1,2(s)/π2,1(s)   (C.5)

holds if and only if

π2(s)/π1(s) < π2,2(s)/π1,1(s)   (C.6)

holds. As a corollary, π2(s)/π1(s) = π1,2(s)/π2,1(s) if and only if π2(s)/π1(s) = π2,2(s)/π1,1(s).

(a) Suppose π2,2(s)/π1,1(s) > π1,2(s)/π2,1(s). Consider the following three cases: π2(s)/π1(s)
can be above, below, or in between both ratios. That is,

π2(s)/π1(s) ≥ π2,2(s)/π1,1(s), or π2,2(s)/π1,1(s) > π2(s)/π1(s) > π1,2(s)/π2,1(s), or π1,2(s)/π2,1(s) ≥ π2(s)/π1(s).

Given our previous observations, only π2,2(s)/π1,1(s) > π2(s)/π1(s) > π1,2(s)/π2,1(s) is compatible with Axiom 4.

(b) Suppose π1,2(s)/π2,1(s) > π2,2(s)/π1,1(s). A similar reasoning to Case (a) establishes that only
π1,2(s)/π2,1(s) > π2(s)/π1(s) > π2,2(s)/π1,1(s) is compatible with Axiom 4.

(c) Suppose π1,2(s)/π2,1(s) = π2,2(s)/π1,1(s). A similar reasoning to Cases (a) and (b) establishes that only
π2(s)/π1(s) = π1,2(s)/π2,1(s) = π2,2(s)/π1,1(s) is compatible with Axiom 4.

Putting Cases (a)-(c) together, we conclude that π(s) ∈ L(s).

30 Consider a game for which (2, 4) and (2, 2) are pure payoffs but not (2, 3). Note that this game is not in G′. Take a sequence s′ along which payoffs (2, 4) and (2, 2) appear with frequency 1/2. Then, π1(s′) = 2 = max_{a∈R∗(s′)} u1(a) but π2(s′) = 3 and there is no a such that u1(a) = 2 and u2(a) = 3.
The first part of the proof has established that π(S is (G)) ⊂ {π ∈ R2 : ∃a ∈ A s.t. π = u(a) or ∃s ∈
S s.t. π = π(s) ∈ L(s)}. We now prove that the inclusion holds in the other direction. First, choose
any π = u(a) for some a ∈ A. Any sequence s ∈ S such that R∗ (s) = {a} trivially survives internal
stability (because for no i it holds that πi (s) < maxa∈R∗ (s) ui (a)), and thus π ∈ π(S is (G)). Suppose
now that there is s such that π = π(s) ∈ L(s). Then, one of Cases (a)-(c) must apply. In each case,
we know that (3.5) can hold for no i. Thus, s survives Axiom 4 and π(s) ∈ π(S is (G)).
Q.E.D.
C.6. Proposition 4
Proof: By way of contradiction, suppose that sequence s satisfies Axiom 3 but
lim_{T→∞} (1/T) Σ_{t=1}^{T} u(st) does not exist. Define

zT = (1/T) Σ_{t=1}^{T} u(st).
By assumption, {zT} is divergent, but since it lives in a compact space, it has at least two distinct cluster points z′ and z′′ in Co(R(s)). By definition of payoffs as liminf, z ≡ (min{z1′, z1′′},
min{z2′, z2′′}) ≥ π(s). Suppose first {z′, z′′} ⊄ P(Co(R(s))). Then it is clear that z ∉ P(Co(R(s))),
and thus, by Observation 1, there is a subsequence s′ of s such that π(s′) ≫ z. This violates
Axiom 3. Suppose now {z′, z′′} ⊂ P(Co(R(s))). Then, there exist a and a′ in A such that (i)
u(a), u(a′) ∈ P(Co(R(s))) and (ii) both z′ and z′′ lie on the line segment Co({u(a), u(a′)}). Since
the stage game is such that no two distinct action profiles yield either player the same payoff,
u1(a) ≠ u1(a′) and u2(a) ≠ u2(a′), it must be that z ∉ P(Co(R(s))). Again, this is a violation of
Axiom 3.
Q.E.D.
C.7. Proposition 5
Proof: 1. Suppose s has a cycle. Consider two cases: (a) |R(s)| ≤ 2, (b) |R(s)| ≥ 3.
Case (a). If |R(s)| = 1, there is no subsequence s′ of s such that π(s′ ) ≫ π(s) and this violates the
assumption. If |R(s)| = 2, then either the elements of R(s) are not Pareto ranked, i.e., R(s) = {u, u′ }
with neither u ≫ u′ nor u′ ≫ u, in which case the assumption is violated because there is no subsequence
s′ of s such that π(s′ ) ≫ π(s); or these elements are Pareto ranked (say u ≫ u′ , without loss
of generality). Since a Pareto-improving subsequence exists, u′ must be played with strictly positive
frequency. Hence the subsequence s′′ of s, which always plays the profile yielding u, strictly dominates
s (if s is strongly cyclic, then s′′ is also strictly simpler than s, since s′′ has a cycle of length 1).
Case (b). Suppose |R(s)| ≥ 3. If π(s) ∈ P(Co(R(s))), then there is no subsequence s′ of s such
that π(s′) ≫ π(s), which violates the assumption. Therefore, suppose π(s) ∉ P(Co(R(s))). Then
there must be a triangle ∆ = Co({u1, u2, u3}) such that (i) π(s) ∈ ∆, (ii) {u1, u2, u3} ⊂ R(s), and
(iii) π(s) ∉ P(∆). There are two sub-cases:
Case (b1). Suppose |N (∆)| ≤ 2. This implies that there exist u, u′ ∈ {u1 , u2 , u3 } such that u ≪ u′
and u is played with strictly positive frequency in s. Let a′ be an action profile that appears infinitely
often in s and for which u(a′ ) = u′ . Construct the subsequence s′′ of s by transferring the frequency
attached to all profiles yielding u to a′ . Clearly, π(s′′ ) ≫ π(s) and ℓ(s′′ ) ≤ ℓ(s).
Case (b2). Suppose |N (∆)| = 3. Since s is cyclic, we can write π(s) as
π(s) = Σ_{m=1}^{M} [wm(s)/ℓ(s)] um

where M ≥ 3, {um} ⊂ R(s), wm(s) ≥ 0 for all m, and ℓ(s) = Σ_{m=1}^{M} wm(s). Recall that π(s) ∈ ∆ =
Co({u1, u2, u3}). For any w∗ > 0, define

π∗ = π(s) + [w∗/ℓ(s)] ((u2 − u1) + q(u2 − u3))   (C.7)

and

π∗∗ = π(s) + [w∗/ℓ(s)] ((u1 − u2) + q(u3 − u2)),   (C.8)
where q ∈ Q[0, 1] is written as q = ν(q)/d(q). Since G is in family Fn , either (u2 −u1 )+q(u2 −u3 ) ≫ 0
and d(q) ≤ n, or (u1 −u2 )+q(u3 −u2 ) ≫ 0 and d(q)+ν(q) ≤ n. Thus, for all w∗ > 0, either π ∗ ≫ π(s)
or π ∗∗ ≫ π(s) holds. In each case, we can choose w∗ to obtain a well-defined convex combination.
For w∗ = min{w1(s), w3(s)}, (C.7) gives

π∗ = Σ_{m≠1,2,3} [wm(s)d(q)/(d(q)ℓ(s))] um + [((w1(s) − w∗)d(q))/(d(q)ℓ(s))] u1
     + [(w2(s)d(q) + w∗(ν(q) + d(q)))/(d(q)ℓ(s))] u2 + [(w3(s)d(q) − w∗ν(q))/(d(q)ℓ(s))] u3,
which is a well-defined convex combination implementable by a sequence of length at most d(q)ℓ(s)
(which is less than nℓ(s)). For w∗ = w2 (s)/(1 + q), (C.8) gives
π∗∗ = Σ_{m≠1,2,3} [wm(s)(d(q) + ν(q))/(ℓ(s)(d(q) + ν(q)))] um
      + [(w1(s)(d(q) + ν(q)) + d(q)w2(s))/(ℓ(s)(d(q) + ν(q)))] u1
      + [(w3(s)(d(q) + ν(q)) + ν(q)w2(s))/(ℓ(s)(d(q) + ν(q)))] u3,   (C.9)
which is a well-defined convex combination implementable by a sequence of length at most (d(q) +
ν(q))ℓ(s) (which is less than nℓ(s)). This completes the proof of part 1.
2. The proof follows similar steps to part 1. Case (a) has already been addressed and only Case
(b1) applies. Case (b1) described a procedure that delivers a Pareto-improving subsequence (by
abandoning some action profiles). However, the resulting subsequence might also be Pareto improved
in a similar fashion. Therefore, this procedure can be used iteratively by eliminating profiles in R(s),
and it produces one Pareto-improving subsequence after another without increasing complexity along
the way. Since A is finite, there must be a point at which the sequence under consideration, call it s,
satisfies R(s) = {u1 , u2 , u3 }, i.e., only three payoff profiles appear in s. From here, the next argument
shows that a strictly simpler Pareto-improving subsequence can be found. Let Ai ⊂ A be the set
of action profiles that appear infinitely often in s such that u(ai) = ui for all ai ∈ Ai. Define
∆ = Co({u1, u2, u3}). If |N(∆)| = 1, then there is ui ∈ {u1, u2, u3} such that ui ≫ uj for j ≠ i.
Thus, the constant sequence s′′ in which players always play ai ∈ Ai satisfies π(s′′) ≫ π(s) and it
is a subsequence of s. Given s is strongly cyclic and |R(s)| = 3, s′′ must be strictly simpler than
s, because ℓ(s′′) = 1. Now suppose |N(∆)| = 2. It follows from the definition of F1∗ that u2 ≫ u1
and u3 ≫ u1. We construct a subsequence s′′ of s and then show that it is Pareto-improving and
strictly simpler than s. Let Wi(s) = Σ_{a∈Ai} wa(s) be the weight assigned to payoff ui in s, where
ℓ(s) = W1(s) + W2(s) + W3(s) is the length of the cycle of s. Construct s′′ from s as follows:
1. Pick a∗ ∈ A2 and a∗∗ ∈ A3; a∗ and a∗∗ are the only profiles in A2 ∪ A3 that are played in s′′.
2. No profile in A1 is played in s′′.
3. a∗ is played with frequency [W2(s) + (W2(s)/(W2(s) + W3(s))) W1(s)] / ℓ(s).
4. a∗∗ is played with frequency [W3(s) + (W3(s)/(W2(s) + W3(s))) W1(s)] / ℓ(s).
In words, s′′ re-allocates all of W2(s) to a∗, all of W3(s) to a∗∗, and all of W1(s) to a∗ and a∗∗.
Since u2 ≫ u1 and u3 ≫ u1, s′′ is necessarily Pareto-improving. Simple calculations show that a∗ is
played with frequency W2(s)/(W2(s) + W3(s)) and a∗∗ with frequency W3(s)/(W2(s) + W3(s)). Hence,
ℓ(s′′) = W2(s) + W3(s) < ℓ(s).
Q.E.D.
C.8. Uniqueness Result
This section defines a class of common interest games in which our solution payoffs form a singleton
set. Let u∗i = max_{a∈A} ui(a) for i = 1, 2. A two-player game is a common interest game if ui(a′) = u∗i
for some player i implies that uj (a′ ) = u∗j for player j.
Let u(G) = (u1 (G), u2 (G)) be the profile of maxmin payoffs in game G.
Proposition 6   In game G ∈ G, if there is π∗ ∈ Π(G) such that (i) π∗ ≫ u(G) ≥ π for all π ∈
Π(G)\{π∗} and u(G) ∉ Π(G), or (ii) π∗ = u(G) ≫ π for all π ∈ Π(G)\{π∗}, then π(S st (G)) = {π∗}.
Proof: Suppose first that there exists π∗ ∈ Π(G) such that π∗ ≫ u(G) ≥ π for all π ∈ Π(G)\{π∗}
and u(G) ∉ Π(G). By Theorem 2, π(S st (G)) ⊂ (∪_{g∈L(G)} P(g)) ∩ ΠIS(G). Since u(G) ∉ Π(G) and
π∗ ≫ π for all π ∈ Π(G)\{π∗}, the only segments g ∈ L(G) such that g ∩ ΠIS(G) ≠ ∅ must be such
that π∗ ∈ g. For all segments g with π∗ ∈ g, we have P(g) = {π∗}. Therefore, π(S st (G)) ⊂ {π∗}.
Moreover, note that the sequence that always plays profile a∗ with u(a∗) = π∗ is in S st (G). Since
S st (G) is nonempty, π(S st (G)) = {π∗}. Suppose now that π∗ = u(G) ≫ π for all π ∈ Π(G)\{π∗}.
Then, L(G) ∩ ΠIS(G) = {π∗}. The sequence in which players always play their maxmin action yields
payoffs π∗ and is in S st (G).
Q.E.D.
REFERENCES
Abreu, D., Rubinstein, A., 1988. The structure of Nash equilibrium in repeated games with finite automata.
Econometrica 56 (6), 1259–81.
Agrawal, R., Imieliński, T., Swami, A., 1993. Mining association rules between sets of items in large databases.
In: Proceedings of the 1993 ACM SIGMOD international conference on Management of data. SIGMOD
’93. ACM, New York, NY, USA, pp. 207–216.
Blonski, M., Ockenfels, P., Spagnolo, G., 2011. Equilibrium selection in the repeated prisoner’s dilemma:
Axiomatic approach and experimental evidence. American Economic Journal: Microeconomics 3 (3), 164–
192.
Brown, G. W., 1951. Iterative solutions of games by fictitious play. In: Koopmans, T. C. (Ed.), Activity
Analysis of Production and Allocation. John Wiley and Sons, Chichester, pp. 374–376.
Buchanan, J. M., 1977. Freedom in Constitutional Contract. Texas A&M University Press.
Camerer, C., Ho, T.-H., 1999. Experience-weighted attraction learning in normal form games. Econometrica
67 (4), 827–874.
Eirinaki, M., Vazirgiannis, M., 2003. Web mining for web personalization. ACM Transactions on Internet
Technology 3, 1–27.
Er’ev, I., Roth, A. E., 1998. Predicting how people play games: Reinforcement learning in experimental
games with unique, mixed strategy equilibria. American Economic Review 88 (4), 848–81.
Fischbacher, U., 2007. z-Tree: Zurich toolbox for ready-made economic experiments. Experimental Economics
10 (2), 171–178.
Foster, D. P., Young, H. P., 2003. Learning, hypothesis testing, and Nash equilibrium. Games and Economic
Behavior 45 (1), 73–96.
Fudenberg, D., Levine, D. K., 1998. The Theory of Learning in Games. MIT Press, Cambridge, Massachusetts.
Han, J., Cheng, H., Xing, D., Yan, X., 2007. Frequent pattern mining: current status and future directions.
Data Mining and Knowledge Discovery 15, 55–86.
Hart, S., Mas-Colell, A., 2000. A simple adaptive procedure leading to correlated equilibrium. Econometrica
68 (5), 1127–1150.
Ho, T. H., Camerer, C. F., Chong, J.-K., 2007. Self-tuning experience weighted attraction learning in games.
Journal of Economic Theory 133 (1), 177–198.
Karandikar, R., Mookherjee, D., Ray, D., Vega-Redondo, F., 1998. Evolving aspirations and cooperation.
Journal of Economic Theory 80 (2), 292–331.
Marinacci, M., 1998. An axiomatic approach to complete patience and time invariance. Journal of Economic
Theory 83 (1), 105–144.
Mathevet, L., Romero, J., 2012. Predictive repeated game theory: Measures and experiments.
Milgrom, P., Roberts, J., 1990. Rationalizability, learning, and equilibrium in games with strategic complementarities. Econometrica 58 (6), 1255–1277.
Milgrom, P., Roberts, J., 1991. Adaptive and sophisticated learning in normal form games. Games and
Economic Behavior 3 (1), 82–100.
Moore, J., 2011. Behaviorism. The Psychological Record 61 (3), 449–464.
Nachbar, J. H., 2005. Beliefs in repeated games. Econometrica 73 (2), 459–480.
Roth, A., Murnighan, J. K., 1978. Equilibrium behavior and repeated play of the prisoners' dilemma. Journal
of Mathematical Psychology 17 (2), 189–198.
Rubinstein, A., 1986. Finite automata play the repeated prisoner’s dilemma. Journal of Economic Theory
39 (1), 83–96.
Sonsino, D., 1997. Learning to learn, pattern recognition, and Nash equilibrium. Games and Economic Behavior 18 (2), 286–331.
Thomson, W., 2001. On the axiomatic method and its recent applications to game theory and resource
allocation. Social Choice and Welfare 18 (2), 327–386.