Federal Reserve Bank of Minneapolis
Research Department
August 2006
Sequential Equilibria with Infinite Histories∗
Christopher Phelan
Federal Reserve Bank of Minneapolis
University of Minnesota
Andrzej Skrzypacz
Graduate School of Business
Stanford University
ABSTRACT
This paper develops recursive methods for considering stationary sequential equilibria in
games with private actions and signals (private monitoring games), where play is assumed not only
to continue forever into the future, but is also assumed to have occurred forever into the past. In
particular, we develop set-based methods for constructing all pure strategy equilibria of the subclass
of strategies where each player's action, but not his beliefs, depends only on what he observed in
the previous k periods. We then connect equilibria of games with infinite histories to the more
traditional games with a start date. We show that each k-history dependent sequential equilibrium
in a game with infinite history defines a simple correlated equilibrium for a game with a starting date.
Further, for any correlation device which signals fictitious private k-period histories, we derive set-based necessary and sufficient conditions for this correlation device to deliver a correlated equilibrium
of the game with a start date when coupled with a k-history dependent sequential equilibrium in
the infinite history game. Since sequential equilibria are examples of correlated equilibria (with degenerate signaling devices), these methods deliver simple necessary and sufficient conditions for
constructing all k-history dependent pure strategy sequential equilibria of private monitoring games
with a start date.
∗ The authors thank Narayana Kocherlakota, Peter DeMarzo, David Levine, George Mailath, Stephen Morris,
Larry Samuelson, seminar participants at the Federal Reserve Bank of Minneapolis and the 2006 meetings
of the Society for Economic Dynamics for helpful comments as well as the excellent research assistance of
Roozbeh Hosseini. The views expressed herein are those of the authors and not necessarily those of the
Federal Reserve Bank of Minneapolis or the Federal Reserve System.
1. Introduction
In this paper, we propose new recursive methods for studying repeated games with
private monitoring. Our contribution is twofold. First, we present a model of repeated games
with infinite histories, that is, games in which time extends infinitely backward and
forward, and establish a new set-based algorithm for fully characterizing
a class of finite-history dependent equilibria in these games. Second, we show a connection
between equilibria of games with infinite histories and correlated stationary equilibria of traditional games with a start date. That allows us to characterize all correlated (and sequential)
equilibria within this class for games with a start date.
Games with infinite histories are interesting in their own right. This model allows us to
abstract away from the inherent non-stationarity of the set of possible histories that plagues
games with a start date. There is a long tradition in economic theory of using the repeated game
model to study the dynamic provision of incentives because, in the case of perfect or public monitoring,
such games offer a highly tractable stationary environment without end-of-horizon effects. However,
with private monitoring the stationarity is broken by beginning-of-horizon effects: the set of
possible private histories changes through time and with it the possibilities to coordinate play
among players. The new model of repeated games with infinite histories solves this problem.
Our formulation is simple. A joint strategy is a mapping from each player’s privately
observed infinite history to how he plays today. An equilibrium strategy is a strategy where
each player’s mapping, given what he can infer regarding what his opponents have observed,
is a best response to the other players’ strategies.
We pay particular attention to strategies where actions (but not beliefs) depend only
on what a player has observed in the last k periods. In fact, our methods allow one to
calculate, for a given k, all such equilibria. We do this by developing recursive methods on
sets of equilibrium beliefs for a given player about what the other players have seen the last
k periods for each specification of what he has seen the last k periods. The key is that a
player’s private history more than k dates ago is relevant only to the extent that it gives
him information regarding what the other players have seen the last k periods (a point first
made by Mailath and Morris (2002)). This allows us to summarize this history as a belief,
a much smaller object (a point also made by Mailath and Morris (2002)). The advantages
of working with sets of beliefs are first, it is necessary and sufficient to check incentives only
on beliefs which lie on the boundary of beliefs which can be generated by infinite histories
(Lemma 1), and second, these sets can be readily calculated using recursive methods which
look similar to those developed by Abreu, Pearce, and Stacchetti (1990) but operate on sets
of beliefs rather than sets of continuation values (Lemma 4).
We then connect our results on games with infinite histories to the more traditional
(and more problematic) formulation of games with a start date. We first show that any equilibrium of the game with infinite histories can be used to construct a correlated equilibrium
of the corresponding game with a start date, where the correlation device sends each player
a fictitious private history. We then develop methods for verifying whether an arbitrary
correlation device giving fictitious k-period private histories, when coupled with a k-history
dependent equilibrium strategy of the game with infinite histories, is a correlated equilibrium.
Since sequential equilibria are examples of correlated equilibria (with degenerate signaling devices), these methods deliver simple necessary and sufficient conditions for constructing all
k-history dependent pure strategy sequential equilibria.
Finding equilibria in repeated games with private monitoring is known to be difficult
(see for example Kandori (2002) or Mailath and Samuelson (2006), Chapter 12). There are
several difficulties, a central one being that with private monitoring, the recursive structure
of public monitoring games is lost. The continuation of (sequential) equilibrium play in a
game with private monitoring is not a sequential equilibrium, but a correlated equilibrium
where private histories play the role of the correlation device. But as Kandori (2002) notes,
the correlation device becomes increasingly complex over time. By introducing infinite
histories we make the correlation device stationary and regain some tractability. The mapping
to correlated equilibria of games with a start date is then natural. In fact, using randomization
or exogenous correlation at time 0 of the game to make it more stationary (and create interior
beliefs about the private history of other players) was suggested before by Sekiguchi (1997),
Compte (2002), Ely (2002), and Cripps, Mailath, and Samuelson (2006). We present a robust
way of applying this method to construct a family of equilibria.
Our results complement the existing literature on the construction of belief-free equilibria (Piccione (2002), Ely and Valimaki (2002), Kandori and Obara (2006), and Ely, Horner
and Olszewski (2005), among others), in which players play mixed strategies and their best
responses are independent of their beliefs about the private histories of their opponents. In
contrast, the equilibria we construct are belief-based as the best responses do depend on
the beliefs. Our paper is also related to Mailath and Morris (2002), who show the robustness of
finite-history dependent strategies in games with almost public monitoring (and, in Mailath
and Morris (2006), show problems with the robustness of infinite-history dependent strategies).
They have shown that for a strict k-history dependent equilibrium of a game with public
monitoring, if we slightly perturb the game to almost public monitoring, the strategies still
form an equilibrium. We can apply our results to this class of games and calculate exactly
how much correlation is necessary (they have provided a sufficient bound on the correlation,
but finding a necessary and sufficient cutoff remained an open question).
2. Games with Infinite Histories
Consider a stage game, Γ, with N players, i = 1, . . . , N , each able to take actions
ai ∈ Ai . With probability P (y|a), a vector of private signals y = (y1 , . . . , yN ) (each yi ∈ Yi )
is observed conditional on the vector of private actions a = (a1 , . . . , aN ), where for all (a, y),
P(y|a) > 0 (full support).1 Further assume A = A1 × · · · × AN and Y = Y1 × · · · × YN are each
finite sets.
The payoff to agent i at date t is a function ui (ait , yit ). That is, player i’s payoff is a
function of his own current period action and his current period private signal. Players put
weight (1 − β) on current utility and weight β on future payoffs, and, as usual, care about
the expected value of utility streams.
In this and the next two sections, we consider the case (denoted Γ−∞,∞ ) where time
is assumed to extend infinitely both backwards and forwards (histories are infinite). In
Section 5, we connect our results to the more traditional case where time is assumed to start
at date 0 and extend forward only (denoted Γ0,∞ ). To this end, let ai denote an infinite
history of player i’s private actions ai = {ai,1 , ai,2 , . . .} with the set of possible private action
histories ai denoted by Ai . (All history sequences move backward in time with the time
subscript referring to the number of periods before the present period.) Likewise, let player
i’s private signal history be denoted yi = {yi1 , yi2 , . . .} where Yi denotes the set of possible
private signal histories for player i. Finally, let A = A1 × · · · × AN , Y = Y1 × · · · × YN ,
1 This assumption of full support is used only in the proofs of Lemmas 3 and 6, where we discuss possible less restrictive assumptions.
Z = A × Y, Zi = Ai × Yi and Z−i = A−i × Y−i , where the subscript −i refers to the set
{1, 2, . . . , i − 1, i + 1, . . . , N }. In words, zi ∈ Zi is what player i has directly observed and
z−i ∈ Z−i is what player i has not directly observed, but his opponents have.
For player i, let a mixed strategy σi(ai|zi) : Ai × Zi → [δ, 1 − (NAi − 1)δ] map infinite
private histories to the probability of taking a given action, where δ > 0 and NAi is the number
of elements of Ai. (Players tremble.) A joint strategy is σ = (σ1, . . . , σN). This formulation
implicitly restricts strategies to not depend on the calendar date (a concept we have, in fact,
not introduced). With trembling, we define a pure strategy as one putting the minimal probability δ
on all but one action.
A. Stationarity:
Let π : B(Z) → [0, 1] be a probability measure over infinite histories where B(Z)
denotes the Borel subsets of Z. Let G(σ, π) : B(Z) → [0, 1] be the probability measure over
infinite histories induced by π and the addition of one more date through the strategy σ and
the function P(y|a). That is, for all Ŝ ∈ B(Z),

(1)   G(σ, π)(Ŝ) = ∫Z Σa (Πj σj(aj|zj)) Σy P(y|a) I(((a, y), z) ∈ Ŝ) dπ(z).

A pair (σ, π) is said to be stationary if G(σ, π) = π.
B. Beliefs:
Let πi : B(Zi) × B(Z−i) → [0, 1] denote player i's conditional probability measure over
sets of that which he cannot see (elements of B(Z−i)), conditional on sets of histories over
what he can see (elements of B(Zi)).
C. Equilibrium Definition
Let vi(z|σ) be the function recursively defined by

vi(z|σ) = Σa (Πj σj(aj|zj)) ( Σy P(y|a) ((1 − β)ui(ai, yi) + βvi((a, y), z|σ)) )

and Evi(zi|σ) = ∫z−i vi((zi, z−i)|σ) dπi(z−i|zi).
A triplet (σ, π, π i ) is a Stationary Sequential Equilibrium of the infinite history game
Γ−∞,∞ (SSE-ih) if
1. For all i, zi, and σ̂i : Ai × Zi → [δ, 1 − (NAi − 1)δ], Evi(zi|σ) ≥ Evi(zi|(σ̂i, σ−i)).
2. G(σ, π) = π.
3. For all i, Ẑ−i ∈ B(Z−i) and Ẑi ∈ B(Zi) such that π(Ẑi) > 0, πi(Ẑ−i|Ẑi) = π(Ẑ−i, Ẑi)/π(Ẑi).
Note that the assumption of trembling (δ > 0) and stationarity implies minimal freedom to choose π and π i independently of σ. In particular, for a given σ, the assumption of
δ > 0 ensures that all π satisfying G(σ, π) = π are almost everywhere equal to each other.
Further, while all infinite histories have zero probability, trembling and P (y|a) > 0 ensures
that all histories are in the support of π, guaranteeing that π i is defined almost everywhere
by π for both on path and off path play.2
3. Finite History Dependent Strategies
This section develops methods for analyzing strategies that depend only on the last k
periods of a player’s history zi . We show how to verify if a particular strategy profile is an
SSE-ih, and to calculate for any given k all pure strategy equilibria in this class.
2 Trembling also plays another role: it ensures that even if agents condition on their own past actions, information about t periods ago has less and less impact on current beliefs as t → ∞. (See Lemma 3.)
A strategy σ is k-history dependent if, for all histories, each player's action is a function
only of the last k dates of his private history zi = (ai , yi ). We do not mean that players
literally forget history more than k periods old. Incentives for following the strategy will be
checked assuming each player uses all the information available to him — his entire infinite
history.
For a particular k-history dependent strategy σ, we ask if σ with properly chosen
beliefs is an SSE-ih. Define the state space Ω as the (finite) set of all possible histories of
length k, z1,k = (z1, ..., zk).3 An element ω ≡ z1,k ∈ Ω represents the aggregate state. In a
game with private actions and signals, no player sees the aggregate state ω. Instead, he sees
only his part of it, zi1,k = (zi,1, . . . , zi,k), denoted ωi, and must form beliefs over the part he
cannot see, denoted ω−i. The set of private states for player i is denoted Ωi and the set of
private states for the other players is denoted Ω−i. Let Di denote the number of elements of
Ωi (the number of actions for player i times the number of signals for player i, to the power k)
and let D−i denote the number of elements of Ω−i. Instead of writing strategies and
payoffs explicitly as functions of histories, we (equivalently) express them as functions of this induced
state space.
Given this state space, we can define expected payoffs in aggregate state ω as

Vi(ω|σ) = Σa (Πj σj(aj|ωj)) ( Σy P(y|a) ((1 − β)ui(ai, yi) + βVi(ω+(a, y, ω)|σ)) ),

where ω+(a, y, ω) is the next-period state given current-period state ω and new realization (a, y) (it is
found by simply appending the new realizations to the history represented by ω and truncating
3 For a useful discussion of the validity of representing strategies as finite state automata in the context of games with private monitoring, see Mailath and Morris (2002) and Mailath and Samuelson (2006).
the oldest observation). That is, Vi (ω|σ) is the expected utility of player i if he knows the
aggregate state (including the private states of the other players) and he (and all other
players) mechanistically plays the strategy σ.
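Since Ω is finite, this recursion pins down Vi(·|σ) as the unique solution of a finite linear system: stacking states, V = (1 − β)u + βQV, where u collects expected flow payoffs and Q is the transition matrix over aggregate states induced by σ and P. The following is a minimal numerical sketch; the three states, payoffs, and transition matrix are hypothetical placeholders, not values from the paper's game:

```python
import numpy as np

# Hypothetical illustration: 3 aggregate states with expected flow payoffs u(w)
# and a transition matrix Q over states induced by sigma and P(y|a).
beta = 0.9
u = np.array([1.0, 0.0, -0.4])
Q = np.array([[0.80, 0.15, 0.05],   # rows: current state, columns: next state
              [0.30, 0.60, 0.10],
              [0.10, 0.30, 0.60]])

# V(w) = (1-beta) u(w) + beta sum_w' Q[w,w'] V(w')  <=>  (I - beta Q) V = (1-beta) u
V = np.linalg.solve(np.eye(3) - beta * Q, (1 - beta) * u)

# The solution satisfies the recursion state by state and lies between the
# smallest and largest flow payoffs (it is a discounted average of them).
assert np.allclose(V, (1 - beta) * u + beta * Q @ V)
assert u.min() - 1e-9 <= V.min() <= V.max() <= u.max() + 1e-9
```
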
A player, in general, does not know the aggregate state ω but instead only knows ω i .
He will use all the information available to him (his private infinite history zi ) to form beliefs
regarding ω. For a particular infinite private history, a player’s beliefs over the aggregate
state of the game is simply a point in the (D−i − 1)-dimensional unit-simplex, denoted ∆D−i .
Let µi (zi ) : Zi → ∆D−i denote player i’s beliefs regarding ω −i after private history zi . Let
µi (ω −i |zi ) denote the probability assigned to the particular state ω −i . For arbitrary beliefs
mi ∈ ∆D−i and a strategy σ, let

EVi(ωi, mi|σ) = Σω−i mi(ω−i) Vi(ωi, ω−i|σ).
In essence, we have defined expected payoffs now as functions of finite private histories ω i
and beliefs over the finite private histories of the other players mi , as opposed to expected
payoffs being functions of a player’s infinite private history zi .
Rather than considering separately all beliefs mi ∈ ∆D−i a player can have after some
infinite history, it is useful to consider subsets of beliefs. In particular, for each k-period
private history ω i , a player’s private history for period’s k+1 back through time will determine
his beliefs regarding ω −i . Let Mi∗ (ω i ) = co({m|m = µi (ω i = z1,k
i , ẑi ) for some ẑi ∈ Zi })
(where co() denotes the closure of the convex hull). Here, Mi∗ (ω i ) is the closure of the convex
hull of the set of possible beliefs player i can have about ω −i given that he’s seen ω i , where a
belief is possible if there is an infinite private history for dates k + 1 back which induces this
8
belief. Let Mi∗ denote the collection of Di of these subsets, one for each ω i .
The following lemma establishes that to check the incentives of a k-history dependent
strategy, one need only check that for each player i and private state ω i , the player does not
wish to deviate when his beliefs about the other players’ private states ω −i are on the frontier
of the convex hull of all possible beliefs Mi∗ (ω i ). It is not necessary to check incentives for
every infinite history.
Lemma 1. Consider a k-history dependent strategy σ and measures (π, πi) satisfying conditions 2 and 3 of the definition of an SSE-ih and the implied sets Mi∗(ωi). Then (σ, π, πi)
is an SSE-ih if and only if EVi(ωi, mi|σ) ≥ EVi(ωi, mi|σ̂i, σ−i) for all i, ωi, σ̂i : Ωi →
[δ, 1 − (NAi − 1)δ], and all mi such that mi is on the frontier of Mi∗(ωi).
Proof. If: Suppose incentives hold for beliefs mi and m̂i, each in ∆D−i. (That is,
EVi(ωi, mi|σ) ≥ EVi(ωi, mi|σ̂i, σ−i) and EVi(ωi, m̂i|σ) ≥ EVi(ωi, m̂i|σ̂i, σ−i) for all i, ωi,
and σ̂i : Ωi → [δ, 1 − (NAi − 1)δ].) Then since expected utility for any σ̂i (including the
on-path strategy σi) is linear in these beliefs, for all α ∈ (0, 1), incentives hold for beliefs
αmi + (1 − α)m̂i. Next, every infinite history generates beliefs within Mi∗(ωi) by construction. Further, player i's history more than k periods back is relevant to him only to the extent
that it affects his beliefs regarding the other players’ continuation play which is determined
by ω −i . Thus if incentives hold for the beliefs generated by infinite history zi , incentives hold
for infinite history zi . Finally, the beliefs generated by any infinite history can be constructed
as the convex combination of points on the frontier of Mi∗ (ω i ).
Only if: Suppose (σ, π, πi) satisfies condition 1 of the definition of an SSE-ih, but there
exists a belief mi on the frontier of Mi∗(ωi) and a σ̂i : Ωi → [δ, 1 − (NAi − 1)δ] such that
EVi(ωi, mi|σ) < EVi(ωi, mi|σ̂i, σ−i). If mi is the linear combination of beliefs generated by
private histories zi ending in ωi = zi1,k, then this is a contradiction since, as shown above,
if incentives hold for a set of beliefs, they hold for all linear combinations of those beliefs.
But since Mi∗ (ω i ) is the closure of the convex hull of beliefs generated by private histories,
the only remaining possibility is that mi is within the closure of the convex hull of beliefs,
but not the convex hull itself. But if incentives do not hold for a given belief, then, for a
given deviation strategy, the gain to deviation is strictly positive. This implies deviation is
preferred for some neighborhood around this belief as well, contradicting the supposition.
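The linearity driving both directions of the proof is easy to illustrate numerically: since EVi(ωi, mi|σ) is linear in mi, the gain to any fixed deviation is also linear in mi, so over any convex set of beliefs it attains its maximum at an extreme point. A small sketch, with hypothetical payoff values and belief vertices (none taken from the paper):

```python
import random

# Hypothetical payoffs for one private state w_i, indexed by opponent state w_-i:
V_eq  = [2.0, 0.5, 1.0]    # value of following sigma, by opponent state
V_dev = [1.5, 1.2, 0.8]    # value of some fixed deviation sigma-hat

def gain(m):
    """Deviation gain EV(dev) - EV(eq); linear in the belief vector m."""
    return sum(mi * (d - e) for mi, d, e in zip(m, V_dev, V_eq))

# Extreme points of a convex set of beliefs on the 2-simplex:
vertices = [(0.9, 0.1, 0.0), (0.2, 0.7, 0.1), (0.3, 0.1, 0.6)]
worst_at_vertices = max(gain(v) for v in vertices)

# Any belief in the convex hull yields a gain no larger than the worst vertex.
random.seed(0)
for _ in range(1000):
    w = [random.random() for _ in vertices]
    s = sum(w)
    m = [sum(wk * vk[j] for wk, vk in zip(w, vertices)) / s for j in range(3)]
    assert gain(m) <= worst_at_vertices + 1e-12
```

So if no deviation gains at the frontier beliefs, none gains at any belief in the hull, which is the content of the lemma.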
The previous lemma is useful for checking whether a particular k-history dependent
strategy is an equilibrium when the sets of beliefs Mi∗ generated by that strategy (for each player
i) are known. The following lemmas construct a method for generating, for each k-history
dependent strategy, the appropriate Mi∗ sets.
To this end, we now construct an operator from sets of beliefs to sets of beliefs with
the property that the largest fixed point of this operator equals Mi∗ . Let Mi (without the
∗) denote an arbitrary collection of Di closed convex subsets of ∆D−i and let the one-step
operator T (Mi ) be defined as follows: First, let F (ω i ) be the set of private states last period
consistent with the private state this period being ωi. Next, the successor of belief mi ∈ ∆D−i
given new data (ai, yi) (denoted m′i(mi, ai, yi) ∈ ∆D−i) can be defined as

m′i(mi, ai, yi)(ω−i) = [ Σω′−i ∈ F(ω−i) mi(ω′−i) σ−i(a−i|ω′−i) P(yi, y−i|ai, a−i) ] / [ Σω′−i mi(ω′−i) Σâ−i, ŷ−i σ−i(â−i|ω′−i) P(yi, ŷ−i|ai, â−i) ],

where a−i and y−i are the most recent elements of ω−i. Then

T(Mi)(ωi) = {m̄i | there exist ω′i ∈ F(ωi) and mi ∈ Mi(ω′i) such that m̄i = m′i(mi, ai, yi)},

where ai and yi are the most recent elements of ωi.
The T operator works as follows: Suppose one takes as given the beliefs of player i
over the private state of the other players, ω −i , last period. Bayes’ rule then implies what
player i should believe about ω −i this period for each realization of (ai , yi ). If there exists a
way of choosing player i’s state last period, the beliefs of player i over the private states of his
opponents last period, and a new realization of (ai , yi ) such that Bayes’ rule delivers beliefs
mi , then mi ∈ T (Mi )(ω i ). In effect, the operator T gives, for a given collection of belief sets
Mi , the belief sets associated with all possible successor beliefs generated by new data and
interpreted through σ.
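For k = 1 (an opponent's private state is his last-period (a−i, y−i)), the successor-belief calculation can be sketched directly. The strategy σ−i, the signal distribution, and all numbers below are hypothetical placeholders, and with trembles every last-period state is a feasible predecessor (so F returns all states):

```python
from itertools import product

A, Y = ["C", "D"], ["G", "B"]          # hypothetical binary actions and signals
states = list(product(A, Y))            # opponent private states (a_-i, y_-i) for k = 1
delta = 0.01                            # tremble probability

def sigma_opp(a, prev_state):
    """Hypothetical opponent strategy: repeat own last action, with trembles."""
    return 1 - delta if a == prev_state[0] else delta

def p_single(y, a_i, a_opp):
    """Hypothetical marginal signal probability: G is likelier with more cooperation."""
    g = {("C", "C"): 0.8, ("C", "D"): 0.5, ("D", "C"): 0.5, ("D", "D"): 0.2}[(a_i, a_opp)]
    return g if y == "G" else 1 - g

def P(y_i, y_opp, a_i, a_opp):
    # conditionally independent private signals given actions (full support)
    return p_single(y_i, a_i, a_opp) * p_single(y_opp, a_opp, a_i)

def successor(prior, a_i, y_i):
    """m'_i(m_i, a_i, y_i): Bayes posterior over the opponent's new state (a_-i, y_-i),
    given prior beliefs over his previous state and player i's own new data."""
    post = {}
    for (a_opp, y_opp) in states:
        post[(a_opp, y_opp)] = sum(prior[s] * sigma_opp(a_opp, s) * P(y_i, y_opp, a_i, a_opp)
                                   for s in states)          # F = all predecessor states
    tot = sum(post.values())                                  # P(y_i | a_i, prior)
    return {s: v / tot for s, v in post.items()}

prior = {s: 0.25 for s in states}                             # uniform prior belief
post = successor(prior, "C", "G")
assert abs(sum(post.values()) - 1) < 1e-12
# Seeing G after playing C raises the odds the opponent cooperated this period.
assert sum(post[s] for s in states if s[0] == "C") > 0.5
```

Applying `successor` to every belief in a set Mi(ω′i), for every feasible predecessor ω′i, is exactly one application of the T operator at ωi.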
The following lemmas show that this T operator can then be used to generate the true
sets of valid beliefs Mi∗ . We write Mi ⊂ M̂i if Mi (ω i ) ⊂ M̂i (ω i ) for all ω i .
Lemma 2. If Mi∗ ⊂ Mi , then Mi∗ ⊂ T (Mi ).
Proof. For a given ωi = {zi,1, . . . , zi,k}, choose beliefs mi ∈ Mi∗(ωi) such that mi is the linear
combination of beliefs m0i and m1i for which there exist infinite histories z0i and z1i for dates
k + 1 back which generate beliefs m0i and m1i over outcomes in the last k periods. Now consider
these histories except for the last date. That is, let ẑ0i = {z0i,k+2, . . .} and ẑ1i = {z1i,k+2, . . .}.
Beliefs after histories ẑ0i and ẑ1i (call them m̂0i and m̂1i) satisfy m̂0i ∈ Mi∗(ω0i) and m̂1i ∈ Mi∗(ω1i)
for ω0i = {zi,2, . . . , zi,k, z0i,k+1} and ω1i = {zi,2, . . . , zi,k, z1i,k+1} from the definition of Mi∗. Since
m̂0i ∈ Mi(ω0i) and m̂1i ∈ Mi(ω1i) from Mi∗ ⊂ Mi, if one updates beliefs m̂0i and m̂1i according to
the rule in the T operator following private outcomes z0i,k+1 and z1i,k+1, this will deliver beliefs
m0i ∈ T(Mi)(ωi) and m1i ∈ T(Mi)(ωi). Since T maps closed convex sets to closed convex sets
(from the linearity of our Bayes' rule operator in beliefs), mi ∈ T(Mi)(ωi). This leaves only
the possibility that mi ∈ Mi∗(ωi) is not a linear combination of beliefs generated by infinite
histories, which, from the definition of Mi∗, implies mi is on the frontier of Mi∗(ωi). Suppose
then mi ∉ T(Mi)(ωi). Since the above logic can be applied to a sequence of points in Mi∗(ωi)
converging to mi, each point in this sequence is in T(Mi)(ωi); since T(Mi)(ωi) is closed, mi ∈ T(Mi)(ωi), a contradiction.
Lemma 3. If Mi ⊂ T (Mi ), then T (Mi ) ⊂ Mi∗ .
Proof. For a given ωi = {zi,1, . . . , zi,k}, choose beliefs mi ∈ T(Mi)(ωi). Since mi ∈
T(Mi)(ωi), there exist a private realization zi,k+1 and beliefs m1i ∈ Mi(ω1i) (where ω1i =
{zi,2, . . . , zi,k, zi,k+1}) which, according to the rule in the T operator, generate beliefs mi. That
Mi ⊂ T(Mi) ensures that m1i ∈ T(Mi)(ω1i), thus this can be repeated indefinitely, generating
any finite-length history of private outcomes {zi,k+1, . . . , zi,k+s} and beliefs msi ∈ Mi(ωsi)
(where ωsi = {zi,s+1, . . . , zi,s+k+1}), with the property that beliefs mi ∈ T(Mi)(ωi) are the
beliefs player i would hold if he started with beliefs msi and proceeded to experience private
history {zi,1, . . . , zi,k+s}. The assumptions that for all (a, y), P(y|a) > 0 and δ > 0 ensure
that as s → ∞, beliefs regarding z−i before date s become irrelevant in determining beliefs
regarding ω−i relative to the information contained in {zi,k+1, . . . , zi,k+s}, and thus beliefs
converge to the actual probability of ω−i conditional on the entire infinite history.4 Given
this, mi ∈ Mi∗(ωi).
Let T s(Mi) denote the application of the T operator s times on Mi, and let ∆ denote the collection of Di copies of ∆D−i (so that Mi = ∆ means that, for all ωi, all beliefs are acceptable).
Lemma 4. lims→∞ T s (∆) = Mi∗ .
Proof. Examination of the T operator shows it to be monotonic in that if Mi ⊂ M̂i , T (Mi ) ⊂
T (M̂i ). Since T (∆) ⊂ ∆, T 2 (∆) ⊂ T (∆) and so on. Thus T s (∆) represents a sequence of
(weakly) ever smaller included sets, guaranteeing that the limit exists. From Lemma 2,
Mi∗ ⊂ T (Mi∗ ). Lemma 3 then implies T (Mi∗ ) ⊂ Mi∗ , thus Mi∗ = T (Mi∗ ). Further, since
Mi∗ ⊂ ∆ monotonicity implies T (Mi∗ ) = Mi∗ ⊂ T (∆) and so on. Thus Mi∗ ⊂ lims→∞ T s (∆).
Since lims→∞ T s (∆) = T (lims→∞ T s (∆)), Lemma 3 implies lims→∞ T s (∆) ⊂ Mi∗ .
While Lemma 4 shows that Mi∗ is the largest fixed point of T (and thus Mi∗ can be
computed by successively applying T to ∆), unlike the value sets calculated in Abreu, Pearce,
and Stacchetti (1990), we strongly conjecture that Mi∗ is, in fact, the unique fixed point of our
operator, and thus iterating on any initial belief set will converge to Mi∗. However, iterating
on ∆ is particularly useful since at every iteration Mi∗ ⊂ T s(∆). Thus, if after s iterations
incentives hold on the boundary of T s(∆), one need iterate no further to verify that σ is an
4 To see this, note that for each of the finite values of zi, one can construct a Markov transition matrix X(zi) where each element is the probability of transiting from an ω−i yesterday to an ω′−i today, conditional on having observed zi. That P(y|a) > 0 (for all (a, y)) and δ > 0 ensure that these are regular transition matrices (that is, each state communicates with each other state in a finite number of periods). Thus the sequence {X(zi,1), X(zi,2)X(zi,1), X(zi,3)X(zi,2)X(zi,1), . . .} is guaranteed to converge for all sequences zi to a matrix with identical rows (where the actual value of these identical rows depends on the sequence zi). Thus the sequence {miX(zi,1), miX(zi,2)X(zi,1), miX(zi,3)X(zi,2)X(zi,1), . . .} converges as well for all mi. Any relaxation of the assumption P(y|a) > 0 for all (a, y) such that these sequences still converge allows this proof to go through.
equilibrium.
To this point we have focused on k-history dependent strategies σ and left implicit
the corresponding measures on infinite histories π and π i . For a given k-history dependent
strategy σ, one can construct π as follows: Strategy σ and the function P define a regular
Markov transition matrix X mapping the aggregate state ω, k periods ago, to the aggregate
state ω 0 today. This matrix has a unique ergodic distribution. The probability of an infinite
sequence ending in a particular ω 1 is simply the weight the ergodic distribution assigns to
ω 1 . The probability of an infinite sequence ending in (ω 1 , ω 2 ) (where again the most recent
realization is first) is the ergodic probability of ω 2 multiplied by the probability of transiting
from ω 2 to ω 1 , and so on. Likewise, the conditional probability measures π i can be calculated
using Bayes’ rule similarly conditioning only on the last sk dates, but for all s.
Finally, note that the ability to verify whether a k-history dependent strategy σ is an
SSE-ih (when coupled with the appropriate beliefs π and πi) allows one to calculate all pure-strategy k-history dependent SSE-ih for the simple reason that there exist a finite number of
candidate pure strategies.
4. A Simple Example
In this section we construct a simple example based on Mailath and Morris (2002).
Consider the two-player partnership game where each player i ∈ {1, 2} can take action
ai ∈ {C, D} (cooperate, defect) and each can realize a private outcome yi ∈ {G, B} (good,
bad). If m players cooperate, with probability pm(1 − ε)² + (1 − pm)ε², both players realize
the good private outcome. With probability ε(1 − ε), player 1 realizes the good outcome
while player 2 realizes the bad outcome. (Likewise, with this same probability, player 2
realizes the good outcome and player 1 realizes the bad outcome.) Finally, with probability
pmε² + (1 − pm)(1 − ε)², both players realize the bad outcome. Essentially, this game is akin
to one where pm determines the probability of an unobservable "common" outcome and ε is
the probability that player i's outcome differs from the common outcome. Thus when ε = 0,
outcomes are public, and when ε approaches zero, outcomes are almost public. Payoffs are
determined by specifying for each player i the vector {ui(C, G), ui(C, B), ui(D, G), ui(D, B)}.
Next consider perhaps the simplest non-trivial pure strategy — tit-for-tat. That is,
let each player i play C (with probability (1 − δ)) if his private outcome was good in the
previous period and D (with probability (1 − δ)) otherwise. Given each player is playing this
strategy, each player cares only about the probability his opponent realized a good outcome
in the last period. In particular, he doesn’t care what his opponent realized in periods before
last period or what actions his opponent took in any periods since none of these affect his
play. Thus, for computation purposes, we can simplify and let the set Mi∗ (ai , yi ) be simply
an interval (as opposed to a subset of the three-dimensional unit simplex) specifying the
range of probabilities that player −i realized a good outcome last period, given that player
i took action ai last period and realized outcome yi . The mapping T from section 3 then
maps a collection of four intervals (one for each specification of (ai , yi )) to a collection of four
intervals, and the results in that section imply that starting with the unit interval for each
of these and iterating delivers the true intervals Mi∗ (ai , yi ).
For p0 = .3, p1 = .55, p2 = .9, δ = .001, a payoff of 1 for receiving a good
outcome, and a payoff of −.4 for cooperating, it can be easily verified that the static game is a
prisoner's dilemma and that tit-for-tat is an equilibrium of the public outcome (ε = 0) game.
For ε > 0, beliefs matter and one must construct the intervals Mi∗(ai, yi). The procedure
of iterating the T mapping (starting with unit intervals) is relatively easily implemented on
a computer. For ε = .025 the procedure converges (in less than a second) to Mi∗(C, G) =
[.956, .972], Mi∗(C, B) = [.059, .198], Mi∗(D, G) = [.923, .955], and Mi∗(D, B) = [.036, .053]. For
each specification of (ai, yi), if player i believes the other player saw a good outcome with a
probability within Mi∗(ai, yi), he wishes to follow the equilibrium strategy (C if yi = G, D
otherwise); thus tit-for-tat is an equilibrium.
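The iteration described above is short enough to reproduce. The sketch below implements the interval version of the T mapping under our reading of the example: beliefs reduce to a single probability that the opponent's last signal was good, trembles enter through the opponent's action probability, and (because of trembles) any private state can follow any other, so the feasible priors entering each state span the hull of all four intervals. Exact third-decimal endpoints can depend on these modeling choices:

```python
from itertools import product

# Parameters of the example.
p = {0: 0.3, 1: 0.55, 2: 0.9}   # prob. of the good "common" outcome, by # of cooperators
eps = 0.025                     # prob. a private outcome differs from the common one
delta = 0.001                   # tremble probability

def sig(ai, aj):
    """Joint private-signal distribution P(yi, yj | ai, aj)."""
    pm = p[(ai == "C") + (aj == "C")]
    return {("G", "G"): pm * (1 - eps) ** 2 + (1 - pm) * eps ** 2,
            ("G", "B"): eps * (1 - eps),
            ("B", "G"): eps * (1 - eps),
            ("B", "B"): pm * eps ** 2 + (1 - pm) * (1 - eps) ** 2}

def update(m, ai, yi):
    """Posterior prob. the opponent's current signal is G, given own (ai, yi) and
    prior prob. m that his previous signal was G (so, under tit-for-tat with
    trembles, he plays C with prob. m(1-delta) + (1-m)delta)."""
    q = m * (1 - delta) + (1 - m) * delta
    num = den = 0.0
    for aj, pa in (("C", q), ("D", 1 - q)):
        d = sig(ai, aj)
        num += pa * d[(yi, "G")]
        den += pa * (d[(yi, "G")] + d[(yi, "B")])
    return num / den

# Interval version of the T mapping, starting from the unit interval for each
# private state (ai, yi) and iterating to convergence.
states = list(product("CD", "GB"))
M = {s: (0.0, 1.0) for s in states}
for _ in range(500):
    lo = min(iv[0] for iv in M.values())    # hull of feasible entering priors
    hi = max(iv[1] for iv in M.values())
    M = {(a, y): tuple(sorted((update(lo, a, y), update(hi, a, y))))
         for (a, y) in states}

for (a, y), (l, h) in sorted(M.items()):
    print(f"M*({a},{y}) = [{l:.3f}, {h:.3f}]")
```

Under these assumptions the intervals for (C, G), (D, G), and (D, B) match the reported endpoints to roughly the third decimal; the (C, B) interval is close but can differ slightly depending on exactly how the update is specified.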
If ε is increased to .03, the intervals Mi∗(ai, yi) shift toward the middle and widen
to Mi∗(C, G) = [.948, .966], Mi∗(C, B) = [.072, .228], Mi∗(D, G) = [.909, .945], and Mi∗(D, B) =
[.043, .063]. In this case, if (ai, yi) = (D, G) and player i believes the other player realized a
good outcome with probability .909, he wishes to deviate and play D rather than C. Thus,
with ε = .03, tit-for-tat is not an equilibrium, since, by construction, there exists an infinite
private history for player i ending in (D, G) such that he believes the other player saw a good
outcome last period with probability .909 (the lower end of the interval). Simply put, being
only 91% sure your opponent saw the same good signal as you (and thus will cooperate along
with you) is an insufficient inducement for cooperation in this repeated prisoner's dilemma.
From Mailath and Morris (2002) we know that in this example, for sufficiently small ε
(and δ), tit-for-tat is an equilibrium, and obviously for sufficiently high ε it is not. Our analysis
of this example allows us to go further: to establish exactly for which values of ε the profile is
an equilibrium. That is, we complete the picture by providing necessary and sufficient conditions.
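The iteration described above is straightforward to code. The sketch below is ours, not the paper's implementation: it assumes a particular almost-public-monitoring structure (a common outcome is good with probability pn, where n is the number of cooperators, and each player observes that outcome flipped independently with probability ε), chosen to be consistent with the interval endpoints reported above; all function names are our own.

```python
# Sketch of the interval iteration for M*_i(a_i, y_i) under tit-for-tat.
# Assumed signal structure (ours): a common outcome yhat is good with
# probability P[n] (n = number of cooperators) and each player observes
# yhat flipped independently with probability EPS.
P = (0.3, 0.55, 0.9)          # p0, p1, p2
EPS = 0.025
STATES = [('C', 'G'), ('C', 'B'), ('D', 'G'), ('D', 'B')]

def next_belief(m, a_i, y_i):
    """Posterior probability that the opponent saw G this period, given a
    prior m that he saw G last period (so, under tit-for-tat, plays C now),
    own action a_i, and own realized outcome y_i."""
    num = den = 0.0
    for a_j, p_aj in (('C', m), ('D', 1.0 - m)):
        n = (a_i == 'C') + (a_j == 'C')
        for yhat, p_yhat in (('G', P[n]), ('B', 1.0 - P[n])):
            p_own = (1.0 - EPS) if y_i == yhat else EPS
            p_opp_G = (1.0 - EPS) if yhat == 'G' else EPS
            den += p_aj * p_yhat * p_own
            num += p_aj * p_yhat * p_own * p_opp_G
    return num / den

def T(M):
    """Map four intervals to four: the new interval for (a, y) is the image,
    under the belief update, of the hull of all currently possible beliefs."""
    lo = min(iv[0] for iv in M.values())
    hi = max(iv[1] for iv in M.values())
    grid = [lo + (hi - lo) * t / 200 for t in range(201)]
    out = {}
    for (a, y) in STATES:
        vals = [next_belief(m, a, y) for m in grid]
        out[(a, y)] = (min(vals), max(vals))
    return out

M = {s: (0.0, 1.0) for s in STATES}   # start from unit intervals
for _ in range(50):                    # converges in far fewer iterations
    M = T(M)
```

Under this assumed structure the limits land close to the intervals reported in the text (for instance, M*_i(C, G) comes out near [.956, .972]); any small discrepancy would indicate that the paper's parameterization of ε differs slightly from the one assumed here.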
5. Games with a Start Date
In this section we connect our results in Section 3 with the more traditional class of
infinitely repeated games where there exists a start date t = 0. We first show how to construct,
for each SSE-ih, a correlated equilibrium in the corresponding game with a start date where
the correlation device sends fictitious histories to each player. We then, as in Section 3, restrict ourselves to k-history dependent strategies and show how, for each SSE-ih, to construct
a correlated equilibrium in which the correlation device signals fictitious k-period histories as
opposed to infinite histories. We then develop conditions for checking whether an arbitrary
correlation device which signals fictitious k-period histories, when coupled with a k-history
dependent strategy σ, is a correlated equilibrium. Since sequential equilibria are examples of
correlated equilibria (with degenerate signaling devices), these methods deliver simple necessary and sufficient conditions for constructing all k-history dependent pure strategy sequential
equilibria of private monitoring games with a start date.
If players receive a signal si ∈ Si at the beginning of the game, a joint strategy σ is such
that σ i,t (ai,t |si , (zi,1 , . . . , zi,t )) : A × Si × Zit → [δ, 1 − (NAi − 1)δ] . Let vi,t (s, (zi,1 , . . . , zi,t )|σ)
be the expected discounted payoff to player i if he knows the joint signal s and every player’s
private history (zj,1 , . . . , zj,t ), and he and the other players follow σ. Let Evi,t (si , (zi,1 , . . . , zi,t )|σ) =
∫s−i vi,t (s, (zi,1 , . . . , zi,t )|σ) dπ i,t (si , (zi,1 , . . . , zi,t )), where π i,t describes player i’s beliefs
regarding the signals and histories of the other players given what he knows at date t. A correlated
equilibrium in our environment is a joint strategy σ, a signaling device or probability measure
x : B(S) → [0, 1] (where S is the joint signal space), and conditional probability measures
π i,t : B(Si ) × Zit × B(S−i ) × Z−it → [0, 1] such that
1. Strategies σ i are mutual best responses (given beliefs) after all signals and histories.
2. Beliefs, π i , after all signals and histories are consistent with the signaling device x,
the strategy σ, and P .
In each of the following lemmas, we construct correlated equilibria by starting with
an SSE-ih (σ, π, π i ) of Γ−∞,∞ , letting the signal space S be a set of either infinite or finite
fictitious histories, and having the strategy for the game with a start date combine a player’s
fictitious history signal and his actual history up to date t in such a way that the player
treats his fictitious history as if it were real. That is, if player i receives fictitious history
(ẑi,1 , ẑi,2 , . . .) (with date subscripts going backwards in time as in the previous sections), and
then experiences actual history (zi,0 , zi,1 , . . . , zi,t ) (with date subscripts referring to the calendar date), we let σ i,t (ai,t |s = (ẑi,1 , ẑi,2 , . . .), (zi,0 , . . . , zi,t )) = σ i (ai,t |(zi,t , . . . , zi,0 , ẑi,1 , ẑi,2 , . . .)).
In effect, through the use of fictitious histories, we have constructed fully stationary strategies
in a game with a start date — a somewhat difficult task due to the fact that in a game with
a start date, the calendar date is automatically encoded into a player’s history through the
length of that history. What remains is generating a correlation device that when coupled
with the σ thus constructed, is a correlated equilibrium.
Lemma 5. Take as given an SSE-ih (σ, π, π i ) of Γ−∞,∞ . For the game with a start date,
Γ0,∞ , let S = Z, x = π, and σ i,t (ai,t |(ẑi,1 , . . .), (zi,0 , . . . , zi,t )) = σ i (ai,t |(zi,t , . . . , zi,0 , ẑi,1 , . . .)).
Then (σ, x, π i ) is a correlated equilibrium of Γ0,∞ .
Proof. The proof follows immediately from (σ, π, π i ) being an SSE-ih and from the fact that
σ i,t reacts to a fictitious history signal, or to an actual history with a fictitious history
prepended to it, as if it were an actual history in Γ−∞,∞ .
Since the previous lemma relies on a signal space with a continuum of signals, it would
be useful to generate correlated equilibria with a finite signal space. To this end, let the signal
space S = Ω, the (finite) set of k-period histories, and construct the strategy for Γ0,∞ exactly
as in the previous lemma where each player treats his (now) k-period fictitious history as if
it were real.
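Concretely, "treating the fictitious history as real" just means prepending it to the realized history and reading off the most recent k observations. A minimal sketch (the helper name is ours):

```python
def k_period_state(actual, fictitious, k):
    """Most recent k observations of the combined history.

    actual:     [z_0, ..., z_t], the realized history in calendar order.
    fictitious: [zhat_1, ..., zhat_k], the signaled history, with
                subscripts running backwards in time.
    The combined history, newest first, is
    (z_t, ..., z_0, zhat_1, ..., zhat_k); the k-period state is its
    first k entries.
    """
    combined = list(reversed(actual)) + list(fictitious)
    return tuple(combined[:k])
```

At t = 0 the state is the fictitious history itself; after k real periods the fictitious signal has scrolled off entirely and play depends only on realized observations.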
Lemma 6. Take as given a k-history dependent SSE-ih (σ, π, π i ) of Γ−∞,∞ . For the game
with a start date, Γ0,∞ , there exists a correlated equilibrium with S = Ω (the set of k-period
joint histories) and σ i,t (ai,t |(ẑi,1 , . . . , ẑi,k ), (zi,0 , . . . , zi,t )) = σ i (ai,t |(zi,t , . . . , zi,0 , ẑi,1 , . . . , ẑi,k )).
Proof. Let Xk denote the probability transition matrix, defined by σ and the function P ,
from state ω yesterday to state ω′ today. Since Xk is a regular transition matrix (by the
assumption of trembles and full support), there exists a unique ergodic distribution over
states ω. If joint signals are drawn from this distribution, beliefs for each player i must
lie within Mi∗ since the beliefs of player i regarding ω −i (conditional on ω i ) are a weighted
average of player i’s beliefs regarding ω −i when conditioning on his entire infinite history.
By construction, if beliefs over ω −i start within Mi∗ , they will always lie within Mi∗ . That
(σ, π, π i ) is an SSE-ih then ensures that incentives hold at all signals and histories. The actual
beliefs over signals and histories at later dates (needed to define the correlated equilibrium)
are calculated using the methods following Lemma 4, with, again, no distinction made between
fictitious and actual histories.
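The unique ergodic distribution of the regular matrix Xk invoked in the proof can be computed by simple power iteration. A generic sketch (the function name is ours; any regular row-stochastic matrix will do):

```python
def ergodic_distribution(X, tol=1e-12):
    """Stationary distribution of a row-stochastic matrix X.

    Regularity (some power of X is strictly positive, which here follows
    from trembles and full support) guarantees a unique stationary
    distribution and convergence of power iteration from any start."""
    n = len(X)
    mu = [1.0 / n] * n
    while True:
        nxt = [sum(mu[i] * X[i][j] for i in range(n)) for j in range(n)]
        if max(abs(nxt[j] - mu[j]) for j in range(n)) < tol:
            return nxt
        mu = nxt
```

Drawing the fictitious joint k-period histories from this distribution reproduces, at date 0, the belief distribution a player would hold inside the infinite-history game.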
Lemma 6 ensures that every k-history dependent SSE-ih of Γ−∞,∞ defines at least
one correlated equilibrium of Γ0,∞ . But one may still ask whether other correlation devices
might permit, say, higher initial values than the correlation device constructed above. Thus
we now turn to developing necessary and sufficient conditions for any correlation device
xk : Ω → [0, 1], when coupled with an SSE-ih σ, to constitute a correlated equilibrium.
Define M̂i to be the set of beliefs such that incentives hold for all beliefs mi ∈ M̂i . Lemma
1 showed that a necessary and sufficient condition for a k-history dependent strategy to be
(part of) an SSE-ih is that Mi∗ ⊂ M̂i for all i. We need to ensure, however, that incentives
are satisfied not only for a particular belief generated by a correlation device, but also that
incentives are satisfied for all possible successors of that belief, and successors of those beliefs,
and so on. Recall (from Section 3) that m′i (mi , ai , yi ) ∈ ∆D−i denotes the successor belief
(regarding ω −i ) of belief mi given new data (ai , yi ). Further recall that ω+i (ai , yi , ω i ) denotes
the next-period private state given current-period private state ω i and a new (ai , yi ) (again
found by appending the new realizations to the history represented by ω i and truncating the
oldest observation). The
operator T̂ (Mi ) (mapping sets of beliefs to sets of beliefs) is defined as

T̂ (Mi )(ω i ) = {mi | mi ∈ Mi (ω i ), and for all (ai , yi ), m′i (mi , ai , yi ) ∈ Mi (ω+i (ai , yi , ω i ))}.

In words, T̂ eliminates an element of Mi (ω i ) if there exists a successor belief which is not in
Mi (ω+i ). Clearly, T̂ is monotonic and T̂ (M̂i ) ⊂ M̂i . Thus the iterates T̂ s (M̂i ) form a sequence
of (weakly) ever smaller nested sets, guaranteeing that the limit, denoted M i , exists.
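On a finite grid of beliefs, T̂ is a one-pass elimination, and the limit is reached by iterating until nothing more is removed. The sketch below is a toy illustration only: the states, grid, successor map, and belief update are hypothetical placeholders of our own, not the objects of any particular game.

```python
def T_hat(M, data, successor, update):
    """One application of T-hat: keep belief m in M[state] only if every
    possible new observation (a, y) sends it to a belief that is still in
    the set attached to the successor state."""
    return {state: {m for m in beliefs
                    if all(update(m, a, y) in M[successor(a, y, state)]
                           for (a, y) in data)}
            for state, beliefs in M.items()}

def limit_sets(M, data, successor, update):
    """Iterate T-hat to its fixed point; since T-hat only removes elements
    from finite sets, the iteration terminates."""
    while True:
        nxt = T_hat(M, data, successor, update)
        if nxt == M:
            return M
        M = nxt

# Hypothetical toy instance: the state is last period's outcome, the
# successor state is simply the new outcome, and beliefs live on a grid.
GRID = [i / 10 for i in range(11)]
DATA = [('C', 'G'), ('C', 'B')]
succ = lambda a, y, state: y
upd = lambda m, a, y: round(min(1.0, 0.6 * m + 0.4) if y == 'G' else 0.5 * m, 1)
M_hat = {'G': {m for m in GRID if m >= 0.5},   # toy incentive sets
         'B': {m for m in GRID if m <= 0.5}}
M_bar = limit_sets(M_hat, DATA, succ, upd)
```

In this toy instance the belief 0 is eliminated from the B-state set because its successor after (C, G) falls below the G-state threshold; everything that survives is, by construction, a fixed point of T̂.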
Lemma 7. Take as given a k-history dependent SSE-ih (σ, π, π i ) of Γ−∞,∞ . For the game
with a start date, Γ0,∞ , the strategy σ i,t (ai,t |(ẑi,1 , . . . , ẑi,k ), (zi,0 , . . . , zi,t )) =
σ i (ai,t |(zi,t , . . . , zi,0 , ẑi,1 , . . . , ẑi,k )) constitutes a correlated equilibrium when coupled with
correlation device xk : Ω → [0, 1] with implied conditional beliefs π i (ω −i |ω i ) =
x(ω i , ω −i )/Σω−i x(ω i , ω −i ), if and only if π i (ω i ) ∈ M i (ω i ) for all i and ω i .
Proof. If π i (ω i ) ∈ M i (ω i ) for all i and ω i , then by the construction of M i , incentives hold
for all histories. If π i (ω i ) ∉ M̂i (ω i ) for some ω i , then incentive compatibility does not hold
at date t = 0 for signal ω i . If π i (ω i ) ∈ M̂i (ω i ) but π i (ω i ) ∉ T̂ (M̂i )(ω i ) for some ω i , then
incentive compatibility does not hold at date t = 1 for a private signal ω i and a private
history (ai,0 , yi,0 ) at date 0 such that the date-1 belief is m′i (π i (ω i ), ai,0 , yi,0 ), and so on.
The implications of Lemma 7 for constructing sequential equilibria can be seen by
referring to our example from Section 4. In that example, with ε = .025, tit-for-tat is a k = 1
history dependent SSE-ih. That is, incentives hold for the beliefs which can be generated by
infinite histories: Mi∗ (C, G) = [.956, .972], Mi∗ (C, B) = [.059, .198], Mi∗ (D, G) = [.923, .955],
Mi∗ (D, B) = [.036, .053]. But incentives hold for wider beliefs than these intervals. For this
example, M̂i (C, G) = M̂i (D, G) = [.916, 1] and M̂i (C, B) = M̂i (D, B) = [0, .706]. Further,
for this example, T̂ (M̂ ) = M̂ , thus M̂ = M . A correlation device signaling fictitious 1-period
histories according to the ergodic distribution has π i (y−i = G|yi = G) = .966 and
π i (y−i = G|yi = B) = .085. Since both of these beliefs are within M (yi ), this signaling
device can be used to create a correlated equilibrium for the tit-for-tat strategy. The point of
Lemma 7, however, is that any correlation device which delivers π i ∈ M can be used as well.
In particular, since M (yi = G) includes the belief that y−i = G with probability 1, putting all
probability on the fictitious joint history (G, G) delivers a correlated equilibrium. But since
this correlation device is degenerate, we have constructed a sequential equilibrium where both
agents cooperate in the first period and play tit-for-tat after that. Likewise, since M (yi = B)
includes y−i = G with probability 0, we have also constructed a sequential equilibrium where
both agents defect in the first period and play tit-for-tat after that. Finally, since M (yi = G)
does not include the belief that y−i = G with probability 0, we have demonstrated that having
one player cooperate and the other defect in the first period (and playing tit-for-tat after
that) is not a sequential equilibrium. Thus we have exhaustively determined which starting
conditions, when coupled with a particular SSE-ih, are and are not sequential equilibria.
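For the ε = .025 example, this exhaustive check over degenerate one-period devices takes only a few lines (the interval values are copied from the text; the function names are ours):

```python
# Limit belief sets from the eps = .025 tit-for-tat example above:
# after a good signal, acceptable beliefs are [.916, 1]; after a bad
# one, [0, .706].
M_BAR = {'G': (0.916, 1.0), 'B': (0.0, 0.706)}

def degenerate_device_works(y1, y2):
    """A degenerate device sends the fictitious joint history (y1, y2)
    with probability one, so each player is certain of the other's
    signaled outcome.  It yields a sequential equilibrium iff both
    induced beliefs lie in the corresponding limit set."""
    belief_1 = 1.0 if y2 == 'G' else 0.0   # player 1's belief that y2 = G
    belief_2 = 1.0 if y1 == 'G' else 0.0   # player 2's belief that y1 = G
    lo1, hi1 = M_BAR[y1]
    lo2, hi2 = M_BAR[y2]
    return lo1 <= belief_1 <= hi1 and lo2 <= belief_2 <= hi2
```

Only (G, G) (both cooperate in the first period) and (B, B) (both defect) pass, matching the conclusions above.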
This logic generalizes. Since the number of pure-strategy k-history dependent strategies is finite (and we have derived methods for checking whether each is an SSE-ih), and the
number of pure-strategy possibilities for determining play in the first k periods of a game
with a start date is also finite (and we have derived methods for determining whether each
of these starting conditions satisfies incentive compatibility), we have constructed methods for
determining all k-history dependent sequential equilibria of traditional games with a start
date.
6. Concluding Remarks
The equilibria we have characterized in this paper can be used to construct other
equilibria. For example, we can construct sequential equilibria for games with a start date in
which players mix in the first period and then follow pure stationary strategies conditional
on the initial randomization, simply by letting the players draw their private fictitious
histories at random (a construction similar to Sekiguchi (1997)). Alternatively, we can design
other non-stationary strategies for the first few periods and append one of our equilibria to them.
Furthermore, once we find an equilibrium for a discount factor β, we can apply the method
from Ellison (1994) and construct an equilibrium for discount factor β^1/2 by simply dividing
the game into odd and even periods and making the players treat these two parts of the game
independently. Since incentive constraints are continuous in β, the sets M i are also continuous
in β, so finding a generic equilibrium for one β gives us a neighborhood of equilibria around
that β (recall that the Mi∗ sets are independent of β).
Some open questions remain. First, we have assumed away any problems associated
with interpreting off-equilibrium behavior by assuming trembles and full support of signals.
Can our methods be extended to games without trembles? To games without full support?
These questions, albeit important, seem tangential to the stationarity issues that we addressed.
However, we conjecture that some generalization is possible (at least for action-free
strategies, i.e., ones in which players condition their actions on signals but not on their own past
actions). If we keep the full support assumption and drop trembles, we can look for Nash
equilibria instead of sequential equilibria and check incentive compatibility constraints only
on the equilibrium path. Then we can apply the result in Sekiguchi (1997) or Kandori and
Matsushima (1998) that under full support every Nash equilibrium outcome is a sequential
equilibrium outcome. We would lose the description of strategies off the equilibrium path,
but at least we would describe equilibrium outcomes. Some relaxation of the full support
assumption is also possible, for example by introducing public signals (that is, making
some realizations of yi perfectly correlated). But cases in which some private signals indicate a
deviation by another player are much more complicated and have not received much attention
in the literature.
Second, we have focused on k-history dependent strategies. It is an open question
how important this restriction is in repeated games with private monitoring. For example,
Cole and Kocherlakota (2005) show that for some games with public monitoring the set of
payoffs achievable with public k-history dependent strategies is strictly smaller than the whole
set of PPE payoffs (at least for strongly symmetric strategies). It is not clear how rich the
class of games with this property is, or whether the same is true for sequential or correlated
equilibria of games with private monitoring.
References
[1] Abreu, D., D. Pearce, and E. Stacchetti (1990): “Toward a Theory of Discounted Repeated Games with Imperfect Monitoring,” Econometrica, 58, pp. 1041–1063.
[2] Cole, H. L., and N. Kocherlakota (2005): “Finite memory and imperfect monitoring,” Games and Economic Behavior, 53, pp. 59–72.
[3] Compte, O. (2002): “On failing to cooperate when monitoring is private,” Journal of Economic Theory, 102, pp. 151–188.
[4] Cripps, M., G. Mailath, and L. Samuelson (2006): “Disappearing Private Reputations in Long-Run Relationships,” forthcoming, Journal of Economic Theory.
[5] Ellison, G. (1994): “Cooperation in the Prisoner’s Dilemma with Anonymous Random Matching,” Review of Economic Studies, 61, pp. 567–588.
[6] Ely, J. C. (2002): “Correlated Equilibrium and Trigger Strategies with Private Monitoring,” working paper, Northwestern University.
[7] Ely, J. C., J. Hörner, and W. Olszewski (2005): “Belief-Free Equilibria in Repeated Games,” Econometrica, 73, pp. 377–415.
[8] Ely, J. C., and J. Välimäki (2002): “A Robust Folk Theorem for the Prisoner’s Dilemma,” Journal of Economic Theory, 102, pp. 84–105.
[9] Kandori, M. (2002): “Introduction to Repeated Games with Private Monitoring,” Journal of Economic Theory, 102, pp. 1–15.
[10] Kandori, M., and H. Matsushima (1998): “Private Observation, Communication and Collusion,” Econometrica, 66, pp. 627–652.
[11] Kandori, M., and I. Obara (2006): “Efficiency in Repeated Games Revisited: The Role of Private Strategies,” Econometrica, 74, pp. 499–519.
[12] Mailath, G. J., and S. Morris (2002): “Repeated Games with Almost Public Monitoring,” Journal of Economic Theory, 102, pp. 189–228.
[13] Mailath, G. J., and S. Morris (2006): “Coordination failure in repeated games with almost-public monitoring,” forthcoming, Theoretical Economics.
[14] Mailath, G. J., and L. Samuelson (2006): Repeated Games and Reputations: Long-Run Relationships. Oxford University Press.
[15] Piccione, M. (2002): “The repeated prisoner’s dilemma with imperfect private monitoring,” Journal of Economic Theory, 102, pp. 70–83.
[16] Sekiguchi, T. (1997): “Efficiency in Repeated Prisoners’ Dilemma with Private Monitoring,” Journal of Economic Theory, 76, pp. 345–361.