Sequential Equilibrium Distributions in Multi

advertisement
Sequential Equilibrium Distributions in Multi-Stage Games
with Infinite Sets of Types and Actions
by
Roger B. Myerson and Philip J. Reny*
Draft notes October 2014
https://sites.google.com/site/philipjreny/
Abstract: Guided by several key examples, we formulate a definition of a sequential equilibrium
distribution for multi-stage games with infinite type sets and infinite action sets, and we prove its existence
for a class of games.
* Department of Economics, University of Chicago
1
Goal: formulate a definition of sequential equilibrium for multi-stage games with infinite
type sets and infinite action sets, and prove existence for some broad class of games.
Sequential equilibria were defined for finite games by Kreps-Wilson 1982,
but rigorously defined extensions to infinite games have been lacking.
Various formulations of “perfect bayesian equilibrium” (defined for finite games in
Fudenberg-Tirole 1991) have been used for infinite games; no general existence theorem.
Harris-Stinchcombe-Zame 2000 explored definitions with nonstandard analysis.
It is natural to try to define sequential equilibria of an infinite game can by taking limits of
sequential equilibria of finite games that “approximate” it.
But no general definition of “good finite approximation” has been found.
It is easy to define sequences of finite games that seem to be converging to the infinite game
(in some sense) but have limits of equilibria that seem wrong.
We therefore take another route.
A strategy is an epsilon-sequential equilibrium if for any finite partition of the players’ type
(information) spaces, each player's strategy is epsilon-optimal at each element of the partition
when the strategies of players and Nature are perturbed with epsilon probability,
independently across players and coordinates of the state, so that each partition element has
positive probability.
A sequential equilibrium distribution (over outcomes) is a limit of epsilon-sequential
equilibrium distributions as epsilon tends to zero.
The definition presented here is based on intuitive judgments about what is reasonable.
Others are encouraged to explore alternative definitions.
2
Multi-stage games =(N,K,A,T,J,,d,M,p,,u)
i  N = {players}, a finite set; k  {1,...,K} dates of the game.
L = {(i,k)| iN, k{1,...,K}} = {dated players}. We write ik for (i,k).
Aik = {possible actions for player i at date k}; history independent.
Tik = {possible informational types for player i at date k}.
k = jJkj = {possible date k states}; J a finite set. kj is metrized by dkj.
-algebras (closed under countable  and complements) of measurable subsets are specified
for Aik, Tik, and kj. kj has its Borel -algebra. Products are given their product -algebras.
A = kKiNAik = {possible vectors of actions in the whole game}.
T = kKiNTik = {possible vectors of types in the whole game}.
 = kKk = {possible states in the whole game}; A = {possible outcomes of the game}.
The subscript, <k, denotes the projection onto periods before k, and ≤k weakly before.
e.g., A<k = h<kiNAih = {possible action sequences before period k} (A<1 = {}),
and for aA, a<k=h<kiNaih is the partial sequence of actions before period k.
For any of the sets X above, M(X) denotes its set of measurable subsets.
Let (X) denote the set of countably additive probability measures on M(X).
The date k state is determined by a regular conditional probability (r.c.p.) pk from <kA<k to
(k). i.e., for each (<k,a<k), pk(|<k,a<k) is in (k), and for each BM(k), pk(B|<k,a<k) is
a measurable function of (<k,a<k). Nature’s probability function is p=(p1,…,pK).
Player i's date k information is given by a measurable onto type function ik:≤k A<k  Tik.
Assume perfect recall: ikL, m < k, there is a measurable ikm:TikTimAim such that
ikm(ik(≤k,a<k)) = (im(≤m,a<m),aim), , aA.
Each player i has a measurable and bounded utility function ui: A ℝ.
3
Strategies and induced distributions
A strategy, sik for any ikL is any regular conditional probability from Tik to (Aik).
i.e., for each tikTik, sik(|tik) is a countably additive probability on the measurable subsets
of Aik, and for each measurable subset B of Aik, sik(B|tik) is a measurable function of tik.
Let Sik denote ik’s set of strategies, let Si = kSik, denote i’s set of strategies,
and let S = ikLSik denote the set of all strategy profiles.
For any skiNSik and any tkiNTik, let sk(|tk) denote the product measure
iN sik(|tik) on iNAik. Let Ak = iNAik.
Each skiNSik determines a regular conditional probability qk from <kA<k to (kAk)
as follows. For any measurable subset Z of kAk, and any (<k,a<k)<kA<k,
qk(Z|<k,a<k,sk) = ∫k sk(Zk(k)|k(≤k,a<k))pk(dk|<k,a<k),
where Zk(k)={akiNAik:(k,ak)Z}, and k(≤k,a<k) = iNik(≤k,a<k).
Set Q1 = q1, and define the probability measures Q2,…,QK inductively as follows.
For each k and any measurable subset W of ≤kA≤k,
Qk(W|s) = ∫<k A<k qk(Wk(<k,a<k)|<k,a<k,sk)Qk-1(d(<k,a<k)|s),
where Wk(<k,a<k) = {(k,ak) : (k,<k, ak,a<k)W}.
QK(|s)(A) is the probability measure over outcomes induced by the strategy s.
4
Observable events, conditional probabilities and payoffs
For any sS, any ikL, and any CM(Tik), let P(C|s) = QK({(,a): ik(≤k,a<k)C}|s).
For any 0, let Cik() denote the set of measurable subsets of Tik that can have probability
more than  under some strategy profile, that is Cik() = {CM(Tik): P(C|s) >  for some
sS}.
We may say that the events in Cik() are -observable by player i at date k.
Let C() = ikL Cik() (a disjoint union) denote the set of all events that are -observable by
some player at some date.
Then C(0) is the set of all observable events for the players in this game.
Let Y denote the set of measurable subsets Y of A. That is, Y is the set of all outcome
events.
If P(C|s) > 0, then we may define:
a. the conditional probability P(|C,s) on A by
P(Y|C,s) = QK({(,a) Y: ik(≤k,a<k)C}|s)/P(C|s), for any YY, and
b. player i’s conditional payoff by
Ui(s|C) = A ui(,a) P(d(,a)|C,s).
To make explicit the dependence of the above conditionals on Nature’s probability function,
p = (p1,…,pK), we may sometimes write P(C|s,p), P(Y|C,s,p), and Ui(s|C,p).
To motivate our definition, we consider a number of examples.
5
Strategic entanglement in limits of approximate equilibria (Harris-Reny-Robson 1995)
Example 1: Date 1: Player 1 chooses a1 from [-1,1], player 2 chooses from {L,R}.
Date 2: Players 3 and 4 observe the date 1 choices and each choose from {L,R}.
For i=3,4, player i’s payoff is -a1 if i chooses L and a1 if i chooses R.
Player 2’s payoff depends on whether she matches 3’s choice.
If 2 chooses L then she gets 1 if player 3 chooses L but -1 if 3 chooses R; and
If 2 chooses R then she gets 2 if player 3 chooses R but -2 if 3 chooses L.
Player 1’s payoff is the sum of three terms:
(First term) If 2 and 3 match he gets -|a1|, if they mismatch he gets |a1|;
plus (second term) if 3 and 4 match he gets 0, if they mismatch he gets -10;
plus (third term) he gets -|a1|2.
There is no subgame perfect equilibrium of this game. But it has an obvious “solution” which
is the limit of strategy profiles where everyone's strategy is arbitrarily close to optimal.
Approximations in which 3 and 4 can distinguish between a1 = +,0,- and in which 1’s action
set is {-1,…,-2/m,-1/m,1/m,2/m,…,1} have a unique subgame perfect (hence sequential)
equilibrium in which player 1 chooses ±1/m with probability ½ each, player 2 chooses L and
R each with probability ½ , and players i=3,4 both choose L if a1=-1/m and both choose R if
a1=1/m.
The limiting outcome distribution is a1= 0, a2 = L or R each with probability ½, and (a3,a4) =
(L,L) or (R,R) each with probability ½ .
Player 3’s and player 4’s strategies are entangled in the limit, that is, independent strategies
cannot yield their joint behavior in the limit distribution.
6
Limitations of step-strategy approximations
Example 2: Player 1 chooses a1[0,1]. Player 2 observes t2 = a1 and chooses a2[0,1].
The game is zero-sum. Player 1 receives 1 if their choices do not match and -1 otherwise.
(This discontinuous game could be a reduced form of a continuous game with another stage.)
There should be no equilibrium in which player 2 fails to match player 1’s choice.
However, for any fixed partition of player 2’s type space T2 = [0,1], player 1 can mix
uniformly over a large enough finite set of actions within a single element of 2’s partition so
that player 2 has an arbitrarily small chance of matching 1’s choice.
Player 2 would like to use the strategy "choose a2 = t2," but this strategy can be approximated
only by step functions when player 2 has finitely many feasible actions.
Step functions close to this strategy yield very different expected payoffs because 2’s utility
function is discontinuous (but such discontinuities can arise in the reduced form of a
continuous game with one additional stage).
If we gave player 2 the strategy s2(t1) = t1, she would use it!
This example suggests that, because certain strategies can be particularly important for
players, we should include strategies (not just actions) in our finite approximations.
7
Problems of spurious signaling in naïve finite approximations
Example 3: Nature chooses {1,2}, p() = /3. Player 1 observes t1= and chooses
a1[0,1]. Player 2 observes t2 = (a1) and chooses a2{1,2}. Payoffs are as follows.
=1
=2
a2 = 1
(1,1)
(1,0)
a2 = 2
(0,0)
(0,1)
1’s payoff should be 0. Otherwise, in any Nash equilibrium (similarly for epsilon-Nash
equilibrium), for some a1=x(0,1) in the support of 1’s strategy, a1=x2 is less than twice as
likely as x, and so player 2 chooses a2=2 after observing t2=x2; hence 1’s payoff is
Pr(a2=1|t2=x)/3. But choosing a1=(x)1/2 gives 1 at least 2Pr(a2=1|t2=x)/3; so 1’s payoff is 0.
But choose any finite partition of 2’s types where sequential rationality will be checked, and
consider filling in the players’ pure strategy sets.
For any finite set F of pure strategies (actions) for player 1, we can always add one more
pure strategy a1 = x such that 0<x<1 and (x)1/2 is not in F. And we can add to player 2’s
finite set of strategies the pure strategy s2 such that s2(t2) = 1 iff (t2)1/2F{x}; so s2(x)=2.
In this approximating game with finitely many strategies, we find a subgame-perfect
equilibrium in which player 1 chooses x, player 2 applies s2, and player 1’s payoff is 1/3.
Taken together, the examples indicate that attempting to approximate the true game with a
finite game is unreliable because it can produce unwanted equilibria.
(Harris-Stinchcombe-Zame 2000 provide many excellent additional examples of this kind.)
Thus we define optimality for a player relative to the player's entire set of strategies.
But we use finite partitions of type spaces to define where sequential rationality is checked.
8
Spurious miscoordination: the need for uniform sequential rationality
Example 4: There are two teams and a referee. The odd team consists of players 1 and 3. The
even team consists of players 2 and 4. Players 1 and 2 each choose an action from [0,1] at
date 1. Players 3 and 4 each choose an action from [0,1] at date 2. Each date 2 player
observes the date 1 choice of his teammate. The referee, player 5, is equally likely to observe
the choices of team 1 or of team 2, but he does not know which team’s choices he observes.
After the observation, player 5 chooses an action from {0,1}, which determines the common
payoff of the two players whose choice he observed. All other players receive a payoff of
zero, including the referee.
It should not be possible for each of the two players on the odd team to receive an expected
payoff close to 1/2 without each of the two players on the even team also being able to do so.
However, for any finite partition of the players’ type spaces, let x and y be distinct actions in
some common element of player 4’s partition. Consider strategies that give positive
probability to all partition elements and that also satisfy the following.
Players 1 and 3 place most of their probability weight on, respectively, a1=x and s3(t3)=x t3,
and players 2 and 4 place most of their probability weight on, respectively, a2=y and s4(t4)=y
t4. Player 5’s strategy chooses action 1 iff he observes (x,x).
These strategies reach every element of the partition with positive probability and are
sequentially rational there. Nevertheless, these strategies are nonsensical.
The problem is that the strategies are permitted to vary with the partition, so player 3's
sequential rationality is never checked on the sets where player 4 finds opportunities to win.
For fixed strategies, we need to check sequential rationality uniformly in observable events.
9
The need to perturb Nature as well as players' strategies for positive probabilities
Example 5: Player 1 chooses a11{L,R}. If a11= L, then Nature chooses [0,1] uniformly,
whereas if a11= R, then player 1 moves again and chooses a12[0,1]. In either case, player 2
moves next. Player 2 observes t2 =  if a11= L and observes t2 = a12 if a11= R, and then
chooses a2{L,R}.
Payoffs (battle of the sexes) are as follows.
a2 = L
a11 = L (1,2)
a12 = R (0,0)
a2 = R
(0,0)
(2,1)
All BoS equilibria are reasonable (the choice,  or a12, from [0,1] is payoff irrelevant).
However, if all events that can have positive probability under some strategies must
eventually receive positive probability along a sequence (or net) for “consistency,” then
the only possible equilibrium payoff is (2,1).
Indeed, for any x[0,1], the event t2 = x can have positive probability, but only if positive
probability is given to the strategy (a11= R, a12= x), because  = x has probability 0.
So, in any strategy assigning positive probability to t2 = x, player 2’s strategy must choose
a2= R when t2 = x by sequential rationality. But then player 1 can obtain a payoff of 2 with
the strategy (a11= R, a12= x) and so the unique equilibrium outcome is (2,1)!
To allow other equilibria we should perturb Nature.
10
The need to perturb the coordinates of Nature's state independently
Example 6 (Harris-Stinchcombe-Zame 2000): Nature chooses (,){–1,1}[0,1]. The
coordinates are independent and uniform. Player 1 observes t1 =  and chooses a1[0,1].
Player 2 observes t2 = |a1 – | and chooses a2{–1,0,1}. Payoffs are:
u1(,,a1,a2) = – |a2|, u2(,,a1,a2) = – (a2 – )2 .
Thus, player 2 should choose a2 to equal her expected value of , and player 1 wants to hide
information about  from 2.
Player 1 can hide information about  from player 2 by using the strategy s1() = .
Hence, the most natural equilibrium gives player 1 a payoff of zero. It is the only sequential
equilibrium if [0,1] is approximated by a discrete set both for Nature and for player 1.
However, consider the strategy profile s1() = 1; s2(t2) = 1 if t2 ≥ 0, and s2(t2) = –1 if t2 < 0.
This profile is sequentially rational at every t2 except t2 = 0 and gives player 1 a payoff of -1.
To verify whether s2(0)=1 could be sequentially rational here, we need to consider perturbed
scenarios where player 1 or nature might behave differently with small probability.
The question is whether E(|a1=) could ever be different from 0 in such a scenario.
This could happen if we perturbed (,) jointly to equal (1,1) with some probability >0.
Then learning a1= would cause 2 to believe that =1, making s2(0)=1 sequentially rational.
But this perturbation violates the structural independence of  and  in the given game.
Thus, we should only consider perturbations of Nature that treat separate coordinates of the
state independently.
As long as a1 and  are both independent of , we must have E(|a1=) = 0.
11
Problems from allowing large perturbations of Nature even with small probability
Example 7 : Nature chooses (11,12) [0,1]2. With probability ½, the coordinates are
independent and uniform on [0,1], and with probability ½ the coordinates are equal and
uniform on [0,1]. Player 1 observes t1 = 11 and chooses a1{-1,1}. Player 2 observes t2 = a1
and chooses a2{–1,1}. Payoffs are:
u1(11,12,a1,a2) = a1a2, u2(11,12,a1,a2) = a2(1/3– |11-12|).
Thus, player 2 should choose a2 = 1 if she expects |11-12| to be less than 1/3 and she should
choose a2 = -1 otherwise. Player 1 wants to choose an action that player 2 will match.
Since for every 11, 12 is equally likely to be equal to 11 (in which case |11-12| =0) as to be
uniform on [0,1] (in which case |11-12| ≤ ½), player 2 should expect |11-12| to be no
greater than ¼, regardless of 1’s strategy. So player 2 should choose a2 = 1.
Thus all sensible equilibria involve strategies that give probability 1 to (a1,a2) = (1,1).
But consider the strategy profile (s1,s2) where s1(11) = -1 iff 11 ≠ 1, and s2(a1) = 1 iff a1 = -1.
This profile gives probability 1 to (a1,a2) = (-1,1), and is supported in an epsilon-sequential
equilibrium by the perturbation of Nature that never perturbs 12 but that with probability
epsilon perturbs the distribution of 11 so that it is a mass point on 11 = 1. With this
perturbation of Nature it is sequentially rational for player 2 to choose a2 = -1 when she
observes a1 = 1 because she attributes this observation to 11 being a mass point on 1 and
therefore expects the value of |11-12| to be ½.
This difficulty can be eliminated if Nature’s states can be perturbed only to nearby states.
12
-close perturbations of Nature
Arbitrary perturbations of Nature could have dramatic effects on the players’ information.
To avoid this, we consider here only perturbations such that separate coordinates of Nature’s
dated states (in k = jJkj for each date k) are independently perturbed, so that the
perturbation of one coordinate adds no information about any other coordinate.
In close perturbations, states can be changed only slightly and only with small probability.
Let kj be the set of regular conditional probabilities from kj to (kj), and let
 = k≤K,jJkj denote the set of all coordinatewise-independent perturbation mappings.
Any  in  induces a perturbation of Nature p where, for each period k, (p)k is the
regular conditional probability from <kA<k to (k) such that for any B = jJ Bj  k,
[p]k(B|<k,a<k) = ∫k jJkj(Bj|kh) pk(dk|<k,a<k).
Here  denotes multiplication, as state coordinates are independently perturbed.
Say that p’ is an -close perturbation of p if there is   such that p’ =p and
kj({jh}|kh)  1 and kj({’kj: dkj(’kj,kj) < }|kj) =1, kjkj, j, k.
Remark. Given , the -close perturbation p’ of p works as follows. At each date k a provisional
state k=(kj)jJ is drawn according to pk(|<k,a<k), where <k is the actual (not provisional) vector of
past dated-states. Then, for each coordinate jJ, the actual jth coordinate is drawn independently
according to kj(|kj) which assigns probability 1- to the event that the actual jth coordinate of the date
k state is equal to its provisional value kj, and which assigns probability 0 to anything farther than 
from the provisional value. Thus, for each date, for each coordinate of Nature’s state, for any past
history, p’ generates a state-coordinate value which may deviate slightly from the value generated by p,
but such deviations must be unlikely and must be independent across coordinates.
13
-sequential equilibria
Say that riSi is a date-k continuation of siSi if rij = sij for all dates j < k.
We say that s’ is an -small perturbation of the strategy s in S iff there exists some ŝ in S
such that s’ = (1-)s+ŝ, that is, sik(|tik) = (1)sik(|tik)+ŝik(|tik) ik, tik.
For any  > 0, say that sS is an -sequential equilibrium of  if for every F that is a finite
subset of C() containing at most 1/ events, there exist some p’ that is an -close perturbation
of p and some s’ that is an -small perturbation of s, such that ikL,C F Cik(),
1. P(C|s’,p’) > 0, and
2. Ui(ri,s’i|C,p’)  Ui(s’|C,p’) + , for any ri that is a date-k continuation of s’i.
Note. Changing i’s choice only at dates j ≥ k does not change the probability of i's types at k,
so P(C|ri,s’i,p’) = P(C|s’,p’) > 0.
Thus as  becomes small, a strategy profile can satisfy -sequential rationality if, for any
arbitrarily large collection of events that are observable with arbitrarily small probability,
there is some arbitrarily small perturbation of the strategy profile and some arbitrarily close
perturbation of nature such that the players come arbitrarily close to maximizing their
conditional expected utilities on all their observable events in the given collection.
14
Sequential equilibrium distributions in the limit.
Recall that QK(|s) is the (countably additive) probability measure over A induced by s,
and Y denotes the set of measurable subsets of A.
Call a finitely additive probability measure : Y  [0,1] a pointwise sequential equilibrium
distribution of  iff for every  > 0 and every finite subset F of Y, there is an -sequential
equilibrium s such that Y F, |(Y)-QK(Y|s)| < .
(Example 1 shows that a sequence of -sequential equilibria may, in the limit, yield a
distribution that is only finitely additive.
For any >0, the events that player 1's action is in {a1: <a1<0} or in {a1: 0<a1<} must
each have probability 1/2.)
Now suppose that  and A each have a topology, and let M() and M(A) be the Borel sets.
Then Y ={Borel subsets of A}.
A countably additive probability measure : Y  [0,1] is a weak* sequential equilibrium
distribution of  iff there is a sequence, s, of -sequential equilibria such that QK(|s) weak*
converges to .
(For Example 1, the weak* sequential equilibrium distribution would put probability 1 on
player 1 choosing a1=0. But at date 2, the actions of players 3 and 4 will still be perfectly
correlated, both choosing L or R with probability 1/2 independently of 2's action.)
15
Regular multi-stage games with projected types
Let  = (N,K,A,T,J,,d,M,p,,u) be a multi-stage game (with perfect recall ).
 is a regular game with projected types if there are sets Aikj such that, for every ikL:
(R.1) Aik = jJAikj,
(R.2) kj and Aikj are nonempty compact metric spaces jJ (with all spaces, including
products, given their Borel sigma-algebras),
(R.3) ui: A ℝ is continuous,
(R.4) there is a continuous nonnegative density function fk:≤k A<k  [0,∞),
and for each j in J, there is a probability measure kj(kj) such that
pk(B|<k,a<k) = ∫B fk(k|<k,a<k)k(dk) BM(k), (<k,a<k)<kA<k ,
where k = jJkj is a product measure.
(R.5) there exist sets M1ik  {1,…,k}J and M2ik  N {1,…,k-1} J, such that
ik(≤k,a<k) = ((hj)hjM1ik, (anhj)nhjM2ik) (≤k,a<k); that is, i's type at k is just a list of state
coordinates and action coordinates from periods up to k.
Remarks:. (1) One can always reduce the cardinality of J to (K+1)|N| or less by grouping, for
any ikL, the variables {aikj}jJ and {kj}jJ according to the |N|-vector of dates at which
each player observes them, if ever.
(2) Regular multi-stage games with projected types can include all finite multi-stage games
(simply by letting each player's type be a coordinate of the state).
(3) Since distinct players can observe the same kj, regular multi-stage games with projected
types need not satisfy the Milgrom-Weber (1985) absolute continuity condition.
16
Existence Theorems
Theorem 1. In all finite multi-stage games, a strategy is part of a Kreps-Wilson sequential
equilibrium iff it is the limit, as  tends to zero, of a sequence of -sequential equilibria (as
defined here), and the sets of pointwise and weak* sequential equilibrium distributions here
both coincide with the set of Kreps-Wilson sequential equilibrium distributions.
Theorem 2. Consider any multi-stage game in which all state and action spaces are Polish,
all action spaces are compact, and all spaces are given their Borel sigma algebras. If for
every date k, pk(|<k,a<k) is weak* continuous in (<k,a<k), then a weak* sequential
equilibrium distribution exists if and only if an -sequential equilibrium exists for all  > 0.
Theorem 3. In all regular multi-stage games with projected types, the set of weak*
sequential equilibrium distributions is nonempty.
Theorem 4. A pointwise sequential equilibrium distribution exists if an -sequential
equilibrium exists for every >0, which is true if the game is regular with projected types.
(The proof applies Tychonoff’s theorem to the compact product space [0,1]Y.)
17
References
David Kreps and Robert Wilson (1982): "Sequential Equilibria," 50:863-894.
Drew Fudenberg and Jean Tirole (1991): "Perfect Bayesian and Sequential Equilibrium,"
Journal of Economic Theory 53:236-260.
Christopher J. Harris, Maxwell B. Stinchcombe, and William R. Zame (2000): "The Finitistic
Theory of Infinite Games," UTexas.edu working paper.
http://www.laits.utexas.edu/~maxwell/finsee4.pdf
Christopher J. Harris, Philip J. Reny, and Arthur J. Robson (1995): "The existence of
subgame-perfect equilibrium in continuous games with almost perfect information,"
Econometrica 63(3):507-544.
John L. Kelley, General Topology (Springer-Verlag, 1955).
Milgrom, P. and R. Weber (1985): "Distributional Strategies for Games with Incomplete
Information," Mathematics of Operations Research, 10, 619-32.
18
Download