working paper
department of economics

LEARNING MIXED EQUILIBRIA

Drew Fudenberg
David M. Kreps

No. 92-13
June 1992

massachusetts institute of technology
50 memorial drive
Cambridge, mass. 02139
Abstract

This paper explores learning models in the spirit of the method of fictitious play. We extend the previous literature by generalizing the classes of behavior rules, and by considering games with more than two players. Most importantly, we reformulate the study of convergence to mixed-strategy equilibrium, using Harsanyi's notion of perturbed games.

Keywords: game theory, mixed strategies, learning
JEL Classification: C72, D83
LEARNING MIXED EQUILIBRIA [*]

Drew Fudenberg
Department of Economics, Massachusetts Institute of Technology

and

David M. Kreps
Graduate School of Business, Stanford University, and Department of Economics, Tel Aviv University

June 1992
1. Introduction

Nash equilibrium describes a situation in which players have identical and exactly correct beliefs about the strategies each player will choose. How and when might the players come to have correct beliefs, or at least beliefs that are close enough to being correct that the outcome corresponds to a Nash equilibrium? One explanation is that the players play the game over and over, and that their beliefs come to be correct as the result of learning from past play. This explanation has been explored at some length in the recent literature, in models that take a number of different forms and that stress different aspects of the problem. [1]

This paper explores learning models that are in the spirit of the model or method of fictitious play (Brown, 1951; Robinson, 1951), in which players choose their strategies to maximize their current period's expected payoff on the assumption that their opponents will play each strategy with probability equal to its historical frequency.

[*] We are grateful to Robert Anderson, Robert Aumann, Hugo Hopenhayn, David Levine, Marco Li Calzi, Ramon Marimon, Tom Sargent, and Jose Scheinkman for helpful comments. We would like to thank IDEI, Toulouse and the Institute for Advanced Studies, Tel-Aviv University, for their hospitality while this research was being conducted. The financial assistance of the National Science Foundation (Grants SES 88-08204, SES 90-08770 and SES 89-08402) and the John Simon Guggenheim Foundation is gratefully acknowledged.

[1] To provide a guide to the literature would take too long, so we will be content with a partial list of recent references: Canning (1991), Crawford and Haller (1990), Eichberger, Haller, and Milne (1991), Ellison (1991), Fudenberg and Kreps (1988), Fudenberg and Levine (1992), Hendon et al. (1991), Jordan (1991), Kalai and Lehrer (1990), Kandori, Mailath, and Rob (1991), Milgrom and Roberts (1990, 1991), Nyarko (1991), and Young (1991). We will comment on some of these papers as we proceed, when they bear directly on our own analysis.
We extend the previous literature in three ways: (1) We provide some minor extensions to the basic model of fictitious play, by generalizing the classes of rules by which players form their beliefs and use them to choose their actions. (2) We study models in the spirit of fictitious play for games with more than two players. (3) Most importantly, we reformulate the study of convergence to mixed-strategy equilibria. We argue that the notion of convergence used previously in the literature, that the empirical marginal distributions converge, is not an appropriate notion of what it means to play a mixed-strategy profile, and we suggest and analyze the stronger criterion of the convergence of intended behavior. We show that all Nash equilibria and only Nash equilibria are possible limit points under this mode of convergence. Finally, we investigate the global stability of mixed equilibria in the setting of Harsanyi's (1973) purification theorem.
Section 2 gives a general formulation of learning in a strategic-form game. This formulation, and our subsequent analysis, supposes that the same players play each other repeatedly (as opposed to a model with a large number of player 1's, player 2's, etc.) and that in each round of play, players observe the (pure) strategies chosen by their rivals. Section 3 reviews the model of fictitious play for two players. The questions addressed separate into two groups. First, if fictitious play "settles down" or converges in some appropriate sense, what are the possible limit points? Second, is fictitious play guaranteed to converge? With regard to the first question, recall that there are two modes of convergence in the literature on fictitious play (and models of learning in general). In the first, there is a finite time $T$ such that a single strategy profile is played in every period from $T$ on; it is easy to see that any such profile must be a Nash equilibrium. In the second mode of convergence, play cycles among different strategy profiles in a way such that the empirical frequencies of each player's choices converge to some (mixed) strategy. The corresponding strategy profile is a (mixed) equilibrium; this is the traditional sense in which fictitious play is said to converge to mixed-strategy equilibria. Section 3 briefly reviews results from the literature that use these two convergence notions.
In Section 4, we generalize fictitious play by considering more general assumptions about the ways in which players construct their assessments and then choose their immediate actions. We assume throughout that players' choices of actions are asymptotically myopic; i.e., in the long run, players choose in a way that maximizes their immediate payoffs. This assumption requires some explanation and rationale, which we will provide. As for players' assessments, if they are adaptive (following Milgrom and Roberts [1991]), then if intended play converges to a pure-strategy profile, the profile must be a Nash equilibrium, for any number of players. But more is required if the second form of convergence, the convergence of empirical frequencies to a mixed-strategy profile, is to have only Nash equilibria as limit points. A sufficient condition is that assessment rules are asymptotically empirical, which means that players' assessments converge together with empirical frequencies. Moreover, this condition suffices only for two-player games.
Section 5 presents several objections to convergence of empirical frequencies as an appropriate mode of convergence for learning to play a mixed-strategy profile. In summary, these objections are: (1) Although assessments are converging (if they are asymptotically empirical), the strategies that are chosen are not. (2) In examples, correlations will be observed (over time) in the actions of players who choose their actions independently. (3) Because of (2), convergence in this sense is problematic for games with more than two players.

For these reasons, in Section 6 we propose a stronger mode of convergence, namely convergence of (intended) behavior. This raises some technical difficulties: When players use mixed strategies, the realized distribution of play need not equal the intended one, which makes notions of convergence inherently probabilistic. These difficulties are attended to, and then we show that only Nash equilibria are possible limit points under this mode of convergence, as long as behavior is asymptotically myopic and assessments are asymptotically empirical. Moreover, for any game and any Nash equilibrium of the game, there is a model of asymptotically myopic behavior and asymptotically empirical assessments for which the equilibrium is a limit point of intended behavior with probability as close to one as desired. [2] These results are not limited to two-player games.

[2] To the extent that some Nash equilibria seem unreasonable, such as those where players use weakly dominated strategies, this last result indicates that our assumptions are too weak. This will be discussed as well in Section 6.
The problem with this alternative mode of convergence is that, while convergence to mixed behavior is possible, it is hard to see why it should occur. The difficulty is the standard one with mixed strategies: If, based on their assessments, players choose their actions to maximize precisely their expected payoffs, then (unless their assessments are precisely those of the mixed equilibrium) their intended behavior will not converge. If players are not restricted to precise maximization, then any behavior (that puts weight on actions in the equilibrium mixture) will be satisfactory. Something outside of payoff-maximization considerations is required to lead players to the precise mixtures needed in the equilibrium. In our basic formulation, we see no natural way of doing this. As a way around this problem, in Sections 7 and 8 we consider learning in games in which each player's payoff is subject to a sequence of i.i.d. random shocks that are observed only by that player, as suggested by Harsanyi's (1973) purification theorem. In this context, a mixture over two strategies does not correspond to mixing by a player who is indifferent, but rather to a player who (in each period) strictly prefers one of the two strategies, depending on the realization of the player's payoff perturbation. Yet from the perspective of the other players, who do not know the precise value of this period's perturbation for the first player, the actions of the first player are random. We show in Section 7 that in this context all of our earlier results go through without difficulty. Then, in Section 8, we specialize to the class of 2 x 2 games with a unique equilibrium in mixed strategies, and we show that any learning process that is close to fictitious play (in a sense to be made precise) will converge with probability one to the unique equilibrium. Here we use and adapt results from the theory of stochastic approximation (Arthur et al., 1987; Kushner and Clark, 1978; and Ljung and Soderstrom, 1983).
Before setting out, let us note that the results given here are only a small part of the overall story. Among other things, we are assuming that players encounter the same opponents repeatedly and yet act myopically, a fairly unsavory combination (see Section 4); that they observe the full (stage-game pure) strategies chosen by rivals in each round of play; and that they play the same game over and over. We hope to return to each of these three simplifying assumptions in subsequent work.
2.
Formulation
Fix an /-player, finite, strategic- form
game. The players are indexed
players;
set of
'
i.e.,
-i = {1,2,...
pure strategies
Thus our work
is
,1
-
i
game, hereafter referred
= 1,2,...,/, and we
l,i
+ l,...,/}
(or actions) for player
similar to that of Marcet
.*
i ,
Let
let
S
let
{
,
i
to as the stage
-2'
denote the "other"
=
...,/, be the finite
5= S x
1,
1
.
.
.
x
S 1 be
the set of
and Sargent (1989a, 1989b) on learning rational
expectations equilibrium.
j
We use male pronouns for players in general, and for players numbered 2, 4, 6, etc. and
We use female pronouns for players numbered 1,3, etc., and for players lettered
and -j
:'
.
lettered
and k.
.
pure-strategy profiles, and let $u^i : S \to \mathbf{R}$ give player $i$'s payoffs. In the usual fashion, let $\Sigma^i$ be the set of mixed strategies for player $i$, let $\Sigma = \Sigma^1 \times \cdots \times \Sigma^I$ be the set of mixed-strategy profiles, and extend the domain of $u^i$ to $\Sigma$. Also, for each $i$, let $\Sigma^{-i}$ denote the set of probability distributions over $S^{-i} = \prod_{j \neq i} S^j$, and for $s^i \in S^i$ and $\sigma^{-i} \in \Sigma^{-i}$, let $u^i(s^i, \sigma^{-i})$ denote the expected utility for $i$ if she chooses pure strategy $s^i$ and her rivals act according to the (possibly correlated) distribution $\sigma^{-i}$.
Imagine that these players play the game repeatedly, at dates $t = 1, 2, \ldots$. After each round of play, players observe the actual actions chosen by their opponents; i.e., if a player chooses his action using a mixed strategy, the pure strategy that is chosen is observed, but the mixing is not observed. A history of play up to time $t$ is then a string of (pure) strategy profiles $\zeta_t = (s_1, s_2, \ldots, s_{t-1})$, where $s_{t'} \in S$ for $t' = 1, \ldots, t-1$. The set of all histories of play up to time $t$ is $(S)^{t-1}$, denoted by $Z_t$; by convention, $Z_1$ is the (singleton) set consisting of the null history. The set of all possible infinite histories, $(S)^\infty$, will be denoted by $Z$, with typical element $\zeta = (s_1, s_2, \ldots)$.

The basic object of this paper is a model of learning and behavior, which specifies how the players behave and what they believe as time passes. A model of learning and behavior consists formally of two pieces, behavior rules and assessment rules for each player. We take these in turn.
Behavior rules

Insofar as possible, we follow the convention that subscripts refer to time and superscripts to players. We will denote by $\phi^i$ the behavior rule that player $i$ uses in the infinitely repeated game. That is, $\phi^i = (\phi^i_1, \phi^i_2, \ldots)$, where $\phi^i_t : Z_t \to \Sigma^i$. The notation $\phi_t$ (for a profile $(\phi^1_t, \ldots, \phi^I_t)$ of the players' date-$t$ behavior rules) and $\phi$ (for the profile of behavior rules) will also be used. When we write $(\cdot)^t$, we mean the usual $t$-fold Cartesian product of the argument within the parentheses. (We will try to avoid this as much as possible.)

Fix a profile of behavior rules $\phi$. Given any $t$ and $\zeta_t \in Z_t$, we can construct a conditional probability distribution (conditional on $\zeta_t$) over the path of play in the usual fashion: $\phi_t(\zeta_t)$ gives the conditional distribution of $s_t$ (the actual play at date $t$); $\phi_{t+1}$ applied to $\zeta_{t+1} = (\zeta_t, s_t)$ gives the conditional distribution of $s_{t+1}$; and so on. These then give a probability distribution over the space of complete histories $Z$, which we can construct via the Kolmogorov extension theorem. This also gives us conditional probabilities on $Z$ given each $\zeta_t$; we will write $P(\cdot \mid \zeta_t)$ for this conditional probability, for a fixed profile of behavior rules.

One part of this construction must be emphasized. Given history $\zeta_t$, players randomize (in their behaviors) independently. That is, when we construct the conditional probability distribution of $s_t$ given $\zeta_t$, we insist that the joint probability that 1 plays $s^1$, 2 plays $s^2$, and so on, is given by

$P(s_t = (s^1, \ldots, s^I) \mid \zeta_t) = \phi^1_t(\zeta_t)(s^1) \times \cdots \times \phi^I_t(\zeta_t)(s^I).$
Assessments

To model the behavior rules of players, we will employ some ancillary formalisms. Specifically, we will want to speak of what each player assesses concerning the behavior of her rivals, at each date and contingent on each possible partial history. For the analysis in this paper, it will suffice to specify (for each player $i$) what $i$ believes her rivals will do in the round about to be played. Formally, for each $i$ and $t$, let $\mu^i_t$ denote a function with domain $Z_t$ and range $\Sigma^{-i}$, representing $i$'s assessment over the possible pure strategy profiles that her rivals will choose at date $t$, as a function of $\zeta_t$. We use $\mu^i$ to denote a system of assessments or assessment rule for $i$; i.e., $\mu^i$ is a sequence $(\mu^i_1, \mu^i_2, \ldots)$.

Note well that $\sigma^{-i} \in \Sigma^{-i}$ encodes more than $i$'s marginal assessments of her rivals' behavior; $i$ is allowed to make an assessment concerning the joint behavior of her rivals that admits correlations in their play. This may at first
seem troubling when contrasted with the independence assumption made in the earlier construction of the probability measures $P$. There is no conflict, however. The measure $P$ reflects the objective probability that governs the evolution of play, as a function of the behavior rules used by the players. We do not allow players to correlate their (mixed) strategies at any date, hence $P$ is constructed with independence at each date. The $\mu^i_t(\zeta_t)$, on the other hand, represent a player's subjective assessment of what her rivals will do.

For example, imagine a three-player game in which player 1 has a choice between pure strategies $a$ and $b$, and player 2 has a choice between $a'$ and $b'$. Imagine that player 3 thinks that player 1 mixes between $a$ and $b$ either with probabilities 3/4 and 1/4 or with probabilities 1/4 and 3/4, with the same mixing probabilities used at each date, irrespective of the history of play. That is, player 3 believes that player 1 is either using the behavior rule $\hat\phi^1$ given by $\hat\phi^1_t(\zeta_t)(a) = 3/4$ (meaning, for all values of $t$ and $\zeta_t$), or she believes that 1 uses $\tilde\phi^1$ given by $\tilde\phi^1_t(\zeta_t)(a) = 1/4$. Player 3 entertains similar beliefs about the behavior rule used by player 2; we use $\hat\phi^2$ and $\tilde\phi^2$ to denote the two possibilities. Moreover, player 3 believes that her rivals randomize at each date independently; i.e., if player 1 is using $\hat\phi^1$ and player 2 is using $\hat\phi^2$, then player 3 assesses that the probability of $(a, a')$ in any round is $(3/4)(3/4) = 9/16$, given any history $\zeta_t$. Specifically, player 3's prior belief is that 1 will use $\hat\phi^1$ and 2 will use $\hat\phi^2$ with probability .45; 1 will use $\hat\phi^1$ and 2 will use $\tilde\phi^2$ with probability .05; 1 will use $\tilde\phi^1$ and 2 will use $\hat\phi^2$ with probability .05; and 1 will use $\tilde\phi^1$ and 2 will use $\tilde\phi^2$ with probability .45. And, finally, imagine that 3 uses the sequence of observed play and Bayes' rule to update her beliefs about the behavior rules used by her rivals.

With these data, we can integrate out 3's beliefs about the joint behavior rule profile to find 3's assessment about what 1 and 2 will do at any date, given any history $\zeta_t$. It is evident that although 3 believes that 1 and 2 are randomizing independently, her initial uncertainty about what behavior rules they are using, together with the correlation in her initial beliefs about their behavior rule profile, implies that she will be making assessments about their play at each date that reflect correlation. Imagine that player 1 plays $a$ at some date; this makes it more likely that player 1 is using $\hat\phi^1$. And if we condition on player 1 using $\hat\phi^1$, the correlation in 3's prior makes it more likely that player 2 is using $\hat\phi^2$, which makes $a'$ more likely. Note as well that even though player 3 believes (with probability one) that her rivals do not change their behavior from date to date as a function of what happens in the course of play, her assessments $\mu^3_t(\zeta_t)$ very much depend on $\zeta_t$, since the history of play up to date $t$ gives player 3 information about what behavior rule profile her rivals are in fact using. Unless a player knows what behavior rules her rivals are using, correlation in her assessments can reflect her strategic uncertainty.
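To make the updating in this example concrete, the following sketch (ours, not the paper's; it uses the prior probabilities from the example above) computes player 3's one-step-ahead assessment by integrating out her beliefs over the four behavior-rule profiles, and then updates those beliefs by Bayes' rule after one observation. The printed joint probability of $(a, a')$ differs from the product of the marginals, exhibiting the correlation described in the text.

```python
# Player 3's assessment over her rivals' joint play (illustrative sketch).
from itertools import product

P1_A = {"hat": 0.75, "tilde": 0.25}   # P(1 plays a | 1's rule)
P2_A = {"hat": 0.75, "tilde": 0.25}   # P(2 plays a' | 2's rule)
prior = {("hat", "hat"): 0.45, ("hat", "tilde"): 0.05,
         ("tilde", "hat"): 0.05, ("tilde", "tilde"): 0.45}

def assess(belief):
    """Integrate out the behavior-rule profile to get a joint assessment."""
    joint = {}
    for (r1, r2), p in belief.items():
        for x1, x2 in product(("a", "b"), ("a'", "b'")):
            q1 = P1_A[r1] if x1 == "a" else 1 - P1_A[r1]
            q2 = P2_A[r2] if x2 == "a'" else 1 - P2_A[r2]
            joint[(x1, x2)] = joint.get((x1, x2), 0.0) + p * q1 * q2
    return joint

def bayes_update(belief, x1, x2):
    """Posterior over rule profiles after observing the play (x1, x2)."""
    post = {}
    for (r1, r2), p in belief.items():
        q1 = P1_A[r1] if x1 == "a" else 1 - P1_A[r1]
        q2 = P2_A[r2] if x2 == "a'" else 1 - P2_A[r2]
        post[(r1, r2)] = p * q1 * q2
    total = sum(post.values())
    return {k: v / total for k, v in post.items()}

mu = assess(prior)
marg1 = mu[("a", "a'")] + mu[("a", "b'")]
marg2 = mu[("a", "a'")] + mu[("b", "a'")]
print(mu[("a", "a'")], marg1 * marg2)          # 0.30 versus 0.25: correlation
print(assess(bayes_update(prior, "a", "a'")))  # assessment shifts toward (a, a')
```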
3. Fictitious play

The model of fictitious play (Brown, 1951; Robinson, 1951) can be viewed as a model of learning and behavior. First we give the details of fictitious play, and then we discuss its interpretation as a model of learning and behavior. In fictitious play, there are two players, i.e., $I = 2$. In this setting, we interpret $-i$ as "not $i$", i.e., $-i = 3 - i$ for $i = 1, 2$. Otherwise, the general setting is just as in Section 2.

The behavior and assessment rules are built up as follows.
(A) For each player $i$, strategy $s^{-i} \in S^{-i}$, and history $\zeta_t$, let $\kappa(\zeta_t)(s^{-i})$ be the number of times that $s^{-i}$ is played in the $t-1$ observations that comprise $\zeta_t$. [8]

(B) For each player $i$, there is an "initial weight" function $\eta^i : S^{-i} \to [0, \infty)$ such that $\sum_{s^{-i} \in S^{-i}} \eta^i(s^{-i}) > 0$.

(C) For each player $i$, date $t \geq 1$, history $\zeta_t$, and strategy $s^{-i} \in S^{-i}$, we define $\eta^i_t(\zeta_t)(s^{-i}) = \eta^i(s^{-i}) + \kappa(\zeta_t)(s^{-i})$. (Note that $\eta^i_1(\zeta_1)(s^{-i}) = \eta^i(s^{-i})$ for all $s^{-i}$.) Then player $i$'s assessment rule $\mu^i$ is given by normalizing the $\eta^i_t$; i.e.,

$\mu^i_t(\zeta_t)(s^{-i}) = \eta^i_t(\zeta_t)(s^{-i}) / \sum_{\hat s^{-i} \in S^{-i}} \eta^i_t(\zeta_t)(\hat s^{-i}).$

(D) For player $i$, at each date $t$ with history $\zeta_t$, $\phi^i_t(\zeta_t)$ is a maximizer of

$u^i(\sigma^i, \mu^i_t(\zeta_t)) \qquad (3.1)$

over all $\sigma^i \in \Sigma^i$.

[8] We do not bother to index $\kappa$ by $i$ or $t$, since the two arguments determine the length of the history and the player whose strategy is being counted.
In (D), we have not pinned down the definition of $\phi^i_t(\zeta_t)$ when there is more than one maximizer of (3.1). We do not say what it is, but we trust that most readers will have a particular prescription in mind for such cases (which, of course, can be a mixed strategy). Formally, we say that a model of learning and behavior is consistent with the model of fictitious play if there are initial weight functions $\eta^i$ such that (C) holds as a definition of the assessment rules $\mu^i$ and (D) holds as a condition on the behavior rules $\phi^i$. [9]

[9] We are grateful to Bob Aumann for convincing us of how important this is.
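As a concrete restatement of (A)-(D), here is a minimal sketch in code (ours, not the paper's); ties in (D) are broken by taking the first maximizer in a fixed ordering, which is one particular prescription of the kind the text allows. A simulation applying these rules to the games of Figures 3-1 and 3-2 appears after Proposition 3.2 below.

```python
# The assessment and behavior rules of fictitious play (illustrative sketch).

def assessment(eta, counts):
    """(A)-(C): normalize initial weights plus observed counts."""
    weights = {s: eta[s] + counts.get(s, 0) for s in eta}
    total = sum(weights.values())
    return {s: w / total for s, w in weights.items()}

def behavior(payoff, own_strategies, mu):
    """(D): a maximizer of expected payoff against the assessment mu.
    payoff[(s, r)] is the player's payoff from s when the rival plays r;
    ties are broken by the ordering of own_strategies."""
    def expected(s):
        return sum(p * payoff[(s, r)] for r, p in mu.items())
    return max(own_strategies, key=expected)
```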
We trust that most readers will be familiar with the model of fictitious play, but it may help the uninitiated to give a simple example. Imagine two players who repeatedly play the strategic-form game in Figure 3-1, with player 1 choosing a row and player 2 choosing a column. We assume that the game begins with the players holding "beliefs" $\eta^1 = (1, 0, 4.32)$ and $\eta^2 = (3, 5.7)$, where we write these functions as vectors with the understanding that the first component of $\eta^1$ corresponds to column 1, the second component to column 2, and so on.
             Column 1    Column 2    Column 3
   Row 1      5, 1        8, 4.7      2, 3
   Row 2      2, 3        2, 1        4, 2

Fig. 3-1. A strategic-form game. (Player 1 chooses the row, player 2 the column; the first number in each cell is player 1's payoff.)
Refer to the first three lines of Table 3-1, which are labeled round number 1. The first line gives data for player 1: first her relative beliefs about what player 2 will do in the first round (i.e., the vector $\eta^1$), and next the expected payoffs she will accrue, given those beliefs, if she chooses row 1 and then if she chooses row 2. Row 2 gives the higher expected payoff, and that is written down as her choice. Similarly, the line below gives the corresponding data for player 2.
Table 3-1. An example of fictitious play. (Expected payoffs are computed against normalized beliefs.)

Round number 1
  Player 1: "Beliefs" about rival (1, 0, 4.32); expected payoffs 2.56 (row 1), 3.62 (row 2); choice: row 2
  Player 2: "Beliefs" about rival (3, 5.7); expected payoffs 2.31 (column 1), 2.28 (column 2), 2.34 (column 3); choice: column 3

Round number 2
  Player 1: (1, 0, 5.32); expected payoffs 2.47, 3.68; choice: row 2
  Player 2: (3, 6.7); expected payoffs 2.38, 2.14, 2.31; choice: column 1

Round number 3
  Player 1: (2, 0, 5.32); expected payoffs 2.82, 3.45; choice: row 2
  Player 2: (3, 7.7); expected payoffs 2.44, 2.04, 2.28; choice: column 1

Round number 4
  Player 1: (3, 0, 5.32); expected payoffs 3.08, 3.28; choice: row 2
  Player 2: (3, 8.7); expected payoffs 2.49, 1.95, 2.26; choice: column 1

Refer next to the three lines labeled round number 2. Player 1's beliefs about the actions of player 2 are changed to reflect what happened in the first round. Since player 2 chose column 3 in the first round, the entry for column 3 in 1's beliefs is increased by 1; her beliefs are now $\eta^1_2 = (1, 0, 5.32)$. We recompute the expected payoffs of playing either row, using these reassessed beliefs, and we see that row 2 continues to be player 1's best choice. Player 2's beliefs are likewise changed to reflect player 1's choice of row 2 in the first round, and player 2 now finds that his best choice is column 1. That is, in the second round, row 2 and column 1 are chosen. This gives the beliefs for round number 3, and so on.

Fictitious play was not originally advanced as a model of how individuals would behave (and learn) when playing a game repeatedly; it was advanced instead as a method for computing Nash equilibria, or perhaps as a model of the preplay thought process of individual players.
The connection to repeated play will become clear in a bit. How well does fictitious play stand as a model of learning and behavior? The following two questions are raised immediately.
(1) Is there any particular sense in which the assessment rules $\mu^i$ are consistent with a model of how assessments are being formed? It can be shown that the assessments are consistent with a Bayesian model in which each player believes her rival is playing the same (unknown) mixed strategy in each round, independent of what came before, and where each player's prior assessment concerning this unknown mixed strategy has a Dirichlet distribution.

(2) Is it sensible or realistic to assume that players behave myopically, in the sense that, in each round, they choose a strategy that maximizes their immediate expected payoff given their assessments? (If each player believes that his rival's play is as posited in (1), then myopic behavior is warranted.) Our answer to this question will be discussed in Section 4; for now we accept the model as, at least, a possibility, and we follow out its long-run implications.

One can follow the play in the example of Figure 3-1 and Table 3-1. If we follow this out, then in round 8 play reaches the profile row 1-column 2, which is a strict Nash equilibrium of the stage game. Play then "gets stuck" at this pure-strategy profile: increasing the weight on column 2 in player 1's assessment and increasing the weight on row 1 in player 2's assessment only increase the optimality of row 1 and column 2, respectively. In general, under fictitious play, if a strict Nash equilibrium profile is played in some round, all subsequent play will be that strategy profile. Or, speaking very loosely, strict Nash equilibria are absorbing for play according to the model of fictitious play. A related observation is the following.

Proposition 3.1. Suppose that in some history generated by fictitious play, a particular pure-strategy profile is played for all but a finite number of periods. Then that strategy profile must be a Nash equilibrium.
We refrain from giving the proof, which is an easy corollary to Proposition 4.1, proved later. Thus we see one possibility: play might "stick" at some pure-strategy profile; if so, this profile must be a Nash equilibrium. (It goes almost without saying that judicious choice of the initial weight functions will allow fictitious play to stick at any strict equilibrium. Depending on how ties are broken, this is true as well of any equilibrium, even those in weakly dominated strategies.)
Proposition 3.1 implies that fictitious play cannot converge to a single pure-strategy profile in games that have no pure-strategy equilibria. Note that fictitious play may fail to lock on to a single pure-strategy profile even in games that do have pure-strategy Nash equilibria. For example, take the game in Figure 3-1 and change the entry 4.7 to a 4, giving the game in Figure 3-2. Row 1-column 2 is still a strict Nash equilibrium. Begin fictitious play with the same initial weight vectors as before, and it turns out that column 2 will never be played. Instead, play "cycles" around the best-response cycle row 1-column 1 to row 1-column 3 to row 2-column 3 to row 2-column 1 to row 1-column 1, where "cycles" is in quotes because the periods of the cycles increase through time. In the limit, however, the relative frequencies of play of the various strategies converge. That is, in the limit player 1 plays row 1 one-third of the time and row 2 two-thirds of the time, and player 2 plays column 1 two-fifths of the time and column 3 three-fifths of the time. Hence the players' beliefs about what each other will be playing converge to the corresponding mixed strategies. It is straightforward to see that these mixed strategies constitute a mixed Nash equilibrium. More generally, we have:
Proposition 3.2. Suppose that in some history generated by fictitious play, the empirical frequencies of pure-strategy choices converge to some (mixed) strategy profile. Then that strategy profile is a Nash equilibrium.

The proof is omitted for now; this is a corollary of Proposition 4.2. Note that this proposition implies Proposition 3.1 as a special case.
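Both long-run behaviors are easy to check numerically. The following self-contained sketch (our illustration, not part of the paper) runs fictitious play from the initial weights of the example. With the Figure 3-1 payoffs, the simulated choices match Table 3-1 and play locks onto row 1-column 2 from round 8 on; with the Figure 3-2 payoffs (the 4.7 replaced by 4), column 2 is never played and the empirical frequencies approach (1/3, 2/3) over the rows and (2/5, 0, 3/5) over the columns.

```python
# Fictitious play on the games of Figures 3-1 and 3-2 (illustrative check).

def simulate(u1, u2, eta1, eta2, rounds):
    """u1[r][c]: player 1's payoff; u2[c][r]: player 2's payoff.
    Beliefs need not be normalized to find a best reply."""
    w1, w2 = list(eta1), list(eta2)   # weights over columns and over rows
    counts = {}
    for _ in range(rounds):
        r = max(range(2), key=lambda r: sum(u1[r][c] * w1[c] for c in range(3)))
        c = max(range(3), key=lambda c: sum(u2[c][x] * w2[x] for x in range(2)))
        w1[c] += 1                    # player 1 observes the column played
        w2[r] += 1                    # player 2 observes the row played
        counts[(r, c)] = counts.get((r, c), 0) + 1
    return counts

u1 = [[5, 8, 2], [2, 2, 4]]
u2_fig31 = [[1, 3], [4.7, 1], [3, 2]]
u2_fig32 = [[1, 3], [4.0, 1], [3, 2]]   # Figure 3-2: the 4.7 becomes a 4

print(simulate(u1, u2_fig31, [1, 0, 4.32], [3, 5.7], 1000))
# (row 1, column 2) in every round from round 8 on.
print(simulate(u1, u2_fig32, [1, 0, 4.32], [3, 5.7], 100000))
# Column 2 unplayed; rows near (1/3, 2/3), columns 1 and 3 near (2/5, 3/5).
```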
It is natural to ask whether, in every game and for every set of initial conditions, convergence at least in the sense of Proposition 3.2 will take place under fictitious play. There are entirely trivial reasons why convergence may fail, connected with the way in which ties (among optimal strategy choices) are broken. However, if due care is taken in dealing with ties, then it is known that convergence in this sense is ensured for zero-sum games (Robinson, 1951) and for (nontrivial) two-by-two games (Miyasawa, 1961). For general games, however, convergence is not ensured; the first example is given in Shapley (1964).
4. Extensions of fictitious play

One problem with the model of fictitious play is its very rigid, ad hoc specification. Assessments are formed according to the empirical frequencies of past play (up to the initially given weight vectors), and actions are chosen to maximize precisely immediate expected payoffs. Neither part of this specification is essential to the results given above; we can obtain similar results for a broad class of models of learning and behavior. In this section, we present some results of this sort.
Myopic behavior

Definitions. Given an assessment rule $\mu^i = (\mu^i_1, \mu^i_2, \ldots)$ for player $i$, we say that the behavior rule $\phi^i$ is myopic relative to $\mu^i$ if, for every $t$ and $\zeta_t$, $\phi^i_t(\zeta_t)$ maximizes $i$'s immediate expected payoff given assessment $\mu^i_t(\zeta_t)$. That is,

$u^i(\phi^i_t(\zeta_t), \mu^i_t(\zeta_t)) = \max_{s^i \in S^i} u^i(s^i, \mu^i_t(\zeta_t)).$

The behavior rule $\phi^i$ is asymptotically myopic relative to $\mu^i$ if, for some sequence $\{\epsilon_t\}$ of strictly positive numbers with limit zero, for every $t$ and $\zeta_t$, $\phi^i_t(\zeta_t)$ comes within $\epsilon_t$ of maximizing $i$'s immediate expected payoff given assessment $\mu^i_t(\zeta_t)$. That is,

$u^i(\phi^i_t(\zeta_t), \mu^i_t(\zeta_t)) + \epsilon_t \geq \max_{s^i \in S^i} u^i(s^i, \mu^i_t(\zeta_t)).$

The behavior rule $\phi^i$ is strongly asymptotically myopic relative to $\mu^i$ if, for some sequence $\{\epsilon_t\}$ of strictly positive numbers with limit zero, for every $t$ and $\zeta_t$, every $s^i$ in the support of $\phi^i_t(\zeta_t)$ comes within $\epsilon_t$ of maximizing $i$'s immediate expected payoff given assessment $\mu^i_t(\zeta_t)$. That is,

$u^i(s^i, \mu^i_t(\zeta_t)) + \epsilon_t \geq \max_{\hat s^i \in S^i} u^i(\hat s^i, \mu^i_t(\zeta_t))$ for all $s^i$ in the support of $\phi^i_t(\zeta_t)$.
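For concreteness, here is one strongly asymptotically myopic behavior rule, written as code (our sketch; the sequence $\epsilon_t = 1/t$ is an arbitrary choice satisfying the definition). It randomizes uniformly over all pure strategies within $\epsilon_t$ of the best available expected payoff, so every strategy in its support is nearly optimal. A rule that instead mixed in a grossly suboptimal strategy with probability $1/t$ would be asymptotically myopic but not strongly so.

```python
# A strongly asymptotically myopic behavior rule (illustrative sketch).

def nearly_best_replies(payoff, mu, t):
    """payoff[(s, r)]: own payoff from s against rival play r;
    mu: assessment over rival strategies; epsilon_t = 1/t shrinks to zero.
    Returns a mixed strategy supported on epsilon-optimal pure strategies."""
    own = sorted({s for (s, _) in payoff})
    expected = {s: sum(p * payoff[(s, r)] for r, p in mu.items()) for s in own}
    best = max(expected.values())
    support = [s for s in own if expected[s] + 1.0 / t >= best]
    return {s: 1.0 / len(support) for s in support}
```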
Note that in asymptotically myopic behavior, the player can use slightly suboptimal pure strategies with large probability, or he can use grossly suboptimal pure strategies with small probability, or both, as long as the "average" suboptimality, averaged according to the probabilities with which the pure strategies are played, is small enough. In strong asymptotic myopia, grossly suboptimal pure strategies cannot be used at all.

We will work throughout with models of learning and behavior in which behavior is at least asymptotically myopic with respect to the assessment rules. Even this less restrictive assumption has one feature that is potentially troublesome: It implicitly supposes that players do not try (asymptotically) to influence the future play of their opponents. To see this, consider the game in Figure 4-1, and imagine that player 2 selects actions according to the model of fictitious play. In this game, row 2 is dominant for player 1, and so if player 1's behavior is asymptotically myopic for any assessment rule, she will play row 2 eventually. Since player 2 uses the assessment and behavior rules of fictitious play, he will eventually choose column 1; play converges to the pure-strategy equilibrium row 2-column 1. But if player 1 instead chooses row 1 each time, then player 2 would eventually choose column 2. If player 1 does not behave asymptotically myopically and discounts her payoffs with a discount factor close to one, this gives her a higher overall payoff. The point is a simple one. As long as player 2 is playing according to the model of fictitious play, player 1 can exploit this and manipulate 2's beliefs in order to receive her "Stackelberg leader" outcome (cf. Fudenberg and Levine, 1989).
             Column 1    Column 2
   Row 1      1, 0        3, 2
   Row 2      2, 1        4, 0

Fig. 4-1. A strategic-form game illustrating the possibility of Stackelberg leadership. (Player 1 chooses the row.)
In light of this example, our assumption of asymptotically myopic behavior requires some defense and explanation. We defend the assumption with stories that combine two justifications in varying proportions: First, even if a player's possible influence on opponents' future play is large, the player may discount the future sufficiently that the effects are unimportant. Second, even if the players are relatively patient, they may believe that their current action will have little, if any, effect on what their rivals do in the future. Suppose, for example, that player $i$ believes that her rivals choose actions in each period according to some fixed but unknown (and possibly mixed) strategy profile, which is not influenced by the actions of other players. Then what $i$ does today will not influence what her rivals do in the future. Moreover, because $i$ learns her rivals' actual play at each date regardless of what $i$ chooses, $i$'s immediate choice of action will not affect what $i$ learns, and thus it will not affect $i$'s own subsequent actions (as long as $i$ is subsequently myopic). [10] Hence myopic behavior is warranted. Weakening this slightly, if $i$ believes that her rivals will be playing a fixed (and possibly mixed) strategy profile asymptotically, then asymptotically myopic behavior is (for the same reasons) warranted.
[10] The point of this remark may not be apparent. Imagine that $i$'s choice of action in round $t$ affects the information she receives about the strategy choices of her rivals in that period. This would be natural, for example, if we imagined that the stage game is an extensive-form game and players only observe the outcome of each round of play. Then $i$'s choice of action today might affect her own subsequent actions; she might choose to invest in information today by taking an action that is (myopically) suboptimal but that may generate useful information for guiding future choices.

We are not very happy with either of these two justifications on its own. In order to permit learning to take place, play must be repeated "frequently," more frequently than would be suggested by a substantial discount rate, except for extraordinarily impatient players.
And the story that players regard their rivals as (effectively) playing fixed strategies repeatedly suffers from internal inconsistency; why should a player imagine that his rivals are so different from himself, settling down to the repeated play of a single strategy profile more quickly than the player does himself?

More convincing justifications mix and combine the two justifications. Think of situations in which there are a large number of (potential) players who interact repeatedly in small groups. The belief that one's rivals will settle down to the play of a single strategy is then more palatable, especially when repeated meetings between the same sets of players are rare. To be precise, imagine that there are five thousand players for each player role, and imagine that one of the following three stories holds.
Story 1. At each date $t$, one small group of players is selected at random to play the game. Those who play at date $t$ are then returned to the pool of potential players, and at the end of the period, their actions are revealed to all the potential players.

Story 2. At each date $t$, all the players are randomly assigned to groups, and each group plays the game. At the end of the period, it is announced how the entire population played; the play of any particular player is never revealed. (That is, at the end of the period, it is announced that, say, twenty percent of the 1s chose row 1, and so on.)

Story 3. At each date $t$, there is random matching of all the players, and each player recalls at date $t$ what happened in the previous encounters in which he was involved, without knowing anything about the identity or experiences of his current rivals.
In each of these stories, myopic behavior seems "sensible," for reasons that mix to varying degrees the two basic justifications given above. In the first story, the first justification is mostly at work: Although the game is played relatively frequently, any single individual plays very infrequently, and at any reasonable discount rate, immediate payoff considerations will dominate any long-run considerations. In the second and third stories, it is more a matter of each player believing (now, with good reason) that how he behaves will have little effect on how his future rivals will behave. In story 2, this is because each player has little influence on the reported aggregate distribution; in story 3, it is because each player attaches low probability to the possibility that his current rivals will be future rivals any time soon, or even that future rivals will indirectly be affected by the player's own immediate play through some chain of individuals. These stories make myopic behavior more plausible intuitively, although it remains a question worth exploring whether this plausible intuition has a firm, formal basis.
Adaptive assessments

Once we assume that behavior is (asymptotically) myopic, the next step is to specify the assessment rules $\mu^i$ used by the players and, in particular, how these assessments are revised as the players observe the actions of others. In the models we will consider, players believe that, at least asymptotically, the past choices of opponents are to some extent representative of future choices. A fairly weak property that captures this idea is suggested by Milgrom and Roberts (1991). [11]

[11] Milgrom and Roberts define adaptive behavior, as opposed to assessments, but it will be apparent that our formal definition is just a gloss on theirs.
Definition. The assessment rule $\mu^i$ is adaptive if for every $\epsilon > 0$ and $t$, there is some $T(\epsilon, t)$ such that for all $t' > T(\epsilon, t)$ and all histories $\zeta_{t'}$, $\mu^i_{t'}(\zeta_{t'})$ puts probability no more than $\epsilon$ on the set of pure strategies by $i$'s rivals that have not been played at all between times $t$ and $t'$.

In words, the definition says that $i$'s assessment rule is adaptive if it puts very little weight on strategies by her opponents that were not played for a long (enough) time. The class of adaptive assessment rules is very broad, including, for example, assessments that take a weighted average of the history of past plays by one's opponent, as long as the weight put on any initial segment of history can be made small by lengthening the segment.
Four examples of adaptive assessment rules are: (a) assess that one's rivals will play in period $t$ whatever was played in $t-1$ (the assessments that go with Cournotian dynamics); (b) assess that one's rivals will play according to an exponentially weighted average of past plays; (c) assess that one's rivals are equally likely to play any action that has been played at least one percent of the time, with zero probability for all other actions; and (d) assess according to the scheme of fictitious play (where all previous observations are equally weighted). While the class of adaptive assessment rules is broad, there are arguments that restricting attention to this class is too restrictive. See, for example, the discussion in Milgrom and Roberts (1991, page 89ff) concerning sophisticated learning.
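Examples (b) and (d), written as code (a minimal sketch; the decay parameter in (b) is our arbitrary choice). Rule (b) is adaptive because the weight on any fixed initial segment of history decays geometrically as the history lengthens; rule (d) is adaptive because a strategy unplayed after some date receives a vanishing share of the total weight.

```python
# Examples (b) and (d) of adaptive assessment rules (illustrative sketch).

def exponentially_weighted(history, strategies, decay=0.9):
    """(b): the play observed k periods ago gets weight decay**k."""
    if not history:
        return {s: 1.0 / len(strategies) for s in strategies}
    weights = {s: 0.0 for s in strategies}
    for age, s in enumerate(reversed(history)):
        weights[s] += decay ** age
    total = sum(weights.values())
    return {s: w / total for s, w in weights.items()}

def fictitious_play_assessment(history, strategies, eta):
    """(d): initial weights plus equally weighted empirical counts."""
    weights = {s: eta[s] + history.count(s) for s in strategies}
    total = sum(weights.values())
    return {s: w / total for s, w in weights.items()}
```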
Convergence to a pure-strategy profile

We are now in a position to generalize Proposition 3.1. Fix assessment rules $\mu^i$ and behavior rules $\phi^i$ for our two players. To state the result, we require a definition.

Definition. The infinite history $\zeta = (s_1, s_2, \ldots)$ is said to be compatible with behavior rules $\phi$ if, for each $t = 1, 2, \ldots$ and $i = 1, \ldots, I$, the action $s^i_t$ is in the support of $\phi^i_t(\zeta_t)$. That is, $\zeta$ is something that could be observed with positive probability over all finite time horizons for players who behave according to the behavior rules that are given.
Proposition 4.1. Let $\zeta = (s_1, s_2, \ldots)$ be an infinite history such that, for some $s_* \in S$ and for some $T$, $s_t = s_*$ for all $t > T$. If $\zeta$ is compatible with adaptive assessment rules $\mu^i$ and behavior rules $\phi^i$ that are strongly asymptotically myopic relative to the assessment rules, then $s_*$ must be a Nash equilibrium of the stage game. [12]
Proof. Normalize the payoffs in the game so that the range of payoffs for each player is no greater than one. Let $\zeta_t$ be the partial history $(s_1, s_2, \ldots, s_{t-1})$ of $\zeta$. Maintain the hypotheses of the proposition: $\zeta$ is compatible with the $\phi^i$, and $\zeta$'s components are eventually $s_*$. Suppose that $s_*$ is not a Nash equilibrium. Then (without loss of generality) player 1 has a better response to $s^{-1}_*$ than $s^1_*$. Let $\hat s^1$ be a better response, and set $4\epsilon = u^1(\hat s^1, s^{-1}_*) - u^1(s^1_*, s^{-1}_*)$; by the normalization, $\epsilon \leq 1/4$. Because the assessment rules are adaptive, and because the history eventually settles on repeated play of $s_*$, we can find $T'$ sufficiently large so that for all $t > T'$, the assessment $\mu^1_t(\zeta_t)$ of player 1 puts probability at least $1 - \epsilon$ on the play of $s^{-1}_*$. Thus for all $t > T'$, the expected payoff to 1 from playing $\hat s^1$ (against 1's assessment of her rivals' strategy choices) exceeds the expected payoff from playing $s^1_*$ by at least $4\epsilon(1 - \epsilon) - \epsilon = 3\epsilon - 4\epsilon^2 \geq 2\epsilon > \epsilon$. [13] Thus playing $s^1_*$ is more than $\epsilon$ suboptimal for all $t > T'$, which implies that $\phi^1$ is not strongly asymptotically myopic, a contradiction.
[12] Compare with Milgrom and Roberts (1991, Theorem 3[ii]).

[13] This is computed as $4\epsilon = u^1(\hat s^1, s^{-1}_*) - u^1(s^1_*, s^{-1}_*)$ times the probability assessed that $-1$ plays $s^{-1}_*$, which is at least $1 - \epsilon$, less the probability that $-1$ plays anything other than $s^{-1}_*$, which is at most $\epsilon$, times the maximum possible difference in payoffs from playing $\hat s^1$ and $s^1_*$, which by the normalization is no more than one.
Note that the proposition assumes that behavior rules are strongly asymptotically myopic. If we deleted the modifier strongly, the result would be false as stated. Consider, for example, repeated play of the prisoners' dilemma, and behavior rules where, at period $t$, each player chooses to cooperate with probability $1/t$ and to defect with probability $(t-1)/t$. Consider the infinite history where each player cooperates in each period. This history is compatible with the behavior rules, and (for any assessment rules) the behavior rules are asymptotically myopic, because they involve playing a suboptimal strategy with vanishing probability. But the strategy profile that is "repeated" in each period is of course not a Nash equilibrium. The problem is that compatibility only requires that each finite history has positive probability for the behavior rules. For the given behavior rules, a history where players cooperate in each period has prior probability zero. [14] In order to obtain a result in the spirit of Proposition 4.1 with asymptotic myopia instead of strong asymptotic myopia, we must either be more careful about how we make histories consistent with the given behavior rules or study not the actual history of play but the intended strategies of the players. We provide one result along these lines at the end of Section 6.
[14] For the given behavior rules, no single complete history has positive probability. In fact, with probability one, there is never a complete cessation of cooperation by one side or the other, although eventually cooperate-cooperate is no longer observed. All of which is quite beside the immediate point.

Convergence to mixed strategies

Next we proceed to generalizations of fictitious play and convergence in the second sense of Section 3, where we look for convergence of the empirical frequencies of observations to some (possibly mixed) strategy profile. For the remainder of this section, we assume that the game has two players, i.e., $I = 2$; we look at convergence in empirical frequencies for $I = 2$ only, and we take up the case of more than two players in Section 5. Let $\hat o(\zeta_t) : S \to [0, \infty)$ give the vector of proportions of strategy profiles in the partial history $\zeta_t$; i.e., $\hat o(\zeta_t)(s)$ gives the number of times $s$ was played in periods 1 through $t-1$,
divided by $t-1$. We write $\hat o^i(\zeta_t)$ for the marginal frequency distribution on $S^i$ induced by $\hat o(\zeta_t)$; e.g., $\hat o^{-i}(\zeta_t)(s^{-i}) = \sum_{s^i \in S^i} \hat o(\zeta_t)(s^i, s^{-i})$. Then, in the spirit of Proposition 3.2, we are looking for conditions on assessment and behavior rules that guarantee: If $\zeta = (s_1, s_2, \ldots)$ is an infinite history such that for some $\sigma_* \in \Sigma$, $\lim_t \hat o^i(\zeta_t) = \sigma^i_*$ for $i = 1, 2$, [15] then $\sigma_*$ is a Nash equilibrium of the stage game.

To get this result, it is insufficient that behavior is strongly asymptotically
myopic with respect to adaptive assessment rules. Consider, for example, the game matching pennies, and suppose that at dates $t > 4$, the two players assess equal probabilities for any strategy by their rival that has occurred at least ten percent of the time in the past; until date 4, they assess equal probabilities for the two strategies. As for behavior, players behave myopically optimally, with the following specification for all instances in which the assessment leaves the player indifferent: If $t$ is divisible by 3, then play "heads"; otherwise play "tails." What happens is that the sequence of plays is tails, tails, heads, tails, tails, heads, ... for both players, and each always assesses equal probabilities for his rival's two strategies. Empirical frequencies converge to (1/3, 2/3) for each player, which (of course) is not a Nash equilibrium of the stage game. [16]
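This counterexample can be checked directly. In the sketch below (ours), both players' assessments put at least ten percent on each of the rival's strategies from date 5 on, so the assessment is always (1/2, 1/2), the player is always indifferent (the assertion verifies this for matching-pennies payoffs), and the divisibility-by-3 tie-breaking rule generates the stated sequence, with empirical frequencies tending to (1/3, 2/3).

```python
# The matching-pennies example: adaptive assessments that are not
# asymptotically empirical (illustrative check).

def assessment(history, t):
    if t <= 4:
        return {"H": 0.5, "T": 0.5}
    seen = {s for s in history if history.count(s) / len(history) >= 0.10}
    return {s: 1.0 / len(seen) for s in seen}

history = []                     # both players' plays coincide here
for t in range(1, 3001):
    mu = assessment(history, t)
    # Row player's expected payoffs in matching pennies (+1 match, -1 mismatch):
    eH = mu.get("H", 0.0) - mu.get("T", 0.0)
    eT = mu.get("T", 0.0) - mu.get("H", 0.0)
    assert abs(eH - eT) < 1e-12  # always indifferent, so the tie-break applies
    history.append("H" if t % 3 == 0 else "T")

print(history[:6])                        # ['T', 'T', 'H', 'T', 'T', 'H']
print(history.count("H") / len(history))  # 1/3: not an equilibrium frequency
```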
[15] Please note carefully: this isn't quite the same as asking that $\lim_t \hat o(\zeta_t) = \sigma_*$. We are only asking that the marginal frequencies converge, and not the joint frequency distribution. There is a lot behind this observation, to which we return in the next section.

[16] At the cost of complicating the description of the assessment rules, we can modify this so that the two players eventually play the mixed strategies (1/3, 2/3) at all dates; nonconvergence of their intended strategies is not the issue.

The difficulty, it should be clear, comes from the fairly weak requirements of being an adaptive assessment rule. When only one strategy choice by rivals is eventually observed, adaptive assessment rules converge together with the (degenerate) empirical frequencies of observations. But when rivals use more than one (pure) strategy with nonvanishing frequency (or even with frequency that vanishes sufficiently slowly), adaptive assessment rules can assign to such a strategy a probability that is unrelated to its limiting empirical frequency. To obtain the result that we seek, we must sharpen considerably the criterion imposed on assessment rules. The simplest and most direct criterion that works runs as follows.
Definition. The assessment rule $\mu^i$ is asymptotically empirical if for every $\zeta \in Z$,

$\lim_{t \to \infty} \|\mu^i_t(\zeta_t) - \hat o^{-i}(\zeta_t)\| = 0,$

where the $\zeta_t$ are subhistories of the fixed $\zeta$. [17]

It is easy to see that any asymptotically empirical assessment rule is adaptive; that there are adaptive assessment rules that are not asymptotically empirical; and that the assessment rule in the model of fictitious play is asymptotically empirical.
Is it reasonable to insist that assessment rules are asymptotically empirical? This property is natural if one's picture of a rival's dynamic behavior is repeated, independent play of some (unknown) strategy, or even if one supposes that one's rival will converge to repeated, independent play of some (unknown) strategy. But if you think that the rival's strategy may shift repeatedly through time, say in response to some state variable (such as a sunspot) that evolves in some Markov fashion, then some assessment scheme that puts relatively more weight on more recent observations, or that tries to uncover the probabilistic structure of the regime shifts, would be more reasonable.
[17] Whenever we are dealing with finite-dimensional vectors, as here, $\|\cdot\|$ denotes the sup norm.

Proposition 4.2. Let $\zeta = (s_1, s_2, \ldots)$ be an infinite history such that, for some $\sigma_* \in \Sigma$, $\lim_{t \to \infty} \hat o^i(\zeta_t) = \sigma^i_*$ for $i = 1, 2$. If $\zeta$ is compatible with asymptotically empirical assessment rules $\mu^i$ and behavior rules $\phi^i$ that are strongly asymptotically myopic relative to the assessment rules, then $\sigma_*$ is a Nash equilibrium
of the stage game.

The proof resembles the proof of Proposition 4.1, with the following amendments. First, since the assessment rules $\mu^i$ are asymptotically empirical, the assessments $\mu^i_t(\zeta_t)$ of player $i$ at the partial histories $\zeta_t$ converge to $\sigma^{-i}_*$. If some pure strategy $s^i$ in the support of $\sigma^i_*$ is not a best response to $\sigma^{-i}_*$, then there is some pure strategy $\hat s^i$ that is better against $\sigma^{-i}_*$ than $s^i$ is, by more than some $\epsilon > 0$. By a standard argument, for $T$ sufficiently large, $s^i$ will be worse than $\hat s^i$, relative to the assessment $\mu^i_t(\zeta_t)$, by more than $\epsilon$, for all $t > T$. Thus $s^i$ would not be played eventually (by strong asymptotic myopia). But this would contradict $s^i$ being in the support of the limiting frequencies of $i$'s strategy choices.
5. Objections to convergence of the empirical distributions as a convergence criterion

Notwithstanding the results of the previous section, the convergence criterion employed fails to capture what we want for a model of "learning to play mixed strategies." The objections begin with the obvious observation that in examples such as fictitious play, players are (almost) never playing mixed strategies. They are instead jumping from one pure strategy to another, (typically) in cycles of ever-increasing length, so behavior is not converging. Our rebuttal to this is that while behavior is not converging, beliefs are. The conclusion of Proposition 4.2 does not require the full power of asymptotically empirical assessment rules; e.g., the conclusion still holds for assessment rules that do not approach the empirical frequencies along histories where the empirical frequencies don't converge. [18]

[18] More concretely, suppose that $\mu^i$ reports the empirical frequencies of $-i$'s choices over the most recent $\alpha$ percent of history, for $\alpha$ strictly between zero and one-hundred. This assessment rule is not asymptotically empirical per our formal definition, but it is empirical enough so that Proposition 4.2 holds.
Mixed equilibria are sometimes interpreted as equilibria in beliefs; each side believes the other to be acting in a manner that leaves him (nearly) indifferent among several actions. Under this interpretation, the convergence criterion used in the previous section is fairly natural. However, under this interpretation the players ignore the cycles in their own and their opponent's play. And these cycles can lead to phenomena so striking that the interpretation is hard to accept. Imagine play of the battle of the sexes as depicted in Figure 5-1, using the precise method of fictitious play, where each player begins with the relative beliefs vector $(1, \sqrt 5)$. [19] The symmetry of the situation implies that in the first round player 1 chooses top and player 2 chooses left; top-left will be played, and each player's relative beliefs going into the second round will be given by $(2, \sqrt 5)$. The symmetry again implies that in the second round 1 will choose top and 2 will choose left, and so on, inductively. From general results about fictitious play, we know that the empirical frequencies will converge to the mixed-equilibrium probabilities (2/3, 1/3). But this will be realized with perfect correlation in the players' choices: Top-left will be played two-thirds of the time, and bottom-right one-third. Players will get zero round after round, there will be perfect correlation in their actions, and yet according to the theory they will persist in believing that they are "converging" to the mixed Nash equilibrium.
             Column 1    Column 2
   Row 1      0, 0        2, 1
   Row 2      1, 2        0, 0

Fig. 5-1. The battle of the sexes. (Player 1 chooses the row.)

[19] Because the payoffs are rational and the relative weights have an irrational ratio, there will never be a tie; each player will have a unique best response at all times.
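The perpetual miscoordination is easy to reproduce (our sketch, not the paper's). Starting from the relative beliefs $(1, \sqrt 5)$ for both players, only the two zero-payoff diagonal cells are ever visited, roughly two-thirds and one-third of the time, even though each marginal frequency approaches the mixed-equilibrium probabilities.

```python
# Fictitious play on the battle of the sexes of Figure 5-1 (illustrative check).
import math

u1 = [[0, 2], [1, 0]]            # u1[row][col]; index 0 is "top"/"left"
u2 = [[0, 1], [2, 0]]            # u2[row][col]

w1 = [1.0, math.sqrt(5)]         # player 1's weights on (left, right)
w2 = [1.0, math.sqrt(5)]         # player 2's weights on (top, bottom)
cells = {}
for _ in range(100000):
    r = max(range(2), key=lambda r: sum(u1[r][c] * w1[c] for c in range(2)))
    c = max(range(2), key=lambda c: sum(u2[x][c] * w2[x] for x in range(2)))
    w1[c] += 1
    w2[r] += 1
    cells[(r, c)] = cells.get((r, c), 0) + 1

print(cells)   # only (0, 0) and (1, 1) occur, about 2/3 and 1/3 of the time
```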
Moreover, for the case $I \geq 3$, this example shows that Proposition 4.2 will run into difficulties. Imagine a three-player game in which, from the perspective of players 1 and 2, the actions of player 3 are irrelevant: Players 1 and 2 simply play the battle of the sexes against each other in each round. Player 3, to choose an optimal strategy, must forecast the joint actions of her rivals; for the sake of definiteness, suppose her optimal action is tic if she believes that they will play to a main-diagonal cell with probability 2/3 or more, and her optimal action is tac otherwise. What should 3 conclude, asymptotically, in the particular model of fictitious play given above? Should she conclude that 1 and 2 are always playing along the main diagonal, hence that their actions are perfectly correlated, and thus that tic is optimal? Or should she conclude that 1 will play top two-thirds of the time, 2 will play left two-thirds of the time, and hence that top-left has asymptotic probability four-ninths and bottom-right has probability one-ninth, so that tac is optimal?
There are (at least) two different ways we could proceed, depending on how we extend the definition of asymptotically empirical assessments. One possible definition is precisely the definition given before, interpreting $\hat o^{-i}(\zeta_t)$ as the marginal frequency distribution along $S^{-i}$ of the profiles from $\zeta_t$. Under this definition, $i$'s assessment (asymptotically) reflects any correlations in the play of her rivals that are observed empirically. The example shows how this definition permits convergence (under fictitious play) to non-Nash (correlated) assessments, so that Proposition 4.2 fails. [20]
2C
An
alternative definition
suppose
that players
asymptotically assess independent play by their rivals, regardless of the empirfrequencies.
ical
One can
21
Then we obtain Proposition
1
I
>
2.
However
this
repair the proposition in this case as follows: Stipulate that along the history C
empirical joint frequencies are the products
seems a
4.2 for
(in
the limit) of the empirical margins.
But
the
this repair
bit cheesy.
One way
to formalize this is to define, for
each
2G
t
,
('
,
and
s
€
S
,
r)(0)(s) =
Y[,ci <7'(CiH
')
seems
us to be somewhat unnatural;
to
feel that
it is
1
and
if it is
even
correlation asymptotically,
is
2,
then
we
it.
unnatural for player 3 to ignore correlation in the choices of
equally unnatural for player
isn't it
and
in her choice of strategy
that
there
unnatural to assume that players ignore
Moreover,
players
if
for two-player
1
to ignore correlation
that of player 2? If so, then the
example indicates
games, asymptotic empiricism as formulated
may
be
dubious.
All these objections (past our first and most basic objection) are grounded in examples that are nongeneric, as in the battle-of-the-sexes example; if robust examples cannot be shown for generically chosen payoffs, perhaps these secondary objections have less force. In fact, this is so for 2 x 2 games: In a 2 x 2 game, for generically chosen payoffs, the actions of the two players (under fictitious play) will be asymptotically uncorrelated. However, we conjecture that robust examples of asymptotic correlation can be found in larger games. (Robust examples can certainly be created if one leaves the strictures of exact fictitious play.) The basis for this conjecture is the example of fictitious play in the game of rock-scissors-paper. Fictitious play in this game must converge to the unique Nash equilibrium (1/3,1/3,1/3), since the game is zero sum. And, for most initial weight vectors, this happens while (asymptotically) avoiding the three cells along the main diagonal. (Each of the other cells has asymptotic frequency 1/6.) We conjecture that these properties hold for a neighborhood of games around rock-scissors-paper. We cannot verify our conjecture easily, though: if we move away from the zero-sum case, we are unable to prove convergence to the equilibrium frequencies (since most games in a neighborhood will be nonzero sum), nor are we sure of the asymptotic frequencies of the cells.

So we do not leave neat-and-tidy our secondary objections based on asymptotic correlation. Nonetheless, in our view the first and most basic objection, that this mode of convergence does not correspond to learning to play mixed strategies, suffices to motivate research into stronger modes of convergence. With this motivation, then, we proceed.
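As an illustration of the rock-scissors-paper claim, one can run exact fictitious play and tabulate the joint cells. The simulation below is our own sketch, with arbitrary (generic) initial weights; it is not part of the paper's argument:

import numpy as np

# Row player's payoff in rock-scissors-paper (actions ordered rock, scissors, paper);
# the game is zero sum, so the column player's payoff matrix is -A.
A = np.array([[0, 1, -1],
              [-1, 0, 1],
              [1, -1, 0]])

w1 = np.array([1.0, 2.0, 1.5])    # generic initial weight vectors
w2 = np.array([2.0, 1.0, 1.0])
cells = np.zeros((3, 3))

for t in range(200_000):
    a1 = np.argmax(A @ (w2 / w2.sum()))     # 1's best reply to 2's empirical mix
    a2 = np.argmax(-(w1 / w1.sum()) @ A)    # 2's best reply (her payoffs are -A)
    w1[a1] += 1.0
    w2[a2] += 1.0
    cells[a1, a2] += 1.0

print(np.round(w1 / w1.sum(), 3))        # marginals approach (1/3, 1/3, 1/3)
print(np.round(cells / cells.sum(), 3))  # diagonal cells get frequency near 0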
6. Convergence of behavior strategies

Rather than look for convergence of empirical frequencies (and hence of assessments about the actions of others), we study convergence, in games with I ≥ 2 players, of the behavior strategies employed by the players. That is, we look for convergence of the behavior strategies φ^i_t(ζ_t) to some σ^i_* ∈ Σ^i, for each player i. Because we wish to consider games with I > 2, we must first adapt the definition of asymptotically empirical assessments; we proceed in the easiest fashion, by using the definition precisely as it was given earlier, interpreting f^{-i}(ζ_t) as the joint empirical distribution of the profiles chosen by i's rivals. That is, a player's assessments asymptotically would exhibit any correlation in the choices of her rivals that is observed empirically.

Two problems surface immediately. In the model of fictitious play, we insisted that players choose only myopic best replies, computed on the basis of their assessments of the actions of their rivals. First, then, if φ^i_t(ζ_t) is to converge to a nondegenerate mixed strategy, the player must be willing to play one of several pure strategies among which, given her assessments, she is indifferent. How likely is it that a player, based on some history of play, will assess for his rival precisely the mixed strategy that makes him (the first player) indifferent? If this is unlikely, how can we ever have players willfully randomizing?
It is here that the asymptotic parts of asymptotic myopia and asymptotic empiricism come into play. We do not insist in general that players play only myopic best responses; they can play slightly suboptimal responses, as long as the degree of suboptimality vanishes as time passes. So if assessments converge to the equilibrium mixed strategies quickly enough relative to the rate at which the allowable suboptimality vanishes, we can sustain mixed strategies even if assessments do not match precisely the equilibrium mixtures. At the same time, we do not insist that players' assessments are precisely empirical; if the empirical frequencies of play converge to some equilibrium mixed strategies, beliefs can sit at precisely that limit even if the empirical frequencies only approach it, justifying the play of the equilibrium mixed strategies. As we shall see, we need one or the other: our results obtain with asymptotic myopia and beliefs that are precisely empirical, or with behavior that is precisely myopic and assessments that are asymptotically empirical. It is the divergence we allow, from precisely myopic behavior or from precisely empirical beliefs, that carries the power to sustain convergence of behavior to mixed strategies.
The second problem concerns the form our results must take: the formal "nonconvergence" criterion, that unstable strategy profiles must be abandoned, can only be probabilistic. To see what is at issue here, imagine playing the matching pennies game repeatedly. Suppose that along some history ζ the empirical frequencies of the two rows and the two columns are approaching (1/2,1/2), while the behavior rules converge to (1/3,2/3). How could players' behavior be converging to the mixed strategies (1/3,2/3) while at the same time empirical frequencies approach (1/2,1/2)? This, you may object, is very unlikely. "Unlikely" is just the right word. There is nothing that prevents this story, at least insofar as asymptotic myopia and asymptotic empiricism are concerned: any history with players mixing strictly in each round is compatible with behavior rules that have these properties. But by the strong law of large numbers, this history belongs to an event of probability zero. If behavior strategies are converging to (1/3,2/3), then the strong law of large numbers says that empirical frequencies will converge to (1/3,2/3) with probability one. Given asymptotically empirical assessment rules, this would rule out players continuing to play anything close to the (1/3,2/3) strategies.
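The strong-law point can be seen in a two-line experiment (ours, purely illustrative): if the stage-t mixing probabilities converge to 1/3, the empirical frequency is dragged to 1/3 as well, so the story above survives only on a null event.

import random

random.seed(0)
T, heads = 200_000, 0
for t in range(1, T + 1):
    p_t = 1/3 + 0.5 / t          # stage-t probability of "heads", converging to 1/3
    heads += random.random() < p_t
print(heads / T)                 # close to 0.333..., not to 0.5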
Accordingly, when giving results in the spirit of Propositions 4.1 and 4.2, we give results of the following form: If σ* is not a Nash equilibrium, then there is probability zero that behavior will remain forever in a small-enough neighborhood of σ*, no matter what are the initial conditions.

The formalities run as follows. Fix a set of behavior rules φ^i (which will be accompanied by assessment rules μ^i, although for the time being only the behavior rules are needed). Recall that P(·|ζ_t) represents the objective conditional probability distribution on the space Z created by starting at ζ_t and using the behavior rules φ.

Definition. A strategy profile σ* ∈ Σ is unstable if there exists some ε > 0 such that for all t and ζ_t,

P( ||φ^i_{t'}(ζ_{t'}) − σ^i_*|| < ε for all i and all t' ≥ t | ζ_t ) = 0.

Proposition 6.1. Fix behavior rules φ that are asymptotically myopic relative to some asymptotically empirical assessment rules μ. Then every strategy profile σ* that is not a Nash equilibrium is unstable.

Note that in this proposition, behavior rules are required to be (only) asymptotically myopic relative to the assessment rules, and that the conclusion is independent of the starting conditions. Compare with Propositions 4.1 and 4.2, in which strong asymptotic myopia was assumed. We return to this point at the end of this section.
Although the details of the proof of this proposition are tedious, the idea is fairly simple. If behavior lies forever in a small neighborhood of the strategy profile σ*, then empirical frequencies will eventually lie in a small neighborhood too, and thus the players' assessments will too. If the strategy profile isn't a Nash equilibrium, then (eventually) some player will want to move far away from the strategy profile, a contradiction to the supposition. The key technical result is the application of the strong law of large numbers, which we state in the form of a lemma.
Lemma 6.2. Let {x_t; t = 1,2,...} be a sequence of random variables on a probability space with range some finite set A. Fix a probability distribution π on A and an ε > 0, and let Λ be the (measurable) subset of the probability space consisting of all sample points such that for t = 1,2,..., the distribution of each x_t conditional on {x_1, ..., x_{t−1}}, denoted π_t(·|x_1, ..., x_{t−1}), satisfies max_{a∈A} |π_t(a|x_1, ..., x_{t−1}) − π(a)| < ε. Let T_t(a) be the random variable Σ_{t'=1}^t 1_{x_{t'}=a}; that is, T_t(a) is the number of times x_{t'} = a for t' = 1, ..., t. Then

liminf_t T_t(a)/t ≥ π(a) − ε and limsup_t T_t(a)/t ≤ π(a) + ε

for all a ∈ A, almost surely conditional on Λ. (If Λ has zero prior probability, the lemma is taken to be vacuous.)

Proof of Lemma 6.2. Construct a "standard" probability space {Ω, F, P} where Ω = [0,1]^∞ and, writing u_t for the t-th component of ω, the {u_t; t = 1,2,...} form an independent sequence of random variables, each uniformly distributed on the unit interval. Enumerate A as {a_1, a_2, ..., a_N} (where N is the cardinality of A). Now define random variables y_t on this standard probability space as follows: For t = 1, let y_1(ω) = a_n for that index n such that

Σ_{m=1}^{n−1} π_1(a_m) < u_1 ≤ Σ_{m=1}^{n} π_1(a_m).

Then, inductively in t, let y_t(ω) = a_n for that index n such that

Σ_{m=1}^{n−1} π_t(a_m | y_1(ω), ..., y_{t−1}(ω)) < u_t ≤ Σ_{m=1}^{n} π_t(a_m | y_1(ω), ..., y_{t−1}(ω)).

This construction uses the uniformly distributed random variables to construct a sequence of random variables whose joint distribution is identical to the joint distribution of the original sequence {x_t}. Accordingly, we prove the stated bounds in terms of the constructed probability space. We prove the two bounds for one a at a time; because the bounds hold almost surely for each fixed a, they hold almost surely for all a (finitely many) simultaneously, which then gives the desired result.
Showing that the stated bounds on the limits superior and inferior of T_t(a)/t hold almost surely conditional on Λ is easy. Fix a ∈ A and, reordering the enumeration if necessary, take a = a_1, so that y_t(ω) = a exactly when u_t ≤ π_t(a | y_1(ω), ..., y_{t−1}(ω)); compare with how we determined the y's in the construction. Let T_t(a)(ω) be the number of times that y_{t'}(ω) = a for t' = 1, ..., t, and for r ∈ [0,1], let ν_t(r, ω) be the number of times that u_{t'} ≤ r for t' = 1, ..., t. For points ω in Λ we have π(a) − ε < π_t(a | y_1, ..., y_{t−1}) < π(a) + ε for all t, so the set {ω : u_t ≤ π(a) − ε} is contained within the set {ω : y_t(ω) = a}, which in turn is contained within the set {ω : u_t ≤ π(a) + ε}. Hence the estimate

ν_t(π(a) − ε, ω) ≤ T_t(a)(ω) ≤ ν_t(π(a) + ε, ω) for ω ∈ Λ

follows from the asserted set inclusions. By the strong law of large numbers, lim_{t→∞} ν_t(r, ω)/t = r with probability one for each fixed r. This, for r = π(a) − ε and for r = π(a) + ε, combined with the estimate above on Λ, gives precisely the desired result.
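The inverse-CDF coupling used in this proof is easy to exhibit in code. The sketch below is ours; the conditional distributions pi_t are a hypothetical example satisfying the ε-band condition. It displays the pathwise sandwich ν_t(π(a) − ε) ≤ T_t(a) ≤ ν_t(π(a) + ε) when a is enumerated first:

import random

random.seed(1)
ACTIONS = ["a", "b", "c"]
pi = {"a": 0.5, "b": 0.3, "c": 0.2}     # the target distribution pi
eps = 0.05

def pi_t(history):
    # A hypothetical conditional distribution staying within eps of pi.
    tilt = eps * (0.5 if len(history) % 2 == 0 else -0.5)
    return {"a": pi["a"] + tilt, "b": pi["b"] - tilt, "c": pi["c"]}

T, hist, count_a, lo, hi = 100_000, [], 0, 0, 0
for t in range(T):
    u, dist, cum = random.random(), pi_t(hist), 0.0
    for act in ACTIONS:                  # the inverse-CDF step of the construction
        cum += dist[act]
        if u <= cum:
            hist.append(act)
            break
    count_a += hist[-1] == "a"
    lo += u <= pi["a"] - eps             # counts nu_t(pi(a) - eps)
    hi += u <= pi["a"] + eps             # counts nu_t(pi(a) + eps)

print(lo / T, count_a / T, hi / T)       # roughly 0.45 <= T_t(a)/t <= 0.55, pathwise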
Proof of Proposition 6.1. Suppose that σ* is a strategy profile that is not a Nash equilibrium. Then there is some player i and a pure strategy s^i that is strictly better against σ^{-i}_* than is σ^i_*. Since v^i(σ^i, σ^{-i}) is continuous in both of its arguments, we can find an ε > 0 such that for all σ^i that are within ε of σ^i_* and all α^{-i} that are within (I−1)ε of σ^{-i}_* (in the sup norms on Σ^i and on the space of probability distributions over S^{-i}, respectively),

v^i(s^i, α^{-i}) > v^i(σ^i, α^{-i}) + ε.

(Interpret α^{-i} here as any probability distribution over S^{-i}; the assessment needn't be based on independent play by i's rivals. And note that if each marginal f^j is within ε of σ^j_*, then for any s^{-i} = (s^j)_{j≠i}, |Π_{j≠i} f^j(s^j) − Π_{j≠i} σ^j_*(s^j)| < (I−1)ε, so the product of the marginals is within (I−1)ε of σ^{-i}_*.)

We claim that for this ε, P(||φ^i_{t'}(ζ_{t'}) − σ^i_*|| < ε for all i and t' ≥ t | ζ_t) = 0 for all asymptotically empirical assessment rules μ and behavior rules φ that are asymptotically myopic relative to these assessments, so that σ* is unstable. To see why, suppose to the contrary that for some t and ζ_t this probability is strictly positive. We proceed to derive a contradiction. From the lemma, we know that on a set A of positive probability (conditional on ζ_t), the limits inferior and superior of the empirical frequencies f^j_{t'}(ζ_{t'})(s^j) lie within ε of σ^j_*(s^j) for every j and s^j. Since assessments are asymptotically empirical, along every infinite history in A there exists a T such that for all t' > T, the assessments μ^i_{t'}(ζ_{t'}) lie within (I−1)ε of σ^{-i}_*. But then asymptotic myopia implies that along every ζ ∈ A, φ^i_{t'}(ζ_{t'}) will eventually be more than ε away from σ^i_*: once the allowed suboptimality falls below ε, no strategy within ε of σ^i_* can be chosen, since any such strategy is suboptimal by more than ε against any assessment within (I−1)ε of σ^{-i}_*. This contradicts the definition of the event A, completing the proof.
Two remarks about the proof are in order. (1) Note that the size ε of the neighborhood of σ* from which behavior must escape depends only on the strategy profile σ*: the profile σ* that is not a Nash equilibrium is assumed to be given, and the value of ε used in the proof is independent of the behavior and assessment rules and of the starting conditions.

(2) The full strength of asymptotic empiricism is not required for this proof. What is essential is that if behavior lies forever in some small neighborhood of a strategy, then assessments come to lie in another small neighborhood of that strategy.
In the proof, we used the strong law of large numbers to show that the empirical frequencies stay close, and then enlisted the asymptotically empirical character of the assessment rules. But suppose instead the assessment rule took the following form: μ^i_t(ζ_t) is asymptotically equal to the empirical frequency of observed strategy choices by one's rivals over the most recent √t periods. Since the length of this segment of history grows without bound, we can still enlist the strong law of large numbers to come to the desired conclusion. Or suppose μ^i_t(ζ_t) is a weighted average of past observations, with greatest weight on the most recent observation. As long as that "greatest weight" falls off to zero fast enough as t goes to infinity, we can enlist a variation of the strong law of large numbers and get the desired result. (Of course, this rules out exponentially weighted moving averages, where the weight on the most recent observation doesn't vanish at all.) We will stick to asymptotic empiricism for the remainder of this paper, since it is expositionally the easiest thing to deal with. But you should note that it is a bit more restrictive than we actually need.
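For concreteness, here is a sketch (ours) of the assessment-rule variants just discussed, for a rival with a binary action; only the last, the exponentially weighted average, fails the property the proof needs:

import math, random

def empirical(h):                       # precisely empirical
    return sum(h) / len(h)

def recent_window(h):                   # empirical over the most recent sqrt(t) periods
    k = max(1, math.isqrt(len(h)))
    return sum(h[-k:]) / k

def decaying_recency(h, c=2.0):         # extra weight on the newest observation,
    w = min(1.0, c / math.sqrt(len(h))) # with the weight falling off to zero
    return (1 - w) * empirical(h) + w * h[-1]

def exponential_average(h, lam=0.1):    # ruled out: the newest weight never vanishes
    a = h[0]
    for obs in h[1:]:
        a = (1 - lam) * a + lam * obs
    return a

random.seed(2)
h = [random.random() < 0.7 for _ in range(50_000)]   # rival plays "1" w.p. 0.7
# The first three converge to 0.7 almost surely; the last fluctuates forever.
print(empirical(h), recent_window(h), decaying_recency(h), exponential_average(h))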
Locally stable strategy profiles

Proposition 6.1 shows that every strategy profile that is not a Nash equilibrium is unstable. It is natural to wonder about strategy profiles that are not unstable: is some Nash equilibrium profile unstable? The answer, we will see, is no. To avoid double negatives, we make the following definition.

Definition. A strategy profile σ* is locally stable if there exist some asymptotically empirical assessment rules and behavior rules that are asymptotically myopic with respect to the assessment rules such that, for every ε > 0, we can find some t and ζ_t ∈ Z_t such that

P( lim_{t'→∞} φ_{t'}(ζ_{t'}) = σ* | ζ_t ) > 1 − ε.
We do not insist that behavior converges to the target strategy with probability one, but only that the probability can be made as close to one as we wish. We couldn't have probability one as long as the target strategy is not pure (except for degenerate cases), since as long as players use mixed strategies, there is positive probability of a very long run of "bad luck," which would lead assessments far away from the target strategy. On the other hand, the requirement that we be able to make the probability as close to one as we wish (for a fixed model of behavior and assessment rules and for some choice of initial conditions) is not as demanding as it may appear: as long as the probability is strictly positive for some initial conditions, it can be made as close to one as desired.

Lemma 6.3. Suppose that for a strategy profile σ* there exist some asymptotically empirical assessment rules and behavior rules that are asymptotically myopic with respect to those rules, such that for some t' and ζ_{t'} ∈ Z_{t'},

P( lim_{t→∞} φ_t(ζ_t) = σ* | ζ_{t'} ) > 0.

Then σ* is locally stable.

To prove the lemma, let A be the event {lim_{t→∞} φ_t(ζ_t) = σ*}. The event A is measurable with respect to the σ-field generated by the full history {ζ_1, ζ_2, ...}, so by Paul Levy's zero-or-one law (Chung, 1974, p. 341), the conditional probability of A given ζ_t approaches the indicator function of A as t approaches infinity. (If you have trouble squaring this assertion with what you find in Chung (1974), recall that ζ_t "contains" ζ_{t−1}, so conditioning on ζ_t is the same as conditioning on {ζ_1, ..., ζ_t}.) Since A has positive probability conditional on ζ_{t'}, there are continuations ζ_t of ζ_{t'} along which the conditional probability of A is as close to one as desired, which gives the result.
Proposition 6.4. Every Nash equilibrium profile σ* is locally stable.

Proof. Fix the Nash equilibrium profile σ*. By virtue of the lemma, we only need to find asymptotically myopic behavior rules and asymptotically empirical assessment rules, and a t and ζ_t, such that

P( lim_{t'→∞} φ_{t'}(ζ_{t'}) = σ* | ζ_t ) > 0.

We provide two constructions that work, one in detail and one sketched. In the first construction, we use assessment rules that (after the first period) are precisely empirical, and we rely on the asymptotic part of asymptotic myopia. In the second, we use behavior that is precisely myopic and rely on the asymptotic part of asymptotic empiricism. We discuss the relative merits of these two constructions at the end of the proof.
For the first construction, create a probability space on which is defined a sequence of random strategy profiles {s_1, s_2, ...}, where each s_t is independently and identically distributed according to σ*. By the strong law of large numbers, with probability one the empirical frequencies of strategies and joint strategy profiles all converge to the corresponding probabilities under σ*. Write σ^{-i}(ζ_t) as shorthand for the empirical frequency distribution of the s^{-i} up to time t along the history ζ, write S^i_* for the support of σ^i_*, and let

e(ζ_t) = max_{i=1,...,I} [ max_{s^i ∈ S^i_*} u^i(s^i, σ^{-i}(ζ_t)) − min_{s^i ∈ S^i_*} u^i(s^i, σ^{-i}(ζ_t)) ].

That is, e(ζ_t) is the maximum amount by which any s^i in the support of σ^i_* is suboptimal against σ^{-i}(ζ_t). Then with probability one, e(ζ_t) goes to zero as t goes to infinity. For m = 1, 2, ..., let K_m be a positive integer sufficiently large so that the event

{ e(ζ_t) < 1/m for all t > K_m }

has probability at least equal to (2^{m+1} − 1)/2^{m+1}. For ease of exposition, assume that K_1 < K_2 < .... For each t = 1, 2, ..., let m(t) = 0 if t ≤ K_1, let m(t) = 1 if K_1 < t ≤ K_2, let m(t) = 2 if K_2 < t ≤ K_3, and so on. Note that lim_{t→∞} m(t) = ∞.

We claim that by construction, the event

{ e(ζ_t) < 1/m(t) for all t such that m(t) ≥ 1 }

has probability at least one-half. To see why, note that this event can be written as the intersection over m of the events { e(ζ_t) < 1/m for all t such that m(t) = m }, and note that the probability of the events in this intersection are 3/4 (or more) for m = 1, 7/8 (or more) for m = 2, and so on. Apply De Morgan's law to see that the complement of the event we are interested in is the union of events, the first of which has probability 1/4 at most, the second probability 1/8 at most, and so on. Hence the probability of this union is no more than 1/2, and the probability of its complement, the event we are interested in, is at least 1/2.

Now we define the assessment and behavior rules. The assessment rules are very simple: For t = 1, define the μ^i_1 arbitrarily, and for t ≥ 2, let μ^i_t(ζ_t) = σ^{-i}(ζ_{t−1}). As promised, except for the first period, the players' assessments are precisely empirical. As for the behavior rules, for t such that m(t) = 0, let φ^i_t(ζ_t) = σ^i_*. For t such that m(t) = m ≥ 1, let φ^i_t(ζ_t) = σ^i_* if e(ζ_t) < 1/m, and let φ^i_t(ζ_t) be any best response by i relative to the assessment μ^i_t(ζ_t) otherwise.

It is obvious that these behavior rules are asymptotically myopic (even strongly asymptotically myopic) relative to the assessment rules given previously. And for these behavior and assessment rules,

P( lim_{t→∞} φ_t(ζ_t) = σ* | ζ_1 ) ≥ 1/2.

To see this, note that if e(ζ_{t'}) < 1/m(t') for t' = 1, ..., t, then the behavior rules prescribe σ* through time t. Thus the probability of the event {e(ζ_t) < 1/m(t), t = 1, 2, ...} under the measure induced by the behavior rules is precisely the same as the probability of this event under the probability distribution where all strategy profiles are drawn independently and identically according to the distribution σ*. By construction, this event has probability at least 1/2 under the i.i.d. measure, so it has probability at least 1/2 under the measure generated by the behavior rules. And, of course, the event {e(ζ_t) < 1/m(t), t = 1, 2, ...} is precisely the event {φ_t(ζ_t) = σ*, t = 1, 2, ...}. This completes the proof by the first construction.
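To keep the arithmetic behind the one-half bound visible, here is the bookkeeping (our restatement of the bound in the text) in display form:

\Pr\Bigl[\,\exists\, m\ \exists\, t>K_m:\ e(\zeta_t)\ge \tfrac1m\Bigr]
\;\le\; \sum_{m=1}^{\infty}\Bigl(1-\frac{2^{m+1}-1}{2^{m+1}}\Bigr)
\;=\; \sum_{m=1}^{\infty} 2^{-(m+1)} \;=\; \frac12 .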
For the second construction, go back to the standard probability space on which is defined a sequence of random strategy profiles that are i.i.d. according to σ*, and let

δ(ζ_t) = max_i || σ^{-i}(ζ_t) − σ^{-i}_* ||.

That is, δ(ζ_t) is the difference (in the sup norm) between the empirical frequencies and the target strategy. From the strong law of large numbers, lim_{t→∞} δ(ζ_t) = 0 almost surely. So for n = 1, 2, ..., let L_n be a positive integer sufficiently large so that the event

{ δ(ζ_t) < 1/n for all t > L_n }

has probability at least equal to (2^{n+1} − 1)/2^{n+1}. Assume that L_{n+1} > L_n. For each t = 1, 2, ..., let n(t) = 1 if t ≤ L_2, and let n(t) = n if L_n < t ≤ L_{n+1}. Note that lim_{t→∞} n(t) = ∞.

Now we give the assessment rules. For any history ζ_t, let

μ^i_t(ζ_t) = σ^{-i}_* if || σ^{-i}(ζ_t) − σ^{-i}_* || < 1/n(t), and μ^i_t(ζ_t) = σ^{-i}(ζ_t) otherwise.

That is, player i believes that her rivals are playing (independently) according to σ^{-i}_* unless and until the accumulated evidence against this hypothesis becomes severe (as measured by 1/n(t)), at which point the player reverts to empirical beliefs. These assessment rules are clearly asymptotically empirical. Behavior, as promised, is precisely myopic; but when more than one strategy is optimal, it still remains to specify what each player does. In such cases, for histories such that μ^i_t(ζ_t) = σ^{-i}_*, the player should use the strategy σ^i_*; otherwise, the player can make any selection desired (say, choose the optimal strategy of lowest index for some fixed enumeration of S^i). With these behavior and assessment rules, the players begin with σ*, and the sequence {L_n} has been constructed precisely so that, with probability one-half or more, the players continue to use σ* forever. The proof of this is like that for the first construction.
Remarks on the two constructions. In both of these constructions, players use precisely the equilibrium strategy for no positive reason at all. The second construction has the comparative advantage of not relying on the flexibility given by asymptotic myopia, as the mixed strategy is (albeit one among many) an optimal choice provided the beliefs. However, the second construction does rely on the flexibility afforded by asymptotic empiricism: for no obvious reason, players believe that their opponents are playing exactly the equilibrium strategy, and they continue to believe this unless and until the evidence to the contrary becomes overwhelming. Thus, although the constructions show that convergence to a mixed-strategy equilibrium is possible, neither one seems particularly plausible. This concurs with our intuition, as mixed-strategy equilibria are hard (for us) to defend in the stark environment of this chapter. Instead, we tend to follow Harsanyi (1973) in interpreting mixed-strategy equilibria as a shorthand description of pure-strategy equilibria in games where parameters of the game (such as the players' payoffs) are subject to small random perturbations which are private information. Sections 7 and 8 consider learning in this context and argue that "mixed-strategy" outcomes are indeed plausible when interpreted in this way.
Asymptotically myopic behavior and compatible histories

Before moving to this development, we have one piece of pending business to take care of. In both Propositions 4.1 and 4.2, we assumed that the players' behavior rules were strongly asymptotically myopic relative to their assessment rules, whereas in Proposition 6.1 behavior is (only) asymptotically myopic. We earlier indicated why the criterion used in the results of Section 4, viz., compatibility of a history and a profile of behavior rules, would not work with asymptotically myopic behavior: that notion of compatibility is (too) weak. In order to get results in the spirit of Section 4 without strong asymptotic myopia, we must work with the sort of probabilistic convergence statements used in this section. Also, because strategies are not assumed to converge, to avoid problems of correlation we restrict attention to the case of two players. And because we do not assume that the players' behavior rules converge, we cannot use the definitions of unstable and locally stable strategy profiles given above. Instead, for fixed behavior and assessment rules, we make the following definition.
Definition. The strategy profile σ* ∈ Σ is xunstable if there exists some ε > 0 such that for all t and ζ_t,

P( ||f^i(ζ_{t'}) − σ^i_*|| < ε for all t' ≥ t and i = 1,2 | ζ_t ) = 0.

The x in front of xunstable is not a typo; it is to distinguish this definition from the definition of an unstable strategy profile given earlier, in which intended (as opposed to empirically observed) play has probability zero of remaining in a small neighborhood of σ*. Note that this captures some of the spirit of Proposition 4.2, in that it asks for empirical frequencies to remain close to a target profile. At the same time, it is a probabilistic statement about the likelihood of this event.

Proposition 6.5. For asymptotically empirical assessment rules μ^i and behavior rules φ^i that are asymptotically myopic with respect to the μ^i, every strategy profile σ* that is not a Nash equilibrium is xunstable.

We omit the proof. The idea is that if empirical (marginal) frequencies stay close to σ*, then so must beliefs. But if beliefs are close to σ* and σ* is not a Nash equilibrium, then for some player i, some pure strategy s^i in the support of σ^i_* will be used with vanishing probability. And then (by the strong law of large numbers) the empirical frequency of s^i must fall to zero, which contradicts the hypothesis that empirical frequencies stay close to σ^i_* (which puts positive weight on s^i).
7. Learning in games with randomly perturbed payoffs

Although Proposition 6.4 shows that all Nash equilibria, including equilibria in mixed strategies, are locally stable, we have suggested that convergence of intended behavior to a mixed-strategy profile in the standard model seems implausible, as it requires that players use just the right mixed strategy whenever they are indifferent, and it is not apparent why they should or would choose to do so. Of course, this apparent drawback of mixed-strategy equilibria is not special to our learning-theoretic approach, but arises whenever mixed strategies are considered. In response to this problem, Harsanyi (1973) proposed that the mixed-strategy equilibria of a game could be interpreted as pure-strategy equilibria of a related game of incomplete information, in which each player's payoff is randomly perturbed by a stochastic shock which is private information. For example, a mixed strategy placing probability 1/3 on one pure strategy and 2/3 on another corresponds to a situation in which the payoff shocks and opponents' strategies are such that the player has a strict preference for the first strategy when his payoff perturbation comes from a set with probability 1/3 and a strict preference for the second when his payoff perturbation comes from the complementary set. Harsanyi showed that for generic strategic-form payoffs, every mixed-strategy equilibrium can be "purified" by any small and well-behaved payoff perturbations.

In the spirit of Harsanyi's work, this section extends our assumptions on behavior and notions of convergence to games in which the players' payoffs are subject to an i.i.d. sequence of payoff perturbations. As we will see, this allows us to construct a more satisfactory model of learning to play a mixed-strategy equilibrium. The concluding section then examines the question of global convergence to a mixed equilibrium in 2 x 2 games.
The model

Consider I players playing a strategic-form game at times t = 1, 2, ..., where the action spaces A^i, i = 1, ..., I, are the same in each period, but the payoffs are subject to random and privately observed shocks. Specifically, the payoff to player i at date t from the action profile a = (a^i, a^{-i}) is

u^i_t(a) = v^i(a) + ε^i_t(a^i).

We call ε^i_t = (ε^i_t(a^i))_{a^i∈A^i} the date-t perturbation of player i's payoffs. We assume that for each i, the {ε^i_t; t = 1, 2, ...} are independent and identically distributed, and that the perturbations of different players are independent. We denote the probability distribution of ε^i_t by ρ^i, and its support, which we suppose is compact, by E^i ⊂ R^{A^i}. Each player knows her opponents' payoff functions and the perturbation distributions; each period, each player i observes the shock ε^i_t to her own payoffs, but does not observe the shocks of her opponents. Hence in the stage game, a pure strategy for player i is a map from E^i to A^i. We call this the augmented or perturbed version of the underlying game. (We will not need to consider mixed strategies in the augmented game.) To model learning in this game, we suppose that at date t player i knows the sequence of (pure) action profiles that have occurred in the past, the past shocks to her own payoffs, and also her current payoff perturbation ε^i_t; player i does not learn the past payoff shocks of her opponents.
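Before introducing notation, a minimal code sketch (ours) of a single date of the augmented game may help fix ideas; the payoff matrices and the uniform shock scale are hypothetical choices, not part of the model's assumptions:

import numpy as np

rng = np.random.default_rng(0)
v1 = np.array([[1.0, -1.0], [-1.0, 1.0]])   # underlying payoffs v^1(a^1, a^2)
v2 = -v1                                     # matching pennies, for concreteness

def play_one_date(mu1, mu2):
    # Each player draws and privately observes her own shock vector (one
    # component per own action), then best-responds to her assessment.
    eps1 = rng.uniform(-0.2, 0.2, size=2)
    eps2 = rng.uniform(-0.2, 0.2, size=2)
    a1 = int(np.argmax(v1 @ mu2 + eps1))     # maximizes E v^1(a^1, .) + eps^1(a^1)
    a2 = int(np.argmax(v2.T @ mu1 + eps2))
    return a1, a2

print(play_one_date(np.array([0.5, 0.5]), np.array([0.5, 0.5])))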
We adopt the following notation:

(a) We use A = Π_{i=1}^I A^i to denote the space of action profiles, with typical element a = (a^i, a^{-i}) ∈ A. Profiles of actions by all players except i are denoted by a^{-i} ∈ A^{-i}. Probability distributions over actions by player i are denoted by α^i ∈ Δ(A^i), and probability distributions over A^{-i} (which can reflect correlations in the actions of i's opponents) are denoted by α^{-i} ∈ Δ(A^{-i}).

(b) Histories of action profiles up to time t are denoted by ζ_t ∈ Z_t = (A)^t; complete histories of actions are denoted by ζ ∈ Z.

(c) We write α(ζ_t) for the empirical distribution of action profiles up to and including time t along the history ζ, and α^{-i}(ζ_t) for the empirical distribution of i's opponents' action profiles up to time t.

(d) In addition, player i knows her own history of payoff perturbations. We use ε^i_{(t)} = (ε^i_1, ..., ε^i_t) to denote the vector of i's payoff perturbations up to and including time t.

(e) A complete history for player i at date t, i.e., all of i's information, looks like ξ^i_t = (ζ_t, (ε^i_1, ..., ε^i_t)) ∈ X^i_t.

(f) Dropping the superscript i, ε_{(t)} denotes the time-t history of the payoff perturbations of all players, and ξ_t = (ζ_t, ε_{(t)}) ∈ X_t denotes a time-t history of plays and all the players' payoff perturbations; dropping the subscript t denotes (respectively) complete histories of these sorts, with ξ ∈ X.

A behavior rule for player i is denoted by φ^i = (φ^i_1, φ^i_2, ...), where φ^i_t : X^i_t → A^i. (Measurability of the behavior rules is always assumed.) An assessment rule for player i is denoted by μ^i = (μ^i_1, μ^i_2, ...), where μ^i_t : Z_t → Δ(A^{-i}). All of this is a straightforward extension of our earlier model. In particular, i's assessment rule gives her predictions of how her opponents will play at each date, based on the history of actions so far.
date, based
not X\
.
on history so
We
far.
assuming
are
perturbations, so
date
some
that
of
all
i
domain
£]
Given behavior rules
fi\
never observe
Z, and
is
's
payoff
would not have assessments
of the
?
i
we
,
could assume that
as long as asymptotic empiricism
for all the players,
i
is
's
assessments
at
at
properly defined.
and given the exogenous probability
on payoff perturbations, we can construct the induced conditional
distributions
probability distribution, conditional on ft, on the space
denote
of
depending on her payoff perturbations. Nonetheless,
notational complexity,
depend on
t
that players other than
seems sensible
it
actions of her opponents
the cost of
In this regard, note that the
X We
.
use
P(-|£«)
to
this probability distribution.
Definition. For augmented games:

(a) The assessment rule μ^i is asymptotically empirical if lim_t ||μ^i_t(ζ_t) − α^{-i}(ζ_t)|| = 0 for every ζ ∈ Z.

(b) The behavior rule φ^i is asymptotically myopic relative to μ^i if, for some sequence of nonnegative numbers {ε_t} converging to zero, i's choice of action at every ξ^i_t is at most ε_t suboptimal against μ^i_t(ζ_t).

(Suboptimality here is measured given the period-t payoff perturbation ε^i_t; we believe that nothing of interest changes with a weaker definition in which suboptimality is measured averaging over ε^i_t, but the proofs are somewhat more involved. Also, throughout we are loose in our notation, taking as understood things such as: in (a), ζ_t denotes the date-t subhistory of the fixed ζ; and in (b), ζ_t is the actions-profile part of ξ^i_t.)

Nash equilibria of the augmented game
Before examining learning in the context of this model, we review the structure of Nash equilibria in the augmented (stage) game. A Nash equilibrium of the augmented game is, as usual, a strategy profile s(·) = (s^1(·), ..., s^I(·)) (where s^i : E^i → A^i) such that each player's chosen strategy maximizes her expected payoff given the strategies of her opponents, or equivalently such that for every i and for ρ^i-almost every ε^i, a^i = s^i(ε^i) maximizes the expectation (over ε^{-i}) of

v^i(a^i, s^{-i}(ε^{-i})) + ε^i(a^i).

Assumption 7.1. For each i, the distribution ρ^i is absolutely continuous with respect to Lebesgue measure on R^{A^i}.
This assumption simplifies the analysis, because it implies that for any distribution of the opponents' actions, ρ^i-almost every ε^i determines a strict best response:

Lemma 7.2. For every α^{-i} ∈ Δ(A^{-i}), the set of ε^i for which

argmax_{a^i∈A^i} [ v^i(a^i, α^{-i}) + ε^i(a^i) ]

is a singleton has measure one under ρ^i. This is true whether α^{-i} reflects independent or correlated play by i's rivals.

We omit the proof, which is based on the observation that the complement of this set lies in a finite union of lower-dimensional hyperplanes. (That is, with probability one, i's payoff perturbation induces a strict preference for one of her actions against any fixed α^{-i}.) For each ε^i and α^{-i}, let b^i(ε^i, α^{-i}) specify some best response for i to α^{-i} when her perturbation is ε^i; this is well defined, since for every α^{-i}, the best response is uniquely determined for ρ^i-almost every ε^i. And let β^i(α^{-i}) be the distribution which b^i induces on player i's actions:

β^i(α^{-i})(a^i) = ρ^i{ ε^i ∈ E^i : b^i(ε^i, α^{-i}) = a^i }.
It is straightforward to prove the following technical result.

Lemma 7.3. The function β^i is continuous.

For each i and strategy s^i : E^i → A^i, let π^i(s^i) denote the distribution on A^i induced by s^i and ρ^i; i.e.,

π^i(s^i)(a^i) = ρ^i{ ε^i ∈ E^i : s^i(ε^i) = a^i }.

For s^{-i} = (s^j)_{j≠i} a profile of strategies for i's opponents, let π^{-i}(s^{-i}) denote the distribution on A^{-i} induced by s^{-i}; it is understood that π^{-i}(s^{-i}) is a product measure (since the various ε^j are independent of each other).

Lemma 7.4. (a) A strategy profile s is a Nash equilibrium of the augmented game if and only if, for each player i, s^i(ε^i) = b^i(ε^i, π^{-i}(s^{-i})) for ρ^i-almost all ε^i. (b) If α = (α^1, ..., α^I) ∈ Δ(A^1) × ... × Δ(A^I) satisfies β^i(α^{-i}) = α^i for all i (where α^{-i} ∈ Δ(A^{-i}) denotes the product measure on A^{-i} whose margins are the various α^j for j ≠ i), then every strategy profile s such that s^i(ε^i) = b^i(ε^i, α^{-i}) for ρ^i-almost every ε^i and all i is a Nash equilibrium.

This is largely a matter of marshalling definitions, hence the proof is omitted. The lemma shows that to analyze Nash equilibria it suffices to work with the induced marginal distributions over actions, which motivates the following definition.

Definition. The vector of marginal distributions α = (α^1, ..., α^I) ∈ Δ(A^1) × ... × Δ(A^I) is a Nash distribution if β^i(α^{-i}) = α^i for all i.
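Here is a computational sketch (ours) of β^i and of a Nash distribution, for the hypothetical case of two actions per player and i.i.d. uniform[−s, s] shocks; the damped fixed-point iteration mimics the continuous-time dynamics studied in the next section:

import numpy as np

s = 0.5
v1 = np.array([[1.0, -1.0], [-1.0, 1.0]])   # hypothetical 2 x 2 payoffs
v2 = -v1

def diff_cdf(z):
    # cdf of eps(2) - eps(1) for independent uniform[-s, s] shocks:
    # triangular on [-2s, 2s].
    if z <= -2 * s: return 0.0
    if z >= 2 * s: return 1.0
    if z <= 0: return (z + 2 * s) ** 2 / (8 * s * s)
    return 1.0 - (2 * s - z) ** 2 / (8 * s * s)

def beta1(a2):
    # Action 1 is chosen iff eps(2) - eps(1) < expected gain of row 1 over row 2.
    gain = (v1[0] - v1[1]) @ np.array([a2, 1 - a2])
    return diff_cdf(gain)

def beta2(a1):
    gain = (v2[:, 0] - v2[:, 1]) @ np.array([a1, 1 - a1])
    return diff_cdf(gain)

a1, a2 = 0.9, 0.1
for _ in range(5000):                        # damped fixed-point iteration
    a1 += 0.01 * (beta1(a2) - a1)
    a2 += 0.01 * (beta2(a1) - a2)
print(round(a1, 3), round(a2, 3))            # the Nash distribution, (0.5, 0.5) here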
Local stability

As one would expect, our results about the relationship between Nash equilibrium and local stability carry over to the context of augmented games. Since all that each player observes about the others, and all that matters for a player's decisions, are the actions chosen, we define stability of profiles and of behavior rules φ in terms of the induced distributions on actions π^i(φ^i_t(ξ_t)) (where, holding the history fixed, φ^i_t(ξ_t) is viewed as a map from i's current perturbation to her action).

Definition. A profile α_* ∈ Δ(A^1) × ... × Δ(A^I) is unstable if there exists some ε > 0 such that for all t and ξ_t,

P( ||π^i(φ^i_{t'}(ξ_{t'})) − α^i_*|| < ε for all t' ≥ t and all i | ξ_t ) = 0.
Proposition 7.5. Fix asymptotically empirical assessment rules μ^i and behavior rules φ^i that are asymptotically myopic relative to the μ^i. Then any α_* that is not a Nash distribution is unstable.

The proof uses the following lemma, which shows that as time passes, myopic behavior converges to the best-response distributions, in the sense that the induced distributions over actions become close.

Lemma 7.6. For any δ > 0 there exists an e > 0 such that for any α^{-i}, the set of ε^i (under ρ^i) for which player i has more than one e-best response against α^{-i} has measure no greater than δ.

Proof. Fix two actions of player i and some α^{-i}, and note that the set of ε^i for which player i is indifferent between the two given actions (against α^{-i}) lies in a lower-dimensional hyperplane; there are (at most) (#A^i)^2 such hyperplanes, where #A^i denotes the cardinality of A^i. The set of ε^i for which the payoffs of the two actions are within e of each other lies in a "sleeve" of diameter proportional to e around the corresponding hyperplane, and the set of ε^i where player i has multiple e-best responses is contained in the union of these sleeves. Because ρ^i is absolutely continuous with respect to Lebesgue measure and E^i is compact, there is a uniform upper bound on the ρ^i-measure of the union of (#A^i)^2 sleeves of this sort, and this uniform upper bound goes to zero as e goes to zero, which establishes the result.

Proof of Proposition 7.5. Fix an α_* that is not a Nash distribution. Without loss of generality, suppose that β^1(α^{-1}_*) ≠ α^1_*, and let δ = ||β^1(α^{-1}_*) − α^1_*||. Since β^1 is continuous, we can find some ε' > 0 so that ||β^1(α^{-1}) − β^1(α^{-1}_*)|| < δ/4 for all α^{-1} within ε' of α^{-1}_*. Let ε = min(ε'/I, δ/4).

We will show that the profile α_* is unstable for this ε. Suppose not, i.e., that for some t and ξ_t,

P( ||π^i(φ^i_{t'}(ξ_{t'})) − α^i_*|| < ε for all t' ≥ t and i = 1, ..., I | ξ_t ) > 0.

Lemma 6.2, applied to the sequence of action profiles (whose conditional distributions are the products of the players' induced action distributions), then implies that, almost surely on this event of positive probability, the empirical distribution α^{-1}(ζ_{t'}) of the opponents' profiles eventually lies within (I−1)ε < ε' of the product distribution α^{-1}_*. Then because assessments are asymptotically empirical, we conclude that (almost surely on this event) ||μ^1_{t'}(ζ_{t'}) − α^{-1}_*|| < ε' for all large t'. This in turn implies that the distribution on actions β^1(μ^1_{t'}(ζ_{t'})) induced by the myopic best response to μ^1_{t'}(ζ_{t'}) is within δ/4 of β^1(α^{-1}_*). From Lemma 7.6, there is an e'' > 0 such that the set of ε^1 admitting more than one e''-best response has ρ^1-measure less than δ/4, so that the distribution over player 1's actions induced by any e''-optimal decision rule is within δ/4 of β^1(μ^1_{t'}(ζ_{t'})). Let t' be large enough that the suboptimization allowed for by player 1's decision rule is less than this e''. The triangle inequality then implies that the marginal distribution over player 1's actions at such t' is within δ/2 of β^1(α^{-1}_*), and hence at least δ/2 > ε away from α^1_*, which contradicts the hypothesis.
The next step, in parallel with Section 6, is to note that every Nash distribution is locally stable for some asymptotically empirical assessment rules and behavior rules that are asymptotically myopic relative to them. This can be most easily shown by adapting the second construction in the proof of Proposition 6.4, in which players believe that the distribution over their rivals' actions corresponds to the equilibrium unless and until they receive sufficient evidence otherwise. With these beliefs, the arbitrary nature of the players' behavior rules is eliminated; rather than just happening to mix in the way the equilibrium prescribes, the players have a strict preference for the behavior they choose. Of course, the players' beliefs are still cooked to favor the equilibrium. The final section will show how, in a special class of games, we do better.
8. Global convergence in a class of 2 x 2 games

This final section will do more than show by example how learning in augmented games can lead to a mixed equilibrium even when the equilibrium is not artificially built in to the behavior rules: In the games we consider, play will converge to the (augmented version of the) mixed equilibrium with probability one, regardless of the initial beliefs of the players. To this end, we will restrict attention to behavior rules and assessments that take precisely the form of fictitious play, as specified in (A) through (D) of Section 3. This provides an explanation for how players might learn to play a mixed equilibrium, without an arbitrary predilection for equilibrium play having been assumed. We do not aim for very general results here; we do not yet have a really satisfactory explanation of when convergence can be guaranteed. Rather, we content ourselves with the special case of 2 x 2 games that (before being augmented) have a unique Nash equilibrium, which moreover is completely mixed. At the end of the section, we speculate about possible extensions, including a sufficient condition for local stability of mixed play under fictitious-play dynamics in other augmented games. We suspect, however, that convergence cannot be guaranteed for general augmented games; we conjecture that an augmented version of Shapley's example would provide the desired counterexample, but we have not verified this.

The remainder of this section will discuss and prove the following result.

Proposition 8.1. Take any 2 x 2 game that has a unique, completely mixed Nash equilibrium, and consider any augmentation of it that satisfies Assumption 8.3 given below. If behavior rules and assessments are as in the model of fictitious play, the induced marginal distributions on actions converge, with probability one, to the unique Nash distribution of the augmented game.
Previous results about the global convergence of behavior in learning processes have focussed on games that are solvable by iterated strict dominance (Moulin, 1984; Guesnerie, 1991; Milgrom and Roberts, 1990; Borgers and Janssen, 1991). In contrast, the augmented games we consider are not dominance solvable.
Preliminaries

Fix a 2 x 2 augmented game with expected payoffs (v^1, v^2) and payoff perturbation vectors ε^1 and ε^2. Write the action sets for each player as A^i = {1,2}, so that, for example, v^2(1,2) is 2's payoff if he chooses column 2 and player 1 chooses row 1. (Player 1's choice of row is listed first.) Because (v^1, v^2) has a unique Nash equilibrium that is completely mixed, (v^1, v^2) has a strict best-response cycle. Rearrange rows and columns if necessary, so that this best-response cycle is counter-clockwise; that is,

v^1(1,1) < v^1(2,1), v^2(2,1) < v^2(2,2), v^1(2,2) < v^1(1,2), and v^2(1,2) < v^2(1,1).

Let F^1(z) be the probability that, on any given date, (ε^1(2) − ε^1(1))/(v^1(1,2) − v^1(2,2)) ≤ z, and let F^2(z) be the probability that, on any given date, (ε^2(1) − ε^2(2))/(v^2(2,2) − v^2(2,1)) ≤ z. These probabilities are derived from the distributions ρ^1 and ρ^2 in the obvious fashion; note that, by Assumption 7.1, F^1 and F^2 are continuous on R.

Suppose player 1 assesses probability α^2 that 2 plays column 1. If she plays row 1, her expected payoff is α^2 v^1(1,1) + (1−α^2) v^1(1,2) + ε^1(1), while row 2 nets her α^2 v^1(2,1) + (1−α^2) v^1(2,2) + ε^1(2). Simple algebra shows that the former is greater, and hence row 1 is chosen, whenever

(ε^1(2) − ε^1(1))/(v^1(1,2) − v^1(2,2)) < 1 − xα^2, where x = (v^1(2,1) − v^1(1,1))/(v^1(1,2) − v^1(2,2)) + 1.

This gives β^1(α^2) = F^1(1 − xα^2). Note that x > 1 and (thus) that β^1 is a nonincreasing function of α^2. A similar computation will show that if we define

y = (v^2(1,1) − v^2(1,2))/(v^2(2,2) − v^2(2,1)) + 1,

then β^2(α^1) = 1 − F^2(1 − yα^1). Note that y > 1 and (thus) that β^2 is a nondecreasing function of α^1.
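Spelling out the "simple algebra" (our expansion; the denominator v^1(1,2) − v^1(2,2) is positive by the best-response cycle): row 1 is weakly better iff

\alpha^2 v^1(1,1) + (1-\alpha^2)\,v^1(1,2) + \epsilon^1(1)
 \;\ge\; \alpha^2 v^1(2,1) + (1-\alpha^2)\,v^1(2,2) + \epsilon^1(2),

which, dividing through by v^1(1,2) − v^1(2,2) and collecting terms, becomes

\frac{\epsilon^1(2)-\epsilon^1(1)}{v^1(1,2)-v^1(2,2)}
 \;\le\; 1-\alpha^2\left[\frac{v^1(2,1)-v^1(1,1)}{v^1(1,2)-v^1(2,2)}+1\right]
 \;=\; 1-x\alpha^2 .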
Lemma 8.2. Fix any 2 x 2 game with a unique Nash equilibrium that is strictly mixed. Then every augmented version of the game that satisfies Assumption 7.1 has a unique Nash distribution (α^1_*, α^2_*), and thus has Nash equilibrium strategies that are essentially unique.

Proof. Nash distributions are pairs (α^1, α^2) where β^i(α^{-i}) = α^i for i = 1,2. To show existence of a solution to these two equations, note that (α^1, α^2) ↦ (β^1(α^2), β^2(α^1)) is a continuous function mapping the unit square into itself, and use Brouwer's fixed point theorem. To show uniqueness, suppose that (α^1_*, α^2_*) and (α^1_{**}, α^2_{**}) are two Nash distributions with α^1_* ≠ α^1_{**} and, without loss of generality, α^1_{**} > α^1_*. Because β^2 is nondecreasing, this implies that α^2_{**} ≥ α^2_*. And because β^1 is nonincreasing, this implies that α^1_{**} ≤ α^1_*, a contradiction. (If instead α^1_* = α^1_{**}, then α^2_* = β^2(α^1_*) = β^2(α^1_{**}) = α^2_{**}.)
We hereafter denote the probabilities of row 1 and column 1 in the unique Nash distribution by α^1_* and α^2_*. In general, it is not the case that α^1_* and α^2_* are both strictly between zero and one, even if the original (unaugmented) game has a unique, completely mixed equilibrium. However, the Nash distribution probabilities are strictly between zero and one if the supports of the perturbations are sufficiently small.

Assumption 8.3. The density function for F^1 is strictly positive on some neighborhood of 1 − xα^2_*, and the density function for F^2 is strictly positive on some neighborhood of 1 − yα^1_*.

This assumption is in implicit form, since the density functions of the distribution functions F^1 and F^2 involve the density functions of the original distributions ρ^1 and ρ^2. Restating it in terms of primitives is somewhat tedious but not difficult; one set of sufficient conditions is that the equilibrium marginal distributions are strictly between zero and one, the supports of F^1 and F^2 are connected, and the density functions of ρ^1 and ρ^2 are bounded and strictly positive on the interior of their supports.
The proof of Proposition 8.1 is fairly involved, and it is easy to get lost in the details. So before giving those details, we sketch the intuition behind the proof. We fix behavior and assessment rules where behavior is myopic with respect to assessments and the assessment rules conform precisely to the model of fictitious play, as given in Section 3. We will write α^i_t for the probability assessed by −i at date t that player i will play her first action. These assessments are random variables, depending on the history of play up to date t. We will show that the players' assessments converge: lim_{t→∞}(α^1_t, α^2_t) = (α^1_*, α^2_*) with probability one; since behavior is myopic, this implies that behavior converges to the Nash equilibrium strategies.

For notational simplicity, we suppose that at all dates t ≥ 2 the players' assessments equal the empirical distributions to date, which corresponds to the case of initial weights identically equal to zero in the fictitious play model. It will be clear that allowing for nonzero initial weights does not alter the analysis. In this case,

α^i_{t+1} = (tα^i_t + 1)/(t + 1) if i plays action 1 in round t, and
α^i_{t+1} = tα^i_t/(t + 1) if i plays action 2 in round t,

which is more conveniently written as

α^i_{t+1} = α^i_t + (1 − α^i_t)/(t + 1) if i plays action 1 in round t, and
α^i_{t+1} = α^i_t − α^i_t/(t + 1) if i plays action 2 in round t.

The key thing to note here is that the size of the changes of the α^i vanishes asymptotically, at rate O(1/t). The theory of stochastic approximation (Arthur et al., 1987; Kushner and Clark, 1978; Ljung and Söderström, 1983) shows when such asymptotically vanishing stochastic variations can be ignored; i.e., when the successive random variables will almost surely evolve according to the evolution of their expected values.
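A direct simulation (ours, with hypothetical payoffs and uniform shocks) of the process just described shows the assessments settling down, as the theory of stochastic approximation suggests and as Proposition 8.1 asserts:

import numpy as np

rng = np.random.default_rng(7)
v1 = np.array([[1.0, -1.0], [-1.0, 1.0]])   # hypothetical matching-pennies payoffs
v2 = -v1
s = 0.5                                      # uniform[-s, s] shock on each own action

n1 = np.array([1.0, 1.0])                    # fictitious-play weights on 1's actions
n2 = np.array([1.0, 1.0])
for t in range(500_000):
    mu1, mu2 = n1 / n1.sum(), n2 / n2.sum()  # the players' current assessments
    a1 = np.argmax(v1 @ mu2 + rng.uniform(-s, s, 2))    # myopic, given own shock
    a2 = np.argmax(v2.T @ mu1 + rng.uniform(-s, s, 2))
    n1[a1] += 1.0
    n2[a2] += 1.0
print(n1 / n1.sum(), n2 / n2.sum())          # both near (0.5, 0.5) in this example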
Starting from some value of (α^1_t, α^2_t), compute the conditional expected values of (α^1_{t+1}, α^2_{t+1}); then use these to compute expected values of (α^1_{t+2}, α^2_{t+2}); and so on. If certain regularity conditions are met, and if for every starting value of (α^1_t, α^2_t) this "successive expected values" sequence converges to (α^1_*, α^2_*), then the random process will almost surely approach (α^1_*, α^2_*).

To make matters a bit more transparent, let us compute E[α^2_{t+1} − α^2_t | ξ_t], where E[·|ξ_t] denotes expectation taken with respect to P(·|ξ_t). Given (α^1_t, α^2_t), the probability that player 2 will play his first strategy is 1 − F^2(1 − yα^1_t), so the expectation of the difference between α^2_{t+1} and α^2_t is

E[α^2_{t+1} − α^2_t | ξ_t] = (1 − F^2(1 − yα^1_t)) (1 − α^2_t)/(t+1) + F^2(1 − yα^1_t) (−α^2_t/(t+1)).

This simplifies to

E[α^2_{t+1} − α^2_t | ξ_t] = (1/(t+1)) (1 − F^2(1 − yα^1_t) − α^2_t).

A similar calculation gives

E[α^1_{t+1} − α^1_t | ξ_t] = (1/(t+1)) (F^1(1 − xα^2_t) − α^1_t).

The fact that the step size in these difference equations is going to zero suggests that the evolution of the "successive expected values" approximates the related differential equation system

(*)  dα^1/dt = F^1(1 − xα^2) − α^1 and dα^2/dt = 1 − F^2(1 − yα^1) − α^2,

where we reinterpret the α's as the successive expected values and where the time index has been changed: Because the amount of change between time t and t+1 in the difference equations is O(1/t) while that in the differential equations is independent of t, more and more steps of the difference equations are compressed per unit of time of the differential equation system as t gets larger.
The trajectories of this system of differential equations are most easily studied by comparing them with those of the system

dα^1/dt = F^1(1 − xα^2) − α^1_* and dα^2/dt = 1 − F^2(1 − yα^1) − α^2_*.

We show below that this second system has closed, convex orbits around the point (α^1_*, α^2_*), and that relative to the second system, the first system always points strictly inward. (See Figure 8-1.) Thus the trajectories of the first system spiral in towards (α^1_*, α^2_*). This suggests that the "successive expected value" sequence approaches (α^1_*, α^2_*) from any starting point, and then the methods of the theory of stochastic approximation will yield the almost sure convergence that we desire.

In relating this intuition to the proof we now give, there are two things to watch for. First, we will use the closed orbit trajectories of the second system of differential equations as level curves for a Lyapunov function. Second, we derive the parts of the theory of stochastic approximation that we need, because the Lyapunov function we construct is a bit less regular than is required for the general results as they are stated in the literature; specifically, our Lyapunov function is not twice continuously differentiable in general.

[Figure 8-1 appears about here.]

Fig. 8-1. Dynamics of expected beliefs. The solid curves represent trajectories of the second system of differential equations given in the text. These closed orbits give the level sets of the Lyapunov function L. (Note that these level sets are not restricted to stay inside the unit square.) The dashed arrows show the trajectories of the first system of differential equations, the system that describes the dynamics of "expected beliefs." (These do stay within the unit square.) Since the dashed arrows always point inwards relative to the closed orbits of the second system, the first system gives trajectories which spiral in towards (α^1_*, α^2_*).
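In place of the missing figure, a numerical sketch (ours; the logistic cdfs are hypothetical stand-ins with x = y = 2 and fixed point (1/2, 1/2)) integrates both systems and exhibits the closed orbit versus the inward spiral:

import math

def F1(z): return 1.0 / (1.0 + math.exp(-4.0 * z))   # stand-in perturbation cdfs
def F2(z): return 1.0 / (1.0 + math.exp(-4.0 * z))
x = y = 2.0
a1s = a2s = 0.5          # F1(1 - 2*0.5) = F1(0) = 0.5, so (0.5, 0.5) is the
                         # Nash distribution for these stand-ins

def first_system(a1, a2):    # dynamics of expected beliefs: spirals inward
    return F1(1 - x * a2) - a1, 1 - F2(1 - y * a1) - a2

def second_system(a1, a2):   # comparison system: closed orbits (level sets of L)
    return F1(1 - x * a2) - a1s, 1 - F2(1 - y * a1) - a2s

for f in (second_system, first_system):
    a1, a2, dt = 0.8, 0.5, 0.001
    dists = []
    for _ in range(20_000):                 # Euler integration over 20 time units
        d1, d2 = f(a1, a2)
        a1, a2 = a1 + dt * d1, a2 + dt * d2
        dists.append(math.hypot(a1 - a1s, a2 - a2s))
    print(f.__name__, round(min(dists), 3), round(max(dists), 3))
# For second_system the distance from (0.5, 0.5) stays bounded away from zero
# (a closed orbit, up to Euler error); for first_system it shrinks toward zero.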
Proof of Proposition 8.1. For (α^1, α^2) ∈ [0,1] × [0,1], let L(α^1, α^2) be the function defined by

L(α^1, α^2) = ∫_{α^1_*}^{α^1} [1 − F^2(1 − yβ^1) − α^2_*] dβ^1 − ∫_{α^2_*}^{α^2} [F^1(1 − xβ^2) − α^1_*] dβ^2;

that is, L(α^1, α^2) = L^1(α^1) + L^2(α^2), where L^1(α^1) = ∫_{α^1_*}^{α^1} [1 − F^2(1 − yβ^1) − α^2_*] dβ^1 and L^2(α^2) = −∫_{α^2_*}^{α^2} [F^1(1 − xβ^2) − α^1_*] dβ^2. The function L (mnemonic for Lyapunov) has the following properties:

(a) L(α^1_*, α^2_*) = 0.

(b) If (α^1, α^2) ≠ (α^1_*, α^2_*), then L(α^1, α^2) > 0.

(c) L is continuously differentiable with gradient vector ∇L = (1 − F^2(1 − yα^1) − α^2_*, α^1_* − F^1(1 − xα^2)). This gradient vector is zero at (α^1_*, α^2_*) and nonzero everywhere else.

(d) L is convex, and the level curves of L are trajectories of the second system of differential equations, which gives closed trajectories that wind around the point (α^1_*, α^2_*).

Property (a) is immediate from the definition of L. For property (b), enlist Assumption 8.3 to note that β^1 ↦ 1 − F^2(1 − yβ^1) − α^2_* is nondecreasing, is zero at β^1 = α^1_* (by the definition of a Nash distribution), and is strictly increasing in a neighborhood of α^1_*; while β^2 ↦ F^1(1 − xβ^2) − α^1_* is nonincreasing, zero at α^2_*, and strictly decreasing in a neighborhood of α^2_*. Property (b) then follows. Property (c) is an exercise in integration. For property (d), convexity follows because L is separable and each piece is the integral of a monotone function (compute the Hessian of L where it exists); and that the level curves are trajectories of the second system is virtually a matter of definitions, together with the properties of F^1 and F^2 just noted: along the second system,

dL/dt = ∇L · (F^1(1 − xα^2) − α^1_*, 1 − F^2(1 − yα^1) − α^2_*) = 0.
Now fix assessment rules according to the model of fictitious play for the two players, and suppose that behavior is precisely myopic with respect to these assessments. For every t ≥ 1, define

L_t(ζ_t) = L(α^1_t, α^2_t) = L^1(α^1_t) + L^2(α^2_t).

In words, L_t(ζ_t) is precisely the value of the function L at the vector of assessments by the two players. We will have proved the theorem if we prove that lim_{t→∞} L_t(ζ_t) = 0 with probability one, since this will imply that the beliefs are converging (with probability one) to α^1_* and α^2_*, and hence (by the earlier analysis and myopic behavior) the marginal distributions over actions are converging to those values.
The first step in proving that lim_{t→∞} L_t(ζ_t) = 0 is to derive an upper bound on E[L_{t+1}(ζ_{t+1}) − L_t(ζ_t) | ξ_t]. Specifically, we will show that there exists a nonpositive continuous function ψ on the unit square, with ψ(α^1, α^2) < 0 for (α^1, α^2) ≠ (α^1_*, α^2_*), such that if K is the uniform bound on the densities of the two perturbation scalars, then

E[L_{t+1}(ζ_{t+1}) − L_t(ζ_t) | ξ_t] ≤ ψ(α^1_t, α^2_t)/(t+1) + K(x+y)/(2(t+1)^2).

We obtain this bound by looking at the two (separable) pieces of L_t(ζ_t). That is, we write

E[L_{t+1}(ζ_{t+1}) − L_t(ζ_t) | ξ_t] = E[L^1(α^1_{t+1}) − L^1(α^1_t) | ξ_t] + E[L^2(α^2_{t+1}) − L^2(α^2_t) | ξ_t]

and begin with estimates of each of the two expectations on the right-hand side.

Consider first the term involving L^1. Given ξ_t, α^1_{t+1} will equal (tα^1_t + 1)/(t+1) if player 1 plays row 1 in round t, and tα^1_t/(t+1) if player 1 plays row 2. Given player 1's beliefs α^2_t about 2's actions, player 1 will play row 1 in round t with probability F^1(1 − xα^2_t). Now if row 1 is played,

L^1(α^1_{t+1}) − L^1(α^1_t) = ∫_{α^1_t}^{(tα^1_t+1)/(t+1)} [1 − F^2(1 − yβ) − α^2_*] dβ,

which is bounded above by

((1 − α^1_t)/(t+1)) [1 − F^2(1 − yα^1_t) − α^2_*] + Ky/(2(t+1)^2).

(This is the length of the interval of integration, times the value of the integrand at β = α^1_t, plus a bound on the integral of the difference between the true integrand and the integrand at α^1_t; this latter bound comes from noting that β ↦ F^2(1 − yβ) is Lipschitz continuous with Lipschitz constant Ky.) Similarly, if row 2 is played,

L^1(α^1_{t+1}) − L^1(α^1_t) ≤ −(α^1_t/(t+1)) [1 − F^2(1 − yα^1_t) − α^2_*] + Ky/(2(t+1)^2).

Hence E[L^1(α^1_{t+1}) − L^1(α^1_t) | ξ_t] is bounded above by

(1/(t+1)) [1 − F^2(1 − yα^1_t) − α^2_*] [F^1(1 − xα^2_t) − α^1_t] + Ky/(2(t+1)^2).

Similar calculations show that E[L^2(α^2_{t+1}) − L^2(α^2_t) | ξ_t] is bounded above by

−(1/(t+1)) [F^1(1 − xα^2_t) − α^1_*] [1 − F^2(1 − yα^1_t) − α^2_t] + Kx/(2(t+1)^2).

Thus E[L_{t+1}(ζ_{t+1}) − L_t(ζ_t) | ξ_t] is bounded above by

(1/(t+1)) { [1 − F^2 − α^2_*][F^1 − α^1_t] − [F^1 − α^1_*][1 − F^2 − α^2_t] } + K(x+y)/(2(t+1)^2),

where we suppress the arguments 1 − yα^1_t and 1 − xα^2_t of F^2 and F^1, respectively. Write [F^1 − α^1_*] as [F^1 − α^1_t + α^1_t − α^1_*] and write [1 − F^2 − α^2_*] as [1 − F^2 − α^2_t + α^2_t − α^2_*], substitute these two expressions into the upper bound, and simplify. This yields the bound displayed above, with ψ defined as follows.
Define

ψ(α^1, α^2) = [F^1(1 − xα^2) − α^1][α^2 − α^2_*] − [1 − F^2(1 − yα^1) − α^2][α^1 − α^1_*];

expanding and cancelling cross terms, ψ can equally be written as

ψ(α^1, α^2) = [F^1(1 − xα^2) − α^1_*][α^2 − α^2_*] − [1 − F^2(1 − yα^1) − α^2_*][α^1 − α^1_*].

In the second form, the signs of F^1(1 − xα^2) − α^1_* and α^2 − α^2_* are opposite (F^1(1 − xα^2) is nonincreasing in α^2 and equals α^1_* at α^2_*), and the signs of 1 − F^2(1 − yα^1) − α^2_* and α^1 − α^1_* are the same, so ψ is a nonpositive function. Moreover, using Assumption 8.3, ψ(α^1, α^2) < 0 if (α^1, α^2) ≠ (α^1_*, α^2_*).
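As a check on the algebra (ours), the two displayed forms of ψ agree because the cross terms cancel:

[F^1-\alpha^1][\alpha^2-\alpha^2_*] - [1-F^2-\alpha^2][\alpha^1-\alpha^1_*]
= \bigl([F^1-\alpha^1_*][\alpha^2-\alpha^2_*] - (\alpha^1-\alpha^1_*)(\alpha^2-\alpha^2_*)\bigr)
 - \bigl([1-F^2-\alpha^2_*][\alpha^1-\alpha^1_*] - (\alpha^2-\alpha^2_*)(\alpha^1-\alpha^1_*)\bigr)
= [F^1-\alpha^1_*][\alpha^2-\alpha^2_*] - [1-F^2-\alpha^2_*][\alpha^1-\alpha^1_*],

with the arguments 1 − xα^2 and 1 − yα^1 of F^1 and F^2 suppressed.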
Putting everything together, this gives the bound

(**)  E[L_{t+1}(ζ_{t+1}) − L_t(ζ_t) | ξ_t] ≤ ψ(α^1_t, α^2_t)/(t+1) + K(x+y)/(2(t+1)^2).

Next, for each t, let

ψ̂_t(ζ_t) = E[L_t(ζ_t) − L_{t−1}(ζ_{t−1}) | ξ_{t−1}], and L̂_t(ζ_t) = L_t(ζ_t) − Σ_{t'=1}^{t} max{ψ̂_{t'}(ζ_{t'}), 0}.

By construction, {L̂_t(ζ_t)} forms a supermartingale over the information sequence {ξ_t}. By (**) and the nonpositivity of ψ, max{ψ̂_t, 0} ≤ K(x+y)/(2t^2), so the subtracted sum is bounded; thus {L̂_t} is a bounded supermartingale, and hence has a limit almost surely. As an immediate consequence, L_t(ζ_t) itself has a limit almost surely.

Finally, it is not possible that, with positive probability, this limit is a value greater than zero. To see this, suppose that along some history ζ, L_t(ζ_t) has a limit greater than zero, so that for some T, L_t(ζ_t) stays above half that limit for all t > T. From the construction of the function L, the supremum of ψ(α^1, α^2) over the region where L stays this large is some η < 0. Increasing T if necessary, it is then a matter of algebra to show, using (**), that ψ̂_t(ζ_t) < η/(2(t+1)) for all t > T, so that Σ_{t'=1}^{t} ψ̂_{t'}(ζ_{t'}) → −∞ along this history. Now define L̃_t(ζ_t) = L_t(ζ_t) − Σ_{t'=1}^{t} ψ̂_{t'}(ζ_{t'}); {L̃_t(ζ_t)} is a martingale over {ξ_t}, and it is bounded below (since L_t ≥ 0 and Σ_{t'} ψ̂_{t'} ≤ Σ_{t'} max{ψ̂_{t'}, 0} < ∞). Accordingly, lim_{t→∞} L̃_t(ζ_t) = ∞ along every history for which L_t(ζ_t) has a nonzero limit. If L_t(ζ_t) has a nonzero limit with positive probability, then L̃_t(ζ_t) has limit ∞ with positive probability; in this case, lim_{t→∞} E[L̃_t(ζ_t)] = ∞, which contradicts the fact that {L̃_t(ζ_t)} is a martingale. Hence lim_{t→∞} L_t(ζ_t) = 0 with probability one, which completes the proof.
Extensions of Proposition 8.1
Proposition 8.1 gives a relatively restricted global convergence result. It is restricted in that the behavior and assessment rules that are permitted are quite specific; behavior must be precisely myopic with respect to assessments that are formed according to the model of fictitious play. It is further restricted in that it considers only 2x2 games, and then only 2x2 games in which the unaugmented game has a single equilibrium that is completely mixed, and then only those for which the augmentation satisfies Assumption 8.3.

Thinking first about the behavior and assessment rules, it is clear that extensions are possible. (Indeed, extensions along these lines are suggested by Arthur et al. (1987).) Assessments can be asymptotically empirical and behavior asymptotically myopic, as long as the "rates of convergence" to fictitious play and to myopic behavior are sufficiently fast.

All we need for the bounds on $\mathrm{E}[L_{t+1}-L_t\mid\zeta_t]$ — which involve the two possible values of the differences $\alpha^i_{t+1}-\alpha^i_t$ and the probabilities of those values (two for $i=1$ and two for $i=2$) — is that the changes, in terms of these differences and probabilities, contribute differences that are uniformly $O(1/t^2)$ or smaller. We can control the probabilities by imposing a rate of convergence on the sequence $\{\epsilon_t\}$ that governs the asymptotic part of asymptotically myopic behavior. But for the differences $\alpha^i_{t+1}-\alpha^i_t$, more delicacy is called for. One's first instinct might be to impose a rate of convergence of asymptotically empirical assessments to empirical assessments; the natural condition is that $|\alpha^i_t-\bar\alpha^{-i}(\zeta_t)|$ is at most $O(1/t^2)$, where $\bar\alpha^{-i}(\zeta_t)$ is the fraction of the time that $-i$ has played his first strategy. But this uniform condition is a bit stronger than what would seem to be needed, since it controls how far $\alpha^i_t$ is from empirical frequencies; we don't need to know that $\alpha^i_t$ is "close" to empirical frequencies, but only that, if $\alpha^i_t$ is far from empirical frequencies, then $\alpha^i_{t+1}$ is about the same distance away (from the new empirical frequency) in the same direction. In this regard, note that for fictitious play beliefs with nonzero initial weight vectors, $|\alpha^i_t-\bar\alpha^{-i}(\zeta_t)|$ is $O(1/t)$. If we insisted that $|\alpha^i_t-\bar\alpha^{-i}(\zeta_t)|$ be $O(1/t^2)$ or smaller, we would rule out these assessment rules, for which the result does hold.
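To make the $O(1/t)$ claim concrete, a one-line computation suffices; the initial weight $k_0>0$ and prior mean $\mu_0$ are notation introduced here for illustration, not the paper's symbols:

$$\alpha^i_t=\frac{k_0\,\mu_0+t\,\bar\alpha^{-i}(\zeta_t)}{k_0+t}\qquad\Longrightarrow\qquad\bigl|\alpha^i_t-\bar\alpha^{-i}(\zeta_t)\bigr|=\frac{k_0}{k_0+t}\,\bigl|\mu_0-\bar\alpha^{-i}(\zeta_t)\bigr|\;\le\;\frac{k_0}{t}.$$

So the gap decays at rate $1/t$ and, generically, no faster, which is why an $O(1/t^2)$ requirement would exclude exactly these rules.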
Extending our results to other classes of 2x2 games or beyond 2x2 games (and to games with more than two players) seems to offer greater challenges. We conjecture that results on local stability can be derived, along the following line. Take an augmented game and any equilibrium distribution for that game. Write down the continuous-time dynamics for the expected values of the empirical frequencies, as in equations (*) above. If this system is locally stable at the equilibrium values by the usual eigenvalue test, then the equilibrium will be locally stable in the sense of this paper for fictitious-play learning dynamics. (Compare with Arthur et al. [1987].)
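To illustrate the conjectured recipe, here is a minimal sketch for a 2x2 case, assuming smoothed logistic response functions; the functions BR1 and BR2 and the parameter lam are illustrative assumptions, not the paper's specification. It writes down the continuous-time dynamics, computes the Jacobian at a fixed point numerically, and applies the eigenvalue test.

```python
import numpy as np

# Continuous-time counterpart of fictitious play for the expected empirical
# frequencies: da1/dt = BR1(a2) - a1 and da2/dt = BR2(a1) - a2, where BR^i is
# the (smoothed) probability that player i plays his first strategy.
lam = 5.0  # assumed smoothing parameter
BR1 = lambda a2: 1.0 / (1.0 + np.exp(-lam * (1.0 - 2.0 * a2)))
BR2 = lambda a1: 1.0 / (1.0 + np.exp(-lam * (2.0 * a1 - 1.0)))

def field(a):
    """Right-hand side of the continuous-time dynamics."""
    return np.array([BR1(a[1]) - a[0], BR2(a[0]) - a[1]])

a_star = np.array([0.5, 0.5])  # fixed point of these symmetric responses

# Numerical Jacobian at the equilibrium, then the usual eigenvalue test.
h = 1e-6
J = np.column_stack([(field(a_star + h * e) - field(a_star - h * e)) / (2 * h)
                     for e in np.eye(2)])
eig = np.linalg.eigvals(J)
print(eig, "locally stable" if np.all(eig.real < 0) else "eigenvalue test fails")
```

For these particular responses the eigenvalues form a complex pair with negative real part, so the fixed point is a stable spiral and the test is passed.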
It is interesting to speculate whether the continuous-time system that goes with the Shapley (1964) counterexample is unstable at the equilibrium. If so, then its instability under fictitious play (for augmentations) might follow.
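On that speculation: even in discrete time, fictitious play can be watched failing to settle down in this example. A minimal sketch, assuming the standard textbook rendering of Shapley's payoff matrices (our assumption, not reproduced from Shapley (1964) directly):

```python
import numpy as np

# One common rendering of Shapley's 3x3 counterexample: the unique
# equilibrium is mixed, and fictitious play is known not to converge.
u1 = np.array([[0, 1, 0], [0, 0, 1], [1, 0, 0]], dtype=float)
u2 = np.array([[0, 0, 1], [1, 0, 0], [0, 1, 0]], dtype=float)

counts = np.ones((2, 3))              # initial weight vectors (assumed)
for t in range(100_000):
    a1 = counts[1] / counts[1].sum()  # player 1's assessment of player 2
    a2 = counts[0] / counts[0].sum()  # player 2's assessment of player 1
    counts[0, np.argmax(u1 @ a1)] += 1   # exact (unperturbed) best replies
    counts[1, np.argmax(u2.T @ a2)] += 1

print(counts[0] / counts[0].sum(), counts[1] / counts[1].sum())
```

The empirical frequencies oscillate around (1/3, 1/3, 1/3) in ever-longer runs rather than converging, consistent with Shapley's non-convergence result.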
Regardless of these conjectures, we hope that the limited results we have managed to derive here indicate that, with Harsanyi's notion of purification, it is plausible that in some cases players would learn to play a mixed strategy equilibrium. This is conceptually straightforward, but it is not a simple exercise in practice, which is one reason we offer conjectures instead of results.
References
Arthur, W. B., Yu. M. Ermoliev, and Yu. M. Kaniovski (1987), "Limit theorems for proportions of balls in a generalized urn scheme," mimeo, IIASA.

Borgers, T., and M. Janssen (1991), "On the dominance solvability of large Cournot games," mimeo, University College, London.

Brown, George W. (1951), "Iterative solutions of games by fictitious play," in T. C. Koopmans (ed.), Activity Analysis of Production and Allocation, Wiley and Sons, New York.

Canning, D. (1991), "Social equilibrium," mimeo, Cambridge University.

Chung, K.-L. (1974), A Course in Probability Theory, 2nd edition, Academic Press, New York.

Crawford, V., and H. Haller (1990), "Learning how to cooperate: Optimal play in repeated coordination games," Econometrica, Vol. 58, 571-96.

Eichberger, J., H. Haller, and F. Milne (1991), "Naive Bayesian learning in 2x2 matrix games," mimeo, University of Melbourne.

Ellison, G. (1991), "Learning, local interaction, and coordination," mimeo, MIT.

Fudenberg, D., and D. Kreps (1988), "A theory of learning, experimentation, and equilibrium in games," mimeo, Stanford University.

Fudenberg, D., and D. Levine (1989), "Reputation and equilibrium selection in games with a patient player," Econometrica, Vol. 57, 759-78.

Fudenberg, D., and D. Levine (1992), "Self-confirming equilibrium," mimeo, MIT.

Guesnerie, R. (1991), "An exploration of the eductive justification of the rational expectations hypothesis," forthcoming, American Economic Review.

Harsanyi, J. C. (1973), "Games with randomly disturbed payoffs: A new rationale for mixed-strategy equilibrium points," International Journal of Game Theory, Vol. 2, 1-23.

Hendon, E., H. Jacobsen, and B. Sloth (1991), "Fictitious play in extensive form games and sequential equilibrium," mimeo, University of Copenhagen.

Jordan, J. (1991), "Bayesian learning in normal form games," Games and Economic Behavior, Vol. 3, 60-81.

Kalai, E., and E. Lehrer (1990), "Rational learning leads to Nash equilibrium," mimeo, Northwestern University.

Kandori, M., G. Mailath, and R. Rob (1991), "Learning, mutation, and long run equilibria in games," mimeo, University of Pennsylvania.

Kushner, H., and D. Clark (1978), Stochastic Approximation Methods for Constrained and Unconstrained Systems, Springer-Verlag, New York.

Ljung, L., and T. Soderstrom (1983), Theory and Practice of Recursive Identification, MIT Press, Cambridge, Mass.

Marcet, Albert, and Thomas Sargent (1989a), "Convergence of least squares learning mechanisms in self-referential linear stochastic models," Journal of Economic Theory, Vol. 48, 337-68.

Marcet, Albert, and Thomas Sargent (1989b), "Convergence of least-squares learning in environments with hidden state variables and private information," Journal of Political Economy, Vol. 97, 1306-22.

Milgrom, P., and J. Roberts (1990), "Rationalizability, learning, and equilibrium in games with strategic complementarities," Econometrica, Vol. 58, 1255-78.

Milgrom, P., and J. Roberts (1991), "Adaptive and sophisticated learning in repeated normal form games," Games and Economic Behavior, Vol. 3, 82-100.

Miyasawa, K. (1961), "On the convergence of the learning process in a 2x2 non-zero sum two-person game," mimeo, Princeton University.

Moulin, H. (1984), "Dominance solvability and Cournot stability," Mathematical Social Sciences, Vol. 7, 83-103.

Nyarko, Y. (1991), "Learning and agreeing to disagree without common priors," mimeo, New York University.

Robinson, J. (1951), "An iterative method of solving a game," Annals of Mathematics, Vol. 54, 296-301.

Shapley, L. (1964), "Some topics in two-person games," in M. Dresher, L. S. Shapley, and A. W. Tucker (eds.), Advances in Game Theory, Princeton University Press, Princeton, NJ.

Young, H. P. (1991), "The evolution of conventions," mimeo, University of Maryland.