Working Paper
Department of Economics

RULES OF THUMB FOR SOCIAL LEARNING

Glenn Ellison
Drew Fudenberg

No. 92-12, June 1992

Massachusetts Institute of Technology
50 Memorial Drive
Cambridge, Mass. 02139

Rules of Thumb for Social Learning 1

Glenn Ellison 2
Drew Fudenberg 3

June 18, 1992

1 We thank Abhijit Banerjee, Mathias Dewatripont, and seminar participants at University College, London, Nuffield College, Oxford, GREMAQ Toulouse, and the Universite Libre de Bruxelles for helpful comments. We are also happy to acknowledge financial support from a National Science Foundation Graduate Student Fellowship, from NSF grant SES 90-08770, the John Simon Guggenheim Foundation, and Sloan Foundation Graduate Program.
2 Harvard University, Department of Economics
3 MIT, Department of Economics, and IDEI, Toulouse.

ABSTRACT

This paper considers agents who use the experiences of their neighbors in deciding which of two technologies to use. We consider two learning environments, one where the same technology is optimal for all players and the other where each technology is better for some of them. In both environments, we suppose that players use exogenously specified rules of thumb that ignore all historical data but which may incorporate a tendency to use the more popular technology. These naive rules can lead to fairly efficient decisions in the long run, but adjustment can be quite slow when a superior technology is first introduced.

JEL Classifications: D8, C7, N53, O30

1. Introduction

This paper presents two simple models of how economic agents decide which of two technologies to use when the relative profitability of the technologies is unknown. In both models, agents base their decisions, at least in part, on the experience of their neighbors; this is what we mean by "social learning." We believe that social learning is frequently an important aspect of the process of technology adoption, where "technology" should be broadly construed: Although our main example concerns the adoption of agricultural technology in the English agricultural revolution, we believe that the models may also be applicable to such choices as parents' decisions whether to send their children to a public or private school.
There have been several previous models of the role of social learning in technology adoption. Perhaps the earliest is the contagion process, which models adoption as a random matching process in which players switch to the new technology the first time they meet someone who is using it; this process yields the familiar "S-shaped curve" for the time path of adoption that has been widely used in empirical work.
Recent papers by Banerjee [1991a], [1991b], Bikhchandani et al [1990] and Smith [1991] study more sophisticated models of social learning, in which players must decide which of two choices is better. The primary question of interest in these models is whether the social learning is sufficient to prevent the population from locking on to the wrong decision. These papers suppose that players observe one another's choices, but that players do not observe the payoffs that these choices generate. 2 In a model where players choose sequentially, later decision makers may thus be led to make the same wrong choice as their predecessors, as the late deciders do not observe that the early ones now regret their choice. Moreover, this can occur even though the players could identify the optimal choice with certainty by pooling their information.
The learning environments we study differ from those in previous work in three ways. First, we believe that, in the context of technology adoption, it is more natural to suppose that players do observe their neighbors' payoffs, as well as their choices. Second, our paper differs in supposing that individuals periodically reevaluate their own decision as they observe other players' outcomes. 3 Third, we consider the possibility that players may be sufficiently heterogeneous that under full information they would not all make the same choice. 4 In the diffusion of agricultural innovations, for example, different technologies may be appropriate for different soils and climates.
In addition to these differences in the learning environment, our paper also differs from those cited in the style of its analysis: Instead of assuming that the adoption process is described by the equilibrium of a game played by fully rational agents, we suppose that players use exogenously specified, and quite simple, "rules of thumb." We have several reasons for proceeding in this fashion. First, in some of the environments we consider, fully Bayesian learning requires calculations that may be too complicated to be realistic. A second motivation for our approach is that, to the extent that the technology choice may be substantially different than the previous decisions the players have faced, we would be uncomfortable with the assumption that the technology adoption process is described by an equilibrium. A somewhat different motivation is simply technical expediency: we did not see an easy way to incorporate various considerations we feel are important into a rational-actor, equilibrium, model.
The paper is structured around two simple models of learning environments. The first has one homogeneous population of players choosing between two competing technologies, with the payoff to each technology subject to an aggregate i.i.d. shock. Each period, only some fraction of the players has the opportunity to revise their choices; these players make their choices using simple rules of thumb.
Our analysis begins with a particularly simple "naive" rule of thumb in which players ignore all historical data and simply choose whichever technology worked better in the previous period. This rule will lead the popularity of the two technologies to fluctuate unless one of the technologies has a higher payoff for all values of the shock.
We subsequently consider "popularity weighting," rules which incorporate a tendency to choose a more popular technology even if it was somewhat less profitable last period. Since players observe both the choices and the payoffs of their neighbors, they would have no reason to use popularity weighting if they made full use of their information. However, we feel that real-world decision makers often do pay attention to popularity, and indeed our results may help provide some explanation for this phenomenon.
Specifically, in our model the appropriate use of popularity weighting leads players to adopt and stick with the better technology. Intuitively, a strategy which is more popular today is likely to have done well in the past, so that the relative popularity of the technologies can serve as a proxy for their historical performance. Thus it is fairly clear that popularity weighting rules can lead to better decisions; we find that one particular choice of popularity weights picks out the better technology in the long run. This leads us to ask whether there are any reasons to believe that this optimal popularity rule is either particularly likely or particularly unlikely to be used. In response, we present a model of players choosing popularity rules in which the optimal rule is the unique symmetric equilibrium.
Our second model has a heterogeneous population, with each technology better for some of the players. Thus the question here is not whether the better technology will be adopted, but rather whether the new technology will be adopted by the appropriate players. We suppose that there is a continuum of players uniformly distributed over a line, and that nearby players have similar payoffs to the two technologies. 5 Moreover, we suppose that players base their decisions on the relative performance of the two technologies at locations that are within one "window width" of their own. This window width, which is exogenous in our model, can be thought of as either the result of an informational constraint (players may not observe outcomes at far-away locations) or as the result of a prior belief that players at far-away locations are not sufficiently similar for their experiences to be informative.
Once again, players revise their technology choices using simple rules of thumb. In particular, we suppose that players do not know exactly how location influences relative payoffs, and thus simply compare the average payoffs of the two technologies in their window, as opposed to using more sophisticated statistical methods.
This second model provides a number of predictions about the types and magnitudes of the errors that are likely to be made. The spatial nature of the process allows some degree of social learning even without popularity weighting, and the long-run state of the system is approximately efficient when the window width is small. However, small window widths imply that the system converges more slowly, which can be costly if the initial state is far from the optimum. Roughly speaking, increasing the popularity weighting in the spatial model has about the same effect as decreasing the window width.
Before proceeding further, we should acknowledge that our belief that models of bounded rationality are a useful way to study social learning does not mean that we are completely satisfied with the particular rules we consider. In particular, in the first model use of history does not seem so complicated as to be unreasonable. Our purpose is not to argue that any one of these models is particularly compelling, but rather to identify general properties that seem to occur in some of the more obvious formulations. One recurrent conclusion is that in a number of cases the long-run state of the system is fairly efficient, even though the individual decision rules are quite naive.

2. The English Agricultural Revolution

Before developing our models, we would like to use the example of the English agricultural revolution to introduce some of the issues that our models are designed to address. The revolution in question is the improvement in English agriculture between 1650 and 1850. This improvement is usually attributed to the spread of new agricultural practices, and in particular to what is called the "new husbandry," although both the size of the improvement and its causes have become more controversial in recent years. The new husbandry refers to a variety of new crops and new crop rotations which arrived in England from Flanders in the 17th century, based on the idea of growing crops such as clover or turnips instead of leaving the land fallow. These crops could rejuvenate the soil, and the hoeing they required had the by-product of eliminating weeds. The crops themselves can be fed to livestock, which allows larger herds to be carried over the winter, providing further supplies of manure, which in turn could be used for fertilizer. The net result is (ideally) increased production of both livestock and grains. 7
The diffusion of the new husbandry seems to have involved all of the major aspects of our models that we mentioned in the introduction. First, it seems likely that farmers were able to observe, at least roughly, the output of their neighbors, as well as their neighbors' choice of crops and crop rotations. Second, the payoffs to various crops were different at different locations, depending on the soil, climate, and terrain of each farm. There is not yet a consensus on where the new husbandry should have been adopted, but it is clear that it was not a universal improvement: Turnips were most suited to the light clay soils of Norfolk, and were unprofitable in wet clay soils like those of the Midland Plain, where the crop had limited growth, and was so difficult to harvest it was often left to rot in the field. Third, a crop of turnips could be ruined by excessive rain or severe frosts, leading us to incorporate the weather as an annual stochastic shock complicating the learning process. 9
Moving away from the physical description of the situation to our (harder to verify) assumptions about the agents' behavior, it seems clear that the final adoption decisions resulted from decentralized learning, as opposed to a pronouncement from a central authority (although there may have been attempts made along this line, as we discuss below). And, it seems plausible that the farmers were less sophisticated in their use of past observations than Bayesian learning would suggest. Moreover, with capital markets poorly developed or nonexistent, and starvation a potential concern, it seems plausible that farmers' technology decisions were determined primarily by short-term considerations, and that farmers would be unlikely to experiment with a technology with a lower expected return.
The two models we discuss explore two different aspects of the adoption process. The homogeneous-population model looks at a single location in isolation, and focuses on the dynamics of the adoption process. It has been frequently noted that farmers as a group are seemingly very hesitant to try new technologies. These comments do not suggest that all farmers are equally hesitant; for example, Slicher von Bath (op. cit., p. 243) notes that during the English agricultural revolution, "Land tilled in very ancient ways lay next to fields in which crop rotations were followed." This observation fits with our assumption of inertia, meaning that at each date only some fraction of the population considers changing technologies.
The apparent inertia in the diffusion of the new husbandry has been criticized by both contemporary and modern authors as slowing progress, and indeed it does slow the transition from a dominated technology (one that is worse in all states of the world) to a new one. However, our analysis shows that inertia may improve the long-run performance of the process if the performance of the two technologies is subject to sufficiently large random shocks. Given our assumption that players do not keep track of past outcomes, the process without inertia oscillates between the two technologies, while a combination of inertia with a tendency to use the most popular technique permits the learning process to converge to the better of the two.
Our second model examines the idea that players learn from their "neighbors" when a new technology may turn out to be profitable for some but not all of the potential adopters. (In the case of farming, we take this spatial structure literally, but we also have in mind learning from "neighbors" who are believed to be similar, but who need not be geographically adjacent.) One of the most striking and frequently noted characteristics of the agricultural revolution is the slow rate at which the innovations spread in England. Contemporary observers said that the rate was only one mile per year, and indeed it did take more than a century for the new husbandry to make its way across the island.
Many explanations have been proposed for this slow spread, including technological factors like pests and diseases, institutional factors such as the enclosures, and the farmers' lack of education. While all of these factors may have played a role, our analysis suggests that the basic fact of a slow rate of diffusion need not be surprising, as it can be a natural consequence of social learning, particularly when the difference in payoffs is not great, and when farmers pay attention to the relative popularity of each technology in making their decisions.
A second and more spirited debate has surrounded the role of elite landlords and agricultural reformers on the diffusion process. The classic studies of Ernle [1912] and Mantoux [date] portrayed the agricultural revolution as the result of valiant struggle by innovators such as Jethro Tull and Lord Townshend to overcome the ignorance of the peasant farmers. Revisionist authors have attacked this view; a main component of their argument has been that the new techniques were already in use in some areas of England before many of the so-called innovators were born.
Our model reinforces the pro-elitist side of this debate by emphasizing that popularizers can have a significant impact in promoting the diffusion of a technique even if they are not the first to develop it. The arrival of the new husbandry in England is often dated to the publication in 1650 of Sir Richard Weston's observations of agricultural practices in Brabant and Flanders. When one examines the spread of the new husbandry as detailed by Kerridge [op. cit., pp. 272-279], it appears that the new husbandry spread out very slowly from a number of geographically distinct locations. The first adoptions were in Suffolk, but by 1680 the new husbandry had appeared in several other locations over a hundred miles away. By providing information from other counties, the agricultural popularizers may have promoted such subsequent innovations, and sped up the rate of the technology's diffusion.
In addition, our theoretical model may shed some light on another question which has received less attention in the literature: Was the new husbandry eventually adopted in all of the areas to which it was suited? As a potential guide to historical research, we discuss the conditions under which naive learning processes of the kind we consider tend to generate efficient long-run outcomes. Beyond these historical questions, our spatial model of learning raises the basic question of how well "naive" learning rules perform that we first investigated in our homogeneous-population model. Our answers here focus on the tradeoff between rates of adoption and the efficiency of the long-run equilibrium.

3. A Simple Model of Homogeneous Populations

Before considering social learning systems with a heterogeneous population, it is interesting to consider the simpler case in which the same technology is optimal for all players. This model can be thought of as describing behavior at a single site in the model we consider later on, where the relative payoffs vary with location. Suppose that there is a large (continuum) population of players at a single site, each of whom must choose whether to use technology f or technology g. 10
In each period, all players using the same technology receive the same payoff. (Given our assumption that players observe one another's payoffs, nothing would be changed if we allowed each player's payoff to be subject to idiosyncratic shocks.) We suppose that the payoffs to the two technologies at date $t$, $u_t^f$ and $u_t^g$, are related by the equation

$$u_t^g - u_t^f = \theta + \epsilon_t, \tag{1}$$

where $\theta$ is a fixed but unknown constant parameter and the shocks $\epsilon_t$ are i.i.d. with mean zero and cumulative distribution function $H$. (In later sections the constant $\theta$ will vary with location.) We will assume that $p = \mathrm{Prob}[u_t^g - u_t^f \ge 0] = 1 - H(-\theta)$ is strictly between 0 and 1. In the initial period, denoted period 0, a fraction $x_0$ of the players are using technology g.
After each period, a fraction $\alpha$ of the players have the opportunity to revise their choice. Very low values of $\alpha$ might correspond to a system in which individual players made their choices for the duration of their lifetimes, with $\alpha$ corresponding to the effective inflow of replacement players; intermediate values of $\alpha$ might describe a system in which revisions are costly, or in which the choice of a technology is embodied in a capital good that will not be replaced until it wears out. 12
We suppose that the players who are revising their choice can observe the average payoffs of both technologies in the previous period. However, players do not have access to the entire history of payoff observations. To justify this assumption, we suppose that individual players revise their choices too infrequently to want to keep track of each period's results, and more strongly that the market at this particular "location" is too small for a record-keeping agency to provide this service.
The simplest behavior rule we consider is the "unweighted" rule under which all players who revise their choice pick the technology which did best in the preceding period. Under this adjustment rule, the evolution of the system is described by

$$x_{t+1} = \begin{cases} (1-\alpha)x_t + \alpha & \text{with probability } p = \mathrm{Prob}[u_t^g \ge u_t^f], \\ (1-\alpha)x_t & \text{with probability } (1-p) = \mathrm{Prob}[u_t^g < u_t^f], \end{cases} \tag{2}$$

so that

$$E(x_{t+1} \mid x_t) = (1-\alpha)x_t + \alpha p. \tag{2'}$$
Note that this specification is symmetric in its treatment of the adoption and discontinuance decisions, which corresponds to the case where the costs of "transition" are small. 13 Our reading about the English agricultural revolution, as well as studies of more recent innovations cited in Rogers and Shoemaker (p. 115), suggest that the amount of discontinuance is an important factor in the diffusion process.
The following result is standard; it follows from e.g. theorem 10 of Norman [1968]. (It is also a consequence of part (b) of proposition 2 below.)

Proposition 1: The system (2) is ergodic, i.e. the time-average of $x_t$ converges to its expectation with respect to its unique invariant measure $\mu$. Moreover, $E_\mu(x) = p$, and $\mathrm{var}_\mu(x) = p(1-p)\alpha/(2-\alpha)$.
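
The moments in Proposition 1 can be checked numerically. The following is a minimal simulation sketch of rule (2), not part of the original paper: the function name, the parameter values `p = 0.6` and `alpha = 0.2`, the run length, and the seed are all illustrative assumptions.

```python
import random

def simulate_unweighted(p, alpha, x0=0.5, steps=200_000, burn_in=1_000, seed=0):
    """Simulate rule (2): each period a fraction alpha of players revises.
    With probability p technology g did better and all revisers pick g;
    otherwise all revisers pick f."""
    rng = random.Random(seed)
    x = x0
    total = 0.0
    total_sq = 0.0
    n = 0
    for t in range(steps):
        g_won = rng.random() < p
        x = (1 - alpha) * x + (alpha if g_won else 0.0)
        if t >= burn_in:  # drop early periods that still depend on x0
            total += x
            total_sq += x * x
            n += 1
    mean = total / n
    var = total_sq / n - mean * mean
    return mean, var

# Illustrative parameter values (assumptions, not taken from the paper).
p, alpha = 0.6, 0.2
mean, var = simulate_unweighted(p, alpha)
predicted_var = p * (1 - p) * alpha / (2 - alpha)
print("time-average of x:", mean, "predicted:", p)
print("variance of x:", var, "predicted:", predicted_var)
```

The time average and the sample variance should land close to $p$ and $p(1-p)\alpha/(2-\alpha)$, with the gap shrinking as the run gets longer.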

4. A Single Location with Popularity Weighting

Proposition 1 says that observing the long-run fraction of players using technology g reveals the fraction of the time that g has been the better choice. If the distribution $H$ of $\epsilon$ is symmetric, the arm that is more often better is also the arm with the higher expected payoff. (The same conclusion holds so long as the amount of asymmetry in $H$ is small compared to $\theta$.) This suggests that if all other players in the population are choosing whichever technology has the highest current score, each player could gain by considering the relative popularity of the two technologies, as well as their recent payoffs. Intuitively, the current popularity provides some information about the past history of the process, and thus can serve as a proxy for it. Although this information is not complete, the complete history of the process is not needed to identify the better technology. One way to interpret the results of this section is that in some cases the popularity weighting is a good enough proxy for the history that the system eventually converges to the correct choice.
We now develop a simple parametric model of popularity weighting with a single location. As above, we suppose that only a fraction $\alpha$ of the population updates its choice each period. Now, though, instead of choosing the technology which did best last period, the choice rule is

(3) "Choose g if $u_t^g - u_t^f \ge m(1 - 2x_t)$."
Under this rule, the probability that those players who revise their choices choose g is $\mathrm{Prob}[\theta + \epsilon_t \ge m(1 - 2x_t)] = 1 - H(m(1-2x_t) - \theta)$; when all players use rule (3), the fraction using g evolves according to

$$x_{t+1} = \begin{cases} (1-\alpha)x_t + \alpha & \text{with probability } 1 - H(m(1-2x_t) - \theta), \\ (1-\alpha)x_t & \text{with probability } H(m(1-2x_t) - \theta). \end{cases} \tag{4}$$
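
As an illustration of how rule (3) changes the revision probability, the sketch below evaluates $1 - H(m(1-2x_t) - \theta)$ and the implied conditional mean of $x_{t+1}$. It assumes, purely for the sake of a concrete example, that $H$ is uniform on $[-\sigma, \sigma]$; the function names and numbers are hypothetical and not taken from the paper.

```python
def uniform_cdf(z, sigma):
    """CDF of a uniform distribution on [-sigma, sigma] (assumed example for H)."""
    if z <= -sigma:
        return 0.0
    if z >= sigma:
        return 1.0
    return (z + sigma) / (2 * sigma)

def prob_choose_g(x, m, theta, H):
    """Probability that a reviser picks g under rule (3): 1 - H(m(1-2x) - theta)."""
    return 1.0 - H(m * (1 - 2 * x) - theta)

def expected_next_x(x, alpha, m, theta, H):
    """Conditional mean of next period's fraction using g, from the two branches of (4)."""
    return (1 - alpha) * x + alpha * prob_choose_g(x, m, theta, H)

# Illustrative values (assumptions).
sigma, theta, alpha = 1.0, 0.2, 0.1
H = lambda z: uniform_cdf(z, sigma)

# With m = 0 the probability does not depend on x and equals p = 1 - H(-theta).
p = 1.0 - H(-theta)
print(prob_choose_g(0.1, 0.0, theta, H), prob_choose_g(0.9, 0.0, theta, H), p)

# With m > 0 the probability of choosing g rises with g's current popularity.
print(prob_choose_g(0.1, 0.5, theta, H), prob_choose_g(0.9, 0.5, theta, H))
```

Setting `m = 0` recovers the unweighted rule, which is one quick way to confirm that a given implementation of the weighted rule nests the earlier one.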
The parameter $m$ indexes the amount of popularity weighting; the case $m = 0$ corresponds to the unweighted case discussed above. When $x_t = 1/2$, both technologies are equally popular; in this case players choose the technology with the highest current payoff for any value of $m$. As $m$ grows, the players become more willing to choose the currently popular technology even if its current payoff is lower.
We use the linear specification of popularity weighting primarily for analytic convenience. 14 It combines nicely with a second simplifying assumption that we make in this section, that the distribution $H$ of the per-period shocks $\epsilon_t$ is uniform on $[-\sigma, \sigma]$. This allows us to explicitly compute the long-run behavior of the system for any $m$. It also ensures that the linear class of weighting rules we consider includes one rule that leads the asymptotic distribution to concentrate on the optimal choice, namely $m = \sigma$. 15
Beyond the presumed linearity in $x_t$, another point to note about decision rule (3) is that it, and any rule that compares the difference $u_t^g - u_t^f$ to a function of $x_t$, is invariant to additive transformations of the payoff function, but not to multiplicative ones: In order to preserve the same decision rule when the payoff functions are multiplied by a constant $\lambda$, the parameter $m$ must be multiplied by the same constant. One way to see why this must be the case is to note that the expression $(1-2x)$ is unitless, so the parameter $m$ is measured in the same units as the payoffs are.
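
The invariance claim can be seen in a small numerical sketch. The payoff numbers, the shift, and the scale factor below are made-up illustrations rather than values from the paper, and `chooses_g` is a hypothetical helper that simply encodes rule (3).

```python
def chooses_g(u_g, u_f, x, m):
    """Rule (3): choose g when u_g - u_f >= m * (1 - 2x)."""
    return u_g - u_f >= m * (1 - 2 * x)

# Illustrative numbers (assumptions): g is slightly ahead in payoff but unpopular.
u_g, u_f, x, m = 1.15, 1.0, 0.3, 0.5
base = chooses_g(u_g, u_f, x, m)

shift, scale = 10.0, 4.0
# Adding the same constant to both payoffs leaves the decision unchanged.
shifted = chooses_g(u_g + shift, u_f + shift, x, m)
# Rescaling payoffs while keeping m fixed can flip the decision.
scaled_only = chooses_g(scale * u_g, scale * u_f, x, m)
# Rescaling payoffs and m by the same constant restores the original decision.
scaled_both = chooses_g(scale * u_g, scale * u_f, x, scale * m)

print(base, shifted, scaled_only, scaled_both)
```

In this example the decision changes only in the case where payoffs are rescaled and $m$ is not, which is the sense in which $m$ carries the units of the payoffs.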
Our assumption that the per-period shocks have a uniform distribution makes it easy to determine the long-run behavior of the system. Since the lowest possible value of $\epsilon_t$ is $-\sigma$, the lowest possible observation of $u_t^g - u_t^f$ is $\theta - \sigma$. Hence, if $\theta - \sigma \ge m(1-2x_t)$, or equivalently if $x_t \ge x^g = (m - \theta + \sigma)/2m$, the fraction using technology g is certain to increase. (Note that $x_t \ge x^g$ implies $x_{t+1} \ge x_t \ge x^g$.) Thus, when $x_t \ge x^g$ the system cannot take downwards steps, and it is certain to "lock on" to g. Likewise, if $x_t \le x^f = (m - \theta - \sigma)/2m$, the fraction playing f is certain to increase. Because the probability of an upwards step, $\mathrm{Prob}[\theta + \epsilon_t \ge m(1-2x_t)] = (\sigma - m + \theta)/2\sigma + (m/\sigma)x_t$, is minimized at $x_t = 0$, if $x^f < 0$ this probability must be at least $\mathrm{Prob}[\theta + \epsilon_t \ge m] = (\sigma - m + \theta)/2\sigma > 0$, so that the probability of an upwards step is uniformly bounded away from zero. Similarly, if $x^g > 1$, the probability of a downwards step is uniformly bounded away from zero.
The above shows that (ignoring knife-edge cases) there are four possibilities for the long-run behavior of the system: If $x^g < 1$ and $x^f < 0$, the system is certain to eventually make enough upward jumps that $x_t \ge x^g$, so that from any initial position the system converges with probability 1 to $x = 1$. If $x^g > 1$ and $x^f > 0$, the system will converge to $x = 0$ from any initial position. If $x^g < 1$ and $x^f > 0$, the system will converge (with probability 1) to 1 if $x_0 \ge x^g$, and to 0 if $x_0 \le x^f$; for $x_0 \in (x^f, x^g)$, the system will also eventually converge to a steady state, but it has a positive probability of ending up at each of the two steady states of the system. In the remaining case, in which $x^f < 0$ and $x^g > 1$, the system will not converge to either steady state. Instead, the fraction $x_t$ will continue to fluctuate, with the long-run distribution computed following the statement of proposition 2 below. The above observations do most of the work required to establish the following claims:
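
The four cases can be summarized in a short classification sketch. It assumes uniform shocks on $[-\sigma, \sigma]$ and $m > 0$, and uses the thresholds $x^f = (m - \theta - \sigma)/2m$ and $x^g = (m - \theta + \sigma)/2m$ from the discussion above; the function names and example parameters are hypothetical.

```python
def thresholds(m, theta, sigma):
    """Lock-on thresholds (requires m > 0):
    x_f = (m - theta - sigma) / 2m, below which the share using f surely grows;
    x_g = (m - theta + sigma) / 2m, above which the share using g surely grows."""
    x_f = (m - theta - sigma) / (2 * m)
    x_g = (m - theta + sigma) / (2 * m)
    return x_f, x_g

def long_run_case(m, theta, sigma):
    """Classify the long-run behavior into the four cases described in the text."""
    x_f, x_g = thresholds(m, theta, sigma)
    if x_g < 1 and x_f < 0:
        return "converges to x = 1 from any start"
    if x_g > 1 and x_f > 0:
        return "converges to x = 0 from any start"
    if x_g < 1 and x_f > 0:
        return "converges, but the limit depends on the start"
    if x_g > 1 and x_f < 0:
        return "keeps fluctuating"
    return "knife-edge case"

# Illustrative parameter values (assumptions), all with sigma = 1.
for m, theta in [(1.0, 0.2), (1.0, -0.2), (2.0, 0.2), (0.5, 0.2)]:
    print(m, theta, thresholds(m, theta, 1.0), long_run_case(m, theta, 1.0))
```

The four example parameter pairs are chosen so that each one falls in a different case.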

Proposition 2:
(a) Popularity weighting $m = \sigma$ is "optimal" in the sense that from any $x_0$ the system converges with probability 1 to the state where everyone uses the better technology.
(b) $m > \sigma$ is "overweighting", in that the system converges with probability 1 to a steady state, but which steady state is selected may depend on the initial condition $x_0$. More precisely, the system converges to the better technology if $|\theta| \ge m - \sigma$, while for $|\theta| < m - \sigma$, the behavior of the system depends on the initial condition $x_0$. If $x_0 \le (m - \sigma - \theta)/2m$, the system converges to 0 with probability 1; if $x_0 \ge (m + \sigma - \theta)/2m$, the system converges to 1 with probability 1. If $|\theta| < m - \sigma$ and $x_0 \in ((m - \sigma - \theta)/2m, (m + \sigma - \theta)/2m)$, the system will eventually converge to one of the steady states, but both steady states have positive probability.
(c) With "underweighting," i.e. $m < \sigma$, the system need not converge to a steady state. It does converge (with probability 1) to the better technology if $|\theta| \ge \sigma - m$, but for $|\theta| < \sigma - m$, the system has a non-degenerate invariant distribution $\mu$, with
$$E_\mu x = 1/2 + \theta/2(\sigma - m), \quad \text{and} \quad \mathrm{var}_\mu x = \alpha\sigma\, E_\mu x\, E_\mu(1-x) / [(2-\alpha)\sigma - 2(1-\alpha)m].$$
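
The mean and variance in part (c) can be compared with a direct simulation of the popularity-weighted process under uniform shocks. This is a minimal sketch, not the authors' computation: the parameter values, run length, and seed are illustrative assumptions chosen so that $|\theta| < \sigma - m$.

```python
import random

def simulate_weighted(m, theta, sigma, alpha, x0=0.5,
                      steps=200_000, burn_in=2_000, seed=1):
    """Simulate the popularity-weighted rule with shocks uniform on [-sigma, sigma].
    Each period a fraction alpha revises; revisers pick g when
    theta + eps >= m * (1 - 2x), and f otherwise."""
    rng = random.Random(seed)
    x = x0
    s1 = 0.0
    s2 = 0.0
    n = 0
    for t in range(steps):
        eps = rng.uniform(-sigma, sigma)
        chose_g = theta + eps >= m * (1 - 2 * x)
        x = (1 - alpha) * x + (alpha if chose_g else 0.0)
        if t >= burn_in:  # drop early periods that still depend on x0
            s1 += x
            s2 += x * x
            n += 1
    mean = s1 / n
    return mean, s2 / n - mean * mean

def predicted_moments(m, theta, sigma, alpha):
    """Mean and variance from part (c), valid when |theta| < sigma - m."""
    mean = 0.5 + theta / (2 * (sigma - m))
    var = alpha * sigma * mean * (1 - mean) / ((2 - alpha) * sigma - 2 * (1 - alpha) * m)
    return mean, var

# Illustrative underweighting example (assumed values): m < sigma, |theta| < sigma - m.
m, theta, sigma, alpha = 0.5, 0.1, 1.0, 0.1
sim_mean, sim_var = simulate_weighted(m, theta, sigma, alpha)
pred_mean, pred_var = predicted_moments(m, theta, sigma, alpha)
print("mean:", sim_mean, "predicted:", pred_mean)
print("variance:", sim_var, "predicted:", pred_var)
```

With these values the predicted mean is 0.6 and the predicted variance is 0.024; setting `m = 0` in `predicted_moments` reproduces the unweighted moments when $H$ is uniform.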
Proof:
(a) If $m = \sigma$, then $x^g = (2m - \theta)/2m$ is less than 1 iff $\theta > 0$, and $x^f = -\theta/2m$ is greater than zero iff $\theta < 0$. The conclusion now follows from the argument in the text.
(b) It suffices to check that if $\theta > m - \sigma > 0$ then $x^f < 0$ and $x^g < 1$, that $-\theta > m - \sigma > 0$ implies $x^f > 0$ and $x^g > 1$, while for $m - \sigma > |\theta|$, $x^f > 0$ and $x^g < 1$.
(c) A similar computation shows that when $|\theta| > \sigma - m$, the system must converge. Appendix B establishes that when $\sigma - m > |\theta|$, the system has a unique invariant distribution and computes the corresponding mean and variance.
Proposition 2 shows that the system is certain to converge to the correct choice if the popularity weight $m$ equals $\sigma$, and that the payoff loss from a wrong choice must be small if $m$ is close to this level. Thus it is interesting to ask whether there is any particular reason to suppose that popularity weights equal or close to $\sigma$ are likely to be used, or conversely whether there are forces that would drive players to use different weights. As a partial response, we consider a game in which the players in the model simultaneously choose their individual popularity weights, and show that the optimal weight $m = \sigma$ is its unique equilibrium outcome. This result is only a partial response, because it supposes more sophistication in the determination of the popularity weights than we find compelling. However, the result does show that popularity weighting need not conflict with individual incentives.
To define the payoffs in this game, we suppose that players have a common prior distribution $p$ over $\theta$, and that $p$ assigns positive probability to every neighborhood of $\theta = 0$. For each $x_0 \in [0,1]$, $\theta \in \mathrm{support}(p)$, and $m$, let $\mu(\theta, x_0, m)$ be the long-run distribution on $x$ when all players use weighting $m$. For each value of $\theta$, the payoff to the profile $m$ is defined to be the mean of the long-run distribution on payoffs, and the overall payoff is the expectation of this value with respect to the prior beliefs $p$. Our use of the long-run payoff criterion here is made solely for convenience: we would prefer to consider optimal behavior for discount factors near 0, but then we would want to allow for the popularity weightings to be chosen repeatedly (popularity has no relevance in the first period) and would need to consider the distribution of the state in each period separately.
Since the profile in which all players choose $m = \sigma$ results in the full-information payoffs, this profile is clearly a Nash equilibrium of the game. Moreover, it is the only symmetric equilibrium, as shown in the following proposition.

Proposition 3: If every neighborhood of $\theta = 0$ has positive probability, the unique symmetric equilibrium of the game in which players simultaneously choose the weight $m$ they give to popularity is for all players to choose $m = \sigma$.
Remarks:
(1) We do not know whether there are asymmetric equilibria as well. The long-run behavior of the system when two or more decision rules are used by a non-negligible proportion of the population seems difficult to determine. We should also point out that no symmetric pure-strategy Nash equilibrium exists in the case of a normal distribution that we consider in appendix A. This should not be too surprising, since in that case popularity weighting permits the full-information payoff to be approximated, but not to be attained exactly. However, profiles where all players use a "large" amount of popularity weighting are $\epsilon$-Nash equilibria.
(2) If players are certain that the absolute value of $\theta$ is bounded away from zero, there are equilibria in which $m$ can exceed $\sigma$.
(3) It is difficult to suppose that players using the kinds of naive learning rules we consider would consciously choose their popularity weights to maximize their long-run payoff. We prefer to interpret the equilibrium assumption here as the result of a long-run adaptive process, but this raises the question of the relative speeds of adjustment of the process determining $m$ and that reflecting learning about the technologies.
Proof: Fix any profile where all players use some $m \neq \sigma$.
The idea of the proof is simply that if $m < \sigma$, so everyone
is using too little popularity weighting, then each individual
player would prefer to deviate and give more weight to
popularity, while if $m > \sigma$, each player would prefer to
give popularity a bit less weight. Since there is a "large
number" of players, the behavior of the aggregate system is
unaffected if any single player deviates. We will show that there
is always a deviation that improves the player's payoff when
$|\theta|$ is sufficiently small, and that has no effect on the
player's payoff when $|\theta|$ is larger; the conclusion will
then follow from our assumption that every neighborhood of
$\theta = 0$ has positive probability.

(a) Suppose first that $m < \sigma$, and consider a player
deviating to $m' = m + dm$ for some small $dm > 0$. This deviation
has no effect on his long-run payoff if $|\theta| \geq \sigma - m$,
for in this case the payoff difference between the two
technologies is sufficiently strong that $x_t$ converges to the
optimal choice, and $m$ yields the first-best long-run payoff. If
$|\theta| < \sigma - m$, the system does not converge to a steady
state, but has a non-degenerate ergodic distribution.
In this case, deviating to $m'$ leads the player to use $g$
instead of $f$ whenever
$m'(1-2x_t) \leq u_t^g - u_t^f < m(1-2x_t)$, or
$m'(1-2x_t) - \theta \leq \epsilon_t < m(1-2x_t) - \theta$.
Similarly, the player will now use $f$ instead of $g$ whenever
$m(1-2x_t) - \theta \leq \epsilon_t < m'(1-2x_t) - \theta$. Since
the payoff difference between $g$ and $f$ is $\theta$, the
expected change in the player's per-period payoff is

$$ \frac{du_i}{dm} = \theta \int (2x-1)\, dH[m(1-2x)-\theta]\, d\mu(x), $$

where $H$ is the uniform distribution, and $\mu$ is the invariant
distribution on $x$ that was derived in proposition 2. Since
$m(1-2x) - \theta \in [-\sigma, \sigma]$ for all $x \in [0,1]$,
$dH[m(1-2x)-\theta] = 1/2\sigma$, and so

$$ \frac{du_i}{dm} = \frac{\theta}{2\sigma} \int (2x-1)\, d\mu(x) = \frac{\theta}{2\sigma}\,[E_\mu 2x - 1] > 0 $$

for all $\theta \in (-(\sigma-m), 0) \cup (0, \sigma-m)$, since
$E_\mu x > 1/2$ for $\theta > 0$, and $E_\mu x < 1/2$ for
$\theta < 0$.

(b) Now consider a profile in which players use an $m > \sigma$,
and consider an individual player deviating to $m' = \sigma$.
Suppose first that $\theta > 0$, so that technology $g$ has the
higher payoff. If $\theta \geq m - \sigma$, the system converges
to 1 with probability 1, so that both $m$ and $m'$ yield the
full-information long-run payoff; the same is true if
$\theta < m - \sigma$ and $x_0 \geq x^\theta = (m+\sigma-\theta)/2m$.
If $0 < \theta < m - \sigma$ and $x_0 < x^\theta$, or equivalently
$\theta < (1-2x_0)m + \sigma$, there is positive probability that
the system converges to 0. In particular, if $\theta < \sigma$,
there is positive probability of converging to 0 from any
$x_0 \leq 1/2$. If the system does converge to 0, using weighting
$m$ leads to $f$ being played in almost every period, while using
$m' = \sigma$ the probability of playing $g$ converges to the
probability that $\epsilon_t \geq \sigma - \theta$, which is
greater than 0 for all positive $\theta$.
The preceding two paragraphs show that $m'$ does at least as
well as $m$ for all positive $\theta$, and does strictly better if
$x_0 \leq 1/2$ and $0 < \theta < \min(\sigma, m-\sigma)$. A
symmetric argument shows that $m'$ does at least as well as $m$
for all negative $\theta$, and does strictly better if
$x_0 \geq 1/2$ and $0 > \theta > \max(-\sigma, -(m-\sigma))$.
Hence if the prior assigns positive probability to the
neighborhood of $\theta = 0$, $m'$ yields a strictly higher
expected payoff from any initial position $x_0$.
While our formal results concern the eventual steady state of
the system, the speed of convergence is of some interest as well.
In particular, consider an initial position where $x_0$ is small,
so that $g$ corresponds to a "new" technology, and suppose that
$\theta > 0$, so that the new technology is in fact an
improvement. Then the share of technology $g$ increases whenever
$\theta + \epsilon_t \geq m(1-2x_t)$, and since the probability of
this event increases with $\theta$, so does the expected rate of
adoption.$^{17}$ Such a correlation between the extent of
improvement and the speed of adoption has been noted in the
empirical discussions of Mansfield [1968] and Rogers and
Shoemaker [1971], but has not, so far as we know, been addressed
in the learning literature.

Note also that for fixed $\theta$, the speed of convergence
decreases as $\sigma$ increases, so that each period's
observation becomes less informative.$^{18}$ More generally,
convergence will be slow if the new technology usually does about
as well as the old one, but occasionally does much better.
Furthermore, if the new technology usually does slightly worse
than the old one, but occasionally does much better (i.e. if the
new technology has a higher mean payoff but a lower median), then
naive learning rules that look only at the recent relative
performance will be biased towards the wrong choice. This is
consistent with the observation that seat belts, insurance, and
vaccinations have been slow to diffuse.
5. Heterogeneous Population with Linear Technologies
Now we turn to the study of heterogeneous populations, in which
different technologies may be optimal for different individuals.
As before, we suppose that there are two technologies, denoted
$f$ and $g$, with the mean difference in payoffs,
$E(u_t^g - u_t^f)$, equal to $\theta$. Now though, we think of
players as having different locations along a line, with
different locations representing different $\theta$'s. In
particular, the optimal rule (both socially and privately) is for
players with positive $\theta$ to use $g$, and players with
negative $\theta$ to use $f$, so that the optimal choice of
technology has a cut-off or break-point at $\theta = 0$. It will
be important in the following that the relative advantage of
using technology $g$ at location $\theta$ may be correlated with
the "absolute advantage" of location $\theta$, e.g. the
productivity of the "land." To capture this, we suppose that the
payoffs to the technologies have the following linear form:

$$ u_t^g(\theta) = \theta + \beta\theta + \epsilon_{1t}, \qquad u_t^f(\theta) = \beta\theta + \epsilon_{2t}. \tag{5} $$
With this parameterization, $\beta > 0$ implies that technology
$g$ does better at "good" locations, while when $\beta < 0$, $g$
does better at bad ones.
In this model, the player's location in $\theta$-space
determines his average payoff to the two technologies. We want to
think of the payoff-relevant variables as being unobservable but
correlated with the observed locations. The idea is that players
do not know exactly which aspects of their locations are
payoff-relevant, or how these aspects influence their payoffs.
For this reason, we do not allow the players to regress the
observed payoffs of each technology on the corresponding values
of $\theta$. Instead, we suppose that players base their
decisions on the average performance of the two technologies at
locations in their "observation windows," where the observation
window of the player at $\theta$ is the interval
$[\theta - w, \theta + w]$. We call $w$ the "window width."
We have two interpretations in mind for this model. First, the
location parameter $\theta$ may correspond to geographical
location, with the performance of the technologies linked to
variables such as climate or terrain that are in turn correlated
with location. Second, the model may describe adoption decisions
at a single village, where players are differentiated by
idiosyncratic payoff-relevant characteristics such as wealth and
household size.
In studying geographic diffusion, for example of an agricultural
technology, the observation window might reflect the farmer only
observing the outputs of his neighbors, and the window width $w$
might be fairly small. In studying adoption at a single site, the
observation window corresponds to the players' beliefs about
which other players are sufficiently similar for their
experiences to be relevant, and players might well observe the
actions and the outcome of others who are outside of their
window. To the extent that the relevant characteristics are
difficult to determine, the window widths in this interpretation
might be fairly large.
In both interpretations, players might prefer to weight
observations of their immediate neighbors more heavily than those
of players who are farther away, but still within the observation
window; this may be particularly attractive when the observation
window is large. As in the study of a homogeneous population, we
begin by analyzing the simple rule where players use whichever
technology did better in their window last period; later we will
enrich the model to allow for popularity weighting.
To define this rule formally, suppose that the distribution of
players over locations has a constant density, which we normalize
to equal 1, and let $\bar{u}_t^g(\theta)$ be the average score
realized by those players in the interval
$[\theta - w, \theta + w]$ who used $g$ at period $t$, with the
convention that $\bar{u}_t^g(\theta) = -\infty$ if every player in
the interval used $f$; $\bar{u}_t^f(\theta)$ is defined
analogously. The (unweighted) average decision rule for the
player at $\theta$ is then

"Play $g$ at period $t+1$ iff $\bar{u}_t^g(\theta) - \bar{u}_t^f(\theta) \geq 0$."   (6)

In the previous sections we considered a model with a continuum
of players and inertia, so that the fraction of players using
each strategy can never shrink all the way to zero in finite
time. In our study of spatial models, though, we will suppose
that there is no inertia at individual locations, so that all
players at each location revise their choices each period. We do
so in part for reasons of convenience, and in part because in
rural areas with low population density it seems plausible that a
technology could be abandoned by everyone in an observation
window after a few bad draws in a row.
As a first step in analyzing the decision rule (6), suppose that
the noise terms $\epsilon_{1t}$ and $\epsilon_{2t}$ are
identically zero, so that the system is deterministic. Suppose
further that the initial state of the system is described by a
cut-off rule. That is, suppose that there is a $\theta_t$ such
that all players with $\theta > \theta_t$ choose $g$ and all those
with $\theta < \theta_t$ choose $f$. Then the period $t+1$ state
will be described by a cut-off rule as well. To see this, note
that all players at $\theta > \theta_t + w$ only see $g$ being
played, and hence will play $g$ next period, while all players at
$\theta < \theta_t - w$ play $f$. Players at every $\theta$ in
$[\theta_t - w, \theta_t + w]$ see both $f$ and $g$ being played,
with

$$ \bar{u}_t^g(\theta) = \int_{\theta_t}^{\theta+w} \frac{(\beta+1)s\, ds}{\theta + w - \theta_t} = \frac{(\beta+1)(\theta + w + \theta_t)}{2}, \quad \bar{u}_t^f(\theta) = \int_{\theta-w}^{\theta_t} \frac{\beta s\, ds}{\theta_t + w - \theta} = \frac{\beta(\theta - w + \theta_t)}{2}. \tag{7} $$

Thus for $\theta_t - w < \theta' < \theta'' < \theta_t + w$, we
have

$$ \bar{u}_t^g(\theta'') - \bar{u}_t^f(\theta'') = \bar{u}_t^g(\theta') - \bar{u}_t^f(\theta') + (\theta'' - \theta')/2, $$

so that if the player at $\theta'$ plays $g$ in period $t+1$ then
so does the player at $\theta''$. Hence the state at period $t+1$
is described by a cut-off rule.
A steady state cut-off rule must have the property that the
player at the steady-state cut-off is indifferent between $f$ and
$g$ given his observations. Thus the steady state $\theta^*$ is
the unique solution of
$\bar{u}^g(\theta^*) = \bar{u}^f(\theta^*)$, that is
$(\beta+1)(\theta^* + w/2) = \beta(\theta^* - w/2)$, and so

$$ \theta^* = -(2\beta+1)w/2. \tag{8} $$
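The indifference condition behind (8) can be checked numerically. The following is a minimal sketch, assuming the noiseless payoffs in (5) and a unit density of players; the function names `window_averages` and `steady_state_cutoff` and the parameter values are illustrative choices, not part of the paper.

```python
def window_averages(theta, cutoff, w, beta):
    """Average payoffs observed by the player at `theta` when everyone
    above `cutoff` uses g and everyone below uses f (no noise).
    Valid when cutoff - w <= theta <= cutoff + w, so both are observed."""
    # g-users in the window occupy [cutoff, theta + w]; g pays (beta + 1) * s.
    avg_g = (beta + 1) * (theta + w + cutoff) / 2
    # f-users in the window occupy [theta - w, cutoff]; f pays beta * s.
    avg_f = beta * (theta - w + cutoff) / 2
    return avg_g, avg_f


def steady_state_cutoff(w, beta):
    """Closed form (8): the cut-off at which the cut-off player is indifferent."""
    return -(2 * beta + 1) * w / 2


# Hypothetical parameter values chosen only for illustration.
w, beta = 0.2, 0.3
theta_star = steady_state_cutoff(w, beta)
avg_g, avg_f = window_averages(theta_star, theta_star, w, beta)
gap = avg_g - avg_f  # zero (up to rounding) at the steady state
```

With these values the two window averages coincide at the computed cut-off, which is the sense in which $\theta^*$ is a steady state even though it differs from the optimal cut-off of zero.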
Note that although the optimal cut-off is $\theta = 0$ for any
value of $\beta$, the steady state cut-off is only at 0 if
$\beta = -1/2$. When $\beta = 0$, for example, the payoff to $f$
is identically zero, while the payoff to $g$ is equal to
$\theta$. Hence when the cut-off is at 0, the average payoff to
players using $g$ is strictly positive, which will tempt players
to the left of 0 to adopt $g$ as well. The discrepancy between the
steady state and the optimum arises from our assumption that
players do not directly observe $\theta$, and hence use only the
average payoffs received by the two technologies in making their
decisions. Note that the maximum steady-state payoff loss at any
location is the absolute value of $\theta^*$, which is small if
$\beta$ is not too large (in absolute value) and the window width
$w$ is small.
Having determined the steady-state cut-off, we next examine the
behavior of the system away from the steady state. It is easy to
show that, from an initial cut-off $\theta_0$, the cut-off
typically will move towards the steady state $\theta^*$ at a
distance of $w$ each period until it enters the interval within
$w/2$ of $\theta^*$. Once $\theta_t$ is within this interval about
$\theta^*$, it follows a stable 2-period cycle. For ease of
reference, we summarize this as a proposition.
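Proposition 4: From an initial cut-off $\theta_0$, the system
determined by (6) and (7) evolves according to

$$ \theta_{t+1} = \begin{cases} \theta_t + w & \text{if } \theta_t < \theta^* - w/2, \\ -\theta_t + 2\theta^* & \text{if } \theta_t \in [\theta^* - w/2,\ \theta^* + w/2), \\ \theta_t - w & \text{if } \theta_t \geq \theta^* + w/2. \end{cases} \tag{9} $$

Proof: If $\bar{u}_t^g(\theta_t - w) - \bar{u}_t^f(\theta_t - w) \geq 0$,
then all players who observe both technologies being played, i.e.
all players in the interval $[\theta_t - w, \theta_t + w]$, use
$g$ in period $t+1$. Substituting $\theta = \theta_t - w$ into
equation (7), we see that this is the case if
$(\beta+1)\theta_t \geq \beta(\theta_t - w)$, or
$\theta_t \geq -\beta w = \theta^* + w/2$. Similarly, if
$\theta_t < \theta^* - w/2$, all players who see both
technologies being played choose $f$ in period $t+1$. Finally, if
$\theta_t \in [\theta^* - w/2, \theta^* + w/2)$, $\theta_{t+1}$
will satisfy
$(\beta+1)(\theta_{t+1} + \theta_t + w) = \beta(\theta_{t+1} + \theta_t - w)$,
so that $\theta_{t+1} = -(2\beta+1)w - \theta_t = -\theta_t + 2\theta^*$.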
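The dynamics in (9) are easy to trace by hand or by machine. The sketch below iterates the three-case map from a distant starting cut-off; the names `next_cutoff` and `simulate` and the numerical values are assumptions made for illustration only.

```python
def next_cutoff(theta_t, theta_star, w):
    """One step of the deterministic cut-off dynamics in equation (9)."""
    if theta_t < theta_star - w / 2:
        return theta_t + w                 # everyone who sees both picks f
    if theta_t < theta_star + w / 2:
        return 2 * theta_star - theta_t    # reflection about the steady state
    return theta_t - w                     # everyone who sees both picks g


def simulate(theta0, theta_star, w, periods):
    """Return the path of cut-offs theta_0, ..., theta_periods."""
    path = [theta0]
    for _ in range(periods):
        path.append(next_cutoff(path[-1], theta_star, w))
    return path


# Illustrative values: beta = 0 gives theta* = -w/2.
w, beta = 0.2, 0.0
theta_star = -(2 * beta + 1) * w / 2
path = simulate(0.95, theta_star, w, 12)
```

Starting from 0.95 the cut-off falls by $w$ each period until it lands within $w/2$ of $\theta^*$, after which it alternates between two points placed symmetrically around $\theta^*$, which is the 2-period cycle described in the text.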
Next we consider the behavior of the model with noise, i.e. with
$\epsilon_{1t}$ and $\epsilon_{2t}$ non-degenerate i.i.d. random
variables. Let $z_t = \epsilon_{2t} - \epsilon_{1t}$ denote the
difference in the two shocks, and let
$\theta_t^* = \theta^* + z_t$; $\theta_t^*$ is the steady state of
the system when $\epsilon_{2\tau} - \epsilon_{1\tau}$ is
identically equal to $z_t$ for all $\tau$. Because behavior rule
(6) depends only on the difference between the payoffs to $f$ and
$g$, and not on their levels, when the shock is $z_t$, the
evolution of the system from $\theta_t$ is the same as that given
in equation (9) with the term $\theta^*$ replaced everywhere by
$\theta_t^*$.

Proposition 5: If the period-$t$ cut-off is $\theta_t$, and the
period-$t$ shock is $z_t$, the period $t+1$ cut-off is given by

$$ \theta_{t+1} = \begin{cases} \theta_t + w & \text{if } \theta_t < \theta_t^* - w/2, \\ -\theta_t + 2\theta_t^* & \text{if } \theta_t \in [\theta_t^* - w/2,\ \theta_t^* + w/2), \\ \theta_t - w & \text{if } \theta_t \geq \theta_t^* + w/2. \end{cases} \tag{10} $$

Proof: For locations $\theta \in [\theta_t - w, \theta_t + w]$,
the difference between the average payoffs of the two
technologies in $\theta$'s observation window (the interval
$[\theta - w, \theta + w]$) is

$$ \bar{u}_t^g(\theta, \epsilon_{1t}) - \bar{u}_t^f(\theta, \epsilon_{2t}) = [\theta + \theta_t + (2\beta+1)w]/2 - z_t = (\theta + \theta_t - 2\theta_t^*)/2. $$

Since $\theta_t \geq \theta_t^* + w/2$ implies
$\theta + \theta_t - 2\theta_t^* \geq 0$ for all
$\theta \geq \theta_t - w$, in this case all players who observe
both technologies choose $g$. Similarly,
$\theta_t < \theta_t^* - w/2$ implies that all players who see
both technologies choose $f$. Finally, if
$\theta_t \in [\theta_t^* - w/2, \theta_t^* + w/2)$, the period
$t+1$ cut-off is given by $\theta_{t+1} = -\theta_t + 2\theta_t^*$.

Proposition 6: When the $z_t$ are i.i.d. draws from a
distribution that has a strictly positive density on a compact
support, the dynamic process generated by (10) has a unique
invariant distribution $F$, and the expected probability
distribution at date $t$ converges to $F$ uniformly over initial
probability distributions $\mu_0$.

Proof: Appendix C shows that the system is a random contraction
in the sense of Norman [1972] and satisfies uniqueness condition
2.11 of Futia [1982].
We have not been able to characterize this distribution
directly. Instead, we have computed an invariant distribution of
the simpler system generated by

$$ \theta_{t+1} = \begin{cases} \theta_t + w & \text{if } \theta_t < \theta_t^*, \\ \theta_t - w & \text{if } \theta_t \geq \theta_t^*. \end{cases} \tag{11} $$

Note that system (11) differs from (10) only when $\theta_t^*$
falls in an interval of width $w$. Normally we will think of the
variance of $z_t$ as being much larger than the window width; in
this case it may be reasonable to guess that the invariant
distributions of (10) and (11) are close together.
We should point out that the simplified system (11), unlike
(10), does not have a unique invariant distribution: because all
steps have size $w$, from initial position $\theta_0$ the support
of (11) is concentrated on the grid $\theta_0 + kw$, and so
different initial conditions lead to different invariant
distributions. Moreover, the supports of the date-$t$ distribution
are different for $t$ even and for $t$ odd. Despite these
qualitative differences between systems (10) and (11), the
absolute magnitude of the effect of the initial condition is
small when $w$ is small, which supports the conjecture that the
two systems are similar. Table 1 below provides further support
for this belief by comparing Monte Carlo estimates of the
steady-state variance of (10) with the variance of the particular
invariant distribution of (11) that is computed in proposition 7.
As conjectured, the two variances are close when $w$ is small.
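A comparison of this kind can be sketched with a short Monte Carlo experiment. The code below is only an illustration under stated assumptions: uniform noise on $[-\sigma, \sigma]$ with $\sigma = 1$, $\theta^* = 0$, a fixed random seed, and run lengths chosen for convenience; it is not the authors' simulation code, and the function names are hypothetical.

```python
import random


def step_full(theta, target, w):
    """One step of system (10); `target` is the period's theta*_t."""
    if theta < target - w / 2:
        return theta + w
    if theta < target + w / 2:
        return 2 * target - theta
    return theta - w


def step_simple(theta, target, w):
    """One step of the simplified system (11): always move by exactly w."""
    return theta + w if theta < target else theta - w


def long_run_variance(step, w, sigma=1.0, theta_star=0.0,
                      periods=200_000, burn_in=1_000, seed=0):
    """Time-average variance of the cut-off under uniform noise."""
    rng = random.Random(seed)
    theta = theta_star
    total = total_sq = 0.0
    for t in range(periods + burn_in):
        target = theta_star + rng.uniform(-sigma, sigma)
        theta = step(theta, target, w)
        if t >= burn_in:
            total += theta
            total_sq += theta * theta
    mean = total / periods
    return total_sq / periods - mean * mean


var_full = long_run_variance(step_full, w=0.1)
var_simple = long_run_variance(step_simple, w=0.1)
```

For $w/\sigma = 0.1$ both estimates should land close to the corresponding row of Table 1; the exact digits depend on the seed and the run length, so they are not reproduced here.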
To examine the invariant distributions of (11), suppose that the
noise terms $z_t$ are i.i.d. with mean 0 and c.d.f. $H$. Then
$\theta_t$ follows a Markov process with the transition from
$\theta_t$ to $\theta_t + w$ having probability
$\text{Prob}[\theta^* + z_t > \theta_t] = 1 - H(\theta_t - \theta^*)$.
The invariant distribution has a particularly simple form when
the $z_t$ are uniform on $[-\sigma, \sigma]$ and the parameters
are such that there is an invariant distribution whose support is
a symmetric grid containing the points $\theta^* - \sigma$ and
$\theta^* + \sigma$.

Proposition 7: Suppose the $z_t$ are uniform on
$[-\sigma, \sigma]$ and that $M = \sigma/w$ is an integer. Then
one invariant distribution of (11) is the binomial

$$ \text{Prob}(\theta = \theta^* + kw) = 2^{-2M}\, \frac{(2M)!}{(M-k)!\,(M+k)!}; $$

this is the limit of the time-average distribution when the
initial condition belongs to the grid $\theta^* \pm kw$,
$k \leq M$.

Remark: Recall that the mean of this distribution is $\theta^*$,
that its variance is $\sigma w/2$, and that the distribution is
asymptotically normal as $w$ tends to zero.
Proof: To show that $f$ is an invariant distribution, it is
sufficient to verify that it meets the "detailed balance
condition" that for all $\theta$ and $\theta'$ the
(unconditional) probability flow from $\theta$ to $\theta'$ equals
the probability flow in the reverse direction. Thus, we will
verify that

$$ f(\theta)\, \text{Prob}(\theta_{t+1} = \theta' \mid \theta_t = \theta) = f(\theta')\, \text{Prob}(\theta_{t+1} = \theta \mid \theta_t = \theta'), $$

or equivalently that

$$ f(\theta)/f(\theta') = \text{Prob}(\theta_{t+1} = \theta \mid \theta_t = \theta') \,/\, \text{Prob}(\theta_{t+1} = \theta' \mid \theta_t = \theta). $$

Since the probability of a jump of more than $w$ is zero, it
suffices to check this condition between adjacent states, so take
$\theta = \theta^* + kw$ and $\theta' = \theta^* + (k+1)w$ for
some integer $k$ between $-M$ and $M-1$. For such states, we have

$$ \frac{f(\theta)}{f(\theta')} = \frac{2^{-2M}\,(2M)!/[(M+k)!\,(M-k)!]}{2^{-2M}\,(2M)!/[(M+k+1)!\,(M-k-1)!]} = \frac{M+k+1}{M-k}, $$

and

$$ \frac{\text{Prob}(\theta_{t+1} = \theta \mid \theta_t = \theta')}{\text{Prob}(\theta_{t+1} = \theta' \mid \theta_t = \theta)} = \frac{(\sigma + (k+1)w)/2\sigma}{(\sigma - kw)/2\sigma} = \frac{(M+k+1)w}{(M-k)w}, $$

so detailed balance holds.

TABLE 1: STEADY STATE VARIANCE FOR UNIFORM NOISE

w/σ       (10)       (11)
.5        .21        .25
.1        .048       .05
.05       .024       .025
.001      .00496     .005
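The detailed balance argument can also be confirmed numerically for a particular $M$. This is a small sketch, assuming uniform noise and $\sigma = Mw$ as in proposition 7; the helper names `binomial_invariant` and `prob_up` are illustrative.

```python
from math import comb


def binomial_invariant(M):
    """Candidate invariant distribution on the grid theta* + k*w, k = -M..M."""
    return {k: comb(2 * M, M + k) / 4 ** M for k in range(-M, M + 1)}


def prob_up(k, M):
    """Probability of stepping from grid point k to k + 1 under system (11):
    Prob[z > k*w] with z uniform on [-sigma, sigma] and sigma = M*w."""
    return (M - k) / (2 * M)


M, w = 10, 0.1
sigma = M * w
f = binomial_invariant(M)

# Flow k -> k+1 minus flow k+1 -> k, for every adjacent pair of states.
max_gap = max(
    abs(f[k] * prob_up(k, M) - f[k + 1] * (1 - prob_up(k + 1, M)))
    for k in range(-M, M)
)
total_mass = sum(f.values())
variance = sum(p * (k * w) ** 2 for k, p in f.items())  # about theta* (mean)
```

The flows balance to machine precision, the masses sum to one, and the variance equals $\sigma w/2$, in line with the remark following proposition 7.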
As one would expect, the variance of the steady state shrinks as
$w$ decreases, because small $w$ corresponds to small steps in
each period. Note that the social optimum is the constant
$\theta = 0$, and that the expected welfare loss (compared to
$\theta = 0$) when the cut-off is $\theta_t$ is
$\int_0^{\theta_t} u\, du = \theta_t^2/2$. Hence, in the long run
the average per-period welfare loss (using the invariant
distribution computed in proposition 7) is

$$ \tfrac{1}{2} E(\theta^2) = \tfrac{1}{2}(E(\theta))^2 + \tfrac{1}{2}\,\text{var}(\theta) = [(2\beta+1)^2 w/8 + \sigma/4]\, w, $$

so steady state welfare is decreasing in $w$. For small $w$,
despite the lack of either memory or popularity weighting, the
spatial nature of the process allows the long-run outcome to be
approximately efficient.$^{19}$
While small $w$'s are thus desirable from the viewpoint of the
time-average payoff, they entail a significant short-run welfare
loss when the initial state is far from the optimum, because in
this case the system will take a long time to approach the
neighborhood of the optimum. This is true for two reasons: First,
$\theta_t$ is limited to move at most $w$ per period. Second, in
the presence of noise a typical path is likely to take far more
than $\theta_0/w$ periods to reach a neighborhood of $\theta^*$,
because many steps will be in the wrong direction.

For a fixed initial condition and social discount factor, the
socially optimal window width will trade off the speed of
convergence and the steady state variance, with larger $w$'s
being optimal the farther the initial condition is from 0. If the
social planner does not know the initial condition and/or the
location of the social optimum, the size of the optimal $w$ will
depend on the planner's prior beliefs. This tradeoff between
speed of adjustment and the variance of the steady state seems a
natural feature of the sorts of model we consider.$^{20}$
At this point we would like to make a few observations about how
the conclusions might change if the players did keep records of
their past observations. Since players at locations within
$\sigma$ of $\theta^*$ will play both technologies infinitely
often, they could eventually learn which technology is better for
themselves by keeping such records. However, a few calculations
suggest that this learning process will be fairly slow if the
random shock to the payoffs has a sizable common component and
$w$ is small.
To see this, suppose that the payoffs to each technology are
subject to a common shock $\eta_t$ as well as the idiosyncratic
shocks we assumed before, so that system (5) is replaced by

$$ u_t^g(\theta) = \theta + \beta\theta + \epsilon_{1t} + \eta_t, \qquad u_t^f(\theta) = \beta\theta + \epsilon_{2t} + \eta_t. \tag{5'} $$
If the variance of $\eta$ is relatively large, then observations
of only one technology at date $t$ are not very informative, and
only observations of both technologies in the same period will be
of much help. Players at locations more than three standard
deviations from $\theta^*$ (that is, outside the interval
$\theta^* \pm 3(\sigma w/2)^{1/2}$) rarely see both technologies
played, and hence would need a very long memory to learn. Players
at locations $\theta$ closer to $\theta^*$ do see both
technologies played more often, but for these players the
systematic payoff difference between the technologies is smaller,
and hence it may require many observations to be fairly confident
of which one is better. Our informal approximations, reported in
appendix E, suggest that this is indeed the case, and in
particular that the number of periods required to be fairly
confident which technology is better is of the order of
$\sigma^2/w^2$, so that when $w$ is small a very long history
would be required for players to do much better than with our
simple rule. Of course, players could use history even when the
advantage to doing so is slight or slow to develop, but in these
cases it seems less obvious that players would be led to abandon
simple rules.
6. Examples of Non-linear Technologies
Before considering the implications of popularity weighting in a
heterogeneous population, we would like to discuss some examples
of what can happen without popularity weighting when the payoffs
as a function of location do not take the linear form presumed in
equation (5). Suppose for example that the "old" technology $f$
has returns that are identically zero, while
$g(\theta) = \cos(\theta)$, so that regions where $g$ is optimal
alternate with regions where $f$ is. (See figure 1.) If there is
no noise in the system, and the window width is relatively small,
then even if all players in locations
$\theta \in [-\pi/2, \pi/2]$ adopt the new technology $g$, the new
technology will not spread to the other regions where it is
optimal. In this example there are substantial social gains from
having the new technology "tested" at a number of diverse
locations. (It is for this reason that we would argue that the
gentleman farmers may have played an important role in the spread
of the new husbandry, even though they were not among the first
to adopt it.)

[Figure 1: $g(\theta) = \cos\theta$]

It may also be interesting to note that when the local process
may fail to spread as widely as it should, random shocks to
payoffs can increase social welfare, that is, welfare can
increase as the variance of the noise term $z_t$ increases from
zero. Suppose that the technologies are $f(\theta) = 0$ and
$g(\theta) = \cos(\theta)$, and that the initial state has all
players to the right of the initial cut-off using $g$ and players
to the left using $f$. (See figure 1.) Without noise, the cutoff
will move to $\theta^* = 3\pi/2$ and stay there. When the support
of $z_t$ is sufficiently large, there will eventually be enough
consecutive draws of very negative $z_t$ that the cutoff reaches
$\pi/2$. From this point, the system may no longer have a single
cutoff, as players to the left of $\pi/2$ will tend to switch to
$g$, while those to the right switch back to $f$. Essentially, the
noise leads the players in region II to use the new technology
long enough that it can spread from region I to region III.
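The failure of $g$ to spread in the noiseless cosine example can be illustrated on a discrete grid of locations. This is a rough sketch under assumptions that are not in the text: evenly spaced locations with one player each, a window containing the nearest `k` grid points on either side, and the parameter values shown; `spread_of_g` is a hypothetical name.

```python
import math


def spread_of_g(w=0.3, step=0.02, half_width=7.0, periods=50):
    """Noiseless rule (6) with f = 0 and g = cos(theta) on a grid.
    Returns the largest |theta| at which g is still in use at the end."""
    n = int(round(2 * half_width / step)) + 1
    locs = [-half_width + i * step for i in range(n)]
    payoff_g = [math.cos(th) for th in locs]
    # Initially g is used exactly where |theta| <= pi/2.
    uses_g = [abs(th) <= math.pi / 2 for th in locs]
    k = int(round(w / step))  # grid points on each side of the window
    for _ in range(periods):
        nxt = []
        for i in range(n):
            window = range(max(0, i - k), min(n, i + k + 1))
            g_scores = [payoff_g[j] for j in window if uses_g[j]]
            if not g_scores:                     # nobody nearby used g
                nxt.append(False)
            elif len(g_scores) == len(window):   # nobody nearby used f
                nxt.append(True)
            else:                                # compare with f's payoff of 0
                nxt.append(sum(g_scores) / len(g_scores) >= 0.0)
        uses_g = nxt
    return max((abs(th) for th, g in zip(locs, uses_g) if g), default=0.0)


w = 0.3
farthest_adopter = spread_of_g(w=w)
```

In this sketch the adopters never reach more than one window width beyond $\pm\pi/2$, so the regions around $\pm 2\pi$ where $g$ is also optimal are never reached without noise.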
The next example shows that in certain extreme cases the
specification error involved in ignoring how payoffs vary with
"location" can allow a technology that is everywhere inferior to
completely drive out a better one. This is the case depicted in
figure 2 below, in which $f(\theta) = \theta$ and
$g(\theta) = \theta - \epsilon$. If the current cut-off is
$\theta_t$, then the player at
$\theta \in [\theta_t - w, \theta_t + w]$ computes

$$ \bar{u}^g(\theta) = \theta_t - \epsilon + ((\theta + w) - \theta_t)/2, \quad\text{and}\quad \bar{u}^f(\theta) = \theta_t - (\theta_t - (\theta - w))/2. $$

Since $\bar{u}^g(\theta) - \bar{u}^f(\theta) = w - \epsilon$, if
$w > \epsilon$ all players who observe both technologies choose
technology $g$, and eventually $g$ will take over the entire
population.
[Figure 2]

We should point out that these technologies are quite special:
an inferior technology can only drive out a better one if the
difference in payoffs $|f-g|$ is small compared to the errors
caused by estimating the payoffs by their average values in the
window. These errors are of the magnitude of $w\, df/d\theta$ and
$w\, dg/d\theta$, which bound the difference between the payoffs
at $\theta - w$ and $\theta + w$. Thus, if $w$ is small, the
difference in payoffs $|f-g|$ must be small as well in order for
the inferior technology to dominate, and hence even though the
wrong technology is adopted everywhere, the payoff loss at each
location is not substantial. (In the example above, $\epsilon$
must be less than $w$ in order for $g$ to dominate, and the
payoff loss at each location is $\epsilon$.)

For small window widths, a more substantial payoff loss arises
when the new technology is not adopted in a region where it is a
substantial improvement. This was the case in the example where
$g = \cos(\theta)$ and $f = 0$, so that the regions where $g$
should be adopted are disconnected. We can also modify the
example of figure 2 so that $g$ is better than $f$ at every
location (and so in particular is better on a connected set) and
yet a substantial payoff loss results from $g$ failing to spread.
In figure 3 the payoffs to $f$ and $g$ are such that $g$ is much
better than $f$ in the neighborhood of $\theta = 0$, but is only
slightly better than $f$ for extreme $\theta$ values. Hence, if
technology $g$ is first introduced at these extreme values, it
will be driven out of the population before it can be tried in
the center region.
7. Heterogeneous Populations and Popularity Weighting
[Figure 3]

Our analysis of social learning in homogeneous populations
showed that popularity weighting could improve the aggregate
performance of the learning process, and that the optimal level
of popularity weighting is consistent with individual incentives.
We will now investigate the implications of popularity weighting
in our model of a heterogeneous population with linear
technologies.

To model popularity weighting, let $x_t(\theta)$ be the fraction
of players in the interval $[\theta - w, \theta + w]$ who use
technology $g$. In the spirit of the popularity weighting rule
(3), we now modify the decision rule (6) used in sections 5 and 6
and suppose that players use the decision rule

"Play $g$ at period $t+1$ iff $\bar{u}_t^g(\theta) - \bar{u}_t^f(\theta) \geq m(1 - 2x_t(\theta))$,"   (12)

where, as before, the parameter $m$ indexes the importance of
popularity in the players' decisions.
Since the analysis of this system is quite close to that of the
system without popularity weighting, we will give the results
without proof. As in section 5, if the state in period $t$
corresponds to a cut-off rule, so will the state in period $t+1$.
In addition, without noise terms the system has the same, unique,
steady-state cut-off $\theta^* = -(2\beta+1)w/2$. However, the
introduction of popularity weighting does change the dynamics in
two ways. First, in the absence of noise terms, the system
converges to the steady-state cut-off from any initial cut-off;
the oscillations described in proposition 4 do not arise. Second
(and relatedly), movements of less than one window width become
more common, as players are more hesitant to adopt a less popular
technology.
The following proposition gives a more precise description of
the dynamics.

Proposition 8: From an initial cut-off $\theta_0$, the system
described by decision rule (12) and payoffs (5) evolves according
to

$$ \theta_{t+1} = \begin{cases} \theta_t + w & \text{if } \theta_t < \theta_t^* - (m + w/2), \\ \theta_t^* + \dfrac{2m-w}{2m+w}\,(\theta_t - \theta_t^*) & \text{if } \theta_t \in [\theta_t^* - (m + w/2),\ \theta_t^* + (m + w/2)], \\ \theta_t - w & \text{if } \theta_t > \theta_t^* + (m + w/2). \end{cases} \tag{13} $$

Proof: Omitted. The calculations involved are straightforward,
and quite similar to those of proposition 4. Note that when
$m = 0$, the dynamics above reduce to those of proposition 4, as
they should do.
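A few lines of code make the two properties of (13) easy to check: the reduction to the reflection rule when $m = 0$, and geometric convergence without noise when $m > 0$. This is an illustrative sketch; `next_cutoff_popularity` and the numerical values are assumptions, and `target` stands for $\theta_t^*$.

```python
def next_cutoff_popularity(theta_t, target, w, m):
    """One step of the popularity-weighted dynamics in equation (13)."""
    band = m + w / 2
    if theta_t < target - band:
        return theta_t + w
    if theta_t > target + band:
        return theta_t - w
    c = (2 * m - w) / (2 * m + w)   # contraction factor in the middle case
    return target + c * (theta_t - target)


# With m = 0 the middle case is the reflection of proposition 4.
reflected = next_cutoff_popularity(0.03, 0.0, 0.2, 0.0)

# Without noise (target fixed at theta* = 0) the cut-off converges.
theta = 2.0
for _ in range(200):
    theta = next_cutoff_popularity(theta, 0.0, 0.2, 0.5)
distance_left = abs(theta)
```

Here the cut-off first moves by full window widths and then shrinks towards the steady state by the factor $(2m-w)/(2m+w)$ each period, rather than cycling.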
To see that, in the absence of noise, the system converges to
$\theta^*$ from any initial cutoff, note that the cut-off moves a
full window width so long as $|\theta_t - \theta^*| > m + w/2$.
Eventually then $|\theta_t - \theta^*| \leq m + w/2$, and from
then on
$\theta_{t+1} - \theta^* = [(2m-w)/(2m+w)]\,(\theta_t - \theta^*)$,
so that the system converges to $\theta^*$ at a geometric rate.

Note also that for a given $\theta_t$, the system will move less
than a full window width whenever the realization of $\theta_t^*$
is in an interval of width $2m + w$. This shows that popularity
weighting makes the system more "sluggish," and suggests that it
will reduce the variance of the long-run distribution. To verify
this intuition, and determine the extent to which popularity
weighting reduces the variance, we characterize the long-run
distribution in one special case.
Proposition 9: (a) If the $z_t$ are i.i.d. draws from a
distribution that has a strictly positive density on a compact
support, the dynamic process defined by (5) and (12) has a unique
invariant distribution.
(b) If the $z_t$ are i.i.d. draws from the uniform distribution
on $[-\sigma, \sigma]$ and $m > 2\sigma$, the invariant
distribution $f$ is concentrated on the interval
$[\theta^* - \sigma - w/2,\ \theta^* + \sigma + w/2]$ and
satisfies $E_f(\theta) = \theta^*$ and
$\text{var}_f(\theta) = \sigma^2 w/6m$.

Proof: (a) Omitted; the argument is very close to that for
proposition 6.
(b) Appendix D shows that there is a deterministic, finite time
$T$ for which the cut-off $\theta_T$ is in the interval
$[\theta^* - \sigma - w/2,\ \theta^* + \sigma + w/2]$, and that
once this interval is reached, $\theta_{T+s}$ remains in the
interval for all subsequent periods $T+s$. Given a $T$ satisfying
these claims, we have

$$ |\theta_T - \theta_T^*| \leq |\theta_T - \theta^*| + |\theta^* - \theta_T^*| \leq (\sigma + w/2) + \sigma, $$

which is less than $m + w/2$ from our assumption that
$m > 2\sigma$. Hence the evolution of $\theta_t$ from $T$ on is
determined by the second case in proposition 8. Writing
$c = (2m-w)/(2m+w)$ and applying this rule repeatedly, we find
that

$$ \theta_{T+s} = (1-c) \sum_{\tau=0}^{s-1} c^\tau\, \theta^*_{T+s-1-\tau} + c^s\, \theta_T. $$

Hence,

$$ E(\theta_{T+s} \mid \theta_T) = (1-c) \sum_{\tau=0}^{s-1} c^\tau E(\theta_t^*) + c^s \theta_T \to E(\theta_t^*) = \theta^*, $$

and

$$ \text{var}(\theta_{T+s} \mid \theta_T) = (1-c)^2 \sum_{\tau=0}^{s-1} c^{2\tau}\, \text{var}(\theta_t^*) \to \frac{(1-c)\,\sigma^2}{3(1+c)} = \frac{2w\sigma^2}{12m} = \frac{\sigma^2 w}{6m}. $$
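The variance formula in part (b) can be compared against a simulated path of the popularity-weighted system. This is a sketch under assumptions chosen for illustration ($\sigma = 1$, $\theta^* = 0$, $w = 0.5$, $m = 3$, a fixed seed); the function name is hypothetical and the estimate is subject to sampling error.

```python
import random


def simulated_variance(w, m, sigma=1.0, theta_star=0.0,
                       periods=200_000, burn_in=1_000, seed=1):
    """Time-average variance of the cut-off under rule (13) with
    uniform noise on [-sigma, sigma]."""
    rng = random.Random(seed)
    c = (2 * m - w) / (2 * m + w)
    band = m + w / 2
    theta = theta_star
    total = total_sq = 0.0
    for t in range(periods + burn_in):
        target = theta_star + rng.uniform(-sigma, sigma)  # theta*_t
        if theta < target - band:
            theta += w
        elif theta > target + band:
            theta -= w
        else:
            theta = target + c * (theta - target)
        if t >= burn_in:
            total += theta
            total_sq += theta * theta
    mean = total / periods
    return total_sq / periods - mean * mean


w, m, sigma = 0.5, 3.0, 1.0
estimate = simulated_variance(w, m, sigma)
predicted = sigma ** 2 * w / (6 * m)
```

Because $m > 2\sigma$ here and the path starts at $\theta^*$, only the middle case of (13) is ever active, so the simulated variance should sit close to $\sigma^2 w/6m$.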
Comparing the steady state distributions for $m > 2\sigma$ with
that for $m = 0$, we see that popularity weighting reduces the
long-run variance by a factor of $\sigma/3m$.
The welfare consequences of increasing $m$ for fixed $w$ are
similar to those of decreasing $w$ for fixed $m$: in both cases,
the steady state distribution becomes more efficient, while the
speed at which the system converges decreases. It may be
interesting to note, however, that in this simple model there is
one way to change the parameters to speed up the rate of
convergence (when the initial cutoff is far from the optimum)
without altering the steady-state variance, namely increasing the
window width $w$ while holding the ratio of $w/m$ fixed.$^{21}$
8. Concluding Remarks
The various models we have presented suggest that even very
naive learning rules can lead to quite efficient long-run social
states, at least if the environment is not too highly nonlinear.
Moreover, popularity weighting can contribute to this long-run
efficiency, and the use of popularity-weighting passes a crude
first-cut test of consistency with individual incentives. Of
course, there are many other plausible specifications of behavior
rules for social learning, so it is interesting to speculate
about the robustness of our conclusions.
In this light, we would like to report simulation results for
one simple modification of popularity weighting that seems to
improve the short-run performance of the system without changing
its long-run behavior. For this purpose, we return to the
homogeneous-population model of section 4, and now suppose that
players give weight to trends in the relative popularity of the
two technologies as well as to the popularity itself.
More precisely, suppose that players now choose technology $g$
iff the realized difference in payoffs $u_t^g - u_t^f$ exceeds the
expression $m(1-2x_t) - c(x_t - x_{t-1})$, where $x_t - x_{t-1}$
is the trend in popularity. Since the trend variable converges to
zero along any path where the system converges to a steady state,
the system still converges to the better technology with
probability 1 when $m = \sigma$. However, if the initial state is
far from the optimum, as in the case when a superior technology
is first introduced, one would expect that responsiveness to
trends would help to increase the speed with which the new
technology is adopted.
To test this intuition, we ran two simulations, both with the noise term $\varepsilon$ uniformly distributed on $[-\sigma, \sigma]$ and popularity weighting $m = \sigma$. In the first, the fraction $\alpha$ who adjust each period was .5, and the mean payoff difference $\theta$ was $.1\sigma$; in the second, $\alpha = .1$ and $\theta = .02\sigma$. In both cases, we counted the number of periods required for the system to move from initial state $x_0 = .05\sigma$ to $x = .99\sigma$. The results, reported in Table 2, show that at least one obvious modification of our assumptions reinforces the impression that naive rules can perform fairly well.
              $\alpha=.5$, $\theta=.1$     $\alpha=.1$, $\theta=.02$
c = 0                 39                   940
c = 5                 26                   710
c = 10                28                   470

Table 2: Trend Weighting and the Speed$^{22}$
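The experiment described above can be sketched in a few lines of Python. This is only an illustrative reading of the setup, not the authors' program: the function name `periods_to_adopt`, the interpretation of the start and stop states as population shares 0.05 and 0.99, the normalization $\sigma = 1$, and the choice of no trend in the first period are all assumptions made for the sketch.

```python
import random


def periods_to_adopt(alpha, theta, c, sigma=1.0, m=None,
                     x0=0.05, target=0.99, rng=None, max_periods=1_000_000):
    """Count periods until the share x of players using g first reaches `target`.

    Each period a fraction `alpha` of players revises. All revisers choose g when
    theta + eps >= m * (1 - 2 * x_t) - c * (x_t - x_{t-1}), and f otherwise.
    """
    rng = rng or random.Random()
    m = sigma if m is None else m          # popularity weight m = sigma by default
    x_prev = x = x0                        # assumption: zero trend in the first period
    for t in range(1, max_periods + 1):
        eps = rng.uniform(-sigma, sigma)   # uniform noise on [-sigma, sigma]
        threshold = m * (1 - 2 * x) - c * (x - x_prev)
        share_g = alpha if theta + eps >= threshold else 0.0
        x_prev, x = x, (1 - alpha) * x + share_g
        if x >= target:
            return t
    return max_periods                     # guard: target not reached
```

Averaging `periods_to_adopt(0.5, 0.1, c)` over many draws for `c` in `(0, 5, 10)` gives a rough counterpart of the first column of Table 2; the numbers will depend on the assumptions listed above.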
There are a number of other extensions that we have not considered but seem important. Players might use rules of thumb which make some use of historical data. Also, players might be arranged in more complex networks than the simple linear structure we have considered. Finally, our results suppose either that rules of thumb are exogenous, or, in Proposition 3, that they are the equilibrium choices of a static game. It would be interesting to complement these results with an analysis of a dynamic process by which players adjust their rules of thumb along with their choice of technology.
Finally, we should point out that popularity weighting is not always as beneficial as our results might suggest. Consider the problem of children in a poor neighborhood choosing whether to pursue higher education. If students who have done so in the past tend to move out of the neighborhood, and past residents are underrepresented in the observation windows, then the choice of higher education will appear less popular than it really is, and decisions based on popularity may be biased against this choice.$^{23}$
Appendix A: Optimal Popularity Weighting with Other Distributions

To better understand the forces generating Proposition 2a (that a single choice of popularity weight yields the optimal long-run distribution uniformly over all values of $\theta$), we show that analogous results obtain when the per-period noise term $\varepsilon_t$ has distribution $F$ with unbounded support.

Suppose first that $\alpha = 1$, so that the entire population adjusts every period, and hence the state $x_t$ takes on only the values 0 and 1. If we let $s_t$ denote the vector $[\mathrm{Prob}(x_t = 0), \mathrm{Prob}(x_t = 1)]$, we have $s_{t+1} = s_t A$, where the transition matrix is

$$A = \begin{pmatrix} F(m-\theta) & 1-F(m-\theta) \\ F(-m-\theta) & 1-F(-m-\theta) \end{pmatrix}.$$

Since this matrix is strictly positive, the system is ergodic; the unique invariant distribution $\mu^*$ is given by

$$\mu^*(x=0) = \frac{F(-m-\theta)}{F(-m-\theta) + 1 - F(m-\theta)}.$$

If $F$ is the standard normal distribution, then as $m$ increases, the ratio $F(-m-\theta)/[1-F(m-\theta)]$ converges to 0 if $\theta > 0$, and converges to $\infty$ if $\theta < 0$. Hence for large $m$, the ergodic distribution of the system places probability near 1 on the correct choice. Moreover, the same is true for any distribution for which the ratio $F(-m-\theta)/[1-F(m-\theta)]$ converges to 0 as $m \to \infty$ if $\theta > 0$, and to $\infty$ if $\theta < 0$. (This is what is meant by saying that the tails of the distribution are "infinitely revealing.")
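As a numerical illustration of the two-state case, the following Python sketch evaluates the invariant probability of the state $x = 0$ for the standard normal distribution. The function names are hypothetical and chosen only for this example; the formula is the ratio of transition probabilities for the case $\alpha = 1$ described above.

```python
import math


def normal_cdf(y):
    """Standard normal distribution function F."""
    return 0.5 * (1.0 + math.erf(y / math.sqrt(2.0)))


def invariant_prob_x0(m, theta, cdf=normal_cdf):
    """Invariant probability of x = 0 when the whole population adjusts each period."""
    to_zero = cdf(-m - theta)        # Prob(x_{t+1} = 0 | x_t = 1)
    to_one = 1.0 - cdf(m - theta)    # Prob(x_{t+1} = 1 | x_t = 0)
    return to_zero / (to_zero + to_one)
```

For a fixed positive `theta`, calling `invariant_prob_x0` with increasing moderate values of `m` shows the weight on the inferior state shrinking. Very large `m` will underflow in floating point, so the sketch is meant for moderate values only.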
With a more involved argument, we have shown that the same conclusion holds for any $\alpha \in (0,1)$ when players use the (discontinuous) popularity weighting "if $x_t \ge 1/2$, choose $g$ iff $u^g_t - u^f_t \ge -m$; if $x_t < 1/2$, choose $g$ iff $u^g_t - u^f_t \ge m$." The details are available on request; here is the intuition for the result. Note first that when $m = \infty$ the system is deterministic with stable steady states at 0 and 1.
If m is finite but very large
compared to a and to the distribution, then steps the "wrong
way" (i.e. decreasing steps when $x_t > 1/2$) are rare "innovations", and when the distribution is symmetric, transits from 0 to 1/2 and from 1 to 1/2 both take the same number of innovations. If the tails of the distribution are infinitely revealing, then as $m \to \infty$ innovations towards the better technology become infinitely more likely than innovations towards the inferior one, and the analysis of Freidlin and Wentzell [1984] suggests that the limit of the ergodic distributions will be concentrated on the better technology. To establish this formally, we partition the interval into a large (appropriately chosen) number of small subintervals, and approximate the original system by two finite-state Markov processes, whose ergodic distributions will serve as bounds on the ergodic distribution of the original system. We then use the discrete-time, finite-state translation of Freidlin and Wentzell's results (Kandori, Mailath and Rob [1992], Young [1992]) to confirm the intuition above, i.e. the limits of the ergodic distributions of the finite-state process are concentrated on the subinterval corresponding to the better choice.

The above suggests that infinitely-revealing tails are sufficient for there to be a single popularity rule that is approximately optimal for all $\theta$. Moreover, this rule has the nice feature that it need not be tailored to the exact form of the distribution. Even when the tails are not infinitely revealing, however, there is another popularity rule that seems to perform very well, namely "choose $g$ iff $u^g_t - u^f_t \ge F^{-1}(1-x_t)$." With this rule,

$$E(x_{t+1} \mid x_t) = (1-\alpha)x_t + \alpha\,\mathrm{Prob}[\theta + \varepsilon_t \ge F^{-1}(1-x_t)] = x_t + \alpha[F(\theta + F^{-1}(x_t)) - x_t],$$

so that $E(x_{t+1} \mid x_t) > x_t$ if and only if $\theta > 0$; the system tends to drift towards the correct choice. Although the system may converge to the wrong technology with positive probability, the simulation results reported in table 2 for the logistic and Laplace distributions (which both have non-revealing tails) suggest that when $\alpha$ is small the system is very likely to converge to the right choice. Intuitively, when $\alpha$ is small, the system evolves through a series of small steps that allow the drift to outweigh the random forces. We conjecture that there may be a general result along these lines.
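The drift property of the rule "choose $g$ iff $u^g_t - u^f_t \ge F^{-1}(1-x_t)$" can be checked numerically. The Python sketch below assumes a standard logistic $F$ (one of the two distributions mentioned above, with an assumed unit scale); the function names are illustrative and do not come from the paper.

```python
import math


def logistic_cdf(y):
    """Standard logistic distribution function F."""
    return 1.0 / (1.0 + math.exp(-y))


def logistic_quantile(x):
    """Inverse of the standard logistic distribution function, for 0 < x < 1."""
    return math.log(x / (1.0 - x))


def expected_next_share(x, theta, alpha):
    """E(x_{t+1} | x_t = x) = x + alpha * [F(theta + F^{-1}(x)) - x]."""
    return x + alpha * (logistic_cdf(theta + logistic_quantile(x)) - x)
```

Evaluating `expected_next_share(x, theta, alpha) - x` on a grid of interior `x` values shows a drift whose sign matches the sign of `theta`, which is the property the text relies on.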
Appendix B: Proof of Proposition 2c

If $\sigma - m > |\theta|$, then neither 0 nor 1 is an absorbing state. Our first step is to show that there is a unique invariant distribution. To do so, we first note that the stochastic system (4) is a random contraction in the sense of Norman [1972]. A random contraction is a stochastic system in which the realization of an i.i.d. auxiliary variable (call it $\omega$) is used to determine which of a family of mappings $\varphi_\omega$ is used to send $x_t$ to $x_{t+1}$, and each $\varphi_\omega$ is a contraction "on average." In our context, $\omega$ corresponds to the realized difference in payoffs, and there are only two maps, $x_t \mapsto (1-\alpha)x_t$ and $x_t \mapsto (1-\alpha)x_t + \alpha$, both of which are contractions, so that (4) is indeed a random contraction. Norman's results then imply that the Markov operator associated with system (4) is quasi-compact.

We next note that when $|\theta| < \sigma - m$, the system (4) satisfies the uniqueness criterion 2.11 of Futia [1982]: for any neighborhood $U$ of the point $x = 1/2$, and any point $x'$ in $[0,1]$, there is an $n$ such that the probability the system starting at $x'$ is in $U$ exactly $n$ periods later is strictly positive. (If $m \ge \sigma$, the uniqueness condition fails, as both $x = 0$ and $x = 1$ are absorbing.)
The last step is to compute the mean and variance of the invariant distribution $\mu$. Using $E_\mu(x_t) = E_\mu(x_{t+1})$, we have

$$E_\mu(x) = (1-\alpha)E_\mu(x) + \alpha\int p(x)\,d\mu(x),$$

where $p(x) = (\sigma - m + \theta)/2\sigma + (m/\sigma)x$ is the probability that $\theta + \varepsilon_t \ge m(1-2x_t)$, which is the probability that $x_{t+1} = (1-\alpha)x_t + \alpha$. Simple algebra then shows that $E_\mu x = 1/2 + \theta/2(\sigma - m)$.

To compute the variance, we first write the identity

$$E_\mu(x^2) = \int \left[(1-p(x))[(1-\alpha)x]^2 + p(x)[(1-\alpha)x + \alpha]^2\right] d\mu(x)$$
$$= E_\mu(x^2)\left[(1-\alpha)^2 + 2\alpha(1-\alpha)m/\sigma\right] + E_\mu(x)\left[2\alpha(1-\alpha)(\sigma - m + \theta)/2\sigma + \alpha^2 m/\sigma\right] + \alpha^2(\sigma - m + \theta)/2\sigma;$$

solving for $E_\mu(x^2)$ and computing $\mathrm{var}(x) = E_\mu(x^2) - (E_\mu(x))^2$ gives the desired result.
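The moment identities above can be cross-checked against a direct simulation of the two-map system. The Python sketch below is illustrative only: the function names, the uniform noise on $[-\sigma, \sigma]$, the starting point 0.5, and the burn-in length are assumptions made for the example, and the closed-form variance is obtained simply by solving the identity for the second moment.

```python
import random


def invariant_moments(alpha, theta, m, sigma):
    """Mean and variance of the invariant distribution from the moment identities."""
    a0 = (sigma - m + theta) / (2.0 * sigma)       # intercept of p(x)
    slope = m / sigma                              # slope of p(x)
    mean = 0.5 + theta / (2.0 * (sigma - m))
    coef_x2 = (1 - alpha) ** 2 + 2 * alpha * (1 - alpha) * slope
    coef_x = 2 * alpha * (1 - alpha) * a0 + alpha ** 2 * slope
    second = (coef_x * mean + alpha ** 2 * a0) / (1.0 - coef_x2)
    return mean, second - mean ** 2


def simulate_moments(alpha, theta, m, sigma, periods=200_000, burn_in=1_000, seed=0):
    """Time averages of x and x^2 along one simulated path."""
    rng = random.Random(seed)
    x, total, total_sq = 0.5, 0.0, 0.0
    for t in range(burn_in + periods):
        eps = rng.uniform(-sigma, sigma)
        # Revisers move to g when the realized payoff difference clears m(1 - 2x).
        x = (1 - alpha) * x + (alpha if theta + eps >= m * (1 - 2 * x) else 0.0)
        if t >= burn_in:
            total += x
            total_sq += x * x
    mean = total / periods
    return mean, total_sq / periods - mean ** 2
```

With parameters satisfying $\sigma - m > |\theta|$, the two functions should return similar values up to sampling error in the simulated path.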
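APPENDIX C: PROOF OF PROPOSITION 6

To begin we rewrite (10) in the equivalent form

$$\theta_{t+1} = \begin{cases} \min[\theta_t + w,\ 2(\theta^* + z_t) - \theta_t] & \text{if } \theta^* + z_t \ge \theta_t \\ \max[\theta_t - w,\ 2(\theta^* + z_t) - \theta_t] & \text{if } \theta^* + z_t < \theta_t \end{cases} \qquad (10')$$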
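A short Python sketch of the map in (10') may help in following the case analysis of the proof. It encodes one reading of the rewritten system: the cutoff moves by $w$ towards $\theta^* + z_t$, unless that point is within $w/2$, in which case the cutoff is reflected about it. The function name `next_cutoff` and its argument names are hypothetical.

```python
def next_cutoff(theta, z, theta_star, w):
    """One step of the map in (10') for the current cutoff `theta` and shock `z`."""
    target = theta_star + z
    reflected = 2.0 * target - theta      # reflection of theta about the target
    if target >= theta:
        return min(theta + w, reflected)  # move right by at most w
    return max(theta - w, reflected)      # move left by at most w
```

For a fixed `z`, the map is piecewise linear in `theta` with slopes of plus or minus one, so applying it to two different cutoffs never increases the distance between them.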
To show that the system (10) is a "random dynamical system" as described by Futia [1982], we note that the auxiliary events are the $z_t$. The probability distribution $Q$ on the $z$'s does not depend on the current state, and so in particular is continuous in the state, and the map $\varphi(\theta,z)$ defined by $\theta_{t+1} = \varphi(\theta_t, z_t)$ is easily seen to be continuous in $\theta$ for fixed $z$, so that (10) is indeed a random dynamical system.

Next we check that it is a random contraction, as in Futia's definition 6.2. Because the map $Q$ is constant in $\theta$, the constant $M$ in part (a) of the definition can be taken to equal 0. Next we must show that for all $z$, and all $\theta \ne \theta'$, $d(\varphi(\theta,z), \varphi(\theta',z)) \le d(\theta,\theta')$, and that for all $\theta$ and $\theta'$ there is a positive probability of $z$ such that $d(\varphi(\theta,z), \varphi(\theta',z)) < d(\theta,\theta')$.

To show that $d(\varphi(\theta,z), \varphi(\theta',z)) \le d(\theta,\theta')$, we note that for all $\theta$ and $\theta'$ and all $z$, either (a) both $\theta$ and $\theta'$ move in the same direction (e.g. $\varphi(\theta,z) - \theta > 0$ and $\varphi(\theta',z) - \theta' > 0$) or (b) $(\varphi(\theta,z) - \theta)(\varphi(\theta',z) - \theta') \le 0$. Case (a) has three subcases: either (1) $\varphi$ moves both locations by $w$, so that $d(\varphi(\theta,z), \varphi(\theta',z)) = d(\theta,\theta')$; or (2) the location closer to $\theta^* + z_t$ moves less than $w$, and the state farther away moves $w$, so that $d(\varphi(\theta,z), \varphi(\theta',z)) < d(\theta,\theta')$; or (3) both locations move by less than $w$, in which case the two locations are reflected about the point $\theta^* + z_t$, and $d(\varphi(\theta,z), \varphi(\theta',z)) = d(\theta,\theta')$.

In case (b) suppose w.l.o.g. that $\theta < \theta'$; then case (b) implies that $\theta \le \theta^* + z \le \theta'$, and so $d(\theta,\theta') = d(\theta, \theta^* + z) + d(\theta^* + z, \theta')$. Using the triangle inequality, this implies that

$$d(\varphi(\theta,z), \varphi(\theta',z)) - d(\theta,\theta') \le d(\varphi(\theta,z), \theta^* + z) + d(\theta^* + z, \varphi(\theta',z)) - d(\theta, \theta^* + z) - d(\theta^* + z, \theta') \qquad (C1)$$
$$= [d(\varphi(\theta,z), \theta^* + z) - d(\theta, \theta^* + z)] + [d(\theta^* + z, \varphi(\theta',z)) - d(\theta^* + z, \theta')],$$

and inspection of (10') shows that each of the terms in square brackets is non-positive. Thus $d(\varphi(\theta,z), \varphi(\theta',z)) \le d(\theta,\theta')$ for all $z$, $\theta$, and $\theta'$.

To show that for all $\theta$ and $\theta'$ there is positive probability that $d(\varphi(\theta,z), \varphi(\theta',z)) < d(\theta,\theta')$,
*
suppose first that 8
)
let 8
,
Then
8'
<
and
,
sufficiently
small c >0 there is a positive probability that z lies in any
*
sufficiently small neighborhood of 8 - 8 + e -w/2, and for z's
in this neighborhood, moves less than w to the left, while 8'
moves w, so that d(</>(8,z), <p(8',z)) < d(8,8').
If 6 - 6 s - <r +
w/2
*
8'
but
a
8 >
-
-w/2,
a + w/2.
-
for
argument
similar
establishes the
existence of a range of z's such that both 8 and 8' move to the
right, with 8' moving less than 8.
Finally,
+
-8
<
a
ife-e<-o-
w/2
8'
and
-8
for
d (8,8')
*
£
z's
-w/2, then
8'
cr
in
a
and &{<p(6 ,z)
neighborhood of e+w/2.
Thus
-8 > w,
,
ip(8',z))
(10')
<
is
a
random contraction.
The last step in the proof is to verify that (10') satisfies Futia's uniqueness condition 2.11, which requires that there be a point $\hat\theta$ such that for any neighborhood $U$ of $\hat\theta$ and any $\theta$, there is an $n$ such that when the system begins at $\theta$, it has a positive probability of being in $U$ in period $n$. It is easy to see that e.g. $\hat\theta = \theta^*$ satisfies this condition.
APPENDIX D: Proof of Proposition 9, part b.

To complete the proof, we must show that there exists a deterministic, finite time $T$ such that (i) $|\theta_T - \theta^*| < \sigma + w/2$, and (ii) that $|\theta_{T+s} - \theta^*| < \sigma + w/2$ for all subsequent dates $T+s$.
-8 |.
Define d. = 8
Note that since (8. -8.) and (8.-8.)
~
^ cr,
have the same sign, and \8. - 8
and
(8 -8.
+ 1 e +-)
have the same sign whenever d. > a + w/2. Hence,
,
.
|
)
\
(Dl)
d
t+1
=
|d
t
(i)
initial condition there exists a finite
of the sample path,
reached,
|8
either d
note that
d.
t+1
~d
-8
t
+-
(|8 t+1 - 8 t |)| whenever d t > a + w/2.
-
As a first step towards proving
To see this,
(
|
t+1
=
e
l
T'
we show that for any
such that regardless
a + w/2 or d
~e
•
proposition
8,
s
T/
implies that until
(Dl)
=
<
,
T
,
t+i
min{ w,
-e tl'
and
from
[2w/(2m+w)](8
t
min{w,
\& T
such
-8*)}
>
+1
a
T '\
T'
is
*
[2w/ (2m+w) ]w/2}
until the conditions defining T' are satisfied, the
decrease in d. is bounded below by a positive constant which is
independent of the sample path.
Thus,
If d
<
,
it
The remaining case
case,
(Dl)
setting T =
+ w/2,
a+ w/2
is
implies that d
than w - (a+ w/2) = w/2
to complete the proof of
To prove claim
-9
(ii)
T
-
=
,
d
<
|©
a < w/2
completes the proof of
T"
,
s
i
T +1
-
+cr.
e T'+l
~
In
*
T'
I
(i)
th ^ s
which is less
Hence we can set T = T'+l
-
6
,
|
-d
t
,
(i)
,
note that when
|e_ -6
<
\
a+ w/2, we
m +w/2 from
which is less than
From proposition 8 we then have
the assumption that m > 2a.
=
(2m-w)/(2m+w) (e - e*)
+
and since both 9* and ©
©
T
T
T+1
T
lie in the interval [G - a - w/2, 9 +a + w/2], so does 9
The claim now follows from induction on s.
have
|© T
T
£
|
(a + w/2)
[
+ a,
,
]
.
Appendix
E:
This appendix gives a rough computation of how many periods
would be needed on average for a player using history to be
reasonably sure he knew which technology is better. If we let a
= var(e
.
+ var(c_.)
)
the variance of u^ ~ u t
about (<r /9)
observations
denote
/
then the
player at 9 will need
of both
technologies to be fairly confident (about 85%- i.e. 15% chance
of a false rejection) that he knew which technology was better.
*
1/2
+ {cm/ 2)
this requires on the order of 2a 2 /crw
For
c*
*
1/2
< 9 * 9
+
observations of both technologies. For 9 + (crw/ 2)
1/2
2
3(crw/2)
at least 2a /9crw observations of both technologies
are required.
Our next step is to approximate the frequency with which
these observations arrive.
To do so, we approximate the
steady-state distribution by a normal distribution, and then
note that the density of the normal is less than -25/cr at points
more than 1 standard deviation away from the mean, and that the
density of the normal at the mean is less than .5/cr.
Finally,
we recall that only players within w of the current period's
Thus we conclude that
cutoff see both technologies being used.
*
for players within one standard deviation of 9 the required
= <r/w periods;
observations arrive on average every [2w-.5/cr]
2
2
2
since these players need about 2a /crw observations, 2cr /w
periods are required.
For players between 1 and 3 standard
deviations away, the required observations arrive about every
2
/9crw observation, so that
2cr/w periods; these players need 2cr
9*6
.
.
,
,
.
about
2
4<t
/9w
2.
C
periods are required.
REFERENCES

Apodaca, A. [1952] "Corn and Custom: Introduction of Hybrid Corn to Spanish American Farmers in New Mexico," in E.H. Spicer, ed., Human Problems in Technological Change, Russell Sage Foundation, New York.

Banerjee, A. [1991a] "The Economics of Rumours," forthcoming in the Review of Economic Studies.

Banerjee, A. [1991b] "A Simple Model of Herd Behavior," forthcoming in the Quarterly Journal of Economics.

Bikhchandani, S., D. Hirshleifer, and I. Welch [1991] "A Theory of Fads, Fashion, Custom, and Cultural Change as Informational Cascades," forthcoming in the Journal of Political Economy.

Bush, R.R. and F. Mosteller [1955] Stochastic Models for Learning, Wiley, New York.

Chambers, J.D. and G.E. Mingay [1966] The Agricultural Revolution 1750-1880, Schocken, New York.

Conlisk, J. [1980] "Costly Optimizers versus Cheap Imitators," Journal of Economic Behavior and Organization, 1:275-293.

Cross, J.G. [1983] A Theory of Adaptive Economic Behavior, Cambridge University Press.

Ellison, G. [1991] "Learning, Local Interaction, and Coordination," mimeo.

Ernle, R.E.P. [1912] English Farming Past and Present, Longmans Green and Co., London.

Futia, C. [1982] "Invariant Distributions and the Limiting Behavior of Markovian Economic Models," Econometrica, 50, 377-408.

Kandori, M., G. Mailath, and R. Rob [1992] "Learning, Mutation, and Long Run Equilibrium in Games," mimeo.

Kerridge, E. [1967] The Agricultural Revolution, George Allen & Unwin, London.

Manski, C. [1990] "Dynamic Choice in a Social Setting," University of Wisconsin SSRI discussion paper 9003.

Mantoux, P. [1905] The Industrial Revolution in the 18th Century, Harper and Row, N.Y.; reprinted in 1961.

Mingay, G.E. [1977] The Agricultural Revolution, Adam & Charles Black, London.

Norman, M.F. [1968] "Some Convergence Theorems for Stochastic Learning Systems with Distance-Diminishing Operators," J. Mathematical Psychology, 5, 61-101.

Norman, M.F. [1972] Markov Processes and Learning Models, Academic Press.

Rogers, E. with F. Shoemaker [1971] Communication of Innovations: A Cross-Cultural Approach, 2nd edition, New York, the Free Press.

Rothschild, M. [1974] "A Two-Armed Bandit Theory of Market Pricing," J. Econ. Theory, 9, 185-202.

Ryan, B. and N. Gross [1943] "The Diffusion of Hybrid Seed Corn in Two Iowa Communities," Rural Sociology.

Schmalensee, R. [1975] "Alternative Models of Bandit Selection," J. Econ. Theory, 10, 333-342.

Slicher van Bath, H.S. [1965] The Agrarian History of Europe A.D. xxxx-1850, Edward Arnold, London.

Smith, L. [1990] "Error Persistence and Experimental versus Observational Learning," mimeo.

Timmer, C.P. [1969] "The Turnip, the New Husbandry, and the English Agricultural Revolution," Quarterly Journal of Economics, 83, 375-395.
FOOTNOTES
1
See Rogers and Shoemaker [1971] for an extensive discussion of empirical research on adoption processes, especially in development. Mansfield [1968] and Ryan and Gross [1943] are classic studies of technology adoption in basic industries and agriculture, respectively.
2
Cross [1983] develops a model of boundedly rational, adaptive choice with a similar information structure.
3
Smith's paper models the pricing decisions of monopolistically competitive firms, for which the assumption of unobserved payoffs seems more plausible. Banerjee [1991a] is a model of investment decisions; Banerjee [1991b] is less explicit about its intended interpretations. Bikhchandani et al. suggest that their model is appropriate for a wide range of decision problems, including that of technology adoption.
4
Manski [1990] considers heterogeneous populations from an econometric viewpoint. His paper provides a sufficient condition for an individual to be able to obtain a consistent estimate of his own optimal choice by observing the choices and outcomes of others.
5
Note that when the players are heterogeneous, a central planner would need to know the relative payoffs of the competing technologies for every player in order to implement the optimum by fiat. Centrally-based agricultural reformers are often hampered by their lack of understanding of the variation in farmers' tastes and production costs. For example, Apodaca [1952] describes how a planner tried to induce a New Mexico community to adopt a hybrid corn. The innovation was adopted and then discontinued despite doubling yields, as the villagers decided the taste and consistency of the corn were inappropriate for making tortillas.
6
(In the second model the environment is complicated enough that a great many periods would be required to obtain good estimates, as we discuss in section 5.)
7
See Mingay [1977] and Slicher van Bath [1965] for more thorough descriptions of the technology. Timmer [1969] discusses the extent of the resulting gains in productivity.
8
Kerridge [1967] pp. 28-34, 339-341.
9
Chambers and Mingay [1966] p. 55.
10
See Timmer [1969] and Kerridge [1967]. Slicher van Bath [1965] p. 243 gives a similar figure for the rate of diffusion in France.
11
See Timmer [1969] for an excellent summary of this debate.
12
The consideration of inertia is further motivated by the empirical evidence that there is typically a substantial lag between the time individuals first learn of the existence of a technology and the time they adopt it. Ryan and Gross [1943] found that farmers in two rural communities on average adopted hybrid seed corn 9 years after they first heard of the innovation. Other studies cited in Rogers and Shoemaker [1971], p. 129, report lags of 2-4 years for the adoption of weed spray in Iowa and fertilizer in Pakistan. Note that the spread of literacy and modern communication media will speed up the rate at which farmers become aware of a new technology's existence, but does not seem to have eliminated the lag between becoming informed and deciding to adopt.
13
If we interpret $x_t$ as the probability that a single individual chooses $g$, as opposed to the population fraction, (2) is an example of the linear stochastic learning theory (LSLT) of Bush and Mosteller [1955]. This theory describes a traditional one-player bandit problem, in which only 1 arm is observed in each period; it is also assumed that the only possible outcomes are "reward" and "failure," with the probability of arm $i$ being rewarded being $\pi_i$. Our model corresponds to the special case in which $\pi_1 = 1 - \pi_2$ and the two arms' outcomes in a given period are perfectly negatively correlated. (In the bandit problem, the joint distribution is irrelevant.) See Schmalensee [1975] for a brief survey of these models, and a discussion of their applicability to market pricing. In our system (2)
Schmalansee's constants L, L, G, and G all equal a, while
Schmalansee's a and a both equal 1, and his /S. and jS_ equal 0.
Schmalensee argues that in some cases the behavior these models
prescribe, with both actions taken infinitely often, is more
realistic than that of the optimal solution to the discounted
bandit problem.
14
The empirical literature suggests that popularity weighting is a factor, but reliable estimates of $m$ are hard to come by. Rogers and Shoemaker (op. cit., p. 142) say that "many students of peasant life feel" that innovations must be 20% to 30% better to be adopted; they also cite a President's Science Advisory Committee figure of 50% to 100%. From our reading, it is not clear whether these premia reflect popularity weighting or inertia.
15
We should warn the reader that the results we obtain for the uniform case will seem to rely on the fact that this distribution has compact support: an observation that $u^g_t - u^f_t > \sigma$ implies that $\theta$ is greater than zero. However, compact support is not what underlies our conclusions. Appendix A shows that the non-linear rule "only switch if the observed payoff difference is large compared to the popularity" leads to a long-run distribution that places most of its weight on the better choice whenever the distribution of errors is "infinitely revealing in the tails." The appendix also reports simulations of a more complex rule that seems to work well even when the tails are not infinitely revealing.
16
An alternative explanation is to use the fact that the rule $m = \sigma$ yields the optimal long-run decision, and that since $\sigma$ is the standard deviation of the per-period payoff differences, rescaling the utility function rescales $\sigma$ in the same way.
17
Unless the payoff difference is so extreme that $\theta - \sigma > m$, in which case the rate of adoption is independent of $\theta$. Note that the rate is also an increasing function of $\theta$ when $m = 0$, provided that $\theta$ is smaller than $\sigma$.
18
However, the correlation is easy to explain as the result of an
optimal investment policy under complete information if adopting
the innovation requires investing in a capital good.
19
Although our leading example of very small window widths is the English agricultural revolution, small window widths should not be seen as requiring illiterate agents. Anecdotal evidence suggests that farmers often distrust the information of central authorities and experts, and prefer to see how innovations work out in their neighborhood. Ryan and Gross [1943] found that the experiences of neighbors were an important factor in the adoption of hybrid seed corn by 20th century Iowa farmers.
20
Although we have not checked the details, it seems that a combination of large window widths with a rule of proximity-weighted averages could combine quick convergence with a small long-run variance.
21
However, as $w$ increases the specification bias grows. When $w$ is large, it may be more natural to suppose that players weight the experience of those nearby more than that of those who are farther away but still within their window.
22
Based on estimated standard errors, the first two digits
are correct at the .95 level.
23
We thank Roland Benabou for this observation.
24
(See Futia's [1982] survey for a summary of Norman's results, and other techniques for establishing that the invariant distribution is unique.)