working paper
department of economics

INFORMATIONAL HERDING AS EXPERIMENTATION DEJA VU

Lones Smith
Peter Sørensen

96-9
Mar. 1996

massachusetts institute of technology
50 memorial drive
Cambridge, mass. 02139
Informational Herding as Experimentation Deja Vu

Lones Smith*
Department of Economics, M.I.T.

Peter Sørensen†
Nuffield College and M.I.T.

March 27, 1996
Abstract

We prove that the recently proposed informational herding models are but special cases of a standard single person experimentation model with myopia. We then re-interpret the incorrect herding outcome as a familiar failure of complete learning in an optimal experimentation problem. We next explore that experimentation model with patient individuals, or equivalently the observational learning model where individuals care about successors. As such, we find that even when individuals internalize the herding externality in this fashion, incorrect herds and incomplete learning still obtain. We note that this outcome can be implemented as a constrained social optimum when decentralized by transfers.
The proposed mapping in this paper first appeared in July 1995 as a later section in our companion paper, "Pathological Outcomes of Observational Learning." For the current full length paper, we thank Abhijit Banerjee, Christopher Wallace, and the MIT theory lunch for comments. Smith gratefully acknowledges financial support for this work from the NSF, and Sørensen equally thanks the Danish Social Sciences Research Council.
*e-mail address: lones@lones.mit.edu
†e-mail address: peter.sorensen@nuf.ox.ac.uk
And out of old bookes, in good faithe,
Cometh al this new science that men lere.

Geoffrey Chaucer, The Assembly of Fowles, line 22
1. INTRODUCTION

The last few years has seen a flood of research on a paradigm known as informational herding. We ourselves have actively participated in this research (Smith and Sørensen (1996a), or simply SS), which was sparked independently by Banerjee (1992) and Bikhchandani, Hirshleifer, and Welch (1992) (BHW). The context is seductively simple: An infinite sequence of individuals must decide on an action choice from a finite menu, and each may condition his decision both on his (endowed) private signal about the state of the world and on all his predecessors' decisions; observation of their private signals, however, is verboten. Everyone has identical preferences, but if these private signals have bounded power, it is known that a 'herd' eventually arises: after some point, all make the identical choice, possibly an unwise one, since that choice is not always correct. This simple pathological outcome has understandably attracted much fanfare.

In SS we explored, embellished, and fully fleshed out this story, and found that herding is quite robust. BHW's kindred notion of cascades is not as resilient: while beliefs converge upon a limit where only one action is taken with probability one (a limit cascade), this need not occur in finite time. The potential for bad herds is also not without caveat: absent a uniform bound on the strength of the individuals' private signals, only correct herds arise. Finally, in a world with multiple preference types, a confounded learning outcome might occur, where the lesson of history is forever mixed, and private signals always remain decisive.
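The bounded-signal herding story above can be simulated in a few lines. The following sketch is illustrative only: the signal precision 0.7, the number of agents, and the tie-breaking convention are all assumptions of this example, not taken from the paper.

```python
import math
import random

def run_herding(p_correct=0.7, n_agents=400, state=1, seed=0):
    """Simulate a BHW-style binary herding model.

    The state is 1 (high) or 0 (low). Each agent draws a conditionally
    i.i.d. binary signal matching the state with chance p_correct (a
    bounded private signal), observes the public log-likelihood ratio
    implied by all predecessors' actions, and takes the action favored
    by his posterior. Returns the list of actions taken.
    """
    rng = random.Random(seed)
    step = math.log(p_correct / (1 - p_correct))  # log-LR of one signal
    public_llr = 0.0  # log-likelihood ratio for state 1 given history
    actions = []
    for _ in range(n_agents):
        s = step if (rng.random() < p_correct) == (state == 1) else -step
        a = 1 if public_llr + s > 0 else 0
        actions.append(a)
        # Observers invert the agent's rule: the action reveals the
        # signal only if the two possible signals would imply different
        # actions; otherwise a cascade has begun and beliefs freeze.
        if (public_llr + step > 0) != (public_llr - step > 0):
            public_llr += step if a == 1 else -step
    return actions

actions = run_herding(seed=3)
assert len(set(actions[-100:])) == 1   # a herd: the tail is constant
wrong = sum(run_herding(seed=s)[-1] != 1 for s in range(200))
assert 0 < wrong < 160                 # some herds settle on the bad action
```

Across seeds, every run ends in a herd, and a nontrivial fraction of herds fix on the wrong action, exactly the pathology described above.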
A Possible Link, and a Puzzle. Robustness aside, this paper pursues a different train of thought: Is informational herding fundamentally a new phenomenon? One might have been piqued by its similarity to the familiar failure of complete learning in an optimal experimentation problem. A classic example is Rothschild's (1974) analysis of the two-armed bandit: An infinite-lived impatient monopolist optimally experiments with two possible prices each period, with purchase chances for each price being fixed, unknown draws from [0, 1]. Rothschild showed that the monopolist would (i) eventually settle down on one of the prices, and (ii) with positive probability select the less profitable price.

Aiming for more than a casual analogy between this outcome and herding, we must recast the observational learning paradigm as a single person optimization problem. This suggests considering the forgetful experimenter, who each period receives a new informative signal, takes an optimal action, and then promptly forgets his signal; the next period, he can reflect only on his action choice. Alas, this is neither a very satisfying model of rationality, nor can it be the basis for an optimal experimentation problem. How then can an experimenter not observe the private signals, and yet take informative actions?
The Second Puzzle, and a Resolution. An interesting sequel to Rothschild's work was McLennan (1984), who permitted the monopolist the flexibility to charge one of a continuum of prices, but assumed only two possible linear purchase chance 'demand curves'. When the demand curves crossed, he found that the monopolist may well settle down on the suboptimal uninformative price.

Rothschild's and McLennan's models give examples of potentially confounding actions, as introduced in EK: Easley and Kiefer (1988). In brief, these are actions that are optimal for unfocused beliefs for which they are invariants (i.e. taking the action leaves the beliefs unchanged). Of particular significance is the proof in EK (on page 1059) that with finite state and action spaces, there will generically not exist any potentially confounding actions, and thus complete learning must arise. Rothschild and McLennan might be seen as separate anticipations of this insight. Rothschild escapes EK's general result by means of a continuous state space, whereas McLennan resorts to a continuous action space. Yet there appears no escape for the herding paradigm, where both flavors of incomplete learning, limit cascades and confounded learning, generically arise with two actions and two states.²

Our resolution of these puzzles respects the quintessence of the herding paradigm, that predecessors' signals are hidden from view. In a nutshell, we imagine that individuals do not act for themselves, but rather furnish optimal history-contingent 'decision rules' for agent machines that automatically map any realized private signal into an action choice. With this device, informational herding is properly understood as a new (camouflaged) context for an old phenomenon: incomplete learning by an infinite-lived experimenter.
Herding with Forward-Looking Behavior. The herding outcome is a striking example of an informational externality: While all individuals collectively know enough to fully determine the state of the world, it is aggregated rather poorly. For in a herd, most individuals almost surely take an action which reveals almost none of their information. Late-comers ideally prefer that their predecessors had better signalled their information with more revealing actions, but early individuals clearly have no incentive to do so.

Notwithstanding the motivation of this research, some may therefore find the second half of this paper more compelling, where we investigate the herding externality with forward-looking behavior. In fact, having recast the problem as one of optimal individual experimentation, it is not implausible that learning is incomplete simply because of the myopia. For as is well known, a rational experimenter is willing to forgo some current payoff in order to secure knowledge relevant for future payoffs and decisions. We prove in fact that in the experimentation model without myopia, learning is still not complete.³

Herding is best captured by an equivalent social planner's problem, where the goal is to maximize the present discounted value of individuals' welfares. The social planner likewise will sacrifice early payoffs for the informational benefit of successors. Such an outcome can be decentralized as a constrained social optimum by means of a simple set of budget balance transfers that encourage experimentation. Yet we find that the herding externality is only imperfectly mitigated by the social planner: Incorrect herds still obtain, but with chances vanishing as the discount factor converges to one.

²For instance, payoff assignments in a one-armed bandit, where the safe arm is a potentially confounding action, are not generic in R².
³Just as in SS, when private signals are not uniformly bounded in power, learning is complete.
Overview. Section 2 outlines a general observational learning model. In section 3, we re-interpret the herding paradigm as one of optimal single-person experimentation. Section 4 characterizes the limiting beliefs of that experimenter, and then reverts to the social planner's herding problem for the characterization of the limiting actions. A conclusion gives a broader perspective on our findings. Longer but essential proofs are appendicized.
2. OBSERVATIONAL LEARNING MODEL

In this section we set up a very general observational learning model, that subsumes BHW and Banerjee (1992), and SS also. This generality facilitates the ensuing mapping into the experimentation literature, and happens to cover Lee (1993) as well.

Information. An infinite sequence of individuals n = 1, 2, ... takes actions in that exogenous order. There is uncertainty about the payoffs from these actions; the elements of the parameter space (Θ, T) are referred to as states of the world. There is a given common prior belief, the probability measure λ over Θ. Individual n receives a private random signal, σ_n ∈ Σ, about the state of the world. As demonstrated in SS (Lemma 1), WLOG we may assume that the private signal received by an individual is actually his private belief, i.e. we let σ be the measure over Θ which results from Bayesian updating given the prior λ and observation of the private signal. Signals thus belong to Σ, the space of probability measures over (Θ, T), and S is the associated sigma-algebra. Conditional on the state, the signals are assumed to be i.i.d. across individuals, distributed according to the probability measure μ^θ in state θ ∈ Θ. Each distribution may contain atoms, but to ensure that no signal will perfectly reveal the state of the world, we insist that all μ^θ be mutually absolutely continuous (a.c.), for θ ∈ Θ.⁴ To avoid trivialities, assume that not all μ^θ are identical, so that some signals are informative.
Bayesian Decision-Making. Everyone chooses from a fixed action set A, equipped with the sigma-algebra A. Action a earns a nonstochastic payoff u(a, θ) in state θ ∈ Θ, the same for all individuals, where u : A × Θ → R is measurable. Everyone is rational, i.e. seeks to maximize his expected payoff. Before deciding on an action, everyone first observes his own private signal/belief and the entire action history h. As in SS, we simply assume that an individual's Bayes-optimal decision rule uses the observed action history, and can use the common knowledge of the decision rules of all predecessors to compute the ex ante (time-0) probability distribution over action profiles in either state. Bayes' rule then implies a unique public belief π = π(h),⁵ and a final application of Bayes' rule, also given the private signal σ, yields the posterior belief p ∈ Σ. Given the posterior belief p, action a yields the expected payoff u_a(p) = ∫_Θ u(a, θ) dp(θ). The individual picks an action a ∈ A which maximizes u_a(p); we assume such an optimal action exists.⁶ This solution, taken simultaneously for all private beliefs σ, defines an optimal decision rule, an element of the space X of maps x : Σ → Δ(A), where Δ(A) is the space of probability measures over (A, A). That is, x maps each private belief σ into a distribution x(σ) over actions.

Observational Learning as a Stochastic Process. Knowing the decision rule x, the implied distribution over actions a depends on the state: conditional on the state θ, the density is ψ(a|θ, x) = ∫_Σ x(σ)(a) μ^θ(dσ), while unconditional on the state it is ψ(a|π, x) = ∫_Θ ψ(a|θ, x) π(dθ). For any history, these transition probabilities in turn yield a distribution over next period public beliefs π_{n+1}. Thus, the public belief process (π_n) is a discrete-time Markov process with state-dependent transition probabilities. The next result is standard: (π_n) is a martingale unconditional on the state, converging a.s. to π_∞. The limit π_∞ places no weight on point masses on the wrong states of the world.
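The unconditional martingale property of (π_n) can be checked numerically. The specification below is a hypothetical example, not from the paper: two states, binary private beliefs σ ∈ {3/10, 7/10}, and the myopic threshold rule; exact rational arithmetic confirms that the expected next public belief equals the current one.

```python
from fractions import Fraction as F

# Hypothetical two-state world: private beliefs sigma in {3/10, 7/10},
# with mu_H placing 7/10 on sigma = 7/10 and mu_L the mirror image.
MU = {'H': {F(7, 10): F(7, 10), F(3, 10): F(3, 10)},
      'L': {F(7, 10): F(3, 10), F(3, 10): F(7, 10)}}

def action(pi, sigma):
    """Myopic rule: take action 1 iff the posterior odds on H exceed 1."""
    return 1 if (pi / (1 - pi)) * (sigma / (1 - sigma)) > 1 else 0

def psi(a, theta, pi):
    """psi(a | theta, x): chance of action a in state theta under the rule."""
    return sum(w for s, w in MU[theta].items() if action(pi, s) == a)

def next_belief(pi, a):
    """Bayes-updated public belief after observing action a."""
    num = pi * psi(a, 'H', pi)
    den = num + (1 - pi) * psi(a, 'L', pi)
    return num / den if den else pi

pi = F(2, 5)
exp_next = sum((pi * psi(a, 'H', pi) + (1 - pi) * psi(a, 'L', pi))
               * next_belief(pi, a) for a in (0, 1))
assert exp_next == pi   # martingale: expected next belief equals pi
```

The identity holds exactly, for any π and any rule, by the law of iterated expectations; the code merely verifies one instance without rounding error.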
⁴See Rudin (1987). Measure μ¹ is a.c. w.r.t. μ² if μ²(S) = 0 ⇒ μ¹(S) = 0 for all S ∈ S, the sigma-algebra on Σ. By the Radon-Nikodym Theorem, a unique g ∈ L¹(μ²) exists with μ¹(S) = ∫_S g dμ² for every S ∈ S. With the μ^θ mutually a.c., 'almost sure' assertions are well-defined without specifying the measure.
⁵If one wishes to pursue this angle, this is the unique Bayesian equilibrium of the 'game'.
⁶Absent a unique solution, we must take an arbitrary measurable selection from the solution correspondence.
Observational Learning Model           | Impatient Experimenter Model
States θ ∈ Θ                           | States θ ∈ Θ
Belief after n observations, π_n       | Belief after n observations, π_n
Private signal of individual n, σ_n    | Randomness in the nth experiment
Optimal decision rule x ∈ X            | Optimal action x ∈ X
Action taken by individual, a ∈ A      | Observable signal a ∈ A
Density over actions ψ(a|θ, x)         | Density over observables ψ(a|θ, x)
Payoffs (private information)          | Payoffs (unobserved)

Table 1: Embedding. This table displays how our single-type observational learning model fits into the impatient single person experimentation model.
3. MODEL COMPARISON

A first step in recasting our general model of observational learning as a single person experimentation problem is to avoid a forgetful experimenter. But lest we fail to respect the individuals' selfishness, we must study an impatient experimenter with discount factor 0 (no 'active experimentation').

We must regard the observational learning story from a new perspective. Consider individual n, who uses both the public belief π_n and his private signal in forming and acting upon his posterior belief p_n. We now separate these two steps, using the conditional independence of π_n and σ_n: Mr. n can be regarded as (i) observing π_n, but not his private signal; (ii) optimally determining the decision rule x ∈ X, and submitting it to an agent machine; and (iii) letting that machine observe his private signal and take the 'choice' of his action a ∈ A for him, via the rule x described in section 2. If private beliefs σ have distribution μ^θ in state θ, the resulting action a arrives with chance ψ(a|θ, x), and the ultimate payoff u(a, θ) is unobserved. The impatient experimenter will choose the same optimal decision rule x.

We may now precisely describe the single-person experimentation model. The state space is Θ. In period n, the experimenter EX chooses an action (the rule) x ∈ X. Given this choice, a random observable statistic a ∈ A is realized with chance ψ(a|θ, x) in state θ, and so provides an additional signal of the state of the world. Finally, EX updates beliefs using this information alone.⁷ Table 1 summarizes the embedding.

Notice how this addresses both lead puzzles. First, the experimenter never knows the private beliefs σ, and thus does not forget them. Second, the pathological learning outcomes are entirely consistent with EK's generic complete learning result for models with finite action and state spaces. Simply put, when one rewrites the observational learning model as an experimentation model, signals do not map to actions but to decision rules: the true action space for EX is the infinite space X of decision rules.

SS considered two major modifications of the informational herding paradigm. One was to add i.i.d. noise to the individual decision problem. Noise is easily incorporated here by adding an exogenous chance of a noisy signal (random action). SS also allowed different types of preferences, with individuals randomly drawn from one or the other type in the population. Multiple types can be addressed here by simply imagining that EX chooses a T-vector of optimal decision rules from X^T, with the machine observing (only) the individual's type and private belief, and choosing the action a ∈ A as before.

⁷This experimentation model does not strictly fit into the EK mold, where the instantaneous reward in period n depends only on the action and the observed signal, but (unlike here) not on the parameter θ in Θ. This is the structure of Aghion, Bolton, Harris, and Jullien (1991) (ABHJ), where payoffs are not necessarily observed. If we wish, we may simply posit that the experimenter has fair insurance, and simply earns his expected payoff each period rather than his random realized payoff. Then his behaviour will be exactly the same, yet he will not learn anything from the payoff.
4. PATIENCE

4.1 Special Assumptions

While we have rather generally expressed observational learning as a type of myopic experimentation, our analytical results will require a more restricted model typical of the herding paradigm. For instance, if (Θ, T) is a continuum subset of R equipped with the Borel sigma-algebra, then the space X is rather unwieldy: the space of measurable mappings from the measures Σ over Θ to the simplex Δ(A). So just as in SS, we assume that Θ = {H, L} (or more generally is finite). High and low states are equilikely ex ante, and so have prior λ(H) = λ(L) = 1/2. Private belief σ expresses the chance of state H, and thus Σ is the interval [0, 1], and S the Borel sigma-algebra over [0, 1].

Let supp(μ) be the (common) support⁸ of each probability measure μ^θ. If supp(μ) ⊆ (0, 1), then we call the private beliefs bounded, while if co(supp(μ)) = [0, 1], they are unbounded. In that case, arbitrarily strong private signals/beliefs can occur.

Lee (1993) showed that with these restrictions, a continuous action space can easily allow for simple statistical learning: Knowing individual n's public belief, observation of his action then perfectly reveals his private signal. We thus make the standard lumpiness assumption that A = {a₁, ..., a_M} is finite. We assume that no action is dominated by the others, and this implies the simple interval structure that action a_m is myopically optimal exactly when the posterior p is in some sub-interval of [0, 1]. We order the actions such that a_m is optimal exactly for posteriors p ∈ [f_{m-1}, f_m], where 0 = f₀ < f₁ < ... < f_M = 1.

⁸As usual, the support of a probability measure on the Borel-algebra is the smallest closed set of measure 1.
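The interval structure can be made concrete with a small worked example. The three-action payoff matrix below is a hypothetical illustration, not from the paper; each threshold f_m solves u_m(f) = u_{m+1}(f) for the affine expected payoffs u_m(π) = π u(a_m, H) + (1 − π) u(a_m, L).

```python
from fractions import Fraction as F

# Hypothetical payoffs (u(a,H), u(a,L)) for three actions, ordered so
# that a1 suits state L, a3 suits state H, and a2 is a middle action.
U = {'a1': (F(0), F(1)),
     'a2': (F(3, 5), F(3, 5)),
     'a3': (F(1), F(0))}

def u_m(a, pi):
    """Affine expected payoff u_m(pi) = pi*u(a,H) + (1-pi)*u(a,L)."""
    uh, ul = U[a]
    return pi * uh + (1 - pi) * ul

def threshold(a, b):
    """Posterior f where u_a(f) = u_b(f), i.e. the indifference point."""
    (ah, al), (bh, bl) = U[a], U[b]
    return (bl - al) / ((ah - al) - (bh - bl))

f1, f2 = threshold('a1', 'a2'), threshold('a2', 'a3')
assert (f1, f2) == (F(2, 5), F(3, 5))   # 0 = f0 < f1 < f2 < f3 = 1
assert u_m('a2', F(1, 2)) == max(u_m(a, F(1, 2)) for a in U)
```

Here a₁ is myopically optimal on [0, 2/5], a₂ on [2/5, 3/5], and a₃ on [3/5, 1]; were a₂'s payoff lowered to 1/2 in both states, the middle interval would collapse and a₂ would be (weakly) dominated.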
4.2 The Patient Experimenter

Since our experimentation model admits any discount factor δ ∈ [0, 1), one naturally wonders what happens with a patient experimenter (when δ > 0). SS shows that the myopic experimenter maps higher signals into higher actions: There are thresholds 0 = x₀ ≤ x₁ ≤ ... ≤ x_M = 1, depending on π alone, such that action a_m is strictly optimal when the private belief σ ∈ (x_{m-1}, x_m), and indifference between a_m and a_{m+1} prevails at the knife-edge σ = x_m. This is also true with patience, as (appendicized) Lemma A.1 proves. Intuitively, not only does the interval structure maximize immediate payoffs, but it also ensures the greatest future value of information, by producing the riskiest posterior belief distribution.

We now turn to EX's long run behavior.

Proposition 1 (Absorbing Basins)
• For each a_m ∈ A, a possibly empty interval J_m(δ) ⊆ [0, 1] exists such that when π ∈ J_m(δ), EX optimally chooses an x which induces a_m a.s. and precludes further learning. For all δ ∈ [0, 1), J₁(δ) = [0, π̲(δ)] and J_M(δ) = [π̄(δ), 1], where 0 < π̲(δ) ≤ π̄(δ) < 1. The larger is δ, the smaller are all basins.
• The limit belief π_∞ is a.s. concentrated on the basins J₁(δ) ∪ ... ∪ J_M(δ).
• With unbounded private beliefs, J₁(δ) = {0} and J_M(δ) = {1}; all other basins are empty.
• If the private beliefs are bounded, then for large enough δ, all basins disappear except for J₁ and J_M, while lim_{δ→1} J₁(δ) = {0} and lim_{δ→1} J_M(δ) = {1}.

This characterization of the stationary points of the stochastic process of beliefs (π_n) directly generalizes the analysis for δ = 0 in SS. See figure 1 for an illustration of how the basins are determined from the shape of the optimal value function. All but the limit belief result are established in the appendix.
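The basins can also be seen numerically. Below is a discretized value-iteration sketch of the experimenter's Bellman operator under simplifying assumptions that are ours, not the paper's: two states, two actions, and binary bounded private beliefs, so that only four deterministic decision rules exist. Near π = 0 and π = 1 the fixed point coincides with the affine myopic payoff (the basins), while at interior beliefs the patient value strictly exceeds it.

```python
# Hypothetical primitives: states {H, L}, two actions, binary bounded
# private beliefs sigma in {0.3, 0.7}; payoffs match action to state.
P_SIG = {'H': {0.7: 0.7, 0.3: 0.3}, 'L': {0.7: 0.3, 0.3: 0.7}}
PAYOFF = {('a1', 'H'): 0.0, ('a1', 'L'): 1.0,
          ('a2', 'H'): 1.0, ('a2', 'L'): 0.0}
ACTIONS = ('a1', 'a2')
# Four deterministic decision rules x: Sigma -> A.
RULES = [{0.3: r, 0.7: s} for r in ACTIONS for s in ACTIONS]

def u_m(a, p):
    return p * PAYOFF[(a, 'H')] + (1 - p) * PAYOFF[(a, 'L')]

def psi(a, theta, rule):
    return sum(w for s, w in P_SIG[theta].items() if rule[s] == a)

def bayes(p, a, rule):
    num = p * psi(a, 'H', rule)
    den = num + (1 - p) * psi(a, 'L', rule)
    return num / den if den > 0 else p

def value_function(delta, grid_n=101, iters=300):
    """Iterate the Bellman operator T_delta on a belief grid."""
    grid = [i / (grid_n - 1) for i in range(grid_n)]
    v = [max(u_m(a, p) for a in ACTIONS) for p in grid]  # start at v0
    def interp(p):  # linear interpolation of the current iterate
        x = p * (grid_n - 1)
        i = min(int(x), grid_n - 2)
        return v[i] + (x - i) * (v[i + 1] - v[i])
    for _ in range(iters):
        v = [max(sum((p * psi(a, 'H', r) + (1 - p) * psi(a, 'L', r))
                     * ((1 - delta) * u_m(a, bayes(p, a, r))
                        + delta * interp(bayes(p, a, r)))
                     for a in ACTIONS)
                 for r in RULES)
             for p in grid]
    return grid, v

grid, v = value_function(delta=0.9)
assert abs(v[0] - 1.0) < 1e-6 and abs(v[-1] - 1.0) < 1e-6  # affine at extremes
assert v[50] > max(u_m(a, 0.5) for a in ACTIONS) + 1e-3    # value of information
```

At the extremes the uninformative constant rule is optimal and learning stops, mirroring the absorbing basins; in this bounded-belief example the basins shrink, but do not vanish, as δ rises.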
Thus a limit cascade, as SS call it, must occur: the limit belief π_∞ a.s. lies in some basin. To see why, observe that for any belief π not in any basin, at least two signals are realized with positive probability, the highest such signal being more likely in state H, and the lowest more likely in state L. By the interval structure, it follows that next period's belief differs from π with positive probability. Intuitively, or rigorously by the characterization result for Markov-martingale processes in appendix B of SS, the belief process cannot settle down on such a π.

The next result tells us that beliefs must settle down, and that the limit belief is never dead wrong about the state.

Proposition 2 (Convergence of Beliefs, or Long Run Learning)
• For unbounded private beliefs, π_∞ is concentrated on the truth for any δ ∈ [0, 1).
• With bounded private beliefs, learning is incomplete for any δ ∈ [0, 1): Unless π₀ ∈ J_M(δ), there is positive probability in state H that π_∞ is not in J_M(δ). The chance of incomplete learning with bounded private beliefs vanishes as δ → 1.
Figure 1: Typical value function. Stylized graph of v(π, δ), δ > 0, with three actions. [The figure marks the payoff lines u(a₁, H), u(a₁, L), u(a₂, L), u(a₃, L) and the basins J₁(δ), J₂(δ), J₃(δ).]
Proof: Given the absorbing basin characterization of Proposition 1, the unbounded private beliefs result follows from Lemma 1 of SS (since π_∞ places all weight in J₁(δ) = {0} and J_M(δ) = {1}, and π_∞ places no weight on the point belief 0). The bounded beliefs incomplete learning conclusion follows just as in Theorem 1 of SS. We now extend that proof to establish the limiting result for δ → 1. First, Proposition 1 assures us that for δ close enough to 1, π_∞ places all weight in J₁(δ) and J_M(δ). The likelihood ratio ℓ_n = (1 − π_n)/π_n is a martingale conditional on state H. Because all private beliefs σ have likelihood ratio (1 − σ)/σ bounded above by some ℓ̄ < ∞, the sequence (ℓ_n) is bounded above by ℓ̄(1 − π̲(δ))/π̲(δ), and the mean of ℓ_∞ must equal the prior mean (1 − π₀)/π₀. Since lim_{δ→1} π̲(δ) = 0, the weight that π_∞ places on J₁(δ) in state H must vanish as δ → 1. □
Observe how incomplete learning to some extent plagues even an extremely patient EX. So this problem does not fall under the rubric of EK's theorem that learning is complete where the optimal value function v is strictly convex in beliefs. For here, for δ close enough to 1, v(π) = u_{a₁}(π) for π near 0, and v(π) = u_{a_M}(π) for π near 1, both affine functions: EX optimally behaves myopically for very extreme beliefs. This points to the source of the incomplete learning: lumpy signals rather than impatience.

4.3 Forward-Looking Informational Herding
Let us return to the view of this as the informational herding paradigm. Suppose first that everyone is altruistic, but subject to the standard herding informational constraints. For the realized payoff sequence (u_k), suppose that every individual n = 1, 2, ... seeks to maximize E[(1 − δ) Σ_{k=n}^∞ δ^{k-n} u_k | π], where we define E[f|π] = Σ_{θ∈Θ} π(θ)f(θ). Then the solution to EX's problem yields a perfect Bayesian equilibrium of this model.

The Planner's Problem. Now let us add a shade of economic realism, and posit selfish behavior. Reinterpret the patient experimenter's problem as that of an informationally constrained social planner SP trying to maximize (1 − δ) Σ_{n=1}^∞ δ^{n-1} E[u_n | π], the expected discounted average welfare of the individuals in the herding model. Observe how this perfectly aligns the SP's and EX's objectives. To respect the key herding informational assumption that actions are observed and signals unobserved, assume further that the SP neither knows the state nor can observe the individuals' private signals, but can both observe and punish/reward any actions taken.
We are now positioned to reformulate the learning results of the last section at the level of actions. The Overturning Principle of SS generalizes to this case: Lemma A.2 establishes that for π near J_m(δ), actions other than a_m will push the updated public belief far from its current value. This at once yields the following corollary to Proposition 2.

Proposition 3 (Convergence of Actions, or Herds)
• With bounded private beliefs, a herd on some action eventually starts. Unless π₀ ∈ J_M(δ), a herd arises on an action other than a_M with positive chance in state H, for any δ ∈ [0, 1). The chance of an incorrect herd with bounded private beliefs vanishes as δ → 1.
• For unbounded private beliefs, a herd eventually starts on the correct action.

It is no surprise that SP ends up with incomplete learning: herding is truly a robust property of the observational learning paradigm. More interesting is how SP optimally incurs the risk of an ever-lasting incorrect herd.

Optimal Transfers. How does SP steer the choices away from the myopic solution to EX's problem? Let SP tax or subsidize the actions according to the following simple scheme. Given the current public belief π, if the individual takes action a he receives the (possibly negative) transfer τ(a|π). Faced with such incentives, our proof in Lemma A.1 that individuals optimally choose private belief threshold rules is still valid. Given public belief π, a private belief σ is mapped into the posterior p(π, σ) = πσ/[πσ + (1 − π)(1 − σ)]. The selfish herder's threshold x_m must solve the indifference equation

u_{a_m}(p(π, x_m)) + τ(a_m|π) = u_{a_{m+1}}(p(π, x_m)) + τ(a_{m+1}|π).

So the difference τ(a_{m+1}|π) − τ(a_m|π) alone matters for how selfish individuals trade off between the two actions, and the SP can ensure that the threshold is x_m = x*_m, as in his optimal rule x*, by suitably adjusting this difference. As incentives are unchanged if a constant is added to all M transfers, we can also insist that SP achieve expected budget balance each period: the expected contribution Σ_{m=1}^M ψ(a_m|π, x)τ(a_m|π) must equal zero. This uniquely determines the transfers.
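For two actions, the construction above can be solved in closed form: the difference τ(a₂|π) − τ(a₁|π) enforces indifference at the planner's threshold x*, and expected budget balance fixes the level. The payoffs, threshold, and action chances in this worked example are invented for illustration.

```python
from fractions import Fraction as F

# Hypothetical two-action payoffs (u(a,H), u(a,L)).
PAYOFF = {('a1', 'H'): F(0), ('a1', 'L'): F(1),
          ('a2', 'H'): F(1), ('a2', 'L'): F(0)}

def posterior(pi, sigma):
    """p(pi, sigma) = pi*sigma / [pi*sigma + (1-pi)(1-sigma)]."""
    return pi * sigma / (pi * sigma + (1 - pi) * (1 - sigma))

def u(a, p):
    return p * PAYOFF[(a, 'H')] + (1 - p) * PAYOFF[(a, 'L')]

def transfers(pi, x_star, psi1, psi2):
    """Solve the indifference equation u(a1,p) + tau1 = u(a2,p) + tau2
    at p = p(pi, x*), together with expected budget balance
    psi1*tau1 + psi2*tau2 = 0, for the transfers (tau1, tau2)."""
    p = posterior(pi, x_star)
    diff = u('a2', p) - u('a1', p)       # tau1 - tau2 must equal diff
    tau2 = -psi1 * diff / (psi1 + psi2)
    tau1 = diff + tau2
    return tau1, tau2

pi, x_star = F(1, 2), F(3, 5)
psi1, psi2 = F(2, 5), F(3, 5)            # assumed action chances at pi
tau1, tau2 = transfers(pi, x_star, psi1, psi2)
p = posterior(pi, x_star)
assert u('a1', p) + tau1 == u('a2', p) + tau2   # indifference at x*
assert psi1 * tau1 + psi2 * tau2 == 0           # expected budget balance
```

With more than two actions the same logic applies threshold by threshold: M − 1 indifference equations pin down the M − 1 transfer differences, and budget balance pins down the remaining constant.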
We would dearly like to provide a crisp characterization of these transfers, but after much work, we believe that this is impossible. Clearly, SP will not tax or subsidize actions when π ∈ J_i(δ), since his desired action will be chosen anyway, i.e. transfers are zero there. Conversely, when π ∉ J_i(δ), some transfers intuitively will differ from zero, because myopic and patient behavior do not coincide. Rigorously, as J_i(δ) ⊆ J_i(0) with strict inclusion for high enough discount factors by Claim A.7, transfers are not identically zero for a sufficiently patient SP. Beyond that, we can only say that 'experimentation' (making non-myopic choices) is rewarded, as it provides information to successors; but exactly how these 'high-information' choices are rewarded is rather hard to tell.
5. CONCLUSION

This paper has discovered and explored the fact that informational herding is simply incomplete learning by a single experimenter, suitably concealed. Yet herding has clearly achieved a measure of 'market penetration' that has eluded the experimentation literature. Aside from its simple economics, we believe this owes to its focus on the zero discounting case, thus avoiding the heavy dynamic optimization machinery.

Our mapping, recasting everything in rule space, has led us to an equivalent social planner's problem, an oft-performed exercise for economic models. The revelation principle in mechanism design also employs a 'rule machine'. But in multi-period, multi-person models with uncertainty, the planner must respect the agents' belief filtrations. Our rule machine approach succeeds by exploiting the martingale property of public beliefs. This property extends beyond action observation models to those where only an imperfect signal of one's posterior belief is observed, again provided the entire history of such signals is observed. But once we relax the assumption that the history is perfectly observed, as in Smith and Sørensen (1996b), the public belief process (howsoever defined) ceases to be a martingale, a fact that we are exploring in a work in progress. As a result, a mapping of these models into the experimentation literature no longer appears possible.

Finally, we and others have correctly attributed bad herds to the lumpy transmission of social information (a finite action space). As in EK, incomplete learning requires an optimal action x for which unfocused beliefs are invariant, i.e. the distribution ψ(a|θ, x) of signals a is the same for all states. Such an invariance is clearly easier to satisfy with fewer available signals, and not surprisingly all published experimentation pathologies that we have seen assume a finite (vs. continuous) signal space. For instance, Rothschild (1974), McLennan (1984), and ABHJ's example are all at the very least binary signal models.
APPENDIX
A.
A.l Preliminary
A
n
strategy s n for period
must be used, given
7r n
map from E
a
to
X.
The experimenter chooses a
.
X
prescribes the rule x n €
It
strategy profile s
determines the stochastic evolution of the model
in turn
which
belief
is
—
i.e.
=
which
{s u s 2
.
.
.),
,
a distribution over
the sequences of realized actions, signals, payoffs, and beliefs.
Our
analysis
is
inspired by ABHJ and sections 9.1-2 of Stokey and Lucas (1989). The value function v(·, δ) for the experimentation problem with discount factor δ is

v(π, δ) = sup_s E[(1 − δ) Σ_{t=1}^∞ δ^{t−1} u_t | π],

where the expectation uses the distribution of processes implied by the strategy s. Recall that u_m(π) = π u(a_m, H) + (1 − π) u(a_m, L) denotes the expected payoff from a_m at belief π. Since u_m is affine, we have the Bellman operator T_δ:

T_δ v(π) = sup_{x ∈ X} Σ_{a_m ∈ A} ψ(a_m | π, x) [(1 − δ) u_m(q(π, x, a_m)) + δ v(q(π, x, a_m))]     (A-1)

where q(π, x, a) is the Bayes-updated belief from π when a is observed and rule x is applied. Note that for v ≥ v′, we have T_δ v ≥ T_δ v′. As is standard, T_δ is a contraction, and v(·, δ) is its unique fixed point in the space of bounded, continuous, weakly convex functions.

Lemma A.1 (Interval Structure) For any π, one optimal rule x ∈ X is described by thresholds 0 = x_0 ≤ x_1 ≤ ... ≤ x_M = 1 such that action a_m is taken when a ∈ (x_{m−1}, x_m), and when a = x_m there is randomization between a_m and a_{m+1}.

Proof: We prove that any rule x violating this interval structure can be strictly improved upon. For m_1 < m_2, let E_i (i = 1, 2) be those signals which are mapped with positive probability into a_{m_i}. Assume to the contrary that the sets are not ordered as E_1 ≤ E_2, and first suppose q(π, x, a_{m_1}) < q(π, x, a_{m_2}). For any x̄ ∈ (0, 1), define Σ_1(x̄) = (E_1 ∪ E_2) ∩ [0, x̄] and Σ_2(x̄) = (E_1 ∪ E_2) ∩ [x̄, 1]. Consider then the modified rule x̃ which equals x, except that a_{m_1} is taken for signals in Σ_1(x̄) and a_{m_2} is taken for signals in Σ_2(x̄), where x̄ satisfies ψ(a_{m_1} | π, x̃) = ψ(a_{m_1} | π, x) (it may be necessary for x̃ to randomize over the two actions at signal x̄ to accomplish that). Since signals more in favor of state L are mapped into a_{m_1} under x̃, we find q(π, x̃, a_{m_1}) ≤ q(π, x, a_{m_1}), and similarly q(π, x̃, a_{m_2}) ≥ q(π, x, a_{m_2}). Thus x̃ implies a mean preserving spread of the updated belief versus x, and since the value function is weakly convex, this weakly improves the value in the Bellman equation T_δ(v) = v. Next, assume that q(π, x, a_{m_1}) ≥ q(π, x, a_{m_2}). Then given our ordering of actions, payoffs are improved (with no loss of information) by remapping signals leading to a_{m_1} into a_{m_2}, and vice versa. □
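To fix ideas, the update q(π, x, a_m) in (A-1) takes the familiar likelihood-ratio form. The numbers in this small worked instance are illustrative assumptions, not values from the paper, and ψ(a_m | H, x), ψ(a_m | L, x) denote the state-conditional chances of a_m under rule x:

```latex
% Bayes update after observing action a_m under rule x.
% Assumed numbers: prior \pi = 1/2, and rule x induces a_m with
% chance 0.7 in state H and chance 0.3 in state L.
\[
  q(\pi, x, a_m)
  = \frac{\pi\,\psi(a_m \mid H, x)}
         {\pi\,\psi(a_m \mid H, x) + (1-\pi)\,\psi(a_m \mid L, x)}
  = \frac{0.5 \times 0.7}{0.5 \times 0.7 + 0.5 \times 0.3}
  = 0.7 .
\]
```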
A.2 Proof of Proposition 1

The proposition is established in a series of claims.

Claim A.1 (Basins) For any δ ∈ [0, 1) and each a_m ∈ A, a possibly empty interval J_m(δ) exists, such that when π ∈ J_m(δ), a_m is the optimal choice and the value is v(π, δ) = u_m(π); conversely, if v(π, δ) = u_m(π), then π ∈ J_m(δ) and a_m is the optimal choice.

Proof: For the first half, we need only prove that J_m(δ), the set of beliefs where one optimally chooses x with supp(μ) mapped into a_m (so that a_m occurs a.s., the belief updates to π_{n+1} = π_n a.s., and learning stops with value u_m(π)), must be an interval. Since u_m is an affine function of π and v(·, δ) is weakly convex, J_m(δ) must be an interval. For the second half, a_1 is myopically strictly optimal for the focused belief π = 0, and since π_n = 0 never updates, no matter which rule is applied, it is also dynamically optimal for any discount factor δ ∈ [0, 1). A similar argument holds when π = 1. □

Claim A.2 (Iterates) Define the expected utility frontier function v_0(π) = max_m u_m(π), that is, the expected utility an individual would obtain by choosing the myopically optimal action.9 The sequence {T_δ^n v_0} consists of weakly convex functions that are pointwise increasing, and converge to v(·, δ). The value v(·, δ) weakly exceeds v_0.

Proof: To maximize Σ_{a_m ∈ A} ψ(a_m | π, x) [(1 − δ) u_m(q(π, x, a_m)) + δ v_0(q(π, x, a_m))] for given π, one possible policy x ensures that the myopically optimal action a_m occurs with probability one. Then q(π, x, a_m) = π a.s., resulting in value v_0(π). Optimizing over x ∈ X, we see that T_δ v_0(π) ≥ v_0(π) for all π. By induction, T_δ^n v_0 ≥ T_δ^{n−1} v_0, a pointwise increasing sequence converging (as per usual) to the fixed point v(·, δ) ≥ v_0. □

Claim A.3 (Monotonicity) The value function weakly rises when δ increases: Namely, if δ_1 > δ_2, then v(π, δ_1) ≥ v(π, δ_2) for all π.10

Proof: Clearly, Σ_{a_m ∈ A} ψ(a_m | π, x) u_m(q(π, x, a_m)) ≤ Σ_{a_m ∈ A} ψ(a_m | π, x) v(q(π, x, a_m)) for any x and any function v ≥ v_0, so that T_{δ_1} v ≥ T_{δ_2} v, since there is more weight on the larger component of (A-1). Inductively, we have T_{δ_1}^n v_0 ≥ T_{δ_2}^n v_0, since one possible policy under δ_1 is to choose the x optimal under δ_2. Let n → ∞ and apply Claim A.2. □

9 Observe how this differs from v(π, 0) = sup_x Σ_m ψ(a_m | π, x) u_m(q(π, x, a_m)). In other words, v(π, 0) allows the myopic individual to observe one signal a before obtaining the ex post value v_0(p(π, a)).

10 This is, or ought to be, a folk result for optimal experimentation, but we have not found a published or cited proof of it. ABHJ do assert without proof (p. 625) that the patient value function exceeds the myopic one.
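The fixed-point and monotonicity properties in Claims A.2 and A.3 can be checked numerically in a toy two-action instance of the Bellman operator (A-1). Everything below is an assumed illustration, not the paper's model: payoffs u(a_1, ·) = (0, 1) and u(a_2, ·) = (1, 0), a binary signal of precision 0.7, and just two candidate rules (pool on the myopic action, or let the action reveal the signal).

```python
import numpy as np

# Toy two-action instance of (A-1); all numbers are assumed for illustration.
# States H, L with P(H) = pi; the myopic frontier is v0(pi) = max(pi, 1 - pi).
P_H, P_L = 0.7, 0.3              # chance of the high signal in states H, L
GRID = np.linspace(0.0, 1.0, 201)

def v0(pi):
    """Expected utility frontier: value of the myopically optimal action."""
    return np.maximum(pi, 1.0 - pi)

def bayes(pi, lh, ll):
    """Posterior on H after an event with chances lh (in H) and ll (in L)."""
    denom = pi * lh + (1.0 - pi) * ll
    return np.where(denom > 0, pi * lh / denom, pi)

def bellman(v, delta):
    """One application of T_delta, comparing two rules: 'pool' (always take
    the myopic action, so the observation is uninformative and q = pi) and
    'separate' (the action reveals the signal, spreading the posterior)."""
    interp = lambda q: np.interp(q, GRID, v)
    pool = (1 - delta) * v0(GRID) + delta * interp(GRID)
    psi_h = GRID * P_H + (1 - GRID) * P_L        # chance of the high signal
    q_h = bayes(GRID, P_H, P_L)                  # posterior after 'h' -> a2
    q_l = bayes(GRID, 1 - P_H, 1 - P_L)          # posterior after 'l' -> a1
    sep = psi_h * ((1 - delta) * q_h + delta * interp(q_h)) \
        + (1 - psi_h) * ((1 - delta) * (1 - q_l) + delta * interp(q_l))
    return np.maximum(pool, sep)

def value(delta, n_iter=300):
    """Iterate T_delta from v0: Claim A.2's pointwise increasing sequence."""
    v = v0(GRID)
    for _ in range(n_iter):
        v = bellman(v, delta)
    return v
```

In this sketch the iterates rise pointwise from v0, so value(0.9) dominates both v0 (Claim A.2) and value(0.5) (Claim A.3) on the whole grid, while the value at the focused beliefs π = 0, 1 stays at the frontier.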
Claim A.4 (Inclusion) All basins weakly shrink when δ increases: J_m(δ_1) ⊆ J_m(δ_2) for all a_m ∈ A, when δ_1 > δ_2.

Proof: For π ∈ J_m(δ_1), we know v(π, δ_1) = u_m(π). By Claims A.3 and A.2, v(π, δ_1) ≥ v(π, δ_2) ≥ v_0(π) ≥ u_m(π), and thus v(π, δ_2) = u_m(π), so that π ∈ J_m(δ_2). □

Claim A.5 (Unbounded Beliefs) With unbounded private beliefs, all basins other than those for the extreme actions are empty, with J_1(δ) ⊆ {0} and J_M(δ) ⊆ {1}.

Proof: SS establish for the myopic model that the only basins are J_1(0) = {0} and J_M(0) = {1}, while all other J_m(0) are empty. Now apply Claims A.1 and A.4. □

Claim A.6 (Bounded Beliefs) If the private beliefs are bounded, then J_1(δ) ⊇ [0, π(δ)] and J_M(δ) ⊇ [π̄(δ), 1], where 0 < π(δ) and π̄(δ) < 1.

Proof: We prove that for sufficiently low beliefs π it is optimal to choose a rule x such that a_1 occurs with probability one; the argument for large beliefs is very similar. Fix δ ∈ [0, 1). Since a_1 is myopically strictly optimal near the focused belief π = 0, there exists ū > 0 such that u_1(π) > u_m(π) + ū for all m ≠ 1 and all π in some interval [0, π̂]. Since there are informative private beliefs, ∃x* ∈ (1/2, 1) with μ^H([x*, 1]) ≥ μ^L([x*, 1]) > 0, and updating π with observation of the event {a ∈ [x*, 1]} yields the posterior belief q(π) = π μ^H([x*, 1]) / [π μ^H([x*, 1]) + (1 − π) μ^L([x*, 1])]. With bounded private beliefs, no observation a ∈ supp(μ) can ever yield a stronger signal than this event, and since μ^L([x*, 1]) > 0, q(π) is arbitrarily small for π small enough. By the continuity of v corresponding to (A-1), the future value gain v(q(π), δ) − u_1(q(π)) is then, in particular, less than ū(1 − δ)/δ. Risking any outcome other than a_1 thus sacrifices at least ū(1 − δ) in current expected payoff per unit of probability, which exceeds the discounted future gain, and so it is suboptimal to risk any outcome other than a_1 for such small beliefs. □

Claim A.7 (Limiting Patience) As δ → 1, all basins disappear except for arbitrarily small intervals at the extremes: for any closed I ⊆ (0, 1), J_m(δ) ∩ I is empty for large enough δ, while lim_{δ→1} J_1(δ) = {0} and lim_{δ→1} J_M(δ) = {1}.

Proof: For any closed subinterval I = [π_1, π_2] ⊆ (0, 1), there exists ε = ε(I) > 0 such that updating on an informative observation maps [π_2 − ε/2, π_2] into (but not necessarily onto) [π_2 + ε/2, 1]. Since such an experiment is not weakly dominated, there must be some positive ū with v(π, δ) > u_m(π) + ū for all π ∈ [π_2 + ε/2, 1] outside J_m(δ). For δ′ large enough that the current-payoff cost of one experiment, at most a multiple of (1 − δ′), is less than δ′ū, the Bellman equation for v corresponding to (A-1) reveals that our suggested policy beats inducing a_m a.s. on [π_2 − ε/2, π_2]; by Claim A.3, v(π, δ″) ≥ v(π, δ′) > u_m(π) for all δ″ > δ′, so J_m(δ″) ∩ [π_2 − ε/2, π_2] remains empty, excising length ε/2 from interval I. Iterating this procedure a finite number of times, each time excising length ε/2 from I, shows that J_m(δ) ∩ I evaporates for large enough δ. This procedure can be applied repeatedly, to show that J_m(δ) ∩ I is empty for any closed I ⊆ (0, 1) and all m; by Claims A.5 and A.6, the surviving basins J_1(δ) and J_M(δ) are then arbitrarily small intervals around 0 and 1. □
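A worked instance of the update used in Claim A.6 may help; the tail weights μ^H([x*, 1]) = 0.4 and μ^L([x*, 1]) = 0.1 are assumed purely for illustration:

```latex
% Posterior after the most informative event {a in [x*,1]};
% the tail weights 0.4 and 0.1 are assumed for this example.
\[
  q(\pi)
  = \frac{\pi\,\mu^H([x^*,1])}{\pi\,\mu^H([x^*,1]) + (1-\pi)\,\mu^L([x^*,1])},
  \qquad
  q(0.01)
  = \frac{0.01 \times 0.4}{0.01 \times 0.4 + 0.99 \times 0.1}
  \approx 0.039 .
\]
```

Because the likelihood ratio is bounded (here by 4), q(π) vanishes as π → 0, which is exactly the leverage behind Claim A.6.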
A.3 Proof of Proposition 3

In light of the interval structure found in Lemma A.1, ST's problem is simply to determine the chances ψ(a_m | π) with which each action should be chosen (or equivalently, what fraction of the signal space maps into each action). Thus, the choice set is the compact M-simplex Δ(A) (that is, the same strategy space as in the original observational learning model). The objective function in the Bellman equation corresponding to (A-1) is continuous in this choice vector and in π, and so it follows from the Theorem of the Maximum (e.g. Theorem I.B.3 of Hildenbrand (1974)) that the non-empty correspondence of optimal rules is u.h.c. in π. The optimal rule is a point-valued function except when there is an atom on the interval boundary.

Lemma A.2 (Overturning Principle) For all δ ∈ [0, 1) and a_m ∈ A with J_m(δ) ≠ ∅, there exist some ε > 0 and an open interval K ⊇ J_m(δ) such that for all π ∈ K, the optimal rule x satisfies ||q(π, x, a) − π|| ≥ ε upon observing any action a other than a_m.

Proof: For π ∈ J_m(δ), we know that the only optimal rule is to a.s. induce a_m, and so it is immediate that the public belief must jump upon observing any action other than a_m. With unbounded beliefs, the optimal rule induces actions other than a_m for only the most extreme portion of supp(μ), and so the jump is bounded away from 0. With bounded beliefs, an extra step is needed. Assume WLOG that π is near 0, with a_m = a_1. As the updated belief following a_1 is close to the current one, v(q(π, x, a_1), δ) − v(π, δ) is near zero. Choosing an action other than a_1 implies a boundedly positive immediate loss, which must be compensated by gained future value (again, by the Bellman equation). Since v is continuous, it follows that q(π, x, a) must be boundedly higher than π upon observing any action a other than a_1. □
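The jump in Lemma A.2 can be quantified in odds form. Here x̄ is a hypothetical threshold (not from the paper) such that the optimal rule sends only private beliefs in [x̄, 1] to actions other than a_1, and we use the standard private-belief identity dμ^H/dμ^L(p) = p/(1 − p):

```latex
% Observing an action taken only on [\bar x, 1] multiplies the public
% odds by the tail likelihood ratio, at least \bar x/(1-\bar x) > 1:
\[
  \frac{q}{1-q}
  = \frac{\pi}{1-\pi}\cdot
    \frac{\mu^H([\bar x, 1])}{\mu^L([\bar x, 1])}
  \;\ge\;
  \frac{\pi}{1-\pi}\cdot\frac{\bar x}{1-\bar x} .
\]
```

Since x̄ > 1/2, the posterior odds rise by a factor bounded above one, so the updated belief is bounded away from π, as the Overturning Principle requires.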
References

Aghion, P., P. Bolton, C. Harris, and B. Jullien (1991): "Optimal Learning by Experimentation," Review of Economic Studies, 58(196), 621-654.

Banerjee, A. V. (1992): "A Simple Model of Herd Behavior," Quarterly Journal of Economics, 107, 797-817.

Bikhchandani, S., D. Hirshleifer, and I. Welch (1992): "A Theory of Fads, Fashion, Custom, and Cultural Change as Informational Cascades," Journal of Political Economy, 100, 992-1026.

Easley, D., and N. Kiefer (1988): "Controlling a Stochastic Process with Unknown Parameters," Econometrica, 56, 1045-1064.

Hildenbrand, W. (1974): Core and Equilibria of a Large Economy. Princeton University Press, Princeton.

Lee, I. H. (1993): "On the Convergence of Informational Cascades," Journal of Economic Theory, 61, 395-411.

McLennan, A. (1984): "Price Dispersion and Incomplete Learning in the Long Run," Journal of Economic Dynamics and Control, 7, 331-347.

Rothschild, M. (1974): "A Two-Armed Bandit Theory of Market Pricing," Journal of Economic Theory, 9, 185-202.

Rudin, W. (1987): Real and Complex Analysis. McGraw-Hill, New York, 3rd edn.

Smith, L., and P. Sørensen (1996a): "Pathological Outcomes of Observational Learning," MIT Working Paper.

Smith, L., and P. Sørensen (1996b): "Rational Social Learning Through Random Sampling," MIT mimeo.

Stokey, N., and R. E. Lucas (1989): Recursive Methods in Economic Dynamics. Harvard University Press, Cambridge, Mass.