working paper
department of economics

INFORMATIONAL HERDING
AND
OPTIMAL EXPERIMENTATION

Lones Smith
Peter Sørensen

No. 97-22
October, 1997

massachusetts institute of technology
50 memorial drive
Cambridge, mass. 02139
Informational Herding and Optimal Experimentation*

Lones Smith†
Economics Dept., M.I.T.

Peter Sørensen‡
Nuffield College, Oxford

October 30, 1997
Abstract

We explore the constrained efficient observational learning model, as when individuals care about successors, or are so induced by an informationally-constrained social planner. We find that when the herding externality is correctly internalized in this fashion, incorrect herds still obtain. To describe behaviour in this model, we exhibit a set of indices that capture the privately estimated social value of every action. The optimal decision rule is simply: Choose the action with the highest index. While they have the flavour of Gittins indices, they also incorporate the potential to signal to successors. We then apply these indices to establish a key comparative static, that the set of stationary 'cascade' beliefs strictly shrinks as the planner grows more patient. We also show how these indices yield a set of history-dependent transfer payments that decentralize the constrained social optimum. The lead inspiration for the paper is our proof that informational herding is but a special case of myopic single person experimentation. In other words, the incorrect herding outcome is not a new phenomenon, but rather just the familiar failure of complete learning in an optimal experimentation problem.
*A briefer paper without the index rules and associated results appeared under the title "Informational Herding as Experimentation Deja Vu." Still earlier, the proposed mapping appeared in July 1995 as a later section in our companion paper, "Pathological Outcomes of Observational Learning." We thank Abhijit Banerjee, Meg Meyer, Christopher Wallace, and seminar participants at the MIT theory lunch, the Stockholm School of Economics, the Stony Brook Summer Festival on Game Theory (1996), Copenhagen University, and the 1997 European Winter Meeting of the Econometric Society (Lisbon) for comments on various versions. Smith gratefully acknowledges financial support for this work from NSF grants SBR-9422988 and SBR-9711885, and Sørensen equally thanks the Danish Social Sciences Research Council.
†e-mail address: lones@lones.mit.edu
‡e-mail address: peter.sorensen@nuf.ox.ac.uk
1. INTRODUCTION

The last few years have seen a flood of research on informational herding. We ourselves have actively participated in this research herd (Smith and Sørensen (1997a), or SS) that was sparked independently by Banerjee (1992) and Bikhchandani, Hirshleifer, and Welch (1992), BHW. The context is seductively simple: An infinite sequence of individuals must decide on an action choice from a finite menu. Everyone has identical preferences and menus, and each may condition his decision both on his endowed private signal about the state of the world, and on all predecessors' decisions (but cannot see their private signals). In this context, BHW and Banerjee showed that a 'herd' eventually arises: namely, after some point, all decision-makers (DMs) make the identical choice. SS showed that beliefs converge upon a cascade,¹ where eventually only one action is taken with probability one. Clarifying their next result, SS also showed that this herd is possibly unwise, i.e. ex post incorrect with positive probability iff the DMs' private signals are uniformly bounded in strength. This simple pathological outcome has understandably attracted much fanfare.

The main thrust of this paper is an analysis of the herding externality with forward-looking behavior. Contrary to the popular impression of incorrect herds as a market failure, we show that herding is constrained-efficient: Even when DMs internalize the herding externality by placing very low weight on their private gain, incorrect herds obtain whenever private signals are boundedly powerful; however, they occur with a vanishing chance as the discount factor tends to one. The DMs' lack of concern for successors affects just the extent of incomplete learning, and not its existence.

Social information is poorly aggregated by action observation,² as individuals choose actions that reveal almost none of their information. But suppose that early DMs either wished to help latecomers, or were so induced, by taking more revealing actions that better signalled their private information. Exactly what instructions should a planner give to the DMs to maximize social welfare?

We provide a compact description of optimal behaviour in this constrained efficient herding model. We produce a set of indices that capture the privately estimated social value of every action. The optimal decision rule is simply to choose the action with the highest index. While they have the flavour of Gittins' (1979) multi-armed bandit indices, they also must incorporate the potential to signal to successors. So unlike Gittins' perfect information context, an action's index is not simply its present social value. Rather, individuals' signals are hidden from view, and therefore social rewards must be translated into private incentives using the marginal social value.

¹As convergence may take infinite time (see SS), we have a limit cascade, as opposed to BHW's cascade.
²Dow (1991) and Meyer (1991) have also studied the nature of such a coarse information process for different contexts: search theory and organizational design.
We apply these indices to establish a key comparative static, that the cascade belief set strictly shrinks when DMs grow more patient.

Our second application of the indices is to the equivalent problem of the informationally-constrained social planner, whose goal is to maximize the present discounted value of all DMs' welfare. For the altruistic herding fiction is only worthy of study if it can be decentralized. A social planner must encourage early DMs to sacrifice for the informational benefit of posterity. Such an outcome can be decentralized as a constrained social optimum by means of a simple set of history-dependent balanced-budget transfers. These transfers are given by our indices, and have a rather intuitive economic meaning.

This paper was originally sparked by a simple question about informational herding: Haven't we seen this before? We were piqued by its similarity to the familiar failure of complete learning in an optimal experimentation problem. Rothschild's (1974) analysis of the two-armed bandit is a classic example: An impatient monopolist optimally experiments with two possible prices each period, with fixed but uncertain purchase chances for each price. Rothschild showed that the monopolist (i) eventually settles down on one of the two prices, and (ii) selects the less profitable price with positive probability. To us, this had the clear ring of: (i) an action herd occurs, and (ii) with positive chance it is misguided.

This paper also formally justifies this intuitive link. We prove that informational herding is not a new phenomenon, but a camouflaged context for an old one: myopic single person experimentation, with possible incomplete learning. Our proof respects the herding paradigm quintessence that predecessors' signals be hidden from view. In a nutshell, we replace all DMs by agent machines that automatically map any realized private signals into action choices; the true experimenter then must furnish these automata with optimal history-contingent 'decision rules'. We therefore reinterpret actions in the herding model as the experimenter's stochastic signals, and the DMs' decision rules as his allowed actions.
We perform this formal embedding for a very general observational learning context.³ As we must solve the experimentation problem first anyway, it makes more sense to proceed backwards, ending with the economics. So section 2 describes a general infinite player observational learning model, and then re-interprets it as an optimal single-person experimentation model. Focusing on the finite-action informational herding model, section 3 characterizes the experimenter's limiting beliefs. The altruistic herding model is introduced in section 4, and optimal strategies are described using index rules; these are then applied for our key comparative static, as well as a description in section 5 of the optimal transfers for the equivalent planner's problem. A conclusion affords a broader perspective on our findings. Some proofs are appendicized.

³Such an embedding is well-known and obvious for rational expectations pricing models (eg. Scheinkman and Schechtman (1983)), since the price is publicly observed, and an inverse mapping is not required.
2. TWO EQUIVALENT LEARNING MODELS

In this section, we first set up a rather general observational learning model, that subsumes SS, and thus BHW and Banerjee (1992). All models in this class are then formally embedded in the experimentation framework. Afterwards, we specialize our findings.
2.1 Observational Learning

Information. An infinite sequence of decision-makers (DMs) n = 1, 2, ... takes actions in that exogenous order. There is uncertainty about the payoffs from these actions. The elements of the parameter space Ω are referred to as states of the world. There is a given common prior belief, the probability measure ν over Ω. The nth DM observes a partially informative private signal σ_n ∈ Σ about the state of the world. As shown in SS (Lemma 1), we may assume WLOG that the private signal received by a DM is actually his private belief, i.e. we let σ be the measure over (Ω, F) which results from Bayesian updating given the prior ν and observation of the private signal. Signals thus belong to Σ, the space of probability measures over (Ω, F), equipped with the associated sigma-algebra. Conditional on the state, the signals are assumed to be i.i.d. across DMs, drawn according to the probability measure μ^ω in state ω ∈ Ω. To avoid trivialities, we assume that not all μ^ω are (a.s.) identical, so that some signals are informative. Each distribution may contain atoms, but we insist that all μ^ω be mutually absolutely continuous, to ensure that no signal will perfectly reveal the state of the world.

Bayesian Decision-Making. Everyone chooses from a fixed action set A, equipped with the sigma-algebra 𝒜. Action a earns a nonstochastic payoff u(a, ω) in state ω ∈ Ω, the same for everyone, where u : A × Ω → R is measurable. Each DM seeks to maximize his expected payoff. Before deciding upon an action, everyone first observes his private signal/belief and the entire action history h. As in SS, we simply assume that everyone is rational, i.e. it is common knowledge that a DM's Bayes-optimal decision uses the observed action history and his own private belief. A DM can thus compute the behaviour of all predecessors, and can use the common prior to calculate the ex ante (time-0) probability distribution over action profiles h in either state. Knowing these probabilities, Bayes' rule implies a unique public belief π = π(h) ∈ Σ for any history h. A final application of Bayes' rule given the private belief yields the posterior belief p ∈ Σ. Given the posterior belief p, the DM picks the action a ∈ A which maximizes his expected payoff u_a(p) = ∫_Ω u(a, ω) dp(ω). We assume that such an optimal action a = a(p) exists.⁴ The solution defines an optimal decision rule x from Σ to Δ(A), the space of probability measures over (A, 𝒜). Let X be the space of such maps x : Σ → Δ(A).

The Stochastic Process of Beliefs. A rule x produces an implied distribution over actions simultaneously for all private beliefs σ. Since the probability measure of signals σ depends on the state ω, and the optimal x depends on it, the implied distribution over actions a depends on both ω and the optimal decision rule x. This density is ψ(a|ω, x) = ∫ x(σ)(a) μ^ω(dσ), and unconditional on the state, ψ(a|π, x) = ∫_Ω ψ(a|ω, x) π(dω). This in turn yields a distribution over next period public beliefs π_{n+1}. Thus, (π_n) follows a discrete-time Markov process with state-dependent transition chances.

⁴Absent a unique solution, we must take a measurable selection from the solution correspondence.
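To fix ideas before the embedding, here is a minimal Python sketch of this belief machinery in a two-state special case. Everything in it is an illustrative assumption rather than part of the model above: private-belief densities f_H(s) = 2s and f_L(s) = 2(1-s), chosen only to satisfy the Bayesian consistency requirement f_H/f_L = s/(1-s), and one particular three-action threshold rule.

    # Private-belief densities consistent with Bayes rule (f_H/f_L = s/(1-s)):
    # f_H(s) = 2s in state H and f_L(s) = 2(1-s) in state L, on [0, 1].
    def psi(theta, m, state):
        """Chance psi(a_m | state, x) that the threshold rule x with cutoffs
        theta[0] = 0 <= ... <= theta[M] = 1 induces action a_m."""
        lo, hi = theta[m], theta[m + 1]
        if state == 'H':
            return hi**2 - lo**2                  # integral of 2s over [lo, hi]
        return (1 - lo)**2 - (1 - hi)**2          # integral of 2(1-s) over [lo, hi]

    def q_update(pi, theta, m):
        """Public belief q(pi, x, a_m) after action a_m is observed."""
        pH, pL = psi(theta, m, 'H'), psi(theta, m, 'L')
        return pi * pH / (pi * pH + (1 - pi) * pL)

    # Three actions with cutoffs (0, .3, .7, 1): the middle action is nearly
    # uninformative, while the extreme actions move the public belief a lot.
    theta = [0.0, 0.3, 0.7, 1.0]
    print([round(q_update(0.5, theta, m), 3) for m in range(3)])  # [0.15, 0.5, 0.85]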
2.2 Informational
of old bookes,
Cometh
al this
A
new
goal
first
7r
n+1
.
Thus,
=
ip(a\ui,x)
This
follows a
(tt„)
transition chances.
in
good
science that
faithe,
men
lere.
— Geoffrey Chaucer
optimization.
is
Herding as Experimentation Deja Vu
And out
Our immediate
density
= fn ip(a\uj,x)ir(du)).
xp(a\ir,x)
turn yields a distribution over next period public beliefs
discrete-time
The
rule x.
is
5
to recast the observational problem
outcome
stab brings us to the forgetful experimenter,
as a single person
who each
period receives
a new informative signal, takes an optimal action, and then promptly forgets his signal;
the next period, he can reflect only on his action choice. But this
optimal experimentation, since
it
assumes and
is
not a model of Bayes-
How
in fact requires irrational behaviour.
then can an experimenter not observe the private signals, and yet take informative actions?
To
give proper context to our resolution,
nice sequel to Rothschild's
prices,
it
helps to consider
McLennan
(1984). This
work permitted the monopolist to charge one of a continuum of
but assumed only two possible linear purchase chance 'demand curves'. McLennan
found that the resulting uninformative price when the demand curves crossed
may
well
eventually be chosen by an optimizing monopolist.
Rothschild's and McLennan's models give examples of potentially confounding actions,
as introduced in
EK: Easley and Kiefer
(1988).
unfocused beliefs for which they are invariants
unchanged). Of particular significance
state
is
In brief, such actions are optimal for
taking the action leaves the beliefs
(i.e.
the proof in
EK
(on page 1059) that with finite
and action spaces, potentially confounding actions generically do not
thus complete learning must arise.
anticipations of
EK's general
state space, whereas
6
Rothschild and
insight.
McLennan
McLennan might be
Rothschild escapes
it
exist,
and
seen as separate
by means of a continuous
resorts to a continuous action space. Yet there appears no
escape for the herding paradigm, where both flavours of incomplete learning, limit cascades
5
See
6
Eg: payoff's in a one-armed bandit, with a potentially confounding safe arm, are not generic in
The Assembly
of Fowles. Line 22.
M2
.
    Observational Learning Model                Impatient Experimenter Model
    State: ω ∈ Ω                                State: ω ∈ Ω
    Public belief after n observations: π_n     Belief after nth experiment: π_n
    Private signal of nth DM: σ_n               Randomness in the nth experiment: σ_n
    Optimal decision rule: x ∈ X                Optimal action: x ∈ X
    Action taken by each DM: a ∈ A              Observable signals: a ∈ A
    Density over actions: ψ(a|ω, x)             Density over observables: ψ(a|ω, x)
    Payoffs: private information                Payoffs: unobserved

Table 1: Embedding. This table displays how our single-type observational learning model fits into the impatient single person experimentation model.
In recasting our general observational learning model as a single person experimentation problem, we must focus on the myopic experimenter with discount factor 0 (ruling out active experimentation). Steering away from a forgetful experimenter, we shall regard the observational learning story from a new perspective. Consider the nth DM, who uses the decision rule x described in section 2, forming and acting upon his posterior beliefs. We may separate these two steps by the conditional independence of his private signal and π_n. Regard Mr. n as: (i) observing π_n but not his private signal; (ii) optimally determining the rule x ∈ X, and submitting it to an agent 'choice' machine; and (iii) letting that machine observe his private signal σ and take his action a ∈ A for him. The payoff u(a, ω) is unobserved, lest it provide an additional signal of the state of the world. If private beliefs σ have distribution μ^ω in state ω, then the experimenter chooses the same optimal decision rule x, resulting in action a ∈ A with chance ψ(a|ω, x).

Thus, the observational learning model corresponds to a single-person experimentation model where: The state space is Ω. In period n, the experimenter EX chooses an action (the rule) x ∈ X. Given this choice, a random observable statistic a ∈ A is realized with chance ψ(a|ω, x) in state ω. Finally, EX updates beliefs using this information alone.⁷ Table 1 summarizes the embedding.

Notice how this addresses both lead puzzles. First, the experimenter never knows the private beliefs σ, and thus does not forget them. Second, incomplete learning (bad herds) is entirely consistent with EK's generic finding of complete learning for models with finite action and state spaces. Simply put, actions do not map to actions but to signals when one rewrites the observational learning model as an experimentation model. The true action space for EX is the infinite space X of decision rules.

⁷This model doesn't strictly fit into the EK mold, where stage payoffs depend only on the action and the observed signal, but (unlike here) not on the parameter ω ∈ Ω. This is the structure of Aghion, Bolton, Harris, and Jullien (1991) (ABHJ), who admit unobserved payoffs. Alternatively, we could posit that EX has fair insurance, and only sees/earns his expected payoff each period and not his random realized payoff.
SS considered two major modifications of the informational herding paradigm. One was to add i.i.d. noise ('crazy' preference types) to the DMs' decision problem. Noise is easily incorporated here by adding an exogenous chance of a noisy signal (random action). SS also allowed for T different types of preferences, with DMs randomly drawn from one or the other type population. Multiple types can be addressed here by simply imagining that EX chooses a T-vector of optimal decision rules from X^T, with (only) the choice machine observing the task and private belief, and choosing the action a as before.
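The three-step separation of Mr. n is easy to see in simulation. The hedged Python sketch below (same illustrative densities as before, not part of the model) plays the DM side, where the 'choice machine' maps each realized private belief through a threshold rule, and checks that the empirical action frequencies match the densities ψ(a_m|H, x) that EX reasons with directly.

    import numpy as np

    rng = np.random.default_rng(0)
    theta = [0.0, 0.3, 0.7, 1.0]      # a threshold rule x, as in Lemma 2 below
    # Inverse-CDF draws from the illustrative density f_H(s) = 2s (state H).
    draws = np.sqrt(rng.random(100_000))

    # DM view: the agent 'choice machine' maps each realized private belief
    # into an action via the interval structure; only the action is public.
    actions = np.searchsorted(theta[1:-1], draws)
    freq = np.bincount(actions, minlength=3) / draws.size

    # EX view: the same action chances computed directly as psi(a_m | H, x).
    psi_H = [theta[m + 1]**2 - theta[m]**2 for m in range(3)]
    print(np.round(freq, 3), np.round(psi_H, 3))   # the two distributions agree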
3. THE PATIENT EXPERIMENTER

3.1 The Reformulated Model

From now on, we restrict ourselves to the more focused herding analytic framework: a two state, finite action setting, as in SS. More states complicates but does not enrich. We assume a state space Ω = {H, L}, with both states equilikely ex ante, i.e. common prior ν(L) = ν(H) = 1/2. A belief is then simply the chance of state H, so that Σ = [0, 1]. Let supp(μ) be the support of each probability measure μ^ω over private beliefs (i.e. the noise for EX's problem). If supp(μ) lies in some sub-interval of (0, 1), then private beliefs are bounded; if co(supp(μ)) = [0, 1], then arbitrarily strong private beliefs exist, and beliefs are unbounded. The half-bounded, half-unbounded case is a direct sum of these separate analyses.

We make the standard herding assumption of a finite action space A = {a_1, ..., a_M}. We assume that no action a_m is weakly dominated. WLOG, we can then order the actions so as to yield the standard interval structure: action a_m is myopically optimal exactly when the posterior p lies in [f̄_{m-1}, f̄_m], where 0 = f̄_0 < f̄_1 < ... < f̄_M = 1.

The planner chooses a strategy profile s = (s_1, s_2, ...), the strategy s_n for period n prescribing the rule x_n ∈ X which must be used, given the public belief π_n. This in turn determines the stochastic evolution of the model, i.e. a distribution over the sequences of realized actions, payoffs, and beliefs.
The Value Function. Our analysis here follows ABHJ and sections 9.1-2 of Stokey and Lucas (1989). The value function v(·, δ) : Σ → R for the planning problem with discount factor δ is v(π, δ) = sup_s E[(1-δ) Σ_{n=1}^∞ δ^{n-1} u_n | π], where the expectation is over the payoff sequences implied by s. Recall that u_m(π) denotes the expected payoff from a_m at belief π. Since u_m(π) = π u(a_m, H) + (1-π) u(a_m, L) is affine, the Bellman equation is

    v(π, δ) = sup_{x ∈ X} Σ_{a_m ∈ A} ψ(a_m | π, x) [(1-δ) u_m(q(π, x, a_m)) + δ v(q(π, x, a_m), δ)]    (1)

where q(π, x, a) is the Bayes-updated belief from π when a is observed and rule x is applied. The optimum of this Markovian decision problem with discount factor δ < 1 is achieved by some (Markov) policy (eg. ABHJ, Theorem 4.1):

Lemma 1 For any discount factor δ < 1, EX has an optimal policy ξ^δ, a map from [0, 1] to X depending on the public belief π alone (so the rule given belief π is ξ^δ(π)).

SS shows that the myopic experimenter maps higher private beliefs into higher actions. The same interval structure is also true with patience, as Lemma 2 proves. Intuitively, not only does the interval structure respect the action order to yield high immediate payoffs, but it also ensures the greatest information value, by producing the riskiest posterior belief distribution.⁸

Lemma 2 (Interval Structure) Any optimal rule x ∈ X is almost surely described by thresholds 0 = θ_0 ≤ θ_1 ≤ ... ≤ θ_M = 1, such that action a_m is taken when σ ∈ (θ_{m-1}, θ_m), and indifference between a_m and a_{m+1} prevails at the knife-edge σ = θ_m.

Proof: We prove that any rule x without an interval structure can be strictly improved upon. For m_1 < m_2, let E_i (i = 1, 2) be those beliefs in Σ mapped into a_{m_i} with positive probability under x, and assume to the contrary that the sets are not almost surely ordered as E_1 ≤ E_2. Since beliefs more in favour of state H induce higher posteriors, the resulting posteriors may be perversely ordered, i.e. q(π, x, a_{m_1}) ≥ q(π, x, a_{m_2}). If so, then given our action ordering, payoffs are strictly improved with no loss of information by remapping all beliefs in E_1 ∪ E_2 leading to a_{m_1} into a_{m_2}, and vice versa: the myopic payoff is strictly improved, while the continuation value is unchanged.

Next, assume that q(π, x, a_{m_1}) < q(π, x, a_{m_2}). For any θ ∈ (0, 1), define E_1(θ) = (E_1 ∪ E_2) ∩ [0, θ] and E_2(θ) = (E_1 ∪ E_2) ∩ [θ, 1]. Consider then the modified rule x̃ which equals x, except that a_{m_1} is taken for beliefs in E_1(θ) and a_{m_2} for beliefs in E_2(θ), where θ satisfies ψ(a_{m_1}|π, x̃) = ψ(a_{m_1}|π, x). (It may be necessary for x̃ to randomize over the two actions at belief θ to accomplish that.) Then q(π, x̃, a_{m_1}) ≤ q(π, x, a_{m_1}) and q(π, x̃, a_{m_2}) ≥ q(π, x, a_{m_2}), with at least one inequality strict. Thus, x̃ yields a mean preserving spread of the updated belief versus x, and since the continuation value function is weakly convex, its expectation is weakly improved; but as we have just argued, the myopic payoff is strictly improved. □

⁸This statistical decision problem is not without history. Sobel (1953) investigated an interval structure in a simple statistical decision problem, and more recently, so did Dow (1991) in a two period search model.
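To make equation (1) concrete, the following Python sketch runs value iteration in a small illustrative example: two actions, payoff 1 for the state-matching action, private beliefs bounded in [1/4, 3/4] with densities f_H(s) = 4s and f_L(s) = 4(1-s), and the rule space restricted to deterministic cutoffs (Lemma 2 permits mixing at the knife-edge, which a fine cutoff grid approximates). None of these choices comes from the paper; they are assumptions for the sketch.

    import numpy as np

    LO, HI = 0.25, 0.75                         # bounded private beliefs
    U = np.array([[0.0, 1.0], [1.0, 0.0]])      # U[m, w]: u(a_m, w), w = 0 is H
    GRID = np.linspace(0.0, 1.0, 401)           # public beliefs pi
    THS = np.linspace(LO, HI, 51)               # candidate cutoffs (rules x)

    def psi(th, m, w):
        """psi(a_m | w, x) for the rule taking a_1 on [LO, th], a_2 on [th, HI]."""
        a, b = (LO, th) if m == 0 else (th, HI)
        return 2 * (b**2 - a**2) if w == 0 else 2 * ((1 - a)**2 - (1 - b)**2)

    def bellman(v, delta):
        """One application of the Bellman operator T_delta from equation (1)."""
        vn = np.empty_like(v)
        for i, pi in enumerate(GRID):
            best = -np.inf
            for th in THS:
                tot = 0.0
                for m in (0, 1):
                    pH, pL = psi(th, m, 0), psi(th, m, 1)
                    mass = pi * pH + (1 - pi) * pL        # psi(a_m | pi, x)
                    if mass < 1e-12:
                        continue
                    q = pi * pH / mass                    # q(pi, x, a_m)
                    um = q * U[m, 0] + (1 - q) * U[m, 1]  # u_m(q)
                    tot += mass * ((1 - delta) * um + delta * np.interp(q, GRID, v))
                best = max(best, tot)
            vn[i] = best
        return vn

    v = np.maximum(GRID, 1 - GRID)              # the myopic frontier v_0
    for _ in range(300):                        # iterate T_delta to its fixed point
        v = bellman(v, 0.9)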
Our earlier assumption that no action is dominated is no longer so innocent, and may in fact have some bite for very patient EX. For the information value alone, taking each dominated action in principle provides additional signalling power. Essentially, it leaves the DM with a larger finite alphabet with which to signal his continuous private belief. But including such actions would invalidate the above proof, and thus perhaps the interval structure. We leave this stone unturned, acknowledging the possible loss of generality.
3.2 Long Run Behavior

As is generally the case with Bayesian learning, convergence is deduced by application of the martingale convergence theorem to the belief process. Not only must EX's beliefs settle down, but also EX is never dead wrong about the state. The proof is found in SS.

Lemma 3 The belief process (π_n) is a martingale unconditional on the state, converging a.s. to some random limiting variable π_∞. The limit π_∞ is concentrated on (0, 1] in state H.

The next result is a characterization of the stationary points of the stochastic process of beliefs (π_n), and generalizes the analysis for δ = 0 in SS. See figure 1 for an illustration of how the cascade sets are reflected in the shape of the optimal value function.

Proposition 1 (Cascade Sets) There exist M (possibly empty) subintervals J_1(δ) ≤ ... ≤ J_M(δ) of [0, 1], such that EX optimally chooses x inducing a_m a.s. iff π ∈ J_m(δ).
(a) For all δ, the extreme cascade sets are nonempty, with J_1(δ) = [0, π̲(δ)] and J_M(δ) = [π̄(δ), 1], where 0 < π̲(δ) ≤ π̄(δ) < 1.
(b) With unbounded private beliefs, all other cascade sets are empty, J_1(δ) = {0}, and J_M(δ) = {1}.
(c) If the private beliefs are bounded, then for δ large enough, all cascade sets disappear except for J_1(δ) and J_M(δ), while lim_{δ→1} J_1(δ) = {0} and lim_{δ→1} J_M(δ) = {1}.

Proof: All but the initial limit belief result are established in the appendix.

The next result is an expression of EK's Theorem 5 that the limit belief π_∞ precludes further learning. In the informational herding model, this means that a limit cascade must occur, as SS call it: the limit belief π_∞ is concentrated on J_1(δ) ∪ ... ∪ J_M(δ). To see why, observe that at any belief π not in any cascade set, at least two signals (actions) are realized with positive probability. By the interval structure, the highest such signal is more likely in state H, and the lowest more likely in state L. So the next period's belief differs from π with positive probability. Intuitively, or by the characterization result for Markov-martingale processes in appendix B of SS, π cannot lie in the support of π_∞.
[Figure 1 here: Typical value function. Stylized graph of v(π, δ), δ > 0, with three actions: the affine payoffs u(a_1, ·), u(a_2, ·), u(a_3, ·) support v on the cascade sets J_1(δ), J_2(δ), J_3(δ).]
The proof of this result also shows that the larger is δ, the weakly smaller are all cascade sets: Indeed, this drops out rather easily from the monotonicity of the value function in δ. We defer asserting this result for now, and in fact Proposition 5 leverages this weak monotonicity, and the index rules that we introduce later on, to deduce strict monotonicity.
Proposition 2 (Convergence of Beliefs) Consider a solution of EX's problem.
(a) For unbounded private beliefs, π_∞ is concentrated on the truth for any δ ∈ [0, 1).
(b) With bounded private beliefs, learning is incomplete for any δ ∈ [0, 1): with positive probability in state H, π_∞ is not in J_M(δ).
(c) The chance of incomplete learning with bounded private beliefs vanishes as δ → 1.

Proof: Part (a) follows from Theorem 1 of SS, and part (b) from Lemma 3 and Proposition 1(a). We now extend that proof to establish the limiting result (c) for δ ↑ 1. First, Proposition 1 assures us that for δ close enough to 1, π_∞ places all weight on J_1(δ) and J_M(δ). Conditional on state H, the likelihood ratio ℓ_n = (1-π_n)/π_n is a martingale, so the mean of ℓ_∞ must equal its prior value ℓ_0 = (1-π_0)/π_0. Because beliefs in J_1(δ) = [0, π̲(δ)] have likelihood ratios of at least (1-π̲(δ))/π̲(δ), the weight that π_∞ places on J_1(δ) in state H is bounded above by ℓ_0 π̲(δ)/(1-π̲(δ)). Since lim_{δ→1} π̲(δ) = 0, this weight must vanish as δ → 1. □

Observe how incomplete learning besets even an extremely patient EX. So this problem does not fall under the rubric of EK's complete learning results. For here, it is shown that EX optimally behaves myopically for very extreme beliefs: v(π) = u_1(π) for π near 0, and v(π) = u_M(π) for π near 1, both affine functions. This points to the source of the incomplete learning: lumpy signals (actions) rather than impatience. It is simply individuals' inability to properly signal their private information that frustrates the learning process.
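Part (b) can be illustrated by a hedged Monte Carlo sketch in the myopic benchmark δ = 0, where EX's rule is just the selfish BHW-style cutoff; the bounded-belief densities are the same illustrative ones used earlier. Beliefs here converge only asymptotically to the cascade sets (a limit cascade), but a strictly positive share of paths heads to the wrong region, where a herd on the wrong action is underway.

    import numpy as np

    rng = np.random.default_rng(1)
    LO, HI = 0.25, 0.75     # bounded private beliefs; true state is H

    def run(pi=0.5, periods=400):
        for _ in range(periods):
            th = min(max(1 - pi, LO), HI)   # myopic cutoff: take a_2 iff s >= th
            s = np.sqrt(rng.uniform(LO**2, HI**2))    # draw from f_H(s) = 4s
            hi_act = s >= th
            pH = 2 * (HI**2 - th**2) if hi_act else 2 * (th**2 - LO**2)
            pL = (2 * ((1 - th)**2 - (1 - HI)**2) if hi_act
                  else 2 * ((1 - LO)**2 - (1 - th)**2))
            if pH + pL < 1e-12:             # inside a cascade: uninformative
                continue
            pi = pi * pH / (pi * pH + (1 - pi) * pL)  # Bayes update
        return pi

    finals = np.array([run() for _ in range(20_000)])
    print("share of wrong herds in state H:", (finals < 0.5).mean())  # > 0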
4. ALTRUISTIC HERDING AND INDEX RULES

4.1 The Welfare Theorem

We now shift focus from the problem facing EX to the informational herding context with a sequence of DMs. For organizational simplicity, suppose that every DM is altruistic, but subject to the usual informational herding constraints (observable actions, unobservable signals). Define the expected reward of the payoff function f as E[f|π] = Σ_{ω∈Ω} π(ω) f(ω). Every DM n = 1, 2, ... seeks to maximize the present discounted welfare of all posterity, themselves included: E[(1-δ) Σ_{k=n}^∞ δ^{k-n} u_k | π]. An altruistic herding equilibrium (AHE) is a Bayes-Nash equilibrium of the game where every DM is altruistic. The next result is quite natural, but we prove it for clarity.

Proposition 3 For any discount factor δ < 1, any optimal policy ξ^δ for EX is an AHE. Consequently, an AHE exists.

Proof: Fix a given DM and a public belief π. Assume that ξ^δ is not an AHE, i.e. that the DM has some rule x that is a better reply than the behaviour strategy ξ^δ(π). Then EX can improve his value at π by fully mimicking this deviation, taking x in the first period and thereafter continuing with ξ^δ as if the first period history had been generated by ξ^δ(π). This contradicts the optimality of EX's policy. □
4.2 Choosing the Best Action

Recall the classical problem of the multi-armed bandit:⁹ A given patient experimenter each period must choose one of n actions, each having an uncertain independent reward distribution. The experimenter must therefore carefully trade off the informational and myopic payoffs associated with each action. Gittins (1979) showed that optimal behavior in this model can be described by simple index rules: Attach to each action the value of the problem with just that action and the largest possible lump sum retirement reward yielding indifference. Finally, each period, just choose the action with the highest index.

We now argue that the optimal policy employed by the decision makers in an AHE has a similarly appealing form: For a given public belief π and private belief p, the DM chooses the action a_m with the largest index w_m^δ(π, p). This measure will incorporate the social payoff, as did the Gittins index, but as privately estimated by the DM.

Before stating the next major result, recall that ∂g(y) denotes the subdifferential of the convex function g at y, namely the set of all λ that obey g(x) ≥ g(y) + λ·(x - y) for all x. Moreover, ∂g(y) is (strictly) increasing in y given (strict) convexity.

⁹An excellent, if brief, treatment is found in §6.5 of Bertsekas (1987).
Proposition 4 (Index Rules) Fix any AHE strategy ξ^δ. For m = 1, 2, ..., M, public beliefs π, and private beliefs p, there exist λ_m ∈ ∂v(q(π, ξ^δ(π), a_m), δ) such that the average expected present discounted value of action a_m to the decision maker is

    w_m^δ(π, p) = (1-δ) u_m(ρ(π, p)) + δ {v(q(π, ξ^δ(π), a_m), δ) + λ_m [ρ(π, p) - q(π, ξ^δ(π), a_m)]}    (2)

where ρ(π, p) = πp/[πp + (1-π)(1-p)] is the posterior of π given p. So the optimal decision rule is to take action a_m when w_m^δ(π, p) = max_k w_k^δ(π, p).

Proof: Fix a given decision maker faced with public belief π. We now calculate the payoffs from each of the M available actions a_1, ..., a_M. Action a_m of the DM induces the public posterior belief q_m = q(π, ξ^δ(π), a_m), and state-contingent average expected discounted future payoffs of his successors, say v_m^H and v_m^L. Clearly, the expected value of any such vNM payoff stream is affine in his posterior belief, i.e. of the form h_m(p̂) = E[v_m^ω | p̂] = p̂ v_m^H + (1-p̂) v_m^L. To evaluate these payoff streams, it suffices to employ EX's reckoning, since the DM and EX entertain the same future payoff objectives. Because the affine function h_m presumes the behaviour of the same strategy starting at an arbitrary public belief r as at q_m, EX can achieve the value h_m(r); therefore, h_m(r) ≤ v(r). Since the same strategy is optimal starting at belief q_m, we have h_m(q_m) = v(q_m). Thus, the slope of this affine function necessarily lies in the subdifferential ∂v(q_m). The present value expression (2) follows. □

That EX can always ensure himself a payoff function tangent to the value function by simply not adjusting his policy essentially implies convexity of the value function (eg. Lemma 2 of Fusselman and Mirman (1993)). This simple idea was also critical to the proof above.
4.3 Strict Inclusion of Cascade Sets

We next use our index rule characterization to prove a key comparative static of our forward-looking informational herding model: As individuals grow more patient, the set of cascade beliefs which foreclose on learning strictly shrinks. Before proceeding, we need two preliminary lemmata.

Lemma 4 (Strict Value Monotonicity) The value function increases strictly with δ outside the cascade sets: for δ_2 > δ_1, v(π, δ_2) > v(π, δ_1) for all π ∉ J_1(δ_2) ∪ ... ∪ J_M(δ_2).

The detailed proof of this result is appendicized, but the idea is quite straightforward. Outside the cascade sets, EX's continuation value must strictly exceed his myopic value, for in some future eventuality he strictly prefers a non-myopic action. We then show that this holds for any continuation public belief outside both cascade sets J_m(δ_1) ⊇ J_m(δ_2). So a more patient player, who more highly weights the continuation value, will enjoy a higher value.
We also exploit the fact that we can characterize differentiability of the value function at the edge of cascade sets.¹⁰ For context, it is in general very hard to identify primitive assumptions which guarantee the differentiability of our value function. Call rules x and y equivalent if they can be represented by the same thresholds (with associated mixing).

Lemma 5 (Differentiability) Let π ≠ 0, 1 be an endpoint of a cascade set J_m(δ). Assume that all rules optimal at π are equivalent. Then v(·, δ) is differentiable in the belief at π.

Write the Bellman equation (1) as v = T_δ v, and call T_δ the Bellman operator. As usual, T_δ is a contraction, and v(·, δ) is its unique fixed point in the space of bounded, continuous, weakly convex functions. Also, v ≥ v' implies T_δ v ≥ T_δ v'.

¹⁰We thank Rabah Amir, David Easley, Andrew McLennan, Paul Milgrom, Len Mirman, and Yaw Nyarko for private discussions about the differentiability of the value function in experimentation problems.
We can finally establish the major comparative static of this paper: if EX is indifferent about foreclosing on further learning at some belief (i.e. it is barely in a cascade), then he strictly prefers to learn when more patient. All non-empty cascade sets shrink strictly when δ rises.

Proposition 5 (Strict Inclusion) Assume bounded beliefs. For all a_m ∈ A, if 1 > δ_2 > δ_1 ≥ 0 and J_m(δ_1) is non-empty, then J_m(δ_2) ⊂ J_m(δ_1), a strict inclusion.

Proof: We consider the left edge of the cascade set; the argument for the right edge is symmetric. Let r = inf J_m(δ_1). As Step 4 of the proof of Proposition 1 gives the weak inclusion J_m(δ_2) ⊆ J_m(δ_1), and any cascade set is closed by continuity of v(·, δ_1), we need only prove that r ∉ J_m(δ_2). There are two cases.

Case 1: Some optimal rule x at belief r with discount factor δ_1 does not almost surely take action a_m. Then with positive probability, x induces some action a_k (k ≠ m) producing a posterior belief q(r, x, a_k) not in any δ_1-cascade set. [For the optimal rule x cannot almost surely induce any other, myopically suboptimal, action a_j (j ≠ m) at a stationary belief.] Since v(·, δ_1) weakly exceeds every myopic payoff, and since v(q(r, x, a_k), δ_2) > v(q(r, x, a_k), δ_1) by Lemma 4, we have

    v(r, δ_2) ≥ Σ_{a_j ∈ A} ψ(a_j|r, x) [(1-δ_2) u_j(q(r, x, a_j)) + δ_2 v(q(r, x, a_j), δ_2)]
              > Σ_{a_j ∈ A} ψ(a_j|r, x) [(1-δ_1) u_j(q(r, x, a_j)) + δ_1 v(q(r, x, a_j), δ_1)] = v(r, δ_1) = u_m(r)

Consequently, we have v(r, δ_2) > u_m(r), and so r ∉ J_m(δ_2).

Case 2: The optimal rule at public belief r with discount factor δ_1 is unique. Then the partial derivative v_1(r, δ_1) exists by Lemma 5. By the convexity of the value function, any selection from the subdifferential ∂v(π) converges to v_1(r, δ_1) as π increases to r. Since the optimal rule correspondence is upper hemicontinuous by the Maximum Theorem, and is uniquely valued at r, the posterior belief q(π, ξ^{δ_1}(π), a_k) is continuous in π at r, for any optimal selection ξ^{δ_1} and any action a_k. [In particular, since r ∈ J_m(δ_1), the optimal rule at r almost surely prescribes action a_m, so that q(r, ξ^{δ_1}(r), a_m) = r, while the knife-edge posterior after the neighbouring action satisfies q(r, ξ^{δ_1}(r), a_{m-1}) = ρ(r, b), where b = inf supp(μ) is the lower endpoint of the private belief distribution.]

By their definition, the indices w_m^{δ_1}(π, p) and w_{m-1}^{δ_1}(π, p) are then jointly continuous in (π, p) at (r, b). Since r is the left endpoint of J_m(δ_1), we have w_m^{δ_1}(r, b) = w_{m-1}^{δ_1}(r, b), while w_m^{δ_1}(π, b) < w_{m-1}^{δ_1}(π, b) for π < r. By way of contradiction, assume that r ∈ J_m(δ_2). Then r = inf J_m(δ_2) by the weak inclusion, and by continuity w_m^{δ_2}(r, b) = w_{m-1}^{δ_2}(r, b). Substituting (2) for each index, using q(r, ξ^δ(r), a_m) = r, q(r, ξ^δ(r), a_{m-1}) = ρ(r, b), v(r, δ_2) = v(r, δ_1) = u_m(r), and λ_m^{δ_2} = λ_m^{δ_1} equal to the slope of u_m (true because the affine function h_m here evaluates the prospect of taking action a_m forever), simple algebra yields

    w_m^δ(r, b) - w_{m-1}^δ(r, b) = (1-δ)[u_m(ρ(r, b)) - u_{m-1}(ρ(r, b))] + δ[u_m(ρ(r, b)) - v(ρ(r, b), δ)]

Subtracting the two indifference equalities, we then have the following contradiction:

    0 = [w_m^{δ_2}(r, b) - w_{m-1}^{δ_2}(r, b)] - [w_m^{δ_1}(r, b) - w_{m-1}^{δ_1}(r, b)]
      = (δ_2 - δ_1)[u_{m-1}(ρ(r, b)) - v(ρ(r, b), δ_1)] - δ_2 [v(ρ(r, b), δ_2) - v(ρ(r, b), δ_1)] < 0    (3)

The final inequality follows since r ∈ J_m(δ_1) ⊆ J_m(0), so that a_m is myopically optimal at ρ(r, b) and hence v(ρ(r, b), δ_1) ≥ u_m(ρ(r, b)) ≥ u_{m-1}(ρ(r, b)), and since v(ρ(r, b), δ_2) > v(ρ(r, b), δ_1), as given by Lemma 4, because ρ(r, b) < r lies outside the δ_2-cascade sets. Given this contradiction, r ∉ J_m(δ_2). □
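Proposition 5 can be eyeballed numerically with the value-iteration sketch from section 3.1: tag a grid belief as cascading on a_m when v(π, δ) coincides with the affine payoff u_m(π) up to a numerical tolerance, and compare two discount factors. This is only a sanity check on the illustrative example, reusing GRID and bellman from that sketch.

    def cascade_sets(delta, tol=1e-5):
        """Grid approximation of J_1(delta), J_2(delta): the beliefs where
        v(., delta) coincides with an affine myopic payoff."""
        v = np.maximum(GRID, 1 - GRID)
        for _ in range(400):
            v = bellman(v, delta)
        j1 = GRID[np.abs(v - (1 - GRID)) < tol]    # herd on a_1
        j2 = GRID[np.abs(v - GRID) < tol]          # herd on a_2
        return (j1.min(), j1.max()), (j2.min(), j2.max())

    print(cascade_sets(0.5))   # wider cascade intervals
    print(cascade_sets(0.9))   # strictly smaller ones, as Proposition 5 predicts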
5. CONSTRAINED EFFICIENT HERDING

Let us turn full circle, and consider once more the informational herding paradigm as played by selfish individuals. Let the realized payoff sequence be (u_k). Reinterpret EX's problem as that of an informationally constrained social planner SP, trying to maximize the expected discounted average welfare E[(1-δ) Σ_{n=1}^∞ δ^{n-1} u_n | π] of the individuals in the herding model. Observe how SP's and EX's objectives are perfectly aligned. To respect that key herding informational assumption that actions but not signals are observed, we further assume that the SP neither knows the state nor can observe the individuals' private signals, but can both observe and tax or subsidize any actions taken.

How does SP steer the choices away from the myopic solution to EX's problem? Given the current public belief π, if an individual takes action a ∈ A, he then receives the (possibly negative) transfer τ(a|π). A constrained herding equilibrium (CHE) is a Bayes-Nash equilibrium of the repeated game where every DM n = 1, 2, ... seeks to maximize the sum of his expected one-shot myopic payoff u(a, π) and his incurred transfer τ(a|π). A constrained-efficient herding equilibrium (CEHE) is a CHE where the transfers achieve the constrained first best, i.e. implement EX's optimal outcome. Existence follows at once from Lemma 6.

Lemma 6 (CEHE) For any discount factor δ < 1, there exist transfers for which the optimal policy ξ^δ for EX is a CHE; i.e. a CEHE exists.

Since the SP's program is still EX's program, the proof of Lemma 2 remains valid for any given transfers, and the best policy is described by threshold rules. Since the private belief σ maps into the posterior ρ(π, σ) = πσ/[πσ + (1-π)(1-σ)], a threshold θ_m must solve the indifference equation u_m(ρ(π, θ_m)) + τ(a_m|π) = u_{m+1}(ρ(π, θ_m)) + τ(a_{m+1}|π). So the transfer difference τ(a_{m+1}|π) - τ(a_m|π) alone matters for how individuals trade off between the two neighbouring actions, and each threshold belief is optimally chosen (θ_m = θ_m^δ) by suitably adjusting this net premium.

Our action indices shed some more light on the optimal CEHE transfers, beyond the mere fact that 'experimentation' (making non-myopic choices) is rewarded. Clearly, the SP need not tax or subsidize actions when π ∈ J_m(δ) ⊆ J_m(0): her desired action will be chosen anyway, and thus some transfers are zero. Conversely, when π ∉ J_m(δ), the SP perforce wishes to encourage nonmyopic actions; indeed, by Proposition 5, J_m(δ) ⊂ J_m(0) is a strict inclusion, and so transfers are not identically zero for a patient SP, and transfers intuitively will differ from myopic payoffs. Faced with transfers, each selfish herder maximizes the sum of his myopic payoff and his transfer; to coax each to the SP's desired rule, the transfers must leave every individual who should be on the knife-edge between two neighbouring optimal actions perfectly indifferent. In other words, we need τ(a_m|π) + u_m(ρ(π, θ_m)) = w_m^δ(π, θ_m)/(1-δ). In fact, this condition alone will lead inframarginal DMs to make the correct decisions too, because the interval structure is optimal by Lemmata 2 and 6.
Proposition 6 (Optimal Transfers) For any belief π, there exist λ_1 ≤ ... ≤ λ_M with λ_m ∈ ∂v(q(π, ξ^δ(π), a_m), δ), so that the following are efficient transfers:

    τ(a_m|π) = w_m^δ(π, θ_m)/(1-δ) - u_m(ρ(π, θ_m)) = δ [v(q(π, ξ^δ(π), a_m), δ) + λ_m (ρ(π, θ_m) - q(π, ξ^δ(π), a_m))] / (1-δ)

Observe that incentives are unchanged if a constant is added to all M transfers. Consequently, SP may also achieve expected budget balance each period, i.e. the expected contribution from everyone is zero, or Σ_{m=1}^M ψ(a_m | π, ξ^δ(π)) τ(a_m|π) = 0. There is obviously a unique set of efficient transfers that satisfies budget balance.
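Continuing the same illustrative two-action sketch, the transfers of Proposition 6, shifted by the budget-balancing constant, can be computed from the approximate value function as follows. Here the single interior knife-edge θ_1 = th serves for both neighbouring actions, which is exactly the indifference condition above; as before, th is a stand-in for the planner's optimal cutoff, and the subgradient is a finite-difference approximation.

    def transfers(v, pi, th, delta):
        """Efficient transfers tau(a_m | pi) from Proposition 6 for the
        two-action sketch, shifted to satisfy expected budget balance."""
        tau, mass = [], []
        for m in (0, 1):
            pH, pL = psi(th, m, 0), psi(th, m, 1)
            p = pi * pH + (1 - pi) * pL                      # psi(a_m | pi, x)
            q = pi * pH / p if p > 1e-12 else pi             # q(pi, x, a_m)
            lam = (np.interp(min(q + 1e-3, 1.0), GRID, v)
                   - np.interp(max(q - 1e-3, 0.0), GRID, v)) / 2e-3
            rho = pi * th / (pi * th + (1 - pi) * (1 - th))  # rho(pi, theta_1)
            tau.append(delta * (np.interp(q, GRID, v) + lam * (rho - q))
                       / (1 - delta))
            mass.append(p)
        shift = sum(t * p for t, p in zip(tau, mass))        # expected payout
        return [t - shift for t in tau]                      # budget-balanced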
5.1 Constrained Efficient Herding

We are now positioned to reformulate the learning results of the last section at the level of actions. First a clarifying definition: We say that a herd arises on action a_m at stage N if all individuals n = N, N+1, N+2, ... choose action a_m. Observe that this differs from the definition of a cascade; certainly, a cascade will imply a herd, but the converse is false. To show that herds arise, we can generalize the Overturning Principle of SS to this case: Claim 4 (statement and proof appendicized) establishes that actions other than a_m will push the updated public belief far from its current value. Thus, convergence of beliefs implies convergence of actions, i.e. a limit cascade implies a herd. The following is thus a corollary to Proposition 2.

Proposition 7 (Convergence of Actions) In any CEHE for discount factor δ:
(a) An ex post optimal herd eventually starts for δ ∈ [0, 1) and unbounded private beliefs.
(b) With bounded private beliefs, a herd arises on an action other than a_M with positive chance in state H for any δ ∈ [0, 1), unless π_0 ∈ J_M(δ).
(c) The chance of an incorrect herd with bounded private beliefs vanishes as δ → 1.

It is no surprise that SP ends up with full learning with unbounded beliefs, for even selfish individuals will. More interesting is that SP optimally incurs the risk of an ever-lasting incorrect herd. Herding is truly a robust property of the observational learning paradigm.
6. CONCLUSION

This paper has shown that informational herding is not such an adverse phenomenon after all: Rather, it is a constrained efficient outcome of the social planner's problem in models of observational learning, and is robust to changing the planner's discount factor. To understand this decentralized outcome, we have then derived an expression for the social present value of each action. This formulation differs from the Gittins index because of the agency problem: Since private signals are privately observed, aligning private and social incentives entails a translation using the marginal social value. Finally, we have used this expression to prove a strict comparative static that eludes dynamic programming methods: Namely, cascade sets strictly shrink as the planner grows more patient.

This paper has also discovered and explored the fact that informational herding is simply incomplete learning by a single experimenter, suitably concealed. Our mapping, recasting everything in rule space, has led us to an equivalent social planner's problem. While the revelation principle in mechanism design also uses such a 'rule machine', the exercise is harder for multi-period, multi-person models with uncertainty, since the planner must respect the agents' belief filtrations. While this is trivially achieved in rational expectations price settings, one must exploit the martingale property of public beliefs with observational learning, and largely invert the model. This also works for more general social learning models without action observation, provided an entire history of posterior belief signals is observed. Absent this assumption, the public belief process (howsoever defined) ceases to be a martingale, and expression as an experimentation model with perfect recall is no longer possible. This explains why our model of social learning with random sampling, Smith and Sørensen (1997b), must employ entirely different techniques (Polya urns).

Of course, once informational herding is correctly understood as single-person Bayesian experimentation, it no longer seems so implausible that incorrect herds may be constrained efficient. For incomplete learning is if anything the hallmark of optimal experimentation models, even with forward-looking behaviour. This link also offers hope for reverse insights into the experimentation literature: As in EK, incomplete learning requires an optimal action x for which unfocused beliefs are invariant, i.e. the distribution ψ(a|ω, x) of signals a is the same for all states ω. Such an invariance is clearly easier to satisfy with fewer available signals, and not surprisingly all published failures of complete learning that we have seen assume a finite (vs. continuous) signal space. For instance, Rothschild (1974), McLennan (1984), and ABHJ's example are all binary signal models. More precisely, the unbounded beliefs assumption in an experimentation context corresponds to an ability to run experiments with an arbitrarily small marginal cost (eg. shifting the threshold θ_k slightly up only incurs myopic costs o(dθ_k)).
APPENDIX
A.
RHS
Let the Bellman operator Tg be given by T§v equals the
v
>
we have TgV >
v'
As
Tsv'.
A.l Proof of Proposition
by v
such that when n G Jm {o~),
exists,
a m occurs
For the
G
m ((5), then
u m (ir)
if
an
is
am
is
v(tt,8)
=
it
vq(tt)
=
ir n +\
SV
ir
that are pointwise increasing,
strictly so outside the
7r,
q(n x x(o))
,
for all
,
7T.
By
The
then
is
=
A
[0, 1).
Vq(tc)
to v(-,5).
>
)
[(1
-
v
is
applied,
it is
}
—
n
a.s.,
induction,
T^v > T$~
vo,
>
interval.
=
7r n
is
v
ir n
=
1.
The value
(ir) V<5
G
v(-,S) weakly exceeds vq,
andVn
[0, 1)
o~)um(q{n, x, a m ))
+
\J^=l Jm (5)
8v (q(7r,x, a m ))] over x
Then
n Observe how
all
x G X,
52
,
T$vq(ix)
>
v
.
Finally,
when
ir
is
v(n, 5 X )
>
(ir)
outside the cascade sets, by
>
v (^).
or ought to be a folk result for optimal experimentation, but
Step 3 (Weak Value Monotonicity)
>
and
yielding (as usual) a pointwise increasing sequence
have not found a published or cited proof of
5i
0,
also dynamically
not optimal to almost surely induce one action. So, v(tv,S)
following either
As
consists of weakly convex functions
resulting in value v Q (7r). Optimizing over
1
If
u m (Tr).
the optimal choice.
argument holds when
similar
cascade sets: v(n,5)
'^, x
=
weakly convex, Jm (S) must be an
The sequence {T^v
and converge
interval.
one rule x almost surely chooses the myopically optimal action.
it is
Namely, for
is
i.e.
],
G Jm{°~)-
1
v(ir,5)
is
G Jm (S) and a m
?r
no matter which rule
converging to the fixed point v(-,5)
definition
G Ji(S) and
[0, 1),
m -i,9m
[$
need only prove that Jm (S) must be an
and v (-,6)
a.s.
To maximize J^ameA^i
for given
G
C
myopically strictly optimal for the focused belief
Step 2 (Iterates and Limit)
Proof:
optimally chooses x with supp(//)
— u m (7r)
optimal for any discount factor S G
and
myopic expected
For each a m G A, a possibly empty interval Jm (^)
the optimal choice, and the value
half, a\ is
updates to
First, define the
11
it
really
affine function of n,
For the second
since
we
first half,
</
Conversely,
= max m m (7r).
(learning stops). For any S
a.s.
Proof:
iv
(ir)
Cascade Sets)
(Interval
1
unique
is its
1
established in a series of steps.
is
utility frontier function v
Step
a contraction, and v(-,5)
is
for
bounded, continuous, weakly convex functions.
fixed point in the space of
The proposition
standard, T$
is
Note that
of (1).
12
it.
At any
rate,
The value function
v(n, 5 2 ) for all
it is
is
we
here for completeness.
weakly increasing in
6:
it.
from v(n,0) = sup x J^ m ip(a m \-K,x)u m (q(n,x,a m )). In other words, v(n,Q)
allows the myopic individual to observe one signal a before obtaining the ex post value v Q (p(ir,o)).
12
But ABHJ do assert without proof (p. 625) that the patient value function exceeds the myopic one.
this differs
17
>
on the
choose
Vq. If 5i
2
>
,
Step 4 (Weak Inclusion)
words,
Va m G A,
Proof:
As seen
>
62
For
-
7T
>
if 1
>
81
>
52
and
in steps 3
>
v(tt,Si)
2,
G Jm {3\)i we know
SS establish
and Ja/(0)
=
Now
{1}.
apply steps
Step 6 (Bounded Beliefs)
JM (6) =
We
optimal at belief
for beliefs
<
No
So any
A
G
q(7r)
Bellman equation
>
fr
is
0.
1 for
is
tt
[0, 7r/2]
—
and
some < ii\ <
> H ([8*,l}) >
fi
boundaries 6m _i
7r 2
L
fi
=
0,
<
h ([9*,1])/[7tii h
tt
([9*,1})
=
{0}
—
[0,7r(£)]
and
optimal to choose a rule x that
is
is
=
some u >
and
0,
Vo(%) on
very similar.
Since action a\
must be the optimal choice
it
Since each u m
[0,7r].
for all beliefs
-k
—n
is
^ ai
For
Ji(6)
is
large
=
=
in the interval
= no j\nd +
and so
arbitrarily small,
than u(l
strictly
enough
{0} and
m
>
—
5,
<
m
9*,
=
< M)
We shall consider the
# m+1 = 1 (see Lemma 2).
(1
-
ir)ti
L
[9*, 1]}
[a, a]
C
— 7r)(l — a)]. For n
is v((j(tt), S) — v(n, 6)
such small
tv.
By
the
beliefs.
{1}-
for
which Jm (S)
=
[7Ti,7T2],
1)
with
alternative rule x, with interval
results in the posterior belief q(ir)
([8*,l})} in state
18
[0, fr/2].
cascade sets disappear except for
all
lim^i Jm{5)
(1
for
C
afhne,
(1
5)/5 for small enough
suboptimal
0.
with the event {a G
+
q(7i)
is
Since there are informative private beliefs, 30* G (1/2,
([9*,l})
Updating the prior
tt/j,
1.
6m
it
Thus, Ui(n)
q(ir)
are empty.
1.
Fix 5 G [0,1), and an action index
for
1
<
in particular, less
lim^j
Jm{o~), while
Jm (S)
are empty, except for Ji(0)
not weakly dominated,
any action a
(1),
other
4.
updated to at most
Step 7 (Limiting Patience)
Proof:
it(5)
D
can produce a stronger signal than any a £ supp(^)
initial belief
small by continuity of v
and
<
ir_(5)
m^
all
observation a 6
small enough,
Ji(S)
Jm (0)
The
u m {Ti).
only cascade sets for the
If the private beliefs are bounded, then Ji(5)
and
0,
some
for
tt,
=
> u m (ir) + ufor
Ui(ir)
(0, 1).
it
all
=
G Jmi^)-
tt
all
{1}/
when
for all n,
v(7r,5 2 )
beliefs,
=
the argument for large beliefs
a\\
-k
so that
a.s.,
prove that for sufficiently low beliefs
almost surely induces
is
<
where
[n(S), I],
and
1
and thus
{0} and Jm(5)
myopic model that
for the
> u m (7r)
vo(tt)
unbounded private
=
extreme actions are empty, with Ji(5)
>
u(7r,5 2 )
= u m (7r)
v(7r, 5i)
VFrf/i
In other
6 increases:
Jm (5i) C Jm (52 ).
then
0,
Step 5 (Unbounded Beliefs)
Proof:
when
All cascade sets weakly shrink
optimal value can thus be obtained by inducing a m
Proof:
Y, arn eA i>(am\^,x)v(q(7T,x, a m )) for
> 62, then T^Vq > Ts vq, since more weight is placed
larger component of the RHS of (1). Because one possible policy under 6\ is to
the £ optimal under 5 2 we have T^v > TJ^vq. Let n — 00 and apply step 2.
any x and any function v
Si
<
Clearly, J2a eA ip(a m \^,x)u m (q(TV, x, a m ))
m
Proof:
H.
—
For any compact subinterval
/
C
(0, 1), in
for all
n E
onto)
[tt 2
Since
+
By
e/2,
v(7r, 8)
>
choose u
have
I.
definition of e, q
maps
Choose u >
1].
> u m (7r)
outside
> u m (7r) + u
there exists e
the interval
Jm (S) =
>
for all
>
If 8'
8.
— f/2,7r 2
[7r 2
>
e(I)
]
with
8
£
i\
+
(7r)
+ s/2,
[n 2
1].
so large that (1
is
q(ir)
u
for all
G
— £/2, 712]. By
[ii2
length e/2 from interval
=
If rn
1
or
A.2 Proof of Lemma 4

We first consider a stronger version of Step 1. Call the private signal distribution two-signal (TS) if its support contains only two isolated points (as is the case in BHW), and let co(supp(μ)) = [b, b̄].

Claim 1 (Unreachable Cascade Sets) Fix δ and a non-cascade belief π. Then at least two actions are taken with positive chance, and a posterior belief q(π, x, a_m) not in any δ-cascade set is reached with positive probability, except possibly at most M-1 exceptional points (★): each the unique belief between a pair of consecutive nonempty cascade sets from which both cascade sets can be reached, which can arise only if TS holds.
Proof: At a non-cascade belief π, by the interval structure, some action is taken with positive chance shifting the public belief upwards, while another shifts it downwards. Suppose all possible actions at π lead into a cascade set; as π is itself not in a cascade set, it then lies between the nonempty cascade sets J_{m-1}(0) and J_m(0), with π̲ = sup J_{m-1}(0) and π̄ = inf J_m(0). The low action must induce a posterior at most π̲, attained by ρ(π, b), while the high action induces a posterior at least π̄. If TS fails, then with positive chance a nonextreme signal is realized, and the corresponding posterior lies in neither cascade set, a contradiction. If TS holds, the requirement ρ(π̂, b) = π̲ admits at most one solution π̂ between the two cascade sets, since ρ(·, b) is strictly increasing and Bayes-updating commutes, so the outer terms of the iterated updates coincide; moreover, such a point would further require sup J_{m-1}(δ) = sup J_{m-1}(0) and inf J_m(δ) = inf J_m(0), which is even less likely to hold given the weakly smaller δ-cascade sets of Step 4. □

We now finish proving Lemma 4. Fix π outside the δ_1-cascade sets (and thus also outside the δ_2-cascade sets, for δ_2 > δ_1). We must show v(π, δ_2) > v(π, δ_1).

Assume first that some action a_m is taken with positive chance inducing a belief q(π, ξ^{δ_1}(π), a_m) not in any δ_1-cascade set. Then, as in Step 2 of Proposition 1's proof, the strictly valuable continuation yields

    v(π, δ_1) = (T_{δ_1} v(·, δ_1))(π) < (T_{δ_2} v(·, δ_1))(π) ≤ (T_{δ_2} v(·, δ_2))(π) = v(π, δ_2)

and we are done. Next assume that every action optimally taken at π leads into a δ_1-cascade set, so that π is an exceptional point (★) of Claim 1, lying between π̲ = sup J_{m-1}(δ_1) and π̄ = inf J_m(δ_1). The rule x from Claim 1 maps b into a_{m-1} (the low signal, leading to π̲) and b̄ into a_m (the high signal, leading to π̄). From (1), v(π, δ_1) is then a mixture, with weights ψ(a_{m-1}|π, x) and ψ(a_m|π, x) given by the transition chances, of first-period myopic values and the continuation values v(π̲, δ_1) and v(π̄, δ_1); by convexity, v(·, δ_1) is at most this average, and x achieves it, so v(·, δ_1) is affine on [π̲, π̄]. Since every belief in the punctured neighbourhood (π̲, π̄) \ {π} is outside the δ_1-cascade sets and fails (★), the first part of the proof gives v(q, δ_2) > v(q, δ_1) for continuation beliefs q arbitrarily close to π̲ and π̄. Applying the rule x at π and using these strictly improved continuations, the mixture representation yields v(π, δ_2) > v(π, δ_1) at π as well. □
A.3 Differentiability

Continuity. In light of the interval structure of Lemma 2, the planner simply must determine the chances ψ(a_m|π) with which to choose each action (i.e., what fraction of the signal space maps into each action). Thus, the choice set is WLOG the compact M-simplex Δ(A), the same strategy space as in our earlier general observational learning model. Since the objective function in the Bellman equation (1) is continuous in this choice vector and in π, it follows from the Theorem of the Maximum (e.g. Theorem I.B.3 of Hildenbrand (1974)) that the non-empty optimal rule correspondence is upper hemi-continuous.
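To fix ideas, here is a minimal value-iteration sketch of the Bellman equation (1). It is our illustration, not the paper's code, and every modelling choice (two actions, a symmetric two-point signal of precision 0.7, δ = 0.9, a 201-point belief grid, pure rules only) is an assumption made for concreteness:

    import numpy as np

    # Value iteration on (1): action a_1 pays 1 in state L, action a_2 pays 1 in H.
    P, DELTA = 0.7, 0.9
    GRID = np.linspace(0.0, 1.0, 201)          # public beliefs pi = Pr(H)

    def update(pi, b):                         # combine public pi with private b
        return pi * b / (pi * b + (1 - pi) * (1 - b))

    def myopic(belief, act):                   # expected current payoff of an action
        return belief if act == 1 else 1.0 - belief

    v = np.maximum(GRID, 1.0 - GRID)           # start from the myopic value v_0
    for _ in range(500):
        v_new = np.zeros_like(v)
        for i, pi in enumerate(GRID):
            best = -np.inf
            for lo_act in (0, 1):              # action assigned to the low signal
                for hi_act in (0, 1):          # action assigned to the high signal
                    total = 0.0
                    for b in (1.0 - P, P):     # private beliefs; Pr(signal|H) = b here
                        psi = pi * b + (1 - pi) * (1 - b)
                        act = lo_act if b < 0.5 else hi_act
                        # a pooling rule reveals nothing to successors:
                        q_pub = update(pi, b) if lo_act != hi_act else pi
                        total += psi * ((1 - DELTA) * myopic(update(pi, b), act)
                                        + DELTA * np.interp(q_pub, GRID, v))
                    best = max(best, total)
            v_new[i] = best
        if np.max(np.abs(v_new - v)) < 1e-10:
            break
        v = v_new
    # v is convex; where the pooling rule wins, learning stops (a cascade region).

For these parameters the computed v(·, δ) is convex, and pooling wins near π = 0 and π = 1, consistent with the cascade-set structure described above; randomized rules are omitted here only for brevity.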
Proof of Lemma 5: Assume π̄ ∈ J_m(δ), so that one optimal rule at π̄ induces a_m with probability one. We show that if all rules optimal at π̄ are equivalent, then v is differentiable at π̄, as asserted. In other words, if v is not differentiable at π̄, this leads to a contradiction.

Claim 2 ∀ε > 0 ∃η > 0 such that when |π − π̄| < η, all rules optimal at π induce action a_m with probability at least 1 − ε in both states H, L.

Proof: The property is shared by all rules optimal at π̄: if ψ(a_m|π) = πψ(a_m|H) + (1 − π)ψ(a_m|L) is near 1, so are both ψ(a_m|H) and ψ(a_m|L). The claim then follows from the upper hemicontinuity of the optimal rule correspondence. □
Claim 3 ∀N ∈ ℕ, ∀ε > 0, ∃η > 0 such that if |π₁ − π̄| < η, then under any optimal strategy started from π₁, action a_m is taken in each of the first N periods with probability at least 1 − ε in each state H, L, and the posterior belief π_n stays within η of π̄, provided a_m occurs each period.

Proof: Fix η₁ with 0 < η₁ < 1/2. By Claim 2, a_m occurs with conditional chance at least 1 − η₁ in each state from any π close enough to π̄; and if a_m occurs, then by Bayes rule the posterior satisfies |π_{n+1} − π_n| ≤ 4π̄(1 − π̄)η₁ whenever π_n(1 − π_n) ≤ 3π̄(1 − π̄)/2. In particular, we proceed as follows. Let p₁ ≠ π̄ be so close to π̄ that p₁(1 − p₁) ≤ 3π̄(1 − π̄)/2 and that a_m occurs with conditional chance at least 1 − η₁ in each state from any π with |π − π̄| ≤ |p₁ − π̄|. Choose p₂ ≠ π̄ so close to π̄ (say within η₁/[8π̄(1 − π̄)] of π̄) that the posterior stays within |p₁ − π̄| of π̄ whenever a_m occurs at a belief within |p₂ − π̄| of π̄; and then p₃, and then p₄, and so on. Apply this construction iteratively to choose p₁, ..., p_N. If the initial belief π₁ lies within |p_N − π̄| of π̄, then in each of the next N periods all optimal rules take a_m with chance at least 1 − η₁ in each state, and the posterior belief stays close enough to π̄, provided a_m occurs each period. So a_m occurs in each of the first N periods with probability at least (1 − η₁)^N in each state starting from π₁, and η₁ can be chosen arbitrarily small, so this is at least 1 − ε. □
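As a quick arithmetic check (ours, not the paper's) of the final bound: taking η₁ = ε/N already suffices, by Bernoulli's inequality.

    # (1 - eta)^N >= 1 - N*eta = 1 - eps when eta = eps/N (Bernoulli's inequality)
    N, eps = 50, 0.01
    eta = eps / N
    assert (1 - eta) ** N >= 1 - eps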
We employ the machinery from the proof of Proposition 4. Any optimal strategy started at belief π̄ will yield some state-contingent values v^L and v^H, and the affine function h(p) which has h(0) = v^L and h(1) = v^H is then a tangent to the value function at π̄. Since it is optimal to take a_m forever at π̄, h is the affine function which intersects u(a_m, L) at p = 0 and u(a_m, H) at p = 1.

Now suppose v is not differentiable at π̄. Consider the left and right derivatives of v at π̄, with corresponding tangent lines h₁(p) and h₂(p) at belief π̄. One of those tangents, say h₁, must differ from h (when h₁ differs, necessarily h₁(0) > h(0); if instead h₂ differs, the argument is symmetric). Define v₁^L = h₁(0) > u(a_m, L) and v₁^H = h₁(1). Since u(a₁, L) > u(a_m, L), as a₁ is strictly the best action in state L, a unique λ > 0 exists satisfying v₁^L = λu(a₁, L) + (1 − λ)u(a_m, L).

As v is convex, it is differentiable almost everywhere. So let (π_k) be a sequence of beliefs converging up to π̄, with the value function differentiable at each π_k. The tangent to v at π_k is then uniquely determined for each k, and its intercepts at p = 0, 1 are v^L(π_k) and v^H(π_k), the state-dependent payoffs of any optimal strategy started at π_k. By convexity of v, v^L(π_k) ≥ v₁^L.

Now choose N so large and ε so small that 1 − (1 − δ^N)(1 − ε) ≤ λ/2. Then by Claim 3, for all π_k close enough to π̄, action a_m is taken in each of the first N periods of the optimal strategy starting at π_k with probability at least 1 − ε. The expected value v^L(π_k) in state L is then at most

  v^L(π_k) ≤ (1 − δ^N)(1 − ε)u(a_m, L) + [1 − (1 − δ^N)(1 − ε)]u(a₁, L) ≤ (1 − λ/2)u(a_m, L) + (λ/2)u(a₁, L) < λu(a₁, L) + (1 − λ)u(a_m, L) = v₁^L ≤ v^L(π_k),

as noted above. Contradiction. □
A.4 Proof of Proposition 7

Near J_m(δ) we should expect to observe action a_m. The next lemma states that when other actions are observed, either they lead to a drastic revision of beliefs, or there was a non-negligible probability of observing some other action which would overturn the cascade.
Claim 4 (Overturning Principle) Fix m and δ ∈ (0, 1) with J_m(δ) nonempty. There exists ε ∈ (0, 1) such that for all π sufficiently close to J_m(δ), any optimal rule ξ^δ satisfies either:
(a) ψ(a_m|π, ξ^δ(π)) > 1 − ε, and |q(π, ξ^δ(π), a_k) − π| > ε for all a_k ≠ a_m that occur; or
(b) ψ(a_k|π, ξ^δ(π)) > ε/M for some a_k with |q(π, ξ^δ(π), a_k) − π| > ε.

Proof: For π ∈ J_m(δ) ⊆ J_m, it is optimal to stop learning, and there is no information gain. By Step 6 of the proof of the Proposition, we thus need only consider π not in, but sufficiently close to, J_m(δ), within some closed subinterval I ⊂ (0, 1) with I ⊇ J_m(δ).

First, assume bounded private beliefs, and let co(supp(F)) = [b, b̄] with b < 1/2 < b̄. Let ε̄ be the minimum of μ^H([b, (2b + 1)/4]) and μ^L([(2b + 1)/4, b̄]); by the existence of informative beliefs, ε̄ > 0. Suppose ψ(a_m|π, ξ^δ(π)) > 1 − ε for some π ∈ I. Then any action a_k ≠ a_m is a.s. only taken for private beliefs within either [b, (2b + 1)/4] or [(2b + 1)/4, b̄], and any such a_k is overturning (selecting, if necessary, ε even smaller). If instead ψ(a_m|π, ξ^δ(π)) < 1 − ε, then each action is taken with chance less than 1 − ε, and so different actions are taken at the two extreme private beliefs (by the threshold structure of the optimal rule). At least one of the M actions then occurs with chance at least ε/M and overturns the belief, as in (b).

Next consider unbounded private beliefs, with π near the cascade sets {0} or {1}: say π near 0. Let the optimal policy induce with positive chance an action a_k ≠ a₁ with q(π, ξ^δ(π), a_k) near π. This is impossible: consider the altered policy that redirects into a₁ instead the private beliefs that ξ^δ(π) maps into a_k. When π is near 0, a_k incurs a strict myopic loss and captures no information gain, so the first-period payoff gain of the altered policy is boundedly positive, while it suffers an arbitrarily small loss in future value (for v is continuous in beliefs, and posterior beliefs shift very little, as a_k was nearly uninformative). So the altered policy yields a strict improvement: contradiction. Consequently, any action a_k ≠ a₁ taken with positive chance induces a posterior boundedly far from π, as claimed. □

For the proof of Proposition 7, we first cite the extended (conditional) Second Borel-Cantelli Lemma, Corollary 5.29 of Breiman (1968): Let Y₁, Y₂, ... be any stochastic process, and A_n ∈ T(Y₁, ..., Y_n), the induced sigma-field. Then almost surely

  {ω | ω ∈ A_n infinitely often (i.o.)} = {ω | Σ_{n=1}^∞ P(A_{n+1} | Y₁, ..., Y_n) = ∞}.
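The lemma is easy to visualize by simulation. In the sketch below (ours, not the paper's; the wandering conditional probability is an arbitrary illustrative choice), the conditional chance of the event stays above c > 0 every period, so the conditional probabilities sum to infinity, and occurrences keep arriving arbitrarily late, as the lemma asserts:

    import numpy as np

    rng = np.random.default_rng(0)
    c, T = 0.05, 200_000
    # an exogenous sequence of conditional chances of A_{n+1}, each at least c
    p = np.clip(rng.beta(2, 2, size=T), c, 1.0)
    hits = rng.random(T) < p               # realize each event A_{n+1}
    print(hits.sum())                      # occurrences grow without bound in T
    print(np.flatnonzero(hits)[-5:])       # and keep arriving arbitrarily late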
Fix an optimal policy ξ^δ. For fixed m, define the events E_n = {π_n is ε-close to J_m(δ)}, F_n = {ψ(a_m|π_n, ξ^δ(π_n)) < 1 − ε}, and G_{n+1} = {|π_{n+1} − π_n| > ε}, where ε is chosen to satisfy Claim 4 for all actions a₁, a₂, ..., a_M. If E_n ∩ F_n is true, then Claim 4 scenario (b) must obtain: some action occurs with chance at least ε/M and overturns the belief, and therefore P(G_{n+1}|π_n) > ε/M. So Σ_{n=1}^∞ P(G_{n+1}|π₁, ..., π_n) = ∞ on the event where E_n ∩ F_n occurs i.o., and by the above Borel-Cantelli Lemma, G_n obtains i.o. on that event. But (π_n) almost surely converges, so G_n occurs i.o. with probability zero. By implication, E_n ∩ F_n occurs only finitely many times.

Restrict attention to the event that (π_n) converges to a limit in J_m(δ); summing over all m, these events carry probability mass one, by Lemma 3 and Proposition 1. On this event, E_n is eventually true, and thus E_n ∩ F_n^c is eventually true. But given E_n ∩ F_n^c, scenario (a) of Claim 4 obtains: any action a_k ≠ a_m that occurs implies G_{n+1}; and since (π_n) converges, G_{n+1} obtains only finitely often. Perforce, action a_m is eventually taken forever on this event, proving Proposition 7. □
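The conclusion, that play eventually locks into some action, is easy to see by simulation even in the myopic benchmark. A minimal sketch (ours; symmetric two-point signals of precision 0.7, true state H, and all names and values are illustrative assumptions):

    import numpy as np

    # Myopic observational learning with bounded two-point signals: the public
    # belief converges and the action sequence locks in (a BHW-style cascade),
    # possibly on the wrong action.
    rng = np.random.default_rng(1)
    P, pi = 0.7, 0.5                       # signal precision; initial public belief
    actions = []
    for n in range(300):
        hi = rng.random() < P              # private signal, drawn in true state H
        q_hi = pi * P / (pi * P + (1 - pi) * (1 - P))        # posterior after hi
        q_lo = pi * (1 - P) / (pi * (1 - P) + (1 - pi) * P)  # posterior after lo
        a_hi, a_lo = int(q_hi > 0.5), int(q_lo > 0.5)        # myopic choices
        actions.append(a_hi if hi else a_lo)
        if a_hi != a_lo:                   # informative action: observers update
            pi = q_hi if hi else q_lo
        # else: a cascade, where the action reveals nothing and pi freezes
    print(actions[-10:], round(pi, 3))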
References
Aghion, P., P. Bolton, C. Harris, and B. Jullien (1991): "Optimal Learning by Experimentation," Review of Economic Studies, 58, 621-654.

Banerjee, A. V. (1992): "A Simple Model of Herd Behavior," Quarterly Journal of Economics, 107, 797-817.
Bertsekas, D. (1987): Dynamic Programming: Deterministic and Stochastic Models. Prentice Hall, Englewood Cliffs, N.J.

Bikhchandani, S., D. Hirshleifer, and I. Welch (1992): "A Theory of Fads, Fashion, Custom, and Cultural Change as Informational Cascades," Journal of Political Economy, 100, 992-1026.

Breiman, L. (1968): Probability. Addison-Wesley, Reading, Mass.

Dow, J. (1991): "Search Decisions with Limited Memory," Review of Economic Studies, 58, 1-14.

Easley, D., and N. Kiefer (1988): "Controlling a Stochastic Process with Unknown Parameters," Econometrica, 56, 1045-1064.

Fusselman, J. M., and L. J. Mirman (1993): "Experimental Consumption for a General Class of Disturbance Densities," in General Equilibrium, Growth, and Trade II, ed. by J. G. et al., pp. 367-392. Academic Press, San Diego.

Gittins, J. C. (1979): "Bandit Processes and Dynamic Allocation Indices," Journal of the Royal Statistical Society, Series B, 41, 148-177.

Hildenbrand, W. (1974): Core and Equilibria of a Large Economy. Princeton University Press, Princeton.

McLennan, A. (1984): "Price Dispersion and Incomplete Learning in the Long Run," Journal of Economic Dynamics and Control, 7, 331-347.

Meyer, M. A. (1991): "Learning from Coarse Information: Biased Contests and Career Profiles," Review of Economic Studies, 58, 15-41.

Rothschild, M. (1974): "A Two-Armed Bandit Theory of Market Pricing," Journal of Economic Theory, 9, 185-202.

Scheinkman, J., and J. Schechtman (1983): "A Simple Competitive Model with Production and Storage," Review of Economic Studies, 50, 427-441.

Smith, L., and P. Sørensen (1997a): "Pathological Outcomes of Observational Learning," MIT mimeo.

(1997b): "Rational Social Learning Through Random Sampling," MIT mimeo.

Sobel, M. (1953): "An Essentially Complete Class of Decision Functions for Certain Standard Sequential Problems," Annals of Mathematical Statistics, 24, 319-337.

Stokey, N. L., and R. E. Lucas (1989): Recursive Methods in Economic Dynamics. Harvard University Press, Cambridge, Mass.