Digitized by the Internet Archive in 2011 with funding from Boston Library Consortium Member Libraries http://www.archive.org/details/informationalher00smit2 HB31 .M415 (rO\ working paper department of economics INFORMATIONAL HERDING AND OPTIMAL EXPERIMENTATION Lones Smith Peter Sorensen No. 97-22 October, 1997 massachusetts institute of technology 50 memorial drive Cambridge, mass. 02139 _, RN 191998 WORKING PAPER DEPARTMENT OF ECONOMICS INFORMATIONAL HERDING AND OPTIMAL EXPERIMENTATION Lones Smith Peter Sorensen No. 97-22 October, 1997 MASSACHUSETTS INSTITUTE OF TECHNOLOGY 50 MEMORIAL DRIVE CAMBRIDGE, MASS. 02139 Informational Herding and Optimal Exp erimen tat ion* Lones Smith''' Economics Dept., M.I.T Peter S0rensen Nuffield College, October 30, : : ' Oxford 1997 Abstract We explore the constrained efficient observational learning model — as when individuals care about successors, or are so induced by an informationally- constrained social planner. We find that when the herding externality rectly internalized in this fashion, incorrect herds To describe behaviour in this still is The optimal Choose the action with the highest index. simply: cor- model, we exhibit a set of indices that capture the privately estimated social value of every action. rule is obtain. decision While they have the flavour of Gittins indices, they also incorporate the potential to signal to successors. We then apply these indices to establish a key comparative static, that the set of stationary 'cascade' beliefs strictly shrinks as the planner grows more patient. transfer The is We also show how these indices yield a set of history-dependent payments that decentralize the constrained lead inspiration for the paper is social optimum. our proof that informational herding but a special case of myopic single person experimentation. In other words, the incorrect herding outcome is not a new phenomenon, but rather just the familiar failure of complete learning in an optimal experimentation problem. *A briefer paper without the index rules and associated results appeared under the title "Informational Herding as Experimentation Deja Vu." Still earlier, the proposed mapping appeared in July 1995 as a later section in our companion paper, "Pathological Outcomes of Observational Learning." We thank Abhijit Banerjee, Meg Meyer, Christopher Wallace, and seminar participants at the MIT theory lunch, the Stockholm School of Economics, the Stony Brook Summer Festival on Game Theory (1996), Copenhagen University, and the 1997 European Winter Meeting of the Econometric Society (Lisbon) for comments on various versions. Smith gratefully acknowledges financial support for this work from NSF grants SBR9422988 and SBR-9711885, and S0rensen equally thanks the Danish Social Sciences Research Council. t e-mail address: ^e-mail address: lones@lones.mit.edu peter.sorensen@nuf.ox.ac.uk 1. The last INTRODUCTION few years has seen a flood of research on informational herding. We ourselves have actively participated in this research herd (Smith and Sorensen (1997a), or SS) that was sparked independently by Banerjee (1992) and Welch The context (1992). is seductively simple: An may In this context, action arises is infinite sequence of individuals must condition his decision both on his endowed private signal about the and on state of the world, Bikhchandani, Hirshleifer, and Everyone has identical preferences and decide on an action choice from a finite menu. menus, and each BHW: all predecessors' decisions (but cannot see their private signals). SS showed that beliefs BHW and 1 taken with probability one. -- namely, after some point, converge upon a cascade all — i.e. where only one Banerjee showed that a 'herd' eventually decision-makers (Z\Ms) make the identical choice, Clarifying their next result, SS also showed that this herd possibly unwise. is the T>A4s' private signals are uniformly bounded in incorrect with positive probability iff strength. This simple pathological outcome has understandably attracted much The main thrust of this paper looking behavior. failure, is ex post fanfare. an analysis of the herding externality with forward- Contrary to the popular impression of incorrect herds as a market we show that herding is constrained-efficient: Even when VAis internalize the herding externality by placing very low weight on their private gain, incorrect herds obtain whenever private signals are boundedly powerful; however, they occur with a vanishing chance as the discount factor tends to one. The VAis' lack of concern just the extent of incomplete learning, Social information is and not its for successors affects existence. poorly aggregated by action observation, 2 as individuals choose actions that reveal almost none of their information. But suppose that early either wished to help latecomers, or were so induced, better signalled their private information. Exactly to the VAA's behaviour to maximize in this social welfare? We may VM.S by taking more revealing actions that what instructions should a planner give provide a compact description of optimal constrained efficient herding model. We capture the privately estimated social value of every action. produce a set of indices that The optimal decision rule is simply to choose the action with the highest index. While they have the flavour of Gittins' (1979) multi-armed bandit indices, they also must incorporate the potential to signal to successors. So unlike Gittins' perfect information context, an action's index is not simply present social value. Rather, individuals' signals are hidden from view, and therefore its social rewards must be translated into private incentives using the marginal social value. *As convergence may take infinite time (see SS), we have a limit cascade, as opposed to BHW's cascade. Dow (1991) and Meyer (1991) have also studied the nature of such a coarse information process for 2 different contexts: search theory and organizational design. We apply these indices to establish a key comparative Our second VMs when strictly shrinks application of the indices A centralized. is is to the equivalent to social planner benefit of posterity. problem of the informationally- maximize the present discounted value For the altruistic herding fiction welfare. cascade belief set grow more patient. constrained social planner, whose goal VMs' static, that the only worthy of study is must encourage early VMs set of history-dependent are given by our indices, and have a rather can be de- to sacrifice for the informational Such an outcome can be decentralized as a constrained by means of a simple if it of all social optimum balanced-budget transfers. These transfers intuitive economic meaning. This paper was originally sparked by a simple question about informational herding: Haven't we seen this before? We were piqued by its similarity to the familiar failure of complete learning in an optimal experimentation problem. Rothschild's (1974) analysis of the two-armed bandit is a classic example: An impatient monopolist optimally experiments with two possible prices each period, with fixed but uncertain purchase chances for each price. two Rothschild showed that the monopolist prices, had the and (u) selects the less profitable price clear ring of: (i) an action herd occurs, and This paper also formally herding is not a (i) eventually settles down on one with positive probability. To (ii) with positive chance justifies this intuitive link. We new phenomenon, but a camouflaged context is of the us, this misguided. prove that informational for an old one: myopic single person experimentation, with possible incomplete learning. Our proof respects the herding paradigm quintessence that predecessors' signals be hidden from view. In a replace all VMs by agent machines that automatically into action choices; the true experimenter then map any nutshell, we realized private signals must furnish these automata with optimal We therefore reinterpret actions in the herding model as signals, and the VMs' decision rules as his allowed actions. history-contingent 'decision rules'. the experimenter's stochastic We perform this formal embedding The organization problem anyway, it of this paper it as follows. makes more sense So section 2 describes a general re-interprets is for a very general observational learning context. As we must 3 solve the experimentation first to proceed backwards, ending with the economics. infinite player observational learning model, and then as an optimal single-person experimentation model. Focusing on the finite- action informational herding model, section 3 characterizes the experimenter's limiting beliefs. The altruistic herding model is introduced in section 4, and optimal strategies are described using index rules; these are then applied for our key comparative static, as well as a description in section 5 of the optimal transfers for the equivalent planner's problem. A conclusion affords a broader perspective on our findings. 3 Some proofs are appendicized. Such an embedding is well-known and obvious for rational expectations pricing models (eg. Scheinkman and Schechtman (1983)), since the price is publicly observed, and an inverse mapping is not required. TWO EQUIVALENT LEARNING MODELS 2. In this section, we sumes SS, and thus embedded in the up a rather general observational learning model, that sub- first set BHW and Banerjee (1992). All models experimentation framework. Afterwards, we specialize our findings. An Information. sequence of decision-makers infinite actions in that exogenous order. There The elements a given of the parameter space common state of the world. is T) (f2, signal received by a T>M. SS (Lemma in . . . takes measure v over Q. 1), actually his private is 1, 2, are referred to as states of the world. There partially informative private As shown (DMs) n = uncertainty about the payoffs from these actions. prior belief, the probability The nth VA4 observes a Q then formally Observational Learning 2.1 is in this class are random we may assume we belief, i.e. a n G E about the signal WLOG let that the private a be the measure over which results from Bayesian updating given the prior v and observation of the private Signals thus belong to E, the space of probability measures over (f],^7 ), and signal. the associated sigma-algebra. Conditional on the state, the signals are assumed to be across 2? .Ms, trivialities, drawn according assume that not to the probability w all /j, we \i insist that all to ensure that signals are informative. no signal will perfectly reveal the same everyone for all is DMs, rational, where u i.e. : A x Q h-> R is measurable. DM's vate belief. predecessors, As we simply assume that a in SS, rule given the private belief expected payoff u a (p) 4 A, equipped in state uj common knowledge G Q, that upon an entire action history h. pri- can compute the behaviour of all it a = ir(h) G E for Knowing any history yields the posterior belief these probabilities, Bayes' rule h. A final application of Bayes' p G E. VA4 picks the action a G A which maximizes his — Jn u(a, u>)dp(to). We assume that such an optimal action a = a(p) Given the posterior The DM G Q. and can use the common prior to calculate the ex ante (time-0) probability implies a unique public belief 4 and the u> Bayes-optimal decision uses the observed action history and his own distribution over action profiles h in either state. exists. It is u>) set seeks to maximize his expected payoff. Before deciding action, everyone first observes his private signal/belief Each (a.c), for Everyone chooses from a fixed action with the sigma-algebra A. Action a earns a nonstochastic payoff u(a, the i.i.d. G £1 To avoid in state ui p? be mutually absolutely continuous Bayesian Decision-Making. is some are (a.s.) identical, so that Each distribution may contain atoms, but state of the world, measure w Q belief p, the nth solution defines an optimal decision rule x from E to A(A), the space of Absent a unique solution, we must take a measurable selection from the solution correspondence. probability measures over (A, A). Let X be the space of such maps x x produces an implied distribution over actions simultaneously The Stochastic Process of Beliefs. : E A(A). i-> A rule for all private beliefs a. Since the optimal x depends on and it, the probability measure of signals a depends on the state u, the implied distribution over actions a depends on both f x(a)(a)/j, in u '(da), u> and the optimal decision and unconditional on the state, it is Markov process with state-dependent 2.2 Informational of old bookes, Cometh al this A new goal first 7r n+1 . Thus, = ip(a\ui,x) This follows a (tt„) transition chances. in good science that faithe, men lere. — Geoffrey Chaucer optimization. is Herding as Experimentation Deja Vu And out Our immediate density = fn ip(a\uj,x)ir(du)). xp(a\ir,x) turn yields a distribution over next period public beliefs discrete-time The rule x. is 5 to recast the observational problem outcome stab brings us to the forgetful experimenter, as a single person who each period receives a new informative signal, takes an optimal action, and then promptly forgets his signal; the next period, he can reflect only on his action choice. But this optimal experimentation, since it assumes and is not a model of Bayes- How in fact requires irrational behaviour. then can an experimenter not observe the private signals, and yet take informative actions? To give proper context to our resolution, nice sequel to Rothschild's prices, it helps to consider McLennan (1984). This work permitted the monopolist to charge one of a continuum of but assumed only two possible linear purchase chance 'demand curves'. McLennan found that the resulting uninformative price when the demand curves crossed may well eventually be chosen by an optimizing monopolist. Rothschild's and McLennan's models give examples of potentially confounding actions, as introduced in EK: Easley and Kiefer (1988). unfocused beliefs for which they are invariants unchanged). Of particular significance state is In brief, such actions are optimal for taking the action leaves the beliefs (i.e. the proof in EK (on page 1059) that with finite and action spaces, potentially confounding actions generically do not thus complete learning must arise. anticipations of EK's general state space, whereas 6 Rothschild and insight. McLennan McLennan might be Rothschild escapes it exist, and seen as separate by means of a continuous resorts to a continuous action space. Yet there appears no escape for the herding paradigm, where both flavours of incomplete learning, limit cascades 5 See 6 Eg: payoff's in a one-armed bandit, with a potentially confounding safe arm, are not generic in The Assembly of Fowles. Line 22. M2 . Observational Learning Model State: E u> VM: n observations: 7r n Optimal action: x E Randomness in the nth experiment: a n Observable signals: a € A Belief after ir n X Optimal decision rule: x G X Private signal of nth VM: a n Action taken by each VM: a G A Density over actions: Density over observables: ip(a\u,x) il)(a\u),x) Payoffs: unobserved Payoffs: private information Embedding. Table 1: model fits well State: f2 Public Belief after nth Model Impatient Experimenter This table displays how our single-type observational learning into the impatient single person experimentation model. and confounded learning (see SS), generically arise mapping that we now puzzle suggests the inverse In recasting our general observational learning with two actions and two states. This consider. model as a single person experimentation problem, we must focus on the myopic experimenter with discount factor the observational learning story from a both the public belief beliefs pn . We may Regard Mr. n the rule x G as: unobserved, beliefs a have and perspective. Consider the nth on in shall regard VM, observing ir n but not his private signal; , and take his private signal (ii) and his action a G A for distribution p in state decision rule x described in section to, 2, 7r„ and an . (Hi) letting that The him. payoff u(a,u) that provide an additional signal of the state of the world. w uses optimally determining to an agent 'choice' machine; it who forming and acting upon his posterior separate these two steps by the conditional independence of (i) lest new his private signal X, and submitting machine observe is 7r n we Steering away from a forgetful experimenter, active experimentation). (ruling out If private then the experimenter chooses the same optimal resulting in action a £ A with chance ip(a\u),x). Thus, the observational learning model corresponds to a single-person experimentation model where: The (the rule) state space x € X. Given 1 In period n, the experimenter this choice, a chance ip(a\to,x) in state Table is fi. to. Finally, random observable EX EX statistic a G chooses an action A is realized with updates beliefs using this information alone. 7 summarizes the embedding. Notice how this addresses both lead puzzles. First, the experimenter never knows the private beliefs a, and thus does not forget them. Second, incomplete learning (bad herds) are entirely consistent with 7 This model doesn't strictly EK's generic finding of complete learning fit for models with finite EK mold, where stage payoffs depend only on the action and wefi. This is the structure of Aghion, Bolton, admit unobserved payoffs. Alternatively, we could posit that £X into the the observed signal, but (unlike here) not on the parameter (ABHJ), who Harris, and has insurance, and only sees/earns his expected payoff each period and not his fair Jullien (1991) random realized payoff. action and state spaces. Simply put, actions do not rewrites the observational learning space for EX X the infinite space is model map to actions but to signals as an experimentation model. The when one true action of decision rules. SS considered two major modifications of the informational herding paradigm. One was to add i.i.d. easily incorporated here SS DAi's noise ('crazy' preference types) to the T also allowed for decision problem. Noise is by adding an exogenous chance of a noisy signal (random action). different types of preferences, with VAds randomly drawn from one or the other type population. Multiple types can be addressed here EX that by simply imagining T chooses a T-vector of optimal decision rules from A" with (only) the choice machine observing the task and private and choosing the action a as belief, before. THE PATIENT EXPERIMENTER 3. The Reformulated Model 3.1 From now we on, restrict ourselves to the more focused herding analytic framework - a two state, finite action setting, as in SS. More states complicates but does not enrich. We prior v(L) = v(H) Let supp(^i) be the (i.e. are Q = {H,L}, assume a state space noise for = 1/2. Private belief if common If supp(/u) = co(supp(/^)) assume that no action am is is is f < f\ < . . . < fM = strategy s n for period must be used, given which in so that belief having E = [0, 1]. then private beliefs are bounded; they (0, 1), a direct arbitrarily strong private beliefs exist. sum The of these separate analyses. A= {ai, . , . . a^}. dominated, yielding the standard interval structure that action then order the actions such that a m = C i.e. if optimal exactly when the posterior p where H, the chance of state is the standard herding assumption of a finite action space We A - [0, 1] half-bounded, half-unbounded case We make a i.e. support of each probability measure pw over private beliefs EX's problem). unbounded with both states equilikely ex ante, n 7r n is . is is in some sub-interval of [0, 1]. WLOG, myopically optimal for posteriors p £ we can m -i,fm [f ], 1. a map from E to X. It The planner chooses a prescribes the rule xn € strategy profile s = turn determines the stochastic evolution of the model -- X which (si,S2, ...), a distribution over i.e. the sequences of realized actions, payoffs, and beliefs. The Value Function. Our Stokey and Lucas (1989). The value function v(-,S) with discount factor S is v(ir, 8) = sup 5 E[(l over the payoff sequences implied by ABHJ E i— R for and sections 9.1-2 of analysis here follows s. — : the planning problem >• n lu n\^}, where the expectation Yl^Li ^ " 5) Recall that u m (ir) denotes the expected payoff from a m at belief n. Since u m = nu(a m is affine, , H ) + (1 — is 7r)u(a m L) , the Bellman equation - = v(ir,S) where sup ij>(a J^ (Markov) policy for ^A" The optimum in a Theorem 4.1), and Lemma 1 [(1 - is map a £ : when a 7r [0, 1] — X > observed and rule x is (so the rule given belief For any discount factor S < 1, EX alone, such that action a m has an optimal policy £ 5 = is strictly < 9 < 9\ . by producing the 2 and EX Proof: For the = thresholds < 9q belief < 9\ • EX. Any optimal it of • < 9m = • 1 it 9 m < Assume 2 , let E, (i = 1, 2) — u m2 is , and vice x G rule = 9m when a \ be those beliefs in versa. (# m _i,(9 This . U E2 n ) (It may be we least q(7r, x, a mi ) That is, depending on indifference also true with is 8 X almost surely described by is taken when a £ (#m _i,#m ), is E mapped with positive probability > q{ir, x, find q(7T,x,a mi one inequality is (Ei < q(ir,x,a m2 ). This problem is U E2 n ) [8, 1]. ) < strict. remapping . strictly is beliefs improved, unchanged. For any 9 G (0,1), define Ex(#) = Consider then the modified rule x which taken for beliefs in Ei(#), and where 9 satisfies ip(a mi \7r,x) = necessary for x to randomize over the two actions at belief 9 g(7r,x,a mi ), and similarly q(n, L x, are am 2 ) . mapped > into a mi under q(7r,x,a m2 ), with at Thus, x yields a mean preserving spread of the updated belief is weakly convex, weakly improved. But as we have just argued, the myopic payoff statistical decision < E2 a m2 ), then given our action the myopic payoff versus x, and since the continuation value function 8 beliefs . Since beliefs more in favour of state to accomplish that.) i, = and E 2 (<9) [0, 9} equals x, except that a mi •0(a m j7r, x). 1 m ), and a decreasing function, while the continuation value Next, assume that g(7r,x,a mi ) (Ei —> X. to the contrary that the sets are not almost surely ordered as Ei leading to a mi into a m2 u mi [0, 1] : M = ordering, payoffs are strictly improved with no loss of information by since summary: also ensures the greatest information such that action a m posteriors are perversely-ordered as If In . prove that any rule x without an interval structure can be strictly improved upon. For mi into a mr < . = 9m riskiest posterior belief distribution. randomizes between a m and a m+ We ABHJ, 2 proves. Intuitively, not only does the interval structure respect the action order to yield high immediate payoffs, but Lemma . optimal when a G between between a m and a m +i prevails at the knife-edge a value, is £(7r)). SS shows that the myopic experimenter maps higher There are thresholds Lemma applied. is it (1) > exists (eg. achieved by some such policy, generically written £ is into higher actions: patience, as + 8v(q(ir,x,am ),6)] S)um (q(n,x,am )) Markovian decision problem with discount factor 6 Interval Structure. 7r m \n,x) the Bayes-updated belief from q(ir, x, a) is A < is its strictly expectation improved. not without history. Sobel (1953) investigated an interval structure problem, and more recently, Dow (1991) in a two period search model. is 7 is in a simple . Our may assumption that no action earlier dominated actions VM. no longer so innocent, and in principle provides additional signalling power. with a larger finite We Essentially, it leaves alphabet with which to signal his continuous private belief. But including such actions would invalidate the above structure. is have some bite for very patient EX. For the information value alone, taking in fact each dominated is and thus perhaps the proof, leave this stone unturned, acknowledging the possible loss of generality. Long Run Behavior 3.2 As generally the case with Bayesian learning, convergence is of the martingale convergence theorem to the belief process. EX down, but also Lemma The 3 some a.s. to The next is belief process random limiting result A never dead wrong about the state. {irn ) is when one is deduced by application Not only must proof is found beliefs settle in SS. a martingale unconditional on the state, converging variable vr^. The limit ir^ is concentrated on (0, 1] in state action chosen is is only possible during a cascade, chosen almost surely, and thus is generalizes the analysis for 5 sets are reflected in the Ji(5) < • • < • (Cascade Sets) 1 Jm{o~), such that For (b) With unbounded private and £ Jm(°~) = [0, 1), There exist EX M optimally chooses x all beliefs, the < n(5) while lim^i Proof: < it(5) J\(5) 1. = a.s. Jm (o~) other For large {0} and enough empty) subintervals of A inducing a m £ on the U sets Ji(6) • • is true any belief n not probability. By in 5, all lim^i Jm(^) any cascade U Jm(o~). = {0} are empty. = = [0,7r(5)] and Jm(o~) = [7f(<5),l], set, at least where cascade sets disappear except for J\ and Jm, {1}- that a limit cascade must occur, as SS call - - • [0,1], Jm {6). iffn 6 All but the initial limit belief result are established in the appendix. that one the cascade extreme cascade sets are nonempty with J\(S) If the private beliefs are bounded, then Ji(5) < how illustration of (possibly the limit belief ix^ is concentrated and {1}> an in SS. See figure 1 for directly (7r n ) shape of the optimal value function. (a) all 6 = The next uninformative. characterization of the stationary points of the stochastic process of beliefs Proposition H an expression of EK's Theorem 5 that the limit belief ir^ precludes is further learning. In the informational herding model, this (c) interval two signals in the interval structure, the highest such signal A is it — To see why observe that for are realized with positive more likely in state the lowest more likely in state L. So the next period's belief differs from tt H, and with positive probability. Intuitively, or by the characterization result for Markov-martingale processes in appendix B of SS, it cannot lie in the support of 8 tt^. . Figure Typical value function. Stylized graph 1: of v(tt,S), 5 > with three actions. 0, u(ai,L) u(a 2 ,L) H) u(ai, u{a 3 ,L) o The proof of this MS) MS) result also MS) shows that the larger i the weakly smaller are is S, all cascade Indeed, this drops out rather easily from the monotonicity of the value function sets: We in 5. and defer asserting this result for now, in fact Proposition 5 leverages weak monotonicity, and the index rules that we introduce later on, to deduce strict monotonicity. Proposition 2 (Convergence of Beliefs) (a) For unbounded private (b) With bounded private Jm{S), there (c) is concentrated on the truth for any 8 G beliefs, tt,^ is beliefs, Consider a solution of EX's problem. learning is H positive probability in state incomplete for any S 6 [0,1): that tTqo Part (a) follows from Theorem part (c). We now of SS. 1 Lemma on state H . private beliefs a, the sequence (£ n ) its prior places on J\(S) in state vanishes as S and part l-a,b, (b) mean (1 ratio ln — (1 — enough 7T n )/n n to is 1, n^ places is — 7r (1 )/7ro. H must vanish as 5 —> Since lim^i k{S) = 0, fall value function v EX v(ir) strictly convex in beliefs 7T, um{t^) for plete learning: it near lumpy 1, 9, where learning optimally behaves myopically for very extreme = just as in all 1 in weight oo for the all mean 71-00 1. under the rubric of EK's Theorem is 1 the weight that Observe how incomplete learning besets even an extremely patient EX. So lem does not "f a martingale conditional — o)jo bounded above by some I < bounded above by 1(1 — k(S))/k(S), and Because likelihood ratio must equal beliefs and Proposition First, Proposition 1 assures us that for S close G tt extend that proof to establish the limiting result for S t and Jm{S). The the likelihood in J\(S) of ^oo 3 Unless not in Jm(S). is The chance of incomplete learning with bounded private Proof: [0, 1). is it is shown that complete beliefs: v(n) = if for 5 near Ui(ir) for ir this prob- the optimal 1. For here, near 0, and both affine functions. This points to the source of the incom- signals (actions) rather than impatience. It is simply individuals' inability to properly signal their private information that frustrates the learning process. AND INDEX RULES ALTRUISTIC HERDING 4. The Welfare Theorem 4.1 We now shift focus from the problem facing EX to the informational herding context with a sequence of VM's. For VM but subject to the usual informational herding constraints (observable altruistic, is organizational simplicity, actions, unobservable signals). Define Nash equilibrium of the discounted welfare of game where all posterity, an result is VM n = every is Proposition 3 For any discount factor Consequently, an AHE of all successors in is £*(7r). Then EX taking x in the (i) S £'[/|/t] proved < . . . seeks to 1, = — 5) suppose that every (AHE) maximize the present YLkLn ^""^fcl 71"]- Define Xlwen ^i^fi 10 )- for clarity. any optimal policy £ 5 for EX is an AHE. VM and a public belief Assume that £ the behaviour strategy an AHE, but that the VM has some rule x that a better reply than s tt. is is can improve his value at n by fully mimicking this deviation, first period and thereafter (ii) continuing with £ s as history had been generated by ^(/t). This contradicts the optimality of 4.2 as a Bayes- exists. Fix a given Proof: 1, 2, themselves included: E[(l quite natural, but first altruistic herding equilibrium the expected reward of the payoff function / as The next we simply if the first EX's by i.e. period policy. Choosing the Best Action Recall the classical problem of the multi-armed bandit: 9 A given patient experimenter each period must choose one of n actions, each having an uncertain independent reward distribution. The experimenter must therefore carefully trade-off the informational and myopic payoffs ssociated with each action. Gittins (1979) showed that optimal behavior in this model can be described by simple index rules: Attach to each action the value of the problem with just that action and the largest possible lump sum retirement reward yielding indifference. Finally, each period, just choose the action with the highest index. We now argue that the optimal policy employed by the decision makers in an a similarly appealing form: For a given public belief 7r and private belief p, AHE has VM chooses the action a m with the largest index u^(7r,p). This measure will incorporate the social payoff, as did the Gittins index, but as privately estimated by the VM. Before stating the next major result, recall that dg(y) denotes the subdifferential of the convex function g at y -- namely, the set of for all x. 3 An Moreover, dg{y) excellent, if brief, is all (strictly) increasing in treatment is found in §6.5 of 10 A that obey g(x) > y (strict) convexity. Bertsekas (1987). g(y) + A • (x — y) Proposition 4 (Index Rules) exist Xm G s dv(q(iv, £ (n), a AHE public beliefs m = 6 (l-S)u m {p{TT,p))+6{v{q{iT,t {n),a m ) : 5) and private it + \m — irp/[jTp+(l — 7r)(l— p)} is the posterior of it 5 5 take action a m when w m {'K,p) = max^ w k {tt,p). where p(ir,p) rule is to For strategy f*. 1,2 ... ,M, there such that the average expected present discounted value of maker faced with action a m to the decision «i {n,p) = m ), 5) Fix any belief p is 5 [p(iT,p)-q{TT^ {ir),a m )}} (2) given p. So the optimal decision VM facing public belief Action We now calculate the payoffs from each of the M available actions a m and a corresponding a m of the VM induces the public posterior belief qm = Fix a given decision maker Proof: it. a\, q{rc, . , . . a^f. d {^), C, ), state-contingent average expected discounted future payoffs of his successors, say v%. Clearly, the VM's posterior belief p, i.e. payoff streams, it expected value of any such of the form h m (p) suffices to = E[v^\n] vNM payoff stream = pv" + — (1 p)v^. employ the EX's reckoning, since the v^ and afhne in his is To evaluate these VM and EX entertain the same future payoff objectives. Because the affine function h m presumes the behaviour which is , same strategy starting at an arbitrary public belief < v(r). the subdifferential dv(qm ). The h m (r); therefore, h m (r) That EX v(q m ). Next, by employing the r as at q m EX , can achieve the value Thus, the slope of this affine function necessarily lies in present value expression (2) follows. can always ensure himself a payoff function tangent to the value function simply not adjusting his policy essentially was implies convexity of the value function (eg. 4.3 Strict Inclusion of We h^qm) = optimal starting at belief qm we have critical to this proof. Lemma 2 of This simple idea also Fusselman and Mirman (1993)). Cascade Sets next use our index rule characterization to prove a key comparative static of our forward-looking informational herding model: As individuals grow more patient, the set of cascade beliefs which foreclose on learning strictly shrinks. Before proceeding, we need two preliminary lemmata. Lemma 4 (Strict Value Monotonicity) outside the cascade sets: for 82 The detailed proof of this result Provided EX's strategy his continuation value for > in must Si, v(ir, is The value function increases <5 2 ) > v(ir, 81) all -k appendicized, but the idea some future eventuality strictly for £ Ji^) U is We who more • U Jm{&2)- non-myopic action, then show that this holds any continuation public belief outside both cascade sets Jm (5i) patient player, • quite straightforward. strictly prefers a exceed his myopic value. strictly with 5 D Jm (d 2 ). So a more highly weights the continuation value, will enjoy a higher value. 11 We we can characaterize also exploit the fact that at the edge of cascade sets. 10 For context, differentiability of the value function hard to identify primitive in general very it is assumptions which guarantee the differentiability of our value function. Call rules x and y equivalent Lemma they can be represented by the same thresholds (with associated mixing). if 5 (Differentiability) that all rules optimal at Let > v' implies Tgv ^ Then > as v (1) = Also, T$ Tsv'. an endpoint of cascade 0, 1 be are equivalent. tt Write the Bellman equation usual, v it v(-,5) is and T$v, differentiable in the belief at call tt. T$ the Bellman operator. a contraction, and v(-,8) is Jm (5). Assume set is its As unique fixed point in the space of bounded, continuous, weakly convex functions. We can major comparative finally establish the about foreclosing on further learning at some belief indifferent then he strictly prefers to learn if he is shrink strictly Let r Proposition 1 when 8 rises: = Jm (Si), the inf asserts continuity oiv(7T,8i) Case 1 f= — u m (ir) Assume . Va m e A, Jm (8 2 ) if 62 it, 81 myopically optimal at r 4, v(q{ir,x, employ the rule v(r,S2 ) a^),^) > Q «/m (0), v(q(ir,x,ak),8i) x with the discount factor ^ ip(aj\r,x)[(l - > ^2 ip(aj\r,x)[(l -8 > ^ set. {k\v(tt,8i) All non-empty cascade sets 0, then Jm (<!>2) C Jm (6i). As Step 4 52 , — u m (ir) = 0} is closed by are two cases. 81 , some optimal any 5i-cascade set. x takes [For since the optimal rule x cannot almost surely (j 7^ m) at a stationary belief.] > u k (q(iT,x,ak)), and since So from we can always we must have 8 2 )u J (q(r,x,a J )) l of the proof of Instead, with positive probability, induce any other myopically suboptimal action dj Lemma beliefs. belief q(ir, x, a*) not in G Jm{8\) is barely in a cascade), that at public belief r and with discount factor . SX patient. and Jm (Si) Jm (Si) = (i.e. we need only prove r^ Jm (5 2 ). There some action a k producing a posterior is more edge of the cascade left Jm(Si), an d in > x does not almost surely take action a m rule am slightly Assume bounded Proposition 5 (Strict Inclusion) Proof: static of this paper, that if )u 3 (q(r,x,a J )) + 8 2 v(q(r,x aJ ),82 )] + Siv(q(r,x, : a.j), Si)] = v(r,5i) a.j€A Consequently, we have v(r,5 2 ) Case 8\ is 2. unique. > v(r,Si) = u m (r) Next suppose that the optimal Then the and so r <£ Jm{S2 ). rule at public belief r with discount factor partial derivative V\{r,8\) exists by Lemma 5. By the convexity of the value function, any selection from the subdifferential dv(ir) converges to Vi(r,8\) 10 We Nyarko thank Rabah Amir, David Easley, Andrew McLennan, Paul Milgrom, Len Mirman, and Yaw about the differentiability of the value function in experimentation problems. for private discussions 12 as increases to 7r Since the optimal rule correspondence r. Maximum Theorem, and the uniquely valued at upper hemicontinuous by is the posterior belief q{n £ Sl (n) r, continuous in n at r for any rule optimal selection ^ 5l and any action — Let b optimal rule at r almost surely prescribes action a m q(r,£ Sl = (r),a m -x) continuous in of «i and m m p{r,b). By (ir,p) at (r, b). , and is their definition, u>^(7r,p) [In = Jm (5i), while w^(ir,b) < w T^_ set = w r^_ Wm(r,b) l (r,b) <5i l n < (7r,b) for way = r, v(r,8i) of contradiction, iw r ^_ > [to£(r,$) = (5 2 -5i) 1 ($2 - since r is and r then jointly (Tr,p) are between the slopes lies continuous and vanishing at is + A^(p — (r, 6), r) evaluates «;^_ 1 (r, we then have the - ^_!(r,6)] [u < m (p(r,6)) fc) b), (r, 6), since r lies in the cascade the endpoint of Jm (8\). So in a very useful form: Si) [u m (p(r,b)) = (<*2 - <$i) [u = (<5 2 - «Ji) [fim(p(r,b)) [ti£(r,$) A^ is - Ajj(p(r,&) the slope of u m , r)] because the the prospect of taking action a m forever. and therefore conclude that > u^^r, fr), i.e. r = inf r <£ Jm (5 2 ). By Jm (<52 ). Subtracting - ^_!(r,6)] -w(r,£i)] MpM)A) - - um -i(p(r,b)) +82 v(p(r,bL),81 ) - 4, - u(r,*i) following contradiction: -um _i(p(r,6)) - - u(p(r,6),<$i) assume that w^(r,b) +Mp(r,&), > = am ) (3) - um -i{p{r,b)) + shall prove that wf%(r,b) w m{ r ,k) > \ m -\ 1 by continuity. This equality can be rewritten [u m (p(r,b)) function h m (p) w r^_ and > w^^r, b) b) Moreover, from the previous proof of Proposition We As the -u m -i{p(r,b)) um (p(r,b)) = Also, wf^(r, p(r,b).] q(r, £ 5l {r), let w^^, the expression for multiplied by a function that given q(r,£ 6l (r),a m -i) we , is a,k) a^. supp(p) be the lower endpoint of the private belief distribution. inf , , - r) + 5 1 A^(p(r,6) - r) ~ ^A^(p(r,6) - r) + <5iA^(p(r,6) - r) A^(p(r,6) - <5 2 A^(p(r,6) -v(r,6i)] 51 v(p(r,b),8i) m {p(r,k)) - u m ~i(p(r,b)) ~ «Mi) + - u m -i(p(r,b))} /8 1 - v{p{r,b),8x ) r)] > Here's a detailed justification. Under the assumption that r £ Jm (8i), one optimal policy induces a m almost surely at belief = The first when r r, so that g(r, £* (r),a m _i) l (r), a m _i) equality then follows substituting (2) for each index, and using v(r, 52 ) £ Jm (8i) D Jm (5 2 ). The second applies (3). The second equality A^ inequality exploits and v(p(r,b),82) > v(p(r,b),8i), as given by r q(r, £ S2 £ Jm {8]) C Jm (0), so that a m is is simple algebra, while the = A^ Lemma The final equality um ), final inequality follows since myopically optimal at p(r,b). 13 = p(r,b). = v(r, 8i) (true as both are the slope of 4. £f D CONSTRAINED EFFICIENT HERDING 5. Let us turn full circle, and consider once more the informational herding paradigm as played by selfish individuals. Let the realized payoff sequence be (uk). Reinterpret £ A"s problem as that of an informationally constrained the expected discounted average welfare E[{\ herding model. Observe how SV's and <f social planner — 8) Y^Li n_1 ^ ^nM SV trying to maximize of the individuals in the A"s objectives are perfectly aligned. To respect that key herding informational assumption that actions but not signals are observed, we further assume that the SV neither knows the state nor can observe the individuals' private but can both observe and tax or subsidize any actions taken. signals, How does SV away from the myopic solution to £A"s problem? Given steer the choices the current public belief an individual takes action a G if ir, A (possibly negative) transfer r(o|7r). Nash equilibrium = (CHE) is 1,2, ... seeks to expected one-shot myopic payoff u(a,7r) plus incurred transfers r{a\n). his such incentives, our proof in threshold rules Since the is still SVs program, the best A constrained herding equilibrium game where every T>M. n of the repeated he then receives the .4, Lemma valid, for policy SV is any the same observed action history as was £X's in Lemma 6 VAi (CEHE) constrained- efficient herding equilibrium this constrained first best implement SX's optimal to is a CHE For any discount factor a maps < S 1, the optimal policy £ into the posterior p(ir, a) = 5 Lemma for £X is Ka/[ira-\- (1 1. a threshold 9 m must solve the indifference equation u m (p(7r, 9m )) u m+ i(p(iT, 9m )) + r(a m+ i|7r). how So the transfer difference r(a m _ 1 |7r) individuals trade-off between the two actions, and the threshold belief is taking action a m _i rather than a m We want to provide or subsidize actions Conversely, some when ir if Our > action indices shed 5, SV from and thus some more and myopic payoffs in a SV can ensure that the i.e. for -k SV will not tax £ Jm (5) C Jm . perforce wishes to encourage nonmyopic actions, and zero. Indeed, E Jm {o~) C Jm is a strict inclusion SV. beyond the mere fact light CEHE ix transfers are not identically zero for a patient on the optimal that 'experimentation' (making non-myopic choices) transfer r(a m |7r) alone matters of these transfers. Clearly, her desired one will be chosen anyway, by Proposition + r(a m |7r) = 9^) by suitably adjusting this net premium for some characterization £ Jm {o~), the . transfers intuitively will differ for all 5 = optimally chosen (9 m — CEHE. — 7r)(l — a)], selfish herder's for rule x*. where the transfers achieve outcome. Existence follows at once from Since the private belief Faced with transfers. to coax each is maximize 2 that individuals optimally choose private belief measurable can do a Bayes- is transfers, rewarded. Clearly, the sum of his must leave every individual who should be on the 14 knife-edge between two neighbouring optimal actions perfectly indifferent. In other words, + u m (p(iT,6m )) = w^ir 9m we need T(a m \ir) ', too, because the interval structure is X m £ dv(q(ir,£ In fact, this condition 8). Lemmata make the Proposition 6 (Optimal Transfers) — / (1 optimal by alone will lead inframarginal £>.M's to s ) 2 and 6, = Sv(q(7T, 5 (ir), (, a m )) correct decisions. For any + sufficient and myopic incentives belief there exist Xi it, (Tr),a m ),6), so that the following are efficient transfers T(a m \7r) is 8X m [p(ir, 6m ) - 6 q(ir, {tt), £ < < • XM , with r(a m |7r): a m )]/(l - 8) = w m (ir, 9m )/{l -8)- u m (p(ir, 9 m )) s Observe that incentives are unchanged sequently, SV may a constant if added to is all M transfers. also achieve expected budget balance each period: contribution from everyone is = zero, or Yl m =i P( a m\'^,C ' t ( 7r i.e. the expected There )) -r ( a m| 7r )- Con- is obvi- ously a unique set of efficient transfers that satisfies budget balance. Herding 5.1 We are is now Constrained Efficient positioned to reformulate the learning results of the last section at the level We of actions. First a clarifying definition: N if all individuals n = N, N+ N+ from the definition of a cascade; is false. To show that herds this case: 2, 1, . . say that a herd arises on action a m at stage . choose action a m we can generalize the Overturning Principle of SS to Claim 4 (statement and proof appendicized) establishes that actions other than a m will push the updated public belief far from convergence of beliefs implies convergence of actions The following is thus a corollary to Proposition Proposition 7 (Convergence of Actions) (a) An (b) With bounded private a herd arises on (c) It is a herd an action other than — or, In any CEHE for [0, 1) The chance of an incorrect herd with bounded private individuals will. is current value. Thus, and unbounded private starts. beliefs Unless ttq beliefs. e Jm{°~)> H for any 8 € vanishes as 5 SV ends up with full learning with unbounded More interesting is that SV optimally incurs the incorrect herd. Herding near Jm (5), discount factor 8: with positive chance in state no surprise that it a limit cascade implies a herd. on an action eventually om its for 2. ex post optimal herd eventually starts for 8 € beliefs, this differs imply a herd, but the converse certainly, a cascade will arise, Observe that . beliefs, for risk of "f [0, 1). 1. even selfish an ever-lasting truly a robust property of the observational learning paradigm. 15 CONCLUSION 6. This paper has shown that informational herding not such an adverse phenomenon after the social planner's problem, and is Rather, all: in it is models of observational learning is a constrained efficient outcome of robust to changing the planner's discount factor. To understand this decentralized outcome, we have then derived an expression for the social present value of each action. This formulation differs from the Gittins index because of the agency problem: Since private signals are privately observed, aligning private and social incentives entails a translation using the marginal social value. Finally, expression to prove a strict comparative static that eludes dynamic Namely, cascade sets strictly shink as the planner we have used this programming methods: grows more patient. This paper has also discovered and explored the fact that informational herding is Our mapping, simply incomplete learning by a single experimenter, suitably concealed. recasting everything in rule space, has led us to an equivalent social planner's problem. While the revelation principle exercise mechanism design in also uses such a 'rule machine', the harder for multi-period, multi-person models with uncertainty, since the planner is must respect the agents' While belief filtrations. this is trivially achieved in rational expectation price settings, one must exploit the martingale property of public beliefs with observational learning, and largely invert the model. This also works for more general social learning models without action observation signals is — provided an entire history of posterior belief observed. Absent this assumption, the public belief process (howsoever defined) ceases to be a martingale, and expression as an experimentation is model with perfect recall no longer possible. This explains why our model of social learning with random sampling Smith and Sorensen (1997b) must employ Of course, once informational herding experimentation, efficient. it is entirely different techniques (Polya urns). correctly understood as single-person Bayesian no longer seems so implausible that incorrect herds For incomplete learning is if sights into the experimentation literature: As in This link also hope offers EK, incomplete learning requires an optimal action x for which unfocused beliefs are invariant, satisfy of signals a is the same for all states we have seen assume a instance, Rothschild (1974), McLennan (1984), i.e. For such an invariance lu. with fewer available signals, and not surprisingly herding and of complete learning that constrained anything the hallmark of optimal experimentation models, even with forward-looking behaviour. ip(a\to, x) may be finite (vs. all is for reverse in- at the very least the distribution clearly easier to published failures continuous) signal space. For and ABHJ's example are all binary signal models. More precisely, the unbounded beliefs assumption in an experimentation context corresponds to an ability to run experiments with an arbitrarily small marginal cost shifting the threshold 9k slightly up only incurs myopic 16 costs o(d6k))- (eg. . APPENDIX A. RHS Let the Bellman operator Tg be given by T§v equals the v > we have TgV > v' As Tsv'. A.l Proof of Proposition by v such that when n G Jm {o~), exists, a m occurs For the G m ((5), then u m (ir) if an is am is v(tt,8) = it vq(tt) = ir n +\ SV ir that are pointwise increasing, strictly so outside the 7r, q(n x x(o)) , for all , 7T. By The then is = A [0, 1). Vq(tc) to v(-,5). > ) [(1 - v is applied, it is } — n a.s., induction, T^v > T$~ vo, > interval. = 7r n is v ir n = 1. The value (ir) V<5 G v(-,S) weakly exceeds vq, andVn [0, 1) o~)um(q{n, x, a m )) + \J^=l Jm (5) 8v (q(7r,x, a m ))] over x Then n Observe how all x G X, 52 , T$vq(ix) > v . Finally, when ir is v(n, 5 X ) > (ir) outside the cascade sets, by > v (^). or ought to be a folk result for optimal experimentation, but Step 3 (Weak Value Monotonicity) > and yielding (as usual) a pointwise increasing sequence have not found a published or cited proof of 5i 0, also dynamically not optimal to almost surely induce one action. So, v(tv,S) following either As consists of weakly convex functions resulting in value v Q (7r). Optimizing over 1 If u m (Tr). the optimal choice. argument holds when similar cascade sets: v(n,5) '^, x = weakly convex, Jm (S) must be an The sequence {T^v and converge interval. one rule x almost surely chooses the myopically optimal action. it is Namely, for is i.e. ], G Jm{°~)- 1 v(ir,5) is G Jm (S) and a m ?r no matter which rule converging to the fixed point v(-,5) definition G Ji(S) and [0, 1), m -i,9m [$ need only prove that Jm (S) must be an and v (-,6) a.s. To maximize J^ameA^i for given G C myopically strictly optimal for the focused belief Step 2 (Iterates and Limit) Proof: optimally chooses x with supp(//) — u m (7r) optimal for any discount factor S G and myopic expected For each a m G A, a possibly empty interval Jm (^) the optimal choice, and the value half, a\ is updates to First, define the 11 it really affine function of n, For the second since we first half, </ Conversely, = max m m (7r). (learning stops). For any S a.s. Proof: iv (ir) Cascade Sets) (Interval 1 unique is its 1 established in a series of steps. is utility frontier function v Step a contraction, and v(-,5) is for bounded, continuous, weakly convex functions. fixed point in the space of The proposition standard, T$ is Note that of (1). 12 it. At any rate, The value function v(n, 5 2 ) for all it is is we here for completeness. weakly increasing in 6: it. from v(n,0) = sup x J^ m ip(a m \-K,x)u m (q(n,x,a m )). In other words, v(n,Q) allows the myopic individual to observe one signal a before obtaining the ex post value v Q (p(ir,o)). 12 But ABHJ do assert without proof (p. 625) that the patient value function exceeds the myopic one. this differs 17 > on the choose Vq. If 5i 2 > , Step 4 (Weak Inclusion) words, Va m G A, Proof: As seen > 62 For - 7T > if 1 > 81 > 52 and in steps 3 > v(tt,Si) 2, G Jm {3\)i we know SS establish and Ja/(0) = Now {1}. apply steps Step 6 (Bounded Beliefs) JM (6) = We optimal at belief for beliefs < No So any A G q(7r) Bellman equation > fr is 0. 1 for is tt [0, 7r/2] — and some < ii\ < > H ([8*,l}) > fi boundaries 6m _i 7r 2 L fi = 0, < h ([9*,1])/[7tii h tt ([9*,1}) = {0} — [0,7r(£)] and optimal to choose a rule x that is is = some u > and 0, Vo(%) on very similar. Since action a\ must be the optimal choice it Since each u m [0,7r]. for all beliefs -k —n is ^ ai For Ji(6) is large = = in the interval = no j\nd + and so arbitrarily small, than u(l strictly enough {0} and m > — 5, < m 9*, = < M) We shall consider the # m+1 = 1 (see Lemma 2). (1 - ir)ti L [9*, 1]} [a, a] C — 7r)(l — a)]. For n is v((j(tt), S) — v(n, 6) such small tv. By the beliefs. {1}- for which Jm (S) = [7Ti,7T2], 1) with alternative rule x, with interval results in the posterior belief q(ir) ([8*,l})} in state 18 [0, fr/2]. cascade sets disappear except for all lim^i Jm{5) (1 for C afhne, (1 5)/5 for small enough suboptimal 0. with the event {a G + q(7i) is Since there are informative private beliefs, 30* G (1/2, ([9*,l}) Updating the prior tt/j, 1. 6m it Thus, Ui(n) q(ir) are empty. 1. Fix 5 G [0,1), and an action index for 1 < in particular, less lim^j Jm{o~), while Jm (S) are empty, except for Ji(0) not weakly dominated, any action a (1), other 4. updated to at most Step 7 (Limiting Patience) Proof: it(5) D can produce a stronger signal than any a £ supp(^) initial belief small by continuity of v and < ir_(5) m^ all observation a 6 small enough, Ji(S) Jm (0) The u m {Ti). only cascade sets for the If the private beliefs are bounded, then Ji(5) and 0, some for tt, = > u m (ir) + ufor Ui(ir) (0, 1). it all = G Jmi^)- tt all {1}/ when for all n, v(7r,5 2 ) beliefs, = the argument for large beliefs a\\ -k so that a.s., prove that for sufficiently low beliefs almost surely induces is < where [n(S), I], and 1 and thus {0} and Jm(5) myopic model that for the > u m (7r) vo(tt) unbounded private = extreme actions are empty, with Ji(5) > u(7r,5 2 ) = u m (7r) v(7r, 5i) VFrf/i In other 6 increases: Jm (5i) C Jm (52 ). then 0, Step 5 (Unbounded Beliefs) Proof: when All cascade sets weakly shrink optimal value can thus be obtained by inducing a m Proof: Y, arn eA i>(am\^,x)v(q(7T,x, a m )) for > 62, then T^Vq > Ts vq, since more weight is placed larger component of the RHS of (1). Because one possible policy under 6\ is to the £ optimal under 5 2 we have T^v > TJ^vq. Let n — 00 and apply step 2. any x and any function v Si < Clearly, J2a eA ip(a m \^,x)u m (q(TV, x, a m )) m Proof: H. — For any compact subinterval / C (0, 1), in for all n E onto) [tt 2 Since + By e/2, v(7r, 8) > choose u have I. definition of e, q maps Choose u > 1]. > u m (7r) outside > u m (7r) + u there exists e the interval Jm (S) = > for all > If 8' 8. — f/2,7r 2 [7r 2 > e(I) ] with 8 £ i\ + (7r) + s/2, [n 2 1]. so large that (1 is q(ir) u for all G — £/2, 712]. By [ii2 length e/2 from interval = If rn 1 or A. 2 Proof of We (Two first belief q(7r, x, Lemma if its set Proof: an action a m in n, belief downwards. With unbounded cascade sets By beliefs. M— Fix 5 we may set, > 0. TS holds, If By step 3, we thus — 8')u < 5'u, then when a.s. for large enough 5. / vanishes for 5 near 1. 7r, at least %_ and x, g("7r, = p(vr,6) the case in BHW). then for any not in it then (^k) obtains for all non-cascade both cascade sets can be reached. two actions are taken with positive chance, and a m ) never < > let upwards while another lies in Assume [b, b}. Jm '(0) and Jm (0), with m' < m, and < TS fails, If shifts the public belief beliefs, is TS each the unique belief between any pair of 1 points, Let co(supp(F)) then p(n,b) also taken with positive chance inducing a posterior definition of these cascade sets, p(7r, b) cascade [0, 1]. Call the private signal distribution 3. Jm _i(0) and Jm (0) from which At a non-cascade sume bounded is set. by the interval structure, some action it e 4 a m ) not in any 8-cascade sets > of times, each time excising support contains only two isolated points (as ("Ar).' nonempty cascade n apply this procedure repeatedly: Jm {8) except possibly at most i\ Jm (8) evaporates see that (Unreachable Cascade Sets) 1 any 8-cascade beliefs Jm (8), we consider a stronger version of step Signals) Claim M, number iterating this procedure a finite tx £ 7r the Bellman equation (1) reveals that our suggested rule x beats inducing a m 7r — into (but not necessarily and both are continuous [7Ti,7r 2 ], > u m (7r) + u for all 8' = u m (-7r) < u m+1 so large that so small that v(tt, 8) v(tt, 8') D Jm {8), particular one with I that 7r = a cascade it lies set; therefore, as- between the nonempty sup Jm '(0) and 7f p(7t,b). If all possible actions at But these f. shifts = 7r inf Jm (0). led into a inequalities can only hold with equality: > p(p(ir,b),b) p(tt,6) > p{ji,b) > = p{p{n,b),b) p(p(n,b),b) and because the outer terms coincide, as Bayes-updating commutes. and Jm (0) there exists at such a point exists k to solve p{fc,b) realized, = iff 7f, m! most one point = m— while and the posterior q if is 1 TS it which can satisfy between Jm '(0) both equations; moreover, and TS holds. Indeed, given TS, we may simply choose fails, then with positive chance, a nonextreme signal not in a cascade cascade sets by Step 4 of the Proposition - in fact ft So, 1 set. proof, so a would further require sup Jm ^\{8) 19 = With fr 8 > failing (it) is we have weakly smaller is even less likely to exist sup Jm _i(0) and inf Jm (8) = inf Jm (0). assume TS. Consider any Finally, Then the x mapping rule By indeed optimal. with reachable cascade sets Jm -i(8) and Jm (5). ff a m _i (low signal to b into convexity, v(w, 8) most the average at is and tt_) a m (high signal to b into given by transition chances), and x achieves this average. So v(-,8) We now for finish proving Lemma outside the (^-cascade sets. Fix it set we're done, as v(it,8i) Assume first that one action a m least a (^-cascade set. wCtt,^) tt = such / = neighbourhood U (it, 7f) To maps = Wm-i(zc) From of b into 1 81) > (and thus also for m _x((5i) and Jm (^i) .7 71", it 1 tt_ 82 ) € Jm -i(<^i) and 7r > q(ir < i/j(a + v(tt,8 2 ). (ir) a ^-cascade £, , - Then 82). (it) , > a m ) not in 5 1; z;(7r,5 2 ) ^. at (4) If (4) holds noted that between consecutive cascade sets = sup Jm -i(8i) and ff was everywhere an b into ff > x from Claim G Jm {8i), it yields = mixture is worth strictly inf Jm (8i). Also, from affine function v(-, 8 2 ) at ff on (see [zr, ff], Step 3). v(tt,8i). l's proof at the belief myopic and continuation values ff. first-period values v(tt_, more than 82) and v(tt,82)- v(jt, Si): ip(a m \Tt,x)v{7t,8i) [(l-82 )v(ff, 81 )+82 v(?l, 82 )] most Sl fails (-fc) for v(n,8i) and ^(ff,^) v(tt,8i), (1), this ip(a m -i\7T,x)v{TL,8i) m -i\n,x) — v implied TS. In that case, (4) holds in a punctured where v(tt_, > v(tt, 8{) lies in tt D (7f,7r). outside the 5 r cascade sets. lies tt for 5\ sets. If is (weights 8) on affine v (q(7r,£ Sl (n),a m )). Since 8 2 to ^(ff,^), apply rule v {k,8i) and u m (rt) clearly at a m ), l's proof, v(-,8\) = is Suppose Claim Claim not. only, the right hand side of v(tt,8i) which tt bound find a lower Since x -cascade supporting tangent line to the convex function touches v(-,82) at it (TT), between 7r Assume the last paragraph of Claim As <5 2 (r4l «(-,tfi))(7r) < {TS2 v(;ti))(K) < (Tft «(-,<y2 ))(7r) (7r, it) in turn, is a outside the v(tt,8 2 ). i must be unique, and that 7T which < is step 2 of Proposition l's proof, taken with positive chance inducing a belief is Thus, v(q( K,( Sl we are done. 7T, vq(7t) it satisfies {+*) of Next assume that some at From 4. and v(n, of v(it, 8) ff) Given + ^(a m |ff,x) [{l-82 )v(7r,8i)+S2 v(Tt,82)] this contradiction, (4) must hold at ff. A. 3 Differentiability Continuity. In light of the interval structure of Lemma the chances ip(a m \ir) with which to choose each action space maps - that is, into each action). Thus, the choice set the same strategy space is 2, (i.e., SV simply must determine what fraction of the signal WLOG the compact M-simplex A(>1) as in our earlier general observational learning model. Since the objective function in the Bellman equation (1) 20 is continuous in this choice vector and in non-empty optimal (1974)) that the Proof of rules at Lemma Assume 5. We are equivalent. it optimal rules at Claim 2 \/e > > 38 Since property is near show — is differentiable at \tt — tt\ < In other words, and if all as asserted. tt, any optimal rule 8, tt. optimal all at tt induces action induces a m with probability one. tt Next, ir. and that in H, L. e in both states so are both tp(a m \H) 1, it, this leads to a contradiction. rules optimal at all I.B.3 of Hildenbrand upper hemi-continuous is not differentiable at is G Jm{$), one optimal rule at tt shared by is that v Theorem (e.g. rule correspondence such that when 1 Maximum of the are equivalent, then v it a m with probability at least Proof: Theorem follows from the tt, it ip(a m if \Tr) = TTip(a m The claim then ip(a m \L). \H) + (l This — 7r)-0(a m |I/) follows from the upper D hemicontinuity of the optimal rule correspondence. Claim 3 started from ViV 6 Proof: Fix at least 1 n+ — i — TT n when a m > Ve action a m tt, > 38 such that taken for the is < rj rj in < \ By Claim 1/2. 2, for — occurs, provided initial by Bayes tt)t], so close to 7r tt — e) ^ that pi(l — pi) < — 3n(l 77i/[87r(l — = |Pat in — \p2 tt\ — of Tt\ tt, tt tt it and — 1///v e) and so within and then p 3 then 7r)/2 (1 of Tt)] each state from any 772 < tt\ 8 then under any optimal strategy with probability at least enough to 1 — e in if action a m occurs with chance a m occurs, then the posterior If — |7r„ +1 enough to that tt, tt u \ tt. a m occurs for the next N consecutive periods, that a m occurs with conditional chance at least tt \p 2 at any in each state. close to — likewise, stays within tt\ of within tt it it. — it\ — \p\ Let ft\ = 771 of tt, \pi — ^ tt be so close to Apply o£ , . and choose tt\ P2 7^ — 1 771 tt in this construction iteratively to choose . . , p^. If the initial belief in the next A^ periods it tt optimal rules take all that a m occurs with chance at least and then p 4 \pi 7r„ +1 satisfies can be chosen arbitrarily small each period. In particular, we proceed as follows. Let pi a m with chance at least within . So rule. close is tt h 7r n the posterior belief stays close enough to (1 — N periods first close TT n each state starting from 4tt(1 Choose the 1 \tt if H, L. both states |7r N tt lies within provided a m occurs each period. We Any optimal and v H The affine employ the machinery from the proof of Proposition started at belief tt h(p) which has h(0) Since it is function some state-contingent values v L H and h(l) — v is then a tangent to the value function at tt. will yield = v L . optimal to take a m forever at h which intersects u(a m L) at , derivatives of v at ir, strategy 4. tt — tt, one tangent to v at and u(a m H) , at tt = 1. tt is the affine function Consider the left and with corresponding tangent lines hi(p) and h 2 (p) at belief of those tangents -- say, hi - - must differ from h (when hi 21 differs, necessarily p. m right One > 1). . = Define v\ > u(a m nf As v beliefs > h\(0) convex, u(a m L) and v^ , > L), a unique A , is = h(0) = < h\{\) = exists satisfying v[ u(a m H). Since u(ai, L) , + Au(ai, L) differentiable almost everywhere. it is = h(l) So let — A)u(am | iv k then uniquely determined for each is ir k and , state-dependent payoffs of any optimal strategy started at n k vH {^k) < Now The v\ choose a,\ inequalities of course follow N and so large L L the expected value v (nk) in state v L (ir k ) < (1 - 6 N )(1 - < (1 - X/2)u(am < (1 - > u(a m ,L), since u(ai,L) > A/2 — 1 — (1 Then by Claim strictly the best action in state L. is namely v , by convexity of v and e so small that ir k 5 /v L < - ir k )(l 3, for all , \)u(a m L) , + L) + - - <J L)=v}< v + [1 (1 JV - )(1 the 0, 1 are > (ir k ) v\ and n. Note that e). enough large of the optimal strategy starting at n^ e)u(a mj L) L). The tangent . — intercepts at p its , be a sequence of 7r converging up to n, with the value function differentiable at each function action (1 > most at is k, e)]u(o 1} L) {X/2)u(a x L) , Au(oi, L {n k ) as noted above. Contradiction. A. 4 Proof of Proposition 7 Near Jm (5) we should expect to observe action a m The next lemma states that when . other actions are observed they lead to a drastic revision of beliefs, or there was a nonnegligible probability of observing some other action which would overturn the Claim 4 (Overturning Principle) > there exists e (a) i/>(a (b) i>(a m \ir,£ Proof: for 7r m |7T, £ s (%)) > 1 e -neighbourhood — e, and \q(ir, assume bounded private it in or G Jm (5) C Jm (5), we have Jm- This no information gain. b < 1/2 < b. Assume Let e (n), ak ) a G A [0, 1), Jm (5), such — 7r | > <s is is /or optimal that a// : a^ ^ am ip(a\n,£ (ir))>e/M, By Step of any k is m 5 \q(ix ,^ {-k) to stop learning. ip(a k \iT,^ (Tr)) 0, that occur; or , o)-^k\> e 6 of the proof of Proposition s : / (0, 1), either: < 1 1, Thus, we need For any small enough n (0, 1). ^ and Jm {5) £*, W € Kn 5 beliefs. for — rj. u.h.c, almost surely taking action a k is > and Otherwise, optimal at impossible, as a k incurs a strict myopic loss, and captures Let co(supp(F)) > £ S the only optimal rule 1, since the optimal rule correspondence 7r KD some closed subinterval I sufficiently close to some S £, < l-e, and for some enough to only consider -k (7r)) First, close and an For any beliefs. = [6,6]. By the existence of informative beliefs, minimum of//, ii H {[b, (26 + l)/4]) and // i ([(26 + l)/4, 6]). > 1 — e for some n E I. Then any action a^ ^ a m is a.s. only be the s ip(a m \ir,£ (Tt)) taken for beliefs within either overturning (selecting, if [6, (26+l)/4] or [(26+l)/4, 6]. Any such a k necessary, e even smaller). If instead 22 ?/>(a m |7r,£ implies the stated 5 (7r)) < 1 — e, then each action is taken with chance extreme private less and so e, different actions are taken at the optimal beliefs (by the threshold structure of the Next consider unbounded private — say {1} 7r 6 with q(n, £, near that When and 9(^,^(71), a k ) are near tt and an |g(7r, £ (7r), near the cascade sets {0} or 0, this ak ) — tt\ > some e for For the proof of Proposition Cantelli Lemma process, and we > e T(Y\, . , . . Yn ), a\ from continuous in is a strict is taken with positive chance 0. first cite the extended (conditional) Second Borel- Corollary 5.29 of Breiman (1968): in An 7, e 7^ a\ ^ boundedly positive yields a arbitrarily small loss in future value (for v improvement: contradiction. Consequently, any action a k has claimed. beliefs, as a k was nearly uninformative). So the altered policy shift very little, as 5 tt is one of least dk) near n. Consider the altered policy that redirects private beliefs {tt), into «! instead. which Assume beliefs. At rule). Let the optimal policy induce with positive chance an action a k 0. first-period payoff gain q, — 1 M actions then occurs with chance at least e/M and overturns the the afc than Let Yi,Y2 ,... be any stochastic the induced sigma-field. Then almost surely 00 {uj\uj £ An = infinitely often (i.o.)} \^ P{A n+ \\Yn {u>\ , , . . . Y = 00} x ) 1 Fix an optimal policy £ For fixed m, define events and G n+ — i {|7r n+1 — 7r n | and therefore P(G„ + i|7rn ) En n Fn occurs By i.o. By . Choose En = {ix n is (7r„-) many En nF£. En C)F£, to get e-close to Jm (S)}, 4 for all actions Oi, a 2 Fn = Lemma, almost surely converges by implication, is Claim to satisfy {ip(a m \ir n £ En D Fn times. all H that Then En actions a k occurs i.o. Gn obtains Lemma 3, G n+l c (b) / am eventually taken on event an event of probability mass one, by imply En Lemma , . . < 1 a^- — e}, must obtain, the event where i.o. Gn on that event occurs Jm (5) and eventually true on is (ir n )) . , i.o. with with probability zero. (n n ) converges to a limit in fl 6 , ) occurs only finitely Perforce, action a m > > e}. If En n Fn is true, then Claim 4 scenario = 00 on > e/M. So J2™=i P(G n +i\^i, ,if n Restrict attention to the event But given e the above Borel-Cantelli almost surely. But since probability zero. 5 fl G n+ F£ n i, by the G n+X c . H, and thus first fl Fn so is point in Claim 4. Finally, 3 and Proposition En sum over all m 1. References Aghion, P., P. Bolton, C. Harris, and B. Jullien (1991): "Optimal Learning by Experimentation," Review of Economic Studies, 58(196), 621-654. Banerjee, A. V. (1992): "A Simple Model of Herd Behavior," Quarterly Journal of Economics, 107, 797-817. 23 ,M|T LIBRARIES III III llll! Illl 3 9080 01444 3425 BERTSEKAS, D. Dynamic Programming: Deterministic and (1987): Prentice Hall, Englewood Bikhchandani, ion, S., Cliffs, Stochastic Models. N.J. D. Hirshleifer, and I. Welch (1992): "A Theory of Fads, Fash- Custom, and Cultural Change as Information Cascades," Journal of Political Econ- omy, 100, 992-1026. Breiman, Dow, L. (1968): Probability. Addison- Wesley, Reading, Mass. J. (1991): "Search Decisions with Limited Memory," Review of Economic Studies, 58, 1-14. EASLEY, D., and N. KlEFER (1988): "Controlling a Stochastic Process with Unknown Parameters," Econometrica, 56, 1045-1064. FUSSELMAN, MlRMAN Consumption for a General Class of Disturbance Densities," in General Equilibrium, Growth, and Trade II, ed. by J. G. et al., pp. 367-392. Academic Press, San Diego. Gittins, M., and L. J. J. (1993): "Experimental C. (1979): "Bandit Processes and Dynamical Allocation Indices," Journal of J. the Royal Statistical Society, Series B, 14, 148-177. Hildenbrand, W. (1974): Core and Equilibria of a Large Economy. Princeton University Press, Princeton. McLennan, A. "Price Dispersion (1984): and Incomplete Learning Journal of Economic Dynamics and Control, Meyer, M. A. Profiles," 7, and J. and Career 58, 15-41. Rothschild, M. (1974): "A Two-Armed Bandit Theory Economic Theory, 9, 185-202. J., Long Run," 331-347. (1991): "Learning from Coarse Information: Biased Contests Review of Economic Studies, Scheinkman, in the Schechtman (1983): of Market Pricing," Journal of "A Simple Competitive Model with Production and Storage," Review of Economic Studies, 50, 427-441. Smith, ing," L., and P. Sorensen (1997a): "Pathological Outcomes of Observational Learn- MIT mimeo. (1997b): "Rational Social Learning Sobel, M. "An Through Random Sampling," MIT mimeo. Complete Class of Decision Functions for Certain Standard Sequential Problems," Annals of Mathematical Statistics, 24, 319-337. (1953): Essentially L., and R. E. Lucas (1989): Recursive Methods in Economic Dynamics. Harvard University Press, Cambridge, Mass. Stokey, N. 24 : r __Date D ue Lib-26-67