working paper
department of economics
massachusetts institute of technology
50 memorial drive, Cambridge, Mass. 02139

INFORMATIONAL HERDING AS EXPERIMENTATION DEJA VU

Lones Smith*
Department of Economics, M.I.T.

Peter Sørensen†
Nuffield College and M.I.T.

Working Paper 96-9
March 27, 1996

Abstract

We prove that the recently proposed informational herding models are but special cases of a standard single person experimentation model with myopia. We then re-interpret the incorrect herding outcome as a familiar failure of complete learning in an optimal experimentation problem. We next explore that experimentation model with patient individuals, or equivalently, the observational learning model where individuals care about successors. As such, we find that even when individuals internalize the herding externality in this fashion, incorrect herds and incomplete learning still obtain. We note that this outcome can be implemented as a constrained social optimum when decentralized by transfers.

The proposed mapping in this paper first appeared in July 1995 as a later section in our companion paper, "Pathological Outcomes of Observational Learning." For the current full length paper, we thank Abhijit Banerjee, Christopher Wallace, and the MIT theory lunch for comments. Smith gratefully acknowledges financial support for this work from the NSF, and Sørensen equally thanks the Danish Social Sciences Research Council.
*e-mail: lones@lones.mit.edu    †e-mail: peter.sorensen@nuf.ox.ac.uk

    And out of old bookes, in good faithe,
    Cometh al this new science that men lere.
        (Geoffrey Chaucer, The Assembly of Fowles, line 22)

1. INTRODUCTION

The last few years have seen a flood of research on a paradigm known as informational herding. We ourselves have actively participated in this research (Smith and Sørensen (1996a), or simply SS), which was sparked independently by Banerjee (1992) and by Bikhchandani, Hirshleifer, and Welch (1992) (BHW). The context is seductively simple: An infinite sequence of individuals must decide on an action choice from a finite menu, and each may condition his decision both on his (endowed) private signal about the state of the world and on all his predecessors' decisions; observation of their private signals is verboten. Everyone has identical preferences and menus. If these private signals have bounded power, it is known that a 'herd' eventually arises: namely, after some point, all make the identical choice, possibly unwise. This simple pathological outcome has understandably attracted much fanfare.

In SS, we explored, embellished, and fully fleshed out this story, and found that herding is quite robust. BHW's kindred notion of cascades is not as resilient: While beliefs converge upon a limit where only one action is taken with probability one (a limit cascade), this need not occur in finite time. The potential for bad herds is also not without caveat: Absent a uniform bound on the strength of the individuals' private signals, only correct herds arise. Finally, in a world with multiple preference types, a confounded learning outcome might occur, where the lesson of history is forever mixed, and private signals always decisive.

A Possible Link, and a Puzzle. Robustness aside, this paper pursues a different train of thought: Is informational herding fundamentally a new phenomenon? We have been piqued by its similarity to the familiar failure of complete learning in an optimal experimentation problem. One classic example is Rothschild's (1974) analysis of the two-armed bandit: An infinite-lived impatient monopolist optimally experiments with two possible prices each period, with purchase chances for each price being fixed, unknown draws from [0, 1]. Rothschild showed that the monopolist would (i) eventually settle down on one of the prices, and (ii) with positive probability select the less profitable price.

To see more than a casual analogy between this outcome and herding, we must recast the observational learning paradigm as a single person optimization problem. This suggests considering the forgetful experimenter, who each period receives a new informative signal, takes an optimal action, and then promptly forgets his signal; the next period, he can reflect only on his action choice. Alas, this is neither a very satisfying model of rationality, nor can it be the basis for an optimal experimentation problem. How then can an experimenter not observe the private signals, and yet take informative actions?

The Second Puzzle, and a Resolution. An interesting sequel to Rothschild's work was McLennan (1984), who permitted the monopolist the flexibility to charge one of a continuum of prices, but assumed only two possible linear purchase chance 'demand curves'. When the demand curves crossed, he found that the monopolist may well settle down on the suboptimal uninformative price.

Rothschild's and McLennan's models give examples of potentially confounding actions, as introduced in Easley and Kiefer (1988) (EK). In brief, these are actions that are optimal for unfocused beliefs and for which such beliefs are invariants (i.e. taking the action leaves the beliefs unchanged). Of particular significance is the proof in EK (on page 1059) that with finite state and action spaces, there will generically not exist any potentially confounding actions, and thus complete learning must arise.² Rothschild and McLennan might be seen as separate anticipations of EK's insight: Rothschild escapes it by means of a continuous state space, whereas McLennan resorts to a continuous action space. Yet there appears no escape for the herding paradigm, where both flavors of incomplete learning, limit cascades and confounded learning, generically arise with two actions and two states.

Our resolution of these puzzles respects the quintessence of the herding paradigm: that predecessors' signals are hidden from view. In a nutshell, we imagine that individuals do not act for themselves, but rather furnish optimal history-contingent 'decision rules' for agent machines that automatically map any realized private signal into an action choice. With this device, informational herding is properly understood as a new (camouflaged) context for an old phenomenon: incomplete learning by an infinite-lived experimenter.

Herding with Forward-Looking Behavior. The herding outcome is a striking example of an informational externality: While individuals collectively know enough to fully determine the state of the world, their information is aggregated rather poorly. For in a herd, most individuals almost surely take an action which reveals almost none of their information. Late-comers ideally prefer that their predecessors had better signalled their information with more revealing actions, but early individuals clearly have no incentive to do so. Notwithstanding the motivation of this research, some may therefore find the second half of this paper more compelling, where we investigate the herding externality with forward-looking behavior.

² For instance, payoff assignments in a one-armed bandit (where the safe arm is a potentially confounding action) are not generic in R².
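The herding mechanics described above can be made concrete with a small numerical sketch. The snippet below is a minimal two-state, two-action special case in the spirit of BHW, not the paper's general model: the signal accuracy q, the equal prior, and the particular signal sequences are all illustrative assumptions.

```python
def simulate_herd(signals, q=0.7):
    """Sequential observational learning with binary signals of accuracy q.

    Each agent sees the public belief pi (inferred from predecessors'
    actions alone) plus one private signal s in {0, 1}, then takes the
    myopically optimal action.  Observers update pi from the action; once
    the action no longer depends on the signal, pi stops moving forever
    (a cascade).  Returns the action list and the final public belief.
    """
    def posterior(pi, s):
        lik = q if s else 1.0 - q                   # P(signal s | state H)
        return pi * lik / (pi * lik + (1.0 - pi) * (1.0 - lik))

    pi, actions = 0.5, []
    for s in signals:
        decide = lambda t: int(posterior(pi, t) > 0.5)
        actions.append(decide(s))
        # Chance of the observed action in each state, integrating out
        # the unseen signal; in a cascade both chances equal one.
        a = actions[-1]
        pH = sum(q if t else 1.0 - q for t in (0, 1) if decide(t) == a)
        pL = sum(1.0 - q if t else q for t in (0, 1) if decide(t) == a)
        pi = pi * pH / (pi * pH + (1.0 - pi) * pL)  # Bayes on the action
    return actions, pi

# Two opening low signals start a herd on action 0 that six subsequent
# high signals cannot break: the lesson of history swamps each signal.
actions, pi = simulate_herd([0, 0, 1, 1, 1, 1, 1, 1])
assert actions == [0] * 8 and abs(pi - 0.3) < 1e-9
```

If the true state is H, the run above is precisely an incorrect herd: the public belief freezes at 0.3 and every later individual's information is lost.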
In fact, having recast the problem as one of optimal individual experimentation, it is not implausible that learning is incomplete simply because of the myopia. For as is well known, a rational experimenter is willing to forgo some current payoff in order to secure knowledge relevant for future payoffs and decisions. We prove in fact that in the experimentation model without myopia, learning is still not complete.³

Herding is best captured by an equivalent social planner's problem, where the goal is to maximize the present discounted value of individuals' welfares. The social planner likewise will sacrifice early payoffs for the informational benefit of successors. Such an outcome can be decentralized as a constrained social optimum by means of a simple set of budget balanced transfers that encourage experimentation. Yet we find that the herding externality is only imperfectly mitigated by the social planner: Incorrect herds still obtain, but with chances vanishing as the discount factor converges to one.

Overview. Section 2 outlines a general observational learning model. In section 3, we re-interpret the herding paradigm as one of optimal single-person experimentation. Section 4 characterizes the limiting beliefs of that experimenter, and then reverts to the planner's herding problem for the characterization of the limiting actions. A concluding section gives a broader perspective on our findings. Longer but essential proofs are appendicized.

³ Just as in SS, when private signals are not uniformly bounded in power, learning is complete.

2. OBSERVATIONAL LEARNING MODEL

In this section we set up a very general observational learning model, one that subsumes SS, BHW, and Banerjee (1992), and happens to cover Lee (1993). This generality facilitates the ensuing mapping into the experimentation literature.

Information. There is uncertainty about the payoffs from actions. The elements of the parameter space (Θ, F) are referred to as states of the world. There is a given prior belief, the probability measure λ over Θ. An exogenous infinite sequence of individuals n = 1, 2, ... takes actions in that order. Individual n receives a private random signal σ_n ∈ Σ about the state of the world. As demonstrated in SS (Lemma 1), WLOG we may assume that the private signal received is actually his private belief, i.e. we let σ be the measure over Θ that results from Bayesian updating given the prior λ and observation of the private signal. Signals thus belong to Σ, the space of probability measures over (Θ, F), and S is the associated sigma-algebra.

Conditional on the state, the signals are assumed to be i.i.d. across individuals, distributed according to the probability measure μ^θ in state θ ∈ Θ. To avoid trivialities, assume that not all μ^θ are identical, so that some signals are informative. Each distribution may contain atoms, but to ensure that no signal will perfectly reveal the state of the world, we insist that all μ^θ be mutually absolutely continuous (a.c.),⁴ for θ ∈ Θ.⁵

Bayesian Decision-Making. Everyone chooses from a fixed action set A, equipped with the sigma-algebra 𝒜. Action a earns a nonstochastic payoff u(a, θ) in state θ ∈ Θ, the same for all individuals, where u : A × Θ → R is measurable. Everyone is rational, i.e. seeks to maximize his expected payoff. Before deciding upon an action, an individual first observes his private signal/belief and the entire action history h prior to it. An individual's Bayes-optimal decision rule uses the observed action history and his common knowledge of the decision rules of all predecessors:⁶ he can compute the ex ante (time-0) probability distribution over action profiles, and Bayes' rule implies a unique public belief π = π(h) ∈ Σ. A final application of Bayes' rule, given also the private signal σ, yields the posterior belief p ∈ Σ. Given the posterior belief p, the expected payoff from action a is u_a(p) = ∫_Θ u(a, θ) dp(θ). We simply assume that an optimal action a, maximizing u_a(p) over actions, always exists. The optimal a = a(p) thus depends on both π and σ, and the solution defines an optimal decision rule x : Σ → Δ(A), where Δ(A) is the space of probability measures over (A, 𝒜); a rule is thus an element of the space X of such maps.

Observational Learning as a Stochastic Process. Knowing the public belief π ∈ Σ, individual n picks the action a with chance x(σ)(a). Since the probability measure of signals depends on the state, the rule x produces an implied distribution over actions which depends on the state: the density of action a is ψ(a|θ, x) = ∫_Σ x(σ)(a) μ^θ(dσ), and unconditional on the state it is ψ(a|π, x) = ∫_Θ ψ(a|θ, x) π(dθ). This in turn yields a distribution over next period public beliefs π_{n+1}. Thus, the belief process (π_n) follows a discrete-time Markov process with state-dependent transition probabilities. The next result is standard.

Lemma 1. Unconditional on the state, (π_n) is a martingale, converging a.s. to a limiting random belief π_∞. The limit π_∞ places no weight on point masses on the wrong states of the world.

⁴ See Rudin (1987). Measure μ¹ is a.c. w.r.t. μ² if μ²(S) = 0 ⇒ μ¹(S) = 0 for all S ∈ S, the sigma-algebra on Σ. By the Radon-Nikodym Theorem, a unique g ∈ L¹(μ²) exists with μ¹(S) = ∫_S g dμ² for every S ∈ S.
⁵ With the μ^θ mutually a.c., 'almost sure' assertions are well-defined without specifying the measure.
⁶ If one wishes to pursue this angle, this is the unique Bayesian equilibrium of the 'game'.
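The martingale property of the public belief is easy to verify numerically in the two-state case, since the unconditional chance of an action is the π-mixture of its state-contingent chances. A minimal sketch; the action densities and the belief 0.37 below are illustrative assumptions.

```python
def belief_lottery(pi, psi_H, psi_L):
    """For each action a: its unconditional chance psi(a|pi) and the
    Bayes-updated public belief q(a), given state-contingent action
    densities psi_H, psi_L (lists over a finite action set)."""
    lottery = []
    for pH, pL in zip(psi_H, psi_L):
        p = pi * pH + (1 - pi) * pL        # unconditional chance of a
        lottery.append((p, pi * pH / p))   # (chance, updated belief)
    return lottery

# E[pi_{n+1} | pi_n] = pi_n: the public belief is a martingale,
# whatever the (informative or not) action densities.
lottery = belief_lottery(0.37, [0.7, 0.3], [0.2, 0.8])
assert abs(sum(p * b for p, b in lottery) - 0.37) < 1e-12
```

The check works because p·q(a) = π·ψ(a|H), and the ψ(·|H) sum to one; no property of the particular numbers is used.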
(Footnote: Absent a unique solution, we must take an arbitrary measurable selection from the solution correspondence.)

Table 1: Embedding. This table displays how our single-type observational learning model fits into the impatient single person experimentation model.

    Observational Learning Model           | Impatient Experimenter Model
    ---------------------------------------|---------------------------------------
    States θ ∈ Θ                           | States θ ∈ Θ
    Individual n                           | Randomness in the nth experiment
    Private signal of individual n, σ_n    | (unobserved)
    Optimal decision rule x ∈ X            | Optimal action x ∈ X
    Action taken by individual, a ∈ A      | Observable signal a ∈ A
    Density over actions ψ(a|θ, x)         | Density over observables ψ(a|θ, x)
    Belief after n individuals, π_n        | Belief after n observations, π_n
    Payoffs (private information)          | Payoffs (unobserved)

3. MODEL COMPARISON

A first step in recasting our general model of observational learning as a single person experimentation problem (no 'active experimentation') is to respect the individuals' selfishness. Thus, we must study an impatient experimenter with discount factor 0. But to avoid a forgetful experimenter, we must regard the observational learning from a new perspective.

Consider individual n, who uses both the public belief π_n and his private signal in forming and acting upon his posterior. We now separate these two steps, using the conditional independence of π_n and σ_n: Mr. n can be regarded as (i) observing π_n, but not his private signal; (ii) optimally determining the rule x ∈ X, and submitting it to an agent machine; and (iii) letting that machine observe his private signal and take his action a for him. The ultimate 'choice' is the rule x. The payoff u(a, θ) depends on the state θ; private beliefs σ have distribution μ^θ in state θ, and action a results with chance ψ(a|θ, x).

We may now precisely describe the single-person experimentation model. The state space is Θ. In period n, the experimenter EX chooses an action (the rule) x ∈ X. Given this choice, a random observable statistic a ∈ A is realized with chance ψ(a|θ, x) in state θ. The payoff u(a, θ) is unobserved, lest it provide an additional signal of the state of the world. Finally, EX updates beliefs using this information alone.⁷ Table 1 summarizes the embedding. Notice how the impatient experimenter will choose the same optimal decision rule x described in section 2, and how this addresses both lead puzzles. First, the experimenter never knows the private beliefs σ, and thus does not forget them. Second, the pathological learning outcomes are entirely consistent with EK's generic complete learning result for finite action and state spaces. Simply put, when one rewrites the observational learning model as an experimentation model, the true action space for EX is the infinite space X of decision rules.

SS considered two major modifications of the informational herding paradigm. One was to add i.i.d. noise to the individual decision problem. Noise is easily incorporated here by adding an exogenous chance of a noisy signal (random action). SS also allowed different types of preferences, with individuals randomly drawn from one type or another. Multiple types can be addressed here by simply imagining that EX chooses a T-vector of optimal decision rules from X^T, with (only) the machine observing the type and private belief, and choosing the action a as before.

⁷ This experimentation model does not strictly fit into the EK mold, where the instantaneous reward in period n depends only on the action and the observed signal, but (unlike here) not on the parameter θ ∈ Θ. This is the structure of Aghion, Bolton, Harris, and Jullien (1991) (ABHJ), where payoffs are not necessarily observed. If we wish, we may simply posit that the experimenter has fair insurance, and simply earns his expected payoff each period rather than his random realized payoff. Then his behaviour will be exactly the same, yet he will not learn anything from the payoff.

4. PATIENCE
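The embedding's key object, the induced observation density ψ(a|θ, x) = ∫ x(σ)(a) μ^θ(dσ), reduces to a weighted sum with finitely many signals. A minimal sketch with illustrative numbers (the two-signal, two-action setup is an assumption for concreteness):

```python
def induced_density(rule, mu):
    """psi(a|theta, x): chance of each action when signals drawn with
    state-theta probabilities mu are fed through the (possibly
    randomized) rule; rule[s][a] = chance signal s maps to action a."""
    n_actions = len(rule[0])
    return [sum(m * row[a] for m, row in zip(mu, rule))
            for a in range(n_actions)]

# The machine follows the signal exactly: the identity rule makes the
# action as informative as the signal itself.
assert induced_density([[1, 0], [0, 1]], [0.3, 0.7]) == [0.3, 0.7]

# A 'cascade' rule ignoring the signal induces the same observation
# density in every state: the action reveals nothing.
rule = [[1, 0], [1, 0]]
assert induced_density(rule, [0.3, 0.7]) == induced_density(rule, [0.7, 0.3])
```

The second check is exactly the potentially confounding configuration: ψ(a|θ, x) identical across states, so EX's belief is invariant under the observation.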
4.1 Special Assumptions

While we have rather generally expressed observational learning as a type of myopic experimentation, our analytical results will require a more restricted model typical of the herding paradigm. For instance, if (Θ, F) is R equipped with the Borel sigma-algebra, then the space X of mappings is rather unwieldy: the space of measurable mappings from the measures Σ over Θ to the simplex Δ(A). So just as in SS, we assume that Θ = {H, L} (or more generally is finite). High and low states are equilikely ex ante, and so have prior λ(H) = λ(L) = 1/2. The private belief σ expresses the chance of state H, and thus Σ is the interval [0, 1], and S the Borel sigma-algebra over (0, 1). Let supp(μ) be the (common) support⁸ of each probability measure μ^θ. If co(supp(μ)) ⊂ (0, 1), then we call the private beliefs bounded, while if co(supp(μ)) = [0, 1], they are unbounded. In that case, arbitrarily strong private signals/beliefs can occur.

Lee (1993) showed that with these restrictions, a continuous action space can easily allow for simple statistical learning: Knowing individual n's public belief, observation of his action then perfectly reveals his private signal. We thus make the standard lumpiness assumption that A = {a_1, ..., a_M} is finite. We assume that no action is dominated by the others, and this implies the simple interval structure that action a_m is myopically optimal exactly for posteriors p in some sub-interval [r_{m-1}, r_m] of [0, 1]. We order the actions such that 0 = r_0 < r_1 < ... < r_M = 1.

⁸ As usual, the support of a probability measure on the Borel-algebra is the smallest closed set of measure 1.

4.2 The Patient Experimenter

Since our experimentation model admits any discount factor δ ∈ [0, 1), one naturally wonders what happens with a patient experimenter (when δ > 0). SS shows that the myopic experimenter maps higher signals into higher actions: There are thresholds 0 = x_0 ≤ x_1 ≤ ... ≤ x_M = 1, depending on π alone, such that action a_m is strictly optimal when σ ∈ (x_{m-1}, x_m), and indifference between a_m and a_{m+1} prevails at the knife-edge σ = x_m. This is also true with patience, as Lemma A.1 (appendicized) proves. Intuitively, the interval structure not only maximizes immediate payoffs, but also ensures the greatest future value of information, by producing the riskiest posterior belief distribution.

We now turn to EX's long run behavior. Lemma 1 tells us that beliefs must settle down, never dead wrong about the state. The next result establishes that the limit belief π_∞ precludes further learning: it is concentrated on absorbing basins.

Proposition 1 (Absorbing Basins) For each a_m ∈ A, a possibly empty interval J_m(δ) ⊆ [0, 1] exists such that when π ∈ J_m(δ), EX optimally chooses x which induces a_m (learning stops), and the limit belief π_∞ is a.s. concentrated on the basins J_1(δ) ∪ ··· ∪ J_M(δ).
• For all δ, J_1(δ) and J_M(δ) are basins, with 0 ∈ J_1(δ) and 1 ∈ J_M(δ).
• If the private beliefs are bounded, then J_1(δ) = [0, π̲(δ)] and J_M(δ) = [π̄(δ), 1], where 0 < π̲(δ) ≤ π̄(δ) < 1. For large enough δ, all other basins are empty, while lim_{δ→1} J_1(δ) = {0} and lim_{δ→1} J_M(δ) = {1}.
• With unbounded private beliefs, J_1(δ) = {0} and J_M(δ) = {1} for all δ, and all other basins are empty.

This characterization of the stationary points of the stochastic process of beliefs (π_n) directly generalizes the analysis for δ = 0 in SS. See figure 1 for an illustration of how the basins are determined from the shape of the optimal value function. All but the final limit belief result are established in the appendix. To see why a limit cascade (as SS call it) must occur, i.e. why π_∞ must a.s. lie in some basin, observe that for any belief π not in any basin, at least two signals are realized with positive probability: the highest such signal is more likely in state H, and the lowest more likely in state L. It follows that next period's belief differs from π with positive probability. By the characterization result for Markov-martingale processes in appendix B of SS, π cannot be the limit π_∞.

Proposition 2 (Convergence of Beliefs, or Long Run Learning)
• For unbounded private beliefs, π_∞ is concentrated on the truth for any δ ∈ [0, 1).
• With bounded private beliefs, learning is incomplete for any δ ∈ [0, 1): Unless π_0 ∈ J_M(δ), there is positive probability in state H that π_∞ is not in J_M(δ). The chance of incomplete learning with bounded private beliefs vanishes as δ ↑ 1.

Figure 1: Typical value function. Stylized graph of v(π, δ), δ > 0, with three actions. [The figure plots the affine payoff lines through u(a_1, L), u(a_2, L), u(a_3, L) and u(a_1, H), with the basins J_1(δ), J_2(δ), J_3(δ) marked where v coincides with an affine payoff line.]

Proof: Given the absorbing basin characterization of Proposition 1, the unbounded beliefs result follows from Lemma 1 (since π_∞ places no weight on the point belief 0). The incomplete learning conclusion follows just as in the corresponding theorem of SS. We now extend that proof to establish the limiting result for δ ↑ 1. First, Proposition 1 assures us that for δ close enough to 1, π_∞ places all weight in J_1(δ) and J_M(δ). The likelihood ratio ℓ_n = (1 − π_n)/π_n is a martingale conditional on state H. Because all private beliefs σ have likelihood ratio (1 − σ)/σ bounded above (bounded beliefs), the mean of ℓ_∞ must equal the prior ratio ℓ_0 = (1 − π_0)/π_0. Since ℓ_∞ ≥ (1 − π̲(δ))/π̲(δ) on J_1(δ), the weight that π_∞ places on J_1(δ) in state H is at most ℓ_0 π̲(δ)/(1 − π̲(δ)); as lim_{δ→1} π̲(δ) = 0, this weight must vanish as δ → 1. □

Observe how incomplete learning to some extent plagues even an extremely patient EX.
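The basin structure just characterized can be reproduced numerically by iterating a Bellman operator of the form (A-1) used in the appendix. Below is a minimal sketch under strong simplifying assumptions: two states, two actions, two signals, pure (non-randomized) rules only, linear interpolation on a belief grid, and illustrative payoff and signal numbers.

```python
import itertools

def value_iteration(u, mu_H, mu_L, delta, n_grid=101, n_iter=300):
    """Iterated Bellman operator for the recast experimenter EX.

    u[a] = (u(a,L), u(a,H)); mu_H[s], mu_L[s] are state-contingent signal
    chances; a pure rule assigns one action to each signal.  Returns the
    belief grid and the approximate value function v(., delta) on it.
    """
    grid = [i / (n_grid - 1) for i in range(n_grid)]
    rules = list(itertools.product(range(len(u)), repeat=len(mu_H)))

    def interp(v, q):                      # piecewise-linear lookup
        t = q * (n_grid - 1)
        i = min(int(t), n_grid - 2)
        return v[i] + (t - i) * (v[i + 1] - v[i])

    # start from the myopic frontier v0(pi) = max_a u_a(pi)
    v = [max(pi * uH + (1 - pi) * uL for uL, uH in u) for pi in grid]
    for _ in range(n_iter):
        new_v = []
        for pi in grid:
            best = float("-inf")
            for rule in rules:
                total = 0.0
                for a in range(len(u)):
                    pH = sum(m for s, m in enumerate(mu_H) if rule[s] == a)
                    pL = sum(m for s, m in enumerate(mu_L) if rule[s] == a)
                    p = pi * pH + (1 - pi) * pL   # chance action a realizes
                    if p > 0.0:
                        q = pi * pH / p           # updated public belief
                        u_a = q * u[a][1] + (1 - q) * u[a][0]
                        total += p * ((1 - delta) * u_a + delta * interp(v, q))
                best = max(best, total)
            new_v.append(best)
        v = new_v
    return grid, v

# Illustrative check: action 0 is right in state L, action 1 in state H.
u = [(1.0, 0.0), (0.0, 1.0)]
_, v_myopic = value_iteration(u, [0.3, 0.7], [0.7, 0.3], delta=0.0)
_, v_patient = value_iteration(u, [0.3, 0.7], [0.7, 0.3], delta=0.9)
assert abs(v_patient[0] - 1.0) < 1e-6   # focused beliefs: no learning left
assert abs(v_myopic[50] - 0.7) < 1e-9   # one-signal myopic value at pi = 1/2
assert all(b >= a - 1e-7 for a, b in zip(v_myopic, v_patient))
```

The last assertion is the numerical counterpart of the monotonicity claim that the value function rises with the discount factor, and the affine segments of the computed value function near π = 0 and π = 1 are where the absorbing basins sit.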
This problem thus does not fall under the rubric of EK's result that learning is complete wherever the optimal value function is strictly convex: for here, v(π) = u_{a_1}(π) for π near 0, and v(π) = u_{a_M}(π) for π near 1, both affine functions, so that EX optimally behaves myopically for very extreme beliefs. This points to the source of the incomplete learning: lumpy signals rather than impatience.

4.3 Forward-Looking Informational Herding

Let us return to the view of this as the informational herding paradigm. Suppose first that everyone is altruistic, but subject to the standard herding informational constraints. If (u_k) is the realized payoff sequence, suppose that every individual n = 1, 2, ... seeks to maximize E[(1 − δ) Σ_{k=n}^∞ δ^{k−n} u_k | π_n], where we define E[f|π] = Σ_{θ∈Θ} π(θ) f(θ). Then the solution to EX's problem yields a perfect Bayesian equilibrium of this model.

The Planner's Problem. Now let us add a shade of economic realism, and posit selfish behavior. Reinterpret the patient experimenter's problem as that of an informationally constrained social planner SP trying to maximize (1 − δ) Σ_{n=1}^∞ δ^{n−1} E u_n, the expected discounted average welfare of the individuals in the herding model. Observe how this perfectly aligns the SP's and EX's objectives. To respect the key herding informational assumption that actions are observed and signals unobserved, assume further that the SP neither knows the state nor can observe the individuals' private signals, but can both observe and punish/reward any actions taken.

We now are positioned to reformulate the learning results of the last section at the level of actions. The Overturning Principle of SS generalizes to this case: Lemma A.2 establishes that for π near J_m(δ), actions other than a_m will push the updated public belief far from its current value. This at once yields the following corollary to Proposition 2.
Proposition 3 (Convergence of Actions, or Herds)
• For unbounded private beliefs, a herd on the correct action eventually starts.
• With bounded private beliefs, a herd on some action eventually starts. Unless π_0 ∈ J_M(δ), a herd arises on an action other than a_M with positive chance in state H, for any δ ∈ [0, 1). The chance of an incorrect herd with bounded private beliefs vanishes as δ ↑ 1.

It is no surprise that SP ends up with full learning with unbounded beliefs, for even myopic individuals will. More interesting is that SP optimally incurs the risk of an ever-lasting incorrect herd. Herding is truly a robust property of the observational learning paradigm.

Optimal Transfers. How does SP steer the choices away from the myopic solution to EX's problem? Let SP tax or subsidize the actions according to the following simple scheme. Given the current public belief π, if the individual takes action a he receives the (possibly negative) transfer τ(a|π). Faced with such incentives, our proof in Lemma A.1 that individuals optimally choose private belief threshold rules is still valid. Given public belief π, a herder's threshold private belief x_m is mapped into the posterior p(π, x_m) = πx_m/[πx_m + (1 − π)(1 − x_m)]. As the rule x* for SP is optimally chosen, the threshold x_m = x*_m must solve the indifference equation

    u_{a_m}(p(π, x_m)) + τ(a_m|π) = u_{a_{m+1}}(p(π, x_m)) + τ(a_{m+1}|π).

So, the difference τ(a_{m+1}|π) − τ(a_m|π) alone matters for how the selfish individuals trade off the two actions, and SP can ensure that each threshold is optimally chosen by suitably adjusting this difference. The private incentives are unchanged if zero, or any constant, is added to all M transfers; we can thus also insist on expected budget balance each period, i.e. that the expected contribution Σ_{m=1}^M ψ(a_m|π, x*) τ(a_m|π) = 0. This uniquely determines the transfers.
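The transfer construction above can be sketched numerically for two actions. In the snippet below, the payoffs, the public belief, the desired threshold x*, and the chances qH, qL that the private belief exceeds x* in each state are all illustrative assumptions; the indifference equation pins down the transfer difference, and expected budget balance pins down the level.

```python
def balanced_transfers(u, pi, x_star, qH, qL):
    """Transfers (tau0, tau1) implementing private-belief threshold x_star.

    u[a] = (u(a,L), u(a,H)).  qH (qL) is the state-H (state-L) chance
    that the private belief exceeds x_star, so action 1 is taken with
    unconditional chance psi1.  Two-action sketch.
    """
    # posterior at the threshold private belief
    p = pi * x_star / (pi * x_star + (1 - pi) * (1 - x_star))
    u0 = p * u[0][1] + (1 - p) * u[0][0]
    u1 = p * u[1][1] + (1 - p) * u[1][0]
    diff = u0 - u1                          # required tau1 - tau0
    psi1 = pi * qH + (1 - pi) * qL          # chance action 1 is taken
    return -psi1 * diff, (1 - psi1) * diff  # zero expected cost

u = [(1.0, 0.0), (0.0, 1.0)]
# At the myopic threshold no steering is needed: transfers vanish.
t0, t1 = balanced_transfers(u, 0.5, 0.5, 0.7, 0.3)
assert abs(t0) < 1e-12 and abs(t1) < 1e-12
# A lower threshold (more 'experimentation' with action 1) subsidizes
# action 1, taxes action 0, and balances the budget in expectation.
t0, t1 = balanced_transfers(u, 0.5, 0.3, 0.79, 0.51)
psi1 = 0.5 * 0.79 + 0.5 * 0.51
assert t1 > 0 > t0
assert abs((1 - psi1) * t0 + psi1 * t1) < 1e-12
```

The budget identity holds by construction: the two transfers are the opposite-signed shares −ψ1·diff and (1 − ψ1)·diff of the same indifference gap.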
We would dearly much work, we if his desired like to provide a crisp characterization of these transfers believe that this one will is impossible. Clearly, be chosen anyway, i.e. that actions be chosen with discretion as from will differ 7r e Ji{8) C a when 7r 6 £ it Ji(S) Ji{6), strict inclusion for 'experimentation' (making non-myopic choices) What not tax or subsidize actions C J { Conversely, . if SCP wishes then some transfers intuitively high enough discount factors by Claim A. 7, transfers are not identically zero for a sufficiently patient ST. successors. will because myopic and patient behavior do not coincide. Rigorously, as zero, Ji is for S7 — but after that, rewarded, as is these 'high-information' choices are 5. Beyond it we can only say that provides information to rather hard to is tell. CONCLUSION This paper has discovered and explored the fact that informational herding is simply incomplete learning by a single experimenter, suitably concealed. Yet herding has clearly achieved a measure of 'market penetration' that has eluded the experimentation literature. Aside from its simple economics, we believe this owes to case, thus avoiding the Our mapping, its focus on the zero discounting heavy dynamic optimization machinery. recasting everything in rule space, has led us to an equivalent social The planner's problem, an oft-performed exercise for economic models. in mechanism design also employs a 'rule machine'. But revelation principle in multi-period, multi-person models with uncertainty, the planner must respect the agents' belief filtrations. machine approach succeeds by exploiting the martingale property of public Our rule beliefs. This property extends beyond action observation models to those where only an imperfect signal of one's posterior belief such signals is observed. 
is But once we — again, provided the entire history of observed relax the assumption that the history is perfectly observed, as in Smith and S0rensen (1996b), the public belief process (howsoever defined) ceases to be a martingale — a fact that we are exploring in a work in progress. As a result, a mapping of these models into the experimentation literature no longer appears possible. Finally, we and others have correctly attributed social information (a finite action space). requires an optimal action ip(a\9, x) of signals a is x for As in EK, incomplete learning which unfocused the same for all states. bad herds to the lumpy transmission of beliefs are invariant, Such an invariance with fewer available signals, and not surprisingly herding and pathologies that we have seen assume a Rothschild (1974), McLennan (1984), finite (vs. all the distribution clearly easier to satisfy published experimentation continuous) signal space. For instance, and ABHJ's example are 10 is i.e. at the very least all binary signal models. APPENDIX A. A.l Preliminary A n strategy s n for period must be used, given 7r n map from E a to X. The experimenter chooses a . X prescribes the rule x n € It strategy profile s determines the stochastic evolution of the model in turn which belief is — i.e. = which {s u s 2 . . .), , a distribution over the sequences of realized actions, signals, payoffs, and beliefs. Our analysis is inspired by value function v(-,S) is u(7r, = 5) sup s E[(l processes implied by : - E ->• 5) ABHJ E Y^Li and sections 9.1-2 of Stokey and Lucas for the experimentation <S t-lu = expected payoff from a m at belief n. Since u m Ts v(ir) = where q(n, sup xex y^ < for v x) [(1 - 7ra(a m ,i7) is affine, + - (1 ir)u(am ,L) denotes the we have the Bellman operator Tg 6)u m (q{7r, x, a m )) + 6v(q{n, x, a m ))] > Bayes-updated belief from v' Lemma we have T$v > Tgv'. As tt is when a is observed and rule x standard, T$ is = Xq < X\ < when a — x m there is Proof: We . 
< m-i, let probability into a mr If . < xm = . 1 such that action a m is X q(Tr,x,a mi v(-, S) ) ) > £j (i Assume = 1, 2) be those signals in E x m ), which are mapped with positive to the contrary that the sets are not ordered as Ei [0,x] and E 2 (x) (it may be is = (E : < g(7r,x,a m2 ). q(n,x,a mi ) < < E2 . U E2 ) D [x, 1]. , and vice versa. = For any x E (0,1), define T,i(x) Consider then the modified rule x which taken for signals in Ej(x), and where x satisfies = ^(a mi |7r,x) necessary for x to randomize over the two actions at signal x to accomplish that). Since signals more in favor of state mean (^ m _i, q(ir,x,a m2 ), then given our ordering of actions, payoffs are improved equals x, except that a mi il)(a mi \ir,x) described by randomization between a m and a m+ i. Next, assume that g(7r,a;,ami ) U E2 n is when a € taken (with no loss of information) by remapping signals leading to a mi into a m2 a applied. prove that any rule x violating this interval structure can be strictly improved upon. For mi find is a contraction, and For any n, one optimal rule x £ A.l (Interval Structure) thresholds (Ex (A-l) V J unique fixed point in the space of bounded, continuous, weakly convex functions. is its and \TT, problem with discount factor 5 Um eA x, a) is the Note that ip{a m The nk], where the expectation uses the distribution of Recall that u m (ir) s. (1989). L are q(n,x,a mi ), and similarly q(Tr,x,a m2 ) > mapped into a mi under x, q(7r,x,a m2 ). Thus, we x implies preserving spread of the updated belief versus x, and since the value function weakly convex, this weakly improves the value 11 in the Bellman equation Ts {v) = v. is D A. 2 Proof of Proposition The proposition is established in a series of claims. For each a m G A, a possibly empty interval Jm {8) Claim A.l (Basins) when £% € Jm {fi), 7r For the Proof: we first half, G Jm (5), then a m 7r optimally chooses x with supp(/i) For any 8 G (learning stops). 
A.2 Proof of Proposition 2

The proposition is established in a series of claims.

Claim A.1 (Basins) For each a_m ∈ A and any δ ∈ [0, 1), there exists a possibly empty interval J_m(δ) such that when π ∈ J_m(δ), the experimenter optimally chooses a rule x mapping all of supp(μ) into a_m (learning stops), and the value is v(π, δ) = u_m(π).

Proof: For the first half, if π_n ∈ J_m(δ), then a_m occurs and π_{n+1} = π_n a.s. no matter which rule is applied; conversely, if a_m is taken a.s., the belief updates to itself, so that v(π, δ) = u_m(π) and a_m is the optimal choice. For the second half, we need only prove that J_m(δ) must be an interval. Since v(·, δ) is weakly convex and u_m(π) is an affine function of π, the set on which they coincide must be an interval. A similar argument holds when π_n = 0 and π_n = 1: a_1 is myopically strictly optimal for the focused belief π = 0, and also dynamically optimal for any discount factor δ ∈ [0, 1), since the focused belief updates to itself; really 0 ∈ J_1(δ), and likewise 1 ∈ J_M(δ). □

Define the expected utility frontier function v_0(π) = max_m u_m(π), that is, the expected utility an individual would obtain by choosing the myopically optimal action.⁹

Claim A.2 (Iterates) The sequence {T_δ^n v_0} consists of weakly convex functions that are pointwise increasing, and converge to v(·, δ). The value v(·, δ) weakly exceeds v_0.

Proof: To maximize Σ_{a_m ∈ A} ψ(a_m | π, x) [(1 - δ) u_m(q(π, x, a_m)) + δ v_0(q(π, x, a_m))] over x ∈ X, observe that one possible policy x ensures that the myopically optimal action a_m occurs with probability one, yielding v_0(π). Optimizing over x thus results in T_δ v_0(π) ≥ v_0(π) for all π. By induction, T_δ^n v_0 ≥ T_δ^{n-1} v_0, a pointwise increasing sequence converging (as per usual) to the fixed point v(·, δ) ≥ v_0. □

Claim A.3 (Monotonicity) The value function weakly rises when δ increases: namely, if δ_1 > δ_2, then v(π, δ_1) ≥ v(π, δ_2) for all π.¹⁰

Proof: If δ_1 > δ_2, then for any x and any function v ≥ v_0 we have

    Σ_{a_m ∈ A} ψ(a_m | π, x) [(1 - δ_1) u_m(q(π, x, a_m)) + δ_1 v(q(π, x, a_m))] ≥ Σ_{a_m ∈ A} ψ(a_m | π, x) [(1 - δ_2) u_m(q(π, x, a_m)) + δ_2 v(q(π, x, a_m))],

since there is more weight on the larger component of (A-1). Thus T_{δ_1} v ≥ T_{δ_2} v for all such v. Inductively, we have T_{δ_1}^n v_0 ≥ T_{δ_2}^n v_0 for all n. Let n → ∞ and apply Claim A.2. □

⁹ Observe how this differs from v(π, 0) = sup_x Σ_m ψ(a_m | π, x) u_m(q(π, x, a_m)). In other words, v(π, 0) allows the myopic individual to observe one signal σ before obtaining the ex post value v_0(q(π, σ)).

¹⁰ This ought to be a folk result for optimal experimentation, but we have not found a published or cited proof of it. ABHJ do assert without proof (p. 625) that the patient value function exceeds the myopic one.

Claim A.4 (Inclusion) ∀a_m ∈ A, if δ_1 > δ_2, then J_m(δ_1) ⊆ J_m(δ_2). In other words, all basins weakly shrink when δ increases.

Proof: If π ∈ J_m(δ_1), we know v(π, δ_1) = u_m(π). By Claims A.3 and A.2, u_m(π) = v(π, δ_1) ≥ v(π, δ_2) ≥ v_0(π) ≥ u_m(π), and thus v(π, δ_2) = u_m(π). As seen in the proof of Claim A.1, the optimal value can thus be obtained by inducing a_m a.s., so that π ∈ J_m(δ_2). □

Claim A.5 (Unbounded Beliefs) With unbounded private beliefs, all basins are empty, except for the extreme actions, with J_1(δ) = {0} and J_M(δ) = {1}.

Proof: SS establish for the myopic model that all basins are empty, except for J_1(0) = {0} and J_M(0) = {1}; for all δ, apply Claims A.1 and A.4. □

Claim A.6 (Bounded Beliefs) If the private beliefs are bounded, with supp(μ) ⊆ [a, ā] ⊂ (0, 1), then J_1(δ) ⊇ [0, π(δ)] and J_M(δ) ⊇ [π̄(δ), 1] for some π(δ) > 0 and π̄(δ) < 1.

Proof: We prove that for sufficiently low beliefs, a_1 is optimal for any fixed δ ∈ [0, 1); the argument for large beliefs is very similar. Since action a_1 is not weakly dominated, there must be some positive length interval [0, π̂] on which a_1 is the myopically optimal choice, and ∃ū > 0 such that u_1(π) > u_m(π) + ū for all π ∈ [0, π̂/2] and all m ≠ 1. Since the private beliefs are bounded, no observation σ ∈ supp(μ) can ever yield a stronger signal than ā, so any belief π is updated to at most q(π) = πā/[πā + (1 - π)(1 - ā)]. By continuity, for π sufficiently small every posterior lies in [0, π̂/2], and since v(0, δ) = u_1(0), the future value gain v(q, δ) - u_1(q) from any experiment is then less than ū(1 - δ)/δ; discounted by δ, this falls short of the immediate expected loss of at least (1 - δ)ū from any outcome other than a_1. The Bellman equation corresponding to (A-1) then reveals that it is suboptimal to choose any action other than a_1 for such small beliefs. □
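The posterior formula q(π) = πμ^H/[πμ^H + (1 - π)μ^L] that drives Claims A.6 and A.7 can be sanity-checked numerically. A minimal sketch, in which the event probabilities μ^H([x*, 1]) = 0.6 and μ^L([x*, 1]) = 0.2 are invented for illustration:

```python
def q(pi, muH, muL):
    """Bayes posterior on state H after the event {sigma >= x*}, where
    muH and muL are the event's probabilities in states H and L."""
    return pi * muH / (pi * muH + (1.0 - pi) * muL)

muH, muL = 0.6, 0.2   # assumed: the event is three times likelier in state H

# Focused beliefs are fixed points: no observation can unfreeze them.
assert q(0.0, muH, muL) == 0.0 and q(1.0, muH, muL) == 1.0

# Near pi = 0 the jump q(pi) - pi vanishes (the force behind Claim A.6) ...
print(q(0.001, muH, muL) - 0.001)

# ... while on a closed interior interval it is uniformly bounded below by
# some eps > 0 (the eps(I) in the proof of Claim A.7).
eps = min(q(pi, muH, muL) - pi for pi in [i / 100 for i in range(20, 81)])
assert eps > 0.1
```

The two assertions mirror the dichotomy in the text: near the boundary the belief barely moves, while on any closed subinterval of (0, 1) the informative event forces a uniformly bounded jump.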
Claim A.7 (Limiting Patience) As δ → 1, all basins disappear, except that lim_{δ→1} J_1(δ) = {0} and lim_{δ→1} J_M(δ) = {1}; in particular, every basin J_m(δ) becomes arbitrarily small for δ large enough.

Proof: Fix an action index m and a closed interval I = [π_1, π_2] ⊂ (0, 1); we shall prove that J_m(δ) ∩ I is empty for all large enough δ. Since there are informative private beliefs, ∃x* ∈ (1/2, 1) with μ^H([x*, 1]) > μ^L([x*, 1]) > 0. We shall consider the alternative policy x̃, with boundaries x̃_m = x* and x̃_{m+1} = 1 (see Lemma A.1), which takes a_m unless σ ∈ [x*, 1]. Updating π with observation of the event {σ ∈ [x*, 1]} yields the posterior belief q(π) = πμ^H([x*, 1])/[πμ^H([x*, 1]) + (1 - π)μ^L([x*, 1])]. By the definition of x*, there exists ε = ε(I) > 0 with q(π) - π ≥ ε for all π ∈ I, so that q maps [π_2 - ε/2, π_2] into (but not necessarily onto) [π_2 + ε/2, 1]. Since u_m and v(·, δ) are both continuous on [0, 1], and v(π, δ) > u_m(π) outside J_m(δ), we may choose ū > 0 so small that v(π, δ) ≥ u_m(π) + ū for all π ∈ [π_2 + ε/2, 1]. By Claim A.3, we then also have v(π, δ') ≥ u_m(π) + ū for all δ' ≥ δ. Choose δ' so large that the forgone current payoff under x̃, at most (1 - δ') times a bounded payoff difference, is less than the discounted future gain, at least δ'ū times the probability of the event {σ ∈ [x*, 1]}, which is bounded below on I. The Bellman equation T_{δ'}(v) = v then reveals that our suggested policy x̃ beats inducing a_m a.s. at every π ∈ [π_2 - ε/2, π_2], excising that length ε/2 subinterval from J_m(δ'). Iterating this procedure a finite number of times, each time for large enough δ, we see that J_m(δ) ∩ I evaporates for large enough δ. Since the procedure can be applied to any closed I ⊂ (0, 1), the claim follows. □

A.3 Proof of Proposition 3

In light of the interval structure found in Lemma A.1, the experimenter's problem is simply to determine the chances ψ(a_m | π) with which each action should be chosen (or equivalently, what fraction of the signal space is mapped into each action). Thus, the choice set is the compact M-simplex Δ(A) (that is, the same strategy space as in the original observational learning model). The objective function in the Bellman equation corresponding to (A-1) is continuous in this choice vector and in π, and so it follows from the Theorem of the Maximum (e.g. Hildenbrand (1974)) that the correspondence of optimal rules is upper hemi-continuous in π. The optimal rule is a point-valued function of π, except when there is an atom on an interval boundary.

Lemma A.2 (Overturning Principle) For all δ ∈ [0, 1) and a_m ∈ A with J_m(δ) ≠ ∅, there exists ε > 0 and an open interval K ⊇ J_m(δ), such that ∀π ∈ K and ∀a ≠ a_m, we have ||q(π, x, a) - π|| ≥ ε when x is the optimal rule. That is, the public belief must jump upon observing any action other than a_m.

Proof: WLOG assume that π is near 0, so that a_m = a_1. With unbounded beliefs, the optimal rule near J_m(δ) induces actions other than a_m for only the most extreme portion of supp(μ), and it is immediate that the public belief must jump. With bounded beliefs, an extra step is needed. As π is near 0, the updated belief q(π, x, a_1) is close to the current one, so that v(q(π, x, a_1), δ) - v(π, δ) is near zero. Choosing an action other than a_1 implies a boundedly positive immediate loss, which must be compensated by gained future value (again, by the Bellman equation). Since v is continuous, this gain requires a boundedly large belief revision: it follows that q(π, x, a) must be boundedly higher than π. □

References

Aghion, P., P. Bolton, C. Harris, and B. Jullien (1991): "Optimal Learning by Experimentation," Review of Economic Studies, 58, 621-654.

Banerjee, A. V. (1992): "A Simple Model of Herd Behavior," Quarterly Journal of Economics, 107, 797-817.

Bikhchandani, S., D. Hirshleifer, and I. Welch (1992): "A Theory of Fads, Fashion, Custom, and Cultural Change as Information Cascades," Journal of Political Economy, 100, 992-1026.

Easley, D., and N. Kiefer (1988): "Controlling a Stochastic Process with Unknown Parameters," Econometrica, 56, 1045-1064.

Hildenbrand, W. (1974): Core and Equilibria of a Large Economy. Princeton University Press, Princeton.

Lee, I. H. (1993): "On the Convergence of Informational Cascades," Journal of Economic Theory, 61, 395-411.

McLennan, A. (1984): "Price Dispersion and Incomplete Learning in the Long Run," Journal of Economic Dynamics and Control, 7, 331-347.

Rothschild, M. (1974): "A Two-Armed Bandit Theory of Market Pricing," Journal of Economic Theory, 9, 185-202.

Rudin, W. (1987): Real and Complex Analysis. McGraw-Hill, New York, 3rd edn.

Smith, L., and P. Sørensen (1996a): "Pathological Outcomes of Observational Learning," MIT Working Paper.

Smith, L., and P. Sørensen (1996b): "Rational Social Learning Through Random Sampling," MIT mimeo.

Stokey, N., and R. E. Lucas (1989): Recursive Methods in Economic Dynamics. Harvard University Press, Cambridge, Mass.