WORKING PAPER
DEPARTMENT OF ECONOMICS
MASSACHUSETTS INSTITUTE OF TECHNOLOGY
50 MEMORIAL DRIVE, CAMBRIDGE, MASS. 02139

No. 98-04, April 1998

Wald Revisited: The Optimal Level of Experimentation*

Giuseppe Moscarini†
Economics Dept., Yale
Box 208268
New Haven CT 06520-8268

Lones Smith‡
Economics Dept., M.I.T.
50 Memorial Drive
Cambridge MA 02142-1347

April 21, 1998 (first version: July, 1997)

Abstract

This paper revisits Wald's (1947) sequential experimentation paradigm, now assuming that an impatient decision maker can run variable-size experiments each period at some increasing and strictly convex cost before finally choosing an irreversible action. We translate this natural discrete time experimentation story into a tractable control of variance for a continuous time diffusion. Here we robustly characterize the optimal experimentation level: It is rising in the confidence about the project outcome, and for not very convex cost functions, the random process of experimentation levels has a positive drift over time. We also explore several parametric shifts unique to our framework. Among them, we discover what is arguably an 'anti-folk' result: Where the experimentation level is positive, it is often higher for a more impatient decision maker. This paper more generally suggests that a long-sought economic paradigm that delivers a sensible law of demand for information is our dynamic one — namely, allowing the decision maker an eternal repurchase (resample) option.

*This paper was initially entitled "Wald Revisited: A Theory of Optimal R&D".
We have benefited from comments at the Warwick Dynamic Game Theory Conference, Columbia, Yale, M.I.T., Toronto, Pennsylvania, UC - Davis, Stanford, UC - Santa Cruz, Western Ontario, UC - Santa Barbara, Queens, and more specifically Massimiliano Amarante, Dirk Bergemann, Lutz Busch, John Conley, Prajit Dutta, Drew Fudenberg, Peter Hammond, Chris Harris, Angelo Melino, Paul Milgrom, Stephen Morris, Klaus Nehring, Yaw Nyarko, Mike Peters, Sven Rady, Arthur Robson, John Rust, Chris Sims, Peter Sørensen, Ennio Stacchetti, and Steve Tadelis. Finally, we acknowledge helpful feedback from the Usenet group sci.math.research — especially, Jon Borwein of Simon Fraser University. Moscarini gratefully acknowledges support from the Yale SSFRF, and Smith from the National Science Foundation (grant SBR9711885) for this research.

Keywords: learning, experimentation, sequential analysis, R&D
JEL classifications: C11, C12, C44, C61, D81, D83
†e-mail address: gm76@pantheon.yale.edu
‡e-mail address: lones@lones.mit.edu

1. INTRODUCTION

This paper revisits a classic decision theory contribution around its semicentennial — Wald's (1947a) sequential probability ratio test.¹ More than any other work, this has defined the paradigm of optimal sequential experimentation, either statistical or Bayesian. For it tackled afresh the simplest of decision problems — choosing among two actions (perhaps accepting hypotheses, for instance), each optimal in one of two states of the world.
In his Bayesian formulation, Wald posits that the decision maker (DM), ever uncertain of the state, can buy multiple i.i.d. informative signals at constant marginal cost. Wald shows that the DM should purchase signals one at a time, sequentially, and act when sufficiently convinced of the state. Yet we venture that the experimental schedule should accelerate when homo economicus runs the laboratory in real time. We offer here a motivated twist along these lines that also creates a strictly richer spin on Wald's tale: the DM is assumed impatient, but may each period elect a variable-size experiment, only at an increasing, convex cost of information. We argue that this experimentation story provides a compelling, economically-motivated pure theory of dynamic R&D. It yields some intriguing new testable implications, and also points economists to a long-sought well-behaved theory of information demand.

This paper has three goals, attacked in sequence after an essential literature review. First, we motivate our impatient DM and convex cost spin on Wald's discrete time tale, and then introduce and solve a tractable continuous time equivalent model of experimentation — which we argue is a control of variance for a diffusion with uncertain mean. Second, we investigate the robust character of the experimentation level. One might suppose that it peaks when the DM has middling, elastic beliefs over the state. In fact, we find the opposite: Given a convex cost function, experimentation grows with the expected payoff, which is greatest at extreme beliefs. For example, R&D levels are least when one is most discouraged, peaking just prior to project approval. We argue that the DM acts like a neoclassical competitive firm producing information at an increasing marginal cost, with an increasing "producer surplus." Optimal stopping demands that this surplus equal the cost of delaying the final decision, i.e. the portion of current value killed by discounting.
Ipso facto, the research level that generates this surplus rises in the value. We also show that the level is convex in beliefs for not too convex costs, and so drifts up over time. Finally, we explore how experimentation responds to some natural parametric shifts. Among our findings, we discover that contrary to established folk wisdom, a more impatient DM often experiments at a higher level, provided that he still experiments.

¹The original source is a 1943 war-classified mimeo by Wald. Wald (1945a) and (1945b) were the first published versions of the program. In his (1947) preamble, he credits Milton Friedman and Allen Wallis for proposing sequential analysis. Wallis details this history in his (1980) retrospective paper.

2. THREE RELATED LITERATURE THREADS

2.1 Statistical Testing

The Neyman-Pearson (1933) theorem says that the best critical test of two competing hypotheses given a fixed sample has the form: accept H0 (vs. H1) iff the sample likelihood ratio exceeds a threshold. Wald and Wolfowitz (1948) proved that the sequential probability ratio test (SPRT) is optimal — minimizing the expected number of observations for a given power, given either hypothesis. Arrow, Blackwell, and Girshick (1949) further developed this idea, and it was among the earliest of contributions to dynamic programming. The SPRT has been explored and generalized by several authors (see Chernoff (1972)). But surprisingly, no one has characterized the optimal sample size dynamics with discounting in a half century, where a unit sample size is no longer optimal — our main contribution. Cressie and Morgan (1993), a recent frontier, prove, for instance, that a variable size probability ratio test is unconditionally optimal among sequential sampling procedures, while an SPRT is best with superadditive design costs and no discounting.

2.2 Optimal Experimentation

Optimal experimentation was born out of sequential analysis. Among the canonical Bayesian learning models, those with a discrete action space are essentially stopping rule problems, or more generally, bandit models; continuous action models are richer, and not so easily pigeon-holed. Our model is rare as it grafts together these two contexts: variable experimentation with an eventual binary stopping decision. Crucially, because our DM makes a pure information purchase until stopping, it differs from all other experimentation models where myopic expected stage payoffs are themselves informative random signals; here, payoffs are known and negative (the information cost) until the final decision stage.

This paper investigates the drift of experimentation, as opposed to the long-run question, 'Is learning complete?'² The literature has also studied three main short-run questions. First, for finite action models like bandits, there has been work on the sequential ordering of experiments, or stochastic scheduling. Second, the direction of experimentation has been explored in continuous action models like monopoly pricing.³ Finally, the secular diminishing trend in experimentation has been remarked in papers on search with learning (e.g. rising reservation prices). This comes closest to our thrust, and yet runs exactly counter to the rising experimentation levels that we find. For with price search, the information purchase is blended with, and thus colored by, immediate payoff concerns.

²See, for instance, Easley and Kiefer (1988) or Kihlstrom, Mirman, and Postlewaite (1984).
³McLennan (1984) and Trefler (1993) are good examples. Keller and Rady (1997), who posit a randomly shifting demand curve, assume continuous time learning; it is therefore also a technically relevant paper.
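As a minimal illustration of the SPRT from §2.1, consider a Monte Carlo sketch with Gaussian signals: the log likelihood ratio accumulates one observation at a time until it exits a symmetric band. All parameter values here (means ±μ, threshold a) are hypothetical illustrations, not taken from the paper.

```python
import random

def sprt(mu, sigma, a, true_mean, rng):
    """One SPRT path: H0 has mean -mu, H1 has mean +mu, noise sd sigma.
    Accumulate the log likelihood ratio until it exits (-a, a).
    Returns (accept_H1, number_of_observations)."""
    llr, n = 0.0, 0
    while -a < llr < a:
        x = rng.gauss(true_mean, sigma)
        # log LR of one N(+mu, sigma^2) vs N(-mu, sigma^2) observation
        llr += (2 * mu * x) / sigma ** 2
        n += 1
    return llr >= a, n

rng = random.Random(0)
runs = [sprt(mu=0.5, sigma=1.0, a=3.0, true_mean=0.5, rng=rng)
        for _ in range(2000)]
accept_rate = sum(ok for ok, _ in runs) / len(runs)
avg_n = sum(n for _, n in runs) / len(runs)
print(round(accept_rate, 3), round(avg_n, 1))  # most paths correctly accept H1
```

Note the fixed-sample contrast: the SPRT stops early on decisive evidence, so its average sample size is well below what a Neyman-Pearson test of equal power would require.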
2.3 Research and Development

We interpret a special case of our analysis as a pure theory of dynamic R&D. While this is obviously a well-studied subject, other work differs in a key dimension. During 1971-82, Kamien and Schwartz modelled R&D as (very roughly) covering a possibly uncertain distance in 'progress space', given cost and effort functions. Dutta (1997) has recently assumed a related budget constrained model. It may help to imagine extant work as capturing 'D', and ours 'R'. For in our setting, the actual state of the world ("Is this feasible?") always hangs in the balance, while these papers posit a goal that is eventually attainable, if desired. We cast research as optimal learning rather than resource allocation; a theory of science (the search for truth) and not engineering (its implementation). An eventual comprehensive theory of dynamic R&D will no doubt embed both phases.

3. MOTIVATING THE CONTINUOUS TIME MODEL

3.1 Bayesian Sequential Analysis

A. The Final Static Decision Problem. In a standard problem, the DM eventually must choose between two actions: a = A, B. The world state is θ = H or θ = L; action A is better in state L, and B in state H. The DM does not know the state, and p is the DM's prior belief on H. The DM is Bayesian and risk neutral (simply treat utilities as payoffs). If action a pays π_a^θ in states θ = L, H, then his expected payoff for action a is affine in p, say π_a(p) = pπ_a^H + (1−p)π_a^L. We also admit the costless option of never deciding, and thus need a default null action 0, yielding constant zero payoff. His static value is π(p) = max(π_A(p), π_B(p), 0). If one action is dominated, then we can assume WLOG (without loss of generality) that π(p) is always increasing in p. This special case of one risky action conveniently captures a stylized R&D problem: The DM builds a costly new prototype, and it might or might not work. 'Building' (action B) pays h = π_B(1) > 0 in state H and ℓ = π_B(0) < 0 in state L, while the safe action 'abandoning' means the DM earns zero regardless. So π(p) = max(0, hp + ℓ(1−p)), and the DM invests iff p exceeds the indifference belief p̄ = −ℓ/(h−ℓ).

B. Information Acquisition. Before choosing an action, the DM can initially acquire informative signals of θ at a fixed unit cost. Assume he maximizes the expected final reward less costs incurred. As Wald and Wolfowitz proved, sequential purchases are optimal, and this is a pure optimal stopping exercise: The DM quits at the stopping time T and chooses A (or B, or the null action, i.e. quits) with posterior p_T < p̲ (or p_T > p̄, or p_T ∈ [p̲₀, p̄₀]).

3.2 Discrete Time Experimentation with Impatience and Convex Costs

Our goal is to extend Wald's setting along two dimensions: payoff discounting and cost convexity of experimentation. These assumptions are of particular interest for economists, less so for statisticians. This may explain why this stone has been left unturned.

First assume an impatient DM, who maximizes the expected present discounted value of wealth. Faced with the time cost of delay, the DM does not necessarily wish to proceed purely sequentially, but may opt to stack his information purchases. After seeing his signal outcomes in any period, he either purchases more, or stops and chooses an action. In period k, the DM may buy N_k i.i.d. signals X₁, ..., X_{N_k} at cost C(N_k), with C(0) = 0. One might venture that running a lab incurs a daily rent, independent of the experimentation level. In this case, there is a positive fixed flow experimentation cost C(0) > 0. The analogy with Wald's setting is obviously closest with no fixed costs, C(0) = 0.

Next assume strictly convex information costs within a period; this fosters more equal purchases across periods, reinforcing Wald's sequential conclusion, absent discounting. Such an assumption makes economic sense on two grounds. First, plausibly not all researchers are equally talented in producing information. More intensive information search then must draw on the efforts of less capable researchers. Second, as with non-Bayesian inventive activity, since contemporaneously-produced knowledge is based on the same current stock, identical or similar discoveries are not rare:⁴ even if created at constant cost, different research laboratories may well expend resources duplicating results. Likewise, concurrent Bayesian information tends to be correlated, as it grows increasingly hard to produce i.i.d. signals. Then note that a constant marginal cost for correlated information intuitively corresponds to an increasing marginal cost of independent information.

⁴A recurring theme in science is that great minds simultaneously think alike (e.g. Newton and Leibniz' codevelopment of calculus). Indeed, those seeing further often stand on the shoulders of the same giants.

3.3 Developing the Continuous Time Experimentation Model

Although variable intensity experimentation is easily understood and formulated in discrete time using dynamic programming, the solution is intractable even for the very simplest signal structures. Perhaps this explains why no one has seriously pursued it. We now describe a tractable continuous time learning paradigm that captures the quintessence of the discrete time story. Below we sketch and motivate it, and focus on its substance. Its recursive solution is found in section 4.2, and a formal economic justification in Appendix B. In a work in progress, we argue that the model and its solution is the limit of a rich class of discrete time models with a vanishing time interval between experiments.

A. The Signal Process. In discrete time, choosing the number of i.i.d. signals to buy, each with distinct state-dependent means, is an apt description of variable intensity experimentation. For instance, if X has mean ±μ in states H, L, then experiments offer a noisy but informative glimpse of the signal mean.
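To fix ideas in the discrete time setting above, here is a minimal sketch of the Bayes posterior after a stacked purchase of N i.i.d. Gaussian signals with state-dependent mean ±μ. All parameter values are hypothetical illustrations.

```python
import math
import random

def posterior_after_stack(p, xs, mu, sigma):
    """Bayes posterior on state H after observing i.i.d. signals xs,
    each N(+mu, sigma^2) in state H and N(-mu, sigma^2) in state L."""
    def loglik(m):
        # additive constants cancel in the likelihood ratio below
        return sum(-((x - m) ** 2) / (2 * sigma ** 2) for x in xs)
    log_ratio = loglik(mu) - loglik(-mu)
    odds = (p / (1 - p)) * math.exp(log_ratio)
    return odds / (1 + odds)

rng = random.Random(1)
mu, sigma, p0 = 1.0, 2.0, 0.5
xs = [rng.gauss(mu, sigma) for _ in range(40)]  # true state is H
p1 = posterior_after_stack(p0, xs, mu, sigma)
print(round(p1, 3))  # belief moves toward 1 when the state is H
```

Because the sample average is sufficient for the mean, only the total signal count matters for the posterior precision — the motivation for modeling experimentation as variance control below.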
In fact, the very goal of experimentation is to infer this mean, for in so doing, the DM learns the state. Since the average signal X̄ = Σᵢ Xᵢ/N is sufficient for the unobserved mean, greater experimentation only serves to decrease the variance of X̄: Doubling the sample size precisely halves the variance. Motivated by this general observation, we model continuous time experimentation as the control of variance of a diffusion observation process (x_t). For a fixed control, (x_t) is a Brownian motion with constant uncertain drift, while its variance is known. Nature chooses the drift μ^θ in state θ, with μ^H = −μ^L = μ > 0, while the DM controls its flow variance σ²/n_t, with n_t the experimentation intensity. The observation diffusion process thus solves the stochastic differential equation (SDE)

    dx_t^θ = μ^θ dt + (σ/√n_t) dW_t    (1)

in state θ. Here, we think of n_t as the flow of information purchases, the experimentation level or intensity. Doubling n_t halves the "variance" of dx_t^θ, yielding a doubly informative time-t experiment. As usual, dW_t ~ N(0, dt) is the Wiener increment, while the control n_t depends on the observation and intensity history {(x_s, n_s), 0 ≤ s < t}. It is a feedback and not open loop control, not decided at time 0.⁵

⁵See §17.6 in Liptser and Shiryayev (1978), §17.5 in Chernoff (1972), or §4.2 in Shiryayev (1978) for the pure problem of estimating the bivariate drift of a Brownian motion. Their motivation is its link to the heat equation. Neither source discusses control of variance. To be sure, without time preference, there is no pressing reason to consider such an exercise.

Remark. Our choice of process (x_t) yields a motivational pure control of variance: A realization x_t = ∫₀ᵗ dx_s is an unweighted running integral of sample means, and so is not a sufficient statistic for {(x_s, n_s), 0 ≤ s < t}. Consider instead the observation process (x̂_t), its drift obeying dx̂_t^θ = n_t μ^θ dt + σ√n_t dW_t. In that case, ∫₀ᵗ n_s ds can be thought of as the running sample size, so that (x̂_t, ∫₀ᵗ n_s ds) ∈ R² is a simple sufficient statistic for the mean. This process also yields a clearly well-defined level-0 experimentation, as will the belief filter (2) in §4.2. We see also that a higher intensity level n_t essentially accelerates time — advancing the schedule on which the DM observes future samples. The two processes (x_t) and (x̂_t) are mere conceptual devices, and a choice between them is not critical: Both are sufficient for the state θ, and crucially yield the same belief filter (2), which is what we work with.⁶

⁶This is why the signal must be defined recursively via an SDE rather than as an exogenous Ito process.

B. The Cost of Flow Experimentation. Intensity level n incurs a convex (and thus continuous) flow cost c(n): The cost function c(n) is finite, increasing, strictly convex, and differentiable on (0, ∞),⁷ with nonnegative fixed costs c(0) ≥ 0, and 'Lipschitz-down' on (0, ∞): c′(n₂) − c′(n₁) ≥ λ(n₂ − n₁) for some λ > 0 and any n₂ > n₁ > 0. Strict cost convexity means γc(n₁) + (1−γ)c(n₂) > c(γn₁ + (1−γ)n₂) for n₁ ≠ n₂ and 0 < γ < 1. Strict cost convexity precludes undesirable bang-bang control solutions, while the Lipschitz-down property and differentiability play purely technical roles — ensuring that our problem is well-defined in continuous time.

⁷Thus, the cost function is C¹ (continuously differentiable); by convexity, it has a right derivative c′(0+).

Remark. Weak cost convexity remarkably obtains WLOG. Indeed, for any ε > 0, by chattering sufficiently rapidly between levels n₁ and n₂ with weights (γ, 1−γ) in any time interval, the DM can achieve, at small payoff loss, any average cost at the chosen intensity level within ε of the lower convex hull vex(c) = sup{ĉ ≤ c | ĉ convex}; then let ε → 0.

C. The Objective Function. At each time t, the DM chooses whether to stop and earn the final expected payoff π(p_t), or to continue experimenting at a chosen intensity level n_t ≥ 0 with flow cost c(n_t). Admissibility demands that the random stopping time T and intensity level n_t each be functions of the observed history. In the Optimal Control and Stopping (OCS) problem, the impatient DM, facing an interest rate r > 0, maximizes his expected discounted return less incurred experimentation costs: E[∫₀ᵀ −c(n_t)e^{−rt} dt + e^{−rT} π(p_T) | p₀]. We write the optimized value V(p₀) — the supremum w.r.t. T and (n_t) — since Appendix B.1 proves that the current posterior belief p₀ on state H is a sufficient statistic for the observed history to that moment.

4. THE OPTIMAL LEVEL OF EXPERIMENTATION

4.1 Competing Static and Dynamic Intuitions

Let the convex function U(p) describe the expected payoff of a one-shot Bayesian decision program. At the prior belief p, a signal yielding the random posterior belief q is worth X(q|p) = E[U(q) − U(p)]. This admits a motivational visual depiction.⁸ Beliefs being a martingale, E(q) = p; we may therefore tack on any multiple of (q − p): for any subdifferential d ∈ ∂U(p), X(q|p) = E[U(q) − U(p) − d(q − p)], since d[E(q) − p] = 0. Put d = U′(p) when defined (as it is for a.e. p), and otherwise choose a slope between the left and right derivatives. As U is convex, X(q|p) is then the weighted area between U and any supporting tangent line at p, with weights given by the density over q (see Figure 1).

⁸We thank Paul Milgrom for this nice insight. We do not wish to delve into a technical proof of this assertion, as it would detract from our focus. We intend the point and explanation to be intuitive.

[Figure 1: Static Value of Information. The shaded region is the area between the value function and a supporting line at p, appropriately weighted.]

With a continuous signal support, the shaded area is appropriately weighted; with discrete signals, we instead have a weighting of vertical line segments in this region (e.g. the thick dashed lines). In either case, for a purely static value function U = π, the information value is maximal when the DM is indifferent between the two actions. Intuitively, he values information most when he is most uncertain about his action choice, and generally likes to spread his posterior beliefs; for the specific case of the static payoff frontier π, this visual information value peaks at the kink p̄. Alternatively, the impact of new information on beliefs is increasing in the variance p(1−p), and peaks at p = 1/2, in the middle, where the DM is most uncertain as to which state is true. Either static logic suggests that information value and (extrapolating to margins) demand are quasiconcave 'hill-shaped' functions of p.

The above static logic fails for our impatient DM in a dynamic setting. For the DM has an incentive to minimize the present discounted cost of information, and therefore wishes to delay his costly high-intensity experimentation until just prior to stopping. As this occurs only at extreme beliefs, information demand ought to be greatest for extreme beliefs. Of course, this intuition cannot establish the shape of the experimentation schedule, let alone its intertemporal trend. For that, we must consider the recursively-formulated problem.
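The static value of information X(q|p) can be computed directly for the stylized R&D payoff. A minimal sketch follows; the payoff numbers h = 1, ℓ = −1 and the binary signal precision 0.8 are hypothetical illustrations, chosen so the kink sits at p̄ = 0.5.

```python
# Static value of information X(q|p) = E[pi(q)] - pi(p) for the stylized
# R&D payoff pi(p) = max(0, h*p + l*(1-p)), under a symmetric binary signal.

def pi(p, h=1.0, l=-1.0):
    return max(0.0, h * p + l * (1 - p))

def value_of_info(p, q=0.8):
    """Signal matches the true state with chance q; posteriors by Bayes rule."""
    ph = p * q + (1 - p) * (1 - q)      # chance of the 'high' signal
    post_hi = p * q / ph                # posterior after a high signal
    post_lo = p * (1 - q) / (1 - ph)    # posterior after a low signal
    return ph * pi(post_hi) + (1 - ph) * pi(post_lo) - pi(p)

grid = [i / 100 for i in range(1, 100)]
vals = [value_of_info(p) for p in grid]
best = grid[vals.index(max(vals))]
print(best)  # -> 0.5: the static value peaks at the kink p_bar
```

This reproduces the 'hill-shaped' static intuition: the one-shot value of information is maximized exactly at the indifference belief, the shape the dynamic analysis below overturns.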
4.2 The Recursive Formulation and Solution

A. The Belief Filter and Bayes Problem. Intuitively (and formally, as we later show), the observation process (x_t, n_t) induces a diffusion belief process (p_t) in continuous time. Given a prior p₀, beliefs (p_t) evolve according to Bayes rule. If ζ ≡ 2μ/σ denotes the signal-to-noise ratio factor of (x_t), then Theorem 9.1 of Liptser and Shiryayev (1977) (LS77) asserts:

    p_t = p₀ + ∫₀ᵗ p_s(1 − p_s) ζ √n_s dW̄_s    (2)

where the Wiener increment obeys

    dW̄_s = (√n_s/σ)[dx_s^θ − (p_s μ + (1 − p_s)(−μ)) ds]  for θ = L, H.    (3)

Alternatively, LS77's Theorem 9.1 implies that (W̄_t) is a Wiener process from the DM's unconditional perspective,⁹ as he does not know the true drift, and so cannot observe the true noise process (W_t) that drives the signal process in (1). The precise measure-theoretic statement, in terms of signal filtrations, appears in Appendix B.1.

⁹Namely: the increment W̄_t − W̄_s is Gaussian N[0, t − s] given the observed history (x_{t′}, 0 ≤ t′ ≤ s) and (n_{t′}, 0 ≤ t′ ≤ s) — or more simply, just (x_s, ∫₀ˢ n_u du) as of time s — and is independent of W̄_s, for all t > s.

As a driftless diffusion, beliefs (p_t) are an unconditional martingale, with least variance near the extremes. Intuitively, these formulae reveal how beliefs are computed: the DM updates beliefs upward, in favor of H, iff the signal process rises faster than he expects, i.e. dx_t > [p_t μ + (1 − p_t)(−μ)]dt.

Remark. Substituting the ex post observed history (x_t, n_t) into the observation process (1), we find that dW̄_t = 2(√n_t/σ)μ(1 − p_t)dt + dW_t in state H, and dW̄_t = −2(√n_t/σ)μ p_t dt + dW_t in state L. Plugging either Ito process into the filter (2) yields the SDE solved by the conditional belief processes (p_t^H) or (p_t^L). Not surprisingly, beliefs have a positive drift in state H, and a negative drift in state L.

B. The Value Function. The supremum value of the OCS problem is

    V(p₀) = sup_{T, (n_t)} E[∫₀ᵀ −c(n_t) e^{−rt} dt + e^{−rT} π(p_T) | p₀]    (4)

subject to the belief filter (2). For intuitively, with optimal learning this value function is convex: a signal spreading the belief p₀ to p₁ with chance γ₁ and to p₂ with chance γ₂ (γ₁ + γ₂ = 1, γ₁p₁ + γ₂p₂ = p₀) cannot possibly hurt the DM, as he can ignore it; if he optimizes at each pᵢ, he then gets γ₁V(p₁) + γ₂V(p₂) ≥ V(p₀). The next lemma, proved in Appendix B.1, also mandates threshold stopping rules.

Lemma 1 (Convexity) Consider any cost function c(n) ≥ 0. Then the supremum value V is convex in p. Also, V(p) = π(p) for p ≤ p̲ and p ≥ p̄, for some cut-offs 0 < p̲ ≤ p̄ < 1. If the null action is ever exercised, then V(p) = π(p) = 0 in [p̲₀, p̄₀], where p̲ < p̲₀ ≤ p̄₀ < p̄.

C. Optimality Conditions. The DM selects action B for p ≥ p̄, action A for p ≤ p̲, and absent the null action, he experiments at level n(p) for p ∈ (p̲, p̄). If the null action is ever chosen, it is chosen in a subinterval [p̲₀, p̄₀] ⊂ (p̲, p̄). We then partition the OCS problem into an Optimal Control (OC) exercise for the schedule n(p) (Appendix B.1.c proves this Markovian form) and an Optimal Stopping (OS) problem for the boundaries p̲, p̄, and perhaps p̲₀, p̄₀. The experimentation domain is then E = (p̲, p̄), or E = (p̲, p̲₀) ∪ (p̄₀, p̄) if the null action is viable. We now develop recursive equations for what we call the value v; we then prove in Proposition 1 that v coincides with the supremum value, or v = V. Hence, we solve for the optimal dynamic policy using ordinary differential equation (ODE) methods in Proposition 2.
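Before turning to the recursive equations, the drift claims for the belief filter can be checked with a minimal Euler-Maruyama sketch of (2), conditioning on state H. All parameter values here are hypothetical illustrations.

```python
import random

def simulate_belief(p0, mu, sigma, n, dt, steps, rng):
    """Euler scheme for dp = p(1-p)*zeta*sqrt(n)*dWbar with zeta = 2*mu/sigma.
    Conditional on state H, the innovation dWbar gains drift
    (sqrt(n)/sigma)*2*mu*(1-p)*dt, per the Remark above."""
    zeta = 2 * mu / sigma
    p = p0
    for _ in range(steps):
        dW = rng.gauss(0.0, dt ** 0.5)
        drift = (n ** 0.5 / sigma) * 2 * mu * (1 - p) * dt  # state-H drift
        p += p * (1 - p) * zeta * (n ** 0.5) * (drift + dW)
        p = min(max(p, 1e-9), 1 - 1e-9)  # keep the Euler step inside (0,1)
    return p

rng = random.Random(42)
paths = [simulate_belief(0.5, mu=1.0, sigma=1.0, n=1.0, dt=0.01,
                         steps=200, rng=rng) for _ in range(500)]
print(sum(paths) / len(paths))  # average belief drifts above the prior 0.5
```

Dropping the drift term simulates the DM's unconditional perspective, under which the same scheme produces (up to discretization error) a martingale started at p₀.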
Since beliefs (p_t) are a martingale obeying (2), for any given experimentation region E, the Hamilton-Jacobi-Bellman (HJB) equations associated to the OC problem are

    rv(p) = sup_{n≥0} {−c(n) + (1/2) n p²(1 − p)² ζ² v″(p)}  ⇔  rv(p) = sup_{n≥0} {−c(n) + n Σ(p) v″(p)}    (5)

where Σ(p) ≡ p²(1 − p)²ζ²/2 measures "belief elasticity". (Beliefs being driftless, no first derivative term v′(p) appears.) For any given control policy n(p), so that rv(p) = −c(n(p)) + n(p)Σ(p)v″(p) on E, the Stefan problem (ST) associated to the OS problem imposes value matching plus smooth pasting conditions — that the value v be tangent to the static payoff function π — at the free boundaries p̲, p̄, and possibly p̲₀, p̄₀:

    v(p̲) = p̲ π_A^H + (1 − p̲) π_A^L   and   v(p̄) = p̄ π_B^H + (1 − p̄) π_B^L    (6)
    v′(p̲) = π_A^H − π_A^L   and   v′(p̄) = π_B^H − π_B^L,   plus possibly v(p̲₀) = v′(p̲₀) = v(p̄₀) = v′(p̄₀) = 0    (7)

While the functional problem HJB+ST cannot be solved in closed form, we can create an equivalent unified two-point boundary value problem (SU) below — a second order nonautonomous ODE with free boundaries — and fully characterize its solution {v(p), p̲, p̄} without the closed form.

Proposition 1 (Value Existence / Uniqueness / Verification)
(a) There exists a unique solution {p̲, p̄, v} or {p̲, p̄, p̲₀, p̄₀, v} of HJB+ST, with value v ∈ C¹, strictly convex (v″ > 0) in E, and with interior thresholds 0 < p̲ < p̄ < 1.
(b) The solution v coincides with the supremum value of OCS, namely of objective (4).
(c) The value v is jointly increasing / decreasing / U-shaped in p as the static payoff π is.

Proof Sketch: The proof is largely appendicized; here we provide some intuitive insights. The value is clearly continuous as c is, and everywhere finite: rv(p) ≤ r max(π(0), π(1)) < ∞. For a fixed p ∈ E, the FOC of the HJB (5) is c′(n) = Σ(p)v″(p); the SOC is met because c is strictly convex, and thus −c(n) + nΣ(p)v″(p) is concave in n. Substituting the FOC back into (5) yields rv(p) = nc′(n) − c(n) ≡ g(n(p)). Now, g is strictly increasing, with g(0) = −c(0) ≤ 0, and g(n) ↗ ∞ for large n, since c′ is unbounded above by Claim 1 in §A.1; so rv(p) = g(n(p)) is uniquely soluble in n, with n(p) = f(rv(p)) for the strictly increasing inverse f ≡ g⁻¹. The problem HJB+ST is then equivalent to the two-point boundary value problem SU: v″ = c′(f(rv))/Σ, plus (6)-(7). Theorem 1 in §A.2 proves that SU is uniquely soluble (part (a) here), while this solution is verified to be the value of OCS (part (b) here) by Theorem 3 in §B.3. For (c), v shares the shape of π by convexity, value matching, and smooth pasting.

Remark. Strict cost convexity rules out one undesirable and not implausible outcome: suddenly exploding the experimentation level over a vanishing time interval [0, Δ], Δ → 0. With a linear (i.e. not strictly convex) cost function, such a policy — one that quickly achieves arbitrarily perfect information — is preferred with discounting, absent any cost premium.
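To make the inversion n = f(rv) concrete: with quadratic costs c(n) = n², the surplus is g(n) = n·c′(n) − c(n) = n², so f(w) = √w. A minimal sketch follows; the increasing value profile is a hypothetical stand-in for the solved v, not the solution of SU.

```python
# With quadratic costs c(n) = n^2, the surplus g(n) = n*c'(n) - c(n) = n^2,
# so the policy is n(p) = f(r*v(p)) = sqrt(r*v(p)) and inherits v's shape.

def policy(v, r=0.1):
    """f = g^{-1} for g(n) = n^2, applied to the return w = r*v."""
    return (r * v) ** 0.5

values = [0.5, 1.0, 2.0, 4.0]         # hypothetical increasing value profile
levels = [policy(v) for v in values]  # implied experimentation levels
assert all(a < b for a, b in zip(levels, levels[1:]))  # n rises with v
print([round(n, 3) for n in levels])  # -> [0.224, 0.316, 0.447, 0.632]
```

The concavity of f here means the level responds strongly to the value when the value is small, and ever more sluggishly as it grows — the "increasing concave function of the return" noted below.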
c(-) is Theorem (c) in differentiable is and thus {k increasing, is with hm-ujio w^~'^^'{w)/^{w) established in the appendicized Claims 2-4. {h) is n = f{w) then the policy {f{rv{-)),p,p} [0,1], continuous in E; level n{p) is Proof Sketch: Part Part OCS. differentiable in the return some p G at HJE+ST then the solution {f{rv{-)),2^p] from [0, 1], cost ^{w) =c'{f{w)) of the optimal intensity level and strictly concave, ("A"). = e°" — 1 (some a somewhere and supremum V{p) > 0). = c(0) = 0, OCS is not c'(0) of This pathological case, an undesirable by- product of the continuous time approximation, arises since the VM. may never choose A (or the null action) in finite time given the negligible cost of running very small experiments. The Optimal Experimentation 4.3 FOC Even though the c^{n) = Level: T,{p)v"{p) The 2 (Monotonicity) Given creasing in the value v{p) for p e £. For instance, quadratic costs c{n) The experimentation Here level is {'k), 0, with /(O) r? yields surplus g(n) = n^, n with rises {OS), and if v. is is what level c!{n) of {OC). Focus first information equal linear in the experimentation level n, the is = iff c{0) = — 0. y/n. i-^ n. We formally greater with a higher value, and therefore the There are two decisions at each moment to experiment, at MB = T,{p)v"{p) that: then an increasing concave function of the return \/rv{p). demands that the marginal cost precision f{rv{p)) are and thus f{n) a concrete economic intuition for the monotonicity of f is = the optimal intensity level n{p) is strictly in- > argue that the bang per research dollar level an Information Firm / allows us to conclude weakly exceeds f{0) It = as and the policy equation n{p) jointly insoluble in closed form, the monotonicity of Lemma VM constant, and the total benefit 10 is in time: on the its experiment or stop level choice. Optimality marginal benefit. Since belief marginal benefit of experimentation then linear: uMB. 
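The quadratic example is easy to check numerically. The following is an illustrative sketch (not code from the paper): for c(n) = n², the surplus is g(n) = n c'(n) − c(n) = n², its inverse is f(w) = √w, and the level rises with the return, as Lemma 2 asserts.

```python
# Illustrative sketch (not code from the paper): the FOC machinery for
# quadratic experimentation costs c(n) = n^2, the example in the text.
def c(n):          # flow cost of an experiment of size n
    return n ** 2

def c_prime(n):    # marginal cost c'(n) = 2n
    return 2 * n

def g(n):          # producer surplus g(n) = n c'(n) - c(n) = n^2
    return n * c_prime(n) - c(n)

def f(w):          # inverse surplus f = g^{-1}, here f(w) = sqrt(w)
    return w ** 0.5

# f inverts g, so rv(p) = g(n(p)) pins down the level n(p) = f(rv(p)).
for n in [0.5, 1.0, 2.0, 7.3]:
    assert abs(f(g(n)) - n) < 1e-12

# Monotonicity (Lemma 2): a higher return rv implies a higher level.
levels = [f(w) for w in [0.0, 0.25, 1.0, 4.0]]
assert levels == sorted(levels)
print(levels)    # [0.0, 0.5, 1.0, 2.0]
```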
So at an optimum, the surplus g(n) = nMB − c(n) = n c'(n) − c(n) equals total (flow) benefits less total (flow) costs of experimentation, or the (flow) producer surplus of information. Imagine the DM as a competitive firm, facing an increasing marginal cost curve, "selling" himself information at the constant "price" Σ(p)v''(p). This surplus rises in n, and is strictly convex if c is strictly convex. Next turn to the optimal stopping decision: the cost of delay must equal the surplus from optimally experimenting, or rv(p) = g(n(p)). So as the value v and thus the delay cost rv rises, the DM must choose a higher level n, since only a higher experimentation level generates the requisite higher surplus. Simply put, his informational firm closes down (and he acts) when he cannot generate sufficient net profits (producer surplus) to justify the time cost of his capital rental (his deferred action).

Figure 2: Value Function and Experimentation Demand. We depict both the static payoff function π (thick dashed line) and the dynamic value function v (solid line), strictly convex in the experimentation domain E = (p̲, p̄), along with the intensity level n (thin dashed line). The R&D model, with regions "abandon", "experiment", and "build", is illustrated on the left, and a more general decision model (no null action) on the right. The demand is increasing in v. On the vertical axis, the option value of experimentation (the vertical distance between the static and dynamic values) is maximized at the kink p̂.

Proposition 3 (The Optimal Level of Experimentation) Given (π), the optimal level of experimentation n(p) is increasing (resp. decreasing, U-shaped) in p in the domain E if the static payoff frontier π(p) is increasing (resp. decreasing, U-shaped) in p.

For immediate context, consider our R&D model (see Figure 2, left panel). Here, π(p) strictly increases in p, and thus the research level rises as we approach confidence in the 'build' decision. This provides optimizing foundations for a commonly-observed phenomenon: research expenditures on an uncertain investment stochastically grow over time, only to (i) shrink because of discouragement (recently, cold fusion), or (ii) continue growing, with new life/money breathed into the project by a key discovery later on, as it has likely headed for development (e.g. spinal cord research, or a new shuttle rocket engine design). A similar pattern also emerges in how clinical tests of new drugs proceed: small tests first, sometimes later larger tests, and then quite expensive field trials.^

^A rare empirical study of project-level R&D expenditures in the pharmaceutical industry (DiMasi, Grabowski, and Vernon (1995)) reveals a pattern strikingly similar to our theoretical prediction. Time-intensive pre-FDA clinical testing usually occurs over three sequential conditional phases, of growing size.

4.4 The Option Value of Experimentation

Write v(p) = π(p) + I(p), where I(p) is the expected present value of information (or experimentation). Since learning costs time and money, the value v necessarily inherits the general shape of the static payoffs π. We next show that I(p) peaks at the kink p̂ where the myopically-best action switches. For simplicity, assume no null action. Experimentation is most valuable when the DM is most uncertain which action to take, as no action is dominant. Since v(p) ≥ π(p) always, I(p) ≥ 0. Think of I(p) as the option value of waiting and choosing an action after optimally experimenting, rather than taking the myopically-best action now. This option to change one's plan is worth the most when one is least sure of which action to take.
So the suggestion in section 4.1 of a hill-shaped value of information is an apt description of this option value; however, any implication that it alone determines the experimentation level was false, for the flow return to experimentation also includes the terminal payoff π(p).

Proposition 4 Assume (π). If the null action is never exercised, then the option value of experimentation I(p) is single-peaked in p, maximized at the kink p̂ where π_A = π_B. If the null action is exercised, then I(p) has two peaks: one at each kink of π.

Proof of first case: By value matching and smooth pasting (6)-(7), v − π_B is zero with zero slope at p̲; since information value owes to v'' > 0, v − π_B rises on (p̲, p̂]. Likewise, v − π_A falls on [p̂, p̄). Since π_A and π_B cross at p̂, I = v − max(π_A, π_B) rises until p̂ and is later falling. □

Remark. Since v''(p) = ξ(rv(p))/Σ(p) and ξ' exists (Claim 4), some have suggested that the sign of v''' should be relevant for the experimentation drift. It is instructive to see that by Proposition 2, v''' exists, and equals

    v'''(p) = [ r ξ'(rv(p)) v'(p) + 2(2p − 1) ξ(rv(p))/(p(1 − p)) ] / Σ(p)

Consider the R&D model, where v' ≥ 0: then v''' > 0 on (1/2, p̄), while v''' < 0 just above p̲ if payoffs are such that p̲ < 1/2, since v'(p̲) = 0 by smooth pasting.

4.5 Expected Remaining Time and Experimentation Costs

We now analyze the behavior of the only two costs of experimentation in our model: time and money. Assume a stopping time T < ∞ a.s., which is true under the assumptions of Proposition 2-a or c. Then by §15.3 of Karlin and Taylor (1981) (KT81), since (p_t) has zero drift and variance 2Σ(p)n(p), the expected remaining time until stopping τ(p) = E[T | p₀ = p] obeys the boundary conditions τ(p̲) = τ(p̄) = 0, as well as the ODE: −1 = Σ(p)n(p)τ''(p). Hence, τ''(p) < 0, and consequently, τ is hill-shaped.
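A minimal numerical sketch of this ODE, under assumed primitives that are purely illustrative (constant policy n = 1, ζ = 1, and thresholds 0.2 and 0.8, none of which come from the paper): since τ'' < 0, the computed solution is concave, zero at both thresholds, and hence hill-shaped.

```python
# Sketch: expected remaining time tau(p) until stopping, solving the
# ODE  Sigma(p) n(p) tau''(p) = -1  with tau(p_lo) = tau(p_hi) = 0.
# The primitives below (n = 1, zeta = 1, thresholds) are illustrative.
zeta = 1.0
p_lo, p_hi = 0.2, 0.8

def Sigma(p):
    return p * p * (1 - p) * (1 - p) * zeta * zeta / 2

def n(p):
    return 1.0                      # placeholder constant policy

# tau'' = -1/(Sigma n) < 0, so tau is concave. Integrate twice from
# p_lo with u(p_lo) = u'(p_lo) = 0, then tilt by a linear term so that
# tau(p_hi) = 0 (valid because the equation for tau is linear).
N = 2000
h = (p_hi - p_lo) / N
u, du = 0.0, 0.0
grid, us = [p_lo], [0.0]
for i in range(N):
    p = p_lo + i * h
    du += -1.0 / (Sigma(p) * n(p)) * h
    u += du * h
    grid.append(p + h)
    us.append(u)
slope = -us[-1] / (p_hi - p_lo)     # enforce tau(p_hi) = 0
tau = [ui + slope * (p - p_lo) for p, ui in zip(grid, us)]

assert abs(tau[0]) < 1e-9 and abs(tau[-1]) < 1e-6
assert max(tau) > 0                         # positive inside
assert 0 < tau.index(max(tau)) < len(tau) - 1   # interior peak: hill shape
```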
We next write v(p) = R(p) − K(p), or the expected present value of final rewards R(p) = E[e^{−rT} π(p_T) | p₀ = p] less that of experimentation costs K(p) = E[∫₀^T e^{−rt} c(n(p_t)) dt | p₀ = p].¹² By KT81, K also satisfies an ODE: Σ(p)n(p)K''(p) = rK(p) − c(n(p)). K is everywhere positive, and falls to K(p̲) = K(p̄) = 0 near the extremes given the vanishing expected time horizon; it is then clearly nonmonotonic. If K(p) < c(n(p))/r always, then K is everywhere concave, and single-peaked. But if K(p) ever exceeds c(n(p))/r, then K must become convex there; since K ∈ C², any crossing is an inflection point. This seems quite plausible for middling p when n(p), and thus c(n(p)), is U-shaped, and thus smaller (with K concave) near p̲ and p̄. If so, K strangely must then be 'molar-tooth' shaped: two local maxima, bracketing two inflection points at p₁ < p₂, with K(p) > c(n(p))/r on (p₁, p₂) ⊂ (p̲, p̄). Which scenario arises is an open question. Quite plausibly, K is molar-tooth shaped only for low enough r (so that experimentation lasts a long time), and disparate payoffs (extremely U-shaped costs).

4.6 Experimentation Drift

Since the belief process (p_t) is a martingale diffusion, and v ∈ C² is convex in E, the values (v(p_t)) are a submartingale Ito process in E, indeed a strict submartingale (drifts up), since v'' > 0 there. By the same reasoning, the experimentation level process (n(p_t)) is a strict submartingale Ito process in E whenever n(p) = f(rv(p)) is convex; (τ(p_t)) is a strict supermartingale (drifts down) everywhere, as τ'' < 0; and (K(p_t)) is a supermartingale wherever K is concave, but may switch to a strict submartingale inside (p₁, p₂).

Is n(p) = f(rv(p)) convex? By Theorem 5.1 of Rockafellar (1970), the composition f(rv) of a weakly convex and increasing function f with the convex function rv is convex. But f may well be concave in rv (it is whenever the producer surplus g is strictly convex in n), so the convexity of n(p) is a priori unclear. A more refined statement is possible if c''' exists. Indeed, differentiate the Bellman equation, that surplus equals the delay cost, g(n(p)) = rv(p), to get g'(n)n'(p) = rv'(p). Differentiating once more, [g'(n)n'(p)]' = rv''(p), and applying the optimal control FOC v''(p) = c'(n(p))/Σ(p), i.e. rv''(p) = rc'(n)/Σ(p), yields a simple nonautonomous second order differential equation in the level n alone:¹⁰

    [n c''(n)] n''(p) + [n c''(n)]' (n'(p))² = r c'(n)/Σ(p)    (8)

Since the RHS is positive when c' > 0, [n c''(n)]' ≤ 0 implies n'' > 0, which is the information we need:

Proposition 5 Assume (π), (Λ), and that c''' exists. If the producer surplus g(n) is weakly concave in n, i.e. [n c''(n)]' ≤ 0, then the intensity level process (n(p_t)) is a submartingale.

¹⁰This more simply yields the nonautonomous second order differential equation in the value v alone: c'(f(rv(p))) = Σ(p)v''(p). Equation (8) does not necessarily hold even though v'' exists, since the composition g(n(p)) may be twice differentiable even if we cannot write its first derivative as g'(n(p))n'(p).

Figure 3: A Payoff Shift in the R&D Model. When the (unplotted) bad build payoff rises, so does the value function, and so do intensity levels (inside the left-shifted interval (p̲, p̄)).

For instance, any geometric cost function c(n) = n^k (k > 1) yields a convex producer surplus g(n) = (k − 1)n^k, and so Proposition 5 is silent. In the knife-edge case c(n) = n + n log n for n ≥ 1 (c(n) = n for n < 1), nc''(n) = 1 is constant, and so the surplus g(n) = n is linear: then [n c''(n)]' vanishes in (8), n''(p) > 0, and the experimentation level is still a submartingale.
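The quantity [n c''(n)]' that governs this drift refinement can be evaluated by finite differences. An illustrative sketch for the two cost families just discussed (step sizes and test points are arbitrary choices of ours):

```python
# Sketch: numerically checking the sign of [n c''(n)]' for two cost
# functions from the text. Step sizes and evaluation points are arbitrary.
import math

def ncpp(c, n, h=1e-3):
    # n * c''(n) via a central second difference
    cpp = (c(n + h) - 2 * c(n) + c(n - h)) / (h * h)
    return n * cpp

def deriv_ncpp(c, n, h=1e-3):
    # d/dn of n * c''(n), again by central differences
    return (ncpp(c, n + h) - ncpp(c, n - h)) / (2 * h)

# Geometric cost c(n) = n^k, k > 1: n c''(n) = k(k-1) n^(k-1), so
# [n c''(n)]' = k (k-1)^2 n^(k-2) > 0, and the surplus g is convex.
for k in [1.5, 2.0, 3.0]:
    assert deriv_ncpp(lambda n: n ** k, 1.3) > 0

# Knife-edge cost c(n) = n + n log n: c''(n) = 1/n, so n c''(n) = 1 is
# constant and the producer surplus g(n) = n c'(n) - c(n) = n is linear.
c_edge = lambda n: n + n * math.log(n)
assert abs(deriv_ncpp(c_edge, 1.3)) < 1e-3
g_edge = lambda n: n * (2 + math.log(n)) - c_edge(n)
assert abs(g_edge(2.0) - 2.0) < 1e-9
```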
Note that even if g(n) is slightly convex, in light of (8) the level (n(p_t)) may still be locally a submartingale: the sign of n'' near a threshold turns on lim_{n↓0} n c'''(n), which one can evaluate by l'Hôpital's rule. In particular, for the R&D model with c(n) = n^k (k > 1) and not too convex costs, in a neighborhood of p̲ the intensity level (n(p_t)) is locally a submartingale; therefore, the experimentation level for barely profitable projects is at least initially expected to rise.

5. SENSITIVITY ANALYSIS

5.1 Parametric Shifts

We now explore how the experimentation schedule moves with changes in the payoffs, information cost, or interest rate, which we illustrate in Figure 3 for the R&D model; proofs are in Appendix C.

Since the sole reason to pay for information is uncertainty over the state of the world, a most natural thought experiment stems from raising the payoff risk. In other words, raise the high payoff h and lower the low payoff l of an action, so as to maintain a constant expected payoff π_a(p) from that action at the current belief p. Intuitively, this ought to raise the value of the dynamic problem, since the DM should prefer a riskier final payoff distribution: the static payoff frontier in the current stopping set goes up, and by employing the same level and stopping rule, the DM's current value is no worse. By re-optimizing, he does even better. Hence the return rv, and thus the experimentation level, rises. Loosely, the tangent line p·h + (1 − p)·l tilts as the payoffs spread: with a higher reward h, one is indifferent about adopting when slightly less optimistic (and likewise about quitting, since l has fallen), so the thresholds shift out and E grows.

The DM also benefits from more powerful signals (higher ζ) and from less convex information costs; in both cases the value and the intensity rise. Finally, with a higher interest rate r, the more eager is the DM to stop and act: the value falls, and the thresholds both shift in, but the intensity level often rises, a key 'anti-folk' result.

To avoid a portmanteau case-by-case analysis that adds little to the general story, the result that follows (proved in Appendix C.1) ignores complications due to the null action: assume π is so high that the null action is never taken.

Proposition 6 Assume (π) and (Λ).
(a) Payoff Levels: The value v(p) and experimentation level n(p) rise when any payoff π_a^θ rises. Thresholds p̲, p̄ rise if π_B^h or π_B^l rises.
(b) Payoff Risk: If an action a grows riskier (the expected payoff π_a(p) at belief p remains constant, but the payoff spread increases), then the value v(p) rises, the experimentation level n(p) rises, and the thresholds "shift out" (p̲ falls and p̄ rises).
(c) Cost Convexity: Assume that the cost function grows "more convex": c(n) is replaced by a convex ĉ(n) with ĉ(0) = c(0) and uniformly higher marginal costs, ĉ'(n) ≥ c'(n), including the slope at the origin ĉ'(0+) ≥ c'(0+). Then the value v(p) and intensity level n(p) uniformly fall in E.
(d) Information Quality: As the signal-to-noise ratio factor ζ rises, the value v(p) rises and the experimentation level n(p) shifts up, while the thresholds shift out.
(e) Impatience: As the interest rate r rises, the value v(p) falls, and the thresholds shift in. Also, the optimal intensity level n(p) rises strictly near one or both thresholds: in the R&D model, there exists p′ ∈ (p̲, p̄) such that n(p) declines for all p < p′, and rises for all p > p′.

Surprisingly, the impatience result runs counter to the folk wisdom of Bayesian learning. More impatient decision makers typically 'experiment' less (e.g. in the canonical settings of §2.B), with more myopic behavior. In our model, greater impatience raises the DM's delay cost, and somewhere induces him to accelerate his experimentation. Further, as Proposition 7 will assert, as r blows up, the DM experiments at an exploding rate, albeit over a vanishing belief interval. The 'folk intuition' depends on never-ending experimentation, whereas experimentation here has a finite purpose: one eventually stops and acts. Information accrual here is a means to an end; it is not a payoff-generating lifestyle.

Our analysis in the paper so far remains valid if the final payoff is an annuity, i.e. an eternal flow payoff rather than a one-shot lump-sum, as we have assumed. What happens in this case is that the final decision is formally very much like the safe uninformative arm in a bandit model: it provides a constant flow payoff, and is therefore exercised when equal in worth to the current value. So with a higher interest rate r, the annuity is worth less. The maximization of E[∫₀^T −c(n_t)e^{−rt} dt + e^{−rT} π(p_T)/r] still yields the Bellman equation (5); but with w = rv, it becomes w(p) = max_{n≥0} {−c(n) + nΣ(p)w''(p)/r}. Hence, n(p) = f(w(p)), and w solves w''(p) = rc'(f(w(p)))/Σ(p). The value matching and smooth pasting conditions are still w(p̲) = π(p̲), w(p̄) = π(p̄), w'(p̲) = π'(p̲), w'(p̄) = π'(p̄). Indeed, the intensity level now falls everywhere.¹¹

¹¹We are very grateful to Sven Rady for discovering this key difference.
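The formal equivalence just described is visible directly in w''(p) = r c'(f(w(p)))/Σ(p): since Σ(p) is proportional to ζ², only the ratio r/ζ² matters. A sketch under assumed quadratic costs (so c'(f(w)) = 2√w) and an arbitrary initial condition, both illustrative choices rather than anything from the paper:

```python
# Sketch: in the annuity variant, w'' = r c'(f(w)) / Sigma(p) with
# Sigma(p) = p^2 (1-p)^2 zeta^2 / 2, so only r / zeta^2 matters.
# Quadratic costs (c'(f(w)) = 2 sqrt(w)) and the initial condition
# below are illustrative assumptions.
def trajectory(r, zeta2, w0=0.5, s0=0.0, p0=0.4, steps=200, h=0.001):
    # crude Euler integration of the second order ODE for w
    w, s, out = w0, s0, []
    for i in range(steps):
        p = p0 + i * h
        Sigma = p * p * (1 - p) * (1 - p) * zeta2 / 2
        s += r * 2 * max(w, 0.0) ** 0.5 / Sigma * h
        w += s * h
        out.append(w)
    return out

a = trajectory(r=2.0, zeta2=1.0)
b = trajectory(r=1.0, zeta2=0.5)   # same ratio r / zeta^2
assert max(abs(x - y) for x, y in zip(a, b)) < 1e-12
```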
A higher interest rate r is then formally equivalent to a lower signal-to-noise ratio factor ζ: this diminishes the return w(p), and the intensity level n(p) = f(w(p)), by the logic of Proposition 6-d.

5.2 The Return to Wald's World

We finally come full circle, recalling our two paired twists on Wald's sequential paradigm: impatience and strict cost convexity. As noted, these assumptions cut in opposite ways. Absent payoff discounting, the DM sees no hurry to stack experiments, and reverts to a purely sequential mode (vanishing intensity levels), barring any fixed flow experimentation cost c(0) > 0. Without strict cost convexity, the DM faces no parallel experimentation penalty, and he becomes a classical statistician, running a massive experiment at time 0.

Proposition 7 Assume the final payoff π(p) > 0 for all p.
(a) Vanishing Impatience: For a fixed cost function c(·) obeying (Λ), the intensity level n(p) decreases to f(0) ≥ 0 as r ↓ 0; thus, n(p) ↓ 0 as r ↓ 0 iff c(0) = 0. The level n(p) explodes as r → ∞.
(b) Vanishing Convexity: Fix r > 0. Consider a cost function sequence c₁(n), c₂(n), ..., and let λ_k, Λ_k be the lower and upper Lipschitz constants of the marginal cost: λ_k(n₂ − n₁) ≤ c'_k(n₂) − c'_k(n₁) ≤ Λ_k(n₂ − n₁) for all n₂ > n₁ > 0. Then n_k(p) explodes to ∞ (where positive) if lim_{k→∞} Λ_k = 0, and uniformly vanishes if lim_{k→∞} λ_k = ∞.

Proof: To see the impatience limits, consider that since v(p) ≤ max{π_A^h, π_A^l, π_B^h, π_B^l} < ∞, the return rv(p) vanishes as r ↓ 0; because f is continuous, the optimal n(p) = f(rv(p)) tends down to f(0), and f(0) = 0 iff c(0) = 0 (Claim 1). Likewise, since v(p) ≥ π(p) > 0 for p ∈ E, rv(p) and so n(p) = f(rv(p)) explode as r → ∞. □

The proof of the convexity limits is appendicized, but for a powerful example, consider c(n) = n^k for k > 1: the producer surplus is g(n) = n c'(n) − c(n) = (k − 1)n^k. Its inverse f(w) = [w/(k − 1)]^{1/k} blows up as k ↓ 1.
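This closing example can be exhibited numerically; an illustrative sketch (the grid of exponents is an arbitrary choice of ours):

```python
# Sketch: for geometric costs c(n) = n^k (k > 1), the producer surplus
# is g(n) = n c'(n) - c(n) = (k-1) n^k, with inverse f(w) = (w/(k-1))^(1/k).
def g(n, k):
    return (k - 1) * n ** k

def f(w, k):
    return (w / (k - 1)) ** (1 / k)

# f inverts g for each fixed k
assert abs(f(g(2.0, 3.0), 3.0) - 2.0) < 1e-12

# As k drops to 1 (the cost turning linear, Wald's case), the optimal
# level f(w) = f(rv) blows up at any positive return w:
levels = [f(1.0, k) for k in [2.0, 1.5, 1.1, 1.01, 1.001]]
assert all(a < b for a, b in zip(levels, levels[1:]))
assert levels[-1] > 100
```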
Its inverse we converge upon Wald's case for discovering this 16 key difference. of linear costs. CONCLUSION 6. Summary. This paper has explored the basic two action, two state decision problem under uncertainty. particularized We have modified Wald's model by assuming an impatient VA4, and to the case of it an increasing and strictly convex cost function of within- period experimentation. These two plausible economic assumptions have jointly afforded a reasonably complete Bayesian characterization of information demand: Research grow with the optimism over the For not too convex cost functions, they project. upwards over time. This yields some simple Our conclusions have not come falsifiable implications in purchases or R&D. We it is We it can be applied in many settings, such as strategic R&D models, or principal-agent experimentation models. our main intensity- value monotonicity result n producer surplus intuition extends to stationary in the posterior Such a Extension. V/A mean 6 = states, f{rv), and the associated information actions and states. It also obtains in E; however, the resulting problem not is alone, as total research outlays so far increase one's belief eventually becomes convinced of anything he learns, and quits. It is surprising that perimentation level in Proposition properties on the = many finitely a normal learning model with state space An consider this framework a key While we have studied a model with just two actions and two Robustness. precision. context. a tractable new off-the-shelf decision model of information believe that patent races, general equilibrium R&D an drift standard discrete time setting, but in a new in the continuous time control of variance for a diffusion. contribution of the paper, as levels demand first order conditions nailed down the optimal ex- Apart from normal learning models, such regularity 1. for information are very rare. 
For instance, Radner and Stiglitz (1984) have pointed out that with a one-shot information purchase, a 'non-concavity' in the value of information may emerge: its marginal value is initially zero. In our continuous time limit, we argue that when information can be repeatedly purchased, and its progress is endogenous, a well-behaved dynamic demand theory emerges. The resulting value of information is economically well-behaved, as economists prefer: concave in the quantity (and linear in its price, as underscored on page 10). Information is quite unlike other goods, being economically well-behaved only if the decision maker can exercise an eternal option of repurchase. This paper might be seen as an application of this general principle to a somewhat important Bayesian decision problem.

Finally, we also believe that this same framework will afford a dynamic extension of Blackwell's (1953) theorem on the value of information, with a much finer ordering: with a large number of purchases of very weak and cheap signals, all the DM cares about is the signal-to-noise ratio. A careful formulation and development of this idea awaits our future work.

APPENDICES

A. PRIMARY MATHEMATICAL RESULTS

A.1 Properties Related to the Cost Function

Claim 1 Given (Λ), the surplus g(n) = nc'(n) − c(n) is continuous, increasing, and unbounded, with g(0) = −c(0) ≤ 0 (so g(0) < 0 iff c(0) > 0); its inverse f is continuous and increasing, with f(0) = 0 iff c(0) = 0.

Proof: By convexity of c and the lower Lipschitz constant λ of c',

    g(n + ε) − g(n) = [c(n) − c(n + ε)] + (n + ε)c'(n + ε) − nc'(n)
                    ≥ −εc'(n + ε) + (n + ε)c'(n + ε) − nc'(n)
                    = n[c'(n + ε) − c'(n)] ≥ nλε > 0.

Clearly, g(0) = −c(0) ≤ 0, and g is continuous since c and c' are. □

Claim 2 Given (Λ), the marginal cost of experimentation ξ(w) = c'(f(w)) is continuous and strictly increasing in the return w = rv for w > 0, strictly concave, and locally Lipschitz (i.e. locally, not globally so).

Proof: Let c*(n*) = sup_n {n*n − c(n)} be the (Legendre-Fenchel) conjugate dual of the cost function. By Theorems 12.2 and 26.3 in Rockafellar (1970) (R70), c* is convex, and strictly convex since c is smooth. His Theorem 23.5-d then yields c*(n*) + c(n) = n*n for any subgradient n* of c at n (the Fenchel-Young equality). Taking n* = c'(n), we have c*(c'(n)) = nc'(n) − c(n) = g(n). Finally, at n = f(w), we get c*(ξ(w)) = c*(c'(f(w))) = g(f(w)) = w. The inverse relationship ξ = (c*)^{−1} obtains; since c* is increasing away from its minimum and convex, ξ is increasing and concave on the relative interior of its domain, and locally Lipschitz by Theorem 10.4 of R70. □

Observe that even though the marginal cost function c'(n) need not be concave in n, the function ξ(w) is surprisingly concave in w. When c''(> 0) exists, this is easily seen: for then f'(w) = [g'(f(w))]^{−1} = [f(w)c''(f(w))]^{−1} > 0, and consequently ξ'(w) = c''(f(w))f'(w) = 1/f(w) > 0 and ξ''(w) = −f'(w)/f(w)² < 0.

Claim 3 Given (Λ), both the inverse surplus function f and √f are Lipschitz on [w₀, ∞) for any w₀ ∈ (0, ∞).

Proof: Using w = g(f(w)), i.e. w = g(y) = yc'(y) − c(y) at y = f(w), for w₁ > w₀ > 0 close we have [w₁ − w₀]/[f(w₁) − f(w₀)] = [g(f(w₁)) − g(f(w₀))]/[f(w₁) − f(w₀)] ≥ λf(w₀), by the Lipschitz-down constant λ for c' (see Claim 1). So 1/(λf(w₀)) is a Lipschitz constant for f near w₀. For √f, the relation √f(w₁) − √f(w₀) = [f(w₁) − f(w₀)]/[√f(w₁) + √f(w₀)] then yields the Lipschitz constant 1/(2λ(f(w₀))^{3/2}). □

Claim 4 Given (Λ), the function ξ(w) is differentiable for all w > 0, with ξ'(w) = 1/f(w).

Proof: Since ξ is concave on (0, ∞), the right derivative ξ'(w+) exists by Theorem 24.1 of R70. To see why ξ'(w+) = 1/f(w), observe that using y − w = g(f(y)) − g(f(w)) and g(n) = nc'(n) − c(n) (as well as algebraic simplification), we have the identity

    [ξ(y) − ξ(w)]/(y − w) = 1 / ( f(y) − [c(f(y)) − c(f(w)) − (f(y) − f(w))c'(f(w))] / [ξ(y) − ξ(w)] )

where the bracketed term, divided by ξ(y) − ξ(w), vanishes as y − w ↓ 0, given f > 0 and f Lipschitz (Claim 3); hence the limit is 1/f(w) as y ↓ w. We only need ξ'(w−) = 1/f(w) too, which follows similarly. □

Claim 5 Given (Λ), Ψ(p, v) ≡ ξ(rv)/Σ(p) is continuous and Lipschitz on (0, 1) × (0, ∞). By Claim 4, ξ' exists, so that Ψ is partially differentiable in v, with ∂Ψ/∂v = rξ'(rv)/Σ(p) = r/[f(rv)Σ(p)] ∈ (0, ∞) for (p, v) ∈ (0, 1) × (0, ∞). Similarly, ∂Ψ/∂p = 4ξ(rv)(2p − 1)/(ζ²p³(1 − p)³) ∈ (−∞, ∞) for (p, v) ∈ (0, 1) × (0, ∞).

A.2 Two-Point Boundary Value Problems

a. An ODE Problem with Value Matching and Smooth Pasting. Let EU denote the following existence/uniqueness problem: for any strictly convex cost function c with c(0), c'(0) ≥ 0, find thresholds 0 < p̲ < p̄ < 1 and a C¹ function v with v''(p) = Ψ(p, v(p)) = ξ(rv(p))/Σ(p) on (p̲, p̄), such that (6)-(7) hold (no null action). With value matching and smooth pasting, this free boundary problem is not a standard two-point boundary value ODE amenable to known results; it needs an ad hoc proof. Observe that π is kinked at p̂, while any solution v is strictly convex, with v'' > 0 (given c' > 0).

Towards solving EU, we first consider a second order one-point boundary value problem of the Cauchy type: for any fixed p̄ ∈ (p̂, 1), let XV(p̄) be the problem of solving v_p̄''(p) = Ψ(p, v_p̄(p)) given v_p̄(p̄) = p̄π^h + (1 − p̄)π^l > 0 and v_p̄'(p̄) = π^h − π^l. Clearly, any such v_p̄ is C². We now start our attack on the EU problem, assuming the R&D payoffs.

Claim 6 (a) For all p̄ ∈ (p̂, 1), XV(p̄) has a unique solution v_p̄, which can be continued leftward on (p₀, p̄] for some p₀ ≥ 0, with either (i) p₀ = 0, or (ii) v_p̄(p₀) = lim_{p↓p₀} v_p̄(p) = 0 for p₀ > 0.
(b) For any p̄ ∈ (p̂, 1), v_p̄ is positive, strictly convex, and Lipschitz on (p₀, p̄], with v_p̄'(p) uniformly continuous in p.

Proof of (a): All claims follow given Ψ continuous and Lipschitz on (0, 1) × (0, ∞) (Claim 5), after reducing the second order ODE to a two-dimensional system of first-order ODEs; see Theorems 1.1-2 and 2.1, and Observation 2 in §1.6 of Elsgolts (1970), or §1-4.1 of Brock and Malliaris (1989). Either v_p̄ continues on all of (0, p̄], or, because Ψ is Lipschitz only while v_p̄ > 0, its domain can be compactified via v_p̄(p₀) = lim_{p↓p₀} v_p̄(p) = 0 at the largest p₀ > 0 where the solution fails.

Proof of (b): For p just below p̄, a Taylor expansion gives v_p̄(p) ≈ v_p̄(p̄) − v_p̄'(p̄)(p̄ − p) ≥ p̄π^h + (1 − p̄)π^l − (π^h − π^l)(p̄ − p) > 0 for p̄ − p small, so v_p̄ starts out positive, and stays strictly positive down to p₀ by construction of p₀. Since v_p̄''(p) = Ψ(p, v_p̄(p)) > 0 wherever the solution exists, v_p̄ is strictly convex, with slope v_p̄'(p) = (π^h − π^l) − ∫_p^p̄ Ψ(z, v_p̄(z)) dz, uniformly continuous in p. □

Claim 7 (Limit Solution Behavior) Consider any q ∈ (p̂, 1). Because v_q(z) ≥ ε > 0 for all z ∈ [p, q] ⊂ (0, 1), the slope drop satisfies v_q'(q) − v_q'(p) = ∫_p^q Ψ(z, v_q(z)) dz ≥ ξ(rε) ∫_p^q 2ζ^{−2} z^{−2}(1 − z)^{−2} dz, and this last integral blows up either as q ↑ 1 for fixed p, or as p ↓ 0 for fixed q. In particular, lim_{q↑1} v_q'(p) = −∞ for fixed p.

Claim 8 (Monotonicity Properties) The solution rises and flattens as p̄ rises: for all p̄₂ > p̄₁ > p̂, v_p̄₂ > v_p̄₁ and v_p̄₂' ≤ v_p̄₁' on the common domain.

Proof: Suppose the first assertion fails, and let p' be the largest point where v_p̄₁ and v_p̄₂ cross: Δv(p') ≡ v_p̄₂(p') − v_p̄₁(p') = 0, with Δv > 0 to the right of p'. This requires Δv'(p') ≤ 0; since v_p̄₂'(p̄₂) = π^h − π^l = v_p̄₁'(p̄₁), by continuity of Δv' there is p'' > p' with Δv'(p'') = 0. Integrating the ODE from p'' on both solutions and cancelling equal slopes yields equal integrals of Ψ(z, v_p̄ᵢ(z)) over the same interval (10). But this is impossible, because v_p̄₂(z) > v_p̄₁(z) there, and Ψ strictly rises in v (Claim 5). □

b. An ODE Problem with Value Matching Alone. Fix p' and a slope s ∈ R, and consider the Cauchy problem IV(s): v_s''(p) = Ψ(p, v_s(p)) with v_s(p') = π(p') > 0 and v_s'(p') = s. By the properties of Ψ in Claim 5, a unique solution v_s exists locally to the left of p', uniformly continuous in s, and can be extended leftward as long as it remains positive:

    v_s(p) = π(p') + s(p − p') + ∫_{p'}^{p} ∫_{p'}^{x} Ψ(y, v_s(y)) dy dx    (11)

Claim 9 There exists a unique s such that v_s stays strictly positive on (0, p'), with exactly one global minimum.

Proof: For p < p' and s₂ > s₁, (11) gives v_s₂(p) − v_s₁(p) ≤ (s₂ − s₁)(p − p') < 0 provided v_s₂ ≤ v_s₁ so far, which obtains continuing leftward by monotonicity of Ψ in v; so v_s(p) falls in s for all p < p'. Hence v_s(p) → −∞ as s → ∞ and v_s(p) → ∞ as s → −∞. Therefore, by continuity and monotonicity of v_s in s, there exists a unique s such that v_s remains strictly positive on (0, p'), and by convexity it has exactly one global minimum. The argument only made essential use of the horizontal axis, for which the falling line π_B(p) could easily substitute; that was thus WLOG. □

Theorem 1 (SU Existence / Uniqueness) The two-point boundary value problem SU, namely v''(p) = Ψ(p, v(p)) obeying the boundary conditions (6), (7), has a unique solution {v, p̲, p̄}.

Proof: By Claim 6, each candidate upper threshold p̄ generates a unique, strictly positive, strictly convex solution v_p̄; by Claim 8, these solutions rise monotonically and continuously with p̄; and by Claim 7, the slope behavior degenerates in the limits p̄ ↑ 1 and p ↓ 0. A strictly convex, U-shaped solution can meet the lower boundary payoff only tangentially, as EU requires; monotonicity and continuity in p̄ then deliver exactly one p̄ (and with it p̲) at which this tangency obtains. Claim 9 covers the variant with value matching alone. □
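The duality identities behind Claims 2-4 can be sanity-checked numerically. A sketch for geometric costs c(n) = n^k with k = 3 (an illustrative choice of ours): g(n) = (k−1)n^k, f = g^{−1}, and ξ(w) = c'(f(w)) should be increasing and concave with ξ'(w) = 1/f(w).

```python
# Sketch: checking the duality identities of Claims 2-4 for the
# illustrative geometric cost c(n) = n^k, k = 3. Here g(n) = (k-1) n^k,
# f = g^{-1}, and xi(w) = c'(f(w)) is the marginal cost in the return.
k = 3.0

def f(w):
    return (w / (k - 1)) ** (1 / k)

def xi(w):
    return k * f(w) ** (k - 1)      # c'(n) = k n^(k-1) at n = f(w)

def num_d(fun, w, h=1e-6):
    return (fun(w + h) - fun(w - h)) / (2 * h)

for w in [0.5, 1.0, 2.0, 5.0]:
    # Claim 4: xi'(w) = 1 / f(w)  (Fenchel-Young / envelope identity)
    assert abs(num_d(xi, w) - 1 / f(w)) < 1e-5
    # Claim 2: xi is increasing and concave in w
    assert num_d(xi, w) > 0
    assert xi(w + 0.1) + xi(w - 0.1) < 2 * xi(w)
```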
This appendix rigorously formulates the control/stopping problem and the sequence of proofs that establish the existence and uniqueness properties of its solution. We first summarize our plan of attack, and highlight the absence of any logical circularity. We ignore the null action throughout, as its additional difficulties (with a horizontal tangency somewhere) are fully captured by the R&D case: a vanishing intensity level.

§B.1. We progressively restrict the control process to simpler domains. First, we posit the SDE met by the observation process (x_t), given any control functional n(·) of past observations. Next, we derive the Bayes filter SDE describing the evolution of (p_t). Then we show how to formulate the OCS problem with current beliefs p_t as state variable: since the best existing theory does not do this for joint optimal control and stopping, we first find weak conditions to write the value function as V = V(p), and then prove key properties (V convex in p); this yields a simple belief exit set [0, p̲] ∪ [p̄, 1] for the stopping time T. Finally, we show that we may WLOG restrict to Markov control and Markov stopping policies n(p).

§B.2. Assuming c(0) > 0 or an optimal Markov control n(p) > 0 everywhere (or both), we establish key properties of the belief diffusion to prove that any candidate optimal policy of beliefs alone (as §B.1 allows) stops experimenting in finite time a.s.

§B.3. We verify that the candidate n(p) = f(rv(p)) from EU is admissible in the sense of §B.1. We then check that the unique solution to EU (and thus to HJB+ST) also solves OCS (V = v works, with p̲, p̄), and is also its only solution.

§B.4. We specialize §B.3 to the R&D model with c(0) = 0 and π = 0 somewhere, justifying the uniqueness presumed so far.

B.1 Developing the Markov Control Model

a. Drift and Noise Notation. Let (Ω, 𝔉, P) be the underlying probability space. Let h_t(ω) or
h{t,u) denote the same continuous time stochastic process h a collection of J-measurable real-valued functions (random variables) We consider only processes with continuous sample paths, for every subset [0, i], cu [0,t] G Q. C We [0,oo); i.e. by h^{u) = is variance, adapted only to (3"^^), 1 R_|_ is Q x R -> continuous {{s,h{s,u)))s<t the (graph of the) process restricted to lem amenable to existing (BM) with by /i*~ the W = {Wt,J^) zero drift is and unit and (canonically) with continuous sample paths. Think of Nature as — Po, and a in Cmoo); the filtration generated by h. Finally, the standard Wiener process or standard Brownian Motion with chances po, : namely denote by GJq,^) the space of continuous functions on any time restriction of h^ to [0,t); (3^f) (0, 1). /i {ht)t>o, where h{-,u) whose (superscripted) elements are sample paths or trajectories Let po ^ = initially independently drawing a BM noise path realization VF°° results in nonlinear filtering theory, 22 drift mG {— yu, /i}, C[o_oo)- To render our prob- we must define an underlying € unobserved process that the VA4 wishes to infer (not simply the stationary and an drift), observable process that provides noisy information on the former (the controlled signal). This construction works even though the unobservable process Oksendal (1995) (hereafter, 095), Example Q = Put Q = {(0, W (0,2], nLUQH = 1], !B(o,i], and endow U (0,1] BM extends to a standard ability = measures Q{uj) Finally choose = that mt{uj) > fj, miuj) = // = ff'i-adapted process h. = 0, polivenn M+ X (1 ~ <-> drift ^^^ Po)^ajenL me = m{u!) {//, /i*) for all t,^^ > 0. The dxt{uj) = m{u)dt Fix a R be a standard -H- = 7{ui) S(o,2], and partition BM on m : Next, define prob- Q{uj)Tw{^) on R+ x partition element {—//,//} such Qg G {0^, ^l}-^ K++ n(t, h^{u>)) is also 3'f-adapted for each and sample paths n:E+ x and (m) controlled signal process {xt{uj)) solving: —^ fi {^7, 3"}. otherwise. 
Note the equivalence: —jj, —n} control functional of time n{t, = by translation. {{l,2],'B(^i^2],'^w} + -^ (0,1] 3" by the Kolmogorov extension theorem. Then exist u G Qh and for all G {H, L} The non- anticipatory Xq{lo) on : and define the time-invariant process payoff-relevant state 9 satisfies (i) n{t, /i°°) with the Borel a-algebra W and e.g. 6.11. Let U^ (1,2]. 7w yw}, where it — time-invariant is is Cj*q s a diffusion with cr + --dWtjuj). (12) ^n{t,x^{Lu)) h. The Continuous-time Non-Linear Bayes Rule payoffs and vNM preferences, the VAd's expected payoff from stopping depends on the controlled signal diffusion only via the posterior probability of time and past observations of the signal and own control Our model, with the constant all conditions of to the SDE Theorem 9.1 in (given a prespecified Given state-contingent Filter. pt m= = 7{m = m and signal process state process LS77 provided that there exists a the process pt = \ 3^?, n^') obeys dptioj) = p,{uj){l -pt{uj) Wt{uj) = (l/a) W = and that (n) adapted not only to c. 7{Q,h own = po-, ^~^^ x/n(t, fx\ po,t,x^, n*"). obeying (12), meets in §B.3. Their result, filtering', asserts that: and solves the SDE:^^ x^{u))dWt{u) - \ps{uj)iJi + (1 - Ps{uj){-fi))]ds} ^ Wiener process, but not a standard filtration, H), given ^^3^ ^n{s,x^{uj)){dx'{uj) jl (^f,'3^t) i^ its f~ pt{(jo) (state unique strong solution Wiener process (Wt)), established a special case of the so-called 'fundamental theorem of non-linear (i) /x but also to the smaller one BM. That is, it is {3^^)- Beliefs as a Sufficient Statistic. ^^We shall choose h = x, and soon after h — p. Of course, at time t, we need the joint history of i' and compute the current optimal control nt- However, given the functional n and x^, the control history n*~ is imphcitly recursively embedded; therefore, we shall omit it altogether for notational simplicity. ^^More recently, Bolton and Harris (1993) provide alternative insights into this formula. 
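The filter above can be checked by simulation: generate the controlled signal increments dx under the true state m = +μ and apply the exact discrete Gaussian Bayes update; the posterior should concentrate on the truth when the signal is informative, and barely move when it is not. A minimal sketch, with a constant experimentation level n as an illustrative assumption:

```python
import math, random

def simulate_filter(mu=1.0, sigma=1.0, n=4.0, p0=0.5, T=5.0, dt=1e-3, seed=7):
    """Simulate dx = m dt + (sigma/sqrt(n)) dW under the true state m = +mu,
    and run the exact discrete Bayes update on the Gaussian increments.
    The constant control n is an illustrative assumption."""
    rng = random.Random(seed)
    llr = math.log(p0 / (1 - p0))            # log posterior odds of m = +mu
    p = p0
    for _ in range(int(T / dt)):
        dW = rng.gauss(0.0, math.sqrt(dt))
        dx = mu * dt + (sigma / math.sqrt(n)) * dW
        # increments are N(+/- mu dt, sigma^2 dt / n), so the exact
        # log-likelihood-ratio increment is 2*mu*n*dx/sigma^2
        llr += 2 * mu * n * dx / sigma**2
        p = 1 / (1 + math.exp(-llr))
    return p
```

With a strong signal the posterior nears 1; with n and T tiny it stays near the prior, illustrating that the belief only moves through the likelihood-ratio channel.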
n*"" to j, 23 The Control • work with as a Function Just of Belief History. than signals as a state variable, we establish: beliefs rather Claim 10 There is p* IS sufficient for x'' and {x\n*~). Hence, a bijection between information sets (p*,n*~) WLOG and we may , Since we want to replace n[t,x^{uj)) by n{t,p^(uj)). We first prove that (x*, n*") i-> (p*, n'~) is an injection. Fix the control functional n, states a;,a;' e Q whose resulting signal paths differ: Xs{u)^Xs{uj') for some se[0,i]. Proof: and By continuity, this holds n(t,a;*(a;)) = p\uj)i^p\uj'), or That {p*, = n*~(a;) [ii) n'~} on a time = n(f,s'(a;')), n*-~{u)') n'-{Lo) i—> {x*, + n*~} and given the either have filter (13), belief Either way, {p\uj) n'- (u)} n'-{uj'). is Then we measure. set of positive , + an injection follows from strong existence and uniqueness control process n(t, x'(w)) yield a unique belief process Pt(a;) via the The Value and associated as a Function Just of Current filter. Given the absence Beliefs. — Krylov of suitable sufficiency theorems for optimal control joint with optimal stopping (1980) being the best source here Claim 11 The random Proof: T{uj) The supremum < t} e "3^ (4) for all t > — we proceed indirectly, via the value function. can be written as V{p), a function of the current belief p. T variable : ^2 i—) [0, oo] is a Markov time relative to (3^^) w.r.t. the measure ^(a;) on a continuous terminal reward function require tt e supremum value at time upper bound this Theorem 6^). His is t We more powerful verification of the length-T horizon problem. His Theorem = = V{pt) for all t, for alH > Vit.p^ is using a policy that any p e namely the is By Claim = limf^^V{t,pt\T). since our E(-), c(-), 11, the supremum e-optimal for the V{po) problem (pi,P2), the VM 7r(-), and Finally, merely weights these payoflfs is by his Lemma 1. described by a stopping set in value can be written as V{p). (i.e. it achieves a payoff in either state 9 = By > V{po)-e) is affine in p. 
H,L, and adjusting p — since the Wiener probability law 7w 24 rif, r are stationary. can ensure himself an expected payoff that For such a strategy yields a constant payoff Theorem and adapted control processes can now prove that the optimal stopping decision [0, 1]. theorems 3.1.10 states that Belief Thresholds and Value Function Convexity: Proof of current belief space for general results in Krylov (1980) assume (like us; his tt the value of the infinite horizon problem • The fi. achieved via a feedback control functional n(i,p*). Next, by 6.4.13, V{t,pt) : and a Markov 3.1.9 provides a recursive equation for l^(t,pt|T), 6.4.4, since limj.^oo £'t^[e~'"(^~'W(pj._J] Remark if {a; The VM. must optimally stop the one-dimensional 0. controlled process (pt(a;)) solving (13) for an adapted control process niiui) time T(a;), differ, {p*(a;')> "*"('^')}- of the diffusion Pt(a;), which imply that every signal process path x^iuj) • paths [i) is independent of the partition element Lo while {Vie}-, m= exceeds some supporting e-tangent line at p. Next, define stopping sets Sa-, As e > V H,L. Thus, ±/i in states and p are everywhere weakly V arbitrary, is convex. Sb, and Sq for A, B, and the null action. Since point = 7r(0) and y(l) = 7r(l). As stopping is always an option, we have V > equality V{p) = 7r(p) holds iff p G 5^ U 5// U Sq. If ever V{p) = n so that p € Sa, then convexity, V > and G Sb, together force ^ = on [0,p]. Thus, Sa = [0,p], and similarly Sb = [p, 1] for some < p < p < 1. Similarly, since tt^ > and 7C§ > 0, the null action is never exercised at p = and p=l, so that 5'o = [p„,Po]. beliefs are stationary for (13), ^(0) it; tt, tt We have now proven that the stopping rule is characterized by an exit set in the space [0, 1] of current beliefs. We may therefore simplify the supremum operator in (4) as sup^-,, = suprp^i[sup„]. For each • The Control Current as a Function Just of supremum {p,p}, resolving the inner is Beliefs. crucially a pure control exercise. 
Under very weak conditions — guaranteed by our simple two-point boundaries and the maximand, boundedly finite for any optimal 𝔉ˣ-adapted control — there exists a Markov control, of the type n(t, pᵗ(ω)), that performs as well, by Theorem 11.3 in O95. Hence, only the current belief p_t matters for control. Hereafter, we restrict to stationary controls n(p_t(ω)), where n : [0,1] → ℝ₊ is a 𝔅_[0,1]-measurable function. Thus, we consider beliefs as the solution to the SDE dp_t = √(2Σ(p_t)n(p_t)) dW̄_t (given p₀), a Markov problem when n ∈ 𝓜.

Definition. An admissible stationary Markov control policy is a 𝔅_[0,1]-measurable function n : [0,1] → ℝ₊₊, strictly positive on any continuation set (p̲, p̄) ⊆ [0,1], yielding a unique strong solution to (12)-(13).

d. The Properly Formulated Optimization Problem. Replacing (4) given (2), WLOG restricting to stationary Markov controls, OCS is therefore now:

V(p₀) = sup_{n(·)∈𝓜; p̲,p̄∈[0,1]} E[ ∫₀^{T(ω|p̲,p̄)} −c(n(p_t(ω))) e^{−rt} dt + e^{−rT(ω|p̲,p̄)} π(p_{T(ω|p̲,p̄)}(ω)) ]    (14)

s.t. T(ω|p̲,p̄) = inf{t ≥ 0 : p_t(ω) ≤ p̲ or p_t(ω) ≥ p̄}

p_t(ω) = p₀ + ∫₀^t √(2n(p_s(ω))Σ(p_s(ω))) dW̄_s(ω)    (15)

W̄_t(ω) = (1/σ) ∫₀^t √(n(p_s(ω))) {dx_s(ω) − [p_s(ω)μ + (1 − p_s(ω))(−μ)]ds}

e. Brief Aside: Making Sense of the Bayesian Model of §4.2. In the text, we formulated this result without measure theory for illustrative purposes. Yet our summary was not without basis. We defined a conditional signal process dx_t^θ(ω) = 1_{ω∈Ω_θ}[μ_θ(ω)dt + (σ/√n_t)dW_t(ω)], for θ = L, H, which we can now see is equivalent to this definition (3) (suppressed in the text, along with ω). From the unconditional point of view of the DM, the belief driving force dW̄_t really is generated by the mixture of two maps, W_t(ω) = p_t(ω)W_t^H(ω) + (1 − p_t(ω))W_t^L(ω), where (x_t^θ) is premultiplied by the indicator function of Ω_θ.
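The problem (14)-(15) can be approximated by a Markov-chain discretization on a belief grid: at each grid belief the DM either stops for π(p) or pays the flow cost and lets the belief jump ±h with probability proportional to n·Σ(p). A minimal sketch with illustrative stand-ins for π, Σ, and a quadratic cost (none of these are the paper's exact primitives):

```python
import math

def solve_ocs(h=0.05, dt=0.02, r=0.5, c0=0.01, n_grid=(0.25, 0.5, 0.75, 1.0),
              n_iter=4000):
    """Value iteration on a belief grid for the joint control/stopping
    problem: V(p) = max(stop payoff, best one-step continuation value).
    pi, Sigma, and c(n) = c0*n^2 are illustrative stand-ins."""
    K = round(1 / h) + 1
    ps = [i * h for i in range(K)]
    pi = [max(p, 1 - p) for p in ps]          # payoff of the better final action
    Sigma = [(p * (1 - p)) ** 2 for p in ps]  # illustrative signal quality
    gamma = math.exp(-r * dt)
    V = pi[:]
    for _ in range(n_iter):
        newV = V[:]
        for i in range(1, K - 1):
            best = pi[i]                       # stopping is always available
            for n in n_grid:
                q = Sigma[i] * n * dt / h**2   # up/down jump probability (<= 1/2)
                cont = -c0 * n**2 * dt + gamma * (
                    q * (V[i-1] + V[i+1]) + (1 - 2*q) * V[i])
                best = max(best, cont)
            newV[i] = best
        V = newV
    return ps, pi, V
```

The fixed point displays the structure proved here: V ≥ π everywhere, V = π near the edges (the exit set), and V > π strictly in an interior continuation interval.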
B.2 Experimentation Ends Almost Surely in Finite Time

We now assume that a stationary optimal policy (n(·), p̲, p̄) exists, with continuation set C = (p̲, p̄) ⊆ [0,1], and then show that experimentation ends in finite time a.s., as the diffusion (p_t) hits p̲ or p̄. Note that the positivity demanded by admissibility is not a restriction on optimality: if n(p') = 0 at some p' ∈ (p̲, p̄), then dp|p' = √(2n(p')Σ(p'))dW = 0, so that beliefs starting at p' never move and experimentation never ends, contrary to optimality.

a. Background Terminology, Notation, and Theory. Here we hew closely to KT81, §15.6-7, and Karatzas and Shreve (1991) (KS91), §5.5 (adapting both for clarity, and to avoid overuse of letters). Below, p₀, p, y, z are arbitrary points in (p̲, p̄). Consider a general belief diffusion process with SDE dp = β(p)dt + σ(p)dW and range (p̲, p̄) ⊆ [0,1]. Let T_{z,y} = inf{t > 0 : p_t = y | p₀ = z} denote the hitting time of y from z, a random variable. The point y can be reached from z if Pr(T_{z,y} < ∞) > 0, i.e. if y is hit in finite time with positive probability (hereafter, FTPP). The process is regular if any pair of interior points y, z ∈ (p̲, p̄) can be reached from one another — a critical notion, analogous to the communicating property for Markov processes.

We now define several important integrals of 1/n(p), needed both here and in §B.3, B.4. Though they may be unbounded, they are well-defined, because the control n(·) is 𝔅_[0,1]-measurable. Let S(y) = ∫^y exp(−2 ∫^x β(z)dz/σ²(z)) dx denote the scale function of (p_t). As two convenient abuses of notation, S[y, p] = S(p) − S(y) denotes the scale measure, and S(p̲, p] = lim_{y↓p̲} S[y, p] its left-side limit. Similar to the scale function, define the speed function M(y) = ∫^y [S'(z)σ²(z)]^{−1} dz, the speed measure M[y, p], and M(p̲, p] = lim_{y↓p̲} M[y, p]. Next, let J[y, p] = ∫_y^p S[y, z] M'(z) dz and K[y, p] = ∫_y^p S[z, p] M'(z) dz, with right limits J(p̲, p] and K(p̲, p].

The boundary p̲ is attracting if S(p̲, p] < ∞ for any p ∈ (p̲, p̄) (see the equivalent formulation in KT81's Lemma 6.1). Intuitively, an attracting boundary is non-repelling: (p_t) gets arbitrarily close to it with positive chance. A similar definition applies at p̄. An attracting boundary might not be reached in finite time, however; for attainability we need a harsher concept. An attainable boundary (say p̲) is attracting, with J(p̲, p] < ∞ for some p ∈ (p̲, p̄). By Lemma 6.2 in KT81, an attainable boundary is thus not just approached but hit in FTPP.

b. Attracting Boundaries. We explore the unconditional driftless (β(p) = 0) belief process, with diffusion term σ(p) = √(2n(p)Σ(p)) in (p̲, p̄). Then S[y, p] = p − y, as n > 0.

Claim 12 Assume π ≫ 0 in [0,1], or c(0) > 0. For an optimal control n(p), and any triggers 0 < p̲ < p̄ < 1, the belief process (p_t = p₀ + ∫₀^t √(2n(p_s)Σ(p_s)) dW_s) is regular, and both boundaries are attracting.

Proof: That the boundaries are attracting follows from S(p̲, p] < ∞ and S[p, p̄) < ∞, both essentially stated above. Regularity is proven by contradiction. Suppose that there exists a pair y, z ∈ (p̲, p̄) with Pr(T_{y,z} < ∞) = 0. Obviously y ≠ z; WLOG order p̲ < y < z < p̄.
We now argue that 00, the Since y is — this yields a negative payoff. Steps 1-2 proved that y attracting. [y^y); thus, J[y,y] = S{y,y]M{y,y] by KTSl's By KTSl's Lemma to experiment 00 by KTSl's Lemma 6.3-uz, this implies K{y, 6.3-t;, = y] 00. a natural (Feller) boundary and so their Theorem 7.2-n conveniently says is lim^^y £'[e~'"^«p] lies in [0, 1]. q^, and assume belief process drifts we have y" < z < an unattainable boundary of a regular process on is regularity in Step 1, this contradicts the martingale property of {pt). reaching the lower boundary p in y] (p, y), Then the y. Contradiction to Optimality. 3. starting from any Since S{y, y in FTPP. For right-open, of the form [y,y)- is regular on [y,y"], then is if {pt) Assume 0. = y' C\p,y]. To wit, the y"] y"] to y' in [y', from [y', violates the martingale property.^'* connected regularity set with lower bound y 77 y" with The Maximal Regularity Interval. non-empty interval [y,y"] C [p, y). We next 2: regular process on a we prove that < y' FTPP. Pick any {p, y) is hit in FTPP, which in any for across any such transit up in > y'm contrary that no y" > oo) down diffusion process (pt) traverses on Once more sample path continuity any prospect of positive discounted returns. (a.s.) = 0. Both limits exist as the expectation is monotone Respecting the order of limits, this means: For each e with y_<pe < qe <y, such that discounted returns starting from Since the VM. £'[e~'"^«^P£ q^ are ] Riqe) p < p^^ £'[e~'"^'^E]7r(p) < < — e. Since < > 0, q^, in p and q, and there exist Pe and the expected gross E[e~^'^''^''^]-K{p) < £7r(p). stops only at the lower threshold p, the expected total discounted experimentation costs K{qs) are positive for all qg > p, increasing as g^ rises to y as e — >• 0, stopping time, then so is t A T = min{t,T). A Typ = Typ with positive chance for large enough t < oo. 
Unless t A Typ A Tyyii = Tyy" wlth posltlve probability for some y" > y (and so y" is hit in FTPP from y), PT{t ATyp ATyyn = Typ) = Pi{t ATyp = Typ)>0 ioT sll J/" > ^. Thcn £bt AT^AT^,,, \y]<y + Pr(i A Tj^ = ^''Here's a proof. Since PT{Typ < — y] < y. Tj,p)[p oo) First, for > 0, i>0, ifr>Oisa any we have t But Wald's Optional Stopping Theorem guarantees equality because 27 (pt) is a martingale. and hence boundedly TfiQe) for > small enough e 0, Finite Hitting Times. could approach y, WLOG But then all exists, Assume 2 it 1: >0 Proof Step 2: With assuming local integrability {LI'), or [po (P) V) come from The is finite. hitting p, which G (PiP)- to OCS hitting time is thus a.s. finite. interior belief: Indeed, = Pr(TpQp<oo) = 0. If either is zero, say p < 7r(po) for po close to p. attainable boundaries, J{p,p\,J\p,p) < cx) by KTSl's two inequalities are 5.5.32-z in KS91,^^ these our result for iff Lemma nE>0 in {p,p), and (n) M[po — s,po + e] = J^_^ dy/[n{y)T,{y)] < Vp G {p,p) 3e>0 with {ND') obtains because n(p)>0 by optimality, and E(p) > > — £, Po+e] {p,p). the intensity level the logic of Step 3 above, u(po) For a contradiction, assume {LI') for all e or 0, nondegeneracy {ND'), or a positive diffusion term {i) oo. Clearly, > This implies that the belief process Both boundaries p,p are attainable from any But by Proposition 6.2. (p,p). fails if m[0,l], orc{0)>0, or both. If a solution {n{-),p,p] FTPP, then by unattainable in or n{q^) 0, to stop immediately, contradicting po optimality at po G {PtP) precludes Pr(TpQp<oo) is > < i^iQe) moving downward, towards the only attainable threshold then the expected time to hit p or p Proof Step /^(ge) — •^7r(p) from below, only at a vanishing speed; and by the martingale The VA4 then wishes y. < i^iQe) {p,p)- discounted payoffs starting at belief po G vanish as po nears Theorem e interior belief y property, the speed also vanishes p. 
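The finite-hitting-time conclusion of Theorem 2, and the optional-stopping property E[p_T] = p₀ used in its proof, can both be illustrated by simulating the driftless belief diffusion until it exits the trigger interval. A minimal Euler-Maruyama sketch, with Σ(p) = [p(1−p)]² and a constant n as illustrative stand-ins:

```python
import math, random

def exit_stats(p0=0.5, lo=0.2, hi=0.8, n=1.0, dt=1e-3, paths=200, seed=11):
    """Simulate the driftless belief diffusion dp = sqrt(2 n Sigma(p)) dW,
    stopped at the thresholds {lo, hi}; Sigma(p) = [p(1-p)]^2 is illustrative.
    Returns the exit times and exit beliefs across sample paths."""
    rng = random.Random(seed)
    times, exits = [], []
    for _ in range(paths):
        p, t = p0, 0.0
        while lo < p < hi:
            sig = math.sqrt(2 * n) * p * (1 - p)      # diffusion coefficient
            p += sig * rng.gauss(0.0, math.sqrt(dt))  # driftless: belief is a martingale
            t += dt
        times.append(t)
        exits.append(p)
    return times, exits
```

Every path exits quickly, each exit belief sits at a threshold (up to discretization overshoot), and the average exit belief is close to p₀, as the martingale property demands.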
and thus € {y,y) C g^ — R{Qe) summary, regularity only In some "nearly vanishes" around > because either c(0) both. This contradicts the assumption c. — Thus, the value v{qe) positive. — small enough that p say, are attainable in Thus, J\po fails for FTPP — e,po + e] < oo by classification (end of §15.6-6), both some po € < po — e {p,p)- < po + from po, as the diffusion attainability, while po±e inside (0, 1). Then M[po— £,po + e] e is < p. Next, regular in M\po—e,po + s] = are then exit boundaries of [po contradiction proof from Claim 12 (step 3) with KTSl's its Theorem — all = oo points in superinterval oo. By KTSl's e,po + e]. The 7.2-ii still applies. B.3 Verification Theorem: Existence and Uniqueness of a Solution. a. Admissibility Conditions. for our candidate Markov control the non-R&D and We check the conditions (see the definition in §B.l-c) n{p) = f{rv{p)) to be admissible. Until §B.4, we assume non- null action case with tt ^ 0, to avoid some technical issues. • PosiTiviTY. This has already been addressed at the outset of §A.2. • Measurability. The composition map Xt preserves Jf measurability (095, of a Borel measurable Lemma < oo and J\p,p) < being constant in time, n(-) T is finite a.s. iff (i) both boundaries are attainable oo (same as their v function)]; in this case, more strongly E{T) < oo. w ^^In our notation, their result says: The stopping time [J{p,p] 2.1); also, map with any measurable 28 is measurable time Borel in the process n{pt{uj)) is 'B^_^_ is any !B[oj]-measurable function for ® J'-measurable, Markov control function f{rv{p)) • So sets. and Jj^- and Jf-adapted, continuous, it is n(-), the control Since the like pt{to). therefore S[o,i]-measurable. Strong Existence and Uniqueness (SEU). Given our candidate optimal policy {f{rv,p,p}, the belief process has a unique strong solution. Indeed, replace the functional n(t, ^') with our candidate Markov control f{rv{pt)) in (12), and substitute into (15) for Wt{Lo), and then dpt pt{oj). 
Suppressing w, = pt{l - Pt){2lJ./a) {[f{rv{pt))/a] [m - /i(2p, - l)]dt + Vf{rv{pt))dWt) + pt{l-ptKVfiryiPt))dWt -p't{^-Pt)efirv{pt))dt + pt{l-pt)C^firv{pt))dWt (16a) Pt{l-Pt?ef{rv{pt))dt where (^ = W are primitives, we need As the probability space and Wiener process 2fj,/a. (16b) only verify that there exists a unique strong (KS91, §5.2.1) solution to each corresponding SDE dpt = by (16b). Poi 1 (5e{Pi If so, m)dt + a{p)dW on Q^, 9 = L,H, the process {pt{uj)) solving (16a) where a and is are implicitly defined j3e a diffusion, being a mixture (weights -Po) of the conditional diffusions (16b), and so inherits the defining properties (KS91, §5.1.1). An adapted control process {f{rv{pt{uj)))) is induced, and a uniquely defined observation process Xt, after substitution into (12). Because our belief SDEs are defined on a bounded open interval (p,p), all requirements below need only hold up to any hitting time T, at which point the process Existence of a weak solution on (p,p) — i.e. up to any hitting time by Skorohod's Theorem (KS91, Theorem 5.4.22) from po ^ boundedness of a and Pe on {p,p). Just as Lipschitz condition, here joint on the drift with ODEs, an and variance terms, is (3g ior 6 = absorbed. L^ H follows and the continuity and (0, 1), SDE — is uniquely soluble given a and a. This requirement met because / and y/J Lipschitz (by Claim 3) and v differentiable imply that f{rv{p)) and y/f{rv{p)) are Lipschitz in p G (p,p). (A proof mimicks that of the composition chain is rule in calculus.) By this implies strong Proposition 5.2.13 in KS91 (with h{x) = a; in their SEU Value Function ChEiracterization and Verification of the Theorem {p,p), 3 Remark uniqueness and then pathwise uniqueness (by KS91, Skipping around their book, along with weak existence, this yields b. equation (2.25)), The supremum value where {v,p,p} solves £U V of OCS (14) -(15) equals v, (KS91, OCS and {p : 5.3.3). p. 310). Policy. V{p) > 7r(p)} = (5) -(7). — Proof Step 1: Control. 
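The claim that the unconditional belief process is the mixture (weights p₀, 1 − p₀) of the conditional diffusions (16b), and hence a martingale, can be checked by drawing the state first and then filtering. A minimal sketch, with constant n as an illustrative assumption:

```python
import math, random

def mixture_mean(p0=0.3, mu=1.0, sigma=1.0, n=2.0, T=1.0, dt=1e-2,
                 paths=2000, seed=3):
    """Draw m = +/-mu with chances (p0, 1-p0), run the exact Gaussian Bayes
    filter on the conditional signal path, and average the terminal belief:
    the unconditional mixture should satisfy E[p_T] = p0 (martingale)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(paths):
        m = mu if rng.random() < p0 else -mu
        llr = math.log(p0 / (1 - p0))
        for _ in range(int(T / dt)):
            dx = m * dt + (sigma / math.sqrt(n)) * rng.gauss(0.0, math.sqrt(dt))
            llr += 2 * mu * n * dx / sigma**2   # exact Gaussian likelihood ratio
        total += 1 / (1 + math.exp(-llr))
    return total / paths
```

Conditionally on each state the belief drifts toward the truth, yet the p₀-weighted average of terminal beliefs stays at the prior.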
The supremum value V{p) of OCS clearly exists and by Lemma 1, it uniquely defines a continuation region £ = {p*,p*) Q (0, 1), by [0,p*]U[p*, 1] = 29 {p e [0,1] = V{p) : For any admissible Markov control n{p) and resulting belief 7r(p)}. process {pt{cu)), define V{po) as in (14), except using stopping time T{u)\p*,p*). Since continuous, this a pure control problem with given boundaries p*,p*. is For a sufficiency verification theorem, we check three conditions. First, there exists By Claim a solution v{p) of the 1-LJB problem with value matching only at p*,p*. ODE solution v{p) to the thresholds in in \p,p] so v"{p) in (p, p), and value matching [p,p], = and ^ c'(/) is Third, the family of functions {v{pt)}t<T simply because in Markov, as proved Proof Step boundedly |f(pi)| is 095: V{p) — finite (see in §B.l.c) control n{p) = Stopping. Next, replace n{p) 2: = f{rv{p)) = withpo(<^) f{rV{p)) == is Po an optimal control for this stopping OCS, from Step discounting by his Remark 3.8.3, for all p E 8,, V 2, uniformly integrable is (pt) starting an optimal exists and (WLOG OCS. pure optimal stopping problem: + is e-^^n{pT{uj)) our supremum F(p), because By Theorem (S78), extended to a continuous flow cost c{f{rV{-))) oo) by Claim [0, with our candidate f{rV{p)). in (13) problem 1. any Second, v e G^ 7r(p). f{rV{p)) for sup^>o E^ [j^ -c(/(rT/(p)))e-"d5 — where the value of a 095, Appendix C.3). Therefore, this controlled diffusion Pt(a;), consider the resulting V{po) 9, (6) exists for and any process and there v{p) € C^ in [p,p], = continuous on controls n(-), any stopping time realization T, by Theorem 11.2 Given = d {f {rv{p))) /T,{p) continuous in is continuous. is Markov for all S v"{p) such as p*,p*, on the boundary of the set V{p) [0, 1], because tt is final 3.15 in Shiryayev (1978) payoff 7r(-), and geometric solves the generalized Stefan problem = -c{f{rV{p))) + /(rF(p))E(p)y"(p) s.t. value matching at dE. Hence, V" > in and V = n in [0,1]\8. (where V" = 0). 
Therefore V is continuous and convex in [0, 1], rV{p) £, and by Theorem Also, as shown necessity of rv{p) = R70 24.1 in in §B.2.a, it Summing follows + n(p)E(p)f"(p), equivalent to v" = derivatives in [0,1], from Theorem 3.16 and so at p*,p*. in S78, i.e. V plus (6)-(7), for the given control n{p) up, the triple {V,p*,p*} solves after Proposition 1, the left Bellman equation TiJB (5) (Step 1) and ST (Step solves = 2). ST, f{rV{p)). As sketched has a unique maximizer f{rv), and so £U is TiJB+ST are jointly equivalent to £U, and therefore v{p) = V{p), p* = p, p* — P, as asserted. Theorem 4 Assume n > in [0, 1], or c(0) > 0, or both. The Markov control n{p) = f{rv{p)) and thresholds p,p from EU are optimal for the OCS problem. Since v{p) i.e. d{f{rv{p)))/T.{p). Thus, a unique solution {v{p),p,p} exists; Proof: The both boundaries are attracting for any positive control. smooth pasting then —c{n{p)) has right and = control f{rv{p)) V{p) and f{rV{p)) is is an optimal control (Step 1 not only admissible (§B.3.a) but also optimal for 3.3 in S78, the stopping time of Theorem policy 3), the OCS. By Theorem T = Tp^pATp^p — and then the whole £U policy — is optimal^ 30 for OCS if T < oo which we now establish. a.s, KS91 implies that conditions {ND') and {LI') in SDE belief (see the proof of Theorem Proposition 5.5.32-(i) in KS91, = and J[p,p) J^{p - T< oo a.s. iff these inequalities follow from /(rw (•))> c. Assume 5 in (p, p) < Therefore by oo. - p)dz/[f{rv{z))i:{z)] < J^{z C (0,1), then E{z) — true given c(0) > > tt in > or c(0) [0, 1], unique solution of ElA the unique optimal policy for 7r(p) is (5)-(7). Smce Any or both. 0, £U Because oo, S\p,p) Since {p,p) problem (14)-(15) solves Proof: < = J{p,p] oo. (p,p) or tt > > oo and always. D Uniqueness of the Optimal Policy for OCS. Verification: Theorem < z)dz/[f{rv{z))E{z)] p G for all 5.5 hold for the resulting controlled and S{p,p] 2), > First, f{rv{p)) OCS {f{rv{-)).,p,p] solves OCS solution to the by Theorem 4, the OCS. 
and V{p) are uniquely defined, so the boundary of the region is = 7r{p), i.e. {p*,p*} = {p,p}- Consider the uniqueness of the optimal We have proven above that V{p) = v{p) € C^ in {p*,p*) = (p,p), where V{p) control f{rv{p)). and Theorem 2 that the stopping time Theorem just for f{rv{p)). So by V{p), Thus = c'{n{p)) B.4 The When f{rv{p)) = rV{p) i.e. = Theorem E{p)V"{p) R&D 7r(p) -c{n{p)) Model = = if Proof: Clearly, f{rv{-)) in the proof of If c(0) > p limuj^^o Theorem 4, then f{rv{-)) The approximate > = then boundary problems <oo = 0, iff for some we show this M^) 1 » is still t] v'{p) = 0, {p,p). > the £U policy or c(0) 0, i.e. For n{p) arise: > 7r(p) 0, is 2 uniquely or both. WLOG. As > J^{z —p)/f{rv{z)) dz < oo. is finite. If c(0) = true. Indeed, near p, {z - and = tt we have (17) p)r^{rv{z)) ~ v"{z){z the second equality from v"{z) = — p)^/2 (valid ^{rv{z))/'L{z) and =S oo) l//(w), as established in §A.l. Notice that 2E{z)/r -^ 2E(p)/r 31 = fails. _j,(,„(,„^!M£)5(fL f{rv{z)){z-p)v"{z) = p€ FTPP: So Theorem near p since < oo, integral in Then [0, 1]. J{p, p] and the for Somewhere) (tt == for some p G in {p,p) solve 1-LJB with value + nT,{p)V"{p)] equality follows from a Taylor expansion v{z) because v e 6^) with v{p) from ^'{w) — the result obtains .. f{rv{z)) 0, — not D in {p,p), while f{rv{z)) somewhere, so that f{rv{p)) ^-P = w^~''^^'{w)/^{w) > max„[-c(n) and p may be unattainable payoffs 7r(p) optimal c(0) any Markov optimal policy n{p) 095 any optimal n{p) must Action Case / Null somewhere and Assume either 11.1 in for + n{p)T,{p)V" (p) = = c'if{rV{p))). as in Figure 2 (left), 6 is finite a.s. in G (0, > as r and p E ' lim ~'- where the = ,siim {z-py-^ = Slim f (7(f -' zip ^{rv{z)){z - pY ^Iv first bounded Remark. = ^'{rv{z)) is for If c(0) = t; > is is which z) and the third G valid because v is bounded above by = zip — p)''~\ whose (z C^. It integral which establishes the claim. 
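The attainability test used here, finiteness of J(p̲, p] = ∫ (z − p̲) M'(z) dz for the driftless (scale = identity) belief diffusion, is easy to probe numerically: for an interior threshold the integral is finite, while shrinking the lower endpoint toward the natural boundary 0 makes it blow up. A minimal sketch, with σ²(p) = 2n[p(1−p)]² and constant n as illustrative stand-ins for the controlled diffusion coefficient:

```python
def J_integral(p_lo, p_hi, sigma2, m=20000):
    """J(p_lo, p_hi] = integral of S[p_lo, z] M'(z) dz for a driftless
    diffusion, where S[y, z] = z - y and M'(z) = 1/sigma2(z).
    Midpoint rule; a finite J means the lower boundary is attainable."""
    h = (p_hi - p_lo) / m
    total = 0.0
    for k in range(m):
        z = p_lo + (k + 0.5) * h
        total += (z - p_lo) / sigma2(z) * h
    return total

# Illustrative diffusion coefficient sigma^2(p) = 2 n [p(1-p)]^2, with n = 1
sig2 = lambda p: 2.0 * (p * (1 - p)) ** 2

J_interior = J_integral(0.2, 0.5, sig2)   # interior threshold: finite
J_near0_a = J_integral(1e-2, 0.5, sig2)   # pushing the lower endpoint toward
J_near0_b = J_integral(1e-3, 0.5, sig2)   # the natural boundary 0: J diverges
```

This mirrors the dichotomy in the text: interior triggers, where the diffusion coefficient is bounded away from zero, are hit in finite expected time, while the endpoints 0 and 1 are unattainable.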
0, = c'(0) More ~ then in the proof of Claim 14, (,{rv{z))/rv{z) 0, = J(p,p] exists. + e,p} thresholds {p C. = rv{z)+rv'{z){p — — p)/f{rv{z)) — p)~\ whence optimal stopping time new any ~ rv{p) l/f{rv{z)) for rv{z) small. of order {z pElim(™'(z))'' —p equality follows from (17), the second from the assumption, follows that the integrand {z \p,p\ is z ^ from a Taylor expansion on rv{z) )^-(-^) J{rv{z)) zip Observe that: (0, 1). Thus the integrand oo. < Hence, Pr{Tp^p A Tp^p precisely, the SU proof of Claim 14 in the = control n{p) an e-optimal policy. See Theorem 3.3 < co) and no 1, f{rv{p)) with the in S78. OMITTED PROOFS: SENSITIVITY ANALYSIS C.l Comparative Statics: Proof of Proposition 6 We first consider payoff shifts, which just affect the boundary conditions of £U. then employ a different methodology for shifts in • Proof of Proposition maxx,n^(Po|", =P {Po — p)/{p ~ p) positive: V^0{po\-) a change in > 0, for > c(-) that skew the Increasing Payoff Levels. by inspection: V{po\-)^H Here the event pr and where the maximand ^(po|-) T'^tt^iTTb), differentiable in n^ 0. 6(a): r, C, = E[e~^'^p | pr — p,po,n]Pr{pT = P B > >P for all Po The other P- — is Po,'^) > (4) \ = occurs with chance payoff parametric shifts are similarly a fixed policy n,T. for itself. Write V{po) — seen on the RHS of that the T>A4 eventually chooses action any po ODE We one of these parameters, the supremum value If V the VM re-optimizes after cannot fall. Consider boundary behavior at lower threshold belief p (p being similar), associating p. and payoff parameters p, by smooth pasting ^7r(PQ|7!"o)(7ro • — TTj) is tti (7), > The ttq. and value function V^(p|7r) is continuously differentiable in partially differentiable in n. Hence, V{'n\TTo) the first-order Taylor expansion, as Vp{n\no) Proof of Proposition 6(6): Claim 7-b, p must side. Also, as fall Claim 8 new tt frontier on the right. falls By a to restore a smoothly-pasted tangency of Vp{p) asserts, as p So n 0. Increasing Riskiness. 
Rotate the counterclockwise through (current belief, expected payoff), as tt^ the value function cuts into the = falls, the value Vp{p) 32 rises, — V{v\'n-i) >p ttb and n^ ^ . payoff line rises. Then left-right reflection of and and gets 7r(p) on the right steeper. Claim 13 (A Key Implication) Parametrize models by Ti, T2, whereVi = Let {vi,p.,pi} denote the corresponding solutions, with ^i{p,Vi{p)) If > ^2 <p independently of p, then p > p2, and all c[{fi{riVi{p)))/T,i{p). *i(p,-yi) > vi{p) Av", {ci{-)Xhfi}- (18) p G V2{p) for all maps continuous [p„,P2]- {p,,Pi)r\{p ,p2) -^R. Ordering Thresholds. We assume p > p and obtain a contradiction. symmetric argument pi > p2- First, we show that p > p implies Av{p > 0, Step By a 1: , ) ^v'{p) > Av"{p) > 0, some small for pi Aw = W2— 1^1, and similarly Aw', Put Proof: ^*2(p,W2) > vi = > e 0. 0, Av so that The and convex at p +e, two weak inequalities follow at once from value matching first and smooth pasting of each strictly positive, increasing is at p., along with strict convexity of V2- Vi = '^2ip^,V2{p^))-'^i{p^,vi{p^)) > by (18), as Vi solves the Ti problem, We claim that Aw attains a global maximum at some p € (p min(pi,p2))- Next, Av"{p^) and To Aw (p) > 0. , see why, consider the of vi at pi some for and e > 0, is > pi strictly decreasing at pi > strictly positive interior global If P2, then Az;(p2) < < +e is In either case, since = p2, < then Ai''(pi) P2: An by smooth pasting " maximum p interior global by value matching of each and increasing at p maximum p. < nonpositive at P2 Av{p + e) > < at p.; the function Pi — and so has an we deduce the ma:!dmum 0, > because Av{p} deduced. This violates the second order condition Av"{p) < for Step 0. But then Av"{p) just proven, jointly imply Av{p Suppose that Av(p) non-negative yielding the maximum for same contradiction Proof of Proposition Convexity of Ac and Ac(0) = > gi{n), and Av{p2) < < ) > at some p € a < matching, plus p^ 0. 
Then Δv must attain a non-negative global maximum at some p̂ ∈ (p_2, p̄_2). This requires Δv(p̂) ≥ 0 and Δv''(p̂) ≤ 0, yielding the same contradiction as in Step 1. The claim then obtains: v_1(p) > v_2(p) for all such p, with p̄_1 > p̄_2 and p_1 < p_2 as before.

Proof of Proposition 6(c): Cost Convexity. Define Δc(n) = c_2(n) − c_1(n). Convexity of Δc and Δc(0) = 0 imply Δc'(n) ≥ Δc(n)/n ≥ 0 for all n > 0, so that we have g_2(n) ≥ g_1(n) for all n ≥ 0, and equivalently f_2(w) ≤ f_1(w) for all w ≥ 0. Write ξ_i(w) ≡ c'_i(f_i(w)). Since ξ'_2(0) > ξ'_1(0) by assumption, ξ_2(w) > ξ_1(w) for all w > 0. [Or, if c'_i(0) = 0, then f_i(0) = 0 implies ξ_i(0) = c'_i(f_i(0)) = 0 for i = 1, 2, so that Δc'(0+) > 0 again yields ξ_2 > ξ_1 near 0.] Thus, (18) holds: Ψ_i(·, v_i(·)) = ξ_i(r v_i(·))/Σ(·), so that ξ_i(·) increasing and ξ_2(·) > ξ_1(·) order the maps Ψ_2 > Ψ_1. This proves value and threshold monotonicity, by Claim 13. Since each f_i is increasing, v_2 < v_1, and f_2(w) ≤ f_1(w), we have n_2(p) = f_2(rv_2(p)) ≤ f_2(rv_1(p)) ≤ f_1(rv_1(p)) = n_1(p).

Proof of Proposition 6(d): Information Quality. If Σ_1 > Σ_2, then Ψ_i(·, v_i(·)) = ξ(rv_i(·))/Σ_i(·), so that ξ(·) increasing and 1/Σ_1(·) < 1/Σ_2(·) imply (18), and therefore the premise of Claim 13 holds: value and threshold monotonicity follow. Since f is unchanged, n_i(p) = f(rv_i(p)) falls uniformly with the value.

Claim 14 (Impatience; Proof of Proposition 6(e)) Given interest rates r_2 > r_1 > 0: (a) The r_2-value v_2 is lower than v_1 at all points in its domain, and the r_2-thresholds are shifted in: [p_2, p̄_2] ⊂ [p_1, p̄_1]. (b) There exists a possibly empty interval [q, q̄] ⊂ [p_2, p̄_2], with n_2(p) ≤ n_1(p) for all p ∈ [q, q̄], and n_2(p) > n_1(p) otherwise.

Proof of (a): Since r_2 > r_1 and ξ is increasing, all else equal, (18) holds: Ψ_2(p, v_1(p)) = ξ(r_2 v_1(p))/Σ(p) > ξ(r_1 v_1(p))/Σ(p) = Ψ_1(p, v_1(p)) for p ∈ (p_1, p̄_1). Hence the value v falls and the thresholds shift in, by Claim 13.

Proof of (b): Define functions Δw(p), Δw'(p), Δw''(p) from w_i(p) ≡ r_i v_i(p), i = 1, 2, with common domain [p_2, p̄_2].
By (a), p_1 < p_2 and p̄_1 > p̄_2. Each w_i is convex and solves Σ(p)w''_i(p) = r_i ξ(w_i(p)), with w_i(p_i) = r_i π(p_i) and w'_i(p̄_i) = r_i(π_g − π_b), by value matching and smooth pasting. Thus w'_2(p̄_2) = r_2(π_g − π_b) > r_1(π_g − π_b) = w'_1(p̄_1) ≥ w'_1(p̄_2), where the first equality is smooth pasting, the weak inequality follows from convexity of w_1 and p̄_2 < p̄_1, and the strict inequality from r_2 > r_1 and π_g > π_b. So Δw is strictly increasing at p̄_2; by a symmetric argument, Δw is strictly decreasing at p_2. Therefore Δw is strictly positive in a non-empty set I ⊂ [p_2, p̄_2]; and since n_i(p) = f(w_i(p)) with f increasing, n_2(p) > n_1(p) exactly where Δw(p) > 0, while the folk result n_2(p) ≤ n_1(p) obtains on the complement set.

Since the smooth function Δw is strictly positive near the thresholds, to show that [q, q̄] is a (possibly empty) interval it suffices to show that Δw cannot have a local non-negative maximum in (p_2, p̄_2). By contradiction, suppose that Δw(p̂) ≥ 0 ≥ Δw''(p̂) for some p̂ ∈ (p_2, p̄_2). Then p̂ ∈ (p_2, p̄_2) ⊂ [0, 1] implies Σ(p̂) > 0; furthermore r_2 > r_1, ξ(·) is increasing, and Δw(p̂) ≥ 0, so that the familiar contradiction follows: Δw''(p̂) = [Σ(p̂)]^{−1}[r_2 ξ(w_2(p̂)) − r_1 ξ(w_1(p̂))] > 0 ≥ Δw''(p̂).

Finally, we specialize to the R&D payoff specification. Here, v_2(p_2) = 0, so Δw(p_2) = w_2(p_2) − w_1(p_2) = −w_1(p_2) < 0: Δw is initially strictly negative and declining; since it must become strictly positive at some point below p̄_2, and since it cannot have a local non-negative maximum, it cannot change sign twice. So Δw changes sign exactly once, going from negative to positive as we raise p. Hence, there is an interior cutoff p' ∈ (p_2, p̄_2) such that n_2(p) < n_1(p) for all beliefs p < p', and n_2(p) > n_1(p) for all p > p'.
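The no-interior-maximum argument is the workhorse of this appendix. As a sketch, the contradiction step can be displayed compactly, using the symbols Σ, ξ, and w_i = r_i v_i introduced above:

```latex
% Suppose \hat p \in (p_2, \bar p_2) were a local non-negative maximum of
% \Delta w = w_2 - w_1, so that \Delta w(\hat p) \ge 0 and
% \Delta w''(\hat p) \le 0. Subtracting the two ODEs
% \Sigma(p)\, w_i''(p) = r_i\, \xi(w_i(p)) gives
\Delta w''(\hat p)
  = \frac{r_2\,\xi\!\left(w_2(\hat p)\right) - r_1\,\xi\!\left(w_1(\hat p)\right)}
         {\Sigma(\hat p)} \; > \; 0,
% since \Sigma(\hat p) > 0, r_2 > r_1 > 0, \xi is increasing and positive,
% and \Delta w(\hat p) \ge 0 implies \xi(w_2(\hat p)) \ge \xi(w_1(\hat p)).
% This contradicts the second-order condition \Delta w''(\hat p) \le 0.
```

The same display, with Ψ_i in place of r_i ξ(·)/Σ(·), delivers the contradiction used in Claim 13.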
C.2 Vanishing / Exploding Convexity: Completion of Proof of Proposition 7

Assume n_2 > n_1, with λ_k → ∞. By Claim 1, g_k(n_2) − g_k(n_1) ≥ λ_k(n_2 − n_1) → ∞ as k → ∞. Similarly, g_k(u) ≥ g_k(0) + ∫_0^u dg_k(n') = 0 + ∫_0^u λ_k n' dn' = λ_k u²/2 → ∞ as k → ∞. Hence, the inverse f_k of g_k vanishes as k → ∞, and so n_k(p) = f_k(rv(p)) → 0 when λ_k → ∞. The proof for λ_k → 0 is symmetric: f_k explodes away from 0, and since v(p) ≥ π(p) > 0, n_k(p) = f_k(rv(p)) ≥ f_k(rπ(p)) explodes.

References

Arrow, K., D. Blackwell, and M. Girshick (1949): "Bayes and Minimax Solutions of Sequential Design Problems," Econometrica, 17, 213-244.

Blackwell, D. (1953): "Equivalent Comparison of Experiments," Annals of Mathematical Statistics, 24, 265-272.

Bolton, P., and C. Harris (1993): "Strategic Experimentation," Nuffield College mimeo.

Brock, W. A., and A. G. Malliaris (1989): Differential Equations, Stability and Chaos in Dynamic Economics. North-Holland, New York.

Chernoff, H. (1972): Sequential Analysis and Optimal Design, vol. 8 of Regional Conference Series in Applied Mathematics. Society for Industrial and Applied Mathematics, Philadelphia.

Cressie, N., and P. Morgan (1993): "The VPRT: A Sequential Testing Procedure Dominating the SPRT," Econometric Theory, 9, 431-450.

DiMasi, J., H. G. Grabowski, and J. Vernon (1995): "R&D Costs, Innovative Output and Firm Size in the Pharmaceutical Industry," International Journal of the Economics of Business, 2, 201-219.

Dutta, P. K. (1997): "Optimal Management of an R&D Budget," Journal of Economic Dynamics and Control, 21, 575-602.

Easley, D., and N. Kiefer (1988): "Controlling a Stochastic Process with Unknown Parameters," Econometrica, 56, 1045-1064.

Elsgolts, L. (1970): Differential Equations and Calculus of Variations. Mir Publishers, Moscow.

Kamien, M., and N. Schwartz (1971): "Expenditure Patterns for Risky R&D Projects," Journal of Applied Probability, 8, 60-73.

Karatzas, I., and S. E. Shreve (1991): Brownian Motion and Stochastic Calculus. Springer-Verlag, New York [KS91].
Karlin, S., and H. M. Taylor (1981): A Second Course in Stochastic Processes. Academic Press, San Diego [KT81].

Keller, G., and S. Rady (1997): "Optimal Experimentation in a Changing Environment," Stanford GSB Research Paper No. 1443.

Kihlstrom, R., L. Mirman, and A. Postlewaite (1984): "Experimental Consumption and the 'Rothschild Effect'," in Bayesian Models in Economic Theory, ed. by M. Boyer and R. Kihlstrom, pp. 279-302. Elsevier Science Publishers, New York.

Krylov, N. V. (1980): Controlled Diffusion Processes. Springer-Verlag, New York.

Liptser, R., and A. N. Shiryayev (1977): Statistics of Random Processes: General Theory, vol. I. Springer-Verlag, New York [LS77].

(1978): Statistics of Random Processes: Applications, vol. II. Springer-Verlag, New York.

McLennan, A. (1984): "Price Dispersion and Incomplete Learning in the Long Run," Journal of Economic Dynamics and Control, 7, 331-347.

Neyman, J., and E. S. Pearson (1933): "On the Problem of the Most Efficient Tests of Statistical Hypotheses," Philosophical Transactions of the Royal Society, 231, 140-185.

Oksendal, B. (1995): Stochastic Differential Equations: An Introduction with Applications. Springer-Verlag, New York [O95].

Radner, R., and J. Stiglitz (1984): "A Nonconcavity in the Value of Information," in Bayesian Models in Economic Theory, ed. by M. Boyer and R. Kihlstrom, pp. 33-52. Elsevier Science Publishers, New York.

Rockafellar, T. (1970): Convex Analysis. Princeton University Press, Princeton [R70].

Shiryayev, A. N. (1978): Optimal Stopping Rules. Springer-Verlag, New York [S78].

Trefler, D. (1993): "The Ignorant Monopolist: Optimal Learning with Endogenous Information," International Economic Review, 34, 565-581.

Wald, A. (1945a): "Sequential Method of Sampling for Deciding Between Two Courses of Action," Journal of the American Statistical Association, 40.

(1945b): "Sequential Tests of Statistical Hypotheses," Annals of Mathematical Statistics, 16.
(1947): Sequential Analysis. Wiley, New York.

Wald, A., and J. Wolfowitz (1948): "Optimum Character of the Sequential Probability Ratio Test," Annals of Mathematical Statistics, 19, 326-329.

Wallis, W. A. (1980): "The Statistical Research Group, 1942-45," The American Statistician, 75.