Digitized by the Internet Archive in 2011 with funding from Boston Library Consortium Member Libraries
http://www.archive.org/details/waldrevisitedoptOOmosc
WORKING PAPER
DEPARTMENT OF ECONOMICS

WALD REVISITED: THE OPTIMAL LEVEL OF EXPERIMENTATION

Giuseppe Moscarini
Lones Smith

No. 98-04
April 1998

MASSACHUSETTS INSTITUTE OF TECHNOLOGY
50 MEMORIAL DRIVE
CAMBRIDGE, MASS. 02139
Wald Revisited:
The Optimal Level of Experimentation*

Giuseppe Moscarini†                    Lones Smith‡
Economics Dept., Yale                  Economics Dept., M.I.T.
P.O. Box 208268                        50 Memorial Drive
New Haven CT 06520-8268                Cambridge MA 02142-1347

April 21, 1998
(first version: July, 1997)

Abstract

The paper revisits Wald's (1947) sequential experimentation paradigm, now assuming that an impatient decision maker can run variable-size experiments each period at some increasing and strictly convex cost before finally choosing an irreversible action. We translate this natural discrete time experimentation story into a tractable control of variance for a continuous time diffusion. Here we robustly characterize the optimal experimentation level: It is rising in the confidence about the project outcome, and for not very convex cost functions, the random process of experimentation levels has a positive drift over time. We also explore several parametric shifts unique to our framework. Among them, we discover what is arguably an 'anti-folk' result: Where the experimentation level is positive, it is often higher for a more impatient decision maker. This paper more generally suggests that a long-sought economic paradigm that delivers a sensible law of demand for information is our dynamic one — namely, allowing the decision maker an eternal repurchase (resample) option.

*This paper was initially entitled "Wald Revisited: A Theory of Optimal R&D". We have benefited from comments at the Warwick Dynamic Game Theory Conference, Columbia, Yale, M.I.T., Toronto, Pennsylvania, UC-Davis, Stanford, UC-Santa Cruz, Western Ontario, UC-Santa Barbara, Queens, and more specifically Massimiliano Amarante, Dirk Bergemann, Lutz Busch, John Conley, Prajit Dutta, Drew Fudenberg, Peter Hammond, Chris Harris, Angelo Melino, Paul Milgrom, Stephen Morris, Klaus Nehring, Yaw Nyarko, Mike Peters, Sven Rady, Arthur Robson, John Rust, Chris Sims, Peter Sørensen, Ennio Stacchetti, and Steve Tadelis. Finally, we acknowledge helpful feedback from the Usenet group sci.math.research, especially Jon Borwein of Simon Fraser University. Moscarini gratefully acknowledges support from the Yale SSFRF, and Smith from the National Science Foundation (grant SBR9711885) for this research.

Keywords: learning, experimentation, sequential analysis, R&D
JEL classifications: C11, C12, C44, C61, D81, D83

†e-mail address: gm76@pantheon.yale.edu
‡e-mail address: lones@lones.mit.edu
1. INTRODUCTION

This paper revisits a classic decision theory contribution around its semicentennial — Wald's (1947a) sequential probability ratio test.¹ More than any other work, this has defined the paradigm of optimal sequential experimentation, either statistical or Bayesian. For it tackled afresh the simplest of decision problems: choosing among two actions (accepting hypotheses, for instance), each optimal in one of two states of the world. In his Bayesian formulation, Wald posits that the decision maker (DM), ever uncertain of the state, can buy multiple i.i.d. informative signals at constant marginal cost. Wald shows that the DM should purchase sequentially, and act when sufficiently convinced of the state. Yet we venture that the experimental schedule should accelerate when homo economicus runs the laboratory in real time. We offer here a compelling, economically-motivated twist along these lines that also creates a pure theory of dynamic R&D. In our spin on Wald's tale, the DM is assumed impatient, but may elect a variable-size experiment each period only at an increasing, strictly convex cost of information. We argue that this richer experimentation story provides some intriguing new testable implications, and that it also points economists to a long-sought well-behaved theory of information demand.

This paper has three goals, attacked in sequence after an essential literature review. First, we motivate our impatient DM and convex cost spin on Wald's discrete time tale, and then introduce and solve a tractable continuous time equivalent model of experimentation — which we argue is a control of variance for a diffusion with uncertain mean. Second, we investigate the robust character of the experimentation level. One might suppose that it peaks when the DM has middling, elastic beliefs over the state. In fact, we find the opposite: Given a convex cost function, experimentation grows in the expected payoff, which is greatest at extreme beliefs. For example, R&D levels are least when one is most discouraged, peaking just prior to project approval. We argue that the DM acts like a neoclassical competitive firm producing information at an increasing marginal cost, with an increasing "producer surplus." Optimal stopping demands that this surplus equal the cost of delaying the final decision, i.e. the portion of current value killed by discounting. Ipso facto, the research level that generates this surplus rises in the value. We also show that the level is convex in beliefs for not too convex costs, and so drifts up over time. Finally, we explore how experimentation responds to some natural parametric shifts. Among our findings, we discover that contrary to established folk wisdom, a more impatient DM often experiments at a higher level, provided that he still experiments.

¹The original source is a 1943 war-classified mimeo by Wald. Wald (1945a) and (1945b) were the first published versions of the program. In his (1947) preamble, he credits Milton Friedman and Allen Wallis for proposing sequential analysis. Wallis details this history in his (1980) retrospective paper.
2. THREE RELATED LITERATURE THREADS

2.1 Statistical Testing

The Neyman-Pearson (1933) theorem says that the best critical test of two competing hypotheses given a fixed sample has the form: accept H0 (vs. H1) iff the sample likelihood ratio exceeds a threshold. Wald and Wolfowitz (1948) proved that the sequential probability ratio test (SPRT) is optimal — minimizing the expected number of observations for a given power, even given either hypothesis. Arrow, Blackwell, and Girshick (1949) further developed this idea, in what was among the earliest of contributions to dynamic programming. The SPRT has been explored and generalized by several authors (see Chernoff (1972)). But surprisingly, in a half century no one has characterized the optimal sample size dynamics with discounting, where a unit sample size is no longer optimal — our main contribution. Cressie and Morgan (1993), a recent sampling from the frontier, prove for instance that a variable size probability ratio test is unconditionally optimal among sequential design procedures, while a SPRT is best with superadditive costs and no discounting.

2.2 Optimal Experimentation

Optimal experimentation was born out of sequential analysis. Among the canonical Bayesian learning models, those with a discrete action space are essentially stopping rule problems, or more generally, bandit models; continuous action models are richer, and not so easily pigeon-holed. Our model is rare as it grafts together these two contexts: variable level experimentation with an eventual binary stopping decision. Crucially, because our DM makes a pure information purchase until stopping, it differs from all other experimentation models, where myopic expected stage payoffs are themselves informative random signals; here, payoffs are known and negative (the information cost) until the final decision stage.

This paper investigates the drift of experimentation, as opposed to the long-run question, 'Is learning complete?'² The literature has also studied three main short-run questions. For finite action models like bandits, there has been work on the sequential ordering of experiments, or stochastic scheduling. Second, the direction of experimentation has been explored in continuous action models like monopoly pricing.³ Finally, the secular diminishing trend in experimentation has been remarked in papers on search with learning (eg. rising reservation prices). This comes closest to our thrust, and yet runs exactly counter to the rising experimentation levels that we find. For with price search, the information purchase is blended with, and thus colored by, immediate payoff concerns.

²See, for instance, Easley and Kiefer (1988) or Kihlstrom, Mirman, and Postlewaite (1984).
³McLennan (1984) and Trefler (1993) are good examples. Keller and Rady (1997), who posit a randomly shifting demand curve, assume continuous time learning; it is therefore also a technically relevant paper.

2.3 Research and Development

We interpret a special case of our analysis as a pure theory of dynamic R&D. While this is obviously a well-studied subject, other work differs in a key dimension. During 1971-82, Kamien and Schwartz modelled R&D as (very roughly) covering a possibly uncertain distance in 'progress space', given cost and effort functions. Dutta (1997) has recently assumed a related budget constrained model. It may help to imagine extant work as capturing 'D', and ours 'R'. For in our setting, the actual state of the world ("Is this feasible?") always hangs in the balance, while these papers posit a goal that is eventually attainable, if desired. We cast research as optimal learning rather than resource allocation; this is a theory of science (the search for truth) and not engineering (its eventual implementation). An eventual comprehensive theory of dynamic R&D will no doubt embed both phases.

3. MOTIVATING THE CONTINUOUS TIME MODEL

3.1 Bayesian Sequential Analysis

A. The Final Static Decision Problem. The DM must choose between two actions: a = A, B. Action A is better in state L, and B is better in state H, and the DM does not know the state of the world θ = L, H. Assume that the DM is Bayesian and risk neutral (simply treat utilities as payoffs). If p is the DM's prior belief on state H, then his expected payoff for action a is affine in p, say π_a(p) = pπ_a^H + (1−p)π_a^L. We also admit the costless option of never deciding, and thus need a default null action 0, yielding constant zero payoff. His static value is then π(p) = max(π_A(p), π_B(p), 0). In a standard problem, the DM is indifferent between A and B at some belief; if action A is dominated, then π(p) is always increasing in p. This special case of one risky action conveniently captures a stylized R&D problem: action B means 'building' a costly new prototype, which might or might not work, while the null action means 'abandoning' it, so that the DM earns zero regardless of the state. If building pays h = π_B(1) > 0 in state H and ℓ = π_B(0) < 0 in state L, then π(p) = max(0, hp + ℓ(1−p)), and the DM builds iff p > p̄ ≡ −ℓ/(h − ℓ).

B. Information Acquisition. Before choosing an action, the DM may initially acquire i.i.d. informative signals of θ at a fixed unit cost. Assume he maximizes the expected final reward less costs incurred. As Wald and Wolfowitz proved that sequential purchases are optimal, this is a pure optimal stopping exercise: the DM quits at the stopping time T and chooses B (or A, or the null action, i.e. quits) with posterior p > p̄ (or p < p̲, or p ∈ [p̲₀, p̄₀]).
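As a minimal numerical sketch of this static problem (the payoff values h and l below are hypothetical, chosen only for illustration), the static value π(p) = max(0, hp + ℓ(1−p)) and the build threshold p̄ = −ℓ/(h − ℓ) can be computed directly:

```python
# Static R&D decision of Section 3.1.A: build (action B) vs. abandon (null action).
# h, l are illustrative payoffs: h = pi_B(1) > 0 in state H, l = pi_B(0) < 0 in state L.
h, l = 2.0, -1.0

def static_value(p):
    """pi(p) = max(0, h*p + l*(1-p)): the better of building and abandoning."""
    return max(0.0, h * p + l * (1.0 - p))

# Build iff p > p_bar = -l / (h - l), the belief at which the DM is indifferent.
p_bar = -l / (h - l)

print(p_bar)                # indifference belief, here 1/3
print(static_value(p_bar))  # the kink of pi(p) sits at zero
print(static_value(0.8))    # optimistic belief: strictly positive value
```

At p̄ the kinked value function is exactly zero, and above it the value rises linearly in the belief, matching the formula in the text.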
3.2 Discrete Time Experimentation with Impatience and Convex Costs

Our goal is to extend Wald's setting along two dimensions: payoff discounting and cost convexity of experimentation. These assumptions are of particular interest for economists, less so for statisticians. This may explain why this stone has been left unturned. First assume an impatient DM, who maximizes the expected present discounted value of wealth. Faced with the time cost of delay, the DM does not necessarily wish to proceed purely sequentially, but may opt to stack his information purchases. After seeing his signal outcomes in any period, he either purchases more, or stops and chooses an action. In period k, the DM may buy N_k i.i.d. signals X_1, ..., X_{N_k} at cost C(N_k), with C(0) ≥ 0. One might venture that running a lab incurs a daily rent, independent of the experimentation level; in this case, there is a positive fixed flow experimentation cost C(0) > 0. But the analogy with Wald's setting is obviously closest with no fixed costs, C(0) = 0.

Next assume strictly convex information costs within a period; this fosters more equal purchases across periods, reinforcing Wald's sequential conclusion, absent discounting. Such an assumption makes economic sense on two grounds. First, plausibly not all researchers are equally talented in producing information. More intensive information search then must draw on the efforts of less capable researchers. Second, as with non-Bayesian inventive activity, since contemporaneously-produced knowledge is based on the same current stock, identical or similar discoveries are not rare:⁴ Even if research laboratories are created at constant cost, different labs may well expend resources duplicating results. Likewise, concurrent Bayesian information tends to be correlated, as it grows increasingly hard to produce i.i.d. signals. Then note that a constant marginal cost for correlated information intuitively corresponds to an increasing marginal cost of independent information.
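The pull toward equal-sized purchases under strict convexity can be checked with a toy computation (the quadratic per-period cost C(N) = N² is an assumed example, not the paper's specification): splitting a fixed total sample across two periods is cheapest at the equal split.

```python
# Strictly convex per-period cost, illustrative choice: C(N) = N**2.
# Splitting a fixed sample N across two periods costs C(N1) + C(N - N1),
# minimized at the equal split N1 = N/2, echoing Wald's sequential logic.
def C(n):
    return n ** 2

N = 10
costs = {n1: C(n1) + C(N - n1) for n1 in range(N + 1)}
best_split = min(costs, key=costs.get)
print(best_split, costs[best_split])   # equal split is cheapest
```

With discounting, of course, the DM trades this cost saving off against delay, which is exactly the tension the continuous time model formalizes.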
3.3 Developing the Continuous Time Experimentation Model

Although variable intensity experimentation is easily understood and formulated in discrete time using dynamic programming, the solution is intractable even for the very simplest signal structures. Perhaps this explains why no one has seriously pursued it. We now describe a tractable continuous time learning paradigm that captures the quintessence of the discrete time story. Below we sketch and motivate its substance, and focus on its economics. Its recursive solution is found in section 4.2, and a formal justification in Appendix B. In a work in progress, we argue that the model and its solution is the limit of a rich class of discrete time models with a vanishing time interval between experiments.

⁴A recurring theme in science is that great minds simultaneously think alike (eg. Newton and Leibniz' codevelopment of calculus). Indeed, those seeing further often stand on the shoulders of the same giants.
A. The Signal Process. In discrete time, choosing the number of i.i.d. signals to buy, each with distinct state-dependent means, is an apt description of variable intensity experimentation. For instance, if X has mean ±μ in states H, L, then experiments offer a noisy but informative glimpse of the signal mean. In fact, the very goal of experimentation is to infer this mean, for in so doing, the DM learns the state. Since the average signal X̄ = Σᵢ Xᵢ/N is sufficient for the unobserved mean, greater experimentation only serves to decrease the variance of X̄: Doubling the sample size precisely halves the variance.

Motivated by this general observation, we model continuous time experimentation as the control of variance of a diffusion observation process (x_t). The observation (x_t) is a Brownian motion with constant uncertain drift and known variance. Nature chooses its drift, μ^θ in state θ, with μ^H = −μ^L = μ > 0, while the DM controls its flow variance σ²/n_t, with the experimentation intensity n_t. Here, we think of n_t as the flow of information purchases, and call it the experimentation level or intensity. For a fixed control, the diffusion process thus solves the stochastic differential equation (SDE)

    dx_t = μ^θ dt + (σ/√n_t) dW_t    (1)

in state θ. Doubling n_t halves the "variance" of dx_t, yielding a doubly informative time-t experiment. As usual, dW_t ~ N(0, dt) is the Wiener increment, while the control n_t depends on the observation and intensity history (x_s, 0 ≤ s ≤ t), (n_s, 0 ≤ s < t). It is a feedback and not open loop control, not decided at time 0.⁵

See §17.6 in Liptser and Shiryayev (1978), §17.5 in Chernoff (1972), or §4.2 in Shiryayev (1978) for the pure problem of estimating the bivariate drift of a Brownian motion. Their motivation is its link to the heat equation. Neither source discusses control of variance.

Remark. Our choice of process (x_t) yields a motivational pure control of variance. But a realization x_t = ∫₀ᵗ dx_s is an unweighted running integral of sample means, and so is not a sufficient statistic for ((x_s, n_s), 0 ≤ s ≤ t) (w.r.t. the drift μ^θ). Consider instead the running sample totals observation process (x̃_t), obeying dx̃_t = n_t μ^θ dt + σ√n_t dW_t. In that case, ∫₀ᵗ n_s ds can be thought of as the running sample size, so that (x̃_t, ∫₀ᵗ n_s ds) ∈ ℝ² is a simple sufficient statistic for the mean. This process also yields a clearly well-defined level-0 experimentation, as will our belief filter (2) in §4.2. We see also that a higher intensity level n_t essentially accelerates time, advancing the schedule at which the DM observes future samples; to be sure, without time preference, there is no pressing reason to consider such an exercise. The two processes (x_t) and (x̃_t) are mere conceptual devices, and a choice between them is not critical: Both are sufficient for the drift μ^θ (and hence the state θ), and crucially yield the same belief filter (2), which is what we work with.

⁵This is why the signal must be defined recursively via a SDE rather than as an exogenous Ito process.
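A quick Euler-Maruyama simulation of the observation SDE (1) illustrates the variance control; all numbers below (μ, σ, the time grid) are illustrative assumptions. Doubling the intensity n should halve the variance of the increments dx_t:

```python
import numpy as np

# Simulate dx_t = mu dt + (sigma/sqrt(n)) dW_t in state H for two constant intensities.
mu, sigma, dt, steps = 1.0, 2.0, 0.01, 200_000

def increments(n, seed):
    """Euler-Maruyama increments of (1) at a constant experimentation level n."""
    rng = np.random.default_rng(seed)
    dW = rng.normal(0.0, np.sqrt(dt), steps)
    return mu * dt + (sigma / np.sqrt(n)) * dW

var1 = increments(1.0, seed=1).var()   # flow variance ~ sigma^2 * dt
var2 = increments(2.0, seed=1).var()   # doubled intensity, same noise draws
print(var1 / var2)                     # ratio of increment variances is 2
```

Using the same seed for both runs isolates the pure scaling effect: the noise path is identical, so the variance ratio is exactly the intensity ratio.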
B. The Cost of Flow Experimentation. The flow cost function c(n) is finite, increasing, strictly convex (and thus continuous) on [0,∞), differentiable on (0,∞),⁶ with nonnegative fixed costs c(0) ≥ 0, and 'Lipschitz-down' on (0,∞): c′(n₂) − c′(n₁) ≤ λ(n₂ − n₁) for some λ > 0, and any n₂ > n₁ > 0. Strict cost convexity means γc(n₁) + (1−γ)c(n₂) > c(γn₁ + (1−γ)n₂) for n₁ ≠ n₂ and 0 < γ < 1 — which is true if c″(> 0) exists — and precludes undesirable bang-bang control solutions. Both the Lipschitz-down property and differentiability play purely technical roles. Intensity level n incurs a flow cost c(n).

Remark. Weak cost convexity remarkably obtains WLOG: the DM can achieve any cost function arbitrarily close to the lower convex hull vex(c) = sup{c₀ ≤ c | c₀ convex}. Indeed, for any ε > 0, any average cost at least γc(n₁) + (1−γ)c(n₂) − ε is achieved in any time interval [t₀, t₁] by chattering between n₁ and n₂ with weights (γ, 1−γ) sufficiently rapidly, at small payoff loss.⁷

C. The Objective Function. At each time t ≥ 0, the DM chooses whether to stop and earn the final expected payoff π(p_t), or to continue to experiment at a chosen intensity level n_t. Admissibility demands that the random stopping time T and intensity level n_t each be functions of the observed history, ensuring that our observation process is well-defined in continuous time. In this Optimal Control and Stopping (OCS) problem, the impatient DM, facing an interest rate r > 0, maximizes his expected discounted return less incurred experimentation costs: E[∫₀ᵀ −c(n_t)e^{−rt} dt + e^{−rT}π(p_T) | p₀]. The supremum is w.r.t. T and (n_t). We write the optimized value V(p₀), since Appendix B.1 proves that the current posterior belief p₀ on state H is a sufficient statistic for the observed history to that moment.

⁶Thus, the cost function is C¹ (continuously differentiable); by convexity, it has a right derivative c′(0+).
⁷We thank Paul Milgrom for this nice insight. We do not wish to delve into a technical proof of this assertion, as it would detract from our focus. We intend the point and explanation to be intuitive.

4. THE OPTIMAL LEVEL OF EXPERIMENTATION

4.1 Competing Static and Dynamic Intuitions on the Value of Information

The expected payoff of a one-shot Bayesian program admits a motivational visual depiction. Let the convex function Π describe the value of posterior beliefs. Beliefs being a martingale, we have p = E(q); therefore, for any subdifferential d ∈ ∂Π(p), we may tack on any multiple of E(q − p) = 0. As Π is convex, it is differentiable for a.e. p; put d = Π′(p) when defined, and otherwise choose a slope between the left and right derivatives. Then for the prior belief p, a signal yielding the random posterior belief q is worth I(q|p) = E[Π(q) − Π(p) − d(q − p)]. This is the weighted area between Π and any supporting tangent line, with weights given by the density over q (see Figure 1).
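This one-shot value of information can be made concrete with the stylized R&D payoff of §3.1 and a symmetric binary signal (the payoffs h, l and the accuracy a below are hypothetical): I(q|p) = E[π(q)] − π(p) is zero wherever both posteriors fall on the same linear branch of π, and is largest near the kink.

```python
# Value of information I(q|p) = E[pi(q)] - pi(p) for the kinked static payoff
# pi(p) = max(0, h*p + l*(1-p)); h, l and the signal accuracy a are illustrative.
h, l = 2.0, -1.0

def pi(p):
    return max(0.0, h * p + l * (1.0 - p))

def info_value(p, a=0.8):
    """Binary signal reporting the true state w.p. a; posteriors via Bayes rule."""
    ph = a * p + (1 - a) * (1 - p)        # chance of a 'high' report
    q_hi = a * p / ph                     # posterior after 'high'
    q_lo = (1 - a) * p / (1 - ph)         # posterior after 'low'
    return ph * pi(q_hi) + (1 - ph) * pi(q_lo) - pi(p)

vals = {p: info_value(p) for p in (0.1, 1 / 3, 0.9)}
print(vals)   # zero at extreme beliefs, largest at the kink p_bar = 1/3
```

Since beliefs are a martingale and π is piecewise linear, only signals that straddle the kink p̄ = 1/3 create value, which is the "weighted area above the tangent" in Figure 1.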
Figure 1: Static Value of Information. [The figure plots the static payoff π(p) against the belief p ∈ [0, 1], together with a supporting tangent line through p, and shades the region between them.]

The weighted area between a value function and any supporting tangent line, appropriately weighted, is the value of information. With a continuous signal support, the shaded region gives this value, weighted by the density of posterior beliefs; with discrete signals, we instead have a weighting of vertical line segments in this area (eg. the thick dashed lines). In either case, for a purely static value function π, this visual information value is maximized when the DM is indifferent between the two actions, at p̂; for the specific case of the static payoff frontier π, the information value is thus maximal in the middle, where the DM likes to spread his posterior beliefs. Intuitively, he values new information most when he is most uncertain about his action choice. Alternatively, the impact of new information on the variance of posterior beliefs is increasing in p(1 − p), and peaks at p = 1/2, where the DM is most uncertain as to which state is true. Either static logic suggests that information value and demand (extrapolating to margins) are quasiconcave 'hill-shaped' functions of p.

The above logic fails for our impatient DM in a dynamic setting. For the DM has an incentive to minimize the present discounted cost of information, and therefore wishes to delay his costly high-intensity experimentation until just prior to stopping. As this occurs only at extreme beliefs, information demand ought to be greatest for extreme beliefs. Of course, this intuition cannot establish the shape of the experimentation schedule, let alone its intertemporal trend. For that, we must consider the recursively-formulated problem.
4.2 The Recursive Formulation and Solution

A. The Belief Filter and Bayes Problem. Intuitively (and formally, as we later show), the observation process (x_t, n_t) induces a diffusion belief process (p_t) — Bayes rule in continuous time. Given a prior p₀, beliefs (p_t) evolve according to a filter: If ζ ≡ 2μ/σ denotes the signal-to-noise ratio factor of (x_t), then Theorem 9.1 of Liptser and Shiryayev (1977) (LS77) asserts:

    p_t = p₀ + ∫₀ᵗ p_s(1 − p_s) ζ √n_s dW̄_s    (2)

where the increment dW̄_s = (√n_s/σ)[dx_s − (p_s μ + (1 − p_s)(−μ))ds] defines a Wiener process from the DM's unconditional perspective,⁸ as he does not know the true drift, and so cannot observe the true noise process (W_t) that drives the signal process in (1). The precise measure-theoretic statement, in terms of signal filtrations, appears in Appendix B.1. As a driftless diffusion (i.e. Ito stochastic integral), beliefs (p_t) are an unconditional martingale, with least variance near the extremes p = 0, 1.

Alternatively, LS77's Theorem 9.1 implies that, conditional on the state θ = L, H:

    p_t = p₀ + ∫₀ᵗ p_s(1 − p_s) ζ √n_s [dW_s + (√n_s/σ)(μ^θ − [p_s μ + (1 − p_s)(−μ)])ds]    (3)

solved by the conditional belief processes (p_t^H) or (p_t^L).

⁸Namely: As of time s, given observed history (x_{t′}, 0 ≤ t′ ≤ s) and (n_{t′}, 0 ≤ t′ ≤ s), or more simply, just (x_t, ∫₀ᵗ n_s ds), the increment W̄_t − W̄_s is Gaussian N[0, t − s] and independent of W̄_s, for all t > s.

Remark. Substituting either Ito process for the ex post observed history into the filter, we find that dW̄_t = 2(√n_t/σ)μ(1 − p_t)dt + dW_t in state H, and dW̄_t = −2(√n_t/σ)μ p_t dt + dW_t in state L. Not surprisingly, beliefs have a positive drift in state H, and a negative drift in state L. Intuitively, the DM updates beliefs upward (dp_t > 0) iff the observation process rises faster than he expects, i.e. dx_t > [p_t μ + (1 − p_t)(−μ)]dt.

B. The Value Function. The supremum value of the OCS problem is

    V(p₀) = sup_{T,(n_t)} E[∫₀ᵀ −c(n_t)e^{−rt} dt + e^{−rT}π(p_T) | p₀]    (4)

as he optimizes at each moment. As in optimal learning, this value function is convex. For intuitively, a signal spreading the belief p₀ to p₁ with chance γ₁ and p₂ with chance γ₂ (γ₁ + γ₂ = 1, γ₁p₁ + γ₂p₂ = p₀) cannot possibly hurt the DM: he then gets γ₁V(p₁) + γ₂V(p₂) ≥ V(p₀). The next lemma, proved in Appendix B.1, mandates threshold stopping rules.

Lemma 1 (Convexity) Consider any cost function c(n) ≥ 0. Then the supremum value V is convex in p. Also, V(p) = π(p) for p ≤ p̲ and p ≥ p̄, for some cut-offs 0 ≤ p̲ ≤ p̄ ≤ 1. If the null action is ever exercised, then V(p) = π(p) = 0 in [p̲₀, p̄₀], where p̲ < p̲₀ ≤ p̄₀ < p̄.
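A vectorized simulation of the filter (2) under the true state H checks the Remark's drift claim (all parameters, including the constant intensity path n_t = n, are illustrative assumptions):

```python
import numpy as np

# Euler scheme for the belief filter (2) when the true state is H:
# dp = p(1-p) * zeta * sqrt(n) * dWbar, dWbar = (sqrt(n)/sigma)*(dx - (2p-1)*mu*dt).
mu, sigma, n, dt, steps, paths = 1.0, 1.0, 4.0, 0.001, 2000, 300
zeta = 2 * mu / sigma
rng = np.random.default_rng(42)

p = np.full(paths, 0.5)                                          # common prior
for _ in range(steps):
    dW = rng.normal(0.0, np.sqrt(dt), paths)
    dx = mu * dt + (sigma / np.sqrt(n)) * dW                     # observation SDE (1), state H
    dWbar = (np.sqrt(n) / sigma) * (dx - (2 * p - 1) * mu * dt)  # innovation increment
    p = np.clip(p + p * (1 - p) * zeta * np.sqrt(n) * dWbar, 0.0, 1.0)

print(p.mean())   # average posterior drifts above the prior 0.5 under state H
```

Beliefs remain in [0, 1] and, averaged across paths, rise toward 1 when the drift is truly μ^H, exactly the positive conditional drift derived in the Remark.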
C. Optimality Conditions. The DM selects action A for p ≤ p̲, action B for p ≥ p̄, and absent the null action, he experiments at level n(p) for p ∈ (p̲, p̄). If the null action is ever chosen, it may be chosen in a subinterval [p̲₀, p̄₀] ⊂ (p̲, p̄), where p̲ < p̲₀ ≤ p̄₀ < p̄. We then partition the OCS problem into an Optimal Control (OC) exercise for the schedule n(p) (Appendix B.1.c proves this Markovian form) and an Optimal Stopping (OS) problem for the boundaries p̲, p̄, and perhaps p̲₀, p̄₀. The experimentation domain is then ℰ = (p̲, p̄), or ℰ = (p̲, p̲₀) ∪ (p̄₀, p̄) if the null action is viable. We now develop recursive equations for what we call the value v; we then prove in Proposition 1 that v exists, and coincides with the supremum value, or v = V. Hence, we solve for the optimal dynamic policy using ordinary differential equation (ODE) methods.

Since beliefs (p_t) are a martingale obeying (2), for any given experimentation region ℰ, the Hamilton-Jacobi-Bellman (HJB) equations associated to the OC problem are

    rv(p) = sup_{n≥0} {−c(n) + (1/2)n ζ² p²(1 − p)² v″(p)}  ⇒  rv(p) = sup_{n≥0} {−c(n) + nΣ(p)v″(p)}    (5)

where Σ(p) ≡ ζ²p²(1 − p)²/2 measures the "belief elasticity", plus the value matching conditions

    v(p̲) = π_A(p̲),  v(p̄) = π_B(p̄),  and possibly v(p̲₀) = v(p̄₀) = 0    (6)

For any given control policy n(p), the Stefan problem (ST) associated to the OS problem is rv(p) = −c(n(p)) + n(p)Σ(p)v″(p) on ℰ, plus (6), and the free boundary smooth pasting conditions, that the value v be tangent to the static payoff function π at p̲ and p̄ (and possibly at p̲₀, p̄₀):

    v′(p̲) = π_A^H − π_A^L,  v′(p̄) = π_B^H − π_B^L,  and possibly v′(p̲₀) = v′(p̄₀) = 0    (7)

While the functional problems HJB and ST cannot be solved in closed form, we can create a unified equivalent (second order nonautonomous) ODE problem with free boundaries, and fully characterize its solution {v(p), p̲, p̄} without the closed form.
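The free boundary system can be explored numerically. Below is a rough shooting-method sketch, not the paper's algorithm, under assumed symmetric payoffs π_A(p) = 1 − 2p and π_B(p) = 2p − 1 (so π(p) = |2p − 1| and p̲ = 1 − p̄), quadratic cost c(n) = n², and illustrative r and ζ:

```python
import math

# Shooting sketch for the free-boundary ODE v'' = c'(f(r v))/Sigma(p) of (5)-(7),
# with ASSUMED symmetric payoffs pi = |2p - 1| and quadratic cost c(n) = n^2,
# so that f(w) = sqrt(w) and c'(f(w)) = 2*sqrt(w).  r, zeta are illustrative.
r, zeta, dp = 0.1, 2.0, 1e-4

def Sigma(p):
    return zeta ** 2 * p ** 2 * (1 - p) ** 2 / 2.0

def shoot(v0):
    """Integrate from p = 1/2 with v(1/2) = v0, v'(1/2) = 0 (symmetry) until
    v' reaches 2, the slope of pi_B; return (p*, v(p*) - pi_B(p*)) there."""
    p, v, vp = 0.5, v0, 0.0
    while vp < 2.0 and p < 1.0 - dp:
        vpp = 2.0 * math.sqrt(max(r * v, 0.0)) / Sigma(p)
        v += vp * dp
        vp += vpp * dp
        p += dp
    return p, v - (2.0 * p - 1.0)

# Bisect on v(1/2): too high a start overshoots pi_B at tangency, too low undershoots.
lo, hi = 1e-6, 1.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    _, g = shoot(mid)
    lo, hi = (lo, mid) if g > 0 else (mid, hi)

p_bar, gap = shoot(0.5 * (lo + hi))
print(p_bar, gap)   # stopping boundary p_bar, and the (near-zero) tangency gap
```

The bisection enforces value matching at the point where smooth pasting (slope 2) holds, which is exactly the tangency condition (6)-(7) for this symmetric toy case.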
Proposition 1 (Value Existence / Uniqueness / Verification)
(a) There exists a unique solution {p̲, p̄, v} or {p̲, p̲₀, p̄₀, p̄, v} of HJB+ST, with value v ∈ C² strictly convex (v″ > 0) in ℰ, and interior thresholds 0 < p̲ < p̄ < 1.
(b) The solution v coincides with the supremum value of OCS: v = V.
(c) The value v is jointly increasing/decreasing/U-shaped in p with the static payoff π.

Proof Sketch: The proof is largely appendicized. In §A.1, HJB+ST is shown equivalent to the two-point boundary value problem SV: v″ = c′(f(rv))/Σ, cum (6)-(7). Theorem 1 in §A.2 proves that SV is uniquely soluble (part (a) here), while this solution is the value of OCS (part (b) here) by Theorem 3 in §B.3. For (c), v shares the shape of π by convexity, value matching, and smooth pasting.

Here we provide some intuitive insights. For a fixed p ∈ ℰ, the FOC of (5) is c′(n) = Σ(p)v″(p); the SOC is met because −c(n) + nΣ(p)v″(p) is concave in n, as c is convex. At the maximized value, rv(p) = −c(n(p)) + n(p)Σ(p)v″(p) = n(p)c′(n(p)) − c(n(p)) ≡ g(n(p)). Since c is strictly convex, the surplus g(n) = nc′(n) − c(n) is strictly increasing; it is clearly continuous, as c is, with g(0) = −c(0) ≤ 0, and it is unbounded above (by an appendicized claim). Given rv(p) = g(n(p)), we have n(p) = f(rv(p)) for the strictly increasing inverse f ≡ g⁻¹; since rv(p) is everywhere finite, indeed rv(p) ≤ r max(π(0), π(1)), the level n(p) uniquely exists iff rv(p) > −c(0).

Remark. Strict cost convexity rules out one undesirable and not implausible outcome: suddenly exploding the experimentation level over a vanishing time interval [0, Δ], Δ → 0. With a linear (i.e. not strictly convex) cost function, such a policy that quickly achieves arbitrarily perfect information is preferred with discounting, absent any cost premium.
Proposition 2 (Policy Existence / Uniqueness) Assume that c(·) is differentiable.
(a) If payoffs π(p) > 0 for all p ∈ [0, 1], then the policy {f(rv(·)), p̲, p̄} from HJB+ST exists, and is the unique optimal policy for OCS.
(b) The level n(p) = f(rv(p)) is continuous in ℰ; n′(·) exists if c″(·) does; n(·) is C¹ if c′ is.
(c) If π(p) = 0 at some p ∈ [0, 1], then the policy in (a) is uniquely optimal for OCS if either c(0) > 0, or there exists η > 0 with lim_{w↓0} w^{1−η} ξ′(w)/ξ(w) < ∞, where ξ(w) ≡ c′(f(w)) is the marginal cost of the optimal intensity level n = f(w).
(d) If c(0) = c′(0) = 0 and the proviso in (c) fails, then the supremum V(p) of OCS is somewhere not attained; therefore, no optimal policy exists.

Proof Sketch: Part (a) is proven in Theorems 4-5 in §B.3, and parts (c) and (d) in Theorem 6 in §B.4. Part (b) is established in the appendicized Claims 2-4: since g′(n) = nc″(n) > 0 when c″ exists, the inverse f = g⁻¹ is differentiable in the return w = rv > 0, and so is n′ too, as claimed.

Remark. We note after Theorem 6 of §B.4 that any geometric convex cost function c(n) = n^k (k > 1) violates the final proviso in Proposition 2(c) — which is met by exponential functions like c(n) = e^{αn} − 1 (some α > 0) — so that the supremum V(p) is somewhere not attained, and no optimal policy exists. This pathological case, an undesirable by-product of the continuous time approximation, arises since the DM may never choose A (or the null action) in finite time given the negligible cost of running very small experiments.
4.3 The Optimal Experimentation Level: The VM as an Information Firm

Even though the FOC c'(n) = Σ(p)v''(p) and the policy equation n(p) = f(rv(p)) are jointly insoluble in closed form, the monotonicity of f allows us to conclude that:

Lemma 2 (Monotonicity) Given (π̂), the optimal intensity level n(p) is strictly increasing in the value v(p) for p ∈ ℰ.

The experimentation level is greater with a higher value, and therefore the level n(p) = f(rv(p)) weakly exceeds f(0), with f(0) = 0 iff c(0) = 0. For instance, quadratic costs c(n) = n² yield surplus g(n) = n², and thus f(n) = √n; the optimal level n(p) = √(rv(p)) is then an increasing concave function of the return rv(p).

Here is a concrete economic intuition for the monotonicity of n in v. There are two decisions at each moment in time: experiment or stop (OS), and if the VM experiments, at what level (OC). Focus first on the level choice. Optimality demands that the marginal cost c'(n) of precision n equal its marginal benefit. Since the belief variance is linear in the experimentation level n, the marginal benefit of experimentation MB = Σ(p)v''(p) is constant, and the total benefit is then linear: nMB. So at an optimum, g(n) = nMB − c(n) = nc'(n) − c(n) equals total (flow) benefits less total (flow) costs of experimentation, or the (flow) producer surplus of information.

[Figure 2: Value Function and Experimentation Demand. Overusing the vertical axis, we depict both the static payoff function π (thick dashed line) and the dynamic value function v (solid line), strictly convex in the experimentation domain ℰ = (p, p̄), and the intensity level n (thin dashed line). The R&D model is illustrated on the left, and a more general decision model (no null action) on the right. The option value of experimentation, the vertical distance between the static and dynamic values v − π, is maximized at the kink p̂. The demand level n(p) is increasing in v. Panel regions are labeled 'abandon', 'experiment', and 'build' over the belief p.]
Imagine the VM as a competitive firm, facing an increasing marginal cost curve, "selling" himself information at the constant "price" Σ(p)v''(p). This surplus g(n) rises in the optimal quantity n if c is strictly convex. Next consider the optimal stopping decision: the cost of delay must equal the surplus from optimally experimenting, or rv(p) = g(n(p)). So as the value v, and thus the delay cost rv, rises, the VM must choose a higher level n to generate the requisite higher surplus. Simply put, the VM only closes down his informational firm (and acts) when he cannot generate sufficient net profits (producer surplus) to justify the time cost of his capital rental (his deferred action).
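The surplus accounting above can be checked mechanically. A minimal sketch, assuming quadratic costs c(n) = n² as an illustrative special case (the values of r and v(p) below are hypothetical):

```python
# Producer surplus g(n) = n c'(n) - c(n) and its inverse f, for the
# illustrative quadratic cost c(n) = n^2 (a special case, not the
# paper's general cost function).

def c(n):
    return n ** 2

def c_prime(n):
    return 2 * n

def g(n):
    # flow producer surplus of information
    return n * c_prime(n) - c(n)   # = n^2 for quadratic costs

def f(w):
    # inverse surplus: optimal intensity at return w = r v(p)
    return w ** 0.5

# Optimal stopping logic: delay cost r v(p) equals the surplus g(n(p)).
r, v_p = 0.05, 4.0                       # hypothetical rate and value
n_opt = f(r * v_p)
assert abs(g(n_opt) - r * v_p) < 1e-12   # surplus equals the delay cost
```

A higher value v(p) raises the required surplus, and hence the optimal intensity n(p) = √(rv(p)), matching Lemma 2.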
Proposition 3 (The Optimal Level of Experimentation) Assume the static payoff frontier π(p) is increasing (resp. decreasing, U-shaped) in p. Given (π̂), the optimal level of experimentation n(p) is increasing (resp. decreasing, U-shaped) in p in the domain ℰ.

For immediate context, consider our R&D spin. Here, π(p) strictly increases in p, and thus the research level rises as we approach confidence in the 'build' decision (see Figure 2, left panel). This provides optimizing foundations for a commonly-observed phenomenon: new life/money breathed into a potential uncertain investment headed for likely development (eg. a new shuttle rocket engine design) by a key discovery or finding; research expenditures stochastically grow over time, only later on to (i) shrink because of discouragement (recently, spinal cord research or cold fusion), or (ii) continue growing, as the project is developed. A similar pattern also emerges in how clinical tests of new drugs proceed: first, small tests, and sometimes later, larger tests, and then quite expensive field trials.^
^A rare empirical study of project-level R&D expenditures in the pharmaceutical industry (DiMasi, Grabowski, and Vernon (1995)) reveals a pattern strikingly similar to our theoretical prediction. Time-intensive pre-FDA clinical testing usually occurs over three sequential conditional phases, of growing size.
4.4 The Option Value of Experimentation

Write v(p) = π(p) + I(p), where I(p) is the expected present value of information (or experimentation). For simplicity, assume π(p) > 0 always, obviating a null action. Since learning costs time and money, the value v necessarily inherits the general shape of the static payoffs π. We next show that I(p) peaks at the kink p̂ in π, where the myopically-best action switches: experimentation is most valuable when the VM is most uncertain of which action to take, as no action is dominant. Think of I(p) as the option value of waiting and choosing an action after optimally experimenting. This option to change one's plan is worth the least when one is most sure of an action to take. So the suggestion in section 4.1 of a hill-shaped value of information is an apt description of this option value; however, any implication that it alone determines the experimentation level was false, for the flow return to experimentation also includes the terminal payoff π(p).

Proposition 4 Assume the null action ('∅') is never exercised. Then the option value of experimentation I(p) is single-peaked in p, maximized at the p̂ where π_A = π_B. If the null action is exercised, then I(p) has two peaks: one where π_A and π_∅ cross, and one where π_B and π_∅ cross.

Proof of first case: By value matching and smooth pasting (6)-(7), v − π_B vanishes tangentially at p, and v − π_A at p̄. So I = v − π_B is rising on [p, p̂) until π_A and π_B cross, and I = v − π_A is later falling on (p̂, p̄]. Hence I peaks at p̂, where the two static payoffs cross, and not elsewhere; the information value owes to v'' > 0. □

Remark. Note that since v''(p) = ξ(rv(p))/Σ(p), and ξ' exists, some have suggested that the sign of v''' should be relevant for the experimentation derivative. It is instructive to see that, by Proposition 2, v''' exists; differentiating v''(p) = ξ(rv(p))/Σ(p), it equals

    v'''(p) = r ξ'(rv(p)) v'(p)/Σ(p) − ξ(rv(p)) Σ'(p)/Σ(p)²

Consider the R&D model, where n' > 0 always. Clearly, v''' < 0 just above p, since v'(p) = 0 by smooth pasting, while v''' > 0 on (1/2, p̄).
4.5 Expected Remaining Time and Experimentation Costs

We now analyze the behavior of the only two costs of experimentation in our model: time and money. Assume a stopping time T < ∞ a.s., which is true under the assumptions of Proposition 2-a or c, and thresholds p < 1/2 < p̄. Then by §15.3 of Karlin and Taylor (1981) (KT81), since (p_t) has zero drift and variance 2Σ(p)n(p), the expected remaining time t(p) = E[T | p₀ = p] until stopping obeys the boundary conditions t(p) = t(p̄) = 0, as well as the ODE −1 = Σ(p)n(p)t''(p). Hence, t''(p) < 0, and consequently, t(p) is hill-shaped.
We next write v(p) = R(p) − K(p), or the expected present value of final rewards R(p) = E[e^{−rT} π(p_T) | p₀ = p] less that of experimentation costs K(p) = E[∫₀^T e^{−rt} c(n(p_t)) dt | p₀ = p].
Since K(p) eventually falls near the extremes given the vanishing expected time horizon, it is clearly nonmonotonic, with K(p) = K(p̄) = 0. By KT81, it also satisfies the ODE Σ(p)n(p)K''(p) − rK(p) + c(n(p)) = 0. If K(p) < c(n(p))/r always, then K is everywhere concave, and K is single-peaked. But if K(p) ever exceeds c(n(p))/r (as seems quite plausible for middling p, when n(p) and thus c(n(p)) is U-shaped), then K must become convex there, and any crossing is an inflection point; since K ∈ C¹, K strangely must then be 'molar-tooth' shaped: two local maxima, bracketing two inflection points at p₁ < p₂, with K(p) > c(n(p))/r on (p₁, p₂) ⊂ (p, p̄), and with n(p) and thus c(n(p)) smaller, and K thus concave, near p and p̄. Which scenario arises is an open question. Quite plausibly, K is molar-tooth shaped only for low enough r (so that experimentation lasts a long time), and disparate payoffs (extremely U-shaped costs).
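The hill shape of t(p) is easy to see numerically. A minimal sketch, assuming a constant diffusion coefficient 2Σn (a simplification; in the model it varies with p) and hypothetical thresholds:

```python
# Expected remaining time solves Sigma(p) n(p) t''(p) = -1 with
# t(p_low) = t(p_high) = 0.  With a constant coefficient s2 = 2*Sigma*n
# (a simplifying assumption), this is the classical exit-time formula.

a, b = 0.2, 0.8           # hypothetical stopping thresholds p_low, p_high
s2 = 0.5                  # hypothetical constant 2*Sigma*n

def t(p):
    # solves (s2/2) * t''(p) = -1 with t(a) = t(b) = 0
    return (p - a) * (b - p) / s2

grid = [a + (b - a) * i / 100 for i in range(101)]
assert t(a) == 0.0 and t(b) == 0.0            # boundary conditions
assert abs(max(grid, key=t) - 0.5) < 1e-6     # hill-shaped: peak midway
```

Under the model's varying Σ(p)n(p), the peak need not sit exactly midway, but the concavity (t'' < 0) and the hill shape persist.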
4.6 Experimentation Drift

Since the belief process (p_t) is a martingale diffusion, and v ∈ C² is convex, by Itô's Lemma the values (v(p_t)) are a submartingale Itô process in ℰ; indeed a strict submartingale (drifts up) in ℰ, since v'' > 0. Likewise, (t(p_t)) is everywhere a strict supermartingale Itô process (drifts down), as t'' < 0. Finally, (K(p_t)) is a strict submartingale Itô process when K is convex, and otherwise it must switch to a strict supermartingale inside (p₁, p₂).

By the same reasoning, the experimentation level process (n(p_t)) is a submartingale when n(p) = f(rv(p)) is convex. By Theorem 5.1 of Rockafellar (1970), the composition f(rv) of the convex function rv and a convex and increasing function f is convex; but f may well be concave in rv, as it is for quadratic costs. So the convexity of n(p) is then unclear. Indeed, differentiate the Bellman equation, that surplus equals the delay cost, g(n(p)) = rv(p) for p ∈ ℰ, to get g'(n)n'(p) = rv'(p). Differentiating once more,¹⁰ [g'(n)n'(p)]' = rv''(p) = rc'(n(p))/Σ(p) when c''' exists, and applying g'(n) = nc''(n) yields a simple nonautonomous second order differential equation in the level n alone:

    [nc''(n)] n'' + [nc''(n)]' (n')² = rc'(n)/Σ(p)    (8)

Since the right side of (8) is strictly positive, [nc''(n)]' ≤ 0 implies n'' > 0. And [nc''(n)]' = g''(n), so this is precisely weak concavity of the information producer surplus g in the experimentation level, as already asserted. We have established:

Proposition 5 Assume (π̂), and that c''' exists. If the producer surplus g(n) = nc'(n) − c(n) is weakly concave in n, then the intensity level n(p) is strictly convex in p on ℰ, and (n(p_t)) is a strict submartingale Itô process in ℰ.

¹⁰This more simply yields the nonautonomous second order differential equation in the value v alone: Σ(p)v''(p) = c'(f(rv(p))). Equation (8) does not necessarily hold even though v'' exists, since the composition g(n(p)) may not be twice differentiable; if not, we cannot write its first derivative as g'(n(p))n'(p).
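The chain from the Bellman identity to (8) can be written out as a short derivation, using only relations already displayed above:

```latex
g(n(p)) = r v(p)
  \;\Longrightarrow\; g'(n)\,n'(p) = r v'(p)
  \;\Longrightarrow\; [g'(n)\,n'(p)]' = r v''(p) = \frac{r\,c'(n(p))}{\Sigma(p)},
\qquad\text{and since } g'(n) = n c''(n),
\qquad [n c''(n)]\,n'' + [n c''(n)]'\,(n')^2 = \frac{r\,c'(n)}{\Sigma(p)}. \tag{8}
```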
[Figure 3: A Payoff Shift in the R&D Model. When the (unplotted) bad build payoff ℓ rises, so does the value function, and so do the experimentation intensity levels (inside the left-shifted interval (p, p̄)).]

If [nc''(n)]' vanishes in (8), then we still have n'' > 0, and so (n(p_t)) is still a submartingale. In particular, the knife-edge case c(n) = n for n ≤ 1 and c(n) = n + n log n for n > 1 yields nc''(n) = 1 constant, and so a linear surplus. Note that even a slightly convex surplus is compatible with n'' > 0 in light of (8), so the experimentation level (n(p_t)) may be locally a submartingale even if g(n) is locally convex. By contrast, any convex geometric cost function c(n) = n^k (k > 1) yields the surplus function g(n) = (k − 1)n^k, which is strictly convex: [nc''(n)]' = k(k − 1)²n^{k−2} > 0.

Remark. In light of (8), it is now sufficient for n'' > 0 that intensity levels (n_t) be small enough when lim_{n↓0} c''(n) = ∞; if c(n) = n^k, one can show by l'Hôpital's rule that this holds when k < 2. So for the R&D model, n'' > 0 in some neighborhood [p, p + ε) of the lower threshold; therefore, the experimentation level is at least initially expected to rise for barely profitable R&D projects.
5. SENSITIVITY ANALYSIS

5.1 Parametric Shifts

We now explore how the experimentation schedule moves with changes in the payoffs, information cost, or interest rate. Appendix C contains the proofs. Consider first a shift in the reward, which we brutally illustrate in Figure 3 for the R&D model. When ℓ rises, so too does the value v, and hence the intensity n; the tangent line ph + (1 − p)ℓ tilts upward and is less curved, so that p and p̄ both fall. This method of tangents is intuitive: since (rather loosely) the value function is higher, one is indifferent about adopting or quitting when slightly less optimistic (and p̄ falls too, as if h had fallen).

Since the sole reason to pay for information is uncertainty over the state of the world, a most natural thought experiment stems from raising the payoff risk. In other words, we assume that h rises and ℓ falls so as to maintain a constant expected payoff π_B(p) from that action at the current belief p. Intuitively, this ought to raise the value of the dynamic problem, since the VM should prefer a riskier final payoff distribution, and the payoff frontier in the current stopping set goes up.
For then, by employing the same level decision and stopping rule, the VM's current value is no worse; re-optimizing, he does better. Hence the return rv, and thus the experimentation level n(p), intuitively increases. The VM also benefits from more powerful signals (higher ς) and from less convex information costs: the value is then greater, so the VM is more eager to experiment. With a higher interest rate r, the value falls, and the VM is more eager to stop and act. To avoid a case-by-case analysis that adds little to the general story, the portmanteau result that follows (proved in Appendix C.1) ignores complications due to the null action; assume that costs are so high that the null action is never taken.

Proposition 6 Assume (π̂), (Λ̂), and that the null action is never taken.

(a) Payoff Levels: The value v(p) and experimentation level n(p) both rise for p ∈ [p, p̄] when any payoff π_a rises; thresholds p, p̄ rise if π_A rises, and fall if π_B rises.

(b) Payoff Risk: If the payoffs of an action a grow riskier (the expected payoff π_a(p) remains constant at belief p, but the payoff spread |π̄_a − π̲_a| increases), then the value v(p) rises, the experimentation level n(p) rises, and the thresholds "shift out" (p falls and p̄ rises).

(c) Cost Convexity: Assume that the cost function grows "more convex"; namely, c(n) is replaced by c̃(n), where c̃(0) = c(0), c̃(n) is convex, and the corresponding marginal costs at the optimum are higher: ξ̃(0+) ≥ ξ(0+). Then the value v(p) and intensity level n(p) uniformly fall in ℰ. With no fixed costs c(0) = c̃(0) = 0, this is true if c̃'(n) ≥ c'(n).

(d) Information Quality: As the signal-to-noise ratio factor ς rises, the value v(p) and the experimentation level n(p) shift up, while the thresholds shift out.

(e) Impatience: As the interest rate r rises, the value v(p) falls, and the thresholds shift in. Also, the optimal intensity level n(p) rises strictly near one or both thresholds. In the R&D model, n(p) declines for all p < p', and rises for all p > p', for some p' ∈ (p, p̄).
Surprisingly, the impatience result runs counter to the folk wisdom of Bayesian learning. More impatient decision makers typically 'experiment' less (eg. in the canonical settings of §2.B), with a more myopic lifestyle. The 'folk intuition' depends on never-ending experimentation, whereas experimentation has a purpose here: that one eventually stops and acts. Information accrual here is a means to an end; it is not a payoff-generating activity in itself. In our model, greater impatience raises the VM's delay cost, and somewhere induces him to accelerate his experimentation. Further, as Proposition 7 will assert, as r blows up, the VM experiments at an exploding rate, albeit over a vanishing belief interval. Our analysis in the paper so far remains valid if the final payoff π(p_T) is instead an annuity: an eternal flow payoff rather than a one-shot lump-sum, as we have
assumed. What happens in this case is that the final decision is formally very much like a bandit model: the safe uninformative arm provides a constant flow current payoff, and is therefore exercised when equal in worth to the current value. So with a higher interest rate r, the annuity is worth less, and the intensity level falls everywhere.¹¹ Indeed, the Bellman equation for the maximization E[∫₀^T −c(n_t)e^{−rt} dt + e^{−rT} π(p_T)/r] is still (5). But in terms of the return w = rv, it becomes w(p) = max_{n≥0} {−c(n) + nΣ(p)w''(p)/r}. Hence, n(p) = f(w(p)), and w solves w''(p) = rc'(f(w(p)))/Σ(p). The value matching and smooth pasting conditions are still w(p) = π(p), w(p̄) = π(p̄), w'(p) = π'(p), w'(p̄) = π'(p̄). A higher interest rate r is then formally equivalent to a lower signal-to-noise ratio factor ς: this diminishes the return w(p), and thus the intensity level n(p) = f(w(p)), by the logic of Proposition 6-d.
5.2 The Return to Wald's World

We finally turn full circle, recalling our two paired twists on Wald's sequential paradigm: impatience and strict cost convexity. As noted, these assumptions cut in opposite ways. Absent payoff discounting, the VM sees no hurry to stack experiments, and reverts to a purely sequential mode (vanishing intensity levels), barring any fixed flow experimentation cost c(0) > 0. Without strict cost convexity, the VM faces no parallel experimentation penalty, and he becomes a classical statistician, running a massive experiment at time-0.

Proposition 7 Assume the final payoff π(p) > 0 for all p.

(a) Vanishing Impatience: For a fixed cost c(n) obeying (Λ̂), the intensity level n(p) explodes (where positive) as r → ∞, and decreases as r ↓ 0. Thus, n(p) ↓ 0 as r ↓ 0 iff c(0) = 0.

(b) Vanishing Convexity: Fix r > 0. Consider a cost function sequence c₁(n), c₂(n), ... obeying (Λ̂): λ_k(n₂ − n₁) ≤ c'_k(n₂) − c'_k(n₁) ≤ Λ_k(n₂ − n₁) for all n₂ > n₁ > 0, where λ_k, Λ_k are the lower and upper Lipschitz constants of the marginal cost c'_k. Then n_k(p) uniformly explodes (where positive) if lim_{k→∞} Λ_k = 0, and uniformly vanishes if lim_{k→∞} λ_k = ∞.

Proof: To see the impatience limits, consider that since 0 < v(p) ≤ max{π̄_A, π̄_B, π̄_∅} < ∞ for all p ∈ ℰ, the return rv(p) vanishes as r ↓ 0, while rv(p) and n(p) explode as r → ∞. Because ξ(0) ≥ 0, intensity levels must satisfy n(p) ≥ f(0) everywhere, and so n(p) tends down to f(0) as r ↓ 0, with f(0) = 0 iff c(0) = 0. The proof of the convexity limit is appendicized, but for a powerful example, consider c(n) = n^k for k > 1: the producer surplus is g(n) = nc'(n) − c(n) = (k − 1)n^k. Its inverse f(n) = [n/(k − 1)]^{1/k} blows up as k ↓ 1, as we converge upon Wald's case of linear costs. □

¹¹We are very grateful to Sven Rady for discovering this key difference.
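The blow-up of the inverse surplus as costs approach linearity can be seen in a tiny sketch (the fixed return value w = 0.1 is arbitrary):

```python
# For geometric costs c(n) = n^k (k > 1), the producer surplus is
# g(n) = (k - 1) n^k, with inverse f(w) = (w / (k - 1))**(1/k).

def f(w, k):
    return (w / (k - 1.0)) ** (1.0 / k)

w = 0.1                                 # arbitrary fixed return
vals = [f(w, k) for k in (2.0, 1.5, 1.1, 1.01)]
# the intensity at a fixed return explodes as k tends down to 1
assert vals == sorted(vals) and vals[-1] > 9
```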
6. CONCLUSION

Summary. This paper has explored the basic two action, two state decision problem under uncertainty. We have modified Wald's model by assuming an impatient VM, and particularized it to the case of an increasing and strictly convex cost function of within-period experimentation. These two plausible economic assumptions have jointly afforded a reasonably complete Bayesian characterization of information demand: Research levels grow with the optimism over the project. For not too convex cost functions, they drift upwards over time. This yields some simple falsifiable implications in information purchases or R&D. Our conclusions have not come in the standard discrete time setting, but in a new continuous time control of variance for a diffusion. We consider this framework a key contribution of the paper, as it is a tractable new off-the-shelf decision model of information demand. We believe that it can be applied in many settings, such as strategic patent races, general equilibrium R&D models, or principal-agent experimentation models.

Robustness. While we have studied a model with just two actions and two states, our main intensity-value monotonicity result n = f(rv), and the associated information producer surplus intuition, extends to finitely many actions and states. It also obtains in a normal learning model with state space ℝ; however, the resulting problem is not stationary in the posterior mean θ̂ alone, as total research outlays so far increase one's belief precision. Such a VM eventually becomes convinced of anything he learns, and quits.

Extension. It is not surprising that first order conditions nailed down the optimal experimentation level in our context. Apart from normal learning models, such regularity properties on the demand for information are very rare. For instance, Radner and Stiglitz (1984) have pointed out that with a one-shot information purchase, a 'non-concavity' in the value of information may emerge: its marginal value is initially zero. In a work in progress, we argue that when information can be repeatedly purchased, and the stopping decision is endogenous, a well-behaved dynamic demand theory emerges. The resulting value of information is quite unlike other goods, being only economically well-behaved if the decision maker can exercise an eternal option of repurchase. This paper might be seen as an application of this general principle to a somewhat important Bayesian decision problem.

Finally, we also believe that this same framework will afford a dynamic extension of Blackwell's (1953) theorem on the value of information, with a much finer ordering. For large purchases of very weak and cheap signals, all the DM cares about is the signal-to-noise ratio, as the value is concave in the quantity (and linear in our continuous time limit, underscored on page 10). A careful formulation and development of this idea awaits our future work.
APPENDICES

A. PRIMARY MATHEMATICAL RESULTS

A.1 Properties Related to the Cost Function

Claim 1 The surplus g(n) = nc'(n) − c(n) is increasing on (0,∞), with g(0) ≤ 0 (= 0 iff c(0) = 0). Its inverse f is continuous and increasing, with f(0) ≥ 0.

Proof: By convexity of c, c(n + ε) − c(n) ≤ εc'(n + ε), and so

    g(n + ε) − g(n) = −[c(n + ε) − c(n)] + (n + ε)c'(n + ε) − nc'(n)
      ≥ −εc'(n + ε) + (n + ε)c'(n + ε) − nc'(n)
      = nc'(n + ε) − nc'(n) ≥ nλε > 0.

Since g is increasing, its inverse f exists, and is continuous and increasing. □

Claim 2 Given (Λ̂), the marginal cost of experimentation ξ(w) = c'(f(w)) is continuous and strictly increasing (by Claim 1), strictly concave, unbounded, and Lipschitz away from w = 0 (i.e. locally, not globally so) in the return w = rv, for w > 0. Clearly, ξ(0) ≥ 0.

Proof: Let c*(n*) = sup_n [n*n − c(n)] be the (Legendre-Fenchel) conjugate dual of the cost function. By Theorems 12.2 and 26.3 in Rockafellar (1970) (R70), c* is convex, and strictly convex as c is smooth; also c** = c, and c* is continuous. His Theorem 23.5-d then yields c*(n*) + c(n) = n*n for any subgradient n* of c at n (the Fenchel-Young equality). Taking n* = c'(n), we have c*(c'(n)) = nc'(n) − c(n), or c*(c'(n)) = g(n). Finally, at n = f(w), we get c*(ξ(w)) = c*(c'(f(w))) = g(f(w)) = w. The inverse relationship ξ = (c*)⁻¹ obtains, and since c* is increasing and convex, ξ is increasing and concave; ξ is locally Lipschitz on the relative interior of its domain, by Theorem 10.4 of R70. □

Observe that even though the marginal cost function c'(n) need not be concave in n, the function ξ(w) is surprisingly concave in w. When c''(> 0) exists, this is easily seen: For then f'(w) = [g'(f(w))]⁻¹ = [f(w)c''(f(w))]⁻¹ > 0, and consequently, ξ'(w) = c''(f(w))f'(w) = 1/f(w) > 0 and ξ''(w) = −f'(w)/f(w)² < 0.
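The Fenchel-Young step above can be checked numerically in a small sketch; the cubic cost c(n) = n³ and the grid bounds are illustrative choices:

```python
# Check of the Fenchel-Young step in Claim 2: with the conjugate
# c*(m) = sup_n { m n - c(n) }, one has c*(c'(n)) = n c'(n) - c(n) = g(n).

def c(n):
    return n ** 3

def c_star(m, grid):
    # crude numerical conjugate over a grid
    return max(m * n - c(n) for n in grid)

grid = [i / 1000.0 for i in range(5001)]   # n in [0, 5]
n0 = 1.2
m0 = 3 * n0 ** 2                           # c'(n0)
g0 = n0 * m0 - c(n0)                       # surplus g(n0) = n c'(n) - c(n)
assert abs(c_star(m0, grid) - g0) < 1e-2   # conjugate attains the surplus
```

Claim 4's derivative formula ξ'(w) = 1/f(w) rests on this same inverse relationship ξ = (c*)⁻¹.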
Claim 3 Given (Λ̂), both the inverse surplus function f and √f are Lipschitz on (0,∞).

Proof: For w₁ > w₀ > 0, express w = g(f(w)), where g(y) = yc'(y) − c(y). Applying the inequality of Claim 1 at n = f(w₀) and ε = f(w₁) − f(w₀),

    w₁ − w₀ = g(f(w₁)) − g(f(w₀)) ≥ λ f(w₀) [f(w₁) − f(w₀)],

where λ is the Lipschitz-down constant for c'. So f is Lipschitz on [w₀, ∞) with Lipschitz constant 1/[λf(w₀)]. For √f, the LHS is instead divided by √f(w₁) + √f(w₀), via [f(w₁) − f(w₀)] = [√f(w₁) − √f(w₀)][√f(w₁) + √f(w₀)]; this yields the Lipschitz constant 1/[2λ(f(w₀))^{3/2}]. □

Claim 4 Given (Λ̂), the function ξ(w) is differentiable for all w > 0, with ξ'(w) = 1/f(w).

Proof: Since ξ is concave on (0,∞), the right derivative ξ'(w+) exists by Theorem 24.1 of R70. Let D(y) = [ξ(y) − ξ(w)]/(y − w). To see why ξ'(w+) = 1/f(w), observe that using g(n) = nc'(n) − c(n) and g(f(w)) = w (as well as algebraic simplification), we have the identity

    D(y) = 1/f(y) + [c(f(y)) − c(f(w)) − (f(y) − f(w))c'(f(w))] / [f(y)(y − w)]

where the limit exists as c' is continuous, and the bracketed term vanishes as y − w ↓ 0, as f is Lipschitz (Claim 3). So ξ'(w+) = lim_{y↓w} D(y) = 1/f(w). Similarly, ξ'(w−) = 1/f(w). □
Claim 5 Given (Λ̂), ψ(p, v) = ξ(rv)/Σ(p) is continuous and Lipschitz on (0,1) × (0,∞), and partially differentiable in v, with ∂ψ/∂v = rξ'(rv)/Σ(p) = r/[f(rv)Σ(p)] ∈ (0,∞) for (p, v) ∈ (0,1) × (0,∞). Similarly, ∂ψ/∂p = 4ξ(rv)(2p − 1)/(ς²p³(1 − p)³) ∈ (−∞,∞) for (p, v) ∈ (0,1) × (0,∞).

Proof: By Claim 4, ξ' exists, so that ∂ψ/∂v = rξ'(rv)/Σ(p) = r/[f(rv)Σ(p)], which is bounded (f > 0 by Claim 1). Given ξ continuous and Lipschitz (Claims 2-3), f continuous, and c differentiable, ψ is continuous and Lipschitz on (0,1) × (0,∞). □
A.2 Two-Point Boundary Value Problems

a. An ODE Problem with Value Matching and Smooth Pasting. Let SU denote the first existence/uniqueness problem: For any strictly convex cost function c ∈ C¹ with c(0), c'(0) = 0 (no null action), there exist unique p, p̄ with 0 < p < p̄ < 1 and a C¹ function v with v''(p) = ψ(p, v(p)) = ξ(rv(p))/Σ(p) on (p, p̄), such that (6)-(7) hold. Observe that p < p̂ < p̄, and that v'' > 0 (given c' > 0) in the experimentation domain. With value matching and smooth pasting, this is not a standard, two-point boundary value ODE, amenable to known results. It is a free boundary problem, and needs an ad hoc proof.

Towards solving SU, we first consider a second order one point boundary value problem IV(q) of the Cauchy type, assuming the R&D payoffs: For any fixed q ∈ (p̂, 1), solve v''_q(p) = ψ(p, v_q(p)) given v_q(q) = qπ̄_B + (1 − q)π̲_B > 0 and v'_q(q) = π̄_B − π̲_B > 0. Clearly, any such v_q is C². We now start our attack on the SU problem.

Claim 6 (a) For all q ∈ (0,1), IV(q) has a unique solution v_q that can be continued on (p₀, q] for some p₀ ≥ 0, with v_q and v'_q(p) uniformly continuous in q. (b) For any q ∈ (p̂, 1), v_q is positive, strictly convex, and Lipschitz either (i) on (0, q], or (ii) on a maximal domain (p₀, q] for p₀ > 0, with v_q(p₀) = lim_{p↓p₀} v_q(p) = 0.
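A Cauchy problem of this type can be integrated numerically. A minimal Euler sketch, assuming quadratic costs (so ξ(w) = 2√w), Σ(p) = p²(1 − p)², and hypothetical initial data; this is an illustration of the continuation, not the paper's proof:

```python
# Euler integration of a Cauchy-type problem like IV(q):
#   v'' = xi(r v) / Sigma(p),
# with quadratic cost c(n) = n^2 (so xi(w) = c'(f(w)) = 2*sqrt(w)) and
# Sigma(p) = p^2 (1 - p)^2.  The cost, Sigma, step size, and initial
# data are all illustrative assumptions, not the paper's proof objects.

def xi(w):
    return 2.0 * max(w, 0.0) ** 0.5

def Sigma(p):
    return (p * (1.0 - p)) ** 2

r, h = 0.05, 1e-4
p, v, dv = 0.3, 1.0, 0.0        # hypothetical starting point and slope
vals = []
while p < 0.7:
    ddv = xi(r * v) / Sigma(p)  # the ODE of problem SU
    v += h * dv
    dv += h * ddv
    p += h
    vals.append(v)

# the continued solution is convex: its slope only rises, so v never falls
assert all(b >= a for a, b in zip(vals, vals[1:]))
```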
Proof of (a): All claims follow given ψ continuous and Lipschitz on (0,1) × (0,∞) (valid by Claim 5), after reducing the second order ODE to a two-dimensional system of first-order ODEs. See Theorems 1.1-2 and 2.1, and Observation 2 in §1.6 of Elsgolts (1970). Either v_q is Lipschitz on (0, q], or, by Theorem 1-4.1 of Brock and Malliaris (1989), the solution continues leftward as long as v_q > 0, and the domain of v_q can be compactified via v_q(p₀) = lim_{p↓p₀} v_q(p) = 0 at the supremum (and, by continuity, largest) point p₀ where positivity fails.

Proof of (b): Strict convexity follows from v''_q(p) = ψ(p, v_q(p)) > 0 wherever v_q > 0. For positivity, by construction v_q crosses the horizontal axis at most at some unique, strictly positive p₀; let p₀ be the supremum of such crossings. If not, then v_q(p₀ + ε) < 0 for some small ε > 0. But for p just below q, a Taylor expansion v_q(p) ≈ v_q(q) − v'_q(q)(q − p) = pπ̄_B + (1 − p)π̲_B > 0 forces v_q(p) > 0 near q, and since v_q is strictly convex with non-negative slope on (p₀, q], v_q rises and flattens as p ↓ p₀, a contradiction. □

Claim 7 (Limit Solution Behavior) Fix p > 0. Then lim_{q↑1} v'_q(p) = −∞; likewise, lim_{p↓0} v'_q(p) = −∞ for fixed q, wherever the solution v_q(p) uniquely exists.

Proof: Choose q close enough to 1, and small ε > 0, such that v_q(z) ≥ ε > 0 for all z ∈ [p, q] ⊂ (0,1). Because v''_q(z) = ψ(z, v_q(z)) ≥ ξ(rε)/Σ(z), and ∫ Σ(z)⁻¹ dz blows up either as q ↑ 1 or as p ↓ 0, the slope v'_q(p) = v'_q(q) − ∫_p^q ψ(z, v_q(z)) dz tends to −∞ in either limit. □

Claim 8 (Monotonicity Properties) (a) The solutions never cross: v_{q₁} < v_{q₂} on their common domain, for any threshold pair q₁ < q₂. (b) Each v_q is convex and U-shaped in its domain, with exactly one global, strictly positive minimum.

Proof: Suppose (a) fails for some q₂ > q₁, and let p' be the largest p where v_{q₁} and v_{q₂} cross, so Δv(p') = v_{q₂}(p') − v_{q₁}(p') = 0 with Δv > 0 on (p', q₁]. Then Δv'(p') ≥ 0. But v'_{q₂}(q₂) = π̄_B − π̲_B = v'_{q₁}(q₁), and strict convexity of v_{q₂} in (q₁, q₂) yields v'_{q₂}(q₁) < v'_{q₂}(q₂), so Δv'(q₁) < 0. By continuity of Δv', Δv'(p'') = 0 at some p'' ∈ (p', q₁), i.e. v'_{q₁}(p'') = v'_{q₂}(p''). Writing each slope as

    v'_{qᵢ}(qᵢ) − v'_{qᵢ}(p'') = ∫_{p''}^{qᵢ} ψ(z, v_{qᵢ}(z)) dz    (10)

and cancelling the equal left sides, the two integrals of (10) are equal. But this is impossible: since q₁ < q₂ and v_{q₁}(z) < v_{q₂}(z) on (p'', q₁) given Δv(z) > 0 there, while the domain of integration rises with q, the second integral strictly exceeds the first. Finally, part (b) follows from Claim 6-b: each v_q is strictly convex with v'_q(q) > 0 and, by Claim 7, a negative slope far enough left, so it is U-shaped with exactly one global, strictly positive minimum. □
Choose
Proof:
on
As q
(0,^].
over (0,^],
>p
q
p>0
enough to
close
domain
U-shaped by Claim
convex by Claim
7-6,
cross the horizontal axis only in the
only
if
b.
v{p')
=
Proof:
for
7r(p')
p'
<p<
and
v{p')
as long as
VsM
-
for every S2
^S2 iP')
p';
<
tt§
—
q
it
p'
tt^
Si,
~^si{p')
=
—
has a solution
7r{p')
for
each s G R,
<
(S2
and
S2
-
si)(p
all p,
in (0, g),
it
can
in
Claim
7-a.
A falling
WLOG.
D
is
— S1 =
v'g^
=
^{p,rv{p))
= 'if{p.,rvs{p)) s.t. Vs{p') = tt{p')>0,
of ^ in Claim 5, a unique solution
uniformly continuous in
= n{p') + s{p — p)
-\-
J-, f^,
s,
— t^s(p) <
enough
s.
(s2
— v'g^ {p')
exist.
all
p <
— •s)(p— p')
p'
= 7r(p)
G
[0,
and can be extended
^(y, Vs{y))dxdy. Next,
(11)
=
Inequality then holds by continuity for p near
.
provided Vs{p)
from
00),
— s,
Vg to this
This obtains because
(11),
>
0.
and
21
Vs2
<
Vsi-
— 00
Vs{p) =
In fact, \ims-^-ooVs{p)
>
provided Vs,Vs2
Therefore by continuity and monotonicity of
there exists s with Vs{p)
v'g{p')
- p') + /; J^,[-^{y,Vs,{y)) - ^{y,VsM)]dxdy <
{p')
s.t.
v.
where both solutions Vs^,Vs2
Hence, lims-^±ooVs{p) exists for
for large
>
Vq{q)- Since
and so ^(p, i^s2(p)) < *(P;^5i(p)) by monotonicity of ^; continuing leftward,
because U52(p)
1;^
assertion then fails
The two-point boundary value problem v"{p)
.
remains positive: Vs{p)
VsM =
>
of £U.
strictly positive
minimum
— which we ruled out
(6), (7)
IV{p)
with Value Matching Alone.
G R. By the properties
(slope) s
problem exists locally
left
all
Consider the Cauchy problem v"{p)
some
= —00 <
s.t.
as long as
could easily substitute for the horizontal axis, which was thus
Fix
exists
is
But
8.
D
above argument only made essential use of the rising neip) curve.
An ODE Problem
Claim 9
p<l
way required by £U. The
always stays strictly positive for
tta{p) line
down by Claim
with exactly one global
first
Finally, the
unique
(ir), a
obeying the boundary conditions
with ^^(0+)
also strictly
6-b,
^^^(-z))
of the solution.
that Vq{p) uniquely exists and
1
it is
Vg
Given
decreases, Vg{p) uniformly shifts
it is
^(z,
— tt^) — J^ ^(2, rvp{z))dz. For the domain of
the integral, given ^ > 0. And we've just shown
and thus so does
has a unique solution Vp for a unique
<
'^{z,Vp^{z))
(tt^
{SU Existence/Uniqueness)
1
on either side
p'.
that Vp{z), and thus ^(z, Vp{z)) rises uniformly for z in the
Theorem
(10)
in particular for
p
0.
Likewise,
Vs{-) in s, for all
= p'.
p < p
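The proof of Claim 9 is a shooting argument: integrate the Cauchy problem leftward from the upper threshold and adjust the initial slope s, using monotonicity of v_s in s, until value matching holds at the lower threshold. A minimal numerical sketch of that logic follows; the map Ψ below is a hypothetical stand-in (any map increasing in its second argument works), not the paper's Ψ, and the thresholds and payoff levels are illustrative.

```python
import numpy as np

def shoot(s, p_hi=0.8, p_lo=0.2, r=0.1, steps=2000):
    """Integrate v'' = Psi(p, r v) leftward from p_hi, with v(p_hi) = 1 (a stand-in
    for the payoff there) and slope v'(p_hi) = s; return the value reached at p_lo."""
    Psi = lambda p, w: 1.0 + max(w, 0.0)     # hypothetical Psi, increasing in w
    h = (p_hi - p_lo) / steps
    v, dv = 1.0, s
    for i in range(steps):
        p = p_hi - i * h
        v, dv = v - h * dv, dv - h * Psi(p, r * v)   # Euler step moving left
    return v

def solve_bvp(target=0.3, lo=-50.0, hi=50.0):
    """Bisect on the slope s so that v(p_lo) matches the target payoff there.
    shoot() is decreasing in s, mirroring the monotone comparison in Claim 9."""
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if shoot(mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Monotonicity of shoot() in s is exactly the comparison delivered by subtracting (11) for two slopes, which is what makes the bisection valid.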
DERIVATION OF OPTIMAL CONTROL FOUNDATIONS
B.
This appendix rigorously formulates the control/stopping problem and the sequence
summarize our plan
and highlight the absence of any
of attack,
ignore the null action throughout, as
We
first
logical circularity.
We
and uniqueness properties of
of proofs that establish the existence
its
solution.
its
additional difficulties (with a horizontal tangency
= somewhere) are fully captured by the R&D case: a vanishing intensity level.
§B.l.
We progressively restrict the control process to simpler domains. First, we
posit the SDE met by the observation process (xj), given any control functional n(-) of
past observations. Next, we derive the Bayes filter SDE describing the evolution of (pt).
Then we show how to formulate the OCS problem with current beliefs pt as state variable:
because
tt
Since the best existing theory does not do this for joint optimal control and stopping, we
first find
weak conditions
{V convex
in p); this yields
a simple belief exit set
may
time T. Finally, we show that we
We
§B.2.
[0,p]
WLOG
restrict
to
V{p), and then prove
U
[p, 1],
Markov
Assuming
c(0)
>
or
optimal Markov control n{p)
>
tt
—
a.s.
Lemma 1
and a Markov stopping
control policies n{p).
any candidate
establish key properties of the belief diffusion to prove that
optimal policy of beliefs alone (as §B.l allows)
§B.3.
asV =
to write the value function
stops experimenting in finite time.
everywhere (or both), we verify that the candidate
f{rv{p)) from
SU
is
admissible in the sense of §B.l,
We then check that the unique solution to SU (and
thus to 7iJB+ST) also solves OCS {V — v works, with p,p), and also is its only solution.
§B.4. We specialize §B.3 to the R&D model with c(0) =
and tt = somewhere.
justifying uniqueness
presumed so
far.
B.1 Developing the Markov Control Model

a. Drift and Noise Notation. Let {Ω, 𝔉, P} be the underlying probability space. Let h_t(ω) or h(t,ω) denote the same continuous time stochastic process h = (h_t)_{t≥0}, namely a collection of 𝔉-measurable real-valued functions (random variables), with h(·,ω): ℝ_+ → ℝ. We consider only processes with continuous sample paths, i.e. h(·,ω) is continuous for every ω ∈ Ω. We denote by C_{[0,t]} the space of continuous functions on any time subset [0,t] ⊂ [0,∞), whose (superscripted) elements are sample paths or trajectories: h^t(ω) = ((s, h(s,ω)))_{s≤t} is the (graph of the) process restricted to [0,t], and h^{t−} the restriction of h^t to [0,t); (𝔉^h_t) is the filtration generated by h. Finally, W = (W_t, 𝔉_t) is the standard Wiener process or standard Brownian Motion (BM) with zero drift and unit variance, adapted to (𝔉^W_t), and (canonically) with continuous sample paths.

Let p_0 ∈ (0,1). Think of Nature as initially independently drawing a drift m ∈ {−μ, μ} with chances p_0, 1 − p_0, and a BM noise path realization W^∞ in C_{[0,∞)}. To render our problem amenable to existing results in nonlinear filtering theory, we must define an underlying time-invariant unobserved process that the DM wishes to infer (not simply the stationary drift), and an observable process that provides noisy information on the former (the controlled signal). This construction works even though the unobservable process is time-invariant — see, e.g., Oksendal (1995) (hereafter, O95), Example 6.11. Put Ω = (0,2], 𝔉 = B_{(0,2]}, and partition Ω = Ω_H ∪ Ω_L = (0,1] ∪ (1,2]. Let W be a standard BM on {(0,1], B_{(0,1]}, P_W}; it extends to a standard BM on {(1,2], B_{(1,2]}, P_W} by translation. Next, define probability measures Q(ω) = p_0 1_{ω∈Ω_H} + (1 − p_0) 1_{ω∈Ω_L}, and P = Q(ω)P_W(·) on ℝ_+ × Ω by the Kolmogorov extension theorem. Then the payoff-relevant state θ ∈ {H, L} corresponds to the partition element Ω_θ ∈ {Ω_H, Ω_L}. Finally, choose m: Ω → {−μ, μ} such that m(ω) = μ iff ω ∈ Ω_H, and m(ω) = −μ otherwise. Note the equivalence: drift m ↔ state θ. Define the time-invariant process m_t(ω) = m(ω) for all t ≥ 0.

Fix a control functional n: ℝ_+ × C_{[0,∞)} → ℝ_{++} of time and sample paths. The non-anticipatory control n satisfies (i) n(t, h^∞) = n(t, h^t) for all t, and (ii) n(t, h^t(ω)) is 𝔉^h_t-adapted for each 𝔉^h_t-adapted process h. The controlled signal process (x_t(ω)) is a diffusion with x_0(ω) = 0, solving:

dx_t(ω) = m(ω) dt + (σ/√(n(t, x^t(ω)))) dW_t(ω).   (12)
b. The Continuous-time Non-Linear Bayes Filter. Given state-contingent payoffs and vNM preferences, the DM's expected payoff from stopping depends on the controlled signal diffusion only via the posterior probability of the high state, given time and past observations of the signal and own control:^13 p_t = P(m = μ | p_0, t, x^t, n^{t−}). Our model, with the state process m and the signal process x obeying (12) (given a prespecified control functional n), meets all conditions of Theorem 9.1 in LS77, provided that there exists a unique strong solution to the SDE below — established in §B.3. Their result, a special case of the so-called 'fundamental theorem of non-linear filtering', asserts that the process p_t = P(m = μ | 𝔉^x_t) obeys p_0(ω) = p_0, and solves the SDE:^14

dp_t(ω) = p_t(ω)(1 − p_t(ω)) (2μ/σ) √(n(t, x^t(ω))) dW̄_t(ω)   (13)

where

W̄_t(ω) = (1/σ) ∫_0^t √(n(s, x^s(ω))) {dx_s(ω) − [p_s(ω)μ + (1 − p_s(ω))(−μ)] ds}

is a Wiener process, but not a standard BM. That is, (W̄_t, 𝔉^x_t) is a Wiener process adapted not only to its own filtration, but also to the smaller one (𝔉^x_t).

^13 We shall choose h = x, and soon after h = p. Of course, at time t, we need the joint history of x^t and n^{t−} to compute the current optimal control n_t. However, given the functional n and x^t, the control history n^{t−} is implicitly recursively embedded; therefore, we shall omit it altogether for notational simplicity.

^14 More recently, Bolton and Harris (1993) provide alternative insights into this formula.
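The filter (13) makes (p_t) a driftless diffusion with volatility p(1 − p)ζ√n under the DM's information, where ζ = 2μ/σ. A minimal Euler–Maruyama sketch of this dynamic follows; the constant intensity n and the parameter values are illustrative assumptions, not the paper's primitives.

```python
import numpy as np

def simulate_beliefs(p0=0.5, mu=1.0, sigma=1.0, n=2.0, T=1.0, steps=500,
                     paths=20000, seed=0):
    """Euler-Maruyama for the filter (13): dp = p(1-p) * zeta * sqrt(n) * dWbar,
    with zeta = 2*mu/sigma, at a constant (hypothetical) intensity n."""
    rng = np.random.default_rng(seed)
    zeta = 2.0 * mu / sigma
    dt = T / steps
    p = np.full(paths, p0)
    for _ in range(steps):
        dW = rng.normal(0.0, np.sqrt(dt), paths)
        # clip is only a numerical guard: the exact solution stays inside (0,1)
        p = np.clip(p + p * (1.0 - p) * zeta * np.sqrt(n) * dW, 0.0, 1.0)
    return p
```

Because (p_t) is a bounded martingale, the simulated cross-sectional mean stays near p_0 and every path remains in [0,1] — a quick sanity check on the filter's form.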
c. Beliefs as a Sufficient Statistic.

• The Control as a Function Just of Belief History. Since we want to work with beliefs rather than signals as a state variable, we establish:

Claim 10 There is a bijection between information sets (p^t, n^{t−}) and (x^t, n^{t−}). Hence, p^t is sufficient for x^t, and we may WLOG replace n(t, x^t(ω)) by n(t, p^t(ω)).

Proof: We first prove that (x^t, n^{t−}) ↦ (p^t, n^{t−}) is an injection. Fix the control functional n, and states ω, ω' ∈ Ω whose resulting signal paths differ: x_s(ω) ≠ x_s(ω') for some s ∈ [0,t]. By continuity, this holds on a time set of positive measure. Then we either have n(t, x^t(ω)) = n(t, x^t(ω')), and given the filter (13), the belief paths differ, p^t(ω) ≠ p^t(ω'); or n^{t−}(ω) ≠ n^{t−}(ω'). Either way, {p^t(ω), n^{t−}(ω)} ≠ {p^t(ω'), n^{t−}(ω')}. That {p^t, n^{t−}} ↦ {x^t, n^{t−}} is an injection follows from strong existence and uniqueness of the diffusion p_t(ω), which imply that every signal process path x^t(ω) and associated control process n(t, x^t(ω)) yield a unique belief process p_t(ω) via the filter (13). □

• The Value as a Function Just of Current Beliefs. Given the absence of suitable sufficiency theorems for optimal control joint with optimal stopping — Krylov (1980) being the best source here — we proceed indirectly, via the value function.

Claim 11 The supremum value can be written as V(p), a function of the current belief p.

Proof: The random variable T: Ω → [0, ∞] is a Markov time relative to (𝔉^p_t) if {ω : T(ω) ≤ t} ∈ 𝔉^p_t for all t ≥ 0. The general results in Krylov (1980) assume a continuous terminal reward function (like us: we require π continuous on [0,1]). Let V(t, p_t | T) be the supremum value at time t of the length-T horizon problem. His Theorem 3.1.9 provides a recursive equation for V(t, p_t | T), achieved via a feedback control functional n(t, p^t) (his Theorem 3.1.10). Next, by Theorem 6.4.13, V(t, p_t) = V(p_t) for all t, since our π(·), c(·), and adapted control processes n_t, T are stationary. Finally, by Theorem 6.4.4, the value of the infinite horizon problem is V(p_t) = lim_{T→∞} V(t, p_t | T), since lim_{T→∞} E_t[e^{−r(T−t)}V(p_T)] = 0 for all t ≥ 0. So the supremum value in (4) can be written as V(p). □

• Belief Thresholds and Value Function Convexity: Proof of Lemma 1. We can now prove that the optimal stopping decision is described by a stopping set in the current belief space [0,1]. The DM must optimally stop the one-dimensional controlled process (p_t(ω)) solving (13), for an adapted control process n_t(ω) and a Markov time T(ω).

First, V is convex. For any p ∈ [0,1], using a policy that is ε-optimal for the V(p_0) problem (i.e. it achieves a payoff > V(p_0) − ε), the DM can ensure himself an expected payoff that is affine in p. For such a strategy yields a constant payoff in either state θ = H, L — since the Wiener probability law P_W is independent of the partition element Ω_θ, while m = ±μ in states H, L — and adjusting p merely weights these payoffs. Thus, V exceeds some supporting ε-tangent line at p. As ε > 0 and p are arbitrary, V is everywhere weakly convex.

Next, define stopping sets S_A, S_B, and S_0 for A, B, and the null action. Since point beliefs are stationary for (13), V(0) = π(0) and V(1) = π(1). As stopping is always an option, we have V ≥ π; the equality V(p) = π(p) holds iff p ∈ S_A ∪ S_B ∪ S_0. If ever V(p) = π_A(p), so that p ∈ S_A, then convexity, V ≥ π, and 0 ∈ S_A together force V = π_A on [0,p]. Thus, S_A = [0,p], and similarly S_B = [p̄,1], for some 0 ≤ p ≤ p̄ ≤ 1. Similarly, since π_A > 0 and π_B > 0, the null action is never exercised at p = 0 and p = 1, so that S_0 = [p_o, p̄_o].

We have now proven that the stopping rule is characterized by an exit set in the space [0,1] of current beliefs. We may therefore simplify the supremum operator in (4) as sup_{T,n} = sup_{T(p,p̄)}[sup_n]. For each {p, p̄}, resolving the inner supremum is crucially a pure control exercise.
• The Control as a Function Just of Current Beliefs. Under very weak conditions — guaranteed by our simple two-point boundaries and boundedly finite maximand — for any optimal 𝔉_t-adapted control, there exists a Markov control that performs as well, by Theorem 11.3 in O95. Hence, only the current belief p_t matters for control. Hereafter, we restrict to stationary Markov controls, of the type n(t, p^t(ω)) = n(p_t(ω)), where n: [0,1] → ℝ_+ is a B_{[0,1]}-measurable function. Thus, we consider beliefs as the solution to the SDE dp_t = √(2Σ(p_t)n(p_t)) dW̄_t (given p_0), on any continuation set (p, p̄) ⊂ [0,1].

d. The Properly Formulated Optimization Problem.

Definition An admissible stationary Markov control policy is a strictly positive B_{[0,1]}-measurable function n: [0,1] → ℝ_{++}, yielding a unique strong solution to (12)-(13).

The OCS problem — (4) given (2), replacing the maximand, and WLOG restricting to stationary Markov controls — is therefore now:

V(p_0) = sup_{n(·)∈M; p,p̄∈[0,1]} E [ ∫_0^{T(ω|p,p̄)} −c(n(p_t(ω))) e^{−rt} dt + e^{−rT(ω|p,p̄)} π(p_{T(ω|p,p̄)}(ω)) ]   (14)

s.t. dp_t(ω) = √(2 n(p_t(ω)) Σ(p_t(ω))) dW̄_t(ω), given p_0,   (15)

T(ω|p,p̄) = inf{t ≥ 0 : p_t(ω) ≤ p or p_t(ω) ≥ p̄},

W̄_t(ω) = (1/σ) ∫_0^t √(n(p_s(ω))) {dx_s(ω) − [p_s(ω)μ + (1 − p_s(ω))(−μ)] ds}.
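The objective (14)-(15) for any fixed stationary policy can be estimated by Monte Carlo: simulate the belief diffusion to the exit set, accumulating discounted flow costs, and collect the discounted stopping payoff. Everything in the sketch below is an illustrative assumption — a constant control n, flow cost c(n) = n²/2, and stopping payoff π(p) = max(p, 1 − p) — not the paper's primitives.

```python
import numpy as np

def policy_value(p0=0.5, pl=0.2, ph=0.8, n=2.0, r=0.1, mu=1.0, sigma=1.0,
                 dt=0.002, paths=3000, tmax=50.0, seed=1):
    """Monte Carlo value of OCS (14)-(15) under a FIXED stationary policy:
    constant intensity n and exit thresholds pl, ph. Hypothetical primitives:
    flow cost c(n) = n**2/2 and stopping payoff pi(p) = max(p, 1-p)."""
    rng = np.random.default_rng(seed)
    zeta = 2.0 * mu / sigma
    c = n * n / 2.0
    vals = np.empty(paths)
    for i in range(paths):
        p, t, cost = p0, 0.0, 0.0
        while pl < p < ph and t < tmax:
            cost += c * np.exp(-r * t) * dt            # discounted flow cost
            p += p * (1.0 - p) * zeta * np.sqrt(n) * rng.normal(0.0, np.sqrt(dt))
            t += dt
        vals[i] = -cost + np.exp(-r * t) * max(p, 1.0 - p)
    return float(vals.mean())
```

Optimizing this estimate over (n, p, p̄) is precisely the outer supremum in (14); the simulation only prices one candidate policy.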
e. Brief Aside: Making Sense of the Bayesian Model of §4.2. In the text, we formulated this result without measure theory, for illustrative purposes. Yet our summary (3) was not without basis. We defined a conditional signal process (x^θ_t), θ = L, H, namely dx^θ_t(ω) = 1_{ω∈Ω_θ}[μ^θ dt + (σ/√(n_t)) dW_t(ω)] — i.e. (12) premultiplied by the indicator function of Ω_θ. And from the unconditional point of view of the DM, the belief driving force dW̄_t is really the mixture of the two conditional maps,

W̄_t(ω) = p_t(ω)W̄^H_t(ω) + (1 − p_t(ω))W̄^L_t(ω),

which we can now see is equivalent to this definition (suppressed in the text, along with ω).
B.2 Experimentation Ends Almost Surely in Finite Time

We now assume that a stationary optimal policy (n(·), p, p̄) exists, with n(·) admissible. We then show that experimentation ends in finite time, a.s. Note that positivity, n(p') > 0 at each p' ∈ (p, p̄), is demanded by admissibility, and is not a restriction on optimality: if n(p') = 0, then dp|_{p'} = n(p')Σ(p')dW̄ = 0, the belief is stuck at p' forever, and E[T] = ∞ — contrary to p' ∈ (p, p̄) being a continuation belief. Here, we hew closely to KT81, §15.6-7, and Karatzas and Shreve (1991) (hereafter, KS91), §5.5 (adapting both for clarity, to avoid overuse of letters). Below, p_0, p, y, z are arbitrary points in (p, p̄).

a. Background Terminology, Notation, and Theory.

We explore the unconditional driftless (β(p) = 0) belief process, with diffusion term σ(p) = √(2n(p)Σ(p)) in (p, p̄). We now define several important integrals of 1/n(p), needed both here and in §B.3, B.4. Though they may be unbounded, they are well-defined, because the control n(·) is B_{[0,1]}-measurable.

Given a general diffusion dp = β(p)dt + σ(p)dW with range (p, p̄), S(y) = ∫^y exp(−2∫^x β(z)dz/σ²(z)) dx denotes the scale function. As two convenient abuses of notation, S[y,p] = S(p) − S(y) denotes the scale measure, and S(p,p] = lim_{y↓p} S[y,p] its left-side limit. For our driftless process, S[y,p] = p − y, as n > 0. Similar to the scale function, define the speed function M(y) = ∫^y [S'(z)σ²(z)]^{−1} dz, the speed measure M[y,p], and M(p,p] = lim_{y↓p} M[y,p]. Next, let J[y,p̄] = ∫_y^p̄ S[y,z] M'(z) dz and K[y,p̄] = ∫_y^p̄ S[z,p̄] M'(z) dz, with right limits J(p,p̄] and K(p,p̄].

Regularity is a critical notion, analogous to the communicating property for Markov processes. Let T_{y,z} = inf{t ≥ 0 : p_t = z | p_0 = y} be the hitting time of z starting from y. The belief diffusion process (p_t) is regular if, for any pair of interior points y, z ∈ (p, p̄), z can be reached from y in finite time with positive probability (hereafter, FTPP): Pr(T_{y,z} < ∞) > 0.

Attracting Boundaries. The boundary p is attracting if S(p,p] < ∞. By an equivalent formulation in KT81's Lemma 6.1, an attracting boundary is non-repelling: (p_t) approaches the boundary (say p) arbitrarily close with positive chance, starting from any interior point p_0, before any larger interior belief b is hit. A similar definition applies at p̄. Strictly speaking, an attracting boundary might not be reached in finite time. For FTPP, we need a harsher concept: an attainable boundary is not just approached but hit in FTPP. By Lemma 6.2 in KT81, the boundary p is attainable iff it is attracting and J(p,b] < ∞ for any b ∈ (p, p̄).
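The attainability test of KT81's Lemma 6.2 is easy to check numerically: in natural scale (S[y,p] = p − y), the integral J(p,b] reduces to ∫(z − p)/σ²(z) dz with σ²(z) = 2n(z)Σ(z) and Σ(z) = (ζz(1 − z))²/2. A sketch with two hypothetical intensity profiles — one bounded away from zero, and one vanishing quadratically at the boundary, which previews the R&D case of §B.4:

```python
import numpy as np

def J_integral(pl, ph, n, zeta=2.0, pts=100000):
    """Riemann sum for J(pl, ph] = integral of (z - pl)/sigma^2(z) dz, where
    sigma^2(z) = 2 n(z) Sigma(z) and Sigma(z) = (zeta z (1-z))**2 / 2."""
    z = np.linspace(pl, ph, pts + 1)[1:]          # grid open at the lower boundary
    sig2 = 2.0 * n(z) * (zeta * z * (1.0 - z)) ** 2 / 2.0
    dz = z[1] - z[0]
    return float(np.sum((z - pl) / sig2) * dz)
```

With intensity bounded away from zero the integral is small and finite, so the boundary is attainable; with intensity vanishing like (z − pl)², the integrand behaves like 1/(z − pl), and the sum grows without bound as the grid refines — an attracting yet unattainable boundary.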
Claim 12 Assume π > 0 in [0,1], or c(0) > 0, or both. For an optimal control n(p), the belief process (p_t = p_0 + ∫_0^t √(2n(p_s)Σ(p_s)) dW̄_s) is regular, and any thresholds 0 < p < p̄ < 1 are attracting.

Proof: That the boundaries are attracting follows from S(p,p̄] < ∞ and S[p,p̄) < ∞, essentially as stated above (since S[y,p] = p − y). Regularity is proven by contradiction. Suppose that there exists a pair y, z ∈ (p, p̄) with Pr(T_{y,z} < ∞) = 0. Obviously y ≠ z; WLOG order p < y < z < p̄.

Proof Step 1: Regularity on Some Interval. Given sample path continuity and y < z < p̄, all diffusion paths from y to p̄ first hit z, and so Pr(T_{y,z} ≤ T_{y,p̄}) = 1. Hence, Pr(T_{y,p̄} < ∞) ≤ Pr(T_{y,z} < ∞) = 0. But then Pr(T_{y,p} < ∞) > 0, for otherwise, starting at belief y, the DM strictly suboptimally experiments at cost without any prospect of positive discounted returns — a negative payoff, which we ruled out. So starting from y, the belief process moves down but not up in FTPP, and every interval [y', y''] ⊂ [p, y] is crossed downward in FTPP. Across some such interval, (p_t) must also transit up in FTPP — for a belief process that moves down but not up in FTPP violates the martingale property of (p_t)^15 — and hence (p_t) is regular on some non-empty interval [y̲, y''] ⊂ [p, y).

Proof Step 2: The Maximal Regularity Interval. By Step 1, (p_t) is a regular process on a non-empty interval [y̲, y''] ⊂ [p, y). We next claim that the largest connected regularity set with lower bound y̲ is right-open, of the form [y̲, ȳ). To see why, suppose instead that it is [y̲, y''], and note y'' + η < p̄ for small η > 0. If, starting on [y'', y'' + η], the process (p_t) could move down but not up in FTPP, this would again violate the martingale property;^15 so Pr(T_{y'',y''+η} < ∞) > 0 for small enough η > 0, and (p_t) is regular on [y̲, y'' + η] — contradicting maximality. So the maximal regularity interval is [y̲, ȳ), and ȳ is an unattainable boundary of a regular process on [y̲, ȳ): beliefs approach ȳ from below, only at a vanishing speed.

Proof Step 3: Contradiction to Optimality. Since S(y̲, ȳ] ≤ S(y̲, y] < ∞, the boundary ȳ is attracting; unattainability then forces J[y̲, ȳ] = ∞, which by KT81's Lemma 6.3-iii implies M(y̲, ȳ] = ∞, and then K(y̲, ȳ] = ∞ by KT81's Lemma 6.3-v. So ȳ is a natural (Feller) boundary, and their Theorem 7.2-ii conveniently says lim_{q↑ȳ} E[e^{−rT_{q,p}}] = 0; both limits exist, as the expectation is monotone in p and q. Respecting the order of limits, this means: for each ε > 0, there exist p_ε and q_ε, with y̲ < p_ε < q_ε < ȳ, such that E[e^{−rT_{q_ε,p_ε}}]π(p_ε) < ε. Since the DM stops only at the lower threshold, the expected gross discounted returns starting from q_ε obey R(q_ε) ≤ E[e^{−rT_{q_ε,p_ε}}]π(p_ε) < ε. Meanwhile, the expected total discounted experimentation costs K(q_ε) are positive for all q_ε > p, increasing as q_ε rises to ȳ as ε → 0, and hence boundedly positive: K(q_ε) ≥ K̄ > 0 for all small enough ε > 0. Thus, the value v(q_ε) = R(q_ε) − K(q_ε) < 0 for small enough ε — a negative payoff, which we ruled out. This contradiction establishes regularity. □

^15 Here is a proof. First, for any t ≥ 0, if T ≥ 0 is a stopping time, then so is t ∧ T = min{t, T}. Since Pr(T_{y,p} < ∞) > 0, we have t ∧ T_{y,p} = T_{y,p} with positive chance for large enough t < ∞. Unless t ∧ T_{y,p} ∧ T_{y,y''} = T_{y,y''} with positive probability for some y'' > y (and so y'' is hit in FTPP from y), Pr(t ∧ T_{y,p} ∧ T_{y,y''} = T_{y,p}) = Pr(t ∧ T_{y,p} = T_{y,p}) > 0 for all y'' > y. Then E[p_{t∧T_{y,p}∧T_{y,y''}} | y] ≤ y + Pr(t ∧ T_{y,p} = T_{y,p})[p − y] < y. But Wald's Optional Stopping Theorem guarantees equality, because (p_t) is a martingale.
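The optional-stopping logic used repeatedly above — stopping the bounded martingale (p_t) at a two-sided exit preserves its mean — can be seen directly by simulation: the preserved mean pins down the exit probabilities as the gambler's-ruin ratio (p_0 − p)/(p̄ − p). A sketch with hypothetical parameters:

```python
import numpy as np

def exit_stats(p0=0.5, pl=0.2, ph=0.9, n=2.0, zeta=2.0, dt=0.001, paths=8000, seed=2):
    """Run the driftless belief diffusion dp = p(1-p) zeta sqrt(n) dW to a
    two-sided exit at pl or ph; report (mean exit level, share exiting high)."""
    rng = np.random.default_rng(seed)
    p = np.full(paths, p0)
    alive = np.ones(paths, dtype=bool)
    while alive.any():
        k = int(alive.sum())
        dW = rng.normal(0.0, np.sqrt(dt), k)
        p[alive] = p[alive] + p[alive] * (1.0 - p[alive]) * zeta * np.sqrt(n) * dW
        alive &= (p > pl) & (p < ph)
    p = np.clip(p, pl, ph)   # snap small Euler overshoots back to the barrier
    return float(p.mean()), float((p >= ph).mean())
```

Optional stopping gives E[p_T] = p_0, so the high-exit share should approximate (0.5 − 0.2)/(0.9 − 0.2) = 3/7, and both boundaries are hit in finite time — the content of Claim 12 and Theorem 2 in this special case.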
b. Finite Hitting Times.

Theorem 2 Assume π > 0 in [0,1], or c(0) > 0, or both. If a solution (n(·), p, p̄) to OCS exists, then the expected time to hit p or p̄ from any p_0 ∈ (p, p̄) is finite.

Proof: We first verify two conditions on the controlled belief SDE: (i) local integrability (LI'), or ∀p ∈ (p,p̄) ∃ε > 0 with M[p−ε, p+ε] = ∫_{p−ε}^{p+ε} dy/[n(y)Σ(y)] < ∞; and (ii) nondegeneracy (ND'), or a positive diffusion term nΣ > 0 in (p,p̄). Clearly, (ND') obtains because n(p) > 0 by optimality, and Σ(p) > 0 for p inside (0,1). For a contradiction, assume (LI') fails for some p_0 ∈ (p,p̄), i.e. M[p_0−ε, p_0+ε] = ∞ for all ε > 0 — WLOG with ε small enough that p < p_0 − ε < p_0 + ε < p̄ and p_0 ± ε lie inside (0,1). Then J[p_0−ε, p_0+ε] < ∞ by attainability, while M[p_0−ε, p_0+ε] = ∞: p_0 ± ε are then exit boundaries of [p_0−ε, p_0+ε] by KT81's classification (end of §15.6-6), and the contradiction proof from Claim 12 (step 3) with KT81's Theorem 7.2-ii still applies. So (LI') holds.

Next, both boundaries p, p̄ are attainable from any interior belief. Indeed, optimality at p_0 ∈ (p,p̄) precludes Pr(T_{p_0,p} < ∞) = Pr(T_{p_0,p̄} < ∞) = 0, for then the DM would wish to stop immediately, contradicting p_0 ∈ (p,p̄). If just one is zero, say Pr(T_{p_0,p̄} < ∞) = 0, then the belief process moves only downward, towards the only attainable threshold p; by the logic of Step 3 above, v(p_0) < π(p_0) for p_0 close to p̄ — again a contradiction.

Since both boundaries p, p̄ are attainable, J(p,p̄] < ∞ and J[p,p̄) < ∞, by KT81's Lemma 6.2. But by Proposition 5.5.32-i in KS91,^16 these two inequalities are our result: the stopping time T is finite a.s. — in fact, more strongly, the expected time to hit p or p̄ is finite. □

^16 In our notation, their result says: The stopping time T is finite a.s. iff both boundaries are attainable [J(p,p̄] < ∞ and J[p,p̄) < ∞ (same as their v function)]; in this case, more strongly, E(T) < ∞.

In summary, regularity only fails if the intensity level n(p) "nearly vanishes" around some interior belief y, making experimentation there a pure loss — ruled out because either c(0) > 0 or π > 0, or both. With regularity, p and p̄ are attainable in FTPP, as Theorem 2 required.
B.3 Verification Theorem: Existence and Uniqueness of a Solution

a. Admissibility Conditions. We check the conditions (see the definition in §B.1-d) for our candidate Markov control n(p) = f(rv(p)) to be admissible. Until §B.4, we assume the non-R&D and non-null action case, with π ≠ 0, to avoid some technical issues.

• Positivity. This has already been addressed at the outset of §A.2.

• Measurability. The composition of a Borel measurable map with any measurable map preserves measurability (O95, Lemma 2.1); also, being constant in time, n(·) is Borel measurable in the time argument. Since the Markov control function f(rv(p)) is continuous, it is B_{[0,1]}-measurable. So for any B_{[0,1]}-measurable function n(·), the control process n(p_t(ω)) is B_{ℝ_+} ⊗ 𝔉-measurable, and 𝔉^x_t- and 𝔉^p_t-adapted, like p_t(ω).

• Strong Existence and Uniqueness (SEU). Given our candidate optimal policy {f(rv), p, p̄}, the belief process has a unique strong solution. Indeed, replace the functional n(t, x^t) with our candidate Markov control f(rv(p_t)) in (12), and substitute into (15) for W̄_t(ω) and then p_t(ω). Suppressing ω,

dp_t = p_t(1 − p_t)(2μ/σ) { [f(rv(p_t))/σ][m − μ(2p_t − 1)] dt + √(f(rv(p_t))) dW_t }   (16a)

= p_t(1 − p_t)² ζ² f(rv(p_t)) dt + p_t(1 − p_t) ζ √(f(rv(p_t))) dW_t on Ω_H
= −p_t²(1 − p_t) ζ² f(rv(p_t)) dt + p_t(1 − p_t) ζ √(f(rv(p_t))) dW_t on Ω_L   (16b)

where ζ = 2μ/σ. As the probability space and Wiener process W are primitives, we need only verify that there exists a unique strong (KS91, §5.2.1) solution to each corresponding conditional SDE dp_t = β_θ(p_t)dt + α(p_t)dW on Ω_θ, θ = L, H, where α and β_θ are implicitly defined by (16b). If so, the process (p_t(ω)) solving (16a) is a diffusion, being a mixture (weights p_0, 1 − p_0) of the conditional diffusions (16b), and so inherits the defining properties (KS91, §5.1.1). An adapted control process (f(rv(p_t(ω)))) is then induced, and a uniquely defined observation process x_t, after substitution into (12).

Because our belief SDEs are defined on a bounded open interval (p, p̄), all requirements below need only hold up to any hitting time T, at which point the process is absorbed. Existence of a weak solution on (p, p̄) — i.e. up to any hitting time — follows by Skorohod's Theorem (KS91, Theorem 5.4.22) from p_0 ∈ (0,1) and the continuity and boundedness of α and β_θ on (p, p̄). Just as with ODEs, an SDE is uniquely soluble given a Lipschitz condition, here joint on the drift and variance terms β_θ and α. This requirement is met because f and √f Lipschitz (by Claim 3) and v differentiable imply that f(rv(p)) and √(f(rv(p))) are Lipschitz in p ∈ (p, p̄). (A proof mimics that of the composition chain rule in calculus.) By Proposition 5.2.13 in KS91 (with h(x) = x in their equation (2.25)), this implies strong uniqueness, and then pathwise uniqueness (by KS91, Remark 5.3.3). Skipping around their book, along with weak existence, this yields SEU (KS91, p. 310).
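The conditional SDEs (16b) have signed drifts: beliefs trend up on Ω_H at rate p(1 − p)²ζ²n, and down on Ω_L at rate p²(1 − p)ζ²n, with common volatility p(1 − p)ζ√n. A minimal Euler sketch, assuming a constant intensity n, confirms that conditional beliefs migrate toward the true state:

```python
import numpy as np

def conditional_mean(state_H, p0=0.5, n=2.0, zeta=2.0, T=3.0, steps=3000,
                     paths=4000, seed=3):
    """Euler scheme for the conditional belief SDE (16b): drift +p(1-p)^2 zeta^2 n
    on the event H, -p^2(1-p) zeta^2 n on L; volatility p(1-p) zeta sqrt(n)."""
    rng = np.random.default_rng(seed)
    dt = T / steps
    p = np.full(paths, p0)
    for _ in range(steps):
        drift = (p * (1 - p) ** 2 if state_H else -p ** 2 * (1 - p)) * zeta ** 2 * n
        vol = p * (1 - p) * zeta * np.sqrt(n)
        p = np.clip(p + drift * dt + vol * rng.normal(0.0, np.sqrt(dt), paths), 0.0, 1.0)
    return float(p.mean())
```

Mixing the two conditional laws with weights p_0 and 1 − p_0 recovers the unconditional driftless dynamic of (16a), as the text notes.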
b. Value Function Characterization and Verification of the SU Policy.

Theorem 3 The supremum value V of OCS (14)-(15) equals v, where {v, p, p̄} solves SU (5)-(7), and {p : V(p) > π(p)} = (p, p̄).

Proof Step 1: Control. The supremum value V(p) of OCS clearly exists, and by Lemma 1, it uniquely defines a continuation region E = (p*, p̄*) ⊆ (0,1), by [0,p*] ∪ [p̄*,1] = {p ∈ [0,1] : V(p) = π(p)}. For any admissible Markov control n(p) and resulting belief process (p_t(ω)), define V(p_0) as in (14), except using the stopping time T(ω|p*, p̄*). Since π is continuous, this is a pure control problem with given boundaries p*, p̄*.

For a sufficiency verification theorem, we check three conditions. First, there exists a solution v(p) of the HJB problem with value matching only at p*, p̄*: by Claim 9, a solution v(p) of the ODE (6) exists for any such thresholds, with value matching in [p*, p̄*]. Second, v(p) ∈ C² in [p*, p̄*], because v''(p) = c'(f(rv(p)))/Σ(p) is continuous in p. Third, the family of functions {v(p_t)}_{t≤T} is uniformly integrable, simply because |v(p_t)| is boundedly finite (O95, Appendix C.3). Therefore, by Theorem 11.2 in O95, an optimal Markov (WLOG, as proved in §B.1.c) control n(p) = f(rv(p)) exists for this pure control problem.

Proof Step 2: Stopping. Next, replace n(p) = f(rV(p)) in (13), with p_0(ω) = p_0, and for this controlled diffusion p_t(ω), consider the resulting pure optimal stopping problem:

V(p_0) = sup_{T≥0} E [ ∫_0^T −c(f(rV(p_s))) e^{−rs} ds + e^{−rT} π(p_T(ω)) ]

— where the value of this stopping problem is our supremum V(p), because f(rV(p)) is an optimal control (Step 1). By Theorem 3.15 in Shiryayev (1978) (hereafter, S78) — whose theorems cover a final payoff π(·) and geometric discounting by his Remark 3.8.3, extended to a continuous flow cost c(f(rV(·))) ∈ [0, ∞) — an optimal stopping time exists, described by a stopping set in [0,1], and V solves the generalized Stefan problem

rV(p) = −c(f(rV(p))) + f(rV(p)) Σ(p) V''(p) s.t. value matching at ∂E.

Hence, V'' > 0 in E, and V = π in [0,1]\E (where V'' = 0). Therefore V is continuous and convex in [0,1], and has right and left derivatives in [0,1] (by Theorem 24.1 in R70), and so at points such as p*, p̄* on the boundary of the set {V(p) = π(p)}. Necessity of smooth pasting then follows from Theorem 3.16 in S78, as sketched after Proposition 1.

Summing up, the triple {V, p*, p̄*} solves the Bellman equation HJB (5) (Step 1) and ST (Step 2): rV(p) = −c(n(p)) + n(p)Σ(p)V''(p) — equivalent to v'' = Ψ — plus (6)-(7), for the given control n(p) = f(rV(p)). The HJB maximand has a unique maximizer f(rv), and so HJB+ST are jointly equivalent to SU; and SU has the unique solution {v(p), p, p̄}. Therefore v(p) = V(p), p* = p, p̄* = p̄, as asserted. □
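Theorem 3's Stefan problem can be illustrated with a crude finite-difference relaxation on a fixed continuation interval. Under the hypothetical quadratic cost c(n) = n²/2, the FOC c'(n) = Σ(p)v'' gives n = Σ(p)v'', hence rv = (Σv'')²/2, i.e. v'' = √(2rv)/Σ(p) — one special case of the ODE (6); the thresholds and stopping payoff below are also illustrative assumptions.

```python
import numpy as np

def solve_hjb(pl=0.2, ph=0.8, r=0.1, zeta=2.0, grid=201, iters=20000):
    """Jacobi relaxation for v'' = sqrt(2 r max(v,0)) / Sigma(p) on (pl, ph),
    with value matching v = pi at both ends. Here Sigma(p) = (zeta p (1-p))**2 / 2,
    and pi(p) = max(p, 1-p) is a hypothetical stopping payoff."""
    p = np.linspace(pl, ph, grid)
    h = p[1] - p[0]
    Sig = (zeta * p * (1.0 - p)) ** 2 / 2.0
    pi = np.maximum(p, 1.0 - p)
    v = np.linspace(pi[0], pi[-1], grid)       # initial guess: the chord between payoffs
    for _ in range(iters):
        rhs = np.sqrt(2.0 * r * np.maximum(v[1:-1], 0.0)) / Sig[1:-1]
        v[1:-1] = 0.5 * (v[2:] + v[:-2]) - 0.5 * h * h * rhs
    return p, v
```

The computed v is convex, matches the payoff at the fixed boundaries, and sags below the chord in the interior, where the implied control n(p) = Σ(p)v''(p) is positive; searching over (pl, ph) for smooth pasting would recover the free-boundary pair of SU.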
Theorem 4 Assume π > 0 in [0,1], or c(0) > 0, or both. The Markov control n(p) = f(rv(p)) and thresholds p, p̄ from SU are optimal for the OCS problem.

Proof: The control f(rv(p)) is not only admissible (§B.3.a) but also an optimal control for OCS (Step 1 of Theorem 3). By Theorem 3.3 in S78, the stopping time T = T_{p_0,p} ∧ T_{p_0,p̄} — and then the whole SU policy — is optimal for OCS if T < ∞ a.s., which we now establish. First, f(rv(p)) > 0 in (p, p̄) implies that conditions (ND') and (LI') in the proof of Theorem 2 hold for the resulting controlled belief SDE, and both boundaries are attracting for any positive control: since (p, p̄) ⊂ (0,1), we have Σ(z) > 0 in (p, p̄), and S(p,p̄] < ∞ and S[p,p̄) < ∞ always. Because f(rv(·)) > 0 on all of [p, p̄] — true given π > 0 in [0,1], or c(0) > 0 — the attainability integrals are finite:

J(p,p̄] = ∫_p^p̄ (z − p) dz/[f(rv(z))Σ(z)] < ∞ and J[p,p̄) = ∫_p^p̄ (p̄ − z) dz/[f(rv(z))Σ(z)] < ∞.

Therefore, by Proposition 5.5.32-(i) in KS91, T < ∞ a.s. (indeed E(T) < ∞). □

c. Verification: Uniqueness of the Optimal Policy for OCS.

Theorem 5 Assume π > 0 in [0,1], or c(0) > 0, or both. Any solution {f(rv(·)), p, p̄} of SU (5)-(7) is the unique optimal policy for OCS.

Proof: Since {f(rv(·)), p, p̄} solves OCS by Theorem 4, and V(p) is uniquely defined, so is the boundary of the region where V(p) = π(p), i.e. {p*, p̄*} = {p, p̄}. Consider next the uniqueness of the optimal control. We have proven above that V(p) = v(p) ∈ C² in (p*, p̄*) = (p, p̄). By Theorem 11.1 in O95, any optimal Markov control n(p) must solve the Bellman equation rV(p) = max_n [−c(n) + nΣ(p)V''(p)], whose strictly concave maximand admits the unique maximizer given by the first-order condition c'(n(p)) = Σ(p)V''(p), i.e. n(p) = f(rV(p)) = f(rv(p)). So by Theorem 2, the stopping time is finite a.s. just for f(rv(p)), and the optimal policy is unique. □
B.4 The R&D / Null Action Case (π = 0 Somewhere)

If c(0) = 0, or π(p) = 0 for some p ∈ [0,1], then boundary problems arise: for the SU policy, the intensity n(p) = f(rv(p)) vanishes at the threshold p, where v(p) = v'(p) = 0, as in Figure 2 (left), and p might conceivably be unattainable in FTPP. We show that this is not so: the SU policy is still uniquely optimal.

Theorem 6 (π = 0 Somewhere) Assume c(0) = 0, or π(p) = 0 for some p ∈ [0,1], so that f(rv(p)) = 0 somewhere. The SU policy is still uniquely optimal for OCS, and its stopping time is finite a.s.

Proof: Clearly, f(rv(·)) > 0 in (p, p̄), so the proofs of Theorems 4 and 5 go through once we establish that the threshold p with f(rv(p)) = 0 remains attainable, i.e. J(p,p̄] < ∞. This is Claim 14 below. Then Pr(T_{p_0,p} ∧ T_{p_0,p̄} < ∞) = 1, and KT81's Theorem 7.2-ii still applies. □

Claim 14 J(p,p̄] = ∫_p^p̄ (z − p) dz/[f(rv(z))Σ(z)] < ∞.

Proof: For p ∈ (0,1), Σ is bounded away from zero near p, since 2Σ(z)/r → 2Σ(p)/r > 0 as z ↓ p; so it suffices that ∫_p (z − p)/f(rv(z)) dz < ∞. Near p, a Taylor expansion rv(z) = rv(p) + rv'(p)(z − p) + rv''(z̃)(z − p)²/2 (valid because v ∈ C²), with v(p) = v'(p) = 0, yields

rv(z) ∼ r v''(z)(z − p)²/2, where v''(z) = Ψ(rv(z))/Σ(z).   (17)

Substituting (17) into the integrand and appealing to the growth condition on Ψ near zero established in §A.1, the integrand (z − p)/f(rv(z)) is bounded above near p by a multiple of (z − p)^{γ−1}, for some γ > 0 — whose integral exists. This establishes the claim. □

Remark. If instead c(0) = c'(0) = 0, then the comparison in the proof of Claim 14 can fail, and J(p,p̄] = ∞ is possible: p may then be unattainable, and no optimal stopping time exists. In that case, the SU control n(p) = f(rv(p)) with the approximate thresholds {p + ε, p̄} is an ε-optimal policy. See Theorem 3.3 in S78.
OMITTED PROOFS: SENSITIVITY ANALYSIS
C.l Comparative Statics: Proof of Proposition 6
We
first
consider payoff shifts, which just affect the boundary conditions of £U.
then employ a different methodology for shifts in
•
Proof of Proposition
maxx,n^(Po|",
=P
{Po
—
p)/{p
~
p)
positive: V^0{po\-)
a change
in
>
0, for
>
c(-)
that skew the
Increasing Payoff Levels.
by inspection: V{po\-)^H
Here the event pr
and
where the maximand ^(po|-)
T'^tt^iTTb),
differentiable in n^
0.
6(a):
r, C,
=
E[e~^'^p
|
pr
— p,po,n]Pr{pT = P
B
>
>P
for all Po
The other
P-
—
is
Po,'^)
>
(4)
\
=
occurs with chance
payoff parametric shifts are similarly
a fixed policy n,T.
for
itself.
Write V{po)
— seen on the RHS of
that the T>A4 eventually chooses action
any po
ODE
We
one of these parameters, the supremum value
If
V
the
VM re-optimizes after
cannot
fall.
Consider boundary behavior at lower threshold belief p (p being similar), associating p.
and payoff parameters
p,
by smooth pasting
^7r(PQ|7!"o)(7ro
•
—
TTj) is
tti
(7),
>
The
ttq.
and
value function V^(p|7r)
is
continuously differentiable in
partially differentiable in n. Hence, V{'n\TTo)
the first-order Taylor expansion, as Vp{n\no)
• Proof of Proposition 6(b): Increasing Riskiness. Rotate the payoff line π(p) counterclockwise through (current belief, expected payoff), as π^B rises and π^A falls, so that the payoff line gets steeper on the right. Then the value function cuts into the new π frontier on the right. By Claim 7(b), p̄ must fall on the right side. Also, as Claim 8 asserts, as p̄ falls, the value V(p̄) falls and the slope V_p(p̄) rises, to restore a smoothly-pasted tangency of V with the steeper payoff line. By a left-right reflection of this argument, p̲ falls as well. □
Claim 13 (A Key Implication) Parametrize models i = 1, 2 by Ψ_i(p, v_i(p)) = c_i'(f_i(r v_i(p)))/Σ_i(p), as induced by the primitives {c_i(·), Σ_i, f_i}. Let {v_i, p̲i, p̄i} denote the corresponding solutions, each Ψ_i a continuous map (p̲1, p̄1) ∩ (p̲2, p̄2) → R. If

    Ψ2(p, v2) > Ψ1(p, v1) whenever v2 ≥ v1,    (18)

independently of p, then p̲1 < p̲2, p̄1 > p̄2, and v1(p) > v2(p) for all p ∈ [p̲2, p̄2].
Step 1: Ordering Thresholds. We assume p̲1 ≥ p̲2 and obtain a contradiction; by a symmetric argument, p̄1 > p̄2. Put Δv = v2 - v1, and similarly Δv', Δv''. First, we show that p̲1 ≥ p̲2 implies Δv(p̲1 + ε) > 0, Δv'(p̲1 + ε) > 0, and Δv''(p̲1 + ε) > 0 for some small ε > 0, so that Δv is strictly positive, increasing and convex at p̲1 + ε.

Proof: The first two weak inequalities Δv(p̲1) ≥ 0 and Δv'(p̲1) ≥ 0 follow at once from value matching and smooth pasting of each v_i at p̲i, along with strict convexity of v2. Next, Δv''(p̲1) = Ψ2(p̲1, v2(p̲1)) - Ψ1(p̲1, v1(p̲1)) > 0 by (18), as v_i solves the Ψ_i problem, and Δv(p̲1) ≥ 0.

We claim that Δv attains a global maximum at some interior p̂ ∈ (p̲1, min(p̄1, p̄2)), with Δv(p̂) > 0. To see why, consider the upper thresholds. If p̄1 ≥ p̄2, then Δv(p̄2) ≤ 0 by value matching of v2 at p̄2; so Δv, strictly positive and increasing at p̲1 + ε and nonpositive at p̄2, has an interior global maximum. If instead p̄1 < p̄2, then by smooth pasting of v1 at p̄1 and strict convexity of v2 (a left-right reflection of the argument at p̲1), Δv is strictly decreasing at p̄1, and an interior global maximum again exists. In either case, since Δv(p̲1 + ε) > 0, we deduce the maximum Δv(p̂) > 0. But then Δv''(p̂) = Ψ2(p̂, v2(p̂)) - Ψ1(p̂, v1(p̂)) > 0 by (18), given Δv(p̂) > 0 just deduced. This violates the second order condition Δv''(p̂) ≤ 0 at an interior maximum. Hence p̲1 < p̲2, and p̄1 > p̄2 as before.

Step 2: Ordering Value Functions. Value matching, plus p̲1 < p̲2 and p̄1 > p̄2 just proven, jointly imply Δv(p̲2) < 0 and Δv(p̄2) < 0, so the claim obtains near the thresholds. Suppose that Δv(p) ≥ 0 at some p ∈ (p̲2, p̄2). Then Δv must attain a global non-negative maximum at some interior p̂ ∈ (p̲2, p̄2). This requires Δv(p̂) ≥ 0, Δv'(p̂) = 0, and Δv''(p̂) ≤ 0. But then Δv(p̂) ≥ 0 and (18) imply Δv''(p̂) = Ψ2(p̂, v2(p̂)) - Ψ1(p̂, v1(p̂)) > 0, yielding the same contradiction as just proven. □

• Proof of Proposition 6(c): Cost Convexity. Define Δc(n) = c2(n) - c1(n) for all n ≥ 0. Convexity of Δc and Δc(0) = 0 imply Δc'(n) ≥ Δc(n)/n > 0 for all n > 0, and thus g2(n) - g1(n) = nΔc'(n) - Δc(n) > 0, recalling g_i(n) = n c_i'(n) - c_i(n). Equivalently, we have g2(n) > g1(n); since each f_i = g_i^{-1} is increasing, also f2(w) < f1(w), i.e. 1/f2(w) > 1/f1(w), for all w > 0. Now put ξ_i(w) = c_i'(f_i(w)), so that ξ_i'(w) = 1/f_i(w). Since ξ2(0) ≥ ξ1(0) by assumption, ξ2(w) > ξ1(w) for all w > 0. [Or, if c_i(0) = 0, then f_i(0) = 0 for i = 1, 2, so that Δc'(0+) ≥ 0 implies ξ2(0) = c2'(f2(0)) ≥ c1'(f1(0)) = ξ1(0).] Thus, (18) holds: Ψ_i(·, v_i(·)) = ξ_i(r v_i(·))/Σ(·), so that ξ_i(·) increasing and ξ2(·) > ξ1(·) deliver the premise of Claim 13. This proves value and threshold monotonicity. Since v2 < v1, f2 < f1, and each f_i is increasing, we have n2(p) = f2(r v2(p)) < f2(r v1(p)) < f1(r v1(p)) = n1(p). □
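The objects in the cost-convexity comparison can be made concrete under an assumed quadratic family c_i(n) = a_i n²/2 (our illustrative choice, not the paper's specification). Then g_i(n) = n c_i'(n) - c_i(n) = a_i n²/2, its inverse is f_i(w) = sqrt(2w/a_i), and ξ_i(w) = c_i'(f_i(w)) = sqrt(2 a_i w), so the orderings g2 > g1, f2 < f1, ξ2 > ξ1 and the identity ξ_i'(w) = 1/f_i(w) used above can all be checked numerically:

```python
import math

def make_model(a):
    """Quadratic cost c(n) = a n^2 / 2 (assumed, for illustration): returns
    g(n) = n c'(n) - c(n), its increasing inverse f, and xi(w) = c'(f(w))."""
    c_prime = lambda n: a * n
    g = lambda n: a * n ** 2 / 2
    f = lambda w: math.sqrt(2 * w / a)
    xi = lambda w: c_prime(f(w))
    return g, f, xi

g1, f1, xi1 = make_model(1.0)  # model 1
g2, f2, xi2 = make_model(2.0)  # model 2: more convex cost, a2 > a1

for w in (0.1, 0.5, 2.0):
    assert abs(g1(f1(w)) - w) < 1e-12    # f1 really inverts g1
    assert f2(w) < f1(w)                 # g2 > g1 pointwise, so inverses flip
    assert xi2(w) > xi1(w)               # hence (18) holds, via Claim 13
    h = 1e-6                             # xi_i'(w) = 1/f_i(w), by finite differences
    assert abs((xi1(w + h) - xi1(w - h)) / (2 * h) - 1 / f1(w)) < 1e-4
```

The last assertion numerically confirms the derivative identity ξ_i'(w) = 1/f_i(w) that drives the ordering ξ2 > ξ1 in the proof.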
• Proof of Proposition 6(d): Information Quality. If Σ1(·) > Σ2(·), then 1/Σ1(·) < 1/Σ2(·). Here Ψ_i(·, v_i(·)) = ξ(r v_i(·))/Σ_i(·), so that ξ(·) increasing and 1/Σ1(·) < 1/Σ2(·) imply (18), and therefore the premise of Claim 13 holds: value and threshold monotonicity follow. Since f is unchanged, n_i(p) = f(r v_i(p)), and so n2(p) = f(r v2(p)) < f(r v1(p)) = n1(p): the experimentation level falls uniformly with Σ. □
Claim 14 (Proof of Proposition 6(e): Impatience) Given interest rates r2 > r1 > 0, let w_i(p) = r_i v_i(p) for i = 1, 2.
(a) The r2-thresholds are shifted in, with [p̲2, p̄2] ⊂ [p̲1, p̄1], and the r2-value v2 is lower at all points in its domain.
(b) There exists a possibly empty interval [q, q̄], strictly contained in [p̲2, p̄2], with n2(p) < n1(p) for all p ∈ [q, q̄], and n2(p) > n1(p) otherwise.

Proof of (a): Here Ψ_i(p, v_i(p)) = ξ(r_i v_i(p))/Σ(p). Since r2 > r1, all else equal, ξ increasing implies (18), and Claim 13 triggers the inward shift of thresholds and the fall in value. □

Proof of (b): Define the functions Δw(p), Δw'(p), Δw''(p), Δv(p), Δv'(p), and Δv''(p) with domain [p̲2, p̄2], where Δw = w2 - w1 and Δv = v2 - v1. By definition of w_i, each w_i is strictly convex and solves Σ(p) w_i''(p) = r_i ξ(w_i(p)), s.t. w_i(p̲i) = r_i π(p̲i) and w_i'(p̲i) = r_i π'(p̲i), and similarly at p̄i, by value matching and smooth pasting.

First, we show that Δw is strictly positive in a non-empty set I ⊂ (p̲2, p̄2). From (a), smooth pasting, and strict convexity of v1, Δv'(p̲2) < 0 < Δv'(p̄2). By continuity, Δv' is strictly increasing in some subset I ⊂ [p̲2, p̄2]. Hence, Δv''(p) = [ξ(w2(p)) - ξ(w1(p))]/Σ(p) > 0 on I, and so w2(p) > w1(p) and n2(p) > n1(p) there.

Next, Δw cannot have a local non-negative maximum in (p̲2, p̄2). By contradiction, suppose that Δw(p) ≥ 0 and 0 = Δw'(p) ≥ Δw''(p) for some p ∈ (p̲2, p̄2). Then p ∈ (p̲2, p̄2) ⊂ [0, 1] implies Σ(p) > 0; furthermore r2 > r1, ξ(·) is increasing, and Δw(p) ≥ 0, so that the familiar contradiction follows: Δw''(p) = [Σ(p)]^{-1}[r2 ξ(w2(p)) - r1 ξ(w1(p))] > 0 ≥ Δw''(p). Since the smooth function Δw is strictly positive in a non-empty set and cannot have a local non-negative maximum in [p̲2, p̄2], the set where Δw ≤ 0 is a possibly empty interval [q, q̄]: this is the complement set, where the folk result n2 < n1 obtains, as n_i(p) = f(w_i(p)) and f is increasing.

Finally, we specialize to the R&D payoff specification. Here, v2(p̲2) = 0, and so Δw(p̲2) = w2(p̲2) - w1(p̲2) = -w1(p̲2) < 0: Δw is initially strictly negative. Since Δw is strictly positive at some point below p̄2, it must become positive; and since it cannot have a local non-negative maximum, it cannot change sign twice. So Δw changes sign exactly once, going from negative to positive as we raise p. Hence, there is an interior cutoff p' ∈ (p̲2, p̄2) such that, as r rises, the experimentation level n(p) declines for all beliefs p < p', and rises for all p > p'. □
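The "familiar contradiction" above rests on a bare monotonicity fact: if r2 > r1 > 0, ξ is increasing with ξ(w) > 0 for w > 0, and w2 ≥ w1 > 0 at a putative non-negative maximum of Δw, then r2 ξ(w2) > r1 ξ(w1), forcing Δw'' > 0. A minimal numeric sweep, with an illustrative ξ of our own choosing, confirms the inequality:

```python
# Sign argument behind the contradiction in Claim 14(b): at a non-negative
# local maximum of Dw = w2 - w1 we would have w2 >= w1, and then
#   Dw'' = [r2*xi(w2) - r1*xi(w1)] / Sigma > 0,
# violating the second-order condition Dw'' <= 0.
xi = lambda w: w ** 0.5   # illustrative: any increasing xi with xi(w) > 0 for w > 0
r1, r2 = 0.05, 0.10       # impatience: r2 > r1

for w1 in (0.2, 0.7, 1.5):
    for dw in (0.0, 0.3, 1.0):   # Dw >= 0 at the putative maximum
        w2 = w1 + dw
        assert r2 * xi(w2) > r1 * xi(w1)
```

The inequality is strict as soon as w1 > 0, which is all the proof needs on the interior of (p̲2, p̄2).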
C.2 Vanishing / Exploding Convexity: Completion of Proof of Proposition 7

Assume n2 > n1, with λ_k → ∞. By Claim 1,

    g_k(n2) ≥ g_k(0) + ∫_0^{n2} dg_k(n') = 0 + ∫_0^{n2} λ_k n' dn' = λ_k (n2)²/2 → ∞

as k → ∞. Similarly, g_k(n2) - g_k(n1) ≥ n1 λ_k (n2 - n1) → ∞. Then n_k(p) = f_k(r v(p)) → 0 as k → ∞ when λ_k → ∞, since v is bounded while g_k explodes. The proof for λ_k → 0 is symmetric: there the inverse f_k of g_k explodes away from 0, and since v ≥ π > 0, n_k(p) = f_k(r v(p)) ≥ f_k(r π(p)) explodes. □
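Both limits can be illustrated with an assumed quadratic family c_k(n) = λ_k n²/2 (our choice, not the paper's general c_k), for which c_k''(n) = λ_k exactly, g_k(n) = λ_k n²/2, and f_k(w) = sqrt(2w/λ_k). At any fixed continuation value w = rv(p) > 0, the level n_k = f_k(w) collapses to 0 as λ_k → ∞ and explodes as λ_k → 0:

```python
import math

# Assumed quadratic family c_k(n) = lam * n^2 / 2 (illustration only):
# then g_k(n) = lam * n^2 / 2, whose increasing inverse is f_k below.
def f_k(w, lam):
    return math.sqrt(2 * w / lam)

w = 0.4  # any fixed positive continuation value rv(p)
exploding_convexity = [f_k(w, lam) for lam in (1.0, 10.0, 100.0, 1000.0)]  # lam_k -> inf
vanishing_convexity = [f_k(w, lam) for lam in (1.0, 0.1, 0.01, 0.001)]     # lam_k -> 0

assert all(a > b for a, b in zip(exploding_convexity, exploding_convexity[1:]))  # n_k -> 0
assert all(a < b for a, b in zip(vanishing_convexity, vanishing_convexity[1:]))  # n_k explodes
assert exploding_convexity[-1] < 0.05 and vanishing_convexity[-1] > 20
```

This is only the quadratic special case; the proof above handles any family with c_k'' ≥ λ_k (respectively, ≤ λ_k).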
References

Arrow, K., D. Blackwell, and M. Girshick (1949): "Bayes and Minimax Solutions of Sequential Design Problems," Econometrica, 17, 213-244.

Blackwell, D. (1953): "Equivalent Comparison of Experiments," Annals of Mathematical Statistics, 24, 265-272.

Bolton, P., and C. Harris (1993): "Strategic Experimentation," Nuffield College mimeo.

Brock, W. A., and A. G. Malliaris (1989): Differential Equations, Stability and Chaos in Dynamic Economics. North Holland, New York.

Chernoff, H. (1972): Sequential Analysis and Optimal Design, vol. 8 of Regional Conference Series in Applied Mathematics. Society for Industrial and Applied Mathematics, Philadelphia.

Cressie, N., and P. Morgan (1993): "The VPRT: A Sequential Testing Procedure Dominating the SPRT," Econometric Theory, 9, 431-450.

DiMasi, J., H. G. Grabowski, and J. Vernon (1995): "R&D Costs, Innovative Output and Firm Size in the Pharmaceutical Industry," International Journal of the Economics of Business, 2, 201-219.

Dutta, P. K. (1997): "Optimal Management of an R&D Budget," Journal of Economic Dynamics and Control, 21, 575-602.

Easley, D., and N. Kiefer (1988): "Controlling a Stochastic Process with Unknown Parameters," Econometrica, 56, 1045-1064.

Elsgolts, L. (1970): Differential Equations and Calculus of Variations. Mir Publishers, Moscow.

Kamien, M., and N. Schwartz (1971): "Expenditure Patterns for Risky R&D Projects," Journal of Applied Probability, 8, 60-73.

Karatzas, I., and S. E. Shreve (1991): Brownian Motion and Stochastic Calculus. Springer-Verlag, New York [KS91].

Karlin, S., and H. M. Taylor (1981): A Second Course in Stochastic Processes. Academic Press, San Diego [KT81].

Keller, G., and S. Rady (1997): "Optimal Experimentation in a Changing Environment," Stanford GSB Research Paper No. 1443.

Kihlstrom, R., L. Mirman, and A. Postlewaite (1984): "Experimental Consumption and the 'Rothschild Effect'," in Bayesian Models in Economic Theory, ed. by M. Boyer, and R. Kihlstrom, pp. 279-302. Elsevier Science Publishers, New York.

Krylov, N. V. (1980): Controlled Diffusion Processes. Springer-Verlag, New York.

Liptser, R., and A. N. Shiryayev (1977): Statistics of Random Processes: General Theory, vol. I. Springer-Verlag, New York [LS77].

Liptser, R., and A. N. Shiryayev (1978): Statistics of Random Processes: Applications, vol. II. Springer-Verlag, New York.

McLennan, A. (1984): "Price Dispersion and Incomplete Learning in the Long Run," Journal of Economic Dynamics and Control, 7, 331-347.

Neyman, J., and E. S. Pearson (1933): "On the Problem of the Most Efficient Tests of Statistical Hypotheses," Philosophical Transactions of the Royal Society, 231, 140-185.

Øksendal, B. (1995): Stochastic Differential Equations: An Introduction with Applications. Springer-Verlag, New York [O95].

Radner, R., and J. Stiglitz (1984): "A Nonconcavity in the Value of Information," in Bayesian Models in Economic Theory, ed. by M. Boyer, and R. Kihlstrom, pp. 33-52. Elsevier Science Publishers, New York.

Rockafellar, T. (1970): Convex Analysis. Princeton University Press, Princeton [R70].

Shiryayev, A. N. (1978): Optimal Stopping Rules. Springer-Verlag, New York [S78].

Trefler, D. (1993): "The Ignorant Monopolist: Optimal Learning with Endogenous Information," International Economic Review, 34, 565-581.

Wald, A. (1945a): "Sequential Method of Sampling for Deciding Between Two Courses of Action," Journal of the American Statistical Association, 40.

Wald, A. (1945b): "Sequential Tests of Statistical Hypotheses," Annals of Mathematical Statistics, 16.

Wald, A. (1947): Sequential Analysis. Wiley, New York.

Wald, A., and J. Wolfowitz (1948): "Optimum Character of the Sequential Probability Ratio Test," Annals of Mathematical Statistics, 19, 326-329.

Wallis, W. A. (1980): "The Statistical Research Group, 1942-45," The American Statistician, 75.