ON THE INFORMATIONAL STRUCTURE IN OPTIMAL DYNAMIC STOCHASTIC CONTROL

MATIJA VIDMAR AND SAUL JACKA
Abstract. We formulate a massively general framework for optimal dynamic stochastic control
problems which allows for a control-dependent informational structure. The issue of informational
consistency is brought to light and investigated. Bellman’s principle is formulated and proved. In
a series of related results, we expound on the informational structure in the context of (completed)
natural filtrations of (stochastic) processes.
1. Introduction: motivation – literature overview – structure of paper &
informal summary of results
Optimal dynamic stochastic control is (i) stochastic, in the sense that the output of the system
is random; it is (ii) optimal control, to the extent that, with the goal of optimizing its expectation,
said output is subject to exogenous influence; and it is (iii) dynamic, in that this control, at any
one instant of time, is adapted to the current and past state of the system. In general, however,
the controller in a dynamic stochastic control problem can observe only some part of all of the
“universal” information which is being accumulated (e.g. he may only be able to observe the
controlled process, or, worse, some (non one-to-one) function thereof). Moreover, this “observed
information” may depend on the chosen control.
It is therefore only reasonable to insist a priori that all the admissible controls (as processes) be
adapted (or even predictable with respect) to what is thus a control-dependent informational flow
being acquired by the controller. And while not doing so may emerge as having been (in some
sense) immaterial a posteriori (e.g. because the optimal control turned out to be a function only
of the present (and past) state of an observable process), this will, generally speaking, merely have
happened to be the case, rather than having needed to be so – and the latter, presumably, is
the preferred of the two alternatives.
Some (informal) examples follow.
2010 Mathematics Subject Classification. Primary: 93E20; Secondary: 60G05, 60A10, 28A05.
Key words and phrases. Optimal dynamic stochastic control; informational structure; stopped (completed) natural
filtrations and (completed) natural filtrations at stopping times; Bellman’s principle.
This research was mainly conducted while MV was a PhD student in the Department of Statistics at the University
of Warwick, and a recipient of a scholarship from the Slovene Human Resources Development and Scholarship Fund
(contract number 11010-543/2011). MV acknowledges this latter support and is grateful for the hospitality of the former.
Special mention goes to Jon Warren, for many helpful discussions on some of the material from Part 2 of this paper.
(•) Job hunting. Consider an academic trying to optimize his choice of current institution.
The decision of whether or not to move, and where to move, will be based (in particular) on the
knowledge of the quality of the research and of the amenities of the various universities/institutes
that he can choose from. This knowledge itself will depend (at least partially) on the chosen
sequence of institutions that he has already affiliated himself with (indeed, since the quality of the
institutions changes over time, it will depend on it in a temporally local way – only once he has
chosen to move and has spent some time at his new location, will he be able to properly judge the
eventual ramifications of his choice). Costs are associated with moving, but also with staying for
too long at an institution, which hinders his academic development; rewards are associated with
the quality of the chosen institution; further, more or less resources can be devoted to determining
the quality of institutions before a (potential) move. (See Section 5 for a toy model reminiscent of
this situation.)
(•) Or consider quality control in a production line of light-bulbs. A fault with the machine
may cause all the bulbs from some time onwards to be faulty. Once this has been detected, the
machine can of course be fixed and the production of functioning bulbs restored. However, only the
condition of those light-bulbs which are taken out of the production line, and tested, can actually
be observed. A cost is associated with this process of testing (but, clearly, also with issuing faulty
light-bulbs). Conversely, rewards accrue from producing functioning bulbs.
(•) From the theory of controlled SDEs, a folklore example of loss of information is Tanaka’s
SDE: let X be a Brownian motion, W_t := ∫_0^t sgn(X_s) dX_s, t ∈ [0, ∞). Then the completed natural
filtration of W is strictly included in the completed natural filtration of X [11, p. 302].
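The following minimal simulation sketch (our own illustration, not from the paper) makes the loss of information tangible: the Tanaka integral computed from the path of −X coincides, path by path, with the one computed from X, so an observer of W cannot recover the sign of X. All numerical choices are ours.

```python
import numpy as np

# Euler approximation of W_t = \int_0^t sgn(X_s) dX_s for a Brownian path X,
# with the convention sgn(0) = 0 (the convention at 0 is immaterial for the
# illustration, since X spends Lebesgue-null time at 0).
rng = np.random.default_rng(0)
n, T = 100_000, 1.0
dX = rng.normal(0.0, np.sqrt(T / n), size=n)
X = np.concatenate([[0.0], np.cumsum(dX)])

def tanaka_W(path):
    return np.concatenate([[0.0], np.cumsum(np.sign(path[:-1]) * np.diff(path))])

# The observed path W is pathwise invariant under the sign flip X -> -X,
# although X and -X are perfectly distinguishable to a full observer.
print(np.max(np.abs(tanaka_W(X) - tanaka_W(-X))))  # exactly 0.0
```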
(•) In economics the so-called class of bandit models is well-known (see the excellent literature
overview in [8]). These are “sequential decision problems where, at each stage, a resource like time,
effort or money has to be allocated strategically between several options, referred to as the arms
of the bandit.” And further: “The key idea in this class of models is that agents face a tradeoff between experimentation (gathering information on the returns to each arm) and exploitation
(choosing the arm with the highest expected value).” [8, p. 2].
(•) Miscellaneous other situations in which the control is non-trivially, and naturally, adapted, or
even previsible, with respect to an informational flow, which it itself influences and helps engender,
abound: controlling the movement of a probe in a stochastic field, only the local values of the field
being observable, and costs/rewards associated with the speed of movement/intensity of the field
(cf. Example 2.11); a similar situation for movement on a random graph, being only able to observe
the values attached to the vertices visited; additively controlling a process, but observing the sum
of the process itself and of the control etc.
In short, the phenomenon is ubiquitous; we suggest it is the norm, rather than the exception.
In particular, then, one should like a general framework for stochastic control (equipped with
a suitable abstract version of Bellman’s principle) which makes such a control-dependent informational flow explicit and inherent in its machinery. The (seeming) circularity of the controls being
adapted to an informational structure which they themselves help engender makes this a somewhat delicate point. Indeed, it would seem, this is the one aspect of general (abstract) dynamic
stochastic control which has not yet received due attention in existing literature. Hitherto, only
a single, non-control dependent, observable [5] informational flow [15, 14] appears to have been
allowed. (This is of course not to say that the phenomenon has not entered and been studied in
the literature in specific situations/problems, see e.g. [6, 8, 7, 16] [1, Chapter 8] and the references therein; the focus there having been (for the most part) on reducing (i.e. a priori proving a
suitable equivalence of) the original control problem, which is based on partial control-dependent
observation, to an associated ‘separated ’ problem, which is based on complete observation.)
In the present paper we attempt to fill this gap in the literature, by putting forward a general
stochastic control framework which explicitly allows for a control-dependent informational flow – as
it were ‘embracing it’, rather than trying to circumvent it in one manner or another. (Recognizing
that it may not always be advantageous, or indeed possible, to work with an equivalent (but more
complex) ‘separated’ formulation.) In particular, we provide a fully general (modulo the relevant
(technical) condition, see Assumption 4.3) abstract version of Bellman’s principle in such a setting.
This is the content of Part 1. Specifically, Section 2 formally defines a system of stochastic control
(in which observed information is an explicit function of control); Section 3 discusses its conditional
payoff and ‘Bellman’ system; Section 4 formulates Bellman’s principle – Theorem 4.6 is our main
result. Lastly, Section 5 contains the solution to a formal (but artificial) example, illustrating some
of the main ideas of this paper; several other (counter)examples are also given along the way.
Now, a crucial requirement for the above programme to be successful is that of informational
consistency over controls (cf. Assumption 2.9): if two controls agree up to a certain time, then
what we have observed up to that time should also agree. Especially at the level of random (stopping) times, this becomes a non-trivial statement – for example, when the observed information
is that generated by a (controlled) process, which is often the case. We expound on this issue of
informational consistency in the context of (completed) natural filtrations of processes in Part 2.
Specifically, we consider there, amongst other relevant questions, the following natural and pertinent one, which is interesting in its own right: if X and Y are two processes, and S a stopping time of
one or both of their (possibly completed) natural filtrations, with the stopped processes agreeing,
X^S = Y^S (possibly only with probability one), must the two (completed) natural filtrations at
the time S agree also? Answering this question (with proofs) is non-trivial in the temporally non-discrete case, and several related findings are obtained along the way (see the introductory remarks
to Part 2 for a more detailed account). In essence they are (consequences of/connected
with) a generalization (Theorem 6.6) of (a part of) Galmarino’s test, available in the literature for
coordinate processes on canonical spaces [4, p. 149, Theorem IV.100] [11, p. 320, Lemma 4.18],
and extended here to a not necessarily canonical setting.
Conventions. Throughout this paper, for a probability measure P on Ω and A ⊂ Ω, some
property in ω ∈ A will be said to hold P-a.s. on A, if the set of ω ∈ A for which the property does
not hold is first measurable (i.e. belongs to the domain of P), and second is of P-measure zero.
When A = Ω, we shall of course just say that the property holds P-a.s. Finally, for L ⊂ 2^Ω and
A ⊂ Ω, L|_A := {L ∩ A : L ∈ L} is the trace of L on A.
Part 1. Optimal dynamic stochastic control with control-dependent information
As announced in the Introduction, we provide and analyze, in this part, a framework for optimal
dynamic stochastic control, in which information is explicitly control-dependent. The informational
flow itself is modeled using filtrations, and this can be done in one of the following two, essentially
different, ways:
(1) Dealing with events ‘with certainty’, irrespective of the presence of probability.
(2) Dealing with events ‘up to a.s. equality’, insisting that the filtrations be complete relative
to the underlying probability measure(s).
We develop the second ‘probabilistic’ approach – of complete filtrations – in parallel to the default
first – for lack of a better word, ‘measure-theoretic’ – setting. Indeed, the formal differences between
the two approaches are minor. For the most part one has merely to add, in the ‘complete’ setting,
a number of a.s. qualifiers. We will put these, and any other eventual differences of the second
approach as compared to the first, in {} braces. This will enforce a strict separation between the
two settings, while still allowing us to repeat ourselves as little as possible.
2. Stochastic control systems
We begin by specifying the formal ingredients of a system of optimal dynamic stochastic control.
Setting 2.1 (Stochastic control system). A stochastic control system consists of:
(i) A set T with a linear (antisymmetric, transitive & total) ordering ≤. We will assume (for
simplicity) either T = N0 , or else T = [0, ∞), with the usual order. T is the time set.
(ii) A set C. The set of admissible controls. (These might be {equivalence classes of} processes
or stopping times, or something different altogether.)
(iii) A set Ω endowed with a collection of σ-algebras (F^c)_{c∈C}. Ω is the sample space and F^c
is all the information accumulated (but not necessarily acquired by the controller) by the
“end of time” or, possibly, by a “terminal time”, when c is the chosen control. For example,
in the case of optimal stopping, when there is given a process X, and it is stopped, the set
of controls C would be the {equivalence classes of the} stopping times of the {completed}
natural filtration of X, and for any S ∈ C, F^S = σ(X^S), the σ-field generated by the
stopped process {or its completion}. We understand here optimal stopping in the strict
sense: the exogenous act is that of stopping, not sampling; after the process has been
stopped, it ceases to change.
(iv) (P^c)_{c∈C}, a collection of {complete} probability measures, each P^c having domain which
includes the {P^c-complete} σ-field F^c (for c ∈ C). The controller chooses a probability
measure from the collection (P^c)_{c∈C}. This allows for incorporation of the Girsanov approach
to control, wherein the controller is seen as affecting the probability measure, rather than
the random payoff. From the point of view of information, being concerned with laws rather
than random elements, it is of course somewhat unnatural. Nevertheless, we will formally
allow for it – it costs us nothing.
(v) A function J : C → [−∞, +∞]^Ω, each J(c) being F^c-measurable (as c runs over C) {and
defined up to P^c-a.s. equality}. We further insist that E^{P^c} J(c)^− < ∞ for all c ∈ C. Given the
control c ∈ C, J(c) is the random payoff. Hence, in general, we allow both the payoff, as
well as the probability law, to vary.
(vi) A collection of filtrations¹ (G^c)_{c∈C} on Ω. It is assumed that G^c_∞ := ∨_{t∈T} G^c_t ⊂ F^c, and (for
simplicity) that G^c_0 is P^c-trivial (for all c ∈ C) {and contains all the P^c-null sets}, while
G^c_0 = G^d_0 {i.e. the null sets for P^c and P^d are the same} and P^c|_{G^c_0} = P^d|_{G^d_0} for all {c, d} ⊂ C.
G^c_t is the information acquired by the controller by time t ∈ T, if the control chosen is
c ∈ C (e.g. G^c may be the {completed} natural filtration of an observable process X^c which
depends on c). Perfect recollection is thus assumed.
Definition 2.2 (Optimal expected payoff). We define v := sup_{c∈C} E^{P^c} J(c) (sup ∅ := −∞), the
optimal expected payoff. Next, c ∈ C is said to be optimal if E^{P^c} J(c) = v. Finally, a C-valued
net (c_γ)_γ is said to be optimizing if E^{P^{c_γ}} J(c_γ) → v.
Remark 2.3.
(1) It is, in some sense, no restriction to have assumed the integrability of the negative parts
of J in Setting 2.1(v). For, allowing any extra controls c for which E^{P^c} J(c)^− = ∞, but
for which E^{P^c} J(c) would still be defined, would not change the value of v (albeit it could
change whether or not C is empty, but this is a trivial consideration).
(2) It is not natural a priori to insist on each J(c) being G^c_∞-measurable (for c ∈ C). The
outcome of our controlled experiment need not be known to us (the controller) at all – not
even by the end of time; all we are concerned with is the maximization of its expectation.
(3) In the case C is a collection of processes, the natural requirement is for each such process
c ∈ C to be adapted (perhaps even previsible with respect) to G^c. If it is a collection of
random times, then each such c ∈ C should presumably be a (possibly predictable) stopping
time of G^c. But we do not formally insist on this.
¹All filtrations will be assumed to have the parameter set T.
We now introduce the concept of a controlled time, a (we would argue, natural) generalization
of the notion of a stopping time to the setting of control-dependent filtrations.
Definition 2.4 (Controlled times). A collection of random times S = (S^c)_{c∈C} is called a controlled time, if S^c is a {defined up to P^c-a.s. equality} stopping time of G^c for every c ∈ C.
Example 2.5. A typical situation to have in mind is the following. What is observed is a process
X^c, its values being contingent on the chosen control c (this may, but need not, be the controlled
process; e.g. it might be some non one-to-one function of it). Then G^c is the {completed} natural
filtration of X^c. Letting, for example, for each c ∈ C, S^c be the first entrance time of X^c into some
fixed set, the collection (S^c)_{c∈C} would constitute a controlled time (as long as one can formally
establish the stopping time property).
Definition 2.6 (Deterministic and control-constant times). If there is some a ∈ T ∪ {∞}, such
that S^c(ω) = a for {P^c-almost} all ω ∈ Ω, and every c ∈ C, then S is called a deterministic time.
More generally, if there is a random time S, which is a stopping time of G^c with S^c = S {P^c-a.s.}
for each c ∈ C, then S is called a control-constant time.
As yet, C is an entirely abstract set with no dynamic structure attached to it. The following
establishes this structure. The reader should think of D(c, S) as being the controls “agreeing {a.s.}
with c up to time S”. (Example 2.11 and Section 5 contain definitions of the collections D(c, S) in
the (specific) situations described there.)
Setting 2.7 (Stochastic control system (cont’d) – control dynamics). There is given a collection
G of controlled times. Further, adjoined to the stochastic control system of Setting 2.1, is a family
(D(c, S))_{(c,S)∈C×G} of subsets of C for which:
(1) c ∈ D(c, S) for all (c, S) ∈ C × G.
(2) For all S ∈ G and {c, d} ⊂ C, d ∈ D(c, S) implies S^c = S^d {P^c- & P^d-a.s.}.
(3) If {S, T} ⊂ G, c ∈ C and S^c = T^c {P^c-a.s.}, then D(c, S) = D(c, T).
(4) If {S, T} ⊂ G and c ∈ C for which S^d ≤ T^d {P^d-a.s.} for d ∈ D(c, T), then D(c, T) ⊂ D(c, S).
(5) For each S ∈ G, {D(c, S) : c ∈ C} is a partition of C.
(6) For all (c, S) ∈ C × G: D(c, S) = {c} (resp. D(c, S) = C), if S^c is identically {or P^c-a.s.}
equal to ∞ (resp. 0).²
Definition 2.8. Pursuant to Setting 2.7(5), we write ∼_S for the equivalence relation induced by
the partition {D(c, S) : c ∈ C}.
²This is not really a restriction; see Remark 4.4(ii).
Using this dynamical structure, a natural assumption on the temporal consistency of the filtrations (G^c)_{c∈C} and the measures (P^c)_{c∈C} – indeed a key condition on whose validity we shall insist
throughout – is as follows:
Assumption 2.9 (Temporal consistency). For all {c, d} ⊂ C and S ∈ G satisfying c ∼_S d, we
have G^c_{S^c} = G^d_{S^d} and P^c|_{G^c_{S^c}} = P^d|_{G^d_{S^d}}.
Several remarks are now in order.
(•) First, when S is a non-control-constant time (e.g. when S^c is the first entrance time into some
fixed set of an observed controlled process X^c, as c runs over C), then already the provisions of
Setting 2.7 (leaving aside, for the moment, Assumption 2.9) are far from being entirely innocuous,
viz. condition Setting 2.7(2) (which, e.g., would then be saying that controls agreeing with c up
to the first entrance time S^c of the observed controlled process X^c will actually leave the latter
invariant). They are thus as much a restriction/consistency requirement on the family D as they are
on which controlled times we can put into the collection G. Put differently, G is not (necessarily)
a completely arbitrary, if non-specified, collection of controlled times. For, a controlled time is just
any family of G^c-stopping times, as c runs over the control set C. The members of G, however,
enjoy the further property of “agreeing between two controls, if the latter coincide prior to them”.
This is of course trivially satisfied for deterministic times (and, more generally, control-constant
stopping times), but may hold of other controlled times as well.
(•) Second, the choice of the family G is guided by the specific problem at hand: not all controlled times are of interest. For example, sometimes the deterministic times may be relevant, the
others not. On the other hand, it may be possible to effect the act of “controlling” only at some
collection of (possibly non-control-constant) stopping times – then these times may be particularly
worthy of study. The following example illustrates this point already in the control-independent
informational setting (anticipating somewhat certain concepts, like the Bellman system, and conditional optimality, which we have not yet formally introduced; the reader might return to it once
he has studied Sections 3 and 4).
Example 2.10. Given: a probability space (Ω, F, P); on it, an N_0-valued càdlàg Poisson process
N of unit intensity with arrival times (S_n)_{n∈N_0}, S_0 := 0, S_n < ∞ for all n ∈ N; and an independent
i.i.d. sequence of random signs (R_n)_{n∈N_0} with values in {−1, +1}, P(R_n = +1) = 1 − P(R_n = −1) =
2/3.
The “observed process” is
W := N + ∫_0^· Σ_{n∈N_0} R_n 1_{[S_n, S_{n+1})} dLeb
(so to N is added a drift of R_n during the random time interval [S_n, S_{n+1}), n ≥ 0). Let G be the
natural filtration of W. Remark that the arrival times of N are stopping times of G.
The set of controls C, on the other hand, consists of real-valued, measurable processes, starting
at 0, which are adapted to the natural filtration of the bivariate process (W 1_{{∆N ≠ 0}}, N) (where
∆N is the jump process of N; intuitively, we must decide on the strategy for the whole of [S_n, S_{n+1})
based on the information available at time S_n already, n ≥ 0). For X ∈ C consider the penalty
functional
J(X) := ∫_{[0,∞)} e^{−αt} 1_{(0,∞)} ∘ |X_t − W_t| dt
(continuous penalization, with discounting rate α ∈ (0, +∞), of any deviation of the control X
from the process W). Let v := inf_{X∈C} E J(X) be the optimal expected penalty; clearly an optimal
control is the process X̂ which takes the value of W at the instances which are the arrival times of
N and assumes a drift of +1 in between those instances, so that v = 1/(3α). Next, for X ∈ C, let
V^X_S := P-essinf_{Y∈C, Y^S = X^S} E[J(Y)|G_S],  S a stopping time of G,
be the Bellman system. We shall say Y ∈ C is conditionally admissible at time S for the control
X (resp. conditionally optimal at time S), if Y^S = X^S (resp. V^Y_S = E[J(Y)|G_S] P-a.s.). Denote
V := V^{X̂} for short.
(1) We maintain first that the process (V_t)_{t∈[0,∞)} (the Bellman process (i.e. system at the
deterministic times) for the optimal control) is not mean-nondecreasing (in particular, is not a
submartingale, let alone a martingale, with respect to G) and admits no a.s. right-continuous
version.
For, V_0 = v; while for t ∈ (0, ∞), the following control, denoted X^⋆, is, apart from X̂, also
conditionally admissible at time t for X̂: it assumes the value of W at the instances of the arrival
times of N, and a drift of +1 in between those instances, up to (and inclusive of) time t; strictly
after time t and until strictly before the first arrival time of N which is ≥ t, denoted S_t, it takes
the values of the process which starts at the value of W at the last arrival time of N strictly
before t and assumes a drift of −1 thereafter; and after (and inclusive of) the instance S_t, it resumes
taking the values of W at the arrival times of N and a drift of +1 in between those times.
Notice also that R_t 1(t is not an arrival time of N) ∈ G_t, where R_t := Σ_{n∈N_0} R_n 1_{[S_n, S_{n+1})}(t); i.e.
R_t 1(t is not an arrival time of N) is the drift at time t on the (almost certain) event that t is not
an arrival time of N, and zero otherwise. It follows that, since X̂ is conditionally admissible for X̂ at
time t:
V_t ≤ E[J(X̂)|G_t],
so E V_t 1_{{R_t = +1}} ≤ E J(X̂) 1_{{R_t = +1}}; whereas, since X^⋆ is also conditionally admissible at time t for
X̂:
V_t ≤ E[J(X^⋆)|G_t],
so
E V_t 1_{{R_t = −1}} ≤ E J(X^⋆) 1_{{R_t = −1}} = E J(X̂) 1_{{R_t = −1}} − E[1_{{R_t = −1}} ∫_{(t,S_t)} e^{−αs} ds] = E J(X̂) 1_{{R_t = −1}} − (1/α) e^{−αt} (1 − 1/(1+α)) (1/3)
(properties of marked Poisson processes). Summing the two
inequalities we obtain
E V_t ≤ v − e^{−αt}/(3(1 + α)),
implying the desired conclusion (for the nonexistence of a right-continuous version, assume the
converse and reach a contradiction via uniform integrability).
(2) We maintain second that the process (V^X_{S_n})_{n∈N_0} is, however, a discrete-time submartingale
(and a martingale when X = X̂) with respect to (G_{S_n})_{n∈N_0}, for all X ∈ C.
For X = X̂, this follows at once from the obvious observation that X̂ is conditionally optimal
at each of the arrival instances of N. On the other hand, for arbitrary X ∈ C, n ∈ N_0, G ∈ G_{S_n},
and {Y, Z} ⊂ C with Y^{S_n} = X^{S_n} = Z^{S_n}, the control which coincides with Y (hence Z) on [0, S_n]
and then with Y (resp. Z) on (resp. the complement of) G strictly after S_n, is conditionally
admissible at time S_n for X. The desired conclusion then follows through a general argument, see
Theorem 4.6. Specifically, one finds that the family {E[J(Y)|G_{S_n}] : Y ∈ C, Y^{S_n} = X^{S_n}} is directed
downwards for each n ∈ N_0 (cf. proof of Proposition 4.2), hence one can apply Lemma A.3 (cf. proofs
of Proposition 4.5 and Theorem 4.6).
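A short Monte Carlo sketch (our own illustration, not part of the paper) can corroborate the value v = 1/(3α) attained by X̂: a penalty accrues exactly on those inter-arrival intervals of N on which the drift sign is −1 (probability 1/3), so only the arrival times and signs need to be simulated. The horizon and all numerical choices below are assumptions.

```python
import numpy as np

# Monte Carlo check of v = 1/(3*alpha) for Example 2.10: under Xhat, the
# deviation indicator 1(Xhat_t != W_t) equals 1 precisely on inter-arrival
# intervals [S_n, S_{n+1}) with R_n = -1, so
# J(Xhat) = sum over those intervals of \int e^{-alpha t} dt.
rng = np.random.default_rng(1)
alpha, horizon, n_paths = 1.0, 60.0, 20_000

total = 0.0
for _ in range(n_paths):
    t, J = 0.0, 0.0
    while t < horizon:
        gap = rng.exponential(1.0)        # unit-intensity Poisson arrivals
        if rng.random() < 1.0 / 3.0:      # P(R_n = -1) = 1/3
            J += (np.exp(-alpha * t) - np.exp(-alpha * min(t + gap, horizon))) / alpha
        t += gap
    total += J

print(total / n_paths, 1.0 / (3.0 * alpha))  # both approximately 1/3
```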
In light of this example, it is important to note that it will not matter to our general analysis
which controlled times are actually put into G, as long as the explicit provisions that we (will,
viz. Assumption 4.3) have made are in fact met. This generality allows one to work with/choose, in a
given specific situation, such a family G as can be/is most informative of the problem.
(•) Finally, as already remarked, a typical example of an observed filtration is that of an observed
process: for c ∈ C, X^c is a process whose values (in some measurable space) we can observe,
and G^c_t := σ(X^c|_{[0,t]}) = σ(X^c_s : s ∈ [0, t]), t ∈ T, is the natural filtration of X^c {or possibly
its P^c-completion}. Let c and d be two controls agreeing up to a controlled time S, c ∼_S d.
Then, presumably, X^c and X^d do also, i.e. (X^c)^{S^c} = (X^d)^{S^d} {P^c-a.s. and P^d-a.s.} (where, a
priori, S^c = S^d {P^c-a.s. and P^d-a.s.}), and hence we should like to have (viz. Assumption 2.9)
G^c_{S^c} = G^d_{S^d}. In other words, abstracting only slightly, and formulated without the unnecessary
stochastic-control picture in the background, the following is a natural, and an extremely important,
question. Suppose X and Y are two processes, defined on the same sample space, and with values in
the same measurable space; S a stopping time of both (or possibly just one) of their {completed}
natural filtrations. Suppose furthermore that the stopped processes agree, X^S = Y^S {with probability
one}. Must we have F^X_S = F^Y_S {F̄^X_S = F̄^Y_S} for the {completed} natural filtrations F^X and F^Y
{F̄^X and F̄^Y} of X and Y? Intuitively: yes, of course (at least when there are no completions in
play). Formally, in the non-discrete case, it is not so straightforward. We obtain partial answers
in Part 2.
We conclude this section with a rather general example illustrating the concepts introduced
thus far, focusing on the control-dependent informational flows, and with explicit references made
to Settings 2.1 and 2.7.
Example 2.11. The time set is [0, ∞) (Setting 2.1(i); T = [0, ∞)). Given are: a probability space
(Ω, F, P) (Settings 2.1(iii) and 2.1(iv); F^c = F and P^c = P for each c, Ω is itself); an open subset
O ⊂ R^d of Euclidean space; and a random (time-dependent) field (R^o)_{o∈O} – each R^o_t being an
F-measurable random variable, and the random map ((o, t) ↦ R^o_t) being assumed continuous from
O × [0, ∞) into R /so that the map ((ω, o, t) ↦ R^o_t(ω)) is automatically F ⊗ B(O) ⊗ B([0, ∞))/B(R)-measurable³/. Think of, for example, the local times of a Markov process, the Brownian sheet,
solutions to SPDEs [3], etc.
Now, the idea is to control the movement in such a random field, observing only the values of
the field at the current space-time point determined by the control c. Rewards accrue as a function
of the value of the field at the location of the control; penalized is the speed of movement.
To make this formal, fix a discount factor α ∈ (0, ∞), an initial point o_0 ∈ O, a measurable
reward function f : R → R and a nondecreasing penalty function g : [0, ∞) → R.
The controls (members of C of Setting 2.1(ii)) are then specified as being precisely all the
O-valued, F ⊗ B([0, ∞))/B(O)-measurable, differentiable (from the right at zero) processes c,
starting at o_0 (i.e. with c_0 = o_0), adapted to the natural filtration (denoted G^c; as in Setting 2.1(vi)) of the observed process R^c := (R^{c_t}_t)_{t∈[0,∞)}, and satisfying (cf. the definition of J in the paragraph following)
E ∫_0^∞ e^{−αt} [f^− ∘ R^{c_t}_t + g^+ ∘ |ċ_t|] dt < ∞. (Clearly the observed information G^c depends in a highly
non-trivial way on the chosen control.)
Next, the payoff functional J from Setting 2.1(v) is given as:
J(c) := ∫_0^∞ e^{−αt} [f ∘ R^{c_t}_t − g ∘ |ċ_t|] dt,  c ∈ C.
Finally, with regard to Setting 2.7, define for any c ∈ C and controlled time S,
D(c, S) := {d ∈ C : d^{S^c} = c^{S^c}},
and then let
G := {controlled times S such that ∀c∀d(d ∈ D(c, S) ⇒ S^c = S^d & G^c_{S^c} = G^d_{S^d})}.
We will indeed see (Corollary 6.7) that G = {controlled times S such that ∀c∀d(d^{S^c} = c^{S^c} ⇒ S^c =
S^d)}, as long as (Ω, F) is Blackwell (which can typically be taken to be the case). Regardless of
whether or not (Ω, F) is in fact Blackwell, however, all the provisions of Settings 2.1 and 2.7, as
well as those of Assumption 2.9, are in fact met.
³For simplicity (so as not to be preoccupied with technical issues) we make all the processes in this example
continuous.
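A discrete-time toy sketch (our own illustration; the field, the run-and-tumble policy and all parameters are assumptions made purely for concreteness) may help fix ideas: the controller only ever sees the field value at its own space-time location, and its motion – hence its future observations – depends on what it has seen so far.

```python
import numpy as np

# Movement in a random field where only the field value at the controller's
# current space-time point is observed (a crude discretization of Example 2.11).
rng = np.random.default_rng(2)

# Smooth random field R(o, t) on O = R^2, built from random cosine features.
k = rng.normal(size=(8, 3))              # frequencies in (o_1, o_2, t)
phase = rng.uniform(0, 2 * np.pi, 8)
def field(o, t):
    return np.cos(np.concatenate([o, [t]]) @ k.T + phase).sum()

alpha, dt, speed = 0.5, 0.01, 1.0
f = lambda r: r                          # reward: the field value itself
g = lambda s: 0.1 * s ** 2               # penalty on the speed of movement

o = np.zeros(2)                          # initial point o_0
direction = np.array([1.0, 0.0])
J, last_obs = 0.0, field(o, 0.0)
for i in range(1, 2000):
    t = i * dt
    obs = field(o, t)                    # the only observable quantity
    if obs < last_obs:                   # "tumble" when the observed value drops
        theta = rng.uniform(0, 2 * np.pi)
        direction = np.array([np.cos(theta), np.sin(theta)])
    last_obs = obs
    J += np.exp(-alpha * t) * (f(obs) - g(speed)) * dt
    o = o + speed * direction * dt       # the control is adapted to observations

print("realized payoff J(c) along one path:", J)
```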
3. The conditional payoff and the Bellman system
Definition 3.1 (Conditional payoff & Bellman system). We define for c ∈ C and S ∈ G:
J(c, S) := E^{P^c}[J(c)|G^c_{S^c}], and then V(c, S) := P^c|_{G^c_{S^c}}-esssup_{d∈D(c,S)} J(d, S);
and say c ∈ C is conditionally optimal at S ∈ G, if V(c, S) = J(c, S) P^c-a.s. (J(c, S))_{(c,S)∈C×G}
is called the conditional payoff system and (V(c, S))_{(c,S)∈C×G} the Bellman system.
Remark 3.2.
(i) Thanks to Assumption 2.9, the essential suprema appearing in Definition 3.1 are well-defined (up to the relevant a.s. equalities).
(ii) Also, thanks to Setting 2.7(3), V(c, S) depends on S only through S^c, in the sense that
V(c, S) = V(c, T) as soon as S^c = T^c {P^c-a.s.}. Clearly the same holds true (trivially) of
the system J.
Some further properties of the systems V and J follow. First,
Proposition 3.3. V(c, S) is G^c_{S^c}-measurable and its negative part is P^c-integrable for each (c, S) ∈
C × G. Moreover, if c ∼_S d, then V(c, S) = V(d, S) P^c-a.s. and P^d-a.s.
Proof. The appropriate measurability of V(c, S) follows from its definition. Moreover, since each
D(c, S) is non-empty, the integrability condition on the negative parts of V is also immediate (from
the assumed integrability of the negative parts of J). Finally, the last claim follows from the fact
that D(c, S) = D(d, S) (partitioning property) and P^c|_{G^c_{S^c}} = P^d|_{G^d_{S^d}} (consistency), when c ∼_S d. □
Second, Proposition 3.9 will (i) establish that in fact (J(c, S))_{(c,S)∈C×G} is a (C, G)-system in
the sense of the definition which follows, and (ii) give sufficient conditions for the P^c-a.s.
equality J(c, S) = J(d, S) to obtain on an event A ∈ G^c_{S^c}, when c ∼_S d (addressing the situation
when the two controls c and d agree “for all times” on A). Some auxiliary definitions and results
are needed to this end; they precede Proposition 3.9.
Definition 3.4 ((C, G)-system). A collection X = (X(c, T))_{(c,T)∈C×G} of functions from
[−∞, +∞]^Ω is a (C, G)-system, if (i) X(c, T) is G^c_{T^c}-measurable for all (c, T) ∈ C × G and
(ii) X(c, S) = X(c, T) P^c-a.s. on the event {S^c = T^c}, for all c ∈ C and {S, T} ⊂ G.
Definition 3.5 (Times accessing infinity). For a sequence (tn )n∈N of elements of T , we say it
accesses infinity, if for all t ∈ T , there exists an n ∈ N with t ≤ tn .
Lemma 3.6. Suppose H is a {P-complete} filtration on Ω {P being a complete probability measure}
and (S_n)_{n≥1} a sequence of its stopping times {each defined up to P-a.s. equality}, which accesses
infinity pointwise {or P-a.s.} on A ⊂ Ω, i.e. (S_n(ω))_{n∈N} accesses infinity for {P-almost} every
ω ∈ A. Then H_∞|_A = ∨_{n∈N} H_{S_n}|_A.
Proof. The inclusion ⊃ is manifest. For the reverse inclusion, let t ∈ T and B ∈ H_t (noting that
for any L ⊂ 2^Ω and A ⊂ Ω, σ_Ω(L)|_A = σ_A(L|_A)). Then {P-a.s.} B ∩ A = ∪_{n=1}^∞ (B ∩ {S_n ≥ t}) ∩ A,
with B ∩ {S_n ≥ t} ∈ H_{S_n}. □
Lemma 3.7. Let {c, d} ⊂ C and A ⊂ Ω. Let furthermore (S_n)_{n∈N} be a sequence in G accessing
infinity {a.s.} on A for the controls c and d (i.e. (S_n^h(ω))_{n∈N} accesses infinity for {P^h-almost}
every ω ∈ A, for each h ∈ {c, d}), and for which c ∼_{S_n} d for each n ∈ N. Then G^c_∞|_A = G^d_∞|_A.
If, further, A ∈ G^c_{S_n^c} for all n ∈ N, and the sequence (S_n)_{n∈N} is {a.s.} nondecreasing on A and
for the controls c and d (i.e. (S_n^h(ω))_{n∈N} is nondecreasing for {P^h-almost} every ω ∈ A, for each
h ∈ {c, d}), then P^c|_{G^c_∞} and P^d|_{G^d_∞} agree when traced on A.
Remark 3.8. We mean to address here abstractly the situation when the two controls c and d agree
for all times on A.
Proof. By the consistency properties, certainly P^c|_{G^c_{S_n^c}} agrees with P^d|_{G^d_{S_n^d}} for each n ∈ N, while
(S_n^c = S_n^d)_{n∈N} accesses infinity {P^c-a.s. and P^d-a.s.} on A. Then apply Lemma 3.6 to obtain
G^c_∞|_A = σ_A(∪_{n∈N} G^c_{S_n^c}|_A) = σ_A(∪_{n∈N} G^d_{S_n^d}|_A) = G^d_∞|_A. If, moreover, A ∈ G^c_{S_n^c} for all n ∈ N, then
the traces of P^c and P^d on A agree on ∪_{n∈N} G^c_{S_n^c}|_A. Provided in addition (S_n^c)_{n∈N} is {P^c-a.s.}
nondecreasing on A, the latter union is a π-system (as a nondecreasing union of σ-fields, indeed even an
algebra) on A. This, coupled with the fact that two finite measures of the same mass which agree
on a generating π-system agree everywhere (by a monotone class argument), yields the second claim. □
Proposition 3.9. (J(c, S))_{(c,S)∈C×G} is a (C, G)-system. Moreover, if
(i) c ∼_S d, A ∈ G^c_{S^c};
(ii) there exists a sequence (S_n)_{n∈N} from G, {a.s.} nondecreasing and accessing infinity on A for
the controls c and d, and for which c ∼_{S_n} d and A ∈ G^c_{S_n^c} for each n ∈ N;
(iii) E^{P^c}[J(c)|G^c_∞] = E^{P^d}[J(d)|G^d_∞] P^c-a.s. and P^d-a.s. on A;
then J(c, S) = J(d, S) P^c-a.s. and P^d-a.s. on A.
Proof. By definition, J(c, T) is G^c_{T^c}-measurable. Next, if c ∈ C, then G^c_{S^c} = G^c_{T^c} when traced
on {T^c = S^c} ∈ G^c_{S^c} ∩ G^c_{T^c}, whence J(c, T) = J(c, S) P^c-a.s. thereon (applying Lemma A.1).
Finally, to show that J(c, S) = J(d, S) P^c-a.s. (or P^d-a.s., it is the same) on A under the indicated
conditions, we need only establish that:
1_A E^{P^c}[J(c)|G^c_{S^c}] = 1_A E^{P^d}[J(d)|G^d_{S^d}] P^c-a.s.
⇔ (since G^d_{S^d} = G^c_{S^c} and A ∈ G^c_{S^c})
∀B ∈ G^c_{S^c}:  E^{P^c}[J(c) 1_A 1_B] = E^{P^c}[E^{P^d}[J(d) 1_A 1_B | G^d_{S^d}]]
⇔ (since P^c|_{G^c_{S^c}} = P^d|_{G^d_{S^d}})
∀B ∈ G^c_{S^c}:  E^{P^c}[J(c) 1_A 1_B] = E^{P^d}[J(d) 1_A 1_B]
⇔ (conditioning)
∀B ∈ G^c_{S^c}:  E^{P^c}[E^{P^c}[J(c)|G^c_∞] 1_A 1_B] = E^{P^d}[E^{P^d}[J(d)|G^d_∞] 1_A 1_B]
⇔ (since E^{P^c}[J(c)|G^c_∞] = E^{P^d}[J(d)|G^d_∞] P^c-a.s. on A)
∀B ∈ G^c_{S^c}:  E^{P^c}[E^{P^d}[J(d)|G^d_∞] 1_A 1_B] = E^{P^d}[E^{P^d}[J(d)|G^d_∞] 1_A 1_B],
where finally one can apply Lemma 3.7. □
4. Bellman’s principle
Definition 4.1 ((C, G)-super/-/sub-martingale systems). A collection X = (X(c, S))(c,S)∈(C,G)
of functions from [−∞, +∞]Ω is a (C, G)- (resp. super-, sub-) martingale system, if for each
(c, S) ∈ C × G (i) X(c, S) is GSc c -measurable, (ii) X(c, S) = X(d, S) Pc -a.s. and Pd -a.s., whenever
c ∼S d, (iii) (resp. the negative, positive part of) X(c, S) is integrable and (iv) for all {S, T } ⊂ G
and c ∈ C with S d ≤ T d {Pd -a.s.} for d ∈ D(c, T ),
c
EP [X(c, T )|GSc c ] = X(c, S)
c
c
(resp. EP [X(c, T )|GSc c ] ≤ X(c, S), EP [X(c, T )|GSc c ] ≥ X(c, S)) Pc -a.s.
In order to be able to conclude the supermartingale property of the Bellman system (Bellman’s
principle), we shall need to make a further assumption (see Assumption 4.3 below; cf. Lemma A.3).
The following proposition gives some guidance as to when it may be valid.
Proposition 4.2. Let c ∈ C, S ∈ G and ε ∈ [0, ∞), M ∈ (0, ∞]. Then (1)⇒(2)⇒(3).
(1) (i) For all d ∈ D(c, S), P^d = P^c. AND
(ii) For all {d, d′} ⊂ D(c, S) and G ∈ G^c_{S^c}, there is a d′′ ∈ D(c, S) such that J(d′′) ≥
M ∧ [1_G J(d) + 1_{Ω\G} J(d′)] − ε, P^c-a.s.
(2) For all {d, d′} ⊂ D(c, S) and G ∈ G^c_{S^c}, there is a d′′ ∈ D(c, S) such that J(d′′, S) ≥
M ∧ [1_G J(d, S) + 1_{Ω\G} J(d′, S)] − ε, P^c-a.s.
(3) (J(d, S))_{d∈D(c,S)} has the “(ε, M)-upwards lattice property”:
for all {d, d′} ⊂ D(c, S) there exists a d′′ ∈ D(c, S) such that
J(d′′, S) ≥ (M ∧ J(d, S)) ∨ (M ∧ J(d′, S)) − ε, P^c-a.s.
Proof. Implication (1)⇒(2) follows by conditioning on G^c_{S^c} under P^c. Implication (2)⇒(3) follows
by taking G = {J(d, S) > J(d′, S)} ∈ G^c_{S^c}. □
Assumption 4.3 (Upwards lattice property). For all c ∈ C, S ∈ G and {ε, M} ⊂ (0, ∞),
(J(d, S))_{d∈D(c,S)} enjoys the (ε, M)-upwards lattice property (as in Proposition 4.2, Property (3)).
(We shall make it explicit in the sequel when this assumption is in effect.)
Remark 4.4.
(i) The upwards lattice property of Assumption 4.3 represents a direct connection between the
set of controls C on the one hand and the collection of observable filtrations (G^c)_{c∈C} and set
of controlled times G on the other. It is weaker than insisting that every system (J(d, S))_{d∈D(c,S)}
be upwards-directed (Proposition 4.2, Property (3) with ε = 0, M = ∞), but still sufficient
to allow one to conclude Bellman’s (super)martingale principle (Theorem 4.6). A more
precise investigation into the relationship between the validity of Bellman’s principle and
the linking between C, G and the collection (G^c)_{c∈C} remains open to future research.
(ii) It may be assumed without loss of generality (in the precise sense which follows) that
{0, ∞} ⊂ G. Specifically, we can always simply extend the family D by defining D(c, ∞) :=
{c} and D(c, 0) := C for each c ∈ C – none of the provisions of Section 2 (Settings 2.1
and 2.7, Assumption 2.9), nor indeed the validity or non-validity of Assumption 4.3, being
thus affected.
(iii) Assumption 4.3 is of course trivially verified when the filtrations (G^c)_{c∈C} all consist of (probabilistically) trivial σ-fields alone.
Example 2.11 continued. We verify that in Example 2.11, under the assumption that the base
(Ω, F) therein is in fact Blackwell, Property (1) from Proposition 4.2 obtains with M = ∞, ε = 0.
Let G ∈ G^c_{S^c}, {d, d′} ⊂ D(c, S). It will suffice to show that d′′ := d 1_G + d′ 1_{Ω\G} ∈ C (for, then, in
fact, we will have J(d′′) = J(d) 1_G + J(d′) 1_{Ω\G}). Now, S^d = S^c = S^{d′} are all stopping times of G^c,
and d^{S^c} = c^{S^c} = (d′)^{S^c}; by Theorem 6.6, Proposition 6.5 and Proposition 6.9, to follow in Part 2, all
the events {S^c > t}, {S^c ≤ t} ∩ G and {S^c ≤ t} ∩ (Ω\G) belong to σ((R^c)^{S^c ∧ t}) = σ((R^{d′′})^{S^c ∧ t}) ⊂
σ((R^{d′′})^t) = G^{d′′}_t. Next, for sure, d′′ is an F ⊗ B([0, ∞))/B(O)-measurable, differentiable, O-valued
process with initial value o_0, satisfying the requisite integrability condition on f^− and g^+. So
it remains to check that d′′ is G^{d′′}-adapted; let t ∈ [0, ∞). Then d′′_t 1_{{S^c > t}} = c_t 1_{{S^c > t}} ∈ G^c_t, hence
d′′_t 1_{{S^c > t}} ∈ G^{d′′}_t, since G^{d′′}_t = G^c_t on {S^c > t}. On the other hand, d′′_t 1_{{S^c ≤ t} ∩ G} = d_t 1_{{S^c ≤ t} ∩ G} ∈ G^d_t,
hence d′′_t 1_{{S^c ≤ t} ∩ G} ∈ G^{d′′}_t, since G^{d′′}_t = G^d_t on {S^c ≤ t} ∩ G; similarly for Ω\G in place of G (and d′ in
place of d).
Proposition 4.5 (cf. [5, p. 94, Lemma 1.14]). Under Assumption 4.3, for any c ∈ C, T ∈ G
and any sub-σ-field A of G^c_{T^c}, P^c-a.s.:
E^{P^c}[V(c, T)|A] = P^c|_A-esssup_{d∈D(c,T)} E^{P^d}[J(d)|A].
In particular, E^{P^c} V(c, T) = sup_{d∈D(c,T)} E^{P^d} J(d).
Proof. By Lemma A.3, we have, P^c-a.s.:
E^{P^c}[V(c, T)|A] = P^c|_A-esssup_{d∈D(c,T)} E^{P^c}[E^{P^d}[J(d)|G^d_{T^d}]|A]
= P^c|_A-esssup_{d∈D(c,T)} E^{P^d}[E^{P^d}[J(d)|G^d_{T^d}]|A], since G^c_{T^c} = G^d_{T^d} & P^c|_{G^c_{T^c}} = P^d|_{G^d_{T^d}}
for d ∼_T c; wherefrom the claim follows at once (via the tower property). □
Theorem 4.6 (Bellman’s principle). We work under the provisions of Assumption 4.3 and insist
{0, ∞} ⊂ G (recall Remark 4.4(ii)).
(V(c, S))_{(c,S)∈C×G} is a (C, G)-supermartingale system. Moreover, if c* ∈ C is optimal, then
(V(c*, T))_{T∈G} has a constant P^{c*}-expectation (equal to the optimal value v = E^{P^{c*}} J(c*)). If further
E^{P^{c*}} J(c*) < ∞, then (V(c*, T))_{T∈G} is a G-martingale in the sense that (i) for each T ∈ G,
V(c*, T) is G^{c*}_{T^{c*}}-measurable and P^{c*}-integrable and (ii) for any {S, T} ⊂ G with S^d ≤ T^d {P^d-a.s.} for d ∈ D(c*, T), P^{c*}-a.s.,
E^{P^{c*}}[V(c*, T)|G^{c*}_{S^{c*}}] = V(c*, S).
Furthermore, if c* ∈ C is conditionally optimal at S ∈ G and E^{P^{c*}} J(c*) < ∞, then c* is
conditionally optimal at T for any T ∈ G satisfying T^d ≥ S^d {P^d-a.s.} for d ∈ D(c*, T). In
particular, if c* is optimal, then it is conditionally optimal at 0, so that if further E^{P^{c*}} J(c*) < ∞,
then c* must be conditionally optimal at any S ∈ G.
Conversely, and regardless of whether Assumption 4.3 holds true, if G includes a sequence
(S_n)_{n∈N_0} for which (i) S_0 = 0, (ii) the family (V(c*, S_n))_{n≥0} has a constant P^{c*}-expectation and is
uniformly integrable, and (iii) V(c*, S_n) → V(c*, ∞), P^{c*}-a.s. (or even just in P^{c*}-probability), as
n → ∞, then c* is optimal.
Proof. Let {S, T} ⊂ G and c ∈ C with S^d ≤ T^d {P^d-a.s.} for d ∈ D(c, T). Then, since S^c ≤ T^c
{P^c-a.s.}, G^c_{S^c} ⊂ G^c_{T^c} and D(c, T) ⊂ D(c, S), so that we obtain, via Proposition 4.5, P^c-a.s.,
E^{P^c}[V(c, T)|G^c_{S^c}] = P^c|_{G^c_{S^c}}-esssup_{d∈D(c,T)} E^{P^d}[J(d)|G^c_{S^c}]
≤ P^c|_{G^c_{S^c}}-esssup_{d∈D(c,S)} E^{P^d}[J(d)|G^c_{S^c}] = V(c, S),
since G^c_{S^c} = G^d_{S^d} for c ∼_S d, which establishes the first claim. The second follows at once from the
final conclusion of Proposition 4.5. Then let c* be optimal and {S, T} ⊂ G with S^d ≤ T^d {P^d-a.s.} for d ∈ D(c*, T). Note that, by the supermartingale property, v = E^{P^{c*}} E^{P^{c*}}[V(c*, T)|G^{c*}_{S^{c*}}] ≤
E^{P^{c*}} V(c*, S) = v. So, if furthermore v < ∞ (remark that then v ∈ R), we conclude that V(c*, T)
is P^{c*}-integrable, and the martingale property also follows.
Next, if c* is conditionally optimal at S, E^{P^{c*}} J(c*) < ∞, and S^d ≤ T^d {P^d-a.s.} for
d ∈ D(c*, T), then, since V is a (C, G)-supermartingale system, E^{P^{c*}} J(c*) = E^{P^{c*}} J(c*, S) =
E^{P^{c*}} V(c*, S) ≥ E^{P^{c*}} V(c*, T). On the other hand, for sure, V(c*, T) ≥ J(c*, T), P^{c*}-a.s., so
E^{P^{c*}} V(c*, T) ≥ E^{P^{c*}} J(c*, T) = E^{P^{c*}} J(c*); hence we must have V(c*, T) = J(c*, T), P^{c*}-a.s., i.e. c*
is conditionally optimal at T. The penultimate claim is then also evident.
For the final claim, notice that V(c*, S_n) → V(c*, ∞) in L^1(P^{c*}), as n → ∞, and so v =
sup_{c∈C} E^{P^c} J(c) = E^{P^{c*}} V(c*, 0) = E^{P^{c*}} V(c*, S_n) → E^{P^{c*}} V(c*, ∞) = E^{P^{c*}} J(c*), as n → ∞. □
Proposition 4.7. Under Assumption 4.3 and insisting that ∞ ∈ G, V is the minimal (C, G)-supermartingale system W satisfying the terminal condition
W(c, ∞) ≥ E^{P^c}[J(c)|G^c_∞],  P^c-a.s. for each c ∈ C.
Proof. That V is a (C, G)-supermartingale system satisfying the indicated terminal condition is
clear from the definition of V and Theorem 4.6. Next, let W be a (C, G)-supermartingale system
satisfying said terminal condition. Then for all (c, T) ∈ C × G and d ∈ D(c, T), P^c-a.s. and P^d-a.s.,
W(c, T) = W(d, T) ≥ E^{P^d}[W(d, ∞)|G^d_{T^d}] ≥ E^{P^d}[E^{P^d}[J(d)|G^d_∞]|G^d_{T^d}] = J(d, T). Thus W(c, T) ≥
V(c, T), P^c-a.s. □
5. A solved formal example
Recall the notation of Section 2. The time set will be [0, ∞) (Setting 2.1(i); T = [0, ∞)).
Fix next a discount factor α ∈ (0, ∞) and let (Ω, H, P) be a probability space supporting two
independent, sample-path-continuous Brownian motions B^0 = (B^0_t)_{t∈[0,∞)} and B^1 = (B^1_t)_{t∈[0,∞)},
starting at 0 and −x ∈ R, respectively (Settings 2.1(iii) and 2.1(iv); F^c = H, P^c = P for all c, Ω
is itself). We may assume (Ω, H) is Blackwell. By F denote the natural filtration of the bivariate
process (B^0, B^1). Then for each càdlàg, F-adapted, {0, 1}-valued process c, let G^c be the natural
filtration of B^c, the observed process (Setting 2.1(vi); the G^c are themselves); let (J^c_k)_{k=0}^∞ be the
jump times of c (with J^c_0 := 0; J^c_k = ∞, if c has fewer than k jumps); and define:
C := ∪_{ε>0} {F-adapted, càdlàg, {0, 1}-valued processes c, with c_0 = 0,
that are G^c-predictable and such that J^c_{k+1} − J^c_k ≥ ε on {J^c_k < ∞} for all k ∈ N}
(Setting 2.1(ii); C is itself). The insistence on the “ε-separation” of the jumps of the controls
appears artificial – our intention is to emphasize the salient features of the control-dependent
informational flow, not to be preoccupied with the technical details.
For c ∈ C, define next:
• for each t ∈ [0, ∞) (with the convention sup ∅ := 0), σ^c_t := sup{s ∈ [0, t] : c_s ≠ c_t} and
τ^c_t := t − σ^c_t, the last jump time of c before time t and the lag since then, respectively;
• Z^c := B^c − B^{1−c}_{σ^c}, the current distance of the observed Brownian motion to the last recorded
value of the unobserved Brownian motion;
• J(c) := ∫_0^∞ e^{−αt} Z^c_t dt − ∫_{(0,∞)} e^{−αt} K(Z^c_{t−}, τ^c_{t−}) |dc_t|, where K : R × [0, ∞) → R is a measurable
function with polynomial growth, to be specified later (Setting 2.1(v); J is itself; remark that
|Z^c| ≤ B̄^0 + B̄^1 (where a line over a process denotes its running supremum), so there are
no integrability issues (due to the ‘ε’-separation of the jumps of c)).
Notice that E[∫_0^∞ e^{−αt} Z^c_t dt] = E[∫_0^∞ e^{−αt} (B^{c_t}_t − B^{1−c_t}_t) dt]. Define V(x) := sup_{c∈C} E J(c).
Finally, with regard to Setting 2.7, introduce for every c ∈ C and controlled time S,
D(c, S) := {d ∈ C : d^{S^c} = c^{S^c}}.
Then let:
G := {controlled times S such that ∀c∀d(d ∈ D(c, S) ⇒ S^c = S^d & G^c_{S^c} = G^d_{S^d})};
all the provisions of Section 2 (specifically, those of Setting 2.1 and Setting 2.7, as
well as Assumption 2.9) being thus satisfied. Thanks to Corollary 6.7, G =
{controlled times S such that ∀c∀d(d^{S^c} = c^{S^c} ⇒ S^c = S^d)}.
Moreover, Assumption 4.3 can also be verified, as follows (here, the interplay between the nature
of the controls and the observed information will be crucial to the argument). Let {c_1, c_2} ⊂ C,
S ∈ G, assume c_1 ∼_S c_2, define P := S^{c_1} = S^{c_2}, and take A ∈ G^{c_1}_P = G^{c_2}_P. Consider the process
c := c_1 1_A + c_2 1_{Ω\A}. We claim c ∈ C (then, since J(c) = J(c_1) 1_A + J(c_2) 1_{Ω\A}, the condition
of Proposition 4.2(1) will follow). For sure, c is a {0, 1}-valued, càdlàg process, vanishing at
zero. Furthermore, if the jumps of c_1 and c_2 are temporally separated by ε_1 > 0 and ε_2 > 0,
respectively, then the jumps of c are temporally separated by ε := ε_1 ∧ ε_2 > 0. Finally, c is G^c-predictable. To see this, note first that for all t ∈ [0, ∞), G^c_t|_{{t≤P}} = G^{c_1}_t|_{{t≤P}} = G^{c_2}_t|_{{t≤P}}, whilst
G^c_t|_{{P<t}∩A} = G^{c_1}_t|_{{P<t}∩A} and G^c_t|_{{P<t}∩(Ω\A)} = G^{c_2}_t|_{{P<t}∩(Ω\A)} (also, by Theorem 6.6, Proposition 6.5 and Proposition 6.9, from Part 2, {{t ≤ P}, A ∩ {P < t}, (Ω\A) ∩ {P < t}} ⊂ σ((B^{c_1})^{P∧t}) =
σ((B^{c_2})^{P∧t}) = σ((B^c)^{P∧t}) ⊂ G^c_t). Then it will suffice to argue that c 1_{[[0,P]]} = c_1 1_{[[0,P]]} = c_2 1_{[[0,P]]},
c 1_{((P,∞))} 1_A = c_1 1_{((P,∞))} 1_A and c 1_{((P,∞))} 1_{Ω\A} = c_2 1_{((P,∞))} 1_{Ω\A} are all G^c-predictable. To see this,
one need only consider the class of G^{c_1}- or G^{c_2}-, respectively G^{c_1}-, G^{c_2}-, predictable processes V, for
which V 1_{[[0,P]]}, respectively V 1_{((P,∞))} 1_A, V 1_{((P,∞))} 1_{Ω\A}, is G^c-predictable. Then one establishes
that this is a monotone class, containing the multiplicative class of all the left-continuous, G^{c_1}- or
G^{c_2}-, respectively G^{c_1}-, G^{c_2}-, adapted processes. The Functional Monotone Class Theorem allows us to
conclude.
Now, we shall:
(a) Identify an instance of K for which any control in C is optimal. It will emerge that
K(z, t) = −2z/α, (z, t) ∈ R × [0, ∞), fits this bill, and then V(x) = x/α.
(b) Provide a class of functions K for which V(x) is a symmetric function of the parameter x.
Here K will have the form:
K(z, t) = ∫_R [ (|√t u − z| − |z|)/α + (e^{−γ|√t u − z|} − e^{−γ|z|})/(αγ) ] N(0, 1)(du) + 1_{(0,∞)}(z) L(z, t),  (z, t) ∈ R × [0, ∞),
with L nonnegative, measurable, bounded in polynomial growth; γ := √(2α). We will see,
letting c^ε be the control, which waits an ε ∈ (0, ∞) amount of time each time it has jumped
(also at time zero), and thereafter jumps at the first entrance time of Z^{c^ε} into (−∞, 0] –
we are witnessing here, finally, an example of a whole sequence of non-control-constant
controlled times, members of G – that for any such K, E J(c^ε) → V(x) = (γ|x| + e^{−γ|x|})/(αγ), as
ε ↓ 0.
Indeed, according to Bellman’s principle (Theorem 4.6), and the strong Markov property, for
each c ∈ C, the following process (where V is, by a slight abuse of notation, the would-be value
function⁴):
S^c_t := ∫_0^t e^{−αs} Z^c_s ds − ∫_{(0,t]} e^{−αs} K(Z^c_{s−}, τ^c_{s−}) |dc_s| + e^{−αt} V(Z^c_t, τ^c_t),   (5.1)
should be a (G^c, P)-supermartingale (in t ∈ [0, ∞)). Moreover, if an optimal strategy c* exists, then
S^{c*} should be a (G^{c*}, P)-martingale (or, when dealing with a sequence/net of optimizing controls,
it should, in expectation, ‘be increasingly close to being one’).
⁴More precisely, for z ∈ R, u ∈ [0, ∞), V(z, u) is the optimal payoff of the related optimal control problem in
which, ceteris paribus, B^1 = z + H_{u+·}, for a Brownian motion H independent of B^0.
Guided by (5.1), let us now consider the semimartingale decomposition of S^c, assuming K is
such that in fact (i) for z ∈ R, s ∈ [0, ∞) (again with a slight abuse of notation) V(z, s) = V(z) (i.e.
no explicit lag dependence in the value function); (ii) V is of class C^1, and also twice differentiable,
with second derivative continuous, except possibly at finitely many points, wherein still the left
and right derivatives exist and remain continuous from the left, respectively right; and (iii) V and
V′ are bounded in polynomial growth.
The semimartingale decomposition (for which we require, in principle, the “usual conditions”)
may then be effected relative to the completed measure P and the usual augmentation G^{c+} of G^c,
with respect to which Z^c is a semimartingale (indeed, its jump part is clearly of finite variation,
whilst its continuous part is, in fact, a Brownian motion relative to the augmentation of the natural
filtration of (B^0, B^1)).
We thus obtain, by the Itô-Tanaka-Meyer formula [12, p. 214, Theorem IV.70; p. 216, Corollary IV.1], P-a.s. for all t ∈ [0, ∞):
S^c_t = V(x) + ∫_0^t e^{−αs} Z^c_{s−} ds (=: C_1) + ∫_{(0,t]} e^{−αs} (−K(Z^c_{s−}, τ^c_{s−})) |dc_s| (=: J_1) + ∫_0^t e^{−αs} (−α) V(Z^c_{s−}) ds (=: C_2)
+ ∫_0^t e^{−αs} V′(Z^c_{s−}) dZ^c_s (∗) + (1/2) ∫_0^t e^{−αs} V′′(Z^c_{s−}) d[Z^c]^{cts}_s (=: C_3) + Σ_{0<s≤t} e^{−αs} [∆V(Z^c_s) (∗∗) − V′(Z^c_{s−}) ∆Z^c_s (∗)].   (5.2)
Note that the starred parts combine into:
∫_0^t e^{−αs} V′(Z^c_{s−}) d(Z^c)^{cts}_s =: M_1,   (5.3)
which is a (G^{c+}, P)-martingale in t ∈ [0, ∞) (since |Z^c| ≤ B̄^0 + B̄^1). On the other hand, the
compensator of the double-starred term is:
∫_{(0,t]} |dc_s| e^{−αs} ∫_R N(0, 1)(du) [V(√(τ^c_{s−}) u − Z^c_{s−}) − V(Z^c_{s−})] =: J_2,   (5.4)
making:
∫_{(0,t]} |dc_s| e^{−αs} [∆V(Z^c_s) − ∫_R N(0, 1)(du) (V(√(τ^c_{s−}) u − Z^c_{s−}) − V(Z^c_{s−}))] =: M_2   (5.5)
into a (G^c, P)-martingale (in t ∈ [0, ∞)). For, if τ is a predictable stopping time with respect to
some filtration (in continuous time) Z, U is a Z_τ-measurable random variable, and Q a probability
measure with Q[|U| 1_{[0,t]} ∘ τ] < ∞ for each t ∈ [0, ∞), then the compensator of U 1_{[[τ,∞))} (relative
to (Z, Q)) is Q[U|Z_{τ−}] 1_{[[τ,∞))}. This fact may be applied to each jump time of the G^c-predictable
process c (since |Z^c| ≤ B̄^0 + B̄^1), whence linearity allows us to conclude (due to the ‘ε-separation’ of
the jumps of c).
Remark now that the properties of being a càdlàg (super)martingale [13, p. 173, Lemma II.67.10]
or a predictable process (of finite variation) are preserved when passing to the usual augmentation
of a filtered probability space. Therefore it follows that, relative to (G^{c+}, P), M := M_1 + M_2 is
a martingale, whilst J := J_1 + J_2 (respectively C := C_1 + C_2 + C_3) is a pure-jump (respectively
continuous) predictable process of finite variation. On the other hand, the process S^c is supposed
to be a (G^{c+}, P)-supermartingale. But then we have obtained in this manner nothing but the
Doob-Meyer decomposition S^c = V(x) + M + J + C, so that J and C are both nonincreasing
processes of finite variation, respectively pure-jump and continuous [9, p. 32, Corollary 3.16] [10,
p. 412, Theorem 22.5].
Now assume furthermore that an optimizing (as ε ↓ 0) net of controls is to wait for a
period of ε after each jump and also at the start, and then each time to change the observed Brownian motion precisely at the first entrance time of Z^c into the set (−∞, −l] /for some prespecified
level l ∈ [0, ∞)/. Remark that such a control is previsible with respect to G^c.
Then we should like to have (the first two conditions follow from the supermartingale property,
the last two from the ‘near martingale’/‘limiting martingale’ condition; the presence of the a.e.
qualifiers being a reflection of the Occupation Time Density Formula [12, p. 216, Corollary IV.1]):
[from C]  z − αV(z) + (1/2) V′′(z) ≤ 0, for a.e. z ∈ R;   (5.6)
[from J]  −K(z, t) + ∫_R [V(√t u − z) − V(z)] N(0, 1)(du) ≤ 0, for all z ∈ R, t ∈ [0, ∞);   (5.7)
[from C]  z − αV(z) + (1/2) V′′(z) = 0, for a.e. z ≥ −l;   (5.8)
[from J]  −K(z, t) + ∫_R [V(√t u − z) − V(z)] N(0, 1)(du) = 0, for all z ≤ −l, t ∈ (0, ∞).   (5.9)
This concludes the first part of the analysis, deriving what ought to hold of V. In the second part
we flip, as is usual, the argument upside-down: V will be specified a priori (along with K); S^c
remains defined in terms of this prespecified V, viz. Eq. (5.1); and then it is shown that V(x) is
the optimal payoff, via the semimartingale decomposition of S^c, which the latter will continue to
enjoy in the form (5.2)-(5.3)-(5.4)-(5.5).
Indeed, as regards (a), we may take K(z, t) = −2z/α and V(z) = z/α, for z ∈ R, in which case
(5.6)-(5.7)-(5.8)-(5.9) are all satisfied with equality. Taking expectations in (5.1)-(5.2), and passing
to the limit t → ∞ via dominated convergence, we see that V(x) is the optimal payoff, and any
control from C realizes it.
For a less degenerate case, let us solve (5.8) on z ∈ [−l, ∞) and in the general solution throw
away the exponentially increasing part. Then V(z) = ψ(z) for z ≥ −l, where ψ(u) := u/α + A e^{−γu}
(u ∈ R, A ∈ R, γ := √(2α)). To obtain a symmetric V it is natural to take l = 0 and then
V(z) = ψ(|z|), z ∈ R. For such a V, (5.6) is in fact satisfied with strict inequality on z ∈ (−∞, 0);
and V is C^1 for A = 1/(γα) (and then it is even C^2). Then (5.9) and (5.7) essentially necessitate
taking the form of K as specified in (b) above.
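As a quick sanity check (ours, not the paper’s), one can verify symbolically that ψ(u) = u/α + A e^{−γu} solves the ODE (5.8) for every constant A, and that A = 1/(γα) is exactly the choice making V(z) = ψ(|z|) C^1 at 0:

```python
import sympy as sp

# psi(u) = u/alpha + A*exp(-gamma*u), gamma = sqrt(2*alpha):
# check u - alpha*psi(u) + psi''(u)/2 = 0, and find the A with psi'(0) = 0
# (smooth fit of V(z) = psi(|z|) at z = 0).
u, alpha, A = sp.symbols('u alpha A', positive=True)
gamma = sp.sqrt(2 * alpha)
psi = u / alpha + A * sp.exp(-gamma * u)

print(sp.simplify(u - alpha * psi + sp.diff(psi, u, 2) / 2))  # 0, for any A
print(sp.solve(sp.Eq(sp.diff(psi, u).subs(u, 0), 0), A))
# A = 1/(gamma*alpha) (sympy prints it as sqrt(2)/(2*alpha**(3/2)))
```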
Now, to see that c (as described in (b)) is in fact an optimizing net of controls (as ↓ 0), first
take expectations in (5.1)-(5.2), and pass to the limit as t → ∞ (via dominated and monotone
convergence), in order to see that V (x) ≥ EJ(c) for each c ∈ C; second apply (5.1)-(5.2) to c = c ,
pass to the limit t → ∞, and note that
Z ∞
Z ∞
1 00 c
−αs
c
c
E
e
Zs− − αV (Zs− ) + V (Zs− ) ds = 2E
e−αs Zsc 1(−∞,0) ◦ Zsc ds → 0
2
0
0
as ↓ 0. To convince the reader of this, it will suffice to check:
Z T
E
e−αs Zsc 1(−∞,0) ◦ Zsc ds → 0, as ↓ 0,
0
for each T ∈ (0, ∞). Fix such a T . Further, it will be sufficient to argue that:
Z T
E
e−αs Zsc 1(−∞,−a) ◦ Zsc ds → 0, as ↓ 0,
0
for each a ∈ (0, ∞). Fix such an a. It will now be enough to demonstrate that, P-a.s., the Lebesgue
measure of the set of times A^ε := {s ∈ [0, T] : Z^{c^ε}_s < −a} converges to 0 as ε ↓ 0. Call the intervals
of time A^ε_k := [J^{c^ε}_k, J^{c^ε}_k + ε), k ∈ N₀, holding periods for the control c^ε. Remark that the holding
periods constitute a pairwise disjoint cover of A^ε. Moreover, if s ∈ A^ε ∩ A^ε_k for k ∈ N, then (denoting
t₀ := J^{c^ε}_k, j₀ := c^ε_{t₀} and T₀ := J^{c^ε}_{k−1}) −a > Z^{c^ε}_s = B^{j₀}_s − B^{1−j₀}_{t₀}, whilst 0 ≥ Z^{c^ε}_{t₀−} = B^{1−j₀}_{t₀} − B^{j₀}_{T₀}, hence
−a > B^{j₀}_s − B^{j₀}_{T₀}. Thus, if s ∈ A^ε belongs to the k-th holding period for c^ε (and k ≥ 1), then in the
time interval between the start of the (k − 1)-th holding period and the end of the k-th holding
period, one of the Brownian motions B⁰ and B¹ must have moved by more than a. However,
thanks to the continuity of the sample paths of B⁰ and B¹, the infimum over the amounts of time
required for either B⁰ or B¹ to move by more than a (on the interval [0, T]) is strictly positive
(albeit dependent on the sample point). It follows that the number of k ≥ 1 for which there can
be an s ∈ A^ε with s ∈ A^ε_k is bounded by some number, depending on the sample point but not on
ε, and this establishes the claim.
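For illustration only, a small simulation sketch of this mechanism. The helper occupation_below is ours, and the precise switching rule (switch arm as soon as Z ≤ 0 once the ε-holding period has expired) is an assumption suggested by, but not spelled out in, the display above:

```python
# Minimal simulation sketch (ours, not part of the paper's argument).
# Assumption: c^eps holds the current arm j for at least eps after each switch,
# and switches as soon as the advantage Z_s = B^j_s - B^{1-j}_{t0}
# (t0 the last switch time) is <= 0 thereafter.  We estimate
# Leb{s in [0,T] : Z_s < -a}, shown in the text to vanish as eps -> 0.
import numpy as np

def occupation_below(eps, T=1.0, a=0.1, dt=1e-4, seed=0):
    rng = np.random.default_rng(seed)
    n = int(T / dt)
    dB = rng.normal(scale=np.sqrt(dt), size=(2, n))
    B = np.concatenate([np.zeros((2, 1)), np.cumsum(dB, axis=1)], axis=1)
    j, t0, hold_until, occ = 0, 0, eps, 0.0   # current arm, switch index, time, measure
    for i in range(n + 1):
        Z = B[j, i] - B[1 - j, t0]
        if Z < -a:
            occ += dt
        if i * dt >= hold_until and Z <= 0:   # holding period over: switch arm
            j, t0, hold_until = 1 - j, i, i * dt + eps
    return occ

for eps in [0.1, 0.01, 0.001]:
    print(f"eps={eps}: Leb(A^eps) ~ {occupation_below(eps):.4f}")
```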
Part 2. Stopping times, stopped processes and natural filtrations at stopping times –
informational consistency
Recall from p. 9 the contents of the third, final, bullet point remark following Assumption 2.9
– it is to the questions posed and motivated there, that we now turn our attention. Along the
way, we shall (be forced to) investigate (i) the precise relationship between the sigma-fields of the
stopped processes, on the one hand, and the natural filtrations of the processes at these stopping
times, on the other and (ii) the nature of the stopping times of the processes and of the stopped
processes themselves. Here is an informal statement of the kind of results that we will seek to
formally establish (F^X denotes the natural filtration of a process X):

    If X is a process, and S a time, then S is a stopping time of F^X if and only if it is a
    stopping time of F^{X^S}. When so, then F^X_S = σ(X^S). In particular, if X and Y are
    two processes, and S is a stopping time of either F^X or of F^Y, with X^S = Y^S, then
    S is a stopping time of F^X and F^Y both; moreover, F^X_S = σ(X^S) = σ(Y^S) = F^Y_S.
    Further, if U ≤ V are two stopping times of F^X, X again being a process, then
    σ(X^U) = F^X_U ⊂ F^X_V = σ(X^V).
We will perform this investigation into the nature of information generated by processes in the
two ‘obvious’ settings: first the ‘measure-theoretic’ one, without reference to probability measures
(Section 6) and then the ‘probabilistic’ one, involving a complete probability measure, under which
all the filtrations and σ-fields are completed (Section 7). This will also dovetail nicely with the
parallel development of the two frameworks for stochastic control – the ‘measure-theoretic’ and the
‘probabilistic’ one – from Part 1.
Now, the most important findings of this part are as follows:
• Lemma 6.2, Proposition 6.5, Theorem 6.6, Corollary 6.7 and Proposition 6.9, in the
‘measure-theoretic’ case;
• Corollaries 7.2 and 7.3 (in discrete time) and Proposition 7.6, Corollaries 7.7, 7.9 and 7.10
(in continuous time), for the case with completions.
(Indeed, we have already referenced many of these results in Part 1 – which fact further demonstrates their relevance to this study.) It emerges that everything that intuitively ought to hold,
does hold, if either the time domain is discrete, or else the underlying space is Blackwell (and,
when dealing with completions, the stopping times are predictable; but see the negative results
of Examples 7.4 and 7.5). While we have not been able to drop the “Blackwell assumption”, we
believe many of the results should still hold true under weaker conditions – this remains open to
future research.
Finally, we note that the whole of the remainder of this part is in fact independent from the rest
of the paper (in particular, from Part 1).
6. The ‘measure-theoretic’ case
We begin by fixing quite a bit of notation and by enunciating a couple of well-known (and some
less well-known) measure-theoretic facts along the way – we ask the reader to bear with us.
(•) T = N0 or T = [0, ∞), with the usual linear order, Ω is some set and (E, E) a measurable
space. By a process (on Ω, with time domain T and values in E), we mean simply a collection
X = (X_t)_{t∈T} of functions from Ω into E. With F^X_t := σ(X_s : s ∈ [0, t]) (for t ∈ T), F^X = (F^X_t)_{t∈T}
is then the natural filtration of X.⁵ Remark that for every t ∈ T, F^X_t = σ(X|_{[0,t]}), with
X|_{[0,t]}(ω) = (X_s(ω))_{s∈[0,t]} for ω ∈ Ω; X|_{[0,t]} is an F^X_t/E^{⊗[0,t]}-measurable mapping. The ω-sample
path of X, X(ω), is the mapping from T into E given by (t ↦ X_t(ω)), ω ∈ Ω. In this sense, X
may of course be viewed as an F^X_∞/E^{⊗T}-measurable mapping; indeed, F^X_∞ = σ(X). Then Im X will
denote the range (image) of the mapping X : Ω → E^T.
(•) If further S : Ω → T ∪ {∞} is a time and G is a filtration on Ω, then G_S := {A ∈ G_∞ :
A ∩ {S ≤ t} ∈ G_t for all t ∈ T} is the filtration G at (the time) S, whilst the stopped process
X^S (of a process X) is defined via X^S_t(ω) := X_{S(ω)∧t}(ω), (ω, t) ∈ Ω × T. Note that if T = N₀, X is
a G-adapted process and S is a G-stopping time, then X^S is automatically adapted to the stopped
filtration (G_{n∧S})_{n∈N₀}. For, if n ∈ N₀ and Z ∈ E, then (X^S_n)^{−1}(Z) = (∪_{N₀∋m≤n} X_m^{−1}(Z) ∩ {S = m}) ∪
(X_n^{−1}(Z) ∩ {n < S}) ∈ G_{S∧n}. On the other hand, in continuous time, when T = [0, ∞), if X is
G-progressively measurable and S is a G-stopping time, then X^S is also adapted to the stopped
filtration (G_{t∧S})_{t∈[0,∞)} (and is G-progressively measurable) [11, p. 9, Proposition 2.18]. Remark
also that every right- or left-continuous Euclidean space-valued G-adapted process is automatically
G-progressively measurable.
(•) Next, for a σ-field F on Ω, the measurable space (Ω, F) is said to be (i) separable or
countably generated, when it admits a countable generating set; (ii) Hausdorff, or separated,
when its atoms6 are the singletons of the members of Ω [4, p. 10]; and finally (iii) Blackwell
when its associated Hausdorff space ((Ω, F) quotiented out by ∼ of Footnote 6) (Ω̇, Ḟ) is Souslin
[4, p. 50, III.24]. Furthermore, a Souslin space is a measurable space, which is Borel isomorphic
to a Souslin topological space. The latter in turn is a Hausdorff topological space, which is also a
continuous image of a Polish space (i.e. of a completely metrizable separable topological space).
Every Souslin measurable space is necessarily separable and separated. [4, p. 46, III.16; p. 76,
III.67] For a measurable space, clearly being Souslin is equivalent to being simultaneously Blackwell
and Hausdorff. The key result for us, however, will be Blackwell’s Theorem [4, p. 51 Theorem III.26]
(repeated here for the reader’s convenience – we shall use it time and again):
    Blackwell’s Theorem. Let (Ω, F) be a Blackwell space, G a sub-σ-field of F and
    S a separable sub-σ-field of F. Then G ⊂ S if and only if every atom of G is a
    union of atoms of S. In particular, an F-measurable real function g is S-measurable
    if and only if g is constant on every atom of S.

⁵ [0, t] is to be understood throughout as the set {0, . . . , t} when t ∈ T = N₀.
⁶ Equivalence classes for the equivalence relation ∼ on Ω, given by (ω ∼ ω′) ⇔ (for all A ∈ F, 1_A(ω) = 1_A(ω′)),
{ω, ω′} ⊂ Ω.
(•) Remark finally that if Y is a mapping from A into some Hausdorff (resp. separable) measurable
space (B, B), then Y is constant on the atoms of σ(Y) (resp. σ(Y) is separable). For, if ω₁ and ω₂
belong to a common atom of σ(Y), with per absurdum (under the hypothesis that (B, B)
is Hausdorff) Y(ω₁) ≠ Y(ω₂), then there is a W ∈ B with 1_W(Y(ω₁)) ≠ 1_W(Y(ω₂)), hence
1_{Y^{−1}(W)}(ω₁) ≠ 1_{Y^{−1}(W)}(ω₂), a contradiction. Conversely, if Y is a surjective mapping from A
onto some measurable space (B, B), constant on the atoms of σ(Y), then (B, B) is Hausdorff. For,
if {b, b′} ⊂ B and 1_W(b) = 1_W(b′) for all W ∈ B, then, taking {a, a′} ⊂ A such that Y(a) = b
and Y(a′) = b′, 1_Z(a) = 1_Z(a′) for all Z ∈ σ(Y), so that a and a′ belong to the same atom of
(A, σ(Y)) and consequently b = b′. Furthermore, any measurable subspace (with the trace σ-field)
of a separable (resp. Hausdorff) space is separable (resp. Hausdorff). Lastly, if f : A → (B, B) is
any map into a measurable space, then the atoms of σ(f) always ‘respect’ the equivalence relation
induced by f, i.e., for {ω, ω′} ⊂ A, if f(ω) = f(ω′), then ω and ω′ belong to the same atom of σ(f):
for all Σ ∈ B, 1_{f^{−1}(Σ)}(ω) = 1_Σ(f(ω)) = 1_Σ(f(ω′)) = 1_{f^{−1}(Σ)}(ω′).
Now, a key result in this section will establish that, for a process X and a stopping time S
thereof, σ(X^S) = F^X_S, i.e. that the initial structure (with respect to E^{⊗T}) of the stopped process
coincides with the filtration of the process at the stopping time – under suitable conditions.
Indeed, our first lemma towards this end tells us that elements of F^X_S are functions (albeit not
(as yet) necessarily measurable functions) of the stopped process X^S.
Lemma 6.1 (Key lemma). Let X be a process (on Ω, with time domain T and values in E), S an
F^X-stopping time, A ∈ F^X_S. Then the following holds for every {ω, ω′} ⊂ Ω: if X_t(ω) = X_t(ω′)
for all t ∈ T with t ≤ S(ω) ∧ S(ω′), then S(ω) = S(ω′), X^S(ω) = X^S(ω′) and 1_A(ω) = 1_A(ω′).
Proof. Define t := S(ω) ∧ S(ω′). If t = ∞, for sure S(ω) = S(ω′). If not, then {S ≤ t} ∈ F^X_t, so
that there is a U ∈ E^{⊗[0,t]} with {S ≤ t} = X|^{−1}_{[0,t]}(U). Then at least one of ω and ω′ must belong to
{S ≤ t}, hence to X|^{−1}_{[0,t]}(U). Consequently, since by assumption X|_{[0,t]}(ω) = X|_{[0,t]}(ω′), both do.
It follows that S(ω) = S(ω′). In particular, X^S(ω) = X^S(ω′).
Similarly, since A ∈ F^X_S, A ∩ {S ≤ t} ∈ F^X_t, so that there is a U ∈ E^{⊗[0,t]} (resp. U ∈ E^{⊗T}),
with A ∩ {S ≤ t} = X|^{−1}_{[0,t]}(U) (resp. A ∩ {S ≤ t} = X^{−1}(U)), when t < ∞ (resp. t = ∞). Then
1_A(ω) = 1_{A∩{S≤t}}(ω) = 1_U(X|_{[0,t]}(ω)) = 1_U(X|_{[0,t]}(ω′)) = 1_{A∩{S≤t}}(ω′) = 1_A(ω′) (resp.
1_A(ω) = 1_{A∩{S≤t}}(ω) = 1_U(X(ω)) = 1_U(X(ω′)) = 1_{A∩{S≤t}}(ω′) = 1_A(ω′)).
Our second lemma will allow us to handle the discrete case.
Lemma 6.2 (Stopping times). Let X be a process (on Ω, with time domain N₀ and values in E).
For a time S : Ω → N₀ ∪ {∞} the following are equivalent:
(1) S is an F^X-stopping time.
(2) S is an F^{X^S}-stopping time.
Proof. Suppose first S is an F^{X^S}-stopping time. Let n ∈ N₀. Then for each m ∈ [0, n], {S ≤ m} ∈ F^{X^S}_m,
so there is an E_m ∈ E^{⊗[0,m]} with {S ≤ m} = (X^S|_{[0,m]})^{−1}(E_m). Then {S = m} ⊂
X|^{−1}_{[0,m]}(E_m) ⊂ {S ≤ m}. Consequently {S ≤ n} = ∪_{m∈[0,n]} X|^{−1}_{[0,m]}(E_m) ∈ F^X_n.
Conversely, suppose S is an F^X-stopping time. Let n ∈ N₀. For each m ∈ [0, n], {S ≤ m} ∈ F^X_m,
hence there is an E_m ∈ E^{⊗[0,m]} with {S ≤ m} = X|^{−1}_{[0,m]}(E_m). Then {S = m} ⊂ (X^S|_{[0,m]})^{−1}(E_m) ⊂
{S ≤ m}. Consequently {S ≤ n} = ∪_{m∈[0,n]} (X^S|_{[0,m]})^{−1}(E_m) ∈ F^{X^S}_n.
The next step establishes that members of FSX are, in fact, measurable functions of the stopped
process X S – at least under certain conditions (but always in the discrete case).
Proposition 6.3. Let X be a process, S an F^X-stopping time. If any one of the conditions (1)-(2)-(3)
below is fulfilled, then F^X_S ⊂ σ(X^S) (where X^S is viewed as assuming values in (E^T, E^{⊗T})).
(1) T = N0 .
(2) ImX S ⊂ ImX.
(3) (a) (Ω, G) is Blackwell for some σ-field G ⊃ F^X_S ∨ σ(X^S).
    (b) σ(X^S) is separable (in particular, this obtains if (Im X^S, E^{⊗T}|_{Im X^S}) is separable).
    (c) X^S is constant on the atoms of σ(X^S), i.e. (Im X^S, E^{⊗T}|_{Im X^S}) is Hausdorff.
Remark 6.4.
(1) Condition (2) is clearly not very innocuous, but will typically be met when X is the coordinate
process on a canonical space.
(2) Condition ((3)b) is verified if there is a D ⊂ E^T, with Im X^S ⊂ D, such that the trace
σ-field E^{⊗T}|_D is separable. For example (when T = [0, ∞)), this is the case if E is a second
countable (e.g. separable metrizable) topological space endowed with its (then separable)
Borel σ-field, and the sample paths of X^S are, say, all left- or all right-continuous (take D
to be all the left- or all the right-continuous paths from E^{[0,∞)}).
(3) Finally, condition ((3)c) follows if (E, E) is Hausdorff, and so, in particular, when the
singletons of E belong to E.
Proof. Assume first (1). Let A ∈ F^X_S and n ∈ N₀ ∪ {∞}. Then A ∩ {S = n} ∈ F^X_n, so A ∩ {S =
n} = (X|_{[0,n]})^{−1}(Z) (resp. A ∩ {S = n} = X^{−1}(Z)) for some Z ∈ E^{⊗[0,n]} (resp. Z ∈ E^{⊗N₀}),
when n < ∞ (resp. n = ∞). But then A ∩ {S = n} = (X^S|_{[0,n]})^{−1}(Z) ∩ {S = n} (resp.
A ∩ {S = n} = (X^S)^{−1}(Z) ∩ {S = n}). Thanks to Lemma 6.2, {S = n} ∈ σ(X^S), and we are done.
Assume next (2). Let A ∈ F^X_S. Then 1_A = F ∘ X for some E^{⊗T}/B({0, 1})-measurable mapping
F. Since Im X^S ⊂ Im X, for any ω ∈ Ω there is an ω′ ∈ Ω with X(ω′) = X^S(ω), and then, thanks
to Lemma 6.1, X^S(ω′) = X^S(ω); moreover, F ∘ X^S(ω) = F ∘ X(ω′) = 1_A(ω′) = 1_A(ω). It follows
that 1_A = F ∘ X^S.
Assume now (3). We apply Blackwell’s Theorem. Specifically, on account of ((3)a), (Ω, G) is a
Blackwell space and F^X_S is a sub-σ-field of G; on account of ((3)b), σ(X^S) is a separable sub-σ-field
of G. Finally, thanks to ((3)c) and Lemma 6.1, every atom (equivalently, every element) of F^X_S is
a union of atoms of σ(X^S). It follows that F^X_S ⊂ σ(X^S).
The continuous-time analogue of Lemma 6.2 is now as follows:
Proposition 6.5 (Stopping times). Let X be a process (on Ω, with time domain [0, ∞) and values
in E), S : Ω → [0, ∞] a time. Suppose:
(1) σ(X|_{[0,t]}) and σ(X^{S∧t}) are separable, and (Im X|_{[0,t]}, E^{⊗[0,t]}|_{Im X|_{[0,t]}}) and
(Im X^{S∧t}, E^{⊗T}|_{Im X^{S∧t}}) are Hausdorff, for each t ∈ [0, ∞).
(2) X^S and X are both measurable with respect to a Blackwell σ-field G on Ω.
Then the following are equivalent:
(a) S is an F^X-stopping time.
(b) S is an F^{X^S}-stopping time.
Proof. Suppose first S is an F^{X^S}-stopping time. Let t ∈ [0, ∞). Then {S ≤ t} ∈ F^{X^S}_t. But
F^{X^S}_t = σ(X^S|_{[0,t]}) ⊂ σ(X|_{[0,t]}) = F^X_t. This follows from the fact that every atom of σ(X^S|_{[0,t]}) is a
union of atoms of σ(X|_{[0,t]}) (whence one can apply Blackwell’s Theorem). To see this, note that if ω
and ω′ belong to the same atom of σ(X|_{[0,t]}), then X|_{[0,t]}(ω) = X|_{[0,t]}(ω′) (since (Im X|_{[0,t]}, E^{⊗[0,t]}|_{Im X|_{[0,t]}}) is
Hausdorff). But then X^S_s(ω) = X^S_s(ω′) for all s ∈ [0, (S(ω) ∧ t) ∧ (S(ω′) ∧ t)], and so by Lemma 6.1
(applied to the process X^S and the stopping time S ∧ t of F^{X^S}), (X^S)^{S∧t}(ω) = (X^S)^{S∧t}(ω′), i.e.
X^S|_{[0,t]}(ω) = X^S|_{[0,t]}(ω′). We conclude that ω and ω′ belong to the same atom of σ(X^S|_{[0,t]}).
Conversely, assume S is an F^X-stopping time. Let t ∈ [0, ∞). Then {S ≤ t} ∈ F^X_{S∧t}, S ∧ t is an
F^X-stopping time, and thanks to Proposition 6.3, F^X_{S∧t} ⊂ σ(X^{S∧t}) = σ(X^S|_{[0,t]}).
What finally follows is the main result of this section. As mentioned in the Introduction, it
generalizes canonical-space results already available in the literature.
Theorem 6.6 (Generalized Galmarino’s test). Let X be a process, S an F^X-stopping time. If
T = N₀, then σ(X^S) = F^X_S. Moreover, if X^S is F^X_S/E^{⊗T}-measurable (in particular, if it is adapted
to the stopped filtration (F^X_{t∧S})_{t∈T}) and either one of the conditions:
(1) Im X^S ⊂ Im X.
(2) (a) (Ω, G) is Blackwell for some σ-field G ⊃ F^X_∞.
    (b) σ(X^S) is separable.
    (c) (Im X^S, E^{⊗T}|_{Im X^S}) is Hausdorff.
is met, then the following statements are equivalent:
(i) A ∈ F^X_S.
(ii) 1_A is constant on every set on which X^S is constant, and A ∈ F^X_∞.
(iii) A ∈ σ(X^S).
Proof. The first claim, which assumes T = N₀, follows from Proposition 6.3 and the fact that
automatically X^S is F^X_S/E^{⊗T}-measurable in this case.
In general, implication (i)⇒(ii) follows from Lemma 6.1. Implication (ii)⇒(iii) proceeds as
follows.
Suppose first (1). Let 1_A be constant on every set on which X^S is constant, A ∈ F^X_∞. Then
1_A = F ∘ X for some E^{⊗T}/B({0, 1})-measurable mapping F. Next, from Im X^S ⊂ Im X, for any
ω ∈ Ω, there is an ω′ ∈ Ω with X(ω′) = X^S(ω), and then, thanks to Lemma 6.1, X^S(ω′) = X^S(ω),
so that, by assumption, 1_A(ω) = 1_A(ω′), also. Moreover, F ∘ X^S(ω) = F ∘ X(ω′) = 1_A(ω′) = 1_A(ω).
It follows that 1_A = F ∘ X^S.
Assume now (2). Again apply Blackwell’s Theorem. Specifically, on account of (2)(a), (Ω, G) is a
Blackwell space and F^X_S is a sub-σ-field of G; on account of (2)(b), σ(X^S) is a separable sub-σ-field
of G. Finally, if 1_A is constant on every set on which X^S is constant and A ∈ F^X_∞, then 1_A is a
G-measurable function (by (2)(a)), constant on every atom of σ(X^S) (by (2)(c)). It follows that
1_A is σ(X^S)-measurable.
The implication (iii)⇒(i) is just one of the assumptions.
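In discrete time everything here is finitary, so the test can be checked mechanically. The following sketch (ours; all names hypothetical) identifies σ-fields on a finite Ω with partitions into atoms and verifies Lemma 6.2 together with F^X_S = σ(X^S) for the first hitting time of 1 by a three-step 0-1-valued process:

```python
# Finite sanity check (ours) of Lemma 6.2 and the discrete case of Theorem 6.6:
# sigma-fields on a finite Omega are identified with partitions into atoms.
from itertools import combinations, product

T = [0, 1, 2]
Omega = list(product([0, 1], repeat=len(T)))      # X_t(om) = om[t]
INF = len(T)                                      # sentinel for S = infinity

def S(om):                                        # first hitting time of 1
    return next((t for t in T if om[t] == 1), INF)

def XS(om):                                       # stopped path X^S(om)
    return tuple(om[min(t, S(om))] for t in T)

def atoms(key):                                   # atoms of sigma(key)
    cls = {}
    for om in Omega:
        cls.setdefault(key(om), set()).add(om)
    return list(cls.values())

def is_union_of(A, part):                         # is A a union of atoms?
    return all(c <= A or not (c & A) for c in part)

# Lemma 6.2: {S <= n} lies in F^X_n and in F^{X^S}_n, for every n.
for n in T:
    A = {om for om in Omega if S(om) <= n}
    assert is_union_of(A, atoms(lambda om: om[:n + 1]))
    assert is_union_of(A, atoms(lambda om: XS(om)[:n + 1]))

# Theorem 6.6 (T = N_0): F^X_S = sigma(X^S), as collections of events.
def in_FXS(A):                                    # A cap {S = n} in F^X_n, all n
    return all(is_union_of(A & {om for om in Omega if S(om) == n},
                           atoms(lambda om: om[:min(n + 1, len(T))]))
               for n in T + [INF])

events = [set(c) for r in range(len(Omega) + 1)
          for c in combinations(Omega, r)]
FXS = [A for A in events if in_FXS(A)]
sigmaXS = [A for A in events if is_union_of(A, atoms(XS))]
assert FXS == sigmaXS
print("F^X_S = sigma(X^S):", len(FXS), "events")
```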
As for our original motivation for this investigation, we obtain:
Corollary 6.7 (Observational consistency). Let X and Y be two processes (on Ω, with time domain
T and values in E), S an F^X- and an F^Y-stopping time. Suppose furthermore X^S = Y^S. If any
one of the conditions
(1) T = N₀.
(2) Im X = Im Y.
(3) (a) (Ω, G) (resp. (Ω, H)) is Blackwell for some σ-field G ⊃ F^X_∞ (resp. H ⊃ F^Y_∞).
    (b) σ(X^S) (resp. σ(Y^S)) is separable and contained in F^X_S (resp. F^Y_S).
    (c) (Im X^S, E^{⊗T}|_{Im X^S}) (resp. (Im Y^S, E^{⊗T}|_{Im Y^S})) is Hausdorff.
is met, then F^X_S = F^Y_S.
Remark 6.8. If T = N0 , then in place of S being a stopping time of both F X and F Y , it is sufficient
(ceteris paribus) to insist on S being a stopping time of just one of them. It is so by Lemma 6.2.
The same is true when (3) obtains, as long as the conditions of Proposition 6.5 are met for the
time S and the processes X and Y alike.
Proof. If (1) or (3) hold, then the claim follows immediately from Theorem 6.6.
If (2) holds, let A ∈ F^X_S, t ∈ T ∪ {∞}. Then 1_{A∩{S≤t}} = F ∘ X|_{[0,t]} (resp. 1_{A∩{S≤t}} = F ∘ X)
for some E^{⊗[0,t]}/B({0, 1})-measurable (resp. E^{⊗T}/B({0, 1})-measurable) F, when t < ∞ (resp.
t = ∞). Moreover, if ω ∈ Ω, there is an ω′ ∈ Ω with X(ω′) = Y(ω); since X^S = Y^S, X(ω) agrees
with Y(ω) = X(ω′) on T ∩ [0, S(ω)], and thus, thanks to Lemma 6.1, S(ω) = S(ω′) and 1_A(ω) = 1_A(ω′).
We obtain F ∘ Y|_{[0,t]}(ω) = F ∘ X|_{[0,t]}(ω′) = 1_{A∩{S≤t}}(ω′) = 1_{A∩{S≤t}}(ω), i.e. 1_{A∩{S≤t}} = F ∘ Y|_{[0,t]}
(resp. F ∘ Y(ω) = F ∘ X(ω′) = 1_{A∩{S≤t}}(ω′) = 1_{A∩{S≤t}}(ω), i.e. 1_{A∩{S≤t}} = F ∘ Y); whence
A ∈ F^Y_S, and, by symmetry, F^X_S = F^Y_S.
We also have:
Proposition 6.9 (Monotonicity of information). Let Z be a process (on Ω, with time domain T
and values in E), U ≤ V two stopping times of F^Z. If either T = N₀ or else the conditions:
(1) (Ω, G) is Blackwell for some σ-field G ⊃ σ(Z^V) ∨ σ(Z^U).
(2) (Im Z^V, E^{⊗T}|_{Im Z^V}) is Hausdorff.
(3) σ(Z^V) is separable.
are met, then σ(Z^U) ⊂ σ(Z^V).
Proof. In the discrete case the result follows at once from Theorem 6.6. In the opposite instance, we
claim that the assumptions imply that every atom of σ(Z^U) is a union of the atoms of σ(Z^V): let
ω and ω′ belong to the same atom of σ(Z^V); then, since (Im Z^V, E^{⊗T}|_{Im Z^V}) is Hausdorff, Z^V(ω) =
Z^V(ω′), hence by Lemma 6.1 V(ω) = V(ω′) and U(ω) = U(ω′), and so a fortiori Z^U(ω) = Z^U(ω′),
which implies that ω and ω′ belong to the same atom of σ(Z^U). Apply Blackwell’s Theorem.
7. The case with completions

We have studied in the previous section natural filtrations proper; since it is sometimes convenient
to augment the latter by sets of probability zero⁷, we now turn our attention to their completions.
Notation-wise, for a filtration G on Ω and a complete probability measure P whose domain includes
G_∞, we denote by G^P the completed filtration, given by G^P_t := G_t ∨ N for t ∈ T, N being the
collection of precisely all P-null sets; likewise, if the domain of P includes a σ-field A on Ω, then
A^P := A ∨ N. For any other unexplained notation that we shall use, we refer the reader to the
beginning of Section 6. And while, for ease of language, we will continue to work in the sequel
with processes/stopping times, their equivalence classes (with respect to indistinguishability/a.s.
equality) would of course (as appropriate) suffice.

⁷ Making them also (in the temporally continuous case, if they are not automatically already) right-continuous is
less interesting from the point of view of stochastic control, since the stopping times one is really interested in are
(usually) foretellable/predictable anyway. In general, this is also less of an innocuous operation. For, one might well
concede to being unable to act on a null set; one cannot but feel apprehensive about having to ‘peek infinitesimally
into the future’ before being able to act in the present. Indeed, we will see in the sequel that even the act of
completion alone is less harmless than might seem at first glance.
First, all is well in the discrete case.
Lemma 7.1. Let T = N₀, G a filtration on Ω. Let furthermore P be a complete probability measure
on Ω, whose domain includes G_∞; S a G^P-stopping time. Then S is P-a.s. equal to a stopping time
S′ of G; and for any G-stopping time U, P-a.s. equal to S, (G_U)^P = G^P_S. Moreover, if U is another
random time, P-a.s. equal to S, then it is a G^P-stopping time, and G^P_S = G^P_U.
Proof. For each n ∈ N₀, we may find an A_n ∈ G_n such that {S = n} = A_n, P-a.s.; replacing A_n by
A_n\∪_{N₀∋m<n} A_m (which still belongs to G_n and is P-a.s. equal to {S = n}), we may assume the
A_n are pairwise disjoint. Then S′, defined as equal to n on A_n (n ∈ N₀) and to ∞ on Ω\∪_{m∈N₀} A_m,
is a G-stopping time, P-a.s. equal to S. Let now
U be any time with these two properties of S′. To show (G_U)^P ⊂ G^P_S, it suffices to note that (i) N,
the collection of all P-null sets, is contained in G^P_S and (ii) G_U ⊂ G^P_S, both of which are easy to see.
Conversely, if A ∈ G^P_S, then for each n ∈ N₀ ∪ {∞}, A ∩ {S = n} = B_n, P-a.s., for some B_n ∈ G_n,
and hence the event ∪_{n∈N₀∪{∞}} (B_n ∩ {U = n}) belongs to G_U, and is P-a.s. equal to A.
Finally, let U be another random time, P-a.s. equal to S. For each n ∈ N₀ ∪ {∞}, there is then
a C_n ∈ G_n with {U = n} = {S = n} = C_n, P-a.s., whence U is a G^P-stopping time. It follows, by
what we have shown already, that we can find Z, a G-stopping time, P-a.s. equal to S, hence to U,
and thus with G^P_S = (G_Z)^P = G^P_U.
Corollary 7.2. Let T = N₀; X and Y processes (on Ω, with time domain N₀ and values in E); P_X
and P_Y complete probability measures on Ω, whose domains contain F^X_∞ and F^Y_∞, respectively,
and sharing their null sets. Suppose furthermore S is an (F^X)^{P_X}- and an (F^Y)^{P_Y}-stopping time,
with X^S = Y^S, P_X- and P_Y-a.s. Then (F^X)^{P_X}_S = σ(X^S)^{P_X} = σ(Y^S)^{P_Y} = (F^Y)^{P_Y}_S.⁸

⁸ Of course, all these completions really only depend on the null sets, which the two measures P_X and P_Y share
by assumption.
Proof. From Lemma 7.1 we can find stopping times U and V of F^X and F^Y, respectively, both
P_X- and P_Y-a.s. equal to S. The event {X^U = Y^V} is P_X- and P_Y-almost certain. It then follows,
from Theorem 6.6 and Lemma 7.1 again, that

    (F^X)^{P_X}_S = (F^X_U)^{P_X} = σ(X^U)^{P_X} = σ(X^S)^{P_X} = σ(Y^S)^{P_Y} = σ(Y^V)^{P_Y} = (F^Y_V)^{P_Y} = (F^Y)^{P_Y}_S,

as desired.
Corollary 7.3. Let T = N₀, X a process (on Ω, with time domain N₀ and values in E), P a
complete probability measure on Ω whose domain contains F^X_∞ ∨ F^{X^S}_∞, S : Ω → T ∪ {∞} a random
time. Then the following are equivalent:
(1) S is an (F^X)^P-stopping time.
(2) S is an (F^{X^S})^P-stopping time.
Proof. That the first implies the second is clear from Lemma 7.1, Lemma 6.2 and the fact that
two processes which are versions of each other generate the same filtration, up to null sets. For
the converse, one resorts to re-doing the relevant part of the proof of Lemma 6.2, adding P-a.s.
qualifiers as appropriate; the details are left to the reader.
The temporally continuous case is much more involved. Indeed, we have the following significant
negative results.
Example 7.4. Let Ω = (0, ∞)×{0, 1}; F be the product of the Lebesgue σ-field on (0, ∞) and of the
power set on {0, 1}; thereon let P = Exp(1) × Unif({0, 1}) be the product law (which is complete; any
law on the first coordinate with a continuous distribution function would also do); e (respectively
a) be the projection onto the first (respectively second) coordinate. Define the process N_t :=
a(t − e)1_{[0,t]}(e), t ∈ [0, ∞) (starting at zero, the process N departs from zero at time e with unit
positive drift, or remains at zero for all times, with equal probability, independently of e). Its
completed natural filtration, (F^N)^P, is already right-continuous.
For, if t ∈ [0, ∞), (F^N)^P_{t+} = (F^N_{t+})^P; so let A ∈ F^N_{t+}; we show A ∈ (F^N_t)^P. (i) A ∩ {e = t} is
P-negligible. (ii) For sure A = N^{−1}(G) for some measurable G ⊂ R^{[0,∞)}. Then define, for each
natural n ≥ 1/t (when t > 0), L_n : R^{[0,t]} → R^{[0,∞)} by demanding

    L_n(ω)(u) := ω(u) for u ≤ t,  and  L_n(ω)(u) := ω(t) + (u − t)·(ω(t) − ω(t − 1/n))/(1/n) for u > t

(u ∈ [0, ∞), ω ∈ R^{[0,t]}), a measurable mapping. It follows that for t > 0, for each natural
n ≥ 1/t, N^{−1}(G) ∩ {e ≤ t − 1/n} = N|^{−1}_{[0,t]}(L_n^{−1}(G)) ∩ {e ≤ t − 1/n} ∈ F^N_t. (iii) For each natural
n, A ∩ {e > t + 1/n} = N|^{−1}_{[0,t+1/n]}(G_n) ∩ {e > t + 1/n} for some measurable G_n ⊂ R^{[0,t+1/n]},
so A ∩ {e > t + 1/n} is {e > t + 1/n} or ∅ according as the zero path does or does not belong to G_n
(note this is a “monotone” condition, in the sense that as soon as we once get a non-empty set for
some natural n, we subsequently get {e > t + 1/m} for all natural m ≥ n). It follows that
A ∩ {e > t} = ∪_{n∈N} (A ∩ {e > t + 1/n}) ∈ {∅, {e > t}}; in the non-empty case the zero path belongs
to G, so that also {a = 0} ⊂ A, while {e > t} ∪ {a = 0} = {N_t = 0} up to the P-null set {a = 1, e = t},
and {N_t = 0} ∈ F^N_t. Combining (i)-(ii)-(iii), A ∈ (F^N_t)^P.
Let further U be the first entrance time of the process N to (0, ∞). By the Début Theorem, this
is a stopping time of (F^N)^P, but it is P-a.s. equal to no stopping time of F^N at all.
For, suppose that it were P-a.s. equal to a stopping time V of F^N. Then there would be a
set Ω′, belonging to F, of full P-measure, and such that V = U on Ω′. Tracing everything (F,
P, N, a, e, V) onto Ω′, we would obtain (F′, P′, N′, a′, e′, V′), with (i) V′ equal to the first
entrance time of N′ to (0, ∞) and (ii) V′ a stopping time of F^{N′}, the natural filtration of N′.
Still N′_t = a′(t − e′)1_{[0,t]}(e′), t ≥ 0. Take now {ω, ω′} ⊂ Ω′ with a(ω) = 1, a(ω′) = 0, and denote
t := e(ω). Then N′|_{[0,t]}(ω) = N′|_{[0,t]}(ω′), so ω and ω′ should belong to the same atom of F^{N′}_t; yet
{V′ ≤ t} ∈ F^{N′}_t, with 1_{{V′≤t}}(ω) = 1 and 1_{{V′≤t}}(ω′) = 0, a contradiction.
Moreover, (F^N)^P_U ≠ σ(N^U)^P, since the event A := {U < ∞} = {a = 1}, that N ever assumes a
positive drift, belongs to (F^N)^P_U (which fact is clear), but not to σ(N^U)^P = σ(0)^P, the completed
trivial σ-field (also obvious: P(a = 1) ∉ {0, 1}).
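A toy numerical rendering of the phenomenon (ours; the helper N_path and the time grid are stand-ins for the continuous-time objects):

```python
# Toy illustration (ours) of Example 7.4: on {a = 0} the path of N is
# identically zero whatever e is, so the debut time U = inf{t : N_t > 0}
# cannot be read off the observed path of N.
def N_path(e, a, grid):
    return [a * (t - e) if t >= e else 0.0 for t in grid]

grid = [k / 10 for k in range(21)]        # t = 0.0, 0.1, ..., 2.0
p1 = N_path(0.5, 0, grid)                 # omega1 = (0.5, 0): U(omega1) = inf
p2 = N_path(1.7, 0, grid)                 # omega2 = (1.7, 0): U(omega2) = inf
p3 = N_path(0.5, 1, grid)                 # omega3 = (0.5, 1): U(omega3) = 0.5
assert p1 == p2                           # identical observations of N, and
assert p1[:6] == p3[:6]                   # identical on [0, 0.5]; yet
# 1_{U <= 1/2}(omega3) = 1 while 1_{U <= 1/2}(omega2) = 0, although the two
# observed paths coincide on [0, 1/2]: {U <= 1/2} is not in F^N_{1/2}.
```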
Example 7.5. It is worse, still. Let Ω = (0, ∞) × {−2, −1, 0} be endowed with the law P =
Exp(1) × Unif({−2, −1, 0}), defined on the tensor product of the Lebesgue σ-field on (0, ∞) and
the power set of {−2, −1, 0}. Denote by e, respectively I, the projection onto the first, respectively
second, coordinate. Define the process X_t := I(t − e)1_{[0,t]}(e), t ∈ [0, ∞), and the process Y_t :=
(−1)(t − e)1_{[0,t]}(e)1_{{−1,−2}} ∘ I, t ∈ [0, ∞). The completed natural filtrations of X and Y are already
right-continuous. The first entrance time S of X into (−∞, 0) is equal to the first entrance time
of Y into (−∞, 0), and this is a stopping time of (F^X)^P as it is of (F^Y)^P (but not of F^X and not of
F^Y). Moreover, X^S = 0 = Y^S.
Consider now the event A := {I = −1}. Then A ∈ (F^X)^P_S (this is clear). However, A ∉ (F^Y)^P_S. For,
assuming the converse, we should have, P-a.s., 1_{A∩{S≤1}} = F ∘ Y|_{[0,1]} for some measurable F. In
particular, since A ∩ {S ≤ 1} has positive probability, there should be an ω ∈ A ∩ {S ≤ 1} with
F(Y|_{[0,1]}(ω)) = 1. But also the event {I = −2} ∩ {S ≤ 1} has positive probability and is disjoint
from A ∩ {S ≤ 1}, so there should be an ω′ ∈ {I = −2} ∩ {S ≤ 1} having F(Y|_{[0,1]}(ω′)) = 0 (and,
both sets of admissible choices being of full measure within events of positive probability, we may
take e(ω′) = e(ω)). A contradiction, since then Y|_{[0,1]}(ω′) = Y|_{[0,1]}(ω).
The problem here, as it were, is that in completing the natural filtration, the (seemingly innocuous)
operation of adding all the events negligible under P is performed uncountably many times (once
for every deterministic time). In particular, this cannot be recovered by a single completion of the
σ-field generated by the stopped process. Completions are not always harmless.
Furthermore, it does not appear immediately clear to us what a sensible direct ‘probabilistic’
analogue of Lemma 6.1 should be, nor, indeed, how to go about proving one, and then using it to
produce the relevant counterparts to the results of Section 6.
However, the situation is not so bleak, since positive results can be got, at least for fore-
tellable/predictable stopping times, as in the case of discrete time: by an indirect method,
reducing the ‘probabilistic’ to the ‘measure-theoretic’ case. We use here the terminology of [4,
p. 127, Definitions IV.69 & IV.70]; given a filtration G and a probability measure Q on Ω, whose
domain includes G_∞:

    A random time S : Ω → [0, ∞] is predictable relative to G if the stochastic
    interval ⟦S, ∞⟦ is predictable. It is Q-foretellable relative to G if there exists a
    Q-a.s. nondecreasing sequence (S_n)_{n≥1} of G-stopping times with S_n ≤ S, Q-a.s.,
    for all n ≥ 1, and such that, again Q-a.s., lim_{n→∞} S_n = S and S_n < S for all n
    on {S > 0}; it is foretellable if the a.s. qualifications can be omitted.

Note that in a P-complete filtration (P itself assumed complete), the notions of predictable,
foretellable and P-foretellable stopping times coincide [4, p. 127, IV.70; p. 128, Theorem IV.71 &
p. 132, Theorem IV.77].
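By way of illustration, a standard example of a foretellable time (our addition, not taken from [4]):

```latex
% Let B be a Brownian motion issued from 0, with natural filtration F^B, and
\[
  S := \inf\{t \ge 0 : |B_t| \ge 1\}, \qquad
  S_n := \inf\{t \ge 0 : |B_t| \ge 1 - 1/n\}, \quad n \ge 1.
\]
% The S_n are F^B-stopping times, nondecreasing in n, with S_n <= S; by path
% continuity, P-a.s., S_n < S on {S > 0} and S_n -> S (using that |B| a.s.
% exits every interval (-1 + 1/n, 1 - 1/n)); so S is P-foretellable and, in
% the completed filtration, predictable.
```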
The following is now a complement to [4, p. 120, Theorem IV.59 & p. 133, Theorem IV.78] and [9,
p. 5, Lemma 1.19], and an analogue of the discrete statement of Lemma 7.1:

Proposition 7.6. Let T = [0, ∞), G be a filtration on Ω. Let furthermore P be a complete
probability measure on Ω, whose domain includes G_∞; S a predictable stopping time relative to G^P.
Then S is P-a.s. equal to a predictable stopping time P of G. Moreover, if U is any G-stopping
time, P-a.s. equal to S, then G^P_S = (G_U)^P. Finally, if S′ is another random time, P-a.s. equal to S,
then it is a G^P-stopping time, and G^P_S = G^P_{S′}.
Proof. The first claim is contained in [4, p. 133, Theorem IV.78].
Now let U be any G-stopping time, P-a.s. equal to S. The inclusion G^P_S ⊃ (G_U)^P is obvious. Then
take A ∈ G^P_S. Since A ∈ G^P_∞ = (G_∞)^P, there is an A′ ∈ G_∞, such that A′ = A, P-a.s. Furthermore,
since S is foretellable,

    S_A := S on A,  S_A := ∞ on Ω\A,

is foretellable also (if (S_n)_{n≥1} P-foretells S, then ((S_n)_A ∧ n)_{n≥1} P-foretells S_A). Hence, by what
we have already shown, there exists V, a G-stopping time, with V = S_A, P-a.s. So, P-a.s., A =
(A′ ∩ {U = ∞}) ∪ {V = U < ∞} ∈ (G_U)^P.
Finally, let S′ be a random time, P-a.s. equal to S. Clearly, it is a G^P-stopping time. Moreover,
we have found P, a G-stopping time, P-a.s. equal to S and (by way of corollary) to S′. It follows from
what we have just shown that G^P_S = (G_P)^P = G^P_{S′}.
From this we can obtain easily a couple of useful counterparts to the findings of Section 6 in the
continuous case. They (Corollaries 7.7, 7.9 and 7.10 that follow) should be used in conjunction with
(in this order) (i) the fact that a standard Borel space⁹-valued random element, measurable with
respect to the domain of the completion of a probability measure Q, is almost surely equal to a
random element measurable with respect to the (uncompleted) domain of Q [10, p. 13, Lemma 1.25],
and (ii) the existence part of Proposition 7.6. Loosely speaking, one imagines working
on the completion of a nice (Blackwell) space. Then the quantities measurable with respect to
the completed σ-fields are a.s. equal to quantities measurable with respect to the uncompleted
σ-fields, and to them the ‘measure-theoretic’ results apply. Taking completions again, we arrive
at the relevant ‘probabilistic’ statements. The formal results follow.
Corollary 7.7. Let T = [0, ∞), Z a process, and P a complete probability measure on Ω, whose
domain includes F^Z_∞; P an (F^Z)^P-predictable stopping time. If, further, for some process X P-
indistinguishable from Z and a stopping time S of F^X, P-a.s. equal to P:
(1) (Ω, G) is Blackwell for some σ-field G ⊃ F^X_S ∨ σ(X^S).
(2) σ(X^S) is separable.
(3) (Im X^S, E^{⊗[0,∞)}|_{Im X^S}) is Hausdorff.
then (F^Z)^P_P ⊂ σ(Z^P)^P.
Remark 7.8. The reverse inclusion σ(Z^P) ⊂ (F^Z)^P_P is usually trivial (compare the remarks on this
in the second bullet point entry of Section 6, p. 22).
Proof. According to Proposition 6.3, F^X_S ⊂ σ(X^S). Also (F^X)^P = (F^Z)^P and σ(X^S)^P = σ(Z^P)^P.
Taking completions in F^X_S ⊂ σ(X^S), we obtain, by Proposition 7.6 as applied to the stopping time
P of (F^X)^P, P-a.s. equal to the stopping time S of F^X:

    (F^Z)^P_P = (F^X)^P_P = (F^X_S)^P ⊂ σ(X^S)^P = σ(Z^P)^P,

as desired.

⁹ One that is Borel isomorphic to a Borel subset of [0, 1] [10, p. 7], equivalently to a Borel subset of a Polish space [2,
p. 12, Definition 6.2.10]. Recall the spaces of Euclidean space-valued càdlàg (resp. continuous) paths endowed with
the Skorohod topology [9, Section VI.1] (resp. the topology of locally uniform convergence [11, p. 60]) are Polish,
hence standard Borel, spaces.
Corollary 7.9. Let Z and W be two processes (on Ω, with time domain [0, ∞) and values in
E); P_Z and P_W probability measures on Ω, sharing their null sets, and whose domains include
F^Z_∞ and F^W_∞, respectively; P a predictable (F^Z)^{P_Z}- and (F^W)^{P_W}-stopping time. Suppose
furthermore Z^P = W^P, P_Z- and P_W-a.s. If, for two processes X and Y, P_Z- and P_W-indistinguishable
from Z and W, respectively, and some stopping times S and U of F^X and F^Y, respectively, P_Z- and
P_W-a.s. equal to P:
(1) (Ω, G) (resp. (Ω, H)) is Blackwell for some σ-field G ⊃ F^X_∞ (resp. H ⊃ F^Y_∞).
(2) σ(X^S) (resp. σ(Y^U)) is separable and contained in F^X_S (resp. F^Y_U).
(3) (Im X^S, E^{⊗[0,∞)}|_{Im X^S}) (resp. (Im Y^U, E^{⊗[0,∞)}|_{Im Y^U})) is Hausdorff.
then (F^Z)^{P_Z}_P = σ(Z^P)^{P_Z} = σ(W^P)^{P_W} = (F^W)^{P_W}_P.
Proof. The claim follows from Corollary 7.7 and the fact that, again (similarly as in the proof of
Corollary 7.7), σ(X^S) ⊂ F^X_S implies σ(Z^P)^{P_Z} ⊂ (F^Z)^{P_Z}_P; likewise for W.
Corollary 7.10. Let X be a process (on Ω, with time domain [0, ∞) and values in E); P a complete
probability measure on Ω, whose domain includes F^X_∞; S and P two predictable stopping times of
(F^X)^P with S ≤ P. Let U and V be two stopping times of the natural filtration of a process Z, P-
indistinguishable from X, P-a.s. equal to S and P, respectively, with U ≤ V, and such that: (Ω, G)
is Blackwell for some σ-field G ⊃ σ(Z^V) ∨ σ(Z^U), (Im Z^V, E^{⊗T}|_{Im Z^V}) is Hausdorff, and σ(Z^V) is
separable. Then σ(X^S)^P ⊂ σ(X^P)^P.
Remark 7.11. The way one is able to use the existence part of Proposition 7.6 here is as follows:
there are U and V′, stopping times of F^Z, P-a.s. equal to S and P, respectively. Then, true, U ≤ V′
only P-a.s. But V := V′1(U ≤ V′) + U1(U > V′) is also a stopping time of F^Z, P-a.s. equal to P,
and it satisfies U ≤ V with certainty.

Proof. We find that σ(X^S)^P = σ(Z^U)^P and σ(X^P)^P = σ(Z^V)^P. Then apply Proposition 6.9.
We are not able to provide a (in conjunction with Proposition 7.6) useful counterpart to Propo-
sition 6.5. (True, Proposition 7.6 says that given a predictable stopping time P of (F^X)^P, there is
a predictable stopping time U of F^X, P-a.s. equal to P. But this is not to say that U is a stopping
time of F^{X^U}, so one cannot directly apply Proposition 6.5.) This is open to future research.
References
1. A. Bensoussan. Stochastic Control of Partially Observable Systems. Cambridge University Press, 2004.
2. V. I. Bogachev. Measure Theory, volume 2 of Measure Theory. Springer, 2007.
3. Z. Brzeźniak and S. Peszat. Space-time continuous solutions to SPDE’s driven by a homogeneous Wiener process.
Studia Mathematica, 137(3):261–299, 1999.
4. C. Dellacherie and P. A. Meyer. Probabilities and Potential A. North-Holland Mathematics Studies. Hermann,
Paris, 1978.
5. N. El Karoui. Les Aspects Probabilistes Du Contrôle Stochastique. In P. L. Hennequin, editor, Ecole d’Eté de
Probabilités de Saint-Flour IX-1979, volume 876 of Lecture Notes in Mathematics, chapter 2, pages 73–238.
Springer Berlin / Heidelberg, 1981.
6. N. El Karoui, D. Nguyen, and M. Jeanblanc-Picqué. Existence of an Optimal Markovian Filter for the Control
under Partial Observations. SIAM Journal on Control and Optimization, 26(5):1025–1061, 1988.
7. W. Fleming and E. Pardoux. Optimal control for partially observed diffusions. SIAM Journal on Control and
Optimization, 20(2):261–285, 1982.
8. R. G. Fryer and P. Harms. Two-Armed Restless Bandits with Imperfect Information: Stochastic Control and
Indexability. NBER Working Papers 19043, National Bureau of Economic Research, Inc, May 2013.
9. J. Jacod and A. N. Shiryaev. Limit theorems for stochastic processes. Grundlehren der mathematischen Wissenschaften. Springer-Verlag, Berlin Heidelberg, 2003.
10. O. Kallenberg. Foundations of Modern Probability. Probability and Its Applications. Springer, New York Berlin
Heidelberg, 1997.
11. I. Karatzas and S. E. Shreve. Brownian Motion and Stochastic Calculus. Graduate Texts in Mathematics. Springer
New York, 1991.
12. P. E. Protter. Stochastic Integration and Differential Equations: Version 2.1. Applications of Mathematics.
Springer, Berlin Heidelberg, 2004.
13. L. C. G. Rogers and D. Williams. Diffusions, Markov Processes, and Martingales: Volume 1, Foundations.
Cambridge Mathematical Library. Cambridge University Press, 2000.
14. H. M. Soner and N. Touzi. Dynamic programming for stochastic target problems and geometric flows. Journal
of the European Mathematical Society, 4(3):201–236, 2002.
15. C. Striebel. Martingale conditions for the optimal control of continuous time stochastic systems. Stochastic
Processes and their Applications, 18(2):329 – 347, 1984.
16. W. Wonham. On the separation theorem of stochastic control. SIAM Journal on Control, 6(2):312–326, 1968.
Appendix A. Miscellaneous technical results
Throughout this appendix (Ω, F, P) is a probability space; E denotes expectation with respect
to P.
Lemma A.1 (On conditioning). Let X : Ω → [−∞, +∞] be a random variable, and Gi ⊂ F,
i = 1, 2, two sub-σ-fields of F agreeing when traced on A ∈ G1 ∩ G2 . Then, P-a.s. on A, E[X|G1 ] =
E[X|G2 ], whenever X has a P-integrable positive or negative part.
Proof. 1_A Z is G₂-measurable for any G₁-measurable Z, by an approximation argument. Then,
P-a.s., 1_A E[X|G₁] = E[1_A X|G₂], by the very definition of conditional expectation.
Lemma A.2 (Generalised conditional Fatou and Beppo Levi). Let G ⊂ F be a sub-σ-field and
(f_n)_{n≥1} a sequence of [−∞, +∞]-valued random elements, whose negative parts are dominated P-a.s.
by a single P-integrable random variable. Then, P-a.s.,

    E[lim inf_{n→∞} f_n | G] ≤ lim inf_{n→∞} E[f_n | G].

If, moreover, (f_n)_{n≥1} is P-a.s. nondecreasing, then, P-a.s.,

    E[lim_{n→∞} f_n | G] = lim_{n→∞} E[f_n | G].
Proof. Just apply conditional Fatou (resp. Beppo Levi) to the P-a.s. nonnegative (resp. nonnegative
and nondecreasing) sequence f_n + g, where g is the single P-integrable random variable which
P-a.s. dominates the negative parts of the f_n. Then use linearity and subtract the P-a.s. finite
quantity E[g|G].
The following is a slight generalization of [15, Theorem A2].

Lemma A.3 (Essential supremum and the upwards lattice property). Let G ⊂ F be a sub-σ-field
and X = (X_λ)_{λ∈Λ} a collection of [−∞, +∞]-valued random variables with integrable negative parts.
Assume furthermore that for each {ε, M} ⊂ (0, ∞), X has the “(ε, M)-upwards lattice property”,
i.e. for all {λ, λ′} ⊂ Λ, one can find a λ″ ∈ Λ with X_{λ″} ≥ (M ∧ X_λ) ∨ (M ∧ X_{λ′}) − ε, P-a.s. Then,
P-a.s.,

    E[P-esssup_{λ∈Λ} X_λ | G] = P-esssup_{λ∈Λ} E[X_λ | G],                                  (A.1)

where on the right-hand side the essential supremum may of course equally well be taken with respect
to the measure P|_G.
Proof. It is assumed without loss of generality that Λ ≠ ∅, whence remark that P-esssup_{λ∈Λ} X_λ has
an integrable negative part. Then the “≥-inequality” in (A.1) is immediate.
Conversely, we show first that it is sufficient to establish the “≤-inequality” in (A.1) for each
truncated family (X_λ ∧ N)_{λ∈Λ}, as N runs over N. Indeed, suppose we have, P-a.s.,

    E[P-esssup_{λ∈Λ} (X_λ ∧ N) | G] ≤ P-esssup_{λ∈Λ} E[X_λ ∧ N | G]

for all N ∈ N. Then a fortiori, P-a.s., for all N ∈ N,

    E[P-esssup_{λ∈Λ} (X_λ ∧ N) | G] ≤ P-esssup_{λ∈Λ} E[X_λ | G],

and generalised conditional monotone convergence (Lemma A.2) allows us to pass to the limit:

    E[lim_{N→∞} P-esssup_{λ∈Λ} (X_λ ∧ N) | G] ≤ P-esssup_{λ∈Λ} E[X_λ | G],

P-a.s. But clearly, P-a.s., lim_{N→∞} P-esssup_{λ∈Λ} (X_λ ∧ N) ≥ P-esssup_{λ∈Λ} X_λ, since for all λ ∈ Λ we
have, P-a.s., X_λ ≤ lim_{N→∞} (X_λ ∧ N) ≤ lim_{N→∞} P-esssup_{μ∈Λ} (X_μ ∧ N).
Thus it will indeed be sufficient to establish the “≤-inequality” in (A.1) for the truncated families,
and so it is assumed without loss of generality (take M = N) that X enjoys, for each ε ∈ (0, ∞),
the “ε-upwards lattice property”: for all {λ, λ′} ⊂ Λ, one can find a λ″ ∈ Λ with X_{λ″} ≥ X_λ ∨ X_{λ′} − ε,
P-a.s.
Then take (λ_n)_{n≥1} ⊂ Λ such that, P-a.s., P-esssup_{λ∈Λ} X_λ = sup_{n≥1} X_{λ_n}, and fix δ > 0. Recursively
define (λ′_n)_{n≥1} ⊂ Λ so that X_{λ′₁} = X_{λ₁}, while for n ∈ N, P-a.s., X_{λ′_{n+1}} ≥ X_{λ′_n} ∨ X_{λ_{n+1}} − δ/2^n.
Prove by induction that, P-a.s., for all n ∈ N, X_{λ′_n} ≥ max_{1≤k≤n} (X_{λ_k} − Σ_{l=1}^{n−1} δ/2^l), so that
lim inf_{n→∞} X_{λ′_n} ≥ sup_{n∈N} X_{λ_n} − δ, P-a.s. Note next that the negative parts of (X_{λ′_n})_{n∈N} are
dominated P-a.s. by a single P-integrable random variable. By the generalised conditional Fatou’s
lemma (Lemma A.2) we therefore obtain, P-a.s., P-esssup_{λ∈Λ} E[X_λ | G] ≥ lim inf_{n→∞} E[X_{λ′_n} | G] ≥
E[lim inf_{n→∞} X_{λ′_n} | G] ≥ E[P-esssup_{λ∈Λ} X_λ | G] − δ. Finally, let δ descend to 0 (over some sequence
descending to 0).
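As a quick sanity check of (A.1) on a finite probability space (ours; the upwards lattice property is arranged by closing a small family under pointwise maxima):

```python
# Finite sanity check (ours) of (A.1): on a finite probability space, with a
# family closed under pointwise maxima (so the upwards lattice property holds
# with epsilon = 0), the essential supremum is the pointwise maximum.
import itertools, random

random.seed(1)
n = 6
p = [1.0 / n] * n                                  # uniform P on {0, ..., 5}
G_atoms = [{0, 1, 2}, {3, 4, 5}]                   # G generated by a partition
base = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(3)]
fam = base + [[max(f[i], g[i]) for i in range(n)]
              for f, g in itertools.combinations(base, 2)]
fam.append([max(f[i] for f in base) for i in range(n)])  # the triple max

def cond_exp(f):                                   # E[f | G], pointwise
    out = [0.0] * n
    for A in G_atoms:
        m = sum(p[i] * f[i] for i in A) / sum(p[i] for i in A)
        for i in A:
            out[i] = m
    return out

esssup = [max(f[i] for f in fam) for i in range(n)]      # P-esssup X_lambda
lhs = cond_exp(esssup)                                   # E[esssup | G]
rhs = [max(cond_exp(f)[i] for f in fam) for i in range(n)]
assert all(abs(x - y) < 1e-12 for x, y in zip(lhs, rhs))
print("(A.1) verified on this example")
```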
Department of Mathematics, University of Ljubljana, Slovenia
E-mail address: matija.vidmar@fmf.uni-lj.si
Department of Statistics, University of Warwick, UK
E-mail address: s.d.jacka@warwick.ac.uk