Lectures on Stochastic Analysis
Autumn 2012 version
Xue-Mei Li
The University of Warwick
Typeset: April 9, 2013
Contents

1 Prologue
  1.1 What do we cover in this course and why?
  1.2 Exams
  1.3 References

2 Preliminaries
  2.1 Basics
    2.1.1 Sets
    2.1.2 Measurable Spaces
    2.1.3 The Monotone Class Theorem
    2.1.4 Measures
    2.1.5 Measurable Functions
  2.2 Integration with respect to a measure
  2.3 Lp spaces
  2.4 Notions of convergence of functions
  2.5 Convergence Theorems
    2.5.1 Uniform Integrability
  2.6 Pushed Forward Measures, Distributions of Random Variables
  2.7 Lebesgue Integrals
  2.8 Total variation

3 Lectures
  3.1 Lectures 1-2
  3.2 Notation
  3.3 The Wiener Spaces
  3.4 Lecture 3. The Pushed Forward Measure
  3.5 Basics of Stochastic Processes
  3.6 Brownian Motion
  3.7 Lectures 4-5
    3.7.1 Finite Dimensional Distributions
    3.7.2 Gaussian Measures on Rd
  3.8 Kolmogorov's Continuity Theorem
    3.8.1 The existence of the Wiener Measure and Brownian Motion
    3.8.2 Lecture 6. Sample Properties of Brownian Motions
    3.8.3 The Wiener measure does not charge the space of finite energy*
  3.9 Product σ-algebras
    3.9.1 The Borel σ-algebra on the Wiener Space
    3.9.2 Kolmogorov's extension Theorem

4 Conditional Expectations
  4.1 Preliminaries
  4.2 Lectures 7-8: Conditional Expectations
  4.3 Properties of conditional expectation
  4.4 Conditioning on a Random Variable
  4.5 Regular Conditional Probabilities
  4.6 Lecture 9: Regular Conditional Distribution and Disintegration
    4.6.1 Lecture 9: Conditional Expectation as Orthogonal Projection
    4.6.2 Uniform Integrability of conditional expectations

5 Martingales
    5.0.3 Lecture 10: Introduction
  5.1 Lecture 11: Overview
  5.2 Lecture 12: Stopping Times
    5.2.1 Extra Reading
    5.2.2 Lecture 13: Stopped Processes
  5.3 Lecture 14: The Martingale Convergence Theorem
  5.4 Lecture 15: The Optional Stopping Theorem
  5.5 Martingale Inequalities (I)
  5.6 Lecture 16: Local Martingales

6 The Quadratic Variation Process
  6.1 Lectures 18-20: The Basic Theorem
  6.2 Martingale Inequalities (II)

7 Stochastic Integration
  7.1 Lecture 21: Integration
  7.2 Lecture 21: Stochastic Integration
    7.2.1 Stochastic Integral: Characterization
    7.2.2 Properties of Integrals
    7.2.3 Lecture 22: Stochastic Integration (2)
  7.3 Localization
  7.4 Properties of Stochastic Integrals
  7.5 Itô Formula
  7.6 Lévy's Martingale Characterization Theorem

8 Stochastic Differential Equations
    8.0.1 Stochastic processes defined up to a random time
  8.1 Lecture 24. Stochastic Differential Equations
  8.2 The definition
  8.3 Examples
    8.3.1 Pathwise Uniqueness and Uniqueness in Law
    8.3.2 Maximal solution and Explosion Time
    8.3.3 Strong and Weak Solutions
    8.3.4 The Yamada-Watanabe Theorem
    8.3.5 Strong Completeness, flow
    8.3.6 Markov Process and its semi-group
    8.3.7 The semi-group associated to the SDE
    8.3.8 The Infinitesimal Generator for the SDE
  8.4 Existence and Uniqueness Theorems
    8.4.1 Gronwall's Lemma
    8.4.2 Main Theorems
    8.4.3 Global Lipschitz case
    8.4.4 Locally Lipschitz Continuous Case
    8.4.5 Non-explosion
  8.5 Girsanov Theorem
  8.6 Summary
  8.7 Glossary
Chapter 1
Prologue
The objective of the course is to understand and learn the theory of martingales
and the basics of stochastic differential equations (SDEs). We will also study relevant
topics on Brownian motion and on Markov processes. We cover the basic theories
of martingales, Brownian motion, stochastic integration, and stochastic
differential equations, and we offer topics on stochastic flows and on the geometry of
stochastic differential equations.
1.1 What do we cover in this course and why?

What are Brownian motions? They result from summing many small independent
influences over a time interval [0, t], t ≥ 0; by the central limit theorem they have
Gaussian laws that change with time t.
What are martingales? A stochastic process is a martingale if, roughly speaking,
the conditional average of its value at a future time t, given its value at the current
time s, is its value at s. On average you expect to see what is already statistically
known. Continuous martingales and local martingales can be represented as
stochastic integrals with respect to a Brownian motion by the Integral Representation
Theorem (the Clark-Ocone formula from Malliavin calculus gives an explicit
representation).

What are Markov processes? The conditional average of the future value of a
Markov process given knowledge of its past up to now is the same as the conditional
average of the future value given knowledge of its present state only.

The Dubins-Schwarz Theorem says that a continuous (local) martingale is a time
change of a Brownian motion, i.e. a Brownian motion run with a random clock. The
random clock is the quadratic variation of the martingale. The time change need not
be Markovian, and hence a martingale, as a stochastic process, may not have the
Markov property. Markov processes relate to second order parabolic differential
equations.
The following selected topics give guidance on what we intend to work on.
We will cover some topics in depth.

• Brownian Motion. The definition(s), Lévy's characterization theorem, the
martingale property, the Markov property, properties of sample paths.

• Stochastic integration with respect to Brownian motions and with respect to
semi-martingales.

• Rudiments of the Wiener Space.

• Diffusion Processes. Infinitesimal generators, Markov semigroups, construction
by SDEs.

• SDEs. Strong and weak solutions of SDEs, pathwise uniqueness and uniqueness
in law of solutions, existence of a local solution, existence of a global
solution, the infinitesimal generator and associated semi-groups, the Markov
property and the cocycle property, the non-explosion problem, the flow
problem, and the Girsanov transform for SDEs.

• Martingale Theory. Martingales, local martingales, semi-martingales, change
of measures and the Girsanov transform.
• Itô’s formula.
Prerequisites: Measure Theory and Theory of Integration (MA359) or advanced
probability theory. Good working skills in Analysis (MA244 Analysis III)
and Metric Spaces (MA222). Knowledge of Functional Analysis (MA3G7,
MA3G8) would be helpful. The following course would be especially helpful:
Brownian Motion (MA4F7).

Leads to: This module leads to Topics in Stochastic Analysis (MA595) and
Stochastic Partial Differential Equations.

Related to: This module is related to Markov Processes and Percolation Theory
(MA3H2), Quantum Mechanics: Basic Principles and Probabilistic Methods
(MA4A7), and Stochastic Partial Differential Equations.
1.2 Exams

This module is assessed by a single exam (at the beginning of term 3). I hand
out a problem sheet each week. The best way to review for the exam is to work on
the problems.
1.3 References
For a comprehensive study of martingales we refer to "Continuous Martingales and
Brownian Motion" by D. Revuz and M. Yor [23]. An enjoyable introduction to
martingales is the book "Probability with Martingales" by D. Williams [29].
For further reading on Brownian motion check M. Yor's recent books, e.g. [30],
also [18] by R. Mansuy and M. Yor, and [19] by P. Mörters and Y. Peres.

For an overall reference for stochastic differential equations, we refer to "Stochastic
Differential Equations and Diffusion Processes, second edition" by N. Ikeda and
S. Watanabe [13]. The small book [16] by H. Kunita is nice to read. There are two
lovely books: "Stochastic Differential Equations and Applications" by A. Friedman
[9, 10], and "Stochastic Differential Equations" by I. Gihman and A. V. Skorohod
[11]. Another book that is good for working out examples is "Stochastic Stability
of Differential Equations" by R. Z. Khasminskii [12]. Two books that are good for
beginners are "Stochastic Differential Equations" by B. Øksendal [20] and
"Brownian Motion and Stochastic Calculus" by I. Karatzas and S. E. Shreve [15].
The book by Øksendal has six editions. I like editions three and four: they
are neat and compact. For further studies there are "Diffusions, Markov Processes
and Martingales" by C. Rogers and D. Williams [25, 24]. Another lovely reference
book is "Foundations of Modern Probability" by Kallenberg [14]; it works
great as a reference book. For SDEs driven by space-time martingales see "Stochastic
Flows and Stochastic Differential Equations" by H. Kunita [17]. For SDEs on
manifolds see "Stochastic Differential Equations on Manifolds" by K. D. Elworthy
[4]. For work from the point of view of random dynamics see "Random Dynamical
Systems" by L. Arnold [1] and "Random Perturbations of Dynamical Systems" by
M. I. Freidlin and A. D. Wentzell. For further work on the geometry of SDEs have
a look at the books "On the Geometry of Diffusion Operators and Stochastic Flows"
[7] and "The Geometry of Filtering" [5] by K. D. Elworthy, Y. LeJan and X.-M. Li.
For a theory of Markov processes, and especially the treatment of the martingale
problem, see "Multidimensional Diffusion Processes" by D. Stroock and S. R. S.
Varadhan [27]. There are a number of nice, slim books by the two authors; see
D. W. Stroock [26] and S. R. S. Varadhan [28].

If you wish to review the theory of integration, try Royden's book "Real Analysis".
It is easy to read and useful as a reference. For further study on measures see
"Real Analysis" by Folland [8]. Have a read of "Probability Measures on Metric
Spaces" by Parthasarathy [21] for a deep theory of measures. The books "Measure
Theory, vol. 1 & 2" by Bogachev [3] are quite useful. For some aspects of measures
on the Wiener space see "Convergence of Probability Measures" by Billingsley [2].
Chapter 2
Preliminaries
2.1 Basics

2.1.1 Sets

De Morgan's laws. For any index set I,

  (∩α∈I Eα)c = ∪α∈I Eαc,    (∪α∈I Eα)c = ∩α∈I Eαc.
If {An} is a sequence of sets, define

  lim sup_{n→∞} An = ∩_{m=1}^∞ ∪_{k=m}^∞ Ak,
  lim inf_{n→∞} An = ∪_{m=1}^∞ ∩_{k=m}^∞ Ak.
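The two set limits can be computed concretely for a truncated family. A minimal Python sketch (the alternating family of sets below is an illustrative assumption): every point lies in infinitely many An, so the lim sup is the whole universe, while no point lies in all but finitely many An, so the lim inf is empty.

```python
# Finite illustration of lim sup / lim inf of sets over the truncated
# family A_1, ..., A_N.  The A_n alternate between the even and odd
# elements of {0, ..., 9} (a made-up example).
N = 100
universe = set(range(10))
evens = {x for x in universe if x % 2 == 0}
odds = universe - evens
A = [evens if n % 2 == 0 else odds for n in range(1, N + 1)]

# lim sup A_n = ∩_m ∪_{k≥m} A_k : points belonging to infinitely many A_n.
limsup = set.intersection(*[set.union(*A[m:]) for m in range(N - 1)])
# lim inf A_n = ∪_m ∩_{k≥m} A_k : points belonging to all but finitely many A_n.
liminf = set.union(*[set.intersection(*A[m:]) for m in range(N - 1)])

assert limsup == universe   # every point is in infinitely many A_n
assert liminf == set()      # no point is eventually in all A_n
```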
2.1.2 Measurable Spaces
Definition 2.1.1 Let Ω be a set. A σ-algebra F on Ω is a collection of subsets of
Ω satisfying the following conditions:

(1) ∅ ∈ F;

(2) F is closed under complements: if A ∈ F then Ac ∈ F;

(3) F is closed under countable unions: if An ∈ F for all n, then ∪_{n=1}^∞ An ∈ F.

The pair (Ω, F) is called a measurable space, and elements of F are measurable sets.
Definition 2.1.2 Let G be a collection of subsets of Ω. The σ-algebra generated
by G is the smallest σ-algebra that contains G. It is denoted by σ(G).
Definition 2.1.3 Let (X, d) be a metric space with metric d. Denote by Ba(x) the
open ball centred at x with radius a. The Borel σ-algebra, denoted by B(X), is
defined to be

  σ({Ba(x) : x ∈ X, a > 0}).

Elements of B(X) are Borel sets.

If not otherwise mentioned, this is the σ-algebra we will use. Single points, and
therefore countable subsets of a metric space, are always measurable, since
{x0} = ∩n {x : d(x, x0) < 1/n}.
2.1.3 The Monotone Class Theorem

Definition 2.1.4 Let Ω be a set. A non-empty family G of subsets of Ω is called a
monotone class if limits of monotone sequences in G belong to G:

1. ∪_{n=1}^∞ An ∈ G if (An) is an increasing sequence in G;

2. ∩_{n=1}^∞ An ∈ G if (An) is a decreasing sequence in G.

The finite additivity property together with the monotone class property is
equivalent to the σ-additivity property.

A σ-algebra is a monotone class. The intersection of monotone classes containing
a common element is a monotone class.
Theorem 2.1.5 (Monotone Class Theorem) If A is an algebra, then the smallest
monotone class m(A) containing A is the σ-algebra σ(A) generated by A.

Definition 2.1.6 A collection A of subsets is a π-system if A, B ∈ A implies that
A ∩ B ∈ A.

The intersection of π-systems is a π-system. Note that some authors insist that
the empty set ∅ belongs to a π-system.
Definition 2.1.7 A family B of subsets of a set Ω is a σ-additive class, also called
a Dynkin system, if

1. Ω ∈ B;

2. if B1 ⊂ B2 with B1, B2 ∈ B, then B2 \ B1 ∈ B;

3. if (An) is an increasing sequence in B, then ∪_{n=1}^∞ An ∈ B.

Note that properties 1 and 2 imply that a σ-additive class is closed under taking
complements. A σ-additive class is a monotone class. The intersection of any number
of σ-additive classes is a σ-additive class.

Theorem 2.1.8 (Dynkin's Lemma) If A is a π-system, then the σ-additive class
(Dynkin system) generated by A is the σ-algebra generated by A.
2.1.4 Measures

Definition 2.1.9 Let (Ω, F) be a measurable space. A measure µ is a function
µ : F → [0, ∞] with the following properties:

1. µ(∅) = 0;

2. if {Ai} is a pairwise disjoint sequence from F, that is, Ai ∩ Aj = ∅
whenever i ≠ j, then µ(∪_{i=1}^∞ Ai) = Σ_{i=1}^∞ µ(Ai).

Definition 2.1.10 The triple (Ω, F, µ) is called a measure space. The measure µ
is finite if µ(Ω) < ∞. It is a probability measure if µ(Ω) = 1, and in that case the
triple (Ω, F, µ) is called a probability space. The measure is σ-finite if there are
measurable sets {An}_{n=1}^∞ of finite measure such that Ω = ∪_{n=1}^∞ An. A measure on
a Borel σ-algebra B(Ω) is called a Borel measure.

Property (2) is called σ-additivity; it implies that if (An) is a non-decreasing
sequence of measurable sets, A1 ⊂ A2 ⊂ . . . , then

  lim_{n→∞} µ(An) = µ(∪_{n=1}^∞ An).    (2.1)

Similarly, if (An) is a non-increasing sequence of measurable sets, A1 ⊃ A2 ⊃ . . . ,
with µ(A1) < ∞, then

  lim_{n→∞} µ(An) = µ(∩_{n=1}^∞ An).
Definition 2.1.11 Consider the measure space (Ω, F, µ). We say that F is complete
if every subset of a measurable set of null measure is in F.

Standard Assumption. We assume that the measure is a probability measure
and that the σ-algebra is complete.

In this case, if f = g almost surely, that is, if they agree on a set of full measure,
then they are either both measurable or both not measurable.

Remark 2.1.12 A finite measure on F is determined by its values on a π-system
that generates F.
2.1.5 Measurable Functions

Definition 2.1.13 Given measurable spaces (Ω, F) and (S, B), a function f :
Ω → S is said to be measurable if f⁻¹(B) = {ω : f(ω) ∈ B} belongs to F
for each B ∈ B. Measurable functions are known as random variables, especially
if S = R.

Let (Ω, F) be a measurable space.

Example 2.1.14 The indicator function of a set A is the real-valued function

  1A(ω) = 1 if ω ∈ A,  1A(ω) = 0 if ω ∉ A.

The function 1A is measurable if and only if A ∈ F.
Example 2.1.15 A function f : E → R that takes only a finite number of values
has the form

  f(x) = Σ_{j=1}^n aj 1Aj(x),

where the aj are the values that f takes and Aj = {x : f(x) = aj}. Such an f is
measurable if and only if the Aj are measurable sets.

Definition 2.1.16 A simple function is a function of the form

  f(x) = Σ_{j=1}^n aj 1Aj(x),

where aj ∈ R and the Aj are measurable sets.
If {Eα, α ∈ I} is a collection of subsets of Y and A ⊂ Y, then

  f⁻¹(Ac) = (f⁻¹(A))c,
  f⁻¹(∩α∈I Eα) = ∩α∈I f⁻¹(Eα),
  f⁻¹(∪α∈I Eα) = ∪α∈I f⁻¹(Eα).
Proposition 2.1.17
• Let f1 : (Ω, F) → (E1, A1) and f2 : (Ω, F) → (E2, A2)
be measurable functions. Define the product σ-algebra

  A1 ⊗ A2 = σ{A1 × A2 : A1 ∈ A1, A2 ∈ A2}.

Then h = (f1, f2) : (Ω, F) → (E1 × E2, A1 ⊗ A2) is measurable.

• If f : (X1, B1) → (X2, B2) and g : (X2, B2) → (X3, B3) are measurable
functions, then the composition g ◦ f : (X1, B1) → (X3, B3) is measurable.

• If f, g, fn : (X, B) → (R, B(R)) are measurable functions, then the following
functions are measurable: f + g, f g, max(f, g) = (f + g + |f − g|)/2,
f+ = max(f, 0), f−, lim sup fn, lim inf fn, sup_{n≥1} fn, inf_{n≥1} fn, and
lim_{n→∞} fn (when it exists).
Proposition 2.1.18 If f : E → [0, ∞] is a positive measurable function, there is a
sequence {fn} of simple functions such that

  0 ≤ f1 ≤ f2 ≤ . . . ≤ fn ≤ . . . ≤ f,

and fn → f pointwise. Furthermore, fn → f uniformly on any set on which f is
bounded. Here both E and [0, ∞] are equipped with the relevant Borel σ-algebras.

Theorem 2.1.19
1. For every measurable function f, there is a sequence of
simple functions fn such that for every x,

  lim_{n→∞} fn(x) = f(x).

If f is bounded, the fn can be chosen to be uniformly bounded.

2. If f : E → [0, ∞] is a positive measurable function, there is a sequence
{fn} of simple functions such that

  0 ≤ f1 ≤ f2 ≤ . . . ≤ fn ≤ . . . ≤ f,

and fn → f pointwise. Furthermore, fn → f uniformly on any set on which
f is bounded. Here both E and [0, ∞] are equipped with the relevant Borel
σ-algebras.
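The approximating sequence in Proposition 2.1.18 can be written down explicitly. A minimal sketch, assuming the standard dyadic construction fn = min(⌊2ⁿf⌋/2ⁿ, n) (the test function f(x) = x² is an illustrative choice):

```python
# Dyadic simple-function approximation of a nonnegative function f:
# f_n(x) = min(floor(2^n f(x)) / 2^n, n).  Each f_n takes finitely many
# values, the sequence increases pointwise to f, and the error is at most
# 2^-n wherever f <= n.
import math

def dyadic_approx(f, n):
    """Return the n-th simple approximation of a nonnegative function f."""
    def fn(x):
        return min(math.floor(2**n * f(x)) / 2**n, n)
    return fn

f = lambda x: x * x          # illustrative nonnegative function
xs = [0.0, 0.3, 1.7, 2.5]
for n in range(1, 8):
    fn, fn1 = dyadic_approx(f, n), dyadic_approx(f, n + 1)
    for x in xs:
        assert fn(x) <= fn1(x) <= f(x)     # increasing, and below f
        if f(x) <= n:                      # uniform rate on {f <= n}
            assert f(x) - fn(x) <= 2**-n
```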
Proposition 2.1.20 Let (X, B(X)) and (Y, B(Y)) be metric spaces with their Borel
σ-algebras. If f : X → Y is continuous, then it is Borel measurable.
2.2 Integration with respect to a measure

Let (E, F, µ) be a measure space. Let

  E = { f(x) = Σ_{j=1}^n aj 1Aj(x) : Aj ∈ F, aj ∈ R }

be the set of (measurable) simple functions. We define the integral of
f = Σ_{j=1}^n aj 1Aj with respect to µ as follows:

  ∫_E f dµ = Σ_{j=1}^n aj µ(Aj).

If the aj, j = 1, . . . , n, are distinct values, then Aj = f⁻¹({aj}).
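For a measure on a finite set the simple-function integral can be computed exactly. A minimal check (the weights and the sets A1, A2 are made-up illustrative data) that Σ aj µ(Aj) agrees with the pointwise sum Σ_x f(x) µ({x}):

```python
# Integral of a simple function f = 2*1_{A1} + 5*1_{A2} against a toy
# measure mu on {0,...,4}, given by point weights mu({x}).
weights = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.25, 4: 0.15}   # mu({x})

def mu(A):
    return sum(weights[x] for x in A)

A1, A2 = {0, 1}, {3}                 # disjoint measurable sets
integral = 2 * mu(A1) + 5 * mu(A2)   # sum_j a_j mu(A_j)

f = lambda x: 2 * (x in A1) + 5 * (x in A2)
pointwise = sum(f(x) * weights[x] for x in weights)

assert abs(integral - pointwise) < 1e-12   # both equal 1.85
```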
Definition 2.2.1 Let f : E → [0, ∞] be a positive measurable function. Define

  ∫ f dµ = sup_{g∈E: g≤f} ∫ g dµ.

Definition 2.2.2 Let f : E → R be a general measurable function. Write
f = f+ − f−. If both ∫ f+ dµ and ∫ f− dµ are finite, we say that f is integrable with
respect to µ, and the integral is defined to be

  ∫_E f dµ = ∫_E f+ dµ − ∫_E f− dµ.

The set of all integrable functions is denoted by L¹(µ); its elements are said
to be L¹ functions.
If A is a Borel measurable set, each of the following denotes the same integral:

  ∫_A f dµ,    ∫_A f(x) µ(dx),    ∫_E 1A(x) f(x) dµ(x).
Proposition 2.2.3 Assume that f, g ∈ L¹(µ).

1. |f| ∈ L¹(µ), and
  |∫ f dµ| ≤ ∫ |f| dµ.

2. If a, b ∈ R, then af + bg ∈ L¹(µ) and
  ∫ (af + bg) dµ = a ∫ f dµ + b ∫ g dµ.

3. If f ≤ g, then
  ∫ f dµ ≤ ∫ g dµ.

4. If A is a measurable set, define ∫_A f dµ = ∫ 1A f dµ. Then
  ∫_A f dµ = ∫_A g dµ for all measurable sets A
implies that f = g almost surely.

5. If f ≥ 0 and ∫ f dµ = 0, then f = 0 a.e.
2.3 Lp spaces

Let (E, B, µ) be a measure space. Two functions f, g : E → R are equivalent if
f = g almost surely. For 1 ≤ p < ∞, we define the Lp space:

  Lp(E, B, µ) = { f : E → R measurable : ∫_E |f(x)|^p dµ(x) < ∞ }.

Define ‖f‖_{Lp} = ( ∫_E |f(x)|^p dµ(x) )^{1/p}. Let L∞ be the set of bounded
measurable functions with ‖f‖_{L∞} = inf{a : µ(|f(x)| > a) = 0}. Then the Lp
spaces are Banach spaces. The notations Lp(E, B), Lp(E) or Lp may be used for
simplicity.
Proposition 2.3.1 Hölder's inequality (Cauchy-Schwarz if p = q = 2):

  ∫ |f g| dµ ≤ ( ∫ |f|^p dµ )^{1/p} ( ∫ |g|^q dµ )^{1/q},    1/p + 1/q = 1.
Minkowski's inequality, for p ≥ 1:

  ‖f + g‖_{Lp} ≤ ‖f‖_{Lp} + ‖g‖_{Lp}.

Suppose that µ is a finite measure. Then by Hölder's inequality, L^q ⊂ L^p for
p < q.
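Both inequalities can be sanity-checked numerically for the counting measure on a finite set, where integrals are just finite sums. A minimal sketch (the vectors f, g and the conjugate pair p = 3, q = 3/2 are illustrative choices):

```python
# Hölder:  sum |f g| <= (sum |f|^p)^(1/p) (sum |g|^q)^(1/q), 1/p + 1/q = 1,
# for the counting measure on a 4-point set.
f = [1.0, -2.0, 0.5, 3.0]
g = [0.3, 1.1, -4.0, 0.7]
p, q = 3.0, 1.5                      # conjugate exponents: 1/3 + 2/3 = 1

lhs = sum(abs(a * b) for a, b in zip(f, g))
rhs = (sum(abs(a)**p for a in f))**(1/p) * (sum(abs(b)**q for b in g))**(1/q)
assert lhs <= rhs

# Minkowski:  ||f + g||_p <= ||f||_p + ||g||_p
norm = lambda v, r: sum(abs(a)**r for a in v)**(1/r)
assert norm([a + b for a, b in zip(f, g)], p) <= norm(f, p) + norm(g, p)
```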
A function φ : R → R is convex if φ(Σᵢ pᵢ xᵢ) ≤ Σᵢ pᵢ φ(xᵢ) whenever the pᵢ are
positive real numbers with Σᵢ pᵢ = 1. If φ is twice differentiable, φ is convex
if and only if φ'' ≥ 0. Examples of convex functions are φ(x) = |x|^p for p > 1
and φ(x) = e^x.

Theorem 2.3.2 (Jensen's Inequality) If φ is a convex function, µ a probability
measure, and f an integrable function, then

  φ( ∫ f dµ ) ≤ ∫ φ(f) dµ.
Proposition 2.3.3 (Chebyshev's inequality) If X is an Lp random variable, then
for a > 0,

  P(|X| ≥ a) ≤ (1/a^p) E|X|^p.
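The inequality can be verified exactly for a discrete random variable, where both sides are finite sums. A minimal sketch (the values and probabilities are made-up illustrative data):

```python
# Chebyshev's inequality  P(|X| >= a) <= E|X|^p / a^p  for a small
# discrete random variable, computed exactly.
values = [-3.0, -1.0, 0.0, 2.0, 5.0]
probs = [0.1, 0.3, 0.2, 0.25, 0.15]
p, a = 2.0, 2.0

prob_tail = sum(pr for v, pr in zip(values, probs) if abs(v) >= a)
moment = sum(abs(v)**p * pr for v, pr in zip(values, probs))

# here prob_tail = 0.5 and E|X|^2 / a^2 = 5.95 / 4 = 1.4875
assert prob_tail <= moment / a**p
```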
1
RTheorem 2.3.4 If f ∈ L . For any > 0 there is a simple function φ such that
|f − φ| < .
Theorem 2.3.5 (The monotone convergence theorem) If (fn) is a non-decreasing
sequence converging to f almost surely, fn⁻ ≤ g, and g is integrable, then

  lim_{n→∞} ∫ fn dµ = ∫ f dµ.

Proof This is first shown for fn ≥ 0. Otherwise take gn = fn + g. Then gn ≥ 0
and {gn} is a non-decreasing sequence.
Theorem 2.3.6 (Dominated Convergence Theorem) If |fn| ≤ g where g is an
integrable function, and fn → f a.e., then f is integrable and

  lim_{n→∞} ∫ fn dµ = ∫ lim_{n→∞} fn dµ.
Theorem 2.3.7 (Fatou's lemma) If (fn) is bounded below by an integrable function,
then

  ∫ lim inf_{n→∞} fn dµ ≤ lim inf_{n→∞} ∫ fn dµ.
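The inequality in Fatou's lemma can be strict. A classic example, assuming Lebesgue measure on R:

```latex
% Take f_n = 1_{[n, n+1]} on (\mathbb{R}, \mathcal{B}(\mathbb{R}), \mathrm{Leb}).
% Each f_n \ge 0 and f_n(x) \to 0 for every x, so
\int \liminf_{n\to\infty} f_n \, d\mu = \int 0 \, d\mu = 0
\;<\; 1 = \liminf_{n\to\infty} \int f_n \, d\mu .
```

The "mass escapes to infinity" here; the lemma only ever bounds the integral of the limit from above by the limit of the integrals.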
2.4 Notions of convergence of functions

Definition 2.4.1 Let (E, B, µ) be a measure space and let (fn) be a sequence of
measurable functions.

• fn converges to f almost surely if there is a set Ω0 with µ(Ω0) = 0 such that

  lim_{n→∞} fn(ω) = f(ω)   for all ω ∉ Ω0.

• fn is said to converge to f in measure if for any δ > 0,

  lim_{n→∞} µ(|fn − f| > δ) = 0.

• fn converges to f in Lp if

  ∫ |fn − f|^p dµ → 0.
Proposition 2.4.2 Let (An) be measurable sets. The indicator functions 1An → 0
in measure if and only if µ(An) → 0.
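Convergence in measure does not imply almost sure convergence. The standard counterexample is the "typewriter" sequence of dyadic indicator blocks on [0, 1]; a minimal numerical sketch (the indexing scheme below is one convenient choice):

```python
# Typewriter sequence: f_n = 1_{[k 2^-m, (k+1) 2^-m)}, enumerating all
# dyadic blocks generation by generation.  The block widths shrink, so
# f_n -> 0 in measure, yet every point lands in one block per generation,
# so f_n(x) does not converge at any x.

def block(n):
    """Endpoints of the n-th dyadic block, n = 0, 1, 2, ..."""
    m = (n + 1).bit_length() - 1     # generation: 2^m blocks of width 2^-m
    k = n - (2**m - 1)               # position within generation m
    return k / 2**m, (k + 1) / 2**m

# widths tend to 0: convergence in measure
widths = [r - l for l, r in (block(n) for n in range(63))]
assert widths[62] < widths[0]

# but x = 0.3 lies in exactly one block of each generation m = 0,...,5
x = 0.3
hits = [n for n in range(63) if block(n)[0] <= x < block(n)[1]]
assert len(hits) == 6
```

Consistently with Proposition 2.5.1 below, a subsequence (e.g. the first block of each generation) does converge almost surely.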
2.5 Convergence Theorems

Proposition 2.5.1
1. If fn converges to f in Lp, p ≥ 1, then it converges to f in measure.

2. If fn converges to f in measure, there is a subsequence of (fn) which converges
almost surely.

3. fn converges to f in measure if and only if every subsequence has a further
subsequence that converges a.s.
Theorem 2.5.2 (Egoroff's Theorem*) Suppose that µ(E) < ∞. Let (fn) be a
sequence of functions converging to f almost surely. Then for any ε > 0 there is
a set Eε ⊂ E with µ(E \ Eε) ≤ ε such that fn converges to f uniformly on Eε. In
particular, fn converges to f in measure.
2.5.1 Uniform Integrability

Let (Ω, F, µ) be a measure space.

Definition 2.5.3 A family of real-valued functions (fα, α ∈ I), where I is an index
set, is uniformly integrable (u.i.) if

  lim_{C→∞} sup_α ∫_{{|fα|≥C}} |fα| dµ = 0.
Proposition 2.5.4 Suppose that sup_α E|fα|^p < ∞ for some p > 1. Then the
family (fα, α ∈ I) is uniformly integrable.

Proof This follows from

  sup_α ∫_{{|fα|≥C}} |fα| dµ ≤ (1/C^{p−1}) sup_α ∫_{{|fα|≥C}} |fα|^p dµ → 0

as C → ∞.
Definition 2.5.5 A family {fα} is uniformly absolutely continuous if for any ε > 0
there is a δ > 0 such that

  sup_{α∈I} ∫_A |fα| dµ < ε   for all A ∈ F with µ(A) < δ.
Lemma 2.5.6 Assume that µ(Ω) < ∞ and f ∈ L¹. Given any ε > 0 there is a
δ > 0 such that if µ(A) < δ then ∫_A |f| dµ < ε.

Suppose that the measure µ is finite. The lemma above shows that a single L¹
function is (uniformly) absolutely continuous; combined with the proposition
below, it is then also uniformly integrable.
Proposition 2.5.7 Let µ be a finite measure. The family (fα, α ∈ I) is uniformly
integrable if and only if the following two conditions hold:

1. (L¹ boundedness) sup_{α∈I} ∫ |fα| dµ < ∞;

2. the family (fα) is uniformly absolutely continuous.
Proof We first assume that (fα, α ∈ I) is uniformly integrable. For any 0 < ε < 1,
take C such that

  sup_α ∫_{{|fα|≥C}} |fα| dµ ≤ ε/2.

Then, since µ(Ω) < ∞, we have L¹ boundedness:

  sup_α ∫ |fα| dµ ≤ sup_α ∫_{{|fα|≥C}} |fα| dµ + sup_α ∫_{{|fα|<C}} |fα| dµ
    ≤ ε/2 + C µ(Ω) < ∞.

Also,

  sup_α ∫_A |fα| dµ ≤ sup_α ∫_{A∩{|fα|≥C}} |fα| dµ + sup_α ∫_{A∩{|fα|<C}} C dµ
    ≤ ε/2 + C µ(A) → 0

as µ(A) → 0, which is the uniform absolute continuity.

We next prove the converse. Assume that (fα) is uniformly absolutely continuous
and L¹ bounded. Then

  sup_α µ(|fα| ≥ C) ≤ (1/C) sup_α ∫ |fα| dµ

is small when C is large. By choosing C large and applying the uniform absolute
continuity with A = {|fα| ≥ C}, we see that ∫_{{|fα|≥C}} |fα| dµ is as small as we
want, uniformly in α.
Proposition 2.5.8 Let fn, f ∈ L¹.

a) Suppose that fn → f in measure and (fn) is uniformly integrable. Then
∫ |fn − f| dµ → 0.

b) Suppose that ∫ |fn − f| dµ → 0 and µ(Ω) < ∞. Then fn → f in measure
and (fn) is uniformly integrable.
Proof Part a). Assume that (fn) is uniformly integrable. Given ε > 0, choose
C > 1 such that

  sup_n ∫_{{|fn|≥C}} |fn| dµ + ∫_{{|f|≥C}} |f| dµ < ε.

Since fn → f in measure, we may choose N such that for n > N,

  µ(|fn − f| > ε) < ε/(2C).

It follows that, for n > N,

  ∫ |fn − f| dµ ≤ ∫_{{|fn−f|≤ε}} |fn − f| dµ
    + ( ∫_{{|fn−f|>ε}∩{|fn|≥C}} |fn| dµ + ∫_{{|fn−f|>ε}∩{|fn|<C}} C dµ )
    + ( ∫_{{|fn−f|>ε}∩{|f|≥C}} |f| dµ + ∫_{{|fn−f|>ε}∩{|f|<C}} C dµ )
  ≤ ε µ(Ω) + ε + 2C µ(|fn − f| > ε) ≤ ε µ(Ω) + 2ε.

Since ε is arbitrary, ∫ |fn − f| dµ → 0. Finally,
|∫ fn dµ − ∫ f dµ| ≤ ∫ |fn − f| dµ → 0.

Proof of part b). The convergence in measure follows from

  µ(|fn − f| > ε) ≤ (1/ε) ∫ |fn − f| dµ → 0.

For the uniform integrability, we observe that

  ∫_A |fn| dµ ≤ ∫_Ω |fn − f| dµ + ∫_A |f| dµ.

Take A = Ω to see that {fn} is L¹ bounded. By Lemma 2.5.6, for any ε > 0 there
is δ > 0 such that if µ(A) < δ then ∫_A |f| dµ < ε. This means that

  ∫_A |fn| dµ ≤ ∫_Ω |fn − f| dµ + ε,

and {fn} is uniformly absolutely continuous. Apply Proposition 2.5.7 to conclude
the uniform integrability.
2.6 Pushed Forward Measures, Distributions of Random Variables

Let (Ω, F, ν) be a measure space and (S, B) a measurable space. Let f : Ω → S
be a measurable function. The pushed forward measure f∗(ν) on (S, B) is defined
as follows:

  f∗(ν)(B) = ν{x : f(x) ∈ B},   B ∈ B.

Lemma 2.6.1 Let α : S → R be a bounded Borel measurable function. Then

  ∫_Ω α ◦ f dν = ∫_S α d(f∗(ν)).    (2.2)
Let (Ω, F, P) be a probability space, X : Ω → S a measurable function, and
α : S → R a bounded measurable function. Define

  Eα(X) = ∫_Ω α ◦ X dP.

The measure X∗(P) on S is called the probability distribution or the probability
law of X. Let us denote this measure by µX:

  µX(B) = P(ω : X(ω) ∈ B).

The change of variables formula says that

  Eα(X) = ∫_S α(y) dµX(y).

If S = R, then µX((−∞, a]) = P(ω : X(ω) ≤ a). In particular,

  EX = ∫_S y dµX(y),    var(X) = ∫_S (y − EX)² dµX(y).

If X1, . . . , Xn are real-valued random variables, then (X1, . . . , Xn) is an Rⁿ-valued
random variable. The joint distribution of X1, . . . , Xn is the measure on Rⁿ
induced by (X1, . . . , Xn).
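On a finite probability space the change of variables formula E α(X) = ∫ α dµX can be checked exactly. A minimal sketch (the space, the random variable X, and α are toy choices):

```python
# Change of variables on a finite probability space: compute E[alpha(X)]
# directly on Omega and again by integrating alpha against the law of X.
P = {'a': 0.2, 'b': 0.5, 'c': 0.3}   # probability on Omega = {a, b, c}
X = {'a': 1, 'b': 2, 'c': 1}         # random variable X : Omega -> {1, 2}
alpha = lambda s: s * s              # bounded function on the state space

# law of X: mu_X({s}) = P(X = s)
law = {}
for w, pr in P.items():
    law[X[w]] = law.get(X[w], 0.0) + pr

lhs = sum(alpha(X[w]) * pr for w, pr in P.items())   # E[alpha(X)]
rhs = sum(alpha(s) * pr for s, pr in law.items())    # integral of alpha d(mu_X)
assert abs(lhs - rhs) < 1e-12                        # both equal 2.5
```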
2.7 Lebesgue Integrals

Definition 2.7.1 Let (Ω, F, P) be a measure space. We say that F is complete if
any subset of a measurable set of null measure is in F.

The sets in the completion of the Borel σ-algebra are said to be Lebesgue measurable.
The Lebesgue measure L : B(R) → [0, ∞] is determined by

  L(A) = inf{ Σ_j (bj − aj) : ∪_{j=1}^∞ (aj, bj) ⊃ A },   A ∈ B(R).

The measure L of an interval is the length of the interval. The integrals developed
in the last section, for functions f : R → R, are Lebesgue integrals.

Proposition 2.7.2 A bounded function f : [a, b] → R is Riemann integrable if and
only if the set of discontinuities of f has Lebesgue measure zero.
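The criterion above shows concretely how the Lebesgue integral extends the Riemann integral. The standard example is the Dirichlet function:

```latex
% f = 1_{\mathbb{Q} \cap [0,1]} is discontinuous at every point of [0,1],
% hence not Riemann integrable by the criterion above.  It is, however,
% Lebesgue integrable:
\int_{[0,1]} 1_{\mathbb{Q}} \, d\mathcal{L}
  = \mathcal{L}(\mathbb{Q} \cap [0,1]) = 0,
% since \mathbb{Q} \cap [0,1] is countable, hence of Lebesgue measure zero.
```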
Proposition 2.7.3 If a bounded function f : [a, b] → R is Riemann integrable,
then it is Lebesgue measurable and Lebesgue integrable. Furthermore, the two
integrals have the common value:

  ∫_{[a,b]} f dµ = ∫_a^b f(x) dx.
Let F be a right continuous increasing function. There is a unique Borel measure
on R, called the Lebesgue-Stieltjes measure associated to F, such that

  µF((a, b]) = F(b) − F(a).

Furthermore,

  µF(A) = inf{ Σ_j (F(bj) − F(aj)) : ∪_{j=1}^∞ (aj, bj) ⊃ A }.

It is customary to write ∫ h dF for the integral ∫ h dµF.

If F = f − g where f, g are right continuous increasing functions, define
µF = µf − µg. The measure µF is not necessarily positive valued: it is a signed
measure.
Definition 2.7.4 Let f, g : [a, b] → R be bounded functions. We say f is
Riemann-Stieltjes integrable with respect to g if there is a number l such that
for all ε > 0 there is a δ > 0 such that for all partitions

  ∆ : a = t0 < t1 < · · · < tn = b

with mesh |∆| := max_{1≤i≤n}(ti − ti−1) < δ,

  | Σ_{j=1}^n f(t*_j)[g(tj) − g(tj−1)] − l | < ε.

Here t*_j is any point in [tj−1, tj]. If so, we define l to be the Riemann-Stieltjes
integral of f with respect to g; it is denoted by ∫_a^b f(t) dg(t).
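The Riemann-Stieltjes sums in the definition can be evaluated numerically. A minimal sketch for ∫₀¹ t dg(t) with g(t) = t² (an illustrative choice; since g is C¹ the integral equals ∫₀¹ t · 2t dt = 2/3):

```python
# Riemann-Stieltjes sum over a uniform partition, evaluating f at the
# left endpoint t*_j = t_{j-1} of each interval.
def rs_sum(f, g, a, b, n):
    """Approximate \\int_a^b f dg with n uniform subintervals."""
    ts = [a + (b - a) * i / n for i in range(n + 1)]
    return sum(f(ts[i]) * (g(ts[i + 1]) - g(ts[i])) for i in range(n))

approx = rs_sum(lambda t: t, lambda t: t * t, 0.0, 1.0, 4000)
assert abs(approx - 2.0 / 3.0) < 1e-3   # sums converge as the mesh shrinks
```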
Take a function F that is absolutely continuous on [a, b]. Then

  F(x) = F(a) + ∫_a^x g(t) dt,

with g Lebesgue integrable; F is differentiable at almost all points, and F′ = g
almost everywhere. In this case

  ∫_a^b h(x) µF(dx) = ∫_a^b h(x) F′(x) dx.

A right continuous increasing function F on [0, ∞) has finite total variation on
bounded intervals; we may assume that F(0) = 0. The measure µF is absolutely
continuous with respect to the Lebesgue measure if and only if F is absolutely
continuous.
2.8 Total variation

I will relate this to semi-martingales later.

Definition 2.8.1 Denote by P([a, b]) the set of all partitions a = t0 < t1 < · · · <
tn = b of [a, b]. The total variation of a function g on [a, b] is defined by

  |g|_{TV([a,b])} = sup_{∆∈P([a,b])} Σ_j |g(tj) − g(tj−1)|.

If |g|_{TV([a,b])} is finite, we say g is of bounded variation on [a, b].
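The variation sum is easy to compute over a fixed partition. A minimal sketch (the test functions are illustrative): for a monotone g the sum telescopes to |g(b) − g(a)| on every partition, while a non-monotone g picks up extra variation.

```python
# Variation of g over a given partition ts = [t_0, ..., t_n]:
# sum of |g(t_j) - g(t_{j-1})|.
def variation(g, ts):
    return sum(abs(g(ts[j]) - g(ts[j - 1])) for j in range(1, len(ts)))

ts = [i / 100 for i in range(101)]           # partition of [0, 1]

# increasing g: the sum telescopes to g(1) - g(0) = 1
assert abs(variation(lambda t: t * t, ts) - 1.0) < 1e-12

# g(t) = t^2 - t decreases then increases: variation 0.5 > |g(1) - g(0)| = 0
assert variation(lambda t: t * t - t, ts) > 0.4
```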
If f : R → R is increasing, then it has left and right limits at every point and has
at most countably many points of discontinuity. Furthermore, f is differentiable
almost everywhere.

Let f be a function. Define f+(x) = max(f(x), 0) to be its positive part and
f−(x) = −min(f(x), 0) to be the absolute value of its negative part. Note that
|f(ti+1) − f(ti)| = [f(ti+1) − f(ti)]+ + [f(ti+1) − f(ti)]−, hence the name
'total variation'.
Proposition 2.8.2 A real-valued function on the interval [a, b] is of bounded
variation if and only if it is the difference of two monotone functions. Such a
function is differentiable almost everywhere with respect to the Lebesgue measure.

Proposition 2.8.3 If f is an increasing function, then its derivative exists almost
everywhere. Furthermore, f′ is Borel measurable.

Example 2.8.4 Let f be a real-valued integrable function defined on [a, b].

1. Its indefinite integral ∫_a^x f(t) dt is of bounded variation and continuous.

2. The derivative of its indefinite integral equals f almost everywhere.
Chapter 3
Lectures
3.1 Lectures 1-2
The objective of the course: by the end we should be able to understand and work with martingales and with stochastic differential equations.
Here is a stochastic differential equation (SDE) of Markovian type on R^d in differential form:

dx_t = Σ_{i=1}^m σ_i(t, x_t) dB_t^i + b(t, x_t) dt.   (3.1)
Here (B_t^1, . . . , B_t^m) is an R^m-valued "Brownian motion" on a "filtered probability space" (Ω, F, F_t, P). Each x_t is a function from the probability space Ω to R^d. We assume that each σ_i : R_+ × R^d → R^d and b : R_+ × R^d → R^d are Borel measurable. Suitable conditions will be introduced to ensure that there is a 'solution' and that "uniqueness of solutions" holds. If σ_i ≡ 0 for all i, this is an ordinary differential equation: ẋ_t = b(t, x_t). Please review results concerning ODEs, in particular the existence and uniqueness theorem.
The integral form of the SDE (3.1), given below, is more instructive:

x_t = x_0 + Σ_{i=1}^m ∫_0^t σ_i(s, x_s) dB_s^i + ∫_0^t b(s, x_s) ds.
We will soon define "stochastic integrals". The notation ∫_0^t σ_i(s, x_s) dB_s^i denotes the stochastic integral, the Itô integral in this case, of σ_i(s, x_s) with respect to the one dimensional Brownian motion (B_s^i).
Let (f_t) be a suitable stochastic process, f_t : Ω → R, and (B_t) a Brownian motion. The "stochastic integral ∫_0^t f_s dB_s" is a "local martingale". We will need
to study martingales, local martingales and semi-martingales. By the "integral representation theorem for martingales", all "sample continuous" local martingales (with respect to a Brownian filtration) are stochastic integrals of the form above. The Clark-Ocone formula, a popular formula in Malliavin calculus, gives an explicit formula for the integrand. The L² chaos expansion takes this further.
You might ask why we study SDEs. We will be able to answer this question better later in the course. For the moment just believe that SDEs are popular and useful.
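Anticipating the later theory, equation (3.1) can already be simulated: the Euler–Maruyama scheme replaces dB_t by independent N(0, ∆t) increments. A sketch for the scalar SDE dx_t = −x_t dt + σ dB_t (an Ornstein–Uhlenbeck equation; these coefficients are illustrative choices, not from the text):

```python
import numpy as np

# Euler-Maruyama for dx = -x dt + sigma dB with b(t,x) = -x and
# sigma(t,x) = 0.5 constant (illustrative coefficients only).
rng = np.random.default_rng(0)
T, n, paths, sigma = 1.0, 500, 4000, 0.5
dt = T / n
x = np.ones(paths)                                  # x_0 = 1 for every path
for _ in range(n):
    dB = rng.normal(0.0, np.sqrt(dt), size=paths)   # increments ~ N(0, dt)
    x = x + (-x) * dt + sigma * dB
# For this linear SDE the exact mean is E x_T = x_0 * exp(-T).
print(x.mean(), np.exp(-T))
```

The Monte Carlo average of x_T stays close to the exact mean e^{−T}, illustrating that the discrete scheme tracks the SDE.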
3.2 Notation
In this section we fix the notation.
Let (Ω, F) be a measurable space: Ω is a set and F is a σ-algebra of subsets of Ω. By "F is a σ-algebra" we mean: ∅ ∈ F; the complement A^c of a set from F is in F; the union ∪_{i=1}^∞ A_i of a sequence of sets A_i ∈ F is in F.
A signed measure µ on the measurable space is a function from F to R such that µ(∅) = 0 and µ(∪_{n=1}^∞ A_n) = Σ_{n=1}^∞ µ(A_n) whenever {A_n}_{n=1}^∞ is pairwise disjoint. Unless otherwise stated, by a measure we mean one that takes non-negative values, µ : F → R_+. The measure is said to be finite if µ(Ω) < ∞ and it is a probability measure if µ(Ω) = 1. A finite measure can always be normalised to a probability measure: µ̃(A) := µ(A)/µ(Ω). A measure is σ-finite if there is a countable cover of Ω by measurable sets of finite measure.
In this paragraph we briefly review metric spaces. The restriction of d to a subset A of a metric space is a distance function on A. A subset of a metric space is complete if every Cauchy sequence in it converges to a point of the set. A closed subset of a complete metric space is complete. A metric space is separable if it has a countable dense set. The product metric on (X_1, d_1) × (X_2, d_2) is max(d_1, d_2).
Definition 3.2.1
• A collection T of subsets of a set X is a topology if ∅, X ∈ T, and T is closed under the union of any number of sets and under the intersection of a finite number of sets. Sets from T are called open sets. Complements of open sets are closed sets.
• The Borel σ-algebra of a topological space is the smallest σ-algebra that contains all open sets; it is denoted by B(X). Elements of the Borel σ-algebra are Borel sets.
Borel sets include: open sets, closed sets, and countable unions and intersections of closed sets.
Example 3.2.2 Let (X, d) be a metric space. The collection of open sets defined by the distance defines the metric topology on X. The metric topology is determined by open balls. The following are examples of metric spaces: [0, 1], R^n, Banach spaces such as L^p((Ω, µ); R), Hilbert spaces, and any finite dimensional manifold.
In this course we are concerned only with metric spaces that are separable and complete. Single points {x} = ∩_{n=1}^∞ B(x, 1/n) are Borel sets.
Definition 3.2.3
• Let (X_1, F_1) and (X_2, F_2) be measurable spaces. A function f : X_1 → X_2 is measurable if the pre-image of a measurable set is measurable: f^{−1}(A) ∈ F_1 if A ∈ F_2.
• Let (X_1, τ_1) and (X_2, τ_2) be topological spaces. A function f : X_1 → X_2 is continuous if the pre-image of an open set is open: f^{−1}(U) ∈ τ_1 if U ∈ τ_2.
Proposition 3.2.4 All continuous maps are Borel measurable.
Proof For i = 1, 2, let (X_i, τ_i) be topological spaces and B_i the Borel σ-algebras. Let f : X_1 → X_2 be a continuous function. Let

f^{−1}(B_2) = {f^{−1}(A) : A ∈ B_2},   f^{−1}(τ_2) = {f^{−1}(A) : A ∈ τ_2}.

Then f^{−1}(B_2) is a σ-algebra and f^{−1}(τ_2) ⊂ τ_1 ⊂ B_1 by the continuity of f. It is also easy to show that σ(f^{−1}(τ_2)) = f^{−1}(B_2).
Definition 3.2.5 Two complete separable metric spaces are measurably isomorphic if there is a bijection φ between them such that both φ and φ^{−1} are measurable.
Definition 3.2.6 The space ([0, 1], B([0, 1])), and also any space isomorphic to it, is a standard Borel space.
Let (X, F) be a measurable space. A measurable set E inherits a σ-algebra by restriction: {E ∩ A : A ∈ F}.
Theorem 3.2.7 (Theorem 2.13, [21]) Let X_i, i = 1, 2 be complete separable metric spaces, and E_i ⊂ X_i Borel sets. Then E_1 and E_2, with the restricted Borel σ-algebras, are measurably isomorphic if and only if they have the same cardinality.
The cardinality of [0, 1]^Z is that of the continuum. The cardinality of C([0, 1]; R^d) is that of the continuum: continuous functions are determined by their values at the rational numbers (a countable set), and hence the cardinality of the set of continuous functions is that of R^Q, a countable product of copies of R. The cardinality of L^p(R; R) is also that of the continuum. First consider the L^p spaces of functions [2πn, 2π(n + 1)] → R. The Fourier series of a function in L^p converges to it in L^p; hence such a function is determined by a countable number of quantities (its Fourier coefficients) together with the sines and cosines with respect to which the expansion is made.
For non-Borel σ-algebras, a more general concept exists. The moral of this story is: "most" probability spaces without atoms are "measurably isomorphic" to the standard Borel space. All "reasonable" measure spaces that are rich enough to support a Brownian motion are standard Borel spaces.
3.3 The Wiener Spaces
Define W^d ≡ C([0, 1]; R^d) := {ω : [0, 1] → R^d continuous} and
W_0^d ≡ C_0([0, 1]; R^d) := {ω : [0, 1] → R^d continuous, ω(0) = 0}.
Then W^d, with the uniform norm

‖ω‖ = sup_{0≤t≤1} |ω(t)|,

is a separable Banach space, and W_0^d is a closed subspace. With appropriate modification the discussion below applies to both W^d and W_0^d. The cardinality of both spaces is that of the continuum. Hence, with their Borel σ-algebras, they are standard Borel spaces.
Let {ω_i} be a dense subset of W_0^d and let {t_k} be an enumeration of Q ∩ [0, 1]. The topology is determined by the balls {ω ∈ W_0^d : ‖ω − ω_i‖ < 1/n}, and each such ball can be expressed through countably many coordinate conditions of the form

{ω ∈ W_0^d : |ω(t_k) − ω_i(t_k)| < 1/n} = W_0^d ∩ {ω ∈ (R^d)^{[0,1]} : |ω(t_k) − ω_i(t_k)| < 1/n}.   (3.2)

A cylindrical set is of the form
{ω ∈ W_0^d : ω(t_1) ∈ A_1, . . . , ω(t_k) ∈ A_k} = ∩_{i=1}^k {ω(t_i) ∈ A_i},

where A_i ∈ B(R^d) and 0 ≤ t_1 < t_2 < · · · < t_k ≤ 1.
The collection Cyl of cylindrical sets generates B(W_0^d) (compare the arguments in the proof of Fubini's theorem) and is measure determining: two finite measures on (W_0^d, B(W_0^d)) agreeing on Cyl agree. In fact any π-system that generates a σ-algebra is measure determining.
Denote by p_t the heat kernel on R^d: for x, y ∈ R^d,

p_t(x, y) = (2πt)^{−d/2} e^{−|x−y|²/(2t)}.

Then p_t(x, y) dy is a probability measure.
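Two properties of p_t used repeatedly below, that p_t(x, ·) is a probability density and the Chapman–Kolmogorov identity ∫ p_s(x, z) p_t(z, y) dz = p_{s+t}(x, y), can be checked numerically in d = 1 (the grid and the test points are arbitrary choices):

```python
import numpy as np

def p(t, x, y):
    # Heat kernel on R (d = 1): p_t(x, y) = (2*pi*t)^(-1/2) exp(-|x-y|^2/(2t))
    return np.exp(-(x - y) ** 2 / (2 * t)) / np.sqrt(2 * np.pi * t)

z = np.linspace(-20.0, 20.0, 40001)       # fine grid covering the mass
dz = z[1] - z[0]
# p_t(x, .) integrates to 1:
print(np.sum(p(1.0, 0.0, z)) * dz)
# Chapman-Kolmogorov: int p_s(x, z) p_t(z, y) dz = p_{s+t}(x, y)
lhs = np.sum(p(0.5, 0.0, z) * p(0.7, z, 1.3)) * dz
print(lhs, p(1.2, 0.0, 1.3))
```

The semigroup identity is what makes the finite dimensional formulas of Theorem 3.3.1 below consistent as the time points are refined.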
Let πt : W d → Rd be the projection, πt (ω) = ω(t).
Theorem 3.3.1 There is a unique probability measure µ on (W_0^d, B(W_0^d)) such that for 0 < t_1 < t_2 < · · · < t_n ≤ 1 and A_1, . . . , A_n ∈ B(R^d),

µ(ω : π_{t_1}(ω) ∈ A_1, . . . , π_{t_n}(ω) ∈ A_n)
= ∫_{A_1} · · · ∫_{A_n} p_{t_1}(0, x_1) p_{t_2−t_1}(x_1, x_2) · · · p_{t_n−t_{n−1}}(x_{n−1}, x_n) Π_{i=1}^n dx_i.   (3.3)
This measure is commonly known as the Wiener measure. The space (W_0^d, B(W_0^d), µ) is the Wiener space.
We will give a proof of the existence in section 3.8.1. The map π_t, as a stochastic process on the probability space (W_0^d, B(W_0^d), µ), is a Brownian motion, which we define shortly.
Note. The interval [0, 1] can be replaced by [0, T ] where T is a positive number.
3.4 Lecture 3. The Pushed Forward Measure
Let (X, B) and (Y, G) be two measurable spaces and µ a measure on (X, B). A measurable map φ : X → Y induces a measure φ_*µ on (Y, G) such that for any C ∈ G,

(φ_*µ)(C) = µ({x : φ(x) ∈ C}).

Denote φ^{−1}(C) = {x : φ(x) ∈ C}.
Lemma 3.4.1 Let f : Y → R be in L^1(Y, φ_*µ). Then

∫_X f(φ(x)) dµ(x) = ∫_Y f(y) d(φ_*µ)(y).
Idea of Proof: First take f = Σ_{i=1}^n a_i 1_{A_i}. Then

∫_Y f(y) d(φ_*µ) = ∫_Y Σ_{i=1}^n a_i 1_{A_i} d(φ_*µ) = Σ_{i=1}^n a_i (φ_*µ)(A_i) = Σ_{i=1}^n a_i µ(φ^{−1}(A_i)).

On the other hand

∫_X f(φ(x)) dµ(x) = ∫_X Σ_{i=1}^n a_i 1_{A_i}(φ(x)) dµ(x) = Σ_{i=1}^n a_i µ({x : φ(x) ∈ A_i}).
Next take f a positive function and an increasing sequence of simple functions converging to f:

f_n(x) = 2^{−n}[2^n f(x)] ∧ n = Σ_{k=1}^{n2^n} (k/2^n) 1_{A_k^n}(x),

where A_k^n = {x : f(x) ∈ [k/2^n, (k + 1)/2^n)}. Note that [2^n f(x)] = k if k ≤ 2^n f(x) < k + 1.
For f ∈ L^1 consider the integrals of f⁺ and f⁻ separately.
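Lemma 3.4.1 can also be checked by Monte Carlo: integrating f against φ_*µ is the same as sampling y = φ(x) with x ∼ µ. A minimal sketch, with the illustrative choices (not from the text) of µ uniform on [0, 1], φ(x) = x², f = cos:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, 10 ** 6)   # samples from mu
y = x ** 2                            # samples from the pushed-forward measure
lhs = np.mean(np.cos(y))              # int f d(phi_* mu), Monte Carlo
rhs = np.mean(np.cos(x ** 2))         # int f(phi(x)) dmu(x), Monte Carlo
print(lhs, rhs)                       # identical by construction
```

Both estimates approximate ∫_0^1 cos(x²) dx ≈ 0.9045; the point is that "push the measure forward" and "compose with φ" are the same operation on samples.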
Suppose φ : U ⊂ R^d → φ(U) ⊂ R^d is a diffeomorphism onto its image and let dx denote the Lebesgue measure. Then φ^{−1} induces the pushed forward measure (φ^{−1})_*(dx), the Lebesgue measure pushed back by φ. If f : R^d → R is any bounded measurable function,

∫_{φ(U)} f(x) dx = ∫_U f(φ(x)) d((φ^{−1})_*(dx)).

On the other hand, by the change of variables formula,

∫_{φ(U)} f(x) dx = ∫_U f(φ(x)) |det Tφ(x)| dx.

(To see the first identity, for any Borel set A take f = 1_{φ(A)}, so that f(φ(x)) = 1_A(x).) Comparing the two,

d((φ^{−1})_*(dx))/dx = |det Tφ(x)|.

3.5 Basics of Stochastic Processes
Let (Ω, F, P ) be a measure space. Assume that P (Ω) = 1. Two measurable sets
A and B are independent if P (A ∩ B) = P (A)P (B).
Definition 3.5.1
1. Let {F_α, α ∈ Λ} be a family of σ-algebras. We say that {F_α, α ∈ Λ} are independent if for any finite index set {α_1, . . . , α_n} ⊂ Λ and any sets A_1 ∈ F_{α_1}, . . . , A_n ∈ F_{α_n},

P(∩_{i=1}^n A_i) = Π_{i=1}^n P(A_i).
2. A family of random variables are mutually independent if the σ-algebras
generated by them are mutually independent.
Let (Ω, F, P) be a probability space. If X : Ω → R^d is measurable (i.e. a random variable), then X_*P is the probability distribution of X and is denoted by µ_X. And

EX = ∫_Ω X dP = ∫_{R^d} y dµ_X(y).
Let π_i : R^n → R be the projection to the i-th component. The tensor σ-algebra ⊗^n B(R) is

⊗^n B(R) = σ{π_i^{−1}(A) : A ∈ B(R), i = 1, . . . , n}.

For example π_1^{−1}(A) = A × R × · · · × R. If each X_i is an R-valued measurable function, (X_1, . . . , X_n) : Ω → R^n is measurable with respect to ⊗^n B(R). The joint distribution of (X_1, . . . , X_n) is the measure on R^n pushed forward by the map (X_1, . . . , X_n).
Definition 3.5.2 The random variables {X_1, . . . , X_n} are independent if

µ_{(X_1,...,X_n)} = µ_{X_1} ⊗ · · · ⊗ µ_{X_n}.

Independence holds if the two measures in the identity above agree on cylindrical sets:

P(X_1 ∈ A_1, . . . , X_n ∈ A_n) = Π_{i=1}^n P(X_i ∈ A_i),   A_i ∈ B.

Equivalently, for any g_i : R → R bounded measurable,

E(Π_{i=1}^n g_i(X_i)) = Π_{i=1}^n E g_i(X_i).
Let (Ω, F, P) be a probability space. Let S be a metric space, endowed with its Borel σ-algebra B unless otherwise stated. In this course we take S = R^d. Let I be a subset of R, e.g. R, [0, 1], [a, b], [a, b), or {1, 2, . . . }.
Definition 3.5.3 Let I be a set. A function X : Ω × I → S is a stochastic process if for each t ∈ I, X(·, t) : (Ω, F) → (S, B) is measurable.
The process is denoted by (X_t, t ∈ I) or (X_t) for short. A point t ∈ I is referred to as time, and the stochastic process is a continuous time stochastic process if I is an interval. If I = Z we have a discrete time stochastic process, denoted by (X_n).
Example 3.5.4
• Take Ω = [0, 1], F = B([0, 1]), and P the Lebesgue measure. Take I = {1, 2, . . . } and define X_n(ω) = ω^n. These are continuous functions from [0, 1] to R and hence measurable.
• Take I = [0, 3]. Let X, Y : Ω → R be two random variables. Then X_t(ω) = X(ω)1_{[0,1/2]}(t) + Y(ω)1_{(1/2,3]}(t) is a stochastic process.
Definition 3.5.5 A stochastic process (X_t, t ∈ I) is said to have independent increments if for any n and any 0 = t_0 < t_1 < · · · < t_n, t_i ∈ I, the increments {X_{t_i} − X_{t_{i−1}}}_{i=1}^n are independent.
Definition 3.5.6 A sample continuous stochastic process (Bt : t ≥ 0) on R1 is the
standard Brownian motion if B0 = 0 and the following holds:
1. For 0 ≤ s < t, the distribution of Bt − Bs is N (0, t − s) distributed.
2. Bt has independent increments.
Definition 3.5.7 A stochastic process (X_t, t ≥ 0) with state space S is said to be sample path continuous if t ↦ X_t(ω) is continuous for almost all ω. The terminology time continuous is also used.
Remark: Another regularly encountered class of stochastic processes is the class of càdlàg processes: for almost all ω, the sample path t ↦ X_t(ω) has left limits and is right continuous at every time t. Such paths jump at their points of discontinuity.
Definition 3.5.8 A stochastic process (X_t) is said to have stationary increments if the distributions of X_t − X_s and X_{t+a} − X_{s+a} are the same for all a > 0 and 0 ≤ s < t.
3.6 Brownian Motion
Brownian motion captures the following properties: the randomness we model is independent over disjoint time intervals and is Gaussian.
Let I = [0, 1] or I = [0, ∞).
Definition 3.6.1 A stochastic process (B_t : t ≥ 0) on R^d is a Brownian motion with initial value x if the following hold:
1. B_0 = x.
2. t 7→ Bt (ω) is continuous for almost surely all ω.
3. For s ≥ 0 and t > 0, the distribution of B_{t+s} − B_s is N(0, tI).
4. Bt has independent increments, i.e. for any 0 = t0 < t1 < · · · < tk ,
{Bti+1 − Bti }ki=1 is a family of independent random variables.
If B_0 = 0 and d = 1 this is the standard (linear) Brownian motion.
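Properties 3–4 give a direct recipe for simulating a Brownian path on a grid: take partial sums of independent N(0, ∆t) increments. A minimal sketch (grid size and path count are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(3)
paths, n, T = 20000, 100, 1.0
dt = T / n
# Independent increments ~ N(0, dt), one row per simulated path.
dB = rng.normal(0.0, np.sqrt(dt), size=(paths, n))
B = np.cumsum(dB, axis=1)            # B_{t_k} = sum of the first k increments
# Var(B_T) should be T, and B_{1/2} should be uncorrelated with B_1 - B_{1/2}.
print(B[:, -1].var())
print(np.mean(B[:, 49] * (B[:, -1] - B[:, 49])))
```

The empirical variance at time T is close to T, and increments over disjoint intervals are (empirically) uncorrelated, as the definition requires.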
Remark*: If A is a d × d matrix, AB_t + at is N(at, AA^T t) distributed. Fix t > 0 and observe that

B_t − B_0 = Σ_{i=0}^{2^n − 1} (B_{(i+1)t/2^n} − B_{it/2^n}).

The increments {B_{(i+1)t/2^n} − B_{it/2^n}} are independent and identically distributed, which suggests that B_t ∼ N(0, t): for each t, B_t has an infinitely divisible law and (B_t) is a Lévy process. Sample continuity then forces the law to be Gaussian, N(at, tC), for some a ∈ R^d and some positive semi-definite symmetric matrix C. Up to a linear transformation the process is indeed a Brownian motion.
3.7 Lecture 4-5
Proposition 3.7.1 The joint distribution of (B_s, B_t), s < t, has density p_s(0, x) p_{t−s}(x, y) with respect to the Lebesgue measure dx dy on (R^d)².
Proof For any A_1, A_2 ∈ B(R^d),

P(B_s ∈ A_1, B_t ∈ A_2) = E 1_{A_1}(B_s) 1_{A_2}(B_t)
= E 1_{A_1}(B_s) 1_{A_2}((B_t − B_s) + B_s)
= ∫_{R^d} ∫_{R^d} 1_{A_1}(x) 1_{A_2}(z + x) p_s(0, x) p_{t−s}(0, z) dz dx
= ∫_{R^d} ∫_{R^d} 1_{A_1}(x) 1_{A_2}(y) p_s(0, x) p_{t−s}(0, y − x) dy dx
= ∫_{A_1×A_2} p_s(0, x) p_{t−s}(x, y) dy dx.
3.7.1 Finite Dimensional Distributions
Let I = [0, T]. Let (X_t, 0 ≤ t ≤ T) be a stochastic process with values in a separable metric space (S, B(S)).
Definition 3.7.2 For 0 ≤ t1 < t2 < · · · < tn , the measurable map
ω 7→ (Xt1 (ω), Xt2 (ω), . . . , Xtn (ω))
from (Ω, F) to (S n , B(S n )) induces a Borel measure µt1 ,...,tn on S n . These are
the finite dimensional distributions of the stochastic process (Xt ). They are also
known as marginal distributions.
For A ∈ B(S n ),
µt1 ,...,tn (A) = P (ω : (Xt1 (ω), Xt2 (ω), . . . , Xtn (ω)) ∈ A).
Exercise. Compute the finite dimensional distributions of a Brownian motion.
Let (X_t) be a stochastic process; we define a map X_· : Ω → (R^d)^{[0,T]} by X_·(ω)(t) = X_t(ω). It is a measurable map. The measure (X_·)_*(P) is the probability distribution (also called the law) of the process. Finite dimensional distributions determine the distribution of the process.
3.7.2 Gaussian Measures on Rd
Given a measure µ, its characteristic function is

µ̂(λ) := ∫_{R^d} e^{i⟨λ,x⟩} dµ(x),   λ ∈ R^d.

Two finite measures agree if their characteristic functions agree; see §3.8 (page 197), volume I of [3]. A probability measure is Gaussian if for some a ∈ R^d and some symmetric positive semi-definite d × d matrix C,

µ̂(λ) = e^{i⟨λ,a⟩ − ⟨Cλ,λ⟩/2}.
This is denoted by N(a, C). The measure is absolutely continuous with respect to the Lebesgue measure if C is invertible. Restricted to the subspace spanned by the eigenvectors of C with non-zero eigenvalues, the Gaussian measure is non-degenerate; hence there is little loss in studying only non-degenerate Gaussian measures on a finite dimensional vector space.
Let a ∈ R^d and C a positive definite d × d matrix. The (non-degenerate) Gaussian measure N(a, C) on R^d is defined as the measure absolutely continuous with respect to dx with

dµ/dx = (2π)^{−d/2} (det C)^{−1/2} e^{−⟨C^{−1}(x−a), x−a⟩/2}.   (3.4)
Take C = tI_{d×d}, the diagonal matrix with diagonal entries t. Then

dµ/dx = p_t(a, x) = (2πt)^{−d/2} e^{−|x−a|²/(2t)}.
Definition 3.7.3 A random variable whose distribution is Gaussian is a Gaussian
variable. A stochastic process Xt is Gaussian if its marginal distributions are
Gaussian.
Proposition 3.7.4 Let X = (X1 , . . . , Xn ) ∼ N (a, C) where C = (Ckl ). Then
EX = a and cov(Xk , Xl ) = Ckl .
Proof If C is diagonal this is easy to see. Otherwise, let A be a square root of C, C = AA^T. Then det C = (det A)² and ⟨C^{−1}ξ, ξ⟩ = |A^{−1}ξ|² for any ξ ∈ R^d. Let {e_j} be the standard orthonormal basis of R^d. Substituting y = x − a and then z = A^{−1}y,

EX = ∫_{R^d} x (2π)^{−d/2} (det C)^{−1/2} e^{−⟨C^{−1}(x−a), x−a⟩/2} dx
= ∫_{R^d} (y + a) (2π)^{−d/2} (det C)^{−1/2} e^{−⟨C^{−1}y, y⟩/2} dy
= ∫_{R^d} (Az + a) (2π)^{−d/2} (det C)^{−1/2} e^{−|z|²/2} det(A) dz = a,

since det A = √(det C) and the term Az integrates to zero. The key ingredient is that the measure becomes a product of measures on each factor and Az is a linear function of z. In components, ⟨Az + a, e_k⟩ = Σ_j A_{k,j} z_j + a_k, so

EX_k = E⟨X, e_k⟩ = ∫_{R^d} (Σ_j A_{k,j} z_j + a_k) (2π)^{−d/2} e^{−Σ_i z_i²/2} Π_i dz_i = a_k.
Similarly, with z = A^{−1}y,

cov(X_k, X_l) = E⟨X − a, e_k⟩⟨X − a, e_l⟩
= ∫_{R^d} ⟨x − a, e_k⟩⟨x − a, e_l⟩ (2π)^{−d/2} (det C)^{−1/2} e^{−⟨C^{−1}(x−a), x−a⟩/2} dx
= ∫_{R^d} ⟨y, e_k⟩⟨y, e_l⟩ (2π)^{−d/2} (det C)^{−1/2} e^{−⟨C^{−1}y, y⟩/2} dy
= ∫_{R^d} ⟨Az, e_k⟩⟨Az, e_l⟩ (2π)^{−d/2} (det C)^{−1/2} e^{−|z|²/2} det(A) dz
= ∫_{R^d} (Σ_j A_{k,j} z_j)(Σ_i A_{l,i} z_i) (2π)^{−d/2} e^{−|z|²/2} dz
= Σ_i A_{k,i} A_{l,i} ∫_{R^d} z_i² (2π)^{−d/2} e^{−|z|²/2} dz = (AA^T)_{k,l} = C_{k,l}.
A family of real valued random variables {X_1, . . . , X_k} is uncorrelated if cov(X_j, X_k) = 0 when j ≠ k. For Gaussian random variables, being uncorrelated and being independent are equivalent: if C_{i,j} = 0 when i ≠ j, then C is diagonal and the Gaussian measure is a product measure. A family of stochastic processes is independent if their joint probability distribution is the product of the individual probability distributions. This can be tested by evaluating at a finite number of times, i.e. on the finite dimensional distributions.
Remark 3.7.5 (B_t) = (B_t^1, . . . , B_t^d) is a d-dimensional Brownian motion if and only if {(B_t^1), . . . , (B_t^d)} are independent one dimensional Brownian motions. For each t, {B_t^1, . . . , B_t^d} are independent random variables since C = tI_{d×d}; the independence of the stochastic processes follows from this and the independent increment property.
3.8 Kolmogorov's Continuity Theorem
In the following definitions, the Euclidean space Rd can be replaced by a Banach
space.
Definition 3.8.1 Let α ∈ (0, 1) and let I be an interval of R. A function f : I → R^d is Hölder continuous of exponent α if for some constant C,

|f(t) − f(s)| ≤ C|t − s|^α,   s, t ∈ I.
Definition 3.8.2 Let α ∈ (0, 1) and let I be an interval of R. A function f : I → R^d is locally Hölder continuous of exponent α if on any compact subinterval [a, b] ⊂ I,

sup_{t≠s, t,s∈[a,b]} |f(t) − f(s)| / |t − s|^α < ∞.
Let T ∈ R_+ ∪ {∞}.
Theorem 3.8.3 (Kolmogorov's Continuity Theorem) Let (x_t, 0 ≤ t < T) be a stochastic process taking values in a Banach space (E, ‖ · ‖). Suppose that there exist positive constants p, δ and C such that

E‖x_t − x_s‖^p ≤ C|t − s|^{1+δ}.

Then there is a continuous modification of (x_t), denoted (x̃_t), which is locally Hölder continuous with exponent α ∈ (0, δ/p).
For a proof, and a refined version of this theorem due to Garsia, Rodemich and Rumsey (1970), see §2.1 (pages 47-51) of Stroock-Varadhan [27].
Definition 3.8.4 Two stochastic processes (Xt ) and (Yt ) on the same probability
space are modifications of each other if for each t, P (Xt = Yt ) = 1.
Here the exceptional set {ω : X_t(ω) ≠ Y_t(ω)}, of zero measure, may depend on t.
Definition 3.8.5 Two stochastic processes (X_t) and (Y_t) on the same probability space are indistinguishable if P(X_t = Y_t, ∀t) = 1.
Theorem 3.8.6 Let (x_t, 0 ≤ t < T) be a real valued stochastic process such that x_t − x_s is N(0, t − s) distributed. It has a continuous modification, and the paths t ↦ x_t(ω) are Hölder continuous on any interval [a, b] ⊂ [0, T) of exponent α, for any α < 1/2.
Proof For any p > 0,

E|x_t − x_s|^p = (2π(t − s))^{−1/2} ∫_{−∞}^{∞} |x|^p e^{−|x|²/(2(t−s))} dx
= |t − s|^{p/2} (2π)^{−1/2} ∫_{−∞}^{∞} |z|^p e^{−|z|²/2} dz   (x = √(t−s) z)
= E(|x_1|^p) |t − s|^{p/2} < ∞.

By Kolmogorov's continuity criterion there is a continuous modification of the sample path. With δ = p/2 − 1, the Hölder exponent α < δ/p = (p − 2)/(2p) can be taken arbitrarily close to 1/2 by taking p large.
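The scaling identity used in the proof, B_t − B_s = √(t − s) B_1 in law, gives E|B_t − B_s|^p = E|B_1|^p |t − s|^{p/2}; for p = 4 this is 3(t − s)², which a quick simulation confirms (the value t − s = 0.3 is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(4)
t_minus_s = 0.3
# An increment B_t - B_s is N(0, t - s); its 4th moment is 3 (t - s)^2.
incr = rng.normal(0.0, np.sqrt(t_minus_s), size=10 ** 6)
print(np.mean(incr ** 4), 3 * t_minus_s ** 2)
```

With p = 4 the theorem applies with δ = 1, giving Hölder exponents up to δ/p = 1/4; larger p pushes the exponent towards 1/2.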
3.8.1 The existence of the Wiener Measure and Brownian Motion
Fix T > 0. We come back to prove Theorem 3.3.1, which states that there is a unique probability measure µ on W^d such that

µ(ω : π_{t_1}(ω) ∈ A_1, . . . , π_{t_n}(ω) ∈ A_n)
= ∫_{A_1} · · · ∫_{A_n} p_{t_1}(0, x_1) p_{t_2−t_1}(x_1, x_2) · · · p_{t_n−t_{n−1}}(x_{n−1}, x_n) dx_n . . . dx_2 dx_1.   (3.5)
We use π_t(ω) = ω_t to denote the evaluation map at t. By Kolmogorov's extension theorem, there is a unique probability measure µ_0 on the product σ-algebra of (R^d)^{[0,T]}; see §3.9.2 for detail. Treat ((R^d)^{[0,T]}, ⊗_{[0,T]} B(R^d), µ_0) as a probability space; each π_t : (R^d)^{[0,T]} → R^d is a measurable function. Since π_t − π_s ∼ N(0, t − s), by Theorem 3.8.6 (π_t) has a continuous modification, and hence the set of non-continuous paths has µ_0 measure zero.
The product σ-algebra ⊗_{[0,T]} B(R^d) is smaller than the Borel σ-algebra of the product topology. Since the projections π_t are continuous, B(W^d) contains the trace of the σ-algebra of the product topology. Continuous functions are determined by their values at the rational numbers, and in fact B(W^d) = W^d ∩ ⊗_{[0,T]} B(R^d). However W^d (and likewise W_0^d) is not a measurable set in the tensor σ-algebra! We refer to Revuz-Yor [23].
Corollary 3.8.7 There is a measurable map
φ : ((Rd )[0,T ] , ⊗[0,T ] B(Rd )) → (W0d , B(W0d )).
Let µ = φ∗ µ0 . Then µ satisfies (3.5). In conclusion Wiener measure exists and
Brownian motion exists. In fact πt is a Brownian motion on the Wiener space.
The map φ is constructed as follows. If ω is not uniformly continuous on Q we define φ(ω) ≡ 0; let Ω_0 be the collection of such ω, which is clearly measurable. If ω is uniformly continuous on Q we define φ(ω) to be the unique continuous function agreeing with ω on Q. Now for all Borel sets A_i and times t_i ∈ Q, either

{ω : φ(ω)(t_i) ∈ A_i, t_i ∈ Q} = {ω : ω(t_i) ∈ A_i, t_i ∈ Q}

or

{ω : φ(ω)(t_i) ∈ A_i, t_i ∈ Q} = {ω : ω(t_i) ∈ A_i, t_i ∈ Q} ∪ Ω_0.

Both are measurable sets. By Theorem 3.8.6, µ_0(Ω_0) = 0. On cylindrical sets, φ_*µ_0 = µ_0.
3.8.2 Lecture 6. Sample Properties of Brownian Motions
We wish to define ∫_0^t f_s dB_s. Could we use the theory of Lebesgue-Stieltjes integration? To make sense of ∫_a^b f_s dg_s as a Lebesgue-Stieltjes integral, g is assumed to be of finite total variation on [a, b]. Functions of finite total variation are differentiable almost everywhere, while a typical Brownian path is nowhere differentiable. If f_s is reasonably smooth we may overcome the non-differentiability of t ↦ B_t by the elementary integration by parts formula: ∫_a^b f_s dB_s = f_b B_b − f_a B_a − ∫_a^b B_s df_s. The stochastic integral ∫_0^t f_s dB_s we define later will not require such regularity of f_s.
Since a d-dimensional Brownian motion consists of d independent one dimensional Brownian motions, in this section we assume that B_t is a one dimensional Brownian motion.
Proposition 3.8.8 Let ∆_n : a = t_0^n < t_1^n < · · · < t_{M_n+1}^n = b be a sequence of partitions of [a, b] with |∆_n| → 0, and set

T_n ≡ Σ_{i=0}^{M_n} (B_{t_{i+1}^n} − B_{t_i^n})².

Then

lim_{n→∞} E(T_n − (b − a))² = 0.

In particular T_n converges in probability to b − a, and there is a sub-sequence of partitions ∆_{n_k} such that T_{n_k} → b − a for almost all ω.
Proof

E T_n = Σ_{i=0}^{M_n} E(B_{t_{i+1}^n} − B_{t_i^n})² = Σ_{i=0}^{M_n} (t_{i+1}^n − t_i^n) = b − a.

Note that B_t − B_s = √(t − s) B_1 in distribution (exercise). Hence

E(T_n − (b − a))² = var(T_n) = Σ_{i=0}^{M_n} var((B_{t_{i+1}^n} − B_{t_i^n})²)
= Σ_{i=0}^{M_n} var((t_{i+1}^n − t_i^n) B_1²)
= Σ_{i=0}^{M_n} (t_{i+1}^n − t_i^n)² var(B_1²) ≤ |∆_n|(b − a) var(B_1²) → 0.

This proves the first statement. Now L² convergence implies convergence in probability, and so there is a sub-sequence that converges almost surely.
Taking for example dyadic partitions, halving each interval at each step, the whole sequence converges almost surely [14].
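Proposition 3.8.8 is easy to observe numerically: along dyadic partitions of [0, 1], the sum of squared increments of a simulated Brownian path concentrates at b − a = 1. A sketch (the grid resolution 2^16 is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 2 ** 16
# One Brownian path on a fine grid of [0, 1].
dB = rng.normal(0.0, np.sqrt(1.0 / n), size=n)
B = np.concatenate([[0.0], np.cumsum(dB)])
for level in (4, 8, 16):                 # coarse-to-fine dyadic partitions
    step = n // 2 ** level
    Tn = np.sum(np.diff(B[::step]) ** 2)  # sum of squared increments
    print(level, Tn)                      # approaches 1.0 as level grows
```

The same subsampling with first powers |∆B| instead of squares grows without bound, previewing Proposition 3.8.10 on infinite total variation.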
Definition 3.8.9 Let (x_t, 0 ≤ t < ∞) be a stochastic process. Suppose that for any sequence ∆_n of partitions of [0, t] with lim_{n→∞} |∆_n| = 0,

T^{(n)}(x_·) := Σ_{i=0}^{M_n} (x_{t_{i+1}^n} − x_{t_i^n})²

converges, in probability, to a finite limit, which we denote by ⟨x, x⟩_t. We say that (x_t) has finite quadratic variation and ⟨x, x⟩_t is its quadratic variation process. We also denote the convergence by

(P) lim_{n→∞} Σ_{i=0}^{M_n} (x_{t_{i+1}^n} − x_{t_i^n})² = ⟨x, x⟩_t.
The quadratic variation process is an increasing function of t. By Proposition 3.8.8, the Brownian motion (B_t, t ∈ [0, ∞)) has finite quadratic variation and its quadratic variation process is ⟨B, B⟩_t = t. We'll see below that a sample continuous stochastic process (x_t, t ∈ [0, ∞)) with finite quadratic variation, not identically zero, cannot have finite total variation over any finite time interval. This can be seen in the following proposition, where the Brownian motion (B_t) can be replaced by any continuous process with these properties.
Proposition 3.8.10 For almost all ω, the Brownian path t ↦ B_t(ω) has infinite total variation on every interval [a, b], and B_t(ω) cannot be Hölder continuous of order α > 1/2.
Recall that the total variation of a function g : [a, b] → R is

|g|_{TV([a,b])} = sup_∆ Σ_{i=0}^{n−1} |g(t_{i+1}) − g(t_i)|,

where ∆ : a = t_0 < t_1 < · · · < t_n = b ranges through all partitions of [a, b].
Proof Fix an ω. Since B_t almost surely has continuous paths, we consider only those ω for which t ↦ B_t(ω) is continuous.
Suppose that |B(ω)|_{TV([a,b])} < ∞. Along the partitions of the previous proposition for which T_n converges almost surely,

Σ_{i=0}^{M_n} (B_{t_{i+1}^n} − B_{t_i^n})² ≤ max_i |B_{t_{i+1}^n}(ω) − B_{t_i^n}(ω)| · Σ_{i=0}^{M_n} |B_{t_{i+1}^n} − B_{t_i^n}|
≤ max_i |B_{t_{i+1}^n}(ω) − B_{t_i^n}(ω)| · |B(ω)|_{TV([a,b])} → 0.

The convergence follows from the fact that B_t(ω) is uniformly continuous on [a, b]. This contradicts that Σ_{i=0}^{M_n} (B_{t_{i+1}^n}(ω) − B_{t_i^n}(ω))² converges to b − a.
Suppose next that |B_t(ω) − B_s(ω)| ≤ C(ω)|t − s|^α for all s, t, where C(ω) is a constant for each ω, for some α > 1/2. Then

Σ_{i=0}^{M_n} |B_{t_{i+1}^n}(ω) − B_{t_i^n}(ω)|² ≤ C²(ω) Σ_i |t_{i+1}^n − t_i^n|^{2α}
≤ C²(ω) |∆_n|^{2α−1} Σ_i (t_{i+1}^n − t_i^n) ≤ C²(ω)(b − a)|∆_n|^{2α−1} → 0,

as 2α − 1 > 0. This contradicts Proposition 3.8.8.
3.8.3 The Wiener measure does not charge the space of finite energy*
Let f : R → R be an increasing function. Then f has only a countable number of points of discontinuity and f is differentiable almost everywhere. We modify the value of f at the points of discontinuity so that the modified function f̃ is right continuous. Then f′ = f̃′ wherever both are differentiable, and this holds a.e.
A right continuous increasing function F induces a measure µ_F on R, determined by

µ_F((a, b]) = F(b) − F(a).

It is easy to see that this is indeed a measure; the continuity of the measure along monotone sequences of sets is assured by the right continuity of F. This is the Lebesgue-Stieltjes measure associated to F. If F : [c, d] → R is of finite total variation, with variation function TV(F)(x) the total variation of F on [c, x], then both F_1 = (TV(F) + F)/2 and F_2 = (TV(F) − F)/2 are increasing and F = F_1 − F_2. If F is right continuous, then TV(F), F_1 and F_2 are right continuous.
If the measure µ_F is absolutely continuous with respect to the Lebesgue measure, denote by p(x) its density:

∫ f(x) dµ_F(x) = ∫ f(x) p(x) dx

for any f bounded measurable. In particular,

∫_a^b p(x) dx = F(b) − F(a).
Suppose that F ∈ H¹, the space of finite energy, also known as W^{1,2}:

H¹ = {F ∈ W_0^1 : ∫_0^1 |F′(s)|² ds < ∞}.

Then the weak derivative F′ = p and

F(b) − F(a) = ∫_a^b F′(x) dx.
By Cauchy-Schwarz, |F(b) − F(a)| ≤ (∫ |F′(x)|² dx)^{1/2} √(b − a), so F is Hölder continuous of order 1/2. Since almost every path in W^d fails to be Hölder continuous of order 1/2, the Wiener measure does not charge H¹: µ(H¹) = 0.
3.9 Product σ-algebras
Let I be an arbitrary index set and let (E_α, F_α), α ∈ I, be a family of measurable spaces. The tensor σ-algebra, also called the product σ-algebra, on E = Π_{α∈I} E_α is defined to be

⊗_{α∈I} F_α = σ{π_α^{−1}(A_α) : A_α ∈ F_α, α ∈ I}.

Here π_α : E = Π_{α∈I} E_α → E_α is the projection (also called the coordinate map): if x = (x_α, α ∈ I) is an element of E, π_α(x) = x_α. The tensor σ-algebra is the smallest one such that for all α ∈ I the mapping

π_α : (E, ⊗_{α∈I} F_α) → (E_α, F_α)

is measurable.
For example, we may take (E_α, F_α) = (S, B), a metric space with its Borel σ-algebra. We often take S = R or S = R^n.
3.9.1 The Borel σ-algebra on the Wiener Space
The set of all maps from [0, 1] to R^d is the product space (R^d)^{[0,1]}. The tensor σ-algebra ⊗_{[0,1]} B(R^d) is the smallest σ-algebra such that the projections π_t(ω) = ω(t) are measurable. The product topology is the smallest topology such that the projections are continuous; hence the Borel σ-algebra of the product topology is larger than the tensor σ-algebra.
The subset W^d of (R^d)^{[0,1]} is the set of continuous functions. The tensor σ-algebra of (R^d)^{[0,1]} is generated by sets of the form

{π_t^{−1}(B) : B ∈ B(R^d)}.
Note that B(W^d) ⊂ W^d ∩ ⊗_{[0,1]} B(R^d) by (3.2). On the other hand π_t : W^d → R^d is continuous with respect to the uniform topology:

|π_t(ω_1) − π_t(ω_2)| ≤ ‖ω_1 − ω_2‖.

This means that the uniform topology is finer than the product topology, so B(W^d) contains the Borel σ-algebra of the product topology restricted to W^d and hence contains W^d ∩ ⊗_{[0,1]} B(R^d). In conclusion,

B(W^d) = ⊗_{[0,1]} B(R^d) ∩ W^d.
Any measurable set in the tensor σ-algebra is determined by a countable number of projections, while whether a function is continuous cannot be decided by a countable number of coordinates. Hence W^d is not a measurable set in the tensor σ-algebra. (Do not confuse this with the following: once we know a function is continuous, it is determined by its values on a countable dense set.)
3.9.2 Kolmogorov's extension Theorem
For each ω, we may view t ∈ I ↦ X(ω, t) ∈ S as an element of S^I. Define X_· : Ω → S^I by X_·(ω)(t) = X_t(ω). Then X_· : (Ω, F) → (S^I, ⊗_I B) is measurable if and only if each X_t : (Ω, F) → (S, B) is measurable.
Definition 3.9.1 The map X : Ω → S [0,T ] induces a probability measure on
⊗[0,T ] B. This is the law or the distribution µX of the stochastic process.
The projection map from S [0,T ] to S n defined by
πt1 ,...,tn : X· ∈ S [0,T ] 7→ (Xt1 , . . . , Xtn ) ∈ S n
induces from µX the marginal distribution µt1 ,...,tn .
The question is whether the marginal distributions determine the law µ. A cylindrical set in Π_{α∈I} S is a product set each of whose factors, with at most finitely many exceptions, equals S. Let E denote the collection of cylindrical sets:

E = {Π_{α∈I} A_α : A_α ∈ B, A_α = S except for finitely many α}.

Then σ(E) = ⊗_I B. Let J ⊂ I be a sub-index set and π_J : Π_{α∈I} S → Π_{α∈J} S the projection; if J ⊂ K ⊂ I let π_{KJ} : Π_{α∈K} S → Π_{α∈J} S be the projection map. Let µ be a measure on the product space Π_I S and µ_J the measure induced by the projection π_J. Then (µ_J) is a projective family of measures, which means (π_{KJ})_* µ_K = µ_J. The family of marginal distributions forms a projective family of cylindrical measures.
Definition 3.9.2 Let S be a separable metric space. Let {µ_{Jf}} be a collection of
probability measures on the finite product spaces Π_{α∈Jf} S, where Jf runs through
the finite subsets of I. These are called cylindrical measures. They form a projective
family of measures if

(πKJ)∗ (µK) = µJ   whenever J ⊂ K.
Theorem 3.9.3 (Kolmogorov's Extension Theorem) Given a projective family of
cylindrical measures {µ_{Jf}}, there is a unique probability measure µ on ⊗_I B such that
(π_{Jf})∗ µ = µ_{Jf}.
In particular, there is a stochastic process (Xt) with values in S such that the distribution of
(Xt1, . . . , Xtk) is µ_{(t1,...,tk)}.
Remark 3.9.4 This theorem leads to problems in the convergence of stochastic processes, especially weak convergence. If µn, µ are measures on
⊗_I B such that the marginal distributions µn_{t1,...,tk} of µn converge weakly to those
of µ, it does not follow that µn converges to µ weakly. See Billingsley, Convergence of Probability Measures.
Chapter 4
Conditional Expectations
All measures are assumed to be σ-finite, which means the total space is the union
of a countable number of measurable sets of finite measure. The Lebesgue measure
is σ-finite, and finite measures are of course σ-finite.
Shorthand: we denote by f ∈ G the statement that 'f is a function measurable
with respect to G'.
4.1
Preliminaries
This section is assumed known and is not covered in the lectures. A measure is not a canonical object: if there is another way of measuring a set, how do the two measures
compare?
Definition 4.1.1
1. Let (Ω, F) be a measurable space and let P and Q be two measures. The
measure Q is said to be absolutely continuous with respect to P
if Q(A) = 0 whenever P(A) = 0, A ∈ F. This will be denoted by Q ≪ P.
2. They are said to be equivalent, denoted by Q ∼ P, if each is absolutely
continuous with respect to the other.
Theorem 4.1.2 (Radon-Nikodym Theorem) If Q ≪ P, there is a non-negative
measurable function Ω → R, which we denote by dQ/dP, such that for each measurable set A we have

Q(A) = ∫_A (dQ/dP)(ω) dP(ω).

The function dQ/dP : Ω → R is called the Radon-Nikodym derivative of Q with
respect to P. We also say that dQ/dP is the density of Q with respect to P. This
function is unique up to a set of P-measure zero.
Note that if Q is a finite measure then dQ/dP ∈ L^1(Ω, F, P). If P is a probability
measure and ∫_Ω (dQ/dP)(ω) dP(ω) = 1, then Q is a probability measure. If furthermore dQ/dP > 0, then

∫_A dP = ∫_A (1/(dQ/dP)) (dQ/dP) dP = ∫_A (1/(dQ/dP)) dQ.

It follows that P ≪ Q, so the two measures are equivalent, and

(dP/dQ) · (dQ/dP) = 1.
Example 4.1.3 Let Ω = [0, 1) and P the Lebesgue measure. Let A^n_i = [i/2^n, (i+1)/2^n),
i = 0, 1, . . . , 2^n − 1, and Fn = σ{A^n_0, A^n_1, . . . , A^n_{2^n−1}}. Let µ be a measure on
Fn. Check that

(dµ/dP)(x) = Σ_i (µ(A^n_i)/P(A^n_i)) 1_{A^n_i}(x),   x ∈ [0, 1).
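The displayed formula can be checked numerically. The sketch below is an illustration added here, not part of the notes: it takes for µ the measure with density 2x on [0, 1) and computes dµ/dP on Fn; on each dyadic interval the density is the average of 2x over the interval, i.e. twice its midpoint.

```python
import numpy as np

def dyadic_density(mu_of_interval, n):
    """dmu/dP on F_n = sigma(A^n_i), A^n_i = [i/2^n, (i+1)/2^n):
    the value on A^n_i is mu(A^n_i) / P(A^n_i), P = Lebesgue measure."""
    edges = np.linspace(0.0, 1.0, 2 ** n + 1)
    mu_vals = np.array([mu_of_interval(a, b) for a, b in zip(edges[:-1], edges[1:])])
    return edges, mu_vals / np.diff(edges)   # piecewise-constant density

# mu([a, b)) = integral of 2x over [a, b) = b^2 - a^2
edges, dens = dyadic_density(lambda a, b: b ** 2 - a ** 2, n=8)
midpoints = 0.5 * (edges[:-1] + edges[1:])
# the average of 2x over [a, b) is a + b, twice the midpoint
print(np.max(np.abs(dens - 2 * midpoints)))
```

As n grows, the piecewise-constant density converges to the pointwise density 2x, which is the martingale-convergence phenomenon revisited in Chapter 5.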
Two measures Q1 and Q2 are (mutually) singular if there is a measurable set A such
that Q1(A) = 0 and Q2(Ω \ A) = 0.
Example 4.1.4 Let Ω = [0, 1] and P the Lebesgue measure. Define Q(A) =
2∫_A x^2 dx, so (dQ/dP)(x) = 2x^2. Define Q1 by dQ1/dP = 2·1_{[0,1/2]}. Then Q1 ≪ P while P
is not absolutely continuous with respect to Q1. Define Q2 by dQ2/dP = 2·1_{[1/2,1]}. The
two measures Q1 and Q2 are singular.
4.2
Lecture 7-8: Conditional Expectations
Question: What is the conditional distribution of (Bt ) given B1 = 0?
Let (Ω, F, P) be a probability space and let G be a sub-σ-algebra of F. The random
variables here are either real valued or R^d valued, or vector space valued.
Definition 4.2.1 Let X ∈ L^1(Ω, F, P). The conditional expectation of X given G
is a G-measurable function, denoted by E{X|G}, such that

∫_A X(ω) dP(ω) = ∫_A E{X|G}(ω) dP(ω),   ∀A ∈ G.   (4.1)

Standard brackets can be used instead of the curly ones to denote conditional expectation: E(X|G).
Theorem 4.2.2 The conditional expectation of X ∈ L1 (Ω, F, P ) exists and is
unique up to a set of measure zero.
Proof
• Existence. Define Q(A) = ∫_A X(ω) dP(ω) for A ∈ G. Now P restricts to a
measure on G and Q ≪ P. Let E{X|G} = dQ/dP.
(Treat Q as a signed measure, or assume that X ≥ 0 in the first instance; for
X = X^+ − X^− define E{X|G} = E{X^+|G} − E{X^−|G}.)
• Uniqueness. Let g and g̃ be two such functions. Since g − g̃ is G-measurable,

A1 := {g − g̃ > 0} ∈ G,

and ∫_{A∩A1} g dP = ∫_{A∩A1} g̃ dP for all A ∈ G. Thus (g − g̃) 1_{A1} = 0 a.s., so
g ≤ g̃, and by symmetry g = g̃ a.s.
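On a finite sample space the existence proof is completely concrete: Q(A) = ∫_A X dP restricted to G has density equal to a block average. The sketch below (an added illustration with arbitrary data, assuming a uniform probability on N points and G generated by a finite partition) computes E{X|G} this way and verifies the defining identity (4.1) on each generator.

```python
import numpy as np

rng = np.random.default_rng(0)

# Discrete sample space: N equally likely points omega = 0, ..., N-1.
N = 12
X = rng.normal(size=N)                                # an integrable random variable
blocks = [list(range(0, 4)), list(range(4, 9)), list(range(9, 12))]  # partition generating G

# E{X|G} is constant on each block, equal to the block average of X:
# this is exactly dQ/dP with Q(A) = sum over omega in A of X(omega)/N.
condX = np.empty(N)
for B in blocks:
    condX[B] = X[B].mean()

# Defining property (4.1): integrals over every generator A of G agree.
for B in blocks:
    assert abs(X[B].sum() - condX[B].sum()) < 1e-12
print(condX)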
Proposition 4.2.3 For every bounded G-measurable function g,

∫_Ω g(ω) X(ω) dP(ω) = ∫_Ω g(ω) E{X|G}(ω) dP(ω).   (4.2)

This follows from the Monotone Class Theorem.
By the uniqueness and by direct checking of equation (4.1), the following properties are intuitive and their proofs are straightforward.
• If X ∈ G, E{X|G} = X a.s.
• If X is independent of G, E{X|G} = EX a.s.
• If g ∈ G is bounded and X ∈ L^1, then E{gX|G} = gE{X|G} a.s.
• For all a, b ∈ R,

E{aX + bY|G} = aE{X|G} + bE{Y|G}.

• If X ≤ Y, then E{X|G} ≤ E{Y|G}.
Exercise 4.2.1 The family of random variables {E{X|G} : G ⊂ F a sub-σ-algebra} is
L^1-bounded.
Proof
E|E{X|G}| ≤ E(E{|X| | G}) = E|X|.
Conditional Probability: Given B ∈ F,
P (B|G) := E(1B |G),
is the conditional probability of B given G.
4.3
Properties of conditional expectation
The following is given as a handout, not part of the lectures.
Proposition 4.3.1 Let X, Y ∈ L1 (Ω, F, P ) and G a sub-σ-algebra of F.
1. For all a, b ∈ R,
E{aX + bY |G} = aE{X|G} + bE{Y |G}.
2. E(E{X|G}) = EX.
3. If X ≤ Y , then E{X|G} ≤ E{Y |G}.
4. If G1 is a sub σ-algebra of G2 then
E{X|G1 } = E{E{X|G1 }|G2 } = E{E{X|G2 }|G1 }.
5. If X is G measurable, XY ∈ L1 then E{XY |G} = XE{Y |G}. In particular E{X|G} = X a.s.
6. Jensen's Inequality. Let φ : R^d → R be a convex function, e.g. φ(x) = |x| or
|x|^p, p > 1; if d = 1, e^x is convex. Then, provided φ(X) is integrable,

φ(E{X|G}) ≤ E{φ(X)|G}.

In particular, for p ≥ 1, ‖E(X|G)‖_p ≤ ‖X‖_p.
7. If Xn → X a.s. and |Xn| ≤ g ∈ L^1, then E(Xn|G) → E(X|G) a.s.
8. If E|Xn − X| → 0 then E|E{Xn|G} − E{X|G}| → 0.
9. If Xn ≥ 0 and Xn increases with n, then E(Xn|G) increases to E(lim_{n→∞} Xn|G).
10. If Xn ≥ 0, E(lim inf_{n→∞} Xn|G) ≤ lim inf_{n→∞} E(Xn|G).
11. If σ(X) ∨ G, the σ-algebra generated by σ(X) and G, is independent of A,
then E(X|A ∨ G) = E(X|G). In particular, if X is independent of G then
E{X|G} = EX.
Proof The proofs of 1-4 are straightforward. For 5, check that the identity holds for X = 1_B,
B ∈ G, and apply the Monotone Class Theorem. For 6, note that φ(x) =
sup{f_{(p,q)}(x) : f_{(p,q)}(x) = px + q, f_{(p,q)} ≤ φ, p, q ∈ Q}. For 7, note that E{|Xn| | G} ≤
E{g | G} ∈ L^1. For 8, |E{Xn − X | G}| ≤ E{|Xn − X| | G}; check L^1-boundedness and uniform absolute continuity of E{|Xn − X| | G}. For 9-10,
note that Xn 1_B is positive and increasing.
For the last statement: for A ∈ A and B ∈ G, independence gives

∫_{A∩B} X dP = E(X 1_B) P(A),   ∫_{A∩B} E{X|G} dP = E(1_A 1_B E{X|G}) = P(A) E(X 1_B).

Since the sets {A ∩ B} form a π-system, show that C = {D ∈ G ∨ A :
∫_D X dP = ∫_D E{X|G} dP} is a Dynkin system and conclude that C = G ∨ A.
4.4
Conditioning on a Random Variable
Let η : (Ω, F) → (S, B) be a measurable map and let G = σ(η). Let ξ : Ω → R
be F-measurable with E|ξ| < ∞. Denote

E(ξ|η) := E(ξ|σ(η)).

Since E(ξ|σ(η)) is σ(η)-measurable, there exists a Borel measurable function φ : S → R such that

E(ξ|σ(η)) = φ(η).

Write E{ξ|η = y} for φ(y).
This can be understood from another point of view. Let P̂η = η∗(P), the
distribution measure of η on R (take S = R). We define a Borel measure µ on R as follows: for
C ∈ B(R),

µ(C) = ∫_{η∈C} ξ dP.

If P(η ∈ C) = P̂η(C) = 0, then µ(C) = 0. Hence µ ≪ P̂η and there is a function
(dµ/dP̂η)(y) on R such that

µ(C) = ∫_C (dµ/dP̂η)(y) dP̂η(y).

Hence (dµ/dP̂η)(y) is a version of E(ξ|η = y). By the change of variable formula,

∫_C (dµ/dP̂η)(y) dP̂η(y) = ∫_{η∈C} (dµ/dP̂η)(η) dP,

so that (dµ/dP̂η)(η) = E{ξ|η}.
Example 4.4.1 Suppose that (ξ, η) has joint density f(x, y) with respect to dx dy. Define

f_{ξ|η}(x, y) = f(x, y) / ∫_R f(x′, y) dx′

when the denominator is non-zero, and set it to be zero otherwise. Then

P(ξ ∈ A|η = y) = ∫_A f_{ξ|η}(x, y) dx.

This is the conditional distribution of ξ given η. Also

E(ξ|η = y) = ∫_R x f_{ξ|η}(x, y) dx.

Proof Let A, C ∈ B(R). Then

P(ξ ∈ A, η ∈ C) = ∫_{η∈C} 1_{ξ∈A} dP = ∫_{η∈C} P(ξ ∈ A|η) dP.

On the other hand,

∫_{η∈C} (∫_A f_{ξ|η}(x, η) dx) dP = ∫_C (∫_A f_{ξ|η}(x, y) dx) η∗P(dy).

Since η∗P(dy) = (∫_R f(x, y) dx) dy,

∫_{η∈C} (∫_A f_{ξ|η}(x, η) dx) dP = ∫_C ∫_A f(x, y) dx dy = P(ξ ∈ A, η ∈ C).

It follows that

P(ξ ∈ A|η) = ∫_A f_{ξ|η}(x, η) dx.
In the case where X and Y are independent, any function of X is
independent of Y, and P̂_{(X,Y)} = P̂_X × P̂_Y. For all y we take

µ^y(B) = P̂_X(B).

Clearly µ^y is a probability measure (the same for all y) and

∫_{Y∈A} µ^Y(B) dP = ∫_{Y∈A} P̂_X(B) dP = P(Y ∈ A, X ∈ B).

Hence µ^y(B) is a version of the conditional probability of {X ∈ B} given Y.
Example 4.4.2 Compute P(Bt ∈ A|B1 = 0).
Note that (Bt, B1) has joint density pt(0, x) p1−t(x, y). Then

P(Bt ∈ A|B1 = y) = ∫_A [pt(0, x) p1−t(x, y) / ∫_R pt(0, x′) p1−t(x′, y) dx′] dx
                 = ∫_A [pt(0, x) p1−t(x, y) / p1(0, y)] dx.

We have used the fact that

∫_R pt(0, x) p1−t(x, y) dx = p1(0, y),

which is a Chapman-Kolmogorov equation.
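The Chapman-Kolmogorov identity just used is easy to verify numerically with the Gaussian heat kernel pt(x, y) = (2πt)^{−1/2} exp(−(y − x)^2/2t). The sketch below (an added illustration; the truncation of R to [−10, 10] and the values of t and y are arbitrary) compares the two sides.

```python
import numpy as np

def p(t, x, y):
    """Heat kernel p_t(x, y) = (2*pi*t)^{-1/2} exp(-(y - x)^2 / (2t))."""
    return np.exp(-((y - x) ** 2) / (2 * t)) / np.sqrt(2 * np.pi * t)

t, y = 0.3, 0.7
x = np.linspace(-10.0, 10.0, 20001)   # truncated version of the real line
dx = x[1] - x[0]
# int_R p_t(0, x) p_{1-t}(x, y) dx, by a Riemann sum
lhs = np.sum(p(t, 0.0, x) * p(1 - t, x, y)) * dx
rhs = p(1.0, 0.0, y)
print(lhs, rhs)   # the two sides agree to high accuracy
```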
The Brownian bridge from 0 to 0 in time 1 has time-t marginal density

pt(0, x) p1−t(x, 0) / p1(0, 0).

Question: for 0 < t1 < t2 < 1, does

P(Bt1 ∈ A1, Bt2 ∈ A2 | B1 = 0)

have a density? Check out

pt1(0, x) pt2−t1(x, y) p1−t2(y, 0) / p1(0, 0).
Example 4.4.3 Let (Bt) be a Brownian motion and let Gs = σ{Br : 0 ≤ r ≤ s}.
Let t ≥ s. Prove that E{Bt|Gs} = Bs.
If t = s, Bt is Gs-measurable and E{Bt|Gs} = Bt. If s < t,

E{Bt|Gs} = E{Bt − Bs|Gs} + E{Bs|Gs} = E{Bt − Bs|Gs} + Bs.

Let us now prove that

E{Bt − Bs|Gs} = 0.

Since Gs is generated by cylindrical sets, we only need to test on functions
of the form Π_i 1_{Bti ∈ Ai}, which are associated to ∩_i {Bti ∈ Ai}.
Here 0 = t0 < t1 < t2 < · · · < tk = s. To make it simple to read, let
gi : R → R be bounded Borel functions and consider

E[(Bt − Bs) Π_{i=1}^k gi(Bti)].

Write Btj = Σ_{i=0}^{j−1} (B_{ti+1} − B_{ti}) and use the independent increments property to
see that

E[(Bt − Bs) Π_{i=1}^k gi(Bti)] = 0,

which means

E{Bt − Bs|Gs} = 0.
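A Monte Carlo illustration (added here; the sample size, seed, and the test functions sign and cos are arbitrary choices): the increment Bt − Bs integrates to zero against bounded functions of Bs, which is the defining property E{Bt − Bs|Gs} = 0 tested on σ(Bs).

```python
import numpy as np

rng = np.random.default_rng(1)
n_paths, s, t = 400_000, 0.5, 0.8

# Simulate (B_s, B_t) via independent Gaussian increments.
Bs = rng.normal(0.0, np.sqrt(s), n_paths)
Bt = Bs + rng.normal(0.0, np.sqrt(t - s), n_paths)

# E{Bt - Bs | Gs} = 0: the increment is orthogonal to bounded
# functions of the path up to time s; test with g = sign and g = cos.
errs = [abs(np.mean((Bt - Bs) * g(Bs))) for g in (np.sign, np.cos)]
print(errs)   # both small, of order 1/sqrt(n_paths)
```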
4.5
Regular Conditional Probabilities
This section is not delivered in class. We have previously seen that in the case
when G is generated by a finite partition of Ω there is a probability measure P^ω such that
P(B|G)(ω) = P^ω(B). This construction works for Ω = {1, 2, . . . , n, . . . } with the
discrete topology. For C ∈ F denote

P(C|G)(ω) = E(1_C|G)(ω).   (4.3)

This is the conditional probability of C given G.
Definition 4.5.1 A system of probability measures {Q(B)(ω) : ω ∈ Ω} is called
a regular conditional probability given G if
(1) For each B ∈ F, Q(B)(ω) is a version of P(B|G)(ω).
(2) For almost all ω, Q(·)(ω) is a probability measure.
We write Qω for the measure Q(·)(ω).
Note that Q(B)(ω) being a version of E{1_B|G}(ω) means that for all B ∈ F and A ∈ G,

P(A ∩ B) = ∫_A Q(B)(ω) dP(ω).

Note also that for all f ∈ L^1,

E{f|G}(ω) = ∫_Ω f(ω̃) Q^ω(dω̃).
We now consider the task of constructing a regular conditional probability.
Note that P(B|G)(ω) : F × Ω → R is G-measurable in ω and satisfies:
• For any version of the conditional expectation, P(Ω|G)(ω) = 1 a.s.
• 'Countable' additivity. Fix a countable number of disjoint sets Ck and
any versions of the conditional expectations of the Ck; then outside of a set of
measure zero,

P(∪_{k=1}^∞ Ck|G)(ω) = Σ_{k=1}^∞ P(Ck|G)(ω).

Hence P(·|G)(ω) is a natural candidate for a probability measure. For each B,
P(B|G)(ω) is defined outside of a null set N_B and the value of the function on
N_B is undetermined. This still makes sense when a countable number of sets are
used. The problem arises when uncountably many families of countably many
disjoint sets are involved.
Throughout this section let E be a Polish space (separable and completely
metrizable). We have in mind [0, 1], R^n, finite dimensional manifolds and the
Wiener space W. Borel σ-algebras of such spaces are countably generated, a
property which allows effective management of the exceptional sets in the definition
of conditional expectations.
Definition 4.5.2 A measurable space (E, A) is a Borel space also known as a
standard space if there is a bijection φ from E to [0, 1] such that φ and φ−1 are
measurable while [0, 1] is equipped with the Borel σ-algebra.
A Polish space with its Borel σ-algebra is a standard space. A measurable subset
of a standard space, together with the induced σ-algebra, is a standard space.
Theorem 4.5.3 (Theorem 7.1, p. 145, Parthasarathy [21]) If (Ω, F) is a standard
space, then a regular conditional probability given any sub-σ-algebra
exists.
For the proof please consult Parthasarathy [21] (pp. 145-146).
Example 4.5.4 Take a measurable function Y : Ω → E = {y1, y2, . . . }. Define
Ai = Y^{−1}({yi}) = {ω : Y(ω) = yi}. Then σ(Y) = σ{Ai} and Ai ∩ Aj = ∅ if
i ≠ j. Let X ∈ L^1(Ω, F, P). Define

φ(yi) = E(1_{Ai} X)/P(Ai) if P(Ai) ≠ 0,   φ(yi) = 0 if P(Ai) = 0.

Then

φ(Y(ω)) = E(1_{Ai} X)/P(Ai) if ω ∈ Ai and P(Ai) ≠ 0,   φ(Y(ω)) = 0 if ω ∈ Ai and P(Ai) = 0.

Then E(X|Y)(ω) = φ(Y(ω)). Indeed, for any Ak ∈ σ(Y),

∫_{Ak} φ(Y) dP = ∫_{Ak} (Σ_i (E(1_{Ai} X)/P(Ai)) 1_{Ai}(ω)) dP = ∫_{Ak} X dP.

We may define φ̃ such that φ̃(yi) = ai, ai ∈ R, whenever P(Y^{−1}(yi)) = 0, and
φ̃ = φ otherwise. Then φ̃(Y) is a version of the conditional expectation, and
φ̃(Y) = φ(Y) almost surely.
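A quick numerical illustration of the finite-valued case above (added here; the model X = Y + noise and the sample size are arbitrary): φ(yi) = E(1_{Ai}X)/P(Ai) recovers E(X|Y = yi), which here is approximately yi.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

Y = rng.integers(0, 3, size=n)       # Y takes the values y_i = 0, 1, 2
X = Y + rng.normal(size=n)           # so E(X|Y = y_i) = y_i

# phi(y_i) = E(1_{A_i} X) / P(A_i) with A_i = {Y = y_i}
phi = {yi: X[Y == yi].mean() for yi in (0, 1, 2)}
print(phi)   # phi(y_i) close to y_i

# phi(Y) is a version of E(X|Y): the defining identity holds on each A_i
for yi in (0, 1, 2):
    assert abs(X[Y == yi].sum() - phi[yi] * np.sum(Y == yi)) < 1e-6
```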
4.6
Lecture 9: Regular Conditional Distribution and Disintegration
Let X : (Ω, F) → (S, B) be a measurable map and let G be a
sub-σ-algebra of F. For B ∈ B, let X^{−1}(B) = {X ∈ B}. Denote

P{X^{−1}(B)|G} = E{1_{X^{−1}(B)}|G}.

Can we choose versions P(X^{−1}(B)|G) so that for almost all ω,

B ↦ P(X^{−1}(B)|G)(ω)

is a probability measure on (S, B)?
Theorem 4.6.1 (Theorem 8.1 (p. 147) Parthasarathy [21]) Let (S, B) and (Ω, F)
be standard Borel spaces, X : Ω → S a measurable map, and
G ⊂ F a sub-σ-algebra. There is a family µ(ω, ·) of probability measures, called the regular
conditional distribution of X given G, such that the following hold.
• For almost all ω, µ(ω, ·) is a probability measure on (S, B).
• For each A ∈ B, µ(ω, A) is a version of P(X^{−1}(A)|G)(ω).
The measures are denoted by P̂_X(·|G)(ω).
For uniqueness and a proof please read Parthasarathy [21], pages 147-150.
Note that for any integrable function f : S → R,

E{f(X)|G}(ω) = ∫_S f(x) µ(ω, dx).
Let S, S̃ be two complete separable metric spaces with their Borel σ-algebras and let G be a sub-σ-algebra of F.
Theorem 4.6.2 (Disintegration, Thm 6.4, Kallenberg [14]) Let X : Ω → S be such
that there is a regular version µ^ω of P(X ∈ ·|G). Let Y : Ω → S̃ be G-measurable. Let f : S × S̃ → R be such that f(X, Y) is integrable. Then

E{f(X, Y)|G}(ω) = ∫_{x∈S} f(x, Y(ω)) dµ^ω(x).
4.6.1
Lecture 9: Conditional Expectation as Orthogonal Projection
Let L^2 := L^2(Ω, F, P), a Hilbert space, and let K := L^2(Ω, G, P), which is a closed
subspace of L^2. Let f ∈ L^2 and let π denote the orthogonal projection given by the
projection theorem,

π : L^2(Ω, F, P) → L^2(Ω, G, P),

with the orthogonal decomposition

f = π(f) + f′,   π(f) ∈ K, f′ ⊥ K.

By f′ ⊥ K we mean that f′ is orthogonal to K. In fact π(f) is the unique element of
K such that f − π(f) is orthogonal to K:

⟨f − π(f), g⟩_{L^2} = 0,   ∀g ∈ L^2(Ω, G, P).

Writing this out:

∫_Ω f g dP = ∫_Ω π(f) g dP.

This holds in particular for g = 1_A, A ∈ G.
Conclusion. The orthogonal projection of an L^2 function f is a version of its
conditional expectation, i.e. π(f) = E{f|G} a.s.
We next assume that f ≥ 0, not necessarily in L^2. Let fn be a sequence of
bounded (hence L^2) functions, increasing with n and converging to f pointwise.
Then π(fn) is well defined and

∫ 1_A fn dP = ∫ 1_A π(fn) dP,   A ∈ G.

Since 1_A fn is positive and increasing with n, so is 1_A π(fn) (up to null sets). Hence
both sides have limits and we may exchange limits with integration:

∫ 1_A f dP = lim_{n→∞} ∫ 1_A fn dP = lim_{n→∞} ∫ 1_A π(fn) dP = ∫ 1_A lim_{n→∞} π(fn) dP.

This means

E{f|G} = lim_{n→∞} π(fn).

For f ∈ L^1 let f = f^+ − f^− and define E{f|G} = E{f^+|G} − E{f^−|G}.
Remark 4.6.3 Observe that π(f) is the unique element of K such that

‖f − π(f)‖ = min_{g∈L^2(Ω,G,P)} ‖f − g‖.

Please refer to §II.2 of Functional Analysis [22] for the Hilbert space projection theorem.
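The minimisation property can be seen on a toy example (an added illustration with hypothetical data, not from the notes): take G generated by the partition {U < 1/2}, {U ≥ 1/2}. Then L^2(G) consists of functions taking one constant on each block, and the block averages give the projection, which strictly minimises the L^2 distance.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000

U = rng.uniform(size=n)
f = U ** 2 + rng.normal(scale=0.3, size=n)   # an L^2 function on the sample space

# G generated by the partition {U < 1/2}, {U >= 1/2}: L^2(G) consists of
# functions c1 * 1{U<1/2} + c2 * 1{U>=1/2}.  The orthogonal projection
# pi(f) = E{f|G} takes the block averages as c1, c2.
low = U < 0.5
c1, c2 = f[low].mean(), f[~low].mean()
proj = np.where(low, c1, c2)

# pi(f) minimises ||f - g|| over g in L^2(G): perturbing either constant
# strictly increases the mean squared distance.
best = np.mean((f - proj) ** 2)
for d1, d2 in [(0.1, 0.0), (0.0, -0.1), (0.05, 0.05)]:
    g = np.where(low, c1 + d1, c2 + d2)
    assert best < np.mean((f - g) ** 2)
print(best)
```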
4.6.2
Uniform Integrability of conditional expectations
Let (Ω, F, µ) be a measure space.
Definition 4.6.4 A family of real-valued functions (fα, α ∈ I), where I is an index
set, is uniformly integrable (u.i.) if

lim_{C→∞} sup_α ∫_{{|fα|≥C}} |fα| dµ = 0.
Lemma 4.6.5 (Uniform Integrability of Conditional Expectations) Let X : Ω →
R be in L^1. Then the family of functions

{E{X|G} : G is a sub-σ-algebra of F}

is uniformly integrable.
Proof Let A be any set in G. Then

E[1_A |E{X | G}|] ≤ E(1_A E{|X| | G}) = E(E{1_A |X| | G}) = E(1_A |X|).

We have used Jensen's inequality in the first step. Taking A = Ω we see that
the family {E{X|G}} is L^1-bounded:

sup_G E|E{X|G}| ≤ E|X|.

Take A = {|E{X|G}| ≥ C}, where C is a positive number (that A ∈ G is clear). Then

P(|E{X | G}| ≥ C) ≤ (1/C) E|E{X|G}| ≤ E|X|/C,

which converges to 0 as C goes to infinity. Since X ∈ L^1, for any ε > 0 there
is δ > 0 such that if P(A) < δ, then E(1_A |X|) < ε. Consequently, for any ε, take
C > E|X|/δ. Then

E[1_A |E{X | G}|] ≤ E(1_A |X|) < ε.

The proof is now complete.
Chapter 5
Martingales
Let (Ω, F, P ) be a probability space.
5.0.3
Lecture 10: Introduction
Definition 5.0.6 A family {Ft}_{t∈I} of non-decreasing sub-σ-algebras of F is
called a filtration:

Fs ⊂ Ft,   ∀s, t ∈ I, s < t,

where I is the index set.
We say that (Ω, F, Ft, P) is a filtered probability space. Define Ft+ = ∩_{h>0} Ft+h.
A filtration is right continuous if Ft+ = Ft. The filtration {Gt : Gt = Ft+} is right
continuous. The natural filtration of a continuous process is not necessarily right
continuous. Let F∞ = ∨_{t≥0} Ft = σ(∪_{t≥0} Ft), the smallest σ-algebra containing
every σ-algebra Ft, t ≥ 0. The completion of a σ-algebra Ft is normally obtained
by adding all null sets of F∞; the result is called the augmented
σ-algebra. The standard assumption on the filtration is that it is right continuous
and each σ-algebra is complete.
Definition 5.0.7 A stochastic process (Xt : t ∈ I) is Ft -adapted if each Xt is Ft
measurable.
Typically we take I = [a, b], [a, ∞), or I = {1, 2, . . . } or I = N .
Definition 5.0.8 Let Ft be a filtration on (Ω, F, P). An adapted stochastic process
(Mt, t ∈ I)
• is a martingale if each Mt ∈ L^1 and

E{Mt|Fs} = Ms,   ∀s ≤ t;

• is a sub-martingale if E{Mt|Fs} ≥ Ms for all s ≤ t;
• is a super-martingale if E{Mt|Fs} ≤ Ms for all s ≤ t.
For a sub-martingale, the integrability condition can be replaced by the integrability
of its positive part. See Revuz-Yor [23]. Note that if (Xt) is a super-martingale then
(−Xt) is a sub-martingale.
Example 5.0.9
1. Let {Xn, n ∈ N} be a sequence of independent integrable
random variables with mean zero. Define Fn = σ{X1, . . . , Xn} and Sn =
Σ_{j=1}^n Xj. Then E{Sn|Fn−1} = EXn + Sn−1 = Sn−1, and so (Sn) is a
martingale. If Xn : Ω → {1, −1} are Bernoulli variables with P(Xn = 1) = 1/2,
then Sn is the simple random walk on the real line.
2. Let Xn be as above and let X̃n = Xn + 1. Then

S̃n = Σ_{k=1}^n X̃k = Σ_{k=1}^n Xk + n = Sn + n

is a sub-martingale.
3. If X1, X2, . . . is a sequence of independent non-negative integrable random
variables with mean 1, let Mn = Π_{i=1}^n Xi. It is a discrete time martingale
with respect to Fk = σ{X1, X2, . . . , Xk}.
4. Let f ∈ L^1 and ft = E{f|Ft}; then (ft) is a martingale.
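These examples can be illustrated by simulation (a sketch added here, with arbitrary sample sizes and seed): the martingale property of the random walk is checked in the test-set form E[Sn 1_A] = E[Sn−1 1_A] for A ∈ Fn−1, and the product example via E[Mn] = 1.

```python
import numpy as np

rng = np.random.default_rng(4)
n_paths, n_steps = 200_000, 10

# Example 1: simple random walk S_n with fair +-1 steps.
steps = rng.choice([-1.0, 1.0], size=(n_paths, n_steps))
S = np.cumsum(steps, axis=1)

# Test-set form of E{S_n|F_{n-1}} = S_{n-1}, with A = {S_{n-1} > 0} in F_{n-1}.
A = S[:, -2] > 0
diff = np.mean(S[:, -1] * A) - np.mean(S[:, -2] * A)
print(diff)   # near 0, up to Monte Carlo error

# Example 3: M_n = product of independent mean-one variables.
M = np.prod(rng.exponential(scale=1.0, size=(n_paths, n_steps)), axis=1)
print(M.mean())   # near E[M_n] = 1
```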
Proposition 5.0.10 Let Gt be a filtration with Gt ⊂ Ft. If (Xt) is an (Ft)-martingale which is Gt-adapted, then it is a Gt-martingale.
Proof Let A ∈ Gs ⊂ Fs. Since (Xt) is an Ft-martingale,

E[Xt 1_A] = E[Xs 1_A].

It follows that (Xt) is a Gt-martingale.
We have already seen that the set of martingales is a vector space. Is there a
Hilbert space structure associated to it? We will see later that there is indeed one on the
subspace of L^2-bounded martingales (on which stochastic integrals can be defined
through the Riesz representation theorem). For the moment we content ourselves with
the following closure property.
Proposition 5.0.11 Let {M^n(t) : t ≥ 0} be a sequence of Ft-martingales. If for
each t, lim_{n→∞} M^n(t) = M(t) a.s. and {M^n(t), n = 1, 2, . . . } is uniformly integrable, then (M(t)) is an Ft-martingale.
Proof Let s < t and A ∈ Fs. Then

E[M^n_s 1_A] = E[M^n_t 1_A].

Since {M^n_s 1_A} and {M^n_t 1_A} are uniformly integrable, we may let n → ∞,
exchanging the limit with the expectation, and conclude that

E[Ms 1_A] = E[Mt 1_A].
Example 5.0.12 Take Ω = [0, 1], let F1 be the Borel sets of [0, 1], and let P be
the Lebesgue measure. Define Ft to be the σ-algebra generated by the collection of
functions which are Borel measurable when restricted to [0, t] and constant on [t, 1].
Let f : [0, 1] → R be an integrable function. We may define Mt = E{f|Ft}:

Mt(x) = f(x) if x ≤ t,   Mt(x) = (1/(1−t)) ∫_t^1 f(r) dr if x > t.

Check that for s < t, E{Mt|Fs}(x) = f(x) if x ≤ s, while for x > s,

E{Mt|Fs}(x) = (1/(1−s)) [∫_s^t f(r) dr + ∫_t^1 Mt(r) dr]
            = (1/(1−s)) [∫_s^t f(r) dr + ∫_t^1 ((1/(1−t)) ∫_t^1 f(u) du) dr]
            = (1/(1−s)) ∫_s^1 f(r) dr
            = Ms(x).

Take f(x) = x^2 + 1 and compute Mt = E{f|Ft}.
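For the requested f(x) = x^2 + 1, the tail average works out to (1/(1−t)) ∫_t^1 (r^2+1) dr = (1 + t + t^2)/3 + 1. A small numerical check of this closed form (an added illustration; the evaluation points t = 0.4, x = 0.9 are arbitrary):

```python
import numpy as np

def M(t, x, f=lambda r: r ** 2 + 1):
    """M_t = E{f|F_t}(x) from Example 5.0.12: f(x) for x <= t,
    the average of f over [t, 1] for x > t."""
    if x <= t:
        return f(x)
    r = np.linspace(t, 1.0, 10001)
    fr = f(r)
    integral = 0.5 * np.sum(fr[:-1] + fr[1:]) * (r[1] - r[0])  # trapezoid rule
    return integral / (1.0 - t)

t = 0.4
closed_form = (1 + t + t ** 2) / 3 + 1   # (1/(1-t)) int_t^1 (r^2 + 1) dr
print(M(t, 0.9), closed_form)
```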
5.1
Lecture 11: Overview
Besides martingales there is the concept of “local martingales”. Let us assume all
stochastic processes concerned in this section are sample continuous. A sample
continuous stochastic process is a local martingale if there is a sequence of stopped
stochastic processes (cut-offs) Mtn := Mt∧Tn , which are martingales. Here Tn is a
sequence of “stopping times” with limn→∞ Tn = ∞ and so limn→∞ Mtn = Mt .
Definition 5.1.1 Let I = [0, ∞) or I = {0, 1, 2, . . . }. A measurable function
T : Ω → I ∪ {∞} is an (Ft, t ∈ I)-stopping time if {ω : T(ω) ≤ t} ∈ Ft
for all t ∈ I.
Definition 5.1.2 The stopped process X T is defined by XtT (ω) = XT (ω)∧t (ω).
Definition 5.1.3 An adapted stochastic process (Xt : t ≥ 0) is a local martingale
if there is an increasing sequence of stopping times Tn with lim_{n→∞} Tn = ∞
a.s., such that for each n, (X_t^{Tn} − X0, t ≥ 0) is a uniformly integrable
martingale. We say that Tn reduces X.
We will discuss local martingales further in a later section.
A one dimensional Brownian motion is a martingale with respect to its own
filtration. A sample continuous local martingale whose "quadratic variation process" satisfies ⟨M, M⟩∞ = ∞ is a time changed Brownian motion: Mt =
B_{⟨M,M⟩t}. This is the content of the Dubins-Schwarz theorem (p. 181, [23]), which we
do not discuss further. See Definition 3.8.9 for the definition of the quadratic variation
process. We will also be discussing this in detail later.
A stochastic integral with respect to a Brownian motion or with respect to a
local martingale is a local martingale, and under suitable integrability conditions a martingale. We shall see this later.
On the other hand, let B = (B^1, . . . , B^d) be an R^d valued Brownian motion. A Brownian local martingale, one that is an F^B_t local martingale, is of the
form

Mt = C + Σ_{i=1}^d ∫_0^t H^i_s dB^i_s.

This is the martingale representation theorem; see Thm (3.5), page
201, [23] for detail. The Clark-Ocone formula expresses H^i in terms of Malliavin
derivatives of M. Further expansion leads to the L^2 chaos decomposition of the space
of L^2 functions. The content of this paragraph is outside the remit of the
lectures.
Definition 5.1.4 An n dimensional stochastic process (X^1_t, . . . , X^n_t) is an Ft-local martingale if each component is an Ft-local martingale.
A remarkable application of martingales is Lévy's martingale characterisation
of Brownian motion, which we discuss later. Among other things, this theorem trivialises
the difficult task of proving that the components of a candidate stochastic
process are independent.
5.2
Lecture 12: Stopping Times
Consider the time at which an event arrives; this time is ∞ if the event never
arrives. Let I = [0, ∞) or I = {0, 1, 2, . . . }.
Definition 5.2.1 A random function T : Ω → I ∪ {∞} is a (Ft , t ∈ I) stopping
time, also called an optional time, if {ω : T (ω) ≤ t} ∈ Ft for all t ∈ I.
If I = {0, 1, 2, . . . }, then T is a stopping time if and only if {T(ω) = n} ∈ Fn for all n. A constant time
is a stopping time, and T(ω) ≡ ∞ is also a stopping time. Write S ∨ T = max(S, T)
and S ∧ T = min(S, T).
Proposition 5.2.2 (1) If S, T are stopping times, then max(T, S) and min(T, S)
are stopping times. In particular S ∧ t and S ∨ t are stopping times for any
t ∈ I.
(2) If Tn is an increasing sequence of stopping times then T := sup_n Tn is a
stopping time. If Sn is a decreasing sequence of stopping times then S :=
inf_n Sn is weakly optional, and is a stopping time if the filtration is right continuous.
(3) Similarly, if Tn are stopping times then lim sup_{n→∞} Tn and lim inf_{n→∞} Tn are stopping times under the same proviso on the filtration.
Proof These are easily seen from the following:

{max(S, T) ≤ t} = {S ≤ t} ∩ {T ≤ t} ∈ Ft,
{min(S, T) ≤ t} = {S ≤ t} ∪ {T ≤ t} ∈ Ft,
{sup_n Tn ≤ t} = ∩_n {Tn ≤ t} ∈ Ft,
{inf_n Sn < t} = ∪_n {Sn < t} ∈ Ft,
lim sup_{n→∞} Tn = inf_{n≥1} sup_{k≥n} Tk,   lim inf_{n→∞} Tn = sup_{n≥1} inf_{k≥n} Tk.

We use the convention inf(∅) = +∞.
Definition 5.2.3 We say that T : Ω → I ∪ {∞} is a weakly optional time if for all
t ∈ I, {T < t} ∈ Ft .
Recall Ft+ = ∩_{s>t} Fs.
Lemma 5.2.4
(1) An optional time is weakly optional.
(2) If T is Ft-weakly optional then T is Ft+-optional.
Proof (1) For any t > 0,

{T < t} = ∪_{n=1}^∞ {T ≤ t − 1/n} ∈ Ft.

(2) We show that if {T < t} ∈ Ft for all t, then T is an Ft+ stopping time:

{T ≤ t} = ∩_{n=1}^∞ {T < t + 1/n}.

Noting that ∩_{n=1}^N {T < t + 1/n} = {T < t + 1/N}, it follows that {T ≤ t} ∈ Ft+.
Let (Xt) be a stochastic process with values in a measurable space (S, B). Let
B ∈ B and let

TB(ω) = inf{t > 0 : Xt(ω) ∈ B}.

We say TB is the hitting time of the set B by the process (Xt). For a discrete time
process (Xn),

TB(ω) = inf{n : Xn(ω) ∈ B}.
Example 5.2.5 Suppose that (Xn , n = 0, 1, 2, . . . ) is Fn -adapted. Let B be a
measurable set. Then TB is an Fn stopping time:
{TB ≤ n} = ∪k≤n {ω : Xk (ω) ∈ B} ∈ Fn .
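In code the discrete hitting time is a first-passage scan. The sketch below is an added illustration (the walk, the level 3, and the seed are arbitrary); it also checks the identity {TB ≤ n} = ∪_{k≤n} {Xk ∈ B} on the simulated path.

```python
import numpy as np

def hitting_time(path, in_B):
    """T_B = inf{n >= 1 : X_n in B}; the event {T_B <= n} is
    union_{k<=n} {X_k in B}, so it depends only on X_1, ..., X_n."""
    for n, x in enumerate(path[1:], start=1):
        if in_B(x):
            return n
    return np.inf   # inf over the empty set is +infinity

rng = np.random.default_rng(5)
walk = np.cumsum(np.r_[0.0, rng.choice([-1.0, 1.0], size=200)])  # X_0 = 0
T = hitting_time(walk, lambda v: v >= 3)   # first time the walk reaches level 3
print(T)
# Sanity check: strictly before T the walk stays below 3.
if np.isfinite(T):
    n = int(T)
    assert walk[n] >= 3 and np.all(walk[1:n] < 3)
```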
For continuous time stochastic processes this is more complicated; in general
we require some kind of separability of the process. For example, if the process is
continuous, it is determined by its values at rational times.
Theorem 5.2.6 Let (Xt, t ≥ 0) be (Ft)-adapted with values in (S, B), where S is
a topological space.
1. Let (Xt) be right continuous and let U be an open set. Then TU is weakly optional.
2. Let (Xt) be left continuous and let U be an open set. Then TU is weakly
optional.
3. Let S be a metric space and let (Xt) be continuous. For B a closed set,
TB is weakly optional.
Proof (1) Let t > 0. If TU = s < t then Xs ∈ U. Since (Xt) is right
continuous and U is open, there is ε > 0 such that Xr ∈ U for r ∈ [s, s + ε], in
particular for some rational r < t. Hence

{TU < t} = ∪_{r∈Q∩(0,t)} {Xr ∈ U} ∈ Ft.

This proves that TU is weakly optional.
(2) If (Xt) is left continuous and s = TU > 0, there is 0 < ε < s/2 such that
Xr ∈ U for r ∈ [s − ε, s]. Hence

{TU < t} = ∪_{r∈Q∩(0,t)} {Xr ∈ U}.

If TU = 0 then {TU < t} = Ω. Hence TU is weakly optional.
(3) Since B is closed, for any t > 0,

{TB ≤ t} = ∪_{m=1}^∞ ∩_{n=1}^∞ ∪_{r∈Q∩[1/m,t]} {d(Xr, B) < 1/n}.

If ω belongs to the right hand side then, for some m, there is a sequence rn in [1/m, t] such that
d(X_{rn}(ω), B) ≤ 1/n. A subsequence r_{nk} converges to some s0 ∈ [1/m, t], and by
continuity X_{s0}(ω) ∈ B, so TB(ω) ≤ t. If on the other hand s0 = TB(ω) ≤ t, then
since B is closed and the path is continuous, X_{s0}(ω) ∈ B, and for any n there is a
rational rn ∈ (s0/2, s0] such that d(X_{rn}(ω), B) < 1/n.
Finally, {TB ≤ 0} = ∩_n {TB < 1/n} ∈ F0+.
Theorem 5.2.7 For the Brownian filtration completed with null sets, F0+ = F0.
See page 38 of Mörters-Peres [19] for a nice proof.
Theorem 5.2.8 Let T be a stopping time which is finite almost surely. Then (B_{T+s} − B_T, s ≥ 0) is a Brownian
motion.
This can be proved by first assuming that T takes only a countable number of values.
Remark 5.2.9 From now on we assume the standard assumptions on the filtration:
it is right continuous and complete. We assume that our stochastic processes are
càdlàg: almost every path has left limits and is right continuous. In fact,
if Ft satisfies the usual conditions and Xt is a super-martingale such that t ↦ E[Xt]
is right continuous, then it has a càdlàg modification which is an Ft super-martingale.
This follows from the martingale convergence theorem: for each t consider X_{qi}
with qi ∈ Q decreasing (increasing) to t; we see that Xt has a finite left limit Xt−
and a finite right limit Xt+. It can also be seen that the process (Xt+, t ≥ 0) is an
integrable Ft+-super-martingale.
5.2.1
Extra Reading
Note that ω ∈ {TB ≤ 0} means that within any small neighbourhood of 0 there are
times at which Xt(ω) is arbitrarily close to B. In particular, if sn ↓ 0 and X_{sn} ∈ B,
then TB = 0. This we cannot tell at time 0 without further information.
Let DB be the first entrance time (passage time),

DB = inf{t ≥ 0 : Xt ∈ B}.

The difference between DB and TB is what happens at time 0. Suppose that B is
closed. If X0 ∈ B then DB = 0. If X0 ∉ B and (Xt) is right continuous then
DB ≠ 0; DB in this case is an Ft-stopping time.
In the case where X0 ∈ B and B is an open set, DB = 0, and TB = 0 as well if Xt is right
continuous. If instead the path leaves B the next instant and stays away from it,
then TB = ∞.
Example 5.2.10 Let Xt : Ω → R be sample continuous and a ∈ R. Let

Da(ω) = inf{t ≥ 0 : Xt(ω) ≥ a}.

Let Ft = σ{Xs : s ≤ t}. Then Da is a stopping time. By the continuity of the
path,

{Da ≤ t} = {sup_{0≤s≤t} Xs ≥ a} = {sup_{s∈[0,t]∩Q} Xs ≥ a} = ∩_{n=1}^∞ ∪_{s∈[0,t]∩Q} {Xs ≥ a − 1/n} ∈ Ft.
Example 5.2.11 Let a ∈ R and let Ta = inf{t > 0 : Xt ≥ a}. Let (Xt) be
continuous. For t > 0,

{Ta ≤ t} = ∪_{m=1}^∞ ∩_{n=1}^∞ ∪_{r∈[1/m,t]∩Q} {ω : Xr(ω) ≥ a − 1/n} ∈ Ft.

However {Ta = 0} = ∩_n {Ta ≤ 1/n} is in general only F0+-measurable.
5.2.2
Lecture 13: Stopped Processes
Consider a discrete time process (Xn) and a stopping time T. Then

{X_T ∈ A} = ∪_{k=1}^∞ ({Xk ∈ A} ∩ {T = k}).
Definition 5.2.12 Let T be a stopping time. Define
FT = {A ∈ F∞ : A ∩ {T ≤ t} ∈ Ft , ∀t ≥ 0}.
This is the information available when an event arrives. It is clear that if T ≡ t
then FT = Ft .
Proposition 5.2.13 Let S, T be stopping times.
(1) If S ≤ T, then FS ⊂ FT.
(2) If S ≤ T, then for any A ∈ FS, S1_A + T1_{A^c} is a stopping time.
(3) FS ∩ {S ≤ T} ⊂ F_{S∧T} and F_{S∧T} = FS ∩ FT.
Proof (1) If A ∈ FS, then since {T ≤ t} ⊂ {S ≤ t},

A ∩ {T ≤ t} = (A ∩ {S ≤ t}) ∩ {T ≤ t} ∈ Ft,

and hence A ∈ FT.
(2) Since A ∈ FS and A^c ∈ FS ⊂ FT,

{S1_A + T1_{A^c} ≤ t} = ({S ≤ t} ∩ A) ∪ ({T ≤ t} ∩ A^c) ∈ Ft.

(3) Take A ∈ FS. Then

A ∩ {S ≤ T} ∩ {S ∧ T ≤ t} = (A ∩ {S ≤ t}) ∩ {S ∧ t ≤ T ∧ t} ∈ Ft,

since S ∧ t and T ∧ t are Ft-measurable. Hence A ∩ {S ≤ T} ∈ F_{S∧T}. Furthermore,
if A ∈ FS ∩ FT then

A = (A ∩ {S ≤ T}) ∪ (A ∩ {T ≤ S}) ∈ F_{S∧T},

so FS ∩ FT ⊂ F_{S∧T}. The reverse inclusion follows from (1), since S ∧ T ≤ S and
S ∧ T ≤ T.
For a nice account of stopping times see Kallenberg [14].
Definition 5.2.14 A stochastic process X : [0, ∞) × Ω → E is progressively measurable if for each t, the map (s, ω) ↦ Xs(ω) from ([0, t] × Ω, B([0, t]) ⊗ Ft) to
(E, B(E)) is measurable. We often assume in addition that X : R+ × Ω → E is
measurable.
Theorem 5.2.15 If T is a stopping time and (Xt) is progressively measurable, then
X_T is FT-measurable on {T < ∞}.
For a proof see Revuz-Yor [23].
Let 0 = t0 < t1 < · · · < tn < t, and let H−1 ∈ F0 and Hi ∈ Fti be measurable
functions. Let

H(t) = H−1 1_{{0}}(t) + Σ_{i=0}^{n−1} Hi 1_{(ti, ti+1]}(t).

This is an elementary process; it is easy to check that it is progressively measurable. A
left continuous (or right continuous) adapted process is progressively measurable:
given a partition of R+, we may approximate such a process by processes that are
constant random functions on each time interval.
If T is a stopping time and Xt : Ω → E is progressively measurable then
1{T <∞} XT is FT measurable.
Proposition 5.2.16 If T is weakly optional, then there are stopping times Tn, each
taking only countably many values, such that Tn decreases to T.
Proof Define Tn = 2^{−n}[2^n T + 1], that is,

Tn(ω) = (j+1)/2^n   if T(ω) ∈ [j/2^n, (j+1)/2^n),   j = 0, 1, 2, . . . .

Then Tn decreases to T, and for j/2^n ≤ t < (j+1)/2^n,

{Tn ≤ t} = {Tn ≤ j/2^n} = {T < j/2^n} ∈ Ft,

so the Tn are stopping times.
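A numerical sketch of the dyadic discretisation (added here; the range of the sampled times is arbitrary) checks the two properties used in the proof: Tn lies strictly above T within one dyadic step, and the sequence decreases.

```python
import numpy as np

def dyadic_approx(T, n):
    """T_n = 2^{-n} * floor(2^n * T + 1): equals (j+1)/2^n when
    T lies in [j/2^n, (j+1)/2^n)."""
    return np.floor(2.0 ** n * T + 1.0) / 2.0 ** n

rng = np.random.default_rng(6)
T = rng.uniform(0.0, 5.0, size=1000)   # sampled values of a random time

for n in range(1, 12):
    Tn, Tn1 = dyadic_approx(T, n), dyadic_approx(T, n + 1)
    assert np.all(Tn > T) and np.all(Tn1 <= Tn)   # strictly above T, decreasing in n
    assert np.all(Tn - T <= 2.0 ** (-n))          # within one dyadic step of T
print("T_n decreases to T at rate 2^-n")
```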
5.3
Lecture 14: The Martingale Convergence Theorem
Let Hn be a process such that Hn ∈ Fn−1 (previsible). It is the stake one puts
down at time n − 1, betting that Xn, an adapted process, goes up; it is determined
by events up to time n − 1. The winnings at time n are Hn(Xn − Xn−1). Define a new
process H · X (the total winnings up to time n), called the martingale transform of
X by H:

(H · X)0 = 0,
(H · X)n = H1(X1 − X0) + · · · + Hn(Xn − Xn−1),   n ≥ 1.

This can be considered as a discrete 'stochastic integral', symbolically denoted by
∫_0^n Hs dXs.
Lemma 5.3.1 Let Hn be adapted to Fn−1 with |Hn (ω)| ≤ K for some K > 0.
(1) If Hn ≥ 0 and (Xn ) is a super martingale then ((H · X)n ) is a super
martingale.
(2) If Xn is an Fn martingale then (H · X)n is a martingale.
Proof (1) Write Yn = (H ·X)n . The last winning is Yn −Yn−1 = Hn (Xn −Xn−1 ).
Since Hn is bounded, (H · X)n ∈ L1 and
E{Yn − Yn−1 |Fn−1 } = Hn E{Xn − Xn−1 |Fn−1 }.
Since Hn ≥ 0 and Xn is a super-martingale, E{Yn − Yn−1 |Fn−1 } ≤ 0 and {Yn } is
a super-martingale.
(2) In the case where Xn is a martingale,
E{Yn − Yn−1 |Fn−1 } = Hn E{Xn − Xn−1 |Fn−1 } = 0.
Let a < b. By ‘an up-crossing’ of (Xn ) we mean a journey starting from
below a and ending above b. Say Xn < a and m = inf{m > n : Xm > b}. Then
connecting the points Xn , Xn+1 , . . . , Xm gives an ‘up-crossing’ in the graph.
Lemma 5.3.2 (page 107, Williams [29]) Let X be a super-martingale and let UN ([a, b])(ω)
be the number of up-crossings of [a, b] made by {Xn } by time N . Then
    (b − a) E UN ([a, b]) ≤ E(XN − a)− .
Proof Let H be the betting strategy that plays 1 unit once Xn < a, keeps playing
until X gets above b, and then stops playing. Then, pathwise,
    (H · X)N ≥ (b − a) UN ([a, b]) − [XN (ω) − a]− .
Taking expectations and using the fact that (H · X) is a super-martingale,
    0 ≥ E(H · X)N ≥ (b − a) E UN ([a, b]) − E[XN − a]− .
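The betting strategy in this proof can be simulated. A minimal Python sketch (the helper upcrossings_and_winnings is our own illustration) counts completed up-crossings of a simple random walk and checks the pathwise inequality (H · X)_N ≥ (b − a)U_N − [X_N − a]− :

```python
import random

def upcrossings_and_winnings(X, a, b):
    # bet 1 unit from the first time X < a until X climbs above b, then pause
    in_play, U, win = False, 0, 0.0
    for prev, cur in zip(X, X[1:]):
        if in_play:
            win += cur - prev          # winnings H_n (X_n - X_{n-1})
            if cur > b:
                in_play, U = False, U + 1  # one up-crossing completed
        elif prev < a:
            in_play = True             # enter the game (decision uses F_{n-1})
            win += cur - prev
    return U, win

random.seed(0)
X = [0.0]
for _ in range(500):
    X.append(X[-1] + random.choice([-1, 1]))
a, b = -2, 2
U, win = upcrossings_and_winnings(X, a, b)
# pathwise: (H.X)_N >= (b - a) U_N - [X_N - a]^-
assert win >= (b - a) * U - max(a - X[-1], 0)
```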
If (an ) is a sequence that crosses from below a to above b infinitely often for some
a < b, then an cannot have a limit. Conversely, if an does not have a limit (in
[−∞, ∞]), there will be rational numbers a < b such that an crosses [a, b] infinitely
often. This is the philosophy behind the following martingale convergence theorem.
Theorem 5.3.3 Let Xn be a discrete time super-martingale with supn E(Xn )− <
∞. Then limN →∞ XN exists almost surely. If in addition supn E|Xn | < ∞ then X∞ is in L1 .
Proof Let A be the set of ω such that limN →∞ XN does not exist. If ω ∈ A there
are two rational numbers a < b such that
    lim inf_{N→∞} XN (ω) < a < b < lim sup_{N→∞} XN (ω).
It is clear that
    A = ∪_{a,b∈Q, a<b} {ω : lim inf_{N→∞} XN (ω) < a < b < lim sup_{N→∞} XN (ω)} = ∪ Λ_{a,b} .
We will show that P (Λa,b ) = 0. If ω ∈ Λa,b , there must be infinitely many
crossings from below a to above b: limN →∞ UN ([a, b]) = ∞. Hence
    Λa,b ⊂ {ω : lim_{N→∞} UN ([a, b]) = ∞}.
On the other hand, by the up-crossing Lemma,
    (b − a) lim_{N→∞} E UN ([a, b]) ≤ sup_N E(XN − a)− ≤ sup_N (E(XN )− + |a|) < ∞.
By the monotone convergence theorem,
    E lim_{N→∞} UN ([a, b]) = lim_{N→∞} E UN ([a, b]) < ∞.
In particular limN →∞ UN ([a, b]) < ∞ almost surely and P (Λa,b ) = 0.
To see that limN →∞ XN is finite, apply Fatou’s lemma:
    E| lim_{N→∞} XN | = E lim_{N→∞} |XN | ≤ lim inf_{N→∞} E|XN | ≤ sup_N E|XN | < ∞.
Note that E|Xn | = E(Xn+ ) + E(Xn− ), so (Xn ) being L1 bounded is a stronger
condition than supn E(Xn )− < ∞.
Let Ft be a right continuous complete filtration and Xt an L1 bounded super-martingale. Assume that t 7→ E[Xt ] is right continuous. Then there is a càdlàg
modification of Xt which is an Ft super-martingale. See Revuz-Yor [23] (Chapter
2, Section 2) for details.
By the above theorem, if (Xt ) is a continuous time super-martingale satisfying
these hypotheses, then it converges along every increasing sequence {tk }, since {Xtk }
is a discrete time super-martingale. Now let s1 > s2 > s3 > . . . be a decreasing
sequence of numbers, and suppose (Xt ) is a martingale, so that
    E{Xsk |Fsk+1 } = Xsk+1 .
Let Yn = Xsn and Gn = Fsn . Then Gn is a decreasing sequence of σ-algebras. By
an argument similar to the one before, Y−∞ := limn→∞ Yn exists almost surely. This
motivates the study of backward martingales, as below.
Consider a decreasing family of σ-algebras · · · ⊂ F−m ⊂ · · · ⊂ F−2 ⊂ F−1 . Let
Z−1 ∈ L1 and
    Z−n = E{Z−1 |F−n }.
Let F−∞ = ∩n F−n .
Remark 5.3.4 Since Z−1 ∈ L1 , (Z−n ) is L1 bounded and uniformly integrable.
It follows that for any A ∈ F−∞ ⊂ F−m ,
    E Z−1 1A = E Z−m 1A .
Take m → ∞ to see that E(Z−1 1A ) = E(Z−∞ 1A ) and Z−∞ = E{Z−1 |F−∞ }.
Let f : R+ → R be a function. Then limt→T f (t) exists if and only if for any
sequence tn → T , limn→∞ f (tn ) exists, in which case all these limits coincide. This
leads to the following theorem. If the function is furthermore right continuous, it is
determined by its values on the rational numbers.
Theorem 5.3.5 If (Xt , t ∈ [0, T )), where T ∈ R ∪ {∞}, is a right continuous
super-martingale with supt<T E[Xt− ] < ∞, then limt→T Xt exists almost surely.
Let {tn } be ordered rational numbers converging to T . The previous lemma applies
to show that limt↑T, t∈Q Xt exists almost surely. The above theorem also holds if (Xt )
is a sub-martingale with supt E[Xt+ ] < ∞.
Let I = [0, T ) where T is finite or ∞. If f ∈ L1 we may define a martingale
ft := E{f |Ft }. The following theorem says that all uniformly integrable L1 martingales
are given this way. Let f ∈ L1 with Ef = 1; we may define a probability measure Q
on F such that dQ/dP = f . Then ft is the density of the two measures restricted to
Ft . Given a martingale (ft , t < 1), can we define a measure Q on F1 such that the
density of Q with respect to P , restricted to Ft , is ft ? The following theorem says
yes if (ft , t < 1) is uniformly integrable. This theorem is taken from Revuz-Yor [23]
(Section 3, Chapter 2, Theorem 3.1).
Theorem 5.3.6 (The End Point of a martingale) If (Xt , t ∈ [0, T )) is a right
continuous martingale, the following are equivalent. Below we use XT for the
end point.
(1) Xt converges in L1 , as t approaches T , to a random variable XT .
(2) There exists an L1 random variable XT s.t. Xt = E{XT |Ft }.
(3) (Xt , t < T ) is uniformly integrable.
Proof In case (1), the fact that Xt converges in L1 implies that {Xt } is uniformly integrable,
c.f. Proposition 2.5.8. Given uniform integrability, we know that (Xt ) is L1
bounded, and by Theorem 5.3.5 there is an L1 random variable XT such that
limt→T Xt = XT almost surely. Note that limt→T E|Xt − XT | = 0 if and only if
the convergence is in probability and {Xt } is uniformly integrable. Hence (3) is
equivalent to (1).
Assume (2). By Lemma 4.6.5, (Xt , t ≥ 0) is uniformly integrable, hence (3)
holds.
Let us assume that (1) and (3) hold. We prove (2). By the martingale property,
for any u > t, Xt = E{Xu |Ft }. By the uniform integrability,
    Xt = lim_{u→T} E{Xu |Ft } = E{XT |Ft }.
5.4 Lecture 15: The Optional Stopping Theorem
We say that S ≤ T if S(ω) ≤ T (ω) for a.s. ω. We say that T is bounded if there
is C > 0 such that T (ω) ≤ C for all ω.
Theorem 5.4.1 (Doob’s Optional Stopping Theorem) Let S, T : Ω → {0, 1, 2 . . . }
be bounded stopping times with S ≤ T .
(1) If (Xn ) is a super-martingale, then EXT ≤ EXS .
(2) If (Xn ) is a sub-martingale, then EXS ≤ EXT .
(3) If Xn is a martingale EXS = EXT . For any stopping time T , the stopped
process X T is a martingale.
Proof We only prove part (3). Recall the martingale transform:
(H · X)0 = 0
(H · X)n = H1 (X1 − X0 ) + · · · + Hn (Xn − Xn−1 ),
n≥1
Define H_n^T = 1_{n≤T} (the last bet is on the T -th game). Then
    (H^T · X)_n = Σ_{k=1}^n 1_{k≤T} (X_k − X_{k−1}) = X_{n∧T} − X_0 .
By lemma 5.3.1, (H T · X) is a martingale and so is X T .
If T ≤ N , take n = N ; then E(H^T · X)_N = E(XT − X0 ) = 0, hence
E(XT ) = E(X0 ).
If (Xn ) is uniformly integrable, then X∞ = limn→∞ Xn exists. We know
that (H^T · X)_n = X_{n∧T} − X0 .
For the super- and sub-martingale parts, take H_n^{S,T} = 1_{S<n≤T} .
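Doob's theorem for bounded stopping times can be checked by simulation. The sketch below (a Monte-Carlo illustration of ours, not part of the notes) stops a simple random walk at the first hit of +1, capped at 20 so the stopping time is bounded, and checks E X_T ≈ E X_0 = 0:

```python
import random

random.seed(1)

def mean_stopped_value(cap=20, trials=50000):
    total = 0.0
    for _ in range(trials):
        x, n = 0, 0
        # T = min(first hit of +1, cap): a bounded stopping time
        while n < cap and x != 1:
            x += random.choice([-1, 1])
            n += 1
        total += x
    return total / trials

est = mean_stopped_value()
# optional stopping: E X_T = E X_0 = 0, up to Monte-Carlo error
assert abs(est) < 0.1
```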
We assume that (Ft ) satisfies the standard assumptions. Recall that if (Xt :
t ∈ I) is progressively measurable then XT ∈ FT for any stopping time T .
Proposition 5.4.2 Let (Xt : t ∈ I) be an integrable progressively measurable
process.
(1) Suppose that for all bounded stopping times S ≤ T , EXT = EXS , then
E{XT |FS } = XS .
(2) Suppose that for all bounded stopping times S ≤ T , EXT ≤ EXS then
E{XT |FS } ≤ XS .
Proof Let A ∈ FS and define τ = S1A + T 1Ac ≤ T , a bounded stopping time. Then
    EXT = E[XT 1A ] + E[XT 1Ac ],
    EXτ = E[XS 1A ] + E[XT 1Ac ].
(1) For the first statement, EXτ = EXT implies that E[XT 1A ] = E[XS 1A ]
for all A ∈ FS and the conclusion holds.
(2) For the second statement, EXτ ≥ EXT by the assumption, giving
    E[XT 1A ] ≤ E[XS 1A ].
Since E[XT 1A ] = E[E{XT |FS }1A ], we have E[(E{XT |FS } − XS ) 1A ] ≤ 0
for any A ∈ FS . Hence E{XT |FS } ≤ XS .
Theorem 5.4.3 (Optional Stopping Theorem)
1. Let Xt , t ≥ 0 be a right continuous martingale. Then for all bounded stopping times S ≤ T ,
E{XT |FS } = XS , a.s.
2. As above, if {Xt , t ≥ 0} is furthermore uniformly integrable, we define
XT = X∞ on {T = ∞}. Then
    XS = E{X∞ |FS },   E{XT |FS } = XS ,
for all stopping times S ≤ T (which are not necessarily bounded).
3. Let (Xt , t ≥ 0) be a right continuous super-martingale. Let S and T be two
bounded stopping times. Then E{XT |FS } ≤ XS almost surely.
If furthermore {Xt , t ≥ 0} is uniformly integrable, E{XT |FS } ≤ XS for
all stopping times.
Proof 1) Let K ∈ R be such that S(ω) ≤ T (ω) ≤ K. Let Sn = 2^{−n}[2^n S + 1];
then XSn = E{XK |FSn } by Doob’s optional stopping theorem (together with
Proposition 5.4.2), and EXSn = EXK . By Lemma 4.6.5, (XSn , n = 1, 2, . . . )
is a uniformly integrable family. Hence EXS = limn→∞ EXSn = EXK . That
E{XT |FS } = XS then follows from Proposition 5.4.2.
If {Xt } is a uniformly integrable right continuous martingale, by Theorem
5.3.6 there is an L1 random variable X∞ with
    Xt = E{X∞ |Ft }.
Let Sn = 2^{−n}[2^n S + 1] and Ak = {Sn = k/2^n} ∈ FSn . Take t = k/2^n. Since Ak ∈ Ft ,
E(X_{k/2^n} 1Ak ) = E(X∞ 1Ak ). Thus E(XSn 1Ak ) = E(X_{k/2^n} 1Ak ) = E(X∞ 1Ak ).
Summing over k we see that EXSn = EX∞ and
    XSn = E{X∞ |FSn }.
The family {XSn } is uniformly integrable. For A ∈ FS ⊂ FSn ,
    E XSn 1A = E X∞ 1A ;
taking n → ∞ we obtain E(XS 1A ) = E(X∞ 1A ) and
    XS = E{X∞ |FS }.
Hence E{XT |FS } = E{E{X∞ |FT }|FS } = XS .
Finally, the proof for super-martingales. Let us only prove the case when (Xt ) is a
uniformly integrable super-martingale. For s ≤ t, Xs ≥ E{Xt |Fs }. Since
limt→∞ E{Xt |Fs } = E{X∞ |Fs }, we get Xs ≥ E{X∞ |Fs }. The rest of the proof is as
above.
5.5 Martingale Inequalities (I)
The following martingale inequalities are inspired by the following, where Bt is a
Brownian motion:
    P(sup_{s≤t} Bs ≥ a) = 2P(Bt ≥ a) = P(|Bt | ≥ a) ≤ (1/a^p) E|Bt |^p .
Proposition 5.5.1 Let (Xt , t ∈ I) be a right continuous martingale or a positive
sub-martingale, where I is an interval.
1. Maximal Inequality. For p ≥ 1 and λ > 0,
    P(sup_{t∈I} |Xt | ≥ λ) ≤ (1/λ^p) sup_{t∈I} E|Xt |^p .
2. Lp inequality. Let p > 1. Then
    E sup_{t∈I} |Xt |^p ≤ (p/(p − 1))^p sup_{t∈I} E|Xt |^p .
Remark 5.5.2
• If (Xt ) is a martingale in a Banach space, then kXt k is a positive
sub-martingale; this follows from the fact that the norm is a convex function.
• Let s < t. If (Xt ) is a martingale,
    E|Xs |^p = E(|E{Xt |Fs }|^p) ≤ E|Xt |^p .
In particular E|Xs |^p increases with time and sup_{s≤t} E|Xs |^p ≤ E|Xt |^p .
• If (Xt ) is a sub-martingale, then Xs ≤ E{Xt |Fs }. If it is positive, |Xs |^p ≤
|E{Xt |Fs }|^p and the discussion above remains valid.
• By the Markov inequality, for any p > 0,
    P(sup_{t∈I} |Xt | ≥ λ) ≤ (1/λ^p) E sup_{t∈I} |Xt |^p .
This inequality holds for any process. For a martingale it is weaker than
the maximal inequality.
• If Xt is a process satisfying the conditions in the theorem, then from the Markov
inequality and the Lp inequality we ‘almost’ recover the maximal inequality
for p > 1:
    P(sup_{t∈I} |Xt | ≥ λ) ≤ (1/λ^p) E sup_{t∈I} |Xt |^p ≤ (1/λ^p) (p/(p − 1))^p sup_{t∈I} E|Xt |^p .
See Revuz-Yor [23] for a proof. The standard convention for the maximum is:
X ∗ = supt∈I |Xt |.
Corollary 5.5.3 Let Xt be a right continuous martingale such that sup_{t∈[0,∞)} E|Xt |^p <
∞ for some p > 1. Then Xt converges to X∞ in Lp as t → ∞.
Proof By the martingale convergence theorem, X∞ = limt→∞ Xt exists and
is finite a.s. Note that |Xt |^p ≤ sup_t |Xt |^p, and the latter belongs to L1 by the Lp
inequality. Thus (|Xt |^p, t ≥ 0) is uniformly integrable and E|X∞ |^p ≤ lim inf_{t→∞} E(|Xt |^p) < ∞. By the
dominated convergence theorem E|Xt − X∞ |^p → 0.
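Both of Doob's inequalities can be illustrated on a simple random walk (a martingale), taking p = 2. The following Monte-Carlo sketch is our own illustration, not part of the notes:

```python
import random

random.seed(2)
n_steps, trials, lam = 100, 5000, 20.0
exceed, sup2, end2 = 0, 0.0, 0.0
for _ in range(trials):
    x, m = 0, 0
    for _ in range(n_steps):
        x += random.choice([-1, 1])
        m = max(m, abs(x))          # running maximum of |X|
    exceed += m >= lam
    sup2 += m * m
    end2 += x * x                   # E X_N^2 = sup_n E X_n^2 for this martingale
p_hat = exceed / trials
# maximal inequality (p = 2): P(sup |X| >= lam) <= sup E X_n^2 / lam^2
assert p_hat <= (end2 / trials) / lam ** 2
# Doob L2 inequality: E sup |X|^2 <= 4 sup E X_n^2
assert sup2 / trials <= 4 * end2 / trials
```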
5.6 Lecture 16: Local Martingales
Definition 5.6.1 An adapted stochastic process (Xt : t ≥ 0) is a local martingale
if there is an increasing sequence of stopping times Tn with limn→∞ Tn = ∞ a.s.
such that (X_t^{Tn} − X0 , t ≥ 0) is a uniformly integrable martingale. We say
that Tn reduces X.
A sample continuous martingale is a local martingale (take Tn = n). A local
martingale (Mt ) is a martingale if {MT : T a bounded stopping time} is uniformly
integrable. This is the case if {|Mt |, t ≥ 0} is bounded by an L1 random variable Z.
Remark 5.6.2 If Xt is a martingale then EXt = EX0 . If Xt is a local martingale
this no longer holds. Furthermore, given any function m(t) of bounded variation
there is a local martingale such that m(t) is its expectation process. A local martingale
which is not a martingale is called a strictly local martingale; otherwise it
is a true martingale. See Elworthy-Li-Yor [6] for discussions related to this.
Let T > 0.
Theorem 5.6.3 Let Mt be a continuous local martingale. If Mt has finite total
variation on [0, T ] then Mt = M0 for all t ≤ T .
Proof This proof is the same as that for Brownian motions. We did not cover it in
the lectures.
We may assume that M0 = 0. Let t ≤ T . First let
    M_{TV}(t, ω) = sup_∆ Σ_{j=0}^{N−1} |M_{t_{j+1}}(ω) − M_{t_j}(ω)|,
where ∆ ranges through all partitions 0 = t0 < t1 < · · · < tN = t of [0, t]. It is
increasing and continuous in t. Let
    Tn = inf{t : M_{TV}(t) ≥ n}.
Fix n and write Xt = M_t^{Tn}. Then Xt is bounded by n and is a martingale. For a
martingale, EX_t^2 = E Σ_{i=0}^{N−1} |X_{t_{i+1}} − X_{t_i}|^2. Indeed,
    E Σ_{i=0}^{N−1} |X_{t_{i+1}} − X_{t_i}|^2
    = Σ_{i=0}^{N−1} E E{ |X_{t_{i+1}} − X_{t_i}|^2 | F_{t_i} }
    = Σ_{i=0}^{N−1} E E{ X_{t_{i+1}}^2 − 2 X_{t_{i+1}} X_{t_i} + X_{t_i}^2 | F_{t_i} }
    = Σ_{i=0}^{N−1} ( EX_{t_{i+1}}^2 − EX_{t_i}^2 ) = EX_t^2 .
Hence
    EX_t^2 = E Σ_{i=0}^{N−1} |X_{t_{i+1}} − X_{t_i}|^2
    ≤ E ( max_i |X_{t_{i+1}} − X_{t_i}| Σ_{i=0}^{N−1} |X_{t_{i+1}} − X_{t_i}| )
    ≤ E ( M_{TV}(t) max_i |X_{t_{i+1}} − X_{t_i}| )
    ≤ n E max_i |X_{t_{i+1}} − X_{t_i}| .
Since Xt is uniformly continuous on [0, t] and bounded, by the dominated convergence
theorem E max_i |X_{t_{i+1}} − X_{t_i}| → 0 as the mesh of the partition tends to 0.
Hence E[X_t^2] = 0 and M_t^{Tn} = 0 almost surely. This means that Mt = 0 on
{t < Tn }. Taking n → ∞, since Tn → ∞, the conclusion holds.
Chapter 6
The Quadratic Variation Process
Definition 6.0.4 A stochastic process of finite variation is an adapted stochastic
process with sample paths of finite total variation on any finite time interval.
Definition 6.0.5 A semi-martingale is an adapted stochastic process such that
Xt = X0 + Mt + At , where Mt is a local martingale, At is of finite total variation,
and M0 = A0 = 0.
A continuous semi-martingale Xt is of the form Xt = X0 + Mt + At where
Mt and At are continuous.
Definition 6.0.6 Let H02 be the space of L2 bounded continuous martingales that
vanish at 0. Let M ∈ H02 . Define kM k = (E(M∞ )^2)^{1/2} .
Proposition 6.0.7 The vector space H02 , with the inner product arising from the
norm k − k, is a Hilbert space.
Proof Let (M^n_t ) be a Cauchy sequence in H02 . Then (M^n_∞ ) is a Cauchy sequence
in L2 . Let X = limn→∞ M^n_∞ . Then X ∈ L2 . Define Mt = E{X|Ft }. Then
(Mt ) is a martingale. By Jensen’s inequality,
    EM_t^2 ≤ E E{M_∞^2 |Ft } = EM_∞^2 .
Since t 7→ EMt is continuous, we may take a right continuous version of (Mt ). Then
limt→∞ Mt = X, by Theorem 5.3.6. By Doob’s L2 inequality,
    E sup_t |M^n_t − Mt |^2 ≤ 4 E|M^n_∞ − X|^2 → 0.
There is a subsequence of M^n_t that converges to Mt a.s. uniformly in t. Hence
(Mt ) is a continuous martingale and M0 = 0. Since {M^n_∞ , n = 1, 2, . . .} is
uniformly integrable, taking the limit n → ∞ in M^n_t = E{M^n_∞ |Ft } we see that
Mt = E{M∞ |Ft } and M∞ = X a.s.
6.1 Lecture 18-20: The Basic Theorem
Let Mn be a martingale. Then Mn , equaling M0 + Σ_k (Mk − Mk−1 ), is a sum
of orthogonal increments. Furthermore E(Mn )^2 = EM_0^2 + E Σ_{k=1}^n (Mk − Mk−1 )^2 .
It follows that (Mn ) is bounded in L2 if and only if Σ_{k=1}^∞ E(Mk − Mk−1 )^2 < ∞.
Let (Mn ) be a martingale. Define the stochastic integral M · M by
    (M · M )_n = Σ_{k=0}^{n−1} Mk (Mk+1 − Mk ).
It is a local martingale and 2(M · M )_n = (Mn )^2 − (M0 )^2 − Σ_{k=0}^{n−1} (Mk+1 − Mk )^2 .
Theorem 6.1.1 (1) For any continuous local martingales M and N , there exists
a unique continuous process hM, N it of finite variation vanishing at 0 such
that Mt Nt −hM, N it is a local martingale. This process is called the bracket
process or the quadratic variation of M and N .
(2) The process hM, N i has the following properties:
(a) It is symmetric and bilinear, and
    hM, N i = (1/4) [hM + N, M + N i − hM − N, M − N i].   (6.1)
(b) hM − M0 , N − N0 it = hM, N it .
(3) hM it ≡ hM, M it is increasing.
(4) If (Mt ) is bounded, Mt2 − hM it is a martingale.
Proof We first establish the uniqueness. Let At and A′t be two processes of finite
variation such that Mt Nt − At and Mt Nt − A′t are local martingales. Then
At − A′t is a continuous local martingale with finite variation. By Theorem
5.6.3, At − A′t = 0 a.s.
The uniqueness is the key to the proof of the properties in (2). (2a): Mt Nt −
hM, N it is a local martingale and so is Nt Mt − hN, M it ; the symmetry follows.
Similarly, if a, b ∈ R and M ′ is another local martingale, haM + bM ′ , N i =
ahM, N i + bhM ′ , N i.
Next note that (1/4)(M + N )^2 − (1/4)(M − N )^2 = M N , and
    M N − (1/4) (hM + N, M + N i − hM − N, M − N i)
is a local martingale. By the uniqueness of the bracket process,
    hM, N i = (1/4) (hM + N, M + N i − hM − N, M − N i) .
For part (2b), note that for any real number a, (M − a)N − M N = −aN is a
local martingale; hence hM − a, N i = hM, N i. Since M0 N is a local martingale,
hM0 , N i = 0 and part (2b) follows.
The proof of the existence of the quadratic variation process follows from two
intuitive ideas:
• Itô’s formula:
    M_t^2 = M_0^2 + 2 ∫_0^t Ms dMs + hM, M it .
• Let ∆n be a sequence of partitions with |∆n | → 0; then, with convergence in
probability,
    hM it = lim_{n→∞} Σ_i (M_{t^n_{i+1}} − M_{t^n_i})^2 .
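The second idea is easy to see numerically: for Brownian motion, hB, Bi_t = t, and the sums of squared increments converge to t as the mesh shrinks. A small simulation (our own illustration, not from the notes):

```python
import random

random.seed(3)

def bm_path(n, t=1.0):
    # Brownian path on [0, t] sampled at n equal steps
    dt = t / n
    path = [0.0]
    for _ in range(n):
        path.append(path[-1] + random.gauss(0.0, dt ** 0.5))
    return path

t = 1.0
for n in (100, 10000):
    B = bm_path(n, t)
    qv = sum((b - a) ** 2 for a, b in zip(B, B[1:]))
    # sum of squared increments is close to <B, B>_t = t (5 standard deviations)
    assert abs(qv - t) < 5 * (2 * t ** 2 / n) ** 0.5
```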
We first assume that Mt is bounded. By part (2b) we may assume that M0 = 0.
Let us first approximate Mt by a family of stochastic processes that are piecewise
constant in time. Given n we construct a process H^n such that |H^n_t − Mt | ≤ 2^{−n} for
all time t. Such an approximation will depend on the path. Let us construct a family
of stopping times:
    τ^n_0 = 0,
    τ^n_1 = inf{t > 0 : |Mt − M0 | ≥ 2^{−n}},
    τ^n_{k+1} = inf{t > τ^n_k : |Mt − M_{τ^n_k}| ≥ 2^{−n}}.
Then limk→∞ τ^n_k = ∞. Define
    H^n_t = Σ_{k=0}^∞ M_{τ^n_k} 1_{(τ^n_k , τ^n_{k+1}]}(t).
Then sup_t |H^n_t − Mt | ≤ 2^{−n}. We define the stochastic integral
    (H^n · M )_t ≡ ∫_0^t H^n_s dMs = Σ_{k≥0} M_{τ^n_k} (M_{t∧τ^n_{k+1}} − M_{t∧τ^n_k}).
Let
    Q^n_t = Σ_{k=0}^∞ (M_{t∧τ^n_{k+1}} − M_{t∧τ^n_k})^2 .
Then
    M_t^2 = 2 ∫_0^t H^n_s dMs + Q^n_t
(check! This is a simple algebraic manipulation). The process ∫_0^· H^n_s dMs is a
martingale (check!), and it belongs to H02 (check!). Furthermore {(H^n · M ), n =
1, 2, . . .} is a Cauchy sequence; see Lemma 6.1.2 below.
Let Z = limn→∞ (H n · M ). Then Z ∈ H02 and Mt2 − 2Zt is a continuous
process. Define
hM it = Mt2 − 2Zt .
By Doob’s L2 inequality,
    sup_t (Q^n_t − hM it ) = sup_t ( M_t^2 − 2 ∫_0^t H^n_s dMs − hM it ) = 2 sup_t ( Zt − ∫_0^t H^n_s dMs ) → 0
in probability. In the following, by taking an almost surely convergent subsequence,
we may assume the convergence holds for almost all ω.
We prove part (3). Since Q^n_t = Σ_{k=0}^∞ (M_{t∧τ^n_{k+1}} − M_{t∧τ^n_k})^2 , we have
Q^n_{τ^n_{k+1}} ≥ Q^n_{τ^n_k} for each n. Since {τ^n_k } ⊂ {τ^{n+1}_k }, the monotonicity of
hM it on the set {τ^n_k (ω) : n, k = 1, 2, . . .} is clear. By continuity, hM i is
non-decreasing on the closure of {τ^n_k (ω) : n, k = 1, 2, . . .}. Fix ω and let (a, b) be in
the complement of the closure of {τ^n_k (ω) : n, k = 1, 2, . . .}. Then for s ∈ [a, b],
hM is (ω) − hM ia (ω) ≤ 2^{−n} for all n, and Ms (ω) = Ma (ω). It follows that hM it
is increasing.
Part (4) is clear from the proof.
Now we consider the case when Mt is not bounded. Let τn be the first time that
|Mt | is greater than or equal to n. Then hM^{τn} i is defined, and it is clear that
    hM^{τn} i^{τm} = hM^{τm} i^{τm} .
For all m < n,
    hM^{τn} i = hM^{τm} i   on {t < τm }.
We may therefore define
    hM it = hM^{τm} it ,   ω ∈ {t < τm }.
Since limn→∞ τn = ∞, this defines the process hM it everywhere.
Since (M τn )2t − hM τn it = (M 2 − hM i)τt n is a martingale, M 2 − hM i is a
local martingale.
Lemma 6.1.2 {(H^n · M ), n = 1, 2, . . .} is a Cauchy sequence in H02 .
Proof Let m < n. Then
    kH^n · M − H^m · M k^2 = k(H^n − H^m ) · M k^2 = E ( Σ_{k≥0} Ck (M_{τ^n_{k+1}} − M_{τ^n_k}) )^2 ,
where Ck is either 0 (where the indices for m and n coincide) or of the form
M_{τ^n_k} − M_{τ^m_j} with τ^m_j ≤ τ^n_k . The sum consists of orthogonal terms and
|Ck | ≤ 2^{−(m−1)}. It follows that kH^n · M − H^m · M k^2 ≤ 2^{−(m−1)} E(M∞ )^2 .
Proposition 6.1.3 Let T be a stopping time and let M and N be continuous local
martingales. Then
    hM^T , N^T i = hM, N i^T = hM, N^T i.
Proof We observed earlier that hM, M i^T = hM^T , M^T i. Now
    M^T_t N^T_t − M0 N0 − (1/4)[hM + N, M + N i^T_t − hM − N, M − N i^T_t ]
is a local martingale. Hence hM^T , N^T i = hM, N i^T . Similarly hM, N^T i^T =
hM^T , N^T i = hM, N i^T . It follows that hM, N^T i = hM, N i^T .
Proposition 6.1.4 If M is a continuous local martingale then hM, M it = 0 if and
only if Ms = M0 for s ∈ [0, t].
Proof Assume that hM, M it = 0. We first suppose that Mt is bounded and
M0 = 0. Let s ≤ t. Then E[M_s^2 − M_0^2 ] = EhM, M is = 0, so E(M_s^2 ) = 0
for all s ≤ t. Otherwise let Tn be a reducing sequence of stopping times; then
M^{Tn}_t − M0 = 0 almost surely for each n. Take n → ∞ to complete the proof.
See Revuz-Yor, Proposition 1.13.
Theorem 6.1.5 Let M and N be two continuous local martingales. For any sequence
of partitions with |∆n | → 0,
    hM, N it = lim_{n→∞} Σ_{j=0}^∞ (M_{t∧t^n_{j+1}} − M_{t∧t^n_j})(N_{t∧t^n_{j+1}} − N_{t∧t^n_j}),
with convergence in probability.
Definition 6.1.6 Let X and Y be two continuous processes. If for any sequence of
partitions with |∆n | → 0,
    lim_{n→∞} Σ_{j=0}^∞ (X_{t∧t^n_{j+1}} − X_{t∧t^n_j})(Y_{t∧t^n_{j+1}} − Y_{t∧t^n_j})
exists, we define the limit to be hX, Y it .
See Revuz-Yor for a proof. The limit on the right hand side provides another definition
of the quadratic variation process, which we will use during the remainder of the
section.
Proposition 6.1.7
• If At is a continuous process of finite variation and Xt is a
continuous semi-martingale then
    hX, Ait = 0.
• If Xt = Mt + At and Yt = Nt + Ct are two continuous semi-martingales
with local martingale parts M and N , then
    hX, Y it = hM, N it .
These assertions follow from the continuity of X, the finite variation of A, and the
following estimate:
    | Σ_{j=0}^∞ (X_{t∧t^n_{j+1}} − X_{t∧t^n_j})(A_{t∧t^n_{j+1}} − A_{t∧t^n_j}) |
    ≤ max_j |X_{t∧t^n_{j+1}} − X_{t∧t^n_j}| Σ_{j=0}^∞ |A_{t∧t^n_{j+1}} − A_{t∧t^n_j}| .
Remark 6.1.8 The bracket process of M, N is somewhat determined by the correlation
of the two stochastic processes. This statement can be made more precise
with the help of the multi-dimensional martingale representation theorem. If they are
independent and bounded, E[hM, N i]^2 = 0, from computing
    E[ Σ_{j=0}^∞ (M_{t∧t^n_{j+1}} − M_{t∧t^n_j})(N_{t∧t^n_{j+1}} − N_{t∧t^n_j}) ]^2 .
If M, N are unbounded and T is a stopping time, M^T and N^T are not necessarily
independent; for example, T could be the first time they meet. However, let Tn =
inf{t : |Mt | ≥ n} and Sn = inf{t : |Nt | ≥ n}. Then M^{Tn} and N^{Sn} are independent.
Hence E[M^{Tn} N^{Sn}] = 0 and hM^{Tn}, N^{Sn}i = 0. By the properties of the martingale
bracket, hM^{Tn}, N^{Sn}i = hM, N i^{Tn∧Sn} = hM^{Tn∧Sn}, N^{Tn∧Sn}i = 0, which shows
that hM, N i = 0 and E(M N )_{Tn∧Sn} = 0.
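The independence claim can be illustrated numerically: for two independent Brownian motions the sums of products of increments tend to 0, while each quadratic variation tends to t. A quick sketch (our own illustration):

```python
import random

random.seed(4)
n, t = 20000, 1.0
dt = t / n
cross = qv_m = 0.0
for _ in range(n):
    dm = random.gauss(0.0, dt ** 0.5)   # increment of M
    dn = random.gauss(0.0, dt ** 0.5)   # independent increment of N
    cross += dm * dn
    qv_m += dm * dm
assert abs(cross) < 0.05     # <M, N>_1 is (approximately) 0
assert abs(qv_m - t) < 0.05  # while <M, M>_1 is (approximately) 1
```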
6.2 Martingale Inequalities (II)
Theorem 6.2.1 (Burkholder-Davis-Gundy Inequality) For every p > 0, there
exist universal constants cp and Cp such that for all continuous local martingales
vanishing at 0,
    cp E hM, M i_T^{p/2} ≤ E( sup_{t<T} |Mt | )^p ≤ Cp E hM, M i_T^{p/2} ,
where T is a finite number or infinity.
Let τ be a stopping time. We have
    sup_{t<∞} |M_{t∧τ} |^p ≤ sup_{t<∞} |Mt |^p ,   sup_τ |Mτ |^p ≤ sup_{t<∞} |Mt |^p .
Let (τn ) be a reducing sequence for a local martingale; then {M^{τn}_t − M0 } is
uniformly integrable.
Corollary 6.2.2 Let (Mt ) be a continuous local martingale. If M0 ∈ L1 and
sup_{t<∞} |Mt | ∈ L1 then (Mt ) is a martingale.
Corollary 6.2.3 Let (Mt ) be a continuous L2 bounded martingale. Then M_t^2 −
hM, M it is a true martingale and
    {M_T^2 − hM, M iT : T ranges through all stopping times}
is a uniformly integrable family. In particular, EM_T^2 = EhM iT for every stopping time T .
Proof We may take M0 = 0. By Doob’s inequality,
    E sup_{s<∞} (Ms )^2 ≤ 4 sup_{s<∞} E(M_s^2 ) < ∞.
By the Burkholder-Davis-Gundy inequality, EhM, M i∞ < ∞ and
    {hM iτ : τ a stopping time}
is uniformly integrable. By the Burkholder-Davis-Gundy inequality again, the family
{(M_t^τ )^2 − hM, M i_t^τ : τ a stopping time} is uniformly integrable and hence
M_t^2 − hM, M it is a uniformly integrable martingale.
Remark 6.2.4 A stochastic process is Lp bounded if sup_s E|Ms |^p < ∞. Let
f ∈ L2 and Ft a filtration satisfying the usual assumptions; then Mt = E{f |Ft } is a
martingale. Further, (Mt , t ≥ 0) is L2 bounded and (M_t^2 , t ≥ 0) is uniformly
integrable: by the conditional Jensen inequality, M_t^2 ≤ E{f^2 |Ft }. Thus
    lim_{t→∞} E(Mt )^2 = E(M∞ )^2 .
On the other hand, if sup_t EM_t^2 < ∞, then Mt is uniformly integrable and M∞ =
limt→∞ Mt exists with
    Mt = E{M∞ |Ft }.
By Fatou’s lemma,
    E(M∞ )^2 ≤ lim_{t→∞} E(Mt )^2 = sup_t E(Mt )^2 < ∞.
If the filtration is augmented and right continuous, we may take the càdlàg version
of the martingale. Then L2 (Ω, F, P ) is the ‘same’ as the space of L2 bounded
càdlàg martingales.
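On a finite probability space the remark can be verified exactly. Below, Ω = {0,1}³ is three fair coin tosses, F_k is generated by the first k tosses, and M_k = E{f |F_k}; we check that E M_k² is non-decreasing in k (conditional Jensen) and that M_3 = f. Everything here is our own toy construction:

```python
from itertools import product

# Omega = {0,1}^3: three fair coin tosses; F_k is generated by the first k tosses
def cond_exp(f, k):
    def M(w):
        # average f over the unseen tosses: M_k = E{f | F_k}
        tails = list(product([0, 1], repeat=3 - k))
        return sum(f(w[:k] + t) for t in tails) / len(tails)
    return M

f = lambda w: (w[0] + w[1] + w[2]) ** 2
omegas = list(product([0, 1], repeat=3))
second_moments = []
for k in range(4):
    M = cond_exp(f, k)
    second_moments.append(sum(M(w) ** 2 for w in omegas) / len(omegas))

# E M_k^2 is non-decreasing in k, and M_3 = f
assert all(a <= b + 1e-12 for a, b in zip(second_moments, second_moments[1:]))
assert abs(second_moments[-1] - sum(f(w) ** 2 for w in omegas) / 8) < 1e-12
```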
For two independent Brownian motions B, W , noting that they are L2 bounded
on any finite time interval [0, t], Bt Wt is a true martingale and EBT WT = 0 for
any stopping time T ≤ t. However, if T is the first time they meet, EBT WT 6= 0!
Chapter 7
Stochastic Integration
We aim to define ∫_0^t fs dXs , where Xs = X0 + Ms + As is a right continuous semi-martingale and ft a left continuous measurable process. Here we limit ourselves
to the case when (Ms ) is continuous, in which case the bracket process is also
continuous. In the definition, and throughout this section, ∞ can be replaced by a
finite number T0 . This is important: Brownian motion is not L2 bounded on R+ ,
but it is L2 bounded on any bounded time interval.
7.1 Lecture 21: Integration
If (As , s ≥ 0) is a right continuous function of finite variation with A0 = 0,
there is an associated Borel measure µA on [0, ∞) determined by µA ((c, d]) =
A(d) − A(c). Note that µA ({d}) = A(d) − A(d−). If A is continuous the measure
does not charge singletons.
Write
    As = (A_{TV}(s) + As)/2 − (A_{TV}(s) − As)/2,
where A_{TV} is the total variation process. Recall that a signed measure µ decomposes
as a difference of two positive measures: µ = µ+ − µ− . For the Radon measure
µA , |µA | is the measure determined by A_{TV} . Let f : R+ → R be integrable,
with integral denoted by ∫_{[0,∞)} fs dAs , and let ∫_0^t fs dAs = ∫_{[0,∞)} 1_{(0,t]}(s) fs dµA (s).
If f : R+ → R is left continuous,
    ∫_0^t fs dAs = lim_{|∆n|→0} Σ_j f (t^n_j) ( A(t^n_{j+1}) − A(t^n_j) ).
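The left-point sums above are straightforward to compute. A minimal sketch (the function name stieltjes is ours) for A(s) = s², f(s) = s on [0, 1], where ∫_0^1 s dA(s) = ∫_0^1 2s² ds = 2/3:

```python
def stieltjes(f, A, ts):
    # left-point sum  sum_j f(t_j) (A(t_{j+1}) - A(t_j))
    return sum(f(u) * (A(v) - A(u)) for u, v in zip(ts, ts[1:]))

# A(s) = s^2 is increasing, hence of finite variation
n = 200000
ts = [i / n for i in range(n + 1)]
val = stieltjes(lambda s: s, lambda s: s * s, ts)
assert abs(val - 2 / 3) < 1e-4
```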
We may allow f and As to be random. The above procedure holds for each ω
for which As (ω) is of finite variation. There is however the added complication
of measurability. We assume that f is progressively measurable (by progressive
measurability we include the assumption that f is universally measurable,
i.e. f : R+ × Ω → R is measurable with respect to B(R+ ) ⊗ F∞ ) and that A is a
right continuous finite variation process (recall, in particular, that As is adapted). Then
∫_0^t fs (ω) dAs (ω) is a process of finite variation and is right continuous. The integral
is furthermore continuous if (As ) is sample continuous.
Recall that hM, M i corresponds to a positive measure and hM, N i to a signed
measure, written as µ+ − µ− where µ+ , µ− are positive measures. By |hM, N i|
we mean the measure corresponding to µ+ + µ− .
For any a, hM − aN it ≥ 0. This means hM, M it + a^2 hN, N it ≥ 2a hM, N it .
Take a = (hM, M it /hN, N it )^{1/2} to see that hM, N it ≤ (hM, M it hN, N it )^{1/2} .
A similar proof shows that for s < t:
    hM, N it − hM, N is ≤ ( hM, M it − hM, M is )^{1/2} ( hN, N it − hN, N is )^{1/2} .
Let Hs , Ks be measurable functions. Approximating them by elementary functions
gives the following theorem:
Theorem 7.1.1 Let M and N be two continuous local martingales. Let H and K
be measurable processes, i.e. measurable with respect to F∞ ⊗ B(R+ ). Then for
t ≤ ∞, almost surely,
    ∫_0^t |Hs ||Ks | d|hM, N i|s ≤ ( ∫_0^t |Hs |^2 dhM, M is )^{1/2} ( ∫_0^t |Ks |^2 dhN, N is )^{1/2} .
To see that the theorem holds, first take H and K to be elementary processes.
Applying the Hölder inequality to the above, we obtain, for 1/p + 1/q = 1:
Corollary 7.1.2 (Kunita-Watanabe Inequality) For t ≤ ∞,
    E ∫_0^t |Hs ||Ks | |dhM, N i|s ≤ ( E ( ∫_0^t |Hs |^2 dhM, M is )^{p/2} )^{1/p} ( E ( ∫_0^t |Ks |^2 dhN, N is )^{q/2} )^{1/q} .

7.2 Lecture 21: Stochastic Integration

Let H 2 be the space of L2 bounded continuous martingales.
Definition 7.2.1 For M ∈ H 2 , define L2 (M ) to be the space of progressively
measurable stochastic processes (ft ), f : R+ × Ω → R measurable, such that
    kf k^2_{L2 (M )} := E ∫_0^∞ (fs )^2 dhM, M is < ∞.
This is a Hilbert space.
It is an inner product space: for H, K ∈ L2 (M ), define
    hH, Ki_{L2 (M )} = E ∫_0^∞ Hs Ks dhM is .
By the Kunita-Watanabe inequality, |hH, Ki_{L2 (M )} | < ∞.
Let PM be the measure on R+ × Ω determined by, for Γ ∈ B(R+ ) ⊗ F∞ ,
    PM (Γ) = E ∫_0^∞ 1Γ dhM, M is .
Then L2 (M ) = L2 (R+ × Ω, B(R+ ) ⊗ F∞ , PM ). We see that L2 (M ) is a Hilbert
space.
Let N be an integer and 0 = t0 < t1 < · · · < tN +1 .
Definition 7.2.2 An elementary process (Kt ) is a bounded stochastic process of
the form
    Kt (ω) = K−1 (ω)1_{0}(t) + Σ_{i=0}^N Ki (ω)1_{(ti ,ti+1]}(t),
where K−1 ∈ F0 and Ki ∈ Fti . The family of elementary processes is denoted by E.
Elementary processes are left continuous and so progressively measurable. Let M ∈
H 2 ; then
    E ∫_0^∞ K_s^2 dhM, M is ≤ |K|^2_∞ E hM, M i_{t_{N+1}} < ∞,
and E ⊂ ∩_{M ∈H 2} L2 (M ).
Proposition 7.2.3 The set of elementary processes is dense in L2 (M ).
Proof We first prove the case when f ∈ L2 (M ) is left continuous. Assume first that f is
bounded and let
    gn (s, ω) = f0 (ω)1_{0}(s) + Σ_{j≥1} f_{j/2^n}(ω) 1_{(j/2^n , (j+1)/2^n]}(s).
Since f is left continuous and bounded, gn → f in L2 (M ). Note that f_{j/2^n} is F_{j/2^n}
measurable. In general let fn (s) = fs 1_{|fs |≤n} . Then
    |fn − f |^2_{L2 (M )} ≤ E ∫_0^∞ f_s^2 1_{|fs |≥n} dhM, M is → 0.
Now suppose f is only assumed to be progressively measurable. Let f ∈ L2 (M ) be
orthogonal to E. Then for any s < t and bounded K ∈ Fs ,
    0 = hf, K1_{(s,t]} i_{L2 (M )} = E ∫_0^∞ fr K1_{(s,t]}(r) dhM, M ir = E ( K ∫_s^t fr dhM, M ir ).
Hence ∫_0^t fr dhM, M ir , which is in L1 (Ω) by the Kunita-Watanabe inequality
(Corollary 7.1.2), is a martingale of finite variation, and therefore f = 0.
7.2.1 Stochastic Integral: Characterization
All separable infinite dimensional Hilbert spaces are isomorphic (take an orthonormal
basis in each space to construct the isometry). Hence L2 (M ) ≅ H02 . Below we
construct an explicit isometric map:
    K 7→ I(K, M ).
We will call I(K, M ) the Itô integral and denote it by
    ∫_0^t Ks dMs .
The terminology ‘integral’ will be seen to be justified when we consider the example
where K is an elementary process.
Definition 7.2.4 Let M ∈ H 2 , and
    Kt (ω) = K−1 (ω)1_{0}(t) + Σ_{i=0}^N Ki (ω)1_{(ti ,ti+1]}(t).
Define the following elementary integral:
    (K · M )t = Σ_{i=0}^N Ki (M_{ti+1 ∧t} − M_{ti ∧t}).
The term K−1 is irrelevant for the elementary integral with respect to a continuous
process (Mt ).
Denote by H02 the subspace of H 2 whose elements M satisfy M0 = 0.
Theorem 7.2.5 Given M ∈ H 2 and K ∈ L2 (M ), there is a unique process I ≡
I(K, M ) ∈ H 2 , vanishing at 0, such that for any N ∈ H 2 and any t ≥ 0,
    hI, N it = ∫_0^t Ks dhM, N is .   (7.1)
Proof If (7.1) holds for all N ∈ H02 , it holds for all N ∈ H 2 .
(a) We first prove the uniqueness. Suppose that there are two martingales I1 , I2 ∈
H02 such that
    hI1 , N it = ∫_0^t Ks dhM, N is ,   hI2 , N it = ∫_0^t Ks dhM, N is .
Then hI1 − I2 , N i = 0 for any N ∈ H02 , and I1 − I2 = 0 a.s. by standard
results on Hilbert spaces (or take N = I1 − I2 ).
(b) The existence. For another proof of the existence, by approximation, see
Proposition 7.2.10.
(b1) Let N ∈ H02 . We define a real valued linear map U : H02 → R by
    U (N ) = E ∫_0^∞ Ks dhM, N is .
(b2) By the Kunita-Watanabe inequality:
    |U (N )| = | E ∫_0^∞ Ks dhM, N is | ≤ ( E ∫_0^∞ K_s^2 dhM, M is )^{1/2} ( EhN, N i∞ )^{1/2}
    ≤ |K|_{L2 (M )} |N |_{H02} .   (7.2)
By the Riesz Representation Theorem for bounded linear functionals,
there is a unique element I of H02 such that
    U (N ) = hI, N i_{H02} ,   ∀ N ∈ H02 .
And so
    E ∫_0^∞ Ks dhM, N is = EhI(K, M ), N i∞ .   (7.3)
(b3) Define
$$X_t := I_t N_t - \int_0^t K_s\, d\langle M, N\rangle_s.$$
We prove that $(X_t)$ is a martingale. By the defining property of the bracket process, it then follows that $\int_0^t K_s\, d\langle M, N\rangle_s$, which vanishes at 0, must equal $\langle I, N\rangle_t$.
Let $\tau$ be any bounded stopping time. Then
$$\mathbf{E}\, I_\tau N_\tau = \mathbf{E}\langle I, N\rangle_\tau = \mathbf{E}\langle I, N^\tau\rangle_\infty \stackrel{(7.3)}{=} \mathbf{E}\int_0^\infty K_s\, d\langle M, N^\tau\rangle_s = \mathbf{E}\int_0^\infty 1_{s<\tau} K_s\, d\langle M, N\rangle_s = \mathbf{E}\int_0^\tau K_s\, d\langle M, N\rangle_s.$$
This shows that $\mathbf{E}X_\tau = \mathbf{E}X_0 = 0$, and hence $(X_t)$ is a martingale.
(c) We show that the map $K \mapsto I(K, M)$ is a linear isometry. The linearity is clear. Let $K, K' \in L^2(M)$. Then
$$\begin{aligned}
\langle I(K,M), I(K',M)\rangle_{H^2} &= \mathbf{E}\int_0^\infty K_s\, d\langle M, I(K',M)\rangle_s \\
&= \mathbf{E}\int_0^\infty K_s\, d\left(\int_0^s K'_r\, d\langle M, M\rangle_r\right) \\
&= \mathbf{E}\int_0^\infty K_s K'_s\, d\langle M, M\rangle_s = \langle K, K'\rangle_{L^2(M)}.
\end{aligned}$$
7.2.2 Properties of Integrals
The following follows from the proof of the theorem:
Corollary 7.2.6 For $H \in L^2(M)$, $K \in L^2(N)$, and $M, N \in H^2$,
$$\left\langle \int_0^\cdot H_s\, dM_s,\ \int_0^\cdot K_s\, dN_s\right\rangle_t = \int_0^t H_s K_s\, d\langle M, N\rangle_s.$$
Proof Since $K \cdot N \in H^2$,
$$\langle I(H,M), I(K,N)\rangle_t = \int_0^t H_s\, d\langle M, K\cdot N\rangle_s = \int_0^t H_s\, d\left(\int_0^s K_r\, d\langle M, N\rangle_r\right) = \int_0^t H_s K_s\, d\langle M, N\rangle_s.$$
Take $H = K$, $M = N$ and let $t \to \infty$ to obtain the following identity, commonly known as the Itô isometry:
$$\mathbf{E}\left\langle \int_0^\cdot K_s\, dM_s,\ \int_0^\cdot K_s\, dM_s\right\rangle_\infty = \mathbf{E}\int_0^\infty (K_s)^2\, d\langle M, M\rangle_s.$$
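The Itô isometry can be checked by Monte Carlo. The sketch below (our own illustration, not from the notes) takes $M = B$ a Brownian motion and integrand $K_s = B_s$, approximating the stochastic integral by left-endpoint Riemann sums; both sides are then near $\mathbf{E}\int_0^1 B_s^2\,ds = 1/2$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Monte Carlo check of E (int_0^T K dB)^2 = E int_0^T K^2 ds for K_s = B_s,
# with the stochastic integral approximated by left-endpoint sums.
n_paths, n_steps, T = 20_000, 200, 1.0
dt = T / n_steps
dB = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
B = np.concatenate([np.zeros((n_paths, 1)), np.cumsum(dB, axis=1)], axis=1)

stoch_int = np.sum(B[:, :-1] * dB, axis=1)            # int_0^T B_s dB_s
lhs = np.mean(stoch_int ** 2)                          # E (int K dB)^2
rhs = np.mean(np.sum(B[:, :-1] ** 2 * dt, axis=1))     # E int K^2 ds
```

Both estimates carry Monte Carlo and discretization error, so only approximate agreement is expected.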
Corollary 7.2.7 If $M \in H^2$, $H \in L^2(M)$, $K \in L^2(I(H,M))$, then
$$\int_0^t K_s\, d\left(\int_0^s H_r\, dM_r\right) = \int_0^t K_s H_s\, dM_s.$$
Proof For any $N \in H_0^2$,
$$\left\langle \int_0^\cdot K_s\, dI_s(H,M),\ N\right\rangle_t = \int_0^t K_s\, d\langle N, I(H,M)\rangle_s = \int_0^t K_s H_s\, d\langle M, N\rangle_s = \langle I(HK, M), N\rangle_t.$$
7.2.3 Lecture 22: Stochastic Integration (2)
Let $M \in H^2$, and $(K_t)$ an elementary process,
$$K_t(\omega) = K_{-1}(\omega)\,1_{\{0\}}(t) + \sum_{i=0}^{N} K_i(\omega)\,1_{(t_i, t_{i+1}]}(t),$$
and
$$(K \cdot M)_t = \sum_{i=0}^{N} K_i\,(M_{t_{i+1}\wedge t} - M_{t_i\wedge t}).$$
Proposition 7.2.8 Let $M \in H^2$. If $K$ is an elementary process, $K \cdot M \in H_0^2$. The map
$$K \in \mathcal{E} \mapsto K \cdot M \in H^2$$
is linear and
$$\|K \cdot M\|_{H^2} = \|K\|_{L^2(M)} \qquad \text{(Itô isometry)}.$$
Proof Take $s \in [t_i, t)$. Since $M$ is an $\mathcal{F}_t$-martingale and $K_i$ is $\mathcal{F}_{t_i}$-measurable (hence $\mathcal{F}_s$-measurable),
$$\mathbf{E}\{K_i(M_{t_{i+1}\wedge t} - M_{t_i\wedge t})\,|\,\mathcal{F}_s\} = K_i(M_{t_{i+1}\wedge s} - M_{t_i\wedge s}).$$
For $s < t_i$,
$$\mathbf{E}\{K_i(M_{t_{i+1}\wedge t} - M_{t_i\wedge t})\,|\,\mathcal{F}_s\} = \mathbf{E}\{K_i\, \mathbf{E}\{M_{t_{i+1}\wedge t} - M_{t_i\wedge t}\,|\,\mathcal{F}_{t_i}\}\,|\,\mathcal{F}_s\} = 0.$$
Consequently
$$\mathbf{E}\{(K \cdot M)_t\,|\,\mathcal{F}_s\} = \sum_i K_i(M_{t_{i+1}\wedge s} - M_{t_i\wedge s}) = (K \cdot M)_s,$$
so $K \cdot M$ is a martingale. We prove the Itô isometry:
$$\begin{aligned}
\|K \cdot M\|^2_{H^2} &= \mathbf{E}\left(\sum_{i=0}^{N} K_i(M_{t_{i+1}} - M_{t_i})\right)^2 = \sum_{i=0}^{N} \mathbf{E}\left\{K_i^2 (M_{t_{i+1}} - M_{t_i})^2\right\} \\
&= \sum_{i=0}^{N} \mathbf{E}\left\{K_i^2\, \mathbf{E}\{M_{t_{i+1}}^2 - M_{t_i}^2\,|\,\mathcal{F}_{t_i}\}\right\} \\
&= \sum_{i=0}^{N} \mathbf{E}\left\{K_i^2\left(\langle M, M\rangle_{t_{i+1}} - \langle M, M\rangle_{t_i}\right)\right\} \\
&= \mathbf{E}\int_0^\infty K_s^2\, d\langle M, M\rangle_s = \|K\|^2_{L^2(M)}.
\end{aligned}$$
The cross terms vanish by conditioning on $\mathcal{F}_{t_i}$, and we use the fact that $M_t^2 - \langle M, M\rangle_t$ is a martingale.
Proposition 7.2.9 Let $M, N \in H^2$, $K \in \mathcal{E}$. Then for all $t$, $\langle K \cdot M, N\rangle_t = \int_0^t K_s\, d\langle M, N\rangle_s$.
Proof Note that
$$(K \cdot M)_t = \sum_{i=0}^{N} K_i\,(M^{t_{i+1}}_t - M^{t_i}_t).$$
For all $N \in H_0^2$,
$$\langle K \cdot M, N\rangle_t = \sum_{i=0}^{N} K_i\left(\langle M^{t_{i+1}}, N\rangle_t - \langle M^{t_i}, N\rangle_t\right) = \sum_{i=0}^{N} K_i\left(\langle M, N\rangle_{t_{i+1}\wedge t} - \langle M, N\rangle_{t_i\wedge t}\right) = \int_0^t K_s\, d\langle M, N\rangle_s.$$
Given that $\langle K \cdot M, N\rangle_t = \int_0^t K_s\, d\langle M, N\rangle_s$, we have
$$\mathbf{E}\langle K \cdot M, K \cdot M\rangle_\infty = \mathbf{E}\int_0^\infty K_s\, d\langle M, K \cdot M\rangle_s = \mathbf{E}\int_0^\infty K_s^2\, d\langle M, M\rangle_s.$$
One may also observe that
$$(K \cdot M)_t = \sum_l \left(\sum_{i=0}^{l-1} K_i(M_{t_{i+1}} - M_{t_i}) + K_l(M_t - M_{t_l})\right) 1_{(t_l, t_{l+1}]}(t).$$
Let $N \in H^2$. Then
$$\langle K \cdot M, N\rangle_{t_{l+1}\wedge t} - \langle K \cdot M, N\rangle_{t_l\wedge t} = K_l\left(\langle M, N\rangle_{t_{l+1}\wedge t} - \langle M, N\rangle_{t_l\wedge t}\right).$$
Proposition 7.2.10 Let $M \in H^2$. The map $K \mapsto K \cdot M$ extends to an isometry from $L^2(M)$ to $H_0^2$ and, for all $t$,
$$\langle K \cdot M, N\rangle_t = \int_0^t K_s\, d\langle M, N\rangle_s$$
for all $N \in H_0^2$.
Proof Take elementary processes $f_n$ converging to $K$ in $L^2(M)$. By the isometry,
$$\|(f_n \cdot M) - (f_m \cdot M)\|_{H^2} = \|f_n - f_m\|_{L^2(M)} \to 0$$
as $n, m \to \infty$. Hence $f_n \cdot M$ converges in $H^2$ and we define $K \cdot M$ to be the limit. Two isometries agreeing on a dense set must agree: $K \cdot M = I(K, M)$ as required.
7.3 Localization
Proposition 7.3.1 Let $\tau$ be a stopping time. Then for $M \in H^2$ and $H \in L^2(M)$,
$$\int_0^{\tau\wedge t} H_s\, dM_s = \int_0^t H_s\, dM_s^\tau = \int_0^t H_s\, 1_{[0,\tau]}(s)\, dM_s.$$
Proof Take $N \in H^2$. Then for any $t \in [0, \infty]$,
$$\left\langle \int_0^{\tau\wedge\cdot} H_s\, dM_s,\ N\right\rangle_t = \left\langle \int_0^\cdot H_s\, dM_s,\ N^\tau\right\rangle_t = \int_0^t H_s\, d\langle M, N^\tau\rangle_s = \int_0^t H_s\, d\langle M^\tau, N\rangle_s = \left\langle \int_0^\cdot H_s\, dM_s^\tau,\ N\right\rangle_t.$$
The first required equality follows; we have used the property of martingale brackets that $\langle M, N^\tau\rangle = \langle M^\tau, N\rangle$. For the second equality note that
$$\left\langle \int_0^\cdot H_s\, 1_{s\le\tau}\, dM_s,\ N\right\rangle_t = \int_0^t H_s\, 1_{s\le\tau}\, d\langle M, N\rangle_s.$$
By properties of Lebesgue-Stieltjes integrals,
$$\int_0^t H_s\, 1_{s\le\tau}\, d\langle M, N\rangle_s = \int_0^t H_s\, d\langle M^\tau, N\rangle_s = \left\langle \int_0^\cdot H_s\, dM_s^\tau,\ N\right\rangle_t.$$
The second required equality follows.
Let $S \le T$ be stopping times. On $\{t < S \wedge T\}$,
$$\int_0^t H_s\, dM_s^S = \int_0^t H_s\, dM_s = \int_0^t H_s\, dM_s^T.$$
Definition 7.3.2 Let $M$ be a continuous local martingale. Let $L^2_{loc}(M)$ be the space of progressively measurable processes $H$ for which there is a sequence of stopping times $T_n$ increasing to infinity such that
$$\mathbf{E}\int_0^{T_n} H_s^2\, d\langle M, M\rangle_s < \infty.$$
Equivalently, the class $L^2_{loc}(M)$ consists of all progressively measurable $H$ such that
$$\int_0^t H_s^2\, d\langle M, M\rangle_s < \infty \quad \text{a.s.}, \qquad \forall\, t.$$
Proposition 7.3.3 Let $M$ be a continuous local martingale with $M_0 = 0$. If $H \in L^2_{loc}(M)$, there exists a local martingale $H \cdot M$ with $(H \cdot M)_0 = 0$ and
$$\langle H \cdot M, N\rangle = H \cdot \langle M, N\rangle$$
for all continuous local martingales $N_t$.
Detail of the proof was not covered in class.
Proof Let
$$T_n = \inf\left\{t \ge 0 : \int_0^t (1 + H_s^2)\, d\langle M, M\rangle_s \ge n\right\}.$$
Then $(T_n)$ is a sequence of stopping times increasing to infinity such that $M^{T_n}$ is in $H^2$ and $\|H^{T_n}\|_{L^2(M^{T_n})}$ is bounded by $n$. For $n < m$, $\int_0^t H_s\, dM_s^{T_n}$ agrees with $\int_0^t H_s\, dM_s^{T_m}$ on $\{t < T_n\}$. Define
$$\int_0^t H_s\, dM_s = \int_0^t H_s\, dM_s^{T_n} \quad \text{on } \{t < T_n\}.$$
Rt
Since (H · M )Tn = 0 Hs dMsTn , (H · M ) is a local martingale. Now let N
be a continuous local martingale with reducing stopping times Sn such that N Sn
is bounded. Define τn = Tn ∧ Sn . We see that
Z ·∧τn
τn
τn
τn
τn
τn
Hs dhM, N is .
hH · M, N i = h(H · M ) , N i = h(H · M ), N i =
0
Taking n → ∞ to see that hH · M, N it =
Rt
0
Hs dhM, N is .
Definition 7.3.4 A progressively measurable process $f$ is locally bounded if there is a sequence of stopping times $T_n$ increasing to infinity and constants $C_n$ with $|f^{T_n}| \le C_n$ for all $n$.
Both continuous and convex functions of a continuous adapted process are locally bounded. Any locally bounded process is in $L^2_{loc}(M)$ for any continuous local martingale $M$.
7.4 Properties of Stochastic Integrals
Definition 7.4.1 If $X_t = M_t + A_t$ is a continuous semi-martingale and $f$ a progressively measurable locally bounded stochastic process, we define
$$\int_0^t f_s\, dX_s = \int_0^t f_s\, dM_s + \int_0^t f_s\, dA_s.$$
Proposition 7.4.2 Let $X, Y$ be continuous semi-martingales. Let $f, g, K$ be locally bounded and progressively measurable.
1. $\int_0^t (a f_s + b g_s)\, dX_s = a\int_0^t f_s\, dX_s + b\int_0^t g_s\, dX_s$.
2. $\int_0^t f_s\, (dX_s + dY_s) = \int_0^t f_s\, dX_s + \int_0^t f_s\, dY_s$.
3. $\int_0^t f_s\, d\left(\int_0^s g_r\, dX_r\right) = \int_0^t f_s g_s\, dX_s$.
4. For any stopping time $\tau$, $\int_0^\tau K_s\, dX_s = \int_0^\infty 1_{s\le\tau} K_s\, dX_s = \int_0^\infty K_s\, dX_s^\tau$.
5. If $X$ is of bounded total variation on $[0, t]$, so is the integral $\int_0^\cdot K_s\, dX_s$; and if $X$ is a local martingale, so is $\int K_s\, dX_s$. In particular, for a semi-martingale $X_t$ this gives the Doob-Meyer decomposition of $\int_0^\cdot K_s\, dX_s$.
Definition 7.4.3 Let $X, Y$ be continuous semi-martingales. The Stratonovich integral is defined as
$$\int_0^t Y_s \circ dX_s = \int_0^t Y_s\, dX_s + \frac12 \langle X, Y\rangle_t.$$
Also note that for continuous processes, Riemann sums corresponding to a sequence of partitions whose modulus goes to zero converge to the stochastic integral in probability. This convergence does not help with computation: although there are subsequences that converge a.s., we do not know which subsequence of partitions would work, and the subsequence is likely to differ for different integrands and different times.
Proposition 7.4.4 If $K$ is left continuous and $\Delta_n: 0 = t_0^n < t_1^n < \cdots < t_{N_n}^n = t$ is a sequence of partitions of $[0, t]$ whose modulus goes to zero, then
$$\int_0^t K_s\, dX_s = \lim_{n\to\infty} \sum_{i=0}^{N_n - 1} K_{t_i^n}\,(X_{t_{i+1}^n} - X_{t_i^n}).$$
The convergence is in probability.
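As an illustration (ours, not from the notes) of the role of the evaluation point, compare left-endpoint sums with trapezoidal sums for $\int_0^1 B\,dB$: the former approximate the Itô integral $(B_1^2 - 1)/2$, the latter the Stratonovich integral $B_1^2/2$, and the gap is $\frac12\langle B, B\rangle_1 = \frac12$.

```python
import numpy as np

rng = np.random.default_rng(2)

# Left-endpoint Riemann sums approximate the Ito integral (B_1^2 - 1)/2;
# trapezoidal sums telescope exactly to the Stratonovich value B_1^2 / 2.
n_steps, T = 200_000, 1.0
dt = T / n_steps
dB = rng.normal(0.0, np.sqrt(dt), size=n_steps)
B = np.concatenate([[0.0], np.cumsum(dB)])

ito = np.sum(B[:-1] * dB)                      # K evaluated at left endpoints
strat = np.sum(0.5 * (B[:-1] + B[1:]) * dB)    # trapezoidal / Stratonovich sum
```

The difference `strat - ito` equals half the sum of squared increments, i.e. half the discretized quadratic variation, consistent with Definition 7.4.3 below.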
Proposition 7.4.5 (Dominated Convergence Theorem) Let $K^n$ be locally bounded progressively measurable processes converging to $K$. Suppose that there is a locally bounded progressively measurable process $F$ such that $|K^n| \le F$. Then
$$\lim_{n\to\infty} \sup_{s\le t}\left|\int_0^s K_r^n\, dX_r - \int_0^s K_r\, dX_r\right| = 0.$$
The convergence is in probability.
7.5 Itô Formula
Intuition. Let $x_t$ solve the ODE $\dot{x}_t = \sigma(x_t)$. By the Fundamental Theorem of Calculus, if $f$ is $C^1$ then
$$f(x_t) = f(x_0) + \int_0^t f'(x_s)\sigma(x_s)\, ds.$$
Let $f: \mathbf{R} \to \mathbf{R}$ be a $C^3$ function. Taylor's Theorem with remainder says that
$$f(y_0 + y) = f(y_0) + f'(y_0)y + \frac12 f''(y_0)y^2 + \frac16 f^{(3)}(\theta)y^3.$$
We apply it to a semi-martingale $(x_t)$:
$$f(x_t) - f(x_0) = \sum_i \left(f(x_{t_{i+1}}) - f(x_{t_i})\right) = \sum_i \left[f'(x_{t_i})(x_{t_{i+1}} - x_{t_i}) + \frac12 f''(x_{t_i})(x_{t_{i+1}} - x_{t_i})^2 + \frac16 f^{(3)}(x_{\theta_i})(x_{t_{i+1}} - x_{t_i})^3\right].$$
Heuristically the right hand side converges in probability to $\int_0^t f'(x_s)\, dx_s + \frac12\int_0^t f''(x_s)\, d\langle x\rangle_s$. The first two terms converge in probability; if all relevant terms are bounded, the convergence holds in $L^2$. The third term converges to zero because the process is of finite quadratic variation.
We define
$$\int_s^t f_r\, dX_r = \int_0^t f_r\, dX_r - \int_0^s f_r\, dX_r.$$
Theorem 7.5.1 (Standard Form) Let $X_t = (X_t^1, \ldots, X_t^n)$ be an $\mathbf{R}^n$ valued continuous semi-martingale and $f$ a $C^2$ real valued function on $\mathbf{R}^n$. Then for $s < t$,
$$f(X_t) = f(X_s) + \sum_{i=1}^n \int_s^t \frac{\partial f}{\partial x_i}(X_r)\, dX_r^i + \frac12 \sum_{i,j=1}^n \int_s^t \frac{\partial^2 f}{\partial x_i \partial x_j}(X_r)\, d\langle X^i, X^j\rangle_r.$$
In short hand,
$$f(X_t) = f(X_s) + \int_s^t (Df)(X_r)\, dX_r + \frac12 \int_s^t (D^2 f)(X_r)\, d\langle X, X\rangle_r.$$
If $T$ is a stopping time, applying Itô's formula to $Y_t = X_{T\wedge t}$ we see that
$$f(X_{T\wedge t}) = f(X_0) + \int_0^{T\wedge t} (Df)(X_r)\, dX_r + \frac12 \int_0^{T\wedge t} (D^2 f)(X_r)\, d\langle X, X\rangle_r.$$
A special case is:
Proposition 7.5.2 (The product formula) If $X_t$ and $Y_t$ are real valued semi-martingales,
$$X_t Y_t = X_0 Y_0 + \int_0^t X_s\, dY_s + \int_0^t Y_s\, dX_s + \langle X, Y\rangle_t.$$
If $X = Y$, this has been proven in the construction of the bracket process when $X$ is an $L^2$ bounded continuous martingale; this extends by polarisation to the product formula.
To prove the theorem we first assume that $X_t$ is bounded and real valued. Using the product formula we prove that if Itô's formula holds for a function $f$, it holds for $f(x)x$; this shows the formula holds when $f$ is a polynomial. If $f$ is not a polynomial, we first assume that $X_t$ takes values in a compact set $K$ and let $P_n$ be polynomials converging to $f$ in $C^2$ on $K$. Finally we take a sequence of stopping times $T_n$ to prove that the formula holds in general.
Example 7.5.3 Let $M_t$ be a continuous semi-martingale with $M_0 = 0$. Let $f(x) = e^x$ and apply Itô's formula to the process $M_t - \frac12\langle M, M\rangle_t$. Then
$$\begin{aligned}
e^{M_t - \frac12\langle M,M\rangle_t} &= 1 + \int_0^t e^{M_s - \frac12\langle M,M\rangle_s}\, dM_s - \frac12\int_0^t e^{M_s - \frac12\langle M,M\rangle_s}\, d\langle M,M\rangle_s + \frac12\int_0^t e^{M_s - \frac12\langle M,M\rangle_s}\, d\langle M,M\rangle_s \\
&= 1 + \int_0^t e^{M_s - \frac12\langle M,M\rangle_s}\, dM_s.
\end{aligned}$$
If $M_t$ is a local martingale, $e^{M_t - \frac12\langle M,M\rangle_t}$ is a local martingale, called the exponential martingale of $M_t$. If $M_t = \int_0^t f_s\, dB_s$ then the exponential martingale is an integral with respect to the Brownian motion $B_t$.
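A quick Monte Carlo sanity check (our illustration, not from the notes): for $M_t = B_t$ we have $\langle M, M\rangle_t = t$, the exponential martingale is $N_t = e^{B_t - t/2}$, and $\mathbf{E}N_t = 1$ for every $t$.

```python
import numpy as np

rng = np.random.default_rng(3)

# For M_t = B_t the exponential martingale is N_t = exp(B_t - t/2),
# and E N_t = 1; checked by Monte Carlo at t = 1.
n_paths, t = 200_000, 1.0
B_t = rng.normal(0.0, np.sqrt(t), size=n_paths)
N_t = np.exp(B_t - 0.5 * t)
mean_N = N_t.mean()
```

This is consistent with Remark 7.5.4 below: constancy of the expectation is exactly what distinguishes a true martingale among exponential local martingales.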
Remark 7.5.4 Let $M_t$ be a continuous local martingale. The exponential martingale $N_t := e^{M_t - \frac12\langle M,M\rangle_t}$ is a martingale if and only if $\mathbf{E}(N_t) = 1$ for all $t$.
Let $M_t$ be a continuous local martingale with $\mathbf{E}|M_0| < \infty$. Suppose that the family $\{M_T^-, T \text{ bounded stopping times}\}$ is uniformly integrable. Then $M_t$ is a supermartingale. It is a martingale if and only if $\mathbf{E}M_t = \mathbf{E}M_0$. See Proposition 2.2 in Elworthy-Li-Yor (Probability Theory and Related Fields 1999, volume 115).
If $M_t = (M_t^1, \ldots, M_t^n)$ we denote by $\langle M, M\rangle_t$ the matrix whose entries are $\langle M^i, M^j\rangle_t$.
Theorem 7.5.5 Let Xt be a semi-martingale.
1. Assume that $\frac{\partial}{\partial t}F(t,x)$ and $\frac{\partial^2}{\partial x_i\partial x_j}F(t,x)$, $i, j = 1, \ldots, d$, exist and are continuous. Then
$$F(t, X_t) = F(0, X_0) + \int_0^t \frac{\partial F}{\partial s}(s, X_s)\, ds + \int_0^t DF(s, X_s)\, dX_s + \frac12 \int_0^t D^2 F(s, X_s)\, d\langle X, X\rangle_s.$$
2. Itô’s formula holds for complex valued functions.
7.6 Lévy's Martingale Characterization Theorem
Definition 7.6.1 An $\mathcal{F}_t$-adapted stochastic process $(X_t)$ is an $(\mathcal{F}_t)$ Brownian motion if it is a Brownian motion and, for each $t \ge 0$, $(X_{t+s} - X_t, s \ge 0)$ is independent of $\mathcal{F}_t$.
A Brownian motion is an $\mathcal{F}_t^B$-martingale, where $\mathcal{F}_t^B = \sigma\{B_s, s \le t\}$.
Theorem 7.6.2 (Lévy's Martingale Characterization Theorem) An $\mathcal{F}_t$ adapted sample continuous stochastic process $X_t$ in $\mathbf{R}^d$ vanishing at 0 is a standard $\mathcal{F}_t$-Brownian motion if and only if each $X^i$ is an $\mathcal{F}_t$ local martingale and $\langle X^i, X^j\rangle_t = \delta_{ij} t$.
The proof is not covered in class.
Proof of the 'if' part: for any $\lambda \in \mathbf{R}^d$, $Y_t = \langle \lambda, X_t\rangle_{\mathbf{R}^d} = \sum_j \lambda_j X_t^j$ is a local martingale with bracket $|\lambda|^2 t$. The exponential martingale $e^{i\langle\lambda, X_t\rangle + \frac12|\lambda|^2 t}$ is a martingale, as it is bounded on any compact time interval; hence
$$\mathbf{E}\left\{e^{i\langle\lambda, X_t - X_s\rangle}\,|\,\mathcal{F}_s\right\} = e^{-\frac12|\lambda|^2(t-s)}.$$
This is sufficient to show that $X_t - X_s$ is independent of $\mathcal{F}_s$ and
$$\mathbf{E}\, e^{i\langle\lambda, X_t - X_s\rangle} = e^{-\frac12|\lambda|^2(t-s)},$$
which implies that $X_t - X_s \sim N(0, (t-s)I)$.
Proof of the 'only if' part. First, for $s < t$,
$$\mathbf{E}\{X_t^i\,|\,\mathcal{F}_s\} = \mathbf{E}\{X_t^i - X_s^i + X_s^i\,|\,\mathcal{F}_s\} = \mathbf{E}(X_t^i - X_s^i) + X_s^i = X_s^i,$$
so each $X^i$ is a martingale. For $s < t$,
$$\mathbf{E}\{(X_t^i)^2 - (X_s^i)^2\,|\,\mathcal{F}_s\} = \mathbf{E}\{(X_t^i - X_s^i)^2\,|\,\mathcal{F}_s\} = \mathbf{E}(X_t^i - X_s^i)^2 = t - s.$$
Then $\langle X^i, X^i\rangle_t = t$, and the bracket of independent Brownian motions is zero.
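Numerically, the brackets behave as the theorem predicts: sums of products of increments over a fine partition approximate $\langle X^i, X^j\rangle_t = \delta_{ij}t$ for a standard two dimensional Brownian motion. A small sketch (our illustration, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(4)

# Sums of products of increments over a fine partition approximate the
# brackets <X^i, X^j>_1 = delta_ij for a standard 2d Brownian motion.
n_steps, t = 500_000, 1.0
dX = rng.normal(0.0, np.sqrt(t / n_steps), size=(n_steps, 2))
bracket = dX.T @ dX   # 2x2 matrix: entry (i, j) is sum_k dX^i_k dX^j_k
```

The diagonal entries are near $t = 1$ and the off-diagonal entry near 0, reflecting the independence of the components.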
Chapter 8
Stochastic Differential Equations
Let $m, d \in \mathbf{N}$ and let $\{B_t^i, i = 1, \ldots, m\}$ be independent 1-dimensional Brownian motions on a filtered probability space $(\Omega, \mathcal{F}, \mathcal{F}_t, P)$ with the standard assumptions on completeness and right continuity. A stochastic differential equation (of Markovian type) is of the form
$$dx_t = \sum_{i=1}^m \sigma_i(t, x_t)\, dB_t^i + b(t, x_t)\, dt.$$
The functions $\sigma_i, b: \mathbf{R}_+ \times \mathbf{R}^d \to \mathbf{R}^d$ are assumed to be Borel measurable. A solution $(x_t, 0 \le t < \zeta(x_0))$ is a stochastic process that satisfies the integral equation
$$x_t = x_0 + \sum_{i=1}^m \int_0^t \sigma_i(s, x_s)\, dB_s^i + \int_0^t b(s, x_s)\, ds,$$
and in particular the integrals must be defined. The value $x_0$ is the initial value and $\zeta(x_0): \Omega \to \mathbf{R}_+$ is the life time of $(x_t)$. For clarity, we write the above SDE, just this once, in the long form. The components in $\mathbf{R}^d$ of the $\mathbf{R}^d$ valued random variable and functions are $x_t = (x_t^1, \ldots, x_t^d)$, $\sigma_i = (\sigma_i^1, \ldots, \sigma_i^d)$, $b = (b^1, \ldots, b^d)$. Then
$$\begin{cases}
x_t^1 = x_0^1 + \displaystyle\sum_{k=1}^m \int_0^t \sigma_k^1(s, x_s)\, dB_s^k + \int_0^t b^1(s, x_s)\, ds \\[4pt]
x_t^2 = x_0^2 + \displaystyle\sum_{k=1}^m \int_0^t \sigma_k^2(s, x_s)\, dB_s^k + \int_0^t b^2(s, x_s)\, ds \\[2pt]
\quad\vdots \\[2pt]
x_t^d = x_0^d + \displaystyle\sum_{k=1}^m \int_0^t \sigma_k^d(s, x_s)\, dB_s^k + \int_0^t b^d(s, x_s)\, ds.
\end{cases}$$
We also introduce the short form of the SDE, in which we let $\sigma = (\sigma_1, \ldots, \sigma_m)$ and $B_t = (B_t^1, \ldots, B_t^m)$:
$$dx_t = \sigma(t, x_t)\, dB_t + b(t, x_t)\, dt.$$
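In computations, solutions of such SDEs are typically approximated by the Euler-Maruyama scheme, which replaces $dB_t$ and $dt$ by increments over a small step. A minimal sketch for $d = m = 1$ (the function names are ours, not from the notes):

```python
import numpy as np

def euler_maruyama(sigma, b, x0, T, n_steps, rng):
    """One sample path of dx = sigma(t, x) dB + b(t, x) dt by the
    Euler-Maruyama scheme; a sketch for scalar x and scalar noise."""
    dt = T / n_steps
    x = np.empty(n_steps + 1)
    x[0] = x0
    t = 0.0
    for k in range(n_steps):
        dB = rng.normal(0.0, np.sqrt(dt))
        x[k + 1] = x[k] + sigma(t, x[k]) * dB + b(t, x[k]) * dt
        t += dt
    return x
```

With $\sigma \equiv 0$ this reduces to the forward Euler method for the ODE $\dot{x} = b(t, x)$.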
To motivate the definition of solutions, recall the Central Limit Theorem: the law of the sum of $n$ independent random variables (satisfying certain integrability conditions), after scaling with respect to $n$, converges to a Gaussian distribution. Hence a system governed by $dx_t = b(x_t)\, dt$ subject to many small independent influences can be described by adding a Gaussian random variable. The variation with time is embedded in the Brownian motion, and the dependence of the randomness on the location is reflected in the diffusion coefficient $\sigma$.
What is Brownian motion for us? It is a martingale, its probability distribution is Gaussian, etc. All these are built on a chosen environment $(\Omega, \mathcal{F}, \mathcal{F}_t, P)$, which is an added tool/structure and which is supposed to represent the random perturbation faithfully. There is, however, no reason why one choice of such probability space is better than another.
One important question is whether the randomness in our solution is purely a function of the Brownian motion. It should be, but this is not necessarily true according to the definition below. There is also the notion of weak solutions, which describe only the statistical properties of the solution, i.e. its law, but not its sample pathwise properties.
8.0.1 Stochastic processes defined up to a random time
The stochastic process $X_t(\omega) := \frac{1}{2 - B_t(\omega)}$ is defined up to the first time $B_t(\omega)$ reaches 2. We denote this time by $\tau$. For any given time $t$, no matter how small, there is a set of paths of positive probability (measured with respect to the Wiener measure on $C([0, t]; \mathbf{R})$) which will have reached 2 by time $t$:
$$P(\tau \le t) = P\left(\sup_{s\le t} B_s \ge 2\right) = 2P(B_t \ge 2) = \sqrt{\frac{2}{\pi}}\int_{2/\sqrt{t}}^\infty e^{-\frac{y^2}{2}}\, dy.$$
This probability converges to zero as t → 0. We say that Xt is defined up to τ and
τ is called its life time or explosion time.
Let $\mathbf{R}^d \cup \{\Delta\}$ be the one point compactification of $\mathbf{R}^d$: a topological space whose open sets are the open sets of $\mathbf{R}^d$ plus sets of the form $(\mathbf{R}^d \setminus K) \cup \{\Delta\}$, where $K$ denotes a compact set. Given a process $(X_t, t < \tau)$ on $\mathbf{R}^d$ we define a process $(\hat{X}_t, t \ge 0)$ on $\mathbf{R}^d \cup \{\Delta\}$:
$$\hat{X}_t(\omega) = \begin{cases} X_t(\omega), & \text{if } t < \tau(\omega) \\ \Delta, & \text{if } t \ge \tau(\omega). \end{cases}$$
101
If $X_t$ is a continuous process on $\mathbf{R}^d$ then $\hat{X}_t$ is a continuous process on $\mathbf{R}^d \cup \{\Delta\}$. Define $\hat{W}(\mathbf{R}^d) \equiv C([0, T]; \mathbf{R}^d \cup \{\Delta\})$, whose elements satisfy: $Y_t(\omega) = \Delta$ if $Y_s(\omega) = \Delta$ for some $s \le t$. The last condition means that once a process enters the coffin state it does not return.
8.1 Lecture 24. Stochastic Differential Equations
For i = 1, 2, . . . , m, let σi , b : R+ × Rd → Rd be Borel measurable locally
bounded functions. Let Bt = (Bt1 , . . . , Btm ) be an Rm valued Brownian motion.
8.2 The definition
Definition 8.2.1 An $\mathcal{F}_t$ adapted stochastic process $(X_t)$ is an $(\mathcal{F}_t)$ Brownian motion if it is a Brownian motion and, for each $t \ge 0$, $(X_{t+s} - X_t, s \ge 0)$ is independent of $\mathcal{F}_t$.
Definition 8.2.2 A solution to the SDE
$$dx_t = \sigma(t, x_t)\, dB_t + b(t, x_t)\, dt \qquad (8.1)$$
consists of
(1) a filtered probability space $(\Omega, \mathcal{F}, \mathcal{F}_t, P)$;
(2) an $\mathcal{F}_t$ Brownian motion $B_t = (B_t^1, \ldots, B_t^m)$;
(3) an adapted continuous stochastic process $(x_t, t < \tau)$ in $\mathbf{R}^d$, where $\tau: \Omega \to \mathbf{R}_+ \cup \{\infty\}$ is measurable, such that for all stopping times $T < \tau$,
$$x_T = x_0 + \sum_{k=1}^m \int_0^T \sigma_k(s, x_s)\, dB_s^k + \int_0^T b(s, x_s)\, ds.$$
We say that the SDE holds on $\{t < \tau(\omega)\}$. If $\tau$ can be chosen to be $\infty$, this is equivalent to saying that for all $t \ge 0$,
$$x_t = x_0 + \sum_{k=1}^m \int_0^t \sigma_k(s, x_s)\, dB_s^k + \int_0^t b(s, x_s)\, ds.$$
If the functions $\sigma_i(t, x)$ and $b(t, x)$ do not depend on $t$, the SDE is said to be time homogeneous. We concentrate on time homogeneous SDEs (of Markovian type):
$$dx_t = \sum_{i=1}^m \sigma_i(x_t)\, dB_t^i + b(x_t)\, dt. \qquad (8.2)$$
Example 8.2.3 The following SDE is not Markovian:
$$dx_t = \left(\int_0^t (x_r)^2\, dr\right) dB_t.$$
8.3 Examples
Example 8.3.1 Consider $\dot{x}(t) = a x(t)$ on $\mathbf{R}$, where $a \in \mathbf{R}$. Let $x_0 \in \mathbf{R}$. Then $x(t) = x_0 e^{at}$ is a solution with initial value $x_0$, defined for all $t \ge 0$.
Let $\phi_t(x_0) = x_0 e^{at}$. Then $(t, x) \mapsto \phi_t(x)$ is continuous and $\phi_{t+s}(x_0) = \phi_t(\phi_s(x_0))$.
Example 8.3.2 (Linear Equation) Let $a, b \in \mathbf{R}$ and $d = m = 1$. Then
$$x(t) = x_0\, e^{a B_t - \frac{a^2}{2}t + bt}$$
solves
$$dx_t = a\, x_t\, dB_t + b\, x_t\, dt, \qquad x(0) = x_0.$$
The solution exists for all time. To prove this statement, define a function $f: \mathbf{R}^2 \to \mathbf{R}$ by $f(x, t) = x_0 e^{ax + (b - \frac{a^2}{2})t}$ and apply Itô's formula to $(B_t, t) \mapsto f(B_t, t)$. Since $f(0, 0) = x_0$, $\frac{\partial f}{\partial x} = af$, $\frac{\partial^2 f}{\partial x^2} = a^2 f$, and $\frac{\partial f}{\partial s} = (b - \frac{a^2}{2})f$,
$$\begin{aligned}
f(B_t, t) &= f(0, 0) + \int_0^t a f(B_s, s)\, dB_s + \frac{a^2}{2}\int_0^t f(B_s, s)\, d\langle B, B\rangle_s + \int_0^t \left(b - \frac{a^2}{2}\right) f(B_s, s)\, ds \\
&= x_0 + \int_0^t a f(B_s, s)\, dB_s + \int_0^t b f(B_s, s)\, ds.
\end{aligned}$$
This proves that $f(B_t, t)$ is indeed a solution.
Is this solution unique? The answer is yes: letting $y_t$ be a solution starting from the same point, one can compute and prove that $\mathbf{E}|x_t - y_t|^2 = 0$ for all $t$, which implies that $x_t = y_t$ a.s. for all $t$. However, we delay proving uniqueness until Theorem 8.4.5.
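The closed-form solution also gives a way to test numerical schemes. In the sketch below (our own illustration, with hypothetical parameter values), an Euler-Maruyama path driven by a fixed set of Brownian increments is compared with $x_0 e^{aB_t + (b - a^2/2)t}$ built from the same increments.

```python
import numpy as np

rng = np.random.default_rng(5)

# Compare the exact solution x_t = x0 exp(a B_t + (b - a^2/2) t) with an
# Euler-Maruyama path driven by the same Brownian increments.
a, b, x0, T, n = 0.5, 1.0, 2.0, 1.0, 50_000
dt = T / n
dB = rng.normal(0.0, np.sqrt(dt), size=n)
B = np.concatenate([[0.0], np.cumsum(dB)])

exact = x0 * np.exp(a * B + (b - a * a / 2) * np.arange(n + 1) * dt)

x = np.empty(n + 1)
x[0] = x0
for k in range(n):
    x[k + 1] = x[k] + a * x[k] * dB[k] + b * x[k] * dt
```

For a fine step the two paths stay close, consistent with the (pathwise) uniqueness discussed above.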
Example 8.3.3 (Additive noise) Let $a \in \mathbf{R}$ be a number and $b: \mathbf{R} \to \mathbf{R}$ a function; consider
$$dx_t = b(x_t)\, dt + a\, dB_t.$$
A special case is a perturbation of Newtonian mechanics. A particle of mass 1, subject to a force proportional to its own speed, satisfies $\dot{v}_t = -k v_t$. The following is called the Langevin equation:
$$dv_t(\omega) = -k v_t(\omega)\, dt + dB_t(\omega).$$
For each realisation of the noise (that means for each $\omega$), the solution is an Ornstein-Uhlenbeck process,
$$v_t(\omega) = v_0 e^{-kt} + \int_0^t e^{-k(t-r)}\, dB_r(\omega).$$
We check that the above process satisfies $-k\int_0^t v_s\, ds = v_t - v_0 - B_t$:
$$\begin{aligned}
-k\int_0^t v_s\, ds &= -k v_0 \int_0^t e^{-ks}\, ds - \int_0^t k e^{-ks}\left(\int_0^s e^{kr}\, dB_r\right) ds \\
&= -v_0 + v_0 e^{-kt} + \int_0^t \left(\int_0^s e^{kr}\, dB_r\right) d(e^{-ks}) \\
&= -v_0 + v_0 e^{-kt} + \left[e^{-ks}\int_0^s e^{kr}\, dB_r\right]_{s=0}^{s=t} - \int_0^t e^{-ks}\, d\left(\int_0^s e^{kr}\, dB_r\right) \\
&= -v_0 + v_0 e^{-kt} + e^{-kt}\int_0^t e^{kr}\, dB_r - B_t \\
&= -v_0 + v_t - B_t.
\end{aligned}$$
This proves that the Ornstein-Uhlenbeck process is a solution to the Langevin equation, with life time $\infty$.
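Since the stochastic integral $\int_0^t e^{-k(t-r)}\,dB_r$ has a deterministic integrand, $v_t$ is Gaussian with mean $v_0 e^{-kt}$ and variance $\int_0^t e^{-2k(t-r)}\,dr = (1 - e^{-2kt})/(2k)$. A Monte Carlo sketch (our illustration, with hypothetical parameter values) using the discretized stochastic integral:

```python
import numpy as np

rng = np.random.default_rng(6)

# The OU solution v_t = v0 e^{-kt} + int_0^t e^{-k(t-r)} dB_r is Gaussian
# with mean v0 e^{-kt} and variance (1 - e^{-2kt}) / (2k).
k, v0, t, n_steps, n_paths = 2.0, 1.0, 1.0, 500, 20_000
dt = t / n_steps
r = np.arange(n_steps) * dt                     # left endpoints of the grid
dB = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
v_t = v0 * np.exp(-k * t) + (np.exp(-k * (t - r)) * dB).sum(axis=1)

mean_err = abs(v_t.mean() - v0 * np.exp(-k * t))
var_true = (1.0 - np.exp(-2.0 * k * t)) / (2.0 * k)
```

Sampling $v_t$ this way is exact in law up to the discretization of the deterministic integrand.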
Example 8.3.4
(1) Small perturbation. Let $\epsilon > 0$ be a small number,
$$x_t^\epsilon = x_0 + \int_0^t b(x_s^\epsilon)\, ds + \epsilon B_t.$$
As $\epsilon \to 0$, $x_t^\epsilon \to x_t$. (Exercise)
(2) Let $y_t^\epsilon = y_0 + \int_0^t b(y_s^\epsilon)\, ds + \sqrt{\epsilon}\, W_t$. Assume that $b$ is bounded; as $\epsilon \to 0$, $y^\epsilon$ converges uniformly in time on any finite time interval $[0, t]$: $\mathbf{E}\sup_{0\le s\le t}(y_s^\epsilon - y_s^0) \to 0$. A more difficult question to ask is: does $\lim_{\epsilon\to 0} \dot{y}_t^\epsilon$, the derivative of $y_t^\epsilon$ in $\epsilon$, exist?
8.3.1 Pathwise Uniqueness and Uniqueness in Law
Example 8.3.5 Tanaka's SDE is $dx_t = \mathrm{sign}(x_t)\, dB_t$, where
$$\mathrm{sign}(x) = \begin{cases} -1, & \text{if } x \le 0 \\ 1, & \text{if } x > 0. \end{cases}$$
Take any probability space and any Brownian motion $(B_t)$. Define
$$W_t = \int_0^t \mathrm{sign}(B_s)\, dB_s.$$
This is a local martingale with quadratic variation $t$ and hence a Brownian motion. Furthermore,
$$\int_0^t \mathrm{sign}(B_s)\, dW_s = \int_0^t dB_s = B_t.$$
This means that the pair $(B_t, W_t)$ is a solution to the SDE $dB_t = \mathrm{sign}(B_t)\, dW_t$. Question: Is the solution unique?
1. Answer 1: No. Both $(B_t)$ and $(-B_t)$ are solutions.
2. Answer 2: Yes. Any solution $x_t = x_0 + \int_0^t \mathrm{sign}(x_s)\, dB_s$ is a martingale with quadratic variation $t$. By the Lévy Characterisation Theorem, the distribution of $(x_s - x_0, s \le t)$ is the Wiener measure on $C([0, t]; \mathbf{R})$. The probability distribution of the solutions to Tanaka's equation is therefore unique.
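Uniqueness in law is visible numerically (our sketch, not from the notes): an Euler scheme for Tanaka's SDE started at 0 produces samples whose law at time 1 is close to $N(0, 1)$, whichever measurable solution the scheme happens to track.

```python
import numpy as np

rng = np.random.default_rng(7)

# Euler scheme for Tanaka's SDE dx = sign(x) dB: the law of x_1 should be
# close to N(0, 1), even though pathwise uniqueness fails.
sign = lambda x: np.where(x > 0.0, 1.0, -1.0)
n_paths, n_steps, t = 50_000, 500, 1.0
dt = t / n_steps
x = np.zeros(n_paths)
for _ in range(n_steps):
    x = x + sign(x) * rng.normal(0.0, np.sqrt(dt), size=n_paths)

m, v = x.mean(), x.var()
```

Each increment has conditional mean 0 and variance $dt$ regardless of the sign, so the first two moments of $x_1$ match those of a standard Gaussian.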
Definition 8.3.6
• If the SDE has the following property: whenever $x_t$ and $\tilde{x}_t$ are two solutions with $x_0 = \tilde{x}_0$ almost surely, the law of $\{x_t : t \ge 0\}$ is the same as the law of $\{\tilde{x}_t : t \ge 0\}$, we say that uniqueness in law holds.
• We say pathwise uniqueness holds for an SDE if whenever $x_t$ and $\tilde{x}_t$ are two solutions on the same probability space $(\Omega, \mathcal{F}, \mathcal{F}_t, P)$ with the same Brownian motion $B_t$ and the same initial data ($x_0 = \tilde{x}_0$ a.s.), then $x_t = \tilde{x}_t$ for all $t \ge 0$ almost surely.
Uniqueness in law holds for Tanaka’s SDE. Pathwise uniqueness fails for Tanaka’s
SDE. Uniqueness in law implies the following stronger conclusion: whenever x0
and x̃0 have the same distribution, the corresponding solutions have the same law.
8.3.2 Maximal solution and Explosion Time
Definition 8.3.7 A solution $(x_t, t < \tau)$ of an SDE is a maximal solution if, whenever $(y_t, t < \bar\tau)$ is another solution on the same probability space with the same driving noise and with $x_0 = y_0$ a.s., then $\tau \ge \bar\tau$ a.s. We say that $\tau$ is the explosion time or the life time of $(x_t)$.
Let $\tau_N$ be the first time that $|x_t| \ge N$. Then from part (3) of the definition of a solution, for each $N$,
$$x_{t\wedge\tau_N} = x_0 + \sum_{k=1}^m \int_0^{t\wedge\tau_N} \sigma_k(x_s)\, dB_s^k + \int_0^{t\wedge\tau_N} b(x_s)\, ds.$$
Furthermore $\tau = \sup_N \tau_N$, and on $\{\tau < \infty\}$, $\lim_{N\to\infty} |x_{\tau_N}| = \infty$.
Example 8.3.8 Consider $\dot{x}(t) = (x(t))^2$, $x(0) = x_0$. Then
$$x(t) \equiv 0 \text{ if } x_0 = 0, \qquad x(t) = \frac{x_0}{1 - x_0 t} \text{ if } x_0 \ne 0,$$
is a solution starting from $x(0)$. For example, let $x_0 = 1$: then $(x(t) = \frac{1}{1-t}, t < 1)$ is the maximal solution, $\tau = 1$ is the life time of the solution, and $\lim_{t\to\tau} x(t) = \infty$.
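The blow-up is visible numerically (our sketch): forward Euler for $\dot{x} = x^2$, $x(0) = 1$, tracks $1/(1-t)$ and reaches any large threshold at a time close to the life time $\tau = 1$.

```python
# Forward-Euler integration of x' = x^2, x(0) = 1: the numerical solution
# tracks x(t) = 1/(1 - t), so it crosses a threshold x_max at a time close
# to 1 - 1/x_max, i.e. close to the blow-up time 1.
def euler_blowup_time(x0=1.0, dt=1e-5, x_max=1e6):
    t, x = 0.0, x0
    while x < x_max:
        x += dt * x * x
        t += dt
    return t  # approximate blow-up time

t_star = euler_blowup_time()
```

No step-size can push the numerical solution past $t = 1$ by more than a small discretization error; the solution simply ceases to exist there.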
Example 8.3.9 Consider the SDE on $\mathbf{R}^1$,
$$dx_t = (x_t)^2\, dB_t + (x_t)^3\, dt.$$
Let $\tau$ be the first time that $B_t$ hits 1. We prove that $x(t) = \frac{1}{1 - B_t}$ is a solution with initial value 1.
Let $f(x) = \frac{1}{1-x}$. On $|x| < 1$ the function $f$ is smooth, $f'(x) = \frac{1}{(1-x)^2} = f^2(x)$ and $f''(x) = \frac{2}{(1-x)^3} = 2f^3(x)$. Hence, by Itô's formula,
$$\frac{1}{1 - B_t} = 1 + \int_0^t \frac{1}{(1 - B_s)^2}\, dB_s + \int_0^t \frac{1}{(1 - B_s)^3}\, ds.$$
The functions $\sigma(x) = x^2$ and $b(x) = x^3$ are $C^1$ and so locally Lipschitz continuous. By the theorem below there is a unique strong solution, and all solutions agree before they exit the ball of radius 1. Since
$$\lim_{t\to\tau} \frac{1}{1 - B_t} = \infty,$$
$(\frac{1}{1-B_t}, t < \tau)$ is the maximal solution and $\tau$ is the explosion time.
The above SDE is equivalent to the following SDE in Stratonovich form, $dx_t = (x_t)^2 \circ dB_t$, i.e.
$$x_t = x_0 + \int_0^t (x_s)^2 \circ dB_s.$$
Definition 8.3.10 We say that the SDE is complete, or conservative or does not
explode, if for all initial values the maximal solution exists for all time.
8.3.3 Strong and Weak Solutions
Definition 8.3.11 A solution $(x_t, B_t)$ on $(\Omega, \mathcal{F}, \mathcal{F}_t, P)$ is said to be a strong solution if for each $t$, $x_t$ is adapted to the filtration of $B_t$. By a weak solution we mean one which is not strong.
The solution to Tanaka's SDE is not a strong solution. Note that if $(B_t, \sigma, b)$ are given, any solution we could conceivably 'construct' concretely from the trio is adapted to the filtration $\mathcal{F}_t^B$ of the driving noise $B_t$, which could be strictly smaller than $\mathcal{F}_t$. The solution constructed in Tanaka's example is not constructed from a given driving Brownian motion, which explains somewhat why the construction looks slightly peculiar. This construction is typical for an SDE that has the uniqueness in law property without the pathwise uniqueness property.
Proposition 8.3.12 Suppose a random function $Y$ with values in a metric space $M$ is adapted to the natural filtration of a stochastic process $x_s$, $0 \le s \le t$. Let $B([0, T]; M)$ be the space of Borel measurable functions from $[0, T]$ to $M$. Then there exists a function $F: B([0, T]; M) \to M$ such that $Y = F(x_\cdot)$.
If $x_t$ is a strong solution, it is adapted to the filtration of $B_t$, and so $x_t$ is a function of $(B_s, s \in [0, t])$. Denote this function by $F(x_0, \cdot): W_0^m \to \hat{W}^d$, so $x_t(\omega) = F_t(x_0, B(\omega))$. Here $W_0^m$ is the space of continuous functions from $\mathbf{R}_+$ to $\mathbf{R}^m$ starting from 0. In the case where $\Omega$ is the Wiener space $W_0^d$ equipped with the Wiener measure, $B_t(\omega) = \omega_t$ is a Brownian motion and we have $F_t(x, \omega)$.
Example 8.3.13 The ODE $\dot{x}_t = (x_t)^\alpha$, $\alpha < 1$, has two solutions from zero: the trivial solution 0 and $x_t = (1-\alpha)^{\frac{1}{1-\alpha}}\, t^{\frac{1}{1-\alpha}}$. Both notions of uniqueness fail.
Example 8.3.14 Dimension d = 1. Consider dxt = σ(xt )dWt . Suppose that σ is
Hölder continuous of order α, |σ(x) − σ(y)| ≤ c|x − y|α for all x, y. If α ≥ 1/2
then pathwise uniqueness holds for dxt = σ(xt )dWt . If α < 1/2 uniqueness no
longer holds. For α > 1/2 this goes back to Skorohod (62-65) and Tanaka(64).
The α = 1/2 case is credited to Yamada-Watanabe.
8.3.4 The Yamada-Watanabe Theorem
Although it is not clear at first glance what the relation between pathwise uniqueness and uniqueness in law is, it will become clear later that the former implies the latter.
The following beautiful, and somewhat surprising, theorem of Yamada and Watanabe states that the existence of a weak solution for any initial distribution, together with pathwise uniqueness, implies the existence of a unique strong solution.
Proposition 8.3.15 If pathwise uniqueness holds, then any solution is a strong solution and uniqueness in law holds.
Theorem 8.3.16 (The Yamada-Watanabe Theorem) Suppose that for each initial probability distribution there is a weak solution to the SDE and that pathwise uniqueness holds. Then there exists a Borel measurable map $F: \mathbf{R}^d \times W_0^m \to \hat{W}^d$ such that
1. For any $B_t$ and $x_0 \in \mathbf{R}^d$, $F_t(x_0, B)$ is a solution with the driving noise $B_t$.
2. If $x_t$ is a solution on a filtered probability space with driving noise $B_t$, then $x_t = F_t(x_0, B)$ a.s.
For a proof see Revuz-Yor for detail. First observation: if there is a solution, there is a solution on a canonical probability space. Given a solution $(x_t, B_t)$, let $\mu$ be its joint distribution. Consider $\hat{W}^d \times W^m$ with this measure and the canonical filtration $\mathcal{G}_t$ generated by the coordinate process. Let $\omega_t^1$ be the projection of the coordinate process to $\hat{W}^d$ and $w_t$ the projection to the second component $W^m$. Then $w_t$ is a Brownian motion and $\omega_t^1$ is the solution with driving Brownian motion $w_t$, due to the following lemma.
Lemma 8.3.17 Let $f, g$ be locally bounded predictable processes (measurable with respect to the filtration generated by left continuous processes), and $B, W$ continuous semi-martingales. If $(f, B) = (g, W)$ in distribution, then $(f, B, \int_0^t f_s\, dB_s) = (g, W, \int_0^t g_s\, dW_s)$ in distribution.
Second observation: given two solutions on two probability spaces, we can build them on the same probability space $\hat{W}^d \times \hat{W}^d \times W^m$. Let $Q^1, Q^2$ be the joint distributions of the two solutions $(x_t^1, B_t^1)$ and $(x_t^2, B_t^2)$; they are measures on $\hat{W}^d \times W^m$. Let $\tilde{Q}^i$, $i = 1, 2$, be the regular conditional distribution of $\omega^i$ given $\omega$, so that
$$Q^i(d\omega^i, d\omega) = \tilde{Q}^i(\omega, d\omega^i)\, Q(d\omega).$$
Then $\tilde{Q}^1(\omega, d\omega^1) \otimes \tilde{Q}^2(\omega, d\omega^2)\, Q(d\omega)$ induces a measure on $\hat{W}^d \times \hat{W}^d \times W^m$ such that the projection to the third component of the coordinate process, $w_t(\omega_1, \omega_2, \omega) = \omega_t$, is an $\mathcal{F}_t$ Brownian motion, where $\mathcal{F}_t$ is the standard filtration on the Wiener space on $\mathbf{R}^{2d+m}$. Then $w_t^1$ and $w_t^2$ are two solutions with the same driving Brownian motion. By pathwise uniqueness they have the same paths. Since they are equal and independent conditioned on $w_t$, they are functions of $w_t$ and their conditional laws are delta measures.
8.3.5 Strong Completeness, Flow
Suppose that pathwise uniqueness holds and the SDE does not explode.
Definition 8.3.18 If for each point $x$ there is a solution $F_t(x, \omega)$ to
$$dx_t = \sum_i \sigma_i(x_t) \circ dB_t^i + b(x_t)\, dt,$$
and there is a version of $F_t(x, \omega)$ such that $(t, x) \mapsto F_t(x)$ is continuous, we say that the SDE is strongly complete.
Definition 8.3.19 Let $\xi$ be $\mathcal{F}_s$-measurable. We denote by $F_{s,t}(\xi)$ the solution to
$$x_t = \xi + \sum_i \int_s^t \sigma_i(x_r)\, dB_r^i + \int_s^t b(x_r)\, dr.$$
For simplification let $F_t = F_{0,t}$.
Definition 8.3.20 Let S be a stopping time we define the shift operator: θS B =
BS+· − BS .
The process (θS B)t := BS+t −BS is an F·+S BM. If (Bt ) is the canonical process
on the Wiener space, this is θS (ω)(t) = ω(S + t) − ω(S).
Theorem 8.3.21 Let $0 \le S \le T$ be stopping times and assume there is a unique global strong solution $(F_t(\cdot, B), t \ge 0)$ to the SDE. Then the flow property holds:
$$F_{S,T}(F_S(x, B), B) = F_T(x, B). \qquad (8.3)$$
And the cocycle property holds:
$$F_{T-S}(F_S(x, \omega), \theta_S(\omega)) = F_T(x, \omega). \qquad (8.4)$$
Proof The flow property follows from the pathwise uniqueness of the solution.
8.3.6 Markov Process and its semi-group
Definition 8.3.22 A family of $\mathcal{F}_t$ adapted stochastic processes $X_t$ is a Markov process if for all real valued bounded Borel measurable functions $f$ and for all $0 \le s \le t$,
$$\mathbf{E}\{f(X_t)\,|\,\mathcal{F}_s\} = \mathbf{E}\{f(X_t)\,|\,X_s\}.$$
It is strong Markov if for all stopping times $\tau$ and $t \ge 0$, $\mathbf{E}\{f(X_{\tau+t})\,|\,\mathcal{F}_\tau\} = \mathbf{E}\{f(X_{\tau+t})\,|\,X_\tau\}$.
The remainder of this subsection was not covered in 2012-2013.
Definition 8.3.23 Let $(E, \mathcal{B}(E))$ be a measurable space. A function $P_{s,t}(x, A)$ defined for all $0 \le s \le t < \infty$, $x \in E$ and $A \in \mathcal{B}(E)$ is a Markov transition function if:
1. For all $0 \le s \le t$ and $x \in E$, $A \mapsto P_{s,t}(x, A)$ is a probability measure on $\mathcal{B}(E)$.
2. For all $A \in \mathcal{B}(E)$ and $0 \le s \le t$, $x \mapsto P_{s,t}(x, A)$ is bounded and Borel measurable.
3. For all $0 \le s \le t \le u$, all $x \in E$, and $A \in \mathcal{B}(E)$,
$$P_{s,u}(x, A) = \int_{y\in E} P_{s,t}(x, dy)\, P_{t,u}(y, A).$$
This equation is the Chapman-Kolmogorov equation.
If $P_{s,t}$ depends only on the difference $t - s$, we define $P_{t-s} = P_{s,t}$; equivalently, $P_t = P_{u, u+t}$ for any $u \ge 0$.
Definition 8.3.24 A Markov process $(X_t)$ on $E$ has Markov transition function $P_{s,t}$ if for $0 \le s \le t$ and all bounded measurable $f: (E, \mathcal{B}(E)) \to \mathbf{R}$,
$$\mathbf{E}\{f(X_t)\,|\,X_s = x\} = \int_y f(y)\, P_{s,t}(x, dy).$$
If the transition function is time homogeneous, the Markov process is a time homogeneous Markov process. For simplicity we only consider homogeneous Markov processes. We define a family of linear operators on the space of bounded measurable functions on $E$:
$$T_t f(x) = \int_y f(y)\, P_t(x, dy).$$
The Chapman-Kolmogorov equation implies that $(T_t)$ is a semi-group of linear operators: $T_0 f = f$ and $T_{s+t} f = T_s T_t f$.
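For Brownian motion the transition function has Gaussian density $p_t(x, y)$, and the semi-group property is exactly the Chapman-Kolmogorov equation $\int p_s(x, z)\, p_{t-s}(z, y)\, dz = p_t(x, y)$, which can be checked by quadrature (our sketch, with arbitrarily chosen values of $s$, $t$, $x$, $y$):

```python
import math

# Heat-kernel check of Chapman-Kolmogorov: integrate p_s(x, z) p_{t-s}(z, y)
# over z on a truncated grid and compare with p_t(x, y).
def p(t, x, y):
    """Brownian transition density: N(y; mean x, variance t)."""
    return math.exp(-(y - x) ** 2 / (2.0 * t)) / math.sqrt(2.0 * math.pi * t)

s, t, x, y = 0.4, 1.0, 0.0, 0.7
dz = 0.001
conv = sum(p(s, x, -10.0 + i * dz) * p(t - s, -10.0 + i * dz, y)
           for i in range(20_000)) * dz
```

The truncation to $[-10, 10]$ is harmless here because the Gaussian tails are negligible on that scale.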
Remark 8.3.25 If $(X_t)$ is a Markov process we may be tempted to define $\{T_{s,t}, t \ge s \ge 0\}$ on bounded measurable functions by
$$T_{s,t} f(x) = \mathbf{E}\{f(X_t)\,|\,X_s = x\}.$$
Indeed this is a good proposal, but it needs some thought. The conditional expectation $\mathbf{E}\{f(X_t)\,|\,X_s\}$ is defined up to a set of measure zero, and this set of measure zero may differ when the function $f$ is changed. For the object $\mathbf{E}\{f(X_t)\,|\,X_s = x\}$ to be well defined we need to consider matters related to regular conditional probabilities; such considerations are in general quite messy. If, however, we begin with a transition function, there is no such problem.
A real valued Markov process $X_t$ is a diffusion process if its transition probability satisfies certain properties. Roughly speaking, given $x_s = x$, $x_{s+h} - x_s \approx b(s, x)h + \delta_x + o(h)$ for some functions $b, \sigma$ and a random variable $\delta_x$ satisfying $\mathbf{E}\,\delta_x = o(h)$ and $\mathbf{E}(\delta_x)^2 = \sigma(s, x)h + o(h)$.
We now review a number of properties of the Markov transition function. Let $E = \mathbf{R}$ for simplicity. If $P_{s,t}(x, dy)$ is absolutely continuous with respect to $dy$, its density is denoted by $p(s, x, t, y)$.
Theorem 8.3.26 If the transition density $p(s, x, t, y)$ of a diffusion process is measurable in all its arguments, then for each $s < u < t$, each $x$, and almost all $y$,
$$\int p(s, x, u, z)\, p(u, z, t, y)\, dz = p(s, x, t, y).$$
Theorem 8.3.27 Assume that the transition density p(s, x, t, y) of a diffusion process satisfies:

1. For 0 ≤ s < t with t − s > δ > 0, p(s, x, t, y) is continuous and bounded in s, t, x.

2. p is twice differentiable in x and once differentiable in s.

Then for 0 < s < t, p(s, x, t, y) satisfies the backward Kolmogorov equation:

    ∂p/∂s (s, x, t, y) = −b(s, x) ∂p/∂x (s, x, t, y) − (1/2) σ(s, x) ∂²p/∂x² (s, x, t, y).

In the backward Kolmogorov equation we differentiate in the backward variables (s, x). An integration by parts formula gives:
Theorem 8.3.28 Assume that the transition density p(s, x, t, y) of a diffusion process is such that the partial derivatives ∂p/∂t (s, x, t, y), ∂p/∂y (s, x, t, y), ∂²p/∂y² (s, x, t, y) exist. Assume that ∂σ/∂x (s, x) and ∂b/∂x (s, x) exist. Then Kolmogorov's forward equation (the Fokker-Planck equation) holds for s < t:

    ∂p/∂t (s, x, t, y) = −∂/∂y (b(t, y) p(s, x, t, y)) + (1/2) ∂²/∂y² (σ(t, y) p(s, x, t, y)).
8.3.7  The semi-group associated to the SDE

Let F_t(x) be the solution to the SDE with initial value x and life time τ(x). We define a family of linear operators {P_t, t ≥ 0} on bounded measurable functions by

    P_t f(x) = E[f(F_t(x)) 1_{t<τ(x)}].

The flow property gives the semi-group property of the family of operators: P_{t+s} = P_t P_s, P_0 = Id. It is easier to work first with the assumption that there is no explosion; nevertheless it is worth remembering that we do not need to assume non-explosion.
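In simple cases P_t f(x) = E f(F_t(x)) can be approximated by Monte Carlo simulation of the flow. The sketch below (an illustration under the assumption of no explosion; the Euler-Maruyama discretization is my addition, not from the text) checks it against the Ornstein-Uhlenbeck process, for which P_t f(x) = e^{−t} x when f(x) = x:

```python
import math, random

def euler_maruyama(x0, b, sigma, t, n_steps, rng):
    # one Euler-Maruyama path of dx = b(x) dt + sigma(x) dB
    dt = t / n_steps
    x = x0
    for _ in range(n_steps):
        x += b(x) * dt + sigma(x) * rng.gauss(0.0, math.sqrt(dt))
    return x

def P_t(f, x0, b, sigma, t, n_paths=5000, n_steps=100, seed=0):
    # Monte Carlo estimate of P_t f(x0) = E f(F_t(x0)), assuming no explosion
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_paths):
        total += f(euler_maruyama(x0, b, sigma, t, n_steps, rng))
    return total / n_paths

# Ornstein-Uhlenbeck check: dx = -x dt + dB has P_t f(x) = e^{-t} x for f(x) = x
est = P_t(lambda x: x, 1.0, lambda x: -x, lambda x: 1.0, t=1.0)
print(est)  # close to e^{-1}
```

The estimate carries both a sampling error of order n_paths^{−1/2} and a discretization bias of order n_steps^{−1}.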
Definition 8.3.29 An F_t adapted stochastic process (X_t) is a Markov process with Markov semi-group P_t if for all bounded Borel measurable f and 0 ≤ s ≤ t,

    E{f(X_t) | F_s} = P_{t−s} f(X_s).

It is strong Markov if for every stopping time S, E{f(X_{S+t}) | F_S} = P_t f(X_S).
Example 8.3.30 The Brownian motion (x + B_t) is a strong Markov process. Let x ∈ R^n and f : R^n → R bounded Borel measurable, and define

    P_t f(x) = E f(x + B_t).    (8.5)

We have, for a stopping time S,

    E{f(x + B_{t+S}) | F_S} = E{f(x + B_S + (B_{t+S} − B_S)) | F_S} = P_t f(x + B_S).
Given the existence of a strong solution and pathwise uniqueness, the solution F_t(x_0, ω) to the SDE

    dx_t = σ(t, x_t) dB_t + b(t, x_t) dt

is a strong Markov process. In fact the cocycle property implies the Markov property, and the cocycle property with stopping times implies the strong Markov property:

    E f(F_{S+t}(x, B)) = E E{f(F_t(F_S(x), θ_S(B))) | F_S} = E f(F_t(−, θ_S(B)))(F_S(x)) = P_t f(F_S(x)).
8.3.8  The Infinitesimal Generator for the SDE

Let f : R^d → R be C². We define a linear operator L : C²(R^d) → C(R^d) by

    Lf(x) = (1/2) Σ_{i,j=1}^d ( Σ_{k=1}^m σ_k^i(x) σ_k^j(x) ) ∂²f/∂x_i∂x_j (x) + Σ_{j=1}^d b_j(x) ∂f/∂x_j (x).

We also denote the last term in Lie derivative notation:

    L_b f(x) = Σ_{j=1}^d b_j(x) ∂f/∂x_j (x) ≡ ⟨∇f, b⟩.

The above is abbreviated as

    Lf = (1/2) Σ_{k=1}^m L_{σ_k} L_{σ_k} f + L_b f.
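For d = m = 1 the generator reduces to Lf = ½σ(x)² f''(x) + b(x) f'(x), which can be evaluated by finite differences. A minimal sketch (my illustration; the step size h and the test functions are assumptions):

```python
def generator(f, sigma, b, h=1e-4):
    # finite-difference version of L f = (1/2) sigma(x)^2 f''(x) + b(x) f'(x)
    # (case d = m = 1, where sigma^T sigma reduces to sigma^2)
    def Lf(x):
        d1 = (f(x + h) - f(x - h)) / (2 * h)            # central first difference
        d2 = (f(x + h) - 2 * f(x) + f(x - h)) / (h * h)  # central second difference
        return 0.5 * sigma(x) ** 2 * d2 + b(x) * d1
    return Lf

# Brownian motion (sigma = 1, b = 0) and f(x) = x^2: L f = 1 everywhere
Lf = generator(lambda x: x * x, lambda x: 1.0, lambda x: 0.0)
print(Lf(0.7))  # close to 1.0
```

For the Ornstein-Uhlenbeck drift b(x) = −x the same code gives Lf(x) = 1 − 2x² for f(x) = x², matching the hand computation.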
Proposition 8.3.31 Let f : R^d → R be a C² function and (x_t, t < τ) a solution to the SDE E(σ, b). Then for any stopping time T < τ,

    f(x_T) = f(x_0) + ∫_0^T (df)_{x_s}(σ(x_s) dB_s) + ∫_0^T Lf(x_s) ds.

The linear operator L, from C² to the set of bounded measurable functions, is called the infinitesimal generator of the SDE E(σ, b). Here we adopted the notation

    ∫_0^T (df)_{x_s}(σ(x_s) dB_s) = Σ_{k=1}^m ∫_0^T (df)_{x_s}(σ_k(x_s)) dB_s^k = Σ_{k=1}^m Σ_{j=1}^d ∫_0^T ∂f/∂x_j (x_s) σ_k^j(x_s) dB_s^k.
Let σσ^T denote the d × d square matrix whose (i, j)-th entry is Σ_{k=1}^m σ_k^i(x) σ_k^j(x); it is positive semi-definite.
Let t > 0. The term

    M_t := f(x_t) − f(x_0) − ∫_0^t Lf(x_s) ds

is seen to be a local martingale. (This leads to Stroock-Varadhan's wonderful martingale method for the existence of weak solutions.)
We say that f is in the domain of the generator L if

    lim_{t→0} (P_t f − f)/t = Lf.

For suitable functions f, and given suitable conditions on σ, b (e.g. BC²),

    (d/dt) P_t f = L(P_t f).

The equation ∂u/∂t = Lu is called Kolmogorov's forward equation. The equation ∂u/∂t = −Lu is called Kolmogorov's backward equation.
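The limit (P_t f − f)/t → Lf can be observed directly for Brownian motion, using the classical Gaussian identity E sin(x + B_t) = e^{−t/2} sin(x) (this closed form is my assumption for the illustration; it follows from the characteristic function of the normal distribution):

```python
import math

def P_t_sin(t, x):
    # heat semigroup of Brownian motion applied to sin:
    # P_t sin(x) = E sin(x + B_t) = e^{-t/2} sin(x)
    return math.exp(-t / 2.0) * math.sin(x)

x = 0.8
Lf = -0.5 * math.sin(x)   # L f = (1/2) f'' for f = sin
for t in (1e-1, 1e-2, 1e-3):
    print(t, (P_t_sin(t, x) - math.sin(x)) / t)  # tends to Lf as t -> 0
```

The difference quotient approaches −½ sin(x) at rate O(t), consistent with f = sin being in the domain of the generator.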
8.4  Existence and Uniqueness Theorems

8.4.1  Gronwall's Lemma

Gronwall's Lemma, also called Gronwall's inequality, is a comparison theorem for one dimensional ordinary differential equations. For g ∈ L¹ and any a ∈ {−∞} ∪ R, the function x ↦ ∫_a^x g(t) dt is absolutely continuous and of finite variation on R. Furthermore (d/dx) ∫_a^x g(t) dt = g(x) for almost every x. (Only local integrability is required if we work on a finite interval.)

In the following proposition we assume that ∫_{t_0}^t β(s) f(s) ds exists.
Proposition 8.4.1 Let t_0 ≥ 0 and let β ∈ L¹_loc be non-negative. Suppose that f(t) ≤ C(t) + ∫_{t_0}^t β(s) f(s) ds. Then

    f(t) ≤ C(t) + ∫_{t_0}^t C(s) β(s) e^{∫_s^t β(r) dr} ds.
Proof First,

    (d/ds) [ e^{−∫_{t_0}^s β_r dr} ∫_{t_0}^s β(r) f(r) dr ] = β(s) e^{−∫_{t_0}^s β_r dr} ( f(s) − ∫_{t_0}^s β(r) f(r) dr ) ≤ C(s) β(s) e^{−∫_{t_0}^s β_r dr}.
Integrating the above from t_0 to t:

    e^{−∫_{t_0}^t β_r dr} ∫_{t_0}^t β(r) f(r) dr ≤ ∫_{t_0}^t C(s) β(s) e^{−∫_{t_0}^s β_r dr} ds.
Hence, from the assumption f(t) ≤ C(t) + ∫_{t_0}^t β(s) f(s) ds,

    f(t) ≤ C(t) + e^{∫_{t_0}^t β_r dr} ∫_{t_0}^t C(s) β(s) e^{−∫_{t_0}^s β_r dr} ds = C(t) + ∫_{t_0}^t C(s) β(s) e^{∫_s^t β_r dr} ds.
Theorem 8.4.2 (Gronwall's Inequality) Let t_0 ≥ 0. Suppose that β is non-negative and locally integrable and that C'(t) exists and is locally integrable. Then from

    f(t) ≤ C(t) + ∫_{t_0}^t β(s) f(s) ds

we conclude:

(1)  f(t) ≤ C(t_0) e^{∫_{t_0}^t β(r) dr} + ∫_{t_0}^t C'(s) e^{∫_s^t β(r) dr} ds.

(2) If C(t) is a constant C, then C'(t) = 0 and

    f(t) ≤ C e^{∫_{t_0}^t β_r dr}.
Proof By the previous Proposition,

    f(t) ≤ C(t) + ∫_{t_0}^t C(s) β(s) e^{∫_s^t β_r dr} ds.

To the right hand side we apply integration by parts:

    C(t) + ∫_{t_0}^t C(s) β(s) e^{∫_s^t β_r dr} ds
      = C(t) − e^{∫_{t_0}^t β_r dr} ∫_{t_0}^t C(s) (d/ds) e^{−∫_{t_0}^s β_r dr} ds
      = C(t) − e^{∫_{t_0}^t β_r dr} ( C(t) e^{−∫_{t_0}^t β_r dr} − C(t_0) − ∫_{t_0}^t C'(s) e^{−∫_{t_0}^s β_r dr} ds )
      = C(t_0) e^{∫_{t_0}^t β_r dr} + ∫_{t_0}^t C'(s) e^{∫_s^t β_r dr} ds.
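A quick numerical sanity check of the constant-C case (my illustration; the particular test function f(t) = C e^{βt/2}, which satisfies the integral hypothesis, is an assumption chosen for the example):

```python
import math

def gronwall_bound(C, beta, t):
    # constant-C case of Gronwall's inequality: f(t) <= C e^{beta t}
    return C * math.exp(beta * t)

C, beta = 2.0, 1.5
f = lambda t: C * math.exp(beta * t / 2)   # satisfies the integral hypothesis

def hypothesis_rhs(t, n=2000):
    # trapezoidal approximation of C + int_0^t beta f(s) ds
    h = t / n
    acc = sum((0.5 if i in (0, n) else 1.0) * beta * f(i * h) for i in range(n + 1))
    return C + acc * h

for t in (0.5, 1.0, 2.0):
    assert f(t) <= hypothesis_rhs(t)            # hypothesis of the lemma
    assert f(t) <= gronwall_bound(C, beta, t)   # conclusion of the lemma
print("Gronwall bound holds at the sample points")
```

Here the exact value of the integral side is 2C e^{βt/2} − C, so both inequalities hold with room to spare; the code only confirms this on a grid.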
8.4.2  Main Theorems

For a matrix A = (a_{j,k}) define ‖A‖ = ( Σ_{j,k} |a_{j,k}|² )^{1/2}.
Definition 8.4.3 We say that f : R^d → R^d is Lipschitz continuous with Lipschitz constant K if

    |f(x) − f(y)| ≤ K|x − y|,    x, y ∈ R^d.

It is locally Lipschitz continuous if for each N there is C_N > 0 such that

    |f(x) − f(y)| ≤ C_N |x − y|,    x, y ∈ B_N.

A vector valued function f = (f_1, ..., f_n) is Lipschitz continuous if and only if each f_i is Lipschitz continuous. A differentiable function f whose derivative df, equivalently whose partial derivatives ∂f/∂x_i, is bounded is Lipschitz continuous. Any C¹ function is locally Lipschitz continuous.
Theorem 8.4.4 Suppose that the coefficients are locally Lipschitz continuous. For
each initial value x0 there is a unique strong solution (xt , t < τ ). Furthermore
limt↑τ |xt | = ∞ on {ω : τ (ω) < ∞}.
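The blow-up at a finite life time τ in Theorem 8.4.4 can be seen already in the deterministic special case σ = 0, b(x) = x² (my illustrative example, not from the text): the ODE dx/dt = x² has solution x(t) = x_0/(1 − x_0 t), which leaves every ball before τ = 1/x_0.

```python
def explosion_time_euler(x0, threshold=1e6, dt=1e-5):
    # deterministic special case sigma = 0, b(x) = x^2: the ODE dx/dt = x^2
    # has solution x(t) = x0/(1 - x0 t), blowing up at tau = 1/x0;
    # integrate by Euler steps until x exceeds a large threshold
    x, t = x0, 0.0
    while x < threshold:
        x += x * x * dt
        t += dt
    return t

print(explosion_time_euler(2.0))  # close to 1/x0 = 0.5
```

The time to pass any large threshold converges to 1/x_0 as the threshold grows, illustrating lim_{t↑τ} |x_t| = ∞ on {τ < ∞}.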
8.4.3  Global Lipschitz case

Below we will need Hölder's inequality: for p, q ∈ (1, ∞) with 1/p + 1/q = 1,

    ∫_0^t |f(s) g(s)| ds ≤ ( ∫_0^t |f(s)|^p ds )^{1/p} ( ∫_0^t |g(s)|^q ds )^{1/q}.
Theorem 8.4.5 Suppose that the coefficients are Lipschitz continuous. For each
initial value x0 there is a unique strong solution (xt , t ≥ 0). Furthermore the
solution is sample continuous.
Proof Let us prove the Lipschitz case with d = 1, m = 1:

    x_t = x_0 + ∫_0^t σ(x_s) dB_s + ∫_0^t b(x_s) ds.
Step 1. Let x_t, y_t be two solutions. Then

    |x_t − y_t|² ≤ 2( ∫_0^t (σ(x_s) − σ(y_s)) dB_s )² + 2( ∫_0^t (b(x_s) − b(y_s)) ds )²
      ≤ 2( ∫_0^t (σ(x_s) − σ(y_s)) dB_s )² + 2t ∫_0^t (b(x_s) − b(y_s))² ds.

Apply the Itô isometry to the stochastic integral to see that

    E|x_t − y_t|² ≤ 2E ∫_0^t (σ(x_s) − σ(y_s))² ds + 2t ∫_0^t E(b(x_s) − b(y_s))² ds.

Since |σ(x) − σ(y)| ≤ K|x − y| and |b(x) − b(y)| ≤ K|x − y|,

    E|x_t − y_t|² ≤ 2K²(1 + t) ∫_0^t E|x_s − y_s|² ds.

By Gronwall's inequality, E|x_t − y_t|² = 0 for all t, and x_t = y_t almost surely.
Step 2. Existence. Fix T > 1. Define, for all t ∈ [0, T],

    x_t^{(0)} = x_0,
    x_t^{(1)} = x_0 + ∫_0^t σ(x_0) dB_s + ∫_0^t b(x_0) ds,
    ...
    x_t^{(n+1)} = x_0 + ∫_0^t σ(x_s^{(n)}) dB_s + ∫_0^t b(x_s^{(n)}) ds.    (8.6)
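The Picard iteration (8.6) can be carried out on a fixed, discretized Brownian path (a sketch of mine, with left-point Riemann sums standing in for the Itô integrals; the coefficients below are assumed Lipschitz examples). On a grid of n_steps points, the n_steps-th iterate already coincides with the Euler solution of the discretized equation:

```python
import math, random

def picard_iterates(x0, sigma, b, t, n_steps, n_iter, seed=0):
    # discretized Picard iteration x^(n+1) = x0 + int sigma(x^(n)) dB + int b(x^(n)) ds
    # on one fixed Brownian path, using left-point (Ito) Riemann sums
    rng = random.Random(seed)
    dt = t / n_steps
    dB = [rng.gauss(0.0, math.sqrt(dt)) for _ in range(n_steps)]
    x = [x0] * (n_steps + 1)        # x^(0) is the constant path x0
    iterates = [x]
    for _ in range(n_iter):
        new, acc = [x0], x0
        for i in range(n_steps):
            acc += sigma(x[i]) * dB[i] + b(x[i]) * dt
            new.append(acc)
        iterates.append(new)
        x = new
    return iterates, dB

# Lipschitz example: dx = 0.5 x dB - 0.2 x dt
its, dB = picard_iterates(1.0, lambda x: 0.5 * x, lambda x: -0.2 * x,
                          1.0, 40, 40)
```

Because the i-th value of each iterate only depends on earlier values of the previous iterate, the discrete scheme converges exactly in at most n_steps iterations, mirroring the factorial decay D^k/k! in the continuous proof.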
2A. We prove that for all t > 0,
    E sup_{s≤t} |x_s^{(n+1)}|² < ∞.
This holds for n = 0. We assume that this holds for n and prove it for n + 1.
    E sup_{s≤t} |x_s^{(n+1)}|² ≤ 4|x_0|² + 4E sup_{s≤t} | ∫_0^s σ(x_r^{(n)}) dB_r |² + 4E sup_{s≤t} ( ∫_0^s |b(x_r^{(n)})| dr )².
Below we apply the Burkholder-Davis-Gundy inequality to the local martingale term. There is a constant C such that

    E sup_{s≤t} | ∫_0^s σ(x_r^{(n)}) dB_r |² ≤ C E ⟨ ∫_0^· σ(x_r^{(n)}) dB_r , ∫_0^· σ(x_r^{(n)}) dB_r ⟩_t = C E ∫_0^t σ²(x_r^{(n)}) dr.
Apply Hölder’s inequality gives
s
Z
|b(x(n)
r )|dr
sup
s≤t
2
Z
≤t
t
|b(xr(n) )|2 dr.
0
0
Since σ, b are Lipschitz continuous, there is C_2 > 0 such that

    |σ(x)| ≤ C_2(1 + |x|),    |b(x)| ≤ C_2(1 + |x|).

Hence

    E sup_{s≤t} |x_s^{(n+1)}|² ≤ 4|x_0|² + 4C E ∫_0^t σ²(x_r^{(n)}) dr + 4t E ∫_0^t |b(x_r^{(n)})|² dr
      ≤ 4|x_0|² + 8CC_2² E ∫_0^t (1 + |x_r^{(n)}|²) dr + 8tC_2² E ∫_0^t (1 + |x_r^{(n)}|²) dr < ∞.
2B. For any k ∈ N and t > 0,

    E sup_{s≤t} |x_s^{(k+1)} − x_s^{(k)}|²
      ≤ 2E sup_{s≤t} ( ∫_0^s (σ(x_r^{(k)}) − σ(x_r^{(k−1)})) dB_r )² + 2E sup_{s≤t} ( ∫_0^s (b(x_r^{(k)}) − b(x_r^{(k−1)})) dr )²
      ≤ 2C ∫_0^t E(σ(x_s^{(k)}) − σ(x_s^{(k−1)}))² ds + 2t ∫_0^t E(b(x_s^{(k)}) − b(x_s^{(k−1)}))² ds
      ≤ (2CK² + 2tK²) ∫_0^t E|x_s^{(k)} − x_s^{(k−1)}|² ds
      ≤ (2CK² + 2tK²) ∫_0^t E sup_{r≤t_1} |x_r^{(k)} − x_r^{(k−1)}|² dt_1
      ≤ (2CK² + 2tK²)² ∫_0^t dt_1 ∫_0^{t_1} E sup_{r≤t_2} |x_r^{(k−1)} − x_r^{(k−2)}|² dt_2
      ≤ ...
      ≤ (2CK² + 2tK²)^k ∫_0^t dt_1 ∫_0^{t_1} ... ∫_0^{t_{k−1}} E sup_{r≤t_k} |x_r^{(1)} − x_0|² dt_k
      ≤ E sup_{r≤t} (x_r^{(1)} − x_0)² (2CK² + 2tK²)^k (t^k / k!)
      = C_1 D^k / k!.

Here C_1 = E sup_{r≤t} (x_r^{(1)} − x_0)² and D = (2CK² + 2tK²) t.
2C. Let n > m. Then

    |x_s^{(n)} − x_s^{(m)}| ≤ Σ_{k=m}^{n−1} |x_s^{(k+1)} − x_s^{(k)}|.

Hence {x_s^{(n)}(ω)} is a Cauchy sequence if Σ_{k=1}^∞ |x_s^{(k+1)}(ω) − x_s^{(k)}(ω)| < ∞. By 2B,

    E Σ_{k=1}^∞ sup_{s≤t} |x_s^{(k+1)} − x_s^{(k)}| ≤ Σ_{k=1}^∞ ( E sup_{s≤t} |x_s^{(k+1)} − x_s^{(k)}|² )^{1/2} ≤ Σ_{k=1}^∞ ( C_1 D^k / k! )^{1/2} < ∞.
2D. Let x_t(ω) = lim_{n→∞} x_t^{(n)}(ω). The process is continuous in time by the uniform convergence. Take n → ∞ in equation (8.6) and note that

    ∫_0^t σ(x_s^{(n)}) dB_s → ∫_0^t σ(x_s) dB_s

in L². There is an almost surely convergent subsequence. Hence

    x_t = x_0 + ∫_0^t σ(x_s) dB_s + ∫_0^t b(x_s) ds.

Step 3. The solution is certainly a function of (B_t), as each x_t^{(n)} is, and is hence a strong solution; it is defined for all time.

8.4.4  Locally Lipschitz Continuous Case
Let (x_t, t < τ) be a continuous adapted stochastic process which is a maximal solution to the SDE E(σ, b) with life time τ. Let

    τ^N(ω) = inf{t : |x_t(ω)| > N}.
Theorem 8.4.6 Assume that σ, b are locally Lipschitz continuous. Then there is a unique strong solution (x_t, t < τ) which is a maximal solution.

Proof Write b = (b_1, ..., b_d) in components. For each N, let b_j^N, j = 1, ..., d, be a globally Lipschitz continuous function with b_j^N = b_j if |x| ≤ N and b_j^N = 0 if |x| > N + 1. Let b^N = (b_1^N, ..., b_d^N). We define a sequence σ^N in the same way. Let x_t^N be the unique strong global solution to the SDE

    dx_t^N = σ^N(x_t^N) dB_t + b^N(x_t^N) dt.

Let τ^N be the first time that |x_t^N| is greater than or equal to N. Then x_t^N agrees with x_t^{N+1} for t < τ^N, and τ^N increases with N. Define

    x_t(ω) = x_t^N(ω)  on {ω : t < τ^N(ω)}.

Then x_t is defined up to t < τ where

    τ = sup_N τ^N.

Note that the exit time of x_t from B_N is τ^N and lim_{t↑τ(ω)} |x_t(ω)| = ∞ on {τ(ω) < ∞}. The sets {ω : t < τ^N(ω)} increase with N. Let Ω̄ = ∪_N {ω : t < τ^N(ω)}; then

    P(Ω̄) = lim_{N→∞} P({ω : t < τ^N(ω)}).

On {t < τ(x_0)} = ∪_N Ω_N we have patched up a solution x_t.
It is now clear that x_t is a maximal solution for E(σ, b), and it is a strong solution. The uniqueness is clear, as two maximal solutions (x_t, t < T_1) and (y_t, t < T_2) are equal up to each exit time τ^N from a ball of radius N. It follows that T_1 ≥ τ^N and T_2 ≥ τ^N.

If for all t,

    lim_{N→∞} P({ω : t < τ^N(ω)}) = 1,

we have a global solution defined for all time t.
8.4.5  Non-explosion

Definition 8.4.7 Let (x_t, t < τ) be a maximal solution. It is a global solution if P(τ = ∞) = 1. We say that the SDE does not explode from x_0 if there is a global solution with initial point x_0. We say that the SDE does not explode if there is a global solution for every starting point x_0.
Definition 8.4.8 A C² function V : R^d → R_+ is a Lyapunov function (for the explosion problem associated to an infinitesimal generator L) if

1. V ≥ 0,
2. lim_{|x|→∞} V(x) = ∞, and
3. LV ≤ cV + K for some finite constants c, K.

Proposition 8.4.9 Assume that the σ_k are continuous and the SDE E(σ, b) has a solution x_t. If there is a Lyapunov function V for the generator L, then the SDE does not explode.
Proof Apply Itô’s formula to V and xt∧τN ∧T m where τN is the first time that
V (xt ) is greater or equal to N and T m = inf{|xt | ≥ m}.
Z
V (xt∧τ N ∧T m ) = V (x0 )+
t∧τN ∧Tm
m Z
X
LV (xs ) ds+
0
k=1
t∧τN ∧Tm
dV (xs )σk (xs )dBsk .
0
Since dV and σk are continuous and therefore bounded on the ball of radius m, the
local martingale is a martingale. Taking expectation to see that
Z t
E[LV (xs )1s<τ N ∧Tm ] ds
EV (xt∧τ N ∧Tm ) = V (x0 ) +
0
Z t
≤ V (x0 ) +
E[K + CV (xs )]1s<τ N ∧Tm ds
0
Z t
≤ V (x0 ) + Kt +
EV (xs∧τ N ∧Tm )ds.
0
It follows from Gronwall’ lemma, EV (xt∧τ N ∧Tm ) ≤ [V (x0 ) + Kt]eCt . Letting
m → ∞ and note that xt does not explode before time T N ,
EV (xt∧τ N ) ≤ [V (x0 ) + Kt]eCt .
Now
EV (xt∧τ N ) = EV (xt )1t<τ N + N P (t ≥ τ N )
In particular N P (t ≥ τN ) ≤ [V (x0 ) + Kt]eCt . and
P (t ≥ τN ) ≤
1
[V (x0 ) + kt]eCt .
N
Taking N → ∞ to see that limN →∞ P (t ≥ τN ) = 0 and so τ ≥ t for any t.
Example 8.4.10  1. Assume that Σ_{k=1}^m |σ_k(x)|² ≤ c(1 + |x|²) and ⟨b(x), x⟩_{R^d} ≤ c(1 + |x|²). Then 1 + |x|² is a Lyapunov function:

    L(|x|² + 1) = Σ_{k=1}^m Σ_{i=1}^d (σ_k^i(x))² + 2 Σ_{l=1}^d b_l(x) x_l ≤ 3c(1 + |x|²).
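The formula L(1 + |x|²) = Σ_k |σ_k(x)|² + 2⟨b(x), x⟩ is easy to evaluate directly; a small sketch (my illustration, with assumed rotation-like noise and inward drift as a concrete linear-growth example):

```python
def LV(x, sigmas, b):
    # generator applied to V(x) = 1 + |x|^2:
    # LV(x) = sum_k |sigma_k(x)|^2 + 2 <b(x), x>
    quad = sum(sum(s * s for s in sig(x)) for sig in sigmas)
    drift = 2.0 * sum(bi * xi for bi, xi in zip(b(x), x))
    return quad + drift

# a linear-growth example in R^2: rotation-like noise, inward drift
sigmas = [lambda x: (x[1], -x[0])]   # |sigma_1(x)|^2 = |x|^2
b = lambda x: (-x[0], -x[1])         # <b(x), x> = -|x|^2
for x in [(0.0, 0.0), (1.0, 2.0), (-3.0, 0.5)]:
    v = 1.0 + x[0] ** 2 + x[1] ** 2
    assert LV(x, sigmas, b) <= v     # here LV = -|x|^2, so c = 1, K = 0 works
```

For this choice LV(x) = |x|² − 2|x|² = −|x|², so the Lyapunov condition holds trivially and the SDE does not explode.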
2. Show that the SDE below does not explode, by construction of a Lyapunov function:

    dx_t = (y_t² − x_t²) dB_t¹ + 2x_t y_t dB_t²,
    dy_t = −2x_t y_t dB_t¹ + (x_t² − y_t²) dB_t².

The original proof (not given in class): the system has a global solution, as it can be transformed to dx_t = dB_t¹, dy_t = dB_t² on R² \ {0} by the map φ(x, y) = (x/r², −y/r²), which sends points at infinity to the origin (0, 0) and vice versa. Then φ(x_t, y_t) = (x_0 + B_t¹, y_0 + B_t²) is the solution to the second pair of SDEs. The fact that P((x_0 + B_t¹, y_0 + B_t²) = (0, 0)) = 0 implies that (x_0 + B_t¹, y_0 + B_t²) does not hit the origin, and so (x_t, y_t) does not explode to infinity.
3. Let r(x) = |(x_1, ..., x_d)| = (Σ_i x_i²)^{1/2} be the radius function. Let φ be a harmonic function, so

    φ(x) = c_1 |x| + c_2,            dimension 1,
    φ(x) = c_1 log |x| + c_2,        dimension 2,
    φ(x) = c_1/|x|^{d−2} + c_2,      dimension > 2.

These functions satisfy Δφ = 0 away from the origin, and (for a suitable choice of c_1) the convolution φ ∗ f solves Δ(φ ∗ f) = f.

For dimensions 1 and 2, harmonic functions can be used to build Lyapunov functions for generators of the form CΔ, where C can be a function: modify the function inside the ball of radius one so that it is smooth. Harmonic functions in dimension 3 or greater are not useful for explosion problems.
Problem 8.4.1 What kind of conditions on σ and b imply the exponential integrability E e^{x_t} < ∞?
The following is not covered in the lectures.
Proposition 8.4.11 If the coefficients have linear growth, there is no explosion. Furthermore for any p > 1 there are constants c_p, C such that any solution satisfies, for T > 0,

    E sup_{s≤T} |x_s|^p ≤ ( c_p |x_0|^p + C T^{p/2} + C T^p ) e^{CT}.
Proof Non-explosion follows from the fact that |x|² is a Lyapunov function. We assume that d = m = 1. Let x_t be a solution and let p > 2. There is a constant c_p such that

    |x_s|^p ≤ c_p |x_0|^p + c_p | ∫_0^s σ(x_r) dB_r |^p + c_p | ∫_0^s b(x_r) dr |^p.

From this we see

    E sup_{s≤t} |x_s|^p ≤ c_p |x_0|^p + c_p E sup_{s≤t} | ∫_0^s σ(x_r) dB_r |^p + c_p E sup_{s≤t} ( ∫_0^s |b(x_r)| dr )^p.

Below we apply the Burkholder-Davis-Gundy inequality to the local martingale term and Hölder's inequality to the second integral. There is a constant C_p such that

    E sup_{s≤t} |x_s|^p ≤ c_p |x_0|^p + c_p C_p E ⟨ ∫_0^· σ(x_r) dB_r , ∫_0^· σ(x_r) dB_r ⟩_t^{p/2} + c_p t^{p−1} E ∫_0^t |b(x_r)|^p dr
      = c_p |x_0|^p + c_p C_p E ( ∫_0^t σ²(x_r) dr )^{p/2} + c_p t^{p−1} E ∫_0^t |b(x_r)|^p dr.

There is C > 0 such that

    |σ(x)| ≤ C(1 + |x|),    |b(x)| ≤ C(1 + |x|).

Hence

    E sup_{s≤t} |x_s|^p ≤ c_p |x_0|^p + c_p C_p E ( ∫_0^t C²(1 + |x_r|)² dr )^{p/2} + c_p t^{p−1} E ∫_0^t C^p (1 + |x_r|)^p dr.

Since p > 2 we may apply Hölder's inequality again: for some constant C,

    E sup_{s≤t} |x_s|^p ≤ c_p |x_0|^p + C t^{p/2} + C t^p + C ∫_0^t E|x_r|^p dr.

By Gronwall's inequality,

    E sup_{s≤t} |x_s|^p ≤ ( c_p |x_0|^p + C t^{p/2} + C t^p ) e^{Ct}.
8.5  Girsanov Theorem

Let (Ω, F, F_t, P) be a filtered probability space and let N_t be a continuous local martingale. The exponential local martingale e^{N_t − ½⟨N,N⟩_t} is a true martingale if E e^{½⟨N,N⟩_t} < ∞ for all t ≥ 0. (This condition is called Novikov's criterion.) We define Q on F_t by dQ/dP = e^{N_t − ½⟨N,N⟩_t}. Then Q is a probability measure.

Proposition 8.5.1 Let f_t be a strictly positive continuous local martingale. There is a continuous local martingale N_t such that

    f_t = e^{N_t − ½⟨N,N⟩_t}.

In fact N_t = log f_0 + ∫_0^t f_s^{−1} df_s.
Proof Let g(x) = e^x; then f_t = g(N_t − ½⟨N,N⟩_t). Apply Itô's formula to conclude.
Theorem 8.5.2 (Girsanov Theorem) Let P and Q be two equivalent probability measures on F_∞ with f = dQ/dP. Assume that f_t = E_P{f | F_t} is continuous. If M_t is a real valued continuous (F_t, P) local martingale, then

    M̃_t = M_t − ⟨M, N⟩_t    (8.7)

is an (F_t, Q) local martingale. Here N_t = log f_0 + ∫_0^t f_s^{−1} df_s.
Proof Take T_n to be the first time that max(|f_t|, |M_t|, ⟨N,N⟩_t, ⟨M,M⟩_t) ≥ n. We only need to show that each stopped process M̃^{T_n} is a Q-martingale. For simplicity of notation we assume all processes concerned are bounded and prove that (M̃_t) is a Q-martingale.

Let s < t and A ∈ F_s. We want to prove that

    ∫_A M̃_t dQ = ∫_A M̃_s dQ,

equivalently,

    ∫_A M̃_t f_t dP = ∫_A M̃_s f_s dP.

We may assume that M_0 = 0. Note that

    ⟨M, N⟩_t = ∫_0^t f_s^{−1} d⟨f, M⟩_s

and, for any t > 0, integration by parts gives

    ⟨M, N⟩_t f_t = ∫_0^t f_r d⟨M, N⟩_r + ∫_0^t ⟨M, N⟩_r df_r
      = ∫_0^t d⟨f, M⟩_r + ∫_0^t ⟨M, N⟩_r df_r
      = ⟨f, M⟩_t + ∫_0^t ⟨M, N⟩_r df_r.

By the defining property of the bracket process, M_t f_t − ⟨f, M⟩_t is a P-martingale, as is ∫_0^t ⟨M, N⟩_r df_r. Hence

    ∫_A (M̃_t f_t − M̃_s f_s) dP
      = ∫_A ( M_t f_t − M_s f_s − ⟨f, M⟩_t + ⟨f, M⟩_s − ∫_0^t ⟨M, N⟩_r df_r + ∫_0^s ⟨M, N⟩_r df_r ) dP
      = − ∫_A ( ∫_0^t ⟨M, N⟩_r df_r − ∫_0^s ⟨M, N⟩_r df_r ) dP = 0.
Corollary 8.5.3 Assume the conditions of the Girsanov theorem. Let B_t = (B_t¹, ..., B_t^m) be an (F_t, P) Brownian motion. Define B̃_t by B̃_t^j = B_t^j − ⟨N, B^j⟩_t. Then (B̃_t) is an (F_t, Q) Brownian motion.

Proof By Theorem 8.5.2, each B_t^j − ⟨N, B^j⟩_t is a Q local martingale. Furthermore ⟨B̃^i, B̃^j⟩_t = δ_{i,j} t. By Lévy's characterisation theorem, (B̃_t) is an F_t Brownian motion with respect to Q.
Example 8.5.4 Let T > 0 be a real number and let h : [0, T] × Ω → R^d be an L² progressively measurable function such that h_0 = 0 and E e^{½ ∫_0^T |h_s|² ds} < ∞. Let P be a probability measure. Define a measure Q by

    dQ/dP := exp( ∫_0^T ⟨h_s, dB_s⟩ − (1/2) ∫_0^T |h_s|² ds ).

Then Q is a probability measure by Novikov's condition.

Take N_t = ∫_0^t h_s dB_s, where B_s is a one dimensional Brownian motion with respect to P. By the characterisation of the Itô integral,

    ⟨B, ∫_0^· h_s dB_s⟩_t = ∫_0^t h_s ds.

Then B_t − ∫_0^t h_s ds is a Brownian motion with respect to Q.
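The deterministic (Cameron-Martin) special case h_s ≡ θ can be tested by Monte Carlo (my illustration; both sides below follow from the change-of-measure identity, with the closed form E cos(B_T + θT) = e^{−T/2} cos(θT) used only as a cross-check):

```python
import math, random

def girsanov_check(theta, T, f, n=100000, seed=1):
    # Cameron-Martin special case of the Girsanov theorem: with B_T ~ N(0, T)
    # under P,  E_P[f(B_T + theta*T)] = E_P[f(B_T) exp(theta*B_T - theta^2 T/2)]
    rng = random.Random(seed)
    shifted = weighted = 0.0
    for _ in range(n):
        bT = rng.gauss(0.0, math.sqrt(T))
        shifted += f(bT + theta * T)
        weighted += f(bT) * math.exp(theta * bT - 0.5 * theta * theta * T)
    return shifted / n, weighted / n

lhs, rhs = girsanov_check(0.5, 1.0, math.cos)
print(lhs, rhs)  # the two estimates agree up to Monte Carlo error
```

The weighted estimator is exactly the reweighting by dQ/dP; its agreement with the shifted estimator is the content of the theorem in this special case.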
Example 8.5.5 Let σ, b : R → R be bounded functions with bounded continuous first derivatives. Let U : R_+ × R → R also be in BC¹. Consider two SDEs:

    dx_t = σ(x_t) dB_t + b(x_t) dt,
    dy_t = σ(y_t) dB_t + b(y_t) dt + σ(y_t) U(t, y_t) dt.

Let x_0 = y_0. Denote by x_t, y_t the solutions.

By the Yamada-Watanabe theorem, there is a Borel measurable function F_t such that x_t = F_t(x_0, B). Let

    f_t = exp( − ∫_0^t U(s, y_s) dB_s − (1/2) ∫_0^t U(s, y_s)² ds ).

Fix T > 0. Define Q on F_T by dQ/dP = f_T. Then

    B̃_t = B_t + ∫_0^t U(s, y_s) ds

is a Q-Brownian motion. Note that

    dy_t = σ(y_t) dB̃_t + b(y_t) dt.

Hence y_t = F_t(x_0, B̃). Let f : R → R be bounded Borel measurable. By uniqueness in law,

    ∫_Ω f(F_t(x_0, B)) dP = ∫_Ω f(F_t(x_0, B̃)) dQ.

This means

    E f(x_t) = E [ f(y_t) exp( − ∫_0^t U(s, y_s) dB_s − (1/2) ∫_0^t U(s, y_s)² ds ) ].

Similarly,

    E f(y_t) = E [ f(x_t) exp( ∫_0^t U(s, x_s) dB_s − (1/2) ∫_0^t U(s, x_s)² ds ) ].
8.6  Summary

List of Terminology
1. Martingales, stopping times, local martingales, sub-martingales, semi-martingales,
2. The bracket process. If (Mt ) and (Nt ) are two continuous local martingales,
there exists a unique continuous process hM, N it of finite total variation
vanishing at 0 such that Mt Nt − hM, N it is a local martingale.
3. Continuous L² bounded martingales: "A continuous local martingale (M_t, t ≥ 0) with M_0 = 0 is an L² bounded martingale if and only if E⟨M, M⟩_∞ < ∞." In this case M² − ⟨M, M⟩ is a uniformly integrable martingale and

    ‖M‖_{H²} = ( E⟨M, M⟩_∞ )^{1/2} = lim_{t→∞} ( E M_t² )^{1/2}.
4. Itô integral with respect to a semi-martingale, Stratonovich integrals.
5. Independent increments. A stochastic process (X_t, t ∈ I) is said to have independent increments if for any n and any 0 = t_0 < t_1 < · · · < t_n, t_i ∈ I, the increments {X_{t_{i+1}} − X_{t_i}}_{i=0}^{n−1} are independent.

Gaussian processes. A stochastic process X_t is Gaussian if its finite dimensional marginal distributions are Gaussian.

Brownian motion. A sample continuous stochastic process (B_t : t ≥ 0) on R¹ is the standard Brownian motion if B_0 = 0 and the following hold:

(a) For 0 ≤ s < t, B_t − B_s is N(0, t − s) distributed.
(b) B_t has independent increments.

An F_t Brownian motion in R^d is a (sample continuous) stochastic process such that B_0 = 0 and the following hold: B_t − B_s is independent of F_s, and B_t − B_s is N(0, (t − s)I) distributed.
6. Wiener space
7. SDEs, weak and strong solutions, existence, uniqueness in law and pathwise
uniqueness, non-explosion.
Infinitesimal generators,

    L = (1/2) Σ_{i,j=1}^d ( Σ_{k=1}^m σ_k^i σ_k^j ) ∂²/∂x_i∂x_j + Σ_{j=1}^d b_j ∂/∂x_j.
List of Theorems
1. The Optional Stopping Theorem. For Martingales: Let Xt , t ≥ 0 be a
right continuous martingale. Then for all bounded stopping times S ≤ T ,
E{XT |FS } = XS almost surely. If {Xt , t ≥ 0} is furthermore uniformly
integrable, we define XT = X∞ on {T = ∞} and the statement holds for
all stopping times.
For super-martingales. Let (Xt , t ≥ 0) be a right continuous super-martingale.
Let S and T be two bounded stopping times. Then E{XT |FS } ≤ XS almost
surely. If furthermore {Xt , t ≥ 0} is uniformly integrable, the inequality
holds for all stopping times which are not necessarily bounded.
2. Martingale convergence theorem: If Xt is a right continuous super-martingale
(or sub-martingale) and Xt is L1 bounded then limt→∞ Xt exists a.s.
Uniform integrability lemma on conditional expectations of L¹ random variables: Let X : Ω → R be an integrable random variable. Then the family of random variables

    {E{X | G} : G is a sub-σ-algebra of F}

is uniformly integrable.
3. Uniform Integrability of Martingales and End Point Theorem: If (X_t, t ≥ 0) is a right continuous martingale, the following are equivalent. Below we use X_∞ for the end point.
(a) Xt converges in L1 , as t approaches ∞ (to a random variable X∞ ).
(b) There exists a L1 random variable X∞ s.t. Xt = E{X∞ |Ft }.
(c) (Xt , t ≥ 0) is uniformly integrable.
4. Let (X_t, t ∈ I) be a right continuous martingale or positive sub-martingale, where I is an interval. Maximal inequality: for p ≥ 1,

    P( sup_t |X_t| ≥ λ ) ≤ (1/λ^p) sup_t E|X_t|^p.

Doob's L^p inequality: for p > 1,

    ( E sup_{t∈I} |X_t|^p )^{1/p} ≤ (p/(p−1)) sup_t ( E|X_t|^p )^{1/p}.
5. B-D-G inequality: For every p > 0 there exist universal constants c_p and C_p such that for all continuous local martingales M vanishing at 0,

    c_p E⟨M, M⟩_T^{p/2} ≤ E( sup_{t<T} |M_t| )^p ≤ C_p E⟨M, M⟩_T^{p/2},

where T is a finite number, infinity, or a stopping time.
6. Characterisation theorem for Itô integrals. Let M be a continuous local martingale with M_0 = 0. If H ∈ L²_loc(M), the stochastic integral ∫_0^t H_s dM_s is the unique local martingale vanishing at 0 such that for all t > 0 and all continuous local martingales N_t,

    ⟨ ∫_0^· H_s dM_s , N ⟩_t = ∫_0^t H_s d⟨M, N⟩_s.
7. Itô’s formula. Let f be C 2 , xt = (x1t , . . . , xdt ) a continuous semi-martingale,
for s < t,
d Z t
d Z
X
∂f
1 X t ∂f 2
j
f (xt ) = f (xs ) +
(xr )dxr +
(xr )dhxi , xj ir .
2
s ∂xj
s ∂xi ∂xj
j=1
i,j=1
Variation on Itô’s Formula Assume that F : [0, ∞) × Rd → R be C 1,2 .
Z
f (t, xt ) = f (s, xs ) +
s
+
t
d
X
∂f
(r, xr )dr +
∂r
j=1
Z
s
t
∂f
(r, xr )dxjr
∂xj
d Z
1 X t ∂f 2
(r, xr )dhxi , xj ir .
2
∂x
∂x
i
j
s
i,j=1
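Itô's formula can be watched at work on a single discretized Brownian path (my illustration; the left-point sums below stand in for the Itô integral, and the ½ f'' term is the quadratic-variation correction):

```python
import math, random

def ito_check(f, df, d2f, T=1.0, n=20000, seed=3):
    # discrete Ito formula along one Brownian path:
    # f(B_T) - f(B_0) ≈ sum f'(B_{t_i}) dB_i + (1/2) sum f''(B_{t_i}) dt
    rng = random.Random(seed)
    dt = T / n
    b = stoch = drift = 0.0
    for _ in range(n):
        db = rng.gauss(0.0, math.sqrt(dt))
        stoch += df(b) * db           # left-point (Ito) stochastic sum
        drift += 0.5 * d2f(b) * dt    # quadratic-variation correction
        b += db
    return f(b) - f(0.0), stoch + drift

lhs, rhs = ito_check(lambda x: x ** 3, lambda x: 3 * x * x, lambda x: 6 * x)
print(lhs, rhs)  # agree up to discretization error
```

Dropping the drift term reproduces the familiar failure of the chain rule for Brownian paths.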
8. Lévy’s Martingale Characterization Theorem for Brownian Motions.
An Ft adapted sample continuous stochastic process Xt in Rd vanishing
at 0 is a standard Ft -Brownian motion if and only if each Xt is a Ft local
martingale and hX i , X j it = δij t.
9. Pathwise existence and uniqueness theorem: locally Lipschitz continuous coefficients imply a unique strong maximal solution from each initial point. Globally Lipschitz continuous coefficients imply non-explosion. A linear growth condition implies non-explosion.
10. Pathwise Uniqueness implies Uniqueness in Law. If pathwise uniqueness
holds then any solution is a strong solution and uniqueness in law holds.
8.7  Glossary

C([0, T]; X), the space of continuous functions from [0, T] to X.
◦, Stratonovich integration.
Bibliography
[1] Ludwig Arnold. Random dynamical systems. Springer Monographs in Mathematics. Springer-Verlag, Berlin, 1998.
[2] Patrick Billingsley. Convergence of probability measures. Wiley Series in
Probability and Statistics: Probability and Statistics. John Wiley & Sons Inc.,
New York, second edition, 1999. A Wiley-Interscience Publication.
[3] V. I. Bogachev. Measure theory. Vol. I, II. Springer-Verlag, Berlin, 2007.
[4] K. D. Elworthy. Stochastic differential equations on manifolds, volume 70
of London Mathematical Society Lecture Note Series. Cambridge University
Press, Cambridge, 1982.
[5] K. D. Elworthy, Y. Le Jan, and Xue-Mei Li. On the geometry of diffusion operators and stochastic flows, volume 1720 of Lecture Notes in Mathematics.
Springer-Verlag, Berlin, 1999.
[6] K. D. Elworthy, Xue-Mei Li, and M. Yor. The importance of strictly local
martingales; applications to radial Ornstein-Uhlenbeck processes. Probab.
Theory Related Fields, 115(3):325–355, 1999.
[7] K. David Elworthy, Yves Le Jan, and Xue-Mei Li. The geometry of filtering.
Frontiers in Mathematics. Birkhäuser Verlag, Basel, 2010.
[8] Gerald B. Folland. Real analysis. Pure and Applied Mathematics (New York).
John Wiley & Sons Inc., New York, second edition, 1999. Modern techniques
and their applications, A Wiley-Interscience Publication.
[9] Avner Friedman. Stochastic differential equations and applications. Vol. 1.
Academic Press [Harcourt Brace Jovanovich Publishers], New York, 1975.
Probability and Mathematical Statistics, Vol. 28.
[10] Avner Friedman. Stochastic differential equations and applications. Vol. 2.
Academic Press [Harcourt Brace Jovanovich Publishers], New York, 1976.
Probability and Mathematical Statistics, Vol. 28.
[11] Ĭ. Ī. Gı̄hman and A. V. Skorohod. Stochastic differential equations. SpringerVerlag, New York, 1972. Translated from the Russian by Kenneth Wickwire,
Ergebnisse der Mathematik und ihrer Grenzgebiete, Band 72.
[12] R. Z. Hasʹminskiĭ. Stochastic stability of differential equations, volume 7 of
Monographs and Textbooks on Mechanics of Solids and Fluids: Mechanics
and Analysis. Sijthoff & Noordhoff, Alphen aan den Rijn, 1980. Translated
from the Russian by D. Louvish.
[13] Nobuyuki Ikeda and Shinzo Watanabe. Stochastic differential equations
and diffusion processes, volume 24 of North-Holland Mathematical Library.
North-Holland Publishing Co., Amsterdam, second edition, 1989.
[14] Olav Kallenberg. Foundations of modern probability. Probability and its
Applications (New York). Springer-Verlag, New York, second edition, 2002.
[15] Ioannis Karatzas and Steven E. Shreve. Brownian motion and stochastic calculus, volume 113 of Graduate Texts in Mathematics. Springer-Verlag, New
York, second edition, 1991.
[16] H. Kunita. Lectures on stochastic flows and applications, volume 78 of Tata
Institute of Fundamental Research Lectures on Mathematics and Physics.
Published for the Tata Institute of Fundamental Research, Bombay, 1986.
[17] Hiroshi Kunita. Stochastic flows and stochastic differential equations, volume 24 of Cambridge Studies in Advanced Mathematics. Cambridge University Press, Cambridge, 1990.
[18] Roger Mansuy and Marc Yor. Aspects of Brownian motion. Universitext.
Springer-Verlag, Berlin, 2008.
[19] Peter Mörters and Yuval Peres. Brownian motion. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, Cambridge,
2010. With an appendix by Oded Schramm and Wendelin Werner.
[20] Bernt Øksendal. Stochastic differential equations. Universitext. Springer-Verlag, Berlin, sixth edition, 2003. An introduction with applications.
[21] K. R. Parthasarathy. Probability measures on metric spaces. AMS Chelsea
Publishing, Providence, RI, 2005. Reprint of the 1967 original.
[22] Michael Reed and Barry Simon. Methods of modern mathematical physics.
I. Academic Press Inc. [Harcourt Brace Jovanovich Publishers], New York,
second edition, 1980. Functional analysis.
[23] Daniel Revuz and Marc Yor. Continuous martingales and Brownian motion,
volume 293 of Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences]. Springer-Verlag, Berlin, third
edition, 1999.
[24] L. C. G. Rogers and David Williams. Diffusions, Markov processes, and
martingales. Vol. 2. Wiley Series in Probability and Mathematical Statistics:
Probability and Mathematical Statistics. John Wiley & Sons Inc., New York,
1987. Itô calculus.
[25] L. C. G. Rogers and David Williams. Diffusions, Markov processes, and
martingales. Vol. 1. Wiley Series in Probability and Mathematical Statistics:
Probability and Mathematical Statistics. John Wiley & Sons Ltd., Chichester,
second edition, 1994. Foundations.
[26] Daniel W. Stroock. An introduction to Markov processes, volume 230 of
Graduate Texts in Mathematics. Springer-Verlag, Berlin, 2005.
[27] Daniel W. Stroock and S. R. Srinivasa Varadhan. Multidimensional diffusion processes, volume 233 of Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences]. Springer-Verlag, Berlin, 1979.
[28] S. R. S. Varadhan. Stochastic processes, volume 16 of Courant Lecture Notes
in Mathematics. Courant Institute of Mathematical Sciences, New York,
2007.
[29] David Williams. Probability with martingales. Cambridge Mathematical
Textbooks. Cambridge University Press, Cambridge, 1991.
[30] Marc Yor. Some aspects of Brownian motion. Part II. Lectures in Mathematics ETH Zürich. Birkhäuser Verlag, Basel, 1997. Some recent martingale
problems.