Brownian Motion. Lecture Notes. October 2006. V. Kolokoltsov

These notes, revised and extended, form the basis for Chapters 2 and 3 of my book
“Markov Processes, Semigroups and Generators”, de Gruyter 2011.
Aims and Objectives. Brownian motion (BM) is an acknowledged champion in stochastic modeling for a wide variety of processes in physics (statistical mechanics, quantum fields, etc.), biology (e.g. population dynamics, migration, disease spreading) and finance (e.g. common stock prices). BM enjoys beautiful nontrivial properties deeply linked with various areas of mathematics. The general theory of modern stochastic processes is strongly rooted in BM and was largely developed by extensions of its remarkable features. The aims of the course are to learn the basic properties of BM, its potential and limitations as a modeling tool, to understand its place among the main general classes of random processes such as martingales, Markov and Lévy processes, to learn to apply the general tools of stochastic analysis (e.g. stopping times) to BM and related diffusions, and to appreciate the basic notions and methods of modern stochastic analysis itself through their application to BM.
General Remarks: 1) ? denotes additional material. 2) Sections 1 and 5 contain the basic probability prerequisites for the course. 3) Recommended supplementary reading: main: I. Karatzas, S. Shreve. Brownian Motion and Stochastic Calculus. Springer 1998; D. Revuz, M. Yor. Continuous Martingales and Brownian Motion. Springer 1999; for references in probability: J. Jacod, Ph. Protter. Probability Essentials. Springer 2004; A.N. Shiryayev. Probability. Springer 1984.
Content.
CHAPTER 1. DEFINITION AND CONSTRUCTION of BM.
0. Overview, historical sketch, perspectives.
1. Review of measure and probability. 2. Brownian motion: construction via Hilbert
space methods. 3. The construction of BM via Kolmogorov’s continuity theorem.
CHAPTER 2. The LÉVY, MARKOV AND FELLER PROCESSES.
4. Processes with s.i. increments. 5. Conditioning. 6. Markov processes. 7. Feller
processes and semigroups.
CHAPTER 3. MARTINGALE METHODS.
8. Martingales. 9. Stopping times and optional sampling theorem. 10. Strong Markov
property. Diffusions as Feller processes with continuous paths. 11. Reflection principle
and passage times for BM.
CHAPTER 4. HEAT CONDUCTION (OR DIFFUSION) EQUATION.
12. The Dirichlet problem for diffusion operators. 13. The stationary Feynman-Kac
formula. 14. Diffusions with variable drift, Ornstein-Uhlenbeck processes.
CHAPTER 5. FINE PROPERTIES of BM.
15. Zeros, excursions and local times. 16. Skorohod imbedding and invariance
principle. 17. Sample path properties.
CHAPTER 1. DEFINITION AND CONSTRUCTION of BM.
Section 0. Overview, historical sketch, perspectives. Brown, Einstein, Smoluchowski, Langevin, Ornstein-Uhlenbeck, Chandrasekhar, Wiener, Feynman-Kac, Nelson,
McKean, Dyson.
Section 1. Review of measure and probability
Def. A collection F of subsets of a given set S is called a σ-algebra if (i) S ∈ F; (ii) A ∈ F ⇒ A^c ∈ F; (iii) ∪_{n=1}^∞ A_n ∈ F whenever A_n ∈ F for all n ∈ N. The pair (S, F) is called a measurable space. A measure on (S, F) is a mapping µ : F → [0, ∞] such that µ(∅) = 0 and σ-additivity holds:

µ(∪_{n=1}^∞ A_n) = Σ_{n=1}^∞ µ(A_n)

for any sequence A_n of mutually disjoint sets in F. The triple (S, F, µ) is called a measure space. A measure µ is called finite if its total mass µ(S) is finite, and σ-finite if there exists a sequence A_n, n ∈ N, of sets from F such that S = ∪_{n=1}^∞ A_n and µ(A_n) < ∞ for all n. For a collection Γ of subsets of a set Ω, the σ-algebra σ(Γ) generated by Γ is the minimal σ-algebra containing all sets from Γ.
Examples. 1. A measure space (Ω, F, µ) is called a probability space whenever µ(Ω) = 1. In this case µ is called a probability measure and the sets from F are called events. 2. For a topological space S, e.g. a subset of R^d, the smallest σ-algebra B(S) containing all its open subsets is called the Borel σ-algebra of S. Its elements are called Borel sets and any measure on (S, B(S)) is called a Borel measure. The simplest example of a Borel measure is given by Lebesgue measure on R^d. 3. For a finite or countable family of measure spaces (S_i, F_i, µ_i), i = 1, 2, ..., the product measure space (S, F, µ) is defined, where S = S_1 × S_2 × ..., F = F_1 ⊗ F_2 ⊗ ... is the σ-algebra generated by the sets A_1 × ... × A_n, A_i ∈ F_i, n ∈ N, and µ = µ_1 × µ_2 × ... is the product measure uniquely specified by the prescription µ(A_1 × ... × A_n) = µ_1(A_1)...µ_n(A_n).
Borel-Cantelli lemma. If a sequence of events A_n, n ∈ N, on a probability space (Ω, F, P) is such that Σ_n P(A_n) < ∞, then a.s. only a finite number of the A_n occur.
Proof. Let B = {ω ∈ Ω : an infinite number of the A_n occur}. Then

B = ∩_n (∪_{k≥n} A_k)

and

P(B) ≤ P(∪_{k≥n} A_k) ≤ Σ_{k≥n} P(A_k) → 0

as n → ∞. Hence P(B) = 0.
Def. Completion. For a measure space (S, F, µ) a subset of S is called negligible if it is a subset of a set N ∈ F with µ(N) = 0. The σ-algebra F̄ of subsets of S of the form A ∪ B with A ∈ F and B negligible, and the measure µ̄ on it defined on these sets by µ̄(A ∪ B) = µ(A), are called respectively the completion of F and of µ (with respect to µ). In particular, for S ⊂ R^d the completion of B(S) with respect to Lebesgue measure is called the σ-algebra of Lebesgue measurable sets in S.
Def. For a probability space (Ω, F, µ) one says that some property depending on
ω ∈ Ω holds almost surely or with probability 1 if there exists a negligible set N ∈ F such
that this property holds for all ω ∈ Ω \ N .
Def. If (S_i, F_i), i = 1, 2, are measurable spaces, a mapping f : S_1 → S_2 is called (F_1, F_2)-measurable if f^{−1}(A) ∈ F_1 whenever A ∈ F_2. If S_1, S_2 are subsets of R^d equipped with their Borel σ-algebras, such a mapping is said to be Borel measurable. Speaking about measurable mappings with values in R^d one usually means that R^d is equipped with its Borel σ-algebra.
? Exercise and Def. For S ⊂ R^d the universal σ-field U(S) is defined as the intersection of the completions of B(S) with respect to all probability measures on S. The (U(S), B(S))-measurable functions are called universally measurable. Show that a real valued function f is universally measurable if and only if for every probability measure µ on S there exists a Borel measurable function g_µ such that µ{x : f(x) ≠ g_µ(x)} = 0. Hint for the ”only if” part: show that

f(x) = inf{r ∈ Q : x ∈ U(r)},

where U(r) = {x ∈ S : f(x) ≤ r}. Since the U(r) belong to the completion of the Borel σ-algebra with respect to µ, there exist Borel sets B(r), r ∈ Q, such that

µ(∪_{r∈Q} (B(r) ∆ U(r))) = 0.

Define

g_µ(x) = inf{r ∈ Q : x ∈ B(r)}.
Def. For a probability space (Ω, F, P) the measurable mappings X : Ω → R^d are called random variables (shortly r.v.). The law (or distribution) of such a mapping is the Borel probability measure p_X on R^d defined as p_X = P ◦ X^{−1}. In other words

p_X(A) = P(X^{−1}(A)) = P(ω ∈ Ω : X(ω) ∈ A) = P(X ∈ A).

Two r.v. X and Y are called identically distributed if they have the same probability law. For a real (i.e. one-dimensional) r.v. X its distribution function is defined by F_X(x) = p_X((−∞, x]). A real r.v. X has a continuous distribution with a probability density function f if p_X(A) = ∫_A f(x) dx for all Borel sets A. The σ-algebra σ(X) generated by a r.v. X is the smallest σ-algebra containing the sets {X ∈ B} for all Borel sets B.
Exercise. Show that if X takes only a finite number of values, then the law p_X is a sum of Dirac δ-measures.
Def. Expectation and covariance. For an R^d-valued r.v. X on a probability space (Ω, F, P) and a Borel measurable function f : R^d → R^m the expectation E of f(X) is defined as

E(f(X)) = ∫_Ω f(X(ω)) P(dω) = ∫_{R^d} f(x) p_X(dx).   (1)
X is called integrable if E(|X|) < ∞. For two R^d-valued r.v. X = (X_1, ..., X_d) and Y = (Y_1, ..., Y_d) the d × d matrix with the entries E[(X_i − E(X_i))(Y_j − E(Y_j))] is called the covariance of X and Y and is denoted Cov(X, Y). In case d = 1 and X = Y the number Cov(X, Y) is called the variance of X and is denoted by Var(X) and sometimes also by σ_X^2. The r.v. X and Y are called uncorrelated whenever Cov(X, Y) = 0.
Exercise. Show that the two expressions in the definition (1.1) indeed coincide. Hint: first choose f to be an indicator.
Def. Four basic notions of convergence of r.v. Let X and X_n, n ∈ N, be R^d-valued r.v. (defined on a common probability space). One says that X_n converges to X (i) in L^p (1 ≤ p < ∞) if lim_{n→∞} E(|X_n − X|^p) = 0; (ii) almost surely if lim_{n→∞} X_n(ω) = X(ω) almost surely; (iii) in probability if for any ε > 0, lim_{n→∞} P(|X_n − X| > ε) = 0; (iv) in distribution if p_{X_n} weakly converges to p_X, i.e. if

lim_{n→∞} ∫_{R^d} f(x) p_{X_n}(dx) = ∫_{R^d} f(x) p_X(dx)

for all bounded continuous functions f.
Exer. Show that L^p-convergence ⇒ convergence in probability ⇒ weak convergence. Hint: for the first ⇒ use the Chebyshev inequality; for the second one decompose the integral ∫ |f(X_n(ω)) − f(X(ω))| P(dω) into the three terms over the sets {|X_n − X| > δ}, {|X_n − X| ≤ δ, |X| > 1/δ} and {|X_n − X| ≤ δ, |X| ≤ 1/δ}.
Exer. (i) Show that X_n → X in probability ⇔

lim_{n→∞} E( |X_n − X| / (1 + |X_n − X|) ) = 0.
(ii) Deduce from (i) that almost sure convergence implies convergence in probability. Hint for (i): it is enough to prove it for X = 0; for the ”if” part then use the inequality

E( |X_n| / (1 + |X_n|) ) ≥ (ε / (1 + ε)) P( |X_n| / (1 + |X_n|) > ε );

for the ”only if” part decompose the integral E(|X_n|/(1 + |X_n|)) into the two terms over the sets {|X_n| > ε} and {|X_n| ≤ ε}.
Example. Consider the following sequence of indicator functions {Xn } on [0, 1]:
1[0,1] , 1[0,1/2] , 1[1/2,1] , 1[0,1/3] , 1[1/3,2/3] , 1[2/3,1] , 1[0,1/4] , 1[1/4,2/4] , etc. Then Xn → 0 as
n → ∞ in probability and in all Lp , p ≥ 1, but not a.s.; in fact lim sup Xn (x) = 1 and
lim inf Xn (x) = 0 for each x so that Xn (x) → X(x) nowhere.
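The following small Python experiment (an illustration added here, not part of the original notes) makes the example concrete: along the ”typewriter” sequence the probability P(X_n ≠ 0) tends to 0, while at any fixed point x the sequence X_n(x) keeps taking both values 0 and 1.

import numpy as np

def typewriter(n):
    # return (a, b) such that X_n = 1_[a,b]; block k consists of k
    # indicators of intervals of length 1/k (0-based index n)
    k = 1
    while n >= k:
        n -= k
        k += 1
    return n / k, (n + 1) / k

x = 0.3                       # an arbitrary fixed point of [0, 1]
vals, lengths = [], []
for n in range(10000):
    a, b = typewriter(n)
    vals.append(1.0 if a <= x <= b else 0.0)
    lengths.append(b - a)     # = P(X_n != 0) = E|X_n|^p for every p

print("P(X_n != 0) at the end of the run:", lengths[-1])
print("lim sup X_n(x) =", max(vals[-200:]), ", lim inf X_n(x) =", min(vals[-200:]))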
Exer. (i) Convince yourself that X_n → X in distribution does not imply X_n − X → 0 in distribution. (ii) Show that X_n → 0 in distribution ⇒ X_n → 0 in probability.
? Exer. Show that X_n → X a.s. ⇔

lim_{m→∞} P( sup_{n≥m} |X_n − X| > ε ) = 0

for all ε > 0. Use this to give another proof of the fact that convergence a.s. implies convergence in probability. Hint: observe that the event {X_n → X} is the complement of the event

B = ∪_{r∈N} B_r,   B_r = ∩_{m∈N} { sup_{n≥m} |X_n − X| > 1/r },

i.e. a.s. convergence is equivalent to P(B) = 0 and hence to P(B_r) = 0 for all r.
Def. A family H of r.v. in L^1(Ω, F, P) is uniformly integrable if

lim_{c→∞} sup_{X∈H} E(|X| 1_{|X|>c}) = 0.
Exer. If either (i) sup_{X∈H} E(|X|^p) < ∞ for some p > 1, or (ii) ∃ an integrable r.v. Y s.t. |X| ≤ Y for all X ∈ H, then H is uniformly integrable. Hint:

(i) E(|X| 1_{|X|>c}) ≤ c^{1−p} E(|X|^p 1_{|X|>c}) ≤ c^{1−p} E(|X|^p);

(ii) E(|X| 1_{|X|>c}) ≤ E(Y 1_{Y>c}).
Exer. If X_n → X a.s. and {X_n} is uniformly integrable, then X_n → X in L^1. Hint: decompose the integral ∫ |X_n − X| P(dω) into the sum of three over the domains {|X_n − X| > ε}, {|X_n − X| ≤ ε, |X| ≤ c} and {|X_n − X| ≤ ε, |X| > c}; they can be made small respectively because X_n → X in probability (as it holds a.s.), by dominated convergence, and by uniform integrability.
Def. Characteristic functions. If p is a probability measure on R^d its characteristic function is the function φ_p(y) = ∫ e^{i(y,x)} p(dx). For an R^d-valued r.v. X its characteristic function is defined as the characteristic function φ_X = φ_{p_X} of its law p_X, i.e.

φ_X(y) = E(e^{i(y,X)}) = ∫_{R^d} e^{i(y,x)} p_X(dx).
Exer. Show that any ch.f. is a continuous function. Hint: use

|φ_X(y + h) − φ_X(y)| ≤ E|e^{i(h,X)} − 1| ≤ max_{|x|≤a} |e^{i(h,x)} − 1| + 2P(|X| > a).   (2)
? Riemann-Lebesgue Lemma. If a probability measure p has a density, then φ_p belongs to C_∞(R^d) (the continuous functions tending to 0 as the argument tends to ∞). In other words, the inverse Fourier transform

f → F^{−1} f(y) = (2π)^{−d/2} ∫ e^{i(y,x)} f(x) dx

is a bounded linear operator L^1(R^d) → C_∞(R^d).
Sketch of the proof. Reduce to the case when f is a continuously differentiable function with compact support, then use integration by parts.
Exercise and Def. For a vector m ∈ R^d and a positive definite d × d matrix A, a r.v. X is called Gaussian (or has a Gaussian distribution) with mean m and covariance A, denoted N(m, A), whenever its characteristic function is

φ_{N(m,A)}(y) = exp{i(m, y) − (1/2)(y, Ay)}.

(i) Show that if A is non-degenerate, an N(m, A) r.v. has a distribution with the pdf

f(x) = (2π)^{−d/2} (det A)^{−1/2} exp{−(1/2)(x − m, A^{−1}(x − m))}.

(ii) Show that m = E(X) and A_{ij} = E((X_i − m_i)(X_j − m_j)).
Exercise and Def. Suppose X_1 and X_2 are independent R^d-valued r.v. with laws µ_1, µ_2 and characteristic functions φ_1 and φ_2. (i) Show that the r.v. X_1 + X_2 has the characteristic function φ_1 φ_2 and the law given by the convolution µ_1 ⋆ µ_2 defined by

(µ_1 ⋆ µ_2)(A) = ∫_{R^d} µ_1(A − x) µ_2(dx) = ∫_{R^{2d}} χ_{A−x}(y) µ_1(dy) µ_2(dx).

(ii) Extend this result to the case of n independent r.v. X_1, ..., X_n.
? Exer. and Def. Show that if probability distributions p_n, n ∈ N, converge weakly to a probability distribution p, then (i) the family p_n is tight, i.e.

∀ε > 0 ∃K > 0 : ∀n, p_n(|x| > K) < ε;

(ii) their characteristic functions φ_n converge uniformly on compact sets. Hint: for (ii) use tightness and representation (2) to show that the family of ch.f. is equicontinuous, i.e.

∀ε ∃δ : |φ_n(y + h) − φ_n(y)| < ε   ∀|h| < δ, n ∈ N,

which implies the uniform convergence.
Glivenko’s Theorem. If φn , n ∈ N, and φ are the characteristic functions of
probability distributions pn and p on Rd , then limn→∞ φn (y) = φ(y) for each y ∈ Rd if
and only if pn converge to p weakly.
Lévy’s Theorem. If φn , n ∈ N, is a sequence of characteristic functions of probability distributions on Rd and limn→∞ φn (y) = φ(y) for each y ∈ Rd and a function φ,
which is continuous at the origin, then φ is itself a characteristic function.
? Exer. Show that if a family of probability measures p_α is tight, then it is relatively weakly compact, i.e. any sequence of this family has a weakly convergent subsequence. Hint: tightness ⇒ the family of characteristic functions is equicontinuous (by (2)), and hence is relatively compact in the topology of uniform convergence on compact sets. Finally use Lévy's theorem.
Exercise. (i) Show that a finite linear combination of R^d-valued Gaussian r.v. is again a Gaussian r.v. (ii) Show that if a sequence of R^d-valued Gaussian r.v. converges in distribution to a r.v., then the limiting r.v. is again Gaussian. (iii) Show that if (X, Y) is an R^2-valued Gaussian r.v., then X and Y are uncorrelated if and only if they are independent.
? Bochner’s Theorem. A function φ : R^d → C is a characteristic function of a probability distribution if and only if it satisfies the following three properties: (i) φ(0) = 1; (ii) φ is continuous at the origin; (iii) φ is positive definite, which means that

Σ_{j,k=1}^d c_j c̄_k φ(y_j − y_k) ≥ 0

for all y_1, ..., y_d ∈ R^d and all complex c_1, ..., c_d.
Exercise. Prove the ”only if” part of Bochner’s theorem. Hint: for (iii) observe that

Σ_{j,k=1}^d c_j c̄_k φ_X(y_j − y_k) = ∫_{R^d} Σ_{j,k=1}^d c_j c̄_k e^{i(y_j − y_k, x)} p_X(dx) = ∫_{R^d} | Σ_{j=1}^d c_j e^{i(y_j, x)} |² p_X(dx).
Def. Stochastic processes. A stochastic process is a collection X = (X_t, t ≥ 0) (or t ∈ [0, T] for some T > 0) of R^d-valued random variables defined on a common probability space. The finite-dimensional distributions of such a process are the collection of probability measures p_{t_1,...,t_n} on R^{dn} (parametrized by finite collections of pairwise different non-negative numbers t_1, ..., t_n) defined as

p_{t_1,...,t_n}(H) = P((X_{t_1}, ..., X_{t_n}) ∈ H)

for each Borel subset H of R^{dn}. These finite-dimensional distributions are (obviously) consistent (or satisfy Kolmogorov's consistency criteria): for any n, any permutation π of {1, ..., n}, any sequence 0 ≤ t_1 < ... < t_{n+1}, and any collection of Borel subsets H_1, ..., H_n of R^d one has

p_{t_1,...,t_n}(H_1 × ... × H_n) = p_{t_{π(1)},...,t_{π(n)}}(H_{π(1)} × ... × H_{π(n)}),

p_{t_1,...,t_n,t_{n+1}}(H_1 × ... × H_n × R^d) = p_{t_1,...,t_n}(H_1 × ... × H_n).
Def. A stochastic process is called Gaussian if all its finite-dimensional distributions
are Gaussian.
Kolmogorov’s existence theorem. Given a family of probability measures p_{t_1,...,t_n} (on R^{dn}) satisfying the Kolmogorov consistency criteria, there exists a probability space (Ω, F, P) and a stochastic process X on it having the p_{t_1,...,t_n} as its finite-dimensional distributions. In particular, one can choose Ω to be the set (R^d)^{R_+} of all mappings from R_+ to R^d and F to be the smallest σ-algebra containing all cylinder sets

I^H_{t_1,...,t_n} = {ω ∈ Ω : (ω(t_1), ..., ω(t_n)) ∈ H},   H ∈ B(R^{dn}),

and X to be the co-ordinate process X_t(ω) = ω(t).
Def. ”Sameness” between processes. Suppose two processes X and Y are defined on the same probability space (Ω, F, P). Then (i) X and Y are called indistinguishable if P(∀t X_t = Y_t) = 1; (ii) X is a modification of Y if P(X_t = Y_t) = 1 for each t.
Example. Consider a positive r.v. ξ with a continuous distribution (i.e. such that P(ξ = x) = 0 for any x). Put X_t = 0 for all t and let Y_t be 1 for t = ξ and 0 otherwise. Then Y is a modification of X, but P(∀t X_t = Y_t) = 0.
Exercise. Suppose Y is a modification of X and both processes have right-continuous
sample paths. Then X and Y are indistinguishable. Hint: show that if X is a modification
of Y , then P (∀t ∈ Q Xt = Yt ) = 1.
Monotone class theorem. Let S be a collection of subsets of a set Ω s.t. (i) Ω ∈ S, (ii) A, B ∈ S with B ⊂ A ⇒ A \ B ∈ S, (iii) A_1 ⊂ A_2 ⊂ ..., A_n ∈ S ⇒ ∪_n A_n ∈ S. If a collection of subsets Γ is contained in S and is closed under pairwise intersections, then σ(Γ) ⊂ S.
This result is routinely used in stochastic analysis to check the validity of a certain property for the elements of σ(Γ), where Γ is a collection of subsets closed under intersections. According to the theorem it is sufficient to check that the validity of this property is preserved under set subtraction and countable monotone unions.
Theorem (strong law of large numbers). If ξ1 , ξ2 , ... is a collection of iid r.v.
with Eξj = m, then the means (ξ1 + ... + ξn )/n converge a.s. (and in L1 ) to m.
? Riesz-Markov Theorem. Any positive bounded linear functional on the space C_∞(R^d) has the form f → ∫ f(x) µ(dx) for some finite Borel measure µ.
Section 2. Brownian motion: construction via Hilbert space methods.
Main Def. A Brownian motion (or a Wiener process) with variance σ² is a Gaussian process B_t (defined on a probability space (Ω, F, P)) satisfying the following conditions: (i) B_0 = 0 a.s.; (ii) the increments B_t − B_s have distribution N(0, σ²(t − s)) for all 0 ≤ s < t; (iii) the r.v. B_{t_2} − B_{t_1} and B_{t_4} − B_{t_3} are independent whenever t_1 ≤ t_2 ≤ t_3 ≤ t_4; (iv) the trajectories t → B_t are continuous a.s. Brownian motion with σ = 1 is called the standard Wiener process or standard Brownian motion.
Exer. 1. A Gaussian process B_t satisfying conditions (i) and (iv) of the above definition is a Brownian motion if and only if EB_t = 0 and E(B_t B_s) = σ² min(s, t) for any t, s. Hint: E(B_t B_s) = σ² min(s, t) implies E((B_t − B_s)B_s) = 0 for t > s. Hence B_t − B_s and B_s are uncorrelated and consequently independent (being Gaussian).
Exer. 2 (elementary transformations of BM). Let B_t be a BM. Then so are the processes (i) B^c_t = c^{−1/2} B_{ct} for any c > 0 (scaling), (ii) −B_t (symmetry), (iii) B_T − B_{T−t}, t ∈ [0, T], for any T > 0 (time reversal), (iv) tB_{1/t} (time inversion). Hint: for (iv), in order to get continuity at the origin, deduce from the law of large numbers that B_t/t → 0 as t → ∞ a.s.
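As a quick illustration (a Python sketch added to these notes; the discretization step and sample size are arbitrary choices), one can simulate BM by cumulative Gaussian increments and check empirically that the scaled process B^c_t = c^{−1/2} B_{ct} of (i) has the Brownian covariance E(B_t B_s) = min(s, t):

import numpy as np

rng = np.random.default_rng(0)
n_paths, dt, c = 20_000, 0.002, 4.0
steps = int(1.0 / dt)                      # paths on the time interval [0, 1]
B = np.cumsum(rng.normal(0.0, np.sqrt(dt), size=(n_paths, steps)), axis=1)

s, t = 0.25, 0.7
i, j = int(s / dt) - 1, int(t / dt) - 1
print("E(B_s B_t)     ~", np.mean(B[:, i] * B[:, j]), " vs min(s,t) =", s)

# scaled process B^c_t = B_{ct}/sqrt(c); ct must stay in [0,1], so take s,t <= 1/c
s2, t2 = 0.1, 0.2
i2, j2 = int(c * s2 / dt) - 1, int(c * t2 / dt) - 1
print("E(B^c_s B^c_t) ~", np.mean(B[:, i2] * B[:, j2]) / c, " vs min =", s2)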
Recall: Hilbert spaces, bases, Parseval.
Def. The Haar functions H^n_k, n = 1, 2, ..., k = 0, 1, ..., 2^{n−1} − 1, on [0, 1] are defined as

H^n_k(t) = 2^{(n−1)/2} for k/2^{n−1} ≤ t < (k + 1/2)/2^{n−1},
H^n_k(t) = −2^{(n−1)/2} for (k + 1/2)/2^{n−1} ≤ t < (k + 1)/2^{n−1},
H^n_k(t) = 0 otherwise,

completed for n = 0 by the single constant function H^0_0 ≡ 1, and the Schauder functions as S^n_k(t) = ∫_0^t H^n_k(u) du (so that S^0_0(t) = t). The system of Haar functions is known to be an orthonormal basis in L^2[0, 1].
Exer. 3. Check the orthogonality condition: (H^n_k, H^m_l) = ∫_0^1 H^n_k(x) H^m_l(x) dx = δ^n_m δ^k_l. Hint: the supports of H^n_k, H^n_l do not intersect for k ≠ l.
Let ξ^n_k, n = 0, 1, 2, ..., k = 0, 1, ..., 2^{n−1} − 1 (k = 0 for n = 0), be mutually independent N(0, 1) r.v. on a probability space (Ω, F, P).
Exer. 4. Point out a probability space (Ω, F, P ), on which such a family can be
defined.
Consider the partial sums

B^m_t = Σ_{n=0}^m f_n(t, ω),   f_n(t, ω) = Σ_{k=0}^{2^{n−1}−1} ξ^n_k(ω) S^n_k(t).   (1)
The main technical ingredient of the construction is the following
Lemma. There exists a subset Ω_0 ⊂ Ω such that B^m_t converges as m → ∞ uniformly on [0, 1] for all ω ∈ Ω_0, and P(Ω_0) = 1.
Proof. Let

M_n(ω) = max{|ξ^n_j| : 0 ≤ j ≤ 2^{n−1} − 1}.

Since

P(M_n > a) ≤ Σ_{j=0}^{2^{n−1}−1} P(|ξ^n_j| > a) = 2^n (2π)^{−1/2} ∫_a^∞ e^{−x²/2} dx ≤ 2^n (2π)^{−1/2} ∫_a^∞ (x/a) e^{−x²/2} dx = 2^n (2π)^{−1/2} a^{−1} e^{−a²/2},

one sees that

Σ_{n=1}^∞ P(M_n > n) ≤ (2π)^{−1/2} Σ_{n=1}^∞ 2^n n^{−1} e^{−n²/2} < ∞.

Hence by Borel-Cantelli P(Ω_0) = 1, where

Ω_0 = {ω : M_n(ω) ≤ n for all large enough n}.

Consequently, for ω ∈ Ω_0,

|f_n(t, ω)| ≤ n Σ_{k=0}^{2^{n−1}−1} S^n_k(t) ≤ n 2^{−(n+1)/2}

for all large enough n, because max_t S^n_k(t) = 2^{−(n+1)/2} and the functions S^n_k have non-intersecting supports for different k. This implies that

Σ_n max_{0≤t≤1} |f_n(t, ω)| < ∞

on Ω_0, which clearly implies the claim of the Lemma.
Main Theorem. Let Bt denote the limit of (1) for ω ∈ Ω0 and let us put Bt = 0 for
ω outside Ω0 . Then Bt is a standard Brownian motion on [0, 1].
Proof. Since B_t is continuous in t as a uniform limit of continuous functions, conditions (i) and (iv) of the definition hold. Moreover, the finite-dimensional distributions are clearly Gaussian and EB_t = 0. Next, since

Σ_{n,k} (S^n_k)²(t) = Σ_{n,k} (1_{[0,t]}, H^n_k)² = (1_{[0,t]}, 1_{[0,t]}) = t < ∞

(by Parseval), it follows that

E[B_t − B^m_t]² = Σ_{n>m} Σ_{k=0}^{2^{n−1}−1} (S^n_k(t))² → 0

as m → ∞, and consequently B^m_t converges to B_t also in L². Hence one deduces that

E(B_t B_s) = lim_{m→∞} E(B^m_t B^m_s) = Σ_{n=0}^∞ Σ_j (1_{[0,t]}, H^n_j)(1_{[0,s]}, H^n_j) = (1_{[0,t]}, 1_{[0,s]}) = min(t, s),

which completes the proof.
Corollary. A standard Brownian motion exists on {t ≥ 0}.
Proof. By the main theorem there exists a sequence (Ω_n, F_n, P_n), n = 1, 2, ..., of probability spaces with Brownian motions W^n on each of them. Take the product probability space Ω and define B on it recursively by

B_t = B_n + W^{n+1}_{t−n},   n ≤ t ≤ n + 1.
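A numerical sketch of the construction (added for illustration; the truncation level m = 10 and the evaluation grid are arbitrary choices, and the n = 0 term is the constant Haar function) builds the partial sums (1) on a grid and checks the covariance min(s, t) empirically:

import numpy as np

def schauder(n, k, t):
    # S^n_k(t) = int_0^t H^n_k(u) du: a tent of height 2^{-(n+1)/2}; S^0_0(t) = t
    if n == 0:
        return t.copy()
    left, mid, right = k / 2**(n-1), (k + 0.5) / 2**(n-1), (k + 1) / 2**(n-1)
    h = 2.0 ** ((n - 1) / 2)
    return np.where(t < left, 0.0,
           np.where(t < mid, h * (t - left),
           np.where(t < right, h * (right - t), 0.0)))

rng = np.random.default_rng(42)
t = np.linspace(0.0, 1.0, 513)
basis = np.array([schauder(n, k, t)
                  for n in range(0, 11)
                  for k in range(max(1, 2 ** (n - 1)))])   # truncation at m = 10
xi = rng.normal(size=(5000, basis.shape[0]))               # 5000 approximate paths
paths = xi @ basis                                         # B^m on the grid

i, j = 128, 384                                            # t = 0.25 and t = 0.75
print("empirical E(B_.25 B_.75) =", np.mean(paths[:, i] * paths[:, j]),
      " (min(s, t) = 0.25)")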
Section 3. The construction of BM via Kolmogorov’s continuity theorem.
The Kolmogorov-Chentsov Continuity Theorem. Suppose a process X_t, t ∈ [0, T], on a probability space (Ω, F, P) satisfies the condition

E|X_t − X_s|^α ≤ C|t − s|^{1+β},   0 ≤ s, t ≤ T,

for some positive constants α, β, C. Then there exists a continuous modification X̃_t of X_t, which is a.s. locally Hölder continuous with exponent γ for every γ ∈ (0, β/α), i.e.

P( ω : sup_{s,t∈[0,T]: 0<|t−s|<h(ω)} |X̃_t(ω) − X̃_s(ω)| / |t − s|^γ ≤ δ ) = 1,   (1)

where h(ω) is an a.s. positive r.v. and δ > 0 is a constant.
Proof. Step 1. By Chebyshev

P(|X_t − X_s| ≥ ε) ≤ ε^{−α} E|X_t − X_s|^α ≤ C ε^{−α} |t − s|^{1+β},

and hence X_s → X_t in probability as s → t.
Step 2. Setting t = k/2^n, s = (k − 1)/2^n, ε = 2^{−γn} in the above inequality yields

P(|X_{k/2^n} − X_{(k−1)/2^n}| ≥ 2^{−γn}) ≤ C 2^{−n(1+β−αγ)}.

Hence

P( max_{1≤k≤2^n} |X_{k/2^n} − X_{(k−1)/2^n}| ≥ 2^{−γn} ) ≤ Σ_{k=1}^{2^n} P(|X_{k/2^n} − X_{(k−1)/2^n}| ≥ 2^{−γn}) ≤ C 2^{−n(β−αγ)}.

By Borel-Cantelli (by the assumption β − αγ > 0) there exists Ω_0 of measure 1 such that

max_{1≤k≤2^n} |X_{k/2^n} − X_{(k−1)/2^n}| < 2^{−γn}   ∀n ≥ n*(ω),   (2)

where n*(ω) is a positive, integer-valued r.v.
Step 3. For each n ≥ 1 define D_n = {k/2^n : k = 0, 1, ..., 2^n} and D = ∪_{n=1}^∞ D_n. For a given ω ∈ Ω_0 and n ≥ n*(ω) we shall show that ∀m > n

|X_t(ω) − X_s(ω)| ≤ 2 Σ_{j=n+1}^m 2^{−γj},   ∀t, s ∈ D_m : 0 < t − s < 2^{−n}.   (3)

For m = n + 1 necessarily t − s = 2^{−(n+1)} and (3) follows from (2) with n replaced by n + 1. Suppose (3) is valid for m = n + 1, ..., M − 1. Take s < t, s, t ∈ D_M, and define the numbers

τ_max = max{u ∈ D_{M−1} : u ≤ t},   τ_min = min{u ∈ D_{M−1} : u ≥ s},

so that

s ≤ τ_min ≤ τ_max ≤ t,   max(τ_min − s, t − τ_max) ≤ 2^{−M}.

Hence from (2)

|X_{τ_min}(ω) − X_s(ω)| ≤ 2^{−γM},   |X_{τ_max}(ω) − X_t(ω)| ≤ 2^{−γM},

and from (3) with m = M − 1

|X_{τ_max}(ω) − X_{τ_min}(ω)| ≤ 2 Σ_{j=n+1}^{M−1} 2^{−γj},

which implies (3) with m = M.
Step 4. For s, t ∈ D with

0 < t − s < h(ω) = 2^{−n*(ω)}

choose n > n*(ω) s.t.

2^{−(n+1)} ≤ t − s < 2^{−n}.

By (3)

|X_t(ω) − X_s(ω)| ≤ 2 Σ_{j=n+1}^∞ 2^{−γj} ≤ 2(1 − 2^{−γ})^{−1} 2^{−(n+1)γ} ≤ 2(1 − 2^{−γ})^{−1} |t − s|^γ,

which implies the uniform continuity of X_t with respect to t ∈ D for ω ∈ Ω_0.
Step 5. Define X̃_t = lim_{s→t, s∈D} X_s for ω ∈ Ω_0 and zero otherwise. Then X̃_t is continuous and satisfies (1) with δ = 2(1 − 2^{−γ})^{−1}.
Step 6. X̃_s = X_s for s ∈ D. Then X̃_t = X_t a.s. for all t, because X_s → X_t in probability and X_s → X̃_t a.s. as s → t, s ∈ D.
Exer. Show that for any n ∈ N there exists a constant C_n s.t. E|X|^{2n} = C_n σ^{2n} for a r.v. X with the normal distribution N(0, σ²).
Corollary 1. ∃ a probability measure P on (R^{[0,∞)}, B(R^{[0,∞)})) and a stochastic process W_t on it which is a BM under P.
Proof. By Kolmogorov's existence theorem ∃ P s.t. the co-ordinate process X_t satisfies all the properties but continuity (if needed, details are given for general Markov processes in Section 6). By Kolmogorov's continuity theorem and the Exercise above, for each T there exists a continuous modification W^T on [0, T]. Set

Ω_T = {ω : W^T_t(ω) = X_t(ω) ∀t ∈ [0, T] ∩ Q},   Ω_0 = ∩_{T=1}^∞ Ω_T.

As W^T_t = W^S_t for t ∈ [0, min(T, S)] (being continuous modifications of each other), their common values define the required process on t ≥ 0.
Corollary 2. BM is a.s. Hölder continuous with any exponent γ ∈ (0, 1/2).
Proof. From Kolmogorov's theorem and the above exercise it follows that BM is a.s. Hölder continuous with exponent γ whenever γ < (n − 1)/(2n) for some positive integer n.
CHAPTER 2. The LÉVY, MARKOV AND FELLER PROCESSES.
Section 4. Processes with s.i. increments.
Def. A probability measure µ on R^d with ch.f. φ_µ is called infinitely divisible if, for all n ∈ N, there exists a probability measure ν such that µ = ν ⋆ ... ⋆ ν (n times), or equivalently φ_µ(y) = f^n(y) with f the ch.f. of a probability measure.
Exer. 1. Convince yourself that the two definitions above are actually equivalent.
Def. and Exer. A r.v. X is called infinitely divisible whenever its law p_X is infinitely divisible. Show that this is equivalent to the existence, for any n, of iid r.v. Y_j, j = 1, ..., n, s.t. Y_1 + ... + Y_n has the law p_X.
Exer. 2. Convince yourself that any Gaussian distribution is infinitely divisible.
Examples. (i) A r.v. N with the non-negative integers as its range is called Poisson with mean (or parameter) c > 0 if

P(N = n) = (c^n / n!) e^{−c}.

Check (Exer.!) that E(N) = Var(N) = c and that the ch.f. of N is φ_N(y) = exp{c(e^{iy} − 1)}. This implies that N is infinitely divisible. (ii) Let now Z(n), n ∈ N, be a sequence of R^d-valued iid r.v. with law µ_Z. The compound Poisson r.v. is X = Z(1) + ... + Z(N) (a random walk with a random number of steps). Let us check that

φ_X(y) = exp{ ∫_{R^d} (e^{i(y,x)} − 1) c µ_Z(dx) }.
In fact,

φ_X(y) = Σ_{n=0}^∞ E(exp{i(y, Z(1) + ... + Z(N))} | N = n) P(N = n) = Σ_{n=0}^∞ E(exp{i(y, Z(1) + ... + Z(n))}) (c^n/n!) e^{−c} = Σ_{n=0}^∞ φ_Z^n(y) (c^n/n!) e^{−c} = exp{c(φ_Z(y) − 1)}.
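The formula can be checked by a quick Monte Carlo experiment (an illustrative Python sketch added here; the choice of standard normal jumps, for which φ_Z(y) = e^{−y²/2}, is an assumption made just for the test):

import numpy as np

rng = np.random.default_rng(0)
c, y, n_samples = 2.0, 1.3, 200_000

N = rng.poisson(c, size=n_samples)               # Poisson number of summands
# X = Z(1) + ... + Z(N) with Z iid N(0,1): given N = n, X is N(0, n)
X = rng.normal(0.0, 1.0, size=n_samples) * np.sqrt(N)

empirical = np.mean(np.exp(1j * y * X))
theoretical = np.exp(c * (np.exp(-y**2 / 2) - 1.0))
print("empirical   E e^{iyX}        ~", empirical)
print("theoretical exp{c(phi_Z-1)}  =", theoretical)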
Def. A Borel measure ν on R^d \ {0} is called a Lévy measure if

∫_{R^d\{0}} min(1, |x|²) ν(dx) < ∞.
Theorem 1 (the Lévy-Khintchine formula). For any b ∈ R^d, a positive definite d × d matrix A and a Lévy measure ν, the function

φ(u) = exp{ i(b, u) − (1/2)(u, Au) + ∫_{R^d\{0}} [e^{i(u,y)} − 1 − i(u, y) 1_{B_1}(y)] ν(dy) }   (1)

is a characteristic function of an infinitely divisible measure, where B_a denotes the ball of radius a in R^d centred at the origin. Conversely, any infinitely divisible distribution has a characteristic function of this form.
Proof (in one direction only). A function of form (1), once shown to be a ch.f., is automatically infinitely divisible (as its n-th roots have the same form). To show that it is indeed a ch.f. we introduce the approximations

φ_n(u) = exp{ i(b − ∫_{B_1\B_{1/n}} y ν(dy), u) − (1/2)(u, Au) + ∫_{R^d\B_{1/n}} (e^{i(u,y)} − 1) ν(dy) }.   (2)

Each φ_n is a ch.f. (of the convolution of a normal distribution and an independent compound Poisson one) and φ_n(u) → φ(u) for any u. By Lévy's theorem one only needs to show that φ is continuous at zero. This is easy (check it!).
Def. Writing φ(u) = e^{η(u)} in (4.1), the mapping η is called the characteristic exponent or Lévy exponent or Lévy symbol of φ (or of its distribution).
? Theorem 2. Any infinitely divisible probability measure µ is a weak limit of a sequence of compound Poisson distributions.
Proof. Let φ be the ch.f. of µ, so that φ^{1/n} is the ch.f. of its ”convolution root” µ_n. Define

φ_n(u) = exp{n[φ^{1/n}(u) − 1]} = exp{ ∫_{R^d} (e^{i(u,y)} − 1) n µ_n(dy) }.

Each φ_n is the ch.f. of a compound Poisson distribution and

φ_n = exp{n(e^{(1/n) ln φ(u)} − 1)} → φ(u),   n → ∞.

The proof is completed by Glivenko's theorem.
Def. Processes with s.i. increments. A process X = X_t, t ≥ 0, has independent increments if for any collection of times 0 ≤ t_1 < ... < t_{n+1} the r.v. X_{t_{j+1}} − X_{t_j}, j = 1, ..., n, are independent, and it has stationary increments if X_t − X_s is distributed like X_{t−s} − X_0 for any t > s. X is a Lévy process if (i) X_0 = 0 a.s., (ii) X has s.i. (stationary and independent) increments, and (iii) X is stochastically continuous, i.e. ∀ a > 0, s ≥ 0,

lim_{t→s} P(|X(t) − X(s)| > a) = 0.

Under (i), (ii), the latter is equivalent to lim_{t→0} P(|X(t)| > a) = 0 for all a > 0.
An alternative version of the definition of Lévy processes requires right continuity of paths instead of stochastic continuity. At the end of the day this leads to the same class of processes because, on the one hand, the conclusions of Theorems 3 and 4 below are easily seen to remain valid under this assumption (which leads to stochastic continuity), and on the other hand, any Lévy process as defined above has a right continuous modification, as we shall see later. So we shall usually consider the right continuous modifications of Lévy processes.
Theorem 3. If X_t is stochastically continuous, then the map t → φ_{X_t}(u) is continuous for each u.
Proof. Follows from

|φ_{X_t}(u) − φ_{X_s}(u)| = | ∫ e^{i(u,X_s)} [e^{i(u,X_t−X_s)} − 1](ω) P(dω) | ≤ ∫ |e^{i(u,y)} − 1| p_{X_t−X_s}(dy) ≤ sup_{|y|<δ} |e^{i(u,y)} − 1| + 2P(|X_t − X_s| > δ).
Exer. 3. Let a right continuous function f : R_+ → C satisfy f(t + s) = f(t)f(s) and f(0) = 1. Show that f(t) = e^{tα} with some α. Hint: consider first t ∈ N, then t ∈ Q, then use continuity.
Theorem 4. If X is a Lévy process, then X_t is infinitely divisible for all t and φ_{X_t}(u) = e^{tη(u)}, where η(u) is the Lévy symbol of X_1.
Proof. φ_{X_{t+s}}(u) = φ_{X_t}(u) φ_{X_s}(u) and φ_{X_0}(u) = 1. Hence by Exer. 3, φ_{X_t} = exp{tα(u)}. But φ_{X_1} = exp{η(u)}, so that α = η.
Example. Convince yourself that the Brownian motion is a Lévy process.
Def. The Poisson process of intensity c > 0 is a right continuous Lévy process s.t.
each r.v. Nt is Poisson with the parameter tc.
Construction of Poisson processes. The existence of Poisson processes can be obtained by the following explicit construction. Let τ_1, τ_2, ... be a sequence of iid exponential r.v. with parameter c > 0, i.e. P(τ_i > s) = e^{−cs}, s > 0. Introduce the partial sums S_n = τ_1 + ... + τ_n. These sums have the Gamma(c, n) distributions

P(S_n ∈ ds) = (c^n / (n − 1)!) s^{n−1} e^{−cs} ds

(Exer.: check it by induction, taking into account that the distribution of S_n is the convolution of the distributions of S_{n−1} and τ_n). Define N_t as the right continuous inverse of S_n, i.e.

N_t = sup{n ∈ N : S_n ≤ t},

so that P(S_k ≤ t) = P(N_t ≥ k) and

P(N_t = n) = P(S_n ≤ t, S_{n+1} > t) = ∫_0^t (c^n / (n − 1)!) s^{n−1} e^{−cs} e^{−c(t−s)} ds = e^{−ct} (ct)^n / n!.
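The construction is easy to test numerically; the following Python sketch (an added illustration, with all parameters chosen arbitrarily) builds N_t from iid exponential waiting times and compares its empirical distribution with e^{−ct}(ct)^n/n!:

import math
import numpy as np

rng = np.random.default_rng(0)
c, t, n_paths = 1.5, 2.0, 100_000
max_jumps = 40                    # P(N_t >= 40) is negligible for ct = 3

tau = rng.exponential(1.0 / c, size=(n_paths, max_jumps))   # waiting times
S = np.cumsum(tau, axis=1)        # jump times S_1, S_2, ...
N_t = np.sum(S <= t, axis=1)      # N_t = sup{n : S_n <= t}

for n in range(6):
    emp = np.mean(N_t == n)
    theo = math.exp(-c * t) * (c * t) ** n / math.factorial(n)
    print(f"P(N_t = {n}): empirical {emp:.4f}, e^(-ct)(ct)^n/n! = {theo:.4f}")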
Exer. 4. Prove that the process N_t constructed above is in fact a Lévy process by showing that

P(N_{t+r} − N_t ≥ n, N_t = k) = P(N_r ≥ n) P(N_t = k) = P(S_n ≤ r) P(N_t = k).   (3)

Hint: take, say, n > 1 (the cases n = 0 or 1 are even simpler) and observe that the l.h.s. of (3) is the probability of the event

{S_k ≤ t, S_{k+1} > t, S_{n+k} ≤ t + r} = {S_k = s ≤ t, τ_{k+1} = τ > t − s, S_{n+k} − S_{k+1} = v ≤ (t + r) − (s + τ)},

so that by independence the l.h.s. of (3) equals

∫_0^t (c^k / (k − 1)!) s^{k−1} e^{−cs} ds ∫_{t−s}^∞ c e^{−cτ} dτ ∫_0^{(t+r)−(τ+s)} (c^{n−1} / (n − 2)!) v^{n−2} e^{−cv} dv,

which, changing τ to τ + s and denoting it again by τ, rewrites as

∫_0^t (c^k / (k − 1)!) s^{k−1} ds ∫_t^∞ c e^{−cτ} dτ ∫_0^{t+r−τ} (c^{n−1} / (n − 2)!) v^{n−2} e^{−cv} dv.

By calculating the integral over ds and changing the order of v and τ this in turn rewrites as

((ct)^k / k!) ∫_0^r (c^{n−1} / (n − 2)!) v^{n−2} e^{−cv} dv ∫_t^{t+r−v} c e^{−cτ} dτ = e^{−ct} ((ct)^k / k!) ∫_0^r (c^{n−1} / (n − 2)!) v^{n−2} (e^{−cv} − e^{−cr}) dv.

It remains to see that by integration by parts the integral in this expression equals

∫_0^r (c^n / (n − 1)!) s^{n−1} e^{−cs} ds,

and (3) follows.
Exer. 5 and Def. Let Z(n), n ∈ N, be a sequence of R^d-valued iid r.v. with law µ_Z. The compound Poisson process (with the distribution of jumps µ_Z and intensity λ) is defined as

Y(t) = Z(1) + ... + Z(N_t),

where N_t is a Poisson process of intensity λ. The corresponding compensated compound Poisson process is defined as

Ỹ_t = Y(t) − tλ EZ(1).

From the above calculation of the ch.f. of a compound Poisson r.v. it follows that Y(t) is a Lévy process with the Lévy exponent

η(u) = ∫ (e^{i(u,y)} − 1) λ µ_Z(dy).   (4)

Check (i) that Y_t is a Lévy process and (ii) that EỸ_t = 0. Hint: to check condition (iii) in the definition of Lévy processes write

P(|Y_t| > a) = Σ_{n=0}^∞ P(|Z(1) + ... + Z(n)| > a) P(N_t = n)

and use dominated convergence (alternatively it follows from the obvious right continuity).
Remark. The existence of a Lévy process with a given characteristic exponent can be proved by various constructions. The fastest way is based on carrying out, on the level of processes, the limiting procedure outlined for r.v. in our proof of Theorem 1 (in other words, via the Lévy-Itô decomposition described below). But we shall obtain the existence (of a right continuous modification) later by a more general procedure (applied to all Feller processes) in three steps: (i) building finite-dimensional distributions via the Markov property, (ii) using Kolmogorov's existence theorem to obtain a canonical process, (iii) defining a right continuous modification via martingale methods.
Def. A Lévy process X_t with a characteristic exponent

η(u) = i(b, u) − (1/2)(u, Au)   (5)

(where A is a positive definite d × d matrix, b ∈ R^d) and with a.s. continuous paths is called the d-dimensional Brownian motion with covariance A and drift b. It is called standard if A = I, b = 0.
Exer. 6. Show that this is equivalent to saying that X_t = B_t is a Gaussian process s.t. (i) B_0 = 0 a.s.; (ii) the increments B_t − B_s have normal distribution N((t − s)b, (t − s)A) for all 0 ≤ s < t; (iii) the r.v. B_{t_2} − B_{t_1} and B_{t_4} − B_{t_3} are independent whenever t_1 ≤ t_2 ≤ t_3 ≤ t_4; (iv) the trajectories t → B_t are continuous a.s.
Exer. 7. Prove the existence of BM B_t with a given drift and covariance. Hint: first construct a standard d-dimensional BM W_t using product measure spaces, then define B_t = bt + √A W_t.
By ∆Xt = Xt − Xt− we shall denote the jumps of Xt .
Theorem 5 (Lévy-Itô decomposition). Let X_t be a right continuous Lévy process with a characteristic exponent

η(u) = i(b, u) − (1/2)(u, Au) + ∫_{R^d\{0}} [e^{i(u,y)} − 1 − i(u, y) 1_{B_1}(y)] ν(dy).   (6)

Then X_t can be represented as the sum of three independent Lévy processes X_t = X^1_t + X^2_t + X^3_t, where X^1_t is the BM with drift specified by the Lévy exponent (5),

X^2_t = Σ_{s≤t} ∆X_s 1_{|∆X_s|>1}

is a compound Poisson process with the exponent

η_2(u) = ∫_{R^d\B_1} [e^{i(u,y)} − 1] ν(dy)   (7)

obtained by summing the jumps of X_t of size exceeding 1, and X^3_t is the limit of the compensated compound Poisson processes X^3_t(n) with the exponents

η_n(u) = ∫_{B_1\B_{1/n}} [e^{i(u,y)} − 1] ν(dy) − i(u, ∫_{B_1\B_{1/n}} y ν(dy)).

The process X^3_t has jumps only of size not exceeding 1 and has all finite moments E|X^3_t|^m, m > 0.
Proof. Straightforward from (1) and (2). In particular, the product form of the ch.f. (1) ensures the independence of the X^i, i = 1, 2, 3; formula (7) comes by comparison with (4); and the moments of X^3_t are controlled by the quantities

∫_{B_1} |y|^{2k} ν(dy) < ∞,   k = 1, 2, ...
Corollary 1. The only continuous Lévy processes are BM with drift or deterministic processes (pure drifts).
Corollary 2. For any collection of disjoint Borel sets A_i, i = 1, ..., n, not containing zero in their closures, the processes

X^{A_i}_t = Σ_{s≤t} ∆X_s 1_{∆X_s ∈ A_i}

are independent compound Poisson processes with characteristic exponents

η_{A_i}(u) = ∫_{A_i} (e^{i(u,y)} − 1) ν(dy),   (8)

and X_t − Σ_{j=1}^n X^{A_j}_t is a Lévy process independent of all the X^{A_j} with jumps only outside ∪_j A_j. Moreover, the processes N(t, A_i) that count the number of jumps of X_t (or of X^{A_i}_t) in A_i up to time t are independent Poisson processes of intensity ν(A_i).
Def. Let µ be a σ-finite measure on a metric space E (we need only the case of E a Borel subset of R^d). A collection of r.v. φ(B) parametrized by Borel subsets of E is a Poisson random measure with intensity µ if each φ(B) is a Poisson r.v. with parameter µ(B) and if φ(B_1), ..., φ(B_n) are independent whenever B_1, ..., B_n are disjoint.
Corollary 3. The collection of r.v. N((s, t], A) = N(t, A) − N(s, A) (notations from Corollary 2) counting the number of jumps of X_t of size in A that occur in the time interval (s, t] specifies a Poisson random measure on (0, ∞) × (R^d \ {0}) with intensity dt ⊗ ν.
Remark. To prove the existence of a Lévy process in the spirit of Theorem 5 one needs two additional ingredients: the existence of a Poisson random measure with an arbitrary intensity (which is rather easy), in order to construct the processes N(t, A) of jumps of a Lévy process, and the proof of convergence of the approximations X^3_t(n) (see Theorem 5), for which one needs Doob's maximal inequality for martingales (which can be derived as a consequence of Doob's optional sampling given in Section 8).
Def. The non-decreasing Lévy processes with values in R+ are called subordinators.
Theorem 6. A real valued Lévy process X_t is a subordinator iff its characteristic exponent has the form

η(u) = ibu + ∫_0^∞ (e^{iuy} − 1) ν(dy),   (9)

where b ≥ 0 and the Lévy measure ν has support in R_+ and satisfies the additional condition

∫_0^1 x ν(dx) < ∞.   (10)

Moreover

X_t = tb + Σ_{s≤t} ∆X_s.
Proof. First, if X is positive, then it can only increase from X_0 = 0. Hence by the s.i. property it is a non-decreasing process, and consequently the Lévy measure has support in R_+ and X contains no Brownian part, i.e. A = 0 in (6). Next,

Σ_{s≤t} (∆X_s) 1_{∆X_s ≤ 1} = Σ_{s≤t} |∆X_s| 1_{∆X_s ≤ 1} ≤ X_t < ∞ a.s.,

while

E Σ_{s≤t} (∆X_s) 1_{∆X_s ≤ 1} = lim_{ε→0} E Σ_{s≤t} (∆X_s) 1_{ε≤∆X_s≤1} = t ∫_0^1 x ν(dx).

Since a Poisson integral of this kind is a.s. finite only if its expectation is finite, this implies (10).
Def. Clearly for a subordinator X_t the Laplace transform is well defined:

E e^{−λX_t} = exp{−tΦ(λ)},   (11)

where

Φ(λ) = −η(iλ) = bλ + ∫_0^∞ (1 − e^{−λy}) ν(dy)   (12)

is called the Laplace exponent or cumulant.
Def. A subordinator X_t is a one-sided stable process if to each a ≥ 0 there corresponds a constant b(a) ≥ 0 s.t. aX_t and X_{b(a)t} have the same law.
Exer. 8 (exponents of stable subordinators). (i) Show that b(a) in this definition is continuous and satisfies the equation b(ac) = b(a)b(c); hence deduce that b(a) = a^α with some α > 0, called the index of stability or stability exponent. (ii) Deduce further that Φ(a) = b(a)Φ(1) and hence

E e^{−uX_t} = exp{−tru^α}   (13)

with a constant r > 0, called the rate. Taking into account that Φ from (12) is increasing and concave, deduce that necessarily α ∈ (0, 1). (iii) Show that for α ∈ (0, 1)

∫_0^∞ (1 − e^{−uy}) dy / y^{1+α} = (Γ(1 − α)/α) u^α   (14)

by using integration by parts in order to rewrite the l.h.s. of this equation as

(u/α) ∫_0^∞ e^{−uy} y^{−α} dy.

Deduce that the stable subordinator with index α and rate r described by (13) has the Laplace exponent (12) with the Lévy measure

ν(dy) = r (α/Γ(1 − α)) y^{−(1+α)} dy.   (15)
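Identity (14) is easy to confirm numerically; a small check (an added illustration assuming scipy is available; α and u are arbitrary test values):

import numpy as np
from scipy.integrate import quad
from scipy.special import gamma

alpha, u = 0.6, 2.5
integrand = lambda y: (1.0 - np.exp(-u * y)) * y ** (-1.0 - alpha)
# split the integral at 1 to help the quadrature with the singularity at 0
lhs = quad(integrand, 0.0, 1.0)[0] + quad(integrand, 1.0, np.inf)[0]
rhs = gamma(1.0 - alpha) / alpha * u ** alpha
print("l.h.s. of (14) =", lhs, "  r.h.s. =", rhs)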
Exer. 9. Prove the law of large numbers for a Poisson process N_t of intensity c: N_t/t → c a.s. as t → ∞. Hint: use the construction of N_t given above and the fact that S_n/n → 1/c as n → ∞ according to the usual law of large numbers.
Section 5. Conditioning.
Def. For a given measure space (S, F, µ), a measure ν on (S, F) is called absolutely
continuous with respect to µ if ν(A) = 0 whenever A ∈ F and µ(A) = 0. Two measures
are called equivalent if they are mutually absolutely continuous.
The Radon-Nikodym Theorem. If µ is σ-finite and ν is finite and absolutely continuous with respect to µ, then there exists a unique (up to almost sure equality) non-negative measurable function g on S such that for all A ∈ F

ν(A) = ∫_A g(x) µ(dx).

This g is called the Radon-Nikodym derivative of ν with respect to µ and is often denoted dν/dµ.
Def. Conditional expectation. Let X be an integrable r.v. on a probability space (Ω, F, P) and let G be a sub-σ-algebra of F. If X ≥ 0 everywhere, the formula Q_X(A) = E(X 1_A) for A ∈ G defines a measure Q_X on (Ω, G) that is obviously absolutely continuous with respect to P. The r.v. E(X|G) = dQ_X/dP on (Ω, G, P) is called the conditional expectation of X with respect to G. If X is not supposed to be positive, one defines the conditional expectation as E(X|G) = E(X^+|G) − E(X^−|G). In other words, Y = E(X|G) is a r.v. on (Ω, G, P) such that

∫_A Y(ω) P(dω) = ∫_A X(ω) P(dω)   (1)

for all A ∈ G. If X = (X_1, ..., X_d) ∈ R^d, then

E(X|G) = (E(X_1|G), ..., E(X_d|G)).
Exer. 1. Let a σ-algebra G be finite, namely given as the set of unions of a finite collection of disjoint sets G_i ∈ F, i = 1, ..., n, of positive probability, such that Ω = ∪_{i=1}^n G_i. Show that for any i the function E(X|G) is constant on G_i and equals (1/P(G_i)) ∫_{G_i} X(ω) P(dω).
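A toy discrete example (an added Python illustration of Exer. 1; the partition and the values of X are arbitrary) computes E(X|G) as the P-weighted average of X over each block G_i and also verifies property (i) of Theorem 1 below:

import numpy as np

P = np.array([0.1, 0.2, 0.1, 0.2, 0.3, 0.1])       # probabilities of 6 atoms
X = np.array([1.0, 3.0, -2.0, 0.0, 5.0, 4.0])
blocks = [np.array([0, 1]), np.array([2, 3]), np.array([4, 5])]  # the sets G_i

cond_exp = np.empty_like(X)
for G in blocks:
    # on G_i, E(X|G) = (1/P(G_i)) * integral of X dP over G_i
    cond_exp[G] = np.sum(X[G] * P[G]) / np.sum(P[G])

print("E(X|G) as a function of omega:", cond_exp)
print("E(E(X|G)) =", np.sum(cond_exp * P), " equals E(X) =", np.sum(X * P))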
Theorem 1 (key properties of the conditional expectation).
(i) E(E(X|G)) = E(X);
(ii) if Y is G-measurable, then E(XY|G) = Y E(X|G) a.s.;
(iii) if Y is G-measurable and X is independent of G, then E(XY|G) = Y E(X) a.s. and

E(f(X, Y)|G) = G_f(Y)   (2)

a.s. for a Borel function f, where G_f(y) = E(f(X, y));
(iv) if H is a sub-σ-algebra of G, then E(E(X|G)|H) = E(X|H) a.s.;
(v) the mapping X → E(X|G) is an orthogonal projection L²(Ω, F, P) → L²(Ω, G, P);
(vi) X_1 ≤ X_2 ⇒ E(X_1|G) ≤ E(X_2|G) a.s.;
(vii) the mapping X → E(X|G) is a linear contraction L¹(Ω, F, P) → L¹(Ω, G, P).
Exer. 2. Prove the above theorem. Hint: (ii) consider first the case with Y being the indicator function of a G-measurable set; (v) assume X = Y + Z with Y from L²(Ω, G, P) and Z from its orthogonal complement and show that Y = E(X|G); (vi) follows from the obvious remark that X ≥ 0 ⇒ E(X|G) ≥ 0.
Exer. 3. Give an alternative construction of the conditional expectation (proving all its properties) by-passing Radon-Nikodym: define it by property (v) from the above theorem.
Def. If Z is a r.v. on (Ω, F, P) one calls E(X|σ(Z)) the conditional expectation of X with respect to Z and denotes it shortly by E(X|Z).
Exercise 4 and Def. Show that the r.v. E(X|Z) is constant on any Z-level set {ω : Z(ω) = z}. One denotes this constant by E(X|Z = z) and calls it the conditional expectation of X given Z = z. Show that

E(X) = ∫ E(X|Z)(ω) P(dω) = ∫ E(X|Z = z) p_Z(dz).   (3)

Hint: use (1.1) with the function f defined by f(Z(ω)) = E(X|Z)(ω) = E(X|Z = z)|_{z=Z(ω)}.
Def. Let X and Z be R^d- and respectively R^m-valued r.v. on (Ω, F, P), and let G be a sub-σ-algebra of F. The conditional probabilities of X given G and of X given Z = z respectively are defined as

P_{X|G}(B; ω) ≡ P(X ∈ B|G)(ω) = E(1_B(X)|G)(ω),   ω ∈ Ω;

P_{X|Z=z}(B) ≡ P(X ∈ B|Z = z) = E(1_B(X)|Z = z)

for Borel sets B, or equivalently through the equations

E(f(X)|G)(ω) = ∫_{R^d} f(x) P_{X|G}(dx; ω),   (4)

E(f(X)|Z = z) = ∫_{R^d} f(x) P_{X|Z=z}(dx)

for bounded Borel functions f. Of course P_{X|Z=z}(B) is just the common value of P_{X|Z}(B; ω) on the set {ω : Z(ω) = z}.
It is possible to show (though this is not obvious) that a regular conditional probability of X given G exists, i.e. a version of the conditional probability such that P_{X|G}(B; ω) is a probability measure on R^d as a function of B for each ω (notice that from the above discussion the required additivity of conditional expectations holds a.s. only, so that they may fail to define a probability measure even a.s.) and is G-measurable as a function of ω. Hence one can define conditional r.v. X_G(ω), X_Z(ω) and X_{Z=z} as r.v. with the corresponding conditional distributions.
Exer. 5. For a Borel function h,

E h(X, Z) = ∫∫ h(x, z) P_{X|Z=z}(dx) p_Z(dz)   (5)

(if the l.h.s. is well defined). Hint: from the above definition

∫_A ∫_{R^d} f(x) P_{X|G}(dx; ω) P(dω) = ∫_A f(X(ω)) P(dω),   A ∈ G,

and in particular

∫_C ∫_{R^d} f(x) P_{X|Z=z}(dx) p_Z(dz) = ∫ 1_{Z∈C}(ω) f(X(ω)) P(dω),   C ∈ B(R^m).

Hence

∫_{R^m} ∫_{R^d} g(z) f(x) P_{X|Z=z}(dx) p_Z(dz) = E(f(X) g(Z))

for Borel f, g, which implies (5).
Exer. 6. Deduce from (5) that (i) if X, Z are r.v. with a joint probability density function f_{X,Z}(x, z), then the conditional r.v. X_{Z=z} has the probability density function

f_{X_{Z=z}}(x) = f_{X,Z}(x, z)/f_Z(z)

whenever f_Z(z) does not vanish; (ii) if X, Z are discrete r.v. with joint probabilities P(X = i, Z = j) = p_{ij}, then the conditional probabilities P(X = i|Z = j) are given by the usual formula p_{ij}/P(Z = j).
Theorem 2. Let X be an integrable r.v. on (Ω, F, P) and let G_n be (i) an increasing sequence of sub-σ-algebras of F with G the minimal σ-algebra containing all G_n, or (ii) a decreasing sequence of sub-σ-algebras of F with G = ∩_{n=1}^∞ G_n. Then a.s. and in L¹

E(X|G) = lim_{n→∞} E(X|G_n).   (6)

Furthermore, if X_n → X a.s. and |X_n| ≤ Y for all n, where Y is an integrable r.v., then a.s. and in L¹

E(X|G) = lim_{n→∞} E(X_n|G_n).   (7)

Sketch of the proof of the convergence in L¹ (a.s. convergence is a bit more involved, and we shall neither prove nor use it). Any r.v. of the form χ_B with B ∈ G can be approximated in L² by G_n-measurable r.v. ξ_n. Hence the same holds for any r.v. from L²(Ω, F, P). As E(X|G_n) is the best approximation (the L²-projection), one obtains (6) for X ∈ L²(Ω, F, P), and hence for X ∈ L¹(Ω, F, P) by density arguments. Next,

E(X_n|G_n) − E(X|G) = E(X_n − X|G_n) + (E(X|G_n) − E(X|G)).

Since |X_n| ≤ Y and X_n → X a.s., one concludes that X_n → X in L¹ by dominated convergence. Hence

E(E(|X_n − X| | G_n)) = E|X_n − X| → 0.
Theorem 3. If X ∈ L¹(Ω, F, P), the family of r.v. E(X|G), where G runs through all sub-σ-algebras of F, is uniformly integrable.
Proof.

1_{|E(X|G)|>c} E(X|G) = E(X 1_{|E(X|G)|>c} | G),

because {|E(X|G)| > c} ∈ G. Hence

E(1_{|E(X|G)|>c} |E(X|G)|) ≤ E(1_{|E(X|G)|>c} |X|) ≤ E(|X| 1_{|X|>d}) + d P(|E(X|G)| > c) ≤ E(|X| 1_{|X|>d}) + (d/c) E(|X|).

First choose d to make the first term small, then c to make the second one small.
Theorem 4 (locality of the conditional expectation). Let the σ-algebras G_1, G_2 ⊂ F and the r.v. X_1, X_2 ∈ L¹(Ω, F, P) be such that G_1 = G_2 and X_1 = X_2 on a set A ∈ G_1 ∩ G_2. Then E(X_1|G_1) = E(X_2|G_2) a.s. on A.
Proof. Note that 1_A E(X_1|G_1) and 1_A E(X_2|G_2) are both G_1 ∩ G_2-measurable, and for any B ⊂ A s.t. B ∈ G_1 (and hence B ∈ G_2)

∫_B E(X_1|G_1) P(dω) = ∫_B X_1 P(dω) = ∫_B X_2 P(dω) = ∫_B E(X_2|G_2) P(dω).
Section 6. Markov processes.
Def. Let (Ω, F) be a measurable space. A family Ft , t ≥ 0, of sub-σ-algebras of
F is called a filtration if Fs ⊂ Ft whenever s ≤ t. By F∞ one denotes the minimal
σ-algebra containing all Ft . A probability space (Ω, F, P ) with a filtration is said to be
filtered. A process X = Xt defined on a filtered probability space (Ω, F, P ) is adapted
(or Ft -adapted) if Xt is Ft -measurable for each t. Any process X defines its own natural
filtration FtX = σ{Xs : 0 ≤ s ≤ t} and X is clearly adapted to it.
Main Def. An adapted process X = X_t on a filtered probability space (Ω, F, P) is called a Markov process if for all f ∈ B_b(R^d), 0 ≤ s ≤ t, it satisfies the following Markov property:

E(f(X_t)|F_s) = E(f(X_t)|X_s) a.s.,   (1)

and moreover the function

Φ^{s,t} f(x) = E(f(X_t)|X_s = x)   (2)

belongs to B_b(R^d) whenever f does, for any 0 ≤ s ≤ t.
Theorem 1. Any Lévy process X (e.g. Brownian motion) is Markov with respect to its natural filtration. Moreover

E(f(X_t)|F^X_s) = ∫_{R^d} f(X_s + z) p_{t−s}(dz)   (3)

for f ∈ B_b(R^d), 0 ≤ s < t, where p_t is the law of X_t.
Proof. By (5.2)

E(f(X_t)|F^X_s) = E(f(X_t − X_s + X_s)|F^X_s) = G_f(X_s),

where

G_f(y) = E(f(X_t − X_s + y)) = ∫ f(z + y) p_{t−s}(dz),

and (3) follows. Similarly the r.h.s. of (1) equals the r.h.s. of (3), implying (1) with the filtration F^X_t.
Def. A Lévy process X_t on a probability space (Ω, F, P) equipped with a filtration F_t is called an F_t-Lévy process if it is F_t-adapted and the increments X_t − X_s are independent of F_s for all 0 ≤ s < t.
Theorem 2 (properties of transition operators). (i) Φ^{s,s} = I (the identity operator); (ii) (positivity) f ≥ 0 ⇒ Φ^{s,t} f ≥ 0; (iii) (conservativity) Φ^{s,t} 1 = 1; (iv) (propagator property) Φ^{r,s} Φ^{s,t} = Φ^{r,t} for r ≤ s ≤ t.
Proof. (i)-(iii) are obvious and do not depend on the Markov property. (iv) By (6.1)

Φ^{r,t} f(x) = E(f(X_t)|X_r = x) = E(E(f(X_t)|F_s)|X_r = x) = E(E(f(X_t)|X_s)|X_r = x) = E(Φ^{s,t} f(X_s)|X_r = x) = (Φ^{r,s}(Φ^{s,t} f))(x).
Def. For a Markov process X the transition probabilities are defined by

p_{s,t}(x, A) = (Φ^{s,t} 1_A)(x) = P(X_t ∈ A|X_s = x),

so that

(Φ^{s,t} f)(x) = ∫_{R^d} f(y) p_{s,t}(x, dy),   f ∈ B_b(R^d).

A Markov process has transition densities whenever the measures p_{s,t}(x, ·) have densities, say ρ_{s,t}(x, y), so that p_{s,t}(x, A) = ∫_A ρ_{s,t}(x, y) dy.
Exer. 1. (i) Show that for a Lévy process p_{s,t}(x, A) = q_{t−s}(A − x), where q_t is the law of X_t. (ii) Write down the transition probability density of Brownian motion.
Theorem 3 (the Chapman-Kolmogorov equations). If X is a Markov process, then for any Borel A

p_{r,t}(x, A) = ∫_{R^d} p_{s,t}(y, A) p_{r,s}(x, dy).

Proof. Apply the operator equation Φ^{r,s} Φ^{s,t} = Φ^{r,t} to the indicator function 1_A.
Exer. 2 (Chapman-Kolmogorov for processes with transition densities). If a Markov process has transition densities, then Chapman-Kolmogorov rewrites as

ρ_{r,t}(x, z) = ∫_{R^d} ρ_{r,s}(x, y) ρ_{s,t}(y, z) dy.
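For the Brownian transition density ρ_{s,t}(x, y) (the heat kernel over the time step t − s) this identity can be checked numerically; a minimal Python sketch, added for illustration:

import numpy as np

def rho(tau, x, y):
    """Standard BM transition density over a time step tau."""
    return np.exp(-(y - x) ** 2 / (2 * tau)) / np.sqrt(2 * np.pi * tau)

r, s, t = 0.0, 0.4, 1.0
x, z = 0.3, -0.8
y = np.linspace(-15, 15, 20001)            # integration grid for the y-variable
dy = y[1] - y[0]

lhs = rho(t - r, x, z)
rhs = np.sum(rho(s - r, x, y) * rho(t - s, y, z)) * dy
print("rho_{r,t}(x,z) =", lhs, "  Chapman-Kolmogorov integral =", rhs)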
Def. A family of mappings {p_{s,t} : 0 ≤ s ≤ t < ∞} from R^d × B(R^d) to [0, 1] is said to be a transition family (shortly t.f.) if (i) p_{s,t}(x, A) is measurable as a function of x and is a probability measure as a function of A, and (ii) the Chapman-Kolmogorov equations hold.
Theorem 4. A process X is Markov with respect to its natural filtration F^X_t with t.f. p_{s,t} and initial measure ν ⇔ for any 0 = t_0 < t_1 < ... < t_k and positive Borel f_i

E Π_{i=0}^k f_i(X_{t_i}) = ∫ ν(dx_0) f_0(x_0) ∫ p_{0,t_1}(x_0, dx_1) f_1(x_1) ... ∫ p_{t_{k−1},t_k}(x_{k−1}, dx_k) f_k(x_k).   (4)
Proof. Let X be Markov with t.f. p_{s,t}. Then

E Π_{i=0}^k f_i(X_{t_i}) = E( Π_{i=0}^{k−1} f_i(X_{t_i}) E(f_k(X_{t_k})|F_{t_{k−1}}) ) = E( Π_{i=0}^{k−1} f_i(X_{t_i}) Φ^{t_{k−1},t_k} f_k(X_{t_{k−1}}) ) = E( Π_{i=0}^{k−1} f_i(X_{t_i}) ∫ p_{t_{k−1},t_k}(X_{t_{k−1}}, dx_k) f_k(x_k) ),

and repeating this inductively one arrives at the r.h.s. of (4). Conversely, as (1)-(2) are equivalent to

∫_A f(X_t(ω)) P(dω) = ∫_A (Φ^{s,t} f)(X_s(ω)) P(dω),   A ∈ F_s

(here one uses that E(f(X_t)|X_s)(ω) is constant on the level sets of X_s and hence can be written as (Φ^{s,t} f)(X_s(ω))), and because F_s is generated by the sets ∩_{i=1}^k {X_{t_i} ∈ A_i}, t_1 < ... < t_k ≤ s, to prove that X is Markov one has to show that for any t_1 < ... < t_k ≤ s < t and Borel functions f_1, ..., f_k, g

E( Π_{i=0}^k f_i(X_{t_i}) g(X_t) ) = E( Π_{i=0}^k f_i(X_{t_i}) (Φ^{s,t} g)(X_s) ),

and this follows by applying (4) to both sides of this equation.
Theorem 5. Let {p_{s,t} : 0 ≤ s ≤ t < ∞} be a transition family and µ a probability measure on R^d. Then there exists a probability measure P on the space (R^d)^{R_+}, equipped with the natural filtration F^0_t = σ(X_u : u ≤ t) generated by the co-ordinate process X, s.t. the co-ordinate process X_t is Markov with initial distribution µ and with t.f. p_{s,t}.
Proof. On cylinder sets define

p_{t_0,t_1,...,t_n}(A_0 × A_1 × ... × A_n) = ∫_{A_0} µ(dx_0) ∫_{A_1} p_{0,t_1}(x_0, dx_1) ∫_{A_2} p_{t_1,t_2}(x_1, dx_2) ... ∫_{A_n} p_{t_{n−1},t_n}(x_{n−1}, dx_n).

Chapman-Kolmogorov ⇒ consistency, which implies (by Kolmogorov's existence theorem) the existence of a process X_t with these finite-dimensional distributions. Clearly X_0 has law µ and X_t is adapted to its natural filtration. Theorem 4 ensures that this process is Markov.
Def. A Markov process constructed as in the above theorem is called the canonical process corresponding to the t.f. p_{s,t}.
Def. A Markov process is called (time) homogeneous if p_{s,t} depends on the difference t − s only. One then writes p_{t−s} for p_{s,t} and Φ^{t−s} for Φ^{s,t}.
We shall deal only with homogeneous Markov processes.
Exer. 3. Let X be a canonical Markov process and Z an F^0_∞-measurable bounded (or positive) function on (R^d)^{R_+}. Then the map x → E_x(Z) is (Borel) measurable and

E_ν(Z) = ∫ ν(dx) E_x(Z)

for any probability measure ν (the initial distribution of X). Hint: extend by the monotone class theorem from the mappings Z that are indicators of cylinders, for which this is equivalent to (3).
? Theorem 6 (a more powerful formulation of the Markov property). The co-ordinate process on ((R^d)^{R_+}, F^0_∞, P) is Markov ⇔ for any bounded (or positive) r.v. Z on (R^d)^{R_+}, every t > 0 and starting measure ν

E_ν(Z ◦ θ_t | F^0_t) = E_{X_t}(Z)   P_ν-a.s.,

where θ is the canonical shift operator: X_s(θ_t(ω)) = X_{t+s}(ω).
Proof. One needs to show that

E_ν((Z ◦ θ_t) Y) = E_ν(E_{X_t}(Z) Y)

for F^0_t-measurable r.v. Y. By the usual extension arguments it is enough to do it for Y = Π_{i=1}^k f_i(X_{t_i}) and Z = Π_{j=1}^n g_j(X_{s_j}), where t_i ≤ t and f_i, g_j are positive Borel. Thus one has to show that

E_ν( Π_{j=1}^n g_j(X_{s_j+t}) Π_{i=1}^k f_i(X_{t_i}) ) = E_ν( E_{X_t}( Π_{j=1}^n g_j(X_{s_j}) ) Π_{i=1}^k f_i(X_{t_i}) ).

But the l.h.s. equals

E_ν( E( Π_{j=1}^n g_j(X_{s_j+t}) | F^0_t ) Π_{i=1}^k f_i(X_{t_i}) ),

which coincides with the r.h.s. by the homogeneous Markov property.
Section 7. Feller processes and semigroups.
Recall: Banach spaces: Lp (Ω, F, P ), p ≥ 1, L∞ (Ω, F, P ), Bb (X), Cb (X), C∞ (X),
convergence, linear operators and their norms, dense subspaces.
Def. A semigroup of linear contractions on a Banach space B is a family Φ_t, t ≥ 0, of bounded linear operators on B with norm not exceeding one s.t. Φ_0 is the identity operator and Φ_t Φ_s = Φ_{t+s} for all t, s ≥ 0. Such a semigroup on the Banach space B_b(X) (X a subset of R^d) is called a sub-Markov semigroup if it preserves positivity (or is positive), i.e. if f ≥ 0 always implies Φ_t f ≥ 0, and a Markov semigroup if additionally it preserves constants, i.e. Φ_t 1 = 1.
Theorem 1. For a Markov process with homogeneous t.f. p_t the operators

Φ_t f(x) = ∫ p_t(x, dy) f(y) = E_x f(X_t)

form a Markov semigroup in B_b(X).
Proof. A direct consequence of definitions and Chapman-Kolmogorov equations.
Def. (i) A semigroup Φ_t of linear contractions on a Banach space B is called strongly continuous if ‖Φ_t f − f‖ → 0 as t → 0 for any f ∈ B. (ii) A strongly continuous semigroup of positive linear contractions on C_∞(R^d) is called a Feller semigroup. It is called conservative if it extends to a semigroup of contractions on B_b(R^d) preserving constants. We shall discuss only conservative Feller semigroups.
Def. A (homogeneous) Markov process is called a Feller process if its Markov semigroup restricted to C_∞(R^d) is a (conservative) Feller semigroup.
? Proposition. Any Feller semigroup arises in this way, i.e. it is given by

Φ_t f(x) = ∫ p_t(x, dy) f(y)

with a certain t.f. p_t.
Sketch of the Proof. Follows more or less directly from the Riesz-Markov theorem.
Exer. 1. (i) Show that if A is a bounded linear operator on a Banach space, then

T_t = e^{tA} = Σ_{n=0}^∞ (t^n/n!) A^n

defines a strongly continuous semigroup. (ii) Show that the process of BM is Feller. (iii) Show that the semigroup of shifts T_t f(x) = f(x + t) is strongly continuous in C_∞(R) (and hence is Feller there), as well as in L^1(R) or L^2(R), but is not strongly continuous in C_b(R). Observe also that for analytic functions

f(x + t) = Σ_{n=0}^∞ (t^n/n!) (D^n f)(x),

which can be formally written as e^{tD} f(x). (iv) Let η(y) be a complex-valued continuous function on R^d s.t. Re η ≤ 0. Convince yourself that

T_t f(y) = e^{tη(y)} f(y)   (1)

is a semigroup of contractions in all our Banach spaces L^p(R^d), L^∞(R^d), B_b(R^d), C_b(R^d), C_∞(R^d). Show that it is strongly continuous in L^p(R^d) and C_∞(R^d), but not necessarily in the other three spaces.
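For part (i) a finite-dimensional example is instructive; the following sketch (an added illustration using scipy's matrix exponential, with an arbitrary 2 × 2 matrix playing the role of the bounded operator A) verifies the semigroup property and the derivative at t = 0:

import numpy as np
from scipy.linalg import expm

A = np.array([[-1.0, 2.0],
              [0.5, -3.0]])                 # a bounded generator on R^2
f = np.array([1.0, -1.0])
t, s, h = 0.3, 0.5, 1e-6

print("semigroup property error:",
      np.max(np.abs(expm(t * A) @ expm(s * A) - expm((t + s) * A))))
print("(T_h f - f)/h =", (expm(h * A) @ f - f) / h)
print("A f           =", A @ f)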
Theorem 2. Let X_t be a Lévy process with Lévy symbol η. Then X_t is a Feller process with semigroup Φ_t s.t.

Φ_t f(x) = ∫ f(x + y) p_t(dy),   f ∈ C_b(R^d),   (2)

where p_t is the law of X_t.
Sketch of the proof. Formula (2) was established earlier. Notice that any f ∈ C_∞ is uniformly continuous (check it!). For any such f

Φ_t f(x) − f(x) = ∫ (f(x + y) − f(x)) p_t(dy) = ∫_{|y|>K} (f(x + y) − f(x)) p_t(dy) + ∫_{|y|≤K} (f(x + y) − f(x)) p_t(dy),

and the first (resp. the second) term is small for small t and any K by the stochastic continuity of X (resp. for small K and arbitrary t by the uniform continuity of f). Hence ‖Φ_t f − f‖ → 0 as t → 0. Check that Φ_t f ∈ C_∞ for any t (Exer.)
? Exer. 2. Recall the inversion formula for the Fourier transform on S(R^d) and check that the Fourier transform takes the semigroup Φ_t to a multiplication semigroup, i.e.

Φ_t f(x) = F^{−1}(e^{tη} F f)(x),   f ∈ S(R^d).

Use this representation in conjunction with Exer. 1 (iv) to give another proof of the Feller property of the semigroup Φ_t. Hint: by inversion

Φ_t f(x) = E(f(X_t + x)) = (2π)^{−d/2} E( ∫_{R^d} e^{i(u,x+X_t)} F f(u) du ),

which yields (justify by Fubini's theorem)

Φ_t f(x) = (2π)^{−d/2} ∫_{R^d} e^{i(u,x)} E e^{i(u,X_t)} F f(u) du = (2π)^{−d/2} ∫_{R^d} e^{i(u,x)} e^{tη(u)} F f(u) du.

The Feller property then follows from exercise (iv) above and density arguments.
Def. Let Tt be a strongly continuous semigroup of linear contractions on a Banach
space B. The generator of Tt is defined as the operator
Af = lim_{t→0} (Tt f − f)/t
on the linear subspace DA ⊂ B (the domain of A), where this limit exists (in the topology
of B). The resolvent of Tt (or of A) is defined for any λ > 0 as the operator
Rλ f = ∫_0^∞ e^{−λt} Tt f dt.
Theorem 3 (basic properties of the generator and the resolvent).
(i) Tt DA ⊂ DA for each t ≥ 0.
(ii) Tt Af = ATt f for each t ≥ 0, f ∈ DA .
(iii) Rλ is a bounded operator in B with kRλ k ≤ λ−1 (for any λ > 0).
(iv) λRλ f → f as λ → ∞.
(v) Rλ f ∈ DA for any f and λ > 0 and (λ − A)Rλ f = f , i.e. Rλ = (λ − A)−1 .
(vi) If f ∈ DA , then Rλ Af = ARλ f .
(vii) DA is dense in B.
Proof. (i) and (ii) Observe that for ψ ∈ DA
ATt ψ = lim_{h→0} (1/h)(Th − I) Tt ψ = Tt lim_{h→0} (1/h)(Th − I) ψ = Tt Aψ.
(iii) kRλ f k ≤ ∫_0^∞ e^{−λt} kf k dt = λ^{−1} kf k.
(iv) Follows from the equation
λ ∫_0^∞ e^{−λt} Tt f dt = λ ∫_0^∞ e^{−λt} f dt + λ ∫_0^ε e^{−λt}(Tt f − f) dt + λ ∫_ε^∞ e^{−λt}(Tt f − f) dt,
observing that the first term on the r.h.s. is f , the second (resp. the third) term is small
for small ε (resp. for any ε and large λ).
(v) From definitions
ARλ f = lim_{h→0} (1/h)(Th − I) Rλ f = lim_{h→0} (1/h) ∫_0^∞ e^{−λt}(T_{t+h} f − Tt f) dt
= lim_{h→0} [ ((e^{λh} − 1)/h) ∫_0^∞ e^{−λt} Tt f dt − (e^{λh}/h) ∫_0^h e^{−λt} Tt f dt ] = λRλ f − f.
(vi) Follows from definitions and (ii).
(vii) Follows from (iv) and (v).
Exer. 3. Give another proof of (vii) above (by-passing the resolvent) by showing
that ∀ψ ∈ B the vector ψt = ∫_0^t Tu ψ du belongs to DA and Aψt = Tt ψ − ψ.
Exer. 4. The generator A of the semigroup Tt f = e^{tη} f from Exer. 1 (iv) above
is given by the multiplication operator Af = ηf on functions f s.t. η²f ∈ C∞ (Rd) (or
respectively η²f ∈ Lp (Rd)).
Theorem 4. If Xt is a Lévy process with a characteristic exponent
η(u) = i(b, u) − (1/2)(u, Au) + ∫_{Rd\{0}} [e^{i(u,y)} − 1 − i(u, y)χ_{B1}(y)] ν(dy),   (3)
its generator is given by
Lf (x) = Σ_{j=1}^d bj ∂f/∂xj + (1/2) Σ_{j,k=1}^d Ajk ∂²f/∂xj∂xk + ∫_{Rd\{0}} [f (x + y) − f (x) − Σ_{j=1}^d yj (∂f/∂xj) χ_{B1}(y)] ν(dy).   (4)
For instance for a Brownian motion with a drift the generator is given by the differential
part (first two terms) of (4).
Sketch of the Proof. Let us check (4) on the exponential functions. General case
follows then by approximation arguments. For f (x) = e^{i(u,x)}
Φt f (x) = ∫ f (x + y) pt (dy) = e^{i(u,x)} ∫ e^{i(u,y)} pt (dy) = e^{i(u,x)} e^{tη(u)} .
Hence
(d/dt)|_{t=0} Φt f (x) = η(u) e^{i(u,x)} ,
which is given by (4) due to the elementary properties of the exponent.
Remark. Of course e^{i(u,x)} does not belong to C∞ (Rd) and some attention should be
paid to an appropriate choice of the domain of the generator.
? Exer. 5. Give an alternative proof of Theorem 4 using the representation of Φt by
the Fourier transform given in a Exer. 2.
Def. An operator A in Cb (Rd) defined on a domain DA (i) is conditionally positive,
if Af (x) ≥ 0 for any f ∈ DA s.t. f (x) = 0 = miny f (y), (ii) satisfies the positive maximum
principle (PMP), if Af (x) ≤ 0 for any f ∈ DA s.t. f (x) = maxy f (y) ≥ 0, (iii) is local if
Af (x) = 0 whenever f ∈ DA vanishes in a neighborhood of x, (iv) satisfies a local PMP,
if Af (x) ≤ 0 for any f ∈ DA having a local non-negative maximum at x.
Theorem 5. Let A be a generator of a Feller semigroup Φt . Then (i) A is conditionally positive and (ii) satisfies the PMP on DA . (iii) If moreover A is local and DA contains C∞comp , then it satisfies the local PMP on C∞comp .
Sketch of the proof. For (i)
Af (x) = lim_{t→0} (Φt f (x) − f (x))/t = lim_{t→0} Φt f (x)/t ≥ 0 (recall that f (x) = 0)
by positivity preservation. For (ii) apply (i) to the function fx (y) = f (x) − f (y).
Theorem 6. If the generator L of a (conservative) Feller semigroup Φt with t.f.
pt (x, dy) is local and C∞comp ⊂ DL , then
Lf (x) = Σ_{j=1}^d bj (x) ∂f/∂xj + (1/2) Σ_{j,k=1}^d ajk (x) ∂²f/∂xj∂xk   (5)
for certain bj , aij ∈ C(Rd) s.t. A = (aij) is a positive definite matrix.
Proof. To shorten the formulas, assume d = 1. Let χ be a smooth function R → [0, 1]
that equals 1 (resp. 0) for |x| ≤ 1 (resp. |x| > 2). For an f ∈ C∞comp one can write
f (y) = f (x) + f′(x)(y − x)χ(y − x) + (1/2) f″(x)(y − x)² χ(y − x) + gx (y),
where gx (y) = o(1)(y − x)² as y → x. By conservativity L1 = 0. Hence
Lf (x) = b(x) f′(x) + (1/2) a(x) f″(x) + (Lgx)(x)
with
b(x) = L[(· − x)χ(· − x)](x) = lim_{t→0} (1/t) ∫ (y − x)χ(y − x) pt (x, dy),
a(x) = L[(· − x)²χ(· − x)](x) = lim_{t→0} (1/t) ∫ (y − x)²χ(y − x) pt (x, dy).
But ±gx (y) + ε(y − x)² vanishes at y = x and has a local minimum there for any ε, so that
Lgx (x) = 0, which completes the proof.
Def. A Feller process with a generator of type (5) is called a (Feller) diffusion.
Exer. 6. Show that the coefficients bj and aij can be defined as
bj (x) = lim_{t→0} (1/t) ∫ (y − x)_j 1_{|y−x|≤ε}(y) pt (x, dy),   (6)
aij (x) = lim_{t→0} (1/t) ∫ (y − x)_i (y − x)_j 1_{|y−x|≤ε}(y) pt (x, dy)   (7)
for any ε > 0. Conversely, if these limits exist and are independent of ε, then the generator
is local so that the process is a diffusion.
Exer. 7 (a mathematical version of "Einstein's style" of the analysis of BM). If
the generator L of a (conservative) Feller semigroup Φt with t.f. pt (x, dy) is such that
C∞comp ⊂ DL and
pt (x; {y : |y − x| ≥ ε}) = o(t), t → 0,
for any ε > 0, then L is local (and hence of diffusion type).
Conclusion about BM. A BM (possibly with a drift) can be characterized as (i) a
diffusion with iid increments or as (ii) a Lévy process with a local generator.
Exer. 8. (i) Show that the resolvent of the standard BM is given by the formula
Rλ f (x) = ∫_{−∞}^∞ Rλ (|x − y|) f (y) dy = (1/√(2λ)) ∫_{−∞}^∞ e^{−√(2λ)|y−x|} f (y) dy.   (8)
Hint: Check this identity for the exponential functions f (x) = e^{iθx} using the known ch.f.
of the normal r.v. N (0, t). (ii) Show that for the standard BM in R3
Rλ f (x) = ∫_{R3} Rλ³ (|x − y|) f (y) dy = ∫_{R3} (1/(2π|x − y|)) e^{−√(2λ)|y−x|} f (y) dy.   (9)
Hint: observe that
Rλ³ (|z|) = ∫_0^∞ e^{−λt} (2πt)^{−3/2} e^{−|z|²/(2t)} dt = −(1/(2π|z|)) (Rλ¹)′(|z|).   (10)
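Formula (8) can also be checked numerically. A rough quadrature sketch (not part of the notes; the test function f and all numerical parameters are arbitrary choices of this illustration): compute Rλ f (x) = ∫_0^∞ e^{−λt} Ex f (Bt) dt directly from the heat kernel and compare with the right-hand side of (8).

```python
# A rough numerical check of formula (8) for the resolvent of standard BM.
import numpy as np

lam, x = 1.5, 0.4
f = lambda y: np.exp(-y**2)
y = np.linspace(-10, 10, 4001)

# right-hand side of (8)
rhs = np.trapz(np.exp(-np.sqrt(2*lam) * np.abs(y - x)) * f(y), y) / np.sqrt(2*lam)

# left-hand side: integrate the heat semigroup in time
t = np.linspace(1e-4, 20, 4000)
Ef = [np.trapz(f(x + y) * np.exp(-y**2 / (2*s)) / np.sqrt(2*np.pi*s), y) for s in t]
lhs = np.trapz(np.exp(-lam*t) * np.array(Ef), t)

print(lhs, rhs)   # the two values should agree to a few decimals
```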
(10)
CHAPTER 3. MARTINGALE METHODS.
Section 8. Martingales.
Def. An adapted integrable process on a filtered probability space is called a submartingale if, for all 0 ≤ s ≤ t < ∞,
E(Xt |Fs ) ≥ Xs ,
a supermartingale, if the reverse inequality holds, and a martingale if
E(Xt |Fs ) = Xs .
Def. A filtration Ft is said to satisfy the usual hypotheses, if (i) (completeness) F0
contains all sets of P -measure zero (all P -negligible sets), (ii) (right continuity) Ft = Ft+ =
∩²>0 Ft+² . Adding to all Ft (of an arbitrary filtration) all P -negligible sets leads to a new
filtration called the augmented filtration.
Theorem (on regularity of submartingales) (without a proof). Let M be a
submartingale. (i) The following left and right limits exist and are a.s. finite for each
t > 0:
Mt− = lim_{s∈Q, s→t, s<t} Ms ;  Mt+ = lim_{s∈Q, s→t, s>t} Ms .
? (ii) If the filtration satisfies the usual hypotheses and if the map t ↦ EMt is right-continuous, then M has a cadlag (right continuous with finite left limits everywhere) modification.
Theorem 1. If X is a Lévy process with Lévy symbol η, then ∀u ∈ Rd , the process
Mu (t) = exp{i(u, Xt ) − tη(u)}
is a complex FtX -martingale.
Proof. E|Mu (t)| = exp{−t Re η(u)} < ∞ for each t. Next, for s ≤ t
Mu (t) = Mu (s) exp{i(u, Xt − Xs ) − (t − s)η(u)}.
Then
E(Mu (t)|FsX ) = Mu (s)E(exp{i(u, X(t − s))}) exp{−(t − s)η(u)} = Mu (s).
Exer. 1. Show that the following processes are martingales:
(1) standard BM Bt , Bt² − t, Bt³ − 3tBt , Bt⁴ − 6tBt² + 3t² ;
(2) d-dimensional Brownian motion B(t) with covariance A, |B(t)|² − tr(A)t, and
exp{(u, B(t)) − t(u, Au)/2} for any u;
(3) compensated Poisson process Ñt = Nt − λt with an intensity λ and Ñt² − λt;
(4) closed martingales: E(Y |Ft ), where Y is an arbitrary integrable r.v. in a filtered
probability space. Hint: for (4) use Theorem 5.1 (iv).
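For the polynomial martingales of (1), the constant-expectation consequence EMt = EM0 is easy to probe by simulation. A minimal Monte Carlo sketch (not part of the notes; sample sizes and seed are arbitrary choices):

```python
# Sanity check: along simulated BM paths the means of B_t^2 - t and
# B_t^4 - 6 t B_t^2 + 3 t^2 stay near 0 for all t, as Exer. 1 (1) requires.
import numpy as np

rng = np.random.default_rng(1)
n_paths, n_steps, T = 50_000, 100, 1.0
dt = T / n_steps
B = np.cumsum(rng.normal(0, np.sqrt(dt), (n_paths, n_steps)), axis=1)
t = dt * np.arange(1, n_steps + 1)

print(np.abs((B**2 - t).mean(axis=0)).max())                    # ~ 0
print(np.abs((B**4 - 6*t*B**2 + 3*t**2).mean(axis=0)).max())    # ~ 0
```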
Theorem 2 (Dynkin's formula). Let f ∈ DA , the domain of the generator A of a Feller process Xt . Then the process
M^f_t = f (Xt) − f (X0) − ∫_0^t Af (Xs) ds,  t ≥ 0,
is a martingale under any initial distribution ν, often called Dynkin’s martingale.
Proof.
E(M^f_{t+h} |Ft) − M^f_t = E(f (X_{t+h}) − ∫_0^{t+h} Af (Xs) ds | Ft) − (f (Xt) − ∫_0^t Af (Xs) ds)
= Φh f (Xt) − E(∫_t^{t+h} Af (Xs) ds | Ft) − f (Xt) = Φh f (Xt) − f (Xt) − ∫_0^h AΦs f (Xt) ds = 0.
Theorem 3. A Feller process Xt admits a cadlag modification.
Remarks. (i) Unlike for martingales, we do not need the right continuity of the filtration here. (ii) When proving the result for Lévy processes only, one often utilizes the special martingales Mu (t) = exp{i(u, Xt) − tη(u)} instead of Dynkin's martingale used in the proof below.
Proof. Let fn be a sequence in C∞ that separates points. By Dynkin's formula and the theorem on regularity of submartingales, there exists a set Ω0 of full measure s.t. fn (Xt) has right and left limits along all rational numbers on Ω0 , for all n. Hence Xt has right and left limits on Ω0 . Define
X̃t = lim_{s→t, s>t, s∈Q} Xs .
Then Xs → X̃t a.s. and Xs → Xt weakly (by Feller property). Hence X̃t has the same
distributions as Xt . Moreover, for any h, g ∈ C∞
Eν (g(Xt) h(X̃t)) = lim_{s→t, s>t} Eν (g(Xt) h(Xs)) = lim_{s→t, s>t} Eν (g(Xt) Φ_{s−t} h(Xt)) = Eν (g(Xt) h(Xt)),
where the uniform convergence was used. This implies
Ef (Xt , X̃t ) = Ef (Xt , Xt )
for all bounded positive Borel functions f . Choosing f to be the indicator of the set
{(x, y) : x 6= y} yields X̃t = Xt a.s.
Our final result here is the following:
Theorem 4. The augmented filtration Ftν of the canonical filtration Ft0 is right
continuous.
Proof. Because F^ν_t and F^ν_{t+} are Pν-complete, it is enough to show
Eν (Z|F^ν_t) = Eν (Z|F^ν_{t+})  Pν-a.s.
for F^0_∞-measurable and positive Z. By the monotone class theorem, it is sufficient to show
this for Z = Π_{i=1}^n fi (X_{ti}) with fi ∈ C∞ and t1 < ... < tn . We shall use the observation
that
Eν (Z|F^ν_t) = Eν (Z|F^0_t)  Pν-a.s.
For a t > 0 choose an integer k: tk−1 ≤ t < tk so that for h < tk − t
Eν (Z|F^ν_{t+h}) = Π_{i=1}^{k−1} fi (X_{ti}) gh (X_{t+h})  Pν-a.s.,
where
gh (x) = ∫ p_{tk−t−h}(x, dxk) fk (xk) ∫ p_{tk+1−tk}(xk, dx_{k+1}) f_{k+1}(x_{k+1}) ... ∫ p_{tn−tn−1}(x_{n−1}, dxn) fn (xn).
As h → 0, gh converges uniformly (Feller!) to
g(x) = ∫ p_{tk−t}(x, dxk) fk (xk) ∫ p_{tk+1−tk}(xk, dx_{k+1}) f_{k+1}(x_{k+1}) ... ∫ p_{tn−tn−1}(x_{n−1}, dxn) fn (xn).
Moreover, Xt+h → Xt a.s. (right continuity!) and by Theorem 5.2
Eν (Z|F^ν_{t+}) = lim_{h→0} Eν (Z|F^ν_{t+h}) = Π_{i=1}^{k−1} fi (X_{ti}) g(Xt) = Eν (Z|F^ν_t).
Remark. The Markov property is preserved by augmentation (we shall not address
this issue in detail).
Exer. 2. Show that if a random process Xt is left continuous (e.g. is a Brownian
motion), then its natural filtration FtX is left continuous. Hint: FtX is generated by the
sets Γ = {(Xt1 , ..., Xtn ) ∈ B}, 0 ≤ t1 < ... < tn = t.
Exer. 3. Let Xt be a Markov chain on {1, ..., n} with transition probabilities qij > 0,
i ≠ j, which can be defined via the semigroup of stochastic matrices Φt with the generator
(Af)i = Σ_{j≠i} (fj − fi) qij .
Let Nt = Nt (i) denote the number of transitions during time t of a process starting at
some point i. Show that Nt − ∫_0^t q(Xs) ds is a martingale, where q(l) = Σ_{j≠l} qlj denotes
the intensity of the jumps. Hint: to check that ENt = E ∫_0^t q(Xs) ds, show that the function
ENt is differentiable and
(d/dt) E(Nt) = Σ_{j=1}^n P (Xt = j) q(j).
Exer. 4 (Poisson integrals). Recall first that right continuous functions of bounded
variation on R+ (= differences of increasing functions) are in one-to-one correspondence with
signed Radon measures on R+ according to the formulas ft = µ([0, t]), µ((s, t]) = ft − fs ,
and the Stieltjes integral of a locally bounded Borel function g
∫_0^t gs dfs = ∫_{(0,t]} gs dfs
is defined as the Lebesgue integral of g with respect to the corresponding measure µ. Let
Nt be a Poisson process of intensity c > 0 with respect to a right continuous filtration Ft .
(i) Show that
∫_0^t Ns dNs = (1/2) Nt (Nt + 1),  ∫_0^t Ns− dNs = (1/2) Nt (Nt − 1)
(integration in the sense of Stieltjes). (ii) Let H be a left continuous bounded adapted
process. Show that the processes
Mt = ∫_0^t Hs dNs − c ∫_0^t Hs ds,   (1)
Mt² − c ∫_0^t Hs² ds   (2)
are martingales. Hint: For (ii) check this first for simple left continuous processes Hs =
ξ(ω)1_{(a,b]}(s), where by adaptedness ξ is Ft -measurable for any t ∈ (a, b] and hence
Fa -measurable by right continuity. Then, say,
Mt = ξ[(N_{min(t,b)} − Na) − c(min(t, b) − a)],  t ≥ a,
and one concludes that EMt = 0 by the independence of ξ and N_{a+u} − Na and the properties
of the latter.
Section 9. Stopping times and optional sampling theorem.
Def. Let (Ω, F, P ) be a probability space equipped with a filtration Ft . A stopping
time (respectively optional time) is a r.v. T : Ω → [0, ∞] s.t. ∀t ≥ 0, {T ≤ t} ∈ Ft
(respectively {T < t} ∈ Ft ).
Prop. 1. (i) T is a stopping time ⇒ T is an optional time. (ii) if Ft is right
continuous, the two notions coincide.
Proof. (i) {T < t} = ∪_{n=1}^∞ {T ≤ t − 1/n} ∈ Ft , since {T ≤ t − 1/n} ∈ F_{t−1/n} ⊂ Ft .
(ii) {T ≤ t} = ∩_{n=m}^∞ {T < t + 1/n} ∈ F_{t+1/m} for every m. Hence {T ≤ t} ∈ Ft+ .
Def. Hitting time: TA = inf{t ≥ 0 : Xt ∈ A}, where Xt is a process and A is a Borel
set.
Exer. 1. Show that if X is Ft -adapted and right continuous and A is (i) open or
(ii) closed, then TA is (i) an optional or (ii) a stopping time respectively. Hint:
(i) {TA < t} = ∪_{s<t, s∈Q} {Xs ∈ A} ∈ Ft ,
(ii) {TA > t} = ∩_{s≤t, s∈Q} {Xs ∉ A} ∈ Ft .
Prop. 2. If T, S are stopping times, then so are min(T, S), max(T, S) and T + S.
Proof. (i) {min(T, S) ≤ t} = {T ≤ t} ∪ {S ≤ t}. (ii) {max(T, S) ≤ t} = {T ≤
t} ∩ {S ≤ t}. At last for (iii)
{T + S > t} = {T = 0, S > t} ∪ {T > t, S = 0} ∪ {T ≥ t, S > 0} ∪ {0 < T < t, T + S > t}.
The first three events are in Ft trivially or by Prop. 1. To see that the same holds for the
last one, it can be written as
∪r∈(0,t)∩Q {t > T > r, S > t − r}.
Def. If T is a stopping time and X is an adapted process, the stopped σ-algebra FT
(of events determined prior to T ) is
FT = {A ∈ F : A ∩ {T ≤ t} ∈ Ft ∀t ≥ 0}
and the stopped r.v. XT is XT (ω) = X_{T(ω)}(ω).
Exer. 2. (i) Convince yourself that FT is a σ-algebra. (ii) Show that if S, T are
stopping times s.t. S ≤ T a.s., then FS ⊂ FT .
Exer. 3. If X is an adapted process and T a stopping time taking finitely many
values, then XT is FT -measurable. Hint: Say, range of T is t1 < ... < tn . Then
{XT ∈ B} ∩ {T ≤ tj} = ∪_{k=1}^j ({XT ∈ B} ∩ {T = tk}) = ∪_{k=1}^j ({X_{tk} ∈ B} ∩ {T = tk}) ∈ F_{tj} .
Exer. 4. (i) If Tn is a sequence of Ft stopping times, then supn Tn is a stopping
time. (ii) If Ft is right continuous, then inf n Tn is a stopping time. (iii) If additionally Tn
is decreasing and converging to T , then FT = ∩n FTn . Hint: (i) {sup Tn ≤ t} = ∩{Tn ≤ t}.
(ii) {inf Tn < t} = ∪{Tn < t} ∈ Ft and use Prop. 1 (ii).
Def. A process X is progressively measurable (or progressive) if ∀t the map (s, ω) 7→
Xs (ω) from [0, t] × Ω into Rd is B([0, t]) ⊗ Ft -measurable.
Prop. 3. An adapted process with right or left continuous paths is progressive.
Proof. Say, Xt is right continuous. Define X^{(n)}_0(ω) = X0 (ω) and
X^{(n)}_s(ω) = X_{(k+1)t/2^n}(ω) for kt/2^n < s ≤ (k + 1)t/2^n,
where t > 0, n > 0, k = 0, 1, ..., 2^n − 1. The map (s, ω) ↦ X^{(n)}_s(ω) is B([0, t]) ⊗ Ft -measurable. Hence the same holds for Xs , since X^{(n)}_s → Xs by right continuity.
Prop. 4. If X is progressive and T is a stopping time, then the stopped r.v. XT is
FT -measurable on {T < ∞}.
Proof. The map (s, ω) ↦ Xs (ω) is B([0, t]) ⊗ Ft -measurable, and the mapping ω ↦
(T (ω), ω) is Ft , B([0, t]) ⊗ Ft -measurable, and so is its restriction to the set {T ≤ t}. Hence
the composition X_{T(ω)}(ω) of these maps is Ft -measurable on the set {T ≤ t}, which means
{ω : XT (ω) ∈ B, T ≤ t} ∈ Ft for a Borel set B, as required.
Exer. 5. For an optional time T define the sequence (Tn ), n ∈ N of decreasing
random times converging to T as
Tn (ω) = ∞, if T (ω) = ∞;  Tn (ω) = k/2^n , if (k − 1)/2^n ≤ T (ω) < k/2^n .
Show that all Tn are stopping times converging monotonically to T .
Def. (predictability and martingale transform (discrete stochastic integral)). A process Hn , n = 1, 2, ..., is called predictable with respect to a discrete filtration
Fn , n = 0, 1, ..., if Hn is Fn−1 -measurable for all n. Let (Xn ), n = 0, 1, ... be a stochastic
process adapted to Fn , and Hn a positive bounded predictable process. The process H ◦ X
defined inductively by
(H ◦ X)0 = X0 ,
(H ◦ X)n = (H ◦ X)n−1 + Hn (Xn − Xn−1 )
is called the transform of X by H and a martingale transform if X is a martingale.
Prop. 5. (H ◦ X) is a (sub)martingale whenever X is.
Proof. Follows from
E((H ◦ X)n |Fn−1 ) = (H ◦ X)n−1 + Hn E(Xn − Xn−1 |Fn−1 ).
Exer. 6. Let T, S be bounded stopping times s.t. S ≤ T ≤ M . Let
Hn = 1n≤T − 1n≤S = 1S<n≤T .
Show that H is predictable and (H ◦ X)n − X0 = XT − XS for n > M . Hint:
(H ◦ X)n − X0 = 1_{S<1≤T}(X1 − X0) + ... + 1_{S<n≤T}(Xn − X_{n−1}).
Prop. 6 (discrete optional sampling and martingale characterization).
Let (Xn), n = 0, 1, ..., be an Fn -adapted integrable process. The following three statements
are equivalent:
(i) Xn is a submartingale (respectively a martingale),
(ii) for any bounded stopping times S ≤ T
E(XS) ≤ E(XT)   (1)
(respectively with the equality sign),
(iii) for any bounded stopping times S ≤ T
XS ≤ E(XT |FS)  a.s.   (2)
(respectively with equality).
Proof. (iii) ⇒ (i) is obvious. (i) ⇒ (ii) follows from Exer. 6 and Prop. 5. Finally, to
get (ii) ⇒ (iii) one applies (1) to the stopping times
S^B = S 1B + M (1 − 1B),  T^B = T 1B + M (1 − 1B)
with B ∈ FS (check they are stopping times!) yielding
E(XS 1B + XM (1 − 1B )) ≤ E(XT 1B + XM (1 − 1B )),
which implies E(XS 1B ) ≤ E(XT 1B ) and hence (2).
As an easy application one gets the following fundamental estimate.
? Prop. 7. (i) If Xn is a submartingale, n = 1, ..., N , then
λP (sup_n |Xn| ≥ λ) ≤ E(|XN| 1_{sup_n |Xn| ≥ λ}) ≤ E(|XN|).
(ii) If Xt is a right continuous submartingale on t ∈ [0, T ] or t ≥ 0, then
λP (sup_t |Xt| ≥ λ) ≤ sup_t E(|Xt|).
Proof. As |Xn | is again a submartingale, it is enough to consider the case of positive
X. Define a stopping time S being equal to N if supn Xn < λ and S = inf{n : Xn ≥ λ}
otherwise. Then
E(XN) ≥ E(XS) = E(XS 1_{sup_n Xn ≥ λ}) + E(XS 1_{sup_n Xn < λ}) ≥ λP (sup_n Xn ≥ λ) + E(XN 1_{sup_n Xn < λ}),
and the required estimate follows by subtraction.
(ii) From a finite index set one directly extends the estimate to a countable index set, and then
uses right continuity to obtain the general case.
Doob’s optional stopping (or sampling) theorem. If X is a right continuous
(sub)martingale, S ≤ T are two stopping times and either (i) T is bounded or (ii) the
family Xτ with τ running through all stopping times is uniformly integrable (the latter
occurs e.g. if Xt = E(X∞ |Ft ) for some integrable X∞ ), then XS and XT are integrable
with
XS ≤ E(XT |FS )
with equality in case X is a martingale.
Proof. Let Sn ≤ Tn be sequences of decreasing stopping times with countably many
values (see Exer. 5) converging to S and T . Then
∫_A X_{Sn} dP ≤ ∫_A X_{Tn} dP   (3)
for all A ∈ F_{Sn} and in particular for A ∈ FS . By right continuity X_{Tn} (respectively X_{Sn})
converge to XT (respectively XS) point-wise, and by uniform integrability also in L1 (use
Theorem 5.3 in case (i)). Hence (3) implies
∫_A XS dP ≤ ∫_A XT dP
for A ∈ FS , i.e. the required result.
Example: "violation of optional sampling". (ηn), n ∈ N, are iid Bernoulli r.v.
s.t. ηn equals 1 (success) or -1 (loss) with probability p and q = 1 − p. The player’s stake
at nth turn is Vn . Naturally Vn is Fn−1 = σ(η1 , ...., ηn−1 )-measurable. Then the total gain
is
Xn = Σ_{i=1}^n Vi ηi = Σ_{i=1}^n Vi ∆Yi = (V ◦ Y)n ,
where Yn = η1 + ... + ηn . The game is fair (or favorable, or unfavorable) if p = q (or p > q,
or p < q) ⇔ (Xn , Fn ) is a martingale (or submartingale, or supermartingale).
Consider a strategy V (called the martingale strategy) s.t. V1 = 1 and further
Vn = 2^{n−1}, if η1 = ... = η_{n−1} = −1, and Vn = 0 otherwise.
Thus if η1 = ... = ηn = −1, the total loss after n turns will be Σ_{i=1}^n 2^{i−1} = 2^n − 1, and
if then η_{n+1} = 1, then X_{n+1} = 2^n − (2^n − 1) = 1. Denoting T = inf{n : Xn = 1} and assuming
p = q = 1/2 yields
P (T = n) = (1/2)^n ,  P (T < ∞) = 1 (Borel–Cantelli!),
and consequently
EXT = P (XT = 1) = 1 > X0 = 0,
though Xn is a martingale and EXn = 0 for all n.
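The example is easy to reproduce by simulation. A minimal sketch (not part of the notes; the cap on the number of turns is an artefact of the simulation, reached with probability 2^{−60}):

```python
# Simulation of the doubling strategy: E X_T = 1 although E X_n = 0 for
# every fixed n, illustrating how optional sampling fails for unbounded T.
import numpy as np

rng = np.random.default_rng(2)

def play(max_turns=60):
    stake, loss = 1, 0
    for n in range(1, max_turns + 1):
        if rng.random() < 0.5:      # win: recoup all losses plus 1
            return 1, n
        loss += stake               # lose the current stake
        stake *= 2                  # double the stake
    return -loss, max_turns         # practically never reached

results = [play() for _ in range(100_000)]
print(np.mean([g for g, _ in results]))   # ~ 1, not 0
print(np.mean([n for _, n in results]))   # ~ E T = 2
```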
Section 10. Strong Markov property. Diffusions as Feller processes with
continuous paths.
Main Def. A time homogeneous Markov process with t.f. pt is called strong Markov
if
Eν (f (X_{S+t})|FS) = (Φt f)(XS)  Pν-a.s. on {S < ∞}   (1)
for any {Ft }-stopping time S, initial distribution ν and positive Borel f .
Exer. 1. If (1) holds for bounded stopping times, then it holds for all stopping times.
Hint: For any n and a stopping time S
Eν (f (X_{min(S,n)+t})|F_{min(S,n)}) = (Φt f)(X_{min(S,n)})  Pν-a.s.
Hence by locality (Theorem 5.4)
Eν (f (X_{S+t})|FS) = (Φt f)(XS)  Pν-a.s. on {S ≤ n}.
To complete the proof take n → ∞ thus exhausting the set {S < ∞}.
Exer. 2. A Markov process Xt is strong Markov ⇔ ∀ a.s. finite stopping time T
the process Yt = XT +t is a Markov process with respect to FT +t with the same t.f. Hint:
strong Markov ⇔
Eν (f (X_{T+t+s})|F_{T+t}) = (Φs f)(X_{T+t})  Pν-a.s.
? Exer. 3. A canonical Markov process is strong Markov ⇔
Eν (Z ◦ θS |FS) = E_{XS}(Z)  Pν-a.s.
for any {Ft }-optional time S, initial distribution ν and F∞ -measurable r.v. Z, where θ is
the canonical shift.
Theorem 1. Any Feller process Xt is strong Markov.
Proof. Let T take values in a countable set D. Then
Eν (f (X_{T+t})|FT) = Σ_{d∈D} 1_{T=d} Eν (f (X_{d+t})|Fd) = Σ_{d∈D} 1_{T=d} Φt f (Xd) = Φt f (XT).
For a general T take a decreasing sequence of Tn with only finitely many values converging
to T . Then
Eν (f (XTn +t )|FTn ) = Φt f (XTn ),
for all n, i.e.
∫_A f (X_{Tn+t}) dP = ∫_A Φt f (X_{Tn}) dP
for all A ∈ FTn , in particular for A ∈ FT , as FT ⊂ FTn . Hence by right continuity of Xt
and dominated convergence passing to limit n → ∞ yields
∫_A f (X_{T+t}) dP = ∫_A Φt f (XT) dP
for all A ∈ FT , as required.
Theorem 2. If X is a Lévy process, then the process XT(t) = X_{T+t} − XT is again
a Lévy process, which is independent of FT , and its law under Pν is the same as that of X
under P0 .
First proof (as a corollary of the strong Markov property of Feller processes). For
positive Borel functions fi
Eν (Π_i fi (X_{T+ti} − XT) | FT) = E_{XT}(Π_i fi (X_{ti} − X0)),
but this is a constant not depending on XT .
2nd proof (direct, using the special martingales Mu (t) = exp{i(u, Xt) − tη(u)}). Assume
T is bounded. Let A ∈ FT , uj ∈ Rd , 0 = t0 < t1 < ... < tn . Then
E(1A exp{i Σ_{j=1}^n (uj , XT(tj) − XT(t_{j−1}))}) = E(1A Π_{j=1}^n M_{uj}(T + tj)/M_{uj}(T + t_{j−1})) Π_{j=1}^n φ_{tj−t_{j−1}}(uj),
where φt (u) = E e^{i(u,Xt)} . By conditioning, for s < t,
E(1A Mu(T + t)/Mu(T + s)) = E((1A/Mu(T + s)) E(Mu(T + t)|F_{T+s})) = P (A).
Repeating this argument yields
E(1A exp{i Σ_{j=1}^n (uj , XT(tj) − XT(t_{j−1}))}) = P (A) Π_{j=1}^n φ_{tj−t_{j−1}}(uj),
which implies the statement of the Theorem by means of the following fact.
Exer. 4. Suppose X is a r.v. on (Ω, F, P ), G is a sub-σ-algebra of F and
E(e^{i(u,X)} 1A) = φp (u) P (A)
for any A ∈ G, where φp is the ch.f. of a probability law p. Then X is independent of G
and the distribution of X is p.
Def. Denote
τh = inf{t ≥ 0 : |Xt − X0 | > h}, h > 0.
A point x is called absorbing, if τh = ∞ a.s. for every h.
Lemma (intuitively clear; we omit the technical proof). A point x is absorbing iff
Φt f (x) = f (x) for all t and any f ∈ DA . Otherwise Ex (τh) < ∞ for all sufficiently small h.
Theorem 3 (Dynkin’s formula for the generator). Let X be a Feller process
with continuous paths and generator A. For any f ∈ DA and a non absorbing x
Af (x) = lim_{h→0} (Ex f (X_{τh}) − f (x)) / Ex τh .   (2)
For absorbing points x and all f : Af (x) = 0.
Proof. By Dynkin’s martingale and optional stopping
Ex f (X_{min(t,τh)}) − f (x) = Ex ∫_0^{min(t,τh)} Af (Xs) ds,  t, h > 0.
As Ex τh < ∞ for small h, this extends to t = ∞ by dominated convergence. This implies
(2) by the continuity of Af , taking into account that Ex (τh) > 0 by continuity of paths.
The next beautiful result is a direct consequence of (2).
Theorem 4. Let A be a generator of a Feller process Xt s.t. C∞comp ⊂ DA . If Xt is
a.s. continuous under Pν for every ν, then A is local on C∞comp and hence Xt is a diffusion.
Remark. The converse statement holds as well, i.e. Feller processes with local
generators have a.s. continuous paths.
Conclusion about BM. We have got another proof that BM (possibly with a drift)
is the only Lévy process with continuous paths.
Section 11. Reflection principle and passage times for BM.
Def. Let B be a Brownian motion on (Ω, F, P ). The passage time Tb to a level b is
Tb (ω) = inf{t ≥ 0 : Bt (ω) = b}.
The (intuitively clear) equation
P (Tb < t, Bt ≥ b) = P (Tb < t, Bt < b)
for b > 0 is called the reflection principle.
Since
P (Tb < t) = P (Tb < t, Bt ≥ b) + P (Tb < t, Bt < b),
and P (Tb < t, Bt ≥ b) = P (Bt ≥ b), it implies
P (Tb < t) = 2P (Bt ≥ b) = √(2/(tπ)) ∫_b^∞ e^{−x²/(2t)} dx = √(2/π) ∫_{b/√t}^∞ e^{−x²/2} dx.
Differentiating yields the density
P (Tb ∈ dt) = (|b|/√(2πt³)) e^{−b²/(2t)} dt.   (1)
The necessity to justify the reflection principle and hence these calculations was one
of the reasons to introduce the strong Markov property.
Theorem 1 (reflection principle). For a BM Bt
P (Tb ≤ t) = P (Mt ≥ b) = 2P (Bt ≥ b) = P (|Bt| ≥ b),   (2)
where Mt = inf{b : Tb ≥ t} = sup{Bs : s ≤ t}. In particular, distribution (1) holds.
Proof.
P (Mt ≥ b, Bt < b) = P (Tb ≤ t, B_{Tb+(t−Tb)} − B_{Tb} < 0) = P (Tb ≤ t) P (Bs < 0) = (1/2) P (Tb ≤ t) = (1/2) P (Mt ≥ b),
and the result follows as
P (Mt ≥ b) = P (Bt ≥ b) + P (Mt ≥ b, Bt < b).
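A quick Monte Carlo illustration of (2) (a sketch, not part of the notes; the time discretization slightly underestimates the maximum, and all parameters are arbitrary choices):

```python
# Reflection principle check: P(M_t >= b) vs 2 P(B_t >= b).
import numpy as np

rng = np.random.default_rng(3)
n_paths, n_steps, t, b = 200_000, 1000, 1.0, 0.8
sdt = np.sqrt(t / n_steps)
B = np.zeros(n_paths)
M = np.zeros(n_paths)
for _ in range(n_steps):
    B += sdt * rng.standard_normal(n_paths)
    np.maximum(M, B, out=M)        # running maximum

print((M >= b).mean())             # ~ P(M_t >= b)
print(2 * (B >= b).mean())         # ~ 2 P(B_t >= b): close
```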
Theorem 2. The process Ta is a left continuous non-decreasing Lévy process (i.e. it is a
subordinator), and Ta+ = inf{t : Bt > a} is its right continuous modification.
Proof. Since Tb − Ta = inf{t ≥ 0 : B_{Ta+t} − B_{Ta} ≥ b − a} for b > a, this difference is independent
of F_{Ta} by the strong Markov property of BM. Stochastic continuity follows from the
density (1). Clearly the process Ta (respectively Ta+) is non-decreasing and left continuous
(respectively right continuous) and Ta+ = lim_{s→0, s>0} T_{a+s}. At last it follows from the
continuity of BM that Ta = Ta+ a.s.
Theorem 3. For the process Ta
E e^{−uTa} = e^{−a√(2u)} ,   (3)
which implies by (4.13), (4.15) that Ta is a stable subordinator with the index α = 1/2 and
Lévy measure ν(dx) = (2πx³)^{−1/2} dx.
First proof. Compute directly from density (1) using the integral calculated in (7.10).
Second proof. As Ms (t) = exp{sBt − s²t/2} is a martingale, one concludes from
optional sampling that
1 = E exp{sB_{Ta} − s²Ta/2} = e^{sa} E exp{−s²Ta/2},
and (3) follows by substituting u = s²/2. (Remark. As Doob's theorem is stated for
bounded stopping times, in order to be precise here one has to consider first the stopping
times min(n, Ta) and then pass to the limit n → ∞.)
42
Third proof. For any b > 0 the process (1/b) T_{a√b} is the first hitting time of the level a for
the process b^{−1/2} B_{bt}. As by the scaling property of BM the latter is again a BM, (1/b) T_{a√b}
and Ta are identically distributed, and thus the subordinator Ta is stable. Comparing
expectations one identifies the rate, leading again to (3).
Theorem 4. The joint distribution of Mt and Bt is given by the density
φ(t, a, b) da db = P (Bt ∈ da, Mt ∈ db) = (2(2b − a)/√(2πt³)) exp{−(2b − a)²/(2t)} da db.   (4)
Proof. Let a ≤ b. Then
P (Bt < a, Mt ≥ b) = P (Mt ≥ b, B_{Tb+(t−Tb)} − B_{Tb} < −(b − a))
= P (Mt ≥ b, B_{Tb+(t−Tb)} − B_{Tb} ≥ b − a) = P (Mt ≥ b, Bt ≥ 2b − a) = P (Bt ≥ 2b − a),
and (4) follows by differentiation.
Theorem 5. The reflected Brownian motion |Bt | and the process Yt = Mt − Bt are
both Markov with the same transition density
p⁺t (x, y) = pt (x − y) + pt (x + y),   (5)
where pt (x − y) is the transition density of the standard BM.
Proof. To prove the statement for |Bt | one has to show that
P (|Bt + x| ∈ [a, b]) = P (|Bt − x| ∈ [a, b]) = ∫_a^b p⁺t (x, y) dy   (6)
for all b > a ≥ 0. This holds, because
P (|Bt + x| ∈ [a, b]) = P (Bt ∈ [a − x, b − x]) + P (−Bt ∈ [a + x, b + x])
= P (Bt ∈ [a − x, b − x]) + P (Bt ∈ [a + x, b + x]) = ∫_a^b (pt (y + x) + pt (y − x)) dy.
Turning to Yt , let m = Mt > 0, b = Bt < m and r = m − b. Then by the strong
Markov property
P (M_{t+h} − B_{t+h} < ξ|Ft) = P (M_{t+h} − B_{t+h} < ξ|Bt = b, Mt = m)
= E(1_{M_{t+h}−B_{t+h}<ξ} 1_{M_{t+h}=m} |Bt = b, Mt = m) + E(1_{M_{t+h}−B_{t+h}<ξ} 1_{M_{t+h}>m} |Bt = b, Mt = m)
= E(1_{r−Bh<ξ} 1_{Mh<r}) + E(1_{Mh−Bh<ξ} 1_{Mh≥r}),
and this is seen (by inspection) to be the integral of φ(h, x, y) from (4) over the domain
r − ξ < x < y < x + ξ, i.e. it equals
∫_{r−ξ}^∞ dx ∫_x^{x+ξ} dy φ(h, x, y) = − ∫_{r−ξ}^∞ ph (2y − x)|_{y=x}^{y=x+ξ} dx,
because φ(h, x, y) = −(∂/∂y) ph (2y − x). Hence
P (M_{t+h} − B_{t+h} < ξ|Ft) = ∫_{r−ξ}^∞ ph (x) dx − ∫_{r−ξ}^∞ ph (2ξ + x) dx = ∫_{r−ξ}^∞ ph (x) dx − ∫_{r+ξ}^∞ ph (y) dy.
Differentiating with respect to ξ yields (5).
Def. The arcsin law is the distribution of ξ = sin² X when X is U (0, 2π) (uniformly
distributed on [0, 2π]). Clearly
P (ξ ≤ t) = P (| sin X| ≤ √t) = (2/π) arcsin √t,  t ∈ [0, 1].   (7)
Theorem 6. Let Bt be a Brownian motion on [0, 1] with the maximum process Mt .
Then the random times τ = inf{t : Bt = M1} (when Bt first attains its maximum),
τ̃ = sup{t : Bt = M1} (when Bt attains its maximum for the last time) and the time
θ = sup{t : Bt = 0} of the last exit from the origin all obey the arcsin law. In particular,
as τ ≤ τ̃, it implies that τ = τ̃ a.s.
Proof.
P (τ̃ ≤ t) = P (τ ≤ t) = P (sup_{s≤t}(Bs − Bt) ≥ sup_{s≥t}(Bs − Bt)) = P (|Bt| ≥ |B1 − Bt|)
= P (tξ² ≥ (1 − t)η²) = P (η²/(ξ² + η²) ≤ t) = P (sin² X ≤ t),
where ξ, η are independent N (0, 1) r.v. and X is uniformly distributed on [0, 2π]. (Exer.:
use Theorems 1 and 4 to explain the reasoning behind all these equivalences!) And
P (θ < t) = P (sup_{s≥t} Bs < 0) + P (inf_{s≥t} Bs > 0) = 2P (sup_{s≥t}(Bs − Bt) < −Bt)
= 2P (|B1 − Bt| < −Bt) = P (|B1 − Bt| < |Bt|) = P (τ ≤ t).
Exer. Show (either directly or applying the scaling transformation to (7)) that for
τt = inf{s ∈ (0, t) : Bs = Mt }
P (τt ≤ r) = ∫_0^r dy/(π√(y(t − y))) = (2/π) arcsin √(r/t).
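The arcsin law for the time of the maximum is easy to see in simulation. A sketch (not part of the notes; the grid argmax stands in for τ, and all sizes are arbitrary choices):

```python
# Simulation of Theorem 6: the time at which a BM on [0, 1] attains its
# maximum follows the arcsin law (7).
import numpy as np

rng = np.random.default_rng(4)
n_paths, n_steps, block = 100_000, 2000, 5000
taus = np.empty(n_paths)
for i in range(0, n_paths, block):        # blockwise to limit memory
    B = np.cumsum(rng.normal(0, np.sqrt(1/n_steps), (block, n_steps)), axis=1)
    taus[i:i+block] = B.argmax(axis=1) / n_steps

for t in (0.1, 0.5, 0.9):
    print(t, (taus <= t).mean(), 2/np.pi * np.arcsin(np.sqrt(t)))
```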
CHAPTER 4. HEAT CONDUCTION (OR DIFFUSION) EQUATION.
Section 12. The Dirichlet problem for diffusion operators.
Assume aij , bj are continuous bounded functions s.t. the matrix (aij) is positive
definite and the operator
Lf (x) = Σ_{j=1}^d bj (x) ∂f/∂xj + (1/2) Σ_{j,k=1}^d ajk (x) ∂²f/∂xj∂xk
generates a Feller diffusion Xt . Assume Ω is an open subset of Rd with the boundary ∂Ω
and closure Ω̄. The Dirichlet problem for L in Ω consists in finding a u ∈ Cb (Ω̄) ∩ Cb²(Ω)
s.t.
Lu(x) = f (x), x ∈ Ω,  u|∂Ω = ψ   (1)
for given f ∈ Cb (Ω), ψ ∈ Cb (∂Ω). A fundamental link between probability and PDE is
given by the following
Theorem 1. Let Ω be bounded and Ex τΩ < ∞ for all x ∈ Ω, where τΩ = inf{t ≥ 0 :
Xt ∈ ∂Ω} (e.g. if X is a BM), and let u ∈ Cb (Ω̄) ∩ C²(Ω) be a solution to (1). Then
u(x) = Ex [ψ(X_{τΩ}) − ∫_0^{τΩ} f (Xt) dt].   (2)
In particular, such a solution u is unique.
Proof. (i) Assume first that u can be extended to the whole Rd as a function u ∈
C∞ (Rd) ∩ Cb²(Rd). Then u ∈ DL and applying the stopping time τΩ to Dynkin's martingale
yields
Ex [u(X_{τΩ}) − u(x) − ∫_0^{τΩ} Lu(Xt) dt] = 0,   (2a)
implying (2). (ii) In the general case choose an expanding sequence of domains Ωn ⊂ Ω with
smooth boundaries tending to Ω as n → ∞. The solution un to the problem
Lun (x) = f (x), x ∈ Ωn ,  un |∂Ωn = u
can be extended to Rd as in (i) and hence is unique and has the representation
u(x) = un (x) = Ex [u(X_{τΩn}) − ∫_0^{τΩn} f (Xt) dt],  x ∈ Ωn .
Taking the limit as n → ∞ yields (2), because τΩn → τΩ as n → ∞.
Example. Take Ω = (α, β) ⊂ R and
L = (1/2) a(x) d²/dx² + b(x) d/dx
with a, b ∈ C(Ω̄), a > 0. Then u(x) = Px (XτΩ = β) is the probability that Xt starting at
a point x ∈ (α, β) reaches β before α and represents a solution to the problem
(1/2) a(x) u″(x) + b(x) u′(x) = 0, x ∈ (α, β),  u(α) = 0, u(β) = 1.   (3)
On the other hand, u(x) = Ex τΩ is the mean exit time from Ω, which solves the problem
(1/2) a(x) u″(x) + b(x) u′(x) = −1, x ∈ (α, β),  u(α) = u(β) = 0.   (4)
Exer. 1. (i) Solve problem (3) analytically showing that
Px (X_{τΩ} = β) = (∫_α^x exp{g(y)} dy) (∫_α^β exp{g(y)} dy)^{−1} ,   (5)
where g(x) = − ∫_α^x (2b/a)(y) dy. In particular, for a standard BM Bt starting at x this
gives Px (BτΩ = β) = (x − α)/(β − α). (ii) Solve (4) with b = 0 showing that in this case
Ex τΩ = 2 ((x − α)/(β − α)) ∫_α^β ((β − y)/a(y)) dy − 2 ∫_α^x ((x − y)/a(y)) dy.   (6)
In particular, for a BM this turns into (x − α)(β − x). Hint for (ii): show first that the
solution to the Cauchy problem
(1/2) a(x) u″(x) = −1,  u(α) = 0,
is given by the formula
u(x) = ω(x − α) − 2 ∫_α^x (x − y) a^{−1}(y) dy
with a constant ω.
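Both formulas of Exer. 1 for the standard BM are easy to check by simulation. A vectorized sketch (not part of the notes; step size and sample size are arbitrary, and exit overshoot causes a small bias):

```python
# Monte Carlo check for BM on (alpha, beta):
# P_x(exit at beta) = (x - alpha)/(beta - alpha), E_x tau = (x - alpha)(beta - x).
import numpy as np

rng = np.random.default_rng(5)
alpha, beta, x = -1.0, 2.0, 0.5
dt, n = 1e-3, 50_000
sdt = np.sqrt(dt)
y = np.full(n, x)
s = np.zeros(n)
alive = np.ones(n, dtype=bool)
while alive.any():
    y[alive] += sdt * rng.standard_normal(alive.sum())
    s[alive] += dt
    alive = alive & (alpha < y) & (y < beta)   # freeze exited paths

print((y >= beta).mean(), (x - alpha) / (beta - alpha))   # ~ 0.5
print(s.mean(), (x - alpha) * (beta - x))                 # ~ 2.25
```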
Exer. 2. Check that ∆φ = h″(|x|) + ((d − 1)/|x|) h′(|x|) for φ(x) = h(|x|). Deduce that
if such φ is harmonic (i.e. satisfies the Laplace equation ∆φ = 0) in Rd , then
h(r) = A + B r^{−(d−2)}, d > 2;  h(r) = A + B ln r, d = 2,   (7)
with some constants A, B.
Exer. 3. Solve the equation ∆φ = 0 in the shell S_{r,R} = {x ∈ Rd : r < |x| < R}
with boundary conditions φ(x) being 1 (respectively zero) on |x| = R (resp. |x| = r).
Hence compute the probability that the standard Brownian motion started from a point
x ∈ S_{r,R} leaves the shell via the outer part of the boundary. Hint: choosing appropriate
A, B in (7) one finds
φ(x) = (|x|^{2−d} − r^{2−d})/(R^{2−d} − r^{2−d}), d > 2;  φ(x) = (ln |x| − ln r)/(ln R − ln r), d = 2.   (8)
This describes the required probability due to Theorem 1.
Exer. 4. Calculate the probability of the Brownian motion Wt ever hitting the ball
Br if started at a distance a > r from the origin. Hint: Let TR (resp. Tr) be the first time
kWt k = R (resp. r). By letting R → ∞ in (8),
Px (Tr < ∞) = lim_{R→∞} Px (Tr < TR) = (r/a)^{d−2}, d > 2;  = 1, d = 2.   (9)
Exer. 5. Use Borel–Cantelli and Exer. 4 to deduce that for d > 2 and any starting
point x ≠ 0 there exists a.s. a positive r > 0 s.t. Wtx starting at x never hits the ball
Br . Hint: For any r < |x| let An be the event that Wtx ever hits the ball B_{r/2^n}. Then
Σ P (An) < ∞.
Exer. 6. Show that BM in dimension d > 2 is transient, i.e. that a.s. lim_{t→∞} kWt k =
∞. Hint: As Wt is a.s. unbounded (why?), the event that Wt does not tend to infinity
means that there exists a ball Br s.t. infinitely many events An occur, where An means
that the trajectory returns to Br after being outside B_{2^n r}. This leads to a contradiction
by Borel–Cantelli and (9).
? Theorem 2. Let L be a generator of a Feller diffusion Xt . Given a domain Ω ⊂ Rd ,
assume that there exists a twice continuously differentiable function f ≥ 0 in Rd \ Ω
s.t. Lf (x) ≤ 0 and for some a > 0 and a point x0 ∈ Rd \ Ω one has
f (x0) < a < inf{f (x) : x ∈ ∂Ω}.
Then Xt started at x0 will never hit Ω with a positive probability (this actually means that
the diffusion Xt is transient).
Proof. Let N > kx0 k, and let τΩ and τN denote the hitting times of Ω and of the sphere
kyk = N respectively. Put TN = min(τN , τΩ). From (2a) it follows that
E_{x0} f (X_{TN}) ≤ f (x0) < a.
Hence
a > inf{f (x) : x ∈ ∂Ω} P_{x0}(τΩ < τN) > a P_{x0}(τΩ < τN),
so that P_{x0}(τΩ < τN) < a / inf{f (x) : x ∈ ∂Ω} < 1 uniformly in N. Passing to the limit as
N → ∞ yields P_{x0}(τΩ < ∞) < 1.
Section 13. The stationary Feynman-Kac formula.
Recall that the equation
λg = Ag + f,   (1)
where A is the generator of a Feller semigroup Φt , f ∈ C∞ (Rd), λ > 0, is solved uniquely
by the formula
g(x) = Rλ f (x) = Ex ∫_0^∞ e^{−λt} f (Xt) dt.
This suggests a guess that a solution to the more general equation
(λ + k)g = Ag + f,   (2)
where the additional letter k denotes a bounded continuous function, could look like
g(x) = Ex ∫_0^∞ exp{−λt − ∫_0^t k(Xs) ds} f (Xt) dt.   (3)
This is the stationary Feynman-Kac formula that we are going to discuss now. The fastest
way of proving it (at least for diffusions) is by means of Ito’s stochastic calculus. Not
having this tool at our disposal, we shall use a different method by first rewriting it in
terms of the resolvents (thus rewriting the differential equation (2) in an integral form).
Theorem 1. Let Xt be a Feller process with the semigroup Φt and the generator A.
Suppose f ∈ C∞ (Rd ), k is a continuous bounded non-negative function and λ > 0. Then
g ∈ DA and satisfies (2) iff g ∈ C∞ (Rd ) and
Rλ (kg) = Rλ f − g.   (4)
Proof. Applying Rλ to both sides of (2) and using Rλ (λ − A)g = g yields (4). Conversely,
subtracting the resolvent equations for f and kg,
ARλ f = λRλ f − f,  ARλ (kg) = λRλ (kg) − kg,   (5)
and using (4) yields (2).
Theorem 2. Under the assumptions of Theorem 1 the function (3) yields a solution
to (4) and hence to (2).
Proof. Using the Markov property one writes
Rλ (kg)(x) = Ex ∫_0^∞ e^{−λs} k(Xs) g(Xs) ds
= Ex ∫_0^∞ e^{−λs} k(Xs) ∫_0^∞ exp{−λt − ∫_0^t k(X_{u+s}) du} f (X_{t+s}) dt ds.
0
Changing the variables of integration t, u to t̃ = s + t and ũ = s + u and denoting them
again by t and u respectively leads to
Rλ (kg)(x) = Ex ∫_0^∞ e^{−λt} f (Xt) ∫_0^t k(Xs) exp{− ∫_s^t k(Xu) du} ds dt,
which by integration by parts rewrites as
Ex ∫_0^∞ e^{−λt} f (Xt) [1 − exp{− ∫_0^t k(Xs) ds}] dt = (Rλ f − g)(x),
as required.
In many interesting particular situations the validity of formula (3) can be extended
beyond the general conditions of Theorem 2. Let us consider one of these extensions for a
one-dimensional BM.
Theorem 3. Assume k ≥ 0 and f are piecewise-continuous bounded functions on R
with the finite sets of discontinuity Disck and Discf . Then the (clearly bounded)
function g given by (3) with Xt being a BM Bt is continuously differentiable, has a piecewise
continuous second derivative and satisfies
(λ + k)g = (1/2) g″ + f   (6)
outside Disck ∪ Discf .
Proof. The calculations in the proof of Theorem 2 remain valid for all bounded
measurable f and k, showing that g satisfies (4). Moreover, for piecewise continuous f and
k one sees from dominated convergence that this g is continuous. Next, from Exercise 7.7
one finds that
Rλ f (x) = (1/√(2λ)) [∫_{−∞}^x e^{√(2λ)(y−x)} f (y) dy + ∫_x^∞ e^{√(2λ)(x−y)} f (y) dy].
Hence Rλ f is continuously differentiable for any bounded measurable f with
(Rλ f)′(x) = ∫_x^∞ e^{√(2λ)(x−y)} f (y) dy − ∫_{−∞}^x e^{√(2λ)(y−x)} f (y) dy.
This implies in turn that (Rλ f)″ is piecewise continuous for a piecewise continuous f and
the resolvent equations (5) hold outside Discf ∪ Disck . Hence one shows as in Theorem 2
that g satisfies (6), which by integration implies the continuity of g′.
Exer. 2. Show that for α, β > 0 and a BM Bt
Ex ∫_0^∞ exp{−αt − β ∫_0^t 1_{(0,∞)}(Bs) ds} dt = (1/(α + β)) [1 + ((√(α + β) − √α)/√α) e^{−√(2(α+β)) x}]   (7)
for x ≥ 0. Hint: by Theorem 3 the function z(x) on the l.h.s. of (7) is a bounded solution
to the equation
αz(x) = (1/2) z″(x) − βz(x) + 1, x > 0;  αz(x) = (1/2) z″(x) + 1, x < 0,   (8)
with the boundary conditions
z(0+) = z(0−),  z′(0+) = z′(0−).
The bounded solutions to (8) have the form
z(x) = A exp{−√(2(α + β)) x} + 1/(α + β) for x > 0;  z(x) = B exp{√(2α) x} + 1/α for x < 0.
Theorem 4 (arcsin law for the occupation time). The law of the occupation
time Ot = ∫_0^t 1_{(0,∞)}(Bs) ds of (0, ∞) by a standard BM Bt has the density
P (Ot ∈ dy) = dy/(π√(y(t − y))) .   (9)
Proof. By the uniqueness of the Laplace transform it is enough to show that
E e^{−βOt} = ∫_0^t e^{−βy} dy/(π√(y(t − y))) .   (10)
But from (7),
∫_0^∞ e^{−αt} E e^{−βOt} dt = z(0) = 1/√(α(α + β)) ,
and on the other hand
∫_0^∞ e^{−αt} ∫_0^t e^{−βy} dy/(π√(y(t − y))) dt = (1/π) ∫_0^∞ e^{−(α+β)y} y^{−1/2} dy ∫_0^∞ e^{−αs} s^{−1/2} ds = 1/√(α(α + β)) ,
which implies (10) again by the uniqueness of the Laplace transform.
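The arcsin law (9) shows up clearly in simulation. A sketch (not part of the notes; grid and sample sizes are arbitrary choices) comparing the empirical distribution function of O1 with (2/π) arcsin √y:

```python
# Simulation of Theorem 4: the fraction of time a BM on [0, 1] spends above 0
# follows the arcsine distribution.
import numpy as np

rng = np.random.default_rng(6)
n_paths, n_steps = 100_000, 2000
sdt = np.sqrt(1.0 / n_steps)
B = np.zeros(n_paths)
occ = np.zeros(n_paths)
for _ in range(n_steps):
    B += sdt * rng.standard_normal(n_paths)
    occ += (B > 0)
O = occ / n_steps                       # occupation time O_1

for yv in (0.1, 0.25, 0.5, 0.9):
    print(yv, (O <= yv).mean(), 2/np.pi * np.arcsin(np.sqrt(yv)))
```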
Exer. 3. From formula (9) of Exer. 8 above, yielding the solution to the equation
(λ − ∆/2)g = f , λ > 0, in R3, deduce that the solution to the Poisson equation ∆g = −f
in R3 is given by the formula
g(x) = (1/(4π)) ∫ f (y)/|x − y| dy
whenever f decreases quickly enough at infinity.
Section 14. Diffusions with variable drift, Ornstein-Uhlenbeck processes.
In order to be able to solve probabilistically equations involving second order differential
operators, one has to know that these operators generate Markov (Feller) semigroups.
Here we show how BM can be used to construct processes with generators of the form
Lf (x) = (1/2) ∆f (x) + (b(x), ∂f/∂x),  x ∈ Rd .   (1)
Let b be a bounded Lipschitz continuous function, i.e. |b(x) − b(y)| ≤ C|x − y| with a
constant C. Let Bt be an Ft -BM on a filtered probability space. Then the equation
Xt = x + ∫_0^t b(Xs) ds + Bt
has a unique global continuous solution Xt (x) for any x depending continuously on x
(proof by fixed point arguments literally the same as for usual ODE). Clearly Xt (x) is a
Ft -Markov process starting at x.
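Numerically one typically approximates such a solution by the Euler scheme rather than by Picard iteration. A minimal sketch (an assumption of this illustration, not the construction used in the text; the drift b(x) = −x is the Ornstein-Uhlenbeck case of Example 1 below, and all step sizes are arbitrary):

```python
# Euler scheme for X_t = x + int_0^t b(X_s) ds + B_t with Lipschitz drift.
import numpy as np

rng = np.random.default_rng(7)

def euler_path(x0, b, T=5.0, n_steps=5000):
    dt = T / n_steps
    x = np.empty(n_steps + 1)
    x[0] = x0
    for k in range(n_steps):
        # one Euler step: drift increment plus a BM increment
        x[k+1] = x[k] + b(x[k]) * dt + np.sqrt(dt) * rng.standard_normal()
    return x

path = euler_path(2.0, lambda z: -z)
print(path[0], path[-1])   # the path mean-reverts from 2.0 toward 0
```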
Theorem 1. Xt is a Feller process with the generator (1).
Proof. Clearly Φt f (x) = Ef (Xt (x)) defines a semigroup of positive contractions on Cb (Rd).
Let f ∈ C∞comp (Rd). Then
Φt f (x) − f (x) = E(∂f/∂x (x), Bt + ∫_0^t b(Xs) ds)
+ (1/2) E(∂²f/∂x² (x)(Bt + ∫_0^t b(Xs) ds), Bt + ∫_0^t b(Xs) ds) + ...,
where the dots denote the remainder term of the Maclaurin (or Taylor) series. Taking into account
that E|Bt|^k = O(t^{k/2}), it follows that the r.h.s. of this expression is
((∂f/∂x)(x), E(Bt + tb(x))) + (1/2) E((∂²f/∂x²)(x) Bt , Bt) + o(t),  t → 0,
so that
(1/t)(Φt f (x) − f (x)) → Lf (x),  t → 0.
Hence any f ∈ C∞comp (Rd) belongs to the domain of the generator L and Lf is given by
formula (1). As clearly Φt f → f , t → 0, for any such f , it follows that the same holds for
all f ∈ C∞ by density arguments.
Exer. 1. Convince yourself that the assumption that b is bounded can be dispensed
with (only Lipschitz continuity is essential).
Example 1. The solution to the Langevin equation
vt = v − b ∫_0^t vs ds + Bt
with a given constant b > 0 defines a Feller process, called the Ornstein-Uhlenbeck (velocity)
process, with the generator
Lf (v) = (1/2) ∆f (v) − b(v, ∂f/∂v),  v ∈ Rd .   (2)
The pair (vt , xt = x0 + ∫_0^t vs ds) describes the evolution of a (Newtonian) particle subject to
a white noise driving force and friction, and is also sometimes called the Ornstein-Uhlenbeck
process.
Example 2. The solution to the system
ẋt = yt ,  yt = − ∫_0^t (∂V/∂x)(xs) ds − b ∫_0^t ys ds + Bt   (3)
describes the evolution of a Newton particle in the potential field V subject to friction and
white noise driving force.
Exer. 2. Assume b = 0 and that the potential V is bounded below, say V ≥ 1
everywhere, and is increasing to ∞ as |x| → ∞.
(i) Write down the generator L of the pair process (xt , yt). Answer:
Lf (x, y) = (y, ∂f/∂x) − (∂V/∂x, ∂f/∂y) + (1/2) ∆y f.
(ii) Check that L(H^{−α}) ≤ 0 for 0 < α < (d/2) − 1, where H(x, y) = V (x) + y²/2 is the
energy function (Hamiltonian). (iii) Applying Dynkin's formula with f = H^{−α} for the
process starting at (x, y) with the stopping time
τh = inf{t ≥ 0 : H(xt , yt) = h}
with h < H(x, y), show that E_{x,y} f ((x, y)(τh)) < f (x, y) and consequently
P_{x,y}(τh < ∞) ≤ (h/H(x, y))^α .
(iv) Follow the same reasoning as in Exer. 12.6 to establish that the process (xt , yt) is
transient in dimension d ≥ 3 (this result is remarkable, as it holds for all (smooth) V ).
Open problem. Under which conditions on V is the process specified by (3) with b = 0
transient in dimension d = 1, 2? (For d ≥ 3 the answer is fully settled by Exer. 2; in d = 1
only a necessary (but not sufficient) condition for transience is known.)
CHAPTER 5. FINE PROPERTIES of BM.
Section 15. Zeros, excursions and local times.
From Theorem 11.5 one has for a BM Wt that
|Wt | = Mt − Bt ,   (1)
where Bt is another BM and Mt is its maximum process. Hence the times where Wt is away
from the origin coincide with the times where Mt ≠ Bt , on which Mt remains constant, so that
one can interpret Mt as a measure of the time Wt spends at the origin. This motivates Lévy's
definition of the process measuring the local time Lt (0) spent by a BM Wt at the origin via
the equation 2Lt (0) = Mt (some authors do not include the multiplier 2 in this equation).
As for each τ the time s ≤ τ for which Bs = Mτ is a.s. unique (see Theorem 11.6),
one can choose a set Ω0 of full measure s.t. for all ω ∈ Ω0 this holds for all rational τ . For
any such ω and a t > 0 define
γt (ω) = sup{s ∈ [0, t] : Ws = 0} = sup{s ∈ [0, t] : Bs = Mt },
βt (ω) = inf{s ∈ [t, ∞) : Ws = 0} = inf{s ∈ [t, ∞) : Bs = Mt }
so that
γt (ω) < t < βt (ω)   (2)
whenever Wt ≠ 0. The assumption ω ∈ Ω0 implies that the maximum of Bs on [0, t] is attained
uniquely at s = γt (ω), so that
T_{Mt(ω)}(ω) = γt (ω),  T_{Mt(ω)+}(ω) = βt (ω),
and thus
T_{Mt(ω)+}(ω) − T_{Mt(ω)}(ω) = βt (ω) − γt (ω)
– the size of the jump of Tb (ω) at b = Mt (ω) equals the length of the excursion interval
(γt (ω), βt (ω)) straddling t.
Let N (b, [δ, ε)) denote the number of jumps of size l ∈ [δ, ε) of the Lévy subordinator
Ta (or its right continuous modification Ta+) that occur on the time interval (0, b] (cf.
notations in Corollary 3 to Theorem 4.5), and let N^δ(b) = N (b, [δ, ∞)). According to
Corollary 3 to Theorem 4.5 and Theorem 11.3, the process b ↦ N (b, [δ, ε)) is a Poisson
process with the intensity
ν([δ, ε)) = ∫_δ^ε (2πx³)^{−1/2} dx = √(2/π) (δ^{−1/2} − ε^{−1/2}).   (3)
Theorem 1. A.s.
lim_{δ→0} √(πδ/2) N^δ(b) = b  ∀b ∈ [0, ∞).   (4)
Proof. According to (3) the r.v. Qt = N^{1/t}(b) is Poisson with parameter b√(2t/π).
Hence, as the process Qt has non-decreasing right continuous paths and independent
increments, it is an (inhomogeneous) Poisson process, and by the law of large numbers
(see Exer. 4.9) Qt/√t → b√(2/π) a.s. as t → ∞, implying (4) for a given b. Then one deduces
that it holds a.s. for all rational b, and then by monotonicity and continuity one extends
it to all b.
Theorem 2 (Lévy, 1948).
Lt (0) = lim_{δ→0} √(πδ/8) nt (δ),   (5)
where nt is the number of excursion intervals away from the origin, of duration ≥ δ,
completed by Ws , s ≤ t.
Proof. It follows from (4) and the above definition of Lévy's local time that
Lt (0) = lim_{δ→0} √(πδ/8) ñt (δ),
where ñt (δ) denotes the number of excursion intervals away from the origin, of duration
≥ δ, completed by Ws , s ≤ T_{Mt+}. But according to (2), βt = T_{Mt+} is the time of completion
of the excursion straddling t. Hence nt (δ) and ñt (δ) differ at most by one, and (5) follows.
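Theorem 1 can be probed numerically from a single simulated path: compute the passage times Ta from the running maximum and count their large jumps. A rough sketch (not part of the notes; the level grid and δ values are arbitrary, and sizable Poisson fluctuations are to be expected):

```python
# Single-path illustration of (4): the number of jumps of size >= delta of
# T_a on levels (0, b], rescaled by sqrt(pi delta / 2), approximates b.
import numpy as np

rng = np.random.default_rng(8)
n_steps, T = 2_000_000, 1.0
dt = T / n_steps
B = np.cumsum(rng.normal(0, np.sqrt(dt), n_steps))
M = np.maximum.accumulate(B)            # running maximum

b = 0.5 * M[-1]                         # stay below the achieved maximum
levels = np.linspace(0, b, 2000)
Ta = dt * np.searchsorted(M, levels)    # passage time of each level
for delta in (1e-3, 3e-4, 1e-4):
    N = np.count_nonzero(np.diff(Ta) >= delta)
    print(delta, np.sqrt(np.pi * delta / 2) * N, "vs", b)
```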
Exer. Show that the zero set {t : Wt = 0} of BM is (i) a closed set of Lebesgue
measure zero, and (ii) unbounded with an accumulation point at the origin. Hint: (i)
use Fubini's theorem and continuity of BM; (ii) the maximum and minimum of BM are both
a.s. non-zero on any finite interval, and are both a.s. unbounded on t ≥ 0.
Section 16. Skorohod imbedding and invariance principle.
For a ≤ 0 ≤ b let νa,b be the unique probability measure on the two point set {a, b}
with mean zero, so that νa,b = δ0 for ab = 0 and
νa,b = (b δa − a δb)/(b − a)   (1)
otherwise.
Prop. 1 (Randomization Lemma). For any distribution µ on R of zero mean,
denote by µ± its restrictions to R+ and R− respectively, and put c = ∫ x µ+(dx) = − ∫ x µ−(dx).
Then
µ = ∫ µ̃(dx dy) νx,y ,   (2)
where the distribution µ̃ on R− × R+ is given by
µ̃(dx dy) = µ({0}) δ_{0,0}(dx dy) + c^{−1} (y − x) µ−(dx) µ+(dy).
Proof. Direct calculations applying both sides of (2) to a continuous function f .
Prop. 2. Let τ be a stopping time for a BM Bt such that B_{min(t,τ)} is uniformly bounded.
Then
EBτ = 0,  Eτ = EBτ² .
Proof. By optional stopping (and basic martingales)
EB_{min(t,τ)} = 0,  E(min(t, τ)) = EB²_{min(t,τ)} ,
and the desired result is obtained by dominated and monotone convergence as t → ∞.
Prop. 3 (embedding of a r.v.). For a probability measure µ on R with mean
zero choose a random pair (a, b) with distribution µ̃ from Prop. 1 and an independent
BM Bt . Then (i) the random time T = inf{t : Bt ∈ {a, b}} is optional for the filtration
σ{a, b; Bs , s ≤ t}, (ii) the law of BT is µ, (iii) the expectation of T coincides with the
second moment (variance) of µ.
Proof. By Exer. 1 of Section 12 the r.v. BT for fixed a, b has the distribution (1). Hence
Ef (BT) = E E(f (BT)|a, b) = ∫∫ (∫ f (z) νx,y (dz)) µ̃(dx dy) = ∫ f (x) µ(dx),
yielding (ii). Then (iii) follows from Prop. 2.
Theorem 1 (Skorohod embedding). Let ξ1 , ξ2 , ... be iid r.v. with mean 0 and
Sn = ξ1 + ... + ξn . Then there exist a filtered probability space with a BM Bt and stopping
times 0 = T0 ≤ T1 ≤ ... s.t. the differences ∆Tn = Tn − Tn−1 are iid with E∆Tn = Eξ12
and BTn are distributed like Sn for all n.
Remark. τn = inf{t ≥ τn−1 : Bt = Sn} would give a trivial solution if the moment
requirement were not imposed.
Proof. Let µ denote the common law of the ξj . Take iid pairs (an , bn), n = 1, 2, ..., with
the distribution µ̃ from Prop. 1 and an independent BM. Everything follows from the
recursive definition of the random times 0 = T0 ≤ T1 ≤ T2 ≤ ... by
Tn = inf{t ≥ Tn−1 : Bt − B_{Tn−1} ∈ {an , bn}}.
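A sketch of one embedding step (Prop. 3) for the centered law µ uniform on {−2, −1, 1, 2} (this choice of µ, the step size and the sample size are assumptions of the illustration; Var µ = 2.5): draw (a, b) from µ̃, run a BM until it leaves (a, b); then BT has law µ and ET = Var µ, up to discretization bias.

```python
# Skorohod embedding of a single r.v. with law mu = uniform on {-2, -1, 1, 2}.
import numpy as np

rng = np.random.default_rng(9)
pairs = [(-2, 1), (-2, 2), (-1, 1), (-1, 2)]
c = 0.75                                                # c = int x mu_+(dx)
w = np.array([(b - a) / c * (1/16) for a, b in pairs])  # mu-tilde weights

def embed_one(dt=1e-2):
    a, b = pairs[rng.choice(4, p=w)]
    x, t = 0.0, 0.0
    while a < x < b:
        x += np.sqrt(dt) * rng.standard_normal()
        t += dt
    return (b if x >= b else a), t                      # project onto {a, b}

samples = [embed_one() for _ in range(4000)]
vals = np.array([v for v, _ in samples])
print([round((vals == v).mean(), 3) for v in (-2, -1, 1, 2)])  # each ~ 0.25
print(np.mean([t for _, t in samples]))                        # ~ 2.5
```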
Theorem 2 (Approximation of random walks). Let ξ1 , ξ2 , ... be iid r.v. with
mean 0 and variance 1, and let Sn = ξ1 + ... + ξn . Then there exists a BM Bt s.t.
Xt = t^{−1/2} sup_{s≤t} |S_{[s]} − Bs| converges to zero in probability as t → ∞ ([s] denotes the
integer part of s).
Proof. Choose Tn and B as in Theorem 1. Then Tn/n → 1 a.s. by the LLN, hence
T_{[t]}/t → 1 a.s. and hence (check it!) δt/t → 0 a.s., where δt = sup_{s≤t} |T_{[s]} − s|. For any
t, h, ε, by the scaling property of BM,
P (Xt > ε) ≤ P (δt > th) + P (sup_{u−v≤th, u,v≤t+th} |Bu − Bv| > ε√t)
= P (δt/t > h) + P (sup_{u−v≤h, u,v≤1+h} |Bu − Bv| > ε),
which can be made arbitrarily small by choosing h small and t large.
Corollary (Functional CLT, invariance principle, two formulations). (i) For
all C, ε > 0 there exists N s.t. for all n > N there exists a BM Bt (depending on n) s.t.
P (sup_{t≤1} |S_{[tn]}/√n − Bt| > C) < ε.
(ii) Let F be a uniformly continuous function on the space D[0, 1] of cadlag functions on
[0, 1] equipped with the sup-norm topology. Then F (S_{[·n]}/√n) converges in distribution to
F (B) with B = Bt being a standard BM.
Proof. (i) Applying Theorem 2 with t = n yields
P (n^{−1/2} sup_{s≤n} |S_{[s]} − Bs| > C) → 0
as n → ∞ for any C. With s = tn this rewrites as
P (sup_{t≤1} |S_{[tn]}/√n − B_{tn}/√n| > C) → 0.
But by scaling B_{tn}/√n is again a BM, and (i) follows.
(ii) One has to show that
E(g(F (S_{[·n]}/√n)) − g(F (B·))) → 0   (3)
as n → ∞ for any bounded uniformly continuous g. Choosing for each n a version of B from
(i) one decomposes (3) into the sum of two terms with the function under the expectation
multiplied by the indicators 1Yn >C and 1Yn ≤C respectively, where
Yn = sup_{t≤1} |S_{[tn]}/√n − Bt| .
Then the first term is small by (i) for any C and n large enough, and the second term is
small for small C by uniform continuity of F and g.
Examples. 1. Applying statement (i) with t = 1 yields the usual CLT for random
walks. 2. Applying (ii) with F (h(·)) = sup_{t∈[0,1]} h(t) and taking into account the
distribution of the maximum of BM (obtained by the reflection principle) yields
P (max{Sk : k ≤ n}/√n ≤ x) → 2P (N ≤ x) − 1 = P (|N| ≤ x),  x ≥ 0,
where N is a standard normal r.v. N (0, 1).
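Example 2 is easy to check by simulation for ±1 steps. A sketch (not part of the notes; sample sizes are arbitrary choices):

```python
# The law of max_{k<=n} S_k / sqrt(n) approaches that of |N|.
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(10)
n, n_paths, x = 1000, 100_000, 1.0
S = np.zeros(n_paths)
M = np.zeros(n_paths)                  # running maximum; max includes S_0 = 0
for _ in range(n):
    S += rng.choice([-1, 1], size=n_paths)
    np.maximum(M, S, out=M)

print((M / np.sqrt(n) <= x).mean())    # empirical
print(erf(x / sqrt(2)))                # P(|N| <= x) = 2 P(N <= x) - 1
```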
Section 17. Sample path properties. Non-differentiability, quadratic variation,
modulus of continuity, iterated log, rate of escape.