Brownian Motion. Lecture Notes. October 2006. V. Kolokoltsov

These notes, revised and extended, form the basis for Chapters 2 and 3 of my book "Markov Processes, Semigroups and Generators", de Gruyter 2011.

Aims and Objectives. Brownian motion (BM) is an acknowledged champion in stochastic modeling for a wide variety of processes in physics (statistical mechanics, quantum fields, etc.), biology (e.g. population dynamics, migration, disease spreading) and finance (e.g. common stock prices). BM enjoys beautiful nontrivial properties deeply linked with various areas of mathematics. The general theory of modern stochastic processes is strongly rooted in BM and was largely developed by extending its remarkable features. The aim of the course is to learn the basic properties of BM, its potential and limitations as a modeling tool, to understand its place among the main general classes of random processes such as martingales, Markov and Lévy processes, to learn to apply the general tools of stochastic analysis (e.g. stopping times) to BM and related diffusions, and to appreciate basic notions and methods of modern stochastic analysis itself through its application to BM.

General Remarks: 1) ? denotes additional material. 2) Sections 1 and 5 contain basic probability prerequisites for the course. 3) Recommended supplementary reading: main: I. Karatzas, S. Shreve. Brownian Motion and Stochastic Calculus. Springer 1998; D. Revuz, M. Yor. Continuous Martingales and Brownian Motion. Springer 1999; for references in probability: J. Jacod, Ph. Protter. Probability Essentials. Springer 2004; A.N. Shiryayev. Probability. Springer 1984.

Content.
CHAPTER 1. DEFINITION AND CONSTRUCTION OF BM. 0. Overview, historical sketch, perspectives. 1. Review of measure and probability. 2. Brownian motion: construction via Hilbert space methods. 3. The construction of BM via Kolmogorov's continuity theorem.
CHAPTER 2. LÉVY, MARKOV AND FELLER PROCESSES. 4. Processes with s.i. increments. 5. Conditioning. 6. Markov processes. 7. Feller processes and semigroups.
CHAPTER 3. MARTINGALE METHODS. 8. Martingales. 9. Stopping times and the optional sampling theorem. 10. Strong Markov property. Diffusions as Feller processes with continuous paths. 11. Reflection principle and passage times for BM.
CHAPTER 4. HEAT CONDUCTION (OR DIFFUSION) EQUATION. 12. The Dirichlet problem for diffusion operators. 13. The stationary Feynman-Kac formula. 14. Diffusions with variable drift, Ornstein-Uhlenbeck processes.
CHAPTER 5. FINE PROPERTIES OF BM. 15. Zeros, excursions and local times. 16. Skorohod imbedding and the invariance principle. 17. Sample path properties.

CHAPTER 1. DEFINITION AND CONSTRUCTION OF BM.

Section 0. Overview, historical sketch, perspectives.

Brown, Einstein, Smoluchowski, Langevin, Ornstein-Uhlenbeck, Chandrasekhar, Wiener, Feynman-Kac, Nelson, McKean, Dyson.

Section 1. Review of measure and probability.

Def. A collection $\mathcal{F}$ of subsets of a given set $S$ is called a σ-algebra if (i) $S \in \mathcal{F}$; (ii) $A \in \mathcal{F} \Rightarrow A^c \in \mathcal{F}$; (iii) (closure under countable unions) $\cup_{n=1}^\infty A_n \in \mathcal{F}$ whenever $A_n \in \mathcal{F}$ for all $n \in \mathbb{N}$. The pair $(S, \mathcal{F})$ is called a measurable space. A measure on $(S, \mathcal{F})$ is a mapping $\mu : \mathcal{F} \to [0, \infty]$ such that $\mu(\emptyset) = 0$ and σ-additivity holds:
$$\mu\Big(\cup_{n=1}^\infty A_n\Big) = \sum_{n=1}^\infty \mu(A_n)$$
for any sequence $A_n$ of mutually disjoint sets in $\mathcal{F}$. The triple $(S, \mathcal{F}, \mu)$ is called a measure space.
A measure $\mu$ is called finite if its total mass $\mu(S)$ is finite, and σ-finite if there exists a sequence $A_n \in \mathcal{F}$, $n \in \mathbb{N}$, such that $S = \cup_{n=1}^\infty A_n$ and $\mu(A_n) < \infty$ for all $n$. For a collection $\Gamma$ of subsets of a set $\Omega$, the σ-algebra $\sigma(\Gamma)$ generated by $\Gamma$ is the minimal σ-algebra containing all sets from $\Gamma$.

Examples. 1. A measure space $(\Omega, \mathcal{F}, \mu)$ is called a probability space whenever $\mu(\Omega) = 1$. In this case $\mu$ is called a probability measure and the sets from $\mathcal{F}$ are called events. 2. For a topological space $S$, e.g. a subset of $\mathbb{R}^d$, the smallest σ-algebra $\mathcal{B}(S)$ containing all its open subsets is called the Borel σ-algebra of $S$. Its elements are called Borel sets and any measure on $(S, \mathcal{B}(S))$ is called a Borel measure. The simplest example of a Borel measure is Lebesgue measure on $\mathbb{R}^d$. 3. For a finite or countable family of measure spaces $(S_i, \mathcal{F}_i, \mu_i)$, $i = 1, 2, \ldots$, the product measure space $(S, \mathcal{F}, \mu)$ is defined, where $S = S_1 \times S_2 \times \ldots$, $\mathcal{F} = \mathcal{F}_1 \otimes \mathcal{F}_2 \otimes \ldots$ is the σ-algebra generated by the sets $A_1 \times \ldots \times A_n$, $A_i \in \mathcal{F}_i$, $n \in \mathbb{N}$, and $\mu = \mu_1 \times \mu_2 \times \ldots$ is the product measure uniquely specified by the prescription $\mu(A_1 \times \ldots \times A_n) = \mu_1(A_1) \cdots \mu_n(A_n)$.

Borel-Cantelli lemma. If a sequence of events $A_n$, $n \in \mathbb{N}$, on a probability space $(\Omega, \mathcal{F}, P)$ is such that $\sum_n P(A_n) < \infty$, then a.s. only a finite number of the $A_n$ occur.

Proof. Let $B = \{\omega \in \Omega : \text{infinitely many } A_n \text{ occur}\}$. Then $B = \cap_n (\cup_{k \ge n} A_k)$ and
$$P(B) \le P(\cup_{k \ge n} A_k) \le \sum_{k \ge n} P(A_k) \to 0$$
as $n \to \infty$. Hence $P(B) = 0$.

Def. Completion. For a measure space $(S, \mathcal{F}, \mu)$, a subset of $S$ is called negligible if it is a subset of some $N \in \mathcal{F}$ with $\mu(N) = 0$. The σ-algebra $\bar{\mathcal{F}}$ of subsets of $S$ of the form $A \cup B$ with $A \in \mathcal{F}$ and $B$ negligible, and the measure $\bar\mu$ defined on these sets by $\bar\mu(A \cup B) = \mu(A)$, are called respectively the completions of $\mathcal{F}$ and $\mu$ (with respect to $\mu$). In particular, for $S \subset \mathbb{R}^d$ the completion of $\mathcal{B}(S)$ with respect to Lebesgue measure is called the σ-algebra of Lebesgue measurable sets in $S$.

Def. For a probability space $(\Omega, \mathcal{F}, \mu)$ one says that some property depending on $\omega \in \Omega$ holds almost surely (a.s.) or with probability 1 if there exists a negligible set $N \in \mathcal{F}$ such that the property holds for all $\omega \in \Omega \setminus N$.

Def. If $(S_i, \mathcal{F}_i)$, $i = 1, 2$, are measurable spaces, a mapping $f : S_1 \to S_2$ is called $(\mathcal{F}_1, \mathcal{F}_2)$-measurable if $f^{-1}(A) \in \mathcal{F}_1$ whenever $A \in \mathcal{F}_2$. If $S_1, S_2$ are subsets of $\mathbb{R}^d$ equipped with their Borel σ-algebras, such a mapping is said to be Borel measurable. Speaking about measurable mappings with values in $\mathbb{R}^d$ one usually means that $\mathbb{R}^d$ is equipped with its Borel σ-algebra.

? Exercise and Def. For $S \subset \mathbb{R}^d$ the universal σ-field $\mathcal{U}(S)$ is defined as the intersection of the completions of $\mathcal{B}(S)$ with respect to all probability measures on $S$. The $(\mathcal{U}(S), \mathcal{B}(S))$-measurable functions are called universally measurable. Show that a real valued function $f$ is universally measurable if and only if for every probability measure $\mu$ on $S$ there exists a Borel measurable function $g_\mu$ such that $\mu\{x : f(x) \ne g_\mu(x)\} = 0$. Hint for the "only if" part: show that $f(x) = \inf\{r \in \mathbb{Q} : x \in U(r)\}$, where $U(r) = \{x \in S : f(x) \le r\}$. Since the $U(r)$ belong to the completion of the Borel σ-algebra with respect to $\mu$, there exist Borel sets $B(r)$, $r \in \mathbb{Q}$, such that $\mu(\cup_{r \in \mathbb{Q}} (B(r) \Delta U(r))) = 0$. Define $g_\mu(x) = \inf\{r \in \mathbb{Q} : x \in B(r)\}$.

Def. For a probability space $(\Omega, \mathcal{F}, P)$ the measurable mappings $X : \Omega \to \mathbb{R}^d$ are called random variables (shortly r.v.). The law (or distribution) of such a mapping is the Borel probability measure $p_X$ on $\mathbb{R}^d$ defined as $p_X = P \circ X^{-1}$.
In other words, $p_X(A) = P(X^{-1}(A)) = P(\omega \in \Omega : X(\omega) \in A) = P(X \in A)$. Two r.v. $X$ and $Y$ are called identically distributed if they have the same probability law. For a real (i.e. one-dimensional) r.v. $X$ its distribution function is defined by $F_X(x) = p_X((-\infty, x])$. A real r.v. $X$ has a continuous distribution with a probability density function $f$ if $p_X(A) = \int_A f(x)\,dx$ for all Borel sets $A$. The σ-algebra $\sigma(X)$ generated by a r.v. $X$ is the smallest σ-algebra containing the sets $\{X \in B\}$ for all Borel sets $B$.

Exercise. Show that if $X$ takes only a finite number of values, then the law $p_X$ is a sum of Dirac δ-measures.

Def. Expectation and covariance. For an $\mathbb{R}^d$-valued r.v. $X$ on a probability space $(\Omega, \mathcal{F}, P)$ and a Borel measurable function $f : \mathbb{R}^d \to \mathbb{R}^m$ the expectation of $f(X)$ is defined as
$$E(f(X)) = \int_\Omega f(X(\omega))\,P(d\omega) = \int_{\mathbb{R}^d} f(x)\,p_X(dx). \qquad (1)$$
$X$ is called integrable if $E(|X|) < \infty$. For two $\mathbb{R}^d$-valued r.v. $X = (X_1, \ldots, X_d)$ and $Y = (Y_1, \ldots, Y_d)$ the $d \times d$ matrix with entries $E[(X_i - E(X_i))(Y_j - E(Y_j))]$ is called the covariance of $X$ and $Y$ and is denoted $\mathrm{Cov}(X, Y)$. In case $d = 1$ and $X = Y$ the number $\mathrm{Cov}(X, Y)$ is called the variance of $X$ and is denoted $\mathrm{Var}(X)$, sometimes also $\sigma_X^2$. The r.v. $X$ and $Y$ are called uncorrelated whenever $\mathrm{Cov}(X, Y) = 0$.

Exercise. Show that the two expressions in definition (1) really coincide. Hint: first choose $f$ to be an indicator.

Def. Four basic notions of convergence of r.v. Let $X$ and $X_n$, $n \in \mathbb{N}$, be $\mathbb{R}^d$-valued r.v. (defined on a common probability space). One says that $X_n$ converges to $X$
(i) in $L^p$ ($1 \le p < \infty$) if $\lim_{n\to\infty} E(|X_n - X|^p) = 0$;
(ii) almost surely if $\lim_{n\to\infty} X_n(\omega) = X(\omega)$ almost surely;
(iii) in probability if for any $\epsilon > 0$, $\lim_{n\to\infty} P(|X_n - X| > \epsilon) = 0$;
(iv) in distribution if $p_{X_n}$ converges weakly to $p_X$, i.e. if
$$\lim_{n\to\infty} \int_{\mathbb{R}^d} f(x)\,p_{X_n}(dx) = \int_{\mathbb{R}^d} f(x)\,p_X(dx)$$
for all bounded continuous functions $f$.

Exer. Show that $L^p$-convergence ⇒ convergence in probability ⇒ weak convergence. Hint: for the first implication use the Chebyshev inequality; for the second decompose the integral $\int |f(X_n(\omega)) - f(X(\omega))|\,P(d\omega)$ into the three terms over the sets $\{|X_n - X| > \delta\}$, $\{|X_n - X| \le \delta, |X| > 1/\delta\}$ and $\{|X_n - X| \le \delta, |X| \le 1/\delta\}$.

Exer. (i) Show that $X_n \to X$ in probability ⇔
$$\lim_{n\to\infty} E\left(\frac{|X_n - X|}{1 + |X_n - X|}\right) = 0.$$
(ii) Deduce from (i) that almost sure convergence implies convergence in probability. Hint for (i): it is enough to prove it for $X = 0$; for the "if" part use the inequality
$$E\left(\frac{|X_n|}{1 + |X_n|}\right) \ge \frac{\epsilon}{1+\epsilon}\,P(|X_n| > \epsilon);$$
for the "only if" part decompose the integral $E(|X_n|/(1 + |X_n|))$ into the two terms over the sets $\{|X_n| > \epsilon\}$ and $\{|X_n| \le \epsilon\}$.

Example. Consider the following sequence of indicator functions $X_n$ on $[0, 1]$ (with Lebesgue measure): $1_{[0,1]}$, $1_{[0,1/2]}$, $1_{[1/2,1]}$, $1_{[0,1/3]}$, $1_{[1/3,2/3]}$, $1_{[2/3,1]}$, $1_{[0,1/4]}$, $1_{[1/4,2/4]}$, etc. Then $X_n \to 0$ as $n \to \infty$ in probability and in all $L^p$, $p \ge 1$, but not a.s.; in fact $\limsup X_n(x) = 1$ and $\liminf X_n(x) = 0$ for each $x$, so that $X_n(x)$ converges nowhere. (A numerical sketch of this example is given below, after the exercises.)

Exer. (i) Convince yourself that $X_n \to X$ in distribution does not imply $X_n - X \to 0$ in distribution. (ii) Show that $X_n \to 0$ in distribution ⇒ $X_n \to 0$ in probability.

? Exer. Show that $X_n \to X$ a.s. ⇔
$$\lim_{m\to\infty} P\Big\{\sup_{n \ge m} |X_n - X| > \epsilon\Big\} = 0$$
for all $\epsilon > 0$. Use this to give another proof of the fact that convergence a.s. implies convergence in probability. Hint: observe that the event $\{X_n \to X\}$ is the complement of the event $B = \cup_{r \in \mathbb{N}} B_r$, where
$$B_r = \cap_{m \in \mathbb{N}} \Big\{\sup_{n \ge m} |X_n - X| > 1/r\Big\},$$
i.e. a.s. convergence is equivalent to $P(B) = 0$ and hence to $P(B_r) = 0$ for all $r$.
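Numerical sketch of the "typewriter" example above (this code is not part of the original notes; the helper `typewriter` and all parameters are illustrative choices):

```python
import numpy as np

# Block n of the sequence consists of the indicators of [k/n, (k+1)/n],
# k = 0, ..., n-1.  Each X_m is 1 on an interval whose length tends to 0,
# so X_m -> 0 in probability and in every L^p, while every point x lies in
# infinitely many of these intervals, so X_m(x) converges nowhere.

def typewriter(m):
    """Return the endpoints (left, right) of the interval supporting X_m, m >= 1."""
    n, first = 1, 1
    while first + n <= m:      # locate the block containing index m
        first += n
        n += 1
    k = m - first              # position inside block n
    return k / n, (k + 1) / n

x = 1 / np.sqrt(2)             # a fixed sample point
hits = [m for m in range(1, 2001) if typewriter(m)[0] <= x < typewriter(m)[1]]
lengths = [typewriter(m)[1] - typewriter(m)[0] for m in range(1, 2001)]

print("P(X_m = 1) for m = 10, 100, 1000:", lengths[9], lengths[99], lengths[999])
print("indices m <= 2000 with X_m(x) = 1:", hits[:10], "...")  # keeps recurring
```

The printed probabilities decay like $1/n$, while the list of hit indices never terminates: exactly the split between convergence in probability and a.s. convergence.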
Def. A family $\mathcal{H}$ of r.v. in $L^1(\Omega, \mathcal{F}, \mu)$ is uniformly integrable if
$$\lim_{c\to\infty} \sup_{X \in \mathcal{H}} E(|X| 1_{|X|>c}) = 0.$$

Exer. If either (i) $\sup_{X \in \mathcal{H}} E(|X|^p) < \infty$ for some $p > 1$, or (ii) there exists an integrable r.v. $Y$ s.t. $|X| \le Y$ for all $X \in \mathcal{H}$, then $\mathcal{H}$ is uniformly integrable. Hint: (i) $E(|X| 1_{|X|>c}) \le \frac{1}{c^{p-1}} E(|X|^p 1_{|X|>c}) \le \frac{1}{c^{p-1}} E(|X|^p)$; (ii) $E(|X| 1_{|X|>c}) \le E(Y 1_{Y>c})$.

Exer. If $X_n \to X$ a.s. and $\{X_n\}$ is uniformly integrable, then $X_n \to X$ in $L^1$. Hint: decompose the integral $\int |X_n - X|\,P(d\omega)$ into the sum of three terms over the domains $\{|X_n - X| > \epsilon\}$, $\{|X_n - X| \le \epsilon, |X| \le c\}$ and $\{|X_n - X| \le \epsilon, |X| > c\}$; these can be made small respectively because $X_n \to X$ in probability (as it holds a.s.), by dominated convergence, and by uniform integrability.

Def. Characteristic functions. If $p$ is a probability measure on $\mathbb{R}^d$, its characteristic function is the function $\phi_p(y) = \int e^{i(y,x)}\,p(dx)$. For an $\mathbb{R}^d$-valued r.v. $X$ its characteristic function is defined as the characteristic function $\phi_X = \phi_{p_X}$ of its law $p_X$, i.e.
$$\phi_X(y) = E(e^{i(y,X)}) = \int_{\mathbb{R}^d} e^{i(y,x)}\,p_X(dx).$$

Exer. Show that any ch.f. is a continuous function. Hint: use
$$|\phi_X(y+h) - \phi_X(y)| \le E|e^{i(h,X)} - 1| \le \max_{|x| \le a} |e^{i(h,x)} - 1| + 2P(|X| > a). \qquad (2)$$

? Riemann-Lebesgue Lemma. If a probability measure $p$ has a density, then $\phi_p$ belongs to $C_\infty(\mathbb{R}^d)$ (continuous functions tending to 0 as the argument tends to ∞). In other words, the inverse Fourier transform
$$f \mapsto F^{-1}f(y) = (2\pi)^{-d/2}\int e^{i(y,x)} f(x)\,dx$$
is a bounded linear operator $L^1(\mathbb{R}^d) \to C_\infty(\mathbb{R}^d)$. Sketch of the proof. Reduce to the case when $f$ is a continuously differentiable function with compact support, then use integration by parts.

Exercise and Def. For a vector $m \in \mathbb{R}^d$ and a positive definite $d \times d$ matrix $A$, a r.v. $X$ is called Gaussian (or has Gaussian distribution) with mean $m$ and covariance $A$, denoted $N(m, A)$, whenever its characteristic function is
$$\phi_{N(m,A)}(y) = \exp\{i(m, y) - \tfrac{1}{2}(y, Ay)\}.$$
(i) Show that if $A$ is non-degenerate, an $N(m, A)$ r.v. has a distribution with the pdf
$$f(x) = \frac{1}{(2\pi)^{d/2}\sqrt{\det(A)}}\,\exp\{-\tfrac{1}{2}(x - m, A^{-1}(x - m))\}.$$
(ii) Show that $m = E(X)$ and $A_{ij} = E((X_i - m_i)(X_j - m_j))$.

Exercise and Def. Suppose $X_1$ and $X_2$ are independent $\mathbb{R}^d$-valued r.v. with laws $\mu_1, \mu_2$ and characteristic functions $\phi_1, \phi_2$. (i) Show that the r.v. $X_1 + X_2$ has the characteristic function $\phi_1\phi_2$ and the law given by the convolution $\mu_1 \star \mu_2$ defined by
$$(\mu_1 \star \mu_2)(A) = \int_{\mathbb{R}^d} \mu_1(A - x)\,\mu_2(dx) = \int_{\mathbb{R}^{2d}} \chi_{A-x}(y)\,\mu_1(dy)\,\mu_2(dx).$$
(ii) Extend this result to the case of $n$ independent r.v. $X_1, \ldots, X_n$.

? Exer. and Def. Show that if probability distributions $p_n$, $n \in \mathbb{N}$, converge weakly to a probability distribution $p$, then (i) the family $p_n$ is tight, i.e. $\forall \epsilon > 0\ \exists K > 0 : \forall n,\ p_n(|x| > K) < \epsilon$; (ii) their characteristic functions $\phi_n$ converge uniformly on compact sets. Hint: for (ii) use tightness and representation (2) to show that the family of ch.f. is equicontinuous, i.e. $\forall \epsilon\ \exists \delta : |\phi_n(y+h) - \phi_n(y)| < \epsilon$ for all $|h| < \delta$ and $n \in \mathbb{N}$, which implies the uniform convergence.

Glivenko's Theorem. If $\phi_n$, $n \in \mathbb{N}$, and $\phi$ are the characteristic functions of probability distributions $p_n$ and $p$ on $\mathbb{R}^d$, then $\lim_{n\to\infty}\phi_n(y) = \phi(y)$ for each $y \in \mathbb{R}^d$ if and only if $p_n$ converge to $p$ weakly.

Lévy's Theorem. If $\phi_n$, $n \in \mathbb{N}$, is a sequence of characteristic functions of probability distributions on $\mathbb{R}^d$ and $\lim_{n\to\infty}\phi_n(y) = \phi(y)$ for each $y \in \mathbb{R}^d$ and a function $\phi$ which is continuous at the origin, then $\phi$ is itself a characteristic function.

? Exer. Show that if a family of probability measures $p_\alpha$ is tight, then it is relatively weakly compact, i.e. any sequence of this family has a weakly convergent subsequence.
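A quick Monte Carlo sanity check of the Gaussian ch.f. and the convolution rule above (an illustrative sketch only; the seed, dimension and covariance are arbitrary, and `emp_cf` is a name introduced here):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 2, 200_000

m = np.array([1.0, -0.5])
A = np.array([[2.0, 0.6], [0.6, 1.0]])     # a positive definite covariance
# N(m, A) samples via the Cholesky factor L, since L L^T = A:
X = m + rng.standard_normal((n, d)) @ np.linalg.cholesky(A).T

def emp_cf(sample, y):
    """Empirical characteristic function E exp{i(y, X)} from a sample."""
    return np.mean(np.exp(1j * sample @ y))

y = np.array([0.3, -0.7])
exact = np.exp(1j * m @ y - 0.5 * y @ A @ y)
print(emp_cf(X, y), exact)                 # agree up to ~ n^{-1/2} noise

# Convolution rule: the ch.f. of a sum of independent r.v. is the product.
Z = rng.exponential(1.0, size=(n, d))      # an independent non-Gaussian r.v.
print(emp_cf(X + Z, y), emp_cf(X, y) * emp_cf(Z, y))
```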
Hint: tightness ⇒ the family of characteristic functions is equicontinuous (by (2)), and hence relatively compact in the topology of uniform convergence on compact sets. Finally use Lévy's theorem.

Exercise. (i) Show that a finite linear combination of $\mathbb{R}^d$-valued Gaussian r.v. is again a Gaussian r.v. (ii) Show that if a sequence of $\mathbb{R}^d$-valued Gaussian r.v. converges in distribution to a r.v., then the limiting r.v. is again Gaussian. (iii) Show that if $(X, Y)$ is an $\mathbb{R}^2$-valued Gaussian r.v., then $X$ and $Y$ are uncorrelated if and only if they are independent.

? Bochner's Theorem. A function $\phi : \mathbb{R}^d \to \mathbb{C}$ is a characteristic function of a probability distribution if and only if it satisfies the following three properties: (i) $\phi(0) = 1$; (ii) $\phi$ is continuous at the origin; (iii) $\phi$ is positive definite, which means that
$$\sum_{j,k=1}^n c_j \bar{c}_k\,\phi(y_j - y_k) \ge 0$$
for all $n$, all $y_1, \ldots, y_n \in \mathbb{R}^d$ and all complex $c_1, \ldots, c_n$.

Exercise. Prove the "only if" part of Bochner's theorem. Hint: for (iii) observe that
$$\sum_{j,k=1}^n c_j \bar{c}_k\,\phi_X(y_j - y_k) = \int_{\mathbb{R}^d} \sum_{j,k=1}^n c_j \bar{c}_k\,e^{i(y_j - y_k, x)}\,p_X(dx) = \int_{\mathbb{R}^d}\Big|\sum_{j=1}^n c_j e^{i(y_j, x)}\Big|^2 p_X(dx).$$

Def. Stochastic processes. A stochastic process is a collection $X = (X_t, t \ge 0)$ (or $t \in [0, T]$ for some $T > 0$) of $\mathbb{R}^d$-valued random variables defined on the same probability space. The finite-dimensional distributions of such a process are the collection of probability measures $p_{t_1,\ldots,t_n}$ on $\mathbb{R}^{dn}$ (parametrized by finite collections of pairwise different non-negative numbers $t_1, \ldots, t_n$) defined as
$$p_{t_1,\ldots,t_n}(H) = P((X_{t_1}, \ldots, X_{t_n}) \in H)$$
for each Borel subset $H$ of $\mathbb{R}^{dn}$. These finite-dimensional distributions are (obviously) consistent (or satisfy Kolmogorov's consistency criteria): for any $n$, any permutation $\pi$ of $\{1, \ldots, n\}$, any sequence $0 \le t_1 < \ldots < t_{n+1}$, and any collection of Borel subsets $H_1, \ldots, H_n$ of $\mathbb{R}^d$ one has
$$p_{t_1,\ldots,t_n}(H_1 \times \ldots \times H_n) = p_{t_{\pi(1)},\ldots,t_{\pi(n)}}(H_{\pi(1)} \times \ldots \times H_{\pi(n)}),$$
$$p_{t_1,\ldots,t_n,t_{n+1}}(H_1 \times \ldots \times H_n \times \mathbb{R}^d) = p_{t_1,\ldots,t_n}(H_1 \times \ldots \times H_n).$$

Def. A stochastic process is called Gaussian if all its finite-dimensional distributions are Gaussian.

Kolmogorov's existence theorem. Given a family of probability measures $p_{t_1,\ldots,t_n}$ (on $\mathbb{R}^{dn}$) satisfying the Kolmogorov consistency criteria, there exist a probability space $(\Omega, \mathcal{F}, P)$ and a stochastic process $X$ on it having the $p_{t_1,\ldots,t_n}$ as its finite-dimensional distributions. In particular, one can choose $\Omega$ to be the set $(\mathbb{R}^d)^{\mathbb{R}_+}$ of all mappings from $\mathbb{R}_+$ to $\mathbb{R}^d$ and $\mathcal{F}$ to be the smallest σ-algebra containing all cylinder sets
$$I_{t_1,\ldots,t_n}^H = \{\omega \in \Omega : (\omega(t_1), \ldots, \omega(t_n)) \in H\}, \qquad H \in \mathcal{B}(\mathbb{R}^{dn}),$$
and $X$ to be the coordinate process $X_t(\omega) = \omega(t)$.

Def. "Sameness" of processes. Suppose two processes $X$ and $Y$ are defined on the same probability space $(\Omega, \mathcal{F}, P)$. Then (i) $X$ and $Y$ are called indistinguishable if $P(\forall t\ X_t = Y_t) = 1$; (ii) $X$ is a modification of $Y$ if $P(X_t = Y_t) = 1$ for each $t$.

Example. Consider a positive r.v. $\xi$ with a continuous distribution (i.e. such that $P(\xi = x) = 0$ for every $x$). Put $X_t = 0$ for all $t$ and let $Y_t$ be 1 for $t = \xi$ and 0 otherwise. Then $Y$ is a modification of $X$, but $P(\forall t\ X_t = Y_t) = 0$.

Exercise. Suppose $Y$ is a modification of $X$ and both processes have right-continuous sample paths. Then $X$ and $Y$ are indistinguishable. Hint: show that if $X$ is a modification of $Y$, then $P(\forall t \in \mathbb{Q}\ X_t = Y_t) = 1$.

Monotone class theorem. Let $\mathcal{S}$ be a collection of subsets of a set $\Omega$ s.t. (i) $\Omega \in \mathcal{S}$; (ii) $A, B \in \mathcal{S}$, $B \subset A$ ⇒ $A \setminus B \in \mathcal{S}$; (iii) $A_1 \subset A_2 \subset \ldots$, $A_n \in \mathcal{S}$ ⇒ $\cup_n A_n \in \mathcal{S}$. If a collection of subsets $\Gamma$ belongs to $\mathcal{S}$ and is closed under pairwise intersections, then $\sigma(\Gamma) \subset \mathcal{S}$.
This result is routinely used in stochastic analysis to check the validity of a certain property for the elements of $\sigma(\Gamma)$, where $\Gamma$ is a collection of subsets closed under intersections. According to the theorem it is sufficient to check that the validity of this property is preserved under set subtraction and countable monotone unions.

Theorem (strong law of large numbers). If $\xi_1, \xi_2, \ldots$ is a collection of iid r.v. with $E\xi_j = m$, then the means $(\xi_1 + \ldots + \xi_n)/n$ converge a.s. (and in $L^1$) to $m$.

? Riesz-Markov Theorem. Any positive bounded linear functional on the space $C_\infty(\mathbb{R}^d)$ has the form $f \mapsto \int f(x)\,\mu(dx)$ for some finite Borel measure $\mu$.

Section 2. Brownian motion: construction via Hilbert space methods.

Main Def. A Brownian motion (or a Wiener process) with variance $\sigma^2$ is a Gaussian process $B_t$ (defined on a probability space $(\Omega, \mathcal{F}, P)$) satisfying the following conditions: (i) $B_0 = 0$ a.s.; (ii) the increments $B_t - B_s$ have distribution $N(0, \sigma^2(t - s))$ for all $0 \le s < t$; (iii) the r.v. $B_{t_2} - B_{t_1}$ and $B_{t_4} - B_{t_3}$ are independent whenever $t_1 \le t_2 \le t_3 \le t_4$; (iv) the trajectories $t \mapsto B_t$ are continuous a.s. Brownian motion with $\sigma = 1$ is called the standard Wiener process or standard Brownian motion.

Exer. 1. A Gaussian process $B_t$ satisfying conditions (i) and (iv) of the above definition is a Brownian motion if and only if $EB_t = 0$ and $E(B_t B_s) = \sigma^2\min(s, t)$ for all $t, s$. Hint: $E(B_t B_s) = \sigma^2\min(s, t)$ implies $E((B_t - B_s)B_s) = 0$ for $t > s$. Hence $B_t - B_s$ and $B_s$ are uncorrelated and consequently independent (being Gaussian).

Exer. 2 (elementary transformations of BM). Let $B_t$ be a BM. Then so are the processes (i) $B_t^c = \frac{1}{\sqrt{c}} B_{ct}$ for any $c > 0$ (scaling), (ii) $-B_t$ (symmetry), (iii) $B_T - B_{T-t}$, $t \in [0, T]$, for any $T > 0$ (time reversal), (iv) $tB_{1/t}$ (time inversion). Hint: for (iv), in order to get continuity at the origin deduce from the law of large numbers that $B_t/t \to 0$ as $t \to \infty$ a.s.

Recall: Hilbert spaces, bases, Parseval's identity.

Def. The Haar functions $H_k^n$, $n = 1, 2, \ldots$, $k = 0, 1, \ldots, 2^{n-1} - 1$, on $[0, 1]$ are defined as
$$H_k^n(t) = \begin{cases} 2^{(n-1)/2}, & k/2^{n-1} \le t < (k + 1/2)/2^{n-1}, \\ -2^{(n-1)/2}, & (k + 1/2)/2^{n-1} \le t < (k+1)/2^{n-1}, \\ 0, & \text{otherwise}, \end{cases}$$
together with the constant function $H^0 \equiv 1$, and the Schauder functions as $S_k^n(t) = \int_0^t H_k^n(u)\,du$ (so that $S^0(t) = t$). The system of Haar functions is known to be an orthonormal basis in $L^2[0, 1]$.

Exer. 3. Check the orthogonality condition: $(H_k^n, H_l^m) = \int_0^1 H_k^n(x) H_l^m(x)\,dx = \delta^{nm}\delta_{kl}$. Hint: the supports of $H_k^n$ and $H_l^n$ do not intersect for $k \ne l$.

Let $\xi^0$ and $\xi_k^n$, $n = 1, 2, \ldots$, $k = 0, 1, \ldots, 2^{n-1} - 1$, be mutually independent $N(0, 1)$ r.v. on a probability space $(\Omega, \mathcal{F}, P)$.

Exer. 4. Point out a probability space $(\Omega, \mathcal{F}, P)$ on which such a family can be defined.

Consider the partial sums
$$B_t^m = \sum_{n=0}^m f_n(t, \omega), \qquad f_0(t, \omega) = \xi^0(\omega)\,t, \qquad f_n(t, \omega) = \sum_{k=0}^{2^{n-1}-1} \xi_k^n(\omega)\,S_k^n(t). \qquad (1)$$

The main technical ingredient of the construction is the following

Lemma. There exists a subset $\Omega_0 \subset \Omega$ such that $B_t^m$ converges as $m \to \infty$ uniformly on $[0, 1]$ for all $\omega \in \Omega_0$ and $P(\Omega_0) = 1$.

Proof. Let $M_n(\omega) = \max\{|\xi_j^n| : 0 \le j \le 2^{n-1} - 1\}$. Since
$$P(M_n > a) \le \sum_{j=0}^{2^{n-1}-1} P(|\xi_j^n| > a) = 2^n\,\frac{1}{\sqrt{2\pi}}\int_a^\infty e^{-x^2/2}\,dx \le 2^n\,\frac{1}{\sqrt{2\pi}}\int_a^\infty \frac{x}{a}\,e^{-x^2/2}\,dx = 2^n\,\frac{1}{\sqrt{2\pi}}\,a^{-1}e^{-a^2/2},$$
one sees that
$$\sum_{n=1}^\infty P(M_n > n) \le \frac{1}{\sqrt{2\pi}}\sum_{n=1}^\infty \frac{1}{n}\,2^n e^{-n^2/2} < \infty.$$
Hence by Borel-Cantelli $P(\Omega_0) = 1$, where $\Omega_0 = \{\omega : M_n(\omega) \le n \text{ for all large enough } n\}$.
Consequently, for $\omega \in \Omega_0$,
$$|f_n(t, \omega)| \le n\sum_{k=0}^{2^{n-1}-1} S_k^n(t) \le n\,2^{-(n+1)/2}$$
for all large enough $n$, because $\max_t S_k^n(t) = 2^{-(n+1)/2}$ and the functions $S_k^n$ have non-intersecting supports for different $k$. This implies that
$$\sum_{n=0}^\infty \max_{0 \le t \le 1} |f_n(t, \omega)| < \infty$$
on $\Omega_0$, which clearly implies the claim of the Lemma.

Main Theorem. Let $B_t$ denote the limit of (1) for $\omega \in \Omega_0$, and put $B_t = 0$ for $\omega$ outside $\Omega_0$. Then $B_t$ is a standard Brownian motion on $[0, 1]$.

Proof. Since $B_t$ is continuous in $t$ as a uniform limit of continuous functions, conditions (i) and (iv) of the definition hold. Moreover, the finite-dimensional distributions are clearly Gaussian and $EB_t = 0$. Next, since (summing over the whole Haar system, including the constant function)
$$\sum_{n,k} (S_k^n)^2(t) = \sum_{n,k} (1_{[0,t]}, H_k^n)^2 = (1_{[0,t]}, 1_{[0,t]}) = t < \infty$$
(by Parseval), it follows that
$$E[B_t - B_t^m]^2 = \sum_{n > m}\,\sum_{k=0}^{2^{n-1}-1} (S_k^n(t))^2 \to 0$$
as $m \to \infty$, and consequently $B_t^m$ converges to $B_t$ also in $L^2$. Hence one deduces that
$$E(B_t B_s) = \lim_{m\to\infty} E(B_t^m B_s^m) = \sum_{n,k} (1_{[0,t]}, H_k^n)(1_{[0,s]}, H_k^n) = (1_{[0,t]}, 1_{[0,s]}) = \min(t, s),$$
which completes the proof.

Corollary. A standard Brownian motion exists on $\{t \ge 0\}$.

Proof. By the Main Theorem there exists a sequence $(\Omega_n, \mathcal{F}_n, P_n)$, $n = 1, 2, \ldots$, of probability spaces with Brownian motions $W^n$ on each of them. Take the product probability space $\Omega$ and define $B$ on it recursively by
$$B_t = B_n + W^{n+1}_{t-n}, \qquad n \le t \le n + 1.$$
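To see the Haar-Schauder construction at work, here is a short Python sketch (illustrative only: the truncation level $N = 10$, the grid, the Monte Carlo sample size and the function names `schauder`, `brownian_path` are all choices made here, not part of the notes):

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 257)

def schauder(n, k, t):
    """Schauder tent S^n_k = integral of the Haar function H^n_k (n >= 1)."""
    left, mid, right = k / 2**(n-1), (k + 0.5) / 2**(n-1), (k + 1) / 2**(n-1)
    h = 2.0 ** ((n - 1) / 2)                   # height of H^n_k
    up = np.clip(t, left, mid) - left          # integral of +h over [left, mid)
    down = np.clip(t, mid, right) - mid        # integral of -h over [mid, right)
    return h * (up - down)

def brownian_path(t, N, rng):
    """Partial sum B^N_t = xi^0 t + sum_{n=1}^N sum_k xi^n_k S^n_k(t), as in (1)."""
    B = rng.standard_normal() * t              # term from the constant Haar function
    for n in range(1, N + 1):
        xi = rng.standard_normal(2 ** (n - 1))
        for k in range(2 ** (n - 1)):
            B = B + xi[k] * schauder(n, k, t)
    return B

paths = np.array([brownian_path(t, 10, rng) for _ in range(500)])
print("Var B_{1/2} ~", paths[:, 128].var())                 # should be near 0.5
print("E B_{1/2} B_{3/4} ~", (paths[:, 128] * paths[:, 192]).mean())  # near min = 0.5
```

Increasing the truncation level $N$ refines the path on finer and finer dyadic scales, exactly as the uniform convergence in the Lemma describes.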
Section 3. The construction of BM via Kolmogorov's continuity theorem.

The Kolmogorov-Chentsov Continuity Theorem. Suppose a process $X_t$, $t \in [0, T]$, on a probability space $(\Omega, \mathcal{F}, P)$ satisfies the condition
$$E|X_t - X_s|^\alpha \le C|t - s|^{1+\beta}, \qquad 0 \le s, t \le T,$$
for some positive constants $\alpha, \beta, C$. Then there exists a continuous modification $\tilde{X}_t$ of $X_t$ which is a.s. locally Hölder continuous with exponent $\gamma$ for every $\gamma \in (0, \beta/\alpha)$, i.e.
$$P\left[\omega : \sup_{s,t \in [0,T]:\,0 < |t-s| < h(\omega)} \frac{|\tilde{X}_t(\omega) - \tilde{X}_s(\omega)|}{|t - s|^\gamma} \le \delta\right] = 1, \qquad (1)$$
where $h(\omega)$ is an a.s. positive r.v. and $\delta > 0$ is a constant.

Proof. Step 1. By Chebyshev,
$$P(|X_t - X_s| \ge \epsilon) \le \epsilon^{-\alpha} E|X_t - X_s|^\alpha \le C\epsilon^{-\alpha}|t - s|^{1+\beta},$$
and hence $X_s \to X_t$ in probability as $s \to t$.

Step 2. Setting $t = k/2^n$, $s = (k-1)/2^n$, $\epsilon = 2^{-\gamma n}$ in the above inequality yields
$$P(|X_{k/2^n} - X_{(k-1)/2^n}| \ge 2^{-\gamma n}) \le C\,2^{-n(1+\beta-\alpha\gamma)}.$$
Hence
$$P\Big(\max_{1 \le k \le 2^n} |X_{k/2^n} - X_{(k-1)/2^n}| \ge 2^{-\gamma n}\Big) \le \sum_{k=1}^{2^n} P(|X_{k/2^n} - X_{(k-1)/2^n}| \ge 2^{-\gamma n}) \le C\,2^{-n(\beta - \alpha\gamma)}.$$
By Borel-Cantelli (using the assumption $\beta - \alpha\gamma > 0$) there exists $\Omega_0$ of measure 1 such that
$$\max_{1 \le k \le 2^n} |X_{k/2^n} - X_{(k-1)/2^n}| < 2^{-\gamma n} \qquad \forall n \ge n^\star(\omega), \qquad (2)$$
where $n^\star(\omega)$ is a positive, integer-valued r.v.

Step 3. For each $n \ge 1$ define $D_n = \{k/2^n : k = 0, 1, \ldots, 2^n\}$ and $D = \cup_{n=1}^\infty D_n$. For a given $\omega \in \Omega_0$ and $n \ge n^\star(\omega)$ we shall show that for all $m > n$
$$|X_t(\omega) - X_s(\omega)| \le 2\sum_{j=n+1}^m 2^{-\gamma j}, \qquad \forall t, s \in D_m : 0 < t - s < 2^{-n}. \qquad (3)$$
For $m = n + 1$ necessarily $t - s = 2^{-(n+1)}$ and (3) follows from (2) with $n$ replaced by $n+1$. Suppose (3) is valid for $m = n+1, \ldots, M-1$. Take $s < t$ with $s, t \in D_M$ and define the numbers
$$\tau_{\min} = \min\{u \in D_{M-1} : u \ge s\}, \qquad \tau_{\max} = \max\{u \in D_{M-1} : u \le t\},$$
so that $s \le \tau_{\min} \le \tau_{\max} \le t$ and $\max(\tau_{\min} - s,\ t - \tau_{\max}) \le 2^{-M}$. Hence from (2)
$$|X_{\tau_{\min}}(\omega) - X_s(\omega)| \le 2^{-\gamma M}, \qquad |X_{\tau_{\max}}(\omega) - X_t(\omega)| \le 2^{-\gamma M},$$
and from (3) with $m = M - 1$
$$|X_{\tau_{\max}}(\omega) - X_{\tau_{\min}}(\omega)| \le 2\sum_{j=n+1}^{M-1} 2^{-\gamma j},$$
which implies (3) with $m = M$.

Step 4. For $s, t \in D$ with $0 < t - s < h(\omega) = 2^{-n^\star(\omega)}$ choose $n \ge n^\star(\omega)$ s.t. $2^{-(n+1)} \le t - s < 2^{-n}$. By (3)
$$|X_t(\omega) - X_s(\omega)| \le 2\sum_{j=n+1}^\infty 2^{-\gamma j} \le 2(1 - 2^{-\gamma})^{-1}\,2^{-(n+1)\gamma} \le 2(1 - 2^{-\gamma})^{-1}\,|t - s|^\gamma,$$
which implies the uniform continuity of $X_t$ with respect to $t \in D$ for $\omega \in \Omega_0$.

Step 5. Define $\tilde{X}_t = \lim_{s \to t, s \in D} X_s$ for $\omega \in \Omega_0$ and zero otherwise. Then $\tilde{X}_t$ is continuous and satisfies (1) with $\delta = 2(1 - 2^{-\gamma})^{-1}$.

Step 6. $\tilde{X}_s = X_s$ for $s \in D$. Then $\tilde{X}_t = X_t$ a.s. for every $t$, because $X_s \to X_t$ in probability and $X_s \to \tilde{X}_t$ a.s. as $s \to t$, $s \in D$.

Exer. Show that for any $n \in \mathbb{N}$ there exists a constant $C_n$ s.t. $E|X|^{2n} = C_n\sigma^{2n}$ for a r.v. $X$ with normal distribution $N(0, \sigma^2)$.

Corollary 1. There exist a probability measure $P$ on $(\mathbb{R}^{[0,\infty)}, \mathcal{B}(\mathbb{R}^{[0,\infty)}))$ and a stochastic process $W_t$ on it which is a BM under $P$.

Proof. By Kolmogorov's existence theorem there exists $P$ s.t. the coordinate process $X_t$ satisfies all the required properties except continuity (if needed, details are given for general Markov processes in Section 6). By Kolmogorov's continuity theorem and the Exercise above, for each $T$ there exists a continuous modification $W^T$ on $[0, T]$. Set
$$\Omega_T = \{\omega : W_t^T(\omega) = X_t(\omega)\ \forall t \in [0, T] \cap \mathbb{Q}\}, \qquad \Omega_0 = \cap_{T=1}^\infty \Omega_T.$$
As $W_t^T = W_t^S$ for $t \in [0, \min(T, S)]$ (being continuous modifications of each other), their common values define the required process for all $t \ge 0$.

Corollary 2. BM is a.s. locally Hölder continuous with any exponent $\gamma \in (0, 1/2)$.

Proof. From Kolmogorov's theorem and the above exercise it follows (taking $\alpha = 2n$, $\beta = n - 1$) that BM is a.s. Hölder continuous with any exponent $\gamma < (n-1)/2n$ for some positive integer $n$, and $(n-1)/2n \to 1/2$ as $n \to \infty$.

CHAPTER 2. LÉVY, MARKOV AND FELLER PROCESSES.

Section 4. Processes with s.i. (stationary and independent) increments.

Def. A probability measure $\mu$ on $\mathbb{R}^d$ with ch.f. $\phi_\mu$ is called infinitely divisible if for every $n \in \mathbb{N}$ there exists a probability measure $\nu$ such that $\mu = \nu \star \ldots \star \nu$ ($n$ times) ⇔ $\phi_\mu(y) = f^n(y)$ with $f$ the ch.f. of a probability measure.

Exer. 1. Convince yourself that the two definitions above are actually equivalent.

Def. and Exer. A r.v. $X$ is called infinitely divisible whenever its law $p_X$ is infinitely divisible. Show that this is equivalent to the existence, for any $n$, of iid r.v. $Y_j$, $j = 1, \ldots, n$, s.t. $Y_1 + \ldots + Y_n$ has the law $p_X$.

Exer. 2. Convince yourself that any Gaussian distribution is infinitely divisible.

Examples. (i) A r.v. $N$ with range in the non-negative integers is called Poisson with mean (or parameter) $c > 0$ if
$$P(N = n) = \frac{c^n}{n!}\,e^{-c}.$$
Check (Exer.!) that $E(N) = \mathrm{Var}(N) = c$ and that the ch.f. of $N$ is $\phi_N(y) = \exp\{c(e^{iy} - 1)\}$. This implies that $N$ is infinitely divisible.

(ii) Let now $Z(n)$, $n \in \mathbb{N}$, be a sequence of $\mathbb{R}^d$-valued iid r.v. with law $\mu_Z$, independent of a Poisson r.v. $N$ with parameter $c$. The compound Poisson r.v. is $X = Z(1) + \ldots + Z(N)$ (a random walk with a random number of steps). Let us check that
$$\phi_X(y) = \exp\left\{\int_{\mathbb{R}^d} (e^{i(y,x)} - 1)\,c\,\mu_Z(dx)\right\}.$$
In fact,
$$\phi_X(y) = \sum_{n=0}^\infty E(\exp\{i(y, Z(1) + \ldots + Z(N))\}\,|\,N = n)\,P(N = n)$$
$$= \sum_{n=0}^\infty E(\exp\{i(y, Z(1) + \ldots + Z(n))\})\,\frac{c^n}{n!}e^{-c} = \sum_{n=0}^\infty \phi_Z^n(y)\,\frac{c^n}{n!}e^{-c} = \exp\{c(\phi_Z(y) - 1)\}.$$
(A simulation sketch of this computation is given below, after Theorem 1.)

Def. A Borel measure $\nu$ on $\mathbb{R}^d \setminus \{0\}$ is called a Lévy measure if
$$\int_{\mathbb{R}^d \setminus \{0\}} \min(1, |x|^2)\,\nu(dx) < \infty.$$

Theorem 1 (the Lévy-Khintchine formula). For any $b \in \mathbb{R}^d$, any positive definite $d \times d$ matrix $A$ and any Lévy measure $\nu$, the function
$$\phi(u) = \exp\left\{i(b, u) - \frac{1}{2}(u, Au) + \int_{\mathbb{R}^d \setminus \{0\}} [e^{i(u,y)} - 1 - i(u, y)1_{B_1}(y)]\,\nu(dy)\right\} \qquad (1)$$
is the characteristic function of an infinitely divisible measure, where $B_a$ denotes the ball of radius $a$ in $\mathbb{R}^d$ centered at the origin. Conversely, any infinitely divisible distribution has a characteristic function of this form.

Proof (in one direction only). If any function of the form (1) is a ch.f., then it is infinitely divisible (as its $n$-th convolution roots have the same form). It remains to show that (1) is indeed a ch.f. To this end we introduce the approximations
$$\phi_n(u) = \exp\left\{i\Big(b - \int_{B_1 \setminus B_{1/n}} y\,\nu(dy),\ u\Big) - \frac{1}{2}(u, Au) + \int_{\mathbb{R}^d \setminus B_{1/n}} (e^{i(u,y)} - 1)\,\nu(dy)\right\}. \qquad (2)$$
Each $\phi_n$ is a ch.f. (of the convolution of a normal distribution and an independent compound Poisson one) and $\phi_n(u) \to \phi(u)$ for every $u$. By Lévy's theorem one only needs to show that $\phi$ is continuous at zero. This is easy (check it!).
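The compound Poisson ch.f. computed in Example (ii) above can be checked by simulation. A minimal Python sketch (illustrative parameters: intensity $c = 3$, standard normal jumps, one test point $y$):

```python
import numpy as np

rng = np.random.default_rng(2)
c, n = 3.0, 100_000                        # Poisson intensity and sample size

# Compound Poisson samples X = Z(1) + ... + Z(N), N ~ Poisson(c), Z iid N(0,1).
N = rng.poisson(c, size=n)
X = np.array([rng.standard_normal(k).sum() for k in N])

y = 1.3
phi_Z = np.exp(-0.5 * y**2)                # ch.f. of N(0,1) at y
exact = np.exp(c * (phi_Z - 1))            # exp{c (phi_Z(y) - 1)}
print(np.mean(np.exp(1j * y * X)), exact)  # empirical vs exact ch.f.
```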
Def. Writing $\phi(u) = e^{\eta(u)}$ in (1), the mapping $\eta$ is called the characteristic exponent, the Lévy exponent or the Lévy symbol of $\phi$ (or of its distribution).

? Theorem 2. Any infinitely divisible probability measure $\mu$ is a weak limit of a sequence of compound Poisson distributions.

Proof. Let $\phi$ be the ch.f. of $\mu$, so that $\phi^{1/n}$ is the ch.f. of its "convolution root" $\mu_n$. Define
$$\phi_n(u) = \exp\{n[\phi^{1/n}(u) - 1]\} = \exp\left\{\int_{\mathbb{R}^d} (e^{i(u,y)} - 1)\,n\,\mu_n(dy)\right\}.$$
Each $\phi_n$ is the ch.f. of a compound Poisson distribution and
$$\phi_n = \exp\{n(e^{(1/n)\ln\phi(u)} - 1)\} \to \phi(u), \qquad n \to \infty.$$
The proof is completed by Glivenko's theorem.

Def. Processes with s.i. increments. A process $X = X_t$, $t \ge 0$, has independent increments if for any collection of times $0 \le t_1 < \ldots < t_{n+1}$ the r.v. $X_{t_{j+1}} - X_{t_j}$, $j = 1, \ldots, n$, are independent, and it has stationary increments if $X_t - X_s$ is distributed like $X_{t-s} - X_0$ for any $t > s$. $X$ is a Lévy process if (i) $X_0 = 0$ a.s.; (ii) $X$ has s.i. increments; (iii) $X$ is stochastically continuous, i.e. for all $a > 0$, $s \ge 0$,
$$\lim_{t \to s} P(|X_t - X_s| > a) = 0.$$
Under (i), (ii) the latter is equivalent to $\lim_{t\to 0} P(|X_t| > a) = 0$ for all $a > 0$.

An alternative version of the definition of Lévy processes requires right continuity of the paths instead of stochastic continuity. At the end of the day this leads to the same class of processes because, on the one hand, the conclusions of Theorems 3 and 4 below are easily seen to remain valid under this assumption (which implies stochastic continuity), and on the other hand, any Lévy process as defined above has a right-continuous modification, as we shall see later. So we shall usually consider the right-continuous modifications of Lévy processes.

Theorem 3. If $X_t$ is stochastically continuous, then the map $t \mapsto \phi_{X_t}(u)$ is continuous for each $u$.

Proof. Follows from
$$|\phi_{X_t}(u) - \phi_{X_s}(u)| = \left|\int e^{i(u, X_s)}\big[e^{i(u, X_t - X_s)} - 1\big](\omega)\,P(d\omega)\right| \le \int |e^{i(u,y)} - 1|\,p_{X_t - X_s}(dy) \le \sup_{|y| < \delta} |e^{i(u,y)} - 1| + 2P(|X_t - X_s| > \delta).$$

Exer. 3. Let a right-continuous function $f : \mathbb{R}_+ \to \mathbb{C}$ satisfy $f(t + s) = f(t)f(s)$ and $f(0) = 1$. Show that $f(t) = e^{t\alpha}$ for some $\alpha$. Hint: consider first $t \in \mathbb{N}$, then $t \in \mathbb{Q}$, then use continuity.

Theorem 4. If $X$ is a Lévy process, then $X_t$ is infinitely divisible for each $t$ and
$$\phi_{X_t}(u) = e^{t\eta(u)},$$
where $\eta(u)$ is the Lévy symbol of $X_1$.

Proof. $\phi_{X_{t+s}}(u) = \phi_{X_t}(u)\phi_{X_s}(u)$ and $\phi_{X_0}(u) = 1$. Hence by Exer. 3, $\phi_{X_t} = \exp\{t\alpha(u)\}$ for some $\alpha$. But $\phi_{X_1} = \exp\{\eta(u)\}$, so $\alpha = \eta$.

Example. Convince yourself that Brownian motion is a Lévy process.

Def. The Poisson process of intensity $c > 0$ is a right-continuous Lévy process $N_t$ s.t. each r.v. $N_t$ is Poisson with parameter $tc$.

Construction of Poisson processes. The existence of Poisson processes can be obtained from the following explicit construction. Let $\tau_1, \tau_2, \ldots$ be a sequence of iid exponential r.v. with parameter $c > 0$, i.e. $P(\tau_i > s) = e^{-cs}$, $s > 0$. Introduce the partial sums $S_n = \tau_1 + \ldots + \tau_n$. These sums have the Gamma$(c, n)$ distributions
$$P(S_n \in ds) = \frac{c^n s^{n-1}}{(n-1)!}\,e^{-cs}\,ds$$
(Exer.: check this by induction, taking into account that the distribution of $S_n$ is the convolution of the distributions of $S_{n-1}$ and $\tau_n$). Define $N_t$ as the right-continuous inverse of $S_n$, i.e.
$$N_t = \sup\{n \in \mathbb{N} : S_n \le t\},$$
so that $P(S_k \le t) = P(N_t \ge k)$ and
$$P(N_t = n) = P(S_n \le t, S_{n+1} > t) = \int_0^t \frac{c^n s^{n-1}}{(n-1)!}\,e^{-cs}\,e^{-c(t-s)}\,ds = e^{-ct}\,\frac{(ct)^n}{n!}.$$
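The construction just described translates directly into code. A small Python sketch (intensity, horizon and trial count are arbitrary illustrative choices):

```python
import math
import numpy as np

rng = np.random.default_rng(3)
c, t, n = 2.0, 1.5, 100_000            # intensity, time horizon, number of trials

def N_t(rng):
    """Right-continuous inverse of the partial sums S_n of iid Exp(c) waiting times."""
    total, count = 0.0, 0
    while True:
        total += rng.exponential(1.0 / c)   # numpy parametrizes Exp by its mean 1/c
        if total > t:
            return count
        count += 1

counts = np.array([N_t(rng) for _ in range(n)])
for k in range(6):
    emp = np.mean(counts == k)
    exact = math.exp(-c * t) * (c * t) ** k / math.factorial(k)
    print(k, round(emp, 4), round(exact, 4))    # empirical vs Poisson(ct) weights
```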
Exer. 4. Prove that the process $N_t$ constructed above is in fact a Lévy process by showing that
$$P(N_{t+r} - N_t \ge n,\ N_t = k) = P(N_r \ge n)P(N_t = k) = P(S_n \le r)P(N_t = k). \qquad (3)$$
Hint: take, say, $n > 1$ (the cases $n = 0$ or $1$ are even simpler) and observe that the l.h.s. of (3) is the probability of the event
$$(S_k \le t,\ S_{k+1} > t,\ S_{n+k} \le t + r) = (S_k = s \le t,\ \tau_{k+1} = \tau > t - s,\ S_{n+k} - S_{k+1} = v \le (t+r) - (s + \tau)),$$
so that by independence the l.h.s. of (3) equals
$$\int_0^t \frac{c^k s^{k-1}}{(k-1)!}\,e^{-cs}\,ds \int_{t-s}^\infty c e^{-c\tau}\,d\tau \int_0^{(t+r)-(\tau+s)} \frac{c^{n-1}}{(n-2)!}\,v^{n-2} e^{-cv}\,dv,$$
which, changing $\tau$ to $\tau + s$ and denoting it again by $\tau$, rewrites as
$$\int_0^t \frac{c^k}{(k-1)!}\,s^{k-1}\,ds \int_t^\infty c e^{-c\tau}\,d\tau \int_0^{t+r-\tau} \frac{c^{n-1}}{(n-2)!}\,v^{n-2} e^{-cv}\,dv.$$
By calculating the integral over $ds$ and changing the order of $v$ and $\tau$ this in turn rewrites as
$$\frac{(ct)^k}{k!} \int_0^r \frac{c^{n-1}}{(n-2)!}\,v^{n-2} e^{-cv}\,dv \int_t^{t+r-v} c e^{-c\tau}\,d\tau = e^{-ct}\,\frac{(ct)^k}{k!} \int_0^r \frac{c^{n-1}}{(n-2)!}\,v^{n-2}\,(e^{-cv} - e^{-cr})\,dv.$$
It remains to see that by integration by parts this last integral equals
$$\int_0^r \frac{c^n s^{n-1}}{(n-1)!}\,e^{-cs}\,ds = P(S_n \le r),$$
and (3) follows.

Exer. 5 and Def. Let $Z(n)$, $n \in \mathbb{N}$, be a sequence of $\mathbb{R}^d$-valued iid r.v. with law $\mu_Z$. The compound Poisson process (with jump distribution $\mu_Z$ and intensity $\lambda$) is defined as
$$Y_t = Z(1) + \ldots + Z(N_t),$$
where $N_t$ is a Poisson process of intensity $\lambda$ independent of the $Z(n)$. The corresponding compensated compound Poisson process is defined as $\tilde{Y}_t = Y_t - t\lambda EZ(1)$. From the above calculation of the ch.f. of a compound Poisson r.v. it follows that $Y_t$ is a Lévy process with Lévy exponent
$$\eta(u) = \int (e^{i(u,y)} - 1)\,\lambda\,\mu_Z(dy). \qquad (4)$$
Check (i) that $Y_t$ is a Lévy process and (ii) that $E\tilde{Y}_t = 0$. Hint: to check condition (iii) in the definition of Lévy processes write
$$P(|Y_t| > a) = \sum_{n=0}^\infty P(|Z(1) + \ldots + Z(n)| > a)\,P(N_t = n)$$
and use dominated convergence (alternatively this follows from the obvious right continuity).

Remark. The existence of a Lévy process with a given characteristic exponent can be proved by various constructions. The fastest way is to carry out on the level of processes the limiting procedure outlined for r.v. in our proof of Theorem 1 (in other words, via the Lévy-Itô decomposition described below). But we shall obtain the existence (of a right-continuous modification) later by a more general procedure (applied to all Feller processes) in three steps: (i) building finite-dimensional distributions via the Markov property, (ii) using Kolmogorov's existence theorem for a canonical process, (iii) defining a right-continuous modification via martingale methods.

Def. A Lévy process $X_t$ with characteristic exponent
$$\eta(u) = i(b, u) - \frac{1}{2}(u, Au) \qquad (5)$$
(where $A$ is a positive definite $d \times d$ matrix and $b \in \mathbb{R}^d$) and with a.s. continuous paths is called the $d$-dimensional Brownian motion with covariance $A$ and drift $b$. It is called standard if $A = I$, $b = 0$.

Exer. 6. Show that this is equivalent to saying that $X_t = B_t$ is a Gaussian process s.t. (i) $B_0 = 0$ a.s.; (ii) the increments $B_t - B_s$ have normal distribution $N((t-s)b, (t-s)A)$ for all $0 \le s < t$; (iii) the r.v. $B_{t_2} - B_{t_1}$ and $B_{t_4} - B_{t_3}$ are independent whenever $t_1 \le t_2 \le t_3 \le t_4$; (iv) the trajectories $t \mapsto B_t$ are continuous a.s.

Exer. 7. Prove the existence of a BM $B_t$ with a given drift and covariance. Hint: first construct a standard $d$-dimensional BM $W_t$ using product measure spaces, then define $B_t = bt + \sqrt{A}\,W_t$.

By $\Delta X_t = X_t - X_{t-}$ we shall denote the jumps of $X_t$.

Theorem 5 (Lévy-Itô decomposition). Let $X_t$ be a right-continuous Lévy process with characteristic exponent
$$\eta(u) = i(b, u) - \frac{1}{2}(u, Au) + \int_{\mathbb{R}^d \setminus \{0\}} [e^{i(u,y)} - 1 - i(u, y)1_{B_1}(y)]\,\nu(dy). \qquad (6)$$
Then $X_t$ can be represented as the sum of three independent Lévy processes, $X_t = X_t^1 + X_t^2 + X_t^3$, where $X_t^1$ is the BM with drift specified by the Lévy exponent (5),
$$X_t^2 = \sum_{s \le t} \Delta X_s\,1_{|\Delta X_s| > 1}$$
is a compound Poisson process with exponent
$$\eta_2(u) = \int_{\mathbb{R}^d \setminus B_1} [e^{i(u,y)} - 1]\,\nu(dy) \qquad (7)$$
obtained by summing the jumps of $X_t$ of size exceeding 1, and $X_t^3$ is the limit of the compensated compound Poisson processes $X_t^3(n)$ with exponents
$$\eta_3^{(n)}(u) = \int_{B_1 \setminus B_{1/n}} [e^{i(u,y)} - 1]\,\nu(dy) - i\Big(u, \int_{B_1 \setminus B_{1/n}} y\,\nu(dy)\Big).$$
The process $X_t^3$ has jumps only of size not exceeding 1 and has all finite moments $E|X_t^3|^m$, $m > 0$.

Proof. Straightforward from (1) and (2). In particular, the product form of the ch.f. (1) ensures the independence of the $X^i$, $i = 1, 2, 3$; formula (7) comes by comparison with (4); and the moments of $X_t^3$ are controlled by the finite integrals
$$\int_{B_1} |y|^{2k}\,\nu(dy) < \infty, \qquad k = 1, 2, \ldots$$

Corollary 1. The only continuous Lévy processes are BMs with drift and deterministic processes (pure drifts).

Corollary 2. For any collection of disjoint Borel sets $A_i$, $i = 1, \ldots, n$, not containing zero in their closures, the processes
$$X_t^{A_i} = \sum_{s \le t} \Delta X_s\,1_{\Delta X_s \in A_i}$$
are independent compound Poisson processes with characteristic exponents
$$\eta_{A_i}(u) = \int_{A_i} (e^{i(u,y)} - 1)\,\nu(dy), \qquad (8)$$
and $X_t - \sum_{j=1}^n X_t^{A_j}$ is a Lévy process independent of all the $X^{A_j}$ with jumps only outside $\cup_j A_j$. Moreover, the processes $N(t, A_i)$ counting the number of jumps of $X_t$ (or of $X_t^{A_i}$) in $A_i$ up to time $t$ are independent Poisson processes of intensity $\nu(A_i)$.

Def. Let $\mu$ be a σ-finite measure on a metric space $E$ (we need only the case of $E$ a Borel subset of $\mathbb{R}^d$). A collection of r.v. $\phi(B)$ parametrized by Borel subsets $B$ of $E$ is a Poisson random measure with intensity $\mu$ if each $\phi(B)$ is a Poisson r.v. with parameter $\mu(B)$ and if $\phi(B_1), \ldots, \phi(B_n)$ are independent whenever $B_1, \ldots, B_n$ are disjoint.

Corollary 3. The collection of r.v. $N((s, t], A) = N(t, A) - N(s, A)$ (notation from Corollary 2) counting the number of jumps of $X_t$ with sizes in $A$ that occur in the time interval $(s, t]$ specifies a Poisson random measure on $(0, \infty) \times (\mathbb{R}^d \setminus \{0\})$ with intensity $dt \otimes \nu$.

Remark. To prove the existence of a Lévy process in the spirit of Theorem 5 one needs two additional ingredients: the existence of a Poisson random measure with an arbitrary intensity (which is rather easy), used to construct the jump processes $N(t, A)$, and a proof of the convergence of the approximations $X_t^3(n)$ (see Theorem 5), for which one needs Doob's maximal inequality for martingales (which can be derived as a consequence of Doob's optional sampling, given in Section 8).
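A rough simulation in the spirit of Theorem 5 (an illustrative sketch only: it builds $X^1 + X^2$ on a grid, with an arbitrary big-jump law of mass $\lambda$ outside $B_1$; the compensated small-jump part $X^3$ is omitted, and all parameters are made up):

```python
import numpy as np

rng = np.random.default_rng(4)
T, steps = 1.0, 1000
dt = T / steps
b, sigma = 0.5, 1.0                  # drift and diffusion of X^1
lam = 2.0                            # nu(R \ B_1): total intensity of big jumps

t = np.linspace(dt, T, steps)
# X^1: Brownian motion with drift, built from iid N(0, dt) increments.
X1 = b * t + sigma * np.sqrt(dt) * np.cumsum(rng.standard_normal(steps))

# X^2: compound Poisson big jumps; magnitudes 1 + Exp(1) keep |jump| > 1.
n_jumps = rng.poisson(lam * T)
jump_times = np.sort(rng.uniform(0, T, n_jumps))
jump_sizes = np.where(rng.random(n_jumps) < 0.5, 1, -1) * (1 + rng.exponential(1.0, n_jumps))
X2 = np.array([jump_sizes[jump_times <= s].sum() for s in t])

X = X1 + X2                          # Levy path without the small-jump part
print("X_T =", X[-1], " with", n_jumps, "big jumps")
```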
Def. The non-decreasing Lévy processes with values in $\mathbb{R}_+$ are called subordinators.

Theorem 6. A real valued Lévy process $X_t$ is a subordinator iff its characteristic exponent has the form
$$\eta(u) = ibu + \int_0^\infty (e^{iuy} - 1)\,\nu(dy), \qquad (9)$$
where $b \ge 0$ and the Lévy measure $\nu$ has support in $\mathbb{R}_+$ and satisfies the additional condition
$$\int_0^1 x\,\nu(dx) < \infty. \qquad (10)$$
Moreover
$$X_t = tb + \sum_{s \le t} \Delta X_s.$$

Proof. First, if $X$ is positive, then it can only increase from $X_0 = 0$; hence by the s.i. property it is a non-decreasing process, and consequently the Lévy measure has support in $\mathbb{R}_+$ and $X$ contains no Brownian part, i.e. $A = 0$ in (6). Next, the small-jump process $\sum_{s \le t} \Delta X_s\,1_{|\Delta X_s| \le 1}$ has jumps bounded by 1 and hence finite expectation (as for $X^3$ in Theorem 5), and by monotone convergence
$$E\sum_{s \le t} \Delta X_s\,1_{|\Delta X_s| \le 1} = \lim_{\epsilon\to 0} E\sum_{s \le t} \Delta X_s\,1_{\epsilon \le \Delta X_s \le 1} = t\lim_{\epsilon\to 0}\int_\epsilon^1 x\,\nu(dx) = t\int_0^1 x\,\nu(dx),$$
implying (10).

Def. Clearly for a subordinator $X_t$ the Laplace transform is well defined:
$$Ee^{-\lambda X_t} = \exp\{-t\Phi(\lambda)\}, \qquad (11)$$
where
$$\Phi(\lambda) = -\eta(i\lambda) = b\lambda + \int_0^\infty (1 - e^{-\lambda y})\,\nu(dy) \qquad (12)$$
is called the Laplace exponent or cumulant.

Def. A subordinator $X_t$ is a one-sided stable process if to each $a \ge 0$ there corresponds a constant $b(a) \ge 0$ s.t. $aX_t$ and $X_{b(a)t}$ have the same law.

Exer. 8 (exponents of stable subordinators). (i) Show that $b(a)$ in this definition is continuous and satisfies the equation $b(ac) = b(a)b(c)$; hence deduce that $b(a) = a^\alpha$ with some $\alpha > 0$ called the index of stability or stability exponent. (ii) Deduce further that $\Phi(a) = b(a)\Phi(1)$ and hence
$$Ee^{-uX_t} = \exp\{-tru^\alpha\} \qquad (13)$$
with a constant $r > 0$, called the rate. Taking into account that $\Phi$ from (12) is increasing and concave, deduce that necessarily $\alpha \in (0, 1)$. (iii) Show that for $\alpha \in (0, 1)$
$$\int_0^\infty (1 - e^{-uy})\,\frac{dy}{y^{1+\alpha}} = \frac{\Gamma(1-\alpha)}{\alpha}\,u^\alpha \qquad (14)$$
by using integration by parts to rewrite the l.h.s. of this equation as
$$\frac{u}{\alpha}\int_0^\infty e^{-uy}\,y^{-\alpha}\,dy.$$
Deduce that the stable subordinator with index $\alpha$ and rate $r$ described by (13) has the Laplace exponent (12) with $b = 0$ and the Lévy measure
$$\nu(dy) = \frac{r\alpha}{\Gamma(1-\alpha)}\,y^{-(1+\alpha)}\,dy. \qquad (15)$$

Exer. 9. Prove the law of large numbers for a Poisson process $N_t$ of intensity $c$: $N_t/t \to c$ a.s. as $t \to \infty$. Hint: use the construction of $N_t$ given above and the fact that $S_n/n \to \frac{1}{c}$ a.s. as $n \to \infty$ according to the usual law of large numbers.

Section 5. Conditioning.

Def. For a given measure space $(S, \mathcal{F}, \mu)$, a measure $\nu$ on $(S, \mathcal{F})$ is called absolutely continuous with respect to $\mu$ if $\nu(A) = 0$ whenever $A \in \mathcal{F}$ and $\mu(A) = 0$. Two measures are called equivalent if they are mutually absolutely continuous.

The Radon-Nikodym Theorem. If $\mu$ is σ-finite and $\nu$ is finite and absolutely continuous with respect to $\mu$, then there exists a unique (up to almost sure equality) non-negative measurable function $g$ on $S$ such that for all $A \in \mathcal{F}$
$$\nu(A) = \int_A g(x)\,\mu(dx).$$
This $g$ is called the Radon-Nikodym derivative of $\nu$ with respect to $\mu$ and is often denoted $d\nu/d\mu$.

Def. Conditional expectation. Let $X$ be an integrable r.v. on a probability space $(\Omega, \mathcal{F}, P)$ and let $\mathcal{G}$ be a sub-σ-algebra of $\mathcal{F}$. If $X \ge 0$ everywhere, the formula $Q_X(A) = E(X 1_A)$ for $A \in \mathcal{G}$ defines a measure $Q_X$ on $(\Omega, \mathcal{G})$ that is obviously absolutely continuous with respect to $P$. The r.v. $E(X|\mathcal{G}) = dQ_X/dP$ on $(\Omega, \mathcal{G}, P)$ is called the conditional expectation of $X$ with respect to $\mathcal{G}$. If $X$ is not supposed to be positive, one defines the conditional expectation as $E(X|\mathcal{G}) = E(X^+|\mathcal{G}) - E(X^-|\mathcal{G})$. In other words, $Y = E(X|\mathcal{G})$ is a r.v. on $(\Omega, \mathcal{G}, P)$ such that
$$\int_A Y(\omega)\,P(d\omega) = \int_A X(\omega)\,P(d\omega) \qquad (1)$$
for all $A \in \mathcal{G}$. If $X = (X_1, \ldots, X_d) \in \mathbb{R}^d$, then $E(X|\mathcal{G}) = (E(X_1|\mathcal{G}), \ldots, E(X_d|\mathcal{G}))$.

Exer. 1. Let a σ-algebra $\mathcal{G}$ be finite, namely generated by a finite collection of disjoint sets $G_i \in \mathcal{F}$, $i = 1, \ldots, n$, with $\Omega = \cup_{i=1}^n G_i$. Show that $E(X|\mathcal{G})$ is constant on each $G_i$, the constant being $\frac{1}{P(G_i)}\int_{G_i} X(\omega)\,P(d\omega)$ whenever $P(G_i) > 0$.

Theorem 1 (key properties of conditional expectation). (i) $E(E(X|\mathcal{G})) = E(X)$; (ii) if $Y$ is $\mathcal{G}$-measurable, then $E(XY|\mathcal{G}) = Y E(X|\mathcal{G})$ a.s.; (iii) if $Y$ is $\mathcal{G}$-measurable and $X$ is independent of $\mathcal{G}$, then $E(XY|\mathcal{G}) = Y E(X)$ a.s. and
$$E(f(X, Y)|\mathcal{G}) = G_f(Y) \qquad (2)$$
a.s. for any bounded Borel function $f$, where $G_f(y) = E(f(X, y))$; (iv) if $\mathcal{H}$ is a sub-σ-algebra of $\mathcal{G}$, then $E(E(X|\mathcal{G})|\mathcal{H}) = E(X|\mathcal{H})$ a.s.; (v) the mapping $X \mapsto E(X|\mathcal{G})$ is an orthogonal projection $L^2(\Omega, \mathcal{F}, P) \to L^2(\Omega, \mathcal{G}, P)$; (vi) $X_1 \le X_2$ ⇒ $E(X_1|\mathcal{G}) \le E(X_2|\mathcal{G})$ a.s.; (vii) the mapping $X \mapsto E(X|\mathcal{G})$ is a linear contraction $L^1(\Omega, \mathcal{F}, P) \to L^1(\Omega, \mathcal{G}, P)$.
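Exer. 1 and property (i) of Theorem 1 are easy to see numerically. A minimal Python sketch (the sample space $[0,1]$, the partition into four cells and the test r.v. are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200_000
omega = rng.uniform(0.0, 1.0, n)               # sample points of Omega = [0, 1]
X = np.sin(2 * np.pi * omega) + omega          # an integrable r.v. on it

# G generated by the partition G_i = [i/4, (i+1)/4): E(X|G) is constant on each
# cell, the constant being the normalized P-average of X over that cell.
cells = np.minimum((omega * 4).astype(int), 3)
cell_means = np.array([X[cells == i].mean() for i in range(4)])
cond_exp = cell_means[cells]                   # the r.v. E(X|G), piecewise constant

print("cell averages:", np.round(cell_means, 4))
print("E(E(X|G)) =", cond_exp.mean(), " E(X) =", X.mean())   # property (i)
```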
Exer. 2. Prove the above theorem. Hint: for (ii) consider first the case of $Y$ an indicator function of a $\mathcal{G}$-measurable set; for (v) assume $X = Y + Z$ with $Y$ from $L^2(\Omega, \mathcal{G}, P)$ and $Z$ from its orthogonal complement, and show that $Y = E(X|\mathcal{G})$; (vi) follows from the obvious remark that $X \ge 0 \Rightarrow E(X|\mathcal{G}) \ge 0$.

Exer. 3. Give an alternative construction of the conditional expectation (proving all its properties) by-passing the Radon-Nikodym theorem: define it by property (v) of the above theorem.

Def. If $Z$ is a r.v. on $(\Omega, \mathcal{F}, P)$ one calls $E(X|\sigma(Z))$ the conditional expectation of $X$ with respect to $Z$ and denotes it shortly by $E(X|Z)$.

Exercise 4 and Def. Show that the r.v. $E(X|Z)$ is constant on any $Z$-level set $\{\omega : Z(\omega) = z\}$. One denotes this constant by $E(X|Z = z)$ and calls it the conditional expectation of $X$ given $Z = z$. Show that
$$E(X) = \int E(X|Z)(\omega)\,P(d\omega) = \int E(X|Z = z)\,p_Z(dz). \qquad (3)$$
Hint: use formula (1) of Section 1 with the function $f(Z(\omega)) = E(X|Z)(\omega) = E(X|Z = z(\omega))$.

Def. Let $X$ and $Z$ be $\mathbb{R}^d$- and $\mathbb{R}^m$-valued r.v. on $(\Omega, \mathcal{F}, P)$, and let $\mathcal{G}$ be a sub-σ-algebra of $\mathcal{F}$. The conditional probabilities of $X$ given $\mathcal{G}$ and of $X$ given $Z = z$ are defined respectively as
$$P_{X|\mathcal{G}}(B; \omega) \equiv P(X \in B|\mathcal{G})(\omega) = E(1_B(X)|\mathcal{G})(\omega), \quad \omega \in \Omega;$$
$$P_{X|Z=z}(B) \equiv P(X \in B|Z = z) = E(1_B(X)|Z = z)$$
for Borel sets $B$, or equivalently through the equations
$$E(f(X)|\mathcal{G})(\omega) = \int_{\mathbb{R}^d} f(x)\,P_{X|\mathcal{G}}(dx; \omega), \qquad (4)$$
$$E(f(X)|Z = z) = \int_{\mathbb{R}^d} f(x)\,P_{X|Z=z}(dx)$$
for bounded Borel functions $f$. Of course $P_{X|Z=z}(B)$ is just the common value of $P_{X|Z}(B; \omega)$ on the set $\{\omega : Z(\omega) = z\}$. It is possible to show (though this is not obvious) that a regular conditional probability of $X$ given $\mathcal{G}$ exists, i.e. a version of the conditional probability such that $P_{X|\mathcal{G}}(B; \omega)$ is a probability measure on $\mathbb{R}^d$ as a function of $B$ for each $\omega$ (notice that from the above discussion the required additivity of conditional expectations holds only a.s., so that they may fail to define a probability measure even a.s.) and is $\mathcal{G}$-measurable as a function of $\omega$. Hence one can define conditional r.v. $X_\mathcal{G}(\omega)$, $X_Z(\omega)$ and $X_{Z=z}$ as r.v. with the corresponding conditional distributions.

Exer. 5. For a Borel function $h$,
$$Eh(X, Z) = \int h(x, z)\,P_{X|Z=z}(dx)\,p_Z(dz) \qquad (5)$$
(if the l.h.s. is well defined). Hint: from the above definitions
$$\int_A \int_{\mathbb{R}^d} f(x)\,P_{X|\mathcal{G}}(dx; \omega)\,P(d\omega) = \int_A f(X(\omega))\,P(d\omega), \qquad A \in \mathcal{G},$$
and in particular
$$\int_C \int_{\mathbb{R}^d} f(x)\,P_{X|Z=z}(dx)\,p_Z(dz) = \int 1_{Z \in C}(\omega)\,f(X(\omega))\,P(d\omega), \qquad C \in \mathcal{B}(\mathbb{R}^m).$$
Hence
$$\int_{\mathbb{R}^m}\int_{\mathbb{R}^d} g(z) f(x)\,P_{X|Z=z}(dx)\,p_Z(dz) = E(f(X)g(Z))$$
for Borel $f, g$, which implies (5).

Exer. 6. Deduce from (5) that (i) if $X, Z$ are r.v. with a joint probability density function $f_{X,Z}(x, z)$, then the conditional r.v. $X_{Z=z}$ has the probability density function
$$f_{X_{Z=z}}(x) = f_{X,Z}(x, z)/f_Z(z)$$
whenever $f_Z(z)$ does not vanish; (ii) if $X, Z$ are discrete r.v. with joint probabilities $P(X = i, Z = j) = p_{ij}$, then the conditional probabilities $P(X = i|Z = j)$ are given by the usual formula $p_{ij}/P(Z = j)$.
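Exer. 6(i) can be illustrated by simulation. For jointly Gaussian $(X, Z)$ with $X = 2Z + W$, $W$ an independent standard normal, the conditional density $f_{X,Z}(\cdot, z)/f_Z(z)$ is Gaussian with mean $2z$ (this specific model and the crude conditioning band are choices made for the sketch, not part of the notes):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 500_000
Z = rng.standard_normal(n)
X = 2.0 * Z + rng.standard_normal(n)     # jointly Gaussian: Cov(X,Z) = 2, Var Z = 1

z = 0.8
band = np.abs(Z - z) < 0.01              # crude conditioning on the event {Z ~ z}
print("empirical E(X | Z = z):", X[band].mean())
print("mean of f_{X,Z}(., z)/f_Z(z):", 2.0 * z)
```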
Theorem 2. Let $X$ be an integrable r.v. on $(\Omega, \mathcal{F}, P)$ and let $\mathcal{G}_n$ be (i) an increasing sequence of sub-σ-algebras of $\mathcal{F}$ with $\mathcal{G}$ the minimal σ-algebra containing all $\mathcal{G}_n$, or (ii) a decreasing sequence of sub-σ-algebras of $\mathcal{F}$ with $\mathcal{G} = \cap_{n=1}^\infty \mathcal{G}_n$. Then a.s. and in $L^1$
$$E(X|\mathcal{G}) = \lim_{n\to\infty} E(X|\mathcal{G}_n). \qquad (6)$$
Furthermore, if $X_n \to X$ a.s. and $|X_n| < Y$ for all $n$, where $Y$ is an integrable r.v., then a.s. and in $L^1$
$$E(X|\mathcal{G}) = \lim_{n\to\infty} E(X_n|\mathcal{G}_n). \qquad (7)$$

Sketch of the proof of the convergence in $L^1$ (a.s. convergence is a bit more involved, and we shall neither prove nor use it). Any r.v. of the form $\chi_B$ with $B \in \mathcal{G}$ can be approximated in $L^2$ by $\mathcal{G}_n$-measurable r.v. $\xi_n$; hence the same holds for any r.v. from $L^2(\Omega, \mathcal{G}, P)$. As $E(X|\mathcal{G}_n)$ is the best $\mathcal{G}_n$-measurable approximation (the $L^2$-projection), one obtains (6) for $X \in L^2(\Omega, \mathcal{F}, P)$, and hence for $X \in L^1(\Omega, \mathcal{F}, P)$ by density arguments. Next,
$$E(X_n|\mathcal{G}_n) - E(X|\mathcal{G}) = E(X_n - X|\mathcal{G}_n) + (E(X|\mathcal{G}_n) - E(X|\mathcal{G})).$$
Since $|X_n| < Y$ and $X_n \to X$ a.s., one concludes that $X_n \to X$ in $L^1$ by dominated convergence. Hence $E(E(|X_n - X|\,|\,\mathcal{G}_n)) = E|X_n - X| \to 0$.

Theorem 3. If $X \in L^1(\Omega, \mathcal{F}, P)$, then the family of r.v. $E(X|\mathcal{G})$, where $\mathcal{G}$ runs through all sub-σ-algebras of $\mathcal{F}$, is uniformly integrable.

Proof. $1_{|E(X|\mathcal{G})|>c}\,E(X|\mathcal{G}) = E(X 1_{|E(X|\mathcal{G})|>c}\,|\,\mathcal{G})$, because $\{|E(X|\mathcal{G})| > c\} \in \mathcal{G}$. Hence
$$E\big(1_{|E(X|\mathcal{G})|>c}\,|E(X|\mathcal{G})|\big) \le E\big(1_{|E(X|\mathcal{G})|>c}\,|X|\big) \le E(|X| 1_{|X|>d}) + d\,P(|E(X|\mathcal{G})| > c) \le E(|X| 1_{|X|>d}) + \frac{d}{c}\,E(|X|).$$
First choose $d$ to make the first term small, then $c$ to make the second one small.

Theorem 4 (locality of conditional expectation). Let the σ-algebras $\mathcal{G}_1, \mathcal{G}_2 \subset \mathcal{F}$ and the r.v. $X_1, X_2 \in L^1(\Omega, \mathcal{F}, P)$ be such that $\mathcal{G}_1 = \mathcal{G}_2$ and $X_1 = X_2$ on a set $A \in \mathcal{G}_1 \cap \mathcal{G}_2$. Then $E(X_1|\mathcal{G}_1) = E(X_2|\mathcal{G}_2)$ a.s. on $A$.

Proof. Note that $1_A E(X_1|\mathcal{G}_1)$ and $1_A E(X_2|\mathcal{G}_2)$ are both $\mathcal{G}_1 \cap \mathcal{G}_2$-measurable, and for any $B \subset A$ s.t. $B \in \mathcal{G}_1$ (and hence $B \in \mathcal{G}_2$)
$$\int_B E(X_1|\mathcal{G}_1)\,P(d\omega) = \int_B X_1\,P(d\omega) = \int_B X_2\,P(d\omega) = \int_B E(X_2|\mathcal{G}_2)\,P(d\omega).$$

Section 6. Markov processes.

Def. Let $(\Omega, \mathcal{F})$ be a measurable space. A family $\mathcal{F}_t$, $t \ge 0$, of sub-σ-algebras of $\mathcal{F}$ is called a filtration if $\mathcal{F}_s \subset \mathcal{F}_t$ whenever $s \le t$. By $\mathcal{F}_\infty$ one denotes the minimal σ-algebra containing all $\mathcal{F}_t$. A probability space $(\Omega, \mathcal{F}, P)$ with a filtration is said to be filtered. A process $X = X_t$ defined on a filtered probability space is adapted (or $\mathcal{F}_t$-adapted) if $X_t$ is $\mathcal{F}_t$-measurable for each $t$. Any process $X$ defines its own natural filtration $\mathcal{F}_t^X = \sigma\{X_s : 0 \le s \le t\}$, and $X$ is clearly adapted to it.

Main Def. An adapted process $X = X_t$ on a filtered probability space $(\Omega, \mathcal{F}, P)$ is called a Markov process if for all $f \in B_b(\mathbb{R}^d)$, $0 \le s \le t$, it satisfies the Markov property
$$E(f(X_t)|\mathcal{F}_s) = E(f(X_t)|X_s) \quad \text{a.s.}, \qquad (1)$$
and moreover the function
$$\Phi^{s,t} f(x) = E(f(X_t)|X_s = x) \qquad (2)$$
belongs to $B_b(\mathbb{R}^d)$ whenever $f$ does, for any $0 \le s \le t$.

Theorem 1. Any Lévy process $X$ (e.g. Brownian motion) is Markov with respect to its natural filtration. Moreover
$$E(f(X_t)|\mathcal{F}_s^X) = \int_{\mathbb{R}^d} f(X_s + z)\,p_{t-s}(dz) \qquad (3)$$
for $f \in B_b(\mathbb{R}^d)$, $0 \le s < t$, where $p_t$ is the law of $X_t$.

Proof. By Theorem 1 (iii) of Section 5 (formula (2) there),
$$E(f(X_t)|\mathcal{F}_s^X) = E(f(X_t - X_s + X_s)|\mathcal{F}_s^X) = G_f(X_s),$$
where
$$G_f(y) = E(f(X_t - X_s + y)) = \int f(z + y)\,p_{t-s}(dz),$$
and (3) follows. Similarly the r.h.s. of (1) equals the r.h.s. of (3), implying (1) for the filtration $\mathcal{F}_t^X$.

Def. A Lévy process $X_t$ on a probability space $(\Omega, \mathcal{F}, P)$ equipped with a filtration $\mathcal{F}_t$ is called an $\mathcal{F}_t$-Lévy process if it is $\mathcal{F}_t$-adapted and the increments $X_t - X_s$ are independent of $\mathcal{F}_s$ for all $0 \le s < t$.

Theorem 2 (properties of the transition operators). (i) $\Phi^{s,s} = I$ (the identity operator); (ii) (positivity) $f \ge 0$ ⇒ $\Phi^{s,t} f \ge 0$; (iii) (conservativity) $\Phi^{s,t} 1 = 1$; (iv) (propagator property) $\Phi^{r,s}\Phi^{s,t} = \Phi^{r,t}$ for $r \le s \le t$.

Proof. (i)-(iii) are obvious and do not depend on the Markov property. (iv) By (1),
$$\Phi^{r,t} f(x) = E(f(X_t)|X_r = x) = E(E(f(X_t)|\mathcal{F}_s)|X_r = x) = E(E(f(X_t)|X_s)|X_r = x) = E(\Phi^{s,t} f(X_s)|X_r = x) = (\Phi^{r,s}(\Phi^{s,t} f))(x).$$

Def. For a Markov process $X$ the transition probabilities are defined by
$$p_{s,t}(x, A) = (\Phi^{s,t} 1_A)(x) = P(X_t \in A|X_s = x),$$
so that
$$(\Phi^{s,t} f)(x) = \int_{\mathbb{R}^d} f(y)\,p_{s,t}(x, dy), \qquad f \in B_b(\mathbb{R}^d).$$
A Markov process has transition densities whenever the measures $p_{s,t}(x, \cdot)$ have densities, say $\rho_{s,t}(x, y)$, so that $p_{s,t}(x, A) = \int_A \rho_{s,t}(x, y)\,dy$.

Exer. 1. (i) Show that for a Lévy process $p_{s,t}(x, A) = q_{t-s}(A - x)$, where $q_t$ is the law of $X_t$.
(ii) Write down the transition probability density of Brownian motion.

Theorem 3 (the Chapman-Kolmogorov equations). If $X$ is a Markov process, then for any Borel set $A$ and $r \le s \le t$
$$p_{r,t}(x, A) = \int_{\mathbb{R}^d} p_{s,t}(y, A)\,p_{r,s}(x, dy).$$

Proof. Apply the operator equation $\Phi^{r,s}\Phi^{s,t} = \Phi^{r,t}$ to the indicator function $1_A$.

Exer. 2 (Chapman-Kolmogorov for processes with transition densities). If a Markov process has transition densities, then Chapman-Kolmogorov rewrites as
$$\rho_{r,t}(x, z) = \int_{\mathbb{R}^d} \rho_{r,s}(x, y)\,\rho_{s,t}(y, z)\,dy.$$
(A numerical check of this identity for BM is sketched below.)

Def. A family of mappings $\{p_{s,t} : 0 \le s \le t < \infty\}$ from $\mathbb{R}^d \times \mathcal{B}(\mathbb{R}^d)$ to $[0, 1]$ is said to be a transition family (shortly t.f.) if (i) $p_{s,t}(x, A)$ is measurable as a function of $x$ and is a probability measure as a function of $A$, and (ii) the Chapman-Kolmogorov equations hold.

Theorem 4. A process $X$ is Markov with respect to its natural filtration $\mathcal{F}_t^X$ with t.f. $p_{s,t}$ and initial measure $\nu$ ⇔ for any $0 = t_0 < t_1 < \ldots < t_k$ and positive Borel $f_i$
$$E\prod_{i=0}^k f_i(X_{t_i}) = \int \nu(dx_0) f_0(x_0) \int p_{0,t_1}(x_0, dx_1) f_1(x_1) \cdots \int p_{t_{k-1},t_k}(x_{k-1}, dx_k) f_k(x_k). \qquad (4)$$

Proof. Let $X$ be Markov with t.f. $p_{s,t}$. Then
$$E\prod_{i=0}^k f_i(X_{t_i}) = E\Big(\prod_{i=0}^{k-1} f_i(X_{t_i})\,E(f_k(X_{t_k})|\mathcal{F}_{t_{k-1}})\Big) = E\Big(\prod_{i=0}^{k-1} f_i(X_{t_i})\,\Phi^{t_{k-1},t_k} f_k(X_{t_{k-1}})\Big) = E\Big(\prod_{i=0}^{k-1} f_i(X_{t_i})\int p_{t_{k-1},t_k}(X_{t_{k-1}}, dx_k) f_k(x_k)\Big),$$
and repeating this inductively one arrives at the r.h.s. of (4). Conversely, since (1), (2) are equivalent to
$$\int_A f(X_t(\omega))\,P(d\omega) = \int_A (\Phi^{s,t} f)(X_s(\omega))\,P(d\omega), \qquad A \in \mathcal{F}_s$$
(here one uses that $E(f(X_t)|X_s)(\omega)$ is constant on the level sets of $X_s$ and hence can be written as $(\Phi^{s,t} f)(X_s(\omega))$), and because $\mathcal{F}_s$ is generated by the sets $\cap_{i=1}^k\{X_{t_i} \in A_i\}$, $t_1 < \ldots < t_k \le s$, to prove that $X$ is Markov one has to show that for any $t_1 < \ldots < t_k \le s < t$ and Borel functions $f_1, \ldots, f_k, g$
$$E\Big(\prod_{i=1}^k f_i(X_{t_i})\,g(X_t)\Big) = E\Big(\prod_{i=1}^k f_i(X_{t_i})\,\Phi^{s,t} g(X_s)\Big),$$
and this follows by applying (4) to both sides of this equation.

Theorem 5. Let $\{p_{s,t} : 0 \le s \le t < \infty\}$ be a transition family and $\mu$ a probability measure on $\mathbb{R}^d$. Then there exists a probability measure $P$ on the measurable space $(\mathbb{R}^d)^{\mathbb{R}_+}$, equipped with the natural filtration $\mathcal{F}_t^0 = \sigma(X_u : u \le t)$ generated by the coordinate process $X$, such that the coordinate process $X_t$ is Markov with initial distribution $\mu$ and t.f. $p_{s,t}$.

Proof. On cylinder sets define
$$p_{t_0,t_1,\ldots,t_n}(A_0 \times A_1 \times \ldots \times A_n) = \int_{A_0}\mu(dx_0)\int_{A_1} p_{0,t_1}(x_0, dx_1)\int_{A_2} p_{t_1,t_2}(x_1, dx_2)\cdots\int_{A_n} p_{t_{n-1},t_n}(x_{n-1}, dx_n).$$
Chapman-Kolmogorov ⇒ consistency, which implies (by Kolmogorov's existence theorem) the existence of a process $X_t$ with these finite-dimensional distributions. Clearly $X_0$ has law $\mu$ and $X_t$ is adapted to its natural filtration. Theorem 4 ensures that this process is Markov.

Def. The Markov process constructed in the above theorem is called the canonical process corresponding to the t.f. $p_{s,t}$.

Def. A Markov process is called (time) homogeneous if $p_{s,t}$ depends on the difference $t - s$ only. One then writes $p_{t-s}$ for $p_{s,t}$ and $\Phi^{t-s}$ for $\Phi^{s,t}$. We shall deal only with homogeneous Markov processes.
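Numerical check of Chapman-Kolmogorov for standard BM, whose transition density is $\rho_u(x, y) = e^{-(y-x)^2/2u}/\sqrt{2\pi u}$ (an illustrative quadrature sketch; the points, times and grid are arbitrary):

```python
import numpy as np

def rho(u, x, y):
    """Transition density of standard BM over a time lapse u."""
    return np.exp(-(y - x) ** 2 / (2 * u)) / np.sqrt(2 * np.pi * u)

x, z = 0.3, -1.1
r, s, t = 0.0, 0.4, 1.0
y = np.linspace(-15, 15, 20001)
lhs = np.trapz(rho(s - r, x, y) * rho(t - s, y, z), y)  # integrate over midpoint y
print(lhs, rho(t - r, x, z))                            # the two should agree
```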
Exer. 3. Let $X$ be a canonical Markov process and $Z$ an $\mathcal{F}_\infty^0$-measurable bounded (or positive) function on $(\mathbb{R}^d)^{\mathbb{R}_+}$. Then the map $x \mapsto E_x(Z)$ is (Borel) measurable and
$$E_\nu(Z) = \int \nu(dx)\,E_x(Z)$$
for any probability measure $\nu$ (initial distribution of $X$). Hint: extend by the monotone class theorem from the mappings $Z$ that are indicators of cylinders, for which this is equivalent to (4).

? Theorem 6 (a more powerful formulation of the Markov property). The coordinate process on $((\mathbb{R}^d)^{\mathbb{R}_+}, \mathcal{F}_\infty^0, P)$ is Markov ⇔ for any bounded (or positive) r.v. $Z$ on $(\mathbb{R}^d)^{\mathbb{R}_+}$, every $t > 0$ and every starting measure $\nu$,
$$E_\nu(Z \circ \theta_t\,|\,\mathcal{F}_t^0) = E_{X_t}(Z) \qquad P_\nu\text{-a.s.},$$
where $\theta_t$ is the canonical shift operator: $X_s(\theta_t(\omega)) = X_{t+s}(\omega)$.

Proof. One needs to show that
$$E_\nu((Z \circ \theta_t)\,Y) = E_\nu(E_{X_t}(Z)\,Y)$$
for $\mathcal{F}_t^0$-measurable r.v. $Y$. By the usual extension arguments it is enough to do this for $Y = \prod_{i=1}^k f_i(X_{t_i})$ and $Z = \prod_{j=1}^n g_j(X_{s_j})$, where $t_i \le t$ and $f_i, g_j$ are positive Borel. Thus one has to show that
$$E_\nu\Big(\prod_{j=1}^n g_j(X_{s_j+t})\prod_{i=1}^k f_i(X_{t_i})\Big) = E_\nu\Big(E_{X_t}\Big(\prod_{j=1}^n g_j(X_{s_j})\Big)\prod_{i=1}^k f_i(X_{t_i})\Big).$$
But the l.h.s. equals
$$E_\nu\Big(E\Big(\prod_{j=1}^n g_j(X_{s_j+t})\,\Big|\,\mathcal{F}_t^0\Big)\prod_{i=1}^k f_i(X_{t_i})\Big),$$
which coincides with the r.h.s. by the homogeneous Markov property.

Section 7. Feller processes and semigroups.

Recall: Banach spaces $L^p(\Omega, \mathcal{F}, P)$, $p \ge 1$, $L^\infty(\Omega, \mathcal{F}, P)$, $B_b(X)$, $C_b(X)$, $C_\infty(X)$; convergence, linear operators and their norms, dense subspaces.

Def. A semigroup of linear contractions on a Banach space $B$ is a family $\Phi^t$, $t \ge 0$, of bounded linear operators on $B$ with norm not exceeding one s.t. $\Phi^0$ is the identity operator and $\Phi^t\Phi^s = \Phi^{t+s}$ for all $t, s \ge 0$. Such a semigroup on the Banach space $B_b(X)$ ($X$ a subset of $\mathbb{R}^d$) is called a sub-Markov semigroup if it preserves positivity (i.e. $f \ge 0$ always implies $\Phi^t f \ge 0$), and a Markov semigroup if additionally it preserves constants, i.e. $\Phi^t 1 = 1$.

Theorem 1. For a Markov process with homogeneous t.f. $p_t$ the operators
$$\Phi^t f(x) = \int p_t(x, dy)\,f(y) = E_x f(X_t)$$
form a Markov semigroup on $B_b(X)$.

Proof. A direct consequence of the definitions and the Chapman-Kolmogorov equations.

Def. (i) A semigroup $\Phi^t$ of linear contractions on a Banach space $B$ is called strongly continuous if $\|\Phi^t f - f\| \to 0$ as $t \to 0$ for any $f \in B$. (ii) A strongly continuous semigroup of positive linear contractions on $C_\infty(\mathbb{R}^d)$ is called a Feller semigroup. It is called conservative if it extends to a semigroup of contractions on $B_b(\mathbb{R}^d)$ preserving constants. We shall discuss only conservative Feller semigroups.

Def. A (homogeneous) Markov process is called a Feller process if its Markov semigroup restricted to $C_\infty(\mathbb{R}^d)$ is a (conservative) Feller semigroup.

? Proposition. Any Feller semigroup arises in this way, i.e. it is given by
$$\Phi^t f(x) = \int p_t(x, dy)\,f(y)$$
with a certain t.f. $p_t$. Sketch of the proof: follows more or less directly from the Riesz-Markov theorem.

Exer. 1. (i) Show that if $A$ is a bounded linear operator in a Banach space, then
$$T_t = e^{tA} = \sum_{n=0}^\infty \frac{t^n}{n!}A^n$$
defines a strongly continuous semigroup. (ii) Show that the BM process is Feller. (iii) Show that the semigroup of shifts $T_t f(x) = f(x + t)$ is strongly continuous in $C_\infty(\mathbb{R})$ (and hence is Feller there), as well as in $L^1(\mathbb{R})$ and $L^2(\mathbb{R})$, but is not strongly continuous in $C_b(\mathbb{R})$. Observe also that for analytic functions
$$f(x + t) = \sum_{n=0}^\infty \frac{t^n}{n!}(D^n f)(x),$$
which can be formally written as $e^{tD} f(x)$. (iv) Let $\eta(y)$ be a complex-valued continuous function on $\mathbb{R}^d$ s.t. $\mathrm{Re}\,\eta \le 0$. Convince yourself that
$$T_t f(y) = e^{t\eta(y)} f(y) \qquad (1)$$
is a semigroup of contractions in all our Banach spaces $L^p(\mathbb{R}^d)$, $L^\infty(\mathbb{R}^d)$, $B_b(\mathbb{R}^d)$, $C_b(\mathbb{R}^d)$, $C_\infty(\mathbb{R}^d)$. Show that it is strongly continuous in $L^p(\mathbb{R}^d)$ and $C_\infty(\mathbb{R}^d)$, but not necessarily in the other three spaces.

Theorem 2. Let $X_t$ be a Lévy process with Lévy symbol $\eta$. Then $X_t$ is a Feller process with semigroup $\Phi^t$ s.t.
$$\Phi^t f(x) = \int f(x + y)\,p_t(dy), \qquad f \in C_b(\mathbb{R}^d), \qquad (2)$$
where $p_t$ is the law of $X_t$.

Sketch of the proof. Formula (2) was established earlier. Notice that any $f \in C_\infty$ is uniformly continuous (check it!).
For any such $f$,
$$\Phi^t f(x) - f(x) = \int (f(x+y) - f(x))\,p_t(dy) = \int_{|y| > K}(f(x+y) - f(x))\,p_t(dy) + \int_{|y| \le K}(f(x+y) - f(x))\,p_t(dy),$$
and the first (resp. second) term is small for small $t$ and any $K$ by the stochastic continuity of $X$ (resp. for small $K$ and arbitrary $t$ by the uniform continuity of $f$). Hence $\|\Phi^t f - f\| \to 0$ as $t \to 0$. Check that $\Phi^t f \in C_\infty$ for any $t$ (Exer.).

? Exer. 2. Recall the inversion formula for the Fourier transform on $S(\mathbb{R}^d)$ and check that the Fourier transform takes the semigroup $\Phi^t$ to a multiplication semigroup, i.e.
$$\Phi^t f(x) = F^{-1}(e^{t\eta} F f)(x), \qquad f \in S(\mathbb{R}^d).$$
Use this representation in conjunction with Exer. 1 (iv) to give another proof of the Feller property of the semigroup $\Phi^t$. Hint: by inversion
$$\Phi^t f(x) = E(f(X_t + x)) = (2\pi)^{-d/2}\,E\left(\int_{\mathbb{R}^d} e^{i(u, x + X_t)}\,F f(u)\,du\right),$$
which yields (justify by Fubini)
$$\Phi^t f(x) = (2\pi)^{-d/2}\int_{\mathbb{R}^d} e^{i(u,x)}\,E e^{i(u, X_t)}\,F f(u)\,du = (2\pi)^{-d/2}\int_{\mathbb{R}^d} e^{i(u,x)} e^{t\eta(u)}\,F f(u)\,du.$$
The Feller property then follows from Exer. 1 (iv) above and density arguments.

Def. Let $T_t$ be a strongly continuous semigroup of linear contractions on a Banach space $B$. The generator of $T_t$ is defined as the operator
$$Af = \lim_{t\to 0}\frac{T_t f - f}{t}$$
on the linear subspace $D_A \subset B$ (the domain of $A$) where this limit exists (in the topology of $B$). The resolvent of $T_t$ (or of $A$) is defined for any $\lambda > 0$ as the operator
$$R_\lambda f = \int_0^\infty e^{-\lambda t}\,T_t f\,dt.$$

Theorem 3 (basic properties of the generator and the resolvent). (i) $T_t D_A \subset D_A$ for each $t \ge 0$. (ii) $T_t Af = A T_t f$ for each $t \ge 0$, $f \in D_A$. (iii) $R_\lambda$ is a bounded operator in $B$ with $\|R_\lambda\| \le \lambda^{-1}$ (for any $\lambda > 0$). (iv) $\lambda R_\lambda f \to f$ as $\lambda \to \infty$. (v) $R_\lambda f \in D_A$ for any $f$ and $\lambda > 0$, and $(\lambda - A)R_\lambda f = f$, i.e. $R_\lambda = (\lambda - A)^{-1}$. (vi) If $f \in D_A$, then $R_\lambda Af = A R_\lambda f$. (vii) $D_A$ is dense in $B$.

Proof. (i) and (ii): observe that for $\psi \in D_A$
$$A T_t\psi = \lim_{h\to 0}\Big[\frac{1}{h}(T_h - I)\Big]T_t\psi = T_t\lim_{h\to 0}\Big[\frac{1}{h}(T_h - I)\Big]\psi = T_t A\psi.$$
(iii) $\|R_\lambda f\| \le \int_0^\infty e^{-\lambda t}\|f\|\,dt = \lambda^{-1}\|f\|$.
(iv) Follows from the equation
$$\lambda\int_0^\infty e^{-\lambda t} T_t f\,dt = \lambda\int_0^\infty e^{-\lambda t} f\,dt + \lambda\int_0^\epsilon e^{-\lambda t}(T_t f - f)\,dt + \lambda\int_\epsilon^\infty e^{-\lambda t}(T_t f - f)\,dt,$$
observing that the first term on the r.h.s. is $f$, and the second (resp. third) term is small for small $\epsilon$ (resp. for any $\epsilon$ and large $\lambda$).
(v) From the definitions,
$$A R_\lambda f = \lim_{h\to 0}\frac{1}{h}(T_h - I)R_\lambda f = \lim_{h\to 0}\frac{1}{h}\int_0^\infty e^{-\lambda t}(T_{t+h} f - T_t f)\,dt = \lim_{h\to 0}\left[\frac{e^{\lambda h} - 1}{h}\int_0^\infty e^{-\lambda t} T_t f\,dt - \frac{e^{\lambda h}}{h}\int_0^h e^{-\lambda t} T_t f\,dt\right] = \lambda R_\lambda f - f.$$
(vi) Follows from the definitions and (ii). (vii) Follows from (iv) and (v).

Exer. 3. Give another proof of (vii) above (by-passing the resolvent) by showing that for every $\psi \in B$ the vector $\psi_t = \int_0^t T_u\psi\,du$ belongs to $D_A$ and $A\psi_t = T_t\psi - \psi$.

Exer. 4. The generator $A$ of the semigroup $T_t f = e^{t\eta} f$ from Exer. 1 (iv) above is given by the multiplication operator $Af = \eta f$ on the set of functions $f$ s.t. $\eta f \in C_\infty(\mathbb{R}^d)$ (respectively $\eta f \in L^p(\mathbb{R}^d)$).
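The defining limit $(\Phi^t f - f)/t \to Af$ can be watched numerically for the standard BM semigroup, whose generator on smooth functions is $\frac{1}{2}f''$ (see Theorem 4 below). An illustrative Python sketch (quadrature grid, test function and evaluation point are arbitrary; `semigroup` is a name introduced here):

```python
import numpy as np

def semigroup(f, t, x):
    """Phi^t f(x) = E f(x + B_t) for standard BM, by quadrature over N(0,1)."""
    z = np.linspace(-8.0, 8.0, 4001)
    w = np.exp(-z**2 / 2) / np.sqrt(2 * np.pi)
    return np.trapz(f(x + np.sqrt(t) * z) * w, z)

f = lambda y: np.exp(-y**2)          # a smooth, rapidly decreasing test function
x = 0.7
for t in [1e-1, 1e-2, 1e-3]:
    print(t, (semigroup(f, t, x) - f(x)) / t)      # approaches (1/2) f''(x)
print("(1/2) f''(x) =", 0.5 * (4 * x**2 - 2) * np.exp(-x**2))
```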
Theorem 4. If Xt is a Lévy process with characteristic exponent

η(u) = i(b, u) − (1/2)(u, Au) + ∫_{Rd\{0}} [e^{i(u,y)} − 1 − i(u, y) χ_{B1}(y)] ν(dy),   (3)

its generator is given by

L f(x) = Σ_{j=1}^d bj ∂f/∂xj + (1/2) Σ_{j,k=1}^d Ajk ∂²f/∂xj∂xk + ∫_{Rd\{0}} [ f(x + y) − f(x) − Σ_{j=1}^d yj χ_{B1}(y) ∂f/∂xj ] ν(dy).   (4)

For instance, for a Brownian motion with a drift the generator is given by the differential part (the first two terms) of (4).

Sketch of the Proof. Let us check (4) on the exponential functions; the general case then follows by approximation arguments. For f(x) = e^{i(u,x)}

Φt f(x) = ∫ f(x + y) pt(dy) = e^{i(u,x)} ∫ e^{i(u,y)} pt(dy) = e^{i(u,x)} e^{tη(u)}.

Hence

(d/dt)|_{t=0} Φt f(x) = η(u) e^{i(u,x)},

which is given by (4) due to the elementary properties of the exponent.

Remark. Of course e^{i(u,x)} does not belong to C∞(Rd), and some attention should be paid to an appropriate choice of the domain of the generator.

? Exer. 5. Give an alternative proof of Theorem 4 using the representation of Φt by the Fourier transform given in Exer. 2.

Def. An operator A in Cb(Rd) defined on a domain DA (i) is conditionally positive if A f(x) ≥ 0 for any f ∈ DA s.t. f(x) = 0 = min_y f(y); (ii) satisfies the positive maximum principle (PMP) if A f(x) ≤ 0 for any f ∈ DA s.t. f(x) = max_y f(y) ≥ 0; (iii) is local if A f(x) = 0 whenever f ∈ DA vanishes in a neighborhood of x; (iv) satisfies a local PMP if A f(x) ≤ 0 for any f ∈ DA having a local non-negative maximum at x.

Theorem 5. Let A be the generator of a Feller semigroup Φt. Then (i) A is conditionally positive and (ii) satisfies the PMP on DA. (iii) If moreover A is local and DA contains C∞comp, then it satisfies the local PMP on C∞comp.

Sketch of the proof. For (i):

A f(x) = lim_{t→0} (Φt f(x) − f(x))/t = lim_{t→0} Φt f(x)/t ≥ 0

by positivity preservation. For (ii) apply (i) to the function fx(y) = f(x) − f(y).

Theorem 6. If the generator L of a (conservative) Feller semigroup Φt with t.f. pt(x, dy) is local and C∞comp ⊂ DL, then

L f(x) = Σ_{j=1}^d bj(x) ∂f/∂xj + (1/2) Σ_{j,k=1}^d ajk(x) ∂²f/∂xj∂xk   (5)

for certain bj, ajk ∈ C(Rd) s.t. a = (ajk) is a non-negative definite matrix.

Proof. To shorten the formulas, assume d = 1. Let χ be a smooth function R → [0, 1] that equals 1 (resp. 0) for |x| ≤ 1 (resp. |x| > 2). For an f ∈ C∞comp one can write

f(y) = f(x) + f′(x)(y − x)χ(y − x) + (1/2) f″(x)(y − x)² χ(y − x) + gx(y),

where gx(y) = o(1)(y − x)² as y → x. By conservativity L1 = 0. Hence

L f(x) = b(x) f′(x) + (1/2) a(x) f″(x) + (L gx)(x)

with

b(x) = L[(· − x)χ(· − x)](x) = lim_{t→0} (1/t) ∫ (y − x)χ(y − x) pt(x, dy),

a(x) = L[(· − x)²χ(· − x)](x) = lim_{t→0} (1/t) ∫ (y − x)²χ(y − x) pt(x, dy).

But ±gx(y) + ε(y − x)² vanishes at y = x and has a local minimum there for any ε > 0, so that L gx(x) = 0, which completes the proof.

Def. A Feller process with a generator of type (5) is called a (Feller) diffusion.

Exer. 6. Show that the coefficients bj and aij can be defined as

bj(x) = lim_{t→0} (1/t) ∫ (y − x)j 1_{|y−x|≤ε}(y) pt(x, dy),   (6)

aij(x) = lim_{t→0} (1/t) ∫ (y − x)i (y − x)j 1_{|y−x|≤ε}(y) pt(x, dy)   (7)

for any ε > 0. Conversely, if these limits exist and are independent of ε, then the generator is local, so that the process is a diffusion.

Exer. 7 (a mathematical version of "Einstein's style" of the analysis of BM). If the generator L of a (conservative) Feller semigroup Φt with t.f. pt(x, dy) is such that C∞comp ⊂ DL and

pt(x; {y : |y − x| ≥ ε}) = o(t),   t → 0,

for any ε > 0, then L is local (and hence of diffusion type).

Conclusion about BM. A BM (possibly with a drift) can be characterized as (i) a diffusion with i.i.d. increments or as (ii) a Lévy process with a local generator.

Exer. 8. (i) Show that the resolvent of the standard BM is given by the formula

Rλ f(x) = ∫_{−∞}^∞ Rλ(|x − y|) f(y) dy = (1/√(2λ)) ∫_{−∞}^∞ e^{−√(2λ)|y−x|} f(y) dy.   (8)

Hint: check this identity for the exponential functions f(x) = e^{iθx} using the known ch.f. of the normal r.v. N(0, t). (ii) Show that for the standard BM in R³

Rλ f(x) = ∫_{R³} Rλ³(|x − y|) f(y) dy = ∫_{R³} (1/(2π|x − y|)) e^{−√(2λ)|y−x|} f(y) dy.   (9)

Hint: observe that

Rλ³(|z|) = ∫_0^∞ e^{−λt} (2πt)^{−3/2} e^{−|z|²/(2t)} dt = −(1/(2π|z|)) (Rλ¹)′(|z|).   (10)
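Numerical illustration (a sketch assuming NumPy/SciPy; the test function, λ and x are arbitrary choices): formula (8) can be checked against the definition Rλ f(x) = ∫_0^∞ e^{−λt} Φt f(x) dt, with Φt f computed by Gaussian quadrature.

```python
# Check of formula (8): the Laplace transform of the heat semigroup versus
# the closed-form kernel (2 lam)^{-1/2} exp(-sqrt(2 lam)|x - y|).
import numpy as np
from scipy.integrate import quad

lam, x = 1.5, 0.3
f = lambda y: np.exp(-y**2)          # a test function vanishing at infinity

def Phi(t):                          # Phi_t f(x) = E f(x + B_t), B_t ~ N(0, t)
    gauss = lambda w: np.exp(-w**2 / 2) / np.sqrt(2 * np.pi)
    return quad(lambda w: f(x + np.sqrt(t) * w) * gauss(w), -np.inf, np.inf)[0]

lhs = quad(lambda t: np.exp(-lam * t) * Phi(t), 0.0, np.inf)[0]
rhs = quad(lambda y: np.exp(-np.sqrt(2 * lam) * abs(y - x)) * f(y),
           -np.inf, np.inf)[0] / np.sqrt(2 * lam)
print(lhs, rhs)                      # agree to quadrature accuracy
```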
CHAPTER 3. MARTINGALE METHODS.

Section 8. Martingales.

Def. An adapted integrable process on a filtered probability space is called a submartingale if E(Xt|Fs) ≥ Xs for all 0 ≤ s ≤ t < ∞, a supermartingale if the reverse inequality holds, and a martingale if E(Xt|Fs) = Xs.

Def. A filtration Ft is said to satisfy the usual hypotheses if (i) (completeness) F0 contains all sets of P-measure zero (all P-negligible sets), (ii) (right continuity) Ft = Ft+ = ∩_{ε>0} Ft+ε. Adding to all Ft (of an arbitrary filtration) all P-negligible sets leads to a new filtration called the augmented filtration.

Theorem (on regularity of submartingales) (without a proof). Let M be a submartingale. (i) The following left and right limits exist and are a.s. finite for each t > 0:

Mt− = lim_{s∈Q, s→t, s<t} Ms;   Mt+ = lim_{s∈Q, s→t, s>t} Ms.

? (ii) If the filtration satisfies the usual hypotheses and if the map t ↦ E Mt is right-continuous, then M has a cadlag (right continuous with finite left limits everywhere) modification.

Theorem 1. If X is a Lévy process with Lévy symbol η, then ∀u ∈ Rd the process

Mu(t) = exp{i(u, Xt) − tη(u)}

is a complex FtX-martingale.

Proof. E|Mu(t)| = e^{−t Re η(u)} < ∞ for each t. Next, for s ≤ t

Mu(t) = Mu(s) exp{i(u, Xt − Xs) − (t − s)η(u)}.

Then

E(Mu(t)|FsX) = Mu(s) E(exp{i(u, X(t − s))}) exp{−(t − s)η(u)} = Mu(s).

Exer. 1. Show that the following processes are martingales: (1) for a standard BM Bt: Bt, Bt² − t, Bt³ − 3tBt, Bt⁴ − 6tBt² + 3t²; (2) for a d-dimensional Brownian motion B(t) with covariance A: |B(t)|² − tr(A)t, and exp{(u, B(t)) − t(u, Au)/2} for any u; (3) the compensated Poisson process Ñt = Nt − λt with intensity λ, and Ñt² − λt; (4) closed martingales: E(Y|Ft), where Y is an arbitrary integrable r.v. on a filtered probability space. Hint: for (4) use Theorem 5.1 (iv).

Theorem 2 (Dynkin's formula). Let f ∈ DA, the domain of the generator of a Feller process Xt. Then the process

Mtf = f(Xt) − f(X0) − ∫_0^t A f(Xs) ds,   t ≥ 0,

is a martingale under any initial distribution ν, often called Dynkin's martingale.

Proof.

E(Mt+hf | Ft) − Mtf = E( f(Xt+h) − ∫_0^{t+h} A f(Xs) ds | Ft ) − ( f(Xt) − ∫_0^t A f(Xs) ds )
= Φh f(Xt) − f(Xt) − E( ∫_t^{t+h} A f(Xs) ds | Ft ) = Φh f(Xt) − f(Xt) − ∫_0^h Φs A f(Xt) ds = 0.

Theorem 3. A Feller process Xt admits a cadlag modification.

Remarks. (i) Unlike the case of martingales, we do not need the right continuity of the filtration here. (ii) When proving the result for Lévy processes only, one often utilizes the special martingales Mu(t) = exp{i(u, Xt) − tη(u)} instead of Dynkin's martingale used in the proof below.

Proof. Let fn be a sequence in C∞ that separates points. By Dynkin's formula and the Theorem on regularity of submartingales, there exists a set Ω0 of full measure on which, for all n, fn(Xt) has right and left limits along the rationals Q. Hence Xt has right and left limits along the rationals on Ω0. Define

X̃t = lim_{s→t, s>t, s∈Q} Xs.

Then Xs → X̃t a.s. and Xs → Xt weakly (by the Feller property). Hence X̃t has the same distributions as Xt. Moreover, for any h, g ∈ C∞

Eν(g(Xt) h(X̃t)) = lim_{s→t, s>t} Eν(g(Xt) h(Xs)) = lim_{s→t, s>t} Eν(g(Xt) Φs−t h(Xt)) = Eν(g(Xt) h(Xt)),

where the uniform convergence Φs−t h → h was used. This implies E f(Xt, X̃t) = E f(Xt, Xt) for all bounded positive Borel functions f. Choosing f to be the indicator of the set {(x, y) : x ≠ y} yields X̃t = Xt a.s.
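Numerical illustration (a Monte Carlo sketch, assuming NumPy; sample sizes are arbitrary): a martingale started at 0 keeps a constant (zero) expectation, which can be checked for two of the polynomial martingales of Exer. 1.

```python
# Check of Exer. 1 (1): E(B_t^3 - 3 t B_t) = 0 and
# E(B_t^4 - 6 t B_t^2 + 3 t^2) = 0 for all t, up to Monte Carlo error.
import numpy as np

rng = np.random.default_rng(1)
n_paths, n_steps, T = 50_000, 50, 2.0
dt = T / n_steps
B = np.cumsum(rng.normal(0, np.sqrt(dt), (n_paths, n_steps)), axis=1)
t = dt * np.arange(1, n_steps + 1)

print(np.abs(np.mean(B**3 - 3 * t * B, axis=0)).max())              # ~ 0
print(np.abs(np.mean(B**4 - 6 * t * B**2 + 3 * t**2, axis=0)).max())  # ~ 0
```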
Our final result in this section is the following.

Theorem 4. The augmented filtration Ftν of the canonical filtration Ft0 is right continuous.

Proof. Because Ftν and Ft+ν are Pν-complete, it is enough to show that

Eν(Z|Ftν) = Eν(Z|Ft+ν)   Pν-a.s.

for F∞0-measurable and positive Z. By the monotone class theorem, it is sufficient to show this for Z = ∏_{i=1}^n fi(Xti) with fi ∈ C∞ and t1 < ... < tn. We shall use the observation that Eν(Z|Ftν) = Eν(Z|Ft0) Pν-a.s. For a t > 0 choose an integer k with tk−1 ≤ t < tk, so that for h < tk − t

Eν(Z|Ft+hν) = ∏_{i=1}^{k−1} fi(Xti) gh(Xt+h)   Pν-a.s.,

where

gh(x) = ∫ ptk−t−h(x, dxk) fk(xk) ∫ ptk+1−tk(xk, dxk+1) fk+1(xk+1) ... ∫ ptn−tn−1(xn−1, dxn) fn(xn).

As h → 0, gh converges uniformly (Feller!) to

g(x) = ∫ ptk−t(x, dxk) fk(xk) ∫ ptk+1−tk(xk, dxk+1) fk+1(xk+1) ... ∫ ptn−tn−1(xn−1, dxn) fn(xn).

Moreover, Xt+h → Xt a.s. (right continuity!), and hence by Theorem 5.2

Eν(Z|Ft+ν) = lim_{h→0} Eν(Z|Ft+hν) = ∏_{i=1}^{k−1} fi(Xti) g(Xt) = Eν(Z|Ftν).

Remark. The Markov property is preserved by augmentation (we shall not address this issue in detail).

Exer. 2. Show that if a random process Xt is left continuous (e.g. is a Brownian motion), then its natural filtration FtX is left continuous. Hint: FtX is generated by the sets Γ = {(Xt1, ..., Xtn) ∈ B}, 0 ≤ t1 < ... < tn = t.

Exer. 3. Let Xt be a Markov chain on {1, ..., n} with transition probabilities qij > 0, i ≠ j, which can be defined via the semigroup of stochastic matrices Φt with the generator

(A f)i = Σ_{j≠i} (fj − fi) qij.

Let Nt = Nt(i) denote the number of transitions during time t of a process starting at some point i. Show that Nt − ∫_0^t q(Xs) ds is a martingale, where q(l) = Σ_{j≠l} qlj denotes the intensity of the jumps. Hint: to check that E Nt = E ∫_0^t q(Xs) ds, show that the function E Nt is differentiable and

(d/dt) E(Nt) = Σ_{j=1}^n P(Xt = j) q(j).

Exer. 4 (Poisson integrals). Recall first that right continuous functions of bounded variation on R+ (= differences of increasing functions) are in one-to-one correspondence with signed Radon measures on R+ according to the formulas ft = µ([0, t]), µ((s, t]) = ft − fs, and the Stieltjes integral of a locally bounded Borel function g

∫_0^t gs dfs = ∫_{(0,t]} gs dfs

is defined as the Lebesgue integral of g with respect to the corresponding measure µ. Let Nt be a Poisson process of intensity c > 0 with respect to a right continuous filtration Ft. (i) Show that

∫_0^t Ns dNs = (1/2) Nt(Nt + 1),   ∫_0^t Ns− dNs = (1/2) Nt(Nt − 1)

(integration in the sense of Stieltjes). (ii) Let H be a left continuous bounded adapted process. Show that the processes

Mt = ∫_0^t Hs dNs − c ∫_0^t Hs ds   (1)

and

Mt² − c ∫_0^t Hs² ds   (2)

are martingales. Hint: for (ii) check this first for simple left continuous processes Hs = ξ(ω) 1_{(a,b]}(s), where by adaptedness ξ is Ft-measurable for any t ∈ (a, b], and hence Fa-measurable by right continuity. Then, say,

Mt = ξ [(Nmin(t,b) − Na) − c(min(t, b) − a)],   t ≥ a,

and one concludes that E Mt = 0 by the independence of ξ and Na+u − Na and the properties of the latter.
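Numerical illustration (a Monte Carlo sketch, assuming NumPy; the intensity and grid are arbitrary choices): the compensated Poisson integral (1) with the left continuous integrand Hs = Ns− has mean zero, since each discrete increment Ns(dNs − c ds) has zero mean by the independence of increments.

```python
# Check of Exer. 4 (ii): M_t = int_0^t N_{s-} dN_s - c int_0^t N_s ds
# (approximated on a grid) has mean ~ 0.
import numpy as np

rng = np.random.default_rng(2)
c, t_max, n_paths, n_steps = 3.0, 1.0, 100_000, 1_000
dt = t_max / n_steps
M = np.zeros(n_paths)
N = np.zeros(n_paths)
for _ in range(n_steps):
    dN = rng.poisson(c * dt, n_paths)   # independent Poisson increments
    M += N * dN - c * N * dt            # H = N_{s-}: use N *before* the jump
    N += dN
print(M.mean())                         # ~ 0 up to Monte Carlo error
```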
Section 9. Stopping times and optional sampling theorem.

Def. Let (Ω, F, P) be a probability space equipped with a filtration Ft. A stopping time (respectively an optional time) is a r.v. T : Ω → [0, ∞] s.t. ∀t ≥ 0, {T ≤ t} ∈ Ft (respectively {T < t} ∈ Ft).

Prop. 1. (i) T is a stopping time ⇒ T is an optional time. (ii) If Ft is right continuous, the two notions coincide.

Proof. (i) {T < t} = ∪_{n=1}^∞ {T ≤ t − 1/n}, and {T ≤ t − 1/n} ∈ Ft−1/n ⊂ Ft. (ii) {T ≤ t} = ∩_{n=m}^∞ {T < t + 1/n} ∈ Ft+1/m for every m. Hence {T ≤ t} ∈ Ft+ = Ft.

Def. Hitting time: TA = inf{t ≥ 0 : Xt ∈ A}, where Xt is a process and A is a Borel set.

Exer. 1. Show that if X is Ft-adapted and right continuous and A is (i) open or (ii) closed, then TA is (i) an optional time or (ii) a stopping time respectively. Hint: (i) {T < t} = ∪_{s<t, s∈Q} {Xs ∈ A} ∈ Ft; (ii) {T > t} = ∩_{s≤t, s∈Q} {Xs ∉ A} ∈ Ft.

Prop. 2. If T, S are stopping times, then so are min(T, S), max(T, S) and T + S.

Proof. (i) {min(T, S) ≤ t} = {T ≤ t} ∪ {S ≤ t}. (ii) {max(T, S) ≤ t} = {T ≤ t} ∩ {S ≤ t}. At last, for (iii),

{T + S > t} = {T = 0, S > t} ∪ {T > t, S = 0} ∪ {T ≥ t, S > 0} ∪ {0 < T < t, T + S > t}.

The first three events are in Ft trivially or by Prop. 1. To see that the same holds for the last one, it can be written as ∪_{r∈(0,t)∩Q} {t > T > r, S > t − r}.

Def. If T is a stopping time, the stopped σ-algebra FT (of events determined prior to T) is

FT = {A ∈ F : A ∩ {T ≤ t} ∈ Ft ∀t ≥ 0},

and for an adapted process X the stopped r.v. XT is XT(ω) = X_{T(ω)}(ω).

Exer. 2. (i) Convince yourself that FT is a σ-algebra. (ii) Show that if S, T are stopping times s.t. S ≤ T a.s., then FS ⊂ FT.

Exer. 3. If X is an adapted process and T a stopping time taking finitely many values, then XT is FT-measurable. Hint: say the range of T is t1 < ... < tn. Then

{XT ∈ B} ∩ {T ≤ tj} = ∪_{k=1}^j {XT ∈ B} ∩ {T = tk} = ∪_{k=1}^j {Xtk ∈ B} ∩ {T = tk} ∈ Ftj.

Exer. 4. (i) If Tn is a sequence of Ft-stopping times, then supn Tn is a stopping time. (ii) If Ft is right continuous, then infn Tn is a stopping time. (iii) If additionally Tn is decreasing and converging to T, then FT = ∩n FTn. Hint: (i) {sup Tn ≤ t} = ∩{Tn ≤ t}; (ii) {inf Tn < t} = ∪{Tn < t} ∈ Ft, and use Prop. 1 (ii).

Def. A process X is progressively measurable (or progressive) if ∀t the map (s, ω) ↦ Xs(ω) from [0, t] × Ω into Rd is B([0, t]) ⊗ Ft-measurable.

Prop. 3. An adapted process with right or left continuous paths is progressive.

Proof. Say Xt is right continuous. Define X0(n)(ω) = X0(ω) and

Xs(n)(ω) = X(k+1)t/2^n(ω)  for  kt/2^n < s ≤ (k+1)t/2^n,

where t > 0, n > 0, k = 0, 1, ..., 2^n − 1. The map (s, ω) ↦ Xs(n)(ω) is B([0, t]) ⊗ Ft-measurable. Hence the same holds for Xs, since Xs(n) → Xs by right continuity.

Prop. 4. If X is progressive and T is a stopping time, then the stopped r.v. XT is FT-measurable on {T < ∞}.

Proof. The map (s, ω) ↦ Xs(ω) is B([0, t]) ⊗ Ft-measurable, and the mapping ω ↦ (T(ω), ω) is (Ft, B([0, t]) ⊗ Ft)-measurable when restricted to the set {T ≤ t}. Hence the composition XT(ω) of these maps is Ft-measurable on the set {T ≤ t}, which means {ω : XT(ω) ∈ B, T ≤ t} ∈ Ft for any Borel set B, as required.

Exer. 5. For an optional time T define the sequence (Tn), n ∈ N, of decreasing random times converging to T by

Tn(ω) = ∞ if T(ω) = ∞;   Tn(ω) = k/2^n if (k − 1)/2^n ≤ T(ω) < k/2^n.

Show that all Tn are stopping times converging monotonically to T.

Def. (predictability and martingale transform (discrete stochastic integral)). A process Hn, n = 1, 2, ..., is called predictable with respect to a discrete filtration Fn, n = 0, 1, ..., if Hn is Fn−1-measurable for all n. Let (Xn), n = 0, 1, ..., be a stochastic process adapted to Fn, and Hn a positive bounded predictable process. The process H ∘ X defined inductively by

(H ∘ X)0 = X0,   (H ∘ X)n = (H ∘ X)n−1 + Hn(Xn − Xn−1)

is called the transform of X by H, and a martingale transform if X is a martingale.

Prop. 5. (H ∘ X) is a (sub)martingale whenever X is.

Proof. Follows from

E((H ∘ X)n | Fn−1) = (H ∘ X)n−1 + Hn E(Xn − Xn−1 | Fn−1).
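Numerical illustration (a Monte Carlo sketch, assuming NumPy; the predictable integrand is a hypothetical choice for illustration): for a simple random walk X and the predictable Hn = 1 + |Xn−1|, Prop. 5 says (H ∘ X) is again a martingale, so its mean stays at X0 = 0.

```python
# Check of Prop. 5: the martingale transform of a simple random walk by a
# bounded predictable process keeps a constant expectation.
import numpy as np

rng = np.random.default_rng(3)
n_paths, n_steps = 50_000, 50
steps = rng.choice([-1, 1], size=(n_paths, n_steps))
X = np.concatenate([np.zeros((n_paths, 1)), np.cumsum(steps, axis=1)], axis=1)

HX = np.zeros(n_paths)                    # (H o X)_0 = X_0 = 0
for n in range(1, n_steps + 1):
    H_n = 1 + np.abs(X[:, n - 1])         # F_{n-1}-measurable (predictable)
    HX += H_n * (X[:, n] - X[:, n - 1])
print(HX.mean())                          # ~ 0 up to Monte Carlo error
```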
Exer. 6. Let T, S be bounded stopping times s.t. S ≤ T ≤ M, and let Hn = 1_{n≤T} − 1_{n≤S} = 1_{S<n≤T}. Show that H is predictable and (H ∘ X)n − X0 = XT − XS for n > M. Hint:

(H ∘ X)n − X0 = 1_{S<1≤T}(X1 − X0) + ... + 1_{S<n≤T}(Xn − Xn−1).

Prop. 6 (discrete optional sampling and martingale characterization). Let (Xn), n = 0, 1, ..., be an Fn-adapted integrable process. The following three statements are equivalent: (i) Xn is a submartingale (respectively a martingale); (ii) for any bounded stopping times S ≤ T

E(XS) ≤ E(XT)   (1)

(respectively with the equality sign); (iii) for any bounded stopping times S ≤ T

XS ≤ E(XT | FS)  a.s.   (2)

(respectively with equality).

Proof. (iii) ⇒ (i) is obvious. (i) ⇒ (ii) follows from Exer. 6 and Prop. 5. Finally, to get (ii) ⇒ (iii) one applies (1) to the stopping times S_B = S 1_B + M(1 − 1_B), T_B = T 1_B + M(1 − 1_B) with B ∈ FS (check that they are stopping times!), yielding

E(XS 1B + XM(1 − 1B)) ≤ E(XT 1B + XM(1 − 1B)),

which implies E(XS 1B) ≤ E(XT 1B) and hence (2).

As an easy application one gets the following fundamental estimate.

? Prop. 7. (i) If Xn, n = 1, ..., N, is a submartingale, then

λ P(sup_n |Xn| ≥ λ) ≤ E(|XN| 1_{sup_n |Xn| ≥ λ}) ≤ E(|XN|).

(ii) If Xt is a right continuous submartingale on t ∈ [0, T] or t ≥ 0, then

λ P(sup_t |Xt| ≥ λ) ≤ sup_t E(|Xt|).

Proof. (i) As |Xn| is again a submartingale (e.g. when X is a martingale, by Jensen's inequality), it is enough to consider the case of positive X. Define a stopping time S equal to N if sup_n Xn < λ, and S = inf{n : Xn ≥ λ} otherwise. Then

E(XN) ≥ E(XS) = E(XS 1_{sup_n Xn ≥ λ}) + E(XS 1_{sup_n Xn < λ}) ≥ λ P(sup_n Xn ≥ λ) + E(XN 1_{sup_n Xn < λ}),

and the required estimate follows by subtraction. (ii) From a finite index set one directly extends the estimate to a countable index set, and then uses right continuity to obtain the general case.

Doob's optional stopping (or sampling) theorem. If X is a right continuous (sub)martingale, S ≤ T are two stopping times and either (i) T is bounded or (ii) the family Xτ, with τ running through all stopping times, is uniformly integrable (the latter occurs e.g. if Xt = E(X∞|Ft) for some integrable X∞), then XS and XT are integrable with

XS ≤ E(XT | FS),

with equality in case X is a martingale.

Proof. Let Sn ≤ Tn be sequences of decreasing stopping times with countably many values (see Exer. 5) converging to S and T. Then

∫_A XSn dP ≤ ∫_A XTn dP   (3)

for all A ∈ FSn, and in particular for A ∈ FS. By right continuity XTn (respectively XSn) converge to XT (respectively XS) point-wise, and by uniform integrability also in L1 (use Theorem 5.3 in case (i)). Hence (3) implies

∫_A XS dP ≤ ∫_A XT dP

for A ∈ FS, i.e. the required result.
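Numerical illustration (a Monte Carlo sketch, assuming NumPy; the interval is an arbitrary choice): stopping a random walk (a martingale) at the first exit time T of (−a, b), optional sampling gives E XT = 0, which forces P(XT = b) = a/(a + b).

```python
# Check of optional sampling: gambler's ruin probability from E X_T = 0.
import numpy as np

rng = np.random.default_rng(4)
a, b, n_paths = 3, 7, 20_000
hits_b = 0
for _ in range(n_paths):
    x = 0
    while -a < x < b:
        x += rng.choice([-1, 1])
    hits_b += (x == b)
print(hits_b / n_paths, a / (a + b))   # both ~ 0.3
```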
Example: "violation of optional sampling". Let (ηn), n ∈ N, be i.i.d. Bernoulli r.v. s.t. ηn equals 1 (success) or −1 (loss) with probabilities p and q = 1 − p. The player's stake at the nth turn is Vn. Naturally Vn is Fn−1 = σ(η1, ..., ηn−1)-measurable. Then the total gain is

Xn = Σ_{i=1}^n Vi ηi = Σ_{i=1}^n Vi ΔYi = (V ∘ Y)n,

where Yn = η1 + ... + ηn. The game is fair (or favorable, or unfavorable) if p = q (or p > q, or p < q) ⇔ (Xn, Fn) is a martingale (or submartingale, or supermartingale). Consider the strategy V (called the martingale strategy) s.t. V1 = 1 and further

Vn = 2^{n−1} if η1 = ... = ηn−1 = −1, and Vn = 0 otherwise.

Thus if η1 = ... = ηn = −1, the total loss after n turns is Σ_{i=1}^n 2^{i−1} = 2^n − 1, and if then ηn+1 = 1, Xn+1 = 2^n − (2^n − 1) = 1. Denoting T = inf{n : Xn = 1} and assuming p = q = 1/2 yields

P(T = n) = (1/2)^n,   P(T < ∞) = 1 (Borel-Cantelli!),

and consequently E XT = P(XT = 1) = 1 > X0 = 0, though Xn is a martingale and E Xn = 0 for all n.

Section 10. Strong Markov property. Diffusions as Feller processes with continuous paths.

Main Def. A time homogeneous Markov process with t.f. pt is called strong Markov if

Eν(f(XS+t)|FS) = (Φt f)(XS)   Pν-a.s. on {S < ∞}   (1)

for any {Ft}-stopping time S, initial distribution ν and positive Borel f.

Exer. 1. If (1) holds for bounded stopping times, then it holds for all stopping times. Hint: for any n and a stopping time S

Eν(f(Xmin(S,n)+t)|Fmin(S,n)) = (Φt f)(Xmin(S,n))   Pν-a.s.

Hence by locality (Theorem 5.4) Eν(f(XS+t)|FS) = (Φt f)(XS) Pν-a.s. on {S ≤ n}. To complete the proof take n → ∞, thus exhausting the set {S < ∞}.

Exer. 2. A Markov process Xt is strong Markov ⇔ for every a.s. finite stopping time T the process Yt = XT+t is a Markov process with respect to FT+t with the same t.f. Hint: strong Markov ⇔ Eν(f(XT+t+s)|FT+t) = (Φs f)(XT+t) Pν-a.s.

? Exer. 3. A canonical Markov process is strong Markov ⇔

Eν(Z ∘ θS | FS) = EXS(Z)   Pν-a.s.

for any {Ft}-optional time S, initial distribution ν and F∞-measurable r.v. Z, where θ is the canonical shift.

Theorem 1. Any Feller process Xt is strong Markov.

Proof. Let T take values in a countable set D. Then

Eν(f(XT+t)|FT) = Σ_{d∈D} 1_{T=d} Eν(f(Xd+t)|Fd) = Σ_{d∈D} 1_{T=d} Φt f(Xd) = Φt f(XT).

For a general T take a decreasing sequence of stopping times Tn with countably many values converging to T (Exer. 5 of Section 9). Then Eν(f(XTn+t)|FTn) = Φt f(XTn) for all n, i.e.

∫_A f(XTn+t) dP = ∫_A Φt f(XTn) dP

for all A ∈ FTn, in particular for A ∈ FT, as FT ⊂ FTn. Hence, by the right continuity of Xt and dominated convergence, passing to the limit n → ∞ yields

∫_A f(XT+t) dP = ∫_A Φt f(XT) dP

for all A ∈ FT, as required.

Theorem 2. If X is a Lévy process, then the process XT(t) = XT+t − XT is again a Lévy process, which is independent of FT, and its law under Pν is the same as that of X under P0.

First proof (as a corollary of the strong Markov property of Feller processes). For positive Borel functions fi

Eν( ∏_i fi(XT+ti − XT) | FT ) = EXT( ∏_i fi(Xti − X0) ),

but this is a constant not depending on XT.

2nd proof (direct, using the special martingales Mu(t) = exp{i(u, Xt) − tη(u)}). Assume T is bounded. Let A ∈ FT, uj ∈ Rd, 0 = t0 < t1 < ... < tn. Then

E( 1A exp{ i Σ_{j=1}^n (uj, XT(tj) − XT(tj−1)) } ) = E( 1A ∏_{j=1}^n Muj(T + tj)/Muj(T + tj−1) ) ∏_{j=1}^n φ_{tj−tj−1}(uj),

where φt(u) = E e^{i(u,Xt)}. By conditioning, for s < t,

E( 1A Mu(T + t)/Mu(T + s) ) = E( (1A/Mu(T + s)) E(Mu(T + t)|FT+s) ) = P(A).

Repeating this argument yields

E( 1A exp{ i Σ_{j=1}^n (uj, XT(tj) − XT(tj−1)) } ) = P(A) ∏_{j=1}^n φ_{tj−tj−1}(uj),

which implies the statement of the Theorem by means of the following fact.

Exer. 4. Suppose X is a r.v. on (Ω, F, P), G is a sub-σ-algebra of F and

E(e^{i(u,X)} 1A) = φp(u) P(A)

for any A ∈ G, where φp is the ch.f. of a probability law p. Then X is independent of G and the distribution of X is p.

Def. Denote τh = inf{t ≥ 0 : |Xt − X0| > h}, h > 0. A point x is called absorbing if τh = ∞ Px-a.s. for every h.

Lemma (intuitively clear; we omit a technical proof). A point x is absorbing iff Φt f(x) = f(x) for all t and all f ∈ DA. Otherwise Ex(τh) < ∞ for all sufficiently small h.
Theorem 3 (Dynkin's formula for the generator). Let X be a Feller process with continuous paths and generator A. For any f ∈ DA and any non-absorbing x

A f(x) = lim_{h→0} ( Ex f(Xτh) − f(x) ) / Ex τh.   (2)

For absorbing points x and all f ∈ DA: A f(x) = 0.

Proof. By Dynkin's martingale and optional stopping

Ex f(Xmin(t,τh)) − f(x) = Ex ∫_0^{min(t,τh)} A f(Xs) ds,   t, h > 0.

As Ex τh < ∞ for small h, this extends to t = ∞ by dominated convergence. This implies (2) by the continuity of A f, taking into account that Ex(τh) > 0 by the continuity of paths.

The next beautiful result is a direct consequence of (2).

Theorem 4. Let A be the generator of a Feller process Xt s.t. C∞comp ⊂ DA. If Xt is a.s. continuous under Pν for every ν, then A is local on C∞comp and hence Xt is a diffusion.

Remark. The converse statement holds as well: the Feller processes with local generators have a.s. continuous paths.

Conclusion about BM. We have obtained another proof that BM (possibly with a drift) is the only Lévy process with continuous paths.

Section 11. Reflection principle and passage times for BM.

Def. Let B be a Brownian motion on (Ω, F, P). The passage time Tb to a level b is

Tb(ω) = inf{t ≥ 0 : Bt(ω) = b}.

The (intuitively clear) equation

P(Tb < t, Bt ≥ b) = P(Tb < t, Bt < b)

for b > 0 is called the reflection principle. Since

P(Tb < t) = P(Tb < t, Bt ≥ b) + P(Tb < t, Bt < b)

and P(Tb < t, Bt ≥ b) = P(Bt ≥ b), it implies

P(Tb < t) = 2P(Bt ≥ b) = √(2/(tπ)) ∫_b^∞ e^{−x²/2t} dx = √(2/π) ∫_{bt^{−1/2}}^∞ e^{−x²/2} dx.

Differentiating yields the density

P(Tb ∈ dt) = (|b|/√(2πt³)) e^{−b²/2t} dt.   (1)

The necessity to justify the reflection principle, and hence these calculations, was one of the reasons to introduce the strong Markov property.

Theorem 1 (reflection principle). For a BM Bt

P(Tb ≤ t) = P(Mt ≥ b) = 2P(Bt ≥ b) = P(|Bt| ≥ b),   (2)

where Mt = inf{b : Tb ≥ t} = sup{Bs : s ≤ t}. In particular, distribution (1) holds.

Proof. By the strong Markov property and symmetry,

P(Mt ≥ b, Bt < b) = P(Tb ≤ t, BTb+(t−Tb) − BTb < 0) = P(Tb ≤ t) P(Bs < 0) = (1/2) P(Tb ≤ t) = (1/2) P(Mt ≥ b),

and the result follows, as P(Mt ≥ b) = P(Bt ≥ b) + P(Mt ≥ b, Bt < b).

Theorem 2. The process Ta is a left continuous non-decreasing Lévy process (i.e. it is a subordinator), and Ta+ = inf{t : Bt > a} is its right continuous modification.

Proof. Since Tb − Ta = inf{t ≥ 0 : BTa+t − BTa ≥ b − a}, this difference is independent of FTa by the strong Markov property of BM. Stochastic continuity follows from the density (1). Clearly the process Ta (respectively Ta+) is non-decreasing and left continuous (respectively right continuous), and Ta+ = lim_{s→0, s>0} Ta+s. At last, it follows from the continuity of BM that Ta = Ta+ a.s.

Theorem 3. For the process Ta

E e^{−uTa} = e^{−a√(2u)},   (3)

which implies by (4.13), (4.15) that Ta is a stable subordinator with index α = 1/2 and Lévy measure ν(dx) = (2πx³)^{−1/2} dx.

First proof. Compute directly from density (1) using the integral calculated in (7.10).

Second proof. As Ms(t) = exp{sBt − s²t/2} is a martingale, one concludes from optional sampling that

1 = E exp{sBTa − s²Ta/2} = e^{sa} E exp{−s²Ta/2},

and (3) follows by substituting u = s²/2. (Remark. As Doob's theorem was stated for bounded stopping times, in order to be precise here one has to consider first the stopping times min(n, Ta) and then pass to the limit n → ∞.)

Third proof. For any b > 0 the process (1/b) T_{a√b} is the first hitting time of the level a by the process b^{−1/2} Bbt. As by the scaling property of BM the latter is again a BM, (1/b) T_{a√b} and Ta are identically distributed, and thus the subordinator Ta is stable. Comparing the Laplace transforms one identifies the rate, leading again to (3).
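Numerical illustration (a Monte Carlo sketch, assuming NumPy; t, b and the grid are arbitrary choices): for discretized BM, the fraction of paths whose running maximum exceeds b by time t should approach 2P(Bt ≥ b), as in Theorem 1.

```python
# Check of the reflection principle: P(M_t >= b) ~ 2 P(B_t >= b).
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(5)
n_paths, n_steps, t, b = 100_000, 2_000, 1.0, 1.2
dt = t / n_steps
x = np.zeros(n_paths)
M = np.zeros(n_paths)
for _ in range(n_steps):
    x += rng.normal(0, np.sqrt(dt), n_paths)
    np.maximum(M, x, out=M)          # running maximum

emp = np.mean(M >= b)
exact = 2 * (1 - 0.5 * (1 + erf(b / sqrt(2 * t))))   # 2 P(B_t >= b)
print(emp, exact)   # close; the discrete grid slightly undercounts crossings
```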
Theorem 4. The joint distribution of Mt and Bt is given by the density

φ(t, a, b) = P(Bt ∈ da, Mt ∈ db) = (2(2b − a)/√(2πt³)) exp{−(2b − a)²/2t} da db.   (4)

Proof. Let a ≤ b. Then

P(Bt < a, Mt ≥ b) = P(Mt ≥ b, BTb+(t−Tb) − BTb < −(b − a)) = P(Mt ≥ b, BTb+(t−Tb) − BTb ≥ b − a) = P(Mt ≥ b, Bt ≥ 2b − a) = P(Bt ≥ 2b − a),

and (4) follows by differentiation.

Theorem 5. The reflected Brownian motion |Bt| and the process Yt = Mt − Bt are both Markov with the same transition probability density

pt+(x, y) = pt(x − y) + pt(x + y),   (5)

where pt(x − y) is the transition density of the standard BM.

Proof. To prove the statement for |Bt| one has to show that

P(|Bt + x| ∈ [a, b]) = P(|Bt − x| ∈ [a, b]) = ∫_a^b pt+(x, y) dy   (6)

for all b > a ≥ 0. This holds, because

P(|Bt + x| ∈ [a, b]) = P(Bt ∈ [a − x, b − x]) + P(−Bt ∈ [a + x, b + x]) = P(Bt ∈ [a − x, b − x]) + P(Bt ∈ [a + x, b + x]) = ∫_a^b (pt(y + x) + pt(y − x)) dy.

Turning to Yt, let m = Mt > 0, b = Bt < m and r = m − b. Then by the strong Markov property

P(Mt+h − Bt+h < ξ | Ft) = P(Mt+h − Bt+h < ξ | Bt = b, Mt = m)
= E(1_{Mt+h−Bt+h<ξ} 1_{Mt+h=m} | Bt = b, Mt = m) + E(1_{Mt+h−Bt+h<ξ} 1_{Mt+h>m} | Bt = b, Mt = m)
= E(1_{r−Bh<ξ} 1_{Mh<r}) + E(1_{Mh−Bh<ξ} 1_{Mh≥r}),

and this is seen (by inspection) to be the integral of φ(h, x, y) from (4) over the domain r − ξ < x < y < x + ξ, i.e. it equals

∫_{r−ξ}^∞ dx ∫_x^{x+ξ} dy φ(h, x, y) = −∫_{r−ξ}^∞ [ph(2y − x)]_{y=x}^{y=x+ξ} dx,

because φ(h, x, y) = −(∂/∂y) ph(2y − x). Hence

P(Mt+h − Bt+h < ξ | Ft) = ∫_{r−ξ}^∞ ph(x) dx − ∫_{r−ξ}^∞ ph(2ξ + x) dx = ∫_{r−ξ}^∞ ph(x) dx − ∫_{r+ξ}^∞ ph(y) dy.

Differentiating with respect to ξ yields (5).

Def. The arcsin law is the distribution of ξ = sin²X, where X is U(0, 2π) (uniformly distributed on [0, 2π]). Clearly

P(ξ ≤ t) = P(|sin X| ≤ √t) = (2/π) arcsin √t,   t ∈ [0, 1].   (7)

Theorem 6. Let Bt be a Brownian motion on [0, 1] with the maximum process Mt. Then the random times τ = inf{t : Bt = M1} (when Bt first attains its maximum), τ̃ = sup{t : Bt = M1} (when Bt attains its maximum for the last time) and the time θ = sup{t : Bt = 0} of the last exit from the origin all obey the arcsin law. In particular, as τ ≤ τ̃ and both have the same distribution, τ = τ̃ a.s.

Proof.

P(τ̃ ≤ t) = P(τ ≤ t) = P( sup_{s≤t}(Bs − Bt) ≥ sup_{s≥t}(Bs − Bt) ) = P(|Bt| ≥ |B1 − Bt|)
= P(tξ² ≥ (1 − t)η²) = P( η²/(ξ² + η²) ≤ t ) = P(sin²X ≤ t),

where ξ, η are independent N(0, 1) r.v. and X is uniformly distributed on [0, 2π]. (Exer.: use Theorems 1, 4 to explain the reasoning behind all these equalities!) And

P(θ < t) = P(sup_{s≥t} Bs < 0) + P(inf_{s≥t} Bs > 0) = 2P(sup_{s≥t}(Bs − Bt) < −Bt) = 2P(|B1 − Bt| < −Bt) = P(|B1 − Bt| < |Bt|) = P(τ ≤ t).

Exer. Show (either directly or applying the scaling transformation to (7)) that for τt = inf{s ∈ (0, t) : Bs = Mt}

P(τt ≤ r) = ∫_0^r dy/(π√(y(t − y))) = (2/π) arcsin √(r/t).
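Numerical illustration (a Monte Carlo sketch, assuming NumPy; sample sizes are arbitrary): the time at which a discretized BM on [0, 1] attains its maximum should follow the arcsin law (7).

```python
# Check of Theorem 6: the CDF of the argmax time is (2/pi) arcsin(sqrt(t)).
import numpy as np

rng = np.random.default_rng(6)
n_paths, n_steps = 50_000, 1_000
x = np.zeros(n_paths)
best = np.zeros(n_paths)        # running max (B_0 = 0)
arg = np.zeros(n_paths)         # grid index of the running argmax
for i in range(1, n_steps + 1):
    x += rng.normal(0, np.sqrt(1.0 / n_steps), n_paths)
    upd = x > best
    best[upd] = x[upd]
    arg[upd] = i
tau = arg / n_steps

for t in (0.1, 0.25, 0.5, 0.9):
    print(t, np.mean(tau <= t), 2 / np.pi * np.arcsin(np.sqrt(t)))
```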
CHAPTER 4. HEAT CONDUCTION (OR DIFFUSION) EQUATION.

Section 12. The Dirichlet problem for diffusion operators.

Assume ajk, bj are continuous bounded functions s.t. the matrix (ajk) is positive definite and the operator

L f(x) = Σ_{j=1}^d bj(x) ∂f/∂xj + (1/2) Σ_{j,k=1}^d ajk(x) ∂²f/∂xj∂xk

generates a Feller diffusion Xt. Assume Ω is an open subset of Rd with boundary ∂Ω and closure Ω̄. The Dirichlet problem for L in Ω consists in finding a u ∈ Cb(Ω̄) ∩ Cb²(Ω) s.t.

L u(x) = f(x), x ∈ Ω;   u|∂Ω = ψ   (1)

for given f ∈ Cb(Ω), ψ ∈ Cb(∂Ω). A fundamental link between probability and PDEs is given by the following theorem.

Theorem 1. Let Ω be bounded and Ex τΩ < ∞ for all x ∈ Ω, where τΩ = inf{t ≥ 0 : Xt ∈ ∂Ω} (e.g. if X is a BM), and let u ∈ Cb(Ω̄) ∩ C²(Ω) be a solution to (1). Then

u(x) = Ex [ ψ(XτΩ) − ∫_0^{τΩ} f(Xt) dt ].   (2)

In particular, such a solution u is unique.

Proof. (i) Assume first that u can be extended to the whole of Rd as a function u ∈ C∞(Rd) ∩ Cb²(Rd). Then u ∈ DL, and applying the stopping time τΩ to Dynkin's martingale yields

Ex [ u(XτΩ) − u(x) − ∫_0^{τΩ} L u(Xt) dt ] = 0,   (2a)

implying (2). (ii) In the general case choose an expanding sequence of domains Ωn ⊂ Ω with smooth boundaries tending to Ω as n → ∞. The solution un to the problem

L un(x) = f(x), x ∈ Ωn;   un|∂Ωn = u

can be extended to Rd as in (i), and hence is unique and has the representation

u(x) = un(x) = Ex [ u(XτΩn) − ∫_0^{τΩn} f(Xt) dt ],   x ∈ Ωn.

Taking the limit as n → ∞ yields (2), because τΩn → τΩ as n → ∞.

Example. Take Ω = (α, β) ⊂ R and

L = (1/2) a(x) d²/dx² + b(x) d/dx

with a, b ∈ C(Ω̄), a > 0. Then u(x) = Px(XτΩ = β) is the probability that Xt starting at a point x ∈ (α, β) reaches β before α, and it represents a solution to the problem

(1/2) a(x) u″(x) + b(x) u′(x) = 0, x ∈ (α, β);   u(α) = 0, u(β) = 1.   (3)

On the other hand, u(x) = Ex τΩ, the mean exit time from Ω, solves the problem

(1/2) a(x) u″(x) + b(x) u′(x) = −1, x ∈ (α, β);   u(α) = u(β) = 0.   (4)

Exer. 1. (i) Solve problem (3) analytically, showing that

Px(XτΩ = β) = ∫_α^x exp{g(y)} dy ( ∫_α^β exp{g(y)} dy )^{−1},   (5)

where g(x) = −∫_α^x (2b/a)(y) dy. In particular, for a standard BM Bt starting at x this gives Px(BτΩ = β) = (x − α)/(β − α). (ii) Solve (4) with b = 0, showing that in this case

Ex τΩ = 2 ((x − α)/(β − α)) ∫_α^β (β − y)/a(y) dy − 2 ∫_α^x (x − y)/a(y) dy.   (6)

In particular, for a BM this turns into (x − α)(β − x). Hint for (ii): show first that the solution to the Cauchy problem

(1/2) a(x) u″(x) = −1,   u(α) = 0,

is given by the formula

u(x) = ω(x − α) − 2 ∫_α^x (x − y) a^{−1}(y) dy

with a constant ω.

Exer. 2. Check that Δφ = h″(|x|) + ((d − 1)/|x|) h′(|x|) for φ(x) = h(|x|). Deduce that if such a φ is harmonic (i.e. satisfies the Laplace equation Δφ = 0) in Rd \ {0}, then

h(r) = A + B r^{−(d−2)}, d > 2;   h(r) = A + B ln r, d = 2,   (7)

with some constants A, B.

Exer. 3. Solve the equation Δφ = 0 in the shell S_{r,R} = {x ∈ Rd : r < |x| < R} with the boundary conditions φ(x) = 1 (respectively 0) on |x| = R (resp. |x| = r). Hence compute the probability that the standard Brownian motion started from a point x ∈ S_{r,R} leaves the shell via the outer part of the boundary. Hint: choosing appropriate A, B in (7) one finds

φ(x) = (|x|^{2−d} − r^{2−d})/(R^{2−d} − r^{2−d}), d > 2;   φ(x) = (ln|x| − ln r)/(ln R − ln r), d = 2.   (8)

This describes the required probability due to Theorem 1.

Exer. 4. Calculate the probability of the Brownian motion Wt ever hitting the ball Br if started at a distance a > r from the origin. Hint: let TR (resp. Tr) be the first time |Wt| = R (resp. r). By letting R → ∞ in (8),

Px(Tr < ∞) = lim_{R→∞} Px(Tr < TR) = (r/a)^{d−2}, d > 2;   = 1, d = 2.   (9)

Exer. 5. Use Borel-Cantelli and Exer. 4 to deduce that for d > 2 and any starting point x ≠ 0 there exists a.s. a positive r > 0 s.t. Wtx starting at x never hits the ball Br. Hint: for any r < a = |x| let An be the event that Wtx ever hits the ball B_{r/2^n}. Then Σn P(An) < ∞.

Exer. 6. Show that BM in dimension d > 2 is transient, i.e. that a.s. lim_{t→∞} |Wt| = ∞. Hint: as Wt is a.s. unbounded (why?), the event that Wt does not tend to infinity means that there exists a ball Br s.t. infinitely many events An occur, where An means that the trajectory returns to Br after being outside B_{2^n r}. This leads to a contradiction by Borel-Cantelli and (9).
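Numerical illustration (a Monte Carlo sketch, assuming NumPy; the interval, starting point and step size are arbitrary choices): Euler paths of BM started at x ∈ (α, β) should exit at β with probability (x − α)/(β − α) and have mean exit time (x − α)(β − x), as in Exer. 1.

```python
# Check of the one-dimensional exit problem for BM (Exer. 1 with a = 1, b = 0).
import numpy as np

rng = np.random.default_rng(7)
alpha, beta, x0, dt, n_paths = -1.0, 2.0, 0.5, 1e-3, 10_000
x = np.full(n_paths, x0)
t_exit = np.zeros(n_paths)
alive = np.ones(n_paths, dtype=bool)
t = 0.0
while alive.any():
    t += dt
    x[alive] += rng.normal(0, np.sqrt(dt), alive.sum())
    done = alive & ((x <= alpha) | (x >= beta))
    t_exit[done] = t
    alive &= ~done
print(np.mean(x >= beta), (x0 - alpha) / (beta - alpha))   # ~ 0.5 each
print(t_exit.mean(), (x0 - alpha) * (beta - x0))           # ~ 2.25 each
```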
? Theorem 2. Let L be the generator of a Feller diffusion Xt. Given a domain Ω ⊂ Rd, assume that there exists a twice continuously differentiable function f ≥ 0 on Rd \ Ω s.t. Lf(x) ≤ 0 there, and for some a > 0 and a point x0 ∈ Rd \ Ω one has

f(x0) < a < inf{f(x) : x ∈ ∂Ω}.

Then Xt started at x0 will never hit Ω with a positive probability (this actually means that the diffusion Xt is transient).

Proof. Let N > |x0|, and let τΩ and τN denote the hitting times of Ω and of the sphere |y| = N respectively. Put TN = min(τN, τΩ). From (2a) it follows that

Ex0 f(XTN) ≤ f(x0) < a.

Hence

a > inf{f(x) : x ∈ ∂Ω} Px0(τΩ < τN).

Passing to the limit as N → ∞ yields a ≥ inf{f(x) : x ∈ ∂Ω} Px0(τΩ < ∞) > a Px0(τΩ < ∞) (unless the latter probability vanishes), implying Px0(τΩ < ∞) < 1.

Section 13. The stationary Feynman-Kac formula.

Recall that the equation

λg = Ag + f,   (1)

where A is the generator of a Feller semigroup Φt, f ∈ C∞(Rd), λ > 0, is solved uniquely by the formula

g(x) = Rλ f(x) = Ex ∫_0^∞ e^{−λt} f(Xt) dt.

This suggests a guess that a solution to the more general equation

(λ + k)g = Ag + f,   (2)

where k denotes a bounded continuous function, could look like

g(x) = Ex ∫_0^∞ exp{ −λt − ∫_0^t k(Xs) ds } f(Xt) dt.   (3)

This is the stationary Feynman-Kac formula that we are going to discuss now. The fastest way of proving it (at least for diffusions) is by means of Ito's stochastic calculus. Not having this tool at our disposal, we shall use a different method, first rewriting it in terms of the resolvents (thus rewriting the differential equation (2) in an integral form).

Theorem 1. Let Xt be a Feller process with semigroup Φt and generator A. Suppose f ∈ C∞(Rd), k is a continuous bounded non-negative function and λ > 0. Then g ∈ DA and satisfies (2) iff g ∈ C∞(Rd) and

Rλ(kg) = Rλ f − g.   (4)

Proof. Applying Rλ to both sides of (2) and using Rλ(λ − A)g = g yields (4). Conversely, subtracting the resolvent equations for f and kg,

A Rλ f = λ Rλ f − f,   A Rλ(kg) = λ Rλ(kg) − kg,   (5)

and using (4) yields (2).

Theorem 2. Under the assumptions of Theorem 1 the function (3) yields a solution to (4) and hence to (2).

Proof. Using the Markov property one writes

Rλ(kg)(x) = Ex ∫_0^∞ e^{−λs} k(Xs) g(Xs) ds = Ex ∫_0^∞ e^{−λs} k(Xs) ∫_0^∞ exp{ −λt − ∫_0^t k(Xu+s) du } f(Xt+s) dt ds.

Changing the variables of integration t, u to t̃ = s + t and ũ = s + u, and denoting them again by t and u respectively, leads to

Rλ(kg)(x) = Ex ∫_0^∞ e^{−λt} f(Xt) ∫_0^t k(Xs) exp{ −∫_s^t k(Xu) du } ds dt,

which by integration by parts rewrites as

Ex ∫_0^∞ e^{−λt} f(Xt) [ 1 − exp{ −∫_0^t k(Xs) ds } ] dt = (Rλ f − g)(x),

as required.
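Numerical illustration (a Monte Carlo sketch, assuming NumPy/SciPy; the constant killing rate and test function are arbitrary choices): with constant k ≡ κ, formula (3) collapses to g = R_{λ+κ} f, so the path average can be compared against the explicit BM resolvent kernel (7.8).

```python
# Check of the Feynman-Kac formula (3) for BM with constant k = kappa.
import numpy as np
from scipy.integrate import quad

rng = np.random.default_rng(8)
lam, kappa, x0 = 1.0, 0.5, 0.2
f = lambda y: np.exp(-y**2)

# Monte Carlo: g(x0) = E int_0^inf e^{-(lam+kappa) t} f(x0 + B_t) dt
n_paths, n_steps, T = 20_000, 1_000, 8.0   # e^{-(lam+kappa) T} is negligible
dt = T / n_steps
x = np.full(n_paths, x0)
acc = np.zeros(n_paths)
for i in range(1, n_steps + 1):
    x += rng.normal(0, np.sqrt(dt), n_paths)
    acc += np.exp(-(lam + kappa) * i * dt) * f(x) * dt
mc = acc.mean()

mu = lam + kappa                            # exact value via the kernel (7.8)
exact = quad(lambda y: np.exp(-np.sqrt(2 * mu) * abs(y - x0)) * f(y),
             -np.inf, np.inf)[0] / np.sqrt(2 * mu)
print(mc, exact)
```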
In many interesting particular situations the validity of formula (3) can be extended beyond the general conditions of Theorem 2. Let us consider one such extension for a one-dimensional BM.

Theorem 3. Assume k ≥ 0 and f are piecewise-continuous bounded functions on R with finite sets of discontinuities Disc_k and Disc_f. Then the (clearly bounded) function g given by (3), with Xt being a BM Bt, is continuously differentiable, has a piecewise continuous second derivative and satisfies

(λ + k)g = (1/2) g″ + f   (6)

outside Disc_k ∪ Disc_f.

Proof. The calculations in the proof of Theorem 2 remain valid for all bounded measurable f and k, showing that g satisfies (4). Moreover, for piecewise continuous f and k one sees from dominated convergence that this g is continuous. Next, from Exer. 8 of Section 7 one finds that

Rλ f(x) = (1/√(2λ)) [ ∫_{−∞}^x e^{√(2λ)(y−x)} f(y) dy + ∫_x^∞ e^{√(2λ)(x−y)} f(y) dy ].

Hence Rλ f is continuously differentiable for any bounded measurable f, with

(Rλ f)′(x) = ∫_x^∞ e^{√(2λ)(x−y)} f(y) dy − ∫_{−∞}^x e^{√(2λ)(y−x)} f(y) dy.

This implies in turn that (Rλ f)″ is piecewise continuous for a piecewise continuous f, and the resolvent equations (5) hold outside Disc_f ∪ Disc_k. Hence one shows as in Theorem 2 that g satisfies (6), which by integration implies the continuity of g′.

Exer. 2. Show that for α, β > 0 and a BM Bt

∫_0^∞ Ex exp{ −αt − β ∫_0^t 1_{(0,∞)}(Bs) ds } dt = (1/(α + β)) [ 1 + ((√(α + β) − √α)/√α) e^{−√(2(α+β)) x} ]   (7)

for x ≥ 0. Hint: by Theorem 3 the function z(x) on the l.h.s. of (7) is a bounded solution to the equation

αz(x) = (1/2) z″(x) − βz(x) + 1, x > 0;   αz(x) = (1/2) z″(x) + 1, x < 0,   (8)

with the boundary conditions z(0+) = z(0−), z′(0+) = z′(0−). The bounded solutions to (8) have the form

z(x) = A exp{−√(2(α + β)) x} + 1/(α + β), x > 0;   z(x) = B exp{√(2α) x} + 1/α, x < 0.

Theorem 4 (arcsin law for the occupation time). The law of the occupation time Ot = ∫_0^t 1_{(0,∞)}(Bs) ds of (0, ∞) by a standard BM Bt has the density

P(Ot ∈ dy) = dy/(π √(y(t − y))).   (9)

Proof. By the uniqueness of the Laplace transform it is enough to show that

E e^{−βOt} = ∫_0^t e^{−βy} dy/(π √(y(t − y))).   (10)

But from (7)

∫_0^∞ e^{−αt} E e^{−βOt} dt = z(0) = 1/√(α(α + β)),

and on the other hand

∫_0^∞ e^{−αt} ∫_0^t e^{−βy} dy/(π√(y(t − y))) dt = (1/π) ∫_0^∞ e^{−(α+β)y} dy/√y ∫_0^∞ e^{−αs} ds/√s = 1/√(α(α + β)),

which implies (10), again by the uniqueness of the Laplace transform.

Exer. 3. From formula (7.9), which yields the solution to the equation (λ − ½Δ)g = f, λ > 0, in R³, deduce that the solution to the Poisson equation ½Δg = −f in R³ is given by the formula

g(x) = (1/(2π)) ∫ f(y)/|x − y| dy,

whenever f decreases quickly enough at infinity.

Section 14. Diffusions with variable drift, Ornstein-Uhlenbeck processes.

In order to be able to solve probabilistically equations involving second order differential operators, one has to know that these operators generate Markov (Feller) semigroups. Here we show how BM can be used to construct processes with generators of the form

L f(x) = (1/2) Δf(x) + (b(x), ∂f/∂x),   x ∈ Rd.   (1)

Let b be a bounded Lipschitz continuous function, i.e. |b(x) − b(y)| ≤ C|x − y| with a constant C. Let Bt be an Ft-BM on a filtered probability space. Then the equation

Xt = x + ∫_0^t b(Xs) ds + Bt

has a unique global continuous solution Xt(x) for any x, depending continuously on x (the proof, by fixed point arguments, is literally the same as for usual ODEs). Clearly Xt(x) is an Ft-Markov process starting at x.
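Numerical illustration (a sketch assuming NumPy; the drift b = tanh and the grid are arbitrary choices): given one sampled Brownian path on a grid, the fixed-point map X ↦ x + ∫_0^· b(Xs) ds + B just mentioned can be iterated, and for Lipschitz b the iterates converge.

```python
# Picard iteration for X_t = x + int_0^t b(X_s) ds + B_t along one BM path.
import numpy as np

rng = np.random.default_rng(9)
b = np.tanh                        # bounded, Lipschitz drift
x0, T, n = 0.5, 1.0, 10_000
dt = T / n
Bpath = np.concatenate([[0.0], np.cumsum(rng.normal(0, np.sqrt(dt), n))])

X = np.full(n + 1, x0)             # initial guess
for _ in range(30):
    drift = np.concatenate([[0.0], np.cumsum(b(X[:-1]) * dt)])  # ~ int b(X) ds
    X_new = x0 + drift + Bpath
    if np.max(np.abs(X_new - X)) < 1e-12:
        break
    X = X_new
print(np.max(np.abs(X_new - X)))   # the iterates converge to the solution path
```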
Theorem 1. Xt is a Feller process with the generator (1).

Proof. Clearly Φt f(x) = E f(Xt(x)) is a semigroup of positive contractions on Cb(Rd). Let f ∈ C∞comp(Rd). Then by the Taylor expansion

Φt f(x) − f(x) = E ( ∂f/∂x (x), Bt + ∫_0^t b(Xs) ds ) + (1/2) E ( ∂²f/∂x² (x) (Bt + ∫_0^t b(Xs) ds), Bt + ∫_0^t b(Xs) ds ) + ...,

where the dots denote the remainder of the Taylor series. Taking into account that E|Bt|^k = O(t^{k/2}), it follows that the r.h.s. of this expression is

( ∂f/∂x (x), E(Bt + t b(x)) ) + (1/2) E ( ∂²f/∂x² (x) Bt, Bt ) + o(t),   t → 0,

so that

(1/t)(Φt f(x) − f(x)) → L f(x),   t → 0.

Hence any f ∈ C∞comp(Rd) belongs to the domain of the generator L, and L f is given by formula (1). As clearly Φt f → f for any such f as t → 0, it follows by density arguments that the same holds for all f ∈ C∞.

Exer. 1. Convince yourself that the assumption that b is bounded can be dispensed with (only the Lipschitz continuity is essential).

Example 1. The solution to the Langevin equation

vt = v − b ∫_0^t vs ds + Bt

with a given constant b > 0 defines a Feller process called the Ornstein-Uhlenbeck (velocity) process, with the generator

L f(v) = (1/2) Δf(v) − b(v, ∂f/∂v),   v ∈ Rd.   (2)

The pair (vt, xt = x0 + ∫_0^t vs ds) describes the evolution of a (Newtonian) particle subject to a white noise driving force and friction, and is also sometimes called the Ornstein-Uhlenbeck process.

Example 2. The solution to the system

ẋt = yt,   yt = −∫_0^t (∂V/∂x)(xs) ds − b ∫_0^t ys ds + Bt   (3)

describes the evolution of a Newtonian particle in the potential field V, subject to friction and a white noise driving force.

Exer. 2. Assume b = 0 and that the potential V is bounded below, say V ≥ 1 everywhere, and increases to ∞ as |x| → ∞. (i) Write down the generator L of the pair process (xt, yt). Answer:

L f(x, y) = (y, ∂f/∂x) − (∂V/∂x, ∂f/∂y) + (1/2) Δy f.

(ii) Check that L(H^{−α}) ≤ 0 for 0 < α < d/2 − 1, where H(x, y) = V(x) + y²/2 is the energy function (Hamiltonian). (iii) Applying Dynkin's formula with f = H^{−α} to the process starting at (x, y), with the stopping time

τh = inf{t ≥ 0 : H(xt, yt) = h},

where h < H(x, y), show that Ex,y f((x, y)(τh)) < f(x, y) and consequently

Px,y(τh < ∞) ≤ (h/H(x, y))^α.

(iv) Follow the same reasoning as in Exer. 12.6 to establish that the process (xt, yt) is transient in dimension d ≥ 3 (this result is remarkable, as it holds for all (smooth) V).

Open problem. Under which conditions on V is the process specified by (3) with b = 0 transient in dimension d = 1, 2? (For d ≥ 3 the answer is fully settled by Exer. 2; in d = 1 only a necessary (but not a sufficient) condition for transience is known.)
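Numerical illustration (a sketch assuming NumPy; b, v and the grid are arbitrary choices): Euler steps for the Langevin equation of Example 1. Taking expectations in the equation gives (d/dt) E vt = −b E vt, so E vt = v e^{−bt}.

```python
# Euler scheme for dv = -b v dt + dB (Ornstein-Uhlenbeck velocity process).
import numpy as np

rng = np.random.default_rng(10)
b, v0, T, n_steps, n_paths = 2.0, 1.5, 1.0, 1_000, 100_000
dt = T / n_steps
v = np.full(n_paths, v0)
for _ in range(n_steps):
    v += -b * v * dt + rng.normal(0, np.sqrt(dt), n_paths)
# mean reversion: E v_T = v0 e^{-bT}, up to O(dt) Euler bias and MC error
print(v.mean(), v0 * np.exp(-b * T))
```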
CHAPTER 5. FINE PROPERTIES of BM.

Section 15. Zeros, excursions and local times.

From Theorem 11.5 one has, for a BM Wt,

|Wt| = Mt − Bt,   (1)

where Bt is another BM and Mt is its maximum process. Hence the times where Wt is away from the origin coincide with the times where Mt ≠ Bt and Mt remains constant, so that one can interpret Mt as a measure of the time Wt spends at the origin. This motivates Lévy's definition of the process measuring the local time Lt(0) spent by a BM Wt at the origin via the equation 2Lt(0) = Mt (some authors do not include the multiplier 2 in this equation).

As for each τ the time s ≤ τ for which Bs = Mτ is a.s. unique (see Theorem 11.6), one can choose a set Ω0 of full measure s.t. for all ω ∈ Ω0 this uniqueness holds for all rational τ. For any such ω and a t > 0 define

γt(ω) = sup{s ∈ [0, t] : Ws = 0} = sup{s ∈ [0, t] : Bs = Mt},
βt(ω) = inf{s ∈ [t, ∞) : Ws = 0} = inf{s ∈ [t, ∞) : Bs = Mt},

so that

γt(ω) < t < βt(ω)   (2)

whenever Wt ≠ 0. The assumption ω ∈ Ω0 implies that the maximum of Bs on [0, t] is attained uniquely at s = γt(ω), so that

T_{Mt(ω)}(ω) = γt(ω),   T_{Mt(ω)+}(ω) = βt(ω),

and thus T_{Mt(ω)+}(ω) − T_{Mt(ω)}(ω) = βt(ω) − γt(ω): the size of the jump of Tb(ω) at b = Mt(ω) equals the length of the excursion interval (γt(ω), βt(ω)) straddling t.

Let N(b, [δ, ε)) denote the number of jumps of size l ∈ [δ, ε) of the Lévy subordinator Ta (or its right continuous modification Ta+) that occur on the time interval (0, b] (cf. the notation in Corollary 3 to Theorem 4.5), and let N^δ(b) = N(b, [δ, ∞)). According to Corollary 3 to Theorem 4.5 and Theorem 11.3, the process b ↦ N(b, [δ, ε)) is a Poisson process with intensity

ν([δ, ε)) = ∫_δ^ε (2πx³)^{−1/2} dx = √(2/π) (δ^{−1/2} − ε^{−1/2}).   (3)

Theorem 1. A.s.

lim_{δ→0} √(πδ/2) N^δ(b) = b   ∀b ∈ [0, ∞).   (4)

Proof. According to (3) the r.v. Qt = N^{1/t²}(b) is Poisson with parameter √(2/π) bt. Hence, as the process Qt has non-decreasing right continuous paths and independent increments, it is a Poisson process, and by the law of large numbers (see Exer. 4.9) Qt/t → √(2/π) b a.s. as t → ∞, implying (4) for a given b. One then deduces that (4) holds a.s. for all rational b, and then by monotonicity one extends it to all b.

Theorem 2 (Lévy, 1948).

Lt(0) = lim_{δ→0} √(πδ/8) nt(δ),   (5)

where nt(δ) is the number of excursion intervals away from the origin, of duration ≥ δ, completed by Ws, s ≤ t.

Proof. It follows from (4) and the above definition of Lévy's local time that

Lt(0) = lim_{δ→0} √(πδ/8) ñt(δ),

where ñt(δ) denotes the number of excursion intervals away from the origin, of duration ≥ δ, completed by Ws, s ≤ T_{Mt+}. But according to (2), βt = T_{Mt+} is the time of completion of the excursion straddling t. Hence nt(δ) and ñt(δ) differ by at most one, and (5) follows.

Exer. Show that the zero set of BM, {t : Wt = 0}, (i) is a closed set of Lebesgue measure zero, and (ii) is unbounded and has an accumulation point at the origin. Hint: (i) use Fubini's theorem and the continuity of BM; (ii) the maximum and minimum of BM are both a.s. non-zero on any finite interval, and are both a.s. unbounded on t ≥ 0.

Section 16. Skorohod imbedding and invariance principle.

For a ≤ 0 ≤ b let νa,b be the unique probability measure on the two-point set {a, b} with mean zero, so that νa,b = δ0 for ab = 0 and

νa,b = (b δa − a δb)/(b − a)   (1)

otherwise.

Prop. 1 (Randomization Lemma). For any distribution µ on R with zero mean denote by µ± its restrictions to R+ and R− respectively, and put c = ∫ x µ+(dx) = −∫ x µ−(dx). Then

µ = ∫ µ̃(dx dy) νx,y,   (2)

where the distribution µ̃ on R− × R+ is given by

µ̃(dx dy) = µ({0}) δ_{0,0}(dx dy) + c^{−1}(y − x) µ−(dx) µ+(dy).

Proof. Direct calculation, applying both sides of (2) to a continuous function f.

Prop. 2. Let τ be a stopping time for a BM Bt such that Bmin(t,τ) is uniformly bounded. Then

E Bτ = 0,   E τ = E Bτ².

Proof. By optional stopping (and the basic martingales)

E Bmin(t,τ) = 0,   E(min(t, τ)) = E B²min(t,τ),

and the desired result is obtained by dominated and monotone convergence as t → ∞.

Prop. 3 (embedding of a r.v.). For a probability measure µ on R with mean zero, choose a random pair (a, b) with the distribution µ̃ from Prop. 1 and an independent BM Bt. Then (i) the random time T = inf{t : Bt ∈ {a, b}} is optional for the filtration σ{a, b; Bs, s ≤ t}; (ii) the law of BT is µ; (iii) the expectation of T coincides with the second moment (variance) of µ.

Proof. By Exer. 1 of Section 12 the r.v. BT for fixed a, b has the distribution (1). Hence

E f(BT) = E E(f(BT)|a, b) = ∫∫ f(z) νx,y(dz) µ̃(dx dy) = ∫ f(x) µ(dx),

yielding (ii). Then (iii) follows from Prop. 2.
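Numerical illustration (a Monte Carlo sketch, assuming NumPy; the target law is a hypothetical choice): for the mean-zero law µ = (2δ_{−1} + δ2)/3, the measure µ̃ of Prop. 1 is concentrated at the single pair (a, b) = (−1, 2), so the embedding of Prop. 3 is just the exit of BM from (−1, 2); the exit law should be ν_{−1,2} = µ and the mean exit time should equal Var µ = −ab = 2.

```python
# Check of Prop. 2-3: law of B_T and E T = E B_T^2 for T = exit time of (a, b).
import numpy as np

rng = np.random.default_rng(11)
a, b, dt, n_paths = -1.0, 2.0, 1e-3, 10_000
x = np.zeros(n_paths)
t_exit = np.zeros(n_paths)
alive = np.ones(n_paths, dtype=bool)
t = 0.0
while alive.any():
    t += dt
    x[alive] += rng.normal(0, np.sqrt(dt), alive.sum())
    done = alive & ((x <= a) | (x >= b))
    t_exit[done] = t
    alive &= ~done
print(np.mean(x >= b), -a / (b - a))   # P(B_T = b) = 1/3
print(t_exit.mean(), -a * b)           # E T = Var(mu) = 2
```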
Theorem 1 (Skorohod embedding). Let ξ1, ξ2, ... be i.i.d. r.v. with mean 0, and Sn = ξ1 + ... + ξn. Then there exist a filtered probability space with a BM Bt and stopping times 0 = T0 ≤ T1 ≤ ... s.t. the differences ΔTn = Tn − Tn−1 are i.i.d. with E ΔTn = E ξ1², and the BTn are distributed like the Sn for all n.

Remark. τn = inf{t ≥ τn−1 : Bt = Sn} would give a trivial solution if the moment requirement were not imposed.

Proof. Let µ denote the common law of the ξj. Take i.i.d. pairs (an, bn), n = 1, 2, ..., with the distribution µ̃ from Prop. 1, and an independent BM. Everything follows from the recursive definition of the random times 0 = T0 ≤ T1 ≤ T2 ≤ ... by

Tn = inf{t ≥ Tn−1 : Bt − BTn−1 ∈ {an, bn}}.

Theorem 2 (Approximation of random walks). Let ξ1, ξ2, ... be i.i.d. r.v. with mean 0 and variance 1, and let Sn = ξ1 + ... + ξn. Then there exists a BM Bt s.t.

Xt = t^{−1/2} sup_{s≤t} |S_{[s]} − Bs|

converges to zero in probability as t → ∞ ([s] denotes the integer part of s).

Proof. Choose Tn and B as in Theorem 1. Then Tn/n → 1 a.s. by the LLN, hence T_{[t]}/t → 1 a.s., and hence (check it!) δt/t → 0 a.s., where δt = sup_{s≤t} |T_{[s]} − s|. For any t, h, ε, by the scaling property of BM,

P(Xt > ε) ≤ P(δt > th) + P( sup_{|u−v|≤th, u,v≤t+th} |Bu − Bv| > ε√t ) = P(δt/t > h) + P( sup_{|u−v|≤h, u,v≤1+h} |Bu − Bv| > ε ),

which can be made arbitrarily small by choosing h small and t large.

Corollary (Functional CLT, invariance principle; two formulations). (i) For all C, ε > 0 there exists N s.t. for all n > N there exists a BM Bt (depending on n) s.t.

P( sup_{t≤1} |S_{[tn]}/√n − Bt| > C ) < ε.

(ii) Let F be a uniformly continuous function on the space D[0, 1] of cadlag functions on [0, 1] equipped with the sup-norm topology. Then F(S_{[·n]}/√n) converges in distribution to F(B), with B = Bt a standard BM.

Proof. (i) Applying Theorem 2 with t = n yields

P( n^{−1/2} sup_{s≤n} |S_{[s]} − Bs| > C ) → 0

as n → ∞ for any C. With s = tn this rewrites as

P( sup_{t≤1} |S_{[tn]}/√n − Btn/√n| > C ) → 0.

But by scaling Btn/√n is again a BM, and (i) follows.

(ii) One has to show that

E( g(F(S_{[·n]}/√n)) − g(F(B·)) ) → 0   (3)

as n → ∞ for any bounded uniformly continuous g. Choosing for each n a version of B from (i), one decomposes (3) into the sum of two terms, with the function under the expectation multiplied by the indicators 1_{Yn>C} and 1_{Yn≤C} respectively, where

Yn = sup_{t≤1} |S_{[tn]}/√n − Bt|.

Then the first term is small by (i) for any C and n large enough, and the second term is small for small C by the uniform continuity of F and g.

Examples. 1. Applying statement (i) with t = 1 yields the usual CLT for random walks. 2. Applying (ii) with F(h(·)) = sup_{t∈[0,1]} h(t) and taking into account the distribution of the maximum of BM (obtained by the reflection principle) yields

P( max{Sk : k ≤ n}/√n ≤ x ) → P(|N| ≤ x) = 2P(N ≤ x) − 1,   x ≥ 0,

where N is a standard normal r.v. N(0, 1).

Section 17. Sample path properties.

Non-differentiability, quadratic variation, modulus of continuity, law of the iterated logarithm, rate of escape.