Probability: Theory and Examples
Adam Bowditch
March 29, 2014

Contents
1 Brownian Motion
1.1 Introduction
1.2 Path Regularity of Brownian Motion
1.3 Donsker's Theorem
1.4 Martingales
1.5 Strong Markov Property and the Reflection Principle
2 Levy Processes
2.1 Infinitely Divisible Distributions
3 Markov Processes
3.1 Random Conductance Model
3.2 Heat Kernel Estimates
3.3 Green Densities

1 Brownian Motion

1.1 Introduction

Definition 1.1. Let $(M, \mathcal{M}, \mu)$ be a measure space where $\mu$ is a $\sigma$-finite, non-atomic measure. A white noise is a collection of random variables $\{\langle \eta, \varphi\rangle\}_{\varphi \in L^2(M,\mathcal{M},\mu)}$ such that for all $\varphi \in L^2(M,\mathcal{M},\mu)$ the variable $\langle \eta, \varphi\rangle$ is a centred Gaussian random variable and for all $\varphi_1, \varphi_2 \in L^2(M,\mathcal{M},\mu)$
$$\mathbb{E}[\langle \eta, \varphi_1\rangle \langle \eta, \varphi_2\rangle] = \int \varphi_1(x)\varphi_2(x)\,\mu(dx).$$

Intuitively we want $\eta$ to be a random Gaussian function such that $\eta(x)$ and $\eta(y)$ are independent for $x \neq y$. We can construct this using an approximation. For $N \geq 1$ let $I_j := [j/N, (j+1)/N)$, take $\xi_j^N \overset{\text{i.i.d.}}{\sim} N(0, \sigma_N^2)$ and set
$$\eta_N(x) = \sum_{j=0}^{N-1} \xi_j^N \chi_{I_j}(x).$$
For $\varphi_1 \neq \varphi_2 : [0,1] \to \mathbb{R}$ smooth we have
$$\langle \eta_N, \varphi_1\rangle = \int_0^1 \varphi_1(x)\eta_N(x)\,dx = \sum_{j=0}^{N-1} \xi_j^N \int_{j/N}^{(j+1)/N} \varphi_1(x)\,dx,$$
$$\mathbb{E}[\langle \eta_N, \varphi_1\rangle\langle \eta_N, \varphi_2\rangle] = \sum_{j=0}^{N-1}\sum_{k=0}^{N-1} \int_{j/N}^{(j+1)/N} \varphi_1(x)\,dx \int_{k/N}^{(k+1)/N} \varphi_2(x)\,dx\,\mathbb{E}[\xi_j^N \xi_k^N] = \sigma_N^2 \sum_{j=0}^{N-1} \int_{j/N}^{(j+1)/N} \varphi_1(x)\,dx \int_{j/N}^{(j+1)/N} \varphi_2(x)\,dx,$$
which converges to $\int_0^1 \varphi_1(x)\varphi_2(x)\,dx$ if $\sigma_N^2 = N$.

Remark 1.1.
1. It is possible to view white noise as a random distribution, in which case $\eta$ is a random distribution in $H^{-\frac{n}{2}-\tau}$ for $\tau > 0$.
2. If $\eta$ were a random function with $\mathbb{E}[\eta(x)\eta(y)] = C(x,y)$ then
$$\mathbb{E}[\langle \eta, \varphi_1\rangle\langle \eta, \varphi_2\rangle] = \int\int \varphi_1(x)\varphi_2(y)C(x,y)\,dx\,dy,$$
hence $\mathbb{E}[\eta(x)\eta(y)] = \delta_{x,y}$.
3. For approximation $N$ the joint density of the random function is proportional to
$$\exp\left(-\frac{1}{2\sigma_N^2}\sum_{k=0}^{N-1}\eta\left(\tfrac{k}{N}\right)^2\right)\prod_{k=0}^{N-1} d\eta\left(\tfrac{k}{N}\right).$$

Definition 1.2. A Brownian motion on a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ is a random function $(B_t)_{t\in\mathbb{R}_+}$ such that
1. $B_t$ is a centred Gaussian random variable;
2. $\mathbb{E}[B_t B_s] = t \wedge s$ for any $s,t \in \mathbb{R}_+$;
3. $t \mapsto B_t$ is continuous.

Lemma 1.1. Given that $B_t$ is a centred Gaussian random variable, the second condition of the definition is equivalent to $B_t$ having independent increments with $\mathbb{E}[(B_t - B_s)^2] = t-s$.

Corollary 1.1. Let $B_t$ be a Brownian motion; then so are
1. $\lambda B_{\lambda^{-2} t}$ for $\lambda \neq 0$;
2. $B_{t+s} - B_s$ for $s > 0$;
3. $t B_{1/t}$.

Definition 1.3.
$$H_k^n(t) := 2^{n/2}\chi_{(\frac{k-1}{2^n}, \frac{k}{2^n}]}(t) - 2^{n/2}\chi_{(\frac{k}{2^n}, \frac{k+1}{2^n}]}(t)$$
is called a Haar wavelet.

Lemma 1.2. The Haar wavelets form an orthonormal basis of $L^2([0,1])$.

Definition 1.4.
$$S_k^n(t) := \int_0^t H_k^n(s)\,ds$$
is called a Schauder function.

Lemma 1.3. Let $X^N$ be Gaussian vectors in $\mathbb{R}^n$ converging in distribution to $X$; then $X$ is Gaussian.

Theorem 1.1. A Brownian motion exists (and is $C^\alpha$).

Proof. Let $D_n := \{\tfrac{k}{2^n} : k \in [0, 2^n] \cap \mathbb{N}\}$ and denote $I_n := D_n \setminus D_{n-1}$; then define $\{\xi_j^{(n)}\}_{j\in I_n} \overset{\text{i.i.d.}}{\sim} N(0,1)$, independent of $\{\xi_k^{(m)}\}_{k\in I_m}$ for $m < n$.
Define the Haarwavelets: H (n) k 2n (t) = 2n/2 χ( k−1 n , 2 k 2n ] (t) − 2 Z t and the Schauder functions S (n) k 2n (t) = H 0 (n) (n) k 2n n/2 χ( kn , k+1 (t) 2 2n ] (s)ds Let bn = supj∈In |ξj | then ∀n ∈ N, j ∈ In , x > 0 we have that (n) P(ξj y2 ∞ e− 2 √ dy > x) = 2π x Z ∞ y2 1 ye− 2 dy ≤ √ x 2π x ∞ 2 1 − y2 √ = −e x 2π x Z x2 e− 2 = √ x 2π (n) (n) P(|ξj | > x) = 2P(ξj > x) r x2 2 e− 2 ≤ π x X (n) (n) P( sup |ξj | > x) ≤ P(|ξj ) j∈In j∈In r ≤2 n−1 x2 2 e− 2 π x 3 Choosing x = n gives n2 (n) P( sup |ξj | > n) ≤ j∈In Since 2 n e− 2 √ n 2π 2 n ∞ X 2n e− 2 √ <∞ n 2π n=1 Borel-Cantelli gives us that ∞ [ ∞ \ P ! sup n=1 m=n j∈Im (n) |ξj | >m =0 i.e. almost surely only finitely such incidences occur, so for almost every ω ∈ Ω ∃n(ω) such that ∀m > n(ω) we have that bm ≤ m. We define the approximation sequence (n) Bt := n X X (m) ξj (m) Sj (t) m=0 j∈Im (n) Fix α ∈ (0, 1/2) then we want to show that Bt is uniformly C α that is ∃C > 0 (independent of n) (n) (n) such that |Bt+s − Bt | ≤ C|s|α almost surely. n X X (n) (m) (m) (m) (n) ξj (Sj (t + s) − Sj (t)) |Bt+s − Bt | = m=0 j∈Im We split into cases: 1. For |s| < 2−m : (m) We have that the maximum gradient of the Schauder function Sj (t) is 2m/2 hence (m) |Sj (m) (t + s) − Sj (t)| ≤ |s|2m/2 ≤ |s|α |s|1−α 2m/2 ≤ |s|α 2−m(1−α) 2−m/2 ≤ |s|α 2−m(1/2−α) Furthermore notice that there are only at most 2 such j ∈ Im such that (m) (m) |Sj (t + s) − Sj (t)| > 0 since the Schauder functions are null on the intersections of their supports hence X (m) (m) |Sj (t + s) − Sj (t)| ≤ 2|s|α 2−m(1/2−α) j∈Im 2. For |s| ≥ 2−m : (m) 0 ≤ Sj (t) ≤ 2−m/2 hence (m) |Sj (m) (t + s) − Sj (t)| ≤ 2−m/2 ≤ |s|α 2−m/2 2αm ≤ |s|α 2−m(1/2−α) Furthermore notice that there are only at most 2 such j ∈ Im such that (m) (m) |Sj (t + s) − Sj (t)| > 0 since the Schauder functions have disjoint support hence X (m) (m) |Sj (t + s) − Sj (t)| ≤ 2|s|α 2−m(1/2−α) j∈Im 4 P (m) (m) So j∈Im |Sj (t + s) − Sj (t)| ≤ 2|s|α 2−m(1/2−α) . So we have that n X X (m) (m) (m) (n) (n) ξj (Sj (t + s) − Sj (t)) |Bt+s − Bt | = m=0 j∈Im n X ≤ bm m=0 (m) X |Sj (m) (t + s) − Sj (t)| j∈Im n X α bm 2−m(1/2−α) m=0 ∞ X C|s|α bm 2−m(1/2−α) m=0 ≤ C|s| ≤ P∞ Where m=0 bm 2−m(1/2−α) is almost surely finite since α < 1/2 and the fact that for almost every ω ∈ Ω ∃n(ω) such that ∀m > n(ω) we have bm ≤ m. (n) This gives us that Bt is uniformly C α hence any uniform limit is also C α . We now want to show (n) that {Bt }∞ n=1 is a Cauchy sequence in the supremum norm. For almost every ω ∈ Ω ∃n(ω) such that ∀m > n(ω) we have that bm ≤ m. This gives us that ∞ X X (m) |ξj (m) |Sj ∞ X (t) ≤ m=n(ω) j∈Im X (m) mSj (t) m=n(ω) j∈Im ∞ X ≤ m m=n(ω) ∞ X ≤ X (m) Sj (t) j∈Im m2−m/2 < ∞ m=n(ω) (n) hence Bt is Cauchy in the supremum norm since for k > m > n(ω) we have that k X X (l) (l) (k) (m) ξj Sj (t) |Bt − Bt | = l=m+1 j∈Il ≤ k X X (l) (l) |ξj |Sj (t) l=m j∈Il ∞ X X ≤ (l) (l) |ξj |Sj (t) l=m j∈Il ∞ X m2−l/2 ≤ l=m which can be made arbitrarily small by choosing sufficiently large m. (n) So we have that Bt converges to some Bt (α-Holder continuous) almost surely; so it remains to show (n) that we have the desired limit. Since Bt is a converging sequence of Gaussian processes the limit must also be Gaussian and hence it suffices to show that E[Bt ] = 0 and E[Bt Bs ] = t ∧ s. 5 E[Bt ] = E ∞ X X (m) ξj (m) Sj (t) m=0 j∈Im = ∞ X X (m) Sj (m) (t)E[ξj ] m=0 j∈Im =0 E[Bt Bs ] = E ∞ X X (m) ξj (m) Sj (t) = = = X X ! 
(n) (n) ξk Sk (s) n=0 k∈In m=0 j∈Im ∞ X ∞ X ∞ X X (m) Sj (n) (m) (n) ξk ] (t)Sk (s)E[ξj m=0 n=0 j∈Im k∈In ∞ X X (m) (m) Sj (t)Sj (s) m=0 j∈Im ∞ X D ED E X (m) (m) Hj , χ(0,t) Hj , χ(0,s) m=0 j∈Im (m) ξj i.i.d. ∼ N (0, 1) = χ(0,t) , χ(0,s) =t∧s Where the sums commute because convergence in probability implies convergence in Lp ∀p ∈ [1, ∞) for sequence of Gaussian random variables. Remark 1.2. We have constructed a continuous random function whose distribution is a probability measure on C[0, 1] called the Wiener measure. A slightly simpler proof can be given to only show that Bt exists. 1.2 Path Regularity of Brownian Motion Definition 1.5. For (Xt )t∈[0,1]n we say that Xt is α-Holder-continuous if sup t6=s |Xt − Xs | =: [X]C α < ∞ |t − s|α Theorem 1.2. Kolmogorov’s Continuity Theorem Let (Xt )t∈[0,1]n be a continuous random function such that ∃p, β, C > 0 such that ∀t, s we have that E[|Xt − Xs |p ] ≤ C|t − s|n+β then E[[X]C α ] < ∞ for all α < β/p. S Proof. Let Dm = {2−m Zn ∈ [0, 1]n } and D = m Dm be the diadic points. Then by continuity t −Xs | t −Xs | = supt6=s∈D |X|t−s| [X]C α = supt6=s |X|t−s| α α . S −m Let ∆m = {(t, s) ∈ Dm : |t − s|∞ = 2 } and ∆ = m ∆m be collections of neighbours. Then |∆m | ≤ 2mn 3n . 6 Write Rm := sup(t,s)∈∆m p E[Rm ] ≤ E X (t,s)∈∆m ≤ X (t,s)∈∆m |Xt −Xs | |t−s|α then |Xt − Xs | |t − s|α |t − s|n+β |t − s|α ≤ C2mn 3n 2−m(−αp+n+β = Cn 2m(αp−β) P p p hence if α < β/p we have that E[supm Rm ] ≤ E [ m Rm ] < ∞ and hence " # |Xt − Xs |p E sup <∞ α (t,s)∈∆ |t − s| It remains to show that sup (t,s)∈D |Xt − Xs |p |Xt − Xs |p ≤ C sup α |t − s|α (t,s)∈∆ |t − s| For fixed t, s ∈ D there exists m ∈ N such that 2−(m+1) ≤ |t − s| < 2−m . We can choose tm , sm ∈ Dm such that (tm , sm ) ∈ ∆m (or sm = tm ) and |tm − t|, |sm − s| < 2−m . We can then construct sequences {tn }n≥m , {sn }n≥m such that (tn , tn+1 ), (sn , sn+1 ) ∈ ∆n then ∞ X |X(t) − X(s)| ≤ |X(tm ) − X(sm )| + |X(tn ) − X(tn+1 )| + |X(sn ) − X(sn+1 )| n=m ≤C mα 2 +s ∞ X ! 2 −nα n=m ≤ C2−mα ≤ C|t − s|α Theorem 1.3. Let Bt be a Brownian motion on [0, 1] then ∀p > 1 and 0 < α < 1/2 we have that E[[B]C α ] < ∞. Proof. p E[|Bt − Bs | ] = E Bt − Bs |t − s|1/2 p |t − s|p/2 = Cp |t − s|p/2 hence E[[B]C α ] < ∞ by Kolmogorov’s Continuity. 1.3 Donsker’s Theorem Definition 1.6. The sequence of random variables Xn converges weakly to X if E[f (Xn )] → E[f (X)] for all f : C[0, 1] → R continuous, bounded. Remark 1.3. If the sequence of random variables Xn take values in (S, d) and converge in distribution to X then for any continuous f : S → S we have that f (Xn ) converge weakly to f (X). Example 1.1. Xt → supt |Xt | is continuous. (n) Definition 1.7. If Xt is a sequence of random variables then we say that the finite dimensional (n) (n) distributions converge if for any {tj }kj=1 then (Xt1 , ..., Xtk ) converges in distribution to (Xt1 , ..., Xtk ). 7 (n) i.i.d. Example 1.2. Consider ξk ∼ U[0, 1] random variables and Xt hat functions on (n) (n) (ξn − 1/n, ξn + 1/n). The finite dimensional distributions converge but supt Xt = 1 a.s. hence Xt doesn’t converge in distribution. Proving convergence of a process to a limiting process usually involves 1. Prove compactness of the sequence; 2. Identify limits. Definition 1.8. Let Π be a family of probability measures on (S, d). We say that Π is relatively compact if every sequence µn ⊂ Π has a subsequence which converges weakly to a limit µ. Definition 1.9. A family of probability measures is called tight if for any ε > 0 ∃Kε ⊂ S compact such that ∀µ ∈ Π we have that µ(Kεc ) < ε. 
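The construction in Theorem 1.1 can also be simulated directly: truncating the Schauder expansion at a finite level gives the approximations $B^{(n)}$, and the covariance $\mathbb{E}[B_t B_s] = t \wedge s$ of Definition 1.2 can then be checked empirically. The following is a minimal sketch, assuming NumPy; the level cut-off and grid are illustrative, and it uses the standard orthonormal Haar normalisation, so its constants may differ by fixed factors from those in Definition 1.3.

```python
import numpy as np


def brownian_levy_ciesielski(levels, npts=1025, rng=None):
    """Simulate Brownian motion on [0,1] by truncating the Schauder expansion:
    B_t ~= xi_0 * t + sum_{m=1}^{levels} sum_{k odd} xi_{m,k} * S_{m,k}(t),
    where S_{m,k} is the tent (Schauder) function peaking at the dyadic point
    k/2^m with height 2^{-(m+1)/2} (orthonormal Haar normalisation)."""
    rng = np.random.default_rng() if rng is None else rng
    t = np.linspace(0.0, 1.0, npts)
    B = rng.standard_normal() * t            # level-0 term, so that B_1 ~ N(0,1)
    for m in range(1, levels + 1):
        peak = 2.0 ** (-(m + 1) / 2)         # maximum of the level-m tents
        half_width = 2.0 ** (-m)             # half the support of each tent
        for k in range(1, 2 ** m, 2):        # odd dyadic points k/2^m
            centre = k / 2 ** m
            tent = np.clip(1.0 - np.abs(t - centre) / half_width, 0.0, None)
            B += rng.standard_normal() * peak * tent
    return t, B


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    paths = np.array(
        [brownian_levy_ciesielski(6, npts=5, rng=rng)[1] for _ in range(2000)]
    )
    # Definition 1.2: E[B_t B_s] = t ^ s, e.g. E[B_{1/4} B_{3/4}] = 1/4.
    print("E[B_{1/4} B_{3/4}] ~", np.mean(paths[:, 1] * paths[:, 3]))
```

At the dyadic points of level at most $n$ the truncated sum already agrees with the limiting process, which is why the empirical covariance above matches $t \wedge s$ up to Monte Carlo error even at a modest level cut-off.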
Example 1.3. The family (δn )n∈N is not tight. Corollary 1.2. If µn converges weakly to µ then {µn } is tight. Definition 1.10. A stochastic process is called tight if the distributions are tight. Theorem 1.4. Prokhorov Let (S, d) be a complete separable metric space. Then a family of probability measures Π on S is relatively compact if and only if it is tight. Lemma 1.4. Kolmogorov’s Tightness Criterion (n) If Xt is a family of continuous stochastic processes for t ∈ [0, 1]d such that ∃Cp,β where ∀s, t, n we (n) (n) (n) have E[|Xt − Xs |p ] ≤ C|t − s|d+β then Xt is tight on C α (0, 1) ∀α < β/p. Proof. C β embeds compactly for C α for β > α. Fix α < α < β/p then by Kolmogorov’s continuity criterion ∃C independent of n such that supn E[||X||pC α ] < C hence P(||X (n) ||C α ≥ N ) ≤ E[||X (n) ||pC α ] Np C ≤ p N Furthermore {X : ||X||C α ≤ N } is compact in C α so indeed the collection is tight. T Remark 1.4. C α is not separable but the processes take values in β>α C β which is separable. Definition 1.11. Let X : [0, 1]d → R be a function then we define the modulus of continuity as ω(X, δ) = sup|t−s|<δ |Xt − Xs | Theorem 1.5. Arzela-Ascoli A ⊂ C([0, 1]d ) is relatively compact if and only if 1. supx∈A |X(0)| < ∞; 2. limδ→0 supx∈A ω(X, δ) = 0. Lemma 1.5. Let Π be a family of probability measures on C[0, 1]d . Then Π is tight if and only if 1. limλ→∞ supµ∈Π µ(X : |X(0)| > λ) = 0; 2. limδ→0 supµ∈Π µ(X : ω(X, δ) > ε) = 0 for all ε > 0. Proof. Suppose Π is tight. Fix η > 0 then we want to show the ∃λ, δ such that 1. ∀µ ∈ Π µ(X : |X(0)| > λ) ≤ η; 8 2. ∀ε > 0 µ(X : ω(X, δ) > ε) ≤ η. We can find K compact such that ∀µ ∈ Π we have that µ(K c ) ≤ η. Choose λ = supX∈K |X(0)| then ∀µ ∈ Π we have µ(|X(0)| > λ) ≤ µ(K c ) ≤ η. ∃δ > 0 such that ∀X ∈ K such that ω(X, δ) ≤ ε so µ(X : ω(X, δ) > ε) ≤ µ(K c ) ≤ η. Lemma 1.6. Let {Xn }n∈N be a stochastic process. Then TFAE 1. Xn converges weakly to X; 2. ∀A open lim inf n→∞ P(An ∈ A) ≥ P(X ∈ A); 3. ∀A closed lim supn→∞ P(An ∈ A) ≤ P(X ∈ A); Theorem 1.6. Donsker Let (ξkP )∞ k=1 be an i.i.d. sequence of random variables with zero mean and unit variance, define n Sn := k=1 ξk and 1 (n) Xt = √ Sbntc + (nt − bntc)ξbntc+1 n (n) Then Xt converges weakly to Bt with respect to the C[0, 1] topology. (n) Proof. We want to show that Xt is tight i.e. that for any ε > 0 lim sup P(ω(δ, X) > ε) = 0 δ→0+ n→∞ However it suffices to show that lim sup lim sup P(ω(δ, X) > ε) = 0 δ→0+ n→∞ since this implies that ∀γ > 0 ∃τ > 0, N ∈ N such that ∀n ≥ N, δ < τ we have that P(ω(δ, Xn ) > ε) < γ. There are only finitely many such n < N so for such n ∃δn > 0 such that P(ω(δn , Xn ) > ε) < γ by continuity. Taking min{τ, δ1 , ..., δN } > 0 gives the desired result. By rearranging it suffices to show that √ lim sup lim sup P( sup sup |Sk+j − Sk | ≥ ε n) = 0 δ→0+ Let M = ( n nδ ≤ sup n→∞ 0≤k<n 0≤j≤bnδc 2 δ then notice that by the triangle inequality we have that ) ( ) √ √ sup |Sk+j − Sk | ≥ ε n ⊆ sup sup |Skbnδc+j − Skbnδc | ≥ ε n/3 0≤k<n 0≤j≤bnδc 0≤k<M 0≤j≤bnδc So we have P sup sup |Skbnδc+j ! M X √ − Skbnδc | ≥ ε n/3 ≤ P 0≤k<M 0≤j≤bnδc 2 ≤ P δ by the reflection principle. We need to show that this converges to zero as n → ∞. 9 |Skbnδc+j − Skbnδc | ≥ ε n/3 0≤j≤bnδc k=0 ≤ sup √ sup √ |Sj | ≥ ε n/3 0≤j≤bnδc √ 2 P |Sbnδc | ≥ 2ε n/3 δ ! ! 
√ By the central limit theorem Sbnδc / n converges weakly to a centred Gaussian with variance δ hence √ letting Y ∼ N (0, 1), X = δY ∼ N (0, δ) we have that by the previous lemma √ 2 2 lim sup P |Sj | ≥ 2ε n/3 = lim sup P |Y | ≥ 2ε/3δ 1/2 δ→0+ δ δ→0+ δ ≤ lim sup δ→0+ C δ 1/2 e− 1 2ε δ − 2 3 !2 2 =0 Lemma 1.7. Gaussian Tails If Z is a standard Gaussian random variable then for x > 0 we have that x2 x2 1 x √ e− 2 ≤ P(Z ≥ x) ≤ √ e− 2 (x2 + 1) 2π x 2π Lemma 1.8. Borel-Cantelli Let {An }∞ n=1 be a sequence of events then: T P∞ ∞ S 1. If n=1 P(An ) < ∞ then P A n=1 m≥n n = 0. 2. If {An }∞ n=1 are independent and P∞ n=1 P(An ) = ∞ then P T ∞ n=1 S A = 1. n m≥n Theorem 1.7. p Law of The Iterated Logarithm Let ψ(t) := 2t log(log(t)) for t > 1 and let Bt be a standard Brownian motion then lim sup t→∞ Bt =1 ψ(t) a.s. Proof. Use Gaussian Tails estimate and Borel-Cantelli Lemma. 1.4 Martingales For the start of this section we shall consider results for discrete time martingales on the state space R however many of the results extend to continuous time. Definition 1.12. A filtration {Fn }∞ n=0 of (Ω, F, P) is an increasing sequence of sub-σ-algebras of F. Definition 1.13. We say that the stochastic process Xn is adapted to the filtration Fn if for all n we have that Xn is Fn measurable. Definition 1.14. If Xn is adapted to Fn such that E[|Xn |] < ∞ for all n then we call Xn 1. a martingale if E[Xn+1 |Fn ] = Xn ; 2. a sub-martingale if E[Xn+1 |Fn ] ≥ Xn ; 3. a super-martingale if E[Xn+1 |Fn ] ≤ Xn . Definition 1.15. A process An is called predictable if ∀n we have that An is measurable with respect to Fn−1 . Definition Pn1.16. For processes X, A we define the martingale transform (AX)n = i=1 Ai (Xi − Xi−1 ). Theorem 1.8. If Xn is a (sub/super) martingale and An a bounded, predictable process then AX is also a (sub/super A ≥ 0) martingale. 10 Proof. E[(AX)n+1 |F] − (AX)n = E[(AX)n + An+1 (Xn+1 − Xn )|Fn ] − (AX)n = An+1 (E[Xn+1 |Fn ] − Xn ) =0 Definition 1.17. A random variable T taking values in N is called a stopping time if ∀n ∈ N we have that {T ≤ n} ∈ Fn . Theorem 1.9. Optional Stopping Theorem Let Xn be a (sub/super) martingale and T a stopping time then the stopped process XnT = Xn∧T is also a (sub/super) martingale. In particular if T is bounded a.s. then E[XT ] = E[X0 ] (≥ / ≤). Proof. Choose An := χ{T ≥n} ∈ Fn−1 then XnT = X0 + (AX)n and is therefore a martingale. Choose n greater than the a.s. bound on T then E[Xn∧T ] = E[X0∧T ] = E[X0 ]. Definition 1.18. For a < b ∈ R and a sequence Xn we define S1 := inf{n : Xn ≤ a}, Sk = inf{n > Tk−1 : Xn ≤ a}, Tk = inf{n > Sk : Xn ≥ b}. Then the number of up-crossings of [a, b] by time n is Nn ([a, b], X) := sup{k : Tk ≤ n}. Theorem 1.10. Doob’s Up-crossing If Xn is a sub-martingale then ∀a < b, n ∈ N we have that (b − a)E[Nn ([a, b], X)] ≤ E[(Xn − a)+ − (X0 − a)+ ] Proof. Write Yn = (Xn − a)+ which is also a sub-martingale by Jensen’s inequality then define An = ∞ X χ{Sk <n≤Tk } k=1 be the event that the process is on an up-crossing at time n. So by a telescoping sum we have that (AY )n = Nn X YTi − YSi + (Yn − YSNn +1 )χ{SNn +1 <n} i=1 Hence since Yn is a sub-martingale we have that E[(AY )n ] ≥ (b − a)E[Nn ]. Write K = 1 − A to be the event that the process is in a down-crossing then (KY ) is also a sub-martingale. Yn − Y0 = (KY )n + (AY )n so (b − a)E[Nn ] ≤ E[(AY )n ] = E[Yn − Y0 ] − E[(KY )n ] ≤ E[Yn − Y0 ] Corollary 1.3. Every sub-martingale with supn E[Xn+ ] < ∞ converges almost surely. Proof. 
If Xn doesn’t converge almost surely then there exists an interval with rational endpoints [a, b] which is crossed infinitely often but this contradicts Doob’s up-crossing. Theorem 1.11. Doob’s Maximal Inequality Let Xn be a sub-martingale and a > 0 then define Xn∗ := max0≤i≤n Xi then aP(Xn∗ ≥ a) ≤ E[Xn χ{Xn∗ ≥a} ] ≤ E[Xn+ ] 11 Proof. Let T = inf{n : Xn ≥ a} then {Xn∗ ≥ a} = {T ≤ n} so by optional stopping theorem we have that E[Xn ] ≥ E[XT ∧n ] = E[XT χT ≤n ] + E[Xn χT ≥n ] ≥ aP(Xn∗ ≥ a) + E[Xn χT ≥n ] So indeed aP(Xn∗ ≥ a) ≤ E[Xn χXn∗ ≥a ] Theorem 1.12. Let Xn be a positive sub-martingale, then for p > 1 we have that E[|Xn∗ |p ]1/p ≤ p E[|Xn |p ]1/p p−1 Proof. From Doob’s maximal inequality we have that Z 0 ∞ aP(Xn∗ ≥ a) ≤ E[Xn χXn∗ ≥a ] Z ∞ ap−1 P(Xn∗ ≥ a)da ≤ ap−2 E[Xn χXn∗ ≥a ]da 0 Z ∞ 1 E[(Xn∗ )p ] = ap−1 P(Xn∗ ≥ a)da p 0 Z ∞ ≤ ap−2 E[Xn χXn∗ ≥a ]da 0 " # Z ∗ Xn = E Xn ap−2 da 0 1 E[Xn (Xn∗ )p−1 ] = p−1 p ≤ E[Xn ]1/p E[(Xn∗ )p ] p−1 So dividing through gives the desired result. Definition 1.19. For a continuous time process Xt and filtration Ft = σ({Bs }s≤t ) we say that T is a stopping time if {T ≤ t} ∈ Ft for all t and we say that T is an optional time if {T < t} ∈ Ft for all t. Remark 1.5. Any stopping time is an optional time but there are optional times which are not stopping times. T Definition 1.20. For a continuous process Xt we define the natural filtration Ft := s>t σ({Xr }r≤s ) Remark 1.6. For any right-continuous filtration we have that any optional time is a stopping time. Theorem 1.13. For every s > 0 we have that Xt := (Bt+s − Bs )t is a Brownian motion independent of Fs . Proof. Xt is clearly a Brownian motion by definition. Fix ε > 0 and consider Yε := (Bti +s+ε − Bs+ε )ni=1 . This is clearly independent from {Brj }m j=1 for rj < s + ε and is therefore independent of σ({Br }r<s+ε ). T From this we have that Yε is independent of Fs = ε>0 σ({Br }r<s+ε ) so by path continuity we have that the almost sure limit is independent of Fs . Corollary 1.4. T Blumenthal’s 0-1 Law If A ∈ F0 := ε>0 σ({Bs }s<ε ) then P(A) ∈ {0, 1}. 12 Proof. Bt = Bt − B0 a.s. so the σ-algebra generated by this process is independent of F0 and hence F0 is trivial. Corollary 1.5. Let A be the event that in any interval [0, ε) we have that Bt attains both positive and negative values. The P(A) = 1. T Proof. Let An := {∃ε < 1/n : Bε > 0} then n≥1 An is the event that for any such interval Bt attains a positive value. An ⊂ An−1 so A0 := limn→∞ An exists and belongs to F0 . For any N we have that \ P An ≥ 1/2 0≤i≤N since nT o 0≤i≤N An ⊃ {B1/N > 0} which has probability 1/2 by symmetry of a Brownian motion. It therefore follows that P(A0 ) ≥ 1/2 but since A0 ∈ F0 this means we must have that P(A0 ) = 1 and by symmetry we have P(A) = 1. Proposition 1.1. Let Bt be a Brownian motion, then the following are martingales: 1. Bt 2. Bt2 − t 3. eλBt − λ2 t 2 where λ > 0 Proof. In each case adaptedness holds by definition and integrability holds by properties of Gaussian random variables. 1. E[Bt |Fs ] = E[Bt − Bs |Fs ] + Bs = Bs 2. E[Bt2 − t|Fs ] = E[Bt2 − Bs2 |Fs ] + Bs2 − t = E[(Bt − Bs )2 + 2Bs (Bt − Bs )|Fs ] + Bs2 − t = t − s + Bs2 − t = Bs2 − s 3. h i λ2 (t−s) λ2 t λ2 s E eλBt − 2 |Fs = eλBs − 2 E eλ(Bt −Bs )− 2 |Fs λ2 (t−s) λ2 s = eλBs − 2 E eλ(Bt −Bs )− 2 = eλBs − λ2 s 2 by properties of moment generating functions. 13 Theorem 1.14. Let f : Rd → R by C 2 with bounded derivatives. Then Z 1 t ∆f (Bs )ds f (Bt ) − 2 0 is a martingale. Proof. 
Adaptedness and integrability are trivial by the fact that f is C 2 with bounded derivatives. Let x2 e− 2t pt (x) = √ 2πt be the Gaussian density. E[f (Bt )|Fs ] = E[f (Bt − Bs + Bs )|Fs ] Z = f (y)pt−s (y − Bs )dy R Z t Z Z 1 s 1 t 1 ∆f (Br )dr|Fs = ∆f (Br )dr + E[∆f (Br )|Fs ]dr E 2 2 0 2 s 0 Z Z Z 1 t 1 t E[∆f (Br )|Fs ]dr = ∆f (y)pr−s (y − Bs )dydr 2 s 2 s R Z Z 1 t = f (y)∆pr−s (y − Bs )dydr 2 s R Z tZ = f (y)∂r pr−s (y − Bs )dydr Zs R = f (y)pt−s (y − Bs )dy R = E[f (Bt )|Fs ] − f (Bs ) Z t Z 1 1 s E f (Bt ) − ∆f (Br )dr|Fs = E[f (Bt )|Fs ] − ∆f (Br )dr − E[f (Bt )|Fs ] + f (Bs ) 2 0 2 0 Z 1 s = f (Bs ) − ∆f (Br )drf (Bs ) 2 0 1.5 Strong Markov Property and the Reflection Principle Definition 1.21. Let T be a stopping time with respect to the filtration Ft . Then we can define a filtration FT := {A ∈ F∞ : A ∩ {T ≤ t} ∈ Ft ∀t ≥ 0} Theorem 1.15. Strong Markov Property Let Bt be a Brownian motion and T < ∞ a.s. stopping time. Then Bt∗ := BT +t − BT is a Brownian motion independent of FT . Proof. ∀n we can define Tn := inf{k2−n > T } ≥ T which is an increasing sequence of stopping times converging almost surely to T since if t ∈ [k2−n , (k + 1)2−n ) then {Tn ≤ t} = {T ≤ k2−n } ∈ Fk2−n ⊆ Ft . 14 (n) := BTn +t − BTn then for A ∈ C([0, ∞]) measurable and B ∈ FT we have that X (n) (n) E[χA (Bt )χB ] = E χA (Bt 0χB χ{Tn =k2−n } Write Bt k≥0 = X E[χA (Bt+k2−n − Bk2−n )χB∩{Tn =k2−n } ] k≥0 = X E[χA (Bt+k2−n − Bk2−n )]E[χB∩{Tn =k2−n } ] by Markov property k≥0 = E[χA (Bt )] X P(B ∩ {Tn = k2−n }) k≥0 = E[χA (Bt )]P(B) hence we have the Strong Markov property for the increasing sequence of stopping times converging a.s. to T (since taking B = Ω gives us that the process is a Brownian motion). So we need to pass the result to the limit. (n) (n) For any t1 , ..., tk we have that (Bt∗1 , ..., Bt∗k ) is the almost sure limit of (Bt1 , ..., Btk ) and hence by path regularity and independence from FT we have that the limit in independent of FT so indeed the theorem holds. Corollary 1.6. inf{t ≥ 0 : Bt = sup0≤s≤T Bs } is not a stopping time. (By strong Markov property and Blumenthal’s 0-1 law). Theorem 1.16. Reflection Principle If B is a Brownian motion, b ∈ R and T := inf{t ≥ 0 : Bt = b} then ( Bt t≤T B̃t = 2b − Bt t > T is also a Brownian motion. Corollary 1.7. P( sup Bt ≥ b) = 2P(BS ≥ b) = P(|BS | ≥ b) 0≤t≤S Proof. Let τ := inf{t : Bt = b} {sup Bt ≥ b} = {BS ≥ b} ∪ ({BS < b} ∩ {τ < S}) t≤S = {BS ≥ b} ∪ ({B̃S > b} ∩ {τ < S}) = {BS ≥ b} ∪ {B̃S ≥ b} where all unions used are disjoint. Since B̃ is a Brownian motion we indeed have the desired result. 2 Levy Processes Definition 2.1. Stochastic process Xt is called a cadlag process if it is right continuous and has left limits a.s. Definition 2.2. Stochastic process Xt is called stochastically continuous of ∀t, ε > 0 we have that lim sups→t P(|Xt − Xs | > ε) = 0. Definition 2.3. The Rd valued process (Xt )t≥0 is called a Levy Process if 15 1. Xt has independent increments; 2. X0 = 0 almost surely; 3. Xt has stationary increments; 4. Xt is stochastically continuous; 5. Xt is a cadlag process. Definition 2.4. (Xt )t≥0 is a Poisson process of intensity λ > 0 (written P P (λ)) if it is a Levy Process and ∀t ≥ 0 we have that Xt ∼ P o(λt) i.e. P(Xt = k) = (λt)k −λt . k! e Definition 2.5. The Gamma distribution with intensity λ and shape c (written Γ(c, λ)) is the distribution with density xc−1 e−λx λc /Γ(c). Lemma 2.1. The Γ(c, λ) distribution has characteristic function 1− ϕ(z) = iz λ −c Corollary 2.1. 
The Γ(1, λ) distribution is equivalent to exp(λ) and Γ(n, λ) is the sum of n i.i.d. exp(λ) random variables. Remark 2.1. We can construct the Poisson process as follows: i.i.d. Let {Ti }∞ i=1 ∼ exp(λ) so by the memoryless property we have that P(Ti ≥ t + s|Ti ≥ t) = e−λ(t+s) = e−λs = P(Ti ≥ s) e−λt Pn We then define Wn := i=1 Ti to be the waiting time for the nth event. We can then define Xt = k where Wk ≤ t < Wk+1 . Notice that Wn ∼ Γ(n, λ) and is independent of Tn+1 so we have that P(Xt = n) = P(Wn ≤ t, Tn+1 > t − Wn ) Z t Z ∞ λn = wn−1 e−λw λe−λs dsdw 0 (n − 1)! t−w Z t λn wn−1 e−λw e−λ(t−w) dw = (n − 1)! 0 Z t λn −λt = e wn−1 dw (n − 1)! 0 λn −λt n = e t n! which is the p.d.f. of P o(λt). Lemma 2.2. If Xt ∼ P P (λ) fix T > 0, 0 = t0 < t1 < ... < tn = T and {ki }ni=1 ∈ N0 . Let K = then ! k n n \ K! Y ti − ti−1 i P {Xti − Xti−1 = ki }XT = Qn T i=1 ki ! i=0 i=1 Pn i=1 ki Definition 2.6. An Rd valued process Xt is called a compound Poisson process of intensity λ > 0 if it is a Levy process and there exists a probability measure σ such that σ({0}) = 0 such that Xt has the characteristic function Z eihz,xi − 1σ(dx) ϕXt (z) = exp tλ Rd 16 Lemma 2.3. We can construct compound Poisson process by taking {ξi }∞ i=1 i.i.d. distributed Pthe n according to σ, defining Sn := i=1 ξi , letting Nt ∼ P P (λ) independently from ξ and then setting Xt := SNt . Proof. P(Xt0 ∈ B0 , Xt1 − Xt0 ∈ B1 ) = X P(Nt0 = n0 , Sn0 ∈ B0 , Nt1 − Nt0 = n1 , Sn1 − Sn0 ∈ B1 ) n0 ,n1 ∈N = X P(Nt0 = n0 , Sn0 ∈ B0 )P(Nt1 − Nt0 = n1 , Sn1 − Sn0 ∈ B1 ) n0 ,n1 ∈N = X X P(Nt0 = n0 , Sn0 ∈ B0 ) n0 ∈N P(Nt1 − Nt0 = n1 , Sn1 − Sn0 ∈ B1 ) n1 ∈N = P(Xt0 ∈ B0 )P(Xt1 − Xt0 ∈ B1 ) = P(Xt0 ∈ B0 )P(Xt1 −t0 ∈ B1 ) X ϕXt (z) = E[exp(i hz, Xt i)χ{Nt =n} ] n∈N " = X * E exp i z, " = * E exp i z, X n∈N ξi # χ{Nt =n} n X +!# ξi P(Nt = n) i=1 n∈N = +! i=1 n∈N X n X σ̂(x)n (λt)n e−λt n! = e−λt eλtσ̂(z) = exp(λt(σ̂(z) − 1)) 2.1 Infinitely Divisible Distributions Definition 2.7. The probability measure µ on Rd is called infinitely divisible if ∀n ≥ 1 we have that ∃µn probability measure such that µ = ∗(n) µn . Where ∗(n) µn denotes the n-fold convolution of µn . Lemma 2.4. If Xt is a Levy process then ∀t the law of Xt is infinitely divisible. Pn Proof. Xt = k=1 X kt − X (k−1)t by a telescoping sum. This is an i.i.d. collection hence indeed the n n distribution of Xt can be given as an n-fold convolution of the distribution of X nt . Lemma 2.5. If µ1 , µ2 are infinitely divisible then so is µ1 ∗ µ2 . Corollary 2.2. Every distribution with a characteristic function of the form Z 1 ϕX (z) = exp − hz, Σzi + iγz + eihz,xi − 1ν(dx) 2 Rd for ν a positive finite measure, Σ covariance matrix and µ ∈ Rd is infinitely divisible. Proposition 2.1. If ϕ : Rd → C is a characteristic function of probability measure ν then 1. |ϕ(z)| ≤ 1; 2. |ϕ(0)| = 1; 3. ϕ is uniformly continuous; 17 4. ϕ is positive definite i.e. ∀ξ ∈ Cn n X n X ϕ(zk − zj )ξk ξ j ≥ 0 k=1 j=1 Proof. The first two points holds trivially and the third holds by dominated convergence theorem so it remains to show the final statement. Z X Z X n X n n n X exp(i hx, zj i)ξ j ν(dx) ≥ 0 exp(i hx, zk − zj i)ξk ξ j ν(dx) = exp(i hx, zk i)ξk k=1 j=1 k=1 j=1 Theorem 2.1. Bochner The properties 1. |ϕ(z)| ≤ 1; 2. |ϕ(0)| = 1; 3. ϕ is uniformly continuous; 4. ϕ is positive definite uniquely characterise a characteristic function of a probability measure. ∞ Lemma 2.6. If µ, {µn }∞ n=1 are probability measures and ϕ, {ϕn }n−1 are the corresponding characteristic functions then TFAE 1. ϕn → ϕ pointwise; 2. µn * µ. 
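Definition 2.6 and the construction in Lemma 2.3 translate directly into a simulation: draw a Poisson number of jumps, sum i.i.d. jump sizes, and compare the empirical characteristic function with $\exp(\lambda t(\hat\sigma(z)-1))$. Below is a minimal sketch, assuming NumPy; the jump law $\sigma = \mathrm{Exp}(1)$ and the sample size are illustrative choices, not part of the notes.

```python
import numpy as np


def compound_poisson(lam, t, jump_sampler, rng):
    """One sample of X_t = S_{N_t} as in the construction of Lemma 2.3:
    N_t ~ Poisson(lam * t) jumps, each drawn i.i.d. from the jump law sigma."""
    n_jumps = rng.poisson(lam * t)
    return jump_sampler(rng, n_jumps).sum()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    lam, t, z = 2.0, 1.5, 0.7
    jumps = lambda rng, n: rng.exponential(1.0, size=n)   # jump law sigma = Exp(1)
    samples = np.array(
        [compound_poisson(lam, t, jumps, rng) for _ in range(20_000)]
    )

    # Empirical characteristic function E[exp(i z X_t)] ...
    empirical = np.mean(np.exp(1j * z * samples))
    # ... against exp(lam * t * (sigma_hat(z) - 1)), sigma_hat(z) = 1/(1 - iz) for Exp(1).
    theoretical = np.exp(lam * t * (1.0 / (1.0 - 1j * z) - 1.0))
    print(empirical, theoretical)
```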
Lemma 2.7. If µn are probability measures and ϕn corresponding characteristic functions then if ϕn converge pointwise to some ϕ continuous at 0 then ϕ is the characteristic function to some probability measure µ to which µn converge weakly. Remark 2.2. In both of the previous lemmas the convergence of ϕn is locally uniform. Lemma 2.8. If ϕ is the characteristic function of a probability measure µ then |ϕ|2 is also the characteristic function of some probability measure ν. Lemma 2.9. If µ is infinitely divisible and ϕ the corresponding characteristic function then ϕ doesn’t attain the value 0. Proof. For any n let µn be the nth convolution root of µ and ϕn the corresponding characteristic function. Then ϕnn = ϕ so |ϕn |2 = |ϕ|2/n is a characteristic function. Since |ϕ| ≤ 1 we have that ( 0 ϕ(x) = 0 2/n lim |ϕ(x)| = n→∞ 1 o/w In particular by continuity at the origin we have that limn→∞ |ϕ(x)|2/n = 1 on neighbourhood of the origin. The limit is a characteristic function of a probability measure hence is continuous everywhere so indeed the limit is 1 everywhere hence ϕ 6= 0 anywhere. d Lemma 2.10. If ϕ : Rd → C \ {0} is continuous then ∃1 ψ, {λn }∞ n=1 : R → C \ {0} continuous such that 1. ψ(0) = λn (0) = 1; 2. ϕ(z) = eψ(z) = λn (z)n . 18 Furthermore if ϕε converges locally uniformly to ϕ on C \ {0} then ψε , λn,ε converges locally uniformly to ψ, λn . Lemma 2.11. Let µε be infinitely divisible and converge to µ. Then µ is infinitely divisible. Theorem 2.2. Let µn be infinitely divisible with characteristic functions ϕn of the form Z 1 ihz,xi ϕn (z) = exp − hz, An zi + i hz, γn i + e − 1 − i hx, zi c(x)νn (dx) 2 Rd where c is continuous, ≈ 1 around 0 around ≈ o(|x|−1 ) around ∞. Let µ be another probability measure then µn * µ if and only if 1. µ is infinitely divisible with the characteristic function Z 1 ihz,xi e − 1 − i hx, zi c(x)ν(dx) ϕ(z) = exp − hz, Azi + i hz, γi + 2 Rd 2. γn → γ in Rd ; R R 3. for all f continuous vanishing at the origin f (x)νn (dx) → f (x)ν(dx); R 2 4. for ε > 0 we have hz, An,ε zi = hz, An zi + |x|<ε hx, zi νn (dx) satisfies lim lim sup | hz, An,ε zi − hz, Azi | = 0 ε→0+ n→∞ Proof. (sketch) We only show that µn * µ implies 1-4 Define ρn (dx) := (|x|2 ∧ 1)−1 νn (dx) then we can show that this collection of measures is tight by showing Z Z d Y sin(hx) νn (dx) → 0 as h → 0 sup ρn (dx) ≤ sup 1− hx d n n |x|≥1/h R i=1 and Z ρn (dx) = − sup sup n Z Z log(ϕn (z))dz ≤ C sup n Rd (|x|2 ∧ 1)νn (dx) < ∞ n Then by Prokhorov ∃ρn * ρ weakly convergent subsequence. Defining ν := (|x|2 ∧ 1)−1 ρ. Then for any continuous f vanishing at zero we have Z Z Z Z 2 −1 2 −1 f νn (dx) = (|x| ∧ 1) f ρn (dx) → (|x| ∧ 1) f ρ(dx) = f ν(dx) We then have that Z log(ϕn (z)) = − hz, An zi + i hγn , zi + eihz,xi − 1 − i hx, zi c(x)νn (dx) = − hz, An,ε zi + i hγn , zi + In,ε + Jn,ε where Z 2 eihz,xi − 1 − i hx, zi + hx, zi νn (dx) In,ε = |x|≤ε Z eihz,xi − 1 − i hx, zi c(x)νn (dx) Jn,ε = |x|>ε By Taylor’s theorem limε→0+ lim supn→∞ In,ε = 0. For almost every ε Z Jn,ε → eihz,xi − 1 − i hx, zi c(x)ν(dx) |x|>ε 19 R hence limε→0+ lim supn→∞ In,ε = |x|>0 eihz,xi − 1 − i hx, zi c(x)ν(dx). We know that log(ϕn (z)) → log(ϕ(z)) hence it follows that hγn , zi → hγ, zi for some γ and limε→0 limn→∞ hz, An,ε zi = hz, Azi. Theorem 2.3. Levy-Khintchin Formula If µ is infinitely divisible then ∃Σ non-negative, symmetric matrix, γ ∈ Rd and σ-finite measure ν R satisfying Rd (|x|2 ∧ 1)ν(dx) < ∞ such that Z 1 ihz,xi ϕX (z) = exp − hz, Σzi + i hγ, zi + e − 1 − i hz, xi χD (x)ν(dx) 2 Rd where D = {|x| ≤ 1}. Proof. 
Let µ be an arbitrary infinitely divisible distribution. We want to approximate µ by a compound Poisson random variable. (n) N (n) Let µn be the nth convolution root of µ and take {Xk }k=1 i.i.d. distributed according to µn where N ∼ P P (1). PN (n) (n) Define X (n) := k=1 Xk . If we let β (n) denote the law of X (n) then the characteristic functions of β (n) have the correct form and hence so does the limit so we want to show that β (n) converge weakly to µ. Denote ϕ the characteristic function of µ and then ϕ1/n is the characteristic function of µn . ϕβ (n) (z) = exp n(ϕ1/n (z) − 1) 1 = exp n(exp( log(ϕ(z))) − 1) n 1 n(exp( log(z)) − 1) → log(ϕ(z)) n So indeed ϕβ (n) converges weakly to ϕ. Remark 2.3. R 1. I := Rd eihz,xi − 1 − i hz, xi χD (x)σ(dx) converges absolutely since |eihz,xi − 1 − i hz, xi χD (x)| ≤ Cz (|x|2 ∧ 1) 2. Σ, γ, ν are unique and called the characteristic triple of µ 3. There is choice in the re-normalisation term −i hz, xi χD (x). In particular we may use −i hz, xi c(x) where c ≈ 1 around 0 around c ≈ o(|x|−1 ) around ∞. Changes in c effect the drift term γ only. R 4. If a Levy measure satisfies |x| ∧ 1ν(dx) < ∞ then we can choose c = 0 then I corresponds to processes with paths with finite variance. Corollary 2.3. Every infinitely divisible distribution is a weak limit of a compound Poisson process. Theorem 2.4. Let A be a symmetric, non-negative matrix, γ ∈ Rd and ν a Levy measure. Then ∃X Levy process such that X has an infinitely divisible distribution and characteristic triple (A, γ, ν). Proof. Let Bt be a d-dimensional Brownian motion. (1) Define Xt := A1/2 Bt + γt to be a scaled Brownian motion with drift and ν (1) = ν|B(0,1)c which is a c finite measure since ν is finite on B(0, 1) . (2) Define Xt to be the compound Poisson process for the Levy measure ν (1) . Let Ak := {x : 2−(k+1) < |x| ≤ 2−k } and define X̃t3,k to be independent compound Poisson processes with jump process ν|Ak . 20 R We can then define Xt3,k := X̃t3,k − t xν(dx) to be a compensated compound Poisson process. Notice R that Xt3,k is a martingale since E[X̃t3,k ] = t xν(dx). Pkl 3,i is a martingale and by independence For increasing sequence {ki }∞ i=1 we have that i=k1 Xt E kl X !2 Xt3,i = i=k1 So by Doob’s maximal inequality E sup s≤t kl X E[Xt3,i ] Z =t i=k1 kl X i=k1 !2 Xt3,i x2 ν(dx) Skl ≤ 4t i=k1 Z Ak x2 ν(dx) Skl i=k1 Ak (1) and so the sequence is Cauchy and hence converges so Xt := Xt process with characteristic triple (A, γ, ν). (2) + Xt + P∞ k=1 Xt3,k is a Levy 1 Example 2.1. Consider ν(dx) = θα |x|1+α dx on R \ {0} which is a Levy measure for α ∈ (0, 2). This has characteristic exponent Z 1 −ψ(z) = − e−xz − 1 − i hx, zi χ{|x|≤1} θα 1+α dx |x| R Z 1 = θα |z|α (1 − cos(zx)) |z| dx |zx|1+α Z 1 = θα |z|α (1 − cos(x)) 1+α dx |x| choosing θα such that R 1 (1 − cos(x)) |x|1+α dx = 1 gives us that the characteristic function of the α infinitely divisible distribution with this Levy measure is e−|z| . Pn 1 i.i.d. d This implies that if Xi ∼ µα , Sn = i=1 Xi = n α X1 since 1 α ϕSn (z) = e−n|z| = e−|n α z| = ϕX1 (z) Definition 2.8. A probability measure µ on Rd is called strictly α-stable if for all n ϕµ (z)n = ϕµ (n1/α z) and α-stable if ϕµ (z)n = ϕµ (n1/α z)eihcn ,zi Theorem 2.5. The only non-trivial 2-stable distributions are Gaussian and the only non-trivial 1 α-stable distributions for α ∈ (0, 2) are infinitely divisible with Levy measure ν(dθ, dx) = π(dθ) rd+α dr. Definition 2.9. A subordinate is a Levy process on R which only takes non-negative values. Remark 2.4. 
A subordinate has no Gaussian part, positive drift and ν integrates |x| ∧ 1. Definition 2.10. The Laplace transform of a subordinate is E[e−λXt ] = e−tφ(λ) R where the cummulant φ takes the form φ(λ) = λγ + 1 − eλx ν(dx). Lemma 2.12. If Xt is a Levy process and Tt an independent subordinate the Yt := XTt is a Levy process with h i ϕY1 (z) = ET1 E[eihXT1 ,zi ] = e−φ(−ψ) 21 3 Markov Processes Definition 3.1. A Markov kernel N on measure space (E, E) is a mapping N : E × E → [0, 1] such that 1. For x ∈ E fixed we have A → N (x, A) is a probability measure. 2. For A ∈ E fixed we have x → N (x, A) is measurable. Proposition 3.1. Let N be a Markov kernel then 1. N acts on non-negative functions f by Z N f (x) = N (x, dy)f (y) 2. N acts on probability measures µ by Z µN (A) = N (x, A)µ(dx) 3. If M is another Markov kernel then Z N M (x, A) = N (x, dy)M (y, A) Definition 3.2. Let (Ω, F, P) be a probability space with filtration Ft and Ns,t be a family of Markov kernels. Then a Markov process with transition kernel Ns,t is an adapted process Xt such that ∀t > s we have for all f non-negative, measurable E[f (Xt )|Fs ] = Ns,t f (Xs ) Lemma 3.1. Chapman-Kolmogorov For r < s < t we have Nr,t = Nr,s Ns,t Proof. Let f be non-negative, measurable then Nr,t f (Xr ) = E[f (Xt )|Fr ] = E[E[f (Xt )|Fs ]|Fr ] = E[Ns,t f (Xs )|Fr ] = Nr,s Ns,t f (Xr ) Definition 3.3. A family of Markov kernels which satisfy the Chapman-Kolmogorov equations is called a transition function. Definition 3.4. A transition function Ns,t is called homogeneous if Ns,t = Ns+h,t+h for all h ≥ 0. Theorem 3.1. Let x ∈ E and Ns,t a transition function. Then there exists a Markov process Xt for Ns,t with X0 = x a.s. Definition 3.5. Let B be a Banach space. A family Tt of bounded linear operators on B is called a C0 semi-group if 1. T0 = Id; 2. s, t ≥ 0 then Tt Ts = Tt+s ; 3. ∀x ∈ B we have limt→0 ||Tt x − x|| = 0. 22 Remark 3.1. 1. If B = Rn then semi-groups are of the form Tt = etA for some A ∈ Rn×n . 2. Intuitively a homogeneous transition family should give a semi-group on a space of measurable functions by setting Tt f = N0,t f . 3. We denote C0 (E) to be the space of continuous functions on E which converges to 0 as x → ∞. i.e. ∀ε > 0 ∃Kε compact such that |f (x)| ≤ ε for x ∈ Kεc . Definition 3.6. Let E be a locally compact metric space. A Feller semi-group is a C0 semi-group on C0 (E) satisfying 0 ≤ f ≤ 1 =⇒ 0 ≤ Tt f ≤ 1. Remark 3.2. It suffices that Tt f ∈ C0 for f ∈ C0 and Tt f to converge pointwise to f as t → 0. Corollary 3.1. Every Levy process is a Feller process. Proof. For Xt Levy we have that Tt f (x) = E[f (Xt + x)]. If xn → x then f (Xt + xn ) → f (Xt + x) almost surely hence by dominated convergence theorem we have continuity of Tt f . Similarly if xn → ∞ then f (Xt + xn ) → f (∞) almost surely for t → 0+ by cadlag paths we have pointwise convergence at 0. Theorem 3.2. Let Tt be a Feller semi-group and Xt the associated process. Then 1. Xt has a cadlag modification; 2. Xt satisfies Blumenthal’s 0 − 1 law; 3. Xt satisfies the strong Markov property. Definition 3.7. The detailed balance equations for µ, K are µ(x)K(x, y) = µ(y)K(y, x) ∀x, y ∈ E Definition 3.8. The Markov chain Xn with transition kernel K(., .) on discrete state space E and µ is a measure on E then Xn is reversible with respect to µ if the detailed balance equations hold. Furthermore, if µ is a probability measure then µ is invariant for X. We then say that µ is symmetric if for all t > 0 Z Z f (x)Tt g(x)µ(dx) = g(x)Tt f (x)µ(dx) Lemma 3.2. 
For discrete state spaces we have that the detailed balance equations holds if and only if the measure is symmetric. Lemma 3.3. A Levy process is reversible with respect to the Lesbesgue measure if and only if Xt has the same distribution as −Xt Proof. Let f, g be bounded with compact support then Z Z Pt f (x)g(x)dx = E[f (x + Xt )]g(x)dx Z =E f (x + Xt )g(x)dx Z =E f (x)g(x − Xt )dx Z = f (x)E[g(x − Xt )]dx Z = f (x)P̃t g(x)dx where P̃t is the transition semi-group of −Xt . 23 Definition 3.9. Let Pt be a Feller semi-group on C0 (E). The generator (A, DA ) of Pt is the operator 1 DA := {f ∈ C0 : lim+ (Pt f − f ) exists in C0 } t→0 t 1 Af := lim+ (Pt f − f ) t→0 t Example 3.1. Let Xt be a compound Poisson process with jump ν and intensity λ. Then DA = C0 (E) and Af = λ(f ∗ ν − f ). Proposition 3.2. If f ∈ DA then 1. Pt f ∈ DA ∀t ≥ 0; 2. ∂ ∂t Pt f = APt f = Pt Af ; Rt Rt 3. Pt f − f = 0 Ps Af ds = 0 APs f ds. Definition 3.10. We say that an operator A is closed if whenever fn → f ∈ DA and gn := Afn → g then we have that g = Af . Proposition 3.3. For the generator (DA , A) we have that 1. DA is dense in C0 (E); 2. A is a closed operator. Proof. Let f ∈ C0 (E) and define 1 (Ph f − f ) h Z 1 s Bs f = Pt f dt s 0 Ah = Then Z s Z s 1 Pt+h f dt − Pt f dt s 0 0 ! Z s+h Z h 1 Pt f dt − Pt f dt = sh s 0 Ah B s f = 1 h which converges to 1s (Ps f − f ) as h → 0 so Bs f ∈ DA for any s and Bs f → f as s → 0 so indeed DA is densely defined. It remains to show that A is closed so Pt f − f = Pt lim f − lim f n→∞ 1 (Pt f − f ) − g t ∞ n→∞ = lim (Pt fn − fn ) n→∞ Z t = lim Ps Afn ds n→∞ 0 Z t = lim Ps gn ds n→∞ 0 Z t = Ps g ds 0 Z t 1 ≤ Ps g − g ds t 0 ∞ Z 1 t ≤ ||Ps g − g||∞ ds t 0 which converges to 0 by continuity. 24 Theorem 3.3. Let Xt be a Feller process with semi-group Pt and generator A then for any f ∈ DA Z t Mft := f (Xt ) − f (X0 ) − Af (Xs ) ds 0 is a martingale with respect to the natural filtration. Proof. Mft is bounded and adapted by definition. Z t f f Af (Xr )dr|Fs E[Mt |Fs ] = Ms + E f (Xt ) − f (Xs ) − s = Mfs + Pt−s f (Xs ) − f (Xs ) − Z t Pr−s Af (Xs )dr s = Mfs Remark 3.3. Conversely, if for a given function f ∈ C0 there exists g ∈ C0 such that Rt f (Xt ) − f (X0 ) − 0 g(Xr )dr is a martingale then f ∈ DA and g = Af . Theorem 3.4. Positive Maximum Principle If f ∈ DA , z0 ∈ E such that f (z0 ) = supz∈E f (z) ≥ 0 then Af (z0 ) ≤ 0. Example 3.2. For a Levy process with semi-group Pt on Schwarz functions we have that FAf (ξ) = ψ(−ξ)Ff (ξ) and in general we have Z n n X X Af (x) = Ai,j δi,j f (x) − bi δi f (x) + f (x + y) − f (x) − xf 0 (x)χD ν(dx) i,j=1 Rn i=1 Remark 3.4. If Pt is symmetric then the generator is self-adjoint on L2 (ν). We will often replace A by its quadratic form 1 E(f ) := −(f, Af )L2 (ν) = − lim+ ((Pt f, f )L2 (ν) − (f, f )L2 (ν) ) t→0 t P 1 2 Lemma 3.4. On a discrete space E(f ) = limt→0+ 2t x,y∈X (f (y) − f (x)) Pt (x, y)ν(x) Proof. X 1 X f (x)Pt (x, y)f (y)ν(x) − f (x)2 ν(x) E(f ) = − lim+ t→0 t x,y∈X x∈X X 1 X = − lim+ f (x)Pt (x, y)(f (y) − f (x))ν(x) + f (x)Pt (x, y)(f (y) − f (x))ν(x) t→0 2t x,y∈X x,y∈X X 1 X = − lim+ f (x)Pt (x, y)(f (y) − f (x))ν(x) − f (y)Pt (y, x)(f (y) − f (x))ν(y) t→0 2t x,y∈X x,y∈X X 1 X = − lim+ f (x)Pt (x, y)(f (y) − f (x))ν(x) − f (y)Pt (x, y)(f (y) − f (x))ν(x) t→0 2t x,y∈X x,y∈X 1 X = − lim+ (f (x) − f (y))Pt (x, y)(f (y) − f (x))ν(x) t→0 2t x,y∈X 1 X = lim+ (f (x) − f (y))2 Pt (x, y)ν(x) t→0 2t x,y∈X 25 Definition 3.11. A closed densely defined form E on L2 (ν) for ν-σ-finite is called Markovian if ∀f ∈ DE we have 1. 
g = (f ∧ 0) ∨ 1 ∈ DE ; 2. E(g) ≤ E(f ). A quadratic form with these properties is called a Dirichlet form. 3.1 Random Conductance Model Definition 3.12. Consider the locally finite, connected graph (X, E) where ∀(x, Py) ∈ E we have a conductance µx,y = µy,x ≥ 0 representing the flow between x, y and then µx := y∈X µx,y is the flow out of x. We write x ∼ y if (x, y) ∈ E and define the Dirichlet form as E(f ) = E(f, g) = 1X (f (x) − f (y))2 µx,y 2 x∼y 1X (f (x) − f (y))(g(x) − g(y))µx,y 2 x∼y and the potential as Lf (x) = X µx,y (f (y) − f (x)) µx y∼x Lemma 3.5. Fix x0 ∈ X then ||f ||H 1 := E(f ) + f (x0 )2 is a norm on the Hilbert space H 1 := {f : E(f ) < ∞} P Lemma 3.6. E(f ) ≤ 2 x f (x)2 µx Proof. 1X (f (x) − f (y))2 µx,y 2 x∼y X ≤ (f (x)2 + f (y)2 )µx,y E(f ) = x∼y ≤2 X f (x)2 µx x Lemma 3.7. For f, g ∈ L2 (µ) we have −(Lf, g)L2 (µ) = E(f, g) = (f, −Lg)L2 (µ) hence −L is self-adjoint with quadratic form given by E. Proof. − X X µx,y 1X (f (x) − f (y))g(x)µx = µx,y (f (y) − f (x))(g(y) − g(x)) µx 2 x∼y x y∼x 26 Remark 3.5. The natural choice of the Laplacian is given by X L̃f (x) = µx,y (f (y) − f (x)) y∼x which is self-adjoint with respect to the counting measure. Definition 3.13. We can define the following three random walks using this set-up 1. Discrete time random walk µ P(Yn+1 = y|Yn = x) = µx,y x 2. Constant speed random walk Let Yn be the discrete time random walk, Nt ∼ P P (1) then we define Xt = YNt 3. Variable speed random walk Let Yn be the discrete time random walk and define Xt to have departure rate µx from x. Proposition 3.4. For A ⊂ X and f : A → X bounded let σA = inf{n ≥ 0 : Yn ∈ A} and ϕ(x) := Ex [f (YσA )| σA < ∞] is a solution to ( Lv = 0 x ∈ Ac v|A = f and if σA < ∞ a.s. then it is the unique bounded solution. Proof. By the Markov property we have that X µx,y Ex [f (YσA )|σA < ∞] = Ey [f (YσA )|σA < ∞] µx x∼y X µx,y ϕ(y) = µx y∼x So indeed Lϕ = 0 on Ac . Definition 3.14. For A, B ⊂ X : A ∩ B = φ we define effective resistance to be Ref f (A, B)−1 = inf{E(f, f )| f ∈ H 1 , f |A = 1, f |B = 0} Proposition 3.5. For effective resistance Ref f and A, B ⊂ X disjoint we have 1. Ref f is symmetric; 2. Ref f is monotonic i.e. for A ⊂ A0 , B ⊂ B 0 with A0 ∩ B 0 = φ we have Ref f (A0 , B 0 ) ≥ Ref f (A, B); 3. Cutting bonds (µx,y → 0) increases Ref f ; 4. Shortening bonds (µx,y → ∞) decreases Ref f . Proposition 3.6. If Ref f 6= 0 then the infimum is attained by a unique minimiser ϕ solving Lϕ(x) = 0 ∀x ∈ X \ (A ∪ B) with ϕ|A = 1, ϕ|B = 0. Proof. Take x0 ∈ B then H 1 with the norm E(f, f ) + f (x0 )2 is a Hilbert space and V = {f ∈ H 1 ; f |A = 1, f |B = 0} is convex and closed so ∃1 ϕ ∈ V of minimum norm. Let f satisfy f |A = 0 = f |B then E(f + λϕ) + λ2 E(ϕ, ϕ) + 2λE(f, ϕ) ≥ E(f ) ∀λ ≥ 0. We therefore have that E(f, ϕ) = 0 =⇒ −(f, Lϕ) = 0 but since f is arbitrary we have that Lϕ = 0. Theorem 3.5. If Ac is finite then for x0 ∈ Ac we have that Ref f (x0 , A)−1 = µx0 P(σA < σx+0 ) where σx+0 = inf{n ≥ 1 : Xn = x0 }. 27 Proof. v(x) = Px (σA < σx0 ) which is a unique solution to the Dirichlet problem with v(x0 ) = 0, ν|A = 1, Lv(x) = 0 ∀x ∈ Ac \ {x0 } hence we have that Ref f (x0 , A)−1 = E(v, v) X = (v(x) − v(y))(v(x) − v(y))µx,y x∼y = X (v(y) − v(x))((1 − v(x)) − (1 − v(y)))µx,y x∼y = E(−v, 1 − v) = (Lv, 1 − v) XX = (v(x) − v(y))((1 − v)(x))µx,y x∈X y∼x = X X (v(x) − v(y))((1 − v)(x))µx,y x∈Ac y∼x = X ((1 − v)(x)) x∈Ac = X X (v(x) − v(y))µx,y y∼x ((1 − v)(x))Lv(x)µx x∈Ac = µx0 Lv(x0 ) X µx ,y 0 Py (σA < σx0 ) = µx0 µ x 0 y∼x 0 = µx0 Px0 (σA < σx+0 ) Remark 3.6. 
If An is an increasing sequence of finite sets converging to X then we define Ref f (x0 ) = lim Ref f ({x0 }, Acn ) n→∞ which exists and is independent of the choice of An . Theorem 3.6. ∀x we have that Px (σx+ = ∞) = (µx Ref f (x))−1 Proof. Px (σx+ = ∞) = lim Px (σAn < σx+ ) n→∞ Ref f (x, Acn )−1 n→∞ µx = (µx Ref f (x))−1 = lim Definition 3.15. For an infinite connected graph X a Markov chain is recurrent if for all x we have that P(σx+ < ∞) = 1 and transient otherwise. Corollary 3.2. It suffices that P(σx+ < ∞) = 1 for some x and transience is equivalent to Ref f (x) < ∞. 28 3.2 Heat Kernel Estimates Definition 3.16. For x, y ∈ X and n ≥ 1 define pn (x, y) = Px (Yn = y)/µy and for t ≥ 0 pt (x, y) = ∞ X e−t n=0 tn pn (x, y) n! d Remark 3.7. For Gaussian distribution on R we have |x − y|2 1 exp − pt (x, y) = 2t (2πt)d/2 Proposition 3.7. Let E, Pt , µ be the quadratic form, semi-group and density as before. Then 1. ||Pt f ||L1 (µ) ≤ ||f ||L1 (µ) for f ∈ L1 ∩ L2 . 2. Let u(t) = ||Pt f ||2L2 (µ) then we have u0 (t) = −2E(Pt f, Pt f ). ||Pt f ||2 2E(f,f ) ≤ ||f ||2 L2 ≤ 1. 3. exp − ||f ||2 L2 (µ) L2 Proof. 1. Using that Pt is self-adjoint ||Pt f ||L1 = (Pt f, 1)L2 = (f, Pt 1)L2 ≤ ||f ||L1 2. 1 ((Pt+h f, Pt+h f ) − (Pt f, Pt f )) h 1 = lim (Pt+h f + Pt f, Pt+h f − Pt f ) h→0 h 1 = lim (Pt+h f + Pt f, (Ph f − I)Pt f ) h→0 h = (2Pt f, LPt f ) u0 (t) = lim h→0 = −2E(Pt f, Pt f ) 3. Assume ||f ||L2 = 1 ||Pt f || is log-convex since ||P t+s f ||2L2 = (P t+s f, P t+s f ) 2 2 2 = (Pt f, Ps f ) ≤ ||Pt f ||2L2 ||Ps f ||2L2 so d dt log(||Pt f ||2L2 ) is non-increasing, in particular d 1 log(||Pt f ||2L2 ) = ∂t ||Pt f ||2L2 dt ||Pt f ||2L2 E(Pt f, Pt f ) = −2 ||Pt f ||2L2 29 Definition 3.17. We say E satisfies the Nash θ-inequality if ∃c1 , c2 > 0 such that ∀f ∈ H 1 ∩ L1 we have that 4 2+ 4 ||f ||L2 θ ≤ c1 (E(f, f ) + δ||f ||2L2 )||f ||Lθ 1 Theorem 3.7. For E, Pt , µ quadratic form, semi-group and density as usual we have that E satisfies the Nash θ-inequality if and only if Pt (L1 ) ⊂ L∞ and θ ||Pt ||L1 →L∞ ≤ c2 eδt t− 2 Proof. Assume Nash holds. WLOG let ||f ||L1 = 1 then we want a bound on u(t) = ||Pt f ||2L2 . 4 ∂t u(t) = −2E(Pt f, Pt f ) ≤ −c|u|2+ θ by Nash 4 4 Define v(t) = u(t)1−4/θ then ∂t v(t) ≥ cu−2− θ −1 u2+ θ = c. v(0) = 0 co v(t) ≥ ct and hence u(t) ≤ ct−θ/2 so ||Pt f ||L1 →L∞ ≤ ||Pt/2 f ||L1 →L2 ||Pt/2 f ||L2 →L∞ ≤ ct−θ/2 Now assume the other statement holds. Assume ||f ||L1 = 1 ||Pt f ||2L2 2E(f, f )t exp − ≤ ||f ||2L2 ||f ||2L2 ||Pt f ||L∞ ||Pt f ||L1 ≤ ||f ||2L2 ≤c E(f, f ) ≥ t−θ/2 ||f ||2L2 1 log(tθ/2 ||f ||2L2 ) − log(c) ||f ||2L2 2t Optimising over t gives the required result. Example 3.3. If ϕ is an eigenfunction of −L with eigenvalue λ ≥ 1 then ||ϕ||L∞ ≤ λθ/4 ||ϕ||L2 Definition 3.18. For Ω ⊂ X finite we define the principle eigenvalue as E(f, f ) λ1 (Ω) = inf : supp(f ) ⊂ Ω ||f ||2L2 Definition 3.19. E satisfies the Faber-Krahn inequality if λ1 (Ω) ≥ Cµ(Ω)−2/θ Theorem 3.8. The Nash θ inequality holds if and only if the Faber-Krahn inequality holds. Proof. Suppose the Nash inequality holds. 4 4/θ ||f ||2+ θ ≤ cE(f, f )||f ||L1 30 4 hence dividing through by ||f || θ we have 2 ||f || ≤ cE(f, f ) ||f ||L1 ||f ||L2 4/θ Suppose supp(f ) ⊂ Ω then ||f ||L1 ≤ µ(Ω)1/2 ||f ||L2 so ||f ||2L2 ≤ cE(f, f )µ(Ω)2/θ so indeed we have that E(f, f ) ≥ cµ(Ω)−2/θ ||f ||2L2 Now suppose that The Faber-Krahn inequality holds. 
Z 2 Z 2 u dµ = Z u2 dµ Z (u − λ)2 dµ + 2λ udµ u dµ + {u≥2λ} Z ≤4 {u<2λ} {u≥2λ} ≤ µ(Ω)2/θ E((u − λ)+ , (u − λ)+ ) + 2λ||u||L1 2/θ ||u|| E(u, u) + 2λ||u||L1 ≤c λ Optimising over λ givers the required result. Definition 3.20. For θ > 2 we say that E satisfies the Sobolev inequality if ∃c > 0 such that for all f with finite support we have that ||f ||2 2θ ≤ cE(f, f ) L θ−2 Corollary 3.3. For θ > 2 if the Sobolev inequality holds then so does the Nash θ inequality. Lemma 3.8. 1. If inf x∼y µx,y > 0 then Ref f (x, y) ≤ cd(x, y). 2. |f (x) − f (y)|2 ≤ Ref f E(f, f ). −1 3. Ref f , Ref f are both metrics. Proof. ∃ a path x0 , x1 , ..., xn such that (xi , xi+1 ) ∈ E for all i, x0 = x, xn = y and n = d(x, y). Cut all edges except for this path then since the resisters are in series we have that the resistances add and by ellipticity the first result follows. Suppose u(x) 6= u(y) then write f = au + b for a, b ∈ R such that f (x) = 1, f (y) = 0. We then have that |f (x) − f (y)|2 |u(x) − u(y)|2 : u ∈ H 1 , u(x) 6= u(y) = sup : f (x) = 1, f (y) = 0 = Ref f (x, y) sup E(u, u) E(f, f ) so indeed the second result follows. The third result follows by showing the properties of a metric directly. 31 3.3 Green Densities Definition 3.21. For a stochastic process Y we define the local time to be L(y, n) = Pn−1 k=1 χ{Yk =y} . Definition 3.22. For B ⊂ X finite we define the green function to be gB (x, y) = 1 Ex [L(y, τB )] µy where τB = inf{n : Yn ∈ B c }. Proposition 3.8. For the green function gB we have that gB (x, y) = gB (y, x) ≤ gB (x, x) for all x, y ∈ B. Proposition 3.9. 1. gB (x, .) is an harmonic function on B \ {x}; 2. if supp(f ) ⊂ B then E(gB (x, .), f ) = f (x); P 3. E[τB ] = y∈B gB (x, y)µy ; 4. Ref f (x, B c ) = gB (x, x). Proof. Let ν(z) = gB (x, z) 1. For x 6= y we have that "τ −1 # B X ν(y)µy = E χ{Yi =y} i=1 =E "τ −2 B X X i=1 z∈B µz,y χ{Yi =z} µz # X µz,y = ν(z)µy µz z∈B We therefore have that ν(y) = X p(z, y)ν(z) z∈B 2. gB (x, x)µx − 1 = Ex [L(x, τB )] − 1 X = p(x, y)E[L(x, τB )] y∈B = gB (y, x)µx E(f, ν) = −(f, Lν) = −f (x)(−µ−1 x )µx = f (x) Remark 3.8. The first two parts of proposition 3.9 show that LgB (x, .) = δx (.). Lemma 3.9. Let B(x, r) = {y : d(x, y) ≤ r} be the ball of radius r centred at x, V (x, r) = µ(B(x, r)) the volume of the ball, Ω ⊂ X finite, non-empty and r(Ω) = max(r ∈ N : ∃x0 ∈ Ω, B(x0 , r) ⊂ Ω}. Then λ1 (Ω) ≥ c r(Ω)µ(Ω) 32 Proof. Take f with supp(f ) ⊂ Ω and normalize such that ||f ||∞ = 1. Fix x0 ∈ Ω where |f (x0 )| = 1 then ||f ||2L2 ≤ µ(Ω). c ∃x0 , x1 , ..., xn path with xn ∈ Ωc , {xi }n−1 i=0 ⊂ Ω and n = d(x0 , Ω ) then E(f, f ) ≥ n−1 1X (f (xi ) − f (xi+1 ))2 2 i=0 ≥ !2 n−1 X 1 ≥ 2n (f (xi ) − f (xi+1 )) i=1 1 2n hence since n ≤ r(Ω) we have that E(f, f ) 1 ≥ ||f ||2L2 r(Ω)µ(Ω) Lemma 3.10. p2t (x, x) ≥ Px (τB > t)2 µ(Ω) Proof. Px (τB > t)2 ≤ Px (Yt ∈ B)2 2 X = pt (x, y)µy y∈B ≤ µ(B) X pt (x, y)2 µ2y Cauchy-Schwarz y∈B = µ(B)p2t (x, x) reversibility Proposition 3.10. Let fn (0, x) = pn+1 (0, x) + pn (0, x) and assume Ref f (0, y) ≤ cd(0, y)α then D rD − D+α fn (0) ≤ cn c∨ V (0, r) where n = 2rD+α . 33
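The objects of this subsection are easy to compute on a small example: for a finite weighted graph one forms the transition matrix $P(x,y)=\mu_{xy}/\mu_x$ of the discrete time walk, restricts it to $B$ and inverts $I-P_B$ to obtain the expected occupation times, hence the Green densities of Definition 3.22. A minimal sketch, assuming NumPy; the path graph is chosen purely for illustration, and the visit at time $0$ is counted, which is the convention under which Proposition 3.9(4) comes out exactly.

```python
import numpy as np


def green_matrix(conductances, B):
    """Green densities g_B(x, y) = E_x[# visits to y before exiting B] / mu_y
    for the discrete-time walk P(x, y) = mu_{xy} / mu_x (cf. Definition 3.22)."""
    mu_xy = np.asarray(conductances, dtype=float)
    mu = mu_xy.sum(axis=1)                      # mu_x = sum_y mu_{xy}
    P = mu_xy / mu[:, None]                     # transition kernel of the walk
    B = list(B)
    occupation = np.linalg.inv(np.eye(len(B)) - P[np.ix_(B, B)])  # (I - P_B)^{-1}
    return occupation / mu[B][None, :]          # divide column y by mu_y


if __name__ == "__main__":
    # Path graph 0 - 1 - 2 - 3 with unit conductances, B = {1, 2}.
    c = np.array([[0, 1, 0, 0],
                  [1, 0, 1, 0],
                  [0, 1, 0, 1],
                  [0, 0, 1, 0]], dtype=float)
    g = green_matrix(c, B=[1, 2])
    print(g)                   # symmetric, as in Proposition 3.8
    print(g[0, 0])             # g_B(1,1) = 2/3 = R_eff(1, B^c), cf. Proposition 3.9(4)
    mu = c.sum(axis=1)
    print(g[0] @ mu[[1, 2]])   # sum_y g_B(1,y) mu_y = 2 = E_1[tau_B], Proposition 3.9(3)
```

On this example the two routes from vertex $1$ to the set $\{0,3\}$ have resistances $1$ and $2$ in parallel, giving $R_{\mathrm{eff}}(1, B^c) = 2/3$, which is what the Green density $g_B(1,1)$ returns.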