An Introduction to Stochastic Processes in Continuous Time

Harry van Zanten

February 2, 2007 (this version)

always under construction, use at your own risk

Contents

Preface

1 Stochastic processes
1.1 Stochastic processes
1.2 Finite-dimensional distributions
1.3 Kolmogorov's continuity criterion
1.4 Gaussian processes
1.5 Non-differentiability of the Brownian sample paths
1.6 Filtrations and stopping times
1.7 Exercises

2 Martingales
2.1 Definitions and examples
2.2 Discrete-time martingales
2.2.1 Martingale transforms
2.2.2 Inequalities
2.2.3 Doob decomposition
2.2.4 Convergence theorems
2.2.5 Optional stopping theorems
2.3 Continuous-time martingales
2.3.1 Upcrossings in continuous time
2.3.2 Regularization
2.3.3 Convergence theorems
2.3.4 Inequalities
2.3.5 Optional stopping
2.4 Applications to Brownian motion
2.4.1 Quadratic variation
2.4.2 Exponential inequality
2.4.3 The law of the iterated logarithm
2.4.4 Distribution of hitting times
2.5 Exercises

3 Markov processes
3.1 Basic definitions
3.2 Existence of a canonical version
3.3 Feller processes
3.3.1 Feller transition functions and resolvents
3.3.2 Existence of a cadlag version
3.3.3 Existence of a good filtration
3.4 Strong Markov property
3.4.1 Strong Markov property of a Feller process
3.4.2 Applications to Brownian motion
3.5 Generators
3.5.1 Generator of a Feller process
3.5.2 Characteristic operator
3.6 Killed Feller processes
3.6.1 Sub-Markovian processes
3.6.2 Feynman-Kac formula
3.6.3 Feynman-Kac formula and arc-sine law for the BM
3.7 Exercises

4 Special Feller processes
4.1 Brownian motion in R^d
4.2 Feller diffusions
4.3 Feller processes on discrete spaces
4.4 Lévy processes
4.4.1 Definition, examples and first properties
4.4.2 Jumps of a Lévy process
4.4.3 Lévy-Itô decomposition
4.4.4 Lévy-Khintchine formula
4.4.5 Stable processes
4.5 Exercises

A Elements of measure theory
A.1 Definition of conditional expectation
A.2 Basic properties of conditional expectation
A.3 Uniform integrability
A.4 Monotone class theorem

B Elements of functional analysis
B.1 Hahn-Banach theorem
B.2 Riesz representation theorem

References

1 Stochastic processes

1.1 Stochastic processes

Loosely speaking, a stochastic process is a phenomenon that can be thought of as evolving in time in a random manner. Common examples are the location of a particle in a physical system, the price of a stock in a financial market, interest rates, etc.

A basic example is the erratic movement of pollen grains suspended in water, the so-called Brownian motion. This motion was named after the Scottish botanist R. Brown, who first observed it in 1827. The movement of the pollen grain is thought to be due to the impacts of the water molecules that surround it. These hits occur a large number of times in each small time interval, they are independent of each other, and the impact of one single hit is very small compared to the total effect. This suggests that the motion of the grain can be viewed as a random process with the following properties:

(i) The displacement in any time interval [s, t] is independent of what happened before time s.
(ii) Such a displacement has a Gaussian distribution, which only depends on the length of the time interval [s, t].
(iii) The motion is continuous.

The mathematical model of the Brownian motion will be the main object of investigation in this course. Figure 1.1 shows a particular realization of this stochastic process. The picture suggests that the BM has some remarkable properties, and we will see that this is indeed the case.

Mathematically, a stochastic process is simply an indexed collection of random variables. The formal definition is as follows.
[Figure 1.1: A sample path of Brownian motion]

Definition 1.1.1. Let T be a set and (E, E) a measurable space. A stochastic process indexed by T, taking values in (E, E), is a collection X = (Xt)t∈T of measurable maps Xt from a probability space (Ω, F, P) to (E, E). The space (E, E) is called the state space of the process.

We think of the index t as a time parameter, and view the index set T as the set of all possible time points. In these notes we will usually have T = Z+ = {0, 1, . . .} or T = R+ = [0, ∞). In the former case we say that time is discrete, in the latter we say time is continuous. Note that a discrete-time process can always be viewed as a continuous-time process which is constant on the intervals [n − 1, n) for all n ∈ N. The state space (E, E) will most often be a Euclidean space R^d, endowed with its Borel σ-algebra B(R^d). If E is the state space of a process, we call the process E-valued.

For every fixed t ∈ T the stochastic process X gives us an E-valued random element Xt on (Ω, F, P). We can also fix ω ∈ Ω and consider the map t ↦ Xt(ω) on T. These maps are called the trajectories, or sample paths, of the process. The sample paths are functions from T to E, i.e. elements of E^T. Hence, we can view the process X as a random element of the function space E^T. (Quite often, the sample paths are in fact elements of some nice subset of this space.)

The mathematical model of the physical Brownian motion is a stochastic process that is defined as follows.

Definition 1.1.2. The stochastic process W = (Wt)t≥0 is called a (standard) Brownian motion, or Wiener process, if

(i) W0 = 0 a.s.,
(ii) Wt − Ws is independent of (Wu : u ≤ s) for all s ≤ t,
(iii) Wt − Ws has a N(0, t − s)-distribution for all s ≤ t,
(iv) almost all sample paths of W are continuous.

We abbreviate 'Brownian motion' to BM in these notes.
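Definition 1.1.2 doubles as a simulation recipe: on a time grid, properties (i)–(iii) say that a path can be built from independent N(0, Δt) increments started at 0. The following sketch (our own illustration, not part of the notes; it assumes numpy is available) produces discrete approximations of paths like the one in Figure 1.1.

```python
import numpy as np

def brownian_path(T=1.0, n=1000, seed=None):
    """Simulate a standard BM on [0, T] at the grid points k*T/n.

    Uses properties (i)-(iii) of Definition 1.1.2: start at 0, then add
    independent, stationary N(0, dt) increments."""
    rng = np.random.default_rng(seed)
    dt = T / n
    increments = rng.normal(0.0, np.sqrt(dt), size=n)  # W_{(k+1)dt} - W_{k dt}
    return np.concatenate(([0.0], np.cumsum(increments)))

w = brownian_path(seed=42)
print(w[0], len(w))  # starts at 0.0 and has 1001 grid values
```

Averaging many such paths at t = 1 gives sample mean ≈ 0 and sample variance ≈ 1, in line with W1 having a N(0, 1)-distribution.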
Property (i) says that a standard BM starts in 0. A process with property (ii) is called a process with independent increments. Property (iii) implies that the distribution of the increment Wt − Ws only depends on t − s. This is called the stationarity of the increments. A stochastic process which has property (iv) is called a continuous process. Similarly, we call a stochastic process right-continuous if almost all of its sample paths are right-continuous functions. We will often use the acronym cadlag (continu à droite, limites à gauche) for processes with sample paths that are right-continuous and have finite left-hand limits at every time point.

It is not clear from the definition that the BM actually exists! We will have to prove that there exists a stochastic process which has all the properties required in Definition 1.1.2.

1.2 Finite-dimensional distributions

In this section we recall Kolmogorov's theorem on the existence of stochastic processes with prescribed finite-dimensional distributions. We use it to prove the existence of a process which has properties (i), (ii) and (iii) of Definition 1.1.2.

Definition 1.2.1. Let X = (Xt)t∈T be a stochastic process. The distributions of the finite-dimensional vectors of the form (Xt1, . . . , Xtn) are called the finite-dimensional distributions (fdd's) of the process.

It is easily verified that the fdd's of a stochastic process form a consistent system of measures, in the sense of the following definition.

Definition 1.2.2. Let T be a set and (E, E) a measurable space. For all t1, . . . , tn ∈ T, let µt1,...,tn be a probability measure on (E^n, E^n). This collection of measures is called consistent if it has the following properties:

(i) For all t1, . . . , tn ∈ T, every permutation π of {1, . . . , n} and all A1, . . . , An ∈ E

µt1,...,tn(A1 × · · · × An) = µtπ(1),...,tπ(n)(Aπ(1) × · · · × Aπ(n)).

(ii) For all t1, . . . , tn+1 ∈ T and A1, . . .
, An ∈ E

µt1,...,tn+1(A1 × · · · × An × E) = µt1,...,tn(A1 × · · · × An).

The Kolmogorov consistency theorem states that conversely, under mild regularity conditions, every consistent family of measures is in fact the family of fdd's of some stochastic process. Some assumptions are needed on the state space (E, E). We will assume that E is a Polish space (a complete, separable metric space) and E is its Borel σ-algebra, i.e. the σ-algebra generated by the open sets. Clearly, the Euclidean spaces (R^n, B(R^n)) fit into this framework.

Theorem 1.2.3 (Kolmogorov's consistency theorem). Suppose that E is a Polish space and E is its Borel σ-algebra. Let T be a set and for all t1, . . . , tn ∈ T, let µt1,...,tn be a measure on (E^n, E^n). If the measures µt1,...,tn form a consistent system, then on some probability space (Ω, F, P) there exists a stochastic process X = (Xt)t∈T which has the measures µt1,...,tn as its fdd's.

Proof. See for instance Billingsley (1995).

The following corollary is the first step in the proof of the existence of the BM.

Corollary 1.2.4. There exists a stochastic process W = (Wt)t≥0 with properties (i), (ii) and (iii) of Definition 1.1.2.

Proof. Let us first note that a process W has properties (i), (ii) and (iii) of Definition 1.1.2 if and only if for all t1, . . . , tn ≥ 0 the vector (Wt1, . . . , Wtn) has an n-dimensional Gaussian distribution with mean vector 0 and covariance matrix (ti ∧ tj)i,j=1..n (see Exercise 1). So we have to prove that there exists a stochastic process which has the latter distributions as its fdd's. In particular, we have to show that the matrix (ti ∧ tj)i,j=1..n is a valid covariance matrix, i.e. that it is nonnegative definite. This is indeed the case, since for all a1, . . . , an it holds that

∑_{i=1}^n ∑_{j=1}^n ai aj (ti ∧ tj) = ∫_0^∞ ( ∑_{i=1}^n ai 1_{[0,ti]}(x) )² dx ≥ 0.

This implies that for all t1, . . . , tn ≥ 0 there exists a random vector (Xt1, . . .
, Xtn) which has the n-dimensional Gaussian distribution µt1,...,tn with mean 0 and covariance matrix (ti ∧ tj)i,j=1..n. It easily follows that the measures µt1,...,tn form a consistent system. Hence, by Kolmogorov's consistency theorem, there exists a process W which has the distributions µt1,...,tn as its fdd's.

To prove the existence of the BM, it remains to consider the continuity property (iv) in the definition of the BM. This is the subject of the next section.

1.3 Kolmogorov's continuity criterion

According to Corollary 1.2.4 there exists a process W which has properties (i)–(iii) of Definition 1.1.2. We would like this process to have the continuity property (iv) of the definition as well. However, we run into the problem that there is no particular reason why the set

{ω : t ↦ Wt(ω) is continuous} ⊆ Ω

should be measurable. Hence, the probability that the process W has continuous sample paths is not well defined in general. One way around this problem is to ask whether we can modify the given process W in such a way that the resulting process, say W̃, has continuous sample paths and still satisfies (i)–(iii), i.e. has the same fdd's as W. To make this idea precise, we need the following notions.

Definition 1.3.1. Let X and Y be two processes indexed by the same set T and with the same state space (E, E), defined on probability spaces (Ω, F, P) and (Ω′, F′, P′) respectively. The processes are called versions of each other if they have the same fdd's, i.e. if for all t1, . . . , tn ∈ T and B1, . . . , Bn ∈ E

P(Xt1 ∈ B1, . . . , Xtn ∈ Bn) = P′(Yt1 ∈ B1, . . . , Ytn ∈ Bn).

Definition 1.3.2. Let X and Y be two processes indexed by the same set T and with the same state space (E, E), defined on the same probability space (Ω, F, P). The processes are called modifications of each other if for every t ∈ T

Xt = Yt a.s.
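As a numerical aside (our own sketch, assuming numpy; not part of the notes), the nonnegative definiteness of the matrix (ti ∧ tj) established in the proof of Corollary 1.2.4 is easy to confirm directly: its eigenvalues are nonnegative, and the quadratic form matches the integral identity used in the proof.

```python
import numpy as np

t = np.array([0.3, 0.7, 1.2, 2.5, 4.0])   # arbitrary time points
cov = np.minimum.outer(t, t)              # entry (i, j) equals t_i ∧ t_j

# Nonnegative definiteness via the spectrum of the symmetric matrix.
assert np.linalg.eigvalsh(cov).min() >= -1e-12

# The integral identity from the proof, for a random vector a:
# sum_ij a_i a_j (t_i ∧ t_j) = ∫_0^∞ ( sum_i a_i 1_[0, t_i](x) )^2 dx.
rng = np.random.default_rng(0)
a = rng.normal(size=t.size)
quad_form = a @ cov @ a
dx = 1e-5
x = np.arange(0.0, t.max(), dx)           # the integrand vanishes beyond max(t)
integrand = sum(ai * (x <= ti) for ai, ti in zip(a, t)) ** 2
assert abs(quad_form - integrand.sum() * dx) < 1e-2
print("the matrix (t_i ∧ t_j) is nonnegative definite")
```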
The second notion is clearly stronger than the first one: if processes are modifications of each other, then they are certainly versions of each other. The converse is not true in general (see Exercise 2). The following theorem gives a sufficient condition under which a given process has a continuous modification. The condition (1.1) is known as Kolmogorov's continuity condition.

Theorem 1.3.3 (Kolmogorov's continuity criterion). Let X = (Xt)t∈[0,T] be an R^d-valued process. Suppose that there exist constants α, β, K > 0 such that

E‖Xt − Xs‖^α ≤ K|t − s|^{1+β}   (1.1)

for all s, t ∈ [0, T]. Then there exists a continuous modification of X.

Proof. For simplicity, we assume that T = 1 in the proof. First observe that by Chebychev's inequality, condition (1.1) implies that the process X is continuous in probability. This means that if tn → t, then Xtn → Xt in probability. Now for n ∈ N, define Dn = {k/2^n : k = 0, 1, . . . , 2^n} and let D = ∪_{n=1}^∞ Dn. Then D is a countable set, and D is dense in [0, 1]. Our next aim is to show that with probability 1, the process X is uniformly continuous on D.

Fix an arbitrary γ ∈ (0, β/α). Using Chebychev's inequality again, we see that

P(‖X_{k/2^n} − X_{(k−1)/2^n}‖ ≥ 2^{−γn}) ≲ 2^{−n(1+β−αγ)},

where the notation 'a ≲ b' means that a is less than a positive constant times b. It follows that

P( max_{1≤k≤2^n} ‖X_{k/2^n} − X_{(k−1)/2^n}‖ ≥ 2^{−γn} ) ≤ ∑_{k=1}^{2^n} P(‖X_{k/2^n} − X_{(k−1)/2^n}‖ ≥ 2^{−γn}) ≲ 2^{−n(β−αγ)}.

Hence, by the Borel-Cantelli lemma, there almost surely exists an N ∈ N such that

max_{1≤k≤2^n} ‖X_{k/2^n} − X_{(k−1)/2^n}‖ < 2^{−γn}   (1.2)

for all n ≥ N.

Next, consider an arbitrary pair s, t ∈ D such that 0 < t − s < 2^{−N}. Our aim in this paragraph is to show that

‖Xt − Xs‖ ≲ |t − s|^γ.   (1.3)

Observe that there exists an n ≥ N such that 2^{−(n+1)} ≤ t − s < 2^{−n}. We claim that if s, t ∈ Dm for m ≥ n + 1, then

‖Xt − Xs‖ ≤ 2 ∑_{k=n+1}^m 2^{−γk}.   (1.4)

The proof of this claim proceeds by induction.
Suppose first that s, t ∈ D_{n+1}. Then necessarily, t = k/2^{n+1} and s = (k − 1)/2^{n+1} for some k ∈ {1, . . . , 2^{n+1}}. By (1.2), it follows that ‖Xt − Xs‖ < 2^{−γ(n+1)}, which proves the claim for m = n + 1. Now suppose that it is true for m = n + 1, . . . , l and assume that s, t ∈ D_{l+1}. Define the numbers s′, t′ ∈ Dl by

s′ = min{u ∈ Dl : u ≥ s},  t′ = max{u ∈ Dl : u ≤ t}.

Then by construction, s ≤ s′ ≤ t′ ≤ t and s′ − s ≤ 2^{−(l+1)}, t − t′ ≤ 2^{−(l+1)}. Hence, by the triangle inequality, (1.2) and the induction hypothesis,

‖Xt − Xs‖ ≤ ‖Xs′ − Xs‖ + ‖Xt − Xt′‖ + ‖Xt′ − Xs′‖ ≤ 2^{−γ(l+1)} + 2^{−γ(l+1)} + 2 ∑_{k=n+1}^l 2^{−γk} = 2 ∑_{k=n+1}^{l+1} 2^{−γk},

so the claim is true for m = l + 1 as well.

The proof of (1.3) is now straightforward. Indeed, since t, s ∈ Dm for some large enough m, relation (1.4) implies that

‖Xt − Xs‖ ≤ 2 ∑_{k=n+1}^∞ 2^{−γk} = (2/(1 − 2^{−γ})) 2^{−γ(n+1)} ≤ (2/(1 − 2^{−γ})) |t − s|^γ.

Observe that (1.3) implies in particular that almost surely, the process X is uniformly continuous on D. In other words, we have an event Ω* ⊆ Ω with P(Ω*) = 1 such that for all ω ∈ Ω*, the sample path t ↦ Xt(ω) is uniformly continuous on the countable, dense set D. Now we define a new stochastic process Y = (Yt)t∈[0,1] on (Ω, F, P) as follows: for ω ∉ Ω*, we put Yt = 0 for all t ∈ [0, 1]; for ω ∈ Ω* we define

Yt = Xt if t ∈ D,  Yt = lim_{tn→t, tn∈D} Xtn if t ∉ D.

The uniform continuity of X implies that Y is a well-defined, continuous stochastic process. Since X is continuous in probability (see the first part of the proof), Y is a modification of X (see Exercise 3).

Corollary 1.3.4. Brownian motion exists.

Proof. By Corollary 1.2.4, there exists a process W = (Wt)t≥0 which has properties (i)–(iii) of Definition 1.1.2. By property (iii) the increment Wt − Ws has a N(0, t − s)-distribution for all s ≤ t. It follows that

E(Wt − Ws)^4 = (t − s)² EZ^4,

where Z is a standard Gaussian random variable.
This means that Kolmogorov's continuity criterion (1.1) is satisfied with α = 4 and β = 1. So for every T ≥ 0, there exists a continuous modification W^T = (W^T_t)t∈[0,T] of the process (Wt)t∈[0,T]. Now define the process X = (Xt)t≥0 by

Xt = ∑_{n=1}^∞ W^n_t 1_{[n−1,n)}(t).

The process X is a Brownian motion (see Exercise 5).

1.4 Gaussian processes

Brownian motion is an example of a Gaussian process. The general definition is as follows.

Definition 1.4.1. A real-valued stochastic process is called Gaussian if all its fdd's are Gaussian.

If X is a Gaussian process indexed by the set T, the mean function of the process is the function m on T defined by m(t) = EXt. The covariance function of the process is the function r on T × T defined by r(s, t) = Cov(Xs, Xt). The functions m and r determine the fdd's of the process X.

Lemma 1.4.2. Two Gaussian processes with the same mean function and covariance function are versions of each other.

Proof. See Exercise 6.

The mean function m and covariance function r of the BM are given by m(t) = 0 and r(s, t) = s ∧ t (see Exercise 1). Conversely, the preceding lemma implies that every Gaussian process with the same mean and covariance function has the same fdd's as the BM. It follows that such a process has properties (i)–(iii) of Definition 1.1.2. Hence, we have the following result.

Lemma 1.4.3. A continuous Gaussian process X = (Xt)t≥0 is a BM if and only if it has mean function EXt = 0 and covariance function EXsXt = s ∧ t.

Using this lemma we can prove the following symmetries and scaling properties of BM.

Theorem 1.4.4. Let W be a BM. Then the following are BM's as well:

(i) (time-homogeneity) for every s ≥ 0, the process (Wt+s − Ws)t≥0,
(ii) (symmetry) the process −W,
(iii) (scaling) for every a > 0, the process W^a defined by W^a_t = a^{−1/2} W_{at},
(iv) (time inversion) the process X defined by X0 = 0 and Xt = tW_{1/t} for t > 0.

Proof.
Parts (i), (ii) and (iii) follow easily from the preceding lemma, see Exercise 7. To prove part (iv), note first that the process X has the same mean function and covariance function as the BM. Hence, by the preceding lemma, it only remains to prove that X is continuous. Since (Xt)t>0 is continuous, it suffices to show that if tn ↓ 0, then

P(Xtn → 0 as n → ∞) = 1.   (1.5)

But this probability is determined by the fdd's of the process (Xt)t>0 (see Exercise 8). Since these are the same as the fdd's of (Wt)t>0, we have

P(Xtn → 0 as n → ∞) = P(Wtn → 0 as n → ∞) = 1.

This completes the proof.

Using the scaling and the symmetry properties we can prove that the sample paths of the BM oscillate between +∞ and −∞.

Corollary 1.4.5. Let W be a BM. Then

P(sup_{t≥0} Wt = ∞) = P(inf_{t≥0} Wt = −∞) = 1.

Proof. By the scaling property we have for all a > 0

sup_t Wt =^d sup_t a^{−1/2} W_{at} = a^{−1/2} sup_t Wt.

It follows that for n ∈ N,

P(sup_t Wt ≤ n) = P(n² sup_t Wt ≤ n) = P(sup_t Wt ≤ 1/n).

By letting n tend to infinity we see that

P(sup_t Wt < ∞) = P(sup_t Wt = 0),

so the distribution of sup_t Wt is concentrated on {0, ∞}. Hence, to prove that sup_t Wt = ∞ a.s., it suffices to show that P(sup_t Wt = 0) = 0. Now we have

P(sup_t Wt = 0) ≤ P(W1 ≤ 0, sup_{t≥1} Wt ≤ 0) ≤ P(W1 ≤ 0, sup_{t≥1} (Wt − W1) < ∞) = P(W1 ≤ 0) P(sup_{t≥1} (Wt − W1) < ∞),

by the independence of the Brownian increments. By the time-homogeneity of the BM, the last probability is the probability that the supremum of a BM is finite. We just showed that this must be equal to P(sup_t Wt = 0). Hence, we find that

P(sup_t Wt = 0) ≤ ½ P(sup_t Wt = 0),

which shows that P(sup_t Wt = 0) = 0, and we obtain the first statement of the corollary. By the symmetry property,

sup_{t≥0} Wt =^d sup_{t≥0} (−Wt) = − inf_{t≥0} Wt.

Together with the first statement, this proves the second one.

Since the BM is continuous, the preceding result implies that almost every sample path visits every point of R infinitely often.
A real-valued process with this property is called recurrent.

Corollary 1.4.6. The BM is recurrent.

An interesting consequence of the time inversion is the following strong law of large numbers for the BM.

Corollary 1.4.7. Let W be a BM. Then

Wt/t → 0 almost surely as t → ∞.

Proof. Let X be as in part (iv) of Theorem 1.4.4. Then

P(Wt/t → 0 as t → ∞) = P(X_{1/t} → 0 as t → ∞) = 1,

since X is continuous and X0 = 0.

1.5 Non-differentiability of the Brownian sample paths

We have already seen that the sample paths of the BM are continuous functions that oscillate between +∞ and −∞. Figure 1.1 suggests that the sample paths are 'very rough'. The following theorem states that this is indeed the case.

Theorem 1.5.1. Let W be a Brownian motion. For all ω outside a set of probability zero, the sample path t ↦ Wt(ω) is nowhere differentiable.

Proof. Let W be a BM. Consider the upper and lower right-hand derivatives

D̄W(t, ω) = lim sup_{h↓0} (W_{t+h}(ω) − Wt(ω))/h

and

D̲W(t, ω) = lim inf_{h↓0} (W_{t+h}(ω) − Wt(ω))/h.

Consider the set

A = {ω : there exists a t ≥ 0 such that D̄W(t, ω) and D̲W(t, ω) are finite}.

Note that A is not necessarily measurable! We will prove that A is contained in a measurable set B with P(B) = 0, i.e. that A has outer probability 0. To define the event B, consider first for k, n ∈ N the random variable

X_{n,k} = max{ |W_{(k+1)/2^n} − W_{k/2^n}|, |W_{(k+2)/2^n} − W_{(k+1)/2^n}|, |W_{(k+3)/2^n} − W_{(k+2)/2^n}| }

and for n ∈ N, define

Yn = min_{k ≤ n2^n} X_{n,k}.

The event B is then defined by

B = ∪_{n=1}^∞ ∩_{k=n}^∞ {Yk ≤ k/2^k}.

We claim that A ⊆ B and P(B) = 0. To prove the inclusion, take ω ∈ A. Then there exist a t ≥ 0 and K, δ > 0 such that

|Ws(ω) − Wt(ω)| ≤ K|s − t| for all s ∈ [t, t + δ].   (1.6)

Now take n ∈ N so large that

4/2^n < δ,  8K < n,  t < n.   (1.7)

Given this n, determine k ∈ N such that

(k − 1)/2^n ≤ t < k/2^n.   (1.8)

Then by the first relation in (1.7) we have k/2^n, . . . , (k + 3)/2^n ∈ [t, t + δ].
By (1.6) and the second relation in (1.7) it follows that X_{n,k}(ω) ≤ n/2^n. By (1.8) and the third relation in (1.7) it holds that k − 1 ≤ t2^n < n2^n. Hence, we have k ≤ n2^n and therefore

Yn(ω) ≤ X_{n,k}(ω) ≤ n/2^n.

We have shown that if ω ∈ A, then Yn(ω) ≤ n/2^n for all n large enough. This precisely means that A ⊆ B.

To complete the proof, we have to show that P(B) = 0. For ε > 0, the basic properties of the BM imply that

P(X_{n,k} ≤ ε) = P(|W1| ≤ 2^{n/2} ε)³ = ( (2/√(2π)) ∫_0^{2^{n/2} ε} e^{−x²/2} dx )³ ≲ 2^{3n/2} ε³.

It follows that

P(Yn ≤ ε) ≤ n2^n P(X_{n,k} ≤ ε) ≲ n2^{5n/2} ε³.

In particular we see that P(Yn ≤ n/2^n) → 0, which implies that P(B) = 0.

1.6 Filtrations and stopping times

If W is a BM, the increment W_{t+h} − Wt is independent of 'what happened up to time t'. In this section we introduce the concept of a filtration to formalize this notion of 'information that we have up to time t'. The probability space (Ω, F, P) is fixed again and we suppose that T is a subinterval of Z+ or R+.

Definition 1.6.1. A collection (Ft)t∈T of sub-σ-algebras of F is called a filtration if Fs ⊆ Ft for all s ≤ t. A stochastic process X defined on (Ω, F, P) and indexed by T is called adapted to the filtration if for every t ∈ T, the random variable Xt is Ft-measurable.

We should think of a filtration as a flow of information. The σ-algebra Ft contains the events that can happen 'up to time t'. An adapted process is a process that 'does not look into the future'. If X is a stochastic process, we can consider the filtration (F^X_t)t∈T defined by

F^X_t = σ(Xs : s ≤ t).

We call this filtration the filtration generated by X, or the natural filtration of X. Intuitively, the natural filtration of a process keeps track of the 'history' of the process. A stochastic process is always adapted to its natural filtration.

If (Ft) is a filtration, then for t ∈ T we define the σ-algebra

F_{t+} = ∩_{n=1}^∞ F_{t+1/n}.
This is the σ-algebra Ft, augmented with the events that 'happen immediately after time t'. The collection (F_{t+})t∈T is again a filtration (see Exercise 14). Cases in which it coincides with the original filtration are of special interest.

Definition 1.6.2. We call a filtration (Ft) right-continuous if F_{t+} = Ft for every t.

Intuitively, the right-continuity of a filtration means that 'nothing can happen in an infinitesimally small time interval'. Note that for every filtration (Ft), the corresponding filtration (F_{t+}) is right-continuous.

In addition to right-continuity it is often assumed that F0 contains all events in F∞ that have probability 0, where F∞ = σ(Ft : t ≥ 0). As a consequence, every Ft then contains all those events.

Definition 1.6.3. A filtration (Ft) on a probability space (Ω, F, P) is said to satisfy the usual conditions if it is right-continuous and F0 contains all the P-negligible events in F∞.

We now introduce a very important class of 'random times' that can be associated with a filtration.

Definition 1.6.4. A [0, ∞]-valued random variable τ is called a stopping time with respect to the filtration (Ft) if for every t ∈ T it holds that {τ ≤ t} ∈ Ft. If τ < ∞ almost surely, we call the stopping time finite.

Loosely speaking, τ is a stopping time if for every t ∈ T we can determine whether τ has occurred before time t on the basis of the information that we have up to time t. With a stopping time τ we associate the σ-algebra

Fτ = {A ∈ F : A ∩ {τ ≤ t} ∈ Ft for all t ∈ T}

(see Exercise 15). This should be viewed as the collection of all events that happen prior to the stopping time τ. Note that the notation causes no confusion, since a deterministic time t ∈ T is clearly a stopping time and its associated σ-algebra is simply the σ-algebra Ft.

If the filtration (Ft) is right-continuous, then τ is a stopping time if and only if {τ < t} ∈ Ft for every t ∈ T (see Exercise 21).
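In discrete time (cf. Exercise 25) the stopping-time property is very concrete: whether {τ ≤ n} has occurred must be decidable from X0, . . . , Xn alone. The following sketch (our own illustration with hypothetical helper names, not part of the notes; it assumes numpy) contrasts a first hitting time, which is a stopping time, with a last visit time, which is not.

```python
import numpy as np

rng = np.random.default_rng(7)
# A simple random walk, observed through its natural filtration
# F_n = sigma(X_0, ..., X_n).
path = np.concatenate(([0], np.cumsum(rng.choice([-1, 1], size=200))))

def first_hitting_time(x, level):
    """First n with x[n] == level: a stopping time, since deciding
    {tau <= n} uses only x[0], ..., x[n]."""
    hits = np.flatnonzero(x == level)
    return int(hits[0]) if hits.size else None  # None encodes tau = infinity

def last_visit_time(x, level):
    """Last n with x[n] == level: NOT a stopping time, since deciding it
    requires knowing the entire future of the path."""
    hits = np.flatnonzero(x == level)
    return int(hits[-1]) if hits.size else None

tau = first_hitting_time(path, 5)
if tau is not None:
    # Stopping-time property: tau is unchanged when we reveal only the
    # path up to time tau -- no future information is needed.
    assert first_hitting_time(path[: tau + 1], 5) == tau
print(tau, last_visit_time(path, 5))
```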
For general filtrations, we introduce the following class of random times.

Definition 1.6.5. A [0, ∞]-valued random variable τ is called an optional time with respect to the filtration (Ft) if for every t ∈ T it holds that {τ < t} ∈ Ft. If τ < ∞ almost surely, we call the optional time finite.

Lemma 1.6.6. τ is an optional time with respect to (Ft) if and only if it is a stopping time with respect to (F_{t+}). Every stopping time is an optional time.

Proof. See Exercise 22.

The so-called hitting times form an important class of stopping times and optional times. The hitting time of a set B is the first time that a process visits that set. As usual, we define inf ∅ = ∞.

Lemma 1.6.7. Let (E, d) be a metric space. Suppose that X = (Xt)t≥0 is a continuous, E-valued process and that B is a closed set in E. Then the random variable

σB = inf{t ≥ 0 : Xt ∈ B}

is an (F^X_t)-stopping time.

Proof. Denote the distance of a point x ∈ E to the set B by d(x, B), so d(x, B) = inf{d(x, y) : y ∈ B}. Since X is continuous, the real-valued process Yt = d(Xt, B) is continuous as well. Moreover, since B is closed, it holds that Xt ∈ B if and only if Yt = 0. Using the continuity of Y, it follows that σB > t if and only if Ys > 0 for all s ≤ t (check!). But Y is continuous and [0, t] is compact, so we have

{σB > t} = {Ys is bounded away from 0 for all s ∈ Q ∩ [0, t]} = {Xs is bounded away from B for all s ∈ Q ∩ [0, t]}.

The event on the right-hand side clearly belongs to F^X_t.

Lemma 1.6.8. Let (E, d) be a metric space. Suppose that X = (Xt)t≥0 is a right-continuous, E-valued process and that B is an open set in E. Then the random variable

τB = inf{t > 0 : Xt ∈ B}

is an (F^X_t)-optional time.

Proof. Since B is open and X is right-continuous, it holds that Xt ∈ B if and only if there exists a δ > 0 such that Xs ∈ B for all s ∈ [t, t + δ]. Since this interval always contains a rational number, it follows that
{τB < t} = ∪_{s<t, s∈Q} {Xs ∈ B}.

The event on the right-hand side is clearly an element of F^X_t.

Example 1.6.9. Let W be a BM and for x > 0, consider the random variable

τx = inf{t > 0 : Wt = x}.

Since x > 0 and W is continuous, τx can be written as τx = inf{t ≥ 0 : Wt = x}. By Lemma 1.6.7 this is an (F^W_t)-stopping time. Moreover, by the recurrence of the BM (see Corollary 1.4.6), τx is a finite stopping time.

We often want to consider a stochastic process X, evaluated at a finite stopping time τ. However, it is not a priori clear that the map ω ↦ X_{τ(ω)}(ω) is measurable, i.e. that Xτ is in fact a random variable. This motivates the following definition.

Definition 1.6.10. An (E, E)-valued stochastic process X is called progressively measurable with respect to the filtration (Ft) if for every t ∈ T, the map (s, ω) ↦ Xs(ω) is measurable as a map from ([0, t] × Ω, B([0, t]) × Ft) to (E, E).

Lemma 1.6.11. Every adapted, right-continuous, R^d-valued process X is progressively measurable.

Proof. Let t ≥ 0 be fixed. For n ∈ N, define the process

X^n_s = X0 1_{{0}}(s) + ∑_{k=1}^n X_{kt/n} 1_{((k−1)t/n, kt/n]}(s),  s ∈ [0, t].

Clearly, the map (s, ω) ↦ X^n_s(ω) on [0, t] × Ω is B([0, t]) × Ft-measurable. Now observe that since X is right-continuous, the map (s, ω) ↦ X^n_s(ω) converges pointwise to the map (s, ω) ↦ Xs(ω) as n → ∞. It follows that the latter map is B([0, t]) × Ft-measurable as well.

Lemma 1.6.12. Suppose that X is a progressively measurable process and let τ be a finite stopping time. Then Xτ is an Fτ-measurable random variable.

Proof. To prove that Xτ is Fτ-measurable, we have to show that for every B ∈ E and every t ≥ 0, it holds that {Xτ ∈ B} ∩ {τ ≤ t} ∈ Ft. Hence, it suffices to show that the map ω ↦ X_{τ(ω)∧t}(ω) is Ft-measurable. This map is the composition of the maps ω ↦ (τ(ω) ∧ t, ω) from Ω to [0, t] × Ω and (s, ω) ↦ Xs(ω) from [0, t] × Ω to E.
The first map is measurable as a map from (Ω, Ft) to ([0, t] × Ω, B([0, t]) × Ft) (see Exercise 23). Since X is progressively measurable, the second map is measurable as a map from ([0, t] × Ω, B([0, t]) × Ft) to (E, E). This completes the proof, since the composition of measurable maps is again measurable.

It is often needed to consider a stochastic process X up to a given stopping time τ. For this purpose we define the stopped process X^τ by

X^τ_t = X_{τ∧t};  that is, X^τ_t = Xt if t < τ and X^τ_t = Xτ if t ≥ τ.

By Lemma 1.6.12 and Exercises 16 and 18, we have the following result.

Lemma 1.6.13. If X is progressively measurable with respect to (Ft) and τ is an (Ft)-stopping time, then the stopped process X^τ is adapted to the filtrations (F_{τ∧t}) and (Ft).

In the subsequent chapters we repeatedly need the following technical lemma. It states that every stopping time is the decreasing limit of a sequence of stopping times that take on only finitely many values.

Lemma 1.6.14. Let τ be a stopping time. Then there exist stopping times τn that only take finitely many values and such that τn ↓ τ almost surely.

Proof. Define

τn = ∑_{k=1}^{n2^n} (k/2^n) 1_{{τ ∈ [(k−1)/2^n, k/2^n)}} + ∞ · 1_{{τ ≥ n}}.

Then τn is a stopping time and τn ↓ τ almost surely (see Exercise 24).

Using the notion of filtrations, we can extend the definition of the BM as follows.

Definition 1.6.15. Suppose that on a probability space (Ω, F, P) we have a filtration (Ft)t≥0 and an adapted stochastic process W = (Wt)t≥0. Then W is called a (standard) Brownian motion, or Wiener process, with respect to the filtration (Ft) if

(i) W0 = 0,
(ii) Wt − Ws is independent of Fs for all s ≤ t,
(iii) Wt − Ws has a N(0, t − s)-distribution for all s ≤ t,
(iv) almost all sample paths of W are continuous.

Clearly, a process W that is a BM in the sense of the 'old' Definition 1.1.2 is a BM with respect to its natural filtration.
If in the sequel we do not mention the filtration of a BM explicitly, we mean the natural filtration. However, we will see that it is sometimes necessary to consider Brownian motions with respect to larger filtrations as well.

1.7 Exercises

1. Prove that a process W has properties (i), (ii) and (iii) of Definition 1.1.2 if and only if for all t1, . . . , tn ≥ 0 the vector (Wt1, . . . , Wtn) has an n-dimensional Gaussian distribution with mean vector 0 and covariance matrix (ti ∧ tj)i,j=1..n.

2. Give an example of two processes that are versions of each other, but not modifications.

3. Prove that the process Y defined in the proof of Theorem 1.3.3 is indeed a modification of the process X.

4. Let α > 0 be given. Give an example of a right-continuous process X that is not continuous and which satisfies E|Xt − Xs|^α ≤ K|t − s| for some K > 0 and all s, t ≥ 0. (Hint: consider a process of the form Xt = 1_{{Y ≤ t}} for a suitably chosen random variable Y.)

5. Prove that the process X in the proof of Corollary 1.3.4 is a BM.

6. Prove Lemma 1.4.2.

7. Prove parts (i), (ii) and (iii) of Theorem 1.4.4.

8. Consider the proof of the time-inversion property (iv) of Theorem 1.4.4. Prove that the probability in (1.5) is determined by the fdd's of the process X.

9. Let W be a BM and define Xt = W_{1−t} − W1 for t ∈ [0, 1]. Show that (Xt)t∈[0,1] is a BM as well.

10. Let W be a BM and fix t > 0. Define the process B by

Bs = W_{s∧t} − (Ws − W_{s∧t});  that is, Bs = Ws for s ≤ t and Bs = 2Wt − Ws for s > t.

Draw a picture of the processes W and B and show that B is again a BM. We will see another version of this so-called reflection principle in Chapter 3.

11. (i) Let W be a BM and define the process Xt = Wt − tW1, t ∈ [0, 1]. Determine the mean and covariance function of X.
(ii) The process X of part (i) is called the (standard) Brownian bridge on [0, 1], and so is every other continuous, Gaussian process indexed by the interval [0, 1] that has the same mean and covariance function.
Show that the processes Y and Z defined by Yt = (1 − t)W_{t/(1−t)} for t ∈ [0, 1), Y1 = 0, and Z0 = 0, Zt = tW_{(1/t)−1} for t ∈ (0, 1], are standard Brownian bridges.

12. Let H ∈ (0, 1) be given. A continuous, zero-mean Gaussian process X with covariance function

2EXsXt = t^{2H} + s^{2H} − |t − s|^{2H}

is called a fractional Brownian motion (fBm) with Hurst index H. Show that the fBm with Hurst index 1/2 is simply the BM. Show that if X is an fBm with Hurst index H, then for all a > 0 the process a^{−H}X_{at} is an fBm with Hurst index H as well.

13. Let W be a Brownian motion and fix t > 0. For n ∈ N, let πn be a partition of [0, t] given by 0 = t^n_0 < t^n_1 < · · · < t^n_{kn} = t and suppose that the mesh ‖πn‖ = max_k |t^n_k − t^n_{k−1}| tends to zero as n → ∞. Show that

Σ_k (W_{t^n_k} − W_{t^n_{k−1}})² → t in L²

as n → ∞. (Hint: show that the expectation of the sum tends to t, and the variance to zero.)

14. Show that if (Ft) is a filtration, (F_{t+}) is a filtration as well.

15. Prove that the collection Fτ associated with a stopping time τ is a σ-algebra.

16. Show that if σ and τ are stopping times such that σ ≤ τ, then Fσ ⊆ Fτ.

17. Let σ and τ be two (Ft)-stopping times. Show that {σ ≤ τ} ∈ Fσ ∩ Fτ.

18. If σ and τ are stopping times w.r.t. the filtration (Ft), show that σ ∧ τ and σ ∨ τ are also stopping times and determine the associated σ-algebras.

19. Show that if σ and τ are stopping times w.r.t. the filtration (Ft), then σ + τ is a stopping time as well. (Hint: for t > 0, write

{σ + τ > t} = {τ = 0, σ > t} ∪ {0 < τ < t, σ + τ > t} ∪ {τ > t, σ = 0} ∪ {τ ≥ t, σ > 0}.

Only for the second event on the right-hand side is it non-trivial to prove that it belongs to Ft. Now observe that if τ > 0, then σ + τ > t if and only if there exists a positive q ∈ Q such that q < τ and σ + q > t.)

20. Show that if σ and τ are stopping times with respect to the filtration (Ft) and X is an integrable random variable, then a.s.

1_{τ=σ} E(X | Fτ) = 1_{τ=σ} E(X | Fσ).
(Hint: show that E(1_{τ=σ}X | Fτ) = E(1_{τ=σ}X | Fτ ∩ Fσ).)

21. Show that if the filtration (Ft) is right-continuous, then τ is an (Ft)-stopping time if and only if {τ < t} ∈ Ft for all t ∈ T.

22. Prove Lemma 1.6.6.

23. Show that the map ω ↦ (τ(ω) ∧ t, ω) in the proof of Lemma 1.6.12 is measurable as a map from (Ω, Ft) to ([0, t] × Ω, B([0, t]) × Ft).

24. Show that the τn in the proof of Lemma 1.6.14 are indeed stopping times and that they converge to τ almost surely.

25. Translate the definitions of Section 1.6 to the special case that time is discrete, i.e. T = Z+.

26. Let W be a BM and let Z = {t ≥ 0 : Wt = 0} be its zero set. Show that with probability one, the set Z has Lebesgue measure zero, is closed and unbounded.

2 Martingales

2.1 Definitions and examples

In this chapter we introduce and study a very important class of stochastic processes: the so-called martingales. Martingales arise naturally in many branches of the theory of stochastic processes. In particular, they are very helpful tools in the study of the BM. In this section, the index set T is an arbitrary interval of Z+ or R+.

Definition 2.1.1. An (Ft)-adapted, real-valued process M is called a martingale (with respect to the filtration (Ft)) if

(i) E|Mt| < ∞ for all t ∈ T,
(ii) E(Mt | Fs) = Ms a.s. for all s ≤ t.

If property (ii) holds with ‘≥’ (resp. ‘≤’) instead of ‘=’, then M is called a submartingale (resp. supermartingale).

Intuitively, a martingale is a process that is ‘constant on average’. Given all information up to time s, the best guess for the value of the process at time t ≥ s is simply the current value Ms. In particular, property (ii) implies that EMt = EM0 for all t ∈ T. Likewise, a submartingale is a process that increases on average, and a supermartingale decreases on average. Clearly, M is a submartingale if and only if −M is a supermartingale, and M is a martingale if and only if it is both a submartingale and a supermartingale.
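Property (ii) in particular pins down the one-dimensional expectations: EMt = EM0 for all t. As a quick Monte Carlo sanity check (a sketch we add for illustration; it is not part of the text), one can verify this at a fixed time t for the three Brownian functionals of Example 2.1.4 below, namely Wt, Wt² − t and exp(aWt − a²t/2), whose expectations are 0, 0 and 1 for all t.

```python
import numpy as np

rng = np.random.default_rng(0)
t, a, N = 1.0, 1.0, 200_000
W = np.sqrt(t) * rng.standard_normal(N)   # samples of W_t ~ N(0, t)

# constant-expectation checks: E W_t = 0, E(W_t^2 - t) = 0, E exp(aW_t - a^2 t / 2) = 1
assert abs(W.mean()) < 0.02
assert abs((W**2 - t).mean()) < 0.02
assert abs(np.exp(a * W - a**2 * t / 2).mean() - 1) < 0.05
```

The tolerances are a few standard errors wide for this sample size; the seed only makes the run reproducible.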
The basic properties of conditional expectations (see Appendix A) give us the following examples.

Example 2.1.2. Suppose that X is an integrable random variable and (Ft)_{t∈T} a filtration. For t ∈ T, define Mt = E(X | Ft), or, more precisely, let Mt be a version of E(X | Ft). Then M = (Mt)_{t∈T} is an (Ft)-martingale and M is uniformly integrable (see Exercise 1).

Example 2.1.3. Suppose that M is a martingale and that ϕ is a convex function such that E|ϕ(Mt)| < ∞ for all t ∈ T. Then the process ϕ(M) is a submartingale. The same is true if M is a submartingale and ϕ is an increasing, convex function (see Exercise 2).

The BM generates many examples of martingales. The most important ones are presented in the following example.

Example 2.1.4. Let W be a BM. Then the following processes are martingales with respect to the same filtration:

(i) W itself,
(ii) Wt² − t,
(iii) for every a ∈ R, the process exp(aWt − a²t/2).

You are asked to prove this in Exercise 3.

In the next section we first develop the theory for discrete-time martingales. The generalization to continuous time is carried out in Section 2.3. In Section 2.4 we return to our study of the BM.

2.2 Discrete-time martingales

In this section we restrict ourselves to martingales and filtrations that are indexed by (a subinterval of) Z+. Note that as a consequence, it only makes sense to consider Z+-valued stopping times. In discrete time, τ is a stopping time with respect to the filtration (Fn)_{n∈Z+} if {τ ≤ n} ∈ Fn for all n.

2.2.1 Martingale transforms

If the value of a process at time n is already known at time n − 1, we call the process predictable. The precise definition is as follows.

Definition 2.2.1. We call a discrete-time process X predictable with respect to the filtration (Fn) if Xn is F_{n−1}-measurable for every n.

In the following definition we introduce discrete-time ‘integrals’. This is a useful tool in martingale theory.

Definition 2.2.2.
Let M and X be two discrete-time processes. We define the process X·M by (X·M)_0 = 0 and

(X·M)_n = Σ_{k=1}^n Xk(Mk − M_{k−1}).

We call X·M the discrete integral of X with respect to M. If M is a (sub-, super-)martingale, it is often called the martingale transform of M by X.

The following lemma explains why these ‘integrals’ are so useful: the integral of a predictable process with respect to a martingale is again a martingale.

Lemma 2.2.3. If M is a submartingale (resp. supermartingale) and X is a bounded, nonnegative predictable process, then X·M is a submartingale (resp. supermartingale) as well. If M is a martingale and X is a bounded, predictable process, then X·M is a martingale.

Proof. Put Y = X·M. Clearly, the process Y is adapted. Since X is bounded, say |Xn| ≤ K for all n, we have E|Yn| ≤ 2K Σ_{k≤n} E|Mk| < ∞. Now suppose first that M is a submartingale and X is nonnegative. Then a.s.

E(Yn | F_{n−1}) = E(Y_{n−1} + Xn(Mn − M_{n−1}) | F_{n−1}) = Y_{n−1} + Xn E(Mn − M_{n−1} | F_{n−1}) ≥ Y_{n−1},

hence Y is a submartingale. If M is a martingale the last inequality is an equality, irrespective of the sign of Xn, which implies that Y is a martingale in that case.

Using this lemma it is easy to see that a stopped (sub-, super-)martingale is again a (sub-, super-)martingale.

Theorem 2.2.4. Let M be a (sub-, super-)martingale and τ a stopping time. Then the stopped process M^τ is a (sub-, super-)martingale as well.

Proof. Define the process X by Xn = 1_{τ≥n} and verify that M^τ = M0 + X·M. Since τ is a stopping time, we have {τ ≥ n} = {τ ≤ n − 1}^c ∈ F_{n−1}, which shows that the process X is predictable. It is also a bounded process, so the statement follows from the preceding lemma.

The following result can be viewed as a first version of the so-called optional stopping theorem. The general version will be treated in Section 2.2.5.

Theorem 2.2.5. Let M be a submartingale and σ, τ two stopping times such that σ ≤ τ ≤ K for some constant K > 0.
Then E(Mτ | Fσ) ≥ Mσ a.s. An adapted process M is a martingale if and only if EMτ = EMσ for any such pair of stopping times.

Proof. Suppose for simplicity that M is a martingale and define the predictable process Xn = 1_{τ≥n} − 1_{σ≥n}, so that X·M = M^τ − M^σ. By Lemma 2.2.3 the process X·M is a martingale, hence E(M^τ_n − M^σ_n) = E(X·M)_n = 0 for all n. For σ ≤ τ ≤ K, it follows that

EMτ = EM^τ_K = EM^σ_K = EMσ.

Now take A ∈ Fσ and define the ‘truncated random times’

σ^A = σ1_A + K1_{A^c},  τ^A = τ1_A + K1_{A^c}.

By definition of Fσ it holds for every n that

{σ^A ≤ n} = (A ∩ {σ ≤ n}) ∪ (A^c ∩ {K ≤ n}) ∈ Fn,

so σ^A is a stopping time. Similarly, τ^A is a stopping time, and we clearly have σ^A ≤ τ^A ≤ K. By the first part of the proof, it follows that EM_{σ^A} = EM_{τ^A}, which implies that

∫_A Mσ dP = ∫_A Mτ dP.

Since A ∈ Fσ is arbitrary, this shows that E(Mτ | Fσ) = Mσ almost surely (recall that Mσ is Fσ-measurable, cf. Lemma 1.6.12).

If in the preceding paragraph σ is replaced by n − 1 and τ is replaced by n, the argument shows that if M is an adapted process for which EMτ = EMσ for every bounded pair σ ≤ τ of stopping times, then E(Mn | F_{n−1}) = M_{n−1} a.s., i.e. M is a martingale. If M is a submartingale, the same reasoning applies, but with inequalities instead of equalities.

2.2.2 Inequalities

Markov’s inequality implies that if M is a discrete-time process, then λP(Mn ≥ λ) ≤ E|Mn| for all n ∈ N and λ > 0. Doob’s classical submartingale inequality states that for submartingales, we have a much stronger result.

Theorem 2.2.6 (Doob’s submartingale inequality). Let M be a submartingale. For all λ > 0 and n ∈ N,

λP(max_{k≤n} Mk ≥ λ) ≤ EMn 1_{max_{k≤n} Mk ≥ λ} ≤ E|Mn|.

Proof. Define τ = n ∧ inf{k : Mk ≥ λ}. This is a stopping time and τ ≤ n (see Lemma 1.6.7), so by Theorem 2.2.5, we have EMn ≥ EMτ. It follows that

EMn ≥ EMτ 1_{max_{k≤n} Mk ≥ λ} + EMτ 1_{max_{k≤n} Mk < λ} ≥ λP(max_{k≤n} Mk ≥ λ) + EMn 1_{max_{k≤n} Mk < λ}.
This yields the first inequality; the second one is obvious.

Theorem 2.2.7 (Doob’s Lp-inequality). If M is a martingale or a nonnegative submartingale and p > 1, then for all n ∈ N

E(max_{k≤n} |Mk|^p) ≤ (p/(p−1))^p E|Mn|^p.

Proof. We define M* = max_{k≤n} |Mk|. By Fubini’s theorem, we have for every m ∈ N

E|M* ∧ m|^p = E ∫_0^{M*∧m} px^{p−1} dx = E ∫_0^m px^{p−1} 1_{M* ≥ x} dx = ∫_0^m px^{p−1} P(M* ≥ x) dx.

By the conditional version of Jensen’s inequality, the process |M| is a submartingale (see Example 2.1.3). Hence, the preceding theorem implies that

E|M* ∧ m|^p ≤ ∫_0^m px^{p−2} E|Mn|1_{M* ≥ x} dx = pE(|Mn| ∫_0^{M*∧m} x^{p−2} dx) = (p/(p−1)) E(|Mn| |M* ∧ m|^{p−1}).

By Hölder’s inequality, it follows that with 1/p + 1/q = 1,

E|M* ∧ m|^p ≤ (p/(p−1)) (E|Mn|^p)^{1/p} (E|M* ∧ m|^{(p−1)q})^{1/q}.

Since p > 1 we have q = p/(p − 1), so

E|M* ∧ m|^p ≤ (p/(p−1)) (E|Mn|^p)^{1/p} (E|M* ∧ m|^p)^{(p−1)/p}.

Now raise both sides to the pth power and cancel common factors to see that

E|M* ∧ m|^p ≤ (p/(p−1))^p E|Mn|^p.

The proof is completed by letting m tend to infinity.

2.2.3 Doob decomposition

An adapted, integrable process X can always be written as a sum of a martingale and a predictable process. This is called the Doob decomposition of the process X.

Theorem 2.2.8. Let X be an adapted, integrable process. There exist a martingale M and a predictable process A such that A0 = M0 = 0 and X = X0 + M + A. The processes M and A are a.s. unique. The process X is a submartingale if and only if A is increasing.

Proof. Suppose first that there exist a martingale M and a predictable process A such that A0 = M0 = 0 and X = X0 + M + A. Then the martingale property of M and the predictability of A show that

E(Xn − X_{n−1} | F_{n−1}) = An − A_{n−1}. (2.1)

Since A0 = 0 it follows that

An = Σ_{k=1}^n E(Xk − X_{k−1} | F_{k−1}) (2.2)

for n ≥ 1, and hence Mn = Xn − X0 − An. This shows that M and A are a.s. unique.
Conversely, given a process X, (2.2) defines a predictable process A, and it is easily seen that the process M defined by M = X − X0 − A is a martingale. This proves existence of the decomposition. Equation (2.1) shows that X is a submartingale if and only if the process A is increasing.

Using the Doob decomposition in combination with the submartingale inequality, we obtain the following result.

Theorem 2.2.9. Let X be a submartingale or a supermartingale. For all λ > 0 and n ∈ N,

λP(max_{k≤n} |Xk| ≥ 3λ) ≤ 4E|X0| + 3E|Xn|.

Proof. Suppose that X is a submartingale. By the preceding theorem there exist a martingale M and an increasing, predictable process A such that M0 = A0 = 0 and X = X0 + M + A. By the triangle inequality and the fact that A is increasing,

P(max_{k≤n} |Xk| ≥ 3λ) ≤ P(|X0| ≥ λ) + P(max_{k≤n} |Mk| ≥ λ) + P(An ≥ λ).

Hence, by Markov’s inequality and the submartingale inequality,

λP(max_{k≤n} |Xk| ≥ 3λ) ≤ E|X0| + E|Mn| + EAn.

Since Mn = Xn − X0 − An, the right-hand side is bounded by 2E|X0| + E|Xn| + 2EAn. We know that An is given by (2.2). Taking expectations in the latter expression shows that EAn = EXn − EX0 ≤ E|X0| + E|Xn|. This completes the proof.

2.2.4 Convergence theorems

Let M be a supermartingale and consider a compact interval [a, b] ⊆ R. The number of upcrossings of [a, b] that the process M makes up to time n is the number of times that the process ‘moves’ from a level below a to a level above b. The precise definition is as follows.

Definition 2.2.10. The number Un[a, b] is the largest k ∈ Z+ such that there exist 0 ≤ s1 < t1 < s2 < t2 < · · · < sk < tk ≤ n with M_{si} < a and M_{ti} > b.

Lemma 2.2.11 (Doob’s upcrossings lemma). Let M be a supermartingale. Then for all a < b, the number of upcrossings Un[a, b] of the interval [a, b] by M up to time n satisfies

(b − a)EUn[a, b] ≤ E(Mn − a)^−.

Proof.
Consider the bounded, predictable process X given by X0 = 1_{M0 < a} and

Xn = 1_{X_{n−1}=1} 1_{M_{n−1} ≤ b} + 1_{X_{n−1}=0} 1_{M_{n−1} < a}

for n ∈ N, and define Y = X·M. The process X is 0 until M drops below the level a, then is 1 until M gets above b, and so on. So every completed upcrossing of [a, b] increases the value of Y by at least (b − a). If the last upcrossing has not been completed at time n, this can cause Y to decrease by at most (Mn − a)^−. Hence, we have the inequality

Yn ≥ (b − a)Un[a, b] − (Mn − a)^−. (2.3)

By Lemma 2.2.3, the process Y = X·M is a supermartingale. In particular, it holds that EYn ≤ EY0 = 0. Hence, the proof is completed by taking expectations on both sides of (2.3).

Observe that the upcrossings lemma implies that if M is a supermartingale that is bounded in L1, i.e. for which sup_n E|Mn| < ∞, then EU∞[a, b] < ∞ for all a < b. In particular, the total number U∞[a, b] of upcrossings of the interval [a, b] is finite almost surely. The proof of the classical martingale convergence theorem is now straightforward.

Theorem 2.2.12 (Doob’s martingale convergence theorem). If M is a supermartingale that is bounded in L1, then Mn converges almost surely to a limit M∞ as n → ∞, and E|M∞| < ∞.

Proof. Suppose that Mn does not converge to a limit in [−∞, ∞]. Then there exist two rational numbers a < b such that lim inf Mn < a < b < lim sup Mn. In particular, we must have U∞[a, b] = ∞. This contradicts the fact that by the upcrossings lemma, the number U∞[a, b] is finite with probability 1. We conclude that almost surely, Mn converges to a limit M∞ in [−∞, ∞]. By Fatou’s lemma,

E|M∞| = E(lim inf |Mn|) ≤ lim inf E|Mn| ≤ sup E|Mn| < ∞.

This completes the proof.

If the supermartingale M is not only bounded in L1 but also uniformly integrable, then in addition to almost sure convergence we have convergence in L1. Moreover, in that case the whole sequence M0, M1, . . . , M∞ is a supermartingale.

Theorem 2.2.13.
Let M be a supermartingale that is bounded in L1. Then Mn → M∞ in L1 if and only if {Mn : n ∈ Z+} is uniformly integrable. In that case E(M∞ | Fn) ≤ Mn a.s., with equality if M is a martingale.

Proof. By the preceding theorem the convergence Mn → M∞ holds almost surely, so the first statement follows from Theorem A.3.5 in the appendix. To prove the second statement, suppose that Mn → M∞ in L1. Since M is a supermartingale we have

∫_A Mm dP ≤ ∫_A Mn dP (2.4)

for all A ∈ Fn and m ≥ n. For the integral on the left-hand side it holds that

|∫_A Mm dP − ∫_A M∞ dP| ≤ E|Mm − M∞| → 0

as m → ∞. Hence, by letting m tend to infinity in (2.4) we find that

∫_A M∞ dP ≤ ∫_A Mn dP

for every A ∈ Fn. This completes the proof.

If X is an integrable random variable and (Fn) is a filtration, then E(X | Fn) is a uniformly integrable martingale (cf. Example 2.1.2). For martingales of this type we can identify the limit explicitly in terms of the ‘limit σ-algebra’ F∞ defined by

F∞ = σ(∪_n Fn).

Theorem 2.2.14 (Lévy’s upward theorem). Let X be an integrable random variable and (Fn) a filtration. Then as n → ∞,

E(X | Fn) → E(X | F∞) almost surely and in L1.

Proof. The process Mn = E(X | Fn) is a uniformly integrable martingale (see Example 2.1.2 and Lemma A.3.4). Hence, by Theorem 2.2.13, Mn → M∞ almost surely and in L1. It remains to show that M∞ = E(X | F∞) a.s. Suppose that A ∈ Fn. Then by Theorem 2.2.13 and the definition of Mn,

∫_A M∞ dP = ∫_A Mn dP = ∫_A X dP.

By a standard monotone class argument, it follows that this holds for all A ∈ F∞.

We also need the corresponding result for decreasing families of σ-algebras. If we have a filtration of the form (Fn : n ∈ −N), i.e. a collection of σ-algebras such that F_{−(n+1)} ⊆ F_{−n} for all n, then we define

F_{−∞} = ∩_n F_{−n}.

Theorem 2.2.15 (Lévy-Doob downward theorem).
Let (Fn : n ∈ −N) be a collection of σ-algebras such that F_{−(n+1)} ⊆ F_{−n} for every n ∈ N and let M = (Mn : n ∈ −N) be a supermartingale, i.e.

E(Mn | Fm) ≤ Mm a.s.

for all m ≤ n ≤ −1. Then if sup EMn < ∞, the process M is uniformly integrable, the limit

M_{−∞} = lim_{n→−∞} Mn

exists a.s. and in L1, and E(Mn | F_{−∞}) ≤ M_{−∞} a.s., with equality if M is a martingale.

Proof. For every n ∈ N the upcrossings inequality applies to the supermartingale (Mk : k = −n, . . . , −1). So by reasoning as in the proof of Theorem 2.2.12 we see that the limit M_{−∞} = lim_{n→−∞} Mn exists and is finite almost surely. Now for all K > 0 and n ∈ −N we have

∫_{|Mn|>K} |Mn| dP = −∫_{Mn<−K} Mn dP − ∫_{Mn≤K} Mn dP + EMn.

As n tends to −∞, the expectation EMn increases to a finite limit. So for an arbitrary ε > 0, there exists an m ∈ −N such that EMn − EMm ≤ ε for all n ≤ m. Together with the supermartingale property this implies that for all n ≤ m,

∫_{|Mn|>K} |Mn| dP ≤ −∫_{Mn<−K} Mm dP − ∫_{Mn≤K} Mm dP + EMm + ε ≤ ∫_{|Mn|>K} |Mm| dP + ε.

Hence, to prove the uniform integrability of M it suffices (in view of Lemma A.3.1) to show that by choosing K large enough, we can make P(|Mn| > K) arbitrarily small for all n simultaneously. Now consider the process M^− = max{−M, 0}. It is an increasing, convex function of the submartingale −M, whence it is a submartingale itself (see Example 2.1.3). In particular, EM_n^− ≤ EM_{−1}^− for all n ∈ −N. It follows that

E|Mn| = EMn + 2EM_n^− ≤ sup EMn + 2E|M_{−1}|

and, consequently,

P(|Mn| > K) ≤ (1/K)(sup EMn + 2E|M_{−1}|).

So indeed, M is uniformly integrable. The limit M_{−∞} therefore exists in L1 as well and the proof can be completed by reasoning as in the proof of the upward theorem.

Note that the downward theorem includes the ‘downward version’ of Theorem 2.2.14 as a special case. Indeed, if X is an integrable random variable and F1 ⊇ F2 ⊇ · · · is a decreasing sequence of σ-algebras with F∞ = ∩_n Fn, then as n → ∞, E(X | Fn) → E(X | F∞) almost surely and in L1.
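Lévy’s convergence theorems have a concrete incarnation that is easy to simulate. Take Ω = [0, 1) with Lebesgue measure, X = f(U) for the identity variable U, and let Fn be generated by the dyadic intervals of length 2^{-n}. Then E(X | Fn) is the average of f over the dyadic interval containing U, and the upward theorem says these averages converge to f(U). A numerical sketch of our own (the function name and the choice f(x) = x² are illustrative assumptions, not from the text):

```python
import numpy as np

def dyadic_conditional_expectation(f, u, n, grid=10_000):
    """E(f(U) | F_n) evaluated at u: the average of f over the dyadic
    interval [k/2^n, (k+1)/2^n) containing u, computed by a Riemann sum."""
    k = int(u * 2**n)
    xs = np.linspace(k / 2**n, (k + 1) / 2**n, grid, endpoint=False)
    return f(xs).mean()

f = lambda x: x**2
u = 0.3
errs = [abs(dyadic_conditional_expectation(f, u, n) - f(u)) for n in (2, 5, 8)]
assert errs[0] > errs[-1]   # the approximation improves as F_n grows
assert errs[-1] < 2e-3      # E(X | F_n) -> X = f(U) as n -> infinity
```

Here each E(X | Fn) is constant on dyadic intervals, i.e. Fn-measurable, and refining the partition recovers X, which is exactly the martingale (Mn) of Example 2.1.2 converging to its limit.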
This is generalized in the following corollary of Theorems 2.2.14 and 2.2.15, which will be useful in the sequel.

Corollary 2.2.16. Suppose that Xn → X a.s. and that |Xn| ≤ Y for all n, where Y is an integrable random variable. Moreover, suppose that F1 ⊆ F2 ⊆ · · · (resp. F1 ⊇ F2 ⊇ · · · ) is an increasing (resp. decreasing) sequence of σ-algebras. Then E(Xn | Fn) → E(X | F∞) a.s. as n → ∞, where F∞ = σ(∪_n Fn) (resp. F∞ = ∩_n Fn).

Proof. For m ∈ N, put Um = inf_{n≥m} Xn and Vm = sup_{n≥m} Xn. Then since Xn → X a.s., we a.s. have Vm − Um → 0 as m → ∞. It holds that |Vm − Um| ≤ 2|Y|, so dominated convergence implies that E(Vm − Um) → 0 as m → ∞. Now fix an arbitrary ε > 0 and choose m so large that E(Vm − Um) ≤ ε. For n ≥ m we have

Um ≤ Xn ≤ Vm, (2.5)

hence E(Um | Fn) ≤ E(Xn | Fn) ≤ E(Vm | Fn) a.s. The processes on the left-hand side and the right-hand side are martingales which satisfy the conditions of the upward (resp. downward) theorem, so letting n tend to infinity we obtain, a.s.,

E(Um | F∞) ≤ lim inf E(Xn | Fn) ≤ lim sup E(Xn | Fn) ≤ E(Vm | F∞). (2.6)

It follows that

E(lim sup E(Xn | Fn) − lim inf E(Xn | Fn)) ≤ E(E(Vm | F∞) − E(Um | F∞)) = E(Vm − Um) ≤ ε.

By letting ε ↓ 0 we see that lim sup E(Xn | Fn) = lim inf E(Xn | Fn) a.s., so E(Xn | Fn) converges a.s. To identify the limit we let n → ∞ in (2.5), finding that Um ≤ X ≤ Vm, hence

E(Um | F∞) ≤ E(X | F∞) ≤ E(Vm | F∞) (2.7)

a.s. From (2.6) and (2.7) we see that both lim E(Xn | Fn) and E(X | F∞) are a.s. between the random variables E(Um | F∞) and E(Vm | F∞). It follows that

E|lim E(Xn | Fn) − E(X | F∞)| ≤ E(Vm − Um) ≤ ε.

By letting ε ↓ 0 we see that lim E(Xn | Fn) = E(X | F∞) a.s.

2.2.5 Optional stopping theorems

Theorem 2.2.5 implies that if M is a martingale and σ ≤ τ are two bounded stopping times, then E(Mτ | Fσ) = Mσ a.s. The following theorem extends this result.

Theorem 2.2.17 (Optional stopping theorem).
Let M be a uniformly integrable martingale. Then the family of random variables {Mτ : τ is a finite stopping time} is uniformly integrable and for all stopping times σ ≤ τ we have E(Mτ | Fσ) = Mσ almost surely.

Proof. By Theorem 2.2.13, M∞ = lim Mn exists almost surely and in L1, and E(M∞ | Fn) = Mn a.s. Now let τ be an arbitrary stopping time and n ∈ N. Since τ ∧ n ≤ n we have F_{τ∧n} ⊆ Fn. By the tower property of conditional expectations (see Appendix A), it follows that for every n ∈ N, a.s.,

E(M∞ | F_{τ∧n}) = E(E(M∞ | Fn) | F_{τ∧n}) = E(Mn | F_{τ∧n}).

Hence, by Theorem 2.2.5, we almost surely have E(M∞ | F_{τ∧n}) = M_{τ∧n}. Now let n tend to infinity. Then the right-hand side converges almost surely to Mτ, and by the upward convergence theorem, the left-hand side converges a.s. (and in L1) to E(M∞ | G), where G = σ(∪_n F_{τ∧n}), so

E(M∞ | G) = Mτ almost surely. (2.8)

Now take A ∈ Fτ. Then

∫_A M∞ dP = ∫_{A∩{τ<∞}} M∞ dP + ∫_{A∩{τ=∞}} M∞ dP.

Since A ∩ {τ < ∞} ∈ G (see Exercise 6), relation (2.8) implies that

∫_{A∩{τ<∞}} M∞ dP = ∫_{A∩{τ<∞}} Mτ dP.

Trivially, we also have

∫_{A∩{τ=∞}} M∞ dP = ∫_{A∩{τ=∞}} Mτ dP.

Hence, we find that

∫_A M∞ dP = ∫_A Mτ dP

for every A ∈ Fτ. We conclude that E(M∞ | Fτ) = Mτ almost surely. The first statement of the theorem now follows from Lemma A.3.4 in Appendix A. The second statement follows from the tower property of conditional expectations and the fact that Fσ ⊆ Fτ if σ ≤ τ.

For the equality E(Mτ | Fσ) = Mσ in the preceding theorem to hold it is necessary that M is uniformly integrable. There exist (positive) martingales that are bounded in L1 but not uniformly integrable, for which the equality fails in general (see Exercise 12)! For positive supermartingales without additional integrability properties we only have an inequality.

Theorem 2.2.18. Let M be a nonnegative supermartingale and let σ ≤ τ be stopping times. Then

E(Mτ 1_{τ<∞} | Fσ) ≤ Mσ 1_{σ<∞}

almost surely.

Proof. Fix n ∈ N.
The stopped supermartingale M^{τ∧n} is a supermartingale again (cf. Theorem 2.2.4) and is uniformly integrable (check!). By reasoning exactly as in the proof of the preceding theorem we find that, a.s.,

E(M_{τ∧n} | Fσ) = E(M^{τ∧n}_∞ | Fσ) ≤ M^{τ∧n}_σ = M_{σ∧n}.

Hence, by the conditional version of Fatou’s lemma (see Appendix A),

E(Mτ 1_{τ<∞} | Fσ) ≤ E(lim inf M_{τ∧n} | Fσ) ≤ lim inf E(M_{τ∧n} | Fσ) ≤ lim inf M_{σ∧n}

almost surely. On the event {σ < ∞} the lim inf in the last line equals Mσ.

2.3 Continuous-time martingales

In this section we consider general martingales, indexed by a subset T of R+. If the martingale M = (Mt)_{t≥0} has ‘nice’ sample paths, for instance right-continuous, then M can be approximated ‘accurately’ by a discrete-time martingale. Simply choose a countable, dense subset {tn} of the index set T and compare the continuous-time martingale M with the discrete-time martingale (M_{tn})_n. This simple idea allows us to transfer many of the discrete-time results to the continuous-time setting.

2.3.1 Upcrossings in continuous time

For a continuous-time process X we define the number of upcrossings of the interval [a, b] in the set of time points T ⊆ R+ as follows. For a finite set F = {t1, . . . , tn} ⊆ T we define U_F[a, b] as the number of upcrossings of [a, b] of the discrete-time process (X_{ti})_{i=1,...,n} (see Definition 2.2.10). We put

U_T[a, b] = sup{U_F[a, b] : F ⊆ T, F finite}.

Doob’s upcrossings lemma has the following extension.

Lemma 2.3.1. Let M be a supermartingale and let T ⊆ R+ be countable. Then for all a < b, the number of upcrossings U_T[a, b] of the interval [a, b] by M satisfies

(b − a)EU_T[a, b] ≤ sup_{t∈T} E(Mt − a)^−.

Proof. Let Tn be a nested sequence of finite sets such that U_T[a, b] = lim U_{Tn}[a, b]. For every n, the discrete-time upcrossings inequality states that

(b − a)EU_{Tn}[a, b] ≤ E(M_{tn} − a)^−,

where tn is the largest element of Tn.
By the conditional version of Jensen’s inequality the process (M − a)^− is a submartingale (see Example 2.1.3). In particular the function t ↦ E(Mt − a)^− is increasing, so

E(M_{tn} − a)^− = sup_{t∈Tn} E(Mt − a)^−.

Hence, for every n we have the inequality

(b − a)EU_{Tn}[a, b] ≤ sup_{t∈Tn} E(Mt − a)^−.

The proof is completed by letting n tend to infinity.

2.3.2 Regularization

Theorem 2.3.2. Let M be a supermartingale. Then for almost all ω, the limits

lim_{q↑t, q∈Q} Mq(ω) and lim_{q↓t, q∈Q} Mq(ω)

exist and are finite for every t ≥ 0.

Proof. Fix n ∈ N. By Lemma 2.3.1 there exists an event Ωn of probability 1 on which the upcrossing numbers of the process M satisfy U_{[0,n]∩Q}[a, b] < ∞ for every pair of rational numbers a < b. Now suppose that for t ≤ n, the limit

lim_{q↑t, q∈Q} Mq

does not exist. Then there exist two rational numbers a < b such that

lim inf_{q↑t, q∈Q} Mq < a < b < lim sup_{q↑t, q∈Q} Mq.

But this implies that U_{[0,n]∩Q}[a, b] = ∞. We conclude that on the event Ωn the limit lim_{q↑t, q∈Q} Mq exists for every t ≤ n. Similarly, the other limit exists for every t ≤ n. It follows that on the event Ω0 = ∩_n Ωn, both limits exist in [−∞, ∞] for every t ≥ 0.

To prove that the left limit is in fact finite, let Q be the set of all rational numbers less than t and let Qn be nested finite sets of rational numbers increasing to Q. For fixed n, the process (Mq)_{q∈{0}∪Qn∪{t}} is a discrete-time supermartingale. Hence, by Theorem 2.2.9,

λP(max_{q∈Qn} |Mq| ≥ 3λ) ≤ 4E|M0| + 3E|Mt|.

Letting n → ∞ and then λ → ∞, we see that sup_{q∈Q} |Mq| < ∞ almost surely, and hence the left limit is finite as well. The finiteness of the right limit is proved similarly.

Corollary 2.3.3. Almost every sample path of a right-continuous supermartingale is cadlag.

Proof. See Exercise 7.

Given a supermartingale M we now define for every t ≥ 0

M_{t+} = lim_{q↓t, q∈Q} Mq.
These random variables are well defined by Theorem 2.3.2 and we have the following result regarding the process (M_{t+})_{t≥0}.

Lemma 2.3.4. Let M be a supermartingale. Then E|M_{t+}| < ∞ for every t and E(M_{t+} | Ft) ≤ Mt a.s. If t ↦ EMt is continuous, this inequality is an equality. Moreover, the process (M_{t+})_{t≥0} is a supermartingale with respect to the filtration (F_{t+})_{t≥0} and it is a martingale if M is a martingale.

Proof. Let tn be a sequence of rational numbers decreasing to t. Then the process M_{tn} is a ‘backward supermartingale’ like we considered in Theorem 2.2.15. Hence, M_{t+} is integrable and M_{tn} converges to M_{t+} in L1. It follows that in the inequality E(M_{tn} | Ft) ≤ Mt a.s., we may let n tend to infinity to find that E(M_{t+} | Ft) ≤ Mt a.s. The L1-convergence also implies that EM_{tn} → EM_{t+}. So if t ↦ EMt is continuous, we have EM_{t+} = lim EM_{tn} = EMt, hence E(M_{t+} | Ft) = Mt a.s. (see Exercise 8).

To prove the final statement, take s < t and let sn be a sequence of rational numbers smaller than t and decreasing to s. Then

E(M_{t+} | F_{sn}) = E(E(M_{t+} | Ft) | F_{sn}) ≤ E(Mt | F_{sn}) ≤ M_{sn}

almost surely, with equality if M is a martingale. The right-hand side of this display converges to M_{s+} as n → ∞. The process (E(M_{t+} | F_{sn}))_n is a ‘backward supermartingale’, so by Theorem 2.2.15, the left-hand side converges to E(M_{t+} | F_{s+}) almost surely.

We can now prove the main regularization theorem for supermartingales with respect to filtrations satisfying the usual conditions (see Definition 1.6.3).

Theorem 2.3.5. Let M be a supermartingale with respect to a filtration (Ft) that satisfies the usual conditions. Then if t ↦ EMt is right-continuous, the process M has a cadlag modification.

Proof. By Theorem 2.3.2, there exists an event Ω0 of probability 1 on which the limits

M_{t−} = lim_{q↑t, q∈Q} Mq and M_{t+} = lim_{q↓t, q∈Q} Mq

exist for every t. We define the process M̃ by M̃t(ω) = M_{t+}(ω)1_{Ω0}(ω) for t ≥ 0. Then M̃t = M_{t+} a.s.
and since M_{t+} is F_{t+}-measurable, we have M̃t = E(M_{t+} | F_{t+}) a.s. Hence, by the right-continuity of the filtration and the preceding lemma, M̃t = E(M_{t+} | Ft) ≤ Mt a.s. Since t ↦ EMt is right-continuous, it holds that

EM̃t = lim_{s↓t} EMs = EMt.

Together with the fact that M̃t ≤ Mt a.s. this implies that M̃t = Mt a.s., in other words, M̃ is a modification of M (see Exercise 8). The usual conditions on the filtration imply that M̃ is adapted, and the process is cadlag by construction (see Exercise 9). By the preceding lemma and the right-continuity of the filtration it is a supermartingale.

Corollary 2.3.6. A martingale with respect to a filtration that satisfies the usual conditions has a cadlag modification.

2.3.3 Convergence theorems

In view of the results of the preceding section, we will only consider right-continuous martingales from this point on. Under this assumption, many of the discrete-time theorems can be generalized to continuous time.

Theorem 2.3.7 (Martingale convergence theorem). Let M be a right-continuous supermartingale that is bounded in L1. Then Mt converges almost surely to a limit M∞ as t → ∞, and E|M∞| < ∞.

Proof. First we show that since M is right-continuous, it holds that Mt → M∞ as t → ∞ if and only if

lim_{t→∞, t∈Q+} Mt = M∞. (2.9)

To prove the non-trivial implication in this assertion, assume that (2.9) holds and fix an ε > 0. Then there exists a number a > 0 such that for all t ∈ [a, ∞) ∩ Q, it holds that |Mt − M∞| < ε. Now let t > a be arbitrary. Since M is right-continuous, there exists an s > t > a such that s ∈ Q and |Ms − Mt| < ε. By the triangle inequality, it follows that |Mt − M∞| ≤ |Mt − Ms| + |Ms − M∞| ≤ 2ε, which proves the assertion.

So to prove the convergence, we may assume that M is indexed by the countable set Q+. The proof can now be finished by arguing as in the proof of Theorem 2.2.12, replacing Doob’s discrete-time upcrossings inequality by Lemma 2.3.1.

Corollary 2.3.8.
A nonnegative, right-continuous supermartingale M converges almost surely as t → ∞.

Proof. Simple exercise.

The following continuous-time extension of Theorem 2.2.13 can be derived by reasoning as in the discrete-time case. The only slight difference is that for a continuous-time process X, it is not the case that L1-convergence of Xt as t → ∞ implies that X is uniformly integrable. This explains why martingales and supermartingales are treated separately in the following theorem.

Theorem 2.3.9. Let M be a right-continuous supermartingale that is bounded in L1.

(i) If M is uniformly integrable, then Mt → M∞ in L1 as t → ∞ and E(M∞ | Ft) ≤ Mt a.s., with equality if M is a martingale.

(ii) If M is a martingale and Mt → M∞ in L1 as t → ∞, then M is uniformly integrable.

2.3.4 Inequalities

Doob’s submartingale inequality and Lp-inequality are easily extended to the setting of general right-continuous martingales.

Theorem 2.3.10 (Doob’s submartingale inequality). Let M be a right-continuous submartingale. Then for all λ > 0 and t ≥ 0,

P(sup_{s≤t} Ms ≥ λ) ≤ (1/λ) E|Mt|.

Proof. Let T be a countable, dense subset of [0, t] and choose a sequence of finite subsets Tn ⊆ T such that t ∈ Tn for every n and Tn ↑ T as n → ∞. Then by the discrete-time version of the submartingale inequality,

P(max_{s∈Tn} Ms ≥ λ) ≤ (1/λ) E|Mt|.

Now let n tend to infinity. Then the probability on the left-hand side increases to

P(sup_{s∈T} Ms ≥ λ).

Since M is right-continuous, the supremum over T equals the supremum over [0, t].

By exactly the same reasoning we can generalize the Lp-inequality to continuous time.

Theorem 2.3.11 (Doob’s Lp-inequality). Let M be a right-continuous martingale or a right-continuous, nonnegative submartingale. Then for all p > 1 and t ≥ 0,

E(sup_{s≤t} |Ms|^p) ≤ (p/(p−1))^p E|Mt|^p.

2.3.5 Optional stopping

We now come to the continuous-time version of the optional stopping theorem.

Theorem 2.3.12 (Optional stopping theorem).
Let M be a rightcontinuous, uniformly integrable martingale. Then for all stopping times σ ≤ τ we have E(Mτ | Fσ ) = Mσ almost surely. Proof. By Lemma 1.6.14 there exist stopping times σn and τn that only take finitely many values and such that τn ↓ τ and σn ↓ σ almost surely. Moreover, it is easily seen that for the stopping times σn and τn constructed as in the proof of Lemma 1.6.14, it holds that σn ≤ τn for every n. Hence, by the discrete-time optional sampling theorem, E(Mτn | Fσn ) = Mσn almost surely. Since σ ≤ σn it holds that Fσ ⊆ Fσn . It follows that a.s. E(Mτn | Fσ ) = E(E(Mτn | Fσn ) | Fσ ) = E(Mσn | Fσ ). (2.10) The discrete-time optional sampling theorem also implies that E(Mτn | Fτn+1 ) = Mτn+1 almost surely. Hence, Mτn is a ‘backward’ martingale in the sense of Theorem 2.2.15. Since sup EMτn = EM0 < ∞, Theorem 2.2.15 implies that Mτn is uniformly integrable. Together with the fact that Mτn → Mτ a.s. by the rightcontinuity of Mτ , this implies that Mτn → Mτ in L1 . Similarly, we have that Mσn → Mσ in L1 . Now take A ∈ Fσ . Then by equation (2.10) it holds that Z Z Mσn dP. Mτn dP = A A By the L1 -convergence we just proved, this yields the relation Z Z Mσ dP Mτ dP = A A if we let n tend to infinity. This completes the proof. Corollary 2.3.13. A right-continuous, adapted process M is a martingale if and only if for every bounded stopping time τ , the random variable Mτ is integrable and EMτ = EM0 almost surely. 40 Martingales Proof. Suppose first that M is a martingale. Since τ is bounded, there exists a constant K > 0 such that τ ≤ K. The process (Mt )t≤K is uniformly integrable. Hence, by the optional sampling theorem, EMτ = EM0 . The converse statement is proved by arguing as in the proof of Theorem 2.2.5. Corollary 2.3.14. If M is a right-continuous martingale and τ is a stopping time, then the stopped process M τ is a martingale as well. Proof. Let σ be a bounded stopping time. 
Then by applying the preceding corollary to the martingale M we find EMστ = EMτ ∧σ = EM0 = EM0τ . Since M τ is right-continuous and adapted (see Lemmas 1.6.11 and 1.6.13), another application of the preceding corollary completes the proof. Just as in discrete time the assumption of uniform integrability is crucial for the optional stopping theorem. If this condition is dropped we only have an inequality in general. Theorem 2.2.18 carries over to continuous time by using the same arguments as in the proof of Theorem 2.3.12. Theorem 2.3.15. Let M be a nonnegative, right-continuous supermartingale and let σ ≤ τ be stopping times. Then E(Mτ 1{τ <∞} | Fσ ) ≤ Mσ 1{σ<∞} almost surely. A consequence of this result is that nonnegative, right-continuous supermartingales stay at zero once they have hit it. Corollary 2.3.16. Let M be a nonnegative, right-continuous supermartingale and define τ = inf{t : Mt = 0 or Mt− = 0}. Then almost surely, the process M vanishes on [τ, ∞). Proof. Define τn = inf{t : Mt ≤ 1/n}. Then for a rational number q > 0 we have τn ≤ τ + q, so that by the preceding theorem, EMτ +q 1{τ <∞} ≤ EMτn 1{τn <∞} ≤ 1 . n It follows that almost surely on the event {τ < ∞}, it holds that Mτ +q = 0 for every rational q > 0. Since M is right-continuous, it holds for all q ≥ 0. 2.4 Applications to Brownian motion 2.4 41 Applications to Brownian motion In this section we apply the developed martingale theory to the study of the Brownian motion. 2.4.1 Quadratic variation The following theorem extends the result of Exercise 13 of Chapter 1. Theorem 2.4.1. Let W be a Brownian motion and fix t > 0. For n ∈ N, let πn be a partition of [0, t] given by 0 = tn0 ≤ tn1 ≤ · · · ≤ tnkn = t and suppose that the mesh kπn k = maxk |tnk − tnk−1 | tends to zero as n → ∞. Then X k L2 (Wtnk − Wtnk−1 )2 −→ t as n → ∞. If the partitions are nested we have X as (Wtnk − Wtnk−1 )2 → t k as n → ∞. Proof. For the first statement, see Exercise 13 of Chapter 1. 
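Before turning to the second statement, the L2-convergence in the first one is easy to observe numerically. The following sketch (using NumPy; the values of t, the number of steps and the number of paths are arbitrary choices for the illustration) simulates Brownian increments over a uniform partition of [0, t] and checks that the sums of squared increments concentrate around t:

```python
import numpy as np

rng = np.random.default_rng(0)
t, n_steps, n_paths = 2.0, 10_000, 500

# Brownian increments over the uniform partition of [0, t] with mesh t/n:
# independent N(0, t/n) variables, one row per simulated path.
dW = rng.normal(0.0, np.sqrt(t / n_steps), size=(n_paths, n_steps))

# Sum of squared increments along each path.
qv = np.sum(dW ** 2, axis=1)

print(qv.mean())  # close to t = 2.0
print(qv.std())   # about t * sqrt(2 / n_steps), here roughly 0.03
```

Since each squared increment has variance 2(t/n)², the standard deviation of the sum is t√(2/n), so refining the partition drives the L2-distance to t to zero, in line with the theorem.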
To prove the second one, denote the sum by Xn and put Fn = σ(Xn, Xn+1, . . .). Then Fn+1 ⊆ Fn for every n ∈ N. Now suppose that we can show that E(Xn | Fn+1) = Xn+1 a.s. Then, since sup EXn = t < ∞, Theorem 2.2.15 implies that Xn converges almost surely to a finite limit X∞. By the first statement of the theorem the Xn converge in probability to t. Hence, we must have X∞ = t a.s.

So it remains to prove that E(Xn | Fn+1) = Xn+1 a.s. Without loss of generality, we assume that the number of elements of the partition πn equals n. In that case, there exists a sequence tn such that the partition πn has the numbers t1, . . . , tn as its division points: the point tn is added to πn−1 to form the next partition πn. Now fix n and consider the process W′ defined by W′s = Ws∧tn+1 − (Ws − Ws∧tn+1). By Exercise 10 of Chapter 1, W′ is again a BM. For W′, denote the analogues of the sums Xk by X′k. Then it is easily seen that X′k = Xk for k ≥ n + 1. Moreover, it holds that X′n − X′n+1 = Xn+1 − Xn (check!). Since both W and W′ are BM’s, the sequences (X1, X2, . . .) and (X′1, X′2, . . .) have the same distribution. It follows that a.s.,
\[
\begin{aligned}
\mathbb{E}(X_n - X_{n+1} \mid \mathcal{F}_{n+1})
&= \mathbb{E}(X'_n - X'_{n+1} \mid X'_{n+1}, X'_{n+2}, \ldots) \\
&= \mathbb{E}(X'_n - X'_{n+1} \mid X_{n+1}, X_{n+2}, \ldots) \\
&= \mathbb{E}(X_{n+1} - X_n \mid X_{n+1}, X_{n+2}, \ldots) \\
&= -\mathbb{E}(X_n - X_{n+1} \mid \mathcal{F}_{n+1}).
\end{aligned}
\]
This implies that E(Xn − Xn+1 | Fn+1) = 0 almost surely.

A real-valued function f is said to be of finite variation on an interval [a, b] if there exists a finite number K > 0 such that for every finite partition a = t0 < · · · < tn = b of [a, b] it holds that
\[
\sum_k |f(t_k) - f(t_{k-1})| < K.
\]
Roughly speaking, this means that the graph of the function f on [a, b] has finite length. Theorem 2.4.1 shows that the sample paths of the BM have positive and finite quadratic variation. This has the following consequence.

Corollary 2.4.2. Almost every sample path of the BM has unbounded variation on every interval.

Proof. Fix t > 0.
Let πn be nested partitions of [0, t] given by 0 = tn0 ≤ tn1 ≤ · · · ≤ tnkn = t and suppose that the mesh ‖πn‖ = maxk |tnk − tnk−1| tends to zero as n → ∞. Then
\[
\sum_k \big(W_{t^n_k} - W_{t^n_{k-1}}\big)^2 \le \Big(\sup_{|s-u| \le \|\pi_n\|} |W_s - W_u|\Big) \sum_k \big|W_{t^n_k} - W_{t^n_{k-1}}\big|,
\]
where the supremum is over s, u ∈ [0, t]. By the continuity of the Brownian sample paths the first factor on the right-hand side converges to zero a.s. as n → ∞. Hence, if the BM had finite variation on [0, t] with positive probability, then
\[
\sum_k \big(W_{t^n_k} - W_{t^n_{k-1}}\big)^2
\]
would converge to zero with positive probability, which contradicts Theorem 2.4.1.

2.4.2 Exponential inequality

Let W be a Brownian motion. We have the following exponential inequality for the tail probabilities of the running maximum of the Brownian motion.

Theorem 2.4.3. For every t ≥ 0 and λ > 0,
\[
\mathbb{P}\Big(\sup_{s \le t} W_s \ge \lambda\Big) \le e^{-\lambda^2/2t}
\quad\text{and}\quad
\mathbb{P}\Big(\sup_{s \le t} |W_s| \ge \lambda\Big) \le 2\,e^{-\lambda^2/2t}.
\]

Proof. For a > 0 consider the exponential martingale M defined by Mt = exp(aWt − a²t/2) (see Example 2.1.4). Observe that
\[
\mathbb{P}\Big(\sup_{s \le t} W_s \ge \lambda\Big) \le \mathbb{P}\Big(\sup_{s \le t} M_s \ge e^{a\lambda - a^2 t/2}\Big).
\]
By the submartingale inequality, the probability on the right-hand side is bounded by
\[
e^{a^2 t/2 - a\lambda}\,\mathbb{E}M_t = e^{a^2 t/2 - a\lambda}\,\mathbb{E}M_0 = e^{a^2 t/2 - a\lambda}.
\]
The proof of the first inequality is completed by minimizing the latter expression in a > 0; the minimum is attained at a = λ/t. To prove the second one, note that
\[
\mathbb{P}\Big(\sup_{s \le t} |W_s| \ge \lambda\Big) \le \mathbb{P}\Big(\sup_{s \le t} W_s \ge \lambda\Big) + \mathbb{P}\Big(\inf_{s \le t} W_s \le -\lambda\Big)
= \mathbb{P}\Big(\sup_{s \le t} W_s \ge \lambda\Big) + \mathbb{P}\Big(\sup_{s \le t} (-W_s) \ge \lambda\Big).
\]
The proof is completed by applying the first inequality to the BM’s W and −W.

The exponential inequality also follows from the fact that sups≤t Ws =d |Wt| for every fixed t. We will prove this equality in distribution in the next chapter.

2.4.3 The law of the iterated logarithm

The law of the iterated logarithm describes how the BM oscillates near zero and infinity. In the proof we need the following simple lemma.

Lemma 2.4.4. For every a > 0,
\[
\int_a^\infty e^{-x^2/2}\,dx \ge \frac{a}{1+a^2}\,e^{-a^2/2}.
\]

Proof. The proof starts from the inequality
\[
\int_a^\infty \frac{1}{x^2}\,e^{-x^2/2}\,dx \le \frac{1}{a^2}\int_a^\infty e^{-x^2/2}\,dx.
\]
Integration by parts shows that the left-hand side equals
\[
-\int_a^\infty e^{-x^2/2}\,d\Big(\frac{1}{x}\Big)
= \frac{1}{a}\,e^{-a^2/2} + \int_a^\infty \frac{1}{x}\,d\big(e^{-x^2/2}\big)
= \frac{1}{a}\,e^{-a^2/2} - \int_a^\infty e^{-x^2/2}\,dx.
\]
Hence, we find that
\[
\Big(1 + \frac{1}{a^2}\Big)\int_a^\infty e^{-x^2/2}\,dx \ge \frac{1}{a}\,e^{-a^2/2}
\]
and the proof is complete.

Theorem 2.4.5 (Law of the iterated logarithm). It almost surely holds that
\[
\limsup_{t \downarrow 0} \frac{W_t}{\sqrt{2t \log\log(1/t)}} = 1, \qquad
\liminf_{t \downarrow 0} \frac{W_t}{\sqrt{2t \log\log(1/t)}} = -1,
\]
\[
\limsup_{t \to \infty} \frac{W_t}{\sqrt{2t \log\log t}} = 1, \qquad
\liminf_{t \to \infty} \frac{W_t}{\sqrt{2t \log\log t}} = -1.
\]

Proof. It suffices to prove the first statement, since the second statement follows by applying the first one to the BM −W. The third and fourth follow by applying the first two to the BM tW1/t (see Theorem 1.4.4).

Put h(t) = (2t log log 1/t)1/2 and choose two numbers θ, δ ∈ (0, 1). We put
\[
\alpha_n = (1+\delta)\,\theta^{-n} h(\theta^n), \qquad \beta_n = \tfrac{1}{2}\,h(\theta^n).
\]
Using the submartingale inequality as in the proof of Theorem 2.4.3 we find that
\[
\mathbb{P}\Big(\sup_{s \le 1}\big(W_s - \alpha_n s/2\big) \ge \beta_n\Big) \le e^{-\alpha_n \beta_n} \le K n^{-1-\delta}
\]
for some constant K > 0 that does not depend on n. Hence, by the Borel–Cantelli lemma, almost surely
\[
\sup_{s \le 1}\big(W_s - \alpha_n s/2\big) \le \beta_n
\]
for all n large enough. In particular, it holds for all n large enough and s ∈ [0, θn−1] that
\[
W_s \le \frac{\alpha_n s}{2} + \beta_n \le \frac{\alpha_n \theta^{n-1}}{2} + \beta_n = \Big(\frac{1+\delta}{2\theta} + \frac{1}{2}\Big) h(\theta^n).
\]
Since h is increasing in a neighbourhood of 0, it follows that for all n large enough,
\[
\sup_{\theta^n \le s \le \theta^{n-1}} \frac{W_s}{h(s)} \le \frac{1+\delta}{2\theta} + \frac{1}{2},
\]
which implies that
\[
\limsup_{t \downarrow 0} \frac{W_t}{h(t)} \le \frac{1+\delta}{2\theta} + \frac{1}{2}.
\]
Now let θ ↑ 1 and δ ↓ 0 to find that
\[
\limsup_{t \downarrow 0} \frac{W_t}{h(t)} \le 1.
\]
To prove the reverse inequality, choose θ ∈ (0, 1) and consider the events
\[
A_n = \Big\{ W_{\theta^n} - W_{\theta^{n+1}} \ge \big(1 - \sqrt{\theta}\big)\, h(\theta^n) \Big\}.
\]
By the independence of the increments of the BM the events An are independent. Moreover, it holds that
\[
\mathbb{P}(A_n) = \frac{1}{\sqrt{2\pi}} \int_a^\infty e^{-x^2/2}\,dx
\]
with a = (1 − √θ)(2(1 − θ)−1 log log θ−n)1/2. By Lemma 2.4.4, it follows that
\[
\sqrt{2\pi}\,\mathbb{P}(A_n) \ge \frac{a}{1+a^2}\,e^{-a^2/2}.
\]
It is easily seen that the right-hand side is of order n−α with
\[
\alpha = \frac{(1-\sqrt{\theta})^2}{1-\theta} < 1.
\]
It follows that Σ P(An) = ∞, so by the Borel–Cantelli lemma,
\[
W_{\theta^n} \ge \big(1 - \sqrt{\theta}\big)\, h(\theta^n) + W_{\theta^{n+1}}
\]
for infinitely many n. Since −W is also a BM, the first part of the proof implies that −Wθn+1 ≤ 2h(θn+1) for all n large enough. Combined with the preceding inequality and the fact that h(θn+1) ≤ 2√θ h(θn), this implies that
\[
W_{\theta^n} \ge \big(1-\sqrt{\theta}\big)\, h(\theta^n) - 2h(\theta^{n+1}) \ge h(\theta^n)\big(1 - 5\sqrt{\theta}\big)
\]
for infinitely many n. Hence,
\[
\limsup_{t \downarrow 0} \frac{W_t}{h(t)} \ge 1 - 5\sqrt{\theta},
\]
and the proof is completed by letting θ tend to zero.

As a corollary we have the following result regarding the zero set of the BM that was considered in Exercise 26 of Chapter 1.

Corollary 2.4.6. The point 0 is an accumulation point of the zero set of the BM, i.e. for every ε > 0, the BM visits 0 infinitely often in the time interval [0, ε).

Proof. By the law of the iterated logarithm, there exist sequences tn and sn converging to 0 such that
\[
\frac{W_{t_n}}{\sqrt{2 t_n \log\log(1/t_n)}} \to 1
\quad\text{and}\quad
\frac{W_{s_n}}{\sqrt{2 s_n \log\log(1/s_n)}} \to -1.
\]
In particular, Wtn > 0 and Wsn < 0 for all n large enough, so W changes sign infinitely often near 0. The corollary follows by the continuity of the Brownian sample paths.

2.4.4 Distribution of hitting times

Let W be a standard BM and for a > 0, let τa be the (a.s. finite) hitting time of the level a, cf. Example 1.6.9.

Theorem 2.4.7. For a > 0, the Laplace transform of the hitting time τa is given by
\[
\mathbb{E}e^{-\lambda \tau_a} = e^{-a\sqrt{2\lambda}}, \qquad \lambda \ge 0.
\]

Proof. For b ≥ 0, consider the exponential martingale Mt = exp(bWt − b²t/2) (see Example 2.1.4). The stopped process Mτa is again a martingale (see Corollary 2.3.14) and is bounded by exp(ab). A bounded martingale is uniformly integrable, hence, by the optional stopping theorem,
\[
\mathbb{E}M_{\tau_a} = \mathbb{E}M^{\tau_a}_\infty = \mathbb{E}M^{\tau_a}_0 = 1.
\]
Since Wτa = a, it follows that
\[
\mathbb{E}e^{ab - b^2 \tau_a/2} = 1.
\]
The expression for the Laplace transform follows by substituting b² = 2λ.

We will see later that τa has the density
\[
x \mapsto \frac{a\,e^{-a^2/2x}}{\sqrt{2\pi x^3}}\,1_{\{x > 0\}}.
\]
This can be shown for instance by inverting the Laplace transform of τa. We will however use an alternative method in the next chapter.
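The Laplace transform of Theorem 2.4.7 can also be checked by simulation. The sketch below (the values of a, λ, the time step and the horizon are arbitrary choices) estimates E e−λτa from discretized Brownian paths; the time-discretization makes the detected hitting time slightly too large, so the estimate comes out a little below the exact value:

```python
import numpy as np

rng = np.random.default_rng(1)
a, lam = 1.0, 0.5                      # level to hit and Laplace argument
dt, T, n_paths = 4e-3, 20.0, 2000      # time step, horizon, number of paths
n_steps = int(T / dt)

# Simulate Brownian paths on [0, T] and record the first grid time at which
# each path reaches the level a; paths that never do get tau = inf.
W = np.cumsum(rng.normal(0.0, np.sqrt(dt), (n_paths, n_steps)), axis=1)
hit = W >= a
tau = np.where(hit.any(axis=1), (hit.argmax(axis=1) + 1) * dt, np.inf)

estimate = np.exp(-lam * tau).mean()   # exp(-lam * inf) evaluates to 0
print(estimate)                        # close to (slightly below) the exact value
print(np.exp(-a * np.sqrt(2 * lam)))   # exact value exp(-1) ~ 0.3679
```

Paths that have not reached a by the horizon T are assigned τa = ∞; since e−λτa ≤ e−λT ≈ 5·10−5 on this event, the truncation error is negligible compared to the Monte Carlo error.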
At this point we only prove that although the hitting times τa are finite almost surely, we have Eτa = ∞ for every a > 0. A process with this property is called null recurrent. Corollary 2.4.8. For every a > 0 it holds that Eτa = ∞. 2.4 Applications to Brownian motion 47 Proof. Denote the distribution function of τa by F . By integration by parts we have for every λ > 0 Z ∞ Z ∞ F (x) de−λx e−λx dF (x) = − Ee−λτa = 0 0 Combined with the fact that −1 = Z ∞ de−λx 0 it follows that 1 − Ee−λτa 1 =− λ λ Z ∞ 0 (1 − F (x)) de−λx = Z ∞ 0 (1 − F (x))e−λx dx. Now suppose that Eτa < ∞. Then by the dominated convergence theorem, the right-hand side converges to Eτa as λ → 0. In particular, lim λ↓0 1 − Ee−λτa λ is finite. However, the preceding theorem shows that this is not the case. 48 Martingales 2.5 Exercises 1. Prove the assertion in Example 2.1.2. 2. Prove the assertion in Example 2.1.3. 3. Show that the processes defined in Example 2.1.4 are indeed martingales. 4. Let X1 , X2 , . . . be i.i.d. random variables and consider the tail σ-algebra defined by ∞ \ T = σ(Xn , Xn+1 , . . .). n=1 (a) Show that for every n, T is independent of the σ-algebra σ(X1 , . . . , Xn ) and conclude that for every A ∈ T P(A) = E(1A | X1 , X2 , . . . , Xn ) almost surely. (b) Give a “martingale proof” of Kolmogorov’s 0–1 law : for every A ∈ T , P(A) = 0 or P(A) = 1. 5. In this exercise we present a “martingale proof” of the law of large numbers. Let X1 , X2 , . . . be i.i.d. random variables with E|X1 | < ∞. Define Sn = X1 + · · · + Xn and Fn = σ(Sn , Sn+1 , . . .). (a) Note that for i = 1, . . . n, the distribution of the pair (Xi , Sn ) is independent of i. From this fact, deduce that E(Xn | Fn ) = Sn /n and, consequently, 1 1 Sn Fn+1 = Sn+1 E n n+1 almost surely. (b) Show that Sn /n converges almost surely to a finite limit. (c) Derive from Kolmogorov’s 0–1 law that the limit must be a constant and determine its value. 6. Consider the proof of Theorem 2.2.17. 
Prove that for the stopping time τ and the event A ∈ Fτ it holds that A ∩ {τ < ∞} ∈ G.

7. Prove Corollary 2.3.3.

8. Let X and Y be two integrable random variables, F a σ-algebra and suppose that Y is F-measurable. Show that if E(X | F) ≤ Y a.s. and EX = EY, then E(X | F) = Y a.s.

9. Show that the process M̃ constructed in the proof of Theorem 2.3.5 is cadlag.

10. Let M be a supermartingale. Prove that if M has a right-continuous modification, then t ↦ EMt is right-continuous.

11. Show that for every a ≠ 0, the exponential martingale of Example 2.1.4 converges to 0 a.s. as t → ∞. (Hint: use for instance the recurrence of the Brownian motion.) Conclude that these martingales are not uniformly integrable.

12. Give an example of two stopping times σ ≤ τ and a martingale M that is bounded in L1 but not uniformly integrable, for which the equality E(Mτ | Fσ) = Mσ fails. (Hint: see Exercise 11.)

13. Let M be a positive, continuous martingale that converges a.s. to zero as t tends to infinity.
(a) Prove that for every x > 0,
\[
\mathbb{P}\Big(\sup_{t \ge 0} M_t > x \,\Big|\, \mathcal{F}_0\Big) = 1 \wedge \frac{M_0}{x}
\]
almost surely. (Hint: stop the martingale when it gets above the level x.)
(b) Let W be a standard BM. Using the exponential martingales of Example 2.1.4, show that for every a > 0, the random variable supt≥0 (Wt − at/2) has an exponential distribution with parameter a.

14. Let W be a BM and for a ∈ R, let τa be the first time that W hits a. Suppose that a > 0 > b. By considering the stopped martingale Wτa∧τb, show that
\[
\mathbb{P}(\tau_a < \tau_b) = \frac{-b}{a-b}.
\]

15. Consider the setup of the preceding exercise. By stopping the martingale Wt² − t at an appropriate stopping time, show that E(τa ∧ τb) = −ab. Deduce that Eτa = ∞.

16. Let W be a BM and for a > 0, let τ̃a be the first time that the process |W| hits the level a.
(a) Show that for every b > 0, the process Mt = cosh(b|Wt|) exp(−b²t/2) is a martingale.
(b) Find the Laplace transform of the stopping time τ̃a.
(c) Calculate Eτ̃a.
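As a numerical companion to Exercises 14 and 15, the identity P(τa < τb) = −b/(a − b) can be checked with a symmetric random walk approximation of the BM (a sketch; since both barriers are taken to be multiples of the step size h, the walk exits the interval exactly at a or b and, by the same optional stopping argument, has the same exit probability as the BM):

```python
import numpy as np

rng = np.random.default_rng(2)
a, b, h = 1.0, -2.0, 0.1      # barriers a > 0 > b, both multiples of the step h
n_walks = 10_000

pos = np.zeros(n_walks)
active = np.ones(n_walks, dtype=bool)
hit_a = np.zeros(n_walks, dtype=bool)

# Advance all still-active walks by one +-h step until each is absorbed.
while active.any():
    pos[active] += rng.choice([-h, h], size=active.sum())
    hit_a |= active & (pos >= a - 1e-9)
    active &= (pos < a - 1e-9) & (pos > b + 1e-9)

print(hit_a.mean())           # close to -b / (a - b) = 2/3
```

The same simulation also illustrates Exercise 15: with each step counted as h² units of time, the average absorption time is close to −ab.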
3 Markov processes

3.1 Basic definitions

We begin with the following central definition.

Definition 3.1.1. Let (E, E) be a measurable space. A transition kernel on E is a map P : E × E → [0, 1] such that

(i) for every x ∈ E, the map B ↦ P(x, B) is a probability measure on (E, E),

(ii) for every B ∈ E, the map x ↦ P(x, B) is measurable.

A transition kernel provides a mechanism for a random motion in the space E which can be described as follows. Say that at time zero, one starts at a point x0 ∈ E. Then at time 1, the next position x1 is chosen according to the probability measure P(x0, dx). At time 2 the point x2 is chosen according to P(x1, dx), etc. This motion is clearly Markovian, in the sense that at every point in time, the next position does not depend on the entire past, but only on the current position. Roughly speaking, a Markov process is a continuous-time process with this property.

Suppose that we have such a process X for which, for every t ≥ 0, there exists a transition kernel Pt such that
\[
\mathbb{P}(X_{t+s} \in B \mid \mathcal{F}^X_s) = P_t(X_s, B) \quad \text{a.s.}
\]
for all s ≥ 0. Then it is necessary that the transition kernels (Pt)t≥0 satisfy a certain consistency relation. Indeed, by a standard approximation argument, we have
\[
\mathbb{E}\big(f(X_{t+s}) \mid \mathcal{F}^X_s\big) = \int f(x)\, P_t(X_s, dx) \quad \text{a.s.}
\]
for every nonnegative, measurable function f. It follows that for s, t ≥ 0,
\[
\begin{aligned}
P_{t+s}(X_0, B) &= \mathbb{P}(X_{t+s} \in B \mid \mathcal{F}^X_0)
= \mathbb{E}\big(\mathbb{P}(X_{t+s} \in B \mid \mathcal{F}^X_s) \mid \mathcal{F}^X_0\big) \\
&= \mathbb{E}\big(P_t(X_s, B) \mid \mathcal{F}^X_0\big)
= \int P_t(y, B)\, P_s(X_0, dy)
\end{aligned}
\]
almost surely. This motivates the following definition.

Definition 3.1.2. Let (E, E) be a measurable space. A collection of transition kernels (Pt)t≥0 is called a (homogenous) transition function if for all s, t ≥ 0, x ∈ E and B ∈ E,
\[
P_{t+s}(x, B) = \int P_s(x, dy)\, P_t(y, B).
\]
This relation is known as the Chapman-Kolmogorov equation.

Integrals of the form ∫ f dν are often written in operator notation as νf. It is useful to introduce a similar notation for transition kernels.
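The Chapman–Kolmogorov equation can be made concrete with the Gaussian kernel p(t, x, y) = (2πt)−1/2 exp(−(x − y)²/2t) of Brownian motion (computed in Example 3.1.5 below). The following sketch discretizes the kernels on a grid and verifies the equation numerically; the grid bounds, the mesh and the times s, t are arbitrary choices, and the comparison is restricted to interior points, where truncating the intermediate integral to the grid loses only negligible Gaussian mass:

```python
import numpy as np

def p(t, x, y):
    """Gaussian transition density p(t, x, y) of Brownian motion."""
    return np.exp(-((x - y) ** 2) / (2 * t)) / np.sqrt(2 * np.pi * t)

# Discretize the kernels on a grid wide enough to carry essentially all mass.
grid, dx = np.linspace(-12.0, 12.0, 1201, retstep=True)
X, Y = np.meshgrid(grid, grid, indexing="ij")

s, t = 0.7, 1.3
# Chapman-Kolmogorov: integral of p(s, x, z) p(t, z, y) dz = p(s + t, x, y).
lhs = p(s, X, Y) @ p(t, X, Y) * dx    # matrix product sums over the grid in z
rhs = p(s + t, X, Y)

# Compare on interior points only: near the boundary the truncated z-integral
# misses Gaussian mass.
interior = np.abs(grid) <= 5.0
err = np.max(np.abs(lhs - rhs)[np.ix_(interior, interior)])
print(err)  # essentially zero (far below 1e-8)
```

On the interior block the error is at the level of rounding noise, reflecting the fact that the Gaussian family satisfies the semigroup property exactly.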
If P (x, dy) is a transition kernel on a measurable space (E, E) and f is a nonnegative, measurable function on E, we define the function P f by Z P f (x) = f (y)P (x, dy). For B ∈ E we have P 1B (x) = P (x, B), so P 1B is a measurable function. By a standard approximation argument it is easily seen that for every nonnegative, measurable function f , the function P f is measurable as well. Moreover, we see that if f is bounded, then P f is also a bounded function. Hence, P can be viewed a linear operator on the space of bounded, measurable functions on E. Note that in this operator notation, the Chapman-Kolmogorov equation states that for a transition function (Pt )t≥0 it holds that Pt+s f = Pt (Ps f ) = Ps (Pt f ) for every bounded, measurable function f and s, t ≥ 0. In other words, the operators (Pt )t≥0 form a semigroup of operators on the space of bounded, measurable functions on E. In the sequel we will not distinguish between this semigroup and the corresponding (homogenous) transition function on (E, E), since there is a one-to-one relation between the two concepts. We can now give the definition of a Markov process. Definition 3.1.3. Let (Ω, F, (Ft ), P) be a filtered probability space, (E, E) a measurable space and (Pt ) a transition function on E. An adapted process X is called a (homogenous) Markov process with respect to the filtration (Ft ) if for all s, t ≥ 0 and every nonnegative, measurable function f on E, E(f (Xt+s ) | Fs ) = Pt f (Xs ) P-a.s. The distribution of X0 under P is called the initial distribution of the process, E is called the state space. 3.1 Basic definitions 53 The following lemma will be useful in the next section. It shows in particular that the fdd’s of a Markov process are determined by the transition function and the initial distribution. Lemma 3.1.4. 
A process X is Markov with respect to its natural filtration with transition function (Pt ) and initial distribution ν if and only if for all 0 = t0 < t1 < · · · < tn and nonnegative, measurable functions f0 , f1 , . . . , fn , E n Y i=0 fi (Xti ) = νf0 Pt1 −t0 f1 · · · Ptn −tn−1 fn . Proof. Exercise 1. The Brownian motion is a Markov process. In the following example we calculate its transition function. Example 3.1.5. Let W be a standard BM and let (Ft ) be its natural filtration. For s, t ≥ 0 and a nonnegative, measurable function f , consider the conditional expectation E(f (Wt+s ) | Fs ). Given Fs , the random variable Wt+s = (Wt+s − Ws ) + Ws has a normal distribution with mean Ws and variance t . It follows (check!) that a.s. Z 2 1 (Ws −y) 1 t √ E(f (Wt+s ) | Fs ) = dy = Pt f (Ws ), f (y)e− 2 2πt R where the transition kernel Pt on R is defined by Z f (y)p(t, x, y)dy, Pt f (x) = R with 2 1 − 1 (x−y) e 2 t . 2πt Hence, the BM is a Markov process with respect to its natural filtration. p(t, x, y) = √ In general it is not true that a function of a Markov process is again Markovian. The following lemma gives a sufficient condition under which this is the case. Lemma 3.1.6. Let X be a Markov process with state space (E, E) and transition function (Pt ). Suppose that (E 0 , E 0 ) is a measurable space and let ϕ : E → E 0 be measurable and onto. If (Qt ) is a collection of transition kernels such that Pt (f ◦ ϕ) = (Qt f ) ◦ ϕ for all bounded, measurable functions f on E 0 , then Y = ϕ(X) is a Markov process with respect to its natural filtration, with state space (E 0 , E 0 ) and transition function (Qt ). 54 Markov processes Proof. Let f be a bounded, measurable function on E 0 . Then by the assumption and the semigroup property of (Pt ), (Qt Qs f ) ◦ ϕ = Pt ((Qs f ) ◦ ϕ) = Pt Ps (f ◦ ϕ) = Pt+s (f ◦ ϕ) = (Qt+s f ) ◦ ϕ. Since ϕ is onto, this implies that (Qt ) is a semigroup. 
Using the preceding lemma and the assumption it is easily verified that Y has the Markov property (see Exercise 2). 3.2 Existence of a canonical version In this section we show that for a given transition function (Pt ) and probability measure ν on a measurable space (E, E), we can construct a so-called canonical Markov process X which has ν as initial distribution and (Pt ) as its transition function. An E-valued process can be viewed as random element of the space E R+ of E-valued functions on R+ . Recall that a subset A ⊆ E R+ is called a cylinder if it is of the form A = {f ∈ E R+ : f (t1 ) ∈ E1 , . . . , f (tn ) ∈ En } for certain t1 , . . . , tn ≥ 0 and E1 , . . . , En ∈ E. The product-σ-algebra E R+ on E R+ is defined as the smallest σ-algebra which contains all cylinders. Equivalently, it is the smallest σ-algebra which makes all projections E R+ → R given by f 7→ f (t) measurable. Now we define Ω = E R+ and F = E R+ and on (Ω, F) we consider the process X = (Xt )t≥0 defined by Xt (ω) = ω(t), ω ∈ Ω. Viewed as a map from Ω → E R+ , X is simply the identity map. In particular, X is a measurable map (Ω, F) → (E R+ , E R+ ) and it follows that for every t ≥ 0, Xt is a measurable map (Ω, F) → (E, E). Hence, X is an E-valued stochastic process on (Ω, F) in the sense of Definition 1.1.1. Note however that we have not yet defined a probability measure on (Ω, F). Definition 3.2.1. The process X is called the canonical process on (E R+ , E R+ ). In Chapter 1 we presented (without proof) Kolmogorov’s consistency theorem (see Theorem 1.2.3). It states that for every consistent collection of probability measures (cf. Definition 1.2.2) there exists a certain probability space and a process on this space which has these probability measures as its fdd’s. At this point we need the following refinement of this result. 3.2 Existence of a canonical version 55 Theorem 3.2.2 (Kolmogorov’s consistency theorem). Suppose that E is a Polish space and E is its Borel σ-algebra. 
For all t1 , . . . , tn ≥ 0, let µt1 ,...,tn be a probability measure on (E n , E n ). If the measures µt1 ,...,tn form a consistent system, then there exists a probability measure P on the measurable space (E R+ , E R+ ) such that under P, the canonical process X on (E R+ , E R+ ) has the measures µt1 ,...,tn as its fdd’s. Proof. See for instance Billingsley (1995). From this point on we assume that (E, E) is a Polish space, endowed with its Borel σ-algebra. We have the following existence result for Markov processes with a given transition function and initial distribution. Corollary 3.2.3. Let (Pt ) be a transition function on (E, E) and let ν be a probability measure on (E, E). Then there exists a unique probability measure Pν on (Ω, F) = (E R+ , E R+ ) such that under Pν , the canonical process X is a Markov process with respect to its natural filtration (FtX ), with initial distribution ν. Proof. For all 0 = t0 ≤ t1 ≤ · · · ≤ tn we define the probability measure on (E n+1 , E n+1 ) by µt0 ,t1 ,...,tn (A0 × A1 × · · · × An ) = ν1A0 Pt1 −t0 1A1 · · · Ptn −tn−1 1An . For arbitrary t1 , . . . , tn ≥ 0, let π be a permutation of {1, . . . , n} such that tπ(1) ≤ · · · ≤ tπ(n) and define µt1 ,...,tn (A1 × · · · × An ) = µtπ(1) ,...,tπ(n) (Aπ(1) × · · · × Aπ(n) ). Then by construction, the probability measures µt1 ,...,tn satisfy condition (i) of Definition (1.2.2). By the Chapman-Kolmogorov equation we have Ps 1E Pt = Ps+t for all s, t ≥ 0. From this fact it follows that condition (ii) is also satisfied, so the measures µt1 ,...,tn form a consistent system (see Exercise 5). Hence, by Kolmogorov’s consistency theorem there exists a probability measure P ν on (Ω, F) = (E R+ , E R+ ) such that under Pν , the canonical process X has the measures µt1 ,...,tn as its fdd’s. In particular, we have for 0 = t0 < t1 < · · · < tn and A0 , A1 , . . . , An ∈ E Pν (Xt0 ∈ A0 , . . . , Xtn ∈ An ) = µt0 ,t1 ,...,tn (A0 × A1 × · · · × An ) = ν1A0 Pt1 −t0 1A1 · · · Ptn −tn−1 1An . 
By Lemma 3.1.4 this implies that X is Markov w.r.t. its natural filtration. Consider a given transition function (Pt ) on (E, E) again. For x ∈ E, let δx be the Dirac measure concentrated at the point x. By Corollary 1.3.4 there exists a probability measure Pδx such that under this measure, the canonical process X has δx as initial distribution. In the remainder of these notes, we simply write Px instead of Pδx and the corresponding expectation is denoted by 56 Markov processes Ex . Since Px (X0 = x) = 1 we say that under Px , the process X starts in x. Note also that for all x ∈ E and A ∈ E, Z Px (Xt ∈ A) = δx (y)Pt (y, A) = Pt (x, A). In particular, the map x 7→ Px (Xt ∈ A) is measurable for every A ∈ E. This is generalized in the following lemma. X Lemma 3.2.4. Let Z be an F∞ -measurable random variable, nonnegative or bounded. Then the map x 7→ Ex Z is measurable and Z Eν Z = ν(dx)Ex Z, for every initial distribution ν. Proof. It is easily seen that the class Z X S = Γ ∈ F∞ : x 7→ Ex 1Γ is measurable and Eν 1Γ = ν(dx)Ex 1Γ X X be the class of rectangles of . Let G ⊆ F∞ is a monotone class of subsets of F∞ the form {Xt1 ∈ A1 , . . . , Xtn ∈ An }. Using Lemma 3.1.4, it is easy to see that G ⊆ S. Since G is closed under finite intersections, it follows that X = σ(G) ⊆ S, F∞ X by the monotone class theorem (see the appendix). So for every Γ ∈ F∞ , the statement of the lemma is true for the random variable Z = 1Γ . By a standard approximation argument is follows that the statement is true for every X nonnegative or bounded F∞ -measurable random variable Z. (See also Exercise 6) For t ≥ 0 we define the translation operator θt : E R+ → E R+ by θt f (s) = f (t + s). So θt just cuts of the part of the path of f before time t and shifts the remaining part to the origin. Clearly, θt ◦ θs = θt+s and every θt is E R+ measurable. Using the translation operators, we can formulate the Markov property as follows. X -measurable random variable, nonnegative or Theorem 3.2.5. 
Let Z be an F∞ bounded. Then for every t > 0 and initial distribution ν, Eν (Z ◦ θt | FtX ) = EXt Z Pν -a.s. Before we turn to the proof, let us remark that the right-hand side of the display should be read as the evaluation at the (random) point Xt of the function x 7→ Ex Z. So by Lemma 3.2.4 it is a measurable function of Xt . 3.3 Feller processes 57 Proof. We have to show that for A ∈ FtX Z Z Z ◦ θt dPν = EXt Z dPν . A A By the usual approximation arguments it is enough to show this equality for A of the form A = {Xt0 ∈ A0 , . . . , Xtn ∈ An } with 0 = t0 < · · · < tn = t and Z of the form m Y fi (Xsi ), Z= i=1 for s1 ≤ · · · ≤ sm and certain nonnegative, measurable functions fi . In this case the left-hand side equals Eν n Y j=0 1Aj (Xtj ) m Y fi (Xt+si ). i=1 By Lemma 3.1.4, the right-hand side equals Eν n Y j=0 1Aj (Xtj ) Ps1 f1 Ps2 −s1 f2 · · · Psm −sm−1 fm (Xt ). Using Lemma 3.1.4 again we see that the two expressions are equal. 3.3 3.3.1 Feller processes Feller transition functions and resolvents In Section 3.1 we saw that a homogenous transition function (Pt )t≥0 on a measurable space (E, E) can be viewed as a semigroup of operators on the space of bounded, measurable functions on E. In this section we consider semigroups with additional properties. For simplicity, the state space E is assumed to be a subset of Rd , and E is its Borel-σ-algebra. By C0 = C0 (E) we denote the space of real-valued, continuous functions on E which vanish at infinity. In other words, a function f ∈ C0 is continuous on E and has the property that f (x) → 0 as kxk → ∞. Functions in C0 are bounded, so we can endow the space with the sup-norm, which is defined by kf k∞ = sup |f (x)| x∈E for f ∈ C0 . Note that C0 is a subset of the space of bounded, measurable functions on E, so we can consider the restriction of the transition operators P t to C0 . 58 Markov processes Definition 3.3.1. 
The transition function (Pt )t≥0 is called a Feller transition function if (i) Pt C0 ⊆ C0 for all t ≥ 0, (ii) for every f ∈ C0 and x ∈ E, Pt f (x) → f (x) as t ↓ 0. A Markov process with a Feller transition function is a called a Feller process. Observe that the operators Pt are contractions on C0 , i.e. for every f ∈ C0 , we have Z kPt f k∞ = sup |Pt f (x)| = sup f (y)Pt (x, dy) ≤ kf k∞ . x∈E x∈E E So for all t ≥ 0 we have kPt k ≤ 1, where kPt k is the norm of Pt as a linear operator on the normed linear space C0 , endowed with the supremum norm (see Appendix B for the precise definitions of these notions). If f ∈ C0 , then Pt f is also in C0 by part (i) of Definition 3.3.1. By the semigroup property and part (ii) it follows that Pt+h f (x) = Ph Pt f (x) → Pt f (x) as h ↓ 0. In other words, the map t 7→ Pt f (x) is right-continuous for all f ∈ C0 and x ∈ E. In particular we see that this map is measurable, so for all λ > 0 we may define Z ∞ Rλ f (x) = e−λt Pt f (x) dt. 0 If we view t 7→ Pt as an operator-valued map, then Rλ is simply its Laplace transform, calculated at the ‘frequency’ λ. The next theorem collects the most important properties of the operators Rλ . In particular, it states that for all λ > 0, Rλ is in fact an operator which maps C0 into itself. It is called the resolvent of order λ. Theorem 3.3.2. Let Rλ , λ > 0 be the resolvents of a Feller transition function. Then Rλ C0 ⊆ C0 for every λ > 0 and the resolvent equation Rµ − Rλ + (µ − λ)Rµ Rλ = 0 holds for all λ, µ > 0. Moreover, the image of Rλ does not depend on λ and is dense in C0 . Proof. Denote the transition function by Pt . For f ∈ C0 , we have Pt f ∈ C0 for every t ≥ 0. Hence, if xn → x in E, the function t 7→ Pt f (xn ) converges pointwise to t 7→ Pt f (x). By dominated convergence it follows that Z ∞ Z ∞ e−λt Pt f (x) dt = Rλ f (x), e−λt Pt f (xn ) dt → Rλ f (xn ) = 0 0 hence Rλ maps C0 into the space of continuous functions on E. 
Using the same reasoning, we see that if kxn k → ∞, then Rλ f (xn ) → 0. So indeed, Rλ C0 ⊆ C0 . 3.3 Feller processes 59 To prove the resolvent equation, note that Z t e(λ−µ)s ds. e−µt − e−λt = (λ − µ)e−λt 0 Hence, Z ∞ (e−µt − e−λt )Pt f (x) dt 0 Z t Z ∞ e(λ−µ)s Pt f (x) ds dt e−λt = (λ − µ) 0 Z0 ∞ Z ∞ −µs e−λ(t−s) Pt f (x) dt ds, e = (λ − µ) Rµ f (x) − Rλ f (x) = s 0 by Fubini’s theorem. A change of variables, the semigroup property of the transition function and another application of Fubini show that the inner integral equals Z ∞ Z ∞ −λu e−λu Ps Pu f (x) du e Ps+u f (x) du = 0 0 Z Z ∞ Pu f (y) Ps (x, dy) du = e−λu (3.1) E 0 Z Z ∞ −λu = e Pu f (y) du Ps (x, dy) E 0 = Ps Rλ f (x). Combined with the preceding display, this yields the desired equation. The resolvent equation implies that Rλ = Rµ (I + (µ − λ)Rλ ), which shows that the image of the map Rλ is contained in the image of Rµ . Since this also holds with the roles of λ and µ reversed, the image D of Rλ is independent of λ. To prove that D is dense in C0 , consider an arbitrary, bounded linear functional A on C0 that vanishes on D. By the Riesz representation theorem (Theorem B.2.1), there exist finite Borel measures ν and ν 0 on E such that Z Z Z 0 f d(ν − ν 0 ) f dν = f dν − A(f ) = E E E for every f ∈ C0 . Now fix f ∈ C0 . Then by part (ii) of Definition 3.3.1 and dominated convergence, it holds for every x ∈ E that Z ∞ Z ∞ −λt e−s Ps/λ f (x) ds → f (x) (3.2) λe Pt f (x) dt = λRλ f (x) = 0 0 as λ → ∞. Noting also that kλRλ f k∞ ≤ kf k∞ , dominated convergence implies that Z Z f (x) (ν − ν 0 )(dx) = A(f ). λRλ f (x) (ν − ν 0 )(dx) → 0 = A(λRλ f ) = E E We conclude that the functional A vanishes on the entire space C0 . By Corollary B.1.2 to the Hahn-Banach theorem, this shows that D is dense in C0 . 60 Markov processes Observe that for every f ∈ C0 , the resolvent Rλ satisfies Z ∞ Z ∞ kf k∞ . 
\[
\|R_\lambda f\|_\infty \le \int_0^\infty e^{-\lambda t} \|P_t f\|_\infty\, dt \le \|f\|_\infty \int_0^\infty e^{-\lambda t}\, dt = \frac{\|f\|_\infty}{\lambda}.
\]
Hence, for the linear transformation $R_\lambda : C_0 \to C_0$ we have $\|R_\lambda\| \le 1/\lambda$. Let us also note that in the proof of Theorem 3.3.2 we saw that
\[
P_t R_\lambda f(x) = \int_0^\infty e^{-\lambda u} P_{t+u} f(x)\, du. \tag{3.3}
\]
By the semigroup property of the transition function, the right-hand side of this equation equals $R_\lambda P_t f(x)$. In other words, the operators $P_t$ and $R_\lambda$ commute.

The following corollary to the theorem states that for Feller transition functions, the pointwise convergence of part (ii) of Definition 3.3.1 is actually equivalent to the seemingly stronger uniform convergence. This is called the strong continuity of a Feller semigroup. Similarly, the pointwise convergence in (3.2) is strengthened to uniform convergence.

Corollary 3.3.3. For a Feller transition function $P_t$ and its resolvents $R_\lambda$ it holds for every $f \in C_0$ that $\|P_t f - f\|_\infty \to 0$ as $t \downarrow 0$, and $\|\lambda R_\lambda f - f\|_\infty \to 0$ as $\lambda \to \infty$.

Proof. Relation (3.1) in the proof of Theorem 3.3.2 shows that for $f \in C_0$,
\[
P_t R_\lambda f(x) = e^{\lambda t} \int_t^\infty e^{-\lambda s} P_s f(x)\, ds.
\]
It follows that
\[
P_t R_\lambda f(x) - R_\lambda f(x) = (e^{\lambda t} - 1) \int_t^\infty e^{-\lambda s} P_s f(x)\, ds - \int_0^t e^{-\lambda s} P_s f(x)\, ds,
\]
hence
\[
\|P_t R_\lambda f - R_\lambda f\|_\infty \le (e^{\lambda t} - 1)\|R_\lambda f\|_\infty + t\|f\|_\infty.
\]
Since the right-hand side vanishes as $t \downarrow 0$, this proves the first statement of the corollary for functions in the joint image $D$ of the resolvents. Now let $f \in C_0$ be arbitrary. Then for every $g \in D$, it holds that
\[
\|P_t f - f\|_\infty \le \|P_t f - P_t g\|_\infty + \|P_t g - g\|_\infty + \|g - f\|_\infty \le \|P_t g - g\|_\infty + 2\|g - f\|_\infty,
\]
since $P_t$ is a contraction. Since $D$ is dense in $C_0$, the second term can be made arbitrarily small by choosing an appropriate function $g$. For that choice of $g \in D$, the first term vanishes as $t \downarrow 0$, by the first part of the proof. This completes the proof of the first statement of the corollary. The second statement follows easily from the first one by dominated convergence.

Example 3.3.4. The BM is a Feller process.
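For the heat semigroup of the BM, both convergence statements of Corollary 3.3.3 can be observed numerically. The sketch below is my own check, not from the notes: for $f(y) = e^{-y^2}$ the Gaussian convolution has the closed form $P_t f(x) = (1+2t)^{-1/2} e^{-x^2/(1+2t)}$, so $\|P_t f - f\|_\infty$ and $\lambda R_\lambda f(0)$ can be computed directly.

```python
import math
import numpy as np

def Pt_f(x, t):
    """Heat semigroup applied to f(y) = exp(-y^2):
    P_t f(x) = E f(x + sqrt(t) Z) = exp(-x^2 / (1 + 2t)) / sqrt(1 + 2t)."""
    return np.exp(-x ** 2 / (1 + 2 * t)) / np.sqrt(1 + 2 * t)

xs = np.linspace(-10.0, 10.0, 4001)
f = Pt_f(xs, 0.0)

# Strong continuity: ||P_t f - f||_inf -> 0 as t -> 0 (sup taken over the grid).
sups = [float(np.max(np.abs(Pt_f(xs, t) - f))) for t in (1.0, 0.1, 0.01, 0.001)]

def lam_R_lam_f0(lam, n=60001):
    """lam R_lam f(0) = lam * int_0^inf e^{-lam t} P_t f(0) dt, trapezoid rule."""
    ts = np.linspace(0.0, 60.0 / lam, n)   # e^{-lam t} is negligible beyond 60/lam
    vals = lam * np.exp(-lam * ts) * Pt_f(0.0, ts)
    return float(0.5 * np.sum((vals[1:] + vals[:-1]) * (ts[1] - ts[0])))

print(sups)                                            # decreasing towards 0
print([lam_R_lam_f0(l) for l in (1.0, 10.0, 100.0)])   # approaching f(0) = 1
```

For $\lambda = 1$ the quadrature can be compared with the exact value $R_1 f(0) = \int_0^\infty e^{-t}(1+2t)^{-1/2}\,dt = \sqrt{2\pi e}\,(1 - \Phi(1)) \approx 0.6557$ (the substitution $u = \sqrt{1+2t}$ reduces the integral to a Gaussian tail).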
Its resolvents are given by
\[
R_\lambda f(x) = \int_{\mathbb{R}} f(y)\, r_\lambda(x, y)\, dy, \qquad \text{where } r_\lambda(x, y) = \frac{e^{-\sqrt{2\lambda}\,|x - y|}}{\sqrt{2\lambda}}
\]
(see Exercise 7).

Lemma 3.1.6 gives conditions under which a function of a Markov process is again Markovian. The corresponding result for Feller processes is as follows.

Lemma 3.3.5. Let $X$ be a Feller process with state space $E$ and transition function $(P_t)$. Suppose that $\varphi : E \to E'$ is continuous and onto, and that $\|\varphi(x_n)\| \to \infty$ in $E'$ if and only if $\|x_n\| \to \infty$ in $E$. Then if $(Q_t)$ is a collection of transition kernels such that $P_t(f \circ \varphi) = (Q_t f) \circ \varphi$ for all $f \in C_0(E')$, the process $Y = \varphi(X)$ is Feller with respect to its natural filtration, with state space $E'$ and transition function $(Q_t)$.

Proof. By Lemma 3.1.6, we only have to show that the semigroup $(Q_t)$ is Feller. The assumptions on $\varphi$ imply that if $f \in C_0(E')$, then $f \circ \varphi \in C_0(E)$. The Feller property of $(Q_t)$ therefore follows from that of $(P_t)$ (Exercise 8).

3.3.2 Existence of a cadlag version

In this section we consider a Feller transition function $P_t$ on $(E, \mathcal{E})$, where $E \subseteq \mathbb{R}^d$ and $\mathcal{E}$ is the Borel $\sigma$-algebra of $E$. By Corollary 3.2.3 there exists for every probability measure $\nu$ on $(E, \mathcal{E})$ a probability measure $P_\nu$ on the canonical space $(\Omega, \mathcal{F}) = (E^{\mathbb{R}_+}, \mathcal{E}^{\mathbb{R}_+})$ such that under $P_\nu$, the canonical process $X$ is a Markov process with respect to the natural filtration $(\mathcal{F}_t^X)$, with transition function $(P_t)$ and initial distribution $\nu$. As in the preceding section we write $C_0$ instead of $C_0(E)$ and we denote the resolvent of order $\lambda > 0$ by $R_\lambda$.

Our first aim in this section is to prove that a Feller process always admits a cadlag modification. We need the following lemma, which will allow us to use the regularization results for supermartingales of the preceding chapter.

Lemma 3.3.6. For every $\lambda > 0$ and every nonnegative function $f \in C_0$, the process $e^{-\lambda t} R_\lambda f(X_t)$ is a $P_\nu$-supermartingale with respect to the filtration $(\mathcal{F}_t^X)$, for every initial distribution $\nu$.

Proof.
By the Markov property we have Eν (e−λt Rλ f (Xt ) | FsX ) = e−λt Pt−s Rλ f (Xs ) Pν -a.s. (see Theorem 3.2.5). Hence, to prove the statement of the lemma it suffices to show that e−λt Pt−s Rλ f (x) ≤ e−λs Rλ f (x) for every x ∈ E, which follows from a straight-forward calculation (see Exercise 9). We can now prove that a Feller process admits a cadlag modification. The proof relies on some topological facts that have been collected in Exercises 10– 12. Theorem 3.3.7. The canonical Feller process X admits a cadlag modification. More precisely, there exists a cadlag process Y on the canonical space (Ω, F) such that for all t ≥ 0 and every initial distribution ν on (E, E), we have Yt = Xt , Pν -a.s. Proof. Fix an arbitrary initial distribution ν on (E, E). Let H be a countable, dense subset of the space C0+ (E) of nonnegative functions in C0 (E). Then H separates the points of the one-point compactification E ∗ of E (see Exercise 10) and by the second statement of Corollary 3.3.3, the class H0 = {nRn h : h ∈ H, n ∈ N} has the same property. Lemma 3.3.6 and Theorem 2.3.2 imply that Pν -a.s., the limit lim h(Xr ) r↓t r∈Q exists for all h ∈ H0 and t ≥ 0. In view of Exercise 11, it follows that Pν -a.s., the limit lim Xr r↓t r∈Q exists in E ∗ , for all t ≥ 0. So if Ω0 ⊆ Ω is the event on which these limits exist, it holds that Pν (Ω0 ) = 1 for every initial distribution ν. Now fix an arbitrary point x0 ∈ E and define a new process Y as follows. For ω 6∈ Ω0 , we put Yt (ω) = x0 for all t ≥ 0. For ω ∈ Ω0 and t ≥ 0, define Yt (ω) = lim Xr (ω). r↓t r∈Q We claim that for every initial distribution ν and t ≥ 0, we have Yt = Xt , Pν -a.s. To prove this, let f and g be two bounded, continuous functions on E ∗ . Then by dominated convergence and the Markov property, Eν f (Xt )g(Yt ) = lim Eν f (Xt )g(Xr ) r↓t r∈Q = lim Eν Eν (f (Xt )g(Xr ) | FtX ) r↓t r∈Q = lim Eν f (Xt )Pr−t g(Xt ). r↓t r∈Q 3.3 Feller processes 63 By Corollary 3.3.3, we Pν -a.s. 
have Pr−t g(Xt ) → g(Xt ) as r ↓ t. By dominated convergence, it follows that Eν f (Xt )g(Yt ) = Eν f (Xt )g(Xt ). Hence, by Exercise 12 we indeed have Yt = Xt , Pν -a.s. The process Y is right-continuous by construction, and we have shown that Y is a modification of X. It remains to prove that for every initial distribution ν, Y has left limits with Pν -probability 1. To this end, note that for all h ∈ H0 , the process h(Y ) is a right-continuous martingale. By Corollary 2.3.3 this implies that h(Y ) has left limits with Pν -probability one. In view of Exercise 11, it follows that Y has left limits with Pν -probability one. 3.3.3 Existence of a good filtration Let X be the canonical, cadlag version of a Feller process with state space E ⊆ Rd and transition function Pt (see Theorem 3.3.7). So far we have only been working with the natural filtration (FtX ). In general, this filtration is neither complete nor right-continuous. We would like to replace it by a larger filtration that satisfies the usual conditions (see Definition 1.6.3), and with respect to which the process X is still a Markov process. We first construct a new filtration for every fixed initial distribution ν. ν X ν with respect to Pν . This means that F∞ be the completion of F∞ We let F∞ X consists of the sets B for which there exist B1 , B2 ∈ F∞ such that B1 ⊆ B ⊆ B2 and Pν (B2 \B1 ) = 0. The Pν -probability of such a set B is defined as Pν (B) = ν . By N ν we denote Pν (B1 ) = Pν (B2 ), thus extending the definition of Pν to F∞ ν ν the Pν -negligible sets in F∞ , i.e. the sets B ∈ F∞ for which Pν (B) = 0. The filtration (Ftν ) is then defined by Ftν = σ FtX ∪ N ν . We define the filtration (Ft ) by putting \ Ftν , Ft = ν where the intersection is taken over all probability measures on the state space (E, E). We call (Ft ) the usual augmentation of the natural filtration (FtX ). It turns out that by just adding null sets, we have made the filtration right-continuous. Theorem 3.3.8. 
The filtrations (Ftν ) and (Ft ) are right-continuous. Proof. It is easily seen that the right-continuity of (Ftν ) implies the rightcontinuity of (Ft ), so it suffices to prove the former. Let us first show that for X -measurable random variable Z, every nonnegative, F∞ ν Eν (Z | Ftν ) = Eν (Z | Ft+ ) Pν -a.s. (3.4) 64 Markov processes By a monotone class argument, it is enough to prove this equality for Z of the form n Y fi (Xti ), Z= i=1 with t1 < · · · < tn and fi ∈ C0 for i = 1, . . . , n. Now suppose that ν X tk−1 ≤ t < tk . Since Ft+h differs only from Ft+h by Pν -negligible sets we X ν ), Pν -a.s. Hence, the Markov property implies ) = Eν (Z | Ft+h have Eν (Z | Ft+h that for h small enough, ν Eν (Z | Ft+h )= k−1 Y fi (Xti )Eν i=1 n Y i=k = gh (Xt+h ) k−1 Y X fi (Xti ) | Ft+h fi (Xti ) i=1 Pν -a.s., where gh (x) = Ptk −(t+h) fk Ptk+1 −tk fk+1 · · · Ptn −tn−1 fn (x). By the strong continuity of the semigroup Pt (Corollary 3.3.3) we have kgh − gk∞ → 0 as h ↓ 0, where g(x) = Ptk −t fk Ptk+1 −tk fk+1 · · · Ptn −tn−1 fn (x). Moreover, the right-continuity of X implies that Xt+h → Xt , Pν -a.s., so as h ↓ 0, we Pν -a.s. have ν Eν (Z | Ft+h ) → g(Xt ) k−1 Y i=1 fi (Xti ) = Eν (Z | Ftν ). On the other hand, by Theorem 2.2.15, the conditional expectation on the leftν ) as h ↓ 0, which completes the proof hand side converges Pν -a.s. to Eν (Z | Ft+ of (3.4). ν ν ν Now suppose that B ∈ Ft+ . Then 1B is F∞ -measurable and since F∞ is X X the Pν -completion of F∞ , there exists an F∞ - measurable random variable Z such that Pν -a.s., 1B = Z. By (3.4), it follows that we Pν -a.s. have ν ν ) = Eν (Z | Ftν ). ) = Eν (Z | Ft+ 1B = Eν (1B | Ft+ This shows that the random variable 1B is Ftν -measurable, so B ∈ Ftν . Our next aim is to prove that the statement of Theorem 3.2.5 remains true if we replace the natural filtration (FtX ) by its usual augmentation (Ft ). In particular, we want to show that X is still a Markov process with respect to (Ft ). 
But first we have to address some measurability issues. We begin by considering the completion of the Borel-σ-algebra E = B(E) of the state space E. If µ is a probability measure on (E, E), we denote by E µ the completion of E w.r.t. µ. Next, we define \ E µ, E∗ = µ 3.3 Feller processes 65 where the intersection is taken over all probability measures on (E, E). The σ-algebra E ∗ is called the σ-algebra of universally measurable sets. Lemma 3.3.9. If Z is a bounded, F∞ -measurable random variable, then the map x 7→ Ex Z is E ∗ -measurable and Z Eν Z = ν(dx)Ex Z, for every initial distribution ν. ν . By definition of Proof. Fix the initial distribution ν and note that F∞ ⊆ F∞ ν X F∞ there exist two F∞ random variables Z1 , Z2 such that Z1 ≤ Z ≤ Z2 and Eν (Z2 − Z1 ) = 0. It follows that for all x ∈ E, we have Ex Z1 ≤ Ex Z ≤ Ex Z2 . Moreover, the maps x 7→ Ex Zi are E-measurable by Lemma 3.2.4 and Z (Ex Z2 − Ex Z1 ) ν(dx) = Eν (Z2 − Z1 ) = 0. By definition of E ν this shows that x 7→ Ex Z is E ν -measurable and that Z Z Eν Z = Eν Z1 = ν(dx)Ex Z1 = ν(dx)Ex Z. Since ν is arbitrary, it follows that x 7→ Ex Z is in fact E ∗ -measurable. Lemma 3.3.10. For all t ≥ 0, the random variable Xt is measurable as a map from (Ω, Ft ) to (E, E ∗ ). Proof. Take A ∈ E ∗ and fix an initial distribution ν on (E, E). Denote the distribution of Xt on (E, E) under Pν by µ. Since E ∗ ⊆ E µ , there exist A1 , A2 ∈ E such that A1 ⊆ A ⊆ A2 and µ(A2 \A1 ) = 0. Consequently, Xt−1 (A1 ) ⊆ Xt−1 (A) ⊆ Xt−1 (A2 ). Since Xt−1 (A1 ), Xt−1 (A2 ) ∈ FtX and Pν (Xt−1 (A2 )\Xt−1 (A1 )) = Pν (Xt−1 (A2 \A1 )) = µ(A2 \A1 ) = 0, this shows that Xt−1 (A) is contained in the Pν -completion of FtX . But ν is arbitrary, so the proof is complete. We can now prove that formulation of the Markov property in terms of the shift operators θt is still valid for the usual augmentation (Ft ) of the natural filtration of our Feller process. Theorem 3.3.11 (Markov property). 
Let Z be an F∞ -measurable random variable, nonnegative or bounded. Then for every t > 0 and initial distribution ν, Pν -a.s. Eν (Z ◦ θt | Ft ) = EXt Z In particular, X is still a Markov process with respect to (Ft ). 66 Markov processes Proof. By combining the two preceding lemmas we see that the random variable EXt Z is Ft -measurable, so we only have to prove that for any A ∈ Ft , Z Z Z ◦ θt dPν = (3.5) EXt Z dPν . A A Assume that Z is bounded and denote the law of Xt under Pν by µ. By definition X of F∞ , there exists an F∞ - measurable random variable Z 0 such that {Z 6= 0 X Z } ⊆ Γ, with Γ ∈ F∞ and Pµ (Γ) = 0. We have {Z ◦ θt 6= Z 0 ◦ θt } = θt−1 ({Z 6= Z 0 }) ⊆ θt−1 (Γ) and by Theorem 3.2.5, Pν (θt−1 (Γ)) = Eν (1Γ ◦ θt ) = Eν Eν (1Γ ◦ θt | FtX ) = Eν EXt 1Γ = Eν ϕ(Xt ), where ϕ(x) = Ex 1Γ = Px (Γ). Since the distribution of Xt under Pν is given by µ, we have Z Z Eν ϕ(Xt ) = µ(dx)ϕ(x) = µ(dx)Px (Γ) = Pµ (Γ) = 0, so Pν (θt−1 (Γ)) = 0. This shows that on the left-hand side of (3.5), we may replace Z by Z 0 . The last two displays show that the two probability measures B 7→ Eν EXt 1B and Pµ coincide. Hence, since Pµ (Z 6= Z 0 ) ≤ Pµ (Γ) = 0, Eν |EXt Z − EXt Z 0 | ≤ Eν EXt |Z − Z 0 | = Eµ |Z − Z 0 | = 0. It follows that EXt Z = EXt Z 0 , Pν -a.s., so on the right-hand side of (3.5) we may X -measurable, the statement now follows also replace Z be Z 0 . Since Z 0 is F∞ from Theorem 3.2.5. 3.4 3.4.1 Strong Markov property Strong Markov property of a Feller process Let X again be a Feller process with state space E ⊆ Rd . In view of the preceding sections, we consider the canonical, cadlag version of X, which is a Markov process with respect of the usual augmentation (Ft ) of the natural filtration of the canonical process X. As before, we denote the shift operators by θt . In this section we will prove that for Feller processes, the Markov property of Theorem 3.3.11 does not only hold for deterministic times t, but also for (Ft )-stopping times. 
This is called the strong Markov property. Recall that for deterministic t ≥ 0, the shift operator θt on the canonical space Ω maps a path 3.4 Strong Markov property 67 s 7→ ω(s) to the path s 7→ ω(t+s). Likewise, we now define θτ for a random time τ as the operator that maps the path s 7→ ω(s) to the path s 7→ ω(τ (ω) + s). This definition does not cause confusion since if τ equals the deterministic time t, then τ (ω) = t for all ω ∈ Ω, so θτ is equal to the old operator θt . Observe that since the canonical process X is just the identity on the space Ω, we have for instance (Xt ◦ θτ )(ω) = Xt (θτ (ω)) = θτ (ω)(t) = ω(τ (ω) + t) = Xτ (ω)+t (ω), i.e. Xt ◦ θτ = Xτ +t . So the operators θτ can still be viewed as time shifts, but now also certain random times are allowed. Theorem 3.4.1 (Strong Markov property). Let Z be an F∞ -measurable random variable, nonnegative or bounded. Then for every (Ft )-stopping time τ and initial distribution ν, we have Pν -a.s. Eν (Z ◦ θτ | Fτ ) = EXτ Z on {τ < ∞}. Proof. Suppose first that τ is an a.s. finite stopping time that takes values in a countable set D. Then since θτ equals θd on the event {τ = d}, we have (see Exercise 20 of Chapter 1) Eν (Z ◦ θτ | Fτ ) = = X d∈D X d∈D = X d∈D 1{τ =d} Eν (Z ◦ θτ | Fτ ) 1{τ =d} Eν (Z ◦ θd | Fd ) 1{τ =d} EXd Z = E Xτ Z Pν -a.s., by the Markov property. Let us now consider a general finite stopping time τ and assume that Z is of the form Y fi (Xti ) Z= i for certain t1 < · · · < tk and f1 , . . . , fk ∈ C0 . By Lemma 1.6.14 there exist stopping times τn taking values in a finite set and decreasing to τ . By the preceding paragraph we Pν -a.s. have Y Y Eν fi (Xti ) ◦ θτn | Fτn = EXτn fi (Xti ) = g(Xτn ), i i where g(x) = Pt1 f1 Pt2 −t1 f2 · · · Ptk −tk−1 fk (x). By the right-continuity of the paths the right-hand side converges a.s. to g(X τ ) as n → ∞. The right-continuity of the filtration and Corollary 2.2.16 imply that the left-hand side converges a.s. to Y Eν fi (Xti ) ◦ θτ | Fτ . 
i 68 Markov processes Q Hence, the statement of the theorem holds for Z = i fi (Xti ). By a monotone X class type argument we see that it holds for all F∞ -measurable random variables Z. Next, let Z be a general F∞ -measurable random variable. Denote the distribution of Xτ under Pν by µ. By construction, F∞ is contained in the X X -measurable random variables Z 0 so there exist two F∞ Pµ -completion of F∞ 00 0 00 00 and Z such that Z ≤ Z ≤ Z and Eµ (Z − Z 0 ) = 0. It follows that Z 0 ◦ θτ ≤ Z ◦ θτ ≤ Z 00 ◦ θτ and by the preceding paragraph Eν (Z 00 ◦ θτ − Z 0 ◦ θτ ) = Eν Eν (Z 00 ◦ θτ − Z 0 ◦ θτ | Fτ ) = Eν EXτ (Z 00 − Z 0 ) Z = Ex (Z 00 − Z 0 ) µ(dx) = Eµ (Z 00 − Z 0 ) = 0. X So Z ◦ θτ is measurable with respect to the Pν -completion of F∞ , and since ν is arbitrary we conclude that Z ◦ θτ is F∞ -measurable. Now observe that we have Eν (Z 0 ◦ θτ | Fτ ) ≤ Eν (Z ◦ θτ | Fτ ) ≤ Eν (Z 00 ◦ θτ | Fτ ) Pν -a.s. By the preceding paragraph the outer terms terms Pν -a.s. equal EXτ Z 0 and EXτ Z 00 , respectively. The preceding calculation shows that these random variables are Pν -a.s. equal. Since Z 0 ≤ Z ≤ Z 00 it follows that they are both a.s. equal to EXτ Z. We have now shown that the theorem holds for finite stopping times. If τ is not finite, apply the result to the finite stopping time σ = τ 1{τ <∞} and verify that Pν -a.s., Eν (V 1{τ <∞} | Fσ ) = Eν (V 1{τ <∞} | Fτ ) for every bounded or nonnegative random variable V . The result then easily follows. We can say more for Feller processes with stationary and independent increments, i.e. for which the increment Xt − Xs is independent of Fs for all s ≤ t and its Pν - distribution depends only on t − s for all initial distributions ν. For such processes, which are called Lévy processes, we have the following corollary. Corollary 3.4.2. 
If X has stationary, independent increments and τ is a finite stopping time, the process (Xτ +t − Xτ )t≥0 is independent of Fτ and under each Pν , it has the same law as X under P0 , provided that 0 ∈ E. Proof. Put Yt = Xτ +t − Xτ for t ≥ 0. For t1 < · · · < tn and measurable, nonnegative functions f1 , . . . , fn we have Y Y Eν fk (Ytk ) | Fτ = Eν fk (Xtk − X0 ) ◦ θτ | Fτ k k = E Xτ Y k fk (Xtk − X0 ), 3.4 Strong Markov property 69 by the strong Markov property. Hence, the proof is complete once we have shown that for arbitrary x ∈ E Ex n Y k=1 fk (Xti − X0 ) = Pt1 f1 · · · Ptn −tn−1 fn (0) (see Lemma 3.1.4). We prove this by induction on n. Suppose first that n = 1. Then by the stationarity of the increments, the distribution of Xt1 − X0 under Px is independent of x. In particular we can take x = 0, obtaining Ex f1 (Xt1 − X0 ) = E0 f1 (Xt1 ) = Pt1 f1 (0). Now suppose that the statement is true for n−1 and all nonnegative, measurable functions f1 , . . . , fn−1 . We have Ex n Y i=1 fi (Xti − X0 ) = Ex Ex = Ex n Y i=1 n−1 Y i=1 fi (Xti − X0 ) | Ftn−1 fi (Xti − X0 )Ex (fn (Xtn − X0 ) | Ftn−1 ) . By the independence of the increments Ex (fn (Xtn − X0 ) | Ftn−1 ) = Ex (fn (Xtn − Xtn−1 + Xtn−1 − X0 ) | Ftn−1 ) = gx (Xtn−1 − X0 ), where gx (y) = Ex fn (Xtn − Xtn−1 + y). The Px -distribution of Xtn −Xtn−1 is the same as the distribution of Xtn −tn−1 − X0 and is independent of x. It follows that gx (y) = Ex fn (Xtn −tn−1 − X0 + y) = Ey fn (Xtn −tn−1 ) = Ptn −tn−1 fn (y), so that Ex n Y i=1 fi (Xti − X0 ) = Ex = Ex n−1 Y i=1 n−2 Y i=1 fi (Xti − X0 )Ptn −tn−1 fn (Xtn−1 − X0 ) fi (Xti − X0 ) (fn−1 Ptn −tn−1 fn )(Xtn−1 − X0 ). By the induction hypothesis this equals Pt1 f1 · · · Ptn −tn−1 fn (0) and the proof is complete. The following lemma is often useful in connection with the strong Markov property. Lemma 3.4.3. If σ and τ are finite (Ft )-stopping times, then σ + τ ◦ θσ is also a finite (Ft )-stopping time. 70 Markov processes Proof. 
Since (Ft ) is right-continuous, it suffices to prove that {σ + τ ◦ θσ < t} ∈ Ft for every t > 0. Observe that [ {σ + τ ◦ θσ < t} = {τ ◦ θσ < q} ∩ {σ ≤ t − q}. q≥0 q∈Q The indicator of {τ ◦ θσ < q} can be written as 1{τ <q} ◦ θσ . By Exercise 14, it follows that {τ ◦ θσ < q} ∈ Fσ+q . Hence, by definition of the latter σ-algebra, it holds that {τ ◦ θσ < q} ∩ {σ ≤ t − q} = {τ ◦ θσ < q} ∩ {σ + q ≤ t} ∈ Ft . This completes the proof. 3.4.2 Applications to Brownian motion In this section W is the canonical version of the Brownian motion, and (Ft ) is the usual augmentation of its natural filtration. Since the BM has stationary, independent increments, Corollary 3.4.2 implies that for every (Ft )-stopping time τ , the process (Wτ +t − Wτ )t≥0 is a standard BM. The first application that we present is the so-called reflection principle (compare with Exercise 10 of Chapter 1). Recall that we denote the hitting time of x ∈ R by τx . This is a finite stopping time with respect to the natural filtration of the BM (see Example 1.6.9) so it is certainly an (Ft )-stopping time. Theorem 3.4.4 (Reflection principle). for x ∈ R, let τx be the first hitting time of ( Wt , W 0t = 2x − Wt , Let W be a Brownian motion and x. Define the process W 0 by if t ≤ τx if t > τx . Then W 0 is a standard Brownian motion. Proof. Define the processes Y and Z by Y = W τx and Zt = Wτx +t − Wτx = Wτx +t − x for t ≥ 0. By Corollary 3.4.2 the processes Y and Z are independent and Z is a standard BM. By the symmetry of the BM it follows that −Z is also a BM that is independent of Y , so the two pairs (Y, Z) and (Y, −Z) have the same distribution. Now observe that for t ≥ 0, Wt = Yt + Zt−τx 1{t>τx } , Wt0 = Yt − Zt−τx 1{t>τx } . In other words we have W = ϕ(Y, Z) and W 0 = ϕ(Y, −Z), where ϕ : C[0, ∞) × C0 [0, ∞) → C[0, ∞) is given by ϕ(y, z)(t) = y(t) + z(t − ψ(y))1{t>ψ(y)} , 3.4 Strong Markov property 71 and ψ : C[0, ∞) → [0, ∞] is defined by ψ(y) = inf{t > 0 : y(t) = x}. 
Now it is easily verified that ψ is a Borel measurable map, so ϕ is also measurable, being a composition of measurable maps (see Exercise 17). Since (Y, Z) =d (Y, −Z), it follows that W = ϕ(Y, Z) =d ϕ(Y, −Z) = W 0 . The refection principle allows us to calculate the distributions of certain functionals related to the hitting times of the BM. We first consider the joint distribution of Wt and the running maximum St = sup Ws . s≤t Corollary 3.4.5. Let W be a standard BM and S its running maximum. Then for x ≤ y, P(Wt ≤ x, St ≥ y) = P(Wt ≤ x − 2y). The pair (Wt , St ) has joint density (2y−x)2 (2y − x)e− 2t p (x, y) 7→ πt3 /2 1{x≤y} with respect to Lebesgue measure. Proof. Let W 0 be the process obtained by reflecting W at the hitting time τy . Observe that St ≥ y if and only if t ≥ τy , so the probability of interest equals P(Wt ≤ x, t ≥ τy ). On the event {t ≥ τy } we have Wt = 2y − Wt0 , so it reduces further to P(Wt0 ≥ 2y − x, t ≥ τy ). But since x ≤ y we have 2y − x ≥ y, hence {Wt0 ≥ 2y − x} ⊆ {Wt0 ≥ y} ⊆ {t ≥ τy }. It follows that P(Wt0 ≥ 2y − x, t ≥ τy ) = P(Wt0 ≥ 2y − x). By the reflection principle and the symmetry of the BM this proves the first statement. For the second statement, see Exercise 18. It follows from the preceding corollary that for all x > 0 and t ≥ 0, P(St ≥ x) = P(τx ≤ t) = 2P(Wt ≥ x) = P(|Wt | ≥ x) (see Exercise 19). This shows in particular that St =d |Wt | for every t ≥ 0, and we can now derive an explicit expression for the density of the hitting time τ x . It is easily seen from this expression that Eτx = ∞, as was proved by martingale methods in Exercise 15 of Chapter 2. Corollary 3.4.6. The first time τx that the standard BM hits the level x > 0 has density x2 xe− 2t t 7→ √ 1{t≥0} 2πt3 with respect to Lebesgue measure. 72 Markov processes Proof. See Exercise 20. We have seen in the first two chapters that the zero set of the standard BM is a.s. 
closed, unbounded, has Lebesgue measure zero and that 0 is an accumulation point of the set, i.e. 0 is not an isolated point. Using the strong Markov property we can prove that in fact, the zero set contains no isolated points at all. Corollary 3.4.7. The zero set Z = {t ≥ 0 : Wt = 0} of the standard BM is a.s. closed, unbounded, contains no isolated points and has Lebesgue measure zero. Proof. In view of Exercise 26 of Chapter 1 we only have to prove that Z contains no isolated points. For rational q ≥ 0, define σq = q + τ0 ◦ θq and observe that σq is the first time after time q that the BM W visits 0. By Lemma 3.4.3 the random time σq is a stopping time, hence the strong Markov property implies that the process Wσq +t = Wσq +t − Wσq is a standard BM. By Corollary 2.4.6 it follows that a.s., σq is an accumulation point of Z, hence with probability 1 it holds that for every rational q ≥ 0, σq is an accumulation point of Z. Now take an arbitrary point t ∈ Z and choose rational points qn such that qn ↑ t. Then since qn ≤ σqn ≤ t, we have σqn → t. The limit of accumulation points is also an accumulation point, so the proof is complete. 3.5 3.5.1 Generators Generator of a Feller process Throughout this section we consider the ‘good version’ of a Feller process X with state space E ⊆ Rd , transition semigroup (Pt ) and resolvents Rλ . The transition function and the resolvents give little intuition about the way in which the Markov process X moves from point to point as time evolves. In the following definition we introduce a new operator which captures the behaviour of the process in infinitesimally small time intervals. Definition 3.5.1. A function f ∈ C0 is said to belong to the domain D(A) of the infinitesimal generator of X if the limit Af = lim t↓0 Pt f − f t exists in C0 . The linear transformation A : D(A) → C0 is called the infinitesimal generator of the process. 3.5 Generators 73 Immediately from the definition we see that for f ∈ D(A), it holds Pν -a.s. 
holds that Eν (f (Xt+h ) − f (Xt ) | Ft ) = hAf (Xt ) + o(h) as h ↓ 0. So in this sense, the generator indeed describes the movement of the process in an infinitesimally small amount of time. Lemma 3.3.5 gives conditions under which a function ϕ of the Feller process X is again Feller. Immediately from the definition we see that the generators of the two processes are related as follows. Lemma 3.5.2. Let ϕ : E → E 0 be continuous and onto, and assume that kϕ(xn )k → ∞ in E 0 if and only if kxn k → ∞ in E. Suppose that (Qt ) is a collection of transition kernels such that Pt (f ◦ϕ) = (Qt f )◦ϕ for all f ∈ C0 (E 0 ), so that Y = ϕ(X) is a Feller process with state space E 0 and transition function (Qt ). Then the generator B of Y satisfies D(B) = {f ∈ C0 (E 0 ) : f ◦ ϕ ∈ D(A)} and A(f ◦ ϕ) = (Bf ) ◦ ϕ for f ∈ D(B). Proof. See Exercise 21. The following lemma gives some of the basic properties of the generator. The differential equations in part (ii) are Kolmogorov’s forward and backward equations. Lemma 3.5.3. Suppose that f ∈ D(A). (i) For all t ≥ 0, Pt D(A) ⊆ D(A). (ii) The function t 7→ Pt f is right-differentiable in C0 and d Pt f = APt f = Pt Af. dt More precisely, this means that Pt+h f − Pt f Pt+h f − Pt f = 0. lim − P Af − AP f = lim t t h↓0 h↓0 h h ∞ ∞ (iii) We have Pt f − f = Z t Ps Af ds = 0 Z t APs f ds 0 for every t ≥ 0. Proof. All properties follow from the semigroup property and the strong continuity of (Pt ), see Exercise 22. To prove (iii) it is useful to note that since the Feller process X with semigroup (Pt ) has a right-continuous and quasi leftcontinuous modification, the map t 7→ Pt f (x) is continuous for all f ∈ C0 and x ∈ E. 74 Markov processes The next theorem gives a full description of the generator in terms of the resolvents Rλ . In the proof we need the following lemma. Lemma 3.5.4. For h, s > 0 define the linear operators Z 1 1 s Pt f dt. Ah f = (Ph f − f ), Bs f = h s 0 Then Bh f ∈ D(A) for all h > 0 and f ∈ C0 , and ABh = Ah . Proof. 
It is easily verified that Ah Bs = As Bh for all s, h > 0. Also note that for f ∈ C0 , Z 1 h kBh f − f k∞ ≤ kPt f − f k∞ dt → 0 h 0 as h ↓ 0, by the strong continuity of the Feller semigroup. It follows that for s > 0 and f ∈ C0 , Ah Bs f = A s Bh f → A s f in C0 as h ↓ 0, and the proof is complete. Theorem 3.5.5. The domain D(A) equals the joint image of the resolvents, whence D(A) is dense in C0 . For every λ > 0 the transformation λI − A : D(A) → C0 is invertible and its inverse is Rλ . Proof. For f ∈ D(A) we have Z ∞ e−λt Pt (λf − Af ) dt Rλ (λf − Af ) = 0 Z ∞ Z ∞ d −λt −λt =λ e Pt f dt − e Pt f dt, dt 0 0 by Lemma 3.5.3. Integration by parts of the last integral shows that Rλ (λf − Af ) = f . We conclude that D(A) ⊆ ImRλ and Rλ (λI − A) = I on D(A). To prove the converse, we use the notation of the preceding lemma. By the lemma and the fact that Ph and Rλ commute, we have for f ∈ C0 and h > 0, Z ∞ Ah Rλ f = Rλ Ah f = Rλ ABh f = e−λt Pt ABh f dt. 0 Using Lemma 3.5.3 and integration by parts we see that the right-hand side equals Z ∞ d −λt e Pt Bh f dt = λRλ Bh f − Bh f, dt 0 so we have Ah Rλ f = λRλ Bh f − Bh f . In the proof of the preceding lemma we saw that as h → 0, Bh f → f , uniformly. This shows that Rλ f ∈ D(A) and ARλ f = λRλ f − f . It follows that ImRλ ⊆ D(A) and (λI − A)Rλ = I on C0 . 3.5 Generators 75 The theorem states that for λ > 0, it holds that Rλ = (λI − A)−1 , so the generator A determines the resolvents. By uniqueness of Laplace transforms the resolvents determine the semigroup (Pt ). This shows that the generator determines the semigroup. In other words, two Feller processes with the same generator have the same semigroup. Corollary 3.5.6. The generator determines the semigroup. The preceding theorem also shows that for all λ > 0, the generator is given by A = λI − Rλ−1 . This gives us a method for explicit computations. Example 3.5.7. 
For the standard BM we have that $D(A) = C_0^2$, the space of functions on $\mathbb{R}$ with two continuous derivatives that vanish at infinity. For $f \in D(A)$ it holds that $Af = f''/2$ (see Exercise 23).

From the definition of the generator it is easy to see that if $f \in D(A)$ and $x \in E$ is such that $f(y) \le f(x)$ for all $y \in E$, then $Af(x) \le 0$. When an operator has this property, we say it satisfies the maximum principle. The following result will be useful below.

Theorem 3.5.8. Suppose the generator $A$ extends to $A' : D' \to C_0$, where $D' \subseteq C_0$ is a linear space, and $A'$ satisfies the maximum principle. Then $D' = D(A)$.

Proof. In the first part of the proof we show that for a linear space $D \subseteq C_0$, an operator $B : D \to C_0$ that satisfies the maximum principle is dissipative, i.e.
$$\lambda\|f\|_\infty \le \|(\lambda I - B)f\|_\infty$$
for all $f \in D$ and $\lambda > 0$. To prove this, let $x \in E$ be such that $|f(x)| = \|f\|_\infty$ and define $g(y) = f(y)\,\mathrm{sgn}\,f(x)$. Then $g \in D$ ($D$ is a linear space) and $g(y) \le g(x)$ for all $y$. Hence, by the maximum principle, $Bg(x) \le 0$. It follows that
$$\lambda\|f\|_\infty = \lambda g(x) \le \lambda g(x) - Bg(x) \le \|(\lambda I - B)g\|_\infty = \|(\lambda I - B)f\|_\infty,$$
as claimed.

For the proof of the theorem, take $f \in D'$ and define $g = (I - A')f$. By the first part of the proof $A'$ is dissipative, hence $\|f - R_1 g\|_\infty \le \|(I - A')(f - R_1 g)\|_\infty$. By Theorem 3.5.5 we have $(I - A)R_1 = I$ on $C_0$, so $(I - A')(f - R_1 g) = g - (I - A)R_1 g = 0$. It follows that $f = R_1 g$, whence $f \in D(A)$.

Generators provide an important link between Feller processes and martingales.

Theorem 3.5.9. For every $f \in D(A)$ and initial measure $\nu$, the process
$$M_t^f = f(X_t) - f(X_0) - \int_0^t Af(X_s)\,ds$$
is a $P_\nu$-martingale.

Proof. Since $f$ and $Af$ are in $C_0$ and therefore bounded, $M_t^f$ is integrable for every $t \ge 0$. Now for $s \le t$,
$$E_\nu(M_t^f \mid \mathcal{F}_s) = M_s^f + E_\nu\Big( f(X_t) - f(X_s) - \int_s^t Af(X_u)\,du \,\Big|\, \mathcal{F}_s \Big).$$
By the Markov property, the conditional expectation on the right-hand side equals
$$E_{X_s}\Big( f(X_{t-s}) - f(X_0) - \int_0^{t-s} Af(X_u)\,du \Big).$$
But for every $x \in E$,
$$E_x\Big( f(X_{t-s}) - f(X_0) - \int_0^{t-s} Af(X_u)\,du \Big) = P_{t-s}f(x) - f(x) - \int_0^{t-s} P_u Af(x)\,du = 0,$$
by Fubini's theorem and part (iii) of Lemma 3.5.3.

3.5.2 Characteristic operator

The definition of the generator and its expression in terms of resolvents given by Theorem 3.5.5 are analytical in nature. In the present section we give a probabilistic description of the generator. The key is the following corollary of Theorem 3.5.9.

Corollary 3.5.10 (Dynkin's formula). For every $f \in D(A)$ and every stopping time $\tau$ such that $E_x\tau < \infty$, we have
$$E_x f(X_\tau) = f(x) + E_x \int_0^\tau Af(X_s)\,ds$$
for every $x \in E$.

Proof. By Theorem 3.5.9 and the optional stopping theorem, we have
$$E_x f(X_{\tau \wedge n}) = f(x) + E_x \int_0^{\tau \wedge n} Af(X_s)\,ds$$
for every $n \in \mathbb{N}$. By the quasi-left continuity of $X$ (see Exercise 16), the left-hand side converges to $E_x f(X_\tau)$ as $n \to \infty$. Since $Af \in C_0$ we have $\|Af\|_\infty < \infty$ and
$$\Big| \int_0^{\tau \wedge n} Af(X_s)\,ds \Big| \le \|Af\|_\infty\,\tau.$$
Hence, by the fact that $E_x\tau < \infty$, the dominated convergence theorem implies that the integral on the right-hand side of the first display converges to
$$E_x \int_0^\tau Af(X_s)\,ds.$$
This completes the proof.

We call a point $x \in E$ absorbing if for all $t \ge 0$ it holds that $P_t(x, \{x\}) = 1$. This means that if the process is started in $x$, it never leaves $x$ (see also Exercise 13). For $r > 0$, define the stopping time
$$\eta_r = \inf\{t \ge 0 : \|X_t - X_0\| > r\}. \qquad (3.6)$$
If $x$ is absorbing, then clearly we $P_x$-a.s. have $\eta_r = \infty$ for all $r > 0$. For non-absorbing points however, the escape time $\eta_r$ is a.s. finite and has a finite mean if $r$ is small enough.

Lemma 3.5.11. If $x \in E$ is not absorbing, then $E_x \eta_r < \infty$ for $r > 0$ sufficiently small.

Proof. Let $B(x, \varepsilon) = \{y : \|y - x\| \le \varepsilon\}$ be the closed ball of radius $\varepsilon$ around the point $x$. If $x$ is not absorbing, it holds that $P_t(x, B(x, \varepsilon)) < p < 1$ for some $t, \varepsilon > 0$. By the Feller property of the semigroup $(P_t)$ we have the weak convergence $P_t(y, \cdot) \Rightarrow P_t(x, \cdot)$ as $y \to x$. Hence, by the portmanteau theorem, the fact that $B(x, \varepsilon)$ is closed implies that
$$\limsup_{y \to x} P_t(y, B(x, \varepsilon)) \le P_t(x, B(x, \varepsilon)).$$
It follows that for all $y$ close enough to $x$, say $y \in B(x, r)$ for some $r \in (0, \varepsilon)$, we have $P_t(y, B(x, r)) \le P_t(y, B(x, \varepsilon)) < p$. Using the Markov property it can easily be shown that consequently, $P_x(\eta_r \ge nt) \le p^n$ for $n = 0, 1, \ldots$ (see Exercise 24). Hence,
$$E_x \eta_r = \int_0^\infty P_x(\eta_r \ge s)\,ds \le t \sum_{n=0}^\infty P_x(\eta_r \ge nt) \le \frac{t}{1 - p} < \infty.$$
This completes the proof of the lemma.

We can now prove the following alternative description of the generator.

Theorem 3.5.12. For $f \in D(A)$ we have $Af(x) = 0$ if $x$ is absorbing, and
$$Af(x) = \lim_{r \downarrow 0} \frac{E_x f(X_{\eta_r}) - f(x)}{E_x \eta_r} \qquad (3.7)$$
otherwise.

Proof. If $x$ is absorbing we have $P_t f(x) = f(x)$ for all $t \ge 0$, which shows that $Af(x) = 0$. For non-absorbing $x \in E$ the stopping time $\eta_r$ has finite mean for $r$ small enough (see the lemma), so by Dynkin's formula (Corollary 3.5.10),
$$E_x f(X_{\eta_r}) = f(x) + E_x \int_0^{\eta_r} Af(X_s)\,ds.$$
It follows that
$$\Big| \frac{E_x f(X_{\eta_r}) - f(x)}{E_x \eta_r} - Af(x) \Big| \le \frac{E_x \int_0^{\eta_r} |Af(X_s) - Af(x)|\,ds}{E_x \eta_r} \le \sup_{\|y - x\| \le r} |Af(y) - Af(x)|.$$
This completes the proof, since $Af$ is continuous.

The operator defined by the right-hand side of (3.7) is called the characteristic operator of the Markov process $X$; its domain is simply the collection of all functions $f \in C_0$ for which the limit in (3.7) exists. The theorem states that for Feller processes, the characteristic operator extends the infinitesimal generator. It is easily seen that the characteristic operator satisfies the maximum principle. Hence, by Theorem 3.5.8, the characteristic operator and the generator coincide.

3.6 Killed Feller processes

3.6.1 Sub-Markovian processes

Up to this point we have always assumed that the transition function $(P_t)$ satisfies $P_t(x, E) = 1$, i.e. that the $P_t(x, \cdot)$ are probability measures. It is sometimes useful to consider semigroups for which $P_t(x, E) < 1$ for some $t, x$.
We call the semigroup sub-Markovian in that case. Intuitively, a sub-Markovian semigroup describes the motion of a particle that can disappear from the state space $E$, or die, in finite time.

A sub-Markovian transition function can be turned into a true Markovian one by adjoining a new point $\Delta$ to the state space $E$, called the cemetery. We then put $E_\Delta = E \cup \{\Delta\}$, $\mathcal{E}_\Delta = \sigma(\mathcal{E}, \{\Delta\})$ and define a new semigroup $(\tilde P_t)$ by setting $\tilde P_t(x, A) = P_t(x, A)$ if $A \subseteq E$, $\tilde P_t(x, \{\Delta\}) = 1 - P_t(x, E)$ and $\tilde P_t(\Delta, \{\Delta\}) = 1$ (see Exercise 25). Note that by construction, the point $\Delta$ is absorbing for the new process. By convention, all functions on $E$ are extended to $E_\Delta$ by putting $f(\Delta) = 0$. In the sequel we will not distinguish between $P_t$ and $\tilde P_t$ in our notation.

In the case of sub-Markovian Feller semigroups, we extend the topology of $E$ to $E_\Delta$ in such a way that $E_\Delta$ is the one-point compactification of $E$ if $E$ is not compact, and $\Delta$ is an isolated point otherwise. Then the result of Theorem 3.3.7 is still true for the process on $E_\Delta$, provided that we call an $E_\Delta$-valued function on $\mathbb{R}_+$ cadlag if it is right-continuous and has left limits with respect to this topology on $E_\Delta$. Note that by construction, a function $f \in C_0(E_\Delta)$ satisfies $f(\Delta) = 0$. As before, we always consider the cadlag version of a given Feller process.

We already noted that $\Delta$ is absorbing by construction. So once a process has arrived in the cemetery state, it stays there for the remaining time. We define the killing time of the Feller process $X$ by
$$\zeta = \inf\{t \ge 0 : X_{t-} = \Delta \text{ or } X_t = \Delta\}.$$
Clearly $\zeta < \infty$ with positive probability if the process $X$ is sub-Markovian. For a cadlag Feller process $X$ we have the following strong version of the absorption property of the cemetery.

Lemma 3.6.1. For every initial distribution $\nu$, it $P_\nu$-almost surely holds that $X_t = \Delta$ for all $t \ge \zeta$.

Proof. Let $f \in C_0(E_\Delta)$ be strictly positive on $E$. Then the right-continuous $P_\nu$-supermartingale $M_t = e^{-t}R_1 f(X_t)$ (see Lemma 3.3.6) is nonnegative, and clearly $M_t = 0$ if and only if $X_t = \Delta$, and $M_{t-} = 0$ if and only if $X_{t-} = \Delta$. Hence, the result follows from Corollary 2.3.16.

We close this subsection with an example of a sub-Markovian Feller semigroup, given by a Brownian motion which is killed when it first hits zero.

Example 3.6.2. Let $W$ be a BM started at some positive point and let $\tau_0$ be the first time it hits zero. Define a new process $X$ by putting $X_t = W_t$ for $t < \tau_0$, and $X_t = \Delta$ for $t \ge \tau_0$. We can view the process $X$ as a sub-Markovian Feller process with state space $(0, \infty)$. Moreover, we can compute its transition semigroup $(P_t)$ explicitly. For $f \in C_0(0, \infty)$ and $x > 0$ we have
$$P_t f(x) = E_x f(X_t) = E_x f(W_t)1_{\{\tau_0 > t\}}.$$
Now observe that under $P_x$, the pair $(W, \tau_0)$ has the same distribution as the pair $(x + W, \tau_{-x})$ under $P_0$. By symmetry of the BM, the latter pair has the same distribution as $(x - W, \tau_x)$ under $P_0$. Noting also that $\{\tau_x > t\} = \{S_t < x\}$, where $S_t = \sup_{s \le t} W_s$, we find that
$$P_t f(x) = E_0 f(x - W_t)1_{\{S_t < x\}}.$$
The joint distribution of $(W_t, S_t)$ under $P_0$ is given by Corollary 3.4.5. We can use this to show that
$$P_t f(x) = \int_0^\infty f(y)\,p(t, x, y)\,dy,$$
where
$$p(t, x, y) = \frac{1}{\sqrt{2\pi t}}\Big( \exp\Big(-\frac{(y - x)^2}{2t}\Big) - \exp\Big(-\frac{(y + x)^2}{2t}\Big) \Big)$$
(see Exercise 26).

3.6.2 Feynman-Kac formula

Let $X$ be a Feller process and $\tau$ an exponential random variable with parameter $q > 0$, independent of $X$. Then we can define a new process $Y$ by
$$Y_t = \begin{cases} X_t, & t < \tau, \\ \Delta, & t \ge \tau. \end{cases}$$
So $Y$ is constructed by killing $X$ at an independent exponential time. Note that since $f \in C_0$ satisfies $f(\Delta) = 0$, we have
$$Q_t f(x) = E_x f(Y_t) = E_x f(X_t)1_{\{t < \tau\}} = e^{-qt}E_x f(X_t).$$
It follows that $(Q_t)_{t \ge 0}$ is a sub-Markovian Feller semigroup. Denote the semigroup of the original process $X$ by $(P_t)$ and its generator by $A$. Then for $f \in D(A)$ we have
$$\frac{Q_t f - f}{t} = e^{-qt}\,\frac{P_t f - f}{t} + f\,\frac{e^{-qt} - 1}{t} \to Af - qf.$$
So the generator $A^q$ of the killed process $Y$ is related to the original generator $A$ by $A^q = A - qI$.

The preceding example extends to Feller processes which are killed at non-constant rates. More precisely, one can prove that if $q$ is a continuous, nonnegative function on the state space of the Feller process $X$, then
$$Q_t f(x) = E_x f(X_t)\,e^{-\int_0^t q(X_u)\,du}$$
defines a (sub-Markovian) Feller semigroup. We think of this as the semigroup of a process obtained by killing $X$ at the rate $q$. It is then not hard to show that, as in the example above, the generator $A^q$ of the killed process is given by $A^q = A - qI$. The $\lambda$-resolvent of the killed process, denoted by $R_\lambda^q$, is given by
$$R_\lambda^q f(x) = E_x \int_0^\infty f(X_t)\,e^{-\lambda t - \int_0^t q(X_u)\,du}\,dt$$
for $f \in C_0$. Since $\lambda I - A^q$ is the inverse of $R_\lambda^q$ (cf. Theorem 3.5.5), it follows that the function $u = R_\lambda^q f$ satisfies
$$(\lambda + q)u - Au = f.$$
If we apply the resolvent $R_\lambda$ of the original process $X$ to this equation we get the equivalent equation
$$u + R_\lambda qu = R_\lambda f.$$
Each of these formulas is called the Feynman-Kac formula. In fact, the formula holds under less restrictive conditions: we can drop the continuity requirements on $f$ and $q$.

Theorem 3.6.3 (Feynman-Kac formula). Let $X$ be a Feller process with $\lambda$-resolvent $R_\lambda$. Let $q$ be a nonnegative, bounded, measurable function on the state space and let $f$ be bounded and measurable. Then the function
$$u(x) = E_x \int_0^\infty f(X_t)\,e^{-\lambda t - \int_0^t q(X_u)\,du}\,dt$$
satisfies
$$u + R_\lambda qu = R_\lambda f. \qquad (3.8)$$

Proof. Define $A_t = \int_0^t q(X_u)\,du$. Then since $\frac{d}{dt}e^{A_t} = q(X_t)e^{A_t}$, we have
$$R_\lambda f(x) - u(x) = E_x \int_0^\infty f(X_t)\,e^{-\lambda t - A_t}\big(e^{A_t} - 1\big)\,dt = E_x \int_0^\infty f(X_t)\,e^{-\lambda t - A_t}\int_0^t q(X_s)e^{A_s}\,ds\,dt = E_x \int_0^\infty q(X_s)e^{A_s}\int_s^\infty f(X_t)\,e^{-\lambda t - A_t}\,dt\,ds.$$
Observe that $A$ is an additive functional, i.e. $A_{t+s} - A_t = A_s \circ \theta_t$. Hence, a substitution $v = t - s$ yields
$$R_\lambda f(x) - u(x) = E_x \int_0^\infty q(X_0 \circ \theta_s)\,e^{-\lambda s}\int_0^\infty f(X_v \circ \theta_s)\,e^{-\lambda v - A_v \circ \theta_s}\,dv\,ds = E_x \int_0^\infty e^{-\lambda s}\,(Z \circ \theta_s)\,ds,$$
where
$$Z = q(X_0)\int_0^\infty f(X_t)\,e^{-\lambda t - \int_0^t q(X_u)\,du}\,dt.$$
By the Markov property, $E_x(Z \circ \theta_s) = E_x E_{X_s}Z = E_x\,q(X_s)u(X_s)$. Hence, we get
$$R_\lambda f(x) - u(x) = E_x \int_0^\infty e^{-\lambda s}\,q(X_s)u(X_s)\,ds = R_\lambda qu(x).$$
This completes the proof.

3.6.3 Feynman-Kac formula and arc-sine law for the BM

If we specialize the Feynman-Kac formula to the Brownian motion case, we get the following result.

Theorem 3.6.4 (Feynman-Kac formula for BM). Let $W$ be a standard BM, $f$ a bounded, measurable function, $q$ a nonnegative, bounded, measurable function and $\lambda > 0$. Then the function $u$ defined by
$$u(x) = E_x \int_0^\infty f(W_t)\,e^{-\lambda t - \int_0^t q(W_u)\,du}\,dt$$
is twice differentiable, and satisfies the differential equation
$$(\lambda + q)u - \tfrac{1}{2}u'' = f.$$

Proof. By Example 3.3.4, the resolvent $R_\lambda$ of the standard BM is given by
$$R_\lambda f(x) = \int_{\mathbb{R}} f(y)\,r_\lambda(x, y)\,dy,$$
where $r_\lambda(x, y) = \exp(-\sqrt{2\lambda}\,|x - y|)/\sqrt{2\lambda}$. This shows that if $g$ is bounded and measurable, then $R_\lambda g$ is continuous. Moreover, a simple calculation shows that $R_\lambda g$ is also differentiable, with
$$\frac{d}{dx}R_\lambda g(x) = \int_{y > x} g(y)\,e^{-\sqrt{2\lambda}(y - x)}\,dy - \int_{y < x} g(y)\,e^{-\sqrt{2\lambda}(x - y)}\,dy.$$
In particular, we see that $R_\lambda g$ is continuously differentiable and in fact twice differentiable. Differentiating once more we find that
$$\frac{d^2}{dx^2}R_\lambda g(x) = -2g(x) + 2\lambda R_\lambda g(x)$$
(see also Exercise 23). Under the conditions of the theorem the functions $qu$ and $f$ are clearly bounded. Hence, (3.8) implies that $u$ is continuously differentiable and twice differentiable. The equation for $u$ is obtained by differentiating (3.8) twice and using the identity in the last display.

We now apply the Feynman-Kac formula to prove the arc-sine law for the amount of time that the BM spends above (or below) zero in a finite time interval. The name of this law is explained by the fact that the density in the statement of the theorem has distribution function
$$F(x) = \frac{2}{\pi}\arcsin\sqrt{\frac{x}{t}}, \qquad x \in [0, t].$$

Theorem 3.6.5 (Arc-sine law). Let $W$ be a Brownian motion. For $t > 0$, the random variable
$$\int_0^t 1_{(0,\infty)}(W_s)\,ds$$
has an absolutely continuous distribution with density
$$s \mapsto \frac{1}{\pi\sqrt{s(t - s)}}$$
on $(0, t)$.

Proof. For $\lambda, \omega > 0$, consider the double Laplace transform
$$u(x) = \int_0^\infty e^{-\lambda t}\,E_x\,e^{-\omega\int_0^t 1_{(0,\infty)}(W_s)\,ds}\,dt.$$
By the Feynman-Kac formula (applied with $f = 1$, $q = \omega 1_{(0,\infty)}$) the function $u$ is continuously differentiable and satisfies
$$(\lambda + \omega)u(x) - \tfrac{1}{2}u''(x) = 1, \qquad x > 0,$$
$$\lambda u(x) - \tfrac{1}{2}u''(x) = 1, \qquad x < 0.$$
For $a > 0$ the general solution of the equation $ag - g''/2 = 1$ on an interval is given by
$$C_1 e^{x\sqrt{2a}} + C_2 e^{-x\sqrt{2a}} + \frac{1}{a}.$$
Since $u$ is bounded, it follows that for certain constants $C_1, C_2$,
$$u(x) = \begin{cases} C_1 e^{-x\sqrt{2(\lambda + \omega)}} + \dfrac{1}{\lambda + \omega}, & x > 0, \\[2mm] C_2 e^{x\sqrt{2\lambda}} + \dfrac{1}{\lambda}, & x < 0. \end{cases}$$
Continuity of $u$ and $u'$ at $x = 0$ gives the equations $C_1 + 1/(\lambda + \omega) = C_2 + 1/\lambda$ and $-C_1\sqrt{2(\lambda + \omega)} = C_2\sqrt{2\lambda}$. Solving these yields
$$C_1 = \frac{\sqrt{\lambda + \omega} - \sqrt{\lambda}}{(\lambda + \omega)\sqrt{\lambda}}$$
and hence
$$\int_0^\infty e^{-\lambda t}\,E_0\,e^{-\omega\int_0^t 1_{(0,\infty)}(W_s)\,ds}\,dt = u(0) = \frac{1}{\sqrt{\lambda(\lambda + \omega)}}.$$
On the other hand, Fubini's theorem and a change of variables show that for the same double Laplace transform of the density given in the statement of the theorem, it holds that
$$\int_0^\infty e^{-\lambda t}\int_0^t e^{-\omega s}\frac{1}{\pi\sqrt{s(t - s)}}\,ds\,dt = \frac{1}{\pi}\int_0^\infty e^{-\omega s}\frac{1}{\sqrt{s}}\int_s^\infty e^{-\lambda t}\frac{1}{\sqrt{t - s}}\,dt\,ds = \frac{1}{\pi}\int_0^\infty \frac{1}{\sqrt{s}}e^{-(\lambda + \omega)s}\,ds\int_0^\infty \frac{1}{\sqrt{u}}e^{-\lambda u}\,du.$$
A substitution $x = \sqrt{t}$ shows that for $\alpha > 0$,
$$\int_0^\infty \frac{1}{\sqrt{t}}e^{-\alpha t}\,dt = \sqrt{\frac{\pi}{\alpha}}.$$
Hence, we get
$$\int_0^\infty e^{-\lambda t}\int_0^t e^{-\omega s}\frac{1}{\pi\sqrt{s(t - s)}}\,ds\,dt = \frac{1}{\sqrt{\lambda(\lambda + \omega)}}.$$
By the uniqueness of Laplace transforms, this completes the proof.

3.7 Exercises

1. Prove Lemma 3.1.4.
2. Complete the proof of Lemma 3.1.6.
3. Let $W$ be a BM. Show that the reflected Brownian motion defined by $X = |W|$ is a Markov process (with respect to its natural filtration) and compute its transition function. (Hint: calculate the conditional probability $P(X_t \in B \mid \mathcal{F}_s^X)$ by conditioning further on $\mathcal{F}_s^W$.)
4. Let $X$ be a Markov process with state space $E$ and transition function $(P_t)$.
Show that for every bounded, measurable function $f$ on $E$ and for all $t \ge 0$, the process $(P_{t-s}f(X_s))_{s \in [0,t]}$ is a martingale.
5. Prove that the probability measures $\mu_{t_1,\ldots,t_n}$ defined in the proof of Corollary 3.2.3 form a consistent system.
6. Work out the details of the proof of Lemma 3.2.4.
7. Prove the claims made in Example 3.3.4. (Hint: to derive the explicit expression for the resolvent kernel it is needed to calculate integrals of the form
$$\int_0^\infty \frac{e^{-a^2 t - b^2 t^{-1}}}{\sqrt{t}}\,dt.$$
To this end, first perform the substitution $t = (b/a)s^2$. Next, make a change of variables $u = s - 1/s$ and observe that $u(s) = s - 1/s$ is a smooth bijection from $(0, \infty)$ to $\mathbb{R}$, whose inverse $u^{-1} : \mathbb{R} \to (0, \infty)$ satisfies $u^{-1}(t) - u^{-1}(-t) = t$, whence $(u^{-1})'(t) + (u^{-1})'(-t) = 1$.)
8. Finish the proof of Lemma 3.3.5.
9. Prove the inequality in the proof of Lemma 3.3.6.
10. Suppose that $E \subseteq \mathbb{R}^d$. Show that every countable, dense subset $H$ of the space $C_0^+(E)$ of nonnegative functions in $C_0(E)$ separates the points of the one-point compactification of $E$. This means that for all $x \ne y$ in $E$ there exists a function $h \in H$ such that $h(x) \ne h(y)$, and for all $x \in E$ there exists a function $h \in H$ such that $h(x) \ne 0$.
11. Let $(X, d)$ be a compact metric space and let $H$ be a class of nonnegative, continuous functions on $X$ that separates the points of $X$, i.e. for all $x \ne y$ in $X$ there exists a function $h \in H$ such that $h(x) \ne h(y)$. Prove that $d(x_n, x) \to 0$ if and only if $h(x_n) \to h(x)$ for all $h \in H$. (Hint: suppose that $H = \{h_1, h_2, \ldots\}$, endow $\mathbb{R}^\infty$ with the product topology and consider the map $A : X \to \mathbb{R}^\infty$ defined by $A(x) = (h_1(x), h_2(x), \ldots)$.)
12. Let $X$ and $Y$ be two random variables defined on the same probability space, taking values in a Polish space $E$. Show that $X = Y$ a.s. if and only if $Ef(X)g(Y) = Ef(X)g(X)$ for all bounded and continuous functions $f$ and $g$ on $E$. (Hint: use the monotone class theorem.)
13. Let $X$ be a (canonical) Feller process with state space $E$ and for $x \in E$, consider the stopping time $\sigma_x = \inf\{t > 0 : X_t \ne x\}$.
(i) Using the Markov property, show that for every $x \in E$,
$$P_x(\sigma_x > t + s) = P_x(\sigma_x > t)P_x(\sigma_x > s)$$
for all $s, t \ge 0$.
(ii) Conclude that there exists an $a \in [0, \infty]$, possibly depending on $x$, such that $P_x(\sigma_x > t) = e^{-at}$.
(Remark: this leads to a classification of the points in the state space of a Feller process. A point for which $a = 0$ is called an absorption point or a trap. If $a \in (0, \infty)$ the point is called a holding point. Points for which $a = \infty$ are called regular.)
14. Let $(\mathcal{F}_t)$ be the usual augmentation of the natural filtration of a canonical Feller process. Show that for every nonnegative, $\mathcal{F}_t$-measurable random variable $Z$ and every finite stopping time $\tau$, the random variable $Z \circ \theta_\tau$ is $\mathcal{F}_{\tau + t}$-measurable. (Hint: first prove it for $Z = 1_A$ with $A \in \mathcal{F}_t^X$. Next, prove it for $Z = 1_A$ with $A \in \mathcal{F}_t$, using the fact that $A \in \mathcal{F}_t^\nu$ if and only if there exist $B \in \mathcal{F}_t^X$ and $C, D \in \mathcal{N}^\nu$ such that $B \setminus C \subseteq A \subseteq B \cup D$ (cf. Revuz and Yor (1991), pp. 3–4). Finally, prove it for arbitrary $Z$.)
15. Consider the situation of Exercise 13. Suppose that $x \in E$ is a holding point, i.e. a point for which $a \in (0, \infty)$.
(i) Observe that $\sigma_x < \infty$, $P_x$-a.s., and that $\{X_{\sigma_x} = x\} \subseteq \{\sigma_x \circ \theta_{\sigma_x} = 0\}$.
(ii) Using the strong Markov property, show that $P_x(X_{\sigma_x} = x) = P_x(X_{\sigma_x} = x)P_x(\sigma_x = 0)$.
(iii) Conclude that $P_x(X_{\sigma_x} = x) = 0$, i.e. a Feller process can only leave a holding point by a jump.
16. Show that if $X$ is a Feller process and we have $(\mathcal{F}_t)$-stopping times $\tau_n \uparrow \tau$ a.s., then $\lim_n X_{\tau_n} = X_\tau$ a.s. on $\{\tau < \infty\}$. This is called the quasi-left continuity of Feller processes. (Hint: first observe that it is enough to prove the result for bounded $\tau$. Next, put $Y = \lim_n X_{\tau_n}$ (why does the limit exist?) and use the strong Markov property to show that for $f, g \in C_0$,
$$E_x f(Y)g(X_\tau) = \lim_{t \downarrow 0}\lim_n E_x f(X_{\tau_n})g(X_{\tau_n + t}) = E_x f(Y)g(Y).$$
The claim then follows from Exercise 12.)
17. Show that the maps $\psi$ and $\varphi$ in the proof of Theorem 3.4.4 are Borel measurable.
18. Derive the expression for the joint density of the BM and its running maximum given in Corollary 3.4.5.
19. Let $W$ be a standard BM and $S$ its running maximum. Show that for all $x > 0$ and $t \ge 0$,
$$P(S_t \ge x) = P(\tau_x \le t) = 2P(W_t \ge x) = P(|W_t| \ge x).$$
20. Prove Corollary 3.4.6.
21. Prove Lemma 3.5.2.
22. Prove Lemma 3.5.3.
23. Prove the claim of Example 3.5.7. (Hint: use the explicit formula for the resolvent (Example 3.3.4) to show that for $f \in C_0$, it holds that $R_\lambda f \in C_0^2$ and $\lambda R_\lambda f - f = (R_\lambda f)''/2$. The fact that $C_0^2 \subseteq D(A)$ follows from Theorem 4.1.1.)
24. In the proof of Lemma 3.5.11, show that $P_x(\eta_r \ge nt) \le p^n$ for $n = 0, 1, \ldots$.
25. Let $(P_t)$ be a sub-Markovian semigroup on $(E, \mathcal{E})$ and let $\Delta$ be a point which does not belong to $E$. Put $E_\Delta = E \cup \{\Delta\}$, $\mathcal{E}_\Delta = \sigma(\mathcal{E} \cup \{\Delta\})$ and define $\tilde P_t$ by setting $\tilde P_t(x, A) = P_t(x, A)$ if $A \subseteq E$, $\tilde P_t(x, \{\Delta\}) = 1 - P_t(x, E)$ and $\tilde P_t(\Delta, \{\Delta\}) = 1$. Show that $(\tilde P_t)$ is a semigroup on $(E_\Delta, \mathcal{E}_\Delta)$.
26. Verify the expression for the transition density of the killed BM given in Example 3.6.2.

4 Special Feller processes

4.1 Brownian motion in $\mathbb{R}^d$

In this section we consider the Brownian motion in $\mathbb{R}^d$ for $d \in \mathbb{N}$. We will see that the recurrence and transience properties of this process depend on the dimension $d$ of the state space.

A standard BM in $\mathbb{R}^d$ (notation: BM$^d$) is simply an $\mathbb{R}^d$-valued process $W = (W^1, \ldots, W^d)^T$ such that the components $W^1, \ldots, W^d$ are independent, standard BM's in $\mathbb{R}$. The BM$^d$ is a process with stationary and independent increments. For $s \le t$, the increment $W_t - W_s$ has a $d$-dimensional Gaussian distribution with mean vector zero and covariance matrix $(t - s)I$. As in the one-dimensional case it follows that the BM$^d$ is a Markov process with transition function $P_t$ given by
$$P_t f(x) = \frac{1}{(2\pi t)^{d/2}}\int_{\mathbb{R}^d} f(y)\,e^{-\frac{\|y - x\|^2}{2t}}\,dy \qquad (4.1)$$
(see Example 3.1.5).
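As a quick numerical sanity check of (4.1) (a sketch, not part of the notes), one can estimate $P_t f(x) = E f(x + \sqrt{t}\,Z)$, $Z \sim N(0, I_d)$, by Monte Carlo. For the Gaussian test function $f(y) = \exp(-\|y\|^2/2)$ the integral in (4.1) is explicit: $P_t f(x) = (1 + t)^{-d/2}\exp(-\|x\|^2/(2(1 + t)))$.

```python
import math
import random

# Monte Carlo check of the BM^d transition semigroup (4.1):
# P_t f(x) = E f(x + sqrt(t) Z), with Z ~ N(0, I_d).
# For f(y) = exp(-||y||^2 / 2) the Gaussian integral is explicit:
#   P_t f(x) = (1 + t)^(-d/2) * exp(-||x||^2 / (2 (1 + t))).

random.seed(0)
d, t = 2, 0.7
x = [0.5, -1.0]

def f(y):
    return math.exp(-0.5 * sum(yi * yi for yi in y))

n = 100_000
mc = sum(f([xi + math.sqrt(t) * random.gauss(0.0, 1.0) for xi in x])
         for _ in range(n)) / n
exact = (1 + t) ** (-d / 2) * math.exp(-sum(xi * xi for xi in x) / (2 * (1 + t)))

print(mc, exact)   # the two values should agree to about 2-3 decimals
```

The closed form follows from completing the square in the Gaussian integral coordinate by coordinate.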
From this explicit expression it easily follows that the process is Feller. Now let $W$ be the canonical version of the BM$^d$ and consider the measures $P_x$ for $x \in \mathbb{R}^d$. By construction $W$ is a standard BM$^d$ under $P_0$. The semigroup (4.1) has the property that if $f \in C_0$ and $x \in \mathbb{R}^d$, then for $g(y) = f(y - x)$ it holds that $P_t g(x) = P_t f(0)$. This translation invariance implies that the law of $W$ under $P_x$ is equal to the law of $x + W$ under $P_0$ (see Exercise 1). In other words, under $P_x$ the process $W$ is just a standard BM$^d$, translated over the vector $x$.

The following theorem describes the infinitesimal generator of the BM$^d$. It generalizes Example 3.5.7 to higher dimensions.

Theorem 4.1.1. The domain of the generator of the BM$^d$ contains the space $C_0^2$ of functions on $\mathbb{R}^d$ with two continuous derivatives that vanish at infinity. On $C_0^2$ the generator equals $\Delta/2$, where $\Delta = \sum_{i=1}^d \partial^2/\partial y_i^2$ is the Laplacian on $\mathbb{R}^d$.

Proof. Let $x \in \mathbb{R}^d$ be fixed. Define $p_t(y) = (2\pi t)^{-d/2}\exp(-\|y\|^2/2t)$, so that for $f \in C_0$,
$$P_t f(x) = \int_{\mathbb{R}^d} f(x + y)\,p_t(y)\,dy.$$
The functions $p_t(y)$ satisfy the partial differential equation (the heat equation)
$$\frac{\partial}{\partial t}p_t(y) = \frac{1}{2}\Delta p_t(y).$$
It follows that for $0 < \varepsilon \le t$,
$$P_t f(x) - P_\varepsilon f(x) = \int_{\mathbb{R}^d} f(x + y)\int_\varepsilon^t \frac{\partial}{\partial s}p_s(y)\,ds\,dy = \int_{\mathbb{R}^d} f(x + y)\int_\varepsilon^t \frac{1}{2}\Delta p_s(y)\,ds\,dy = \int_\varepsilon^t \int_{\mathbb{R}^d} f(x + y)\,\frac{1}{2}\Delta p_s(y)\,dy\,ds.$$
The interchanging of the integrals is justified by the fact that $f \in C_0$. Now suppose that $f \in C_0^2$. Then integrating by parts twice yields
$$\int_{\mathbb{R}^d} f(x + y)\,\frac{1}{2}\Delta p_s(y)\,dy = \int_{\mathbb{R}^d} p_s(y)\,\frac{1}{2}\Delta f(x + y)\,dy = \frac{1}{2}P_s\Delta f(x).$$
If we insert this in the preceding display and let $\varepsilon \to 0$, we find that
$$P_t f(x) - f(x) = \int_0^t \frac{1}{2}P_s\Delta f(x)\,ds.$$
Divide by $t$ and let $t \to 0$ to complete the proof.

In combination with Dynkin's formula, the explicit expression $\Delta/2$ for the generator of the BM$^d$ allows us to investigate the recurrence properties of the higher dimensional BM.
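The statement of Theorem 4.1.1 can be illustrated numerically (a sketch, using the same Gaussian test function as before, which is in $C_0^2$): for $f(y) = \exp(-\|y\|^2/2)$ one has the explicit semigroup $P_t f(x) = (1 + t)^{-d/2}\exp(-\|x\|^2/(2(1 + t)))$ and the Laplacian $\Delta f(x) = (\|x\|^2 - d)f(x)$, so the difference quotient $(P_t f(x) - f(x))/t$ should approach $\tfrac{1}{2}\Delta f(x)$ as $t \to 0$.

```python
import math

# Check of Theorem 4.1.1: (P_t f(x) - f(x)) / t -> (1/2) Laplacian f(x)
# for the Gaussian test function f(y) = exp(-||y||^2 / 2), whose heat
# semigroup is explicit:
#   P_t f(x) = (1 + t)^(-d/2) * exp(-||x||^2 / (2 (1 + t))),
# and whose Laplacian is (||x||^2 - d) f(x).

d = 2
x = [0.5, -1.0]
r2 = sum(xi * xi for xi in x)           # ||x||^2

def Ptf(t):
    return (1 + t) ** (-d / 2) * math.exp(-r2 / (2 * (1 + t)))

f_x = math.exp(-r2 / 2)
laplacian_half = 0.5 * (r2 - d) * f_x   # (1/2) Laplacian f(x)

t = 1e-4
quotient = (Ptf(t) - Ptf(0)) / t        # difference quotient at small t
print(quotient, laplacian_half)         # should agree to ~4 decimals
```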
For $a \ge 0$, consider the stopping time $\tau_a = \inf\{t > 0 : \|W_t\| = a\}$, which is the first hitting time of the ball of radius $a$ around the origin. The following theorem extends the result of Exercise 14 of Chapter 2 to higher dimensions.

Theorem 4.1.2. For $0 < a < \|x\| < b$,
$$P_x(\tau_a < \tau_b) = \begin{cases} \dfrac{\log b - \log\|x\|}{\log b - \log a}, & d = 2, \\[2mm] \dfrac{\|x\|^{2-d} - b^{2-d}}{a^{2-d} - b^{2-d}}, & d \ge 3. \end{cases}$$

Proof. Let $f$ be a $C^\infty$-function with compact support on $\mathbb{R}^d$ such that for all $x \in \mathbb{R}^d$ with $a \le \|x\| \le b$,
$$f(x) = \begin{cases} \log\|x\|, & d = 2, \\ \|x\|^{2-d}, & d \ge 3. \end{cases}$$
Observe that $\tau_a \wedge \tau_b \le \inf\{t : |W_t^1| = b\}$ under $P_x$, so by Exercise 15 of Chapter 2, $E_x(\tau_a \wedge \tau_b) < \infty$. Hence, by Dynkin's formula and the fact that $f \in C_0^2$ and $\Delta f(x) = 0$ for $a \le \|x\| \le b$, we have
$$E_x f(W_{\tau_a \wedge \tau_b}) = f(x).$$
For $d = 2$ this yields
$$\log\|x\| = f(x) = E_x f(W_{\tau_a})1_{\{\tau_a < \tau_b\}} + E_x f(W_{\tau_b})1_{\{\tau_a > \tau_b\}} = P_x(\tau_a < \tau_b)\log a + P_x(\tau_a > \tau_b)\log b,$$
which proves the first statement of the theorem. The second statement follows from Dynkin's formula by the same reasoning, see Exercise 2.

The following corollary follows from the preceding theorem by letting $b$ tend to infinity.

Corollary 4.1.3. For all $x \in \mathbb{R}^d$ such that $0 < a < \|x\|$,
$$P_x(\tau_a < \infty) = \begin{cases} 1, & d = 2, \\ \|x\|^{2-d}/a^{2-d}, & d \ge 3. \end{cases}$$

Proof. Note that by the continuity of sample paths,
$$P_x(\tau_a < \infty) = P_x\Big( \bigcup_{b > \|x\|} \{\tau_a < \tau_b\} \Big).$$
It follows that
$$P_x(\tau_a < \infty) = \lim_{b \to \infty} P_x(\tau_a < \tau_b),$$
whence the statement follows from Theorem 4.1.2.

By the translation invariance of the BM, this corollary shows that with probability 1, the BM$^2$ hits every disc with positive radius, and hence every nonempty open set. It also implies that for $d \ge 3$, the BM$^d$ ultimately leaves every bounded set forever (see Exercise 3). We say that the BM$^2$ is neighbourhood-recurrent and that for $d \ge 3$, the BM$^d$ is transient.

The following corollary shows that the BM$^2$ is not 'point-recurrent' like the BM$^1$: a given point is hit with probability zero.

Corollary 4.1.4. For the BM$^2$ we have $P_x(\tau_0 < \infty) = 0$ for all $x \ne 0$.

Proof. Note that
$$\{\tau_0 < \infty\} = \bigcup_n \{\tau_0 < \tau_n\} = \bigcup_n \bigcap_{m > \|x\|^{-1}} \{\tau_{1/m} < \tau_n\}.$$
Hence, the corollary follows from the fact that
$$P_x\Big( \bigcap_{m > \|x\|^{-1}} \{\tau_{1/m} < \tau_n\} \Big) = \lim_{m \to \infty} P_x(\tau_{1/m} < \tau_n) = 0,$$
by Theorem 4.1.2.

If we combine all the recurrence-transience results, we arrive at the following theorem.

Theorem 4.1.5. The BM$^1$ is recurrent, the BM$^2$ is neighbourhood-recurrent and for $d \ge 3$, the BM$^d$ is transient.

4.2 Feller diffusions

Markov processes with continuous sample paths are often called diffusions. In the present section we consider Feller diffusions; their precise definition is given below. In addition to continuity we require that the domain of the generator of the process contains the collection of functions that are infinitely often differentiable, and whose support is compact and contained in the interior of the state space $E$. This collection of functions is denoted by $C_c^\infty(\mathrm{int}(E))$.

Definition 4.2.1. A Feller process with state space $E \subseteq \mathbb{R}^d$ is called a Feller diffusion if it has continuous sample paths and the domain of its generator contains the function space $C_c^\infty(\mathrm{int}(E))$.

Theorem 4.1.1 shows that the BM$^d$ is a Feller diffusion. More generally we have the following example.

Example 4.2.2. Let $W$ be a BM$^d$, $\sigma$ an invertible $d \times d$ matrix, $b \in \mathbb{R}^d$ and define the process $X$ by $X_t = bt + \sigma W_t$. Then $X$ is a Feller diffusion. Its generator $A$ satisfies
$$Af(x) = \sum_i b_i\frac{\partial f}{\partial x_i}(x) + \frac{1}{2}\sum_i\sum_j a_{ij}\frac{\partial^2 f}{\partial x_i\partial x_j}(x)$$
for $f \in C_0^2(\mathbb{R}^d)$, where $a = \sigma\sigma^T$. If $b = 0$ this can be seen by relating the semigroup of $X$ to that of $W$ (see Exercise 4). The general case is a consequence of Exercise 5.

The process $X$ in the preceding example is called a $d$-dimensional Brownian motion with drift $b$ and diffusion matrix $a$. Below we will prove that a general Feller diffusion locally behaves like such a process. In the general case the drift vector $b$ and the diffusion matrix $a$ depend on the space variable $x$.
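As a concrete illustration of the hitting probabilities of Theorem 4.1.2 and the limit argument behind Corollary 4.1.3 (a numerical sketch, not part of the notes): for $d \ge 3$ the probability $P_x(\tau_a < \tau_b)$ increases with $b$ and converges to $(a/\|x\|)^{d-2}$ as $b \to \infty$.

```python
# Hitting probabilities from Theorem 4.1.2 for d >= 3:
#   P_x(tau_a < tau_b) = (r^(2-d) - b^(2-d)) / (a^(2-d) - b^(2-d)),  r = ||x||,
# which converges to (a / r)^(d-2) as b -> infinity (Corollary 4.1.3).

def hit_prob(r, a, b, d):
    """P_x(tau_a < tau_b) for ||x|| = r, with 0 < a < r < b and d >= 3."""
    return (r ** (2 - d) - b ** (2 - d)) / (a ** (2 - d) - b ** (2 - d))

d, a, r = 3, 1.0, 2.0
probs = [hit_prob(r, a, b, d) for b in (4.0, 16.0, 256.0, 1e6)]
limit = (a / r) ** (d - 2)      # Corollary 4.1.3: equals 0.5 here

print(probs, limit)             # probs increase towards 0.5
```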
We need the following lemma, which collects some properties of the generator of a Feller diffusion.

Lemma 4.2.3. Let $X$ be a Feller diffusion on $E$ with generator $A$.
(i) If $f \in D(A)$ and $f(y) \le f(x)$ for all $y \in E$, then $Af(x) \le 0$.
(ii) If $f, g \in D(A)$ coincide in a neighborhood of $x \in \mathrm{int}(E)$, then $Af(x) = Ag(x)$.
(iii) If $f \in D(A)$ has a local maximum at $x \in \mathrm{int}(E)$, then $Af(x) \le 0$.
(iv) If $f \in D(A)$ is constant in a neighborhood of $x \in \mathrm{int}(E)$, then $Af(x) = 0$.

Proof. Part (i) is valid for general Feller processes, see the remarks preceding Theorem 3.5.8. Part (ii) follows from Theorem 3.5.12 and the continuity of the sample paths. Combined, (i) and (ii) yield (iii). The last part follows from Theorem 3.5.12 again, by noting that if $f$ is constant near $x$, the numerator in (3.7) vanishes for small $r$.

We can now prove that the generator of a Feller diffusion is a second order differential operator.

Theorem 4.2.4. Let $X$ be a Feller diffusion on $E \subseteq \mathbb{R}^d$. Then there exist continuous functions $a_{ij}$ and $b_i$ on $\mathrm{int}(E)$ such that for $f \in C_c^\infty(\mathrm{int}(E))$ and $x \in \mathrm{int}(E)$,
$$Af(x) = \sum_i b_i(x)\frac{\partial f}{\partial x_i}(x) + \frac{1}{2}\sum_i\sum_j a_{ij}(x)\frac{\partial^2 f}{\partial x_i\partial x_j}(x).$$
Moreover, the matrix $(a_{ij}(x))$ is symmetric and nonnegative definite for every $x \in \mathrm{int}(E)$.

Proof. Fix $x \in \mathrm{int}(E)$ and pick $\varphi_i \in C_c^\infty(\mathrm{int}(E))$ such that $\varphi_i(y) = y_i - x_i$ for $y$ in a neighborhood of $x$. Let $\varphi \in C_c^\infty(\mathrm{int}(E))$ be a function which is identically 1 in a neighborhood of $x$. Define $b_i(x) = A\varphi_i(x)$ and $a_{ij}(x) = A\varphi_i\varphi_j(x)$. Note that $a$ and $b$ are well defined by part (ii) of the lemma, and since $A : D(A) \to C_0$, they are continuous. Also observe that for all $\lambda_1, \ldots, \lambda_d \in \mathbb{R}$, the function
$$y \mapsto -\Big( \sum_i \lambda_i\varphi_i(y) \Big)^2,$$
defined in a neighborhood of $x$, has a local maximum at $x$. By part (iii) of the lemma this implies that
$$\sum_i\sum_j \lambda_i\lambda_j a_{ij}(x) \ge 0,$$
i.e. $(a_{ij}(x))$ is nonnegative definite.

By Taylor's formula it holds that $f(y) = g(y) + o(\|x - y\|^2)$ for $y$ near $x$, where
$$g(y) = f(x)\varphi(y) + \sum_i \varphi_i(y)\frac{\partial f}{\partial x_i}(x) + \frac{1}{2}\sum_i\sum_j \varphi_i(y)\varphi_j(y)\frac{\partial^2 f}{\partial x_i\partial x_j}(x).$$
Part (iv) of the lemma and the definitions of $a$ and $b$ imply that
$$Ag(x) = \sum_i b_i(x)\frac{\partial f}{\partial x_i}(x) + \frac{1}{2}\sum_i\sum_j a_{ij}(x)\frac{\partial^2 f}{\partial x_i\partial x_j}(x),$$
so it remains to show that $Af(x) = Ag(x)$. To prove this, fix $\varepsilon > 0$ and consider the function $y \mapsto f(y) - g(y) - \varepsilon\|y - x\|^2$, defined in a neighborhood of $x$. Near $x$ this function is $\|y - x\|^2(o(1) - \varepsilon)$, and hence it has a local maximum at $x$. By part (iii) of the lemma we get
$$Af(x) - Ag(x) - \varepsilon\sum_i a_{ii}(x) \le 0,$$
and by letting $\varepsilon \to 0$ we obtain $Af(x) \le Ag(x)$. Reversing the roles of $f$ and $g$ yields $Af(x) \ge Ag(x)$. This completes the proof.

In view of Example 4.2.2, we call the functions $b$ and $a$ exhibited in Theorem 4.2.4 the drift and diffusion of the process $X$, respectively. The proof of the theorem shows that $b$ and $a$ determine the infinitesimal movements of the diffusion process $X = (X^1, \ldots, X^d)$, in the sense that for every $x \in \mathrm{int}(E)$ and small $h$, it holds that
$$E_x X_h^i - x_i \approx P_h\varphi_i(x) \approx hA\varphi_i(x) = b_i(x)h.$$
In other words, $E(X_{t+h} \mid X_t = x) \approx x + hb(x)$. Similarly, we have
$$E\big( (X_{t+h}^i - x_i)(X_{t+h}^j - x_j) \mid X_t = x \big) \approx ha_{ij}(x).$$

Next we investigate when a function of a Feller diffusion is again a Feller diffusion, and how the drift and diffusion functions of the new process are related to those of the old one.

Theorem 4.2.5. Let $X$ be a Feller diffusion on $E \subseteq \mathbb{R}^d$ and let $\varphi : E \to \tilde E \subseteq \mathbb{R}^n$ be continuous and onto, and assume that $\|\varphi(x_n)\| \to \infty$ in $\tilde E$ if and only if $\|x_n\| \to \infty$ in $E$. Suppose that $(\tilde P_t)$ is a collection of transition kernels such that $P_t(f \circ \varphi) = (\tilde P_t f) \circ \varphi$ for all $f \in C_0(\tilde E)$, so that $\tilde X = \varphi(X)$ is a Feller process with state space $\tilde E$ and transition function $(\tilde P_t)$.
(i) If $\varphi$ is infinitely often differentiable on $\mathrm{int}(E)$ and for every compact $K \subseteq \mathrm{int}(\tilde E)$, $\varphi^{-1}(K)$ is contained in a compact subset of $\mathrm{int}(E)$, then $\tilde X$ is a Feller diffusion on $\tilde E$.
(ii) In that case the drift and diffusion $\tilde b$ and $\tilde a$ of $\tilde X$ satisfy
$$\tilde b_k(\varphi(x)) = \sum_i b_i(x)\frac{\partial\varphi_k}{\partial x_i}(x) + \frac{1}{2}\sum_i\sum_j a_{ij}(x)\frac{\partial^2\varphi_k}{\partial x_i\partial x_j}(x),$$
$$\tilde a_{kl}(\varphi(x)) = \sum_i\sum_j a_{ij}(x)\frac{\partial\varphi_k}{\partial x_i}(x)\frac{\partial\varphi_l}{\partial x_j}(x)$$
for $\varphi(x) \in \mathrm{int}(\tilde E)$ and $k, l = 1, \ldots, n$.

Proof. (i). Since $\varphi$ is continuous the process $\tilde X$ is continuous, so we only have to show that $C_c^\infty(\mathrm{int}(\tilde E))$ is contained in the domain of the generator $\tilde A$ of $\tilde X$. So take $f \in C_c^\infty(\mathrm{int}(\tilde E))$. The assumptions on $\varphi$ imply that $f \circ \varphi \in C_c^\infty(\mathrm{int}(E)) \subseteq D(A)$. Hence, by Lemma 3.5.2, $f \in D(\tilde A)$.

(ii). To simplify the notation we write $\partial_i$ instead of $\partial/\partial x_i$ for the derivative with respect to the $i$th variable. By the chain rule we have
$$\partial_i(f \circ \varphi)(x) = \sum_k \partial_k f(\varphi(x))\,\partial_i\varphi_k(x)$$
and (check!)
$$\partial_i\partial_j(f \circ \varphi)(x) = \sum_k\sum_l \partial_k\partial_l f(\varphi(x))\,\partial_i\varphi_k(x)\,\partial_j\varphi_l(x) + \sum_k \partial_i\partial_j\varphi_k(x)\,\partial_k f(\varphi(x)).$$
Now compare $A(f \circ \varphi)(x)$ with $\tilde Af(\varphi(x))$ and apply Lemma 3.5.2 to complete the proof (Exercise 6).

Example 4.2.6 (Bessel processes). Let $X$ be a BM$^d$ with $d \ge 2$ and define $\tilde X = \|X\|$. This process is called the Bessel process of order $d$ and is denoted by BES$^d$. The transition function $P_t$ of the BM$^d$ (see (4.1)) has the property that if $f \in C_0(\mathbb{R}^d)$ and $f(x)$ depends only on $\|x\|$, then $P_t f(x)$ also depends only on $\|x\|$ (check!). So if $f \in C_0(\mathbb{R}_+)$, $x$ is a point on the unit sphere in $\mathbb{R}^d$ and $r \ge 0$, then $\tilde P_t f(r) = P_t(f \circ \|\cdot\|)(rx)$ is independent of $x$, and this defines a transition kernel $\tilde P_t$ on $\mathbb{R}_+$. The function $\varphi = \|\cdot\|$ and the semigroup $(\tilde P_t)$ satisfy the conditions of Theorem 4.2.5, so the Bessel process $\tilde X$ is a Feller diffusion on $\mathbb{R}_+$. For $\varphi(x) = \|x\| > 0$, we have
$$\frac{\partial\varphi}{\partial x_i}(x) = \frac{x_i}{\|x\|}, \qquad \frac{\partial^2\varphi}{\partial x_i\partial x_j}(x) = \frac{\delta_{ij}}{\|x\|} - \frac{x_i x_j}{\|x\|^3}.$$
By part (ii) of the preceding theorem, it follows that the drift $\tilde b$ and diffusion $\tilde a$ of the BES$^d$ are given by
$$\tilde b(x) = \frac{d - 1}{2x}, \qquad \tilde a(x) = 1 \qquad (4.2)$$
(see Exercise 7). Observe that for $x$ close to 0 the drift upward becomes very large. As a consequence the process always gets "pulled up" when it approaches 0, so that it stays nonnegative all the time.
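The computation leading to (4.2) can be checked numerically (a sketch, not part of the notes): for the BM$^d$ we have $b = 0$ and $a = I$, so the formulas of Theorem 4.2.5(ii) reduce to $\tilde b = \tfrac{1}{2}\sum_i \partial^2\varphi/\partial x_i^2$ and $\tilde a = \sum_i(\partial\varphi/\partial x_i)^2$ with $\varphi(x) = \|x\|$. Evaluating these derivatives by central finite differences recovers $(d-1)/(2\|x\|)$ and $1$.

```python
import math

# Finite-difference check of the Bessel drift and diffusion (4.2):
# for phi(x) = ||x||, b = 0 and a = I (the BM^d case), Theorem 4.2.5(ii) gives
#   drift = (1/2) * sum_i d^2 phi/dx_i^2 = (d-1) / (2 ||x||),
#   diffusion = sum_i (d phi/dx_i)^2 = 1.

def phi(x):
    return math.sqrt(sum(xi * xi for xi in x))

def partial(i, x, h=1e-5):
    xp, xm = list(x), list(x)
    xp[i] += h; xm[i] -= h
    return (phi(xp) - phi(xm)) / (2 * h)

def second_partial(i, x, h=1e-4):
    xp, xm = list(x), list(x)
    xp[i] += h; xm[i] -= h
    return (phi(xp) - 2 * phi(x) + phi(xm)) / (h * h)

x = [0.6, -0.8, 1.2]              # a point in R^3 away from the origin
d, r = len(x), phi(x)

drift = 0.5 * sum(second_partial(i, x) for i in range(d))
diffusion = sum(partial(i, x) ** 2 for i in range(d))

print(drift, (d - 1) / (2 * r))   # both ~ (d-1) / (2 ||x||)
print(diffusion)                  # ~ 1
```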
If we combine Theorem 4.2.4 and Example 4.2.2 we see that a Feller diffusion $X$ can be viewed as a BM with space dependent drift and diffusion. Loosely speaking, it holds that around time $t$, $X_t$ moves like a BM with drift $b(X_t)$ and diffusion $a(X_t)$. If we consider the one-dimensional case for simplicity, this means that for $s$ close to $t$, we have something like $X_s = b(X_t)s + \sigma(X_t)W_s$, where $W$ is a BM (this is of course purely heuristic). If we consider this equation for $s = t + h$ and $s = t$ and subtract, we see that for small $h$, we must have
$$X_{t+h} - X_t \approx b(X_t)h + \sigma(X_t)(W_{t+h} - W_t).$$
This suggests that a one-dimensional Feller diffusion with drift $b$ and diffusion $\sigma^2$ satisfies a stochastic differential equation (SDE) of the form
$$dX_t = b(X_t)\,dt + \sigma(X_t)\,dW_t, \qquad (4.3)$$
with $W$ a Brownian motion. Of course, it is not at all clear what exactly we mean by this. In particular, the term $dW_t$ is somewhat disturbing. Since we know that the Brownian sample paths are of unbounded variation, the path $W(\omega)$ does not define a (signed) measure $dW(\omega)$. This means that in general, we cannot make sense of the SDE (4.3) by viewing it as a pathwise equation for the process $X$. It turns out however that there is a way around this, using the fact that the Brownian motion has finite quadratic variation. The SDE (4.3) can be given a precise meaning and it can be shown that indeed, under certain assumptions on $b$ and $\sigma$, it has a unique solution $X$ which is a Markov process with generator
$$Af(x) = b(x)f'(x) + \tfrac{1}{2}\sigma^2(x)f''(x).$$
The theory of SDEs and the associated stochastic calculus are very useful tools in the study of diffusions.

It is interesting to see how Theorem 4.2.5 translates into the language of SDEs. The theorem states that if $X$ is a one-dimensional Feller diffusion with drift $b$ and diffusion $\sigma^2$ and $\varphi$ is a smooth function, then, under certain conditions, $Y = \varphi(X)$ is a Feller diffusion with drift $\tilde b$ and diffusion $\tilde\sigma^2$ given by
$$\tilde b(\varphi(x)) = b(x)\varphi'(x) + \tfrac{1}{2}\sigma^2(x)\varphi''(x), \qquad \tilde\sigma^2(\varphi(x)) = \sigma^2(x)(\varphi'(x))^2.$$
In the language of SDEs, this shows that if $X$ satisfies (4.3), then $\varphi(X)$ satisfies
$$d\varphi(X_t) = \big( b(X_t)\varphi'(X_t) + \tfrac{1}{2}\sigma^2(X_t)\varphi''(X_t) \big)\,dt + \sigma(X_t)\varphi'(X_t)\,dW_t. \qquad (4.4)$$
This is a special case of a central result in stochastic calculus, known as Itô's formula. It shows in particular that there is a difference between stochastic calculus and ordinary calculus. Indeed, if the rules of ordinary calculus were valid in this case, we would have
$$d\varphi(X_t) = \varphi'(X_t)\,dX_t = b(X_t)\varphi'(X_t)\,dt + \sigma(X_t)\varphi'(X_t)\,dW_t.$$
The presence of the additional term $\tfrac{1}{2}\sigma^2(X_t)\varphi''(X_t)\,dt$ in (4.4) is typical for stochastic calculus, and is caused by the quadratic variation of the Brownian sample paths. See the references for texts on stochastic calculus and stochastic differential equations.

4.3 Feller processes on discrete spaces

Consider a Markov process $X$ on a discrete state space $E$, meaning that $E$ is countable and that it is endowed with the discrete topology. Then $E$ is a Polish space and its Borel $\sigma$-algebra $\mathcal{E}$ consists of all subsets of $E$. As usual we denote the semigroup of $X$ by $(P_t)$ and we consider the canonical version with underlying measures $P_x$, $x \in E$. It is useful to introduce the numbers
$$p_t(x, y) = P_t(x, \{y\}) = P_x(X_t = y)$$
for $t \ge 0$ and $x, y \in E$. These transition probabilities of course satisfy
$$\sum_y p_t(x, y) = 1 \qquad (4.5)$$
for $x \in E$ and $t \ge 0$, and the Chapman-Kolmogorov equations reduce to
$$p_{s+t}(x, y) = \sum_z p_s(x, z)\,p_t(z, y) \qquad (4.6)$$
for $s, t \ge 0$ and $x, y \in E$.
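Conditions (4.5) and (4.6) are easy to check concretely. A small illustration (a sketch, not part of the notes): for the two-state Markov chain on $E = \{0, 1\}$ with jump rates $\alpha$ ($0 \to 1$) and $\beta$ ($1 \to 0$), the transition probabilities are known in closed form, e.g. $p_t(0, 0) = (\beta + \alpha e^{-(\alpha+\beta)t})/(\alpha + \beta)$, and both conditions can be verified numerically.

```python
import math

# Check of (4.5) and (4.6) for the two-state chain on E = {0, 1} with
# jump rates alpha (0 -> 1) and beta (1 -> 0); the matrix (p_t(x, y))
# is the matrix exponential exp(tQ), given here in closed form.

alpha, beta = 1.3, 0.4

def p(t):
    """2x2 matrix (p_t(x, y)) for the two-state chain."""
    e = math.exp(-(alpha + beta) * t)
    s = alpha + beta
    return [[(beta + alpha * e) / s, alpha * (1 - e) / s],
            [beta * (1 - e) / s, (alpha + beta * e) / s]]

s, t = 0.5, 1.25
Ps, Pt, Pst = p(s), p(t), p(s + t)

# (4.5): each row sums to one
rows_ok = all(abs(sum(row) - 1) < 1e-12 for row in Ps)

# (4.6): p_{s+t}(x, y) = sum_z p_s(x, z) p_t(z, y)
ck_ok = all(abs(sum(Ps[x][z] * Pt[z][y] for z in range(2)) - Pst[x][y]) < 1e-12
            for x in range(2) for y in range(2))

print(rows_ok, ck_ok)
```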
Conversely, every collection of nonnegative numbers {pt(x, y) : t ≥ 0, x, y ∈ E} satisfying (4.5) and (4.6) defines a transition function (Pt) on (E, E), given by

Pt(x, B) = Σ_{y∈B} pt(x, y),

and hence for every x ∈ E there exists a measure Px on the canonical space such that under Px, the canonical process X is a Markov process which starts in x and which has transition probabilities pt(x, y).

To verify whether a Markov process on a discrete space is Feller we have to understand what the function space C0 looks like in this case. Since the space E is discrete, every function on E is automatically continuous. For a sequence xn in E, the convergence xn → ∞ means that xn is outside every given finite subset of E for n large enough. Hence, f ∈ C0 means that for every ε > 0, there exists a finite set F ⊆ E such that |f(x)| < ε for all x outside F. Note in particular that if the state space E itself is finite, every function on E belongs to C0.

Example 4.3.1 (Finite state space). If the state space E is finite, then every function belongs to C0, so PtC0 ⊆ C0 is always true. It follows that a Markov process on a finite state space E is Feller if and only if its transition probabilities satisfy, for all x, y ∈ E,

pt(x, y) → δx,y as t → 0.

Example 4.3.2 (Poisson process). Let λ > 0 be a fixed parameter. For t > 0 and x, y ∈ Z+, define

pt(x, y) = e^{−λt}(λt)^k/k! if y = x + k for k ∈ Z+, and pt(x, y) = 0 otherwise,   (4.7)

and put p0(x, y) = δx,y. Upon recognizing Poisson probabilities we see that (4.5) holds. As for (4.6), observe that for y = x + k,

Σ_z ps(x, z)pt(z, y) = Σ_{l=0}^{k} ps(x, x + l)pt(x + l, x + k) = (e^{−λ(s+t)}/k!) Σ_{l=0}^{k} (k choose l)(λs)^l(λt)^{k−l}.

By Newton's binomial theorem the sum on the right-hand side equals (λ(s + t))^k, so the Chapman-Kolmogorov equalities are indeed satisfied. It follows that the transition probabilities (4.7) define a Markov process on Z+. This process is called the Poisson process with intensity λ.
Observe that if N is a Poisson process with intensity λ, then for k ∈ Z+

P0(Nt = k) = pt(0, k) = e^{−λt}(λt)^k/k!.

So under P0, the random variable Nt has a Poisson distribution with parameter λt. To verify that this process is Feller, take a function f on Z+ such that f(x) → 0 as x → ∞. Then

Pt f(x) = Σ_y f(y)pt(x, y) = e^{−λt} Σ_{k≥0} ((λt)^k/k!) f(x + k).

The summand on the right-hand side vanishes as x → ∞ and is bounded by ‖f‖∞(λt)^k/k!. Hence, by dominated convergence, Pt f(x) → 0 as x → ∞. This shows that PtC0 ⊆ C0. Note that we can also write

Pt f(x) = e^{−λt}f(x) + e^{−λt} Σ_{k≥1} ((λt)^k/k!) f(x + k).

This shows that Pt f(x) → f(x) as t → 0. Hence, the Poisson process is Feller. Using the explicit expression for Pt above it is rather straightforward to compute the generator A of the Poisson process with intensity λ. It holds that D(A) = C0 and Af(x) = λ(f(x + 1) − f(x)) (see Exercise 8). We will see below that the form of the generator reveals much about the structure of the Poisson process.

The motion of a Feller process X on a discrete space E is of a simple nature. The process stays in a certain state for a random amount of time, then jumps to the next state, etc. For x ∈ E, consider the stopping time

σx = inf{t > 0 : Xt ≠ x}.

From Exercise 13 of Chapter 3 we know that under Px, the random time σx has an exponential distribution, say with parameter λ(x) ∈ [0, ∞) (λ(x) = 0 meaning that σx = ∞, Px-a.s., i.e. that x is an absorption point). This defines a function λ : E → [0, ∞), where λ(x) is interpreted as the rate at which the state x is left by the process X. Now start the process in a point x ∈ E with λ(x) > 0. Then at time σx the process jumps from state x to Xσx. So if for x ≠ y in E we define

q(x, y) = Px(Xσx = y),

we can interpret q(x, y) as the probability that the process jumps to y when it leaves state x.
Clearly, we have that q(x, y) = Q(x, {y}), where Q is the transition kernel on E defined by

Q(x, B) = Px(Xσx ∈ B).

Together, the rate function λ and the transition kernel Q (or, equivalently, the transition probabilities q(x, y)) determine the entire motion of the process. When started in x the process stays there for an exponential(λ(x))-distributed amount of time. Then it chooses a new state y according to the probability distribution Q(x, ·) and jumps there. By the strong Markov property this procedure then repeats itself. The process stays in y for an exponential(λ(y))-distributed amount of time and then jumps to a new state z chosen according to the probability distribution Q(y, ·), etc. We call the function λ constructed above the rate function of the process, and Q the transition kernel. The rate function and transition kernel of a Feller process on a discrete space can easily be read off from the generator.

Theorem 4.3.3. Let X be a Feller process on a discrete space E, with rate function λ and transition kernel Q. Then the domain D(A) of its generator A consists of all functions f ∈ C0 such that λ(Qf − f) ∈ C0, and for f ∈ D(A),

Af(x) = λ(x)(Qf(x) − f(x)) = λ(x)(Σ_{y≠x} f(y)q(x, y) − f(x)).

Proof. This follows from Theorem 3.5.12, see Exercise 9.

Example 4.3.4 (Poisson process, continued). Using Theorem 4.3.3 we can now read off the rate function and transition kernel of the Poisson process from the expression Af(x) = λ(f(x + 1) − f(x)) that we have for its generator. We conclude that the rate function is identically equal to λ, and that q(x, x + 1) = Q(x, {x + 1}) = 1, and all other transition probabilities are 0. So the jumps of the Poisson process are all equal to 1 (a process with this property is called a counting process), and the time between two consecutive jumps is exponentially distributed with parameter λ.
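The hold-and-jump description above translates directly into a simulation recipe: hold in the current state for an exponential(λ(x)) time, then draw the next state from Q(x, ·). A minimal sketch, with `rate` and `jump_to` standing in for λ and Q (the names and the Poisson-type example are ours):

```python
import random

def simulate_jump_process(x0, rate, jump_to, T, rng):
    """Hold in x for an Exp(rate(x)) time, then jump according to jump_to(x, rng)."""
    t, x, path = 0.0, x0, [(0.0, x0)]
    while True:
        lam = rate(x)
        if lam == 0:              # absorption point: the process stays forever
            break
        t += rng.expovariate(lam)  # exponential holding time with parameter rate(x)
        if t >= T:
            break
        x = jump_to(x, rng)
        path.append((t, x))
    return path

# Poisson-type example: constant rate, deterministic jumps of size 1.
rng = random.Random(1)
path = simulate_jump_process(0, rate=lambda x: 2.0,
                             jump_to=lambda x, r: x + 1, T=5.0, rng=rng)
```

With these choices the simulated path is a realization of the Poisson process with intensity 2 on [0, 5], in line with Example 4.3.4: all jumps have size 1 and the inter-jump times are exponential.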
The preceding example shows that the Poisson process can be constructed by starting with the deterministic process which has jumps of size 1 at times 1, 2, 3, . . . and then replacing the deterministic times between the jumps by independent exponential times. More generally, any discrete-time Markov chain can be transformed into a continuous-time Markov process by replacing the fixed times between the jumps by independent exponential times. To make this precise, let Q be a transition kernel on E and λ > 0. For t ≥ 0 and f a bounded function on E, define

Pt f(x) = e^{−λt} Σ_{n=0}^{∞} Q^n f(x)(λt)^n/n!,   (4.8)

where Q^n = Q ∘ · · · ∘ Q (n times). Using Newton's binomial theorem it is easily verified that (Pt) is a transition semigroup on E (see Exercise 10), so we can consider the associated Markov process on E. The intuition behind the construction of this process is as follows. Let Y = (Yn) be a discrete-time Markov chain with transition kernel Q, meaning that given Yn = y, the random variable Yn+1 has distribution Q(y, ·). Set τ0 = 0 and let the waiting times τ1 − τ0, τ2 − τ1, . . . be i.i.d., all exponentially distributed with parameter λ and independent of Y. Now define the process X by

Xt = Σ_{n=0}^{∞} Yn 1_{[τn, τn+1)}(t).

From the lack of memory of the exponential distribution and the Markov property of Y it is intuitively clear that X has the Markov property as well. Its transition function (Pt) is given by

Pt f(x) = Ex f(Xt) = Σ_{n=0}^{∞} Ex f(Yn)Px(τn ≤ t < τn+1) = e^{−λt} Σ_{n=0}^{∞} Q^n f(x)(λt)^n/n!.

Here we have used the Markov property of Y to see that

Ex f(Yn) = Ex Ex(f(Yn) | Yn−1, . . . , Y0) = Ex Qf(Yn−1).

Iterating this yields Ex f(Yn) = Q^n f(x). Moreover, with Nt a Poisson process that starts in 0, we have that P(τn ≤ t < τn+1) = P(Nt = n) = e^{−λt}(λt)^n/n!.

Theorem 4.3.5. If Q is a transition kernel on E such that QC0 ⊆ C0 and λ > 0, then (4.8) defines a Feller process on E whose generator A satisfies D(A) = C0 and Af = λ(Qf − f).

Proof.
The properties PtC0 ⊆ C0 and Pt f(x) → f(x) as t → 0 follow from (4.8) by dominated convergence (see Exercise 11). We also see from (4.8) that

(Pt f(x) − f(x))/t = λQf(x)e^{−λt} + f(x)(e^{−λt} − 1)/t + O(t).

Letting t → 0 the right-hand side converges to λ(Qf(x) − f(x)).

If in the present setting we define q(x, y) = Q(x, {y}) for x, y ∈ E, the generator A can be expressed as

Af(x) = λ(1 − q(x, x)) Σ_{y≠x} (q(x, y)/(1 − q(x, x)))(f(y) − f(x)).

Hence, by Theorem 4.3.3, the construction (4.8) yields a Feller process on E with rate function λ(x) = λ(1 − q(x, x)) and transition probabilities

q(x, y)/(1 − q(x, x))

for x ≠ y.

Example 4.3.6 (Continuous-time random walk). Ordinary random walk in Z^d is a Markov chain with transition kernel Q given by

Q(x, {y}) = 1/(2d) if y ∈ Nx, and Q(x, {y}) = 0 otherwise,

where Nx is the set consisting of the 2d neighbors of x. In other words, at each time step the chain moves to one of the neighbors of the current state, all neighbors being equally likely. Clearly QC0 ⊆ C0 in this case. The associated Feller process constructed above is called the continuous-time random walk on Z^d. If we take λ = 1 its generator is given by

Af(x) = (1/(2d)) Σ_{y∈Nx} f(y) − f(x),

as is easily checked.

4.4 Lévy processes

We already encountered processes with independent and stationary increments in Corollary 3.4.2. In this section we study this class of processes in some more detail. We restrict ourselves to real-valued processes, although most results can be generalized to higher dimensions.

4.4.1 Definition, examples and first properties

We begin with the basic definition.

Definition 4.4.1. An (Ft)-adapted process X is called a Lévy process (with respect to (Ft)) if

(i) Xt − Xs is independent of Fs for all s ≤ t,
(ii) Xt − Xs =d X_{t−s} for all s ≤ t,
(iii) the process is continuous in probability, i.e. if tn → t then Xtn → Xt in probability.
Observe that the first two items of the definition imply that for u ∈ R, the function f(t) = E exp(iuXt) satisfies

f(s + t) = Ee^{iu((X_{s+t} − X_t) + X_t)} = Ee^{iuX_s} Ee^{iuX_t} = f(s)f(t).

By item (iii) the function f is continuous, and it follows that the characteristic function of Xt is of the form

Ee^{iuXt} = f(t) = e^{−tψ(u)},

where ψ : R → C is a continuous function with ψ(0) = 0. The function ψ is called the characteristic exponent of the Lévy process. Note in particular that X0 = 0 almost surely. The independence and stationarity of the increments imply that the characteristic exponent determines the whole distribution of a Lévy process (see Exercise 13). Recall that if Y is an integrable random variable with characteristic function ϕ, then ϕ′(0) = iEY. So if X is a Lévy process and Xt is integrable, then the characteristic exponent ψ is differentiable and EXt = itψ′(0). If Xt has a finite second moment as well, we have VarXt = σ²t for some σ² ≥ 0 (see Exercise 14).

The independence and stationarity of the increments imply that for a Lévy process X and a bounded measurable function f we have, for s ≤ t,

E(f(Xt) | Fs) = E(f((Xt − Xs) + Xs) | Fs) = P_{t−s}f(Xs),

where Pt f(x) = Ef(x + Xt). Hence, a Lévy process is a Markov process with transition semigroup (Pt) given by Pt f(x) = Ef(x + Xt). It is not difficult to see that the semigroup (Pt) is Feller (Exercise 15), so a Lévy process is always a Feller process. From this point on, we will always consider the canonical, cadlag version of X and the usual augmentation of its natural filtration. As usual, we denote by Px the law under which X is a Markov process with transition function (Pt) and initial distribution δx. Under P0 the process is Lévy, and since Ex f(Xt) = Pt f(x) = E0 f(x + Xt), the law of X under Px is equal to the law of x + X under P0. Below we will write P for P0 and E for E0.

Example 4.4.2 (Brownian motion).
If W is a standard Brownian motion and b ∈ R and σ ≥ 0, then the Brownian motion with drift defined by Xt = bt + σWt is clearly a Lévy process. Since E exp(iuZ) = exp(−u²/2) if Z has a standard normal distribution, the characteristic exponent of the process X is given by

ψ(u) = −ibu + ½σ²u².

We will see below that the Brownian motion with linear drift is in fact the only Lévy process with continuous sample paths (see Theorem 4.4.13).

Example 4.4.3 (Poisson process). Let N be a Poisson process with intensity λ > 0. Then by the Markov property

E(f(Nt − Ns) | Fs) = E_{Ns} f(N_{t−s} − N0)

for every bounded, measurable function f and s ≤ t. For x ∈ Z+ we have Ex f(N_{t−s} − N0) = P_{t−s}g(x), where g(y) = f(y − x). Using the explicit expression for the semigroup (Pt) (cf. Example 4.3.2) we see that P_{t−s}g(x) = P_{t−s}f(0). Hence,

E(f(Nt − Ns) | Fs) = P_{t−s}f(0) = Ef(N_{t−s}).

This shows that the Poisson process, when started in 0, is a Lévy process. Recall that if Z has a Poisson distribution with parameter α > 0, then E exp(iuZ) = exp(−α(1 − e^{iu})). It follows that the characteristic exponent of the Poisson process with intensity λ is given by ψ(u) = λ(1 − e^{iu}).

In fact, the Poisson process is the only non-trivial Lévy process that is also a counting process. Indeed, let N be a process of the latter type. Then N is a Feller process on the discrete space Z+, of the type studied in Section 4.3. Since it is a counting process, its transition kernel is given by Q(x, {x + 1}) = 1 for x ∈ Z+. As for the rate function, let σx be the first time the process N leaves x. Under Px, N has the same law as x + N under P0. Hence, the law of σx under Px is equal to the law of σ0 under P0. This shows that the rate function is equal to a constant λ ≥ 0 (if λ = 0, the process is trivial). Theorem 4.3.3 now shows that N has generator Af(x) = λ(f(x + 1) − f(x)). By Example 4.3.2 and the fact that the generator determines the semigroup, it follows that N is a Poisson process.
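The characteristic exponent of the Brownian motion with drift can be checked against a Monte Carlo estimate of Ee^{iuXt}: an empirical average of e^{iuXt} over many simulated values of Xt should be close to e^{−tψ(u)}. An illustrative sketch (parameter values arbitrary, seeded for reproducibility):

```python
import cmath
import math
import random

b, sigma, t, u = 0.5, 1.2, 2.0, 0.8
psi = -1j * b * u + 0.5 * sigma ** 2 * u ** 2   # psi(u) = -ibu + sigma^2 u^2 / 2
exact = cmath.exp(-t * psi)                      # E e^{iu X_t} = e^{-t psi(u)}

rng = random.Random(42)
n = 200_000
# X_t = b t + sigma W_t with W_t ~ N(0, t)
emp = sum(cmath.exp(1j * u * (b * t + sigma * rng.gauss(0.0, math.sqrt(t))))
          for _ in range(n)) / n
```

The Monte Carlo error here is of order 1/√n, so `emp` and `exact` agree to a few decimals; the same check could be run for the Poisson exponent λ(1 − e^{iu}).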
Example 4.4.4 (Compound Poisson process). Let N be a Poisson process with intensity λ > 0, started in 0, and Y1, Y2, . . . i.i.d. random variables with common distribution function F, independent of N. Then the process X defined by

Xt = Σ_{i=1}^{Nt} Yi

is called a compound Poisson process. It is easily seen to be a Lévy process by conditioning on N and using the fact that N is a Lévy process (see Exercise 16). It can also be shown by conditioning that the characteristic exponent of the compound Poisson process is given by

ψ(u) = λ ∫ (1 − e^{iux}) dF(x)

(see the same exercise).

Example 4.4.5 (First passage times). Let W be a standard Brownian motion and for a ≥ 0, define σa = inf{t : Wt > a}, the first passage time of the level a. By construction there are arbitrarily small t > 0 with W_{σa+t} > a, which implies that the process (σa)a≥0 is right-continuous. Moreover, it is clear that for an ↑ a it holds that σan ↑ τa, where τa = inf{t : Wt = a} is the first hitting time of the level a. It follows in particular that the process (σa)a≥0 is cadlag. For b > a we have

σb − σa = inf{t : W_{σa+t} − W_{σa} > b − a}.

Hence, by the strong Markov property (see Corollary 3.4.2), σb − σa is independent of Fσa and has the same distribution as σ_{b−a}. This shows that the process of first passage times (σa)a≥0 is a Lévy process. It is in fact the cadlag modification of the process (τa)a≥0 of first hitting times. To see this, let an ↑ a. As we just noted, σan → τa a.s. On the other hand, the quasi-left continuity of the first passage time process implies that a.s. σan → σa (see Exercise 16 of Chapter 3). It follows that a.s., σa = τa.

Since a Lévy process is cadlag, the number of jumps ∆Xs before some time t such that |∆Xs| ≥ ε has to be finite for all ε > 0. Hence, if B is a Borel set bounded away from 0, then for t ≥ 0

N_t^B = #{s ∈ [0, t] : ∆Xs ∈ B}

is well defined and a.s. finite.
The process N^B is clearly a counting process, and we call it the counting process of B. It inherits the Lévy property from X and hence, by Example 4.4.3, N^B is a Poisson process with a certain intensity, say ν^X(B) < ∞. If B is a disjoint union of Borel sets Bi, then N^B = Σ_i N^{Bi}, hence ν^X(B) = EN_1^B = Σ_i EN_1^{Bi} = Σ_i ν^X(Bi). We conclude that ν^X is a Borel measure on R\{0}. It holds that ν^X(R\(−ε, ε)) < ∞ for all ε > 0, in particular ν^X is σ-finite. The measure ν^X is called the Lévy measure of the process X.

It is easily seen that for the Brownian motion and the Poisson process with intensity λ, we have ν^X = 0 and ν^X = λδ1, respectively. As for the compound Poisson process Xt = Σ_{i≤Nt} Yi of Example 4.4.4, observe that the counting process of the Borel set B is given by

N_t^B = Σ_{i=1}^{Nt} 1_{{Yi∈B}}.

Hence, by conditioning on N we see that the Lévy measure is given by ν^X(B) = EN_1^B = λ∫_B dF, i.e. ν^X(dx) = λ dF(x).

4.4.2 Jumps of a Lévy process

In this subsection we analyze the structure of the jumps of a Lévy process. The first theorem provides useful information about Lévy processes with bounded jumps.

Theorem 4.4.6. A Lévy process with bounded jumps has finite moments of all orders.

Proof. Let X be a Lévy process and let the number C > 0 be such that |∆Xt| ≤ C for all t ≥ 0. Define the stopping times τ0, τ1, τ2, . . . by τ0 = 0, τ1 = inf{t : |Xt − X0| ≥ C}, . . . , τn+1 = inf{t > τn : |Xt − Xτn| ≥ C}, . . . . The fact that X is right-continuous implies that τ1 is a.s. strictly positive. Using the uniqueness of Laplace transforms, it follows that for some λ > 0, we have a = E0 exp(−λτ1) < 1. By the strong Markov property we have, on the event {τn−1 < ∞},

E(e^{−λ(τn − τn−1)} | F_{τn−1}) = E(e^{−λτ1} ∘ θ_{τn−1} | F_{τn−1}) = E_{X_{τn−1}} e^{−λτ1} = Ee^{−λτ1},

where the last equality in the display follows from the fact that the law of X − X0 under Px is the same as the law of X under P0.
By repeated conditioning, it follows that

Ee^{−λτn} = Ee^{−λτn}1_{{τn<∞}} = E e^{−λΣ_{k=1}^{n}(τk − τk−1)}1_{{τn<∞}}
= E e^{−λΣ_{k=1}^{n−1}(τk − τk−1)}1_{{τn−1<∞}} E(e^{−λ(τn − τn−1)} | F_{τn−1})
= a E e^{−λΣ_{k=1}^{n−1}(τk − τk−1)}1_{{τn−1<∞}} = . . . = a^{n−1} Ee^{−λτ1}1_{{τ1<∞}} = a^n.

Now observe that by construction sup_{t≤τn} |Xt| ≤ 2nC, so

P(|Xt| > 2nC) ≤ P(τn < t) ≤ P(exp(−λτn) > exp(−λt)) ≤ e^{λt}a^n.

This implies that Xt has finite moments of all orders.

The next theorem deals with two Lévy processes X and Y which never jump at the same time, meaning that ∆Xt∆Yt = 0 for all t ≥ 0. One of the processes is assumed to have sample paths with finite variation on finite intervals. Recall that a function f on R+ is said to have this property if for every t ≥ 0

sup Σ |f(tk) − f(tk−1)| < ∞,

where the supremum is taken over all finite partitions 0 = t0 < · · · < tn = t of [0, t]. It is clear that for instance monotone functions and continuously differentiable functions have finite variation on finite intervals. In the proof of the theorem we use the fact that if X is a Lévy process, the process M defined by

Mt = e^{iuXt}/Ee^{iuXt}

is a martingale (see Exercise 17).

Theorem 4.4.7. Let X and Y be two Lévy processes with respect to the same filtration, and let one of them have sample paths with finite variation on finite intervals. If X and Y never jump at the same time, they are independent.

Proof. In this proof we repeatedly use the simple fact that if Z is a martingale satisfying sup_{s≤t} |Zs| ≤ C for some constant C > 0, and for n ∈ N, πn is a partition of [0, t] given by 0 = t_0^n < · · · < t_n^n = t, then

E(Σ_k (Z_{t_k^n} − Z_{t_{k−1}^n})²)² ≲ E Σ_k (Z_{t_k^n} − Z_{t_{k−1}^n})⁴ + C² E Σ_k (Z_{t_k^n} − Z_{t_{k−1}^n})² ≲ C² E Σ_k (Z_{t_k^n} − Z_{t_{k−1}^n})² = C²(EZt² − EZ0²).

In particular, the sequence Σ_k (Z_{t_k^n} − Z_{t_{k−1}^n})² is bounded in L². Now say that Y has finite variation on finite intervals and consider the martingales

Mt = e^{iuXt}/Ee^{iuXt} − 1,   Nt = e^{ivYt}/Ee^{ivYt} − 1.

Since N has finite variation on [0, t] the sum Σ_{s≤t} |∆Ns| converges a.s.
so we can write Nt = Σ_{s≤t} ∆Ns + N_t^c, where N^c is continuous. For n ∈ N, let πn be a partition of [0, t] given by 0 = t_0^n < · · · < t_n^n = t and suppose that ‖πn‖ = sup_k |t_k^n − t_{k−1}^n| → 0 as n → ∞. We have

Σ_k (M_{t_k^n} − M_{t_{k−1}^n})(N_{t_k^n} − N_{t_{k−1}^n}) − Σ_{s≤t} ∆Ms∆Ns
= Σ_{s≤t} (M_s^n − ∆Ms)∆Ns + Σ_k (M_{t_k^n} − M_{t_{k−1}^n})(N^c_{t_k^n} − N^c_{t_{k−1}^n}),   (4.9)

where M_s^n = Σ_k (M_{t_k^n} − M_{t_{k−1}^n})1_{(t_{k−1}^n, t_k^n]}(s). By Cauchy-Schwarz the square of the second term on the right-hand side is bounded by

sup_{|u−v|≤‖πn‖, u,v≤t} |N_u^c − N_v^c| · Σ_k |N^c_{t_k^n} − N^c_{t_{k−1}^n}| · Σ_k (M_{t_k^n} − M_{t_{k−1}^n})².

Since N^c is uniformly continuous and has finite variation on [0, t], the product of the first two factors converges a.s. to 0. The last factor is bounded in L² by the argument in the first paragraph, whence the second term on the right-hand side of (4.9) tends to 0 in probability. As for the first term, observe that the sum is in fact countable and a.s. M_s^n → ∆Ms for every s ≥ 0. Since M is bounded and N has finite variation on [0, t], it follows by dominated convergence that the first term on the right-hand side of (4.9) a.s. converges to 0. All together we see that

Σ_k (M_{t_k^n} − M_{t_{k−1}^n})(N_{t_k^n} − N_{t_{k−1}^n})

converges to 0 in probability. Since by Cauchy-Schwarz and the first paragraph the sequence is bounded in L² and hence uniformly integrable (see Lemma A.3.4) we conclude that the convergence takes place in L¹ as well (Theorem A.3.5), so

EMtNt = E(Σ_k (M_{t_k^n} − M_{t_{k−1}^n}))(Σ_k (N_{t_k^n} − N_{t_{k−1}^n})) = E Σ_k (M_{t_k^n} − M_{t_{k−1}^n})(N_{t_k^n} − N_{t_{k−1}^n}) → 0.

In view of the definitions of M and N this implies that Ee^{iuXt+ivYt} = Ee^{iuXt}Ee^{ivYt}, so Xt and Yt are independent. Since X and Y are Lévy processes, it follows that the processes are independent (see Exercise 18).

The full jump structure of a Lévy process X is described by the random measure associated with its jumps.
Formally, a random measure on a measurable space (E, E) is a map µ : Ω × E → [0, ∞] such that for fixed B ∈ E, µ(·, B) is a (measurable) random variable and for fixed ω, µ(ω, ·) is a measure on (E, E). The random measure µ^X associated with the jumps of the Lévy process X is the random measure on R+ × R\{0} defined by

µ^X(ω, ·) = Σ_{t≥0} δ_{(t, ∆Xt(ω))}.

We usually suppress the ω in the notation and simply write µ^X = Σ_{t≥0} δ_{(t,∆Xt)}. Observe that the sum is in fact countable and with N^B the counting process of the Borel set B, we have µ^X([0, t] × B) = N_t^B. This shows that µ^X([0, t] × B) is indeed a random variable. By a standard approximation argument, µ^X is then seen to be a well-defined random measure on (R+ × R\{0}, B(R+ × R\{0})). We will prove below that µ^X is a so-called Poisson random measure. The general definition is as follows.

Definition 4.4.8. Let µ be a random measure on (E, E) and ν an ordinary measure. Then µ is called a Poisson random measure with intensity measure ν if

(i) for any disjoint sets B1, . . . , Bn ∈ E, the random variables µ(B1), . . . , µ(Bn) are independent,
(ii) if ν(B) < ∞ then µ(B) has a Poisson distribution with parameter ν(B).

We have in fact already encountered Poisson random measures on the positive half-line.

Example 4.4.9. If N is a Poisson process with intensity λ > 0, we can define a random measure µ on R+ by setting µ(s, t] = Nt − Ns for s ≤ t. The properties of the Poisson process show that µ is a Poisson random measure on R+ with intensity measure λ Leb, where Leb denotes the Lebesgue measure.

Theorem 4.4.10. The random measure µ^X associated with the jumps of a Lévy process X with Lévy measure ν^X is a Poisson random measure on R+ × R\{0} with intensity measure Leb ⊗ ν^X.

Proof. We provide the main ingredients of the proof, leaving the details as an exercise (see Exercise 21).
Suppose that (s1, t1] × B1 and (s2, t2] × B2 are disjoint Borel subsets of R+ × R\{0}, and the sets Bi are bounded away from 0. Observe that

µ^X((si, ti] × Bi) = N_{ti}^{Bi} − N_{si}^{Bi}.

Since the sets are disjoint, either (s1, t1] and (s2, t2] are disjoint, or B1 and B2 are disjoint. In the first case, the independence of the random variables µ^X((si, ti] × Bi) follows from the independence of the increments of the processes N^{Bi}. In the second case the processes N^{Bi} never jump at the same time, so they are independent by Theorem 4.4.7. It is clear that this line of reasoning can be extended to deal with n disjoint sets (si, ti] × Bi instead of just two. Appropriate approximation arguments complete the proof of the fact that µ^X satisfies the first requirement of Definition 4.4.8.

For a Borel subset of R+ × R\{0} of the form (s, t] × A we have that µ^X((s, t] × A) = N_t^A − N_s^A is Poisson distributed with parameter (t − s)ν^X(A) = (Leb ⊗ ν^X)((s, t] × A). Using the fact that the sum of independent Poisson random variables is again Poisson and appropriate approximation arguments, it is straightforward to conclude that the second requirement of the definition is fulfilled.

At several places below certain integrals with respect to the jump measure µ^X will appear. A measurable function f is called µ^X-integrable if for every t ≥ 0,

∫_{(0,t]} ∫_{R\{0}} |f(x)| µ^X(ds, dx) < ∞

almost surely. For a µ^X-integrable function f we define the stochastic process f ∗ µ^X by

f ∗ µ^X_t = ∫_{(0,t]} ∫_{R\{0}} f(x) µ^X(ds, dx) = Σ_{s≤t} f(∆Xs)1_{{∆Xs≠0}}.

For a specific function f we often write f(x) ∗ µ^X instead of f ∗ µ^X, for instance

x1_{{|x|>1}} ∗ µ^X_t = ∫_{(0,t]} ∫_{|x|>1} x µ^X(ds, dx) = Σ_{s≤t} ∆Xs1_{{|∆Xs|>1}}.

It is clear that f ∗ µ^X is an adapted process. Observe that 1_B ∗ µ^X = N^B, the counting process of the set B. The following lemma collects some useful properties of integrals relative to the jump measure.

Lemma 4.4.11.
Let µ^X be the Poisson random measure associated with the jumps of the Lévy process X with Lévy measure ν^X, and let f be a measurable function.

(i) The function f is µ^X-integrable if and only if ∫(|f| ∧ 1) dν^X < ∞, and in that case

Ee^{iu f∗µ^X_t} = e^{−t∫(1 − e^{iuf}) dν^X}.

(ii) If f ∈ L¹(ν^X) then E f ∗ µ^X_t = t∫f dν^X.

(iii) If f ∈ L¹(ν^X) ∩ L²(ν^X) then Var f ∗ µ^X_t = t∫f² dν^X.

(iv) If f is µ^X-integrable, the processes f ∗ µ^X and X − f ∗ µ^X are Lévy processes.

Proof. Recall that if the random variable Y has a Poisson distribution with parameter α > 0, its Laplace transform is given by E exp(−λY) = exp(−α(1 − exp(−λ))). Let f be a simple, nonnegative function of the form f = Σ_{i=1}^{n} ci1_{Bi} for certain disjoint Borel sets Bi and numbers ci ≥ 0. Then f ∗ µ^X = Σ ciN^{Bi} and hence, since the processes N^{Bi} are independent by Theorem 4.4.7,

Ee^{−λ(f∗µ^X)_t} = Π Ee^{−ciλN_t^{Bi}} = Π e^{−tν^X(Bi)(1 − e^{−ciλ})} = e^{−t∫(1 − e^{−λf}) dν^X}.

For an arbitrary nonnegative, measurable function f we can choose a sequence fn of simple, nonnegative functions increasing to f. By monotone and dominated convergence we can then conclude that again,

Ee^{−λ(f∗µ^X)_t} = e^{−t∫(1 − e^{−λf}) dν^X}.

It follows that a measurable function f is µ^X-integrable if and only if

lim_{λ↓0} ∫(1 − e^{−λ|f|}) dν^X = 0.

For x ≥ 0 it holds that 1 − exp(−x) ≤ x ∧ 1, whence, for λ small enough, |1 − exp(−λ|f|)| ≤ λ|f| ∧ 1 ≤ |f| ∧ 1. In view of the dominated convergence theorem, we see that ∫(|f| ∧ 1) dν^X < ∞ is a sufficient condition for the µ^X-integrability of f. The fact that it is also necessary follows for instance from the inequality

½(x ∧ 1) ≤ (x log 2) ∧ ½ ≤ 1 − e^{−x}

for x ≥ 0, which follows from the concavity of x ↦ 1 − exp(−x). A similar computation as above yields the expression for the characteristic function, see Exercise 22. Items (ii) and (iii) can be proved by first considering simple functions and using an approximation argument.
The proofs of (ii), (iii) and (iv) are left to the reader, see Exercise 23.

4.4.3 Lévy-Itô decomposition

Let X be a Lévy process with Lévy measure ν^X and f ∈ L¹(ν^X). Then by Lemma 4.4.11 the process f ∗ µ^X_t − t∫f dν^X is a process with independent increments and constant expectation, whence it is a martingale. For convenience we denote this martingale by f ∗ (µ^X − ν^X). For every ε > 0 we can write

X = x1_{{ε<|x|≤1}} ∗ (µ^X − ν^X) + (X − x1_{{ε<|x|≤1}} ∗ (µ^X − ν^X)).   (4.10)

Both processes on the right-hand side are Lévy processes by Lemma 4.4.11. The first one is obtained by taking the jumps of X with absolute value in (ε, 1] and then subtracting t∫_{ε<|x|≤1} x ν^X(dx) to ensure that we get a martingale. The jumps of the second process on the right-hand side are precisely those of X with absolute value outside (ε, 1]. Hence, by Theorem 4.4.7, the two processes are independent. It turns out that as ε → 0 they tend to well-defined, independent Lévy processes. The jumps of the first one equal the "small jumps" of X, while the jumps of the second one coincide with the "large jumps" of X.

Theorem 4.4.12. Let X be a Lévy process with Lévy measure ν^X. Then ∫(x² ∧ 1) ν^X(dx) < ∞ and for t ≥ 0 the limit

Mt = lim_{ε↓0} x1_{{ε<|x|≤1}} ∗ (µ^X − ν^X)_t

exists in L². The processes X − M and M are independent Lévy processes. It holds that ∆Mt = ∆Xt1_{{|∆Xt|≤1}} and ∆(X − M)t = ∆Xt1_{{|∆Xt|>1}}.

Proof. Define Y = X − x1_{{|x|>1}} ∗ µ^X. By part (iv) of Lemma 4.4.11, Y is a Lévy process. Observe that ∆Yt = ∆Xt1_{{|∆Xt|≤1}}. Hence by Theorem 4.4.6 the process Y has finite moments of all orders, and it is easily seen that the Lévy measure ν^Y of Y is given by ν^Y(B) = ν^X(B ∩ [−1, 1]) (see Exercise 24). For ε > 0, define Y^ε = x1_{{ε<|x|≤1}} ∗ µ^Y = x1_{{ε<|x|≤1}} ∗ µ^X. By Lemma 4.4.11 and monotone convergence we have

Var Y_1^ε = ∫_{ε<|x|≤1} x² ν^X(dx) ↑ ∫_{|x|≤1} x² ν^X(dx)

as ε → 0. Lemma 4.4.11 also implies that Y − Y^ε is independent of Y^ε, so that Var Y_1^ε ≤ Var Y1 < ∞.
It follows that

∫_{|x|≤1} x² ν^X(dx) < ∞,

which proves the first statement of the theorem. Now put M^ε = x1_{{ε<|x|≤1}} ∗ (µ^X − ν^X). Then for t ≥ 0 and ε′ < ε < 1 we have

E(M_t^ε − M_t^{ε′})² = t∫_{ε′<|x|≤ε} x² ν^X(dx).

Hence, in view of the first paragraph of the proof, the random variables M_t^ε are Cauchy in L². So as ε → 0, M_t^ε converges to some random variable Mt in L². The process M inherits the Lévy property from M^ε (see Exercise 25) and (EMt)² ≤ E(M_t^ε − Mt)² → 0, so M is centered. In particular, M is a martingale. By Theorem 4.4.7, X − x1_{{ε<|x|≤1}} ∗ µ^X and x1_{{ε<|x|≤1}} ∗ µ^X are independent, hence X − M^ε and M^ε are independent as well. By letting ε → 0 we see that X − M and M are independent.

To complete the proof, note that by Doob's L²-inequality (Theorem 2.3.11) E(sup_{s≤t} |M_s^ε − Ms|)² ≤ 4E(M_t^ε − Mt)² → 0, so there exists a sequence εn → 0 such that a.s. sup_{s≤t} |(Y − M^{εn})s − (Y − M)s| → 0. By construction, |∆(Y − M^ε)u| ≤ ε for all u ≥ 0. It follows that the process Y − M is continuous, hence X − M is the sum of a continuous process and x1_{{|x|>1}} ∗ µ^X. In particular, the jumps of X − M are precisely the jumps of X which are larger than 1 in absolute value.

The limit process M of the preceding theorem is from now on denoted by

M = x1_{{|x|≤1}} ∗ (µ^X − ν^X).

According to the theorem it is a Lévy process and ∆Mt = ∆Xt1_{{|∆Xt|≤1}}, so its Lévy measure is given by B ↦ ν^X(B ∩ [−1, 1]). The following theorem proves the claim made in Example 4.4.2.

Theorem 4.4.13. A continuous Lévy process X is a Brownian motion with linear drift, i.e. Xt = bt + σWt for certain numbers b ∈ R, σ ≥ 0 and a standard Brownian motion W.

Proof. By Theorem 4.4.6 the process X has moments of all orders, so there exist b ∈ R and σ ≥ 0 such that EXt = bt and VarXt = σ²t (see Subsection 4.4.1). If we define Yt = Xt − bt, it follows easily that under Px, the processes Yt − x and (Yt − x)² − σ²t are centered martingales.
For a ∈ R, let τa be the first time that the process Y hits the level a. By arguing exactly as we did for the BM in Exercises 14 and 15 of Chapter 2 we find that for a < x < b,

Px(τa < τb) = (b − x)/(b − a),   Ex τa ∧ τb = (x − a)(b − x)/σ².

Now consider the escape time ηr = inf{t : |Yt − Y0| > r}. Under Px we have that Y_{ηr} is x ± r and

Px(Y_{ηr} = x − r) = Px(τ_{x−r} < τ_{x+r}) = r/2r = ½.

Also note that Ex ηr = Ex τ_{x−r} ∧ τ_{x+r} = r²/σ². Hence, using Taylor's formula we see that for f ∈ C0² we have

(Ex f(Y_{ηr}) − f(x))/Ex ηr = σ²(½f(x + r) + ½f(x − r) − f(x))/r² = ¼σ²(f″(y1) + f″(y2)),

where y1, y2 ∈ [x − r, x + r]. By Theorem 3.5.12 it follows that for the generator A of the process Y and f ∈ C0² it holds that Af = σ²f″/2. This is precisely the generator of σW, where W is a standard Brownian motion (see Example 3.5.7).

We have now collected all ingredients for the proof of the main result of this subsection.

Theorem 4.4.14 (Lévy-Itô decomposition). Let X be a Lévy process with Lévy measure ν^X and jump measure µ^X. Then ∫(x² ∧ 1) ν^X(dx) < ∞ and

Xt = bt + σWt + x1_{{|x|≤1}} ∗ (µ^X − ν^X)_t + x1_{{|x|>1}} ∗ µ^X_t,

where b ∈ R, σ ≥ 0 and W is a standard Brownian motion. The three processes on the right-hand side are independent. If ∫(|x| ∧ 1) ν^X(dx) < ∞ the decomposition simplifies to

Xt = ct + σWt + x ∗ µ^X_t,

with c = b − ∫_{|x|≤1} x ν^X(dx).

Proof. By Lemma 4.4.11 the processes Y = X − x1_{{|x|>1}} ∗ µ^X and x1_{{|x|>1}} ∗ µ^X are Lévy processes. Since the latter has finite variation on finite intervals and they jump at different times, the processes are independent by Theorem 4.4.7. Note that the jump measure and Lévy measure of Y are given by µ^Y = Σ_{s≥0} δ_{(s,∆Xs)}1_{{|∆Xs|≤1}} and ν^Y(B) = ν^X(B ∩ [−1, 1]), respectively. By Theorem 4.4.12 the process Y can be written as the sum of x1_{{|x|≤1}} ∗ (µ^Y − ν^Y) = x1_{{|x|≤1}} ∗ (µ^X − ν^X) and an independent Lévy process without jumps.
According to Theorem 4.4.13 the latter is a Brownian motion with drift. This completes the proof of the first statement of the theorem. The proof of the second statement is left as Exercise 26.

The triplet (b, σ², ν^X) appearing in the decomposition theorem is called the characteristic triplet of the Lévy process X. It is clear from the theorem that it is uniquely determined by X. The first process in the decomposition is a Brownian motion with drift, and the third one is a compound Poisson process (see Exercise 27). The second one can be viewed as a mixture of independent, compensated Poisson processes. Observe that the triplets of the BM with drift, the Poisson process and the compound Poisson process of Examples 4.4.2–4.4.4 are given by (b, σ², 0), (0, 0, λδ₁) and (0, 0, λ dF), respectively. The Lévy measure of the first passage time process of Example 4.4.5 is computed in Example 4.4.20 below.

The Lévy-Itô decomposition can be used to characterize certain properties of the process X in terms of its triplet. Here is an example (see also Exercise 28 for another example).

Corollary 4.4.15. A Lévy process with characteristic triplet (b, σ², ν) has finite variation on finite intervals if and only if σ² = 0 and ∫(|x| ∧ 1) ν(dx) < ∞.

Proof. Sufficiency follows immediately from Theorem 4.4.14. Suppose now that X has finite variation on finite intervals. Then in particular Σ_{s≤t} |∆X_s| < ∞ a.s. for every t ≥ 0. Hence, by Lemma 4.4.11, ∫(|x| ∧ 1) ν(dx) < ∞. Theorem 4.4.14 then implies that

  X_t = ct + σW_t + Σ_{s≤t} ∆X_s,

where W is a Brownian motion. It follows that the process σW has finite variation on finite intervals. By Corollary 2.4.2 this can only happen if σ² = 0.

A Lévy process X is called a subordinator if it has increasing sample paths.
In particular the sample paths of a subordinator have finite variation on finite intervals, so by the preceding corollary the Lévy measure satisfies ∫(|x| ∧ 1) ν(dx) < ∞ and we can write

  X_t = ct + Σ_{s≤t} ∆X_s.

The number c is called the drift coefficient of the process. All jumps of a subordinator must be positive, hence its Lévy measure ν is concentrated on (0, ∞). To ensure that the process is increasing between jumps, we must have c ≥ 0. We say that (c, ν) are the characteristics of the subordinator X. Clearly, they are uniquely determined by the process X.

Since a subordinator is nonnegative, it is convenient to work with Laplace transforms instead of characteristic functions. The Lévy property implies that for λ, t ≥ 0 it holds that

  E exp(−λX_t) = exp(−tϕ(λ))

for some ϕ : R₊ → R₊. The function ϕ is called the Laplace exponent of the subordinator X.

4.4.4 Lévy-Khintchine formula

Below we derive the Lévy-Khintchine formula, which gives an expression for the characteristic exponent in terms of the triplet. We need the following lemma to deal with the second process in the Lévy-Itô decomposition.

Lemma 4.4.16. Let X be a Lévy process with jump measure µ^X and Lévy measure ν^X. Then

  E exp(iu x1_{|x|≤1} ∗ (µ^X − ν^X)_t) = exp(−t ∫_{|x|≤1} (1 − e^{iux} + iux) ν^X(dx)).

Proof. The random variable M_t = x1_{|x|≤1} ∗ (µ^X − ν^X)_t is the L²-limit of

  M_t^ε = x1_{ε<|x|≤1} ∗ µ^X_t − t ∫_{ε<|x|≤1} x ν^X(dx)

for ε ↓ 0, see Theorem 4.4.12. By Lemma 4.4.11 the characteristic function of M_t^ε is given by

  E e^{iuM_t^ε} = exp(−t ∫_{ε<|x|≤1} (1 − e^{iux} + iux) ν^X(dx)).

From the power series expansion of the exponential we see that |1 − e^z + z| ≤ |z|² e^{|z|} and hence |1 − e^{iux} + iux| ≤ u² e^{|u|} x² for |x| ≤ 1. Since x ↦ x² 1_{|x|≤1} is ν^X-integrable we can now apply the dominated convergence theorem to complete the proof.

Theorem 4.4.17 (Lévy-Khintchine formula). Let X be a Lévy process with characteristic triplet (b, σ², ν).
Then its characteristic exponent ψ is given by

  ψ(u) = −ibu + ½σ²u² + ∫_R (1 − e^{iux} + iux1_{|x|≤1}) ν(dx).

If X is a subordinator with characteristics (c, ν), its Laplace exponent is given by

  ϕ(λ) = cλ + ∫₀^∞ (1 − e^{−λx}) ν(dx).

Proof. The first part of the theorem is a straightforward consequence of the Lévy-Itô decomposition and the expressions that we have for the characteristic exponents of the three independent processes appearing in the decomposition (see Examples 4.4.2, 4.4.3 and Lemma 4.4.16). For the second statement of the theorem write X_t = ct + x ∗ µ^X_t and use the fact that for a Borel function f ≥ 0,

  E e^{−λ(f∗µ^X)_t} = exp(−t ∫ (1 − e^{−λf(x)}) ν^X(dx)),

see the proof of Lemma 4.4.11.

As we observed above, the characteristic exponent of a Lévy process determines its distribution. Hence, the Lévy-Khintchine formula shows that the distribution of a Lévy process (or subordinator) is determined by its characteristics.

4.4.5 Stable processes

A Lévy process X is called a stable process with index α > 0 if for all r > 0, the process (r^{−1/α} X_{rt})_{t≥0} has the same distribution as X. Processes with this scaling property are also called self-similar (with index 1/α). The term "stable" is usually reserved for self-similar Lévy processes. Observe that in terms of the characteristic exponent ψ of X, α-stability means that ψ(ru) = r^α ψ(u) for all r > 0 and u ∈ R.

By the scaling property, the Brownian motion is a stable process with index 2. The deterministic process X_t = bt is stable with index 1. The scaling property of the BM also implies that the process of first passage times of Example 4.4.5 is stable.

Example 4.4.18. Consider the first passage time process (σ_a)_{a≥0} of the BM. Then

  r^{−2} σ_{ra} = r^{−2} inf{t : W_t > ra} = inf{t : r^{−1} W_{r²t} > a}.

Hence, by the scaling property of the BM, the processes (r^{−2} σ_{ra})_{a≥0} and (σ_a)_{a≥0} have the same distribution, so the first passage time process is stable with index 1/2.
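The ½-stable scaling in Example 4.4.18 can also be checked numerically from the explicit law of σ_a. The sketch below is our own illustration (the function name is invented): by the reflection principle, P(σ_a ≤ t) = 2P(W_t > a) = erfc(a/√(2t)), and this distribution function is invariant under the rescaling (a, t) ↦ (ra, r²t).

```python
import math

def first_passage_cdf(a: float, t: float) -> float:
    # P(sigma_a <= t) = 2 P(W_t > a) = erfc(a / sqrt(2 t)),
    # by the reflection principle for standard Brownian motion.
    return math.erfc(a / math.sqrt(2.0 * t))

# 1/2-stability: (r^{-2} sigma_{ra}) has the same law as (sigma_a),
# i.e. P(sigma_{ra} <= r^2 t) = P(sigma_a <= t) for all r, a, t > 0.
for r in (0.5, 2.0, 10.0):
    for a, t in ((1.0, 1.0), (2.0, 0.5), (0.3, 4.0)):
        lhs = first_passage_cdf(r * a, r * r * t)
        rhs = first_passage_cdf(a, t)
        assert abs(lhs - rhs) < 1e-12
```

The same invariance fails for every other exponent, which is one way to see that the index 1/2 is forced for this subordinator.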
The following theorem characterizes stable processes in terms of their triplets. We exclude the degenerate, deterministic processes with triplets of the form (b, 0, 0). Trivially, they are 1-stable if b ≠ 0 and α-stable for any α if b = 0. We call a subordinator with characteristics (c, ν) degenerate if ν = 0.

Theorem 4.4.19. Let X be a non-degenerate Lévy process with characteristic triplet (b, σ², ν).

(i) If X is stable with index α, then α ∈ (0, 2].
(ii) The process X is stable with index 2 if and only if b = 0 and ν = 0.
(iii) The process X is stable with index 1 if and only if σ² = 0 and ν(dx) = C_± |x|^{−2} dx on R_±, for certain constants C_± ≥ 0.
(iv) The process X is stable with index α ∈ (0, 1) ∪ (1, 2) if and only if b = 0, σ² = 0 and ν(dx) = C_± |x|^{−1−α} dx on R_±, for certain constants C_± ≥ 0.

A non-degenerate subordinator with characteristics (c, ν) is α-stable if and only if α ∈ (0, 1), c = 0 and ν(dx) = Cx^{−1−α} dx for some C ≥ 0.

Proof. The process X is α-stable if and only if for all r > 0, the processes (X_{r^α t})_{t≥0} and (rX_t)_{t≥0} have the same triplets. These triplets are given by (r^α b, r^α σ², r^α ν) and (rb, r²σ², ν(1/r ·)), respectively (check!). Hence, X is stable with index α if and only if for all r > 0:

  r^α b = rb,    r^α σ² = r²σ²,    r^α ν(·) = ν(1/r ·).

The conditions on the triplets given in the statement of the theorem are now easily seen to be sufficient. It remains to prove that they are also necessary.

If the Lévy measures r^α ν(·) and ν(1/r ·) are equal then for x > 0 we have r^α ν[x, ∞) = ν[x/r, ∞) for all r > 0. Taking r = x we see that

  ν[x, ∞) = x^{−α} ν[1, ∞) = C₊ ∫_x^∞ y^{−1−α} dy,

with C₊ = αν[1, ∞), hence ν(dx) = C₊ x^{−1−α} dx on R₊. If we replace [1, ∞) by (−∞, −1] and repeat the argument we see that if X is α-stable, then ν(dx) = C₋ |x|^{−1−α} dx on R₋, for some constant C₋ ≥ 0.
Now if X is stable with index 2, then clearly we must have b = 0 and the preceding paragraph shows that ν(dx) = C_± |x|^{−3} dx on R_±, for certain constants C_± ≥ 0. Since it holds that ∫_{−1}^{1} x² ν(dx) < ∞ (see Theorem 4.4.12), we must have ν = 0, proving (ii).

For the proof of (i), (iii) and (iv), suppose X is stable with an index α ≠ 2. Then we must have σ² = 0 and ν(dx) = C_± |x|^{−1−α} dx on R_±, for certain constants C_± ≥ 0. Since we assumed that X is non-degenerate it holds that ν ≠ 0. The fact that ∫_{−1}^{1} x² ν(dx) < ∞ then forces α ∈ (0, 2). If α ≠ 1 then clearly we must have b = 0 as well. You are asked to provide the proof for the subordinator case in Exercise 29.

We can now compute the characteristics of the process of first passage times of the Brownian motion.

Example 4.4.20. The process (σ_a)_{a≥0} of first passage times of a standard BM W is a subordinator with certain characteristics (c, ν). We saw in Example 4.4.18 that the process is stable with index 1/2. Hence, by Theorem 4.4.19 we have c = 0 and ν(dx) = Cx^{−3/2} dx for some C ≥ 0. It remains to determine the constant C. By Theorem 4.4.17 we have

  E e^{−σ₁} = exp(−C ∫₀^∞ (1 − e^{−x}) x^{−3/2} dx) = exp(−2C ∫₀^∞ e^{−x} x^{−1/2} dx) = e^{−2C√π}.

Here the second and third equalities follow from integration by parts and the substitution x = ½y², respectively. On the other hand, we a.s. have that σ_a = τ_a = inf{t : W_t = a} (see Example 4.4.5) and hence, by Theorem 2.4.7,

  E e^{−σ₁} = E e^{−τ₁} = e^{−√2}.

It follows that C = (2π)^{−1/2}, so ν(dx) = (2π)^{−1/2} x^{−3/2} dx. The Laplace exponent is given by ϕ(λ) = √(2λ).

4.5 Exercises

1. Let W be the canonical version of the BM^d. Show that the law of W under P_x is equal to the law of x + W under P₀.

2. Prove the second statement of Theorem 4.1.2.

3. Derive from Corollary 4.1.3 that for d ≥ 3, it holds with probability one that the BM^d ultimately leaves every bounded set forever.
(Hint: for 0 < a < b, consider the stopping times

  τ_a^{(1)} = τ_a,
  τ_b^{(1)} = inf{t > τ_a^{(1)} : ‖W_t‖ = b},
  τ_a^{(2)} = inf{t > τ_b^{(1)} : ‖W_t‖ = a},
  ...

Using the strong Markov property, show that if ‖x‖ = b, then

  P_x(τ_a^{(n)} < ∞) = (a/‖x‖)^{(d−2)n}.

Conclude that P_x(τ_a^{(n)} < ∞ for all n) = 0.)

4. Let W be a BM^d, σ an invertible d × d matrix and define X_t = σW_t. Show that X is a Feller process and compute its generator.

5. Let X be a Feller process on R^d with generator A. For b ∈ R^d, define Y_t = bt + X_t and let B be the generator of Y. Show that for f ∈ D(A) ∩ C₀²(R^d), it holds that

  Bf(x) = Af(x) + Σ_i b_i ∂f/∂x_i(x).

6. Complete the proof of Theorem 4.2.5.

7. Verify the assertions in Example 4.2.6.

8. Prove that the generator A of a Poisson process with intensity λ is given by Af(x) = λ(f(x + 1) − f(x)) for f ∈ C₀.

9. Prove Theorem 4.3.3.

10. Let Q be a transition kernel on E and λ > 0. Show that (4.8) defines a transition semigroup.

11. Show that the semigroup (P_t) in the proof of Theorem 4.3.5 is Feller.

12. Give a "continuous-time proof" of the fact that the ordinary random walk on Z is recurrent. (Hint: consider the continuous-time random walk X on Z. Fix a ≤ x ≤ b in Z and consider a function f ∈ C₀ such that f(y) = y for a ≤ y ≤ b. As in the proof of Theorem 4.1.2, use Dynkin's formula with this f to show that

  P_x(τ_a < τ_b) = (b − x)/(b − a).

Complete the proof by arguing as in the proof of Corollary 4.1.3.)

13. Show that the characteristic exponent of a Lévy process determines all finite-dimensional distributions.

14. Show that if X is a Lévy process with finite second moments, then Var X_t = σ²t for some σ² ≥ 0. (Hint: for a proof without using the characteristic function, show first that the claim holds for t ∈ Q. Then for arbitrary t ≥ 0, consider a sequence of rationals q_n ↓ t and prove that X_{q_n} is a Cauchy sequence in L². Finally, use the continuity in probability.)

15. Show that any Lévy process is a Feller process.

16.
Show that the compound Poisson process is a Lévy process and compute its characteristic exponent.

17. If X is a Lévy process, show that M_t = exp(iuX_t)/E exp(iuX_t) is a martingale.

18. Let X and Y be two Lévy processes with respect to the same filtration. Show that if X_t and Y_t are independent for each t ≥ 0, then the processes X and Y are independent.

19. By constructing a counterexample, show that in Theorem 4.4.7, the assumption that one of the processes has finite variation on finite intervals cannot be dropped.

20. By constructing a counterexample, show that in Theorem 4.4.7, the assumption that X and Y are Lévy processes with respect to the same filtration cannot be dropped. (Hint: keeping the strong Markov property in mind, construct two suitable Poisson processes w.r.t. different filtrations.)

21. Work out the details of the proof of Theorem 4.4.10.

22. Derive the expression for the characteristic function given in Lemma 4.4.11. (Hint: use the bound |1 − exp(ix)|² = 2(1 − cos x) ≤ 2(x² ∧ 1).)

23. Prove items (ii), (iii) and (iv) of Lemma 4.4.11.

24. Let X be a Lévy process with Lévy measure ν^X and f a µ^X-integrable function. Determine the Lévy measures of f ∗ µ^X and X − f ∗ µ^X.

25. Let X^n be a sequence of Lévy processes relative to the same filtration such that for every t ≥ 0, X_t^n → X_t in L². Show that X is a Lévy process as well.

26. Prove the second statement of Theorem 4.4.14.

27. Let X be a Lévy process. Show that Σ_{s≤t} ∆X_s 1_{|∆X_s|>1} is a compound Poisson process.

28. Let X be a Lévy process with Lévy measure ν. Show that X has finite first moments if and only if ∫(|x| ∧ x²) ν(dx) < ∞, and X has finite second moments if and only if ∫x² ν(dx) < ∞.

29. Complete the proof of Theorem 4.4.19.

A Elements of measure theory

A.1 Definition of conditional expectation

The following theorem justifies the definition of the conditional expectation.

Theorem A.1.1.
Let X be an integrable random variable on a probability space (Ω, F, P) and G ⊆ F a σ-algebra. Then there exists an integrable random variable Y such that

(i) Y is G-measurable,
(ii) for all A ∈ G,

  ∫_A X dP = ∫_A Y dP.

If Y′ is another random variable with these properties, then Y = Y′ almost surely.

Proof. To prove existence, suppose first that X ≥ 0. Define the measure Q on (Ω, G) by

  Q(A) = ∫_A X dP,  A ∈ G.

Since X is integrable, the measure Q is finite. Note that for all A ∈ G, we have that P(A) = 0 implies that Q(A) = 0; in other words, Q is absolutely continuous with respect to the restriction of P to G. By the Radon-Nikodym theorem, it follows that there exists an integrable, measurable function Y on (Ω, G) such that

  Q(A) = ∫_A Y dP,  A ∈ G.

Clearly, Y has the desired properties. The general case follows by linearity.

Now suppose that Y and Y′ are two integrable random variables that both satisfy (i) and (ii). For n ∈ N, define the event A_n = {Y − Y′ ≥ 1/n}. Then A_n ∈ G, so

  0 = ∫_{A_n} (Y − Y′) dP ≥ P(A_n)/n,

hence P(A_n) = 0. It follows that

  P(Y > Y′) ≤ Σ_{n=1}^∞ P(A_n) = 0,

so Y ≤ Y′ almost surely. By switching the roles of Y and Y′ we see that it also holds that Y′ ≤ Y almost surely.

Definition A.1.2. An integrable random variable Y that satisfies conditions (i) and (ii) of Theorem A.1.1 is called (a version of) the conditional expectation of X given G. We write Y = E(X | G), almost surely.

If the σ-algebra G is generated by another random variable, say G = σ(Z), then we usually write E(X | Z) instead of E(X | σ(Z)). Similarly, if Z₁, Z₂, ... is a sequence of random variables, then we write E(X | Z₁, Z₂, ...) instead of E(X | σ(Z₁, Z₂, ...)).

Note that if X is square integrable, then E(X | G) is simply the orthogonal projection in L²(Ω, F, P) of X on the closed subspace L²(Ω, G, P). In other words, it is the G-measurable random variable that is closest to X in mean square sense.
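On a finite probability space this projection description can be made completely explicit. The sketch below is our own illustration (the space, the partition and all numerical values are made up): when G is generated by a finite partition, E(X | G) simply averages X over each block, and no other G-measurable variable is closer to X in mean square.

```python
# Omega = {0,...,5} with the uniform measure; G is generated by the
# partition {0,1}, {2,3,4,5}, so G-measurable means constant per block.
omega = list(range(6))
p = 1.0 / 6.0
blocks = [[0, 1], [2, 3, 4, 5]]
X = [1.0, 3.0, 0.0, 2.0, 4.0, 6.0]

# E(X | G): average of X over each block of the partition.
cond_exp = {}
for block in blocks:
    avg = sum(X[w] for w in block) / len(block)
    for w in block:
        cond_exp[w] = avg

# Defining property (ii): the integrals over every A in G agree.
for block in blocks:
    lhs = sum(p * X[w] for w in block)
    rhs = sum(p * cond_exp[w] for w in block)
    assert abs(lhs - rhs) < 1e-12

# Projection property: no G-measurable candidate (constants c1, c2 on
# the two blocks) has smaller mean-square distance to X.
def mse(y):
    return sum(p * (X[w] - y[w]) ** 2 for w in omega)

best = mse(cond_exp)
for c1 in [0.5 * k for k in range(-4, 17)]:
    for c2 in [0.5 * k for k in range(-4, 17)]:
        y = {w: (c1 if w in blocks[0] else c2) for w in omega}
        assert mse(y) >= best - 1e-12

print(cond_exp[0], cond_exp[2])  # → 2.0 3.0
```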
This shows that we should think of the conditional expectation E(X | G) as the 'best guess of X that we can make on the basis of the information in G'.

A.2 Basic properties of conditional expectation

The following properties follow either immediately from the definition, or from the corresponding properties of ordinary expectations.

Theorem A.2.1.

(i) If X is G-measurable, then E(X | G) = X a.s.
(ii) E(X | {∅, Ω}) = EX a.s.
(iii) Linearity: E(aX + bY | G) = aE(X | G) + bE(Y | G) a.s.
(iv) Positivity: if X ≥ 0 a.s., then E(X | G) ≥ 0 a.s.
(v) Monotone convergence: if X_n ↑ X a.s., then E(X_n | G) ↑ E(X | G) a.s.
(vi) Fatou: if X_n ≥ 0 a.s., then E(lim inf X_n | G) ≤ lim inf E(X_n | G) a.s.
(vii) Dominated convergence: suppose that |X_n| ≤ Y a.s. and EY < ∞. Then if X_n → X a.s., it follows that E(X_n | G) → E(X | G) a.s.
(viii) Jensen: if ϕ is a convex function such that E|ϕ(X)| < ∞, then E(ϕ(X) | G) ≥ ϕ(E(X | G)) a.s.
(ix) Tower property: if G ⊆ H, then E(X | G) = E(E(X | H) | G) a.s.
(x) Taking out what is known: if Y is G-measurable and bounded, then E(Y X | G) = Y E(X | G) a.s.
(xi) Role of independence: if H is independent of σ(σ(X), G), then E(X | σ(H, G)) = E(X | G) a.s.

Proof. Exercise! The proofs of parts (iv), (viii), (x) and (xi) are the most challenging. See Williams (1991).

A.3 Uniform integrability

For an integrable random variable X, the map A ↦ ∫_A |X| dP is "continuous" in the following sense.

Lemma A.3.1. Let X be an integrable random variable. Then for every ε > 0 there exists a δ > 0 such that for all A ∈ F, P(A) ≤ δ implies that

  ∫_A |X| dP < ε.

Proof. See for instance Williams (1991), Lemma 13.1.

Roughly speaking, a class of random variables is called uniformly integrable if this continuity property holds uniformly over the entire class. The precise definition is as follows.

Definition A.3.2. Let C be an arbitrary collection of random variables on a probability space (Ω, F, P).
We call the collection uniformly integrable if for every ε > 0 there exists a K ≥ 0 such that

  ∫_{|X|>K} |X| dP ≤ ε  for all X ∈ C.

The following lemma gives important examples of uniformly integrable classes.

Lemma A.3.3. Let C be a collection of random variables on a probability space (Ω, F, P).

(i) If the class C is bounded in L^p(P) for some p > 1, then C is uniformly integrable.
(ii) If C is uniformly integrable, then C is bounded in L¹(P).

Proof. Exercise, see Williams (1991).

Conditional expectations give us the following important example of a uniformly integrable class.

Lemma A.3.4. Let X be an integrable random variable. Then the class

  {E(X | G) : G is a sub-σ-algebra of F}

is uniformly integrable.

Proof. Exercise. See Williams (1991), Section 13.4.

Uniform integrability is what is needed to strengthen convergence in probability to convergence in L¹.

Theorem A.3.5. Let (X_n)_{n∈N} and X be integrable random variables. Then X_n → X in L¹ if and only if

(i) X_n → X in probability, and
(ii) the sequence (X_n) is uniformly integrable.

Proof. See Williams (1991), Section 13.7.

A.4 Monotone class theorem

There exist many formulations of the monotone class theorem. In Chapter 3 we use a version for monotone classes of sets.

Definition A.4.1. A collection S of subsets of Ω is called a monotone class if

(i) Ω ∈ S,
(ii) if A, B ∈ S and A ⊆ B, then B\A ∈ S,
(iii) if A_n is an increasing sequence of sets in S, then ∪A_n ∈ S.

Theorem A.4.2. Let S be a monotone class of subsets of Ω and let F be a class of subsets that is closed under finite intersections. Then F ⊆ S implies σ(F) ⊆ S.

Proof. See Appendix A1 of Williams (1991).

B Elements of functional analysis

B.1 Hahn-Banach theorem

Recall that a (real) vector space V is called a normed linear space if there exists a norm on V, i.e.
a map ‖·‖ : V → [0, ∞) such that

(i) ‖v + w‖ ≤ ‖v‖ + ‖w‖ for all v, w ∈ V,
(ii) ‖av‖ = |a| ‖v‖ for all v ∈ V and a ∈ R,
(iii) ‖v‖ = 0 if and only if v = 0.

A normed linear space may be regarded as a metric space, the distance between two vectors v, w ∈ V being given by ‖v − w‖.

If V, W are two normed linear spaces and A : V → W is a linear map, we define the norm ‖A‖ of the operator A by

  ‖A‖ = sup{‖Av‖ : v ∈ V and ‖v‖ = 1}.

If ‖A‖ < ∞, we call A a bounded linear transformation from V to W. Observe that by construction, it holds that ‖Av‖ ≤ ‖A‖ ‖v‖ for all v ∈ V. A bounded linear transformation from V to R is called a bounded linear functional on V.

We can now state the Hahn-Banach theorem.

Theorem B.1.1 (Hahn-Banach theorem). Let W be a linear subspace of a normed linear space V and let A be a bounded linear functional on W. Then A can be extended to a bounded linear functional on V, without increasing its norm.

Proof. See for instance Rudin (1987), pp. 104–107.

In Chapter 3 we use the following corollary of the Hahn-Banach theorem.

Corollary B.1.2. Let W be a linear subspace of a normed linear space V. If every bounded linear functional on V that vanishes on W vanishes on the whole space V, then W is dense in V.

Proof. Suppose that W is not dense in V. Then there exist a v ∈ V and an ε > 0 such that ‖v − w‖ > ε for all w ∈ W. Now let W′ be the subspace generated by W and v and define a linear functional A on W′ by putting A(w + λv) = λ for all w ∈ W and λ ∈ R. Note that for λ ≠ 0,

  ‖w + λv‖ = |λ| ‖v − (−λ^{−1}w)‖ ≥ |λ|ε,

hence |A(w + λv)| = |λ| ≤ ‖w + λv‖/ε. It follows that ‖A‖ ≤ 1/ε, so A is a bounded linear functional on W′. By the Hahn-Banach theorem, it can be extended to a bounded linear functional on the whole space V. Since this extension vanishes on W and A(v) = 1, the proof is complete.
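As a sanity check (our own remark, not part of the text): in a Hilbert space the corollary reduces to a familiar orthogonality statement. By the Riesz-Fréchet theorem every bounded linear functional on a Hilbert space H is of the form v ↦ ⟨v, h⟩ for some h ∈ H, so for a linear subspace W ⊆ H the corollary reads

```latex
\overline{W} = H
\quad\Longleftrightarrow\quad
\bigl(\,\langle w, h\rangle = 0 \ \text{for all } w \in W \ \Rightarrow\ h = 0\,\bigr)
\quad\Longleftrightarrow\quad
W^{\perp} = \{0\}.
```

In a general normed linear space no inner product is available, which is exactly where the Hahn-Banach extension is needed.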
B.2 Riesz representation theorem

Let E ⊆ R^d be an arbitrary set and consider the class C₀(E) of continuous functions on E that vanish at infinity. We endow C₀(E) with the supremum norm

  ‖f‖_∞ = sup_{x∈E} |f(x)|,

turning C₀(E) into a normed linear space (and even a Banach space). The Riesz representation theorem (the version that we consider here) describes the bounded linear functionals on the space C₀(E). If µ is a finite Borel measure on E, then clearly, the map

  f ↦ ∫_E f dµ

is a bounded linear functional on C₀(E) and its norm is equal to µ(E). The Riesz representation theorem states that every bounded linear functional on C₀(E) can be represented as the difference of two functionals of this type.

Theorem B.2.1. Let A be a bounded linear functional on C₀(E). Then there exist two finite Borel measures µ and ν on E such that

  A(f) = ∫_E f dµ − ∫_E f dν

for every f ∈ C₀(E).

Proof. See Rudin (1987), pp. 130–132.

References

Billingsley, P. (1995). Probability and measure. John Wiley & Sons Inc., New York, third edition.
Revuz, D. and Yor, M. (1991). Continuous martingales and Brownian motion. Springer-Verlag, Berlin.
Rudin, W. (1987). Real and complex analysis. McGraw-Hill Book Co., New York, third edition.
Williams, D. (1991). Probability with martingales. Cambridge Mathematical Textbooks. Cambridge University Press, Cambridge.