STOCHASTIC CALCULUS, SPRING 2023, MARCH 27, LECTURE 1
CONSTRUCTION OF BROWNIAN MOTION

this version March 27, 2023

Contents
1. Brownian Motion and its motivation
2. Symmetric Random Walk
2.1. Increments of the SRW
2.2. Martingale property of the SRW
2.3. Quadratic variation of the SRW
3. Scaled SRW
3.1. Limiting distribution of the scaled SRW
4. Brownian Motion: a formal definition
References

Reading for this lecture (for references see the end of the lecture):
• [1] pp. 72-108

1. Brownian Motion and its motivation
What is Brownian motion? Loosely speaking, Brownian motion is a "random path":
• The path is continuous;
• At any given point in time, where the path goes next is independent of where the path came from;
• At any given point in time, the path's displacement over any finite time interval is normally distributed.
Does such a process exist? If so, how do we construct it?

2. Symmetric Random Walk
It is always easier to start with the discrete case, so let us introduce the symmetric random walk (SRW). Let the X_j's be i.i.d. with P(X_j = 1) = 0.5 and P(X_j = −1) = 0.5. We now define M(n) as follows:

    M(0) = 0, \qquad M(n) = \sum_{j=1}^{n} X_j.

The process M(n) is called the symmetric random walk. With each step it moves up or down one unit, and each of the two possibilities is equally likely.

Exercise 1. For fixed positive integers k, n, what is the probability P(M(n) = k)?

2.1. Increments of the SRW. The SRW has independent increments: if we choose nonnegative integers 0 = n_0 < n_1 < · · · < n_k, then the random variables

    M(n_1) = M(n_1) − M(n_0), \quad M(n_2) − M(n_1), \quad \dots, \quad M(n_k) − M(n_{k−1})

are independent. Each of these random variables

    M(n_{i+1}) − M(n_i) = \sum_{j=n_i+1}^{n_{i+1}} X_j

is called an increment of the random walk. It is the change in the position of the random walk between times n_i and n_{i+1}. Increments over non-overlapping time intervals are indeed independent because the X_j's are independent.
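Exercise 1 can be sanity-checked numerically: reaching M(n) = k requires exactly (n + k)/2 up-steps, so P(M(n) = k) = C(n, (n+k)/2) 2^{−n} when n + k is even and |k| ≤ n, and 0 otherwise. A minimal sketch (the helper names srw_pmf and srw_pmf_bruteforce are ours):

```python
from itertools import product
from math import comb

def srw_pmf(n: int, k: int) -> float:
    """P(M(n) = k) for the symmetric random walk: with u up-steps,
    u - (n - u) = k forces u = (n + k)/2, an integer in [0, n]."""
    if (n + k) % 2 != 0 or abs(k) > n:
        return 0.0
    return comb(n, (n + k) // 2) / 2 ** n

def srw_pmf_bruteforce(n: int, k: int) -> float:
    """Same probability, by enumerating all 2^n equally likely paths."""
    hits = sum(1 for steps in product((-1, 1), repeat=n) if sum(steps) == k)
    return hits / 2 ** n

print(srw_pmf(4, 2))  # C(4, 3) / 16 = 0.25
```

Note that P(M(n) = k) = 0 whenever n + k is odd: the parity of the walk alternates with each step.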
Moreover,

    E(M(n_{i+1}) − M(n_i)) = \sum_{j=n_i+1}^{n_{i+1}} E X_j = 0

and

    Var(M(n_{i+1}) − M(n_i)) = \sum_{j=n_i+1}^{n_{i+1}} Var X_j = n_{i+1} − n_i,

since Var X_j = 1.

2.2. Martingale property of the SRW. In order to define a martingale we need the notion of a σ-algebra. There is a very important, nontechnical reason to include σ-algebras in the study of stochastic processes, and that is to keep track of information. The temporal feature of a stochastic process suggests a flow of time, in which, at every moment t ≥ 0, we can talk about a past, present and future, and can ask how much an observer of the process knows about it at present, as compared to how much he knew at some point in the past or will know at some point in the future.

Definition 1. A filtration on (Ω, F) is a family {F_t}_{t≥0} of σ-algebras F_t ⊂ F such that

    0 ≤ s ≤ t \implies F_s ⊂ F_t,

i.e., {F_t}_{t≥0} is increasing.

Thus, informally, one can think of the σ-algebra F_t in the above definition as the "knowledge" available at time t. The fact that {F_t}_{t≥0} is monotonically increasing just reflects the fact that as time goes on the amount of available information increases.

Definition 2. A stochastic process {Z_t}_{t≥0} is called a martingale with respect to a filtration {F_t}_{t≥0} if
a): Z_t is F_t-measurable for all t;
b): E|Z_t| < ∞ for all t;
c): E(Z_t | F_s) = Z_s for all s ≤ t.

The most important property in the above definition is property c). Simply put, it states that the best guess for Z_t given the information available at moment s is Z_s, and thus the process Z_t has neither upward nor downward drift.

Now let us check that the SRW is a martingale. First, we need to specify a filtration {F_t}_{t≥0}. Let F_n be the σ-algebra (read: information) generated by the random variables X_1, X_2, . . . , X_n.
a): M(n) = \sum_{j=1}^{n} X_j is F_n-measurable, since it depends only on X_j, j ≤ n, i.e. on the information available at time n.
b): From the fact that |M(n)| ≤ n we get E|M(n)| ≤ n < ∞.
c): Using the fact that E(M(s) | F_s) = M(s) and the fact that the random variable M(t) − M(s) is independent of the σ-algebra F_s, we can write

    E(M(t) | F_s) = E(M(t) − M(s) + M(s) | F_s) = E(M(t) − M(s) | F_s) + E(M(s) | F_s) = E(M(t) − M(s)) + M(s) = 0 + M(s) = M(s).    (1)

2.3. Quadratic variation of the SRW. The quadratic variation of a discrete stochastic process up to time t is defined as

    ⟨M, M⟩_t = \sum_{j=1}^{t} (M(j) − M(j − 1))^2.    (2)

The quadratic variation up to time t along a path is computed by taking all the one-step increments M(j) − M(j − 1) along that path, squaring these increments, and then summing them. Clearly, for the symmetric random walk the increments can take only the values ±1, and thus

    ⟨M, M⟩_t = t.    (3)

Note that in general the quadratic variation of a stochastic process is itself stochastic. The fact that the quadratic variation in this case is deterministic is a special feature of the SRW.

3. Scaled SRW
To approximate a Brownian motion we speed up time and scale down the step size of a SRW. More precisely, we fix a positive integer n and define the scaled SRW at the rational points k/n as

    B^{(n)}(k/n) = \frac{1}{\sqrt{n}} M(k).    (4)

At all other points we define B^{(n)}(t) by linear interpolation between its values at the nearest points of the form k/n. The following properties of the scaled SRW can easily be proved using the corresponding properties of the SRW, and we leave their proof as an exercise:
a): independence of increments, i.e., for all rational numbers 0 = t_0 < t_1 < · · · < t_n of the form k/n the random variables

    B^{(n)}(t_1) − B^{(n)}(t_0), \quad B^{(n)}(t_2) − B^{(n)}(t_1), \quad \dots, \quad B^{(n)}(t_n) − B^{(n)}(t_{n−1})

are independent;
b): E(B^{(n)}(t) − B^{(n)}(s)) = 0, Var(B^{(n)}(t) − B^{(n)}(s)) = t − s;
c): E(B^{(n)}(t) | F_s) = B^{(n)}(s);
d): quadratic variation of the scaled SRW:

    ⟨B^{(n)}, B^{(n)}⟩_t = t.    (5)

3.1. Limiting distribution of the scaled SRW.
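The moment computations of Section 2.1 and the deterministic quadratic variation in (3) and (5) lend themselves to a quick computational check. The sketch below enumerates all sign patterns to get exact increment moments, then verifies ⟨M, M⟩_t = t along a simulated path (the function names, the seed, and the chosen n are our own illustrative choices):

```python
import random
from itertools import product
from math import sqrt

def increment_moments(m: int) -> tuple:
    """Exact mean and variance of an SRW increment over m steps,
    computed by enumerating all 2^m equally likely sign patterns."""
    sums = [sum(p) for p in product((-1, 1), repeat=m)]
    mean = sum(sums) / len(sums)
    var = sum((s - mean) ** 2 for s in sums) / len(sums)
    return mean, var

def srw_path(steps: int, rng: random.Random) -> list:
    """A sample path M(0), M(1), ..., M(steps) of the SRW."""
    path = [0]
    for _ in range(steps):
        path.append(path[-1] + rng.choice((-1, 1)))
    return path

def quadratic_variation(path) -> float:
    """<M, M>_t: the sum of squared one-step increments along the path."""
    return sum((path[j] - path[j - 1]) ** 2 for j in range(1, len(path)))

print(increment_moments(3))  # mean 0, variance 3 = n_{i+1} - n_i
n = 100
m = srw_path(n, random.Random(0))
print(quadratic_variation(m))  # equals 100 for every path, as in (3)
# Scaled walk: each squared increment is 1/n and there are n of them
# per unit of time, so <B^(n), B^(n)>_1 = 1, as in (5).
print(quadratic_variation([x / sqrt(n) for x in m]))  # ~1.0
```

The point of the last two lines is that the quadratic variation does not depend on the seed: any path gives the same answer.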
Now the idea is to let n → ∞ in (4); in the limit we obtain something satisfying the intuitive properties outlined in Section 1. By definition B^{(n)}(t) = \frac{1}{\sqrt{n}} M(nt), or, if t = k/n, then B^{(n)}(t) = \frac{1}{\sqrt{n}} M(k).

Since M(k) = \sum_{j=1}^{k} X_j = 2S_k − k, where S_k (the number of up-steps among the first k) has a binomial distribution with parameters k and 1/2, one can easily calculate the distribution of B^{(n)}(t). In particular, one can draw the histogram of B^{(n)}(t). We know that the random variable B^{(n)}(t) has mean 0 and variance t. If we draw, on top of the histogram of B^{(n)}(t), the graph of the normal density with mean zero and variance t, we will see that the distribution of B^{(n)}(t) is nearly normal. Actually, the Central Limit Theorem asserts that B^{(n)}(t) converges in distribution to a normal random variable N(0, t). But let us not use this theorem; instead, we prove the convergence of B^{(n)}(t) from scratch. The whole point of this exercise is learning a very powerful mathematical tool known as characteristic functions.

Theorem 3. For any fixed t ≥ 0, the distribution of the scaled random walk B^{(n)}(t) evaluated at time t converges to the normal distribution with mean zero and variance t.

Proof. To prove this theorem we use a tool from probability theory: characteristic functions.

Definition 4. Let X be a random variable with distribution P. The characteristic function of X is defined as

    ϕ_X(u) = E e^{iuX} = \int e^{iux} \, dP(x)    (6)

for every u ∈ R.

Example.
1) X = Be(p). Then ϕ_X(u) = p(e^{iu} − 1) + 1.
2) X = Bi(n, p). Then

    P(X = k) = C_n^k p^k (1 − p)^{n−k}    (7)

and ϕ_X(u) = (p(e^{iu} − 1) + 1)^n.
3) X = Poisson(λ). Then

    P(X = k) = e^{−λ} \frac{λ^k}{k!}    (8)

and

    ϕ_X(u) = \sum_{k=0}^{∞} e^{iuk} e^{−λ} \frac{λ^k}{k!} = e^{−λ} \sum_{k=0}^{∞} \frac{(λ e^{iu})^k}{k!} = e^{λ(e^{iu} − 1)}.    (9)

4) X is a Gaussian random variable N(m, σ^2). Then

    ϕ_X(u) = e^{ium − u^2 σ^2/2}.    (10)

The importance of characteristic functions can be summarized in the following two theorems:
1. (Uniqueness.) If two random variables have the same characteristic function, then they have the same distribution. (Of course, if two random variables have the same distribution, then they have the same characteristic functions.)
2. (Paul Lévy's theorem.) Let X_n be a sequence of random variables on R. Then
a) If X_n converges to X in distribution, then ϕ_{X_n}(u) → ϕ_X(u) for every u, and in fact this convergence is uniform.
b) If ϕ_{X_n}(u) → ϕ_X(u) for every u, then X_n converges to X in distribution.
c) If ϕ_{X_n}(u) converges to some function ϕ(u) and ϕ(u) is continuous at 0, then there exists a unique random variable X such that ϕ is the characteristic function of X, and X_n converges to X in distribution.

We use part b) of the theorem to prove that B^{(n)}(t) converges to the normal distribution with expectation zero and variance t. Using the independence of the X_j's,

    ϕ_{B^{(n)}(t)}(u) = E e^{iuB^{(n)}(t)} = E e^{iu \frac{1}{\sqrt{n}} M(nt)} = E e^{iu \frac{1}{\sqrt{n}} \sum_{j=1}^{nt} X_j} = \prod_{j=1}^{nt} E e^{iu \frac{1}{\sqrt{n}} X_j} = \left( \frac{1}{2} e^{iu/\sqrt{n}} + \frac{1}{2} e^{−iu/\sqrt{n}} \right)^{nt}.    (11)

We need to show that, as n → ∞,

    ϕ_n(u) = \left( \frac{1}{2} e^{iu/\sqrt{n}} + \frac{1}{2} e^{−iu/\sqrt{n}} \right)^{nt}

converges to the characteristic function of the normal random variable with mean 0 and variance t, i.e., to e^{−u^2 t/2}. Expanding in a Taylor series (the two exponentials sum to a cosine, so the odd-order terms cancel),

    \frac{1}{2} e^{iu/\sqrt{n}} + \frac{1}{2} e^{−iu/\sqrt{n}} = \cos(u/\sqrt{n}) = 1 − \frac{u^2}{2n} + O(1/n^2),    (12)

and thus

    ϕ_n(u) = \left( 1 − \frac{u^2}{2n} + O(1/n^2) \right)^{nt} → e^{−\frac{u^2}{2n} \cdot nt} = e^{−u^2 t/2}.

Thus we have proved that B^{(n)}(t) converges in distribution to a normal random variable with expectation zero and variance t.

Remark 5. Strictly speaking, we have only demonstrated that for a fixed moment t the distribution of B^{(n)}(t) converges to N(0, t); we did not prove the convergence of the whole stochastic process to Brownian motion. To do this one has to prove the convergence of the finite-dimensional distributions and use Prokhorov's theorem.
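Both the closed forms in the Example and the limit in (11)–(12) can be checked numerically: a discrete characteristic function can be summed straight from the pmf, and ϕ_n(u) = cos(u/√n)^{nt} visibly approaches e^{−u²t/2}. A small sketch (the series truncation length and the particular u, λ, t, n values are our illustrative choices):

```python
import cmath
from math import cos, exp, factorial, sqrt

def phi_poisson_direct(u: float, lam: float, terms: int = 60) -> complex:
    """E[e^{iuX}] for X = Poisson(lam), summed straight from the pmf (8)."""
    return sum(cmath.exp(1j * u * k) * exp(-lam) * lam**k / factorial(k)
               for k in range(terms))

def phi_poisson_closed(u: float, lam: float) -> complex:
    """Closed form e^{lam(e^{iu} - 1)} from (9)."""
    return cmath.exp(lam * (cmath.exp(1j * u) - 1))

def phi_n(u: float, t: float, n: int) -> float:
    """Characteristic function (11) of B^{(n)}(t): cos(u/sqrt(n))^(nt);
    n*t should be an integer so that nt counts whole steps."""
    return cos(u / sqrt(n)) ** round(n * t)

# Example (9): truncated series vs. closed form -- difference ~0.
u, lam = 1.3, 2.0
print(abs(phi_poisson_direct(u, lam) - phi_poisson_closed(u, lam)))

# Convergence (12): phi_n(u) -> e^{-u^2 t/2} as n grows.
u, t = 1.0, 2.0
limit = exp(-u * u * t / 2)
for n in (10, 100, 1000, 10000):
    print(n, abs(phi_n(u, t, n) - limit))  # error shrinks roughly like 1/n
```

The 1/n decay of the error matches the O(1/n^2) term in (12), which after raising to the power nt contributes O(1/n) to the exponent.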
4. Brownian Motion: a formal definition
One-dimensional Brownian motion is a continuous-time stochastic process that satisfies the following properties:
Independence of increments.: If t_0 < t_1 < · · · < t_n, then the random variables B(t_0), B(t_1) − B(t_0), . . . , B(t_n) − B(t_{n−1}) are independent.
Normally distributed increments.: If s, t ≥ 0, then for every (measurable) set A

    P[B(t + s) − B(s) ∈ A] = \int_A \frac{1}{\sqrt{2\pi t}} e^{−x^2/2t} \, dx.    (13)

Continuous trajectories.: With probability one, B(0) = 0 and t → B(t) is continuous.

References
[1] Steven Shreve, Stochastic Calculus for Finance II: Continuous-Time Models.
[2] Richard Durrett, Stochastic Calculus: A Practical Introduction.
[3] Ioannis Karatzas, Steven E. Shreve, Brownian Motion and Stochastic Calculus.
[4] Bernt Oksendal, Stochastic Differential Equations.