First, some notation: we think of $v \in \mathbb{R}^d$ as a column vector and $v^T$ is its transpose. Also, $v_i$ will denote the $i$-th coordinate of $v$. Similarly, $A^T$ is the transpose of the matrix $A$ and the entries of $A$ are denoted by $A_{i,j}$.

Let $\sigma : [0,t_0] \times \mathbb{R}^n \to \mathbb{R}^{n \times m}$ and $\mu : [0,t_0] \times \mathbb{R}^n \to \mathbb{R}^n$ be given and let $P_x$ be the law of the solution $X(t)$ to the SDE
\[
dX(t) = \mu(t, X(t))\,dt + \sigma(t, X(t))\,dB(t)
\]
with initial condition $X(0) = x$. Here, $B$ is standard $m$-dimensional Brownian motion (i.e. $m$ independent standard Brownian motions). Let us use the notation $E_x$ for expectation relative to $P_x$.

Standard theory says the above solution exists and is in fact a strong solution (i.e. it is unique in the almost-sure sense: you give me a Brownian motion and I give you THE solution) if, for example, there exists a constant $C$ such that
\[
\|\mu(t,x)\| + \|\sigma(t,x)\| \le C(1 + \|x\|), \qquad
\|\mu(t,x) - \mu(t,y)\| + \|\sigma(t,x) - \sigma(t,y)\| \le C\|x - y\|,
\]
for all $t \in [0,t_0]$ and $x, y \in \mathbb{R}^n$. Here $\|\cdot\|$ is the Euclidean norm. These conditions can be relaxed, but the proofs get more involved. (The proof of the above fact is very similar to the one for ODEs, using Picard's iteration; see Øksendal's book "Stochastic Differential Equations: An Introduction with Applications", Section 5.2.)

Let $f : \mathbb{R}^n \to \mathbb{R}$ be a bounded smooth function with bounded first and second derivatives. Then $Y(t) = f(X(t))$ is another stochastic process. But is it a diffusion? And if it is, what are its corresponding "$\mu$ and $\sigma$"? The answer is given by Itō's formula, which is simply a generalization of Taylor's expansion to stochastic calculus. Indeed, applying Taylor's expansion, one has
\[
dY(t) = \sum_{i=1}^n \frac{\partial f}{\partial x_i}(X(t))\,dX_i(t)
= \sum_{i=1}^n \frac{\partial f}{\partial x_i}(X(t))\,\mu_i(t,X(t))\,dt
+ \sum_{i=1}^n \sum_{j=1}^m \frac{\partial f}{\partial x_i}(X(t))\,\sigma_{i,j}(t,X(t))\,dB_j(t).
\]
This is not quite right, though. What Itō observed was that $(dB_i(t))^2$ contributes a $dt$! (This is obvious at least when one takes expectations.) Hence, one should continue Taylor's expansion to the second-order terms. The correct SDE for $Y$ is then
\[
dY(t) = \sum_{i=1}^n \frac{\partial f}{\partial x_i}(X(t))\,dX_i(t)
+ \frac12 \sum_{i,j=1}^n \frac{\partial^2 f}{\partial x_i \partial x_j}(X(t))\,dX_i(t)\,dX_j(t).
\]
Omitting terms of order higher than $dt$ (like $(dt)^2$, $(dt)(dB_i(t))$, and $(dB_i(t))(dB_j(t))$ for $i \ne j$) and replacing $(dB_i(t))^2$ by $dt$, we write
\[
dX_i(t)\,dX_j(t)
= \sum_{k,\ell=1}^m \sigma_{i,k}(t,X(t))\,dB_k(t)\,\sigma_{j,\ell}(t,X(t))\,dB_\ell(t)
= \sum_{k=1}^m \sigma_{i,k}(t,X(t))\,\sigma_{j,k}(t,X(t))\,dt.
\]
In other words, we can replace the matrix $(dX_i(t)\,dX_j(t))_{i,j}$ with the matrix $\sigma(t,X(t))[\sigma(t,X(t))]^T\,dt$. Let us abbreviate $a = \sigma\sigma^T$, which is a function from $[0,t_0] \times \mathbb{R}^n$ to the $n \times n$ symmetric nonnegative-definite matrices. We now have
\[
dY(t) = \Big[\sum_{i=1}^n \frac{\partial f}{\partial x_i}(X(t))\,\mu_i(t,X(t))
+ \frac12 \sum_{i,j=1}^n a_{i,j}(t,X(t))\,\frac{\partial^2 f}{\partial x_i \partial x_j}(X(t))\Big]\,dt
+ \sum_{i=1}^n \sum_{j=1}^m \frac{\partial f}{\partial x_i}(X(t))\,\sigma_{i,j}(t,X(t))\,dB_j(t).
\]
In other words, there is a second term that contributes to the drift part and that comes from the quadratic variation of the Brownian motion. The rigorous way to prove that $Y$ is a solution to the above SDE is via martingales. It is also shown in Øksendal's book, for example (look for Itō's formula). The above reasoning, though, is mostly what the proof is about.

Define now the operator
\[
L_t f(x) = \sum_{i=1}^n \frac{\partial f}{\partial x_i}(x)\,\mu_i(t,x)
+ \frac12 \sum_{i,j=1}^n a_{i,j}(t,x)\,\frac{\partial^2 f}{\partial x_i \partial x_j}(x).
\]
The SDE for $Y$ then implies that
\[
E_x[f(X(t))] - E_x[f(X(s))] = E_x\Big[\int_s^t dY(h)\Big] = E_x\Big[\int_s^t L_h f(X(h))\,dh\Big].
\]
(The Brownian motion part has mean zero and hence cancels out. This is rigorously justified by the fact that, for $s$ and $i$ fixed, $\int_s^t g(h, X(h))\,dB_i(h)$ is an Itō integral and hence a martingale.)
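Here is a minimal numerical sketch of this identity (with $s = 0$), assuming Python with NumPy and an arbitrary one-dimensional example, $\mu(x) = -x$, $\sigma \equiv 1$, $f(x) = \cos x$; these choices are illustrative assumptions and do not come from the notes. We simulate the SDE with the Euler-Maruyama scheme and compare Monte Carlo estimates of the two sides.
\begin{verbatim}
# Sketch: Monte Carlo check of  E_x[f(X_t)] - f(x) = E_x[ int_0^t L f(X_h) dh ]
# via the Euler-Maruyama scheme, for the assumed choices mu(x) = -x, sigma = 1,
# f(x) = cos(x).
import numpy as np

rng = np.random.default_rng(0)

mu = lambda x: -x                 # drift (assumed example)
sig = lambda x: 1.0               # diffusion coefficient (assumed example)
f = lambda x: np.cos(x)           # smooth bounded test function
# Generator: L f(x) = mu(x) f'(x) + (1/2) sig(x)^2 f''(x)
Lf = lambda x: mu(x) * (-np.sin(x)) + 0.5 * sig(x) ** 2 * (-np.cos(x))

x0, T, n_steps, n_paths = 1.0, 1.0, 1000, 100_000
dt = T / n_steps

X = np.full(n_paths, x0)
integral = np.zeros(n_paths)      # accumulates int_0^T L f(X_h) dh along each path
for _ in range(n_steps):
    integral += Lf(X) * dt
    dB = rng.normal(0.0, np.sqrt(dt), size=n_paths)
    X = X + mu(X) * dt + sig(X) * dB     # Euler-Maruyama step

print(f(X).mean() - f(x0))        # estimate of E_x[f(X_T)] - f(x)
print(integral.mean())            # estimate of E_x[ int_0^T L f(X_h) dh ]
\end{verbatim}
Up to discretization and Monte Carlo error, the two printed numbers should agree.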
Because $f$ is smooth enough (with bounded derivatives), one can exchange expectation and integration. Applying the identity on the interval $[t, t+\varepsilon]$, dividing by $\varepsilon$, and letting $\varepsilon \to 0$, one finds
\[
\frac{\partial E_x[f(X(t))]}{\partial t} = E_x[L_t f(X(t))].
\]
Since $f$ is nice, we can exchange derivatives and expectation and see that $u(t,x) = E_x[f(X(t))]$ satisfies the heat equation (known to probabilists as Kolmogorov's backward equation):
\[
\frac{\partial u}{\partial t} = L_t u \tag{0.1}
\]
with the obvious initial condition $u(0,x) = E_x[f(X(0))] = f(x)$. Note at this point that for Brownian motion ($\mu = 0$ and $\sigma = I$), $L_t = \frac12 \Delta$ and we have solved the standard heat equation (with a given initial condition)!

On the other hand, if $\mu$ and $\sigma$ are sufficiently smooth (i.e. $C^k$ for $k$ large enough) and if there exists a $\delta > 0$ such that
\[
\|\sigma(t,x)^T v\|^2 \ge \delta \|v\|^2,
\]
for all $t \in [0,t_0]$ and $x, v \in \mathbb{R}^n$ (equivalently, $v^T a v \ge \delta\, v^T v$, which is a "textbook" condition in parabolic PDEs!), one can show that the solution $X$ will have a smooth (jointly in $t$, $x$, and $y$) transition density $p_x(t,y)$. That is:
\[
P_x\{X(t) \in A\} = \int_A p_x(t,y)\,dy.
\]
(The above conditions can be considerably relaxed to what is called Hörmander's condition; see http://en.wikipedia.org/wiki/H%C3%B6rmander%27s_condition.)

We can then rewrite the above heat equation as
\[
\frac{\partial}{\partial t}\int f(y)\,p_x(t,y)\,dy = \int p_x(t,y)\,L_t f(y)\,dy.
\]
Then, since $p$ is nice, one can exchange derivative and integration and also, if $f$ has nice decay, one can use integration by parts to switch all the derivatives over from $f$ to $p_x$:
\[
\int f(y)\,\frac{\partial p_x}{\partial t}(t,y)\,dy = \int L_t^* p_x(t,y)\,f(y)\,dy,
\]
where $L_t^*$ is the adjoint operator
\[
L_t^* p(y) = -\sum_{i=1}^n \frac{\partial[\mu_i(t,y)p(y)]}{\partial y_i}
+ \frac12 \sum_{i,j=1}^n \frac{\partial^2[a_{i,j}(t,y)p(y)]}{\partial y_i \partial y_j}.
\]
Since this is true for a dense set of functions $f$, we conclude that $p_x$ satisfies
\[
\frac{\partial p_x}{\partial t} = L_t^* p_x. \tag{0.2}
\]
Obviously, we also have the initial condition $p_x(0,y) = \delta_x(y)$ (and so $p_x$ is really the fundamental solution of the heat equation (0.2)). This is known to probabilists as Kolmogorov's forward equation. Physicists call it the Fokker-Planck equation.
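For Brownian motion ($\mu = 0$, $\sigma = 1$, $n = 1$) the transition density is the Gaussian kernel $p_x(t,y) = (2\pi t)^{-1/2} e^{-(y-x)^2/(2t)}$, and one can check that it satisfies (0.2), which here reads $\partial p/\partial t = \frac12\, \partial^2 p/\partial y^2$. A small symbolic sketch, assuming Python with the sympy library:
\begin{verbatim}
# Symbolic check that the Brownian transition density solves the
# forward equation  dp/dt = (1/2) d^2 p / dy^2.
import sympy as sp

t = sp.symbols('t', positive=True)
x, y = sp.symbols('x y', real=True)
p = sp.exp(-(y - x) ** 2 / (2 * t)) / sp.sqrt(2 * sp.pi * t)

residual = sp.diff(p, t) - sp.Rational(1, 2) * sp.diff(p, y, 2)
print(sp.simplify(residual))   # prints 0
\end{verbatim}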
Now, say that $\mu$ and $\sigma$ are functions of $x$ only. In other words, we have the stochastic differential equation
\[
dX(t) = \mu(X(t))\,dt + \sigma(X(t))\,dB(t). \tag{0.3}
\]
Then to get $p_x(t,y)$ we solve (0.2) with initial condition $p_x(0,y) = \delta_x(y)$ and with an operator $L$ that no longer depends on time:
\[
L f(x) = \sum_{i=1}^n \frac{\partial f}{\partial x_i}(x)\,\mu_i(x)
+ \frac12 \sum_{i,j=1}^n a_{i,j}(x)\,\frac{\partial^2 f}{\partial x_i \partial x_j}(x)
\]
and
\[
L^* p(y) = -\sum_{i=1}^n \frac{\partial[\mu_i(y)p(y)]}{\partial y_i}
+ \frac12 \sum_{i,j=1}^n \frac{\partial^2[a_{i,j}(y)p(y)]}{\partial y_i \partial y_j}.
\]
Once one solves the above and gets a formula for $p_x(t,y)$, one can take $t \to \infty$ and get a pdf $p(y)$. This function in fact solves the equation $L^* p = 0$ (with the condition $\int p(y)\,dy = 1$). If now we choose $X(0)$ according to this pdf $p(y)$ (i.e. $P\{X(0) \in A\} = \int_A p(y)\,dy$) and then evolve $X(t)$ according to (0.3), then for all $t \ge 0$ the random variable $X(t)$ will have the same distribution:
\[
P\{X(t) \in A\} = \int_A p(y)\,dy.
\]
That is, $X(t)$ is a stationary process.

This is not doable for Brownian motion, i.e. for $\mu = 0$ and $\sigma = 1$. (Try to solve $p''(x) = 0$ on the real line with the condition $\int p(x)\,dx = 1$ and see what you get!) But it is doable, for example, for the Ornstein-Uhlenbeck process:
\[
dX(t) = a(m - X(t))\,dt + \sigma\,dB(t).
\]
So here $\mu(x) = a(m - x)$ (where $a > 0$ and $m$ is a real number) and $\sigma(x) = \sigma$ is just a positive constant. If we set $\sigma = 0$ then we see that $X(t) \to m$ exponentially fast. When we add the noise (i.e. $\sigma > 0$), $X(t)$ no longer converges to $m$, but it does "stabilize" in the sense that its distribution reaches a steady state. As explained above, the equation for this stationary distribution is given by
\[
\frac{d}{dx}\big[a(x - m)p(x)\big] + \frac{\sigma^2}{2}\,\frac{d^2 p}{dx^2} = 0.
\]
Its solution is given by
\[
p(x) = \sqrt{\frac{a}{\pi\sigma^2}}\,e^{-a(x-m)^2/\sigma^2}.
\]
In other words, the stationary distribution of the Ornstein-Uhlenbeck process is the Normal distribution with mean $m$ and variance $\sigma^2/(2a)$.
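As a final numerical illustration (a sketch assuming Python with NumPy; the parameter values below are arbitrary choices, not taken from these notes), one can simulate the Ornstein-Uhlenbeck SDE with the Euler-Maruyama scheme and check that, for large $t$, the empirical mean and variance of $X(t)$ are close to $m$ and $\sigma^2/(2a)$:
\begin{verbatim}
# Sketch: long-run distribution of the Ornstein-Uhlenbeck process
#   dX = a(m - X) dt + sigma dB,
# simulated with Euler-Maruyama for arbitrary (assumed) parameter values.
import numpy as np

rng = np.random.default_rng(1)
a, m, sigma = 2.0, 1.0, 0.5
T, n_steps, n_paths = 10.0, 2000, 50_000
dt = T / n_steps

X = np.zeros(n_paths)                       # start every path at X(0) = 0
for _ in range(n_steps):
    dB = rng.normal(0.0, np.sqrt(dt), size=n_paths)
    X = X + a * (m - X) * dt + sigma * dB   # Euler-Maruyama step

print(X.mean(), m)                   # empirical mean vs. m
print(X.var(), sigma**2 / (2 * a))   # empirical variance vs. sigma^2/(2a)
\end{verbatim}
By time $T = 10$ (many multiples of the relaxation time $1/a$), the distribution of $X(T)$ is essentially the stationary one, so both comparisons should agree up to Monte Carlo and discretization error.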