Stochastic Processes Basics

Probability Spaces
• A probability space comprises: a set Ω, a structure Σ (a σ-algebra), and a measure of confidence P (a probability)

σ-algebra
• A σ-algebra Σ is the class of subsets of Ω to which we wish to assign probabilities (the measurable sets)
• Ω ∈ Σ
• If A ∈ Σ then A^C ∈ Σ
• If {A_i} are all in Σ, then ∪_i A_i ∈ Σ

Probability
• Assigns a measure of confidence to subsets A ⊆ Ω
• Let A ∈ Σ; then 0 ≤ P(A) ≤ 1 is called the probability of A
• Let A, B ∈ Σ be disjoint; then P(A ∪ B) = P(A) + P(B)
• Let {A_i} ⊆ Σ be disjoint; then P(∪_i A_i) = ∑_i P(A_i)
• P(Ω) = 1

Intersections of σ-algebras
• There is at least one σ-algebra of subsets of Ω, namely 2^Ω (the power set)
• If Σ_1 and Σ_2 are both σ-algebras, then Σ_1 ∩ Σ_2 is a σ-algebra (exercise)
• Let Γ be a class of σ-algebras; then ∩_{Σ ∈ Γ} Σ is a σ-algebra (exercise)

Generating a σ-algebra
• Let Λ be a class of subsets of Ω
• Let Γ be the class of σ-algebras such that Λ ⊆ Σ for all Σ ∈ Γ
• Σ(Λ) = ∩_{Σ ∈ Γ} Σ
• Σ(Λ) is called the σ-algebra generated by Λ
• Σ(Λ) is the smallest σ-algebra including Λ (prove: exercise)

Example
• Ω = N (natural numbers) = {1,2,...}
• Λ = {{1}}
• Σ(Λ) = {N, ∅, N\{1}, {1}}

More examples
• Ω = R (real numbers)
• Λ = the open intervals of R
• Σ(Λ) is called the Borel sets
• If Λ is a σ-algebra then Σ(Λ) = Λ (prove: exercise)

Examples (probabilities)
• Ω = N (natural numbers) = {1,2,...}
• Σ = 2^N (all subsets)
• P(n) = 1/n (wrong; why: exercise)
• P(n) = K/n² (find K: exercise)
• Λ = {{1}}
• Σ(Λ) = {N, ∅, N\{1}, {1}}
• P({1}) = 0.5
• Specifying P(A) for all A ∈ Λ is sufficient for specifying P(A) for all A ∈ Σ(Λ)

Random (stochastic) variables
• Consider two sets Ω and X, each equipped with a σ-algebra, Σ and F resp.
• A mapping x: Ω → X is called measurable if for every A ∈ F, x⁻¹(A) ∈ Σ (inverse images of measurable sets are measurable)
• There exist measurable mappings x and sets A ∈ Σ such that x(A) ∉ F !!! (Souslin)
• A random variable is a measurable mapping
• Piecewise continuous mappings g: R → R are measurable w.r.t. the Borel sets

Induced probability distributions
• Let (Ω, Σ, P) be a probability space
• Let x: Ω → R be measurable (w.r.t. the Borel sets)
• Then for every Borel set A ⊆ R define P_x(A) = P(x⁻¹(A))
• Define F: R → [0,1] by F(y) = P_x((−∞, y])
• F is called a distribution function (CDF)

Integration
• Let A be measurable, let P be a probability, and let χ_A be the indicator function of A; then we define its integral by ∫ χ_A dP = ∫_A dP = P(A)
• A step function (simple function) is a weighted sum of indicator functions, i.e. g = ∑_i a_i χ_{A_i}
• ∫ g dP = ∑_i a_i P(A_i)
• For a positive measurable mapping g, define the integral ∫ g dP = sup { ∫ g' dP : g' a simple function, g' ≤ g }
• Generally, for a measurable mapping g: ∫ g dP = ∫ g⁺ dP − ∫ g⁻ dP

Integrals Examples
• Let g: [0,1] → R be continuous
• μ([a,b]) = b − a (Lebesgue measure)
• Then ∫ g dμ is the Riemann integral
• Let g: [0,1] → {0,1} be defined by: g(x) = 1 for x ∈ Q (rational) and g(x) = 0 elsewhere
• Let μ be the Lebesgue measure; then ∫ g dμ = 0, whereas the Riemann integral is undefined (prove: exercise)

Integrals Examples
• Let P(A) = χ(x ∈ A) (probability measure concentrated on x)
• P({x}) = 1
• Then ∫ g dP = g(x) (sampling - Dirac)
• Let P be concentrated on a sequence of points {x_i}, i.e. P(A) = ∑_i P_i χ(x_i ∈ A) (so P({x_i}) = P_i); then ∫ g dP = ∑_i P_i g(x_i) (exercise: prove)
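The point-mass example above lends itself to a quick numerical check. Below is a minimal Python sketch (not from the slides; the support points, weights and integrand are arbitrary choices) that computes ∫ g dP = ∑_i P_i g(x_i) for a measure concentrated on finitely many points and compares it with a Monte Carlo average over samples drawn from that measure.

```python
# Numerical sketch (illustrative values, not from the slides): integrating g
# against a measure concentrated on points {x_i} with weights {P_i},
# i.e. integral g dP = sum_i P_i g(x_i), checked against Monte Carlo sampling.
import numpy as np

rng = np.random.default_rng(0)

x_pts = np.array([0.0, 1.0, 2.0, 5.0])          # support points x_i (arbitrary)
P_i   = np.array([0.1, 0.2, 0.3, 0.4])          # weights, sum to 1
g     = lambda x: x**2                           # integrand g (arbitrary)

exact = np.sum(P_i * g(x_pts))                   # integral g dP = sum_i P_i g(x_i)

samples = rng.choice(x_pts, size=200_000, p=P_i) # draw omega ~ P
mc      = g(samples).mean()                      # Monte Carlo estimate of E[g]

print(f"sum_i P_i g(x_i) = {exact:.4f},  Monte Carlo = {mc:.4f}")
```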
Density functions
• Let two measures μ and P be defined on some measurable space
• Assume μ(A) = 0 ⟹ P(A) = 0; then we say that P is absolutely continuous (AC) w.r.t. μ
• Theorem (Radon-Nikodym): Let P be AC w.r.t. μ; then a measurable function f: Ω → R exists such that P(A) = ∫_A f dμ
• f is called the density function of P

Density functions Example
• Consider the reals R, the Borel sets and the Lebesgue measure μ
• Let P be absolutely continuous w.r.t. the Lebesgue measure (so in particular P({x}) = 0 for every x ∈ R)
• Then f exists such that P(A) = ∫_A f dμ (also written ∫_A f(x) dx)

Conditional Probabilities
• Classic (Bayes): let A, B be subsets; P(A|B) = P(A ∩ B) / P(B)
• In measure theory: let P be a probability measure and let F ⊆ Σ be σ-algebras (F is a sub-σ-algebra of Σ)
• Define, for A ∈ Σ, P(A|F) to be some function, measurable in F, such that for every F ∈ F: P(A ∩ F) = ∫_F P(A|F) dP

Conditional Probabilities
• Theorem: P(A|F) exists !!
• Proof: Let μ_A(F) = P(A ∩ F) (a measure on F). μ_A is AC w.r.t. P, so Radon-Nikodym guarantees the existence of P(A|F), measurable in F, i.e. P(A ∩ F) = μ_A(F) = ∫_F P(A|F) dP

An identity for conditional probabilities
• Let F ⊆ Σ be σ-algebras
• If P(A|Σ) is F-measurable
• Then: P(A|F) = P(A|Σ) w.P.1
• Proof: for all F ∈ F: P(A ∩ F) = ∫_F P(A|Σ) dP = ∫_F P(A|F) dP

Conditional Probabilities
• Theorem: P(A|F) is almost unique, i.e. if g and g' are both candidates then ∫_F g dP = ∫_F g' dP for all F ∈ F (hence g = g' w.P.1)
• Proof: (exercise)

Conditional Probabilities Examples
• Let Ω = R²
• Let Σ be the smallest σ-algebra including B × B (B the Borel sets)
• Let x and y: R² → R (the natural projections) be two random variables
• Let Σ_x and Σ_y be the smallest σ-algebras such that x and y are measurable (generated by x and y resp.)
• Both Σ_x ⊆ Σ and Σ_y ⊆ Σ

Conditional Probabilities Examples
• Both Σ_x ⊆ Σ and Σ_y ⊆ Σ
• Define P(y ∈ Y | Σ_x) = P(y⁻¹(Y) | Σ_x) to be some function measurable in Σ_x, such that P(y⁻¹(Y) ∩ x⁻¹(X)) = ∫_{x⁻¹(X)} P(y ∈ Y | Σ_x) dP
• We define P(y ∈ Y | x) = P(y ∈ Y | Σ_x) = P_{y|x}(Y)

Conditional Probabilities Examples
• Lemma: Σ_x = {B × R | B ∈ B}
• Proof: {B × R | B ∈ B} is a σ-algebra, and x⁻¹(B) = B × R for any B ∈ B. Assume B × R ∉ Σ_x for some B ∈ B; then x⁻¹(B) ∉ Σ_x (contradiction). So {B × R | B ∈ B} is minimal

Conditional Probabilities Examples
• Lemma: P(y ∈ Y | x) is a function of x
• Proof: Define g(x,y) = P(y ∈ Y | Σ_x) (= P(y ∈ Y | x)). Assume g(x,y_1) = g_1 ≠ g_2 = g(x,y_2) for some y_1 ≠ y_2. Then (x,y_1) ∈ g⁻¹(g_1) and (x,y_2) ∉ g⁻¹(g_1), so g⁻¹(g_1) ∉ {B × R | B ∈ B} = Σ_x (contradiction). q.e.d.

Moments
• Let x be a random variable
• Expectation: E(x) = ∫ x dP
• p-moment: E(|x|^p)
• Variance: E((x − E(x))²) = E(x²) − E²(x)

Conditional moments
• Let F ⊆ Σ be σ-algebras
• Let x and y be Σ-measurable (random variables)
• E(y|F) is an F-measurable function such that for all F ∈ F: ∫_F E(y|F) dP = ∫_F y dP
• Define μ_F(A) = P(A|F)
• Then (prove as an exercise) E(y|F) = ∫ y dμ_F

Conditional moments
• E(y|x) = E(y|Σ_x) is measurable in Σ_x (a function of x) (exercise: prove), such that for A ∈ Σ_x: E(y·χ(x ∈ A)) = ∫_A y dP = ∫_A E(y|Σ_x) dP = ∫_A E(y|x) dP
• E(y|x) = ∫ y dP_{y|x} (exercise: prove)
• A numerical sketch of E(y|x) is given below, after the stochastic process definition

An identity for conditional expectations
• Let F ⊆ Σ be σ-algebras
• If E(y|Σ) is F-measurable, then E(y|F) = E(y|Σ) w.P.1
• Proof: for all F ∈ F: ∫_F E(y|F) dP = ∫_F y dP = ∫_F E(y|Σ) dP

Another identity for conditional expectations
• Let F ⊆ Σ be σ-algebras
• If y is F-measurable, then E(y|F) = y w.P.1
• Proof: for all F ∈ F: ∫_F E(y|F) dP = ∫_F y dP

Stochastic Process
• Let T be an index (time) set (R, R+, Z, N)
• Let x: Ω × T → R, where for each t ∈ T, x(·,t) is a random variable
• Then we say that x is a stochastic (random) process
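As promised above, here is a small numerical illustration of the defining property of E(y|x), namely that it is a function of x with ∫_A y dP = ∫_A E(y|x) dP for A ∈ Σ_x. The joint law (y = x² + noise) and the set A are arbitrary choices for the example, and E(y|x) is approximated crudely by averaging y within narrow bins of x.

```python
# Numerical sketch (assumed example, not from the slides): estimate E(y|x) by
# binning on x, then check the defining property
#   integral_{x in A} y dP  =  integral_{x in A} E(y|x) dP   for A in Sigma_x.
import numpy as np

rng = np.random.default_rng(1)
n   = 500_000
x   = rng.normal(size=n)
y   = x**2 + rng.normal(size=n)       # arbitrary joint law, here E(y|x) = x^2

# crude version of E(y|x): average of y within narrow x-bins (a function of x only)
bins      = np.linspace(-4, 4, 81)
idx       = np.digitize(x, bins)
bin_means = np.array([y[idx == k].mean() if np.any(idx == k) else 0.0
                      for k in range(len(bins) + 1)])
E_y_given_x = bin_means[idx]

A = (x > 0.5) & (x < 1.5)             # the set A = x^{-1}((0.5, 1.5)) in Sigma_x
print("average of y      on A:", y[A].mean())
print("average of E(y|x) on A:", E_y_given_x[A].mean())
```

Both averages estimate the same quantity (the defining integrals divided by P(A)), so they should agree up to the binning approximation and Monte Carlo noise.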
Stochastic Process Example
• Let Ω = {0,1}
• P(1) = P(0) = 1/2
• x(0,t) = 0
• x(1,t) = t
• E(x(·,t)) = ∫ x(·,t) dP = ½·0 + ½·t = t/2
• E(x²(·,t)) = ∫ x²(·,t) dP = ½·0 + ½·t² = t²/2

Cylinders
• Let {t_1,...,t_n} be a finite ordered subset of T
• Let {A_1,...,A_n} be subsets of R
• Then C = {x(·,t_1) ∈ A_1} ∩ ... ∩ {x(·,t_n) ∈ A_n} ⊆ Ω is called a cylinder
• Let C be defined as above; then C_h = {x(·,t_1+h) ∈ A_1} ∩ ... ∩ {x(·,t_n+h) ∈ A_n}

Stationarity
• A random process is stationary iff P(C_h) = P(C) for all cylinders C and all h
• x stationary ⟹ E(x(·,t)) = E(x) = E (constant in t)
• x stationary ⟹ E(x²(·,t)) = E(x²)
• x stationary ⟹ E((x(·,t+h) − E)(x(·,t) − E)) = C_xx(h)
• stationarity ⟹ wide-sense stationarity

Filtrations
• A filtration is an increasing set of σ-algebras {Σ_t}
• Σ_t ⊆ Σ
• t_1 < t_2 ⟹ Σ_{t_1} ⊆ Σ_{t_2}
• Let x be a random process
• Σ^x_t = σ(x(·,τ)⁻¹(C) for τ ≤ t) (i.e. the smallest σ-algebra including all inverse cylinder sets of x(·,τ), τ ≤ t)
• Σ^x = {Σ^x_t} is called the natural filtration of x

Adapted processes
• A random process x is adapted to a filtration {Σ_t} iff for every t, x(·,t) is measurable w.r.t. Σ_t
• x is adapted to its natural filtration
• Lemma: Let x be adapted to {Σ_t}, and t_1 < ... < t_n; then {x(·,t_1) ∈ A_1} ∪ ... ∪ {x(·,t_n) ∈ A_n} ∈ Σ_{t_n}
Proof: {x(·,t_j) ∈ A_j} ∈ Σ_{t_j} ⊆ Σ_{t_n} for all j ≤ n; since Σ_{t_n} is a σ-algebra the result follows

Adapted processes
• Lemma: Let x be adapted to {Σ_t}, and t_1 < ... < t_n; then C = {x(·,t_1) ∈ A_1} ∩ ... ∩ {x(·,t_n) ∈ A_n} ∈ Σ_{t_n}
Proof: {x(·,t_1) ∈ A_1^C} ∪ ... ∪ {x(·,t_n) ∈ A_n^C} ∈ Σ_{t_n}
({x(·,t_1) ∈ A_1^C} ∪ ... ∪ {x(·,t_n) ∈ A_n^C})^C ∈ Σ_{t_n}
C = {x(·,t_1) ∈ A_1} ∩ ... ∩ {x(·,t_n) ∈ A_n} ∈ Σ_{t_n}

Stochastic Convergence
• i) Convergence in probability (stochastic convergence): lim_{t→∞} P(|x(t)| > δ) = 0 for all δ > 0
• ii) Convergence in mean/moment: lim_{t→∞} E[|x(t)|^p] = 0
• iii) Almost sure convergence: P(lim_{t→∞} x(t) = 0) = 1
• iv) Convergence in distribution (weak, in law): lim_{t→∞} P(x(t) ≤ x) = F_X(x) (at continuity points of F_X)

Convergence relations
• ii) ⟹ i)
• Proof: Markov inequality: P(|x(t)| ≥ a) ≤ E(|x(t)|)/a, so
P(|x(t)| ≥ a^{1/p}) = P(|x(t)|^p ≥ a) ≤ E(|x(t)|^p)/a

Convergence relations
• iii) ⟹ i)
• Proof: A = {ω | lim_{t→∞} x(ω,t) = 0}, P(A) = 1
*) lim_{t→∞} x(ω,t) = 0 ⟹ ∃ n s.t. |x(ω,t)| ≤ δ for all t ≥ n
For a given ω, let n(ω) be the smallest such n
Let A_m = {ω | n(ω) = m}; the {A_m} are disjoint
*) implies A = ∪_m A_m and in turn 1 = P(A) = P(∪_m A_m) = ∑_m P(A_m), i.e.
P(|x(ω,t)| ≤ δ for all t ≥ l) ≥ ∑_{m=1}^{l} P(A_m) → 1 for l → ∞

Convergence relations
• i) does not imply iii)
• Proof by counterexample:
Define c(k) uniformly distributed in {2^k,...,2^{k+1}}, with {c(k)} independent
For n ∈ {2^k,...,2^{k+1}}, let x(n) = 1 if c(k) = n and x(n) = 0 else
P(x(n) ≤ δ) = P(x(n) = 0) ≥ 1 − 2^{−k} (convergence in probability)
{x(n)} converges to 0 nowhere (no a.s. convergence)
• i) does not imply ii) (exercise)

Convergence relations
• iii) does not imply ii)
• Let Ω = [1,∞)
• Let ω ~ f(ω) = K ω^{−δ} (density)
• Let x(ω,t) = ω/t for t ≥ 1
• P(lim_{t→∞} x(ω,t) = 0) = 1 (a.s. convergence)
• E(x(ω,t)^p) = (1/t^p) ∫ ω^p K ω^{−δ} dω = (K/t^p) ∫ ω^{p−δ} dω = (K/t^p) [ω^{1+p−δ}/(1+p−δ)]_1^∞
• Finite only for 1 + p − δ < 0
• More generally: see the overview of implications between modes of convergence (Wikipedia)

Types of Stochastic Processes
• Markov processes
• Poisson processes
• Martingales
• Brownian motions
• Ito processes
• Ito diffusions
• Levy processes
• Renewal processes

Markov Processes
• Let X be a random process adapted to the filtration (F_t)
• Then X is a Markov process w.r.t. (F_t) iff for any measurable function f:
E(f(X_{t+h}) | F_t) = E(f(X_{t+h}) | X_t) w.P.1
• In particular, if (F_t) is generated by X and f is the indicator function of some set A:
P(X_{t+h} ∈ A | X_t, X_{t_1},...,X_{t_k}) = P(X_{t+h} ∈ A | X_t) for any selection of times t ≥ t_1 ≥ t_2 ≥ ... ≥ t_k
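The Markov property can be seen numerically: in a Markov process, once X_t is known, additional past values do not change the conditional law of X_{t+h}. Below is a Python sketch using an AR(1) model X_{n+1} = 0.5 X_n + e_n (the model, the bin around X_n ≈ 1 and the split on the sign of X_{n-1} are arbitrary choices, not from the slides).

```python
# Simulation sketch: X_{n+1} = 0.5 X_n + e_n with e_n i.i.d. N(0,1) is Markov.
# Conditioning on extra past (here the sign of X_{n-1}) should not change the
# conditional expectation of X_{n+1} given X_n; checked on a narrow bin of X_n.
import numpy as np

rng = np.random.default_rng(8)
n   = 1_000_000
e   = rng.normal(size=n)
X   = np.zeros(n)
for k in range(1, n):
    X[k] = 0.5 * X[k - 1] + e[k]          # AR(1) sample path

prev, cur, nxt = X[:-2], X[1:-1], X[2:]
in_bin = (cur > 0.9) & (cur < 1.1)        # condition on X_n in a narrow bin around 1

m_pos = nxt[in_bin & (prev > 0)].mean()   # additionally conditioning on X_{n-1} > 0
m_neg = nxt[in_bin & (prev <= 0)].mean()  # additionally conditioning on X_{n-1} <= 0
print("E[X_{n+1} | X_n ~ 1, X_{n-1} > 0]  ~", m_pos)
print("E[X_{n+1} | X_n ~ 1, X_{n-1} <= 0] ~", m_neg, "(both close to 0.5)")
```

Up to the finite bin width and sampling noise, the two estimates agree, which is exactly the statement P(X_{t+h} ∈ A | X_t, X_{t_1},...) = P(X_{t+h} ∈ A | X_t).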
Markov Processes (Stopping times)
• Let τ be a random variable such that the process χ(τ(ω) ≤ t) is adapted to (F_t)
• Then τ is said to be a stopping time w.r.t. (F_t)
• If (F_t) is generated by a random process X, a stopping time τ (w.r.t. (F_t)) depends only on the history of X
• Example (first escape time): Let X(0) ∈ U; τ = inf {s | X(s) ∉ U}

Strong Markov Property
• Let X be a Markov process generating (F_t)
• Then X is said to be a strong Markov process iff E(f(X_{τ+h}) | F_τ) = E(f(X_{τ+h}) | X_τ) for any stopping time τ w.r.t. (F_t) (F_τ is generated by the history of X until τ)
• Every discrete time Markov process (chain) is strong

Non strong Markov Process Example
• Let g: R → R² (a 1-D curve in the plane)
• Let x_1 < x_2
• Let G = g(x_1) = g(x_2) and g(x) ≠ g(y) elsewhere
• G_1 = {g(x) | x < x_1}, G_2 = {g(x) | x_1 < x < x_2}, G_3 = {g(x) | x_2 < x}
• Let X: Ω × R → R be a strong Markov process such that
0 < P(X(ω,t+h) ≤ y | X(ω,t) = x) = ∫_{−∞}^y H_h(η − x) dη < 1 for all x, y
(P(X(ω,t) = x_1) = P(X(ω,t) = x_2) = 0 for any fixed t > 0) and X(ω,0) ≠ x_1, X(ω,0) ≠ x_2
• Let τ = inf {s | g(X(s)) = G} = inf {s | X(s) ∈ {x_1, x_2}}
• Assume P(X(ω,τ) = x_1), P(X(ω,τ) = x_2) > 0

Non strong Markov Process Example
• The process g_t = g(X_t) is adapted to σ({X_s, s ≤ t}); therefore σ({g_s, s ≤ t}) ⊆ σ({X_s, s ≤ t})
• Since X is (strong) Markov: E(f∘g(X_{t+h}) | σ({X_s, s ≤ t})) = E(f∘g(X_{t+h}) | X_t)
• Prove: E(f(g_{t+h}) | σ({g_s, s ≤ t})) = E(f(g_{t+h}) | g_t)
• Since X_t = g⁻¹(g_t) (w.P.1), there is a version of E(f∘g(X_{t+h}) | X_t) which is a function of g_t, i.e.
E(f∘g(X_{t+h}) | σ({X_s, s ≤ t})) = E(f∘g(X_{t+h}) | X_t) = E(f(g_{t+h}) | g_t) w.P.1
• Thus g_t is a Markov process

Non strong Markov Process Example
• P(g_{τ+h} ∈ G_2 | σ({g_s, s ≤ τ})) = P(g_{τ+h} ∈ G_2 | σ({X_s, s ≤ τ})) w.P.1
• P(g_{τ+h} ∈ G_2 | σ({X_s, s ≤ τ})) = P(g_{τ+h} ∈ G_2 | X_τ) (X is strong)
= P(X_{τ+h} ∈ (x_1, x_2) | X_τ) = ∫_{x_1}^{x_2} H_h(η − X_τ) dη
• On {ω | X(ω,τ) = x_1}: P(X_{τ+h} ∈ (x_1, x_2) | X_τ) = ∫_{x_1}^{x_2} H_h(η − x_1) dη
• On {ω | X(ω,τ) = x_2}: P(X_{τ+h} ∈ (x_1, x_2) | X_τ) = ∫_{x_1}^{x_2} H_h(η − x_2) dη
• Let f(g) = χ(g ∈ G_2); then E(f(g_{τ+h}) | g_τ) = P(g_{τ+h} ∈ G_2 | g_τ)
• Since E(f(g_{τ+h}) | g_τ) is measurable in σ(g_τ), it has to be constant on {g_τ = G}
• However, since ∫_{x_1}^{x_2} H_h(η − x_1) dη ≠ ∫_{x_1}^{x_2} H_h(η − x_2) dη, it cannot coincide with P(g_{τ+h} ∈ G_2 | σ({g_s, s ≤ τ}))

Markov Processes remarks
• Models uncertain state dynamics
• BJ (Box-Jenkins) models have a Markov representation
• Queueing models are typically Markov
• Extensions: semi-Markov, GSMP
• Extensions: HMM, MDP
• Discrete time → Markov chains

Poisson processes
• A Poisson process T is a point process, i.e. T: Ω × N → R
• {T_n} is increasing with probability 1
• Independent increments, i.e. P(T_{n+k} − T_n ≤ d | F_n) = P(T_{n+k} − T_n ≤ d)
• Exponential increments, i.e. P(T_{n+1} − T_n ≤ d) = 1 − exp(−λd)
• λ: intensity parameter

Poisson counting process
• Let T be a Poisson process
• Let the process N: Ω × R → N be defined by N(ω,0) = 0, N(ω,t) = max{n | T_n ≤ t}
• N is the Poisson counting process and is non-decreasing
• N is a continuous time Markov process
• A simulation sketch of N(t) is given below, after the martingale definition

Poisson counting process
• N has right continuous paths, i.e. consider a realization {T_n} of T; N(t) = n constantly for t ∈ [T_n, T_{n+1}), such that lim_{t→T_n from the right} N(t) = n = N(T_n)
• N has left limits, i.e. lim_{t→T_n from the left} N(t) = n − 1
• N is our first example of an rcll or cadlag process

Martingales
• Let X: Ω × R → R
• Let (F_s) be a filtration
• X is called a martingale w.r.t. (F_s) iff E(X_t | F_s) = X_s for every t ≥ s
• Or: E(X_t | F_s) − X_s = E(X_t − X_s | F_s) = 0 for every t ≥ s
• Thus a martingale has increments of zero conditional expectation
• A martingale weather forecast would predict the weather tomorrow to be as it is today
• Super- (≤) and sub-martingales (≥)
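As promised above, here is a short Python sketch of the Poisson counting process (the intensity λ = 2, the horizon t = 5 and the number of increments are arbitrary choices). It builds the point process T from i.i.d. exponential increments, forms N(t) = max{n | T_n ≤ t}, and checks E[N(t)] ≈ λt.

```python
# Simulation sketch (assumed parameters): Poisson process from i.i.d. exponential
# increments T_{n+1} - T_n ~ Exp(lambda); counting process N(t) = max{n : T_n <= t};
# check that E[N(t)] is approximately lambda * t.
import numpy as np

rng   = np.random.default_rng(3)
lam   = 2.0                 # intensity parameter lambda (arbitrary choice)
t_end = 5.0
n_rep = 20_000

counts = np.empty(n_rep)
for r in range(n_rep):
    gaps = rng.exponential(scale=1.0 / lam, size=64)     # enough increments to pass t_end
    T    = np.cumsum(gaps)                                # arrival times T_1 < T_2 < ...
    counts[r] = np.searchsorted(T, t_end, side="right")   # N(t_end) = #{n : T_n <= t_end}

print(f"E[N({t_end})] ~ {counts.mean():.3f}  (theory: lambda*t = {lam * t_end})")
```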
Martingale properties
• Let X be a martingale (or a positive sub-martingale)
• Let |X|*_t = sup_s {|X_s|, s ≤ t}
• Then P(|X|*_t ≥ L) ≤ E(|X_t|)/L
• Recall the Markov inequality: P(|X_t| ≥ L) ≤ E(|X_t|)/L
• Thus for a martingale, |X|*_t obeys a bound similar to |X_t|, even though |X|*_t selects the larger element

Martingale Properties Example
• Let {e_n} be an i.i.d. random (sequence) process, e_n ~ N(0,1)
• Let X_n = X_{n−1} + e_n, X_0 = 0 (random walk)
• Then X is a martingale and X_n ~ N(0,n)
• E(|X_n|) = (2πn)^{−1/2} ∫ |x| exp(−x²/(2n)) dx = n (2πn)^{−1/2} ∫ |u| exp(−u²/2) du = K n^{1/2}
• Finally P(|X|*_n ≥ L) ≤ K n^{1/2}/L

Martingale Convergence
• Let X be a non-negative supermartingale, i.e. E(X_t | F_s) ≤ X_s for every t ≥ s
• Then there is a random variable X_∞ s.t. X_∞ ≥ 0, E(X_∞) ≤ E(X_0) and X_t → X_∞ w.P.1

Martingale Convergence Example
• Let {e_n} be an i.i.d. random (sequence) process, e_n ~ N(0,s²)
• Let X_n = a X_{n−1} + X_{n−1} e_n, X_0 = x_0
• Y_n = X_n² = (a + e_n)² X_{n−1}² = (a + e_n)² Y_{n−1}
• Since Y_{n−1} and e_n are independent: E[Y_n] = E[(a + e_n)²] E[Y_{n−1}]
• If E[(a + e_n)²] = a² + s² < 1, Y_n is a non-negative supermartingale and E[Y_n] → 0
• Thus Y_∞ ≥ 0 and E(Y_∞) = 0 ⟹ Y_∞ = 0 w.P.1
• Finally Y_n → 0 w.P.1

Measurement Noise
[Block diagram: input u = 0 → Input/Output system → output y = 0; measured output y_m = y + N = N]
• At time t the measurement device would give: y_mm(t) = (1/d) ∫_{t−d}^t N(τ) dτ
• We observe that the sequence {y_mm(t_j)} would be an i.i.d. sequence if t_j − t_{j−1} ≥ d
• Moreover y_mm(t_j) ~ N(0,s)
• Thus the process w(t) = ∫_0^t N(τ) dτ has independent Gaussian increments over disjoint intervals
• Since w(t) − w(0) = (w(t) − w(t/2)) + (w(t/2) − w(0)):
VAR(w(t) − w(0)) = VAR(w(t) − w(t/2)) + VAR(w(t/2) − w(0)) = 2 VAR(w(t/2) − w(0))
• Generally: VAR(w(ct) − w(0)) = c VAR(w(t) − w(0))

Measurement Noise
• VAR(w(ct) − w(0)) = c VAR(w(t) − w(0))
• We define as a standard: w(0) = 0 and VAR(w(1)) = 1
• Thus VAR(w(t)) = t
• Covariance (s < t): E[w(t)w(s)] = E[(w(t) − w(s) + w(s)) w(s)] = E[w(s)w(s)] = s = t ∧ s
• w is the standard Wiener process; Wiener and Kolmogorov both proved its existence
• It has a version w' with continuous paths, i.e. P(|w(t) − w'(t)| > 0) = 0 for every t, and w' is continuous in t
• A simulation sketch of these properties is given below, at the end of this block of slides

Process Noise
[Block diagram: input u plus noise N gives u_M, which drives the Input/Output system with state x]
• d/dt x = φ(x, u_M) = φ(x, u + N) (dynamics)
• d/dt x = φ(x, u + N) ≈ φ(x, u) + φ_u(x, u) N = f(x, u) + G(x, u) N

The noise process N
• w(t) = ∫_0^t N(τ) dτ
• w(t) is the Wiener process
• w(0) = 0, a > 0
• τ(a) = min{t | w(t) = a} (first hitting time)
• P(w(t) > a | τ(a) = s) = P(w(t) < a | τ(a) = s) = ½, s < t
• Now ∫_0^t P(w(t) > a | τ(a) = s) dP_τ(s) = P(w(t) > a ∧ τ(a) < t) = P(w(t) > a), and ∫_0^t ½ dP_τ(s) = ½ P(τ(a) < t)
• Let M(t) = max{w(s), 0 ≤ s ≤ t}
• Then P(M(t) > a) = P(τ(a) < t) = 2 P(w(t) > a)

The noise process N
• w(t) = ∫_0^t N(τ) dτ
• P(M(t) > a) = 2 P(w(t) > a)
• Assume w is differentiable in t = 0
• Then the limit lim_{t→0} w(t)/t exists (finite)
• Also, on an interval t ∈ [0,ε]: w(t) ≤ at for some positive a ⟹ M(t) ≤ at
• P(M(t) ≤ at) = 1 − P(M(t) > at) = 1 − 2 P(w(t) > at)
• So P(M(t) ≤ at) → 0 for t → 0 (since P(w(t) > at) = P(w(1) > a√t) → ½)
• And P(w is differentiable in t = 0) = 0
• The argument generalises to all times, i.e.: P(w is differentiable in t) = 0

Process Noise
• d/dt x = φ(x, u + N) ≈ φ(x, u) + φ_u(x, u) N = f(x, u) + G(x, u) N
• How to proceed without N?
• Suggestion: x(t) = ∫^t f(x(s), u(s)) ds + ∫^t G(x(s), u(s)) N(s) ds
• Get rid of N by suggesting N(s) ds = dw:
x(t) = ∫^t f(x(s), u(s)) ds + ∫^t G(x(s), u(s)) dw(s)
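As promised above, the Wiener process properties VAR(w(t)) = t and E[w(t)w(s)] = t ∧ s are easy to check numerically. The Python sketch below (step size, horizon and path count are arbitrary choices) approximates the standard Wiener process by cumulative sums of independent N(0, dt) increments.

```python
# Simulation sketch: approximate the standard Wiener process by summing
# independent N(0, dt) increments; check Var(w(t)) ~ t and E[w(t) w(s)] ~ min(s, t).
import numpy as np

rng    = np.random.default_rng(4)
dt     = 0.005
n_step = 300                       # simulate on [0, 1.5]
n_path = 20_000

dw = rng.normal(scale=np.sqrt(dt), size=(n_path, n_step))
w  = np.cumsum(dw, axis=1)         # w(t_k) for t_k = (k+1) dt, with w(0) = 0 implicit

s_idx, t_idx = 99, 299             # s = 0.5, t = 1.5
print("Var(w(1.5))      ~", w[:, t_idx].var(), " (theory 1.5)")
print("E[w(1.5) w(0.5)] ~", np.mean(w[:, t_idx] * w[:, s_idx]), " (theory 0.5)")
```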
A stochastic integral
• x(t) = ∫^t f(x(s), u(s)) ds + ∫^t G(x(s), u(s)) dw(s)
• Let G be constant; then:
∫_t^t' G dw(s) = ∫_t^t' G I(s ∈ (t, t']) dw(s) = G (w(t') − w(t)) (consistent with observations)
• Let G be a step function, i.e. G(s, ω) = ∑_i G_i(ω) I(s ∈ (t_i, t_{i+1}])
• Then ∫_t^t' G(s, ω) dw(s) = ∑_i G_i(ω) (w(t_{i+1}) − w(t_i)) (consistent with general properties of integrals)

A stochastic integral
• ∫_t^t' G(s, ω) dw(s) = ∑_i G_i(ω) (w(t_{i+1}) − w(t_i))
• For any other function: ∫_t^t' G(s, ω) dw(s) = ∑_i G(τ_i, ω) (w(t_{i+1}) − w(t_i)), τ_i ∈ (t_i, t_{i+1}]
• We choose τ_i somewhat regularly, i.e. τ_i = t_i + a (t_{i+1} − t_i), a ∈ [0,1]
• Having (t_{i+1} − t_i) → 0, will the choice of a impact the nature of the defined integral ??

A stochastic integral
• ∫_t^t' G(s, ω) dw(s) = ∑_i G(τ_i, ω) (w(t_{i+1}) − w(t_i)), with τ_i = t_i + a (t_{i+1} − t_i), a ∈ [0,1]
• Consider then G(s, ω) = w(s, ω):
∫_t^t' w(s, ω) dw(s) = lim ∑_i w(τ_i, ω) (w(t_{i+1}) − w(t_i))
• So (boundedness assumed):
E(∫_t^t' w(s) dw(s)) = lim ∑_i E[w(τ_i) (w(t_{i+1}) − w(t_i))]
= lim ∑_i E[w(τ_i) (w(t_{i+1}) − w(τ_i)) + w(τ_i) (w(τ_i) − w(t_i))]
= lim ∑_i (E[w(τ_i) w(τ_i)] − E[w(τ_i) w(t_i)])
= lim ∑_i (τ_i − t_i) = lim ∑_i a (t_{i+1} − t_i) = a (t' − t)

Ito and Stratonovich
• E(∫_t^t' w(s) dw(s)) = a (t' − t)
• Choosing a = 0 gives the Ito integral
• Let G be nonanticipating, i.e. G(t) is measurable in the history of w up to t
• Then (with the Ito definition) E[∫_t^t' G(s, ω) dw(s)] = 0
• Choosing a = 1/2 gives the Stratonovich integral, written ∫_t^t' G(s, ω) ∘ dw(s)
• We shall stick mostly to the Ito definition
• However, both definitions have different nice properties
• A simulation sketch of the a-dependence is given below, after the convergence discussion

Definition of the Ito stochastic integral
• Definition for step functions, where G(s, ω) = G(t_i, ω) for s ∈ (t_i, t_{i+1}]:
∫_t^t' G(s, ω) dw(s) = ∑_i G(t_i, ω) (w(t_{i+1}) − w(t_i))
• For other non-anticipative functions, let {{t_i}^n} be a family of partitions of the interval [t, t'] such that sup_i (t_{i+1}^n − t_i^n) → 0 for n → ∞ and (really enough), for the corresponding step functions G_n,
lim-p_n ∫_t^t' |G(s, ω) − G_n(s, ω)|² ds = 0 (lim-p: limit in probability)
• Then ∫_t^t' G(s, ω) dw(s) = lim-p_n ∑_i G(t_i^n, ω) (w(t_{i+1}^n) − w(t_i^n)) = lim-p_n ∫_t^t' G_n(s, ω) dw(s)

Convergence
• For the definition to be meaningful:
• The right hand side has to have a limit
• This limit should be independent of the choice of partitions
• Arnold shows that ∫_t^t' G_n(s, ω) dw(s) is a stochastic Cauchy sequence, i.e.
lim_{n,m} P[ |∫_t^t' (G_n(s, ω) − G_m(s, ω)) dw(s)| > ε ] = 0 for all ε > 0
• Every stochastic Cauchy sequence converges stochastically to some random variable I(G)
• Uniqueness is established by considering two families of partitions {{t_i}^n} and {{τ_i}^n} fulfilling the requirements
• We can define a third family by {d_i}^{2n} = {t_i}^n, {d_i}^{2n+1} = {τ_i}^n, also fulfilling the requirements
• Convergence for d to I(G) implies convergence of t and τ to I(G)
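As promised above, the dependence on the evaluation point a can be seen directly in simulation. The Python sketch below (grid and path count are arbitrary choices) computes the sums ∑_i w(τ_i)(w(t_{i+1}) − w(t_i)) on [0,1] with τ_i = t_i (Ito, a = 0) and with τ_i the midpoint (Stratonovich, a = 1/2); the sample means should approach a(t' − t), i.e. 0 and 0.5.

```python
# Simulation sketch: Riemann-sum approximations of integral_0^1 w dw with the
# integrand evaluated at tau_i = t_i + a (t_{i+1} - t_i), illustrating that
# E[ sum_i w(tau_i) (w(t_{i+1}) - w(t_i)) ] ~ a (t' - t) for a = 0 and a = 1/2.
import numpy as np

rng    = np.random.default_rng(5)
n_path = 5_000
n_grid = 500
t_grid = np.linspace(0.0, 1.0, 2 * n_grid + 1)          # fine grid containing midpoints
dW     = rng.normal(scale=np.sqrt(np.diff(t_grid)), size=(n_path, 2 * n_grid))
W      = np.concatenate([np.zeros((n_path, 1)), np.cumsum(dW, axis=1)], axis=1)

left  = W[:, 0:-2:2]            # w(t_i)
mid   = W[:, 1:-1:2]            # w(t_i + (t_{i+1} - t_i)/2)
right = W[:, 2::2]              # w(t_{i+1})
incr  = right - left

ito   = np.sum(left * incr, axis=1)   # a = 0
strat = np.sum(mid  * incr, axis=1)   # a = 1/2
print("E[Ito sum]          ~", ito.mean(),   " (theory 0)")
print("E[Stratonovich sum] ~", strat.mean(), " (theory 0.5)")
```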
Properties of the Stochastic Integral
• Simpler for step functions: ∫_t^t' G(s, ω) dw(s) = ∑_i G(t_i, ω) (w(t_{i+1}) − w(t_i))
• E[∫_t^t' G(s, ω) dw(s)] = 0
• Proof: E[∑_i G(t_i, ω) (w(t_{i+1}) − w(t_i))] = ∑_i E[G(t_i, ω)] E[w(t_{i+1}) − w(t_i)] = 0
(G(t_i, ·) is nonanticipating, hence independent of the increment w(t_{i+1}) − w(t_i))

Properties of the Stochastic Integral
• E[∫_t^t' G(s) dw(s) ∫_t^s' G(s) dw(s)] = ∫_t^s' G²(s) ds for s' < t' (G a step function independent of ω/w)
• Proof sketch (with t_j = s' and Δw_i = w(t_{i+1}) − w(t_i)):
E[(∑_{i≥j} G(t_i) Δw_i + ∑_{i<j} G(t_i) Δw_i) ∑_{k<j} G(t_k) Δw_k]
= ∑_{i≥j} ∑_{k<j} G(t_i) G(t_k) E[Δw_i Δw_k] + ∑_{i<j} ∑_{k<j} G(t_i) G(t_k) E[Δw_i Δw_k]
= 0 + ∑_{i<j} G(t_i) G(t_i) E[Δw_i Δw_i] (increments over disjoint intervals are independent with zero mean)
= ∑_{i<j} G²(t_i) (t_{i+1} − t_i) = ∫_t^s' G²(s) ds

Stochastic Differential Equation SDE
• x(t) = ∫^t f(x(s), u(s)) ds + ∫^t G(x(s), u(s)) dw(s)   (i)
• dx = f(x,t) dt + G(x,t) dw (shorthand)
• A solution to (i) is non-anticipating and fulfills (i) for every t w.P.1, i.e.
P(x(t) = ∫^t f(x(s), u(s)) ds + ∫^t G(x(s), u(s)) dw(s)) = 1 for all t
• Not P(x(t) = ∫^t f(x(s), u(s)) ds + ∫^t G(x(s), u(s)) dw(s) for all t) = 1

Ito's lemma
• Let x be a solution to dx = f(x,t) dt + G(x,t) dw
• Then V(t) = U(x(t), t) is a solution to
dV = (U_t(x,t) + U_x(x,t) f(x,t) + ½ U_xx(x,t) G²(x,t)) dt + U_x(x,t) G(x,t) dw

Fix Points
• Let X be a metric space, with metric d
• Let g: X → X and, for some 0 < L < 1, d(g(x), g(y)) ≤ L d(x,y) (Lipschitz with constant L < 1)
• Consider the iteration x_{n+1} = g(x_n)

Fix Points
• x_{n+1} = g(x_n)
• d(x_{n+1}, x_n) ≤ L d(x_n, x_{n−1}) ≤ ... ≤ L^n d(x_1, x_0) → 0 for n → ∞
• d(x_{n+2}, x_{n+1}) = d(g(x_{n+1}), g(x_n)) ≤ L d(x_{n+1}, x_n)
• d(x_{m+n}, x_n) ≤ d(x_{m+n}, x_{m+n−1}) + ... + d(x_{n+1}, x_n) ≤ (L^{m−1} + ... + L + 1) d(x_{n+1}, x_n) ≤ d(x_{n+1}, x_n)/(1 − L)
• d(x_{n+1}, x_n)/(1 − L) → 0 for n → ∞
• {x_n} is a Cauchy sequence
• In a complete metric space, {x_n} has a unique limit x

Existence and uniqueness of solutions to SDEs
• x(t) = ∫^t f(x(s), u(s)) ds + ∫^t G(x(s), u(s)) dw(s)
• Let M = sup{|w(s)|, s ∈ [0, T]}
• Let f and G be Lipschitz w.r.t. x, i.e. with d(x,y) = sup_s |x(s,ω) − y(s,ω)|:
d(f(x,u), f(y,u)) ≤ L d(x,y)
d(G(x,u), G(y,u)) ≤ L d(x,y)
• Let T < 1/(L(M+1))

Existence and uniqueness of solutions to SDEs
• x_{n+1}(t) = g(x_n)(t) = ∫_0^t f(x_n(s), u(s)) ds + ∫_0^t G(x_n(s), u(s)) dw(s), t ≤ T
• g(x) − g(y) = ∫_0^T (f(x(s),u(s)) − f(y(s),u(s))) ds + ∫_0^T (G(x(s),u(s)) − G(y(s),u(s))) dw(s)
• |g(x) − g(y)| ≤ ∫_0^T |f(x(s),u(s)) − f(y(s),u(s))| ds + ∫_0^T |G(x(s),u(s)) − G(y(s),u(s))| |dw(s)|
≤ L ∫_0^T |x(s) − y(s)| ds + L ∫_0^T |x(s) − y(s)| |dw(s)| (step functions)
≤ L T d(x,y) + L T M d(x,y) = L T (1 + M) d(x,y) < d(x,y)
• {x_n} is a Cauchy sequence with a unique limit value

Example
• dx = −a x dt + dw
• V(t) = U(x(t), t) = x²(t)
dV = (U_t + U_x f + ½ U_xx G²) dt + U_x G dw
= (2x(−ax) + ½·2) dt + 2x dw
= (−2a x² + 1) dt + 2x dw = (−2a V + 1) dt + 2x dw
• E[V(t)] = E[∫^t (−2a V + 1) ds + ∫^t 2x dw(s)] = E[∫^t (−2a V + 1) ds] = −2a ∫^t E[V] ds + t
• d/dt E[V] = −2a E[V] + 1
• E[V(t)] = ?? (Exercise, stability)
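To see the moment equation in action, here is a Python sketch (a = 1, x(0) = 0 and the step size are arbitrary choices; Euler-Maruyama is used as the simplest discretisation and is not prescribed by the slides). It estimates E[x²(t)] by Monte Carlo and integrates d/dt E[V] = −2a E[V] + 1 numerically for comparison, leaving the closed-form exercise open.

```python
# Simulation sketch for the example dx = -a x dt + dw: Euler-Maruyama paths,
# with the Monte Carlo estimate of E[V(t)] = E[x^2(t)] compared against the
# moment ODE d/dt E[V] = -2a E[V] + 1 (a = 1, x(0) = 0 assumed).
import numpy as np

rng    = np.random.default_rng(6)
a, dt  = 1.0, 0.001
n_step = 3000                          # simulate on [0, 3]
n_path = 20_000

x  = np.zeros(n_path)
EV = 0.0                               # numerical solution of the moment ODE
for k in range(n_step):
    dw = rng.normal(scale=np.sqrt(dt), size=n_path)
    x  = x - a * x * dt + dw           # Euler-Maruyama step of the SDE
    EV = EV + (-2 * a * EV + 1) * dt   # forward-Euler step of d/dt E[V] = -2a E[V] + 1

print("Monte Carlo E[x^2(3)] ~", np.mean(x**2))
print("Moment-ODE  E[V(3)]   ~", EV)
```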
Example
• dx = a dt + b dw
• V(t) = U(x(t), t) = e^x
dV = (U_t + U_x f + ½ U_xx G²) dt + U_x G dw
= (a e^x + ½ e^x b²) dt + e^x b dw
= V (a + ½ b²) dt + V b dw = V c dt + V b dw
• V(t) = e^{x(t)} = exp(a t + b w(t)) (explicit solution !!)

Linear SDEs
• dx = (a x + b u) dt + c dw
• x(t) = x(0) exp(at) solves: dx = a x dt
• Define y by: dy = exp(−at) b u dt + exp(−at) c dw
• Define x by: x(t) = exp(at) y(t) = U(t, y(t))
• Then by Ito:
dx = (a exp(at) y(t) + exp(at) (exp(−at) b u)) dt + exp(at) exp(−at) c dw = (a x + b u) dt + c dw !!
(using dV = (U_t + U_y f + ½ U_yy G²) dt + U_y G dw, with U linear in y so that U_yy = 0)

Linear SDEs
• dx = (a x + b u) dt + c dw
• dy = exp(−at) b u dt + exp(−at) c dw
• y(t) = y(0) + ∫^t exp(−as) b u(s) ds + ∫^t exp(−as) c dw(s)
• x(t) = exp(at) y(t) = exp(at) y(0) + exp(at) (∫^t exp(−as) b u(s) ds + ∫^t exp(−as) c dw(s))
= exp(at) y(0) + ∫^t exp(−a(s−t)) b u(s) ds + ∫^t exp(−a(s−t)) c dw(s)
• I.e.
• μ_t = E(x(t)) = exp(at) y(0) + ∫^t exp(−a(s−t)) b u(s) ds
• R(t,t') = E((x(t) − μ_t)(x(t') − μ_{t'})) = E(∫^{t'} exp(−a(s−t')) c dw(s) ∫^t exp(−a(s−t)) c dw(s))
= ∫_0^{t'} exp(−a(s−t')) c exp(−a(s−t)) c ds (for t' ≤ t)
= c² exp(a(t+t')) ∫_0^{t'} exp(−2as) ds

Linear SDEs
• R(t,t') = E((x(t) − μ_t)(x(t') − μ_{t'})) = c² exp(a(t+t')) ∫_0^{t'} exp(−2as) ds
= c² exp(2at') exp(a(t−t')) (1/(2a)) (1 − exp(−2at'))
= (c²/(2a)) exp(a(t−t')) (exp(2at') − 1)
→ −(c²/(2a)) exp(a(t−t')) = (c²/(2|a|)) exp(a(t−t')) as t' → ∞ (stable case a < 0, t ≥ t')
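As a check of the covariance formula above, here is a Python sketch simulating the linear SDE with u = 0 (a = −1, c = 1, x(0) = 0, the time points t' = 2, t = 4 and the Euler-Maruyama discretisation are all arbitrary choices made for the example).

```python
# Simulation sketch for dx = a x dt + c dw (u = 0, x(0) = 0): Euler-Maruyama
# paths, comparing the empirical covariance R(t, t') with the formula
#   R(t, t') = (c^2 / (2a)) (exp(a(t + t')) - exp(a(t - t'))),   t >= t'.
import numpy as np

rng    = np.random.default_rng(7)
a, c   = -1.0, 1.0
dt     = 0.001
n_step = 4000                     # simulate on [0, 4]
n_path = 20_000
t_p, t = 2.0, 4.0                 # t' = 2, t = 4
k_tp   = int(round(t_p / dt))     # step index at which time t' is reached

x    = np.zeros(n_path)
x_tp = None
for k in range(1, n_step + 1):
    x = x + a * x * dt + c * rng.normal(scale=np.sqrt(dt), size=n_path)
    if k == k_tp:
        x_tp = x.copy()           # record x(t')

R_emp    = np.mean(x * x_tp)      # means are zero here since x(0) = 0 and u = 0
R_theory = c**2 / (2 * a) * (np.exp(a * (t + t_p)) - np.exp(a * (t - t_p)))
print("empirical R(t, t') ~", R_emp, "  formula:", R_theory)
```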