Stochastic Processes

Basics
Probability Spaces
• A probability space comprises:
a set Ω
a structure Σ (a σ-algebra)
a measure of confidence P (a probability)
σ-algebra
• A σ-algebra Σ is the class of subsets of Ω for which
we wish to assign probabilities
(the measurable sets)
• Ω ∈ Σ
• If A ∈ Σ then Aᶜ ∈ Σ
• If {Ai} are all in Σ, then ∪i Ai ∈ Σ
Probability
• Assigns a positive measure (confidence) to subsets A ⊆ Ω
• Let A ∈ Σ; then
0 ≤ P(A) ≤ 1 is called the probability of A
• Let A, B ∈ Σ be disjoint; then
P(A ∪ B) = P(A) + P(B)
• Let {Ai} ⊆ Σ be disjoint; then
P(∪i Ai) = ∑i P(Ai)
• P(Ω) = 1
Intersections of σ-algebras
• There is at least one σ-algebra of subsets of Ω,
namely 2^Ω (the power set)
• If Σ1 and Σ2 are two σ-algebras, then
Σ1 ∩ Σ2 is a σ-algebra (exercise)
• Let Γ be a class of σ-algebras; then
∩Σ∈Γ Σ is a σ-algebra (exercise)
Generating a σ-algebra
• Let Λ be a class of subsets of Ω
• Let Γ be the class of σ-algebras such that
Λ ⊆ Σ for all Σ ∈ Γ
• Σ(Λ) = ∩Σ∈Γ Σ
• Σ(Λ) is called the σ-algebra generated by Λ
• Σ(Λ) is the smallest σ-algebra including Λ
(prove: exercise)
Example
• Ω = N (natural numbers) = {1, 2, ...}
• Λ = {{1}}
• Σ(Λ) = {N, ∅, N\{1}, {1}}
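For a finite Ω the generated σ-algebra can be computed by brute force: start from the generating class and close it under complement and pairwise union until nothing new appears. A minimal Python sketch (the function name `generate_sigma_algebra` is only illustrative, and N is truncated to {1,...,5} to keep Ω finite):

```python
from itertools import combinations

def generate_sigma_algebra(omega, generators):
    """Close a class of subsets of a finite omega under complement and pairwise union."""
    omega = frozenset(omega)
    sigma = {frozenset(), omega} | {frozenset(g) for g in generators}
    changed = True
    while changed:
        changed = False
        for a in list(sigma):
            if omega - a not in sigma:           # closure under complement
                sigma.add(omega - a)
                changed = True
        for a, b in combinations(list(sigma), 2):
            if a | b not in sigma:               # closure under (finite) union
                sigma.add(a | b)
                changed = True
    return sigma

# The slide's example with Lambda = {{1}}:
print(sorted(map(sorted, generate_sigma_algebra(range(1, 6), [{1}]))))
# [[], [1], [1, 2, 3, 4, 5], [2, 3, 4, 5]]  -- i.e. {emptyset, {1}, Omega\{1}, Omega}
```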
More examples
• Ω = R (real numbers)
• Λ = the open intervals of R
• Σ(Λ) is called the Borel sets
• If Λ is a σ-algebra then
Σ(Λ) = Λ (prove: exercise)
Examples (probabilities)
• Ω = N (natural numbers) = {1, 2, ...}
• Σ = 2^N (all subsets)
• P(n) = 1/n (wrong, why: exercise)
• P(n) = K/n² (find K: exercise)
• Λ = {{1}}
• Σ(Λ) = {N, ∅, N\{1}, {1}}
• P({1}) = 0.5
• Specifying P(A) for all A ∈ Λ is sufficient for
specifying P(A) for all A ∈ Σ(Λ)
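As a tiny illustration of the last point: with Λ = {{1}} and P({1}) = 0.5, the probability of every set in Σ(Λ) already follows from the axioms. A minimal sketch:

```python
# Sigma(Lambda) = {emptyset, {1}, N\{1}, N}; everything follows from P({1}) and the axioms.
p_one = 0.5
extension = {
    "emptyset": 0.0,          # P(emptyset) = 0
    "{1}": p_one,             # given on the generating class
    "N\\{1}": 1.0 - p_one,    # complement rule: P(A complement) = 1 - P(A)
    "N": 1.0,                 # P(Omega) = 1
}
print(extension)
```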
Random (stochastic) variables
• Consider two sets Ω and X, each equipped with
σ-algebras Σ and F, respectively
• A mapping x: Ω → X is called measurable
if for any A ∈ F, x⁻¹(A) ∈ Σ
(inverse images of measurable sets are measurable)
• There can exist A ∈ Σ such that x(A) ∉ F !!! (Souslin)
• A random variable is a measurable mapping.
• Piecewise continuous mappings g: R → R are
measurable w.r.t. the Borel sets.
Induced probability distributions
• Let (Ω, Σ, P) be a probability space
• Let x: Ω → R (with the Borel sets)
• Then for A ⊆ R (Borel set) define
Px(A) = P(x⁻¹(A))
• Define F: R → [0,1] by
F(y) = Px((-∞, y])
• F is called a distribution function (CDF)
Integration
• Let A be measurable
• Let P be a probability
• Let χ(A) be its indicator function; then we define its integral by
∫ χ(A) dP = ∫A dP = P(A)
• A step function (simple function) is a weighted sum of indicator
functions, i.e.
g = ∑i ai χ(Ai)
• ∫ g dP = ∑i ai P(Ai)
• For a positive measurable mapping g, define the integral
∫ g dP = sup over all simple functions g' ≤ g of ∫ g' dP
• Generally, for a measurable mapping g,
∫ g dP = ∫ g⁺ dP - ∫ g⁻ dP
Integral Examples
• Let g: [0,1] → R be continuous.
• μ([a,b]) = b - a (Lebesgue measure)
• Then ∫ g dμ is the Riemann integral
• Let g: [0,1] → {0,1} be defined by:
g(x) = 1 for x ∈ Q (rational)
g(x) = 0 elsewhere
• Let μ be the Lebesgue measure; then
∫ g dμ = 0, whereas the Riemann integral is undefined (prove: exercise)
Integral Examples
• Let P(A) = χ(x ∈ A) (probability measure concentrated
on x)
• P({x}) = 1
• Then ∫ g dP = g(x) (sampling – Dirac)
• Let P be concentrated on a sequence of points {xi}, i.e.
P(A) = ∑i pi χ(xi ∈ A) (P({xi}) = pi) (exercise: prove)
then
∫ g dP = ∑i pi g(xi)
Density functions
• Let two measures μ and P be defined on some
measurable space Ω
• Assume μ(A) = 0 => P(A) = 0; then we say that
P is absolutely continuous (AC) w.r.t. μ
• Theorem (Radon–Nikodym)
Let P be AC w.r.t. μ; then a measurable function
f: Ω → R exists such that
P(A) = ∫A f dμ
• f is called the density function of P
Density functions Example
• Consider the reals R, the Borel sets and the
Lebesgue measure μ
• Let P be absolutely continuous w.r.t. the Lebesgue
measure (in particular P({x}) = 0 for every x ∈ R)
Then f exists, such that
P(A) = ∫A f dμ
(AKA ∫A f(x) dx)
Conditional Probabilities
• Classic (Bayes)
• Let A, B be subsets
P(A|B) = P(A ∩ B) / P(B)
• In measure theory
• Let P be a probability measure
Let F ⊆ Σ be σ-algebras (F is a sub-σ-algebra of Σ)
• Define, for A ∈ Σ, P(A|F) to be some function,
measurable in F, such that for any F ∈ F
P(A ∩ F) = ∫F P(A|F) dP
Conditional Probabilities
• Theorem
P(A|F) exists !!
• Proof:
Let μA(F) = P(A ∩ F) (a measure on F)
μA is AC w.r.t. P
RN guarantees the existence of P(A|F),
measurable in F, i.e.
P(A ∩ F) = μA(F) = ∫F P(A|F) dP
An identity for conditional probabilities
• Let F ⊆ Σ be σ-algebras
• If P(A|Σ) is F measurable
• Then:
P(A|F) = P(A|Σ) w.P.1
• Proof: for all F ∈ F
P(A ∩ F) = ∫F P(A|Σ) dP = ∫F P(A|F) dP
Conditional Probabilities
• Theorem
P(A|F) is almost unique,
i.e., if g and g' are both candidates,
then
∫F g dP = ∫F g' dP for all F ∈ F,
and hence g = g' w.P.1
• Proof:
(exercise)
Conditional Probabilities Examples
• Let Ω = R²
• Let Σ be the smallest σ-algebra including B × B
(B denotes the Borel sets of R)
• Let x and y: R² → R (the natural projections) be
two random variables.
• Let Σx and Σy be the smallest σ-algebras such
that x and y are measurable
(generated by x and y resp.)
• Both Σx ⊆ Σ and Σy ⊆ Σ
Conditional Probabilities Examples
• Both Σx ⊆ Σ and Σy ⊆ Σ
• Define P(y ∈ Y|Σx) = P(y⁻¹(Y)|Σx) to be some
function measurable in Σx, such that
P(y⁻¹(Y) ∩ x⁻¹(X)) = ∫x⁻¹(X) P(y ∈ Y|Σx) dP
• We define
P(y ∈ Y|x) = P(y ∈ Y|Σx) = Py|x(Y)
Conditional Probabilities Examples
• Lemma: Σx = {B × R | B ∈ B}
• Proof:
{B × R | B ∈ B} is a σ-algebra
x⁻¹(B) = B × R for any B ∈ B
Assume B × R ∉ Σx for some B ∈ B; then
x⁻¹(B) ∉ Σx
(contradiction)
So {B × R | B ∈ B} is minimal
Conditional Probabilities Examples
• Lemma: P(y ∈ Y|x) is a function of x
• Proof:
Define g(x,y) = P(y ∈ Y|Σx) (= P(y ∈ Y|x))
Assume g(x,y1) = g1 ≠ g(x,y2) = g2 for some y1 ≠ y2
Thus (x,y1) ∈ g⁻¹(g1) and (x,y2) ∉ g⁻¹(g1), so that
g⁻¹(g1) ∉ {B × R | B ∈ B} = Σx (contradiction)
q.e.d.
Moments
• Let x be a random variable
• Expectation
E(x) = ∫ x dP
• p-th moment
E(|x|^p)
• Variance
E((x - E(x))²) = E(x²) - E²(x)
Conditional moments
• Let F ⊆ Σ be σ-algebras
• Let x and y be Σ measurable (random variables)
• E(y|F) is an F measurable function such that for all F ∈ F
∫F E(y|F) dP = ∫F y dP
• Define μF(A) = P(A|F)
• Then (prove as an exercise)
E(y|F) = ∫ y dμF
Conditional moments
• E(y|x) = E(y|Σx) is measurable in Σx
(a function of x) (exercise: prove)
such that for A ∈ Σx
∫A y dP = ∫A E(y|Σx) dP = ∫A E(y|x) dP
• E(y|x) = ∫ y dPy|x (exercise: prove)
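On a finite probability space the defining property ∫A E(y|x) dP = ∫A y dP, A ∈ Σx, pins E(y|x) down explicitly: on each level set {x = a} it is the P-weighted average of y. A minimal sketch, assuming a hypothetical discrete joint distribution given as a table:

```python
import numpy as np

# Joint distribution of (x, y) on a finite grid: p[i, j] = P(x = xs[i], y = ys[j]).
xs = np.array([0.0, 1.0])
ys = np.array([-1.0, 0.0, 2.0])
p = np.array([[0.10, 0.20, 0.10],
              [0.15, 0.25, 0.20]])

def conditional_expectation(p, ys):
    """E(y | x = a) for each level set of x: P-weighted average of y over that slice."""
    p_x = p.sum(axis=1)             # marginal distribution of x
    return (p @ ys) / p_x           # one number per value of x

e_y_given_x = conditional_expectation(p, ys)
print(dict(zip(xs.tolist(), e_y_given_x.tolist())))

# Check the defining property on A = {x = xs[1]}: int_A E(y|x) dP == int_A y dP
i = 1
print(np.isclose(e_y_given_x[i] * p[i].sum(), (p[i] * ys).sum()))   # True
```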
An identity for conditional expectations
• Let F ⊆ Σ be σ-algebras
• If E(y|Σ) is F measurable then
E(y|F) = E(y|Σ) w.P.1
• Proof: for all F ∈ F
∫F E(y|F) dP = ∫F y dP = ∫F E(y|Σ) dP
Another identity for conditional expectations
• Let F ⊆ Σ be σ-algebras
• If y is F measurable then
E(y|F) = y w.P.1
• Proof: for all F ∈ F
∫F E(y|F) dP = ∫F y dP
Stochastic Processes
• Let T be an index (time) set (R, R+, Z, N)
• Let x: Ω × T → R,
where for each t ∈ T, x(·,t) is a random
variable
• Then we say that x is a stochastic (random)
process.
Stochastic Process Example
• Let Ω = {0,1}
• P(1) = P(0) = 1/2
• x(0,t) = 0
• x(1,t) = t
• E(x(·,t)) = ∫ x(·,t) dP = ½ · 0 + ½ · t = t/2
• E(x²(·,t)) = ∫ x²(·,t) dP = ½ · 0 + ½ · t² = t²/2
Cylinders
• Let {t1, ..., tn} be a finite ordered subset of T
• Let {A1, ..., An} be subsets of R
• Then C = {x(·,t1) ∈ A1} ∩ ... ∩ {x(·,tn) ∈ An} ⊆ Ω is
called a cylinder
• Let C be defined as above; then
Ch = {x(·,t1+h) ∈ A1} ∩ ... ∩ {x(·,tn+h) ∈ An}
Stationarity
• A random process is stationary iff
P(Ch) = P(C) for all cylinders C and all h
• x stationary => E(x(·,t)) = E(x) = E (constant in t)
• x stationary => E(x²(·,t)) = E(x²)
• x stationary => E((x(·,t+h) - E)(x(·,t) - E)) = Cxx(h)
• Stationarity => wide-sense stationarity
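For a stationary process the mean and the autocovariance Cxx(h) do not depend on t, so they can be estimated across independent sample paths at any time origin. A minimal Monte-Carlo sketch with a stationary AR(1) process (all parameter values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
a, n_paths, n_steps, burn_in = 0.8, 5000, 300, 200

# Many independent paths of the AR(1) recursion x_t = a x_{t-1} + e_t, e_t ~ N(0,1).
x = np.zeros((n_paths, n_steps))
for t in range(1, n_steps):
    x[:, t] = a * x[:, t - 1] + rng.standard_normal(n_paths)
x = x[:, burn_in:]                       # drop the transient so the process is ~stationary

t0, h = 10, 5                            # the estimates should not depend on the origin t0
mean_hat = x[:, t0].mean()
cov_hat = np.mean((x[:, t0 + h] - mean_hat) * (x[:, t0] - mean_hat))
print(mean_hat)                          # ~ 0
print(cov_hat, a**h / (1 - a**2))        # estimate vs. the theoretical a^h / (1 - a^2)
```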
Filtrations
• A filtration is an increasing set of σ-algebras {Σt}
• Σt ⊆ Σ
• t1 < t2 => Σt1 ⊆ Σt2
• Let x be a random process
• Σxt = σ( x(·,τ)⁻¹(A) : A Borel, τ ≤ t )
(i.e. the smallest σ-algebra including all inverse
cylinder sets of x(·,τ), τ ≤ t)
• Σx = {Σxt} is called the natural filtration of x
Adapted processes
• A random process x is adapted to a filtration {Σt} iff
for every t, x(·,t) is measurable w.r.t. Σt
• x is adapted to its natural filtration
• Lemma:
Let x be adapted to {Σt}, and t1 < ... < tn; then
{x(·,t1) ∈ A1} ∪ ... ∪ {x(·,tn) ∈ An} ∈ Σtn
Proof:
{x(·,tj) ∈ Aj} ∈ Σtj ⊆ Σtn for all j ≤ n
Since Σtn is a σ-algebra, the result follows
Adapted processes
• Lemma:
Let x be adapted to {Σt}, and t1 < ... < tn; then
C = {x(·,t1) ∈ A1} ∩ ... ∩ {x(·,tn) ∈ An} ∈ Σtn
Proof:
{x(·,t1) ∈ A1ᶜ} ∪ ... ∪ {x(·,tn) ∈ Anᶜ} ∈ Σtn
=>
({x(·,t1) ∈ A1ᶜ} ∪ ... ∪ {x(·,tn) ∈ Anᶜ})ᶜ ∈ Σtn
=>
C = {x(·,t1) ∈ A1} ∩ ... ∩ {x(·,tn) ∈ An} ∈ Σtn
Stochastic Convergence
• i) Convergence in probability (stochastic convergence)
limt→∞ P(|x(t)| > δ) = 0 for all δ > 0
• ii) Convergence in mean/moment
limt→∞ E[|x(t)|^ε] = 0
• iii) Almost sure convergence
P(limt→∞ x(t) = 0) = 1
• iv) Convergence in distribution (weak, in law)
limt→∞ P(x(t) ≤ x) = FX(x)
Convergence relations
• ii) => i)
• Proof:
Markov inequality:
P(|x(t)| ≥ a) ≤ E(|x(t)|)/a
P(|x(t)| ≥ a^(1/ε)) =
P(|x(t)|^ε ≥ a) ≤ E(|x(t)|^ε)/a
Convergence relations
• iii) => i)
• Proof:
A = {ω | limt→∞ x(ω,t) = 0}
P(A) = 1
*) limt→∞ x(ω,t) = 0 => ∃ n s.t. |x(ω,t)| ≤ δ
for all t ≥ n
For a given ω, let n(ω) be the smallest such n
Let Am = {ω | n(ω) = m}
{Am} are disjoint
*) implies A = ∪m Am
and in turn
1 = P(A) = P(∪m Am) = ∑m P(Am), i.e.
P(|x(ω,t)| ≤ δ for all t ≥ l) ≥ ∑m=1..l P(Am) → 1 for l → ∞
Convergence relations
• i) does not imply iii)
• Proof by counterexample:
Define c(k) uniformly distributed in {2^k, ..., 2^(k+1)}
Let {c(k)} be independent
For n ∈ {2^k, ..., 2^(k+1)}, let
x(n) = 1 if c(k) = n
x(n) = 0 else
P(x(n) ≤ δ) = P(x(n) = 0) ≥ 1 - 2^(-k) (convergence in probability)
{x(n)} converges to 0 nowhere (no a.s. convergence)
• i) does not imply ii) (exercise)
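The counterexample can be simulated directly: within the k-th block of indices the path takes the value 1 exactly once, so every path keeps returning to 1 (no almost sure convergence), while P(x(n) = 1) ≈ 2^(-k) → 0 (convergence in probability). A minimal sketch using half-open blocks {2^k, ..., 2^(k+1) - 1}:

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_path(k_max):
    """One realisation of x(n) over the blocks {2^k, ..., 2^(k+1)-1}, k = 0..k_max-1."""
    xs = {}
    for k in range(k_max):
        block = np.arange(2**k, 2**(k + 1))
        hit = rng.choice(block)              # c(k): uniform on the k-th block
        for n in block:
            xs[int(n)] = 1 if n == hit else 0
    return xs

path = sample_path(12)
print(sum(path.values()))                    # one '1' per block: the path never settles at 0
for k in [2, 6, 11]:                         # P(x(n) = 1) = 2^(-k) for n in block k
    print(k, 2.0**(-k))
```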
Convergence relations
• iii) does not imply ii)
• Let Ω = [1, ∞)
• Let ω ~ f(ω) = K ω^(-δ) (density, δ > 1)
• Let x(ω,t) = ω/t for t ≥ 1
• P(limt→∞ x(ω,t) = 0) = 1 (a.s. convergence)
• E(x(ω,t)^ε) = 1/t^ε ∫ ω^ε K ω^(-δ) dω
= K/t^ε ∫ ω^(ε-δ) dω
= K/t^ε · 1/(1+ε-δ) · [ω^(1+ε-δ)] evaluated from 1 to ∞
• Finite only for 1+ε-δ < 0
• More generally: see the diagram of implications between the modes of convergence (Wikipedia)
Types of Stochastic Processes
• Markov processes
• Poisson processes
• Martingales
• Brownian motions
• Ito processes
• Ito diffusions
• Levy processes
• Renewal processes
Markov Processes
• Let X be a random process adapted to the
filtration (Ft)
• Then X is a Markov process w.r.t. (Ft) iff for any
measurable function f
E(f(Xt+h)|Ft) = E(f(Xt+h)|Xt) w.P.1
• In particular, if (Ft) is generated by X and f is the
indicator function of some set A,
P(Xt+h ∈ A|Xt, Xt1, ..., Xtk) = P(Xt+h ∈ A|Xt)
for any selection of times t ≥ t1 ≥ t2 ≥ ... ≥ tk
Markov Processes (Stopping times)
• Let τ be a random variable such that
χ(τ(ω) ≤ t) is adapted to (Ft)
• Then τ is said to be a stopping time w.r.t. (Ft)
• If (Ft) is generated by a random process X, a
stopping time τ (w.r.t. (Ft)) depends only on the
history of X
• Example (first escape time):
Let X(0) ∈ U
τ = inf {s | X(s) ∉ U}
Strong Markov Property
• Let X be a Markov process generating (Ft)
• Then X is said to be a strong Markov process iff
E(f(Xτ+h)|Fτ) = E(f(Xτ+h)|Xτ)
for any stopping time τ w.r.t. (Ft)
(Fτ is generated by the history of X until τ)
• Every discrete time Markov process (chain) is
strong
Non-strong Markov Process Example
• Let g: R → R² (a 1-D curve in the plane)
• Let x1 < x2
• Let G = g(x1) = g(x2) and g(x) ≠ g(y) elsewhere
• G1 = {g(x) | x < x1}, G2 = {g(x) | x1 < x < x2}, G3 = {g(x) | x2 < x}
• Let X: Ω × R → R be a strong Markov process, such that
0 < P(X(ω,t+h) ≤ y | X(ω,t) = x) = ∫(-∞,y] Hh(η - x) dη < 1 for all x, y
(P(X(ω,t) = x1) = P(X(ω,t) = x2) = 0 for any fixed t > 0)
and
X(ω,0) ≠ x1 and X(ω,0) ≠ x2
• Let
τ = inf {s | g(X(s)) = G} = inf {s | X(s) ∈ {x1, x2}}
• Assume P(X(ω,τ) = x1) > 0 and P(X(ω,τ) = x2) > 0
Non-strong Markov Process Example
• The process gt = g(Xt) is adapted to σ({Xs}); therefore σ({gs}) ⊆ σ({Xs})
• Since X is (strong) Markov:
E(f∘g(Xt+h)|σ({Xt})) = E(f∘g(Xt+h)|Xt)
• Prove:
E(f(gt+h)|σ({gt})) = E(f(gt+h)|gt)
• Since Xt = g⁻¹(gt) (w.P.1), there is a version of E(f∘g(Xt+h)|Xt)
which is a function of gt, i.e.
E(f∘g(Xt+h)|σ({Xt})) = E(f∘g(Xt+h)|Xt) = E(f(gt+h)|gt) w.P.1
• Thus gt is a Markov process
Non-strong Markov Process Example
• P(gτ+h ∈ G2 | σ({gτ})) = P(gτ+h ∈ G2 | σ({Xτ})) w.P.1
• P(gτ+h ∈ G2 | σ({Xτ})) = P(gτ+h ∈ G2 | Xτ) (X is strong)
= P(Xτ+h ∈ (x1,x2) | Xτ) = ∫[x1,x2] Hh(η - Xτ) dη
• On {ω | X(ω,τ) = x1}:
P(Xτ+h ∈ (x1,x2) | Xτ) = ∫[x1,x2] Hh(η - x1) dη
• On {ω | X(ω,τ) = x2}:
P(Xτ+h ∈ (x1,x2) | Xτ) = ∫[x1,x2] Hh(η - x2) dη
• Let f(g) = χ(g ∈ G2); then E(f(gτ+h)|gτ) = P(gτ+h ∈ G2 | gτ)
• Since E(f(gτ+h)|gτ) is measurable in σ(gτ), it has to be constant on {gτ = G}
• However, since ∫[x1,x2] Hh(η - x1) dη ≠ ∫[x1,x2] Hh(η - x2) dη, it cannot coincide
with P(gτ+h ∈ G2 | σ({gτ}))
Markov Processes: remarks
• Markov processes model uncertain state dynamics
• BJ models have a Markov representation
• Queueing models are typically Markov
• Extensions: semi-Markov, GSMP
• Extensions: HMM, MDP
• Discrete time -> Markov chains
Poisson processes
• A Poisson process T is a point process, i.e.
T: Ω × N → R
• {Tn} is increasing with probability 1
• Independent increments, i.e.
P(Tn+k - Tn ≤ d | Fn) = P(Tn+k - Tn ≤ d)
• Exponential increments, i.e.
P(Tn+1 - Tn ≤ d) = 1 - exp(-λd)
• λ: intensity parameter
Poisson counting process
• Let T be a Poisson process
• Let the process N: Ω × R → N be defined by
N(ω,0) = 0
N(ω,t) = max {n | Tn ≤ t}
• N is the Poisson counting process and is non-decreasing
• N is a continuous time Markov process
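A Poisson process and its counting process are easy to simulate directly from the defining exponential increments. A minimal sketch (λ and the horizon are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
lam, horizon = 1.5, 10.0

# Event times T_n: cumulative sums of i.i.d. Exponential(lam) increments.
increments = rng.exponential(1.0 / lam, size=1000)
T = np.cumsum(increments)
T = T[T <= horizon]

def N(t, T=T):
    """Poisson counting process: number of events up to and including time t."""
    return int(np.searchsorted(T, t, side="right"))

print(T[:5])                      # first event times
print(N(horizon))                 # ~ Poisson(lam * horizon); mean 15 with these values
print(N(T[3]), N(T[3] - 1e-9))    # right continuous: jumps exactly at the event times
```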
Poisson counting process
• N has right continuous paths, i.e.
consider a realization {Tn} of T;
N(t) = n is constant for t ∈ [Tn, Tn+1), such that
lim (from the right) t → Tn of N(t) = n = N(Tn)
• N has left limits, i.e.
lim (from the left) t → Tn of N(t) = n - 1
• N is our first example of an rcll or cadlag
process
Martingales
• Let X: Ω × R → R
• Let (Fs) be a filtration
• X is called a martingale w.r.t. (Fs) iff
E(Xt | Fs) = Xs for every t ≥ s
• Or
E(Xt | Fs) - Xs = E(Xt - Xs | Fs) = 0 for every t ≥ s
• Thus a martingale has increments of zero conditional
expectation.
• A martingale weather forecast would predict the
weather tomorrow to be as it is today.
• Super- (≤) and sub-martingales (≥).
Martingale properties
• Let X be a martingale (or a positive sub-martingale)
• Let |X|*t = sups {|Xs|, s ≤ t}
• Then
P(|X|*t ≥ L) ≤ E(|Xt|)/L
• Recall the Markov inequality
P(|Xt| ≥ L) ≤ E(|Xt|)/L
• Thus for a martingale, |X|*t obeys a bound similar
to that for |Xt|, even though |X|*t selects the largest
element.
Martingale Properties Example
• Let {en} be an i.i.d. random (sequence) process, en ~ N(0,1)
• Let Xn = Xn-1 + en, X0 = 0 (random walk)
• Then X is a martingale and
• Xn ~ N(0,n)
• E(|Xn|) = (2πn)^(-1/2) ∫ |x| exp(-x²/(2n)) dx
= n (2πn)^(-1/2) ∫ |y| exp(-y²/2) dy
= K n^(1/2)
• Finally
P(|X|*n ≥ L) ≤ K n^(1/2)/L
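The maximal inequality for the random-walk martingale is easy to check by Monte Carlo: the running maximum of |X| is controlled by E(|Xn|)/L even though it looks at the whole path. A minimal sketch (L and the horizon are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
n_paths, n_steps, L = 20000, 200, 30.0

# Random walk martingale X_n = X_{n-1} + e_n, e_n ~ N(0,1), X_0 = 0.
X = np.cumsum(rng.standard_normal((n_paths, n_steps)), axis=1)

running_max = np.max(np.abs(X), axis=1)          # |X|*_n over the whole path
lhs = np.mean(running_max >= L)                  # P(|X|*_n >= L)
rhs = np.mean(np.abs(X[:, -1])) / L              # E(|X_n|) / L
print(lhs, rhs, lhs <= rhs)                      # the bound holds (up to Monte-Carlo noise)
```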
Martingale Convergence
• Let X be a non-negative supermartingale, i.e.
Xt ≥ 0 and E(Xt | Fs) ≤ Xs for every t ≥ s
• Then there is a random variable X∞, s.t.
X∞ ≥ 0
E(X∞) ≤ E(X0)
and
Xt → X∞ w.P.1
Martingale Convergence Example
• Let {en} be an i.i.d. random (sequence) process,
en ~ N(0, s²)
• Let Xn = a Xn-1 + Xn-1 en, X0 = x0
• Yn = Xn² = (a + en)² Xn-1² = (a + en)² Yn-1
• Since Yn-1 and en are independent
E[Yn] = E[(a + en)²] E[Yn-1]
• If E[(a + en)²] = a² + s² < 1, Yn is a non-negative supermartingale and
E[Yn] → 0
• Thus Y∞ ≥ 0 and E(Y∞) = 0 => Y∞ = 0 w.P.1
• Finally
Yn → 0 w.P.1
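A minimal simulation of this example (with a² + s² < 1; the recursion needs a nonzero start to be non-trivial, so x0 = 1 is used here):

```python
import numpy as np

rng = np.random.default_rng(4)
a, s, x0, n_paths, n_steps = 0.6, 0.5, 1.0, 5000, 400    # a^2 + s^2 = 0.61 < 1

X = np.full(n_paths, x0)
mean_Y = []
for _ in range(n_steps):
    e = s * rng.standard_normal(n_paths)
    X = a * X + X * e                     # X_n = a X_{n-1} + X_{n-1} e_n
    mean_Y.append(np.mean(X**2))          # Monte-Carlo estimate of E[Y_n]

print(mean_Y[0], mean_Y[50])              # E[Y_n] decays roughly like (a^2 + s^2)^n
print(np.mean(np.abs(X) < 1e-6))          # almost every path has essentially reached 0
```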
Measurement Noise
[Block diagram: input u = 0 drives an input/output system; measurement noise N is added to the output, so y = 0 and the measured output is ym = N]
• At time t the measurement device would give: ymm(t) = (1/d) ∫[t-d,t] N(τ) dτ
• We observe that the sequence {ymm(tj)} would be an i.i.d. sequence if tj - tj-1 ≥ d
• Moreover ymm(tj) ~ N(0,s)
• Thus the process w(t) = ∫[0,t] N(τ) dτ has independent Gaussian increments,
i.e. w(t1) - w(t2) is zero-mean Gaussian
• Since w(t) - w(0) = w(t) - w(t/2) + w(t/2) - w(0),
VAR(w(t) - w(0)) = VAR(w(t) - w(t/2)) + VAR(w(t/2) - w(0)) = 2 VAR(w(t/2) - w(0))
• Generally:
VAR(w(ct) - w(0)) = c VAR(w(t) - w(0))
Measurement Noise
• VAR(w(ct) - w(0)) = c VAR(w(t) - w(0))
• As a standard we take w(0) = 0 and VAR(w(1)) = 1
• Thus
VAR(w(t)) = t
• Covariance (s < t):
E[w(t)w(s)]
= E[(w(t) - w(s) + w(s)) w(s)]
= E[w(s)w(s)] = s = t ∧ s
• w is the standard Wiener process; Wiener and Kolmogorov both
proved its existence.
• It has a version w' with continuous paths, i.e.
P(w(t) ≠ w'(t)) = 0 for every t, and w' is continuous in t
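A minimal sketch of the standard Wiener process on a grid, checking VAR(w(t)) ≈ t and E[w(t)w(s)] ≈ t ∧ s by Monte Carlo:

```python
import numpy as np

rng = np.random.default_rng(5)
n_paths, n_steps, dt = 10000, 1000, 0.001              # horizon t = 1

# Increments over each grid step are independent N(0, dt).
dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
W = np.concatenate([np.zeros((n_paths, 1)), np.cumsum(dW, axis=1)], axis=1)

i_t, i_s = 800, 300                                    # t = 0.8, s = 0.3
print(np.var(W[:, i_t]), i_t * dt)                     # VAR(w(t)) ~ t
print(np.mean(W[:, i_t] * W[:, i_s]), i_s * dt)        # E[w(t) w(s)] ~ min(s, t)
```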
Process Noise
[Block diagram: input u plus process noise N gives uM, which drives the input/output system with state x]
• d/dt x = φ(x,uM) = φ(x,u+N) (dynamics)
• d/dt x = φ(x,u+N) ≈ φ(x,u) + φu(x,u) N = f(x,u) + G(x,u) N
The noise process N
• w(t) = ∫[0,t] N(τ) dτ
• w(t) is the Wiener process
• w(0) = 0, a > 0
• τ(a) = min {t | w(t) = a} (first hitting time)
• P(w(t) > a | τ(a) = s) = P(w(t) < a | τ(a) = s) = ½, s < t
• Now
∫[0,t] P(w(t) > a | τ(a) = s) dPτ(s) =
P(w(t) > a ∧ τ(a) < t) =
P(w(t) > a) = ∫[0,t] ½ dPτ(s) = ½ P(τ(a) < t)
• Let M(t) = max {w(s), 0 ≤ s ≤ t}
• Then
P(M(t) > a) = P(τ(a) < t) = 2 P(w(t) > a)
The noise process N
• w(t) = ∫[0,t] N(τ) dτ
• P(M(t) > a) = 2 P(w(t) > a)
• Assume w is differentiable at t = 0
• Then the limit limt→0 w(t)/t exists (finite)
• So on some interval t ∈ [0,ε],
w(t) ≤ at for some positive a => M(t) ≤ at
• P(M(t) ≤ at) = 1 - P(M(t) > at) = 1 - 2 P(w(t) > at)
• So
P(M(t) ≤ at) → 0 for t → 0
• And
P(w is differentiable at t = 0) = 0
• The argument generalises to all times, i.e.:
P(w is differentiable at t) = 0
Process Noise
[Block diagram: input u plus process noise N gives uM, which drives the input/output system with state x]
• d/dt x = φ(x,u+N) ≈ φ(x,u) + φu(x,u) N = f(x,u) + G(x,u) N
• How to proceed without N?
• Suggestion:
x(t) = ∫[0,t] f(x(s),u(s)) ds + ∫[0,t] G(x(s),u(s)) N(s) ds
• Get rid of N by suggesting N(s) ds = dw:
x(t) = ∫[0,t] f(x(s),u(s)) ds + ∫[0,t] G(x(s),u(s)) dw(s)
A stochastic integral
• x(t) = ∫[0,t] f(x(s),u(s)) ds + ∫[0,t] G(x(s),u(s)) dw(s)
• Let G be constant; then:
∫[t,t'] G dw(s) = ∫ G χ(s ∈ (t,t']) dw(s) = G (w(t') - w(t))
(consistent with observations)
• Let G be a step function, i.e.
G(s,ω) = ∑i Gi(ω) χ(s ∈ (ti, ti+1])
• Then
∫[t,t'] G(s,ω) dw(s) = ∑i Gi(ω) (w(ti+1) - w(ti))
(consistent with general properties of integrals)
A stochastic integral
• ∫[t,t'] G(s,ω) dw(s) = ∑i Gi(ω) (w(ti+1) - w(ti))
• For any other function
∫[t,t'] G(s,ω) dw(s) = ∑i G(τi, ω) (w(ti+1) - w(ti)), τi ∈ (ti, ti+1]
• We choose τi somewhat regularly, i.e.
τi = ti + a (ti+1 - ti), a ∈ [0,1]
• Having (ti+1 - ti) → 0,
will the choice of 'a' impact the nature of the defined
integral ??
A stochastic integral
• ∫[t,t'] G(s,ω) dw(s) = ∑i G(τi, ω) (w(ti+1) - w(ti)), τi ∈ (ti, ti+1]
• τi = ti + a (ti+1 - ti), a ∈ [0,1]
• ∫[t,t'] G(s,ω) dw(s) = ∑i G(ti + a (ti+1 - ti), ω) (w(ti+1) - w(ti))
• Consider then G(s,ω) = w(s,ω):
∫[t,t'] G(s,ω) dw(s) = ∫[t,t'] w(s,ω) dw(s)
= lim ∑i w(ti + a (ti+1 - ti), ω) (w(ti+1) - w(ti))
• So (boundedness assumed)
E(∫[t,t'] w(s) dw(s))
= lim ∑i E[w(ti + a (ti+1 - ti)) (w(ti+1) - w(ti))]
= lim ∑i E[w(ti + a (ti+1 - ti)) (w(ti+1) - w(ti + a (ti+1 - ti)) + w(ti + a (ti+1 - ti)) - w(ti))]
= lim ∑i E[w(ti + a (ti+1 - ti)) w(ti + a (ti+1 - ti))] - E[w(ti + a (ti+1 - ti)) w(ti)]
= lim ∑i (ti + a (ti+1 - ti)) - ti = a (t' - t)
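The dependence on the evaluation point a can be seen numerically: with left-endpoint evaluation (a = 0) the approximating sums for ∫ w dw have mean ≈ 0, while midpoint evaluation (a = 1/2) gives mean ≈ (t'-t)/2. A minimal sketch, simulating w on a grid twice as fine so the midpoints are available:

```python
import numpy as np

rng = np.random.default_rng(6)
n_paths, n_steps, T = 10000, 500, 1.0
dt = T / n_steps

# Simulate w on a grid of spacing dt/2 so both t_i and the midpoints t_i + dt/2 exist.
dW_half = rng.normal(0.0, np.sqrt(dt / 2), size=(n_paths, 2 * n_steps))
W = np.concatenate([np.zeros((n_paths, 1)), np.cumsum(dW_half, axis=1)], axis=1)

W_left = W[:, 0:-1:2]            # w(t_i)              -> evaluation point a = 0
W_mid  = W[:, 1::2]              # w(t_i + dt/2)       -> evaluation point a = 1/2
dW     = W[:, 2::2] - W_left     # w(t_{i+1}) - w(t_i)

print(np.mean(np.sum(W_left * dW, axis=1)), 0.0)        # a = 0:   E ~ 0    (Ito)
print(np.mean(np.sum(W_mid  * dW, axis=1)), 0.5 * T)    # a = 1/2: E ~ T/2  (Stratonovich)
```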
Ito and Stratonovich
• E(∫[t,t'] w(s) dw(s)) = a (t' - t)
• Choosing a = 0 gives the Ito integral
• Let G be nonanticipating, i.e.
G(s) is measurable in the history of w up to time s
• Then (with the Ito definition)
E[∫[t,t'] G(s,ω) dw(s)] = 0
• Choosing a = 1/2 gives the Stratonovich integral, written
∫[t,t'] G(s,ω) ∘ dw(s)
• We shall mostly stick to the Ito definition
• However, both definitions have different nice properties.
Definition of the Ito stochastic integral
• Definition for step functions, where
G(s,ω) = G(ti, ω) for s ∈ (ti, ti+1]:
∫[t,t'] G(s,ω) dw(s) = ∑i G(ti, ω) (w(ti+1) - w(ti))
• For other non-anticipative functions,
let {{ti}n} be a family of partitions of the interval [t,t'], with Gn the step function
equal to G(tⁿi, ω) on (tⁿi, tⁿi+1], such that
supi (tⁿi+1 - tⁿi) → 0 for n → ∞
and (really enough)
p-limn ∫[t,t'] |G(s,ω) - Gn(s,ω)|² ds = 0
• Then
∫[t,t'] G(s,ω) dw(s) = p-limn ∑i G(tⁿi, ω) (w(tⁿi+1) - w(tⁿi))
= p-limn ∫[t,t'] Gn(s,ω) dw(s)
Convergence
• For the definition to be meaningful,
the right hand side has to have a limit,
and this limit should be independent of the choice of partitions.
• Arnold shows that
∫[t,t'] Gn(s,ω) dw(s)
is a stochastic Cauchy sequence, i.e.
limn,m P[|∫[t,t'] (Gn(s,ω) - Gm(s,ω)) dw(s)| > ε] = 0 for all ε > 0
• Every stochastic Cauchy sequence converges stochastically to some
random variable I(G)
• Uniqueness is established by considering two families of partitions
{{ti}n} and {{τi}n} fulfilling the requirements.
• We can define a third family by {di}2n = {ti}n, {di}2n+1 = {τi}n, also fulfilling the
requirements.
• Convergence for d to I(G) implies convergence of t and τ to I(G)
Properties of the Stochastic Integral
• Simpler for step functions:
∫[t,t'] G(s,ω) dw(s) = ∑i G(ti, ω) (w(ti+1) - w(ti))
• E[∫[t,t'] G(s,ω) dw(s)] = 0
• Proof:
E[∑i G(ti, ω) (w(ti+1) - w(ti))] =
∑i E[G(ti, ω)] E[w(ti+1) - w(ti)] = 0
(G(ti, ω) is independent of the increment since G is non-anticipating)
Properties of the Stochastic Integral
• E[∫[t,t'] G(s) dw(s) ∫[t,s'] G(s) dw(s)] for s' < t' (G independent of ω/w)
• Proof (step functions; choose the partition so that tj = s'):
E[∑i G(ti)(w(ti+1) - w(ti)) · ∑i<j G(ti)(w(ti+1) - w(ti))] =
E[{∑i≥j G(ti)(w(ti+1) - w(ti)) + ∑i<j G(ti)(w(ti+1) - w(ti))} · ∑k<j G(tk)(w(tk+1) - w(tk))] =
E[∑i≥j G(ti)(w(ti+1) - w(ti)) · ∑k<j G(tk)(w(tk+1) - w(tk))] +
E[∑i<j G(ti)(w(ti+1) - w(ti)) · ∑k<j G(tk)(w(tk+1) - w(tk))] =
E[∑i≥j ∑k<j G(ti) G(tk) (w(ti+1) - w(ti))(w(tk+1) - w(tk))] +
E[∑i<j ∑k<j G(ti) G(tk) (w(ti+1) - w(ti))(w(tk+1) - w(tk))] =
Properties of the Stochastic Integral
∑i≥j ∑k<j G(ti) G(tk) E[(w(ti+1) - w(ti))(w(tk+1) - w(tk))] +
∑i<j ∑k<j G(ti) G(tk) E[(w(ti+1) - w(ti))(w(tk+1) - w(tk))] =
(the first double sum vanishes: non-overlapping increments are independent with mean zero,
and in the second only the diagonal terms i = k survive)
∑i<j G(ti) G(ti) E[(w(ti+1) - w(ti))(w(ti+1) - w(ti))] =
∑i<j G(ti) G(ti) (ti+1 - ti) = ∫[t,s'] G²(s) ds
Stochastic Differential Equations (SDE)
• x(t) = ∫[0,t] f(x(s),u(s)) ds + ∫[0,t] G(x(s),u(s)) dw(s)   (i)
• dx = f(x,t) dt + G(x,t) dw (shorthand)
• A solution to (i) is non-anticipating and fulfils (i) for
every t w.P.1, i.e.
P(x(t) = ∫[0,t] f(x(s),u(s)) ds + ∫[0,t] G(x(s),u(s)) dw(s)) = 1 for all t
• Not
P(x(t) = ∫[0,t] f(x(s),u(s)) ds + ∫[0,t] G(x(s),u(s)) dw(s) for all t) = 1
Ito's lemma
• Let x be a solution to
dx = f(x,t) dt + G(x,t) dw
• Then V(t) = U(x(t),t) is a solution to
dV = (Ut(x,t) + Ux(x,t) f(x,t) + ½ Uxx(x,t) G²(x,t)) dt
+ Ux(x,t) G(x,t) dw
Fixed Points
• Let X be a metric space
• Let d be the metric
• Let g: X → X
and for some 0 < L < 1
d(g(x),g(y)) ≤ L d(x,y) (Lipschitz with constant L < 1, i.e. a contraction)
• Consider the iteration
xn+1 = g(xn)
Fixed Points
• xn+1 = g(xn)
• d(xn+1, xn) ≤ L d(xn, xn-1) ≤ L^n d(x1, x0) → 0 for n → ∞
• d(xn+2, xn+1) = d(g(xn+1), g(xn)) ≤ L d(xn+1, xn)
• d(xm+n, xn) ≤
d(xm+n, xm+n-1) + ... + d(xn+1, xn) ≤
(L^(m-1) + ... + L + 1) d(xn+1, xn) ≤ d(xn+1, xn)/(1-L)
• d(xn+1, xn)/(1-L) → 0 for n → ∞
• {xn} is a Cauchy sequence
• {xn} has a unique limit x in a complete metric space
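A minimal sketch of the contraction iteration on (R, |·|); the particular map is only an example, with Lipschitz constant L = 0.5 < 1:

```python
import math

def g(x):
    return 0.5 * math.cos(x)          # |g'(x)| <= 0.5 < 1, so g is a contraction on R

x = 0.0                               # arbitrary starting point
for n in range(100):
    x_new = g(x)
    if abs(x_new - x) < 1e-12:        # Cauchy-type stopping rule
        break
    x = x_new

print(n, x, abs(g(x) - x))            # x is (numerically) the unique fixed point of g
```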
Existence and uniqueness of solutions to SDEs
• x(t) = ∫[0,t] f(x(s),u(s)) ds + ∫[0,t] G(x(s),u(s)) dw(s)
• Let M = sup {|w(s)|, s ∈ [0,t]}
• Let f and G be Lipschitz w.r.t. x, i.e. with
d(x,y) = sups |x(s,ω) - y(s,ω)|,
d(f(x,u), f(y,u)) ≤ L d(x,y)
d(G(x,u), G(y,u)) ≤ L d(x,y)
• Let T < 1/(L(M+1))
Existence and uniqueness of solutions to SDEs
• xn+1(t) = g(xn)(t) = ∫[0,t] f(xn(s),u(s)) ds + ∫[0,t] G(xn(s),u(s)) dw(s), t ≤ T
• g(x) - g(y) = ∫[0,t] (f(x(s),u(s)) - f(y(s),u(s))) ds + ∫[0,t] (G(x(s),u(s)) - G(y(s),u(s))) dw(s)
• |g(x) - g(y)|
≤ ∫[0,T] |f(x(s),u(s)) - f(y(s),u(s))| ds
+ ∫[0,T] |G(x(s),u(s)) - G(y(s),u(s))| |dw(s)|
≤ L ∫[0,T] |x(s) - y(s)| ds + L ∫[0,T] |x(s) - y(s)| |dw(s)| (step functions)
≤ LT d(x,y) + LTM d(x,y) = LT(1+M) d(x,y) < d(x,y)
• {xn} is a Cauchy sequence with a unique limit
Example
• dx = -ax dt + dw
• V(t) = U(x(t),t) = x²(t)
dV = (Ut(x,t) + Ux(x,t) f(x,t) + ½ Uxx(x,t) G²(x,t)) dt +
Ux(x,t) G(x,t) dw
= (2x(-ax) + ½ · 2) dt + 2x dw
= (-2ax² + 1) dt + 2x dw = (-2aV + 1) dt + 2x dw
• E[V(t)] = E[∫[0,t] (-2aV + 1) ds + ∫[0,t] 2x dw(s)]
E[∫[0,t] (-2aV + 1) ds] = -2a ∫[0,t] E[V] ds + t
• d/dt E[V] = -2a E[V] + 1
• E[V(t)] = ?? (exercise, stability)
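A minimal Euler–Maruyama sketch of dx = -ax dt + dw, comparing the Monte-Carlo estimate of E[V(t)] = E[x²(t)] with a numerical solution of the moment ODE d/dt E[V] = -2a E[V] + 1 (parameter values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(7)
a, T, n_steps, n_paths = 1.0, 2.0, 2000, 20000
dt = T / n_steps

x = np.zeros(n_paths)                        # Euler-Maruyama paths of dx = -a x dt + dw
ev = 0.0                                     # Euler step of d/dt E[V] = -2a E[V] + 1
for _ in range(n_steps):
    x += -a * x * dt + rng.normal(0.0, np.sqrt(dt), n_paths)
    ev += (-2.0 * a * ev + 1.0) * dt

print(np.mean(x**2), ev)                     # Monte-Carlo E[x^2(T)] vs. the moment ODE
```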
Example
• dx = a dt + b dw
• V(t) = U(x(t),t) = e^x
dV = (Ut(x,t) + Ux(x,t) f(x,t) + ½ Uxx(x,t) G²(x,t)) dt
+ Ux(x,t) G(x,t) dw
= (a e^x + ½ e^x b²) dt + e^x b dw
= V (a + ½ b²) dt + V b dw
= V c dt + V b dw
• V(t) = e^{x(t)} = exp(x(0) + a t + b w(t)) (explicit solution !!)
Linear SDEs
• dx = (ax + bu) dt + c dw
• x(t) = x(0) exp(at) solves:
dx = ax dt
• Define y by:
dy = exp(-at) b u dt + exp(-at) c dw
• Define x by:
x(t) = exp(at) y(t) = U(t, y(t))
• Then by Ito
dx = (a exp(at) y(t) + exp(at) (exp(-at) b u)) dt + exp(at) exp(-at) c dw
= (ax + bu) dt + c dw !!
(using dV = (Ut + Uy f + ½ Uyy G²) dt + Uy G dw with the y-dynamics above)
Linear SDEs
• dx = (ax + bu) dt + c dw
• dy = exp(-at) b u dt + exp(-at) c dw
• y(t) = y(0) + ∫[0,t] exp(-as) b u(s) ds + ∫[0,t] exp(-as) c dw(s)
• x(t) = exp(at) y(t) = exp(at) y(0) + exp(at) (∫[0,t] exp(-as) b u(s) ds + ∫[0,t] exp(-as) c dw(s))
= exp(at) y(0) + ∫[0,t] exp(a(t-s)) b u(s) ds + ∫[0,t] exp(a(t-s)) c dw(s)
• I.e.
• μt = E(x(t)) = exp(at) y(0) + ∫[0,t] exp(a(t-s)) b u(s) ds
• R(t,t') = E((x(t) - μt)(x(t') - μt')) =
E(∫[0,t'] exp(a(t'-s)) c dw(s) · ∫[0,t] exp(a(t-s)) c dw(s))
= ∫[0,t'] exp(a(t'-s)) c exp(a(t-s)) c ds   (for t' ≤ t)
= c² exp(a(t'+t)) ∫[0,t'] exp(-2as) ds
Linear SDEs
• R(t,t') = E((x(t) - μt)(x(t') - μt')) =
E(∫[0,t'] exp(a(t'-s)) c dw(s) · ∫[0,t] exp(a(t-s)) c dw(s))
= ∫[0,t'] exp(a(t'-s)) c exp(a(t-s)) c ds   (for t' ≤ t)
= c² exp(a(t'+t)) ∫[0,t'] exp(-2as) ds
= c² exp(2at') exp(a(t-t')) (1 - exp(-2at'))/(2a)
= (c²/(2a)) exp(a(t-t')) (exp(2at') - 1) → -(c²/(2a)) exp(a(t-t')) for t' → ∞ (a < 0)