1 Sets

(A ∪ B) ∪ C = A ∪ (B ∪ C)
(A ∪ B)^c = A^c ∩ B^c

Definition: σ-field
A collection F of subsets of Ω is a σ-field if:
(a) ∅ ∈ F
(b) if A_1, A_2, ... ∈ F then ∪_{i=1}^∞ A_i ∈ F
(c) if A ∈ F then A^c ∈ F
2 Probability

P(A^c) = 1 − P(A)

Lemma:
If B ⊇ A then P(B) = P(A) + P(B \ A) ≥ P(A)

P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

More generally:
P(∪_{i=1}^n A_i) = Σ_i P(A_i) − Σ_{i<j} P(A_i ∩ A_j) + Σ_{i<j<k} P(A_i ∩ A_j ∩ A_k) − ··· + (−1)^{n+1} P(A_1 ∩ A_2 ∩ ··· ∩ A_n)

Lemma 5, p. 7:
Let A_1 ⊆ A_2 ⊆ ..., and write A for their limit: A = ∪_{i=1}^∞ A_i = lim_{i→∞} A_i; then P(A) = lim_{i→∞} P(A_i).
Similarly, if B_1 ⊇ B_2 ⊇ B_3 ⊇ ..., then B = ∩_{i=1}^∞ B_i = lim_{i→∞} B_i satisfies P(B) = lim_{i→∞} P(B_i).

Multiplication rule
P(A, B) = P(A)P(B | A)

Conditional probability
P(A | B) = P(A ∩ B)/P(B)
P(A | B, C, ...) = P(A, B, C, ...)/P(B, C, ...)

Bayes' formula
P(A | B) = P(B | A)P(A)/P(B)

Total probability
P(A) = P(A | B)P(B) + P(A | B^c)P(B^c)
P(A) = Σ_{i=1}^n P(A | B_i)P(B_i) for a partition B_1, ..., B_n of Ω

Definition 1, p. 13:
A family {A_i : i ∈ I} is independent if P(∩_{i∈J} A_i) = Π_{i∈J} P(A_i) for all finite subsets J of I.

3 Random Variable

distribution function: F : R → [0, 1] with F(x) = P(X ≤ x)
mass function: f(x) = P(X = x)

Definition:
The expectation of the random variable X is E(X) = Σ_{x : f(x) > 0} x f(x)

Lemma:
If X has mass function f and g : R → R, then E(g(X)) = Σ_x g(x)f(x)
Continuous counterpart: E(g(X)) = ∫_{−∞}^{∞} g(x) f_X(x) dx

Definition:
If k is a positive integer, the k:th moment m_k of X is defined by m_k = E(X^k). The k:th central moment is σ_k = E((X − m_1)^k).

Theorem:
The expectation operator E satisfies:
(a) If X ≥ 0 then E(X) ≥ 0
(b) If a, b ∈ R then E(aX + bY) = aE(X) + bE(Y)

Lemma:
If X and Y are independent, then E(XY) = E(X)E(Y)

Definition:
X and Y are uncorrelated if E(XY) = E(X)E(Y)

Theorem:
For random variables X and Y:
(a) Var(aX) = a² Var(X) for a ∈ R
(b) Var(X + Y) = Var(X) + Var(Y) if X and Y are uncorrelated

Indicator function: E(I_A) = P(A)
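Not in the original sheet: a minimal Python sketch (fair-die mass function, values illustrative only) checking E(g(X)) = Σ_x g(x)f(x) and the moment identities above.

```python
# Minimal sketch: expectation and k-th moments from a mass function, here a
# fair six-sided die; checks E(g(X)) = sum_x g(x) f(x) numerically.
import numpy as np

x = np.arange(1, 7)
f = np.full(6, 1/6)                 # mass function f(x) = 1/6
EX = (x * f).sum()                  # first moment m_1 = 3.5
m2 = (x**2 * f).sum()               # second moment E(X^2)
var = ((x - EX)**2 * f).sum()       # second central moment
print(EX, m2, var, m2 - EX**2)      # var equals m2 - EX**2
```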
Lemma 11, p. 30:
Let F be the distribution function of X. Then
(a) P(X > x) = 1 − F(x)
(b) P(x < X ≤ y) = F(y) − F(x)
(c) P(X = x) = F(x) − lim_{y↑x} F(y)

Marginal distributions:
lim_{y→∞} F_{X,Y}(x, y) = F_X(x)
lim_{x→∞} F_{X,Y}(x, y) = F_Y(y)

Lemma 5, p. 39:
The joint distribution function F_{X,Y} of the random vector (X, Y) has the following properties:
lim_{x,y→−∞} F_{X,Y}(x, y) = 0 and lim_{x,y→∞} F_{X,Y}(x, y) = 1
if (x_1, y_1) ≤ (x_2, y_2) then F_{X,Y}(x_1, y_1) ≤ F_{X,Y}(x_2, y_2)
F_{X,Y} is continuous from above: F_{X,Y}(x + u, y + v) → F_{X,Y}(x, y) as u, v ↓ 0
3.1 Distribution functions

Constant variable
X(ω) = c: F(x) = 0 for x < c and F(x) = 1 for x ≥ c (a unit step at c)

Bernoulli distribution Bern(p)
A coin is tossed once and shows heads with probability p, with X(H) = 1 and X(T) = 0.
F(x) = 0 for x < 0; F(x) = 1 − p for 0 ≤ x < 1; F(x) = 1 for x ≥ 1
E(X) = p, Var(X) = p(1 − p)

Binomial distribution bin(n, p)
A coin is tossed n times and a head turns up each time with probability p. The total number of heads is described by:
f(k) = C(n, k) p^k q^{n−k}, q = 1 − p, k = 0, 1, ..., n
(C(n, k) denotes the binomial coefficient.)
E(X) = np, Var(X) = np(1 − p)

Poisson distribution
f(k) = (λ^k / k!) e^{−λ}, k = 0, 1, 2, ...
E(X) = Var(X) = λ

Geometric distribution
Independent Bernoulli trials are performed. Let W be the waiting time before the first success occurs. Then
f(k) = P(W = k) = p(1 − p)^{k−1}, k = 1, 2, ...
E(X) = 1/p, Var(X) = (1 − p)/p²

Negative binomial distribution
Let W_r be the waiting time before the r:th success. Then
f(k) = P(W_r = k) = C(k − 1, r − 1) p^r (1 − p)^{k−r}, k = r, r + 1, ...
E(X) = r/p, Var(X) = r(1 − p)/p²

Exponential distribution:
F(x) = 1 − e^{−λx}, x ≥ 0
E(X) = 1/λ, Var(X) = 1/λ²

Normal distribution:
f(x) = (1/√(2πσ²)) e^{−(x−µ)²/(2σ²)}, −∞ < x < ∞
E(X) = µ, Var(X) = σ²

Cauchy distribution:
f(x) = 1/(π(1 + x²)) (no moments!)
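A quick cross-check of the tabulated means and variances, assuming scipy is available; the parameter values are illustrative.

```python
# Sketch: compare the tabulated E(X), Var(X) with scipy.stats.
from scipy import stats

n, p, lam = 10, 0.3, 2.5
print(stats.binom.stats(n, p, moments="mv"))         # (np, np(1-p)) = (3.0, 2.1)
print(stats.poisson.stats(lam, moments="mv"))        # (lam, lam)
print(stats.geom.stats(p, moments="mv"))             # (1/p, (1-p)/p**2)
print(stats.expon.stats(scale=1/lam, moments="mv"))  # (1/lam, 1/lam**2)
```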
3.2 Dependence

Joint distribution:
F(x, y) = P(X ≤ x, Y ≤ y) = ∫_{−∞}^{x} ∫_{−∞}^{y} f(u, v) dv du

Lemma:
The random variables X and Y are independent if and only if
f_{X,Y}(x, y) = f_X(x)f_Y(y) for all x, y ∈ R,
equivalently F_{X,Y}(x, y) = F_X(x)F_Y(y) for all x, y ∈ R

Theorem:
If X and Y are independent and g, h : R → R, then g(X) and h(Y) are independent too.

Marginal distribution:
F_X(x) = P(X ≤ x) = F(x, ∞) = ∫_{−∞}^{x} ∫_{−∞}^{∞} f(u, y) dy du

Marginal densities:
f_X(x) = Σ_y P({X = x} ∩ {Y = y}) = Σ_y P(X = x, Y = y) = Σ_y f_{X,Y}(x, y)
f_X(x) = ∫_{−∞}^{∞} f(x, y) dy

Lemma:
E(g(X, Y)) = Σ_{x,y} g(x, y) f_{X,Y}(x, y)

Definition:
Cov(X, Y) = E[(X − EX)(Y − EY)]
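A minimal numpy sketch (made-up joint mass function) illustrating the marginal-density sums and Cov(X, Y) above.

```python
# Sketch: marginals and Cov(X, Y) from a small joint mass function f_{X,Y}
# laid out as a matrix (rows = values of X, columns = values of Y).
import numpy as np

x_vals = np.array([0.0, 1.0])
y_vals = np.array([0.0, 1.0, 2.0])
f = np.array([[0.10, 0.20, 0.10],
              [0.20, 0.25, 0.15]])        # entries sum to 1

f_X = f.sum(axis=1)                       # f_X(x) = sum_y f(x, y)
f_Y = f.sum(axis=0)
EX = (x_vals * f_X).sum()
EY = (y_vals * f_Y).sum()
EXY = (np.outer(x_vals, y_vals) * f).sum()  # E(g(X,Y)) with g(x,y) = xy
print(EXY - EX * EY)                      # Cov(X, Y)
```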
3.3 Conditional distributions

Definition:
The conditional distribution of Y given X = x is
F_{Y|X}(y | x) = P(Y ≤ y | X = x); in the continuous case
F_{Y|X}(y | x) = ∫_{−∞}^{y} (f(x, v)/f_X(x)) dv for {x : f_X(x) > 0}

Theorem: Conditional expectation
ψ(X) = E(Y | X) satisfies E(ψ(X)) = E(Y), and more generally E(ψ(X)g(X)) = E(Y g(X))
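Continuing the same made-up joint table as above: a sketch verifying E(ψ(X)) = E(Y) for ψ(X) = E(Y | X).

```python
# Sketch: compute psi(x) = E(Y | X = x) from a joint table and verify
# E(psi(X)) = E(Y), as in the theorem above.
import numpy as np

x_vals = np.array([0.0, 1.0])
y_vals = np.array([0.0, 1.0, 2.0])
f = np.array([[0.10, 0.20, 0.10],
              [0.20, 0.25, 0.15]])

f_X = f.sum(axis=1)
# f_{Y|X}(y|x) = f(x, y) / f_X(x), then psi(x) = sum_y y f_{Y|X}(y|x)
psi = (f / f_X[:, None] * y_vals).sum(axis=1)
E_psi_X = (psi * f_X).sum()
E_Y = (y_vals * f.sum(axis=0)).sum()
print(E_psi_X, E_Y)       # both 0.95: E(E(Y|X)) = E(Y)
```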
3.4 Sums of random variables

Theorem:
P(X + Y = z) = Σ_x f(x, z − x)
If X and Y are independent, then
f_{X+Y}(z) = P(X + Y = z) = Σ_x f_X(x) f_Y(z − x) = Σ_y f_X(z − y) f_Y(y)

3.5 Multivariate normal distribution

f(x) = exp(−½ (x − µ)^T V^{−1} (x − µ)) / √((2π)^n det(V))
where X = (X_1, X_2, ..., X_n)^T, µ = (µ_1, ..., µ_n) with E(X_i) = µ_i, and V = (v_ij) with v_ij = Cov(X_i, X_j).
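A short sketch of the convolution formula in 3.4 above, for two fair dice (assumes numpy).

```python
# Sketch: f_{X+Y}(z) = sum_x f_X(x) f_Y(z - x) for independent discrete
# r.v.'s is a convolution; here two fair dice.
import numpy as np

f_X = np.full(6, 1/6)           # mass on the values 1..6
f_Y = np.full(6, 1/6)
f_sum = np.convolve(f_X, f_Y)   # mass on the values 2..12
print(f_sum)                    # peaks at 6/36 for z = 7
```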
4 Generating functions

Definition: Generating function
The generating function of the random variable X is defined by G(s) = E(s^X)

Examples: Generating functions
Constant: G(s) = s^c
Bernoulli: G(s) = (1 − p) + ps
Geometric: G(s) = ps/(1 − s(1 − p))
Poisson: G(s) = e^{λ(s−1)}

Theorem: expectation ↔ G(s)
(a) E(X) = G′(1)
(b) E(X(X − 1)···(X − k + 1)) = G^{(k)}(1)

Theorem: independence
If X and Y are independent, then G_{X+Y}(s) = G_X(s)G_Y(s)

4.1 Characteristic functions

Definition: moment generating function
M(t) = E(e^{tX})

Definition: characteristic function
Φ(t) = E(e^{itX})

Theorem: independence
If X and Y are independent, then Φ_{X+Y}(t) = Φ_X(t)Φ_Y(t)

Theorem: Y = aX + b
Φ_Y(t) = e^{itb} Φ_X(at)

Definition: joint characteristic function
Φ_{X,Y}(s, t) = E(e^{isX} e^{itY})
X and Y are independent iff Φ_{X,Y}(s, t) = Φ_X(s)Φ_Y(t) for all s and t

Theorem: moment gf ↔ charact. fcn
Φ(t) = M(it)

Examples of characteristic functions:
Ber(p): Φ(t) = q + pe^{it}
Binomial bin(n, p): Φ_X(t) = (q + pe^{it})^n
Exponential distr.: Φ(t) = ∫_0^∞ e^{itx} λe^{−λx} dx = λ/(λ − it)
Cauchy distr.: Φ(t) = (1/π) ∫_{−∞}^∞ e^{itx}/(1 + x²) dx = e^{−|t|}
Normal distr., N(0, 1): Φ(t) = E(e^{itX}) = ∫_{−∞}^∞ (1/√(2π)) exp(itx − x²/2) dx = e^{−t²/2}

Corollary:
Random variables X and Y have the same characteristic function if and only if they have the same distribution function.

Theorem: Law of Large Numbers
Let X_1, X_2, X_3, ... be a sequence of iid r.v.'s with finite mean µ. Their partial sums S_n = X_1 + X_2 + ··· + X_n satisfy (1/n)S_n →D µ as n → ∞.

Central Limit Theorem:
Let X_1, X_2, X_3, ... be a sequence of iid r.v.'s with finite mean µ and finite nonzero variance σ², and let S_n = X_1 + X_2 + ··· + X_n. Then
(S_n − nµ)/√(nσ²) →D N(0, 1) as n → ∞

5 Markov chains

Definition: Markov chain
P(X_n = s | X_0 = x_0, ..., X_{n−1} = x_{n−1}) = P(X_n = s | X_{n−1} = x_{n−1})

Definition: homogeneous chain
P(X_{n+1} = j | X_n = i) = P(X_1 = j | X_0 = i)

Definition: transition matrix
P = (p_ij) with p_ij = P(X_{n+1} = j | X_n = i)

Theorem: P is a stochastic matrix
(a) P has non-negative entries
(b) P has row sums equal to 1

n-step transition
p_ij(m, m + n) = P(X_{m+n} = j | X_m = i)

Theorem: Chapman–Kolmogorov
p_ij(m, m + n + r) = Σ_k p_ik(m, m + n) p_kj(m + n, m + n + r),
so P(m, m + n) = P^n

Definition: mass function
µ_i^(n) = P(X_n = i)
µ^(m+n) = µ^(m) P^n ⇒ µ^(n) = µ^(0) P^n

Definition: persistent, transient
persistent: P(X_n = i for some n ≥ 1 | X_0 = i) = 1
transient: P(X_n = i for some n ≥ 1 | X_0 = i) < 1

Definition: first passage time
f_ij(n) = P(X_1 ≠ j, ..., X_{n−1} ≠ j, X_n = j | X_0 = i)
f_ij := Σ_{n=1}^∞ f_ij(n)

Corollary: persistent, transient
State j is persistent if Σ_n p_jj(n) = ∞, and then Σ_n p_ij(n) = ∞ for all i.
State j is transient if Σ_n p_jj(n) < ∞, and then Σ_n p_ij(n) < ∞ for all i.

Theorem: Generating functions
P_ij(s) = Σ_{n=0}^∞ s^n p_ij(n), F_ij(s) = Σ_{n=0}^∞ s^n f_ij(n); then
(a) P_ii(s) = 1 + F_ii(s)P_ii(s)
(b) P_ij(s) = F_ij(s)P_jj(s) if i ≠ j
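Illustrative sketch of Chapman–Kolmogorov in matrix form, p_ij(n) = (P^n)_ij, for a hypothetical two-state chain (assumes numpy).

```python
# Sketch: n-step transition probabilities as matrix powers, and the mass
# function recursion mu^(n) = mu^(0) P^n, for a made-up two-state chain.
import numpy as np

P = np.array([[0.9, 0.1],
              [0.4, 0.6]])
P5 = np.linalg.matrix_power(P, 5)   # 5-step transition probabilities
mu0 = np.array([1.0, 0.0])          # start in state 0
print(mu0 @ P5)                     # mu^(5) = mu^(0) P^5
```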
Definition: First visit time T_j
T_j := min{n ≥ 1 : X_n = j}

Definition: mean recurrence time µ_i
µ_i := E(T_i | X_0 = i) = Σ_n n f_ii(n)

Definition: null, non-null state
state i is null if µ_i = ∞
state i is non-null if µ_i < ∞

Theorem: nullness of a persistent state
A persistent state i is null if and only if p_ii(n) → 0 as n → ∞

Definition: period d(i)
The period d(i) of a state i is defined by d(i) = gcd{n : p_ii(n) > 0}. We call i periodic if d(i) > 1 and aperiodic if d(i) = 1.

Definition: Ergodic
A state is called ergodic if it is persistent, non-null, and aperiodic.

Definition: (Inter-)communication
i → j if p_ij(m) > 0 for some m
i ↔ j if i → j and j → i

Theorem: intercommunication
If i ↔ j then:
(a) i and j have the same period
(b) i is transient iff j is transient
(c) i is null persistent iff j is null persistent

Definition: closed, irreducible
A set C of states is called:
(a) closed if p_ij = 0 for all i ∈ C, j ∉ C
(b) irreducible if i ↔ j for all i, j ∈ C
An absorbing state is a closed set with one state.

Theorem: Decomposition
The state space S can be partitioned as S = T ∪ C_1 ∪ C_2 ∪ ..., where T is the set of transient states and the C_i are irreducible, closed sets of persistent states.

Lemma: finite S
If S is finite, then at least one state is persistent and all persistent states are non-null.
5.1 Stationary distributions

Definition: stationary distribution
π is called a stationary distribution if
(a) π_j ≥ 0 for all j, and Σ_j π_j = 1
(b) π = πP, i.e. π_j = Σ_i π_i p_ij for all j

Theorem: existence of stat. distribution
An irreducible chain has a stationary distribution π iff all states are non-null persistent. In that case π is unique and given by π_i = µ_i^{−1}.

Lemma: ρ_i(k)
ρ_i(k): the mean number of visits of the chain to state i between two successive visits to state k.
Lemma: For any state k of an irreducible persistent chain, the vector ρ(k) satisfies ρ_i(k) < ∞ for all i, and ρ(k) = ρ(k)P.

Theorem: irreducible, persistent
If the chain is irreducible and persistent, there exists a positive x with x = xP, which is unique up to a multiplicative constant. The chain is non-null if Σ_i x_i < ∞ and null if Σ_i x_i = ∞.

Theorem: transient chain iff
Let s be any state of an irreducible chain. The chain is transient iff there exists a nonzero solution {y_j : j ≠ s}, with |y_j| ≤ 1 for all j, to the equations
y_i = Σ_{j≠s} p_ij y_j,  i ≠ s

Theorem: persistent if
Let s be any state of an irreducible chain on S = {0, 1, 2, ...}. The chain is persistent if there exists a solution {y_j : j ≠ s} to the inequalities
y_i ≥ Σ_{j≠s} p_ij y_j,  i ≠ s,
with y_i → ∞ as i → ∞.

Theorem: Limit theorem
For an irreducible aperiodic chain, p_ij(n) → 1/µ_j as n → ∞ for all i and j
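A sketch solving π = πP for the same hypothetical two-state chain used earlier, via the left eigenvector of P with eigenvalue 1 (assumes numpy).

```python
# Sketch: the stationary distribution is the left eigenvector of P for
# eigenvalue 1, normalised to sum to 1.
import numpy as np

P = np.array([[0.9, 0.1],
              [0.4, 0.6]])
w, v = np.linalg.eig(P.T)                      # left eigenvectors of P
pi = np.real(v[:, np.argmin(np.abs(w - 1))])
pi = pi / pi.sum()
print(pi, pi @ P)                              # pi = (0.8, 0.2) and pi P = pi
```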
5.2 Reversibility

Theorem: Inverse chain
If X is stationary with distribution π, then Y with Y_n = X_{N−n} is a Markov chain with
P(Y_{n+1} = j | Y_n = i) = (π_j/π_i) p_ji

Definition: Reversible chain
A chain is called reversible if π_i p_ij = π_j p_ji for all i, j

Theorem: reversible → stationary
If there is a distribution π with π_i p_ij = π_j p_ji, then π is the stationary distribution of the chain.
5.3 Poisson process

Definition: Poisson process
N(t) gives the number of events in time t. N is a Poisson process with intensity λ, taking values in S = {0, 1, 2, ...}, if
(a) N(0) = 0; if s < t then N(s) ≤ N(t)
(b) P(N(t + h) = n + m | N(t) = n) = λh + o(h) if m = 1; o(h) if m > 1; 1 − λh + o(h) if m = 0
(c) the number of emissions in any interval is independent of the emissions in earlier intervals.

Theorem: Poisson distribution
N(t) has the Poisson distribution: P(N(t) = j) = ((λt)^j / j!) e^{−λt}

Definition: arrival time, interarrival time
Arrival time: T_n = inf{t : N(t) = n}
Interarrival time: X_n = T_n − T_{n−1}

Theorem: Interarrival times
X_1, X_2, ... are independent, each having the exponential distribution with parameter λ.

Definition: Birth process
Birth process → Poisson process with intensities λ_0, λ_1, ... (the intensity may depend on the current state).

Eq. Forward system of equations:
p′_ij(t) = λ_{j−1} p_{i,j−1}(t) − λ_j p_ij(t),  j ≥ i, λ_{−1} := 0, p_ij(0) = δ_ij

Eq. Backward system of equations:
p′_ij(t) = λ_i p_{i+1,j}(t) − λ_i p_ij(t),  j ≥ i, p_ij(0) = δ_ij

Theorem:
The forward system has a unique solution, which also satisfies the backward equations.

5.4 Continuous Markov chains

Definition: Continuous Markov chain
X is a continuous(-time) Markov chain if, for t_1 < t_2 < ··· < t_n:
P(X(t_n) = j | X(t_1) = i_1, ..., X(t_{n−1}) = i_{n−1}) = P(X(t_n) = j | X(t_{n−1}) = i_{n−1})

Definition: transition probability
p_ij(s, t) = P(X(t) = j | X(s) = i) for s ≤ t
homogeneous if p_ij(s, t) = p_ij(0, t − s)

Def: Generator matrix
G = (g_ij), with p_ij(h) = g_ij h + o(h) if i ≠ j and p_ii(h) = 1 + g_ii h + o(h)

Eq. Forward system of equations: P′_t = P_t G
Eq. Backward system of equations: P′_t = G P_t
Often the solutions have the form P_t = exp(tG)

Matrix exponential:
exp(tG) = Σ_{n=0}^∞ (t^n / n!) G^n

Definition:
Irreducible if for any pair i, j: p_ij(t) > 0 for some t

Definition: Stationary
π with π_j ≥ 0, Σ_j π_j = 1 and π = πP_t for all t ≥ 0

Claim: π = πP_t for all t ⇔ πG = 0

Theorem:
If a stationary distribution π exists, then p_ij(t) → π_j as t → ∞ for all i, j; if there is no stationary distribution, then p_ij(t) → 0.
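A minimal simulation sketch (not from the original sheet; assumes numpy, parameters illustrative): build a Poisson process from iid exponential interarrival times, as in the interarrival theorem, and compare E(N(t)) with λt.

```python
# Sketch: simulate N(t) via iid Exp(lam) interarrival times and compare the
# empirical mean of N(t) with lam * t from the Poisson-distribution theorem.
import numpy as np

rng = np.random.default_rng(0)
lam, t, reps = 2.0, 3.0, 100_000
counts = np.empty(reps, dtype=int)
for k in range(reps):
    arrivals = np.cumsum(rng.exponential(1 / lam, size=50))  # T_1, T_2, ...
    counts[k] = np.searchsorted(arrivals, t)                 # N(t) = #{n : T_n <= t}
print(counts.mean(), lam * t)                                # both close to 6.0
```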
6 Convergence of Random Variables

Norm
(a) ‖f‖ ≥ 0
(b) ‖f‖ = 0 iff f = 0
(c) ‖af‖ = |a| · ‖f‖
(d) ‖f + g‖ ≤ ‖f‖ + ‖g‖

convergent almost surely
X_n → X a.s. if P({ω ∈ Ω : X_n(ω) → X(ω) as n → ∞}) = 1

convergent in rth mean
X_n →r X if E|X_n^r| < ∞ and E(|X_n − X|^r) → 0 as n → ∞

convergent in probability
X_n →P X if P(|X_n − X| > ε) → 0 as n → ∞ for every ε > 0

convergent in distribution
X_n →D X if P(X_n ≤ x) → P(X ≤ x) as n → ∞

Theorem: implications
(X_n → X a.s.) ⇒ (X_n →P X), (X_n →r X) ⇒ (X_n →P X), and (X_n →P X) ⇒ (X_n →D X)
For r > s ≥ 1: (X_n →r X) ⇒ (X_n →s X)

Theorem: additional implications
(a) If X_n →D c, where c is constant, then X_n →P c
(b) If X_n →P X and P(|X_n| ≤ k) = 1 for all n and some k, then X_n →r X for all r ≥ 1
(c) If P_n(ε) = P(|X_n − X| > ε) satisfies Σ_n P_n(ε) < ∞ for all ε > 0, then X_n → X a.s.

Theorem: Skorokhod's representation theorem
If X_n →D X as n → ∞, then there exists a probability space and random variables Y_n, Y on it with:
(a) Y_n and Y have distribution functions F_n, F
(b) Y_n → Y a.s. as n → ∞

Theorem: Convergence under continuous functions
If X_n →D X and g : R → R is continuous, then g(X_n) →D g(X)

Theorem: Equivalence
The following statements are equivalent:
(a) X_n →D X
(b) E(g(X_n)) → E(g(X)) for all bounded continuous functions g
(c) E(g(X_n)) → E(g(X)) for all functions g of the form g(x) = f(x)I_{[a,b]}(x) where f is continuous

Theorem: Borel–Cantelli
Let A = ∩_n ∪_{m=n}^∞ A_m be the event that infinitely many of the A_n occur. Then:
(a) P(A) = 0 if Σ_n P(A_n) < ∞
(b) P(A) = 1 if Σ_n P(A_n) = ∞ and A_1, A_2, ... are independent

Theorem:
X_n → X and Y_n → Y implies X_n + Y_n → X + Y for convergence a.s., in rth mean, and in probability. Not generally true in distribution.

6.1 Laws of large numbers

Theorem:
X_1, X_2, ... iid with E(X_i²) < ∞ and E(X_i) = µ; then
(1/n) Σ_{i=1}^n X_i → µ a.s. and in mean square

Theorem:
{X_n} iid with distribution function F. Then (1/n) Σ_{i=1}^n X_i →P µ iff one of the following holds:
1) nP(|X_1| > n) → 0 and ∫_{[−n,n]} x dF → µ as n → ∞
2) the characteristic function Φ(t) of the X_i is differentiable at t = 0 and Φ′(0) = iµ

Theorem: Strong law of large numbers
X_1, X_2, ... iid. Then (1/n) Σ_{i=1}^n X_i → µ a.s. as n → ∞, for some µ, iff E|X_1| < ∞. In this case µ = EX_1.

6.2 Law of iterated logarithm

If X_1, X_2, ... are iid with mean 0 and variance 1, then
P(lim sup_{n→∞} S_n/√(2n log log n) = 1) = 1
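A one-path illustration of the strong law (Exp(1) samples; assumes numpy): the running means of a single sample path approach µ = 1.

```python
# Sketch: running means (1/n) sum X_i of one iid Exp(1) path settle near 1.
import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(1.0, size=100_000)
running_mean = np.cumsum(x) / np.arange(1, x.size + 1)
print(running_mean[[99, 9_999, 99_999]])   # drifting toward mu = 1.0
```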
6.3 Martingales

Definition: Martingale
{S_n : n ≥ 1} is called a martingale with respect to the sequence {X_n : n ≥ 1} if
(a) E|S_n| < ∞
(b) E(S_{n+1} | X_1, X_2, ..., X_n) = S_n

Lemma:
E(X_1 + X_2 | Y) = E(X_1 | Y) + E(X_2 | Y)
E(Xg(Y) | Y) = g(Y)E(X | Y) for g : R^n → R
E(X | h(Y)) = E(X | Y) if h : R^n → R^n is one-one

Lemma: Tower Property
E[E(X | Y_1, Y_2) | Y_1] = E(X | Y_1)

Lemma:
If {B_i : 1 ≤ i ≤ n} is a partition of A, then E(X | A) = Σ_{i=1}^n E(X | B_i)P(B_i | A)

Theorem: Martingale convergence
If {S_n} is a martingale with E(S_n²) ≤ M < ∞ for some M and all n, then there exists S such that S_n → S a.s. and in L².
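An illustration of the convergence theorem using a Pólya urn (not from the sheet): the proportion of red balls is a martingale taking values in [0, 1], hence bounded in L², so each simulated path settles down.

```python
# Sketch: in a Polya urn, draw a ball and return it with one more of the
# same colour; the red proportion S_n is a bounded martingale and converges.
import numpy as np

rng = np.random.default_rng(2)
red, black = 1, 1
props = []
for n in range(10_000):
    if rng.random() < red / (red + black):
        red += 1
    else:
        black += 1
    props.append(red / (red + black))
print(props[-5:])   # near-constant tail: the path has (almost) converged
```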
6.4 Prediction and conditional expectation

Notation:
‖U‖_2 = √(E(U²)) = √⟨U, U⟩, where ⟨U, V⟩ = E(UV)
‖U_n − U‖_2 → 0 ⇔ U_n → U in L²
‖U + V‖_2 ≤ ‖U‖_2 + ‖V‖_2

Def.
X, Y r.v. with E(Y²) < ∞. The minimum mean square predictor of Y given X is the Ŷ = h(X) that minimises ‖Y − Ŷ‖_2.

Theorem:
If a (linear) space H is closed, then a minimiser Ŷ ∈ H of ‖Y − Ŷ‖_2 exists.

Projection Theorem:
H is a closed linear space and Y satisfies E(Y²) < ∞. For M ∈ H:
E((Y − M)Z) = 0 for all Z ∈ H ⇔ ‖Y − M‖_2 ≤ ‖Y − Z‖_2 for all Z ∈ H

Theorem:
Let X and Y be r.v. with E(Y²) < ∞. The best (minimum mean square) predictor of Y given X is Ŷ = E(Y | X).
7 Stochastic processes

Definition: Renewal process
N = {N(t) : t ≥ 0} is a process for which N(t) = max{n : T_n ≤ t}, where T_0 = 0, T_n = X_1 + X_2 + ··· + X_n for n ≥ 1, and the X_n are iid non-negative r.v.'s.

8 Stationary processes

Definition:
The autocovariance function is c(t, t + h) = Cov(X(t), X(t + h))

Definition: The process X = {X(t) : t ≥ 0} taking real values is called strongly stationary if the families {X(t_1), X(t_2), ..., X(t_n)} and {X(t_1 + h), X(t_2 + h), ..., X(t_n + h)} have the same joint distribution for all t_1, t_2, ..., t_n and h > 0.

Definition: The process X = {X(t) : t ≥ 0} taking real values is called weakly stationary if, for all t_1, t_2 and h > 0: E(X(t_1)) = E(X(t_2)) and Cov(X(t_1), X(t_2)) = Cov(X(t_1 + h), X(t_2 + h)); thus if and only if it has constant mean and its autocovariance function satisfies c(t, t + h) = c(0, h).

Definition:
The covariance of complex-valued C_1 and C_2 is
Cov(C_1, C_2) = E[(C_1 − EC_1)(C_2 − EC_2)*], where * denotes complex conjugation.

Theorem:
{X} real, stationary with zero mean and autocovariance c(m). The best predictor from the class of linear functions of the subsequence X_{r−s}, ..., X_r is
X̂_{r+k} = Σ_{i=0}^s a_i X_{r−i},
where Σ_{i=0}^s a_i c(|i − j|) = c(k + j) for 0 ≤ j ≤ s.

Definition:
The autocorrelation function of a weakly stationary process with autocovariance function c(t) is
ρ(t) = Cov(X(0), X(t)) / √(Var(X(0)) Var(X(t))) = c(t)/c(0)

Spectral theorem: For a weakly stationary process X with strictly positive σ², ρ(t) is the characteristic function of some distribution F whenever ρ(t) is continuous at t = 0:
ρ(t) = ∫_{−∞}^∞ e^{itλ} dF(λ)
F is called the spectral distribution function.

Ergodic theorem:
If X is strongly stationary with E|X_1| < ∞, there exists a r.v. Y with the same mean as X_1 such that (1/n) Σ_{j=1}^n X_j → Y a.s. and in mean.

Weakly stationary processes:
If X = {X_n : n ≥ 1} is a weakly stationary process, there exists a Y such that E(Y) = E(X_1) and (1/n) Σ_{j=1}^n X_j → Y in mean square.

8.1 Gaussian processes

Definition:
A real-valued continuous-time process is called Gaussian if each finite-dimensional vector (X(t_1), X(t_2), ..., X(t_n)) has the multivariate normal distribution N(µ(t), V(t)), t = (t_1, t_2, ..., t_n).

Theorem:
The Gaussian process X is stationary iff E(X(t)) is constant for all t and V(t) = V(t + h) for all t and h > 0.

9 Inequalities

Cauchy–Schwarz:
(E(XY))² ≤ E(X²)E(Y²),
with equality if and only if aX = bY a.s. for some constants a, b not both zero.

Jensen's inequality:
Given a convex function J(x) and a r.v. X with mean µ: E(J(X)) ≥ J(µ)

Markov's inequality:
P(|X| ≥ a) ≤ E|X|/a for any a > 0

For h : R → [0, ∞) a non-negative function: P(h(X) ≥ a) ≤ E(h(X))/a for all a > 0

General Markov's inequality:
h : R → [0, M] a non-negative function bounded by some M. Then
P(h(X) ≥ a) ≤ (E(h(X)) − a)/(M − a) for 0 ≤ a < M

Chebyshev's inequality:
P(|X| ≥ a) ≤ E(X²)/a² if a > 0

Theorem: Hölder's inequality
If p, q > 1 and p^{−1} + q^{−1} = 1, then
E|XY| ≤ (E|X^p|)^{1/p} (E|Y^q|)^{1/q}

Minkowski's inequality:
If p ≥ 1, then
[E(|X + Y|^p)]^{1/p} ≤ (E|X^p|)^{1/p} + (E|Y^p|)^{1/p}

Minkowski 2:
E(|X + Y|^p) ≤ C_p [E|X^p| + E|Y^p|],
where p > 0 and C_p = 1 for 0 < p ≤ 1, C_p = 2^{p−1} for p > 1

Kolmogorov's inequality:
Let {X_n} be independent with zero means and variances σ_n². Then for ε > 0:
P(max_{1≤i≤n} |X_1 + ··· + X_i| > ε) ≤ (σ_1² + ··· + σ_n²)/ε²

Doob–Kolmogorov's inequality:
If {S_n} is a martingale, then for any ε > 0:
P(max_{1≤i≤n} |S_i| ≥ ε) ≤ E(S_n²)/ε²
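A numeric sanity check (not from the sheet) of Markov's and Chebyshev's inequalities for X ~ Exp(1), where E|X| = 1 and E(X²) = 2.

```python
# Sketch: exact tail P(X >= a) for Exp(1) versus the Markov and Chebyshev
# bounds stated above.
import numpy as np

a = 3.0
exact_tail = np.exp(-a)        # P(X >= a) for Exp(1), about 0.0498
print(exact_tail, 1.0 / a)     # Markov:    P(|X| >= a) <= E|X| / a
print(exact_tail, 2.0 / a**2)  # Chebyshev: P(|X| >= a) <= E(X^2) / a^2
```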