
cheatsheet 2

1 Sets
(A ∪ B) ∪ C = A ∪ (B ∪ C)
(A ∪ B)^c = A^c ∩ B^c
Definition: σ-field
A collection F of subsets of Ω is a σ-field if
(a) ∅ ∈ F
(b) if A_1, A_2, ... ∈ F then ∪_{i=1}^∞ A_i ∈ F
(c) if A ∈ F then A^c ∈ F

2 Probability

P(A^c) = 1 − P(A)
Lemma:
If B ⊇ A then P(B) = P(A) + P(B \ A) ≥ P(A)
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
More generally (inclusion-exclusion):
P(∪_{i=1}^n A_i) = Σ_i P(A_i) − Σ_{i<j} P(A_i ∩ A_j) + Σ_{i<j<k} P(A_i ∩ A_j ∩ A_k) − ... + (−1)^{n+1} P(A_1 ∩ A_2 ∩ ... ∩ A_n)
Lemma 5, p. 7: Let A_1 ⊆ A_2 ⊆ ... and write A for their limit: A = ∪_{i=1}^∞ A_i = lim_{i→∞} A_i. Then
P(A) = lim_{i→∞} P(A_i)
Similarly, if B_1 ⊇ B_2 ⊇ B_3 ⊇ ..., then B = ∩_{i=1}^∞ B_i = lim_{i→∞} B_i satisfies P(B) = lim_{i→∞} P(B_i)
Multiplication rule
P(A ∩ B) = P(A) P(B | A)
Conditional probability
P(A | B) = P(A ∩ B) / P(B)
P(A | B, C, ...) = P(A, B, C, ...) / P(B, C, ...)
Bayes' formula
P(A | B) = P(B | A) P(A) / P(B)
Total probability
P(A) = P(A | B) P(B) + P(A | B^c) P(B^c)
P(A) = Σ_{i=1}^n P(A | B_i) P(B_i)
Definition 1, p. 13:
A family {A_i : i ∈ I} is independent if P(∩_{i∈J} A_i) = Π_{i∈J} P(A_i) for all finite subsets J of I

3 Random Variable

Distribution function: F : R → [0, 1], F(x) = P(X ≤ x)
Mass function: f(x) = P(X = x)
Lemma 11, p. 30:
Let F be the distribution function of X. Then
(a) P(X > x) = 1 − F(x)
(b) P(x < X ≤ y) = F(y) − F(x)
(c) P(X = x) = F(x) − lim_{y↑x} F(y)
Indicator function: E(I_A) = P(A)
Definition:
The expectation of the random variable X is E(X) = Σ_{x : f(x) > 0} x f(x)
Lemma:
If X has mass function f and g : R → R, then E(g(X)) = Σ_x g(x) f(x)
Continuous counterpart: E(g(X)) = ∫_{−∞}^{∞} g(x) f_X(x) dx
Definition:
If k is a positive integer, the k-th moment m_k of X is defined by m_k = E(X^k). The k-th central moment is σ_k = E((X − m_1)^k)
Theorem:
The expectation operator E satisfies:
(a) If X ≥ 0 then E(X) ≥ 0
(b) If a, b ∈ R then E(aX + bY) = aE(X) + bE(Y)
Theorem:
If X and Y are independent and g, h : R → R, then g(X) and h(Y) are independent too.
Lemma:
If X and Y are independent, then E(XY) = E(X)E(Y)
Definition:
X and Y are uncorrelated if E(XY) = E(X)E(Y)
Theorem:
For random variables X and Y
(a) Var(aX) = a^2 Var(X) for a ∈ R
(b) Var(X + Y) = Var(X) + Var(Y) if X and Y are uncorrelated.
Marginal distribution:
lim_{y→∞} F_{X,Y}(x, y) = F_X(x)
lim_{x→∞} F_{X,Y}(x, y) = F_Y(y)
Lemma 5, p. 39:
The joint distribution function F_{X,Y} of the random vector (X, Y) has the following properties:
lim_{x,y→−∞} F_{X,Y}(x, y) = 0 and lim_{x,y→∞} F_{X,Y}(x, y) = 1
if (x_1, y_1) ≤ (x_2, y_2) then F_{X,Y}(x_1, y_1) ≤ F_{X,Y}(x_2, y_2)
F_{X,Y} is continuous from above, in that F_{X,Y}(x + u, y + v) → F_{X,Y}(x, y) as u, v ↓ 0
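The expectation and moment formulas in this section translate directly into code. A minimal Python sketch (the mass-function dictionary and the helper name are illustrative, not from the source):

def expectation(mass, g=lambda x: x):
    # E(g(X)) = sum over x with f(x) > 0 of g(x) * f(x)
    # mass: dict mapping value x -> probability f(x)
    return sum(g(x) * p for x, p in mass.items() if p > 0)

# Example: a fair die.
die = {x: 1/6 for x in range(1, 7)}
m1 = expectation(die)                              # first moment (mean) = 3.5
m2 = expectation(die, lambda x: x**2)              # second moment E(X^2)
sigma2 = expectation(die, lambda x: (x - m1)**2)   # second central moment (variance)
print(m1, m2, sigma2)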
3.1 Distribution functions
Constant variable
X(ω) = c: F(x) = 0 for x < c and F(x) = 1 for x ≥ c (a unit step at c)
Bernoulli distribution Bern(p)
A coin is tossed once and shows heads with probability p, with X(H) = 1 and X(T) = 0.
F(x) = 0 for x < 0
F(x) = 1 − p for 0 ≤ x < 1
F(x) = 1 for x ≥ 1
E(X) = p, Var(X) = p(1 − p)
Binomial distribution bin(n, p)
A coin is tossed n times and a head turns up each time with probability p. The total number of heads is described by
f(k) = (n choose k) p^k q^{n−k}, q = 1 − p
E(X) = np, Var(X) = np(1 − p)
Poisson distribution
f(k) = (λ^k / k!) e^{−λ}
E(X) = Var(X) = λ
Geometric distribution
Independent Bernoulli trials are performed. Let W be the waiting time until the first success occurs. Then
f(k) = P(W = k) = p(1 − p)^{k−1}
E(X) = 1/p, Var(X) = (1 − p)/p^2
Negative binomial distribution
Let W_r be the waiting time until the r-th success. Then
f(k) = P(W_r = k) = (k−1 choose r−1) p^r (1 − p)^{k−r}, k = r, r + 1, ...
E(X) = r/p, Var(X) = r(1 − p)/p^2
Exponential distribution:
F(x) = 1 − e^{−λx}, x ≥ 0
E(X) = 1/λ, Var(X) = 1/λ^2
Normal distribution:
f(x) = (1/√(2πσ^2)) e^{−(x−µ)^2 / (2σ^2)}, −∞ < x < ∞
E(X) = µ, Var(X) = σ^2
Cauchy distribution:
f(x) = 1/(π(1 + x^2)) (no moments!)
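As a quick sanity check of the E(X) and Var(X) formulas above, the closed forms can be compared with Monte Carlo estimates; a small sketch assuming NumPy, with arbitrary parameter values:

import numpy as np

rng = np.random.default_rng(0)
n, p, lam = 10, 0.3, 2.5
cases = {
    "bin(10, 0.3)": (rng.binomial(n, p, 100_000), n * p, n * p * (1 - p)),
    "Poisson(2.5)": (rng.poisson(lam, 100_000), lam, lam),
    "Exp(2.5)":     (rng.exponential(1 / lam, 100_000), 1 / lam, 1 / lam**2),
}
for name, (x, mean, var) in cases.items():
    # empirical mean/variance should be close to the closed-form E(X), Var(X)
    print(name, x.mean(), mean, x.var(), var)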
3.2 Dependence
Joint distribution:
F(x, y) = P(X ≤ x, Y ≤ y) = ∫_{−∞}^{x} ∫_{−∞}^{y} f(u, v) dv du
Lemma:
The random variables X and Y are independent if and only if
f_{X,Y}(x, y) = f_X(x) f_Y(y) for all x, y ∈ R,
equivalently F_{X,Y}(x, y) = F_X(x) F_Y(y) for all x, y ∈ R
Marginal distribution:
F_X(x) = P(X ≤ x) = F(x, ∞) = ∫_{−∞}^{x} ∫_{−∞}^{∞} f(u, y) dy du
Marginal densities:
f_X(x) = Σ_y P({X = x} ∩ {Y = y}) = Σ_y P(X = x, Y = y) = Σ_y f_{X,Y}(x, y)
f_X(x) = ∫_{−∞}^{∞} f(x, y) dy
Lemma:
E(g(X, Y)) = Σ_{x,y} g(x, y) f_{X,Y}(x, y)
Definition:
Cov(X, Y ) = E [(X − EX)(Y − EY )]
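A short numeric illustration of Cov(X, Y) on a discrete joint mass function (the table and values are a made-up example, not from the source; NumPy assumed):

import numpy as np

# joint mass function f_{X,Y}(x, y) for x in {0, 1}, y in {0, 1, 2}
f = np.array([[0.10, 0.20, 0.10],
              [0.05, 0.25, 0.30]])
x_vals = np.array([0, 1])
y_vals = np.array([0, 1, 2])

fx = f.sum(axis=1)                      # marginal mass function of X
fy = f.sum(axis=0)                      # marginal mass function of Y
EX = (x_vals * fx).sum()
EY = (y_vals * fy).sum()
# E(g(X, Y)) = sum_{x,y} g(x, y) f(x, y) with g(x, y) = (x - EX)(y - EY)
cov = ((x_vals[:, None] - EX) * (y_vals[None, :] - EY) * f).sum()
print(EX, EY, cov)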
3.3 Conditional distributions

Definition:
The conditional distribution of Y given X = x is
F_{Y|X}(y | x) = P(Y ≤ y | X = x) = ∫_{−∞}^{y} f(x, v) / f_X(x) dv, for x with f_X(x) > 0
Theorem: Conditional expectation
ψ(X) = E (Y | X) , E (ψ(X)) = E(Y )
E (ψ(X)g(X)) = E (Y g(X))
3.4 Sums of random variables

Theorem:
P(X + Y = z) = Σ_x f(x, z − x)
If X and Y are independent, then
P(X + Y = z) = f_{X+Y}(z) = Σ_x f_X(x) f_Y(z − x) = Σ_y f_X(z − y) f_Y(y)

3.5 Multivariate normal distribution

f(x) = exp(−(1/2)(x − µ)^T V^{−1} (x − µ)) / √((2π)^n det(V))
X = (X_1, X_2, ..., X_n)^T
µ = (µ_1, ..., µ_n), E(X_i) = µ_i
V = (v_ij), v_ij = Cov(X_i, X_j)
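A brief sketch of working with this density in code: draw samples for a chosen mean vector and covariance matrix, check the empirical moments, and evaluate f(x) at a point (values are arbitrary, NumPy assumed):

import numpy as np

mu = np.array([1.0, -2.0])
V = np.array([[2.0, 0.6],
              [0.6, 1.0]])            # symmetric positive definite
rng = np.random.default_rng(1)
X = rng.multivariate_normal(mu, V, size=200_000)
print(X.mean(axis=0))                 # approximately mu
print(np.cov(X, rowvar=False))        # approximately V

# density f(x) at a point, following the formula above
x = np.array([0.5, -1.0])
d = x - mu
n = len(mu)
fx = np.exp(-0.5 * d @ np.linalg.solve(V, d)) / np.sqrt((2 * np.pi)**n * np.linalg.det(V))
print(fx)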
4 Generating functions

Definition: Generating function
The generating function of the random variable X is defined by G(s) = E(s^X)
Examples: Generating functions
Constant: G(s) = s^c
Bernoulli: G(s) = (1 − p) + ps
Geometric: G(s) = ps / (1 − s(1 − p))
Poisson: G(s) = e^{λ(s−1)}
Theorem: expectation ↔ G(s)
(a) E(X) = G'(1)
(b) E(X(X − 1)...(X − k + 1)) = G^(k)(1)
Theorem: independence
If X and Y are independent, then G_{X+Y}(s) = G_X(s) G_Y(s)

4.1 Characteristic functions

Definition: moment generating function
M(t) = E(e^{tX})
Definition: characteristic function
Φ(t) = E(e^{itX})
Theorem: independence
If X and Y are independent, then Φ_{X+Y}(t) = Φ_X(t) Φ_Y(t)
Theorem: Y = aX + b
Φ_Y(t) = e^{itb} Φ_X(at)
Definition: joint characteristic function
Φ_{X,Y}(s, t) = E(e^{isX} e^{itY})
X and Y are independent iff Φ_{X,Y}(s, t) = Φ_X(s) Φ_Y(t) for all s and t
Theorem: moment gf ↔ charact. fcn
Φ(t) = M(it) (formally, where M is defined)
Examples of characteristic functions:
Ber(p): Φ(t) = q + pe^{it}
Binomial bin(n, p): Φ_X(t) = (q + pe^{it})^n
Exponential distr.: Φ(t) = ∫_0^∞ e^{itx} λ e^{−λx} dx
Cauchy distr.: Φ(t) = (1/π) ∫_{−∞}^{∞} e^{itx} / (1 + x^2) dx
Normal distr. N(0, 1): Φ(t) = E(e^{itX}) = ∫_{−∞}^{∞} (1/√(2π)) exp(itx − x^2/2) dx
Corollary:
Random variables X and Y have the same characteristic function if and only if they have the same distribution function.
Theorem: Law of large numbers
Let X_1, X_2, X_3, ... be a sequence of iid r.v.'s with finite mean µ. Their partial sums S_n = X_1 + X_2 + ... + X_n satisfy (1/n) S_n →D µ as n → ∞
Central Limit Theorem:
Let X_1, X_2, X_3, ... be a sequence of iid r.v.'s with finite mean µ and finite non-zero variance σ^2, and let S_n = X_1 + X_2 + ... + X_n. Then (S_n − nµ) / √(nσ^2) →D N(0, 1) as n → ∞

5 Markov chains

Definition: Markov chain
P(X_n = s | X_0 = x_0, ..., X_{n−1} = x_{n−1}) = P(X_n = s | X_{n−1} = x_{n−1})
Definition: homogeneous chain
P(X_{n+1} = j | X_n = i) = P(X_1 = j | X_0 = i)
Definition: transition matrix
P = (p_ij) with p_ij = P(X_{n+1} = j | X_n = i)
Theorem: P is a stochastic matrix
(a) P has non-negative entries
(b) P has row sums equal to 1
n-step transition
p_ij(m, m + n) = P(X_{m+n} = j | X_m = i)
Theorem: Chapman-Kolmogorov
p_ij(m, m + n + r) = Σ_k p_ik(m, m + n) p_kj(m + n, m + n + r), so P(m, m + n) = P^n
Definition: mass function
µ_i^(n) = P(X_n = i)
µ^(m+n) = µ^(m) P^n ⇒ µ^(n) = µ^(0) P^n
Definition: persistent, transient
persistent: P(X_n = i for some n ≥ 1 | X_0 = i) = 1
transient: P(X_n = i for some n ≥ 1 | X_0 = i) < 1
Definition: first passage time
f_ij(n) = P(X_1 ≠ j, ..., X_{n−1} ≠ j, X_n = j | X_0 = i)
f_ij := Σ_{n=1}^∞ f_ij(n)
Corollary: persistent, transient
State j is persistent if Σ_n p_jj(n) = ∞, and then Σ_n p_ij(n) = ∞ for all i
State j is transient if Σ_n p_jj(n) < ∞, and then Σ_n p_ij(n) < ∞ for all i
Theorem: Generating functions
P_ij(s) = Σ_{n=0}^∞ s^n p_ij(n)
F_ij(s) = Σ_{n=0}^∞ s^n f_ij(n)
then
(a) P_ii(s) = 1 + F_ii(s) P_ii(s)
(b) P_ij(s) = F_ij(s) P_jj(s) if i ≠ j
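For a homogeneous chain the Chapman-Kolmogorov relation reduces to matrix powers, so n-step probabilities and µ(n) = µ(0)P^n can be computed directly; a small sketch with an arbitrary two-state transition matrix (NumPy assumed):

import numpy as np

P = np.array([[0.9, 0.1],
              [0.4, 0.6]])            # transition matrix, rows sum to 1
mu0 = np.array([1.0, 0.0])            # start in state 0

Pn = np.linalg.matrix_power(P, 10)    # 10-step transition probabilities p_ij(10)
mun = mu0 @ Pn                        # distribution of X_10: mu(10) = mu(0) P^10
print(Pn)
print(mun)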
Definition: First visit time Tj
Tj := min{n ≥ 1 : Xn = j}
Definition: mean recurrence time µ_i
µ_i := E(T_i | X_0 = i) = Σ_n n f_ii(n)
Definition: null, non-null state
state i is null if µi = ∞
state i is non-null if µi < ∞
Theorem: nullness of a persistent state
A persistent state is null if and only if
p_ii(n) → 0 as n → ∞
Definition: period d(i)
The period d(i) of a state i is defined
by d(i) = gcd{n : pii (n) > 0}. We call
i periodic if d(i) > 1 and aperiodic if
d(i) = 1
Definition: Ergodic
A state is called ergodic if it is persistent, non-null, and aperiodic.
Definition: (Inter-)communication
i → j if p_ij(m) > 0 for some m
i ↔ j if i → j and j → i
Theorem: intercommunication
If i ↔ j then:
(a) i and j have the same period
(b) i is transient iff j is transient
(c) i is null persistent iff j is null persistent
Definition: closed, irreducible
a set C of states is called:
(a) closed if p_ij = 0 for all i ∈ C, j ∉ C
(b) irreducible if i ↔ j for all i, j ∈ C
an absorbing state is a closed set with
one state
Theorem: Decomposition
State space S can be partitioned as
S = T ∪ C_1 ∪ C_2 ∪ ..., where T is the set of
transient states, Ci irreducible, closed
sets of persistent states
Lemma: finite S
If S is finite, then at least one state is
persistent and all persistent states are
non-null.
5.1 Stationary distributions
Definition: stationary distribution
π is called a stationary distribution if
(a) π_j ≥ 0 for all j, Σ_j π_j = 1
(b) π = πP, so π_j = Σ_i π_i p_ij for all j
Theorem: existence of stat. distribution
An irreducible chain has a stationary distribution π iff all states are non-null persistent.
Then π is unique and given by π_i = 1/µ_i
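For a finite chain, π can be computed by solving π = πP together with Σ_j π_j = 1 as one linear system; a sketch (the transition matrix is an arbitrary example, NumPy assumed):

import numpy as np

P = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])
n = P.shape[0]
# stack (P^T - I) pi = 0 with the normalisation sum(pi) = 1
A = np.vstack([P.T - np.eye(n), np.ones(n)])
b = np.zeros(n + 1)
b[-1] = 1.0
pi, *_ = np.linalg.lstsq(A, b, rcond=None)
print(pi)          # stationary distribution, here (0.25, 0.5, 0.25)
print(pi @ P)      # equals pi again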
Lemma: ρi (k)
ρi (k): mean number of visits of the
chain to the state i between two successive visits to state k.
Lemma: For any state k of an irreducible persistent chain, the vector
ρ(k) satisfies ρi (k) < ∞ for all i and
ρ(k) = ρ(k)P
Theorem: irreducible, persistent
If the chain is irreducible and persistent, there exists a positive x with x = xP, which is unique up to a multiplicative constant. The chain is non-null if Σ_i x_i < ∞ and null if Σ_i x_i = ∞
Theorem: transient chain iff
Let s be any state of an irreducible chain. The chain is transient iff there exists a non-zero solution {y_j : j ≠ s}, with |y_j| ≤ 1 for all j, to the equations
y_i = Σ_{j : j≠s} p_ij y_j, i ≠ s
Theorem: persistent if
Let s be any state of an irreducible chain on S = {0, 1, 2, ...}. The chain is persistent if there exists a solution {y_j : j ≠ s}, with y_j → ∞ as j → ∞, to the inequalities
y_i ≥ Σ_{j : j≠s} p_ij y_j, i ≠ s
Theorem: Limit theorem
For an irreducible aperiodic chain, we have that
p_ij(n) → 1/µ_j as n → ∞ for all i and j
5.2 Reversibility
Theorem: Inverse chain
Y with Y_n = X_{N−n} is a Markov chain with P(Y_{n+1} = j | Y_n = i) = (π_j / π_i) p_ji
Definition: Reversible chain
A chain is called reversible if
πi pij = πj pji
Theorem: reversible → stationary
If there is a π with πi pij = πj pji then
π is the stationary distribution of the
chain.
5.3 Poisson process
Eq. Forward system of equations:
p'_ij(t) = λ_{j−1} p_{i,j−1}(t) − λ_j p_ij(t)
j ≥ i, λ_{−1} = 0, p_ij(0) = δ_ij
Eq. Backward system of equations:
p'_ij(t) = λ_i p_{i+1,j}(t) − λ_i p_ij(t)
j ≥ i, p_ij(0) = δ_ij
Theorem:
The forward system has a unique solution which satisfies the backward equation.
Definition: Poisson process
N(t) gives the number of events up to time t. N(t), taking values in S = {0, 1, 2, ...}, is a Poisson process with intensity λ if
(a) N(0) = 0; if s < t then N(s) ≤ N(t)
(b) P(N(t + h) = n + m | N(t) = n) = λh + o(h) if m = 1, o(h) if m > 1, 1 − λh + o(h) if m = 0
(c) the number of emissions in an interval is independent of what happened in earlier intervals.
Theorem: Poisson distribution
N(t) has the Poisson distribution: P(N(t) = j) = ((λt)^j / j!) e^{−λt}
Definition: arrival time, interarrival time
Arrival time: T_n = inf{t : N(t) = n}
Interarrival time: X_n = T_n − T_{n−1}
Theorem: Interarrival times
X_1, X_2, ... are independent, each having the exponential distribution with parameter λ.
Definition: Birth process
A birth process generalises the Poisson process, with intensities λ_0, λ_1, ...
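The interarrival theorem gives a direct way to simulate the process: cumulative sums of independent Exp(λ) variables are the arrival times T_n, and N(t) counts how many of them fall below t. A sketch (λ and t are arbitrary, NumPy assumed):

import numpy as np

rng = np.random.default_rng(2)
lam, t = 3.0, 10.0
X = rng.exponential(1 / lam, size=1000)       # interarrival times X_n ~ Exp(lam)
T = np.cumsum(X)                              # arrival times T_n = X_1 + ... + X_n
N_t = np.searchsorted(T, t, side="right")     # N(t) = number of T_n <= t
print(N_t)                                    # one realisation; E(N(t)) = lam * t = 30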
5.4 Continuous Markov chain

Definition: Continuous Markov chain
X is a continuous(-time) Markov chain if
P(X(t_n) = j | X(t_1) = i_1, ..., X(t_{n−1}) = i_{n−1}) = P(X(t_n) = j | X(t_{n−1}) = i_{n−1})
Definition: transition probability
p_ij(s, t) = P(X(t) = j | X(s) = i) for s ≤ t
homogeneous if p_ij(s, t) = p_ij(0, t − s)
Def: Generator matrix
G = (g_ij), with p_ij(h) ≈ g_ij h if i ≠ j and p_ii(h) ≈ 1 + g_ii h
Eq. Forward system of equations: P'_t = P_t G
Eq. Backward system of equations: P'_t = G P_t
Often solutions are of the form P_t = exp(tG)
Matrix exponential: exp(tG) = Σ_{n=0}^∞ (t^n / n!) G^n
Definition:
Irreducible if for any pair i, j, p_ij(t) > 0 for some t
Definition: Stationary
π with π_j ≥ 0, Σ_j π_j = 1 and π = πP_t ∀t ≥ 0
Claim: π = πP_t ⇔ πG = 0
Theorem:
There is a stationary distribution π if p_ij(t) → π_j as t → ∞ for all i, j; there is none if p_ij(t) → 0.
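A sketch of computing P_t = exp(tG) by the truncated series above and checking πG = 0 for the stationary distribution (the two-state generator is an arbitrary example, NumPy assumed):

import numpy as np

G = np.array([[-2.0,  2.0],
              [ 1.0, -1.0]])     # generator: rows sum to 0, off-diagonal entries >= 0
t = 0.5

Pt = np.zeros_like(G)            # accumulates sum_{n>=0} (t^n / n!) G^n
term = np.eye(2)                 # current term (tG)^n / n!, starting at n = 0
for n in range(1, 30):
    Pt = Pt + term
    term = term @ (t * G) / n
print(Pt)                        # transition matrix P_t; rows sum to 1

pi = np.array([1/3, 2/3])        # solves pi G = 0 for this G
print(pi @ G)                    # approximately [0, 0]
print(pi @ Pt)                   # approximately pi, i.e. pi = pi P_t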
6 Convergence of Random Variables

Norm
(a) ||f|| ≥ 0
(b) ||f|| = 0 iff f = 0
(c) ||af|| = |a| · ||f||
(d) ||f + g|| ≤ ||f|| + ||g||
convergent almost surely
X_n →a.s. X if P({ω ∈ Ω : X_n(ω) → X(ω) as n → ∞}) = 1
convergent in rth mean
X_n →r X if E|X_n^r| < ∞ and E(|X_n − X|^r) → 0 as n → ∞
convergent in probability
X_n →P X if P(|X_n − X| > ε) → 0 as n → ∞
convergent in distribution
X_n →D X if P(X_n ≤ x) → P(X ≤ x) as n → ∞ at every x where F_X is continuous
Theorem: implications
(X_n →a.s. X) or (X_n →r X) ⇒ (X_n →P X) ⇒ (X_n →D X)
For r > s ≥ 1: (X_n →r X) ⇒ (X_n →s X)
Theorem: additional implications
(a) If X_n →D c, where c is constant, then X_n →P c
(b) If X_n →P X and P(|X_n| ≤ k) = 1 for all n and some k, then X_n →r X for all r ≥ 1
(c) If P_n(ε) = P(|X_n − X| > ε) satisfies Σ_n P_n(ε) < ∞ for all ε > 0, then X_n →a.s. X
Theorem: Skorokhod's representation theorem
If X_n →D X as n → ∞, then there exist a probability space and random variables Y_n, Y with:
(a) Y_n and Y have distribution functions F_n, F
(b) Y_n →a.s. Y as n → ∞
Theorem: Convergence under continuous functions
If X_n →D X and g : R → R is continuous, then g(X_n) →D g(X)
Theorem: Equivalence
The following statements are equivalent:
(a) X_n →D X
(b) E(g(X_n)) → E(g(X)) for all bounded continuous functions g
(c) E(g(X_n)) → E(g(X)) for all functions g of the form g(x) = f(x) I_{[a,b]}(x) where f is continuous
Theorem: Borel-Cantelli
Let A = ∩_n ∪_{m=n}^∞ A_m be the event that infinitely many of the A_n occur. Then:
(a) P(A) = 0 if Σ_n P(A_n) < ∞
(b) P(A) = 1 if Σ_n P(A_n) = ∞ and A_1, A_2, ... are independent.
Theorem:
X_n → X and Y_n → Y implies X_n + Y_n → X + Y for convergence a.s., in r-th mean and in probability. Not generally true in distribution.

6.1 Laws of large numbers

Theorem:
If X_1, X_2, ... are iid with E(X_i^2) < ∞ and E(X_i) = µ, then (1/n) Σ_{i=1}^n X_i → µ a.s. and in mean square.
Theorem:
Let {X_n} be iid with distribution function F. Then (1/n) Σ_{i=1}^n X_i →P µ iff one of the following holds:
1) nP(|X_1| > n) → 0 and ∫_{[−n,n]} x dF → µ as n → ∞
2) the characteristic function Φ(t) of X_i is differentiable at t = 0 and Φ'(0) = iµ
Theorem: Strong law of large numbers
X_1, X_2, ... iid. Then (1/n) Σ_{i=1}^n X_i → µ a.s. as n → ∞, for some µ, iff E|X_1| < ∞. In this case µ = E(X_1).

6.2 Law of iterated logarithm

If X_1, X_2, ... are iid with mean 0 and variance 1, then
P(limsup_{n→∞} S_n / √(2n log log n) = 1) = 1
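A quick simulation of the strong law above: running means of iid draws settle near µ (the distribution and sample size are arbitrary choices; NumPy assumed):

import numpy as np

rng = np.random.default_rng(3)
mu = 0.25
X = rng.exponential(mu, size=100_000)           # iid with E|X_1| finite, mean mu
running_mean = np.cumsum(X) / np.arange(1, X.size + 1)
print(running_mean[[99, 9_999, 99_999]])        # drifts towards mu = 0.25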
6.3 Martingales

Definition: Martingale
{S_n : n ≥ 1} is called a martingale with respect to the sequence {X_n : n ≥ 1} if
(a) E|S_n| < ∞
(b) E(S_{n+1} | X_1, X_2, ..., X_n) = S_n
Lemma:
E(X_1 + X_2 | Y) = E(X_1 | Y) + E(X_2 | Y)
E(X g(Y) | Y) = g(Y) E(X | Y), g : R^n → R
E(X | h(Y)) = E(X | Y) if h : R^n → R^n is one-one
Lemma: Tower property
E[E(X | Y_1, Y_2) | Y_1] = E(X | Y_1)
Lemma:
If {B_i : 1 ≤ i ≤ n} is a partition of A, then E(X | A) = Σ_{i=1}^n E(X | B_i) P(B_i) / P(A)
Theorem: Martingale convergence
If {S_n} is a martingale with E(S_n^2) < M < ∞ for some M and all n, then there exists a random variable S such that S_n → S a.s. and in L^2.

6.4 Prediction and conditional expectation

Notation:
||U||_2 = √(E(U^2)) = √⟨U, U⟩, ⟨U, V⟩ = E(UV)
||U_n − U||_2 → 0 ⇔ U_n →L2 U
||U + V||_2 ≤ ||U||_2 + ||V||_2
Def.
X, Y r.v. with E(Y^2) < ∞. The minimum mean square predictor of Y given X is the function Ŷ = h(X) minimising ||Y − h(X)||_2.
Theorem:
If a (linear) space H is closed and Ŷ ∈ H, then min ||Y − Ŷ||_2 exists.
Projection theorem:
H is a closed linear space and Y satisfies E(Y^2) < ∞. For M ∈ H: E((Y − M)Z) = 0 ∀Z ∈ H ⇔ ||Y − M||_2 ≤ ||Y − Z||_2 ∀Z ∈ H
Theorem:
Let X and Y be r.v. with E(Y^2) < ∞. The best predictor of Y given X is E(Y | X).

7 Stochastic processes

Definition: Renewal process
N = {N(t) : t ≥ 0} is a process for which N(t) = max{n : T_n ≤ t}, where T_0 = 0, T_n = X_1 + X_2 + ... + X_n for n ≥ 1, and the X_n are iid non-negative r.v.'s.
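A renewal process differs from the Poisson process only in that the interarrival times X_n may have any non-negative distribution; a sketch with uniform interarrival times (an arbitrary choice, NumPy assumed):

import numpy as np

rng = np.random.default_rng(4)
X = rng.uniform(0.0, 2.0, size=1000)          # iid non-negative interarrival times
T = np.cumsum(X)                              # T_n = X_1 + ... + X_n
t = 50.0
N_t = np.searchsorted(T, t, side="right")     # N(t) = max{n : T_n <= t}
print(N_t)                                    # roughly t / E(X_1) = 50 renewals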
8 Stationary processes

Definition:
The autocovariance function is c(t, t + h) = Cov(X(t), X(t + h))
Definition: The process X = {X(t) : t ≥ 0} taking real values is called strongly stationary if the families {X(t_1), X(t_2), ..., X(t_n)} and {X(t_1 + h), X(t_2 + h), ..., X(t_n + h)} have the same joint distribution for all t_1, t_2, ..., t_n and h > 0.
Definition: The process X = {X(t) : t ≥ 0} taking real values is called weakly stationary if, for all t_1, t_2 and h > 0, E(X(t_1)) = E(X(t_2)) and Cov(X(t_1), X(t_2)) = Cov(X(t_1 + h), X(t_2 + h)); thus if and only if it has constant mean and its autocovariance function satisfies c(t, t + h) = c(0, h).
Definition:
The covariance of complex-valued C_1 and C_2 is Cov(C_1, C_2) = E((C_1 − EC_1) conj(C_2 − EC_2))
Definition:
The autocorrelation function of a weakly stationary process with autocovariance function c(t) is
ρ(t) = Cov(X(0), X(t)) / √(Var(X(0)) Var(X(t))) = c(t)/c(0)
Spectral theorem: The autocorrelation function ρ(t) of a weakly stationary process X with strictly positive σ^2 is the characteristic function of some distribution F whenever ρ(t) is continuous at t = 0:
ρ(t) = ∫_{−∞}^{∞} e^{itλ} dF(λ)
F is called the spectral distribution function.
Theorem:
{X} real, stationary with zero mean and autocovariance c(m). The best predictor from the class of linear functions of the subsequence X_r, ..., X_{r−s} is
X̂_{r+k} = Σ_{i=0}^{s} a_i X_{r−i}, where Σ_{i=0}^{s} a_i c(|i − j|) = c(k + j) for 0 ≤ j ≤ s
Ergodic theorem:
If X is strongly stationary such that E|X_1| < ∞, there exists a r.v. Y with the same mean as the X_n such that (1/n) Σ_{j=1}^{n} X_j → Y a.s. and in mean.
Weakly stationary processes:
If X = {X_n : n ≥ 1} is a weakly stationary process, there exists a Y such that E(Y) = E(X_1) and (1/n) Σ_{j=1}^{n} X_j →m.s. Y.

8.1 Gaussian processes

Definition:
A real-valued continuous-time process is called Gaussian if each finite-dimensional vector (X(t_1), X(t_2), ..., X(t_n)) has the multivariate normal distribution N(µ(t), V(t)), t = (t_1, t_2, ..., t_n).
Theorem:
The Gaussian process X is stationary iff E(X(t)) is constant for all t and V(t) = V(t + h) for all t and h > 0.
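A sketch estimating the autocovariance function c(h) of a weakly stationary sequence from one long realisation; an AR(1) recursion is used purely as an arbitrary example (NumPy assumed):

import numpy as np

rng = np.random.default_rng(5)
n, phi = 50_000, 0.7
X = np.zeros(n)
for t in range(1, n):
    X[t] = phi * X[t - 1] + rng.normal()      # weakly stationary once transients die out

def autocov(x, h):
    # estimate of c(h) = Cov(X_t, X_{t+h})
    xc = x - x.mean()
    return np.mean(xc[:len(x) - h] * xc[h:])

c0 = autocov(X, 0)
for h in (0, 1, 2, 3):
    # theory for this AR(1): c(h) = phi**h / (1 - phi**2), rho(h) = c(h)/c(0) = phi**h
    print(h, autocov(X, h), autocov(X, h) / c0)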
9 Inequalities

Cauchy-Schwarz:
(E(XY))^2 ≤ E(X^2) E(Y^2), with equality if and only if P(aX = bY) = 1 for some reals a, b not both zero.
Jensen's inequality:
Given a convex function J(x) and a r.v. X with mean µ: E(J(X)) ≥ J(µ)
Markov's inequality:
P(|X| ≥ a) ≤ E|X| / a for any a > 0
If h : R → [0, ∞) is a non-negative function, then P(h(X) ≥ a) ≤ E(h(X)) / a for all a > 0
General Markov's inequality:
If h : R → [0, M] is a non-negative function bounded by some M, then
P(h(X) ≥ a) ≥ (E(h(X)) − a) / (M − a), 0 ≤ a < M
Chebyshev's inequality:
P(|X| ≥ a) ≤ E(X^2) / a^2 for a > 0
Theorem: Hölder's inequality
If p, q > 1 and p^{−1} + q^{−1} = 1, then E|XY| ≤ (E|X^p|)^{1/p} (E|Y^q|)^{1/q}
Minkowski's inequality:
If p ≥ 1, then [E(|X + Y|^p)]^{1/p} ≤ (E|X^p|)^{1/p} + (E|Y^p|)^{1/p}
Minkowski 2:
E(|X + Y|^p) ≤ C_p [E|X^p| + E|Y^p|], where p > 0 and
C_p = 1 for 0 < p ≤ 1, C_p = 2^{p−1} for p > 1
Kolmogorov's inequality:
Let {X_n} be independent with zero means and variances σ_n^2. Then for ε > 0
P(max_{1≤i≤n} |X_1 + ... + X_i| > ε) ≤ (σ_1^2 + ... + σ_n^2) / ε^2
Doob-Kolmogorov's inequality:
If {S_n} is a martingale, then for any ε > 0
P(max_{1≤i≤n} |S_i| > ε) ≤ E(S_n^2) / ε^2
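A quick numerical check of Markov's and Chebyshev's inequalities on simulated data (exponential samples are an arbitrary choice; NumPy assumed):

import numpy as np

rng = np.random.default_rng(6)
X = rng.exponential(1.0, size=1_000_000)      # E|X| = 1, E(X^2) = 2
for a in (2.0, 3.0, 5.0):
    tail = np.mean(np.abs(X) >= a)            # empirical P(|X| >= a)
    print(a, tail, "Markov bound:", 1.0 / a, "Chebyshev bound:", 2.0 / a**2)
    # the empirical tail probability stays below both bounds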