1 Sets

(A ∪ B) ∪ C = A ∪ (B ∪ C)
(A ∪ B)^C = A^C ∩ B^C
Definition: σ-field
A collection F of subsets of Ω is a σ-field if
(a) ∅ ∈ F
(b) if A_1, A_2, ... ∈ F then ∪_{i=1}^∞ A_i ∈ F
(c) if A ∈ F then A^c ∈ F

2 Probability

P(A^C) = 1 − P(A)
Lemma: If B ⊇ A then P(B) = P(A) + P(B \ A) ≥ P(A)
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
More generally (inclusion–exclusion):
P(∪_{i=1}^n A_i) = Σ_i P(A_i) − Σ_{i<j} P(A_i ∩ A_j) + Σ_{i<j<k} P(A_i ∩ A_j ∩ A_k) − ··· + (−1)^{n+1} P(A_1 ∩ A_2 ∩ ··· ∩ A_n)
Lemma 5, p. 7: Let A_1 ⊆ A_2 ⊆ ... and write A for their limit, A = ∪_{i=1}^∞ A_i = lim_{i→∞} A_i. Then P(A) = lim_{i→∞} P(A_i).
Similarly, if B_1 ⊇ B_2 ⊇ B_3 ⊇ ..., then B = ∩_{i=1}^∞ B_i = lim_{i→∞} B_i satisfies P(B) = lim_{i→∞} P(B_i).
Multiplication rule: P(A ∩ B) = P(A) P(B | A)
Conditional probability: P(A | B) = P(A ∩ B)/P(B); P(A | B, C, ...) = P(A, B, C, ...)/P(B, C, ...)
Bayes' formula: P(A | B) = P(B | A) P(A)/P(B)
Total probability: P(A) = P(A | B) P(B) + P(A | B^C) P(B^C); with a partition B_1, ..., B_n: P(A) = Σ_{i=1}^n P(A | B_i) P(B_i)
Definition 1, p. 13: A family {A_i : i ∈ I} is independent if P(∩_{i∈J} A_i) = Π_{i∈J} P(A_i) for all finite subsets J of I.

3 Random Variable

Distribution function F : R → [0, 1]: F(x) = P(X ≤ x)
Mass function (discrete X): f(x) = P(X = x)
Indicator function: E(I_A) = P(A)
Lemma 11, p. 30: Let F be the distribution function of X. Then
(a) P(X > x) = 1 − F(x)
(b) P(x < X ≤ y) = F(y) − F(x)
(c) P(X = x) = F(x) − lim_{y↑x} F(y)
Definition: The expectation of the random variable X is E(X) = Σ_{x: f(x)>0} x f(x).
If X has mass function f and g : R → R, then E(g(X)) = Σ_x g(x) f(x). Continuous counterpart: E(g(X)) = ∫_{−∞}^{∞} g(x) f_X(x) dx.
Definition: If k is a positive integer, the k:th moment m_k of X is defined by m_k = E(X^k). The k:th central moment is σ_k = E((X − m_1)^k).
Theorem: The expectation operator E satisfies
(a) If X ≥ 0 then E(X) ≥ 0
(b) If a, b ∈ R then E(aX + bY) = aE(X) + bE(Y)
Lemma: If X and Y are independent, then E(XY) = E(X)E(Y).
Definition: X and Y are uncorrelated if E(XY) = E(X)E(Y).
Theorem: For random variables X and Y
(a) Var(aX) = a² Var(X) for a ∈ R
(b) Var(X + Y) = Var(X) + Var(Y) if X and Y are uncorrelated.
Theorem: If X and Y are independent and g, h : R → R, then g(X) and h(Y) are independent too.
Lemma 5, p. 39: The joint distribution function F_{X,Y} of the random vector (X, Y) has the following properties:
lim_{x,y→−∞} F_{X,Y}(x, y) = 0, lim_{x,y→∞} F_{X,Y}(x, y) = 1
if (x_1, y_1) ≤ (x_2, y_2) then F_{X,Y}(x_1, y_1) ≤ F_{X,Y}(x_2, y_2)
F_{X,Y} is continuous from above, in that F_{X,Y}(x + u, y + v) → F_{X,Y}(x, y) as u, v ↓ 0.
Marginal distributions: lim_{y→∞} F_{X,Y}(x, y) = F_X(x), lim_{x→∞} F_{X,Y}(x, y) = F_Y(y)

3.1 Distribution functions

Constant variable X(ω) = c: F(x) = 0 for x < c, F(x) = 1 for x ≥ c (a unit step at c).
Bernoulli distribution Bern(p): a coin is tossed once and shows heads with probability p, with X(H) = 1 and X(T) = 0.
F(x) = 0 for x < 0, F(x) = 1 − p for 0 ≤ x < 1, F(x) = 1 for x ≥ 1.
E(X) = p, Var(X) = p(1 − p)
Binomial distribution bin(n, p): a coin is tossed n times, each toss showing heads with probability p. The total number of heads is described by f(k) = C(n, k) p^k q^{n−k}, q = 1 − p.
E(X) = np, Var(X) = np(1 − p)
Poisson distribution: f(k) = (λ^k/k!) e^{−λ}
E(X) = Var(X) = λ
Geometric distribution: independent Bernoulli trials are performed; let W be the waiting time until the first success occurs. Then f(k) = P(W = k) = p(1 − p)^{k−1}.
E(X) = 1/p, Var(X) = (1 − p)/p²
Negative binomial distribution: let W_r be the waiting time until the r:th success. Then
f(k) = P(W_r = k) = C(k−1, r−1) p^r (1 − p)^{k−r}, k = r, r + 1, ...
E(W_r) = r/p, Var(W_r) = r(1 − p)/p²
Exponential distribution: F(x) = 1 − e^{−λx}, x ≥ 0
E(X) = 1/λ, Var(X) = 1/λ²
Normal distribution: f(x) = (1/√(2πσ²)) e^{−(x−µ)²/(2σ²)}, −∞ < x < ∞
E(X) = µ, Var(X) = σ²
Cauchy distribution: f(x) = 1/(π(1 + x²)) (no moments!)
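As a quick numerical sanity check of the moment formulas above (not part of the original sheet), the following Python sketch, assuming numpy is available, recomputes E(X) and Var(X) for bin(n, p) and the geometric distribution directly from their mass functions; the parameters n = 10, p = 0.3 and the truncation of the geometric support are arbitrary choices.

# Check E(X) and Var(X) for bin(n, p) and the geometric distribution
# by summing over their mass functions (geometric support truncated at 2000).
import numpy as np
from math import comb

n, p = 10, 0.3
k = np.arange(0, n + 1)
f_bin = np.array([comb(n, j) * p**j * (1 - p)**(n - j) for j in k])
mean_bin = (k * f_bin).sum()                    # close to n*p = 3.0
var_bin = ((k - mean_bin)**2 * f_bin).sum()     # close to n*p*(1-p) = 2.1

k = np.arange(1, 2001)                          # truncated geometric support
f_geo = p * (1 - p)**(k - 1)
mean_geo = (k * f_geo).sum()                    # close to 1/p
var_geo = ((k - mean_geo)**2 * f_geo).sum()     # close to (1-p)/p**2
print(mean_bin, var_bin, mean_geo, var_geo)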
3.2 Dependence

Joint distribution: F(x, y) = P(X ≤ x, Y ≤ y) = ∫_{−∞}^y ∫_{−∞}^x f(u, v) du dv
Lemma: The random variables X and Y are independent if and only if
f_{X,Y}(x, y) = f_X(x) f_Y(y) for all x, y ∈ R, equivalently F_{X,Y}(x, y) = F_X(x) F_Y(y) for all x, y ∈ R.
Marginal distribution: F_X(x) = P(X ≤ x) = F(x, ∞) = ∫_{−∞}^x ∫_{−∞}^∞ f(u, y) dy du
Marginal mass/density functions:
f_X(x) = P(X = x) = P(∪_y {X = x, Y = y}) = Σ_y P(X = x, Y = y) = Σ_y f_{X,Y}(x, y)
f_X(x) = ∫_{−∞}^∞ f(x, y) dy
Lemma: E(g(X, Y)) = Σ_{x,y} g(x, y) f_{X,Y}(x, y)
Definition: Cov(X, Y) = E[(X − EX)(Y − EY)]

3.3 Conditional distributions

Definition: The conditional distribution of Y given X = x is
F_{Y|X}(y | x) = P(Y ≤ y | X = x) = ∫_{−∞}^y f(x, v)/f_X(x) dv, for x with f_X(x) > 0.
Theorem: Conditional expectation. Writing ψ(X) = E(Y | X), we have E(ψ(X)) = E(Y) and, more generally, E(ψ(X) g(X)) = E(Y g(X)).

3.4 Sums of random variables

Theorem: P(X + Y = z) = Σ_x f(x, z − x). If X and Y are independent, then
f_{X+Y}(z) = P(X + Y = z) = Σ_x f_X(x) f_Y(z − x) = Σ_y f_X(z − y) f_Y(y).

3.5 Multivariate normal distribution

f(x) = exp(−½ (x − µ)^T V^{−1} (x − µ)) / √((2π)^n det(V))
X = (X_1, X_2, ..., X_n)^T, µ = (µ_1, ..., µ_n), E(X_i) = µ_i
V = (v_ij), v_ij = Cov(X_i, X_j)

4 Generating functions

Definition: The generating function of the random variable X is defined by G(s) = E(s^X).
Examples of generating functions:
Constant c: G(s) = s^c
Bernoulli: G(s) = (1 − p) + ps
Geometric: G(s) = ps/(1 − s(1 − p))
Poisson: G(s) = e^{λ(s−1)}
Theorem: expectation ↔ G(s)
(a) E(X) = G'(1)
(b) E(X(X − 1)···(X − k + 1)) = G^{(k)}(1)
Theorem: If X and Y are independent, then G_{X+Y}(s) = G_X(s) G_Y(s).
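The relations E(X) = G'(1) and E(X(X − 1)) = G''(1) can be checked symbolically. A minimal sketch (not from the original sheet), assuming sympy is available, applied to the Poisson generating function G(s) = e^{λ(s−1)}:

# Factorial moments of Poisson(lam) from G(s) = exp(lam*(s - 1)):
# E(X) = G'(1), E(X(X-1)) = G''(1), hence Var(X) = G''(1) + G'(1) - G'(1)**2.
import sympy as sp

s, lam = sp.symbols('s lam', positive=True)
G = sp.exp(lam * (s - 1))
EX = sp.diff(G, s).subs(s, 1)             # lam
EX_fact2 = sp.diff(G, s, 2).subs(s, 1)    # lam**2
Var = sp.simplify(EX_fact2 + EX - EX**2)  # lam
print(EX, EX_fact2, Var)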
4.1 Characteristic functions

Definition: moment generating function M(t) = E(e^{tX})
Definition: characteristic function Φ(t) = E(e^{itX})
Theorem: moment g.f. ↔ characteristic function: Φ(t) = M(it) whenever M is defined.
Theorem: If X and Y are independent, then Φ_{X+Y}(t) = Φ_X(t) Φ_Y(t).
Theorem: If Y = aX + b, then Φ_Y(t) = e^{itb} Φ_X(at).
Definition: joint characteristic function Φ_{X,Y}(s, t) = E(e^{isX} e^{itY}).
X and Y are independent if and only if Φ_{X,Y}(s, t) = Φ_X(s) Φ_Y(t) for all s and t.
Examples of characteristic functions:
Bernoulli Ber(p): Φ(t) = q + pe^{it}
Binomial bin(n, p): Φ_X(t) = (q + pe^{it})^n
Exponential: Φ(t) = ∫_0^∞ e^{itx} λe^{−λx} dx = λ/(λ − it)
Cauchy: Φ(t) = (1/π) ∫_{−∞}^∞ e^{itx}/(1 + x²) dx = e^{−|t|}
Normal N(0, 1): Φ(t) = E(e^{itX}) = ∫_{−∞}^∞ (1/√(2π)) exp(itx − x²/2) dx = e^{−t²/2}
Corollary: Random variables X and Y have the same characteristic function if and only if they have the same distribution function.
Theorem: Law of large numbers
Let X_1, X_2, X_3, ... be a sequence of iid r.v.'s with finite mean µ. Their partial sums S_n = X_1 + X_2 + ··· + X_n satisfy (1/n) S_n →^D µ as n → ∞.
Central limit theorem:
Let X_1, X_2, X_3, ... be a sequence of iid r.v.'s with finite mean µ and finite nonzero variance σ², and let S_n = X_1 + X_2 + ··· + X_n. Then (S_n − nµ)/√(nσ²) →^D N(0, 1) as n → ∞.

5 Markov chains

Definition: Markov chain
P(X_n = s | X_0 = x_0, ..., X_{n−1} = x_{n−1}) = P(X_n = s | X_{n−1} = x_{n−1})
Definition: homogeneous chain
P(X_{n+1} = j | X_n = i) = P(X_1 = j | X_0 = i)
Definition: transition matrix
P = (p_ij) with p_ij = P(X_{n+1} = j | X_n = i)
Theorem: P is a stochastic matrix:
(a) P has non-negative entries
(b) P has row sums equal to 1
n-step transition probabilities: p_ij(m, m + n) = P(X_{m+n} = j | X_m = i)
Theorem: Chapman–Kolmogorov
p_ij(m, m + n + r) = Σ_k p_ik(m, m + n) p_kj(m + n, m + n + r), so P(m, m + n) = P^n.
Definition: mass function µ_i^(n) = P(X_n = i); µ^(m+n) = µ^(m) P^n ⇒ µ^(n) = µ^(0) P^n
Definition: persistent, transient
persistent: P(X_n = i for some n ≥ 1 | X_0 = i) = 1
transient: P(X_n = i for some n ≥ 1 | X_0 = i) < 1
Definition: first passage time
f_ij(n) = P(X_1 ≠ j, ..., X_{n−1} ≠ j, X_n = j | X_0 = i), f_ij := Σ_{n=1}^∞ f_ij(n)
Corollary: persistent, transient
State j is persistent if Σ_n p_jj(n) = ∞, and then Σ_n p_ij(n) = ∞ for all i.
State j is transient if Σ_n p_jj(n) < ∞, and then Σ_n p_ij(n) < ∞ for all i.
Theorem: generating functions
P_ij(s) = Σ_{n=0}^∞ s^n p_ij(n), F_ij(s) = Σ_{n=0}^∞ s^n f_ij(n); then
(a) P_ii(s) = 1 + F_ii(s) P_ii(s)
(b) P_ij(s) = F_ij(s) P_jj(s) if i ≠ j
Definition: first visit time T_j := min{n ≥ 1 : X_n = j}
Definition: mean recurrence time µ_i := E(T_i | X_0 = i) = Σ_n n f_ii(n)
Definition: null, non-null state
State i is null if µ_i = ∞, non-null if µ_i < ∞.
Theorem: nullness of a persistent state
A persistent state is null if and only if p_ii(n) → 0 as n → ∞.
Definition: period d(i)
The period d(i) of a state i is defined by d(i) = gcd{n : p_ii(n) > 0}. We call i periodic if d(i) > 1 and aperiodic if d(i) = 1.
Definition: ergodic
A state is called ergodic if it is persistent, non-null, and aperiodic.
Definition: (inter-)communication
i → j if p_ij(m) > 0 for some m; i ↔ j if i → j and j → i.
Theorem: intercommunication
If i ↔ j then:
(a) i and j have the same period
(b) i is transient iff j is transient
(c) i is null persistent iff j is null persistent
Definition: closed, irreducible
A set C of states is called:
(a) closed if p_ij = 0 for all i ∈ C, j ∉ C
(b) irreducible if i ↔ j for all i, j ∈ C
An absorbing state is a closed set with one state.
Theorem: decomposition
The state space S can be partitioned as S = T ∪ C_1 ∪ C_2 ∪ ..., where T is the set of transient states and the C_i are irreducible, closed sets of persistent states.
Lemma: finite S
If S is finite, then at least one state is persistent and all persistent states are non-null.

5.1 Stationary distributions

Definition: stationary distribution
π is called a stationary distribution if
(a) π_j ≥ 0 for all j, Σ_j π_j = 1
(b) π = πP, so π_j = Σ_i π_i p_ij for all j
Theorem: existence of a stationary distribution
An irreducible chain has a stationary distribution π iff all states are non-null persistent. Then π is unique and given by π_i = µ_i^{−1}.
Lemma: ρ_i(k)
ρ_i(k) is the mean number of visits of the chain to state i between two successive visits to state k.
Lemma: For any state k of an irreducible persistent chain, the vector ρ(k) satisfies ρ_i(k) < ∞ for all i and ρ(k) = ρ(k)P.
Theorem: irreducible, persistent
If the chain is irreducible and persistent, there exists a positive x with x = xP, which is unique up to a multiplicative constant. The chain is non-null if Σ_i x_i < ∞ and null if Σ_i x_i = ∞.
Theorem: transient chain
Let s be any state of an irreducible chain. The chain is transient iff there exists a nonzero solution {y_j : j ≠ s}, with |y_j| ≤ 1 for all j, to the equations y_i = Σ_{j: j≠s} p_ij y_j, i ≠ s.
Theorem: persistent chain
Let s be any state of an irreducible chain on S = {0, 1, 2, ...}. The chain is persistent if there exists a solution {y_j : j ≠ s} to the inequalities y_i ≥ Σ_{j: j≠s} p_ij y_j, i ≠ s, such that y_i → ∞ as i → ∞.
Theorem: limit theorem
For an irreducible aperiodic chain, p_ij(n) → 1/µ_j as n → ∞, for all i and j.
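A minimal numeric sketch of the existence theorem above (not from the original sheet; assumes numpy, and the 3-state matrix P is an arbitrary irreducible aperiodic example): solve π = πP together with Σ_j π_j = 1, and compare with the rows of P^n for large n.

# Stationary distribution pi of a small chain: solve pi = pi P, sum(pi) = 1,
# then check that the rows of P^n approach pi.
import numpy as np

P = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.5, 0.3],
              [0.0, 0.4, 0.6]])
n = P.shape[0]
A = np.vstack([P.T - np.eye(n), np.ones(n)])   # (P^T - I) pi = 0 and 1^T pi = 1
b = np.concatenate([np.zeros(n), [1.0]])
pi, *_ = np.linalg.lstsq(A, b, rcond=None)
print(pi, np.allclose(pi @ P, pi))             # pi = pi P
print(np.linalg.matrix_power(P, 50))           # every row is close to pi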
5.2 Reversibility

Theorem: inverse chain
Y with Y_n = X_{N−n} is a Markov chain with P(Y_{n+1} = j | Y_n = i) = (π_j/π_i) p_ji.
Definition: reversible chain
A chain is called reversible if π_i p_ij = π_j p_ji.
Theorem: reversible ⇒ stationary
If there is a π with π_i p_ij = π_j p_ji, then π is the stationary distribution of the chain.

5.3 Poisson process

Definition: Poisson process
N(t) gives the number of events up to time t. N = {N(t) : t ≥ 0} is a Poisson process with intensity λ, taking values in S = {0, 1, 2, ...}, if
(a) N(0) = 0; if s < t then N(s) ≤ N(t)
(b) P(N(t + h) = n + m | N(t) = n) = λh + o(h) if m = 1, o(h) if m > 1, 1 − λh + o(h) if m = 0
(c) the number of events in an interval is independent of the process before the start of the interval.
Theorem: Poisson distribution
N(t) has the Poisson distribution: P(N(t) = j) = ((λt)^j/j!) e^{−λt}
Definition: arrival time, interarrival time
Arrival time: T_n = inf{t : N(t) = n}; interarrival time: X_n = T_n − T_{n−1}
Theorem: interarrival times
X_1, X_2, ... are independent, each having the exponential distribution with parameter λ.
Definition: birth process
A birth process generalizes the Poisson process to state-dependent intensities λ_0, λ_1, ...: given N(t) = n, the transition n → n + 1 occurs at rate λ_n.
Forward system of equations: p'_ij(t) = λ_{j−1} p_{i,j−1}(t) − λ_j p_ij(t), j ≥ i, with λ_{−1} = 0 and p_ij(0) = δ_ij
Backward system of equations: p'_ij(t) = λ_i p_{i+1,j}(t) − λ_i p_ij(t), j ≥ i, with p_ij(0) = δ_ij
Theorem: The forward system has a unique solution, which satisfies the backward equations.

5.4 Continuous Markov chains

Definition: continuous-time Markov chain
X is a continuous-time Markov chain if
P(X(t_n) = j | X(t_1) = i_1, ..., X(t_{n−1}) = i_{n−1}) = P(X(t_n) = j | X(t_{n−1}) = i_{n−1})
Definition: transition probability
p_ij(s, t) = P(X(t) = j | X(s) = i) for s ≤ t; the chain is homogeneous if p_ij(s, t) = p_ij(0, t − s).
Definition: generator matrix
G = (g_ij), with p_ij(h) = g_ij h + o(h) for i ≠ j and p_ii(h) = 1 + g_ii h + o(h) as h ↓ 0.
Forward system of equations: P'_t = P_t G
Backward system of equations: P'_t = G P_t
Often the solutions are of the form P_t = exp(tG).
Matrix exponential: exp(tG) = Σ_{n=0}^∞ (t^n/n!) G^n
Definition: irreducible
The chain is irreducible if for any pair i, j we have p_ij(t) > 0 for some t.
Definition: stationary distribution
π with π_j ≥ 0, Σ_j π_j = 1 and π = πP_t for all t ≥ 0.
Claim: π = πP_t for all t ⇔ πG = 0
Theorem: For an irreducible chain: if there is a stationary distribution π, then p_ij(t) → π_j as t → ∞ for all i, j; if there is no stationary distribution, then p_ij(t) → 0.

6 Convergence of Random Variables

Norm:
(a) ||f|| ≥ 0
(b) ||f|| = 0 iff f = 0
(c) ||af|| = |a| · ||f||
(d) ||f + g|| ≤ ||f|| + ||g||
Convergence almost surely: X_n →^{a.s.} X if the set {ω ∈ Ω : X_n(ω) → X(ω) as n → ∞} has probability 1.
Convergence in r:th mean: X_n →^r X if E|X_n^r| < ∞ and E(|X_n − X|^r) → 0 as n → ∞.
Convergence in probability: X_n →^P X if P(|X_n − X| > ε) → 0 as n → ∞ for all ε > 0.
Convergence in distribution: X_n →^D X if P(X_n ≤ x) → P(X ≤ x) as n → ∞ at every point x where the distribution function of X is continuous.
Theorem: implications
(X_n →^{a.s.} X) ⇒ (X_n →^P X), (X_n →^r X) ⇒ (X_n →^P X), and (X_n →^P X) ⇒ (X_n →^D X).
For r > s ≥ 1: (X_n →^r X) ⇒ (X_n →^s X).
Theorem: additional implications
(a) If X_n →^D c, where c is constant, then X_n →^P c.
(b) If X_n →^P X and P(|X_n| ≤ k) = 1 for all n and some k, then X_n →^r X for all r ≥ 1.
(c) If P_n(ε) = P(|X_n − X| > ε) satisfies Σ_n P_n(ε) < ∞ for all ε > 0, then X_n →^{a.s.} X.
Theorem: Skorokhod's representation
If X_n →^D X as n → ∞, then there exist a probability space and random variables Y_n, Y on it with:
(a) Y_n and Y have the distribution functions F_n and F of X_n and X
(b) Y_n →^{a.s.} Y as n → ∞
Theorem: convergence under continuous functions
If X_n →^D X and g : R → R is continuous, then g(X_n) →^D g(X).
Theorem: equivalence
The following statements are equivalent:
(a) X_n →^D X
(b) E(g(X_n)) → E(g(X)) for all bounded continuous functions g
(c) E(g(X_n)) → E(g(X)) for all functions g of the form g(x) = f(x) I_{[a,b]}(x), where f is continuous
Theorem: Borel–Cantelli
Let A = ∩_n ∪_{m=n}^∞ A_m be the event that infinitely many of the A_n occur. Then:
(a) P(A) = 0 if Σ_n P(A_n) < ∞
(b) P(A) = 1 if Σ_n P(A_n) = ∞ and A_1, A_2, ... are independent.
Theorem: X_n → X and Y_n → Y implies X_n + Y_n → X + Y for convergence a.s., in r:th mean, and in probability. This is not generally true for convergence in distribution.
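A simulation sketch of convergence in probability of the sample mean, anticipating the laws of large numbers below (not part of the original sheet; assumes numpy, and the uniform distribution, ε and sample sizes are arbitrary choices): the estimated P(|S_n/n − µ| > ε) shrinks toward 0 as n grows.

# Weak law of large numbers, empirically: X_i ~ Uniform(0, 1) with mu = 0.5;
# estimate P(|S_n/n - mu| > eps) from 1000 independent copies of the sample mean.
import numpy as np

rng = np.random.default_rng(0)
mu, eps = 0.5, 0.05
for n in (10, 100, 1000, 5000):
    means = rng.uniform(0.0, 1.0, size=(1000, n)).mean(axis=1)
    print(n, (np.abs(means - mu) > eps).mean())    # tends to 0 as n grows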
6.1 Laws of large numbers

Theorem: If X_1, X_2, ... are iid with E(X_i²) < ∞ and E(X_i) = µ, then (1/n) Σ_{i=1}^n X_i → µ almost surely and in mean square.
Theorem: Let {X_n} be iid with distribution function F. Then (1/n) Σ_{i=1}^n X_i →^P µ iff one of the following holds:
1) nP(|X_1| > n) → 0 and ∫_{[−n,n]} x dF → µ as n → ∞
2) the characteristic function Φ(t) of the X_i is differentiable at t = 0 and Φ'(0) = iµ.
Theorem: strong law of large numbers
Let X_1, X_2, ... be iid. Then (1/n) Σ_{i=1}^n X_i → µ a.s. as n → ∞, for some constant µ, iff E|X_1| < ∞. In this case µ = E(X_1).

6.2 Law of iterated logarithm

If X_1, X_2, ... are iid with mean 0 and variance 1, then
P(lim sup_{n→∞} S_n/√(2n log log n) = 1) = 1.

6.3 Martingales

Definition: martingale
(S_n : n ≥ 1) is called a martingale with respect to the sequence (X_n : n ≥ 1) if
(a) E|S_n| < ∞
(b) E(S_{n+1} | X_1, X_2, ..., X_n) = S_n
Lemma:
E(X_1 + X_2 | Y) = E(X_1 | Y) + E(X_2 | Y)
E(X g(Y) | Y) = g(Y) E(X | Y), g : R^n → R
E(X | h(Y)) = E(X | Y) if h : R^n → R^n is one-one
Lemma: tower property
E[E(X | Y_1, Y_2) | Y_1] = E(X | Y_1)
Lemma: If {B_i : 1 ≤ i ≤ n} is a partition of A, then E(X | A) = Σ_{i=1}^n E(X | B_i) P(B_i | A).
Theorem: martingale convergence
If {S_n} is a martingale with E(S_n²) ≤ M < ∞ for some M and all n, then there exists a random variable S such that S_n → S a.s. and in L².

6.4 Prediction and conditional expectation

Notation: ||U||_2 = √(E(U²)) = √⟨U, U⟩, ⟨U, V⟩ = E(UV); ||U_n − U||_2 → 0 ⇔ U_n →^{L²} U; ||U + V||_2 ≤ ||U||_2 + ||V||_2
Definition: Let X, Y be r.v.'s with E(Y²) < ∞. The minimum mean square predictor of Y given X is the Ŷ = h(X) that minimizes ||Y − Ŷ||_2.
Theorem: If H is a closed (linear) space, then a Ŷ ∈ H minimizing ||Y − Ŷ||_2 exists.
Projection theorem: Let H be a closed linear space and Y a r.v. with E(Y²) < ∞. For M ∈ H:
E((Y − M)Z) = 0 for all Z ∈ H ⇔ ||Y − M||_2 ≤ ||Y − Z||_2 for all Z ∈ H.
Theorem: Let X and Y be r.v.'s with E(Y²) < ∞. The best (minimum mean square) predictor of Y given X is E(Y | X).

7 Stochastic processes

Definition: renewal process
N = {N(t) : t ≥ 0} is a process for which N(t) = max{n : T_n ≤ t}, where T_0 = 0, T_n = X_1 + X_2 + ··· + X_n for n ≥ 1, and the X_n are iid non-negative r.v.'s.
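A simulation sketch of a renewal process (not from the original sheet; assumes numpy, and exponential interarrival times are chosen, so this is in fact a Poisson process), checking the renewal rate N(t)/t ≈ 1/E(X_1):

# Renewal process N(t) = max{n : T_n <= t} with iid interarrival times X_n.
import numpy as np

rng = np.random.default_rng(0)
t_max = 10_000.0
X = rng.exponential(scale=2.0, size=20_000)    # interarrival times, E(X_1) = 2
T = np.cumsum(X)                               # arrival times T_n
N_t = np.searchsorted(T, t_max, side='right')  # number of arrivals by time t_max
print(N_t / t_max, 1 / 2.0)                    # both close to 0.5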
8 Stationary processes

Definition: The process X = {X(t) : t ≥ 0}, taking real values, is called strongly stationary if the families {X(t_1), X(t_2), ..., X(t_n)} and {X(t_1 + h), X(t_2 + h), ..., X(t_n + h)} have the same joint distribution for all t_1, t_2, ..., t_n and h > 0.
Definition: The process X = {X(t) : t ≥ 0}, taking real values, is called weakly stationary if, for all t_1, t_2 and h > 0: E(X(t_1)) = E(X(t_2)) and Cov(X(t_1), X(t_2)) = Cov(X(t_1 + h), X(t_2 + h)); that is, iff it has constant mean and its autocovariance function satisfies c(t, t + h) = c(0, h).
Definition: The autocovariance function is c(t, t + h) = Cov(X(t), X(t + h)).
Definition: The autocorrelation function of a weakly stationary process with autocovariance function c(t) is
ρ(t) = Cov(X(0), X(t))/√(Var(X(0)) Var(X(t))) = c(t)/c(0).
Definition: The covariance of complex-valued C_1 and C_2 is Cov(C_1, C_2) = E[(C_1 − EC_1) conj(C_2 − EC_2)].
Theorem: Let {X} be real and stationary with zero mean and autocovariance c(m). The best predictor of X_{r+k} from the class of linear functions of the subsequence X_r, X_{r−1}, ..., X_{r−s} is
X̂_{r+k} = Σ_{i=0}^s a_i X_{r−i}, where Σ_{i=0}^s a_i c(|i − j|) = c(k + j) for 0 ≤ j ≤ s.
Spectral theorem: For a weakly stationary process X with strictly positive σ², the autocorrelation ρ is the characteristic function of some distribution F whenever ρ(t) is continuous at t = 0:
ρ(t) = ∫_{−∞}^∞ e^{itλ} dF(λ)
F is called the spectral distribution function.
Ergodic theorem: If X is strongly stationary and E|X_1| < ∞, there exists a r.v. Y with the same mean as X_1 such that (1/n) Σ_{j=1}^n X_j → Y a.s. and in mean.
Weakly stationary processes: if X = {X_n : n ≥ 1} is a weakly stationary process, there exists a Y such that E(Y) = E(X_1) and (1/n) Σ_{j=1}^n X_j →^{m.s.} Y.

8.1 Gaussian processes

Definition: A real-valued continuous-time process is called Gaussian if each finite-dimensional vector (X(t_1), X(t_2), ..., X(t_n)) has the multivariate normal distribution N(µ(t), V(t)), t = (t_1, t_2, ..., t_n).
Theorem: The Gaussian process X is stationary iff E(X(t)) is constant for all t and V(t) = V(t + h) for all t and h > 0.

9 Inequalities

Cauchy–Schwarz: (E(XY))² ≤ E(X²)E(Y²), with equality if and only if aX + bY = 0 a.s. for some a, b not both zero.
Jensen's inequality: given a convex function J(x) and a r.v. X with mean µ: E(J(X)) ≥ J(µ).
Markov's inequality: P(|X| ≥ a) ≤ E|X|/a for any a > 0.
If h : R → [0, ∞) is a non-negative function, then P(h(X) ≥ a) ≤ E(h(X))/a for all a > 0.
General Markov inequality: if h : R → [0, M] is a non-negative function bounded by some M, then
P(h(X) ≥ a) ≥ (E(h(X)) − a)/(M − a), 0 ≤ a < M.
Chebyshev's inequality: P(|X| ≥ a) ≤ E(X²)/a² for a > 0.
Hölder's inequality: if p, q > 1 and p^{−1} + q^{−1} = 1, then E|XY| ≤ (E|X^p|)^{1/p} (E|Y^q|)^{1/q}.
Minkowski's inequality: if p ≥ 1, then [E(|X + Y|^p)]^{1/p} ≤ (E|X^p|)^{1/p} + (E|Y^p|)^{1/p}.
Minkowski 2: E(|X + Y|^p) ≤ C_p [E|X^p| + E|Y^p|], where p > 0 and C_p = 1 for 0 < p ≤ 1, C_p = 2^{p−1} for p > 1.
Kolmogorov's inequality: let {X_n} be independent with zero means and variances σ_n². Then for ε > 0:
P(max_{1≤i≤n} |X_1 + ··· + X_i| > ε) ≤ (σ_1² + ··· + σ_n²)/ε²
Doob–Kolmogorov's inequality: if {S_n} is a martingale, then for any ε > 0:
P(max_{1≤i≤n} |S_i| ≥ ε) ≤ E(S_n²)/ε²
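A Monte Carlo sanity check of Markov's, Chebyshev's and Jensen's inequalities (not from the original sheet; assumes numpy, and X ~ Exp(1) with threshold a = 3 are arbitrary choices):

# Monte Carlo check of Markov, Chebyshev and Jensen for X ~ Exp(1), a = 3.
import numpy as np

rng = np.random.default_rng(0)
X = rng.exponential(scale=1.0, size=100_000)            # E|X| = 1, E(X^2) = 2
a = 3.0
print((X >= a).mean(), X.mean() / a)                    # Markov: lhs <= rhs
print((np.abs(X) >= a).mean(), (X**2).mean() / a**2)    # Chebyshev: lhs <= rhs
print(np.exp(X / 2).mean(), np.exp(X.mean() / 2))       # Jensen, J(x) = e^(x/2): lhs >= rhs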