Convergence of random variables

• Recall the definition of convergence for sequences of real numbers.
• Random variables are functions (from Ω to the real line), so there are many senses in which a sequence of random variables can converge.
• Definitions of the different types of convergence: a.s., m.s., p., and d.
• Note: the definition of m.s. convergence automatically implies that the limiting random variable has a finite second moment.
• Relationships between the different types of convergence:
  o a.s. and m.s. imply p., but a.s. and m.s. do not imply each other in general. The moving-rectangles example is very useful for remembering these implications.
  o p. convergence implies d. convergence. Thus a.s., m.s., and p. all imply d.; in other words, d. is the weakest notion of convergence.
• Important facts:
  o The limit of a sequence of random variables in the p., a.s., or m.s. sense is unique (with probability one).
  o If a sequence of random variables converges in the m.s. sense, the limit has a finite second moment.
• Law of large numbers:
  o m.s. version
  o p. version (also called the weak law of large numbers, or WLLN)
  o a.s. version (also called the strong law of large numbers, or SLLN)
  o Interpreting the SLLN in terms of tossing a fair coin: the sample space consists of infinite sequences of coin tosses. When will the empirical average not converge to 1/2? For sequences such as HTTHTTHTT..., or TTTHHTTTHH..., etc. There are infinitely many such sequences in which the fraction of H is clearly not 1/2. The SLLN tells us that even though there are infinitely many such sequences, the probability that any of them occurs is zero. (A simulation sketch at the end of this section illustrates both the SLLN and the CLT.)
• Central limit theorem (CLT):
  o Important result: d. convergence holds iff the characteristic functions converge pointwise.
  o Prove the CLT by proving convergence to the characteristic function of a Gaussian random variable.
• Proving convergence when the limit is unknown:
  o For real numbers, monotone bounded sequences converge.
  o When these conditions do not hold, use the Cauchy criterion.
  o Cauchy criterion for random variables.
  o For m.s. convergence in particular, the Cauchy criterion gives a simplified condition for convergence in terms of correlations; this result will be used extensively in this course: (Xn) converges in the m.s. sense if and only if lim_{m,n→∞} E(Xm Xn) exists and is finite. Further, if Xn converges to X in the m.s. sense, then lim_{m,n→∞} E(Xm Xn) = E(X^2).
• Useful results:
  o Under d. convergence, the limiting cdf is unique.
  o If Xn and Yn converge to X and Y, respectively, in the m.s. sense, then E(Xn + Yn) converges to E(X + Y) and E(Xn Yn) converges to E(XY).
  o d. convergence holds iff the expectations of all bounded continuous functions of the random variables converge.
  o The limit of a sequence of Gaussian random variables (in any of the above senses) is Gaussian.
  o If |Xn| <= Y and E(Y^2) < ∞, then convergence of Xn to X in the p. sense implies convergence in the m.s. sense as well, and hence the expectations also converge.
• When a sequence of random variables converges, under what conditions do the expectations also converge to the expectation of the limiting random variable?
  o Bounded convergence theorem: for a sequence of uniformly bounded random variables, p. convergence implies convergence of the expectations.
  o Dominated convergence theorem: if |Xn| <= Y and E(Y) < ∞, then convergence of Xn to X in the p. sense implies that the expectations also converge.
  o Monotone convergence theorem: consider a monotone nondecreasing sequence of nonnegative random variables. Then the sequence converges for every ω (as an extended random variable, i.e., the limit could be infinite) and the expectations converge to the expectation of the limiting random variable.
• Convex functions and Jensen's inequality. Special case: E(X^2) >= [E(X)]^2.
• Chernoff bound using the Markov inequality; Cramér's theorem.
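The following is a minimal Python simulation sketch (not part of the original notes) illustrating the SLLN and the CLT for i.i.d. fair coin tosses; the sample sizes, number of trials, and random seed are arbitrary choices made for illustration.

import numpy as np

rng = np.random.default_rng(0)

# SLLN: along one sample path, the empirical average of the first n tosses
# approaches 1/2 as n grows (the convergence holds with probability one).
tosses = rng.integers(0, 2, size=100_000)          # one sequence of H/T encoded as 1/0
running_avg = np.cumsum(tosses) / np.arange(1, tosses.size + 1)
for n in (10, 1_000, 100_000):
    print(f"empirical average after {n:>7,} tosses: {running_avg[n - 1]:.4f}")

# CLT: the standardized average (S_n/n - 1/2) / (0.5/sqrt(n)) converges in
# distribution to N(0, 1); compare quantiles across many independent trials.
n, trials = 1_000, 50_000
sums = rng.binomial(n, 0.5, size=trials)           # S_n for each trial
z = (sums / n - 0.5) / (0.5 / np.sqrt(n))
for q, target in ((0.1, -1.2816), (0.5, 0.0), (0.9, 1.2816)):  # N(0,1) quantiles
    print(f"{q:.0%} quantile: empirical {np.quantile(z, q):+.3f}, N(0,1) {target:+.3f}")

The running average settles near 1/2 along a single sample path (SLLN), while the standardized sums match the standard-normal quantiles across many independent trials (CLT).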
Minimum Mean Square Estimation

• MMSE estimation and the orthogonality principle:
  o W is the MMSE estimator of X over some linear space V iff X - W is orthogonal to every Z in V. The MMSE estimator is also called the projection of X onto V.
  o Further, E[(X - W)^2] = E(X^2) - E(W^2).
  o Results on projections: linearity; projections onto nested spaces and onto orthogonal spaces.
  o Example: X is a random variable of interest and Y is an observation. Find a function g(Y) such that E[(X - g(Y))^2] is minimized. The MMSE estimate is g(Y) = E(X|Y); this can be proved using the orthogonality principle.
  o The MMSE estimate is a projection: properties of projections.
• Random vectors: expectation (mean vector); correlation matrix and covariance matrix.
• Correlation and covariance matrices are positive semidefinite. Need to know how to diagonalize a positive definite matrix.
• MMSE for vectors: element-by-element MMSE estimates; Cov(e) = Cov(X) - Cov(E(X|Y)).
• If g(Y) is restricted to be an affine function, the resulting MMSE estimate is called the linear MMSE estimate (see the numerical sketch at the end of this section):
  o E(e) = 0; Cov(e, Y) = 0
  o Ehat(X|Y) = E(X) + Cov(X,Y)[Cov(Y)]^-1(Y - E(Y))
  o Cov(e) = Cov(X) - Cov(Ehat(X|Y)) = Cov(X) - Cov(X,Y)[Cov(Y)]^-1 Cov(Y,X)
• Relationship between the linear estimator and the conditional expectation; tower property of both estimators.
• Jointly Gaussian random variables and Gaussian random vectors:
  o Definition: every finite linear combination is Gaussian.
  o Jointly Gaussian implies that each random variable is Gaussian.
  o Linear combinations and limits of jointly Gaussian random variables are also jointly Gaussian.
  o Characteristic function and pdf.
  o If X and Y are jointly Gaussian vectors, they are independent iff Cov(X, Y) = 0.
  o If X and Y are jointly Gaussian, then E(X|Y) = Ehat(X|Y), and X given Y = y is Gaussian with mean E(X|Y = y) and covariance Cov(e).
• Linear innovations: a recursive method to compute the linear MMSE estimate conditioned on a sequence of observations. The key idea is to "orthogonalize" the observations.
• Kalman filter: a recursive algorithm to obtain linear MMSE estimates for a linear state-space system; it uses the linear innovations idea (a minimal scalar sketch is given below).
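Below is a small numerical sketch, not from the notes, that checks the linear MMSE formulas on a made-up jointly Gaussian model Y = HX + V; the matrices H, Cov(X), Cov(V), the mean vector, and the sample size are all invented for illustration.

import numpy as np

rng = np.random.default_rng(1)

# Hypothetical model: X in R^2, Y = H X + V with V independent zero-mean noise.
H = np.array([[1.0, 0.5],
              [0.0, 1.0]])
cov_X = np.array([[2.0, 0.3],
                  [0.3, 1.0]])
cov_V = 0.2 * np.eye(2)
mean_X = np.array([1.0, -1.0])

cov_XY = cov_X @ H.T                 # Cov(X, Y) = Cov(X) H^T
cov_Y = H @ cov_X @ H.T + cov_V      # Cov(Y)
mean_Y = H @ mean_X

# Draw samples of (X, Y) and form Ehat(X|Y) = E(X) + Cov(X,Y)[Cov(Y)]^-1 (Y - E(Y)).
N = 100_000
X = rng.multivariate_normal(mean_X, cov_X, size=N)
V = rng.multivariate_normal(np.zeros(2), cov_V, size=N)
Y = X @ H.T + V
gain = cov_XY @ np.linalg.inv(cov_Y)
Xhat = mean_X + (Y - mean_Y) @ gain.T
e = X - Xhat

# Orthogonality: Cov(e, Y) should be (approximately) the zero matrix.
print("Cov(e, Y) ~\n", np.cov(e.T, Y.T)[:2, 2:].round(3))
# Error covariance: Cov(e) = Cov(X) - Cov(X,Y)[Cov(Y)]^-1 Cov(Y,X).
print("empirical Cov(e):\n", np.cov(e.T).round(3))
print("formula  Cov(e):\n", (cov_X - gain @ cov_XY.T).round(3))

Finally, a minimal scalar Kalman filter sketch under an assumed state-space model x_{k+1} = a x_k + w_k, y_k = c x_k + v_k; the parameter values a, c, q, r and the initial prior are made up, and the code only illustrates the recursive measurement-update/time-update structure rather than any particular example from the notes.

import numpy as np

def scalar_kalman(y, a=0.9, c=1.0, q=0.1, r=0.5, x0=0.0, p0=1.0):
    """Return the filtered estimates xhat_{k|k} for observations y[0], y[1], ..."""
    xhat, p = x0, p0                            # prior mean and error variance
    estimates = []
    for yk in y:
        k_gain = p * c / (c * c * p + r)        # Kalman gain
        xhat = xhat + k_gain * (yk - c * xhat)  # measurement update (innovation)
        p = (1 - k_gain * c) * p                # filtered error variance
        estimates.append(xhat)
        xhat, p = a * xhat, a * a * p + q       # time update (prediction)
    return np.array(estimates)

# Usage with data simulated from the same assumed model.
rng = np.random.default_rng(2)
x, xs, ys = 0.0, [], []
for _ in range(200):
    x = 0.9 * x + rng.normal(scale=np.sqrt(0.1))
    xs.append(x)
    ys.append(x + rng.normal(scale=np.sqrt(0.5)))
est = scalar_kalman(np.array(ys))
print("RMS error, raw observations:", np.sqrt(np.mean((np.array(ys) - np.array(xs)) ** 2)).round(3))
print("RMS error, Kalman estimates:", np.sqrt(np.mean((est - np.array(xs)) ** 2)).round(3))

The filtered RMS error should come out noticeably smaller than that of the raw observations, reflecting how each step combines the prediction with the new observation through the innovation.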