Convergence of random variables
• Recall the definition of convergence of a sequence of real numbers
• Random variables are functions (from Ω to ℝ), so there are many senses in which a sequence of them can converge
• Definitions of different types of convergence: a.s., m.s., p., and d. Note: the definition of m.s. convergence automatically implies that the limiting random variable has a finite second moment
• Relationship between different types of convergence:
o a.s. and m.s. each imply p., but a.s. and m.s. do not imply each other in general. The moving rectangles example is very useful for remembering these implications (a sketch follows this list).
o p. convergence implies d. convergence. Thus, a.s., m.s., and p. all imply d.; in other words, d. is the weakest notion of convergence
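For reference, a minimal sketch of the moving rectangles construction (the interval widths, ordering, and the height-scaled variant below are one standard choice, assumed here rather than taken verbatim from the lecture notes):

```latex
Work on $\Omega=[0,1]$ with the uniform probability measure. Round $k$ consists of $k$
indicator random variables of consecutive intervals of width $1/k$:
\[
  1_{[0,1]};\quad 1_{[0,1/2]},\ 1_{[1/2,1]};\quad 1_{[0,1/3]},\ 1_{[1/3,2/3]},\ 1_{[2/3,1]};\ \dots
\]
If $X_n$ belongs to round $k$, then for any $0<\epsilon<1$,
$P(|X_n|>\epsilon)=1/k\to 0$ and $E[X_n^2]=1/k\to 0$, so $X_n\to 0$ in p. and in m.s.
But every $\omega$ is covered by one rectangle of every round, so $X_n(\omega)=1$ infinitely
often and $(X_n)$ does not converge a.s.  Scaling the round-$k$ rectangles to height $\sqrt{k}$
keeps $X_n\to 0$ in p. while $E[X_n^2]=1$, so p. does not imply m.s.; conversely,
$Y_n=n\,1_{[0,1/n]}$ converges a.s. to $0$ but $E[Y_n^2]=n\to\infty$, so a.s. does not imply m.s.
```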
• Important facts:
o The limit of a sequence of random variables in p., a.s. or m.s. sense is unique
(with probability one)
o If a sequence of random variables converges in the m.s. sense, the limit has a
finite second moment
• Law of large numbers
o m.s. version
o p. version (also called the weak law of large numbers or WLLN)
o a.s. version (also called the strong law of large numbers or SLLN).
o Interpreting the SLLN in terms of tossing a fair coin: the sample space consists of infinite sequences of coin tosses. When will the empirical average not converge to ½? For sequences such as HTTHTTHTT… or TTTHHTTTHH…, etc. There are infinitely many such sequences in which the fraction of H is clearly not ½. The SLLN tells us that even though there are infinitely many such sequences, the set of all of them has probability zero.
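A short simulation sketch of this interpretation (illustrative only; the run length and seed below are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# One sequence of fair coin tosses, truncated to n tosses (H = 1, T = 0).
tosses = rng.integers(0, 2, size=n)

# Running empirical average of heads after 1, 2, ..., n tosses.
running_avg = np.cumsum(tosses) / np.arange(1, n + 1)

# SLLN: for almost every toss sequence, the empirical average converges to 1/2,
# so late entries of running_avg should be close to 0.5.
print(running_avg[[9, 99, 999, 9_999, 99_999]])
```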
• Central limit theorem (CLT)
o Important result: d. convergence iff characteristic functions converge
pointwise
o Prove the CLT by showing that the characteristic function of the normalized sum converges pointwise to the characteristic function of a Gaussian random variable
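A sketch of that characteristic-function computation for i.i.d. Xi with mean 0 and variance 1 (the standard argument, reproduced here for reference):

```latex
Let $S_n = (X_1 + \cdots + X_n)/\sqrt{n}$ with $X_i$ i.i.d., $E[X_i]=0$, $\mathrm{Var}(X_i)=1$. Then
\[
  \Phi_{S_n}(u) \;=\; \Bigl[\Phi_{X_1}\bigl(u/\sqrt{n}\bigr)\Bigr]^n
  \;=\; \Bigl[1 - \tfrac{u^2}{2n} + o\bigl(\tfrac{1}{n}\bigr)\Bigr]^n
  \;\longrightarrow\; e^{-u^2/2},
\]
which is the characteristic function of a $N(0,1)$ random variable, so $S_n$ converges to
$N(0,1)$ in distribution by the result above.
```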
• Proving convergence when the limit is unknown
o For real numbers, monotone and bounded sequences converge
o When the above conditions do not hold, use the Cauchy criterion.
o Cauchy criterion for random variables
o For m.s. convergence in particular, the Cauchy criterion gives a simplified
condition for convergence in terms of correlation. This is a result that will be
used extensively in this course.
▪ Xn converges in the m.s. sense if and only if lim_{m,n→∞} E(Xm Xn) exists and is finite. Further, if Xn converges to X in the m.s. sense, then lim_{m,n→∞} E(Xm Xn) = E(X^2).
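One illustrative application of this correlation criterion (an assumed example, not necessarily the one used in the course):

```latex
Let $Z_1, Z_2, \dots$ be uncorrelated with $E[Z_k]=0$ and $E[Z_k^2]=1$, and set
$X_n = \sum_{k=1}^{n} Z_k/k$. Then for $m \le n$,
\[
  E[X_m X_n] \;=\; \sum_{k=1}^{m} \frac{1}{k^2}
  \;\xrightarrow[m,n\to\infty]{}\; \sum_{k=1}^{\infty} \frac{1}{k^2} \;=\; \frac{\pi^2}{6},
\]
a finite limit, so $(X_n)$ converges in the m.s. sense, and the limit $X$ satisfies
$E[X^2] = \pi^2/6$.
```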
• Useful Results:
o The limiting cdf is unique under d. convergence
o If Xn and Yn converge to X and Y, respectively, in the m.s. sense, then E(Xn+Yn) converges to E(X+Y), and E(XnYn) converges to E(XY).
o d. convergence iff convergence of the expectations of all bounded continuous functions of the random variables.
o Gaussian random variables can only converge to Gaussian random variables: the limit of a convergent sequence of Gaussian random variables is itself Gaussian (possibly degenerate)
o If |Xn| <= Y and E(Y^2) < ∞, then convergence of Xn to X in the p. sense implies convergence in the m.s. sense as well, and thus the expectations also converge.
• When a sequence of random variables converges, under what conditions do the expectations also converge to the expectation of the limiting random variable?
o Bounded convergence theorem: For a uniformly bounded sequence of random variables, p. convergence implies the convergence of the expectations
o Dominated convergence theorem: If |Xn| <= Y and E(Y) < ∞, then convergence of Xn to X in the p. sense implies that E(Xn) converges to E(X).
o Monotone convergence theorem: Consider a monotone non-decreasing
sequence of non-negative random variables. Then, the sequence of random
variables converges for all ω (as an extended random variable, i.e., the limit
could possibly be infinity) and the expectations also converge to the
expectation of the limiting random variable.
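A standard counterexample showing why some such condition (uniform boundedness, domination, or monotonicity) is needed (included here for illustration):

```latex
On $\Omega=[0,1]$ with the uniform probability measure, let $X_n = n\,1_{[0,1/n]}$.
Then $X_n \to 0$ a.s. (hence in p.), but $E[X_n] = n \cdot \tfrac{1}{n} = 1$ for every $n$,
so $E[X_n] \not\to E[0] = 0$.  No integrable $Y$ dominates the whole sequence, and the
sequence is neither uniformly bounded nor monotone, so none of the three theorems applies.
```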
• Convex functions and Jensen’s inequality. Special case: E(X^2) >= [E(X)]^2.
• Chernoff bound using the Markov inequality; Cramér’s theorem
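A sketch of the Chernoff bound derivation (standard, via Markov's inequality applied to e^{θX}):

```latex
For any $\theta > 0$, applying Markov's inequality to the non-negative random variable
$e^{\theta X}$ gives
\[
  P(X \ge a) \;=\; P\bigl(e^{\theta X} \ge e^{\theta a}\bigr)
  \;\le\; e^{-\theta a}\, E\bigl[e^{\theta X}\bigr].
\]
Optimizing over $\theta > 0$ yields the Chernoff bound
$P(X \ge a) \le \exp\bigl(-\sup_{\theta > 0}\,[\theta a - \ln E(e^{\theta X})]\bigr)$;
applied to sums of i.i.d. random variables, this exponent is the rate function appearing
in Cram\'er's theorem.
```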
Minimum Mean Square Error (MMSE) Estimation
• MMSE estimation and the Orthogonality principle:
o W in V is the MMSE estimator of X over a linear space V iff X-W is orthogonal to every Z in V. The MMSE estimator is also called the projection of X onto V.
o Further, E[(X-W)^2] = E(X^2) - E(W^2)
o Results on projections: Linearity; projections onto nested spaces and
orthogonal spaces
o Example: X is a random variable of interest, Y is an observation. Find a function g(Y) such that E[(X-g(Y))^2] is minimized. The MMSE estimate is g(Y)=E(X|Y); this can be proved using the orthogonality principle (a one-line sketch follows this list).
o The MMSE estimate is a projection: properties of projections.
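The one-line sketch referenced above (standard orthogonality argument):

```latex
For any candidate estimator $g(Y)$ with finite second moment,
\[
  E\bigl[(X-g(Y))^2\bigr]
  \;=\; E\bigl[(X-E(X|Y))^2\bigr] \;+\; E\bigl[(E(X|Y)-g(Y))^2\bigr]
  \;\ge\; E\bigl[(X-E(X|Y))^2\bigr],
\]
since the cross term vanishes: the error $X-E(X|Y)$ is orthogonal to every
(square-integrable) function of $Y$, in particular to $E(X|Y)-g(Y)$.
```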
• Random vectors: expectation or mean vector; correlation matrix and covariance
matrix
• Correlation and covariance matrices are positive semidefinite. Need to know how to
diagonalize a positive definite matrix.
• MMSE for vectors: element-by-element MMSE estimates; Cov(e) = Cov(X) - Cov(E(X|Y))
• If g(Y) is restricted to be an affine function, then the resulting MMSE estimate is
called the linear MMSE estimate:
o E(e)=0; Cov(e,Y)=0
o Ê(X|Y) = E(X) + Cov(X,Y)[Cov(Y)]^-1 (Y - E(Y))
o Cov(e) = Cov(X) - Cov(Ê(X|Y)) = Cov(X) - Cov(X,Y)[Cov(Y)]^-1 Cov(Y,X)
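A minimal numpy sketch of these two formulas (the dimensions, means, and covariance values below are made-up illustration inputs):

```python
import numpy as np

# Assumed illustration values: X is 2-dimensional, Y is 3-dimensional.
mean_X = np.array([1.0, -0.5])
mean_Y = np.array([0.0, 2.0, 1.0])
cov_X  = np.array([[2.0, 0.3],
                   [0.3, 1.0]])
cov_Y  = np.array([[1.5, 0.2, 0.0],
                   [0.2, 1.0, 0.1],
                   [0.0, 0.1, 2.0]])
cov_XY = np.array([[0.4, 0.0, 0.2],
                   [0.1, 0.3, 0.0]])

def lmmse(y):
    """Linear MMSE estimate Ehat(X|Y=y) = E(X) + Cov(X,Y) Cov(Y)^-1 (y - E(Y))."""
    K = cov_XY @ np.linalg.inv(cov_Y)      # estimator gain
    return mean_X + K @ (y - mean_Y)

# Error covariance: Cov(e) = Cov(X) - Cov(X,Y) Cov(Y)^-1 Cov(Y,X)
cov_e = cov_X - cov_XY @ np.linalg.inv(cov_Y) @ cov_XY.T

print(lmmse(np.array([0.5, 1.0, 0.0])))
print(cov_e)
```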
• Relationship between linear estimator and conditional expectations; tower property of
both estimators
• Jointly Gaussian random variables and Gaussian random vectors:
o Definition: every finite linear combination of the random variables is Gaussian
o Jointly Gaussian implies each random variable is Gaussian
o Linear combinations and limits of jointly Gaussian random variables are also
Gaussian.
o Characteristic function and pdf
o If X, Y are jointly Gaussian vectors, they are independent iff Cov(X,Y)=0
o If X and Y are jointly Gaussian, then E(X|Y) = Ê(X|Y), and X|Y=y is Gaussian with mean E(X|Y=y) and covariance Cov(e).
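As a concrete scalar instance of this result (standard bivariate Gaussian formulas):

```latex
If $(X,Y)$ is jointly Gaussian with means $\mu_X,\mu_Y$, variances $\sigma_X^2,\sigma_Y^2$,
and correlation coefficient $\rho$, then
\[
  E(X \mid Y=y) \;=\; \mu_X + \rho\,\frac{\sigma_X}{\sigma_Y}\,(y - \mu_Y),
  \qquad
  \mathrm{Var}(X \mid Y=y) \;=\; \sigma_X^2\,(1-\rho^2),
\]
and $X \mid Y=y$ is Gaussian with this mean and variance; the conditional variance does not
depend on $y$ and equals the error variance $\mathrm{Cov}(e)$ of the linear MMSE estimate.
```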
• Linear Innovations: a recursive method to compute the linear MMSE estimate conditioned on a sequence of observations. The key idea is to “orthogonalize” the observations.
• Kalman Filter: A recursive algorithm to obtain linear MMSE estimates for a linear
system. Uses the linear innovations concept.
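A minimal scalar Kalman filter sketch (the state-space model, noise variances, and variable names are assumptions for illustration, not notation from the course):

```python
import numpy as np

# Assumed scalar linear system:
#   state:       x[k] = a * x[k-1] + w[k],  w[k] ~ N(0, q)
#   observation: y[k] = c * x[k]   + v[k],  v[k] ~ N(0, r)
a, c, q, r = 0.9, 1.0, 0.1, 0.5

rng = np.random.default_rng(1)
n_steps = 200

# Simulate the system.
x = np.zeros(n_steps)
y = np.zeros(n_steps)
for k in range(1, n_steps):
    x[k] = a * x[k - 1] + rng.normal(scale=np.sqrt(q))
    y[k] = c * x[k] + rng.normal(scale=np.sqrt(r))

# Kalman filter: recursive linear MMSE estimate of x[k] given y[0..k].
x_hat, p = 0.0, 1.0          # prior mean and error variance of x[0]
estimates = []
for k in range(n_steps):
    # Predict (time update).
    x_pred = a * x_hat
    p_pred = a * a * p + q
    # Update (measurement update) using the innovation y[k] - c * x_pred.
    gain = p_pred * c / (c * c * p_pred + r)
    x_hat = x_pred + gain * (y[k] - c * x_pred)
    p = (1 - gain * c) * p_pred
    estimates.append(x_hat)

print("RMS estimation error:", np.sqrt(np.mean((np.array(estimates) - x) ** 2)))
```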