Lecture 2. Convergence theorems

1 Types of Convergence

To deal with the convergence arguments in the next section, we have to introduce the concept of a random experiment. By definition, a random experiment is a triple (Ω, A, P), where Ω denotes the sample space, A is some class of subsets of Ω, and P is a function from A into [0, 1], i.e. P : A → [0, 1]. Elements ω of the sample space Ω are referred to as outcomes of the random experiment. In each random experiment, there is one realized outcome ω. Elements A of A are called events. We say that event A happens if the realized outcome ω belongs to A, i.e. ω ∈ A. Event A does not occur if the realized outcome ω ∉ A. Each event A ∈ A has some probability of happening. This probability is denoted by P(A). Thus, the function P defines the probability of events A in A. It would be nice if we could define probability for all subsets of Ω. However, due to some technicalities, this is not always possible. Therefore we have to use only some class A of subsets of Ω, which may or may not contain all subsets of Ω. In this course, we will not talk about the class A at all. It is mentioned here only for completeness.

Once we have introduced the concept of the random experiment, we can represent random variables as functions on the sample space Ω. Thus, if ξ is a random variable, then ξ : Ω → R, i.e. for each realized outcome ω ∈ Ω, we have a realization ξ(ω) of the random variable ξ.

1.1 Definitions

Let ξ1, ..., ξn, ... be a sequence of random variables. Then, for any realized outcome ω ∈ Ω, we have a sequence of real numbers ξ1(ω), ..., ξn(ω), ...

Definition 1. We say that {ξn}∞n=1 converges to some random variable ξ almost surely if P{ω : ξn(ω) → ξ(ω)} = 1. In this case we write ξn → ξ a.s.

Definition 2. We say that {ξn}∞n=1 converges to ξ in probability if P{ω : |ξn(ω) − ξ(ω)| > ε} → 0 as n → ∞ for any ε > 0. In this case we write ξn →p ξ.

Definition 3. We say that {ξn}∞n=1 converges to ξ in quadratic mean if E[|ξn − ξ|²] → 0 as n → ∞. In this case we write ξn → ξ in L2.

Definition 4. We say that {ξn}∞n=1 converges to ξ in distribution if limn→∞ Fξn(x) = Fξ(x) for all x ∈ R at which Fξ(x) is continuous. In this case we write ξn ⇒ ξ.

We will discuss the relationships between the different types of convergence after we prove some inequalities.

1.2 Useful Inequalities

First, we have the Markov inequality.

Theorem 5. Let X be any nonnegative random variable such that E[X] exists. Then for any t > 0, we have P{X ≥ t} ≤ E[X]/t.

Proof. Since X is nonnegative,

    E[X] = ∫_0^∞ x f(x) dx
         = ∫_0^t x f(x) dx + ∫_t^∞ x f(x) dx
         ≥ ∫_t^∞ x f(x) dx
         ≥ t ∫_t^∞ f(x) dx
         = t P{X ≥ t},

where f denotes the pdf of X. A similar argument works for discrete random variables as well.

From the Markov inequality we can derive the Chebyshev inequality.

Theorem 6. For any random variable X with mean µ and variance V(X), and any t > 0, we have P{|X − µ| ≥ t} ≤ V(X)/t².

Proof. Note that |X − µ| ≥ t if and only if |X − µ|² ≥ t². Thus, P{|X − µ| ≥ t} = P{|X − µ|² ≥ t²}. Since |X − µ|² is a nonnegative random variable, P{|X − µ|² ≥ t²} ≤ E[|X − µ|²]/t² = V(X)/t² by the Markov inequality.

These two inequalities are of huge importance in statistics and probability.
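The following small simulation sketch (my own illustration, not part of the original notes; the Exponential(1) distribution and the sample size are arbitrary choices) estimates P{|X − µ| ≥ t} by Monte Carlo and compares it with the Chebyshev bound V(X)/t². It shows that the bound always holds but can be quite conservative:

    import numpy as np

    rng = np.random.default_rng(0)
    # Exponential(1) draws: mean mu = 1 and variance V(X) = 1.
    x = rng.exponential(scale=1.0, size=1_000_000)
    mu, var = 1.0, 1.0

    for t in [1.0, 2.0, 3.0]:
        empirical = np.mean(np.abs(x - mu) >= t)  # Monte Carlo estimate of P{|X - mu| >= t}
        bound = var / t**2                        # Chebyshev bound V(X)/t^2
        print(f"t = {t}: estimate = {empirical:.4f}, bound = {bound:.4f}")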
1.3 Relations between different types of convergence

We can use the Markov inequality to prove that convergence in quadratic mean implies convergence in probability:

Theorem 7. If ξn → ξ in L2, then ξn →p ξ.

Proof. By the Markov inequality, P{|ξn − ξ| > ε} = P{|ξn − ξ|² > ε²} ≤ E[|ξn − ξ|²]/ε² → 0 for any ε > 0.

Convergence in probability implies convergence in distribution:

Theorem 8. If ξn →p ξ, then ξn ⇒ ξ.

Proof. Note that ξn ≤ x and ξ > x + ε together imply |ξn − ξ| > ε. Thus, for any x ∈ R and ε > 0,

    Fξn(x) = P{ξn ≤ x}
           = P{ξn ≤ x, ξ ≤ x + ε} + P{ξn ≤ x, ξ > x + ε}
           ≤ P{ξ ≤ x + ε} + P{|ξn − ξ| > ε}
           = Fξ(x + ε) + P{|ξn − ξ| > ε}.

Similarly, Fξ(x − ε) ≤ Fξn(x) + P{|ξn − ξ| > ε}. Thus,

    Fξ(x − ε) − P{|ξn − ξ| > ε} ≤ Fξn(x) ≤ Fξ(x + ε) + P{|ξn − ξ| > ε}.

Next, if x is a point of continuity of Fξ, then for any δ > 0 there exists ε(δ) > 0 such that Fξ(x + ε(δ)) − δ ≤ Fξ(x) ≤ Fξ(x − ε(δ)) + δ. Therefore

    Fξ(x) − δ − P{|ξn − ξ| > ε(δ)} ≤ Fξn(x) ≤ Fξ(x) + δ + P{|ξn − ξ| > ε(δ)}.

Since ξn →p ξ, by definition P{|ξn − ξ| > ε(δ)} → 0 as n → ∞, so

    limsup_n |Fξn(x) − Fξ(x)| ≤ δ.

Since δ > 0 was arbitrary, Fξn(x) → Fξ(x) as n → ∞ at any x ∈ R where Fξ(x) is continuous.

As an exercise, prove that almost sure convergence implies convergence in probability. Also, one can show that if c is some constant and ξn ⇒ c, then ξn →p c. We have mentioned here all the correct implications. Any other implication is, in general, incorrect. Let us show, for example, that convergence in quadratic mean does not follow from convergence in probability. Let Ω = [0, 1] be the sample space with P the uniform (Lebesgue) measure. Let ξn(ω) = n if ω ∈ [0, 1/n] and ξn(ω) = 0 otherwise. Then ξn →p 0. Indeed, for any ε > 0 and all n > ε, P{|ξn − 0| > ε} = P{ω ∈ [0, 1/n]} = 1/n → 0 as n → ∞. On the other hand, E[|ξn − 0|²] = E[|ξn|²] = n² · (1/n) = n → ∞.

2 Slutsky theorem and Continuous mapping theorem

Let X, X1, ..., Xn, ... and Y, Y1, ..., Yn, ... be some random variables. Let g be some continuous function. Let c be some constant. Then

1. If Xn →p X and Yn →p Y, then Xn + Yn →p X + Y and XnYn →p XY.
2. If Xn ⇒ X and Yn →p c, then Xn + Yn ⇒ X + c and XnYn ⇒ cX.
3. If Xn →p X, then g(Xn) →p g(X).
4. If Xn ⇒ X, then g(Xn) ⇒ g(X).

The first and second statements are known as the Slutsky theorem. The third and fourth statements are known as the Continuous mapping theorem. These theorems are widely used in statistics. Note that, in general, Xn ⇒ X and Yn ⇒ Y do not imply Xn + Yn ⇒ X + Y or XnYn ⇒ XY.

3 Law of Large Numbers

Theorem 9. If {Xn}∞n=1 is a sequence of independent and identically distributed (iid) random variables with E[Xn] = µ and V(Xn) = σ² < ∞, then X̄n := Σ_{i=1}^n Xi/n → µ in L2 and, thus, in probability.

Proof. By linearity of expectation, E[X̄n] = E[Σ_{i=1}^n Xi/n] = Σ_{i=1}^n E[Xi]/n = µ. Thus,

    E[|X̄n − µ|²] = V(X̄n) = V(Σ_{i=1}^n Xi)/n² = Σ_{i=1}^n V(Xi)/n² = σ²/n,

where the third equality uses independence. Thus, E[|X̄n − µ|²] → 0 as n → ∞.

Another version of the law of large numbers is

Theorem 10. If {Xn}∞n=1 is a sequence of iid random variables with E[Xn] = µ and E[|Xn|] < ∞, then X̄n → µ a.s.

Proof. Omitted.

4 Central Limit Theorem

Let {Xn}∞n=1 be a sequence of iid random variables with mean µ and variance σ². We have seen already that E[Σ_{i=1}^n (Xi − µ)/√n] = 0 and V(Σ_{i=1}^n (Xi − µ)/√n) = σ². A much more remarkable result is the Central limit theorem:

Theorem 11. Σ_{i=1}^n (Xi − µ)/√n ⇒ N(0, σ²).

In the multivariate case, if V(Xi) = E[(Xi − E[Xi])(Xi − E[Xi])ᵀ] = Σ, then Σ_{i=1}^n (Xi − µ)/√n ⇒ N(0, Σ).
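As a quick numerical illustration of Theorem 11 (again a sketch of mine, not from the original notes; the Exponential(1) distribution, the sample size n, and the number of replications are arbitrary choices), one can simulate many standardized sums from a skewed distribution and compare their empirical quantiles with those of N(0, 1):

    import numpy as np

    rng = np.random.default_rng(0)
    n, reps = 300, 20_000
    mu, sigma = 1.0, 1.0                 # Exponential(1): mean 1, standard deviation 1
    x = rng.exponential(scale=1.0, size=(reps, n))
    # One standardized sum per replication: sum_{i=1}^n (Xi - mu) / (sigma * sqrt(n)).
    z = (x - mu).sum(axis=1) / (sigma * np.sqrt(n))

    # A few empirical quantiles against the corresponding N(0, 1) quantiles.
    for q, zq in [(0.05, -1.645), (0.50, 0.000), (0.95, 1.645)]:
        print(f"q = {q:.2f}: empirical = {np.quantile(z, q):+.3f}, N(0,1) = {zq:+.3f}")

Even though the summands are far from normal, the quantiles of the standardized sums are already close to the normal ones for moderate n.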
4.1 Example

Let the probability of a newborn being a boy be, say, 0.51. What is the probability that at least half out of 100 newborns will be boys? To answer this question, let Xi = 1 if the i-th newborn is a boy and Xi = 0 otherwise. Then Xi = 1 with probability p = 0.51 and Xi = 0 with probability 1 − p = 0.49. Therefore µ = E[Xi] = 0.51 and σ² = p(1 − p) = 0.51 · 0.49. Moreover, X1, ..., X100 are independent random variables. The total number of boys equals Σ_{i=1}^{100} Xi. Thus

    P{Σ_{i=1}^{100} Xi/100 ≥ 0.5} = P{Σ_{i=1}^{100} (Xi − µ)/100 ≥ −0.01}
                                  = P{Σ_{i=1}^{100} (Xi − µ)/(σ√100) ≥ −0.01 · √100/√(0.51 · 0.49)}
                                  ≈ 1 − Φ(−0.01 · √100/√(0.51 · 0.49)) ≈ 1 − Φ(−0.2) ≈ 0.58,

since Σ_{i=1}^n (Xi − µ)/(σ√n) ⇒ N(0, 1) as n → ∞ and, hence, P{Σ_{i=1}^n (Xi − µ)/(σ√n) ≤ x} → Φ(x) as n → ∞ for any x ∈ R. Note that here we used the fact that Φ(x) is continuous at all x ∈ R.

5 Delta method

Let X1, ..., Xn, ... be a sequence of iid random variables with mean µ and variance σ². As before, X̄n = Σ_{i=1}^n Xi/n. By the Central limit theorem, √n(X̄n − µ)/σ ⇒ N(0, 1). Suppose that g : R → R is some twice continuously differentiable function (i.e. it has at least two derivatives, and the second derivative is continuous). The delta method allows us to derive the limit (or, as we usually say, asymptotic) distribution of g(X̄n):

Theorem 12. If g′(µ) ≠ 0, then √n(g(X̄n) − g(µ))/σ ⇒ N(0, (g′(µ))²).

Proof. By the Taylor theorem with remainder, for any realization X̄n(ω), there is some µ*n(ω) between µ and X̄n(ω) such that

    g(X̄n(ω)) − g(µ) = g′(µ)(X̄n(ω) − µ) + g″(µ*n(ω))(X̄n(ω) − µ)²/2.   (1)

Thus, we have defined a new sequence of random variables, {µ*n}∞n=1. By the Law of large numbers, X̄n →p µ. Since µ*n is between µ and X̄n, µ*n →p µ as well. By the Continuous mapping theorem, g″(µ*n) →p g″(µ) since g″(x) is continuous. Moreover, by the Slutsky theorem, n^{1/4}(X̄n − µ) = √n(X̄n − µ)/n^{1/4} ⇒ 0 since √n(X̄n − µ) ⇒ N(0, σ²) and 1/n^{1/4} →p 0. Thus, n^{1/4}(X̄n − µ) →p 0. Moreover, √n(X̄n − µ)² = (n^{1/4}(X̄n − µ))² →p 0 since f(x) = x² is a continuous function. So, by the Slutsky theorem again, √n g″(µ*n(ω))(X̄n(ω) − µ)²/2 →p 0. In addition, by the Central limit theorem, √n g′(µ)(X̄n(ω) − µ) ⇒ N(0, σ²(g′(µ))²). Multiplying equation (1) by √n and applying the Slutsky theorem once more yields the result.

Note that this theorem also holds when g′(µ) = 0, but in this case the asymptotic distribution will be 0 (constant), i.e. degenerate. I recommend that you remember the argument used in this theorem as it is very typical in statistics and econometrics.

The Delta method has a multidimensional extension. Let X1, ..., Xn, ... be a sequence of iid k × 1 random vectors with mean µ and covariance matrix Σ. Then, by the multidimensional Central limit theorem, √n(X̄n − µ) ⇒ N(0, Σ). Let g : Rᵏ → R be a twice continuously differentiable function. Let τ² = (∂g(µ)/∂µ)ᵀ Σ (∂g(µ)/∂µ). Here ∂g(µ)/∂µ is a k × 1 vector whose i-th component equals ∂g(µ)/∂µi. Then √n(g(X̄n) − g(µ)) ⇒ N(0, τ²).

5.1 Example

Let X1, ..., Xn, ... be a sequence of iid random variables with mean µ and variance σ². What is the limiting distribution of (X̄n)²? Let g(x) = x². Then g′(µ) = 2µ. Thus, by the Delta method, √n((X̄n)² − µ²) ⇒ N(0, 4µ²σ²). Note that if µ = 0, then the limit distribution is degenerate.
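Finally, a simulation sketch checking this example (my own addition, not from the original notes; the Uniform[0, 1] distribution, n, and the replication count are arbitrary choices): it compares the Monte Carlo variance of √n((X̄n)² − µ²) with the delta-method value 4µ²σ² = 4 · (1/2)² · (1/12) = 1/12:

    import numpy as np

    rng = np.random.default_rng(0)
    n, reps = 500, 10_000
    mu, var = 0.5, 1.0 / 12.0            # Uniform[0, 1]: mean 1/2, variance 1/12
    x = rng.uniform(0.0, 1.0, size=(reps, n))
    xbar = x.mean(axis=1)
    stat = np.sqrt(n) * (xbar**2 - mu**2)  # sqrt(n) * ((X_bar_n)^2 - mu^2)

    # Delta method prediction: asymptotic variance (g'(mu))^2 sigma^2 = 4 mu^2 sigma^2.
    print("simulated variance: ", stat.var())
    print("delta-method value: ", 4 * mu**2 * var)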