
Lecture 2
Convergence theorems
1 Types of Convergence
To deal with convergence arguments in the next sections, we have to introduce the concept of a random experiment. By definition, a random experiment is a triple (Ω, A, P) where Ω denotes the sample space, A is some class of subsets of Ω, and P is a function from A into [0, 1], i.e. P : A → [0, 1]. Elements ω of the sample space Ω are referred to as outcomes of the random experiment. In each random experiment, there is one realized outcome ω. Elements A of A are called events. We say that event A happens if the realized outcome ω belongs to A, i.e. ω ∈ A; event A does not happen if ω ∉ A. Each event A ∈ A has some probability of happening, denoted by P(A). Thus, the function P defines the probability of events A in A. It would be nice if we could define a probability for all subsets of Ω. However, due to some technicalities, this is not always possible. Therefore we have to use only some class A of subsets of Ω, which may or may not contain all subsets of Ω. In this course, we will not talk about the class A at all; it is mentioned here only for completeness.
Once we have introduced the concept of a random experiment, we can represent random variables as functions on the sample space Ω. Thus, if ξ is a random variable, then ξ : Ω → R, i.e. for each realized outcome ω ∈ Ω, we have a realization ξ(ω) of the random variable ξ.
1.1 Definitions
Let ξ1, ..., ξn, ... be a sequence of random variables. Then, for any realized outcome ω ∈ Ω, we have a sequence of real numbers ξ1(ω), ..., ξn(ω), ...
Definition 1. We say that {ξn} converges to some random variable ξ almost surely if P{ω : ξn(ω) → ξ(ω)} = 1. In this case we write ξn → ξ a.s.
Definition 2. We say that {ξn} converges to ξ in probability if P{ω : |ξn(ω) − ξ(ω)| > ε} → 0 as n → ∞ for any ε > 0. In this case we write ξn →p ξ.
Definition 3. We say that {ξn} converges to ξ in quadratic mean if E[|ξn − ξ|²] → 0 as n → ∞. In this case we write ξn → ξ in L².
Definition 4. We say that {ξn} converges to ξ in distribution if Fξn(x) → Fξ(x) as n → ∞ for all x ∈ R at which Fξ is continuous. In this case we write ξn ⇒ ξ.
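These definitions can be made concrete with a small simulation. Below is a minimal sketch (an added illustration, not part of the original notes, assuming NumPy is available) for the sequence ξn = ξ + Zn with Zn ~ N(0, 1/n): it estimates P{|ξn − ξ| > ε} from Definition 2 and E[|ξn − ξ|²] from Definition 3, both of which shrink as n grows.

```python
import numpy as np

# Hypothetical example: xi_n = xi + Z_n with Z_n ~ N(0, 1/n), so
# xi_n -> xi in quadratic mean and hence in probability.
rng = np.random.default_rng(0)
xi = rng.normal(size=100_000)  # draws of the limit random variable xi
eps = 0.1
for n in [10, 100, 1000]:
    xi_n = xi + rng.normal(scale=1 / np.sqrt(n), size=xi.shape)
    prob = np.mean(np.abs(xi_n - xi) > eps)  # estimates P{|xi_n - xi| > eps}
    qm = np.mean((xi_n - xi) ** 2)           # estimates E[|xi_n - xi|^2]
    print(n, prob, qm)
```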
We will discuss the relationships between different types of convergence after we prove some inequalities.
1.2 Useful Inequalities
First, we have the Markov inequality.
Theorem 5. Let X be any nonnegative random variable such that E[X] exists. Then for any t > 0, we have P{X ≥ t} ≤ E[X]/t.
Proof. Since X is nonnegative,

E[X] = ∫_0^∞ x f(x) dx = ∫_0^t x f(x) dx + ∫_t^∞ x f(x) dx ≥ ∫_t^∞ x f(x) dx ≥ ∫_t^∞ t f(x) dx = t P{X ≥ t},

where f denotes the pdf of X. A similar argument works for discrete random variables as well.
From the Markov inequality we can derive the Chebyshev inequality.
Theorem 6. For any random variable X with mean µ and any t > 0, we have P{|X − µ| ≥ t} ≤ V(X)/t².

Proof. Note that |X − µ| ≥ t if and only if |X − µ|² ≥ t². Thus, P{|X − µ| ≥ t} = P{|X − µ|² ≥ t²}. Since |X − µ|² is a nonnegative random variable, P{|X − µ|² ≥ t²} ≤ E[|X − µ|²]/t² = V(X)/t² by the Markov inequality.
These two inequalities are of huge importance in statistics and probability.
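As a quick numerical sanity check, the following sketch (an added illustration, assuming NumPy is available) verifies both bounds by simulation for X ~ Exponential(1), where E[X] = V(X) = 1:

```python
import numpy as np

# Check the Markov and Chebyshev inequalities by simulation for
# X ~ Exponential(1), so that E[X] = 1 and V(X) = 1.
rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=1_000_000)
for t in [1.0, 2.0, 4.0]:
    markov = np.mean(x >= t) <= x.mean() / t                     # P{X >= t} <= E[X]/t
    cheb = np.mean(np.abs(x - x.mean()) >= t) <= x.var() / t**2  # P{|X - mu| >= t} <= V(X)/t^2
    print(t, markov, cheb)  # both should print True
```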
1.3 Relations between different types of convergence
We can use the Markov inequality to prove that convergence in quadratic mean implies convergence in
probability:
Theorem 7. If ξn → ξ in L², then ξn →p ξ.
Proof. By the Markov inequality,

P{|ξn − ξ| > ε} = P{|ξn − ξ|² > ε²} ≤ E[|ξn − ξ|²]/ε² → 0

for any ε > 0.
Convergence in probability implies convergence in distribution:
Theorem 8. If ξn →p ξ, then ξn ⇒ ξ.
Proof. Note that ξn ≤ x and ξ > x + ε together imply |ξn − ξ| > ε. Thus, for any x ∈ R and ε > 0,

Fξn(x) = P{ξn ≤ x} = P{ξn ≤ x, ξ ≤ x + ε} + P{ξn ≤ x, ξ > x + ε} ≤ P{ξ ≤ x + ε} + P{|ξn − ξ| > ε} = Fξ(x + ε) + P{|ξn − ξ| > ε}.

Similarly,

Fξ(x − ε) ≤ Fξn(x) + P{|ξn − ξ| > ε}.

Thus,

Fξ(x − ε) − P{|ξn − ξ| > ε} ≤ Fξn(x) ≤ Fξ(x + ε) + P{|ξn − ξ| > ε}.

Next, if x is a point of continuity of Fξ, then for any δ > 0 there exists ε(δ) > 0 such that

Fξ(x + ε(δ)) − δ ≤ Fξ(x) ≤ Fξ(x − ε(δ)) + δ.

Therefore

Fξ(x) − δ − P{|ξn − ξ| > ε(δ)} ≤ Fξn(x) ≤ Fξ(x) + δ + P{|ξn − ξ| > ε(δ)}.

Next, since ξn →p ξ, we have P{|ξn − ξ| > ε(δ)} → 0 by definition, and hence

lim sup_n |Fξn(x) − Fξ(x)| ≤ δ.

Since δ > 0 was arbitrary, Fξn(x) → Fξ(x) as n → ∞ at any x ∈ R where Fξ is continuous.
As an exercise, prove that almost sure convergence implies convergence in probability. Also, one can
show that if c is some constant and ξn ⇒ c, then ξn →p c.
We have mentioned here all the correct implications. Any other implication is, in general, incorrect. Let us show, for example, that convergence in quadratic mean does not follow from convergence in probability. Let Ω = [0, 1] be a sample space equipped with the uniform probability measure. Let ξn(ω) = n if ω ∈ [0, 1/n] and ξn(ω) = 0 otherwise. Then ξn →p 0: indeed, for any ε > 0 and all n > ε, P{|ξn − 0| > ε} = P{ω ∈ [0, 1/n]} = 1/n → 0 as n → ∞. On the other hand, E[|ξn − 0|²] = E[ξn²] = n² · (1/n) = n → ∞.
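A simulation sketch of this example (an added illustration, assuming NumPy is available) makes the divergence visible:

```python
import numpy as np

# Counterexample sketch: omega ~ Uniform[0, 1], xi_n(omega) = n if omega <= 1/n.
rng = np.random.default_rng(0)
omega = rng.uniform(size=1_000_000)
for n in [10, 100, 1000]:
    xi_n = n * (omega <= 1 / n)
    print(n,
          np.mean(np.abs(xi_n) > 0.5),  # ~ 1/n -> 0: convergence in probability
          np.mean(xi_n ** 2))           # ~ n -> infinity: no L^2 convergence
```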
2 Slutsky theorem and Continuous mapping theorem
Let X, X1, ..., Xn, ... and Y, Y1, ..., Yn, ... be some random variables. Let g be some continuous function. Let c be some constant. Then

1. If Xn →p X and Yn →p Y, then Xn + Yn →p X + Y and XnYn →p XY.

2. If Xn ⇒ X and Yn →p c, then Xn + Yn ⇒ X + c and XnYn ⇒ cX.
3. If Xn →p X, then g(Xn) →p g(X).

4. If Xn ⇒ X, then g(Xn) ⇒ g(X).
The first and second statements are known as the Slutsky theorem. The third and fourth statements are known as the Continuous mapping theorem. These theorems are widely used in statistics.
Note that, in general, Xn ⇒ X and Yn ⇒ Y do not imply Xn + Yn ⇒ X + Y or XnYn ⇒ XY.
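To illustrate statement 2 numerically, here is a minimal sketch (an added example, assuming NumPy is available): Xn is a standardized sample mean, so Xn ⇒ N(0, 1) by the Central limit theorem below, while Yn is the unstandardized sample mean, so Yn →p 2 by the Law of large numbers; hence XnYn ⇒ N(0, 4).

```python
import numpy as np

# Slutsky sketch: X_n => N(0, 1) and Y_n ->p 2, so X_n * Y_n => N(0, 4).
rng = np.random.default_rng(0)
n, reps = 2000, 5000
u = rng.uniform(0, 4, size=(reps, n))           # iid, mean 2, variance 4/3
ybar = u.mean(axis=1)                           # Y_n ->p 2 (LLN)
x_n = np.sqrt(n) * (ybar - 2) / np.sqrt(4 / 3)  # standardized mean => N(0, 1)
print(np.var(x_n * ybar))                       # ~ 4, the variance of N(0, 4)
```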
3 Law of Large Numbers
Theorem 9. If {Xn} is a sequence of independent and identically distributed (iid) random variables with E[Xn] = µ and V(Xn) = σ² < ∞, then X̄n := ∑_{i=1}^n Xi/n → µ in L² and, thus, in probability.

Proof. By linearity of expectation, E[X̄n] = E[∑_{i=1}^n Xi/n] = ∑_{i=1}^n E[Xi]/n = µ. Thus,

E[|X̄n − µ|²] = V(X̄n) = V(∑_{i=1}^n Xi)/n² = ∑_{i=1}^n V(Xi)/n² = σ²/n,

where the third equality uses the independence of the Xi. Thus, E[|X̄n − µ|²] → 0 as n → ∞.
Another version of the law of large numbers is
Theorem 10. If {Xn} is a sequence of iid random variables with E[Xn] = µ and E[|Xn|] < ∞, then X̄n → µ a.s.
Proof. Omitted.
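The following simulation sketch (an added illustration, assuming NumPy is available) shows both the convergence of X̄n to µ and the σ²/n rate derived in the proof of Theorem 9:

```python
import numpy as np

# LLN sketch: sample means of iid Bernoulli(0.3) draws approach mu = 0.3,
# and E[|X_bar_n - mu|^2] ~ sigma^2/n with sigma^2 = 0.3 * 0.7.
rng = np.random.default_rng(0)
for n in [10, 100, 1000, 10_000]:
    x = rng.binomial(1, 0.3, size=(2000, n))  # 2000 replications of X_1..X_n
    xbar = x.mean(axis=1)
    print(n, xbar.mean(), np.mean((xbar - 0.3) ** 2), 0.3 * 0.7 / n)
```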
4 Central Limit Theorem
Let {Xn} be a sequence of iid random variables with mean µ and variance σ². We have seen already that E[∑_{i=1}^n (Xi − µ)/√n] = 0 and V(∑_{i=1}^n (Xi − µ)/√n) = σ². A much more remarkable result is the Central limit theorem:

Theorem 11. ∑_{i=1}^n (Xi − µ)/√n ⇒ N(0, σ²).

In the multivariate case, if V(Xi) = E[(Xi − E[Xi])(Xi − E[Xi])^T] = Σ, then ∑_{i=1}^n (Xi − µ)/√n ⇒ N(0, Σ).
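The theorem can be checked by simulation. The sketch below (an added illustration, assuming NumPy is available) standardizes sums of Exponential(1) draws, for which µ = σ² = 1, and compares empirical probabilities with Φ:

```python
import numpy as np

# CLT sketch: standardized sums of iid Exponential(1) draws (mu = sigma = 1)
# should be approximately N(0, 1).
rng = np.random.default_rng(0)
n, reps = 500, 20_000
x = rng.exponential(size=(reps, n))
z = (x.sum(axis=1) - n) / np.sqrt(n)  # sum of (X_i - mu) / sqrt(n)
for c in [-1.0, 0.0, 1.0]:
    print(c, np.mean(z <= c))  # ~ Phi(c): about 0.16, 0.50, 0.84
```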
4.1 Example
Let the probability of a newborn being a boy be, say, 0.51. What is the probability that at least half out of 100 newborns will be boys? To answer this question, let Xi = 1 if the i-th newborn is a boy and Xi = 0 otherwise. Then Xi = 1 with probability p = 0.51 and Xi = 0 with probability 1 − p = 0.49. Therefore µ = E[Xi] = 0.51 and σ² = p(1 − p) = 0.51 · 0.49. Moreover, X1, ..., X100 are independent random variables. The total number of boys equals ∑_{i=1}^{100} Xi. Thus

P{∑_{i=1}^{100} Xi/100 ≥ 0.5} = P{∑_{i=1}^{100} (Xi − µ)/100 ≥ −0.01}
= P{∑_{i=1}^{100} (Xi − µ)/(σ√100) ≥ −0.01 · √100/√(0.51 · 0.49)}
≈ 1 − Φ(−0.01 · √100/√(0.51 · 0.49)) ≈ 1 − Φ(−0.2) ≈ 0.58

since ∑_{i=1}^n (Xi − µ)/(σ√n) ⇒ N(0, 1) as n → ∞ and, hence, P{∑_{i=1}^n (Xi − µ)/(σ√n) ≤ x} → Φ(x) as n → ∞ for any x ∈ R. Note that here we used the fact that Φ(x) is continuous at all x ∈ R.
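As a check, the approximation can be compared with the exact binomial probability. A minimal sketch (an added example, assuming SciPy is available):

```python
from math import sqrt
from scipy.stats import norm, binom

# Compare the CLT approximation with the exact binomial probability
# of at least 50 boys among 100 newborns when p = 0.51.
z = -0.01 * sqrt(100) / sqrt(0.51 * 0.49)
print(1 - norm.cdf(z))               # CLT approximation, ~ 0.58
print(1 - binom.cdf(49, 100, 0.51))  # exact probability, ~ 0.62
```

The gap between the two numbers reflects the fact that n = 100 is finite and no continuity correction is used in the normal approximation.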
5 Delta method
Let X1, ..., Xn, ... be a sequence of iid random variables with mean µ and variance σ². As before, X̄n = ∑_{i=1}^n Xi/n. By the Central limit theorem, √n(X̄n − µ)/σ ⇒ N(0, 1). Suppose that g : R → R is some twice continuously differentiable function (i.e. it has at least two derivatives, and the second derivative is continuous). The delta method allows us to derive the limit (or, as we usually say, asymptotic) distribution of g(X̄n):
Theorem 12. If g′(µ) ≠ 0, then √n(g(X̄n) − g(µ))/σ ⇒ N(0, (g′(µ))²).
Proof. By the Taylor theorem with remainder, for any realization X̄n(ω), there is some µ*n(ω) between µ and X̄n(ω) such that

g(X̄n(ω)) − g(µ) = g′(µ)(X̄n(ω) − µ) + g″(µ*n(ω))(X̄n(ω) − µ)²/2.    (1)

Thus, we have defined a new sequence of random variables, {µ*n}. By the Law of large numbers, X̄n →p µ. Since µ*n is between µ and X̄n, µ*n →p µ as well. By the Continuous mapping theorem, g″(µ*n) →p g″(µ) since g″(x) is continuous. Moreover, by the Slutsky theorem, n^{1/4}(X̄n − µ) = √n(X̄n − µ)/n^{1/4} ⇒ 0 since √n(X̄n − µ) ⇒ N(0, σ²) and 1/n^{1/4} →p 0. Thus, n^{1/4}(X̄n − µ) →p 0. Moreover, √n(X̄n − µ)² = (n^{1/4}(X̄n − µ))² →p 0 since f(x) = x² is a continuous function. So, by the Slutsky theorem again, √n g″(µ*n)(X̄n − µ)²/2 →p 0. In addition, by the Central limit theorem, √n g′(µ)(X̄n − µ) ⇒ N(0, σ²(g′(µ))²). Multiplying equation (1) by √n and applying the Slutsky theorem once more yields √n(g(X̄n) − g(µ)) ⇒ N(0, σ²(g′(µ))²); dividing by σ gives the result.
Note that this theorem also holds when g′(µ) = 0, but in this case the asymptotic distribution will be 0 (constant), i.e. degenerate. I recommend that you remember the argument used in this theorem as it is very typical in statistics and econometrics.
The Delta method has a multidimensional extension. Let X1, ..., Xn, ... be a sequence of iid k × 1 random vectors with mean µ and covariance matrix Σ. Then, by the multidimensional Central limit theorem, √n(X̄n − µ) ⇒ N(0, Σ). Let g : R^k → R be a twice continuously differentiable function. Let τ² = (∂g(µ)/∂µ)^T Σ (∂g(µ)/∂µ), where ∂g(µ)/∂µ is the k × 1 vector whose i-th component equals ∂g(µ)/∂µi. Then √n(g(X̄n) − g(µ)) ⇒ N(0, τ²).
5.1 Example
Let X1, ..., Xn, ... be a sequence of iid random variables with mean µ and variance σ². What is the limiting distribution of (X̄n)²? Let g(x) = x². Then g′(µ) = 2µ. Thus, by the Delta method, √n((X̄n)² − µ²) ⇒ N(0, 4µ²σ²). Note that if µ = 0, then the limit distribution is degenerate.
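This limit can be verified by simulation. A minimal sketch (an added illustration, assuming NumPy is available) with µ = 2 and σ² = 1, so that 4µ²σ² = 16:

```python
import numpy as np

# Delta method sketch: g(x) = x^2, mu = 2, sigma^2 = 1, so
# sqrt(n) * ((X_bar_n)^2 - mu^2) should be approximately N(0, 16).
rng = np.random.default_rng(0)
n, reps = 1000, 10_000
x = rng.normal(loc=2.0, scale=1.0, size=(reps, n))
stat = np.sqrt(n) * (x.mean(axis=1) ** 2 - 4.0)
print(stat.mean(), stat.var())  # ~ 0 and ~ 16 = 4 * mu^2 * sigma^2
```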
MIT OpenCourseWare
http://ocw.mit.edu
14.381 Statistical Methods in Economics
Fall 2013
For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.