Lecture 7: Chebyshev's Inequality, LLN, and the CLT
Sta 111, Colin Rundel, May 22, 2014

Markov's & Chebyshev's Inequalities

Markov's Inequality
For any random variable X ≥ 0 and constant a > 0,

    P(X ≥ a) ≤ E(X)/a

Corollary (Chebyshev's Inequality):

    P(|X − E(X)| ≥ a) ≤ Var(X)/a²

The inequality says that the probability that X is far away from its mean is bounded by a quantity that increases as Var(X) increases.

Derivation of Markov's Inequality
Let X be a random variable such that X ≥ 0. Then for any a > 0,

    E(X) ≥ E(X · 1{X ≥ a}) ≥ a · E(1{X ≥ a}) = a · P(X ≥ a)

and dividing through by a gives P(X ≥ a) ≤ E(X)/a.

Derivation of Chebyshev's Inequality
Proposition: if f(x) is a non-decreasing function then P(X ≥ a) = P(f(X) ≥ f(a)). Therefore,

    P(X ≥ a) ≤ E[f(X)]/f(a)

If we take the positive-valued random variable |X − E(X)| and f(x) = x², then

    P(|X − E(X)| ≥ a) = P((X − E(X))² ≥ a²) ≤ E[(X − E(X))²]/a² = Var(X)/a²

so

    P(|X − E(X)| ≥ a) ≤ Var(X)/a²

Using Markov's and Chebyshev's Inequalities
Suppose that it is known that the number of items produced in a factory during a week is a random variable X with mean 50. (Note that we know nothing else about the distribution of X.)

(a) What can be said about the probability that this week's production will exceed 75 units?
(b) If the variance of a week's production is known to equal 25, what can be said about the probability that this week's production will be between 40 and 60 units?

Chebyshev's Inequality - Example
Let's use Chebyshev's inequality to bound the probability of being within 1, 2, or 3 standard deviations of the mean, for any random variable. If we define a = kσ, where σ = √Var(X), then

    P(|X − E(X)| ≥ kσ) ≤ Var(X)/(k²σ²) = 1/k²

Accuracy of Chebyshev's Inequality
If X ∼ N(µ, σ²), let Z = (X − µ)/σ ∼ N(0, 1). Comparing the exact normal probabilities (the empirical rule) with Chebyshev's bound for an arbitrary random variable U:

    Empirical rule:                           Chebyshev's inequality:
    1 − P(−1 ≤ Z ≤ 1) ≈ 1 − 0.68  = 0.32      1 − P(−1 ≤ U ≤ 1) ≤ 1/1² = 1
    1 − P(−2 ≤ Z ≤ 2) ≈ 1 − 0.95  = 0.05      1 − P(−2 ≤ U ≤ 2) ≤ 1/2² = 0.25
    1 − P(−3 ≤ Z ≤ 3) ≈ 1 − 0.997 = 0.003     1 − P(−3 ≤ U ≤ 3) ≤ 1/3² ≈ 0.11

Why do we care if it is so inaccurate? Because the bound holds for every distribution with finite variance; that universality is exactly what the Law of Large Numbers will need. Two quick numeric checks follow.
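As a quick check on the factory example above, here is a minimal Python sketch (my own addition, not part of the original slides) evaluating both bounds: Markov for part (a) and Chebyshev for part (b).

```python
# Factory example: X is weekly production with E(X) = 50.
mu = 50

# (a) Markov: P(X >= 75) <= E(X)/75, valid because X >= 0.
a = 75
markov_bound = mu / a
print(f"P(X >= {a}) <= {markov_bound:.3f}")  # <= 0.667

# (b) With Var(X) = 25, Chebyshev: P(|X - 50| >= 10) <= 25/10^2 = 0.25,
# so P(40 < X < 60) = 1 - P(|X - 50| >= 10) >= 0.75.
var = 25
dist = 10
cheb_bound = var / dist**2
print(f"P(40 < X < 60) >= {1 - cheb_bound:.2f}")
```

So knowing only the mean, the chance of exceeding 75 units is at most about 67%; adding the variance, the chance of landing between 40 and 60 units is at least 75%.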
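The accuracy comparison in the table is just as easy to reproduce. A small sketch using only the standard library (the helper Phi is mine; it computes the exact normal CDF via math.erf):

```python
import math

def Phi(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

for k in (1, 2, 3):
    exact = 1 - (Phi(k) - Phi(-k))  # exact normal tail probability P(|Z| > k)
    chebyshev = 1 / k**2            # Chebyshev bound, valid for *any* distribution
    print(f"k={k}: normal tail = {exact:.4f}, Chebyshev bound = {chebyshev:.4f}")
```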
Law of Large Numbers

Independent and Identically Distributed (iid)
A collection of random variables that share the same probability distribution and are all mutually independent.

Example: if X ∼ Binom(n, p) then X = Σ_{i=1}^n Yi where Y1, …, Yn are iid Bern(p).

Sums and Averages of iid Random Variables
Let X1, X2, …, Xn be iid from some probability distribution D with E(Xi) = µ and Var(Xi) = σ². Define Sn = X1 + X2 + · · · + Xn and X̄n = (X1 + X2 + · · · + Xn)/n = Sn/n. Then, by linearity of expectation and independence,

    E(Sn) = nµ      Var(Sn) = nσ²
    E(X̄n) = µ      Var(X̄n) = σ²/n

Weak Law of Large Numbers
Based on these results and Chebyshev's inequality (itself a corollary of Markov's inequality) we can show the following: for any ε > 0,

    P(|X̄n − µ| ≥ ε) ≤ Var(X̄n)/ε² = σ²/(nε²)

Therefore, as long as σ² < ∞,

    lim_{n→∞} P(|X̄n − µ| ≥ ε) = 0   ⇒   lim_{n→∞} P(|X̄n − µ| < ε) = 1

Law of Large Numbers
Weak Law of Large Numbers (X̄n converges in probability to µ):

    lim_{n→∞} P(|X̄n − µ| > ε) = 0

Strong Law of Large Numbers (X̄n converges almost surely to µ):

    P(lim_{n→∞} X̄n = µ) = 1

The Strong LLN is the more powerful result (it implies the Weak LLN), but its proof is more complicated.

LLN - Example
How large a random sample must be taken from a given distribution in order for the probability to be at least 0.99 that the sample mean will be within 2 standard deviations of the mean of the distribution? (By the bound above, P(|X̄n − µ| ≥ 2σ) ≤ σ²/(n(2σ)²) = 1/(4n) ≤ 0.01 requires n ≥ 25.)

What about a 0.95 probability of being within 1 standard deviation of the mean? (Here 1/n ≤ 0.05, so n ≥ 20.)

LLN and CLT
The law of large numbers shows us that

    lim_{n→∞} (Sn − nµ)/n = lim_{n→∞} (X̄n − µ) = 0   (in probability)

which shows that for large n, Sn ≈ nµ. What happens if we divide by something that grows more slowly than n, like √n?

    (Sn − nµ)/√n = √n (X̄n − µ) →d N(0, σ²)   as n → ∞

This is the Central Limit Theorem, of which the de Moivre-Laplace theorem for the normal approximation to the binomial is a special case. Hopefully by the end of this class we will have the tools to prove it.

Equivalence of de Moivre-Laplace with CLT
We have already seen that a Binomial random variable is equivalent to the sum of n iid Bernoulli random variables: let X ∼ Binom(n, p) where X = Σ_{i=1}^n Yi with Y1, …, Yn iid Bern(p), E(Yi) = p, and Var(Yi) = p(1 − p).

Then for X = Sn = Σ Yi, de Moivre-Laplace tells us:

    P(a ≤ X ≤ b) = P( (a − np)/√(np(1−p)) ≤ (X − np)/√(np(1−p)) ≤ (b − np)/√(np(1−p)) )
                 ≈ Φ( (b − np)/√(np(1−p)) ) − Φ( (a − np)/√(np(1−p)) )

Equivalence of de Moivre-Laplace with CLT, cont.
The Central Limit Theorem gives us

    P( c ≤ (Sn − nµ)/(σ√n) ≤ d ) ≈ Φ(d) − Φ(c)

With µ = p and σ² = p(1 − p) for the Bernoulli terms,

    P(a ≤ X ≤ b) = P( (a − nµ)/(σ√n) ≤ (X − nµ)/(σ√n) ≤ (b − nµ)/(σ√n) )
                 = P( (a − np)/√(np(1−p)) ≤ (Sn − np)/√(np(1−p)) ≤ (b − np)/√(np(1−p)) )
                 ≈ Φ( (b − np)/√(np(1−p)) ) − Φ( (a − np)/√(np(1−p)) )

which is exactly the de Moivre-Laplace statement.
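To see the weak LLN above in action, here is a small simulation sketch (my addition; the seed and the choice of Bern(0.3) are arbitrary). The running average of iid draws settles near the true mean p as n grows:

```python
import random

random.seed(111)
p = 0.3        # true mean of Bern(p)
total = 0.0

for n in range(1, 100_001):
    total += 1.0 if random.random() < p else 0.0   # one Bern(p) draw
    if n in (10, 100, 1_000, 10_000, 100_000):
        xbar = total / n
        print(f"n = {n:>6}: xbar = {xbar:.4f}  (|xbar - p| = {abs(xbar - p):.4f})")
```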
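Similarly, a sketch checking the de Moivre-Laplace / CLT approximation against an exact binomial probability; the parameters n = 100, p = 0.5 and the interval [45, 55] are my choices for illustration:

```python
import math

def Phi(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

n, p = 100, 0.5
a, b = 45, 55
sd = math.sqrt(n * p * (1 - p))

# Exact binomial probability P(a <= X <= b)
exact = sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(a, b + 1))

# de Moivre-Laplace / CLT approximation
approx = Phi((b - n * p) / sd) - Phi((a - n * p) / sd)

# With a +/- 0.5 continuity correction (standard for this approximation,
# though not covered in the notes above)
approx_cc = Phi((b + 0.5 - n * p) / sd) - Phi((a - 0.5 - n * p) / sd)

print(f"exact = {exact:.4f}, approx = {approx:.4f}, with correction = {approx_cc:.4f}")
```

The plain approximation is off by a few percentage points at this n; the continuity correction brings it to within rounding of the exact value.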
Why Do We Care? (Statistics)
In general we are interested in making statements about how the world works, and we usually do this based on empirical evidence. That tends to involve observing or measuring something many times.

The LLN tells us that if we average our results we'll get closer and closer to the true value as we take more measurements. The CLT tells us that the distribution of our average is approximately normal once we take enough measurements, which lets us quantify our uncertainty about the "true" value.

Example - Polling: I can't interview everyone about their opinions, but if I interview enough people my estimate should be close, and I can use the CLT to approximate my margin of error.

Astronomy Example
An astronomer is interested in measuring, in light years, the distance from his observatory to a distant star. Although the astronomer has a measuring technique, he knows that, because of changing atmospheric conditions and measurement error, each measurement will yield not the exact distance but merely an estimate. As a result, the astronomer plans to make a series of measurements and use their average value as his estimate of the actual distance.

If the astronomer believes that the values of the measurements are independent and identically distributed random variables having a common mean d (the actual distance) and a common variance of 4 (light years²), how many measurements should he make to be reasonably sure that his estimated distance is accurate to within ±0.5 light years?

Astronomy Example, cont.
We need to decide on a level that counts as being "reasonably sure"; this is usually taken to be 95%, but the choice is somewhat arbitrary. Using the CLT with σ = 2, we need 2Φ(0.5√n/2) − 1 ≥ 0.95, i.e. 0.25√n ≥ 1.96, which gives n ≥ 61.5 and hence n = 62 measurements.

We can also solve this problem using Chebyshev's inequality: P(|X̄n − d| ≥ 0.5) ≤ (4/n)/0.5² = 16/n ≤ 0.05 requires n ≥ 320, a far more conservative answer since Chebyshev assumes nothing about the shape of the distribution. (Both calculations are checked in a code sketch below.)

The CLT-based estimate depends on how quickly the distribution of the average of the measurements converges to the normal distribution. The CLT only guarantees convergence as n → ∞, but most distributions converge much more quickly (think of the np ≥ 10 and n(1 − p) ≥ 10 requirement for the normal approximation to the binomial).

Central Limit Theorem

Moment Generating Function - Properties
If X and Y are independent random variables then the moment generating function for the distribution of X + Y is

    M_{X+Y}(t) = E[e^{t(X+Y)}] = E[e^{tX} e^{tY}] = E[e^{tX}] E[e^{tY}] = M_X(t) M_Y(t)

Similarly, the moment generating function of Sn, the sum of iid random variables X1, X2, …, Xn, is

    M_{Sn}(t) = [M_{Xi}(t)]^n

Moment Generating Function - Unit Normal
Let Z ∼ N(0, 1). Then

    M_Z(t) = E[e^{tZ}] = ∫_{−∞}^{∞} e^{tx} · (1/√(2π)) e^{−x²/2} dx
           = (1/√(2π)) ∫_{−∞}^{∞} e^{−(x² − 2tx)/2} dx
           = (1/√(2π)) ∫_{−∞}^{∞} e^{−(x − t)²/2 + t²/2} dx
           = e^{t²/2} · (1/√(2π)) ∫_{−∞}^{∞} e^{−(x − t)²/2} dx
           = e^{t²/2}

since the remaining integrand is the density of a N(t, 1) random variable and so integrates to 1.

Central Limit Theorem
Let X1, …, Xn be a sequence of independent and identically distributed random variables, each having mean µ and variance σ². Then the distribution of

    (X1 + · · · + Xn − nµ)/(σ√n)

tends to the unit normal as n → ∞. That is, for −∞ < a < ∞,

    P( (X1 + · · · + Xn − nµ)/(σ√n) ≤ a ) → (1/√(2π)) ∫_{−∞}^{a} e^{−x²/2} dx = Φ(a)   as n → ∞

Sketch of Proof
Proposition: Let X1, X2, … be a sequence of independent and identically distributed random variables and Sn = X1 + · · · + Xn, where Sn has distribution function f_{Sn} and moment generating function M_{Sn}, n ≥ 1. Let Z be a random variable with distribution function f_Z and moment generating function M_Z. If M_{Sn}(t) → M_Z(t) for all t, then f_{Sn}(t) → f_Z(t) for all t at which f_Z(t) is continuous.

We can therefore prove the CLT by letting Z ∼ N(0, 1), with M_Z(t) = e^{t²/2}, and showing that for any Sn, M_{Sn/√n}(t) → e^{t²/2} as n → ∞.
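Returning to the astronomy example, here is a minimal sketch of both sample-size calculations (assuming the 95% level; 1.96 is the standard normal 97.5% quantile):

```python
import math

sigma2 = 4.0    # common variance of each measurement (light years^2)
eps = 0.5       # desired accuracy, +/- 0.5 light years
alpha = 0.05    # 1 - 0.95, the tolerated error probability

# CLT: need eps*sqrt(n)/sigma >= 1.96
z = 1.96
n_clt = math.ceil((z * math.sqrt(sigma2) / eps) ** 2)

# Chebyshev: P(|Xbar_n - d| >= eps) <= sigma^2/(n eps^2) <= alpha
n_cheb = math.ceil(sigma2 / (alpha * eps**2))

print(f"CLT:       n >= {n_clt}")    # 62
print(f"Chebyshev: n >= {n_cheb}")   # 320
```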
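The unit-normal MGF result is also easy to sanity-check by Monte Carlo. A throwaway sketch (sample size and seed arbitrary) comparing an empirical estimate of E[e^{tZ}] with e^{t²/2}:

```python
import math
import random

random.seed(7)
N = 200_000
ts = (0.5, 1.0, 1.5)
sums = [0.0] * len(ts)

for _ in range(N):
    z = random.gauss(0.0, 1.0)           # one standard normal draw
    for j, t in enumerate(ts):
        sums[j] += math.exp(t * z)        # accumulate e^{tZ}

for t, s in zip(ts, sums):
    print(f"t={t}: Monte Carlo = {s / N:.3f}, e^(t^2/2) = {math.exp(t**2 / 2):.3f}")
```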
Proof of the CLT
Some simplifying assumptions and notation:

    E(Xi) = 0
    Var(Xi) = 1
    M_{Xi}(t) exists and is finite
    L_X(t) = log M_X(t)

Also, remember L'Hospital's rule (for indeterminate forms):

    lim_{x→∞} f(x)/g(x) = lim_{x→∞} f′(x)/g′(x)

Proof of the CLT, cont.
The moment generating function of Xi/√n is given by

    M_{Xi/√n}(t) = E[exp(t Xi/√n)] = M_{Xi}(t/√n)

and thus the moment generating function of Sn/√n = Σ_{i=1}^n Xi/√n is given by

    M_{Sn/√n}(t) = [M_{Xi}(t/√n)]^n

Therefore, in order to show M_{Sn/√n}(t) → M_Z(t) we need to show

    [M_{Xi}(t/√n)]^n → e^{t²/2},   or equivalently   n L_{Xi}(t/√n) → t²/2

Proof of the CLT, cont.
Evaluating L_{Xi} and its first two derivatives at 0:

    L_{Xi}(0) = log M_{Xi}(0) = log 1 = 0

    L′_{Xi}(t) = (d/dt) log M_{Xi}(t) = M′_{Xi}(t)/M_{Xi}(t)
    L′_{Xi}(0) = M′_{Xi}(0)/M_{Xi}(0) = µ/1 = 0

    L″_{Xi}(t) = (d/dt) [M′_{Xi}(t)/M_{Xi}(t)] = [M_{Xi}(t) M″_{Xi}(t) − (M′_{Xi}(t))²] / [M_{Xi}(t)]²
    L″_{Xi}(0) = [M_{Xi}(0) M″_{Xi}(0) − (M′_{Xi}(0))²] / [M_{Xi}(0)]² = E(Xi²) − E(Xi)² = σ² = 1

Proof of the CLT, cont.
Writing n L(t/√n) = L(t/√n)/n⁻¹, which is of the form 0/0 as n → ∞:

    lim_{n→∞} L(t/√n)/n⁻¹
      = lim_{n→∞} [L′(t/√n) · (−½ t n^{−3/2})] / (−n^{−2})          by L'Hospital's rule
      = lim_{n→∞} [L′(t/√n) t] / (2 n^{−1/2})
      = lim_{n→∞} [L″(t/√n) · (−½ t n^{−3/2}) · t] / (−n^{−3/2})    by L'Hospital's rule
      = lim_{n→∞} L″(t/√n) · t²/2
      = t²/2                                                         since L″(0) = 1

Proof of the CLT, Final Comments
The preceding proof assumes that E(Xi) = 0 and Var(Xi) = 1. We can generalize the result to any collection of iid random variables Yi by considering the standardized form Yi* = (Yi − µ)/σ:

    (Y1 + · · · + Yn − nµ)/(σ√n) = [ (Y1 − µ)/σ + · · · + (Yn − µ)/σ ] / √n = (Y1* + · · · + Yn*)/√n

with E(Yi*) = 0 and Var(Yi*) = 1.
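To make the limit at the heart of this proof concrete, here is a numeric sketch for one specific standardized distribution: Xi = ±1 with probability ½ each, which has mean 0, variance 1, and M_{Xi}(t) = cosh(t). The choice of distribution is mine, purely for illustration; the proof above covers any standardized iid sequence.

```python
import math

t = 1.0
target = math.exp(t**2 / 2)   # M_Z(t) = e^{t^2/2} ~ 1.6487

for n in (1, 10, 100, 1_000, 10_000):
    # MGF of S_n/sqrt(n) when M_{X_i}(t) = cosh(t): [cosh(t/sqrt(n))]^n
    m = math.cosh(t / math.sqrt(n)) ** n
    print(f"n = {n:>5}: [cosh(t/sqrt(n))]^n = {m:.6f}   (target {target:.6f})")
```

The convergence is visible already at moderate n, consistent with the earlier remark that most distributions approach the normal quite quickly.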