Statistics 305

CHAPTER 6 – INTRODUCTION TO FORMAL STATISTICAL INFERENCE

Large Sample Confidence Intervals for a Mean

We have a sample $x_1, x_2, \ldots, x_n$ from a population whose mean $\mu$ is unknown. We want to derive an interval from the sample that is likely to contain $\mu$. We also want some idea of how likely it is that $\mu$ is in the interval. Why? The sample mean $\bar{x}$ alone tells us nothing about how close $\bar{x}$ might be to $\mu$.

To begin the derivation, let $X_1, X_2, \ldots, X_n$ be sampling random variables that describe the sample above mathematically. Suppose $n$ is large, so that $\bar{X}$ is close to normal. Then, by the Central Limit Theorem,

$$\bar{X} \approx N(\mu,\ \sigma^2/n).$$

Fix a small number $\alpha$ with $0 < \alpha < 1$ (e.g. $\alpha = 0.05$ or $0.01$). Using the normal quantile $z_{\alpha/2}$, consider

$$P\left(-z_{\alpha/2} \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le z_{\alpha/2}\right) = 1 - \alpha.$$

Here $z_{\alpha/2}$ is the quantile that makes this statement correct under the Central Limit Theorem approximation. That is, $z_{\alpha/2}$ is the $1 - \alpha/2$ quantile of the $N(0, 1)$ distribution. Multiply through by $\sigma/\sqrt{n}$ and subtract $\bar{X}$ to get

$$P\left(-z_{\alpha/2} \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le z_{\alpha/2}\right) = P\left(-\bar{X} - \frac{\sigma z_{\alpha/2}}{\sqrt{n}} \le -\mu \le -\bar{X} + \frac{\sigma z_{\alpha/2}}{\sqrt{n}}\right).$$

Multiplying by $-1$ reverses the inequalities, so

$$P\left(\bar{X} - \frac{\sigma z_{\alpha/2}}{\sqrt{n}} \le \mu \le \bar{X} + \frac{\sigma z_{\alpha/2}}{\sqrt{n}}\right) = 1 - \alpha.$$

The interval $\left(\bar{X} - \frac{\sigma z_{\alpha/2}}{\sqrt{n}},\ \bar{X} + \frac{\sigma z_{\alpha/2}}{\sqrt{n}}\right)$ is random because it involves $\bar{X}$. The probability statement says that this random interval covers $\mu$ with probability $1 - \alpha$.

Even once we observe the realization $\bar{x}$ of $\bar{X}$, we still cannot compute the interval endpoints, because we do not know $\sigma$. We can, however, compute the sample standard deviation $s$ from the data. For large $n$, $s$ can be used to approximate $\sigma$, and the interval becomes

$$\left(\bar{X} - \frac{s z_{\alpha/2}}{\sqrt{n}},\ \bar{X} + \frac{s z_{\alpha/2}}{\sqrt{n}}\right).$$

This completes the theoretical development. How do we use it to get what we call a $100(1-\alpha)\%$ confidence interval for $\mu$?

1.
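Choose the confidence level $1 - \alpha$, which amounts to choosing $\alpha$. Then use Table B.3 to find $z_{\alpha/2}$ such that $P(-z_{\alpha/2} \le Z \le z_{\alpha/2}) = 1 - \alpha$. The following table gives example values.

$$\begin{array}{ccc}
\alpha & 1 - \alpha & z_{\alpha/2} \\ \hline
0.20 & 0.80 & 1.28 \\
0.10 & 0.90 & 1.645 \\
0.05 & 0.95 & 1.96 \\
0.02 & 0.98 & 2.33
\end{array}$$

2. From the sample values $x_1, x_2, \ldots, x_n$, compute

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2}.$$

3. Find the endpoints of the interval

$$\left(\bar{x} - \frac{s z_{\alpha/2}}{\sqrt{n}},\ \bar{x} + \frac{s z_{\alpha/2}}{\sqrt{n}}\right).$$

This is called a $100(1-\alpha)\%$ confidence interval for the population mean $\mu$. We interpret it by saying that we are $100(1-\alpha)\%$ confident that $\mu$ lies in the interval. We do not say that the probability that $\mu$ is in the interval is $1 - \alpha$. Once the data are observed, nothing about the interval is random any more.

Example: Exercise 6.1.2

$n = 26$, $\bar{x} = 142.7$, $s = 98.2$ (large sample size).

a) Find a 90% C.I. for $\mu$.

Here $\alpha = 0.10$, so $1 - \alpha = 0.90$. Solving $P(-z_{\alpha/2} \le Z \le z_{\alpha/2}) = 0.90$ gives $z_{\alpha/2} = z_{0.05} = 1.645$, so the C.I. for $\mu$ is

$$\bar{x} \pm \frac{s z_{\alpha/2}}{\sqrt{n}} = 142.7 \pm \frac{(98.2)(1.645)}{\sqrt{26}} = 142.7 \pm 31.68 \;\Rightarrow\; [111.02,\ 174.38].$$

We are 90% confident that this interval contains $\mu$. Either $\mu$ is in this particular interval or it is not, so no probability statement applies to it.

b) Find a 95% C.I. for $\mu$.

Here $\alpha = 0.05$ and $z_{\alpha/2} = z_{0.025} = 1.96$, so

$$\bar{x} \pm \frac{s z_{\alpha/2}}{\sqrt{n}} = 142.7 \pm \frac{(98.2)(1.96)}{\sqrt{26}} = 142.7 \pm 37.75 \;\Rightarrow\; [104.95,\ 180.45].$$

The 95% C.I. is wider, but we are more confident that it contains $\mu$ than we were with the 90% C.I.

* * * * * *

Confidence bounds are also sometimes used. These give an upper or lower bound for $\mu$ with confidence level $1 - \alpha$ instead of a finite interval. Let $z_\alpha$ denote the $1 - \alpha$ quantile of $N(0, 1)$. For large $n$, consider the statement

$$P\left(\frac{\bar{X} - \mu}{S/\sqrt{n}} \le z_\alpha\right) = 1 - \alpha, \quad\text{or equivalently}\quad P\left(\bar{X} - \frac{S z_\alpha}{\sqrt{n}} \le \mu\right) = 1 - \alpha.$$

This yields the $100(1-\alpha)\%$ lower confidence bound $\bar{x} - \frac{s z_\alpha}{\sqrt{n}}$ for $\mu$. We say that we are $100(1-\alpha)\%$ confident that $\mu$ is in the interval $\left[\bar{x} - \frac{s z_\alpha}{\sqrt{n}},\ \infty\right)$.

Similarly, the statement

$$P\left(\frac{\bar{X} - \mu}{S/\sqrt{n}} \ge -z_\alpha\right) = 1 - \alpha$$

yields

$$P\left(\bar{X} + \frac{S z_\alpha}{\sqrt{n}} \ge \mu\right) = 1 - \alpha.$$

So $\bar{x} + \frac{s z_\alpha}{\sqrt{n}}$ is the $100(1-\alpha)\%$ upper confidence bound, and we are $100(1-\alpha)\%$ confident that $\mu$ is in $\left(-\infty,\ \bar{x} + \frac{s z_\alpha}{\sqrt{n}}\right]$.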
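The three-step recipe and the one-sided bounds above can be sketched in Python. This is a minimal sketch, not part of the original notes. The function names (`z_quantile`, `large_sample_ci`, `lower_bound`, `upper_bound`, `simulated_coverage`) are hypothetical, and the annotations assume Python 3.9 or later. Instead of looking quantiles up in Table B.3, it computes exact normal quantiles with `statistics.NormalDist().inv_cdf`, for example about 1.6449 in place of 1.645. The rounded answers to Exercise 6.1.2 (a) and (b) come out the same. The last function is an illustrative simulation from an assumed normal population. It shows the repeated-sampling meaning of "$100(1-\alpha)\%$ confident": roughly a fraction $1 - \alpha$ of the random intervals cover $\mu$.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def z_quantile(p: float) -> float:
    """Return the p quantile of N(0, 1); stands in for a Table B.3 lookup."""
    return NormalDist().inv_cdf(p)


def large_sample_ci(xbar: float, s: float, n: int, alpha: float) -> tuple[float, float]:
    """100(1 - alpha)% large-sample C.I.: xbar -/+ s * z_{alpha/2} / sqrt(n)."""
    z = z_quantile(1 - alpha / 2)  # z_{alpha/2} is the 1 - alpha/2 quantile
    half_width = s * z / math.sqrt(n)
    return xbar - half_width, xbar + half_width


def lower_bound(xbar: float, s: float, n: int, alpha: float) -> float:
    """100(1 - alpha)% lower confidence bound: xbar - s * z_alpha / sqrt(n)."""
    return xbar - s * z_quantile(1 - alpha) / math.sqrt(n)


def upper_bound(xbar: float, s: float, n: int, alpha: float) -> float:
    """100(1 - alpha)% upper confidence bound: xbar + s * z_alpha / sqrt(n)."""
    return xbar + s * z_quantile(1 - alpha) / math.sqrt(n)


def simulated_coverage(mu: float, sigma: float, n: int, alpha: float,
                       reps: int = 2000, seed: int = 305) -> float:
    """Fraction of simulated samples from a hypothetical N(mu, sigma^2)
    population whose large-sample interval covers the true mean mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        lo, hi = large_sample_ci(mean(sample), stdev(sample), n, alpha)
        hits += (lo <= mu <= hi)  # a True comparison counts as 1
    return hits / reps


# Exercise 6.1.2: n = 26, xbar = 142.7, s = 98.2
for alpha in (0.10, 0.05):
    lo, hi = large_sample_ci(142.7, 98.2, 26, alpha)
    print(f"{100 * (1 - alpha):.0f}% C.I.: [{lo:.2f}, {hi:.2f}]")
# prints "90% C.I.: [111.02, 174.38]" and then "95% C.I.: [104.95, 180.45]"

print("coverage of simulated 95% intervals:", simulated_coverage(10.0, 3.0, 100, 0.05))
```

The simulated coverage should land near 0.95 but not exactly on it. It is a Monte Carlo estimate, and replacing $\sigma$ by $s$ makes the interval only approximately correct.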
To find a $100(1-\alpha)\%$ confidence bound (either upper or lower), the easy way to proceed is:

1. Compute the endpoints of a $100(1-2\alpha)\%$ confidence interval.
2. Take the upper or lower endpoint of that interval. This is the $100(1-\alpha)\%$ confidence bound.

This works because the $100(1-2\alpha)\%$ interval uses the quantile $z_{2\alpha/2} = z_\alpha$, which is the same quantile as the one-sided bound.

Example: Exercise 6.1.2 (continued)

c) For a 90% upper confidence bound for $\mu$, compute an 80% confidence interval and take the upper endpoint. For the 80% interval, $\alpha = 0.20$ and $z_{\alpha/2} = z_{0.10} = 1.28$, so the 80% C.I. has endpoints

$$\left[142.7 - \frac{(98.2)(1.28)}{\sqrt{26}},\ 142.7 + \frac{(98.2)(1.28)}{\sqrt{26}}\right].$$

The required 90% upper confidence bound is

$$142.7 + \frac{(98.2)(1.28)}{\sqrt{26}} = 167.4.$$

d) For a 95% upper confidence bound for $\mu$, compute a 90% confidence interval and take the upper endpoint. We already computed this interval in part (a), and its upper endpoint is $174.38 \approx 174.4$.
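The endpoint shortcut can be checked numerically. The following is a minimal Python sketch that does not come from the notes. The function `confidence_bound` and its `side` parameter are hypothetical names, and the annotations assume Python 3.9 or later. Exact quantiles from `statistics.NormalDist` replace the table values, so $z_{0.10} \approx 1.2816$ is used rather than 1.28. After rounding, parts (c) and (d) still give 167.4 and 174.4.

```python
import math
from statistics import NormalDist


def large_sample_ci(xbar: float, s: float, n: int, alpha: float) -> tuple[float, float]:
    """100(1 - alpha)% large-sample C.I.: xbar -/+ s * z_{alpha/2} / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = s * z / math.sqrt(n)
    return xbar - half_width, xbar + half_width


def confidence_bound(xbar: float, s: float, n: int, alpha: float,
                     side: str = "upper") -> float:
    """100(1 - alpha)% one-sided bound, read off the 100(1 - 2*alpha)% interval."""
    if not 0 < alpha < 0.5:
        raise ValueError("need 0 < alpha < 0.5 so that 1 - 2*alpha is a valid level")
    lo, hi = large_sample_ci(xbar, s, n, 2 * alpha)  # uses z_{2*alpha/2} = z_alpha
    if side == "upper":
        return hi
    if side == "lower":
        return lo
    raise ValueError("side must be 'upper' or 'lower'")


# Exercise 6.1.2 (c) and (d): 90% and 95% upper confidence bounds
print(round(confidence_bound(142.7, 98.2, 26, 0.10), 1))  # 167.4
print(round(confidence_bound(142.7, 98.2, 26, 0.05), 1))  # 174.4
```

The `0 < alpha < 0.5` check is there because a $100(1-2\alpha)\%$ interval only makes sense when $2\alpha < 1$.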