ST 361 Estimation: Interval Estimation for $\mu$ (§7.2, 7.4)

Topics:
I. Interval estimation: confidence interval
II. (Two-sided) confidence interval for estimating the population mean $\mu$
   (a) When the population SD $\sigma$ is known: use the Z distribution (§7.2)
   (b) When the population SD $\sigma$ is NOT known: use the t distribution (§7.4)
III. (Two-sided) confidence interval for estimating the population proportion (§7.3)
IV. Two-sided confidence interval for estimating the population mean difference $\mu_1 - \mu_2$ (§7.5)
   (a) When the population SDs $\sigma_1$, $\sigma_2$ are known
   (b) When the population SDs $\sigma_1$, $\sigma_2$ are NOT known

---------------------------------------------------------------------

I. Interval Estimate: Confidence Interval (CI)

What is it?
A confidence interval is an interval calculated from a sample such that it will contain the true value of a population parameter (such as the population mean $\mu$) with a certain probability (called the confidence level).

Why?
- Because of sampling variability, the point estimate is almost never exactly equal to the correct value of the parameter.
- Point estimates don't tell us how close they are to the actual parameter.
- So we use an interval, called a confidence interval, to report the likely range for the parameter of interest.

Confidence interval for the population mean $\mu$

Consider a sample $X_1, X_2, \ldots, X_n$ ($n \ge 30$) that is randomly selected from a population with mean $\mu$ and SD $\sigma$. To estimate $\mu$ we use the sample mean $\bar{X} = n^{-1} \sum_{i=1}^{n} X_i$. Our goal here is to find the likely range of $\mu$.

Recall that, approximately, $\bar{X} \sim N(\mu, \sigma/\sqrt{n})$ no matter what distribution $X$ has (by the CLT). Here the second argument of $N(\cdot,\cdot)$ is the SD. Based on this normal distribution of $\bar{X}$, we can show that the middle 95% of the $\bar{X}$ values fall within

$[\mu - 1.96\,\sigma/\sqrt{n},\ \mu + 1.96\,\sigma/\sqrt{n}]$, or equivalently, $\mu \pm 1.96\,\sigma/\sqrt{n}$.

Notice that this is an interval centered at the parameter $\mu$. However, in reality we don't know $\mu$; we only observe $\bar{X}$ from the sample collected.
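The "middle 95%" statement can be checked by simulation. The sketch below is only an illustration under assumptions that are not in the notes: the population is Uniform(0, 1) (so $\mu = 0.5$ and $\sigma = \sqrt{1/12}$), the sample size is 30, and the function name `fraction_in_middle_band` is made up for this example.

```python
import random
from math import sqrt


def fraction_in_middle_band(n=30, reps=5000, seed=1):
    """Fraction of simulated sample means inside mu +/- 1.96*sigma/sqrt(n).

    Assumption for illustration: the population is Uniform(0, 1),
    which is not normal, so any agreement with 95% comes from the CLT.
    """
    rng = random.Random(seed)
    mu = 0.5                 # mean of Uniform(0, 1)
    sigma = sqrt(1 / 12)     # SD of Uniform(0, 1)
    half_width = 1.96 * sigma / sqrt(n)

    inside = 0
    for _ in range(reps):
        # Draw one sample of size n and compute its sample mean.
        xbar = sum(rng.random() for _ in range(n)) / n
        if abs(xbar - mu) <= half_width:
            inside += 1
    return inside / reps


# The result should be close to 0.95, though it varies with the seed.
print(fraction_in_middle_band())
```

Because $|\bar{X} - \mu| \le 1.96\,\sigma/\sqrt{n}$ is the same event whether the band is centered at $\mu$ or at $\bar{X}$, the same fraction is also the share of samples whose interval $\bar{X} \pm 1.96\,\sigma/\sqrt{n}$ covers $\mu$.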
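So what we really want is an interval centered at the observed $\bar{X}$ with the same length:

$\bar{X} \pm 1.96\,\sigma/\sqrt{n}$, or equivalently, $[\bar{X} - 1.96\,\sigma/\sqrt{n},\ \bar{X} + 1.96\,\sigma/\sqrt{n}]$.

Meaning of the interval "$\bar{X} \pm 1.96\,\sigma/\sqrt{n}$":
- It contains $\mu$ with 95% probability. (The probability refers to the random interval over repeated samples; $\mu$ itself is fixed.)
- Thus we have 95% confidence that $\mu$ is covered by the range $[\bar{X} - 1.96\,\sigma/\sqrt{n},\ \bar{X} + 1.96\,\sigma/\sqrt{n}]$.

This is the concept of a Confidence Interval (CI). We call such an interval "the 95% confidence interval for $\mu$".

Note that the fundamental assumption for constructing the CI for $\mu$ is that $\bar{X}$ has a normal distribution. This is automatically true if $X$ has a normal distribution; otherwise the sample size $n$ has to be large.

II. Confidence interval for $\mu$; assume $\bar{X} \sim N(\mu, \sigma/\sqrt{n})$

If $\sigma$ is known

If $\sigma$ is known, the CI for $\mu$ at a given confidence level $(1 - \alpha)$ is

$\bar{X} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$

4 components:
(a) The point estimator $\bar{X}$
(b) The confidence level, which determines the critical value $z^*$
(c) The SE of the point estimator $\bar{X}$, namely $\sigma/\sqrt{n}$
(d) The requirement that $\bar{X}$ follows a normal distribution

Ex1. $X \sim N(\mu, \sigma)$. The 90% CI for $\mu$ with sample size $n$ is
$\bar{X} \pm z_{0.1/2}\,\sigma/\sqrt{n} = \bar{X} \pm 1.645\,\sigma/\sqrt{n}$

Ex2. $X \sim N(\mu, \sigma)$. The 95% CI for $\mu$ with sample size $n$ is
$\bar{X} \pm z_{0.05/2}\,\sigma/\sqrt{n} = \bar{X} \pm 1.96\,\sigma/\sqrt{n}$

Ex3. $X \sim N(\mu, \sigma)$. The 99% CI for $\mu$ with sample size $n$ is
$\bar{X} \pm z_{0.01/2}\,\sigma/\sqrt{n} = \bar{X} \pm 2.58\,\sigma/\sqrt{n}$

► Comment: the higher the confidence level is, the wider or narrower (choose one) a CI becomes.

Ex4. $X$ has mean $\mu$ and SD $\sigma$ (known). A sample of size 100 is collected. What is the 95% CI for $\mu$?
$\bar{X} \pm 1.96\,\sigma/\sqrt{100} = [\bar{X} - 0.196\,\sigma,\ \bar{X} + 0.196\,\sigma]$
If from a sample we got $\bar{X} = 3.4$, and $\sigma$ is assumed to be 2.5, then a 95% CI for $\mu$ is
$[\bar{X} - 0.196\,\sigma,\ \bar{X} + 0.196\,\sigma] = [3.4 - 0.196 \times 2.5,\ 3.4 + 0.196 \times 2.5] = [2.91,\ 3.89]$

Ex5. What is the confidence level for the interval $[\bar{X} - 1.75\,\sigma/\sqrt{n},\ \bar{X} + 1.75\,\sigma/\sqrt{n}]$?
Since $\alpha/2 = P[Z > 1.75] = P[Z < -1.75] = 0.04$, we have $\alpha = 0.08$, and $1 - \alpha = 0.92$ is the confidence level.

If $\sigma$ is unknown

In practice, most of the time $\sigma$ is not known. To calculate a CI for $\mu$, we have to use $s/\sqrt{n}$ instead of $\sigma/\sqrt{n}$.

When $\sigma$ is known, $\dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim Z$ (when $X$ has a normal distribution or the sample size $n$ is large), and hence we use a z critical value.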
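The known-$\sigma$ calculations in Ex4 and Ex5 can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library (`statistics.NormalDist`, Python 3.8 or later); the function names `z_interval` and `confidence_level` are chosen for illustration and do not come from the textbook.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(xbar, sigma, n, level=0.95):
    """Two-sided CI for mu when sigma is known: xbar +/- z_{alpha/2} * sigma / sqrt(n)."""
    alpha = 1 - level
    z_star = NormalDist().inv_cdf(1 - alpha / 2)  # z critical value
    margin = z_star * sigma / sqrt(n)
    return xbar - margin, xbar + margin


def confidence_level(z_star):
    """Confidence level of the interval xbar +/- z_star * sigma / sqrt(n)."""
    # P(-z_star < Z < z_star) = 2 * Phi(z_star) - 1
    return 2 * NormalDist().cdf(z_star) - 1


# Ex4: xbar = 3.4, sigma = 2.5, n = 100, 95% level
low, high = z_interval(3.4, 2.5, 100)
print(round(low, 2), round(high, 2))

# Ex5: critical value 1.75
print(round(confidence_level(1.75), 2))
```

Calling `z_interval` with `level=0.90` or `level=0.99` uses the critical values of Ex1 and Ex3 (the exact 99% value is about 2.576, which the notes round to 2.58).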
When $s$ is used in place of $\sigma$, the statistic $\dfrac{\bar{X} - \mu}{s/\sqrt{n}}$ is distributed as follows:

(a) If $n$ is large ($n \ge 30$), $\dfrac{\bar{X} - \mu}{s/\sqrt{n}}$ is approximately distributed as $N(0, 1)$. So we can still use the earlier result, replacing $\sigma$ with the sample SD $s$.

(b) If $n$ is small ($n < 30$), we have to assume $X$ has a normal distribution with mean $\mu$ and SD $\sigma$ (even though the value of $\sigma$ is unknown). Then $\dfrac{\bar{X} - \mu}{s/\sqrt{n}}$ has a t distribution with $(n-1)$ degrees of freedom.

When $\sigma$ is unknown, the CI for $\mu$ at a given confidence level $(1 - \alpha)$ is

$\bar{X} \pm t_{n-1,\,\alpha/2}\; s/\sqrt{n}$

The t distribution with $(n-1)$ degrees of freedom (graph on the last page of the textbook)

The t distribution is similar to the standard normal distribution (the Z distribution) in many aspects:
(1) all values are possible
(2) symmetric around zero
(3) bell-shaped

However, it has heavier tails than the Z distribution.

Different sample sizes result in different tail thickness in a t distribution: the smaller the sample size (and hence the degrees of freedom), the thicker the tails.

Each t distribution is defined through its degrees of freedom (df), and the corresponding t distribution is denoted by $t_{df}$.

When the sample size is very large (i.e., $n > 120$), $t_{n-1} \approx Z$.

Use the t table to find the critical value (Table IV, page 566).

Ex6. Use the t table to find the 95% and 99% t critical values for each of the following sample sizes:

Sample size n    df = n-1    t* (95%)    t* (99%)
3                2           4.303       9.925
6                5           2.571       4.032
12               11          2.201       3.106
30               29          2.045       2.756
∞ (Z)            ∞           1.96        2.576

Ex7. $X \sim$ Normal distribution. $n = 25$, $\bar{X} = 8$ and $s = 2$. What is the 95% CI for the population mean $\mu$?
$\bar{X} \pm 2.064\; s/\sqrt{n} = 8 \pm 2.064 \times 2/\sqrt{25} = 8 \pm 0.826 = [7.17,\ 8.83]$

Ex8. $X$ = # of claims received (per week) by an insurance company. Based on 41 weeks of samples, $\bar{X} = 18.5$ and $s = 20.0$. What is the 95% CI for $\mu$?
Here $n = 41$ is large enough for us to use the formula
$\bar{X} \pm z_{\alpha/2}\; s/\sqrt{n} = 18.5 \pm 1.96 \times 20/\sqrt{41} = [12.38,\ 24.62]$
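The rule "use z when $n \ge 30$, otherwise use t with $n-1$ degrees of freedom" can be sketched in Python as follows. The Python standard library has no t distribution, so this sketch assumes a small lookup table holding only the 95% critical values that appear in Ex6 and Ex7; the names `T_CRIT_95` and `mean_interval_95` are hypothetical. For other degrees of freedom one would read the value from the t table.

```python
from math import sqrt

# Two-sided 95% t critical values keyed by degrees of freedom.
# Only values quoted in Ex6 and Ex7 are included (an assumption of this sketch).
T_CRIT_95 = {2: 4.303, 5: 2.571, 11: 2.201, 24: 2.064}


def mean_interval_95(xbar, s, n):
    """95% CI for mu when sigma is unknown, following cases (a) and (b)."""
    if n >= 30:
        crit = 1.96               # case (a): large n, z critical value
    else:
        crit = T_CRIT_95[n - 1]   # case (b): small n; KeyError if df is not in the table
    margin = crit * s / sqrt(n)
    return xbar - margin, xbar + margin


# Ex7: n = 25, xbar = 8, s = 2 (t with 24 df)
print(mean_interval_95(8, 2, 25))

# Ex8: n = 41, xbar = 18.5, s = 20.0 (z approximation)
print(mean_interval_95(18.5, 20.0, 41))
```

Case (b) is only valid when the population itself is assumed normal, which the code cannot check; that assumption has to come from the problem statement, as in Ex7.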