Applied Statistics I
Liang Zhang
Department of Mathematics, University of Utah
July 17, 2008

Large-Sample Confidence Intervals

Proposition
If n is sufficiently large, the standardized variable
\[ Z = \frac{\bar{X} - \mu}{S/\sqrt{n}} \]
has approximately a standard normal distribution. This implies that
\[ \bar{x} \pm z_{\alpha/2} \cdot \frac{s}{\sqrt{n}} \]
is a large-sample confidence interval for µ with confidence level approximately 100(1 − α)%. This formula is valid regardless of the shape of the population distribution.

Example (a variant of Problem 16)
The charge-to-tap time (min) for a carbon steel in one type of open hearth furnace was determined for each heat in a sample of size 46, resulting in a sample mean time of 382.1 and a sample standard deviation of 31.5. Calculate a 95% confidence interval for true average charge-to-tap time.

Example (Problem 19)
The article “Limited Yield Estimation for Visual Defect Sources” (IEEE Trans. on Semiconductor Manuf., 1997: 17-23) reported that, in a study of a particular wafer inspection process, 356 dies were examined by an inspection probe and 201 of these passed the probe. Assuming a stable process, calculate a 95% confidence interval for the proportion of all dies that pass the probe.
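The large-sample interval for µ can be checked numerically on the charge-to-tap example. The following is a minimal sketch, not part of the original slides: the function name `large_sample_mean_ci` is hypothetical, and the z critical value comes from Python's standard-library `statistics.NormalDist` rather than from a printed table.

```python
from math import sqrt
from statistics import NormalDist


def large_sample_mean_ci(xbar, s, n, conf=0.95):
    """Two-sided large-sample CI: xbar +/- z_{alpha/2} * s / sqrt(n)."""
    alpha = 1.0 - conf
    # z_{alpha/2}: the point with area alpha/2 to its right under the z curve
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half_width = z * s / sqrt(n)
    return xbar - half_width, xbar + half_width


# Charge-to-tap example: n = 46, sample mean 382.1, sample sd 31.5
lower, upper = large_sample_mean_ci(382.1, 31.5, 46)
print(f"95% CI for the mean: ({lower:.1f}, {upper:.1f})")
```

With these inputs the half-width is roughly 9.1 minutes, so the interval runs from about 373.0 to about 391.2.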
Proposition
A confidence interval for a population proportion p with confidence level approximately 100(1 − α)% has
\[ \text{lower confidence limit} = \frac{\hat{p} + \dfrac{z_{\alpha/2}^2}{2n} - z_{\alpha/2}\sqrt{\dfrac{\hat{p}\hat{q}}{n} + \dfrac{z_{\alpha/2}^2}{4n^2}}}{1 + z_{\alpha/2}^2/n} \]
and
\[ \text{upper confidence limit} = \frac{\hat{p} + \dfrac{z_{\alpha/2}^2}{2n} + z_{\alpha/2}\sqrt{\dfrac{\hat{p}\hat{q}}{n} + \dfrac{z_{\alpha/2}^2}{4n^2}}}{1 + z_{\alpha/2}^2/n} \]
where p̂ is the sample proportion and q̂ = 1 − p̂.

Example (Problem 16)
The charge-to-tap time (min) for a carbon steel in one type of open hearth furnace was determined for each heat in a sample of size 46, resulting in a sample mean time of 382.1 and a sample standard deviation of 31.5. Calculate a 95% upper confidence bound for true average charge-to-tap time.

Example (Problem 19)
The article “Limited Yield Estimation for Visual Defect Sources” (IEEE Trans. on Semiconductor Manuf., 1997: 17-23) reported that, in a study of a particular wafer inspection process, 356 dies were examined by an inspection probe and 201 of these passed the probe. Assuming a stable process, calculate a 95% lower confidence bound for the proportion of all dies that pass the probe.
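The limits in the proportion formula are easy to mistype by hand, so a small numerical sketch may help. The function name `score_interval` is hypothetical, and the critical values are taken from the standard-library `statistics.NormalDist` instead of a table. The one-sided bound uses the usual convention of replacing z_{α/2} by z_α and keeping only one limit.

```python
from math import sqrt
from statistics import NormalDist


def score_interval(successes, n, z):
    """Return (lower, upper) limits of the large-sample CI for p,
    evaluated with the critical value z supplied by the caller."""
    p_hat = successes / n
    q_hat = 1.0 - p_hat
    center = p_hat + z**2 / (2 * n)
    spread = z * sqrt(p_hat * q_hat / n + z**2 / (4 * n**2))
    denom = 1.0 + z**2 / n
    return (center - spread) / denom, (center + spread) / denom


# Wafer inspection example: 201 of 356 dies passed the probe
z_two_sided = NormalDist().inv_cdf(0.975)  # z_{0.025}
z_one_sided = NormalDist().inv_cdf(0.95)   # z_{0.05}

lower, upper = score_interval(201, 356, z_two_sided)
lower_bound, _ = score_interval(201, 356, z_one_sided)

print(f"95% CI for p: ({lower:.3f}, {upper:.3f})")
print(f"95% lower confidence bound for p: {lower_bound:.3f}")
```

Because z_α is smaller than z_{α/2}, the one-sided lower bound sits above the lower limit of the two-sided interval at the same confidence level.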
Proposition
A large-sample upper confidence bound for µ is
\[ \mu < \bar{x} + z_{\alpha} \cdot \frac{s}{\sqrt{n}} \]
and a large-sample lower confidence bound for µ is
\[ \mu > \bar{x} - z_{\alpha} \cdot \frac{s}{\sqrt{n}} \]
A one-sided confidence bound for p results from replacing $z_{\alpha/2}$ by $z_{\alpha}$ and ± by either + or − in the CI formula for p. In all cases the confidence level is approximately 100(1 − α)%.

Confidence Intervals for Normal Distribution

Example (a variant of Problem 62, Ch5)
The total time for manufacturing a certain component is known to have a normal distribution. However, the mean µ and variance σ² of the normal distribution are unknown. After an experiment in which we manufactured 10 components, we recorded the sample times as follows:

component   1     2     3     4     5     6     7     8     9     10
time        63.8  60.5  65.3  65.7  61.9  68.2  68.1  64.8  65.8  65.4

with x̄ = 64.95 and s = 2.42.
What is the 95% confidence interval for the population mean µ?

Theorem
Let $X_1, X_2, \ldots, X_n$ be a random sample from a normal distribution with mean µ and variance σ², where µ and σ are unknown. The random variable
\[ T = \frac{\bar{X} - \mu}{S/\sqrt{n}} \]
has a probability distribution called a t distribution with n − 1 degrees of freedom (df). Here $\bar{X}$ is the sample mean and S is the sample standard deviation.

Properties of t Distributions:
Let $t_\nu$ denote the density function curve for ν df.
1. $t_\nu$ is governed by only one parameter ν, the number of degrees of freedom.
2. Each $t_\nu$ curve is bell-shaped and centered at 0.
3. Each $t_\nu$ curve is more spread out than the standard normal (z) curve.
4. As ν increases, the spread of the corresponding $t_\nu$ curve decreases.
5. As ν → ∞, the sequence of $t_\nu$ curves approaches the standard normal curve (so the z curve is often called the t curve with df = ∞).

Notation
Let $t_{\alpha,\nu}$ = the number on the measurement axis for which the area under the t curve with ν df to the right of $t_{\alpha,\nu}$ is α; $t_{\alpha,\nu}$ is called a t critical value.

Proposition
Let x̄ and s be the sample mean and sample standard deviation computed from the results of a random sample from a normal population with mean µ. Then a 100(1 − α)% confidence interval for µ is
\[ \left( \bar{x} - t_{\alpha/2,\,n-1} \cdot \frac{s}{\sqrt{n}},\ \ \bar{x} + t_{\alpha/2,\,n-1} \cdot \frac{s}{\sqrt{n}} \right) \]
or, more compactly, $\bar{x} \pm t_{\alpha/2,\,n-1} \cdot s/\sqrt{n}$. An upper confidence bound for µ is
\[ \bar{x} + t_{\alpha,\,n-1} \cdot \frac{s}{\sqrt{n}} \]
and replacing + by − in this latter expression gives a lower confidence bound for µ, both with confidence level 100(1 − α)%.

Example (a variant of Problem 62, Ch5), continued
For the same 10 manufacturing times (x̄ = 64.95, s = 2.42), what is a 95% interval for the manufacturing time of the 11th component, a single future observation rather than the mean µ?

Proposition
A prediction interval (PI) for a single observation to be selected from a normal population distribution is
\[ \bar{x} \pm t_{\alpha/2,\,n-1} \cdot s\sqrt{1 + \frac{1}{n}} \]
The prediction level is 100(1 − α)%.
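To see how much wider a prediction interval is than a confidence interval for µ, both formulas can be evaluated on the 10 manufacturing times. This is a rough sketch rather than the slides' own worked solution: the function names are hypothetical, and the critical value $t_{0.025,9} = 2.262$ is an assumption taken from a standard t table, since the Python standard library has no t quantile function.

```python
from math import sqrt
from statistics import mean, stdev

# Manufacturing times for the 10 components in the example
times = [63.8, 60.5, 65.3, 65.7, 61.9, 68.2, 68.1, 64.8, 65.8, 65.4]
n = len(times)
xbar = mean(times)   # sample mean, about 64.95
s = stdev(times)     # sample standard deviation (n - 1 divisor), about 2.42

# Assumption: t_{0.025, 9} = 2.262, read from a standard t table
T_CRIT = 2.262


def t_confidence_interval(xbar, s, n, t_crit):
    """CI for the mean: xbar +/- t * s / sqrt(n)."""
    half_width = t_crit * s / sqrt(n)
    return xbar - half_width, xbar + half_width


def prediction_interval(xbar, s, n, t_crit):
    """PI for one future observation: xbar +/- t * s * sqrt(1 + 1/n)."""
    half_width = t_crit * s * sqrt(1.0 + 1.0 / n)
    return xbar - half_width, xbar + half_width


ci = t_confidence_interval(xbar, s, n, T_CRIT)
pi = prediction_interval(xbar, s, n, T_CRIT)
print(f"95% CI for mu:           ({ci[0]:.2f}, {ci[1]:.2f})")
print(f"95% PI for 11th component: ({pi[0]:.2f}, {pi[1]:.2f})")
```

The prediction interval is wider because it accounts for the variability of the new observation itself in addition to the uncertainty in x̄.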
Example (a variant of Problem 62, Ch5), continued
For the same 10 manufacturing times (x̄ = 64.95, s = 2.42), what interval captures, with 95% confidence, at least 90% of the values in the population?
Proposition
A tolerance interval for capturing at least k% of the values in a normal population distribution with a confidence level of 95% has the form
\[ \bar{x} \pm (\text{tolerance critical value}) \cdot s \]
The tolerance critical values for k = 90, 95, and 99 in combination with various sample sizes are given in Appendix Table A.6.

Confidence Intervals for the Variance of a Normal Population

Example (a variant of Problem 62, Ch5), continued
For the same 10 manufacturing times (x̄ = 64.95, s = 2.42), what is a 95% confidence interval for the population variance σ²?

Theorem
Let $X_1, X_2, \ldots, X_n$ be a random sample from a normal distribution with mean µ and variance σ². Then the random variable
\[ \frac{(n-1)S^2}{\sigma^2} = \frac{\sum (X_i - \bar{X})^2}{\sigma^2} \]
has a chi-squared (χ²) probability distribution with n − 1 degrees of freedom (df).

Notation
Let $\chi^2_{\alpha,\nu}$, called a chi-squared critical value, denote the number on the measurement axis such that α of the area under the chi-squared curve with ν df lies to the right of $\chi^2_{\alpha,\nu}$.
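The theorem and the critical-value notation can be combined into an interval for σ² by a short algebraic step. The derivation sketch below is standard, but it is added here for illustration and does not appear on the slides. Since the chi-squared curve is not symmetric, two different critical values cut off area α/2 in each tail:

```latex
\[
\begin{aligned}
1 - \alpha
  &= P\!\left( \chi^2_{1-\alpha/2,\,n-1} < \frac{(n-1)S^2}{\sigma^2} < \chi^2_{\alpha/2,\,n-1} \right) \\
  &= P\!\left( \frac{(n-1)S^2}{\chi^2_{\alpha/2,\,n-1}} < \sigma^2 < \frac{(n-1)S^2}{\chi^2_{1-\alpha/2,\,n-1}} \right)
\end{aligned}
\]
```

Taking reciprocals reverses the inequalities, which is why the larger critical value $\chi^2_{\alpha/2,\,n-1}$ ends up in the lower limit.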
Proposition
A 100(1 − α)% confidence interval for the variance σ² of a normal population has lower limit
\[ (n-1)s^2 / \chi^2_{\alpha/2,\,n-1} \]
and upper limit
\[ (n-1)s^2 / \chi^2_{1-\alpha/2,\,n-1} \]
A confidence interval for σ has lower and upper limits that are the square roots of the corresponding limits in the interval for σ².
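As an illustration of the proposition, the interval can be evaluated on the 10 manufacturing times from the running example. This is a sketch under stated assumptions: the function name `variance_ci` is hypothetical, and the critical values $\chi^2_{0.025,9} = 19.023$ and $\chi^2_{0.975,9} = 2.700$ are taken from a standard chi-squared table, since the Python standard library has no chi-squared quantile function.

```python
from math import sqrt
from statistics import variance

# Manufacturing times for the 10 components in the example
times = [63.8, 60.5, 65.3, 65.7, 61.9, 68.2, 68.1, 64.8, 65.8, 65.4]
n = len(times)
s2 = variance(times)  # sample variance with the n - 1 divisor

# Assumptions: critical values read from a standard chi-squared table, 9 df
CHI2_UPPER_TAIL = 19.023  # chi^2_{0.025, 9}
CHI2_LOWER_TAIL = 2.700   # chi^2_{0.975, 9}


def variance_ci(s2, n, chi2_upper_tail, chi2_lower_tail):
    """CI for sigma^2: the larger critical value gives the lower limit."""
    lower = (n - 1) * s2 / chi2_upper_tail
    upper = (n - 1) * s2 / chi2_lower_tail
    return lower, upper


var_lo, var_hi = variance_ci(s2, n, CHI2_UPPER_TAIL, CHI2_LOWER_TAIL)
# Limits for sigma are the square roots of the limits for sigma^2
sd_lo, sd_hi = sqrt(var_lo), sqrt(var_hi)

print(f"95% CI for sigma^2: ({var_lo:.2f}, {var_hi:.2f})")
print(f"95% CI for sigma:   ({sd_lo:.2f}, {sd_hi:.2f})")
```

The interval is not symmetric about s², which reflects the skewness of the chi-squared distribution at small df.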