STAT355 - Probability & Statistics
Chapter 7: Statistical Intervals Based on a Single Sample
Fall 2011

Chapter 7 - Statistical Intervals Based on a Single Sample
7.1 Basic Properties of Confidence Intervals
7.2 Large-Sample Confidence Intervals for a Population Mean and Proportion
7.3 Intervals Based on a Normal Population Distribution
7.4 Confidence Intervals for the Variance and Standard Deviation of a Normal Population

Basic Properties of Confidence Intervals

Consider a random sample $X_1, \dots, X_n$ from $N(\mu, \sigma^2)$, and let $x_1, \dots, x_n$ be the actual observations of the random sample. The sample mean satisfies $\bar{X} \sim N(\mu, \sigma^2/n)$, so

\[ Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1), \qquad P\left(-1.96 \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le 1.96\right) = 0.95. \]

This probability statement is equivalent to

\[ P\left(\bar{X} - 1.96\,\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + 1.96\,\frac{\sigma}{\sqrt{n}}\right) = 0.95. \]

Thus,

\[ \left(\bar{X} - 1.96\,\frac{\sigma}{\sqrt{n}},\ \bar{X} + 1.96\,\frac{\sigma}{\sqrt{n}}\right) \tag{1} \]

is a random interval that includes (covers) the true value of $\mu$ with probability 0.95.

Definition
If, after observing $X_1 = x_1, X_2 = x_2, \dots, X_n = x_n$, we compute the observed sample mean $\bar{x}$ and then substitute $\bar{x}$ into (1) in place of $\bar{X}$, the resulting fixed interval

\[ \left(\bar{x} - 1.96\,\frac{\sigma}{\sqrt{n}},\ \bar{x} + 1.96\,\frac{\sigma}{\sqrt{n}}\right) \]

is called a 95% confidence interval for $\mu$.
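As a minimal sketch of the definition above, the following Python snippet computes the interval $\bar{x} \pm 1.96\,\sigma/\sqrt{n}$ for a normal population with known $\sigma$. The function name `z_interval` and the summary values passed to it are hypothetical, chosen only for illustration; the multiplier 1.96 is recovered from the standard normal quantile function rather than typed in.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(xbar, sigma, n, conf=0.95):
    """Two-sided confidence interval for a normal mean when sigma is known."""
    alpha = 1.0 - conf
    # Critical value with upper-tail area alpha/2 (about 1.96 when conf = 0.95).
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half_width = z * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width


# Hypothetical summary values, not taken from the slides.
lower, upper = z_interval(xbar=80.0, sigma=2.0, n=31)
print(f"95% CI for mu: ({lower:.2f}, {upper:.2f})")
```

With these illustrative inputs the interval is roughly 80.0 ± 0.70. Only the observed $\bar{x}$ changes from sample to sample; the half-width is fixed once $\sigma$, $n$, and the confidence level are chosen.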
Definition
A $100(1 - \alpha)\%$ confidence interval for the mean $\mu$ of a normal population when the value of $\sigma^2$ is known is given by

\[ \left(\bar{x} - z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}\right) \]

or, equivalently, by $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$. Two common critical values:

$\alpha = 0.10$: $z_{\alpha/2} = z_{0.05} = 1.645$
$\alpha = 0.05$: $z_{\alpha/2} = z_{0.025} = 1.96$

Example
Exercise 1: Consider a normal population with the value of $\sigma$ known.
1. What is the confidence level for the interval $\bar{x} \pm 2.81\,\sigma/\sqrt{n}$?
2. What is the confidence level for the interval $\bar{x} \pm 1.44\,\sigma/\sqrt{n}$?
3. What value of $z_{\alpha/2} $ results in a confidence level of 99.7%?

Large-Sample Confidence Intervals for a Population Mean

Consider a random sample $X_1, \dots, X_n$ from a population with mean $\mu$ and variance $\sigma^2$. Often, $\sigma^2$ is unknown. Let $S$ be the sample standard deviation.

Proposition
If $n$ is sufficiently large, the standardized variable

\[ Z = \frac{\bar{X} - \mu}{S/\sqrt{n}} \]

has approximately a standard normal distribution. This implies that

\[ \bar{x} \pm z_{\alpha/2}\,\frac{s}{\sqrt{n}} \]

is a large-sample confidence interval for $\mu$ with confidence level approximately $100(1 - \alpha)\%$. This formula is valid regardless of the shape of the population distribution.

A Confidence Interval for a Population Proportion

Let $p$ denote the proportion of “successes” in a population. A random sample of $n$ individuals is to be selected, and $X$ is the number of successes in the sample. Provided that $n$ is small compared to the population size, $X$ can be regarded as a binomial rv with

\[ E(X) = np \quad \text{and} \quad \sigma_X = \sqrt{np(1 - p)}. \]

Furthermore, if both $np \ge 10$ and $n(1 - p) \ge 10$, then $X$ has approximately a normal distribution.
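The binomial mean, standard deviation, and the rule of thumb $np \ge 10$, $n(1-p) \ge 10$ stated above can be checked in a few lines. This is only an illustrative sketch: the helper name `binomial_normal_check` and the values of `n` and `p` are hypothetical, and in practice $p$ is unknown, so the check is usually applied with an estimate in its place.

```python
from math import sqrt


def binomial_normal_check(n, p, threshold=10):
    """Return E(X), sigma_X, and whether the normal approximation rule holds."""
    mean = n * p
    sd = sqrt(n * p * (1 - p))
    # Rule of thumb from the slide: both np and n(1 - p) must be at least 10.
    ok = (n * p >= threshold) and (n * (1 - p) >= threshold)
    return mean, sd, ok


# Hypothetical values for illustration only.
mean, sd, ok = binomial_normal_check(n=100, p=0.2)
print(f"E(X) = {mean:.1f}, sd = {sd:.1f}, normal approximation reasonable: {ok}")
```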
The natural estimator of $p$ is $\hat{p} = X/n$, the sample fraction of successes. Since $\hat{p}$ is just $X$ multiplied by the constant $1/n$, $\hat{p}$ also has approximately a normal distribution. We know that $E(\hat{p}) = p$ (unbiasedness) and $\sigma_{\hat{p}} = \sqrt{p(1 - p)/n}$, so the standard deviation $\sigma_{\hat{p}}$ involves the unknown parameter $p$. Standardizing $\hat{p}$ by subtracting $p$ and dividing by $\sigma_{\hat{p}}$ then implies that

\[ P\left(-z_{\alpha/2} \le \frac{\hat{p} - p}{\sqrt{p(1 - p)/n}} \le z_{\alpha/2}\right) \approx 1 - \alpha. \]

Proposition
Let

\[ \tilde{p} = \frac{\hat{p} + z_{\alpha/2}^2/(2n)}{1 + z_{\alpha/2}^2/n}. \]

Then a confidence interval for a population proportion $p$ with confidence level approximately $100(1 - \alpha)\%$ is

\[ \tilde{p} \pm z_{\alpha/2}\,\frac{\sqrt{\hat{p}(1 - \hat{p})/n + z_{\alpha/2}^2/(4n^2)}}{1 + z_{\alpha/2}^2/n}. \]

Exercise (7.2) 21
In a sample of 1000 randomly selected consumers who had opportunities to send in a rebate claim form after purchasing a product, 250 of these people said they never did so. Calculate an upper confidence bound at the 95% confidence level for the true proportion of such consumers who never apply for a rebate. Based on this bound, is there compelling evidence that the true proportion of such consumers is smaller than 1/3?

Intervals Based on a Normal Population Distribution

The CI for $\mu$ presented earlier is valid provided that $n$ is large. The resulting interval can be used whatever the nature of the population distribution. The CLT cannot be invoked, however, when $n$ is small. In this case, one way to proceed is to make a specific assumption about the form of the population distribution and then derive a CI tailored to that assumption.
Assumption
The population of interest is normal, so that $X_1, \dots, X_n$ constitutes a random sample from a normal distribution with both $\mu$ and $\sigma^2$ unknown.

The key result underlying the interval in the earlier section was that, for large $n$, the rv

\[ Z = \frac{\bar{X} - \mu}{S/\sqrt{n}} \]

has approximately a standard normal distribution. When $n$ is small, $S$ is no longer likely to be close to $\sigma$, so the variability in the distribution of $Z$ arises from randomness in both the numerator and the denominator. This implies that the probability distribution of $(\bar{X} - \mu)/(S/\sqrt{n})$ will be more spread out than the standard normal distribution.

The result on which inferences are based introduces a new family of probability distributions called t distributions.

Theorem
When $\bar{X}$ is the mean of a random sample of size $n$ from a normal distribution with mean $\mu$, the rv

\[ T = \frac{\bar{X} - \mu}{S/\sqrt{n}} \]

has a probability distribution called a t distribution with $n - 1$ degrees of freedom (df).

Properties of t Distributions

Although the variable of interest is still $(\bar{X} - \mu)/(S/\sqrt{n})$, we now denote it by $T$ to emphasize that it does not have a standard normal distribution when $n$ is small.

We know that a normal distribution is governed by two parameters; each different choice of $\mu$ in combination with $\sigma^2$ gives a particular normal distribution. Any particular t distribution results from specifying the value of a single parameter, called the number of degrees of freedom, abbreviated df.
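The claim that $T$ is more spread out than a standard normal variable when $n$ is small can be illustrated by simulation. The sketch below is an assumption-laden illustration, not part of the original slides: it draws repeated samples from $N(0, 1)$, so that $\mu = 0$, and records how often $|T|$ exceeds 1.96. The function name `simulate_t_tail`, the number of repetitions, and the seed are arbitrary choices.

```python
import random
from math import sqrt


def simulate_t_tail(n, reps=20000, cutoff=1.96, seed=0):
    """Estimate P(|T| > cutoff) for samples of size n from N(0, 1)."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        xbar = sum(sample) / n
        # Sample standard deviation with divisor n - 1.
        s = sqrt(sum((x - xbar) ** 2 for x in sample) / (n - 1))
        t = (xbar - 0.0) / (s / sqrt(n))  # true mean is 0 by construction
        if abs(t) > cutoff:
            exceed += 1
    return exceed / reps


small_n = simulate_t_tail(n=5)   # T has 4 df
large_n = simulate_t_tail(n=50)  # T has 49 df
print(f"n = 5:  P(|T| > 1.96) is about {small_n:.3f}")
print(f"n = 50: P(|T| > 1.96) is about {large_n:.3f}")
```

If $T$ were standard normal, both proportions would be near 0.05. For $n = 5$ the estimate should come out well above that (the exact t probability with 4 df is about 0.12), while for $n = 50$ it should be close to 0.05, which is why the z-based interval undercovers for small samples.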
We'll denote this parameter by the Greek letter $\nu$. Possible values of $\nu$ are the positive integers 1, 2, 3, ... So there is a t distribution with 1 df, another with 2 df, yet another with 3 df, and so on.

For any fixed value of $\nu$, the density function that specifies the associated t curve is even more complicated than the normal density function. Fortunately, we need concern ourselves only with several of the more important features of these curves.

Let $t_\nu$ denote the t distribution with $\nu$ df.
1. Each $t_\nu$ curve is bell-shaped and centered at 0.
2. Each $t_\nu$ curve is more spread out than the standard normal (z) curve.
3. As $\nu$ increases, the spread of the corresponding $t_\nu$ curve decreases.
4. As $\nu \to \infty$, the sequence of $t_\nu$ curves approaches the standard normal curve (so the z curve is often called the t curve with df $= \infty$).

\[ T = \frac{\bar{X} - \mu}{S/\sqrt{n}} \]

The number of df for $T$ is $n - 1$ because, although $S$ is based on the $n$ deviations $X_1 - \bar{X}, \dots, X_n - \bar{X}$, the fact that $\sum (X_i - \bar{X}) = 0$ implies that only $n - 1$ of these are “freely determined.” The number of df for a t variable is the number of freely determined deviations on which the estimated standard deviation in the denominator of $T$ is based.

Using t distributions to make inferences requires notation for t-curve tail areas, $t_\alpha$, analogous to $z_\alpha$ for the z curve.

Notation: Let $t_{\alpha,\nu}$ = the number on the measurement axis for which the area under the t curve with $\nu$ df to the right of $t_{\alpha,\nu}$ is $\alpha$; $t_{\alpha,\nu}$ is called a t critical value.
For example, $t_{0.05,6}$ is the t critical value that captures an upper-tail area of 0.05 under the t curve with 6 df. Because t curves are symmetric about zero, $-t_{\alpha,\nu}$ captures lower-tail area $\alpha$.

Appendix Table A.5 gives $t_{\alpha,\nu}$ for selected values of $\alpha$ and $\nu$. The columns of the table correspond to different values of $\alpha$. To obtain $t_{0.05,15}$, go to the $\alpha = 0.05$ column, look down to the $\nu = 15$ row, and read $t_{0.05,15} = 1.753$.

The One-Sample t Confidence Interval

Proposition
Let $\bar{x}$ and $s$ be the sample mean and sample standard deviation computed from the results of a random sample from a normal population with mean $\mu$. Then a $100(1 - \alpha)\%$ confidence interval for $\mu$ is

\[ \left(\bar{x} - t_{\alpha/2,n-1}\,\frac{s}{\sqrt{n}},\ \bar{x} + t_{\alpha/2,n-1}\,\frac{s}{\sqrt{n}}\right) \]

or, more compactly, $\bar{x} \pm t_{\alpha/2,n-1}\,s/\sqrt{n}$.

Example (11): Even as traditional markets for sweetgum lumber have declined, large section solid timbers traditionally used for construction bridges and mats have become increasingly scarce. The article “Development of Novel Industrial Laminated Planks from Sweetgum Lumber” (J. of Bridge Engr., 2008: 64-66) described the manufacturing and testing of composite beams designed to add value to low-grade sweetgum lumber. Here are data on the modulus of rupture:

6807.99  7437.88  7659.50  7422.69  7637.06  6872.39
7378.61  7886.87  6663.28  7663.18  7295.54  6316.67
6165.03  6032.28  6702.76  7713.65  6991.41  6906.04
7440.17  7503.33  6992.23  6981.46  7569.75  6617.17
6984.12  7093.71  8053.26  8284.75  7347.95  7674.99

Use R software.
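The slide asks for R; as a language-neutral sketch of the same computation, the Python snippet below applies $\bar{x} \pm t_{\alpha/2,n-1}\,s/\sqrt{n}$ to the modulus-of-rupture data from Example 11. It assumes a 95% confidence level, which the slide does not specify, and takes the critical value $t_{0.025,29} = 2.045$ from a t table, because Python's standard library has no t quantile function. The variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

# Modulus of rupture data from Example 11 (n = 30).
mor = [
    6807.99, 7437.88, 7659.50, 7422.69, 7637.06, 6872.39,
    7378.61, 7886.87, 6663.28, 7663.18, 7295.54, 6316.67,
    6165.03, 6032.28, 6702.76, 7713.65, 6991.41, 6906.04,
    7440.17, 7503.33, 6992.23, 6981.46, 7569.75, 6617.17,
    6984.12, 7093.71, 8053.26, 8284.75, 7347.95, 7674.99,
]

# Assumption: 95% level; t_{0.025,29} = 2.045 is read from a t table.
T_CRIT = 2.045

n = len(mor)
xbar = mean(mor)
s = stdev(mor)  # sample standard deviation (divisor n - 1)
half_width = T_CRIT * s / sqrt(n)
lower, upper = xbar - half_width, xbar + half_width

print(f"n = {n}, xbar = {xbar:.3f}, s = {s:.3f}")
print(f"95% t CI for mu: ({lower:.3f}, {upper:.3f})")
```

The interval relies on the normality assumption stated in the proposition; with $n = 30$ it is also worth inspecting a normal probability plot of the data before trusting it.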
Example (12): Consider the following sample of fat content (in percentage) of $n = 10$ randomly selected hot dogs (“Sensory and Mechanical Assessment of the Quality of Frankfurters,” J. of Texture Studies, 1990: 395-409):

25.2  21.3  22.8  17.0  29.8  21.0  25.5  16.0  20.9  19.5

Assuming that these were selected from a normal population distribution, find a 95% CI for (interval estimate of) the population mean fat content. Use your calculator to obtain $\bar{x}$ and $s$.

The Chi-Squared ($\chi^2$) Distribution

Definition
Let $X_1, X_2, \dots, X_n$ be a random sample from a normal distribution with parameters $\mu$ and $\sigma^2$. Then the rv

\[ \frac{(n - 1)S^2}{\sigma^2} = \frac{\sum (X_i - \bar{X})^2}{\sigma^2} \]

has a chi-squared ($\chi^2$) probability distribution with $\nu = n - 1$ df.

Notation: Let $\chi^2_{\alpha,\nu}$, called a chi-squared critical value, denote the number on the horizontal axis such that $\alpha$ of the area under the chi-squared curve with $\nu$ df lies to the right of $\chi^2_{\alpha,\nu}$.

Remark: The chi-squared distribution is not symmetric, so separate lower-tail and upper-tail critical values are needed.

Confidence Interval for $\sigma^2$

From this result,

\[ P\left(\chi^2_{1-\alpha/2,n-1} \le \frac{(n - 1)S^2}{\sigma^2} \le \chi^2_{\alpha/2,n-1}\right) = 1 - \alpha, \]

and solving the inequalities for $\sigma^2$ gives

\[ \frac{(n - 1)S^2}{\chi^2_{\alpha/2,n-1}} \le \sigma^2 \le \frac{(n - 1)S^2}{\chi^2_{1-\alpha/2,n-1}}. \]

A $100(1 - \alpha)\%$ confidence interval for the variance $\sigma^2$ of a normal population is

\[ \left(\frac{(n - 1)s^2}{\chi^2_{\alpha/2,n-1}},\ \frac{(n - 1)s^2}{\chi^2_{1-\alpha/2,n-1}}\right). \]

(Suppl) 51
An April 2009 survey of 2253 American adults conducted by the Pew Research Center’s Internet & American Life Project revealed that 1262 of the respondents had at some point used wireless means for online access.
1. Calculate and interpret a 95% CI for the proportion of all American adults who at the time of the survey had used wireless means for online access.
2. What sample size is required if the desired width of the 95% CI is to be at most 0.04, irrespective of the sample results?
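One way to approach both parts of this exercise is sketched below in Python, using the proportion interval from the proposition in Section 7.2 (the interval centered at $\tilde{p}$). The function names `score_interval` and `min_n_for_width` are hypothetical. The sample-size step rests on an observation not spelled out in the slides: the interval is widest when $\hat{p} = 0.5$, where its width simplifies to $z_{\alpha/2}/\sqrt{n + z_{\alpha/2}^2}$, so requiring that worst-case width to be at most $w$ gives $n \ge (z_{\alpha/2}/w)^2 - z_{\alpha/2}^2$.

```python
from math import ceil, sqrt
from statistics import NormalDist


def score_interval(x, n, conf=0.95):
    """CI for a proportion, centered at p-tilde as in the Section 7.2 proposition."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - conf) / 2.0)
    p_hat = x / n
    denom = 1.0 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom  # this is p-tilde
    half = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


def min_n_for_width(width, conf=0.95):
    """Smallest n whose worst-case (p_hat = 0.5) interval width is <= width."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - conf) / 2.0)
    # Worst-case width is z / sqrt(n + z^2); solve z / sqrt(n + z^2) <= width.
    return ceil((z / width) ** 2 - z**2)


lo, hi = score_interval(x=1262, n=2253)
n_required = min_n_for_width(0.04)
print(f"95% CI for p: ({lo:.3f}, {hi:.3f})")
print(f"n required for width <= 0.04: {n_required}")
```

Because the bound uses the worst case $\hat{p} = 0.5$, the resulting sample size guarantees the width requirement whatever the sample turns out to be; a smaller $n$ would suffice only if $\hat{p}$ were known in advance to be far from 0.5.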