統計學 授課教師:統計系余清祥 日期:2014年11月27日 第八章:區間估計 Fall 2014 STATISTICS in PRACTICE • Food Lion is one of the largest supermarket chains in the USA. • Food Lion establishes a LIFO (last-in, first-out) index for each of seven inventory pools. • Using a 95% confidence level, Food Lion computed a margin of error of .006 for the sample estimate of LIFO index. • The interval from 1.009 to 1.021 provided a 95% confidence interval estimate of the population LIFO index. • This level of precision was judged to be very good. 2 Chapter 8 Interval Estimation • 8.1 Population Mean: σ Known • 8.2 Population Mean: σ Unknown • 8.3 Determining the Sample Size • 8.4 Population Proportion 3 8.1 Population Mean: σ Known -1 • Margin of Error and the Interval Estimate • Practical Advice 4 Population Mean: σ Known -2 • Example: Lloyd’s Department Store – Each week Lloyd’s Department Store selects a simple random sample of 100 customers in order to learn about the amount spent per shopping trip. With x representing the amount spent per shopping trip, the sample mean x provides a point estimate of µ, the mean amount spent per shopping trip for the population of all Lloyd’s customers. Lloyd’s has been using the weekly survey for several years. Based on the historical data, Lloyd’s now assume a know value of σ = $20 for the population standard deviation. 5 Margin of Error and the Interval Estimate -1 A point estimator cannot be expected to provide the exact value of the population parameter. An interval estimate can be computed by adding and subtracting a margin of error to the point estimate. Point Estimate +/− Margin of Error The purpose of an interval estimate is to provide information about how close the point estimate is to the value of the parameter. 6 Margin of Error and the Interval Estimate -2 The general form of an interval estimate of a population mean is x ± Margin of Error 7 Interval Estimate of a Population Mean: σ Known -1 • In order to develop an interval estimate of a population mean, the margin of error must be computed using either: – the population standard deviation σ , or – the sample standard deviation σ • σ is rarely known exactly, but often a good estimate can be obtained based on historical data or other information. • We refer to such cases as the σ known case. 8 Interval Estimate of a Population Mean: σ Known -2 • There is a 1 – α probability that the value of a sample mean will provide a margin of zα /2σ x error of or less. Sampling distribution of x α/2 1 - α of all x values zα /2 σ x µ α/2 x zα /2 σ x 9 Interval Estimate of a Population Mean: σ Known -3 Sampling distribution of x α/2 interval does not include µ 1 - α of all x values zα /2σ x µ x zα /2σ x x -------------------------] interval includes µ x -------------------------] [------------------------- x -------------------------] [------------------------[------------------------- α/2 10 Interval Estimate of a Population Mean: σ Known -4 • Example: Lloyd’s Department Store – The sampling distribution of x is normally distributed with a standard error of σ x = 2. – Because ±1.96σ x = 21.96(2)= 3.92, we can conclude that 95% of all x values obtained using a sample size of n = 100 will be within ± 3.92 of the population mean μ. 11 Interval Estimate of a Population Mean: σ Known -5 • Example: Lloyd’s Department Store, Sampling Distribution of Showing the Location of x Sample Means that are Within 3.92 of μ 12 Interval Estimate of a Population Mean: σ Known -6 • Example: Lloyd’s Department Store, Intervals Formed From Selected Sample Means at Location x1, x2, and x3 13 Meaning of Confidence (σ Known) -1 • Example: Lloyd’s Department Store – Any sample mean that is within the darkly shaded region of Figure 8.3 will provide an interval that contains the population mean μ. Because 95% of all possible sample means are in the darkly shaded region, 95% of all intervals formed by subtracting 3.92 from and adding 3.92 to will include the population mean μ. 14 Meaning of Confidence (σ Known) -2 • Example: Lloyd’s Department Store – During the most recent week, the quality assurance team at Lloyd’s surveyed 100 customers and obtained a sample mean amount spent of x = 82. Using x ± 3.92 to construct the interval estimate, we obtain 82 ± 3.92. Thus, the specific interval estimate of μ based on the data from the most recent week is 82 – 3.92 = 78.08 to 82 + 3.92 = 85.92. 15 Meaning of Confidence (σ Known) -3 • Example: Lloyd’s Department Store – Because 95% of all the intervals constructed using x ± 3.92 will contain the population mean, we say that we are 95% confident that the interval 78.08 to 85.92 includes the population mean μ. We say that this interval has been established at the 95% confidence level. – The value 0.95 is referred to as the confidence coefficient, and the interval 78.08 to 85.92 is called the 95% confidence interval. 16 Meaning of Confidence (σ Known) -4 • Interval Estimate of µ x ± zα /2 where: σ n x is the sample mean 1 – α is the confidence coefficient zα/2 is the z value providing an area of α/2 in the upper tail of the standard normal probability distribution σ is the population standard deviation n is the sample size 17 Meaning of Confidence (σ Known) -5 • Values of zα/2 for the Most Commonly Used Confidence Levels 18 Meaning of Confidence (σ Known) -6 Because 90% of all the intervals constructed using x ± 1.645σ x will contain the population mean, we say we are 90% confident that the interval x ± 1.645σ x includes the population mean µ. We say that this interval has been established at the 90% confidence level. The value 0.90 is referred to as the confidence coefficient. 19 Meaning of Confidence (σ Known) -7 • Example: Lloyd’s Department Store – For a 95% confidence interval, the confidence coefficient is (1 – α) = 0.95 and thus, α = 0.05. – Using the standard normal probability table, an area of α/2 = 0.05/2 = 0.025 in the upper tail provides z0.025 = 1.96. With the Lloyd’s sample mean x = 82, σ = 20, and a sample size n = 100, 20 82 ± 1.96 100 82 ± 3.92 – Thus, the margin of error is 3.92 and the 95% confidence interval is 82 – 3.92 = 78.08 to 82 + 3.92 85.92. 20 Meaning of Confidence (σ Known) -8 • Example: Lloyd’s Department Store – The 90% confidence interval for the Lloyd’s example is 20 82 ± 1.645 100 82 ± 3.29 – At 90%, the margin of error is 3.29 and the confidence interval is 82 – 3.28 = 78.71 to 82 + 3.28 = 85.29. 21 Meaning of Confidence (σ Known) -9 • Example: Lloyd’s Department Store – The 99% confidence interval is 82 ± 2.576 82 ± 5.15 20 100 – Thus, at 99% confidence, the margin of error is 5.15 and the confidence interval is 82 – 5.15 = 76.85 to 82 + 5.15 = 87.15. 22 Interval Estimate of a Population Mean: σ Known -7 • Example: Discount Sounds – Discount Sounds has 260 retail outlets throughout the United States. The firm is evaluating a potential location for a new outlet, based in part, on the mean annual income of the individuals in the marketing area of the new location. – A sample of size n = 36 was taken; the sample mean income is $41,100. The population is not believed to be highly skewed. The population standard deviation is estimated to be $4,500, and the confidence coefficient to be used in the interval estimate is 0.95. 23 Interval Estimate of a Population Mean: σ Known -8 • Example: Discount Sounds – 95% of the sample means that can be observed are within ±1.96σ x of the population mean µ. – The margin of error is: σ 4, 500 = zα /2 1.96 = n 36 1, 470 Thus, at 95% confidence, the margin of error is $1,470. 24 Interval Estimate of a Population Mean: σ Known -9 • Example: Discount Sounds – Interval estimate of µ is: $41,100 ± $1,470 or $39,630 to $42,570 – We are 95% confident that the interval contains the population mean. 25 Interval Estimate of a Population Mean: σ Known -10 • Example: Discount Sounds Confidence Level 90% 95% 99% Margin of Error Interval Estimate 3.29 3.92 5.15 78.71 to 85.29 78.08 to 85.92 76.85 to 87.15 – In order to have a higher degree of confidence, the margin of error and thus the width of the confidence interval must be larger. 26 Interval Estimate of a Population Mean: σ Known -11 • Adequate Sample Size In most applications, a sample size of n = 30 is adequate. If the population distribution is highly skewed or contains outliers, a sample size of 50 or more is recommended. 27 Interval Estimate of a Population Mean: σ Known -12 • Adequate Sample Size (continued) If the population is not normally distributed but is roughly symmetric, a sample size as small as 15 will suffice. If the population is believed to be at least approximately normal, a sample size of less than 15 can be used. 28 8.2 Population Mean: σ Unknown • Margin of Error and the Interval Estimate • Practical Advice • Using a Small Sample • Summary of Interval Estimation Procedures 29 Interval Estimate of a Population Mean: σ Unknown • If an estimate of the population standard deviation σ cannot be developed prior to sampling, we use the sample standard deviation s to estimate σ. • This is the σ unknown case. • In this case, the interval estimate for µ is based on the t distribution. • We’ll assume for now that the population is normally distributed. 30 t Distribution -1 William Gosset, writing under the name “Student”, is the founder of the t distribution. Gosset was an Oxford graduate in mathematics and worked for the Guinness Brewery in Dublin. He developed the t distribution while working on small-scale materials and temperature experiments. 31 Student - William Sealy Gosset (1876 - 1937) This birth-and-death process is suffering from labor pains; it will be the death of me yet. (Student Sayings) 32 t Distribution -2 The t distribution is a family of similar probability distributions. A specific t distribution depends on a parameter known as the degrees of freedom. Degrees of freedom refer to the number of independent pieces of information that go into the computation of s. 33 t Distribution -3 A t distribution with more degrees of freedom has less dispersion. As the degrees of freedom increases, the difference between the t distribution and the standard normal probability distribution becomes smaller and smaller. 34 t Distribution -4 t distribution (20 degrees of freedom) Standard normal distribution t distribution (10 degrees of freedom) z, t 0 35 t Distribution -5 • Selected Values From the t Distribution Table – Entries in the table give t values for an area or probability in the upper tail of the t distribution. 36 t Distribution -6 (Table ) • For example, with 10 degrees of freedom and a 0.05 area in the upper tail, t0.05 = 1.812. 37 t Distribution -7 For more than 100 degrees of freedom, the standard normal z value provides a good approximation to the t value. The standard normal z values can be found in the infinite degrees ( ) row of the t distribution table. 38 t Distribution -8 Standard normal z values 39 Margin of Error and the Interval Estimate: σ Unknown -1 • Interval Estimate x ± tα /2 s n where: 1 - α = the confidence coefficient tα/2 = the t value providing an area of α/2 in the upper tail of a t distribution with n – 1 degrees of freedom σ = the sample standard deviation 40 Margin of Error and the Interval Estimate: σ Unknown -2 • Example: Apartment Rents – A reporter for a student newspaper is writing an article on the cost of off-campus housing. A sample of 16 one-bedroom apartments within a half-mile of campus resulted in a sample mean of $750 per month and a sample standard deviation of $55. – Let us provide a 95% confidence interval estimate of the mean rent per month for the population of one-bedroom efficiency apartments within a halfmile of campus. We will assume this population to be normally distributed. 41 Margin of Error and the Interval Estimate: σ Unknown -3 • At 95% confidence, α = 0.05, and α/2 = 0.025. t0.025 is based on n – 1 = 16 – 1 = 15 degrees of freedom. In the t distribution table we see that t0.025 = 2.131. 42 Margin of Error and the Interval Estimate: σ Unknown -4 • Example: Apartment Rents – Interval Estimate x ± t0.025 s n Margin of Error 55 750 ± 2.131 = 750 ± 29.30 16 We are 95% confident that the mean rent per month for the population of one-bedroom apartments within a half-mile of campus is between $720.70 and $779.30. 43 Margin of Error and the Interval Estimate: σ Unknown -5 • Example: Consider a study designed to estimate the mean credit card debt for the population of U.S. households. A sample of n = 70 households provided the credit card balances shown in Table. 44 Margin of Error and the Interval Estimate: σ Unknown -6 • Example: – For this situation, no previous estimate of the population standard deviation σ is available. Thus, the sample data must be used to estimate both the population mean and the population standard deviation. – Using the data, we compute the sample mean x = $9312 and the sample standard deviation s = $4007. 45 Margin of Error and the Interval Estimate: σ Unknown -7 • Example: – With 95% confidence and n – 1 = 69 degrees of freedom, t distribution table can be used to obtain the appropriate value for t0.025. – We want the t value in the row with 69 degrees of freedom, and the column corresponding to 0.025 in the upper tail. The value shown is t0.025 = 1.995. 46 Margin of Error and the Interval Estimate: σ Unknown -8 • Example: – An interval estimate of the population mean credit card balance. 9312 ± 1.995 9312 ± 955 4007 70 – The point estimate of the population mean is $9312, the margin of error is $955, and the 95% confidence interval is 9312 – 955 = $8357 to 9312 + 955 = $10,267. – Thus, we are 95% confident that the mean credit card balance for the population of all households is between $8357 and $10,267. 47 Using a Small Sample: σ Unknown -1 • Adequate Sample Size In most applications, a sample size of n = 30 is adequate when using the expression 𝑥𝑥̅ ± 𝑡𝑡𝛼𝛼⁄2 𝑠𝑠⁄ 𝑛𝑛 to develop an interval estimate of a population mean. If the population distribution is highly skewed or contains outliers, a sample size of 50 or more is recommended. 48 Using a Small Sample: σ Unknown -2 • Adequate Sample Size (continued) If the population is not normally distributed but is roughly symmetric, a sample size as small as 15 will suffice. If the population is believed to be at least approximately normal, a sample size of less than 15 can be used. 49 Using a Small Sample: σ Unknown -3 • Example: A sample of 20 employees is selected, with each employee in the sample completing the training program. Data on the training time in days for the 20 employees are shown in Table. 50 Using a Small Sample: σ Unknown -4 • Example: – First, the sample data do not support the conclusion that the distribution of the population is normal, yet we do not see any evidence of skewness or outliers. Therefore, using the guidelines in the previous subsection, we conclude that an interval estimate based on the t distribution appears acceptable for the sample of 20 employees. 51 Using a Small Sample: σ Unknown -5 • Example: – We continue by computing the sample mean and sample standard deviation as follows. x = s = x ∑ = i n 1030 = 51.5 days 20 ∑ ( xi − x ) 2 = n−1 889 = 6.84 days 20 − 1 52 Using a Small Sample: σ Unknown -6 • Example: Histogram of Training Times for the Scheer Industries Sample 53 Using a Small Sample: σ Unknown -7 • Example: – For a 95% confidence interval, we use t distribution table and n – 1 = 19 degrees of freedom to obtain t0.025 = 2.093. The interval estimate of the population mean 6.84 51.5 ± 2.093 20 51.5 ± 3.2 – The point estimate of the population mean is 51.5 days. The margin of error is 3.2 days and the 95% confidence interval is 51.5 – 3.2 = 48.3 days to 51.5 + 3.2 = 54.7 days. 54 Summary of Interval Estimation Procedures for a Population Mean Can the population standard deviation σ be assumed known ? Yes Use the sample standard deviation s to estimate σ Known Case Use x ± zα /2 σ n No σ Unknown Case Use x ± tα /2 s n 55 8.3 Determining the Sample Size • Sample Size for an Interval Estimate of a Population Mean Let E = the desired margin of error. E is the amount added to and subtracted from the point estimate to obtain an interval estimate. If a desired margin of error is selected prior to sampling, the sample size necessary to satisfy the margin of error can be determined. 56 Sample Size for an Interval Estimate of a Population Mean -1 • Margin of Error E = zα /2 σ n • Necessary Sample Size ( zα / 2 ) 2 σ 2 n= E2 57 Sample Size for an Interval Estimate of a Population Mean -2 The Necessary Sample Size equation requires a value for the population standard deviation σ . If σ is unknown, a preliminary or planning value for σ can be used in the equation. 1. Use the estimate of the population standard deviation computed in a previous study. 2. Use a pilot study to select a preliminary study and use the sample standard deviation from the study. 3. Use judgment or a “best guess” for the value of σ . 58 Sample Size for an Interval Estimate of a Population Mean -3 • Example: – A previous study that investigated the cost of renting automobiles in the United States found a mean cost of approximately $55 per day for renting a midsize automobile. – Suppose that the organization that conduces this study would like to conduce a new study in order to estimate the population mean daily rental cost for a midsize automobile in the United States. – In design the new study, the project director specifies that the population mean daily rental cost be estimated with a margin of error of $2 and a 95% level of confidence. 59 Sample Size for an Interval Estimate of a Population Mean -4 • Example: – The project director specified a desire margin of error of E = 2, and the 95% level if confidence indicates z0.025 = 1.96. – At this point, we found σ = $9.65, we obtain ( zα /2 )2 σ 2 (1.96)2 (9.65)2 = n = E2 = 89.43 2 2 – The sample size of the new study needs to be at least 98.43 midsize automobile rentals in order to satisfy the project director’s $2 margin-of-error requirement. We round up to the next integer value, the recommended sample size is 90 midsize automobile rentals. 60 Sample Size for an Interval Estimate of a Population Mean -5 • Example: Discount Sounds – Recall that Discount Sounds is evaluating a potential location for a new retail outlet, based in part, on the mean annual income of the individuals in the marketing area of the new location. – Suppose that Discount Sounds’ management team wants an estimate of the population mean such that there is a 0.95 probability that the sampling error is $500 or less. – How large a sample size is needed to meet the required precision? 61 Sample Size for an Interval Estimate of a Population Mean -6 • Example: Discount Sounds zα /2 σ = 500 n – At 95% confidence, z0.025 = 1.96. Recall that σ = 4,500. (1.96)2 (4, 500)2 = n = 311.17 = 2 (500) 312 A sample of size 312 is needed to reach a desired precision of ± $500 at 95% confidence. 62 8.4 Population Proportion • Determining the Sample Size 63 Interval Estimate of a Population Proportion -1 The general form of an interval estimate of a population proportion p is p ± Margin of Error 64 Interval Estimate of a Population Proportion -2 The sampling distribution of plays a key role in computing the margin of error for this interval estimate. The sampling distribution of can be approximated by a normal distribution whenever np > 5 and n(1 – p) > 5. 65 Interval Estimate of a Population Proportion -3 • Normal Approximation of Sampling Distribution of p Sampling distribution of p α/2 p(1 − p) σp = n 1 - α of all p values zα /2σ p p α/2 p zα /2σ p 66 Interval Estimate of a Population Proportion -4 • Interval Estimate p ± zα / 2 p (1 − p ) n where: 1 – α is the confidence coefficient zα/2 is the z value providing an area of α/2 in the upper tail of the standard normal probability distribution p is the sample proportion 67 Interval Estimate of a Population Proportion -5 • Example: Political Science, Inc. – Political Science, Inc. (PSI) specializes in voter polls and surveys designed to keep political office seekers informed of their position in a race. – Using telephone surveys, PSI interviewers ask registered voters who they would vote for if the election were held that day. 68 Interval Estimate of a Population Proportion -6 • Example: Political Science, Inc. – In a current election campaign, PSI has just found that 220 registered voters, out of 500 contacted, favor a particular candidate. PSI wants to develop a 95% confidence interval estimate for the proportion of the population of registered voters that favor the candidate. 69 Interval Estimate of a Population Proportion -7 • Example: Political Science, Inc. p ± zα /2 – where: p(1 − p ) n n = 500, p = 220/500 = 0.44, zα/2 = 1.96 0.44(1 − 0.44) 0.44 ± 1.96 500 PSI is 95% confident that the proportion of all voters that favor the candidate is between 0.3965 and 0.4835. 70 Interval Estimate of a Population Proportion -8 • Example: Tee Times – A national survey of 900 women golfers was conduced to learn how women golfers view their treatment at golf courses in the United States. The survey found that 396 of the women golfers were satisfied with the availability of tee times. – The point estimate of the proportion of the population of women golfers who are satisfied with the availability of tee times is 396/900 = 0.44. 71 Interval Estimate of a Population Proportion -9 • Example: Tee Times – A 95% confidence level, p ± zα /2 p(1 − p ) n 0.44(1 − 0.44) 0.44 ± 1.96 900 0.44 ± 0.0324 – The margin of error us 0.0324 and the 95% confidence interval estimate of the population proportion is 0.4074 to 0.4724. The survey results enable us to state with 95% confidence that between 40.76% to 47.24% of all women golfers are satisfied with the availability of tee times. 72 Sample Size for an Interval Estimate of a Population Proportion -1 • Margin of Error E = zα / 2 p (1 − p ) n – Solving for the necessary sample size, we get ( zα /2 ) 2 p (1 − p ) n= E2 – However, p will not be known until after we have selected the sample. We will use the planning value p* for p . 73 Sample Size for an Interval Estimate of a Population Proportion -2 • Necessary Sample Size ( zα / 2 ) 2 p* (1 − p* ) n= E2 • The planning value p* can be chosen by: 1. Using the sample proportion from a previous sample of the same or similar units, or 2. Selecting a preliminary sample and using the sample proportion from this sample. 3. Use judgment or a “best guess” for a p* value. 4. Otherwise, use 0.50 as the p* value. 74 Sample Size for an Interval Estimate of a Population Proportion -3 • Some Possible Values for p*(1 − p*) 75 抽取1,000份樣本的原因 • 民意、市場調查的多為封閉問卷,有興 趣的多為某個問項佔的比例,例如:某 位候選人的支持程度二項分配。 • 在信心水準為95%及最大誤差不大於 3% 的要求下: 1.96 × p (1 − p ) ≤ 0.03 n 1.96 p (1 − p ) 1.96 × 1 / 2 ⇔ n≥ ≅ 0.03 0.03 ⇔ n ≥ 1,067 Sample Size for an Interval Estimate of a Population Proportion -4 • Example: Political Science, Inc. – Suppose that PSI would like a 0.99 probability that the sample proportion is within + 0.03 of the population proportion. – How large a sample size is needed to meet the required precision? (A previous sample of similar units yielded 0.44 for the sample proportion.) 77 Sample Size for an Interval Estimate of a Population Proportion -5 • Example: Political Science, Inc. p(1 − p ) zα /2 = 0.03 n – At 99% confidence, z0.005 = 2.576. Recall that p = 0.44. ( zα /2 )2 p * (1 − p *) (2.576)2 (0.44)(0.56) n = ≅ 1817 2 2 (0.03) E A sample of size 1817 is needed to reach a desired precision of ± 0.03 at 99% confidence. 78 Sample Size for an Interval Estimate of a Population Proportion -6 • Example: Political Science, Inc. – Note: We used 0.44 as the best estimate of p in the preceding expression. If no information is available about p, then 0.5 is often assumed because it provides the highest possible sample size. If we had used p = 0.5, the recommended n would have been 1843. 79 Sample Size for an Interval Estimate of a Population Proportion -7 • Example: Tee Times – How large should the sample be if the survey director wants to estimate the population proportion with a margin of error of 0.25 at 95% confidence? – With E = 0.25 and zα/2 = 1.96, we need a planning value p* to answer the sample size question. 80 Sample Size for an Interval Estimate of a Population Proportion -8 • Example: Tee Times – Using the previous survey result of p = 0.44 as the planning value p*, n ( zα /2 )2 p * (1 − p *) (1.96)2 (0.44)(1 − 0.44) 1514.5 = = 2 2 (0.025) E – The sample size must be at least 1514.5 women golfers to satisfy the margin of error requirement. Rounding up to the next integer value indicates that a sample of 1515 women golfers is recommended to satisfy the margin of error requirement. 81 Sample Size for an Interval Estimate of a Population Proportion -9 • Example: Tee Times – The fourth alternative suggested for selecting a planning value p* is to use p* = 0.50. This value of p* is frequently used when no other information is available. – In the survey of women golfers example, a planning value of p* = 0.50 would have provided the sample size ( zα /2 )2 p * (1 − p *) (1.96)2 (0.50)(1 − 0.50) n = 1536.6 = = 2 2 (0.025) E Thus, a slightly larger sample size of 1537 women golfers would be recommended. 82 End of Chapter 8 83