Chapter 5 Confidence Intervals

Course Website: www.math.mun.ca/~sneddon/st2500

These handouts are modifications of lab notes prepared by Lauren Granter.

In this session, we will study some examples to see how Minitab calculates confidence intervals for a population mean and proportion. We’ll also do some numerical studies to illustrate the interpretation of confidence intervals. As with sampling distributions, this will be a numerical study of the Big Pot Theory discussed in class.

5.1 Confidence Intervals for Population Mean

Recall from class that we want to construct confidence intervals (CI) for the population mean (µ) in two situations: when the sample size is large (n ≥ 30) and when the sample size is small (n < 30).

5.1.1 CI for Population Mean: Large Sample Size

EX: The U.S. Commerce Dept. is interested in the average house price of all new houses sold in the U.S. They selected a random sample of 345 homes, and found their average was $201,400. If we assume the standard deviation of the price of all new homes sold is $38,000, find a 95% confidence interval for the mean house price of all new homes sold.

In this case, we have a large sample size, so we can use this CI for the mean, as discussed in class:

\[ \bar{x} \pm z_{\alpha/2} \left( \frac{\sigma}{\sqrt{n}} \right) \]

where $z_{\alpha/2}$ is the value on the standard normal curve with area of α/2 to its right, and n is our sample size.

We do this in Minitab as follows:

1. Select Stat–Basic Statistics–1 Sample Z
2. Select Summarized Data, and enter 345 for Sample size and 201400 for mean.
3. Enter 38000 in the Standard deviation box.
4. Leave Test Mean blank.
5. Select OK.

The output is shown below:

One-Sample Z

The assumed standard deviation = 38000

  N    Mean  SE Mean            95% CI
345  201400     2046  (197390, 205410)

So we are 95% confident the mean selling price is between $197,390 and $205,410.

By default, Minitab found a 95% CI for the population mean, so α = 0.05.
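The house price interval can also be checked outside Minitab. The following is a minimal sketch in Python, assuming only the standard library (`statistics.NormalDist` supplies the normal quantile). The function name `z_interval` and its parameters are illustrative choices, not part of Minitab or the course materials.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(mean, sigma, n, confidence=0.95):
    """Return (lower, upper) for a z-based CI when sigma is assumed known."""
    alpha = 1 - confidence
    # z_{alpha/2}: the standard normal value with area alpha/2 to its right
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z * sigma / sqrt(n)  # z_{alpha/2} * sigma / sqrt(n)
    return mean - margin, mean + margin


# House price example: n = 345, sample mean 201400, assumed sigma 38000
lower, upper = z_interval(mean=201400, sigma=38000, n=345)
print(f"95% CI: ({lower:.0f}, {upper:.0f})")  # 95% CI: (197390, 205410)
```

Passing a different `confidence` value (for example `confidence=0.90`) changes only the critical value, which mirrors changing the confidence level in Minitab.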
If we wanted a 90% CI (or any other confidence level), we add the following step:

• Click Options, and enter the confidence level we want (say 90) in the Confidence level box.

5.1.2 CI for Population Mean: Small Sample Size

EX: The data file pallet.mtw on the course website contains a sample of the weights of wooden pallets of 2 types of shingles (“Boston” and “Vermont”).

1. Find a 90% CI for the mean weight of the Boston shingles pallets.
2. Find a 95% CI for the mean weight of the Vermont shingles pallets.
3. Evaluate whether the assumption needed for (1) and (2) has been seriously violated.

We’ll go through (1), and you can work on (2) and (3) on your own.

1. Find a 90% CI for the mean weight of the Boston shingles pallets.

The first step is to get the data from the course webpage into your Minitab worksheet. In this case, we have n < 30, so we need to use

\[ \bar{x} \pm t_{n-1} \left( \frac{s}{\sqrt{n}} \right) \]

where $t_{n-1}$ comes from the t-distribution with (n − 1) degrees of freedom (df), and s is the sample standard deviation.

To find the 90% CI for a small sample size, we do the following in Minitab:

(a) Select Stat–Basic Statistics–1 Sample t
(b) Select the Boston column for Samples in columns.
(c) Leave Test mean blank.
(d) Select Options and change Confidence Level to 90.
(e) Select OK.

Enter the answer from Minitab in the space below.

2. Find a 95% CI for the mean weight of the Vermont shingles pallets. Enter your answer below:

3. Evaluate whether the assumption needed for (1) and (2) has been seriously violated. Enter your answer below:

5.2 Confidence Interval for Population Proportion

EX: A study of 828 travellers showed that 567 of them purchased plane tickets on an airline website in the past 12 months. Find a 96% confidence interval for the proportion of all travellers that have purchased plane tickets on an airline website in the past 12 months.

Here we are looking at a proportion of people that have a certain characteristic. The formula for a CI for a population proportion p is

\[ \hat{p} \pm z_{\alpha/2} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} \]

where p̂ is the sample proportion.
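As a cross-check on the proportion formula, here is a minimal Python sketch applied to the traveller study (567 events out of 828 trials, 96% confidence). It assumes only the standard library. The function name `proportion_interval` and its parameter names are illustrative and do not come from Minitab.

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval(events, trials, confidence=0.95):
    """Return (lower, upper) for the normal-approximation CI for a proportion."""
    p_hat = events / trials  # sample proportion
    alpha = 1 - confidence
    # z_{alpha/2}: the standard normal value with area alpha/2 to its right
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / trials)
    return p_hat - margin, p_hat + margin


# Traveller example: 567 of 828 travellers, 96% confidence
low, high = proportion_interval(events=567, trials=828, confidence=0.96)
print(f"96% CI: ({low:.3f}, {high:.3f})")  # 96% CI: (0.652, 0.718)
```

This sketch uses the normal approximation only, so it corresponds to the Minitab option for a test and interval based on the normal distribution.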
In this example, our sample size is n = 828 and x = 567, which is the number of people with the characteristic we are interested in. So our sample proportion is p̂ = 567/828 = 0.685.

We can find our CI for p in Minitab as follows:

1. Select Stat–Basic Statistics–1 Proportion
2. Select Summarized data and enter 828 after Number of trials and 567 after Number of events.
3. Select Options, enter 96 for Confidence Level, and choose Use test and interval based on normal distribution. Click OK.
4. Click OK.

The output is as follows:

Test and CI for One Proportion

Test of p = 0.5 vs p not = 0.5

Sample    X    N  Sample p                96% CI  Z-Value  P-Value
1       567  828  0.684783  (0.651623, 0.717943)    10.63    0.000

NOTE: Minitab uses Sample p instead of p̂. The 96% CI for p is (0.652, 0.718).

5.3 Interpreting a CI for µ

As we discussed in class, the formal interpretation of a confidence interval follows the Big Pot Theory: if we draw lots of samples and create a CI for each of them, we would expect a certain percentage of those CI’s (90%, 99%, or whatever confidence level we’re using) to contain the true value of µ. Let’s see if we can get Minitab to illustrate this result numerically.

First, get Minitab to select 100 different samples, each of size n = 40, from a normal distribution with µ = 10 and σ = 2. (We don’t have to use data from the normal distribution, though.) We do this as follows:

1. Select Calc–Random Data–Normal
2. Enter 40 in the space after Generate
3. Type C1-C100 in the space below Store in column
4. Set Mean = 10 and Standard deviation = 2.
5. Select OK.

The 100 samples are in columns C1 to C100. For each sample, we want to find a 90% confidence interval for µ. We do this as before:

1. Select Stat–Basic Statistics–1 Sample Z.
2. Choose Samples in Columns, and enter C1-C100 in the box.
3. Enter Standard deviation as 2.
4. Under Options, enter 90 for Confidence Level and hit OK.
5. Select OK.
A portion of the output I got (your answers will not be the same) is below:

One-Sample Z: C1, C2, C3, C4, C5, C6, C7, C8, ...

The assumed standard deviation = 2

Variable   N     Mean    StDev  SE Mean               90% CI
C1        40  9.79757  2.03591  0.31623  (9.39216, 10.20298)
C2        40  9.82073  1.79016  0.31623  (9.41532, 10.22614)
. . .

The last column contains the 100 CI’s for µ that Minitab found (one CI for each sample of 40 observations).

Theory says: In this case we know that µ = 10 (that’s what we told Minitab when we created the samples). The theory says that, in the long run, 90% of the CI’s created this way should contain µ = 10. For example, the interval (9.39216, 10.20298) contains µ = 10, since 10 falls between 9.39 and 10.20.

Take a look at the 100 intervals you have. Write down below how many of them do not contain µ = 10.

Then the number that do contain µ = 10 is (100 – above result). Write down this value:

Out of your 100 intervals, how many (theoretically) should contain µ = 10?

How do your theoretical and actual results compare?
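The same experiment can be repeated outside Minitab to see how the count varies. The following is a rough Python sketch, assuming only the standard library. The function name `coverage_count`, its parameters, and the fixed `seed` are illustrative choices, and the random numbers will not match Minitab's.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def coverage_count(num_samples=100, n=40, mu=10, sigma=2,
                   confidence=0.90, seed=1):
    """Count how many z-intervals (sigma known) contain the true mean mu."""
    rng = random.Random(seed)  # fixed seed only so a run can be repeated
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / sqrt(n)  # same for every sample since sigma is known
    covered = 0
    for _ in range(num_samples):
        # One sample of n observations from a normal(mu, sigma) population
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = fmean(sample)
        if x_bar - margin <= mu <= x_bar + margin:
            covered += 1
    return covered


hits = coverage_count()
print(f"{hits} of 100 intervals contain mu = 10")

many = coverage_count(num_samples=10_000)
print(f"{many / 10_000:.1%} of 10,000 intervals contain mu = 10")
```

With only 100 intervals the count will usually be near 90 but rarely exactly 90. With many more intervals the observed percentage should settle closer to the stated confidence level, which is the long-run statement the theory makes.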