Excel Manual
Chapter 8: Estimation of the Mean and Proportion

While the spreadsheet setups described in this guide may seem to be getting more complicated, once they are created (and tested!), they will serve as handy tools for making quick and dynamic calculations. This section explains how to use Excel to create confidence intervals for population means and proportions, as well as how to perform hypothesis tests for both. Since the calculations for confidence intervals and hypothesis tests are similar, you can set up the same spreadsheet to solve both types of problems. Excel updates its calculations as values are entered, so you can observe how changing the values affects the results. For example, you can increase or decrease n to see what happens, or find out how changing the sample proportion affects the margin of error.

This point cannot be emphasized enough: the spreadsheets created in this chapter need to be tested using known values to make sure that you are obtaining the correct results. Use the examples shown here to verify your setup.

Estimating a Population Mean: σ Known

=CONFIDENCE(alpha, standard deviation, sample size): This formula calculates the margin of error for a confidence interval. Note: This function applies only to estimating a population mean when the population standard deviation σ is known.

The following illustration shows how to set up a spreadsheet to calculate the margin of error and confidence interval given the α level, mean, standard deviation, and sample size:

Note the negative sign in front of the NORMSINV() function. NORMSINV() returns a negative value for a left-tail area below .5, so the negative sign makes the Z value positive. Input data in the white cells; the shaded areas contain formulas or labels.
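To double-check the worksheet outside Excel, the σ-known calculations can be sketched in Python using only the standard library. This is a minimal sketch, not the manual's method: `statistics.NormalDist().inv_cdf` stands in for NORMSINV(), and the function names (`z_critical`, `margin_of_error`, `z_interval`, `required_sample_size`) are hypothetical. The inputs mean = 145, σ = 35, n = 25 are an assumption, chosen so the output can be compared with the 90% interval (133.49, 156.51) reported for Example 8-1; confirm them against your text before relying on them. The sample-size line uses the Example 8-3 inputs (99% confidence, σ = $11,800, error of at most $800).

```python
from math import ceil, sqrt
from statistics import NormalDist


def z_critical(alpha: float) -> float:
    """Positive critical value z(alpha/2); same role as -NORMSINV(alpha/2) in the sheet."""
    return -NormalDist().inv_cdf(alpha / 2)


def margin_of_error(alpha: float, sigma: float, n: int) -> float:
    """z(alpha/2) * sigma / sqrt(n), the quantity Excel's CONFIDENCE(alpha, sigma, n) returns."""
    return z_critical(alpha) * sigma / sqrt(n)


def z_interval(mean: float, alpha: float, sigma: float, n: int) -> tuple:
    """(lower, upper) limits of a confidence interval for a mean with sigma known."""
    e = margin_of_error(alpha, sigma, n)
    return mean - e, mean + e


def required_sample_size(alpha: float, sigma: float, max_error: float) -> int:
    """Smallest n whose margin of error is at most max_error: (z * sigma / E)^2, rounded up."""
    return ceil((z_critical(alpha) * sigma / max_error) ** 2)


# Hypothetical inputs: mean 145, sigma 35, n = 25, alpha = .10 (90% confidence).
low, high = z_interval(145, 0.10, 35, 25)
print(round(low, 2), round(high, 2))           # 133.49 156.51

# Example 8-3 inputs: 99% confidence, sigma = $11,800, error at most $800.
print(required_sample_size(0.01, 11800, 800))  # 1444
```

Because `NormalDist` carries full double precision, these results track Excel's rather than a hand-rounded Z*.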
Chapter 8/ Introductory Statistics - Mann

The example below uses the information in Example 8-1 on page 366 of your text to create a confidence interval for estimating the mean cost of a college textbook:

To enter the Greek letters as shown, select Symbol from the Symbols group in the Insert tab, choose the desired letter, and click Insert.

The 90% confidence interval is found in cells B10 and C10. We are 90% confident that the true mean cost of a college textbook is in the interval (133.49, 156.51). This is a little different from the text's result because of rounding: the text rounds Z* to the nearest hundredth, whereas Excel carries many more decimal places. (The * indicates that the z-score is a “critical value.”)

Finding the Z and Required Sample Size for a Confidence Interval

Calculating values of zα/2 for any α: Create the following spreadsheet to automatically calculate zα/2 for the desired α level. You can also set up the bottom portion of this spreadsheet to automatically yield the sample size required to keep the margin of error below a specified level for a given α level and standard deviation.

The following are the results for Example 8-3 on page 371. We have calculated Z and the sample size for estimating the mean debt of college graduates with 99% confidence, given a standard deviation of $11,800 and an error of no more than $800.

From the results, we see that a sample size of 1444 is needed for 99% confidence with a maximum error of $800. This value is slightly different from the text's because of the rounding of Z*.

Estimating a Mean When σ Is Not Known

=TINV(probability, degrees of freedom): This calculates the t value of Student's t distribution for a given probability and degrees of freedom (n-1). This function assumes a two-tailed (“not equal”) alternative hypothesis or a two-sided confidence interval.
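Python's standard library has no Student's t distribution, so the sketch below, a stand-in rather than Excel's internal algorithm, computes the two-tailed probability through the regularized incomplete beta function, P(|T| > t) = I(ν/(ν + t²); ν/2, 1/2), and inverts it by bisection. `tdist` and `tinv` are hypothetical names chosen to mirror TDIST and TINV, including TDIST's refusal of a negative t statistic described next. For a one-tailed critical value at level α, pass 2α to `tinv`, since a TINV-style function treats its probability as two-tailed.

```python
import math


def _betacf(a: float, b: float, x: float) -> float:
    """Continued fraction for the regularized incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, 301):
        m2 = 2 * m
        # Even step, then odd step, of the continued fraction.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < 1e-15:  # last factor ~1 means converged
            break
    return h


def _betai(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def tdist(t: float, df: float, tails: int) -> float:
    """Tail probability of Student's t, like Excel's TDIST(t, df, tails)."""
    if t < 0:
        raise ValueError("t must be non-negative; wrap it in abs(), as with ABS() in Excel")
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    two_tailed = _betai(df / 2.0, 0.5, df / (df + t * t))
    return two_tailed if tails == 2 else two_tailed / 2.0


def tinv(p: float, df: float) -> float:
    """Positive t whose two-tailed probability is p, like Excel's TINV(p, df)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    lo, hi = 0.0, 1.0
    while tdist(hi, df, 2) > p:  # widen the bracket until it contains the answer
        hi *= 2.0
    for _ in range(100):         # bisection: the tail probability falls as t grows
        mid = (lo + hi) / 2.0
        if tdist(mid, df, 2) > p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


print(round(tinv(0.05, 24), 4))  # two-sided 95% interval, n = 25
print(round(tinv(0.10, 24), 4))  # one-tailed critical value at alpha = .05
```

If SciPy is available, `scipy.stats.t.ppf` and `scipy.stats.t.sf` give the same quantities; the hand-rolled version only keeps the sketch dependency-free.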
=TDIST(t statistic, degrees of freedom, tails): This calculates the tail probability of a t test statistic based on the degrees of freedom (n-1) and the number of tails (1 or 2). Note that the t statistic must not be negative. If you use this function in a situation where the value could be negative, use the ABS() function to make it positive; otherwise Excel returns an error.

Note: TINV always treats the probability you supply as two-tailed, splitting it equally between the two tails. Therefore, for a one-tailed test at significance level α, enter 2α as the probability so that the area in the single tail is α. TDIST, on the other hand, handles either a one-tailed or a two-tailed test; you indicate which with its final argument.

The following graphic demonstrates how to set up a spreadsheet to calculate the t value needed to create a confidence interval based on the sample size (n) and significance level (α):

The following spreadsheet will calculate the entire confidence interval automatically:

The next spreadsheet shows the results of creating a 95% confidence interval for the average premium paid for family health insurance, using the t distribution and the data in Example 8-5 on page 378 in your text. The results show that we are 95% confident that the true mean premium paid for family health insurance coverage lies in the interval (6269.78, 6930.22). There is a slight difference from the confidence interval in the textbook because Excel uses a more precise value of t.

Alternatively, if you would like Excel to perform all the calculations from the actual data, the following spreadsheet is set up to calculate everything, including n, the mean, and the standard deviation. The formulas for n, mean, and standard deviation in this sheet can accommodate any number of entries in column D because the range is open-ended. In other words, because (D:D) has neither a top nor a bottom, every number in the column is included in the calculation.
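The raw-data sheet can be mirrored in a few lines of Python. This is a hedged sketch: `column_summary` and `t_interval` are hypothetical helpers, the column contents are made up, and skipping text and blank cells imitates how Excel's COUNT, AVERAGE, and STDEV treat a whole-column reference like D:D (the exact formulas in the sheet shown may differ). The t critical value is passed in rather than computed, as if copied from a worksheet cell containing =TINV(α, n-1). The summary inputs 6600, 800, and 25 are an assumption consistent with the Example 8-5 interval (6269.78, 6930.22) quoted above, not figures taken from the text; 2.063899 is the two-tailed .05 value of t with 24 degrees of freedom.

```python
import math
import statistics


def column_summary(column):
    """Return (n, mean, sample standard deviation) of the numeric cells in a column.

    Text labels, blanks (None or "") and True/False are skipped, imitating how
    COUNT, AVERAGE and STDEV treat an open-ended reference such as D:D.
    """
    values = [v for v in column
              if isinstance(v, (int, float)) and not isinstance(v, bool)]
    # statistics.stdev uses the n - 1 divisor, like Excel's STDEV.
    return len(values), statistics.mean(values), statistics.stdev(values)


def t_interval(mean, s, n, t_crit):
    """Return (lower, upper) = mean -/+ t_crit * s / sqrt(n).

    t_crit is supplied by the caller, e.g. copied from a cell holding =TINV(alpha, n-1).
    """
    margin = t_crit * s / math.sqrt(n)
    return mean - margin, mean + margin


# Assumed summary values consistent with the Example 8-5 interval.
low, high = t_interval(6600, 800, 25, t_crit=2.063899)
print(round(low, 2), round(high, 2))

# Hypothetical column D: a header label, numbers, and empty cells.
n, mean, s = column_summary(["Data", 1, 2, None, 3, "", 4, 5])
print(n, mean, round(s, 4))
```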
The following example uses the normally distributed data from Exercise 8-47 on page 381 in your text.

Finally, you can also use the Data Analysis ToolPak to calculate a confidence interval from a sample. This works only for the t distribution; it is not available for the earlier confidence intervals that used the z distribution.

To use the Data Analysis ToolPak, enter the data in a column as shown in the previous example. Select Data Analysis from the Analysis group in the Data tab. (If it is not there, you will need to add it as described on the first page of Chapter 2.) Select Descriptive Statistics from the list and click OK. Click the red arrow in the Input Range option and select the list of data in your spreadsheet. If you have a label in the first cell of the input range, be sure to check the appropriate box. Check Summary Statistics and Confidence Level for Mean, and enter the desired confidence level, 100(1-α). Click OK.

The result is placed in a separate worksheet. The value reported as the Confidence Level is the margin of error, so it must be subtracted from and added to the sample mean to obtain the interval. The output for Exercise 8-47 on page 381, a normally distributed data set, is shown below:

Hence, the 95% confidence interval for the true mean of this data set is 1.409 ± 4.810 = (-3.401, 6.219).

Estimation of a Population Proportion: Large Samples

You can set up a spreadsheet similar to the one above to calculate these values for a proportion automatically. Cells B3, B4, and B9 in the spreadsheet below are the input areas. One notable difference between this sheet and the previous example is the calculation of the margin of error. In this sheet, the formula Z-Critical * Standard Error (=B11*B6) is used instead of the CONFIDENCE() function, because CONFIDENCE() is designed for population means.
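The proportion sheet's arithmetic can be sketched in Python under stated assumptions: `proportion_interval` is a hypothetical name, the 440 successes out of n = 800 are made-up inputs, and `NormalDist().inv_cdf` plays the role of the Z-critical cell. The margin is computed as Z-critical × standard error, matching the =B11*B6 approach described above.

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval(successes: int, n: int, alpha: float) -> tuple:
    """Large-sample (lower, upper) confidence limits for a population proportion."""
    p_hat = successes / n                     # sample proportion
    se = sqrt(p_hat * (1 - p_hat) / n)        # standard error of p-hat
    z = NormalDist().inv_cdf(1 - alpha / 2)   # positive Z-critical value
    margin = z * se                           # Z-critical * standard error, like =B11*B6
    return p_hat - margin, p_hat + margin


# Hypothetical sample: 440 successes out of n = 800, 95% confidence.
low, high = proportion_interval(440, 800, 0.05)
print(round(low, 3), round(high, 3))
```

Passing the standard error as σ with n = 1 to a CONFIDENCE-style formula z·σ/√n would return the same margin, because dividing by √1 changes nothing.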
The CONFIDENCE() function can still be used with one modification: enter the standard error as the standard deviation argument and a sample size of n = 1. This works because the standard error already has the sample size built into it, and dividing by √1 leaves it unchanged.

The spreadsheet illustrated enters the number of successes in B4 as a formula, x = n·p̂, the product of the sample size and the sample proportion. The number of successes (B4) may also be entered as a numerical value, such as 440:

The following sheet references Example 8-7 on page 384 of your text and calculates the confidence interval for the population proportion. In this example, the sheet estimates, with 95% confidence, the proportion of American adults who said owning a home is a very important part of the American dream. According to the results, the 95% confidence interval for the proportion of American adults who say that owning a home is a very important part of the American dream is (.509, .591).

Determining a Sample Size for Estimating a Population Proportion

Next we'll review two different Excel methods for determining the sample size needed to keep the margin of error at or below a specified maximum. The first uses the spreadsheet from the previous example along with a handy Excel feature called Goal Seek. Begin by entering an estimated sample size. Be sure a formula, rather than a typed value, is used to calculate x, the number of successes, in B4, so that x changes along with the sample size. The .5 in that formula represents p̂; a value of .5 is used because it produces the largest margin of error. Start B3 at an arbitrary value such as 200:

B3: 200
B4: =B3*0.5

Next, go to the Data tab and select Goal Seek from the What-If Analysis menu, as shown in the illustration below:

Enter the Margin of Error cell (B12) in the Set Cell entry. Enter .02 for the To Value entry to obtain the maximum margin of error requested in the text, and enter the sample size cell (B3) for the By Changing entry.
See the illustration below. When you click OK, Excel performs a numerical search to find the approximate sample size that produces the desired margin of error. The results for this scenario are shown below:

In this case, the margin of error is slightly above the accepted .02. Step up the sample size from 2357 and continue by trial and error, entering values one at a time. You will find that a sample size of 2401 is the first to yield a margin of error below .02 at the 95% confidence level, just as shown in Example 8-9.

As an alternative method, you can set up a spreadsheet to find the answer directly. The setup looks essentially the same as the sheet for means. The only difference is that instead of the standard deviation, the calculation uses the square root of the product of the probability of success and the probability of failure, pq or p(1-p), so the required sample size is n = z²pq/E². The spreadsheet should look like this:

To practice this, look at Example 8-9 on page 387. To estimate the number of subjects you need to survey to ensure the margin of error is no more than 2% (if the estimated proportion is .5 and the level of confidence is 95%), set up the spreadsheet as shown:

To obtain the sample size necessary to ensure the margin of error is at most 2%, round this value up to 2401.
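Both sample-size methods can be checked in Python. `sample_size_by_search` steps n upward one at a time, a brute-force stand-in for the Goal Seek plus trial-and-error procedure (Goal Seek's own numerical routine is different); `sample_size_formula` applies n = z²p(1 - p)/E² and rounds up. All function names are hypothetical; the inputs (E = .02, 95% confidence, p̂ = .5, starting the search at 2357) come from the discussion above.

```python
from math import ceil, sqrt
from statistics import NormalDist


def margin_for_proportion(n: int, alpha: float, p: float = 0.5) -> float:
    """Margin of error z(alpha/2) * sqrt(p(1-p)/n); p = 0.5 gives the largest margin."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * sqrt(p * (1 - p) / n)


def sample_size_by_search(max_error: float, alpha: float, p: float = 0.5, start: int = 1) -> int:
    """Trial and error: raise n by 1 until the margin is at most max_error.

    start should be a sample size whose margin is still too large.
    """
    n = start
    while margin_for_proportion(n, alpha, p) > max_error:
        n += 1
    return n


def sample_size_formula(max_error: float, alpha: float, p: float = 0.5) -> int:
    """Closed form n = z^2 * p * (1 - p) / E^2, rounded up to the next whole number."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return ceil(z ** 2 * p * (1 - p) / max_error ** 2)


print(sample_size_by_search(0.02, 0.05, start=2357))  # 2401
print(sample_size_formula(0.02, 0.05))                # 2401
```

The closed form gives about 2400.9 before rounding, which is why rounding up (not to the nearest whole number) is needed to keep the margin at or below .02.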