Some Statistical Procedures and Functions with Excel

Introductory Note: Microsoft’s Excel spreadsheet provides both statistical procedures and statistical functions. The procedures are accessed by clicking on Tools in the menu bar at the top of the Excel screen. From the Tools menu, choose Data Analysis, and from the menu presented, choose the appropriate procedure.

NOTE ABOUT THE DATA ANALYSIS TOOLS: Excel comes with the Data Analysis tool pack, but this tool pack is an “Add-In.” If you have never used the Data Analysis tools, you must first click on Tools, then click on Add-Ins. Click on the boxes for Analysis ToolPak and Analysis ToolPak - VBA so that a check appears in each box. Then click on OK, and you will be able to bring up the Data Analysis tools.

Excel’s statistical functions are built-in formulas that carry out certain calculations. To use them, enter the appropriate formula in a cell and give the formula all the arguments it requires. You may already be familiar with some of these functions from accounting or finance courses, where you may have learned to calculate present values, payments on a debt at a given interest rate, or the sum of a column of figures.

Descriptive Statistics:

• Choose Tools, then Data Analysis. From the list that appears, choose Descriptive Statistics and click OK.
   o A dialog box appears. Mark the range that contains the data, which you should have entered previously. If there is a data label in the first row, include that label in the range and check the box for “Labels in first row.” Indicate an output range in your worksheet by either entering the address of its first cell or clicking on that cell. Check the box for “Summary Statistics” and click OK.
   o Output looks like this:

         Scores
         Mean                        28.2
         Standard Error              7.144228
         Median                      24
         Mode                        #N/A
         Standard Deviation          15.97498
         Sample Variance             255.2
         Kurtosis                    -0.40167
         Skewness                    0.891063
         Range                       38
         Minimum                     14
         Maximum                     52
         Sum                         141
         Count                       5
         Confidence Level(95.0%)     19.8356

   o Notes:
         “Scores” is the label from the first line of the column containing data.
         Standard Error is the sample standard deviation divided by the square root of the sample size.
         Standard Deviation is a sample value.
         Confidence Level is the margin of error of a confidence interval, calculated using the t distribution; that is, confidence level = t × sX̄, where t is the two-tailed critical value for 95% confidence with n − 1 degrees of freedom and sX̄ is the standard error. Here, 2.7764 × 7.1442 ≈ 19.84.
   o A 95% confidence interval is implicit in this output: it is 28.2 ± 19.84.
         To calculate an interval for a different confidence level: in the Descriptive Statistics dialog box there is an entry for “Confidence Level for Mean.” This is the confidence level of the interval to be calculated.
   o For a hypothesis test, t = (X̄ − µ0)/sX̄. For the hypothesis test H0: µ ≤ 25 vs. H1: µ > 25, for example, we would have t = (28.2 − 25)/7.144 = 0.4479. We could then use the TDIST function to determine the p-value of the test.

• Excel also provides spreadsheet formulas for descriptive statistics. To use these, enter an = sign in a cell, followed by the formula with the appropriate range designation.
   o AVERAGE(RANGE) : returns the arithmetic mean
   o STDEV(RANGE) : returns the sample standard deviation
   o STDEVP(RANGE) : returns the population standard deviation
   o VAR(RANGE) : returns the sample variance
   o VARP(RANGE) : returns the population variance
   o MEDIAN(RANGE) : returns the median
   o MODE(RANGE) : returns the mode
   o COUNT(RANGE) : returns the number of cells in the range that contain numeric data. Note that the COUNT function does not count blank cells or cells containing alphabetic information (words).
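As a cross-check outside Excel, the summary statistics above can be reproduced with a short Python sketch. The handout does not list the underlying scores, so the five values below are an assumption: they were chosen because they reproduce the mean, median, variance, minimum, maximum, and sum shown in the output. The variable names are illustrative only.

```python
import math
import statistics

# ASSUMPTION: hypothetical data, reconstructed so that it matches the
# summary output shown above; the handout itself does not give the scores.
scores = [14, 15, 24, 36, 52]

n = len(scores)                            # COUNT
mean = statistics.mean(scores)             # AVERAGE
median = statistics.median(scores)         # MEDIAN
sample_sd = statistics.stdev(scores)       # STDEV  (divides by n - 1)
pop_sd = statistics.pstdev(scores)         # STDEVP (divides by n)
sample_var = statistics.variance(scores)   # VAR
pop_var = statistics.pvariance(scores)     # VARP
std_error = sample_sd / math.sqrt(n)       # "Standard Error" in the output

# Test statistic for H0: mu <= 25 vs. H1: mu > 25
mu_0 = 25
t_stat = (mean - mu_0) / std_error

print(f"n = {n}, mean = {mean}, median = {median}")
print(f"sample sd = {sample_sd:.5f}, sample variance = {sample_var:.1f}")
print(f"standard error = {std_error:.6f}, t = {t_stat:.4f}")
```

The p-value for this t statistic still has to come from a t distribution (TDIST in Excel), which the Python standard library does not provide directly.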
Probability Functions in Excel

• Binomial Probabilities:
   o BINOMDIST(x0, n, π, CUMULATIVE)
         “CUMULATIVE” takes the values “TRUE” or “FALSE”; FALSE returns the probability of the individual number of successes, while TRUE returns the value P(x ≤ x0).
         BINOMDIST(4, 12, .3, FALSE) = 0.23114 is the probability of exactly 4 successes in 12 trials with probability of success = 0.3 for each trial.
         BINOMDIST(4, 12, .3, TRUE) = 0.723655 is the probability of 4 or fewer successes in 12 trials.
         To work repeated problems, create a specialized worksheet. For example, in cell A5 enter Prob x =, in cell B5 enter =BINOMDIST(B2,B3,B4,FALSE), and in cell B6 enter =BINOMDIST(B2,B3,B4,TRUE).
            • B2 is the entry cell for the number of successes, B3 for the number of trials, and B4 for π
            • enter your own labels in cells A2 to A4 and A6

• Poisson Probabilities:
   o POISSON(x, µ, CUMULATIVE)
         “CUMULATIVE” takes the values “TRUE” and “FALSE”, for cumulative or individual values.
         Remember that Poisson probabilities depend entirely on the expected value µ.

• Exponential Probabilities:
   o EXPONDIST(t0, r, CUMULATIVE)
         “CUMULATIVE” will usually take the value “TRUE”.
         r is the rate of occurrence and t0 is the interval until first occurrence; thus this formula returns P(t ≤ t0).
         EXPONDIST(2, 0.5, TRUE) = 0.632121 is the probability that the first success will occur within 2 minutes if the average rate of occurrence is 0.5 per minute.
         To find P(t > t0), enter =1 − EXPONDIST(t0, r, TRUE).

• Normal Probabilities:
   o NORMDIST(x0, µ, σ, CUMULATIVE)
         If “CUMULATIVE” has the value “TRUE”, this formula returns P(x ≤ x0) for the normal distribution with the given µ and σ.
         NORMDIST(20, 25, 5, TRUE) = 0.1587 is the probability of values less than or equal to 20 on a normal distribution with µ = 25 and σ = 5.
   o NORMINV(PROBABILITY, µ, σ)
         This formula returns the x0 such that P(x ≤ x0) has the probability entered in the formula.
         NORMINV(.975, 200, 20) = 239.2; on a normal distribution with mean 200 and standard deviation 20, .975 of the distribution is less than 239.2.
   o NORMSDIST(z0): returns P(z ≤ z0)
   o NORMSINV(PROBABILITY): returns z0 such that P(z ≤ z0) has the given probability
   o To work repeated problems, create a specialized spreadsheet: for example, in cell A4 enter Prob (x <= x0); in cell B4 enter =NORMDIST(B6, B7, B8, TRUE). In A5 enter Prob (x > x0) and in B5 enter =1-B4. Then enter an x value in B6, the mean in B7, and the standard deviation in B8. You will of course want to enter labels in A6 to A8.

• t Distribution Probabilities:
   o TDIST(t, degrees of freedom, tails)
         t is a calculated value from the formula t = (X̄ − µ0)/sX̄ or from other t formulas which we will encounter.
         Degrees of freedom will depend on the problem; in simple hypothesis tests, we have df = n − 1.
         “tails” takes the value 1 or 2, depending on whether it is a one-tailed or two-tailed test.
         The result of TDIST is the probability of a t value as great as that actually obtained; it is the area under the graph of the t distribution beyond the calculated value of t. If we specify 1 for “tails,” it is the area in one tail beyond the calculated value; if we specify 2 for “tails,” it is the area in the two tails beyond ±t.
         In hypothesis testing, the result of the TDIST formula is the p-value of the test.
         TDIST(3.15, 9, 1) ≈ 0.0059; TDIST(1.93, 22, 2) = 0.0666
   o TINV(probability, degrees of freedom)
         Returns a t value with the specified probability split between the two tails.
         Used for finding t values for use with confidence intervals:
            • TINV(0.05, 22) = 2.073875 gives the t value that would be used for calculating a 95% confidence interval with a sample of n = 23
         or for finding critical t values: for a two-tailed test, enter the significance level for “probability”; for a one-tailed test, enter twice the significance level for “probability”:
            • TINV(0.01, 44) = 2.692286 is the critical value for a two-tailed test at 1% significance with 44 degrees of freedom
            • TINV(0.1, 26) = 1.705616 is the critical value for an upper one-tailed test at 5% significance with 26 degrees of freedom; −1.705616 is the critical value for a lower one-tailed test under the same conditions

Sample Problems and Applications

Normal Probabilities and z tests:

• For a compact model of microwave oven, the average power used is 750 watts with standard deviation 10 watts.
   o What is the probability that a randomly selected oven uses less than 735 watts?
         Solution: use NORMDIST(735, 750, 10, TRUE).
   o What proportion of these ovens draw more than 720 watts?
         Solution: use NORMDIST(720, 750, 10, TRUE). The result is the proportion that use less than 720 watts, and the required answer is 1 − that value, or 1 − NORMDIST(720, 750, 10, TRUE).
   o How much power do the lowest 25% of these ovens use?
         Solution: use NORMINV(0.25, 750, 10). The result is the number of watts such that 25% use that many or fewer watts.
   o How much power do the highest 10% of these ovens use?
         Solution: since 10% use more, 90% use less. Enter NORMINV(0.9, 750, 10). The result is a wattage figure such that only 10% of the ovens use that much or more.
   o If we choose a sample of 25 of these ovens, what is the probability that the mean power usage will be more than 755 watts?
         Solution: This question refers to the distribution of sample means; that distribution has µX̄ = 750, and the relevant standard deviation is the standard error of the mean, σX̄. Calculate that value: σX̄ = σ/√n = 10/5 = 2. Then use the NORMDIST function: =1 − NORMDIST(755, 750, 2, TRUE).

• The thickness of steel plates is normally distributed with σ = 0.05 mm. For a sample of 30 plates, X̄ = 22 mm. Calculate a 95% confidence interval for the mean thickness of all plates.
         Solution: This problem requires the z values that demarcate the middle 95% of a normal distribution, leaving 2.5% in each tail. To find the appropriate values, enter =NORMSINV(0.025) and/or =NORMSINV(0.975). The numbers you get will have the same absolute value, 1.96. With the confidence level expressed as a percentage, a general formula to find the z value for a confidence interval would be =−NORMSINV((100 − confidence level)/200). The interval is then X̄ ± z × σ/√n = 22 ± 1.96 × 0.05/√30 ≈ 22 ± 0.018 mm.

Probabilities and Hypothesis Tests with the t Distribution:

• The amounts customers spend at Ye Olde Antique Barne are skewed upwards. In a sample of 57 customers we find a sample mean of $312 with standard deviation $70. Find a 90% confidence interval for the average spending at YOAB.
         Solution: This problem requires the use of t values. To find the correct t value, enter TINV(0.1, degrees of freedom), with 57 − 1 = 56 degrees of freedom. A general formula would be =TINV((100 − confidence level)/100, df). Notice that the TINV function uses both tails of the distribution.

• Use the information from the preceding problem to test the hypotheses H0: µ ≥ 320 vs. H1: µ < 320. Use a 5% significance level.
         Solution: Calculate the t value t = (X̄ − µ0)/sX̄; in this case sX̄ = 70/√57 = 70/7.5498 = 9.2717, so t = (312 − 320)/9.2717 = −0.8628.
            • p-value approach: use TDIST to find the p-value: enter TDIST(0.8628, 56, 1). The result is the probability of a t value as large as or larger than 0.8628, and that is the p-value of the test. (We actually want the probability of values as small as or smaller than −0.8628, but by the symmetry of the t distribution, that is the same.)
               o NOTE: the t value entered must be a positive number. If you are setting up a spreadsheet to do a number of these problems, use the ABS, or absolute value, function. For example, the expression TDIST(ABS(B4), B5-1, 1) would give the p-value in a one-tailed test for the t value entered in cell B4 and degrees of freedom equal to the sample size, entered in cell B5, minus 1.
            • Critical value approach: use TINV to find the critical values. Enter TINV(0.1, 56). The result is the critical value for an upper one-tailed test at 5% significance. Notice that to find critical values for a one-tailed test, we must enter TWO TIMES the significance level.
               o For a lower one-tailed test, such as this one, we use the same procedure but put a minus sign in front of the critical t value.
               o For a two-tailed test, we enter the significance level of the test as the probability.

• Copper tubing must have an average diameter of 0.575 in; diameters are known to be normally distributed. In a sample of 20 sections of tubing, the mean diameter is 0.569 in with standard deviation 0.04 in. At the 5% significance level, does the tubing meet the standard?
         Solution: this is a hypothesis test of H0: µ = 0.575 vs. H1: µ ≠ 0.575. The t statistic is (0.569 − 0.575)/(0.04/√20) = −0.6708, with 19 degrees of freedom.
            • p-value: use TDIST(0.6708, 19, 2); this will give the p-value of the test.
            • critical value: to find the critical values, enter TINV(0.05, 19). The result is 2.0930, and the decision rule is: reject H0 if t > +2.0930 or if t < −2.0930. Since −0.6708 lies between the two critical values, H0 is not rejected.
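For readers who want to check the sample problems above without Excel, the following Python sketch mirrors the NORMDIST, NORMINV, and TDIST calls used in the solutions. It is only an illustration under stated assumptions: the normal calculations use the standard library's NormalDist, while t_pdf and t_tail are hypothetical helper names introduced here, not Excel or library functions. The t tail area is approximated by Simpson's rule, so it is a numerical stand-in for TDIST rather than Excel's own algorithm.

```python
import math
from statistics import NormalDist

# --- Normal problems (microwave ovens: mu = 750 W, sigma = 10 W) ---
ovens = NormalDist(mu=750, sigma=10)
p_below_735 = ovens.cdf(735)          # NORMDIST(735, 750, 10, TRUE)
p_above_720 = 1 - ovens.cdf(720)      # 1 - NORMDIST(720, 750, 10, TRUE)
watts_low_25 = ovens.inv_cdf(0.25)    # NORMINV(0.25, 750, 10)
watts_top_10 = ovens.inv_cdf(0.90)    # NORMINV(0.9, 750, 10)
# Mean of a sample of n = 25 ovens: standard error = 10 / sqrt(25) = 2
p_mean_above_755 = 1 - NormalDist(750, 10 / math.sqrt(25)).cdf(755)


# --- t distribution: hypothetical helpers standing in for TDIST ---
def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)


def t_tail(t, df, tails=1, steps=2000):
    """Approximate TDIST(|t|, df, tails) using Simpson's rule (steps must be even)."""
    t = abs(t)                        # like TDIST, work with a positive t
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    one_tail = 0.5 - total * h / 3    # area beyond |t| in one tail
    return tails * one_tail


# Antique-shop test: n = 57, mean 312, s = 70, H0: mu >= 320 (lower one-tailed)
t_shop = (312 - 320) / (70 / math.sqrt(57))
p_shop = t_tail(t_shop, df=56, tails=1)

# Copper tubing test: n = 20, mean 0.569, s = 0.04, H0: mu = 0.575 (two-tailed)
t_tube = (0.569 - 0.575) / (0.04 / math.sqrt(20))
p_tube = t_tail(t_tube, df=19, tails=2)

alpha = 0.05
for name, t, p in [("shop", t_shop, p_shop), ("tubing", t_tube, p_tube)]:
    decision = "reject H0" if p < alpha else "do not reject H0"
    print(f"{name}: t = {t:.4f}, p-value = {p:.4f}, {decision}")
```

Taking the absolute value inside t_tail plays the same role as wrapping the cell reference in ABS in the spreadsheet version.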