Excel Statistical Formulas

Some Statistical Procedures and Functions with Excel
Introductory Note:
Microsoft’s Excel spreadsheet provides both statistical procedures and statistical functions. The procedures
are accessed by clicking on Tools in the menu bar at the top of the Excel screen. From the Tools menu,
choose Data Analysis and from the menu presented, choose the appropriate procedure.
NOTE ABOUT THE DATA ANALYSIS TOOLS: Excel comes with the Data Analysis tools, but they are
packaged as an “Add-In,” the Analysis ToolPak. If you have never used the Data Analysis tools, you must first click on
Tools, then click on Add-Ins. Check the boxes for Analysis ToolPak and Analysis ToolPak - VBA. Then click on OK, and you will now be able to bring
up the Data Analysis tools.
Excel’s statistical functions are built-in formulas that carry out certain calculations. To use them, enter the
appropriate formula in a cell and give the formula all the arguments it requires. You may already be
familiar with some of these functions from accounting or finance courses, where you may have learned to
calculate present values, payments on a debt at a given interest rate, or the sum of a column of figures.
Descriptive Statistics:
• Choose Tools > Data Analysis. From the list which appears, choose Descriptive Statistics and click OK.
o A dialog box appears. Mark the range which contains the data, which you should have
previously entered. If there is a data label in the first row, mark that label and check the
box for “Labels in first row.” Indicate an output range in your worksheet by either
entering the address of the first cell or clicking on the cell. Check the box for “Summary
Statistics” and click OK.
o Output looks like this:
Scores
Mean                        28.2
Standard Error              7.144228
Median                      24
Mode                        #N/A
Standard Deviation          15.97498
Sample Variance             255.2
Kurtosis                    -0.40167
Skewness                    0.891063
Range                       38
Minimum                     14
Maximum                     52
Sum                         141
Count                       5
Confidence Level(95.0%)     19.8356
o Notes:
ƒ “Scores” is the label from the first line of the column containing data
ƒ Standard Error is the sample standard deviation divided by the square root of the
sample size
ƒ Standard Deviation is a sample value
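ƒ Confidence Level is the margin of error of the confidence interval, calculated
using the t distribution; that is, confidence level = t × sX̄, where t is the two-tailed t value
for 95% confidence with n − 1 degrees of freedom and sX̄ is the standard error
(here 2.7764 × 7.144228 = 19.8356)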
o A 95% confidence interval is implicit in this output: it is 28.2 ± 19.84
o To calculate an interval for a different confidence level: in the Descriptive Statistics
dialog box there is an entry for “Confidence Level for Mean.” Enter the confidence
level of the interval to be calculated there.
o For a hypothesis test, t = (X̄ - µ0)/sX̄. For the hypothesis test H0: µ ≤ 25 vs. H1: µ > 25,
for example, we would have t = (28.2 − 25)/7.144 = 0.4479. We could then use the
TDIST function to determine the p-value of the test.
• Excel also provides spreadsheet formulas for descriptive statistics. To use these, enter an = sign in
a cell, followed by the formula with the appropriate range designation.
o AVERAGE(RANGE) : returns the arithmetic mean
o STDEV(RANGE) : returns the sample standard deviation
o STDEVP(RANGE) : returns the population standard deviation
o VAR(RANGE) : returns the sample variance
o VARP(RANGE) : returns the population variance
o MEDIAN(RANGE)
o MODE(RANGE)
o COUNT(RANGE) : returns the number of cells in the range which contain numeric
data. Note that the COUNT function does not count blank cells or cells containing
alphabetic information (words).
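The figures in the Descriptive Statistics output can be cross-checked outside Excel. The Python sketch below is one way to do that, using only the standard library. The five scores are an assumption: the handout does not list the raw data, so these values were chosen because they reproduce the reported sum, median, range, and variance. The t value 2.776445 (what TINV(0.05, 4) would return) is taken from a standard t table rather than computed.

```python
import math
import statistics

# Assumed raw data: not given in the handout, chosen to match its summary output.
scores = [14, 15, 24, 36, 52]

n = len(scores)                       # COUNT
mean = statistics.mean(scores)        # AVERAGE
median = statistics.median(scores)    # MEDIAN
s = statistics.stdev(scores)          # STDEV  (sample standard deviation)
var = statistics.variance(scores)     # VAR    (sample variance)
sigma = statistics.pstdev(scores)     # STDEVP (population standard deviation)

std_error = s / math.sqrt(n)          # "Standard Error" in the output

# Two-tailed t value for 95% confidence with n - 1 = 4 degrees of freedom.
# Assumption: read from a t table, equivalent to TINV(0.05, 4).
t_crit = 2.776445
margin = t_crit * std_error           # "Confidence Level(95.0%)" in the output

# Test statistic for H0: mu <= 25 vs. H1: mu > 25
t_stat = (mean - 25) / std_error

print(f"mean={mean}, median={median}, s={s:.5f}, variance={var:.1f}")
print(f"standard error={std_error:.6f}, margin of error={margin:.4f}")
print(f"95% CI: {mean - margin:.2f} to {mean + margin:.2f}; t={t_stat:.4f}")
```

If the assumed scores are right, the printed values should agree with the output table to the number of digits shown there.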
Probability Functions in Excel
• Binomial Probabilities:
o BINOMDIST(x0, n, π, CUMULATIVE)
ƒ “CUMULATIVE” takes the values “TRUE” or “FALSE”; false returns the
probability of the individual number of successes, while true returns the value
P(x ≤ x0)
ƒ BINOMDIST(4, 12, .3, false) = 0.23114 is the probability of 4 successes in 12
trials with probability of success = 0.3 for each trial
ƒ BINOMDIST(4, 12, .3, true) = 0.723655 is the probability of 4 or fewer
successes in 12 trials
ƒ To work repeated problems, create a specialized worksheet. For example, in
cell A5, enter Prob x =, in cell B5 enter =binomdist(b2,b3,b4,false) and in cell
B6 enter =binomdist(b2,b3,b4, true)
• b2 is the entry cell for the number of successes, b3 for the number of
trials and b4 for π
• enter your own labels for cells a2 to a4 and a6
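For readers who want to see what BINOMDIST computes, here is a minimal Python sketch of the same calculation. The function name binomdist is simply chosen to mirror the Excel function; it is not part of any library.

```python
from math import comb

def binomdist(x, n, p, cumulative):
    """Mimic BINOMDIST: P(X = x) if cumulative is False, else P(X <= x)."""
    def pmf(k):
        # Probability of exactly k successes in n independent trials
        return comb(n, k) * p**k * (1 - p)**(n - k)
    if cumulative:
        return sum(pmf(k) for k in range(x + 1))
    return pmf(x)

# The two worked examples from the handout: 4 successes, 12 trials, pi = 0.3
print(round(binomdist(4, 12, 0.3, False), 5))   # handout reports 0.23114
print(round(binomdist(4, 12, 0.3, True), 6))    # handout reports 0.723655
```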
• Poisson Probabilities:
o POISSON(x, µ, CUMULATIVE)
ƒ “CUMULATIVE” takes the values “TRUE” and “FALSE”, for cumulative or
individual values
ƒ remember that Poisson probabilities depend entirely on the expected value µ
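The handout gives no numerical Poisson example, so the sketch below uses made-up inputs (µ = 2 and x = 3 are assumptions for illustration only). It shows, in Python, the calculation that POISSON performs; the function name poisson just mirrors the Excel one.

```python
from math import exp, factorial

def poisson(x, mu, cumulative):
    """Mimic POISSON: P(X = x) if cumulative is False, else P(X <= x)."""
    def pmf(k):
        # Probability of exactly k occurrences when mu are expected
        return exp(-mu) * mu**k / factorial(k)
    if cumulative:
        return sum(pmf(k) for k in range(x + 1))
    return pmf(x)

# Hypothetical inputs: an expected value of 2 occurrences per interval
print(poisson(3, 2, False))   # probability of exactly 3 occurrences
print(poisson(3, 2, True))    # probability of 3 or fewer occurrences
```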
• Exponential Probabilities:
o EXPONDIST(t0, r, CUMULATIVE)
ƒ “CUMULATIVE” will usually take the value “TRUE”
ƒ r is the rate of occurrence and t0 is the interval until first occurrence, thus this
formula returns P(t ≤ t0)
ƒ EXPONDIST(2,0.5,true) = 0.632121 is the probability that the first success will
occur within 2 minutes if the average rate of occurrence is 0.5 per minute
ƒ to find P(t > t0), enter =1 − EXPONDIST(t0, r, TRUE)
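The exponential formula is short enough to write out directly. The Python sketch below reproduces the EXPONDIST example above; the function name expondist is chosen here only to mirror Excel.

```python
from math import exp

def expondist(t, rate, cumulative):
    """Mimic EXPONDIST: P(T <= t) if cumulative is True, else the density at t."""
    if cumulative:
        return 1 - exp(-rate * t)
    return rate * exp(-rate * t)

within_2 = expondist(2, 0.5, True)   # first success within 2 minutes
after_2 = 1 - within_2               # P(t > 2), the complement

print(round(within_2, 6))            # handout reports 0.632121
print(round(after_2, 6))
```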
• Normal Probabilities:
o NORMDIST(x0, µ, σ, CUMULATIVE)
ƒ If “CUMULATIVE” has value “TRUE” this formula returns P(x ≤ x0) for the
normal distribution with given µ, σ
ƒ =normdist(20,25,5,true) = 0.1587 is the probability of values less than or equal
to 20 on a normal distribution with µ = 25 and σ = 5
o NORMINV(PROBABILITY, µ, σ)
ƒ this formula returns the x0 such that P(x ≤ x0) has the probability entered in the
formula
ƒ NORMINV(.975, 200, 20) = 239.2; on a normal distribution with mean 200 and
standard deviation 20, .975 of the distribution is less than 239.2
o NORMSDIST(z0): returns P(z ≤ z0)
o NORMSINV(PROBABILITY): returns z0 such that P(z ≤ z0) has the given probability
o To work repeated problems, create a specialized spreadsheet: for example, in cell A4
enter Prob (x <= x0); in cell B4, enter =NORMDIST(B6, B7, B8, TRUE). In A5 enter
Prob (x > x0) and in B5 enter =1-B4. Then enter an x value in B6, mean in B7, and
standard deviation in B8. You will of course want to enter labels in A6 to A8.
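Python's standard library has a NormalDist class whose cdf and inv_cdf methods play the roles of the four normal functions above. The sketch below wraps them under names that mirror the Excel functions (the wrapper names are my own, and normdist here always returns the cumulative value).

```python
from statistics import NormalDist

def normdist(x, mu, sigma):
    """Like NORMDIST(x, mu, sigma, TRUE): returns P(X <= x)."""
    return NormalDist(mu, sigma).cdf(x)

def norminv(probability, mu, sigma):
    """Like NORMINV: the x0 with P(X <= x0) equal to probability."""
    return NormalDist(mu, sigma).inv_cdf(probability)

def normsdist(z):
    """Like NORMSDIST: P(Z <= z) on the standard normal distribution."""
    return NormalDist().cdf(z)

def normsinv(probability):
    """Like NORMSINV: the z0 with P(Z <= z0) equal to probability."""
    return NormalDist().inv_cdf(probability)

print(round(normdist(20, 25, 5), 4))       # handout reports 0.1587
print(round(norminv(0.975, 200, 20), 1))   # handout reports 239.2
print(round(normsinv(0.975), 2))           # the familiar 1.96
```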
• t Distribution Probabilities:
o TDIST(t, degrees of freedom, tails)
ƒ t is a calculated value from the formula t = (X̄ − µ0)/sX̄ or other t formulas which
we will encounter
ƒ degrees of freedom will depend on the problem; in simple hypothesis tests, we
have df = n – 1
ƒ “tails” takes the value 1 or 2, depending on whether it’s a one-tailed or two-tailed test
ƒ the result of tdist is the probability of a t value as great as that actually obtained;
it is the area under the graph of the t distribution beyond the calculated value of
t. If we specify 1 for “tails,” it is the area in one tail beyond the calculated
value; if we specify 2 for “tails,” it is the area in the tails beyond ±t.
ƒ in hypothesis testing, the result of the TDIST formula is the p-value of the test.
ƒ TDIST(3.15, 9, 1) = 0.0059; TDIST(1.93, 22, 2) = 0.0666
o TINV(probability, degrees of freedom)
ƒ returns a t value with the specified probability split between the two tails
ƒ used for finding t values for use with confidence intervals
• TINV(0.05, 22) = 2.073875 gives the t value that would be used for
calculating a 95% confidence interval with a sample of n = 23
ƒ or for finding critical t values: for a two-tailed test, enter the significance level
for “probability”; for a one-tailed test, enter twice the significance level for
“probability”
• TINV(0.01, 44) = 2.692286 is the critical value for a two-tailed test at
1% significance with 44 degrees of freedom
• TINV(0.1, 26) = 1.705616 is the critical value for an upper one-tailed
test at 5% significance with 26 degrees of freedom; −1.705616 is the
critical value for a lower one-tailed test with same conditions
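Python's standard library has no t distribution, so the sketch below approximates TDIST by integrating the t density numerically (Simpson's rule) and approximates TINV by bisection on that result. This is an illustration of what the two functions return, not how Excel computes them; the function names and the step and iteration counts are my own choices.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_coef = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_coef) * (1 + x * x / df) ** (-(df + 1) / 2)

def tdist(t, df, tails, steps=2000):
    """Like TDIST: area beyond t in one tail, or beyond +/-t in two tails."""
    if t < 0:
        raise ValueError("t must be non-negative; pass abs(t), as in Excel")
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    # Composite Simpson's rule for the area between 0 and t (steps is even).
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    one_tail = 0.5 - total * h / 3
    return tails * one_tail

def tinv(probability, df):
    """Like TINV: the t whose two-tailed area equals probability."""
    lo, hi = 0.0, 100.0            # assumes the answer is below 100
    for _ in range(60):            # bisection: the tail area shrinks as t grows
        mid = (lo + hi) / 2
        if tdist(mid, df, 2) > probability:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(round(tdist(1.93, 22, 2), 4))   # two-tailed p-value
print(round(tinv(0.05, 22), 4))       # t for a 95% interval, n = 23
print(round(tinv(0.1, 26), 4))        # one-tailed critical value at 5%
```

For a one-tailed critical value, pass twice the significance level to tinv, exactly as with TINV.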
Sample Problems and Applications
Normal Probabilities and z tests:
• For a compact model of microwave oven, the average power used is 750 watts with standard
deviation 10 watts.
o What is the probability that a randomly selected oven uses less than 735 watts?
ƒ Solution: use NORMDIST(735, 750, 10, true)
o What proportion of these ovens draw more than 720 watts?
ƒ Solution: use NORMDIST(720, 750, 10, true). The result is the proportion that
use less than 720, and the required answer is 1 − that value, or
1 − NORMDIST(720, 750, 10, true)
o How much power do the lowest 25% of these ovens use?
ƒ Solution: use NORMINV(0.25,750,10). The result is the number of watts such
that 25% use that many or fewer watts.
o How much power do the highest 10% of these ovens use?
ƒ Solution: since 10% use more, 90% use less. Enter NORMINV(0.9,750,10).
The result is a wattage figure such that only 10% of the ovens use that much or
more.
o If we choose a sample of 25 of these ovens, what is the probability that the mean power
usage will be more than 755 watts?
ƒ Solution: This question refers to the distribution of sample means; that
distribution has µX̄ = 750; the relevant standard deviation is the standard error of
the mean σX̄. Calculate that value: σX̄ = σ/√n = 10/5 = 2. Then use the normdist
function: =1 − NORMDIST(755,750,2,TRUE)
o The thickness of steel plates is normally distributed with σ = 0.05 mm. For a sample of
30 plates, X̄ = 22 mm. Calculate a 95% confidence interval for the mean thickness of all
plates.
ƒ Solution: This problem requires the z values which demarcate the middle
95% of a normal distribution, with 2.5% in each tail. To find the appropriate
values, enter =NORMSINV(0.025) and/or =NORMSINV(0.975). The numbers
you get will have the same absolute value, 1.96. A general formula to find the z
value for a confidence interval, with the confidence level entered as a percentage,
would be =−NORMSINV((100 − confidence level)/200). The interval is then X̄ ± z × σ/√n.
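The normal-distribution problems above can be worked end to end in a few lines. The Python sketch below follows the same steps as the solutions, with NormalDist standing in for NORMDIST, NORMINV, and NORMSINV; the variable names are my own.

```python
from math import sqrt
from statistics import NormalDist

# Microwave ovens: mean 750 watts, standard deviation 10 watts
ovens = NormalDist(mu=750, sigma=10)
p_less_735 = ovens.cdf(735)            # NORMDIST(735, 750, 10, TRUE)
p_more_720 = 1 - ovens.cdf(720)        # 1 - NORMDIST(720, 750, 10, TRUE)
lowest_25 = ovens.inv_cdf(0.25)        # NORMINV(0.25, 750, 10)
highest_10 = ovens.inv_cdf(0.90)       # NORMINV(0.9, 750, 10)

# Mean of a sample of 25 ovens: standard error = sigma / sqrt(n) = 2
std_error = 10 / sqrt(25)
p_mean_over_755 = 1 - NormalDist(750, std_error).cdf(755)

# Steel plates: sigma = 0.05 mm, n = 30, sample mean 22 mm, 95% confidence
confidence = 95
z = -NormalDist().inv_cdf((100 - confidence) / 200)   # -NORMSINV(0.025)
margin = z * 0.05 / sqrt(30)
lower, upper = 22 - margin, 22 + margin

print(f"P(x < 735) = {p_less_735:.4f}, P(x > 720) = {p_more_720:.5f}")
print(f"25th percentile = {lowest_25:.2f}, 90th percentile = {highest_10:.2f}")
print(f"P(mean > 755) = {p_mean_over_755:.5f}")
print(f"z = {z:.4f}, interval = {lower:.4f} to {upper:.4f} mm")
```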
Probabilities and Hypothesis Tests with the t Distribution:
o The amounts customers spend at Ye Olde Antique Barne are skewed upwards. In a
sample of 57 customers we find a sample mean of $312 with standard deviation $70.
Find a 90% confidence interval for the average spending at YOAB.
ƒ Solution: This problem requires the use of t values. To find the correct t-value
enter TINV(0.1, degrees of freedom). A general formula would be
=TINV((100-confidence level)/100, df). Notice that the TINV function uses
both tails of the distribution.
o Use the information from the preceding problem to test the hypotheses H0: µ ≥ 320 vs.
H1: µ < 320. Use 5% significance level.
ƒ Solution: Calculate the t value t = (X̄ − µ0)/sX̄; in this case sX̄ = 70/7.5498 =
9.2717, so t = (312 − 320)/9.2717 = −0.8628.
• p-value approach: use TDIST to find the p-value: enter
TDIST(0.8628,56,1). The result is the probability of a t value as large
as or larger than 0.8628, and that is the p-value of the test. (We
actually want the probability of values as small as or smaller than
−0.8628, but by the symmetry of the t distribution, that is the same.)
o NOTE: the t value entered must be a positive number. If you
are setting up a spreadsheet to do a number of these problems,
use the ABS, or absolute value, function. For example, the
expression TDIST(ABS(B4),B5-1,1) would give the p-value
in a one-tailed test for the t value entered in cell B4 and
degrees of freedom equal to the sample size, entered in cell
B5, minus 1.
• Critical value approach. Use TINV to find the critical values. Enter
TINV(0.1, 56). The result is the critical value for an upper one-tailed
test at 5% significance. Notice that to find critical values for a one-tailed test, we must enter TWO TIMES the significance level.
o For a lower one-tailed test, such as this one, we use the same procedure but
append a minus sign to the critical t value.
o For a two-tailed test, we enter the significance level of the test
as the probability.
o Copper tubing must have an average diameter of 0.575 in; diameters are known to be
normally distributed. In a sample of 20 sections of pipe, the mean diameter is 0.569 in
with standard deviation 0.04 in. At 5% significance level, does the tubing meet the
standard?
ƒ Solution: this is a hypothesis test of H0: µ = 0.575 vs. H1: µ ≠ 0.575. The t
statistic = -0.6708.
• p-value: use TDIST(0.6708,19,2); this will give the p-value of the test.
• critical value: to find the critical values, enter TINV(0.05,19). The
result is 2.0930, and the decision rule is Reject H0 if t > +2.0930 or if t
< −2.0930.
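The t statistics in the two problems above come from the same formula, t = (X̄ − µ0)/sX̄ with sX̄ = s/√n. The Python sketch below recomputes them and applies the two-tailed decision rule to the copper tubing problem. The helper name t_statistic is my own, and the critical value 2.0930 is copied from the TINV(0.05, 19) result quoted above rather than computed.

```python
from math import sqrt

def t_statistic(xbar, mu0, s, n):
    """t = (sample mean - hypothesized mean) / (s / sqrt(n))."""
    return (xbar - mu0) / (s / sqrt(n))

# Ye Olde Antique Barne: n = 57, mean $312, s = $70, H0: mu >= 320
t_yoab = t_statistic(312, 320, 70, 57)

# Copper tubing: n = 20, mean 0.569 in, s = 0.04 in, H0: mu = 0.575
t_copper = t_statistic(0.569, 0.575, 0.04, 20)

# Two-tailed test at 5% significance with 19 degrees of freedom.
# Critical value as reported in the handout for TINV(0.05, 19).
critical = 2.0930
reject_h0 = abs(t_copper) > critical

print(f"YOAB t = {t_yoab:.4f}")
print(f"copper t = {t_copper:.4f}, reject H0: {reject_h0}")
```

Since |t| for the copper tubing is well below 2.0930, the sketch reports that H0 is not rejected, which matches the decision rule stated in the solution.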