Measures of Variation:

Variance – spread of the data.
Range – difference between the largest and smallest values of the distribution
- indicates the variation between the smallest and largest entries, but does not
tell how much other values vary from one another.
Range = largest value – smallest value
Ex. 45, 72, 88, 91, 27, 11, 99, 66, & 10
Range = 99 – 10 = 89
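The range calculation above can be sketched in a few lines of Python. This is only an illustration; the function name value_range is our own and does not come from the textbook.

```python
# Data from the range example above.
data = [45, 72, 88, 91, 27, 11, 99, 66, 10]

def value_range(values):
    """Return the largest value minus the smallest value."""
    return max(values) - min(values)

print(value_range(data))  # 99 - 10 = 89
```

Notice that only two of the nine entries are used, which is why the range says nothing about how the other values vary.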
Look at the example on page 121 to see how the range can sometimes be deceiving.
Standard Deviation – measurement that will give you a better understanding of how the
data entries differ from the mean.
- formula differs depending on whether you are using an entire population or
just a sample.
Sample Standard Deviation: $s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$
where x is any entry in the distribution, x bar ($\bar{x}$) is the mean, and n is the number of entries.
*** Notice that the standard deviation uses the difference between each entry x and the
mean x bar. The quantity (x – x bar) will be negative if the mean is greater than the
entry. If you take the sum Σ (x – x bar) then the negative values will cancel the positive
values, leaving you with a variation measure of 0 even if some entries vary greatly from
the mean. Once the quantities become squared, the possibility of having some negative
values in the sum is eliminated.
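A quick way to see this cancellation is to compute both sums. The sketch below reuses the nine values from the range example (our choice of data, purely for illustration).

```python
data = [45, 72, 88, 91, 27, 11, 99, 66, 10]
mean = sum(data) / len(data)

# Differences between each entry and the mean; some are negative.
deviations = [x - mean for x in data]

# The plain sum cancels out to (essentially) zero ...
sum_of_deviations = sum(deviations)
# ... but the sum of the squared differences does not.
sum_of_squares = sum(d ** 2 for d in deviations)

print(sum_of_deviations)  # zero, up to tiny floating-point error
print(sum_of_squares)
```

The first number will be zero (or extremely close to it) for any data set, so it cannot serve as a measure of variation.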
*** If we were working with an entire population, we would divide by N, the population
size, and would thus have the mean of the values (x – μ)², where μ represents the mean of
the population.
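The only difference between the two versions is the number we divide by. Here is a minimal sketch of that idea; the function name and the population flag are our own, and the data are again the nine values from the range example.

```python
import math

def standard_deviation(values, population=False):
    """Standard deviation; divide by N for a population, n - 1 for a sample."""
    n = len(values)
    mean = sum(values) / n
    sum_of_squares = sum((x - mean) ** 2 for x in values)
    divisor = n if population else n - 1
    return math.sqrt(sum_of_squares / divisor)

data = [45, 72, 88, 91, 27, 11, 99, 66, 10]
s = standard_deviation(data)                       # treat data as a sample
sigma = standard_deviation(data, population=True)  # treat data as a population
print(s, sigma)
```

Because n – 1 is smaller than N, the sample version always comes out a little larger than the population version for the same numbers.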
We get our sample standard deviation formula from a formula for what we call the variance
of a sample, denoted by s²:
Sample Variance: $s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1}$
Look at both of these formulas above. What difference do you see?
When we are working by hand with problems involving standard deviation, it is a good
idea to break up our formula into steps and to create a table to aid us along the way.
The following example will show you how to break your formula up and will give you a
table to guide you along.
Ex. A random sample of seven New York plays gave the following information
about how long each play ran on Broadway (in days):
12, 45, 36, 118, 50, 7, 20
a) Find the range.
b) Find the sample mean.
c) Find the sample standard deviation.
Solution:
Part A is rather simple. We know our largest value is 118 and our smallest value is
7. If we substitute these into our range formula, we arrive at:
Range = largest value – smallest value
Range = 118 – 7 = 111 days
Part B is just asking for the sample mean. We add up all of our entries and divide
by the total number of entries. We then arrive at a sample mean of 41.14 days.
Part C is where it gets a little tricky. Let’s create a chart that breaks down the
standard deviation formula.
Length of Broadway Plays (in days):
x        | x – x bar            | (x – x bar)²
7        | 7 – 41.14 = -34.14   | 1165.54
12       |                      |
20       |                      |
36       |                      |
45       |                      |
50       |                      |
118      |                      |
Σx = 288 |                      | Σ(x – x bar)² = ______
We placed the entries in order in the x column and we took a sum of that column
and placed it at the bottom. Now, we are going to use our sample mean from part B and
use that to help us complete the x – x bar column. (One example is already completed!)
The last column will just be the result of column 2 squared. (We will round to the nearest
hundredth!) Also, calculate the sum of column 3.
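If you would like to check your chart after filling it in by hand, the same three columns can be built with a short loop. This is a sketch only; it copies the handout's rounding (mean rounded to 41.14, every entry rounded to the nearest hundredth), and the variable names are our own.

```python
plays = [7, 12, 20, 36, 45, 50, 118]  # entries placed in order, as in the chart
x_bar = round(sum(plays) / len(plays), 2)  # sample mean from Part B, rounded

rows = []
for x in plays:
    deviation = round(x - x_bar, 2)     # column 2: x - x bar
    squared = round(deviation ** 2, 2)  # column 3: column 2 squared
    rows.append((x, deviation, squared))

print(f"{'x':>5} {'x - x bar':>10} {'(x - x bar)^2':>14}")
for x, deviation, squared in rows:
    print(f"{x:>5} {deviation:>10.2f} {squared:>14.2f}")

sum_x = sum(x for x, _, _ in rows)                    # bottom of column 1
sum_squares = round(sum(sq for _, _, sq in rows), 2)  # bottom of column 3
print("Sum of x:", sum_x)
print("Sum of squares:", sum_squares)
```

The first row printed should match the row that is already completed in the chart.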
After we have completed this chart, we need to take care of the denominator of
our formula by figuring out what n is equal to.
n = ______
therefore
n – 1 = ______
We will now take our Σ(x – x bar)² = ______ and divide that by n – 1 = ______.
What is the result? ________
If we think about it, this answer only gives us a sample variance. What do you think we
should do to the result above to come up with the sample standard deviation? Why?
s = _______
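Once you have filled in the blanks yourself, the remaining steps can be checked with the sketch below. It follows the same order as the steps above and then compares the result with Python's built-in statistics.stdev, which also divides by n – 1. It uses the unrounded mean, so expect small differences in the last decimal place from hand work done with 41.14.

```python
import math
import statistics

plays = [12, 45, 36, 118, 50, 7, 20]  # days each play ran on Broadway

n = len(plays)
x_bar = sum(plays) / n                              # sample mean (not rounded)
sum_squares = sum((x - x_bar) ** 2 for x in plays)  # bottom of column 3
sample_variance = sum_squares / (n - 1)             # divide by n - 1
s = math.sqrt(sample_variance)                      # square root undoes the squaring

print("n - 1 =", n - 1)
print("sample variance =", round(sample_variance, 2))
print("sample standard deviation =", round(s, 2), "days")

# Cross-check against the standard library's sample standard deviation.
print("statistics.stdev =", round(statistics.stdev(plays), 2))
```

Taking the square root also puts the answer back in the original units (days) rather than days squared.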
Let’s go through the following examples to get a better sense of what we are trying to
accomplish:
1) Petroleum pollution in oceans is known to increase the growth of a certain
bacteria. Brian did a project for his ecology class for which he made a
bacteria count (per 100 milliliters) in nine random samples of sea water. His
counts gave the following readings:
17, 23, 18, 19, 21, 16, 12, 15, 18
a) Find the range.
b) Find the sample mean.
c) Find the sample standard deviation.
2) In the process of tuna fishing, porpoises are sometimes accidentally caught and
killed. A U.S. oceanographic institute wants to study the number of porpoises
killed. Records from eight commercial tuna fishing fleets gave the following
information about the number of porpoises killed in a three-month period:
6, 18, 9, 0, 15, 3, 10, 2
a) Find the range.
b) Find the sample mean.
c) Find the sample standard deviation.
2) The neighborhood association of Cherry Hills Village took a survey of
opinions about rent control in their neighborhood. In this opinion poll, 1 =
strongly against rent control and 10 = strongly in favor of rent control. A
random sample of 14 people gave the following opinions:
1, 1, 2, 1, 10, 1, 10, 10, 8, 10, 2, 10, 8, 1
a) Compute the range, sample mean, and sample standard deviation of
opinion ratings about rent control.
Another questionnaire asked for opinions about moving a mailbox from one
side of the street to another. Again a random sample of 15 people gave the
following opinions where 1 = strongly disagree and 10 = strongly agree:
5, 5, 5, 4, 5, 5, 5, 6, 5, 5, 6, 5, 6, 5
b) Compute the range, sample mean, and standard deviation of these
numbers.
c) Compare your answers for parts a and b. Were the means about the
same? Were the opinions on the two issues distributed differently?
How did the range and standard deviation reflect this when the mean
did not? Explain your answer!
3) Black Hole Pizza Parlor instructs its cooks to put a “handful” of cheese on each
large pizza. A random sample of six such handfuls was weighed. The weights
to the nearest ounce were:
3, 2, 3, 4, 3, 5
a) Find the mode, median, and mean weight of the handfuls of cheese.
b) Find the range and standard deviation of the weights.
c) A new cook used to play football and has large hands. His handful of
cheese weighs 6 ounces. Replace the 2-ounce data value with 6 ounces.
Recalculate the mode, median, and mean. Which average changed the
most? Comment on the changes!
4) City Hospital has a temporary shortage of nurses, so the nurses have been
working overtime. A random sample of six nurses reported that the overtime
hours they worked last week were: 7, 2, 4, 5, 4, 3
a) Compute the mode, median, and mean of the overtime hours.
b) Compute the range and standard deviation.
c) Suppose a recording error occurred, and the data value of 7 was
replaced by 2. Recompute the mode, median, and mean, and comment
on how changing the data affected these averages.
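For the problems that ask you to replace one data value and recompute the averages, a small helper like the one below can be used to check your hand work. It is only a sketch: the function name averages is our own, and it uses statistics.multimode (Python 3.8 or later), which returns a list because a data set can have more than one mode. The data shown are the cheese weights from the pizza problem.

```python
import statistics

def averages(values):
    """Return the mode(s), median, and mean of a list of numbers."""
    return {
        "mode": statistics.multimode(values),  # list of all most-common values
        "median": statistics.median(values),
        "mean": statistics.mean(values),
    }

cheese = [3, 2, 3, 4, 3, 5]
# Replace the 2-ounce value with the new cook's 6-ounce handful.
cheese_new_cook = [6 if w == 2 else w for w in cheese]

before = averages(cheese)
after = averages(cheese_new_cook)
for name in before:
    print(name, "before:", before[name], "after:", after[name])
```

Swapping in a different data list and replacement value lets you check the overtime-hours problem the same way.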
*** We can also use the calculator to help us in solving standard deviation problems! If
we create a list under the STAT menu, we see a chart beginning to develop. If we scroll
over to the second column (L2), we can tell the calculator exactly what we would like
this column to calculate (and so on with L3).
*** Notice that after we create a list, if we use the 1-VAR STATS function, the calculator
gives us the sample mean and sample standard deviation of our entries. Sx is the sample
standard deviation.
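If you do not have a calculator handy, a rough stand-in for the 1-VAR STATS summary can be sketched in Python. The function name and dictionary keys below are our own labels, not the calculator's exact display, and the sketch assumes the usual convention that Sx divides by n – 1 while the population version divides by N.

```python
import statistics

def one_var_stats(values):
    """Summary values similar to a calculator's one-variable statistics screen."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "sum_x": sum(values),
        "sum_x_squared": sum(x ** 2 for x in values),
        "Sx": statistics.stdev(values),        # sample standard deviation
        "sigma_x": statistics.pstdev(values),  # population standard deviation
    }

plays = [12, 45, 36, 118, 50, 7, 20]  # Broadway example from earlier
stats = one_var_stats(plays)
for label, value in stats.items():
    print(label, "=", value)
```

For the sample problems in this handout, the value to read off is Sx, not the population version.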