Standard Deviation (s)

advertisement
AP Statistics
1.2 Measuring Spread
Variance and Standard Deviation
Objectives: Given a data set, compute the standard deviation and variance as measures of spread.
Give two reasons why we use squared deviations rather than just average deviations
from the mean.
Explain what is meant by degrees of freedom.
Identify situations in which the standard deviation is the most appropriate measure of
spread and situations in which the IQR is the most appropriate measure.
Standard Deviation (s):
The five-number summary is not the most common numerical description of a distribution.
That belongs to the combination of the mean to measure center and the standard deviation to
measure spread. The standard deviation measures spread by looking at:
How far the observations are from their mean.
The Variance s2 and Standard Deviation s
The variance s2 of a set of observations is the average of the squares of the deviations of the
observations from their mean. In symbols, the variance of n observations x1, x2, …, xn is :
Or, more compactly:
The standard deviation s is the square root of the variance s2:
In words: The square root of the average squared deviation from the mean
Some deviations will be positive and some negative because some of the observations fall on each
side of the mean. In fact the sum of the deviations of the observations from their mean will always be
zero.
Squaring the deviations makes them all positive.
The Variance is the average squared deviation, therefore s2 and s will be large if the observations are
widely spread about their mean, and small if the observations are all close to the mean.
Example 1:
Radon is a naturally occurring gas and is the second leading cause of lung cancer in the United
States. It comes from the natural breakdown of uranium in the soil and enters buildings through
cracks and other holes in the foundations. Found throughout the United States, levels vary
considerably from state to state. There are several methods to reduce the levels of radon in your
home, and the Environmental Protection Agency recommends using one of these if the measured
level in your home is above 4 picocuries per liter. Four readings from Franklin County, Ohio, where
the county average is 9.32 picocuries per liter, were 5.2, 13.8, 8.6 and 16.8.
a) Find the mean step-by-step.
b) Find the standard deviation step-by-step.
Finally, let’s find the standard deviation s:
c) Now enter the data into your calculator and use the mean and standard deviation buttons to
obtain x̅ and s. Do the results agree with your hand calculations?
STAT  Edit  enter your data into L3  STAT  CALC  1-Var Stats  See your mean and
Standard Deviation.
Degrees of Freedom: the number n – 1 is called the degrees of freedom of the variance or
standard deviation. Many calculators offer a choice between dividing by n and dividing by
n – 1, so be sure to use n – 1.
Properties of the Standard Deviation

s measures spread about the mean and should be used only when: the mean is
chosen as the measure of center.

s = 0 only when there is no spread/variability. This happens only when all
observations have the same value. Otherwise, s > 0. As the observations become
more spread out about their mean, s gets larger.

s, like the mean x̅ , is NOT RESISTANT. A few outliers can make s very large.
The use of squared deviations renders s even more sensitive than x̅ to a few extreme
observations.
Choosing a Summary:
The five-number summary is usually better than the mean and standard deviation for describing a
skewed distribution or a distribution with strong outliers. Use x̅ and s only for reasonably symmetric
distributions that are free of outliers.
Example for choosing summaries:
Stemplots of annual returns for stocks and Treasury bills, 1950 to 2003. (a) Stock returns, in
whole percents. (b) Treasury bill returns, in percents and tenths of percent. Which one would you
use s and x̅ , and which one would you use the five-number summary?
You see that returns on Treasury bills have a right skewed distribution. Convention in the
financial world calls for x̅ and s because some parts of investment theory use them. For
describing this right-skewed distribution, however, the five-number summary would be more
informative.
Remember! The choice of using mean-based (mean, standard deviation) or median-based
(median, five-number summary, IQR) statistics to describe a distribution should be based on
the SHAPE of the distribution!
HW: pg. 89-90; 1.39 – 1.44
Download