Standard deviation

MAT 1000
Mathematics in Today's World
Last Time
1. Three keys to summarize a collection
of data: shape, center, spread.
2. We can measure spread with the five-number summary.
3. The five-number summary can be
represented visually by a boxplot,
which is useful for making
comparisons between distributions.
Today
Another measurement for the spread of a distribution:
the standard deviation.
For a distribution of the right shape, the mean and
standard deviation together give us more information
than the whole five-number summary.
Distributions with this special shape are called normal
distributions, and they are very common.
Standard deviation
Spread should describe how widely data values are
dispersed about the center.
Finding the standard deviation uses the mean 𝑥̄ as the
center.
The standard deviation is the “average” of the
distance of each data value 𝑥ᵢ from the mean 𝑥̄.
Standard deviation
The standard deviation can be either a parameter
or a statistic.
Parameter: 𝜎
(This is a Greek letter. It is pronounced “sigma.”)
Statistic: 𝑠
In what follows, we will assume we are computing
a statistic 𝑠.
Standard deviation
Example
Let’s find the standard deviation of 7, 8, 11, 14
(assuming these are from a sample).
First, find the mean:
$$\bar{x} = \frac{7 + 8 + 11 + 14}{4} = \frac{40}{4} = 10$$
Standard deviation
Now we make a table:
𝑥ᵢ     𝑥ᵢ − 𝑥̄           (𝑥ᵢ − 𝑥̄)²
7      7 − 10 = −3      (−3)² = 9
8      8 − 10 = −2      (−2)² = 4
11     11 − 10 = 1      1² = 1
14     14 − 10 = 4      4² = 16
Add up all the numbers in the last column: 9 + 4 + 1 + 16 = 30.
Standard deviation
We divide that sum by one less than the number of
data values.
Remember the data set is 7, 8, 11, 14. This has 4
values, so we divide the sum 30 by 4 − 1 = 3:
$$\frac{30}{3} = 10$$
This number is called the variance.
Standard deviation
The standard deviation is the square root of the
variance.
$$s = \sqrt{10} \approx 3.16$$
Standard deviation
Let’s review the steps we took, using 𝑥̄ for the mean
and 𝑥₁, 𝑥₂, …, 𝑥ₙ for the 𝑛 data values.
1. Find the difference between each data value and
the mean: 𝑥ᵢ − 𝑥̄
2. Square each of these numbers: (𝑥ᵢ − 𝑥̄)²
3. Add all of these up:
$$(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2$$
4. Divide by 𝑛 − 1, and take the square root:
$$\sqrt{\frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n - 1}}$$
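The four steps above can be sketched in Python. This is a minimal illustration rather than part of the original slides: the function name `sample_std` is our own, and the standard library's `statistics.stdev` is used only as a cross-check.

```python
import math
import statistics


def sample_std(data):
    """Sample standard deviation, following the four steps above."""
    n = len(data)
    mean = sum(data) / n
    # Step 1: difference between each data value and the mean
    deviations = [x - mean for x in data]
    # Step 2: square each difference
    squared = [d ** 2 for d in deviations]
    # Step 3: add up the squares
    total = sum(squared)
    # Step 4: divide by n - 1 and take the square root
    return math.sqrt(total / (n - 1))


data = [7, 8, 11, 14]
s = sample_std(data)
print(round(s, 2))  # 3.16
# Cross-check against the standard library's sample standard deviation
print(math.isclose(s, statistics.stdev(data)))  # True
```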
Standard deviation
This is the formula for the standard deviation you will
be given on tests:
$$s = \sqrt{\frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n - 1}}$$
The key is to remember the steps this formula
describes.
Standard deviation
Notes
We divide by 𝑛 − 1 because we are computing a
statistic from a sample (the reason is subtle but
important). If we were finding a parameter for a whole
population, we would divide by 𝑛.
If the data values have units, then the mean and
standard deviation have the same units.
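As a rough illustration of the difference between dividing by 𝑛 − 1 and dividing by 𝑛, Python's standard `statistics` module offers both versions: `stdev` (sample, divides by 𝑛 − 1) and `pstdev` (population, divides by 𝑛). Reusing the data 7, 8, 11, 14 from the earlier example is our own choice.

```python
import statistics

data = [7, 8, 11, 14]

# Treating the data as a sample: divide by n - 1 (the statistic s)
s = statistics.stdev(data)
# Treating the data as a whole population: divide by n (the parameter sigma)
sigma = statistics.pstdev(data)

print(round(s, 2))      # 3.16
print(round(sigma, 2))  # 2.74
```

Dividing by the smaller number 𝑛 − 1 always gives a slightly larger result than dividing by 𝑛.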
Standard deviation
How should we interpret the standard deviation?
If the standard deviation is 0, then there is no
deviation from the mean (all the data values are equal).
Otherwise, the standard deviation will be positive.
The larger the value of the standard deviation, the
more spread out the data.
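A quick way to see this is to compare a few small data sets. The three lists below are made up for illustration; each has mean 5.

```python
import statistics

all_equal = [5, 5, 5, 5]  # no deviation from the mean
tight = [4, 5, 5, 6]      # values close to the mean
wide = [1, 5, 5, 9]       # values far from the mean

print(statistics.stdev(all_equal))        # 0.0
print(round(statistics.stdev(tight), 2))  # 0.82
print(round(statistics.stdev(wide), 2))   # 3.27
```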
Five-number summary and standard deviation
We have two ways to measure the center and spread
of a distribution:
1. The five-number summary
2. The mean and standard deviation.
If the data is symmetric without many outliers, we will
see that the mean and standard deviation give lots of
information.
If the data is not very symmetric, or has lots of
outliers, the five-number summary is best.
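One reason outliers matter is that the mean and standard deviation react strongly to them, while the median (part of the five-number summary) does not. The sketch below adds a single made-up outlier, 100, to the earlier data set; the outlier is an assumption for illustration only.

```python
import statistics

data = [7, 8, 11, 14]
with_outlier = data + [100]  # hypothetical outlier added for illustration

for label, values in [("original", data), ("with outlier", with_outlier)]:
    median = statistics.median(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    print(f"{label}: median={median:.1f}, mean={mean:.1f}, sd={sd:.2f}")
# original: median=9.5, mean=10.0, sd=3.16
# with outlier: median=11.0, mean=28.0, sd=40.34
```

The median moves only a little, while the mean nearly triples and the standard deviation grows more than tenfold.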
Normal distributions
The goal is to summarize large data sets.
For a one-number summary, measures of center like
mean or median are the best we have, but no
one-number summary is very informative.
It may be surprising, but for a large group of
commonly occurring distributions, a two-number
summary can be quite informative.
These distributions are called normal distributions.
Normal distributions
Normal distributions all have a particular shape: fairly
symmetric, one peak, few outliers, and a characteristic
“bell” shape.
The shape is easier to see with a smooth curve…
Normal distributions
[Figure sequence, shown twice: the distribution as a histogram, as a smooth curve, and both at once.]
Normal distributions
If we know the mean 𝑥̄ and the standard deviation
𝑠 of a normal distribution, we can get lots of
information.
We can get (close to) Q1, the median, and Q3.
Since normal distributions are very symmetric, the
median is very close to 𝑥̄.
Normal distributions
What about the first and third quartiles of a normal
distribution?
The first quartile Q1 is: 𝑥̄ − 0.67𝑠
In words, multiply 𝑠 by 0.67, then subtract that from 𝑥̄.
The third quartile Q3 is: 𝑥̄ + 0.67𝑠
In words, multiply 𝑠 by 0.67, then add that to 𝑥̄.
Normal distributions
Example The heights of men in the US are normally
distributed with mean 69.3 in. (5′ 9") and standard
deviation 2.9 in. (notice the unit of the standard
deviation is in.). Find the median, Q1, and Q3.
The median height is equal to the mean, 69.3 in.
Q1 is 69.3 − 0.67(2.9) ≈ 67.4 in. ≈ 5′ 7"
Q3 is 69.3 + 0.67(2.9) ≈ 71.2 in. ≈ 5′ 11"
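The arithmetic in this example can be checked with a short Python sketch. The cross-check uses `statistics.NormalDist` from the standard library (Python 3.8 or later), which is our addition and not part of the slides; it computes the quartiles of an ideal normal distribution, where the multiplier is closer to 0.6745 than 0.67.

```python
from statistics import NormalDist

mean, sd = 69.3, 2.9  # heights of US men in inches, from the example

# Rule of thumb: the quartiles are about 0.67 standard deviations from the mean
q1 = mean - 0.67 * sd
q3 = mean + 0.67 * sd
print(round(q1, 2), round(q3, 2))  # 67.36 71.24

# Cross-check with the quartiles of an ideal normal distribution
heights = NormalDist(mu=mean, sigma=sd)
print(round(heights.inv_cdf(0.25), 2), round(heights.inv_cdf(0.75), 2))  # 67.34 71.26
```

The two methods agree to within a few hundredths of an inch, so the 0.67 rule is accurate enough for hand calculation.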
Normal distributions
Remember that 25% of the data is below Q1, and 75%
is below Q3.
It follows that 50% of the data is between
Q1 and Q3.
So in a normal distribution, the middle 50% of the
data is between: 𝑥̄ − 0.67𝑠 and 𝑥̄ + 0.67𝑠
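This claim can be checked by simulation. The sketch below generates normally distributed values with Python's `random` module and counts how many fall within 0.67 standard deviations of the mean. The seed, the sample size, and the choice of mean 69.3 and standard deviation 2.9 are all assumptions made for illustration.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable
# Simulated normal data; the mean and standard deviation are illustrative
data = [rng.gauss(69.3, 2.9) for _ in range(10_000)]

x_bar = statistics.mean(data)
s = statistics.stdev(data)

# Count the values between x_bar - 0.67s and x_bar + 0.67s
low, high = x_bar - 0.67 * s, x_bar + 0.67 * s
inside = sum(low <= x <= high for x in data)
fraction = inside / len(data)
print(f"{fraction:.1%} of the values lie within 0.67 standard deviations of the mean")
```

The printed percentage should come out close to 50%, though it varies slightly with the seed and sample size.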
Normal distributions
Remember: not every distribution is normal. Don’t use
the formulas 𝑥̄ − 0.67𝑠 and 𝑥̄ + 0.67𝑠 unless you
know the distribution is normal.
Normal distributions have a specific shape: symmetric,
one peak, few outliers, and no clusters.
How can you tell if a distribution is normal?
Look at a histogram or a stemplot!
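For a quick look at the shape of a data set, a rough text histogram is often enough. The sketch below is one simple way to draw one; the data values and the bin width of 2 are made up for illustration.

```python
from collections import Counter

# Hypothetical data set, made up for illustration
data = [61, 64, 65, 66, 66, 67, 67, 67, 68, 68, 68, 68, 69, 69, 69, 69, 69,
        70, 70, 70, 70, 71, 71, 71, 72, 72, 73, 74, 77]

bin_width = 2
# Assign each value to a bin labeled by its lower edge (60, 62, 64, ...)
counts = Counter((x // bin_width) * bin_width for x in data)

# Print one row of stars per bin, including empty bins
for lower in range(min(counts), max(counts) + bin_width, bin_width):
    print(f"{lower}-{lower + bin_width - 1}: {'*' * counts[lower]}")
```

Here the rows of stars rise to a single peak near the middle and fall off on both sides, which is the rough bell shape to look for.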