Goals

Measurements of Variance
§2.4 Goals:
- Measure fluctuations.
- Study the variance formula.
- Discuss how the data concentrate around the center.
Assignment
- Homework §2.4: #7, #13, #25, #33, #39
Suggested Exercises: §2.4: #1, #3, #9, #10
Measuring Variation in Data Sets
Measures of variation describe how strongly the data entries vary from
your chosen measure of center.
- Range
- Deviation from the mean
- Variance and Standard Deviation
In addition, to help identify unusual data values, use
- the Empirical Rule for symmetric, bell-shaped distributions.
- Chebychev’s Theorem for other shapes.
Deviations
You can compute the deviation from the mean for every data entry:
Deviation of data entry = data entry − mean
For a data set with mean = 24.5:

Data Entry   Deviation
    20         -4.5
    33          8.5
    18         -6.5
    24         -0.5
    25          0.5
    27          2.5
1. What is the total of all the deviations?
2. Is the mean of these deviations useful?
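One way to explore the two questions above is to compute the deviations directly. The following is a minimal Python sketch using the six data entries from the table; the variable names are illustrative and not part of the original notes.

```python
# Data entries from the table; variable names are illustrative assumptions.
data = [20, 33, 18, 24, 25, 27]

mean = sum(data) / len(data)             # 147 / 6 = 24.5
deviations = [x - mean for x in data]    # deviation = data entry - mean

total_deviation = sum(deviations)
mean_deviation = total_deviation / len(deviations)

# Print each entry next to its deviation, then the two summary values.
for x, d in zip(data, deviations):
    print(f"{x:>4} {d:>6.1f}")
print("total of deviations:", total_deviation)
print("mean of deviations:", mean_deviation)
```

The positive and negative deviations cancel, so the total (and therefore the mean) of the deviations carries no information about spread.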
Variance Formula
To overcome the problem that the deviations total zero, we use the
sum of squares of the deviations.
$SS_x = \sum (x - \mu)^2$ (for parameters)
or
$SS_x = \sum (x - \bar{x})^2$ (for statistics)
The average of the squared deviations is the variance.
$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$ (for parameters)
or
$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$ (for statistics)
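Dividing by N for a population but by n − 1 for a sample is for
technical reasons: the n − 1 divisor makes the sample variance an
unbiased estimator of the population variance.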
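To make the two variance formulas concrete, here is a short Python sketch that applies both to the six data entries from the Deviations example. Treating the same entries once as a population and once as a sample is an assumption made purely for illustration, and the variable names are hypothetical. The standard library's `statistics.pvariance` and `statistics.variance` serve only as a cross-check.

```python
import statistics

# Data entries from the Deviations example (mean = 24.5).
data = [20, 33, 18, 24, 25, 27]
n = len(data)
mean = sum(data) / n

# Sum of squares of the deviations.
ss = sum((x - mean) ** 2 for x in data)

population_variance = ss / n        # divide by N: data is the whole population
sample_variance = ss / (n - 1)      # divide by n - 1: data is a sample

print("sum of squares:", ss)
print("population variance:", population_variance)
print("sample variance:", sample_variance)

# Cross-check against the standard library.
print(statistics.pvariance(data), statistics.variance(data))
```

The sample variance is always the larger of the two, because the same sum of squares is divided by a smaller number.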
Standard Deviation
Taking the square root of the variance results in the
standard deviation.
- This is what is used instead of an "average deviation."
- This has the same units of measurement as the data entries.
Steps to find the standard deviation:
1. Find the mean of the data set.
2. Find the deviation for each entry.
3. Square each deviation.
4. Add to get the sum of squares.
5. Divide (by either N or n − 1) to get the variance.
6. Find the square root of the variance to get the standard
deviation.
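The six steps above can be sketched as a small Python function. The function name `standard_deviation` and its `sample` flag are hypothetical choices for this illustration, and the data are the six entries from the Deviations example.

```python
import math

def standard_deviation(data, sample=True):
    """Follow the six steps; assumes at least two entries when sample=True."""
    n = len(data)
    mean = sum(data) / n                        # 1. mean of the data set
    deviations = [x - mean for x in data]       # 2. deviation for each entry
    squared = [d ** 2 for d in deviations]      # 3. square each deviation
    sum_of_squares = sum(squared)               # 4. sum of squares
    divisor = n - 1 if sample else n            # 5. divide by n - 1 or N
    variance = sum_of_squares / divisor
    return math.sqrt(variance)                  # 6. square root of the variance

data = [20, 33, 18, 24, 25, 27]
print("sample s:", round(standard_deviation(data, sample=True), 2))
print("population sigma:", round(standard_deviation(data, sample=False), 2))
```

Keeping each step on its own line mirrors the hand calculation, which makes it easier to compare intermediate values with a worked table.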
Empirical Rules
Special case: With symmetric and bell-shaped data, approximately
- 68% of the data entries lie within one standard deviation of
the mean.
- 95% of the data entries lie within two standard deviations of
the mean.
- 99.7% of the data entries lie within three standard deviations
of the mean.
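The three percentages can be reproduced under the assumption that "symmetric and bell-shaped" is modeled by a normal distribution; that modeling choice is an assumption of this sketch, not a statement from the notes. The helper name `share_within` is hypothetical. It uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later).

```python
from statistics import NormalDist

# Standard normal model: mean 0, standard deviation 1 (an assumption).
standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.1%}")
```

Real data that are only roughly bell-shaped will match these shares only approximately.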
For data of any shape, including non-bell-shaped data, at least
$1 - \frac{1}{k^2}$
of the data entries lie within k standard deviations of the mean
(for k > 1). This is Chebychev’s Theorem.
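As a rough illustration of the bound, the following Python sketch evaluates 1 − 1/k² for a few values of k and compares it with the share of entries actually within k standard deviations in the six-entry data set from the Deviations example. The function names are hypothetical, and treating those entries as a population is an assumption made for the example.

```python
import statistics

def chebychev_bound(k):
    """Minimum share of entries within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k ** 2

def observed_share(data, k):
    """Share of entries within k population standard deviations of the mean."""
    mean = statistics.mean(data)
    sigma = statistics.pstdev(data)
    inside = [x for x in data if abs(x - mean) <= k * sigma]
    return len(inside) / len(data)

data = [20, 33, 18, 24, 25, 27]
for k in (2, 3):
    print(k, chebychev_bound(k), observed_share(data, k))
```

The observed share can be well above the bound, since the theorem only guarantees a minimum that holds for every shape.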