Measures of Variation

Range
• The difference between the maximum and the minimum data entries in a data set.
• Range = max value – min value
Deviation
• The difference between a data entry (x) and the mean (µ).
• Deviation of x = x − µ
EX: Find the range of the data set and the deviation of each value.
Salary, x (1000s of dollars)    Deviation, x − µ
41
37
39
45
47
41
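A short Python sketch (standard library only; the variable names are illustrative, not from the original) can check the range and the deviations for the salary data. The mean is 250/6 ≈ 41.67, so the deviations are not whole numbers, and, as for any data set, they sum to zero.

```python
# Salaries in thousands of dollars, from the example above
salaries = [41, 37, 39, 45, 47, 41]

# Range: maximum entry minus minimum entry
data_range = max(salaries) - min(salaries)

# Mean µ, then the deviation x - µ of each entry
mu = sum(salaries) / len(salaries)
deviations = [x - mu for x in salaries]

print("range:", data_range)   # range: 10
print("mean:", round(mu, 2))  # mean: 41.67
for x, d in zip(salaries, deviations):
    print(x, round(d, 2))
```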
Population Variance (σ²)
• Square the deviations of the data set, then average them to get the population variance:
$\sigma^2 = \dfrac{\sum (x - \mu)^2}{N}$, where N is the number of entries in the population.
Population Standard Deviation (σ)
• Take the square root of the population variance: $\sigma = \sqrt{\sigma^2}$
EX: Find the variance and standard deviation of the data set.
x      x − µ      (x − µ)²
41
37
39
45
47
41
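The table can be filled in and checked with a minimal Python sketch. It treats the six salaries as a whole population (divide by the number of entries) and cross-checks the result against the standard library's `statistics.pvariance` and `statistics.pstdev`.

```python
import math
import statistics

x = [41, 37, 39, 45, 47, 41]
N = len(x)
mu = sum(x) / N

# Squared deviations (x - µ)^2, one per row of the table
sq_dev = [(xi - mu) ** 2 for xi in x]

# Population variance: average of the squared deviations (divide by N)
pop_var = sum(sq_dev) / N
# Population standard deviation: square root of the variance
pop_sd = math.sqrt(pop_var)

print("sum of squares:", round(sum(sq_dev), 2))
print("variance:", round(pop_var, 2), "std dev:", round(pop_sd, 2))

# Cross-check with the standard library's population functions
print(statistics.pvariance(x), statistics.pstdev(x))
```

Here Σ(x − µ)² = 208/3 ≈ 69.33, so σ² ≈ 11.56 and σ ≈ 3.40 (thousand dollars).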
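Sample Variance and Standard Deviation
$s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1}$ and $s = \sqrt{s^2}$, where $\bar{x}$ is the sample mean and n is the number of entries in the sample.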
EX: Find the variance and standard deviation of the sample:
The weights (in pounds) of a sample of 10 U.S. presidents:
173 175 200 173 160
185 195 230 190 180
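A minimal Python sketch of the sample calculation. The only change from the population version is dividing the sum of squared deviations by n − 1 instead of n.

```python
import math

weights = [173, 175, 200, 173, 160, 185, 195, 230, 190, 180]
n = len(weights)
x_bar = sum(weights) / n  # sample mean

# Sum of squared deviations from the sample mean
ss = sum((w - x_bar) ** 2 for w in weights)

# Sample variance divides by n - 1, not n
s2 = ss / (n - 1)
s = math.sqrt(s2)

print(f"mean = {x_bar:.1f}, s^2 = {s2:.2f}, s = {s:.2f}")
```

For these weights x̄ = 186.1 lb, Σ(x − x̄)² = 3400.9, s² ≈ 377.88 and s ≈ 19.44 lb.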
Interpreting Standard Deviation
• Standard deviation measures the typical amount an entry deviates from the mean. The more spread out the entries are, the greater the standard deviation.
Empirical Rule
• For data with a symmetric (bell-shaped) distribution, the standard deviation has the following characteristics:
1. About 68% of the data lie within 1 standard deviation of the mean.
2. About 95% of the data lie within 2 standard deviations of the mean.
3. About 99.7% of the data lie within 3 standard deviations of the mean.
Ex (from page 96)
• Use the Empirical Rule (assume the data have a bell-shaped distribution):
30. The mean monthly utility bill for a sample of households in a city is $70, with a standard deviation of $8. Between what two values do about 95% of the data lie?
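Under the Empirical Rule, "about 95%" means within 2 standard deviations of the mean. A small Python sketch (the helper name `empirical_interval` is hypothetical) makes that step explicit:

```python
def empirical_interval(mean, sd, k):
    """Return (mean - k*sd, mean + k*sd). For bell-shaped data the Empirical
    Rule puts about 68%, 95%, 99.7% of values inside for k = 1, 2, 3."""
    return mean - k * sd, mean + k * sd

# Utility-bill example: mean $70, standard deviation $8, about 95% -> k = 2
low, high = empirical_interval(70, 8, 2)
print(f"About 95% of bills lie between ${low} and ${high}")  # $54 and $86
```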
Chebyshev’s Theorem
• This works for ANY data set, symmetric or not.
• The portion of any data set lying within k standard deviations of the mean (k > 1) is at least
$1 - \dfrac{1}{k^2}$
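Plugging in a few values of k shows how the bound behaves (for k ≤ 1 it is zero or negative and says nothing). It is weaker than the Empirical Rule's 95% and 99.7%, which in exchange hold only for bell-shaped data.

```latex
\begin{aligned}
k = 2 &: \quad 1 - \tfrac{1}{2^2} = 1 - \tfrac{1}{4} = 0.75 && \text{(at least 75\% of the data)} \\
k = 3 &: \quad 1 - \tfrac{1}{3^2} = 1 - \tfrac{1}{9} \approx 0.889 && \text{(at least about 88.9\% of the data)}
\end{aligned}
```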
Ex: (from page 96)
• 36. Old Faithful is a famous geyser at Yellowstone National Park. From a sample with n = 32, the mean duration of Old Faithful’s eruptions is 3.32 minutes and the standard deviation is 1.09 minutes. Using Chebyshev’s Theorem, determine at least how many of the eruptions lasted between 1.14 minutes and 5.5 minutes.
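A Python sketch of the solution steps: find how many standard deviations each endpoint lies from the mean (k = 2 here), apply the bound, then convert the fraction to a count of eruptions. Rounding `k` to 6 decimals is an assumption to absorb floating-point noise, and `chebyshev_min_fraction` is a hypothetical helper name.

```python
import math

def chebyshev_min_fraction(k):
    """Chebyshev's lower bound on the fraction of data within k standard
    deviations of the mean (meaningful only for k > 1)."""
    return 1 - 1 / k**2

n, mean, sd = 32, 3.32, 1.09
low, high = 1.14, 5.5

# How many standard deviations each endpoint lies from the mean
k_low = round((mean - low) / sd, 6)
k_high = round((high - mean) / sd, 6)
assert k_low == k_high  # the interval is symmetric about the mean

frac = chebyshev_min_fraction(k_low)
at_least = math.ceil(round(frac * n, 6))  # whole number of eruptions
print(f"k = {k_low}, at least {frac:.0%} of {n} eruptions = {at_least}")
```

This gives k = 2, so at least 75% of the 32 eruptions, that is, at least 24 eruptions, lasted between 1.14 and 5.5 minutes.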
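Standard Deviation for Grouped Data:
$s = \sqrt{\dfrac{\sum (x - \bar{x})^2 f}{n - 1}}$, where f is the frequency of each entry x and $n = \sum f$.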
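A minimal Python sketch of the grouped (frequency-table) sample standard deviation, assuming the form s = √(Σ(x − x̄)²f / (n − 1)) with n = Σf. For illustration it reuses the vacation-days data from the quartile example later in this section, collapsed with `collections.Counter` into distinct values x and frequencies f. When data are grouped into class intervals, x would instead be each class midpoint.

```python
import math
from collections import Counter

# Vacation-days data, collapsed into a frequency table {x: f}
days = [3, 5, 4, 5, 9, 3, 0, 7, 2, 2, 10, 8, 1, 2, 0, 6, 7, 6, 3, 5]
freq = Counter(days)

n = sum(freq.values())                            # n = Σf
x_bar = sum(x * f for x, f in freq.items()) / n   # mean of grouped data

# Σ(x - x̄)² f, divided by n - 1 because this is a sample
ss = sum((x - x_bar) ** 2 * f for x, f in freq.items())
s = math.sqrt(ss / (n - 1))
print(f"n = {n}, mean = {x_bar}, s = {s:.2f}")
```

Because each squared deviation is weighted by its frequency, this gives the same s as computing the sample standard deviation from the 20 raw values.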

Measures of Position
Quartiles
• The data set is divided into 4 sections, separated by 3 QUARTILES.
• Q1: about 25% of the data is below Quartile 1.
• Q2: about 50% of the data is below Quartile 2.
• Q3: about 75% of the data is below Quartile 3.
• (Q2 is also the median!)
Ex: Find the quartiles.
The number of vacation days used by a sample of 20 employees in a recent year:
3 5 4 5 9 3 0 7 2 2
10 8 1 2 0 6 7 6 3 5
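A Python sketch of one common hand method (median of halves): Q2 is the median, and Q1 and Q3 are the medians of the lower and upper halves, with the middle value left out of both halves when n is odd. Software may use a different convention (for example, Python's `statistics.quantiles` interpolates) and can return slightly different values. The function names here are hypothetical.

```python
def median(sorted_vals):
    """Median of an already-sorted list."""
    m = len(sorted_vals)
    mid = m // 2
    if m % 2:
        return sorted_vals[mid]
    return (sorted_vals[mid - 1] + sorted_vals[mid]) / 2

def quartiles(data):
    """(Q1, Q2, Q3) by the median-of-halves method."""
    vals = sorted(data)
    m = len(vals)
    lower = vals[: m // 2]          # entries below the median position
    upper = vals[(m + 1) // 2 :]    # entries above the median position
    return median(lower), median(vals), median(upper)

days = [3, 5, 4, 5, 9, 3, 0, 7, 2, 2, 10, 8, 1, 2, 0, 6, 7, 6, 3, 5]
print(sorted(days))
print(quartiles(days))  # (2.0, 4.5, 6.5)
```

Sorted, the data are 0 0 1 2 2 2 3 3 3 4 | 5 5 5 6 6 7 7 8 9 10, so Q1 = 2, Q2 = 4.5 and Q3 = 6.5.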
Interquartile Range (IQR)
• The IQR is a measure of variation that gives the range of the middle 50% of the data. It is the difference between the 3rd and 1st quartiles.
IQR = Q3 – Q1
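For the vacation-days data, the median-of-halves method gives Q1 = 2 and Q3 = 6.5, so the middle half of the employees used within a 4.5-day span:

```latex
\text{IQR} = Q_3 - Q_1 = 6.5 - 2 = 4.5 \text{ days}
```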
Box-and-Whisker Plot
• Find the 3 quartiles of the data set, and the minimum and maximum entries.
• Construct a horizontal scale that spans the range.
• Draw a box from Q1 to Q3 and draw a vertical line at Q2.
• Draw whiskers from the box to the min and max entries.
Construct a box-and-whisker plot
The number of vacation days used by a sample of 20 employees in a recent year:
3 5 4 5 9 3 0 7 2 2
10 8 1 2 0 6 7 6 3 5
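The steps above can be sketched in Python: compute the five-number summary (min, Q1, Q2, Q3, max, with quartiles by the median-of-halves method), then draw a rough text version of the plot on a scale from min to max. The text rendering is a hypothetical stand-in for a drawn plot: `|` marks the min, median and max, `[` and `]` mark Q1 and Q3, `=` is the box and `-` the whiskers.

```python
def median(v):
    """Median of an already-sorted list."""
    m = len(v)
    return v[m // 2] if m % 2 else (v[m // 2 - 1] + v[m // 2]) / 2

def five_number_summary(data):
    """(min, Q1, Q2, Q3, max), quartiles by the median-of-halves method."""
    v = sorted(data)
    m = len(v)
    return v[0], median(v[: m // 2]), median(v), median(v[(m + 1) // 2 :]), v[-1]

def text_box_plot(summary, width=40):
    """Rough horizontal box plot; assumes max > min."""
    lo, q1, q2, q3, hi = summary

    def col(x):
        # Map a data value to a character position on the scale
        return round((x - lo) / (hi - lo) * width)

    row = ["-"] * (width + 1)             # whiskers span min..max
    for c in range(col(q1), col(q3) + 1):
        row[c] = "="                      # box spans Q1..Q3
    row[col(lo)] = row[col(q2)] = row[col(hi)] = "|"
    row[col(q1)], row[col(q3)] = "[", "]"
    return "".join(row)

days = [3, 5, 4, 5, 9, 3, 0, 7, 2, 2, 10, 8, 1, 2, 0, 6, 7, 6, 3, 5]
summary = five_number_summary(days)
print(summary)  # (0, 2.0, 4.5, 6.5, 10)
print(text_box_plot(summary))
```

The long right whisker (6.5 to 10) compared with the left one (0 to 2) shows that the upper quarter of the data is more spread out.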
Percentiles and Deciles
• Similar to quartiles, but the data is divided into 10 parts (deciles) or 100 parts (percentiles) instead of 4.
• 8th decile: about 80% of the data falls below the 8th decile.
• 95th percentile: about 95% of the data falls below the 95th percentile.
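To go the other way, from a data entry to its percentile, one common textbook convention is (number of entries less than x) ÷ (total number of entries) × 100, rounded to a whole number. This convention is an assumption here; other definitions differ slightly. A Python sketch using the vacation-days data, with a hypothetical helper name:

```python
def percentile_of(x, data):
    """Percentile of value x: share of entries strictly less than x, times 100
    (one common convention; other definitions differ slightly)."""
    below = sum(1 for d in data if d < x)
    return round(100 * below / len(data))

days = [3, 5, 4, 5, 9, 3, 0, 7, 2, 2, 10, 8, 1, 2, 0, 6, 7, 6, 3, 5]
print(percentile_of(7, days))  # 75
```

Fifteen of the 20 entries are below 7, so 7 vacation days corresponds to the 75th percentile.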
Standard Score (z-score)
• Represents the number of standard deviations a given value (x) falls from the mean (µ).
$z = \dfrac{\text{value} - \text{mean}}{\text{standard deviation}} = \dfrac{x - \mu}{\sigma}$
Ex: (from page 112)
• 48. The life spans of a species of fruit fly have a bell-shaped distribution, with a mean of 33 days and a standard deviation of 4 days.
• A. The life spans of three randomly selected fruit flies are 34 days, 30 days, and 42 days. Find the z-score that corresponds to each life span. Determine whether any of these life spans are unusual.
• B. The life spans of three randomly selected fruit flies are 29 days, 41 days, and 25 days. Using the Empirical Rule, find the percentile that corresponds to each life span.
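A Python sketch of both parts. Two assumptions, neither stated on the slide: a value is called unusual when |z| > 2 (outside the middle ~95%), and the Empirical Rule percentages are split symmetrically about the mean (for example, z = −1 gives 50 − 34 = 16). The lookup table only covers whole-number z-scores, which is all part B needs.

```python
MEAN, SD = 33, 4  # fruit-fly life spans, in days

def z_score(x, mu=MEAN, sigma=SD):
    """Number of standard deviations x lies from the mean."""
    return (x - mu) / sigma

# Part A: z-scores, flagging |z| > 2 as unusual
for x in (34, 30, 42):
    z = z_score(x)
    print(x, z, "unusual" if abs(z) > 2 else "not unusual")

# Part B: cumulative percentages at whole-number z-scores,
# from the 68-95-99.7 rule split symmetrically around the mean
EMPIRICAL_PERCENTILE = {-3: 0.15, -2: 2.5, -1: 16, 0: 50, 1: 84, 2: 97.5, 3: 99.85}
for x in (29, 41, 25):
    z = z_score(x)
    assert z.is_integer(), "table covers whole-number z-scores only"
    print(x, z, EMPIRICAL_PERCENTILE[int(z)])
```

Part A gives z = 0.25, −0.75 and 2.25, so only the 42-day life span is unusual. Part B gives z = −1, 2 and −2, which correspond to the 16th, 97.5th and 2.5th percentiles.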