# standard deviation

```4.3 Measures of Variation
LEARNING GOAL
Understand and interpret these common measures of
variation: range, the five-number summary, and standard
deviation.
4.3-1
Why Variation Matters
Customers at Big Bank can enter any one of three
different lines leading to three different tellers.
Best Bank also has three tellers, but all customers wait in
a single line and are called to the next available teller.
Here is a sample of wait times are arranged in ascending
order.
Big Bank (three lines): 4.1 5.2 5.6 6.2 6.7 7.2 7.7 7.7 8.5 9.3 11.0
Best Bank (one line): 6.6 6.7 6.7 6.9 7.1 7.2 7.3 7.4 7.7 7.8 7.8
4.3-2
Slide
4.3- 2
Why Variation Matters
You’ll probably find more unhappy customers at Big Bank
than at Best Bank, but this is not because the average wait
is any longer. In fact, the mean and median waiting times
are 7.2 minutes at both banks.
Big Bank (three lines): 4.1 5.2 5.6 6.2 6.7 7.2 7.7 7.7 8.5 9.3 11.0
Best Bank (one line): 6.6 6.7 6.7 6.9 7.1 7.2 7.3 7.4 7.7 7.8 7.8
The difference in customer satisfaction comes from the
variation at the two banks.
4.3-3
Slide
4.3- 3
Figure 4.13 Histograms for the waiting times at Big Bank and Best Bank,
shown with data binned to the nearest minute.
4.3-4
Slide
4.3- 4
Range
Definition
The range of a set of data values is the difference
between its highest and lowest data values:
range = highest value (max) - lowest value (min)
4.3-5
Slide
4.3- 5
EXAMPLE 1 Misleading Range
Consider the following two sets of quiz scores for nine students.
Which set has the greater range? Would you also say that this set
has the greater variation?
Quiz 1: 1 10 10 10 10 10 10 10 10
Quiz 2: 2 3 4 5 6 7 8 9 10
4.3-6
Slide
4.3- 6
M
L
eo
dw
e
ar
nq
u
Q
a
r
e
(
Q
Quartiles and the Five-Number Summary
Quartiles are values that divide the data distribution
into quarters.
Lower quartile
(Q1)
Median
(Q2)
Upper quartile
(Q3)
Big Bank: 4.1 5.2 5.6 6.2 6.7 7.2 7.7 7.7 8.5 9.3 11.0
Best Bank: 6.6 6.7 6.7 6.9 7.1 7.2 7.3 7.4 7.7 7.8 7.8
4.3-7
Slide
4.3- 7
Definitions
The lower quartile (or first quartile or Q1) divides the
lowest fourth of a data set from the upper three-fourths. It
is the median of the data values in the lower half of a data
set. (Exclude the middle value in the data set if the
number of data points is odd.)
The middle quartile (or second quartile or Q2) is the
overall median.
The upper quartile (or third quartile or Q3) divides the
lowest three-fourths of a data set from the upper fourth. It
is the median of the data values in the upper half of a
data set. (Exclude the middle value in the data set if the
number of data points is odd.)
4.3-8
Slide
4.3- 8
The Five-Number Summary
The five-number summary for a data distribution
consists of the following five numbers:
low value
lower quartile
median
upper quartile
high value
4.3-9
Slide
4.3- 9
Drawing a Boxplot (Box and Whisker Plot)
Step 1. Draw a number line that spans all the values in the
data set.
Step 2. Enclose the values from the lower to the upper
quartile in a box. (The thickness of the box has no meaning.)
Step 3. Draw a line through the box at the median.
Step 4. Add “whiskers” extending to the low and high values.
4.3-10
Slide
4.3- 10
Percentiles
Definition
The nth percentile of a data set divides the bottom n% of
data values from the top (100 - n)%. A data value that lies
between two percentiles is often said to lie in the lower
percentile. You can approximate the percentile of any
data value with the following formula:
percentile of data value =
number of values less than this data value
x 100
total number of values in data set
4.3-11
Slide
4.3- 11
4.3-12
Slide
4.3- 12
EXAMPLE 2 Smoke Exposure Percentiles
Answer the following questions concerning the data in Table 4.4
(previous slide).
a. What is the percentile for the data value of 104.54 ng/ml for
smokers?
b. What is the percentile for the data value of 61.33 ng/ml for
nonsmokers?
c. What data value marks the 36th percentile for the smokers? For
the nonsmokers?
4.3-13
Slide
4.3- 13
Standard Deviation
Statisticians often prefer to describe variation with a single
number. The single number most commonly used to describe
variation is called the standard deviation.
standard deviation =
sum of (deviations from the mean)2
total number of data values - 1
4.3-14
Slide
4.3- 14
EXAMPLE 4 Calculating Standard Deviation
Calculator the standard deviation of
9, 2, 5, 4, 12, 7, 8, 11
4.3-15
Slide
4.3- 15
Interpreting the Standard Deviation
A good way to develop a deeper understanding of the standard
deviation is to consider an approximation called the range
rule of thumb.
The Range Rule of Thumb
The standard deviation is approximately related to the range of
a distribution by the range rule of thumb:
range
standard deviation ≈
4
If we know the range of a distribution (range = high – low),
we can use this rule to estimate the standard deviation.
4.3-16
Slide
4.3- 16
The Range Rule of Thumb (cont.)
Alternatively, if we know the standard deviation, we can use
this rule to estimate the low and high values as follows:
low value ≈ mean – (2 x standard deviation)
high value ≈ mean + (2 x standard deviation)
The range rule of thumb does not work well when the high or
low values are outliers.
4.3-17
Slide
4.3- 17
EXAMPLE 5 Estimating a Range
Studies of the gas mileage of a BMW under varying driving
conditions show that it gets a mean of 22 miles per gallon with a
standard deviation of 3 miles per gallon. Estimate the minimum and
maximum typical gas mileage amounts that you can expect under
ordinary driving conditions.
4.3-18
Slide
4.3- 18
EXAMPLE 6 Comparing Variations
The following data sets show the ages of the first seven U.S.
presidents (Washington through Jackson) and seven recent U.S.
presidents (Ford through Obama) at the time of inauguration.
First 7: 57, 61, 57, 57, 58, 57, 61
Last 7: 61, 52, 69, 64, 46, 54, 47
(a)Find the mean, media, and range for each of the two data sets.
(b) Give the five-number summary and draw a boxplot for each of
the two data sets.
(c) Find the standard deviation for each of the two data sets.
(d) Apply the range rule of thumb to estimate the standard deviation
of each of the two data sets. How well does the rule work in each
case? Briefly discuss why it does or does not work well.