Copyright © 2004 by The McGraw-Hill Companies, Inc. All rights reserved.
When you have completed this chapter, you will be able to:
1. Compute and interpret the range, the mean deviation, the variance, the standard deviation, and the coefficient of variation of ungrouped data
2. Compute and interpret the range, the variance, and the standard deviation from grouped data
3. Explain the characteristics, uses, advantages, and disadvantages of each measure
4. Understand Chebyshev’s theorem and the normal or empirical rule, as it relates to a set of observations
5. Compute and interpret percentiles, quartiles and the interquartile range
6. Construct and interpret box plots
7. Compute and describe the coefficient of skewness and kurtosis of a data distribution
Terminology
Range
…is the difference between the
largest and the smallest value.
Only two values are used in its calculation.
It is influenced by an extreme value.
It is easy to compute and understand.
Terminology
Mean Deviation
…is the arithmetic mean of the absolute values of
the deviations from the arithmetic mean.
\( MD = \dfrac{\sum |x - \bar{x}|}{N} \)
All values are used in the calculation.
It is not unduly influenced by large or small values.
The absolute values are difficult to manipulate.
The weights of a sample of crates containing books for the bookstore (in kg) are:
103, 97, 101, 106, 103
Find the range and the mean deviation.
103, 97, 101, 106, 103
Find the mean weight:
\( \bar{x} = \dfrac{\sum x}{N} = \dfrac{510}{5} = 102 \)
Find the range:
106 - 97 = 9
Find the mean deviation:
\( MD = \dfrac{\sum |x - \bar{x}|}{N} = \dfrac{|103 - 102| + \dots + |103 - 102|}{5} = \dfrac{1 + 5 + 1 + 4 + 1}{5} = 2.4 \)
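The crate example can be checked with a few lines of Python. This is only a sketch: the helper names `value_range` and `mean_deviation` are made up for illustration and are not part of the slides.

```python
from statistics import mean

# Crate weights in kg, taken from the example above.
weights_kg = [103, 97, 101, 106, 103]


def value_range(values):
    """Difference between the largest and the smallest value."""
    return max(values) - min(values)


def mean_deviation(values):
    """Arithmetic mean of the absolute deviations from the mean."""
    centre = mean(values)
    return sum(abs(v - centre) for v in values) / len(values)


crate_range = value_range(weights_kg)
crate_md = mean_deviation(weights_kg)
print("range:", crate_range)
print("mean deviation:", crate_md)
```

The absolute deviations are 1, 5, 1, 4 and 1, which sum to 12, so the mean deviation is 12/5 = 2.4.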
Terminology
Variance
…is the arithmetic mean of the
squared deviations
from the arithmetic mean.
All values are used in the calculation.
Unlike the range, it does not depend on the two extreme values alone, though squaring the deviations still gives outliers heavy weight.
The units are awkward…the square of the original units.
Computing the Variance
Formula… for a Population
\( \sigma^2 = \dfrac{\sum (x - \mu)^2}{N} \)
Formula… for a Sample
\( s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1} \)
The ages of the Dunn family are:
2, 18, 34, 42
What are the population mean and variance?
\( \mu = \dfrac{\sum x}{N} = \dfrac{96}{4} = 24 \)
\( \sigma^2 = \dfrac{\sum (x - \mu)^2}{N} = \dfrac{(2 - 24)^2 + \dots + (42 - 24)^2}{4} = \dfrac{944}{4} = 236 \)
Population Standard Deviation
… is the square root of the population variance
From previous example…
\( \sigma = \sqrt{\sigma^2} = \sqrt{236} = 15.36 \)
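The Dunn family figures can be reproduced with Python's standard `statistics` module, whose `pvariance` and `pstdev` functions divide by N as the population formulas do. This is a sketch, and the variable names are illustrative.

```python
from statistics import mean, pstdev, pvariance

# Ages of the Dunn family, treated as a complete population.
ages = [2, 18, 34, 42]

mu = mean(ages)             # population mean
sigma_sq = pvariance(ages)  # population variance (divides by N)
sigma = pstdev(ages)        # population standard deviation

print("mean:", mu)
print("variance:", sigma_sq)
print("standard deviation:", round(sigma, 2))
```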
EXAMPLE
The hourly wages earned by a sample of five
students are: $7, $5, $11, $8, $6.
Find the mean, variance, and standard deviation.
\( \bar{x} = \dfrac{\sum x}{N} = \dfrac{37}{5} = 7.40 \)
\( s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1} = \dfrac{(7 - 7.4)^2 + \dots + (6 - 7.4)^2}{5 - 1} = \dfrac{21.2}{4} = 5.30 \)
\( s = \sqrt{s^2} = \sqrt{5.30} = 2.30 \)
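As a rough check of the wage example, the sketch below computes the sample variance by hand with the n - 1 divisor and compares it with `statistics.variance`, which uses the same divisor. The variable names are illustrative.

```python
from statistics import mean, variance

# Hourly wages (in dollars) of the five sampled students.
wages = [7, 5, 11, 8, 6]

x_bar = mean(wages)
# Sum of squared deviations from the sample mean.
sum_sq_dev = sum((x - x_bar) ** 2 for x in wages)
# The sample variance divides by n - 1, not n.
s_sq = sum_sq_dev / (len(wages) - 1)
s = s_sq ** 0.5

print("mean:", x_bar)
print("sample variance:", round(s_sq, 2))
print("sample standard deviation:", round(s, 2))
print("library variance agrees:", abs(variance(wages) - s_sq) < 1e-9)
```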
The Mean of Grouped Data
From chapter 3…
A sample of ten movie theatres in a metropolitan area tallied the total number of movies showing last week.
Compute the mean number of movies showing per theatre.
The Mean of Grouped Data
Continued…
\( \bar{x} = \dfrac{\sum fx}{N} \)

Movies Showing    Frequency (f)    Class Midpoint (x)    (f)(x)
1 to under 3            1                  2                2
3 to under 5            2                  4                8
5 to under 7            3                  6               18
7 to under 9            1                  8                8
9 to under 11           3                 10               30
Total                  10                                  66
The Mean of Grouped Data
Continued…
With a total frequency of 10 and a total (f)(x) of 66:
\( \bar{x} = \dfrac{\sum fx}{N} = \dfrac{66}{10} = 6.6 \)
Now: Compute the variance and standard deviation.
Sample Variance for Grouped Data
The formula for the sample variance for grouped data is:
\( s^2 = \dfrac{\sum fx^2 - \dfrac{(\sum fx)^2}{n}}{n - 1} \)
where f is the class frequency and x is the class midpoint.
Sample Variance for Grouped Data

Movies Showing    Frequency (f)    Class Midpoint (x)    (f)(x)    (x2)f
1 to under 3            1                  2                2         4
3 to under 5            2                  4                8        32
5 to under 7            3                  6               18       108
7 to under 9            1                  8                8        64
9 to under 11           3                 10               30       300
Total                  10                                  66       508
Sample Variance for Grouped Data
From the table totals, n = 10, the total (f)(x) is 66 and the total (x2)f is 508.
\( s^2 = \dfrac{\sum fx^2 - \dfrac{(\sum fx)^2}{n}}{n - 1} \)
The variance is
\( s^2 = \dfrac{508 - \dfrac{66^2}{10}}{9} = 8.04 \)
The standard deviation is
\( s = \sqrt{8.04} = 2.8 \)
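The grouped-data calculation can be sketched in Python as follows. The tuple layout `(lower, upper, frequency)` and the function name `grouped_summary` are assumptions made for this illustration. As in the slides, each class is represented by its midpoint.

```python
from math import sqrt

# (lower bound, upper bound, frequency) for each "movies showing" class.
movie_classes = [(1, 3, 1), (3, 5, 2), (5, 7, 3), (7, 9, 1), (9, 11, 3)]


def grouped_summary(classes):
    """Return (mean, sample variance, sample sd) using class midpoints."""
    n = sum(f for _, _, f in classes)
    sum_fx = sum(f * (lo + hi) / 2 for lo, hi, f in classes)
    sum_fx2 = sum(f * ((lo + hi) / 2) ** 2 for lo, hi, f in classes)
    grouped_mean = sum_fx / n
    # Shortcut formula: (sum of f*x^2 minus (sum of f*x)^2 / n) / (n - 1)
    grouped_var = (sum_fx2 - sum_fx ** 2 / n) / (n - 1)
    return grouped_mean, grouped_var, sqrt(grouped_var)


g_mean, g_var, g_sd = grouped_summary(movie_classes)
print("mean:", g_mean)
print("variance:", round(g_var, 2))
print("standard deviation:", round(g_sd, 1))
```

Because midpoints stand in for the raw observations, these results are estimates of the values that the ungrouped data would give.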
Interpretation and Uses
of the Standard Deviation
Chebyshev’s Theorem:
For any set of observations, the proportion of the values that lie within k standard deviations of the mean is at least:
\( 1 - \dfrac{1}{k^2} \)
where k is any constant greater than 1
Suppose that a wholesale plumbing supply company has
a group of 50 sales vouchers from a particular day.
The amounts of these vouchers are:
[Table of the 50 voucher amounts is not reproduced here.]
How well does this data set fit Chebyshev’s Theorem?
Solution (continued)
Step 1: Determine the mean and standard deviation of the sample.
Mean = $319
SD = $101.78
Step 2: Input k = 2 into Chebyshev’s theorem:
\( 1 - \dfrac{1}{2^2} = 1 - \dfrac{1}{4} = \dfrac{3}{4} \)
i.e. At least .75 of the observations will fall within 2 SD of the mean.
Solution (continued)
Step 3: Using the mean and SD, find the range of data values within 2 SD of the mean.
Mean = $319
SD = $101.78
\( (\bar{x} - 2s,\ \bar{x} + 2s) = (319 - 2(101.78),\ 319 + 2(101.78)) = (115.44,\ 522.56) \)
Now, go back to the sample data, and see what proportion of the values fall between 115.44 and 522.56.
Solution (continued)
Proportion of the values that fall between 115.44 and 522.56:
We find that 48 of the 50 data values, or 96%, are in this range, which is certainly at least 75%, as the theorem guarantees!
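The three Chebyshev steps can be sketched in Python. The 50 voucher amounts are not reproduced in this text, so the sketch works only from the reported mean ($319), standard deviation ($101.78) and count (48 of 50). The function names are hypothetical helpers, not anything defined in the slides.

```python
def chebyshev_bound(k):
    """Minimum proportion of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2


def k_sd_interval(centre, sd, k):
    """Interval (centre - k*sd, centre + k*sd), rounded to cents."""
    return (round(centre - k * sd, 2), round(centre + k * sd, 2))


def proportion_within(values, low, high):
    """Share of the values that lie between low and high (inclusive)."""
    return sum(1 for v in values if low <= v <= high) / len(values)


bound = chebyshev_bound(2)
low, high = k_sd_interval(319, 101.78, 2)
observed = 48 / 50  # count reported in the worked example

print("guaranteed minimum:", bound)
print("interval:", (low, high))
print("observed proportion meets the bound:", observed >= bound)
```

With the raw voucher amounts in a list, `proportion_within(amounts, low, high)` would give the observed proportion directly.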
Interpretation and Uses of the
Standard Deviation
Empirical Rule:
For any symmetrical, bell-shaped distribution:
…About 68% of the observations
will lie within 1s of the mean
…About 95% of the observations will
lie within 2s of the mean
…Virtually all the observations
will be within 3s of the mean
Bell-Shaped Curve
…showing the relationship between σ and μ
[Figure: bell-shaped curve with the horizontal axis marked at μ - 3σ, μ - 2σ, μ - 1σ, μ, μ + 1σ, μ + 2σ and μ + 3σ]
Suppose that a wholesale plumbing supply company has
a group of 50 sales vouchers from a particular day.
The amounts of these vouchers are:
[Table of the 50 voucher amounts is not reproduced here.]
How well does this data set fit the Empirical Rule?
Solution
First check whether the histogram has an approximate mound shape.
Not bad…so we’ll proceed!
We need to calculate the mean and standard deviation.
Mean: $319 Standard Deviation: $101.78
Calculate the intervals:
\( (\bar{x} - s,\ \bar{x} + s) = (319 - 101.78,\ 319 + 101.78) = (217.22,\ 420.78) \)
\( (\bar{x} - 2s,\ \bar{x} + 2s) = (319 - 2(101.78),\ 319 + 2(101.78)) = (115.44,\ 522.56) \)
\( (\bar{x} - 3s,\ \bar{x} + 3s) = (319 - 3(101.78),\ 319 + 3(101.78)) = (13.66,\ 624.34) \)

Interval            Empirical Rule    Actual # values    Actual percentage
217.22, 420.78      68%               31/50              62%
115.44, 522.56      95%               48/50              96%
13.66, 624.34       about 100%        49/50              98%
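The comparison table can be rebuilt from the summary figures with a short Python sketch. The observed counts (31, 48 and 49 of 50) are copied from the worked example, because the raw voucher amounts are not reproduced here. The 0.997 used for three standard deviations is the usual normal-curve figure behind "virtually all" and is an assumption added for this illustration.

```python
voucher_mean, voucher_sd, n_vouchers = 319, 101.78, 50

# Approximate proportions the empirical rule predicts within k standard deviations.
expected = {1: 0.68, 2: 0.95, 3: 0.997}
# Counts reported in the worked example for each interval.
observed_counts = {1: 31, 2: 48, 3: 49}

rows = []
for k in (1, 2, 3):
    low = round(voucher_mean - k * voucher_sd, 2)
    high = round(voucher_mean + k * voucher_sd, 2)
    actual = observed_counts[k] / n_vouchers
    rows.append((k, low, high, expected[k], actual))

for k, low, high, rule, actual in rows:
    print(f"within {k} SD: ({low}, {high})  rule {rule:.1%}  actual {actual:.0%}")
```

The one-standard-deviation interval holds 62% of the vouchers rather than about 68%, so the fit is reasonable but not exact.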
Skewness
…is the measurement of the
lack of symmetry
of the distribution
…The coefficient of skewness
can range from -3.00 up to +3.00
…A value of 0 indicates a symmetric distribution.
It is computed as follows:
\( SK_1 = \dfrac{3(\text{Mean} - \text{Median})}{\sigma} \)
Skewness
\( SK_1 = \dfrac{3(\text{Mean} - \text{Median})}{\sigma} \)
Following are the earnings per share for a sample of 15 software companies for the year 2000. The earnings per share are arranged from smallest to largest.
$0.09  0.13  0.41  0.51  1.12  1.20  1.49  3.18  3.50  6.36  7.83  8.92  10.13  12.99  16.40
Find the coefficient of skewness.
Mean = 4.95
Median = 3.18
SD = 5.22
\( SK_1 = \dfrac{3(4.95 - 3.18)}{5.22} = 1.017 \)
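The skewness calculation can be reproduced from the raw earnings figures. This sketch uses the sample standard deviation (`statistics.stdev`), which matches the 5.22 quoted in the example. The function name `pearson_skewness` is illustrative.

```python
from statistics import mean, median, stdev

# Earnings per share (in dollars) for the 15 software companies.
earnings = [0.09, 0.13, 0.41, 0.51, 1.12, 1.20, 1.49, 3.18,
            3.50, 6.36, 7.83, 8.92, 10.13, 12.99, 16.40]


def pearson_skewness(values):
    """Coefficient of skewness: 3 * (mean - median) / standard deviation."""
    return 3 * (mean(values) - median(values)) / stdev(values)


sk = pearson_skewness(earnings)
print("mean:", round(mean(earnings), 2))
print("median:", median(earnings))
print("sd:", round(stdev(earnings), 2))
print("skewness:", round(sk, 3))
```

A positive result means the mean lies above the median, that is, the distribution has a longer right tail.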
Positively Skewed Distribution
Mean and Median are to the right of the Mode
Mode < Median < Mean
Negatively Skewed Distribution
Mean and Median are to the left of the Mode
Mean < Median < Mode
Interquartile Range
…is the distance between the third quartile
Q3 and the first quartile Q1.
This distance
will include the middle 50 percent of the
observations.
Interquartile Range = Q3 - Q1
Example
For a set of observations the
third quartile is 24 and the first quartile is 10.
What is the interquartile range?
The interquartile range is 24 - 10 = 14.
Fifty percent of the observations
will occur between 10 and 24.
Box Plots
A box plot is a graphical display, based on quartiles, that helps to picture a set of data.
Five pieces of data are needed to construct a box plot:
… the Minimum Value,
… the First Quartile,
… the Median,
… the Third Quartile, and
… the Maximum Value
Example
Based on a sample of 20 deliveries, Buddy’s Pizza determined the following information:
… the minimum delivery time was 13 minutes,
… the maximum was 30 minutes,
… the first quartile was 15 minutes,
… the median was 18 minutes, and
… the third quartile was 22 minutes.
Develop a box plot for the delivery times.
Solution
[Figure: box plot of the delivery times on an axis running from 12 to 32 minutes in steps of 2, with Min. = 13, Q1 = 15, Median = 18, Q3 = 22 and Max. = 30 marked]
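The five-number summary for Buddy's Pizza can be turned into a rough text rendering of the box plot. This is a sketch that assumes whole-minute values and one character per minute. The function name `text_box_plot` and the drawing symbols are invented for this illustration.

```python
def text_box_plot(minimum, q1, med, q3, maximum, axis_min, axis_max):
    """Draw a one-line box plot: '-' whiskers, '=' box, '|' edges, 'M' median.

    Assumes all arguments are integers, with one character per unit.
    """
    cells = [" "] * (axis_max - axis_min + 1)
    for pos in range(minimum, maximum + 1):
        cells[pos - axis_min] = "-"   # whiskers span min to max
    for pos in range(q1, q3 + 1):
        cells[pos - axis_min] = "="   # box spans Q1 to Q3
    for pos in (minimum, q1, q3, maximum):
        cells[pos - axis_min] = "|"   # whisker ends and box edges
    cells[med - axis_min] = "M"       # median marker
    return "".join(cells)


plot = text_box_plot(13, 15, 18, 22, 30, axis_min=12, axis_max=32)
iqr = 22 - 15  # interquartile range: Q3 - Q1

print(plot)
print("interquartile range:", iqr)
```

The upper whisker (22 to 30) is much longer than the lower one (13 to 15), which suggests that the delivery times are skewed to the right.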
The following are the average rates of return for Stocks A and B over a six-year period.
In which of the following stocks would you prefer to invest? Why?
Stock A: 7 6 8 5 7 3
Stock B: 15 -10 18 10 -5 8
Find the mean rate of return for each of the two stocks:
Stock A: 7 6 8 5 7 3
Mean = 36/6 = 6
Stock B: 15 -10 18 10 -5 8
Mean = 36/6 = 6
Find the range of values of each stock:
Stock A: 7 6 8 5 7 3
Range = 8 - 3 = 5
Stock B: 15 -10 18 10 -5 8
Range = 18 - (-10) = 28
Therefore, Stock B is riskier.
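The stock comparison shows two data sets with the same mean but very different spread. A minimal Python sketch, with an illustrative helper named `summarize`:

```python
from statistics import mean

# Average rates of return over the six-year period.
stock_a = [7, 6, 8, 5, 7, 3]
stock_b = [15, -10, 18, 10, -5, 8]


def summarize(returns):
    """Return (mean, range) for a list of yearly returns."""
    return mean(returns), max(returns) - min(returns)


mean_a, range_a = summarize(stock_a)
mean_b, range_b = summarize(stock_b)

print("Stock A: mean", mean_a, "range", range_a)
print("Stock B: mean", mean_b, "range", range_b)
```

Both stocks average a return of 6, so the mean alone cannot separate them. The wider range of Stock B is what marks it as the riskier choice.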
Relative Dispersion
The coefficient of variation is the ratio of the standard deviation to the arithmetic mean, expressed as a percentage:
\( CV = \dfrac{s}{\bar{x}}(100\%) \)
A standard deviation of 10 may be perceived as
large when the mean value is 100,
but only
moderately large
when the mean value is 500!
Example
Rates of return over the past 6 years for
two mutual funds are shown below.
Fund A: 8.3, -6.0, 18.9, -5.7, 23.6, 20
Fund B: 12, -4.8, 6.4, 10.2, 25.3, 1.4
Which one has a higher level of risk?
Solution
Let us use the Excel printout that is run from the “Descriptive Statistics” sub-menu:

                      Fund A    Fund B
Mean                    9.85      8.42
Standard Error          5.38      4.20
Median                 13.60      8.30
Mode                    #N/A      #N/A
Standard Deviation     13.19     10.29
Sample Variance       173.88    105.81
Kurtosis               -2.21      0.90
Skewness               -0.44      0.61
Range                  29.60      30.1
Minimum                   -6      -4.8
Maximum                 23.6      25.3
Sum                     59.1      50.5
Count                      6         6
Solution
Is Fund A riskier because its standard deviation is larger?
Solution
But the means of the two funds are different.
Fund A has a higher rate of return, but it also has a larger sd.
Therefore we need to compare the relative variability using the coefficient of variation.
Solution
\( CV = \dfrac{s}{\bar{x}}(100\%) \)
Fund A: CV = (13.19 / 9.85)(100%) = 134%
Fund B: CV = (10.29 / 8.42)(100%) = 122%
So now we say that there is more variability in Fund A as compared to Fund B.
Therefore, Fund A is riskier.
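The coefficient of variation for the two funds can be recomputed from the raw returns rather than from the Excel printout. This sketch uses the sample standard deviation, as Excel's Descriptive Statistics output does. The function name `coefficient_of_variation` is illustrative.

```python
from statistics import mean, stdev

# Rates of return over the past 6 years.
fund_a = [8.3, -6.0, 18.9, -5.7, 23.6, 20]
fund_b = [12, -4.8, 6.4, 10.2, 25.3, 1.4]


def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    return stdev(values) / mean(values) * 100


cv_a = coefficient_of_variation(fund_a)
cv_b = coefficient_of_variation(fund_b)

print("Fund A CV (%):", round(cv_a))
print("Fund B CV (%):", round(cv_b))
print("Fund A has more relative variability:", cv_a > cv_b)
```

The coefficient of variation is only meaningful when the mean is positive and not close to zero, which holds for both funds here.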
Test your learning…
www.mcgrawhill.ca/college/lind
Online Learning Centre
for quizzes
extra content
data sets
searchable glossary
access to Statistics Canada’s E-Stat data
…and much more!
This completes Chapter 4