Objective A : Range, Variance, and Standard

Chapter 3
Numerically Summarizing Data
Copyright of the definitions and examples is reserved to Pearson Education,
Inc. To use this PowerPoint presentation, the required textbook for
the class is Fundamentals of Statistics: Informed Decisions Using Data,
by Michael Sullivan III, fourth edition.
Los Angeles Mission College
Prepared by DW
Chapter 3.1 Measures of Central Tendency
Objective A : Mean, Median, and Mode
Objective B : Relation Between the Mean, Median, and
Distribution Shape
Objective A : Mean, Median, and Mode
The three measures of central tendency are the mean, the median, and
the mode.
A1. Mean
The mean of a variable is the sum of all data values divided by the
number of observations.
Population mean: $\mu = \frac{\sum x_i}{N}$,
where $x_i$ is each data value and $N$ is the population size (the number of observations in the population).
Sample mean: $\bar{x} = \frac{\sum x_i}{n}$,
where $x_i$ is each data value and $n$ is the sample size (the number of observations in the sample).
Example 1: Population: 12 16 23 17 32 27 14 16
Compute the population mean and the sample mean of a simple
random sample of size 4.
Does the sample mean equal the population mean? Does the
population mean or the sample mean stay the same? Explain.
(a) Population mean (round the mean to one more decimal place than the raw data), with $N = 8$:
$\mu = \frac{\sum x_i}{N} = \frac{12 + 16 + 23 + 17 + 32 + 27 + 14 + 16}{8} = \frac{157}{8} = 19.625 \approx 19.6$
(b) Sample mean, with $n = 4$:
Using a lottery method, 23, 16, 14, and 17 were selected.
$\bar{x} = \frac{\sum x_i}{n} = \frac{23 + 16 + 14 + 17}{4} = \frac{70}{4} = 17.5$
(c) Does the sample mean equal the population mean?
No.
(d) Does the population mean or the sample mean stay the same?
Explain.
$\mu$ stays the same.
$\bar{x}$ varies from sample to sample.
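The two means in Example 1 can be checked with a short Python sketch. The function name `mean` and the fixed random seed are illustrative assumptions, not part of the slides; the second sample only shows that a different draw usually gives a different sample mean.

```python
import random

def mean(values):
    """Sum of all data values divided by the number of observations."""
    return sum(values) / len(values)

population = [12, 16, 23, 17, 32, 27, 14, 16]
mu = mean(population)            # population mean: 157 / 8

sample = [23, 16, 14, 17]        # the sample selected on the slide
x_bar = mean(sample)             # sample mean: 70 / 4

# Another simple random sample of size 4. The seed is arbitrary and is an
# assumption made only so the sketch is repeatable.
rng = random.Random(0)
other_sample = rng.sample(population, 4)
other_x_bar = mean(other_sample)

print(mu, x_bar)                 # 19.625 17.5
print(other_sample, other_x_bar)
```

The population mean is computed once from all eight values, so it cannot change; only the sample mean depends on which four values happen to be drawn.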
A2. Median
The median, M , is the value that lies in the middle of the data
when arranged in ascending order.
If $n$ is odd, the median is the data value in the middle of the data
set; it is located at position $\frac{n+1}{2}$.
If $n$ is even, the median is the mean of the two middle
observations in the data set, which lie at positions $\frac{n}{2}$ and $\frac{n}{2} + 1$.
Example 1: Find the median of the data given below.
4 12 32 24 9 18 28 10 36
Reorder: 4 9 10 12 18 24 28 32 36
$n = 9$ (odd)
The median is located at position $\frac{n+1}{2} = \frac{9+1}{2} = 5$, the 5th position.
The median is 18.
Example 2: Find the median of the data given below.
$35.34 $42.09 $38.72 $43.28 $39.45 $49.36
$30.15 $40.88
Reorder: $30.15 $35.34 $38.72 $39.45 $40.88 $42.09 $43.28 $49.36
n = 8 (even)
The median lies between positions n/2 = 8/2 = 4 and n/2 + 1 = 8/2 + 1 = 5, that is, between the 4th and 5th positions.
The median = (39.45 + 40.88) / 2 = 80.33 / 2 = 40.165 (in dollars)
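The position rule for the median can be sketched in Python as follows. The function name `median_by_position` is an illustrative assumption; positions on the slides are 1-based, while Python list indices are 0-based, hence the `- 1` adjustments.

```python
def median_by_position(values):
    """Median by the position rule: position (n + 1) / 2 when n is odd,
    the mean of positions n / 2 and n / 2 + 1 when n is even."""
    data = sorted(values)
    n = len(data)
    if n % 2 == 1:
        return data[(n + 1) // 2 - 1]   # 1-based position -> 0-based index
    lower = data[n // 2 - 1]            # position n / 2
    upper = data[n // 2]                # position n / 2 + 1
    return (lower + upper) / 2

example_1 = [4, 12, 32, 24, 9, 18, 28, 10, 36]
example_2 = [35.34, 42.09, 38.72, 43.28, 39.45, 49.36, 30.15, 40.88]

print(median_by_position(example_1))            # 18
print(round(median_by_position(example_2), 3))
```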
A3. Mode
The mode is the most frequent observation in the data set. A data set can have more than one mode, or no mode when no observation occurs more than once.
Example 1: Find the mode of the data given below.
76 60 81 72 60 80 68 73 80 67
Reorder:
60 60 67 68 72 73 76 80 80 81
Mode = 60 and 80
Example 2: Find the mode of the data given below.
A C D C B C A B B F B W F D B W D A D C D
Reorder:
A A A B B B B B C C C C D D D D D F F W W
Mode = B and D
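Counting frequencies by hand is error-prone for longer lists, so here is a minimal Python sketch of the same tally. The function name `modes` and the choice to return an empty list for "no mode" are assumptions made for illustration.

```python
from collections import Counter

def modes(values):
    """Return every value tied for the highest frequency, or an empty
    list when no value occurs more than once (no mode)."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []
    return sorted(value for value, count in counts.items() if count == top)

scores = [76, 60, 81, 72, 60, 80, 68, 73, 80, 67]
grades = "A C D C B C A B B F B W F D B W D A D C D".split()

print(modes(scores))   # [60, 80]
print(modes(grades))   # ['B', 'D']
```

Because the function works on any hashable values, it handles qualitative data such as letter grades as well as numbers.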
Example 3: The following data represent the G.P.A. of 12 students.
2.56 3.21 3.88 2.44 1.96 2.85 2.32 3.38 1.86
3.04 2.75 2.23
Find the mean, median, and mode G.P.A.
Reorder: 1.86 1.96 2.23 2.32 2.44 2.56 2.75 2.85 3.04 3.21 3.38 3.88
(a) mean, with $n = 12$:
$\bar{x} = \frac{\sum x_i}{n} = \frac{1.86 + 1.96 + 2.23 + 2.32 + 2.44 + 2.56 + 2.75 + 2.85 + 3.04 + 3.21 + 3.38 + 3.88}{12} = \frac{32.48}{12} \approx 2.7067 \approx 2.707$
(b) median
$n = 12$ (even)
The median lies between positions $\frac{n}{2} = \frac{12}{2} = 6$ and $\frac{n}{2} + 1 = \frac{12}{2} + 1 = 7$, that is, between the 6th and 7th positions.
Reorder: 1.86 1.96 2.23 2.32 2.44 2.56 (6th) 2.75 (7th) 2.85 3.04 3.21 3.38 3.88
The median $= \frac{2.56 + 2.75}{2} = \frac{5.31}{2} = 2.655$
(c) mode
None. No G.P.A. value occurs more than once.
Objective B : Relation Between the Mean, Median, and
Distribution Shape
• The mean is sensitive to extreme data. For continuous data,
if the distribution is bell shaped, the mean is the
better measure of central tendency because it uses every data
value in the data set.
• The median is resistant to extreme data. For continuous data,
if the distribution is skewed to the right or left, the median
is the better measure of central tendency.
• The mode is used as the measure of central tendency
for qualitative data.
[Figure: Mean or Median versus Skewness]
Chapter 3.2
Measures of Dispersion
Objective A : Range, Variance, and Standard Deviation
Objective B : Empirical Rule
Objective C : Chebyshev’s Inequality
Chapter 3.2
Measures of Dispersion (Part I)
A measure of dispersion is a numerical measure that quantifies
the spread of the data.
In this section, the three numerical measures of dispersion that we
discuss are the range, the variance, and the standard deviation. A
later section discusses another measure of dispersion, the
interquartile range (IQR).
Objective A : Range, Variance, and Standard Deviation
A1. Range
Range = R = largest data value – smallest data value
The range is not resistant because it is affected by extreme
values in the data set.
A2. Variance and Standard Deviation
Standard deviation is based on the deviations about the mean. Since
the sum of the deviations about the mean is zero, we cannot use the
average deviation about the mean as a measure of spread.
We use the average squared deviation (variance) instead.
The population variance, $\sigma^2$, of a variable is the sum of the squared
deviations about the population mean, $\mu$, divided by the number of
observations in the population, $N$.
Definition Formula: $\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$
Computational Formula: $\sigma^2 = \frac{\sum x_i^2 - \frac{(\sum x_i)^2}{N}}{N}$
The sample variance, $s^2$, of a variable is the sum of the squared
deviations about the sample mean, $\bar{x}$, divided by the number of
observations in the sample minus 1, $n - 1$.
Definition Formula: $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$
Computational Formula: $s^2 = \frac{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}{n - 1}$
To use the sample variance as an unbiased estimate of
the population variance, we divide the sum of the squared deviations
about the sample mean by $n - 1$. We call $n - 1$ the degrees of
freedom because the first $n - 1$ observations are free to take
any value, but the $n$th value has no freedom: it must
force $\sum (x_i - \bar{x})$ to be zero.
The population standard deviation, $\sigma$, is the square root of the
population variance: $\sigma = \sqrt{\sigma^2}$.
The sample standard deviation, $s$, is the square root of the sample
variance: $s = \sqrt{s^2}$.
To avoid round-off error, never use the rounded value of the
variance to compute the standard deviation. Keep a few extra
decimal places in intermediate calculations.
Example 1: Use the definition formula to find the population
variance and standard deviation.
Population: 4, 10, 12, 13, 21
Definition formula: $\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$, where $\mu = \frac{\sum x_i}{N} = \frac{4 + 10 + 12 + 13 + 21}{5} = \frac{60}{5} = 12$

x_i     x_i - mu          (x_i - mu)^2
4       4 - 12 = -8       (-8)^2 = 64
10      10 - 12 = -2      (-2)^2 = 4
12      12 - 12 = 0       0^2 = 0
13      13 - 12 = 1       1^2 = 1
21      21 - 12 = 9       9^2 = 81
                          Sum = 150

$\sum (x_i - \mu)^2 = 150$
Population variance: $\sigma^2 = \frac{150}{5} = 30$
Population standard deviation: $\sigma = \sqrt{\sigma^2} = \sqrt{30} \approx 5.5$
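The table for the population example translates directly into a few lines of Python. This is a minimal sketch of the definition formula; the variable names are illustrative assumptions.

```python
import math

population = [4, 10, 12, 13, 21]
N = len(population)
mu = sum(population) / N                      # 60 / 5 = 12

# One squared deviation per row of the table: 64, 4, 0, 1, 81.
squared_deviations = [(x - mu) ** 2 for x in population]

sigma2 = sum(squared_deviations) / N          # 150 / 5 = 30
sigma = math.sqrt(sigma2)                     # square root of the unrounded variance

print(sigma2, round(sigma, 1))                # 30.0 5.5
```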
Example 2: Use the definition formula to find the sample variance
and standard deviation.
Sample: 83, 65, 91, 84
Definition formula: $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$
Sample mean: $\bar{x} = \frac{\sum x_i}{n} = \frac{83 + 65 + 91 + 84}{4} = \frac{323}{4} = 80.75 \approx 80.8$

x_i     x_i - xbar              (x_i - xbar)^2
83      83 - 80.75 = 2.25       (2.25)^2 = 5.0625
65      65 - 80.75 = -15.75     (-15.75)^2 = 248.0625
91      91 - 80.75 = 10.25      (10.25)^2 = 105.0625
84      84 - 80.75 = 3.25       (3.25)^2 = 10.5625
                                Sum = 368.75

$\sum (x_i - \bar{x})^2 = 368.75$
Sample variance: $s^2 = \frac{368.75}{4 - 1} = 122.9166\ldots \approx 122.9$
Sample standard deviation: $s = \sqrt{122.9166\ldots} = 11.08677\ldots \approx 11.1$
Example 3: Use the computational formula to find the sample
variance and standard deviation.
Sample: 83, 65, 91, 84 (same data set as Example 2)
Computational formula: $s^2 = \frac{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}{n - 1}$

x_i     x_i^2
83      83^2 = 6889
65      65^2 = 4225
91      91^2 = 8281
84      84^2 = 7056
Sum: 323    Sum: 26451

$\sum x_i = 323$, $\sum x_i^2 = 26451$
Sample variance: $s^2 = \frac{26451 - \frac{(323)^2}{4}}{4 - 1} = \frac{26451 - 26082.25}{3} = 122.9166\ldots \approx 122.9$
Sample standard deviation: $s = \sqrt{122.9166\ldots} = 11.08677\ldots \approx 11.1$
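Both sample-variance formulas can be compared side by side in Python. The sketch below is an illustration under the assumption that the standard-library `statistics` module is an acceptable cross-check; `statistics.variance` and `statistics.stdev` use the $n - 1$ divisor.

```python
import math
import statistics

sample = [83, 65, 91, 84]
n = len(sample)
x_bar = sum(sample) / n

# Definition formula: squared deviations about the sample mean, over n - 1.
s2_definition = sum((x - x_bar) ** 2 for x in sample) / (n - 1)

# Computational formula: (sum of x^2 - (sum of x)^2 / n), over n - 1.
s2_computational = (sum(x * x for x in sample) - sum(sample) ** 2 / n) / (n - 1)

# Use the unrounded variance for the square root to avoid round-off error.
s = math.sqrt(s2_definition)

print(round(s2_definition, 1), round(s2_computational, 1), round(s, 1))
print(statistics.variance(sample), statistics.stdev(sample))
```

The two formulas are algebraically equivalent, so any difference in the printed values would come only from floating-point rounding.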
Example 4: Use StatCrunch to find the sample variance and
standard deviation.
Sample: 83, 65, 91, 84 (same data set as Example 2)
Step 1:
Click the StatCrunch navigation button under the Course Home page →
Click StatCrunch website → Click Open StatCrunch →
Input the raw data in the Var 1 column → Click Stat → Click Summary Stats → Columns
Step 2:
Click var1 under Select column(s): → Under Statistics:, choose Variance and
Std. dev. (click them while holding the Ctrl key on the keyboard) → Click Compute!
Variance and standard deviation are computed.
$s^2 \approx 122.9$
$s \approx 11.1$
For more detailed instructions, please download "Q3.2.20" by
clicking the StatCrunch Handout navigation button on the course
homepage.
Note: For a small data set, students are expected to calculate
the standard deviation by hand.
Objective B : Empirical Rule
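If a distribution is roughly bell shaped, then approximately 68% of the
data will lie within 1 standard deviation of the mean, approximately 95%
will lie within 2 standard deviations of the mean, and approximately
99.7% will lie within 3 standard deviations of the mean.
[Figure: illustration of the Empirical Rule]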
Example 1: SAT Math scores have a bell-shaped distribution with a
mean of 515 and a standard deviation of 114. (Source:
College Board, 2007)
(a) What percentage of SAT scores is between 401 and
629?
$\mu - 1\sigma = 515 - 114 = 401$
$\mu + 1\sigma = 515 + 114 = 629$
According to the Empirical Rule, approximately 68% of the data will
lie within 1 standard deviation of the mean.
Approximately 68% of SAT scores are between 401 and 629.
Example 1: (b) What percentage of SAT scores is between 287 and
743?
$\mu - 2\sigma = 515 - 2(114) = 287$
$\mu + 2\sigma = 515 + 2(114) = 743$
According to the Empirical Rule, approximately 95% of the data will
lie within 2 standard deviations of the mean.
Approximately 95% of SAT scores are between 287 and 743.
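Example 1: (c) What percentage of SAT scores is less than 401 or
greater than 629?
Scores below 401 or above 629 lie more than 1 standard deviation from the mean ($\mu - 1\sigma = 401$, $\mu + 1\sigma = 629$).
100% – 68% = 32%
Approximately 32% of SAT scores are less than 401 or greater than 629.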
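Example 1: (d) What percentage of SAT scores is between 515 and
743?
Approximately 95% of the scores lie between 287 and 743 ($\mu \pm 2\sigma$). Because a bell-shaped distribution is symmetric about the mean, 515, half of that 95% lies between 515 and 743.
95% ÷ 2 = 47.5%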
Example 1: (e) About 99.7% of SAT scores will be between what
scores?
According to the Empirical Rule, approximately 99.7% of the data
will lie within 3 standard deviations of the mean.
$(\mu - 3\sigma, \mu + 3\sigma) = (515 - 3(114), 515 + 3(114)) = (173, 857)$
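The interval arithmetic in parts (a) through (e) can be sketched in Python. The function name `within_k_sd` and the lookup table `EMPIRICAL_PERCENT` are illustrative assumptions; the percentages are the approximate Empirical Rule values and apply only to bell-shaped distributions.

```python
def within_k_sd(mu, sigma, k):
    """Interval (mu - k*sigma, mu + k*sigma)."""
    return (mu - k * sigma, mu + k * sigma)

# Approximate percentages from the Empirical Rule (bell-shaped data only).
EMPIRICAL_PERCENT = {1: 68.0, 2: 95.0, 3: 99.7}

mu, sigma = 515, 114
for k, percent in EMPIRICAL_PERCENT.items():
    low, high = within_k_sd(mu, sigma, k)
    print(f"about {percent}% of scores lie between {low} and {high}")

outside_1_sd = 100 - EMPIRICAL_PERCENT[1]       # part (c): 32%
mean_to_plus_2_sd = EMPIRICAL_PERCENT[2] / 2    # part (d): 47.5%, by symmetry
print(outside_1_sd, mean_to_plus_2_sd)          # 32.0 47.5
```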
Objective C : Chebyshev’s Inequality
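For any data set or distribution, at least $\left(1 - \frac{1}{k^2}\right) \cdot 100\%$ of the
data will lie within $k$ standard deviations of the mean, where $k$ is
any number greater than 1.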
Example 1: According to the U.S. Census Bureau, the mean
commute time for a resident of Boston,
Massachusetts, is 27.3 minutes. Assume that the
standard deviation of the commute time is 8.1 minutes
to answer the following:
(a) What minimum percentage of commuters in Boston has a
commute time within 2 standard deviations of the mean?
2 standard deviations → $k = 2$
According to Chebyshev's Inequality, at least
$\left(1 - \frac{1}{k^2}\right) \cdot 100\% = \left(1 - \frac{1}{2^2}\right) \cdot 100\% = \left(1 - \frac{1}{4}\right) \cdot 100\% = 75\%$
of commuters will have a commute time within 2 standard deviations of the mean.
Example 1: (b) (i) What minimum percentage of commuters in
Boston has a commute time within 1.5 standard
deviations of the mean?
(ii) What are the commute times within 1.5
standard deviations of the mean?
(i) According to Chebyshev's Inequality, at least $\left(1 - \frac{1}{k^2}\right) \cdot 100\%$
of the data will lie within $k$ standard deviations of the mean.
Since $k = 1.5$, $\left(1 - \frac{1}{1.5^2}\right) \cdot 100\% = (1 - 0.4444\ldots) \cdot 100\% \approx 55.6\%$
At least 55.6% of commuters in Boston have a commute time within 1.5 standard deviations of the mean.
(ii) $(\mu - 1.5\sigma, \mu + 1.5\sigma) = (27.3 - 1.5(8.1), 27.3 + 1.5(8.1)) = (15.15, 39.45)$
At least 55.6% of commuters in Boston have a commute time
between 15.15 minutes and 39.45 minutes.
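The Chebyshev calculations in this example can be sketched as a small Python function. The name `chebyshev_min_percent` is an illustrative assumption; the guard reflects that the bound is only informative for $k > 1$.

```python
def chebyshev_min_percent(k):
    """Minimum percentage of observations within k standard deviations
    of the mean, for any distribution. Requires k > 1."""
    if k <= 1:
        raise ValueError("Chebyshev's Inequality requires k > 1")
    return (1 - 1 / k ** 2) * 100

mu, sigma = 27.3, 8.1

print(chebyshev_min_percent(2))                # 75.0
print(round(chebyshev_min_percent(1.5), 1))    # 55.6

k = 1.5
interval = (mu - k * sigma, mu + k * sigma)
print(tuple(round(v, 2) for v in interval))    # (15.15, 39.45)
```

Unlike the Empirical Rule, this bound makes no assumption about the shape of the distribution, which is why it gives only a minimum percentage.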
Chapter 3.3
Measures of Central Tendency and Dispersion from
Grouped Data
In this section, we learn how to calculate the mean, $\bar{x}$, and
the weighted mean, $\bar{x}_w$, from data that have already been
summarized in frequency distributions (grouped data).
Since raw data cannot be retrieved from a frequency table, the class
midpoint is used to represent the mean of the data values within
each class.
Midpoint = (sum of consecutive lower class limits) ÷ 2
Objective A : Approximate the sample mean of a variable
from grouped data.
Objective B : The Weighted Mean, $\bar{x}_w$
Objective A : Approximate the sample mean of a variable
from grouped data.
Sample Mean: $\bar{x} = \frac{\sum x_i f_i}{\sum f_i}$
where $x_i$ is the midpoint of the $i$th class,
$f_i$ is the frequency of the $i$th class, and
$n$ is the number of classes.
Example 1: The following frequency distribution represents the
second test scores of my Math 227 class from last semester.
Approximate the mean of the scores.

Test Score   Number of Students (f_i)   Midpoint (x_i)           x_i f_i
1 - 20       1                          (1 + 21)/2 = 11          (11)(1) = 11
21 - 40      2                          (21 + 41)/2 = 31         (31)(2) = 62
41 - 60      7                          (41 + 61)/2 = 51         (51)(7) = 357
61 - 80      10                         (61 + 81)/2 = 71         (71)(10) = 710
81 - 100     5                          (81 + 101)/2 = 91        (91)(5) = 455
             Sum = 25                                            Sum = 1595

$\sum f_i = 25$, $\sum x_i f_i = 1595$
The mean of the scores: $\bar{x} = \frac{\sum x_i f_i}{\sum f_i} = \frac{1595}{25} = 63.8$
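The grouped-data table can be reproduced with a short Python loop. This is a sketch under the assumption that every class has the same width of 20, so the next lower class limit is always `lower + class_width`; the variable names are illustrative.

```python
# (lower class limit, frequency) pairs from the test-score table.
classes = [(1, 1), (21, 2), (41, 7), (61, 10), (81, 5)]
class_width = 20   # assumption: all classes share this width

total_xf = 0
total_f = 0
for lower, freq in classes:
    # Midpoint = (sum of consecutive lower class limits) / 2
    midpoint = (lower + (lower + class_width)) / 2
    total_xf += midpoint * freq
    total_f += freq

grouped_mean = total_xf / total_f
print(total_xf, total_f, grouped_mean)   # 1595.0 25 63.8
```

The result is an approximation of the true mean, because each score is replaced by its class midpoint.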
Objective B : The Weighted Mean, $\bar{x}_w$
We compute the weighted mean when data values are not weighted
equally.
$\bar{x}_w = \frac{\sum w_i x_i}{\sum w_i}$
where $w_i$ is the weight of the $i$th observation and
$x_i$ is the value of the $i$th observation.
Example 1: Michael and Kevin want to buy nuts. They can't agree
on whether they want peanuts, cashews, or almonds.
They agree to create a mix. They bought 2.5 pounds of
peanuts for $1.30 per pound, 4 pounds of cashews for
$4.50 per pound, and 2 pounds of almonds for $3.75
per pound. Determine the price per pound of the mix.

w_i (pounds)   x_i (price per pound)   w_i x_i
2.5            1.30                    (2.5)(1.30) = 3.25
4              4.50                    (4)(4.50) = 18
2              3.75                    (2)(3.75) = 7.5
Sum = 8.5                              Sum = 28.75

The price per pound of the mix: sum of w_i x_i divided by sum of w_i = 28.75 / 8.5 = 3.38235... ≈ $3.38 per lb
Example 2: In Marissa's calculus course, attendance counts for 5% of
the grade, quizzes count for 10% of the grade, exams
count for 60% of the grade, and the final exam counts for
25% of the grade. Marissa had a 100% average for
attendance, 93% for quizzes, 86% for exams, and 85% on
the final. Determine Marissa's course average.

w_i (weight, %)   x_i (score, %)   w_i x_i
5                 100              (5)(100) = 500
10                93               (10)(93) = 930
60                86               (60)(86) = 5160
25                85               (25)(85) = 2125
Sum = 100                          Sum = 8715

Marissa's course average: $\bar{x}_w = \frac{\sum w_i x_i}{\sum w_i} = \frac{8715}{100} = 87.15 \approx 87.2\%$
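Both weighted-mean examples follow the same pattern, which a small Python function makes explicit. The function name `weighted_mean` and its argument order are illustrative assumptions.

```python
def weighted_mean(values, weights):
    """Sum of w_i * x_i divided by the sum of the weights w_i."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

# Nut mix: price per pound (values) weighted by pounds bought (weights).
mix_price = weighted_mean([1.30, 4.50, 3.75], [2.5, 4, 2])

# Course grade: category averages weighted by their share of the grade.
course_average = weighted_mean([100, 93, 86, 85], [5, 10, 60, 25])

print(round(mix_price, 2), round(course_average, 2))
```

The weights do not need to sum to 1 or to 100, because the formula divides by their total.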
Ch 3.4 Measures of Positions and Outliers
Objective A : z -scores
Objective B : Percentiles and Quartiles
Objective C : Outliers
Ch3.4 Measures of Positions and Outliers
Measures of position determine the relative position of a certain
data value within the entire set of data.
Objective A : z -scores
The z -score represents the distance that a data value is from the
mean in terms of the number of standard deviations.
Population z-score: $z = \frac{x - \mu}{\sigma}$
Sample z-score: $z = \frac{x - \bar{x}}{s}$
Example 1: The average 20- to 29-year-old man is 69.6 inches tall,
with a standard deviation of 3.0 inches, while the
average 20- to 29-year-old woman is 64.1 inches tall,
with a standard deviation of 3.8 inches. Who is
relatively taller, a 67-inch man or 62-inch woman?
Man: $\mu = 69.6$ inches, $\sigma = 3.0$ inches, $x = 67$ inches
$z = \frac{x - \mu}{\sigma} = \frac{67 - 69.6}{3.0} \approx -0.87$
The 67-inch man is 0.87 standard deviations below the mean.
Woman: $\mu = 64.1$ inches, $\sigma = 3.8$ inches, $x = 62$ inches
$z = \frac{x - \mu}{\sigma} = \frac{62 - 64.1}{3.8} \approx -0.55$
The 62-inch woman is 0.55 standard deviations below the mean.
Therefore, the 62-inch woman is relatively taller than the 67-inch man, because her z-score is larger.
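The comparison of relative heights can be sketched in Python. The function name `z_score` and the variable names are illustrative assumptions.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd

z_man = z_score(67, 69.6, 3.0)
z_woman = z_score(62, 64.1, 3.8)
print(round(z_man, 2), round(z_woman, 2))   # -0.87 -0.55

# The larger z-score marks the relatively taller person.
relatively_taller = "woman" if z_woman > z_man else "man"
print(relatively_taller)                    # woman
```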
Objective B : Percentiles and Quartiles
B1. Percentiles
The $k$th percentile, $P_k$, of a set of data is a value such that $k$ percent
of the observations are less than or equal to the value.
Example 1: Explain the meaning of the following statement: the 5th percentile of the weight
of males 36 months of age is 12.0 kg.
5% of 36-month-old males weigh 12.0 kg or less.
95% of 36-month-old males weigh more than 12.0 kg.
The most common percentiles are quartiles.
The first quartile, $Q_1$, is equivalent to $P_{25}$.
The second quartile, $Q_2$, is equivalent to $P_{50}$.
The third quartile, $Q_3$, is equivalent to $P_{75}$.
Example 2: Determine the quartiles of the following data.
46 45 58 71 42 66 72 42 61 49 80
Ascending order:
42 42 45 46 49 58 61 66 71 72 80
$M = 58$, so $Q_2 = 58$
Lower half of the data:
42 42 45 46 49
$Q_1 = 45$
Upper half of the data:
61 66 71 72 80
$Q_3 = 71$
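The quartile method used in this example (the median of each half of the ordered data) can be sketched in Python. The function name `quartiles` is an illustrative assumption, and the sketch follows the slide convention of excluding the median from both halves when $n$ is odd; statistical software may use other quartile conventions and return slightly different values.

```python
import statistics

def quartiles(values):
    """Q1, Q2, Q3 where Q2 is the median and Q1, Q3 are the medians of the
    lower and upper halves (the median is excluded when n is odd)."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    upper = data[half + 1:] if n % 2 == 1 else data[half:]
    return (statistics.median(lower),
            statistics.median(data),
            statistics.median(upper))

data = [46, 45, 58, 71, 42, 66, 72, 42, 61, 49, 80]
print(quartiles(data))   # (45, 58, 71)
```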
B2. Interquartile Range
The interquartile range, IQR, is the measure of dispersion that is
based on quartiles: $IQR = Q_3 - Q_1$. The range and standard deviation are affected
by extreme values. The IQR is resistant to extreme values.
Example 1: One variable that is measured by online homework
systems is the amount of time a student spends on
homework for each section of the text. The following is
a summary of the number of minutes a student spends
for each section of the text for the fall 2007 semester in
a College Algebra class at Joliet Junior College.
$Q_1 = 42$
$Q_2 = 51.5$
$Q_3 = 72.5$
(a) Provide an interpretation of these results.
$Q_1 = 42$: 25% of the students spend 42 minutes or less on homework
for each section, and 75% of the students spend more than
42 minutes.
$Q_2 = 51.5$: 50% of the students spend 51.5 minutes or less on homework
for each section, and 50% of the students spend more than
51.5 minutes.
$Q_3 = 72.5$: 75% of the students spend 72.5 minutes or less on homework
for each section, and 25% of the students spend more than
72.5 minutes.
(b) Determine and interpret the interquartile range.
$IQR = Q_3 - Q_1 = 72.5 - 42 = 30.5$ minutes
The middle 50% of the times that students spend on homework
span a range of 30.5 minutes.
(c) Do you believe that the distribution of time spent doing
homework is skewed or symmetric? Why?
Skewed right. The difference between $Q_2$ and $Q_1$ (9.5 minutes) is less than the
difference between $Q_3$ and $Q_2$ (21 minutes).
Objective C : Outliers
Extreme observations are called outliers; they may occur by error in
the measurement or during data entry or from errors in sampling.
Example 1: The following data represent the hemoglobin ( in g/dL )
for 20 randomly selected cats. (Source: Joliet Junior College
Veterinarian Technology Program)
5.7 8.9 9.6 10.6 11.7 7.7 9.4 9.9 10.7 12.9 7.8 9.5
10.0 11.0 13.0 8.7 9.6 10.3 11.2 13.4
(a) Determine the quartiles.
Ascending order:
5.7 7.7 7.8 8.7 8.9 9.4 9.5 9.6 9.6 9.9 10.0 10.3
10.6 10.7 11.0 11.2 11.7 12.9 13.0 13.4
$M = \frac{9.9 + 10.0}{2} = 9.95$
(b) Compute and interpret the interquartile range, IQR.
Lower half of the data:
5.7 7.7 7.8 8.7 8.9 9.4 9.5 9.6 9.6 9.9
$Q_1 = \frac{8.9 + 9.4}{2} = 9.15$
Upper half of the data:
10.0 10.3 10.6 10.7 11.0 11.2 11.7 12.9 13.0 13.4
$Q_3 = \frac{11.0 + 11.2}{2} = 11.1$
(c) Determine the lower and upper fences. Are there any outliers,
according to this criterion?
Ascending order of the original data:
5.7 7.7 7.8 8.7 8.9 9.4 9.5 9.6 9.6 9.9 10.0 10.3
10.6 10.7 11.0 11.2 11.7 12.9 13.0 13.4
$Q_1 = 9.15$, $M = 9.95$, $Q_3 = 11.1$
$IQR = Q_3 - Q_1 = 11.1 - 9.15 = 1.95$
Lower Fence $= Q_1 - 1.5(IQR) = 9.15 - 1.5(1.95) = 6.225 \approx 6.23$
Upper Fence $= Q_3 + 1.5(IQR) = 11.1 + 1.5(1.95) = 14.025 \approx 14.03$
All data values fall between 6.23 and 14.03 except 5.7.
5.7 is the outlier.
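The fence check in part (c) can be sketched in Python. The function name `fences` is an illustrative assumption, and the quartiles are taken as given from parts (a) and (b) rather than recomputed.

```python
def fences(q1, q3):
    """Lower and upper fences: Q1 - 1.5 * IQR and Q3 + 1.5 * IQR."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

hemoglobin = [5.7, 8.9, 9.6, 10.6, 11.7, 7.7, 9.4, 9.9, 10.7, 12.9,
              7.8, 9.5, 10.0, 11.0, 13.0, 8.7, 9.6, 10.3, 11.2, 13.4]

lower, upper = fences(9.15, 11.1)     # quartiles from parts (a) and (b)
outliers = [x for x in hemoglobin if x < lower or x > upper]

print(round(lower, 3), round(upper, 3), outliers)
```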
Ch 3.5 The Five-Number Summary and Boxplots
Objective A : The Five-Number Summary
Objective B : Boxplots
Objective C : Using a Boxplot to describe the shape of
a distribution
Objective A : The Five-Number Summary
The five-number summary of a set of data consists of the minimum,
$Q_1$, $M$ ($Q_2$), $Q_3$, and the maximum.
Example 1: The number of chocolate chips in each of 21 randomly
selected name-brand cookies was recorded. The results are
shown.
28 23 28 31 27 29 24 19 26 23 21 25 22 23
21 23 33 28 33 21 30
Find the five-number summary.
Ascending order:
19 21 21 21 22 23 23 23 23 24 25 26 27 28 28 28 29
30 31 33 33
$M = 25$
Lower half of the data: 19 21 21 21 22 23 23 23 23 24
$Q_1 = \frac{22 + 23}{2} = 22.5$
Upper half of the data: 26 27 28 28 28 29 30 31 33 33
$Q_3 = \frac{28 + 29}{2} = 28.5$
Five-number summary:
Minimum = 19, $Q_1$ = 22.5, $Q_2$ = 25, $Q_3$ = 28.5, Maximum = 33
Objective B : Boxplots
The five-number summary can be used to construct a graph called
the boxplot.
Example 1: A stockbroker recorded the number of clients she saw
each day over an 11-day period. The data are shown.
Draw a boxplot.
32 39 41 30 31 43 48 27 42 20 34
Ascending order:
20 27 30 31 32 34 39 41 42 43 48
$Q_1 = 30$, $M = 34$, $Q_3 = 42$
$IQR = Q_3 - Q_1 = 42 - 30 = 12$
Lower Fence $= Q_1 - 1.5(IQR) = 30 - 1.5(12) = 12$
Upper Fence $= Q_3 + 1.5(IQR) = 42 + 1.5(12) = 60$
Since all data fall between the lower fence, 12, and the upper
fence, 60, there are no outliers.
[Boxplot: horizontal axis from 20 to 55]
Objective C : Using a Boxplot to describe the shape of a
distribution
Example 1: Use the side-by-side boxplots shown to answer the
questions that follow.
(a) To the nearest integer, what is the median of variable x?
$M \approx 15$
(b) To the nearest integer, what is the first quartile of variable y?
$Q_1 \approx 22$
(c) Which variable has more dispersion? Why?
The y variable has more dispersion because the IQR of y is
wider than the IQR of the x variable.
(d) Does the variable x have any outliers? If so, what is the value
of the outlier?
Yes, there is an asterisk on the right side of the boxplot.
Outlier $\approx 30$
(e) Describe the shape of the variable y. Support your position.
Since there is a longer whisker on the left and $Q_2 - Q_1$ is bigger
than $Q_3 - Q_2$, the shape of the distribution is skewed to the left.
Example 2: The following data represent the carbon dioxide
emissions per capita (total carbon dioxide emissions, in
tons, divided by total population) for the countries of
Western Europe in 2004.
(a) Find the five-number summary.
Ascending order:
1.01 1.34 1.40 1.44 1.47 1.53 1.61 1.64 1.67 2.07 2.08
2.09 2.12 2.21 2.34 2.38 2.39 2.64 2.67 2.68 2.87 3.44
3.65 3.86 5.22 6.81
$M = \frac{2.12 + 2.21}{2} = 2.165$
Minimum = 1.01, $Q_1$ = 1.61, $Q_2$ = 2.165, $Q_3$ = 2.68, Maximum = 6.81
(b) Determine the lower and upper fences.
$IQR = Q_3 - Q_1 = 2.68 - 1.61 = 1.07$
Lower Fence $= Q_1 - 1.5(IQR) = 1.61 - 1.5(1.07) = 0.005 \approx 0.01$
Upper Fence $= Q_3 + 1.5(IQR) = 2.68 + 1.5(1.07) = 4.285 \approx 4.29$
Outliers: 5.22, 6.81
(c) Construct a boxplot.
[Boxplot: horizontal axis from 1.00 to 8.00]
5.22 is a mild outlier, which is represented by an asterisk.
6.81 is an extreme outlier because it is larger than
$Q_3 + 3(IQR) = 2.68 + 3(1.07) = 5.89$. An extreme outlier is represented by an open
circle.
(d) Use the boxplot and quartiles to describe the shape of the
distribution.
Since there are two large outliers on the right, the shape of the
distribution is skewed to the right.
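The distinction between mild and extreme outliers can be sketched in Python. The function name `classify_outliers` is an illustrative assumption. The slides state only the upper cutoff, $Q_3 + 3(IQR)$; treating $Q_1 - 3(IQR)$ as the matching lower cutoff is an assumption made here by symmetry.

```python
def classify_outliers(values, q1, q3):
    """Split outliers into mild (beyond 1.5 * IQR from the quartiles)
    and extreme (beyond 3 * IQR from the quartiles)."""
    iqr = q3 - q1
    mild, extreme = [], []
    for x in values:
        if x < q1 - 3 * iqr or x > q3 + 3 * iqr:
            extreme.append(x)
        elif x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr:
            mild.append(x)
    return mild, extreme

emissions = [1.01, 1.34, 1.40, 1.44, 1.47, 1.53, 1.61, 1.64, 1.67, 2.07,
             2.08, 2.09, 2.12, 2.21, 2.34, 2.38, 2.39, 2.64, 2.67, 2.68,
             2.87, 3.44, 3.65, 3.86, 5.22, 6.81]

mild, extreme = classify_outliers(emissions, 1.61, 2.68)
print(mild, extreme)   # [5.22] [6.81]
```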
Note: Parts (a) and (c) can be easily done by using StatCrunch. For
the instructions, please refer to the StatCrunch handout.