Sec 3.3 Navidi

Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
SECTION 3.3
MEASURES OF POSITION
Section 3.3 - Objectives
1. Compute and interpret z-scores
2. Compute percentiles of a data set
3. Compute the quartiles of a data set
4. Compute the five-number summary for a data set
5. Understand the effects of outliers
6. Construct boxplots to visualize the five-number summary and outliers
Objective 1
Compute and interpret z-scores
z-Score
Who is taller, a man 73 inches tall or a woman 68 inches tall? The obvious
answer is that the man is taller. However, men are taller than women on the
average. Suppose the question is asked this way: Who is taller relative to their
gender, a man 73 inches tall or a woman 68 inches tall?
One way to answer this question is with a z-score.
z-Score
The z-score of an individual data value tells how many standard deviations
that value lies above or below its population mean.
For example, a value one standard deviation above the mean has a z-score of
1. A value two standard deviations below the mean has a z-score of –2.
Definition:
Let x be a value from a population with mean μ and standard deviation σ.
The z-score for x is
z = (x – μ) / σ
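The definition translates directly into a small function. The following Python sketch is illustrative only; the function name `z_score` and the population values μ = 50 and σ = 10 are hypothetical, chosen to reproduce the two cases described above (one standard deviation above the mean, two below).

```python
def z_score(x: float, mu: float, sigma: float) -> float:
    """Return how many standard deviations x lies above (+) or below (-) mu."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma


# Hypothetical population: mean 50, standard deviation 10
print(z_score(60, 50, 10))  # 1.0: one standard deviation above the mean
print(z_score(30, 50, 10))  # -2.0: two standard deviations below the mean
```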
Example 3.22
A National Center for Health Statistics study states that the mean height for adult
men in the U.S. is μ = 69.4 inches, with a standard deviation of σ = 3.1 inches.
The mean height for adult women is μ = 63.8 inches, with a standard deviation
of σ = 2.8 inches. Who is taller relative to their gender, a man 73 inches tall, or
a woman 68 inches tall?
Solution
We compute the z-scores for the two heights.
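z(man's height) = (x – μ) / σ = (73 – 69.4) / 3.1 ≈ 1.16
z(woman's height) = (x – μ) / σ = (68 – 63.8) / 2.8 = 1.50
The man's height is 1.16 standard deviations above the mean for men, while the
woman's height is 1.50 standard deviations above the mean for women. The woman
is therefore taller, relative to the population of women's heights, than the man
is relative to the population of men's heights.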
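The same arithmetic can be checked in Python. This is a minimal sketch using only the means and standard deviations stated in Example 3.22; the helper name `z_score` is hypothetical.

```python
def z_score(x, mu, sigma):
    """z = (x - mu) / sigma"""
    return (x - mu) / sigma


# Means and standard deviations stated in Example 3.22
z_man = z_score(73, mu=69.4, sigma=3.1)
z_woman = z_score(68, mu=63.8, sigma=2.8)

print(f"man:   z = {z_man:.2f}")    # man:   z = 1.16
print(f"woman: z = {z_woman:.2f}")  # woman: z = 1.50

# The larger z-score means taller relative to one's own gender
print("woman" if z_woman > z_man else "man")  # woman
```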
z-Scores & The Empirical Rule
Since the z-score is the number of standard deviations from the mean, we can
easily interpret z-scores for bell-shaped populations using the Empirical Rule.
When a population has a histogram that is approximately bell-shaped:
• Approximately 68% of the data will have z-scores between –1 and 1.
• Approximately 95% of the data will have z-scores between –2 and 2.
• All, or almost all, of the data will have z-scores between –3 and 3.
[Figure: bell-shaped curve with z = –3, –2, –1, 1, 2, and 3 marked on the horizontal axis]
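The Empirical Rule can be checked by simulation. This Python sketch assumes a bell-shaped population, generated here as 20,000 normal values with mean 100 and standard deviation 15 (hypothetical numbers, not from the text); it computes each value's z-score and reports the fraction falling within 1, 2, and 3 standard deviations of the mean.

```python
import random
import statistics


def fraction_within(data, k):
    """Fraction of values whose z-score lies strictly between -k and k."""
    mu = statistics.fmean(data)
    sigma = statistics.pstdev(data)  # treat the data as the whole population
    return sum(1 for x in data if -k < (x - mu) / sigma < k) / len(data)


random.seed(0)  # fixed seed so the run is reproducible
# Assumption: a simulated bell-shaped population (normal, mean 100, sd 15)
population = [random.gauss(100, 15) for _ in range(20_000)]

for k in (1, 2, 3):
    print(f"z between -{k} and {k}: {fraction_within(population, k):.3f}")
```

For a bell-shaped population the three fractions should come out close to 0.68, 0.95, and 1.00; a strongly skewed population need not follow the rule.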
Objective 2
Compute percentiles of a data set
Objective 3
Compute the quartiles of a data set
Quartiles
There are three special percentiles that divide a data set into four
pieces, each of which contains approximately one quarter of the data.
These values are called the quartiles.
• The first quartile, denoted Q1, is the 25th percentile.
Q1 separates the lowest 25% of the data from the highest 75%.
• The second quartile, denoted Q2, is the 50th percentile.
Q2 separates the lowest 50% of the data from the highest 50%. Q2
is the same as the median.
• The third quartile, denoted Q3, is the 75th percentile.
Q3 separates the lowest 75% of the data from the highest 25%.
Example 3.25
The following table presents the rainfall, in inches, in Los Angeles during
the month of February for each year from 1965 to 2006. Compute the first and
third quartiles.
0.23   3.06   6.61   1.51   12.75  3.21   0.11
1.48   1.30   0.49   0.70   4.94   8.03   4.37
0.08   2.58   0.00   13.68  0.67   2.84   0.56
0.13   6.10   5.54   7.89   1.22   8.87   0.14
1.72   0.29   3.54   1.90   4.64   3.71   3.12
4.89   0.17   4.13   11.02  8.91   7.96   2.37
Solution
Step 1: Arrange the data in increasing order.
0.00   0.08   0.11   0.13   0.14   0.17   0.23
0.29   0.49   0.56   0.67   0.70   1.22   1.30
1.48   1.51   1.72   1.90   2.37   2.58   2.84
3.06   3.12   3.21   3.54   3.71   4.13   4.37
4.64   4.89   4.94   5.54   6.10   6.61   7.89
7.96   8.03   8.87   8.91   11.02  12.75  13.68
Step 2: There are 42 values in the data set. We compute L = (p/100)·n for
both p = 25 and p = 75:
L25 = (25/100)·42 = 10.5
L75 = (75/100)·42 = 31.5
Step 3: We round these values up to 11 and 32. The first quartile, Q1, is the
value in the 11th position, and the third quartile, Q3, is the value in the
32nd position. We see that Q1 = 0.67 and Q3 = 5.54.
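Steps 1 through 3 can be collected into a short Python function. This is a sketch of the position rule used in this example, L = (p/100)·n rounded up; the example only shows the case where L is not a whole number, so the branch that averages the values in positions L and L + 1 when L is whole is an assumption (a common textbook convention), marked as such in the code. The function name `percentile` is hypothetical.

```python
import math


def percentile(data, p):
    """pth percentile (0 < p < 100) using the locator L = (p/100) * n."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    values = sorted(data)             # Step 1: increasing order
    n = len(values)
    L = p / 100 * n                   # Step 2: compute L
    if not L.is_integer():
        # Step 3: round L up; positions are 1-based, list indexes 0-based
        return values[math.ceil(L) - 1]
    # ASSUMPTION (not shown in this example): for a whole-number L,
    # average the values in positions L and L + 1
    L = int(L)
    return (values[L - 1] + values[L]) / 2


rainfall = [
    0.23, 3.06, 6.61, 1.51, 12.75, 3.21, 0.11,
    1.48, 1.30, 0.49, 0.70, 4.94, 8.03, 4.37,
    0.08, 2.58, 0.00, 13.68, 0.67, 2.84, 0.56,
    0.13, 6.10, 5.54, 7.89, 1.22, 8.87, 0.14,
    1.72, 0.29, 3.54, 1.90, 4.64, 3.71, 3.12,
    4.89, 0.17, 4.13, 11.02, 8.91, 7.96, 2.37,
]

print(percentile(rainfall, 25))  # 0.67
print(percentile(rainfall, 75))  # 5.54
```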
Quartiles on the TI-84 PLUS
On the TI-84 PLUS Calculator, the 1-Var Stats command, also used for means,
medians, and standard deviations, will compute quartiles.
We enter the data into list L1 and run the 1-Var Stats command.
[Screenshot: 1-Var Stats output with the first quartile and third quartile labeled]
Objective 4
Compute the five-number summary for a data set
Five-Number Summary
The five-number summary of a data set consists of the median, the first
quartile, the third quartile, the smallest value, and the largest value. These
values are usually listed from smallest to largest.
Definition:
The five-number summary of a data set consists of the following quantities:
Minimum
First Quartile
Median
Third Quartile
Maximum
Example – Five-Number Summary
Consider again the Los Angeles rainfall data, arranged in increasing order in Step 1 of Example 3.25.
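The minimum is 0.00 and the maximum is 13.68. Since there are 42 values, the
median is the average of the 21st and 22nd values: (2.84 + 3.06)/2 = 2.95.
We have already computed Q1 = 0.67 and Q3 = 5.54.
The five-number summary for this data set is:
Minimum = 0.00, Q1 = 0.67, Median = 2.95, Q3 = 5.54, Maximum = 13.68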
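A five-number summary can be computed in a few lines. This Python sketch reuses the round-up position rule from Example 3.25 for the quartiles and `statistics.median` for the median; the function name `five_number_summary` is hypothetical, and the sketch deliberately refuses the case where L is a whole number, since that case is not worked in this section.

```python
import math
import statistics


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum)."""
    values = sorted(data)
    n = len(values)

    def by_position(p):
        # Round-up position rule from Example 3.25; this sketch only
        # handles a locator L that is not a whole number
        L = p / 100 * n
        if L.is_integer():
            raise ValueError("whole-number L is not handled in this sketch")
        return values[math.ceil(L) - 1]

    return (values[0], by_position(25), statistics.median(values),
            by_position(75), values[-1])


rainfall = [
    0.23, 3.06, 6.61, 1.51, 12.75, 3.21, 0.11,
    1.48, 1.30, 0.49, 0.70, 4.94, 8.03, 4.37,
    0.08, 2.58, 0.00, 13.68, 0.67, 2.84, 0.56,
    0.13, 6.10, 5.54, 7.89, 1.22, 8.87, 0.14,
    1.72, 0.29, 3.54, 1.90, 4.64, 3.71, 3.12,
    4.89, 0.17, 4.13, 11.02, 8.91, 7.96, 2.37,
]

labels = ("Minimum", "Q1", "Median", "Q3", "Maximum")
for label, value in zip(labels, five_number_summary(rainfall)):
    print(f"{label:8}{value:6.2f}")
```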
Objective 5
Understand the effects of outliers
Outliers
An outlier is a value that is considerably larger or considerably
smaller than most of the values in a data set.
Some outliers result from errors; for example, a misplaced decimal
point may cause a number to be much larger or smaller than the
other values in a data set.
Some outliers are correct values, and simply reflect the fact that
the population contains some extreme values.
Example 3.28
The temperature in a downtown location in a certain city is measured for eight
consecutive days during the summer. The readings, in degrees Fahrenheit, are
81.2, 85.6, 89.3, 91.0, 83.2, 8.45, 79.5, and 87.8.
Which reading is an outlier? Is it certain that the outlier is an error, or is it
possible that it is correct? Should the outlier be deleted?
Solution:
The outlier is 8.45, which is much smaller than the rest of the data. This outlier is
certainly an error; it is likely that a decimal point was misplaced. The outlier
should be corrected if possible, or deleted.
Interquartile Range
One method for detecting outliers involves a measure called the interquartile
range (IQR).
Definition:
The interquartile range is found by subtracting the first quartile from the third
quartile.
IQR = Q3 – Q1
IQR Method for Detecting Outliers
The most commonly used method for detecting outliers in a data set is the IQR
method. The procedure for the IQR method is:
Step 1: Find the first quartile Q1, and the third quartile Q3.
Step 2: Compute the interquartile range: IQR = Q3 – Q1.
Step 3: Compute the outlier boundaries. These boundaries are the cutoff
points for determining outliers:
Lower Outlier Boundary = Q1 – 1.5(IQR)
Upper Outlier Boundary = Q3 + 1.5(IQR)
Step 4: Any data value that is less than the lower outlier boundary or greater
than the upper outlier boundary is considered to be an outlier.
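Steps 2 through 4 of the IQR method can be sketched in Python as below, assuming the quartiles from Step 1 are already known. The function names `outlier_boundaries` and `iqr_outliers` are hypothetical. As a usage example, the sketch applies the method to the Los Angeles rainfall data with the quartiles found in Example 3.25 (Q1 = 0.67, Q3 = 5.54).

```python
def outlier_boundaries(q1, q3):
    """Steps 2 and 3: return (lower, upper) outlier boundaries."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def iqr_outliers(data, q1, q3):
    """Step 4: values below the lower or above the upper boundary."""
    lower, upper = outlier_boundaries(q1, q3)
    return [x for x in data if x < lower or x > upper]


# Usage: Los Angeles February rainfall with Q1 and Q3 from Example 3.25
rainfall = [
    0.23, 3.06, 6.61, 1.51, 12.75, 3.21, 0.11,
    1.48, 1.30, 0.49, 0.70, 4.94, 8.03, 4.37,
    0.08, 2.58, 0.00, 13.68, 0.67, 2.84, 0.56,
    0.13, 6.10, 5.54, 7.89, 1.22, 8.87, 0.14,
    1.72, 0.29, 3.54, 1.90, 4.64, 3.71, 3.12,
    4.89, 0.17, 4.13, 11.02, 8.91, 7.96, 2.37,
]
lower, upper = outlier_boundaries(0.67, 5.54)
print(f"boundaries: {lower:.3f}, {upper:.3f}")
print(iqr_outliers(rainfall, 0.67, 5.54))  # [13.68]
```

With these quartiles the boundaries are about –6.6 and 12.8 inches, so the single value 13.68 is flagged as an outlier.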
Example 3.30
The following table presents the number of students absent in a middle school in
northwestern Montana for each school day in January 2008. Identify any outliers.
Date     Jan. 2   Jan. 3   Jan. 4   Jan. 7   Jan. 8   Jan. 9
Absent   65       67       71       57       51       49
Date     Jan. 10  Jan. 11  Jan. 14  Jan. 15  Jan. 16  Jan. 17
Absent   44       41       59       49       42       56
Date     Jan. 18  Jan. 21  Jan. 22  Jan. 23  Jan. 24  Jan. 25
Absent   45       77       44       42       45       46
Date     Jan. 28  Jan. 29  Jan. 30  Jan. 31
Absent   100      59       53       51
Solution
We may use the TI-84 PLUS, or other
technology, to compute the quartiles.
We see that Q1 = 45 and Q3 = 59.
Solution
The Interquartile Range is IQR = Q3 – Q1 = 59 – 45 = 14.
The outlier boundaries are:
Lower Outlier Boundary = Q1 – 1.5(IQR) = 45 – 1.5(14) = 24
Upper Outlier Boundary = Q3 + 1.5(IQR) = 59 + 1.5(14) = 80
There are no values in the data set less than the lower boundary of 24. There is
one value, 100, which is greater than the upper boundary of 80. Thus there is
one outlier, 100.
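The quartiles reported by the calculator can also be reproduced with the round-up position rule from Example 3.25. In this Python sketch, L = (25/100)·22 = 5.5 and L = (75/100)·22 = 16.5, so rounding up picks positions 6 and 17; the helper name `quartile_by_position` is hypothetical.

```python
import math

# Number of students absent, January 2008 (Example 3.30)
absences = [65, 67, 71, 57, 51, 49,
            44, 41, 59, 49, 42, 56,
            45, 77, 44, 42, 45, 46,
            100, 59, 53, 51]


def quartile_by_position(data, p):
    """Round-up position rule; fine here because L = 5.5 and 16.5."""
    values = sorted(data)
    L = p / 100 * len(values)
    if L.is_integer():
        raise ValueError("whole-number L is not handled in this sketch")
    return values[math.ceil(L) - 1]


q1 = quartile_by_position(absences, 25)
q3 = quartile_by_position(absences, 75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [x for x in absences if x < lower or x > upper]

print(q1, q3, iqr)   # 45 59 14
print(lower, upper)  # 24.0 80.0
print(outliers)      # [100]
```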
Objective 6
Construct boxplots to visualize the five-number
summary and outliers
Any Data Set
A boxplot is a graph that presents the five-number summary
along with some additional information about a data set.
There are several different kinds of boxplots. The one we
describe here is sometimes called a modified boxplot.
Procedure for Drawing a Boxplot
Following is the procedure for drawing a boxplot:
Step 1: Compute the first quartile, the median, and the third quartile.
Step 2: Draw vertical lines at the first quartile, the median, and the third
quartile. Draw horizontal lines between the first and third quartiles to
complete the box.
Step 3: Compute the lower and upper outlier boundaries.
Step 4: Find the largest data value that is less than the upper outlier boundary.
Draw a horizontal line from the third quartile to this value. This
horizontal line is called a whisker.
Step 5: Find the smallest data value that is greater than the lower outlier
boundary. Draw a horizontal line (whisker) from the first quartile to this
value.
Step 6: Determine which values, if any, are outliers. Plot each outlier
separately.
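The numbers needed for each step of the procedure can be computed as in this Python sketch, applied here to the absence data of Examples 3.30 and 3.31. It only computes the box, the whisker ends, and the outliers; it does not draw the plot. The function name `boxplot_elements` is hypothetical, and two assumptions are labeled in the code: the quartiles use the round-up position rule from Example 3.25 (the whole-number case is not handled), and a value exactly equal to an outlier boundary is treated as a non-outlier that a whisker may reach.

```python
import math
import statistics


def boxplot_elements(data):
    """Numbers needed to draw a modified boxplot (Steps 1-6)."""
    values = sorted(data)
    n = len(values)

    def by_position(p):
        # ASSUMPTION: round-up position rule from Example 3.25;
        # a whole-number locator L is not handled in this sketch
        L = p / 100 * n
        if L.is_integer():
            raise ValueError("whole-number L is not handled in this sketch")
        return values[math.ceil(L) - 1]

    # Step 1: quartiles and median (Step 2 draws the box from these)
    q1, median, q3 = by_position(25), statistics.median(values), by_position(75)
    # Step 3: outlier boundaries
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # ASSUMPTION: a value exactly on a boundary counts as a non-outlier
    inside = [x for x in values if lower <= x <= upper]
    return {
        "box": (q1, median, q3),
        "upper_whisker_end": max(inside),  # Step 4
        "lower_whisker_end": min(inside),  # Step 5
        "outliers": [x for x in values if x < lower or x > upper],  # Step 6
    }


absences = [65, 67, 71, 57, 51, 49,
            44, 41, 59, 49, 42, 56,
            45, 77, 44, 42, 45, 46,
            100, 59, 53, 51]
print(boxplot_elements(absences))
```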
Example 3.31
The following table presents the number of students absent in a middle school in
northwestern Montana for each school day in January 2008. Construct a boxplot.
Date     Jan. 2   Jan. 3   Jan. 4   Jan. 7   Jan. 8   Jan. 9
Absent   65       67       71       57       51       49
Date     Jan. 10  Jan. 11  Jan. 14  Jan. 15  Jan. 16  Jan. 17
Absent   44       41       59       49       42       56
Date     Jan. 18  Jan. 21  Jan. 22  Jan. 23  Jan. 24  Jan. 25
Absent   45       77       44       42       45       46
Date     Jan. 28  Jan. 29  Jan. 30  Jan. 31
Absent   100      59       53       51
Solution
Step 1:
We may use the TI-84 PLUS, or other
technology, to compute the median and
quartiles. We see that Median = 51,
Q1 = 45, and Q3 = 59.
Solution
Step 2:
We draw vertical lines at 45, 51, and 59, then horizontal lines to complete the
box.
Step 3:
We compute the outlier boundaries:
Lower Outlier Boundary = Q1 – 1.5(IQR) = 24
Upper Outlier Boundary = Q3 + 1.5(IQR) = 80
Solution
Step 4:
The largest data value that is less than the upper boundary is 77. We draw a
horizontal line from 59 up to 77.
Solution
Step 5:
The smallest data value that is greater than the lower boundary is 41. We draw
a horizontal line from 45 down to 41.
Solution
Step 6:
The data value 100 lies outside of the outlier boundaries. Therefore, 100 is an
outlier. We plot this point separately.
Determining Skewness
Boxplots can help determine the skewness of a data set.

If the median is closer to the first
quartile than to the third quartile,
or the upper whisker is longer than
the lower whisker, the data are
skewed to the right.
Determining Skewness

If the median is closer to the third
quartile than to the first quartile, or
the lower whisker is longer than the
upper whisker, the data are skewed
to the left.
Determining Skewness

If the median is approximately
halfway between the first and
third quartiles, and the two whiskers
are approximately equal in length,
the data are approximately
symmetric.
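The three rules can be combined into a small function. This Python sketch is one possible reading of them, not a standard procedure: the function name `boxplot_skewness` is hypothetical, the tolerance `tol` (a fraction of the IQR) for deciding when two lengths are "approximately" equal is a hypothetical parameter since the text gives no number, and the "mixed signals" result for rules that point in opposite directions is likewise an assumption.

```python
def boxplot_skewness(q1, median, q3, lower_end, upper_end, tol=0.1):
    """Classify skewness from a boxplot using the median and whisker rules.

    tol (hypothetical): fraction of the IQR within which two lengths are
    treated as approximately equal; the text does not give a number.
    """
    slack = tol * (q3 - q1)
    # Negative: median closer to Q1; positive: median closer to Q3
    median_gap = (median - q1) - (q3 - median)
    # Positive: upper whisker longer; negative: lower whisker longer
    whisker_gap = (upper_end - q3) - (q1 - lower_end)

    right = median_gap < -slack or whisker_gap > slack
    left = median_gap > slack or whisker_gap < -slack
    if right and not left:
        return "skewed to the right"
    if left and not right:
        return "skewed to the left"
    if not right and not left:
        return "approximately symmetric"
    return "mixed signals"  # ASSUMPTION: the two rules disagree


# Example 3.31 boxplot: Q1 = 45, median = 51, Q3 = 59, whiskers end at 41 and 77
print(boxplot_skewness(45, 51, 59, 41, 77))  # skewed to the right
```

For the Example 3.31 boxplot both rules agree: the median is closer to Q1 (6 versus 8) and the upper whisker (18) is longer than the lower one (4).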
Do You Know…
• How to compute and interpret z-scores?
• How to compute percentiles of a data set?
• How to compute the quartiles of a data set?
• How to compute the five-number summary for a data set?
• The effects of outliers?
• How to construct boxplots?