SECTION 3.4 - MEASURES OF POSITION AND OUTLIERS

Definition
The z-score represents the distance that a data value is from the mean in terms of
the number of standard deviations. It is obtained by subtracting the mean from the data
value and dividing this result by the standard deviation. There is both a population z-score and a sample z-score; their formulas are:
Population z-score: $z = \frac{x - \mu}{\sigma}$
Sample z-score: $z = \frac{x - \bar{x}}{s}$
The z-score is unitless. It has mean 0 and standard deviation 1.
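As a minimal sketch of the sample formula, the Python below standardizes a small data set and checks that the resulting scores have mean 0 and standard deviation 1. The function name `sample_z_scores` and the `heights` values are hypothetical, chosen only for illustration.

```python
from statistics import mean, stdev

def sample_z_scores(data):
    """Return the sample z-score of every value in data."""
    x_bar = mean(data)
    s = stdev(data)  # sample standard deviation (divides by n - 1)
    return [(x - x_bar) / s for x in data]

# Hypothetical heights in inches, not taken from the text.
heights = [62.0, 64.5, 67.0, 69.5, 72.0]
z = sample_z_scores(heights)

print([round(v, 3) for v in z])
# The standardized values should have mean 0 and standard deviation 1
# (up to floating-point rounding).
print(round(mean(z), 10), round(stdev(z), 10))
```

Because the units cancel in the division, the same scores result whether the heights are recorded in inches or centimeters.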
Example 1
The average 20- to 29-year-old man is 69.6 inches tall, with a standard deviation
of 3.0 inches, while the average 20- to 29-year-old woman is 64.1 inches tall, with a
standard deviation of 3.8 inches. Who is relatively taller, a 67-inch man or a 62-inch
woman? (p. 173, Ex. 12)
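One way to work this comparison, assuming the stated averages and standard deviations are treated as population values, is to compute a z-score for each person:

```latex
\begin{align*}
z_{\text{man}}   &= \frac{67 - 69.6}{3.0} = \frac{-2.6}{3.0} \approx -0.87 \\
z_{\text{woman}} &= \frac{62 - 64.1}{3.8} = \frac{-2.1}{3.8} \approx -0.55
\end{align*}
```

Both heights are below their group means, but the woman's z-score is larger (less negative), so under this reading the 62-inch woman is relatively taller.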
Definition
The kth percentile, denoted $P_k$, of a set of data is the value such that k percent of
the observations are less than or equal to the value.
Example 2
Use the dividend yield data from Example 1 in Section 2.2 to calculate the
percentile for a dividend yield of 1.68.
1.7 0 1.15 0.62 1.06 2.45 2.38
2.83 2.16 1.05 1.22 1.68 0.89 0
2.59 0 1.7 0.64 0.67 2.07 0.94
2.04 0 0 1.35 0 0 0.41
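The sketch below applies the definition above literally: it counts the observations less than or equal to 1.68 and expresses that count as a percent of all 28 observations. The helper name `percent_at_or_below` is hypothetical, and some textbooks count only the values strictly less than the given value, which would give a slightly lower percentile.

```python
# Dividend yields as listed in Example 2 (28 observations).
yields = [
    1.7, 0, 1.15, 0.62, 1.06, 2.45, 2.38,
    2.83, 2.16, 1.05, 1.22, 1.68, 0.89, 0,
    2.59, 0, 1.7, 0.64, 0.67, 2.07, 0.94,
    2.04, 0, 0, 1.35, 0, 0, 0.41,
]

def percent_at_or_below(data, value):
    """Percent of observations that are less than or equal to value."""
    count = sum(1 for x in data if x <= value)
    return 100 * count / len(data)

pct = percent_at_or_below(yields, 1.68)
# 19 of the 28 yields are at or below 1.68, about 67.9 percent.
print(round(pct, 1))
print(round(pct))  # percentile rounded to the nearest whole number
```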
Definitions
Quartiles divide the data set into fourths or four equal parts. The first quartile,
$Q_1$, divides the bottom 25% from the top 75%. The second quartile, $Q_2$, divides the
bottom 50% of the data from the top 50% of the data. The third quartile, $Q_3$, divides the
bottom 75% from the top 25%.
Finding Quartiles
Step 1. Arrange the data in ascending order.
Step 2. Determine the median, M, or second quartile, $Q_2$.
Step 3. Determine the first and third quartiles, $Q_1$ and $Q_3$, by dividing the data set into
two halves; the bottom half will be the observations below (to the left of) the
location of the median and the top half will be the observations above (to the right
of) the location of the median. The first quartile is the median of the bottom half
and the third quartile is the median of the top half.
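The three steps can be sketched in Python as follows. The function name `quartiles` and the `sample` data are hypothetical; the code follows the median-of-halves procedure described above, which can differ slightly from the interpolation rules used by some software.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-halves procedure."""
    ordered = sorted(data)            # Step 1: ascending order
    n = len(ordered)
    half = n // 2
    q2 = median(ordered)              # Step 2: the median
    lower = ordered[:half]            # observations below the median location
    # With an odd number of observations, the median itself is left out
    # of both halves.
    upper = ordered[half:] if n % 2 == 0 else ordered[half + 1:]
    return median(lower), q2, median(upper)   # Step 3

# Hypothetical data set with an odd number of observations.
sample = [7, 1, 9, 3, 12, 5, 10]
q1, q2, q3 = quartiles(sample)
print(q1, q2, q3)
```

For `sample`, the ordered data are 1, 3, 5, 7, 9, 10, 12, so the bottom half is 1, 3, 5 and the top half is 9, 10, 12.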
Definition
The interquartile range, denoted IQR, is the range of the middle 50% of the
observations in a data set. That is, the IQR is the difference between the first and third
quartiles and is found using the formula
$\text{IQR} = Q_3 - Q_1$
Example 3
The following data represent the carbon dioxide emissions per capita (total carbon
dioxide emissions, in tons, divided by total population) for the countries of Western
Europe in 2004. Determine the quartiles and the interquartile range. (p. 174, Ex. 24)
2.34 1.34 3.86 1.40 2.08 2.39 2.68
2.64 3.44 2.07 1.67 1.61 6.81 2.21
2.09 1.64 2.87 2.38 1.47 1.53
1.44 3.65 2.12 5.22 2.67 1.01
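A worked sketch for this data set, following the quartile steps above. In ascending order the 26 observations are 1.01, 1.34, 1.40, 1.44, 1.47, 1.53, 1.61, 1.64, 1.67, 2.07, 2.08, 2.09, 2.12, 2.21, 2.34, 2.38, 2.39, 2.64, 2.67, 2.68, 2.87, 3.44, 3.65, 3.86, 5.22, 6.81. The median falls between the 13th and 14th values, so each half contains 13 observations.

```latex
\begin{align*}
M = Q_2 &= \frac{2.12 + 2.21}{2} = 2.165 \\
Q_1 &= 1.61 \quad \text{(7th value of the bottom half)} \\
Q_3 &= 2.68 \quad \text{(7th value of the top half)} \\
\text{IQR} &= Q_3 - Q_1 = 2.68 - 1.61 = 1.07
\end{align*}
```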
Checking for Outliers by Using Quartiles
Step 1. Determine the first and third quartiles of the data.
Step 2. Compute the interquartile range.
Step 3. Determine the fences. Fences serve as cutoff points for determining outliers.
Lower fence = $Q_1 - 1.5(\text{IQR})$
Upper fence = $Q_3 + 1.5(\text{IQR})$
Step 4. If a data value is less than the lower fence or greater than the upper fence, it is
considered an outlier.
Example 3 (continued)
The following data represent the carbon dioxide emissions per capita (total carbon
dioxide emissions, in tons, divided by total population) for the countries of Western
Europe in 2004. Determine the fences and use the fences to determine if any of the
observations are outliers. (p. 174, Ex. 24)
2.34 1.34 3.86 1.40 2.08 2.39 2.68
2.64 3.44 2.07 1.67 1.61 6.81 2.21
2.09 1.64 2.87 2.38 1.47 1.53
1.44 3.65 2.12 5.22 2.67 1.01
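The four outlier-checking steps can be sketched in Python for this data set. Variable names are illustrative, and the quartiles are taken as the medians of the bottom and top halves of the ordered data, as in the quartile procedure of this section.

```python
from statistics import median

# Carbon dioxide emissions per capita from Example 3 (26 observations).
emissions = [
    2.34, 1.34, 3.86, 1.40, 2.08, 2.39, 2.68,
    2.64, 3.44, 2.07, 1.67, 1.61, 6.81, 2.21,
    2.09, 1.64, 2.87, 2.38, 1.47, 1.53,
    1.44, 3.65, 2.12, 5.22, 2.67, 1.01,
]

ordered = sorted(emissions)
half = len(ordered) // 2

# Step 1: quartiles as medians of the bottom and top halves.
q1 = median(ordered[:half])
q3 = median(ordered[-half:])

# Step 2: interquartile range.
iqr = q3 - q1

# Step 3: fences.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Step 4: values outside the fences are considered outliers.
outliers = [x for x in ordered if x < lower_fence or x > upper_fence]

print(round(lower_fence, 3), round(upper_fence, 3))
print(outliers)  # [5.22, 6.81]
```

With these fences (about 0.005 and 4.285), only the two largest observations, 5.22 and 6.81, are flagged; the smallest value, 1.01, lies above the lower fence.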