SECTION 3.4 - MEASURES OF POSITION AND OUTLIERS

Definition
The $z$-score represents the distance that a data value is from the mean in terms of the number of standard deviations. It is obtained by subtracting the mean from the data value and dividing this result by the standard deviation. There is both a population $z$-score and a sample $z$-score; their formulas are:

Population $z$-score: $z = \dfrac{x - \mu}{\sigma}$
Sample $z$-score: $z = \dfrac{x - \bar{x}}{s}$

The $z$-score is unitless. It has mean 0 and standard deviation 1.

Example 1
The average 20- to 29-year-old man is 69.6 inches tall, with a standard deviation of 3.0 inches, while the average 20- to 29-year-old woman is 64.1 inches tall, with a standard deviation of 3.8 inches. Who is relatively taller, a 67-inch man or a 62-inch woman? (p. 173, Ex. 12)

Definition
The $k$th percentile, denoted $P_k$, of a set of data is the value such that $k$ percent of the observations are less than or equal to the value.

Example 2
Use the dividend yield data from Example 1 in Section 2.2 to calculate the percentile for a dividend yield of 1.68.

1.7   0     1.15  0.62  1.06  2.45  2.38
2.83  2.16  1.05  1.22  1.68  0.89  0
2.59  0     1.7   0.64  0.67  2.07  0.94
2.04  0     0     1.35  0     0     0.41

Definitions
Quartiles divide the data set into fourths, or four equal parts.
The first quartile, $Q_1$, divides the bottom 25% of the data from the top 75%.
The second quartile, $Q_2$, divides the bottom 50% of the data from the top 50%.
The third quartile, $Q_3$, divides the bottom 75% of the data from the top 25%.

Finding Quartiles
Step 1. Arrange the data in ascending order.
Step 2. Determine the median, $M$, or second quartile, $Q_2$.
Step 3. Determine the first and third quartiles, $Q_1$ and $Q_3$, by dividing the data set into two halves; the bottom half will be the observations below (to the left of) the location of the median and the top half will be the observations above (to the right of) the location of the median. The first quartile is the median of the bottom half and the third quartile is the median of the top half.
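The standardized score defined at the start of this section (data value minus mean, divided by standard deviation) and the three quartile steps above can be sketched in Python as follows. This is only an illustrative sketch: the function names `z_score` and `quartiles` and the list `small` are hypothetical and do not come from the text. It assumes the median-of-halves convention described in Step 3 and a data set with at least two observations.

```python
from statistics import median


def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd


def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-halves steps.

    Assumes len(data) >= 2 so that both halves are non-empty.
    """
    ordered = sorted(data)              # Step 1: ascending order
    n = len(ordered)
    q2 = median(ordered)                # Step 2: median = second quartile
    bottom = ordered[: n // 2]          # observations below the median's location
    top = ordered[(n + 1) // 2:]        # observations above the median's location
    return median(bottom), q2, median(top)  # Step 3


# Example 1 values: compare heights relative to each group's own distribution.
z_man = z_score(67, 69.6, 3.0)
z_woman = z_score(62, 64.1, 3.8)
print(f"z (man)   = {z_man:.2f}")
print(f"z (woman) = {z_woman:.2f}")
print("relatively taller:", "woman" if z_woman > z_man else "man")

# Hypothetical small data set (not from the text) to show the quartile steps.
small = [12, 1, 8, 4, 10, 3, 7]
print(quartiles(small))
```

The larger of the two standardized scores identifies the person who is relatively taller. Note that calculators and software packages may use other quartile conventions, so their results can differ slightly from the median-of-halves method sketched here.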
Definition
The interquartile range, denoted IQR, is the range of the middle 50% of the observations in a data set. That is, the IQR is the difference between the third and first quartiles and is found using the formula

$\text{IQR} = Q_3 - Q_1$

Example 3
The following data represent the carbon dioxide emissions per capita (total carbon dioxide emissions, in tons, divided by total population) for the countries of Western Europe in 2004. Determine the quartiles and the interquartile range. (p. 174, Ex. 24)

2.34  1.34  3.86  1.40  2.08  2.39  2.68  2.64  3.44  2.07  1.67  1.61  6.81
2.21  2.09  1.64  2.87  2.38  1.47  1.53  1.44  3.65  2.12  5.22  2.67  1.01

Checking for Outliers by Using Quartiles
Step 1. Determine the first and third quartiles of the data.
Step 2. Compute the interquartile range.
Step 3. Determine the fences. Fences serve as cutoff points for determining outliers.
    Lower fence $= Q_1 - 1.5(\text{IQR})$
    Upper fence $= Q_3 + 1.5(\text{IQR})$
Step 4. If a data value is less than the lower fence or greater than the upper fence, it is considered an outlier.

Example 3 (continued)
The following data represent the carbon dioxide emissions per capita (total carbon dioxide emissions, in tons, divided by total population) for the countries of Western Europe in 2004. Determine the fences and use the fences to determine if any of the observations are outliers. (p. 174, Ex. 24)

2.34  1.34  3.86  1.40  2.08  2.39  2.68  2.64  3.44  2.07  1.67  1.61  6.81
2.21  2.09  1.64  2.87  2.38  1.47  1.53  1.44  3.65  2.12  5.22  2.67  1.01
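The four outlier-checking steps can be sketched in Python and applied to the carbon dioxide data of Example 3. This is a hedged illustration, not a prescribed method: the function name `quartile_fences`, its parameter `k`, and the dictionary keys are hypothetical, and the quartiles are assumed to be the medians of the bottom and top halves of the ordered data.

```python
from statistics import median


def quartile_fences(data, k=1.5):
    """Quartiles, IQR, fences, and outliers (median-of-halves quartiles)."""
    ordered = sorted(data)
    n = len(ordered)
    q1 = median(ordered[: n // 2])        # Step 1: first quartile
    q3 = median(ordered[(n + 1) // 2:])   # Step 1: third quartile
    iqr = q3 - q1                         # Step 2: interquartile range
    lower_fence = q1 - k * iqr            # Step 3: fences
    upper_fence = q3 + k * iqr
    # Step 4: values outside the fences are considered outliers
    outliers = [x for x in ordered if x < lower_fence or x > upper_fence]
    return {
        "q1": q1,
        "q3": q3,
        "iqr": iqr,
        "lower_fence": lower_fence,
        "upper_fence": upper_fence,
        "outliers": outliers,
    }


# Example 3 data: CO2 emissions per capita, Western Europe, 2004.
co2 = [
    2.34, 1.34, 3.86, 1.40, 2.08, 2.39, 2.68, 2.64, 3.44, 2.07, 1.67, 1.61, 6.81,
    2.21, 2.09, 1.64, 2.87, 2.38, 1.47, 1.53, 1.44, 3.65, 2.12, 5.22, 2.67, 1.01,
]

result = quartile_fences(co2)
for name in ("q1", "q3", "iqr", "lower_fence", "upper_fence"):
    print(f"{name}: {result[name]:.3f}")
print("outliers:", result["outliers"])
```

Under these assumptions the sketch gives a first quartile of 1.61 and a third quartile of 2.68, so the IQR is about 1.07 and the fences are about 0.005 and 4.285; the observations 5.22 and 6.81 lie above the upper fence. A different quartile convention would shift the fences slightly.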