SECTION 3.3: MEASURES OF POSITION OBJECTIVES 1. 2. 3. 4. 5. 6. Compute and interpret ๐ง-scores Compute the quartiles of a data set Compute the percentiles of a data set Compute the five-number summary for a data set Understand the effects of outliers Construct boxplots to visualize the five-number summary and outliers OBJECTIVE 1 COMPUTE AND INTERPRET ๐-SCORES ๐-SCORE Who is taller, a man 73 inches tall or a woman 68 inches tall? The obvious answer is that the man is taller. However, men are taller than women on the average. Suppose the question is asked this way: Who is taller relative to their gender, a man 73 inches tall or a woman 68 inches tall? One way to answer this question is with a ๐ง-score. The ๐ง-score of an individual data value tells how many _____________________________ that value is from its population mean. For example, a value one standard deviation above the mean has a ๐ง-score of ____________ and a value two standard deviations below the mean has a ๐ง-score of ____________. Let ๐ฅ be a value from a population with mean ๐ and standard deviation ๐. The z-score for ๐ฅ is ๐ง = . ____________________ 1 SECTION 3.3: MEASURES OF POSITION E XAMPLE : A National Center for Health Statistics study states that the mean height for adult men in the U.S. is ๐ = 69.4 inches, with a standard deviation of ๐ = 3.1 inches. The mean height for adult women is ๐ = 63.8 inches, with a standard deviation of ๐ = 2.8 inches. Who is taller relative to their gender, a man 73 inches tall, or a woman 68 inches tall? S OLUTION : ๐-SCORES & THE EMPIRICAL RULE Since the ๐ง-score is the number of standard deviations from the mean, we can easily interpret the ๐ง-score for bell-shaped populations using The Empirical Rule. When a population has a histogram that is approximately bell-shaped, then • Approximately 68% of the data will have ๐ง-scores between –1 and 1. • Approximately 95% of the data will have ๐ง-scores between –2 and 2. • All, or almost all of the data will have ๐ง-scores between –3 and 3. 2 SECTION 3.3: MEASURES OF POSITION OBJECTIVE 2 COMPUTE THE QUARTILES OF A DATA SET QUARTILES In a previous section, we learned how to compute the mean and median of a data set as measures of the center. Sometimes, it is useful to compute measures of position other than the center to get a more detailed description of the distribution. Quartiles divide a data set into four approximately equal pieces. Q UARTILES : Every data set has three quartiles: The __________________________, denoted ๐1 separates the lowest __________ of the data from the highest __________. The __________________________, denoted ๐2 separates the lowest __________ of the data from the highest __________. ๐2 is the same as the median. The __________________________, denoted ๐3 separates the lowest __________ of the data from the highest __________. 3 SECTION 3.3: MEASURES OF POSITION COMPUTING QUARTILES There are several methods for computing quartiles, all of which give similar results. The following procedure is one fairly straightforward method: Step 1: Step 2: Arrange the data in increasing order. Let ๐ be the number of values in the data set. To compute the second quartile, simply compute the median. For the first or third quartiles, proceed as follows: For the first quartile, compute ๐ฟ = 0.25๐ For the third quartile, compute ๐ฟ = 0.75๐ If ๐ฟ is a whole number, the quartile is the average of the number in position ๐ฟ and the number in position ๐ฟ + 1. If ๐ฟ is not a whole number, round it up to the next higher whole number. The quartile is the number in the position corresponding to the rounded-up value. Step 3: E XAMPLE : 0.00 1.64 4.37 The following table presents the annual rainfall, in inches, in Los Angeles during the month of February from 1969 to 2013. Compute the quartiles for the data. 0.08 1.72 4.64 0.13 1.90 4.89 0.14 2.37 4.94 0.16 2.58 5.54 0.17 2.84 6.10 0.20 3.06 6.61 0.29 3.12 7.89 0.56 3.21 7.96 0.67 3.29 8.03 0.70 3.54 8.87 0.92 3.57 8.91 1.22 3.71 11.02 1.30 4.13 12.75 1.48 4.27 13.68 S OLUTION : 4 SECTION 3.3: MEASURES OF POSITION QUARTILES ON THE TI-84 PLUS The 1-Var Stats command in the TI-84 PLUS Calculator displays a list of the most common parameters and statistics for a given data set. This command is accessed by pressing STAT and then highlighting the CALC menu. EXAMPLE: QUARTILES ON THE TI-84 PLUS Step 1: Enter the data in L1. Step 2: Press STAT and highlight the CALC menu. Step 3: Select 1-Var Stats and press ENTER. Enter L1 in the List field and run the command. Note: If your calculator does not support Stat Wizards, enter L1 next to the 1-Var Stats command on the home screen and press enter to run the command The quartile values produced by the TI-84 PLUS may differ from results obtained by hand because it uses a slightly different procedure than the one described in the text. VISUALIZING THE QUARTILES Following is a dotplot of the Los Angeles rainfall data with the quartiles indicated. The quartiles divide the data set into four parts, with approximately 25% of the data in each part. 5 SECTION 3.3: MEASURES OF POSITION OBJECTIVE 3 COMPUTE THE PERCENTILES OF A DATA SET Quartiles describe the shape of a distribution by dividing it into fourths. Sometimes it is useful to divide a data set into a greater number of pieces to get a more detailed description of the distribution. ________________________ divide a data set into hundredths. P ERCENTILES : For a number p between 1 and 99, the pth percentile separates the lowest p% of the data from the highest (1– p)%. COMPUTING PERCENTILES The following procedure computes the pth percentile of a data set: Step 1: Arrange the data in increasing order. Step 2: Let ๐ be the number of values in the data set. For the pth percentile, compute ๐ Step 3: ๐ฟ = (100) ๐. If ๐ฟ is a whole number, the pth percentile is the average of the number in position ๐ฟ and the number in position ๐ฟ + 1. If ๐ฟ is not a whole number, round it up to the next higher whole number. The pth percentile is the number in the position corresponding to the rounded-up value. 6 SECTION 3.3: MEASURES OF POSITION E XAMPLE : 0.00 1.64 4.37 The following table presents the annual rainfall, in inches, in Los Angeles during the month of February from 1969 to 2013. Compute the 60 th percentile for the data. 0.08 1.72 4.64 0.13 1.90 4.89 0.14 2.37 4.94 0.16 2.58 5.54 0.17 2.84 6.10 0.20 3.06 6.61 0.29 3.12 7.89 0.56 3.21 7.96 0.67 3.29 8.03 0.70 3.54 8.87 0.92 3.57 8.91 1.22 3.71 11.02 1.30 4.13 12.75 1.48 4.27 13.68 S OLUTION : COMPUTING A PERCENTILE FROM A GIVEN DATA VALUE Sometimes we are given a value from a data set and wish to compute the percentile corresponding to that value. Following is the procedure for doing this: Step 1: Arrange the data in increasing order. Step 2: Let ๐ฅ be the data value whose percentile is to be computed. Use the following formula to compute the percentile: Percentile = 100 โ (๐๐ข๐๐๐๐ ๐๐ ๐ฃ๐๐๐ข๐๐ ๐๐๐ ๐ ๐กโ๐๐ ๐ฅ)+0.5 ๐๐ข๐๐๐๐ ๐๐ ๐ฃ๐๐๐ข๐๐ ๐๐ ๐กโ๐ ๐๐๐ก๐ ๐ ๐๐ก Round the result to the nearest whole number. This is the percentile corresponding to the value ๐ฅ. E XAMPLE : 0.00 1.64 4.37 The following table presents the annual rainfall in Los Angeles during February from 1969 to 2013. In 1989, the rainfall was 1.90. What percentile does this correspond to? 0.08 1.72 4.64 0.13 1.90 4.89 0.14 2.37 4.94 0.16 2.58 5.54 0.17 2.84 6.10 0.20 3.06 6.61 0.29 3.12 7.89 0.56 3.21 7.96 0.67 3.29 8.03 0.70 3.54 8.87 0.92 3.57 8.91 1.22 3.71 11.02 1.30 4.13 12.75 1.48 4.27 13.68 S OLUTION : 7 SECTION 3.3: MEASURES OF POSITION OBJECTIVE 4 COMPUTE THE FIVE-NUMBER SUMMARY FOR A DATA SET The ___________________________________________ of a data set consists of the median, the first quartile, the third quartile, the smallest value, and the largest value. These values are generally arranged in order. E XAMPLE : 0.00 1.64 4.37 Recall the Los Angeles annual rainfall data. Compute the five-number summary. 0.08 1.72 4.64 0.13 1.90 4.89 0.14 2.37 4.94 0.16 2.58 5.54 0.17 2.84 6.10 0.20 3.06 6.61 0.29 3.12 7.89 0.56 3.21 7.96 0.67 3.29 8.03 0.70 3.54 8.87 0.92 3.57 8.91 1.22 3.71 11.02 1.30 4.13 12.75 1.48 4.27 13.68 S OLUTION : When using the TI-84 PLUS Calculator, the five-number summary is given by the 1-Var Stats command. 8 SECTION 3.3: MEASURES OF POSITION OBJECTIVE 5 UNDERSTAND THE EFFECTS OF OUTLIERS An _______________ is a value that is considerably larger or considerably smaller than most of the values in a data set. Some outliers result from errors; for example a misplaced decimal point may cause a number to be much larger or smaller than the other values in a data set. Some outliers are correct values, and simply reflect the fact that the population contains some extreme values. E XAMPLE : The temperature in a downtown location is measured for eight consecutive days during the summer. The readings, in Fahrenheit, are 81.2 85.6 89.3 91.0 83.2 8.45 79.5 87.8 Which reading is an outlier? Is the outlier an error or is it possible that it is correct? S OLUTION : INTERQUARTILE RANGE One method for detecting outliers involves a measure called the Interquartile Range. I NTERQUARTILE R ANGE The interquartile range is found by subtracting the __________ quartile from the __________ quartile. IQR = __________________________ 9 SECTION 3.3: MEASURES OF POSITION IQR METHOD FOR DETECTING OUTLIERS The most frequent method used to detect outliers in a data set is the IQR Method. The procedure for the IQR Method is: Step 1: Find the first quartile ๐1, and the third quartile ๐3 . Step 2: Compute the interquartile range: IQR = ๐3 − ๐1. Step 3: Compute the outlier boundaries. These boundaries are the cutoff points for determining outliers: Lower Outlier Boundary = ๐1 – 1.5(IQR) Upper Outlier Boundary = ๐3 + 1.5(IQR) Step 4: Any data value that is less than the lower outlier boundary or greater than the upper outlier boundary is considered to be an outlier. E XAMPLE : The following table presents the number of students absent in a middle school in northwestern Montana for each school day in January. Identify any outliers. 65 67 71 57 51 49 44 41 59 49 42 56 45 77 44 42 45 46 100 59 53 51 S OLUTION : 10 SECTION 3.3: MEASURES OF POSITION OBJECTIVE 6 CONSTRUCT BOXPLOTS TO VISUALIZE THE FIVE-NUMBER SUMMARY AND OUTLIERS A _____________________ is a graph that presents the five-number summary along with some additional information about a data set. There are several different kinds of boxplots. The one we describe here is sometimes called a __________________________________. E XAMPLE : The following table presents the number of students absent in a middle school in northwestern Montana for each school day in January. Identify any outliers. 65 67 71 57 51 49 44 41 59 49 42 56 45 77 44 42 45 46 100 59 53 51 S OLUTION : 11 SECTION 3.3: MEASURES OF POSITION BOXPLOTS ON THE TI-84 PLUS The following steps will create a boxplot for the student absences data on the TI-84 PLUS. Step 1: Enter the data in L1. Step 2: Press 2nd,Y=, then 1 to access the Plot1 menu. Select On and the boxplot type. Step 3: Press Zoom, 9 to view the plot. 12 SECTION 3.3: MEASURES OF POSITION DETERMINING THE SHAPE OF A DATA SET FROM A BOXPLOT Boxplots can be used to determine skewness in a data set. If the median is closer to the first quartile than to the third quartile, or the upper whisker is longer than the lower whisker, the data are skewed to the right. If the median is closer to the third quartile than to the first quartile, or the lower whisker is longer than the upper whisker, the data are skewed to the left. If the median is approximately halfway between the first and third quartiles, and the two whiskers are approximately equal in length, the data are approximately symmetric 13 SECTION 3.3: MEASURES OF POSITION YOU SHOULD KNOW … ๏ท How to compute and interpret ๐ง-scores ๏ท How to compute the quartiles of a data set ๏ท How to compute a percentile of a data set ๏ท How to compute the percentile corresponding to a given data value ๏ท How to find the five-number summary for a data set ๏ท How to determine outliers using the IQR method ๏ท How to construct a boxplot and use it to determine skewness 14