Sample Median and Other Quantiles

Sample Median

Definition: sample median

The sample median is the center of the ordered array.

1. Order the sample from smallest to largest. A stem-and-leaf plot is good for ordering.
2. Median location.
   a. If the sample size is odd, then the median is the middle observation.
   b. If the sample size is even, then the median is the average of the two middle observations.

Example 1 (odd sample size)

Consider the sample 4, 1, 1, 2, 6.

The ordered array is 1, 1, 2, 4, 6.

The sample size is 5, which is odd, so the median is the middle value, which is 2. I.e.,

(sample median) = 2.

Example 2 (even sample size)

Consider the sample 4, 1, 1, 2, 6, 8.

Document1 1 2/8/2016

The ordered array is 1, 1, 2, 4, 6, 8.

The sample size is 6, which is even, so the median is the average of the two middle values. The two middle values are 2 and 4, so the median is

(sample median) = (2 + 4) / 2 = 6 / 2 = 3.

Large Samples

For larger samples it is convenient to be more technical about defining the median location. We therefore elaborate on the definition given above.

Definition: sample median

The sample median is the center of the ordered array.

1. Order the sample from smallest to largest. A stem-and-leaf plot is good for ordering.
2. Median location. The median location for a sample of size n is defined by

   (median location) = (0.5)(n + 1)

   a. If the sample size, n, is odd, then the median location is an integer, and the median is the (0.5)(n + 1)-th observation in the ordered array, which is the middle observation.
   b. If the sample size is even, then the median location is a fraction between two integers, say a and a + 1. The median is then the average of the a-th and (a + 1)-th observations in the ordered array, which is the average of the two middle observations.

Note.
There are at least four alternative definitions of the sample median (and other quantiles) that give slightly different answers, but this definition is sufficient for "hand" calculation. Statistical software such as SAS® and JMP® provides these alternatives. One such alternative is given below.

Example 1 (odd sample size)

Consider the sample 4, 1, 1, 2, 6.

The ordered array is 1, 1, 2, 4, 6.

The sample size is 5 (which is odd), and the median location is

(median location) = (0.5)(5 + 1) = (0.5)(6) = 3.

Therefore, the median is the 3-rd value in the ordered array, which is, of course, the middle value. I.e.,

(sample median) = 2.

Example 2 (even sample size)

Consider the sample 4, 1, 1, 2, 6, 8.

The ordered array is 1, 1, 2, 4, 6, 8.

n = 6

(median location) = (0.5)(6 + 1) = (0.5)(7) = 3.5

So the median is the average of the 3-rd and 4-th observations in the ordered array, which are 2 and 4.

(sample median) = (2 + 4) / 2 = 6 / 2 = 3

Common Errors

1. Forgetting to order the sample.
2. Reporting the median location as the median.

Quartiles and the Five Number Summary

Note: This is the most elementary definition of the quartiles, as given by Baldi and Moore, for use in Stat 3615. For Stat 5674, a more complicated definition after Daniel (2009) is given below. These two methods can produce slightly different results.

There are three quartiles, Q1, Q2, and Q3, which divide the ordered array into four quarters of nearly equal numbers of observations.

The second quartile, Q2, is simply the median. It divides the ordered array into two halves, each half having the same number of observations.

The first quartile, Q1, is the median of the lower half of the ordered array.

The third quartile, Q3, is the median of the upper half of the ordered array.

The five number summary consists of the sample

Min, Q1, Q2, Q3, Max

I.e.,

Min, Q1, Median, Q3, Max

Example 1 (odd sample size)

Consider the sample 4, 1, 1, 2, 6.
The ordered array is 1, 1, 2, 4, 6.

The sample size is 5, which is odd, so the median is the middle observation of the ordered array, i.e., the 3-rd value. I.e.,

(sample median) = 2.

First Quartile, Q1. To find the first quartile, Q1, we find the median of the lower half of the ordered array. The lower half of the ordered array is

1, 1

Because the lower half has an even number of observations, the first quartile, Q1, being the median of the lower half, is the average of the two middle values:

Q1 = (1 + 1)/2 = 1

Third Quartile, Q3. To find the third quartile, Q3, we find the median of the upper half of the ordered array. The upper half of the ordered array is

4, 6

Because the upper half has an even number of observations, the third quartile, Q3, being the median of the upper half, is the average of the two middle values:

Q3 = (4 + 6)/2 = 5

The minimum is Min = 1 and the maximum is Max = 6, so the five number summary is

1, 1, 2, 5, 6

Example 2 (even sample size)

Consider the sample 4, 1, 1, 2, 6, 8.

The ordered array is 1, 1, 2, 4, 6, 8.

n = 6, even

So the median is the average of the 3-rd and 4-th observations in the ordered array, which are 2 and 4.

(sample median) = (2 + 4) / 2 = 6 / 2 = 3.

Lower half: 1, 1, 2
Q1 = 1

Upper half: 4, 6, 8
Q3 = 6

Five number summary

1, 1, 3, 6, 8

Example 3

Consider the sample 4, 1, 1, 2, 6, 8, 8, 5, 7, 3, 0, -3, -2, -2, 0.

The ordered array is -3, -2, -2, 0, 0, 1, 1, 2, 3, 4, 5, 6, 7, 8, 8.

n = 15, odd

Median = Q2 = 2

Lower half: -3, -2, -2, 0, 0, 1, 1 (odd number of observations)
Q1 = 0

Upper half: 3, 4, 5, 6, 7, 8, 8 (odd number of observations)
Q3 = 6

Five number summary

-3, 0, 2, 6, 8

Note

The so-called lower half and upper half are not exactly halves for an odd-sized sample. For an odd-sized sample, there are (n − 1)/2 observations in each half. For an even-sized sample, there are n/2 observations in each half.
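The elementary procedure above (order the sample, take the median, then take the medians of the lower and upper halves) can be sketched in Python as follows. This is only an illustrative sketch: the function names median and five_number_summary are hypothetical, not taken from any textbook or package, and the code assumes a sample of at least two observations.

```python
def median(ordered):
    """Median of an already-ordered list, i.e. the value at location 0.5 * (n + 1)."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # odd n: the middle observation
    return (ordered[mid - 1] + ordered[mid]) / 2   # even n: average of the two middle observations


def five_number_summary(sample):
    """Min, Q1, median, Q3, Max by the elementary (median-of-halves) method.

    Assumes at least two observations, so that both halves are non-empty.
    """
    ordered = sorted(sample)      # step 1: order the sample
    n = len(ordered)
    half = n // 2                 # (n - 1)/2 when n is odd, n/2 when n is even
    lower = ordered[:half]        # leaves out the median itself when n is odd
    upper = ordered[n - half:]
    return (ordered[0], median(lower), median(ordered), median(upper), ordered[-1])


print(five_number_summary([4, 1, 1, 2, 6]))      # Example 1: (1, 1.0, 2, 5.0, 6)
print(five_number_summary([4, 1, 1, 2, 6, 8]))   # Example 2: (1, 1, 3.0, 6, 8)
```

Sorting inside the function guards against the first common error (forgetting to order the sample), and returning the observation rather than its position guards against the second.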
For the purpose of dividing the ordered array into lower and upper halves, the median itself is not included.

The first quartile is also known as the 25th percentile and as the 0.25 quantile.

The median or 2nd quartile is also known as the 50th percentile and as the 0.50 quantile.

The third quartile is also known as the 75th percentile and as the 0.75 quantile.

Quantiles (for advanced classes, e.g., Stat 5605, 5506, and 5674)

This is the “weighted-average” method of computing quantiles, as in Daniel (2009) and in JMP.

Definition: q-th quantile location, 0 < q < 1

The q-th quantile location is

Lq = (q)(n + 1)

Definition: q-th quantile, 0 ≤ q ≤ 1

For q = 0, the 0.0 quantile, denoted by Q0.00, is the first observation in the ordered array, the sample minimum.

For q = 1, the 1.0 quantile, denoted by Q1.00, is the last observation in the ordered array, the sample maximum.

For 0 < q < 1, there are two cases:

1. If Lq is an integer, then the q-th quantile, denoted by Qq, is the Lq-th observation in the ordered array.
2. If Lq is not an integer, then the q-th quantile is the weighted average of the [Lq]-th and ([Lq] + 1)-th observations in the ordered array, where [x] denotes the greatest integer ≤ x. Let w denote the fractional part of Lq, and let a and b denote the [Lq]-th and ([Lq] + 1)-th observations in the ordered array, respectively. Then

   Qq = (1 − w)(a) + (w)(b) = a + (w)(b − a)

Note that Qq = a + (w)(b − a) is the definition given by Daniel (2009) in Example 2.5.5.

Note. As mentioned earlier, there are at least four alternative definitions of the sample median (and other quantiles) that give slightly different answers. Statistical software such as SAS® and JMP® provides these alternatives.

Example 2 (even sample size) using the weighted-average method

We repeat Example 2 to show the different results.
Consider the sample 4, 1, 1, 2, 6, 8.

The ordered array is 1, 1, 2, 4, 6, 8.

n = 6

Using the weighted-average method to compute the median location, we get

L0.50 = 0.5(n + 1) = 0.5(6 + 1) = 3.5

Therefore the median is the weighted average of the 3rd and the 4th observations in the ordered array, which are 2 and 4, and the weight is

w = 0.5

So the median is

Q0.50 = a + (w)(b – a) = 2 + (0.5)(4 – 2) = 3

the same as the elementary method.

To calculate the 1st quartile, which is the 0.25 quantile, we get the quantile location

L0.25 = 0.25(n + 1) = 0.25(6 + 1) = 1.75

Therefore the 1st quartile is the weighted average of the 1st and 2nd observations in the ordered array, which are 1 and 1, and the weight is

w = 0.75

So the 1st quartile is

Q0.25 = a + (w)(b – a) = 1 + (0.75)(1 – 1) = 1

the same as the elementary method. However, if the 1st and 2nd observations had been unequal, then the elementary method would have given the 2nd observation, and the weighted average would have given a lower number.

To calculate the 3rd quartile, which is the 0.75 quantile, we get the quantile location

L0.75 = 0.75(n + 1) = 0.75(6 + 1) = 5.25

Therefore the 3rd quartile is the weighted average of the 5th and 6th observations in the ordered array, which are 6 and 8, and the weight is

w = 0.25

So the 3rd quartile is

Q0.75 = a + (w)(b – a) = 6 + (0.25)(8 – 6) = 6.5

which is different from the value of 6.0 from the elementary method.

Five number summary

1.0, 1.0, 3.0, 6.5, 8.0

Example 3

Consider the sample 4, 1, 1, 2, 6, 8, 8, 5, 7, 3, 0, −3, −2, −2, 0.

The ordered array is −3, −2, −2, 0, 0, 1, 1, 2, 3, 4, 5, 6, 7, 8, 8.

n = 15

Let Qq denote the q-th quantile.

Q0.00 = (sample minimum) = −3

The 0.05 quantile, also known as the 5-th percentile, is found as follows.

L0.05 = (0.05)(15 + 1) = (0.05)(16) = 0.8

Because the 0.05 quantile location is less than 1, the 0.05 quantile is simply the sample minimum.

Q0.05 = −3.
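The weighted-average rule, including the boundary case just shown where the quantile location falls below 1, can be sketched in Python as follows. The function name quantile is hypothetical. Returning the sample maximum when the location exceeds n is an assumption made here by symmetry with the minimum case; it agrees with the JMP output listed later in this handout, but it is not stated in the definition.

```python
import math


def quantile(sample, q):
    """q-th quantile by the weighted-average rule with location L = q * (n + 1)."""
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    ordered = sorted(sample)          # order the sample first
    n = len(ordered)
    loc = q * (n + 1)                 # quantile location L_q
    if loc <= 1:
        return ordered[0]             # location at or below 1: sample minimum
    if loc >= n:
        return ordered[-1]            # assumption: location at or above n gives the maximum
    k = math.floor(loc)               # [L_q], the greatest integer <= L_q
    w = loc - k                       # fractional part of L_q (0 when L_q is an integer)
    a = ordered[k - 1]                # [L_q]-th observation (lists are 0-indexed)
    b = ordered[k]                    # ([L_q] + 1)-th observation
    return a + w * (b - a)


example2 = [4, 1, 1, 2, 6, 8]
print([quantile(example2, q) for q in (0.25, 0.50, 0.75)])   # [1.0, 3.0, 6.5]
```

When the location is an integer the weight w is 0, so the same formula returns the L-th observation and no separate branch is needed for that case.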
The 0.10 quantile, also known as the 10-th percentile, is found as follows.

L0.10 = (0.10)(15 + 1) = (0.10)(16) = 1.6

Therefore the 0.10 quantile is a weighted average of the 1st and 2nd observations in the ordered array, which are a = −3 and b = −2. The weight, w, is the fractional part of 1.6, namely

w = 0.6

Q0.10 = a + (w)(b − a) = (−3) + (0.6)[(−2) − (−3)] = (−3) + (0.6)(1) = (−3) + 0.6 = −2.4.

This is the value given by default in JMP.

The 0.25 quantile, also known as the 25-th percentile, also known as the first quartile, is found as follows.

L0.25 = (0.25)(15 + 1) = (0.25)(16) = 4.0, so Q0.25 is the 4-th observation in the ordered array: Q0.25 = 0.

The 0.50 quantile, also known as the 50-th percentile, also known as the second quartile, also known as the sample median, is found as follows.

L0.50 = (0.50)(15 + 1) = (0.50)(16) = 8.0, so Q0.50 is the 8-th observation in the ordered array: Q0.50 = 2.0.

Likewise, the 0.75 quantile = 75-th percentile = third quartile = the 12-th observation in the ordered array = 6.0.

Exercises

1. Find the 90-th percentile of the sample of Example 3, above.
2. Find the 0.95 quantile of the sample of Example 3, above.
3. Find the 100-th percentile, i.e., the maximum, for the sample of Example 3, above.
4. Find the first quartile of the sample of Example 1, above.
5. Find the third quartile of the sample of Example 2, above.

Percentiles calculated in JMP

100.0%  maximum    8
 99.5%             8
 97.5%             8
 90.0%             8
 75.0%  quartile   6
 50.0%  median     2
 25.0%  quartile   0
 10.0%            -2.4
  2.5%            -3
  0.5%            -3
  0.0%  minimum   -3

Common Errors

1. Forgetting to order the sample.
2. Reporting the quantile location as the quantile.
3. A mistake of sign (i.e., plus or minus).
4. A factor of 2.

To avoid errors

1. Write out each arithmetic step.
2. Check your answer by seeing if it makes sense in a graph of the data.

Example 4 = Example 2.5.5 of Daniel (2011)

The ordered array of the sample of Daniel (2009) Table 2.5.1 is shown below.

n = 20

The five number summary is found as follows.
Rank   Ordered Array
  1    14.6
  2    24.3
  3    24.9
  4    27.0
  5    27.2
  6    27.4
  7    28.2
  8    28.8
  9    29.9
 10    30.7
 11    31.5
 12    31.6
 13    32.3
 14    32.8
 15    33.3
 16    33.6
 17    34.3
 18    36.9
 19    38.3
 20    44.0

The sample minimum is

Q0.00 = 14.6

The first quartile is the 25th percentile and the 0.25 quantile, with quantile location

L0.25 = 0.25(n + 1) = 0.25(20 + 1) = 5.25

So the first quartile is the weighted average of the 5th and 6th observations in the ordered array, which are 27.2 and 27.4:

Q0.25 = 27.2 + 0.25(27.4 – 27.2) = 27.2 + 0.25(0.2) = 27.25

The second quartile is the median, the 50th percentile, and the 0.50 quantile, with quantile location

L0.50 = 0.50(n + 1) = 0.50(20 + 1) = 10.5

So the sample median is the weighted average of the 10th and 11th observations in the ordered array, which are 30.7 and 31.5:

Q0.50 = 30.7 + 0.5(31.5 – 30.7) = 30.7 + 0.5(0.8) = 31.1

The third quartile is the 75th percentile and the 0.75 quantile, with quantile location

L0.75 = 0.75(n + 1) = 0.75(20 + 1) = 15.75

So the 3rd quartile is the weighted average of the 15th and 16th observations in the ordered array, which are 33.3 and 33.6:

Q0.75 = 33.3 + 0.75(33.6 – 33.3) = 33.3 + 0.75(0.3) = 33.525

The sample maximum is

Q1.00 = 44.0

Thus, the five-number summary is

sample minimum   Q0.00 = 14.600
first quartile   Q0.25 = 27.250
median           Q0.50 = 31.100
third quartile   Q0.75 = 33.525
sample maximum   Q1.00 = 44.000

Note that if I were presenting these data I would round to 3 significant digits:

sample minimum   Q0.00 = 14.6
first quartile   Q0.25 = 27.3
median           Q0.50 = 31.1
third quartile   Q0.75 = 33.5
sample maximum   Q1.00 = 44.0

Thus, the sample range is

(sample range) = (max) – (min) = 44.0 – 14.6 = 29.4

The sample inter-quartile range is

IQR = Q0.75 – Q0.25 = 33.525 – 27.25 = 6.275

For further illustration, we can calculate the 10th and 90th percentiles.
The 10th percentile is the 0.10 quantile, with

L0.10 = 0.10(20 + 1) = 2.1

Q0.10 = 24.3 + (0.1)(24.9 – 24.3) = 24.3 + 0.06 = 24.36

The 90th percentile is the 0.90 quantile, with

L0.90 = 0.90(20 + 1) = 18.9

Q0.90 = 36.9 + (0.9)(38.3 – 36.9) = 36.9 + 1.26 = 38.16

Quantiles

100.0%  maximum    44
 99.5%             44
 97.5%             44
 90.0%             38.16
 75.0%  quartile   33.525
 50.0%  median     31.1
 25.0%  quartile   27.25
 10.0%             24.36
  2.5%             14.6
  0.5%             14.6
  0.0%  minimum    14.6

JMP Quantiles from JMP > Analyze > Distribution
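As a cross-check on the hand calculations in Example 4, Python's standard library offers statistics.quantiles, whose "exclusive" method interpolates at locations of the form q(n + 1). The sketch below assumes that this method coincides with the weighted-average rule whenever the location lies between 1 and n, which is the case for every quantile computed here. Its behaviour in the extreme tails may differ from the rule used in this handout and is not checked. The variable names are illustrative only.

```python
from statistics import quantiles

# Ordered array from Example 4 (Daniel, Table 2.5.1), n = 20
data = [14.6, 24.3, 24.9, 27.0, 27.2, 27.4, 28.2, 28.8, 29.9, 30.7,
        31.5, 31.6, 32.3, 32.8, 33.3, 33.6, 34.3, 36.9, 38.3, 44.0]

# n=4 returns the three cut points Q0.25, Q0.50, Q0.75
q1, q2, q3 = quantiles(data, n=4, method="exclusive")

# n=10 returns the nine deciles; the first and last are the 10th and 90th percentiles
deciles = quantiles(data, n=10, method="exclusive")
p10, p90 = deciles[0], deciles[-1]

print("five number summary:", min(data), round(q1, 3), round(q2, 3), round(q3, 3), max(data))
print("IQR:", round(q3 - q1, 3))
print("10th and 90th percentiles:", round(p10, 2), round(p90, 2))
```

The rounding is applied only for display, since floating-point arithmetic can leave a tiny residue in the last digits.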