Sample Median and Other Quantiles

Sample Median
Definition: sample median
The sample median is the center of the ordered array.
1. Order the sample from smallest to largest. A stem-and-leaf plot is good for
ordering.
2. Median location.
a. If the sample size is odd, then the median is the middle observation.
b. If the sample size is even, then the median is the average of the two
middle observations.
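The two-step rule above translates directly into code. A minimal Python sketch (the name sample_median is hypothetical, chosen only for illustration):

```python
def sample_median(sample):
    """Sample median by the elementary rule (hypothetical helper name)."""
    ordered = sorted(sample)            # step 1: order from smallest to largest
    n = len(ordered)
    if n == 0:
        raise ValueError("sample must be non-empty")
    mid = n // 2
    if n % 2 == 1:                      # odd n: the middle observation
        return ordered[mid]
    # even n: the average of the two middle observations
    return (ordered[mid - 1] + ordered[mid]) / 2


print(sample_median([4, 1, 1, 2, 6]))     # Example 1, prints 2
print(sample_median([4, 1, 1, 2, 6, 8]))  # Example 2, prints 3.0
```

Note that the call to sorted() is step 1; skipping it is the first of the common errors listed below.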
Example 1 (odd sample size)
Consider the sample
4, 1, 1, 2, 6.
The ordered array is
1, 1, 2, 4, 6.
The sample size is 5, which is odd, so the median is the middle value, which is 2. I.e.,
(sample median) = 2.
Example 2 (even sample size)
Consider the sample
4, 1, 1, 2, 6, 8.
Document1
1
2/8/2016
The ordered array is
1, 1, 2, 4, 6, 8
The sample size is 6, which is even, so the median is the average of the two middle
values.
The two middle values are 2 and 4, so the median is
(sample median) = (2 + 4) / 2 = 6 / 2 = 3.
Large Samples
For larger samples it is convenient to be more technical about defining the median
location. We therefore elaborate on the definition given above.
Definition: sample median
The sample median is the center of the ordered array.
1. Order the sample from smallest to largest. A stem-and-leaf plot is good for
ordering.
2. Median location. The median location for a sample of size n is defined by
(median location) = (0.5)(n + 1)
a. If the sample size, n, is odd, then the median location is an integer, and the
median is the (0.5)(n + 1)-th observation in the ordered array, which
is the middle observation.
b. If the sample size is even, then the median location is a fraction between
two integers, say a and a + 1. The median is then the average of the a-th and (a + 1)-th observations in the ordered array, which is the average of
the two middle observations.
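The location-based rule can be sketched the same way. This is a minimal illustration with hypothetical function names; observation numbers are 1-based as in the text, while Python lists are 0-based, hence the "- 1" when indexing.

```python
import math


def median_location(n):
    """Median location (0.5)(n + 1) for a sample of size n."""
    return 0.5 * (n + 1)


def median_by_location(sample):
    """Median found from its location in the ordered array (hypothetical helper)."""
    ordered = sorted(sample)
    if not ordered:
        raise ValueError("sample must be non-empty")
    loc = median_location(len(ordered))
    a = math.floor(loc)                 # integer part of the location
    if loc == a:                        # odd n: the location is an integer
        return ordered[a - 1]           # the a-th observation (lists are 0-based)
    # even n: the location lies between a and a + 1, so average those observations
    return (ordered[a - 1] + ordered[a]) / 2


print(median_location(5), median_by_location([4, 1, 1, 2, 6]))     # 3.0 2
print(median_location(6), median_by_location([4, 1, 1, 2, 6, 8]))  # 3.5 3.0
```

The second value printed on each line is the median; the first is only its location, which should not be reported as the median.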
Note. There are actually at least four alternative definitions of the sample median (and
other quantiles) that give slightly different answers, but this definition is sufficient for
"hand" calculation. Statistical software such as SAS® and JMP® provide these
alternatives. One such alternative is given below.
Example 1 (odd sample size)
Consider the sample
4, 1, 1, 2, 6.
The ordered array is
1, 1, 2, 4, 6.
The sample size is 5 (which is odd), and the median location is
(median location) = (0.5)(5 + 1) = (0.5)(6) = 3.
Therefore, the median is the 3rd value in the ordered array, which is, of course, the
middle value. I.e.,
(sample median) = 2.
Example 2 (even sample size)
Consider the sample
4, 1, 1, 2, 6, 8.
The ordered array is
1, 1, 2, 4, 6, 8
n = 6
(median location) = (0.5)(6 + 1) = (0.5)(7) = 3.5
So the median is the average of the 3rd and 4th observations in the ordered array, which
are 2 and 4.
(sample median) = (2 + 4) / 2 = 6 / 2 = 3
Common Errors
1. Forgetting to order the sample.
2. Reporting the median location as the median.
Quartiles and the Five Number Summary
Note: This is the most elementary definition of the quartiles, as given by Baldi and
Moore, for use in Stat 3615. For Stat 5674, a more complicated definition after Daniel
(2009) is given below. These two methods can produce slightly different results.
There are three quartiles, Q1, Q2, and Q3, which divide the ordered array into four
quarters of nearly equal numbers of observations.
The second quartile, Q2, is simply the median. It divides the ordered array into two
halves, each half having the same number of observations.
The first quartile, Q1, is the median of the lower half of the ordered array.
The third quartile, Q3, is the median of the upper half of the ordered array.
The five number summary consists of the sample
Min, Q1, Q2, Q3, Max
I.e.,
Min, Q1, Median, Q3, Max
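A minimal Python sketch of these elementary quartiles and the five number summary. It assumes, as the Note on halves later in this section states, that for an odd sample size the median is left out of both halves; the function names are hypothetical.

```python
def median_of_ordered(values):
    """Median of a list that is already in ascending order."""
    n = len(values)
    mid = n // 2
    if n % 2 == 1:
        return values[mid]
    return (values[mid - 1] + values[mid]) / 2


def five_number_summary(sample):
    """Min, Q1, median, Q3, max; Q1 and Q3 are medians of the lower and upper halves."""
    ordered = sorted(sample)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2                       # (n - 1)/2 when n is odd, n/2 when n is even
    lower = ordered[:half]              # the smallest `half` observations
    upper = ordered[n - half:]          # the largest `half`; the median is skipped when n is odd
    return (ordered[0], median_of_ordered(lower), median_of_ordered(ordered),
            median_of_ordered(upper), ordered[-1])


print(five_number_summary([4, 1, 1, 2, 6]))     # Example 1: (1, 1.0, 2, 5.0, 6)
print(five_number_summary([4, 1, 1, 2, 6, 8]))  # Example 2: (1, 1, 3.0, 6, 8)
print(five_number_summary([4, 1, 1, 2, 6, 8, 8, 5, 7, 3, 0, -3, -2, -2, 0]))  # Example 3: (-3, 0, 2, 6, 8)
```

The results agree with the hand calculations in Examples 1 to 3 below; the mix of integers and floats only reflects whether an average was taken.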
Example 1 (odd sample size)
Consider the sample
4, 1, 1, 2, 6.
The ordered array is
1, 1, 2, 4, 6.
The sample size is 5, which is odd, so the median is the middle observation of the ordered
array.
Therefore, the median is the 3rd value in the ordered array, which is, of course, the
middle value. I.e.,
(sample median) = 2.
First Quartile, Q1. To find the first quartile, Q1, we find the median of the lower half of
the ordered array. The lower half of the ordered array is
1, 1
Because the lower half has an even number of observations, the first quartile, Q1, being
the median of the lower half, is the average of the two middle values
Q1 = (1 + 1)/2 = 1
Third Quartile, Q3. To find the third quartile, Q3, we find the median of the upper half
of the ordered array. The upper half of the ordered array is
4, 6
Because the upper half has an even number of observations, the third quartile, Q3, being
the median of the upper half, is the average of the two middle values
Q3 = (4 + 6)/2 = 5
The minimum is Min = 1, the maximum is Max = 6, so the five number summary is
1, 1, 2, 5, 6
Example 2 (even sample size)
Consider the sample
4, 1, 1, 2, 6, 8.
The ordered array is
1, 1, 2, 4, 6, 8
n = 6, which is even.
So the median is the average of the 3rd and 4th observations in the ordered array, which
are 2 and 4.
(sample median) = (2 + 4) / 2 = 6 / 2 = 3.
Lower half:
1, 1, 2
Q1 = 1
Upper half:
4, 6, 8
Q3 = 6
Five number summary
1, 1, 3, 6, 8
Example 3
Consider the sample
4, 1, 1, 2, 6, 8, 8, 5, 7, 3, 0, -3, -2, -2, 0.
The ordered array is
-3, -2, -2, 0, 0, 1, 1, 2, 3, 4, 5, 6, 7, 8, 8
n = 15, odd
Median = Q2 = 2
Lower half
-3, -2, -2, 0, 0, 1, 1
odd
Q1 = 0
Upper half
3, 4, 5, 6, 7, 8, 8
odd
Q3 = 6
Five number summary
-3, 0, 2, 6, 8
Note

The so-called lower half and upper half are not exactly halves for an odd size
sample. For an odd sized sample, there are (n − 1)/2 observations in each half. For
an even sized sample, there are n/2 observations in each half.

For the purpose of dividing the ordered array into lower and upper halves, the
actual median is not included.

The first quartile is also known as the 25th percentile and as the 0.25 quantile.

The median or 2nd quartile is also known as the 50th percentile and as the 0.50
quantile.

The third quartile is also known as the 75th percentile and as the 0.75 quantile.
Quantiles (for advanced classes, e.g., Stat 5605, 5506, and 5674)
This is the “weighted-average” method of computing quantiles, as in Daniel (2009) and
in JMP.
Definition: q-th quantile location, 0 < q < 1
The q-th quantile location is
Lq = (q)(n + 1)
Definition: q-th quantile, 0 ≤ q ≤ 1
For q = 0, the 0.0 quantile, denoted by Q0.00, is the first observation in the ordered array,
the sample minimum.
For q = 1, the 1.0 quantile, denoted by Q1.00, is the last observation in the ordered array,
the sample maximum.
For 0 < q < 1, there are two cases:
1. If Lq is an integer, then the q-th quantile, denoted by Qq, is the Lq-th observation
in the ordered array.
2. If Lq is not an integer, then the q-th quantile is the weighted average of the [Lq]-th
and ([Lq] + 1)-th observations in the ordered array, where [x] denotes the greatest
integer ≤ x. Let w denote the fractional part of Lq, and let a and b denote the [Lq]-th and ([Lq] + 1)-th observations in the ordered array, respectively. Then
Qq = (1 − w)(a) + (w)(b) = a + (w)(b − a)
Note that
Qq = a + (w)(b − a)
is the definition given by Daniel (2009) in Example 2.5.5.
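A minimal Python sketch of the weighted-average method, with a hypothetical function name. The definition above does not say what to do when Lq falls below 1 or above n; the sketch assumes the sample minimum and maximum in those cases, which is what Example 3 and the JMP output later in these notes show.

```python
import math


def weighted_quantile(sample, q):
    """q-th sample quantile by the weighted-average method, L_q = q(n + 1).

    Assumption: a location below 1 gives the sample minimum and a location
    above n gives the sample maximum, following Example 3 and the JMP output
    in these notes (the definition above does not state these edge cases).
    """
    if not 0 <= q <= 1:
        raise ValueError("q must lie in [0, 1]")
    ordered = sorted(sample)
    n = len(ordered)
    if n == 0:
        raise ValueError("sample must be non-empty")
    loc = q * (n + 1)                   # quantile location L_q
    if loc <= 1:
        return ordered[0]               # includes q = 0, the sample minimum
    if loc >= n:
        return ordered[-1]              # includes q = 1, the sample maximum
    k = math.floor(loc)                 # [L_q], the greatest integer <= L_q
    w = loc - k                         # fractional part of L_q (0 when L_q is an integer)
    a, b = ordered[k - 1], ordered[k]   # the [L_q]-th and ([L_q] + 1)-th observations
    return a + w * (b - a)


example2 = [4, 1, 1, 2, 6, 8]
print([weighted_quantile(example2, q) for q in (0.25, 0.50, 0.75)])  # [1.0, 3.0, 6.5]
```

When Lq is an integer, w is 0 and the formula reduces to the Lq-th observation, so case 1 of the definition needs no separate branch.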
Example 2 (even sample size) using the weighted-average method
We repeat Example 2 to show the different results.
Consider the sample
4, 1, 1, 2, 6, 8
The ordered array is
1, 1, 2, 4, 6, 8
n = 6
Using the weighted average method to compute the median location, we get
L0.50 = 0.5(n + 1) = 0.5(6 + 1) = 3.5
Therefore the median is the weighted average of the 3rd and the 4th observations in the
ordered array, which are 2 and 4, and the weight is
w = 0.5
And the median is
Q0.50 = a + (w)(b – a) = 2 + (0.5)(4 – 2) = 3
the same as the elementary method.
To calculate the 1st quartile, which is the 0.25 quantile, we get the quantile location of
L0.25 = 0.25(n + 1) = 0.25(6 + 1) = 1.75
Therefore the 1st quartile is the weighted average of the 1st and 2nd observations in the
ordered array, which are 1 and 1, and the weight is
w = 0.75
So the 1st quartile is
Q0.25 = a + (w)(b – a) = 1 + (0.75)(1 – 1) = 1
the same as the elementary method. However, if the 1st and 2nd observations had been
unequal, then the elementary method would have given the 2nd observation, and the
weighted average would have given a lower number.
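The remark about unequal 1st and 2nd observations can be checked numerically. In this minimal Python sketch the function names are hypothetical, and the second sample is made up (not from the notes) by changing the first observation of Example 2 from 1 to 0.

```python
import math


def q1_elementary(ordered):
    """Q1 as the median of the lower half (assumes at least two observations)."""
    lower = ordered[: len(ordered) // 2]   # the median is excluded when n is odd
    m = len(lower) // 2
    return lower[m] if len(lower) % 2 == 1 else (lower[m - 1] + lower[m]) / 2


def q1_weighted(ordered):
    """Q1 by the weighted-average method with location 0.25(n + 1)."""
    loc = 0.25 * (len(ordered) + 1)
    if loc <= 1:                           # assumed convention: fall back to the minimum
        return ordered[0]
    k = math.floor(loc)
    w = loc - k
    return ordered[k - 1] + w * (ordered[k] - ordered[k - 1])


example2 = [1, 1, 2, 4, 6, 8]              # ordered array of Example 2
hypothetical = [0, 1, 2, 4, 6, 8]          # made-up variant: 1st and 2nd observations differ

for name, data in [("Example 2", example2), ("hypothetical", hypothetical)]:
    print(name, q1_elementary(data), q1_weighted(data))
```

For Example 2 both methods give 1. For the made-up sample the elementary Q1 is still 1, the 2nd observation, while the weighted average gives 0 + (0.75)(1 – 0) = 0.75, which is lower, as stated above.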
To calculate the 3rd quartile, which is the 0.75 quantile, we get the quantile location of
L0.75 = 0.75(n + 1) = 0.75(6 + 1) = 5.25
Therefore the 3rd quartile is the weighted average of the 5th and 6th observations in the
ordered array, which are 6 and 8, and the weight is
w = 0.25
So the 3rd quartile is
Q0.75 = a + (w)(b – a) = 6 + (0.25)(8 – 6) = 6.5
which is different from the value of 6.0 from the elementary method.
Five number summary
1.0, 1.0, 3.0, 6.5, 8.0
Example 3
Consider the sample
4, 1, 1, 2, 6, 8, 8, 5, 7, 3, 0, −3, −2, −2, 0
The ordered array is
−3, −2, −2, 0, 0, 1, 1, 2, 3, 4, 5, 6, 7, 8, 8
n = 15
Let Qq denote the q-th quantile.
Q0.00 = (sample minimum) = −3
The 0.05 quantile, also known as the 5th percentile, is found as follows.
L0.05 = (0.05)(15 + 1) = (0.05)(16) = 0.8
Because the 0.05 quantile location is less than 1, the 0.05 quantile is simply the sample
minimum.
Q0.05 = −3.
The 0.10 quantile, also known as the 10th percentile, is found as follows.
L0.10 = (0.10)(15 + 1) = (0.10)(16) = 1.6
Therefore the 0.10 quantile is a weighted average of the 1st and 2nd observations in the
ordered array. And the weight, w, is the fractional part of 1.6, namely
w = 0.6
Q0.10 = (−3) + (0.6)[(−2) − (−3)] = (−3) + (0.6)(1) = (−3) + 0.6 = −2.4.
This is the value given by default in JMP.
The 0.25 quantile, also known as the 25th percentile, also known as the first quartile, is
found as follows.
L0.25 = (0.25)(15 + 1) = (0.25)(16) = 4.0,
Q0.25 = 0.
The 0.50 quantile, also known as the 50th percentile, also known as the second quartile,
also known as the sample median, is found as follows.
L0.50 = (0.50)(15 + 1) = (0.50)(16) = 8.0,
Q0.50 = 2.0.
Likewise, the 0.75 quantile = 75th percentile = third quartile = the 12th observation in
the ordered array = 6.0.
Exercises
1. Find the 90th percentile of the sample of Example 3, above.
2. Find the 0.95 quantile of the sample of Example 3, above.
3. Find the 100th percentile, i.e., the maximum, for the sample of Example 3, above.
4. Find the first quartile of the sample of Example 1, above.
5. Find the third quartile of the sample of Example 2, above.
Percentiles calculated in JMP
100.0%  maximum    8
 99.5%             8
 97.5%             8
 90.0%             8
 75.0%  quartile   6
 50.0%  median     2
 25.0%  quartile   0
 10.0%            -2.4
  2.5%            -3
  0.5%            -3
  0.0%  minimum   -3
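As a cross-check, the sketch below recomputes the percentile column above with the weighted-average method. It assumes the convention of Example 3 that a location below 1 gives the minimum and that a location above n gives the maximum; it is a hypothetical re-implementation for illustration, not JMP's own code. Under these assumptions the computed values agree with the JMP column, including −2.4 at 10%.

```python
import math


def weighted_quantile(ordered, q):
    """Weighted-average quantile of an ordered list, location q(n + 1).

    Locations below 1 give the minimum and locations above n the maximum,
    the convention used in Example 3 (assumed, not taken from JMP's code).
    """
    n = len(ordered)
    loc = q * (n + 1)
    if loc <= 1:
        return ordered[0]
    if loc >= n:
        return ordered[-1]
    k = math.floor(loc)
    w = loc - k
    return ordered[k - 1] + w * (ordered[k] - ordered[k - 1])


example3 = sorted([4, 1, 1, 2, 6, 8, 8, 5, 7, 3, 0, -3, -2, -2, 0])
percents = [100.0, 99.5, 97.5, 90.0, 75.0, 50.0, 25.0, 10.0, 2.5, 0.5, 0.0]
table = {p: weighted_quantile(example3, p / 100) for p in percents}

for p, value in table.items():
    print(f"{p:5.1f}%  {value:g}")     # :g trims floating-point noise such as -2.3999999999999999
```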
Common Errors
1. Forgetting to order the sample.
2. Reporting the quantile location as the quantile.
3. A mistake of sign (i.e., plus or minus).
4. A factor of 2.
To avoid errors
1. Write out each arithmetic step.
2. Check your answer by seeing if it makes sense in a graph of the data.
Example 4 (Example 2.5.5 of Daniel (2009))
The ordered array of the sample of Daniel (2009) Table 2.5.1 is
shown below.
Rank   Ordered Array
 1     14.6
 2     24.3
 3     24.9
 4     27.0
 5     27.2
 6     27.4
 7     28.2
 8     28.8
 9     29.9
10     30.7
11     31.5
12     31.6
13     32.3
14     32.8
15     33.3
16     33.6
17     34.3
18     36.9
19     38.3
20     44.0
n = 20
The five number summary is found as follows.
The sample minimum is Q0.00 = 14.6
The first quartile is the 25th percentile and the 0.25 quantile,
with quantile location
L0.25 = 0.25(n + 1) = 0.25(20 + 1) = 5.25
So the first quartile is the weighted average of the 5th and 6th
observations in the ordered array, which are 27.2 and 27.4:
Q0.25 = 27.2 + (0.25)(27.4 – 27.2) = 27.2 + (0.25)(0.2) = 27.25
The second quartile is the median, the 50th percentile, and the
0.50 quantile, with quantile location
L0.50 = 0.50(n + 1) = 0.50(20 + 1) = 10.5
So the sample median is the weighted average of the 10th and 11th
observations in the ordered array, which are 30.7 and 31.5:
Q0.50 = 30.7 + (0.5)(31.5 – 30.7) = 30.7 + (0.5)(0.8) = 31.1
The third quartile is the 75th percentile and the 0.75 quantile, with quantile location
L0.75 = 0.75(n + 1) = 0.75(20 + 1) = 15.75
So the 3rd quartile is the weighted average of the 15th and 16th observations in the ordered
array, which are 33.3 and 33.6:
Q0.75 = 33.3 + (0.75)(33.6 – 33.3) = 33.3 + (0.75)(0.3) = 33.525
The sample maximum is Q1.00 = 44.0
Thus, the five-number summary is
sample minimum Q0.00 = 14.600
first quartile Q0.25 = 27.250
median Q0.50 = 31.100
third quartile Q0.75 = 33.525
sample maximum Q1.00 = 44.000
Note that if I were presenting these data I would round to 3 significant digits:
sample minimum Q0.00 = 14.6
first quartile Q0.25 = 27.3
median Q0.50 = 31.1
third quartile Q0.75 = 33.5
sample maximum Q1.00 = 44.0
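Rounding to 3 significant digits can be automated, but tie-breaking matters: 27.25 is an exact tie, and Python's built-in round() resolves exact ties to the even digit, which would give 27.2 rather than the 27.3 shown above. A minimal sketch using the standard decimal module with ROUND_HALF_UP (the helper name round_sig is hypothetical):

```python
from decimal import Decimal, ROUND_HALF_UP


def round_sig(x, digits=3):
    """Round x to `digits` significant digits, with exact ties rounded up (half up).

    Note: a carry can add a digit (e.g. 99.96 becomes 100.0 at 3 digits).
    """
    d = Decimal(str(x))                 # str() keeps the short decimal form, e.g. "27.25"
    if d == 0:
        return d
    # exponent of the last significant digit to keep
    quantum = Decimal(1).scaleb(d.adjusted() - digits + 1)
    return d.quantize(quantum, rounding=ROUND_HALF_UP)


summary = [("Q0.00", 14.6), ("Q0.25", 27.25), ("Q0.50", 31.1),
           ("Q0.75", 33.525), ("Q1.00", 44.0)]
for name, value in summary:
    print(name, round_sig(value))
```

This prints 14.6, 27.3, 31.1, 33.5, and 44.0, matching the rounded values above.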
Thus, the sample range is
(sample range) = (max) – (min) = 44.0 – 14.6 = 29.4
The sample interquartile range is
IQR = Q0.75 – Q0.25 = 33.525 – 27.25 = 6.275
For further illustration, we can calculate the 10th and 90th percentiles.
The 10th percentile is the 0.10 quantile with
L0.10 = 0.10(20 + 1) = 2.1
Q0.10 = 24.3 + (0.1)(24.9 – 24.3) = 24.3 + 0.06 = 24.36
The 90th percentile is the 0.90 quantile with
L0.90 = 0.90(20 + 1) = 18.9
Q0.90 = 36.9 + (0.9)(38.3 – 36.9) = 36.9 + 1.26 = 38.16
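The range, IQR, and 10th and 90th percentiles of Example 4 can be recomputed in a few lines. This is a minimal Python sketch assuming the weighted-average rule above; the min/max convention for locations outside 1 to n is an assumption and is not triggered by these quantiles, and all names are hypothetical. Floating-point results differ from the hand values only in the last binary digits, so the output is rounded for display.

```python
import math

# Ordered array of Example 4 (Daniel (2009), Table 2.5.1), n = 20
data = [14.6, 24.3, 24.9, 27.0, 27.2, 27.4, 28.2, 28.8, 29.9, 30.7,
        31.5, 31.6, 32.3, 32.8, 33.3, 33.6, 34.3, 36.9, 38.3, 44.0]


def weighted_quantile(ordered, q):
    """Weighted-average quantile, location q(n + 1); ends clamp to min/max (assumed)."""
    n = len(ordered)
    loc = q * (n + 1)
    if loc <= 1:
        return ordered[0]
    if loc >= n:
        return ordered[-1]
    k = math.floor(loc)
    w = loc - k
    return ordered[k - 1] + w * (ordered[k] - ordered[k - 1])


sample_range = data[-1] - data[0]
iqr = weighted_quantile(data, 0.75) - weighted_quantile(data, 0.25)
p10 = weighted_quantile(data, 0.10)
p90 = weighted_quantile(data, 0.90)

# Round for display; the underlying floats carry tiny representation errors.
print(f"range = {sample_range:.1f}, IQR = {iqr:.3f}")
print(f"10th percentile = {p10:.2f}, 90th percentile = {p90:.2f}")
```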
JMP Quantiles from JMP > Analyze > Distribution
Quantiles
100.0%  maximum   44
 99.5%            44
 97.5%            44
 90.0%            38.16
 75.0%  quartile  33.525
 50.0%  median    31.1
 25.0%  quartile  27.25
 10.0%            24.36
  2.5%            14.6
  0.5%            14.6
  0.0%  minimum   14.6