Percentiles

Calculating Percentiles
Percentiles divide a set of data into specific proportions. For example, when the data are placed in ascending
order, the 25th percentile is the position at or below which 25% of the data values fall. It also means that 75%
of the data values are equal to or greater than the value at this location.
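[Figure: the data ordered from smallest value to largest value, with the 25th percentile marked; the bottom 25% of
values lie at or below it and the top 75% of values lie at or above it.]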
The median is the 50th percentile since it is located exactly in the middle of the data.
There are two ways of determining the value of a given percentile. The first method involves the following steps.
(The second is using the PERCENTILE function in MS Excel).
Step 1: Arrange the data values in ascending order
Step 2: Find the position (denoted i) of the percentile in the above ranked data
$i = \left(\frac{P}{100}\right) n$
where
i = the location of the percentile in the ascending set of data
P = the percentile we want to calculate
n = the number of data values in the set of data
Step 3:
If i is not a whole number, then go up to the next whole number (for example: 3.3 would become 4).
Locate this position in the ordered set. The value in this location is the required percentile value.
If i is a whole number, find the average of the values in the i and i + 1 positions in the ordered set. The
average value will be the required percentile value.
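The three steps above can be sketched in Python as follows. This is a minimal illustration, not a standard library routine: the function name `percentile_value` and the sample list are made up for the example, and the restriction to percentiles strictly between 0 and 100 is an assumption, since the steps do not say what to do when position i + 1 falls outside the data.

```python
from fractions import Fraction
from math import ceil


def percentile_value(data, p):
    """Return the p-th percentile of data using the three-step location method."""
    if not data:
        raise ValueError("data must not be empty")
    if not 0 < p < 100:
        # Assumption: p = 0 or p = 100 would need a position outside the data.
        raise ValueError("p must be strictly between 0 and 100")
    ranked = sorted(data)                # Step 1: ascending order
    n = len(ranked)
    i = Fraction(p) * n / 100            # Step 2: i = (P/100) * n, kept exact
    if i.denominator != 1:               # Step 3, i is not a whole number
        position = ceil(i)               # go up to the next whole number
        return ranked[position - 1]      # positions are 1-based, lists 0-based
    position = int(i)                    # Step 3, i is a whole number
    # Average of the values in positions i and i + 1.
    return (ranked[position - 1] + ranked[position]) / 2


sample = [50, 10, 40, 20, 30]            # made-up values, for illustration only
print(percentile_value(sample, 50))      # i = 2.5, so position 3 is used
print(percentile_value(sample, 40))      # i = 2, so positions 2 and 3 are averaged
```

The location is held as an exact fraction so that the whole-number test in Step 3 is not affected by floating-point rounding.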
Revised 5/21/12 pe
Example
To illustrate this, let’s suppose we want to know the data values (µg/m3) at the 30th, 50th, and 80th percentiles
for PM data collected during the month of September.
We first list the data set.
South S3 values (µg/m3)

Date      Value
Sep 4      41.8
Sep 7      85.0
Sep 10     57.1
Sep 13    229.6
Sep 16     27.9
Sep 19     26.1
Sep 22     25.7
Sep 28     20.4
Next we need to rank the values in ascending order:
Rank    Data value (µg/m3)
1        20.4
2        25.7
3        26.1
4        27.9
5        41.8
6        57.1
7        85.0
8       229.6
The next step is to calculate the location (i) of each percentile in this ranked data:
30th percentile: $i = \left(\frac{30}{100}\right) 8 = 2.4$
50th percentile: $i = \left(\frac{50}{100}\right) 8 = 4$
80th percentile: $i = \left(\frac{80}{100}\right) 8 = 6.4$
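The location arithmetic can be checked with a few lines of Python. This is only a sketch: `location` is a hypothetical helper name, and the value n = 8 is taken from the eight ranked PM values in the example.

```python
from fractions import Fraction


def location(p, n):
    """Location i = (P/100) * n, kept as an exact fraction to avoid rounding noise."""
    return Fraction(p, 100) * n


n = 8  # number of ranked PM values in the example
for p in (30, 50, 80):
    i = location(p, n)
    if i.denominator == 1:
        note = "whole number, so average positions i and i + 1"
    else:
        note = "not a whole number, so go up to the next position"
    print(f"P = {p}: i = {float(i)} ({note})")
```

Only the 50th percentile lands on a whole-number location here, which is why it is the one case in the example that needs an average.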
For the 30th percentile the location (i) is 2.4, which, according to Step 3, means that the 30th percentile is in
position 3 and therefore has a value of 26.1 µg/m3. We can now say that roughly 30% of the PM values were
26.1 µg/m3 or less (or we might prefer to say that roughly 70% of the PM values were 26.1 µg/m3 or higher).
For the 50th percentile the location (i) is a whole number, so according to Step 3, we take the average of the values
in locations 4 and 5 (i and i + 1), which is
$\frac{27.9 + 41.8}{2} = 34.85$, or 34.9 after rounding to one decimal place.
The 50th percentile has a value of 34.9 µg/m3, which is the average of 27.9 and 41.8.
The 80th percentile is in location 6.4. According to Step 3, we round up to the next whole number, which is
location 7. The 80th percentile therefore has a value of 85.0 µg/m3.
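The handout names Excel's PERCENTILE function as a second method but does not describe it. As generally documented, that function interpolates linearly between ranked values at rank (n - 1) × P/100 + 1, so it need not agree with the three-step method. The sketch below mimics that interpolation under this assumption; `interpolated_percentile` is a hypothetical name, not Excel's own code.

```python
from math import floor


def interpolated_percentile(data, p):
    """Linear interpolation at 1-based rank (n - 1) * p / 100 + 1 (assumed Excel-style)."""
    if not data:
        raise ValueError("data must not be empty")
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")
    ranked = sorted(data)
    h = (len(ranked) - 1) * p / 100              # same rank, expressed 0-based
    lower = floor(h)                             # index of the value just below the rank
    upper = min(lower + 1, len(ranked) - 1)      # index of the value just above it
    fraction = h - lower                         # how far the rank sits between the two
    return ranked[lower] + fraction * (ranked[upper] - ranked[lower])


pm = [41.8, 85.0, 57.1, 229.6, 27.9, 26.1, 25.7, 20.4]   # PM values from the example
for p in (30, 50, 80):
    print(p, round(interpolated_percentile(pm, p), 2))
```

For the example data this gives about 26.28, 34.85, and 73.84, compared with 26.1, 34.9, and 85.0 from the three-step method, so the two approaches can differ noticeably on a small data set.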
Why does a percentile measure relative position?
(adapted from Statistics for the Utterly Confused by Lloyd Jaisingh, 2000)
The diagram below shows a plot of the data points with the location of the 65th percentile value of 27.
This number is the cutoff point where 65 percent of the data values are less than or equal to the data
value of 27, and 35 percent of the data values are equal to or greater than the data value of 27. Thus, this
shows that the 65th percentile value is a measure of location. That is, the percentile gives us an idea of
the relative position of a value in an ordered data set.
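[Figure: data points plotted along an axis running from 15 to 63, with the 65th percentile marked at the data
value of 27; the bottom 65% of values lie at or below 27 and the top 35% of values lie at or above it.]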
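One way to see the relative-position idea is to count the share of values at or below, and at or above, a given value. The sketch below uses the PM values from the earlier example, because the data behind the diagram are not listed; `shares_around` is a hypothetical helper name.

```python
def shares_around(data, value):
    """Return the shares of data values that are <= value and >= value."""
    n = len(data)
    at_or_below = sum(1 for x in data if x <= value) / n
    at_or_above = sum(1 for x in data if x >= value) / n
    return at_or_below, at_or_above


pm = [41.8, 85.0, 57.1, 229.6, 27.9, 26.1, 25.7, 20.4]   # PM values from the earlier example
below, above = shares_around(pm, 26.1)                    # 26.1 was the 30th percentile value
print(f"{below:.1%} at or below, {above:.1%} at or above")
```

With only eight values the shares cannot equal 30% and 70% exactly. Here they come out as 37.5% and 75.0%, that is, at least 30% at or below and at least 70% at or above the percentile value.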