CHAPTER 5 PERCENTILES AND PERCENTILE RANKS

Percentiles and percentile ranks are frequently used as indicators of performance in both
the academic and corporate worlds. Percentiles and percentile ranks provide information about
how a person or thing relates to a larger group. Relative measures of this type are often
extremely valuable to researchers employing statistical techniques. For example, most nationally
standardized test scores are reported as percentile ranks. Many graduate schools admit only
candidates who score in the upper fifty percent of those taking certain standardized tests. Honor
students are frequently determined by selecting those who fall in the top ten percent of their class.
Researchers may want to know which Congresspersons fall in the bottom ten percent of the
ratings by ADA.1 These and other applications for percentiles and percentile ranks emphasize
the need for students to be knowledgeable in this important area of statistics. This chapter
provides an explanation of the meaning of these statistics and instructions for calculating them
using data provided in frequency and grouped frequency distributions.
Percentiles and percentile ranks are highly similar statistics. Percentiles are calculated as
a means of dividing a distribution of values into 2 or more groups. They are used to determine
where to draw the line between observed values within the distribution. For example: if a teacher
wishes to determine the exam score that divides his class in half, with 50% scoring above and
50% scoring below, he determines the point that marks the 50th percentile.2 In statistics, there are
percentile scales which have special names for various percentiles. Deciles are percentiles which
divide a distribution into ten equal sections. There are nine deciles in each distribution of values,
which correspond to the 10th, 20th, 30th, ... 90th percentiles. Quartiles are points in a distribution
which divide that distribution into quarters. The first quartile (Q1) is the 25th percentile. The
second quartile (Q2) is the 50th percentile or the median. The 75th percentile is the third quartile
(Q3). These quartiles are shown in Figure 5:1.
1 Receiving a low rating by Americans for Democratic Action would probably mean that a
Congressperson would be very conservative.
2 The 50th percentile is identical to one of the measures of central tendency: the median.
FIGURE 5:1
LOWER QUARTILE, MEDIAN, UPPER QUARTILE
A percentile rank is used to determine where a particular score or value fits within a broader
distribution. For example: A student receives a score of 75 out of 100 on an exam and wishes to
determine how her score compares to the rest of the class. She calculates a percentile rank for a
score of 75 based on the reported scores of the entire class. Her percentile rank in this example
would be 80, meaning that 80 percent of scores on the exam were at or below 75.
This point is illustrated in Figure 5:2.
Figure 5:2
PERCENTILE RANK FOR SCORE OF 75
If the data are properly organized and the appropriate formulas carefully followed, the
sequential statistical steps for finding percentiles and percentile ranks are not difficult.
Researchers working with percentiles begin with the desired percentage which is used to
calculate the specific value that represents the appropriate dividing point within the distribution.
Researchers working with percentile ranks begin with a specific score or value and calculate the
percentage of cases falling at or below it. The formula for finding percentiles is given first,
and its use is explained step by step afterward. The formula for percentiles is as follows:
FORMULA FOR FINDING PERCENTILES
(Simple Frequency Distribution)
kth percentile = the value at position P, where P = (k ÷ 100)(n)
Definitions of the Symbols in the Formula
kth = The percentile one wishes to calculate. The answer will be a value.
P = The position within the distribution that marks the percentile one wishes to calculate.
For example: if one wishes to find the 50th percentile and the calculated value of P is 5, the 50th
percentile is equal to the 5th value in the distribution. Always begin with the lowest value when
counting and round the value of P to the nearest whole number.
n = The total number of values in the distribution.
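The position rule described above can be sketched in Python. This is a minimal illustration, not the chapter's own procedure: the function name `percentile_simple` and the sample scores are hypothetical, and two choices are assumptions (halves round up, and a position below 1 is treated as the first value).

```python
import math

def percentile_simple(values, k):
    """Return the kth percentile of raw values using P = (k / 100) * n."""
    ordered = sorted(values)            # always count from the lowest value
    n = len(ordered)
    p = k * n / 100                     # position of the percentile
    # Assumption: "nearest whole number" rounds halves up, and a position
    # below 1 is treated as the first value in the distribution.
    position = max(1, math.floor(p + 0.5))
    return ordered[position - 1]        # positions are 1-based

# Hypothetical exam scores, used only for illustration.
scores = [55, 60, 62, 68, 70, 75, 80, 85, 90, 95]
print(percentile_simple(scores, 50))    # P = 5, so the 5th value is returned
```

With ten values, the 50th percentile falls at position P = 5, matching the example in the definition of P.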
This is a simple formula that yields the precise location of each percentile line within a
distribution. Unfortunately, researchers are often unable to obtain data sufficient for the use of this
formula. Rather than report each specific observed value, data are often presented in the form of a grouped
frequency distribution. The use of such distributions reduces precision of measurements, but it is still
possible to calculate a close approximation of any given percentile with a second formula.
FORMULA FOR FINDING PERCENTILES
(Grouped Frequency Distribution)
kth percentile = L + [(P - cfb) ÷ f] (U - L)
Definitions of the Symbols in the Formula
kth = The percentile one wishes to calculate. The answer will be a value.
P = (k ÷ 100)(n), where k is the percentile and n is the number of values in the distribution. For
example, if one wanted to find the 50th percentile, and there were 400 values (n) in the
distribution, P would be the 200th value, or (50 ÷ 100)(400) = 200.
L = Lower limit of the critical interval. The critical interval is designated "critical" because it is the
interval within which the percentile will occur; it is the interval where P occurs. The percentile
will be a value at or between the lowest and highest values of that interval. The lower
limit of the critical interval is the lowest possible value of the critical interval. For example, the
real lower limit for a critical interval of $200 - 249 would be $199.50 because all values at or above
$199.50 would be rounded up to $200 and included in the interval.
cfb = Cumulative frequency of all the intervals below (but not including) the critical interval.
f = Frequency in the critical interval.
U = Upper limit of the critical interval. This is the highest value that would be included in the critical
interval. For example: the 80-89 interval on an exam would typically have an upper limit of
89.5.
The next step in explaining percentiles will be to apply these concepts to some real data.
Suppose a researcher had data on consulting fees per day paid by the Environmental Protection Agency
(EPA) for a sample of 400 consultants and wanted to know what the top ten percent earned per day. First
the data are organized in a data matrix such as the one shown in Figure 5:3. Based on these data, the
above question could be answered by finding the appropriate percentile.
FIGURE 5:3
EPA CONSULTING FEES
Interval                 Y            f     cf
1                        $250-$500    32    400
2 (Critical Interval)    $200-$249    27    368   (P, or the 360th value, occurs here)
3                        $160-$199    48    341
4                        $130-$159    72    293
5                        $100-$129    92    221
6                        $80-$99      51    129
7                        $60-$79      35    78
8                        $40-$59      43    43
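As a rough sketch of how the cf column and the critical interval in Figure 5:3 could be obtained, the following Python snippet accumulates the frequencies from the lowest interval upward and then locates the interval containing P. The variable names are illustrative only; the frequencies are those shown in the figure.

```python
from itertools import accumulate

# Intervals from Figure 5:3, listed from lowest to highest: (label, f)
intervals = [
    ("$40-$59", 43), ("$60-$79", 35), ("$80-$99", 51), ("$100-$129", 92),
    ("$130-$159", 72), ("$160-$199", 48), ("$200-$249", 27), ("$250-$500", 32),
]

freqs = [f for _, f in intervals]
cum_freqs = list(accumulate(freqs))     # running total from the bottom up
n = cum_freqs[-1]                       # total number of values

k = 90
p = k * n / 100                         # position of the 90th percentile

# The critical interval is the first one whose cf reaches P.
critical = next(i for i, cf in enumerate(cum_freqs) if cf >= p)

print(cum_freqs)
print(n, p, intervals[critical][0])
```

The running totals reproduce the cf column, and the first interval whose cf reaches 360 is $200-$249.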
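Solution:
(1) Question: What is the 90th percentile?3
P = (90 ÷ 100)(400) = 360
90th percentile = 199.50 + [(360 - 341) ÷ 27] (249.50 - 199.50)
= 199.50 + (.70)(50), with 19 ÷ 27 rounded to .70
= 199.50 + 35.00 = $234.50
Research Conclusion: The top ten percent make $234.50 to $500.00 per day. Ninety percent make at or
below $234.50 per day.
3 Finding the 90th percentile will enable the researcher to know what the top 10% earned per day.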
Once the data are organized, calculating a percentile is a very logical process. The
360th (P) value is in interval 2 ($200 - 249) because the 342nd through 368th values are
in this critical interval, and its lower limit is $199.50 (L). The 90th percentile is the 360th value
and is between $200 and $249. There are 341 (cfb) values below the critical interval and 27 (f)
values in the critical interval. The upper limit of the critical interval is $249.50 (U). In order to
reach the 360th value, one must have 19 of the 27 values in the critical interval because 19 plus
the 341 values below the critical interval is equal to 360. The formula then produces an estimate
of how far the 19th value will fall from the lower limit of the critical interval: 70% (19 ÷ 27) of the
$50 width (w) is $35. The $35 is added to the real lower limit of $199.50, and the value which is
equivalent to the 90th percentile is $234.50. Ten percent of the values in the distribution are
above $234.50, and ninety percent are at or below $234.50. Graphically this conclusion is shown
in Figure 5:4.
FIGURE 5:4
90TH PERCENTILE FOR EPA FEES
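The interpolation just described can be sketched in Python as follows. The function name `percentile_grouped` is hypothetical, and the code assumes values are recorded in whole dollars, so that real limits lie 0.5 beyond the stated limits. Because it does not round 19 ÷ 27 to 70%, it returns about 234.69 rather than the $234.50 obtained in the text.

```python
def percentile_grouped(intervals, k):
    """Estimate the kth percentile from a grouped frequency distribution.

    intervals: list of (lower, upper, f) with stated limits, lowest first.
    Assumption: values are whole units, so real limits lie 0.5 beyond the
    stated limits (for example, $200-$249 becomes 199.5 to 249.5).
    """
    n = sum(f for _, _, f in intervals)
    p = k * n / 100                     # position P
    cfb = 0                             # cumulative frequency below
    for lower, upper, f in intervals:
        if f > 0 and cfb + f >= p:      # P falls in this (critical) interval
            L, U = lower - 0.5, upper + 0.5
            return L + (p - cfb) / f * (U - L)
        cfb += f
    raise ValueError("k must be between 0 and 100")

# EPA consulting fees from Figure 5:3, lowest interval first.
epa = [
    (40, 59, 43), (60, 79, 35), (80, 99, 51), (100, 129, 92),
    (130, 159, 72), (160, 199, 48), (200, 249, 27), (250, 500, 32),
]
print(round(percentile_grouped(epa, 90), 2))
```

The same function gives the median when called with k = 50, since the median is the 50th percentile.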
The preceding example demonstrates that calculation of a percentile begins with determination of
the desired percentage which is then used to find a value. Calculating a percentile rank involves
the opposite procedure. One begins with a value and calculates the percentage of cases falling
at or below it. The formula for finding percentile ranks using a simple frequency distribution is as
follows:
Formula for Calculating Percentile Rank
(Simple Frequency Distributions)
PR = (Xp ÷ n)(100)
PR = Percentile rank. The answer will be a percentage.
Xp = The position of the score within the distribution. Begin with the lowest value and count
the number of cases until reaching the score under consideration. Be sure to include the
score under consideration and all those of equal value when determining Xp.
n = The total number of cases in the distribution.
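A minimal Python sketch of this calculation follows. The function name `percentile_rank_simple` and the class scores are hypothetical; Xp is taken to be the count of all cases at or below the score, which follows the instruction to include the score itself and all equal values.

```python
def percentile_rank_simple(values, score):
    """Percentile rank of a score: PR = (Xp / n) * 100."""
    n = len(values)
    # Xp counts every case at or below the score, including ties.
    xp = sum(1 for v in values if v <= score)
    return xp * 100 / n

# Hypothetical class scores, used only for illustration.
scores = [50, 55, 60, 65, 70, 72, 75, 75, 80, 90]
print(percentile_rank_simple(scores, 75))
```

In this made-up class, eight of the ten scores are at or below 75, so the percentile rank of 75 is 80.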
This formula provides a simple and accurate value for the percentile rank of any given value
within a distribution. As the discussion of percentiles demonstrated, however, researchers often
lack the data necessary for the use of this formula. When data are presented in a grouped
frequency distribution, the formula for calculating the percentile rank is as follows:
Formula for Calculating Percentile Rank
(Grouped Frequency Distributions)
PR = {cfb + [(X - L) ÷ (U - L)] (f)} ÷ n × 100
PR = Percentile rank. The answer will be a percentage.
cfb = Cumulative frequency of all the values below the critical interval.4
X = Raw score or value for which one wants to find a percentile rank.
L = Lower limit of the critical interval.
U = Upper limit of the critical interval.
f = Frequency of the values in the critical interval.
n = The total number of values in the distribution.
In order to explain how percentile ranks are calculated, the same data used for calculating
percentiles will be used. The data are organized in a solution matrix shown in Figure 5:5.
Assume that a researcher wanted to know the percentage of consultants that made $135.00 or
more per day. The calculations would be as follows:
4 The critical interval for percentile ranks is the interval where X is located.
FIGURE 5:5
EPA CONSULTING FEES
Interval                 Y            f     cf     c%
1                        $250-$500    32    400    100%
2                        $200-$249    27    368    92%
3                        $160-$199    48    341    85.25%
4 (critical interval)    $130-$159    72    293    73.25%
5                        $100-$129    92    221    55.25%
6                        $80-$99      51    129    32.25%
7                        $60-$79      35    78     19.50%
8                        $40-$59      43    43     10.75%
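The cumulative percentage column in Figure 5:5 can be reproduced with a short Python sketch. The variable names are illustrative; the frequencies are those in the figure, listed from interval 8 up to interval 1.

```python
from itertools import accumulate

# Frequencies (f) from Figure 5:5, from the lowest interval (8) to the highest (1).
freqs = [43, 35, 51, 92, 72, 48, 27, 32]

cum_freqs = list(accumulate(freqs))             # cf column
n = cum_freqs[-1]                               # total number of values
cum_pcts = [cf * 100 / n for cf in cum_freqs]   # cumulative percentage column

for cf, pct in zip(cum_freqs, cum_pcts):
    print(cf, f"{pct:.2f}%")
```

Each cumulative percentage is simply the cumulative frequency divided by the 400 values in the distribution and multiplied by 100.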
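Steps for solution:
Question: What is the percentile rank for $135.00?
Step 1: (f ÷ n)(100) = (72 ÷ 400)(100) = 18% of the values are in the critical interval.
Step 2: X - L = $135.00 - $129.50 = $5.50
PR = 55.25 + (.18)(18), with 5.50 ÷ 30 rounded to .18
= 55.25 + 3.24 = 58.49
Conclusion:
Round the resulting value to the nearest whole number. Therefore 58% make $135.00 or less per
day. 42% make $135.00 or more per day.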
As shown above, when calculating a percentile rank, an additional column is added to the
solution matrix. This is called the cumulative percentage (c%) column. This addition is
logical because the final result will be a percentage. Finding the
critical interval for calculating percentile ranks is a simple process because one knows the value
(X). In this example, the value for X is $135. $135.00 clearly falls in interval 4 ($130-159). The
real lower limit of that interval is $129.50. After the cumulative percentage (c%) column
has been calculated, it is clear that 55.25% (221 ÷ 400) of the values are located in intervals 5, 6,
7, and 8, which are all below the critical interval (4).
One now knows that the percentile rank of $135.00 is somewhere between 55.25% and
73.25%, the cumulative percentages at the tops of intervals 5 and 4. The width of the critical
interval is $30.00.
The logic employed in step 2 is that $135.00 is $5.50 above the real lower limit of the critical
interval ($135.00 minus $129.50 is equal to $5.50). This measures the distance of X from the
bottom of the interval. The calculation (f ÷ n × 100) in step 1 (72 ÷ 400 × 100) is necessary to
determine the percentage of the values in the critical interval. In this example, 18% of the values
are in interval 4. Since $135.00 is not at the top of the interval, one does not add all 18% in the
interval to the 55.25% of the values below the interval. In this case, only .18 (5.5 ÷ 30) of the
18%, or 3.24%, in the critical interval is added to the 55.25% below. The final result, rounded, is 58%. One
then concludes that 58% make $135.00 or less, and 42% make $135.00 and above per day.
Graphically this is shown in Figure 5:6.
FIGURE 5:6
PERCENTILE RANKS FOR $135 FOR EPA FEES
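The percentile rank calculation for grouped data can be sketched in Python as follows. The function name `percentile_rank_grouped` is hypothetical, and the code assumes whole-dollar values, so that real limits lie 0.5 beyond the stated limits. Carried out without rounding the intermediate proportion 5.5 ÷ 30, the computation gives about 58.55 rather than the rounded figure of 58 reported in the text.

```python
def percentile_rank_grouped(intervals, x):
    """Estimate the percentile rank of x from a grouped frequency distribution.

    intervals: list of (lower, upper, f) with stated limits, lowest first.
    Assumption: values are whole units, so real limits lie 0.5 beyond the
    stated limits (for example, $130-$159 becomes 129.5 to 159.5).
    """
    n = sum(f for _, _, f in intervals)
    cfb = 0                                  # cumulative frequency below
    for lower, upper, f in intervals:
        L, U = lower - 0.5, upper + 0.5      # real limits of this interval
        if L <= x <= U:                      # x falls in this (critical) interval
            within = (x - L) / (U - L) * f   # share of the interval at or below x
            return (cfb + within) / n * 100
        cfb += f
    raise ValueError("x lies outside the distribution")

# EPA consulting fees from Figure 5:5, lowest interval first.
epa = [
    (40, 59, 43), (60, 79, 35), (80, 99, 51), (100, 129, 92),
    (130, 159, 72), (160, 199, 48), (200, 249, 27), (250, 500, 32),
]
print(round(percentile_rank_grouped(epa, 135), 2))
```

Adding the interpolated share of the critical interval to cfb before dividing by n is equivalent to adding the corresponding share of the interval's 18% to the 55.25% below it.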
Percentiles and percentile ranks are useful statistical tools. This is especially true of the median
or 50th percentile which is the measure of central tendency that is most useful in evaluating
severely skewed distributions. There are many applications in statistical research which make
practical use of percentiles and percentile ranks. A review of the sequential steps for calculating
percentiles and percentile ranks appears at the end of this chapter.
A major idea:
The Median of a distribution of values
is the 50th percentile. A percentile is a
value and a percentile rank is a percent.
SEQUENTIAL STATISTICAL STEPS
Finding Percentiles
Step 1 (Organize Data Matrix): What is the first operation that must be performed in an effort to
find percentiles? Construct a solution matrix.
Step 2 (kth): What percentile is to be obtained? It can be the median (50th) or
any other percentile selected.
Step 3 (P): What is the position of the percentile you wish to calculate?
Solve for P [P = (k ÷ 100)(n)] to identify the critical interval.
Step 4 (L): What is the real lower limit (L) of the critical interval? This is
the lowest value that can be included in the interval.
Step 5 (cf): What is the cumulative frequency (cf) of all the intervals below
the critical interval? This should be part of your solution matrix.
Step 6 (f): What is the frequency of values within the critical interval?
This is the number of values that fall within that interval.
Step 7 (i): What is the size or width of the critical interval? The size is
determined by subtracting the lower limit of the interval from the
upper limit (U-L).
Step 8 (Mathematical Computation): Substitute the appropriate values within the formula and solve.
Step 9 (Conclusions): Draw conclusions based on the final result of your data analysis.
SEQUENTIAL STATISTICAL STEPS
Finding Percentile Ranks
Step 1 (Organize Data Matrix): What is the first operation that must be performed in an effort to
find percentile ranks? Construct a solution matrix.
Step 2 (X): What is the raw value or score for which you wish to calculate a
percentile rank (PR)? The value may be any value which occurs
within the distribution. It must be a whole number.
Step 3 (c%): What is the cumulative percentage of all the values below the
critical interval? cf÷n. Add the frequencies and divide by the
total number of values in the distribution.
Step 4 (L): What is the real lower limit (L) of the critical interval? This is
the lowest value that can be included in the interval in which X is
located.
Step 5 (i): What is the size or width of the critical interval? The size is
determined by subtracting the lower limit of the interval from the
upper limit (U-L).
Step 6: What is the frequency of values in the critical interval divided by
the total number of values and multiplied by 100? The result is
the percentage of the values in the critical interval.
Step 7 (Mathematical Computation): Substitute the appropriate values within the formula and solve.
Step 8 (Conclusions): Draw conclusions based on the final result of your data analysis.
Exercises – Chapter 5
(1) Define the following terms:
(A) Percentile
(B) Percentile Rank
(C) Quartile
(D) Decile
(E) Critical Interval
(F) Cumulative Frequency
(G) Cumulative Percentage
(H) Width
(2) The average number of days a sample of Social Security recipients had to wait for
approval of their forms in 1996 were as follows:
70, 70, 90, 20, 20, 20, 21, 25, 26, 27, 28, 30, 30, 31, 31, 31, 32, 34, 35, 36, 36, 67, 74, 84,
71, 14, 36, 86, 40, 40, 41, 45, 46, 47, 48, 49, 50, 50, 52, 53, 55, 58, 59, 15, 91, 100, 101.
Beginning with the value 10, organize these data in intervals (widths) of 15 and find the
median and the percentile rank of 30 days waiting time. What conclusions can you draw
from your answers about the waiting time of the recipients? Calculate the mean and
mode for this distribution.
(3) Find the 75th percentile or 3rd quartile and percentile rank of 130 and 145 for the
following distribution. Identify the mean and median.
X          f
151-160    3
141-150    4
131-140    6
121-130    9
111-120    6
101-110    4
91-100     5
81-90      1
(4) A class of 15 students received the following scores on a quiz:
0, 3, 3, 6, 6, 6, 9, 12, 12, 12, 12, 15, 15, 15, 15
a) Construct a simple frequency distribution
b) Calculate the percentile rank for a student’s score of 12
c) Find the 75th percentile
(5) The following are final course grades:
101, 94, 89, 89, 89, 88, 85, 82, 81, 78, 77, 77, 77, 76, 76, 74, 73, 73, 73, 72, 71, 71, 71,
69, 67, 67, 63, 54, 46, 45, 34
Beginning with the value 30, organize these data in intervals of 9 and calculate the
median. What is the mean? What is the 8th decile? What is the first quartile? What is
the percentile rank of 68?
(6)
Charles and Mary scored 29 and 31 on the ACT test. The determining factor for a college
scholarship is that a student's score be in the top 10% of their graduating class. Charles
and Mary's graduating class obtained the following ACT scores.
ACT Scores    f
34-36         5
31-33         6
28-30         10
25-27         15
22-24         26
19-21         5
16-18         4
13-15         5
10-12         2
7-9           2
4-6           1
By making use of percentiles and percentile ranks, did Charles and Mary receive a
scholarship? Joe obtained a score of 23. One of the criteria for being admitted to the
college to which he applied is to be in the upper half of his class. Did Joe get admitted?
What is the mean ACT score for the class? Show all work and cross check.
(7) Find the 3rd, 5th, and 9th deciles for the following distribution of scores on an
achievement test. What are the median and mean scores?
Class interval    f
42-44             1
39-41             2
36-38             3
33-35             4
30-32             7
27-29             5
24-26             3
21-23             2
What are the percentile ranks for 41, 31, and 25?
(8) The following values are gross average daily earnings for construction (X) and
manufacturing (Y) workers in 1996.
X = 48, 51, 57, 64, 70, 71, 83, 91, 72, 63, 64, 48, 70, 36, 30, 27, 76, 66, 78, 88, 89, 87,
45, 50, 93, 20, 21, 24
Y = 48, 52, 57, 65, 73, 74, 82, 84, 86, 87, 92, 99, 34, 35, 28, 49, 55, 53, 71, 75, 90, 62,
26, 66, 50, 20, 21, 22
Beginning with 20, organize these data in intervals of 5 and calculate the mean, median
and mode for each distribution. What is the percentile rank for $50.00? The top 20% of
which group makes less?