CHAPTER 5
PERCENTILES AND PERCENTILE RANKS

Percentiles and percentile ranks are frequently used as indicators of performance in both the academic and corporate worlds. They provide information about how a person or thing relates to a larger group. Relative measures of this type are often extremely valuable to researchers employing statistical techniques. For example, most nationally standardized test scores are reported as percentile ranks. Many graduate schools admit only candidates who score in the upper fifty percent of those taking certain standardized tests. Honor students are frequently determined by selecting those who fall in the top ten percent of their class. Researchers may want to know which Congresspersons fall in the bottom ten percent of the ratings by ADA.[1] These and other applications emphasize the need for students to be knowledgeable in this important area of statistics. This chapter explains the meaning of these statistics and gives instructions for calculating them using data provided in frequency and grouped frequency distributions.

[1] Receiving a low rating by Americans for Democratic Action would probably mean that a Congressperson is very conservative.

Percentiles and percentile ranks are closely related statistics. Percentiles are calculated as a means of dividing a distribution of values into two or more groups. They are used to determine where to draw the line between observed values within the distribution. For example, if a teacher wishes to determine the exam score that divides his class in half, with 50% scoring above and 50% scoring below, he determines the point that marks the 50th percentile.[2]

[2] The 50th percentile is identical to one of the measures of central tendency: the median.

In statistics, certain percentiles have special names. Deciles are percentiles which divide a distribution into ten equal sections.
There are nine deciles in each distribution of values, which correspond to the 10th, 20th, 30th, ..., 90th percentiles. Quartiles are points in a distribution which divide that distribution into quarters. The first quartile (Q1) is the 25th percentile. The second quartile (Q2) is the 50th percentile, or the median. The third quartile (Q3) is the 75th percentile. These quartiles are shown in Figure 5:1.

FIGURE 5:1 LOWER QUARTILE, MEDIAN, UPPER QUARTILE

A percentile rank is used to determine where a particular score or value fits within a broader distribution. For example, a student receives a score of 75 out of 100 on an exam and wishes to determine how her score compares to the rest of the class. She calculates a percentile rank for a score of 75 based on the reported scores of the entire class. Her percentile rank in this example would be 80, meaning that 80 percent of the scores on the exam were at or below 75. This point is illustrated in Figure 5:2.

FIGURE 5:2 PERCENTILE RANK FOR SCORE OF 75

If the data are properly organized and the appropriate formulas carefully followed, the sequential statistical steps for finding percentiles and percentile ranks are not difficult. Researchers working with percentiles begin with the desired percentage, which is used to calculate the specific value that marks the appropriate dividing point within the distribution. Researchers working with percentile ranks begin with a specific score or value and calculate the percentage of cases falling at or below it.

The formula for finding percentiles is given here; exactly how it is used will be explained momentarily.

FORMULA FOR FINDING PERCENTILES (Simple Frequency Distribution)

\[ k\text{th} = \text{the value at position } P, \qquad P = \frac{k}{100}\,(n) \]

Definitions of the Symbols in the Formula

kth = The percentile one wishes to calculate. The answer will be a value.
P = The position within the distribution that marks the percentile one wishes to calculate. For example, if one wishes to find the 50th percentile and calculates P = 5, the 50th percentile is equal to the 5th value in the distribution. Always begin counting with the lowest value, and round the value of P to the nearest whole number.
n = The total number of values in the distribution.

This is a simple formula that yields the precise location of each percentile within a distribution. Unfortunately, researchers are often unable to obtain data sufficient for the use of this formula. Rather than reporting each specific observed value, data are often presented in the form of a grouped frequency distribution. The use of such distributions reduces the precision of measurement, but it is still possible to calculate a close approximation of any given percentile with a second formula.

FORMULA FOR FINDING PERCENTILES (Grouped Frequency Distribution)

\[ k\text{th} = L + \left(\frac{P - \mathit{cfb}}{f}\right)(U - L) \]

Definitions of the Symbols in the Formula

kth = The percentile one wishes to calculate. The answer will be a value.
P = (k ÷ 100)(n), where k is the percentile and n is the number of values in the distribution. For example, if one wanted to find the 50th percentile and there were 400 values (n) in the distribution, P would be the 200th value, or (50 ÷ 100)(400) = 200.
L = Lower limit of the critical interval. The critical interval is designated "critical" because it is the interval within which the percentile will occur, that is, the interval where P occurs. The percentile will be a value at or between the lowest and highest values of that interval. The lower limit is the lowest possible value of the critical interval. For example, the real lower limit for a critical interval of $200-$249 would be $199.50, because all values at or above $199.50 would be rounded up to $200 and included in the interval.
cfb = Cumulative frequency of all the intervals below (but not including) the critical interval.
f = Frequency in the critical interval.
U = Upper limit of the critical interval. This is the highest value that would be included in the critical interval. For example, the 80-89 interval on an exam would typically have an upper limit of 89.5.

The next step in explaining percentiles is to apply these concepts to some real data. Suppose a researcher had data on consulting fees per day paid by the Environmental Protection Agency (EPA) for a sample of 400 consultants and wanted to know what the top ten percent earned per day. First the data are organized in a data matrix such as the one shown in Figure 5:3. Based on these data, the above question can be answered by finding the appropriate percentile.

FIGURE 5:3 EPA CONSULTING FEES

Interval   Y            f     cf
1          $250-$500    32    400
2          $200-$249    27    368   (critical interval: P, the 360th value, occurs here)
3          $160-$199    48    341
4          $130-$159    72    293
5          $100-$129    92    221
6          $80-$99      51    129
7          $60-$79      35    78
8          $40-$59      43    43

Solution:

(1) Question: What is the 90th[3] percentile?

\[ P = \frac{90}{100}\,(400) = 360 \]

\[ 90\text{th} = 199.50 + \left(\frac{360 - 341}{27}\right)(249.50 - 199.50) \approx 199.50 + (.70)(50) = 234.50 \]

Research Conclusion: The top ten percent make $234.50 to $500.00 per day. Ninety percent make at or below $234.50 per day.

[3] Finding the 90th percentile will enable the researcher to know what the top 10% earned per day.

Once the data are organized, calculating a percentile is a very logical process. The 360th value (P) is in interval 2 ($200-$249) because the 342nd through 368th values are in this critical interval, and its real lower limit is $199.50 (L). The 90th percentile is the 360th value and lies between $200 and $249. There are 341 values (cfb) below the critical interval and 27 values (f) in the critical interval. The upper limit of the critical interval is $249.50 (U). In order to reach the 360th value, one must have 19 of the 27 values in the critical interval because 19 plus the 341 values below the critical interval is equal to 360.
The formula then produces an estimate of how far the 19th value in the interval falls from the lower limit of the critical interval: approximately 70% (19 ÷ 27) of the $50 width (U − L) is $35. The $35 is added to the real lower limit of $199.50, and the value which is equivalent to the 90th percentile is $234.50. Ten percent of the values in the distribution are above $234.50, and ninety percent are at or below $234.50. Graphically this conclusion is shown in Figure 5:4.

FIGURE 5:4 90TH PERCENTILE FOR EPA FEES

The preceding example demonstrates that calculation of a percentile begins with determination of the desired percentage, which is then used to find a value. Calculating a percentile rank involves the opposite procedure. One begins with a value and calculates the percentage of cases falling at or below it. The formula for finding percentile ranks using a simple frequency distribution is as follows:

FORMULA FOR CALCULATING PERCENTILE RANK (Simple Frequency Distribution)

\[ PR = \frac{X_p}{n} \times 100 \]

PR = Percentile rank. The answer will be a percentage.
Xp = The position of the score within the distribution. Begin with the lowest value and count the number of cases until reaching the score under consideration. Be sure to include the score under consideration and all those of equal value when determining Xp.
n = The total number of cases in the distribution.

This formula provides a simple and accurate value for the percentile rank of any given value within a distribution. As the discussion of percentiles demonstrated, however, researchers often lack the data necessary for the use of this formula. When data are presented in a grouped frequency distribution, the formula for calculating the percentile rank is as follows:

FORMULA FOR CALCULATING PERCENTILE RANK (Grouped Frequency Distribution)

\[ PR = \frac{\mathit{cfb}}{n}\,(100) + \left(\frac{X - L}{U - L}\right)\left(\frac{f}{n}\right)(100) \]

PR = Percentile rank. The answer will be a percentage.
cfb = Cumulative frequency of all the values below the critical interval.[4]
X = Raw score or value for which one wants to find a percentile rank.
L = Lower limit of the critical interval.
U = Upper limit of the critical interval.
f = Frequency of the values in the critical interval.

[4] The critical interval for percentile ranks is the interval where X is located.

In order to explain how percentile ranks are calculated, the same data used for calculating percentiles will be used. The data are organized in a solution matrix shown in Figure 5:5. Assume that a researcher wanted to know the percentage of consultants who made $135.00 or more per day. The calculations would be as follows:

FIGURE 5:5 EPA CONSULTING FEES

Interval   Y            f     cf     c%
1          $250-$500    32    400    100%
2          $200-$249    27    368    92%
3          $160-$199    48    341    85.25%
4          $130-$159    72    293    73.25%   (critical interval)
5          $100-$129    92    221    55.25%
6          $80-$99      51    129    32.25%
7          $60-$79      35    78     19.50%
8          $40-$59      43    43     10.75%

Steps for solution:

Question: What is the percentile rank for $135.00?

\[ \text{Step 1: } \frac{f}{n}\,(100) = \frac{72}{400}\,(100) = 18\% \]

\[ \text{Step 2: } X - L = 135.00 - 129.50 = 5.50 \]

\[ PR = 55.25 + \left(\frac{5.50}{30}\right)(18) \approx 55.25 + (.18)(18) = 58.49 \]

Conclusion: Round the resulting value to the nearest whole number. Therefore 58% make $135.00 or less per day, and 42% make $135.00 or more per day.

As shown above, when calculating a percentile rank, an additional column is added to the solution matrix. This is called the cumulative percentage (c%) column. The addition is logical because the final result will be a percentage. Finding the critical interval for calculating percentile ranks is a simple process because one knows the value (X). In this example, the value for X is $135.00, which clearly falls in interval 4 ($130-$159). The real lower limit of that interval is $129.50. After the cumulative percentage (c%) column has been calculated, it is clear that 55.25% (221 ÷ 400) of the values are located in intervals 5, 6, 7, and 8, which are all below the critical interval (4). One now knows that $135.00 has a percentile rank somewhere between 55.25% and 73.25%, the cumulative percentages at the tops of intervals 5 and 4. The width of the critical interval is $30.00.
The logic employed in step 2 is that $135.00 is $5.50 above the real lower limit of the critical interval ($135.00 minus $129.50 is equal to $5.50). This measures the distance of X from the bottom of the interval. The calculation (f ÷ n × 100) in step 1 (72 ÷ 400 × 100) is necessary to determine the percentage of the values in the critical interval. In this example, 18% of the values are in interval 4. Since $135.00 is not at the top of the interval, one does not add all 18% in the interval to the 55.25% of the values below the interval. In this case, only .18 (5.5 ÷ 30) of the 18%, or 3.24%, is added to the 55.25% below. The result is 58.49%, which rounds to 58%. One then concludes that 58% make $135.00 or less, and 42% make $135.00 or more per day. Graphically this is shown in Figure 5:6.

FIGURE 5:6 PERCENTILE RANKS FOR $135 FOR EPA FEES

Percentiles and percentile ranks are useful statistical tools. This is especially true of the median, or 50th percentile, which is the measure of central tendency that is most useful in evaluating severely skewed distributions. There are many applications in statistical research which make practical use of percentiles and percentile ranks. A review of the sequential steps for calculating percentiles and percentile ranks appears at the end of this chapter.

A major idea: The median of a distribution of values is the 50th percentile. A percentile is a value and a percentile rank is a percent.

SEQUENTIAL STATISTICAL STEPS
Finding Percentiles

Step 1 (Organize Data Matrix): What is the first operation that must be performed in an effort to find percentiles? Construct a solution matrix.
Step 2 (kth): What percentile is to be obtained? It can be the median (50th) or any other percentile selected.
Step 3 (P): What is the position of the percentile you wish to calculate? Solve for P [P = (k ÷ 100)(n)] to identify the critical interval.
Step 4 (L): What is the real lower limit (L) of the critical interval? This is the lowest value that can be included in the interval.
Step 5 (cf): What is the cumulative frequency (cf) of all the intervals below the critical interval? This should be part of your solution matrix.
Step 6 (f): What is the frequency of values within the critical interval? This is the number of values that fall within that interval.
Step 7 (i): What is the size or width of the critical interval? The size is determined by subtracting the lower limit of the interval from the upper limit (U − L).
Step 8 (Mathematical Computation): Substitute the appropriate values within the formula and solve.
Step 9 (Conclusions): Draw conclusions based on the final result of your data analysis.

SEQUENTIAL STATISTICAL STEPS
Finding Percentile Ranks

Step 1 (Organize Data Matrix): What is the first operation that must be performed in an effort to find percentile ranks? Construct a solution matrix.
Step 2 (X): What is the raw value or score for which you wish to calculate a percentile rank (PR)? The value may be any value which occurs within the distribution. It must be a whole number.
Step 3 (c%): What is the cumulative percentage of all the values below the critical interval (cf ÷ n)? Add the frequencies and divide by the total number of values in the distribution.
Step 4 (L): What is the real lower limit (L) of the critical interval? This is the lowest value that can be included in the interval in which X is located.
Step 5 (i): What is the size or width of the critical interval? The size is determined by subtracting the lower limit of the interval from the upper limit (U − L).
Step 6: What is the frequency of values in the critical interval divided by the total number of values and multiplied by 100? The result is the percentage of the values in the critical interval.
Step 7 (Mathematical Computation): Substitute the appropriate values within the formula and solve.
Step 8 (Conclusions): Draw conclusions based on the final result of your data analysis.
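The sequential steps above for grouped frequency distributions can be sketched in Python. This is only an illustration under stated assumptions, not part of the chapter's method: the names `grouped_percentile`, `grouped_percentile_rank`, `EPA_FEES`, and `HALF_UNIT` are introduced here, and real limits are assumed to lie 0.5 beyond the stated limits because the fees are whole dollars. The sketch carries full decimal precision rather than rounding the fraction to two places, so it gives about $234.69 and 58.55% where the hand calculations in this chapter report $234.50 and 58%.

```python
from typing import List, Tuple

# One row per interval: (stated lower limit, stated upper limit, frequency),
# listed from the lowest interval to the highest (the reverse of Figure 5:3).
Interval = Tuple[float, float, int]

EPA_FEES: List[Interval] = [
    (40, 59, 43), (60, 79, 35), (80, 99, 51), (100, 129, 92),
    (130, 159, 72), (160, 199, 48), (200, 249, 27), (250, 500, 32),
]

# Assumption: values are whole dollars, so real limits sit 0.5 outside stated limits.
HALF_UNIT = 0.5


def grouped_percentile(intervals: List[Interval], k: float) -> float:
    """Estimate the kth percentile as L + ((P - cfb) / f) * (U - L)."""
    if not 0 <= k <= 100:
        raise ValueError("k must be between 0 and 100")
    n = sum(f for _, _, f in intervals)
    p = k * n / 100  # position P of the percentile
    cfb = 0          # cumulative frequency below the current interval
    for low, high, f in intervals:
        if f > 0 and cfb + f >= p:  # critical interval: P occurs here
            lower, upper = low - HALF_UNIT, high + HALF_UNIT
            return lower + (p - cfb) / f * (upper - lower)
        cfb += f
    raise ValueError("distribution has no observations")


def grouped_percentile_rank(intervals: List[Interval], x: float) -> float:
    """Estimate the percentile rank of x as (cfb + ((x - L) / (U - L)) * f) / n * 100."""
    n = sum(f for _, _, f in intervals)
    cfb = 0
    for low, high, f in intervals:
        lower, upper = low - HALF_UNIT, high + HALF_UNIT
        if lower <= x <= upper:  # critical interval: x is located here
            return (cfb + (x - lower) / (upper - lower) * f) / n * 100
        cfb += f
    raise ValueError("x lies outside the distribution")


if __name__ == "__main__":
    print(f"{grouped_percentile(EPA_FEES, 90):.2f}")        # 234.69
    print(f"{grouped_percentile_rank(EPA_FEES, 135):.2f}")  # 58.55
```

Listing the intervals from lowest to highest lets a single running total serve as cfb, which mirrors reading the cf column of the solution matrix from the bottom up.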
Exercises – Chapter 5

(1) Define the following terms:
(A) Percentile
(B) Percentile Rank
(C) Quartile
(D) Decile
(E) Critical Interval
(F) Cumulative Frequency
(G) Cumulative Percentage
(H) Width

(2) The average number of days a sample of Social Security recipients had to wait for approval of their forms in 1996 were as follows: 70, 70, 90, 20, 20, 20, 21, 25, 26, 27, 28, 30, 30, 31, 31, 31, 32, 34, 35, 36, 36, 67, 74, 84, 71, 14, 36, 86, 40, 40, 41, 45, 46, 47, 48, 49, 50, 50, 52, 53, 55, 58, 59, 15, 91, 100, 101. Beginning with the value 10, organize these data in intervals (widths) of 15 and find the median and the percentile rank of 30 days waiting time. What conclusions can you draw from your answers about the waiting time of the recipients? Calculate the mean and mode for this distribution.

(3) Find the 75th percentile (3rd quartile) and the percentile ranks of 130 and 145 for the following distribution. Identify the mean and median.

X          f
151-160    3
141-150    4
131-140    6
121-130    9
111-120    6
101-110    4
91-100     5
81-90      1

(4) A class of 15 students received the following scores on a quiz: 0, 3, 3, 6, 6, 6, 9, 12, 12, 12, 12, 15, 15, 15, 15
a) Construct a simple frequency distribution.
b) Calculate the percentile rank for a student's score of 12.
c) Find the 75th percentile.

(5) The following are final course grades: 101, 94, 89, 89, 89, 88, 85, 82, 81, 78, 77, 77, 77, 76, 76, 74, 73, 73, 73, 72, 71, 71, 71, 69, 67, 67, 63, 54, 46, 45, 34. Beginning with the value 30, organize these data in intervals of 9 and calculate the median. What is the mean? What is the 8th decile? What is the first quartile? What is the percentile rank of 68?

(6) Charles and Mary scored 29 and 31 on the ACT test. The determining factor for a college scholarship is that a student's score be in the top 10% of their graduating class. Charles and Mary's graduating class obtained the following ACT scores.

ACT Scores    f
34-36         5
31-33         6
28-30         10
25-27         15
22-24         26
19-21         5
16-18         4
13-15         5
10-12         2
7-9           2
4-6           1

By making use of percentiles and percentile ranks, determine whether Charles and Mary received a scholarship. Joe obtained a score of 23. One of the criteria for being admitted to the college to which he applied is to be in the upper half of his class. Did Joe get admitted? What is the mean ACT score for the class? Show all work and cross check.

(7) Find the 3rd, 5th, and 9th deciles for the following distribution of scores on an achievement test. What are the median and mean scores?

Class interval    f
42-44             1
39-41             2
36-38             3
33-35             4
30-32             7
27-29             5
24-26             3
21-23             2

What are the percentile ranks for 41, 31, and 25?

(8) The following values are gross average daily earnings for construction (X) and manufacturing (Y) workers in 1996.

X = 48, 51, 57, 64, 70, 71, 83, 91, 72, 63, 64, 48, 70, 36, 30, 27, 76, 66, 78, 88, 89, 87, 45, 50, 93, 20, 21, 24
Y = 48, 52, 57, 65, 73, 74, 82, 84, 86, 87, 92, 99, 34, 35, 28, 49, 55, 53, 71, 75, 90, 62, 26, 66, 50, 20, 21, 22

Beginning with 20, organize these data in intervals of 5 and calculate the mean, median and mode for each distribution. What is the percentile rank for $50.00? The top 20% of which group makes less?