The Median and Semi-Interquartile Range: Central Tendency and Variability of Ordinal Data

Of all of the statistics that we will encounter, the median (Me) and semi-interquartile range (SQR) may be the most difficult to calculate. Remember that these statistics are used to obtain the central tendency and variability of a set of ordinal data.

The median (Me) is simple to define. It is the point (or theoretical score) that 50% of the data (scores) exceed and 50% of the data (scores) fall below. Before we continue, it may be helpful to define two additional points: Q1 and Q3. Q1 is the point that 75% of the data exceed and 25% of the data fall below. Q3 is the point that 25% of the data exceed and 75% of the data fall below. The semi-interquartile range is defined as:

SQR = (Q3 - Q1)/2

Although the definitions and formulas appear simple enough, the calculations can become very tedious. There are several trivial cases where the calculation of the median (Me) and SQR does not pose large problems. We will deal with these trivial cases first.

First Trivial Case (even number of discrete cases)

The first trivial case deals with an even number of cases with discrete scores. By discrete I mean that no two people have the same score. For example, suppose we collected the following subjective ratings of pain for eight patients. Higher scores indicate higher subjective ratings of pain.

Subjective Pain Ratings
    87
    73
    71
    68
    65
    51
    49
    21

Notice that in the example I have arranged the scores in order. (You will not need to arrange your data in order if you are performing the calculations through the Stats.Exe program; it will do that for you.) The first step in our process is to find Q1, the 25th percentile. To accomplish this task, we multiply our sample size by 25% (8 x .25 = 2), which tells us that two scores must fall below Q1. Next, we'll begin to insert the bounds. The bounds are the numbers exactly halfway between two scores.
To calculate the bound between two scores, simply add the two scores and divide by two (see the example below).

Subjective Pain Ratings
    87
        bound = 80
    73
        bound = 72
    71
        bound = 69.5
    68
        bound = 66.5
    65
        bound = 58
    51
        bound = 50
    49
        bound = 35
    21

Now, we should examine the bounds to see if one of them will work as the point Q1. Let's try the very bottom bound (35). If this bound were Q1, would we have two scores (25% of the eight scores) less than this point? No; only one score (21) falls below it. Since this bound is too small, we need to move up to the next bound, the bound of 50. Now, do we have two scores less than this point? Yes, therefore 50 is Q1.

To find the median, multiply the sample size by 50% to find the number of scores that must be below the median (8 x .50 = 4). Continue to move up the bounds until you find the bound that separates the sample into two groups of four, which is the middle bound (66.5). This bound must be the median (Me = 66.5).

Our final step is to find Q3. To accomplish this, multiply the sample size by 75% (8 x .75 = 6). As before, move to the bound which separates the sample into groupings of six scores below and two scores above. The bound is 72 (Q3 = 72). At this point we can calculate SQR, using the formula below.

SQR = (Q3 - Q1)/2
SQR = (72 - 50)/2
SQR = 22/2 = 11

Therefore, we can conclude that our median is 66.5 and SQR is 11. To help interpret what these numbers imply, consider the theoretical distribution for Trivial Case I.

[Figure: Theoretical Distribution for Trivial Case I (percentages). Axis marks: Me - SQR = 55.5, Me = 66.5, Me + SQR = 77.5; regions labeled 25%.]

What the theoretical distribution implies is that 25% of our data are between the median and the median plus an SQR, and 25% of our data are between the median minus an SQR and the median.

Second Trivial Case (odd number of discrete cases)

To demonstrate the second trivial case, we will add one more score (a score of 10) to the above example.
The revised data with the bounds inserted are shown below.

Subjective Pain Ratings
    87
        bound = 80
    73
        bound = 72
    71
        bound = 69.5
    68
        bound = 66.5
    65
        bound = 58
    51
        bound = 50
    49
        bound = 35
    21
        bound = 15.5
    10

Now, as in the previous trivial case, our first step is to find Q1 by multiplying our new sample size (9) by 25% (9 x .25 = 2.25). If, as in the previous trivial case, we attempted to establish Q1 at the second bound (35), you can see that we would have two subjects less than this bound. However, we need exactly 2.25 subjects (literally 2 subjects plus a fraction (.25) of a third). Therefore, this bound would not suffice, nor would the next bound (50), which has three subjects less than it. At this point we must interpolate between the two bounds by using the following formula:

D = (upper bound - lower bound) x Fraction

In our case:

D = (50 - 35) x .25
D = (15) x .25
D = 3.75

Now, Q1 will be given by the following formula:

Q1 = lower bound + D

In our case:

Q1 = 35 + 3.75 = 38.75

The next step in our process is to find Q3, by determining 75% of our sample size (9 x .75 = 6.75). In this example our lower bound would be 69.5, because below this bound we have six subjects and we need exactly 6.75. The upper bound is the next bound, 72. At this point we need to apply our "D" formula again, as follows:

D = (72 - 69.5) x .75
D = 2.5 x .75
D = 1.875 (rounded to 1.88)

Now, Q3 will be given by the following formula:

Q3 = lower bound + D

In our case:

Q3 = 69.5 + 1.875 = 71.375 (rounded to 71.38)

Our last task is to find the median, a simple task with an odd number of discrete scores. Our median is the middle score (Me = 65.0).

SQR = (Q3 - Q1)/2
SQR = (71.375 - 38.75)/2
SQR = 32.63/2
SQR = 16.32 (rounded to 16.3)

Therefore, we have a distribution with a median of 65 and an SQR of 16.32.

[Figure: Theoretical Distribution for Trivial Case II (percentages). Axis marks: Me - SQR = 48.7, Me = 65.0, Me + SQR = 81.3; regions labeled 25%.]
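The bound-and-interpolate steps used in the two trivial cases can be sketched in a few lines of Python. This is only an illustration of the hand procedure above, not the Stats.Exe program; the function names point_by_bounds and me_and_sqr are made up for this sketch, and it assumes untied scores. The median is taken with Python's statistics.median, which agrees with the text in both trivial cases (the middle bound for eight scores, the middle score for nine). Note that the unrounded SQR for nine scores is 16.3125; the 16.32 in the text comes from rounding 32.625 to 32.63 before dividing.

```python
from statistics import median

def point_by_bounds(scores, fraction):
    """Find the point with `fraction` of the (untied) scores below it,
    using bounds between adjacent scores and the D formula."""
    ordered = sorted(scores)
    # bounds[i] sits halfway between ordered[i] and ordered[i + 1],
    # so exactly i + 1 scores fall below bounds[i].
    bounds = [(low + high) / 2 for low, high in zip(ordered, ordered[1:])]
    needed = len(ordered) * fraction   # e.g. 9 x .25 = 2.25
    whole = int(needed)                # whole scores below the lower bound
    part = needed - whole              # leftover fraction of the next score
    if whole < 1 or whole > len(bounds) or (part > 0 and whole == len(bounds)):
        raise ValueError("sample too small for this fraction")
    lower = bounds[whole - 1]
    if part == 0:
        return lower                   # a bound splits the sample exactly
    upper = bounds[whole]
    return lower + (upper - lower) * part  # lower bound + D

def me_and_sqr(scores):
    """Return (median, SQR) for a list of untied scores."""
    q1 = point_by_bounds(scores, 0.25)
    q3 = point_by_bounds(scores, 0.75)
    return median(scores), (q3 - q1) / 2

case_one = [87, 73, 71, 68, 65, 51, 49, 21]   # eight pain ratings
case_two = case_one + [10]                    # nine pain ratings

print(me_and_sqr(case_one))   # (66.5, 11.0)
print(me_and_sqr(case_two))   # (65, 16.3125)
```

The sketch refuses very small samples (where no bound has enough scores below it) rather than guessing, since the text does not say how to handle them.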
The theoretical distribution of our nine pain scores should fit the observed distribution. Notice that theoretically 25% of our scores should be between 48.7 and 65 and 25% of our scores should be between 65 and 81.3. We actually had 11% and 33%, respectively. This difference in theoretical and actual percentages is not surprising considering that we started with only nine scores. Had our original sample been larger, our theoretical and actual percentages would have been closer.

The Non-Trivial Case (Data in Groups)

Determining the Median and SQR for Grouped Data

Many times data will be collected in some type of grouping format and not as individual discrete scores. For example, in a survey conducted by one of my students, the estimated age of the person being surveyed was collected on the following scale.

Age Range
60 - 69
50 - 59
40 - 49
30 - 39
20 - 29
10 - 19

When data are grouped, as opposed to discrete scores, the computational procedures for the median and SQR are modified somewhat. To demonstrate the process, suppose that our survey results were as follows:

Age Range     f
60 - 69       7
50 - 59      21
40 - 49      25
30 - 39      15
20 - 29       3
10 - 19       2

The bounds between the various categorical groupings are the midpoints between the groupings. For example, the bound between the categories of 20 - 29 and 30 - 39 is 29.5. Below are our scores with the bounds inserted.

Age Range     f
60 - 69       7
    bound = 59.5 ...........
50 - 59      21
    bound = 49.5 ...........
40 - 49      25
    bound = 39.5 ...........
30 - 39      15
    bound = 29.5 ...........
20 - 29       3
    bound = 19.5 ...........
10 - 19       2

Our next step is to find Q1 by determining 25% of our sample size (73 x .25 = 18.25). Our lower bound for the determination of Q1 is 29.5 and the upper bound for Q1 is 39.5. Below the lower bound for Q1 you can see that we have 5 cases. We need 18.25, which requires us to add 13.25 more cases above the lower bound. Now we use the modified "D" formula. The formula is as follows.
D = [(upper bound - lower bound) x cases to be added] / (number of cases between upper and lower bounds)

In our example:

D = [(39.5 - 29.5) x 13.25] / 15
D = 132.5/15 = 8.83

To determine Q1 all we need to do is add D to the lower bound (29.5), therefore:

Q1 = 29.5 + 8.83
Q1 = 38.33

To compute Q3 we will need to determine 75% of our sample size (73 x .75 = 54.75). As you can determine from the distribution of our scores, the point with 54.75 cases below it is located between the lower bound of 49.5 and the upper bound of 59.5. Below the lower bound we have 45 cases, so we need to include 9.75 more cases. Utilizing our modified D formula, we have:

D = [(59.5 - 49.5) x 9.75] / 21
D = 97.5/21
D = 4.64

Therefore, Q3 is given by:

Q3 = lower bound + D
Q3 = 49.5 + 4.64
Q3 = 54.14

and:

SQR = (54.14 - 38.33)/2
SQR = 15.81/2 = 7.90

Our final step is to calculate the median. We will need to determine 50% of our sample size (73 x .50 = 36.5). From the distribution of scores we can see that the point with 36.5 cases below it is located between the lower bound of 39.5 and the upper bound of 49.5. Below the bound of 39.5 are 20 cases, so we need to include 16.5 more cases. Our D formula becomes:

D = [(49.5 - 39.5) x 16.5] / 25
D = 165/25 = 6.6

Therefore, our median is obtained by adding this D to the lower bound of 39.5 (Me = 39.5 + 6.6 = 46.1). The theoretical distribution is given below.

[Figure: Theoretical Distribution for Non-Trivial Case II (percentages). Axis marks: Me - SQR = 38.2, Me = 46.1, Me + SQR = 54.0; regions labeled 25%.]

HOW TO PERFORM THE CALCULATIONS USING STATS.EXE

To use the program disk to obtain the median and SQR you will need to access your Stats.exe program and select the program called Median. I will first show how to obtain the median for the non-trivial case. Select the "start calculations" option key.
The program will ask if you want to enter the data from a file (yes) or from the keyboard (no). In the next screen, if you are calculating the median based on the trivial cases, enter the number one. If you are calculating the median for data in grouped form (non-trivial case two), enter the number two. (For this example, enter 2.) Next, use the data input screen to record your data, then select the finished option. The data are as follows:

60   69    7
50   59   21
40   49   25
30   39   15
20   29    3
10   19    2

Following are the results that will appear on the screen, with the bounds inserted between the scores.

...... 69.5 ....................
60   69    7
...... 59.5 ....................
50   59   21
...... 49.5 ....................
40   49   25
...... 39.5 ....................
30   39   15
...... 29.5 ....................
20   29    3
...... 19.5 ....................
10   19    2
...... 9.5 ....................

median = 46.1     SQR = 7.9047

[Plot: theoretical distribution with the middle 50% marked. Axis marks: Me - SQR = 38.2, Me = 46.1, Me + SQR = 54.]

Chapter Exercises

1. Suppose that a "fast food" (take-out) restaurant had developed a new batter for their fried chicken which reduced cholesterol and fats by 50% over any of their competitors' products. Your task is to determine average ratings of taste assigned by 28 people. Ratings were made on a 50-point scale, with 0 indicating poor flavor and 50 indicating excellent flavor. What do you conclude? Draw the theoretical distribution of scores. Is the sample adequate? Following are the ratings:

10, 50, 40, 30, 40, 50, 40, 40, 20, 40, 30, 30, 40, 50, 40, 30, 20, 40, 50, 30, 40, 30, 30, 40, 10, 50, 40, 40

2. Suppose that the restaurant had performed the same experiment some years before with their old batter. The data for the former study are summarized below. Find the median and compare these results with the results from the previous problem. Address the adequacy of the sample.
Taste Range    Frequency
 1 - 10            1
11 - 20           12
21 - 30           19
31 - 40           15
41 - 50            9

3. Now, suppose that the restaurant wanted to see if advertising the healthy aspects of their chicken preparation process, compared to their competitor's product, might improve their customers' ratings of taste. An advertising program was undertaken which included a new package for their chicken that boldly displayed the healthy aspects of their new chicken recipe. Additionally, newspaper and radio advertisements were run for two weeks focusing on the health benefits of their new chicken frying process. Following the advertising program, a research project again examined customers' ratings of their chicken's taste. Following are the data. What do you conclude?

10, 24, 13, 19, 47, 44, 42, 45, 30, 39, 29, 43, 25, 26, 40, 50, 49, 48, 41, 46

Addendum to Chapter 4

Determining the Adequacy of a Sample

For any experiment, the sample may be inadequate for two reasons. First, the sample may be too small; second, the sample may not be representative of the population being studied. For a more detailed presentation of sample size determination you may want to consult the Elfin Forest Software's Thesis Writer (1995) or the Scaling.Bok file located on the web-site www.cliffordweedman.com.

To review, 50% of the data for an experiment using ordinal data should be between the median less an SQR (Me - SQR) and the median plus an SQR (Me + SQR). Therefore, to determine if your sample size is adequate, you can test to see if the percentage for your sample deviates significantly from 50%. Refer to the proportion of subjects between the median minus an SQR and the median plus an SQR as Δ (pronounced delta). To test this deviation, let us assume that you have a respectable number of subjects in your study, say 150 or more. Then you can construct a z-test to test the difference. You may want to consult Chapter 11 for more details on the z-test for differences between percentages.
The formula would appear as follows.

Z = (.5 - Δ) / sqrt(.5^2 / 150)

The Z-value needed to achieve a significant difference is 1.96 (we will round the number to 2). After some algebra we have:

2 x sqrt(.5^2 / 150) = (.5 - Δ)

Or,

.08 = (.5 - Δ)

The same margin applies on either side of .5. This analysis indicates that any Δ less than .42 or greater than .58 would indicate that your sample size is inadequate.
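The adequacy check can also be sketched in Python for sample sizes other than 150. This is a rough illustration under the assumptions stated in the addendum (an expected proportion of .5 and a critical Z rounded to 2); the function names adequacy_z and adequacy_limits are hypothetical and do not come from Stats.Exe.

```python
import math

def adequacy_z(delta, n):
    """Z for the gap between the expected 50% and the observed proportion
    delta of scores lying between Me - SQR and Me + SQR."""
    return (0.5 - delta) / math.sqrt(0.5 ** 2 / n)

def adequacy_limits(n, z_needed=2.0):
    """Lowest and highest delta that do not differ significantly from .5."""
    margin = z_needed * math.sqrt(0.5 ** 2 / n)
    return 0.5 - margin, 0.5 + margin

low, high = adequacy_limits(150)
print(round(low, 2), round(high, 2))    # 0.42 0.58
print(abs(adequacy_z(0.40, 150)) > 2)   # True: .40 falls outside the limits
```

With 150 subjects the limits come out at .42 and .58, matching the hand calculation; a larger sample narrows the limits, and a smaller one widens them.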