The Median and Semi-Interquartile Range:
Central Tendency and Variability of Ordinal Data
Of all the statistics that we will encounter, the median (Me) and semi-interquartile range (SQR) may be the most difficult to calculate. Remember that these statistics are used to describe the central tendency and variability of a set of ordinal data. The median (Me) is simple to define. It is the point (a theoretical score) that 50% of the data (scores) exceed and 50% of the data (scores) fall below. Before we continue, it may be helpful to define two additional points: Q1 and Q3. Q1 is the point that 75% of the data exceed and 25% of the data fall below. Q3 is the point that 25% of the data exceed and 75% of the data fall below. The semi-interquartile range is defined as:
SQR = (Q3 - Q1)/2
Although the definitions and formula appear simple enough, the calculations can become very tedious. There are, however, several trivial cases in which calculating the median (Me) and SQR poses no great difficulty. We will deal with these trivial cases first.
First Trivial Case (even number of discrete cases)
The first trivial case deals with an even number of cases with discrete scores. By discrete
I mean that no two people have the same score. For example, suppose we collected the
following subjective ratings of pain for eight patients. Higher scores indicate higher subjective
ratings of pain.
Subjective Pain Ratings
87
65
73
51
71
49
68
21
Before we begin, the scores must be arranged in order, from highest to lowest. (You will not need to arrange your data in order if you are performing the calculations through the Stats.Exe program; it will do that for you.) The first step in our process is to find Q1, the 25th percentile. To accomplish this task, we multiply our sample size by 25% (8 x .25 = 2). Next, we insert the bounds. A bound is the number exactly halfway between two adjacent scores; to calculate it, simply add the two scores and divide by two. The ordered scores with the bounds inserted are shown below.
Subjective Pain Ratings
87
bound = 80
73
bound = 72
71
bound = 69.5
68
bound = 66.5
65
bound = 58
51
bound = 50
49
bound = 35
21
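The bounds in the listing above can be reproduced with a short Python sketch. The function name `insert_bounds` is hypothetical (it is not part of Stats.Exe); it sorts the scores from highest to lowest, as in the listing, and averages each adjacent pair.

```python
def insert_bounds(scores):
    """Return (ordered, bounds): scores sorted highest to lowest, plus the
    bound halfway between each pair of adjacent scores."""
    ordered = sorted(scores, reverse=True)
    # A bound is the sum of two adjacent scores divided by two.
    bounds = [(a + b) / 2 for a, b in zip(ordered, ordered[1:])]
    return ordered, bounds


pain = [87, 65, 73, 51, 71, 49, 68, 21]
ordered, bounds = insert_bounds(pain)

# Print each score, followed by the bound beneath it (none after the lowest score).
for i, score in enumerate(ordered):
    print(score)
    if i < len(bounds):
        print(f"    bound = {bounds[i]:g}")
```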
Now, we examine the bounds to see whether one of them will work as the point Q1. Let's try the very bottom bound (35). If this bound were Q1, would we have two scores (25% of the eight scores) below this point? No: only one score (21) lies below it. Since this bound is too small, we move up to the next bound, 50. Now, do we have two scores (21 and 49) below this point? Yes; therefore, 50 is Q1.
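The search for Q1 (start at the bottom bound and move up until exactly 25% of the scores lie below it) can be sketched as follows. `bound_with_k_below` is a hypothetical helper name, and the sketch assumes, as in this trivial case, distinct scores and a whole-number target count.

```python
def bound_with_k_below(scores, k):
    """Move up the bounds from the bottom until exactly k scores lie below one.

    Assumes distinct scores and a whole-number k (the first trivial case).
    """
    ascending = sorted(scores)
    for lower, upper in zip(ascending, ascending[1:]):
        bound = (lower + upper) / 2
        below = sum(s < bound for s in ascending)   # scores less than this bound
        if below == k:
            return bound
    raise ValueError(f"no bound has exactly {k} scores below it")


pain = [87, 65, 73, 51, 71, 49, 68, 21]
k = len(pain) * 0.25                  # 8 x .25 = 2
print(bound_with_k_below(pain, k))    # 50.0
```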
To find the median, multiply the sample size by 50% to find the number of scores that must be below the median (8 x .50 = 4). Continue to move up the bounds until you find the bound that separates the sample into two groups of four, which is the middle bound (66.5). This bound must be the median (Me = 66.5).
[Figure: Theoretical Distribution for Trivial Case I (percentages). 25% of the scores fall between Me-SQR (55.5) and Me (66.5), and 25% between Me (66.5) and Me+SQR (77.5).]
Our final step is to find Q3. To
accomplish this, multiply the sample size by
75% (8 x .75 = 6). As before, move to the
bound which separates the sample into
groupings of six scores below and two scores
above. The bound is 72 (Q3 = 72). At this
point we can calculate SQR, using the formula
below.
SQR = (Q3 - Q1)/2
SQR = (72 - 50)/2
SQR = 22/2 = 11
Therefore, we can conclude that our median is 66.5 and our SQR is 11. To help interpret what these numbers imply, consider the theoretical distribution for Trivial Case I. The theoretical distribution implies that 25% of our scores lie between the median and the median plus an SQR, and 25% lie between the median minus an SQR and the median.
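Putting the three searches together gives Q1, the median, Q3, and SQR for Trivial Case I, along with the points Me-SQR and Me+SQR used in the theoretical distribution. The function names are hypothetical, and the sketch applies only under this case's assumptions (distinct scores, with n x .25, n x .50, and n x .75 all whole numbers).

```python
def bound_with_k_below(scores, k):
    """Bound (midpoint of adjacent scores) with exactly k scores below it."""
    ascending = sorted(scores)
    for lower, upper in zip(ascending, ascending[1:]):
        bound = (lower + upper) / 2
        if sum(s < bound for s in ascending) == k:
            return bound
    raise ValueError(f"no bound has exactly {k} scores below it")


def median_and_sqr_trivial(scores):
    """Return (Q1, Me, Q3, SQR) for the first trivial case."""
    n = len(scores)
    q1 = bound_with_k_below(scores, n * 0.25)
    me = bound_with_k_below(scores, n * 0.50)
    q3 = bound_with_k_below(scores, n * 0.75)
    return q1, me, q3, (q3 - q1) / 2     # SQR = (Q3 - Q1)/2


pain = [87, 65, 73, 51, 71, 49, 68, 21]
q1, me, q3, sqr = median_and_sqr_trivial(pain)
print(q1, me, q3, sqr)        # 50.0 66.5 72.0 11.0
print(me - sqr, me + sqr)     # 55.5 77.5
```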
Second Trivial Case (odd number of discrete cases)
To demonstrate the second trivial case, we will add one more score (score of 10) to the
above example. The revised data with the bounds inserted are shown below.
Subjective Pain Ratings
87
bound = 80
73
bound = 72
71
bound = 69.5
68
bound = 66.5
65
bound = 58
51
bound = 50
49
bound = 35
21
bound = 15.5
10
Now, as in the previous trivial case, our first step is to find Q1 by multiplying our new sample size (9) by 25% (9 x .25 = 2.25). If, as in the previous case, we attempted to establish Q1 at the second bound from the bottom (35), you can see that we would have two subjects below this bound. However, we need exactly 2.25 subjects (literally 2 subjects plus a fraction (.25) of a third). Therefore, this bound would not suffice, nor would the next bound, 50, which has three subjects below it. At this point we must interpolate between the two bounds using the following formula:
D = (upper bound - lower bound) x Fraction
In our case:
D = (50 - 35) x .25
D = (15) x .25
D = 3.75
Now, Q1 will be given by the following formula:
Q1 = lower bound + D
In our case:
Q1 = 35 + 3.75 = 38.75
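The interpolation can be sketched in Python as below. `percentile_by_bounds` is a hypothetical name; it assumes distinct scores, finds the bound with the whole-number part of n x p scores below it, and adds D = (upper bound - lower bound) x fraction, as in the formula above. When n x p is a whole number, the fraction is zero and the result is simply that bound, as in the first trivial case.

```python
import math


def percentile_by_bounds(scores, p):
    """Point with n*p scores below it, interpolating between adjacent bounds.

    Assumes distinct scores. The bound with j scores below it is the midpoint
    of the j-th and (j+1)-th lowest scores.
    """
    ascending = sorted(scores)
    k = len(ascending) * p                 # e.g. 9 x .25 = 2.25
    whole = math.floor(k)                  # 2 subjects...
    fraction = k - whole                   # ...plus .25 of a third
    needed = whole + (1 if fraction else 0)
    if whole < 1 or needed > len(ascending) - 1:
        raise ValueError("n*p falls outside the inserted bounds")

    def bound(j):
        return (ascending[j - 1] + ascending[j]) / 2

    lower = bound(whole)                   # 35 in the example
    if fraction == 0:
        return lower
    upper = bound(whole + 1)               # 50 in the example
    d = (upper - lower) * fraction         # D = (upper bound - lower bound) x Fraction
    return lower + d                       # Q1 = lower bound + D


pain9 = [87, 65, 73, 51, 71, 49, 68, 21, 10]
print(percentile_by_bounds(pain9, 0.25))   # 38.75
```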
The next step in our process is to find Q3 by determining 75% of our sample size (9 x .75 = 6.75). In this example our lower bound is 69.5, because below this bound we have six subjects and we need exactly 6.75; the upper bound is 72, which has seven subjects below it. At this point we apply our "D" formula again, as follows:
D = (72 - 69.5) x .75
D = 2.5 x .75
D = 1.875 (rounded to 1.88)
Now, Q3 will be given by the following formula:
Q3 = lower bound + D
In our case:
Q3 = 69.5 + 1.875 = 71.375 (rounded to 71.38)
Our last task is to find the median, a simple task with an odd number of discrete scores: our median is the middle score (Me = 65.0). The SQR is:
SQR = (Q3 - Q1)/2
SQR = (71.375 - 38.75)/2
SQR = 32.625/2
SQR = 16.3125 (rounded to 16.3)
Therefore, we have a distribution with a median of 65 and an SQR of 16.3.
[Figure: Theoretical Distribution for Trivial Case II (percentages). 25% of the scores fall between Me-SQR (48.7) and Me (65.0), and 25% between Me (65.0) and Me+SQR (81.3).]
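The whole of Trivial Case II can be checked with the sketch below. The interpolation helper is a hypothetical, compact restatement of the procedure in the text (distinct scores assumed). Note that the median is taken as the middle score, as the text does for an odd number of discrete scores, rather than by interpolating between bounds; Python's `statistics.median` returns the middle value when n is odd.

```python
import math
import statistics


def percentile_by_bounds(scores, p):
    """Point with n*p scores below it (distinct scores, 1 <= n*p < n - 1 assumed)."""
    asc = sorted(scores)
    k = len(asc) * p
    whole = math.floor(k)
    fraction = k - whole
    lower = (asc[whole - 1] + asc[whole]) / 2      # bound with `whole` scores below
    if fraction == 0:
        return lower
    upper = (asc[whole] + asc[whole + 1]) / 2      # next bound up
    return lower + (upper - lower) * fraction      # lower bound + D


pain9 = [87, 65, 73, 51, 71, 49, 68, 21, 10]
q1 = percentile_by_bounds(pain9, 0.25)
q3 = percentile_by_bounds(pain9, 0.75)
me = statistics.median(pain9)          # middle score when n is odd
sqr = (q3 - q1) / 2

print(q1, q3, me, sqr)                         # 38.75 71.375 65 16.3125
print(round(me - sqr, 1), round(me + sqr, 1))  # 48.7 81.3
```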
The theoretical distribution of our nine pain scores should fit the observed distribution. Notice that theoretically 25% of our scores should be between 48.7 and 65, and 25% of our scores should be between 65 and 81.3. We actually had 22% (the scores 49 and 51) and 33% (the scores 68, 71, and 73), respectively. This difference between theoretical and actual percentages is not surprising, considering that we started with only nine scores. Had our original sample been larger, our theoretical and actual percentages would likely have been closer.
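The observed percentages can be counted directly. This sketch assumes that a score equal to the median (65) belongs to neither interval, so both comparisons are strict; the values of Me and SQR are those computed above.

```python
pain9 = [87, 65, 73, 51, 71, 49, 68, 21, 10]
me, sqr = 65, 16.3125          # median and SQR from Trivial Case II
n = len(pain9)

ordered = sorted(pain9)
lower_half = [s for s in ordered if me - sqr < s < me]   # between 48.7 and 65
upper_half = [s for s in ordered if me < s < me + sqr]   # between 65 and 81.3

print(lower_half, f"{len(lower_half) / n:.0%}")   # [49, 51] 22%
print(upper_half, f"{len(upper_half) / n:.0%}")   # [68, 71, 73] 33%
```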
The Non-Trivial Case (Data in Groups)
Determining the Median and SQR for Grouped Data
Many times data will be collected in some type of grouping format and not as individual
discrete scores. For example, in a survey conducted by one of my students, the estimated age of
the person being surveyed was collected on the following scale.
Age Range
60 - 69
50 - 59
40 - 49
30 - 39
20 - 29
10 - 19
When data are grouped as opposed to discrete scores, the computational procedures for the
median and SQR are modified somewhat. To demonstrate the process, suppose that our survey
results were as follows:
Age Range    f
60 - 69      7
50 - 59     21
40 - 49     25
30 - 39     15
20 - 29      3
10 - 19      2
The bounds between the various categorical groupings are the midpoints between the
groupings. For example, the bound between the categories of 20 - 29 and 30 - 39 is 29.5. Below
are our scores with the bounds inserted.
Age Range    f
60 - 69      7
   bound = 59.5
50 - 59     21
   bound = 49.5
40 - 49     25
   bound = 39.5
30 - 39     15
   bound = 29.5
20 - 29      3
   bound = 19.5
10 - 19      2
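For grouped data the bounds come from the class limits rather than from individual scores. A minimal sketch, assuming whole-number class limits stored as hypothetical (lower limit, upper limit, frequency) tuples, lowest class first:

```python
# Age ranges as (lower limit, upper limit, frequency), lowest class first.
ages = [(10, 19, 2), (20, 29, 3), (30, 39, 15), (40, 49, 25), (50, 59, 21), (60, 69, 7)]


def group_bounds(groups):
    """Midpoint between each class's upper limit and the next class's lower limit."""
    return [(hi + next_lo) / 2
            for (_, hi, _), (next_lo, _, _) in zip(groups, groups[1:])]


print(group_bounds(ages))                # [19.5, 29.5, 39.5, 49.5, 59.5]
print(sum(f for _, _, f in ages))        # 73 cases in all
```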
Our next step is to find Q1 by determining 25% of our sample size (73 x .25 = 18.25). Our lower bound for the determination of Q1 is 29.5, and the upper bound for Q1 is 39.5. Below the lower bound for Q1 you can see that we have 5 cases (2 + 3). We need 18.25, which requires us to add 13.25 more cases to the lower bound. Now we use the modified "D" formula, which is as follows:
D = [(upper bound - lower bound) x cases to be added] / (number of cases between the upper and lower bounds)
In our example:
D = [(39.5 - 29.5) x 13.25] / 15
D = 132.5/15 = 8.83
To determine Q1 all we need to do is add D to the lower bound (29.5), therefore:
Q1 = 29.5 +8.83
Q1 = 38.33
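The modified D formula can be sketched as a loop that accumulates frequencies from the lowest class up. `grouped_percentile` is a hypothetical name; the sketch assumes whole-number class limits, so that each class runs from half a unit below its lower limit to half a unit above its upper limit (29.5 to 39.5 for the 30 - 39 class).

```python
def grouped_percentile(groups, p):
    """Point with n*p cases below it, for data grouped into classes.

    groups: (lower limit, upper limit, frequency) tuples, lowest class first.
    """
    n = sum(f for _, _, f in groups)
    target = n * p                    # e.g. 73 x .25 = 18.25
    below = 0                         # cases below the current lower bound
    for lo, hi, f in groups:
        lower_bound, upper_bound = lo - 0.5, hi + 0.5
        if f and below + f >= target:
            to_add = target - below   # cases to be added, e.g. 13.25
            # D = [(upper bound - lower bound) x cases to be added] / cases in class
            d = (upper_bound - lower_bound) * to_add / f
            return lower_bound + d
        below += f
    raise ValueError("p must be between 0 and 1")


ages = [(10, 19, 2), (20, 29, 3), (30, 39, 15), (40, 49, 25), (50, 59, 21), (60, 69, 7)]
print(round(grouped_percentile(ages, 0.25), 2))   # 38.33
```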
To compute Q3 we will need to determine 75% of our sample size (73 x .75 = 54.75). As you can determine from the distribution of our scores, the 54.75th case is located between the lower bound of 49.5 and the upper bound of 59.5. Below the lower bound we have 45 cases (2 + 3 + 15 + 25), so we need to include 9.75 more cases. Utilizing our modified D formula, we have:
D = [(59.5 - 49.5) x 9.75] / 21
D = 97.5/21
D = 4.64
Therefore, Q3 is given by:
Q3 = lower bound + D
Q3 = 49.5 + 4.64
Q3 = 54.14
and:
SQR = (54.14 - 38.33)/2
SQR = 15.81/2 = 7.90
Our final step is to calculate the median. We will need to determine 50% of our sample size (73 x .50 = 36.5). From the distribution of scores we can see that the 36.5th case is located between the lower bound of 39.5 and the upper bound of 49.5. Below the bound of 39.5 are 20 cases (2 + 3 + 15), so we need to include 16.5 more cases. Our D formula becomes:
D = [(49.5 - 39.5) x 16.5] / 25
D = 165/25 = 6.6
Therefore, our median is obtained by adding this D to the lower bound of 39.5 (Me = 39.5 + 6.6 = 46.1). The theoretical distribution is summarized below.
[Figure: Theoretical Distribution for Non-Trivial Case II (percentages). 25% of the cases fall between Me-SQR (38.2) and Me (46.1), and 25% between Me (46.1) and Me+SQR (54.0).]
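The same hypothetical helper, restated compactly so the block stands alone, reproduces Q3, the median, and SQR for the age data, along with the points Me-SQR and Me+SQR. It carries full precision instead of rounding D at each step; the results agree with the hand calculation to the digits shown.

```python
def grouped_percentile(groups, p):
    """Point with n*p cases below it; class bounds assumed at limit -/+ 0.5."""
    target = sum(f for _, _, f in groups) * p
    below = 0
    for lo, hi, f in groups:
        if f and below + f >= target:
            lower_bound, upper_bound = lo - 0.5, hi + 0.5
            return lower_bound + (upper_bound - lower_bound) * (target - below) / f
        below += f
    raise ValueError("p must be between 0 and 1")


ages = [(10, 19, 2), (20, 29, 3), (30, 39, 15), (40, 49, 25), (50, 59, 21), (60, 69, 7)]
q1 = grouped_percentile(ages, 0.25)
me = grouped_percentile(ages, 0.50)
q3 = grouped_percentile(ages, 0.75)
sqr = (q3 - q1) / 2

print(f"Q1 = {q1:.2f}, Me = {me:.1f}, Q3 = {q3:.2f}, SQR = {sqr:.2f}")
# Q1 = 38.33, Me = 46.1, Q3 = 54.14, SQR = 7.90
print(f"Me-SQR = {me - sqr:.1f}, Me+SQR = {me + sqr:.1f}")
# Me-SQR = 38.2, Me+SQR = 54.0
```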
HOW TO PERFORM THE CALCULATIONS USING STATS.EXE
To use the program disk to obtain the median and SQR, access your Stats.exe program and select the program called Median. I will first show how to obtain the median for the non-trivial case. Select the "start calculations" option key. The program will ask whether you want to enter the data from a file (yes) or from the keyboard (no). On the next screen, enter the number one if you are calculating the median for the trivial cases, or the number two if you are calculating the median for data in grouped form (non-trivial case two). For this example, enter 2.
Next, use the data input screen to record your data, then select the finished option. The data are as follows:
60 69 7
50 59 21
40 49 25
30 39 15
20 29 3
10 19 2
Following are the results that will appear on the screen.
Following are the scores with the bounds inserted.
...... 69.5 ....................
60  69   7
...... 59.5 ....................
50  59  21
...... 49.5 ....................
40  49  25
...... 39.5 ....................
30  39  15
...... 29.5 ....................
20  29   3
...... 19.5 ....................
10  19   2
...... 9.5 .....................
median = 46.1
SQR = 7.9047
[Figure: screen plot of the theoretical distribution, showing 50% of the cases between Me-SQR (38.2) and Me+SQR (54), with Me = 46.1.]
Chapter Exercises
1. Suppose that a "fast food" (take-out) restaurant had developed a new batter for their fried chicken that reduced cholesterol and fats by 50% compared with any of their competitors' products. Your task is to determine the average rating of taste assigned by 28 people. Ratings were made on a 50-point scale, with 0 indicating poor flavor and 50 indicating excellent flavor. What do you conclude? Draw the theoretical distribution of scores. Is the sample adequate? Following are the ratings:
10, 50, 40, 30, 40, 50, 40, 40, 20, 40, 30, 30, 40, 50, 40, 30, 20, 40, 50, 30, 40, 30, 30, 40, 10, 50,
40, 40
2. Suppose that the restaurant had performed the same experiment some years before with their
old batter. The data for the former study are summarized below. Find the median and compare
these results with the results from the previous problem. Address the adequacy of the sample.
Taste Range   Frequency
1 - 10            1
11 - 20          12
21 - 30          19
31 - 40          15
41 - 50           9
3. Now, suppose that the restaurant wanted to see if advertising the healthy aspects of their
chicken preparation process, compared to their competitor's product, might improve their
customers' ratings of taste. An advertising program was undertaken which included a new
package for their chicken that boldly displayed the healthy aspects of their new chicken recipe.
Additionally, newspaper and radio advertisements were run for two weeks focusing on the health
benefits of their new chicken frying process. Following the advertising program a research
project, again, examined customers' ratings of their chicken's taste. Following are the data.
What do you conclude?
10, 24, 13, 19, 47, 44, 42, 45, 30, 39, 29, 43, 25, 26, 40, 50, 49, 48, 41, 46
Addendum to Chapter 4
Determining the Adequacy of a Sample
For any experiment, the sample size may be inadequate for two reasons. First, the sample
may be too small; second, the sample may not be representative of the population being studied.
For a more detailed presentation of sample size determination, you may want to consult Elfin
Forest Software's Thesis Writer (1995) or the Scaling.Bok file located on the web-site
www.cliffordweedman.com.
To review, 50% of the data for an experiment using ordinal data should be between the median less an SQR (Me – SQR) and the median plus an SQR (Me + SQR). Therefore, to determine whether your sample size is adequate, you can test whether the percentage for your sample deviates significantly from 50%. Refer to the proportion of subjects between the median minus an SQR and the median plus an SQR as Δ (pronounced delta).
To test this deviation, let us assume that you have a respectable number of subjects in your study, say 150 or more. Then you can construct a z-test to test the difference. You may want to consult Chapter 11 for more details on the z-test for differences between percentages. With 150 subjects, the formula would appear as follows.
Z = (.5 - Δ) / √(.5² / 150)
The Z-value needed to achieve a significant difference is 1.96 (we will round the number to 2). After some algebra we have:
2 = (.5 - Δ) / √(.5² / 150)
Or,
2√(.5² / 150) = (.5 - Δ)
.08 = (.5 - Δ)   (since 2 x .5/√150 ≈ .082)
Because Δ can differ from .5 in either direction, Δ must lie within .08 of .5. This analysis indicates that any Δ less than .42 or greater than .58 would indicate that your sample size is inadequate.
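The adequacy check can be sketched in Python. The function names are hypothetical; `n` defaults to the 150 subjects used above, and `z_crit` to the rounded critical value of 2. The standard error √(.5² / n) is the one in the formula above, so a larger n narrows the acceptable range for Δ.

```python
import math


def adequacy_z(delta, n=150):
    """z for the difference between the observed proportion delta and .5."""
    return (0.5 - delta) / math.sqrt(0.5 ** 2 / n)


def acceptable_delta_range(n=150, z_crit=2.0):
    """Range of delta that does not differ significantly from .5 (|z| < z_crit)."""
    margin = z_crit * math.sqrt(0.5 ** 2 / n)   # 2 x .5 / sqrt(150), about .082
    return 0.5 - margin, 0.5 + margin


low, high = acceptable_delta_range()
print(f"{low:.2f} to {high:.2f}")        # 0.42 to 0.58
print(f"{adequacy_z(0.40):.2f}")         # 2.45, beyond 2: sample judged inadequate
```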