STAT 301 – BUSINESS STATISTICS

LEVELS OF VARIABLES
Statisticians speak of the “level” of a variable as an indicator of the amount of information
available from a single datum. The four “levels” of data, from lowest to highest, are NOMINAL,
ORDINAL, INTERVAL, and RATIO.
Generally speaking, the higher the level of the data, the more information it contains. Different
levels of variables will be analyzed in different ways.
Nominal level data are data where the values are simply labels or names. (This is from the Latin
root nom-, for “name.” Remember that to “nominate” someone is to “name” that person for an
office.)
A person's gender is an example of a nominal level variable, since the values the variable can
take on (i.e., 'female' or 'male') are simply labels. Political party affiliation (Democrat,
Republican, Independent, Green, Libertarian, etc.) is another example of a nominal level
variable.
Often, the labels in a nominal variable will be coded with numbers as part of the process of
analysis. One characteristic of nominal level variables is that the way you assign these numbers
is completely arbitrary. You could code 'female' as '1' and 'male' as '2', for example. Or you could
code them as '1' and '0'. Or even '42' and '846'. It really doesn't matter, as long as you remember
which one is which.
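The arbitrariness of nominal codes can be checked with a short Python sketch. The sample responses below are made up for illustration, and the helper names (decoded_counts, coded_mean) are hypothetical. Under all three codings mentioned above, the category counts come out the same, while the "average" of the codes changes with the coding and so tells us nothing.

```python
from collections import Counter

# Hypothetical sample of responses (invented for illustration only).
responses = ["female", "male", "female", "female", "male"]

# The three arbitrary codings mentioned in the text.
codings = [
    {"female": 1, "male": 2},
    {"female": 1, "male": 0},
    {"female": 42, "male": 846},
]


def decoded_counts(labels, coding):
    """Code the labels as numbers, decode them again, and count each label."""
    decode = {number: label for label, number in coding.items()}
    coded = [coding[label] for label in labels]
    return Counter(decode[number] for number in coded)


def coded_mean(labels, coding):
    """Average the numeric codes (computable, but not meaningful)."""
    coded = [coding[label] for label in labels]
    return sum(coded) / len(coded)


counts = [decoded_counts(responses, coding) for coding in codings]
means = [coded_mean(responses, coding) for coding in codings]

print("counts under each coding:", counts)
print("mean code under each coding:", means)
```

The counts (and therefore the most common category) survive any recoding; the mean does not.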
Some things that appear to be numbers are really nominal level variables, because the numbers
are assigned arbitrarily. Your phone number is an example of this; so is your zip code. They look
like numbers - but they're really just labels.
Think of it this way: Does it make any sense at all to average a bunch of zip codes? Even though
you could compute something, it would be really stupid to do so. The numbers are just labels and
their average means nothing whatsoever.
Ordinal level data are data where the values the variable can assume are ordered, low-to-high
(or high-to-low). However, the true gaps between adjacent values are not necessarily equal in
size.
The order of finish in a race (first, second, third, etc.) is an ordinal level variable. First comes
ahead of second, which comes ahead of third, etc. However, the gap between first and second
place is not necessarily the same as that between second and third.
Many surveys ask participants to answer an opinion question on a five-point scale (“strongly
disagree, disagree, neutral, agree, strongly agree”). This is another example of an ordinal level
variable.
Responses on ordinal-level variables are often coded numerically. For example, the five-point
scale mentioned above often is coded 1=strongly disagree, 2=disagree, 3=neutral, 4=agree,
5=strongly agree.
Note that it would make NO sense to code things 1/4/2/5/3 - this would lose the “ordered”
information contained in the data. (Contrast this with nominal level data, where any coding is
OK.) However, we COULD code the data 1/3/4/5/7. In fact, it might even be more sensible to do so
- the psychological distance between “strongly disagree” and “disagree” could very well be
larger than that between “disagree” and “neutral.”
Hence, taking an average of an ordinal level variable is a suspect activity - since there is still a
certain degree of arbitrariness to the way the numbers are assigned. This averaging is widely
done. However, the results should be viewed as being dubious at best.
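To see why averages of ordinal codes are suspect, here is a minimal Python sketch. The two groups of survey responses are invented for illustration, and the names (group_x, group_y, summarize) are hypothetical. It compares the groups under the 1/2/3/4/5 coding and under the 1/3/4/5/7 coding: the comparison of means reverses, while the median category stays put.

```python
from statistics import mean, median

LEVELS = ["strongly disagree", "disagree", "neutral", "agree", "strongly agree"]

# Two order-preserving codings of the same five-point scale.
coding_a = dict(zip(LEVELS, [1, 2, 3, 4, 5]))
coding_b = dict(zip(LEVELS, [1, 3, 4, 5, 7]))

# Hypothetical responses from two groups of seven people each.
group_x = ["strongly disagree"] * 2 + ["agree"] * 5
group_y = ["neutral"] * 7


def summarize(responses, coding):
    """Return (mean of the codes, label of the median code)."""
    coded = [coding[r] for r in responses]
    decode = {number: label for label, number in coding.items()}
    return mean(coded), decode[median(coded)]


mean_x_a, median_x_a = summarize(group_x, coding_a)
mean_y_a, median_y_a = summarize(group_y, coding_a)
mean_x_b, median_x_b = summarize(group_x, coding_b)
mean_y_b, median_y_b = summarize(group_y, coding_b)

print("coding 1/2/3/4/5: mean X =", mean_x_a, " mean Y =", mean_y_a)
print("coding 1/3/4/5/7: mean X =", mean_x_b, " mean Y =", mean_y_b)
print("median categories:", median_x_a, median_x_b, "|", median_y_a, median_y_b)
```

Under the first coding group X has the higher mean; under the second, group Y does. The median category depends only on the ordering, so it is the same under both codings.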
In an interval level variable, the number we observe really is a number. That is to say, equal
sized intervals between two values always mean the same thing. However, the zero point on the
scale is assigned arbitrarily - “zero” does not mean “nothing.”
Consider, for example, the temperature of the room, as measured in degrees Fahrenheit. A
change in the temperature from 65 degrees to 70 degrees represents the same sort of increment in
heat as does the change from 70 to 75 degrees. However, 0 degrees does NOT mean “no heat.”
The zero point on the scale has been chosen arbitrarily.
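A small Python sketch can illustrate both points by re-expressing the same temperatures in degrees Celsius (a second scale brought in here only for comparison; the function name is hypothetical). The two five-degree Fahrenheit steps remain equal to each other after conversion, but 0 degrees Fahrenheit is not zero on the Celsius scale, so neither zero can mean “no heat.”

```python
import math


def fahrenheit_to_celsius(deg_f):
    """Standard conversion: subtract 32, then multiply by 5/9."""
    return (deg_f - 32) * 5 / 9


readings_f = [65, 70, 75]
readings_c = [fahrenheit_to_celsius(t) for t in readings_f]

# Equal intervals in Fahrenheit stay equal to each other in Celsius.
step_1_c = readings_c[1] - readings_c[0]
step_2_c = readings_c[2] - readings_c[1]

# The zero point does not carry over: 0 degrees F is below zero in Celsius.
zero_f_in_c = fahrenheit_to_celsius(0)

print("Celsius readings:", readings_c)
print("steps equal?", math.isclose(step_1_c, step_2_c))
print("0 degrees F in Celsius:", zero_f_in_c)
```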
Your SAT-Math score is another example of an interval level variable. The scale runs from 200
to 800. But the folk who put the test together could just as easily have made the scale run from
100 to 700, or 700 to 1300, or even (by stretching things out a bit) from 0 to 2000. The zero point
is arbitrary here.
Note, as a consequence of this, that while it makes sense to take averages of interval-level data,
it is meaningless to take percentages or ratios. If you got 800 on your SAT-Math and your roommate
got 200, your average score really is 500. However, we cannot say that you're “four times as smart
at math” as your roommate - since the scale could just as well have run 100 to 700, in which case
the same two scores would be 700 and 100. A temperature of eighty degrees (Fahrenheit) is NOT
“twice as hot” as one of forty degrees. (This is a principle that is frequently violated, however.
Beware news stories that say that “the school district's average SAT-Math score declined 8 points
since last year” (an OK statement), and then go on to say “a one percent decline on the 800-point
scale” (which is NOT an OK statement).)
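The arithmetic behind this can be sketched in Python by shifting every score down 100 points, as if the scale ran from 100 to 700. The shift function and the variable names are hypothetical, chosen only for this illustration. Averages and point differences move along with the scale in a consistent way; ratios and percentages do not.

```python
def shift(score, offset=-100):
    """Re-express a 200-800 score on a hypothetical 100-700 scale."""
    return score + offset


you, roommate = 800, 200

# Averages behave: the shifted average is just the original average, shifted.
mean_original = (you + roommate) / 2
mean_shifted = (shift(you) + shift(roommate)) / 2

# Ratios do not: "four times" on one scale becomes "seven times" on the other.
ratio_original = you / roommate
ratio_shifted = shift(you) / shift(roommate)

# An 8-point decline is 8 points on either scale, but the "percent of the
# top of the scale" depends on where the scale happens to end.
decline_points = 8
percent_of_800 = 100 * decline_points / 800
percent_of_700 = 100 * decline_points / 700

print("means:", mean_original, mean_shifted)
print("ratios:", ratio_original, ratio_shifted)
print("percent declines:", percent_of_800, percent_of_700)
```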
In a ratio level variable, numbers really represent quantities, and “zero” means “nothing.”
(Contrast this with interval level data, where the zero point is established arbitrarily.)
Physical weights and measures are typical ratio level variables.
Both averages and percentages make sense with ratio level variables. A four-pound weight and a
two-pound weight average out to three pounds, and the former really is twice as heavy as the
latter.
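As a contrast with the interval case, this Python sketch converts the same two weights from pounds to kilograms (names are hypothetical; the conversion factor is the standard definition of the pound). Because zero pounds is also zero kilograms, changing units only rescales the numbers, and the “twice as heavy” ratio is the same in either unit.

```python
import math

KG_PER_POUND = 0.45359237  # standard definition of the pound in kilograms

weights_lb = [4.0, 2.0]
weights_kg = [w * KG_PER_POUND for w in weights_lb]

# Averages convert consistently between units.
mean_lb = sum(weights_lb) / len(weights_lb)
mean_kg = sum(weights_kg) / len(weights_kg)

# Ratios are unchanged by the change of units.
ratio_lb = weights_lb[0] / weights_lb[1]
ratio_kg = weights_kg[0] / weights_kg[1]

print("mean in pounds:", mean_lb, " mean in kilograms:", mean_kg)
print("ratio in pounds:", ratio_lb, " ratio in kilograms:", ratio_kg)
```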