STAT 301 – BUSINESS STATISTICS
LEVELS OF VARIABLES

Statisticians speak of the “level” of a variable as an indicator of the amount of information available from a single datum. The four “levels” of data, from lowest to highest, are NOMINAL, ORDINAL, INTERVAL, and RATIO. Generally speaking, the higher the level of the data, the more information it contains. Different levels of variables are analyzed in different ways.

Nominal level data are data where the values are simply labels or names. (This is from the Latin root nom-, for “name.” Remember that to “nominate” someone is to “name” that person for an office.) A person's gender is an example of a nominal level variable, since the values the variable can take on (i.e., 'female' or 'male') are simply labels. Political party affiliation (Democrat, Republican, Independent, Green, Libertarian, etc.) is another example of a nominal level variable.

Often, the labels in a nominal variable will be coded with numbers as part of the process of analysis. One characteristic of nominal level variables is that the way you assign these numbers is completely arbitrary. You could code 'female' as '1' and 'male' as '2', for example. Or you could code them as '1' and '0'. Or even '42' and '846'. It really doesn't matter, as long as you remember which one is which.

Some things that appear to be numbers are really nominal level variables, because the numbers are assigned arbitrarily. Your phone number is an example of this; so is your zip code. They look like numbers - but they're really just labels. Think of it this way: Does it make any sense at all to average a bunch of zip codes? Even though you could compute something, it would be really stupid to do so. The numbers are just labels and their average means nothing whatsoever.

Ordinal level data are data where the values the variable can assume are ordered, low-to-high (or high-to-low). However, the true gaps between adjacent values are not necessarily equal in size.
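
As a rough illustration of "ordered, but with unequal gaps," here is a minimal Python sketch. The runner names and finishing times are made up for the example. Turning times into places keeps the order but throws away the size of the gaps:

```python
# Hypothetical finishing times in seconds (illustrative values only).
finish_times = {"Avery": 1200, "Blake": 1203, "Casey": 1290}

# Sort runners from fastest to slowest and assign places 1, 2, 3.
ordered = sorted(finish_times, key=finish_times.get)
places = {runner: place for place, runner in enumerate(ordered, start=1)}

# Gaps between consecutive places are always 1 ...
place_gaps = [places[b] - places[a] for a, b in zip(ordered, ordered[1:])]
# ... but the underlying time gaps need not be equal.
time_gaps = [finish_times[b] - finish_times[a] for a, b in zip(ordered, ordered[1:])]

print(places)      # {'Avery': 1, 'Blake': 2, 'Casey': 3}
print(place_gaps)  # [1, 1]
print(time_gaps)   # [3, 87]
```

The places (an ordinal variable) say only who beat whom. The 3-second and 87-second gaps cannot be recovered from them.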

The order of finish in a race (first, second, third, etc.) is an ordinal level variable. First comes ahead of second, which comes ahead of third, etc. However, the gap between first and second place is not necessarily the same as that between second and third.

Many surveys ask participants to answer an opinion question on a five-point scale (“strongly disagree, disagree, neutral, agree, strongly agree”). This is another example of an ordinal level variable.

Responses on ordinal-level variables are often coded numerically. For example, the five-point scale mentioned above often is coded 1=strongly disagree, 2=disagree, 3=neutral, 4=agree, 5=strongly agree. Note that it would make NO sense to code things 1/4/2/5/3 - this would lose the “ordered” information contained in the data. (Contrast this with nominal level data, where any coding is OK.) However, we COULD code the data 1/3/4/5/7. In fact, it might even be more sensible to do so - the psychological distance between “strongly disagree” and “disagree” could very well be larger than that between “disagree” and “neutral.”

Hence, taking an average of an ordinal level variable is a suspect activity, since there is still a certain degree of arbitrariness to the way the numbers are assigned. This averaging is widely done. However, the results should be viewed as dubious at best.

In an interval level variable, the number we observe really is a number. That is to say, equal-sized intervals between two values always mean the same thing. However, the zero point on the scale is assigned arbitrarily - “zero” does not mean “nothing.”

Consider, for example, the temperature of the room, as measured in degrees Fahrenheit. A change in the temperature from 65 degrees to 70 degrees represents the same increment in temperature as does the change from 70 to 75 degrees. However, 0 degrees does NOT mean “no heat.” The zero point on the scale has been chosen arbitrarily.
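
The temperature example can be checked with a short Python sketch. The function name `fahrenheit_to_celsius` is just a label chosen for this illustration; the conversion formula itself is the standard one. Equal steps in Fahrenheit remain equal steps in Celsius, while the zero point lands somewhere else entirely:

```python
def fahrenheit_to_celsius(deg_f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (deg_f - 32) * 5 / 9

# Two equal 5-degree steps on the Fahrenheit scale.
step_1_f = 70 - 65
step_2_f = 75 - 70

# The same two steps measured in Celsius.
step_1_c = fahrenheit_to_celsius(70) - fahrenheit_to_celsius(65)
step_2_c = fahrenheit_to_celsius(75) - fahrenheit_to_celsius(70)

print(step_1_f == step_2_f)                    # True
print(round(step_1_c, 2), round(step_2_c, 2))  # 2.78 2.78
print(round(fahrenheit_to_celsius(0), 2))      # -17.78
```

Zero on one scale is about -17.78 on the other, so neither zero can be read as “no heat.”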

Your SAT-Math score is another example of an interval level variable. The scale runs from 200 to 800. But the folks who put the test together could just as easily have made the scale run from 100 to 700, or 700 to 1300, or even (by stretching things out a bit) from 0 to 2000. The zero point is arbitrary here.

Note, as a consequence of this, that while it makes sense to take averages of interval-level data, it is meaningless to take percentages or ratios. If you got 800 on your SAT-Math and your roommate got 200, your average score really is 500. However, we cannot say that you're “four times as smart at math” as your roommate, since the scale could just as well have run 100 to 700. A temperature of eighty degrees (Fahrenheit) is NOT “twice as hot” as one of forty degrees. (This is a principle that is frequently violated, however. Beware news stories that say that “the school district's average SAT-Math score declined 8 points since last year” (an OK statement), and then go on to say “a one percent decline on the 800-point scale” (which is NOT an OK statement).)

In a ratio level variable, numbers really represent quantities, and “zero” means “nothing.” (Contrast this with interval level data, where the zero point is established arbitrarily.) Physical weights and measures are typical ratio level variables. Both averages and percentages make sense with ratio level variables. A four-pound weight and a two-pound weight average out to three pounds, and the former really is twice as heavy as the latter.
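
The contrast between interval and ratio levels can be sketched numerically in Python. This is only an illustration: the 100-point shift stands for the hypothetical 100-to-700 SAT scale mentioned above, and the variable names are invented for the example. Ratios of weights survive a change of units, but “ratios” of temperatures or test scores change as soon as the arbitrary zero point moves:

```python
LB_TO_KG = 0.45359237  # kilograms per pound (international pound)

def fahrenheit_to_celsius(deg_f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (deg_f - 32) * 5 / 9

# Ratio level: the ratio of two weights is the same in pounds and kilograms.
ratio_lb = 4 / 2
ratio_kg = (4 * LB_TO_KG) / (2 * LB_TO_KG)

# Interval level: the "ratio" of two temperatures depends on the scale used.
ratio_f = 80 / 40
ratio_c = fahrenheit_to_celsius(80) / fahrenheit_to_celsius(40)

# SAT-Math: re-express both scores on a hypothetical 100-700 scale.
you, roommate = 800, 200
shift = -100
ratio_original = you / roommate
ratio_shifted = (you + shift) / (roommate + shift)
mean_original = (you + roommate) / 2
mean_shifted = ((you + shift) + (roommate + shift)) / 2

print(ratio_lb, ratio_kg)             # 2.0 2.0
print(ratio_f, round(ratio_c, 2))     # 2.0 6.0
print(ratio_original, ratio_shifted)  # 4.0 7.0
print(mean_original, mean_shifted)    # 500.0 400.0
```

The average simply moves by the same 100 points as the scale, so it stays meaningful. The “four times” figure becomes “seven times” on the shifted scale, which is why it was never meaningful to begin with.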