Level of measurement

From Wikipedia, the free encyclopedia
The "levels of measurement", or "scales of measure", are expressions that typically refer to the
theory of scale types developed by the psychologist Stanley Smith Stevens. Stevens proposed
his theory in a 1946 Science article titled "On the theory of scales of measurement".[1] In that
article, Stevens claimed that all measurement in science was conducted using four different types
of scales that he called "nominal", "ordinal", "interval" and "ratio".
Contents
1 The theory of scale types
   1.1 Nominal scale
   1.2 Ordinal scale
   1.3 Interval scale
   1.4 Ratio measurement
2 Debate on classification scheme
3 Scale types and Stevens' "operational theory of measurement"
4 Notes
5 See also
6 References
7 External links
The theory of scale types
Stevens (1946, 1951) proposed that measurements can be classified into four different types of
scales. These are shown in the table below as: nominal, ordinal, interval, and ratio.
Scale type              Permissible statistics                       Admissible scale transformation     Mathematical structure
nominal (also denoted   mode, chi-square                             one-to-one (equality (=))           standard set structure
as categorical)                                                                                          (unordered)
ordinal                 median, percentile                           monotonic increasing (order (<))    totally ordered set
interval                mean, standard deviation, correlation,       positive linear (affine)            affine line
                        regression, analysis of variance
ratio                   all statistics permitted for interval        positive similarities               field
                        scales plus the following: geometric mean,   (multiplication)
                        harmonic mean, coefficient of variation,
                        logarithms
Nominal scale
At the nominal scale, i.e., for a nominal category, one uses labels; for example, rocks can be
generally categorized as igneous, sedimentary and metamorphic. For this scale, some valid
operations are equivalence and set membership. Nominal measures offer names or labels for
certain characteristics.
Variables assessed on a nominal scale are called categorical variables; see also categorical data.
Stevens (1946, p. 679) must have known that claiming nominal scales to measure obviously
non-quantitative things would have attracted criticism, so he invoked his theory of measurement to
justify nominal scales as measurement:
“…the use of numerals as names for classes is an example of the assignment of numerals
according to rule. The rule is: Do not assign the same numeral to different classes or
different numerals to the same class. Beyond that, anything goes with the nominal scale.”
The central tendency of a nominal attribute is given by its mode; neither the mean nor the
median can be defined.
We can use a simple example of a nominal category: first names. Looking at nearby people, we
might find one or more of them named Aamir. Aamir is their label; and the set of all first names
is a nominal scale. We can only check whether two people have the same name (equivalence) or
whether a given name is on a certain list of names (set membership), but it is impossible to say
which name is greater or less than another (comparison) or to measure the difference between
two names. Given a set of people, we can describe the set by its most common name (the mode),
but cannot provide an "average name" or even the "middle name" among all the names.
However, if we decide to sort our names alphabetically (or to sort them by length; or by how
many times they appear in the US Census), we will begin to turn this nominal scale into an
ordinal scale.
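The operations available on a nominal scale can be sketched in a few lines of Python. This is only an illustration: the list of names and all variable names are made up for the example, and the point is simply that equality tests, membership tests and the mode are defined, while nothing in the code compares or averages the labels.

```python
from collections import Counter

# Hypothetical sample of first names (a nominal variable).
names = ["Aamir", "Beth", "Aamir", "Chen", "Beth", "Aamir"]

# Equivalence: we can ask whether two labels are the same.
same_name = names[0] == names[2]

# Set membership: we can ask whether a label is on a given list.
on_list = "Chen" in {"Chen", "Dana"}

# Mode: the most common label is the only available measure of central tendency.
mode_name, mode_count = Counter(names).most_common(1)[0]

print(same_name, on_list, mode_name, mode_count)
```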
Ordinal scale
Rank-ordering data simply puts the data on an ordinal scale. Ordinal measurements describe
order, but not relative size or degree of difference between the items measured. In this scale type,
the numbers assigned to objects or events represent the rank order (1st, 2nd, 3rd, etc.) of the
entities assessed. A Likert Scale is a type of ordinal scale and may also use names with an order
such as: "bad", "medium", and "good"; or "very satisfied", "satisfied", "neutral", "unsatisfied",
"very unsatisfied." An example of an ordinal scale is the result of a horse race, which says only
which horses arrived first, second, or third but includes no information about race times. Another
is the Mohs scale of mineral hardness, which characterizes the hardness of various minerals
through the ability of a harder material to scratch a softer one, saying nothing about the actual
hardness of any of them. Yet another example is military ranks; they have an order, but no
well-defined numerical difference between ranks.
When using an ordinal scale, the central tendency of a group of items can be described by using
the group's mode (or most common item) or its median (the middle-ranked item), but the mean
(or average) cannot be defined.
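As a rough sketch of how the median of ordinal data can be found from rank order alone, the following Python snippet uses a hypothetical set of survey responses. The category list, the `RANK` lookup and the sample data are assumptions made for the example; `median_low` is used because it always returns one of the observed ranks rather than averaging two of them.

```python
from statistics import median_low, mode

# Hypothetical ordered response categories, from lowest to highest.
LEVELS = ["very unsatisfied", "unsatisfied", "neutral", "satisfied", "very satisfied"]
RANK = {label: i for i, label in enumerate(LEVELS)}

# Hypothetical survey responses.
responses = ["satisfied", "neutral", "very satisfied", "satisfied", "unsatisfied"]

# Only the order of the categories is used: map to ranks, take the middle rank,
# and map back to a label. No arithmetic is done on the ranks themselves.
ranks = [RANK[r] for r in responses]
median_label = LEVELS[median_low(ranks)]

# The mode is also available, exactly as for nominal data.
mode_label = mode(responses)

print(median_label, mode_label)
```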
In 1946, Stevens observed that psychological measurement usually operates on ordinal scales,
and that ordinary statistics like means and standard deviations do not have valid interpretations.
Nevertheless, such statistics can often be used to generate fruitful information, with the caveat
that caution should be taken in drawing conclusions from such statistical data.
Psychometricians like to theorise that psychometric tests produce interval scale measures of
cognitive abilities (e.g. Lord & Novick, 1968; von Eye, 2005) but there is little prima facie
evidence to suggest that such attributes are anything more than ordinal for most psychological
data (Cliff, 1996; Cliff & Keats, 2003; Michell, 2008). In particular,[2] IQ scores reflect an
ordinal scale, in which all scores are only meaningful for comparison, rather than an interval
scale, in which a given number of IQ "points" corresponds to a unit of intelligence.[3][4][5] Thus it
is an error to write that an IQ of 160 is just as different from an IQ of 130 as an IQ of 100 is
different from an IQ of 70.[6][7]
In mathematical order theory, an ordinal scale defines a total preorder of objects (in essence, a
way of sorting all the objects, in which some may be tied). The scale values themselves (such as
labels like "great", "good", and "bad"; 1st, 2nd, and 3rd) have a total order, where they may be
sorted into a single line with no ambiguities. If numbers are used to define the scale, they remain
correct even if they are transformed by any monotonically increasing function. This property is
known as the order isomorphism. A simple example follows:
                            Judge's score   Score minus 8   Tripled score   Cubed score
                            x               x-8             3x              x^3
Alice's cooking ability     10              2               30              1000
Bob's cooking ability       9               1               27              729
Claire's cooking ability    8.5             0.5             25.5            614.125
Dana's cooking ability      8               0               24              512
Edgar's cooking ability     5               -3              15              125
Since x-8, 3x, and x^3 are all monotonically increasing functions, replacing the ordinal judge's
score by any of these alternate scores does not affect the relative ranking of the five people's
cooking abilities. Each column of numbers is an equally legitimate ordinal scale for describing
their abilities. However, the numerical (additive) difference between the various ordinal scores
has no particular meaning.
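The invariance of the ranking can be checked directly. The sketch below reuses the judge's scores from the table; the helper `ranking` and the dictionary names are illustrative choices, not part of any standard method. It also shows that equal gaps in the raw scores become unequal after cubing, which is why differences between ordinal scores carry no meaning.

```python
# Judge's scores from the table above.
scores = {"Alice": 10, "Bob": 9, "Claire": 8.5, "Dana": 8, "Edgar": 5}

def ranking(values):
    """Return the names sorted from highest to lowest score."""
    return sorted(values, key=values.get, reverse=True)

# Three monotonically increasing transformations, plus the identity.
transforms = {
    "x": lambda x: x,
    "x - 8": lambda x: x - 8,
    "3x": lambda x: 3 * x,
    "x^3": lambda x: x ** 3,
}

# Apply each transformation and recompute the ranking.
rankings = {
    name: ranking({person: f(s) for person, s in scores.items()})
    for name, f in transforms.items()
}

# Alice-Bob and Bob-Dana are both 1 point apart in the raw scores,
# but the corresponding gaps differ once the scores are cubed.
raw_gaps = (scores["Alice"] - scores["Bob"], scores["Bob"] - scores["Dana"])
cubed_gaps = (10 ** 3 - 9 ** 3, 9 ** 3 - 8 ** 3)

for name, order in rankings.items():
    print(name, order)
print(raw_gaps, cubed_gaps)
```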
See also Strict weak ordering.
Interval scale
Quantitative attributes are all measurable on interval scales, as any difference between the levels
of an attribute can be multiplied by any real number to exceed or equal another difference. A
highly familiar example of interval scale measurement is temperature with the Celsius scale. In
this particular scale, the unit of measurement is 1/100 of the temperature difference between the
freezing and boiling points of water under a pressure of 1 atmosphere. The "zero point" on an
interval scale is arbitrary; and negative values can be used. The formal mathematical term is an
affine space (in this case an affine line). The Likert scale, which is one of the most common
scales used in survey research, is often treated as an interval scale in practice, although
strictly speaking it is an ordinal scale (see above). Variables measured at the interval level
are called "interval variables" or sometimes "scaled variables" as they have units of measurement.
Ratios between numbers on the scale are not meaningful, so operations such as multiplication
and division cannot be carried out directly. But ratios of differences can be expressed; for
example, one difference can be twice another.
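A small numerical check makes this concrete. The sketch below converts three hypothetical Celsius readings to Fahrenheit (a positive affine transformation) and compares ratios of values with ratios of differences; the temperatures and the function name `c_to_f` are chosen only for illustration.

```python
def c_to_f(c):
    """Convert Celsius to Fahrenheit (a positive affine transformation)."""
    return c * 9 / 5 + 32

# Hypothetical temperatures in degrees Celsius.
a, b, c = 10.0, 20.0, 40.0

# Ratios of values depend on the arbitrary zero point, so they change with the scale.
ratio_c = b / a
ratio_f = c_to_f(b) / c_to_f(a)

# Ratios of differences do not depend on the zero point or the unit.
diff_ratio_c = (c - b) / (b - a)
diff_ratio_f = (c_to_f(c) - c_to_f(b)) / (c_to_f(b) - c_to_f(a))

print(ratio_c, ratio_f)
print(diff_ratio_c, diff_ratio_f)
```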
The central tendency of a variable measured at the interval level can be represented by its mode,
its median, or its arithmetic mean. Statistical dispersion can be measured in most of the usual
ways, which just involve differences or averaging, such as range, interquartile range, and
standard deviation. Since one cannot divide, one cannot define measures that require a ratio, such
as studentized range or coefficient of variation. More subtly, while one can define moments
about the origin, only central moments are useful, since the choice of origin is arbitrary and not
meaningful. One can define standardized moments, since ratios of differences are meaningful,
but one cannot define coefficient of variation, since the mean is a moment about the origin,
unlike the standard deviation, which is (the square root of) a central moment.
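The contrast between the coefficient of variation and a standardized moment can be illustrated numerically. In this sketch the temperature readings are invented, and the two helper functions are written out for the example rather than taken from a library. Converting the same readings from Celsius to Fahrenheit changes the coefficient of variation but leaves the standardized third moment (skewness) unchanged.

```python
from statistics import mean, pstdev

# Hypothetical temperature readings, in Celsius and converted to Fahrenheit.
celsius = [10.0, 20.0, 30.0, 60.0]
fahrenheit = [c * 9 / 5 + 32 for c in celsius]

def coefficient_of_variation(xs):
    """Standard deviation divided by the mean (a moment about the origin)."""
    return pstdev(xs) / mean(xs)

def standardized_third_moment(xs):
    """Third central moment divided by the cubed standard deviation."""
    m, s = mean(xs), pstdev(xs)
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)

cv_c = coefficient_of_variation(celsius)
cv_f = coefficient_of_variation(fahrenheit)
skew_c = standardized_third_moment(celsius)
skew_f = standardized_third_moment(fahrenheit)

print(cv_c, cv_f)      # differ: the result depends on the arbitrary zero point
print(skew_c, skew_f)  # agree up to floating-point rounding
```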
Ratio measurement
Most measurement in the physical sciences and engineering is done on ratio scales. Mass, length,
time, plane angle, energy and electric charge are examples of physical measures that are ratio
scales. The scale type takes its name from the fact that measurement is the estimation of the ratio
between a magnitude of a continuous quantity and a unit magnitude of the same kind (Michell,
1997, 1999). Informally, the distinguishing feature of a ratio scale is the possession of a
non-arbitrary zero value. For example, the Kelvin temperature scale has a non-arbitrary zero point of
absolute zero, which is denoted 0 K and is equal to -273.15 degrees Celsius. This zero point is
non-arbitrary as the particles that compose matter at this temperature have zero kinetic energy.
Examples of ratio scale measurement in the behavioral sciences are all but non-existent. Luce
(2000) argues that an example of ratio scale measurement in psychology can be found in rank
and sign dependent expected utility theory.
All statistical measures can be used for a variable measured at the ratio level, as all necessary
mathematical operations are defined. The central tendency of a variable measured at the ratio
level can be represented by, in addition to its mode, its median, or its arithmetic mean, also its
geometric mean or harmonic mean. In addition to the measures of statistical dispersion defined
for interval variables, such as range and standard deviation, for ratio variables one can also
define measures that require a ratio, such as studentized range or coefficient of variation.
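As an illustration, the following sketch computes the additional ratio-level statistics for a few hypothetical masses and checks that ratios and the coefficient of variation are unaffected by a change of unit. The data and the approximate conversion factor `KG_TO_LB` are assumptions made for the example; `geometric_mean` and `harmonic_mean` come from Python's standard `statistics` module (Python 3.8 or later).

```python
from statistics import geometric_mean, harmonic_mean, mean, pstdev

# Hypothetical masses in kilograms, and the same masses in pounds.
masses_kg = [2.0, 4.0, 8.0]
KG_TO_LB = 2.20462  # approximate conversion factor
masses_lb = [m * KG_TO_LB for m in masses_kg]

# Means that are only defined at the ratio level.
gm_kg = geometric_mean(masses_kg)
hm_kg = harmonic_mean(masses_kg)

# A change of unit is a multiplication, so ratios of values are preserved.
ratio_kg = masses_kg[2] / masses_kg[0]
ratio_lb = masses_lb[2] / masses_lb[0]

# The coefficient of variation is likewise unit-free on a ratio scale.
cv_kg = pstdev(masses_kg) / mean(masses_kg)
cv_lb = pstdev(masses_lb) / mean(masses_lb)

print(gm_kg, hm_kg)
print(ratio_kg, ratio_lb)
print(cv_kg, cv_lb)
```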
Debate on classification scheme
There has been, and continues to be, debate about the merits of the classifications, particularly in
the cases of the nominal and ordinal classifications (Michell, 1986). Thus, while Stevens'
classification is widely adopted, it is by no means universally accepted.[8]
Duncan (1986) observed that Stevens' classification of nominal measurement is contrary to his own
definition of measurement. Stevens (1975) said of his own definition of measurement that "the
assignment can be any consistent rule. The only rule not allowed would be random assignment,
for randomness amounts in effect to a nonrule". However, so-called nominal measurement
involves arbitrary assignment, and the "permissible transformation" is any number for any other.
This is one of the points made in Lord's (1953) satirical paper "On the Statistical Treatment of
Football Numbers".
Among those who accept the classification scheme, there is also some controversy in behavioural
sciences over whether the mean is meaningful for ordinal measurement. In terms of measurement
theory, it is not, because the arithmetic operations are not made on numbers that are
measurements in units, and so the results of computations do not give numbers in units.
However, many behavioural scientists use means for ordinal data anyway. This is often justified
on the basis that ordinal scales in behavioural science are really somewhere between true ordinal
and interval scales; although the interval difference between two ordinal ranks is not constant, it
is often of the same order of magnitude. For example, applications of measurement models in
educational contexts often indicate that total scores have a fairly linear relationship with
measurements across a range of an assessment. Thus, some argue that so long as the unknown
interval difference between ordinal scale ranks is not too variable, interval scale statistics such as
means can meaningfully be used on ordinal scale variables. Statistical analysis software such as
PSPP requires the user to select the appropriate measurement class for each variable. This is
intended to prevent the user from inadvertently performing meaningless analyses (for example,
correlation analysis with a variable on a nominal level).
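The idea of tying each variable to a declared measurement level can be sketched as follows. This is a hypothetical illustration and does not reproduce PSPP's interface or behaviour: the `Level` enumeration, the `MIN_LEVEL` table and the `summarize` function are invented for the example, with the minimum levels taken from Stevens' table of permissible statistics.

```python
from enum import IntEnum
from statistics import geometric_mean, mean, median, mode

class Level(IntEnum):
    """Stevens' four scale types, ordered from weakest to strongest."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4

# Weakest level at which each statistic is permitted (per Stevens' table).
MIN_LEVEL = {
    "mode": Level.NOMINAL,
    "median": Level.ORDINAL,
    "mean": Level.INTERVAL,
    "geometric_mean": Level.RATIO,
}

FUNCS = {
    "mode": mode,
    "median": median,
    "mean": mean,
    "geometric_mean": geometric_mean,
}

def summarize(values, level, statistic):
    """Compute a statistic only if the declared level permits it."""
    required = MIN_LEVEL[statistic]
    if level < required:
        raise ValueError(
            f"{statistic} requires at least {required.name} data, got {level.name}"
        )
    return FUNCS[statistic](values)

print(summarize(["red", "blue", "red"], Level.NOMINAL, "mode"))
print(summarize([1, 2, 2, 3], Level.ORDINAL, "median"))
```

A guard of this kind only enforces the level the user declares; it cannot tell whether that declaration is appropriate for the data.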
L. L. Thurstone made progress toward developing a justification for obtaining interval-level
measurements based on the law of comparative judgment. For a common application of the law,
see the Analytic Hierarchy Process. Further progress was made by Georg Rasch (1960), who
developed the probabilistic Rasch model that provides a theoretical basis and justification for
obtaining interval-level measurements from counts of observations such as total scores on
assessments.
Another issue is derived from Nicholas R. Chrisman's article "Rethinking Levels of
Measurement for Cartography",[9] in which he introduces an expanded list of levels of
measurement to account for various measurements that do not necessarily fit with the traditional
notion of levels of measurement. Measurements bound to a range and repeating (like degrees in a
circle, time, etc.), graded membership categories, and other types of measurement do not fit
Stevens' original work, leading to the introduction of six new levels of measurement, for a total
of ten: (1) Nominal, (2) Graded membership, (3) Ordinal, (4) Interval, (5) Log-Interval, (6) Extensive
Ratio, (7) Cyclical Ratio, (8) Derived Ratio, (9) Counts and finally (10) Absolute. The extended
levels of measurement are rarely used outside of academic geography.
Scale types and Stevens' "operational theory of measurement"
The theory of scale types is the intellectual handmaiden to Stevens' "operational theory of
measurement", which was to become definitive within psychology and the behavioral sciences,
despite Michell's characterization of it as being quite at odds with his understanding of
measurement in the natural sciences (Michell, 1999). Essentially, the operational theory of
measurement was a reaction to the conclusions of a committee established in 1932 by the British
Association for the Advancement of Science to investigate the possibility of genuine scientific
measurement in the psychological and behavioral sciences. This committee, which became
known as the Ferguson committee, published a Final Report (Ferguson, et al., 1940, p. 245) in
which Stevens' sone scale (Stevens & Davis, 1938) was an object of criticism:
“…any law purporting to express a quantitative relation between sensation intensity and
stimulus intensity is not merely false but is in fact meaningless unless and until a meaning
can be given to the concept of addition as applied to sensation.”
That is, if Stevens' sone scale was genuinely measuring the intensity of auditory sensations, then
evidence for such sensations as being quantitative attributes must be produced. The evidence
needed was the presence of additive structure - a concept comprehensively treated by the
German mathematician Otto Hölder (Hölder, 1901). Given that the physicist and measurement
theorist Norman Robert Campbell dominated the Ferguson committee's deliberations, the
committee concluded that measurement in the social sciences was impossible due to the lack of
concatenation operations. This conclusion was later rendered false by the discovery of the theory
of conjoint measurement by Debreu (1960) and independently by Luce & Tukey (1964).
However, Stevens' reaction was not to conduct experiments to test for the presence of additive
structure in sensations, but instead to render the conclusions of the Ferguson committee null and
void by proposing a new theory of measurement:
“Paraphrasing N.R. Campbell (Final Report, p.340), we may say that measurement, in the
broadest sense, is defined as the assignment of numerals to objects and events according to
rules” (Stevens, 1946, p.677).
Stevens was greatly influenced by the ideas of another Harvard academic, the Nobel laureate
physicist Percy Bridgman (1927), whose doctrine of operationism Stevens used to define
measurement. In Stevens' definition, for example, it is the use of a tape measure that defines
length (the object of measurement) as being measurable (and so by implication quantitative).
Critics of operationism object that it confuses the relations between two objects or events for
properties of one of those objects or events (Hardcastle, 1995; Michell, 1999; Moyer, 1981a,b;
Rogers, 1989).
The Canadian measurement theorist William Rozeboom (1966) was an early and trenchant critic
of Stevens' theory of scale types. But it was not until much later, with the work of mathematical
psychologists Theodore Alper (1985, 1987), Louis Narens (1981a, b) and R. Duncan Luce (1986,
1987, 2001), that the concept of scale types received the mathematical rigour that it lacked at its
inception. As Luce (1997, p. 395) bluntly stated:
“S.S. Stevens (1946, 1951, 1975) claimed that what counted was having an interval or ratio
scale. Subsequent research has given meaning to this assertion, but given his attempts to
invoke scale type ideas it is doubtful if he understood it himself…no measurement theorist I
know accepts Stevens' broad definition of measurement…in our view, the only sensible
meaning for 'rule' is empirically testable laws about the attribute.”
Notes
1. ^ Stevens, S. S. (1946). "On the Theory of Scales of Measurement". Science 103 (2684): 677–680.
doi:10.1126/science.103.2684.677. PMID 17750512.
2. ^ Sheskin, David J. (2007). Handbook of Parametric and Nonparametric Statistical Procedures
(Fourth ed.). Boca Raton (FL): Chapman & Hall/CRC. p. 3. ISBN 9781584888147. Lay summary (27
July 2010). "Although in practice IQ and most other human characteristics measured by
psychological tests (such as anxiety, introversion, self esteem, etc.) are treated as interval scales,
many researchers would argue that they are more appropriately categorized as ordinal scales.
Such arguments would be based on the fact that such measures do not really meet the
requirements of an interval scale, because it cannot be demonstrated that equal numerical
differences at different points on the scale are comparable."
3. ^ Mussen, Paul Henry (1973). Psychology: An Introduction. Lexington (MA): Heath. p. 363.
ISBN 0-669-61383-7. "The I.Q. is essentially a rank; there are no true "units" of intellectual
ability."
4. ^ Truch, Steve (1993). The WISC-III Companion: A Guide to Interpretation and Educational
Intervention. Austin (TX): Pro-Ed. p. 35. ISBN 0890795851. "An IQ score is not an equal-interval
score, as is evident in Table A.4 in the WISC-III manual."
5. ^ Bartholomew, David J. (2004). Measuring Intelligence: Facts and Fallacies. Cambridge:
Cambridge University Press. p. 50. ISBN 9780521544788. Lay summary (27 July 2010). "When we
come to quantities like IQ or g, as we are presently able to measure them, we shall see later that
we have an even lower level of measurement—an ordinal level. This means that the numbers
we assign to individuals can only be used to rank them—the number tells us where the
individual comes in the rank order and nothing else."
6. ^ Eysenck, Hans (1998). Intelligence: A New Look. New Brunswick (NJ): Transaction Publishers.
pp. 24–25. ISBN 1-56000-360-X. "Ideally, a scale of measurement should have a true zero-point
and identical intervals. . . . Scales of hardness lack these advantages, and so does IQ. There is no
absolute zero, and a 10-point difference may carry different meanings at different points of the
scale."
7. ^ Mackintosh, N. J. (1998). IQ and Human Intelligence. Oxford: Oxford University Press. pp. 30–
31. ISBN 0-19-852367-X. "In the jargon of psychological measurement theory, IQ is an ordinal
scale, where we are simply rank-ordering people. . . . It is not even appropriate to claim that the
10-point difference between IQ scores of 110 and 100 is the same as the 10-point difference
between IQs of 160 and 150"
8. ^ Velleman, Paul F.; Wilkinson, Leland (1993). "Nominal, Ordinal, Interval, and Ratio Typologies
Are Misleading". The American Statistician (American Statistical Association) 47 (1): 65–72.
doi:10.2307/2684788. JSTOR 2684788.
9. ^ Chrisman, Nicholas R. (1998). Rethinking Levels of Measurement for Cartography. Cartography
and Geographic Information Science, vol. 25 (4), pp. 231-242
See also
Measure (mathematics)
Inter-rater reliability
Cohen's kappa
Category theory
Quantitative data
Qualitative data
Ramsey–Lewis method
References
Alper, T. M. (1985). A note on real measurement structures of scale type (m, m + 1). Journal of
Mathematical Psychology, 29, 73–81.
Alper, T.M. (1987). A classification of all order-preserving homeomorphism groups of the reals
that satisfy finite uniqueness. Journal of Mathematical Psychology, 31, 135–154.
Briand, L. & El Emam, K. & Morasca, S. (1995). On the Application of Measurement Theory in
Software Engineering. Empirical Software Engineering, 1, 61–88. [On line]
http://www2.umassd.edu/swpi/ISERN/isern-95-04.pdf
Babbie, E. (2004). The Practice of Social Research, 10th edition, Wadsworth, Thomson Learning
Inc., ISBN 0-534-62029-9
Cliff, N. (1996). Ordinal Methods for Behavioral Data Analysis. Mahwah, NJ: Lawrence Erlbaum.
ISBN 0-8058-1333-0
Cliff, N. & Keats, J. A. (2003). Ordinal Measurement in the Behavioral Sciences. Mahwah, NJ:
Erlbaum. ISBN 0-8058-2093-0
Lord, Frederic M (December 1953). "On the Statistical Treatment of Football Numbers".
American Psychologist 8 (12): 750–751. doi:10.1037/h0063675. Retrieved 16 September 2010
See also reprints in:
Readings in Statistics, Ch. 3, (Haber, A., Runyon, R.P., and Badia, P.) Reading, Mass:
Addison-Wesley, 1970.
Maranell, Gary Michael, ed (2007). "Chapter 31". Scaling: A Sourcebook for Behavioral Scientists.
New Brunswick, New Jersey & London, UK: Aldine Transaction. pp. 402–405.
ISBN 978-0-202-36175-8. Retrieved 16 September 2010
Lord, F.M., & Novick, M.R. (1968). Statistical theories of mental test scores. Reading, MA:
Addison-Wesley.
Luce, R.D. (1986). Uniqueness and homogeneity of ordered relational structures. Journal of
Mathematical Psychology, 30, 391–415.
Luce, R.D. (1987). Measurement structures with Archimedean ordered translation groups.
Order, 4, 165–189.
Luce, R.D. (1997). Quantification and symmetry: commentary on Michell 'Quantitative science
and the definition of measurement in psychology'. British Journal of Psychology, 88, 395–398.
Luce, R.D. (2000). Utility of uncertain gains and losses: measurement theoretic and experimental
approaches. Mahwah, N.J.: Lawrence Erlbaum.
Luce, R.D. (2001). Conditions equivalent to unit representations of ordered relational structures.
Journal of Mathematical Psychology, 45, 81–98.
Luce, R.D. & Tukey, J.W. (1964). Simultaneous conjoint measurement: a new scale type of
fundamental measurement. Journal of Mathematical Psychology, 1, 1–27.
Michell, J. (1986). Measurement scales and statistics: a clash of paradigms. Psychological
Bulletin, 100 (3), 398–407.
Michell, J. (1997). Quantitative science and the definition of measurement in psychology. British
Journal of Psychology, 88, 355–383.
Michell, J. (1999). Measurement in Psychology – A critical history of a methodological concept.
Cambridge: Cambridge University Press.
Michell, J. (2008). Is psychometrics pathological science? Measurement – Interdisciplinary
Research & Perspectives, 6, 7–24.
Narens, L. (1981a). A general theory of ratio scalability with remarks about the
measurement-theoretic concept of meaningfulness. Theory and Decision, 13, 1–70.
Narens, L. (1981b). On the scales of measurement. Journal of Mathematical Psychology, 24,
249–275.
Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Copenhagen:
Danish Institute for Educational Research.
Rozeboom, W.W. (1966). Scaling theory and the nature of measurement. Synthese, 16, 170–233.
Stevens, S.S (June 7, 1946). "On the Theory of Scales of Measurement". Science 103 (2684): 677–
680. doi:10.1126/science.103.2684.677. PMID 17750512. Retrieved 16 September 2010
Stevens, S.S. (1951). Mathematics, measurement and psychophysics. In S.S. Stevens (Ed.),
Handbook of experimental psychology (pp. 1–49). New York: Wiley.
Stevens, S.S. (1975). Psychophysics. New York: Wiley.
von Eye, A. (2005). Review of Cliff and Keats, Ordinal measurement in the behavioral sciences.
Applied Psychological Measurement, 29, 401–403.
External links
Hyperstat — Measurement Scales
Measurement theory: Frequently asked questions
Retrieved from "http://en.wikipedia.org/wiki/Level_of_measurement"


This page was last modified on 19 August 2011 at 13:46.
Text is available under the Creative Commons Attribution-ShareAlike License; additional terms
may apply. See Terms of use for details.