FIRST DRAFT


IS TRUE SCORE A LATENT VARIABLE?

Thomas R. Knapp & Hak P. Tam

Introduction

In our opinion, the answer to the question posed as the title of this paper is either "no" or "it depends upon what you mean by latent".

But the topic is controversial, as the stances taken by various writers in the measurement literature show. We attempt to summarize those stances in what follows.

The extremists

The prototypical case of the argument that true score is a latent variable was provided by Frank Schmidt and John Hunter (1999) in their article concerning the measurement of intelligence. They claimed that the obtained correlation between any pair of alleged measures of intelligence should be routinely corrected for attenuation (unreliability) in order to provide an estimate of the "true" correlation between the underlying constructs. That is, they equated the traditional estimate of the correlation between true scores with a correlation between latent variables.
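
The correction for attenuation is conventionally computed as the obtained correlation divided by the square root of the product of the two reliability coefficients. The following is a minimal sketch in Python; the function name and the numerical values are hypothetical, chosen only for illustration, and are not taken from Schmidt and Hunter's article.

```python
import math


def correct_for_attenuation(r_xy, r_xx, r_yy):
    """Estimate the correlation between true scores from the obtained
    correlation r_xy and the reliabilities r_xx and r_yy."""
    if not (0 < r_xx <= 1 and 0 < r_yy <= 1):
        raise ValueError("reliability coefficients must lie in (0, 1]")
    return r_xy / math.sqrt(r_xx * r_yy)


# Hypothetical values, for illustration only.
r_corrected = correct_for_attenuation(r_xy=0.56, r_xx=0.80, r_yy=0.70)
print(round(r_corrected, 3))
```

Note that the result is still a statement about T, that is, about scores on the same scales as the obtained scores. Whether it is also a statement about underlying constructs is the point in dispute.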

In their commentary regarding the Schmidt and Hunter article, Denny Borsboom and Gideon Mellenbergh (2002) vehemently disagreed.

They argued, among other things, that in the context of classical test theory (CTT) true score is a concept that is strictly associated with reliability via X = T + E, and although T is unknown and unobservable (as is the corresponding E), it is not latent. A discussion of latent variables, in their view, necessarily involves considerations that are fundamental to validity, not reliability. (See also Borsboom, 2005.)

Curiously, Schmidt and Hunter did not write a rejoinder to the argument made by Borsboom and Mellenbergh.

We happen to agree with Borsboom and Mellenbergh, as the example at the end of this paper will illustrate, but there is much more to the story.

Some other points of view

In his book on reliability for the social sciences, Ross Traub (1994) did not refer to true score as a latent variable, but in a later article (1997) and in recent correspondence (2006) he argued that it is a latent variable.

Graham Dunn (2004) discussed two kinds of true scores, τ_i and S_i. The former he regarded as an unmeasurable and error-free latent variable and the latter a possibly error-prone standard. The two are related by the equation S_i = α_s + β_s τ_i + e_is, where α_s is the intercept and β_s is the slope for the regression of S on τ, and e_is is the error term for that regression. In recent correspondence (2006) he reiterated his claim that τ is latent.

In his article concerning congeneric and tau-equivalent reliability, James Graham (2006) refers to "...the latent trait being measured (T)...in the equation X = T + E" (p. 931). He therefore agrees with Schmidt and Hunter (1999).

In response to an e-mail message from one of us (Knapp), Leonard Feldt (2006) wrote in part: "... I am a little unhappy with ... use of the term 'latent variable' to refer to true score and error score. In IRT [Item Response Theory] and other contexts this term carries other implications.... Using the term 'latent variable' simply as a synonym for score components that are not directly observable would not seem to accomplish this goal [of making other approaches to reliability more accessible]."

In response to the other one of us (Tam), Gregory Hancock (2006) said: "It depends. For a single test score that is presumed to be unidimensional, then yes -- I see the true score as an unmeasured (and hence latent) variable that contributes to the measured variable score in addition to error. For scores that are multidimensional, there are multiple underlying constructs. In this case one can conceive of a true score as existing, but as being the result of many contributing latent variables as well as error."

And in response to another e-mail message from Tam, Michael Browne (2006) wrote: "You would get different answers to this question from different people but I would regard the true score in classical test theory as a latent variable."

The nihilists

Two of the most outspoken critics of classical test theory were the Australian psychologists John Ross and James Lumsden (Ross & Lumsden, 1968; Lumsden, 1976), who argued that the concept of true score should be done away with. They were later joined in their advocacy of the elimination of the concept of true score by Norman Cliff (1979).

The silent majority

Most measurement theorists seem to have been content to discuss obtained scores, true scores, and error scores without using the terms "latent variable" or "construct" except when referring to the factor analysis (exploratory and/or confirmatory) of measurement data.

Most prominent among them was Harold Gulliksen (1950), who derived most of the results of classical test theory in two ways: (1) by defining E as random error and letting T fall out by subtraction of E from X; and (2) by defining T as the expected value of X across repeated independent administrations of a measuring instrument and letting E fall out by subtraction of T from X. Neither approach involved anything "latent".
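
The second of these approaches can be illustrated with a small simulation. The sketch below is not Gulliksen's derivation. It assumes, purely for illustration, that each person answers each of 50 items correctly with a fixed, person-specific probability, so that T is 50 times that probability and E falls out as X - T. All names and numerical settings are hypothetical.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is reproducible

N_PERSONS = 2000  # hypothetical sample size
N_ITEMS = 50      # hypothetical test length

# Assumption: person-specific probability of a correct response.
probs = [random.uniform(0.3, 0.9) for _ in range(N_PERSONS)]

# T is defined as the expected value of X for each person.
T = [N_ITEMS * p for p in probs]

# X is one obtained score per person (one administration).
X = [sum(random.random() < p for _ in range(N_ITEMS)) for p in probs]

# E falls out by subtraction of T from X.
E = [x - t for x, t in zip(X, T)]

mean_E = statistics.fmean(E)
var_X = statistics.pvariance(X)
var_T = statistics.pvariance(T)
var_E = statistics.pvariance(E)
reliability = var_T / var_X  # ratio of true-score to obtained-score variance

print(f"mean of E: {mean_E:.3f}")
print(f"var(X): {var_X:.2f}  var(T) + var(E): {var_T + var_E:.2f}")
print(f"reliability: {reliability:.3f}")
```

Under these assumptions the mean of E should be near zero and var(X) should be close to var(T) + var(E), with T, E, and X all expressed on the same number-correct scale.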

What is a latent variable?

Kenneth Bollen (2002) provided several non-formal definitions (e.g., a latent variable is a "hypothetical" variable) and four different formal definitions. (See also Bollen, 1989.) The first formal definition was based upon the difficult technical concept of local independence; the second was based upon the expected value of obtained scores (which he attributed to Frederic Lord and Melvin Novick in their 1968 classic textbook); the third was based upon a non-deterministic function of obtained scores; and the fourth was his own approach based upon something called "sample realization". With so many different definitions, it is no wonder that people cannot all agree on whether true score is a latent variable!

An example

Consider a spelling test developed for, say, eighth graders, which consists of 50 words randomly sampled from an unabridged dictionary that are to be dictated aloud by an examiner. The test-takers are asked to write down the spelling of each word on a piece of paper containing the numbers from 1 to 50. An individual's score is the number of words that are spelled correctly.

A person's obtained (observed) score on such a test, X, is obviously not latent, since it is both known and manifest. But how about T and E? T has been variously referred to as the score that a person "should have" gotten (in some sort of Platonic sense) and/or as the score that is the expected value of X over a large number of independent hypothetical administrations of tests "like" the one in question (in this case, other 50-item forms of words randomly drawn from the same dictionary). Is there anything latent about that? We think not.
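
The expected-value reading of T for this spelling test can be sketched in a few lines of Python. The dictionary size, the number of words the person can spell, and the number of hypothetical forms are all made-up values, and the sketch assumes for simplicity that the person either can or cannot spell any given word.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is reproducible

DICTIONARY_SIZE = 100_000  # hypothetical number of words in the dictionary
N_KNOWN = 62_000           # hypothetical number of words this person can spell
TEST_LENGTH = 50           # words per form, as in the example
N_FORMS = 5_000            # hypothetical number of parallel forms


def administer_form():
    """Draw one 50-word form at random and return the number spelled correctly.

    Words are labeled 0..DICTIONARY_SIZE-1; by an arbitrary labeling,
    the person can spell exactly those with a label below N_KNOWN.
    """
    words = random.sample(range(DICTIONARY_SIZE), TEST_LENGTH)
    return sum(1 for w in words if w < N_KNOWN)


# T: expected number correct over forms drawn from the same dictionary.
true_score = TEST_LENGTH * N_KNOWN / DICTIONARY_SIZE

scores = [administer_form() for _ in range(N_FORMS)]  # obtained scores X
errors = [x - true_score for x in scores]             # E = X - T

print(f"T = {true_score}")
print(f"mean of X over forms = {statistics.fmean(scores):.2f}")
print(f"mean of E over forms = {statistics.fmean(errors):.2f}")
```

In this sketch T is a number of words spelled correctly, just as X is; it is unknown in practice only because the hypothetical forms are never administered.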

"Latent" and "unobservable" aren't necessarily the same thing. X, T, and E are all on the same scale (number of words spelled correctly). There may be other variables that are "underlying constructs", such as Verbal Fluency and Memory, perhaps, but any intellectualization of them would necessitate an excursion into validity, as Borsboom and Mellenbergh (2002) argued.

Classical test theory says nothing about validity, the hidden assumption being that tradition, logic, or expert judgment has been appealed to in the choice of measurement procedure, much as in the measurement of physical dimensions such as height and weight.

Furthermore, classical test theory need not be concerned with "items" as such. Charles Spearman (1910) and William Brown (1910) introduced the concept of part scores, G. Frederic Kuder and Marion Richardson (1937) extended the idea to large numbers of dichotomously-scored items, and Lee Cronbach (1951) went even further by considering items that were scored on continuous scales, but there is nothing in CTT that requires X to be a total score that is a composite of item scores.

Or are we missing something?

References

Bollen, K.A. (1989). Structural equations with latent variables. New York: Wiley.

Bollen, K.A. (2002). Latent variables in psychology and the social sciences. Annual Review of Psychology, 53, 605-634.

Borsboom, D. (2005). Measuring the mind: Conceptual issues in contemporary psychometrics. New York: Cambridge University Press.

Borsboom, D., & Mellenbergh, G.J. (2002). True scores, latent variables, and constructs: A commentary on Schmidt and Hunter. Intelligence, 30, 505-514.

Brown, W. (1910). Some experimental results in the correlation of mental abilities. British Journal of Psychology, 3, 296-322.

Browne, M.W. (December 21, 2006). Personal communication.

Cliff, N. (1979). Test theory without true scores. Psychometrika, 44, 373-393.

Cronbach, L.J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16, 297-334.

Dunn, G. (2004). Statistical evaluation of measurement errors: Design and analysis of reliability studies (2nd ed.). London: Arnold.

Dunn, G. (December 11, 2006). Personal communication.

Feldt, L.S. (December 7, 2006). Personal communication.

Graham, J.M. (2006). Congeneric and (essentially) tau-equivalent estimates of score reliability: What they are and how to use them. Educational and Psychological Measurement, 66 (6), 930-944.

Gulliksen, H. (1950). Theory of mental tests. New York: Wiley.

Hancock, G. (December 20, 2006). Personal communication.

Kuder, G.F., & Richardson, M.W. (1937). The theory of the estimation of test reliability. Psychometrika, 2, 151-160.

Lumsden, J. (1976). Test theory. Annual Review of Psychology, 27, 251-280.

Ross, J., & Lumsden, J. (1968). Attribute and reliability. British Journal of Mathematical and Statistical Psychology, 21, Part 2, 251-263.

Spearman, C. (1910). Correlation calculated from faulty data. British Journal of Psychology, 3, 171-195.

Traub, R.E. (1994). Reliability for the social sciences: Theory and applications. Thousand Oaks, CA: Sage.

Traub, R.E. (1997). Classical test theory in historical perspective. Educational Measurement: Issues and Practice, 16 (4), 8-14.

Traub, R.E. (2006). Personal communication.
