Consequences of the ergodic theorems for
classical test theory, factor analysis, and the
analysis of developmental processes
Peter C.M. Molenaar
The Pennsylvania State University
1. Introduction
The currently dominant approach to statistical analysis in psychology and
biomedicine is based on analysis of inter-individual variation. Differences
between subjects, drawn from a population of subjects, provide the information
for making inferences about states of affairs at the population level (e.g., mean
and/or covariance structure). This approach underlies all standard statistical
analysis techniques such as analysis of variance, regression analysis, path
analysis, factor analysis, cluster analysis, and multilevel modeling techniques.
Whether the data are obtained in cross-sectional or longitudinal designs (or more
elaborate designs such as sequential designs), the statistical analysis is always
focused on the structure of inter-individual variation. Parameters and statistics of
interest are estimated by pooling across subjects, where these subjects are
assumed to be homogeneous in all relevant respects. This is the hallmark of
analysis of inter-individual variation: the sums defining the estimators in statistical
analysis are taken over different subjects randomly drawn from a population of
presumably homogeneous subjects. In mixed modeling the population is
considered to be composed of different sub-populations, but within each
subpopulation subjects again are assumed to be homogeneous.
In the next section definitions will be given of inter-individual variation and
homogeneity of a population of subjects, but the intuitive content of these
concepts is clear. These intuitions would seem to imply that inferences about
states of affairs at the population level obtained by pooling across subjects
constitute general findings that apply to each subject in the homogeneous
population. Yet in general this is not the case. That is, in general it is not true that
inferences about states of affairs at the population level based on analysis of
inter-individual variation apply to any of the individual subjects making up the
population. This negative result is a direct implication of a set of mathematical-statistical theorems: the so-called classical ergodic theorems (cf. Molenaar,
2004). A concise heuristic description of the classical ergodic theorems will be
given below. The main focus of this chapter, however, will be on some of the
implications of these theorems. For instance, it will be shown that classical test
theory is based on assumptions that violate the classical ergodic theorems, and
hence, in a precise sense to be defined later on, the results of classical test
theory do not apply in individual assessments. This, of course, is a serious
shortcoming of classical test theory, because many psychological tests have
been constructed and standardized according to classical test theory and are
applied in the assessment of individual subjects.
Special emphasis will be given to the fact that developmental systems constitute
prime examples of non-ergodic systems having age-dependent statistical
characteristics (mean trends and sequential dependencies). Therefore the
statistical analysis of developmental processes has to be based not on inter-individual variation, as is now the standard approach, but on intra-individual
variation (where the latter type of variation will be defined in the next section). It
will be indicated that the insistence that developmental processes should be
studied at the individual level has a long history in theoretical developmental
psychology. The classical ergodic theorems provide a definite vindication of this
theoretical line of thought.
At the close of this chapter a new statistical modeling technique will be presented
with which it is possible to analyze developmental processes with age-dependent
statistical characteristics at the required intra-individual level. This modeling
technique is based on advanced engineering methods for the analysis of
complex dynamic systems. It will be shown that the new modeling technique
allows for the optimal guidance of ongoing developmental processes at the intra-individual level. Evidently, this opens up entirely new possibilities for applied
developmental psychological science.
2. Preliminaries
In this section definitions will be given of the main concepts used in this chapter.
The given definition of (non-)ergodicity is heuristic; selected references will be
given to the vast literature on ergodic theory for more formal elaborations.
2.1 Unit of analysis. Each actually existing human being can be conceived of as
a high-dimensional integrated system whose behavior evolves as a function of
place and time. In psychology one usually does not consider place, leaving time
as the dimension of main interest. The system includes important functional
subsystems such as the perceptual, emotional, cognitive and physiological
systems, as well as their dynamic interrelationships. The complete set of
measurable time-dependent variables characterizing the system’s behavior can
be represented as the coordinates of a high-dimensional space (cf. Nayfeh &
Balachandran, 1995, Ch. 1), which will be called the behavior space. According
to De Groot (1954), the behavior space contains all the scientifically relevant
information about a person.
The realized values of all measurable variables for a particular individual at
consecutive time points constitute a trajectory (life history) in behavior space.
This trajectory in behavior space is our basic unit of analysis. Accordingly, the
complete set of life histories of a population of human subjects can be
represented as an ensemble of trajectories in the same behavior space.
2.2 Inter- and intra-individual variation. A standard dictionary definition of
variation is: “The degree to which something differs, for example, from a former
state or value, from others of the same type, or from a standard”. The degree to
which something differs implies a comparison, either between different replicates
of the same type of entity (inter-individual variation) or else between consecutive
temporal states of the same individual entity (intra-individual variation). Based on
this dictionary definition and using the construct of an ensemble of life trajectories
defined in the previous section, it is possible to give appropriate definitions of
inter- and intra-individual variation. The following definitions are inspired by
Cattell's (1952) notion of the Data Box.
With respect to an ensemble of trajectories in behavior space, inter-individual
variation is defined as follows: (i) select a fixed subset of variables; (ii) select one
or more fixed time points as measurement occasions; (iii) determine the variation
of the scores on the selected variables at the selected time points by pooling
across subjects. Analysis of inter-individual variation thus defined is called R-technique by Cattell (1952). In contrast, intra-individual variation is defined as
follows: (i) select a fixed subset of variables; (ii) select a fixed subject; (iii)
determine the variation of the scores of the single subject on the selected
variables by pooling across time points. Analysis of intra-individual variation thus
defined is called P-technique by Cattell (1952).
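To make the two definitions concrete, the following sketch (in Python, with purely illustrative dimensions and simulated data; the array and variable names are not part of Cattell's terminology) takes a subjects-by-occasions-by-variables data box and computes an R-technique covariance matrix by pooling across subjects at one fixed occasion, and a P-technique covariance matrix by pooling across occasions for one fixed subject:

import numpy as np

rng = np.random.default_rng(0)
n_subjects, n_occasions, n_vars = 200, 150, 4
data_box = rng.normal(size=(n_subjects, n_occasions, n_vars))   # simulated data box

# R-technique: fix an occasion, pool over subjects (inter-individual variation).
r_slice = data_box[:, 0, :]                # shape (n_subjects, n_vars)
inter_cov = np.cov(r_slice, rowvar=False)  # covariance across subjects

# P-technique: fix a subject, pool over occasions (intra-individual variation).
p_slice = data_box[0, :, :]                # shape (n_occasions, n_vars)
intra_cov = np.cov(p_slice, rowvar=False)  # covariance across time points

print(np.round(inter_cov, 2))
print(np.round(intra_cov, 2))
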
2.3 Ergodicity. We now can present a heuristic definition of ergodicity in terms of
the concepts defined in the previous sections. Ergodicity addresses the following
foundational question: Given the same set of selected variables (of Cattell’s Data
Box), under which conditions will an analysis of inter-individual variation yield the
same results as an analysis of intra-individual variation? To illustrate this
question: under which conditions will factor analysis of inter-individual covariation
yield a factor solution that is equal to factor analysis of intra-individual
covariation? The latter illustration can be rephrased in terms of Cattell’s Data Box
in the following way: Under which conditions will R-technique factor analysis of
inter-individual covariation yield a solution that equals the analogous P-technique
factor solution of intra-individual covariation?
The general answer to this question is provided by the classical ergodic
theorems (cf. Molenaar, 2004; Molenaar, 2003, chapter 3). The answer is: Only if
the ensemble of time-dependent trajectories in behavior space obeys two
rigorous conditions will an analysis of inter-individual variation yield the same
results as an analysis of intra-individual variation. The two conditions concerned
are the following. Firstly, the trajectory of each subject in the ensemble has to
obey exactly the same dynamical laws (homogeneity of the ensemble).
Secondly, each trajectory should have constant statistical characteristics in time
(stationarity, i.e., constant mean level and serial dependencies). In case either
one (or both) of these two conditions is not met, the psychological process
concerned is non-ergodic, i.e., its structure of inter-individual variation will differ
from its structure of intra-individual variation. For a non-ergodic process, the
results obtained in standard analysis of inter-individual variation do not apply at
the individual level of intra-individual variation.
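The following small simulation sketch (illustrative Python with arbitrary numerical values) conveys the heuristic content of this answer: when every trajectory in the ensemble has the same mean and variance and is stationary, the ensemble average and the time average estimate the same quantity; when subjects have person-specific mean levels, the time average of a single subject no longer informs us about the ensemble average, and vice versa:

import numpy as np

rng = np.random.default_rng(1)
n_subjects, n_times = 500, 500

# Homogeneous, stationary ensemble: every subject has mean 5.0 and variance 1.0.
ergodic = rng.normal(loc=5.0, scale=1.0, size=(n_subjects, n_times))
print(ergodic[:, 0].mean())   # ensemble average at one time point: close to 5.0
print(ergodic[0, :].mean())   # time average of one subject: also close to 5.0

# Heterogeneous ensemble: each subject has a person-specific mean level.
person_means = rng.normal(loc=5.0, scale=3.0, size=(n_subjects, 1))
heterogeneous = person_means + rng.normal(size=(n_subjects, n_times))
print(heterogeneous[:, 0].mean())   # ensemble average: still close to 5.0
print(heterogeneous[0, :].mean())   # time average of subject 0: close to that subject's
                                    # own mean level, not necessarily to 5.0
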
The meaning of the homogeneity and stationarity assumptions will be elaborated
more fully in later sections, starting with the section on classical test theory
below. The requirement that each subject in the ensemble should obey the same
dynamical laws is expressed in the language of ergodic theory, which has its
roots in the theoretical foundations of statistical mechanics. Statistical mechanics
arose as the attempt by Boltzmann to explain the equilibrium characteristics of a
homogeneous gas kept under constant pressure and temperature in a container,
where the atoms of the homogeneous gas each obey Newton's laws of motion.
Nowadays ergodic theory is an independent mathematical discipline; standard
introductions are Petersen (1983) and Walters (1982). An excellent recent
monograph is Choe (2005). The theorem which for the ensuing discussion is the
most important one in the set of classical ergodic theorems has been proven by
Birkhoff (1931).
3. The non-ergodicity of classical test theory
Many of the psychological tests currently in use have been constructed according
to the principles of classical test theory. The basic concept in classical test theory
is the concept of true score: each observed score is conceived of as a linear
combination of a true score and an error score. In their authoritative book on
classical test theory, Lord & Novick (1968) define the concept of true score as
follows. They consider a fixed person P, i.e., P is not randomly drawn from some
population but is the given person for which the true score is to be defined. The
true score of P is defined as the expected value of the propensity distribution of
P’s observed scores. The propensity distribution is characterized as a “...
distribution function defined over repeated statistically independent
measurements on the same person” (Lord & Novick, 1968, p. 30). The concept of
error score then follows straightforwardly: the error score is the difference
between the observed score and the true score.
Several aspects of this definition of true score are noteworthy. The definition is
based on the intra-individual variation characterizing a fixed person P. Repeated
administration of the same test to P yields a time series of scores of P, the mean
level of which is defined to be P’s true score. Hence this definition of true score
does not involve any comparison with other persons and therefore is not at all
dependent on inter-individual variation. The single-subject repeated measures
design used to obtain P’s time series of observed scores is akin to standard
psychophysical measurement designs (e.g., Gescheider, 1997). Lord & Novick
(1968) require that the repeated measurements are independent. This implies
that the time series of P’s scores should lack any sequential dependencies
(autocorrelation). At the close of this section we will further discuss the
requirement that repeated measurements have to be independent.
Lord & Novick (1968, p. 30) do not further elaborate their original definition of true
score in the context of intra-individual variation because: “… it is not possible in
psychology to obtain more than a few independent observations”. Instead of
considering an arbitrarily large number of replicated measurements of a single
fixed person P, Lord & Novick (1968, p. 32) shift attention to an alternative
scheme in which an arbitrarily large number of persons is measured at a single
fixed time: “Primarily, test theory treats individual differences or, equivalently, the
distribution of measurements over people”. Apparently it is expected that using
an individual differences approach, valid information can be obtained about the
distinct propensity distributions underlying individual true scores. We will see
shortly that this expectation is unwarranted.
Before focusing in the remainder of their book solely on the latter definition of
true score based on inter-individual variation, Lord & Novick (1968, p.32) make
the following interesting comment about their initial definition of true score based
on intra-individual variation: “The true and error scores defined above [based on
intra-individual variation; PM] are not those primarily considered in test theory …
They are, however, those that would be of interest to a theory that deals with
individuals, rather than with groups (counseling rather than selection)”. This is a
remarkable, though somewhat oblique statement. What is clear is that Lord &
Novick consider a test theory based on their initial concept of true score, defined
as the mean of the intra-individual variation characterizing a fixed person P, to be
“… of interest to a theory that deals with individuals …”. That is, they consider
such a test theory based on intra-individual variation to be important in the
context of individual assessment. But what is not clear is whether they also
consider the alternative concept of true score based on inter-individual variation
(individual differences) to be not of interest to a theory that deals with individuals.
That is, do they imply that classical test theory as we know it is only appropriate
for the assessment of groups and not for individuals? It will be shown that
classical test theory indeed is inappropriate for individual assessment.
To summarize the discussion thus far: Lord & Novick (1968) define the concept
of true score as the expected value of the propensity distribution of the observed
scores of a given individual person P. This definition of true score based on intra-individual variation then is used in an inter-individual context focused on
individual differences, i.e., classical test theory as we know it. This raises the all-important question whether the information provided by individual differences
(inter-individual variation) is able to determine the individual propensity
distributions to a degree which is sufficient to apply the concept of true score
based on intra-individual variation. It is noted that this is exactly the question
concerning the ergodicity of the psychological process concerned: for a given
test, will an analysis of inter-individual variation of test scores yield the same
results as an analysis of intra-individual variation of test scores? To answer this
question it has to be established that the psychological process presumed by
classical test theory to underlie the generation of test scores obeys the two
criteria for ergodicity.
The psychological process which according to classical test theory underlies the
generation of test scores is very simple. It is implicit in the definition of true score
given by Lord & Novick (1968). Each individual person P is assumed to generate
a time series of independent scores in response to repeated administration of the
same test. Each observed score of P’s time series constitutes an independent
random sample drawn from P's propensity distribution. Hence there exists a one-to-one relationship between the time series of P's observed test scores and P's
propensity distribution. The psychological process underlying P’s time series of
observed scores therefore is characterized, according to classical test theory, by
P’s propensity distribution. Statistical analysis of P’s intra-individual variation
boils down to statistical analysis based on P’s propensity distribution. Classical
test theory only considers the first two central moments of P’s propensity
distribution (its mean and its variance).
According to classical test theory the propensity distributions of different persons
have different means and different variances. The true score of person P1 (i.e.,
the mean of the propensity distribution of P1) will in general differ from the true
score of person P2. Also the variance of P1’s observed scores will in general
differ from the variance of P2's observed scores. Hence, given the one-to-one
correspondence between individual time series and individual propensity
distributions noted above, the ensemble involving persons Pi, i=1,2,…, is
populated by time series (propensity distributions) which have different mean
levels (means of the propensity distributions) and different variances. Clearly
such an ensemble is entirely heterogeneous: the psychological process
according to which Pi’s time series of observed scores is generated is different
from the psychological process according to which Pk’s time series of observed
scores is generated because, for i ≠ k, the underlying propensity distribution of Pi
has mean and variance different from Pk’s propensity distribution. Consequently
the ensemble of time series (propensity distributions) violates at least one of the
two criteria for ergodicity: the trajectories (time series) in the ensemble do not
obey the homogeneity criterion for ergodicity, i.e., trajectories associated with
different persons do not obey exactly the same dynamical laws. Stated more
specifically, the random motion characterizing time series of observed scores in
the ensemble has different mean level and variance for different persons.
Consequently, the psychological process which according to classical test theory
underlies the generation of test scores is non-ergodic. That is, it follows from the
classical ergodic theorems that results obtained in an analysis of inter-individual
variation (individual differences) of test scores based on classical test theory do
not apply at the individual level of intra-individual variation. In short, the results
obtained with classical test theory do not apply in the context of individual
assessment.
3.1 Some formal elaborations
We will now present some simple formal elaborations showing the invalidity of
classical test theory for individual assessment. In particular we will focus on the
concept of reliability as defined in classical test theory, show how estimation of
an individual’s true score in classical test theory depends upon the reliability of
the test, and indicate why this leads to invalid inferences. In what follows
expressions related to classical test theory are based on Lord & Novick (1968).
Consider first the situation with respect to the definition of true score based on
intra-individual variation. A particular test has been selected (it will be understood
in the rest of this section that the same test is being considered). Also a particular
person P is given. Let y(P,t), t=1,2,… denote the time series of P’s scores
obtained by repeatedly administering the test. The number of repeated
measurements is left undefined: it is understood that this number can be taken to
be arbitrarily large. Then the true score of P, τ(P), is defined as the expected
value (mean) of y(P,t) across all repeated measurements t. Notice that τ(P) is a
constant. The variance of y(P,t) across all repeated measurements is denoted by
σ²(P). The variance σ²(P) is a measure of the reliability of a single score y(P,t=T)
which is obtained at the T-th repeated measurement (T arbitrary), conceived of
as an indicator of P's true score τ(P). If σ²(P) is large, y(P,t=T) can be very
different from τ(P), whereas if σ²(P) is small its value will be close to τ(P).
To reiterate, in classical test theory one does not consider an arbitrarily large
number of repeated measurements of a single person P, but instead one
considers an arbitrarily large number of persons measured at a single time T. This
is the shift from an intra-individual variation perspective underlying the concept of
true score to an inter-individual variation perspective underlying classical test
theory as we know it. Accordingly we consider an ensemble of time series of test
scores associated with different persons Pi, i=1,2,…, where the number of
persons can be taken arbitrarily large. Associated with each distinct person Pi is
a distinct propensity distribution which has, as explained above, a one-to-one
relationship with the psychological process according to which Pi generates
his/her time series of observed test scores. The mean (true score) of the
propensity distribution of Pi is τ(Pi) and the observed score of Pi is y(Pi,t=T),
where T is arbitrary but fixed. To ease the presentation we will denote τ(Pi) as τi
and y(Pi,t=T) as yi. The error score associated with y(Pi,t=T) = yi is ε(Pi,t=T) and
will be denoted as ε(Pi,t=T) = εi.
We now are ready to express the basic relationships of classical test theory:
(1a) yi = τi + εi, i=1,2,…
(1b) var[yi] = var[τi] + var[εi].
According to (1a) the observed score yi of a randomly selected person Pi is a
linear combination of the true score τi and the error score εi of Pi. According to
(1b) the variance of observed scores across persons consists of a linear
combination of the variance of the true scores across persons and the variance
of the error scores across persons. The reliability ρ of the test then is defined as:
(1c) ρ = var[τi] / {var[τi] + var[εi]}.
Hence the reliability ρ is the proportion of true score variance across persons in
the total variance of observed scores across persons.
Now suppose that the reliability ρ of our test is given and that the observed
score yi of person Pi is also given. Then the following so-called Kelly estimator of the
true score τi of Pi can be defined (cf. Lord & Novick, 1968, p. 65, formula 3.7.2a):
(2a) est[τi | yi] = ρyi + (1 − ρ)μ
where μ is the mean of observed scores across persons. The error variance
associated with the Kelly estimator (2a) is (Lord & Novick, 1968, p. 68, formula
3.8.4a):
(2b) var{est[τi | yi]} = ρ(1 − ρ)var[yi].
Expressions (2a) and (2b) show that the estimate and associated standard error
of a person's true score in classical test theory are a direct function of the test
reliability ρ. The reliability itself is according to (1c) a direct function of the
variance of error scores var[εi] across persons. Hence the Kelly estimate (2a) of a
person's true score is a direct function of the error variance var[εi] across
persons.
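The following sketch collects these classical test theory expressions in a few hypothetical helper functions (the function names and the numerical values, which anticipate the worked example in section 3.3, are illustrative and not part of Lord & Novick's presentation):

def reliability(var_true, var_error):
    # (1c): rho = var[tau] / (var[tau] + var[error])
    return var_true / (var_true + var_error)

def kelly_estimate(y_observed, rho, mu):
    # (2a): est[tau | y] = rho * y + (1 - rho) * mu
    return rho * y_observed + (1.0 - rho) * mu

def kelly_error_variance(var_observed, rho):
    # (2b): var{est[tau | y]} = rho * (1 - rho) * var[y]
    return rho * (1.0 - rho) * var_observed

rho = reliability(var_true=22.5, var_error=2.5)   # 0.9, consistent with var[yi] = 25 in section 3.3
print(kelly_estimate(126.0, rho, mu=100.0))       # 123.4
print(kelly_error_variance(25.0, rho))            # 2.25
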
We have reached the conclusion that in classical test theory based on analysis of
inter-individual variation (individual differences), the estimate of a person’s true
score as well as the standard error of this estimated true score depend directly
upon the reliability ρ of the test. In contrast, it was indicated at the beginning of
this section that the variance σ²(P) of the propensity distribution describing P's
intra-individual variation is a measure of the reliability of a single score y(P,t=T)
estimating P's true score τ(P). Hence we have two different concepts of
reliability: an intra-individual definition in which the reliability is given by σ²(P) and
an inter-individual definition in which the reliability is a direct function of var[εi].
Given that the definition of true score as the mean of a person P's propensity
distribution is the starting point of both concepts of reliability, the definition of
reliability in terms of the intra-individual variance σ²(P) is basic. The question
then arises whether the classical test-theoretical definition of reliability in terms of
the inter-individual error variance var[εi] is a good approximation of σ²(P). The
answer to this question is given by the following expression (Lord & Novick,
1968, p. 35, formula 2.6.4):
(3) var[εi] = Ei[σ²(Pi)]
where Ei denotes the expectation taken over persons Pi, i=1,2,… . Expression (3)
states that the inter-individual error variance var[εi] is the mean of the intra-individual variances of individual propensity distributions across persons Pi, i=1,2,… .
So, coming to our final verdict, how good an approximation is (3) for each of the
individual variances σ²(Pi), i=1,2,… ? Given that the number of persons in the
ensemble is taken to be arbitrarily large, and given that the σ²(Pi), i=1,2,… can
differ arbitrarily according to classical test theory, it is immediately clear that in
general (3) bears no relationship to any of the variances of the individual
propensity distributions. Hence (3) is a poor approximation to the variances σ²(Pi)
of the individual propensity distributions. Suppose that (3) is small, which implies
that the (inter-individual) reliability ρ is high. This leaves entirely open the
possibility that the variance σ²(P) of a given person P's propensity distribution is
arbitrarily large (the psychological process generating test scores is
heterogeneous, hence non-ergodic). Estimation of P's true score by means of the
Kelly estimator (2a) then will yield a severely biased result. Also the standard
error (2b) of this estimate will be severely biased, suggesting an illusorily high
precision of the Kelly estimate. Only the actual value of σ²(P) will provide the
correct precision of taking P's observed score as an estimate of P's true score.
The true value of σ²(P) can only be estimated in an analysis of P's intra-individual
variation. That is, the test should be repeatedly administered to P, yielding a time
series of P’s observed scores. The mean of P’s time series of observed scores
constitutes an unbiased estimate of P’s true score, and the standard deviation of
P’s time series of observed scores provides an unbiased estimate of the
precision of P’s estimated true score.
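As a sketch of this intra-individual route, the following illustrative simulation (assuming, for simplicity, independent repeated scores from a constant propensity distribution; all numerical values are arbitrary) estimates τ(P) and σ(P) directly from a single person's time series:

import numpy as np

rng = np.random.default_rng(2)
true_score, sigma_P = 110.0, 6.0                              # tau(P) and sigma(P), arbitrary values
scores = rng.normal(loc=true_score, scale=sigma_P, size=200)  # simulated y(P,t), t = 1,...,200

est_true_score = scores.mean()                  # estimate of tau(P) from P's own series
est_sigma = scores.std(ddof=1)                  # estimate of sigma(P): precision of a single score
se_of_mean = est_sigma / np.sqrt(len(scores))   # precision of the estimated true score itself
print(est_true_score, est_sigma, se_of_mean)
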
3.2 Fundamental reasons or contingent circumstances
This section presents a critical discussion of the reasons why Lord & Novick
(1968), after having defined the concept of true score in terms of intra-individual
variation, do not further pursue an intra-individual foundation for test theory and
turn instead to an inter-individual perspective. It will be argued that their reasons
for doing so are not fundamental, but pertain to contingent circumstances that
can be dealt with by means of appropriate statistical-methodological techniques.
The key remark leading up to the rejection of the possibility of a test theory based
on intra-individual variation is the following: Characterizing the propensity
distribution associated with the time series of a given person P’s observed test
scores, Lord & Novick (1968, p. 30) require that the “... distribution function [is]
defined over repeated statistically independent measurements on the same
person”. The important qualification is that the repeated measurements should
be statistically independent. This implies the requirement that P’s time series of
observed test scores should lack sequential dependencies (e.g., autocorrelation).
After having postulated the requirement of obtaining statistically independent
observed scores, Lord & Novick (1968, p. 30) conclude: “… it is not possible in
psychology to obtain more than a few independent observations”. This is the
reason why they do not consider the possibility of a test theory based on intra-individual variation to be feasible. In general, test scores obtained in a single-subject time series design will be sequentially dependent, i.e., have significant
autocorrelation. Moreover, the statistical properties of the psychological process
according to which test scores are generated may change in time. For instance,
the process concerned may be vulnerable to learning and habituation influences
which induce time-dependent changes in the way test scores are being
generated.
Before scrutinizing the details of Lord & Novick’s (1968) requirement that
repeated measurements of the same person P should be statistically
independent, we first consider their reason not to pursue a test theory based on
intra-individual variation. Because the basic concept underlying classical test
theory, the concept of true score, is defined at the level of intra-individual
variation, one would expect that the reason to leave that level and move to a
different level of inter-individual variation would have to be a fundamental reason.
One would expect to be given an argument involving issues of logical necessity
or impossibility. Yet the actual argument given by Lord & Novick (1968) concerns
more an issue of contingent character: repeated measurement of the same
person P yields test scores that are in general not statistically independent.
Indeed, all psychometricians will agree. But the statistical analysis techniques
used to determine P’s propensity distribution can accommodate the presence of
sequential dependencies, and then we still can have a test theory which is
directly based on the concept of true score as defined by Lord & Novick (1968).
That is, a test theory based on intra-individual variation which would be of
interest for individual assessment and counseling. The reason which Lord &
Novick (1968) give for not further pursuing such a test theory is not fundamental
and does not prove the impossibility of such a theory.
We now turn to discussion of the requirement that repeatedly measuring the
same person P should yield a time series of statistically independent scores. To
reiterate, no psychometrician will expect this to occur: repeated measurement of
the same person generally will yield a time series of sequentially dependent
scores. But is this problematic? The time series of scores provides the
information to determine the propensity distribution characterizing person P. In
particular, the mean and variance of P’s propensity distribution have to be
determined. This is a standard problem in the statistical analysis of time series
that has been completely solved in case the time series is stationary (cf.
Anderson, 1971). Hence the important requirement is not that P’s time series
should consist of statistically independent scores, but that the time series is
stationary. Stationarity of a time series implies that the series has constant mean
level and that its autocorrelation only depends upon the relative distance (lag)
between measurement occasions.
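As an informal illustration of what stationarity requires of a single-subject series, the following sketch (illustrative Python, not one of the formal tests cited in the next paragraph) compares the mean level and the lag-1 autocorrelation across the two halves of a simulated stationary AR(1) series; for a stationary series both quantities should be approximately equal in the two halves:

import numpy as np

def lag1_autocorr(x):
    # Lag-1 autocorrelation of a univariate series.
    x = np.asarray(x, dtype=float) - np.mean(x)
    return float(np.sum(x[:-1] * x[1:]) / np.sum(x * x))

rng = np.random.default_rng(7)
T = 400
series = np.zeros(T)
for t in range(1, T):
    series[t] = 0.6 * series[t - 1] + rng.normal()   # stationary AR(1) example

first, second = series[: T // 2], series[T // 2 :]
print(first.mean(), second.mean())                   # roughly equal mean levels
print(lag1_autocorr(first), lag1_autocorr(second))   # roughly equal lag-1 dependence
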
The alternative requirement that a time series has to be stationary can be tested
for in several ways (cf. Priestley, 1988). In case such tests indicate that the
series is non-stationary, it can be analyzed by means of special techniques such
as evolutionary spectrum analysis (Priestley, 1988) or wavelet analysis (e.g.,
Hogan & Lakey, 2005; Houtveen & Molenaar, 2001). At the close of this chapter
a new modeling technique for multivariate non-stationary time series will be
presented. Hence from a statistical analytic point of view non-stationary time
series can be handled satisfactorily. Yet from the point of view of a test theory
based on intra-individual variation, a person P’s time series of test scores should
be stationary in order to allow estimation of the constant mean and constant
variance of P’s time-invariant propensity distribution. In case P’s time series of
test scores is non-stationary, the mean and/or variance of the series will in
general be time-varying. Lord & Novick’s (1968) definition of true score, however,
does not pertain to time-varying propensity distributions with time-varying means
and/or variances.
Hence either methodological or statistical techniques have to be invoked in order
to guarantee that P’s time series of test scores is stationary. Only then can the
(constant) mean and variance of P’s time series be used as estimates of the
mean and variance of P’s propensity distribution. Methodological techniques can
be used to guarantee that non-stationarity due to learning and habituation is
avoided. For instance, following a common approach in reaction time research,
registration of P's time series of test scores should begin only after P has
reached a steady state, once the initial transient due to novelty effects has passed. This will require the
availability of a pool of many parallel test items in order to avoid learning effects.
Statistical techniques can be used a posteriori to remove transient effects due to
habituation and learning from P’s time series of test scores (e.g., Molenaar &
Roelofs, 1987). Almost certainly new methodological and statistical techniques
will have to be developed in order to accommodate the intricacies due to non-stationarity and fully exploit the possibilities of a test theory based on intra-individual variation. Until now these possibilities have not been pursued
systematically, for the wrong reasons as has been argued in this section. Given
that the psychological process underlying the generation of test scores is non-ergodic according to classical test theory based on analysis of inter-individual
variation, psychometricians will have to seriously reconsider their reasons for not
pursuing a test theory based on intra-individual variation.
One promising psychological paradigm which allows for straightforward
determination of person-specific propensity distributions is mental chronometry.
In his excellent monograph on mental chronometry, Jensen (2006, p.96) states:
“The main reasons for the usefulness of chronometry are not only the
advantages of its absolute scale properties, but also its sensitivity and precision
for measuring small changes in cognitive functioning, the unlimited repeatability
of measurements under identical procedures, the adaptability of chronometric
techniques for measuring a variety of cognitive processes, and the possibility of
obtaining the same measurements with consistently identical tasks and
procedures over an extremely wide age range” (italics added). The possibility to
obtain unlimited repeated measurements under identical procedures will allow for
the determination of person-specific reaction time propensity distributions with
arbitrary precision. Jensen presents impressive empirical results showing the
importance of not only the intra-individual means of person-specific reaction time
distributions, but also their intra-individual variances in assessing cognitive status
and development (e.g., in the context of the so-called neural noise hypothesis;
Jensen, 2006, p.122 ff.). Consequently, I conjecture that mental chronometry
provides a very promising approach for pursuing a test theory based on intra-individual variation.
3.3 Additional thoughts
The impact of the fact that the ensemble of time series underlying classical test
theory is non-ergodic is enormous. Psychological tests are applied for individual
assessment in all kinds of settings. Using the population average expressed by
formula (3) as an estimate of the intra-individual variance σ²(P) of a given person P
can lead to entirely erroneous conclusions. To give an arbitrary example:
suppose that the norm μ of a test is μ = 100, that the inter-individual reliability ρ
of the test is ρ = 0.9, and that the between-subjects variance of test scores is
var[yi] = 25. Suppose also that a true score which is larger than yC = 120 is
considered reason for special treatment (clinical, educational, or otherwise).
Finally, suppose that person P has observed score yP = y(P,t=T) = 126. Then the
Kelly estimate (2a) of P's true score τP is: est[τP | yP] = 0.9*126 + (1 − 0.9)*100 =
123.4. According to (2b) the error variance of this estimated true score is:
var{est[τP | yP]} = 25*(1 − 0.9)*0.9 = 2.25. Hence the standard error is 1.5 and a
commonly used confidence interval about the estimated true score is: 123.4 ±
2*1.5, yielding 120.4 < est[τP | yP] < 126.4. This confidence interval is entirely
located above the criterion score yC = 120, hence it is concluded that P needs
special treatment. Suppose, however, that the intra-individual variance σ²(P) of
P's propensity distribution is σ²(P) = 36. Then the difference between P's
observed score, yP = 126, and the criterion score for special treatment, yC = 120,
is only 1 standard deviation, which according to standard statistical criteria would
not indicate that P needs special treatment.
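The arithmetic of this example can be reproduced directly; in the following sketch all numerical values are taken from the example above and only the variable names are illustrative:

mu, rho, var_y = 100.0, 0.9, 25.0    # norm, reliability, between-subjects variance
y_P, y_C = 126.0, 120.0              # observed score and treatment criterion

est_true = rho * y_P + (1.0 - rho) * mu       # Kelly estimate (2a): 123.4
se = (rho * (1.0 - rho) * var_y) ** 0.5       # standard error from (2b): 1.5
print(est_true - 2 * se, est_true + 2 * se)   # 120.4 to 126.4, entirely above y_C

sigma_P = 36.0 ** 0.5                         # intra-individual standard deviation: 6
print((y_P - y_C) / sigma_P)                  # only 1 standard deviation above the criterion
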
Numerical exercises such as the one given above can be carried out in a variety
of formats, using Monte Carlo simulation techniques and alternative settings. We
intend to report the results of one such simulation study in a separate
publication. But the overall message should be clear: using the (inter-individual)
population value of the error variance (based on the inter-individual reliability) as
an approximation for the intra-individual variance of a person P's propensity
distribution is liable to lead to erroneous conclusions about P's true score,
and, consequently, to erroneous decisions about the necessity of applying special
treatment to P. The fundamental reason for the invalidity of (3) as an approximation
for σ²(P) is that the ensemble of time series of observed scores is non-ergodic.
4. Hidden heterogeneity
In the previous section we discussed heterogeneity with respect to the means
and variances of the propensity distributions underlying classical test theory. That
kind of heterogeneity can be considered to be a special instance of a much wider
class of heterogeneous phenomena, including also qualitative heterogeneity. An
important example of qualitative heterogeneity concerns individual differences in
the loadings in a factor model. The standard factor model of inter-individual
covariation is (using bold face lower case letters for vectors and bold face upper
case letters for matrices):
(4) yi = Ληi + εi, i=1,2,…
where: yi = [y1i, y2i, …, ypi]' is the p-variate vector of observed scores of a
randomly drawn subject i (the apostrophe denotes transposition); ηi =
[η1i, η2i, …, ηqi]' is the q-variate vector of factor scores of subject i; εi =
[ε1i, ε2i, …, εpi]' is the p-variate vector of measurement errors for subject i; and Λ
is the (p,q)-dimensional matrix of factor loadings.
The factor model of inter-individual covariation not only underlies classical test
theory, but is of central importance in much of psychology. The factor model can
be heuristically characterized as follows. In the context of the behavior space
introduced in section 2.1, choose a fixed time point and select a set of p variables y
which are considered to be indicators of a q-variate latent factor η. Then the
factor loadings in Λ represent the regression coefficients in the linear relationships
between the p indicators and the q-variate latent factor η. It is an essential
assumption underlying the factor model that the factor loadings are invariant
across subjects. That is, Λ does not depend upon i, where the subscript i stands
for subject i in the population; i = 1,2,… . Hence the assumption is that each
individual person i in the population has a person-specific q-variate factor score
ηi and a person-specific p-variate error score εi, but the factor model for each
person in the population has the same (p,q)-dimensional matrix of factor loadings
Λ.
Suppose now that we carry out a simulation experiment in which each person not
only has a person-specific q-variate factor score and p-variate error score, but
also a person-specific set of values for the factor loadings Λi, i = 1,2,… . Hence
each person has a person-specific factor model:
(5) yi = Λiηi + εi, i=1,2,…
This heterogeneity of factor loadings Λi, i = 1,2,…, constitutes a severe violation
of an important assumption underlying the standard factor model, namely the
assumption that the matrix of factor loadings should be invariant (fixed) across
subjects. The fact that the matrix of factor loadings in (5) is subject-specific
implies that the way in which factors are expressed in the observed scores is
qualitatively different for different subjects. These inter-individual differences in
the values of the factor loadings are called qualitative because the substantive
interpretation (semantic labeling) of factors is based on these loading values.
Despite the fact that (5) involves a severe violation of the qualitative homogeneity
assumption (invariance of factor loadings across subjects) underlying the
standard factor model (4), it was shown in a number of simulation studies that
factor analysis of inter-individual covariation appears to be insensitive to this
violation. The typical set-up of these simulation studies was to generate data
according to the person-specific (qualitatively heterogeneous) factor model (5),
and then fit the standard factor model (4) to the simulated data. Although one
would expect the fit of model (4) to be poor due to the fact that the simulated data
violate the assumption of qualitative homogeneity underlying model (4), it turns
out that this is not at all the case. The general finding in these simulation studies
is that (variants of) factor model (4) provide(s) satisfactory fits to data generated
according to (variants of) factor model (5). Satisfactory fits, that is, according to
all usual criteria of goodness-of-fit, such as the chi-squared likelihood ratio test,
standardized root mean square residual, and root mean square error of
approximation (cf. Brown, 2006, for definitions and discussion of these criteria).
Nowhere in the obtained (maximum likelihood) solutions is there a flag waving to
indicate that something is fundamentally wrong. These simulation studies were
based on the cross-sectional factor model (Molenaar, 1997), the longitudinal
factor model (Molenaar, 1999) and the behavior genetical factor model for
multivariate phenotypes of MZ and DZ twins (Molenaar et al., 2003). A
mathematical-statistical proof of the insensitivity of the factor model of inter-individual covariation to the qualitative heterogeneity of the factor loadings is
given in Kelderman & Molenaar (2006).
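The data-generating side of such a simulation study is easy to sketch; the following illustrative code (with arbitrary dimensions, loading distribution, and noise levels; the confirmatory factor analyses and fit indices reported in the cited studies are not reproduced here) generates observations from the person-specific model (5) and computes the pooled inter-individual covariance matrix that a standard R-technique analysis of model (4) would take as input:

import numpy as np

rng = np.random.default_rng(3)
n_subjects, p, q = 2000, 6, 1

y = np.empty((n_subjects, p))
for i in range(n_subjects):
    lam_i = rng.normal(loc=0.7, scale=0.4, size=(p, q))   # person-specific loadings, model (5)
    eta_i = rng.normal(size=q)                            # person-specific factor score
    eps_i = rng.normal(scale=0.5, size=p)                 # person-specific measurement error
    y[i] = (lam_i @ eta_i) + eps_i

pooled_cov = np.cov(y, rowvar=False)   # inter-individual covariance matrix: the input
print(np.round(pooled_cov, 2))         # to a standard factor analysis of model (4)
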
Evidently, the finding that the standard factor model of inter-individual covariation
is insensitive to the presence of extreme qualitative heterogeneity in the
population of subjects, created by the person-specific matrices of factor loadings
Λi, i = 1,2,…, in (5), raises serious questions. To reiterate, nothing in the results
obtained with the standard factor analyses based on model (4) indicates that the
true state of affairs is in severe violation of the assumptions underlying this
model. The standard factor models yield satisfactory fits to the data generated
according to model (5). Consequently, the presence of substantial qualitative
heterogeneity in the simulated data remains entirely hidden in the standard factor
analyses based on inter-individual covariation. Before discussing some of the
consequences of this finding, it is noted that there exist a priori reasons to expect
that widespread qualitative heterogeneity actually exists in human populations.
The reasons have to do with the way in which cortical neural networks grow and
adapt during the life span, namely by means of self-organizing epigenetic
processes (cf. Molenaar et al., 1993). Self-organizing growth and adaptation give
rise to emergent endogenous variation in neural network connections, even
between homologous structures located at the left and right sides of the brain
within the same subject (cf. Edelman, 1987). In so far as cognitive information
processing is associated with cortical neural activity, one can expect that these
endogenously generated differences in neural network architectures will become
discernable as qualitative heterogeneity of the structure of observed behavior of
different subjects (see Molenaar, 2006, for further elaboration and mathematical-biological modeling of these epigenetic processes).
One direct consequence of the fact that standard factor analysis of inter-individual covariation is insensitive to qualitative heterogeneity is the following.
Suppose that the standard q-factor model (4) yields a satisfactory fit to the data
obtained with a test composed of p subtests (e.g., items). Let est[Λ] denote the
estimated (p,q)-dimensional matrix of factor loadings thus obtained. Suppose
also that in reality qualitative heterogeneity is present in the population of
subjects, so that the true (p,q)-dimensional matrix of factor loadings ΛP for a
given subject P differs substantially from the nominal loading matrix est[Λ]. For
instance, several of the p subtests have negative or zero loadings in ΛP whereas
the analogous loadings in est[Λ] are high and positive. Of course ΛP is unknown
in the context of standard factor analysis of inter-individual variation. The
estimate of P's factor score, est[ηP], is based on the nominal loading matrix
est[Λ] and, because est[Λ] is a poor approximation of the true ΛP, this estimate
est[ηP] will be substantially biased. For quantitative details about this bias the
reader is referred to the publications mentioned above (Molenaar, 1999;
Molenaar et al., 2003; Kelderman & Molenaar, 2006).
Another consequence of the insensitivity of standard factor analysis of inter-individual covariation to qualitative heterogeneity concerns the fact that the
semantic interpretation of factors thus obtained is inappropriate at the person-specific level. Suppose that standard factor analysis of personality test scores
yields the expected pattern of factor loadings in est[Λ] corresponding to the Big
Five theory (cf. Borkenau & Ostendorf, 1998). Then, if qualitative heterogeneity is
present, the factor loadings in ΛP for a particular person P may not at all conform
to the Big Five pattern and hence the semantic interpretation of the factors for P
will be different. Stated more specifically, the nominal semantic interpretation of
the five factors obtained in standard factor analysis is inappropriate for P. The
reader is referred to Hamaker, Dolan, & Molenaar (2005) for an elaborate
illustration based on empirical personality test scores.
5. Heterogeneity in time
To reiterate, a (psychological) process should obey two criteria in order to qualify
as an ergodic process. Firstly, the trajectory of each subject in the ensemble
should conform to exactly the same dynamical laws (homogeneity of the
ensemble). Secondly, each trajectory should have constant statistical
characteristics in time (stationarity, i.e., constant mean level and serial
dependencies which only depend upon relative time differences). In the previous
sections attention has been confined to psychological processes which are non-ergodic because they violate the first criterion, i.e., heterogeneity of different
trajectories in the ensemble. Whereas the first criterion involves a comparison
between different trajectories, the second stationarity criterion involves
comparison of the same trajectory at different times. In this section we will
consider psychological processes which are non-ergodic because they violate
the second criterion, i.e., they are non-stationary.
In general, non-stationarity implies that parameters of a dynamic system are
time-varying. Prime examples of non-stationary systems are developmental
systems which typically have time-varying parameters such as waxing and/or
waning factor loadings. For this reason developmental systems are non-ergodic
and their analysis should be based on intra-individual variation. There exists a
long tradition in theoretical developmental psychology in which it is argued that
developmental processes should be analyzed at the level of intra-individual
variation (time series data). The general designation for this tradition is
Developmental Systems Theory (DST). Important contributions to DST include
Wohlwill’s (1973) monograph on the concept of developmental functions
describing intra-individual variation, Ford and Lerner’s (1992) integrative
approach based on the interplay between intra-individual variation and inter-individual variation and change, and Gottlieb's (1992, 2003) theoretical work on
probabilistic epigenetic development.
Intra-individual analysis of non-stationary multivariate time series requires the
availability of sophisticated statistical modeling techniques. We developed such a
technique based on a systems model with arbitrarily time-varying parameters
(Molenaar, 1994; Molenaar & Newell, 2003). Our model can be conceived of as a
suitably generalized factor model for non-stationary p-variate time series y(t), t =
1,2,...,T. Its schematic form is:
(6a) y(t) = Λ[θ(t)]η(t) + ε(t)
(6b) η(t+1) = B[θ(t)]η(t) + ζ(t+1)
(6c) θ(t+1) = θ(t) + ξ(t+1)
In (6a) y(t) denotes the observed p-variate time series, η(t) is the q-variate latent
factor series (system state process), and ε(t) is the p-variate measurement error
process. The factor loadings in Λ[θ(t)] depend upon the r-variate time-varying
parameter vector θ(t). Equation (6b) describes the evolution of the latent factor series η(t)
by means of a q-variate stochastic difference equation (autoregression) relating
η(t+1) to η(t), where ζ(t+1) denotes the q-variate residual process. The (q,q)-dimensional matrix of regression weights B[θ(t)] depends upon the r-variate time-varying
parameter vector θ(t). Equation (6c) describes the time-dependent variation of the
unknown parameters. The r-variate parameter vector process θ(t) obeys a
special stochastic difference equation: a random walk with r-variate innovations
process ξ(t).
The system of equations (6a), (6b) and (6c) allows for the modeling of a large
class of multivariate non-stationary (non-ergodic) processes. Equations (6a) and
(6b) have the same formal structure as the well-known inter-individual longitudinal
q-factor model, which helps in their interpretation. Yet the system of equations
(6a), (6b) and (6c) is applied to analyze the structure of intra-individual variation
underlying the observed p-variate time series y(t) obtained with a single subject.
Generalization of this model to accommodate multivariate time series obtained in
a replicated time series design is straightforward. Also extension of the model
with arbitrary mean trend functions and covariate processes having time-varying
effects is straightforward.
The fit of equations (6a), (6b) and (6c) to an observed p-variate time series y(t),
t=1,2,...,T, where T is the number of repeated measurements obtained with a
single subject P, is based on advanced statistical analysis techniques taken from
the engineering sciences (Bar-Shalom et al., 2001; Ristic et al., 2004). It
consists of a combination of recursive estimation (filtering), smoothing, and
iteration (EKFIS: Extended Kalman Filter with Iteration and Smoothing). The
EKFIS yields a time series (trajectory) of estimated values for each of the r
parameters in θ(t): θk(t), t=1,2,...,T; k=1,2,...,r.
To illustrate the performance of the EKFIS, the following small simulation study
has been carried out. A 4-variate (p = 4) time series y(t) has been generated by
means of the state-space model with time-varying parameters (6a), (6b) and (6c).
The model has a univariate (q = 1) latent state process η(t). The autoregressive
coefficient B[θ(t)] = b(t) in the process model (6b) for the latent state increases
linearly from 0.0 to 0.9 over the observation interval comprising T = 100 time
points: b(t) = 9t/1000, t=1,2,…,100. Hence the sequential dependence
(autocorrelation) of the latent state process (latent factor series) increases from
zero to 0.9 across 100 time points and therefore is highly time-varying (non-stationary, hence non-ergodic). Depicted in Figure 1 is the estimate of this
autoregressive weight b(t) obtained by means of the EKFIS based on a single-subject time series y(t), t=1,2,...,100. It is clear that the estimated trajectory
closely tracks the true time-varying path of this parameter.
Figure 1: EKFIS estimate of the time-varying coefficient b(t) in the autoregressive model for the latent factor scores, plotted against time (t = 1, ..., 100) together with the true value and the upper and lower confidence interval bounds.
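A simplified version of this kind of parameter tracking can be sketched by augmenting the state with the unknown weight b(t) and running a plain forward extended Kalman filter. The sketch below is not the EKFIS itself (it omits the iteration and smoothing steps), and the loadings and noise variances are illustrative assumptions:

import numpy as np

rng = np.random.default_rng(5)
T, p = 100, 4
lam = np.array([1.0, 0.8, 0.6, 0.4])          # assumed known loadings (illustrative)
b_true = 9.0 * np.arange(1, T + 1) / 1000.0   # b(t) = 9t/1000, as in the simulation above

# Generate the single-subject data: eta(t) = b(t)*eta(t-1) + noise, y(t) = lam*eta(t) + error.
eta = np.zeros(T)
y = np.zeros((T, p))
for t in range(T):
    prev = eta[t - 1] if t > 0 else 0.0
    eta[t] = b_true[t] * prev + rng.normal()
    y[t] = lam * eta[t] + rng.normal(scale=0.5, size=p)

# Forward extended Kalman filter on the augmented state x = [eta, b].
H = np.column_stack([lam, np.zeros(p)])   # observation matrix, p x 2
R = 0.25 * np.eye(p)                      # measurement error covariance
Q = np.diag([1.0, 1e-3])                  # state noise and random-walk noise for b
x = np.array([0.0, 0.0])                  # initial estimate of [eta, b]
P_cov = np.eye(2)                         # initial state covariance
b_hat = np.zeros(T)
for t in range(T):
    F = np.array([[x[1], x[0]], [0.0, 1.0]])   # Jacobian of f(x) = [b*eta, b]
    x = np.array([x[1] * x[0], x[1]])          # predicted state
    P_cov = F @ P_cov @ F.T + Q
    S = H @ P_cov @ H.T + R                    # innovation covariance
    K = P_cov @ H.T @ np.linalg.inv(S)         # Kalman gain
    x = x + K @ (y[t] - H @ x)                 # measurement update
    P_cov = (np.eye(2) - K @ H) @ P_cov
    b_hat[t] = x[1]

print(np.round(b_hat[::10], 2))   # rough track of the increasing b(t)
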
6. Discussion and conclusion
In this chapter some of the implications of the classical ergodic theorems have
been considered in the contexts of classical test theory, factor analysis of inter-individual covariation, and the analysis of non-stationary developmental
processes. In each of these contexts the classical ergodic theorems imply that
instead of using standard statistical approaches based on analysis of inter-individual variation, it is necessary to use single-subject time series analysis of
intra-individual variation. This conclusion holds for individual assessment based
on classical test theory, for testing the assumption of homogeneity (fixed factor
loadings across subjects) in factor analysis of inter-individual covariation, and for
the analysis of non-stationary processes such as learning and developmental
processes.
The consequences of the classical ergodic theorems in these and many other
contexts in psychology imply that time series designs and time series analysis
techniques will have to be assigned a much more prominent place than is
currently the case in psychological methodology. The overall aim of scientific
research in psychology still should be to arrive at general (nomothetic) laws that
hold for all subjects in a well-defined population. But the inductive tools to arrive
at such general laws have to be fundamentally different from the currently
standard approaches for those psychological processes which are non-ergodic.
Only if a psychological process is ergodic, i.e., obeys the two criteria of
homogeneity and stationarity, can results obtained by means of analysis of inter-individual variation be generalized to the level of intra-individual variation. But the
two criteria for ergodicity are very strict and many psychological processes of
interest will fail to obey these criteria. Psychologists have to understand that
ergodicity is the special case, whereas non-ergodicity is the rule. For non-ergodic
psychological processes analysis of inter-individual variation yields results that
may not apply to any of the individual subjects in the population of subjects.
In conclusion, the inductive tools which are necessary to arrive at general
(nomothetic) laws for non-ergodic processes involve the search for
commonalities between single-subject process models fitted to time series data
obtained in replicated time series designs. The latter search for commonalities
between single-subject process models can be based on standard mixed
modeling techniques (see the excellent textbook of Demidenko, 2004).
Having available appropriate time series models for each individual subject
opens up possibilities which are entirely new in psychology. These possibilities
involve the optimal control of ongoing psychological processes. For instance,
consider the following special instance of the system of equations (6a), (6b):
(7a) y(t) = Λη(t) + ε(t)
(7b) η(t+1) = Bη(t) + Γu(t) + ζ(t+1)
Here the same definitions apply as for equations (6a), (6b). Notice that in (7a) and
(7b) the (p,q)-dimensional matrix of factor loadings Λ and the (q,q)-dimensional
matrix of regression weights B are assumed to be constant in time. This is to
ease the presentation; generalization of what follows to the non-stationary model
given by (6a), (6b) and (6c) is straightforward. Notice also that (7b) contains a new
component: Γu(t). The s-variate process u(t) represents a known
process that can be manipulated, for instance the dose of a medication or the level of
environmental stimulation. Γ is a (q,s)-dimensional matrix of regression weights.
Suppose that (7a) and (7b) provide a faithful description of the p-variate time
series y(t) for subject P. It then is possible to determine u(t) in such a way that
the state process η(t) is steered to its desired level η#, where η# is chosen by the
controller. The optimal input u@(t) is determined according to the following
schematic feedback function:
(8) u@(t) = F[y(t),t]
where F[.] denotes an (s,p)-dimensional nonlinear feedback function. Application
of u@(t) at time t guarantees that the state process η(t+1) at the next time point
t+1 will be as close as possible to the desired level η#.
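For the scalar case (q = s = 1), the following sketch illustrates the idea of such a feedback function with a simple one-step rule that sets u(t) so that the predicted next state equals the target; this is a deliberately crude stand-in for the optimal control techniques cited below, and all numerical values (loadings, B, Γ, noise levels, target) are illustrative:

import numpy as np

rng = np.random.default_rng(6)
T, p = 60, 4
lam = np.array([1.0, 0.8, 0.6, 0.4])   # Lambda (illustrative)
b, gamma = 0.7, 1.0                    # B and Gamma, scalars in this sketch
target = 2.0                           # desired level of the latent state

eta, u = 0.0, 0.0
trajectory = []
for t in range(T):
    eta = b * eta + gamma * u + rng.normal(scale=0.3)   # (7b): state transition with input
    y = lam * eta + rng.normal(scale=0.5, size=p)       # (7a): observed scores
    eta_hat = lam @ y / (lam @ lam)                     # least-squares estimate of eta(t) from y(t)
    u = (target - b * eta_hat) / gamma                  # feedback rule: a simple instance of u(t) = F[y(t), t]
    trajectory.append(eta)

print(np.round(trajectory[-10:], 2))   # latent state hovering near the target level
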
Optimal control is an important field of research in the engineering sciences.
There exists a vast literature on many different variants of optimal control (cf.
Kwon, 2005, for a thorough explanation of the currently most advanced
approaches). These control techniques can be applied straightforwardly in
analyses of intra-individual variation in order to steer psychological processes in
desired directions (cf. Molenaar, 1987, for an application to the optimal control of
a psychotherapeutic process). This opens up an entirely new and promising field of
applied psychology: person-specific modeling and adaptive control of ongoing
psychological processes.
References
Anderson, T.W. (1971). The statistical analysis of time series. New York: Wiley.
Bar-Shalom, Y., Li, X.R., & Kirubarajan, T. (2001). Estimation with applications to
tracking and navigation. New York: Wiley.
Birkhoff, G.D. (1931). Proof of the ergodic theorem. Proceedings of the National
Academy of Sciences USA, 17, 656-660.
Borkenau, P., & Ostendorf, F. (1998). The Big Five as states: How useful is the
five-factor model to describe intra-individual variations over time? Journal of
Research in Personality, 32, 202-221.
Brown, T.A. (2006). Confirmatory factor analysis for applied research. New York:
Guilford Press.
Cattell, R.B. (1952). The three basic factor-analytic designs – Their interrelations
and derivatives. Psychological Bulletin, 49, 499-520.
Choe, G.H. (2005). Computational ergodic theory. Berlin: Springer.
De Groot, A.D. (1954). Scientific personality diagnosis. Acta Psychologica, 10,
220-241.
Demidenko, E. (2004). Mixed models: Theory and applications. Hoboken, NJ:
Wiley.
Edelman, G.M. (1987). Neural Darwinism: The theory of neuronal group
selection. New York: Basic Books.
Ford, D.H., & Lerner, R.M. (1992). Developmental systems theory. Newbury
Park: Sage.
Gescheider, G.A. (1997). Psychophysics: The fundamentals. Mahwah, NJ:
Erlbaum.
Gottlieb, G. (1992). Individual development and evolution: The genesis of novel
behavior. New York: Oxford University Press.
Gottlieb, G. (2003). On making behavioral genetics truly developmental. Human
Development, 46, 337-355.
Hamaker, E.L., Dolan, C.V., & Molenaar, P.C.M. (2005). Statistical modeling of
the individual: Rationale and application of multivariate time series analysis.
Multivariate Behavioral Research, 40, 207-233.
Hogan, J.A., & Lakey, J.D. (2005). Time-frequency and time-scale methods:
Adaptive decompositions, uncertainty principles, and sampling. Boston:
Birkhäuser.
Houtveen, J.H., & Molenaar, P.C.M. (2001). Comparison between the Fourier
and wavelet methods of spectral analysis applied to stationary and non-stationary heart period data. Psychophysiology, 38, 729-735.
Jensen, A.R. (2006). Clocking the mind: Mental chronometry and individual
differences. Amsterdam: Elsevier.
Kelderman, H., & Molenaar, P.C.M. (2006). The effect of individual differences in
factor loadings on the standard factor model (to appear in Multivariate Behavioral
Research).
Kwon, W.H. (2005). Receding horizon control: Model predictive control for state
models. London: Springer.
Lord, F.M., & Novick, M.R. (1968). Statistical theories of mental test scores.
Reading, MA: Addison-Wesley.
Molenaar, P.C.M. (1987). Dynamic assessment and adaptive optimization of the
therapeutic process. Behavioral Assessment, 9, 389-416.
Molenaar, P.C.M., & Roelofs, J.W. (1987). The analysis of multiple habituation
profiles of single trial evoked potentials. Biological Psychology, 24, 1-21.
Molenaar, P.C.M., Boomsma, D.I., & Dolan, C.V. (1993). A third source of
developmental differences. Behavior Genetics, 23, 519-524.
Molenaar, P.C.M. (1994). Dynamic latent variable models in developmental
psychology. In: A. von Eye & C.C. Clogg (Eds.), Analysis of latent variables in
developmental research. Newbury Park: Sage, pp. 155-180.
Molenaar, P.C.M. (1997). Time series analysis and its relationship with
longitudinal analysis. International Journal of Sports Medicine, 19, 232-237.
Molenaar, P.C.M. (1999). Longitudinal analysis. In: H.J. Ader & G.J. Mellenbergh
(Eds.), Research methodology in the social, behavioral and life sciences.
London: Sage, pp. 143-167.
Molenaar, P.C.M., Huizenga, H.M., & Nesselroade, J.R. (2003). The relationship
between the structure of interindividual and intraindividual variability: A
theoretical and empirical vindication of Developmental Systems Theory. In: U.M.
Staudinger & U. Lindenberger (Eds.), Understanding human development:
Dialogues with life-span psychology. Dordrecht: Kluwer, pp. 339-360.
Molenaar, P.C.M. (2003). State space techniques in structural equation
modeling: Transformation of latent variables in and out of latent variable models.
111 pages. Website: http://www.hhdev.psu.edu/hdfs/faculty/molenaar.html
Molenaar, P.C.M., & Newell, K.M. (2003). Direct fit of a theoretical model of
phase transition in oscillatory finger motions. British Journal of Mathematical and
Statistical Psychology, 56, 199-214.
Molenaar, P.C.M. (2004). A manifesto on psychology as idiographic science:
Bringing the person back into scientific psychology, this time forever.
Measurement, 2, 201-218.
Molenaar, P.C.M. (2006). On the implications of the classic ergodic theorems:
Analysis of developmental processes has to focus on intra-individual variation
(submitted).
Nayfeh, A.H., & Balachandran, B. (1995). Applied nonlinear dynamics: Analytical,
computational, and experimental methods. New York: Wiley.
Petersen, K. (1983). Ergodic theory. Cambridge: Cambridge University Press.
Priestley, M.B. (1988). Non-linear and non-stationary time series analysis.
London: Academic Press.
Ristic, B., Arulampalam, S., & Gordon, N. (2004). Beyond the Kalman filter:
Particle filters for tracking applications. London: Artech House.
Walters, P. (1982). An introduction to ergodic theory. 2nd edition. New York:
Springer.
Wohlwill, J.F. (1973). The study of behavioral development. New York: Academic
Press.