Running head: NEW METHODS FOR SEQUENTIAL BEHAVIORAL ANALYSIS
New Methods for Sequential Behavior Analysis
Peter Molenaar & Tamara Goode
The Pennsylvania State University
Corresponding Author: Peter Molenaar, Department of Human Development and Family
Studies, 110 South Henderson Building, The Pennsylvania State University, University Park, PA
16802-6505. E-mail: pxm21@psu.edu.
New Methods for Sequential Behavior Analysis
Sequential behavior analysis is firmly rooted in learning theory (Sharpe & Koperwas,
2003). The identification and analysis of antecedents and consequences of target behavior are
basic to behavior theory; sequential behavior analysis makes explicit temporal dependencies
between present and past behavior, and present and future behavior, using time-series data.
Behavior theory is currently expanding in exciting new directions such as artificial neural
network simulation (Schmajuk, 1997) and neuroscience (Moore, 2002). This expanded scope of
behavior theory presents new challenges to sequential behavior analysis, as do recent
developments in psychometrics and measurement. In this chapter we will review new methods
for sequential behavior analysis that can be instrumental in meeting these challenges.
The earliest methods for analyzing time-series data were simple curve-fitting procedures,
in which slopes and intercepts were compared pre- and post-intervention. These methods are
greatly limited, due to ubiquitous violations of the assumption that data collected in two different
conditions are independent. A generating function procedure also has been used, in which signal
(process) is separated from noise (error). Curve-fitting procedures and generating function
procedures both quantify the dependence of present observations on those preceding (also known
as lead-lag relations) (Gottman, McFall, & Barnett, 1969). Newer methods for sequential
behavior analysis, such as those we describe here, no longer require the quantification of these
dependencies in this manner. The inclusion of the “state” in the model, which we discuss in the
next section, both simplifies these models and allows for the estimation of more complex
systems and processes.
In the following, the focus is on new statistical models describing the sequential
dependencies of behavior processes. The statistical models are derived as special instances of a
general dynamic systems model – the so-called state space model (SSM). The SSM allows for
the estimation of a complex system that includes inputs, outputs, and states. A simple but useful
example of such a system is one that many students experience in an introductory psychology
lab: that of training a rat to press a lever. An input could be a tone signaling that reinforcement
is available; the state could be hunger; and the output could be the pressing of the lever. The
general SSM not only provides for a transparent classification of a wide range of behavioral
process models, but also enables the specification of a general estimation scheme to fit these
models to the data. Moreover, and importantly, the general state space model constitutes an
excellent paradigm to introduce optimal control of behavior processes – a powerful extension of
sequential behavior analysis that, to the best of our knowledge, is considered here for the first
time. In behavior analysis, optimal control is understood to mean achieving the least possible variability in behavior by controlling as many environmental variables as possible. Here we use optimal control as engineers use the term, which is somewhat similar. As in behavior analysis, variability in the output, the behavior being modeled, is minimized. However, we are also able to specify the desired level of the behavior and to estimate the levels of the variables that influence it. Additionally, we can set parameters that minimize the cost of any of these input variables. Thus, this chapter discusses the state space model,
which has been applied, albeit sparingly, to sequential behavior analysis. We conclude by
introducing optimal control models, which to our knowledge have not been applied to sequential behavior analysis but which have the potential, as noted above, to model the variables controlling behavior more precisely. Such a model can then be applied to whatever process is under examination, and the behavior can be controlled much more precisely.
The presentation in this chapter is heuristic, emphasizing the interpretation of process
models while providing published references for further details. Some formal notation is
indispensable and will be defined in the course of developing the different methods. We first discuss, however, individual versus group analysis, a distinction that has important implications for any sequential analysis of behavioral processes.
Individual Versus Group Analysis
The question of the relation between individual and aggregated models of behavior – the
application of group findings to an individual – has a long history. Sidman (1960) reviewed the
older literature, ranging from Merrill (1931) to Estes (1956), and built a case for single subject
designs while criticizing group-data experiments (see also Branch & Pennypacker, this volume).
Hannan’s (1991) monograph is devoted to the issue of aggregation and disaggregation in the
social sciences, i.e., the relation (if any) between individual and group analyses. This also is a
major theme in macroeconomics, where similar negative conclusions about the relation between
results obtained in individual and group analyses have been obtained (Aoki, 2002). This lack of
lawful relations between individual and aggregated data not only pertains to mean trends, but to
all aspects of probability distributions (variances, correlations, etc.). From a theoretical point of
view, a main emphasis of developmental systems theory (Ford & Lerner, 1992) is that
developmental processes should be investigated at the individual level, not the group level. This
chapter deals with analysis at the individual level, based upon these and other findings that
illustrate the problems of relations between aggregated and individual findings (for a
mathematical discussion of this problem and the theory of ergodicity, see Molenaar [2004, 2007]).
Dynamic Systems Models and General State Space Models:
A Scheme for Sequential Analysis
In what follows we will refer to the unit of analysis (individual subject, dyad, triad, etc.)
as a system. The behavior of a system unfolds in time, constituting a behavior process. Hence,
the appropriate class of models for behavior processes is the class of dynamic systems models,
that is, models that can represent change over time. That change may be linear over the course of
the observation, changing consistently from one unit of time to the next; however, more common
is nonlinear change. For example, a complete learning process is typically represented by an initial asymptote at 0, during which the target behavior is not observed at all. During the learning process, the target behavior is observed more and more frequently; finally, the rate of the target behavior asymptotes again when fluency is achieved. This process is most often represented by the nonlinear logistic function (Howell, 2007).
The class of dynamic systems models is large, encompassing linear and nonlinear time
series models, artificial neural network models, and many more. A considerable variety of
dynamic systems models, however, can be conceived of as special cases of the SSM. In
particular, most dynamic systems models that are of interest for sequential analysis of behavioral
processes are special cases of the SSM. In this section the SSM will serve as the organizing
principle in presenting a range of dynamic models for the sequential analysis of behavioral
processes.
Orientation and Notational Conventions
In the material that follows, matrices (i.e., two-dimensional arrays composed of rows and columns) are denoted by capital boldface letters; vectors (one-dimensional arrays, i.e., single columns of entries) are denoted by lowercase boldface letters. The superscript T denotes transposition (interchanging rows and columns); because vectors are column vectors, transposing a vector makes it a row vector. Random processes will not be denoted in a special
way; which processes are random will be defined in the text. Manifest (observed) processes are
denoted by Roman letters; latent processes are denoted by Greek letters.
A manifest process is one that is measured directly (e.g., the number of problem behaviors a child with autism exhibits in a period of time). A latent process is a process, state, or class of behaviors that we measure by means of manifest variables. For example, in these models we may label aggression a latent state and measure it by observing several behaviors, such as hair pulling, hitting, biting, and kicking. A more technical definition is that the latent state contains all available information about the history of the relevant behavior or class of behaviors.
For example, Kuhn, Hardesty, and Luczynski (2009) conducted a functional analysis of antecedent social events and their effect on the motivation of an individual with severe problem behavior to access preferred edible items. If we were to analyze these data using an SSM, the manipulation of antecedent social events by having the therapist consume food items would be the inputs, and the change in the participant's motivation, as reflected by decreased socially appropriate responding, would be the latent state in the SSM.
The Linear Gaussian SSM
We start with the linear Gaussian SSM. The term “Gaussian” refers to a normally
distributed data set; a common example of a Gaussian distributed data set is the heights of adult
male humans. An example of a simple linear Gaussian model is the relation of height and weight
of adult male humans; generally speaking, as height increases, weight increases proportionally at
a constant rate.
The linear Gaussian SSM consists of two submodels: the measurement submodel and the
dynamic submodel. As will be described in subsequent sections of this chapter, this
decomposition into measurement and dynamic submodels applies to each SSM, not just the
Gaussian SSM. The measurement submodel of the Gaussian SSM is made up of a Gaussian (normally distributed) behavior process y that has been observed at equidistant measurement occasions (e.g., 20-min observation sessions conducted at the same time each day), a latent Gaussian state process η (to be explained shortly), a manifest (observed) input series u, and a Gaussian error process ε. Thus the measurement submodel specifies that the behavior process y at time t is a function of the individual mean μ, the latent state process, the input, and error:
(1A)
y(t) = μ + Λη(t) + Γu(t) + ε(t)
with Λ and Γ containing regression coefficients (i.e., values allowed to vary so as to provide a better fit to the data; these values may be interpreted in ways discussed below). Equation 1A can be interpreted as a traditional factor model of behavior, one in which the latent state process η(t) contains all available information about the history of the behavior process y(t).
The dynamic submodel of the linear Gaussian SSM is expressed as:
(1B)
η(t) = α + βη(t-1) + Δu(t) + ζ(t)
The dynamic submodel links the previous time point (t-1) to the current time point t, which is to say that the previous state affects the present state. We specify the latent Gaussian state process at time t (i.e., the left side of Equation 1B) as a function of the mean α, the state at the previous time point η(t-1), the input u at time t, and error ζ. The matrix β contains regression coefficients linking the past state process η(t-1) to the current state process η(t). The matrix Δ also contains regression coefficients, values that quantify the effect of the input u(t) on the state process η(t). The degree to which one cannot predict η(t) given η(t-1) is represented by ζ(t). The vector α contains mean levels (intercepts).
Example. For the first example, we refer to Glass, Willson & Gottman (1975) and their
argument for interrupted time series as an alternative to experimental designs. Here we have a
simulated single-subject univariate time series y(t) obtained during a baseline condition lasting from t = 1 until t = T1 (the end of the baseline condition) and an experimental condition lasting from t = T1 + 1 until T2. We set T1 = 50 and T2 = 100. For example, this simulation could represent the number of appropriate responses emitted before and after the introduction of a behavioral intervention with a child diagnosed with autism. The mean μ is 1 for the baseline condition; our input u(t) = 0 for the baseline condition (i.e., the absence of the intervention manipulation) and 1 for the intervention condition. These specifications yield the following special case of the linear Gaussian SSM, in which Λ = 1 and ε(t) = 0; the state process is treated as observed (i.e., no free parameter is used in scaling the Gaussian state process and there is no measurement error), so both can be dropped from the measurement submodel:
(2)
Measurement submodel:
y(t) = η(t) + Γu(t)
Dynamic submodel:
η(t) = βη(t-1) + ζ(t)
Similarly, α and Δu(t) are dropped from the dynamic submodel as the model is centered so that the mean is 0; thus α = 0, and u(t), which is 0 for the baseline condition and 1 for the experimental condition, enters only through the measurement submodel.
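To make this specification concrete, the following minimal Python sketch simulates data with the structure of (2). The particular parameter values (β = 0.6, Γ = 0.9, process noise variance 0.9) and the random seed are illustrative assumptions chosen to be close to the estimates reported below; they are not necessarily the values used to generate Figure 1.

```python
# Minimal simulation of the special-case linear Gaussian SSM in (2).
# Parameter values are illustrative (close to the estimates reported in the
# text), not the exact values behind Figure 1.
import numpy as np

rng = np.random.default_rng(1)

T1, T2 = 50, 100            # end of baseline, end of experimental condition
beta, gamma = 0.6, 0.9      # autoregression (beta) and intervention effect (Gamma)
zeta_sd = np.sqrt(0.9)      # standard deviation of the process noise zeta(t)

u = np.zeros(T2)
u[T1:] = 1.0                # input: 0 during baseline, 1 during intervention

eta = np.zeros(T2)          # latent state process eta(t)
y = np.zeros(T2)            # manifest behavior process y(t)
for t in range(T2):
    prev = eta[t - 1] if t > 0 else 0.0
    eta[t] = beta * prev + zeta_sd * rng.standard_normal()   # dynamic submodel
    y[t] = eta[t] + gamma * u[t]                             # measurement submodel

print(y.mean(), y[T1:].mean() - y[:T1].mean())  # mean shift due to the intervention
```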
Figure 1 illustrates these simulated data. The continuous line depicts y(t) (our behavioral outcome), the broken line u(t) (our input, which in this example corresponds to the pre- and post-intervention conditions). The obtained estimates of the parameters in (2) (designated with ^ above each parameter) are Γ̂ = 0.9 and β̂ = 0.6, and the estimated variance of the error ζ(t) is 0.9 (Γ̂ is the estimate of the effect of the experimental manipulation on the outcome). All parameter estimates are significant at nominal alpha = 0.01 (details not presented). This implies that Γ̂ = 0.9 is a significant effect of the experimental intervention condition, and that β̂ = 0.6 is a significant effect of η at the previous time point t-1; in other words, there is a significant relation between the quantified value of the latent state η at time t-1 and its value at time t.
Figure 1 placement approximately here
For example, if the state being targeted were aggressive behavior, as measured by instances of hitting, kicking, and biting, the significant effect of η at the previous time point t-1 would indicate that the level of aggressive behavior at time point t-1 has a significant relation to the level of aggressive behavior at time point t.
An example of data from applied behavior analysis to which this model might be applied is found in Kuhn, Hardesty, and Luczynski (2009). They conducted a functional analysis of antecedent social events and their effect on the motivation of an individual with severe problem behavior to access preferred edible items. Manipulating antecedent social events by having the therapist consume food items (the inputs, u(t)) changed the participant's motivation (as measured by an increase or decrease in socially inappropriate responding), a latent state. Although a sequential relation between motivation at time t-1 and time t was not explored in the authors' analysis, it is possible that analysis of these data using the linear Gaussian SSM would identify any such dependencies, assuming sufficient occasions of data (a total of 100 occasions or more) were collected to perform the analysis.
This first example involves what is perhaps the simplest possible instance of a linear
Gaussian SSM. The linear Gaussian SSM can be straightforwardly extended to continuous time
(Simon, 2006).
Implementation. These analyses can be implemented in MKFM6, a Fortran computer program that uses an expectation-maximization (EM) algorithm to fit linear Gaussian SSMs; the program, together with its documentation, can be freely downloaded from the website http://users.fmg.uva.nl/cdolan/. The EM algorithm is computationally more efficient than the more commonly used direct numerical maximization of the likelihood (ML). This computer program also can estimate
multivariate Gaussian time series obtained with multiple subjects in a replicated time series
design.
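To give a sense of the machinery on which such programs rest, the sketch below implements a generic Kalman filter forward pass for a univariate special case of (1A)-(1B) without an input term. It is an illustrative sketch only: it is not the MKFM6 code, the EM algorithm additionally requires smoothing and parameter-updating steps, and the function name and arguments are our own.

```python
# Generic Kalman filter forward pass for a univariate linear Gaussian SSM:
#   eta(t) = beta * eta(t-1) + zeta(t),   y(t) = lam * eta(t) + eps(t)
# Illustrative only; this is not the algorithm as implemented in MKFM6.
import numpy as np

def kalman_forward(y, beta, lam, q, r, eta0=0.0, p0=1.0):
    """Return filtered state means and one-step prediction errors.
    q = var(zeta), r = var(eps)."""
    eta, p = eta0, p0
    eta_filt, innovations = [], []
    for obs in y:
        # prediction step
        eta_pred = beta * eta
        p_pred = beta * p * beta + q
        # update step
        innov = obs - lam * eta_pred       # one-step-ahead prediction error
        s = lam * p_pred * lam + r         # innovation variance
        k = p_pred * lam / s               # Kalman gain
        eta = eta_pred + k * innov
        p = (1.0 - k * lam) * p_pred
        eta_filt.append(eta)
        innovations.append(innov)
    return np.array(eta_filt), np.array(innovations)

# Example use with data simulated as in the previous sketch (values are hypothetical):
# eta_hat, innov = kalman_forward(y, beta=0.6, lam=1.0, q=0.9, r=0.1)
```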
The Hidden Markov Model
A hidden Markov model (HMM) is similar in many ways to the linear Gaussian SSM.
One similarity is that the HMM is used when the outcome, y(t), is known but the state is unknown, or "hidden" (i.e., latent, as in the linear Gaussian SSM example above); by contrast, a plain Markov model is one in which the state can be read off directly from the value of y(t). Recalling that in a
behavioral analysis y(t) is the behavioral outcome at a given point in time (e.g., one of the data
points in Figure 1), the experimenter is interested in the organism’s state so as to predict the next
behavioral event y(t+1), or the next state.
The HMM is used to estimate the probability of transition from one state to another,
given the observed data. An HMM could be used to analyze the transitions between different
categories of behavior; for example, observational data could be gathered over time of problem
behavior (e.g., aggression, self-injurious behavior, and disruption). These data then could be
analyzed using an HMM, which would result in the probabilities of transition from one state
(category of problem behavior) to another. The resulting probabilities could be used to more
precisely tailor a behavioral intervention. For example, if the target behavior was self-injurious
behavior, and an HMM revealed that the probability of transition from aggression to self-injurious behavior was much higher than the probability of transition from aggression to
disruption, intervention could be targeted more precisely before the occurrence of self-injurious
behavior.
The defining equations for the HMM are similar to (1A)-(1B) defining the linear Gaussian SSM. For the HMM, y(t) is a categorical process with p categories, and the latent state process η(t) is also a categorical process, with q categories (the number of states). The standard HMM can be defined as:
(3A)
Measurement submodel:
y(t) = Λη(t) + ε(t)
(3B)
Dynamic submodel:
η(t) = βη(t-1) + ζ(t)
Λ contains the probabilities that y(t) is a certain category, given a value of η(t). β contains the conditional probabilities that η(t) is a certain category, given a value of η(t-1). The measurement error process ε(t) and the process noise ζ(t) are categorical processes with a priori fixed properties (Elliott, Aggoun & Moore, 1995).
Example. Until now, applications of HMMs to the sequential analysis of behavior processes have been rare. Notable exceptions are Courville and Touretzky (2002) and Visser, Raijmakers, and Molenaar (2007). We will discuss Visser, Raijmakers, and Molenaar's (2002) application of the HMM to implicit learning, or learning without awareness, of a rule-governed sequence, using simulated data to illustrate. The data were simulated according to a state process η(t) with 2 categories (q = 2) representing two states, an "unlearned" (guessing) state and a "learned" state. The observed process y(t) was simulated with 3 categories (p = 3), representing the three possible responses, that is, the three different buttons that participants could press during the rule-governed sequence task. In this instance, the probabilities of η(t) given the values of η(t-1), contained in the matrix β, are shown in Table 1. Thus, when the previous state is the unlearned state (η(t-1) = 1), the probability that the current state is also the unlearned state is 0.9; when the previous state is the learned state (η(t-1) = 2), the probability that the current state is the unlearned state is 0.3.
Table 1 placement approximately here
We can understand similarly the elements of the matrix Λ, as shown in Table 2. When the present state is the unlearned state (η(t) = 1), the probability that the outcome is 1 (continued guessing) is 0.7; when the present state is the learned state (η(t) = 2), the probability that the outcome is 1 is 0; when the present state is the unlearned state (η(t) = 1), the probability that the outcome is 2 is 0; and when the present state is the unlearned state (η(t) = 1), the probability that the outcome is 3 is 0.3.
Table 2 placement approximately here
Figure 2 depicts the observed process y(t) and the latent state process η(t). The category values of the latent state process have been increased by 4 units for clarity. The estimates of the conditional probabilities specified in Tables 1 and 2 are very close to their true values (see Visser et al., 2002, for further details). As can be observed in the plot of the data, when the state is in the learned category (a value of 6 on the y-axis), the response emitted is more likely to be in Categories 2 or 3.
Figure 2 placement approximately here
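A minimal Python simulation of this HMM, using the transition probabilities of Table 1 and the conditional response probabilities of Table 2, is sketched below. The series length, random seed, and starting state are illustrative assumptions; this is not the original simulation code of Visser et al. (2002).

```python
# Simulate an HMM with the Table 1 transition and Table 2 response
# probabilities; an illustrative sketch in the spirit of the Figure 2 data.
import numpy as np

rng = np.random.default_rng(7)

trans = np.array([[0.9, 0.1],       # row: eta(t-1) = 1 (unlearned)
                  [0.3, 0.7]])      # row: eta(t-1) = 2 (learned)
emit = np.array([[0.7, 0.0, 0.3],   # row: eta(t) = 1; columns: y = 1, 2, 3
                 [0.0, 0.4, 0.6]])  # row: eta(t) = 2

T = 200
states = np.zeros(T, dtype=int)     # 0-based: 0 = unlearned, 1 = learned
obs = np.zeros(T, dtype=int)        # 0-based response categories
obs[0] = rng.choice(3, p=emit[states[0]])
for t in range(1, T):
    states[t] = rng.choice(2, p=trans[states[t - 1]])
    obs[t] = rng.choice(3, p=emit[states[t]])

print("proportion of occasions in the learned state:", states.mean())
```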
An HMM could be used to analyze the probability of transition among different
categories of problem behavior in an individual with mental retardation, as in Kuhn et al. (2009).
Observational data could be collected over time of stereotypic movements, self-injurious
behavior, and disruptive behavior. If the target behavior is disruptive behavior, an HMM may
reveal, for example, that the probability of transitioning from stereotypic movement to disruptive
behavior is lower than the probability of transitioning from self-injurious behavior to disruptive
behavior. An intervention could then be designed accordingly, and targeted more precisely.
Parameter estimation. Parameter estimation in HMMs and hierarchical hidden Markov models (HHMMs) can be accomplished by means of the EM algorithm mentioned in the linear Gaussian SSM section. This algorithm, when used for HMMs and HHMMs, is known as the Baum-Welch (forward-backward) algorithm. It has been implemented in the R package depmixS4, which can be freely downloaded from the website http://users.fmg.uva.nl/ivisser/.
Visser and Speekenbrink (2010) provide illustrative examples and instruction. Fraser (2008)
presents a transparent derivation of this algorithm.
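At the heart of the E-step lies the forward recursion, which yields the likelihood of an observed categorical series given transition and response probabilities. The sketch below shows a scaled version of that recursion; it is illustrative only, not the depmixS4 implementation, and the function name and argument layout are our own.

```python
# Scaled forward recursion for an HMM: log-likelihood of an observed
# categorical series. Illustrative sketch; depmixS4 implements the full
# Baum-Welch (forward-backward) EM algorithm.
import numpy as np

def hmm_loglik(obs, init, trans, emit):
    """obs: 0-based observation indices; init: initial state probabilities;
    trans[i, j] = P(state j | state i); emit[i, k] = P(category k | state i)."""
    alpha = init * emit[:, obs[0]]
    scale = alpha.sum()
    loglik = np.log(scale)
    alpha = alpha / scale
    for o in obs[1:]:
        alpha = (alpha @ trans) * emit[:, o]   # propagate state weights, weight by emission
        scale = alpha.sum()
        loglik += np.log(scale)
        alpha = alpha / scale
    return loglik

# Example use with the matrices and simulated series from the previous sketch:
# hmm_loglik(obs, init=np.array([1.0, 0.0]), trans=trans, emit=emit)
```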
The Generalized Linear Dynamic Model
Another instance of the SSM is the generalized linear dynamic model (GLDM). In the GLDM the manifest process y(t) is categorical (as in the standard HMM), whereas the latent state process η(t) is continuous and linear (as in the linear Gaussian SSM). This model can be conceived of as a dynamic generalization of loglinear analysis, and it serves as an alternative to the HMM.
The GLDM is defined as:
(5A)
y(t) = h[η(t)] + ε(t)
(5B)
η(t) = α + βη(t-1) + Δu(t) + ζ(t)
These formulas are similar to those in the HMM section and the linear Gaussian SSM section. In the GLDM, h[η(t)] is a nonlinear function of η(t), the so-called response function (Fahrmeir & Tutz, 2001). The matrix β again contains regression coefficients linking the prior state process η(t-1) to the current state process η(t), and the matrix Δ again contains regression coefficients for the input u(t). The process noise ζ(t) again represents the lack of predictability of η(t) given η(t-1). The vector α contains mean levels (intercepts).
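To illustrate the structure of (5A)-(5B), the sketch below simulates a GLDM in which the response function h[.] is taken to be a multinomial logit (softmax), one common choice discussed by Fahrmeir and Tutz (2001). The category weights, autoregression, and noise level are arbitrary illustrative values, not estimates from any data set.

```python
# Minimal GLDM sketch: continuous latent state eta(t), categorical outcome
# y(t) generated through a multinomial logit (softmax) response function.
# All parameter values are purely illustrative.
import numpy as np

rng = np.random.default_rng(3)

T = 100
beta, zeta_sd = 0.8, 0.5
w = np.array([-1.0, 0.0, 1.0])   # category weights mapping the state to 3 outcomes

eta = np.zeros(T)
y = np.zeros(T, dtype=int)
for t in range(T):
    prev = eta[t - 1] if t > 0 else 0.0
    eta[t] = beta * prev + zeta_sd * rng.standard_normal()   # dynamic submodel (5B)
    logits = w * eta[t]
    p = np.exp(logits - logits.max())
    p = p / p.sum()                                          # softmax response h[eta(t)]
    y[t] = rng.choice(3, p=p)                                # categorical outcome (5A)

print("category counts:", np.bincount(y, minlength=3))
```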
Example. An example of data from the behavior analysis literature that could be analyzed using this approach is found in Cunningham (1979). The prediction of extinction or conditioned suppression of a conditioned emotional response y(t) as a function of sobriety η(t) could be assessed. Cunningham trained a conditioned emotional response in sober rats. Extinction was then conducted under conditions of high doses of alcohol, and conditioned suppression returned when the rats were again sober. The GLDM could be used to determine at what level of sobriety (e.g., under different doses of alcohol) conditioned suppression returns.
Parameter estimation. Parameter estimation in the GLDM is accomplished by means of the EM algorithm described in the linear Gaussian SSM section. The nonlinear response function in (5A) necessitates a special approach to the estimation of η(t) (Fahrmeir & Tutz, 2001). A beta version of the computer program concerned can be obtained from the first author.
The Linear Gaussian SSM with Time-Varying Parameters
The state space models considered in the previous sections all have parameters that are
constant – consistent both in level and pattern – over time. Yet learning and developmental
processes often have changing statistical characteristics that only can be captured accurately by
allowing model parameters to be time varying. For example, erratic environments with
inconsistent contingencies have been associated with the development of attachment problems in
children (Belsky, Garduque & Hrncir, 1984).
There exist a number of model types for the sequential analysis of processes with
changing statistical characteristics (so-called nonstationary processes), several of which are
described in Priestley (1988) and Tong (1990). Here we focus on a direct generalization of the
linear Gaussian SSM, allowing its parameters to be time varying in arbitrary ways. This
approach is based on related work by Young, McKenna and Bruun (2001) and Priestley (1988).
For this generalization we introduce a new vector into our formulas, θ_t, which contains all the unknown parameters. Notice that θ_t is time-varying, as indicated by the subscript t. The
individual parameters in θ_t may or may not be time-varying; it is one of the tasks of the estimation algorithm to determine which individual parameters in θ_t are time-varying and, if so, how they vary in time. With this specification, the linear Gaussian SSM with time-varying parameters is obtained as the following generalization of (1A)-(1B):
(6A)
y(t) = μ[θ_t] + Λ[θ_t]η(t) + Γ[θ_t]u(t) + ε(t)
(6B)
η(t) = α[θ_t] + β[θ_t]η(t-1) + Δ[θ_t]u(t) + ζ(t)
(6C)
θ_t = θ_{t-1} + ξ(t)
The interpretation of the parameter matrices in (6A)-(6B) is the same as for (1A)-(1B), but now each parameter matrix or vector depends upon elements of the vector θ_t in which all unknown parameters are collected. For instance, Λ[θ_t] specifies that the unknown parameters in the matrix Λ are elements of the vector θ_t of time-varying parameters. Equation (6C) is a new addition, tracking the possibly time-varying behavior of the unknown parameters. The process ξ(t) represents the lack of predictability of θ_t. If an element of ξ(t) has zero (or small) variance, then the corresponding parameter in θ_t is (almost) constant in time, whereas the parameter will be time-varying if the variance of this element of ξ(t) is significantly different from zero.
Example. One example of the application of this model is based on data gathered from a biological son and his father. Such an application could allow an intervention to be planned and implemented to reduce escalating negative interactions, which in turn might improve the long-term relationship between father and son. These data were presented by Molenaar and Campbell
(2009), in what appears to be the first application of this model to psychological process data.
The data concern a 3-variate time series of repeated measures of emotional experiences of
biological sons with their biological fathers during 80 consecutive interaction episodes over a
period of about 2 months. The self-report measures collected at the conclusion of each episode
were Involvement, Anger and Anxiety. Here we focus on a single biological son, the data of
whom are depicted in Figure 3.
Figure 3 placement approximately here
The following instance of (6A)-(6C) was used to analyze the data in Figure 3:
(7A)
y(t) = η(t)
(7B)
η(t) = β[θ_{t-1}]η(t-1) + ζ(t)
(7C)
θ_t = θ_{t-1} + ξ(t)
The matrix β[θ_t] contains the possibly time-varying regression coefficients linking η(t) to η(t-1). Here we only consider the part of the model explaining the Involvement process. This part can be represented as:
(8)
Inv(t) = β1(t-1)*Inv(t-1) + β2(t-1)*Ang(t-1) + β3(t-1)*Anx(t-1) + ζ(t)
where Inv = Involvement, Ang = Anger, and Anx = Anxiety. Thus, Involvement at time t is a function of Involvement, Anger, and Anxiety at the previous time point (t-1). The beta coefficients in (8) are themselves indexed by time, indicating that the strength of each of these relations may vary across episodes; each coefficient quantifies the relation between the variable concerned at time t-1 and Involvement at time t.
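The sketch below simulates a process with the structure of (7B)-(7C) and (8): Involvement at each episode depends on the previous episode's Involvement, Anger, and Anxiety through coefficients that drift as random walks. All numerical values are illustrative assumptions, not the estimates obtained from the father-son data, and for simplicity Anger and Anxiety are generated as noise rather than modeled.

```python
# Illustrative time-varying coefficient process in the spirit of (7)-(8):
# Inv(t) = beta1(t-1)*Inv(t-1) + beta2(t-1)*Ang(t-1) + beta3(t-1)*Anx(t-1) + zeta(t),
# with the beta coefficients drifting as random walks (theta_t = theta_{t-1} + xi(t)).
import numpy as np

rng = np.random.default_rng(5)

T = 80                                # number of interaction episodes
beta = np.array([0.6, -0.2, -0.3])    # starting values of beta1, beta2, beta3
coef_sd = 0.02                        # std of the random-walk innovations xi(t)
zeta_sd = 0.5                         # std of the process noise zeta(t)

prev = np.zeros(3)                    # [Involvement, Anger, Anxiety] at t-1
inv = np.zeros(T)
betas = np.zeros((T, 3))
for t in range(T):
    beta = beta + coef_sd * rng.standard_normal(3)           # coefficients drift
    inv[t] = beta @ prev + zeta_sd * rng.standard_normal()
    ang, anx = rng.standard_normal(), rng.standard_normal()  # simplification: noise series
    prev = np.array([inv[t], ang, anx])
    betas[t] = beta

print("beta at first and last episode:", betas[0], betas[-1])
```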
Figure 4 shows the estimates of β1(t), β2(t), and β3(t) across the 80 interaction episodes and illustrates the intertwined nature of Involvement, Anger, and Anxiety. β1(t), which quantifies the effect of Involvement at time t-1 on Involvement at time t, decreases across the first half of the interaction episodes and then stabilizes during the final half. β3(t) in (8), which quantifies the effect of Anxiety at the previous interaction episode on Involvement at the next episode, increases across the first half of the interaction episodes and then stabilizes during the final half. It is noteworthy that β3(t) is negative during the initial interaction episodes but positive during the later episodes. When β3(t) is negative, there is a negative relation between Anxiety at time t-1 and Involvement at time t; that is, increased anxiety predicts decreased involvement at the subsequent interaction, and decreased anxiety predicts increased involvement. When β3(t) is positive, there is a positive relation between Anxiety at time t-1 and Involvement at time t; that is, increased anxiety predicts increased involvement at the subsequent interaction, and decreased anxiety predicts decreased involvement. When β3(t) is zero, Anxiety at time t-1 has no effect on Involvement at the subsequent interaction. Additional procedural details and data can be found in Molenaar, Sinclair, Rovine, Ram & Corneal (2009).
Figure 4 placement approximately here
An example of data from the behavior analysis literature that could be analyzed using this approach can be found in McSweeney, Murphy, and Kowal (2004). The authors examined habituation as a function of reinforcer duration. With the present model, the rate of responding y(t) could be modeled as a function of habituation, represented by the latent state η(t), with the strength of the sequential dependency β[θ_{t-1}]η(t-1) allowed to vary over time, so that changes in habituation could be assessed more precisely.
Parameter estimation. Parameter estimation in (6A)-(6C) is accomplished by the EM
algorithm. The E-step (expectation step) requires considerable reformulation of the original
model, resulting in a nonlinear analogue. A beta version of the Fortran program with which the
linear Gaussian SSM with time-varying parameters is fitted to the data can be obtained from the
first author.
SSMs in Continuous Time
The class of SSMs in continuous time is large, including (non-)linear stochastic
differential equations, Fokker-Planck equations, etc., and therefore cannot be reviewed within
the confines of this chapter. One class of SSMs in continuous time, the point process, could be applied to the data of Nevin and Baum (1980). Point processes can be used to model the
timing of a repeated discrete response. Nevin and Baum discuss probabilities of the termination
and initiation of a responding burst in free-operant behavior. An assumption of one model
discussed is that, during a burst, interresponse times are approximately constant. A point process
could test this assumption, as well as whether the transition from responding to termination is a
function of that timing. Rate of responding and rate of reinforcement data could be collected
while training pigeons. These data could be analyzed using the point process, and results would
indicate whether interresponse times are constant or varied during a burst, and whether the
transition from responding to termination is a function of constant interresponse time or varied
interresponse time.
For excellent reviews of SSMs in continuous time see, for instance, Gardiner (2004) and
Risken (1984). The linear Gaussian SSMs considered here are special in that the analogues in
continuous time share most of their statistical properties. Molenaar & Newell (2003) applied
nonlinear stochastic differential equations to analyze bifurcations (sudden qualitative
reorganizations) in human motion dynamics. Haccou & Meelis (1992) give a comprehensive
treatment of continuous time Markov chain models for the analysis of ethological processes.
Remarks on Optimal Control of Behavior Processes
The concepts of homeostasis and control have long played a role in behavior theory
(McFarland, 1971; Carver, 2004; Johnson, Chang & Lord, 2006). This role, however, has been
confined to the theoretical level. If one of the SSMs considered earlier has been fitted to a
behavioral process, then one can use this model to optimally control the process as it unfolds by means of powerful mathematical techniques. Here we present optimal control as a computational technique for steering a behavioral process to desired levels. Fitting such a model to a complex behavioral process, such as the development of aggressive behavior, allows the precise estimation of parameters that can then be used to ameliorate the problem behavior in an efficient manner.
To introduce the concept of computational feedback, consider the following simple
behavior process model in discrete time:
(10)
y(t+1) = λy(t) + γu(t) + ε(t+1)
Here y(t) denotes a manifest univariate behavior process, u(t) is a univariate external input, ε(t) is process noise, and λ and γ are regression coefficients. From an intervention perspective, the desired level of the behavioral process is y*, and this level is to be achieved by experimentally manipulating u(t) (e.g., medication, a reinforcement contingency, etc.).
Given (10) and y*, the goal of the computational feedback problem is to set the value of u(t) (i.e., the level of the independent variable) in a way that minimizes the difference between the expected value of the behavior and y*. Doing so requires the expected value of the behavior at the next occasion given the extant record of that behavior up to time t, E[y(t+1|t)]. The deviation between expected and desired behavior is expressed as a cost function; a simple cost function is C(t+1) = E[y(t+1|t) - y*]². The external input u(t) is chosen in a way that minimizes this cost at all times t. Because E[y(t+1|t)] = λy(t) + γu(t), the optimal level of the independent variable, u*(t), is given by:
(11)
u*(t) = [y* - λy(t)]/γ
Equation (11) is a feedback function, in which the optimal input u*(t), which will co-determine the value y(t+1), depends upon y(t). Thus the optimal input at time t, used to steer the behavior y at time t+1 toward the desired value y*, depends upon the value of y at time t.
Example. Figure 5 depicts a manifest behavioral process y(t), t = 1, ..., 50 (labeled Y). This y(t) is uncontrolled (i.e., not subject to any experimental manipulation) and was simulated according to (10) with λ = 0.7 and γ = 0.9. The goal of our intervention will be to achieve a desired level of behavior, which we have set at zero: y* = 0. Application of (11) yields the u*(t) values shown as the UO function in Figure 5. Applying these optimal values of the independent variable, u*(t), in (10) yields the optimally controlled behavioral process labeled YO in Figure 5. It is evident from Figure 5 that the deviation of the optimally controlled process YO from y* = 0 is much smaller than the analogous deviation of the uncontrolled behavioral process Y.
Figure 5 placement approximately here
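The simulation behind this kind of comparison can be sketched as follows in Python, using λ = 0.7, γ = 0.9, and y* = 0 as in the example. The random seed is arbitrary, and the code is an illustrative reconstruction rather than the original simulation used for Figure 5.

```python
# Simulate the uncontrolled process (10) and the optimally controlled process
# obtained with the feedback rule (11); the same noise drives both series.
import numpy as np

rng = np.random.default_rng(11)

T, lam, gam, y_star = 50, 0.7, 0.9, 0.0
eps = rng.standard_normal(T + 1)

y_free = np.zeros(T + 1)     # Y:  uncontrolled process, u(t) = 0
y_ctrl = np.zeros(T + 1)     # YO: optimally controlled process
u_opt = np.zeros(T)          # UO: optimal inputs from (11)
for t in range(T):
    y_free[t + 1] = lam * y_free[t] + eps[t + 1]
    u_opt[t] = (y_star - lam * y_ctrl[t]) / gam            # feedback rule (11)
    y_ctrl[t + 1] = lam * y_ctrl[t] + gam * u_opt[t] + eps[t + 1]

print("mean squared deviation from y*: uncontrolled",
      np.mean((y_free - y_star) ** 2), "controlled", np.mean((y_ctrl - y_star) ** 2))
```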
This example of optimal feedback control given above is simple in a number of respects.
The behavioral process to be controlled is univariate, that is, a single dependent variable, as is the input; in actuality, both are often multivariate. For example, escalating aggression may be measured by the frequency
and co-occurrence of several behaviors, such as hair pulling, slapping, hitting, kicking and
scratching. Also, (10) assumes a known linear model. In reality, the model can be any one of the
SSMs considered previously, with parameters that are unknown and have to be estimated. In our
simple example, the cost function is a quadratic function that does not penalize the costs of
exercising control actions. In general both the deviations of the behavioral process from the
desired level and the external input (e.g., medication, reinforcement contingency, etc.) are
penalized according to separate cost functions which can have much more intricate forms. This
example, however, serves to illustrate that control of behavior can be quantified and estimated,
thus allowing for much more precise and efficient manipulations of experimental variables
designed to modify behavior.
An example from the behavior-analytic literature where this analysis might be applied is
an extension of the Kuhn et al. (2009) study mentioned earlier in this chapter. Once the evaluation of antecedent events for problem behavior had been completed, and those events that led to a reduction of the target behavior had been identified, implementation of the treatment plan could be assessed using an optimal feedback control model.
In general well-developed mathematical theories exist for optimal feedback control in
each of the types of SSMs considered here. Kwon & Han (2005) present an in-depth description
of optimal control in linear SSMs (see also Molenaar, 2010); Elliott, Aggoun & Moore (1995) is
the classic source for optimal control in HMMs; Costa, Fragoso and Marques (2005) discuss
optimal control in SSMs with regime shifting (i.e., systems models for processes undergoing
sudden changes in their dynamics).
Conclusion
Because the class of dynamic systems models is too large to comprehensively cover
within the confines of this chapter, we have reviewed several simple examples of this class of
models: the linear Gaussian SSM, the hidden Markov model, the hierarchical hidden Markov
model, the generalized linear dynamic model, the linear Gaussian SSM with time-varying
parameters, and SSMs in continuous time. Notable omissions from our review are regime shifts
(Costa et al., 2005), modeling processes having long-range sequential dependencies (Palma,
2007), and chaotic processes having dynamics evolving at multiple time scales (Gao, Cao, Tung
& Hu, 2007). These types of process models are of interest for sequential behavior analysis
(Torre & Wagenmakers, 2009) and again can be formulated as special instances of SSMs; however, their discussion requires consideration of much more technical concepts, and for that reason they have been omitted from this review. We therefore refer the reader to the excellent references given above for further details about these model types.
These new methods for the sequential analysis of behavior processes reviewed here can
be conceived of as special cases of a general SSM, a model that allows for the quantification of
complex systems composed of inputs, feedback, outputs, and states. Several instances of this general SSM were presented; however, the SSM covers a much broader range of dynamic
models. For instance, there exists a close correspondence between artificial neural network
models and state space modeling (Haykin, 2001). This correspondence opens up the possibility of reformulating artificial neural network models as specific instances of nonlinear SSMs. For
instance, the neural network models considered by Schmajuk (2002; 1997) are variants of
adaptive resonance theory networks that, in their most general form, constitute coupled systems
of nonlinear differential equations (Raijmakers & Molenaar, 1996) obeying the general nonlinear
state space model format. Similar remarks apply to reinforcement networks (Wyatt, 2003).
It is our hope that this presentation of the class of models known as SSMs has illustrated how they can be useful in behavior analysis, specifically in modeling behavior and behavior change as a process and in including all aspects of the behavior change process: the state of the organism, the inputs and outputs occurring during the process, and the feedback that occurs as the process continues. Use of these models can enhance the further development and growth of the field.
References
Aoki, M. (2002). Modeling aggregate behavior and fluctuations in economics: Stochastic views
of interacting agents. Cambridge: Cambridge University Press.
Belsky, J., Garduque, L., & Hrncir, E. (1984). Assessing performance, competence, and executive
capacity in infant play: Relations to home environment and security of attachment.
Developmental Psychology, 20, 406 – 417.
Browne, M. W. & Nesselroade, J. R. (2005). Representing psychological processes with
dynamic factor models: Some promising uses and extensions of ARMA time series
models. In: A. Maydeu-Olivares & J.J. McArdle (Eds.), Psychometrics: A festschrift to
Roderick P. McDonald. Mahwah, NJ: Lawrence Erlbaum Associates, 415-452.
Carver, S. C. (2004). Self-regulation of action and affect. In: R. F. Baumeister & K. D. Vohs
(Eds.), Handbook of self-regulation. New York: The Guilford Press, 13-39.
Costa, O. L. V., Fragoso, M. D., & Marques, R. P. (2005). Discrete-time Markov jump linear
systems. London: Springer-Verlag.
Courville, A. C., & Touretzky, D. S. (2002). Modeling temporal structure in classical
conditioning. In: T. G. Dietterich, S. Becker, & Z. Ghahramani (Eds.), Advances in neural
information processing systems, Vol. 14. Cambridge, MA: MIT Press, 3-10.
Cunningham, C. L. (1979). Alcohol as a cue for extinction: State dependency produced by
conditioned inhibition. Animal Learning & Behavior, 7, 45-52.
Elliott, R. J., Aggoun, L., & Moore, J. B. (1995). Hidden Markov models: Estimation and
control. New York: Springer-Verlag.
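Estes, W. K. (1956). The problem of inference from curves based on group data. Psychological
Bulletin, 53, 134-140.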
Fahrmeir, L., & Tutz, G. (2001). Multivariate statistical modeling based on generalized linear
models. 2nd edition. Berlin: Springer-Verlag.
Ford, D. H., & Lerner, R. M. (1992). Developmental systems theory: An integrative approach.
Newbury Park, CA: Sage Publications.
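Fraser, A. M. (2008). Hidden Markov models and dynamical systems. Philadelphia, PA: Society
for Industrial and Applied Mathematics.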
Gao, J., Cao, Y., Tung, W. W., & Hu, J. (2007). Multiscale analysis of complex time series:
Integration of chaos and random fractal theory, and beyond. Hoboken, NJ: Wiley.
Gardiner, C. W. (2004). Handbook of stochastic methods: For physics, chemistry and the natural
sciences. 3rd edition. Berlin: Springer-Verlag.
Glass, G., Willson, V., & Gottman, J. M. (1975). Design and analysis of time-series experiments.
Boulder, CO: Colorado University Press.
Gottman, J. M., McFall, R. M., & Barnett, J. T. (1969). Design and analysis of research using
time-series. Psychological Bulletin, 72, 299 – 306.
Haccou, P., & Meelis, E. (1992). Statistical analysis of behavioural data: An approach based on
time-structured models. Oxford: Oxford University Press.
Hamaker, E. L., Dolan, C. V., & Molenaar, P. C. M. (2005). Statistical modeling of the
individual: Rationale and application of multivariate stationary time series analysis.
Multivariate Behavioral Research, 40, 207-233.
Hamaker, E. L., Nesselroade, J. R., & Molenaar, P. C. M. (2007). The integrated state-space
model. Journal of Research in Personality, 41, 295-315.
Hannan, M. T. (1991). Aggregation and disaggregation in the social sciences. Lexington, MA:
Lexington Books.
Haykin, S. (2001). Kalman filtering and neural networks. New York: Wiley.
Howell, D. C. (2007). Statistical methods for psychology. Belmont, CA: Thomson Wadsworth.
Johnson, R. E., Chang, C. H., & Lord, R. G. (2006). Moving from cognition to behavior: What
the research says. Psychological Bulletin, 132, 381-415.
Kwon, W. H., & Han, S. (2005). Receding horizon control. London: Springer-Verlag.
Kuhn, D. E., Hardesty, S. L., & Luczynski, K. (2009). Further evaluation of antecedent social
events during a functional analysis. Journal of Applied Behavior Analysis, 42, 349 – 353.
McFarland, D. (1971). Feedback mechanisms in animal behavior. London: Academic Press.
McSweeney, F. K., Murphy, E. S., & Kowal, B. P. (2004). Varying reinforcer duration produces
behavioral interactions during multiple schedules. Behavioural Processes, 66, 83 – 100.
Merrill, M. (1931). The relationship of individual growth to average growth. Human Biology, 3,
37-70.
Molenaar, P. C. M. (2004). A manifesto on psychology as idiographic science: Bringing the
person back into scientific psychology, this time forever. Measurement, 2, 201-218.
Molenaar, P. C. M. (2007). On the implications of the classical ergodic theorems: Analysis of
developmental processes has to focus on intra-individual variation. Developmental
Psychobiology, 50, 60-69.
Molenaar, P. C. M. (2010). Note on optimization of individual psychotherapeutic processes.
Journal of Mathematical Psychology (to appear).
Molenaar, P. C. M., & Campbell, C. G. (2009). The new person-specific paradigm in
psychology. Current Directions in Psychological Science, 18, 112-117.
Molenaar, P. C. M., & Newell, K. M. (2003). Direct fit of a theoretical model of phase transition
in oscillatory finger motions. British Journal of Mathematical and Statistical Psychology,
56, 199-214.
Molenaar, P. C. M., Sinclair, K. O., Rovine, M. J., Ram, N., & Corneal, S. E. (2009). Analyzing
developmental processes on an individual level using non-stationary time series
modeling. Developmental Psychology, 45, 260-271.
Moore, J. W. (Ed.) (2002). A neuroscientist’s guide to classical conditioning. New York:
Springer-Verlag.
Nevin, J. A., & Baum, W. M. (1980). Feedback functions for variable-interval reinforcement.
Journal of the Experimental Analysis of Behavior, 34, 207 – 217.
Palma, W. (2007). Long-memory time series: Theory and methods. Hoboken, NJ: Wiley.
Priestley, M. B. (1988). Non-linear and non-stationary time series analysis. London: Academic
Press.
Raijmakers, M. E. J., & Molenaar, P. C. M. (1996). Exact ART: A complete implementation of an ART
network, including all regulatory and logical functions, as a system of differential equations
capable of stand-alone running in real time. Neural Networks, 10, 649-669.
Risken, H. (1984). The Fokker-Planck equation: Methods of solution and applications. Berlin:
Springer-Verlag.
Schmajuk, N. (2002). Latent inhibition and its neural substrates. Norwell, Mass.: Kluwer Academic
Publishers.
Schmajuk, N. A. (1997). Animal learning and cognition: A neural network approach.
Cambridge: Cambridge University Press.
Sharpe, T., & Koperwas, J. (2003). Behavior and sequential analyses: Principles and practice.
Thousand Oaks, CA: Sage Publications.
Sidman, M. (1960). Tactics of scientific research. Oxford: Basic Books.
Simon, D. (2006). Optimal state estimation: Kalman, H∞, and nonlinear approaches. Hoboken,
NJ: Wiley.
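Tong, H. (1990). Non-linear time series: A dynamical system approach. Oxford: Oxford
University Press.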
Torre, K., & Wagenmakers, E. J. (2009). Theories and models for 1/f noise in human movement
science. Human Movement Science, 28, 297-318.
Van Rijn, P. W., & Molenaar, P. C. M. (2005). Logistic models for single-subject time series. In:
L. A. van der Ark, M. A. Croon, & K. Sijtsma (Eds.), New developments in categorical
data analysis for the social and behavioral sciences. Mahwah, NJ: Lawrence Erlbaum
Associates, 125-145.
Visser, I., Raijmakers, M. E. J., & Molenaar, P. C. M. (2002). Fitting hidden Markov models to
psychological data. Scientific Programming, 10, 185-199.
Visser, I., Raijmakers, M. E. J., & Molenaar, P. C. M. (2007). Characterizing sequence
knowledge using on-line measures and hidden Markov models. Memory & Cognition, 35,
1502-1517.
Visser, I. & Speekenbrink, M. (2010). depmixS4: An R-package for hidden Markov
models. Journal of Statistical Software, 36(7), 1-21.
Wyatt, J. (2003). Reinforcement learning: A brief overview. In: R. Kühn, R. Menzel, W. Menzel,
U. Ratsch, M. M. Richter, & I. O. Stamatescu (Eds.), Adaptivity and learning: An
interdisciplinary debate. Berlin: Springer-Verlag, 243-264.
Young, P. C., McKenna, P., & Bruun, J. (2001). Identification of non-linear stochastic systems
by state-dependent parameter estimation. International Journal of Control, 74, 1837-1857.
Author Note
Peter C. M. Molenaar, Department of Human Development and Family Studies, The
Pennsylvania State University. Tamara Goode, Department of Human Development and Family
Studies, The Pennsylvania State University.
The authors gratefully acknowledge funding provided by NSF, grant 0852147, which
made Dr. Molenaar’s work possible; and the University Graduate Fellowship provided by The
Pennsylvania State University, which made Ms. Goode's work possible.
Correspondence concerning this article should be addressed to Peter Molenaar,
Department of Human Development and Family Studies, 110 South Henderson Building,
Pennsylvania State University, University Park, PA 16802. E-mail: pxm21@psu.edu.
Table 1

Probabilities of η(t) given η(t-1) = 1 and η(t-1) = 2

              η(t-1) = 1    η(t-1) = 2
η(t) = 1         0.9           0.3
η(t) = 2         0.1           0.7
Table 2

Probabilities of y(t) given η(t) = 1 and η(t) = 2

              η(t) = 1    η(t) = 2
y(t) = 1         0.7         0
y(t) = 2         0            0.4
y(t) = 3         0.3         0.6
Figure 1. Single subject time series with experimental manipulation at time = 51.
Figure 2. HMM observed process y(t) and latent state process η(t).
Figure 3. Observed series for biological son.
Figure 4. Time-varying coefficient estimates for the prediction of Involvement.
Figure 5. Simple optimal feedback control. Y: Behavioral process without control. YO:
Optimally controlled behavioral process. UO: Optimal external input.