New Methods for Sequential Behavioral Analysis 1 Running head: NEW METHODS FOR SEQUENTIAL BEHAVIORAL ANALYSIS New Methods for Sequential Behavior Analysis Peter Molenaar & Tamara Goode The Pennsylvania State University Corresponding Author: Peter Molenaar, Department of Human Development and Family Studies, 110 South Henderson Building, The Pennsylvania State University, University Park, PA 16802-6505. E-mail: pxm21@psu.edu. New Methods for Sequential Behavioral Analysis 2 New Methods for Sequential Behavior Analysis Sequential behavior analysis is firmly rooted in learning theory (Sharpe & Koperwas, 2003). The identification and analysis of antecedents and consequences of target behavior is basic to behavior theory; sequential behavior analysis makes explicit temporal dependencies between present and past behavior, and present and future behavior, using time-series data. Behavior theory is currently expanding in exciting new directions such as artificial neural network simulation (Schmajuk, 1997) and neuroscience (Moore, 2002). This expanded scope of behavior theory presents new challenges to sequential behavior analysis, as do recent developments in psychometrics and measurement. In this chapter we will review new methods for sequential behavior analysis that can be instrumental in meeting these challenges. The earliest methods for analyzing time-series data were simple curve-fitting procedures, in which slopes and intercepts were compared pre- and post-intervention. These methods are greatly limited, due to ubiquitous violations of the assumption that data collected in two different conditions are independent. A generating function procedure also has been used, in which signal (process) is separated from noise (error). Curve-fitting procedures and generating function procedures both quantify the dependence of present observations on those preceding (also known as lead-lag relations) (Gottman, McFall, & Barnett, 1969). Newer methods for sequential behavior analysis, such as those we describe here, no longer require the quantification of these dependencies in this manner. The inclusion of the “state” in the model, which we discuss in the next section, both simplifies these models and allows for the estimation of more complex systems and processes. In the following, the focus is on new statistical models describing the sequential dependencies of behavior processes. The statistical models are derived as special instances of a general dynamic systems model – the so-called state space model (SSM). The SSM allows for New Methods for Sequential Behavioral Analysis 3 the estimation of a complex system that includes inputs, outputs, and states. A simple but useful example of such a system is one that many students experience in an introductory psychology lab: that of training a rat to press a lever. An input could be a tone signaling that reinforcement is available; the state could be hunger; and the output could be the pressing of the lever. The general SSM not only provides for a transparent classification of a wide range of behavioral process models, but also enables the specification of a general estimation scheme to fit these models to the data. Moreover, and importantly, the general state space model constitutes an excellent paradigm to introduce optimal control of behavior processes – a powerful extension of sequential behavior analysis that, to the best of our knowledge, is considered here for the first time. 
In behavior analysis, it is understood that optimal control means the least variability possible in behavior as a function of controlling all of the environmental variables possible. Here we are using optimal control as engineers use the term, which is somewhat similar. As in behavior analysis, variability in the output, or the behavior being modeled, is minimized. However, we are also able to model the desired level of the behavior explicitly, and to estimate the level of those variables that influence the behavior. Additionally, we can set parameters to minimize the cost of any of those input variables. Thus, this chapter discusses the state space model, which has been applied, albeit sparingly, to sequential behavior analysis. We conclude by introducing optimal control models, which to our knowledge have not been applied to sequential behavior analysis but have the potential, as noted above, to model variables controlling behavior more precisely. Such a model can then be applied to whatever process is under examination, and behavior can be controlled much more precisely.

The presentation in this chapter is heuristic, emphasizing the interpretation of process models while providing published references for further details. Some formal notation is indispensable and will be defined in the course of the development of the different methods. We first discuss, however, individual versus group analysis, a distinction that has important implications for any sequential analysis of behavioral processes.

Individual versus Group Analysis

The question of the relation between individual and aggregated models of behavior – the application of group findings to an individual – has a long history. Sidman (1960) reviewed the older literature, ranging from Merrill (1931) to Estes (1956), and built a case for single-subject designs while criticizing group-data experiments (see also Branch & Pennypacker, this volume). Hannan's (1991) monograph is devoted to the issue of aggregation and disaggregation in the social sciences, i.e., the relation (if any) between individual and group analyses. This also is a major theme in macroeconomics, where similar negative conclusions about the relation between results obtained in individual and group analyses have been reached (Aoki, 2002). This lack of lawful relations between individual and aggregated data pertains not only to mean trends, but to all aspects of probability distributions (variances, correlations, etc.). From a theoretical point of view, a main emphasis of developmental systems theory (Ford & Lerner, 1992) is that developmental processes should be investigated at the individual level, not the group level. This chapter deals with analysis at the individual level, based upon these and other findings that illustrate the problems of relating aggregated and individual findings (for a mathematical discussion of this problem and the theory of ergodicity, see Molenaar, 2004, 2007).

Dynamic Systems Models and General State Space Models: A Scheme for Sequential Analysis

In what follows we will refer to the unit of analysis (individual subject, dyad, triad, etc.) as a system. The behavior of a system unfolds in time, constituting a behavior process. Hence, the appropriate class of models for behavior processes is the class of dynamic systems models, that is, models that can represent change over time.
That change may be linear over the course of the observation, changing consistently from one unit of time to the next; more common, however, is nonlinear change. For example, a complete learning process typically begins with an initial asymptote at 0, during which the target behavior is not observed at all. During the learning process, the target behavior is observed more and more frequently; finally, the target behavior asymptotes again when fluency is achieved. This process is most often represented by the nonlinear logistic function (Howell, 2007).

The class of dynamic systems models is large, encompassing linear and nonlinear time series models, artificial neural network models, and many more. A considerable variety of dynamic systems models, however, can be conceived of as special cases of the SSM. In particular, most dynamic systems models that are of interest for sequential analysis of behavioral processes are special cases of the SSM. In this section the SSM will serve as the organizing principle in presenting a range of dynamic models for the sequential analysis of behavioral processes.

Orientation and Notational Conventions

In the material that follows, matrices (i.e., two-dimensional data structures [arrays] composed of rows and columns) are denoted by capital boldface letters; vectors (one-dimensional arrays, i.e., a single column of elements) are denoted by lowercase boldface letters. The superscript T denotes transposition (interchanging rows and columns). Because vectors are column vectors, transposing a vector makes it a row vector. Random processes will not be denoted in a special way; which processes are random will be defined in the text. Manifest (observed) processes are denoted by Roman letters; latent processes are denoted by Greek letters. A manifest process is that which is measured (e.g., the number of problem behaviors a child with autism exhibits in a period of time). A latent process is a process, state, or class of behaviors that we measure with manifest variables. For example, for these models we may label aggression a latent state, and measure aggression by observing several behaviors, such as hair pulling, hitting, biting, and kicking. A more technical definition is that the latent state contains all available information about the history of the relevant behavior or class of behaviors. For example, Kuhn, Hardesty, and Luczynski (2009) conducted a functional analysis of antecedent social events and their effect on motivation to access preferred edible items by an individual with severe problem behavior. If we were to analyze these data using an SSM, the manipulation of antecedent social events by having the therapist consume food items would be the inputs in the SSM, and the change in the participant's motivation, as reflected by decreased socially appropriate responding, would be the latent state in the SSM.

The Linear Gaussian SSM

We start with the linear Gaussian SSM. The term "Gaussian" refers to a normally distributed data set; a common example of a Gaussian-distributed data set is the heights of adult male humans. An example of a simple linear Gaussian model is the relation of height and weight of adult male humans; generally speaking, as height increases, weight increases proportionally at a constant rate. The linear Gaussian SSM consists of two submodels: the measurement submodel and the dynamic submodel.
As will be described in subsequent sections of this chapter, this decomposition into measurement and dynamic submodels applies to each SSM, not just the Gaussian SSM. The measurement submodel of the Gaussian SSM is made up of a Gaussian (normally distributed) behavior process y that has been observed at equidistant measurement occasions (e.g., 20-min observation sessions conducted at the same time each day), a latent Gaussian state process η (to be explained shortly), a manifest (observed) input series u, and a Gaussian error process ε. Thus the measurement submodel specifies that the behavior process y at time t is a function of the individual mean μ, the latent process, the input, and error:

(1A) y(t) = μ + Λη(t) + Δu(t) + ε(t)

with Λ and Δ containing regression coefficients (i.e., values allowed to vary so as to provide a better fit to the data; these values may be interpreted in ways discussed below). Equation 1A can be interpreted as a traditional factor model of behavior, one in which the latent state process η(t) contains all available information about the history of the behavior process y(t).

The dynamic submodel of the linear Gaussian SSM is expressed as:

(1B) η(t) = α + βη(t-1) + Γu(t) + ζ(t).

The dynamic submodel links the previous time point (t-1) to the current time point (t); which is to say that the previous state affects the present state. We specify the latent Gaussian process at time t (i.e., the left side of Equation 1B) as a function of the intercept α, the state at the previous time point η(t-1), the input u at time t, and error ζ(t). The matrix β contains regression coefficients linking the past state process η(t-1) to the current state process η(t). The matrix Γ also contains regression coefficients, values which quantify the effect of the input u(t) on the state process η(t). The degree to which one cannot predict η(t) given η(t-1) is represented by ζ(t). The vector α contains mean levels (intercepts).

Example. For the first example, we refer to Glass, Willson, and Gottman (1975) and their argument for interrupted time series as an alternative to experimental designs. Here we have a simulated single-subject univariate time series y(t) obtained during a baseline condition lasting from t = 1 until t = T1 (the end of the baseline condition) and an experimental condition lasting from t = T1 + 1 until T2. We set T1 = 50 and T2 = 100. For example, this simulation could represent the number of appropriate responses emitted before and after the introduction of some behavioral intervention with a child diagnosed with autism. The mean is 1 for the baseline condition; our input u(t) = 0 during the baseline condition (i.e., the absence of the intervention manipulation) and 1 after the introduction of the intervention condition. These specifications yield the following special case of the linear Gaussian SSM, in which Λ = 1 and ε(t) = 0; the state process is treated as observed (i.e., no free parameter is used in scaling the Gaussian state process and there is no measurement error), thus these terms drop from the measurement submodel:

(2) Measurement submodel: y(t) = η(t) + Δu(t)
    Dynamic submodel: η(t) = βη(t-1) + ζ(t)

Similarly, α and Γu(t) are dropped from the dynamic submodel because the model is centered so that the mean is 0; thus α = 0, and u(t), which is 0 for the baseline condition and 1 for the experimental condition, enters only through the measurement submodel.
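Readers who want to experiment with this simplest case can generate data like those in Figure 1 themselves. The following minimal sketch (in Python, using only numpy) simulates the model in (2) under hypothetical parameter values chosen to match the estimates reported below (β = 0.6, Δ = 0.9, error variance 0.9); all variable names are ours, and the sketch is illustrative only – it is not the program used to produce Figure 1.

import numpy as np

rng = np.random.default_rng(1)

T1, T2 = 50, 100            # end of baseline, end of experimental condition
beta, delta = 0.6, 0.9      # hypothetical autoregression and intervention effect
zeta_var = 0.9              # variance of the process noise zeta(t)

u = np.zeros(T2)
u[T1:] = 1.0                # input: 0 during baseline, 1 during the intervention

eta = np.zeros(T2)          # latent state process eta(t)
y = np.zeros(T2)            # observed behavior process y(t)
for t in range(T2):
    prev = eta[t - 1] if t > 0 else 0.0
    eta[t] = beta * prev + rng.normal(0.0, np.sqrt(zeta_var))   # dynamic submodel (2)
    y[t] = eta[t] + delta * u[t]                                # measurement submodel (2)

# A rough check on the intervention effect: difference between condition means.
print("baseline mean:     ", round(float(y[:T1].mean()), 2))
print("intervention mean: ", round(float(y[T1:].mean()), 2))

Running the sketch and plotting y against u reproduces the qualitative pattern of Figure 1: a noisy, autocorrelated series whose level shifts upward once the input switches on.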
Figure 1 illustrates these simulated data. The continuous line depicts y(t) (our behavioral outcome), and the broken line depicts u(t) (our input, which in our example corresponds to pre- and post-intervention). The obtained estimates of the parameters in (2) (designated with ^ above each parameter) are Δ̂ = 0.9 and β̂ = 0.6, and the estimated variance of the error process ζ(t) is 0.9 (Δ̂ is the estimate of the effect of the experimental manipulation on the outcome). All parameter estimates are significant at nominal alpha = 0.01 (details not presented). This implies that Δ̂ = 0.9 is a significant effect of the experimental intervention condition, and that β̂ = 0.6 is a significant effect of η at the previous time point t-1; in other words, there is a significant relation between the latent state at time t-1 and the latent state at time t.

Figure 1 placement approximately here

For example, if the state being targeted was aggressive behavior, as measured by instances of hitting, kicking, and biting, the significant effect of η at the previous time point t-1 would indicate that the level of aggressive behavior at time point t-1 has a significant relation to the level of aggressive behavior at time point t. An example of data from applied behavior analysis where this model might be applied is in Kuhn, Hardesty, and Luczynski (2009). They conducted a functional analysis of antecedent social events and their effect on motivation to access preferred edible items by an individual with severe problem behavior. Manipulating antecedent social events by having the therapist consume food items (inputs, u(t)) changed the participant's motivation (as measured by an increase or decrease in socially inappropriate responding), a latent state. Although a sequential relation between motivation at time t-1 and time t was not explored in the authors' analysis, it is possible that analysis of these data using the linear Gaussian SSM would identify any such dependencies, assuming sufficient occasions of data (a total of 100 occasions or more) were collected to perform the analysis. This first example involves what is perhaps the simplest possible instance of a linear Gaussian SSM. The linear Gaussian SSM can be straightforwardly extended to continuous time (Simon, 2006).

Implementation. These analyses can be implemented in the Fortran computer program MKFM6, which uses an expectation-maximization (EM) algorithm to fit linear Gaussian SSMs and which, together with documentation, can be freely downloaded from the website http://users.fmg.uva.nl/cdolan/. For these models the EM algorithm is computationally more efficient than direct numerical maximization of the likelihood. This computer program also can estimate multivariate Gaussian time series obtained with multiple subjects in a replicated time series design.
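MKFM6 itself is a Fortran program, but the state-estimation step that underlies EM fitting of models such as (1A)-(1B) – the Kalman filter – can be sketched in a few lines. The fragment below is a minimal univariate illustration in Python with numpy; it assumes known parameter values, and it is not the MKFM6 implementation (the function name, simulated data, and parameter values are hypothetical).

import numpy as np

def kalman_filter(y, u, beta, delta, q, r, eta0=0.0, p0=1.0):
    """One-step Kalman filter for the univariate model
       y(t) = eta(t) + delta*u(t) + eps(t),  eta(t) = beta*eta(t-1) + zeta(t),
       with var(zeta) = q and var(eps) = r. Returns filtered state estimates."""
    eta_hat, p = eta0, p0
    filtered = []
    for yt, ut in zip(y, u):
        # predict the state forward one step
        eta_pred = beta * eta_hat
        p_pred = beta * p * beta + q
        # update the prediction using the new observation
        innovation = yt - (eta_pred + delta * ut)
        s = p_pred + r                  # innovation variance
        k = p_pred / s                  # Kalman gain
        eta_hat = eta_pred + k * innovation
        p = (1.0 - k) * p_pred
        filtered.append(eta_hat)
    return np.array(filtered)

# hypothetical data, generated in the same spirit as the previous sketch
rng = np.random.default_rng(2)
u = np.r_[np.zeros(50), np.ones(50)]
eta = np.zeros(100); y = np.zeros(100)
for t in range(100):
    eta[t] = 0.6 * (eta[t - 1] if t else 0.0) + rng.normal(0, 0.95)
    y[t] = eta[t] + 0.9 * u[t] + rng.normal(0, 0.3)

print(kalman_filter(y, u, beta=0.6, delta=0.9, q=0.9, r=0.09)[:5].round(2))

In an EM fit, a filtering/smoothing pass of this kind supplies the expected states, after which the parameters are updated and the two steps alternate until convergence.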
The Hidden Markov Model

A hidden Markov model (HMM) is similar in many ways to the linear Gaussian SSM. One similarity is that the HMM is used when the outcome, y(t), is known but the state is unknown, or "hidden" (i.e., latent, as in the linear Gaussian SSM example above); by contrast, a plain Markov model is one in which the state can be determined simply by the value of y(t). Recalling that in a behavioral analysis y(t) is the behavioral outcome at a given point in time (e.g., one of the data points in Figure 1), the experimenter is interested in the organism's state so as to predict the next behavioral event y(t+1), or the next state.

The HMM is used to estimate the probability of transition from one state to another, given the observed data. An HMM could be used to analyze the transitions between different categories of behavior; for example, observational data on problem behavior (e.g., aggression, self-injurious behavior, and disruption) could be gathered over time. These data then could be analyzed using an HMM, yielding the probabilities of transition from one state (category of problem behavior) to another. The resulting probabilities could be used to tailor a behavioral intervention more precisely. For example, if the target behavior was self-injurious behavior, and an HMM revealed that the probability of transition from aggression to self-injurious behavior was much higher than the probability of transition from aggression to disruption, intervention could be targeted more precisely before the occurrence of self-injurious behavior.

The defining equations for the HMM are similar to (1A)-(1B) defining the linear Gaussian SSM. For the HMM, y(t) is a categorical process with p observed response categories; the latent state process η(t) also is a categorical process, with q categories – the states. The standard HMM can be defined as:

(3A) Measurement submodel: y(t) = Λη(t) + ε(t)
(3B) Dynamic submodel: η(t) = βη(t-1) + ζ(t)

Here Λ contains the probabilities that y(t) falls in a certain category, given the value of η(t), and β contains the conditional probabilities that η(t) falls in a certain category, given the value of η(t-1). The measurement error process ε(t) and the process noise ζ(t) are categorical processes with a priori fixed properties (Elliott, Aggoun, & Moore, 1995).

Example. Until now, applications of HMMs to the sequential analysis of behavior processes have been rare. Notable exceptions are Courville and Touretzky (2002) and Visser, Raijmakers, and Molenaar (2007). We will discuss Visser, Raijmakers, and Molenaar's (2002) application of the HMM to implicit learning, or learning without awareness, of a rule-governed sequence, using simulated data to illustrate. The data were simulated according to a state process η(t) with 2 categories (q = 2) representing two states, an "unlearned" (guessing) state and a "learned" state. The observed process y(t) was simulated with 3 categories (p = 3), representing three possible responses – that is, three different buttons that participants could press in sequence during the task, reflecting the rule governing the observed sequence. In this instance, the probabilities for η(t) given the values of η(t-1), contained in the matrix β, are shown in Table 1. Thus, when the previous state is the unlearned state (η(t-1) = 1), the probability that the current state also is the unlearned state is 0.9; when the previous state is the learned state (η(t-1) = 2), the probability that the current state is the unlearned state is 0.3.

Table 1 placement approximately here

We can understand similarly the elements of the matrix Λ as shown in Table 2. When the present state is the unlearned state (η(t) = 1), the probability that the response falls in Category 1 (continued guessing) is 0.7; when the present state is the learned state (η(t) = 2), the probability of a Category 1 response is 0; when the present state is the unlearned state (η(t) = 1), the probability of a Category 2 response is 0; and when the present state is the unlearned state (η(t) = 1), the probability of a Category 3 response is 0.3.

Table 2 placement approximately here
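To make the transition and response probabilities in Tables 1 and 2 concrete, the following sketch simulates a sequence from this two-state, three-response HMM and computes the filtered probability of being in the learned state (the so-called forward recursion). It is a toy illustration in Python using only numpy, not the code used by Visser et al. or the depmixS4 package; the sequence length and random seed are arbitrary.

import numpy as np

rng = np.random.default_rng(3)

# beta: P(eta(t)=j | eta(t-1)=i); Lambda: P(y(t)=k | eta(t)=i)  (Tables 1 and 2)
beta_mat = np.array([[0.9, 0.1],          # transitions from the unlearned state
                     [0.3, 0.7]])         # transitions from the learned state
Lambda = np.array([[0.7, 0.0, 0.3],       # response probabilities in the unlearned state
                   [0.0, 0.4, 0.6]])      # response probabilities in the learned state

# simulate a sequence of hidden states and observed responses
T, state = 200, 0
states, obs = [], []
for _ in range(T):
    state = rng.choice(2, p=beta_mat[state])
    states.append(state)
    obs.append(rng.choice(3, p=Lambda[state]))

# forward (filtering) recursion: P(eta(t)=i | y(1), ..., y(t))
alpha = np.array([0.5, 0.5])
p_learned = []
for k in obs:
    alpha = (alpha @ beta_mat) * Lambda[:, k]
    alpha /= alpha.sum()
    p_learned.append(alpha[1])

print("proportion of time truly in learned state:", round(float(np.mean(states)), 2))
print("mean filtered P(learned):                 ", round(float(np.mean(p_learned)), 2))

The forward recursion used here is the E-step building block of the Baum-Welch algorithm mentioned below; full estimation additionally requires a backward pass and parameter updates.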
Figure 2 depicts the observed process y(t) and the latent state process η(t); the category values of the latent state process have been increased by 4 units for clarity. The estimates of the conditional probabilities specified in Tables 1 and 2 are very close to their true values (see Visser et al., 2002, for further details). As can be observed in the plot of the data, when the state is in the learned category (a value of 6 on the y-axis), the response emitted is more likely to be in Category 2 or 3.

Figure 2 placement approximately here

An HMM could be used to analyze the probability of transition among different categories of problem behavior in an individual with mental retardation, as in Kuhn et al. (2009). Observational data on stereotypic movements, self-injurious behavior, and disruptive behavior could be collected over time. If the target behavior is disruptive behavior, an HMM may reveal, for example, that the probability of transitioning from stereotypic movement to disruptive behavior is lower than the probability of transitioning from self-injurious behavior to disruptive behavior. An intervention could then be designed accordingly, and targeted more precisely.

Parameter estimation. Parameter estimation in HMMs and hierarchical hidden Markov models (HHMMs) can be accomplished by means of the EM algorithm mentioned in the linear Gaussian SSM section. This algorithm, when used for HMMs and HHMMs, is known as the Baum-Welch (forward-backward) algorithm. It has been implemented in the R software package depmixS4, which can be freely downloaded from the website http://users.fmg.uva.nl/ivisser/. Visser and Speekenbrink (2010) provide illustrative examples and instruction. Fraser (2008) presents a transparent derivation of this algorithm.

The Generalized Linear Dynamic Model

Another instance of the SSM is the generalized linear dynamic model (GLDM). In the GLDM the manifest process y(t) is categorical (as in the standard HMM) and the latent state process η(t) is linear (as in the linear Gaussian SSM). This model can be conceived of as a dynamic generalization of loglinear analysis, and it serves as an alternative to the HMM. The GLDM is defined as:

(5A) y(t) = h[η(t)] + ε(t)
(5B) η(t) = α + βη(t-1) + Γu(t) + ζ(t)

These formulas are similar to the formulas in the HMM section and the linear Gaussian SSM section. In the GLDM, h[η(t)] is a nonlinear function of η(t), the so-called response function (Fahrmeir & Tutz, 2001). The matrix β again contains regression coefficients linking the prior state process η(t-1) to the current state process η(t). The matrix Γ again also contains regression coefficients, quantifying the effect of the input u(t). The process noise ζ(t) again represents the lack of predictability of η(t) given η(t-1). The vector α contains mean levels (intercepts).

Example. An example of data from the behavior analysis literature that could be analyzed using this approach is found in Cunningham (1979). The prediction of extinction or conditioned suppression of a conditioned emotional response y(t) as a function of sobriety η(t) could be assessed. Cunningham trained a conditioned emotional response in sober rats. Extinction was then conducted under high doses of alcohol, and conditioned suppression returned when the rats were again sober. The GLDM could be used to determine at what level of sobriety (e.g., at which dose of alcohol) conditioned suppression returns.

Parameter estimation. Parameter estimation in the GLDM is accomplished by means of the EM algorithm described in the linear Gaussian SSM section. The nonlinear response function in (5A) necessitates a special approach to the estimation of η(t) (Fahrmeir & Tutz, 2001). A beta version of the computer program concerned can be obtained from the first author.
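The role of the response function in (5A) can be illustrated with a small simulation in which a linear, autoregressive latent state drives the probabilities of three response categories through a softmax (multinomial logistic) response function. The sketch below (Python with numpy) is only a schematic illustration of the GLDM's structure under hypothetical values; it is not the estimation program mentioned above, and the choice of a softmax response function is one convenient possibility, not the only one.

import numpy as np

rng = np.random.default_rng(4)

T = 100
beta, gamma = 0.8, 1.0                               # autoregression and input effect in (5B)
u = np.r_[np.zeros(T // 2), np.ones(T - T // 2)]     # e.g., a dose introduced halfway through

# dynamic submodel (5B): eta(t) = beta*eta(t-1) + gamma*u(t) + zeta(t)
eta = np.zeros(T)
for t in range(T):
    eta[t] = beta * (eta[t - 1] if t else 0.0) + gamma * u[t] + rng.normal(0, 0.3)

# response function h[eta(t)]: softmax over 3 response categories,
# with hypothetical category-specific weights on the latent state
weights = np.array([-1.0, 0.0, 1.0])
def h(eta_t):
    z = np.exp(weights * eta_t)
    return z / z.sum()

y = np.array([rng.choice(3, p=h(e)) for e in eta])   # categorical manifest process y(t)

print("category frequencies, first half: ", np.bincount(y[:T // 2], minlength=3))
print("category frequencies, second half:", np.bincount(y[T // 2:], minlength=3))

As the latent state drifts upward under the input, the simulated response distribution shifts from the first toward the third category, which is the kind of gradual categorical change the GLDM is designed to capture.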
The Linear Gaussian SSM with Time-Varying Parameters

The state space models considered in the previous sections all have parameters that are constant – consistent both in level and pattern – over time. Yet learning and developmental processes often have changing statistical characteristics that can only be captured accurately by allowing model parameters to be time-varying. For example, erratic environments with inconsistent contingencies have been associated with the development of attachment problems in children (Belsky, Garduque, & Hrncir, 1984). There exist a number of model types for the sequential analysis of processes with changing statistical characteristics (so-called nonstationary processes), several of which are described in Priestley (1988) and Tong (1990). Here we focus on a direct generalization of the linear Gaussian SSM, allowing its parameters to be time-varying in arbitrary ways. This approach is based on related work by Young, McKenna, and Bruun (2001) and Priestley (1988).

For this generalization we introduce a new vector into our formulas, θ(t), which contains all unknown parameters. Notice that θ(t) is time-varying, as indicated by its dependence on t. The individual parameters in θ(t) may, or may not, be time-varying; it is one of the tasks of the estimation algorithm to determine which individual parameters in θ(t) are time-varying and, if so, how they vary in time. With this specification, the linear Gaussian SSM with time-varying parameters is obtained as the following generalization of (1A)-(1B):

(6A) y(t) = μ[θ(t)] + Λ[θ(t)]η(t) + Δ[θ(t)]u(t) + ε(t)
(6B) η(t) = α[θ(t)] + β[θ(t)]η(t-1) + Γ[θ(t)]u(t) + ζ(t)
(6C) θ(t) = θ(t-1) + ξ(t)

The interpretation of the parameter matrices in (6A)-(6B) is the same as for (1A)-(1B), but now each parameter matrix or vector depends upon elements of the vector θ(t) in which all unknown parameters are collected. For instance, Λ[θ(t)] specifies that the unknown parameters in the matrix Λ are elements of the vector θ(t) of time-varying parameters. (6C) is a new addition, tracking the possibly time-varying behavior of the unknown parameters. The process ξ(t) represents the lack of predictability of θ(t). If an element of ξ(t) has zero (or small) variance, then the corresponding parameter in θ(t) is (almost) constant in time, whereas this parameter will be time-varying if the variance of this element of ξ(t) is significantly different from zero.
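A small simulation may help convey what (6C) implies in practice: one regression weight drifts as a random walk while another remains constant, so an estimation algorithm tracking θ(t) should flag only the former as time-varying. The sketch below (Python with numpy, hypothetical values throughout) only generates such data; it does not implement the EM machinery used to fit the model.

import numpy as np

rng = np.random.default_rng(5)

T = 80                          # e.g., 80 interaction episodes
beta1 = np.zeros(T)             # time-varying autoregressive weight, a random walk as in (6C)
beta2 = 0.3                     # a second weight that stays constant over time
beta1[0] = 0.8
for t in range(1, T):
    beta1[t] = beta1[t - 1] + rng.normal(0, 0.03)   # xi(t) with small but nonzero variance

x = np.zeros(T)                 # a second (exogenous) predictor series
eta = np.zeros(T)               # latent process generated with the drifting weight
for t in range(1, T):
    x[t] = rng.normal()
    eta[t] = beta1[t] * eta[t - 1] + beta2 * x[t] + rng.normal(0, 0.5)

print("beta1 at start / middle / end:", beta1[[0, T // 2, -1]].round(2))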
Example. One example of the application of this model is based on data gathered from a biological son and his father. This application could allow an intervention to be planned and implemented to reduce escalating negative interactions and thereby improve the long-term relationship between father and son. These data were presented by Molenaar and Campbell (2009), in what appears to be the first application of this model to psychological process data. The data concern a 3-variate time series of repeated measures of the emotional experiences of biological sons with their biological fathers during 80 consecutive interaction episodes over a period of about 2 months. The self-report measures collected at the conclusion of each episode were Involvement, Anger, and Anxiety. Here we focus on a single biological son, whose data are depicted in Figure 3.

Figure 3 placement approximately here

The following instance of (6A)-(6C) was used to analyze the data in Figure 3:

(7A) y(t) = η(t)
(7B) η(t) = β[θ(t-1)]η(t-1) + ζ(t)
(7C) θ(t) = θ(t-1) + ξ(t)

The matrix β[θ(t)] contains the possibly time-varying regression coefficients linking η(t) to η(t-1). Here we only consider the part of the model explaining the Involvement process. This part can be represented as:

(8) Inv(t) = β1(t-1)*Inv(t-1) + β2(t-1)*Ang(t-1) + β3(t-1)*Anx(t-1) + ζ(t)

where Inv = Involvement, Ang = Anger, and Anx = Anxiety. Thus, Involvement at time t is a function of Involvement, Anger, and Anxiety at the previous time point (t-1). The β coefficients in (8) are themselves time-varying, indicating that the relations among involvement, anger, and anxiety can change over time; each coefficient quantifies the relation between a variable at time t-1 and Involvement at time t.

Figure 4 shows the estimates of β1(t), β2(t), and β3(t) across the 80 interaction episodes, and illustrates the intertwined nature of involvement, anger, and anxiety. β1(t), which quantifies the effect of involvement at time t-1 on involvement at time t, decreases across the initial half of the interaction episodes, after which it stabilizes during the final half of the interaction episodes. β3(t) in (8), which quantifies the effect of anxiety at the previous interaction episode on Involvement at the next interaction episode, increases across the initial half of the interaction episodes, after which it stabilizes during the final half of the interaction episodes. It is noteworthy that β3(t) is negative during the initial interaction episodes, but is positive during the later interaction episodes. When β3(t) is negative, there is a negative relation between anxiety at time t-1 and involvement at time t; i.e., increased anxiety predicts decreased involvement at the subsequent interaction, and decreased anxiety predicts increased involvement. When β3(t) is positive, there is a positive relation between anxiety at time t-1 and involvement at time t; i.e., increased anxiety predicts increased involvement at the subsequent interaction, and decreased anxiety predicts decreased involvement. When β3(t) is zero, anxiety at time t-1 has no effect on involvement at the subsequent interaction. Additional procedural details and data can be found in Molenaar, Sinclair, Rovine, Ram, and Corneal (2009).

Figure 4 placement approximately here

An example of data from the behavior analysis literature that could be analyzed using this approach can be found in McSweeney, Murphy, and Kowal (2004). The authors examine habituation as a function of the duration of reinforcers. Rate of responding y(t) could be modeled as a function of habituation η(t), with the carry-over from η(t-1) to η(t) – the coefficient β[θ(t-1)] – allowed to vary over time, so that changes in habituation could be assessed more precisely.

Parameter estimation. Parameter estimation in (6A)-(6C) is accomplished by the EM algorithm. The E-step (expectation step) requires considerable reformulation of the original model, resulting in a nonlinear analogue. A beta version of the Fortran program with which the linear Gaussian SSM with time-varying parameters is fitted to the data can be obtained from the first author.

SSMs in Continuous Time

The class of SSMs in continuous time is large, including (non)linear stochastic differential equations, Fokker-Planck equations, etc., and therefore cannot be reviewed within the confines of this chapter.
One application of SSMs in continuous time, the point process, could be applied to data such as those discussed by Nevin and Baum (1980). Point processes can be used to model the timing of a repeated discrete response. Nevin and Baum discuss probabilities of the termination and initiation of a responding burst in free-operant behavior. An assumption of one model they discuss is that, during a burst, interresponse times are approximately constant. A point process model could test this assumption, as well as whether the transition from responding to termination is a function of that timing. Rate-of-responding and rate-of-reinforcement data could be collected while training pigeons. These data could be analyzed using a point process model, and the results would indicate whether interresponse times are constant or variable during a burst, and whether the transition from responding to termination is a function of constant or variable interresponse times. For excellent reviews of SSMs in continuous time see, for instance, Gardiner (2004) and Risken (1984). The linear Gaussian SSMs considered here are special in that their analogues in continuous time share most of their statistical properties. Molenaar and Newell (2003) applied nonlinear stochastic differential equations to analyze bifurcations (sudden qualitative reorganizations) in human motion dynamics. Haccou and Meelis (1992) give a comprehensive treatment of continuous-time Markov chain models for the analysis of ethological processes.

Remarks on Optimal Control of Behavior Processes

The concepts of homeostasis and control have long played a role in behavior theory (McFarland, 1971; Carver, 2004; Johnson, Chang, & Lord, 2006). This role, however, has been confined to the theoretical level. If one of the SSMs considered earlier has been fitted to a behavioral process, then one can use this model to optimally control the process as it unfolds, by means of powerful mathematical techniques. Here we present optimal control as a computational technique to steer a behavioral process to desired levels. Fitting such a model to a complex behavioral process, such as the development of aggressive behavior, allows the precise estimation of parameters that can then be used to ameliorate the problem behavior efficiently.

To introduce the concept of computational feedback, consider the following simple behavior process model in discrete time:

(10) y(t+1) = βy(t) + λu(t) + ζ(t+1)

Here y(t) denotes a manifest univariate behavior process, u(t) is a univariate external input, ζ(t) is process noise, and β and λ are regression coefficients. From an intervention perspective, the desired level of the behavioral process is y*, and achieving this level is accomplished by experimentally manipulating u(t) (e.g., medication, reinforcement contingency, etc.). Given (10) and y*, the goal of the computational feedback problem is to set the value of u(t) (i.e., the level of the independent variable) in a way that minimizes the difference between the expected value of the behavior and y*. To do this requires an estimate of the expected value of the behavior at the next time point, given the extant record of that behavior up to and including time t: E[y(t+1|t)]. The deviation between expected and desired behavior is quantified by a cost function. A simple cost function is C(t) = (E[y(t+1|t)] – y*)². Again, the external input u(t) will be chosen in a way that minimizes C(t) for all times t.
Because E[y(t+1|t)] = βy(t) + λu(t), the optimal level of the independent variable, u*(t), is given by:

(11) u*(t) = [y* – βy(t)]/λ

(11) is a feedback function, in which the optimal input u*(t), which will co-determine the value y(t+1), depends upon y(t). Thus the optimal input at time t, chosen to steer the behavior y at time t+1 toward the desired value y*, depends upon the value of y at time t.

Example. Figure 5 depicts a manifest behavioral process y(t), t = 1, …, 50 (labeled Y). This y(t) is uncontrolled (i.e., not subject to any experimental manipulation) and was simulated according to (10) with β = 0.7 and λ = 0.9. The goal of our intervention will be to achieve a desired level of behavior, which we have set at zero: y* = 0. Application of (11) yields the u*(t) values shown in the UO function in Figure 5. Application of these optimal values of the independent variable, u*(t), in (10) yields the optimally controlled behavioral process labeled YO in Figure 5. It is evident from Figure 5 that the deviation of the optimally controlled process YO from y* = 0 is much smaller than the analogous deviation of the behavioral process without control, Y.

Figure 5 placement approximately here

This example of optimal feedback control is simple in a number of respects. The behavioral process to be controlled is univariate (a single dependent variable), as is the input; in actuality, both often are multivariate. For example, escalating aggression may be measured by the frequency and co-occurrence of several behaviors, such as hair pulling, slapping, hitting, kicking, and scratching. Also, (10) assumes a known linear model. In reality, the model can be any one of the SSMs considered previously, with parameters that are unknown and have to be estimated. In our simple example, the cost function is a quadratic function that does not penalize the costs of exercising control actions. In general, both the deviations of the behavioral process from the desired level and the external input (e.g., medication, reinforcement contingency, etc.) are penalized according to separate cost functions, which can have much more intricate forms. This example, however, serves to illustrate that control of behavior can be quantified and estimated, thus allowing for much more precise and efficient manipulation of experimental variables designed to modify behavior.

An example from the behavior-analytic literature where this analysis might be applied is an extension of the Kuhn et al. (2009) study, mentioned earlier in this chapter. Once the evaluation of antecedent events preceding problem behavior had been completed, and those events which led to reduction of the target behavior identified, implementation of the treatment plan could be assessed using an optimal feedback control model. In general, well-developed mathematical theories exist for optimal feedback control in each of the types of SSMs considered here. Kwon and Han (2005) present an in-depth description of optimal control in linear SSMs (see also Molenaar, 2010); Elliott, Aggoun, and Moore (1995) is the classic source for optimal control in HMMs; and Costa, Fragoso, and Marques (2005) discuss optimal control in SSMs with regime shifting (i.e., systems models for processes undergoing sudden changes in their dynamics).
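The example in Figure 5 is easy to reproduce. The sketch below (Python with numpy) simulates the uncontrolled process (10) with β = 0.7 and λ = 0.9 and then applies the feedback rule (11) with y* = 0; the noise seed is arbitrary and the variable names are ours, so the exact numbers will differ from the figure, but the reduction in deviation from y* should be evident.

import numpy as np

rng = np.random.default_rng(6)

T, beta, lam, y_star = 50, 0.7, 0.9, 0.0
noise = rng.normal(0, 1.0, T)           # the same process noise drives both runs

# uncontrolled process: u(t) = 0 throughout
y_unc = np.zeros(T)
for t in range(T - 1):
    y_unc[t + 1] = beta * y_unc[t] + noise[t + 1]

# optimally controlled process, using the feedback rule (11)
y_ctrl, u_opt = np.zeros(T), np.zeros(T)
for t in range(T - 1):
    u_opt[t] = (y_star - beta * y_ctrl[t]) / lam          # u*(t) from (11)
    y_ctrl[t + 1] = beta * y_ctrl[t] + lam * u_opt[t] + noise[t + 1]

print("mean squared deviation from y*, uncontrolled:", round(float(np.mean((y_unc - y_star) ** 2)), 2))
print("mean squared deviation from y*, controlled:  ", round(float(np.mean((y_ctrl - y_star) ** 2)), 2))

Feeding the same noise sequence to both runs isolates the effect of the control action: under the feedback rule, all that remains in the controlled series is the unpredictable process noise, whereas the uncontrolled series also accumulates its own autoregressive carry-over.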
Conclusion Because the class of dynamic systems models is too large to comprehensively cover within the confines of this chapter, we have reviewed several simple examples of this class of models: the linear Gaussian SSM, the hidden Markov model, the hierarchical hidden Markov model, the generalized linear dynamic model, the linear Gaussian SSM with time varying parameters, and SSMs in continuous time. Notable omissions from our review are regime shifts (Costa et al., 2005), modeling processes having long-range sequential dependencies (Palma, 2007), and chaotic processes having dynamics evolving at multiple time scales (Gao, Cao, Tung & Hu, 2007). These types of process models are of interest for sequential behavior analysis New Methods for Sequential Behavioral Analysis 22 (Torre & Wagenmakers, 2009) and again can be formulated as special instances of SSMs; however, their discussion requires consideration of much more technical concepts and for that reason have been omitted from this review. We therefore refer to the excellent references given above for further details about these model types. These new methods for the sequential analysis of behavior processes reviewed here can be conceived of as special cases of a general SSM, a model that allows for the quantification of complex systems that are comprised of inputs, feedback, outputs and states. Several instances of this general SSM were presented, however, the SSM covers a much broader range of dynamic models. For instance, there exists a close correspondence between artificial neural network models and state space modeling (Haykin, 2001). This correspondence opens up the possibility to reformulate artificial neural network models as specific instances of nonlinear SSMs. For instance, the neural network models considered by Schmajuk (2002; 1997) are variants of adaptive resonance theory networks that, in their most general form, constitute coupled systems of nonlinear differential equations (Raijmakers & Molenaar, 1996) obeying the general nonlinear state space model format. Similar remarks apply to reinforcement networks (Wyatt, 2003). It is our hope that this presentation of the class of models known as SSMs has illustrated how they can be useful in behavior analysis – specifically in modeling behavior and behavior change as a process, and to include all aspects of the behavior change process: the state of the organism, the inputs and outputs occurring during the process, and the feedback that occurs as the process continues. Use of these models can enhance the further development and growth of the field. New Methods for Sequential Behavioral Analysis 23 References Aoki, M. (2002). Modeling aggregate behavior and fluctuations in economics: Stochastic views of interacting agents. Cambridge: Cambridge University Press. Belsky, J. Garduque, L., & Hrncir, E. (1984). Assessing performance, competence, and executive capacity in infant play: Relations to home environment and security of attachment. Developmental Psychology, 20, 406 – 417. Browne, M. W. & Nesselroade, J. R. (2005). Representing psychological processes with dynamic factor models: Some promising uses and extensions of ARMA time series models. In: A. Maydeu-Olivares & J.J. McArdle (Eds.), Psychometrics: A festschrift to Roderick P. McDonald. Mahwah, NJ: Lawrence Erlbaum Associates, 415-452. Carver, S. C. (2004). Self-regulation of action and affect. In: R. F. Baumeister & K. D. Vohs (Eds.), Handbook of self-regulation. New York: The Guilford Press, 13-39. Costa, O. L. 
V., Fragoso, M. D., & Marques, R. P. (2005). Discrete-time Markov jump linear systems. London: Springer-Verlag. Courville, A. C., & Touretzky, D. S. (2002). Modeling temporal structure in classical conditioning. In: T. J. Dietterich, S. Becker, & Z. Ghahramani (Eds.), Advances in neural information processing systems, Vol. 14. Cambridge, MA: MIT Press, 3-10. Cunningham, C. L. (1979). Alcohol as a cue for extinction: State dependency produced by conditioned inhibition. Animal Learning & Behavior, 7,45 – 52. Elliott, R. J., Aggoun, L., & Moore, J. B. (1995). Hidden Markov models: Estimation and control. New York: Springer-Verlag. Fahrmeir, L., & Tutz, G. (2001). Multivariate statistical modeling based on generalized linear models. 2nd edition. Berlin: Springer-Verlag. New Methods for Sequential Behavioral Analysis 24 Ford, D. H., & Lerner, R. M. (1992). Developmental systems theory: An integrative approach. Newbury Park, CA: Sage Publications. Gao, J., Cao, Y., Tung, W. W., & Hu, J. (2007). Multiscale analysis of complex time series: Integration of chaos and random fractal theory, and beyond. Hoboken, NJ: Wiley. Gardiner, C. W. (2004). Handbook of stochastic methods: For physics, chemistry and the natural sciences. 3rd edition. Berlin: Springer-Verlag. Glass, G., Willson, V., & Gottman, J. M. (1975). Design and analysis of time-series experiments. Boulder, Co: Colorado University Press. Gottman, J. M., McFall, R. M., & Barnett, J. T. (1969). Design and analysis of research using time-series. Psychological Bulletin, 72, 299 – 306. Haccou, P., & Meelis, E. (1992). Statistical analysis of behavioural data: An approach based on time-structured models. Oxford: Oxford University Press. Hamaker, E. L., Dolan, C. V., & Molenaar, P. C. M. (2005). Statistical modeling of the individual: Rationale and application of multivariate stationary time series analysis. Multivariate Behavioral Research, 40, 207-233. Hamaker, E. L., Nesselroade, J. R., & Molenaar, P. C. M. (2007). The integrated state-space model. Journal of Research in Personality, 41, 295-315. Hannan, M. T. (1991). Aggregation and disaggregation in the social sciences. Lexington: Mass., Lexington Books. Haykin, S. (2001). Kalman filtering and neural networks. New York: Wiley. Howell, D. C. (2007). Statistical methods for psychology. Belmont: California, Thomson Wadsworth. Johnson, R. E., Chang, C. H., & Lord, R. G. (2006). Moving from cognition to behavior: What the research says. Psychological Bulletin, 132, 381-415. Kwon, W. H., & Han, S. (2005). Receding horizon control. London: Springer-Verlag. New Methods for Sequential Behavioral Analysis 25 Kuhn, D. E., Hardesty, S. L., & Luczynski, K. (2009). Further evaluation of antecedent social events during a functional analysis. Journal of Applied Behavior Analysis, 42, 349 – 353. McFarland, D. (1971). Feedback mechanisms in animal behavior. London: Academic Press. McSweeney, F. K., Murphy, E. S., & Kowal, B. P. (2004). Varying reinforcer duration produces behavioral interactions during multiple schedules. Behavioural Processes, 66, 83 – 100. Merrill, M. (1931). The relationship of individual growth to average growth. Human Biology, 3, 37-70. Molenaar, P. C. M. (2004). A manifesto on psychology as idiographic science: Bringing the person back into scientific psychology, this time forever. Measurement, 2, 201-218. Molenaar, P. C. M. (2007). On the implications of the classical ergodic theorems: Analysis of developmental processes has to focus on intra-individual variation. 
Developmental Psychobiology, 50, 60-69. Molenaar, P. C. M. (2010). Note on optimization of individual psychotherapeutic processes. Journal of Mathematical Psychology (to appear). Molenaar, P. C. M., & Campbell, C. G. (2009). The new person-specific paradigm in psychology. Current Directions in Psychology, 18, 112-117. Molenaar, P. C. M., & Newell, K. M. (2003). Direct fit of a theoretical model of phase transition in oscillatory finger motions. British Journal Mathematical and Statistical Psychology, 56, 199-214. Molenaar, P. C. M., Sinclair, K. O., Rovine, M. J., Ram, N., & Corneal, S. E. (2009). Analyzing developmental processes on an individual level using non-stationary time series modeling. Developmental Psychology, 45, 260-271. New Methods for Sequential Behavioral Analysis 26 Moore, J. W. (Ed.) (2002). A neuroscientist’s guide to classical conditioning. New York: Springer-Verlag. Nevin, J. A., & Baum, W. M. (1980). Feedback functions for variable-interval reinforcement. Journal of the Experimental Analysis of Behavior, 34, 207 – 217. Palma, W. (2007). Long-memory time series: Theory and methods. Hoboken, NJ: Wiley. Priestley, M. B. (1988). Non-linear and non-stationary time series analysis. London: Academic Press. Raijmakers, M. E. J., & Molenaar, P. C. M. (1996). Exact ART: A complete implementation of an ART network, including all regulatory and logical functions, as a system of differential equations capable of stand-alone running in real time. Neural Networks, 10, 649-669. Risken, H. (1984). The Fokker-Planck equation: Methods of solution and applications. Berlin: Springer-Verlag. Schmajuk, N. (2002). Latent inhibition and its neural substrates. Norwell, Mass.: Kluwer Academic Publishers. Schmajuk, N. A. (1997). Animal learning and cognition: A neural network approach. Cambridge: Cambridge University Press. Sharpe, T., & Koperwas, J. (2003). Behavior and sequential analyses: Principles and practice. Thousand Oaks, CA: Sage Publications. Sidman, M. (1960). Tactics of scientific research. Oxford: Basic Books. Simon, D. (2006). Optimal state estimation: Kalman, H∞, and nonlinear approaches. Hoboken, NJ: Wiley. Torre, K., & Wagenmakers, E. J. (2009). Theories and models for 1/f noise in human movement science. Human Movement Science, 28, 297-318. New Methods for Sequential Behavioral Analysis 27 Van Rijn, P. W., & Molenaar, P. C. M. (2005). Logistic models for single-subject time series. In: L. A. van der Ark, M. A. Croon, & K. Sijtsma (Eds.), New developments in categorical data analysis for the social and behavioral sciences. Mahwah, NJ: Lawrence Erlbaum Associates, 125-145. Visser, I., Raijmakers, M. E. J., & Molenaar, P. C. M. (2002). Fitting hidden Markov models to psychological data. Scientific Programming, 10, 185-199. Visser, I., Raijmakers, M. E. J., & Molenaar, P. C. M. (2007). Characterizing sequence knowledge using on-line measures and hidden Markov models. Memory & Cognition, 35, 1502-1517. Visser, I. & Speekenbrink, M. (2010). depmixS4: An R-package for hidden Markov models. Journal of Statistical Software , 36(7):1–21. Wyatt, J. (2003). Reinforcement learning: A brief overview. In: R. Kühn, R. Menzel, W. Menzel, U. Ratsch, M. M. Richter, & I. O. Stamatescu (Eds.), Adaptivity and learning: An interdisciplinary debate. Berlin: Springer-Verlag, 243-264. Young, P. C., McKenna, P., & Bruun, J. (2001). Identification of non-linear stochastic systems by state-dependent parameter estimation. International Journal of Control, 74, 18371857. Authors Note Peter C. M. 
Molenaar, Department of Human Development and Family Studies, The Pennsylvania State University. Tamara Goode, Department of Human Development and Family Studies, The Pennsylvania State University.

The authors gratefully acknowledge funding provided by NSF grant 0852147, which made Dr. Molenaar's work possible, and the University Graduate Fellowship provided by The Pennsylvania State University, which made Ms. Goode's work possible.

Correspondence concerning this article should be addressed to Peter Molenaar, Department of Human Development and Family Studies, 110 South Henderson Building, The Pennsylvania State University, University Park, PA 16802. E-mail: pxm21@psu.edu.

Table 1
Probabilities of η(t) given η(t-1) = 1 and η(t-1) = 2

              η(t-1) = 1    η(t-1) = 2
η(t) = 1         0.9           0.3
η(t) = 2         0.1           0.7

Table 2
Probabilities of y(t) given η(t) = 1 and η(t) = 2

              η(t) = 1    η(t) = 2
y(t) = 1         0.7         0.0
y(t) = 2         0.0         0.4
y(t) = 3         0.3         0.6

Figure 1. Single-subject time series with experimental manipulation at time t = 51.

Figure 2. HMM observed process y and latent state process η.

Figure 3. Observed series for biological son.

Figure 4. Time-varying coefficients predicting Involvement at t+1.

Figure 5. Simple optimal feedback control. Y: behavioral process without control. YO: optimally controlled behavioral process. UO: optimal external input.