Markov, Latent Variable, State-Space, or Marginal Probability Models in Pharmacometrics: Concepts and an Evaluation of Bias

Matthew M. Hutmacher1,*
1. Ann Arbor Pharmacometrics Group (A2PG), 301 N. Main St, Suite 102, Ann Arbor, MI 48104
* Corresponding author. E-mail address: matt.hutmacher@a2pg.com

State Space and Marginal Probability Models

Abstract

One might get the impression from reading the pharmacometric literature that Markov and latent variable (LV) models for longitudinal binary or ordered categorical data are distinct approaches. However, the approaches are related when autocorrelation between responses within an individual is considered, because both use the Markov property. An alternate terminology (albeit not new) for describing these modeling approaches is proposed that aligns with the aim and structure of each approach. The term state space (SS) is proposed for models in which the model structure is linked to transition probabilities. The primary focus of SS models is the transition from one response state to another. The term marginal probability (MP) is proposed for models in which the primary focus of the model structure is the marginal probabilities, and the transition probabilities are derived. One might question whether the transition probabilities need to be characterized adequately to ensure accurate prediction of the marginal probabilities. To address this, a simulation study was conducted to assess bias in estimation, prediction, and inferences when the transition probabilities are misspecified and the autocorrelation is stochastic. The results may be surprising to many in that they suggest that characterizing autocorrelation, when stochastic, is not as important as specifying a suitably rich random effects structure.
KEY WORDS: latent variable; probit; autocorrelation; generalized nonlinear mixed-effects models; Markov model

Introduction

Generalized nonlinear mixed effects models have been used increasingly for modeling longitudinal binary or ordered categorical outcomes since their introduction into the pharmacometric literature over 20 years ago [1]. Methods for addressing and handling additional within-subject correlation, or autocorrelation (correlation not addressed by subject-specific random effects), have been discussed more recently. For example, Lacroix et al. modeled the probabilities for transitioning between responder status and dropout, where responder status was defined by the American College of Rheumatology 20% responder criterion (ACR20) [2]. These transition probabilities, or conditional probabilities of moving between states given observation of the previous states, address the autocorrelation. Exposure and time effects, as well as subject-specific random effects, were linked to the transition probabilities directly. Modeling transition probabilities using Markov components is the most frequently used method to include autocorrelation in a pharmacometric model, and it is easy to implement. No special functions are required and the approach can be used with any link function (eg, logit, probit, etc). Hutmacher and French addressed the issue of autocorrelation within the latent variable (LV) framework [3]. Autocorrelation was assumed to be random or stochastic and manifested in the latent residuals. Exposure and time effects were applied to the probabilities of the responses, not the transition probabilities. However, the general latent residual correlation structure presented by Hutmacher and French was not easily implemented in the software currently available for generalized nonlinear mixed effects models, which restricted its general utility.
The Markov and LV approaches to modeling autocorrelation differ primarily in focus and practical implementation, not in the general construction of the model likelihood. The approaches are related at a fundamental level through use of the Markov property. Thus, a different terminology is proposed that better aligns with the intent, structure and interpretation of the models. The term state space (SS) is proposed for direct modeling of, or applying structural effects to, the transition probabilities (TP). The focus of this model is characterizing the transitions from state to state. The term marginal probability (MP) is proposed for models in which structural effects are applied to probabilities that are not conditional on the previous response. The LV approach as described by Hutmacher et al. [4] implies the MP approach.* The rationale for the proposed change in terminology should become clearer after a more formal exposition of each framework. Despite the different objectives and utilities of the SS and MP approaches, one might be concerned about the adequacy of the MP approach if the within-subject correlation or TPs are modeled inappropriately. One might have this concern even if marginal probabilities are the primary quantity of interest. Large decreases in the objective function value (OFV) have been noted when using the SS approach (eg, see [2]). If the OFV is interpreted as a statistic related to goodness-of-fit (the extent to which it should be in this context is another matter), then one might be concerned that an approach that does not accurately predict the TPs is inadequate. This article, in relation and extension to these previous works, seeks to address two objectives.
The first objective is to demonstrate the relationship between the Markov and LV approaches when autocorrelation is present. In so doing, a simpler, more user-friendly method for implementing stochastic autocorrelation in the MP approach is presented. A simple simulation/estimation exercise is presented to help delineate the SS and MP concepts. Further insight by comparison is provided by looking at the steady state condition of SS models. The approaches are also contrasted in how missing data need to be addressed. The second objective is the evaluation of biases in parameter estimation and prediction, and the effects of these on inference, when the random effects or correlation structure is misspecified. The biases are evaluated using a simulation study in which autocorrelation is assumed to be stochastic, and biases in predicting the MP are interpreted with respect to the accuracies of the TP predictions. A method of handling autocorrelation using link functions other than the probit is also described for general utility. The results suggest that characterizing stochastic autocorrelation is not as important as specifying a suitably rich random effects structure.

* The LV concept was used to provide a framework for including indirect response model (IRM) constructs into generalized nonlinear mixed effects models. The LV was used in a way similar to modeling continuous data with IRMs, in which the current response does not depend upon the previously observed response value deterministically other than through the IRM model. An LV construct could be applied to the state space approach. For example, the threshold for achieving a response could depend upon the previous response. The interpretation may be awkward from the pharmacometric drug-development point of view, however. Nevertheless, marginal probability is used here to maintain focus on the difference in concepts between the two approaches.
Relationship between the Markov and LV Models

Let $Y_i = (Y_{i1}, \cdots, Y_{in_i})^T$ be a vector of binary or Bernoulli (0 or 1) observations collected at times $t_{ij}$, $j = 1, 2, \ldots, n_i$. The joint probability of the responses, conditioned on the subject-specific random effects, $\eta$, can be derived by conditioning on the prior observations:

$$P(Y_{i1} = y_{i1}, \cdots, Y_{in_i} = y_{in_i} \mid \eta) = P(Y_{i1} = y_{i1} \mid \eta) \prod_{j=2}^{n_i} P(Y_{ij} = y_{ij} \mid Y_{i1} = y_{i1}, \cdots, Y_{ij-1} = y_{ij-1}, \eta) \tag{1}$$

Assuming the current observation is related only to the previous response, ie, the autoregression assumption $P(Y_{ij} = y_{ij} \mid Y_{i1} = y_{i1}, \cdots, Y_{ij-1} = y_{ij-1}, \eta) = P(Y_{ij} = y_{ij} \mid Y_{ij-1} = y_{ij-1}, \eta)$, then (1) can be simplified to

$$P(Y_{i1} = y_{i1}, \cdots, Y_{in_i} = y_{in_i} \mid \eta) = P(Y_{i1} = y_{i1} \mid \eta) \prod_{j=2}^{n_i} P(Y_{ij} = y_{ij} \mid Y_{ij-1} = y_{ij-1}, \eta) \tag{2}$$

The derivation of (2) is well known and illustrates the Markov property (of order 1). The Markov approach, as implemented typically in the pharmacometric literature, uses structural models for the probability of the current observation $Y_{ij}$ given the previous observation $Y_{ij-1}$. Fixed effects parameters for the baseline, placebo and drug exposure model components are applied to $h[P(Y_{ij} = y_{ij} \mid Y_{ij-1} = y_{ij-1}, \eta)]$, where $h$ is a link function such as the logit. The $P(Y_{ij} = y_{ij} \mid Y_{ij-1} = y_{ij-1}, \eta)$ can be viewed as transition probabilities (TP), where a subject moves from state $y_{ij-1}$ to state $y_{ij}$. If the $Y_{ij}$ are not interpreted as states, but instead perhaps as discrete measurements or observations from an underlying continuous distribution, then modeling the TPs might be interpreted as an approach that addresses autocorrelation deterministically within a subject. The LV approach, as presented in the pharmacometric literature, addresses autocorrelation through LV residuals.
The probability of response $Y_{ij}$ is modeled through fixed and subject-specific random effects when specifying $P(Y_{ij} = y_{ij} \mid \eta)$; that is, these effects are applied to $h[P(Y_{ij} = y_{ij} \mid \eta)]$, where $h$ is a link function. Stochastic autocorrelation between $Y_{ij}$ and $Y_{ij-1}$ can be introduced through the joint probability, $P(Y_{ij} = y_{ij}, Y_{ij-1} = y_{ij-1} \mid \eta)$. The overall joint probability can be expressed as

$$P(Y_{i1} = y_{i1}, \cdots, Y_{in_i} = y_{in_i} \mid \eta) = P(Y_{i1} = y_{i1} \mid \eta) \prod_{j=2}^{n_i} \frac{P(Y_{ij} = y_{ij}, Y_{ij-1} = y_{ij-1} \mid \eta)}{P(Y_{ij-1} = y_{ij-1} \mid \eta)} \tag{3}$$

Thus, the Markov and autocorrelated LV approaches are related through

$$P(Y_{ij} = y_{ij} \mid Y_{ij-1} = y_{ij-1}, \eta) = \frac{P(Y_{ij} = y_{ij}, Y_{ij-1} = y_{ij-1} \mid \eta)}{P(Y_{ij-1} = y_{ij-1} \mid \eta)} \tag{4}$$

From this, the LV approach with autocorrelation can be seen to be a Markov-based approach as well. Structurally modeling the TP, $P(Y_{ij} = y_{ij} \mid Y_{ij-1} = y_{ij-1}, \eta)$, implies that the focus of the model is on characterizing the transitions from state to state, that is, a state space (SS) structure. Structurally modeling $P(Y_{ij} = y_{ij} \mid \eta)$ using an LV structure with stochastic autocorrelation mediated by $P(Y_{ij} = y_{ij}, Y_{ij-1} = y_{ij-1} \mid \eta)$ implies a focus on the MP. Consequently, the SS and MP terminology is proposed.

Notation of Model Quantities

Notation is introduced to simplify exposition and clarify concepts for the remainder of the article. The notation is also intended to help differentiate between conditioning on random effects and conditioning on prior observations. Let $j$ index the measurement time for the $i$th subject, and let $\eta$ be a vector of subject-specific random effects.
Probabilities conditioned on $\eta$ are denoted with a tilde, as follows: $\tilde{p}_{ijj-1}^{(y_{ij}|y_{ij-1})} = P(Y_{ij} = y_{ij} \mid Y_{ij-1} = y_{ij-1}, \eta)$ is the conditional probability, or TP, of $Y_{ij}$ based on observing the previous response $Y_{ij-1}$; $\tilde{p}_{ij}^{(y_{ij})} = P(Y_{ij} = y_{ij} \mid \eta)$ is the MP of observing $Y_{ij}$ (unconditional on the previous response); and $\tilde{p}_{ijj-1}^{(y_{ij},y_{ij-1})} = P(Y_{ij} = y_{ij}, Y_{ij-1} = y_{ij-1} \mid \eta)$ is the joint probability of $Y_{ij}$ and $Y_{ij-1}$. Another level derives from probabilities that are not conditioned on $\eta$. These probabilities are written without the tilde: $p_{ij}^{(y_{ij})} = E_\eta[\tilde{p}_{ij}^{(y_{ij})}] = P(Y_{ij} = y_{ij})$ is the MP of observing $Y_{ij}$; $p_{ijj-1}^{(y_{ij},y_{ij-1})} = E_\eta[\tilde{p}_{ijj-1}^{(y_{ij},y_{ij-1})}] = P(Y_{ij} = y_{ij}, Y_{ij-1} = y_{ij-1})$ is the joint probability of $Y_{ij}$ and $Y_{ij-1}$; and $E_\eta$ represents the expectation operator with respect to $\eta$. These MPs are termed population probabilities (PPs) hereinafter in an attempt to avoid confusion due to multiple uses of the term marginal.†

A natural definition of population TPs is $E_\eta[\tilde{p}_{ijj-1}^{(y_{ij}|y_{ij-1})}]$. This definition is not as interesting for MP models without stochastic autocorrelation, because $\tilde{p}_{ijj-1}^{(y_{ij},y_{ij-1})} = \tilde{p}_{ij}^{(y_{ij})} \tilde{p}_{ij-1}^{(y_{ij-1})}$ due to conditional independence. Thus, the population TP reduces to $p_{ij}^{(y_{ij})}$ when there is no stochastic autocorrelation. Instead, population TPs are defined as $p_{ijj-1}^{(y_{ij},y_{ij-1})} / p_{ij-1}^{(y_{ij-1})}$ in this article. This definition also highlights that $\eta$ affects transitions, the influence of which can be illustrated by considering the simple case of an additive random effect on the baseline of the logit or probit scale. Subjects with very large (or small) values of $\eta$ will tend to have $\tilde{p}_{ij}^{(y_{ij})}$ near 1 (or 0) and thus will not have as many transitions despite having (conditionally) independent observations. Note that the PPs are correlated through $\eta$ even if there is no stochastic autocorrelation.
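The effect of $\eta$ on the population TP can be checked numerically. The following is a minimal sketch, assuming a probit model with a single additive random intercept, a constant predictor `b`, and no stochastic autocorrelation; the function names and the values `b = 0.5` and `omega = 1.0` are hypothetical and chosen only for illustration. It integrates over $\eta$ with a simple trapezoid rule and compares the PP with the population TP defined as the joint PP divided by the previous PP.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal, used here as the probit link


def expect_over_eta(func, omega, n=4001, width=8.0):
    """Approximate E[func(eta)] for eta ~ N(0, omega) with the trapezoid rule."""
    sd = math.sqrt(omega)
    dist = NormalDist(0.0, sd)
    lo = -width * sd
    step = 2.0 * width * sd / (n - 1)
    total = 0.0
    for k in range(n):
        eta = lo + k * step
        weight = 0.5 if k in (0, n - 1) else 1.0
        total += weight * func(eta) * dist.pdf(eta)
    return total * step


def population_quantities(b, omega):
    """Return (PP, population TP of 1 given 1) for a random-intercept probit
    model with conditionally independent responses and constant predictor b."""
    def cond_mp(eta):
        return STD.cdf(b + eta)  # conditional MP given eta

    pp = expect_over_eta(cond_mp, omega)
    # Under conditional independence the conditional joint is the product.
    joint = expect_over_eta(lambda eta: cond_mp(eta) ** 2, omega)
    return pp, joint / pp


pp, tp11 = population_quantities(b=0.5, omega=1.0)
print(f"PP = {pp:.4f}, population TP(1|1) = {tp11:.4f}")
```

In this sketch the population TP exceeds the PP whenever `omega` is positive, which is the correlation induced through $\eta$ alone; as `omega` shrinks toward 0 the two quantities coincide.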
Modeling probabilities requires cumulative distribution functions. In this article, $\Phi(x_{ij})$ represents the standard normal cumulative distribution function at quantile $x_{ij}$. $\Phi_2(x_{ij}, x_{ij'}, \rho_{ijj'}(\rho))$ represents the bivariate normal c.d.f., or an approximation thereto, at quantiles $x_{ij}$ and $x_{ij'}$, where $\rho_{ijj'}(\rho)$ is the autocorrelation function, which depends on the parameter $\rho$. The function $\rho_{ijj'}$ is defined for subject $i$ between times $j$ and $j'$ because the correlation and sampling times could differ between subjects.

† The population (marginal) probability $p_{ij}^{(y_{ij})}$ is often the quantity of interest when predicting from a model. However, this probability cannot be modeled directly, because a closed form does not exist for the multivariate probability distribution. These probabilities must be derived from the model secondarily.

Illustration of State-Space and Marginal Probability Models

The relationship between the SS and MP approaches was provided in the previous section. In this section, the implied intent of the SS and MP approaches is clarified and contrasted through modeling a simple example. The data for the example were simulated using stochastic autocorrelation to help delineate the differences between the approaches. Consider a simple hypothetical case where a drug is given, its effects achieve steady state by the first sample time, and there is no placebo effect. Let the probability of being a responder be $p_{ij}^{(1)} = \tilde{p}_{ij}^{(1)} = 0.75$ across all subjects ($i$) and times ($j$, $t_j \in \{1, 2, 4, 8, 16\}$). The equality $p_{ij}^{(1)} = \tilde{p}_{ij}^{(1)}$ indicates that $Var(\eta) = 0$. Clearly this is an idealization, yet it is useful for illustration. Two structures for $\rho_{ijj'}$ were considered: the autoregressive structure, AR(1), where $\rho_{ijj-1} = \rho$, and the spatial autoregressive structure, AR(1S), where $\rho_{ijj-1} = \rho^{|t_{ij} - t_{ij-1}|}$. Both scenarios used $\rho = 0.5$ with data from 1000 virtual subjects. Five models were fitted to the data.
The models are enumerated below, each with a reference to its model structure: 1) typical Markov model with assumed first state (5a), 2) Markov model with estimated first state (5b), 3) constrained Markov model (5c), 4) an MP model (5d), and 5) an MP model (5d) with a 3-point quadrature approximation to $\Phi_2$:

$$p_{ijj-1}^{(1|y_{ij-1})} = \Phi(\alpha_1 + \alpha_2 I[y_{ij-1} = 1]); \quad \text{assume } P(Y_{i0} = 0) = 1 \tag{5a}$$

$$p_{ijj-1}^{(1|y_{ij-1})} = \Phi(\alpha_1 + \alpha_2 I[y_{ij-1} = 1]),\ j > 1; \quad p_{ij}^{(1)} = \Phi(\alpha_0),\ j = 1 \tag{5b}$$

$$p_{ij}^{(1)} = \Phi(\beta_1),\ j \ge 1; \quad p_{ijj-1}^{(1|0)} = \Phi(\beta_1 + \alpha_3),\ j > 1; \quad p_{ijj-1}^{(1|1)} = 1 - \frac{\Phi(\beta_1 + \alpha_3)[1 - \Phi(\beta_1)]}{\Phi(\beta_1)},\ j > 1 \tag{5c}$$

$$p_{ij}^{(1)} = \Phi(\beta_1),\ j \ge 1; \quad p_{ijj-1}^{(y_{ij}|y_{ij-1})} = \frac{p_{ijj-1}^{(y_{ij},y_{ij-1})}}{p_{ij-1}^{(y_{ij-1})}},\ j > 1; \tag{5d}$$

$$p_{ijj-1}^{(y_{ij},y_{ij-1})} = (y_{ij} - 1)(y_{ij-1} - 1) + (y_{ij} - 1)(-1)^{y_{ij-1}} p_{ij-1}^{(1)} + (y_{ij-1} - 1)(-1)^{y_{ij}} p_{ij}^{(1)} + (-1)^{(y_{ij} + y_{ij-1})} \Phi_2(\beta_1, \beta_1, \rho_{ijj-1})$$

Because $Var(\eta) = 0$ in this illustration, the probabilities in (5a) through (5d) are written without the tilde. $Y_{i0}$ represents a response at $t = 0$, which was not simulated as part of the design, yet is necessary to start the Markov chain. $I[\cdot]$ represents an indicator function that equals 1 when the logical condition is true and 0 when it is false. Different symbols were used for the parameters to clarify their different roles and interpretations between the models. The two Markov models, Model 1 and Model 2 [(5a) and (5b), respectively], have two parameters for modeling the TP, $\alpha_1$ and $\alpha_2$. The term $\alpha_2 I[y_{ij-1} = 1]$ reflects a change in the TP as a function of observing $y_{ij-1} = 1$, and the magnitude of the change depends on $\alpha_2$; thus, $p_{ijj-1}^{(1|0)} = \Phi(\alpha_1)$ and $p_{ijj-1}^{(1|1)} = \Phi(\alpha_1 + \alpha_2)$. In (5a), the response or state at time 0 is assumed to be 0 and is forced to be so in the model (the first observation must result from a transition from the 0 state or a recurrence of the 0 state). Model 2 in (5b) does not make any assumption about the state at the prior time point. The parameter $\alpha_0$ models the probability of being in state 1 at time $t = 1$.
Note that $\alpha_2$ reflects the degree of dependence between two contiguous observations (ie, $\alpha_2 = 0$ means these are independent, or $p_{ijj-1}^{(1|0)} = p_{ijj-1}^{(1|1)} = \Phi(\alpha_1)$). For these two models, the marginal probabilities can be derived using the initial state probability vector and the transition probability matrix. The calculation for $t = 1$ is as follows:

$$\begin{bmatrix} p_{i0}^{(0)} & p_{i0}^{(1)} \end{bmatrix} \begin{bmatrix} p_{i10}^{(0|0)} & p_{i10}^{(1|0)} \\ p_{i10}^{(0|1)} & p_{i10}^{(1|1)} \end{bmatrix} = \begin{bmatrix} p_{i0}^{(0)} p_{i10}^{(0|0)} + p_{i0}^{(1)} p_{i10}^{(0|1)} & p_{i0}^{(0)} p_{i10}^{(1|0)} + p_{i0}^{(1)} p_{i10}^{(1|1)} \end{bmatrix} = \begin{bmatrix} p_{i10}^{(0,0)} + p_{i10}^{(0,1)} & p_{i10}^{(1,0)} + p_{i10}^{(1,1)} \end{bmatrix} = \begin{bmatrix} p_{i1}^{(0)} & p_{i1}^{(1)} \end{bmatrix} \tag{6}$$

where $p_{i10}^{(0|0)} + p_{i10}^{(1|0)} = 1$, $p_{i10}^{(0|1)} + p_{i10}^{(1|1)} = 1$, and $p_{i1}^{(0)} + p_{i1}^{(1)} = 1$ by definition. The probabilities for a general time point $j$ are calculated from repeated application of the TP matrix as follows:

$$\begin{bmatrix} p_{i0}^{(0)} & p_{i0}^{(1)} \end{bmatrix} \begin{bmatrix} p_{i10}^{(0|0)} & p_{i10}^{(1|0)} \\ p_{i10}^{(0|1)} & p_{i10}^{(1|1)} \end{bmatrix} \begin{bmatrix} p_{i21}^{(0|0)} & p_{i21}^{(1|0)} \\ p_{i21}^{(0|1)} & p_{i21}^{(1|1)} \end{bmatrix} \cdots \begin{bmatrix} p_{ijj-1}^{(0|0)} & p_{ijj-1}^{(1|0)} \\ p_{ijj-1}^{(0|1)} & p_{ijj-1}^{(1|1)} \end{bmatrix} = \begin{bmatrix} p_{ij}^{(0)} & p_{ij}^{(1)} \end{bmatrix} \tag{7}$$

If the TP matrix is constant over all $j$ and equal to $\mathbf{P}$, which is the case here, then (7) may be simplified to

$$\begin{bmatrix} p_{i0}^{(0)} & p_{i0}^{(1)} \end{bmatrix} \mathbf{P}^j = \begin{bmatrix} p_{ij}^{(0)} & p_{ij}^{(1)} \end{bmatrix} \tag{8}$$

due to $j$ repeated matrix multiplications of $\mathbf{P}$. The expression in (8) has implications with respect to steady state probability predictions, which are useful in contrasting the SS and MP approaches as shown hereinafter.

The structure for Model 4 in (5d) shows that the MPs of the responses are modeled by $\beta_1$. The autocorrelation is handled by the TPs, which are derived using the joint probability and the previous marginal probability. The magnitude of the autocorrelation is governed by $\rho$. The calculation of $\Phi_2$ is not available in all software. Therefore, an approximation of $\Phi_2$ using 3-point quadrature [5] was evaluated in (5d); this is denoted as Model 5. Note for Model 4 and Model 5 that under independence ($\rho = 0$), $p_{ijj-1}^{(1|0)} = p_{ijj-1}^{(1|1)} = \Phi(\beta_1)$.
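The propagation in (6) through (8) amounts to repeated multiplication of a row vector by the TP matrix. The sketch below illustrates this for a two-state chain; the function names, the TP values, and the two starting vectors are hypothetical and serve only to show how the starting state influences the early marginal probabilities.

```python
def apply_tp(probs, tp):
    """One application of the TP matrix, as in (6).

    probs is the row vector [P(state 0), P(state 1)];
    tp[a][b] is the probability of moving to state b given state a.
    """
    return [probs[0] * tp[0][nxt] + probs[1] * tp[1][nxt] for nxt in (0, 1)]


def marginal_path(start, tp, n_steps):
    """Marginal state probabilities after 1..n_steps applications, as in (8)."""
    path, probs = [], list(start)
    for _ in range(n_steps):
        probs = apply_tp(probs, tp)
        path.append(probs)
    return path


# Hypothetical constant TP matrix (rows: previous state 0, previous state 1).
TP = [[0.48, 0.52],
      [0.17, 0.83]]

forced_start = marginal_path([1.0, 0.0], TP, 5)    # chain forced to start in state 0
free_start = marginal_path([0.25, 0.75], TP, 5)    # hypothetical starting probabilities

for step, (a, b) in enumerate(zip(forced_start, free_start), start=1):
    print(step, round(a[1], 4), round(b[1], 4))
```

With a forced start in state 0, the first marginal probability of state 1 equals the TP from 0 to 1, and the later values climb toward the long-run value determined by the TP matrix alone.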
Model 3, termed the constrained Markov model and depicted in (5c), is a hybrid between the Markov approach and the MP approach. The model uses a fixed effect, $\alpha_3$, to govern the amount of autocorrelation mediated by the TP, while predicting the MP through estimation of $\beta_1$. Note that $p_{ijj-1}^{(1|0)} = p_{ijj-1}^{(1|1)} = \Phi(\beta_1)$ when $\alpha_3 = 0$.

Models 1-5 were fitted to the simulated data. The results are provided in Figure 1. The first row displays the predictions for the AR(1) scenario, and the first column of the row displays the predicted $p_{ij}^{(1)}$ over time. The prediction for Model 1 at $t = 1$ is lower than those from the other models, and Model 1 achieves steady state around $t = 4$. Model 2 predicts better than Model 1 and also achieves steady state at approximately $t = 4$. Note that the prediction from Model 2 at $t = 1$ is off the line because of variability; this prediction is based on the data at $t = 1$ only. Interestingly, Model 1 and Model 2 do not have a constant probability prediction like the true model. Models 3-5 are indistinguishable and predict a flat trajectory, as they were constructed to do so. For the transition to 0 from 1, all models provide similar predictions of the TP $p_{ijj-1}^{(0|1)}$. For $p_{ijj-1}^{(1|0)}$, Model 1 is not as close to the true value as Model 2, which is not as close as Models 3-5. The reason is that the simulation structure did not assume a value at $t = 0$. The assumption of a 0 response at $t = 0$ influences $p_{ijj-1}^{(1|0)}$ more than $p_{ijj-1}^{(0|1)}$ because information for $p_{ijj-1}^{(0|1)}$ is only available under the model at $t \ge 2$; ie, the initial assumption does not influence this probability. One can see that by $t = 2$ Model 1 has moved away from the poor prediction at $t = 1$. The second row displays the results from the AR(1S) scenario. Models 1 and 2 again reach steady state by around $t = 4$, with all models achieving comparable predictions at $t \ge 2$.
For the TP, only Models 4 and 5 track the true trajectory. The other models do not have the machinery to allow autocorrelation to be a function of time. However, even though Models 1, 2 and 3 do not accurately estimate the TPs, they do predict and approximate the flatness of $p_{ij}^{(1)}$ at steady state.

Figure 1 alludes to the notion of steady state for a Markov chain. Steady state can be formalized as follows:

$$\begin{bmatrix} p_{i0}^{(0)} & p_{i0}^{(1)} \end{bmatrix} \mathbf{P}^\infty = \begin{bmatrix} p_{i\infty}^{(0)} & p_{i\infty}^{(1)} \end{bmatrix} = \begin{bmatrix} p_{i\infty}^{(0)} & p_{i\infty}^{(1)} \end{bmatrix} \mathbf{P} \tag{9}$$

where $\mathbf{P}^\infty$ represents infinite applications of $\mathbf{P}$. This is somewhat analogous to pharmacokinetic steady state following repeated regular administration of a drug, where the concentration at the beginning of the dosing interval is identical to that at the end of the interval. The steady state probabilities $p_{i\infty}^{(0)}$ and $p_{i\infty}^{(1)}$ can be calculated by solving $\begin{bmatrix} p_{i\infty}^{(0)} & p_{i\infty}^{(1)} \end{bmatrix} (\mathbf{P} - \mathbf{I}) = \mathbf{0}$, where $\mathbf{I}$ is the identity matrix, using the constraint $p_{i\infty}^{(0)} + p_{i\infty}^{(1)} = 1$. The solution implies that $p_{i\infty}^{(0)}$ and $p_{i\infty}^{(1)}$ do not depend upon the starting probabilities $p_{i0}^{(0)}$ and $p_{i0}^{(1)}$ after a large number of applications of $\mathbf{P}$ (ie, $\mathbf{P}^\infty$). The rate at which steady state is achieved does depend upon $p_{i0}^{(0)}$ and $p_{i0}^{(1)}$, however. Consider a simulation in which the sample size is large enough that the TP (and $p_{i0}^{(0)}$ and $p_{i0}^{(1)}$ for Model 2) are estimated with sufficient accuracy and precision to be indistinguishable from the true values. Model 1 predictions do not approach 0.75 until $t > 16$, which is outside the sampling design ($\tilde{p}_{i6}^{(1)} = 0.749$), while Model 2 predicts $\tilde{p}_{i1}^{(1)} = 0.75$ for all $t \ge 1$, because the starting value of the chain is estimated at the first time point. In contrast, steady state is implicit in the formulations (and assumptions) of Models 3 through 5.
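The steady state statements above can be reproduced numerically for the AR(1) illustration ($p = 0.75$, $\rho = 0.5$). The sketch below is not the approximation of [5]: it assumes a plain Simpson's rule evaluation of the bivariate normal c.d.f. (hypothetical helper `phi2`), derives the true TPs from the joint probability as in (5d), solves the two-state steady state condition, and iterates a chain that is forced to start in state 0 as in Model 1.

```python
import math
from statistics import NormalDist

STD = NormalDist()


def phi2(h, k, rho, n=2000):
    """P(Z1 <= h, Z2 <= k) for a standard bivariate normal with correlation rho.

    Simpson's rule on the integral of pdf(z) * cdf((k - rho z) / sqrt(1 - rho^2))
    over z in [-8, h]; requires |rho| < 1 and an even n.
    """
    lo = -8.0
    if h <= lo:
        return 0.0
    scale = math.sqrt(1.0 - rho * rho)
    step = (h - lo) / n
    total = 0.0
    for i in range(n + 1):
        z = lo + i * step
        weight = 1 if i in (0, n) else (4 if i % 2 else 2)
        total += weight * STD.pdf(z) * STD.cdf((k - rho * z) / scale)
    return total * step / 3.0


def ar1_transition_probs(p, rho):
    """True TPs (to state 1 from 0, to state 1 from 1) implied by the MP model."""
    beta1 = STD.inv_cdf(p)
    joint11 = phi2(beta1, beta1, rho)
    return (p - joint11) / (1.0 - p), joint11 / p


def steady_state_p1(tp10, tp11):
    """Solve [pi0, pi1](P - I) = 0 with pi0 + pi1 = 1 for a two-state chain."""
    return tp10 / (tp10 + (1.0 - tp11))


def p1_after(n_steps, start_p1, tp10, tp11):
    """Probability of state 1 after n_steps applications of the TP matrix."""
    p1 = start_p1
    for _ in range(n_steps):
        p1 = (1.0 - p1) * tp10 + p1 * tp11
    return p1


tp10, tp11 = ar1_transition_probs(0.75, 0.5)
print("steady state:", round(steady_state_p1(tp10, tp11), 4))
print("forced start, 6 steps:", round(p1_after(6, 0.0, tp10, tp11), 4))
```

Under these assumptions the steady state probability equals the modeled MP, while the chain started in state 0 needs about six transitions before its prediction rounds to the 0.749 quoted in the text.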
Another area in which to compare and contrast the MP and SS approaches is how they handle interstitial (intermittent or non-monotonic) missing data, ie, occasional missing data within a subject such as a missed clinic visit (not due to dropout). For the SS approach, the probability model for the current observation is a function of the previous observation (as in Models 1 and 2). When the previous observation is missing and hence unknown, one must effectively integrate it out of the likelihood. This is demonstrated in the context of the preceding illustration. Noting the assumption that the 0 state has probability 1 at $t = 0$ for Model 1, the likelihood for a complete data case is

$$\mathcal{L} = p_{i10}^{(y_{i1}|0)}\, p_{i21}^{(y_{i2}|y_{i1})}\, p_{i32}^{(y_{i3}|y_{i2})}\, p_{i43}^{(y_{i4}|y_{i3})}\, p_{i54}^{(y_{i5}|y_{i4})} \tag{10}$$

Consider the case where the second observation ($j = 2$, $t = 2$) is missing, yet all the other observations are available. The path the individual followed to arrive at state $y_{i3}$ ($j = 3$, $t = 4$) from $y_{i1}$ is unknown, so all trajectories must be considered. The likelihood for this case is

$$\mathcal{L} = p_{i10}^{(y_{i1}|0)} \left[ p_{i21}^{(0|y_{i1})}\, p_{i32}^{(y_{i3}|0)} + p_{i21}^{(1|y_{i1})}\, p_{i32}^{(y_{i3}|1)} \right] p_{i43}^{(y_{i4}|y_{i3})}\, p_{i54}^{(y_{i5}|y_{i4})} \tag{11}$$

The quantity in brackets in (11) is $P(Y_{i3} = y_{i3} \mid Y_{i1} = y_{i1})$. Handling missing data in this principled manner is complicated when there are strings of contiguous interstitial missingness in the data, because of the number of summations involved. Because of these complications, missing data are often imputed using a last-observation-carried-forward approach, the effects of which are not often evaluated.

Missing data must also be integrated out of the likelihood for the MP approach. The assumption of normal latent residuals (ie, probit modeling) and consideration of the variance-covariance matrix of the residuals simplify the issue.
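Assuming that the variance of the residuals is 1, which is typical, the matrices for the AR(1) and AR(1S) scenarios are

$$\Sigma_i = Cov[\varepsilon_i] = \begin{bmatrix} 1 & \rho^1 & \rho^2 & \rho^3 & \rho^4 \\ \rho^1 & 1 & \rho^1 & \rho^2 & \rho^3 \\ \rho^2 & \rho^1 & 1 & \rho^1 & \rho^2 \\ \rho^3 & \rho^2 & \rho^1 & 1 & \rho^1 \\ \rho^4 & \rho^3 & \rho^2 & \rho^1 & 1 \end{bmatrix} \quad \text{or} \quad \begin{bmatrix} 1 & \rho^1 & \rho^3 & \rho^7 & \rho^{15} \\ \rho^1 & 1 & \rho^2 & \rho^6 & \rho^{14} \\ \rho^3 & \rho^2 & 1 & \rho^4 & \rho^{12} \\ \rho^7 & \rho^6 & \rho^4 & 1 & \rho^8 \\ \rho^{15} & \rho^{14} & \rho^{12} & \rho^8 & 1 \end{bmatrix} \tag{12}$$

with general elements $\rho^{|j - j'|}$ and $\rho^{|t_{ij} - t_{ij'}|}$, respectively. Because of the normality assumption, integrating out the missing data corresponds to eliminating the row and column from the matrix in (12) that correspond to the missing observation; in this example, removing the second row and column yields

$$\Sigma_i = Cov[\varepsilon_i] = \begin{bmatrix} 1 & \rho^2 & \rho^3 & \rho^4 \\ \rho^2 & 1 & \rho^1 & \rho^2 \\ \rho^3 & \rho^1 & 1 & \rho^1 \\ \rho^4 & \rho^2 & \rho^1 & 1 \end{bmatrix} \quad \text{or} \quad \begin{bmatrix} 1 & \rho^3 & \rho^7 & \rho^{15} \\ \rho^3 & 1 & \rho^4 & \rho^{12} \\ \rho^7 & \rho^4 & 1 & \rho^8 \\ \rho^{15} & \rho^{12} & \rho^8 & 1 \end{bmatrix} \tag{13}$$

This general description of the elements is helpful when coding the likelihood. The correlation function used in $\Phi_2$ is simple to derive, because $\rho_{i31} = \rho^{|3-2|} \rho^{|2-1|} = \rho^{|3-1|}$ for AR(1), or $\rho_{i31} = \rho^{|4-2|} \rho^{|2-1|} = \rho^{|4-1|}$ for AR(1S). Thus, the likelihood goes from

$$\mathcal{L} = p_{i1}^{(y_{i1})}\, \frac{p_{i21}^{(y_{i2},y_{i1})}}{p_{i1}^{(y_{i1})}}\, \frac{p_{i32}^{(y_{i3},y_{i2})}}{p_{i2}^{(y_{i2})}}\, \frac{p_{i43}^{(y_{i4},y_{i3})}}{p_{i3}^{(y_{i3})}}\, \frac{p_{i54}^{(y_{i5},y_{i4})}}{p_{i4}^{(y_{i4})}} \tag{14}$$

for the complete data case to

$$\mathcal{L} = p_{i1}^{(y_{i1})}\, \frac{p_{i31}^{(y_{i3},y_{i1})}}{p_{i1}^{(y_{i1})}}\, \frac{p_{i43}^{(y_{i4},y_{i3})}}{p_{i3}^{(y_{i3})}}\, \frac{p_{i54}^{(y_{i5},y_{i4})}}{p_{i4}^{(y_{i4})}} \tag{15}$$

when the second observation is missing.

Marginal Probability Models and the Latent Variable Structure

Before the simulation study, it is helpful to establish a general framework.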
Let $Z$ be a $q$-dimensional multivariate latent variable

$$Z_i \mid \eta = m(\beta, t_i, d_i, x_i) + g(\beta, t_i, d_i, x_i)\,\eta_i + \varepsilon_i \tag{16}$$

where $m(\beta, t_i, d_i, x_i) = m_i$ and $g(\beta, t_i, d_i, x_i) = g_i$ are vectors or matrices of functions of the fixed effects ($\beta$), the design-dependent covariate vectors $t_i$ (time) and $d_i$ (dose), and $x_i$, a matrix of subject-specific covariates (potentially time varying); $\eta_i$ is a $p$-dimensional vector of random effects ($Var(\eta_i) = \Omega$); and $\varepsilon_i$ is the $q$-dimensional normally distributed latent residual error vector ($Var(\varepsilon_i) = \Sigma(t_i)$), with a covariance/correlation matrix that can be a function of $t_i$. In general, $\varepsilon_i$ could have any distribution; however, modeling autocorrelation through latent residuals is the most straightforward. This will be revisited later. The diagonal entries of $\Sigma(t_i)$ are fixed to 1 for identifiability, so $\Sigma(t_i)$ is a correlation matrix. In pharmacometrics work for binary data, often $g_i \equiv 1$, and the model is not written by grouping the fixed effect terms and the random effects terms as shown in (16). There is a benefit to this grouping, which should become clear. Then,

$$\tilde{p}_{ij}^{(1)} = P(Y_{ij} = 1 \mid \eta) = P(Z_{ij} \le \gamma \mid \eta) = \Phi(\gamma - m_{ij} - g_{ij}\eta_i) \tag{17}$$

where $i$ indexes subject, $j$ indexes time within the subject (an observation from the vector), and $\gamma$ is a constant, interpreted as a threshold, which is also sometimes interpreted as the baseline (see Hu et al [6] for a broader discussion).

Nonlinear mixed effects software takes (17) and computes the (approximate) likelihood behind the scenes. Despite the linearity in $\eta$ in (16), which allows exact computation of the distribution of $Z$, the likelihood is based on integrals of $Z$, which leads to the need to approximate the likelihood when fitting $Y$.

The desired quantity from modeling for decision making is typically $p_{ij}^{(1)}$, not $\tilde{p}_{ij}^{(1)}$.
Usually one must use Monte Carlo techniques to calculate these, eg, $p_{ij}^{(1)} \approx \frac{1}{M} \sum_{k=1}^{M} \Phi(\mu_{ij} - g_{ij}\eta_k^*)$ for a suitable size $M$, where $\mu_{ij} = \gamma - m_{ij}$ is the predictor with the threshold absorbed, $\eta^*$ is sampled from $N(0, \Omega)$, and $\mu_{ij}$ and $\Omega$ have been estimated. However, $p_{ij}^{(1)}$ can be computed directly from the marginal distribution for the probit model described above. This removes simulation error from the prediction, is economical computationally, and also provides some insight into the potential for bias. The direct computation is based on the multivariate normal marginal distribution of $Z$, ie, $Z_i \sim N(m_i, \Xi_i)$ with $\Xi_i = g_i \Omega g_i^T + \Sigma_i$. Using the LV framework and properties of the multivariate normal, and letting $\mu_{ij} = \gamma - m_{ij}$ (absorbing the threshold), the PP can be derived as

$$p_{ij}^{(1)} = P(Y_{ij} = 1) = P(Z_{ij} \le \gamma) = \Phi(\Xi_{ijj}^{-1/2}[\gamma - m_{ij}]) = \Phi(\Xi_{ijj}^{-1/2}\mu_{ij}) \tag{18}$$

where $\Xi_{ijj}$ is the $j$th element of the diagonal of $\Xi_i$. Note that (18) is an MP and that the index $i$ is retained because, in general, subjects can have different covariate vectors such as time or dose. The result in (18) does not depend directly on the off-diagonal elements of $\Xi$ and hence on the autocorrelation specified in $\Sigma$. One can see in (18) that $\beta$, $g$ and $\Omega$ are intertwined in $\Xi_{ijj}^{-1/2}\mu_{ij}$, such that misspecification of $g$ or $\Omega$, or biased estimates of $\Omega$, could influence the estimates of $\beta$ in order to keep the predictions of $p_{ij}^{(1)}$ accurate. From this, one could speculate that incorrect specification of $\Sigma$ could lead to biased estimates of $\Omega$ to compensate, and hence biases in $\beta$ could result. The $\Xi_{ijj}^{-1/2}\mu_{ij}$ in (18) exemplifies the possible confounding of fixed and random effects. The LV structure naturally leads to the MP approach.
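The agreement between the Monte Carlo average and the direct computation in (18) can be checked with a short sketch. It assumes a random intercept plus a second random effect scaled by a known factor `f`, uncorrelated random effects, and unit residual variance; the function names and the numerical values are hypothetical and used only for illustration.

```python
import math
import random
from statistics import NormalDist

STD = NormalDist()


def pp_closed_form(mu, f, omega11, omega22):
    """Direct computation: Phi(mu / sqrt(Xi_jj)), with Xi_jj built from
    g = (1, f), uncorrelated random effects, and unit residual variance."""
    xi_jj = omega11 + f * f * omega22 + 1.0
    return STD.cdf(mu / math.sqrt(xi_jj))


def pp_monte_carlo(mu, f, omega11, omega22, n_draws=100_000, seed=1):
    """Monte Carlo average of the conditional probabilities over sampled eta."""
    rng = random.Random(seed)
    sd1, sd2 = math.sqrt(omega11), math.sqrt(omega22)
    total = 0.0
    for _ in range(n_draws):
        eta1 = rng.gauss(0.0, sd1)
        eta2 = rng.gauss(0.0, sd2)
        total += STD.cdf(mu - eta1 - f * eta2)
    return total / n_draws


# Hypothetical predictor and variance components.
direct = pp_closed_form(mu=-0.5, f=0.9, omega11=1.0, omega22=2.5)
simulated = pp_monte_carlo(mu=-0.5, f=0.9, omega11=1.0, omega22=2.5)
print(round(direct, 4), round(simulated, 4))
```

The two values should differ only by simulation error, which is what the direct computation removes.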
Simulation Study to Evaluate Bias in the Estimates, Predictions and Inferences

Methods

The model components in (16) used in the simulation study, written with the threshold absorbed ($\mu_{ij} = \gamma - m_{ij}$, as in (18)), took the form

$$\mu_{ij} = \beta_1 + \beta_2 f(t_{ij}) + \beta_3 \frac{d_i}{d_i + \exp(\beta_5)}, \qquad f_{ij} = f(t_{ij}) = 1 - \exp\left(-\frac{\ln 2}{\exp(\beta_4)}\, t_{ij}\right) \tag{19a}$$

$$g_{ij} = (1, f(t_{ij})) \rightarrow g_{ij}\eta_i = \eta_{1i} + \eta_{2i} f_{ij} \tag{19b}$$

As above, $\Xi_i = g_i \Omega g_i^T + \Sigma_i$, where $\Xi_{ijj} = \Omega_{11} + 2 f_{ij} \Omega_{12} + f_{ij}^2 \Omega_{22} + 1$.

The design skeleton used for the simulation study was as follows: 7 parallel groups with doses $d_i$ from the set {0, 1, 3, 5, 10, 15, 30}, 40 subjects per dose group, and 7 time points per individual with $t_j \in \{1, 2, 4, 8, 16, 24, 36\}$. The parameters $\beta_1$, $\beta_2$, and $\beta_3$ in (19a) were derived to achieve $p_{ij}^{(1)} = 0.15$ for $d_i = 0$, $t_{ij} = 1$; $p_{ij}^{(1)} = 0.40$ for $d_i = 0$, $t_{ij} = 16$; and $p_{ij}^{(1)} = 0.85$ for $d_i = 30$, $t_{ij} = 16$; $\beta_2$ is the maximum placebo effect and $\beta_3$ is the maximum drug effect, or Emax. The half-life (t½) of placebo onset was $\exp(\beta_4) = 4$, and the ED50 was $\exp(\beta_5) = 30/9$ (ie, the ED90 = 30). Overall, $\beta = (-1.67, 1.25, 2.94, 1.39, 1.20)$. The variance components were $\Omega_{11} = 1$, $\Omega_{12} = 0$, and $\Omega_{22} = 2.5$. The values of $\rho$ considered in the simulation were {0, 0.3, 0.5, 0.7, 0.9} for $\Sigma_i$, and the AR(1S) structure was used, ie, $\Sigma_{ijj'} = \rho^{|t_{ij} - t_{ij'}|}$. The $\varepsilon$ in (16) were simulated using the following recursive relation:

$$\varepsilon_{i1} = e_{i1}, \quad j = 1; \qquad \varepsilon_{ij} = \varepsilon_{ij-1}\, \rho^{|t_{ij} - t_{ij-1}|} + e_{ij} \sqrt{1 - \rho^{2|t_{ij} - t_{ij-1}|}}, \quad j \ge 2 \tag{20}$$

where $e_{ij} \sim N(0, 1)$. The responses were realized by plugging the simulated values of $\eta_i$ and $\varepsilon_{ij}$ into $y_{ij} = I(\varepsilon_{ij} < \mu_{ij} - g_{ij}\eta_i)$, where $I(\cdot) = 1$ when the inequality holds and 0 otherwise.

Table 1 displays the off-diagonal correlation values for $\Xi$, ie, the correlations in the latent variable $Z$. Ranges of correlations across the doses are reported for the responses, $Y$; ranges are displayed because these are on the marginal scale and so the fixed effects play a role. Note that the diagonal elements of these matrices equal 1 by definition.
From the table, one can see that there is moderate correlation at baseline, and that the correlation increases as the study progresses in time (large index values). This increase in correlation is based on the placebo onset model and its random effects, and reflects the onset of steady state. One can also see that as $\rho$ increases, observations at early time points (small index values) are increasingly correlated due to $\rho^{|t_{ij} - t_{ij-1}|}$ in $\Sigma_i$. This increased correlation attenuates as observations become farther apart.

The complexity and form of the model in (16) was chosen for realism. Including random effects on model components other than the intercept is not typical [7]. The extra random effect on the non-drug response was included to allow for a more general correlation structure than typically assumed, one that is likely more consistent with data generated from an LV process (owing to its hypothetical relation to an unobserved continuous factor). The number of time points and doses is greater than typically studied in a Phase 2 trial. However, this information-rich design was selected specifically to facilitate convergence and covariance step estimation across multiple simulations. The goal was to avoid confounding of results and conclusions with convergence issues, both within an assumed model and between more complex and simpler (reduced) models. In fact, to further facilitate covariance step estimation, the following parameterization was used: $\ln\sqrt{\Omega_{11}}$, $\ln\sqrt{\Omega_{22}}$, and $\beta_\rho$, where $\rho = 2(1 + e^{-\beta_\rho})^{-1} - 1$. These reparameterizations decreased the issues associated with boundary constraints of the variance components (ie, 0) and demonstrated improved normality in their sampling distributions (preliminary simulations; data not shown).
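The data-generating process described for the simulation study (onset function, dose effect, two random effects, AR(1S) latent residuals built by recursion, and thresholding of the latent residual) can be sketched as follows. This is an illustrative reconstruction under the parameter values reported in the text, not the code used for the study; all function names are hypothetical, and the sign convention for the random effects follows the conditional probit form with the threshold absorbed.

```python
import math
import random
from statistics import NormalDist

STD = NormalDist()
BETA = (-1.67, 1.25, 2.94, 1.39, 1.20)  # beta_1..beta_5 as reported in the text
OMEGA11, OMEGA22 = 1.0, 2.5             # Omega_12 = 0
TIMES = (1, 2, 4, 8, 16, 24, 36)


def onset(t):
    """Placebo onset f(t) with half-life exp(beta_4)."""
    return 1.0 - math.exp(-math.log(2.0) / math.exp(BETA[3]) * t)


def mu(t, dose):
    """Fixed-effects predictor with the threshold absorbed."""
    return BETA[0] + BETA[1] * onset(t) + BETA[2] * dose / (dose + math.exp(BETA[4]))


def population_prob(t, dose):
    """Population probability of response from the marginal distribution."""
    f = onset(t)
    return STD.cdf(mu(t, dose) / math.sqrt(1.0 + OMEGA11 + OMEGA22 * f * f))


def rho_from_unconstrained(b):
    """Map an unconstrained parameter to a correlation in (-1, 1)."""
    return 2.0 / (1.0 + math.exp(-b)) - 1.0


def simulate_subject(dose, rho, rng):
    """Simulate one subject's binary responses over TIMES."""
    eta1 = rng.gauss(0.0, math.sqrt(OMEGA11))
    eta2 = rng.gauss(0.0, math.sqrt(OMEGA22))
    responses, eps, prev_t = [], 0.0, None
    for t in TIMES:
        innovation = rng.gauss(0.0, 1.0)
        if prev_t is None:
            eps = innovation
        else:
            r = rho ** abs(t - prev_t)          # AR(1S) correlation for this gap
            eps = eps * r + innovation * math.sqrt(1.0 - r * r)
        cutoff = mu(t, dose) - (eta1 + eta2 * onset(t))
        responses.append(1 if eps < cutoff else 0)
        prev_t = t
    return responses


def empirical_rates(dose, rho, n_subjects=4000, seed=2024):
    """Observed response proportion at each time across simulated subjects."""
    rng = random.Random(seed)
    totals = [0] * len(TIMES)
    for _ in range(n_subjects):
        for j, y in enumerate(simulate_subject(dose, rho, rng)):
            totals[j] += y
    return [count / n_subjects for count in totals]


print([round(population_prob(t, 0), 3) for t in TIMES])
print([round(r, 3) for r in empirical_rates(0, 0.5)])
```

The first printed row gives the population probabilities for placebo from the marginal distribution, and the second gives simulated response proportions, which should agree up to sampling error regardless of the value of `rho`.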
Eight versions of the model described in (19a) and (19b) were fitted: M0 used adaptive Gaussian quadrature (AGQ) for the random effects and a 5-point approximation for $\Phi_2$ (see [5]); M1 was the same as M0 except the Laplace approximation (LA)‡ was used instead of AGQ and a 3-point approximation for $\Phi_2$ was used (see [5]); M2 was M1 using AGQ (ie, 3-point approximation for $\Phi_2$); M3 was M2 using LA with $\rho$ constrained to 0; M4 was M3 using AGQ; M5 was M4 using LA with $\Omega_{22}$ constrained to 0, which is generally regarded as the standard model; M6 was M5 using AGQ; and M7 was M6 with $\Omega_{11}$ constrained to 0, a naïve pool model. These models reflect different approximations to the likelihood and simplifications of the stochastic structure. The joint probabilities used in the likelihood for models M0, M1 and M2 are

$$\tilde{p}_{ijj-1}^{(y_{ij},y_{ij-1})} = (y_{ij}-1)(y_{ij-1}-1) + (y_{ij}-1)(1-2y_{ij-1})\Phi(\tilde{m}_{ij-1}) + (y_{ij-1}-1)(1-2y_{ij})\Phi(\tilde{m}_{ij}) + (1-2y_{ij})(1-2y_{ij-1})\Phi_2[\tilde{m}_{ij}, \tilde{m}_{ij-1}, \rho_{ijj-1}(\rho)] \tag{21}$$

where $\tilde{m}_{ij}$ denotes the subject-specific predictor (coded as MX in Appendix A). The joint probability is expressed differently in (21) than in (5d) to facilitate coding in NONMEM. Predictions of the PPs for these models were based on

$$p_{ij}^{(1)} = \Phi(\Xi_{ijj}^{-1/2} m_{ij}) = \Phi\left(\left[\beta_1 + \beta_2 f_{ij} + \beta_3 \frac{d_i}{d_i + e^{\beta_5}}\right] \Big/ v_{ij}\right), \qquad v_{ij} = \sqrt{1 + \Omega_{11} + \Omega_{22} f_{ij}^2} \tag{22}$$

‡ The Laplace approximation is essentially adaptive Gaussian quadrature with 1 quadrature point.

Inspection of (22) reveals that biased predictions could arise because of incorrect specification of $v_{ij}$. When $\Omega_{22}$ is constrained to 0, the modeled time-profile is changed and the parameter estimates of the $\beta$'s could attempt to compensate to fit the data. Three models were introduced to adjust for such biases, and these are based on the structure

$$\tilde{m}_{ij} = \beta_1 + \eta_{1i} + \beta_{2,j}(t_j \ge 2) + \beta_{3,j}\frac{d_i}{d_i + e^{\beta_5}} \tag{23}$$

where separate fixed effects were estimated by time for the placebo and drug components.
These models represent saturated models, essentially fitting different dose-effect profiles by time. M8 used LA, M9 used AGQ, and M10 had $\Omega_{11}$ constrained to 0 (a naïve pool model). PROC NLMIXED in SAS Version 9.3 (SAS Institute Inc., Cary, NC) was used to fit all these models. Recently, the approximations described and implemented above have been made available as a subroutine in NONMEM (ICON Development Solutions, Ellicott City, MD), which is a more standard and general software for pharmacometrics work. A NONMEM control stream is supplied in Appendix A that demonstrates more clearly how these data were fitted. Appendix A also provides a link to a site from which the bivariate normal routine can be downloaded.

Percentage bias in the parameter estimates was calculated. Despite the parameterization discussed above for estimation, biases in $\Omega_{11}$, $\Omega_{22}$ and $\rho$ were computed so that percentage bias could be reported (the skewness of the estimates on this scale did not provide misleading conclusions). Biases in the estimates of the off-diagonal elements of $\Xi$ were evaluated also, as low biases in these should indicate good characterization of the variability or correlation, hence accurate predictions of the TPs. Biases in parameter estimates were only computed for M0 through M7, because of the common parameter structure related to the true simulation model.

Biases in the $p_{ij}^{(1)}$ and the population transition probabilities ($p_{ij}^{(1|0)}$, $p_{ij}^{(0|1)}$), as defined in the Notation of Model Quantities section, were also computed and assessed. A closed form of these is available, and the derivation is provided in Appendix B. These TPs relate to the ability of the model to reproduce patterns in the response transitions within subjects over time.

Predictions of $p_{ij}^{(1)}$ and inferences thereon are often of primary interest in decision making; thus, accuracies of inferences in the $p_{ij}^{(1)}$ were also assessed.
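The arrangement of the joint probability in (21) can be checked numerically. The sketch below is an illustration only: the bivariate normal CDF is computed with the identity $\partial\Phi_2/\partial\rho = \phi_2$ and Simpson's rule, which is an assumption made here for self-containment and is not the 3- or 5-point approximation of [5] used in the study. All function names are hypothetical.

```python
import math


def phi_cdf(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bvn_density(h, k, r):
    """Standard bivariate normal density at (h, k) with correlation r."""
    q = 1.0 - r * r
    return math.exp(-(h * h - 2.0 * r * h * k + k * k) / (2.0 * q)) / (
        2.0 * math.pi * math.sqrt(q))


def phi2_cdf(h, k, rho, n=400):
    """P(Z1 < h, Z2 < k) = Phi(h)Phi(k) + integral_0^rho density(h, k, r) dr.

    Composite Simpson's rule with n (even) intervals; requires |rho| < 1.
    """
    base = phi_cdf(h) * phi_cdf(k)
    if rho == 0.0:
        return base
    step = rho / n
    total = bvn_density(h, k, 0.0) + bvn_density(h, k, rho)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * bvn_density(h, k, i * step)
    return base + total * step / 3.0


def joint_prob(y, y_prev, m, m_prev, rho):
    """Conditional joint probability of (y_ij, y_ij-1), arranged as in (21)."""
    return ((y - 1) * (y_prev - 1)
            + (y - 1) * (1 - 2 * y_prev) * phi_cdf(m_prev)
            + (y_prev - 1) * (1 - 2 * y) * phi_cdf(m)
            + (1 - 2 * y) * (1 - 2 * y_prev) * phi2_cdf(m, m_prev, rho))


def transition_prob(y, y_prev, m, m_prev, rho):
    """P(y_ij | y_ij-1): joint divided by the marginal of the previous response."""
    marginal_prev = phi_cdf(m_prev) if y_prev == 1 else 1.0 - phi_cdf(m_prev)
    return joint_prob(y, y_prev, m, m_prev, rho) / marginal_prev


if __name__ == "__main__":
    cells = [joint_prob(y, yp, 0.3, -0.4, 0.7) for y in (0, 1) for yp in (0, 1)]
    print(cells, sum(cells))
```

The single expression in (21) selects the correct cell of the 2 x 2 table through the factors (y − 1) and (1 − 2y), so no branching on the observed responses is needed; this is the property that makes it convenient to code in a control stream.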
Two methods were used to compute 90% confidence intervals (CI). The first method used was a smoothed bootstrap technique [8]. For each of 1000 replicates, a vector of parameters was sampled from a multivariate normal distribution that had the mean vector equal to the estimate vector and covariance matrix equal to the output from the covariance step (covariance matrix of the estimates). This vector was plugged into the model to make predictions across the doses and times for the replicate. For each dose and time combination, the 5th and 95th percentiles across the distribution generated by the replicates were used as the 90% CI bounds. The second method used was the delta-method, which is based on linearization [9]. The closed form of $p_{ij}^{(1)}$ allowed straightforward implementation. The details are supplied in Appendix C. A delta-method technique is also provided therein for general non-closed form predictions (eg, if the logit link were used). Such a method might be able to be incorporated into software for general, quick and convenient calculation of CIs of population means, which could be output in a table file. The procedure was not evaluated here, yet may be of potential use and so future research is warranted.

Results

The results for $\rho = 0.0$ and $\rho = 0.7$ are presented and contrasted here. Results for the other $\rho$ scenarios are presented in the Supplemental Material. Model fittings that did not converge or did not provide standard errors were discarded from the results (maximum was only 0.6% of the fittings for a specific scenario and model).

For $\rho = 0.7$, M0 and M2 showed the smallest magnitudes of bias (percentage) in the fixed effects $\beta$. The 5-point approximation to $\Phi_2$ did not provide much benefit over the 3-point. M1 (using LA) showed slightly increased biases comparatively. M3 and M4, in which $\rho$ is fixed to 0, had greater positive biases, with AGQ (M4) not providing much benefit over LA (M3).
M5 and M6 showed lower magnitudes of bias, except for the larger negative biases in $\beta_2$ and $\beta_4$, which reflect the maximum placebo effect and its onset. The design might not have been as rich in information regarding these parameters. The decrease is surprising perhaps, because these models did not estimate $\rho$ or $\Omega_{22}$. The naïve pool model, M7, had the largest biases, which were negative. M0 and M1 had the smallest biases in $\Omega_{11}$, $\Omega_{22}$ and $\rho$, also as expected. Biases in estimates from models where parameters were constrained to 0 are −100% by definition ($\rho$ for M3 through M7, $\Omega_{22}$ for M5 and M6, $\Omega_{11}$ for M7). Given that some of these models represent simplifications of the true model, meaningful comparisons of the biases for these parameters are difficult.

Biases in the off-diagonal elements of $\Xi$ are also provided in Figure 2. These provide an idea of how well the correlations are being estimated. Models M0, M1 and M2 showed almost no bias. Models M3 and M4 demonstrated a consistent positive bias across all the elements, which resulted from the imposed constraint of $\rho = 0$. AGQ (M4) did not improve these estimates. M5 and M6 had jagged patterns of bias, with some biases < 0, and again AGQ provided no benefit. M7 did not provide estimates of the off-diagonals, because these are assumed to be 0 by the model; thus, all were reported as −100%. One wonders if the lower biases observed for M3 in the $\beta$ compared to M2 were because, for some reason, M3 provided a better approximation to $\Xi$ in some average sense.

The biases in $p^{(1)}$ on the probability scale are displayed in the first row of Figure 3, where the segments reflect the minimum to maximum bias (range) across the doses and times and the points represent the median across these. M0, as might be anticipated, is nearly unbiased. M1, using LA and a 3-point approximation, showed slightly increased bias compared to M0.
M2 (AGQ) provided similar results to M0, which suggests that the 3-point is nearly as good as the 5-point approximation. M3 and M4 had a little larger bias than the previous models, which was due to not estimating $\rho$. The biases were not egregious; the biases for M4 were all within ±0.01. The biases for M2 and M4 fitted to the $\rho = 0$ scenario were similar to those of M0 (all used AGQ), and M1 and M3 showed a little larger range of bias. Note that M0 through M4 are either the true model (subject to approximation) or contain it for $\rho = 0$. M5 shows an increase in the biases independent of $\rho$; $\Omega_{22}$ was not estimated in this model. Using AGQ (M6) did not improve these. M7 (naïve pooled) had nearly the same biases for $p^{(1)}$ as M5 and M6. The saturated models M8, M9 (AGQ), and M10 (naïve pooled) demonstrated little bias for either scenario of $\rho$.

Biases in the population transition probabilities $p^{(1|0)}$ and $p^{(0|1)}$ are displayed in the second and third rows of Figure 3, respectively. The patterns between these across the models are similar. M0, M1 and M2 demonstrated little bias; AGQ decreased the bias comparatively. M3 through M6 for $\rho = 0.7$ and M5 and M6 for $\rho = 0$ demonstrated greater bias. M3 and M4 for $\rho = 0$ yielded similar results to M1 and M2 as expected. M7 and M10 (naïve pooled) demonstrated the greatest biases. M8 and M9 had smaller biases compared to those models; the estimation of $\Omega_{11}$ improved prediction of $p^{(1|0)}$ and $p^{(0|1)}$.

The fourth and final row of Figure 3 shows the range of 90% CI coverages across the doses and times by model. M0 and M2 (AGQ) performed the best independent of $\rho$, and M4 performed similarly to M0 and M2 for $\rho = 0$ as expected. M1 (and M3 for $\rho = 0$), M8, and M9 had less overall coverage comparatively, with M8 and M9 performing slightly better than M1 (or M3 for $\rho = 0$). Even though M1 (and M3 for $\rho = 0$) is the true model, the LA degraded the coverage rates.
This is likely due to estimating $\Omega_{22}$, because M8 used LA but only estimated $\Omega_{11}$. The 90% CI rates for M8 and M9 are not egregious, despite the model form not being the true model. The saturated nature of the model compensated, even with systematically biased TPs. The biases increase for M3, M4 ($\rho = 0.7$) and M10. M5 through M7 demonstrated poor coverages independent of $\rho$, which is due to the larger biases in predicting $p^{(1)}$ and also because only 1 or 0 variance components were estimated. These models likely did not adequately capture the correlation structure and thus inflated the amount of information used in calculating the precisions of the estimates. The use of AGQ in M6 does not improve the outcome compared to M5. Based on the coverage rates, extreme caution would be needed when making inferences using M5 through M7.

Discussion

The beginning of this article demonstrated that the Markov and latent variable models are related when there is autocorrelation. Because both methods use the Markov property and thus could be considered Markov models, it was suggested that a different terminology be used when referring to these approaches, and that this terminology be aligned with the structural orientation of the model. Models which focus on the structure of transition probabilities could be termed state space models and those that focus on marginal probabilities could be termed marginal probability models. Details on transition probabilities, achieving steady state of a Markov process, and handling missing data were provided to help delineate differences which manifest between the two approaches. A simple simulation was presented to compare and contrast these as well. The influence of the assumption of the initial condition for the Markov chain was illustrated. Overall, the intent was not to claim superiority of either method.
In general, given the data pharmacometricians encounter typically, one will not know which approach is better. Discussing the underlying concepts of these approaches hopefully will help inform pharmacometricians when thinking of strategies for modeling such data. For example, such issues could be important if the goal of a model is to be able to predict results from a study if dosage levels are changed based on observed responses or to predict the difference in responder rate at trial conclusion.

A hybrid model which specified marginal probabilities, yet used fixed effects (non-stochastic components) to address the autocorrelation, was presented also. It is unknown if such a model has been discussed in the literature. The hybrid approach could be used for other link functions and also for more in-depth looks at autocorrelation, such as decay in time (not handled by simple SS models). Future exploration of this method might be of interest, because it has the potential to realize the benefits from both approaches.

Continuous time SS models were not discussed here (see Pilla Reddy et al for example [10]). These models are more complex to implement, and hence were beyond the scope of this article. Continuous time SS models are attractive in that these do not have the complications of discrete time SS models when accounting for missing data, and these are more flexible in dealing with autocorrelation structures.

Autocorrelation of the latent residuals was presented for dealing with stochastic autocorrelation. An approximation for the bivariate normal enabled modeling this through an autocorrelation function. This method considers correlation on the latent scale. A method that was not presented, yet could be used, is one that models autocorrelation of the responses instead of through the latent residuals.
It is based on the relation

$$\mathrm{Cor}(Y_1, Y_2) = \frac{E(Y_1 Y_2) - E(Y_1)E(Y_2)}{SD(Y_1)\,SD(Y_2)} = \frac{p_{12}^{(1,1)} - p_1^{(1)} p_2^{(1)}}{\sqrt{p_1^{(1)}(1-p_1^{(1)})}\sqrt{p_2^{(1)}(1-p_2^{(1)})}} \tag{24}$$

where Cor(.) is the correlation operator, E(.) is the expectation operator, and SD(.) is the standard deviation operator. Setting $\mathrm{Cor}(Y_1, Y_2) = \rho_{12}(\rho)$ and noting $p_{12}^{(1,0)} = p_1^{(1)} - p_{12}^{(1,1)}$, the joint probabilities $p_{12}^{(y_1,y_2)}$ can be calculated and used in the likelihood. Table 1 illustrates the relation between correlation on the latent and response scales for the simulations used in the article. This approach is also Markov based (see Guerra et al [11]) and could be used for logistic or other link functions. This method was not developed further here because of the difficulty in generalizing it to ordered categorical data. Additionally, correlation on the latent residual scale seems more analogous to continuous data, despite the restriction to the probit link function.

The simulation, conducted originally to evaluate the effects of neglecting stochastic autocorrelation in the model, which leads to biased transition probabilities, provided considerable information on strategies for modeling longitudinal binary data. These results did not appear contradictory to those reported by Johansson et al, insofar as these two reports can be compared (some different objectives and estimation methods were used) [12]. The main objective here was to evaluate predictions and their inferences, which were shown to be a function of all the parameters, and so biases in any could result in biased predictions. If the results from this study can be generalized, these would imply the following, some of which might be surprising given the current standard for modeling such data.

The 3-point approximation to the bivariate normal is nearly as good as the 5-point. Figure 3 demonstrates this for $\rho = 0.7$, and it can be verified even for $\rho = 0.9$ (see the Supplemental Material).
Values of $\rho > 0.9$ are likely needed before a difference is perceived. Nevertheless, unless run times are an issue (for the simulation study, the run times were similar), the 5-point is recommended in case $\rho$ happens to be large (albeit unlikely).

Similarly, one should use the best approximation available for estimating the random effects. Adaptive Gaussian quadrature demonstrated less bias than using the Laplace approximation, except when the random effects structure was misspecified. There was no information to suggest that using a better approximation could be harmful for incorrect models. In fact, biases were lower for the saturated models when using adaptive Gaussian quadrature and 90% CI coverage rates were improved. Better approximation methods might require longer run times, however. If this is not an issue, better approximations should be pursued when possible.

Unbiased prediction of the transition probabilities was not necessary to get unbiased predictions (relatively) of the population mean probabilities, nor was it necessary to achieve reasonable 90% CI coverage rates and hence inferences. Addition of fixed effects parameters is required to attain good predictions and inferences without characterizing the transition probabilities (or the latent marginal variance structure). Thus, not characterizing the transition probabilities will lead to developing parsimonious models. This point is furthered below. Also, unless $\rho$ is substantial (> 0.5 based on the simulation results presented here; see the Supplemental Material), not including autocorrelation in the model should not affect marginal probabilities or inferences egregiously. Characterization of the marginal variance or transition probabilities may still be of interest, if the purpose of the model is simulating or predicting individual response profiles that incorporate changes in dosage levels which are triggered by observed responses.
Using random effects and autocorrelation should be flexible enough to deal with many situations frequently encountered in pharmacometrics, and thus should help reduce bias [13]. This leads to the next, and most critical, result from the simulation study.

The key learning was this. Incorporating random effects when necessary is essential to reduce biases in predictions of the probabilities of response. Failure to include the random effect on the non-drug model component in the simulation study led to much greater biases than omitting autocorrelation (ie, M5 through M7). The probit formulation provides insight into why this is the case. The marginal variability on the probit scale directly influences the population mean probability predictions (not including autocorrelation influences the predictions indirectly due to bias in the random effects). Consequently, incorrect random effects structures lead to bias. Bias from failure to include necessary random effects can be reduced at the cost of additional fixed effects. The saturated models, which had positively biased transition probabilities, were unbiased in the population mean predictions. However, if one wants to pursue a parsimonious model, random effects should be considered on all model components, similar to models posited for continuous data. This makes sense intuitively. As time progresses during a trial and as subjects receive benefit from treatment, heterogeneity is introduced. Not every subject is anticipated to have the same benefit from the same dose, which is modeled by including a random effect on the drug effect to account for different maximum effects a drug can bestow. Although drug-related random effects were not considered in the simulation, it is not a stretch to infer that failure to incorporate such effects will lead to biases in the predictions of drug effect and drug response.
Failure to incorporate these might lead to the incorporation of more fixed effects (ie, model expansion) to get an adequate fit to the data, thwarting interpretation and efficiency.

If drug does introduce heterogeneity or variability manifested as a change in correlation structure as a function of dose, then this has implications for recent suggestions regarding parameterization of indirect response models. The change from baseline parameterization [6] is convenient for modeling data from placebo and then, in a sequential fashion, incorporating data following active treatment, because the placebo model component absorbs the drug effect parameters when dose or concentration is 0. Making this change does not ensure that the variabilities will be the same, which could result in bias as discussed above. For example, think of a simple indirect response model (IRM) [14] parameterized as $(\beta_1 + \eta_1) + (\beta_2 + \eta_2)R(t)$ or, using change from baseline, as $(\tilde{\beta}_1 + \tilde{\eta}_1) + (\tilde{\beta}_2 + \tilde{\eta}_2)[R(t) - 1]$, where $R(t)$ is the solution of an IRM which assumes inhibition (ie, $R(t = 0) = 1$ and $R(t > 0) < 1$). The first model posits that variability in baseline response is due to disease and that the drug effect will remove this component of variability from the response. To use the change from baseline parameterization in this case, care must be taken when postulating $\Omega$. In fact, a full $\Omega$ matrix might be needed to avoid bias. Continuing this argument, if one considers a partial inhibition of production of response (ie, $k_{in} = 1 - I_{max}(\eta) \cdot C(t)/(EC_{50} + C(t))$) that is a function of a random effect, change from baseline will not necessarily lead to the same result as if a stimulatory effect had been assumed (ie, $k_{in} = 1 + S_{max}(\eta) \cdot C(t)/(EC_{50} + C(t))$), even if change from baseline is used and despite the ability to achieve a similar predicted outcome potentially.
The reason is that the assumption of the random effects between the two models will lead to a different marginal variance. Thus, when considering random effects and hence the population, the model space is a little richer than that suggested by Hu et al (see [6]) in that there are more than 3 possible IRM models.

In summation, this research suggests that more random effects should be evaluated when evaluating longitudinal binary or categorical data. Failure to incorporate random effects or incorrect specification thereof can lead to biased predictions and less than nominal inference rates.

ACKNOWLEDGEMENT

The authors would like to acknowledge graciously and thank kindly Dr. Robert Bauer for making the bivariate normal subroutine available for general use in NONMEM.

REFERENCES

1. Sheiner LB (1994) A new approach to the analysis of analgesic drug trials, illustrated with bromfenac data. Clin Pharmacol Ther 56:309–322
2. Lacroix BD, Lovern MR, Stockis A, Sargentini-Maier ML, Karlsson MO, Friberg LE (2009) A pharmacodynamic Markov mixed-effects model for determining the effect of exposure to certolizumab pegol on the ACR20. Clin Pharmacol Ther 86:387–395
3. Hutmacher MM, French JL (2011) Extending the latent variable model for extra correlated longitudinal dichotomous responses. J Pharmacokinet Pharmacodyn 38:833–859
4. Hutmacher MM, Krishnaswami S, Kowalski KG (2008) Exposure-response modeling using latent variables for the efficacy of a JAK3 inhibitor administered to rheumatoid arthritis patients. J Pharmacokinet Pharmacodyn 35:139–157
5. Drezner Z, Wesolowsky GO (1990) On the computation of the bivariate normal integral. J Stat Comput Simul 35:101–107
6. Hu C, Xu Z, Mendelsohn AM, Zhou H (2013) Latent variable indirect response modeling of categorical endpoints representing change from baseline. J Pharmacokinet Pharmacodyn 40:81–91
7. Hu C (2014) Exposure–response modeling of clinical end points using latent variable indirect response models. CPT Pharmacometrics Syst Pharmacol 3:1–8
8. Yu L, Griffith WS, Tyas SL, Snowdon DA, Kryscio RJ (2010) A nonstationary Markov transition model for computing the relative risk of dementia before death. Stat Med 29:639–648
9. Bishop YMM, Fienberg SE, Holland PW (1975) Discrete multivariate analysis: theory and practice. The MIT Press, Massachusetts, pp 486–502
10. Pilla Reddy V, Petersson KJ, Suleiman AA, Vermeulen A, Proost JH, Friberg LE (2012) Pharmacokinetic–pharmacodynamic modeling of severity levels of extrapyramidal side effects with Markov elements. CPT Pharmacometrics Syst Pharmacol 1:1–9
11. Guerra MW, Shults J, Amsterdam J, Ten-Have T (2012) The analysis of binary longitudinal data with time-dependent covariates. Stat Med 31:931–948
12. Johansson ÅM, Ueckert S, Plan EL, Hooker AC, Karlsson MO (2014) J Pharmacokinet Pharmacodyn 41:223–238
13. Gurka MJ, Edwards LJ, Muller KE (2011) Avoiding bias in mixed model inference for fixed effects. Stat Med 30(22):2696–2707
14. Dayneka NL, Garg V, Jusko WJ (1993) Comparison of four basic models of indirect pharmacodynamic response. J Pharmacokinet Biopharm 21(4):457–478

Appendices

Appendix A

The bivariate normal subroutine is available at ftp://nonmem.iconplc.com/Public/nonmem/bivariate/ along with some sample control streams. Below is an example control stream that corresponds to the simulation study.
$PROB Example Control Stream
$INPUT SIM ID DOSE DV DV_ TIME TIME_
;***DV_, TIME_ are previous DV, TIME***;
;***ie, DV = y_ij, DV_ = y_ij-1, TIME = t_ij, TIME_ = t_ij-1***;
$DATA ex01.csv IGNORE=@
$SUBROUTINES OTHER=bivariate.f90 ;***Include bivariate normal subroutine***;
$PRED
B1=THETA(1) ;***beta1***;
B2=THETA(2) ;***beta2***;
B3=THETA(3) ;***beta3***;
K =LOG(2)/EXP(THETA(4)) ;***used for placebo onset f(t_ij)***;
ED50=EXP(THETA(5))
ET1=EXP(THETA(6))*ETA(1) ;***parameterized for smoothed parametric bootstrap***;
ET2=EXP(THETA(7))*ETA(2)
RHO=(2/(1+EXP(-THETA(8)))-1)**(TIME-TIME_) ;***correlation function rho_ijj'(rho)***;
U =(1-EXP(-K*TIME))
U_ =(1-EXP(-K*TIME_))
MX =B1+B2*U+B3*DOSE/(DOSE+ED50)+ET1+ET2*U ;***subject-specific predictor at time j***;
MX_ =B1+B2*U_+B3*DOSE/(DOSE+ED50)+ET1+ET2*U_ ;***subject-specific predictor at time j-1***;
PC =(1-PHI(MX))*(1-DV) + PHI(MX)*DV ;***p~(0)_ij*(1-y_ij) + p~(1)_ij*y_ij***;
PC_ =(1-PHI(MX_))*(1-DV_) + PHI(MX_)*DV_ ;***p~(0)_ij-1*(1-y_ij-1) + p~(1)_ij-1*y_ij-1***;
IF(PC.LE.0.0D+00) EXIT ;***avoid issues of LOG(0)***;
IF (TIME.EQ.1) THEN ;***signifies first observation***;
LOGL=LOG(PC)
ELSE
;***Start Passing Bivariate Normal Information - only do so on subsequent records***;
VECTRA(1)=RHO
VECTRA(2)=MX
VECTRA(3)=MX_
VECTRA(4)=1 ;***0 = Upper tail as in Drezner & Wesolowsky; 1 = Bottom tail***;
VECTRA(5)=1 ;***0 = 3 pt approximation; 1 = 5 point approximation***;
BV=FUNCA(VECTRA) ;***return bivariate normal results***;
;***code joint probability based on bottom tail***;
JP=(DV-1)*(DV_-1)+(DV-1)*(1-2*DV_)*PHI(MX_)+(DV_-1)*(1-2*DV)*PHI(MX)+(1-2*DV)*(1-2*DV_)*BV
LOGL=LOG(JP/PC_)
ENDIF
;***Compute population predictions***;
V=SQRT(1+EXP(THETA(6))**2+EXP(THETA(7))**2*U**2)
;***V=SQRT(1+OMEGA(1,1)+OMEGA(2,2)*U**2) use if estimating OMEGA***;
POPP = (B1+B2*U +B3*DOSE/(DOSE+ED50))/V ;***Population mean prediction***;
;***Set up log-likelihood***;
$THETA
-2.4 ; 1 B1
2.0 ; 2 B2
3.4 ; 3 B3
1.4 ; 4 LOG(B4)
0.7 ; 5 LOG(B5)
0.1 ; 6 LOG SQRT VAR(ETA1)
0.5 ; 7 LOG SQRT VAR(ETA2)
0.6 ; 8 RHO parameter
$OMEGA DIAGONAL(2)
1 FIXED ; V1
1 FIXED ; V2
$EST MAX=8000 PRINT=10 METHOD=1 LAPLACE -2LL SIGL=10 NOHABORT
;***$EST METHOD=IMP LAPLACE -2LL NOHABORT PRINT=1 NITER=500 CTYPE=3
$COV COMPRESS MATRIX=R PRINT=E UNCONDITIONAL

A version of this control stream which uses features of NONMEM to generate DV_, MX_ etc is provided in the Supplemental Material. That version should be helpful when dealing with models that employ differential equations.

Appendix B

The calculation of bias for the population transition probabilities (eliminating the subscript i for simplicity and indexing two arbitrary times by 1 and 2) requires the quantity $E_\eta \Phi_2(\tilde{m}_1, \tilde{m}_2, \rho_{12}(\rho))$, where $\tilde{m}_1 = m_1 - \mathbf{z}_1\boldsymbol{\eta}$ (and similarly for $\tilde{m}_2$). From the latent variable formulation

$$E_\eta \Phi_2 = \int \Phi_2(\tilde{m}_1, \tilde{m}_2, \rho_{12}(\rho))\, dF(\boldsymbol{\eta}) = \int_{-\infty}^{\infty} \int_{-\infty}^{\tilde{m}_1} \int_{-\infty}^{\tilde{m}_2} \phi_2(z_1, z_2, \rho_{12}(\rho))\, dz_2\, dz_1\, dF(\boldsymbol{\eta})$$

where $F$ is the $N(0, \Omega)$ distribution of $\boldsymbol{\eta}$ and

$$\phi_2(z_1, z_2, \rho_{12}(\rho)) = \frac{1}{2\pi\sqrt{1-\rho_{12}(\rho)^2}} \exp\left\{-\frac{z_1^2 - 2\rho_{12}(\rho) z_1 z_2 + z_2^2}{2(1-\rho_{12}(\rho)^2)}\right\}.$$

Changing the order of integration and making a suitable change of variables based on the square root of the diagonal elements of $\Xi$ (based on times 1 and 2 only),

$$E_\eta \Phi_2 = \Phi_2\left(\Xi_{11}^{-1/2} m_1,\; \Xi_{22}^{-1/2} m_2,\; r_{12}(\rho, \Omega)\right)$$

where $r_{12}(\rho, \Omega)$ is the off-diagonal from $[\mathrm{diag}(\Xi)]^{-1/2}\, \Xi\, [\mathrm{diag}(\Xi)]^{-1/2}$, which for the example yields $r_{12}(\rho, \Omega) = (\rho_{12}(\rho) + \Omega_{11} + \Omega_{22} f_1 f_2)\, \Xi_{11}^{-1/2}\, \Xi_{22}^{-1/2}$. Monte Carlo methods could also be used:

$$p_{ij}^{(1|0)} \cong \left[p_{ij-1}^{(0)}\right]^{-1} \left[\frac{1}{M}\sum_{k=1}^{M} \tilde{p}_{ijj-1}^{(1,0)*}\right] \quad \text{and} \quad p_{ij}^{(0|1)} \cong \left[p_{ij-1}^{(1)}\right]^{-1} \left[\frac{1}{M}\sum_{k=1}^{M} \tilde{p}_{ijj-1}^{(0,1)*}\right],$$

where the '*' indicates the probability is a function of $\boldsymbol{\eta}_k^*$, which is sampled from $N(0, \Omega)$, with $\Omega$ estimated, using a sufficiently large M.
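The closed form of Appendix B can be compared against the Monte Carlo alternative for a single pair of time points. The sketch below is an illustration under stated assumptions: the bivariate normal CDF is evaluated with Simpson's rule applied to the identity $\partial\Phi_2/\partial\rho = \phi_2$ (not the approximation of [5]), $\Omega_{12} = 0$ as in the simulation study, and all function names and input values are hypothetical.

```python
import math
import random


def phi_cdf(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bvn_density(h, k, r):
    """Standard bivariate normal density at (h, k) with correlation r."""
    q = 1.0 - r * r
    return math.exp(-(h * h - 2.0 * r * h * k + k * k) / (2.0 * q)) / (
        2.0 * math.pi * math.sqrt(q))


def phi2_cdf(h, k, rho, n=200):
    """Phi(h)Phi(k) + integral_0^rho density dr, by composite Simpson (n even)."""
    base = phi_cdf(h) * phi_cdf(k)
    if rho == 0.0:
        return base
    step = rho / n
    total = bvn_density(h, k, 0.0) + bvn_density(h, k, rho)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * bvn_density(h, k, i * step)
    return base + total * step / 3.0


def closed_form(m1, m2, f1, f2, rho12, omega11, omega22):
    """E_eta Phi2 via the standardized means and the correlation r12 from Xi."""
    xi11 = 1.0 + omega11 + omega22 * f1 * f1
    xi22 = 1.0 + omega11 + omega22 * f2 * f2
    r12 = (rho12 + omega11 + omega22 * f1 * f2) / math.sqrt(xi11 * xi22)
    return phi2_cdf(m1 / math.sqrt(xi11), m2 / math.sqrt(xi22), r12)


def monte_carlo(m1, m2, f1, f2, rho12, omega11, omega22, draws, seed=1):
    """Average the conditional Phi2 over eta sampled from N(0, Omega)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(draws):
        eta1 = rng.gauss(0.0, math.sqrt(omega11))
        eta2 = rng.gauss(0.0, math.sqrt(omega22))
        total += phi2_cdf(m1 - (eta1 + eta2 * f1),
                          m2 - (eta1 + eta2 * f2), rho12, n=100)
    return total / draws


if __name__ == "__main__":
    args = (-0.5, 0.2, 0.5, 0.9375, 0.24, 1.0, 2.5)
    print(closed_form(*args), monte_carlo(*args, draws=4000))
```

The two values should agree up to Monte Carlo error, which shrinks at the usual rate of one over the square root of the number of draws; the closed form avoids that sampling noise altogether.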
- 36 - State Space and Marginal Probability Models Appendix C ฬ ) correspond to their estimates, Let ๐ = (๐ฝ, Ω) be the vector of parameters and ๐ฬ = (๐ฝฬ , Ω ฬ. Then the variance of ฬ (๐ฬ) = Δ which have an estimated covariance matrix (eg, COV step) ๐๐๐ the prediction can be calculated using (1) (1) ฬ [Φ−1(๐๐๐ ๐๐๐ )] ≈ ∂ [Φ−1 (๐๐๐ )] ∂๐ ๐ (1) ฬ Δ | ∂ [Φ−1 (๐๐๐ )] ∂๐ | ฬ ๐=๐ ฬ ๐=๐ such that the confidence limit (eg, at 5th percentile) is (1) ฬ [Φ−1 (๐(1) )]] ๐ถ๐ฟ0.05 = Φ [Φ−1 (๐๐๐ ) + ๐0.05 โ √๐๐๐ ๐๐ where ๐0.05 is the quantile corresponding to probability level 0.05. To apply this method when a closed form solution to the population mean is not available, suitable regulatory conditions are required. In general, let ๐ธ(๐|๐) = ๐(๐) and ๐ธ(๐; ๐) = ๐ธ๐ (๐(๐); ๐) where inference on ๐ธ(๐) is desired. Applying the delta-method ๐ ∂[๐ธ(๐; ๐)] ∂[๐ธ(๐; ๐)] ฬ ฬ [๐ธ(๐; ๐)] ≈ ๐๐๐ | Δ | ∂๐ ∂๐ ฬ ฬ ๐=๐ ๐=๐ where passing the differentiation through the integration ๐ ∗ ); ∂[๐ธ๐ (๐(๐); ๐)] ๐ธ๐ [∂(๐(๐); ๐)] ∂[๐ธ(๐; ๐)] ∂(๐(๐๐ ๐) | = | = | ≅∑ | ∂๐ ∂๐ ∂๐ ∂๐ ฬ ฬ ฬ ฬ ๐=๐ ๐=๐ ๐=๐ ๐=๐ ๐=1 - 37 - State Space and Marginal Probability Models ฬ ). Finite differences could be used, for example where ๐ ∗ ~๐๐๐(0, Ω ∗ );๐) ∂(๐(๐๐ ∂๐ = ∗ ); ∗ ); [(๐(๐๐ ๐ + ๐ฟ) − (๐(๐๐ ๐)]/๐ฟ. Appendix D Let the response ๐ be one of K+1 ordered values 0, 1, 2, … , K-1, K, and let Z be defined as in equation (16). For this derivation (suppressing ๐ when convenient and with some convenient abuse of notation), assume the mapping ๐(๐ = 0) = ๐(๐พ1 < ๐ < ๐พ0 ), … , ๐(๐ = ๐พ) = ๐(๐พ๐พ+1 < ๐ ≤ ๐พ๐พ ) where the ๐พ0 = ∞ > ๐พ1 > โฏ > ๐พ๐พ−1 > ๐พ๐พ > ๐พ๐พ+1 = −∞ are thresholds (K parameters), and let ๐ฬ ๐ฆ๐ = ๐พ๐ฆ๐ − ๐๐ − ๐๐ ๐ for time j and ๐12 = ๐12 (๐). 
Then

$$\tilde{p}_1^{(y_1)}(\eta) = P(Y_1 = y_1) = P(\gamma_{y_1+1} < Z_1 \le \gamma_{y_1}) = \Phi(\tilde{c}_{y_1}) - \Phi(\tilde{c}_{y_1+1}), \qquad \sum_{k=0}^{K} \tilde{p}_1^{(k)} = 1,$$

$$\tilde{p}_{12}^{(y_1,y_2)} = P(Y_1 = y_1, Y_2 = y_2) = P(\gamma_{y_1+1} < Z_1 \le \gamma_{y_1},\ \gamma_{y_2+1} < Z_2 \le \gamma_{y_2})$$
$$= \Phi_2(\tilde{c}_{y_1}, \tilde{c}_{y_2}, \rho_{12}) - \Phi_2(\tilde{c}_{y_1+1}, \tilde{c}_{y_2}, \rho_{12}) - \Phi_2(\tilde{c}_{y_1}, \tilde{c}_{y_2+1}, \rho_{12}) + \Phi_2(\tilde{c}_{y_1+1}, \tilde{c}_{y_2+1}, \rho_{12})$$

where $\Phi(-\infty) = 0$, $\Phi(\infty) = 1$, and in this notation $\Phi_2(\infty, \infty, \rho_{12}) = 1$; $\Phi_2(x, \infty, \rho_{12}) = \Phi_2(\infty, x, \rho_{12}) = \Phi(x)$; and $\Phi_2(x, -\infty, \rho_{12}) = \Phi_2(-\infty, x, \rho_{12}) = \Phi_2(-\infty, \infty, \rho_{12}) = \Phi_2(\infty, -\infty, \rho_{12}) = 0$.

Table 1
Correlations for the Latent Variables (Z) and Responses (Y)

               Correlation in LV (Z)           Correlation in Responses (Y)^a
ρ / Element^b  0     0.3   0.5   0.7   0.9     0          0.3        0.5        0.7        0.9
1,2            0.52  0.66  0.76  0.85  0.94    0.30-0.34  0.40-0.47  0.48-0.55  0.59-0.64  0.75-0.78
1,3            0.52  0.53  0.57  0.66  0.83    0.29-0.34  0.31-0.35  0.33-0.39  0.41-0.45  0.56-0.61
1,4            0.49  0.49  0.49  0.52  0.67    0.27-0.32  0.28-0.32  0.27-0.32  0.29-0.33  0.40-0.45
1,5            0.47  0.47  0.47  0.47  0.54    0.25-0.29  0.25-0.29  0.26-0.31  0.26-0.29  0.31-0.34
1,6            0.46  0.46  0.46  0.46  0.49    0.24-0.30  0.26-0.29  0.25-0.29  0.25-0.29  0.26-0.32
1,7            0.46  0.46  0.46  0.46  0.47    0.25-0.30  0.25-0.29  0.25-0.29  0.25-0.29  0.25-0.30
2,3            0.57  0.60  0.67  0.77  0.90    0.34-0.38  0.36-0.41  0.41-0.47  0.50-0.56  0.68-0.71
2,4            0.56  0.56  0.57  0.61  0.76    0.31-0.37  0.33-0.37  0.33-0.38  0.36-0.42  0.49-0.54
2,5            0.55  0.55  0.55  0.56  0.63    0.31-0.36  0.32-0.36  0.33-0.36  0.32-0.37  0.38-0.42
2,6            0.55  0.55  0.55  0.55  0.58    0.31-0.36  0.32-0.35  0.32-0.36  0.32-0.36  0.34-0.39
2,7            0.55  0.55  0.55  0.55  0.56    0.31-0.36  0.32-0.35  0.31-0.36  0.31-0.36  0.31-0.36
3,4            0.65  0.65  0.67  0.73  0.87    0.39-0.44  0.39-0.45  0.41-0.47  0.46-0.51  0.63-0.66
3,5            0.65  0.65  0.65  0.66  0.74    0.38-0.45  0.40-0.45  0.39-0.45  0.41-0.45  0.48-0.52
3,6            0.65  0.65  0.65  0.65  0.69    0.39-0.45  0.40-0.45  0.40-0.45  0.40-0.45  0.43-0.48
3,7            0.65  0.65  0.65  0.65  0.66    0.40-0.45  0.40-0.44  0.40-0.45  0.39-0.45  0.40-0.46
4,5            0.73  0.73  0.73  0.74  0.84    0.46-0.51  0.46-0.52  0.46-0.52  0.48-0.53  0.59-0.64
4,6            0.73  0.73  0.73  0.73  0.78    0.47-0.52  0.47-0.53  0.47-0.52  0.47-0.53  0.52-0.56
4,7            0.73  0.73  0.73  0.73  0.75    0.47-0.51  0.48-0.53  0.49-0.52  0.48-0.53  0.48-0.53
5,6            0.77  0.77  0.77  0.78  0.87    0.51-0.56  0.51-0.56  0.50-0.56  0.52-0.57  0.63-0.67
5,7            0.77  0.77  0.77  0.77  0.80    0.50-0.55  0.51-0.56  0.50-0.56  0.51-0.56  0.53-0.59
6,7            0.78  0.78  0.78  0.78  0.84    0.52-0.56  0.51-0.56  0.51-0.57  0.52-0.57  0.58-0.63

a. Correlation in responses calculated by simulation (n=20,000 per dose group); the range reported is over dose groups (0 to 30 mg).
b. Element refers to the indices of the matrices: eg, 2,7 is the 2nd row, 7th column. Diagonals of the matrices = 1 by definition.

Captions for Figures

Fig. 1. Results for the illustration. Black line is the true value. Grey lines with numbers correspond to results from Models 1 to 5, respectively.

Fig. 2. Percentage bias in the parameter estimates for models M0 through M7.

Fig. 3. Range of bias (line segment) and median bias (point) across doses and times over the 1000 simulations for the population predictions ($p^{(1)}$, first row) and transition probabilities ($p^{(1|0)}$ and $p^{(0|1)}$, second and third rows, respectively). The last row corresponds to 90% CI coverage rates across doses and times over the simulations. Black is for the smoothed parametric bootstrap and grey is for the delta method.
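As a supplementary illustration of the delta-method confidence limit in Appendix C, the following Python sketch may help. It assumes a hypothetical closed-form population probability $p = \Phi(\beta_0 + \beta_1 \cdot \text{dose})$. The model, the parameter estimates, the covariance matrix, and all function names are illustrative assumptions and are not taken from the analyses reported here. The quantile is taken to be the standard normal quantile, and the gradient is approximated by forward finite differences.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: cdf is Phi, inv_cdf is Phi^{-1}


def probability(theta, dose):
    """Hypothetical population probability p = Phi(b0 + b1 * dose)."""
    b0, b1 = theta
    return STD_NORMAL.cdf(b0 + b1 * dose)


def probit_gradient(theta, dose, step=1e-5):
    """Forward finite-difference gradient of Phi^{-1}(p) with respect to theta."""
    base = STD_NORMAL.inv_cdf(probability(theta, dose))
    grad = []
    for k in range(len(theta)):
        shifted = list(theta)
        shifted[k] += step
        grad.append((STD_NORMAL.inv_cdf(probability(shifted, dose)) - base) / step)
    return grad


def probit_variance(theta_hat, cov_hat, dose):
    """Delta-method variance: gradient' * covariance * gradient."""
    g = probit_gradient(theta_hat, dose)
    n = len(g)
    return sum(g[r] * cov_hat[r][c] * g[c] for r in range(n) for c in range(n))


def confidence_limit(theta_hat, cov_hat, dose, level=0.05):
    """Confidence limit for p, built on the probit scale and back-transformed."""
    centre = STD_NORMAL.inv_cdf(probability(theta_hat, dose))
    spread = probit_variance(theta_hat, cov_hat, dose) ** 0.5
    return STD_NORMAL.cdf(centre + STD_NORMAL.inv_cdf(level) * spread)


# Hypothetical estimates and covariance matrix (illustrative values only).
theta_hat = [-1.0, 0.05]
cov_hat = [[0.04, -0.001], [-0.001, 0.0001]]
lower = confidence_limit(theta_hat, cov_hat, dose=10.0, level=0.05)
upper = confidence_limit(theta_hat, cov_hat, dose=10.0, level=0.95)
print(f"p_hat = {probability(theta_hat, 10.0):.3f}, 90% CI = ({lower:.3f}, {upper:.3f})")
```

Working on the probit scale keeps the back-transformed limits inside (0, 1). When no closed form is available, `probability` would be replaced by a Monte Carlo average over simulated random effects, as described in Appendix C.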
Figure 1

Figure 2 (panels: ρ = 0 and ρ = 0.7; axes: Parameter and % Bias; legend: M0 through M7)

Figure 3
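The rectangle probabilities of Appendix D can also be checked numerically. The Python sketch below is illustrative only: the thresholds, the two shifts (standing in for the amounts subtracted from each threshold at times 1 and 2), the correlation, and the function names are all hypothetical. The bivariate normal distribution function $\Phi_2$ is evaluated by one-dimensional Simpson quadrature, a numerical technique chosen here for self-containment, with the limiting conventions for infinite arguments handled explicitly.

```python
from math import inf, sqrt
from statistics import NormalDist

N01 = NormalDist()  # standard normal distribution


def phi2(a, b, rho, n=2000):
    """P(U <= a, V <= b) for a standard bivariate normal with |rho| < 1."""
    if a == -inf or b == -inf:
        return 0.0                      # Phi2 with any argument at -infinity is 0
    if a == inf:
        return N01.cdf(b)               # Phi2(inf, x) = Phi(x); equals 1 when b is inf
    if b == inf:
        return N01.cdf(a)               # Phi2(x, inf) = Phi(x)
    lo, hi = -8.5, min(a, 8.5)          # mass below -8.5 is negligible
    if hi <= lo:
        return 0.0
    s = sqrt(1.0 - rho * rho)

    def integrand(x):
        # density of U at x times P(V <= b | U = x)
        return N01.pdf(x) * N01.cdf((b - rho * x) / s)

    h = (hi - lo) / n                   # composite Simpson rule, n even
    total = integrand(lo) + integrand(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * integrand(lo + k * h)
    return total * h / 3.0


def joint_probability(y1, y2, gamma, shift1, shift2, rho):
    """P(Y1 = y1, Y2 = y2) from the four-term Phi2 rectangle formula.

    gamma holds gamma_0 = inf > gamma_1 > ... > gamma_K > gamma_{K+1} = -inf.
    """
    upper1, lower1 = gamma[y1] - shift1, gamma[y1 + 1] - shift1
    upper2, lower2 = gamma[y2] - shift2, gamma[y2 + 1] - shift2
    return (phi2(upper1, upper2, rho) - phi2(lower1, upper2, rho)
            - phi2(upper1, lower2, rho) + phi2(lower1, lower2, rho))


# Hypothetical three-category example (K = 2).
gamma = [inf, 0.8, -0.4, -inf]
cells = [[joint_probability(y1, y2, gamma, 0.3, -0.1, 0.6) for y2 in range(3)]
         for y1 in range(3)]
print(sum(sum(row) for row in cells))   # cell probabilities should sum to about 1
```

Summing a row of `cells` over the second response recovers the corresponding univariate difference of $\Phi$ values, which gives a quick consistency check on the thresholds and conventions.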