EvalBiasMixedDichotomousData_13APR2015

Markov, Latent Variable, State-Space, or Marginal Probability Models in Pharmacometrics: Concepts and an Evaluation of Bias Matthew M. Hutmacher1,* 1. Ann Arbor Pharmacometrics Group (A2PG) 301 N. Main St, Suite 102 Ann Arbor, MI 48104 * Corresponding author. E-mail address: matt.hutmacher@a2pg.com State Space and Marginal Probability Models Abstract One might get the impression from reading the pharmacometric literature that Markov and latent variable (LV) models for longitudinal binary or ordered categorical data are distinct approaches. The approaches are related when considering autocorrelation between responses within an individual however, because these both use the Markov property. An alternate terminology (albeit not new) for describing these modeling approaches is proposed that aligns with the aim and structure of each approach. The term state space (SS) is proposed for models in which the model structure is linked to transition probabilities. The primary focus of SS models is the transition from one response state to another. The term marginal probability (MP) is proposed for models in which the primary focus of the model structure is the marginal probabilities, and the transition probabilities are derived. One might question if the transition probabilities need to be characterized adequately to ensure accurate prediction of the marginal probabilities. A simulation study, to address this, was conducted to assess bias in estimation, prediction, and inferences when the transition probabilities are misspecified and the autocorrelation is stochastic. The results may be surprising to many in that these suggest that characterizing autocorrelation, when stochastic, is not as important as specifying a suitably rich random effects structure. KEY WORDS: latent variable; probit; autocorrelation; generalized nonlinear mixed-effects models; Markov model -2- State Space and Marginal Probability Models Introduction Generalized nonlinear mixed effects models have been used increasingly for modeling longitudinal binary or ordered categorical outcomes since their introduction into the pharmacometric literature over 20 year ago [1]. Methods for addressing and handling additional within-subject correlation, or autocorrelation – correlation not addressed by subject-specific random effects – have been discussed more recently. For example, Lacroix et. al. modeled the probabilities for transitioning between responder status and dropout, where responder status was defined by the American College of Rheumatology 20% responder criterion (ACR20) [2]. These transition probabilities, or conditional probabilities of moving between states given observation of the previous states, address the autocorrelation. Exposure and time effects, as well as subjectspecific random effects, were linked to the transition probabilities directly. Modeling transition probabilities using Markov components is the most frequently used method to include autocorrelation in a pharmacometric model, and it is easy to implement. No special functions are required and the approach can be used with any link function – eg, logit, probit, etc. Hutmacher and French addressed the issue of autocorrelation within the latent variable (LV) framework [3]. Autocorrelation was assumed to be random or stochastic and manifested in the latent residuals. Exposure and time effects were applied to the probabilities of the responses, not the transition probabilities. The general latent residual correlation structure presented by Hutmacher and French was not implemented easily within software currently available for generalized nonlinear mixed effects models however, restricting its general utility. The Markov and LV approaches to modeling autocorrelation differ primarily in focus and practical implementation, not in the general construction of the model likelihood. The -3- State Space and Marginal Probability Models approaches are related at a fundamental level through use of the Markov property. Thus, a different terminology is proposed that better aligns with the intent, structure and interpretation of the models. The term state space (SS) is proposed for direct modeling of, or applying structural effects to, the transition probabilities (TP). The focus of this model is characterizing the transitions from state-to-state. The term marginal probability (MP) is proposed for models in which structural effects are applied to probabilities that are not conditional on the previous response. The LV approach as described by Hutmacher et. al. [4] implies the MP approach*. Hopefully, the rationale for the proposed change in terminology shall become clearer after a more formal exposition of each framework. Despite the different objectives and utilities of the SS and MP approaches, one might be concerned about the adequacy of the MP approach if the within-subject correlation or TPs are modeled inappropriately. One might even have this concern even if marginal probabilities are the primary quantity of interest. Large decreases in the objective function value (OFV) have been noted when using the SS approach (eg, see [2]). If the OFVs are interpreted as a statistic related to the goodness-of-fit (to the extent that it should be in this context is another matter), then one might be concerned that an approach that does not accurately predict the TPs is inadequate. This article, in relation and extension to these previous works, seeks to address two objectives. The first objective is to demonstrate the relationship between the Markov and LV * The LV concept was used to provide a framework for including indirect response model (IRM) constructs into generalized nonlinear mixed effects models. The LV was used in a way similar to modeling continuous data with IRMs in which the current response does not depend upon the previously observed response value deterministically other than through the IRM model. An LV construct could be applied to the state space approach. For example the threshold for achieving a response could depend upon the previous response. The interpretation may be awkward from the pharmacometric drug-development point of view, however. Nevertheless, marginal probability is used here to maintain focus on the difference in concepts between the two approaches. -4- State Space and Marginal Probability Models approaches when autocorrelation is present. In so doing, a simpler, more user-friendly method for implementing stochastic autocorrelation in the MP approach is presented. A simple simulation/estimation exercise is presented to help delineate the SS and MP concepts. Further insight by comparison is provided by looking at the steady state condition of SS models. The approaches are also contrasted in how missing data need to be addressed. The second objective is the evaluation of biases in parameter estimation and prediction, and the effects of these on inference, when the random effects or correlation structure is misspecified. The biases are evaluated using a simulation study in which autocorrelation is assumed to be stochastic, and biases in predicting the MP are interpreted with respect to the accuracies of the TP predictions. A method of handling autocorrelation using link functions other than the probit is also described for general utility. The results suggest that characterizing stochastic autocorrelation is not as important as specifying a suitably rich random effects structure. Relationship between the Markov and LV Models 𝑇 Let 𝑌𝑖 = (𝑌𝑖1 , ⋯ , 𝑌𝑖𝑛𝑖 ) be a vector of binary or Bernoulli (0 or 1) observations collected at times 𝑡𝑖𝑗 , j = 1, 2…,ni. The joint probability of the responses, conditioned on the subject-specific random effects, η, can be derived by conditioning on the prior observations 𝑃(𝑌𝑖1 = 𝑦𝑖1 , ⋯ , 𝑌𝑖𝑛𝑖 = 𝑦𝑖𝑛𝑖 |𝜂) 𝑛 𝑖 = 𝑃(𝑌𝑖1 = 𝑦𝑖1 |𝜂) ∏𝑗=2 𝑃(𝑌𝑖𝑗 = 𝑦𝑖𝑗 |𝑌𝑖1 = 𝑦𝑖1 , ⋯ , 𝑌𝑖𝑗−1 = 𝑦𝑖𝑗−1 , 𝜂) (1) Assuming the current observation is related only to the previous response – ie, the autoregression assumption 𝑃(𝑌𝑖𝑗 = 𝑦𝑖𝑗 |𝑌𝑖1 = 𝑦𝑖1 , ⋯ , 𝑌𝑖𝑗−1 = 𝑦𝑖𝑗−1 , 𝜂) = 𝑃(𝑌𝑖𝑗 = 𝑦𝑖𝑗 |𝑌𝑖𝑗−1 = 𝑦𝑖𝑗−1 , 𝜂) – then (1) can be simplified to -5- State Space and Marginal Probability Models 𝑛 𝑖 𝑃(𝑌𝑖1 = 𝑦𝑖1 , ⋯ , 𝑌𝑖𝑛𝑖 = 𝑦𝑖𝑛𝑖 |𝜂) = 𝑃(𝑌𝑖1 = 𝑦𝑖1 |𝜂) ∏𝑗=2 𝑃(𝑌𝑖𝑗 = 𝑦𝑖𝑗 |𝑌𝑖𝑗−1 = 𝑦𝑖𝑗−1 , 𝜂) (2) The derivation of (2) is well known and illustrates the Markov property (of order 1). The Markov approach, as implemented typically in the pharmacometric literature, uses structural models for the probability of the current observation 𝑌𝑖𝑗 given the previous observation 𝑌𝑖𝑗−1 . Fixed effects parameters for the baseline, placebo and drug exposure model components are applied to ℎ[𝑃(𝑌𝑖𝑗 = 𝑦𝑖𝑗 |𝑌𝑖𝑗−1 = 𝑦𝑖𝑗−1 , 𝜂)], where ℎ is a link function such as the logit. The 𝑃(𝑌𝑖𝑗 = 𝑦𝑖𝑗 |𝑌𝑖𝑗−1 = 𝑦𝑖𝑗−1 , 𝜂) can be viewed as a transition probabilities (TP), where a subject moves from state 𝑦𝑖𝑗−1 to state 𝑦𝑖𝑗 . If the 𝑌𝑖𝑗 are not interpreted as states, instead perhaps as discrete measurements or observations from an underlying continuous distribution, then modeling the TPs might be interpreted as an approach which addresses autocorrelation deterministically within a subject. The LV approach, as presented in the pharmacometric literature, addresses autocorrelation through LV residuals. The probability of response 𝑌𝑖𝑗 is modeled through fixed and subjectspecific random effects when specifying 𝑃(𝑌𝑖𝑗 = 𝑦𝑖𝑗 |𝜂); that is, these effects are applied to ℎ[𝑃(𝑌𝑖𝑗 = 𝑦𝑖𝑗 |𝜂)], where h is a link function. Stochastic autocorrelation between 𝑌𝑖𝑗 and 𝑌𝑖𝑗−1 can be introduced through the joint probability, 𝑃(𝑌𝑖𝑗 = 𝑦𝑖𝑗 , 𝑌𝑖𝑗−1 = 𝑦𝑖𝑗−1 |𝜂). The overall joint probability can be expressed as 𝑛 𝑖 𝑃(𝑌𝑖1 = 𝑦𝑖1 , ⋯ , 𝑌𝑖𝑛𝑖 = 𝑦𝑖𝑛𝑖 |𝜂) = 𝑃(𝑌𝑖1 = 𝑦𝑖1 |𝜂) ∏𝑗=2 𝑃(𝑌𝑖𝑗 =𝑦𝑖𝑗 ,𝑌𝑖𝑗−1 =𝑦𝑖𝑗−1 |𝜂) 𝑃(𝑌𝑖𝑗−1 =𝑦𝑖𝑗−1 |𝜂) Thus, the Markov and autocorrelated LV approaches are related through -6- (3) State Space and Marginal Probability Models 𝑃(𝑌𝑖𝑗 = 𝑦𝑖𝑗 |𝑌𝑖𝑗−1 = 𝑦𝑖𝑗−1 , 𝜂) = 𝑃(𝑌𝑖𝑗 =𝑦𝑖𝑗 ,𝑌𝑖𝑗−1 =𝑦𝑖𝑗−1 |𝜂) (4) 𝑃(𝑌𝑖𝑗−1 =𝑦𝑖𝑗−1 |𝜂) From this, the LV approach with autocorrelation can be seen to be a Markov-based approach as well. Structurally modeling the TP, 𝑃(𝑌𝑖𝑗 = 𝑦𝑖𝑗 |𝑌𝑖𝑗−1 = 𝑦𝑖𝑗−1 , 𝜂), implies the focus of the model is on characterizing the transitions from state-to-state or a state space (SS) structure. Structurally modeling 𝑃(𝑌𝑖𝑗 = 𝑦𝑖𝑗 |𝜂) using an LV structure with stochastic autocorrelation mediated by 𝑃(𝑌𝑗 = 𝑦𝑗 , 𝑌𝑗−1 = 𝑦𝑗−1 |𝜂) implies a focus on the MP. Consequently, the SS and MP terminology is proposed. Notation of Model Quantities Notation is introduced to simplify exposition and clarify concepts for the remainder of the article. The notation is also intended to help differentiate between conditioning on random effects and prior observations. Let j index the measurement time for the ith subject, and 𝜂 be a vector of subject-specific random effects. Probabilities, conditioned on 𝜂, are defined using a ‘∙’ (𝑦 |𝑦 ) 𝑖𝑗 𝑖𝑗−1 as follows: 𝑝̇𝑖𝑗𝑗−1 = 𝑃(𝑌𝑖𝑗 = 𝑦𝑖𝑗 |𝑌𝑖𝑗−1 = 𝑦𝑖𝑗−1 , 𝜂) is the conditional or TP of 𝑌𝑖𝑗 based on (𝑦 ) observing the previous response 𝑌𝑖𝑗−1 ; 𝑝̇𝑖𝑗 𝑖𝑗 = 𝑃(𝑌𝑖𝑗 = 𝑦𝑖𝑗 |𝜂) is the unconditional or MP of (𝑦 ,𝑦 ) 𝑖𝑗 𝑖𝑗−1 observing 𝑌𝑖𝑗 (unconditional on the previous response); and 𝑝̇𝑖𝑗𝑗−1 = 𝑃(𝑌𝑖𝑗 = 𝑦𝑖𝑗 , 𝑌𝑖𝑗−1 = 𝑦𝑖𝑗−1 |𝜂) is the joint probability of 𝑌𝑖𝑗 and 𝑌𝑖𝑗−1. Another level derives from probabilities that are (𝑦 ) (𝑦 ) not conditioned on 𝜂. These probabilities are without the ‘∙’: 𝑝𝑖𝑗 𝑖𝑗 = 𝐸𝜂 [𝑝̇𝑖𝑗 𝑖𝑗 ] = (𝑦 ,𝑦 ) (𝑦 ,𝑦 ) 𝑖𝑗 𝑖𝑗−1 𝑖𝑗 𝑖𝑗−1 𝑃(𝑌𝑖𝑗 = 𝑦𝑖𝑗 ) is the MP of observing 𝑌𝑖𝑗 ; 𝑝𝑖𝑗𝑗−1 = 𝐸𝜂 [𝑝̇𝑖𝑗𝑗−1 ] = 𝑃(𝑌𝑖𝑗 = 𝑦𝑖𝑗 , 𝑌𝑖𝑗−1 = 𝑦𝑖𝑗−1 ) is the joint probability of 𝑌𝑖𝑗 and 𝑌𝑖𝑗−1; and 𝐸𝜂 represents the expectation operator with -7- State Space and Marginal Probability Models respect to η. These MPs are termed population probabilities (PPs) hereinafter in an attempt to avoid confusion due to multiple uses of the term marginal.† A natural definition of population (𝑦 |𝑦 ) 𝑖𝑗 𝑖𝑗−1 TPs is 𝐸𝜂 [𝑝̇𝑖𝑗𝑗−1 ]. This definition is not as interesting for MP models without stochastic (𝑦 ,𝑦 ) (𝑦 ) (𝑦 ) 𝑖𝑗 𝑖𝑗−1 𝑖𝑗−1 autocorrelation, because 𝑝̇𝑖𝑗𝑗−1 = 𝑝̇𝑖𝑗 𝑖𝑗 𝑝̇𝑖𝑗−1 due to conditional independence. Thus, the (𝑦 ) population TP results in 𝑝𝑖𝑗 𝑖𝑗 when there is no stochastic autocorrelation. Instead, population (𝑦 ,𝑦 ) (𝑦 ) 𝑖𝑗 𝑖𝑗−1 𝑖𝑗−1 TPs are defined as 𝑝𝑖𝑗𝑗−1 ⁄𝑝𝑖𝑗−1 in this article. This definition also highlights that 𝜂 affects transitions, the influence of which can be illustrated by considering the simple case of an additive random effect on the baseline of the logit or probit scale. Subjects with a very large (or (𝑦 ) small) values of 𝜂, will tend to have 𝑝̇𝑖𝑗 𝑖𝑗 of 1 (or 0) and thus will not have as many transitions despite having (conditionally) independent observations. Note that the PPs are correlated through the 𝜂 even if there is no stochastic autocorrelation. Modeling probabilities requires cumulative distribution functions. In this article, Φ(𝑥𝑖𝑗 ) represents the normal cumulative distribution function at quantile 𝑥𝑖𝑗 . Φ2 (𝑥𝑖𝑗 , 𝑥𝑖𝑗′ , 𝜈𝑖𝑗𝑗′ (𝜌)) represents the bivariate normal c.d.f. or an approximation thereto, at quantiles 𝑥𝑖𝑗 and 𝑥𝑖𝑗′ , where 𝜈𝑖𝑗𝑗′ (𝜌) is the autocorrelation function which depends on the parameter 𝜌. The function 𝜈𝑖𝑗𝑗′ is defined for subject 𝑖 between times 𝑗 and 𝑗 ′ because the correlation and sampling times could be different between the subjects. † (𝑦𝑖𝑗 ) The population (marginal) probability 𝑝𝑖𝑗 is often the quantity of interest when predicting from a model. However, this probability cannot be directly modelled, because a closed form does not exist for the multivariate probability distribution. These probabilities must be derived from the model secondarily. -8- State Space and Marginal Probability Models Illustration of State-Space and Marginal Probability Models The relationship between the SS and MP approaches was provided in the previous section. In this section, the implied intent of the SS and MP approaches is clarified and contrasted through modeling a simple example. The data for the example were simulated using stochastic autocorrelation to help delineate the differences between the approaches. Consider a simple hypothetical case where a drug is given, its effects achieve steady state by the first sample time, and there is no placebo effect. Let the probability of being a responder be (1) (1) (1) 𝑝𝑖𝑗 = 𝑝̇𝑖𝑗 = 0.75 across all subjects (i) and times (𝑗, 𝑡𝑗 ∈ {1,2,4,8,16}). The equality, 𝑝𝑖𝑗 = (1) 𝑝̇𝑖𝑗 , indicates that V𝑎𝑟(𝜂) = 0. Clearly this is an idealization, yet it is useful for illustration. Two structures for 𝜈𝑖𝑗𝑗′ were considered, the autoregressive structure (AR1), where 𝜈𝑖𝑗𝑗′ = 𝜌, and the spatial autoregressive structure (AR1S), where 𝜈𝑖𝑗𝑗−1 = 𝜌|𝑡𝑖𝑗 −𝑡𝑖𝑗−1 | . Both scenarios used 𝜌 = 0.5 with data from 1000 virtual subjects. Five models were fitted to the data. The models are enumerated below along with a reference to its model structure: 1) typical Markov model with assumed first state (5a), 2) Markov model with estimated first state (5b), 3) constrained Markov model (5c), 4) an MP model (5d), and 5) an MP model (5d) with a 3-point quadrature approximation to Φ2 : (1|𝑦 ) (1|𝑦 ) 𝑝𝑖𝑗𝑗−1𝑖𝑗−1 = Φ(𝛼1 + 𝛼2 𝐼[𝑦𝑖𝑗−1 = 1]); assume 𝑃(𝑌𝑖0 = 0) = 1 (1) 𝑝𝑖𝑗𝑗−1𝑖𝑗−1 = Φ(𝛼1 + 𝛼2 𝐼[𝑦𝑖𝑗−1 = 1]), 𝑗 > 1; 𝑝𝑖𝑗 = Φ(𝛼0 ), 𝑗 = 1 (1) (1|0) 𝑝𝑖𝑗 = Φ(𝛽1 ), 𝑗 ≥ 1; 𝑝̇𝑖𝑗,𝑗−1 = Φ(𝛽1 + 𝛼3 ), 𝑗 > 1; (1|1) 𝑝𝑖𝑗,𝑗−1 = 1 − -9- Φ(𝛽1 + 𝛼3 )[1 − Φ(𝛽1 )] ,𝑗 > 1 Φ(𝛽1 ) (5a) (5b) (5c) State Space and Marginal Probability Models (1) 𝑝𝑖𝑗 = Φ(𝛽1 ), 𝑗 ≥ 1; (𝑦 ,𝑦 (𝑦𝑖𝑗 |𝑦𝑖𝑗−1 ) 𝑝𝑖𝑗𝑗−1 (𝑦𝑖𝑗 ,𝑦𝑖𝑗−1 ) = 𝑝̇ 𝑖𝑗𝑗−1 (𝑦𝑖𝑗−1 ) , 𝑗 > 1; (5d) 𝑝̇ 𝑖𝑗−1 ) (1) (1) 𝑖𝑗 𝑖𝑗−1 𝑝𝑖𝑗𝑗−1 = (𝑦𝑖𝑗 − 1)(𝑦𝑖𝑗−1 − 1) + (𝑦𝑖𝑗 − 1)(−1)𝑦𝑖𝑗−1 𝑝𝑖𝑗−1 + (𝑦𝑖𝑗−1 − 1)(−1)𝑦𝑖𝑗 𝑝𝑖𝑗 + (−1)(𝑦𝑖𝑗+𝑦𝑖𝑗−1 ) Φ2 (𝛽1 , 𝛽1 , 𝜈𝑖𝑗𝑗−1 ) 𝑌𝑖0 represents a response at 𝑡 = 0, which was not simulated as part of the design, yet is necessary to start the Markov chain. 𝐼[∙] represents an indicator function that = 1 when the logical condition is true and = 0 when false. Different symbols were used for the parameters to clarify their different roles and interpretations between the models. The two Markov models, Model 1 and Model 2 [(5a) and (5b), respectively] have two parameters for modeling the TP, 𝛼1 and 𝛼2 . The term 𝛼2 𝐼[𝑦𝑖𝑗−1 = 1] reflects a change in the TP as a function of observing 𝑦𝑖𝑗−1 = 1, and the magnitude of the change depends on 𝛼2 ; thus, (1|0) (1|0) 𝑝𝑖𝑗𝑗−1 = Φ(𝛼1 ) and 𝑝𝑖𝑗𝑗−1 = Φ(𝛼1 + 𝛼2 ). In (5a), the response or state at time 0 is assumed to be 0 and is forced to be so in the model (the first observations must result from a transition from the 0-state or a recurrence of the 0 state). Model 2 in (5b) does not make any assumption about the state of the prior time point. The parameter, 𝑏0 , models the probability of being in the state = 1 at time 𝑡 = 1. Note that 𝛼2 reflects the degree of dependence between two contiguous (1|0) (1|1) observations (ie, 𝛼2 = 0 means these are independent or 𝑝𝑖𝑗𝑗−1 = 𝑝𝑖𝑗𝑗−1 = Φ(𝛼1 ).). For these two models, the marginal probabilities can be derived using the initial state probability vector and the transition probability matrix. The calculation for 𝑡 = 1 is as follows: - 10 - State Space and Marginal Probability Models [ (0) 𝑝𝑖0 (0|0) 𝑝𝑖10 (1) 𝑝𝑖0 ] [ (0|1) 𝑝𝑖10 (0,0) (0,1) [ 𝑝𝑖10 + 𝑝𝑖10 (0|0) (1|0) 𝑝𝑖10 (1|1) 𝑝𝑖10 (0) (0|0) (1) (0|1) ] = [ 𝑝𝑖0 𝑝𝑖10 + 𝑝𝑖0 𝑝𝑖10 (1,0) (1,1) (0) 𝑝𝑖10 + 𝑝𝑖10 ] = [ 𝑝𝑖1 (1|0) (0|1) (0) (1|0) (1) (1|1) 𝑝𝑖0 𝑝𝑖10 + 𝑝𝑖0 𝑝𝑖10 ] = (1) 𝑝𝑖1 ] (1|1) (6) (0) (1) where 𝑝𝑖10 + 𝑝𝑖10 = 1, 𝑝𝑖10 + 𝑝𝑖10 = 1, and 𝑝𝑖1 + 𝑝𝑖1 = 1 by definition. The probabilities for a general time point j are calculated from repeated application of the TP matrix as follows: (0|0) [ (0) 𝑝𝑖0 (1) 𝑝𝑖0 ] [ (1|0) 𝑝𝑖10 𝑝𝑖10 (0|1) 𝑝𝑖10 (1|1) 𝑝𝑖10 (0|0) ][ (1|0) 𝑝𝑖21 𝑝𝑖21 (0|1) 𝑝𝑖21 (1|1) 𝑝𝑖21 ]⋯[ 𝑝𝑖𝑗𝑗−1 (0|0) 𝑝𝑖𝑗𝑗−1 (1|0) (0|1) 𝑝𝑖𝑗𝑗−1 (1|1) 𝑝𝑖𝑗𝑗−1 (0) ] = [ 𝑝𝑖𝑗 (1) 𝑝𝑖𝑗 ] (7) If the TP matrix is constant over all j and equal to P, which is the case here, then (7) maybe simplified to (0) [ 𝑝𝑖0 (0) (1) 𝑝𝑖0 ]P𝑗 = [ 𝑝𝑖𝑗 (1) 𝑝𝑖𝑗 ] (8) due to j repeated matrix multiplications of P. The expression in (8) has implications with respect to steady state probability predictions, which are useful in contrasting the SS and MP approaches as shown hereinafter. The structure for Model 4 in (5d) shows that the MPs of the responses are modeled by 𝛽1. The autocorrelation is handled by the TPs, which are derived using the joint probability and the previous marginal probability. The magnitude of the autocorrelation is governed by 𝜌. The calculation of Φ2 is not available in all software. Therefore, an approximation of Φ2 using 3point quadrature was evaluated in (5d) which is denoted as Model 5 [5]. Note for Model 4 and (1|0) (1|1) Model 5 that under independence (𝜌 = 0) 𝑝𝑖𝑗𝑗−1 = 𝑝𝑖𝑗𝑗−1 = Φ(𝛽1 ). - 11 - State Space and Marginal Probability Models Model 3, termed the constrained Markov model and depicted in (5c), is a hybrid between the Markov approach and the MP approach. The model uses a fixed effect, 𝛼3 , to govern the amount of autocorrelation mediated by the TP, while predicting the MP through estimation of 𝛽1. Note (1|0) (1|1) that 𝑝𝑖𝑗𝑗−1 = 𝑝𝑖𝑗𝑗−1 = Φ(𝛽1) when 𝛼3 = 0. Models 1-5 were fitted to the simulated data. The results are provided in Figure 1. The first row displays the predictions for the AR(1) scenario, and the first column of the row displays the (1) predicted 𝑝𝑖𝑗 over time. The prediction for Model 1 at 𝑡 = 1 is lower than those from the other models, and achieves steady state around 𝑡 = 4. Model 2 predicts better than Model 1 and achieves steady state also at 𝑡 = 4 approximately. Note that the prediction from Model 2 at 𝑡 = 1 is off the line because of variability. This prediction is made based on the data at 𝑡 = 1 only. Interestingly, Model 1 and Model 2 do not have a constant probability prediction like the true model. Models 3-5 are indistinguishable and predict a flat trajectory as these were constructed to do so. (0|1) For the transition to 0 from 1, all models provide similar predictions of the TP 𝑝𝑖𝑗𝑗−1 . For (1|0) 𝑝𝑖𝑗𝑗−1 , Model 1 is not as close to the true value as Model 2, which is a not as close to Models 35. The reason is that the simulation structure did not assume a value at 𝑡 = 0. The assumption (1|0) (0|1) (0|1) of a 0 response at 𝑡 = 0 influences 𝑝𝑖𝑗𝑗−1 more than 𝑝𝑖𝑗𝑗−1 because information for 𝑝𝑖𝑗𝑗−1 is only available under the model at 𝑡 ≥ 2 – ie, the initial assumption does not influence this probability. One can see that by 𝑡 = 2 that Model 1 has moved away from the poor prediction at 𝑡 = 1. The second row displays the results from the AR(1S) scenario. Models 1 and 2 again reach steady state by around 𝑡 = 4, with all models achieving comparable predictions at 𝑡 ≥ 2. - 12 - State Space and Marginal Probability Models For the TP, only Models 4 and 5 track the true trajectory. The other models do not have the machinery to allow autocorrelation to be a function of time. However, even though Model 1, 2 (1) and 3 do not accurately estimate the TPs, these do predict and approximate the flatness of 𝑝𝑖𝑗 at steady state. Figure 1 alludes to the notion of steady state for a Markov chain. Steady state can be formalized according to the following (0) [ 𝑝𝑖0 (1) (0) 𝑝𝑖0 ]P ∞ = [ 𝑝𝑖∞ (1) (0) 𝑝𝑖∞ ] = [ 𝑝𝑖∞ (1) 𝑝𝑖∞ ]P (9) where P ∞ represents infinite applications of P. This is somewhat analogous to pharmacokinetic steady state following repeated regular administration of a drug, where the concentration at the beginning of the dosing interval is identical to that at the end of the interval. The steady state (0) (1) (0) probabilities 𝑝𝑖∞ , and 𝑝𝑖∞ can be calculated by solving [ 𝑝𝑖∞ (0) (1) 𝑝𝑖∞ ](P − I), where I is the (1) (0) (1) identity matrix, using the constraint 𝑝𝑖∞ + 𝑝𝑖∞ = 1. The solution connotes that 𝑝𝑖∞ , and 𝑝𝑖∞ (0) (1) do not depend upon the starting probabilities 𝑝𝑖0 and 𝑝𝑖0 after a large number of applications (0) (1) of P (ie, P ∞ ). The rate at which steady state is achieved is dependent upon 𝑝𝑖0 and 𝑝𝑖0 , (0) however. Consider a simulation in which the sample size is large enough that the TP, and 𝑝𝑖0 (1) and 𝑝𝑖0 for Model 2), are estimated with sufficient accuracy and precision to be indistinguishable from the true values. Model 1 predictions do not approach 0.75 until 𝑡 > 16, (1) (1) which is outside the sampling design ( 𝑝̇𝑖6 = 0.749), while Model 2 predicts 𝑝̇𝑖1 = 0.75 for all 𝑡 ≥ 1, because the starting value of the chain is estimated at the first time point. In contrast, steady state is implicit in the formulations (and assumptions) of Models 3 through 5. - 13 - State Space and Marginal Probability Models Another area to compare and contrast the MP and SS approaches is how these handle interstitial (intermittent or non-monotonic) missing data – ie, occasional missing data within a subject such as a missed clinic visit (not due to dropout). For the SS approach, the probability model for the current observation is a function of the previous observation (such as Model 1 and 2). When the previous observation is missing and hence unknown, one must effectively integrate it out of the likelihood. This is demonstrated in the context of the illustration hereinbefore. Noting the assumption that the 0 state has probability 1 at 𝑡 = 0 for Model 1, the likelihood for a complete data case is (𝑦 |0) ℒ = 𝑝𝑖10𝑖1 (𝑦 |𝑦𝑖1 ) 𝑝𝑖21𝑖2 (𝑦 |𝑦𝑖2 ) 𝑝𝑖32𝑖3 (𝑦 |𝑦𝑖3 ) 𝑝𝑖43𝑖4 (𝑦 |𝑦𝑖4 ) 𝑝𝑖54𝑖5 (10) Consider the case where the second observation (𝑗 = 2, 𝑡 = 2) is missing, yet all the other observations are available. The path the individual followed to arrive at state 𝑦𝑖3 (at 𝑡 = 3) from 𝑦𝑖2 is unknown, so all trajectories must be considered. The likelihood for this case is (𝑦 |0) ℒ = 𝑝𝑖10𝑖1 (0|𝑦𝑖1 ) [ 𝑝𝑖21 (𝑦 |0) 𝑝𝑖32𝑖3 (1|𝑦𝑖1 ) + 𝑝𝑖21 (𝑦 |1) (𝑦 |𝑦𝑖3 ) 𝑝𝑖32𝑖3 ] 𝑝𝑖43𝑖4 (𝑦 |𝑦𝑖4 ) 𝑝𝑖54𝑖5 (11) The quantity in brackets in (11) is 𝑃(𝑌𝑖3 = 𝑦𝑖3 |𝑌𝑖1 = 𝑦𝑖1 ). Handling missing data in this principled manner is complicated when there are strings of contiguous interstitial missingness in the data, because of the numbers of summations involved. Because of the complications, missing data are often imputed using a last-observation-carried-forward approach, the effects of which are not often evaluated. Missing data must also be integrated out of the likelihood for the MP approach. The assumption of normal latent residuals (ie, probit modeling) and consideration of the variance- - 14 - State Space and Marginal Probability Models covariance matrix of the residual simplifies the issue. Assuming that the variance of the residuals = 1, which is typical, the matrices for the AR(1) and AR(1S) scenarios, are 1 𝜌1 Σ𝑖 = 𝐶𝑜𝑣[𝜀𝑖 ] = 𝜌2 𝜌3 [𝜌4 𝜌1 1 𝜌1 𝜌2 𝜌3 𝜌2 𝜌1 1 𝜌1 𝜌2 𝜌3 𝜌2 𝜌1 1 𝜌1 ′ with general elements 𝜌|𝑗−𝑗 | and 𝜌 1 𝜌4 3 𝜌1 𝜌 𝜌2 or 𝜌3 𝜌1 𝜌7 1 ] [𝜌15 |𝑡𝑖𝑗 −𝑡𝑖𝑗′ | 𝜌1 1 𝜌2 𝜌6 𝜌14 𝜌3 𝜌7 𝜌15 𝜌2 𝜌6 𝜌14 1 𝜌4 𝜌12 𝜌4 1 𝜌8 𝜌12 𝜌8 1 ] (12) , respectively. Because of the normality assumption, integrating out the missing data corresponds to eliminating the row and column from the matrix in (12) that corresponds to the missing observation – in this example, the second row and column yielding 1 𝜌2 Σ𝑖 = 𝐶𝑜𝑣[𝜀𝑖 ] = 3 𝜌 [𝜌4 𝜌2 1 𝜌1 𝜌2 𝜌3 𝜌1 1 𝜌1 1 𝜌4 2 𝜌 𝜌3 or 𝜌1 𝜌7 1 ] [𝜌15 𝜌3 1 𝜌4 𝜌12 𝜌7 𝜌15 𝜌4 𝜌12 1 𝜌8 𝜌8 1 ] (13) This general description of the elements is helpful when coding the likelihood. The correlation function used in Φ2 is simple to derive, because 𝜈𝑖31 = 𝜌|3−2| 𝜌|2−1| = 𝜌|3−1| for AR(1) or 𝜈𝑖31 = 𝜌|4−2| 𝜌|2−1| = 𝜌|4−1| for AR(1S). Thus, the likelihood goes from (𝑦𝑖2 ,𝑦𝑖1 ) (𝑦𝑖3 ,𝑦𝑖2 ) (𝑦𝑖4 ,𝑦𝑖3 ) (𝑦𝑖5 ,𝑦𝑖4 ) 𝑝𝑖21 𝑝𝑖21 𝑝𝑖21 (𝑦 ) 𝑝𝑖21 ℒ = 𝑝𝑖1 𝑖1 (𝑦 𝑝𝑖1 𝑖1 ) (𝑦 𝑝𝑖2 𝑖2 ) (𝑦 𝑝𝑖3 𝑖3 ) (𝑦 𝑝𝑖4 𝑖4 ) (14) for the complete data case to (𝑦𝑖3 ,𝑦𝑖1 ) (𝑦𝑖4 ,𝑦𝑖3 ) (𝑦𝑖5 ,𝑦𝑖4 ) 𝑝𝑖21 𝑝𝑖21 (𝑦 ) 𝑝𝑖21 ℒ = 𝑝𝑖1 𝑖1 - 15 - (𝑦 𝑝𝑖1 𝑖1 ) (𝑦 𝑝𝑖3 𝑖3 ) (𝑦 𝑝𝑖4 𝑖4 ) (15) State Space and Marginal Probability Models when the second observation is missing. Marginal Probability Models and the Latent Variable Structure Before the simulation study, it is helpful to establish a general framework. Let Z be a qdimensional multivariate latent variable 𝑍𝑖 |𝜂 = 𝑓(𝛽, 𝑡𝑖 , 𝑑𝑖 , 𝑥𝑖 ) + 𝑔(𝛽, 𝑡𝑖 , 𝑑𝑖 , 𝑥𝑖 )𝜂𝑖 + 𝜀𝑖 (16) where 𝑓(𝛽, 𝑡𝑖 , 𝑑𝑖 , 𝑥𝑖 ) = 𝑓𝑖 and 𝑔(𝛽, 𝑡𝑖 , 𝑑𝑖 , 𝑥𝑖 ) = 𝑔𝑖 are vectors or matrices of functions of the fixed effects (β), design dependent covariate vectors 𝑡𝑖 (time) and 𝑑𝑖 (dose), and 𝑥𝑖 is a matrix of subject-specific covariates (potentially time varying); η is a p-dimensional vector of random effects (𝑉𝑎𝑟(𝜂𝑖 ) = Ω); and 𝜀𝑖 is the q-dimensional normally distributed latent residual error vector (𝑉𝑎𝑟(𝜀𝑖 ) = Σ(𝑡𝑖 )), with a covariance/correlation matrix that can be a function of ti. In general, 𝜀𝑖 could have any distribution, however, modeling autocorrelation through latent residuals is the most straightforward. This will be revisited later. The diagonal entries of Σ(ti) ≡ 1 for identifiability, and is thus a correlation matrix. In pharmacometrics work for binary data, often 𝑔𝑖 ≡ 1, and the model is not written grouping the fixed effect terms and the random effects terms as shown in (16). There is benefit in this which should become clear. Then, (1) 𝑝̇𝑖𝑗 = 𝑃(𝑌𝑖𝑗 = 1|𝜂) = 𝑃(𝑍𝑖𝑗 ≤ 𝛾|𝜂) = Φ(𝛾 − 𝑓𝑖𝑗 − 𝑔𝑖𝑗 𝜂𝑖 |𝜂) (17) where i indexes subject, j indexes time within the subject (an observation from the vector), and 𝛾 is a constant, interpreted as a threshold, which is also sometimes interpreted as the baseline (see Hu et al [6] for a broader discussion). - 16 - State Space and Marginal Probability Models Nonlinear mixed effects software takes (17) and computes the (approximate) likelihood behind the scenes. Despite the linearity of 𝜂 in (16), which allows exact computation of the distribution of Z, the likelihood is based on integrals of Z, which leads to the need to approximate the likelihood when fitting Y. (1) (1) The desired quantity from modeling for decision making is typically 𝑝𝑖𝑗 , not 𝑝̇𝑖𝑗 . Usually ∗ one must use Monte Carlo techniques to calculate these, eg 𝑝𝑖𝑗 ≈ 𝑀1 ∑𝑀 𝑚=1 Φ(𝜇𝑖𝑗 − 𝑔𝑖𝑗 𝜂𝑚 ), for a suitable size M where 𝜂∗ is sampled from 𝑁(0, Ω) and 𝜇𝑖𝑗 and Ω and have been estimated. However, 𝑝𝑖𝑗 can be computed directly from the marginal distribution for the probit model described above. This removes simulation error from the prediction and is economical computationally, and also provides some insight into the potential for bias. The direct computation is based on the multivariate normal marginal distribution of Z – ie, 𝑍𝑖 ~𝑁(𝑓𝑖 , 𝑔𝑖 Ω𝑔𝑖𝑇 + Σ𝑖 = Ξ𝑖 ). Using the LV framework, properties of the multivariate normal and letting 𝜇𝑖𝑗 = 𝛾 − 𝑓𝑖𝑗 (absorbing the threshold), the PP can be derived (1) −½ −½ 𝑝𝑖𝑗 = 𝑃(𝑌𝑖𝑗 = 1) = 𝑃(𝑍𝑖𝑗 ≤ 𝛾) = Φ(Ξ𝑖𝑗𝑗 [𝛾 − 𝑓𝑖𝑗 ]) = Φ(Ξ𝑖𝑗𝑗 𝜇𝑖𝑗 ) (18) where Ξ𝑖𝑗𝑗 is the jth element of the diagonal of Ξ𝑖 . Note that (18) is a MP and that the index i is retained, because, in general, subjects can have different covariate vectors such as time or dose. The result in (18) is not dependent directly on the off diagonal elements of Ξ and hence the −½ autocorrelation specified in Σ. One can see in (18) that β, 𝑔 and Ω are intertwined in Ξ𝑖𝑗𝑗 𝜇𝑖𝑗 , such that the misspecification of 𝑔 or Ω, or biased estimates of Ω could influence the estimates of β to keep accurate predictions of 𝑝𝑖𝑗 . From this, one could speculate that incorrect specification of Σ could lead to biased estimates of Ω to compensate, and hence biases in β could - 17 - State Space and Marginal Probability Models −½ result. The Ξ𝑖𝑗𝑗 𝜇𝑖𝑗 in (18) exemplifies the possible confounding of fixed and random effects. The LV structure naturally leads to the MP approach. Simulation Study to Evaluate Bias in the Estimates, Predictions and Inferences Methods The form of the model components in (16) used in the simulation study were 𝑑 ln 2 𝑖 𝜇𝑖 = 𝛽1 + 𝛽2 𝑈(𝑡𝑖𝑗 ) + 𝛽3 𝑑 +𝑒𝑥𝑝(𝛽 , 𝑈𝑖𝑗 = 𝑈(𝑡𝑖𝑗 ) = 1 − 𝑒𝑥𝑝 (− 𝑒𝑥𝑝(𝛽 )) 𝑡𝑖 ) 𝑖 5 4 (19a) 𝑇 𝑔𝑖 = (1, 𝑈(𝑡𝑖𝑗 )) → 𝑔𝑖 𝜂𝑖 = 𝜂1𝑖 + 𝜂2𝑖 𝑈𝑖𝑗 (19b) As above, Ξ𝑖 = 𝑔𝑖 Ω𝑔𝑖 𝑇 + Σ𝑖 , where Ξ𝑖𝑗𝑗 = Ω11 + 2𝑈𝑖𝑗 Ω12 + 𝑈𝑖𝑗2 Ω22 + 1. The design skeleton used for the simulation study was as follows: 7 parallel-groups with doses 𝑑𝑖 from the set {0, 1, 3, 5, 10, 15, 30}, 40 subjects per dose group, and 7 time points per individual with 𝑡𝑖 ∈{1, 2, 4, 8, 16, 24, 36}. The parameters 𝛽1, 𝛽2, and 𝛽3 in (19a) were derived (1) (1) (1) to achieve: 𝑝𝑖𝑗 = 0.15 for 𝑑𝑖 = 0, 𝑡𝑖𝑗 = 1; 𝑝𝑖𝑗 = 0.40 for 𝑑𝑖 = 0, 𝑡𝑖𝑗 = 16; and 𝑝𝑖𝑗 = 0.85 for 𝑑𝑖 = 0, 𝑡𝑖𝑗 = 16; 𝛽2 is the maximum placebo effect and 𝛽3 is the maximum drug effect or Emax. The half-life (t½) of placebo onset was 𝑒𝑥𝑝(𝛽4 ) = 4, and the ED50 was 𝑒𝑥𝑝(𝛽5 ) = 30 (ie, 9 the ED90 = 30). Overall, β = (−1.67, 1.25, 2.94, 1.39, 1.20). The variance components were Ω11 = 1, Ω12 = 0, and Ω22 = 2.5. The values of 𝜌 considered in the simulation were {0, 0.3, 0.5, 0.7, 0.9} for Σ𝑖 , and the AR(1S) structure was used – ie, Σ𝑖𝑙𝑚 = 𝜌|𝑡𝑖𝑙−𝑡𝑖𝑚| . The 𝜀 in (16) were simulated using the following recursive relation: - 18 - State Space and Marginal Probability Models 𝜀𝑖1 = 𝑒𝑖1 𝑗=1 𝜀𝑖𝑗 = 𝜀𝑖𝑗−1 𝜌|𝑡𝑖𝑗−𝑡𝑖𝑗−1 | + 𝑒𝑖𝑗 √1 − 𝜌2|𝑡𝑖𝑗 −𝑡𝑖𝑗−1 | 𝑗≥2 (20) where 𝑒𝑖𝑗 ~𝑁(0,1). The responses were realized by plugging the simulated values of 𝜂𝑖 and 𝜀𝑖𝑗 into 𝑦𝑖𝑗 = 𝐼(𝜀𝑖𝑗 < 𝜇𝑖𝑗 ), where 𝐼(∙) = 1. when 𝜀𝑖𝑗 < 𝜇𝑖𝑗 and = 0 otherwise. Table 1 displays the off-diagonal correlation values for Ξ – ie, the correlations in the latent variable Z. Ranges of correlations across the doses are reported for the responses, 𝑌; ranges are displayed because these are on the marginal scale and so the fixed effects play a role. Note that the diagonal elements of these matrices = 1 by definition. From the table, one can see there is moderate correlation at baseline, and as the study progresses in time (large index values) that the correlation increases. This increase in correlation is based on the placebo onset model and its random effects, and reflects the onset of steady-state. One can also see that as ρ increases, observations at early time points (small index values) are increasingly more correlated due to 𝜌|𝑡𝑖𝑗 −𝑡𝑖𝑗−1 | in Σ𝑖 . This increased correlation attenuates as observations become farther apart. The complexity and form of the model in (16) was chosen for realism. Including random effects on model components other than the intercept is not typical [7]. The extra random effect on the non-drug response was included to allow for a more general correlation structure than typically assumed, yet is likely more consistent with data that are generated from an LV process (owing to its hypothetical relation to an unobserved continuous factor). The number of time points and doses are more than typically studied in a Phase 2 trial. However, this information rich design was selected specifically to facilitate convergence and covariance step estimation across multiple simulations. The goal was to avoid confounding of results and conclusions with convergence issues, both within an assumed model and between more complex and simpler - 19 - State Space and Marginal Probability Models (reduced) models. In fact, to further facilitate covariance step estimation, the following parameterization was used: ln √Ω11 , ln √Ω22 , and 𝛽𝑟 where 𝜌 = 2(1 + 𝑒 −𝛽𝑟 ) −1 − 1). These reparameterizations decreased the issues associated with boundary constraints of the variance components (ie, 0) and demonstrated improved normality in their sampling distributions (preliminary simulations – data not shown). Eight versions of the model described in (19a) and (19b) were fitted: 1) M0 used adaptive Gaussian quadrature (AGQ) for the random effects and a 5-point approximation for Φ2 (see [5]); M1 was the same as M0 except the Laplace approximation (LA) was used instead of AGQ ‡ and a 3-point approximation for Φ2 was used (see [5]); M2 was M1 using AGQ (ie, 3-point approximation for Φ2 ); M3 was M2 using LA with 𝜌 constrained to 0; M4 was M3 using AGQ; M5 was M4 using LA with Ω22 constrained to 0, which is generally regarded as the standard model; M6 was M5 using AGQ; and M7 was M6 with Ω11 constrained to 0 – a naïve pool model. These models reflect different approximations to the likelihood and simplification to the stochastic structure. The joint probabilities used in the likelihood for models M0, M1 and M2 are (𝑦 ,𝑦 ) 𝑖𝑗 𝑖𝑗−1 𝑝̇𝑖𝑗𝑗−1 = (𝑦𝑖𝑗 − 1)(𝑦𝑖𝑗−1 − 1) + (𝑦𝑖𝑗 − 1)(1 − 2𝑦𝑖𝑗−1 )Φ(𝜇𝑖𝑗−1 ) +(𝑦𝑖𝑗−1 − 1)(1 − 2𝑦𝑖𝑗 )Φ(𝜇𝑖𝑗 ) + (1 − 2𝑦𝑖𝑗 )(1 − 2𝑦𝑖𝑗−1 )Φ2 [𝜇𝑖𝑗 , 𝜇𝑖𝑗−1 , 𝜈𝑖𝑗𝑗−1 (𝜌)] (21) The joint probability is expressed differently in (21) than in (5d) to facilitate coding in NONMEM. Predictions of the PPs for these models were based on (1) 𝛽 𝑑 3 𝑖 −½ 2 𝑝𝑖𝑗 = Φ(Ξ𝑖𝑗𝑗 𝜇𝑖𝑗 ) = Φ ([𝛽1 + 𝛽2 𝑈𝑖𝑗 + 𝑑 +𝑒 𝛽5 ]⁄𝑉𝑖𝑗 ) , 𝑉𝑖𝑗 = √1 + Ω11 + Ω22 𝑈𝑖𝑗 𝑖 ‡ The Laplace approximation is essentially adaptive Gaussian quadrature with 1 quadrature point. - 20 - (22) State Space and Marginal Probability Models Inspection of (22) reveals that biased predictions could arise because of incorrect specification of 𝑉𝑖𝑗 . When Ω22 is constrained to 0, the modeled time-profile is changed and the parameter estimates of the 𝛽’s could attempt to compensate to fit the data. Three models were introduced to adjust for such biases, and these are based on the structure 𝐷 𝜇𝑖𝑗 = 𝛽1 + 𝜂1𝑖 + 𝛽2,𝑗 (𝑡𝑗 ≥ 2) + 𝛽3,𝑗 𝐷 +𝑒𝑖 𝛽5 𝑖 (23) where separate fixed effects were estimated by time for the placebo and drug components. These models represent saturated models, essentially fitting different dose-effect profiles by time. M8 used LA, M9 used AGQ, and M10 had Ω11 constrained to 0 (a naïve pool model). PROC NLMIXED in SAS Version 9.3 (SAS Institute Inc., Cary, NC) was used to fit all these models. Recently, the approximations described and implemented above have been made available as a subroutine in NONMEM (ICON Development Solutions, Ellicott City MD), which is a more standard and general software for pharmacometrics work. A NONMEM control stream is supplied in Appendix A that demonstrates more clearly how these data were fitted. Appendix A also provide a link to a site from which the bivariate normal routine can be downloaded. Percentage bias in the parameter estimates was calculated. Despite the parameterization discussed above for estimation, biases in Ω11 , Ω22 and 𝜌 were computed so that percentage bias could be reported (the skewness of the estimates on this scale did not provide misleading conclusions). Biases in the estimates of the off diagonal elements of Ξ were evaluated also as low biases in these should indicate good characterization of the variability or correlation, hence accurate predictions of the TPs. Biases in parameter estimates were only computed for M0 through M7, because of the common parameter structure related to the true simulation model. - 21 - State Space and Marginal Probability Models (1|0) (1) Biases in the 𝑝𝑖𝑗 and the population transition probabilities (𝑝𝑖𝑗 (0|1) , 𝑝𝑖𝑗 ), as defined in the Notation of Model Quantities section, were also computed and assessed. A closed form of these is available, and the derivation is provided in Appendix B. These TPs relate to the ability of the model to reproduce patterns in the response transitions within subjects over time. (1) Predictions of 𝑝𝑖𝑗 and inferences thereon are often of primary interest in decision making; (1) thus, accuracies of inferences in the 𝑝𝑖𝑗 were also assessed. Two methods were used to compute 90% confidence intervals (CI). The first method used was a smoothed bootstrap technique [8]. For each of 1000 replicates, a vector of parameters were sampled from a multivariate normal distribution that had the mean vector equal to the estimate vector and covariance matrix equal to the output from the covariance step (covariance matrix of the estimates). This vector was plugged into the model to make predictions across the doses and times for the replicate. For each dose and time combination, the 5th and 95th percentiles across the distribution generated by the replicates were used as the 90% CI bounds. The second method (1) used was the delta-method, which is based on linearization [9]. The closed from of 𝑝𝑖𝑗 allowed straightforward implementation. The details are supplied in Appendix C. A delta-method technique is also provided therein for general non-closed form predictions (eg, if the logit link were used). Such a method might be able to be incorporated into software for general, quick and convenient calculation of CIs of population means, which could be output in a table file. The procedure was not evaluated here, yet maybe of potential use and so future research is warranted. - 22 - State Space and Marginal Probability Models Results The results for 𝜌 = 0.0 and 𝜌 = 0.7 are presented and contrasted here. Results for the other 𝜌 scenarios are presented in the Supplemental Material. Model fittings that did not converge or did not provide standard errors were discarded from the results (maximum was only 0.6% of the fittings for a specific scenario and model). For 𝜌 = 0.7, M0 and M2 showed the smallest magnitudes of bias (percentage) in the fixed effects 𝛽. The 5-point approximation to Φ2 did not provide much benefit over the 3-point. M1 (using LA) showed slightly increased biases comparatively. M3 and M4, in which 𝜌 is fixed to 0, had greater positive biases, with AGQ (M3) not providing much benefit over LA (M4). M5 and M6 showed lower magnitudes of bias, except for the larger negative biases in 𝛽2 and 𝛽4 , which reflect the maximum placebo effect and its onset. The design might not have been as rich in information regarding these parameters. The decrease is surprising perhaps, because these models did not estimate 𝜌 or Ω22 . The naïve pool model, M7, had the largest biases which were negative. M0 and M1 had the smallest biases in Ω11 , Ω22 and 𝜌, also as expected. Biases in estimates from models where parameters were constrained to 0 are −100% by definition (𝜌 for M3 through M7, Ω22 for M5 and M6, Ω11 for M7). Given some of these models represent simplifications of the true model, meaningful comparisons of the biases for these parameters are difficult. Biases in the off diagonal elements of Ξ are also provided in Figure 2. These provide an idea of how well the correlations are being estimated. Model M0, M1 and M2 showed almost no bias. Model M3 and M4 demonstrated a consistent positive bias across all the elements, which resulted from the imposed constraint of 𝜌 = 0. AGQ (M4) did not improve these estimates. M5 and M6 had jagged patterns of bias, with some biases < 0, and again AGQ provided no benefit. M7 did not provide estimates of the off diagonals, because these are assumed to be 0 by the model; thus, all were reported as −100%. - 23 - State Space and Marginal Probability Models One wonders if the lower biases observed for M3 in the 𝛽 compared to M2 were because, for some reason, M3 provided a better approximation to Ξ in some average sense. The biases in 𝑝 (1) on the probability scale are displayed in the first row of Figure 3, where the segments reflect the minimum to maximum bias (range) across the doses and times and the points represent the median across these. M0, as might be anticipated, is nearly unbiased. M1, using LA and a 3- point approximation showed slightly increased bias compared to M0. M2 (AGQ) provided similar results to M0, which suggests that the 3-point is nearly as good as the 5point approximation. M3 and M4 had a little larger bias than the previous models, which was due to not estimating 𝜌. The biases were not egregious; the biases for M4 were all less than ± 0.01. The biases for M2 and M4 fitted to the 𝜌 = 0 scenario were similar to that of M0 (all used AGQ), and M1 and M3 showed a little larger range of bias. Note that M0 through M4 are either the true model (subject to approximation) or contain it for 𝜌 = 0. M5 shows an increase in the biases independent of 𝜌; Ω22 was not estimated in this model. Using AGQ (M6) did not improve these. M7 (naïve pooled) had nearly the same biases for 𝑝 (1) as M5 and M6. The saturated models M8, M9 (AGQ), and M10 (naïve pooled) demonstrated little bias for either scenario of 𝜌. Biases in the population transition probabilities 𝑝 (1|0) and 𝑝 (0|1) are displayed in the second and third rows of Figure 3, respectively. The patterns between these across the models are similar. M0, M1 and M2 demonstrated little bias, AGQ decreased the bias comparatively. M3 through M6 for 𝜌 = 0.7 and M5 and M6 for 𝜌 = 0 demonstrated greater bias. M3 and M4 for 𝜌 = 0 yielded similar results to M1 and M2 as expected. M7 and M10 (naïve pooled) - 24 - State Space and Marginal Probability Models demonstrated the greatest biases. M8 and M9 had smaller biases compared to those models; the estimation of Ω11 improved prediction of 𝑝 (1|0) and 𝑝 (0|1) . The fourth and final row of Figure 3 shows the range of 90% CI coverages across the doses and times by model. M0 and M2 (AGQ) performed the best independent of 𝜌, and M4 performed similarly to M0 and M2 for 𝜌 = 0 as expected. M1 (and M3 for 𝜌 = 0), M8, and M9 had less overall coverage comparatively, with M8 and M9 performing slightly better than M1 (or M3 for 𝜌 = 0). Even though M1 (and M3 for 𝜌 = 0) is the true model, the LA approximation degraded the coverage rates. This is likely due to estimating Ω22 , because M8 used LA but only estimated Ω11 . The 90% CI rates for M8 and M9 are not egregious, despite the model form not being the true model. The saturated nature of the model compensated, even with systematically biased TPs. The biases increase for M3, M4 (𝜌 = 0.7) and M10. M5 through M7 demonstrated poor coverages independent of 𝜌, which is due to the larger biases in predicting 𝑝 (1) and also because only 1 or 0 variance components were estimated. These models likely did not adequately capture the correlation structure and thus inflated the amount of information used in calculating the precisions of the estimates. The use of AGQ in M6 does not improve the outcome compared to M5. Based on the coverage rates, extreme caution would be needed when making inferences using M5 – M7. Discussion The beginning of this article demonstrated that the Markov and latent variable models are related when there is autocorrelation. Because both methods use the Markov property and thus could be considered Markov models, it was suggested that a different terminology be used when referring to these approaches, and that this terminology be aligned with the structural orientation - 25 - State Space and Marginal Probability Models of the model. Models which focus on the structure of transition probabilities could be termed state space models and those that focus on marginal probabilities could be termed marginal probability models. Details on transition probabilities, achieving steady state of a Markov process, and handling missing data were provided to help delineate differences which manifest between the two approaches. A simple simulation was presented to compare and contrast these as well. The influence of the assumption of the initial condition for the Markov chain was illustrated. Overall, the intent was not to claim superiority of either method. In general, given the data pharmacometricians encounter typically, one will not know which approach is better. Discussing the underlying concepts of these approaches hopefully will help inform pharmacometricians when thinking of strategies of modeling such data. For example, such issues could be important if the goal of a model is to be able to predict results from a study if dosage levels are changed based on observed responses or to predict the difference in responder rate at trial conclusion. A hybrid model which specified marginal probabilities, yet used fixed effects (non-stochastic components) to address the autocorrelation, was presented also. It is unknown if such a model has been discussed in the literature. The hybrid approach could be used for other link functions and also for more in depth looks at autocorrelation, such as decay in time (not handled by simple SS models). Future exploration of this method might be of interest, because it has the potential to realize the benefits from both approaches. Continuous time SS models were not discussed here (see Pilla Reddy et al for example [10]). These models are more complex to implement, and hence were beyond the scope of this article. Continuous time SS models are attractive in that these do not have the complications of discrete time SS models when accounting for missing data, and these are more flexible in dealing with autocorrelation structures. - 26 - State Space and Marginal Probability Models Autocorrelation of the latent residuals was presented for dealing with stochastic autocorrelation. An approximation for the bivariate normal enabled modeling this through an autocorrelation function. This method considers correlation on the latent scale. A method that was not presented, yet could be used, is one that models autocorrelation of the responses instead of through the latent residuals. It is based on the relation 𝐶𝑜𝑟(𝑌1 , 𝑌2 ) = 𝐸(𝑌1 ,𝑌2 )−𝐸(𝑌1 )𝐸(𝑌2 ) 𝑆𝐷(𝑌1 )𝑆𝐷(𝑌2 ) (1,1) = (1) (1) 𝑝12 −𝑝1 𝑝2 (1) (1) (1) (24) (1) √𝑝1 (1−𝑝1 )√𝑝2 (1−𝑝2 ) where Cor(.) is the correlation operator, E(.) is the expectation operator, and SD(.) is the standard (1,0) deviation operator. Setting 𝐶𝑜𝑟(𝑌1 , 𝑌2 ) = 𝜈12 (𝜌) and noting 𝑝12 (𝑦 ,𝑦2 ) probabilities 𝑝121 (1) (1,1) = 𝑝1 − 𝑝12 , the joint can be calculated and used in the likelihood. Table I illustrates the relation between correlation on the latent and response scales for the simulations used in the article. This approach is also Markov based (see Guerra et al [11]) and could be used for logistic or other link functions. This method was not developed further here because of the difficulty in generalizing it to ordered categorical data. Additionally, correlation on the latent residual scale seems more analogous to continuous data, despite the restriction to the probit link function. The simulation, conducted originally to evaluate the effects of neglecting stochastic autocorrelation in the model, which leads to biased transition probabilities, provided considerable information on strategies for modeling longitudinal binary data. These results did not appear contradictory to those reported by Johansson et al, in so far as these two reports can be compared (some different objectives and estimation methods were used) [12]. The main objective here was to evaluate predictions and their inferences which were shown to be a function of all the parameters and so biases in any could result in biased predictions. If the - 27 - State Space and Marginal Probability Models results from this study can be generalized, these would imply the following, some of which might be surprising given the current standard for modeling such data. The 3-point approximation to the bivariate normal is nearly as good as the 5-point. Figure 3 demonstrates this for 𝜌 = 0.7, and can be verified even for 𝜌 = 0.9 (see the Supplemental Material). Values of 𝜌 > 0.9 are likely needed before a difference is perceived. Nevertheless, unless run times are an issue (for the simulation study, the run times were similar), the 5-point is recommended in case 𝜌 happens to be large (albeit unlikely). Similarly, one should use the best approximation available for estimating the random effects. Adaptive Gauss quadrature demonstrated less bias than using the Laplace approximation, except when the random effects structure was misspecified. There was no information to suggest that using a better approximation could be harmful for incorrect models. In fact, biases were lower for the saturated models when using adaptive Guass quadrature and 90% CI coverage rates were improved. to Better approximation methods might require longer run times, however. If this is not an issue, better approximations should be pursued when possible. Unbiased prediction of the transition probabilities was not necessary to get unbiased predictions (relatively) of the population mean probabilities nor was it necessary to achieve reasonable 90% CI coverage rates and hence inferences. Addition of fixed effects parameters is required to attain good predictions and inferences without characterizing the transition probabilities (or the latent marginal variance structure). Thus, not characterizing the transition probabilities will lead to developing parsimonious models. This point is furthered bellow. Also, unless 𝜌 is substantial (> 0.5 based on the simulation results presented here – see the Supplemental Material), not including autocorrelation in the model should not affect marginal probabilities or inferences egregiously. Characterization of the marginal variance or transition - 28 - State Space and Marginal Probability Models probabilities may still be of interest, if the purpose of the model is simulating or predicting individual response profiles that incorporate changes in dosage levels which are triggered by observed responses. Using random effects and autocorrelation should be flexible enough to deal with many situations frequently encountered in pharmacometrics, and thus should help reduce bias [13]. This led to the next most critical result from the stimulation study. The key learning was this. Incorporating random effects when necessary is essential to reduce biases in predictions of the probabilities of response. Failure to include the random effect on the non-drug model component in the simulation study led to much greater biases than omitting autocorrelation (ie, M5 through M7). The probit formulation provides insight into why this is the case. The marginal variability on the probit scale directly influences the population mean probability predictions (not including autocorrelation influences the predictions indirectly due to bias in the random effects). Consequently, incorrect random effects structures lead to bias. Bias from failure to include necessary random effects can be reduced at the cost of additional fixed effects. The saturated models, which had positively biased transition probabilities, were unbiased in the population mean predictions. However, if one wants to purse a parsimonious model, random effects should be considered on all model components, similar to models posited for continuous data. This makes sense intuitively. As time progresses during a trial and as subjects receive benefit from treatment, heterogeneity is introduced. Not every subject is anticipated to have the same benefit from the same dose, which is modeled by including a random effect on the drug effect to account for different maximum effects a drug can bestow. Although, drug-related random effects were not considered in the simulation, it is not a stretch to infer that failure to incorporate such effects will lead to biases in the predictions of drug effect and drug response. Failure to incorporate these might lead to the incorporation of - 29 - State Space and Marginal Probability Models more fixed effects – ie, model expansion – to get an adequate fit to the data, thwarting interpretation and efficiency. If drug does introduce heterogeneity or variability manifested as a change in correlation structure as a function of dose, then this has implications for recent suggestions regarding parameterization of indirect response models. The change from baseline parameterization [6] is convenient for modeling data from placebo and then, in a sequential fashion, incorporating data following active treatment, because the placebo model component absorbs the drug effect parameters when dose or concentration is 0. Making this change dose not ensure that the variabilities will be the same, which could result in bias as discussed above. For example think of a simple indirect response model (IRM) [14] parameterized as (𝛽1 + 𝜂1 ) + (𝛽2 + 𝜂2 )𝑅(𝑡) or using change from baseline as (𝛽́1 + 𝜂́ 1 ) + (𝛽́2 + 𝜂́ 2 )[𝑅(𝑡) − 1], where 𝑅(𝑡) is the solution of an IRM model which assumes inhibition (ie, 𝑅(𝑡) = 0 and 𝑅(𝑡 > 0) < 1). The first model posits that variability in baseline response is due to disease and that the drug effect will remove this component of variability from the response. To use the change from baseline parameterization in this case, care must be taken to when postulating Ω. In fact, a full Ω matrix might need to avoid bias. Continuing this argument, if one considers a partial inhibition of production of response (ie, 𝑘𝑖𝑛 = 1 − 𝐼𝑚𝑎𝑥 (𝜂) ∙ 𝐶(𝑡)⁄(𝐸𝐶50 + 𝐶(𝑡))) that is a function of a random effect, change of baseline will not necessarily lead to the same result as if a stimulatory effect had been assumed (ie, 𝑘𝑖𝑛 = 1 + 𝑆𝑚𝑎𝑥 (𝜂) ∙ 𝐶(𝑡)⁄(𝐸𝐶50 + 𝐶(𝑡))) even if change from baseline is used and despite the ability to achieve a similar predicted outcome potentially. The reason is the assumption of the random effects between the two models will lead to a different marginal variance. Thus, when considering random effects and hence the population, the model - 30 - State Space and Marginal Probability Models space is a little richer than that suggested by Hu et al (see [6]) in that there are more than 3 possible IRM models. In summation, this research suggests that more random effects should be evaluated when evaluating longitudinal binary or categorical data. Failure to incorporate random effects or incorrect specification thereof can lead to biased predictions and less than nominal inference rates. ACKNOWLEDGEMENT The authors would like to acknowledge graciously and thank kindly Dr. Robert Bauer for making the bivariate normal subroutine available for general use in NONMEM. REFERENCES 1 2 3 4 5 6 7 8 9 10 11 12 13 14 Sheiner LB (1994) A new approach to the analysis of analgesic drug trials, illustrated with bromfenac data. Clin Pharmacol Ther 56:309-322 Lacroix BD, Lovern MR, Stockis A, Sargentini-Maier ML, Karlsson MO, Friberg LE (2009) A pharmacodynamic Markov mixed-effects model for determining the effect of exposure to certolizumab pegol on the ACR20. Clin Pharmacol Ther 86:387-95 Hutmacher MM, French JL (2011) Extending the latent variable model for extra correlated longitudinal dichotomous responses. J Pharmacokinet Pharmacodyns 38:833-859 Hutmacher MM, Krishnaswami S, Kowalski KG (2008) Exposure-response modeling using latent variables for the efficacy of a JAK3 inhibitor administered to rheumatoid arthritis patients. J Pharmacokinet Pharmacodyns 35:139–157 Drezner Z, Wesolowsky GO (1990) On the computation of the bivariate normal integral. J Stat Comput Simul 35:101–107 Hu C, Xu Z, Mendelsohn Am, Zhou H (2013) Latent variable indirect response modeling of categorical endpoints representing change from baseline. J Pharmacokinet Pharmacodyns 40:81–91 Hu C (2014) Exposure–response modeling of clinical end points using latent variable indirect response models. CPT Pharmacometrics Syst. Pharmacol. 3:1–8 Yu L, Griffith WS, Tyas SL, Snowdon DA, Kryscio RJ (2010) A nonstationary Markov transition model for computing the relative risk of dementia before death Stat Med. 29:639-648 Bishop YMM, Fienberg SE, Holland PW Discrete multivariate analysis: theory and practice. The MIT Press, Massachusetts, 1975, pp. 486-502 Pilla ReddyV, Petersson KJ, Suleiman AA, Vermeulen A, Proost JH, Friberg LE (2012) Pharmacokinetic– pharmacodynamic modeling of severity levels of extrapyramidal side effects with Markov elements. CPT Pharmacometrics Syst. Pharmacol. 1: 1−9 Guerra MW, Shults J, Amsterdam J, Ten-Have T (2012) The analysis of binary longitudinal data with timedependent covariates. Stat Med 31:931-948 Johansson ǺM, Ueckert S, Plan EL, Hooker AC, Karlsson MO (2014) J Pharmacokinet Pharmacodyn (2014) 41:223–238 Gurka MJ, Edwards LJ, Muller KE (2011) Avoiding bias in mixed model inference for fixed effects. Stat Med. 30(22):2696-2707 Dayneka NL, Garg V, Jusko WJ (1993) Comparison of four basic models of indirect pharmacodynamic response. J Pharmacokinet Biopharm. 21(4):457-478. - 31 - State Space and Marginal Probability Models Appendices Appendix A The bivariate normal subroutine is available at ftp://nonmem.iconplc.com/Public/nonmem/bivariate/ along with some sample control streams. Below is an example control stream that corresponds to the simulation study. $PROB Example Control Stream $INPUT SIM ID DOSE DV DV_ TIME TIME ;***DV_ ,TIME_ are previous DV, TIME***; ;*** – ie, DV = 𝑦𝑖𝑗 , DV_=𝑦𝑖𝑗−1, TIME=𝑡𝑖𝑗 , TIME_=𝑡𝑖𝑗−1***; $DATA ex01.csv IGNORE=@ $SUBROUTINES OTHER=bivariate.f90 ;***Include bivariate normal subroutine***; $PRED B1=THETA(1) ;***𝛽1***; B2=THETA(2) ;***𝛽2***; B3=THETA(3) ;***𝛽3***; K =LOG(2)/EXP(THETA(4)) ;***used for placebo onset – 𝑈(𝑡𝑖𝑗 )***; ED50=EXP(THETA(5)) - 32 - State Space and Marginal Probability Models ET1=EXP(THETA(6))*ETA(1) ;***parameterized for smoothed parametric bootstrap***; ET2=EXP(THETA(7))*ETA(2) RHO=(2/(1+EXP(-THETA(8)))-1)**(TIME-TIME_) ;***correlation function 𝜈𝑖𝑗𝑗′ (𝜌)***; U = (1-EXP(-K*TIME)) U_ = (1-EXP(-K*TIME_)) MX =B1+B2*U+B3*DOSE/(DOSE+ED50)+ET1+ET2*U ;*** 𝜇𝑖𝑗 + 𝑔𝑖𝑗 𝜂𝑖 ***; MX_ =B1+B2*U_+B3*DOSE/(DOSE+ED50)+ET1+ET2*U_ ;*** 𝜇𝑖𝑗−1 + 𝑔𝑖𝑗−1 𝜂𝑖 ***; MX_=B1+B2*(1-EXP(-K*TIME_))+B3*DOSE/(DOSE+ED50)+U1+U2*(1-EXP(-K*TIME_)) (0) (1) PC =(1- PHI(MX))*(1-DV ) + PHI(MX) *DV ;*** 𝑝̇𝑖𝑗 (1 − 𝑦𝑖𝑗 ) + 𝑝̇𝑖𝑗 𝑦𝑖𝑗 ***; (0) (1) PC_ =(1-PHI(MX_))*(1-DV_) + PHI(MX_)*DV_ ;*** 𝑝̇𝑖𝑗−1 (1 − 𝑦𝑖𝑗−1 ) + 𝑝̇𝑖𝑗−1 𝑦𝑖𝑗−1***; IF(PC.LE.0.0D+00) EXIT ;***avoid issues of LOG(0)***; IF (TIME.EQ.1) THEN ;***signifies first observation***; LOGL=LOG(PC) ELSE ;*** Start Passing Bivariate Normal Information - only do so on subsequent records***; - 33 - State Space and Marginal Probability Models VECTRA(1)=RHO VECTRA(2)=MX VECTRA(3)=MX_ VECTRA(4)=1 ;***0 = Upper tail as in Drezner & Wesolowsky; 1 = Bottom tail***; VECTRA(5)=1 ;***0 = 3 pt approximation; 1 = 5 point approximation***; BV=FUNCA(VECTRA) ;***return bivariate normal results***; ;***code joint probability based on bottom tail***; JP=(DV-1)*(DV_-1)+(DV-1)*(1-2*DV_)*PHI(MX_)+(DV_-1)*(1-2*DV)*PHI(MX)+(12*DV)*(1-2*DV_)*BV LOGL=LOG(JP/PC_) ;***Compute population predictions***; V=SQRT(1+EXP(THETA(6))**2+EXP(THETA(7))**2*U**2) ;***V=SQRT(1+OMEGA(1,1)+OMEGA(2,2)*U**2) use if estimating OMEGA***; POPP = (B1+B2*U +B3*DOSE/(DOSE+ED50))/V ;***Population mean prediction***; ;***Set up log-likelihood***; $THETA - 34 - State Space and Marginal Probability Models -2.4 ; 1 B1 2.0 ; 2 B2 3.4 ; 3 B3 1.4 ; 4 LOG(B4) 0.7 ; 5 LOG(B5) 0.1 ; 6 LOG SQRT VAR(ETA1) 0.5 ; 7 LOG SQRT VAR(ETA2) 0.6 ; 8 RHO parameter $OMEGA DIAGONAL(2) 1 FIXED ; V1 1 FIXED ; V2 $EST MAX=8000 PRINT=10 METHOD=1 LAPLACE -2LL SIGL=10 NOHABORT ;***$EST METHOD=IMP LAPLACE -2LL NOHABORT PRINT=1 NITER=500 CTYPE=3 $COV COMPRESS MATRIX=R PRINT=E UNCONDITIONAL - 35 - State Space and Marginal Probability Models A version of this control stream which uses features of NONMEM to generate DV_, MX_ etc is provided in the Supplemental Material. That version should be helpful when dealing with models that employ differential equations. Appendix B The calculation of bias for the population transition probabilities (eliminating the subscript i for simplicity and indexing two arbitrary times by 1 and 2) requires the quantity 𝐸𝜂 Φ2 ( 𝜇́ 1 , 𝜇́ 2 , 𝜈12 (𝜌)), where 𝜇́ 1 = 𝜇1 − 𝑔1 𝜂. From the latent variable formulation ∞ 𝜇1 ∞ 𝜇2 𝐸𝜂 Φ2 = ∫ Φ2 (𝜇́ 1 , 𝜇́ 2 , 𝜈12 (𝜌)) = ∫ ∫ ∫ 𝜙2 (𝑧1 , 𝑧2 , 𝜈12 (𝜌)) 𝑑𝑧1 𝑑𝑧2 −∞ where 𝜙2 (𝑧1 , 𝑧2 , 𝜈12 (𝜌)) = −∞ −∞ −∞ 1 2𝜋√1−𝜈12 (𝜌)2 𝑒𝑥𝑝 {− (𝑧12 −2𝜈12 (𝜌)𝑧1 𝑧2 +𝑧22 ) 2(1−𝜈12 (𝜌)2 ) }. Changing the order of integration and making a suitable change of variables based on the square root of the diagonal elements of Ξ (based on times 1 and 2 only), 𝐸𝜂 Φ2 = Φ2 (𝜇1 , 𝜇2 , 𝜅12 (𝜌, Ω)) where 𝜅12 (𝜌) is the off-diagonal from [𝑑𝑖𝑎𝑔(Ξ)]−1⁄2 Ξ[𝑑𝑖𝑎𝑔(Ξ)]−1⁄2, which for the example ⁄ ⁄ yields 𝜅12 (𝜌) = (𝜈12 (𝜌) + Ω11 + Ω22 𝑈1 𝑈2 )𝑉1−1 2 𝑉2−1 2 . Monte Carlo methods could also be (1|0) used: 𝑝𝑖𝑗 (0) −1 (1,0)∗ (0|1) ≅ [𝑝𝑖𝑗−1 ] [𝑀1 ∑𝑀 𝑚=1 𝑝̇𝑖𝑗𝑗−1 ] and 𝑝𝑖𝑗 (1) −1 (0,1)∗ ≅ [𝑝𝑖𝑗−1 ] [𝑀1 ∑𝑀 𝑚=1 𝑝̇𝑖𝑗𝑗−1 ], where the ‘*’ ∗ indicates the probability is a function of 𝜂𝑚 which is sampled from 𝑁(0, Ω), with Ω estimated, using a sufficiently large M. - 36 - State Space and Marginal Probability Models Appendix C ̂ ) correspond to their estimates, Let 𝜓 = (𝛽, Ω) be the vector of parameters and 𝜓̂ = (𝛽̂ , Ω ̂. Then the variance of ̂ (𝜓̂) = Δ which have an estimated covariance matrix (eg, COV step) 𝑉𝑎𝑟 the prediction can be calculated using (1) (1) ̂ [Φ−1(𝑝𝑖𝑗 𝑉𝑎𝑟 )] ≈ ∂ [Φ−1 (𝑝𝑖𝑗 )] ∂𝜓 𝑇 (1) ̂ Δ | ∂ [Φ−1 (𝑝𝑖𝑗 )] ∂𝜓 | ̂ 𝜓=𝜓 ̂ 𝜓=𝜓 such that the confidence limit (eg, at 5th percentile) is (1) ̂ [Φ−1 (𝑝(1) )]] 𝐶𝐿0.05 = Φ [Φ−1 (𝑝𝑖𝑗 ) + 𝑍0.05 ∙ √𝑉𝑎𝑟 𝑖𝑗 where 𝑍0.05 is the quantile corresponding to probability level 0.05. To apply this method when a closed form solution to the population mean is not available, suitable regulatory conditions are required. In general, let 𝐸(𝑌|𝜂) = 𝜇(𝜂) and 𝐸(𝑌; 𝜓) = 𝐸𝜂 (𝜇(𝜂); 𝜓) where inference on 𝐸(𝑌) is desired. Applying the delta-method 𝑇 ∂[𝐸(𝑌; 𝜓)] ∂[𝐸(𝑌; 𝜓)] ̂ ̂ [𝐸(𝑌; 𝜓)] ≈ 𝑉𝑎𝑟 | Δ | ∂𝜓 ∂𝜓 ̂ ̂ 𝜓=𝜓 𝜓=𝜓 where passing the differentiation through the integration 𝑀 ∗ ); ∂[𝐸𝜂 (𝜇(𝜂); 𝜓)] 𝐸𝜂 [∂(𝜇(𝜂); 𝜓)] ∂[𝐸(𝑌; 𝜓)] ∂(𝜇(𝜂𝑚 𝜓) | = | = | ≅∑ | ∂𝜓 ∂𝜓 ∂𝜓 ∂𝜓 ̂ ̂ ̂ ̂ 𝜓=𝜓 𝜓=𝜓 𝜓=𝜓 𝜓=𝜓 𝑖=1 - 37 - State Space and Marginal Probability Models ̂ ). Finite differences could be used, for example where 𝜂 ∗ ~𝑀𝑉𝑁(0, Ω ∗ );𝜓) ∂(𝜇(𝜂𝑚 ∂𝜓 = ∗ ); ∗ ); [(𝜇(𝜂𝑚 𝜓 + 𝛿) − (𝜇(𝜂𝑚 𝜓)]/𝛿. Appendix D Let the response 𝑌 be one of K+1 ordered values 0, 1, 2, … , K-1, K, and let Z be defined as in equation (16). For this derivation (suppressing 𝜂 when convenient and with some convenient abuse of notation), assume the mapping 𝑃(𝑌 = 0) = 𝑃(𝛾1 < 𝑍 < 𝛾0 ), … , 𝑃(𝑌 = 𝐾) = 𝑃(𝛾𝐾+1 < 𝑍 ≤ 𝛾𝐾 ) where the 𝛾0 = ∞ > 𝛾1 > ⋯ > 𝛾𝐾−1 > 𝛾𝐾 > 𝛾𝐾+1 = −∞ are thresholds (K parameters), and let 𝜇́ 𝑦𝑗 = 𝛾𝑦𝑗 − 𝑓𝑗 − 𝑔𝑗 𝜂 for time j and 𝜈12 = 𝜈12 (𝜌). 𝐾 (𝑦 ) 𝑝̇1 1 (𝑘) = 𝑃(𝑌1 = 𝑦1 ) = 𝑃(𝛾𝑦1 +1 < 𝑍1 ≤ 𝛾𝑦1 ) = Φ(𝜇́ 𝑦1 ) − Φ(𝜇́ 𝑦1+1 ), ∑ 𝑝̇1 =0 𝑘=0 (𝑦 ,𝑦2 ) 𝑝̇121 = 𝑃(𝑌1 = 𝑦1 , 𝑌2 = 𝑦2 ) = 𝑃(𝛾𝑦1+1 < 𝑍1 ≤ 𝛾𝑦1 , 𝛾𝑦2 +1 < 𝑍2 ≤ 𝛾𝑦2 ) = Φ2 (𝜇́ 𝑦1 , 𝜇́ 𝑦2 , 𝜈12 ) − Φ2 (𝜇́ 𝑦1 +1 , 𝜇́ 𝑦2 , 𝜈12 ) − Φ2 (𝜇́ 𝑦1 , 𝜇́ 𝑦2 +1 , 𝜈12 ) + Φ2 (𝜇́ 𝑦1+1 , 𝜇́ 𝑦2 +1 , 𝜈12 ) where Φ(−∞) = 0, Φ(∞) = 1, and in this notation Φ2 (∞, ∞, 𝜈12 ) = 1; Φ2 (𝑥, ∞, 𝜈12 ) = Φ2 (∞, 𝑥, 𝜈12 ) = Φ(𝑥); and Φ2 (𝑥, −∞, 𝜈12 ) = Φ2 (𝑥, −∞, 𝜈12 ) = Φ2 (−∞, ∞, 𝜈12 ) = Φ2 (∞, −∞, 𝜈12 ) = 0. - 38 - State Space and Marginal Probability Models - 39 - State Space and Marginal Probability Models Table 1 Correlations for the Latent Variables (Z) and Responses (Y) Correlation in LV (Z) Correlation in Responses (Y)a ρ/ 0 0.3 0.5 0.7 0.9 0 0.3 0.5 0.7 0.9 Elementb 1,2 0.52 0.66 0.76 0.85 0.94 0.30-0.34 0.40-0.47 0.48-0.55 0.59-0.64 0.75-0.78 1,3 0.52 0.53 0.57 0.66 0.83 0.29-0.34 0.31-0.35 0.33-0.39 0.41-0.45 0.56-0.61 1,4 0.49 0.49 0.49 0.52 0.67 0.27-0.32 0.28-0.32 0.27-0.32 0.29-0.33 0.40-0.45 1,5 0.47 0.47 0.47 0.47 0.54 0.25-0.29 0.25-0.29 0.26-0.31 0.26-0.29 0.31-0.34 1,6 0.46 0.46 0.46 0.46 0.49 0.24-0.30 0.26-0.29 0.25-0.29 0.25-0.29 0.26-0.32 1,7 0.46 0.46 0.46 0.46 0.47 0.25-0.30 0.25-0.29 0.25-0.29 0.25-0.29 0.25-0.30 2,3 0.57 0.60 0.67 0.77 0.90 0.34-0.38 0.36-0.41 0.41-0.47 0.50-0.56 0.68-0.71 2,4 0.56 0.56 0.57 0.61 0.76 0.31-0.37 0.33-0.37 0.33-0.38 0.36-0.42 0.49-0.54 2,5 0.55 0.55 0.55 0.56 0.63 0.31-0.36 0.32-0.36 0.33-0.36 0.32-0.37 0.38-0.42 2,6 0.55 0.55 0.55 0.55 0.58 0.31-0.36 0.32-0.35 0.32-0.36 0.32-0.36 0.34-0.39 2,7 0.55 0.55 0.55 0.55 0.56 0.31-0.36 0.32-0.35 0.31-0.36 0.31-0.36 0.31-0.36 3,4 0.65 0.65 0.67 0.73 0.87 0.39-0.44 0.39-0.45 0.41-0.47 0.46-0.51 0.63-0.66 3,5 0.65 0.65 0.65 0.66 0.74 0.38-0.45 0.40-0.45 0.39-0.45 0.41-0.45 0.48-0.52 3,6 0.65 0.65 0.65 0.65 0.69 0.39-0.45 0.40-0.45 0.40-0.45 0.40-0.45 0.43-0.48 3,7 0.65 0.65 0.65 0.65 0.66 0.40-0.45 0.40-0.44 0.40-0.45 0.39-0.45 0.40-0.46 4,5 0.73 0.73 0.73 0.74 0.84 0.46-0.51 0.46-0.52 0.46-0.52 0.48-0.53 0.59-0.64 4,6 0.73 0.73 0.73 0.73 0.78 0.47-0.52 0.47-0.53 0.47-0.52 0.47-0.53 0.52-0.56 4,7 0.73 0.73 0.73 0.73 0.75 0.47-0.51 0.48-0.53 0.49-0.52 0.48-0.53 0.48-0.53 5,6 0.77 0.77 0.77 0.78 0.87 0.51-0.56 0.51-0.56 0.50-0.56 0.52-0.57 0.63-0.67 5,7 0.77 0.77 0.77 0.77 0.80 0.50-0.55 0.51-0.56 0.50-0.56 0.51-0.56 0.53-0.59 6,7 0.78 0.78 0.78 0.78 0.84 0.52-0.56 0.51-0.56 0.51-0.57 0.52-0.57 0.58-0.63 aCorrelation in responses calculated by simulation (n=20,000 per dose group), range reported is over dose groups (0 to 30 mg) bElement relates indices of the matrices – eg, 2,7 is 2nd row, 7th column. Diagonals of the matrices = 1 by definition. - 40 - State Space and Marginal Probability Models Captions for Figures Fig. 1. Results for the illustration. Black line is true value. Grey lines with numbers correspond to results from Models 1 − 5, respectively. Fig. 2. Percentage bias in the parameter estimates for models M0 through M7. Fig. 3. Range of bias (line segment) and median bias (point) across these doses and times over the 1000 simulations for the population predictions (𝑝(1) − first row) and transition probabilities (𝑝(1|0) and 𝑝(0|1) − second and third row, respectively). Last row corresponds to 90% CI coverage rates across doses and times over the simulations. Black is for the smoothed parametric bootstrap and grey is for the delta method. - 41 - State Space and Marginal Probability Models Figure 1 - 42 - - 43 Parameter -6 0                                                                 -1 0 0 -8 0   40   20   M0 M1 M2 M3 M4 M5 M6 M7   0    -2 0 % Bia s -4 0 Parameter                               -1 0 0 -8 0 -6 0 -2 0 % Bia s -4 0 0 20 40 State Space and Marginal Probability Models Figure 2  M0 M1 M2 M3 M4 M5 M6 M7 State Space and Marginal Probability Models Figure 3 - 44 -

EvalBiasMixedDichotomousData_13APR2015

Related documents

Products

Support

EvalBiasMixedDichotomousData_13APR2015

Related documents

Add this document to collection(s)

Add this document to saved

Suggest us how to improve StudyLib