EvalBiasMixedDichotomousData_13APR2015

Markov, Latent Variable, State-Space, or Marginal Probability Models in
Pharmacometrics: Concepts and an Evaluation of Bias
Matthew M. Hutmacher1,*
1. Ann Arbor Pharmacometrics Group (A2PG)
301 N. Main St, Suite 102
Ann Arbor, MI 48104
* Corresponding author. E-mail address: matt.hutmacher@a2pg.com
State Space and Marginal Probability Models
Abstract
One might get the impression from reading the pharmacometric literature that Markov and latent
variable (LV) models for longitudinal binary or ordered categorical data are distinct approaches.
The approaches are related, however, when autocorrelation between responses within an
individual is considered, because both use the Markov property. An alternate terminology
(albeit not new) for describing these modeling approaches is proposed that aligns with the aim
and structure of each approach. The term state space (SS) is proposed for models in which the
model structure is linked to transition probabilities. The primary focus of SS models is the
transition from one response state to another. The term marginal probability (MP) is proposed
for models in which the primary focus of the model structure is the marginal probabilities, and
the transition probabilities are derived. One might question whether the transition probabilities need to
be characterized adequately to ensure accurate prediction of the marginal probabilities. To
address this question, a simulation study was conducted to assess bias in estimation, prediction, and
inferences when the transition probabilities are misspecified and the autocorrelation is stochastic.
The results may be surprising to many: they suggest that characterizing autocorrelation,
when stochastic, is not as important as specifying a suitably rich random effects structure.
KEY WORDS: latent variable; probit; autocorrelation; generalized nonlinear mixed-effects
models; Markov model
Introduction
Generalized nonlinear mixed effects models have been used increasingly for modeling
longitudinal binary or ordered categorical outcomes since their introduction into the
pharmacometric literature over 20 years ago [1]. Methods for addressing and handling additional
within-subject correlation, or autocorrelation (correlation not addressed by subject-specific
random effects), have been discussed more recently. For example, Lacroix et al. modeled the
probabilities for transitioning between responder status and dropout, where responder status was
defined by the American College of Rheumatology 20% responder criterion (ACR20) [2]. These
transition probabilities, or conditional probabilities of moving between states given observation
of the previous states, address the autocorrelation. Exposure and time effects, as well as subject-specific random effects, were linked to the transition probabilities directly. Modeling transition
probabilities using Markov components is the most frequently used method to include
autocorrelation in a pharmacometric model, and it is easy to implement. No special functions are
required and the approach can be used with any link function – eg, logit, probit, etc.
Hutmacher and French addressed the issue of autocorrelation within the latent variable (LV)
framework [3]. Autocorrelation was assumed to be random or stochastic and manifested in the
latent residuals. Exposure and time effects were applied to the probabilities of the responses, not
the transition probabilities. The general latent residual correlation structure presented by
Hutmacher and French was not easily implemented, however, within the software currently available for
generalized nonlinear mixed effects models, which restricted its general utility.
The Markov and LV approaches to modeling autocorrelation differ primarily in focus and
practical implementation, not in the general construction of the model likelihood. The
approaches are related at a fundamental level through use of the Markov property. Thus, a
different terminology is proposed that better aligns with the intent, structure and interpretation of
the models. The term state space (SS) is proposed for direct modeling of, or applying structural
effects to, the transition probabilities (TP). The focus of this model is characterizing the
transitions from state-to-state. The term marginal probability (MP) is proposed for models in
which structural effects are applied to probabilities that are not conditional on the previous
response. The LV approach as described by Hutmacher et al. [4] implies the MP approach*.
Hopefully, the rationale for the proposed change in terminology shall become clearer after a
more formal exposition of each framework.
Despite the different objectives and utilities of the SS and MP approaches, one might be
concerned about the adequacy of the MP approach if the within-subject correlation or TPs are
modeled inappropriately. One might have this concern even if marginal probabilities are
the primary quantity of interest. Large decreases in the objective function value (OFV) have
been noted when using the SS approach (eg, see [2]). If the OFVs are interpreted as a statistic
related to the goodness-of-fit (to the extent that it should be in this context is another matter),
then one might be concerned that an approach that does not accurately predict the TPs is
inadequate.
* The LV concept was used to provide a framework for including indirect response model (IRM) constructs into
generalized nonlinear mixed effects models. The LV was used in a way similar to modeling continuous data with
IRMs in which the current response does not depend upon the previously observed response value deterministically
other than through the IRM model. An LV construct could be applied to the state space approach. For example, the
threshold for achieving a response could depend upon the previous response. The interpretation may be awkward
from the pharmacometric drug-development point of view, however. Nevertheless, marginal probability is used
here to maintain focus on the difference in concepts between the two approaches.
This article, in relation and extension to these previous works, seeks to address two
objectives. The first objective is to demonstrate the relationship between the Markov and LV
approaches when autocorrelation is present. In so doing, a simpler, more user-friendly method
for implementing stochastic autocorrelation in the MP approach is presented. A simple
simulation/estimation exercise is presented to help delineate the SS and MP concepts. Further
insight by comparison is provided by looking at the steady state condition of SS models. The
approaches are also contrasted in how missing data need to be addressed. The second objective
is the evaluation of biases in parameter estimation and prediction, and the effects of these on
inference, when the random effects or correlation structure is misspecified. The biases are
evaluated using a simulation study in which autocorrelation is assumed to be stochastic, and
biases in predicting the MP are interpreted with respect to the accuracies of the TP predictions.
A method of handling autocorrelation using link functions other than the probit is also described
for general utility. The results suggest that characterizing stochastic autocorrelation is not as
important as specifying a suitably rich random effects structure.
Relationship between the Markov and LV Models
๐‘‡
Let ๐‘Œ๐‘– = (๐‘Œ๐‘–1 , โ‹ฏ , ๐‘Œ๐‘–๐‘›๐‘– ) be a vector of binary or Bernoulli (0 or 1) observations collected at
times ๐‘ก๐‘–๐‘— , j = 1, 2…,ni. The joint probability of the responses, conditioned on the subject-specific
random effects, η, can be derived by conditioning on the prior observations
๐‘ƒ(๐‘Œ๐‘–1 = ๐‘ฆ๐‘–1 , โ‹ฏ , ๐‘Œ๐‘–๐‘›๐‘– = ๐‘ฆ๐‘–๐‘›๐‘– |๐œ‚)
๐‘›
๐‘–
= ๐‘ƒ(๐‘Œ๐‘–1 = ๐‘ฆ๐‘–1 |๐œ‚) ∏๐‘—=2
๐‘ƒ(๐‘Œ๐‘–๐‘— = ๐‘ฆ๐‘–๐‘— |๐‘Œ๐‘–1 = ๐‘ฆ๐‘–1 , โ‹ฏ , ๐‘Œ๐‘–๐‘—−1 = ๐‘ฆ๐‘–๐‘—−1 , ๐œ‚)
(1)
Assuming the current observation is related only to the previous response, ie, the autoregression
assumption $P(Y_{ij} = y_{ij} \mid Y_{i1} = y_{i1}, \cdots, Y_{ij-1} = y_{ij-1}, \eta) = P(Y_{ij} = y_{ij} \mid Y_{ij-1} = y_{ij-1}, \eta)$, then
(1) can be simplified to
$$P(Y_{i1} = y_{i1}, \cdots, Y_{in_i} = y_{in_i} \mid \eta) = P(Y_{i1} = y_{i1} \mid \eta) \prod_{j=2}^{n_i} P(Y_{ij} = y_{ij} \mid Y_{ij-1} = y_{ij-1}, \eta) \tag{2}$$
The derivation of (2) is well known and illustrates the Markov property (of order 1).
The Markov approach, as implemented typically in the pharmacometric literature, uses
structural models for the probability of the current observation $Y_{ij}$ given the previous observation
$Y_{ij-1}$. Fixed effects parameters for the baseline, placebo and drug exposure model components
are applied to $h[P(Y_{ij} = y_{ij} \mid Y_{ij-1} = y_{ij-1}, \eta)]$, where $h$ is a link function such as the logit. The
$P(Y_{ij} = y_{ij} \mid Y_{ij-1} = y_{ij-1}, \eta)$ can be viewed as transition probabilities (TPs), where a subject
moves from state $y_{ij-1}$ to state $y_{ij}$. If the $Y_{ij}$ are not interpreted as states, but instead perhaps as
discrete measurements or observations from an underlying continuous distribution, then
modeling the TPs might be interpreted as an approach which addresses autocorrelation
deterministically within a subject.
The LV approach, as presented in the pharmacometric literature, addresses autocorrelation
through LV residuals. The probability of response $Y_{ij}$ is modeled through fixed and subject-specific random effects when specifying $P(Y_{ij} = y_{ij} \mid \eta)$; that is, these effects are applied to
$h[P(Y_{ij} = y_{ij} \mid \eta)]$, where $h$ is a link function. Stochastic autocorrelation between $Y_{ij}$ and $Y_{ij-1}$
can be introduced through the joint probability, $P(Y_{ij} = y_{ij}, Y_{ij-1} = y_{ij-1} \mid \eta)$. The overall joint
probability can be expressed as
$$P(Y_{i1} = y_{i1}, \cdots, Y_{in_i} = y_{in_i} \mid \eta) = P(Y_{i1} = y_{i1} \mid \eta) \prod_{j=2}^{n_i} \frac{P(Y_{ij} = y_{ij}, Y_{ij-1} = y_{ij-1} \mid \eta)}{P(Y_{ij-1} = y_{ij-1} \mid \eta)} \tag{3}$$
Thus, the Markov and autocorrelated LV approaches are related through
$$P(Y_{ij} = y_{ij} \mid Y_{ij-1} = y_{ij-1}, \eta) = \frac{P(Y_{ij} = y_{ij}, Y_{ij-1} = y_{ij-1} \mid \eta)}{P(Y_{ij-1} = y_{ij-1} \mid \eta)} \tag{4}$$
From this, the LV approach with autocorrelation can be seen to be a Markov-based approach as
well. Structurally modeling the TP, $P(Y_{ij} = y_{ij} \mid Y_{ij-1} = y_{ij-1}, \eta)$, implies the focus of the
model is on characterizing the transitions from state-to-state or a state space (SS) structure.
Structurally modeling $P(Y_{ij} = y_{ij} \mid \eta)$ using an LV structure with stochastic autocorrelation
mediated by $P(Y_{ij} = y_{ij}, Y_{ij-1} = y_{ij-1} \mid \eta)$ implies a focus on the MP. Consequently, the SS and
MP terminology is proposed.
Notation of Model Quantities
Notation is introduced to simplify exposition and clarify concepts for the remainder of the
article. The notation is also intended to help differentiate between conditioning on random
effects and prior observations. Let $j$ index the measurement time for the $i$th subject, and $\eta$ be a
vector of subject-specific random effects. Probabilities, conditioned on $\eta$, are defined using a dot
as follows: $\dot{p}_{ijj-1}^{(y_{ij}|y_{ij-1})} = P(Y_{ij} = y_{ij} \mid Y_{ij-1} = y_{ij-1}, \eta)$ is the conditional or TP of $Y_{ij}$ based on
observing the previous response $Y_{ij-1}$; $\dot{p}_{ij}^{(y_{ij})} = P(Y_{ij} = y_{ij} \mid \eta)$ is the unconditional or MP of
observing $Y_{ij}$ (unconditional on the previous response); and $\dot{p}_{ijj-1}^{(y_{ij},y_{ij-1})} = P(Y_{ij} = y_{ij}, Y_{ij-1} = y_{ij-1} \mid \eta)$
is the joint probability of $Y_{ij}$ and $Y_{ij-1}$. Another level derives from probabilities that are
not conditioned on $\eta$. These probabilities are written without the dot: $p_{ij}^{(y_{ij})} = E_\eta[\dot{p}_{ij}^{(y_{ij})}] = P(Y_{ij} = y_{ij})$
is the MP of observing $Y_{ij}$; $p_{ijj-1}^{(y_{ij},y_{ij-1})} = E_\eta[\dot{p}_{ijj-1}^{(y_{ij},y_{ij-1})}] = P(Y_{ij} = y_{ij}, Y_{ij-1} = y_{ij-1})$
is the joint probability of $Y_{ij}$ and $Y_{ij-1}$; and $E_\eta$ represents the expectation operator with
respect to η. These MPs are termed population probabilities (PPs) hereinafter in an attempt to
avoid confusion due to multiple uses of the term marginal.† A natural definition of population
TPs is $E_\eta[\dot{p}_{ijj-1}^{(y_{ij}|y_{ij-1})}]$. This definition is not as interesting for MP models without stochastic
autocorrelation, because $\dot{p}_{ijj-1}^{(y_{ij},y_{ij-1})} = \dot{p}_{ij}^{(y_{ij})} \dot{p}_{ij-1}^{(y_{ij-1})}$ due to conditional independence. Thus, the
population TP results in $p_{ij}^{(y_{ij})}$ when there is no stochastic autocorrelation. Instead, population
TPs are defined as $p_{ijj-1}^{(y_{ij},y_{ij-1})} / p_{ij-1}^{(y_{ij-1})}$ in this article. This definition also highlights that $\eta$
affects transitions, the influence of which can be illustrated by considering the simple case of an
additive random effect on the baseline of the logit or probit scale. Subjects with very large (or
small) values of $\eta$ will tend to have $\dot{p}_{ij}^{(y_{ij})}$ close to 1 (or 0) and thus will not have as many transitions
despite having (conditionally) independent observations. Note that the PPs are correlated
through $\eta$ even if there is no stochastic autocorrelation.
Modeling probabilities requires cumulative distribution functions. In this article, $\Phi(x_{ij})$
represents the normal cumulative distribution function at quantile $x_{ij}$. $\Phi_2(x_{ij}, x_{ij'}, \nu_{ijj'}(\rho))$
represents the bivariate normal c.d.f., or an approximation thereto, at quantiles $x_{ij}$ and $x_{ij'}$, where
$\nu_{ijj'}(\rho)$ is the autocorrelation function, which depends on the parameter $\rho$. The function $\nu_{ijj'}$ is
defined for subject $i$ between times $j$ and $j'$ because the correlation and sampling times could be
different between the subjects.
† The population (marginal) probability $p_{ij}^{(y_{ij})}$ is often the quantity of interest when predicting from a model.
However, this probability cannot be directly modelled, because a closed form does not exist for the multivariate
probability distribution. These probabilities must be derived from the model secondarily.
Illustration of State-Space and Marginal Probability Models
The relationship between the SS and MP approaches was provided in the previous section.
In this section, the implied intent of the SS and MP approaches is clarified and contrasted
through modeling a simple example. The data for the example were simulated using stochastic
autocorrelation to help delineate the differences between the approaches.
Consider a simple hypothetical case where a drug is given, its effects achieve steady state by
the first sample time, and there is no placebo effect. Let the probability of being a responder be
$p_{ij}^{(1)} = \dot{p}_{ij}^{(1)} = 0.75$ across all subjects ($i$) and times ($j$, $t_j \in \{1, 2, 4, 8, 16\}$). The equality, $p_{ij}^{(1)} = \dot{p}_{ij}^{(1)}$,
indicates that $Var(\eta) = 0$. Clearly this is an idealization, yet it is useful for illustration.
Two structures for $\nu_{ijj'}$ were considered: the autoregressive structure, AR(1), where $\nu_{ijj-1} = \rho$,
and the spatial autoregressive structure, AR(1S), where $\nu_{ijj-1} = \rho^{|t_{ij} - t_{ij-1}|}$. Both scenarios used
$\rho = 0.5$ with data from 1000 virtual subjects.
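One way such data could be generated is sketched below in Python. This is an illustration under stated assumptions, not the simulation code used for the article: the latent residual is assumed to follow a Gaussian autoregressive recursion, a response of 1 is recorded when the latent value falls at or below the threshold, and the function name `simulate_subject` and the seed are hypothetical.

```python
import math
import random
from statistics import NormalDist


def simulate_subject(times, beta1, rho, spatial, rng):
    """Simulate one subject's binary responses from an autocorrelated latent variable.

    The latent residual is standard normal at every time point; Y = 1 when the
    latent value is at or below the threshold beta1 (a probit MP model with
    no random effects, ie, Var(eta) = 0).
    """
    z = rng.gauss(0.0, 1.0)
    ys = [1 if z <= beta1 else 0]
    for prev_t, t in zip(times, times[1:]):
        # AR(1): correlation rho between adjacent samples.
        # AR(1S): correlation rho ** |t - prev_t| between adjacent samples.
        r = rho ** abs(t - prev_t) if spatial else rho
        z = r * z + math.sqrt(1.0 - r * r) * rng.gauss(0.0, 1.0)
        ys.append(1 if z <= beta1 else 0)
    return ys


rng = random.Random(2015)             # arbitrary seed, for reproducibility only
times = [1, 2, 4, 8, 16]
beta1 = NormalDist().inv_cdf(0.75)    # threshold giving P(Y = 1) = 0.75
data = [simulate_subject(times, beta1, 0.5, True, rng) for _ in range(1000)]
observed = sum(map(sum, data)) / (len(data) * len(times))
print(f"observed responder fraction: {observed:.3f}")
```

Setting `spatial=False` gives the AR(1) scenario; in both cases the marginal probability stays at 0.75 at every time point, and only the dependence between neighbouring responses changes.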
Five models were fitted to the data. The models are enumerated below, each with a
reference to its model structure: 1) typical Markov model with assumed first state (5a), 2)
Markov model with estimated first state (5b), 3) constrained Markov model (5c), 4) an MP
model (5d), and 5) an MP model (5d) with a 3-point quadrature approximation to $\Phi_2$:
$$p_{ijj-1}^{(1|y_{ij-1})} = \Phi(\alpha_1 + \alpha_2 I[y_{ij-1} = 1]); \quad \text{assume } P(Y_{i0} = 0) = 1 \tag{5a}$$
$$p_{ijj-1}^{(1|y_{ij-1})} = \Phi(\alpha_1 + \alpha_2 I[y_{ij-1} = 1]),\ j > 1; \quad p_{ij}^{(1)} = \Phi(\alpha_0),\ j = 1 \tag{5b}$$
$$p_{ij}^{(1)} = \Phi(\beta_1),\ j \ge 1; \quad \dot{p}_{ijj-1}^{(1|0)} = \Phi(\beta_1 + \alpha_3),\ j > 1; \quad p_{ijj-1}^{(1|1)} = 1 - \frac{\Phi(\beta_1 + \alpha_3)[1 - \Phi(\beta_1)]}{\Phi(\beta_1)},\ j > 1 \tag{5c}$$
$$p_{ij}^{(1)} = \Phi(\beta_1),\ j \ge 1; \quad p_{ijj-1}^{(y_{ij}|y_{ij-1})} = \frac{\dot{p}_{ijj-1}^{(y_{ij},y_{ij-1})}}{\dot{p}_{ij-1}^{(y_{ij-1})}},\ j > 1; \tag{5d}$$
$$p_{ijj-1}^{(y_{ij},y_{ij-1})} = (y_{ij} - 1)(y_{ij-1} - 1) + (y_{ij} - 1)(-1)^{y_{ij-1}} p_{ij-1}^{(1)} + (y_{ij-1} - 1)(-1)^{y_{ij}} p_{ij}^{(1)} + (-1)^{(y_{ij} + y_{ij-1})} \Phi_2(\beta_1, \beta_1, \nu_{ijj-1})$$
๐‘Œ๐‘–0 represents a response at ๐‘ก = 0, which was not simulated as part of the design, yet is necessary
to start the Markov chain. ๐ผ[โˆ™] represents an indicator function that = 1 when the logical
condition is true and = 0 when false. Different symbols were used for the parameters to clarify
their different roles and interpretations between the models.
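The joint probability in (5d) can be checked numerically. The Python sketch below is an illustration only: `phi2` is a hypothetical helper that evaluates the bivariate normal c.d.f. by Simpson's rule on a one-dimensional integral (it is not the 3-point quadrature of Model 5), and the marginal probability is taken to be the same at both time points, as in this example.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def phi2(a, b, rho, n=2000):
    """Bivariate standard normal c.d.f. P(X <= a, Y <= b) with correlation rho.

    Simpson's rule applied to pdf(x) * Phi((b - rho * x) / sqrt(1 - rho^2))
    for x from -8 to a. Requires |rho| < 1 and an even n.
    """
    lo = -8.0
    if a <= lo:
        return 0.0
    h = (a - lo) / n
    s = math.sqrt(1.0 - rho * rho)

    def f(x):
        return STD_NORMAL.pdf(x) * STD_NORMAL.cdf((b - rho * x) / s)

    total = f(lo) + f(a)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(lo + k * h)
    return total * h / 3.0


def joint_prob(y, y_prev, beta1, nu):
    """Joint probability of (Y_ij = y, Y_ij-1 = y_prev), term by term as in (5d)."""
    p1 = STD_NORMAL.cdf(beta1)  # marginal P(Y = 1), identical at both times here
    return ((y - 1) * (y_prev - 1)
            + (y - 1) * (-1) ** y_prev * p1
            + (y_prev - 1) * (-1) ** y * p1
            + (-1) ** (y + y_prev) * phi2(beta1, beta1, nu))


def transition_prob(y, y_prev, beta1, nu):
    """TP derived as the joint probability over the previous marginal probability."""
    p1 = STD_NORMAL.cdf(beta1)
    marginal_prev = p1 if y_prev == 1 else 1.0 - p1
    return joint_prob(y, y_prev, beta1, nu) / marginal_prev


beta1 = STD_NORMAL.inv_cdf(0.75)
cells = {(y, yp): joint_prob(y, yp, beta1, 0.5) for y in (0, 1) for yp in (0, 1)}
print(cells, sum(cells.values()))
```

The four cells sum to 1, and summing over the previous response recovers the marginal probability of 0.75, which is the sense in which the TPs are derived rather than modeled.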
The two Markov models, Model 1 and Model 2 [(5a) and (5b), respectively] have two
parameters for modeling the TP, $\alpha_1$ and $\alpha_2$. The term $\alpha_2 I[y_{ij-1} = 1]$ reflects a change in the TP
as a function of observing $y_{ij-1} = 1$, and the magnitude of the change depends on $\alpha_2$; thus,
$p_{ijj-1}^{(1|0)} = \Phi(\alpha_1)$ and $p_{ijj-1}^{(1|1)} = \Phi(\alpha_1 + \alpha_2)$. In (5a), the response or state at time 0 is assumed to
be 0 and is forced to be so in the model (the first observations must result from a transition from
the 0 state or a recurrence of the 0 state). Model 2 in (5b) does not make any assumption about
the state of the prior time point. The parameter $\alpha_0$ models, through $\Phi(\alpha_0)$, the probability of being in the state = 
1 at time $t = 1$. Note that $\alpha_2$ reflects the degree of dependence between two contiguous
observations (ie, $\alpha_2 = 0$ means these are independent, or $p_{ijj-1}^{(1|0)} = p_{ijj-1}^{(1|1)} = \Phi(\alpha_1)$). For these
two models, the marginal probabilities can be derived using the initial state probability vector
and the transition probability matrix. The calculation for $t = 1$ is as follows:
$$\begin{bmatrix} p_{i0}^{(0)} & p_{i0}^{(1)} \end{bmatrix} \begin{bmatrix} p_{i10}^{(0|0)} & p_{i10}^{(1|0)} \\ p_{i10}^{(0|1)} & p_{i10}^{(1|1)} \end{bmatrix} = \begin{bmatrix} p_{i0}^{(0)} p_{i10}^{(0|0)} + p_{i0}^{(1)} p_{i10}^{(0|1)} & p_{i0}^{(0)} p_{i10}^{(1|0)} + p_{i0}^{(1)} p_{i10}^{(1|1)} \end{bmatrix} = \begin{bmatrix} p_{i10}^{(0,0)} + p_{i10}^{(0,1)} & p_{i10}^{(1,0)} + p_{i10}^{(1,1)} \end{bmatrix} = \begin{bmatrix} p_{i1}^{(0)} & p_{i1}^{(1)} \end{bmatrix} \tag{6}$$
where $p_{i10}^{(0|0)} + p_{i10}^{(1|0)} = 1$, $p_{i10}^{(0|1)} + p_{i10}^{(1|1)} = 1$, and $p_{i1}^{(0)} + p_{i1}^{(1)} = 1$ by definition. The
probabilities for a general time point j are calculated from repeated application of the TP matrix
as follows:
$$\begin{bmatrix} p_{i0}^{(0)} & p_{i0}^{(1)} \end{bmatrix} \begin{bmatrix} p_{i10}^{(0|0)} & p_{i10}^{(1|0)} \\ p_{i10}^{(0|1)} & p_{i10}^{(1|1)} \end{bmatrix} \begin{bmatrix} p_{i21}^{(0|0)} & p_{i21}^{(1|0)} \\ p_{i21}^{(0|1)} & p_{i21}^{(1|1)} \end{bmatrix} \cdots \begin{bmatrix} p_{ijj-1}^{(0|0)} & p_{ijj-1}^{(1|0)} \\ p_{ijj-1}^{(0|1)} & p_{ijj-1}^{(1|1)} \end{bmatrix} = \begin{bmatrix} p_{ij}^{(0)} & p_{ij}^{(1)} \end{bmatrix} \tag{7}$$
If the TP matrix is constant over all $j$ and equal to $P$, which is the case here, then (7) may be
simplified to
$$\begin{bmatrix} p_{i0}^{(0)} & p_{i0}^{(1)} \end{bmatrix} P^j = \begin{bmatrix} p_{ij}^{(0)} & p_{ij}^{(1)} \end{bmatrix} \tag{8}$$
due to j repeated matrix multiplications of P. The expression in (8) has implications with respect
to steady state probability predictions, which are useful in contrasting the SS and MP approaches
as shown hereinafter.
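The repeated application of the TP matrix in (8) can be sketched in a few lines of Python. The TP values below are hypothetical, chosen only so that the chain settles at a probability of 0.75 for state 1; they are not estimates from the article, and the function names are illustrative.

```python
def step(probs, tp):
    """One application of a 2x2 TP matrix: row vector times matrix.

    tp[a][b] is the probability of moving to state b from state a.
    """
    return [probs[0] * tp[0][b] + probs[1] * tp[1][b] for b in (0, 1)]


def marginal_probs(start, tp, j):
    """State probabilities after j applications of a constant TP matrix, as in (8)."""
    probs = list(start)
    for _ in range(j):
        probs = step(probs, tp)
    return probs


# Hypothetical TPs (assumed values for illustration).
TP = [[0.4, 0.6],   # from state 0: p(0|0), p(1|0)
      [0.2, 0.8]]   # from state 1: p(0|1), p(1|1)
START = [1.0, 0.0]  # Model 1 style assumption: state 0 with probability 1 at t = 0

for j in range(1, 6):
    print(j, round(marginal_probs(START, TP, j)[1], 4))
```

Starting the chain from `[0.25, 0.75]` instead leaves the probabilities unchanged at every step, which mirrors the difference between assuming the first state and estimating it.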
The structure for Model 4 in (5d) shows that the MPs of the responses are modeled by $\beta_1$.
The autocorrelation is handled by the TPs, which are derived using the joint probability and the
previous marginal probability. The magnitude of the autocorrelation is governed by $\rho$. The
calculation of $\Phi_2$ is not available in all software. Therefore, an approximation of $\Phi_2$ using 3-point quadrature was evaluated in (5d); this is denoted as Model 5 [5]. Note for Model 4 and
Model 5 that under independence ($\rho = 0$), $p_{ijj-1}^{(1|0)} = p_{ijj-1}^{(1|1)} = \Phi(\beta_1)$.
Model 3, termed the constrained Markov model and depicted in (5c), is a hybrid between the
Markov approach and the MP approach. The model uses a fixed effect, $\alpha_3$, to govern the amount
of autocorrelation mediated by the TP, while predicting the MP through estimation of $\beta_1$. Note
that $p_{ijj-1}^{(1|0)} = p_{ijj-1}^{(1|1)} = \Phi(\beta_1)$ when $\alpha_3 = 0$.
Models 1-5 were fitted to the simulated data. The results are provided in Figure 1. The first
row displays the predictions for the AR(1) scenario, and the first column of the row displays the
predicted $p_{ij}^{(1)}$ over time. The prediction for Model 1 at $t = 1$ is lower than those from the other
models, and achieves steady state around $t = 4$. Model 2 predicts better than Model 1 and
also achieves steady state at approximately $t = 4$. Note that the prediction from Model 2 at $t = 1$
is off the line because of variability. This prediction is made based on the data at $t = 1$ only.
Interestingly, Model 1 and Model 2 do not have a constant probability prediction like the true
model. Models 3-5 are indistinguishable and predict a flat trajectory as these were constructed to
do so.
For the transition to 0 from 1, all models provide similar predictions of the TP $p_{ijj-1}^{(0|1)}$. For
$p_{ijj-1}^{(1|0)}$, Model 1 is not as close to the true value as Model 2, which in turn is not as close as Models 3-5. The reason is that the simulation structure did not assume a value at $t = 0$. The assumption
of a 0 response at $t = 0$ influences $p_{ijj-1}^{(1|0)}$ more than $p_{ijj-1}^{(0|1)}$ because information for $p_{ijj-1}^{(0|1)}$ is
only available under the model at $t \ge 2$ (ie, the initial assumption does not influence this
probability). One can see that by $t = 2$ Model 1 has moved away from the poor prediction at
$t = 1$. The second row displays the results from the AR(1S) scenario. Models 1 and 2 again
reach steady state by around $t = 4$, with all models achieving comparable predictions at $t \ge 2$.
For the TP, only Models 4 and 5 track the true trajectory. The other models do not have the
machinery to allow autocorrelation to be a function of time. However, even though Models 1, 2
and 3 do not accurately estimate the TPs, they do predict and approximate the flatness of $p_{ij}^{(1)}$ at
steady state.
Figure 1 alludes to the notion of steady state for a Markov chain. Steady state can be
formalized according to the following
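$$\begin{bmatrix} p_{i0}^{(0)} & p_{i0}^{(1)} \end{bmatrix} P^{\infty} = \begin{bmatrix} p_{i\infty}^{(0)} & p_{i\infty}^{(1)} \end{bmatrix} = \begin{bmatrix} p_{i\infty}^{(0)} & p_{i\infty}^{(1)} \end{bmatrix} P \tag{9}$$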
where P ∞ represents infinite applications of P. This is somewhat analogous to pharmacokinetic
steady state following repeated regular administration of a drug, where the concentration at the
beginning of the dosing interval is identical to that at the end of the interval. The steady state
probabilities $p_{i\infty}^{(0)}$ and $p_{i\infty}^{(1)}$ can be calculated by solving $\begin{bmatrix} p_{i\infty}^{(0)} & p_{i\infty}^{(1)} \end{bmatrix}(P - I) = 0$, where $I$ is the
identity matrix, using the constraint $p_{i\infty}^{(0)} + p_{i\infty}^{(1)} = 1$. The solution connotes that $p_{i\infty}^{(0)}$ and $p_{i\infty}^{(1)}$
do not depend upon the starting probabilities $p_{i0}^{(0)}$ and $p_{i0}^{(1)}$ after a large number of applications
of $P$ (ie, $P^{\infty}$). The rate at which steady state is achieved is dependent upon $p_{i0}^{(0)}$ and $p_{i0}^{(1)}$,
however. Consider a simulation in which the sample size is large enough that the TPs (and $p_{i0}^{(0)}$
and $p_{i0}^{(1)}$ for Model 2) are estimated with sufficient accuracy and precision to be
indistinguishable from the true values. Model 1 predictions do not approach 0.75 until $t > 16$,
which is outside the sampling design ($\dot{p}_{i6}^{(1)} = 0.749$), while Model 2 predicts $\dot{p}_{i1}^{(1)} = 0.75$ for all
$t \ge 1$, because the starting value of the chain is estimated at the first time point. In contrast,
steady state is implicit in the formulations (and assumptions) of Models 3 through 5.
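For the two-state chain with a constant TP matrix, the steady state condition in (9) has a closed-form solution. The short derivation below is a worked illustration; it drops the subject and time indices and writes $p^{(1|0)}$ and $p^{(0|1)}$ for the constant TPs, a shorthand introduced here only for this purpose.

```latex
\[
\begin{aligned}
p_{\infty}^{(1)} &= p_{\infty}^{(0)}\, p^{(1|0)} + p_{\infty}^{(1)}\, p^{(1|1)},
  \qquad p_{\infty}^{(0)} = 1 - p_{\infty}^{(1)}, \quad p^{(1|1)} = 1 - p^{(0|1)} \\
p_{\infty}^{(1)}\, p^{(0|1)} &= \left(1 - p_{\infty}^{(1)}\right) p^{(1|0)} \\
p_{\infty}^{(1)} &= \frac{p^{(1|0)}}{p^{(1|0)} + p^{(0|1)}}
\end{aligned}
\]
```

With hypothetical values $p^{(1|0)} = 0.6$ and $p^{(0|1)} = 0.2$, the steady state probability of state 1 is $0.6 / 0.8 = 0.75$, whatever the starting probabilities.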
Another area to compare and contrast the MP and SS approaches is how these handle
interstitial (intermittent or non-monotonic) missing data – ie, occasional missing data within a
subject such as a missed clinic visit (not due to dropout). For the SS approach, the probability
model for the current observation is a function of the previous observation (such as Model 1 and
2). When the previous observation is missing and hence unknown, one must effectively
integrate it out of the likelihood. This is demonstrated in the context of the illustration
hereinbefore. Noting the assumption that the 0 state has probability 1 at $t = 0$ for Model 1, the
likelihood for a complete data case is
$$\mathcal{L} = p_{i10}^{(y_{i1}|0)}\, p_{i21}^{(y_{i2}|y_{i1})}\, p_{i32}^{(y_{i3}|y_{i2})}\, p_{i43}^{(y_{i4}|y_{i3})}\, p_{i54}^{(y_{i5}|y_{i4})} \tag{10}$$
Consider the case where the second observation ($j = 2$, $t = 2$) is missing, yet all the other
observations are available. The path the individual followed from $y_{i1}$ to arrive at state $y_{i3}$ (at $j = 3$)
is unknown, so all trajectories through the unobserved $y_{i2}$ must be considered. The likelihood for this case is
$$\mathcal{L} = p_{i10}^{(y_{i1}|0)} \left[ p_{i21}^{(0|y_{i1})}\, p_{i32}^{(y_{i3}|0)} + p_{i21}^{(1|y_{i1})}\, p_{i32}^{(y_{i3}|1)} \right] p_{i43}^{(y_{i4}|y_{i3})}\, p_{i54}^{(y_{i5}|y_{i4})} \tag{11}$$
The quantity in brackets in (11) is $P(Y_{i3} = y_{i3} \mid Y_{i1} = y_{i1})$. Handling missing data in this
principled manner is complicated when there are strings of contiguous interstitial missingness in
the data, because of the numbers of summations involved. Because of the complications,
missing data are often imputed using a last-observation-carried-forward approach, the effects of
which are not often evaluated.
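The summation over unobserved states in (11) generalizes to strings of missing values through a forward recursion. The Python sketch below assumes a constant TP matrix, as in the illustrative Markov models, and uses hypothetical TP values; `markov_likelihood` is an illustrative name, not a function from any pharmacometric software.

```python
def markov_likelihood(ys, tp, start=(1.0, 0.0)):
    """Likelihood of a binary sequence under a first-order Markov model.

    ys    : observed states (0 or 1); None marks a missing observation
    tp    : 2x2 matrix, tp[a][b] = probability of moving to state b from state a
    start : state probabilities before the first observation
            (default: state 0 with probability 1, as assumed in Model 1)
    """
    forward = list(start)
    for y in ys:
        # Propagate one step through the TP matrix.
        forward = [forward[0] * tp[0][b] + forward[1] * tp[1][b] for b in (0, 1)]
        if y is not None:
            # Keep only the probability mass consistent with the observed state.
            forward = [forward[b] if b == y else 0.0 for b in (0, 1)]
    return sum(forward)


# Hypothetical TP matrix (assumed values for illustration).
TP = [[0.4, 0.6],
      [0.2, 0.8]]

complete = markov_likelihood([1, 1, 0, 1, 1], TP)      # product form, as in (10)
missing = markov_likelihood([1, None, 0, 1, 1], TP)    # second value summed out, as in (11)
print(complete, missing)
```

A missing entry simply skips the conditioning step, so the recursion carries both possible states forward; the cost grows linearly with the number of time points rather than with the number of possible paths.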
Missing data must also be integrated out of the likelihood for the MP approach. The
assumption of normal latent residuals (ie, probit modeling) and consideration of the variance-covariance matrix of the residual simplifies the issue. Assuming that the variance of the
residuals = 1, which is typical, the matrices for the AR(1) and AR(1S) scenarios are
$$\Sigma_i = Cov[\varepsilon_i] = \begin{bmatrix} 1 & \rho^1 & \rho^2 & \rho^3 & \rho^4 \\ \rho^1 & 1 & \rho^1 & \rho^2 & \rho^3 \\ \rho^2 & \rho^1 & 1 & \rho^1 & \rho^2 \\ \rho^3 & \rho^2 & \rho^1 & 1 & \rho^1 \\ \rho^4 & \rho^3 & \rho^2 & \rho^1 & 1 \end{bmatrix} \text{ or } \begin{bmatrix} 1 & \rho^1 & \rho^3 & \rho^7 & \rho^{15} \\ \rho^1 & 1 & \rho^2 & \rho^6 & \rho^{14} \\ \rho^3 & \rho^2 & 1 & \rho^4 & \rho^{12} \\ \rho^7 & \rho^6 & \rho^4 & 1 & \rho^8 \\ \rho^{15} & \rho^{14} & \rho^{12} & \rho^8 & 1 \end{bmatrix} \tag{12}$$
with general elements $\rho^{|j - j'|}$ and $\rho^{|t_{ij} - t_{ij'}|}$, respectively. Because of the normality assumption,
integrating out the missing data corresponds to eliminating the row and column from the matrix
in (12) that corresponds to the missing observation (in this example, the second row and column),
yielding
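$$\Sigma_i = Cov[\varepsilon_i] = \begin{bmatrix} 1 & \rho^2 & \rho^3 & \rho^4 \\ \rho^2 & 1 & \rho^1 & \rho^2 \\ \rho^3 & \rho^1 & 1 & \rho^1 \\ \rho^4 & \rho^2 & \rho^1 & 1 \end{bmatrix} \text{ or } \begin{bmatrix} 1 & \rho^3 & \rho^7 & \rho^{15} \\ \rho^3 & 1 & \rho^4 & \rho^{12} \\ \rho^7 & \rho^4 & 1 & \rho^8 \\ \rho^{15} & \rho^{12} & \rho^8 & 1 \end{bmatrix} \tag{13}$$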
This general description of the elements is helpful when coding the likelihood. The correlation
function used in $\Phi_2$ is simple to derive, because $\nu_{i31} = \rho^{|3-2|} \rho^{|2-1|} = \rho^{|3-1|}$ for AR(1) or
$\nu_{i31} = \rho^{|4-2|} \rho^{|2-1|} = \rho^{|4-1|}$ for AR(1S). Thus, the likelihood goes from
$$\mathcal{L} = p_{i1}^{(y_{i1})}\, \frac{p_{i21}^{(y_{i2},y_{i1})}}{p_{i1}^{(y_{i1})}}\, \frac{p_{i32}^{(y_{i3},y_{i2})}}{p_{i2}^{(y_{i2})}}\, \frac{p_{i43}^{(y_{i4},y_{i3})}}{p_{i3}^{(y_{i3})}}\, \frac{p_{i54}^{(y_{i5},y_{i4})}}{p_{i4}^{(y_{i4})}} \tag{14}$$
for the complete data case to
$$\mathcal{L} = p_{i1}^{(y_{i1})}\, \frac{p_{i31}^{(y_{i3},y_{i1})}}{p_{i1}^{(y_{i1})}}\, \frac{p_{i43}^{(y_{i4},y_{i3})}}{p_{i3}^{(y_{i3})}}\, \frac{p_{i54}^{(y_{i5},y_{i4})}}{p_{i4}^{(y_{i4})}} \tag{15}$$
when the second observation is missing.
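The row-and-column elimination that takes (12) to (13) is straightforward to code. The Python sketch below builds the AR(1S) correlation matrix for the sampling times of the illustration and drops the second observation; the function names are hypothetical.

```python
def ar1s_corr(times, rho):
    """Correlation matrix with elements rho ** |t_j - t_j'| (AR(1S) structure)."""
    return [[rho ** abs(s - t) for t in times] for s in times]


def drop_observation(matrix, k):
    """Remove row k and column k (0-based index) for a missing observation."""
    return [[v for c, v in enumerate(row) if c != k]
            for r, row in enumerate(matrix) if r != k]


times = [1, 2, 4, 8, 16]
rho = 0.5
full = ar1s_corr(times, rho)
reduced = drop_observation(full, 1)   # second observation (t = 2) is missing

# The correlation now linking the first and third observations is rho ** |4 - 1|.
print(reduced[0][1], rho ** 3)
```

The reduced matrix equals the matrix built directly from the remaining times, so in practice it is enough to compute the correlation from the elapsed time between the observations that are actually present.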
Marginal Probability Models and the Latent Variable Structure
Before the simulation study, it is helpful to establish a general framework. Let $Z$ be a $q$-dimensional multivariate latent variable
$$Z_i \mid \eta = f(\beta, t_i, d_i, x_i) + g(\beta, t_i, d_i, x_i)\, \eta_i + \varepsilon_i \tag{16}$$
where $f(\beta, t_i, d_i, x_i) = f_i$ and $g(\beta, t_i, d_i, x_i) = g_i$ are vectors or matrices of functions of the
fixed effects ($\beta$), the design-dependent covariate vectors $t_i$ (time) and $d_i$ (dose), and $x_i$, a matrix of
subject-specific covariates (potentially time varying); $\eta_i$ is a $p$-dimensional vector of random
effects ($Var(\eta_i) = \Omega$); and $\varepsilon_i$ is the $q$-dimensional normally distributed latent residual error
vector ($Var(\varepsilon_i) = \Sigma(t_i)$), with a covariance/correlation matrix that can be a function of $t_i$. In
general, $\varepsilon_i$ could have any distribution; however, modeling autocorrelation through normal latent
residuals is the most straightforward. This will be revisited later. The diagonal entries of $\Sigma(t_i)$ are set to
1 for identifiability, so $\Sigma(t_i)$ is a correlation matrix. In pharmacometric work for binary data,
often $g_i \equiv 1$, and the model is not written grouping the fixed effect terms and the random effects
terms as shown in (16). There is benefit in this grouping, which should become clear. Then,
$$\dot{p}_{ij}^{(1)} = P(Y_{ij} = 1 \mid \eta) = P(Z_{ij} \le \gamma \mid \eta) = \Phi(\gamma - f_{ij} - g_{ij} \eta_i \mid \eta) \tag{17}$$
where $i$ indexes subject, $j$ indexes time within the subject (an observation from the vector), and $\gamma$
is a constant, interpreted as a threshold, which is also sometimes interpreted as the baseline (see
Hu et al [6] for a broader discussion).
Nonlinear mixed effects software takes (17) and computes the (approximate) likelihood
behind the scenes. Despite the linearity of $\eta$ in (16), which allows exact computation of the
distribution of $Z$, the likelihood is based on integrals of $Z$, which leads to the need to approximate
the likelihood when fitting $Y$.
The desired quantity from modeling for decision making is typically $p_{ij}^{(1)}$, not $\dot{p}_{ij}^{(1)}$. Usually
one must use Monte Carlo techniques to calculate these, eg $p_{ij}^{(1)} \approx \frac{1}{M}\sum_{m=1}^{M} \Phi(\mu_{ij} - g_{ij}\eta_m^*)$, for a
suitable size $M$, where $\eta^*$ is sampled from $N(0, \Omega)$ and $\mu_{ij}$ and $\Omega$ have been estimated.
However, $p_{ij}^{(1)}$ can be computed directly from the marginal distribution for the probit model
described above. This removes simulation error from the prediction, is economical
computationally, and also provides some insight into the potential for bias. The direct
computation is based on the multivariate normal marginal distribution of $Z$, ie,
$Z_i \sim N(f_i, \Xi_i)$ with $\Xi_i = g_i \Omega g_i^T + \Sigma_i$. Using the LV framework, properties of the multivariate normal and
letting $\mu_{ij} = \gamma - f_{ij}$ (absorbing the threshold), the PP can be derived
\[
p_{ij}^{(1)} = P(Y_{ij} = 1) = P(Z_{ij} \le \gamma) = \Phi(\Xi_{ijj}^{-1/2}[\gamma - f_{ij}]) = \Phi(\Xi_{ijj}^{-1/2}\mu_{ij}) \tag{18}
\]
where $\Xi_{ijj}$ is the jth element of the diagonal of $\Xi_i$. Note that (18) is a MP and that the index $i$ is
retained because, in general, subjects can have different covariate vectors such as time or dose.
The result in (18) does not depend directly on the off-diagonal elements of $\Xi$, and hence not on the
autocorrelation specified in $\Sigma$. One can see in (18) that $\beta$, $g$ and $\Omega$ are intertwined in $\Xi_{ijj}^{-1/2}\mu_{ij}$,
such that misspecification of $g$ or $\Omega$, or biased estimates of $\Omega$, could influence the estimates
of $\beta$ to keep accurate predictions of $p_{ij}^{(1)}$. From this, one could speculate that incorrect
specification of $\Sigma$ could lead to biased estimates of $\Omega$ to compensate, and hence biases in $\beta$ could
result. The $\Xi_{ijj}^{-1/2}\mu_{ij}$ in (18) exemplifies the possible confounding of fixed and random effects.
The LV structure naturally leads to the MP approach.
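The equivalence between the Monte Carlo average and the closed form in (18) can be checked numerically. The following is a minimal sketch, not the code used in the study: the function names, the example values of $\mu_{ij}$, $U_{ij}$ and $\Omega$, the number of draws, and the assumption of independent random effects ($\Omega_{12} = 0$) with $g_{ij} = (1, U_{ij})$ are all illustrative choices.

```python
import math
import random
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF


def closed_form_prob(mu, u, omega11, omega12, omega22):
    """Marginal P(Y=1) as in (18): Phi(mu / sqrt(Xi_jj)), Xi_jj = g Omega g' + 1."""
    xi_jj = omega11 + 2.0 * u * omega12 + u * u * omega22 + 1.0
    return PHI(mu / math.sqrt(xi_jj))


def monte_carlo_prob(mu, u, omega11, omega22, n_draws=100_000, seed=1):
    """Average the conditional probabilities Phi(mu - g*eta) over draws of eta.

    Assumes independent random effects (Omega12 = 0) to keep sampling simple.
    """
    rng = random.Random(seed)
    sd1, sd2 = math.sqrt(omega11), math.sqrt(omega22)
    total = 0.0
    for _ in range(n_draws):
        eta1 = rng.gauss(0.0, sd1)
        eta2 = rng.gauss(0.0, sd2)
        total += PHI(mu - (eta1 + eta2 * u))
    return total / n_draws


exact = closed_form_prob(mu=-0.5, u=0.9, omega11=1.0, omega12=0.0, omega22=2.5)
approx = monte_carlo_prob(mu=-0.5, u=0.9, omega11=1.0, omega22=2.5)
print(f"closed form: {exact:.4f}   Monte Carlo: {approx:.4f}")
```

The two values should agree up to simulation error, which shrinks only at the rate $1/\sqrt{M}$; the closed form avoids that error entirely.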
Simulation Study to Evaluate Bias in the Estimates, Predictions and Inferences
Methods
The form of the model components in (16) used in the simulation study was
\[
\mu_{ij} = \beta_1 + \beta_2 U(t_{ij}) + \beta_3 \frac{d_i}{d_i + \exp(\beta_5)}, \qquad U_{ij} = U(t_{ij}) = 1 - \exp\left(-\frac{\ln 2}{\exp(\beta_4)}\, t_{ij}\right) \tag{19a}
\]
\[
g_i = (1, U(t_{ij}))^T \;\rightarrow\; g_i \eta_i = \eta_{1i} + \eta_{2i} U_{ij} \tag{19b}
\]
As above, $\Xi_i = g_i \Omega g_i^T + \Sigma_i$, where $\Xi_{ijj} = \Omega_{11} + 2U_{ij}\Omega_{12} + U_{ij}^2\Omega_{22} + 1$.
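As a worked check of (18), (19a) and the expression for $\Xi_{ijj}$, the sketch below evaluates the marginal probability at three design points using the parameter values reported for the simulation study ($\beta$ = (−1.67, 1.25, 2.94, 1.39, 1.20), $\Omega_{11} = 1$, $\Omega_{12} = 0$, $\Omega_{22} = 2.5$). The function names are hypothetical, and the sign convention (probability increasing in $\mu_{ij}$) follows the control stream in Appendix A.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf

BETA = (-1.67, 1.25, 2.94, 1.39, 1.20)   # reported beta_1 .. beta_5
OMEGA11, OMEGA12, OMEGA22 = 1.0, 0.0, 2.5


def placebo_onset(t, beta4):
    """U(t) in (19a): first-order onset with half-life exp(beta4)."""
    return 1.0 - math.exp(-math.log(2.0) / math.exp(beta4) * t)


def marginal_prob(dose, t):
    """Population probability Phi(mu / sqrt(Xi_jj)) for a given dose and time."""
    b1, b2, b3, b4, b5 = BETA
    u = placebo_onset(t, b4)
    mu = b1 + b2 * u + b3 * dose / (dose + math.exp(b5))
    xi_jj = OMEGA11 + 2.0 * u * OMEGA12 + u * u * OMEGA22 + 1.0
    return PHI(mu / math.sqrt(xi_jj))


for dose, t in [(0, 1), (0, 16), (30, 16)]:
    print(f"dose={dose:>2}  time={t:>2}  p={marginal_prob(dose, t):.3f}")
```

With these inputs the three probabilities come out close to 0.15, 0.40 and 0.85, respectively, up to rounding of the reported $\beta$.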
The design skeleton used for the simulation study was as follows: 7 parallel groups with
doses $d_i$ from the set {0, 1, 3, 5, 10, 15, 30}, 40 subjects per dose group, and 7 time points per
individual with $t_{ij} \in$ {1, 2, 4, 8, 16, 24, 36}. The parameters $\beta_1$, $\beta_2$, and $\beta_3$ in (19a) were derived
to achieve: $p_{ij}^{(1)} = 0.15$ for $d_i = 0$, $t_{ij} = 1$; $p_{ij}^{(1)} = 0.40$ for $d_i = 0$, $t_{ij} = 16$; and $p_{ij}^{(1)} = 0.85$
for $d_i = 30$, $t_{ij} = 16$. $\beta_2$ is the maximum placebo effect and $\beta_3$ is the maximum drug effect or
Emax. The half-life (t½) of placebo onset was $\exp(\beta_4) = 4$, and the ED50 was $\exp(\beta_5) = 30/9$
(ie, the ED90 = 30). Overall, $\beta$ = (−1.67, 1.25, 2.94, 1.39, 1.20). The variance components were
$\Omega_{11} = 1$, $\Omega_{12} = 0$, and $\Omega_{22} = 2.5$. The values of $\rho$ considered in the simulation were {0, 0.3,
0.5, 0.7, 0.9} for $\Sigma_i$, and the AR(1S) structure was used, ie, $\Sigma_{ilm} = \rho^{|t_{il} - t_{im}|}$. The $\varepsilon$ in (16)
were simulated using the following recursive relation:
\[
\begin{aligned}
\varepsilon_{i1} &= e_{i1}, & j &= 1 \\
\varepsilon_{ij} &= \varepsilon_{ij-1}\,\rho^{|t_{ij} - t_{ij-1}|} + e_{ij}\sqrt{1 - \rho^{2|t_{ij} - t_{ij-1}|}}, & j &\ge 2
\end{aligned} \tag{20}
\]
where $e_{ij} \sim N(0,1)$. The responses were realized by plugging the simulated values of $\eta_i$ and $\varepsilon_{ij}$
into $y_{ij} = I(\varepsilon_{ij} < \mu_{ij} + g_{ij}\eta_i)$, where $I(\cdot) = 1$ when $\varepsilon_{ij} < \mu_{ij} + g_{ij}\eta_i$ and $= 0$ otherwise.
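The recursion in (20) and the realization of the responses can be sketched as follows. This is an illustrative reimplementation rather than the original simulation code: the function names and the seed are arbitrary, and the random effects are added to $\mu_{ij}$ as in the control stream of Appendix A.

```python
import math
import random

TIMES = [1, 2, 4, 8, 16, 24, 36]


def simulate_latent_residuals(times, rho, rng):
    """Recursion (20): unit-variance residuals with Cor = rho**|t_l - t_m|."""
    eps = [rng.gauss(0.0, 1.0)]                       # j = 1
    for j in range(1, len(times)):
        r = rho ** abs(times[j] - times[j - 1])       # lag-dependent correlation
        eps.append(eps[-1] * r + rng.gauss(0.0, 1.0) * math.sqrt(1.0 - r * r))
    return eps


def simulate_responses(mu, eta1, eta2, u, eps):
    """y_ij = 1 when the latent residual is below mu_ij + eta_1i + eta_2i * U_ij."""
    return [1 if e < m + eta1 + eta2 * uj else 0 for m, uj, e in zip(mu, u, eps)]


# Empirical check of the lag-1 correlation between the first two time points.
rng = random.Random(2015)
n_subjects = 20_000
pairs = [simulate_latent_residuals(TIMES, 0.7, rng)[:2] for _ in range(n_subjects)]
mean_prod = sum(a * b for a, b in pairs) / n_subjects
print(f"empirical Cor(eps_1, eps_2) is about {mean_prod:.3f} (target 0.7)")
```

The square-root term keeps the marginal variance of each $\varepsilon_{ij}$ at 1, which is what makes $\Sigma_i$ a correlation matrix.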
Table 1 displays the off-diagonal correlation values for $\Xi$, ie, the correlations in the latent
variable $Z$. Ranges of correlations across the doses are reported for the responses, $Y$; ranges are
displayed because these are on the marginal scale and so the fixed effects play a role. Note that
the diagonal elements of these matrices = 1 by definition. From the table, one can see there is
moderate correlation at baseline, and that the correlation increases as the study progresses in
time (large index values). This increase in correlation is based on the placebo onset model and its
random effects, and reflects the onset of steady-state. One can also see that as $\rho$ increases,
observations at early time points (small index values) are increasingly more correlated due to
$\rho^{|t_{ij} - t_{ij-1}|}$ in $\Sigma_i$. This increased correlation attenuates as observations become farther apart.
The complexity and form of the model in (16) was chosen for realism. Including random
effects on model components other than the intercept is not typical [7]. The extra random effect
on the non-drug response was included to allow for a more general correlation structure than
typically assumed, yet is likely more consistent with data that are generated from an LV process
(owing to its hypothetical relation to an unobserved continuous factor). The number of time
points and doses are more than typically studied in a Phase 2 trial. However, this information-rich
design was selected specifically to facilitate convergence and covariance step estimation
across multiple simulations. The goal was to avoid confounding of results and conclusions with
convergence issues, both within an assumed model and between more complex and simpler
(reduced) models. In fact, to further facilitate covariance step estimation, the following
parameterization was used: $\ln\sqrt{\Omega_{11}}$, $\ln\sqrt{\Omega_{22}}$, and $\beta_r$, where $\rho = 2(1 + e^{-\beta_r})^{-1} - 1$. These
reparameterizations decreased the issues associated with boundary constraints of the variance
components (ie, 0) and demonstrated improved normality in their sampling distributions
(preliminary simulations – data not shown).
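The reparameterization can be written as a pair of transformations between the natural scale ($\Omega_{11}$, $\Omega_{22}$, $\rho$) and the unbounded estimation scale. The sketch below is illustrative only; the function names are hypothetical, and the inverse map for $\rho$ is obtained by solving $\rho = 2(1 + e^{-\beta_r})^{-1} - 1$ for $\beta_r$.

```python
import math


def to_unconstrained(omega11, omega22, rho):
    """Map variances (> 0) and rho in (-1, 1) to the unbounded estimation scale."""
    beta_r = math.log((1.0 + rho) / (1.0 - rho))   # inverse of 2/(1+exp(-b)) - 1
    return math.log(math.sqrt(omega11)), math.log(math.sqrt(omega22)), beta_r


def to_natural(log_sd1, log_sd2, beta_r):
    """Back-transform to Omega11, Omega22 and rho."""
    rho = 2.0 / (1.0 + math.exp(-beta_r)) - 1.0
    return math.exp(log_sd1) ** 2, math.exp(log_sd2) ** 2, rho


estimation_scale = to_unconstrained(1.0, 2.5, 0.7)
print(estimation_scale)
print(to_natural(*estimation_scale))
```

Any real value on the estimation scale maps back to a positive variance and to a correlation strictly between −1 and 1, so the optimizer never has to handle a boundary.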
Eight versions of the model described in (19a) and (19b) were fitted: M0 used adaptive
Gaussian quadrature (AGQ) for the random effects and a 5-point approximation for $\Phi_2$ (see [5]);
M1 was the same as M0 except the Laplace approximation (LA) was used instead of AGQ‡ and
a 3-point approximation for $\Phi_2$ was used (see [5]); M2 was M1 using AGQ (ie, 3-point
approximation for $\Phi_2$); M3 was M2 using LA with $\rho$ constrained to 0; M4 was M3 using AGQ;
M5 was M4 using LA with $\Omega_{22}$ constrained to 0, which is generally regarded as the standard
model; M6 was M5 using AGQ; and M7 was M6 with $\Omega_{11}$ constrained to 0, a naïve pooled
model. These models reflect different approximations to the likelihood and simplifications to the
stochastic structure. The joint probabilities used in the likelihood for models M0, M1 and M2
are
\[
\begin{aligned}
\dot{p}_{ijj-1}^{(y_{ij},\,y_{ij-1})} ={}& (y_{ij} - 1)(y_{ij-1} - 1) + (y_{ij} - 1)(1 - 2y_{ij-1})\,\Phi(\mu_{ij-1}) \\
&+ (y_{ij-1} - 1)(1 - 2y_{ij})\,\Phi(\mu_{ij}) + (1 - 2y_{ij})(1 - 2y_{ij-1})\,\Phi_2[\mu_{ij}, \mu_{ij-1}, \nu_{ijj-1}(\rho)]
\end{aligned} \tag{21}
\]
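The single expression in (21) covers all four combinations of the current and previous response. The sketch below evaluates it in Python. Note that the bivariate normal CDF here is not the Drezner and Wesolowsky approximation [5] used in the study; as a stand-in it integrates the bivariate density over the correlation (Plackett's identity) with Simpson's rule. The function names and the number of Simpson intervals are assumptions.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf


def bvn_cdf(h, k, rho, n=200):
    """P(Z1 <= h, Z2 <= k) for a standard bivariate normal with correlation rho.

    Uses Phi2(h, k, rho) = Phi(h)Phi(k) + integral from 0 to rho of the bivariate
    density in r, evaluated with composite Simpson's rule (n must be even).
    """
    def density(r):
        s = 1.0 - r * r
        expo = -(h * h - 2.0 * r * h * k + k * k) / (2.0 * s)
        return math.exp(expo) / (2.0 * math.pi * math.sqrt(s))

    step = rho / n
    total = density(0.0) + density(rho)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * density(i * step)
    return PHI(h) * PHI(k) + total * step / 3.0


def joint_prob(y, y_prev, mu, mu_prev, rho):
    """Joint probability of (y_ij, y_ij-1) written as in (21)."""
    phi2 = bvn_cdf(mu, mu_prev, rho)
    return ((y - 1) * (y_prev - 1)
            + (y - 1) * (1 - 2 * y_prev) * PHI(mu_prev)
            + (y_prev - 1) * (1 - 2 * y) * PHI(mu)
            + (1 - 2 * y) * (1 - 2 * y_prev) * phi2)


cells = {(y, yp): joint_prob(y, yp, 0.3, -0.2, 0.7) for y in (0, 1) for yp in (0, 1)}
print(cells, sum(cells.values()))
```

Because the indicator algebra in (21) avoids branching on the observed responses, the same line of code serves every record, which is what makes it convenient in a control stream.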
The joint probability is expressed differently in (21) than in (5d) to facilitate coding in
NONMEM. Predictions of the PPs for these models were based on
\[
p_{ij}^{(1)} = \Phi(\Xi_{ijj}^{-1/2}\mu_{ij}) = \Phi\left(\left[\beta_1 + \beta_2 U_{ij} + \frac{\beta_3 d_i}{d_i + e^{\beta_5}}\right] \Big/ V_{ij}\right), \qquad V_{ij} = \sqrt{1 + \Omega_{11} + \Omega_{22} U_{ij}^2} \tag{22}
\]
‡ The Laplace approximation is essentially adaptive Gaussian quadrature with 1 quadrature point.
Inspection of (22) reveals that biased predictions could arise because of incorrect
specification of $V_{ij}$. When $\Omega_{22}$ is constrained to 0, the modeled time-profile is changed and the
parameter estimates of the $\beta$'s could attempt to compensate to fit the data. Three models were
introduced to adjust for such biases, and these are based on the structure
\[
\mu_{ij} = \beta_1 + \eta_{1i} + \beta_{2,j}\,(t_j \ge 2) + \beta_{3,j}\,\frac{d_i}{d_i + e^{\beta_5}} \tag{23}
\]
where separate fixed effects were estimated by time for the placebo and drug components. These
models represent saturated models, essentially fitting different dose-effect profiles by time. M8
used LA, M9 used AGQ, and M10 had $\Omega_{11}$ constrained to 0 (a naïve pooled model). PROC
NLMIXED in SAS Version 9.3 (SAS Institute Inc., Cary, NC) was used to fit all these models.
Recently, the approximations described and implemented above have been made available as a
subroutine in NONMEM (ICON Development Solutions, Ellicott City MD), which is a more
standard and general software for pharmacometrics work. A NONMEM control stream is
supplied in Appendix A that demonstrates more clearly how these data were fitted. Appendix A
also provides a link to a site from which the bivariate normal routine can be downloaded.
Percentage bias in the parameter estimates was calculated. Despite the parameterization
discussed above for estimation, biases in $\Omega_{11}$, $\Omega_{22}$ and $\rho$ were computed so that percentage bias
could be reported (the skewness of the estimates on this scale did not provide misleading
conclusions). Biases in the estimates of the off-diagonal elements of $\Xi$ were evaluated also, as
low biases in these should indicate good characterization of the variability or correlation, hence
accurate predictions of the TPs. Biases in parameter estimates were only computed for M0
through M7, because of the common parameter structure related to the true simulation model.
Biases in the $p_{ij}^{(1)}$ and the population transition probabilities ($p_{ij}^{(1|0)}$, $p_{ij}^{(0|1)}$), as defined in the
Notation of Model Quantities section, were also computed and assessed. A closed form of these
is available, and the derivation is provided in Appendix B. These TPs relate to the ability of the
model to reproduce patterns in the response transitions within subjects over time.
Predictions of $p_{ij}^{(1)}$ and inferences thereon are often of primary interest in decision making;
thus, accuracies of inferences in the $p_{ij}^{(1)}$ were also assessed. Two methods were used to
compute 90% confidence intervals (CI). The first method used was a smoothed bootstrap
technique [8]. For each of 1000 replicates, a vector of parameters was sampled from a
multivariate normal distribution that had the mean vector equal to the estimate vector and
covariance matrix equal to the output from the covariance step (covariance matrix of the
estimates). This vector was plugged into the model to make predictions across the doses and
times for the replicate. For each dose and time combination, the 5th and 95th percentiles across
the distribution generated by the replicates were used as the 90% CI bounds. The second method
used was the delta-method, which is based on linearization [9]. The closed form of $p_{ij}^{(1)}$ allowed
straightforward implementation. The details are supplied in Appendix C. A delta-method
technique is also provided therein for general non-closed form predictions (eg, if the logit link
were used). Such a method might be able to be incorporated into software for general, quick and
convenient calculation of CIs of population means, which could be output in a table file. The
procedure was not evaluated here, yet may be of potential use, and so future research is warranted.
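The two interval methods can be illustrated on a toy prediction with two parameters. This sketch is not the Appendix C derivation: the estimate vector, its covariance matrix, the prediction function and the finite-difference gradient are all hypothetical stand-ins chosen so the example runs with the standard library alone.

```python
import math
import random
from statistics import NormalDist

PHI = NormalDist().cdf
Z90 = NormalDist().inv_cdf(0.95)          # two-sided 90% normal quantile

EST = (-0.5, 0.0)                         # hypothetical (beta_1, ln sqrt(Omega_11))
COV = [[0.01, 0.002], [0.002, 0.02]]      # hypothetical covariance of the estimates


def predict(theta):
    """Closed-form population probability Phi(beta_1 / sqrt(1 + Omega_11))."""
    beta1, log_sd = theta
    return PHI(beta1 / math.sqrt(1.0 + math.exp(log_sd) ** 2))


def bootstrap_ci(est, cov, n_rep=1000, seed=7):
    """Sample parameters from N(est, cov) and take the 5th/95th percentiles."""
    rng = random.Random(seed)
    l11 = math.sqrt(cov[0][0])            # 2x2 Cholesky factor of cov
    l21 = cov[1][0] / l11
    l22 = math.sqrt(cov[1][1] - l21 * l21)
    preds = []
    for _ in range(n_rep):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        preds.append(predict((est[0] + l11 * z1, est[1] + l21 * z1 + l22 * z2)))
    preds.sort()
    return preds[int(0.05 * n_rep)], preds[int(0.95 * n_rep) - 1]


def delta_ci(est, cov, h=1e-6):
    """Linearize the prediction around est using a central-difference gradient."""
    grad = []
    for k in range(len(est)):
        up, dn = list(est), list(est)
        up[k] += h
        dn[k] -= h
        grad.append((predict(up) - predict(dn)) / (2.0 * h))
    var = sum(grad[a] * cov[a][b] * grad[b] for a in range(2) for b in range(2))
    half_width = Z90 * math.sqrt(var)
    return predict(est) - half_width, predict(est) + half_width


print("bootstrap:", bootstrap_ci(EST, COV))
print("delta    :", delta_ci(EST, COV))
```

When the covariance of the estimates is small the two intervals nearly coincide; they diverge as the prediction becomes more nonlinear over the range of plausible parameter values.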
Results
The results for $\rho = 0.0$ and $\rho = 0.7$ are presented and contrasted here. Results for the other
$\rho$ scenarios are presented in the Supplemental Material. Model fittings that did not converge or
did not provide standard errors were discarded from the results (at most 0.6% of the
fittings for a specific scenario and model). For $\rho = 0.7$, M0 and M2 showed the smallest
magnitudes of bias (percentage) in the fixed effects $\beta$. The 5-point approximation to $\Phi_2$ did not
provide much benefit over the 3-point. M1 (using LA) showed slightly increased biases
comparatively. M3 and M4, in which $\rho$ is fixed to 0, had greater positive biases, with AGQ
(M4) not providing much benefit over LA (M3). M5 and M6 showed lower magnitudes of bias,
except for the larger negative biases in $\beta_2$ and $\beta_4$, which reflect the maximum placebo effect and
its onset. The design might not have been as rich in information regarding these parameters.
The decrease is surprising perhaps, because these models did not estimate $\rho$ or $\Omega_{22}$. The naïve
pooled model, M7, had the largest biases, which were negative. M0 and M1 had the smallest biases
in $\Omega_{11}$, $\Omega_{22}$ and $\rho$, also as expected. Biases in estimates from models where parameters were
constrained to 0 are −100% by definition ($\rho$ for M3 through M7, $\Omega_{22}$ for M5 and M6, $\Omega_{11}$ for
M7). Given some of these models represent simplifications of the true model, meaningful
comparisons of the biases for these parameters are difficult. Biases in the off-diagonal elements
of $\Xi$ are also provided in Figure 2. These provide an idea of how well the correlations are being
estimated. Models M0, M1 and M2 showed almost no bias. Models M3 and M4 demonstrated a
consistent positive bias across all the elements, which resulted from the imposed constraint of
$\rho = 0$. AGQ (M4) did not improve these estimates. M5 and M6 had jagged patterns of bias,
with some biases < 0, and again AGQ provided no benefit. M7 did not provide estimates of the
off-diagonals, because these are assumed to be 0 by the model; thus, all were reported as −100%.
One wonders if the lower biases observed for M3 in the $\beta$ compared to M2 were because, for
some reason, M3 provided a better approximation to $\Xi$ in some average sense.
The biases in $p^{(1)}$ on the probability scale are displayed in the first row of Figure 3, where
the segments reflect the minimum to maximum bias (range) across the doses and times and the
points represent the median across these. M0, as might be anticipated, is nearly unbiased. M1,
using LA and a 3-point approximation, showed slightly increased bias compared to M0. M2
(AGQ) provided similar results to M0, which suggests that the 3-point is nearly as good as the
5-point approximation. M3 and M4 had slightly larger bias than the previous models, which was
due to not estimating $\rho$. The biases were not egregious; the biases for M4 were all within
±0.01. The biases for M2 and M4 fitted to the $\rho = 0$ scenario were similar to those of M0 (all used
AGQ), and M1 and M3 showed a slightly larger range of bias. Note that M0 through M4 are either
the true model (subject to approximation) or contain it for $\rho = 0$. M5 shows an increase in the
biases independent of $\rho$; $\Omega_{22}$ was not estimated in this model. Using AGQ (M6) did not
improve these. M7 (naïve pooled) had nearly the same biases for $p^{(1)}$ as M5 and M6. The
saturated models M8, M9 (AGQ), and M10 (naïve pooled) demonstrated little bias for either
scenario of $\rho$.
Biases in the population transition probabilities $p^{(1|0)}$ and $p^{(0|1)}$ are displayed in the second
and third rows of Figure 3, respectively. The patterns between these across the models are
similar. M0, M1 and M2 demonstrated little bias; AGQ decreased the bias comparatively. M3
through M6 for $\rho = 0.7$ and M5 and M6 for $\rho = 0$ demonstrated greater bias. M3 and M4 for
$\rho = 0$ yielded similar results to M1 and M2, as expected. M7 and M10 (naïve pooled)
demonstrated the greatest biases. M8 and M9 had smaller biases compared to those models; the
estimation of $\Omega_{11}$ improved prediction of $p^{(1|0)}$ and $p^{(0|1)}$.
The fourth and final row of Figure 3 shows the range of 90% CI coverages across the doses
and times by model. M0 and M2 (AGQ) performed the best independent of $\rho$, and M4
performed similarly to M0 and M2 for $\rho = 0$, as expected. M1 (and M3 for $\rho = 0$), M8, and M9
had less overall coverage comparatively, with M8 and M9 performing slightly better than M1 (or
M3 for $\rho = 0$). Even though M1 (and M3 for $\rho = 0$) is the true model, the Laplace approximation
degraded the coverage rates. This is likely due to estimating $\Omega_{22}$, because M8 used LA but only
estimated $\Omega_{11}$. The 90% CI rates for M8 and M9 are not egregious, despite the model form not
being the true model. The saturated nature of the model compensated, even with systematically
biased TPs. The biases increase for M3, M4 ($\rho = 0.7$) and M10. M5 through M7 demonstrated
poor coverages independent of $\rho$, which is due to the larger biases in predicting $p^{(1)}$ and also
because only 1 or 0 variance components were estimated. These models likely did not
adequately capture the correlation structure and thus inflated the amount of information used in
calculating the precisions of the estimates. The use of AGQ in M6 does not improve the
outcome compared to M5. Based on the coverage rates, extreme caution would be needed when
making inferences using M5 through M7.
Discussion
The beginning of this article demonstrated that the Markov and latent variable models are
related when there is autocorrelation. Because both methods use the Markov property and thus
could be considered Markov models, it was suggested that a different terminology be used when
referring to these approaches, and that this terminology be aligned with the structural orientation
of the model. Models which focus on the structure of transition probabilities could be termed
state space models and those that focus on marginal probabilities could be termed marginal
probability models. Details on transition probabilities, achieving steady state of a Markov
process, and handling missing data were provided to help delineate differences which manifest
between the two approaches. A simple simulation was presented to compare and contrast these
as well. The influence of the assumption of the initial condition for the Markov chain was
illustrated. Overall, the intent was not to claim superiority of either method. In general, given
the data pharmacometricians encounter typically, one will not know which approach is better.
Discussing the underlying concepts of these approaches hopefully will help inform
pharmacometricians when thinking of strategies of modeling such data. For example, such
issues could be important if the goal of a model is to be able to predict results from a study if
dosage levels are changed based on observed responses or to predict the difference in responder
rate at trial conclusion.
A hybrid model which specified marginal probabilities, yet used fixed effects (non-stochastic
components) to address the autocorrelation, was presented also. It is unknown if such a model
has been discussed in the literature. The hybrid approach could be used for other link functions
and also for more in depth looks at autocorrelation, such as decay in time (not handled by simple
SS models). Future exploration of this method might be of interest, because it has the potential
to realize the benefits from both approaches. Continuous time SS models were not discussed
here (see Pilla Reddy et al for example [10]). These models are more complex to implement,
and hence were beyond the scope of this article. Continuous time SS models are attractive in
that these do not have the complications of discrete time SS models when accounting for missing
data, and these are more flexible in dealing with autocorrelation structures.
Autocorrelation of the latent residuals was presented for dealing with stochastic
autocorrelation. An approximation for the bivariate normal enabled modeling this through an
autocorrelation function. This method considers correlation on the latent scale. A method that
was not presented, yet could be used, is one that models autocorrelation of the responses instead
of through the latent residuals. It is based on the relation
\[
Cor(Y_1, Y_2) = \frac{E(Y_1 Y_2) - E(Y_1)E(Y_2)}{SD(Y_1)\,SD(Y_2)} = \frac{p_{12}^{(1,1)} - p_1^{(1)} p_2^{(1)}}{\sqrt{p_1^{(1)}(1 - p_1^{(1)})}\,\sqrt{p_2^{(1)}(1 - p_2^{(1)})}} \tag{24}
\]
where Cor(.) is the correlation operator, E(.) is the expectation operator, and SD(.) is the standard
deviation operator. Setting $Cor(Y_1, Y_2) = \nu_{12}(\rho)$ and noting $p_{12}^{(1,0)} = p_1^{(1)} - p_{12}^{(1,1)}$, the joint
probabilities $p_{12}^{(y_1, y_2)}$ can be calculated and used in the likelihood. Table 1 illustrates the relation
between correlation on the latent and response scales for the simulations used in the article. This
approach is also Markov based (see Guerra et al [11]) and could be used for logistic or other link
functions. This method was not developed further here because of the difficulty in generalizing
it to ordered categorical data. Additionally, correlation on the latent residual scale seems more
analogous to continuous data, despite the restriction to the probit link function.
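Solving (24) for $p_{12}^{(1,1)}$ gives all four joint probabilities from the two marginal probabilities and the response-scale correlation. The sketch below shows the arithmetic; the function name, the example values and the feasibility check are illustrative assumptions rather than part of the method as published.

```python
import math


def joint_from_response_correlation(p1, p2, cor):
    """Solve (24) for p11, then fill in the remaining cells of the 2x2 table."""
    p11 = p1 * p2 + cor * math.sqrt(p1 * (1.0 - p1) * p2 * (1.0 - p2))
    p10 = p1 - p11                      # Y1 = 1, Y2 = 0
    p01 = p2 - p11                      # Y1 = 0, Y2 = 1
    p00 = 1.0 - p11 - p10 - p01
    probs = {(1, 1): p11, (1, 0): p10, (0, 1): p01, (0, 0): p00}
    if min(probs.values()) < 0.0:
        raise ValueError("correlation is outside the range feasible for these margins")
    return probs


table = joint_from_response_correlation(0.4, 0.5, 0.3)
print(table)
```

The feasibility check reflects a known limitation of working on the response scale: for binary variables the attainable correlation is bounded by the marginal probabilities, so not every value in (−1, 1) yields a valid table.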
The simulation, conducted originally to evaluate the effects of neglecting stochastic
autocorrelation in the model, which leads to biased transition probabilities, provided
considerable information on strategies for modeling longitudinal binary data. These results did
not appear contradictory to those reported by Johansson et al, in so far as these two reports can
be compared (some different objectives and estimation methods were used) [12]. The main
objective here was to evaluate predictions and their inferences which were shown to be a
function of all the parameters and so biases in any could result in biased predictions. If the
results from this study can be generalized, these would imply the following, some of which
might be surprising given the current standard for modeling such data.
The 3-point approximation to the bivariate normal is nearly as good as the 5-point. Figure 3
demonstrates this for $\rho = 0.7$, and it can be verified even for $\rho = 0.9$ (see the Supplemental
Material). Values of $\rho > 0.9$ are likely needed before a difference is perceived. Nevertheless,
unless run times are an issue (for the simulation study, the run times were similar), the 5-point is
recommended in case $\rho$ happens to be large (albeit unlikely). Similarly, one should use the best
approximation available for estimating the random effects. Adaptive Gaussian quadrature
demonstrated less bias than using the Laplace approximation, except when the random effects
structure was misspecified. There was no information to suggest that using a better
approximation could be harmful for incorrect models. In fact, biases were lower for the
saturated models when using adaptive Gaussian quadrature, and 90% CI coverage rates were
improved. Better approximation methods might require longer run times, however. If this is
not an issue, better approximations should be pursued when possible.
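The effect of the number of quadrature points on an integral over a random effect can be illustrated with ordinary (non-adaptive) Gauss-Hermite quadrature. This is only a sketch of the general idea: it is not the adaptive quadrature or Laplace approximation used in the study, it integrates a single probit probability rather than a likelihood, and the function names and test values are assumptions. The nodes and weights are the standard 1-, 3- and 5-point Gauss-Hermite rules.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf

# Gauss-Hermite nodes and weights (weight function exp(-x^2)).
GH_RULES = {
    1: [(0.0, math.sqrt(math.pi))],
    3: [(-math.sqrt(1.5), math.sqrt(math.pi) / 6.0),
        (0.0, 2.0 * math.sqrt(math.pi) / 3.0),
        (math.sqrt(1.5), math.sqrt(math.pi) / 6.0)],
    5: [(-2.0201828704560856, 0.01995324205904591),
        (-0.9585724646138185, 0.3936193231522412),
        (0.0, 0.9453087204829419),
        (0.9585724646138185, 0.3936193231522412),
        (2.0201828704560856, 0.01995324205904591)],
}


def quadrature_prob(mu, omega, n_points):
    """Approximate E[Phi(mu + eta)] for eta ~ N(0, omega) with an n-point rule."""
    scale = math.sqrt(2.0 * omega)       # change of variable eta = sqrt(2*omega) * x
    total = sum(w * PHI(mu + scale * x) for x, w in GH_RULES[n_points])
    return total / math.sqrt(math.pi)


def exact_prob(mu, omega):
    """Closed-form value of the same expectation for the probit link."""
    return PHI(mu / math.sqrt(1.0 + omega))


errors = {n: abs(quadrature_prob(-1.0, 1.0, n) - exact_prob(-1.0, 1.0)) for n in (1, 3, 5)}
print(errors)
```

In this example the error shrinks as points are added, which is the same qualitative trade-off between accuracy and computation discussed above.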
Unbiased prediction of the transition probabilities was not necessary to get (relatively) unbiased
predictions of the population mean probabilities, nor was it necessary to achieve
reasonable 90% CI coverage rates and hence inferences. Addition of fixed effects parameters is
required to attain good predictions and inferences without characterizing the transition
probabilities (or the latent marginal variance structure). Thus, not characterizing the transition
probabilities tends to produce less parsimonious models. This point is furthered below. Also,
unless $\rho$ is substantial (> 0.5 based on the simulation results presented here; see the
Supplemental Material), not including autocorrelation in the model should not affect marginal
probabilities or inferences egregiously. Characterization of the marginal variance or transition
probabilities may still be of interest if the purpose of the model is simulating or predicting
individual response profiles that incorporate changes in dosage levels which are triggered by
observed responses. Using random effects and autocorrelation should be flexible enough to deal
with many situations frequently encountered in pharmacometrics, and thus should help reduce
bias [13]. This leads to the next, and most critical, result from the simulation study.
The key learning was this. Incorporating random effects when necessary is essential to
reduce biases in predictions of the probabilities of response. Failure to include the random effect
on the non-drug model component in the simulation study led to much greater biases than
omitting autocorrelation (ie, M5 through M7). The probit formulation provides insight into why
this is the case. The marginal variability on the probit scale directly influences the population
mean probability predictions (not including autocorrelation influences the predictions indirectly
due to bias in the random effects). Consequently, incorrect random effects structures lead to
bias. Bias from failure to include necessary random effects can be reduced at the cost of
additional fixed effects. The saturated models, which had positively biased transition
probabilities, were unbiased in the population mean predictions. However, if one wants to pursue
a parsimonious model, random effects should be considered on all model components, similar to
models posited for continuous data. This makes sense intuitively. As time progresses during a
trial and as subjects receive benefit from treatment, heterogeneity is introduced. Not every
subject is anticipated to have the same benefit from the same dose, which is modeled by
including a random effect on the drug effect to account for different maximum effects a drug can
bestow. Although drug-related random effects were not considered in the simulation, it is not a
stretch to infer that failure to incorporate such effects will lead to biases in the predictions of
drug effect and drug response. Failure to incorporate these might lead to the incorporation of
more fixed effects (ie, model expansion) to get an adequate fit to the data, thwarting
interpretation and efficiency.
If drug does introduce heterogeneity or variability manifested as a change in correlation
structure as a function of dose, then this has implications for recent suggestions regarding
parameterization of indirect response models. The change from baseline parameterization [6] is
convenient for modeling data from placebo and then, in a sequential fashion, incorporating data
following active treatment, because the placebo model component absorbs the drug effect
parameters when dose or concentration is 0. Making this change does not ensure that the
variabilities will be the same, which could result in bias as discussed above. For example, think
of a simple indirect response model (IRM) [14] parameterized as $(\beta_1 + \eta_1) + (\beta_2 + \eta_2)R(t)$ or,
using change from baseline, as $(\tilde{\beta}_1 + \tilde{\eta}_1) + (\tilde{\beta}_2 + \tilde{\eta}_2)[R(t) - 1]$, where $R(t)$ is the solution of
an IRM which assumes inhibition (ie, $R(0) = 1$ and $R(t > 0) < 1$). The first model
posits that variability in baseline response is due to disease and that the drug effect will remove
this component of variability from the response. To use the change from baseline
parameterization in this case, care must be taken when postulating $\Omega$. In fact, a full $\Omega$ matrix
might be needed to avoid bias. Continuing this argument, if one considers a partial inhibition of
production of response (ie, $k_{in} = 1 - I_{max}(\eta) \cdot C(t)/(EC_{50} + C(t))$) that is a function of a
random effect, change from baseline will not necessarily lead to the same result as if a stimulatory
effect had been assumed (ie, $k_{in} = 1 + S_{max}(\eta) \cdot C(t)/(EC_{50} + C(t))$), even if change from
baseline is used and despite the potential ability to achieve a similar predicted outcome. The
reason is that the random effects assumptions of the two models lead to different
marginal variances. Thus, when considering random effects and hence the population, the model
space is a little richer than that suggested by Hu et al (see [6]) in that there are more than 3
possible IRM models.
In summation, this research suggests that more random effects should be considered when
modeling longitudinal binary or categorical data. Failure to incorporate random effects or
incorrect specification thereof can lead to biased predictions and less than nominal inference
rates.
ACKNOWLEDGEMENT
The author would like to acknowledge graciously and thank kindly Dr. Robert Bauer for
making the bivariate normal subroutine available for general use in NONMEM.
REFERENCES
1. Sheiner LB (1994) A new approach to the analysis of analgesic drug trials, illustrated with bromfenac data. Clin Pharmacol Ther 56:309–322
2. Lacroix BD, Lovern MR, Stockis A, Sargentini-Maier ML, Karlsson MO, Friberg LE (2009) A pharmacodynamic Markov mixed-effects model for determining the effect of exposure to certolizumab pegol on the ACR20. Clin Pharmacol Ther 86:387–395
3. Hutmacher MM, French JL (2011) Extending the latent variable model for extra correlated longitudinal dichotomous responses. J Pharmacokinet Pharmacodyn 38:833–859
4. Hutmacher MM, Krishnaswami S, Kowalski KG (2008) Exposure-response modeling using latent variables for the efficacy of a JAK3 inhibitor administered to rheumatoid arthritis patients. J Pharmacokinet Pharmacodyn 35:139–157
5. Drezner Z, Wesolowsky GO (1990) On the computation of the bivariate normal integral. J Stat Comput Simul 35:101–107
6. Hu C, Xu Z, Mendelsohn AM, Zhou H (2013) Latent variable indirect response modeling of categorical endpoints representing change from baseline. J Pharmacokinet Pharmacodyn 40:81–91
7. Hu C (2014) Exposure–response modeling of clinical end points using latent variable indirect response models. CPT Pharmacometrics Syst Pharmacol 3:1–8
8. Yu L, Griffith WS, Tyas SL, Snowdon DA, Kryscio RJ (2010) A nonstationary Markov transition model for computing the relative risk of dementia before death. Stat Med 29:639–648
9. Bishop YMM, Fienberg SE, Holland PW (1975) Discrete multivariate analysis: theory and practice. The MIT Press, Massachusetts, pp 486–502
10. Pilla Reddy V, Petersson KJ, Suleiman AA, Vermeulen A, Proost JH, Friberg LE (2012) Pharmacokinetic–pharmacodynamic modeling of severity levels of extrapyramidal side effects with Markov elements. CPT Pharmacometrics Syst Pharmacol 1:1–9
11. Guerra MW, Shults J, Amsterdam J, Ten-Have T (2012) The analysis of binary longitudinal data with time-dependent covariates. Stat Med 31:931–948
12. Johansson ÅM, Ueckert S, Plan EL, Hooker AC, Karlsson MO (2014) J Pharmacokinet Pharmacodyn 41:223–238
13. Gurka MJ, Edwards LJ, Muller KE (2011) Avoiding bias in mixed model inference for fixed effects. Stat Med 30(22):2696–2707
14. Dayneka NL, Garg V, Jusko WJ (1993) Comparison of four basic models of indirect pharmacodynamic response. J Pharmacokinet Biopharm 21(4):457–478
Appendices
Appendix A
The bivariate normal subroutine is available at
ftp://nonmem.iconplc.com/Public/nonmem/bivariate/ along with some sample control streams.
Below is an example control stream that corresponds to the simulation study.
$PROB Example Control Stream
$INPUT SIM ID DOSE DV DV_ TIME TIME_ ;***DV_, TIME_ are previous DV, TIME***;
;***ie, DV = y_ij, DV_ = y_ij-1, TIME = t_ij, TIME_ = t_ij-1***;
$DATA ex01.csv IGNORE=@
$SUBROUTINES OTHER=bivariate.f90 ;***Include bivariate normal subroutine***;
$PRED
B1=THETA(1) ;***beta_1***;
B2=THETA(2) ;***beta_2***;
B3=THETA(3) ;***beta_3***;
K =LOG(2)/EXP(THETA(4)) ;***used for placebo onset U(t_ij)***;
ED50=EXP(THETA(5))
ET1=EXP(THETA(6))*ETA(1) ;***parameterized for smoothed parametric bootstrap***;
ET2=EXP(THETA(7))*ETA(2)
RHO=(2/(1+EXP(-THETA(8)))-1)**(TIME-TIME_) ;***correlation function nu_ijj'(rho)***;
U = (1-EXP(-K*TIME))
U_ = (1-EXP(-K*TIME_))
MX =B1+B2*U+B3*DOSE/(DOSE+ED50)+ET1+ET2*U ;*** mu_ij + g_ij*eta_i ***;
MX_ =B1+B2*U_+B3*DOSE/(DOSE+ED50)+ET1+ET2*U_ ;*** mu_ij-1 + g_ij-1*eta_i ***;
PC =(1- PHI(MX))*(1-DV ) + PHI(MX) *DV ;*** pdot_ij(0)*(1 - y_ij) + pdot_ij(1)*y_ij ***;
PC_ =(1-PHI(MX_))*(1-DV_) + PHI(MX_)*DV_ ;*** pdot_ij-1(0)*(1 - y_ij-1) + pdot_ij-1(1)*y_ij-1 ***;
IF(PC.LE.0.0D+00) EXIT ;***avoid issues of LOG(0)***;
IF (TIME.EQ.1) THEN ;***signifies first observation***;
LOGL=LOG(PC)
ELSE
;*** Start Passing Bivariate Normal Information - only do so on subsequent records***;
VECTRA(1)=RHO
VECTRA(2)=MX
VECTRA(3)=MX_
VECTRA(4)=1 ;***0 = Upper tail as in Drezner & Wesolowsky; 1 = Bottom tail***;
VECTRA(5)=1 ;***0 = 3 pt approximation; 1 = 5 point approximation***;
BV=FUNCA(VECTRA) ;***return bivariate normal results***;
;***code joint probability based on bottom tail***;
JP=(DV-1)*(DV_-1)+(DV-1)*(1-2*DV_)*PHI(MX_)+(DV_-1)*(1-2*DV)*PHI(MX)+(1-2*DV)*(1-2*DV_)*BV
LOGL=LOG(JP/PC_)
ENDIF
;***Compute population predictions***;
V=SQRT(1+EXP(THETA(6))**2+EXP(THETA(7))**2*U**2)
;***V=SQRT(1+OMEGA(1,1)+OMEGA(2,2)*U**2) use if estimating OMEGA***;
POPP = (B1+B2*U +B3*DOSE/(DOSE+ED50))/V ;***Population mean prediction***;
;***Set up log-likelihood (-2LL is requested in $EST)***;
Y=-2*LOGL
$THETA
-2.4 ; 1 B1
2.0 ; 2 B2
3.4 ; 3 B3
1.4 ; 4 LOG(B4)
0.7 ; 5 LOG(B5)
0.1 ; 6 LOG SQRT VAR(ETA1)
0.5 ; 7 LOG SQRT VAR(ETA2)
0.6 ; 8 RHO parameter
$OMEGA DIAGONAL(2)
1 FIXED ; V1
1 FIXED ; V2
$EST MAX=8000 PRINT=10 METHOD=1 LAPLACE -2LL SIGL=10 NOHABORT
;***$EST METHOD=IMP LAPLACE -2LL NOHABORT PRINT=1 NITER=500 CTYPE=3
$COV COMPRESS MATRIX=R PRINT=E UNCONDITIONAL
A version of this control stream which uses features of NONMEM to generate DV_, MX_ etc is
provided in the Supplemental Material. That version should be helpful when dealing with
models that employ differential equations.
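The likelihood contribution coded in the control stream can be checked outside NONMEM. The following is a minimal Python sketch, not the bivariate.f90 subroutine: the function names are hypothetical, and the lower-tail bivariate normal probability is computed here by Simpson integration of the bivariate density over the correlation, a technique swapped in for the Drezner and Wesolowsky approximation so that the example needs only the standard library. The numerical inputs are illustrative assumptions.

```python
import math

def phi_cdf(x):
    """Standard normal CDF (the role of PHI in the control stream)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bvn_lower(h, k, rho, n=400):
    """P(Z1 <= h, Z2 <= k) for a standard bivariate normal, |rho| < 1.

    Computed as Phi(h)*Phi(k) plus the integral over (0, rho) of the
    bivariate density evaluated at (h, k), by composite Simpson (n even).
    """
    def density(r):
        s = 1.0 - r * r
        q = h * h - 2.0 * r * h * k + k * k
        return math.exp(-q / (2.0 * s)) / (2.0 * math.pi * math.sqrt(s))
    step = rho / n
    total = density(0.0) + density(rho)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * density(i * step)
    return phi_cdf(h) * phi_cdf(k) + total * step / 3.0

def joint_prob(dv, dv_prev, mx, mx_prev, rho):
    """Joint probability of (y_ij, y_ij-1), same coding as JP in the control stream."""
    bv = bvn_lower(mx, mx_prev, rho)
    return ((dv - 1) * (dv_prev - 1)
            + (dv - 1) * (1 - 2 * dv_prev) * phi_cdf(mx_prev)
            + (dv_prev - 1) * (1 - 2 * dv) * phi_cdf(mx)
            + (1 - 2 * dv) * (1 - 2 * dv_prev) * bv)

def conditional_loglik(dv, dv_prev, mx, mx_prev, rho):
    """log of JP/PC_, the contribution of a non-first record."""
    pc_prev = (1 - phi_cdf(mx_prev)) * (1 - dv_prev) + phi_cdf(mx_prev) * dv_prev
    return math.log(joint_prob(dv, dv_prev, mx, mx_prev, rho) / pc_prev)

# Illustrative values only: MX = 0.3, MX_ = -0.2, RHO = 0.6
cells = {(a, b): joint_prob(a, b, 0.3, -0.2, 0.6) for a in (0, 1) for b in (0, 1)}
print(cells, sum(cells.values()))
```

The four cells of the 2 x 2 table should sum to one, and the two conditional probabilities given the previous response should also sum to one, which gives a quick check on the sign pattern in the JP expression.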
Appendix B
The calculation of bias for the population transition probabilities (eliminating the subscript $i$ for simplicity and indexing two arbitrary times by 1 and 2) requires the quantity $E_\eta \Phi_2(\acute{\mu}_1, \acute{\mu}_2, \nu_{12}(\rho))$, where $\acute{\mu}_1 = \mu_1 - g_1\eta$. From the latent variable formulation

$$E_\eta \Phi_2 = \int_{-\infty}^{\infty} \Phi_2(\acute{\mu}_1, \acute{\mu}_2, \nu_{12}(\rho)) = \int_{-\infty}^{\infty} \int_{-\infty}^{\acute{\mu}_1} \int_{-\infty}^{\acute{\mu}_2} \phi_2(z_1, z_2, \nu_{12}(\rho)) \, dz_1 \, dz_2$$

where

$$\phi_2(z_1, z_2, \nu_{12}(\rho)) = \frac{1}{2\pi\sqrt{1 - \nu_{12}(\rho)^2}} \exp\left\{ -\frac{z_1^2 - 2\nu_{12}(\rho) z_1 z_2 + z_2^2}{2\left(1 - \nu_{12}(\rho)^2\right)} \right\}.$$

Changing the order of integration and making a suitable change of variables based on the square root of the diagonal elements of $\Xi$ (based on times 1 and 2 only),

$$E_\eta \Phi_2 = \Phi_2(\mu_1, \mu_2, \kappa_{12}(\rho, \Omega))$$

where $\kappa_{12}(\rho)$ is the off-diagonal from $[diag(\Xi)]^{-1/2} \, \Xi \, [diag(\Xi)]^{-1/2}$, which for the example yields $\kappa_{12}(\rho) = (\nu_{12}(\rho) + \Omega_{11} + \Omega_{22} U_1 U_2) V_1^{-1/2} V_2^{-1/2}$. Monte Carlo methods could also be used:

$$p_{ij}^{(1|0)} \cong \left[p_{ij-1}^{(0)}\right]^{-1} \left[\frac{1}{M} \sum_{m=1}^{M} \dot{p}_{ijj-1}^{(1,0)*}\right] \quad \text{and} \quad p_{ij}^{(0|1)} \cong \left[p_{ij-1}^{(1)}\right]^{-1} \left[\frac{1}{M} \sum_{m=1}^{M} \dot{p}_{ijj-1}^{(0,1)*}\right],$$

where the '*' indicates the probability is a function of $\eta_m^*$, which is sampled from $N(0, \Omega)$, with $\Omega$ estimated, using a sufficiently large $M$.
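The two routes to a population transition probability described above can be compared numerically. The sketch below is an illustration under stated assumptions rather than the code used in the simulation study: it assumes the example's random effects structure ($g_j\eta = \eta_1 + \eta_2 U_j$ with diagonal $\Omega$), takes $V_j = 1 + \Omega_{11} + \Omega_{22}U_j^2$ as the marginal latent variance, standardises the means by $\sqrt{V_j}$ as in the POPP line of the control stream, and uses a Simpson-based bivariate normal routine. All function names and parameter values are hypothetical.

```python
import math
import random

def phi_cdf(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bvn_lower(h, k, rho, n=50):
    """P(Z1 <= h, Z2 <= k): Phi(h)Phi(k) plus a Simpson integral of the density over (0, rho)."""
    def density(r):
        s = 1.0 - r * r
        q = h * h - 2.0 * r * h * k + k * k
        return math.exp(-q / (2.0 * s)) / (2.0 * math.pi * math.sqrt(s))
    step = rho / n
    total = density(0.0) + density(rho)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * density(i * step)
    return phi_cdf(h) * phi_cdf(k) + total * step / 3.0

def latent_moments(nu12, om11, om22, u1, u2):
    """Return (kappa12, V1, V2) after integrating out eta (V taken as a variance)."""
    v1 = 1.0 + om11 + om22 * u1 * u1
    v2 = 1.0 + om11 + om22 * u2 * u2
    return (nu12 + om11 + om22 * u1 * u2) / math.sqrt(v1 * v2), v1, v2

def p10_closed_form(mu_prev, mu_cur, nu12, om11, om22, u_prev, u_cur):
    """Population p(1|0) from the marginal bivariate normal with correlation kappa12."""
    kap, v_prev, v_cur = latent_moments(nu12, om11, om22, u_prev, u_cur)
    a, b = mu_cur / math.sqrt(v_cur), mu_prev / math.sqrt(v_prev)
    return (phi_cdf(a) - bvn_lower(a, b, kap)) / (1.0 - phi_cdf(b))

def p10_monte_carlo(mu_prev, mu_cur, nu12, om11, om22, u_prev, u_cur, m=20000, seed=2015):
    """Population p(1|0) by averaging the conditional joint probability over sampled eta."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(m):
        eta1 = rng.gauss(0.0, math.sqrt(om11))
        eta2 = rng.gauss(0.0, math.sqrt(om22))
        a = mu_cur + eta1 + eta2 * u_cur     # conditional mean at time j
        b = mu_prev + eta1 + eta2 * u_prev   # conditional mean at time j-1
        acc += phi_cdf(a) - bvn_lower(a, b, nu12)   # P(Y_j = 1, Y_j-1 = 0 | eta)
    _, v_prev, _ = latent_moments(nu12, om11, om22, u_prev, u_cur)
    return (acc / m) / (1.0 - phi_cdf(mu_prev / math.sqrt(v_prev)))

args = (-0.5, 0.4, 0.7, 0.8, 0.5, 0.3, 0.5)   # illustrative values only
closed = p10_closed_form(*args)
simulated = p10_monte_carlo(*args)
print(closed, simulated)
```

The two numbers should agree to within Monte Carlo error, and `latent_moments` returns $\nu_{12}$ unchanged when both variances are zero, as expected when there are no random effects.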
Appendix C
Let $\psi = (\beta, \Omega)$ be the vector of parameters and $\hat{\psi} = (\hat{\beta}, \hat{\Omega})$ correspond to their estimates, which have an estimated covariance matrix (eg, COV step) $\widehat{Var}(\hat{\psi}) = \hat{\Delta}$. Then the variance of the prediction can be calculated using

$$\widehat{Var}\left[\Phi^{-1}\left(p_{ij}^{(1)}\right)\right] \approx \left.\frac{\partial \left[\Phi^{-1}\left(p_{ij}^{(1)}\right)\right]}{\partial \psi}\right|_{\psi=\hat{\psi}}^{T} \hat{\Delta} \left.\frac{\partial \left[\Phi^{-1}\left(p_{ij}^{(1)}\right)\right]}{\partial \psi}\right|_{\psi=\hat{\psi}}$$

such that the confidence limit (eg, at 5th percentile) is

$$CL_{0.05} = \Phi\left[\Phi^{-1}\left(p_{ij}^{(1)}\right) + Z_{0.05} \cdot \sqrt{\widehat{Var}\left[\Phi^{-1}\left(p_{ij}^{(1)}\right)\right]}\right]$$

where $Z_{0.05}$ is the quantile corresponding to probability level 0.05. To apply this method when a closed form solution to the population mean is not available, suitable regularity conditions are required. In general, let $E(Y|\eta) = \mu(\eta)$ and $E(Y; \psi) = E_\eta(\mu(\eta); \psi)$ where inference on $E(Y)$ is desired. Applying the delta-method

$$\widehat{Var}\left[E(Y; \psi)\right] \approx \left.\frac{\partial [E(Y; \psi)]}{\partial \psi}\right|_{\psi=\hat{\psi}}^{T} \hat{\Delta} \left.\frac{\partial [E(Y; \psi)]}{\partial \psi}\right|_{\psi=\hat{\psi}}$$

where passing the differentiation through the integration

$$\left.\frac{\partial [E(Y; \psi)]}{\partial \psi}\right|_{\psi=\hat{\psi}} = \left.\frac{\partial [E_\eta(\mu(\eta); \psi)]}{\partial \psi}\right|_{\psi=\hat{\psi}} = \left.\frac{E_\eta[\partial(\mu(\eta); \psi)]}{\partial \psi}\right|_{\psi=\hat{\psi}} \cong \frac{1}{M}\sum_{m=1}^{M} \left.\frac{\partial(\mu(\eta_m^*); \psi)}{\partial \psi}\right|_{\psi=\hat{\psi}}$$

where $\eta^* \sim MVN(0, \hat{\Omega})$. Finite differences could be used, for example

$$\frac{\partial(\mu(\eta_m^*); \psi)}{\partial \psi} = \left[(\mu(\eta_m^*); \psi + \delta) - (\mu(\eta_m^*); \psi)\right] / \delta.$$
Appendix D
Let the response $Y$ be one of $K+1$ ordered values $0, 1, 2, \ldots, K-1, K$, and let $Z$ be defined as in equation (16). For this derivation (suppressing $\eta$ when convenient and with some convenient abuse of notation), assume the mapping $P(Y = 0) = P(\gamma_1 < Z < \gamma_0), \ldots, P(Y = K) = P(\gamma_{K+1} < Z \le \gamma_K)$ where the $\gamma_0 = \infty > \gamma_1 > \cdots > \gamma_{K-1} > \gamma_K > \gamma_{K+1} = -\infty$ are thresholds ($K$ parameters), and let $\acute{\mu}_{y_j} = \gamma_{y_j} - f_j - g_j\eta$ for time $j$ and $\nu_{12} = \nu_{12}(\rho)$.

$$\dot{p}_1^{(y_1)} = P(Y_1 = y_1) = P(\gamma_{y_1+1} < Z_1 \le \gamma_{y_1}) = \Phi(\acute{\mu}_{y_1}) - \Phi(\acute{\mu}_{y_1+1}), \qquad \sum_{k=0}^{K} \dot{p}_1^{(k)} = 1$$

$$\dot{p}_{12}^{(y_1, y_2)} = P(Y_1 = y_1, Y_2 = y_2) = P(\gamma_{y_1+1} < Z_1 \le \gamma_{y_1}, \; \gamma_{y_2+1} < Z_2 \le \gamma_{y_2})$$
$$= \Phi_2(\acute{\mu}_{y_1}, \acute{\mu}_{y_2}, \nu_{12}) - \Phi_2(\acute{\mu}_{y_1+1}, \acute{\mu}_{y_2}, \nu_{12}) - \Phi_2(\acute{\mu}_{y_1}, \acute{\mu}_{y_2+1}, \nu_{12}) + \Phi_2(\acute{\mu}_{y_1+1}, \acute{\mu}_{y_2+1}, \nu_{12})$$

where $\Phi(-\infty) = 0$, $\Phi(\infty) = 1$, and in this notation $\Phi_2(\infty, \infty, \nu_{12}) = 1$; $\Phi_2(x, \infty, \nu_{12}) = \Phi_2(\infty, x, \nu_{12}) = \Phi(x)$; and $\Phi_2(x, -\infty, \nu_{12}) = \Phi_2(-\infty, x, \nu_{12}) = \Phi_2(-\infty, \infty, \nu_{12}) = \Phi_2(\infty, -\infty, \nu_{12}) = 0$.
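The threshold mapping and the limiting conventions for $\Phi_2$ can be made concrete with a short Python sketch. It assumes four ordered categories with hypothetical thresholds, conditions on a fixed value of $g_j\eta$ (zero by default), and again uses a Simpson-based bivariate normal routine as a stand-in for whatever routine is used in practice; all names and values are illustrative.

```python
import math

INF = math.inf

def phi_cdf(x):
    """Standard normal CDF; math.erf handles +/- infinity directly."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bvn_lower(h, k, rho, n=200):
    """P(Z1 <= h, Z2 <= k) with the infinite-limit conventions of this appendix."""
    if h == -INF or k == -INF:
        return 0.0
    if h == INF:
        return phi_cdf(k)
    if k == INF:
        return phi_cdf(h)
    def density(r):
        s = 1.0 - r * r
        q = h * h - 2.0 * r * h * k + k * k
        return math.exp(-q / (2.0 * s)) / (2.0 * math.pi * math.sqrt(s))
    step = rho / n
    total = density(0.0) + density(rho)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * density(i * step)
    return phi_cdf(h) * phi_cdf(k) + total * step / 3.0

def limits(y, gammas, f, g_eta):
    """Upper and lower arguments for category y: gamma_y - f - g*eta and gamma_{y+1} - f - g*eta."""
    return gammas[y] - f - g_eta, gammas[y + 1] - f - g_eta

def marginal(y, gammas, f, g_eta=0.0):
    hi, lo = limits(y, gammas, f, g_eta)
    return phi_cdf(hi) - phi_cdf(lo)

def joint(y1, y2, gammas, f1, f2, nu12, g1_eta=0.0, g2_eta=0.0):
    hi1, lo1 = limits(y1, gammas, f1, g1_eta)
    hi2, lo2 = limits(y2, gammas, f2, g2_eta)
    return (bvn_lower(hi1, hi2, nu12) - bvn_lower(lo1, hi2, nu12)
            - bvn_lower(hi1, lo2, nu12) + bvn_lower(lo1, lo2, nu12))

# Hypothetical thresholds for K = 3: gamma_0 = inf > gamma_1 > gamma_2 > gamma_3 > gamma_4 = -inf
gammas = [INF, 1.0, 0.0, -1.2, -INF]
f1, f2, nu12 = 0.2, -0.3, 0.6
marg = [marginal(y, gammas, f1) for y in range(4)]
table = [[joint(a, b, gammas, f1, f2, nu12) for b in range(4)] for a in range(4)]
print(marg, sum(marg))
```

Summing a row of `table` over the second response recovers the corresponding marginal probability at time 1, because the $\Phi_2$ terms telescope in the same way as the univariate ones.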
Table 1 Correlations for the Latent Variables (Z) and Responses (Y)
Columns headed Z give the correlation in the LV (Z); columns headed Y give the correlation in the responses (Y)a.

| Elementb | Z, ρ=0 | Z, ρ=0.3 | Z, ρ=0.5 | Z, ρ=0.7 | Z, ρ=0.9 | Y, ρ=0 | Y, ρ=0.3 | Y, ρ=0.5 | Y, ρ=0.7 | Y, ρ=0.9 |
|---|---|---|---|---|---|---|---|---|---|---|
| 1,2 | 0.52 | 0.66 | 0.76 | 0.85 | 0.94 | 0.30-0.34 | 0.40-0.47 | 0.48-0.55 | 0.59-0.64 | 0.75-0.78 |
| 1,3 | 0.52 | 0.53 | 0.57 | 0.66 | 0.83 | 0.29-0.34 | 0.31-0.35 | 0.33-0.39 | 0.41-0.45 | 0.56-0.61 |
| 1,4 | 0.49 | 0.49 | 0.49 | 0.52 | 0.67 | 0.27-0.32 | 0.28-0.32 | 0.27-0.32 | 0.29-0.33 | 0.40-0.45 |
| 1,5 | 0.47 | 0.47 | 0.47 | 0.47 | 0.54 | 0.25-0.29 | 0.25-0.29 | 0.26-0.31 | 0.26-0.29 | 0.31-0.34 |
| 1,6 | 0.46 | 0.46 | 0.46 | 0.46 | 0.49 | 0.24-0.30 | 0.26-0.29 | 0.25-0.29 | 0.25-0.29 | 0.26-0.32 |
| 1,7 | 0.46 | 0.46 | 0.46 | 0.46 | 0.47 | 0.25-0.30 | 0.25-0.29 | 0.25-0.29 | 0.25-0.29 | 0.25-0.30 |
| 2,3 | 0.57 | 0.60 | 0.67 | 0.77 | 0.90 | 0.34-0.38 | 0.36-0.41 | 0.41-0.47 | 0.50-0.56 | 0.68-0.71 |
| 2,4 | 0.56 | 0.56 | 0.57 | 0.61 | 0.76 | 0.31-0.37 | 0.33-0.37 | 0.33-0.38 | 0.36-0.42 | 0.49-0.54 |
| 2,5 | 0.55 | 0.55 | 0.55 | 0.56 | 0.63 | 0.31-0.36 | 0.32-0.36 | 0.33-0.36 | 0.32-0.37 | 0.38-0.42 |
| 2,6 | 0.55 | 0.55 | 0.55 | 0.55 | 0.58 | 0.31-0.36 | 0.32-0.35 | 0.32-0.36 | 0.32-0.36 | 0.34-0.39 |
| 2,7 | 0.55 | 0.55 | 0.55 | 0.55 | 0.56 | 0.31-0.36 | 0.32-0.35 | 0.31-0.36 | 0.31-0.36 | 0.31-0.36 |
| 3,4 | 0.65 | 0.65 | 0.67 | 0.73 | 0.87 | 0.39-0.44 | 0.39-0.45 | 0.41-0.47 | 0.46-0.51 | 0.63-0.66 |
| 3,5 | 0.65 | 0.65 | 0.65 | 0.66 | 0.74 | 0.38-0.45 | 0.40-0.45 | 0.39-0.45 | 0.41-0.45 | 0.48-0.52 |
| 3,6 | 0.65 | 0.65 | 0.65 | 0.65 | 0.69 | 0.39-0.45 | 0.40-0.45 | 0.40-0.45 | 0.40-0.45 | 0.43-0.48 |
| 3,7 | 0.65 | 0.65 | 0.65 | 0.65 | 0.66 | 0.40-0.45 | 0.40-0.44 | 0.40-0.45 | 0.39-0.45 | 0.40-0.46 |
| 4,5 | 0.73 | 0.73 | 0.73 | 0.74 | 0.84 | 0.46-0.51 | 0.46-0.52 | 0.46-0.52 | 0.48-0.53 | 0.59-0.64 |
| 4,6 | 0.73 | 0.73 | 0.73 | 0.73 | 0.78 | 0.47-0.52 | 0.47-0.53 | 0.47-0.52 | 0.47-0.53 | 0.52-0.56 |
| 4,7 | 0.73 | 0.73 | 0.73 | 0.73 | 0.75 | 0.47-0.51 | 0.48-0.53 | 0.49-0.52 | 0.48-0.53 | 0.48-0.53 |
| 5,6 | 0.77 | 0.77 | 0.77 | 0.78 | 0.87 | 0.51-0.56 | 0.51-0.56 | 0.50-0.56 | 0.52-0.57 | 0.63-0.67 |
| 5,7 | 0.77 | 0.77 | 0.77 | 0.77 | 0.80 | 0.50-0.55 | 0.51-0.56 | 0.50-0.56 | 0.51-0.56 | 0.53-0.59 |
| 6,7 | 0.78 | 0.78 | 0.78 | 0.78 | 0.84 | 0.52-0.56 | 0.51-0.56 | 0.51-0.57 | 0.52-0.57 | 0.58-0.63 |

a Correlation in responses calculated by simulation (n=20,000 per dose group); the range reported is over dose groups (0 to 30 mg).
b Element relates indices of the matrices, eg, 2,7 is 2nd row, 7th column. Diagonals of the matrices = 1 by definition.
Captions for Figures
Fig. 1. Results for the illustration. Black line is true value. Grey lines with numbers correspond
to results from Models 1 − 5, respectively.
Fig. 2. Percentage bias in the parameter estimates for models M0 through M7.
Fig. 3. Range of bias (line segment) and median bias (point) across these doses and times over the 1000 simulations for the population predictions ($p^{(1)}$, first row) and transition probabilities ($p^{(1|0)}$ and $p^{(0|1)}$, second and third rows, respectively). Last row corresponds to 90% CI coverage rates across doses and times over the simulations. Black is for the smoothed parametric bootstrap and grey is for the delta method.
Figure 1
Figure 2 (panels: ρ = 0 and ρ = 0.7; axes: Parameter (β1 to β5, ρ, Ω11, Ω22, Ξ21 to Ξ76) and % Bias; legend: M0 to M7)
Figure 3