Estimation across multiple models with application to Bayesian computing and software development
Richard J Stevens (Diabetes Trials Unit, University of Oxford)
Trevor J Sweeting (Department of Statistical Science, University College London)
Abstract
Statistical models are sometimes incorporated into computer software for making
predictions about future observations. When the computer model consists of a
single statistical model this corresponds to estimation of a function of the model
parameters. This paper is concerned with the case that the computer model implements multiple, individually-estimated statistical sub-models. This case frequently
arises, for example, in models for medical decision making that derive parameter
information from multiple clinical studies. We develop a method for calculating the
posterior mean of a function of the parameter vectors of multiple statistical models that is easy to implement in computer software, has high asymptotic accuracy,
and has a computational cost linear in the total number of model parameters. The
formula is then used to derive a general result about posterior estimation across
multiple models. The utility of the results is illustrated by application to clinical
software that estimates the risk of fatal coronary disease in people with diabetes.
Key words: Bayesian inference; asymptotic theory; computer models.
Research Report No.274, Department of Statistical Science, University College
London. Date: January 2007
1 Introduction
It frequently occurs that a computer model estimates a single output from multiple,
individually-estimated statistical models. For example, a computer model for risk
of complications of diabetes (Eastman et al. 1997) combines a survival model for
heart disease estimated from the Framingham Heart Study with a Markov model
for diseases of the eye estimated from the Wisconsin Study of Diabetic Retinopathy and several more models. This approach is typical of many models developed
since in diabetes and in other fields (Brown 2000). A typical output to be estimated might be life-expectancy or quality-adjusted life expectancy. Calculations
are often computationally demanding and a common practice at present is to restrict attention to the maximum likelihood estimate (MLE). When uncertainty
across multiple models is addressed, it is usually by Monte Carlo methods, which
increases the computational burden substantially; so much so that many published
software models take many hours to estimate uncertainty (e.g. Clarke et al. 2004),
or decline to address uncertainty at all (e.g. Eddy and Schlessinger 2003).
In this paper we provide a framework for calculations across multiple statistical
models, in which the computational burden at run-time increases only linearly
with the total number of model parameters. Our approach uses an asymptotic
approximation formula for posterior moments obtained by Sweeting (1996) which
we will refer to here as the summation formula. This formula appears to have
no equivalent outside the Bayesian paradigm and for that reason the remainder of
this paper is restricted to Bayesian estimation via posterior expectations. That the
summation formula can be useful in practice has been shown previously (Stevens
2003). Although other asymptotic methods exist for the single-model situation
(see, for example, Hoeffding 1982, Tierney and Kadane 1986), we show here that
the summation formula has properties relevant to the situation in which a computer
model uses two or more statistical models to obtain output. The summation
formula also has properties which allow most of the computational burden to be
absorbed at development time, with rapid calculations at run-time.
The summation formula is given, without proof, in Section 2. In Section 3 we
prove our main result by considering the case of a single model whose likelihood, for
a parameter vector θ, factorizes into p likelihood functions for subvectors θ1 , . . . , θp .
This, and the more general corollary derived in Section 4, can be applied to the case
of a single computer model whose output depends on p independently estimated
statistical models. In Section 5 we give as an example computer software that
combines three statistical models to estimate risk of fatal coronary heart disease.
Section 6 gives a general discussion.
2 Formulae for posterior expectations based on signed-root transformations
In this section we review the summation formula in Sweeting (1996), which provides
asymptotic approximations to posterior expectations.
Suppose that a model is represented by a likelihood function L(θ) ≡ Ln (θ; x)
on a k-dimensional parameter vector θ = (θ1 , . . . , θk ), based on a data matrix x
with n rows, and a prior density π(θ). Let l(θ) = log L(θ) be the log-likelihood and
θ̂ = (θ̂1 , . . . , θ̂k ) be the MLE of θ. Also let J be the observed information matrix;
that is, J is the matrix of second-order partial derivatives of −l(θ) evaluated at θ =
θ̂. For each i = 1, . . . , k, let Li (θi ) denote the profile likelihood of (θ̂1 , . . . , θ̂i−1 , θi ).
That is, Li (θi ) is the maximum value of L achievable, for a given θi , by fixing
θ1 , . . . , θi−1 at their MLEs and maximising over all possible values of θi+1 , . . . , θk .
Define θ̂i+1 (θi ), θ̂i+2 (θi ), . . . , θ̂k (θi ) to be the values of θi+1 , θi+2 , . . . , θk at which
this conditional maximum is achieved. Finally, define the profile log-likelihoods
li (θi ) = log Li (θi ). We can now define a transformation Ri of each θi , i = 1, . . . , k,
by
Ri(θi) = sign(θi − θ̂i)[2{l(θ̂) − li(θi)}]^{1/2}.

Each Ri is a signed-root transformation of θi; in full, it is a signed-root profile log-likelihood ratio. Note that Σ_{i=1}^{k} {Ri(θi)}² = 2{l(θ̂) − l(θ)}, the usual log-likelihood ratio statistic.
For each i, define θi+ and θi− to be the solutions to the equations Ri (θi ) = 1
and Ri (θi ) = −1 respectively. Sweeting (1996) uses these values as the basis for
an approximation to posterior expectations. For i = 1, . . . , k − 1, define ji+1 (θi ) to
be the (k − i) × (k − i) matrix with (r, s) entry −∂²l/∂θr∂θs (r, s = i + 1, . . . , k),
evaluated at (θ̂1 , . . . , θ̂i−1 , θi , θ̂i+1 (θi ), . . . , θ̂k (θi )), with jk+1 (θk ) defined to be 1 and
j1 (θ̂) = J. Finally define πi (θi ) = π(θ̂1 , . . . , θ̂i−1 , θi , θ̂i+1 (θi ), . . . , θ̂k (θi )).
3
Now define

τi+ = πi(θi+) |ji+1(θi+)|^{1/2} {∂l(θ̂1, . . . , θ̂i−1, θi+, θ̂i+1(θi+), . . . , θ̂k(θi+))/∂θi}^{−1},   (1)
τi− = πi(θi−) |ji+1(θi−)|^{1/2} {∂l(θ̂1, . . . , θ̂i−1, θi−, θ̂i+1(θi−), . . . , θ̂k(θi−))/∂θi}^{−1},

and τi = τi− + τi+. Finally, let αi+ = τi+/τi and αi− = τi−/τi. Then Sweeting’s
(1996) approximation to the posterior expectation E{g(θ)} of a ‘smooth’ function
g(θ) is
E{g(θ)} = g(θ̂) + Σ_{i=1}^{k} {αi+ gi+ + αi− gi− − g(θ̂)}   (2)

in which gi+ = g(θ̂1, . . . , θ̂i−1, θi+, θ̂i+1(θi+), . . . , θ̂k(θi+)) and gi− = g(θ̂1, . . . , θ̂i−1, θi−, θ̂i+1(θi−), . . . , θ̂k(θi−)). The error term in (2) turns out to be O(n^{−2}) (Sweeting, 1996). This may be compared with the error of the plug-in approximation g(θ̂), which is O(n^{−1}). The above result holds whenever the prior density π(θ) is O(1) with respect to n.
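To make the recipe concrete, the following is a minimal numerical sketch of the summation formula for a single scalar parameter (k = 1), using a one-parameter exponential model with a flat prior; the model, data and prior are our own illustrative choices, not taken from Sweeting (1996). With k = 1, formula (2) reduces to E{g(θ)} ≈ α1+ g(θ1+) + α1− g(θ1−).

```python
import math

# Illustrative data: n exponential observations with sufficient statistic s = sum(x).
n, s = 20, 20.0

def loglik(theta):                       # l(theta) = n log(theta) - theta * s
    return n * math.log(theta) - theta * s

theta_hat = n / s                        # MLE of the exponential rate
l_hat = loglik(theta_hat)

def signed_root(theta):                  # R(theta) = sign(theta - theta_hat)[2{l(theta_hat) - l(theta)}]^{1/2}
    return math.copysign(math.sqrt(2.0 * (l_hat - loglik(theta))), theta - theta_hat)

def solve_root(target, lo, hi):          # bisection: R is monotone increasing in theta
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if signed_root(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

theta_plus = solve_root(1.0, theta_hat, 10.0 * theta_hat)   # R(theta+) = +1
theta_minus = solve_root(-1.0, 1e-9, theta_hat)             # R(theta-) = -1

def score(theta):                        # l'(theta) = n/theta - s
    return n / theta - s

# With k = 1 and a flat prior, the factor pi_1 |j_2|^{1/2} in (1) is constant,
# so the tau's reduce to the reciprocals of |l'(theta+-)|.
tau_plus = 1.0 / abs(score(theta_plus))
tau_minus = 1.0 / abs(score(theta_minus))
alpha_plus = tau_plus / (tau_plus + tau_minus)
alpha_minus = tau_minus / (tau_plus + tau_minus)

def approx_posterior_mean(g):            # formula (2) specialised to k = 1
    return alpha_plus * g(theta_plus) + alpha_minus * g(theta_minus)

# Exact check: with a flat prior, theta | x ~ Gamma(n + 1, s), so
# E{exp(-theta)} = {s/(s + 1)}^{n+1}.
g = lambda t: math.exp(-t)
approx = approx_posterior_mean(g)
exact = (s / (s + 1.0)) ** (n + 1)
plug_in = g(theta_hat)
```

In this run the summation formula lands well over an order of magnitude closer to the exact posterior mean than the plug-in value g(θ̂), consistent with the O(n^{−2}) versus O(n^{−1}) error rates above.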
We emphasise that, despite the complexity of the derivations, the implementation of these methods is relatively straightforward. The presentation here is restricted to signed-root log-likelihood ratios. Sweeting (1996) embeds these in the
more general framework of signed-root log-density ratios. Sweeting and Kharroubi
(2003) obtain alternative versions of formula (2) in which the abscissae θi+, θi− are placed symmetrically either side of θ̂i at a distance ki^{−1/2}, where ki is the reciprocal of the first entry in {ji(θ̂i−1)}^{−1}. It is shown that these formulae are also
correct to O(n−2 ). A disadvantage of these versions is that they are not invariant
to reparameterisation, so some care needs to be taken when choosing the parameterisation. On the other hand, they may be easier to implement as no inversion of
the signed-root log-likelihood ratio is required for the computation of θi+ and θi− .
It is possible to obtain an approximation for the posterior variance of g(θ) by
applying formula (2) to both g(θ) and {g(θ)}2 . However, in terms of asymptotic
error of approximation, no gain is made over the simpler approximation that uses
the observed information matrix along with the delta method,

Var{g(θ)} = {g′(θ̂)}^T J^{−1} g′(θ̂),   (3)

since this is already an O(n^{−2}) approximation. Here g′(θ) is the column vector of first-order partial derivatives of g(θ). If further a normal approximation to the
posterior distribution of g(θ) is appropriate, then clearly interval estimates for
g(θ) can be obtained from (2) and (3). In general such (two-sided) intervals will
only be accurate to O(n−1 ). Higher-order accuracy may be achieved by using the
distributional approximation formulae in Sweeting (1996), for example.
3 Application to multi-component models
In this section we consider the case of a single computer model that incorporates
p statistical sub-models, possibly derived from independent data sets, to calculate
an output. The parameter vector of the combined model is θ = (θ1 , . . . , θp ), where
θm = (θm1, . . . , θmkm) is the vector of parameters associated with the mth sub-model, m = 1, . . . , p. The dimension of the overall parameter space is therefore
k = k1 + · · · + kp . Let the data matrix associated with sub-model m have nm rows.
We assume that the likelihood factorizes as
L(θ) = L1(θ1)L2(θ2) · · · Lp(θp),   (4)
where now Lm(θm) denotes the likelihood function associated with the mth sub-model. Independent data matrices for the sub-models will be sufficient (but not
necessary) for this factorisation. The profile likelihood of (θ̂m1 , . . . , θ̂m,i−1 , θmi ) for
the mth sub-model is denoted by Lmi (θmi ). We further suppose that the component
parameter vectors θ1 , . . . , θp are a priori independent so that the prior density also
factorizes as
π(θ) = π1(θ1)π2(θ2) · · · πp(θp).   (5)
It will be convenient to re-express formula (2) in a new notation. For each
sub-model m = 1, . . . , p define a set of abscissae θm [j] , j = 0, . . . , 2km , as follows.
θm[0] = θ̂m
θm[2i − 1] = (θ̂m1, . . . , θ̂m,i−1, θmi+, θ̂m,i+1(θmi+), . . . , θ̂m,km(θmi+))
θm[2i] = (θ̂m1, . . . , θ̂m,i−1, θmi−, θ̂m,i+1(θmi−), . . . , θ̂m,km(θmi−))

for i = 1, . . . , km and define the corresponding set of weights αm[j] by

αm[0] = 1 − km
αm[2i − 1] = αmi+
αm[2i] = αmi−.
Now suppose that some quantity g = g(θ) is the output of the computer model
given the parameter vector θ, possibly dependent on additional data supplied by
the computer user. We derive formulae for E{g(θ)} as follows. Let Em {g(θ)}
denote the posterior expectation of g(θ) obtained from sub-model m alone, with
the parameter vectors θs, s ≠ m, held fixed. Then, from equation (2),

Em{g(θ)} = Σ_{j=0}^{2km} αm[j] g(θ1, . . . , θm−1, θm[j], θm+1, . . . , θp)   (6)

to O(nm^{−2}).
Given the likelihood and prior factorisations (4) and (5), formula (6) gives
the conditional posterior expectation of g(θ) given θs, s ≠ m. Application of the
iterated expectation formula therefore gives the posterior expectation of g to be
E{g(θ)} = Σ_{j1=0}^{2k1} · · · Σ_{jp=0}^{2kp} α1[j1] · · · αp[jp] g(θ1[j1], . . . , θp[jp]).   (7)
Notice that this formula requires (2k1 + 1) × (2k2 + 1) × . . . × (2kp + 1) evaluations
of the function g.
The results derived in the Appendix provide an alternative to (7). Consider
instead the overall k-parameter model. Given the factorisation of the component
likelihood functions, the fitted parameter vector is θ̂ = (θ̂1 , . . . , θ̂p ). From the
results in the Appendix, the abscissae for the signed-root approximations are θ [0] =
θ̂ and, for j = 1, . . . , 2km , m = 1, . . . , p,
θ [2(k1 + · · · + km−1 ) + j] = (θ̂1 , . . . , θ̂m−1 , θm [j] , θ̂m+1 , . . . , θ̂p ) ,
where θm [j] is obtained from the mth likelihood alone. The corresponding weights,
from the Appendix, are α [0] = 1 − k and
α [2(k1 + · · · + km−1 ) + j] = αm [j] ,
where again αm [j] is obtained from the mth likelihood alone. Invoking formula
(2), we obtain the alternative approximation
E{g(θ)} = Σ_{r=0}^{2k} α[r] g(θ[r]),   (8)

which requires only 2k + 1 evaluations of g compared with the Π_m (2km + 1) evaluations for formula (7). Furthermore, although both formulae (7) and (8) are correct to O(nmin^{−2}), where nmin = min(nm, m = 1, . . . , p), formula (8) is likely to be more
accurate in practice, since the errors will be compounded in (7). We note that
the quantities θm [j] and αm [j] depend only on the likelihood and prior from the
mth sub-model and, for each sub-model m, the components θs, s ≠ m, are fixed
at their MLE values in the evaluation of g(θ [r]).
The developers of a computer model, or its component statistical models, would determine the values of θ[r] and α[r], r = 1, . . . , 2k, during the development cycle. In software to calculate estimates of some function g(θ, X), with values of X provided by the user, the chief computational burden at run-time is the 2k + 1 evaluations of g(θ[r], X).
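The contrast between (7) and (8) can be sketched generically. In the helper below, each sub-model stores its abscissae and weights in the indexing of this section; the ±0.1 offsets and the 0.55/0.45 split are placeholder values standing in for the genuine solutions of Ri = ±1, so the numbers are illustrative only. For a g that is additive across sub-models the two formulae can be checked to agree exactly, while the evaluation counts differ as stated above.

```python
from itertools import product

def make_submodel(mle, delta=0.1, a_plus=0.55):
    # Abscissae theta_m[j] and weights alpha_m[j], j = 0, ..., 2*k_m, with
    # theta_m[0] the MLE and alpha_m[0] = 1 - k_m.  The offsets/weights are
    # placeholders for the real signed-root solutions.
    k = len(mle)
    abscissae, weights = [list(mle)], [1.0 - k]
    for i in range(k):
        for sign, a in ((+1, a_plus), (-1, 1.0 - a_plus)):
            point = list(mle)
            point[i] += sign * delta
            abscissae.append(point)
            weights.append(a)
    return abscissae, weights

def formula7(submodels, g):
    # Product-form approximation (7): prod_m (2 k_m + 1) evaluations of g.
    total = 0.0
    for combo in product(*(range(len(absc)) for absc, _ in submodels)):
        w, args = 1.0, []
        for (absc, wts), j in zip(submodels, combo):
            w *= wts[j]
            args.append(absc[j])
        total += w * g(args)
    return total

def formula8(submodels, g):
    # Linear-cost approximation (8): only 2k + 1 evaluations of g, perturbing
    # one sub-model at a time with the others held at their MLEs.
    k = sum(len(absc[0]) for absc, _ in submodels)
    mles = [absc[0] for absc, _ in submodels]
    total = (1.0 - k) * g(mles)
    for m, (absc, wts) in enumerate(submodels):
        for j in range(1, len(absc)):
            args = list(mles)
            args[m] = absc[j]
            total += wts[j] * g(args)
    return total
```

With two sub-models of dimensions k1 = 2 and k2 = 1, formula (7) costs 5 × 3 = 15 evaluations and formula (8) costs 2 × 3 + 1 = 7, matching the counts in the text.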
4 A general posterior expectation decomposition
In this section we note that formula (8) can be decomposed into p approximate
posterior expectations from each of the sub-models. This leads to an interesting
general approximation formula for the posterior expectation of g in terms of the
component posterior expectations.
Given a function g(θ) in the multi-component situation of Section 3, define the
p functions
gm(θm) = g(θ̂1, . . . , θ̂m−1, θm, θ̂m+1, . . . , θ̂p).
Then, from formula (8),
E{g(θ)} = (1 − k)g(θ̂) + Σ_{r=1}^{2k} α[r] g(θ[r])
        = (1 − k)g(θ̂) + Σ_{m=1}^{p} Σ_{j=1}^{2km} αm[j] gm(θm[j])
        = (1 − k)g(θ̂) + Σ_{m=1}^{p} [E{gm(θm)} − (1 − km)g(θ̂)]
        = g(θ̂) + Σ_{m=1}^{p} [E{gm(θm)} − g(θ̂)],   (9)

which expresses the overall posterior expectation of g(θ) as the first-order approximation g(θ̂) plus a sum of correction terms calculated from each sub-model. It is not hard to see that (9) holds to O(nmin^{−1}). What is new here is that it holds to the higher asymptotic order O(nmin^{−2}).
This relationship between posterior expectations and maximum likelihood estimates may find application in calculating expectations across multiple models,
even when the asymptotic methods of Sections 2 and 3 are not employed. For
example, if the sub-models are low dimensional, then exact, or numerically computed, component expectations could be used in (9). In the case of numerical
integration, formula (9) gives a clear computational benefit over p-dimensional
numerical integration. We further note that one or more of the MLEs of θm in
(9) could be replaced by their posterior means if these were more readily available
(e.g. analytically). This can be shown using the fact that, in general, the posterior
mean and the MLE differ by O(n−1 ).
Sweeting and Kharroubi (2003) derive an alternative version of the summation
formula (2) that is of the form
E{g(θ)} = Σ_{i=1}^{k} wi (αi+ gi+ + αi− gi−),

where wi are weights satisfying Σ_i wi = 1 and the gi± values are evaluations of g at
points defined in an alternative way to the θi± in Section 2. It turns out that this
approximation is also O(n−2 ). A discussion of the advantages of this version of the
formula is given in that paper. We simply note here that this alternative version
could be used to approximate the component posterior expectations of gm (θm ) and
then (9) used to approximate the overall posterior expectation of g(θ).
The posterior uncertainty in g(θ) may be measured by its posterior variance,
which again can be obtained from the sub-model variance formulae. Specifically,
from (3) we have, to O(n^{−2}),

Var{g(θ)} = {g′(θ̂)}^T J^{−1} g′(θ̂) = Σ_{m=1}^{p} {gm′(θ̂m)}^T Jm^{−1} gm′(θ̂m) = Σ_{m=1}^{p} Var{gm(θm)},

where gm′(θm) is the column vector of first-order partial derivatives of gm(θm) and
Jm is the observed information associated with sub-model m. The above additive
formula follows since J is block diagonal. Again, any exact or approximate values
for Var{gm (θm )} could be used in this formula.
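Because J is block diagonal, the full-model delta-method variance can be assembled sub-model by sub-model. The small sketch below (with randomly generated, purely illustrative information blocks and gradients, not fitted values) confirms numerically that the full-J computation and the additive form agree:

```python
import numpy as np

rng = np.random.default_rng(0)

def spd(k):
    # random symmetric positive-definite "observed information" block
    a = rng.normal(size=(k, k))
    return a @ a.T + k * np.eye(k)

J1, J2 = spd(2), spd(3)                      # sub-model information matrices
g1, g2 = rng.normal(size=2), rng.normal(size=3)  # sub-model gradient vectors

# full-model computation: grad^T J^{-1} grad with J block diagonal
J = np.block([[J1, np.zeros((2, 3))], [np.zeros((3, 2)), J2]])
grad = np.concatenate([g1, g2])
var_full = grad @ np.linalg.solve(J, grad)

# sub-model-wise computation: sum of per-block delta-method variances
var_sum = g1 @ np.linalg.solve(J1, g1) + g2 @ np.linalg.solve(J2, g2)
```

The two quantities agree to machine precision, which is the computational content of the additive formula above.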
5 Example
We illustrate the use of (8) and (9) with an application in which three statistical
models are used to estimate risk of fatal coronary heart disease.
The UK Prospective Diabetes Study (UKPDS) Group have published an equation for coronary case fatality in type 2 diabetes (Stevens et al. 2004). Clinically,
case fatality is the proportion of cases of a disease that are fatal; for the purposes of
the model, case fatality is defined as the probability that coronary disease is fatal,
conditional on coronary disease occurring. Given an individual with observed value
x = (x1 , . . . , x5 )T of a vector of covariates X = (X1 , . . . , X5 )T , with the convention
that X1 = 1, the UKPDS model for case fatality p(x), given a parameter vector
µ = (µ1 , . . . , µ5 ), is
p(x) = [1 + exp{−x^T µ}]^{−1}.   (10)
The MLE of µ is µ̂ = (0.713, 0.048, 0.178, 0.141, 0.104) (with corresponding standard errors 0.194, 0.0236, 0.0119, 0.0612 and 0.0417), derived by fitting a logistic
regression model to data from 597 people with coronary disease as described in
detail previously (Stevens et al. 2004).
The UKPDS Group have also published an equation for risk of coronary disease
in type 2 diabetes (UKPDS Group 2001). Given an individual with observed value
y = (y1 , . . . , y8 )T of a vector of covariates Y = (Y1 , . . . , Y8 )T , with the convention
that Y1 = 1, the modelled risk of coronary disease over t years depends on a
parameter vector ν = (ν1 , . . . , ν9 ) as follows:
R(y, t) = 1 − exp{−q(1 − ν9^t)/(1 − ν9)},   (11)

where q = exp{(ν1, . . . , ν8)y}.
The MLE of ν is ν̂ = (0.0112, 1.059, 0.525, 0.390, 1.350, 1.183, 1.088, 3.845, 1.078)
(with corresponding standard errors 0.00154, 0.0155, 0.00666, 0.0511, 0.103, 0.124, 0.0365, 0.0250 and 0.638), derived by fitting the model to 4,540 people with a median 10.7 years of follow-up, as described previously (UKPDS Group 2001). There
were 597 incident cases of coronary disease in this cohort, and these are the same
597 that were used to fit the case fatality equation (10).
Variable X3 in the case fatality model, and variable Y5 in the coronary risk
model, take the same value. Glycated haemoglobin (abbreviated HbA1c) is a
widely-used indicator of prevailing blood glucose levels over recent months. Measurements of HbA1c are subject to variation between laboratories. The models
define X3 and Y5 to be HbA1c as measured on an assay aligned to the reference
laboratories of the National Glycosylation Standardisation Program (NGSP). The
NGSP publishes linear regression models
X3 = λ1 + λ2 h + z,   z ∼ N(0, λ3²),   (12)

relating X3, HbA1c measured in an NGSP reference laboratory, to h, the HbA1c
measured in another laboratory. Let λ = (λ1 , λ2 , λ3 ). For a particular local laboratory the MLE of λ was λ̂ = (0.229, 0.982, 0.124) (with corresponding standard
errors 0.079, 0.0096 and 0.014), so that an HbA1c of 4.5% in the local laboratory
corresponds to a MLE of 4.65% at the reference laboratory.
Although the risk model and case fatality model were estimated from data
from the same study, they still satisfy the factorisation criterion (4). Let F denote
the set of individuals with fatal coronary disease, N denote the set of individuals
with nonfatal coronary disease, and C denote the set of individuals censored for
coronary disease within the UKPDS. Further, write
ri(ν) = P(disease in individual i),
pi(µ) = P(fatal disease in individual i | disease in individual i).

Then the likelihood for the case fatality model is L1(µ) = Π_{i∈F} pi(µ) Π_{i∈N} {1 − pi(µ)} and the likelihood for the risk model is L2(ν) = Π_{i∈C} {1 − ri(ν)} Π_{i∈F,N} ri(ν). The combined likelihood function that would be required to estimate µ and ν jointly is

L(µ, ν) = Π_{i∈C} {1 − ri(ν)} Π_{i∈F} pi(µ) ri(ν) Π_{i∈N} {1 − pi(µ)} ri(ν),
which equals L1 (µ)L2 (ν) as required in (4). The extension of this to include the
independent data on which λ is estimated is trivial.
Consider a hypothetical patient with HbA1c of 4.5% in the local laboratory,
Xi = 0 for all i > 1, i ≠ 3, and Yi = 0 for all i > 1, i ≠ 5. For simplicity,
we consider here the case that t = 1. Using λ̂, µ̂ and ν̂ the MLE of one-year
coronary risk is 0.0078707 and the MLE of one-year case fatality is 0.23752, giving
an estimated risk of fatal coronary disease 0.0078707 × 0.23752 = 0.001869. We
wish to calculate the posterior expectation of the risk of fatal coronary disease
over λ, µ, ν under the independent prior specifications πλ (λ) ∝ 1/λ2 and πµ (µ) =
πν (ν) ∝ 1. Taking p = 3, with θ1 = λ, θ2 = µ and θ3 = ν, we can apply the
approximations of Section 3. To three significant figures, the approximate posterior
expectation by the full method (7) is 0.001903 and requires 1,463 evaluations. The
approximation (8) to the posterior expectation is also 0.001903 and requires only
35 evaluations. For comparison, the exact answer by simulation is 0.001907.
To six significant figures the MLE of fatal coronary risk is 0.00186948. Fixing
λ = λ̂, ν = ν̂, the expectation over µ calculated by (8) is 0.00187545. Fixing
λ = λ̂, µ = µ̂, the expectation over ν by (8) is 0.00189733. Fixing µ = µ̂, ν = ν̂,
the expectation over λ by numerical integration is 0.00186960. Hence, using the
decomposition formula (9), the expectation over λ, µ, ν is approximately −2 ×
0.00186948 + 0.00189733 + 0.00187545 + 0.00186960 = 0.001903.
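The arithmetic of this example can be retraced in a few lines; the script below is ours (not part of the original software) and simply uses the MLEs and component expectations quoted above.

```python
# Laboratory calibration (12): local HbA1c of 4.5% mapped to the reference scale.
lam_hat = (0.229, 0.982, 0.124)
hba1c_ref = lam_hat[0] + lam_hat[1] * 4.5      # about 4.65%

# Plug-in (MLE) estimate of one-year fatal coronary risk.
risk_mle, case_fatality_mle = 0.0078707, 0.23752
fatal_risk_mle = risk_mle * case_fatality_mle  # about 0.001869

def formula9(g_mle, component_expectations):
    # Decomposition (9): g(theta-hat) plus one correction term per sub-model.
    return g_mle + sum(e - g_mle for e in component_expectations)

# Component expectations over mu, nu and lambda quoted in the text.
posterior_mean = formula9(0.00186948, [0.00187545, 0.00189733, 0.00186960])
```

The combined value agrees with the 0.001903 obtained from formulae (7) and (8), and is close to the simulated answer 0.001907.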
Table 1 compares the MLE and approximations (8) and (9) to the posterior
expectation, calculated by simulation, for 25 individuals selected at random from
the UKPDS. The higher-order approximations to the expectation show a smaller
mean absolute error than the MLE. Compared to the variation between patients,
however, the error in the MLE is small, due to the large sample size on which
the risk and case fatality models were estimated. To illustrate the performance of
the approximations in models derived from a smaller sample, we refit the risk and
case fatality models using a 15% random subsample of the UKPDS. Table 2 shows
simulation results and asymptotic approximations for this analysis.
6 Discussion
We have presented formulae for fast estimates of posterior quantities with high
asymptotic accuracy. In our example the good asymptotic properties of the formulae resulted in substantially reduced error relative to the posterior standard
deviation. Although the formulae are for Bayesian estimation, Stevens (2003)
showed that they can also be useful in other contexts.
The example in Section 5 is one of many situations in which one model is used
to estimate the input to another. Another is a class of methods for correcting
regression dilution that consist of estimating a scaling factor from a repeated measures model, which is then used to modify the effect parameter, or equivalently
the input variable, preparatory to calculating a prediction from a regression model
(Frost and Thompson 2000). The methods of Sections 3 and 4 could be used to
calculate expectations across both the repeated measures model and the regression
model.
Our examples have emphasised the use of the results of Section 3 in combining
multi-component models. They may also have a use in reducing the computational
burden in fitting a single model with very many parameters. The cost of calculating maximum likelihood estimates and profile likelihood functions grows roughly with the square of the dimension of the parameter vector. If a problem with many parameters can be factorized in the manner of Section 3 then the
weights and abscissae for these approximations can be calculated on p problems of
lower dimension.
The factorisation criteria (4) and (5) will be met whenever the p models are estimated on p independent data sets, but the example shows that factorisation can
also arise from conditionality as well as independence. It is possible to relax the
factorisation (5) of the prior density (although not the factorisation (4) of the likelihood). Let π1 (θ1 ), . . . , πp (θp ) be the marginal prior densities, or some other densities chosen for convenience, of θ1 , . . . , θp and define φ(θ) = π(θ)/{π1 (θ1 ) · · · πp (θp )}.
Then φ(θ) may be absorbed into the function g(θ) before computing the expectation (8). Since φ(θ) is O(1), the resulting approximation will continue to be of the
same order of accuracy; details are not given here.
This paper was motivated by the “UKPDS Risk Engine” software project,
which provides clinicians with estimates of coronary and stroke risk in people with
diabetes. The methods of this paper made a substantial contribution to the successful drive to make the Risk Engine run acceptably fast on a Palm computer.
They should also find application in the many other models, in diabetes and elsewhere, that use multiple sub-models.
Appendix: Derivation of weights and abscissae
We obtain the abscissae and weights used in formula (8) for the overall multi-component model when the likelihood function and prior density factorise as in
(4) and (5).
Defining lm (θm ) = log Lm (θm ) to be the log-likelihood for the mth model, we
observe that
l(θ1 , . . . , θk ) = l1 (θ1 ) + · · · + lp (θp )
so that the maximiser of l is θ̂ = (θ̂1 , . . . , θ̂p ), where θ̂m is the maximiser of lm (θm ).
For a given m and 1 ≤ i ≤ km , as in Section 2 we let θ̂mj (θmi ), j = i+1, . . . , km , and
θ̂rj (θmi ), j = 1, . . . , kr , r = m + 1, . . . , p, denote the values of θm,i+1 , . . . , θpkp that
maximise l(θ) conditional on (θ̂1 , . . . , θ̂m−1 , θ̂m1 , . . . , θ̂m,i−1 , θmi ). But it follows from
the factorisation (4) of L that θ̂mj (θmi ), j > i does not depend on (θ̂1 , . . . , θ̂m−1 )
and that θ̂rj (θmi ) = θ̂rj for r > m. Therefore the profile log-likelihood associated
with the (k1 + · · · + km−1 + i)th component θmi of θ, i = 1, . . . , km , m = 1, . . . , p,
can be written as
l1 (θ̂1 ) + · · · + lm−1 (θ̂m−1 ) + lmi (θmi ) + lm+1 (θ̂m+1 ) + · · · + lp (θ̂p )
in which lmi (θmi ) is the logarithm of the profile likelihood Lmi (θmi ) defined in Section 3. It follows that the (m, i)th signed-root transformation is simply calculated
from the mth log-likelihood function as
Rmi(θmi) = sign(θmi − θ̂mi)[2{lm(θ̂m) − lmi(θmi)}]^{1/2},

and hence that θmi+ and θmi− are obtained from the mth likelihood alone.
Next consider the weights in the overall model. In view of the factorisation of
the likelihood, it is readily seen that the log-likelihood derivative in the definition
of τmi+ associated with the (m, i)th component of θ is simply

∂lm(θ̂m1, . . . , θ̂m,i−1, θmi+, θ̂m,i+1(θmi+), . . . , θ̂mkm(θmi+))/∂θmi.
Similarly, the matrix jr+1 in the combined model is block diagonal, so that when
r = 2(k1 + · · · + km−1 ) + i its determinant is
|jmi (θ̂mi )||Jm+1 | · · · |Jp | ,
where Jm is the observed information matrix associated with model m. It follows that the (m, i)th component of τ+ for the combined model is

Π_{s≠m} πs(θ̂s) Π_{s>m} |Js|^{1/2} τmi+,

where τmi+ is given by equation (1) applied to model m. Since we get a similar expression for the (m, i)th component of τ−, the common factors in these components of τ+ and τ− cancel when we form the (m, i)th components of α+ and α−. It follows that the (m, i)th components of α+ and α− are just αmi+ and αmi− obtained from the mth model.
Acknowledgements
This paper arises from work carried out when RJS was a Wellcome Trust research
fellow at the Diabetes Trials Unit, Oxford. We are grateful to the UK Prospective Diabetes Study group, and to the laboratory of the Diabetes Trials Unit for
permission to use the example.
References
Brown J.B. 2000. Computer models of diabetes: almost ready for prime time,
Diabetes Research and Clinical Practice 50(Supplement 3):S1–3.
Clarke P.M., Gray A.M., Briggs A., Farmer A.J., Fenn P. et al. 2004. A model
to estimate the lifetime health outcomes of patients with Type 2 diabetes:
the United Kingdom Prospective Diabetes Study (UKPDS) Outcomes Model
(UKPDS no. 68) Diabetologia 47: 1747–59.
Eastman R.C., Javitt J.C., Herman W.H., Dasbach E.J., Zbrozek A.S. et al. 1997.
Model of complications of NIDDM. I. Model construction and assumptions,
Diabetes Care 20: 725–34.
Eddy D.M. and Schlessinger L. 2003. Archimedes: A trial-validated model of
diabetes. Diabetes Care 26: 3093–3101.
Frost C. and Thompson S. 2000. Correcting for regression dilution bias: comparison of methods for a single predictor variable, Journal of the Royal Statistical
Society series A 163: 173–190.
Hoeffding W. 1982. Asymptotic normality. In: Kotz S., Johnson N.L. and Read
C.B. (Eds.), Encyclopedia of Statistical Sciences. Wiley.
Stevens R.J. 2003. Evaluation of methods for interval estimation of model outputs, with application to survival models. Journal of Applied Statistics 30:
967–981.
Stevens R.J., Coleman R.L., et al. 2004. Risk factors for myocardial infarction
case fatality and stroke case fatality in type 2 diabetes (UKPDS 66). Diabetes
Care 27: 201–207.
Sweeting T.J. 1996. Approximate Bayesian computation based on signed roots of
log-density ratios (with Discussion). In: Bernardo J.M., Berger J.O., Dawid
A.P., and Smith A.F.M. (Eds.), Bayesian Statistics 5. Oxford University
Press.
Sweeting T. J. and Kharroubi S. A. 2003. Some new formulae for posterior
expectations and Bartlett corrections. Test 12: 497–521.
Tierney L. and Kadane J. B. 1986. Accurate approximations for posterior moments and marginal densities. Journal of the American Statistical Association 81: 82–86.
UKPDS Group. 2001. The UKPDS Risk Engine: a model for the risk of coronary
heart disease in type 2 diabetes (UKPDS 56). Clinical Science 101: 671–679.
Table 1. Modelled one-year risk of fatal coronary disease in 25 randomly selected individuals: exact posterior expectation calculated by simulation, compared to the maximum likelihood estimate (MLE) and the expectation estimated by formulae (8) and (9) as described in the text.

Patient | Posterior expectation (Monte-Carlo standard error) | Posterior standard deviation | MLE | Formula (8) | Formula (9)
a | 0.000913 (2.0 × 10^−7) | 0.000329 | 0.000836 | 0.000912 | 0.000912
b | 0.000272 (5.6 × 10^−8) | 0.000098 | 0.000248 | 0.000271 | 0.000271
c | 0.000483 (1.0 × 10^−7) | 0.000163 | 0.000461 | 0.000483 | 0.000483
d | 0.004536 (6.1 × 10^−7) | 0.000963 | 0.004405 | 0.004534 | 0.004534
e | 0.002254 (4.1 × 10^−7) | 0.000646 | 0.002171 | 0.002253 | 0.002253
f | 0.001034 (2.1 × 10^−7) | 0.000341 | 0.000990 | 0.001033 | 0.001033
g | 0.000064 (1.6 × 10^−8) | 0.000028 | 0.000061 | 0.000064 | 0.000064
h | 0.005467 (6.7 × 10^−7) | 0.001047 | 0.005384 | 0.005468 | 0.005468
i | 0.002598 (3.5 × 10^−7) | 0.000553 | 0.002550 | 0.002598 | 0.002598
j | 0.001944 (2.7 × 10^−7) | 0.000432 | 0.001923 | 0.001944 | 0.001944
k | 0.002100 (2.8 × 10^−7) | 0.000443 | 0.002087 | 0.002101 | 0.002101
l | 0.002884 (4.3 × 10^−7) | 0.000694 | 0.002821 | 0.002884 | 0.002884
m | 0.002335 (3.1 × 10^−7) | 0.000481 | 0.002289 | 0.002335 | 0.002335
n | 0.004851 (6.2 × 10^−7) | 0.000977 | 0.004802 | 0.004852 | 0.004852
o | 0.009124 (1.2 × 10^−6) | 0.001811 | 0.008968 | 0.009125 | 0.009125
p | 0.003509 (5.7 × 10^−7) | 0.000868 | 0.003380 | 0.003507 | 0.003507
q | 0.003081 (4.1 × 10^−7) | 0.000633 | 0.003032 | 0.003081 | 0.003081
r | 0.004077 (6.6 × 10^−7) | 0.001037 | 0.003922 | 0.004074 | 0.004074
s | 0.004236 (5.7 × 10^−7) | 0.000886 | 0.004148 | 0.004236 | 0.004236
t | 0.001626 (3.2 × 10^−7) | 0.000516 | 0.001583 | 0.001626 | 0.001626
u | 0.003069 (4.1 × 10^−7) | 0.000655 | 0.003063 | 0.003070 | 0.003070
v | 0.002285 (3.2 × 10^−7) | 0.000513 | 0.002245 | 0.002286 | 0.002286
w | 0.011519 (1.4 × 10^−6) | 0.002226 | 0.011322 | 0.011520 | 0.011520
x | 0.004730 (6.5 × 10^−7) | 0.001031 | 0.004621 | 0.004729 | 0.004729
y | 0.007197 (9.3 × 10^−7) | 0.001404 | 0.007024 | 0.007196 | 0.007196
Mean error¹ | | | 11% | 0.15% | 0.23%

¹ For each row, we calculated the absolute value of the difference between each estimate and the exact value by simulation, and divided by the posterior standard deviation. ‘Mean error’ in the last row denotes the average across all 25 rows.
Table 2. One-year risk of fatal coronary disease according to models derived from a subsample of the UKPDS, as described in the text. MLE and mean error defined as in Table 1.

Patient | Posterior expectation (Monte-Carlo standard error) | Posterior standard deviation | MLE | Formula (8) | Formula (9)
a | 0.002967 (1.8 × 10^−5) | 0.002094 | 0.002033 | 0.002875 | 0.002875
b | 0.000380 (3.0 × 10^−6) | 0.000297 | 0.000243 | 0.000358 | 0.000358
c | 0.000381 (3.1 × 10^−6) | 0.000286 | 0.000282 | 0.000367 | 0.000367
d | 0.006667 (3.0 × 10^−5) | 0.003207 | 0.005474 | 0.006545 | 0.006545
e | 0.002049 (1.5 × 10^−5) | 0.001261 | 0.001586 | 0.001987 | 0.001987
f | 0.000761 (7.2 × 10^−6) | 0.000537 | 0.000574 | 0.000735 | 0.000735
g | 0.000043 (4.1 × 10^−7) | 0.000044 | 0.000033 | 0.000043 | 0.000043
h | 0.005730 (3.2 × 10^−5) | 0.002462 | 0.005183 | 0.005610 | 0.005610
i | 0.003703 (1.7 × 10^−5) | 0.001864 | 0.003267 | 0.003632 | 0.003632
j | 0.001688 (1.1 × 10^−5) | 0.000842 | 0.001563 | 0.001641 | 0.001641
k | 0.002185 (1.2 × 10^−5) | 0.001008 | 0.002094 | 0.002143 | 0.002143
l | 0.001783 (1.6 × 10^−5) | 0.001019 | 0.001500 | 0.001730 | 0.001730
m | 0.002487 (1.2 × 10^−5) | 0.001178 | 0.002194 | 0.002432 | 0.002432
n | 0.005609 (3.1 × 10^−5) | 0.002614 | 0.005278 | 0.005504 | 0.005504
o | 0.012677 (6.6 × 10^−5) | 0.005601 | 0.011483 | 0.012523 | 0.012523
p | 0.005380 (2.6 × 10^−5) | 0.002828 | 0.004255 | 0.005274 | 0.005274
q | 0.003956 (1.9 × 10^−5) | 0.001912 | 0.003599 | 0.003889 | 0.003889
r | 0.006368 (3.2 × 10^−5) | 0.003499 | 0.004985 | 0.006244 | 0.006244
s | 0.005246 (2.7 × 10^−5) | 0.002557 | 0.004608 | 0.005153 | 0.005153
t | 0.002094 (1.4 × 10^−5) | 0.001366 | 0.001839 | 0.002100 | 0.002100
u | 0.004599 (2.1 × 10^−5) | 0.002120 | 0.004624 | 0.004565 | 0.004565
v | 0.001714 (1.2 × 10^−5) | 0.000864 | 0.001503 | 0.001662 | 0.001662
w | 0.016274 (7.9 × 10^−5) | 0.006945 | 0.014721 | 0.016133 | 0.016133
x | 0.004516 (2.8 × 10^−5) | 0.002237 | 0.003794 | 0.004382 | 0.004382
y | 0.012669 (4.9 × 10^−5) | 0.005190 | 0.011112 | 0.012589 | 0.012589
Mean error | | | 27% | 3.9% | 3.9%