Estimation across multiple models with application to Bayesian computing and software development

Richard J Stevens (Diabetes Trials Unit, University of Oxford)
Trevor J Sweeting (Department of Statistical Science, University College London)

Abstract

Statistical models are sometimes incorporated into computer software for making predictions about future observations. When the computer model consists of a single statistical model this corresponds to estimation of a function of the model parameters. This paper is concerned with the case that the computer model implements multiple, individually-estimated statistical sub-models. This case frequently arises, for example, in models for medical decision making that derive parameter information from multiple clinical studies. We develop a method for calculating the posterior mean of a function of the parameter vectors of multiple statistical models that is easy to implement in computer software, has high asymptotic accuracy, and has a computational cost linear in the total number of model parameters. The formula is then used to derive a general result about posterior estimation across multiple models. The utility of the results is illustrated by application to clinical software that estimates the risk of fatal coronary disease in people with diabetes.

Key words: Bayesian inference; asymptotic theory; computer models.

Research Report No.274, Department of Statistical Science, University College London. Date: January 2007

1 Introduction

It frequently occurs that a computer model estimates a single output from multiple, individually-estimated statistical models. For example, a computer model for risk of complications of diabetes (Eastman et al. 1997) combines a survival model for heart disease estimated from the Framingham Heart Study with a Markov model for diseases of the eye estimated from the Wisconsin Study of Diabetic Retinopathy and several more models.
This approach is typical of many models developed since, in diabetes and in other fields (Brown 2000). A typical output to be estimated might be life expectancy or quality-adjusted life expectancy. Calculations are often computationally demanding, and a common practice at present is to restrict attention to the maximum likelihood estimate (MLE). When uncertainty across multiple models is addressed, it is usually by Monte Carlo methods, which increase the computational burden substantially; so much so that many published software models take many hours to estimate uncertainty (e.g. Clarke et al. 2004), or decline to address uncertainty at all (e.g. Eddy and Schlessinger 2003).

In this paper we provide a framework for calculations across multiple statistical models, in which the computational burden at run-time increases only linearly with the total number of model parameters. Our approach uses an asymptotic approximation formula for posterior moments obtained by Sweeting (1996), which we will refer to here as the summation formula. This formula appears to have no equivalent outside the Bayesian paradigm and for that reason the remainder of this paper is restricted to Bayesian estimation via posterior expectations. That the summation formula can be useful in practice has been shown previously (Stevens 2003). Although other asymptotic methods exist for the single-model situation (see, for example, Hoeffding 1982, Tierney and Kadane 1986), we show here that the summation formula has properties relevant to the situation in which a computer model uses two or more statistical models to obtain output. The summation formula also has properties which allow most of the computational burden to be absorbed at development time, with rapid calculations at run-time.

The summation formula is given, without proof, in Section 2.
In Section 3 we prove our main result by considering the case of a single model whose likelihood, for a parameter vector $\theta$, factorizes into $p$ likelihood functions for subvectors $\theta_1, \ldots, \theta_p$. This, and the more general corollary derived in Section 4, can be applied to the case of a single computer model whose output depends on $p$ independently estimated statistical models. In Section 5 we give as an example computer software that combines three statistical models to estimate risk of fatal coronary heart disease. Section 6 gives a general discussion.

2 Formulae for posterior expectations based on signed-root transformations

In this section we review the summation formula in Sweeting (1996), which provides asymptotic approximations to posterior expectations. Suppose that a model is represented by a likelihood function $L(\theta) \equiv L_n(\theta; x)$ on a $k$-dimensional parameter vector $\theta = (\theta_1, \ldots, \theta_k)$, based on a data matrix $x$ with $n$ rows, and a prior density $\pi(\theta)$. Let $l(\theta) = \log L(\theta)$ be the log-likelihood and $\hat\theta = (\hat\theta_1, \ldots, \hat\theta_k)$ be the MLE of $\theta$. Also let $J$ be the observed information matrix; that is, $J$ is the matrix of second-order partial derivatives of $-l(\theta)$ evaluated at $\theta = \hat\theta$.

For each $i = 1, \ldots, k$, let $L_i(\theta_i)$ denote the profile likelihood of $(\hat\theta_1, \ldots, \hat\theta_{i-1}, \theta_i)$. That is, $L_i(\theta_i)$ is the maximum value of $L$ achievable, for a given $\theta_i$, by fixing $\theta_1, \ldots, \theta_{i-1}$ at their MLEs and maximising over all possible values of $\theta_{i+1}, \ldots, \theta_k$. Define $\hat\theta_{i+1}(\theta_i), \hat\theta_{i+2}(\theta_i), \ldots, \hat\theta_k(\theta_i)$ to be the values of $\theta_{i+1}, \theta_{i+2}, \ldots, \theta_k$ at which this conditional maximum is achieved. Finally, define the profile log-likelihoods $l_i(\theta_i) = \log L_i(\theta_i)$.

We can now define a transformation $R_i$ of each $\theta_i$, $i = 1, \ldots, k$, by
\[ R_i(\theta_i) = \mathrm{sign}(\theta_i - \hat\theta_i)\,[2\{l(\hat\theta) - l_i(\theta_i)\}]^{1/2} . \]
Each $R_i$ is a signed-root transformation of $\theta_i$; in full, it is a signed-root profile log-likelihood ratio.
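Note that $\sum_i \{R_i(\theta_i)\}^2 = 2\{l(\hat\theta) - l(\theta)\}$, the usual log-likelihood ratio statistic. For each $i$, define $\theta_i^+$ and $\theta_i^-$ to be the solutions to the equations $R_i(\theta_i) = 1$ and $R_i(\theta_i) = -1$ respectively. Sweeting (1996) uses these values as the basis for an approximation to posterior expectations.

For $i = 1, \ldots, k-1$, define $j_{i+1}(\theta_i)$ to be the matrix
\[
\begin{pmatrix}
-\dfrac{\partial^2 l}{\partial\theta_{i+1}^2} & \cdots & -\dfrac{\partial^2 l}{\partial\theta_{i+1}\,\partial\theta_k} \\
\vdots & \ddots & \vdots \\
-\dfrac{\partial^2 l}{\partial\theta_k\,\partial\theta_{i+1}} & \cdots & -\dfrac{\partial^2 l}{\partial\theta_k^2}
\end{pmatrix}
\]
evaluated at $(\hat\theta_1, \ldots, \hat\theta_{i-1}, \theta_i, \hat\theta_{i+1}(\theta_i), \ldots, \hat\theta_k(\theta_i))$, with $j_{k+1}(\theta_k)$ defined to be 1 and $j_1(\hat\theta) = J$. Finally define
\[ \pi_i(\theta_i) = \pi(\hat\theta_1, \ldots, \hat\theta_{i-1}, \theta_i, \hat\theta_{i+1}(\theta_i), \ldots, \hat\theta_k(\theta_i)) . \]

Now define
\[
\begin{aligned}
\tau_i^+ &= \pi_i(\theta_i^+)\,|j_{i+1}(\theta_i^+)|^{-1/2} \left\{ \frac{\partial}{\partial\theta_i}\, l(\hat\theta_1, \ldots, \hat\theta_{i-1}, \theta_i^+, \hat\theta_{i+1}(\theta_i^+), \ldots, \hat\theta_k(\theta_i^+)) \right\}^{-1} \\
\tau_i^- &= \pi_i(\theta_i^-)\,|j_{i+1}(\theta_i^-)|^{-1/2} \left\{ \frac{\partial}{\partial\theta_i}\, l(\hat\theta_1, \ldots, \hat\theta_{i-1}, \theta_i^-, \hat\theta_{i+1}(\theta_i^-), \ldots, \hat\theta_k(\theta_i^-)) \right\}^{-1}
\end{aligned}
\tag{1}
\]
and $\tau_i = \tau_i^- + \tau_i^+$. Finally, let $\alpha_i^+ = \tau_i^+/\tau_i$ and $\alpha_i^- = \tau_i^-/\tau_i$. Then Sweeting's (1996) approximation to the posterior expectation $E\{g(\theta)\}$ of a 'smooth' function $g(\theta)$ is
\[ E\{g(\theta)\} = g(\hat\theta) + \sum_{i=1}^{k} \{\alpha_i^+ g_i^+ + \alpha_i^- g_i^- - g(\hat\theta)\} \tag{2} \]
in which
\[ g_i^+ = g(\hat\theta_1, \ldots, \hat\theta_{i-1}, \theta_i^+, \hat\theta_{i+1}(\theta_i^+), \ldots, \hat\theta_k(\theta_i^+)) \]
and
\[ g_i^- = g(\hat\theta_1, \ldots, \hat\theta_{i-1}, \theta_i^-, \hat\theta_{i+1}(\theta_i^-), \ldots, \hat\theta_k(\theta_i^-)) . \]

The error term in (2) turns out to be $O(n^{-2})$ (Sweeting, 1996). This may be compared with the error of the plug-in approximation $g(\hat\theta)$, which is $O(n^{-1})$. The above result holds whenever the prior density $\pi(\theta)$ is $O(1)$ with respect to $n$. We emphasise that, despite the complexity of the derivations, the implementation of these methods is relatively straightforward.

The presentation here is restricted to signed-root log-likelihood ratios. Sweeting (1996) embeds these in the more general framework of signed-root log-density ratios.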
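As a rough illustration of how the abscissae and formula (2) might be computed in the simplest case $k = 1$, the following Python sketch inverts the signed root by bisection and then applies the summation formula. The toy model (normal observations with known standard deviation and a flat prior), the function names and the bracket width are all assumptions made for illustration; they are not taken from the paper. The weights are set to 1/2 each, which is assumed here on the grounds that the toy likelihood and prior are symmetric about the MLE.

```python
import math

def signed_root(loglik, theta, theta_hat):
    """Signed-root log-likelihood ratio R(theta) for a one-parameter model."""
    drop = 2.0 * (loglik(theta_hat) - loglik(theta))
    return math.copysign(math.sqrt(max(drop, 0.0)), theta - theta_hat)

def solve_abscissa(loglik, theta_hat, target, width=10.0, tol=1e-12):
    """Bisection for R(theta) = target (+1 or -1).

    Assumes R is increasing in theta and that the solution lies within
    `width` of the MLE (a hypothetical default, not from the paper).
    """
    if target > 0:
        lo, hi = theta_hat, theta_hat + width
    else:
        lo, hi = theta_hat - width, theta_hat
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if signed_root(loglik, mid, theta_hat) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def summation_formula_1d(g, theta_hat, theta_plus, theta_minus,
                         alpha_plus, alpha_minus):
    """Formula (2) with k = 1."""
    g_hat = g(theta_hat)
    return g_hat + (alpha_plus * g(theta_plus)
                    + alpha_minus * g(theta_minus) - g_hat)

# Toy model (assumption): n normal observations, known sd, flat prior.
n, sd, xbar = 25, 2.0, 1.5

def loglik(theta):
    return -0.5 * n * (theta - xbar) ** 2 / sd ** 2

theta_plus = solve_abscissa(loglik, xbar, +1.0)
theta_minus = solve_abscissa(loglik, xbar, -1.0)

# Symmetric toy problem, so equal weights are assumed.
approx = summation_formula_1d(lambda th: th ** 2, xbar,
                              theta_plus, theta_minus, 0.5, 0.5)
exact = xbar ** 2 + sd ** 2 / n  # E(theta^2) under the N(xbar, sd^2/n) posterior
```

In this toy case the abscissae fall one posterior standard deviation either side of the MLE, and the approximation to $E(\theta^2)$ agrees with the exact value; in realistic models the weights would be computed from (1) rather than assumed.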
Sweeting and Kharroubi (2003) obtain alternative versions of formula (2) in which the abscissae $\theta_i^+$, $\theta_i^-$ are placed symmetrically either side of $\hat\theta_i$ at a distance $k_i^{-1/2}$, where $k_i$ is the reciprocal of the first entry in $\{j_i(\hat\theta_{i-1})\}^{-1}$. It is shown that these formulae are also correct to $O(n^{-2})$. A disadvantage of these versions is that they are not invariant to reparameterisation, so some care needs to be taken when choosing the parameterisation. On the other hand, they may be easier to implement as no inversion of the signed-root log-likelihood ratio is required for the computation of $\theta_i^+$ and $\theta_i^-$.

It is possible to obtain an approximation for the posterior variance of $g(\theta)$ by applying formula (2) to both $g(\theta)$ and $\{g(\theta)\}^2$. However, in terms of asymptotic error of approximation, no gain is made over the simpler approximation that uses the observed information matrix along with the delta method,
\[ \mathrm{Var}\{g(\theta)\} = \{g'(\hat\theta)\}^T J^{-1} g'(\hat\theta) , \tag{3} \]
since this is already an $O(n^{-2})$ approximation. Here $g'(\theta)$ is the column vector of first-order partial derivatives of $g(\theta)$. If further a normal approximation to the posterior distribution of $g(\theta)$ is appropriate, then clearly interval estimates for $g(\theta)$ can be obtained from (2) and (3). In general such (two-sided) intervals will only be accurate to $O(n^{-1})$. Higher-order accuracy may be achieved by using the distributional approximation formulae in Sweeting (1996), for example.

3 Application to multi-component models

In this section we consider the case of a single computer model that incorporates $p$ statistical sub-models, possibly derived from independent data sets, to calculate an output. The parameter vector of the combined model is $\theta = (\theta_1, \ldots, \theta_p)$, where $\theta_m = (\theta_{m1}, \ldots, \theta_{mk_m})$ is the vector of parameters associated with the $m$th sub-model, $m = 1, \ldots, p$. The dimension of the overall parameter space is therefore $k = k_1 + \cdots + k_p$. Let the data matrix associated with sub-model $m$ have $n_m$ rows.
We assume that the likelihood factorizes as
\[ L(\theta) = L_1(\theta_1) L_2(\theta_2) \cdots L_p(\theta_p) , \tag{4} \]
where now $L_m(\theta_m)$ denotes the likelihood function associated with the $m$th sub-model. Independent data matrices for the sub-models will be sufficient (but not necessary) for this factorisation. The profile likelihood of $(\hat\theta_{m1}, \ldots, \hat\theta_{m,i-1}, \theta_{mi})$ for the $m$th sub-model is denoted by $L_{mi}(\theta_{mi})$. We further suppose that the component parameter vectors $\theta_1, \ldots, \theta_p$ are a priori independent so that the prior density also factorizes as
\[ \pi(\theta) = \pi_1(\theta_1) \pi_2(\theta_2) \cdots \pi_p(\theta_p) . \tag{5} \]

It will be convenient to re-express formula (2) in a new notation. For each sub-model $m = 1, \ldots, p$ define a set of abscissae $\theta_m[j]$, $j = 0, \ldots, 2k_m$, as follows.
\[
\begin{aligned}
\theta_m[0] &= \hat\theta_m \\
\theta_m[2i-1] &= (\hat\theta_{m1}, \ldots, \hat\theta_{m,i-1}, \theta_{mi}^+, \hat\theta_{m,i+1}(\theta_{mi}^+), \ldots, \hat\theta_{m,k_m}(\theta_{mi}^+)) \\
\theta_m[2i] &= (\hat\theta_{m1}, \ldots, \hat\theta_{m,i-1}, \theta_{mi}^-, \hat\theta_{m,i+1}(\theta_{mi}^-), \ldots, \hat\theta_{m,k_m}(\theta_{mi}^-))
\end{aligned}
\]
for $i = 1, \ldots, k_m$ and define the corresponding set of weights $\alpha_m[j]$ by
\[
\begin{aligned}
\alpha_m[0] &= 1 - k_m \\
\alpha_m[2i-1] &= \alpha_{mi}^+ \\
\alpha_m[2i] &= \alpha_{mi}^- .
\end{aligned}
\]

Now suppose that some quantity $g = g(\theta)$ is the output of the computer model given the parameter vector $\theta$, possibly dependent on additional data supplied by the computer user. We derive formulae for $E\{g(\theta)\}$ as follows. Let $E_m\{g(\theta)\}$ denote the posterior expectation of $g(\theta)$ obtained from sub-model $m$ alone, with the parameter vectors $\theta_s$, $s \neq m$, held fixed. Then, from equation (2),
\[ E_m\{g(\theta)\} = \sum_{j=0}^{2k_m} \alpha_m[j]\, g(\theta_1, \ldots, \theta_{m-1}, \theta_m[j], \theta_{m+1}, \ldots, \theta_p) \tag{6} \]
to $O(n_m^{-2})$. Given the likelihood and prior factorisations (4) and (5), formula (6) gives the conditional posterior expectation of $g(\theta)$ given $\theta_s$, $s \neq m$. Application of the iterated expectation formula therefore gives the posterior expectation of $g$ to be
\[ E\{g(\theta)\} = \sum_{j_1=0}^{2k_1} \cdots \sum_{j_p=0}^{2k_p} \alpha_1[j_1] \cdots \alpha_p[j_p]\, g(\theta_1[j_1], \ldots, \theta_p[j_p]) . \tag{7} \]
Notice that this formula requires $(2k_1 + 1) \times (2k_2 + 1) \times \cdots \times (2k_p + 1)$ evaluations of the function $g$.
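The nested sum in (7) and its evaluation count might be sketched in Python as below. This is a minimal illustration, assuming the abscissae $\theta_m[j]$ and weights $\alpha_m[j]$ have already been computed and are supplied as lists indexed by $j = 0, \ldots, 2k_m$; the function names and the two-sub-model toy inputs are hypothetical and not from the paper.

```python
from itertools import product
from math import prod

def product_rule(g, abscissae, weights):
    """Formula (7): sum over every combination of sub-model abscissae.

    abscissae[m][j] holds theta_m[j] and weights[m][j] holds alpha_m[j],
    for j = 0, ..., 2*k_m. g takes one argument per sub-model.
    """
    total = 0.0
    index_ranges = [range(len(w)) for w in weights]
    for combo in product(*index_ranges):
        coeff = prod(weights[m][j] for m, j in enumerate(combo))
        thetas = [abscissae[m][j] for m, j in enumerate(combo)]
        total += coeff * g(*thetas)
    return total

def n_evaluations_product(dims):
    """Number of evaluations of g needed by formula (7) for sub-model sizes dims."""
    return prod(2 * k_m + 1 for k_m in dims)

# Toy inputs (assumption): two one-parameter sub-models, so each theta_m[j]
# is a scalar and alpha_m[0] = 1 - k_m = 0.
abscissae = [[1.5, 1.9, 1.1], [-0.5, -0.2, -0.8]]
weights = [[0.0, 0.5, 0.5], [0.0, 0.5, 0.5]]
estimate = product_rule(lambda t1, t2: t1 ** 2 * t2, abscissae, weights)
```

The count grows multiplicatively with the number of sub-models: `n_evaluations_product` returns 9 for the toy case of two one-parameter sub-models.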
The results derived in the Appendix provide an alternative to (7). Consider instead the overall $k$-parameter model. Given the factorisation of the component likelihood functions, the fitted parameter vector is $\hat\theta = (\hat\theta_1, \ldots, \hat\theta_p)$. From the results in the Appendix, the abscissae for the signed-root approximations are $\theta[0] = \hat\theta$ and, for $j = 1, \ldots, 2k_m$, $m = 1, \ldots, p$,
\[ \theta[2(k_1 + \cdots + k_{m-1}) + j] = (\hat\theta_1, \ldots, \hat\theta_{m-1}, \theta_m[j], \hat\theta_{m+1}, \ldots, \hat\theta_p) , \]
where $\theta_m[j]$ is obtained from the $m$th likelihood alone. The corresponding weights, from the Appendix, are $\alpha[0] = 1 - k$ and
\[ \alpha[2(k_1 + \cdots + k_{m-1}) + j] = \alpha_m[j] , \]
where again $\alpha_m[j]$ is obtained from the $m$th likelihood alone. Invoking formula (2), we obtain the alternative approximation
\[ E\{g(\theta)\} = \sum_{r=0}^{2k} \alpha[r]\, g(\theta[r]) , \tag{8} \]
which requires only $2k + 1$ evaluations of $g$ compared with the $\prod_m (2k_m + 1)$ evaluations for formula (7). Furthermore, although both formulae (7) and (8) are correct to $O(n_{\min}^{-2})$, where $n_{\min} = \min(n_m, m = 1, \ldots, p)$, formula (8) is likely to be more accurate in practice, since the errors will be compounded in (7).

We note that the quantities $\theta_m[j]$ and $\alpha_m[j]$ depend only on the likelihood and prior from the $m$th sub-model and, for each sub-model $m$, the components $\theta_s$, $s \neq m$, are fixed at their MLE values in the evaluation of $g(\theta[r])$. The developers of a computer model, or its component statistical models, would determine the values of $\theta[r]$ and $\alpha[r]$, $r = 1, \ldots, 2k$, during the development cycle. In software to calculate estimates of some function $g(\theta, X)$, with values of $X$ provided by the user, the chief computational burden at run-time is the $2k + 1$ evaluations of $g(\theta[r], X)$.

4 A general posterior expectation decomposition

In this section we note that formula (8) can be decomposed into $p$ approximate posterior expectations from each of the sub-models.
This leads to an interesting general approximation formula for the posterior expectation of $g$ in terms of the component posterior expectations. Given a function $g(\theta)$ in the multi-component situation of Section 3, define the $p$ functions
\[ g_m(\theta_m) = g(\hat\theta_1, \ldots, \hat\theta_{m-1}, \theta_m, \hat\theta_{m+1}, \ldots, \hat\theta_p) . \]
Then, from formula (8),
\[
\begin{aligned}
E\{g(\theta)\} &= (1 - k)\, g(\hat\theta) + \sum_{r=1}^{2k} \alpha[r]\, g(\theta[r]) \\
&= (1 - k)\, g(\hat\theta) + \sum_{m=1}^{p} \sum_{j=1}^{2k_m} \alpha_m[j]\, g_m(\theta_m[j]) \\
&= (1 - k)\, g(\hat\theta) + \sum_{m=1}^{p} [E\{g_m(\theta_m)\} - (1 - k_m)\, g(\hat\theta)] \\
&= g(\hat\theta) + \sum_{m=1}^{p} [E\{g_m(\theta_m)\} - g(\hat\theta)] ,
\end{aligned}
\tag{9}
\]
which expresses the overall posterior expectation of $g(\theta)$ as the first-order approximation $g(\hat\theta)$ plus a sum of correction terms calculated from each sub-model. It is not hard to see that (9) holds to $O(n_{\min}^{-1})$. What is new here is that it holds to the higher asymptotic order $O(n_{\min}^{-2})$.

This relationship between posterior expectations and maximum likelihood estimates may find application in calculating expectations across multiple models, even when the asymptotic methods of Sections 2 and 3 are not employed. For example, if the sub-models are low dimensional, then exact, or numerically computed, component expectations could be used in (9). In the case of numerical integration, formula (9) gives a clear computational benefit over $p$-dimensional numerical integration. We further note that one or more of the MLEs of $\theta_m$ in (9) could be replaced by their posterior means if these were more readily available (e.g. analytically). This can be shown using the fact that, in general, the posterior mean and the MLE differ by $O(n^{-1})$.

Sweeting and Kharroubi (2003) derive an alternative version of the summation formula (2) that is of the form
\[ E\{g(\theta)\} = \sum_{i=1}^{k} w_i (\alpha_i^+ g_i^+ + \alpha_i^- g_i^-) , \]
where $w_i$ are weights satisfying $\sum_i w_i = 1$ and the $g_i^{\pm}$ values are evaluations of $g$ at points defined in an alternative way to the $\theta_i^{\pm}$ in Section 2. It turns out that this approximation is also $O(n^{-2})$.
A discussion of the advantages of this version of the formula is given in that paper. We simply note here that this alternative version could be used to approximate the component posterior expectations of $g_m(\theta_m)$ and then (9) used to approximate the overall posterior expectation of $g(\theta)$.

The posterior uncertainty in $g(\theta)$ may be measured by its posterior variance, which again can be obtained from the sub-model variance formulae. Specifically, from (3) we have, to $O(n^{-2})$,
\[ \mathrm{Var}\{g(\theta)\} = \{g'(\hat\theta)\}^T J^{-1} g'(\hat\theta) = \sum_{m=1}^{p} \{g_m'(\hat\theta_m)\}^T J_m^{-1} g_m'(\hat\theta_m) = \sum_{m=1}^{p} \mathrm{Var}\{g_m(\theta_m)\} , \]
where $g_m'(\theta_m)$ is the column vector of first-order partial derivatives of $g_m(\theta_m)$ and $J_m$ is the observed information associated with sub-model $m$. The above additive formula follows since $J$ is block diagonal. Again, any exact or approximate values for $\mathrm{Var}\{g_m(\theta_m)\}$ could be used in this formula.

5 Example

We illustrate the use of (8) and (9) with an application in which three statistical models are used to estimate risk of fatal coronary heart disease.

The UK Prospective Diabetes Study (UKPDS) Group have published an equation for coronary case fatality in type 2 diabetes (Stevens et al. 2004). Clinically, case fatality is the proportion of cases of a disease that are fatal; for the purposes of the model, case fatality is defined as the probability that coronary disease is fatal, conditional on coronary disease occurring. Given an individual with observed value $x = (x_1, \ldots, x_5)^T$ of a vector of covariates $X = (X_1, \ldots, X_5)^T$, with the convention that $X_1 = 1$, the UKPDS model for case fatality $p(x)$, given a parameter vector $\mu = (\mu_1, \ldots, \mu_5)$, is
\[ p(x) = \left[ 1 + \exp\{-x^T \mu\} \right]^{-1} . \tag{10} \]
The MLE of $\mu$ is $\hat\mu = (0.713, 0.048, 0.178, 0.141, 0.104)$ (with corresponding standard errors 0.194, 0.0236, 0.0119, 0.0612 and 0.0417), derived by fitting a logistic regression model to data from 597 people with coronary disease as described in detail previously (Stevens et al. 2004).
The UKPDS Group have also published an equation for risk of coronary disease in type 2 diabetes (UKPDS Group 2001). Given an individual with observed value $y = (y_1, \ldots, y_8)^T$ of a vector of covariates $Y = (Y_1, \ldots, Y_8)^T$, with the convention that $Y_1 = 1$, the modelled risk of coronary disease over $t$ years depends on a parameter vector $\nu = (\nu_1, \ldots, \nu_9)$ as follows:
\[ R(y, t) = 1 - \exp\{-q(1 - \nu_9^t)/(1 - \nu_9)\} , \tag{11} \]
where $q = \exp\{(\nu_1, \ldots, \nu_8)\, y\}$. The MLE of $\nu$ is $\hat\nu = (0.0112, 1.059, 0.525, 0.390, 1.350, 1.183, 1.088, 3.845, 1.078)$ (with corresponding standard errors 0.00154, 0.0155, 0.00666, 0.0511, 0.103, 0.124, 0.0365, 0.0250 and 0.638), derived by fitting the model to 4,540 people with a median 10.7 years of follow-up, as described previously (UKPDS Group 2001). There were 597 incident cases of coronary disease in this cohort, and these are the same 597 that were used to fit the case fatality equation (10).

Variable $X_3$ in the case fatality model, and variable $Y_5$ in the coronary risk model, take the same value. Glycated haemoglobin (abbreviated HbA1c) is a widely-used indicator of prevailing blood glucose levels over recent months. Measurements of HbA1c are subject to variation between laboratories. The models define $X_3$ and $Y_5$ to be HbA1c as measured on an assay aligned to the reference laboratories of the National Glycosylation Standardisation Program (NGSP). The NGSP publishes linear regression models
\[ X_3 = \lambda_1 + \lambda_2 h + z , \qquad z \sim N(0, \lambda_3^2) , \tag{12} \]
relating $X_3$, HbA1c measured in an NGSP reference laboratory, to $h$, the HbA1c measured in another laboratory. Let $\lambda = (\lambda_1, \lambda_2, \lambda_3)$. For a particular local laboratory the MLE of $\lambda$ was $\hat\lambda = (0.229, 0.982, 0.124)$ (with corresponding standard errors 0.079, 0.0096 and 0.014), so that an HbA1c of 4.5% in the local laboratory corresponds to an MLE of 4.65% at the reference laboratory.
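The functional forms of the three sub-models (10), (11) and (12) might be coded along the following lines. This is only a sketch of the equations as printed: the function names are hypothetical, and because the coding and centring of the covariates are described in the cited papers rather than here, the sketch does not attempt to reproduce the patient-level risks quoted in this section.

```python
import math

def calibrate_hba1c(h, lam1, lam2):
    """Point prediction from regression (12), omitting the noise term z."""
    return lam1 + lam2 * h

def case_fatality(x, mu):
    """Equation (10): logistic function of the linear predictor x^T mu."""
    eta = sum(x_i * mu_i for x_i, mu_i in zip(x, mu))
    return 1.0 / (1.0 + math.exp(-eta))

def coronary_risk(q, nu9, t):
    """Equation (11): t-year risk given q = exp{(nu_1, ..., nu_8) y}."""
    return 1.0 - math.exp(-q * (1.0 - nu9 ** t) / (1.0 - nu9))

# The calibration quoted in the text: 4.5% locally maps to about 4.65%.
reference_hba1c = calibrate_hba1c(4.5, 0.229, 0.982)

# For t = 1 the factor (1 - nu9^t)/(1 - nu9) equals 1, so the one-year
# risk reduces to 1 - exp(-q). The value of q here is arbitrary.
one_year = coronary_risk(0.01, 1.078, 1)
```

The one-year simplification is relevant because the worked example later in this section takes $t = 1$.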
Although the risk model and case fatality model were estimated from data from the same study, they still satisfy the factorisation criterion (4). Let $F$ denote the set of individuals with fatal coronary disease, $N$ denote the set of individuals with nonfatal coronary disease, and $C$ denote the set of individuals censored for coronary disease within the UKPDS. Further, write
\[ r_i(\nu) = P(\text{disease in individual } i), \qquad p_i(\mu) = P(\text{fatal disease in individual } i \mid \text{disease in individual } i) . \]
Then the likelihood for the case fatality model is
\[ L_1(\mu) = \prod_{i \in F} p_i(\mu) \prod_{i \in N} \{1 - p_i(\mu)\} \]
and the likelihood for the risk model is
\[ L_2(\nu) = \prod_{i \in C} \{1 - r_i(\nu)\} \prod_{i \in F, N} r_i(\nu) . \]
The combined likelihood function that would be required to estimate $\mu$ and $\nu$ jointly is
\[ L(\mu, \nu) = \prod_{i \in C} \{1 - r_i(\nu)\} \prod_{i \in F} p_i(\mu)\, r_i(\nu) \prod_{i \in N} \{1 - p_i(\mu)\}\, r_i(\nu) , \]
which equals $L_1(\mu) L_2(\nu)$ as required in (4). The extension of this to include the independent data on which $\lambda$ is estimated is trivial.

Consider a hypothetical patient with HbA1c of 4.5% in the local laboratory, $X_i = 0$ for all $i > 1$, $i \neq 3$, and $Y_i = 0$ for all $i > 1$, $i \neq 5$. For simplicity, we consider here the case that $t = 1$. Using $\hat\lambda$, $\hat\mu$ and $\hat\nu$ the MLE of one-year coronary risk is 0.0078707 and the MLE of one-year case fatality is 0.23752, giving an estimated risk of fatal coronary disease 0.0078707 × 0.23752 = 0.001869. We wish to calculate the posterior expectation of the risk of fatal coronary disease over $\lambda$, $\mu$, $\nu$ under the independent prior specifications π_λ(λ) ∝ 1/λ2 and $\pi_\mu(\mu) = \pi_\nu(\nu) \propto 1$.

Taking $p = 3$, with $\theta_1 = \lambda$, $\theta_2 = \mu$ and $\theta_3 = \nu$, we can apply the approximations of Section 3. To four significant figures, the approximate posterior expectation by the full method (7) is 0.001903 and requires 1,463 evaluations of $g$. The approximation (8) to the posterior expectation is also 0.001903 and requires only 35 evaluations. For comparison, the exact answer by simulation is 0.001907.

To six significant figures the MLE of fatal coronary risk is 0.00186948.
Fixing $\lambda = \hat\lambda$, $\nu = \hat\nu$, the expectation over $\mu$ calculated by (8) is 0.00187545. Fixing $\lambda = \hat\lambda$, $\mu = \hat\mu$, the expectation over $\nu$ by (8) is 0.00189733. Fixing $\mu = \hat\mu$, $\nu = \hat\nu$, the expectation over $\lambda$ by numerical integration is 0.00186960. Hence, using the decomposition formula (9), the expectation over $\lambda$, $\mu$, $\nu$ is approximately −2 × 0.00186948 + 0.00189733 + 0.00187545 + 0.00186960 = 0.001903.

Table 1 compares the MLE and approximations (8) and (9) to the posterior expectation, calculated by simulation, for 25 individuals selected at random from the UKPDS. The higher-order approximations to the expectation show a smaller mean absolute error than the MLE. Compared to the variation between patients, however, the error in the MLE is small, due to the large sample size on which the risk and case fatality models were estimated. To illustrate the performance of the approximations in models derived from a smaller sample, we refit the risk and case fatality models using a 15% random subsample of the UKPDS. Table 2 shows simulation results and asymptotic approximations for this analysis.

6 Discussion

We have presented formulae for fast estimates of posterior quantities with high asymptotic accuracy. In our example the good asymptotic properties of the formulae resulted in substantially reduced error relative to the posterior standard deviation. Although the formulae are for Bayesian estimation, Stevens (2003) showed that they can also be useful in other contexts.

The example in Section 5 is one of many situations in which one model is used to estimate the input to another. Another is a class of methods for correcting regression dilution that consist of estimating a scaling factor from a repeated measures model, which is then used to modify the effect parameter, or equivalently the input variable, preparatory to calculating a prediction from a regression model (Frost and Thompson 2000).
The methods of Sections 3 and 4 could be used to calculate expectations across both the repeated measures model and the regression model.

Our examples have emphasised the use of the results of Section 3 in combining multi-component models. They may also have a use in reducing the computational burden in fitting a single model with very many parameters. Calculation of maximum likelihood estimates and profile likelihood functions becomes increasingly difficult with the square of the dimension of the parameter vector. If a problem with many parameters can be factorized in the manner of Section 3 then the weights and abscissae for these approximations can be calculated on $p$ problems of lower dimension. The factorisation criteria (4) and (5) will be met whenever the $p$ models are estimated on $p$ independent data sets, but the example shows that factorisation can arise from conditionality as well as independence.

It is possible to relax the factorisation (5) of the prior density (although not the factorisation (4) of the likelihood). Let $\pi_1(\theta_1), \ldots, \pi_p(\theta_p)$ be the marginal prior densities, or some other densities chosen for convenience, of $\theta_1, \ldots, \theta_p$ and define $\phi(\theta) = \pi(\theta)/\{\pi_1(\theta_1) \cdots \pi_p(\theta_p)\}$. Then $\phi(\theta)$ may be absorbed into the function $g(\theta)$ before computing the expectation (8). Since $\phi(\theta)$ is $O(1)$, the resulting approximation will continue to be of the same order of accuracy; details are not given here.

This paper was motivated by the "UKPDS Risk Engine" software project, which provides clinicians with estimates of coronary and stroke risk in people with diabetes. The methods of this paper made a substantial contribution to the successful drive to make the Risk Engine run acceptably fast on a Palm computer. They should also find application in the many other models, in diabetes and elsewhere, that use multiple sub-models.
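The run-time calculations in formulae (8) and (9) might be sketched in Python as follows. This is a minimal illustration under stated assumptions: the abscissae and weights for $j = 1, \ldots, 2k_m$ are taken as precomputed inputs, the function names and the two-sub-model toy inputs are hypothetical, and the numerical inputs to the decomposition are the values that appear in the displayed sum in Section 5.

```python
def additive_rule(g, theta_hat, abscissae, weights):
    """Formula (8): vary one sub-model at a time, others fixed at their MLEs.

    theta_hat[m] is the MLE for sub-model m. abscissae[m] and weights[m]
    list theta_m[j] and alpha_m[j] for j = 1, ..., 2*k_m (j = 0 excluded).
    """
    k = sum(len(w) // 2 for w in weights)
    total = (1 - k) * g(*theta_hat)
    for m, (points, alphas) in enumerate(zip(abscissae, weights)):
        for point, alpha in zip(points, alphas):
            args = list(theta_hat)
            args[m] = point  # replace only the m-th sub-model's parameters
            total += alpha * g(*args)
    return total

def decomposition(g_hat, component_expectations):
    """Formula (9): g(theta_hat) plus one correction term per sub-model."""
    return g_hat + sum(e - g_hat for e in component_expectations)

# Toy inputs (assumption): two one-parameter sub-models.
toy = additive_rule(lambda t1, t2: t1 ** 2 * t2,
                    [1.5, -0.5],
                    [[1.9, 1.1], [-0.2, -0.8]],
                    [[0.5, 0.5], [0.5, 0.5]])

# Decomposition with the values used in the displayed sum of Section 5.
fatal_risk = decomposition(0.00186948, [0.00187545, 0.00189733, 0.00186960])

# Evaluations of g needed by (8) for sub-models of size 3, 5 and 9.
n_evals = 2 * (3 + 5 + 9) + 1
```

With the Section 5 inputs the decomposition gives 0.00190342, which rounds to the 0.001903 reported there, and the evaluation count for (8) is 35.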
Appendix: Derivation of weights and abscissae

We obtain the abscissae and weights used in formula (8) for the overall multi-component model when the likelihood function and prior density factorise as in (4) and (5). Defining $l_m(\theta_m) = \log L_m(\theta_m)$ to be the log-likelihood for the $m$th model, we observe that $l(\theta_1, \ldots, \theta_p) = l_1(\theta_1) + \cdots + l_p(\theta_p)$ so that the maximiser of $l$ is $\hat\theta = (\hat\theta_1, \ldots, \hat\theta_p)$, where $\hat\theta_m$ is the maximiser of $l_m(\theta_m)$. For a given $m$ and $1 \leq i \leq k_m$, as in Section 2 we let $\hat\theta_{mj}(\theta_{mi})$, $j = i+1, \ldots, k_m$, and $\hat\theta_{rj}(\theta_{mi})$, $j = 1, \ldots, k_r$, $r = m+1, \ldots, p$, denote the values of $\theta_{m,i+1}, \ldots, \theta_{pk_p}$ that maximise $l(\theta)$ conditional on $(\hat\theta_1, \ldots, \hat\theta_{m-1}, \hat\theta_{m1}, \ldots, \hat\theta_{m,i-1}, \theta_{mi})$. But it follows from the factorisation (4) of $L$ that $\hat\theta_{mj}(\theta_{mi})$, $j > i$, does not depend on $(\hat\theta_1, \ldots, \hat\theta_{m-1})$ and that $\hat\theta_{rj}(\theta_{mi}) = \hat\theta_{rj}$ for $r > m$. Therefore the profile log-likelihood associated with the $(k_1 + \cdots + k_{m-1} + i)$th component $\theta_{mi}$ of $\theta$, $i = 1, \ldots, k_m$, $m = 1, \ldots, p$, can be written as
\[ l_1(\hat\theta_1) + \cdots + l_{m-1}(\hat\theta_{m-1}) + l_{mi}(\theta_{mi}) + l_{m+1}(\hat\theta_{m+1}) + \cdots + l_p(\hat\theta_p) \]
in which $l_{mi}(\theta_{mi})$ is the logarithm of the profile likelihood $L_{mi}(\theta_{mi})$ defined in Section 3. It follows that the $(m, i)$th signed-root transformation is simply calculated from the $m$th log-likelihood function as
\[ R_{mi}(\theta_{mi}) = \mathrm{sign}(\theta_{mi} - \hat\theta_{mi})\,[2\{l_m(\hat\theta_m) - l_{mi}(\theta_{mi})\}]^{1/2} , \]
and hence that $\theta_{mi}^+$ and $\theta_{mi}^-$ are obtained from the $m$th likelihood alone.

Next consider the weights in the overall model. In view of the factorisation of the likelihood, it is readily seen that the log-likelihood derivative in the definition of $\tau_{mi}^+$ associated with the $(m, i)$th component of $\theta$ is simply
\[ \frac{\partial}{\partial\theta_{mi}}\, l_m(\hat\theta_{m1}, \ldots, \hat\theta_{m,i-1}, \theta_{mi}^+, \hat\theta_{m,i+1}(\theta_{mi}^+), \ldots, \hat\theta_{mk_m}(\theta_{mi}^+)) . \]
Similarly, the matrix $j_{r+1}$ in the combined model is block diagonal, so that when $r = 2(k_1 + \cdots + k_{m-1}) + i$ its determinant is
\[ |j_{mi}(\hat\theta_{mi})|\,|J_{m+1}| \cdots |J_p| , \]
where $J_m$ is the observed information matrix associated with model $m$. It follows that the $(m, i)$th component of $\tau^+$ for the combined model is
\[ \prod_{s \neq m} \pi_s(\hat\theta_s) \prod_{s > m} |J_s|^{-1/2}\, \tau_{mi}^+ , \]
where $\tau_{mi}^+$ is given by equation (1) applied to model $m$. Since we get a similar expression for the $(m, i)$th component of $\tau^-$, the common factors in these components of $\tau^+$ and $\tau^-$ cancel when we form the $(m, i)$th component of $\alpha^+$ and $\alpha^-$. It follows that the $(m, i)$th components of $\alpha^+$ and $\alpha^-$ are just $\alpha_{mi}^+$ and $\alpha_{mi}^-$ obtained from the $m$th model.

Acknowledgements

This paper arises from work carried out when RJS was a Wellcome Trust research fellow at the Diabetes Trials Unit, Oxford. We are grateful to the UK Prospective Diabetes Study group, and to the laboratory of the Diabetes Trials Unit for permission to use the example.

References

Brown J.B. 2000. Computer models of diabetes: almost ready for prime time. Diabetes Research and Clinical Practice 50(Supplement 3): S1–3.

Clarke P.M., Gray A.M., Briggs A., Farmer A.J., Fenn P. et al. 2004. A model to estimate the lifetime health outcomes of patients with Type 2 diabetes: the United Kingdom Prospective Diabetes Study (UKPDS) Outcomes Model (UKPDS no. 68). Diabetologia 47: 1747–59.

Eastman R.C., Javitt J.C., Herman W.H., Dasbach E.J., Zbrozek A.S. et al. 1997. Model of complications of NIDDM. I. Model construction and assumptions. Diabetes Care 20: 725–34.

Eddy D.M. and Schlessinger L. 2003. Archimedes: A trial-validated model of diabetes. Diabetes Care 26: 3093–3101.

Frost C. and Thompson S. 2000. Correcting for regression dilution bias: comparison of methods for a single predictor variable. Journal of the Royal Statistical Society series A 163: 173–190.

Hoeffding W. 1982. Asymptotic normality. In: Kotz S., Johnson N.L. and Read C.B.
(Eds.), Encyclopedia of Statistical Sciences. Wiley.

Stevens R.J. 2003. Evaluation of methods for interval estimation of model outputs, with application to survival models. Journal of Applied Statistics 30: 967–981.

Stevens R.J., Coleman R.L., et al. 2004. Risk factors for myocardial infarction case fatality and stroke case fatality in type 2 diabetes (UKPDS 66). Diabetes Care 27: 201–207.

Sweeting T.J. 1996. Approximate Bayesian computation based on signed roots of log-density ratios (with Discussion). In: Bernardo J.M., Berger J.O., Dawid A.P., and Smith A.F.M. (Eds.), Bayesian Statistics 5. Oxford University Press.

Sweeting T.J. and Kharroubi S.A. 2003. Some new formulae for posterior expectations and Bartlett corrections. Test 12: 497–521.

Tierney L. and Kadane J.B. 1986. Accurate approximations for posterior moments and marginal densities. Journal of the American Statistical Association 81: 82–86.

UKPDS Group. 2001. The UKPDS Risk Engine: a model for the risk of coronary heart disease in type 2 diabetes (UKPDS 56). Clinical Science 101: 671–679.

Table 1. Modelled one-year risk of fatal coronary disease in 25 randomly selected individuals: exact posterior expectation calculated by simulation, compared to the maximum likelihood estimate (MLE) and the expectation estimated by formulae (8) and (9) as described in the text.
Patient  Posterior expectation          Posterior   MLE       Formula   Formula
         (Monte-Carlo standard error)   standard              (8)       (9)
                                        deviation
a        0.000913 (2.0 × 10^-7)         0.000329    0.000836  0.000912  0.000912
b        0.000272 (5.6 × 10^-8)         0.000098    0.000248  0.000271  0.000271
c        0.000483 (1.0 × 10^-7)         0.000163    0.000461  0.000483  0.000483
d        0.004536 (6.1 × 10^-7)         0.000963    0.004405  0.004534  0.004534
e        0.002254 (4.1 × 10^-7)         0.000646    0.002171  0.002253  0.002253
f        0.001034 (2.1 × 10^-7)         0.000341    0.000990  0.001033  0.001033
g        0.000064 (1.6 × 10^-8)         0.000028    0.000061  0.000064  0.000064
h        0.005467 (6.7 × 10^-7)         0.001047    0.005384  0.005468  0.005468
i        0.002598 (3.5 × 10^-7)         0.000553    0.002550  0.002598  0.002598
j        0.001944 (2.7 × 10^-7)         0.000432    0.001923  0.001944  0.001944
k        0.002100 (2.8 × 10^-7)         0.000443    0.002087  0.002101  0.002101
l        0.002884 (4.3 × 10^-7)         0.000694    0.002821  0.002884  0.002884
m        0.002335 (3.1 × 10^-7)         0.000481    0.002289  0.002335  0.002335
n        0.004851 (6.2 × 10^-7)         0.000977    0.004802  0.004852  0.004852
o        0.009124 (1.2 × 10^-6)         0.001811    0.008968  0.009125  0.009125
p        0.003509 (5.7 × 10^-7)         0.000868    0.003380  0.003507  0.003507
q        0.003081 (4.1 × 10^-7)         0.000633    0.003032  0.003081  0.003081
r        0.004077 (6.6 × 10^-7)         0.001037    0.003922  0.004074  0.004074
s        0.004236 (5.7 × 10^-7)         0.000886    0.004148  0.004236  0.004236
t        0.001626 (3.2 × 10^-7)         0.000516    0.001583  0.001626  0.001626
u        0.003069 (4.1 × 10^-7)         0.000655    0.003063  0.003070  0.003070
v        0.002285 (3.2 × 10^-7)         0.000513    0.002245  0.002286  0.002286
w        0.011519 (1.4 × 10^-6)         0.002226    0.011322  0.011520  0.011520
x        0.004730 (6.5 × 10^-7)         0.001031    0.004621  0.004729  0.004729
y        0.007197 (9.3 × 10^-7)         0.001404    0.007024  0.007196  0.007196
Mean error¹                                         11%       0.15%     0.23%

¹ For each row, we calculated the absolute value of the difference between each estimate and the exact value by simulation, and divided by the posterior standard deviation. 'Mean error' in the last row denotes the average across all 25 rows.

Table 2. One-year risk of fatal coronary disease according to models derived from a subsample of the UKPDS, as described in the text. MLE and mean error defined as in Table 1.

Patient  Posterior expectation          Posterior   MLE       Formula   Formula
         (Monte-Carlo standard error)   standard              (8)       (9)
                                        deviation
a        0.002967 (1.8 × 10^-5)         0.002094    0.002033  0.002875  0.002875
b        0.000380 (3.0 × 10^-6)         0.000297    0.000243  0.000358  0.000358
c        0.000381 (3.1 × 10^-6)         0.000286    0.000282  0.000367  0.000367
d        0.006667 (3.0 × 10^-5)         0.003207    0.005474  0.006545  0.006545
e        0.002049 (1.5 × 10^-5)         0.001261    0.001586  0.001987  0.001987
f        0.000761 (7.2 × 10^-6)         0.000537    0.000574  0.000735  0.000735
g        0.000043 (4.1 × 10^-7)         0.000044    0.000033  0.000043  0.000043
h        0.005730 (3.2 × 10^-5)         0.002462    0.005183  0.005610  0.005610
i        0.003703 (1.7 × 10^-5)         0.001864    0.003267  0.003632  0.003632
j        0.001688 (1.1 × 10^-5)         0.000842    0.001563  0.001641  0.001641
k        0.002185 (1.2 × 10^-5)         0.001008    0.002094  0.002143  0.002143
l        0.001783 (1.6 × 10^-5)         0.001019    0.001500  0.001730  0.001730
m        0.002487 (1.2 × 10^-5)         0.001178    0.002194  0.002432  0.002432
n        0.005609 (3.1 × 10^-5)         0.002614    0.005278  0.005504  0.005504
o        0.012677 (6.6 × 10^-5)         0.005601    0.011483  0.012523  0.012523
p        0.005380 (2.6 × 10^-5)         0.002828    0.004255  0.005274  0.005274
q        0.003956 (1.9 × 10^-5)         0.001912    0.003599  0.003889  0.003889
r        0.006368 (3.2 × 10^-5)         0.003499    0.004985  0.006244  0.006244
s        0.005246 (2.7 × 10^-5)         0.002557    0.004608  0.005153  0.005153
t        0.002094 (1.4 × 10^-5)         0.001366    0.001839  0.002100  0.002100
u        0.004599 (2.1 × 10^-5)         0.002120    0.004624  0.004565  0.004565
v        0.001714 (1.2 × 10^-5)         0.000864    0.001503  0.001662  0.001662
w        0.016274 (7.9 × 10^-5)         0.006945    0.014721  0.016133  0.016133
x        0.004516 (2.8 × 10^-5)         0.002237    0.003794  0.004382  0.004382
y        0.012669 (4.9 × 10^-5)         0.005190    0.011112  0.012589  0.012589
Mean error                                          27%       3.9%      3.9%
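
The 'mean error' summary defined in the footnote to Table 1 might be computed as in the following Python sketch. The function name is hypothetical, and the three rows shown are the MLE entries for patients a, b and c of Table 1, used only to illustrate the calculation; the sketch does not attempt to reproduce the 25-row averages reported in the tables.

```python
def mean_standardised_error(rows):
    """Average of |estimate - exact| / posterior sd over the supplied rows.

    Each row is a tuple (exact, posterior_sd, estimate).
    """
    return sum(abs(est - exact) / sd for exact, sd, est in rows) / len(rows)

# (posterior expectation, posterior sd, MLE) for patients a, b, c of Table 1.
rows_abc = [
    (0.000913, 0.000329, 0.000836),
    (0.000272, 0.000098, 0.000248),
    (0.000483, 0.000163, 0.000461),
]
mle_error_abc = mean_standardised_error(rows_abc)
```

Dividing by the posterior standard deviation puts the error of each estimate on the scale of the posterior uncertainty, which is why the tables report it as a percentage.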