Online Supplementary Materials for “BAYESIAN DOSE-RESPONSE ANALYSIS FOR
EPIDEMIOLOGICAL STUDIES WITH COMPLEX UNCERTAINTY IN DOSE
ESTIMATION”
by Deukwoo Kwon, F. Owen Hoffman, Brian E. Moroz, Steven L. Simon
We provide the following supplementary materials for the main manuscript.
S1. Relationship between our problem and Bayesian model uncertainty paradigm
Under our model uncertainty paradigm, we rely on Bernardo and Smith’s [1] perspectives of M-closed, M-complete, and M-open for the relationship between the multiple realizations of dose
vectors produced from the 2DMC algorithm and the true exposure model. The M-closed
perspective is the situation in which the dose reconstruction method describes the ‘true’ exposure
model exactly and thus is able to produce the true exposure in one of its multiple realizations of
possibly true dose vectors. Under M-complete, the ‘true’ exposure model exists but the 2DMC
algorithm is unable to reproduce the true dose vector exactly. The M-open perspective is similar
to the M-complete; the difference is that we cannot assume that the true exposure model is captured
within the multiple realizations of dose vectors produced by the 2DMC dose estimation algorithm.
In that case, we cannot develop any prior beliefs about the probability that each realization of
2DMC is equal to the true exposure model without having additional information. Most dose
estimations for cohorts supporting epidemiologic studies are located somewhere between the
extremes of the perspectives of M-complete and M-open, since we cannot guarantee that the
dosimetry system generates the true dose vector.
S2. Stochastic Approximation Monte Carlo (SAMC)
We give a brief review of the SAMC method [2]. We can reformulate the posterior distribution
in terms of the energy function, $U(\cdot)$:
$$p(\alpha, \beta, \gamma \mid \text{Data}) = \exp(-U(\alpha, \beta, \gamma))/Z, \quad (\alpha, \beta, \gamma) \in \Theta,$$
where $Z$ is the normalizing constant and $\Theta$ is the parameter space. Let $\theta$ denote a set of parameters,
$\theta = (\alpha_1, \ldots, \alpha_J, \beta, \gamma)$ and $\theta \in \Theta$. When we write $U(\theta) = -\log(p_0(\theta \mid \text{Data}))$, where $p_0(\cdot)$ is the
unnormalized posterior distribution, then $U$ is equivalent to the negative of the log-posterior
distribution in Bayesian computation.
Without loss of generality we assume $\Theta$ is compact. In the implementation we set $\Theta = [-10^{100}, 10^{100}]$. We assume that the parameter space can be partitioned according to the energy
function, $U(\theta)$, into $S$ disjoint sub-regions denoted by $E_1 = \{\theta: U(\theta) \le u_1\}$, $E_2 = \{\theta: u_1 < U(\theta) \le u_2\}$, …, $E_{S-1} = \{\theta: u_{S-2} < U(\theta) \le u_{S-1}\}$, $E_S = \{\theta: U(\theta) > u_{S-1}\}$, where $u_1, \ldots, u_{S-1}$ are real
numbers in increasing order and are specified by the user. The SAMC method tries to sample from
each sub-region with a pre-specified frequency, $f = (f_1, \ldots, f_S)$ (e.g., equal frequency), using the trial
distribution
$$p_w(\theta) \propto \sum_{s=1}^{S} \frac{f_s \, p_0(\theta)}{w_s} \, I(\theta \in E_s),$$
where $I(\cdot)$ denotes an indicator function and
$$w_s = \int_{E_s} p_0(\theta) \, d\theta$$
for $s = 1, \ldots, S$.
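As a minimal illustration of the energy partition described above, the following Python sketch assigns an energy value to its sub-region index. The function name `energy_region` and the cut points are hypothetical and chosen only for illustration; they are not taken from the authors' implementation.

```python
from bisect import bisect_left

def energy_region(energy, cut_points):
    """Return the 1-based index s of the sub-region E_s that contains `energy`.

    `cut_points` holds u_1 < ... < u_{S-1}; E_1 is U <= u_1 and E_S is U > u_{S-1}.
    """
    # bisect_left counts the cut points strictly below `energy`, so an energy
    # exactly equal to u_s is assigned to E_s (the "<=" side of the boundary).
    return bisect_left(cut_points, energy) + 1

cut_points = [1.0, 2.0, 3.0]  # hypothetical u_1 < u_2 < u_3, giving S = 4 sub-regions
energies = [0.5, 1.0, 1.5, 3.0, 7.0]
regions = [energy_region(u, cut_points) for u in energies]
print(regions)  # prints [1, 1, 2, 3, 4]
```

In a sampler, this index would determine which weight is used to rescale the unnormalized posterior at the current parameter value.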
Since our problem involves parameter estimation with multiple models (i.e., K linear dose-response models according to K dose vectors), the model space can be partitioned into K disjoint
sub-regions ($E_1, \ldots, E_K$). We use the SAMC model selection approach. We emphasize that our main
interest is estimation of the slope of the linear dose response, $\beta$ (ERR Gy$^{-1}$), not the selection of
different types of dose-response models with different shapes that depart from linear. We attempt
to identify which among multiple dose vectors is a good approximation of the unknown true dose
vector in this study. We illustrate how we implemented the SAMC method. Let $M_1, \ldots, M_K$ denote
the K models, each of which is associated with one of the K dose vectors. The SAMC method
consists of two stages: (1) Metropolis-Hastings (MH) sampling of $\theta$ and (2) weight updating. The
weight, $\xi = (\xi_1, \ldots, \xi_K)$, denotes the working estimate of $(\log(w_1/f_1), \ldots, \log(w_K/f_K))$ obtained at
each iteration.
In MH sampling, we generate a sample $\theta^{(t)}$ from a Metropolis-Hastings kernel $K_{\xi^{(t)}}(\theta^{(t)}, \cdot)$ with
the proposal distribution $q(\theta^{(t)}, \cdot)$ and the stationary distribution
$$p_{\xi^{(t)}}(\theta) \propto \sum_{k=1}^{K} \frac{p_0(\theta)}{e^{\xi_k^{(t)}}} \, I(\theta \in E_k),$$
where $\xi^{(t)} = (\xi_1^{(t)}, \ldots, \xi_K^{(t)})$ at iteration t. Let $Q(M_i \to M_j)$ denote the proposal probability for a
transition from model $M_i$ to model $M_j$. The proposal distribution must satisfy irreducibility and
aperiodicity for convergence.
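First, generate the dose vector selection parameter, $\gamma^*$, according to the proposal $Q(M_{\gamma^{(t-1)}} \to M_{\gamma^*})$ at
iteration t. If $\gamma^* = \gamma^{(t-1)}$, then generate $\alpha^*$ and $\beta^*$ from $p(\alpha, \beta \mid X, y, \text{dose vector } \gamma^{(t-1)})$ by a single
MCMC iteration and set $(\alpha^{(t)}, \beta^{(t)}, \gamma^{(t)}) = (\alpha^*, \beta^*, \gamma^*)$. If $\gamma^* \ne \gamma^{(t-1)}$, then generate $\alpha^*$ and $\beta^*$ from
$p(\alpha, \beta \mid X, y, \text{dose vector } \gamma^*)$ and accept $\alpha^*$, $\beta^*$ and $\gamma^*$ with probability
$$\min\left\{1, \; \frac{e^{\xi^{(t-1)}_{\gamma^{(t-1)}}}}{e^{\xi^{(t-1)}_{\gamma^*}}} \cdot \frac{p(\alpha^*, \beta^* \mid X, y, \text{dose vector } \gamma^*)}{p(\alpha^{(t-1)}, \beta^{(t-1)} \mid X, y, \text{dose vector } \gamma^{(t-1)})} \cdot \frac{Q(M_{\gamma^*} \to M_{\gamma^{(t-1)}})}{Q(M_{\gamma^{(t-1)}} \to M_{\gamma^*})}\right\}.$$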
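The acceptance probability above is a product of three ratios (weights, posterior densities, and proposal probabilities), so in practice it is usually safer to evaluate it on the log scale. The sketch below is one way this might be done; the function `log_accept_prob` and its argument names are hypothetical, and the inputs are assumed to be supplied by the surrounding sampler.

```python
import math

def log_accept_prob(weight_cur, weight_prop, log_post_cur, log_post_prop,
                    log_q_forward, log_q_reverse):
    """Log of the acceptance probability for a move between dose vectors.

    weight_cur, weight_prop     : working weights of the current and proposed models
    log_post_cur, log_post_prop : log unnormalized posterior at the current and proposed states
    log_q_forward               : log Q(current model -> proposed model)
    log_q_reverse               : log Q(proposed model -> current model)
    """
    log_ratio = ((weight_cur - weight_prop)          # e^{weight_cur} / e^{weight_prop}
                 + (log_post_prop - log_post_cur)    # posterior ratio
                 + (log_q_reverse - log_q_forward))  # proposal ratio
    return min(0.0, log_ratio)                       # log of min{1, ratio}

# Hypothetical numbers with a symmetric proposal (the Q terms cancel).
favoured = log_accept_prob(0.5, 0.0, -10.0, -10.2, 0.0, 0.0)
penalised = log_accept_prob(0.0, 0.5, -10.0, -10.2, 0.0, 0.0)
print(math.exp(favoured), math.exp(penalised))
```

A model with a large working weight (one that has been visited often) is penalised as a destination, which is how the sampler is pushed toward the pre-specified visiting frequencies.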
In the weight updating stage, $\xi^* = \xi^{(t-1)} + \rho_t (e_t - f)$, where $\rho_t$ denotes the gain factor sequence,
$e_t = (e_1^{(t)}, \ldots, e_K^{(t)})$, and $e_k^{(t)} = 1$ if $\gamma^{(t)} = k$ and 0 otherwise. The gain factor sequence should be a
positive, non-increasing sequence satisfying the following conditions:
$$\sum_{t=0}^{\infty} \rho_t = \infty \quad \text{and} \quad \sum_{t=0}^{\infty} \rho_t^{\zeta} < \infty, \quad \text{for some } \zeta \in (1, 2).$$
More details on the implementation of SAMC are described in Liang et al. [2].
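The two stages can be put together in a toy example. The sketch below is not the authors' implementation: it omits the updates of the regression parameters and reduces each of K = 3 models to a known unnormalized mass, so that only the model-selection move and the weight update are exercised. The gain factor t0 / max(t0, t) is assumed here as one sequence that meets the stated conditions; the function name, the masses, and all tuning values are hypothetical.

```python
import math
import random

def samc_model_weights(log_mass, n_iter=100_000, t0=100, seed=0):
    """Toy SAMC over K models, each summarised by a log unnormalized mass."""
    rng = random.Random(seed)
    K = len(log_mass)
    f = [1.0 / K] * K      # desired visiting frequencies (equal frequency)
    weights = [0.0] * K    # working weights, one per model
    visits = [0] * K
    cur = 0                # index of the current model
    for t in range(1, n_iter + 1):
        # Stage 1: propose a model uniformly (symmetric Q, so the Q ratio cancels).
        prop = rng.randrange(K)
        if prop != cur:
            log_ratio = (weights[cur] - weights[prop]) + (log_mass[prop] - log_mass[cur])
            if rng.random() < math.exp(min(0.0, log_ratio)):
                cur = prop
        # Stage 2: weight update with a positive, non-increasing gain factor.
        gain = t0 / max(t0, t)
        for k in range(K):
            indicator = 1.0 if k == cur else 0.0
            weights[k] += gain * (indicator - f[k])
        visits[cur] += 1
    return weights, visits

log_mass = [math.log(1.0), math.log(2.0), math.log(4.0)]  # hypothetical model masses
weights, visits = samc_model_weights(log_mass)
print([w - weights[0] for w in weights])  # differences estimate the log mass ratios
print(visits)
```

With equal target frequencies, the differences between working weights should settle near the log ratios of the model masses (here log 2 and log 4 relative to the first model), while the visit counts should be roughly equal across models.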
REFERENCES
1. Bernardo J, Smith A. Bayesian Theory. Wiley: New York, NY, 2000.
2. Liang F, Liu C, Carroll RJ. Stochastic Approximation in Monte Carlo Computation. Journal of the American Statistical Association 2007; 102: 305-320.