Online Supplementary Materials for “BAYESIAN DOSE-RESPONSE ANALYSIS FOR EPIDEMIOLOGICAL STUDIES WITH COMPLEX UNCERTAINTY IN DOSE ESTIMATION” by Deukwoo Kwon, F. Owen Hoffman, Brian E. Moroz, Steven L. Simon

We provide the following supplementary materials for the main manuscript.

S1. Relationship between our problem and Bayesian model uncertainty paradigm

Under our model uncertainty paradigm, we rely on Bernardo and Smith’s [1] perspectives of M-closed, M-complete, and M-open for the relationship between the multiple realizations of dose vectors produced from the 2DMC algorithm and the true exposure model. The M-closed perspective is the situation in which the dose reconstruction method describes the ‘true’ exposure model exactly and thus is able to produce the true exposure in one of its multiple realizations of possibly true dose vectors. Under M-complete, the ‘true’ exposure model exists but the 2DMC algorithm is unable to reproduce the true dose vector exactly. The M-open perspective is similar to the M-complete; the difference is that we cannot assume that the true exposure model is captured within the multiple realizations of dose vectors produced by the 2DMC dose estimation algorithm. In that case, we cannot develop any prior beliefs about the probability that each realization of 2DMC is equal to the true exposure model without having additional information. Most dose estimations for cohorts supporting epidemiologic studies are located somewhere between the extremes of the perspectives of M-complete and M-open, since we cannot guarantee that the dosimetry system generates the true dose vector.

S2. Stochastic Approximation Monte Carlo (SAMC)

We give a brief review of the SAMC method [2]. We can reformulate the posterior distribution in terms of $U(\cdot)$, the energy function:

$$p(\alpha, \beta, \gamma \mid \text{Data}) = \exp\{-U(\alpha, \beta, \gamma)\}/Z, \qquad (\alpha, \beta, \gamma) \in \Theta,$$

where $Z$ is the normalizing constant and $\Theta$ is the parameter space. Let $\theta$ denote a set of parameters, $\theta = (\alpha_1, \ldots, \alpha_J, \beta, \gamma)$ and $\theta \in \Theta$.
When we write $U(\theta) = -\log p_0(\theta \mid \text{Data})$, where $p_0(\cdot)$ is the unnormalized posterior distribution, $U$ is equivalent to the negative of the log-posterior distribution in Bayesian computation. Without loss of generality we assume $\Theta$ is compact. In the implementation we set $\Theta = [-10^{100}, 10^{100}]$. We assume that the parameter space can be partitioned according to the energy function, $U(\theta)$, into $S$ disjoint sub-regions denoted by

$$E_1 = \{\theta : U(\theta) \le u_1\},\; E_2 = \{\theta : u_1 < U(\theta) \le u_2\},\; \ldots,\; E_{S-1} = \{\theta : u_{S-2} < U(\theta) \le u_{S-1}\},\; E_S = \{\theta : U(\theta) > u_{S-1}\},$$

where $u_1, \ldots, u_{S-1}$ are real numbers in increasing order and are specified by the user. The SAMC method tries to sample from each sub-region with a pre-specified frequency, $f = (f_1, \ldots, f_S)$ (e.g., equal frequency), using the trial distribution

$$p_w(\theta) \propto \sum_{s=1}^{S} \frac{f_s\, p_0(\theta)}{w_s}\, I(\theta \in E_s),$$

where $I(\cdot)$ denotes an indicator function and $w_s = \int_{E_s} p_0(\theta)\, d\theta$ for $s = 1, \ldots, S$.

Since our problem involves parameter estimation with multiple models (i.e., $K$ linear dose-response models according to $K$ dose vectors), the model space can be partitioned into $K$ disjoint sub-regions $(E_1, \ldots, E_K)$. We use the SAMC model selection approach. We emphasize that our main interest is estimation of the slope of the linear dose response, $\beta$ (ERR Gy$^{-1}$), not selection among different types of dose-response models with shapes that depart from linear. We attempt to identify which among the multiple dose vectors is a good approximation to the unknown true dose vector in this study.

We illustrate how we implemented the SAMC method. Let $M_1, \ldots, M_K$ denote the $K$ models, each of which is associated with one of the $K$ dose vectors. The SAMC method consists of two stages: (1) Metropolis-Hastings (MH) sampling of $\theta$ and (2) weight updating. The weight, $\phi = (\phi_1, \ldots, \phi_K)$, denotes the working estimate of $(\log(w_1/f_1), \ldots, \log(w_K/f_K))$ obtained at each iteration.
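The two stages can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the Gaussian toy model in `log_p0`, the simulated dose vectors, the uniform model proposal (so the proposal ratio cancels), the between-model move that keeps the current slope instead of redrawing it, and the gain factor `t0 / max(t0, t)` are all illustrative choices, and covariate coefficients are omitted. The names `samc`, `log_p0`, `phi`, `t0`, and `step` are hypothetical.

```python
import math
import random


def log_p0(beta, dose, y, sigma=1.0, prior_sd=10.0):
    """Toy unnormalized log-posterior of the slope for one candidate dose vector.

    Assumption: Gaussian outcome y_i = beta * dose_i + noise and a Gaussian
    prior on beta. This stands in for the dose-response likelihood of the paper.
    """
    rss = sum((yi - beta * di) ** 2 for yi, di in zip(y, dose))
    return -0.5 * rss / sigma ** 2 - 0.5 * beta ** 2 / prior_sd ** 2


def samc(doses, y, n_iter=5000, t0=500, step=0.2, seed=0):
    """Two-stage SAMC loop over K candidate dose vectors (illustrative only)."""
    rng = random.Random(seed)
    K = len(doses)
    f = [1.0 / K] * K      # desired sampling frequencies (equal)
    phi = [0.0] * K        # working log-weights, one per dose vector
    counts = [0] * K       # number of iterations spent in each model
    k, beta = 0, 0.0       # current dose-vector index and slope

    for t in range(1, n_iter + 1):
        # Stage 1: MH sampling. The model index is proposed uniformly, so the
        # proposal is symmetric and its ratio drops out of the acceptance rule.
        k_new = rng.randrange(K)
        if k_new == k:
            # Same dose vector: one random-walk MH step on the slope.
            beta_new = beta + step * rng.gauss(0.0, 1.0)
            log_r = log_p0(beta_new, doses[k], y) - log_p0(beta, doses[k], y)
        else:
            # Different dose vector: keep the slope (a simplification) and
            # correct the posterior ratio by the current working weights.
            beta_new = beta
            log_r = (phi[k] - phi[k_new]
                     + log_p0(beta, doses[k_new], y)
                     - log_p0(beta, doses[k], y))
        if rng.random() < math.exp(min(0.0, log_r)):
            k, beta = k_new, beta_new

        # Stage 2: weight update, phi <- phi + gain * (e_t - f).
        # Assumed gain: positive, non-increasing, with a divergent sum.
        gain = t0 / max(t0, t)
        for j in range(K):
            phi[j] += gain * ((1.0 if j == k else 0.0) - f[j])
        counts[k] += 1

    return phi, counts


# Simulated example: the first dose vector is the "true" one, the other two
# carry multiplicative lognormal error (all values are made up).
data_rng = random.Random(1)
true_dose = [data_rng.uniform(0.0, 2.0) for _ in range(30)]
doses = [true_dose] + [
    [d * math.exp(s * data_rng.gauss(0.0, 1.0)) for d in true_dose]
    for s in (0.3, 0.6)
]
y = [1.0 * d + data_rng.gauss(0.0, 1.0) for d in true_dose]

phi, counts = samc(doses, y)

# Normalise exp(phi) in a numerically stable way.
m = max(phi)
weights = [math.exp(p - m) for p in phi]
probs = [w / sum(weights) for w in weights]
print(counts, probs)
```

With equal target frequencies, normalising `exp(phi)` gives a rough working estimate of the relative posterior mass of each candidate dose vector, because the weights track log(w_k/f_k) only up to a common additive constant. The values depend on the run length, the gain factor, and the seed.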
In MH sampling, we generate a sample $\theta^{(t)}$ from a Metropolis-Hastings kernel $K_{\phi^{(t)}}(\theta^{(t)}, \cdot)$ with the proposal distribution $q(\theta^{(t)}, \cdot)$ and the stationary distribution

$$p_{\phi^{(t)}}(\theta) \propto \sum_{k=1}^{K} \frac{p_0(\theta)}{e^{\phi_k^{(t)}}}\, I(\theta \in E_k),$$

where $\phi^{(t)} = (\phi_1^{(t)}, \ldots, \phi_K^{(t)})$ is the weight vector at iteration $t$. Let $Q(M_i \to M_j)$ denote the proposal probability for a transition from model $M_i$ to model $M_j$. The proposal distribution satisfies irreducibility and aperiodicity, as required for convergence.

First, generate the dose vector selection parameter, $\gamma^*$, according to the proposal $Q(M^{(t-1)} \to M^*)$ at iteration $t$. If $\gamma^* = \gamma^{(t-1)}$, then generate $\alpha^*$ and $\beta^*$ from $p(\alpha, \beta \mid X, y, \text{dose vector}_{\gamma^{(t-1)}})$ by a single MCMC iteration and set $(\alpha^{(t)}, \beta^{(t)}, \gamma^{(t)}) = (\alpha^*, \beta^*, \gamma^*)$. If $\gamma^* \ne \gamma^{(t-1)}$, then generate $\alpha^*$ and $\beta^*$ from $p(\alpha, \beta \mid X, y, \text{dose vector}_{\gamma^*})$ and accept $\alpha^*$, $\beta^*$ and $\gamma^*$ with probability

$$\min\left\{1,\; \frac{e^{\phi^{(t-1)}_{M^{(t-1)}}}}{e^{\phi^{(t-1)}_{M^{*}}}} \cdot \frac{p(\alpha^*, \beta^* \mid X, y, \text{dose vector}_{\gamma^*})}{p(\alpha^{(t-1)}, \beta^{(t-1)} \mid X, y, \text{dose vector}_{\gamma^{(t-1)}})} \cdot \frac{Q(M^* \to M^{(t-1)})}{Q(M^{(t-1)} \to M^*)}\right\}.$$

In the weight updating stage, $\phi^* = \phi^{(t-1)} + \delta_t (e_t - f)$, where $\delta_t$ denotes the gain factor sequence, $e_t = (e_1^{(t)}, \ldots, e_K^{(t)})$, and $e_k^{(t)} = 1$ if $\gamma^{(t)} = k$ and 0 otherwise. The gain factor sequence should be a positive, non-increasing sequence satisfying the following conditions:

$$\sum_{t=0}^{\infty} \delta_t = \infty \quad \text{and} \quad \sum_{t=0}^{\infty} \delta_t^{\zeta} < \infty \quad \text{for some } \zeta \in (1, 2).$$

More details on the implementation of SAMC are described in Liang et al. [2].

REFERENCES

1. Bernardo J, Smith A. Bayesian Theory. Wiley: New York, NY, 2000.
2. Liang F, Liu C, Carroll RJ. Stochastic Approximation in Monte Carlo Computation. Journal of the American Statistical Association 2007; 102: 305-320.