Analysis of “Benchmark dose estimation incorporating multiple data sources”
Jerome J. Schleier III

Background
Modeling and decision theory are increasingly being used for model and uncertainty analysis in risk management (Ascough et al. 2008). Information is often generated and gathered from multiple laboratories that derive toxic endpoints such as the benchmark dose (BMD). Decision makers and risk analysts are confronted with the issue of combining data from different laboratories to set an overall threshold for chemicals. However, it has been shown that there is a large amount of inter-laboratory variability, so accounting for this variability in the model can allow better estimation of the dose-response curve (Bailer et al. 2000; Bailer and Oris 1993; Wheeler and Bailer 2009). The analysis by Wheeler and Bailer (2009) builds on the finding of Bailer et al. (2000) that toxicity testing of chemicals shows significant lab-to-lab heterogeneity in the determination of the toxic threshold for sodium chloride. Wheeler and Bailer (2009) therefore propose a hierarchical model that accounts for the underlying heterogeneity among labs and incorporates it into the dose-response equation for sodium chloride.

Methods
Wheeler and Bailer (2009) modeled the BMD using a two-stage hierarchical model to incorporate toxicology data from multiple laboratories. They use reproductive inhibition (RI), which they define as the concentration that reduces the expected brood size, relative to the control (zero concentration), by some proportion p. They set the RI25 and RI50 as their BMDs, which they estimate both for the population and for the individual laboratories (Bailer et al. 2000; Bailer and Oris 1993). The data they use are from the U.S. Environmental Protection Agency Region IX reference toxicity database on the effect of sodium chloride on the reproduction of Ceriodaphnia dubia (Wheeler and Bailer 2009).
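To make the RI definition concrete, here is a minimal sketch of how RI_p can be computed from a fitted curve, assuming the quadratic log-linear mean of Bailer et al. (2000), $\log\lambda(c) = \beta_0 + \beta_1 c + \beta_2 c^2$. Setting $\lambda(c)/\lambda(0) = 1 - p$ gives $\beta_1 c + \beta_2 c^2 = \log(1-p)$, a quadratic in $c$; the sketch takes its smallest positive root as RI_p. The function name and coefficient values are hypothetical illustrations, not Wheeler and Bailer's (2009) code or estimates.

```python
import math


def reproductive_inhibition(p, beta1, beta2):
    """Return RI_p: the smallest concentration c > 0 at which the expected
    brood size falls to (1 - p) times the control (c = 0) mean.

    Assumes log mu(c) = beta0 + beta1*c + beta2*c**2; beta0 cancels in the
    ratio mu(c) / mu(0), so it is not needed. Returns None if no such c exists.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    target = math.log(1.0 - p)  # log of the retained fraction (negative)

    # Solve beta2*c**2 + beta1*c - target = 0 for c > 0.
    if beta2 == 0.0:
        if beta1 == 0.0:
            return None  # flat curve: never reaches the target
        c = target / beta1
        return c if c > 0.0 else None

    disc = beta1 * beta1 + 4.0 * beta2 * target
    if disc < 0.0:
        return None  # curve never falls as low as (1 - p) * mu(0)
    root = math.sqrt(disc)
    candidates = [(-beta1 - root) / (2.0 * beta2), (-beta1 + root) / (2.0 * beta2)]
    positive = [c for c in candidates if c > 0.0]
    return min(positive) if positive else None


# Hypothetical coefficients (not estimates from Wheeler and Bailer 2009).
ri25 = reproductive_inhibition(0.25, beta1=0.1, beta2=-0.5)
ri50 = reproductive_inhibition(0.50, beta1=0.1, beta2=-0.5)
```

Applying the same function to every posterior draw of $(\beta_1, \beta_2)$ would yield a posterior sample of RI_p, from which means and credible intervals can be summarized.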
To estimate the population response they use a Poisson model with mean $\lambda_{jc}$ (the expected brood size for lab $j$ at concentration $c$) and a logarithmic link function that relates the mean response to a quadratic polynomial in concentration. The fixed-effects form was determined by Bailer et al. (2000), and Wheeler and Bailer (2009) added random effects for laboratory heterogeneity:

$$\log(\lambda_{jc}) = (\beta_0 + b_{0j}) + (\beta_1 + b_{1j})c + (\beta_2 + b_{2j})c^2$$

where $b_{kj}$ is the random effect of the $j$th lab on the $k$th coefficient (so $\beta_{kj} = \beta_k + b_{kj}$). The prior distributions were chosen to be diffuse for all estimated parameters. The lab-specific coefficient vectors were given the prior $\boldsymbol{\beta}_j \sim N_3(\boldsymbol{\beta}, D)$, where

$$D = \begin{bmatrix} \sigma_{00} & 0 & 0 \\ 0 & \sigma_{11} & 0 \\ 0 & 0 & \sigma_{22} \end{bmatrix}$$

Equivalently, $\beta_{kj} \sim N(\beta_k, \sigma_{kk})$ for $k = 0, 1, 2$. Uncertainty in these parameters was reflected by $\beta_k \sim N(0, 10^6)$ and $\sigma_{kk} \sim \text{InvGamma}(0.001, 0.001)$. They ran three chains, each with a burn-in of 10,000 iterations, and once the model had converged, 20,000 additional samples were drawn from the posterior distributions of the model parameters. A difficulty with their notation is that they do not define what represents. This is troubling because the notation in the probability density function needs to be changed so that it appropriately represents the prior distribution of the parameters; I am assuming that = p. If is not equal to p, then when the distributions are multiplied together the result would contain p, , and x, and it would be unclear how is related to p (Lynch 2007). For comparison, Wheeler and Bailer (2009) perform the same analysis on the pooled data using

$$\log(\lambda_{c}) = \beta_0 + \beta_1 c + \beta_2 c^2$$

All of the parameters in this model were given the same prior, $\beta_k \sim N(0, \sigma^2 = 10^6)$.

Discussion
Wheeler and Bailer's (2009) model can be thought of as representing the underlying dose-response, modeled with the quadratic Poisson regression, with the underlying laboratory variability added in. Their interpretation of the “true” estimate is really the estimated average response across all of the laboratories based on the quadratic Poisson regression.
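The two-stage structure described above can be made concrete by simulating data from it: lab-specific coefficient vectors are drawn around the population coefficients with diagonal covariance $D$, and brood counts are then drawn from a Poisson distribution with each lab's quadratic log-mean. This is a generative illustration under stated assumptions, not Wheeler and Bailer's (2009) fitting code (they estimated the model by MCMC); the function name, coefficients, variances, concentrations, and replicate counts are all hypothetical.

```python
import numpy as np


def simulate_hierarchical_broods(beta, sigma_diag, conc, n_labs, n_reps, seed=0):
    """Simulate brood counts from a two-stage hierarchical Poisson model.

    Stage 2 (labs):  beta_j ~ N3(beta, D), D = diag(sigma_00, sigma_11, sigma_22)
    Stage 1 (data):  y ~ Poisson(lambda_jc),
                     log(lambda_jc) = beta_0j + beta_1j*c + beta_2j*c**2
    """
    rng = np.random.default_rng(seed)
    # Diagonal D: lab effects on the three coefficients are independent.
    D = np.diag(np.asarray(sigma_diag, dtype=float))
    beta_j = rng.multivariate_normal(np.asarray(beta, dtype=float), D, size=n_labs)  # (n_labs, 3)

    c = np.asarray(conc, dtype=float)
    X = np.column_stack([np.ones_like(c), c, c ** 2])  # design matrix, (n_conc, 3)
    lam = np.exp(beta_j @ X.T)                          # lab-specific means, (n_labs, n_conc)

    # n_reps organisms per lab and concentration; lam broadcasts over the last axis.
    y = rng.poisson(lam[:, :, None], size=(n_labs, c.size, n_reps))
    return beta_j, lam, y


# Hypothetical population coefficients, lab-level variances, and concentrations.
beta_j, lam, y = simulate_hierarchical_broods(
    beta=[3.0, 0.1, -0.5], sigma_diag=[0.05, 0.01, 0.01],
    conc=[0.0, 0.5, 1.0, 1.5, 2.0], n_labs=8, n_reps=10)
```

With many simulated labs, `beta_j.mean(axis=0)` approaches `beta`, which is the sense in which the population-level coefficients summarize the average laboratory.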
Therefore, their model can be simplified and thought of as a collection of quadratic Poisson dose-response regressions that estimate the posterior distribution for each lab (Table 2; Figure 1). Their model can be simplified to

$$\log(\lambda_{jc}) = \beta_{0j} + \beta_{1j} c + \beta_{2j} c^2$$

where the coefficients $\beta_{kj}$ are estimated for each lab, and the average of the $\beta_{kj}$ across labs is the average laboratory response. This is clearly shown in Tables 1 and 2, in which the estimated averages and 95% credible intervals of the RI25 and RI50 across all of the labs are the same as those of the hierarchical model above. Wheeler and Bailer (2009) assume that the model of Bailer et al. (2000) is the appropriate dose-response model for sodium chloride in C. dubia. This may not be a correct assumption, because Bailer et al. (2000) selected their model based on visual assessment of fit rather than on a biological basis (discussed below). Wheeler and Bailer (2009) used the deviance information criterion (DIC) to compare the pooled-data model with the hierarchical model that incorporates laboratory variability (Table 1). They argue that the model incorporating laboratory variability provides a better fit based on DIC. This argument resembles a frequentist model comparison, similar to using ANOVA to compare models. Figure 1 clearly shows a large amount of variability in the estimated responses among laboratories. As we discussed in class, model selection should have a theoretical basis in which two competing theories are both plausible. I do not agree with their interpretation, because if they know from previous studies that there is large laboratory variability (their analysis and Figure 1 also clearly demonstrate it), then the model that best describes the data should incorporate it.
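For reference, DIC is defined as $\bar{D} + p_D$, where $D(\theta) = -2\log p(y \mid \theta)$, $\bar{D}$ is its posterior mean, and $p_D = \bar{D} - D(\hat\theta)$ is the effective number of parameters evaluated at a posterior point estimate $\hat\theta$. Because of the $p_D$ term, extra coefficients lower DIC only when the gain in fit outweighs the added effective complexity. The sketch below computes DIC for a Poisson likelihood from posterior draws of the fitted means; plugging in the posterior mean of each fitted mean (rather than of the regression coefficients) is a simplifying assumption, and the counts and draws are hypothetical.

```python
import math


def poisson_deviance(y, mu):
    """D = -2 * sum_i log Poisson(y_i | mu_i)."""
    return -2.0 * sum(yi * math.log(mi) - mi - math.lgamma(yi + 1)
                      for yi, mi in zip(y, mu))


def dic(y, mu_draws):
    """Return (DIC, pD) from posterior draws of the fitted means.

    mu_draws: list of draws, each a list of fitted means aligned with y.
    Assumption of this sketch: D(theta_hat) uses the posterior mean of each mu_i.
    """
    d_bar = sum(poisson_deviance(y, mu) for mu in mu_draws) / len(mu_draws)
    mu_hat = [sum(col) / len(col) for col in zip(*mu_draws)]
    p_d = d_bar - poisson_deviance(y, mu_hat)  # effective number of parameters
    return d_bar + p_d, p_d


# Hypothetical brood counts and three posterior draws of their fitted means.
y_obs = [12, 9, 4]
draws = [[11.0, 9.5, 4.2], [12.5, 8.7, 3.9], [11.8, 9.1, 4.5]]
dic_value, p_d = dic(y_obs, draws)
```

Computing this quantity for both the pooled and the hierarchical fits and taking the difference is the kind of comparison summarized in Table 1.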
In addition, an analysis that takes this variability into account will tend to fit the data better because more coefficients are being estimated. Generally, adding parameters to a model improves its fit when those parameters are linked to the data and provide information that the other parameters do not. Wheeler and Bailer (2009) also argue that analyzing the data with all of the laboratories pooled could lead to toxicity measurements that underestimate the “true” toxicity of the chemical. Yet their analysis does not lend credence to this strong statement, because the estimated RI25 is lower for the pooled-data model than for the hierarchical model that accounts for laboratory variability. In chemical risk assessment, the lowest toxicity value is generally used unless there is strong evidence that a higher value is the best estimate. I do not agree with their use of “true”, because the response of a population to any chemical is a distribution, with some individuals more or less susceptible depending on their ability to metabolize the chemical and on their physical health. Therefore, all toxicological analyses should be treated as distributions; however, deterministic risk assessments of chemicals, especially pesticides, use point estimates based on the average response of the cohort. Wheeler and Bailer (2009) clearly demonstrate in their analysis that there is no single “true” value for a given point estimate, because the variability is large even when standardized protocols are used. Wheeler and Bailer (2009) state “… the curve defined by the posterior mean values of (β0, β1, β2), suggests an increased response associated with sodium chloride at low concentrations (i.e., the expectation of the linear term is positive by the 95% credible intervals contain 0)…”.
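The quoted claim turns on the sign of $\beta_1$. With a log link, $\frac{d}{dc}\log\lambda(c) = \beta_1 + 2\beta_2 c$, which equals $\beta_1$ at $c = 0$, so the expected brood size rises at low concentrations only if $\beta_1 > 0$; for a concave curve ($\beta_2 < 0$) the peak is then at $c^* = -\beta_1/(2\beta_2)$. The sketch below summarizes this from posterior draws, assuming such draws are available; the function names and draw values are hypothetical, and the interval is a simple nearest-rank equal-tailed interval.

```python
import math


def equal_tailed_interval(draws, level=0.95):
    """Nearest-rank equal-tailed credible interval from posterior draws."""
    s = sorted(draws)
    n = len(s)
    lower = s[int((1.0 - level) / 2.0 * (n - 1))]
    upper = s[int(math.ceil((1.0 + level) / 2.0 * (n - 1)))]
    return lower, upper


def linear_term_summary(beta1_draws, beta2_draws, level=0.95):
    """Posterior P(beta1 > 0), the beta1 interval, and peak locations.

    Peaks c* = -beta1 / (2*beta2) are computed only for draws with
    beta1 > 0 and beta2 < 0 (an initial rise followed by a decline).
    """
    prob_increase = sum(b1 > 0 for b1 in beta1_draws) / len(beta1_draws)
    interval = equal_tailed_interval(beta1_draws, level)
    peaks = [-b1 / (2.0 * b2) for b1, b2 in zip(beta1_draws, beta2_draws)
             if b1 > 0 and b2 < 0]
    return prob_increase, interval, peaks


# Hypothetical posterior draws of (beta1, beta2).
prob, interval, peaks = linear_term_summary([0.2, -0.1, 0.3, 0.1], [-0.5, -0.5, -1.0, 0.2])
```

An interval for $\beta_1$ that straddles 0 while $P(\beta_1 > 0)$ is well below 1 would indicate that the low-concentration increase is not well supported by the posterior.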
They do not elaborate on the implications of the 95% credible interval for β1 encompassing 0. This could matter for their model, because if the response is not quadratic with a positive linear term, their interpretation of an increased response at low sodium chloride concentrations would be incorrect. The authors could have addressed this question in biological terms by examining the rearing requirements of C. dubia. Winner (1989) showed that a sodium concentration of 26.3 mg/L produced a higher reproduction rate than a sodium concentration of 4.6 mg/L, demonstrating that increases in sodium content can have a positive effect on reproduction in C. dubia. Wheeler and Bailer (2009) also present their analysis as if it were surprising that laboratories show large variability. However, this has been known, and the differences are associated with the age, sex, environmental rearing conditions, and susceptibility of the organisms (Schleier III and Peterson in press). I think that Wheeler and Bailer's (2009) analysis is important because it can be used to estimate endpoints when multiple laboratories contribute data, and it can also show which laboratories have credible intervals that do not overlap with the others. In fact, their analysis shows that the 95% credible intervals of laboratory CAMEC1 for both the RI25 and RI50 lie below those of all of the other laboratories (Table 2). This demonstrates the power of a Bayesian hierarchical model: distributions can be obtained for the overall response and for each laboratory, and they can be used to evaluate both fixed and random effects. This is also important because it shows that the laboratory may have quality control issues or a population with higher susceptibility than any other.
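The comparison described above (a lab whose credible interval lies entirely below every other lab's) can be automated once posterior draws of each lab's RI are available. The sketch below is a hypothetical illustration: the lab names, draws, and helper functions are invented for the example and do not reproduce Table 2.

```python
import math


def equal_tailed_interval(draws, level=0.95):
    """Nearest-rank equal-tailed credible interval from posterior draws."""
    s = sorted(draws)
    n = len(s)
    lower = s[int((1.0 - level) / 2.0 * (n - 1))]
    upper = s[int(math.ceil((1.0 + level) / 2.0 * (n - 1)))]
    return lower, upper


def labs_entirely_below(intervals):
    """Labs whose upper limit lies below every other lab's lower limit."""
    flagged = []
    for lab, (_, upper) in intervals.items():
        others = [iv for name, iv in intervals.items() if name != lab]
        if others and all(upper < other_lower for other_lower, _ in others):
            flagged.append(lab)
    return flagged


# Hypothetical posterior draws of RI25 for three labs (arbitrary units).
draws = {
    "lab_A": [0.40, 0.45, 0.50, 0.55, 0.60],
    "lab_B": [0.90, 1.00, 1.10, 1.20, 1.30],
    "lab_C": [0.80, 0.95, 1.05, 1.15, 1.25],
}
intervals = {lab: equal_tailed_interval(d) for lab, d in draws.items()}
print(labs_entirely_below(intervals))  # ['lab_A']
```

A flagged lab is a candidate for follow-up (quality control or a more susceptible culture), not proof of either.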
There is a distinct need in chemical risk assessment for incorporating data from different sources, especially when there is known underlying variability in the data (Assmuth and Hilden 2008; Ellison 1996). Decision makers should use Wheeler and Bailer's (2009) approach when data are available from multiple sources, to obtain better estimates of toxicological endpoints. Bayesian analysis techniques have been underutilized in environmental and public health, risk assessment, ecology, and the environmental sciences (Clark 2005). Their method derives toxicological endpoints from multiple sources of data, providing a framework that assessors and managers can use to characterize the underlying distributions and uncertainty of toxicological analyses (Assmuth and Hilden 2008; Linkov et al. 2009).

Table 1: Comparison of the posterior estimates of $\beta_k$ and of the 25% and 50% reproductive inhibition concentrations (RI25 and RI50) for the pooled-data model and the hierarchical model accounting for laboratory variability.

Table 2: Posterior mean and standard deviation estimates of the lab-specific and population-average RI25 and RI50 for the hierarchical model accounting for laboratory variability.

Figure 1: Posterior means of the expected number of offspring given lab-source variability. The solid line represents the population average and the dotted lines represent the individual laboratories.

Literature Cited
Ascough, J.C., H.R. Maier, J.K. Ravalico, and M.W. Strudley. 2008. Future research challenges for incorporation of uncertainty in environmental and ecological decision-making. Ecological Modelling 219: 383-399.
Assmuth, T., and M. Hilden. 2008. The significance of information frameworks in integrated risk assessment and management. Environmental Science and Policy 11: 71-86.
Bailer, A.J., M.R. Hughes, D.L. Denton, and J.T. Oris. 2000. An empirical comparison of effective concentration estimators for evaluating aquatic toxicity test responses. Environmental Toxicology and Chemistry 19: 141-150.
Bailer, A.J., and J.T. Oris. 1993. Modeling reproductive toxicity in Ceriodaphnia tests. Environmental Toxicology and Chemistry 12: 787-791.
Clark, J.S. 2005. Why environmental scientists are becoming Bayesians. Ecology Letters 8: 2-14.
Ellison, A.M. 1996. An introduction to Bayesian inference for ecological research and environmental decision-making. Ecological Applications 6: 1036-1046.
Linkov, I., D. Loney, S. Cormier, F.K. Satterstrom, and T. Bridges. 2009. Weight-of-evidence evaluation in environmental assessment: Review of qualitative and quantitative approaches. Science of the Total Environment 407: 5199-5205.
Lynch, S.M. 2007. Bayesian statistics and estimation for social scientists. Springer Science+Business Media, LLC, New York, NY, USA.
Schleier III, J.J., and R.K.D. Peterson. In press. Pyrethrins and pyrethroid insecticides. In: O. Lopez and J.G. Fernández-Bolaños (eds.), Green Trends in Insect Control. Royal Society of Chemistry, London.
Wheeler, M.W., and A.J. Bailer. 2009. Benchmark dose estimation incorporating multiple data sources. Risk Analysis 29: 249-256.
Winner, R.W. 1989. Multigeneration life-span tests of the nutritional adequacy of several diets and culture waters for Ceriodaphnia dubia. Environmental Toxicology and Chemistry 8: 513-520.