Estimation Techniques for Dose-response Functions (Bahman Shafii)

Estimation Techniques
for Dose-response Functions
Presented by
Bahman Shafii, Ph.D.
Statistical Programs
College of Agricultural and Life Sciences
University of Idaho
Acknowledgments
• Research partially funded by USDA-ARS Hatch Project
IDA01412, Idaho Agricultural Experiment Station.
• Collaborators:
• William J. Price, Ph.D., Statistical Programs, University of Idaho.
• Steven Seefeldt, Ph.D., USDA-ARS, University of Alaska Fairbanks.
Introduction
• Dose-response models are common in agricultural
research.
• They can encompass many types of problems:
• Modeling environmental effects due to
exposure to chemical or temperature regimes.
• Estimation of time dependent responses such as
germination, emergence, or hatching.
(e.g. Shafii and Price 2001; Shafii, et al. 2009)
• Bioassay assessments via calibration curves and
quantal estimation. (e.g. Shafii and Price 2006)
Estimation
• Curve estimation.
• Linear or non-linear techniques.
• Estimate other quantities:
• percentiles.
• typically: LD50, LC50, EC50, etc.
• percentile estimation problematic.
• inverted solutions.
• unknown distributions.
• approximate variances.
• The response distribution:
• Continuous
• Normal
• Log Normal
• Gamma, etc.
• Discrete - quantal responses
• Binomial, Multinomial (yes/no)
• Poisson (count)
• The response form:
Typically expressed as a nonlinear curve
• increasing or decreasing sigmoidal form
• increasing or decreasing asymptotic form
[Figure: dose-response curve; axes: Dose (horizontal), Response (vertical)]
Bioassay and Calibration
• Given a dose-response curve and an observed
response:
• What dose generated the response?
• What is the probability of a dose given an
observed response and the calibration curve?
• This problem fits naturally into a Bayesian framework.
[Figure: calibration curve relating a measured response to an unknown dose; axes: Dose (horizontal), Response (vertical)]
• Typical dose-response estimation assumes that the functional form, or tolerance distribution, is known, e.g. a sigmoidal shape.
• In some cases, however, it may be advantageous to relax this assumption and restrict estimation to a family of dose-response forms, for example when:
• The dose-response population consists of a mixture of subpopulations which cannot be sampled separately.
• The dose-response series exhibits a more complex behavior than a simple sigmoidal shape, e.g. hormesis.
• Objectives
• Outline estimation methods for dose-response models.
• Traditional approaches.
• Probit - Least Squares.
• Modern approaches.
• Probit - Maximum Likelihood
• Generalized non-linear models.
• Bayesian solutions.
• Objectives
• Demonstrate solutions for calibration of an
unknown dose with a binary response
assuming:
• A known dose-response form.
• Standard MLE estimation.
• Standard Parametric Bayesian estimation.
• A family of dose-response forms.
• Nonparametric Bayesian estimation.
Estimation Methods
Traditional Approach
• Probit Analysis - Least Squares
• A linearized least squares estimation (Bliss, 1934; Fisher, 1935; Finney, 1971):
$\mathrm{Probit}_{ij} = F^{-1}(p_{ij}) = b_0 + b_1 \cdot dose_i + e_{ij}$
where $p_{ij} = y_{ij} / N$ and $y_{ij}$ is the number of successes out of $N$ trials in the jth replication of the ith dose. $b_0$ and $b_1$ are regression parameters and $e_{ij}$ is a random error; $e_{ij} \sim N(0, s^2)$.
• Minimize:
$SS_{error} = \sum_{ij} (\mathrm{Probit}_{ij} - \widehat{\mathrm{Probit}}_i)^2$   (1)
• F is a convenient CDF form or "tolerance distribution", e.g.
• Normal: $p_{ij} = \frac{1}{s\sqrt{2\pi}} \int_{-\infty}^{dose_i} \exp\left(-\frac{(x - \mu)^2}{2 s^2}\right) dx$
• Logistic: $p_{ij} = 1 / (1 + \exp(-b_1 (dose_i - b_0)))$
• Modified Logistic (e.g. Seefeldt et al. 1995): $p_{ij} = C + (M - C) / (1 + \exp(-b_1 (dose_i - b_0)))$
• Gompertz: $p_{ij} = b_0 (1 - \exp(\exp(-b_1 \cdot dose_i)))$
• Exponential: $p_{ij} = b_0 \exp(-b_1 \cdot dose_i)$
• SAS: PROC REG.
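The linearized least squares idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Normal CDF is taken as the tolerance distribution, the data are hypothetical, the function name `probit_least_squares` is invented here, and doses with an observed proportion of 0 or 1 are simply dropped because their probit is undefined.

```python
from statistics import NormalDist

def probit_least_squares(doses, successes, n_trials):
    """Fit probit(p) = b0 + b1 * dose by ordinary least squares."""
    inv_cdf = NormalDist().inv_cdf
    xs, zs = [], []
    for dose, y in zip(doses, successes):
        p = y / n_trials
        if 0.0 < p < 1.0:  # the probit is undefined at p = 0 or p = 1
            xs.append(dose)
            zs.append(inv_cdf(p))
    n = len(xs)
    x_bar = sum(xs) / n
    z_bar = sum(zs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxz = sum((x - x_bar) * (z - z_bar) for x, z in zip(xs, zs))
    b1 = sxz / sxx
    b0 = z_bar - b1 * x_bar
    return b0, b1

# Hypothetical data: five doses, 50 trials at each dose.
doses = [1.0, 2.0, 3.0, 4.0, 5.0]
successes = [4, 12, 25, 38, 46]
b0, b1 = probit_least_squares(doses, successes, 50)
ld50 = -b0 / b1  # dose at which the fitted probit equals zero
```

Because the hypothetical proportions are symmetric about dose 3, the fitted LD50 falls at that dose, which makes the sketch easy to check by hand.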
Modern Approaches
• Probit Analysis - Maximum Likelihood
• The responses, $y_{ij}$, are assumed binomial at each dose i with parameter $\theta_i$. Using the joint likelihood, $L(\theta_i)$:
Maximize:
$L(\theta_i) \propto \prod_{ij} (\theta_i)^{y_{ij}} (1 - \theta_i)^{(N - y_{ij})}$   (2)
for data set $y_{ij}$, where $\theta_i = F(b_0 + b_1 \cdot dose_i)$ and $b_0$, $b_1$, and $dose_i$ are those given previously.
• The CDF, F, is typically defined as a Normal, Logistic, or Gompertz distribution as given above.
• SAS: PROC PROBIT.
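A minimal Python sketch of maximizing the binomial likelihood follows. It assumes the Logistic CDF for F and uses Newton-Raphson steps, which is one common way to maximize this likelihood and not necessarily what any particular SAS procedure does. The function names and the data are hypothetical.

```python
import math

def logistic_cdf(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_binomial_logit(doses, successes, trials, iterations=25):
    """Maximize the binomial likelihood with success probability F(b0 + b1*dose)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        # Score vector (g0, g1) and information matrix [[h00, h01], [h01, h11]].
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y, n in zip(doses, successes, trials):
            theta = logistic_cdf(b0 + b1 * x)
            w = n * theta * (1.0 - theta)
            g0 += y - n * theta
            g1 += (y - n * theta) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information) * step = score.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical data: five doses, 50 trials at each dose.
doses = [1.0, 2.0, 3.0, 4.0, 5.0]
successes = [4, 12, 25, 38, 46]
trials = [50] * 5
b0_ml, b1_ml = fit_binomial_logit(doses, successes, trials)
ld50_ml = -b0_ml / b1_ml  # percentile obtained by inverting the fitted curve
```

Note that the percentile is still recovered by inversion of the fitted curve, which is the limitation discussed for probit analysis.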
Probit Analysis
• Limitations:
• Least squares limited.
• Linearized solution to a non-linear problem.
• Even under ML, solution for percentiles approximated.
• inversion.
• use of the ratio b0/b1 (Fieller, 1944).
• Appropriate only for proportional data.
• Assumes the response $F^{-1}(p_{ij}) \sim N(\mu, s^2)$.
• Interval estimation and comparison of percentile
values approximated.
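To make the inversion step concrete, the sketch below solves the fitted probit line for the dose at a chosen percentile and attaches a first-order (delta method) standard error. The delta method is used here as a simpler stand-in for Fieller's theorem, not as an implementation of it. The parameter estimates, variances, and covariance are hypothetical values chosen for illustration.

```python
import math
from statistics import NormalDist

def dose_at_percentile(b0, b1, p, var_b0, var_b1, cov_b01):
    """Invert probit(p) = b0 + b1*dose; return the dose and a delta-method standard error."""
    z = NormalDist().inv_cdf(p)
    dose = (z - b0) / b1
    # First-order variance of the ratio (z - b0) / b1.
    variance = (var_b0 + 2.0 * dose * cov_b01 + dose ** 2 * var_b1) / b1 ** 2
    return dose, math.sqrt(variance)

# Hypothetical estimates and (co)variances.
ld50_est, ld50_se = dose_at_percentile(-2.1, 0.7, 0.5, 0.04, 0.004, -0.011)
```

The interval built from such a standard error is symmetric and approximate, which is the sense in which interval estimation for percentiles is described as approximated.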
Modern Approaches (cont)
• Nonlinear Regression - Iterative Least Squares
• Directly models the response as:
$y_{ij} = f(dose_i) + e_{ij}$
where $y_{ij}$ is an observed continuous response, $f(dose_i)$ may be generalized to any continuous function of dose, and $e_{ij} \sim N(0, s^2)$.
• Minimize: $SS_{error} = \sum_{ij} [y_{ij} - f(dose_i)]^2$.   (3)
• SAS: PROC NLIN.
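Iterative least squares can be illustrated with a small Gauss-Newton loop in Python. This sketch assumes the exponential form f(dose) = b0 exp(-b1 dose), uses noise-free hypothetical data so the answer is known in advance, and starts from hand-picked values. Gauss-Newton is one common iteration scheme and is shown here only as an example.

```python
import math

def gauss_newton_exponential(doses, ys, b0, b1, iterations=50):
    """Minimize sum [y - b0*exp(-b1*dose)]^2 by Gauss-Newton iterations."""
    for _ in range(iterations):
        a00 = a01 = a11 = g0 = g1 = 0.0
        for x, y in zip(doses, ys):
            e = math.exp(-b1 * x)
            r = y - b0 * e        # residual at the current estimates
            j0 = e                # derivative of f with respect to b0
            j1 = -b0 * x * e      # derivative of f with respect to b1
            a00 += j0 * j0
            a01 += j0 * j1
            a11 += j1 * j1
            g0 += j0 * r
            g1 += j1 * r
        det = a00 * a11 - a01 * a01
        # Solve (J'J) * step = J'r for the 2x2 case.
        b0 += (a11 * g0 - a01 * g1) / det
        b1 += (a00 * g1 - a01 * g0) / det
    return b0, b1

# Hypothetical noise-free data generated with b0 = 10 and b1 = 0.5.
doses = [0.0, 1.0, 2.0, 4.0, 8.0]
ys = [10.0 * math.exp(-0.5 * x) for x in doses]
b0_nl, b1_nl = gauss_newton_exponential(doses, ys, b0=8.0, b1=0.3)
```

With real data the iterations depend on the starting values, and poor choices can prevent convergence.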
• Nonlinear Regression - Iterative Least Squares
• Limitations:
• assumes the data, $y_{ij}$, are continuous, although they could be discrete.
• the response distribution may not be Normal, i.e. the assumption $e_{ij} \sim N(0, s^2)$ may not hold.
• standard errors and inference are asymptotic.
• treatment comparisons are difficult in PROC NLIN and require:
• differential sums of squares, or
• specialized SAS code, e.g. PROC IML.
Modern Approaches (cont)
• Generalized Nonlinear Model - Maximum Likelihood
• Directly models the response as:
$y_{ij} = f(dose_i) + e_{ij}$
where $y_{ij}$ and $f(dose_i)$ are as defined above.
• Estimation through maximum likelihood where the response distribution may take on many forms:
Normal: $y_{ij} \sim N(\mu_i, s)$,
Binomial: $y_{ij} \sim \mathrm{bin}(N, \theta_i)$,
Poisson: $y_{ij} \sim \mathrm{poisson}(\lambda_i)$, or
in general: $y_{ij} \sim f(\theta)$.
• Generalized Nonlinear Model - Maximum Likelihood
• Maximize:
$L(\theta) \propto \prod_{ij} f(\theta \mid y_{ij})$   (4)
• Nonlinear estimation.
• Response distribution not restricted to Normal.
• May also incorporate random components into the model.
• Treatment comparisons easier in SAS.
• Contrast and estimate statements.
• SAS: PROC NLMIXED.
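The point that only the response distribution changes, while the dose-response curve stays the same, can be sketched as follows. The function names, the exponential curve, and the count data are all hypothetical; the log-densities are the standard Normal, binomial, and Poisson forms.

```python
import math

def log_density(y, mean, family, n_trials=None, s=None):
    """Log-density of one observation; `mean` is the curve value f(dose)."""
    if family == "normal":
        return -0.5 * math.log(2 * math.pi * s * s) - (y - mean) ** 2 / (2 * s * s)
    if family == "binomial":  # here `mean` is the success probability
        log_choose = (math.lgamma(n_trials + 1) - math.lgamma(y + 1)
                      - math.lgamma(n_trials - y + 1))
        return log_choose + y * math.log(mean) + (n_trials - y) * math.log(1 - mean)
    if family == "poisson":
        return y * math.log(mean) - mean - math.lgamma(y + 1)
    raise ValueError(family)

def log_likelihood(curve, params, doses, ys, family, **extra):
    """Joint log-likelihood: sum of log-densities over all observations."""
    return sum(log_density(y, curve(x, *params), family, **extra)
               for x, y in zip(doses, ys))

def exponential_curve(dose, b0, b1):
    return b0 * math.exp(-b1 * dose)

# Hypothetical count data that decline with dose.
doses = [0.0, 1.0, 2.0, 4.0, 8.0]
counts = [10, 6, 4, 1, 0]
near = log_likelihood(exponential_curve, (10.0, 0.5), doses, counts, "poisson")
far = log_likelihood(exponential_curve, (10.0, 0.05), doses, counts, "poisson")
```

Parameter values that track the data give the larger log-likelihood (`near` exceeds `far`), and a numerical optimizer would search for the maximum of the same function.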
• Generalized Nonlinear Model - Inference
• Formulate a full dummy variable model encompassing k
treatments.
• The joint likelihood over the k treatments becomes:
$L(\theta_k) \propto \prod_{ijk} f(\theta_k \mid y_{ijk})$   (5)
where $y_{ijk}$ is the jth replication of the ith dose in the kth treatment and $\theta_k$ are the parameters of the kth treatment.
• Comparison of parameter values is then possible through
single and multiple degree of freedom contrasts.
• Generalized Nonlinear Model
• Limitations
• percentile solution may still be based on inversion or
Fieller’s theorem.
• inferences based on normal theory approximations.
• standard errors and confidence intervals asymptotic.
Modern Approaches (cont)
• Bayesian Estimation - Iterative Numerical Techniques
• Considers the probability of the parameters, $\theta$, given the data $y_{ij}$.
• Using Bayes theorem, estimate:
$p(\theta \mid y_{ij}) = \frac{p(y_{ij} \mid \theta)\, p(\theta)}{\int p(y_{ij} \mid \theta)\, p(\theta)\, d\theta}$   (6)
where $p(\theta \mid y_{ij})$ is the posterior distribution of $\theta$ given the data $y_{ij}$, $p(y_{ij} \mid \theta)$ is the likelihood defined above, and $p(\theta)$ is a prior probability distribution for the parameters $\theta$.
• Bayesian Estimation - Iterative Numerical Techniques
• Nonlinear estimation.
• Percentiles can be found from the distribution of $\theta$.
• The likelihood is the same as in the Generalized Nonlinear Model.
• flexibility in the response distribution.
• f(dosei) any continuous function of dose.
• Inherently allows updating of the estimation.
• Correct interval estimation (credible intervals).
• agrees well with GNLM at midrange percentiles.
• can perform better at extreme percentiles.
• SAS: PROC MCMC.
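A minimal random-walk Metropolis sampler in Python illustrates how percentiles and credible intervals come directly from posterior draws. The sketch assumes the logistic form with the dose at 50% response as a parameter, flat priors on both parameters, hypothetical data, and hand-picked starting values and proposal scales. It is not a substitute for a tuned MCMC procedure.

```python
import math
import random

def log_sigmoid(z):
    """Numerically stable log(1 / (1 + exp(-z)))."""
    return -math.log1p(math.exp(-z)) if z >= 0 else z - math.log1p(math.exp(z))

def log_posterior(b0, b1, doses, successes, trials):
    """Binomial log-likelihood for p = 1/(1 + exp(-b1*(dose - b0))); flat priors assumed."""
    total = 0.0
    for x, y, n in zip(doses, successes, trials):
        eta = b1 * (x - b0)
        total += y * log_sigmoid(eta) + (n - y) * log_sigmoid(-eta)
    return total

def metropolis(doses, successes, trials, n_samples=5000, seed=1):
    rng = random.Random(seed)
    b0, b1 = 3.0, 1.0  # hypothetical starting values
    current = log_posterior(b0, b1, doses, successes, trials)
    draws = []
    for _ in range(n_samples):
        cand0 = b0 + rng.gauss(0.0, 0.15)
        cand1 = b1 + rng.gauss(0.0, 0.15)
        cand = log_posterior(cand0, cand1, doses, successes, trials)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, cand - current)):
            b0, b1, current = cand0, cand1, cand
        draws.append((b0, b1))
    return draws[n_samples // 5:]  # discard the first 20% as burn-in

doses = [1.0, 2.0, 3.0, 4.0, 5.0]
successes = [4, 12, 25, 38, 46]
trials = [50] * 5
draws = metropolis(doses, successes, trials)
ld50_draws = sorted(d[0] for d in draws)  # b0 is the dose at 50% response
median = ld50_draws[len(ld50_draws) // 2]
lower = ld50_draws[int(0.025 * len(ld50_draws))]
upper = ld50_draws[int(0.975 * len(ld50_draws))]
```

The pair (`lower`, `upper`) is a 95% credible interval for the dose at 50% response. Other percentiles, such as b0 + log(9)/b1 for the 90% response, could be computed draw by draw in the same way.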
• Bayesian Estimation - Iterative Numerical Techniques
• Limitations
• User must specify a prior probability $p(\theta)$.
• Estimation requires custom programming.
• SAS: PROC MCMC
• Specialized software: WinBUGS
• Computationally intensive solutions.
• Requires statistical expertise.
• Sample programs and data are available at:
http://www.uidaho.edu/ag/statprog
Calibration Methods
• Tolerance Distribution: Logistic
• The response $y_{ij}/N_i$ at dose i = 1 to k and replication j = 1 to r is binomial, with the proportion of success given by:
$y_{ij}/N_i = M / (1 + \exp(-b (dose_i - g)))$   (7)
where b is a rate-related parameter and g is the $dose_i$ for which the proportion of success, $y_{ij}/N_i$, is M/2. M is the theoretical maximum proportion attainable.
• A convenient generalization of (7) will allow g to represent any dose at which $y_{ij}/N_i = Q$:
$y_{ij}/N_i = M \cdot C / (C + \exp(-b (dose_i - g)))$   (8)
where the constant $C = Q/(M - Q)$. Note that, if Q = M/2, then C = 1 and equation (8) reverts to the standard form given in (7).
Equation (8), therefore, permits an unknown dose at a given response, Q, to be estimated through parameter g.
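The role of the constant C can be checked numerically. The short Python sketch below evaluates the generalized curve at dose = g and confirms that it returns Q, and that Q = M/2 recovers the standard logistic form. The function name and parameter values are hypothetical.

```python
import math

def calibration_curve(dose, M, b, g, Q):
    """Generalized logistic curve in which g is the dose giving response Q."""
    C = Q / (M - Q)
    return M * C / (C + math.exp(-b * (dose - g)))

M, b, g = 0.9, 1.5, 2.0

# At dose = g the curve returns the target response Q.
at_g = calibration_curve(g, M, b, g, Q=0.3)

# With Q = M/2 the curve matches the standard form M / (1 + exp(-b*(dose - g))).
general = calibration_curve(3.0, M, b, g, Q=M / 2)
standard = M / (1.0 + math.exp(-b * (3.0 - g)))
```

Substituting dose = g gives M·C/(C + 1), which simplifies to Q once C = Q/(M - Q) is inserted.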
• Maximum Likelihood
• Given the binomial responses, $y_{ij}/N_i$, a joint likelihood may be defined as:
$L(\theta_i \mid y_{ij}/N_i) \propto \prod_{ij} (\theta_i)^{y_{ij}} (1 - \theta_i)^{(N_i - y_{ij})}$   (9)
where the binomial parameter, $\theta_i$, is defined by (8) and the associated parameters, $\theta = [M, b, g]$, are estimated through maximization of (9). $N_i$ and $y_{ij}$ are the total number of trials and number of successes, respectively.
• Inferences on g are carried out assuming $\hat{g} \sim N(g, s_g)$.
• SAS: PROC NLMIXED
• Bayesian: Parametric
• A Bayesian posterior distribution for $\theta$ is given by:
$pr(\theta \mid y_{ij}/N_i) \propto pr(y_{ij}/N_i \mid \theta) \cdot pr(\theta)$   (10)
where $pr(y_{ij}/N_i \mid \theta)$ is the likelihood shown in (9) and $pr(\theta)$ is a prior distribution for the parameters $\theta = [M, b, g]$. Estimation of $\theta$ is carried out through numerically intensive techniques such as MCMC (e.g. Price and Shafii 2005).
• Inference on g is obtained through integration of (10) over the parameter space of M and b.
• Bayesian: Nonparametric
• This methodology was first proposed by Mukhopadhyay (2000) and
followed by Kottas et al. (2002).
• The technique considers the dose-response series as a multinomial process with parameters $P = [p_1, p_2, p_3, \ldots, p_k]$.
• Assuming the responses, $y_{ij}/N_i$, are binomial, a likelihood can then be defined as:
$L(P \mid y_{ij}/N_i) \propto \prod_{ij} (p_i)^{y_{ij}} (1 - p_i)^{(N_i - y_{ij})}$   (11)
• If the random segments between true response rates, $p_i$, are distributed as a Dirichlet Process (DP), a joint prior distribution on the $p_i$ may then be defined by:
$pr(P) \propto \prod_i (p_i - p_{i-1})^{(\alpha_i - 1)}$   (12)
where $\alpha_i = a \{ F_0(dose_i) - F_0(dose_{i-1}) \}$, a is a precision parameter, and $F_0$ is a base tolerance distribution.
• The precision parameter, a, reflects how closely the final estimation follows the base distribution. Low values indicate less correspondence, while larger values indicate a tighter association.
• The base distribution, $F_0(\cdot)$, defines a family of tolerance distributions.
• A posterior distribution for P can then be defined by combining (11) and (12) as:
$pr(P \mid y_{ij}/N_i) \propto \prod_{ij} (p_i)^{y_{ij}} (1 - p_i)^{(N_i - y_{ij})} \prod_i (p_i - p_{i-1})^{(\alpha_i - 1)}$   (13)
• Estimation of this posterior is again carried out
numerically using techniques such as MCMC.
• Inference on an unknown dose, g, at a known response $p_0 = y_0/N_0$, is obtained through sampling of the posterior given in (13).
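The unnormalized log posterior in (13) can be sketched in Python as the function an MCMC sampler would evaluate. Several choices here are assumptions made for illustration only: a logistic base distribution F0 with hypothetical location and scale, end points defined so that F0 is 0 below the first dose and 1 above the last dose, response rates padded with 0 and 1, and hypothetical data.

```python
import math

def logistic_base(dose, location, scale):
    """Assumed base tolerance distribution F0 (logistic CDF)."""
    return 1.0 / (1.0 + math.exp(-(dose - location) / scale))

def dirichlet_parameters(doses, a, location, scale):
    """a * {F0(dose_i) - F0(dose_{i-1})}, with F0 set to 0 and 1 at the two ends."""
    cdf = [0.0] + [logistic_base(d, location, scale) for d in doses] + [1.0]
    return [a * (cdf[i] - cdf[i - 1]) for i in range(1, len(cdf))]

def log_posterior(p, doses, successes, trials, a, location, scale):
    """Binomial log-likelihood plus Dirichlet-type log-prior on the increments of p."""
    padded = [0.0] + list(p) + [1.0]
    # Response rates must be strictly increasing between 0 and 1.
    if any(padded[i] <= padded[i - 1] for i in range(1, len(padded))):
        return -math.inf
    params = dirichlet_parameters(doses, a, location, scale)
    log_lik = sum(y * math.log(pi) + (n - y) * math.log(1.0 - pi)
                  for pi, y, n in zip(p, successes, trials))
    log_prior = sum((alpha - 1.0) * math.log(padded[i] - padded[i - 1])
                    for i, alpha in enumerate(params, start=1))
    return log_lik + log_prior

# Hypothetical data: three increasing doses, 20 trials at each dose.
doses = [1.0, 2.0, 3.0]
successes = [4, 10, 16]
trials = [20, 20, 20]
params = dirichlet_parameters(doses, a=5.0, location=2.0, scale=1.0)
ordered = log_posterior([0.2, 0.5, 0.8], doses, successes, trials, 5.0, 2.0, 1.0)
unordered = log_posterior([0.5, 0.2, 0.8], doses, successes, trials, 5.0, 2.0, 1.0)
```

Under these end-point assumptions the Dirichlet parameters sum to the precision parameter a, so a larger a concentrates the prior more tightly around the base distribution.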
Concluding Remarks
• Dose-response models have wide application in agriculture.
• They are useful for quantifying the relative efficacy of treatments.
• Probit models of estimation are limited in scope.
• Generalized nonlinear and Bayesian models provide the most flexible framework for dose-response estimation.
• Can use various response distributions.
• Can use various dose-response models.
• Can incorporate random model effects.
• Can be used to compare treatments.
• GNLM: full dummy variable modeling.
• Bayesian methods: probability statements.
• Generalized nonlinear models sufficient in most
situations.
• Bayesian estimation is preferred when estimating
extreme percentiles.
Concluding Remarks (cont)
• Bioassay is an important part of dose-response analysis.
• Determining an unknown dose can be problematic for
some parametric functional forms.
• Dose estimation fits naturally in a Bayesian framework.
• Some dose-response data may not follow typical
sigmoidal patterns.
• Methodology proposed here uses a base tolerance
distribution.
• Should be used and interpreted with caution.
• Standard model assessment techniques still apply.
• Introduces more uncertainty into the estimation situation.
References
Bliss, C. I. 1934. The method of probits. Science 79(2037): 38-39.
Bliss, C. I. 1938. The determination of dosage-mortality curves from small
numbers. Quart. J. Pharm., 11: 192-216.
Berkson, J. 1944. Application of the Logistic function to bio-assay. J.
Amer. Stat. Assoc. 39: 357-65.
Fieller, E. C. 1944. A fundamental formula in the statistics of biological
assay and some applications. Quart. J. Pharm. 17: 117-23.
Finney, D. J. 1971. Probit Analysis. Cambridge University Press, London.
Fisher, R. A. 1935. Appendix to Bliss, C. I.: The case of zero survivors.
Ann. Appl. Biol. 22: 164-5.
SAS Inst. Inc. 2004. SAS OnlineDoc, Version 9, Cary, NC.
Seefeldt, S.S., J. E. Jensen, and P. Fuerst. 1995. Log-logistic analysis of
herbicide dose-response relationships. Weed Technol. 9:218-227.
Kottas, A., M. D. Branco, and A. E. Gelfand. 2002. A Nonparametric
Bayesian Modeling Approach for Cytogenetic Dosimetry. Biometrics
58, 593-600.
Mukhopadhyay, S. 2000. Bayesian Nonparametric Inference on the Dose Level
with Specified Response Rate. Biometrics 56, 220-226.
Price, W. J. and B. Shafii. 2005. Bayesian Analysis of Dose-response Calibration
Curves. Proceedings of the Seventeenth Annual Kansas State
University Conference on Applied Statistics in Agriculture [CDROM],
April 25-27, 2005. Manhattan Kansas.
Shafii, B. and W. J. Price. 2001. Estimation of cardinal temperatures in
germination data analysis. Journal of Agricultural, Biological and
Environmental Statistics. 6(3):356-366.
Shafii, B. and W. J. Price. 2006. Bayesian approaches to dose-response
calibration models. Abstract: Proceedings of the XXIII International
Biometrics Conference [CDROM], July 16 - 21, 2006. Montreal,
Quebec Canada.
Shafii, B., Price, W.J., Barney, D.L. and Lopez, O.A. 2009. Effects of
stratification and cold storage on the seed germination characteristics of
cascade huckleberry and oval-leaved bilberry. Acta Hort. 810:599-608.
Questions / Comments