Lecture 7. Oct 14 at classroom

1. EMAX models for Dose-Response.
Bates, D. M. and Watts, D. G. (1988). Nonlinear Regression Analysis and Its Applications. New York: Wiley.
Allison, P. D. (1999). Logistic Regression Using SAS: Theory and Practice. Cary, NC: SAS Institute.
For dose-response modeling, one of the most common parametric approaches is to fit a 3-parameter
EMAX model for the dose-response function g(D):
$$g(D) = E[Y \mid D] = E_0 + \frac{E_{max}\, D}{ED_{50} + D}, \tag{1.1}$$
where E0 is the response Y at baseline (absence of dose), Emax is the asymptotic maximum dose effect and
ED50 is the dose which produces 50% of the maximal effect. A generalization is the 4-parameter EMAX
model
$$g(D) = E[Y \mid D] = E_0 + \frac{E_{max}\, D^{\lambda}}{ED_{50}^{\lambda} + D^{\lambda}}, \tag{1.2}$$
where λ is the 4th parameter, sometimes called the Hill parameter. The Hill parameter affects the
shape of the curve and is in some cases very difficult to estimate. Figure 1 shows curves for a range
of Hill parameter values.
In fitting models the error distribution is assumed to be iid normal with zero mean and equal variance σ².
This assumption is not in any way a restriction of the methods that we discuss here but a convenient
assumption that simplifies the issues that we address. Our model becomes:
$$Y = g(D) + \varepsilon, \tag{1.3}$$
where g depends on the parameter vector θ = (E0, ED50, EMAX, λ) and ε ~ iid N(0, σ²). Model (1.1) can
be written in the form (1.3) by setting λ = 1.
Figure 1: Curve Shapes for Different Values of the Hill Parameter λ.
Figure 1 illustrates the rich variety of shapes generated by changing the Hill parameter λ. For
λ ≤ 1 the curves have concave downward shapes whereas for λ > 1 the curves have a sigmoidal
shape.
For very small λ the curve represents a flat response for any dose greater than zero, whereas for very
large λ the curve approaches a step function at ED50.
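The shapes described for Figure 1 can be reproduced with a few lines of R. This is only a sketch: the function name `emax4` and the parameter values (E0 = 0, Emax = 1, ED50 = 50, and the grid of λ values) are assumptions chosen for illustration, not the values used in the original figure.

```r
# Illustrative 4-parameter EMAX (Hill) curve; all parameter values are assumptions.
emax4 <- function(dose, e0, emax, ed50, lambda) {
  e0 + emax * dose^lambda / (ed50^lambda + dose^lambda)
}

dose    <- seq(0, 100, by = 1)
lambdas <- c(0.1, 0.5, 1, 2, 5, 20)

# One column of responses per Hill parameter value
curves <- sapply(lambdas, function(l) {
  emax4(dose, e0 = 0, emax = 1, ed50 = 50, lambda = l)
})
colnames(curves) <- paste0("lambda=", lambdas)

matplot(dose, curves, type = "l", lty = 1, xlab = "Dose", ylab = "g(D)")
legend("bottomright", legend = colnames(curves),
       col = seq_along(lambdas), lty = 1)

# Every curve passes through e0 + emax/2 at dose = ed50
curves[dose == 50, ]
```

Whatever the value of λ, each curve equals E0 + Emax/2 at D = ED50, which is why the curves in such a plot all cross at one point.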
2. Maximum Likelihood Estimation for the EMAX model.
Under the EMAX model the MLE of θ is the least squares estimator θ̂. One way to calculate θ̂
is by non-linear least squares minimization (NLS). Given the data {(Di, yi), i=1,…,N} the NLS estimator
minimizes (in θ) the following quantity:
$$SSE(\theta) = \sum_{i=1}^{N} \left( y_i - g(D_i) \right)^2$$
The algorithm to minimize SSE is a variant of Newton-Raphson specialized for minimizing sums of
squares (the Gauss-Newton algorithm). It is implemented in many standard software packages such as R
and S-PLUS. The NLS implementation of the MLEs is convenient because it also provides confidence
intervals for the model parameters and p-values for testing the significance of the parameters.
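For reference, one iteration of this kind of algorithm can be written out explicitly. We assume here that the variant meant is the Gauss-Newton iteration (the default in R's `nls`). The symbols r and J are notation introduced for this sketch: r(θ) is the vector of residuals r_i = y_i − g(D_i), and J(θ) is the N × p matrix of partial derivatives of g(D_i) with respect to the parameters.

```latex
\theta^{(k+1)} = \theta^{(k)} + \left( J^{\top} J \right)^{-1} J^{\top}\, r\!\left(\theta^{(k)}\right),
\qquad J_{ij} = \frac{\partial g(D_i)}{\partial \theta_j}\bigg|_{\theta = \theta^{(k)}}

% Columns of J for the 3-parameter model (1.1):
\frac{\partial g}{\partial E_0} = 1, \qquad
\frac{\partial g}{\partial E_{max}} = \frac{D}{ED_{50} + D}, \qquad
\frac{\partial g}{\partial ED_{50}} = -\frac{E_{max}\, D}{(ED_{50} + D)^2}
```

Each step is an ordinary linear least squares fit of the current residuals on the columns of J, which is why standard errors and t-tests come out of the final iteration almost for free.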
Example 1. Dose response estimation in a clinical trial.
In order to illustrate the issues that we address in this paper we use data from an unpublished phase II
clinical trial. The study had 5 doses (0, 5, 25, 50 and 100 mg) and corresponding group sizes (n=78, 81,
81, 81, 77). Unlike most dose-response trials this study has about 80 patients per arm and a high signal to
noise ratio. Figure 2 plots the response against the doses with 90% confidence intervals around the mean.
The graph shows that it is a potentially good candidate for the EMAX model. We first fit the 3-parameter
EMAX model to the data. The maximum likelihood estimates (MLEs) converge quickly, giving
Êmax = 15.13, Ê0 = 2.47, and ÊD50 = 67.49.
3. A Bayesian approach to EMAX and logistic modeling using BUGS.
As an alternative to the MLE we apply Bayesian methodology to this problem. We introduce a prior
distribution on each parameter and estimate the model parameters by the posterior means.
The parameters EMAX and E0 both have unbounded ranges and we assigned them normal priors with
means 10 and 0 and large variances 100 and 25, respectively. The other three parameters, λ,
ED50 and σ, are assigned Gamma priors with means 1.8, 57.5 and 10 and variances 2.16, 2875 and
10000, respectively. The prior for λ was offset by adding 0.5 to the Gamma random variable. Our choice
of prior distributions is consistent with what the literature suggests. The Bayesian estimate for the
4-parameter EMAX model is shown by the solid line in Figure 2. We may claim that the 4-parameter Bayes
EMAX fit recovers the 4-parameter MLE fit. With the 4-parameter Bayes EMAX we obtain an estimate
of λ of 1.1, slightly bigger than 1 (λ = 1 corresponds to the 3-parameter EMAX).
Figure 2: Clinical trial example with 5 doses. Red stars represent the response at a given dose and
90% CI. Solid line is the 3-parameter fit. Dashed line refers to the 4-parameter fit (using the
estimates after 100 iterations.)
[Figure 2 plot. Vertical axis: Change in EDD. Horizontal axis: Dose (mg). Legend: 3 Parameter EMAX,
4 Parameter EMAX, Bayesian Approach, Power Law.]
4. Logistic regression model.
The 4-parameter EMAX curve yields the logistic curve by setting E0 = 0, EMAX = 1 and substituting e^x
for D. The resulting function is g(x) = e^(a+bx)/(1 + e^(a+bx)), where b = λ and a = −λ log(ED50). The
response can be binarized by a threshold into Y=1 for high responders and Y=0 for low responders. The
logistic model has the property that P(Y=1 | X=x) = E(Y | X=x) = g(x), so the logistic and EMAX models
satisfy a similar equation. The maximum likelihood estimator for the logistic model is not the least
squares estimator; it has to be obtained by maximizing the likelihood function (a standard algorithm
for which is Iteratively Reweighted Least Squares).
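The substitution can be checked in two lines. The derivation below is a worked version of the step described above, using λ for the Hill parameter and dividing numerator and denominator by ED50^λ = e^{λ log ED50}.

```latex
\begin{aligned}
g(x) &= \frac{e^{\lambda x}}{ED_{50}^{\lambda} + e^{\lambda x}}
      = \frac{e^{\lambda x - \lambda \log ED_{50}}}{1 + e^{\lambda x - \lambda \log ED_{50}}} \\
     &= \frac{e^{a + b x}}{1 + e^{a + b x}},
      \qquad b = \lambda, \quad a = -\lambda \log(ED_{50}).
\end{aligned}
```

On the log-dose scale, then, the Hill parameter plays the role of the logistic slope, and log(ED50) = −a/b is the point where the response probability equals 1/2.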
Consider the simple logistic regression model with one predictor x taking values {x1,…, xn} and a
binary response y taking values {y1,…, yn}.
The log-likelihood can be written as
$$l(p; y) = \sum_i \left[ y_i \log\frac{p_i}{1 - p_i} + \log(1 - p_i) \right],$$
where pi = Prob(yi=1) = logit⁻¹(β(xi − α)).
Let's consider the special case when the yi's, sorted by the values of the xi's, form a run of k zeros
followed by a run of n−k ones, with no 0's between the 1's and no 1's between the 0's. Let α be fixed
at a value of x that separates the yi values into 0's and 1's.
In this case the log-likelihood becomes
$$l(p; y) = \sum_{y_i = 1} \log(p_i) + \sum_{y_i = 0} \log(1 - p_i).$$
As β gets larger the pi in the first sum converge to 1 and the pi in the second sum converge to 0, so
both sums increase toward zero and the log-likelihood never attains a maximum. Hence the MLE of β is
plus infinity.
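The argument can be illustrated numerically. The sketch below uses a small made-up data set (six points, perfectly separated at x = 3.5) and evaluates the log-likelihood for increasing β with α held fixed; the data values and the name `loglik` are assumptions for illustration only.

```r
# Perfectly separated toy data (hypothetical values):
# all 0's lie below x = 3.5 and all 1's lie above it
x <- c(1, 2, 3, 4, 5, 6)
y <- c(0, 0, 0, 1, 1, 1)
alpha <- 3.5  # fixed cut point separating the 0's from the 1's

# Bernoulli log-likelihood with p_i = logit^{-1}(beta * (x_i - alpha))
loglik <- function(beta) {
  p <- plogis(beta * (x - alpha))
  sum(y * log(p) + (1 - y) * log(1 - p))
}

betas <- c(0.5, 1, 2, 5, 10)
ll <- sapply(betas, loglik)
data.frame(beta = betas, loglik = ll)
```

The log-likelihood stays negative but keeps increasing toward zero as β grows, so no finite β maximizes it. In practice `glm` signals this situation with a warning that fitted probabilities numerically 0 or 1 occurred.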
R code for Emax models:
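# Fit the 3-parameter EMAX model by nonlinear least squares.
# edd is the trial data frame with columns ch8 (response) and dose.
m1 <- nls(ch8 ~ e0 + (emax * dose) / (dose + ed50),
          start = c(ed50 = 68, e0 = 2.4, emax = 16),
          trace = TRUE, data = edd)
# Same fit, allowing up to 100 iterations and dropping rows with missing values.
m1 <- nls(ch8 ~ e0 + (emax * dose) / (dose + ed50),
          start = c(ed50 = 68, e0 = 2.4, emax = 16),
          control = nls.control(maxiter = 100),
          trace = TRUE, na.action = na.omit, data = edd)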
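Since the trial data set `edd` is not distributed with these notes, the same call can be tried on simulated data. The sketch below is an assumption-laden stand-in: the "true" parameter values, the noise level and the names `sim`, `truth` and `fit` are all made up, and only the dose levels mirror the design of Example 1.

```r
set.seed(1)

# Simulated data (hypothetical): 5 dose groups with 80 subjects each
dose  <- rep(c(0, 5, 25, 50, 100), each = 80)
truth <- c(e0 = 2.5, emax = 15, ed50 = 65)  # assumed values, illustration only
mean_resp <- truth[["e0"]] + truth[["emax"]] * dose / (truth[["ed50"]] + dose)
sim <- data.frame(dose = dose,
                  resp = mean_resp + rnorm(length(dose), sd = 2))

# 3-parameter EMAX fit by nonlinear least squares
fit <- nls(resp ~ e0 + emax * dose / (ed50 + dose),
           data  = sim,
           start = c(e0 = 2, emax = 12, ed50 = 50))

# Estimates with standard errors, t values and p-values
summary(fit)$coefficients
```

Starting values matter for `nls`: a rough guess for e0 from the placebo mean, for emax from the range of the group means, and for ed50 from the dose near the half-way response is usually enough.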
Flick Tail experiment
Objective of the Study
- Detect synergy between Morphine and Marijuana
- Find the combination of drugs that produces the highest efficacy
- Also of interest:
  - ED50s for each drug and their combination
  - Confidence intervals for ED50s using Fieller's theorem
Data
Predictor variables:
Morphine and Marijuana dosages in milligrams
Response variable:
Proportion of mice that respond to the given drug combination
Methodology: Logistic regression
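library(MASS)  # provides dose.p()
options(contrasts = c("contr.treatment", "contr.poly"))
x <- read.table("flick1.txt", header = TRUE)
# Two-column response: responders and non-responders per dose combination
Fl <- cbind(flick = x$flick, noflick = x$reps - x$flick)
x.lg <- glm(Fl ~ morphine * del9, family = binomial, data = x)
summary(x.lg, correlation = FALSE)
# Deviance residuals against each dose and against the linear predictor
plot(x$morphine, residuals(x.lg, type = "deviance"))
plot(x$del9, residuals(x.lg, type = "deviance"))
plot(predict(x.lg), residuals(x.lg, type = "deviance"))
# Log-dose model without an intercept
x.lg0 <- glm(Fl ~ log(1 + morphine) * log(1 + del9) - 1, family = binomial, data = x)
# Doses giving response probabilities 0.25, 0.50 and 0.75;
# cf gives the positions of the coefficients used as intercept and slope
dose.p(x.lg0, cf = c(1, 3), p = 1:3/4)
dose.p(update(x.lg0, family = binomial(link = probit)),
       cf = c(1, 3), p = 1:3/4)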
Two way ANOVA
ONE-WAY ANOVA:
Linear Model:
$$x_{gi} = \mu_g + \varepsilon_{gi}, \qquad g = 1,\dots,k, \quad i = 1,\dots,n_g$$
Assumptions: Errors ε_gi are (i) independent, (ii) normally distributed, (iii) with
equal variances.
Fit: Data = group effect + residual
Hypothesis Testing: All means are equal vs. some are not equal.
ANOVA hypotheses:
(a) H0: μ1 = μ2 = … = μk
(b) Ha: μi ≠ μj for some i ≠ j
LINEAR MODEL FOR TWO WAY TABLE:
This is a typical dataset where we have a response and two factors.
Example:
            B1     B2     B3
clarion    32.7   32.3   31.5
clinton    32.1   29.7   29.1
knox       35.7   35.9   33.1
o'neill    36.0   34.2   31.2
compost    31.8   28.0   29.2
wabash     38.2   37.8   31.9
webster    32.5   31.1   29.7
Fitted model: Data = Total Mean + Row Effect + Column Effect + Residual
              B1       B2       B3    row effect
clarion    -1.052   -0.024    1.076     -0.390
clinton     0.214   -0.757    0.543     -2.257
knox       -0.786    0.843   -0.057      2.343
o'neill     0.614    0.243   -0.857      1.243
compost     0.548   -1.824    1.276     -2.890
wabash      0.648    1.676   -2.324      3.410
webster    -0.186   -0.157    0.343     -1.457
col eff     1.586    0.157   -1.743     32.557
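The decomposition can be reproduced from the data in the example in a few lines of R. This is a minimal sketch; the object names (`y`, `grand`, `row_eff`, `col_eff`, `resid`) are our own choices.

```r
# Snapdragon stem lengths: 7 soil types (rows) by 3 blocks (columns)
soils <- c("clarion", "clinton", "knox", "o'neill",
           "compost", "wabash", "webster")
y <- matrix(c(32.7, 32.3, 31.5,
              32.1, 29.7, 29.1,
              35.7, 35.9, 33.1,
              36.0, 34.2, 31.2,
              31.8, 28.0, 29.2,
              38.2, 37.8, 31.9,
              32.5, 31.1, 29.7),
            nrow = 7, byrow = TRUE,
            dimnames = list(soils, c("B1", "B2", "B3")))

grand   <- mean(y)               # total mean
row_eff <- rowMeans(y) - grand   # soil-type (row) effects
col_eff <- colMeans(y) - grand   # block (column) effects
resid   <- y - grand - outer(row_eff, col_eff, "+")

round(grand, 3)
round(row_eff, 3)
round(col_eff, 3)
round(resid, 3)
```

The value in the bottom-right corner of the fitted table (32.557) is the total mean, and the residuals sum to zero across every row and every column.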
SAS DOES IT IN A DIFFERENT WAY: IT SETS THE EFFECT
FOR ONE OF THE GROUPS TO ZERO.
Parameter           ESTIMATE     TVALUE   PVALUE   STD ERROR
INTERCEPT          29.3571 B      34.96   0.0001   0.839703
TYPE   clarion      1.0667 B       1.02   0.3285   1.047294
       clinton     -0.800  B      -0.76   0.4597   1.047294
       knox         3.800  B       3.63   0.0035   1.047294
       o'neill      2.700  B       2.58   0.0242   1.047294
       compost     -1.433  B      -1.37   0.1962   1.047294
       wabash       4.867  B       4.65   0.0006   1.047294
       webster      0.000  B        .       .        .
BLOCK  1            3.3285 B       4.85   0.0004   0.685615
       2            1.900  B       2.77   0.0169   0.685615
       3            0.000  B        .       .        .
COMPUTE ANOVA TABLE:
The ANOVA table is used to test the significance of the row effects or column effects.
Source    DF     Type I SS     F Value    Pr > F
TYPE       6     103.15142       10.45    0.0004
BLOCK      2      39.03714       11.86    0.0014

Source    DF     Type III SS   F Value    Pr > F
TYPE       6     103.15142       10.45    0.0004
BLOCK      2      39.03714       11.86    0.0014
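For comparison, the same additive two-way model can be fitted in R with `aov`. This is a sketch using the complete snapdragon data; the data frame name `plants` and the column names follow the SAS program in these notes, while `fit` and `anova_tab` are our own names.

```r
soils <- c("clarion", "clinton", "knox", "o'neill",
           "compost", "wabash", "webster")
plants <- data.frame(
  type     = factor(rep(soils, each = 3), levels = soils),
  block    = factor(rep(1:3, times = 7)),
  stemleng = c(32.7, 32.3, 31.5,  32.1, 29.7, 29.1,  35.7, 35.9, 33.1,
               36.0, 34.2, 31.2,  31.8, 28.0, 29.2,  38.2, 37.8, 31.9,
               32.5, 31.1, 29.7)
)

# Additive model: soil type + block, no interaction
fit <- aov(stemleng ~ type + block, data = plants)
anova_tab <- summary(fit)[[1]]   # sequential (Type I) sums of squares
anova_tab
```

Because the design is balanced, the sequential sums of squares do not depend on the order of the terms, which is why the Type I and Type III tables above are identical.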
MULTIPLE COMPARISONS:
Once we find that a factor is significant then we need to explore the difference between the
corresponding factor levels. We do that using a multiple comparison procedure such as Tukey or
Duncan.
Another useful tool is to use contrasts if we are interested in testing only a few comparisons or in
some linear combinations of the parameters.
This is the SAS code and output file
options ps=50 ls=70;
*---------------snapdragon experiment---------------*
| as reported by stenstrom, 1940, an experiment was |
| undertaken to investigate how snapdragons grew in |
| various soils. each soil type was used in three   |
| blocks.                                           |
*---------------------------------------------------*;
data plants;
  input type $ @;
  do block=1 to 3;
    input stemleng @;
    output;
  end;
cards;
clarion 32.7 32.3 31.5
clinton 32.1 29.7 29.1
knox 35.7 35.9 33.1
o'neill 36.0 34.2 31.2
compost 31.8 28.0 29.2
wabash 38.2 37.8 31.9
webster 32.5 31.1 29.7
;
proc glm;
  class type block;
  model stemleng=type block;
run;
proc glm order=data;
  class type block;
  model stemleng=type block / solution;
  means type / bon duncan tukey;
  *-type-order---clrn-cltn-knox-onel-cpst-wbsh-wstr;
  contrast 'compost v others' type -1 -1 -1 -1 6 -1 -1;
  contrast 'knox vs oneill'   type 0 0 1 -1 0 0 0;
run;
OUTPUT: Output file from "glm.sas"
INCOMPLETE DESIGNS
120 6559 1240
6
71 237
40 689 165 855
202 233 165
62
5 385
40
74
25
36
15
22
34 129
32
54
23
48
10
1
This is the data but with some missing observations:
            B1     B2     B3
clarion    32.7   32.3    NA
clinton    32.1   29.7   29.1
knox       35.7   35.9   33.1
o'neill     NA    34.2   31.2
compost    31.8   28.0   29.2
wabash     38.2   37.8   31.9
webster    32.5    NA    29.7
Row Effects:
clarion       clinton      knox       o'neill     compost      wabash      webster
0.05238095   -2.147619    2.452381   0.252381   -2.780952    3.519048    -1.34761
Column Effects:
B1          B2           B3
1.427778    0.3111111    -1.738889
Main Effect:
32.44762
So what is SAS going to do?
Parameter           ESTIMATE    TVALUE   PVALUE   STD ERROR
INTERCEPT           29.372 B     27.73   0.0001   1.0591
TYPE   clarion       0.281 B      0.20   0.8491   1.4390
       clinton      -0.969 B     -0.76   0.4682   1.2804
       knox          3.630 B      2.84   0.0196   1.2804
       o'neill       2.209 B      1.54   0.1591   1.4390
       compost      -1.603 B     -1.25   0.2421   1.2804
       wabash        4.696 B      3.67   0.0052   1.2804
       webster       0.000 B       .       .        .
BLOCK  1             3.455 B      4.16   0.0025   0.8308
       2             2.236 B      2.69   0.0247   0.8308
       3             0.000 B       .       .        .
We look at the ANOVA table and we see that order matters.
TYPE BEFORE BLOCK
Source    DF     Type I SS       F Value    Pr > F
TYPE       6     95.93611111        8.42    0.0028
BLOCK      2     33.76848485        8.89    0.0074

Source    DF     Type III SS     F Value    Pr > F
TYPE       6     98.19681818        8.62    0.0026
BLOCK      2     33.76848485        8.89    0.0074

BLOCK BEFORE TYPE
Source    DF     Type I SS       F Value    Pr > F
BLOCK      2     31.50777778        8.30    0.0091
TYPE       6     98.19681818        8.62    0.0026

Source    DF     Type III SS     F Value    Pr > F
BLOCK      2     33.76848485        8.89    0.0074
TYPE       6     98.19681818        8.62    0.0026

THE BASIC STATS
Source    DF     Sum of Squares    F Value    Pr > F
Model      8     129.70459596         8.54    0.0021
Error      9      17.08484848
Total     17     146.78944444

R-Square     C.V.         STEMLENG Mean
0.883610     4.238642     32.5055556
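The order dependence is easy to reproduce in R, where `anova` on an `lm` fit reports sequential (Type I) sums of squares. The sketch below uses the incomplete snapdragon data; `lm` drops the rows with `NA` by default. The object names (`plants`, `a_type_first`, `a_block_first`) are our own, and `drop1` is used as a stand-in for the SAS Type III table, which it matches for a main-effects model like this one.

```r
soils <- c("clarion", "clinton", "knox", "o'neill",
           "compost", "wabash", "webster")
plants <- data.frame(
  type     = factor(rep(soils, each = 3), levels = soils),
  block    = factor(rep(1:3, times = 7)),
  stemleng = c(32.7, 32.3, NA,    32.1, 29.7, 29.1,  35.7, 35.9, 33.1,
               NA,   34.2, 31.2,  31.8, 28.0, 29.2,  38.2, 37.8, 31.9,
               32.5, NA,   29.7)
)

# Sequential sums of squares: the first term is NOT adjusted for the second
a_type_first  <- anova(lm(stemleng ~ type + block, data = plants))
a_block_first <- anova(lm(stemleng ~ block + type, data = plants))
a_type_first
a_block_first

# Each term adjusted for the other (order does not matter here)
drop1(lm(stemleng ~ type + block, data = plants), test = "F")
```

In each sequential table only the last term is adjusted for the other factor, so the TYPE sum of squares changes from 95.94 (entered first) to 98.20 (entered last), while the adjusted sums of squares agree regardless of order.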