STT 864 KEY TO HOMEWORK 2

1. Consider the simulation example given in Example 2 of lecture notes 4 on the
class page website. Namely, we generated data according to the following model
Yi = sin{2π(β1 + β2 Xi )} + εi for i = 1, · · · , n.    (1)
To be specific, in this simulation, please set β1 = 2 and β2 = 1.2, generating Xi from
a uniform(0,1) and generating εi from N (0, σ 2 ) with σ 2 = 1.5.
(a) Please give a sufficient condition so that β1 and β2 are identifiable. Provide
your reasons.
To make β1 and β2 identifiable, we need conditions such that, if sin{2π(β1^(1) + β2^(1) x)} = sin{2π(β1^(2) + β2^(2) x)} for every x, then β1^(1) = β1^(2) and β2^(1) = β2^(2). First, we notice that if β2^(1) ≠ β2^(2), there always exists some value x such that sin{2π(β1^(1) + β2^(1) x)} ≠ sin{2π(β1^(2) + β2^(2) x)}, which would violate our assumption. Therefore, we need no condition on β2. Now we only consider the identification of β1. Taking x = 0 gives sin{2πβ1^(1)} = sin{2πβ1^(2)}. If we restrict β1^(1) and β1^(2) to (1.75, 2.25), then 2πβ1 ranges over (3.5π, 4.5π), a single strictly increasing branch of the sine function, so β1^(1) has to be the same as β1^(2). Therefore, a sufficient condition so that β1 and β2 are identifiable is that β1 ∈ (1.75, 2.25).
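As an illustration (our own numerical check, not part of the original key), the monotonicity of sin(2πβ1) on (1.75, 2.25) can be verified in R:

```r
## Numerical check that beta -> sin(2*pi*beta) is strictly increasing
## on (1.75, 2.25): 2*pi*beta then ranges over (3.5*pi, 4.5*pi),
## a single increasing branch of the sine function.
beta.grid <- seq(1.75, 2.25, length.out = 1000)
vals <- sin(2 * pi * beta.grid)
all(diff(vals) > 0)    # TRUE, so beta1 is identifiable on this interval
```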
(b) First, set a seed for your simulation: for convenience, please use your PID as
the seed for your simulation. For example, if your PID is “A79235030”, then
set seed using “set.seed(79235030)” to generate your own data set. By setting
the seed, we are able to reproduce the same data set in the simulation. Second,
generate n = 100 data points (Xi , Yi ) according to the model (1).
The following R code can be used to generate the data set:
set.seed(79235030)
n<-100
beta1<-2
beta2<-1.2
X<-runif(n,0,1)
epsilon<-rnorm(n,0,sqrt(1.5))
Y<-sin(2*pi*(beta1+beta2*X))+epsilon
(c) Find one algorithm that will help you find the least squares estimators of
β = (β1 , β2 )T . Please give the details of the algorithm and implement your
proposed algorithm using R. You can not use any built in optimization function
in R (for example, “nls” or “optim”. . .).
If we were allowed to use the R built-in function, the estimates could be obtained as follows:
nlsfit<-nls(Y~sin(2*pi*(beta1+beta2*X)), start=list(beta1=2,beta2=1),
lower=c(1.75,0),upper=c(2.25,5),algorithm="port")
For this question, there are many possible algorithms. As an example, we
consider a coordinate descent algorithm. The algorithm contains the following
steps:
1. Choose initial values for β1 and β2 ; denote them as β1^(0) and β2^(0).
2. At the k-th step (k = 1, 2, · · · ), fix the value of β2 at β2^(k−1) and minimize
   SSE(β1) = Σ_{i=1}^n {Yi − sin(2π(β1 + β2^(k−1) Xi ))}^2
   as a function of β1 using a one-dimensional (binary/golden-section) search. Denote the minimizer as β1^(k).
3. Then fix the value of β1 at β1^(k) and minimize
   SSE(β2) = Σ_{i=1}^n {Yi − sin(2π(β1^(k) + β2 Xi ))}^2
   as a function of β2 using a Gauss–Newton step. Denote the minimizer as β2^(k). Let k = k + 1.
4. Repeat Steps 2 and 3 until convergence.
You could implement the above algorithm as follows:
## Step 1: Choose initial values for beta1 and beta2
beta10<-1.9
beta20<-0.8
iter<-0
repeat
{
## Step 2: minimize SSE(beta1) with respect to beta1 by fixing beta2
SSE1<-function(beta1)
{
SSE<-sum((Y-sin(2*pi*(beta1+beta20*X)))^2)
return(SSE)
}
## The following line uses R's one-dimensional search to find the
## minimum of the function SSE1(beta1) in the interval (1.75, 2.25)
beta1new<-optimize(SSE1,interval=c(1.75,2.25))$minimum
## Step 3: one Gauss-Newton step for beta2, fixing beta1
Db<-cos(2*pi*(beta1new+beta20*X))*2*pi*X
beta2new<-beta20+t(Db)%*%(Y-sin(2*pi*(beta1new+beta20*X)))/(t(Db)%*%Db)
iter<-iter+1
if ((beta10-beta1new)^2+(beta20-beta2new)^2<1e-5||iter>30) break
beta10<-beta1new
beta20<-beta2new
}
betahat<-c(beta1new,beta2new)
(d) Plot the estimated curve sin{2π(β̂1 + β̂2 Xi )} as a function of Xi , and overlay the curve with the scatter plot of Yi versus Xi .
We can plot the estimated curve and the scatter plot using the following commands.
beta1<-betahat[1]
beta2<-betahat[2]
plot(X,Y)
points(X, sin(2*pi*(beta1+beta2*X)),col=2)
The plot is given below:
[Figure 1: Plot of X versus Y with the estimated curve overlaid.]
The plot shows that the estimated curve fits the data set well.
2. The book Nonlinear Regression Analysis and its Applications by Bates and
Watts contains a small data set taken from an MS thesis of M.A. Treloar “Effects of
Puromycin on Galactosyltransferase of Golgi Membranes.” It is reproduced below.
y is reaction velocity (in counts/min²) for an enzymatic reaction and x is substrate
concentration (in ppm) for untreated enzyme and enzyme treated with Puromycin.
  x     y (Untreated)   y (Treated)
0.02    67, 51          76, 47
0.06    84, 86          97, 107
0.11    98, 115         123, 139
0.22    131, 124        159, 152
0.56    144, 158        191, 201
1.10    160             207, 200
Every cell in the above table contains two response values except the cell corresponding to x = 1.10 for the untreated group.
A standard model for either the untreated enzyme or for the treated enzyme is
the “Michaelis-Menten model”
Yi = θ1 Xi /(θ2 + Xi ) + εi    (2)
where the εi are assumed to be iid N(0, σ²). Note that in this model, (1) the mean of y is 0 when x = 0, (2) the limiting (large x) mean of y is θ1 , and (3) the mean of y reaches half of its limiting value when x = θ2 .
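These three interpretations can be checked numerically; the sketch below uses arbitrary illustrative values of θ1 and θ2 (our own choices, not estimates from the data):

```r
## Check the three stated properties of the Michaelis-Menten mean function
theta1 <- 200; theta2 <- 0.1            # arbitrary illustrative values
mm <- function(x) theta1 * x / (theta2 + x)
mm(0)          # 0: the mean response is 0 at x = 0
mm(1e6)        # approximately theta1: the limiting (large x) mean
mm(theta2)     # theta1/2: half the limiting value is reached at x = theta2
```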
We begin by considering only the “Treated” part of the data set. Use R to help
you do all that follows. Begin by reading in 12 × 1 vectors y and x.
(a) Plot y vs x and make “eye-estimates” of the parameters based on your plot
and the interpretations of the parameters offered above. (Your eye-estimate
of θ1 is what looks like a plausible limiting value for y, and your eye-estimate
of θ2 is a value of x at which y has achieved half its maximum value.)
Use the following commands to plot y vs x:
>x<-rep(c(0.02,0.06,0.11,0.22,0.56,1.10),each=2)
>y<-c(76, 47, 97, 107, 123, 139, 159, 152, 191, 201, 207, 200)
>plot(x,y,xlab="Conc (ppm)",ylab="Vel (counts/sqmin)")
It can be seen from Figure 2 that the “eye-estimate” of θ1 is around 200 and the “eye-estimate” of θ2 is around 0.08.
(b) Add the nls package to your R environment. Then issue the command
> REACT.fm<-nls(formula=y~theta1*x/(theta2+x),
start=c(theta1=#,theta2=##),trace=T)
where in place of # and ## you enter your eye-estimates from (a). This
will fit the nonlinear model (2) via least squares. What are the least squares
estimate of the parameter vector θ and the sum of squares of errors (SSE)

θ̂_OLS = (θ̂1 , θ̂2 )^T   and   SSE = Σ_{i=1}^{12} {Yi − θ̂1 Xi /(θ̂2 + Xi )}^2 .

[Figure 2: Scatter plot of concentration vs reaction velocity.]
> REACT.fm<-nls(formula=y~theta1*x/(theta2+x),
+ start=c(theta1=200,theta2=0.08),trace=T)
4235.266 : 2e+02 8e-02
1211.965 : 213.02889399   0.06246453
1195.504 : 212.57333382   0.06393501
1195.449 : 212.67212463   0.06410312
1195.449 : 212.68261888   0.06411953
1195.449 : 212.68363515   0.06412111
> deviance(REACT.fm)
[1] 1195.449
From the above R output, the least squares estimate of θ is θ̂ = (212.684, 0.064)^T and the SSE is 1195.449.
(c) Re-plot the original data with a superimposed plot of the fitted equation. To
do this, you may type
> conc<-seq(0,1.5,.05)
> velocity<-coef(REACT.fm)[1]*conc/(coef(REACT.fm)[2]+conc)
> plot(c(0,1.5),c(0,250),type="n",xlab="Conc (ppm)",ylab="Vel
(counts/sqmin)")
> points(x,y)
> lines(conc,velocity)
(The first two commands set up vectors of points on the fitted curve. The
third creates an empty plot with axes appropriately labeled. The fourth plots
the original data and the fifth plots line segments between points on the fitted
curve.)
The plot is given in Figure 3.
[Figure 3: Plot of concentration vs reaction velocity with a superimposed plot of the fitted regression function.]
(d) Get more complete information on the fit by typing
> summary(REACT.fm)
> vcov(REACT.fm)
Verify that the output of this last call is MSE(D̂^T D̂)^{-1} and that the standard errors produced by the first are square roots of diagonal elements of this matrix. Then use the information produced here and make an approximate 95% prediction interval for one additional reaction velocity, for substrate concentration .50 ppm.
> summary(REACT.fm)
Formula: y ~ theta1 * x/(theta2 + x)
Parameters:
          Estimate Std. Error t value Pr(>|t|)
theta1 2.127e+02  6.947e+00   30.615 3.24e-11 ***
theta2 6.412e-02  8.281e-03    7.743 1.57e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 10.93 on 10 degrees of freedom
Number of iterations to convergence: 5
Achieved convergence tolerance: 5.798e-06
> vcov(REACT.fm)
            theta1       theta2
theta1 48.26288310 4.401439e-02
theta2  0.04401439 6.857383e-05
The matrix D̂ in this example is a 12 × 2 matrix with i-th row equal to
( Xi /(θ̂2 + Xi ), −θ̂1 Xi /(θ̂2 + Xi )^2 ).
Next, we verify the output of vcov(REACT.fm) and the standard errors of θ̂ using the following R code.
> D1<-x/(x+coef(REACT.fm)[2])
> D2<--coef(REACT.fm)[1]*x/((coef(REACT.fm)[2]+x)^2)
> D<-cbind(D1,D2)
> solve(t(D)%*%D)*deviance(REACT.fm)/10
            D1           D2
D1 48.26288272 4.401439e-02
D2  0.04401439 6.857383e-05
> sqrt(48.26288310)
[1] 6.947149
> sqrt(6.857383e-05)
[1] 0.008280932
> sqrt(48.26288310)
[1] 6.947149
> sqrt(6.857383e-05)
[1] 0.008280932
According to the notes, the prediction interval can be computed as follows:
>Ghat<-c(0.5/(0.5+coef(REACT.fm)[2]),
-coef(REACT.fm)[1]*0.5/((coef(REACT.fm)[2]+0.5)^2))
>RMSE<-sqrt(deviance(REACT.fm)/10)
>PSE<-sqrt(1+t(Ghat)%*%solve(t(D)%*%D)%*%Ghat)
>PredLb<-coef(REACT.fm)[1]*0.5/(coef(REACT.fm)[2]+0.5)-PSE*RMSE*qt(0.975,10)
>PredUb<-coef(REACT.fm)[1]*0.5/(coef(REACT.fm)[2]+0.5)+PSE*RMSE*qt(0.975,10)
This produces an approximate 95% prediction interval for one additional reaction velocity at concentration 0.5 ppm as (162.2353,214.7824).
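In our notation, the code above is a sketch of the standard linearization-based prediction formula (with x0 = 0.5 and MSE = SSE/10):

```latex
\hat{y}(x_0) \;\pm\; t_{0.975,\,10}\,
\sqrt{\frac{\mathrm{SSE}}{10}}\,
\sqrt{1+\hat{g}^{T}(\hat{D}^{T}\hat{D})^{-1}\hat{g}},
\qquad
\hat{g}=\left(\frac{x_0}{\hat{\theta}_2+x_0},\;
-\frac{\hat{\theta}_1 x_0}{(\hat{\theta}_2+x_0)^{2}}\right)^{T},
```

where ŷ(x0) = θ̂1 x0 /(θ̂2 + x0) is the fitted mean at the new concentration.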
(e) The concentration, say x100 , at which mean reaction velocity is 100 counts/min² is a function of θ1 and θ2 . Find a sensible point estimate of x100 and a standard error (estimated standard deviation) for your estimate.
Setting the expectation of Y to 100, namely θ1 x/(θ2 + x) = 100, we obtain that
x100 = 100θ2 /(θ1 − 100).
Then an estimator of x100 is x̂100 = 100θ̂2 /(θ̂1 − 100) = 100 × 0.064/(212.684 − 100) = 0.057.
>Ghat2<-c(-100*coef(REACT.fm)[2]/((coef(REACT.fm)[1]-100)^2),
100/(coef(REACT.fm)[1]-100))
>RMSE<-sqrt(deviance(REACT.fm)/10)
>PSE<-sqrt(t(Ghat2)%*%solve(t(D)%*%D)%*%Ghat2)
>SE<-PSE*RMSE
Using the above R code, we obtain the standard error of x̂100 as 0.00518.
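The code above is a delta-method (linearization) calculation; in our notation it computes

```latex
\widehat{\mathrm{SE}}(\hat{x}_{100})
= \sqrt{\frac{\mathrm{SSE}}{10}}\,
  \sqrt{\hat{g}_2^{T}(\hat{D}^{T}\hat{D})^{-1}\hat{g}_2},
\qquad
\hat{g}_2=\left(-\frac{100\,\hat{\theta}_2}{(\hat{\theta}_1-100)^{2}},\;
\frac{100}{\hat{\theta}_1-100}\right)^{T},
```

where ĝ2 is the gradient of x100 = 100θ2/(θ1 − 100) with respect to (θ1, θ2), evaluated at θ̂.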
(f) As a means of visualizing what function the R routine nls minimized in order
to find the least squares coefficients, do the following. First set up a grid of
(θ1 , θ2 ) pairs as follows. Type
> theta<-coef(REACT.fm)
> se<-sqrt(diag(vcov(REACT.fm)))
> dv<-deviance(REACT.fm)
> gsize<-101
> th1<-theta[1]+seq(-4*se[1],4*se[1],length=gsize)
> th2<-theta[2]+seq(-4*se[2],4*se[2],length=gsize)
> th<-expand.grid(th1,th2)
Then create a function to evaluate the sums of squares
> ss<-function(t)
+ {
+ return(sum((y-t[1]*x/(t[2]+x))^2))
+ }
As a check to see that you have it programmed correctly, evaluate this function
at θ̂ OLS for the data in hand, and verify that you get the SSE. Then evaluate
the SSE over the grid of parameter vectors θ set up earlier and produce a
contour plot using
> SumofSquares<-apply(th,1,ss)
> SumofSquares<-matrix(SumofSquares,gsize,gsize)
> plot(th1,th2,type="n",main="Error Sum of Squares Contours")
> contour(th1,th2,SumofSquares,levels=c(seq(1000,4000,200)))
What contour on this plot corresponds to an approximately 90% approximate
confidence region for the parameter vector θ?
By using the following command
> ss(theta)
[1] 1195.449
We see that the SSE computed from the function ss is the same as the SSE from the R output. Because SSE · exp(qchisq(0.9, 2)/12) = 1754.679, the approximate 90% confidence region for θ is the area within the contour labeled 1800 in the plot.
[Figure 4: Contour plot of the likelihood function.]
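The cutoff used above follows the lecture-note formula SSE · exp(χ²_{2,0.90}/n); a quick check of the arithmetic (SSE and n taken from this problem):

```r
## Cutoff for the approximate 90% likelihood-based confidence region:
## SSE * exp(qchisq(0.90, 2) / n) with SSE = 1195.449 and n = 12
SSE <- 1195.449
cutoff <- SSE * exp(qchisq(0.90, df = 2) / 12)
round(cutoff, 3)   # approximately 1754.679, just inside the contour labeled 1800
```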
(g) Now redo the contour plotting, placing only two contours on the plot using
the following code.
> plot(th1,th2,type="n",main="Error Sum of Squares Contours")
> contour(th1,th2,SumofSquares,levels=dv*c((1+.1*qf(.95,1,10)),
(1+.2*qf(.95,2,10))))
Identify on this plot an approximately 95% (joint) confidence region for θ and individual 95% confidence intervals for θ1 and θ2 .
[Figure 5: Contour plot of confidence regions, with contours labeled 1788.942 and 2176.391.]
The approximately 95% (joint) confidence region for θ is the contour labeled 2176.391. The individual 95% confidence interval for θ1 is determined by the points where the solid line crosses the inner contour in Figure 5, and the individual 95% confidence interval for θ2 by the points where the dashed line crosses the inner contour. Note that the solid and dashed lines are drawn, respectively, at the maximum likelihood estimates of θ2 and θ1 . The confidence intervals can be found with the following functions.
> CI4th1<-function(th1)
+ {return(ss(c(th1,theta[2]))-dv*(1+.1*qf(.95,1,10)))}
> uniroot(CI4th1,c(200,210))
$root
[1] 202.7161
$f.root
[1] -0.002613181
> uniroot(CI4th1,c(220,230))
$root
[1] 222.6512
$f.root
[1] -0.0009634489
> CI4th2<-function(th2)
+ {return(ss(c(theta[1],th2))-dv*(1+.1*qf(.95,1,10)))}
> uniroot(CI4th2,c(0.05,0.06))
$root
[1] 0.05294322
$f.root
[1] 0.5348936
> uniroot(CI4th2,c(0.07,0.08))
$root
[1] 0.07733102
$f.root
[1] 0.8202585
Therefore the individual 95% CI for θ1 is (202.7161,222.6512) and the individual CI for θ2 is (0.0529, 0.0773).
(h) Use the standard errors for the estimates of the coefficients produced by the
routine nls() and make 95% t intervals for θ1 and θ2 . How much different are
these from your intervals in g)? (Notice that the sample size in this problem is
small and reliance on any version of large sample theory to support inferences
is tenuous.)
> theta[1]-qt(0.975,10)*6.947
theta1
197.2048
> theta[1]+qt(0.975,10)*6.947
theta1
228.1625
> theta[2]-qt(0.975,10)*8.281e-03
theta2
0.0456699
> theta[2]+qt(0.975,10)*8.281e-03
theta2
0.08257233
The 95% CI for θ1 is (197.2048,228.1625) and the 95% CI for θ2 is (0.0456,0.0826).
The Wald-type confidence intervals based on t-distributions in part (h) are wider than those in part (g).
(i) Make two different approximate 95% confidence intervals for σ 2 . One based
on carrying over the linear model result that SSE/σ 2 ∼ χ2n−p and the other
based on the “profile likelihood” method given in the lecture notes.
> SSE<-deviance(REACT.fm)
> SSE/qchisq(0.975,10)
[1] 58.36247
> SSE/qchisq(0.025,10)
[1] 368.1733
Thus the confidence interval based on the approximate chi-square distribution is (58.362, 368.173). The “profile likelihood” type confidence interval is

{σ² : −(12/2) log(σ²) − SSE/(2σ²) > −(12/2) log(SSE/12) − 6 − χ²_{1,0.95}/2}.
> CI4sigma<-function(sigma)
+ {return(-6*log(sigma)-SSE/(2*sigma)+6*log(SSE/12)+6+.5*qchisq(0.95,1))}
> uniroot(CI4sigma,c(40,100))
$root
[1] 49.16228
$f.root
[1] -1.55848e-07
> uniroot(CI4sigma,c(100,300))
$root
[1] 250.6437
$f.root
[1] 4.149792e-12
From the output of above R code, we find that the 95% “profile likelihood”
type confidence interval is (49.162,250.644).
(j) Use the R function confint() to get 95% intervals for θ1 and θ2 . That is, add
the MASS package in order to get access to the function. Then type
> confint(REACT.fm, level=.95)
How do these intervals compare to the ones you found in part (g)?
> confint(REACT.fm, level=.95)
Waiting for profiling to be done...
1301.902 : 0.06412111
1237.844 : 0.0601934
1237.844 : 0.06019309
...
2159.605 : 229.5254
2584.938 : 233.3091
2584.839 : 233.1675
             2.5%        97.5%
theta1 197.30212811 229.29006457
theta2   0.04692517   0.08615995
The intervals found above are wider than the intervals found in part (g).
(k) Scientific theory suggests that treated enzyme will have the same value of θ2
as does untreated enzyme, but that θ1 may change with treatment. That is, if
Zi = 0 if treated (Puromycin is used), and Zi = 1 otherwise,
a possible model is
Yi = (θ1 + θ3 Zi )Xi /(θ2 + Xi ) + εi .
and the parameter θ3 then measures the effect of the treatment. Go back to
the data table and now do a fit of the above (3 parameter) nonlinear model
including a possible Puromycin effect using all 23 data points. Make two different approximately 95% confidence intervals for θ3 . Interpret these. (Do they indicate a statistically detectable effect? If so, what does the sign say about how treatment affects the relationship between x and y?) Plot the following curves on the same set of axes

y = θ̂1 x/(θ̂2 + x)   and   y = (θ̂1 + θ̂3 )x/(θ̂2 + x)

for 0 < x < 2.
> ally<-c(c(67, 51, 84, 86, 98, 115, 131, 124, 144, 158, 160),y)
> allx<-c(x[-12],x)
> allz<-c(rep(1,11),rep(0,12))
> new.REACT.fm<-nls(formula=ally~(theta1+theta3*allz)*allx/(theta2+allx),
start=c(theta1=200,theta2=0.05,theta3=100),trace=T)
110154 : 2e+02 5e-02 1e+02
2327.989 : 205.23857904   0.05317497 -39.75080557
2242.586 : 208.10507238   0.05721579 -41.84677975
2240.917 : 208.56209653   0.05787892 -42.00362510
2240.892 : 208.62193926   0.05796084 -42.02328299
2240.891 : 208.62911125   0.05797054 -42.02565604
2240.891 : 208.62995759   0.05797168 -42.02593617
> summary(new.REACT.fm)
Formula: ally ~ (theta1 + theta3 * allz) * allx/(theta2 + allx)
Parameters:
         Estimate Std. Error t value Pr(>|t|)
theta1  208.62996    5.80399  35.946  < 2e-16 ***
theta2    0.05797    0.00591   9.809 4.37e-09 ***
theta3  -42.02594    6.27214  -6.700 1.61e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 10.59 on 20 degrees of freedom
Number of iterations to convergence: 6
Achieved convergence tolerance: 5.153e-06
> -42.02594-qt(0.975,23-3)*6.27214
[1] -55.10939
> -42.02594+qt(0.975,23-3)*6.27214
[1] -28.94249
> confint(new.REACT.fm)
Waiting for profiling to be done...
2357.707 : 0.05797168 -42.02593617
2272.785 : 0.05573146 -40.16015707
...
2781.752 : 201.91202436   0.05738586
3085.913 : 200.31354346   0.05736048
3085.894 : 200.35964778   0.05743236
3085.894 : 200.36525319   0.05744097
3085.894 : 200.365923     0.057442
             2.5%        97.5%
theta1 196.39379474 221.50898562
theta2   0.04599081   0.07234273
theta3 -55.19924353 -28.95657688
The Wald-type 95% CI for θ3 is (−55.109, −28.942). The profile likelihood method gives a 95% CI of (−55.199, −28.957). Because neither interval includes 0, we can conclude that θ3 is not 0 and there is a statistically significant Puromycin effect. Since both limits for θ3 are negative and Zi = 1 for the untreated group, the treatment significantly increases the expected reaction velocity.
> conc<-seq(0,1.5,.05)
> velocity1<-coef(new.REACT.fm)[1]*conc/(coef(new.REACT.fm)[2]+conc)
> velocity2<-(coef(new.REACT.fm)[1]
+ +coef(new.REACT.fm)[3])*conc/(coef(new.REACT.fm)[2]+conc)
> plot(c(0,1.5),c(0,250),type="n",xlab="Conc (ppm)",ylab="Vel (counts/sqmin)")
> points(allx,ally)
> lines(conc,velocity1)
> lines(conc,velocity2,lty=2)
[Figure 6: Plot of concentration vs reaction velocity with superimposed plots of fitted regression functions, for the Treated (solid) and Untreated (dashed) groups.]