STT 864 KEY TO HOMEWORK 2

1. Consider the simulation example given in Example 2 of lecture notes 4 on the class website. Namely, we generated data according to the following model

Yi = sin{2π(β1 + β2 Xi²)} + εi,  for i = 1, · · · , n.   (1)

To be specific, in this simulation, set β1 = 2 and β2 = 1.2, generate Xi from a Uniform(0,1) distribution, and generate εi from N(0, σ²) with σ² = 1.5.

(a) Please give a sufficient condition so that β1 and β2 are identifiable. Provide your reasons.

To make β1 and β2 identifiable, we need conditions such that, if sin{2π(β1^(1) + β2^(1) x²)} = sin{2π(β1^(2) + β2^(2) x²)} for every x, then β1^(1) = β1^(2) and β2^(1) = β2^(2). First, we notice that if β2^(1) ≠ β2^(2), there always exists some value x for which sin{2π(β1^(1) + β2^(1) x²)} ≠ sin{2π(β1^(2) + β2^(2) x²)}, which would violate the assumption. Therefore, no extra condition on β2 is needed. Now we only need to consider the identification of β1. Taking x = 0 gives sin{2πβ1^(1)} = sin{2πβ1^(2)}. If we restrict β1^(1) and β1^(2) to lie in (1.75, 2.25), then β1^(1) must equal β1^(2). Therefore, a sufficient condition for β1 and β2 to be identifiable is β1 ∈ (1.75, 2.25).

(b) First, set a seed for your simulation: for convenience, please use your PID as the seed. For example, if your PID is "A79235030", then set the seed with "set.seed(79235030)" to generate your own data set. By setting the seed, we are able to reproduce the same data set in the simulation. Second, generate n = 100 data points (Xi, Yi) according to model (1).

The following R code can be used to generate the data set:

set.seed(79235030)
n<-100
beta1<-2
beta2<-1.2
X<-runif(n,0,1)
epsilon<-rnorm(n,0,sqrt(1.5))
Y<-sin(2*pi*(beta1+beta2*X))+epsilon

(c) Find one algorithm that will help you find the least squares estimators of β = (β1, β2)^T. Please give the details of the algorithm and implement your proposed algorithm in R. You cannot use any built-in optimization function in R (for example, "nls" or "optim", ...).

If the built-in R function were allowed, the estimates could be obtained as follows:

nlsfit<-nls(Y~sin(2*pi*(beta1+beta2*X)),
            start=list(beta1=2,beta2=1),
            lower=c(1.75,0),upper=c(2.25,5),algorithm="port")

For this question, there are many possible algorithms. As an example, we consider a coordinate descent algorithm with the following steps:

1. Choose initial values for β1 and β2, and denote them by β1^(0) and β2^(0).

2. At the k-th step (k = 1, 2, · · · ), fix β2 at β2^(k−1) and minimize SSE(β1) as a function of β1 using a one-dimensional binary-type search (a hand-coded search sketch is given after these steps), where

SSE(β1) = Σ_{i=1}^{n} {Yi − sin(2π(β1 + β2^(k−1) Xi²))}².

Denote the minimizer by β1^(k).

3. Then fix β1 at β1^(k) and minimize the following function of β2 using the Newton algorithm:

SSE(β2) = Σ_{i=1}^{n} {Yi − sin(2π(β1^(k) + β2 Xi²))}².

Denote the minimizer by β2^(k), and set k = k + 1.

4. Repeat Steps 2 and 3 until convergence.
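Since the problem rules out built-in optimizers, the one-dimensional search in Step 2 can also be coded by hand. The following is a minimal sketch (not part of the original key) of a golden-section search, a close cousin of the bisection-type search mentioned in Step 2, over the interval (1.75, 2.25) from part (a); the helper name golden_min is hypothetical.

## Minimal golden-section search over [lower, upper]; assumes the objective
## f is unimodal on that interval (illustration only).
golden_min<-function(f,lower,upper,tol=1e-6)
{
  r<-(sqrt(5)-1)/2                  # golden-ratio factor, about 0.618
  a<-lower; b<-upper
  x1<-b-r*(b-a)
  x2<-a+r*(b-a)
  while (b-a>tol) {
    if (f(x1)<f(x2)) {              # minimum lies in [a, x2]
      b<-x2; x2<-x1; x1<-b-r*(b-a)
    } else {                        # minimum lies in [x1, b]
      a<-x1; x1<-x2; x2<-a+r*(b-a)
    }
  }
  return((a+b)/2)
}
## In Step 2 one could then replace the optimize() call below with, e.g.,
## beta1new<-golden_min(SSE1,1.75,2.25)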
You could implement the above algorithm as follows:

## Step 1: Choose initial values for beta1 and beta2
beta10<-1.9
beta20<-0.8
iter<-0
repeat {
  ## Step 2: minimize SSE(beta1) with respect to beta1, with beta2 fixed
  SSE1<-function(beta1)
  {
    SSE<-sum((Y-sin(2*pi*(beta1+beta20*X)))^2)
    return(SSE)
  }
  ## The following line uses R's optimize() to carry out the one-dimensional
  ## search for the minimum of SSE1(beta1) on the interval (1.75, 2.25)
  beta1new<-optimize(SSE1,interval=c(1.75,2.25))$minimum
  ## Step 3: minimize SSE(beta2) with respect to beta2, with beta1 fixed
  ## (a single Gauss-Newton-type update)
  Db<-cos(2*pi*(beta1new+beta20*X))*2*pi*X
  beta2new<-beta20+t(Db)%*%(Y-sin(2*pi*(beta1new+beta20*X)))/(t(Db)%*%Db)
  iter<-iter+1
  if ((beta10-beta1new)^2+(beta20-beta2new)^2<1e-5||iter>30) break
  beta10<-beta1new
  beta20<-beta2new
}
betahat<-c(beta1new,beta2new)

(d) Plot the estimated curve sin{2π(β̂1 + β̂2 Xi²)} as a function of Xi, and overlay the curve with the scatter plot of Yi versus Xi.

We can plot the estimated curve and the scatter plot using the following commands:

beta1<-betahat[1]
beta2<-betahat[2]
plot(X,Y)
points(X, sin(2*pi*(beta1+beta2*X)),col=2)

The plot is given below.

Figure 1: Plot of X versus Y.

The plot shows that the estimated curve fits the data set well.

2. The book Nonlinear Regression Analysis and its Applications by Bates and Watts contains a small data set taken from an MS thesis of M. A. Treloar, "Effects of Puromycin on Galactosyltransferase of Golgi Membranes." It is reproduced below. y is reaction velocity (in counts/min²) for an enzymatic reaction and x is substrate concentration (in ppm) for untreated enzyme and enzyme treated with Puromycin.

   x      y, Untreated    y, Treated
  0.02    67, 51           76, 47
  0.06    84, 86           97, 107
  0.11    98, 115         123, 139
  0.22   131, 124         159, 152
  0.56   144, 158         191, 201
  1.10   160              207, 200

Every cell in the above table contains two response values except the cell corresponding to x = 1.10 for the untreated group. A standard model for either the untreated enzyme or for the treated enzyme is the "Michaelis-Menten model"

Yi = θ1 Xi / (θ2 + Xi) + εi,   (2)

where the εi are assumed to be iid N(0, σ²). Note that in this model, (1) the mean of y is 0 when x = 0, (2) the limiting (large x) mean of y is θ1, and (3) the mean of y reaches half of its limiting value when x = θ2. We begin by considering only the "Treated" part of the data set. Use R to help you do all that follows. Begin by reading in 12 × 1 vectors y and x.

(a) Plot y vs x and make "eye-estimates" of the parameters based on your plot and the interpretations of the parameters offered above. (Your eye-estimate of θ1 is what looks like a plausible limiting value for y, and your eye-estimate of θ2 is a value of x at which y has achieved half its maximum value.)

Use the following commands to plot y vs x:

> x<-rep(c(0.02,0.06,0.11,0.22,0.56,1.10),each=2)
> y<-c(76, 47, 97, 107, 123, 139, 159, 152, 191, 201, 207, 200)
> plot(x,y,xlab="Conc (ppm)",ylab="Vel (counts/sqmin)")

It can be seen from Figure 2 that the "eye-estimate" of θ1 is around 200 and the "eye-estimate" of θ2 is around 0.08.

(b) Add the nls package to your R environment.
Then issue the command

> REACT.fm<-nls(formula=y~theta1*x/(theta2+x),
+ start=c(theta1=#,theta2=##),trace=T)

where in place of # and ## you enter your eye-estimates from (a). This will fit the nonlinear model (2) via least squares. What are the least squares estimate of the parameter vector θ and the sum of squares of errors (SSE),

θ̂_OLS = (θ̂1, θ̂2)^T   and   SSE = Σ_{i=1}^{12} {Yi − θ̂1 Xi/(θ̂2 + Xi)}² ?

Figure 2: Scatter plot of concentration vs reaction velocity.

> REACT.fm<-nls(formula=y~theta1*x/(theta2+x),
+ start=c(theta1=200,theta2=0.08),trace=T)
4235.266 : 2e+02 8e-02
1211.965 : 213.02889399 0.06246453
1195.504 : 212.57333382 0.06393501
1195.449 : 212.67212463 0.06410312
1195.449 : 212.68261888 0.06411953
1195.449 : 212.68363515 0.06412111
> deviance(REACT.fm)
[1] 1195.449

From the above R output, the least squares estimate of θ is θ̂ = (212.684, 0.064)^T and the SSE is 1195.449.

(c) Re-plot the original data with a superimposed plot of the fitted equation. To do this, you may type

> conc<-seq(0,1.5,.05)
> velocity<-coef(REACT.fm)[1]*conc/(coef(REACT.fm)[2]+conc)
> plot(c(0,1.5),c(0,250),type="n",xlab="Conc (ppm)",ylab="Vel (counts/sqmin)")
> points(x,y)
> lines(conc,velocity)

(The first two commands set up vectors of points on the fitted curve. The third creates an empty plot with axes appropriately labeled. The fourth plots the original data and the fifth plots line segments between points on the fitted curve.)

The plot is given in Figure 3.

Figure 3: Plot of concentration vs reaction velocity with a superimposed plot of the fitted regression function.

(d) Get more complete information on the fit by typing

> summary(REACT.fm)
> vcov(REACT.fm)

Verify that the output of this last call is MSE (D̂^T D̂)^{-1} and that the standard errors produced by the first are square roots of the diagonal elements of this matrix. Then use the information produced here to make an approximate 95% prediction interval for one additional reaction velocity, for substrate concentration .50 ppm.

> summary(REACT.fm)

Formula: y ~ theta1 * x/(theta2 + x)

Parameters:
        Estimate Std. Error t value Pr(>|t|)
theta1 2.127e+02  6.947e+00  30.615 3.24e-11 ***
theta2 6.412e-02  8.281e-03   7.743 1.57e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 10.93 on 10 degrees of freedom

Number of iterations to convergence: 5
Achieved convergence tolerance: 5.798e-06

> vcov(REACT.fm)
            theta1       theta2
theta1 48.26288310 4.401439e-02
theta2  0.04401439 6.857383e-05

The matrix D̂ in this example is a 12 × 2 matrix with i-th row equal to (Xi/(θ̂2 + Xi), −θ̂1 Xi/(θ̂2 + Xi)²). Next, we verify the vcov(REACT.fm) output and the standard errors of θ̂; the verification code follows the short aside below.
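As an optional aside (not part of the original key), the two columns of D̂ can also be generated symbolically with R's deriv(); the hand-coded D1 and D2 in the verification code that follows should match the two columns of this gradient matrix.

## Symbolic gradient of the Michaelis-Menten mean function (a cross-check only)
grad.fn<-deriv(~theta1*x/(theta2+x),c("theta1","theta2"),
               function.arg=c("x","theta1","theta2"))
Dhat<-attr(grad.fn(x,coef(REACT.fm)[1],coef(REACT.fm)[2]),"gradient")
## Dhat is 12 x 2, with columns d/dtheta1 and d/dtheta2 evaluated at the estimates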
> D1<-x/(x+coef(REACT.fm)[2])
> D2<--coef(REACT.fm)[1]*x/((coef(REACT.fm)[2]+x)^2)
> D<-cbind(D1,D2)
> solve(t(D)%*%D)*deviance(REACT.fm)/10
            D1           D2
D1 48.26288272 4.401439e-02
D2  0.04401439 6.857383e-05
> sqrt(48.26288310)
[1] 6.947149
> sqrt(6.857383e-05)
[1] 0.008280932

According to the notes, the prediction interval can be computed as follows:

> Ghat<-c(0.5/(0.5+coef(REACT.fm)[2]),
+ -coef(REACT.fm)[1]*0.5/((coef(REACT.fm)[2]+0.5)^2))
> RMSE<-sqrt(deviance(REACT.fm)/10)
> PSE<-sqrt(1+t(Ghat)%*%solve(t(D)%*%D)%*%Ghat)
> PredLb<-coef(REACT.fm)[1]*0.5/(coef(REACT.fm)[2]+0.5)-PSE*RMSE*qt(0.975,10)
> PredUb<-coef(REACT.fm)[1]*0.5/(coef(REACT.fm)[2]+0.5)+PSE*RMSE*qt(0.975,10)

This produces an approximate 95% prediction interval for one additional reaction velocity at concentration 0.5 ppm of (162.2353, 214.7824).

(e) The concentration, say x100, at which mean reaction velocity is 100 counts/min² is a function of θ1 and θ2. Find a sensible point estimate of x100 and a standard error (estimated standard deviation) for your estimate.

Setting the expectation of Y to 100, namely θ1 x/(θ2 + x) = 100, we obtain

x100 = 100 θ2 / (θ1 − 100).

Then an estimate of x100 is x̂100 = 100 θ̂2/(θ̂1 − 100) = 100 × 0.064/(212.684 − 100) = 0.057.

> Ghat2<-c(-100*coef(REACT.fm)[2]/((coef(REACT.fm)[1]-100)^2),
+ 100/(coef(REACT.fm)[1]-100))
> RMSE<-sqrt(deviance(REACT.fm)/10)
> PSE<-sqrt(t(Ghat2)%*%solve(t(D)%*%D)%*%Ghat2)
> SE<-PSE*RMSE

Using the above R code, we obtain the standard error of x̂100 as 0.00518.

(f) As a means of visualizing what function the R routine nls minimized in order to find the least squares coefficients, do the following. First set up a grid of (θ1, θ2) pairs as follows. Type

> theta<-coef(REACT.fm)
> se<-sqrt(diag(vcov(REACT.fm)))
> dv<-deviance(REACT.fm)
> gsize<-101
> th1<-theta[1]+seq(-4*se[1],4*se[1],length=gsize)
> th2<-theta[2]+seq(-4*se[2],4*se[2],length=gsize)
> th<-expand.grid(th1,th2)

Then create a function to evaluate the sums of squares

> ss<-function(t)
+ {
+ return(sum((y-t[1]*x/(t[2]+x))^2))
+ }

As a check to see that you have it programmed correctly, evaluate this function at θ̂_OLS for the data in hand, and verify that you get the SSE. Then evaluate the SSE over the grid of parameter vectors θ set up earlier and produce a contour plot using

> SumofSquares<-apply(th,1,ss)
> SumofSquares<-matrix(SumofSquares,gsize,gsize)
> plot(th1,th2,type="n",main="Error Sum of Squares Contours")
> contour(th1,th2,SumofSquares,levels=c(seq(1000,4000,200)))

What contour on this plot corresponds to an approximately 90% approximate confidence region for the parameter vector θ?

Using the command

> ss(theta)
[1] 1195.449

we see that the SSE computed from the function ss is the same as the SSE from the R output. Because the likelihood-based region {θ : SSE(θ) ≤ SSE(θ̂) exp(χ²_{2,0.90}/12)} has cutoff 1195.449 × exp(qchisq(0.9, 2)/12) = 1754.679, the approximate 90% confidence region for θ is the area within the contour labeled 1800 in the above plot.

Figure 4: Contour plot of the error sum of squares.

(g) Now redo the contour plotting, placing only two contours on the plot using the following code.

> plot(th1,th2,type="n",main="Error Sum of Squares Contours")
> contour(th1,th2,SumofSquares,levels=dv*c((1+.1*qf(.95,1,10)),
+ (1+.2*qf(.95,2,10))))

Identify on this plot an approximately 95% (joint) confidence region for θ and individual 95% confidence intervals for θ1 and θ2.
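For reference (a short aside, not part of the original key): the two levels passed to contour() are the F-based cutoffs SSE(θ̂){1 + (q/10) F_{q,10,0.95}}, where 10 = n − 2 is the residual degrees of freedom, q = 1 gives the level used for the individual intervals, and q = 2 gives the level used for the joint region. Evaluating them reproduces the contour labels that appear in Figure 5 below.

> dv*(1+.1*qf(.95,1,10))
[1] 1788.942
> dv*(1+.2*qf(.95,2,10))
[1] 2176.391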
Figure 5: Contour plot of the confidence regions (contour levels 1788.942 and 2176.391).

The approximate 95% (joint) confidence region for θ is the region inside the contour labeled 2176.391. The individual 95% confidence interval for θ1 is determined by the crossing points between the solid line and the inner contour in Figure 5, and the individual 95% confidence interval for θ2 is determined by the crossing points between the dashed line and the inner contour. Note that the solid and dashed lines were drawn at the maximum likelihood estimates of θ2 and θ1, respectively. The confidence intervals can be found with the following functions.

> CI4th1<-function(th1)
+ {return(ss(c(th1,theta[2]))-dv*(1+.1*qf(.95,1,10)))}
> uniroot(CI4th1,c(200,210))
$root
[1] 202.7161
$f.root
[1] -0.002613181
> uniroot(CI4th1,c(220,230))
$root
[1] 222.6512
$f.root
[1] -0.0009634489
> CI4th2<-function(th2)
+ {return(ss(c(theta[1],th2))-dv*(1+.1*qf(.95,1,10)))}
> uniroot(CI4th2,c(0.05,0.06))
$root
[1] 0.05294322
$f.root
[1] 0.5348936
> uniroot(CI4th2,c(0.07,0.08))
$root
[1] 0.07733102
$f.root
[1] 0.8202585

Therefore the individual 95% CI for θ1 is (202.7161, 222.6512) and the individual 95% CI for θ2 is (0.0529, 0.0773).

(h) Use the standard errors for the estimates of the coefficients produced by the routine nls() and make 95% t intervals for θ1 and θ2. How much different are these from your intervals in (g)? (Notice that the sample size in this problem is small and reliance on any version of large sample theory to support inferences is tenuous.)

> theta[1]-qt(0.975,10)*6.947
  theta1
197.2048
> theta[1]+qt(0.975,10)*6.947
  theta1
228.1625
> theta[2]-qt(0.975,10)*8.281e-03
   theta2
0.0456699
> theta[2]+qt(0.975,10)*8.281e-03
    theta2
0.08257233

The 95% CI for θ1 is (197.2048, 228.1625) and the 95% CI for θ2 is (0.0457, 0.0826). The Wald-type confidence intervals based on the t-distribution in part (h) are wider than those in part (g).

(i) Make two different approximate 95% confidence intervals for σ². One based on carrying over the linear model result that SSE/σ² ∼ χ²_{n−p} and the other based on the "profile likelihood" method given in the lecture notes.

> SSE<-deviance(REACT.fm)
> SSE/qchisq(0.975,10)
[1] 58.36247
> SSE/qchisq(0.025,10)
[1] 368.1733

Thus the confidence interval based on the approximate chi-square distribution is (58.362, 368.173). The "profile likelihood" type confidence interval is

{σ² : −6 log(σ²) − SSE/(2σ²) > −6 log(SSE/12) − 6 − χ²_{1,0.95}/2},

which can be solved numerically:

> CI4sigma<-function(sigma)  ## the argument "sigma" here plays the role of sigma^2
+ {return(-6*log(sigma)-SSE/(2*sigma)+6*log(SSE/12)+6+.5*qchisq(0.95,1))}
> uniroot(CI4sigma,c(40,100))
$root
[1] 49.16228
$f.root
[1] -1.55848e-07
> uniroot(CI4sigma,c(100,300))
$root
[1] 250.6437
$f.root
[1] 4.149792e-12

From the output of the above R code, we find that the 95% "profile likelihood" type confidence interval is (49.162, 250.644).

(j) Use the R function confint() to get 95% intervals for θ1 and θ2. That is, add the MASS package in order to get access to the function. Then type

> confint(REACT.fm, level=.95)

How do these intervals compare to the ones you found in part (g)?

> confint(REACT.fm, level=.95)
Waiting for profiling to be done...
1301.902 : 0.06412111
1237.844 : 0.0601934
1237.844 : 0.06019309
...
2159.605 : 229.5254
2584.938 : 233.3091
2584.839 : 233.1675
               2.5%        97.5%
theta1 197.30212811 229.29006457
theta2   0.04692517   0.08615995

The intervals found above are wider than the intervals found in part (g). This is expected: part (g) fixed the other parameter at its estimate (a slice of the SSE surface), whereas confint() profiles the sum of squares, re-minimizing over the other parameter at each fixed value.
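To see where the widening comes from, the profiling can be mimicked directly: for each fixed θ1, minimize the SSE over θ2 and find where this profiled SSE crosses the same cutoff used in part (g). The following is a rough sketch (not part of the original key); it reuses ss() and dv from part (f), and the search and bracketing intervals are ad hoc guesses.

## Profile SSE for theta1: minimize over theta2 at each fixed theta1
profSSE1<-function(t1) optimize(function(t2) ss(c(t1,t2)),
                                interval=c(0.02,0.15))$objective
cutoff<-dv*(1+.1*qf(.95,1,10))
## Endpoints of the profile-based interval for theta1; these should land
## close to the confint() limits of roughly 197.3 and 229.3
uniroot(function(t1) profSSE1(t1)-cutoff,c(190,210))$root
uniroot(function(t1) profSSE1(t1)-cutoff,c(215,235))$root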
(k) Scientific theory suggests that treated enzyme will have the same value of θ2 as does untreated enzyme, but that θ1 may change with treatment. That is, if

Zi = 0 if treated (Puromycin is used), and Zi = 1 otherwise,

a possible model is

Yi = (θ1 + θ3 Zi) Xi / (θ2 + Xi) + εi,

and the parameter θ3 then measures the effect of the treatment. Go back to the data table and now do a fit of the above (3-parameter) nonlinear model including a possible Puromycin effect, using all 23 data points. Make two different approximately 95% confidence intervals for θ3. Interpret these. (Do they indicate a statistically detectable effect? If so, what does the sign say about how treatment affects the relationship between x and y?) Plot the following curves on the same set of axes for 0 < x < 2:

y = θ̂1 x/(θ̂2 + x)   and   y = (θ̂1 + θ̂3) x/(θ̂2 + x).

> ally<-c(c(67, 51, 84, 86, 98, 115, 131, 124, 144, 158, 160),y)
> allx<-c(x[-12],x)
> allz<-c(rep(1,11),rep(0,12))
> new.REACT.fm<-nls(formula=ally~(theta1+theta3*allz)*allx/(theta2+allx),
+ start=c(theta1=200,theta2=0.05,theta3=100),trace=T)
110154 : 2e+02 5e-02 1e+02
2327.989 : 205.23857904 0.05317497 -39.75080557
2242.586 : 208.10507238 0.05721579 -41.84677975
2240.917 : 208.56209653 0.05787892 -42.00362510
2240.892 : 208.62193926 0.05796084 -42.02328299
2240.891 : 208.62911125 0.05797054 -42.02565604
2240.891 : 208.62995759 0.05797168 -42.02593617
> summary(new.REACT.fm)

Formula: ally ~ (theta1 + theta3 * allz) * allx/(theta2 + allx)

Parameters:
        Estimate Std. Error t value Pr(>|t|)
theta1 208.62996    5.80399  35.946  < 2e-16 ***
theta2   0.05797    0.00591   9.809 4.37e-09 ***
theta3 -42.02594    6.27214  -6.700 1.61e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 10.59 on 20 degrees of freedom

Number of iterations to convergence: 6
Achieved convergence tolerance: 5.153e-06

> -42.02594-qt(0.975,23-3)*6.27214
[1] -55.10939
> -42.02594+qt(0.975,23-3)*6.27214
[1] -28.94249
> confint(new.REACT.fm)
Waiting for profiling to be done...
2357.707 : 0.05797168 -42.02593617
2272.785 : 0.05573146 -40.16015707
...
2781.752 : 201.91202436 0.05738586
3085.913 : 200.31354346 0.05736048
3085.894 : 200.35964778 0.05743236
3085.894 : 200.36525319 0.05744097
3085.894 : 200.365923 0.057442
               2.5%        97.5%
theta1 196.39379474 221.50898562
theta2   0.04599081   0.07234273
theta3 -55.19924353 -28.95657688

The Wald-type 95% CI for θ3 is (-55.109, -28.942), and the profile likelihood method gives a 95% CI of (-55.199, -28.957). Because neither interval includes 0, we conclude that θ3 is not 0 and that there is a statistically detectable Puromycin effect. Since both limits for θ3 are negative and Zi = 1 indicates the untreated enzyme, the untreated enzyme has a lower limiting velocity; that is, treating the enzyme with Puromycin significantly increases the expected reaction velocity.

> conc<-seq(0,1.5,.05)
> velocity1<-coef(new.REACT.fm)[1]*conc/(coef(new.REACT.fm)[2]+conc)
> velocity2<-(coef(new.REACT.fm)[1]
+ +coef(new.REACT.fm)[3])*conc/(coef(new.REACT.fm)[2]+conc)
> plot(c(0,1.5),c(0,250),type="n",xlab="Conc (ppm)",ylab="Vel (counts/sqmin)")
> points(allx,ally)
> lines(conc,velocity1)
> lines(conc,velocity2,lty=2)

Figure 6: Plot of concentration vs reaction velocity with superimposed plots of the fitted regression functions (solid: treated; dashed: untreated).
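As one more check on the conclusion that there is a detectable Puromycin effect (not part of the original key), the 3-parameter fit can be compared with the reduced fit that forces θ3 = 0 through an extra-sum-of-squares F test via anova(); a small p-value points to the same conclusion as the confidence intervals. A sketch, assuming ally, allx, and new.REACT.fm are still in the workspace:

## Reduced model with a common limiting velocity (theta3 = 0)
reduced.fm<-nls(formula=ally~theta1*allx/(theta2+allx),
                start=c(theta1=200,theta2=0.05))
## Extra-sum-of-squares F test of the reduced model against the full fit
anova(reduced.fm,new.REACT.fm)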