Key notes on solutions to assignment 3:

Problem 1.
You need to simulate the data sets yourself. Example code for fitting non-parametric
models:
library(splines)  # provides ns()
# x is assumed to be sorted in increasing order, so that lines() draws a clean curve.
# Legend positions such as (40, -100) are in data coordinates; adjust them to your data.
par(mfrow = c(3,2))
plot(x, y, main = "Polynomial Regression")
lines(x, fitted(lm(y ~ poly(x,3))))
lines(x, fitted(lm(y ~ poly(x,6))), lty = 3)
legend(40, -100, c("degree = 3", "degree = 6"), lty = c(1,3), bty ="n")
plot(x, y, main = "Natural/general splines")
lines(x, fitted(lm(y ~ ns(x,df = 5))))
lines(x, fitted(lm(y ~ ns(x, df = 10))), lty = 3)
lines(x, fitted(lm(y ~ ns(x, df = 20))), lty = 4)
legend(40, -100, c("df = 5", "df = 10", "df = 20"), lty = c(1,3,4), bty= "n")
plot(x, y, main = "Smoothing splines")
lines(smooth.spline(x,y))
plot(x, y, main = "Lowess")
lines(lowess(x,y))
lines(lowess(x,y, f = 0.2), lty = 3)
legend(40, -100, c("default span", "f = 0.2"), lty = c(1,3), bty = "n")
plot(x, y, main = "ksmooth(kernel)")
lines(ksmooth(x,y, kernel = "normal", bandwidth = 5))
lines(ksmooth(x,y, kernel = "normal", bandwidth = 2), lty = 3)
legend(40, -100, c("bandwidth = 5", "bandwidth = 2"), lty = c(1,3), bty = "n")
plot(x, y, main = "supsmu/k-NN")
lines(supsmu(x,y))
lines(supsmu(x,y, bass = 8.2), lty = 3)
legend(40, -100, c("default", "bass = 8.2"), lty = c(1,3), bty = "n")
Most of you did a pretty good job on this problem. The only issue that caught my
attention is that some of you did not try changing the tuning parameters (i.e.,
bandwidth, spline degrees of freedom, etc.) to get a better fit when the curve fitted
with the defaults was not good.
For either set of simulated data, we should get good fitted curves using any of
the smoothing methods. In the second set, the variance is bigger. Pay attention to the
appearance of the fitted curves and to any differences between the two sets.
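
As an illustration of the tuning point above, here is a minimal sketch in R. The true curve `f`, the sample size, the noise levels, and the seed are all assumptions made for this example; they are not the assignment's actual simulation settings.

```r
set.seed(1)                          # hypothetical seed, for reproducibility
n <- 200
x <- sort(runif(n, 0, 10))           # sorted so that lines() draws left to right
f <- function(x) sin(x) + 0.5 * x    # hypothetical true curve
y1 <- f(x) + rnorm(n, sd = 0.3)      # first set: small variance
y2 <- f(x) + rnorm(n, sd = 1.0)      # second set: bigger variance

# Mean squared distance between a fitted curve and the true curve
mse <- function(fit) mean((fit - f(x))^2)

# Lowess on the noisier set: default span (f = 2/3) versus a smaller span
fit_default <- lowess(x, y2)$y
fit_small   <- lowess(x, y2, f = 0.2)$y
errs <- c(default = mse(fit_default), small_span = mse(fit_small))
print(errs)

par(mfrow = c(1, 2))
plot(x, y1, main = "Small variance")
lines(lowess(x, y1, f = 0.2))
plot(x, y2, main = "Bigger variance")
lines(lowess(x, y2))                 # default span
lines(lowess(x, y2, f = 0.2), lty = 3)
```

In a simulation the true curve is known, so comparing each fit against it (as `mse` does here) is one simple way to judge whether changing a tuning parameter helped.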

Problem 2.
There is no unique model solution to this problem. Reasonable linear models are
(partial list):
Log(GAG) ~ a + b * AGE + e;
Log(GAG) ~ a + b * sqrt(AGE) + e;
Log(GAG) ~ polynomial in AGE + e [note: polynomial regression is usually regarded as linear
regression, since it is linear in the regression parameters]
A lot of you used the non-linear regression model:
Log(GAG) ~ a * exp(b + c* AGE) + e
which seems to give a reasonably well-fitted curve.
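
A minimal sketch of fitting an exponential model of this kind with `nls`, on simulated data (the data, the parameter names `A` and `k`, and the starting values are assumptions for illustration). Since a * exp(b + c * AGE) = (a * exp(b)) * exp(c * AGE), the parameters a and b enter only through the product a * exp(b), so the sketch fits that product as a single scale parameter `A`.

```r
set.seed(4)                                              # hypothetical seed
age <- sort(runif(150, 0, 17))                           # hypothetical ages
log_gag <- 3 * exp(-0.2 * age) + rnorm(150, sd = 0.2)    # hypothetical response
dat <- data.frame(AGE = age, logGAG = log_gag)

# A plays the role of a * exp(b); k plays the role of c
fit <- nls(logGAG ~ A * exp(k * AGE), data = dat,
           start = list(A = 2, k = -0.1))                # rough starting values
print(coef(fit))

plot(dat$AGE, dat$logGAG, xlab = "AGE", ylab = "Log(GAG)")
lines(dat$AGE, fitted(fit))                              # fitted non-linear curve
```

Fitting all three of a, b and c at once would leave `nls` without a unique solution, which is why the sketch uses the reduced form.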
You can use any smoothing method to fit a non-parametric curve.
I was hoping that, in the prediction chart, you would produce a prediction line together
with pointwise prediction intervals (bands), something like the following (for the
linear model Log(GAG) ~ AGE):
[Chart omitted: prediction line with pointwise prediction bands.]
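
A chart of this kind could be produced along the following lines. This is a sketch on simulated data; the data frame `dat`, its columns, and the 95% level are assumptions for illustration, not the assignment data.

```r
set.seed(2)                                          # hypothetical seed
age <- sort(runif(100, 0, 17))                       # hypothetical ages
log_gag <- 3 - 0.1 * age + rnorm(100, sd = 0.3)      # hypothetical log response
dat <- data.frame(AGE = age, logGAG = log_gag)

fit <- lm(logGAG ~ AGE, data = dat)

# Predict on a grid of ages; interval = "prediction" returns a matrix
# with columns fit (prediction), lwr and upr (pointwise interval limits)
new <- data.frame(AGE = seq(0, 17, length.out = 50))
pred <- predict(fit, newdata = new, interval = "prediction", level = 0.95)

plot(dat$AGE, dat$logGAG, xlab = "AGE", ylab = "Log(GAG)")
lines(new$AGE, pred[, "fit"])                        # prediction line
lines(new$AGE, pred[, "lwr"], lty = 2)               # lower prediction band
lines(new$AGE, pred[, "upr"], lty = 2)               # upper prediction band
```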
However, only a handful of people tried to do that. For the linear regression model,
prediction and prediction intervals should have been taught in the Regression class
(STAT 563). Although we didn’t teach them, the idea and concept of prediction and
prediction intervals for non-linear and non-parametric models are the same, but
more complicated. Some R/S-PLUS functions (some in the MASS library), e.g., “predict”,
“predict.loess”, “predict.gam”, “predict.nls”, can provide standard errors (se) for
the predictions. These can help you get the pointwise prediction intervals.
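
For a non-parametric fit, one way to turn such standard errors into a pointwise band is sketched below with `loess`. The simulated data, the span, and the 95% level are assumptions for illustration. The interval formula, which combines the standard error of the fit with the residual scale, is an approximation chosen for this sketch rather than the only possible construction.

```r
set.seed(3)                                              # hypothetical seed
age <- sort(runif(150, 0, 17))                           # hypothetical ages
log_gag <- 3 * exp(-0.2 * age) + rnorm(150, sd = 0.3)    # hypothetical response
dat <- data.frame(AGE = age, logGAG = log_gag)

fit <- loess(logGAG ~ AGE, data = dat, span = 0.5)

# Stay inside the observed range: loess does not extrapolate by default
new <- data.frame(AGE = seq(min(age), max(age), length.out = 50))
p <- predict(fit, newdata = new, se = TRUE)
# p$fit: fitted values; p$se.fit: standard errors of the fitted curve;
# p$residual.scale: estimated residual standard deviation; p$df: degrees of freedom

# Approximate prediction standard error: curve uncertainty plus noise of a new point
se_pred <- sqrt(p$se.fit^2 + p$residual.scale^2)
tq <- qt(0.975, p$df)
lower <- p$fit - tq * se_pred
upper <- p$fit + tq * se_pred

plot(dat$AGE, dat$logGAG, xlab = "AGE", ylab = "Log(GAG)")
lines(new$AGE, p$fit)
lines(new$AGE, lower, lty = 2)
lines(new$AGE, upper, lty = 2)
```

Using `p$se.fit` alone would give a confidence band for the curve itself, which is narrower than a band meant to cover new observations.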