Stat 401B Fall 2015 Lab #12

There is an R code set below (also posted on the web page) that can be used to explore the behavior of kernel and locally weighted regression smoothing and the use of a generalized additive model. Use it and, by experimenting with different parameters, get a general idea of what is possible with these methods.
Code Set for Stat 401B Laboratory #12
##Below is some code for doing locally weighted regression smoothing
#using the loess function
#First make sure the stats and graphics packages are attached (both are
#part of base R and are usually loaded by default)
library(stats)
library(graphics)
x<-c(1,3,4,6,7,8,10,12)
y<-c(1,2,1,4,3,4,2,2)
plot(x,y)
yhat<-loess(y~x,span=2,degree=1)
#loess objects have no plot method; overlay the fitted values instead
lines(x,fitted(yhat),col=2)
#loess.smooth returns coordinates without plotting, so wrap it in lines()
lines(loess.smooth(x,y,span=5,degree=0,family="gaussian"),col=3)
#scatter.smooth draws the scatterplot and adds the smooth in one call
scatter.smooth(x,y,span=.5,degree=0,family="gaussian")
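Since the lab is about experimenting with the tuning parameters, one possibility (an addition, not part of the posted code; the span values here are arbitrary choices) is to overlay fits for several spans on one plot:
#Overlay degree-1 loess fits for several spans to see how the fraction
#of data used in each local fit changes the smooth
plot(x,y)
spans<-c(.5,.75,1,2)
for(i in seq_along(spans)){
  lines(loess.smooth(x,y,span=spans[i],degree=1,family="gaussian"),col=i+1)
}
legend("topleft",legend=spans,col=seq_along(spans)+1,lty=1,title="span")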
#This is a small bit of code for kernel smoothing with another toy example
x<-c(1,2,4,5,7,8,9,11,12)
y<-c(2,3,4,3,5,6,7,2,3)
plot(x,y)
lines(ksmooth(x,y,"normal",bandwidth=.5),col=2)
lines(ksmooth(x,y,"normal",bandwidth=1),col=3)
lines(ksmooth(x,y,"normal",bandwidth=2),col=4)
lines(ksmooth(x,y,"normal",bandwidth=4),col=5)
lines(ksmooth(x,y,"normal",bandwidth=5),col=6)
lines(ksmooth(x,y,"normal",bandwidth=10),col=8)
lines(ksmooth(x,y,"normal",bandwidth=20),col=9)
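A legend (an addition, not part of the original code) makes it easier to match the line colors to the bandwidths:
#Bandwidths and colors matching the ksmooth calls above
legend("topleft",legend=c(.5,1,2,4,5,10,20),col=c(2,3,4,5,6,8,9),lty=1,
title="bandwidth")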
#Here is a small bit of code for local regression smoothing with the
#second toy example. loess.control("direct") asks for the local fits to
#be computed exactly rather than interpolated from a fitted surface
plot(x,y)
locregress1<-loess(y~x,as.data.frame(cbind(x,y)),control=loess.control("direct"),
degree=0,span=.5)
fit.locregress1<-predict(locregress1,data.frame(x=seq(1,12,.05)))
lines(seq(1,12,.05),fit.locregress1,col=2)
locregress2<-loess(y~x,as.data.frame(cbind(x,y)),control=loess.control("direct"),
degree=0,span=.75)
fit.locregress2<-predict(locregress2,data.frame(x=seq(1,12,.05)))
lines(seq(1,12,.05),fit.locregress2,col=3)
locregress3<-loess(y~x,as.data.frame(cbind(x,y)),control=loess.control("direct"),
degree=0,span=1)
fit.locregress3<-predict(locregress3,data.frame(x=seq(1,12,.05)))
lines(seq(1,12,.05),fit.locregress3,col=4)
locregress4<-loess(y~x,as.data.frame(cbind(x,y)),control=loess.control("direct"),
degree=1,span=.5)
fit.locregress4<-predict(locregress4,data.frame(x=seq(1,12,.05)))
lines(seq(1,12,.05),fit.locregress4,col=5)
locregress5<-loess(y~x,as.data.frame(cbind(x,y)),control=loess.control("direct"),
degree=1,span=.75)
fit.locregress5<-predict(locregress5,data.frame(x=seq(1,12,.05)))
lines(seq(1,12,.05),fit.locregress5,col=6)
locregress6<-loess(y~x,as.data.frame(cbind(x,y)),control=loess.control("direct"),
degree=1,span=1)
fit.locregress6<-predict(locregress6,data.frame(x=seq(1,12,.05)))
lines(seq(1,12,.05),fit.locregress6,col=8)
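As with the kernel smooths, a legend (again an addition to the lab code) helps identify the six local regression fits:
#Degrees, spans, and colors matching the six loess fits above
legend("topleft",legend=c("deg 0, span .5","deg 0, span .75","deg 0, span 1",
"deg 1, span .5","deg 1, span .75","deg 1, span 1"),col=c(2,3,4,5,6,8),lty=1)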
#This is some code for a "Generalized Additive Model" for the Ames
#House Price problem
#First load the gam package
library(gam)
#The following calls fit "Generalized Additive Models" with various smooth terms
#In the formula, "." expands to all remaining columns of AmesHouse as
#linear terms
Price1<-gam(Price~s(Size)+s(Basement..Total.)+.,data=AmesHouse,family="gaussian")
summary(Price1)
Price1$coefficients
plot(Price1,ask=TRUE)
Price2<-gam(Price~s(Size)+s(Land)+s(Basement..Total.)+.,data=AmesHouse,
family="gaussian")
summary(Price2)
plot(Price2,ask=TRUE)
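One way to compare the two specifications, an addition to the lab code that assumes the fits are (approximately) nested, since the "." term already includes Land linearly, is an approximate F test between them:
#Approximate F test comparing the model with Land entering linearly to
#the one with a smooth term in Land
anova(Price1,Price2,test="F")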
#Now use the Air data frame
pressure1<-gam(Pressure~s(Freq)+s(Angle)+s(Chord)+s(Velocity)+s(Displace),
data=Air,family="gaussian")
summary(pressure1)
plot(pressure1,ask=TRUE)
pressure2<-gam(Pressure~s(Freq)+s(Displace)+.,data=Air,
family="gaussian")
summary(pressure2)
plot(pressure2,ask=TRUE)
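As a quick, informal check of fit (again an addition, not part of the posted code set), fitted values can be plotted against the observed responses:
#Fitted versus observed Pressure for the second model; points close to
#the 45-degree line indicate a good fit
plot(fitted(pressure2),Air$Pressure)
abline(0,1)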
Here are a couple of additional exercises for getting a feel for what kernel smoothing is doing.
1. Work out by hand what the 1 and 3 nearest neighbor predictors produce for $\hat{y}(x)$ based on the small fake data set used in class and above.
2. Write an R function giving the kernel smoothing predictor for a Gaussian kernel $K$ and bandwidth $\lambda$,
$$\hat{y}(x)=\frac{\sum_{i=1}^{8} y_i\,K\!\left(\frac{x-x_i}{\lambda}\right)}{\sum_{i=1}^{8} K\!\left(\frac{x-x_i}{\lambda}\right)}$$
for the first fake data set in the code. Plot this function for a range of $\lambda$ values that allows you to see the behavior for small $\lambda$ and for large $\lambda$. How do the values of the predictor compare to the 1-nn predictions from exercise 1? (A minimal sketch of one possible implementation appears after exercise 3.)
3. Compute regression tree predictors (chosen by forward selection only, not involving pruning) for the 1st small fake data set, from 2 final nodes through the first number of nodes at which the "rectangles" (intervals here) are completely homogeneous and SSE is 0. Identify one tree predictor that could be obtained by pruning your final tree that was not seen in forward selection.
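Below is the minimal sketch promised in exercise 2; the function name, the plotting grid, and the $\lambda$ values are my own choices, not specified in the lab.
#Kernel smoothing predictor for the first fake data set; dnorm gives the
#Gaussian kernel, and its normalizing constant cancels in the ratio
x<-c(1,3,4,6,7,8,10,12)
y<-c(1,2,1,4,3,4,2,2)
ksmpred<-function(x0,lambda){
  w<-dnorm((x0-x)/lambda)  #kernel weights at a single point x0
  sum(w*y)/sum(w)          #weighted average of the yi
}
xgrid<-seq(1,12,.05)
plot(x,y)
lines(xgrid,sapply(xgrid,ksmpred,lambda=.25),col=2)  #small lambda: wiggly
lines(xgrid,sapply(xgrid,ksmpred,lambda=5),col=4)    #large lambda: nearly flat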