Lab 11 Key

advertisement
Lab 11 Key
a)
Fit an ordinary MLR model to these data via ordinary least squares. Also fit Lasso,
Ridge and Elastic Net (with say $ $=.5) linear models to these data (i.e. use penalized
least squares and "shrink" the predictions toward ybar and the size of the regression
coefficients). Use glmnet and cross-validation to choose the values of$ $ you employ.
#Here is Code for Lab #11
#Load the psych package
library(psych)
Airfoil<-read.table(file.choose(),header=FALSE,sep='\t')
Air<-data.frame(Airfoil)
names(Air)<-c("Freq","Angle","Chord","Velocity","Displace","Pressure")
summary(Air)
x=as.matrix(Air[,1:5])
y=as.matrix(Air[,6])
#Load the glmnet,stats, and graphics packages and do some
#cross-validations to pick best lambda values
library(glmnet)
## Loading required package: Matrix
## Loading required package: foreach
## Loaded glmnet 2.0-2
library(stats)
library(graphics)
AirLasso<-cv.glmnet(x,y,alpha=1)
AirRidge<-cv.glmnet(x,y,alpha=0)
AirENet<-cv.glmnet(x,y,alpha=.5)
plot(AirLasso)
plot(AirRidge)
plot(AirENet)
#The next commands give lambdas usually recommended .... they are not
#exact minimizers of CVSSE, but ones "close to" the minimizers
#and associated with somewhat less flexible predictors
AirLasso$lambda.1se
## [1] 0.288921
AirRidge$lambda.1se
## [1] 1.193825
AirENet$lambda.1se
## [1] 0.5265081
b)
How do the fits in a) compare? Do residual plots make you think that a predictor linear
in the 5 input variables will be effective? Explain. Does it look like some aspects of the
relationship between inputs and the response are unaccounted for in the fitting? Why
or why not?
Due to figures, it is obvious that they are somehow the same. There is not a
considerable difference between plots. Hence, it shows that all methods are trying to
shrink everything over ybar (the baseline). It is favorable for residuals to randomly
go up and down of baseline and y and yhat has a kind of linear correlation. What we
can see in plots there is a line which is in small y’s small yhats are obtained and in big
y’s big yhats. In linear model there is a bit curvature in model which shows that
something is not considered in model. Linear line is more predictable. (Answer from
Zahra Davoudi)
##
##
##
##
##
##
##
##
6 x 1 sparse Matrix of class "dgCMatrix"
1
(Intercept) 131.64263394
Freq
-0.00109955
Angle
-0.26314173
Chord
-27.78417475
Velocity
0.07259511
Displace
-156.79039132
##
##
##
##
##
##
##
##
6 x 1 sparse Matrix of class "dgCMatrix"
1
(Intercept) 1.308044e+02
Freq
-1.005557e-03
Angle
-2.541206e-01
Chord
-2.582854e+01
Velocity
7.535086e-02
Displace
-1.479461e+02
##
##
##
##
##
##
##
##
6 x 1 sparse Matrix of class "dgCMatrix"
1
(Intercept) 1.312905e+02
Freq
-1.053203e-03
Angle
-2.424408e-01
Chord
-2.632966e+01
Velocity
6.983460e-02
Displace
-1.550157e+02
##
(Intercept)
xFreq
xAngle
xChord
## 1.328338e+02 -1.282207e-03 -4.219117e-01 -3.568800e+01
##
xDisplace
## -1.473005e+02
##
##
##
##
##
##
##
c)
(Intercept)
xFreq
xAngle
xChord
xVelocity
xDisplace
OLS
132.8338
-0.001282207
-0.4219117
-35.688
0.09985404
-147.3005
Lasso
?
?
?
?
?
?
Ridge
?
?
?
?
?
?
xVelocity
9.985404e-02
ENet(.5)
?
?
?
?
?
?
Standardize the input variables and fit k -nearest neighbor predictors to the data for
k=1,3,4,5 (I don't know why the routine crashes when one tries k=2) using knn.reg()
from the FNN package. The " PRESS " it produces is the LOO cross-validation error sum
of squares and can thus be used to choose a value of the complexity parameter k .
Which value seems best? Large k corresponds to which, a "simple" or a "complex"
predictor? How does knn prediction seem to compare to linear prediction in this
"fairly large N and small p " context?
K=3 seems to be the best because it has the least PRESS (LOOCVSSE). K’s shows that
how many nearest vectors you used to predict the value. So, increasing the k makes
the model less complex cause, you have more data to predict with. Also KNN3 is
shinked to ybar (baseline). So, the prediction is easier. Thus, KNN3is selected as the
best model.
#Now do some nearest neighbor regressions
#First, load the FNN package and standardize the predictors
scale.x<-scale(x)
x[1:10]
##
[1]
800 1000 1250 1600 2000 2500 3150 4000 5000 6300
scale.x[1:10,]
##
## [1,]
## [2,]
## [3,]
## [4,]
## [5,]
## [6,]
## [7,]
## [8,]
## [9,]
## [10,]
Freq
-0.6618024
-0.5983622
-0.5190619
-0.4080415
-0.2811610
-0.1225604
0.0836204
0.3532414
0.6704426
1.0828042
Angle
-1.146021
-1.146021
-1.146021
-1.146021
-1.146021
-1.146021
-1.146021
-1.146021
-1.146021
-1.146021
Chord
1.798701
1.798701
1.798701
1.798701
1.798701
1.798701
1.798701
1.798701
1.798701
1.798701
Velocity
1.312498
1.312498
1.312498
1.312498
1.312498
1.312498
1.312498
1.312498
1.312498
1.312498
Displace
-0.6445901
-0.6445901
-0.6445901
-0.6445901
-0.6445901
-0.6445901
-0.6445901
-0.6445901
-0.6445901
-0.6445901
library(FNN)
Air.knn1<-knn.reg(scale.x,test=NULL,y,k=1)
Air.knn1
## PRESS = 10709.82
## R2-Predict = 0.8501754
Air.knn1$pred[1:10]
##
##
[1] 125.201 126.201 125.201 125.951 127.591 127.461 125.571 121.762
[9] 119.632 118.122
plot(y,y-Air.knn1$pred)
abline(a=0,b=0)
Air.knn3<-knn.reg(scale.x,test=NULL,y,k=3)
Air.knn3
## PRESS = 7573.019
## R2-Predict = 0.8940575
Air.knn3$pred[1:10]
##
##
[1] 126.2477 126.5810 126.3310 126.2043 126.3710 125.7247 124.0080
[8] 122.7547 121.4850 119.6850
plot(y,y-Air.knn3$pred)
abline(a=0,b=0)
Air.knn4<-knn.reg(scale.x,test=NULL,y,k=4)
Air.knn4
## PRESS = 10938.42
## R2-Predict = 0.8469774
Air.knn4$pred[1:10]
##
##
[1] 126.4663 126.4338 126.6135 126.2035 126.4387 126.1912 124.1340
[8] 122.9140 120.9990 118.7973
plot(y,y-Air.knn4$pred)
abline(a=0,b=0)
Air.knn5<-knn.reg(scale.x,test=NULL,y,k=5)
Air.knn5
## PRESS = 11503.44
## R2-Predict = 0.8390731
Air.knn5$pred[1:10]
##
##
[1] 126.3714 126.5714 126.4652 126.2892 126.4774 126.2814 124.7994
[8] 122.2576 120.4236 118.4680
plot(y,y-Air.knn5$pred)
abline(a=0,b=0)
d)
Use the tree package and fit a good regression tree to these data. Employ costcomplexity pruning and cross-validation to pick this tree. How many final nodes do
you suggest? How do the predictions for this tree compare to the ones in a) and c) ?
Better result for this tree according to the residuals. NODE=18.
#Load the tree package
#Do fitting of a big tree
library(tree)
Airtree<-tree(Pressure ~.,Air,
control=tree.control(nobs=1503,mincut=5,minsize=10,mindev=.005))
summary(Airtree)
##
## Regression tree:
## tree(formula = Pressure ~ ., data = Air, control = tree.control(nobs =
1503,
##
mincut = 5, minsize = 10, mindev = 0.005))
## Number of terminal nodes: 32
## Residual mean deviance: 12.42 = 18270 / 1471
## Distribution of residuals:
##
Min.
1st Qu.
Median
Mean
3rd Qu.
Max.
## -11.48000 -2.11700
0.09367
0.00000
2.38700 14.93000
Airtree
## node), split, n, deviance, yval
##
* denotes terminal node
##
##
1) root 1503 71480.00 124.8
##
2) Freq < 3575 1079 34610.00 126.6
##
4) Displace < 0.0155759 769 17080.00 127.9
##
8) Chord < 0.127 331 6638.00 129.7
##
16) Freq < 715 80 2368.00 126.9
##
32) Chord < 0.0762 51
957.50 124.4
##
64) Displace < 0.0135486 39
500.40 122.9
##
65) Displace > 0.0135486 12
86.88 129.2
##
33) Chord > 0.0762 29
483.40 131.5 *
##
17) Freq > 715 251 3496.00 130.5 *
##
9) Chord > 0.127 438 8554.00 126.5
##
18) Freq < 1800 312 4517.00 127.9
##
36) Angle < 3.5 161 1493.00 126.6
##
72) Freq < 565 47
410.50 123.6 *
##
73) Freq > 565 114
477.30 127.9 *
##
37) Angle > 3.5 151 2491.00 129.3
##
74) Velocity < 47.55 80 1066.00 127.7 *
##
75) Velocity > 47.55 71
995.50 131.0 *
##
19) Freq > 1800 126 1847.00 123.0
##
38) Displace < 0.0044289 75
566.30 125.0 *
##
39) Displace > 0.0044289 51
524.80 120.0 *
##
5) Displace > 0.0155759 310 13140.00 123.4
##
10) Displace < 0.0505823 284 10690.00 124.1
##
20) Freq < 1425 196 7077.00 125.7
##
40) Displace < 0.0174419 27 1459.00 120.5
##
80) Freq < 565 15
121.60 114.9 *
##
81) Freq > 565 12
290.40 127.4 *
##
41) Displace > 0.0174419 169 4771.00 126.5
##
82) Velocity < 47.55 81 1855.00 125.0 *
##
83) Velocity > 47.55 88 2546.00 127.9
##
166) Angle < 12.45 36
813.40 130.6
##
332) Freq < 565 20
170.40 133.9 *
##
333) Freq > 565 16
154.10 126.5 *
##
167) Angle > 12.45 52 1286.00 126.1 *
##
21) Freq > 1425 88 2099.00 120.7
##
42) Chord < 0.0762 48
735.90 123.6
##
84) Displace < 0.0286223 36
233.20 125.3
##
85) Displace > 0.0286223 12
65.23 118.3
##
43) Chord > 0.0762 40
491.40 117.2 *
##
11) Displace > 0.0505823 26
585.70 115.3 *
##
3) Freq > 3575 424 25370.00 120.4
##
6) Displace < 0.00156285 112 3477.00 129.4
##
12) Freq < 7150 54
729.90 132.9
##
24) Displace < 0.00107075 36
108.90 135.0 *
##
25) Displace > 0.00107075 18
131.30 128.6 *
##
13) Freq > 7150 58 1509.00 126.2
##
26) Displace < 0.00107075 43
820.60 128.1 *
*
*
*
*
##
##
##
##
##
##
##
##
##
##
##
##
##
##
##
##
27) Displace > 0.00107075 15
120.70 120.9 *
7) Displace > 0.00156285 312 9602.00 117.2
14) Displace < 0.0353012 292 7649.00 117.8
28) Freq < 5650 147 3388.00 120.1
56) Displace < 0.00467589 68
953.50 122.8
112) Chord < 0.1905 32
387.40 125.3 *
113) Chord > 0.1905 36
193.50 120.6 *
57) Displace > 0.00467589 79 1514.00 117.8
114) Chord < 0.0381 16
226.70 123.3 *
115) Chord > 0.0381 63
675.90 116.4 *
29) Freq > 5650 145 2746.00 115.5
58) Displace < 0.00259924 39
461.60 118.6 *
59) Displace > 0.00259924 106 1768.00 114.4
118) Chord < 0.0381 15
213.40 120.5 *
119) Chord > 0.0381 91
898.50 113.4 *
15) Displace > 0.0353012 20
169.00 108.1 *
plot(Airtree)
text(Airtree,pretty=0)
#One can try to find an optimal sub-tree of the tree just grown
#We'll use cross-validation based on cost-complexity pruning
#Each alpha (in the lecture notation) has a favorite number of
#nodes and not all numbers of final nodes are ones optimizing
#the cost-complexity
#The code below finds the alphas at which optimal subtrees change
cv.Airtree<-cv.tree(Airtree,FUN=prune.tree)
names(cv.Airtree)
## [1] "size"
"dev"
"k"
"method"
cv.Airtree
##
##
##
##
##
##
##
##
##
##
##
##
##
##
##
##
##
##
##
##
##
$size
[1] 32 31 30 29 26 25 24 23 21 19 18 17 15 14 13 11 10
[24] 1
$dev
[1]
[8]
[15]
[22]
$k
[1]
[7]
[13]
[19]
31132.98
31132.98
34228.70
45289.63
31132.98
31132.98
35228.85
48724.11
-Inf
489.6232
850.3184
1515.5737
8
7
6
4
31132.98 31132.98 31132.98 31132.98 31132.98
31132.98 31132.98 32564.49 34376.07 34454.97
37390.42 39682.32 39682.32 43422.31 45065.73
71589.01
370.2215
567.4349
871.6015
1783.3467
372.5295
568.9482
920.6570
1872.4868
429.3051
586.1602
947.0603
2038.2066
$method
[1] "deviance"
attr(,"class")
[1] "prune"
9
"tree.sequence"
#We can plot SSE versus size of the optimizing trees
plot(cv.Airtree$size,cv.Airtree$dev,type="b")
435.2129
437.4860
611.0563
756.4097
1237.9701 1509.5330
4391.7714 11895.0527
3
#And we can plot versus what the program calls "k," which is the
#reciprocal of alpha from class
plot(cv.Airtree$k,cv.Airtree$dev,type="b")
#Or we can plot versus "1/k"
plot(1/cv.Airtree$k,cv.Airtree$dev,type="b")
#We can see what a pruned tree will look like for a size
#identified by the cross-validation
Airgoodtree<-prune.tree(Airtree,best=15)
Airgoodtree
## node), split, n, deviance, yval
##
* denotes terminal node
##
## 1) root 1503 71480.0 124.8
##
2) Freq < 3575 1079 34610.0 126.6
##
4) Displace < 0.0155759 769 17080.0 127.9
##
8) Chord < 0.127 331 6638.0 129.7 *
##
9) Chord > 0.127 438 8554.0 126.5
##
18) Freq < 1800 312 4517.0 127.9 *
##
19) Freq > 1800 126 1847.0 123.0 *
##
5) Displace > 0.0155759 310 13140.0 123.4
##
10) Displace < 0.0505823 284 10690.0 124.1
##
20) Freq < 1425 196 7077.0 125.7
##
40) Displace < 0.0174419 27 1459.0 120.5
##
80) Freq < 565 15
121.6 114.9 *
##
81) Freq > 565 12
290.4 127.4 *
##
41) Displace > 0.0174419 169 4771.0 126.5 *
##
21) Freq > 1425 88 2099.0 120.7
##
42) Chord < 0.0762 48
735.9 123.6 *
##
43) Chord > 0.0762 40
491.4 117.2 *
##
11) Displace > 0.0505823 26
585.7 115.3 *
##
3) Freq > 3575 424 25370.0 120.4
##
6) Displace < 0.00156285 112 3477.0 129.4
##
12) Freq < 7150 54
729.9 132.9 *
##
13) Freq > 7150 58 1509.0 126.2 *
##
7) Displace > 0.00156285 312 9602.0 117.2
##
14) Displace < 0.0353012 292 7649.0 117.8
##
28) Freq < 5650 147 3388.0 120.1
##
56) Displace < 0.00467589 68
953.5 122.8 *
##
57) Displace > 0.00467589 79 1514.0 117.8 *
##
29) Freq > 5650 145 2746.0 115.5 *
##
15) Displace > 0.0353012 20
169.0 108.1 *
summary(Airgoodtree)
##
##
##
##
##
##
##
##
##
Regression tree:
snip.tree(tree = Airtree, nodes = c(56L, 41L, 42L, 12L, 13L,
18L, 29L, 57L, 19L, 8L))
Variables actually used in tree construction:
[1] "Freq"
"Displace" "Chord"
Number of terminal nodes: 15
Residual mean deviance: 18.56 = 27620 / 1488
Distribution of residuals:
##
Min.
## -13.5800
1st Qu.
-2.7420
Median
0.2656
Mean
0.0000
3rd Qu.
2.7580
plot(Airgoodtree)
text(Airgoodtree,pretty=0)
Airpred<-predict(Airgoodtree,Air,type="vector")
plot(y,y-Airpred)
abline(a=0,b=0)
Max.
14.4800
e) Fit a random forest to these data using m=2 (which is approximately the standard
default of the number of predictors divided by 3) using the randomForest package. How do
these predictions compare to all the others?
These predictions are similar to all the others.
#Next is some code for Random Forest Fitting
#First Load the randomForest package
library(randomForest)
##
##
##
##
##
##
##
##
randomForest 4.6-10
Type rfNews() to see new features/changes/bug fixes.
Attaching package: 'randomForest'
The following object is masked from 'package:psych':
outlier
Air.rf<-randomForest(Pressure~.,data=Air,
type="regression",ntree=500,mtry=2)
Air.rf
##
## Call:
## randomForest(formula = Pressure ~ ., data = Air, type = "regression",
ntree = 500, mtry = 2)
##
Type of random forest: regression
##
Number of trees: 500
## No. of variables tried at each split: 2
##
##
Mean of squared residuals: 4.720337
##
% Var explained: 90.07
predict(Air.rf)[1:10]
##
1
2
3
4
5
6
7
8
## 126.5397 126.5226 126.4388 126.3687 126.2617 125.4725 123.9205 121.8551
##
9
10
## 120.7444 118.3694
f)
g)
People sometimes try to use "ensembles" of predictors in big predictive analytics
problems. This means combining predictors by using a weighted average. See if you
can find positive constants
COLS , CLASSO , CRIDGE , CENET , CKNN , CTREE , CRF
adding to 1 so that the corresponding linear combination of predictions has a larger
correlation with the response variable than any single predictor. (In practice, one
would have to look for these constant inside and not after cross-validations.)
#This code produces a scatterplot matrix for the predictions
#with the "45 degree line" drawn in
comppred<-cbind(y,predict(AirLM,newx=x),predict(AirLasso,newx=x),
predict(AirRidge,newx=x),predict(AirENet,newx=x),
Air.knn3$pred,Air.knn5$pred,Airpred,predict(Air.rf))
colnames(comppred)<-c("y","OLS","Lasso","Ridge",
"ENet(.5)","3NN","5NN","Tree","RF")
pairs(comppred,panel=function(x,y,...){
points(x,y)
abline(0,1)},xlim=c(100,145),
ylim=c(100,145))
round(cor(as.matrix(comppred)),2)
##
##
##
##
##
##
##
##
##
##
y
y
1.00
OLS
0.72
Lasso
0.72
Ridge
0.72
ENet(.5) 0.71
3NN
0.95
5NN
0.92
Tree
0.78
RF
0.96
OLS Lasso Ridge ENet(.5) 3NN 5NN Tree
RF
0.72 0.72 0.72
0.71 0.95 0.92 0.78 0.96
1.00 1.00 1.00
1.00 0.74 0.78 0.73 0.77
1.00 1.00 1.00
1.00 0.73 0.78 0.73 0.77
1.00 1.00 1.00
1.00 0.74 0.78 0.73 0.77
1.00 1.00 1.00
1.00 0.73 0.78 0.73 0.77
0.74 0.73 0.74
0.73 1.00 0.98 0.80 0.97
0.78 0.78 0.78
0.78 0.98 1.00 0.82 0.97
0.73 0.73 0.73
0.73 0.80 0.82 1.00 0.84
0.77 0.77 0.77
0.77 0.97 0.97 0.84 1.00
For question 2, rerun the code with different data set.
Download