7 - Multivariate Adaptive Regression Splines (MARS)

In MARS we fit a multivariate regression model of the form
$$ y = \eta_0 + \sum_{m=1}^{M} \eta_m h_m(X) $$
where the $h_m(X)$ are terms with a very specific form. All the terms created will be of the form
$$ (X_i - t_i)_+ = \begin{cases} X_i - t_i & \text{if } X_i > t_i \\ 0 & \text{if } X_i \le t_i \end{cases} $$
$$ (t_i - X_i)_+ = \begin{cases} t_i - X_i & \text{if } X_i < t_i \\ 0 & \text{if } X_i \ge t_i \end{cases} $$
and, if we wish to include interactions, terms of the form
$$ h(X_i, X_j) = \text{any product of the univariate terms above}. $$
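The term definitions above can be sketched in a few lines of base R. This is only an illustration: the helper names hinge_pos and hinge_neg, the knots, and the toy input values are assumptions made for the example and are not part of the earth package.

```r
# Hinge (truncated linear) basis functions of the kind MARS builds.
# pmax() works elementwise, so x may be a vector.
hinge_pos <- function(x, t) pmax(x - t, 0)   # (x - t)_+
hinge_neg <- function(x, t) pmax(t - x, 0)   # (t - x)_+

x1 <- c(2, 5, 8)            # toy predictor values
hp <- hinge_pos(x1, 5)      # 0 0 3
hn <- hinge_neg(x1, 5)      # 3 0 0

# An interaction term is a product of univariate hinge terms.
x2 <- c(10, 20, 30)
inter <- hinge_pos(x1, 4) * hinge_neg(x2, 25)   # 0 5 0
```

Each term is zero on one side of its knot and linear on the other, so a sum of such terms is a piecewise linear function of each predictor.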
Example of a MARS fit to the Boston Housing data
Plotting the fitted surface for a simple CART model to L.A. Basin Ozone data
> x1 = seq(min(inbh),max(inbh),length=100)
> x2 = seq(min(safb),max(safb),length=100)
> x = expand.grid(inbh=x1,safb=x2)
> ypred = predict(oz.rpart,newdata=x)
> persp(x1,x2,z=matrix(ypred,100,100),theta=45,xlab="INBH",
+ ylab="SAFB",zlab="UPOZ")
Plotting the fitted surface for a simple MARS model to L.A. Basin Ozone data
> oz.mars = earth(upoz~safb+inbh,degree=2,data=Ozdata)
> ypred = predict(oz.mars,newdata=x)
> persp(x1,x2,z=matrix(ypred,100,100),theta=45,xlab="INBH",
+ ylab="SAFB",zlab="UPOZ")
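Both surface plots depend on the ordering of the rows produced by expand.grid() matching the way matrix() fills its cells. The following sketch checks this on a small grid. The 3 x 4 grid and the stand-in function used in place of predict() are assumptions chosen so the layout is easy to inspect.

```r
# Small grid (the handout uses 100 x 100).
x1 <- seq(0, 1, length.out = 3)
x2 <- seq(10, 20, length.out = 4)
grid <- expand.grid(x1 = x1, x2 = x2)   # first argument varies fastest

# Stand-in for predict(); any function of the two grid columns would do.
z <- with(grid, x1 + 100 * x2)

# matrix() fills column by column, so row i corresponds to x1[i] and
# column j to x2[j], which is the layout persp(x1, x2, z) expects.
zmat <- matrix(z, nrow = length(x1), ncol = length(x2))
ok <- isTRUE(all.equal(zmat[2, 3], x1[2] + 100 * x2[3]))
ok
```

If the arguments to expand.grid() were given in the opposite order, the matrix would need to be transposed before plotting.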
The earth() function in the CRAN package of the same name fits a MARS regression model using a
formula interface similar to that used to fit OLS models with the lm() function.
> attach(Ozdata)
> library(earth)   # this will also load the other packages it depends on
> names(Ozdata)
[1] "day" "v500" "wind" "hum" "safb" "inbh" "dagg" "inbt" "vis" "upoz"
> oz.mars = earth(upoz~.,data=Ozdata)   # basic MARS call, nothing fancy
> plot(oz.mars)
We still need to consider whether a response transformation is needed. Here the heteroscedasticity is obvious,
along with some degree of curvature. We will transform the response using a cube root transformation and
rerun the basic MARS fit.
> oz.mars2 = earth(upoz^.333~.,data=Ozdata)
> plot(oz.mars2)
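Because the model is now fit on the cube-root scale, its predictions are on that scale too. The short sketch below uses toy response values (an assumption, not the ozone data) to show that the exponent .333 is only approximately a cube root, and that cubing returns values to the original scale.

```r
# Toy responses, not the ozone data.
y <- c(1, 8, 27, 64)

approx_root <- y^.333     # exponent used in the handout
exact_root  <- y^(1/3)    # exact cube root: 1 2 3 4

# Size of the discrepancy between the two exponents.
max_diff <- max(abs(approx_root - exact_root))

# Values on the cube-root scale return to the original scale by cubing.
back <- exact_root^3
all.equal(back, y)
```

The discrepancy is small for values of this size, so the two exponents give nearly the same fit. Using 1/3 makes the back-transformation exact.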
> summary(oz.mars2)
Call: earth(formula=upoz^0.333~., data=Ozdata)

              coefficients
(Intercept)     2.45749384
h(day-60)       0.00256177
h(60-day)      -0.00899592
h(day-116)     -0.00526839
h(5740-v500)   -0.00090639
h(9-wind)       0.01958579
h(54-hum)      -0.00548426
h(safb-54)      0.01566385
h(1069-inbh)   -0.00023647
h(dagg-11)     -0.00680685
h(11-dagg)     -0.00321460
h(258-inbt)    -0.00231004
h(200-vis)      0.00127010

Selected 13 of 20 terms, and 9 of 9 predictors
Importance: safb, day, v500, hum, dagg, inbh, vis, inbt, wind
Number of terms at each degree of interaction: 1 12 (additive model)
GCV 0.05093197    RSS 14.35741    GRSq 0.8117691    RSq 0.8382297
To see what the terms look like we can use the plotmo() function from the plotmo package. The
name plotmo stands for "plot model", and the function can be used with a variety of model types in R. See
the plotmo help page for more details on which model types it can plot.
> library(plotmo)
> plotmo(oz.mars2)
 grid:    day v500 wind hum safb   inbh dagg  inbt vis
        177.5 5760    5  64   62 2112.5   24 167.5 120
The fitted model is:
$$
\begin{aligned}
\hat{E}(UPOZ^{1/3} \mid X) = {} & 2.46 + .0026(day - 60)_+ - .009(60 - day)_+ - .0053(day - 116)_+ \\
& - .0009(5740 - v500)_+ + .0196(9 - wind)_+ - .0055(54 - hum)_+ + .016(safb - 54)_+ \\
& - .0002(1069 - inbh)_+ - .0068(dagg - 11)_+ - .0032(11 - dagg)_+ \\
& - .0023(258 - inbt)_+ + .0013(200 - vis)_+
\end{aligned}
$$
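As a check on how the fitted equation is used, the sketch below evaluates it by hand in base R, using the coefficients reported by summary(oz.mars2). The function name predict_cuberoot_upoz is hypothetical, and the single test observation uses the grid values printed by plotmo() purely as a convenient example.

```r
h <- function(u) pmax(u, 0)   # hinge: h(u) = u if u > 0, else 0

# Coefficients and knots copied from summary(oz.mars2).
predict_cuberoot_upoz <- function(d) {
  2.45749384 +
    0.00256177 * h(d$day - 60) -
    0.00899592 * h(60 - d$day) -
    0.00526839 * h(d$day - 116) -
    0.00090639 * h(5740 - d$v500) +
    0.01958579 * h(9 - d$wind) -
    0.00548426 * h(54 - d$hum) +
    0.01566385 * h(d$safb - 54) -
    0.00023647 * h(1069 - d$inbh) -
    0.00680685 * h(d$dagg - 11) -
    0.00321460 * h(11 - d$dagg) -
    0.00231004 * h(258 - d$inbt) +
    0.00127010 * h(200 - d$vis)
}

# Example observation: the grid values shown in the plotmo() output.
newobs <- data.frame(day = 177.5, v500 = 5760, wind = 5, hum = 64, safb = 62,
                     inbh = 2112.5, dagg = 24, inbt = 167.5, vis = 120)

fit_cuberoot <- predict_cuberoot_upoz(newobs)   # prediction on the cube-root scale
fit_original <- fit_cuberoot^3                  # approximate back-transformation to UPOZ
```

Only the hinge terms whose arguments are positive contribute. For this observation the terms in v500, hum, and inbh are all zero, as are h(60-day) and h(11-dagg).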
We now consider adding higher order terms, i.e. potential interactions between the covariates, possibly of the form
$$ (x_i - t_i)_+ (x_j - t_j)_+ . $$
> oz.mars3 = earth(upoz^.333~.,degree=2,data=Ozdata)
> plot(oz.mars3)
> summary(oz.mars3)
Call: earth(formula=upoz^0.333~., data=Ozdata, degree=2)

                          coefficients
(Intercept)                 2.60653448
h(60-day)                  -0.00969879
h(day-116)                 -0.00195882
h(5740-v500)               -0.00117770
h(67-hum)                  -0.00635632
h(safb-54)                  0.01621764
h(1069-inbh)               -0.00021214
h(dagg-11)                 -0.00546709
h(11-dagg)                 -0.00335714
h(250-inbt)                -0.00169567
h(200-vis)                  0.00110855
h(hum-50) * h(inbh-1069)   -0.00000328

Selected 12 of 20 terms, and 8 of 9 predictors
Importance: safb, inbh, hum, v500, day, dagg, vis, inbt, wind-unused
Number of terms at each degree of interaction: 1 10 1
GCV 0.04860307    RSS 13.38827    GRSq 0.8203761    RSq 0.8491494
Here we can see the interaction term between humidity and inversion base height plotted. It is also
interesting to note that this model does not include a term based on wind speed.
There is also a slight improvement in R-square over the first-degree (additive) model: RSq rises from
0.838 to 0.849 and GRSq from 0.812 to 0.820. As with any flexible modeling strategy, cross-validation
is critical. The earth() function has some built-in functionality for performing cross-validation,
which we will consider below.
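To get a feel for the size of the humidity by inversion base height interaction, the sketch below evaluates that single term using the coefficient and knots from summary(oz.mars3). The function name interaction_effect and the input values are hypothetical, chosen only to show when the term is switched on.

```r
h <- function(u) pmax(u, 0)   # hinge function

# Interaction term and coefficient copied from summary(oz.mars3).
interaction_effect <- function(hum, inbh) {
  -0.00000328 * h(hum - 50) * h(inbh - 1069)
}

e1 <- interaction_effect(hum = 40, inbh = 3000)   # zero: hum is below its knot
e2 <- interaction_effect(hum = 80, inbh = 500)    # zero: inbh is below its knot
e3 <- interaction_effect(hum = 80, inbh = 3000)   # about -0.19 on the cube-root scale
```

The coefficient looks negligible, but it multiplies a product of two hinge terms that can be in the tens of thousands, so the contribution is not small relative to the other terms. The term is active only when both hum > 50 and inbh > 1069.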
Cross-validation with MARS using the earth package
The earth() function has built-in cross-validation functionality whose results can be plotted and used to choose
an optimal number of terms for a MARS model. This handout demonstrates its use, along with some
additional optional arguments that are useful in the model building process.
We will use the L.A. Basin ozone concentration data again in our examples.
> oz.mars = earth(upoz^.333~.,data=Ozdata,degree=2,nfold=10)
> summary(oz.mars)
Call: earth(formula=upoz^0.333~., data=Ozdata, nfold=10, degree=2)

                          coefficients
(Intercept)                 2.60653448
h(60-day)                  -0.00969879
h(day-116)                 -0.00195882
h(5740-v500)               -0.00117770
h(67-hum)                  -0.00635632
h(safb-54)                  0.01621764
h(1069-inbh)               -0.00021214
h(dagg-11)                 -0.00546709
h(11-dagg)                 -0.00335714
h(250-inbt)                -0.00169567
h(200-vis)                  0.00110855
h(hum-50) * h(inbh-1069)   -0.00000328

Selected 12 of 20 terms, and 8 of 9 predictors
Importance: safb, inbh, hum, v500, day, dagg, vis, inbt, wind-unused
Number of terms at each degree of interaction: 1 10 1
GCV 0.04860307  RSS 13.38827  GRSq 0.8203761  RSq 0.8491494  cv.rsq 0.7745141

Note: the cross-validation sd's below are standard deviations across folds

Cross validation:   nterms 12.50 sd 1.27    nvars 7.90 sd 0.32

     cv.rsq    sd     MaxErr   sd
       0.77 0.081       -0.8 0.61
To examine cross-validation results graphically, it is best to save the results by specifying the
keepxy=T option when using nfold to perform the cross-validation.
> oz.mars = earth(upoz^.333~.,data=Ozdata,degree=2,nfold=10,keepxy=T)
> plot(oz.mars,which=1,col.rsq=0)   # displays only the model selection plot
# The col.rsq = 0 option will block out the R2 portion of the plot.
We can choose a different value for k in the k-fold cross-validation performed by the nfold option. We can also
use the ncross option to perform the k-fold cross-validation ncross times. This will allow us to see the degree
of variability in the k-fold cross-validation results.
> oz.mars = earth(upoz^.333~.,data=Ozdata,degree=2,ncross=10,nfold=5,keepxy=T)
> plot(oz.mars,which=1,col.rsq=0)
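The idea behind nfold and ncross can be sketched by hand in base R. This is a rough illustration and not earth's implementation: the data are simulated, lm() on a single hinge term with a known knot stands in for an earth() fit, the helper name cv_rsq is hypothetical, and the out-of-fold R-square used here is one common definition that may differ in detail from the one earth reports.

```r
set.seed(1)                       # folds are random, so fix the seed
n <- 200
x <- runif(n, 0, 10)
y <- 1 + 0.5 * pmax(x - 4, 0) + rnorm(n, sd = 0.3)   # simulated data with a kink at 4
dat <- data.frame(x = x, y = y)

# One pass of k-fold cross-validation; returns one R-square per fold.
cv_rsq <- function(dat, k) {
  fold <- sample(rep(seq_len(k), length.out = nrow(dat)))   # random fold labels
  sapply(seq_len(k), function(j) {
    train <- dat[fold != j, ]
    test  <- dat[fold == j, ]
    fit   <- lm(y ~ pmax(x - 4, 0), data = train)   # stand-in for an earth() fit
    pred  <- predict(fit, newdata = test)
    1 - sum((test$y - pred)^2) / sum((test$y - mean(test$y))^2)
  })
}

rsq_folds <- cv_rsq(dat, k = 5)
cv_summary <- c(mean = mean(rsq_folds), sd = sd(rsq_folds))   # analogous to cv.rsq and its sd

# Repeating the whole procedure (the idea behind ncross) shows how much the
# average changes from one random partition of the data to the next.
rsq_repeats <- replicate(10, mean(cv_rsq(dat, k = 5)))
range(rsq_repeats)
```

The spread of rsq_repeats reflects only the randomness of the fold assignment, which is the variability the ncross option is meant to reveal.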
When using a small value of k in k-fold cross-validation, we can plot the results of each fold individually as
follows:
> oz.mars = earth(upoz^.333~.,data=Ozdata,degree=2,nfold=3,keepxy=T)
> plot.earth.models(oz.mars$cv.list,which=1)
> oz.mars = earth(upoz^.333~.,data=Ozdata,degree=2,ncross=20,nfold=5,keepxy=T)
> plot(oz.mars,which=1,col.mean.infold.rsq="blue",col.infold.rsq="lightblue",
+ col.grsq=0,col.rsq=0)
Measuring Variable Importance – evimp() command
The evimp() function uses three criteria for estimating variable importance in a MARS model.
The nsubsets criterion counts the number of model subsets that include the variable; there is one
subset for each model size from 1 to the selected model size. The rss criterion first calculates the decrease
in the RSS for each subset relative to the previous subset, then for each variable sums these decreases
over all subsets that include the variable. The gcv criterion works the same way but uses GCV instead of
the residual sum of squares. Adding a variable can actually increase the GCV due to the model complexity
penalty in the denominator.
> evimp(oz.mars,trim=FALSE)
            nsubsets   gcv    rss
safb              11 100.0  100.0
inbh              10  48.9   50.8
hum                9  39.3   41.5
v500               8  32.8   35.1
day                7  29.5   31.7
dagg               5  22.2   24.2
vis                4  13.4   16.6
inbt               2   7.8   10.5
wind-unused        0   0.0    0.0
The trim = FALSE option displays variables not used in the model.
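The remark that adding a variable can increase the GCV is easier to see with the formula written out. The sketch below uses the GCV definition given in the earth package documentation, with the effective number of parameters equal to nterms + penalty * (nterms - 1) / 2 and a default penalty of 2 for additive models and 3 when degree > 1. The sample size of 330 is an assumption, since the handout does not state the number of rows in Ozdata, and the RSS of 14.30 for the larger model is a made-up value used only for illustration.

```r
# GCV = RSS / (N * (1 - enp / N)^2), where enp is the effective
# number of parameters charged to the model.
gcv <- function(rss, nterms, n, penalty) {
  enp <- nterms + penalty * (nterms - 1) / 2
  rss / (n * (1 - enp / n)^2)
}

n_assumed <- 330   # ASSUMPTION: number of observations in Ozdata

# RSS and term counts taken from the two summaries above.
gcv_additive    <- gcv(rss = 14.35741, nterms = 13, n = n_assumed, penalty = 2)
gcv_interaction <- gcv(rss = 13.38827, nterms = 12, n = n_assumed, penalty = 3)

# HYPOTHETICAL: one extra term that lowers the RSS only slightly.
gcv_one_more <- gcv(rss = 14.30, nterms = 14, n = n_assumed, penalty = 2)
gcv_one_more > gcv_additive   # the penalty outweighs the drop in RSS
```

Under the assumed sample size, the first two values agree closely with the GCV figures printed in the summaries (0.05093197 and 0.04860307). The hypothetical 14-term model has a smaller RSS than the 13-term model but a larger GCV, which is the situation described above.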
Additional useful/interesting options for the earth() function
Example 2: Boston Housing Data
Using the same training and test/validation sets as in the example at the end of Section 6 of the
notes, we will first fit an “optimal” MARS model to the training set and then predict the
test cases. We can compare the mean PSE to that of the best models we identified above, i.e. random
forest, boosted trees, and treed regression.
Using the internal GCV criterion, a degree=1 model with approximately 20 terms
appears optimal.
> bos.mars = earth(logmedv~.,data=Boston.train,degree=1,nfold=5,ncross=10,keepxy=T,nprune=30)
> summary(bos.mars)
Call: earth(formula=logmedv~., data=Boston.train, keepxy=T,
            ncross=10, nfold=5, degree=1, nprune=30)

                  coefficients
(Intercept)          3.4543157
h(B-394.43)         -0.0208361
h(394.43-B)         -0.0005712
h(crim-4.34879)     -0.0105045
h(4.34879-crim)     -0.0496444
h(6.3361-dis)        0.0502471
h(lstat-6.47)       -0.0302173
h(6.47-lstat)        0.0519796
h(lstat-29.29)       0.0485452
h(nox-0.488)        -2.9008332
h(nox-0.544)         3.2657694
h(nox-0.655)       -13.2343127
h(nox-0.693)        17.9503513
h(nox-0.77)         -7.0763230
h(ptratio-14.7)     -0.0306274
h(rm-6.319)          0.2069193
h(6.319-rm)          0.0655710
h(284-tax)           0.0013811

Selected 18 of 22 terms, and 8 of 13 predictors
Importance: lstat, rm, crim, nox, dis, ptratio, B, tax, age-unused, chas-unused, indus-unused, ...
Number of terms at each degree of interaction: 1 17 (additive model)
GCV 0.0264371  RSS 8.962633  GRSq 0.8480453  RSq 0.8724877  cv.rsq 0.8129899

Note: the cross-validation sd's below are standard deviations across folds

Cross validation:   nterms 18.52 sd 1.23    nvars 8.96 sd 0.64

     cv.rsq    sd     MaxErr   sd
       0.81 0.047        1.2 0.66
> plotmo(bos.mars)
> plot(bos.mars,which=1,col.mean.infold.rsq="blue",col.infold.rsq="lightblue",col.grsq=0,col.rsq=0)
> ypred = predict(bos.mars,newdata=Boston.test)
> mean((Boston.test$logmedv-ypred)^2)
[1] 0.02864426
Compared to random forest, boosted trees, and boosted treed regression, MARS does not measure up
for this particular regression problem. MARS is one of the main algorithms used in the commercial
predictive analytics software from Salford Systems (https://www.salford-systems.com/). Although MARS did
not outperform the tree-based methods, it does produce a much more interpretable model. The summary of
the MARS model shown below demonstrates that a MARS fit is just an OLS model with automatically
chosen terms and interactions.
> summary(bos.mars)
Call: earth(formula=logmedv~., data=Boston.train, keepxy=T,
            ncross=10, nfold=5, degree=1, nprune=30)

                  coefficients
(Intercept)          3.4543157
h(B-394.43)         -0.0208361
h(394.43-B)         -0.0005712
h(crim-4.34879)     -0.0105045
h(4.34879-crim)     -0.0496444
h(6.3361-dis)        0.0502471
h(lstat-6.47)       -0.0302173
h(6.47-lstat)        0.0519796
h(lstat-29.29)       0.0485452
h(nox-0.488)        -2.9008332
h(nox-0.544)         3.2657694
h(nox-0.655)       -13.2343127
h(nox-0.693)        17.9503513
h(nox-0.77)         -7.0763230
h(ptratio-14.7)     -0.0306274
h(rm-6.319)          0.2069193
h(6.319-rm)          0.0655710
h(284-tax)           0.0013811

Selected 18 of 22 terms, and 8 of 13 predictors
Importance: lstat, rm, crim, nox, dis, ptratio, B, tax, age-unused, chas-unused, indus-unused, ...
Number of terms at each degree of interaction: 1 17 (additive model)
GCV 0.0264371  RSS 8.962633  GRSq 0.8480453  RSq 0.8724877  cv.rsq 0.8068335
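The claim that a MARS fit is an OLS model on hinge terms can be illustrated with base R alone. In this sketch the data are simulated and the knot at x = 6 is treated as known, both of which are assumptions for the example. In an actual MARS fit the knots and terms are chosen by a forward search followed by pruning, and only the final coefficient estimation is ordinary least squares.

```r
set.seed(42)
n <- 300
x <- runif(n, 0, 10)

# Simulated response whose slope changes at x = 6.
y <- 2 + 0.8 * pmax(6 - x, 0) - 0.5 * pmax(x - 6, 0) + rnorm(n, sd = 0.2)

# With the knot fixed, the hinge terms are ordinary regressors for lm().
X <- data.frame(y = y, h_pos = pmax(x - 6, 0), h_neg = pmax(6 - x, 0))
fit <- lm(y ~ h_pos + h_neg, data = X)

est <- round(coef(fit), 2)   # estimates should be near 2, -0.5 and 0.8
est
```

The coefficient table from such a fit has the same structure as the earth summary above, with one row per hinge term, which is what makes the MARS model straightforward to read.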