7 - Multivariate Adaptive Regression Splines (MARS)

In MARS we fit a multivariate regression model of the form

$$\hat{Y} = \beta_0 + \sum_{m=1}^{M} \beta_m h_m(X)$$

where the $h_m(X)$ are terms with a very specific form. All of the terms created are hinge functions of a single predictor $X_j$ with a knot $t_j$:

$$(X_j - t_j)_+ = \begin{cases} X_j - t_j & \text{if } X_j > t_j \\ 0 & \text{if } X_j \le t_j \end{cases} \qquad (t_j - X_j)_+ = \begin{cases} t_j - X_j & \text{if } X_j < t_j \\ 0 & \text{if } X_j \ge t_j \end{cases}$$

If we wish to include interactions, we also allow terms $h(X_j, X_k)$ that are products of hinge functions in two different predictors, e.g. $(X_j - t_j)_+ (X_k - t_k)_+$.

[Figure: Example of a MARS fit to the Boston Housing data]

Plotting the fitted surface for a simple CART model fitted to the L.A. Basin ozone data:

> x1 = seq(min(inbh),max(inbh),length=100)
> x2 = seq(min(safb),max(safb),length=100)
> x = expand.grid(inbh=x1,safb=x2)
> ypred = predict(oz.rpart,newdata=x)
> persp(x1,x2,z=matrix(ypred,100,100),theta=45,xlab="INBH",
+ ylab="SAFB",zlab="UPOZ")

Plotting the fitted surface for a simple MARS model fitted to the L.A. Basin ozone data:

> oz.mars = earth(upoz~safb+inbh,degree=2,data=Ozdata)
> ypred = predict(oz.mars,newdata=x)
> persp(x1,x2,z=matrix(ypred,100,100),theta=45,xlab="INBH",
+ ylab="SAFB",zlab="UPOZ")

The earth() function in the CRAN package of the same name fits a MARS regression model using the same formula nomenclature used to fit OLS models with the lm() function.

> attach(Ozdata)
> library(earth)    # loading earth also loads the packages it depends on (e.g. plotmo)
> names(Ozdata)
 [1] "day"  "v500" "wind" "hum"  "safb" "inbh" "dagg" "inbt" "vis"  "upoz"
> oz.mars = earth(upoz~.,data=Ozdata)    # basic MARS call, nothing fancy
> plot(oz.mars)

As with OLS, we still need to consider transforming the response. Here the heteroscedasticity is obvious, along with some degree of curvature, so we transform the response using a cube root transformation and rerun the basic MARS fit.
> oz.mars2 = earth(upoz^.333~.,data=Ozdata)
> plot(oz.mars2)
> summary(oz.mars2)
Call: earth(formula=upoz^0.333~., data=Ozdata)

             coefficients
(Intercept)    2.45749384
h(day-60)      0.00256177
h(60-day)     -0.00899592
h(day-116)    -0.00526839
h(5740-v500)  -0.00090639
h(9-wind)      0.01958579
h(54-hum)     -0.00548426
h(safb-54)     0.01566385
h(1069-inbh)  -0.00023647
h(dagg-11)    -0.00680685
h(11-dagg)    -0.00321460
h(258-inbt)   -0.00231004
h(200-vis)     0.00127010

Selected 13 of 20 terms, and 9 of 9 predictors
Importance: safb, day, v500, hum, dagg, inbh, vis, inbt, wind
Number of terms at each degree of interaction: 1 12 (additive model)
GCV 0.05093197    RSS 14.35741    GRSq 0.8117691    RSq 0.8382297

To see what the terms look like we can use the function plotmo() from the plotmo package. The name plotmo() stands for "plot model", and it can be used with a variety of model types in R; see the plotmo help page for details on which model types it can plot.

> library(plotmo)
> plotmo(oz.mars2)
 grid:    day v500 wind hum safb   inbh dagg  inbt vis
        177.5 5760    5  64   62 2112.5   24 167.5 120

In each panel, the predictors not being plotted are held fixed at the grid values shown (by default their medians).

Reading the coefficients off the summary gives the following fitted model:

$$\begin{aligned}
\hat{E}(\text{upoz}^{1/3} \mid X) = {}& 2.45 + .0026(\text{day} - 60)_+ - .009(60 - \text{day})_+ - .0053(\text{day} - 116)_+ - .0009(5740 - \text{v500})_+ \\
& + .0196(9 - \text{wind})_+ - .0055(54 - \text{hum})_+ + .016(\text{safb} - 54)_+ - .0002(1069 - \text{inbh})_+ \\
& - .0068(\text{dagg} - 11)_+ - .0032(11 - \text{dagg})_+ - .0023(258 - \text{inbt})_+ + .0013(200 - \text{vis})_+
\end{aligned}$$

We now consider adding higher order terms, i.e. potential interactions between the covariates of the form $(X_j - t_j)_+ (X_k - t_k)_+$.
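Because every term is a hinge or a product of hinges, the fitted equation can be evaluated by hand. The sketch below is a minimal illustration, not part of the earth package: the helper names hinge_pos(), hinge_neg() and fit_mars2() are hypothetical, the coefficients are copied from summary(oz.mars2), and the evaluation point is the plotmo grid printed above. It also evaluates the degree-2 term h(hum-50)*h(inbh-1069), with its coefficient taken from the degree-2 model fitted next (summary(oz.mars3)), to show that an interaction term is simply a product of two hinges.

```r
# Hinge (truncated linear) functions used by MARS:
#   hinge_pos(x, t) = (x - t)_+   and   hinge_neg(x, t) = (t - x)_+
hinge_pos <- function(x, t) pmax(0, x - t)
hinge_neg <- function(x, t) pmax(0, t - x)

# Coefficients copied from summary(oz.mars2)
b <- c(intercept = 2.45749384, day_pos60 = 0.00256177, day_neg60 = -0.00899592,
       day_pos116 = -0.00526839, v500 = -0.00090639, wind = 0.01958579,
       hum = -0.00548426, safb = 0.01566385, inbh = -0.00023647,
       dagg_pos = -0.00680685, dagg_neg = -0.00321460, inbt = -0.00231004,
       vis = 0.00127010)

# Fitted additive model for upoz^0.333, written term by term
fit_mars2 <- function(d) {
  with(d,
    b["intercept"] +
    b["day_pos60"]  * hinge_pos(day, 60) +
    b["day_neg60"]  * hinge_neg(day, 60) +
    b["day_pos116"] * hinge_pos(day, 116) +
    b["v500"]       * hinge_neg(v500, 5740) +
    b["wind"]       * hinge_neg(wind, 9) +
    b["hum"]        * hinge_neg(hum, 54) +
    b["safb"]       * hinge_pos(safb, 54) +
    b["inbh"]       * hinge_neg(inbh, 1069) +
    b["dagg_pos"]   * hinge_pos(dagg, 11) +
    b["dagg_neg"]   * hinge_neg(dagg, 11) +
    b["inbt"]       * hinge_neg(inbt, 258) +
    b["vis"]        * hinge_neg(vis, 200))
}

# Evaluate at the plotmo grid values
grid_pt <- data.frame(day = 177.5, v500 = 5760, wind = 5, hum = 64, safb = 62,
                      inbh = 2112.5, dagg = 24, inbt = 167.5, vis = 120)
pred <- unname(fit_mars2(grid_pt))
print(round(pred, 4))   # about 2.4422 on the cube-root scale

# A degree-2 term is a product of two hinges: h(hum-50) * h(inbh-1069),
# coefficient -0.00000328 as printed by summary(oz.mars3)
inter <- -0.00000328 * hinge_pos(grid_pt$hum, 50) * hinge_pos(grid_pt$inbh, 1069)
print(inter)
```

At this grid point several hinges are exactly zero (for example, v500 = 5760 lies above the knot at 5740, so (5740 - v500)_+ contributes nothing), which is how MARS produces locally linear pieces.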
> oz.mars3 = earth(upoz^.333~.,degree=2,data=Ozdata)
> plot(oz.mars3)
> summary(oz.mars3)
Call: earth(formula=upoz^0.333~., data=Ozdata, degree=2)

                         coefficients
(Intercept)                2.60653448
h(60-day)                 -0.00969879
h(day-116)                -0.00195882
h(5740-v500)              -0.00117770
h(67-hum)                 -0.00635632
h(safb-54)                 0.01621764
h(1069-inbh)              -0.00021214
h(dagg-11)                -0.00546709
h(11-dagg)                -0.00335714
h(250-inbt)               -0.00169567
h(200-vis)                 0.00110855
h(hum-50) * h(inbh-1069)  -0.00000328

Selected 12 of 20 terms, and 8 of 9 predictors
Importance: safb, inbh, hum, v500, day, dagg, vis, inbt, wind-unused
Number of terms at each degree of interaction: 1 10 1
GCV 0.04860307    RSS 13.38827    GRSq 0.8203761    RSq 0.8491494

The degree-2 model includes one interaction term, h(hum-50) * h(inbh-1069), between humidity and inversion base height. It is also interesting to note that this model does not include a term based on wind speed. There is a slight improvement over the first-degree (additive) model: RSq rises from 0.838 to 0.849 and GRSq from 0.812 to 0.820.

As with any flexible modeling strategy, cross-validation is critical. The earth() function has some built-in functionality for performing cross-validation, which we consider below.

Cross-validation with MARS using the earth package

The earth() function has built-in cross-validation functionality whose results can be plotted and used to choose an optimal number of terms for a MARS model. This handout demonstrates its use, along with some additional optional arguments that are useful in the model building process. We will use the L.A. Basin ozone concentration data again in our examples.
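The mechanics behind the nfold argument can be sketched in base R. This is not earth's code. It is a minimal sketch under stated assumptions: lm() on the built-in mtcars data stands in for earth() on Ozdata, kfold_rsq() is a hypothetical helper, and the out-of-fold R-square takes the total sum of squares about the training-fold mean, which may differ in detail from how earth computes cv.rsq.

```r
# Sketch of k-fold cross-validated R^2, the idea behind earth's nfold argument.
# lm() stands in for earth() so the example runs with base R only.
kfold_rsq <- function(formula, data, nfold = 10, seed = 1) {
  set.seed(seed)                                  # reproducible fold assignment
  yname <- all.vars(formula)[1]                   # response must be a plain column name
  folds <- sample(rep(seq_len(nfold), length.out = nrow(data)))
  rsq <- numeric(nfold)
  for (k in seq_len(nfold)) {
    train <- data[folds != k, , drop = FALSE]     # fit on the other k - 1 folds
    test  <- data[folds == k, , drop = FALSE]     # hold out fold k
    pred  <- predict(lm(formula, data = train), newdata = test)
    y     <- test[[yname]]
    # out-of-fold R^2, total sum of squares taken about the training mean
    rsq[k] <- 1 - sum((y - pred)^2) / sum((y - mean(train[[yname]]))^2)
  }
  rsq
}

r <- kfold_rsq(mpg ~ wt + hp, data = mtcars, nfold = 5)
round(c(cv.rsq = mean(r), sd = sd(r)), 3)   # mean and sd across folds
```

Each observation is held out exactly once, so every fold model is judged on data it never saw; that is why the cross-validated R-square is typically lower than the in-sample RSq.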
> oz.mars = earth(upoz^.333~.,data=Ozdata,degree=2,nfold=10)
> summary(oz.mars)
Call: earth(formula=upoz^0.333~., data=Ozdata, nfold=10, degree=2)

                         coefficients
(Intercept)                2.60653448
h(60-day)                 -0.00969879
h(day-116)                -0.00195882
h(5740-v500)              -0.00117770
h(67-hum)                 -0.00635632
h(safb-54)                 0.01621764
h(1069-inbh)              -0.00021214
h(dagg-11)                -0.00546709
h(11-dagg)                -0.00335714
h(250-inbt)               -0.00169567
h(200-vis)                 0.00110855
h(hum-50) * h(inbh-1069)  -0.00000328

Selected 12 of 20 terms, and 8 of 9 predictors
Importance: safb, inbh, hum, v500, day, dagg, vis, inbt, wind-unused
Number of terms at each degree of interaction: 1 10 1
GCV 0.04860307    RSS 13.38827    GRSq 0.8203761    RSq 0.8491494    cv.rsq 0.7745141

Note: the cross-validation sd's below are standard deviations across folds

Cross validation:   nterms 12.50 sd 1.27    nvars 7.90 sd 0.32

     cv.rsq    sd    MaxErr    sd
       0.77 0.081      -0.8  0.61

Note that the cross-validated R-square (0.77) is noticeably lower than the in-sample RSq (0.849).

To examine the cross-validation results graphically, it is best to also specify keepxy=T when using nfold to perform the cross-validation.

> oz.mars = earth(upoz^.333~.,data=Ozdata,degree=2,nfold=10,keepxy=T)
> plot(oz.mars,which=1,col.rsq=0)    # which=1 displays only the model selection plot

The col.rsq=0 option removes the RSq (R-square) curve from the plot.

We can choose a different value of k for the k-fold cross-validation performed via the nfold argument. We can also use the ncross argument to repeat the k-fold cross-validation ncross times, which lets us see how variable the k-fold cross-validation results are.
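What ncross adds is a fresh random split for each repetition. The following hypothetical sketch (repeated_cv_rsq() is not an earth function) again uses lm() on mtcars as a stand-in model, and the same simplified out-of-fold R-square. It records one value per fold per repetition so that the split-to-split variability can be seen directly.

```r
# Repeated k-fold CV (the idea behind earth's ncross): re-split the data
# ncross times and record the out-of-fold R^2 of every fold.
# lm() is a stand-in for earth(); repeated_cv_rsq() is a hypothetical helper.
repeated_cv_rsq <- function(formula, data, nfold = 5, ncross = 10, seed = 1) {
  set.seed(seed)
  yname <- all.vars(formula)[1]                # response must be a plain column name
  out <- matrix(NA_real_, nrow = ncross, ncol = nfold)
  for (r in seq_len(ncross)) {
    folds <- sample(rep(seq_len(nfold), length.out = nrow(data)))  # fresh split
    for (k in seq_len(nfold)) {
      train <- data[folds != k, , drop = FALSE]
      test  <- data[folds == k, , drop = FALSE]
      pred  <- predict(lm(formula, data = train), newdata = test)
      y     <- test[[yname]]
      out[r, k] <- 1 - sum((y - pred)^2) / sum((y - mean(train[[yname]]))^2)
    }
  }
  out
}

cv <- repeated_cv_rsq(mpg ~ wt + hp, data = mtcars, nfold = 5, ncross = 10)
round(rowMeans(cv), 3)                               # one CV estimate per repetition
round(c(mean = mean(cv), sd = sd(as.vector(cv))), 3) # sd across all folds
```

The spread of rowMeans(cv) shows how much a single k-fold estimate depends on the particular random split, which is the variability the ncross plots display.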
> oz.mars = earth(upoz^.333~.,data=Ozdata,degree=2,ncross=10,nfold=5,keepxy=T)
> plot(oz.mars,which=1,col.rsq=0)

When using a small value of k we can plot the results for each fold individually as follows:

> oz.mars = earth(upoz^.333~.,data=Ozdata,degree=2,nfold=3,keepxy=T)
> plot.earth.models(oz.mars$cv.list,which=1)

With many repetitions we can instead show the in-fold R-square curve of every fold model (col.infold.rsq) together with their mean (col.mean.infold.rsq), suppressing the GRSq and RSq curves:

> oz.mars = earth(upoz^.333~.,data=Ozdata,degree=2,ncross=20,nfold=5,keepxy=T)
> plot(oz.mars,which=1,col.mean.infold.rsq="blue",col.infold.rsq="lightblue",
+ col.grsq=0,col.rsq=0)

Measuring Variable Importance - evimp() command

The evimp() function uses three criteria for estimating variable importance in a MARS model. The nsubsets criterion counts the number of model subsets that include the variable; there is one subset for each model size from 1 up to the size of the selected model. The rss criterion first calculates the decrease in the RSS for each subset relative to the previous subset, and then, for each variable, sums these decreases over all subsets that include the variable. The gcv criterion is computed in the same way but uses the GCV instead of the residual sum of squares. Adding a variable can actually increase the GCV, due to the model complexity penalty in the denominator.

> evimp(oz.mars,trim=FALSE)
            nsubsets   gcv   rss
safb              11 100.0 100.0
inbh              10  48.9  50.8
hum                9  39.3  41.5
v500               8  32.8  35.1
day                7  29.5  31.7
dagg               5  22.2  24.2
vis                4  13.4  16.6
inbt               2   7.8  10.5
wind-unused        0   0.0   0.0

The trim=FALSE option also displays variables that are not used in the model. The gcv and rss columns are scaled so that the most important variable scores 100.

Additional useful/interesting options for the earth() function

Example 2: Boston Housing Data

Using the same training and test/validation sets as in the example at the end of Section 6 of the notes, we first fit an "optimal" MARS model to the training set and then predict the test cases. We can then compare the mean PSE to the best models identified previously, i.e. random forest, boosted trees, and treed regression. Using the internal GCV criterion, a degree=1 model with approximately 20 terms appears optimal.
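The GCV criterion used here, and mentioned in the evimp discussion, can be computed by hand. The sketch below assumes the formula described in the earth package documentation: GCV = RSS / (n(1 - C/n)^2), where the effective number of parameters is C = nterms + penalty * (nterms - 1)/2 and the default penalty is 2 for degree = 1 and 3 for degree > 1. It uses the printed ozone summaries as a check. The row count n = 330 for Ozdata is an assumption (it is not printed in these notes), chosen because it reproduces the printed GCV values. The helper name gcv() is hypothetical.

```r
# GCV = RSS / (n * (1 - C/n)^2), effective number of parameters
# C = nterms + penalty * (nterms - 1) / 2
# (earth's default penalty: 2 for degree = 1, 3 for degree > 1)
gcv <- function(rss, n, nterms, penalty) {
  C <- nterms + penalty * (nterms - 1) / 2
  rss / (n * (1 - C / n)^2)
}

n <- 330   # ASSUMPTION: number of rows in Ozdata (not printed in these notes)

# Reproduce GCV from summary(oz.mars2) (degree 1) and summary(oz.mars3) (degree 2)
gcv2 <- gcv(rss = 14.35741, n = n, nterms = 13, penalty = 2)
gcv3 <- gcv(rss = 13.38827, n = n, nterms = 12, penalty = 3)

# GRSq = 1 - GCV / GCV of the intercept-only model; TSS recovered from RSq
tss   <- 14.35741 / (1 - 0.8382297)
grsq2 <- 1 - gcv2 / gcv(rss = tss, n = n, nterms = 1, penalty = 2)

# compare with 0.05093197, 0.04860307 and 0.8117691 in the summaries
print(round(c(gcv2 = gcv2, gcv3 = gcv3, grsq2 = grsq2), 5))

# A 14th term that left RSS unchanged would raise GCV ...
gcv14_same_rss <- gcv(rss = 14.35741, n = n, nterms = 14, penalty = 2)
# ... so it only lowers GCV if it brings RSS below this break-even value
# (GCV is linear in RSS, so divide by the GCV of RSS = 1)
rss_break_even <- gcv2 / gcv(rss = 1, n = n, nterms = 14, penalty = 2)
print(round(c(gcv14_same_rss = gcv14_same_rss, rss_break_even = rss_break_even), 4))
```

This is the sense in which adding a term or variable can increase the GCV: the penalty grows with every term, so a new term must reduce the RSS by more than the penalty it adds before GCV will choose a larger model.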
> bos.mars = earth(logmedv~.,data=Boston.train,degree=1,nfold=5,ncross=10,keepxy=T,nprune=30)
> summary(bos.mars)
Call: earth(formula=logmedv~., data=Boston.train, keepxy=T, ncross=10, nfold=5, degree=1, nprune=30)

                 coefficients
(Intercept)         3.4543157
h(B-394.43)        -0.0208361
h(394.43-B)        -0.0005712
h(crim-4.34879)    -0.0105045
h(4.34879-crim)    -0.0496444
h(6.3361-dis)       0.0502471
h(lstat-6.47)      -0.0302173
h(6.47-lstat)       0.0519796
h(lstat-29.29)      0.0485452
h(nox-0.488)       -2.9008332
h(nox-0.544)        3.2657694
h(nox-0.655)      -13.2343127
h(nox-0.693)       17.9503513
h(nox-0.77)        -7.0763230
h(ptratio-14.7)    -0.0306274
h(rm-6.319)         0.2069193
h(6.319-rm)         0.0655710
h(284-tax)          0.0013811

Selected 18 of 22 terms, and 8 of 13 predictors
Importance: lstat, rm, crim, nox, dis, ptratio, B, tax, age-unused, chas-unused, indus-unused, ...
Number of terms at each degree of interaction: 1 17 (additive model)
GCV 0.0264371    RSS 8.962633    GRSq 0.8480453    RSq 0.8724877    cv.rsq 0.8129899

Note: the cross-validation sd's below are standard deviations across folds

Cross validation:   nterms 18.52 sd 1.23    nvars 8.96 sd 0.64

     cv.rsq    sd    MaxErr    sd
       0.81 0.047       1.2  0.66

> plotmo(bos.mars)
> plot(bos.mars,which=1,col.mean.infold.rsq="blue",col.infold.rsq="lightblue",col.grsq=0,col.rsq=0)
> ypred = predict(bos.mars,newdata=Boston.test)
> mean((Boston.test$logmedv-ypred)^2)
[1] 0.02864426

Compared to random forest, boosted trees, and boosted treed regression, MARS does not measure up for this particular regression problem (its mean PSE on the test cases is 0.0286). MARS is one of the main algorithms used in Salford Systems' (https://www.salford-systems.com/) commercial predictive analytics software. Although MARS did not outperform the tree-based methods, it does produce a much more interpretable model. The summary of the MARS model shown below demonstrates that a MARS fit is just an OLS model with automatically chosen terms and interactions.
> summary(bos.mars)
Call: earth(formula=logmedv~., data=Boston.train, keepxy=T, ncross=10, nfold=5, degree=1, nprune=30)

                 coefficients
(Intercept)         3.4543157
h(B-394.43)        -0.0208361
h(394.43-B)        -0.0005712
h(crim-4.34879)    -0.0105045
h(4.34879-crim)    -0.0496444
h(6.3361-dis)       0.0502471
h(lstat-6.47)      -0.0302173
h(6.47-lstat)       0.0519796
h(lstat-29.29)      0.0485452
h(nox-0.488)       -2.9008332
h(nox-0.544)        3.2657694
h(nox-0.655)      -13.2343127
h(nox-0.693)       17.9503513
h(nox-0.77)        -7.0763230
h(ptratio-14.7)    -0.0306274
h(rm-6.319)         0.2069193
h(6.319-rm)         0.0655710
h(284-tax)          0.0013811

Selected 18 of 22 terms, and 8 of 13 predictors
Importance: lstat, rm, crim, nox, dis, ptratio, B, tax, age-unused, chas-unused, indus-unused, ...
Number of terms at each degree of interaction: 1 17 (additive model)
GCV 0.0264371    RSS 8.962633    GRSq 0.8480453    RSq 0.8724877    cv.rsq 0.8068335
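A hypothetical sketch of that last point: once the basis functions (knots and products of hinges) are fixed, the coefficients are ordinary least-squares estimates, so lm() on the same columns produces the fit. The data below are simulated, not the Boston data, and the knots (x at 4, z at 2) are chosen by hand. What earth adds is choosing the knots and products automatically through its forward and pruning passes, which this sketch does not attempt. The helper names hinge_pos() and hinge_neg() are illustrative.

```r
# Hinge functions (x - t)_+ and (t - x)_+
hinge_pos <- function(x, t) pmax(0, x - t)
hinge_neg <- function(x, t) pmax(0, t - x)

# Hypothetical data: a known hinge model plus a little noise
set.seed(42)
d <- expand.grid(x = seq(0, 10, by = 0.5), z = seq(0, 5, by = 0.5))
d$y <- 1 + 2 * hinge_pos(d$x, 4) - 0.5 * hinge_neg(d$x, 4) +
  0.3 * hinge_pos(d$x, 4) * hinge_pos(d$z, 2) + rnorm(nrow(d), sd = 0.1)

# With the knots taken as given, the fit is plain OLS;
# I() keeps the product as a single regressor instead of a formula interaction
ols <- lm(y ~ hinge_pos(x, 4) + hinge_neg(x, 4) +
            I(hinge_pos(x, 4) * hinge_pos(z, 2)), data = d)
round(coef(ols), 2)   # estimates should sit close to 1, 2, -0.5 and 0.3
```

Because the model is linear in the hinge columns, the usual OLS machinery (coefficients, residuals, predictions) applies directly. This is also why the earth summary reads like an lm() coefficient table.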