Appendix S4. Examining the Asymmetric Boundary for ‘Extreme’ Size Change
Our analysis of mainland rodent size variation produced an asymmetric distribution of size ratios (each population's size relative to its mainland species average). Populations above the 97.5% quantile were at least 9% bigger than their mainland species average, while populations below the 2.5% quantile were more than 20% smaller. Because the upper and lower quantiles capture such different degrees of size change, we took our original sample of 306 island species and examined how well both the classification tree and random forest methods performed with our asymmetric boundaries (‘mainland boundaries’) compared to symmetric boundaries. The symmetric boundaries used either the upper quantile (‘inclusive boundaries’, which consider more populations to be extreme by defining ‘normal’ size ratios as 0.91-1.09) or the lower quantile (‘strict boundaries’, which define ‘normal’ size ratios as 0.80-1.20) to determine extreme size. We used the same variables as in our original analysis; the only difference was the number of cases included in the analyses.
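The three boundary schemes can be illustrated with a minimal Python sketch. The ‘normal’ ranges come from the ratios quoted above; the function and dictionary names are hypothetical, and the treatment of values that fall exactly on a boundary (lower bound exclusive, upper bound inclusive) is an assumption based on the wording "more than 20% smaller" and "at least 9% bigger".

```python
# 'Normal' size-ratio ranges (lower, upper) for each boundary scheme.
# The numbers are taken from the text; the names are illustrative only.
BOUNDARIES = {
    "mainland": (0.80, 1.09),   # asymmetric, from the mainland quantiles
    "inclusive": (0.91, 1.09),  # symmetric, based on the upper quantile
    "strict": (0.80, 1.20),     # symmetric, based on the lower quantile
}


def classify(size_ratio, scheme):
    """Label a population as 'Small', 'Normal', or 'Large' under a scheme.

    Assumption: ratios below the lower bound are 'Small' and ratios at or
    above the upper bound are 'Large'.
    """
    lower, upper = BOUNDARIES[scheme]
    if size_ratio < lower:
        return "Small"
    if size_ratio >= upper:
        return "Large"
    return "Normal"


def extreme_cases(size_ratios, scheme):
    """Keep only the populations classified as extreme under a scheme."""
    labelled = [(ratio, classify(ratio, scheme)) for ratio in size_ratios]
    return [(ratio, label) for ratio, label in labelled if label != "Normal"]
```

For example, a population with a size ratio of 0.85 is ‘normal’ under the mainland and strict boundaries but counts as Small under the inclusive boundaries, which is why the number of cases differs between analyses.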
Once the trees and random forests were built, we ran the independent data set (new data obtained from Lomolino et al., 2013) through each model to determine the predictive accuracy for each sample size. Additionally, for each sample size, we built 1,000 random forests from random samples of the original 306 species to examine the impact of sample size on predictive accuracy.
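The repeated random sampling step can be sketched as follows. This is only an outline under stated assumptions: `score_model` is a hypothetical callable that would fit a random forest to a sample and return its misclassification rate, and sampling without replacement is assumed; neither detail is specified in the text.

```python
import random
from statistics import mean


def mean_misclassification(cases, sample_size, score_model,
                           n_repeats=1000, seed=0):
    """Average the misclassification rate over repeated random samples.

    cases       -- list of all cases (here, the 306 island species)
    sample_size -- number of cases drawn per repeat (e.g. 120, 169, 206)
    score_model -- hypothetical callable: fits a model to the sample and
                   returns its misclassification rate as a fraction
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    rates = []
    for _ in range(n_repeats):
        # Assumption: each sample is drawn without replacement.
        sample = rng.sample(cases, sample_size)
        rates.append(score_model(sample))
    return mean(rates)
```

Calling this once per sample size (120, 169, and 206) with a real model-fitting function would give one mean rate per sample size.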
For most classification tree and random forest analyses, models using the boundaries based on quantiles of the mainland body size distribution performed better than either the inclusive or the strict symmetric boundaries (see table below). Random forests built from random samples of the insular rodent populations (including both ‘normal’ and extreme populations) had similar misclassification rates regardless of sample size, and performed two to three times worse than any other model. This demonstrates that clearer patterns emerge when the focus is on size differences that exceed those commonly found among populations on the mainland.
Applying strict boundaries, as compared to the mainland quantile boundaries, is a matter of reducing the number of Big island populations in the sample. In general, doing so raised the misclassification rate, which is to be expected because the trees in these analyses were found to be more successful in predicting size increase than size decrease on islands. Predictive classification of the independent data appears to have improved, but at this small sample size the difference between the misclassification rate using strict boundaries (~11%) and that using mainland quantile boundaries (~15%) amounts to a single case.
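The ‘single case’ remark can be checked by converting the reported rates back into counts. The sketch below uses the independent-data sample sizes and rates from the table (27 cases at 11.11% and 39 cases at 15.38%) and assumes each rate is simply the number of misclassified cases divided by the number of cases; the function name is illustrative.

```python
def misclassified_count(rate_percent, n_cases):
    """Convert a misclassification rate (in percent) back to a case count."""
    return round(rate_percent / 100 * n_cases)


# Independent data under strict boundaries: 27 cases, 11.11% misclassified.
strict_count = misclassified_count(11.11, 27)    # 3 cases

# Independent data under mainland quantile boundaries: 39 cases, 15.38%.
mainland_count = misclassified_count(15.38, 39)  # 6 cases

# One additional misclassified case under strict boundaries would give
# 4 / 27, or about 14.81%, close to the mainland quantile rate.
strict_plus_one = round((strict_count + 1) / 27 * 100, 2)
```

With only 27 independent cases under strict boundaries, each case is worth about 3.7 percentage points, so a four-point gap corresponds to roughly one misclassification.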
When more inclusive boundaries are applied, the large disparity in predictive accuracy between classification trees and random forests suggests that classification trees produced using these boundaries are relatively unstable. Using inclusive boundaries adds Small island populations to the sample, and the likelihood that undetected subadult or juvenile individuals have been included and influenced population averages is greater in Small island populations whose Size Ratios fall closer to 1.
Effects of Shifting the “Extreme” Boundaries
                                                        Misclassification rate
Boundaries                          # Large / # Small   Classification  Random    Predictive (tree),  Random forest,    Random sample mean
                                    (original, new*)    tree            forest    independent data    independent data  (random forest)
Strict (n=120, new*=27)             83 / 37, 22 / 5     15.00%          16.38%    11.11%              15.67%            35.67%
Mainland quantile (n=169, new*=39)  132 / 37, 34 / 5    10.00%          13.11%    15.38%              11.30%            34.37%
Inclusive (n=206, new*=43)          132 / 74, 34 / 9    16.02%          26.28%    23.26%              18.79%            33.77%
*‘new’ refers to the independent data set.