Online Resource 1: Quantification and characterization of natural and anthropogenic environmental patterns across the study area

ARTICLE TITLE: Effects of urbanization on herbaceous forest vegetation: the relative impacts of soil, geography, biotic interactions, human access, and an invasive shrub
JOURNAL: Urban Ecosystems
AUTHORS: Guy N. Cameron1, Theresa M. Culley1, Sarah E. Kolbe2, Arnold I. Miller2, and Stephen F. Matter1
AFFILIATION: 1Department of Biological Sciences and 2Department of Geology, University of Cincinnati, Cincinnati, Ohio 45221
Corresponding Author: G. Cameron (g.cameron@uc.edu)

This supplement provides an overview of natural and anthropogenic environmental patterns at our study sites [Miami Whitewater Forest (MWW), Mt. Airy Forest (MAF), Benedict Nature Preserve (BEN), East Fork Wildlife Area (EF), Tranquility Wildlife Area (TRA), Edge of Appalachia Preserve (EOA); see main text (Methods and Materials: Study sites) for details on study sites]. In the following sections, we describe the methods used for collecting and analyzing environmental data, present the results of environmental analyses, and provide the basis for classifying these sites as Urban, Exurban, and Wildland.

Methods used to collect data

Edaphic variables: Our methods for collecting and analyzing data from soil cores are described in the main text (Methods: Environmental measures). Additional data on soil taxonomy, drainage class, and available water storage capacity were extracted from the Soil Survey Geographic Database (SSURGO 2011). Likewise, our use of GIS datasets to acquire information on elevation, aspect, and slope of our study plots, as well as the distance of each plot to the nearest primary and secondary roads, is described in the main text (Methods: Environmental measures).
Deposition of atmospheric pollutants: Measures of wet deposition of NO3-, NH4+, total nitrogen (TN), and SO42- were obtained using raster datasets created from data collected through the National Atmospheric Deposition Program (NRSP-3 2012; 4-km resolution) to assess the impacts of atmospheric deposition from coal-fired power plants and other major point sources of pollution.

Annual precipitation, average maximum temperature, and average minimum temperature are based on 30-year climate-normal data from 1971 to 2000. These data were extracted from raster datasets compiled by the PRISM Climate Group (PRISM 2008) from weather stations maintained by the National Weather Service, Natural Resources Conservation Service, United States Forest Service, the Bureau of Land Management, and other state and local station networks.

Population density: Population density in 2010 and change in population density between 2000 and 2010 were assembled from U.S. Census data (U.S. Census Bureau 2011) to assess contemporary population characteristics and recent trends near each plot. Change in population density was assessed over this relatively short time interval to capture volatility. The short time interval also decreases the influence of artefactual differences in population density related to changes in census block borders through time. To estimate population density for each time period, we created a buffer with a 1-km radius around each plot centroid and calculated average population density for the buffer from the population densities of the census blocks that intersected the buffer, weighted by their percent areal coverage within the buffered zone. This approach was preferred over extracting population density values directly from the census blocks containing the plots because many plots were located near the borders of census blocks, so the population densities of those blocks may not accurately represent true population densities near the plots.
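The area-weighted buffer average described above can be sketched as follows. This is a minimal illustration and not the authors' GIS workflow: the function name `buffer_population_density` and the input layout (one pair of block density and overlap area per intersecting census block) are assumptions, and the overlap areas are taken as already computed by a GIS intersection step.

```python
import math


def buffer_population_density(blocks, radius_km=1.0):
    """Area-weighted mean population density inside a circular plot buffer.

    blocks: iterable of (density_per_km2, overlap_area_km2) pairs, one per
    census block intersecting the buffer (hypothetical input layout).
    """
    blocks = list(blocks)
    buffer_area = math.pi * radius_km ** 2
    covered = sum(area for _, area in blocks)
    if covered <= 0:
        raise ValueError("no census block overlaps the buffer")
    if covered > buffer_area * (1 + 1e-9):
        raise ValueError("overlap areas exceed the buffer area")
    # Each block is weighted by its share of the covered buffer area.
    return sum(density * area for density, area in blocks) / covered
```

For example, a block of 100 people/km2 covering 1 km2 of the buffer and a block of 500 people/km2 covering 2 km2 give (100 x 1 + 500 x 2) / 3, or about 367 people/km2.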
A buffer radius of 1 km was selected to balance the need to characterize the area around each plot against the need to minimize overlap among the buffers of adjacent plots. Previous studies suggest that even in fully forested environments, edge-affected zones may be as deep as 1 km, so it is reasonable to expect that anthropogenic activities occurring within that distance from our plots may affect the communities present there (Gascon et al. 2000). Using this relatively large buffer size is a conservative approach: buffers serve as windows to a landscape and its features, and heterogeneous landscapes generally appear more homogeneous as buffer size increases, decreasing the likelihood of significant differences among different areas (Baker et al. 1995). Although the 1-km plot buffers overlap to some degree at each locality, the buffer approach was favored over the use of a locality “average” value because it permits diagnosis of variation among plots within localities even when parts of buffers were shared.

Roads: In addition to the proximity to roads described in the main text (Methods: Environmental measures), we calculated total density of roads around study plots by summing the total length of roads circumscribed by a 1-km buffer around each plot centroid, analogous to the standard method for calculating drainage density within a basin area (Tucker et al. 2001).

Land cover and land-use change: Remotely sensed multispectral imagery was used to characterize current land cover and changes in land use at each study site over the 23-year interval from 1988 to 2011, selected based on image availability. We used Landsat 5 imagery because this satellite has been operational since 1984, allowing for comparison among images collected by the same platform using identical bands and sensors in both 1988 and 2011. For each locality, a satellite image was selected from each year using USGS EarthExplorer (http://earthexplorer.usgs.gov).
Images were selected to be cloud-free and to represent the same time of year to minimize potential differences introduced by variation in moisture and plant phenology. For MWW, BEN, and EF, cloud-free scenes from June 6, 1988 and June 6, 2011 were selected. For EOA and TRA, scenes from May 30, 1988 and May 30, 2011 were selected. Appropriate cloud-free scenes were not available for May or June prior to 1988. A high-resolution panchromatic band from Landsat 7 ETM+ was also downloaded for each area to aid in the identification of training areas for land cover classification. After image acquisition, composite images were formed using bands 1-5 and 7, representing visible blue, green, red, near-infrared, and mid-infrared wavelengths, and training areas representing forest, agricultural, water, residential/transitional, and urban land cover were selected (Online Resource 1 Fig. 1). The same training areas were used for each image wherever possible; overlap between the east and west scenes permitted the identification of training areas for each land cover type present in all four images. Supplemental training areas were identified for each image as necessary to fully characterize each land cover type. Maximum likelihood-based supervised classification was applied to determine land-cover types for each image individually using ENVI 4.4, and post-processing majority analysis was used to smooth images. Classified images then were exported into a geographic information system. Images were combined into a single composite raster representing land cover transitions over the 23-year interval. Changes in land use were quantified within a 1-km buffer radius to determine the total percentage of land that experienced any transition in land use over the interval, the percentage of land that experienced a transition to “urban” land use, and the percentage of land that transitioned to any disturbed state (e.g. “residential”, “agricultural”), to distinguish between active, anthropogenically mediated transitions in land use and passive transitions, such as reforestation after agricultural abandonment.

The classified image from 2011 also was used to quantify current land cover conditions and measures of habitat fragmentation near each plot. The percentage of each land cover type was determined within a 1-km buffer radius of each plot. To quantify forest fragmentation, we calculated the area-weighted mean size, size variability, and edge-to-area ratio of forest patches within each plot’s buffer radius. To characterize landscape heterogeneity, we also calculated the diversity of land cover types and the total edge density (in km/km2) within each 1-km buffer.

Estimating canopy cover from LiDAR: Canopy cover, which represents the percentage of forest floor covered by the vertical projection of tree crowns (McLane et al. 2009), strongly influences the amount of light available in the forest understory, and therefore may influence tree community composition and structure. To estimate canopy cover at each study site, we used discrete-return LiDAR data available through the Ohio Statewide Imagery Program (OSIP). These data were collected with a Leica ALS50 digital LiDAR System at a flight altitude of 2.2 km, resulting in an average post spacing of approximately 2 m. LiDAR returns were separated into ground and aboveground points. LiDAR-based canopy cover was calculated as the ratio of all (first, last, single, and intermediate) canopy returns to the total of all (ground plus canopy) returns:

$$CC_{AR} = \frac{R_{\mathrm{Canopy(all)}}}{R_{\mathrm{Total(all)}}},$$

where $R_{\mathrm{Canopy(all)}}$ represents all canopy returns, and $R_{\mathrm{Total(all)}}$ represents all ground and aboveground returns.
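As a minimal sketch of the all-returns ratio, the function below counts canopy returns against all returns. It rests on several assumptions that are not taken from the OSIP processing itself: each return already carries a height above ground in meters, canopy returns are those at or above a height threshold (the default is the 1.37 m value used in this supplement), and the denominator includes every return. The function name `canopy_cover_all_returns` is hypothetical.

```python
def canopy_cover_all_returns(heights_m, threshold_m=1.37):
    """Fraction of LiDAR returns classed as canopy (all-returns ratio).

    heights_m: height above ground, in meters, for every return
    (first, last, single, and intermediate); ground returns are near 0.
    """
    heights = list(heights_m)
    if not heights:
        raise ValueError("no LiDAR returns supplied")
    # Assumption: a return exactly at the threshold counts as canopy.
    canopy = sum(1 for h in heights if h >= threshold_m)
    return canopy / len(heights)


# Hypothetical returns for one plot, not study data.
example = [0.0, 0.2, 1.0, 5.0, 12.0, 20.0, 18.5, 0.0]
print(canopy_cover_all_returns(example))  # prints 0.5
```

Multiplying the result by 100 expresses canopy cover as a percentage.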
A similar metric using the canopy-to-total ratio of first returns ($CC_{FR}$) is also frequently used as a measure of canopy cover, and is sometimes favored because intermediate and last returns provide little additional information when closely spaced first returns are available (Morsdorf et al. 2006). Given the post spacing of the OSIP data, however, and the density of the canopy cover at our study sites, there were not enough first ground returns to reliably calculate $CC_{FR}$. $CC_{AR}$ has been demonstrated to reliably represent true canopy cover (Smith et al. 2009, Hopkinson and Chasmer 2009), and LiDAR-based measures of canopy cover can actually surpass the accuracy of quickly measured field-based estimates (Morsdorf et al. 2006, Smith et al. 2009, Korhonen et al. 2011). A height threshold of 1.37 m was used to separate canopy returns from other non-ground returns. This threshold was selected because it is the average height at which field-based measurements of vertical canopy cover and diameter at breast height are recorded (Morsdorf et al. 2006, Smith et al. 2009, Korhonen et al. 2011), and because overstory and understory vegetation can be efficiently separated at this height across a broad range of forest types (McLane et al. 2009).

Quantitative data analyses

In our initial analyses of the environmental data listed below in Results, we illustrate differences in degree of anthropogenic influence among Urban, Exurban, and Wildland sites based on variables that describe population density and flux, proximity to roads, current land cover and transitions in land use, canopy cover at our study sites based on LiDAR, measures of habitat fragmentation and heterogeneity, and sources of atmospheric pollution. For variables that were measured using a 1-km buffer radius, the explanation of differences is descriptive, as the presence of overlapping buffers at some sites prohibits the use of any statistical test that is based on an assumption of independent samples.
For other variables, Kruskal-Wallis tests were used to identify significant differences among Urban, Exurban, and Wildland sites for variables that met the distribution assumptions of the test; Wilcoxon rank sum tests then were used to assess pairwise differences among the three groups.

Finally, to test our initial assignments of each study site to one of the three categories (Urban, Exurban, Wildland), we used a classification tree and linear discriminant analysis (McGarigal et al. 2000). For both approaches, all anthropogenic and natural explanatory environmental variables were included as possible predictors of study site classification to determine whether study sites could reliably be assigned to the categories based on differences in any of the environmental variables. In both analyses, classification error was calculated as the percent of plots incorrectly assigned to a site category. For the discriminant analysis, error was further assessed using leave-one-out cross-validation (LOOCV), in which a single sample is omitted from the data set, a classification function is derived from the remaining samples, and the omitted sample is classified (Allen 1971). The process is repeated sequentially for each sample, and the proportion of omitted samples that are misclassified represents the LOOCV error. LOOCV provides unbiased estimates of the accuracy of the linear discriminant analysis when the number of explanatory variables is large relative to the number of samples (Molinaro et al. 2005).

Results

As expected, human population density was highest at Urban sites, lowest at Wildland sites, and intermediate at Exurban sites (Online Resource 1 Fig. 2a). Urban populations also were more volatile over the period from 2000 to 2010.
Change in population density over the 10-year interval ranged from -195 to +82 people/km2 near Urban sites, as compared to Wildland and Exurban sites, where change in population density ranged from -1 to +1 people/km2 and -9 to +19 people/km2, respectively (Online Resource 1 Fig. 2b).

At all study sites, plots were < 1 km from the nearest road (Online Resource 1 Fig. 3a). Mean distance to the nearest road was lowest at Urban sites, significantly higher at Exurban sites (Wilcoxon rank sum test, W = 174.5, p < 0.001), and intermediate at Wildland sites. Plots in Wildland sites varied the most in their distances from roads. Patterns were similar when only major roads (primary and secondary highways) were considered (Online Resource 1 Fig. 3b). Urban sites had a much higher road density within a 1-km buffer radius than Exurban or Wildland sites, while Exurban sites had the lowest road density (Online Resource 1 Fig. 3c).

The distribution of land cover types varied among sites (Online Resource 1 Fig. 4). At Exurban and Wildland sites, forest cover was dominant within the 1-km buffer radius (70 and 71%, respectively), while at Urban sites, forest cover was less common (43%), and the mean area of residential and forest cover was approximately equal. Agricultural cover was absent near Urban sites, and highest near Wildland sites (16%). Residential cover was highest at Urban sites (45%), intermediate at Exurban sites (19%), and lowest at Wildland sites (12%). Urban cover was highest near Urban sites (12%), and nearly absent near Wildland sites (0.3%). Areal coverage of rivers and lakes was substantial only at Exurban sites, where an average of approximately 6% of land near plots was classified as ‘water’, reflecting the proximity of our Exurban plots at EF to East Fork Lake.

Transitions in land use were greatest in areas surrounding Wildland sites, where more than 25% of land experienced a change in land cover over the period from 1988 to 2011 (Online Resource 1 Fig. 5).
Exurban sites also had relatively high levels of total land use change (24%), while Urban sites experienced the lowest levels (18%). However, when only transitions to a highly anthropogenically disturbed state (“urban”, “residential”, and “agricultural”) were considered, the degree of land use transition experienced at each site over the 23-year interval was approximately equal (14 to 15% at all sites). The difference for each site between total land use transitions and transitions to disturbed states reflects the relative importance of reversions from agricultural or residential land to forest vegetation. Reversion to forest was relatively high near Wildland and Exurban sites, where it accounted for a high proportion of total land-use change (45% and 43% of total land use change at Wildland and Exurban sites, respectively). Land-use transitions from forest, agricultural, or residential cover to urban land cover were greatest near Urban sites (4.4%), and much lower near Exurban (0.7%) and Wildland sites (0.3%).

The area-weighted mean size of forest patches within a 1-km buffer radius of each plot was lowest at Urban sites, and 2-3 times higher at Exurban and Wildland sites (Online Resource 1 Fig. 6a). Variance in the size of forest patches contained within a buffer also varied along the gradient: Urban sites had relatively low variability in forest patch size, Wildland sites had the highest variability in forest patch size, and Exurban sites had an intermediate level of variability (Online Resource 1 Fig. 6b). This reflects the absence of large forest patches in urban settings; the mean maximum forest patch size observed near Urban plots was 1.2 km2, compared to 2.1 and 2.2 km2 at Exurban and Wildland sites, respectively. The ratio of forest edge to forest area decreased from Urban to Exurban to Wildland sites (Online Resource 1 Fig. 6c), indicating that core forest area is largest at Wildland sites, and smallest at Urban sites.
Total edge density was highest at Urban sites (19.1 km/km2) and lower at Exurban (16.6 km/km2) and Wildland sites (16.4 km/km2), indicating a higher degree of landscape heterogeneity and fragmentation in urban settings.

Atmospheric wet deposition of NH4+, NO3-, SO42-, and TN generally increased along a west-east transect, although differences among study sites were not statistically significant. For all measures, wet atmospheric deposition was lower at the Urban sites and at the western Exurban site (MWW), and higher at the Wildland sites and the eastern Exurban site (EF) (Online Resource 1 Fig. 7). Variation along the gradient was strongest for SO42-, while variation among sites in NH4+, NO3-, and TN was relatively low. Because all measures of wet deposition were highly correlated (R2 = 0.75 to 0.97), a principal components analysis was used to reduce the four variables to one composite PCA axis, which summarized 98% of the variation in the original dataset. This composite atmospheric deposition variable was used in all subsequent analyses described in the main text.

Finally, a classification tree based on all anthropogenic and natural environmental explanatory variables demonstrated that the assignment of localities to each site category (Urban, Exurban, Wildland) could be achieved using a single explanatory variable, population density in 2010 (Online Resource 1 Fig. 8). Based on this classification tree, all plots were assigned to their appropriate site category with 100% accuracy. This result indicates that population density is a key difference among the three site categories. More importantly, the accuracy and simplicity of this classification tree indicate that our initial site groupings are statistically supported and reflect meaningful differences in explanatory environmental variables among site categories.
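To illustrate what it means for site categories to be separable on a single variable, the sketch below checks whether labeled values form non-interleaved groups along one axis and, if so, returns the midpoint cut values between adjacent groups. This is a simplified stand-in and not the tree-fitting procedure used for Online Resource 1 Fig. 8. The function name `single_variable_splits` is hypothetical, and the densities in the example are made-up numbers rather than study data.

```python
def single_variable_splits(values, labels):
    """Return cut points separating classes along one variable, or None.

    A list of midpoints between adjacent class groups is returned only
    when every class occupies one contiguous range of the sorted values.
    """
    pairs = sorted(zip(values, labels))
    if not pairs:
        raise ValueError("no samples supplied")
    cuts = []
    prev_value, prev_label = pairs[0]
    seen = {prev_label}
    for value, label in pairs[1:]:
        if label != prev_label:
            # A class that reappears, or a tie across classes, means the
            # classes cannot be separated by thresholds on this variable.
            if label in seen or value == prev_value:
                return None
            cuts.append((prev_value + value) / 2)
            seen.add(label)
        prev_value, prev_label = value, label
    return cuts


# Hypothetical 2010 population densities (people/km2), not study data.
density = [900, 1200, 150, 200, 10, 20]
category = ["Urban", "Urban", "Exurban", "Exurban", "Wildland", "Wildland"]
print(single_variable_splits(density, category))  # prints [85.0, 550.0]
```

Two cut points are enough to separate three categories, which corresponds to a tree with two splits on the same variable.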
Linear discriminant analysis also produced a classification function with 100% assignment accuracy based on both percent correct assignment and LOOCV. This result provides further support for the validity of our initial site assignments.

References

Allen DM (1971) Mean square error of prediction as a criterion for selecting variables. Technometrics 13:469-475
Baker BW, Cade BS, Mingus WL, McMillen JL (1995) Spatial analysis of sandhill crane nesting habitat. J Wildl Manage 59:752-758
Gascon C, Williamson GB, da Fonseca GAB (2000) Receding forest edges and vanishing reserves. Science 288:1356-1358
Hopkinson C, Chasmer L (2009) Testing LiDAR models of fractional cover across multiple forest ecozones. Remote Sens Env 113:275-288
Korhonen L, Korpela I, Heiskanen J, Maltamo M (2011) Airborne discrete-return LIDAR data in the estimation of vertical canopy cover, angular canopy closure, and leaf area index. Remote Sens Env 115:1065-1080
McGarigal K, Cushman S, Stafford S (2000) Multivariate statistics for wildlife and ecology research. Springer Verlag, New York
McLane AJ, McDermid GJ, Wulder MA (2009) Processing discrete-return profiling lidar to estimate canopy closure for large-area forest mapping and management. Canadian J Remote Sens 35:217-229
Molinaro AM, Simon R, Pfeiffer RM (2005) Prediction error estimation: a comparison of resampling methods. Bioinformatics 21:3301-3307
Morsdorf F, Kötz B, Meier E, Itten KL, Allgöwer B (2006) Estimation of LAI and fractional cover from small footprint airborne laser scanning data based on gap fraction. Remote Sens Env 104:50-61
National Atmospheric Deposition Program (NRSP-3). 2012. NADP Program Office, Illinois State Water Survey, 2204 Griffith Dr., Champaign, IL
PRISM Climate Group. 2008. Oregon State University. http://www.prism.oregonstate.edu/index.phtml
Smith AMS, Falkowski MJ, Hudak AT, Evans JS, Robinson AP, Steele CM (2009) A cross-comparison of field, spectral, and lidar estimates of forest canopy cover. Canadian J Remote Sens 35:447-459
Soil Survey Staff, Natural Resources Conservation Service, United States Department of Agriculture. Soil Survey Geographic (SSURGO) Database for Ohio. http://soildatamart.nrcs.usda.gov. January 15, 2011.
Tucker GE, Catani F, Rinaldo A, Bras RL (2001) Statistical analysis of drainage density from digital terrain data. Geomorphology 36:187-202
U.S. Census Bureau. 2008. Estimates of population changes for metropolitan statistical areas and rankings: July 1, 2007 to July 1, 2008. CBSA-EST2008-05.
U.S. Census Bureau. 2011. 2010 Census Summary File 1 - Ohio.
U.S. Geological Survey, Earth Resources Observation and Science Center. http://earthexplorer.usgs.gov.

Online Resource 1 Figure 1. Examples of training areas for the classification of water, urban, residential, agricultural, and forest land cover. Water training areas (A) were selected from rivers, streams, lakes, and manmade water features. Urban areas (B) were selected to represent larger manmade features such as parking lots, large building complexes, and industrial areas, while residential/transitional areas (C) were defined based on a distinct road pattern with intermixed trees, grassy areas, and buildings (Walsh 2009). Crop and pasture cover were selected as training areas for the agricultural category (D); high-resolution photographs and a high-resolution panchromatic image were used to facilitate the identification of such areas. Forest training areas (E) were selected to represent a variety of slopes and aspects from multiple forest types across the study area.

Online Resource 1 Figure 2. A) 2010 population density near plots along the urban-to-wildland gradient. B) Change in population density between 2000 and 2010, in persons per km2. In this and subsequent box-and-whisker plots, whiskers indicate the range of the data, box limits indicate the boundaries of the upper and lower quartiles, and the bold middle line represents the median.

Online Resource 1 Figure 3. A) Mean distance to the nearest road along the urban-to-wildland gradient. B) Mean distance to the nearest primary or secondary highway. C) Mean density of roads within a 1-km buffer radius of plots.

Online Resource 1 Figure 4. Average land cover within a 1-km buffer of urban, exurban, and wildland plots in 2011. Error bars represent standard deviation of the areal coverage of each land cover type among plots in each of the three settings.

Online Resource 1 Figure 5. Land cover transitions along the urban-to-wildland gradient between 1988 and 2011, as a percentage of the total area within a 1-km buffer radius of plots. See text for a description of the three measures of transition.

Online Resource 1 Figure 6. A) Area-weighted forest patch mean size within a 1-km buffer of plots. B) Variation in forest patch size within the buffer zone. C) Mean forest edge-to-forest area ratio along the gradient.

Online Resource 1 Figure 7. Total annual wet deposition of SO42-, total nitrogen (TN), NH4+, and NO3- along the urban-to-wildland gradient. Because the spatial resolution of the deposition data is relatively low, values within each bar are identical, so error bars are not available.

Online Resource 1 Figure 8. Classification tree of Urban, Exurban, and Wildland sites based on the explanatory environmental variables described in the text. Logical statements describe the left side of each split. PD2010 refers to population density in 2010. See text for additional explanation.