Appendix 1: Methods for Generalized Additive Modeling Approach

We developed generalized additive models (GAMs) (Knapp, 2005; Miro and Ventura, 2013) to explore data relationships and determine the relative importance of factors affecting endemic shrimp occurrence. GAMs do not require constant variance or normally distributed errors and are therefore useful when dependent variables are binary (i.e., presence/absence data). GAMs relax the assumption that the relationship between the dependent variable (on the logit scale) and the explanatory variables is linear. Nonparametric loess smoothing functions are used in GAMs to describe the relationships between the dependent variable and the continuous predictor variables (Hastie and Tibshirani, 1991).

The dependent variables in the GAMs were the presence or absence of H. rubra or M. lohena. The predictor variables examined for potential use in exploratory GAMs represented water properties, pool characteristics, land use, and co-occurring endemic and introduced species (Table A1). Chlorophyll a (n=201), turbidity (n=230), salinity (n=323), and temperature (n=314) measurements were not available for all of the pools in the dataset, so they were examined in separate GAMs using a subset of pools. None of the water variables were significant terms for explaining H. rubra or M. lohena occurrence; they were therefore dropped from the full analysis.

Determining the relative importance of factors controlling the distribution of a plant or animal can be confounded by spatial autocorrelation because sites that are closer to each other may be more similar than those at greater distances (Koenig, 1999; Legendre, 1993). Spatial autocorrelation is problematic for statistical tests that require independent observations.
Incorporating a spatial autocorrelation term into regression models is recommended; otherwise, environmental variables may appear to influence distributions when their effects are actually not statistically significant (Legendre, 1993). In the GAMs, spatial autocorrelation was accounted for with a loess-smoothed term that combined the latitude and longitude of each pool (Knapp et al., 2003; Dormann et al., 2007).

Collinearity between predictor variables in multiple regressions may confound their independent effects (Quinn and Keough, 2002). Prior to regression analyses, Pearson correlation matrices were calculated for all pairwise combinations of independent variables with full datasets. Correlation coefficients (r) ranged between -0.32 and 0.71; the variables were therefore assumed to be independent and were all included in the dataset (Knapp et al., 2003).

The value $p_i$ is the probability of finding the shrimp at location $i$, and is defined as

$$p_i = \frac{e^{\theta_i}}{1 + e^{\theta_i}}$$

where the linear predictor (i.e., the logit line) $\theta_i$ is a function of the independent variables. For both shrimp species, the relationship we used for $\theta_i$ was:

$$\theta_i = \beta_0 + f_1(\mathrm{SILT}) + f_2(\mathrm{AREA}) + f_3(\mathrm{CANOPY}) + f_4(\mathrm{PERIMETER\ VEG}) + f_5(\mathrm{UTME}, \mathrm{UTMN}) + \mathrm{TILAPIA} + \mathrm{POECILIID} + \mathrm{KSAND} + \mathrm{MLAR} + \mathrm{LU\_PROXIMAL} + \mathrm{LU\_UPSLOPE}$$

where $\beta_0$ is the intercept (constant) and $f_1(\cdot), \ldots, f_5(\cdot)$ are nonparametric loess smoothing functions that characterize the effect of each continuous independent variable on the probability of response. Spatial autocorrelation is incorporated as the term $f_5(\mathrm{UTME}, \mathrm{UTMN})$, which is a smoothed surface of UTM easting and northing (Augustin et al., 1998). The terms TILAPIA, POECILIID, KSAND, MLAR, LU_PROXIMAL, and LU_UPSLOPE are binary categorical variables and do not incorporate loess smoothing functions. For the models used to explain M. lohena distribution, an additional categorical variable, HRUBRA, was added to the model.
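As a minimal sketch of how the logit relationship defined above converts a linear predictor into an occurrence probability, the following Python example evaluates θ_i and p_i for a single pool. Every number in it is hypothetical and chosen only for illustration; none comes from the fitted models. The function names (inverse_logit, linear_predictor) and the dictionary layout are likewise assumptions of this sketch. It also assumes that each binary term enters the predictor as a coefficient multiplied by a 0/1 indicator, which the equation above leaves implicit.

```python
import math


def inverse_logit(theta):
    """Convert a linear predictor on the logit scale to a probability."""
    # Branch on the sign so math.exp never receives a large positive argument.
    if theta >= 0:
        return 1.0 / (1.0 + math.exp(-theta))
    exp_theta = math.exp(theta)
    return exp_theta / (1.0 + exp_theta)


def linear_predictor(intercept, smooth_values, binary_coefs, binary_flags):
    """Sum the intercept, the smooth-term values, and the binary-term effects."""
    theta = intercept + sum(smooth_values.values())
    for name, coef in binary_coefs.items():
        # A flag that is absent from binary_flags is treated as 0 (not present).
        theta += coef * binary_flags.get(name, 0)
    return theta


# Hypothetical values of the smooth terms f1..f5, already evaluated for one pool.
smooth_values = {
    "SILT": -0.3,
    "AREA": 0.2,
    "CANOPY": -0.1,
    "PERIMETER_VEG": 0.1,
    "UTM": 0.4,  # joint smooth of easting and northing
}

# Hypothetical coefficients for the binary terms (assumed, not estimated).
binary_coefs = {
    "TILAPIA": -1.5,
    "POECILIID": -0.8,
    "KSAND": 0.3,
    "MLAR": -0.6,
    "LU_PROXIMAL": -0.2,
    "LU_UPSLOPE": -0.1,
}

intercept = 0.5  # hypothetical beta_0

# Same pool evaluated without and with tilapia present.
theta_without = linear_predictor(intercept, smooth_values, binary_coefs, {})
theta_with = linear_predictor(intercept, smooth_values, binary_coefs, {"TILAPIA": 1})

p_without = inverse_logit(theta_without)
p_with = inverse_logit(theta_with)

print(f"theta without tilapia: {theta_without:.3f}, p = {p_without:.3f}")
print(f"theta with tilapia:    {theta_with:.3f}, p = {p_with:.3f}")
```

Because the inverse logit is monotonic, any term that lowers θ_i also lowers the predicted probability of occurrence, and the result always stays strictly between 0 and 1.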
The best combination of independent variables was determined by dropping each term from the model in the presence of all other variables. Analysis of deviance and likelihood ratio tests were used to test the significance of each independent variable's effect on the probability of occurrence of each shrimp species (Knapp, 2005). Independent variables with significant (P ≤ 0.01) effects on shrimp occurrence were used to develop a simplified model that could be used to predict shrimp occurrence within anchialine pools.

References

Augustin, N.H., M.A. Mugglestone & S.T. Buckland, 1998. The role of simulation in modelling spatially correlated data. Environmetrics 9: 175-196.
Dormann, C.F., J.M. McPherson, M.B. Araújo, R. Bivand, J. Bolliger, G. Carl, R.G. Davies, A. Hirzel, W. Jetz, W.D. Kissling, I. Kühn, R. Ohlemüller, P.R. Peres-Neto, B. Reineking, B. Schröder, F.M. Schurr & R. Wilson, 2007. Methods to account for spatial autocorrelation in the analysis of species distributional data: a review. Ecography 30: 609-628.
Hastie, T. & R. Tibshirani, 1991. Generalized additive models. Chapman and Hall, New York, New York, USA.
Knapp, R.A., 2005. Effects of nonnative fish and habitat characteristics on lentic herpetofauna in Yosemite National Park, USA. Biological Conservation 121: 265-279.
Knapp, R.A., K.R. Matthews, H.K. Preisler & R. Jellison, 2003. Developing probabilistic models to predict amphibian site occupancy in a patchy landscape. Ecological Applications 13: 1069-1082.
Koenig, W.D., 1999. Spatial autocorrelation of ecological phenomena. Trends in Ecology and Evolution 14: 22-26.
Legendre, P., 1993. Spatial autocorrelation: trouble or new paradigm? Ecology 74: 1659-1673.
Miro, A. & M. Ventura, 2013. Historical use, fishing management and lake characteristics explain the presence of non-native trout in Pyrenean lakes: Implications for conservation. Biological Conservation 167: 17-24.
Quinn, G. & M. Keough, 2009. Experimental Design and Data Analysis for Biologists. New York: Cambridge University Press.

Table A1: Explanatory factors in generalized additive models

| Category | Variable | Description |
|---|---|---|
| Physical | AREA | Approximate surface area (m²) |
| Physical | SILT | % silt cover on substrate |
| Fauna | TILAPIA (binary) | Presence/absence of tilapia |
| Fauna | POECILIID (binary) | Presence/absence of poeciliids |
| Fauna | MLAR (binary) | Presence/absence of the prawn Macrobrachium lar |
| Fauna | KSAND (binary) | K. sandvicensis |
| Vegetation | CANOPY | % canopy overhang |
| Vegetation | PERIMETER VEG | % of pool periphery surrounded by vegetation (within 0.5 m) |
| Land use | LU_UPSLOPE (binary) | Yes if urban, resort or residential development within 1 km upslope and inland |
| Land use | LU_PROXIMAL (binary) | Yes if residential or resort development surrounds a pool |
| Spatial autocorrelation | UTME, UTMN | X and Y coordinates from GIS |
| Water quality (examined separately from other variables) | Temperature | Celsius |
| Water quality (examined separately from other variables) | Salinity | ppt = parts per thousand |
| Water quality (examined separately from other variables) | Turbidity | NTU = Nephelometric Turbidity Units |
| Water quality (examined separately from other variables) | Chlorophyll a | μg/L |

Table A2: Summary statistics for all anchialine pool survey results included in the study, including mean, standard deviation, median, minimum value and maximum value. Vegetation and substrate variables represent percent cover data.

| Pool Characteristic | Pools sampled | Mean | Stdev | Median | Min | Max |
|---|---|---|---|---|---|---|
| Turbidity (NTU) | 230 | 0.39 | 2.5 | 0 | 0.0 | 32.3 |
| Chlorophyll a (μg/L) | 201 | 1.12 | 5.5 | 0.1 | 0.0 | 66.3 |
| Temp (Celsius) | 314 | 23.3 | 2.4 | 23.1 | 19.3 | 29.6 |
| Salinity (ppt) | 323 | 10.9 | 5.8 | 11.7 | 1.3 | 26.6 |
| Area (m²) | 398 | 38.8 | 104.8 | 8.0 | 0.5 | 951.5 |
| Distance to Shore (m) | 398 | 107.4 | 74.62 | 91.0 | 11 | 600 |
| Maximum Depth (m) | 398 | 0.38 | 0.7 | 0.25 | 0 | 10 |
| Tree Canopy (%) | 398 | 12.1 | 25 | 0 | 0 | 100 |
| Emergent Vegetation (%) | 398 | 8.1 | 23.2 | 0 | 0 | 100 |
| Perimeter Vegetation (%) | 398 | 28.7 | 38 | 5 | 0 | 100 |
| Rocky Substrate (%) | 398 | 71 | 40.8 | 100 | 0 | 100 |
| Silt Substrate (%) | 398 | 22.6 | 38.4 | 0 | 0 | 100 |
| Sand Substrate (%) | 398 | 5.1 | 17.7 | 0 | 0 | 100 |

Figure A1: Salinity variation among anchialine pools and aquifers along the western and southern coastline of the Island of Hawaii.
Each histogram shows the frequency of pool salinities within a geographic cluster.

Figure A2: Maps showing distribution and percentage of pools with: (a) vegetation, (b,c) endemic anchialine shrimp species, and (d,e,f) introduced species known to prey on endemic shrimp.