Appendix S1: Bayesian geostatistical posterior prediction of the depth distribution of aggregations of small pelagic fish

Introduction

The objective of the analysis presented here was to simulate the depth distribution of aggregations of Peruvian anchoveta (Engraulis ringens) in order to investigate foraging site selection by two seabird species, the Peruvian Booby (Sula variegata) and the Guanay Cormorant (Phalacrocorax bougainvillii). Bayesian predictions of the relative abundance of anchoveta across two dimensions were previously developed by Boyd et al. (2015) from the same survey data. The account presented here parallels that of Boyd et al. (2015).

Analysis of seabird foraging site selection was based on a regular hexagonal grid encompassing tracking data for Peruvian Boobies and Guanay Cormorants breeding at Grupo Pescadores in December 2008 (Fig. 1a in the main text). Data on the depth distribution of anchoveta were derived from systematic acoustic survey transects. Bayesian posterior prediction was used to predict the upper depth limits of aggregations at a set of prediction locations in each hexagonal grid cell. Posterior predictions of the mean and the variance of the upper depth limits of aggregations were then computed for each grid cell, and used to estimate the probability that prey occurred in the upper water column.

Materials and methods

DATA

The Instituto del Mar del Perú (IMARPE) conducted an acoustic survey on the RV “Olaya” around Grupo Pescadores (~11.77°S, 77.27°W), off the coast of Peru, during 2-5 December 2008 (Fig. S.1). The survey design was systematic, based on parallel, equally spaced, onshore-offshore transects approximately 10 km apart (Fig. 1b). For the purposes of this analysis, the study region was restricted to the on-shelf area covered by the survey.
Acoustic backscattering data were collected using a calibrated Simrad scientific echosounder (EK60) operating at 120 kHz and processed by IMARPE using Echoview acoustic postprocessing software (Myriax Software, Hobart, Tasmania, Australia). Acoustic backscatter was identified to species based on known backscattering characteristics, ground-truthed using biological samples taken from mid-water trawls during the survey (Castillo et al. 2009; Simmonds et al. 2009). Only backscatter attributed to anchoveta was used in this analysis.

Relatively homogeneous regions of acoustic backscatter were identified using the school detection algorithm in Echoview. For each identified aggregation, the height and mean depth were estimated by the school detection algorithm and used to calculate the depth of the upper limit of the aggregation. All depths refer to depths below the echosounder (i.e. depths are measured from 3.4 m below the sea surface). No adjustment was made for possible vessel avoidance by anchoveta. Anchoveta schools in central southern Chile exhibited limited vertical diving behaviour with a range of around 5 m (Gerlotto et al. 2004).

The geographic locations of anchoveta aggregations were transformed into the Universal Transverse Mercator coordinate system using the package rgdal in R (Keitt et al. 2011; R Development Core Team 2012). Several aggregations occurred at the same geographic location, but at different depths (20 sets of duplicate coordinates from a total of 1,562 identified aggregations). Multiple observations at the same geographic location are not supported in geostatistical analysis. This issue was resolved by jittering the duplicate coordinates using the package geoR in R (Ribeiro & Diggle 2001).
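The two data-preparation steps described above can be sketched as follows. This is a minimal illustration in Python, not the Echoview or geoR code used in the analysis. It assumes that the mean depth reported for an aggregation is its vertical midpoint, so that the upper limit is the mean depth minus half the height; the source does not state the formula. The function names, the uniform jitter, and the `max_offset` value are likewise hypothetical and do not reproduce the geoR jittering routine.

```python
import numpy as np


def upper_depth_limit(mean_depth, height):
    """Upper depth limit of an aggregation (m below the echosounder).

    Assumption: the mean depth is the vertical midpoint of the aggregation.
    """
    mean_depth = np.asarray(mean_depth, dtype=float)
    height = np.asarray(height, dtype=float)
    return mean_depth - height / 2.0


def jitter_duplicates(coords, max_offset=1.0, seed=0):
    """Add a small random offset to every location that occurs more than once.

    coords is an (n, 2) array of projected coordinates (e.g. UTM, in metres);
    max_offset is the largest offset applied along each axis, in the same units.
    Locations that occur only once are returned unchanged.
    """
    coords = np.asarray(coords, dtype=float).copy()
    rng = np.random.default_rng(seed)
    # Map each row to its unique location and count occurrences of that location.
    _, inverse, counts = np.unique(
        coords, axis=0, return_inverse=True, return_counts=True
    )
    duplicated = counts[inverse.ravel()] > 1
    offsets = rng.uniform(-max_offset, max_offset, size=(int(duplicated.sum()), 2))
    coords[duplicated] += offsets
    return coords


if __name__ == "__main__":
    # Hypothetical aggregations: three share a location but differ in depth.
    xy = np.array([[0.0, 0.0], [0.0, 0.0], [5.0, 5.0], [0.0, 0.0], [7.0, 1.0]])
    print(jitter_duplicates(xy, max_offset=1.0, seed=1))
    print(upper_depth_limit([10.0, 6.0], [4.0, 2.0]))
```

The offset should be small relative to the spacing of the observations so that the jittered locations remain effectively co-located for the purposes of the spatial model.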
BAYESIAN GEOSTATISTICAL POSTERIOR PREDICTION

The upper depth limits of anchoveta aggregations were assumed to be a realization of a continuous non-zero random variable, and were analyzed using a linear geostatistical model following Box-Cox transformation (Box & Cox 1964). Preliminary analysis indicated that a model based on log transformation of the upper depth limits, a constant mean, and an exponential correlation function was well supported by the data (Boyd 2012):

$Y_i \sim \beta + S_i + \varepsilon_i$    (eqn S1.1)

where $Y_i$ represents the log-transformed upper depth limits of anchoveta aggregations observed at a set of sampling locations $i = 1, \ldots, N$; $\beta$ is the mean parameter; $S_i$ is the spatial signal process; and the $\varepsilon_i$ are independent and identically distributed with zero mean and non-spatial variance $\tau^2$.

The spatial signal process is characterized by the theoretical variogram. In the stationary and isotropic case, this simplifies to:

$V(h) = \sigma^2 [1 - \rho(h)]$    (eqn S1.2)

where $\sigma^2$ is the variance of the spatial signal process (the ‘partial sill’); and $\rho(h)$ is the exponential correlation function, $\rho(h) = \exp(-h/\varphi)$, where $h$ is the absolute Euclidean distance between two locations and $\varphi > 0$ is a scaling parameter. The non-spatial variance may be reparameterized in relative terms as $v^2 = \tau^2 / \sigma^2$.

Bayesian methods were used to sample 100 unique parameter sets from the posterior densities of the parameters $\beta$, $\sigma^2$, $\varphi$, and $v^2$, and to generate posterior predictions of the upper depth limits of aggregations for a set of over 45,000 prediction points (approximately 25 prediction points per hexagonal grid cell). Bayesian inference was conducted by direct simulation, replicated independently, using the krige.bayes function in the geoR package in R. The following vague priors were chosen from the options available in geoR: a flat prior for $\beta$ (i.e. $p(\beta) \propto 1$), a reciprocal prior for $\sigma^2$ (i.e.
$p(\sigma^2) \propto 1/\sigma^2$), a uniform discrete prior {0, 0.1, 0.2, …, 20} for $\varphi$, and a uniform discrete prior {0, 0.01, 0.02, …, 2} for $v^2$. The geoR package uses a global neighborhood for prediction, but this is computationally demanding for a set of over 45,000 prediction points. Posterior predictions of the upper depth limits of aggregations were therefore generated using the predict.gstat function in the gstat package in R (Pebesma 2004), with the prediction neighborhood set equal to $3\varphi$ (i.e. the practical range, the distance at which the correlation falls to approximately 0.05) and the maximum number of observations set to 475 (i.e. the approximate number of prediction locations in 19 hexagonal grid cells).

Results

Samples from the posterior densities of the parameters $\beta$, $\sigma^2$, $\varphi$, and $v^2$ are shown in Figure S.2, together with the respective prior distributions. The Bayesian posterior predictions reproduce the sample statistics fairly well (Fig. S.3). The differences between the distribution of the observed data and the distribution of the posterior predictions reflect deviations from normality in the observed data following log transformation (see Boyd 2012).

The Bayesian geostatistical approach appears to underestimate spatial autocorrelation at short distances when compared to the empirical variogram for the observed data (Fig. S.4). The mean theoretical variogram computed from samples from the posterior distribution of the spatial parameters, and hence the posterior predictions, appear to overestimate the variance in the depth of aggregations between locations that are less than 24 km apart. We tested alternative correlation functions with similar results. The likelihood-based geostatistical methods applied here use all the data to estimate the spatial parameters, rather than focusing on pairs of points that are relatively close together, as is common practice in classical geostatistics.
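As a point of reference for this comparison, the theoretical variogram of eqn S1.2, its mean over a set of posterior samples, and the practical range can be sketched as follows. This is a minimal illustration in Python rather than the R code used in the analysis. The function names and all parameter values are hypothetical, and the non-spatial variance is omitted, as in eqn S1.2.

```python
import numpy as np


def exponential_correlation(h, phi):
    """Exponential correlation function rho(h) = exp(-h / phi), with phi > 0."""
    return np.exp(-np.asarray(h, dtype=float) / phi)


def theoretical_variogram(h, sigma2, phi):
    """Theoretical variogram V(h) = sigma2 * [1 - rho(h)] (eqn S1.2)."""
    return sigma2 * (1.0 - exponential_correlation(h, phi))


def practical_range(phi):
    """Distance 3 * phi, at which the exponential correlation is exp(-3)."""
    return 3.0 * phi


def mean_variogram(h, sigma2_samples, phi_samples):
    """Pointwise mean of the theoretical variograms over parameter samples."""
    curves = [
        theoretical_variogram(h, s2, p)
        for s2, p in zip(sigma2_samples, phi_samples)
    ]
    return np.mean(curves, axis=0)


if __name__ == "__main__":
    # Hypothetical values only; these are not estimates from the survey data.
    distances_km = np.linspace(0.0, 30.0, 7)
    sigma2_samples = [0.8, 1.0, 1.2]
    phi_samples = [4.0, 5.0, 6.0]
    print(mean_variogram(distances_km, sigma2_samples, phi_samples))
    # Correlation remaining at the practical range, exp(-3).
    print(exponential_correlation(practical_range(5.0), 5.0))
```

Because the correlation at a distance of 3φ is exp(-3), observations beyond the practical range contribute little to a prediction, which is the rationale for a local prediction neighborhood of that radius.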
While using all the data appears to lead to a poor fit to the empirical variogram in this case, it is important to recognize that the empirical variogram is only a summary of the data.

Sample predictions for the probability that the upper depth limit of aggregations is less than 10 m below the echosounder are shown in Figure S.5. The Bayesian posterior predictions for the depth distribution of anchoveta aggregations were consistent with the observed data in showing that the depth distribution of anchoveta was relatively shallow throughout the study region in December 2008.

References

Box, G.E.P. & Cox, D.R. (1964) An analysis of transformations. Journal of the Royal Statistical Society, Series B, 26, 211-252.
Boyd, C. (2012) The Predator’s Dilemma: Investigating the responses of seabirds to changes in the abundance and distribution of small pelagic prey. Ph.D. dissertation, University of Washington.
Castillo, R., Peraltilla, S., Aliaga, A., Flores, M., Ballón, M., Calderón, J. & Gutiérrez, M. (2009) Protocolo técnico para la evaluación acústica de las áreas de distribución y abundancia de recursos pelágicos en el mar peruano. Informe Instituto del Mar Perú, 36, 7-28.
Gerlotto, F., Castillo, J., Saavedra, A., Barbieri, M.A., Espejo, M. & Cotel, P. (2004) Three-dimensional structure and avoidance behaviour of anchovy and common sardine schools in central southern Chile. ICES Journal of Marine Science, 61, 1120-1126.
Keitt, T.H., Bivand, R., Pebesma, E. & Rowlingson, B. (2011) rgdal: Bindings for the Geospatial Data Abstraction.
Pebesma, E.J. (2004) Multivariable geostatistics in S: the gstat package. Computers & Geosciences, 30, 683-691.
R Development Core Team (2012) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.
Ribeiro, P.J. & Diggle, P.J. (2001) geoR: a package for geostatistical analysis. R-NEWS, 1, 15-18.
Simmonds, E.J., Gutiérrez, M., Chipollini, A., Gerlotto, F., Woillez, M. & Bertrand, A. (2009) Optimizing the design of acoustic surveys of Peruvian anchoveta. ICES Journal of Marine Science, 66, 1341-1348.

Figure legends

Figure S.1. Proportional representation of acoustic backscatter (m² per nautical mile²) in December 2008. Elementary distance sampling units (EDSUs) are marked by crosses (zero values for anchoveta) and circles (positive values, diameter of the circles proportional to the logarithm of relative anchoveta abundance). The shelf break (200 m isobath) is indicated by the dashed line.

Figure S.2. Prior distributions (dashed lines) and samples from the posterior densities (histograms) for the spatial parameters.

Figure S.3. The distribution of the observed log-transformed upper depth limit of aggregations (histogram), and the mean (solid line) and 2.5th and 97.5th percentiles (dashed lines) of samples from the corresponding posterior densities.

Figure S.4. Quantitative summaries of the spatial pattern of observed data and posterior predictions. Box plots represent empirical variograms computed from posterior predictions. The dashed line represents the empirical variogram for the observed data. The dotted line represents the mean of the theoretical variograms computed from 100 samples from the posterior distributions of the spatial parameters.

Figure S.5. Four Bayesian posterior predictions of the probability that the upper depth limit of aggregations is less than 7.5 m below the echosounder. Land is shown in pale grey.