Supporting information for

Parallel adaptive evolution of Atlantic cod on both sides of the Atlantic Ocean in response to temperature

Ian R. Bradbury*, Sophie Hubert, Brent Higgins, Sharen Bowman, Ian Paterson, Paul V.R. Snelgrove, Corey Morris, Robert S. Gregory, David C. Hardie, Jeffrey A. Hutchings, Tudor Borza, Daniel Ruzzante, Chris Taggart, Paul Bentzen

*To whom correspondence should be addressed. E-mail: ibradbur@dal.ca

The PDF includes:
Methods
Tables S1-S5
Figures S1-S8

Methods/Results

Sample Collection and Location Characteristics

Individuals were sampled at 14 locations throughout the North Atlantic (Fig. 1, Fig. S1, Table S1) from 1996 to 2007. Sample sizes ranged from 15 to 26. Samples were collected as part of scientific surveys or commercial harvest and primarily targeted fish in spawning condition, with the exception of Ogac Lake and Gilbert Bay, where sampling was restricted to summer months. Specific details regarding some samples and locations are published elsewhere (e.g., Taggart and Cook 1996; Bradbury et al. 2009; Hubert et al. 2009, 2010). Local stocks may be mixed within our samples. The large geographic scale examined here and the nature of the trends observed should minimize the impact of small-scale mixtures, although it cannot be discounted, and local structure in areas such as Iceland may be important. Samples from Iceland, Norway, and Ireland were also genotyped for the A and B PanI alleles (Pogson & Fevolden 2003) using the DraI restriction site, as well as for the hemoglobin β1 alleles (Borza et al. 2009), to refine our assignment of fish to discrete geographic areas (Table S2). These genotypes were used in conjunction with maps of alleles (e.g., Arnason et al. 2009; Andersen et al. 2008), ocean temperature, and oxygen isotope temperature estimates (e.g., Weidman and Millner 2000) to ensure our chosen temperatures were appropriate.
Moreover, the trends observed at the outlier SNPs seem, for the most part, robust to the influence of small-scale structure or isolation. This is especially clear for the Baltic or Ogac Lake samples, which are quite isolated from neighboring populations yet display very consistent trends in the temperature-associated outlier loci. That said, small-scale variation in structure associated with ocean temperature may be contributing to the observed trends; on the scale examined (i.e., the entire North Atlantic) it does not seem to significantly affect our associations, but it does warrant future evaluation.

EST libraries, SNP identification, annotation and linkage

In total 884 individuals were used to develop the initial cDNA libraries (Bowman et al. 2010). SNPs with less than 100 bp of flanking sequence or within 60 bp of another selected SNP were removed from consideration (Hubert et al. 2010). Of the 3072 putative SNPs identified, 2284 (~74%) were selected following screening, of which 1641 (53% of the 3072) were informative for the populations used in this study. For identified SNPs, contig consensus sequences were processed with BLASTX against the NCBI non-redundant dataset (nr), using an E-value cutoff of 1e-05 to identify significant hits. Overall annotation success was low (see Hubert et al. 2010), likely because the SNPs were designed almost entirely using sequences from the 3' end and as such mostly fall within the 3'UTR (i.e., non-coding sequence). This approach was chosen because a greater number of SNPs and less splicing are likely in the 3'UTR; the obvious drawback is a lower number of annotated genes (see Hubert et al. 2010 for further details). SNPs have been deposited at GenBank dbSNP under accession numbers ss131570222 - ss131571915.

Linkage was assessed using JoinMap®4 (Van Ooijen 2006). The overall approach followed for map generation is described in Hubert et al. 2010.
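The SNP screening rule in this section (at least 100 bp of flanking sequence, and no other selected SNP within 60 bp) can be illustrated with a minimal sketch. The input layout, the function name `select_snps`, the requirement that both flanks reach 100 bp, the greedy left-to-right pass, and the example contigs are all assumptions made for illustration; the actual selection procedure is described in Hubert et al. 2010.

```python
from collections import defaultdict

MIN_FLANK_BP = 100    # from the text: at least 100 bp of flanking sequence
MIN_SPACING_BP = 60   # from the text: not within 60 bp of another selected SNP


def select_snps(candidates):
    """Filter candidate SNPs by flank length and spacing.

    candidates: iterable of (contig_id, position, contig_length),
    with 1-based positions (hypothetical input layout).
    Returns a list of (contig_id, position) for retained SNPs.
    """
    by_contig = defaultdict(list)
    for contig_id, pos, length in candidates:
        left_flank = pos - 1
        right_flank = length - pos
        # Assumption: both flanks must be at least MIN_FLANK_BP long.
        if min(left_flank, right_flank) >= MIN_FLANK_BP:
            by_contig[contig_id].append(pos)

    selected = []
    for contig_id in sorted(by_contig):
        last_kept = None
        for pos in sorted(by_contig[contig_id]):
            # Assumption: greedy pass; "within 60 bp" is read as a gap <= 60.
            if last_kept is None or pos - last_kept > MIN_SPACING_BP:
                selected.append((contig_id, pos))
                last_kept = pos
    return selected


# Made-up candidates, for illustration only.
example = [
    ("contig1", 50, 600),    # left flank is 49 bp: dropped
    ("contig1", 150, 600),   # kept
    ("contig1", 190, 600),   # 40 bp from the SNP at 150: dropped
    ("contig1", 300, 600),   # kept
    ("contig2", 450, 500),   # right flank is 50 bp: dropped
]
print(select_snps(example))
```

A greedy pass keeps the leftmost SNP in any crowded region; other tie-breaking rules (for example, keeping the SNP with the best sequence quality) would be equally compatible with the rule as stated.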
Briefly, each cross was examined separately, and loci that showed abnormal segregation, as determined using a chi-square goodness-of-fit test, were removed (P<0.005). Groups of linked markers were identified using a LOD cut-off value of 5.0 or greater and ordered within linkage groups using Haldane’s mapping function. Family-specific maps were compared and a 1:1 correspondence between linkage groups confirmed. The corresponding family groups were combined using the JoinMap (Van Ooijen 2006) merge function to generate a consensus map. However, in contrast to the initial study (Hubert et al. 2010), a third family was incorporated into the previously published map, and JoinMap®4 was allowed to force additional markers with a lower goodness-of-fit into the map to maximize the information regarding linked markers for the purposes of this study. Both PanI and haemoglobin have been shown to be associated with temperature in Atlantic cod (e.g., Case et al. 2005; Andersen et al. 2009); however, this study revealed clinal temperature-associated SNPs associated only with haemoglobin β1. As a result, only data related to the mapping of haemoglobin β1 are shown here.

Detecting loci under selection – analysis and simulations

Simulated datasets were generated using EASYPOP (Balloux 2001) and analyzed using BAYESCAN to examine the false positive error rate. Simulations were based on an island model that included 10 demes, with 10000 individuals per deme, a migration rate of 0.01, and 1000 bi-allelic loci. Mutation rate and simulation run length were selected (mutation rate = 0.0001, 1000 generations) to produce realistic levels of diversity and divergence based on the observed and previously published (e.g., Moen et al. 2008; Nielsen et al. 2009) SNP data (FST=0.05, Ho=0.29). Ten independent simulations were completed and the number of false positives was estimated with BAYESCAN. In addition to BAYESCAN, we used the hierarchical test for selection as implemented in ARLEQUIN (v.3.5.1.2, Excoffier and Lischer
2010). Outlier loci that displayed parallel clines on either side of the Atlantic were examined for environmental and life history correlates of allele frequency using multiple regression and simple Mantel tests. Geographic distance was estimated as the shortest distance between samples when avoiding depths > 600 m. Environmental and ecological data were collected from published sources (Myers et al. 2001; Robichaud and Rose 2004, and references therein) and included average annual bottom temperature, average annual bottom salinity, latitude, longitude, stock biomass, area of occupancy, age at maturity, and spawning month. Because associations with environmental variables may be a result of neutral geographic associations, we accounted for the influence of geography by using both partial Mantel tests and the residuals from the isolation by distance (IBD) relationship. Admittedly, small-scale variation in these environmental and life history variables may be a source of error. Moreover, the environmental conditions experienced by an individual or population often change seasonally, with development, or with movement patterns. Without knowing precisely when selection acts during the life history, it is impossible to identify the period and value of importance. We have therefore taken the conservative approach of using average annual values for environmental factors, as they are most likely to represent regional trends in ocean climate and to include the critical periods of interest. Notwithstanding its inherent uncertainties, this approach has been used repeatedly in studies of Atlantic cod (Myers et al. 2001; Hutchings et al. 2007). Moreover, the average annual temperature values used are in agreement with the temperature exposure of individuals (e.g., Barents Sea) estimated from stable oxygen isotopes (Weidman and Millner 2000), further supporting the values used.
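As an illustration of the simple Mantel test mentioned above, the following is a minimal sketch of a one-tailed permutation test on two distance matrices. It is not the software used in this study; the function names, the number of permutations, the seed, and the toy matrices are all assumptions, and the partial Mantel test is not implemented here.

```python
import math
import random


def upper_triangle(m):
    """Return the above-diagonal entries of a square matrix as a flat list."""
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel(a, b, permutations=999, seed=1):
    """Simple Mantel test: correlation of two distance matrices, with a
    one-tailed P-value from joint row/column permutations of matrix a."""
    n = len(a)
    r_obs = pearson(upper_triangle(a), upper_triangle(b))
    rng = random.Random(seed)
    order = list(range(n))
    hits = 0
    for _ in range(permutations):
        rng.shuffle(order)
        permuted = [[a[order[i]][order[j]] for j in range(n)] for i in range(n)]
        if pearson(upper_triangle(permuted), upper_triangle(b)) >= r_obs:
            hits += 1
    # The observed arrangement counts as one of the permutations.
    return r_obs, (hits + 1) / (permutations + 1)


# Toy data (made up): five sites with bottom temperatures in degrees C and a
# genetic distance constructed to be exactly proportional to the difference.
temps = [1.0, 3.0, 6.0, 8.0, 11.0]
temp_diff = [[abs(s - t) for t in temps] for s in temps]
fst = [[0.05 * d for d in row] for row in temp_diff]

r, p = mantel(fst, temp_diff)
print(round(r, 3), p)
```

Rows and columns are permuted together because the entries of a distance matrix are not independent observations; shuffling individual cells would understate the P-value.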
Nonetheless, this remains a potential source of error; refining the environmental variables used will require further evaluation of exactly what temperatures sampled fish experience and when selection acts during the life history.

Estimating selection and clinal width

Sigmoidal curves of cline data were used to calculate clinal width using the inverse of the maximum slope and the gradient model of environmental change (Endler 1977), in which dispersal distance is much smaller than the cline width. This model assumes the equation, in which w is the width of the cline, σ is the standard deviation of the adult-offspring dispersal distance, and b represents the selection gradient (Slatkin 1973; Endler 1977). As allele frequencies were not fixed at 0 and 1 on either side of the cline, clinal width equalled Δp/slope, where Δp is the change in allele frequencies from one side of the cline to the other. Because the dispersal distance is currently unknown in Atlantic cod and likely varies significantly across the study area, we inferred the strength of selection using a range of reasonable dispersal estimates taken from early life history and adult mark–recapture studies (Bradbury et al. 2000; Robichaud and Rose 2004). We also assumed a normal distribution of dispersal distances, estimated as the average per generation dispersal using where d is the average per generation dispersal.

To evaluate the influence of selection on resultant gene flow, we compared the number of effective migrants (NeM), estimated using the island model (Wright 1931), between the outlier loci and a random subset of the non-outlier loci; the ratio of the two estimates was then compared across a range of spatial scales. We repeated random sampling of the non-outlier loci 10 times to ensure a representative sample.
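For readers unfamiliar with the island-model estimate, the sketch below uses the standard approximation from Wright's island model, FST ≈ 1/(4·NeM + 1), rearranged to NeM = (1/FST − 1)/4, and forms the outlier-to-neutral ratio. The function names and the FST values are hypothetical; the value of 0.05 simply echoes the simulation target quoted earlier, and 0.40 is made up.

```python
def nem_island(fst):
    """Effective number of migrants per generation under Wright's island
    model, from FST = 1 / (4 * NeM + 1)."""
    if not 0.0 < fst < 1.0:
        raise ValueError("FST must lie strictly between 0 and 1")
    return (1.0 / fst - 1.0) / 4.0


def nem_ratio(fst_outlier, fst_neutral):
    """Ratio of NeM at outlier loci to NeM at neutral loci for one
    pairwise comparison (values below 1 indicate reduced gene flow)."""
    return nem_island(fst_outlier) / nem_island(fst_neutral)


# Hypothetical pairwise FST values, for illustration only.
print(nem_island(0.05))
print(nem_ratio(0.40, 0.05))
```

Because the same formula is applied to both marker classes, any bias from violating island-model assumptions tends to affect numerator and denominator alike, which is why the ratio is more informative here than either absolute estimate.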
Although the limitations of the island model are well documented, in our context it is the relative comparison between the marker types, not the absolute estimates, that is of interest. We therefore believe that this approach is appropriate.

Table S1. Details on sample locations and SNP summary statistics. See Methods for further information.

Location                    N    Ho      He      % Polymorphic
Georges Bank (A)            24   0.361   0.360   99.22%
Georges Bank (B)            20   0.363   0.357   98.93%
Cape Sable, NS              23   0.362   0.359   99.29%
St. Mary's Bay, NL          25   0.358   0.350   97.30%
Holyrood Pond, NL           20   0.357   0.351   96.80%
Bay Bulls, NL               23   0.350   0.353   97.22%
Smith Sound A, NL           20   0.358   0.351   96.37%
Smith Sound B, NL           23   0.354   0.354   97.79%
Gilbert Bay, NL             21   0.309   0.304   90.11%
Ogac Lake, Baffin Island    18   0.253   0.253   76.23%
Barents Sea, Norway         26   0.239   0.239   78.22%
Akureyri, Iceland           26   0.258   0.258   85.05%
Baltic Sea                  16   0.222   0.220   69.40%
Galway Bay, Ireland         15   0.242   0.236   74.73%

Table S2. Allele frequencies of PanI and β1 Hb alleles in samples from the eastern Atlantic. Samples were genotyped using the KASPar system (KBiosciences). The two main allele variants described at the PanI locus, PanIA and PanIB, were determined by assessing the polymorphism present at a DraI site (a G/A substitution in intron 4; allele A: DraI site absent, TTTTGAAA; allele B: DraI site present, TTTTAAAA) (Pogson et al. 2001). The two Hb β1 allele variants (Andersen et al. 2009; Borza et al. 2009), corresponding to HbI-1 and HbI-2 (Sick 1965), were determined by assessing the SNP G454A Lys/Ala (Borza et al. 2009). Details regarding the KASPar assay for PanI and Hb β1 are described elsewhere (Borza et al., submitted). Sample sizes are given in Table S1.

Location Barents Sea, Norway Akureyri, Iceland Baltic Sea Galway Bay, Ireland PanIA β1 Hb (HbI-2) 0.10 1.0 1.0 1.0 0.88 0.96 0.93 0.32

Table S3.
Backward regression results from the comparison of population-average allele frequency at clinal loci with environmental and stock life history parameters. Only variables included in the best model are shown. Excluded were age at maturity, longitude and latitude.

Effect         Partial coefficient   P-value   R2
Temperature    -0.952                <0.001    0.96
Biomass         0.274                 0.002
Salinity       -0.137                 0.064

Temperature    -0.947                <0.001    0.90
Biomass         0.144                 0.622    0.02
Salinity       -0.165                 0.574    0.03

Table S4. Mantel tests (simple and partial) of the correlation between genetic differentiation at clinal loci, geographic distance and average bottom temperature. Partial Mantel tests estimate the correlation between two variables while holding a third variable constant.

Test             Matrices compared                                           r       P
Simple Mantel    FST x geographic distance                                   0.263   0.071
                 Temperature with geographic distance                        0.159   0.117
                 FST x temperature                                           0.840   <0.001
Partial Mantel   FST x temperature (holding geographic distance constant)    0.832   <0.001
                 FST x distance (holding temperature distance constant)      0.245   0.148

Table S5. Annotation and linkage map information for outlier-associated linkage groups containing SNPs that display parallel clines on either side of the Atlantic. Haemoglobin (Hb) SNPs are included for map position comparison. See Methods for details regarding SNPs and mapping.
SNP    Linkage group    Location (cM)    Annotation
cgpGmo-S248a    CGPIA12    15.582
cgpGmo-S866    CGPIA12    16.674
cgpGmo-S816a    CGPIA12    18.327
gb|ACI66551.1| Growth arrest and DNA-damage-inducible protein GADD45 alpha [Salmo salar]
gb|ACO10168.1| GDP-L-fucose synthetase [Osmerus mordax]
cgpGmo-S372a    CGPIA12    18.327    no hits
cgpGmo-S636    CGPIA12    19.814    no hits
cgpGmo-S1046    CGPIA12    20.994    no hits
cgpGmo-S316    CGPIA12    24.168    no hits
cgpGmo-S2101    CGPIA12    34.413    no hits
cgpGmo-S1112, Hb cluster 1    CGPIA2    36.469
cgpGmo-S1113, Hb cluster 1    CGPIA2    36.469
cgpGmo-S1111, Hb cluster 1    CGPIA2    38.88
cgpGmo-S1205    CGPIA2    56.739    no hits
cgpGmo-S1068    CGPIA2    56.835
cgpGmo-S1101a    CGPIA2    56.835
cgpGmo-S532    CGPIA2    56.835
cgpGmo-S174    CGPIA2    57.04
linked to cgpGmo-1456: ref|NP_956397.1| KDEL (Lys-Asp-Glu-Leu) endoplasmic reticulum protein retention receptor 2 [Danio rerio]
linked to cgpGmo-1456: ref|NP_956397.1| KDEL (Lys-Asp-Glu-Leu) endoplasmic reticulum protein retention receptor 2 [Danio rerio]
linked to cgpGmo-1456: ref|NP_956397.1| KDEL (Lys-Asp-Glu-Leu) endoplasmic reticulum protein retention receptor 2 [Danio rerio]
no hits
cgpGmo-S184    CGPIA2    57.04    no hits
cgpGmo-S1751    CGPIA2    57.296    no hits
cgpGmo-S1200    CGPIA7    9.272    no hits
cgpGmo-S2019    CGPIA7    17.391    no hits
Bit Score e-value 191 4E-55 243 7E-77 219 5E-55 219 5E-55 219 5E-55 no hits
cgpGmo-S917    CGPIA7    18.267    no hits
cgpGmo-S1039b    CGPIA7    19.147    no hits
cgpGmo-S1089    CGPIA7    19.147    no hits
cgpGmo-S1183    CGPIA7    19.147    no hits
cgpGmo-S1425    CGPIA7    19.147    no hits
cgpGmo-S152    CGPIA7    19.147    no hits
cgpGmo-S157    CGPIA7    19.147    gb|ACI66405.1| Oligoribonuclease, mitochondrial precursor [Salmo salar]
cgpGmo-S1039a    CGPIA7    19.147    no hits
cgpGmo-S1810    CGPIA7    19.147    no hits
cgpGmo-S183    CGPIA7    19.147    no hits
cgpGmo-S1830    CGPIA7    19.147    no hits
cgpGmo-S2158    CGPIA7    19.147    emb|CAF95009.1| unnamed protein product [Tetraodon nigroviridis]    136    3E-30
cgpGmo-S268    CGPIA7    19.147    ref|XP_685984.3| PREDICTED: im:7151086 [Danio rerio]    60    0.000001
cgpGmo-S419    CGPIA7    19.147    no hits
cgpGmo-S739    CGPIA7    19.147    no hits
cgpGmo-S814a    CGPIA7    19.147    no hits
cgpGmo-S870    CGPIA7    19.147    ref|NP_001134335.1| Kaptin [Salmo salar]    175    2E-50
cgpGmo-S920    CGPIA7    19.147    no hits
cgpGmo-S260a    CGPIA7    20.06    288    9E-76
cgpGmo-S426    CGPIA7    20.392
gb|ACH70869.1| calpain small subunit 1 [Salmo salar]
gb|ACI68381.1| Calpain small subunit 1 [Salmo salar]
no hits
cgpGmo-S1644    CGPIA7    22.341    no hits
120 2E-55

[Map: (a) inset showing Labrador; (b) east coast of Newfoundland showing Bonavista Bay, Trinity Bay, Smith Sound (A) and (B), Conception Bay, Bay Bulls, Placentia Bay, Holyrood Pond, and St. Mary's Bay.]

Figure S1. Location of sampling sites within Newfoundland. Inset shows the east coast of Newfoundland relative to eastern Canada. See Figure 1 for complete sample locations.

[Plot: PC1 (52.9%) against PC2 (17.9%); individuals are labeled by sample code, with groups marked Ogac Lake, Western Atlantic, and Eastern Atlantic.]

Figure S2.
PCA with non-neutral loci removed (see Methods). Key: GB-Gilbert Bay, OL-Ogac Lake, BB-Bay Bulls, 4X-Cape Sable, 5ZA&B-Georges Bank, SSA&B-Smith Sound, ICE-Iceland, NOR-Norway, IRLD-Galway Bay. Values in parentheses indicate variance explained by each coordinate.

[Plot: FST (y-axis) against He (x-axis), with lines labeled 0.01, 0.05, 0.10, 0.50, 0.90, and 0.95.]

Figure S3. Hierarchical test for selection conducted in Arlequin ver. 3.5 (Excoffier and Lischer 2010) on 1641 Atlantic cod single nucleotide polymorphisms genotyped for 14 locations throughout the North Atlantic. Red points represent outliers (n=70) identified using BAYESCAN; see Methods for details.

Figure S4. Heatmaps of allele frequency at loci identified as under selection: (a) loci showing parallel clines on either side of the Atlantic, (b) loci that are fixed in the eastern Atlantic and show some clinal structure in the west, and (c) the remaining loci.

Figure S5. Linkage of clinal loci calculated using TASSEL (Bradbury et al. 2007). Values above the diagonal represent D’ and values below the diagonal represent p-values from a Fisher’s Exact test. Loci are ordered by linkage group on the y-axis; see Fig. 4.

[Plots: (a) clinal loci, FST against temperature difference, r2 = 0.72, p<0.001; (b) IBD residuals of clinal loci against temperature difference, r2 = 0.73, p<0.001; (c) comparison of rates of gene flow, Nem non-neutral / Nem neutral (log scale) against temperature difference, r2 = 0.58, p<0.001. x-axis: Temperature (difference between pairwise comparisons).]

Figure S6. (a) Association between FST and the difference in average annual bottom temperature between all 14 locations sampled throughout the North Atlantic and (b) relationship between the residuals of the isolation by distance relationship (i.e., FST and geographic distance) and the difference in annual average bottom temperature.
(c) Comparison of the estimated number of effective migrants for neutral and non-neutral loci, calculated using Wright's island model, with the difference in average annual bottom temperature.

[Plots: (a) Western Atlantic, allele frequency against geographic distance (km), w = 802 km (SD = 90.1); (b) dispersal distance and selection, selection coefficient (s, log scale from 1e-6 to 1e-1). x-axis: Geographic distance (km).]

Figure S7. (a) Clines in allele frequency at temperature-associated outlier loci from the western Atlantic; clinal width was estimated using sigmoidal non-linear curve fitting. (b) Estimates of selection associated with dispersal distances from 1 to 1000 km, estimated using the gradient model of clinal structure (Endler 1977) and assuming a normal distribution of dispersal distances.

[Plots: (a) PC One (74%) against PC Two (11%), with points labeled by sampling location: Georges Bank (a), Georges Bank (b), Cape Sable, NS, St. Mary's Bay, NL, Holyrood Pond, NL, Bay Bulls, NL, Smith Sound, NL (a), Smith Sound, NL (b), Gilbert Bay, Lab, Ogac Lake, Barents Sea, Norway, Akureyri, Iceland, Baltic Sea, Galway Bay, Ireland; (b) Frequency (%) against PC One for LG 12, LG 2, and LG 7.]

Figure S8. (a) Principal Coordinate Analysis (PCA) of 40 temperature-associated loci only. Key: Lab=Labrador; NL=Newfoundland and Labrador; NS=Nova Scotia. (b) Frequency distributions of PC One values from PCA conducted on each of the temperature-associated linkage groups separately. Values in parentheses indicate variance explained by each coordinate.