Dispersal in the Oceans and the Conservation of Marine Communities

Supporting information for
Parallel adaptive evolution of Atlantic cod on both sides of the Atlantic Ocean in response to temperature
Ian R. Bradbury*, Sophie Hubert, Brent Higgins, Sharen Bowman, Ian Paterson, Paul V.R. Snelgrove, Corey Morris, Robert S. Gregory, David C. Hardie, Jeffrey A. Hutchings, Tudor Borza, Daniel Ruzzante, Chris Taggart, Paul Bentzen
*To whom correspondence should be addressed. E-mail: ibradbur@dal.ca
The PDF includes:
Methods
Tables S1-S5
Figures S1-S8
Methods/Results
Sample Collection and Location Characteristics
Individuals were sampled at 14 locations throughout the North Atlantic (Fig. 1, Fig. S1, Table S1) from 1996 to 2007. Sample sizes ranged from 15 to 26. Samples were collected during scientific surveys or commercial harvests and primarily targeted fish in spawning condition, with the exception of Ogac Lake and Gilbert Bay, where sampling was restricted to the summer months. Specific details regarding some samples and locations are published elsewhere (e.g., Taggart and Cook 1996; Bradbury et al. 2009; Hubert et al. 2009, 2010). Although our samples may include mixtures of local stocks, the large geographic scale examined here and the nature of the observed trends should minimize the impact of such small-scale mixing, though it cannot be discounted. Nonetheless, the impact of local structure may be important in areas such as Iceland. Samples from Iceland, Norway, and Ireland were also genotyped for the A and B PanI alleles (Pogson & Fevolden 2003), using the DraI restriction site, and for the haemoglobin β1 alleles (Borza et al. 2009) to refine our assignment of fish to discrete geographic areas (Table S2).
These data were used in conjunction with allele frequency maps (e.g., Arnason et al. 2009; Andersen et al. 2008), ocean temperature data, and oxygen isotope temperature estimates (e.g., Weidman and Millner 2000) to ensure that the temperatures assigned to each sample were appropriate. Moreover, the trends observed at the outlier SNPs appear largely robust to the influence of small-scale structure or isolation. This is especially clear for the Baltic and Ogac Lake samples, which are highly isolated from neighbouring populations yet display very consistent trends at the temperature-associated outlier loci.
That said, small-scale variation in population structure associated with ocean temperature may contribute to the observed trends. At the scale examined here (i.e., the entire North Atlantic), however, such variation does not appear to significantly affect our associations, although it warrants future evaluation.
EST libraries, SNP identification, annotation and linkage
In total, 884 individuals were used to develop the initial cDNA libraries (Bowman et al. 2010). SNPs with fewer than 100 bp of flanking sequence, or within 60 bp of another selected SNP, were removed from consideration (Hubert et al. 2010). Of the 3072 putative SNPs identified, 2284 (~74%) were selected following screening, of which 1641 (53% of the putative SNPs) were informative for the populations used in this study. For the identified SNPs, contig consensus sequences were searched with BLASTX against the NCBI non-redundant protein database (nr), using an e-value cutoff of 1e-05 to identify significant hits. Overall annotation success was low (see Hubert et al. 2010), likely because the SNPs were designed almost entirely from sequences at the 3' end and therefore mostly lie in the 3' UTR (i.e., non-coding sequence). This approach was chosen because a greater number of SNPs and less splicing are likely in the 3' UTR; the obvious drawback is that fewer genes can be annotated (see Hubert et al. 2010 for further details). SNPs have been deposited in GenBank dbSNP under accession numbers ss131570222 to ss131571915.
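The two SNP screening rules and the annotation threshold above can be expressed as simple filters. The sketch below is illustrative only: the `CandidateSnp` record, the greedy left-to-right order in which SNPs are accepted within a contig, and the reading of "within 60 bp" as a distance of 60 bp or less are all assumptions, not the published pipeline (see Hubert et al. 2010 for that).

```python
from dataclasses import dataclass

MIN_FLANK_BP = 100        # SNPs with less flanking sequence than this are dropped
EXCLUSION_WINDOW_BP = 60  # SNPs this close to an already-selected SNP are dropped (assumed <=)
E_VALUE_CUTOFF = 1e-5     # BLASTX significance threshold quoted in the text


@dataclass
class CandidateSnp:
    """Hypothetical record for one putative SNP."""
    name: str
    contig: str
    position: int        # 0-based position of the SNP within its contig
    contig_length: int


def has_enough_flank(snp: CandidateSnp) -> bool:
    """True if at least MIN_FLANK_BP bases lie on each side of the SNP."""
    left = snp.position
    right = snp.contig_length - snp.position - 1
    return left >= MIN_FLANK_BP and right >= MIN_FLANK_BP


def select_snps(candidates: list[CandidateSnp]) -> list[CandidateSnp]:
    """Greedy left-to-right selection within each contig (assumed ordering)."""
    selected: list[CandidateSnp] = []
    last_kept: dict[str, int] = {}  # contig -> position of the last selected SNP
    for snp in sorted(candidates, key=lambda s: (s.contig, s.position)):
        if not has_enough_flank(snp):
            continue
        prev = last_kept.get(snp.contig)
        if prev is not None and snp.position - prev <= EXCLUSION_WINDOW_BP:
            continue
        selected.append(snp)
        last_kept[snp.contig] = snp.position
    return selected


def is_significant_hit(e_value: float) -> bool:
    """Treat a BLASTX hit as significant if its e-value is at or below the cutoff."""
    return e_value <= E_VALUE_CUTOFF


# Toy example with made-up contigs and positions.
snps = [
    CandidateSnp("a", "c1", 150, 400),
    CandidateSnp("b", "c1", 180, 400),  # 30 bp from "a": dropped
    CandidateSnp("c", "c1", 250, 400),  # 100 bp from "a": kept
    CandidateSnp("d", "c1", 50, 400),   # only 50 bp of left flank: dropped
    CandidateSnp("e", "c2", 120, 230),  # 120 bp left, 109 bp right: kept
]
print([s.name for s in select_snps(snps)])  # ['a', 'c', 'e']
```

A greedy pass is the simplest way to honour "another *selected* SNP"; other orderings (e.g., preferring SNPs with higher design scores) would keep a different but equally valid set.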
Linkage was assessed using JoinMap®4 (Van Ooijen 2006). The overall approach to map generation is described in Hubert et al. 2010. Briefly, each cross was examined separately, and loci showing abnormal segregation, as determined by a chi-square goodness-of-fit test (P < 0.005), were removed. Groups of linked markers were identified using a LOD cut-off of 5.0 or greater and ordered within linkage groups using Haldane's mapping function. Family-specific maps were compared and a 1:1 correspondence between linkage groups was confirmed. The corresponding family groups were then combined using the JoinMap (Van Ooijen 2006) merge function to generate a consensus map. In contrast to the initial study (Hubert et al. 2010), however, a third family was incorporated into the previously published map, and JoinMap®4 was allowed to force additional markers with a lower goodness-of-fit into the map, to maximize the information on linked markers for the purposes of this study. Both PanI and haemoglobin have been shown to be associated with temperature in Atlantic cod (e.g., Case et al. 2005; Andersen et al. 2009); however, the temperature-associated clinal SNPs identified in this study were associated only with haemoglobin β1. As a result, only data related to the mapping of haemoglobin β1 are shown here.
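The three quantities used in the mapping steps above (a segregation-distortion test, a LOD score for linkage, and Haldane's map distance) can be illustrated with a short sketch. This is not JoinMap's implementation: the expected Mendelian ratio depends on each marker's segregation type in the cross and is passed in as an assumption, and the LOD shown is a simple two-point score from a known count of recombinant and parental meioses.

```python
import math

from scipy.stats import chisquare

LOD_THRESHOLD = 5.0       # grouping threshold quoted in the text
DISTORTION_ALPHA = 0.005  # segregation-distortion threshold quoted in the text


def segregation_distorted(observed, expected_ratio, alpha=DISTORTION_ALPHA):
    """Chi-square goodness-of-fit of genotype counts against a Mendelian ratio."""
    total = sum(observed)
    ratio_total = sum(expected_ratio)
    expected = [total * r / ratio_total for r in expected_ratio]
    _, p_value = chisquare(observed, f_exp=expected)
    return p_value < alpha, p_value


def two_point_lod(n_recombinant, n_informative):
    """LOD for linkage at the estimated r versus free recombination (r = 0.5)."""
    r = min(n_recombinant / n_informative, 0.5)
    n_parental = n_informative - n_recombinant

    def log10_likelihood(rf):
        ll = 0.0
        if n_recombinant:
            ll += n_recombinant * math.log10(rf)
        if n_parental:
            ll += n_parental * math.log10(1.0 - rf)
        return ll

    return r, log10_likelihood(r) - log10_likelihood(0.5)


def haldane_cm(r):
    """Haldane's mapping function: recombination fraction -> map distance in cM."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must lie in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * r)


distorted, _ = segregation_distorted([60, 20, 20], (1, 2, 1))  # hypothetical counts
r, lod = two_point_lod(5, 100)                                 # hypothetical counts
print(distorted, lod >= LOD_THRESHOLD, round(haldane_cm(r), 2))  # True True 5.27
```

Haldane's function assumes no crossover interference, so map distances grow faster than the raw recombination fraction as r approaches 0.5.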
Detecting loci under selection – analysis and simulations
Simulated datasets were generated using EASYPOP (Balloux 2001) and analyzed using BAYESCAN to examine the false-positive error rate. Simulations were based on an island model with 10 demes of 10000 individuals each, a migration rate of 0.01, and 1000 bi-allelic loci. The mutation rate (0.0001) and simulation run length (1000 generations) were chosen to produce realistic levels of diversity and divergence based on the observed and previously published SNP data (e.g., Moen et al. 2008; Nielsen et al. 2009; FST = 0.05, Ho = 0.29). Ten independent simulations were completed, and the number of false positives was estimated with BAYESCAN. In addition to BAYESCAN, we used the hierarchical test for selection implemented in ARLEQUIN (v.3.5.1.2; Excoffier and Lischer 2010).
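To make the simulation design above concrete, here is a minimal frequency-based Wright-Fisher island model with the same parameter names (demes, individuals per deme, migration rate, mutation rate, loci, generations). It is a stand-in, not EASYPOP: EASYPOP simulates individuals and mating explicitly, whereas this sketch tracks allele frequencies only, and the uniform 0.05-0.95 starting frequencies are an assumption. BAYESCAN is not reproduced; in the study's design, each simulated dataset would be passed to it and every flagged locus counted as a false positive.

```python
import numpy as np


def simulate_island_model(n_demes=10, n_individuals=10_000, migration=0.01,
                          mutation=1e-4, n_loci=1000, generations=1000, seed=1):
    """Frequency-based Wright-Fisher island model for bi-allelic loci."""
    rng = np.random.default_rng(seed)
    n_genes = 2 * n_individuals                       # diploid gene copies per deme
    ancestral = rng.uniform(0.05, 0.95, size=n_loci)  # assumed starting frequencies
    p = np.tile(ancestral, (n_demes, 1))              # demes x loci
    for _ in range(generations):
        # Migration: a fraction m of each deme's genes comes from the global pool.
        p = (1.0 - migration) * p + migration * p.mean(axis=0)
        # Symmetric mutation between the two alleles.
        p = p * (1.0 - mutation) + (1.0 - p) * mutation
        # Drift: binomial sampling of 2N gene copies in every deme.
        p = rng.binomial(n_genes, p) / n_genes
    return p


def fst_ratio_of_averages(p):
    """F_ST = sum of Var(p) / sum of pbar(1 - pbar), over polymorphic loci only."""
    p_bar = p.mean(axis=0)
    denom = p_bar * (1.0 - p_bar)
    keep = denom > 0
    return float(p.var(axis=0)[keep].sum() / denom[keep].sum())
```

Calling `simulate_island_model(seed=s)` for ten different seeds gives ten independent replicate datasets under the quoted parameter values; `fst_ratio_of_averages` is a quick check that simulated divergence is in the intended range before running any outlier test.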
Outlier loci that displayed parallel clines on either side of the Atlantic were examined for environmental and life-history correlates of allele frequency using multiple regression and simple Mantel tests. Geographic distance was estimated as the shortest distance between samples that avoided depths > 600 m. Environmental and ecological data were collected from published sources (Myers et al. 2001; Robichaud and Rose 2004, and references therein) and included average annual bottom temperature, average annual bottom salinity, latitude, longitude, stock biomass, area of occupancy, age at maturity, and spawning month. Because associations with environmental variables may be a result of neutral geographic associations, we accounted for the influence of geography using both partial Mantel tests and the residuals from the isolation-by-distance (IBD) relationship. Admittedly, small-scale variation in these environmental and life-history variables may be a source of error. Moreover, the environmental conditions experienced by an individual or population often change seasonally, with development, or with movement. Without knowing precisely when during the life history selection is acting, it is impossible to identify the relevant period and value. We therefore took the conservative approach of using average annual values for environmental factors, as these are most likely to represent regional trends in ocean climate and to encompass the critical periods of interest. Notwithstanding its inherent uncertainties, this approach has been used repeatedly in studies of Atlantic cod (Myers et al. 2001; Hutchings et al. 2007). Moreover, the average annual temperature values used agree with the temperature exposure of individuals (e.g., in the Barents Sea) estimated from stable oxygen isotopes (Weidman and Millner 2000), further supporting their use. Nonetheless, this remains a potential source of error; refining the environmental variables will require further evaluation of exactly which temperatures sampled fish experience and when during the life history selection acts.
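The two ways of controlling for geography described above (partial Mantel tests and IBD residuals) can be sketched as follows. This is an illustration under stated assumptions, not the software used in the study: significance comes from a one-tailed permutation test that jointly permutes the rows and columns of the first matrix, and the 999 permutations and fixed seed are assumptions.

```python
import numpy as np


def _upper(m):
    """Upper-triangle (off-diagonal) entries of a square distance matrix."""
    i, j = np.triu_indices_from(m, k=1)
    return m[i, j]


def _partial_r(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    rxy, rxz, ryz = (np.corrcoef(u, v)[0, 1] for u, v in ((x, y), (x, z), (y, z)))
    return (rxy - rxz * ryz) / np.sqrt((1.0 - rxz**2) * (1.0 - ryz**2))


def mantel(a, b, c=None, n_perm=999, seed=0):
    """One-tailed simple Mantel test, or partial Mantel test when c is given.

    Rows and columns of `a` are permuted together; `b` (and `c`) stay fixed.
    Returns (observed r, permutation P-value).
    """
    rng = np.random.default_rng(seed)
    y = _upper(b)
    z = None if c is None else _upper(c)

    def stat(x):
        return np.corrcoef(x, y)[0, 1] if z is None else _partial_r(x, y, z)

    r_obs = stat(_upper(a))
    n = a.shape[0]
    hits = 0
    for _ in range(n_perm):
        perm = rng.permutation(n)
        if stat(_upper(a[np.ix_(perm, perm)])) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (n_perm + 1)


def ibd_residuals(fst, geo):
    """Residuals of the linear isolation-by-distance regression of F_ST on distance."""
    x, y = _upper(geo), _upper(fst)
    slope, intercept = np.polyfit(x, y, 1)
    return y - (slope * x + intercept)
```

For the comparisons in Table S4, `a` would be the pairwise FST matrix, `b` the matrix of pairwise temperature differences, and `c` the geographic distance matrix; the IBD residuals can then be regressed on the upper-triangle temperature differences.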
Estimating selection and clinal width
Sigmoidal curves fitted to the cline data were used to calculate clinal width as the inverse of the maximum slope, and the strength of selection was then estimated using the gradient model of gradual environmental change (Endler 1977), in which dispersal distance is much smaller than the cline width. This model assumes

$$ w = \frac{1.73\,\sigma^{2/3}}{b^{1/3}} $$

in which w is the width of the cline, σ is the standard deviation of the adult-offspring dispersal distance, and b represents the selection gradient (Slatkin 1973; Endler 1977). As allele frequencies were not fixed at 0 and 1 on either side of the cline, clinal width was calculated as Δp divided by the maximum slope, where Δp is the change in allele frequency from one side of the cline to the other. Because dispersal distance is currently unknown in Atlantic cod and likely varies substantially across the study area, we inferred the strength of selection using a range of reasonable dispersal estimates taken from early-life-history and adult mark–recapture studies (Bradbury et al. 2000; Robichaud and Rose 2004). We also assumed a normal distribution of dispersal distances and derived σ from d, the average per-generation dispersal distance.
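The width and selection calculations described above can be sketched as below. Several pieces are assumptions rather than the authors' documented choices: the four-parameter logistic is one possible sigmoid; the gradient-model form w = 1.73·σ^(2/3)/b^(1/3) is the commonly cited version of Endler's (1977) result; and σ = d·√(π/2) is the relation between σ and the mean absolute displacement d for a one-dimensional normal dispersal kernel. The text does not give the formula that was actually used for σ, nor how b was converted to the selection coefficient s shown in Figure S7b, so this sketch stops at b.

```python
import numpy as np
from scipy.optimize import curve_fit

ENDLER_CONSTANT = 1.73  # constant in the commonly cited gradient-model formula (assumption)


def sigmoid(x, p_left, delta_p, centre, scale):
    """Four-parameter logistic cline; delta_p is the total change in allele frequency."""
    return p_left + delta_p / (1.0 + np.exp(-(x - centre) / scale))


def fit_cline_width(distance_km, freq):
    """Fit the logistic and return width = |delta_p| / maximum slope (in km)."""
    distance_km = np.asarray(distance_km, dtype=float)
    freq = np.asarray(freq, dtype=float)
    guess = [freq.min(), freq.max() - freq.min(), float(np.median(distance_km)),
             (distance_km.max() - distance_km.min()) / 10.0]
    (p_left, delta_p, centre, scale), _ = curve_fit(
        sigmoid, distance_km, freq, p0=guess, maxfev=10_000)
    max_slope = abs(delta_p) / (4.0 * abs(scale))  # slope of the logistic at its centre
    return abs(delta_p) / max_slope                # simplifies to 4 * |scale|


def sigma_from_mean_dispersal(d_km):
    """SD of a 1-D normal dispersal kernel from the mean absolute displacement d."""
    return d_km * np.sqrt(np.pi / 2.0)


def selection_gradient(width_km, sigma_km):
    """Invert w = 1.73 * sigma**(2/3) / b**(1/3) for the selection gradient b."""
    return (ENDLER_CONSTANT * sigma_km ** (2.0 / 3.0) / width_km) ** 3


def width_from_gradient(b, sigma_km):
    """Forward form of the same relationship, useful as a consistency check."""
    return ENDLER_CONSTANT * sigma_km ** (2.0 / 3.0) / b ** (1.0 / 3.0)
```

In use, `fit_cline_width` would be applied to allele frequencies along a transect, and `selection_gradient` evaluated over a grid of dispersal values (e.g., 1 to 1000 km, converted with `sigma_from_mean_dispersal`), giving a curve of selection strength against dispersal distance analogous in shape to Figure S7b.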
To evaluate the influence of selection on gene flow, we compared estimates of the number of effective migrants (NeM) obtained under the island model (Wright 1931) from the outlier loci and from a random subset of the non-outlier loci; the ratio of these two estimates was then compared across a range of spatial scales. Random sampling of the non-outlier loci was repeated 10 times to ensure a representative sample. Although the limitations of the island model are well documented, in our context it is the relative comparison between marker types, not the absolute estimates, that is of interest. We therefore believe that this approach is appropriate.
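The NeM comparison above can be sketched by solving Wright's island-model approximation FST = 1/(4NeM + 1) for NeM. The FST estimator here is a simple ratio of averages computed from sample allele frequencies, not necessarily the estimator used in the study, and drawing each random non-outlier subset with the same number of loci as the outlier set is an assumption (the text does not give the subset size).

```python
import numpy as np


def multilocus_fst(p):
    """Ratio-of-averages F_ST from a populations x loci allele-frequency array."""
    p_bar = p.mean(axis=0)
    denom = p_bar * (1.0 - p_bar)
    keep = denom > 0
    return float(p.var(axis=0)[keep].sum() / denom[keep].sum())


def nem_island(fst):
    """Solve Wright's island-model approximation F_ST = 1 / (4 NeM + 1) for NeM."""
    return (1.0 / fst - 1.0) / 4.0


def nem_ratio(p_outlier, p_neutral, n_subsets=10, seed=0):
    """Mean of NeM(outlier) / NeM(random non-outlier subset) over repeated subsets.

    Each subset has as many loci as the outlier set (an assumption).
    """
    rng = np.random.default_rng(seed)
    n_loci = p_outlier.shape[1]
    nem_out = nem_island(multilocus_fst(p_outlier))
    ratios = []
    for _ in range(n_subsets):
        cols = rng.choice(p_neutral.shape[1], size=n_loci, replace=False)
        ratios.append(nem_out / nem_island(multilocus_fst(p_neutral[:, cols])))
    return float(np.mean(ratios))
```

For a single pairwise comparison, `p_outlier` and `p_neutral` would each have two rows (one per sampling site); a ratio well below 1 indicates reduced effective gene flow at the outlier loci relative to the neutral background.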
Table S1. Details on sample locations and SNP summary statistics (N, sample size; Ho, observed heterozygosity; He, expected heterozygosity). See Methods for further information.
Location                      N    Ho     He     % Polymorphic
Georges Bank (A)              24   0.361  0.360  99.22%
Georges Bank (B)              20   0.363  0.357  98.93%
Cape Sable, NS                23   0.362  0.359  99.29%
St. Mary's Bay, NL            25   0.358  0.350  97.30%
Holyrood Pond, NL             20   0.357  0.351  96.80%
Bay Bulls, NL                 23   0.350  0.353  97.22%
Smith Sound A, NL             20   0.358  0.351  96.37%
Smith Sound B, NL             23   0.354  0.354  97.79%
Gilbert Bay, NL               21   0.309  0.304  90.11%
Ogac Lake, Baffin Island      18   0.253  0.253  76.23%
Barents Sea, Norway           26   0.239  0.239  78.22%
Akureyri, Iceland             26   0.258  0.258  85.05%
Baltic Sea                    16   0.222  0.220  69.40%
Galway Bay, Ireland           15   0.242  0.236  74.73%
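The per-sample statistics in Table S1 can be computed from a genotype matrix roughly as follows. This is a sketch under assumptions: genotypes are coded as alternate-allele counts (0, 1, 2) with -1 for missing calls, He is the uncorrected 2p(1 - p) averaged over loci, and a locus counts as polymorphic if both alleles occur in the sample. The software and any sample-size correction actually used for Table S1 are not stated here.

```python
import numpy as np


def snp_summary(genotypes):
    """Mean Ho, mean He and % polymorphic loci for one sample.

    genotypes: individuals x loci array of alternate-allele counts (0, 1, 2),
    with -1 marking a missing call. Every locus needs at least one call.
    """
    g = np.asarray(genotypes)
    called = g >= 0
    n_called = called.sum(axis=0)
    alt_copies = np.where(called, g, 0).sum(axis=0)
    p = alt_copies / (2.0 * n_called)     # alternate-allele frequency per locus
    ho = (g == 1).sum(axis=0) / n_called  # observed heterozygosity per locus
    he = 2.0 * p * (1.0 - p)              # expected heterozygosity, no sample-size correction
    polymorphic = (p > 0.0) & (p < 1.0)
    return float(ho.mean()), float(he.mean()), 100.0 * float(polymorphic.mean())
```

Applying `snp_summary` separately to the genotypes from each sampling location yields one row of Ho, He, and % polymorphic values in the layout of Table S1.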
Table S2. Allele frequencies of PanI and β1 Hb alleles in samples from the eastern Atlantic. Samples were genotyped using the KASpar system (KBiosciences). The two main allele variants at the PanI locus, PanIA and PanIB, were determined by assessing the polymorphism at a DraI site (a G/A substitution in intron 4; allele A: DraI site absent, TTTTGAAA; allele B: DraI site present, TTTTAAAA) (Pogson et al. 2001). The two Hb β1 allele variants (Andersen et al. 2009; Borza et al. 2009), corresponding to HbI-1 and HbI-2 (Sick 1965), were determined by assessing the SNP G454A Lys/Ala (Borza et al. 2009). Details of the KASpar assays for PanI and Hb β1 are described elsewhere (Borza et al., submitted). Sample sizes are given in Table S1.
Location
Barents Sea, Norway
Akureyri, Iceland
Baltic Sea
Galway Bay, Ireland
PanIA
β1 Hb
(HbI-2)
0.10
1.0
1.0
1.0
0.88
0.96
0.93
0.32
Table S3. Backward regression of population-average allele frequency at clinal loci on environmental and stock life-history parameters. Only variables included in the best model are shown; age at maturity, longitude, and latitude were excluded.
Effect         Coefficient   P-value   R2
Best model (multiple regression, partial coefficients)
Temperature    -0.952        <0.001    0.96 (model)
Biomass         0.274         0.002
Salinity       -0.137         0.064
Single-variable regressions
Temperature    -0.947        <0.001    0.90
Biomass         0.144         0.622    0.02
Salinity       -0.165         0.574    0.03
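A backward regression like the one summarized in Table S3 can be sketched with ordinary least squares and repeated removal of the least significant predictor. The retention threshold (alpha = 0.10) and the z-scoring of predictors and response (so that the slopes are standardized coefficients) are assumptions; the criterion used to select the best model is not stated in the table.

```python
import numpy as np
from scipy import stats


def _zscore(a):
    return (a - a.mean(axis=0)) / a.std(axis=0, ddof=1)


def ols_fit(X, y):
    """OLS with intercept; returns slopes, their two-sided p-values and R^2."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    dof = len(y) - X1.shape[1]
    sigma2 = resid @ resid / dof
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X1.T @ X1)))
    p = 2.0 * stats.t.sf(np.abs(beta / se), dof)
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return beta[1:], p[1:], r2


def backward_eliminate(X, y, names, alpha=0.10):
    """Drop the least significant predictor until all remaining have p <= alpha.

    X and y are z-scored first, so the returned slopes are standardized.
    """
    Xz = _zscore(np.asarray(X, dtype=float))
    yz = _zscore(np.asarray(y, dtype=float))
    keep = list(range(Xz.shape[1]))
    while keep:
        beta, p, r2 = ols_fit(Xz[:, keep], yz)
        worst = int(np.argmax(p))
        if p[worst] <= alpha:
            return [names[i] for i in keep], beta, p, r2
        keep.pop(worst)
    return [], np.array([]), np.array([]), 0.0
```

With only 14 populations, backward elimination can be sensitive to the threshold chosen, which is one reason to report the single-variable fits alongside the final model, as Table S3 does.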
Table S4. Mantel tests (simple and partial) of the correlation between genetic differentiation at clinal loci, geographic distance, and average bottom temperature. Partial Mantel tests estimate the correlation between two variables while holding a third constant.
Test            Matrices compared                                            r      P
Simple Mantel   FST x geographic distance                                    0.263  0.071
Simple Mantel   Temperature x geographic distance                            0.159  0.117
Simple Mantel   FST x temperature                                            0.840  <0.001
Partial Mantel  FST x temperature (holding geographic distance constant)     0.832  <0.001
Partial Mantel  FST x distance (holding temperature distance constant)       0.245  0.148
Table S5. Annotation and linkage map information for the linkage groups containing outlier SNPs that display parallel clines on either side of the Atlantic. Haemoglobin (Hb) SNPs are included for comparison of map positions. See Methods for details regarding SNPs and mapping.
SNP
cgpGmo-S248a
Linkage
group
CGPIA12
Location
(cM)
15.582
Annotation
cgpGmo-S866
CGPIA12
16.674
cgpGmo-S816a
CGPIA12
18.327
gb|ACI66551.1| Growth arrest and DNA-damage-inducible protein
GADD45 alpha [Salmo salar]
gb|ACO10168.1| GDP-L-fucose synthetase [Osmerus mordax]
cgpGmo-S372a
CGPIA12
18.327
no hits
cgpGmo-S636
CGPIA12
19.814
no hits
cgpGmo-S1046
CGPIA12
20.994
no hits
cgpGmo-S316
CGPIA12
24.168
no hits
cgpGmo-S2101
CGPIA12
34.413
no hits
cgpGmo-S1112, Hb cluster 1
CGPIA2
36.469
cgpGmo-S1113, Hb cluster 1
CGPIA2
36.469
cgpGmo-S1111, Hb cluster 1
CGPIA2
38.88
cgpGmo-S1205
CGPIA2
56.739
no hits
cgpGmo-S1068
CGPIA2
56.835
cgpGmo-S1101a
CGPIA2
56.835
cgpGmo-S532
CGPIA2
56.835
cgpGmo-S174
CGPIA2
57.04
linked to cgpGmo-1456: ref|NP_956397.1| KDEL (Lys-Asp-Glu-Leu)
endoplasmic reticulum protein retention receptor 2 [Danio rerio]
linked to cgpGmo-1456: ref|NP_956397.1| KDEL (Lys-Asp-Glu-Leu)
endoplasmic reticulum protein retention receptor 2 [Danio rerio]
linked to cgpGmo-1456: ref|NP_956397.1| KDEL (Lys-Asp-Glu-Leu)
endoplasmic reticulum protein retention receptor 2 [Danio rerio]
no hits
cgpGmo-S184
CGPIA2
57.04
no hits
cgpGmo-S1751
CGPIA2
57.296
no hits
cgpGmo-S1200
CGPIA7
9.272
no hits
cgpGmo-S2019
CGPIA7
17.391
no hits
Bit Score
e-value
191
4E-55
243
7E-77
219
5E-55
219
5E-55
219
5E-55
no hits
cgpGmo-S917
CGPIA7
18.267
no hits
cgpGmo-S1039b
CGPIA7
19.147
no hits
cgpGmo-S1089
CGPIA7
19.147
no hits
cgpGmo-S1183
CGPIA7
19.147
no hits
cgpGmo-S1425
CGPIA7
19.147
no hits
cgpGmo-S152
CGPIA7
19.147
no hits
cgpGmo-S157
CGPIA7
19.147
gb|ACI66405.1| Oligoribonuclease, mitochondrial precursor [Salmo salar]
cgpGmo-S1039a
CGPIA7
19.147
no hits
cgpGmo-S1810
CGPIA7
19.147
no hits
cgpGmo-S183
CGPIA7
19.147
no hits
cgpGmo-S1830
CGPIA7
19.147
no hits
cgpGmo-S2158
CGPIA7
19.147
emb|CAF95009.1| unnamed protein product [Tetraodon nigroviridis]
136
3E-30
cgpGmo-S268
CGPIA7
19.147
ref|XP_685984.3| PREDICTED: im:7151086 [Danio rerio]
60
0.000001
cgpGmo-S419
CGPIA7
19.147
no hits
cgpGmo-S739
CGPIA7
19.147
no hits
cgpGmo-S814a
CGPIA7
19.147
no hits
cgpGmo-S870
CGPIA7
19.147
ref|NP_001134335.1| Kaptin [Salmo salar]
175
2E-50
cgpGmo-S920
CGPIA7
19.147
no hits
cgpGmo-S260a
CGPIA7
20.06
288
9E-76
cgpGmo-S426
CGPIA7
20.392
gb|ACH70869.1| calpain small subunit 1 [Salmo salar] gb|ACI68381.1|
Calpain small subunit 1 [Salmo salar]
no hits
cgpGmo-S1644
CGPIA7
22.341
no hits
120
2E-55
[Map: (a) Labrador and eastern Canada; (b) east coast of Newfoundland (approx. 47-49° N, 53-55° W) showing Bonavista Bay, Trinity Bay, Smith Sound (A, B), Conception Bay, Bay Bulls, Placentia Bay, Holyrood Pond, and St. Mary's Bay.]
Figure S1. Location of sampling sites within Newfoundland. The inset shows the east coast of Newfoundland relative to eastern Canada. See Figure 1 for all sample locations.
[Scatter plot of individuals on PC1 (52.9%) and PC2 (17.9%), with clusters labelled Ogac Lake, Western Atlantic, and Eastern Atlantic.]
Figure S2. PCA with non-neutral loci removed (see Methods). Key: GB, Gilbert Bay; OL, Ogac Lake; BB, Bay Bulls; SMB, St. Mary's Bay; HRP, Holyrood Pond; 4X, Cape Sable; 5ZA and 5ZB, Georges Bank; SSA and SSB, Smith Sound; ICE, Iceland; NOR, Norway; IRLD, Galway Bay. Values in parentheses indicate the variance explained by each coordinate.
[Plot of FST against He for each locus, with null-distribution quantiles of 0.01, 0.05, 0.10, 0.50, 0.90, and 0.95.]
Figure S3. Hierarchical test for selection conducted in Arlequin ver. 3.5 (Excoffier and Lischer 2010) on 1641 Atlantic cod single nucleotide polymorphisms genotyped at 14 locations throughout the North Atlantic. Red points represent outliers (n = 70) identified using BAYESCAN; see Methods for details.
Figure S4. Heatmaps of allele frequency at loci identified as under selection: (a) loci showing parallel clines on either side of the Atlantic; (b) loci that are fixed in the eastern Atlantic and show some clinal structure in the west; (c) the remaining loci.
Figure S5. Linkage disequilibrium among clinal loci calculated using TASSEL (Bradbury et al. 2007). Values above the diagonal are D’ and values below the diagonal are p-values from Fisher's exact test. Loci are ordered by linkage group on the y-axis; see Fig. 4.
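The two quantities displayed in Figure S5 can be illustrated for a single pair of bi-allelic loci. This sketch assumes haplotype counts are already known (i.e., phase is resolved); TASSEL estimates them from unphased genotype data, which is not reproduced here, and the counts in any example are hypothetical.

```python
from scipy.stats import fisher_exact


def d_prime(p_ab, p_a, p_b):
    """Lewontin's D' for two bi-allelic loci (signed, in [-1, 1]).

    p_ab: frequency of the haplotype carrying allele A at locus 1 and B at locus 2;
    p_a, p_b: frequencies of A and B.
    """
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return d / d_max if d_max > 0 else 0.0


def ld_from_haplotype_counts(n_AB, n_Ab, n_aB, n_ab):
    """D' and a two-sided Fisher's exact test p-value from a 2 x 2 haplotype table."""
    total = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total
    p_B = (n_AB + n_aB) / total
    _, p_value = fisher_exact([[n_AB, n_Ab], [n_aB, n_ab]])
    return d_prime(p_AB, p_A, p_B), p_value
```

D' is scaled by the maximum disequilibrium possible given the allele frequencies, so it can be large even when one allele is rare; the Fisher's exact p-value indicates whether the association is statistically supported.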
[Three panels, each plotted against the difference in average annual bottom temperature between pairwise comparisons: (a) clinal loci, FST (r2 = 0.72, p < 0.001); (b) IBD residuals of clinal loci (r2 = 0.73, p < 0.001); (c) comparison of rates of gene flow, NeM non-neutral / NeM neutral on a log scale (r2 = 0.58, p < 0.001).]
Figure S6. (a) Association between FST and the difference in average annual bottom temperature among all 14 locations sampled throughout the North Atlantic. (b) Relationship between the residuals of the isolation-by-distance relationship (i.e., FST versus geographic distance) and the difference in average annual bottom temperature. (c) Comparison of the estimated number of effective migrants for neutral and non-neutral loci, calculated using Wright's island model, against the difference in average annual bottom temperature.
[Two panels: (a) Western Atlantic, allele frequency versus geographic distance (km), fitted cline width w = 802 km (SD = 90.1); (b) Dispersal distance and selection, selection coefficient (s, log scale from 1e-6 to 1e-1) versus dispersal distance (0 to 1000 km).]
Figure S7. (a) Clines in allele frequency at temperature-associated outlier loci in the western Atlantic; clinal width was estimated using sigmoidal non-linear curve fitting. (b) Estimates of selection associated with dispersal distances from 1 to 1000 km, obtained using the gradient model of clinal structure (Endler 1977) and assuming a normal distribution of dispersal distances.
[Two panels: (a) scatter plot on PC One (74%) and PC Two (11%) for Georges Bank (a, b), Cape Sable (NS), St. Marys Bay (NL), Holyrood Pond (NL), Bay Bulls (NL), Smith Sound (NL; a, b), Gilbert Bay (Lab), Ogac Lake, Barents Sea (Norway), Akureyri (Iceland), Baltic Sea, and Galway Bay (Ireland); (b) frequency (%) distributions of PC One values for linkage groups LG 12, LG 2, and LG 7.]
Figure S8. (a) Principal Coordinate Analysis (PCA) of the 40 temperature-associated loci only. Key: Lab, Labrador; NL, Newfoundland and Labrador; NS, Nova Scotia. (b) Frequency distributions of PC One values from PCAs conducted separately on each temperature-associated linkage group. Values in parentheses indicate the variance explained by each coordinate.