Detecting selection using genome scans

Roger Butlin
University of Sheffield
Nielsen R (2005) Molecular signatures of natural selection. Annu. Rev.
Genet. 39, 197–218.
What signatures does selection leave in the genome?
1. Population differentiation – today’s focus!
2. Frequency spectrum, e.g. Tajima’s D
3. Selective sweeps
4. Haplotype structure (linkage disequilibrium)
5. McDonald-Kreitman tests (or PAML over long time-scales)
Frequency distribution:
From Nielsen (2005): frequency of derived allele in a sample of 20 alleles.
Tajima’s D = (π − S/a1)/sd, where S is the number of segregating sites and a1 = Σ 1/i for i = 1 to n−1; a negative D indicates an excess of rare variants
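To make the statistic concrete, here is a minimal sketch of Tajima's D computed from derived-allele counts, using the standard variance constants from Tajima (1989). The function name `tajimas_d` and its input format (one derived-allele count per segregating site) are assumptions made for illustration, not part of the lecture material.

```python
import math

def tajimas_d(derived_counts, n):
    """Tajima's D from derived-allele counts at each segregating site.

    derived_counts: one count k per segregating site, with 1 <= k <= n - 1.
    n: number of sampled alleles (e.g. 20, as in the Nielsen 2005 figure).
    """
    s = len(derived_counts)
    if s == 0 or n < 4:
        raise ValueError("need at least one segregating site and n >= 4")
    # pi: mean number of pairwise differences, summed over sites
    pi = sum(2.0 * k * (n - k) / (n * (n - 1)) for k in derived_counts)
    # Constants that depend only on the sample size
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # Difference between the two estimators of theta, scaled by its standard deviation
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))

# Ten singletons in a sample of 20 alleles: excess of rare variants, so D is negative
d_rare = tajimas_d([1] * 10, 20)
# Ten sites at intermediate frequency: D is positive
d_intermediate = tajimas_d([10] * 10, 20)
```

The sign is the informative part here: rare variants pull π below S/a1, intermediate-frequency variants push it above.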
Selective sweep:
Extended haplotype homozygosity (Sabeti et al. 2002)
McDonald-Kreitman and related tests
dN = replacement changes per replacement site
dS = silent changes per silent site
dN/dS = 1: neutral
dN/dS < 1: conserved (purifying selection)
dN/dS > 1: adaptive evolution (positive selection)
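As a small illustration of the definitions above, the sketch below computes dN/dS from raw counts and applies the three-way interpretation. The function names and the example counts are hypothetical; in practice the ratio would be compared against 1 with a statistical test (for example in PAML) rather than by a bare comparison.

```python
def dn_ds(replacement_changes, replacement_sites, silent_changes, silent_sites):
    """Return dN/dS from counts of changes and counts of sites."""
    dn = replacement_changes / replacement_sites  # replacement changes per replacement site
    ds = silent_changes / silent_sites            # silent changes per silent site
    if ds == 0:
        raise ValueError("dS is zero; the ratio is undefined")
    return dn / ds

def interpret(omega):
    """Map a dN/dS ratio to the interpretation listed on the slide (no significance test)."""
    if omega < 1:
        return "conserved (purifying selection)"
    if omega > 1:
        return "adaptive evolution (positive selection)"
    return "neutral"

# Hypothetical counts: 10 replacement changes over 100 sites, 20 silent changes over 50 sites
omega = dn_ds(10, 100, 20, 50)
label = interpret(omega)
```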
Selection on phenotypic traits:
QTL
Association analysis
Candidate genes
Genome scans (aka ‘Outlier analysis’)
Littorina saxatilis – locally adapted morphs
What signatures of selection might we look for?
[Figure: ‘H’ and ‘M’ morphs, Thornwick Bay]
Signatures of selection:
Departure from HWE
Low diversity (selective sweep)
Frequency spectrum tests
High divergence
Elevated proportion of non-synonymous substitutions
LD
[Figure: three histograms of number of loci (0–16) against FST (0.05–0.55); panels: Neutral loci, Stabilizing selection, Local adaptation]
Charlesworth et al. 1997 (from Nosil et al. 2009)
A concrete example: adaptation to altitude in Rana temporaria (Bonin et al. 2006)
High – 2000m
Intermediate – 1000m
Low – 400m
190 individuals
392 AFLP bands
Generating the expected distribution
DetSel – Vitalis et al. 2001
[Diagram: population divergence model; parameter labels t0, Ne, N0, t, N1, N2, μ]
F1,2 – measure of divergence of population 1,2 from population 2,1
Dfdist – Beaumont & Nichols 1996
[Diagram: island model; demes of size N, migration rate m]
FST – symmetrical population differentiation, as a function of heterozygosity
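The idea of generating an expected neutral distribution of FST as a function of heterozygosity can be sketched roughly as follows. This is not the Dfdist algorithm itself (which uses coalescent simulation): as a simplifying assumption, subpopulation allele frequencies are drawn from a beta distribution whose variance matches a target FST, and the function names, the prior on the overall allele frequency, and the binning scheme are all illustrative choices.

```python
import random

def simulate_locus(fst, n_pops, n_alleles, rng):
    """Simulate one neutral biallelic locus; return (heterozygosity, FST estimate) or None."""
    p_anc = rng.uniform(0.05, 0.95)      # assumed prior on the overall allele frequency
    a = p_anc * (1 - fst) / fst          # beta parameters giving variance fst * p * (1 - p)
    b = (1 - p_anc) * (1 - fst) / fst
    freqs = []
    for _ in range(n_pops):
        p = rng.betavariate(a, b)        # subpopulation allele frequency
        k = sum(rng.random() < p for _ in range(n_alleles))  # binomial sampling of alleles
        freqs.append(k / n_alleles)
    p_bar = sum(freqs) / n_pops
    h_t = 2 * p_bar * (1 - p_bar)        # total expected heterozygosity
    if h_t == 0:
        return None                      # monomorphic in the sample: uninformative
    h_s = sum(2 * p * (1 - p) for p in freqs) / n_pops
    return h_t, (h_t - h_s) / h_t

def null_envelope(fst, n_pops=2, n_alleles=40, n_loci=5000, n_bins=5, seed=1):
    """Return (bin_start, 5%, 50%, 95%) FST quantiles per heterozygosity bin."""
    rng = random.Random(seed)
    bins = [[] for _ in range(n_bins)]
    for _ in range(n_loci):
        result = simulate_locus(fst, n_pops, n_alleles, rng)
        if result is None:
            continue
        h, f = result
        idx = min(int(h / 0.5 * n_bins), n_bins - 1)  # heterozygosity lies in [0, 0.5]
        bins[idx].append(f)
    envelope = []
    for i, values in enumerate(bins):
        if not values:
            continue
        values.sort()
        def quantile(prob):
            return values[min(int(prob * len(values)), len(values) - 1)]
        envelope.append((i * 0.5 / n_bins, quantile(0.05), quantile(0.5), quantile(0.95)))
    return envelope

envelope = null_envelope(fst=0.1)
```

An observed locus whose FST falls above the 95% value for its heterozygosity bin would be flagged as a candidate outlier under these assumptions.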
Does the structure/history matter?
[Figure: ‘Low 1’ vs ‘High 1’ comparison; DetSel 95% CI and Dfdist 5%, 50% and 95% quantiles]
Outlier category                                      DetSel   Dfdist   Interpretation
Monomorphic in one population                         35       N/A      Unreliable outliers
Significant in one comparison                         14       29       False positives
Significant in comparisons involving one population   3        11       Local effects
Significant in at least 2 comparisons                 2        3        Adaptation to altitude
Significant in global comparison across altitudes: 6 (2 at 99%); Both: 1; Interpretation: Adaptation to altitude
392 AFLPs, 12 pairwise comparisons across altitude or 3 altitude categories, 95% cut-off
343 loci
8 loci
Outliers and selected traits
Rogers and Bernatchez (2007):
Dwarf x Normal cross → both backcrosses
Measure ‘adaptive’ traits (9)
QTL map (>400 AFLP plus microsatellites)
Homologous AFLP in 4 natural sympatric population pairs
Outlier analysis (forward simulation based on Winkle)
Coregonus clupeaformis (lake whitefish)
Cross             Homologous AFLP   Outlier AFLP in homologous set*   Outlier within QTL (based on 1.5 LOD support)
Hybrid x Dwarf    180               19                                9 (3.6 expected, P=0.0015)
Hybrid x Normal   131               8                                 4 (0.5 expected, P=0.0002)
*Only 3 outliers shared between lakes
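The comparison of observed and expected outliers within QTL can be illustrated with a simple one-sided binomial tail. This is only a sketch of the reasoning: it assumes each outlier independently falls inside a QTL with probability (expected / number of outliers), which is not necessarily the test used by Rogers and Bernatchez, so the values it gives need not match the P-values in the table.

```python
from math import comb

def binomial_upper_tail(observed, n, p):
    """P(X >= observed) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(observed, n + 1))

# Hybrid x Dwarf: 19 outliers, 9 within QTL, 3.6 expected by chance
p_dwarf = binomial_upper_tail(9, 19, 3.6 / 19)
# Hybrid x Normal: 8 outliers, 4 within QTL, 0.5 expected by chance
p_normal = binomial_upper_tail(4, 8, 0.5 / 8)
```

Under this assumption both tails are small, which agrees in direction with the reported excess of outliers inside QTL.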
Nosil et al. 2009 review of 14 studies:
1. 0.5–26% outliers, most studies 5–10%
2. 1–5% outliers replicated in pair-wise comparisons
3. 25–100% of outliers specific to habitat comparisons
4. No consistent pattern for EST-associated loci
5. LD among outliers typically low
But many methodological differences between studies
Population sampling
Marker type
Analysis type and options
Statistical cut-offs
Environmental correlations
SAM – Joost et al. 2007
IBA – Nosil et al. 2007
FST for each locus correlated with ‘adaptive distance’, controlling for geographic distance (partial Mantel test)
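A partial Mantel test of this kind can be sketched as below: the partial correlation between per-locus FST and adaptive distance, controlling for geographic distance, is compared with its distribution when the populations of the FST matrix are permuted. This is a generic stdlib implementation for illustration, not the code used in the IBA analysis; the function names and the one-sided P-value are assumptions.

```python
import math
import random

def upper_triangle(m):
    """Flatten the upper triangle of a square distance matrix."""
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(i + 1, n)]

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / math.sqrt(sxx * syy)

def partial_r(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    denom = math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))
    return 0.0 if denom == 0 else (rxy - rxz * ryz) / denom

def partial_mantel(fst, adaptive, geographic, n_perm=999, seed=1):
    """Return (observed partial r, one-sided permutation P-value)."""
    n = len(fst)
    x, z = upper_triangle(adaptive), upper_triangle(geographic)
    observed = partial_r(x, upper_triangle(fst), z)
    rng = random.Random(seed)
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(n_perm):
        order = list(range(n))
        rng.shuffle(order)
        # Permute rows and columns of the FST matrix together
        permuted = [[fst[a][b] for b in order] for a in order]
        if partial_r(x, upper_triangle(permuted), z) >= observed:
            hits += 1
    return observed, hits / (n_perm + 1)
```

Permuting whole populations, rather than individual matrix cells, respects the non-independence of pairwise distances, which is the reason a Mantel-type test is used here at all.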
Methodological improvements – Bayesian approaches
BayesFst – Beaumont & Balding 2004
Bayescan – Foll & Gaggiotti 2008
For each locus i and population j we have an FST measure, Fij, relative to the ‘ancestral’ population
Then decompose into locus and population components:
log(Fij / (1 − Fij)) = αi + βj
αi is the locus effect
– 0: neutral; positive: divergent selection; negative: balancing selection
βj is the population effect
Assuming a Dirichlet distribution of allele frequencies among subpopulations, αi and βj can be estimated by MCMC
In Bayescan, also explicitly test αi = 0
Apparently much greater power to detect balancing selection than FDIST
Lower false positive rate
Wider applicability
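The logistic decomposition of Fij into a locus effect and a population effect can be made concrete with a few lines of code. This sketch only shows the deterministic mapping between (αi + βj) and Fij; it does not implement the MCMC estimation in BayesFst or Bayescan, and the function names and example values are hypothetical.

```python
import math

def fst_from_effects(alpha, beta):
    """Invert log(F / (1 - F)) = alpha + beta to get F."""
    return 1.0 / (1.0 + math.exp(-(alpha + beta)))

def effects_sum_from_fst(f):
    """Logit transform: the value of alpha + beta implied by a given F (0 < F < 1)."""
    return math.log(f / (1.0 - f))

beta = -2.0                                # hypothetical population effect
neutral = fst_from_effects(0.0, beta)      # alpha = 0: neutral locus
divergent = fst_from_effects(1.0, beta)    # alpha > 0: higher F than neutral
balancing = fst_from_effects(-1.0, beta)   # alpha < 0: lower F than neutral
```

For a fixed population effect, a positive locus effect raises Fij above the neutral value and a negative one lowers it, which is what lets the locus effect be read as a signal of selection.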
Methodological improvements – hierarchical structure
Arlequin – Excoffier et al. 2009
Circles – simulated STR data, grey – null distribution
Bayenv – Coop et al. 2010
Estimates variance-covariance matrix of allele frequencies then tests for
correlations with environmental variables (or categories).
Software available at:
http://www.eve.ucdavis.edu/gmcoop/Software/Bayenv/Bayenv.html
Multiple analyses? Candidate vs control? E.g. Shimada et al. 2010
Hohenlohe et al. 2010
Mäkinen et al. 2008
7 populations (3 marine, 4 freshwater)
103 STR loci
Analysed by BayesFst (and LnRH)
5 under directional selection (3 in Eda locus)
15 under balancing selection
Used as a test case by Excoffier et al. (2009): 2 directional, 3 balancing
Can we replicate these results?
Bayescan
Stickleback_allele.txt – input file
Output_fst.txt – view with R routine plot_Bayescan
Arlequin
Stickleback_data_standard.arp – IAM
Stickleback_data_repeat.arp – SMM
Run using Arlequin3.5
Try hierarchical and island models, maybe different hierarchies
Sympatric speciation?
FST distribution as evidence of speciation with gene flow
Savolainen et al. (2006)
Cf. Gavrilets and Vose (2007)
• few loci underlying key traits
• intermediate selection
• initial environmental effect on phenology
Howea - palms