The Dynamics of Positive Selection
on the Mammalian Tree
Carolin Kosiol
Cornell University
<ck285@cornell.edu>
Joint with: Tomas Vinar, Rute Da Fonseca, Melissa Hubisz,
Carlos Bustamante, Rasmus Nielsen and Adam Siepel
Positive selection in six mammalian genomes
• 6 high-quality genomes of eutherian mammals
• 16529 human / chimp / macaque / mouse / rat / dog orthologous genes
• 544 genes identified to be under positive selection using codon models
[Figure: phylogeny of human, chimp, macaque, mouse, rat and dog; scale bar 0.05 subst/site]
Codon models
\[
Q_{ij} =
\begin{cases}
0 & i, j \text{ differ by} > 1 \text{ nucleotide} \\
\pi_j & i, j \text{ synonymous transversion} \\
\kappa \pi_j & i, j \text{ synonymous transition} \\
\omega \pi_j & i, j \text{ nonsynonymous transversion} \\
\omega \kappa \pi_j & i, j \text{ nonsynonymous transition}
\end{cases}
\]
where
κ : transition/transversion rate ratio
π_j : equilibrium frequency of codon j
ω : nonsynonymous/synonymous rate ratio
ω < 1 purifying selection, ω = 1 neutral evolution, ω > 1 positive selection
(Goldman & Yang, 1994; Yang et al., 2000)
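A minimal Python sketch of the rate matrix defined above, assuming the standard genetic code and, unless frequencies are supplied, equal codon frequencies. The names `rate` and `build_q` are hypothetical, and the diagonal is filled by the usual convention that each row of a rate matrix sums to zero, which the slide does not state.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
SENSE = [c for c in CODE if CODE[c] != "*"]  # the 61 sense codons
PURINES = {"A", "G"}


def is_transition(x, y):
    """True for A<->G or C<->T changes (x and y are different bases)."""
    return (x in PURINES) == (y in PURINES)


def rate(ci, cj, kappa, omega, pi_j):
    """Off-diagonal rate Q_ij from codon ci to codon cj."""
    diffs = [(a, b) for a, b in zip(ci, cj) if a != b]
    if len(diffs) != 1:
        return 0.0  # codons differ at more than one position (or are equal)
    r = pi_j
    if is_transition(*diffs[0]):
        r *= kappa  # transition
    if CODE[ci] != CODE[cj]:
        r *= omega  # nonsynonymous change
    return r


def build_q(kappa, omega, pi=None):
    """61 x 61 rate matrix as nested lists; rows sum to zero."""
    n = len(SENSE)
    if pi is None:
        pi = [1.0 / n] * n  # assumption: equal codon frequencies
    q = [[rate(SENSE[i], SENSE[j], kappa, omega, pi[j]) for j in range(n)]
         for i in range(n)]
    for i in range(n):
        q[i][i] = -sum(q[i])  # diagonal entry is still 0.0 at this point
    return q
```

With `kappa = 2.0`, `omega = 0.5` and `pi_j = 0.1`, the change TTT to TTC (synonymous transition) gets rate κπ_j = 0.2, and TTT to CTT (nonsynonymous transition) gets ωκπ_j = 0.1.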
Branch-Site Likelihood
Ratio Tests (LRTs)
• Based on continuous-time Markov models of codon evolution
• Compare null model allowing for negative
selection (ω<1) or neutral evolution (ω=1)
with alternative model additionally allowing
for positive selection (ω>1)
• Both models allow ω to vary across sites
• Can have foreground branches with PS and background
branches without
• Applied separately to each gene
(Nielsen & Yang, 1998; Yang & Nielsen, 2002)
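A rough sketch of how a single gene's LRT could be evaluated once both models have been fitted. The slides do not state the null distribution; the chi-square with 1 degree of freedom and the optional 50:50 mixture with a point mass at zero are assumptions (both are commonly used for this kind of test). The function names and log-likelihood values are hypothetical.

```python
import math


def lrt_statistic(lnl_null, lnl_alt):
    """Twice the log-likelihood gain of the alternative, floored at zero."""
    return max(0.0, 2.0 * (lnl_alt - lnl_null))


def chi2_1_sf(x):
    """P(chi-square with 1 df > x), via the complementary error function."""
    return math.erfc(math.sqrt(x / 2.0))


def lrt_pvalue(lnl_null, lnl_alt, mixture=True):
    """P-value under chi2_1, or under a 50:50 mix of 0 and chi2_1."""
    stat = lrt_statistic(lnl_null, lnl_alt)
    if stat == 0.0:
        return 1.0
    p = chi2_1_sf(stat)
    return 0.5 * p if mixture else p


# Hypothetical log-likelihoods for one gene
p = lrt_pvalue(lnl_null=-1000.0, lnl_alt=-998.0)
```

Because the test is applied separately to each gene, the resulting p-values would still need a multiple-testing correction across genes.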
Branch and clade LRTs
[Figure: number of genes detected by each LRT: 400 (no label recovered); human 10; rodent branch 56; chimp 18; rodent clade 61; hominid 7; macaque 10; primate branch 21; primate clade 24]
Total: 544 positively selected genes (PSGs) identified
Co-evolution in complement immunity
[Figure legend: P<0.05, FDR<0.05]
2^9 - 1 = 511 possible selection histories on the 9-branch mammalian phylogeny
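The count can be checked by enumeration. This sketch assumes a selection history is a 0/1 label (non-selected/selected) on each of the 9 branches, with the all-zero history excluded, which is what 2^9 - 1 implies.

```python
from itertools import product

N_BRANCHES = 9

# Every 0/1 labeling of the branches except the one with no selection at all
histories = [h for h in product((0, 1), repeat=N_BRANCHES) if any(h)]
n_histories = len(histories)  # 2**9 - 1
```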
Why Bayesian Model Selection?
• Many of the likelihoods of the 511 models might be
very similar or identical.
• Models are not nested.
• Bayesian analysis looks at distribution of selection
histories.
• Bayesian analysis allows “soft” (probabilistic)
choices of selection histories.
• We can compute prevalence of selection on
individual branches and clades that considers
uncertainty of selection histories.
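As an illustration of the last point, the probability that a gene is under selection on a given branch can be obtained by summing posterior probabilities over all histories that mark that branch as selected. The sketch below assumes histories are 0/1 tuples, one entry per branch; the function name and the three-branch posterior are made up for the example.

```python
def branch_prevalence(posterior):
    """Marginal probability of selection per branch.

    posterior: dict mapping a history (tuple of 0/1, one per branch)
    to its posterior probability for one gene.
    """
    n_branches = len(next(iter(posterior)))
    total = sum(posterior.values())  # normalise in case weights are unscaled
    return [
        sum(p for h, p in posterior.items() if h[b] == 1) / total
        for b in range(n_branches)
    ]


# Hypothetical posterior over three histories on a 3-branch toy tree
toy_posterior = {(1, 0, 0): 0.5, (1, 1, 0): 0.3, (0, 0, 1): 0.2}
prevalence = branch_prevalence(toy_posterior)  # about [0.8, 0.3, 0.2]
```

Summing these per-gene marginals over genes gives an expected number of genes under selection on each branch without committing to a single best history per gene.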
Bayesian Switching Model
Two evolutionary modes:
Selected
Non-selected
Parameters describing the switching process:
n_{b,G} : probability that a gene gains positive selection on branch b
n_{b,L} : probability that a gene loses positive selection on branch b
Bayesian Switching Model
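Let X = (X1, …, XN) be the alignment data,
with Xi the alignment of the ith gene.
Let Z = (Z1, …, ZN) be the set of selection histories,
with Zi denoting the history of the ith gene.
The switching parameters are the gain and loss probabilities for each branch.
Assume independence of genes (alignments X and histories Z), and
conditional independence of X and the switching parameters given Z,
so the joint distribution factorizes over genes.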
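Under these assumptions the joint distribution takes the following form. This is a reconstruction: θ is an assumed symbol for the set of switching parameters, since the slide's own symbol and equation were not recoverable.

```latex
\[
P(X, Z, \theta) \;=\; P(\theta) \prod_{i=1}^{N} P(Z_i \mid \theta)\, P(X_i \mid Z_i)
\]
```

Here P(θ) is the prior on the switching parameters, P(Z_i | θ) is the probability of the ith gene's selection history, and P(X_i | Z_i) is the codon-model likelihood of its alignment.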
Mapping selection histories to switches (cont.)
[Figure: example tree with each branch labeled by a pair of selection states: (0,0), (0,1) or (1,1)]
Gain of pos. selection (0,1) : n_{b,G}
Absence of gain of pos. selection (0,0) : 1 - n_{b,G}
Loss of pos. selection (1,0) : n_{b,L}
Absence of loss of pos. selection (1,1) : 1 - n_{b,L}
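The mapping above turns a selection history into a product of switching probabilities, one factor per branch. A small sketch, assuming each branch is described by its pair of states (start, end); the branch names and parameter values are hypothetical, and how the state at the root is treated is not given in the slides and is left out here.

```python
def history_probability(transitions, gain, loss):
    """Probability of one selection history given the switching parameters.

    transitions: list of (branch, start_state, end_state) with states 0 or 1
    gain, loss:  dicts mapping branch name -> gain / loss probability
    """
    prob = 1.0
    for branch, start, end in transitions:
        if start == 0:
            # (0,1) is a gain, (0,0) is the absence of a gain
            prob *= gain[branch] if end == 1 else 1.0 - gain[branch]
        else:
            # (1,0) is a loss, (1,1) is the absence of a loss
            prob *= loss[branch] if end == 0 else 1.0 - loss[branch]
    return prob


# Hypothetical four-branch example
branches = ["b1", "b2", "b3", "b4"]
gain = {b: 0.1 for b in branches}
loss = {b: 0.2 for b in branches}
toy_history = [("b1", 0, 0), ("b2", 0, 1), ("b3", 1, 1), ("b4", 1, 0)]
p_history = history_probability(toy_history, gain, loss)  # 0.9 * 0.1 * 0.8 * 0.2
```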
Bayesian Switching model
Putting everything together, the model combines:
• a prior on the switching parameters: Beta distribution (α = 1, β = 9)
• likelihoods from codon models assuming selection histories Zj
• the product of the relevant switching probabilities
Gibbs sampling
The variables Z and the switching parameters are unobserved. We sample from
their joint posterior distribution given X with a Gibbs sampler that alternates
between sampling each Zi conditional on Xi and the previously sampled switching
parameters, and sampling the switching parameters conditional on the previously
sampled Z.
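The alternation can be illustrated on a deliberately reduced toy model; this is a sketch, not the model from the talk. Assumptions: each gene has only two candidate histories (Z_i = 0 or 1), a single gain probability `g` is shared by all genes, and the per-gene likelihoods are already computed. The Beta(1, 9) prior matches the hyperparameters quoted in the slides; all names and numbers are hypothetical.

```python
import random


def gibbs(lik, n_iter=2000, burn_in=500, a=1.0, b=9.0, seed=0):
    """Toy Gibbs sampler.

    lik[i] = (P(X_i | Z_i = 0), P(X_i | Z_i = 1)) for gene i.
    Returns the posterior mean of g and P(Z_i = 1 | X) for each gene.
    """
    rng = random.Random(seed)
    n = len(lik)
    g = a / (a + b)  # start at the prior mean
    g_sum = 0.0
    z_counts = [0] * n
    for it in range(n_iter):
        # Step 1: sample each Z_i given X_i and the current g
        z = []
        for l0, l1 in lik:
            w0 = l0 * (1.0 - g)
            w1 = l1 * g
            z.append(1 if rng.random() < w1 / (w0 + w1) else 0)
        # Step 2: sample g given Z (the Beta prior is conjugate)
        k = sum(z)
        g = rng.betavariate(a + k, b + n - k)
        if it >= burn_in:
            g_sum += g
            for i, zi in enumerate(z):
                z_counts[i] += zi
    kept = n_iter - burn_in
    return g_sum / kept, [c / kept for c in z_counts]


# Hypothetical likelihoods: 8 genes favour Z = 1, 2 genes favour Z = 0
toy_lik = [(1e-3, 1.0)] * 8 + [(1.0, 1e-3)] * 2
mean_g, z_post = gibbs(toy_lik)
```

In the full model each Zi would range over the selection histories on the tree and there would be separate gain and loss parameters per branch, but the two alternating steps keep the same shape.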
Inferred Rates of Gain and Loss
[Figure: panels labeled gain and loss]
Episodic selection on the
mammalian tree
• Most genes appear to have switched between
evolutionary modes multiple times.
• Posterior expected number of mode switches: 1.6
(0.6 gains, 1.0 losses)
• An expected 95% of PSGs have experienced a mode switch at least
once, 53% at least twice.
• These observations are qualitatively in agreement
with Gillespie’s episodic molecular clock.
Inferred Number of Genes Under Positive Selection
[Figure annotated with intervals for the inferred number of genes under positive selection; labels not recovered: 119-162, 183-232, 32-62, 234-327, 219-257, 338-382, 318-360, 255-325, 357-426, 204-278, 213-292, 281-333]
Complement components C7 and C8B
C7: PP=0.98
C8B: PP=0.93
• Components C7 and C8B encode proteases in the
membrane attack complex
• Differences in complement proteases are thought to
explain certain differences in immune responses of
humans and rodents.
(Puente et al, 2003)
Glycoprotein hormones CGA
PP = 0.82
• CGA is the alpha subunit of chorionic gonadotropin,
luteinizing hormone, follicle-stimulating hormone, and
thyroid-stimulating hormone.
• The alpha subunits of the 4 hormones are identical; however,
their beta chains are unique and confer biological
specificity.
• Beta subunits CGB1 and CGB2 are thought to have
originated from gene duplication in the common ancestor
of humans and great apes.
Summary and Future Work
• Bayesian analysis allows the study of patterns and
the episodic nature of positive selection on the
mammalian tree.
• Most probable selection histories can be identified
for individual genes.
• Ideally, we would like to model mode switches in
continuous time.
• Compare functions of genes with high and low
expected number of switches.
• Is the selection history predictive of function?
Resource
http://compgen.bscb.cornell.edu/projects/mammal-psg/
Thanks
Siepel Lab (Cornell)
Adam Siepel, Tomas Vinar, Brona Brejova,
Adam Diehl, Andre Luis Martins
Bustamante Lab (Cornell)
Carlos Bustamante, Adam Boyko, Adam Auton, Keyan Zhao,
Abra Brisbin, Kasia Bryc, Jeremiah Degenhardt,
Lin Li, Kirk Lohmueller, Weisha Michelle Zhu, Amit Indap
Nielsen lab (Berkeley)
Rasmus Nielsen
Rute Da Fonseca
NIH and NSF for funding