Molecular Evolution (with an emphasis on substitution rates)

Gavin JD Smith
State Key Laboratory of Emerging Infectious Diseases
& Department of Microbiology
The University of Hong Kong
Bioinformatic and Comparative Genome Analysis Course
HKU-Pasteur Research Centre - Hong Kong
August 17 - August 29, 2009
Why?
“Understanding the selective pressures that
have shaped genetic variation is a central goal in
the study of evolutionary biology” Pond et al. 2007
Dynamics of evolution
• The diversity exhibited by a population reflects the organism's natural history
• The genetic diversity of a population is a combination of:
- biological properties (e.g. mutation rates, generation time)
- evolutionary forces (e.g. molecular adaptation, genetic drift)
• Three principal mechanisms are responsible for viral genetic variation:
- mutation
- selection
- recombination
Mutations
As nonsynonymous (β) mutations directly alter
proteins (& potentially their function) they are
more likely to affect organism fitness than
synonymous (α) mutations that leave the amino
acid sequence unchanged
Mutations
• Mutations that result in amino acid changes are nonsynonymous
• Mutations that do not result in amino acid changes are silent or synonymous
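The distinction above can be sketched in a few lines of Python. This is a minimal illustration that assumes the standard genetic code; the names `CODON_TABLE` and `classify_change` are hypothetical and do not come from the course material.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
# Assumption: no alternative code (e.g. a mitochondrial code) is in use.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_change(codon_before, codon_after):
    """Label a change between two codons by its effect on the amino acid."""
    aa_before = CODON_TABLE[codon_before.upper()]
    aa_after = CODON_TABLE[codon_after.upper()]
    return "synonymous" if aa_before == aa_after else "nonsynonymous"


print(classify_change("CTT", "CTC"))  # Leu -> Leu
print(classify_change("CTT", "CCT"))  # Leu -> Pro
```

Third-position changes such as CTT to CTC are often synonymous because of the redundancy of the genetic code, whereas most second-position changes alter the amino acid.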
Selective pressures
• Selective pressure on coding sequences can be
calculated by comparison of the relative rates
of α & β mutations
• The ratio ω = β/α (also referred to as dN/dS or
KA/KS) is a standard measure of selective
pressure
Selective pressures
• ω ≈ 1 indicates neutral evolution, ω < 1
negative (or purifying) selection, & ω > 1
positive (or diversifying) selection
• To infer selective pressures it is necessary to
be able to accurately estimate
nonsynonymous & synonymous rates
– this is where models come in (discussed later)
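The interpretation of ω can be written out as a small Python sketch. The function names and the `tolerance` band around 1 are illustrative assumptions, not part of the course material; in a real analysis the question of whether ω differs from 1 is settled with a statistical test on model-based rate estimates, not with a fixed cutoff.

```python
def omega(beta, alpha):
    """Ratio of the nonsynonymous rate (beta) to the synonymous rate (alpha)."""
    if alpha <= 0:
        raise ValueError("the synonymous rate must be positive to form the ratio")
    return beta / alpha


def interpret(w, tolerance=0.05):
    """Map an omega value to a selection regime.

    The tolerance is an arbitrary illustrative band around 1.
    """
    if w > 1 + tolerance:
        return "positive (diversifying) selection"
    if w < 1 - tolerance:
        return "negative (purifying) selection"
    return "neutral evolution"


for beta, alpha in [(0.2, 1.0), (1.0, 1.0), (2.5, 1.0)]:
    w = omega(beta, alpha)
    print(f"beta={beta}, alpha={alpha}, omega={w:.2f}: {interpret(w)}")
```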
Evolutionary rates and Selection
• Mutations have evolutionary consequences ONLY if they are successfully transmitted to the next generation
MUTATION RATE:
Number of nucleotide alterations per round of replication
SUBSTITUTION (or EVOLUTION) RATE:
Number of nucleotide alterations fixed in a population per unit of time
• The rate of evolution of a virus reflects the relative proportion of advantageous, neutral or deleterious evolutionary forces exerted on it
Selective pressures
• Under negative selection less ‘fit’
nonsynonymous subst. accumulate more
slowly than synonymous subst.
• Alternatively expressed, negative selection
exerts pressure to remove deleterious subst.
from a population
• Positive selection acts to fix more ‘fit’ or
advantageous subst. in a population
Evolutionary models
• Necessary for accurate rate estimation
• Current models take either the nucleotide or
the codon as the unit of evolution
• The structure of the genetic code means
that realistic models of evolution should
consider triplets of nucleotides (i.e. codons) to
be the basic unit of evolution
Nucleotide based models
• Nucleotide substitution models
– each nucleotide position of an alignment is
treated independently
• Codon position substitution models
– partitions nucleotide data so that codon positions
1, 2 & 3 may have different parameters
– SRD06 model has two categories 1+2 & 3
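The partitioning step behind codon position models can be sketched as follows. This is a minimal Python illustration that assumes an in-frame coding sequence starting at codon position 1; the function name `partition_codon_positions` is hypothetical, and the snippet covers only the split into the two categories (1+2 and 3), not the fitting of a substitution model to each partition.

```python
def partition_codon_positions(sequence):
    """Split an in-frame coding sequence into codon positions 1+2 and 3."""
    if len(sequence) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    # Index i counts from 0, so i % 3 == 2 marks the third codon position.
    first_and_second = "".join(b for i, b in enumerate(sequence) if i % 3 != 2)
    third = sequence[2::3]
    return first_and_second, third


positions_12, positions_3 = partition_codon_positions("ATGGCCAAA")
print(positions_12)  # sites drawn from codon positions 1 and 2
print(positions_3)   # sites drawn from codon position 3
```

Each partition can then be given its own substitution parameters, which lets the faster-evolving third positions differ from the first and second positions.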
Codon based models
• A model of DNA sequence evolution
applicable to coding regions
• Uses the codon, as opposed to the nucleotide,
as the unit of evolution
• Accounts for dependencies among
nucleotides within a codon
• Most commonly used are GY94 (Goldman &
Yang) and MG94 (Muse & Gaut)
Nucleotide substitution models
as an example
Models of nucleotide evolution
Several probabilistic models of evolution have been developed to convert observed
nucleotide distances into measures of actual evolutionary distances
The relative complexity of these models is a function of the extent of the biological,
biochemical and evolutionary assumptions (i.e. parameters) they incorporate
Substitutions are usually described as probabilities of mutational events, mathematically
modeled by matrices of relative rates
Jukes-Cantor (JC)
• First proposed model
• It assumes that the four bases have equal frequencies and all substitutions are equally likely
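Under these assumptions the observed proportion of differing sites p between two sequences can be corrected to an evolutionary distance with the standard Jukes-Cantor formula d = -(3/4) ln(1 - 4p/3). The Python sketch below is a minimal illustration; the function names are hypothetical, and sites containing gaps or ambiguity codes are simply skipped, which is an assumption of this example.

```python
import math


def p_distance(seq_a, seq_b):
    """Proportion of compared sites at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [
        (a, b)
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in "ACGT" and b in "ACGT"  # skip gaps and ambiguity codes
    ]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def jc69_distance(p):
    """Jukes-Cantor corrected distance (substitutions per site)."""
    if p >= 0.75:
        raise ValueError("p >= 0.75: sequences are saturated under JC")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


p = p_distance("ACGTACGTAC", "ACGTACGTAT")
print(p, jc69_distance(p))
```

The corrected distance is always at least as large as p, because the correction accounts for multiple substitutions at the same site that leave no visible trace.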
Kimura’s 2 parameter
• Transitions are generally more frequent than transversions
• K2P model assumes that the rate of transitions per site (α) differs from the rate of transversions per site (β)
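The K2P correction can be sketched in Python using the standard distance formula d = -(1/2) ln((1 - 2P - Q) sqrt(1 - 2Q)), where P and Q are the observed proportions of transitional and transversional differences. The function name is hypothetical, and skipping sites with gaps or ambiguity codes is an assumption of this example. Note that α and β here denote transition and transversion rates, not the synonymous and nonsynonymous rates used earlier.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a, seq_b):
    """Kimura two-parameter distance between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    n = transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        n += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; every other change is a transversion.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if n == 0:
        raise ValueError("no comparable sites")
    p, q = transitions / n, transversions / n
    return -0.5 * math.log((1.0 - 2.0 * p - q) * math.sqrt(1.0 - 2.0 * q))


seq_a = "A" * 10 + "C" * 10
seq_b = "GG" + "A" * 8 + "C" * 9 + "A"  # two transitions, one transversion
print(k2p_distance(seq_a, seq_b))
```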
Felsenstein (1981)
• If some bases are more common than others, some substitutions may be more frequent than others
• F81 model allows the frequency (π) of the four nucleotides to be different
Hasegawa, Kishino and Yano
• The HKY85 model allows rates of transitions and transversions to differ and base frequencies to vary
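One common way to write the HKY85 model is as an instantaneous rate matrix in which the rate from base i to base j is κπ_j for transitions and π_j for transversions, with each diagonal entry set so that the row sums to zero. The Python sketch below assumes this parameterisation; the function name and the symbol κ (the transition/transversion rate ratio) are illustrative, and the matrix is left unscaled.

```python
BASES = "ACGT"
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def hky85_rate_matrix(freqs, kappa):
    """Unscaled HKY85 rate matrix as a nested dict q[i][j].

    freqs: dict of base -> equilibrium frequency (must sum to 1)
    kappa: transition/transversion rate ratio
    """
    if abs(sum(freqs[b] for b in BASES) - 1.0) > 1e-9:
        raise ValueError("base frequencies must sum to 1")
    q = {i: {} for i in BASES}
    for i in BASES:
        for j in BASES:
            if i != j:
                rate = kappa if (i, j) in TRANSITIONS else 1.0
                q[i][j] = rate * freqs[j]
        # The diagonal makes each row sum to zero.
        q[i][i] = -sum(q[i].values())
    return q


freqs = {"A": 0.1, "C": 0.2, "G": 0.3, "T": 0.4}
q = hky85_rate_matrix(freqs, kappa=2.0)
for i in BASES:
    print(i, [round(q[i][j], 3) for j in BASES])
```

Setting κ = 1 recovers F81, equal base frequencies recover K2P, and both together recover JC, which is why these models are described as nested.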
General Time Reversible
• The GTR/REV model allows each possible substitution to have its own rate
• Substitutions are reversible (i.e. at equilibrium, a substitution from i to j is as probable as a substitution from j to i)
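The reversibility condition can be checked numerically. The Python sketch below assumes the usual GTR parameterisation, in which the rate from i to j is a symmetric exchangeability r_ij multiplied by the frequency π_j of the target base; the function names and the example parameter values are hypothetical.

```python
from itertools import combinations

BASES = "ACGT"


def gtr_rate_matrix(freqs, exchangeabilities):
    """Unscaled GTR rate matrix as a nested dict q[i][j].

    exchangeabilities: dict keyed by unordered base pairs ("AC", "AG", ...)
    """
    q = {i: {} for i in BASES}
    for i, j in combinations(BASES, 2):
        r = exchangeabilities[i + j]
        q[i][j] = r * freqs[j]
        q[j][i] = r * freqs[i]
    for i in BASES:
        q[i][i] = -sum(q[i].values())  # each row sums to zero
    return q


def is_time_reversible(q, freqs, tol=1e-12):
    """Check detailed balance: pi_i * q_ij == pi_j * q_ji for every pair."""
    return all(
        abs(freqs[i] * q[i][j] - freqs[j] * q[j][i]) <= tol
        for i, j in combinations(BASES, 2)
    )


freqs = {"A": 0.1, "C": 0.2, "G": 0.3, "T": 0.4}
rates = {"AC": 1.0, "AG": 4.0, "AT": 0.5, "CG": 0.8, "CT": 5.0, "GT": 1.0}
q = gtr_rate_matrix(freqs, rates)
print(is_time_reversible(q, freqs))
```

Because only the six exchangeabilities are free, the individual rates q_ij and q_ji generally differ whenever the base frequencies differ, even though the model remains reversible.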
After Whelan et al. 2001
Rate heterogeneity
• Different regions of RNA/DNA may have different probabilities of change, and variable rates of substitution can have considerable impact on sequence divergence
• Typically, a gamma distribution is used to describe heterogeneity in nucleotide substitution rate across sites
• The range of rate variation among sites is dictated by the shape parameter α of the distribution
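The effect of the shape parameter can be illustrated by simulation. The Python sketch below draws relative site rates from a gamma distribution with mean 1 (shape α, scale 1/α), a common convention that is assumed here; the function names, the seed, and the 0.1 threshold for a "nearly invariant" site are illustrative choices.

```python
import random


def sample_site_rates(alpha, n_sites, seed=0):
    """Draw relative rates from a gamma distribution with mean 1."""
    rng = random.Random(seed)
    # random.gammavariate takes (shape, scale); scale 1/alpha gives mean 1.
    return [rng.gammavariate(alpha, 1.0 / alpha) for _ in range(n_sites)]


def fraction_below(rates, threshold):
    """Fraction of sites evolving slower than the given relative rate."""
    return sum(r < threshold for r in rates) / len(rates)


for alpha in (0.5, 5.0):
    rates = sample_site_rates(alpha, 20000)
    mean = sum(rates) / len(rates)
    slow = fraction_below(rates, 0.1)
    print(f"alpha={alpha}: mean rate={mean:.2f}, fraction below 0.1={slow:.3f}")
```

A small α (below 1) gives strong heterogeneity, with many nearly invariant sites and a few fast ones, whereas a large α concentrates the rates around the mean so that sites evolve at similar speeds.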
Beware of recombination!!
• Many phylogenetic methods implicitly assume
that all sites in a sequence share a common
evolutionary history
• However, recombination can violate this
assumption by allowing sites to move freely
between different genetic backgrounds
• This may cause different sections of an alignment
to lead to contradictory estimates of the tree and
subsequently confuse model inferences
Global vs. Local ω models
• Global – fits a single
model to a given
alignment & tree (i.e. all
branches are equal)
• Local – can fit a unique set
of substitution rates to
every branch in a tree
Acknowledgements
• HKU: Vijaykrishna Dhanasekaran & Justin Bahl
for help with preparing the presentation &
practical component
• Estimating selection pressures on alignments
of coding sequences: Analyses using HyPhy.
Edited by Sergei L. Kosakovsky Pond, Art F.Y.
Poon, and Simon D.W. Frost