Calculation of numbers of synonymous and
non-synonymous substitutions per site using the method of
Nei & Gojobori (1986).
Show that syn and non-syn sites evolve at different rates.
Need to calculate:
S = no. syn sites
N = no. non-syn sites
Sd = no. syn differences
Nd = no. non-syn differences
Now define:
DS = Sd/S (fraction of syn sites that differ)
DN = Nd/N (fraction of non-syn sites that differ)
These are equivalent to D in the Jukes-Cantor model.
We can use the JC distance formula to calculate two
evolutionary distances.
dS = -3/4 ln(1 - 4DS/3)   (no. of syn subs per syn site)
dN = -3/4 ln(1 - 4DN/3)   (no. of non-syn subs per non-syn site)
These are equivalent to the usual Jukes-Cantor d, which is
the number of substitutions per site if all sites are
equivalent.
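The distance formula above can be sketched in a few lines of Python. The function name jc_distance and the sample proportions are illustrative assumptions, not part of the original slides. The check on p reflects the fact that the logarithm is undefined once the proportion of differing sites reaches 3/4.

```python
import math


def jc_distance(p):
    """Jukes-Cantor distance d = -3/4 ln(1 - 4p/3).

    p is the observed proportion of sites that differ (DS or DN above).
    """
    if not 0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)


# Illustrative proportions only: d stays close to D when D is small
# and grows much faster than D as D approaches 0.75.
for p in (0.01, 0.1, 0.3, 0.6):
    print(f"D = {p:.2f}  ->  d = {jc_distance(p):.3f}")
```

The same function serves for both dS and dN; only the proportion passed in changes.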
For most pairs of homologous protein-coding sequences, we
expect dS > dN because purifying selection removes many
non-syn subs and so slows down their rate.
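If we know the time t since two species diverged, we can
calculate the rates of syn and non-syn subs:
dS/2t and dN/2t.
The factor of 2 is there because substitutions accumulate
along both lineages, so the two sequences are separated by
a total time of 2t.
If t is measured in millions of years, these rates are
numbers of subs per site per million years.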
If we don’t know t, we can still compare the two distances.
The ratio dN/dS tells us how much slower the non-syn subs
are.
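As a small illustration of the rate and ratio calculations, the sketch below uses a hypothetical helper, substitution_rates, with made-up input values (dS = 0.5, dN = 0.05, t = 50 million years). None of these numbers come from the slides; they only show the arithmetic.

```python
def substitution_rates(dS, dN, t):
    """Return (syn rate, non-syn rate, dN/dS) for divergence time t.

    The rates are per site per unit of t (per million years if t is
    given in millions of years).
    """
    if t <= 0:
        raise ValueError("t must be positive")
    if dS <= 0:
        raise ValueError("dS must be positive to form the ratio dN/dS")
    return dS / (2 * t), dN / (2 * t), dN / dS


# Hypothetical values chosen only to illustrate the calculation.
syn_rate, nonsyn_rate, ratio = substitution_rates(dS=0.5, dN=0.05, t=50.0)
print(f"syn rate     = {syn_rate:.4f} subs per site per million years")
print(f"non-syn rate = {nonsyn_rate:.4f} subs per site per million years")
print(f"dN/dS        = {ratio:.2f}")
```

The ratio does not depend on t, which is why it can still be used when the divergence time is unknown.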
Notation:
d is sometimes called K
dS is sometimes called KS
dN is sometimes called KA (where the A means amino acid
subs)
dN/dS is the same thing as KA/KS
Codon   Seq 1        Seq 2
1       Pro  CCC     CCC  Pro
2       Phe  UUU     UUC  Phe
3       Gly  GGG     GCG  Ala
4       Leu  UUA     CUA  Leu
5       Phe  UUU     GUA  Val
Calculate S for each codon.
Check the genetic code:
A fourfold degenerate site counts as S = 1 (N = 0)
A non-degenerate site counts as S = 0 (N = 1)
A twofold degenerate site counts as S = 1/3 (N = 2/3)
1. S = 0 + 0 + 1 = 1
2. S = 0 + 0 + 1/3 = 1/3
3. S = 0 + 0 + 1 = 1 (whether we look at Gly or Ala codons)
4. for UUA, S = 1/3 + 0 + 1/3 = 2/3
for CUA, S = 1/3 + 0 + 1 = 4/3
Take the average of these: S = 1 for codon 4.
5. for UUU, S = 1/3
for GUA, S = 1
Take average: S = 2/3
For whole sequence, S = 1 + 1/3 + 1 + 1 + 2/3 = 4
N = total number of sites - S = 15 - 4 = 11
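The site counting above can be reproduced with a short Python sketch. It rests on a few assumptions that are not in the slides: the names syn_sites, BASES, AMINO_ACIDS and CODE are illustrative, the standard genetic code is used, and a change to a stop codon is simply counted as non-synonymous (this does not affect any codon in the example). Codon 3 of Seq 2 is entered as GCG, the Ala codon that matches S = 1 for that position; GAG would encode Glu.

```python
from fractions import Fraction
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def syn_sites(codon):
    """Number of synonymous sites in one codon.

    Each position contributes the fraction of its three possible
    single-base changes that leave the amino acid unchanged.
    """
    total = Fraction(0)
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            if CODE[mutant] == CODE[codon]:
                total += Fraction(1, 3)
    return total


seq1 = ["CCC", "UUU", "GGG", "UUA", "UUU"]
seq2 = ["CCC", "UUC", "GCG", "CUA", "GUA"]

# Average the two sequences codon by codon, as in the worked example.
per_codon = [(syn_sites(a) + syn_sites(b)) / 2 for a, b in zip(seq1, seq2)]
S = sum(per_codon)
N = 3 * len(seq1) - S
print("S per codon:", [str(s) for s in per_codon])
print("S =", S, " N =", N)
```

Using Fraction keeps the thirds exact, so the totals can be compared directly with the hand calculation.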
Calculate Sd and Nd for each codon.
1. Sd = 0, Nd = 0
2. Sd = 1, Nd = 0
3. Sd = 0, Nd = 1
4. Sd = 1, Nd = 0
5. UUU (Phe) and GUA (Val) differ at two positions, so this
could happen two ways:
route 1: UUU --> GUU --> GUA   Sd = 1, Nd = 1
route 2: UUU --> UUA --> GUA   Sd = 0, Nd = 2
Take average of these two: Sd = 0.5, Nd = 1.5
(note that if all three positions were different there would be
6 routes to average)
Total Sd = 2.5
Total Nd = 2.5
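The route averaging can be sketched in Python as follows. The function name count_differences and the table names are illustrative assumptions. Unlike fuller implementations of the method, this sketch does not exclude routes that pass through a stop codon; none arise in this example. Codon 3 of Seq 2 is entered as GCG (Ala), as GAG would encode Glu.

```python
from fractions import Fraction
from itertools import permutations, product

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def count_differences(c1, c2):
    """Return (Sd, Nd) for a codon pair, averaged over all routes.

    A route is one ordering of the single-base changes that turn c1
    into c2; each step is classed as synonymous or non-synonymous.
    """
    diff = [i for i in range(3) if c1[i] != c2[i]]
    if not diff:
        return Fraction(0), Fraction(0)
    routes = list(permutations(diff))
    sd = nd = Fraction(0)
    for order in routes:
        current = c1
        for pos in order:
            nxt = current[:pos] + c2[pos] + current[pos + 1:]
            if CODE[nxt] == CODE[current]:
                sd += 1
            else:
                nd += 1
            current = nxt
    return sd / len(routes), nd / len(routes)


seq1 = ["CCC", "UUU", "GGG", "UUA", "UUU"]
seq2 = ["CCC", "UUC", "GCG", "CUA", "GUA"]

pairs = [count_differences(a, b) for a, b in zip(seq1, seq2)]
Sd = sum(p[0] for p in pairs)
Nd = sum(p[1] for p in pairs)
print("Sd =", float(Sd), " Nd =", float(Nd))
```

With one differing position there is a single route, with two there are 2 routes, and with three there are 6, which is what permutations generates.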
DS = 2.5/4 = 0.625, so dS = 1.34
DN = 2.5/11 = 0.227, so dN = 0.271
dN/dS = 0.271/1.34 = 0.20
The non-syn rate is about five times slower than the syn rate
in this example.
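The final arithmetic can be checked with a few lines of Python, starting from the totals found above (S = 4, N = 11, Sd = Nd = 2.5). The variable names are illustrative; the formula is the Jukes-Cantor correction already used in this example.

```python
import math

# Totals from the worked example.
S, N = 4, 11
Sd, Nd = 2.5, 2.5

# Proportions of syn and non-syn sites that differ.
DS = Sd / S
DN = Nd / N

# Jukes-Cantor correction applied separately to each class of site.
dS = -0.75 * math.log(1 - 4 * DS / 3)
dN = -0.75 * math.log(1 - 4 * DN / 3)

print(f"DS = {DS:.3f}, dS = {dS:.2f}")
print(f"DN = {DN:.3f}, dN = {dN:.3f}")
print(f"dN/dS = {dN / dS:.2f}")
```

With only five codons these estimates are very rough; the example is meant to show the method, not to give a reliable ratio.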