Calculation of the numbers of synonymous and non-synonymous substitutions per site using the method of Nei & Gojobori (1986).

Aim: show that synonymous (syn) and non-synonymous (non-syn) sites evolve at different rates.

We need to calculate:
S = no. of syn sites
N = no. of non-syn sites
Sd = no. of syn differences
Nd = no. of non-syn differences

Now define:
DS = Sd/S (fraction of syn sites that differ)
DN = Nd/N (fraction of non-syn sites that differ)

These are equivalent to D in the Jukes-Cantor model. We can use the JC distance formula to calculate two evolutionary distances:
dS = -3/4 ln(1 - 4DS/3) (no. of syn subs per syn site)
dN = -3/4 ln(1 - 4DN/3) (no. of non-syn subs per non-syn site)

These are equivalent to the usual Jukes-Cantor d, which is the number of substitutions per site if all sites are equivalent.

For two homologous protein-coding sequences, we generally expect dS > dN because selection against amino acid changes slows down the rate of non-syn subs.

If we know the time t since two species diverged, we can calculate the rates of syn and non-syn subs: dS/2t and dN/2t. The factor of 2 is there because substitutions accumulate along both lineages. With t measured in millions of years, these rates are numbers of subs per site per million years.

If we don’t know t, we can still compare the two distances. The ratio dN/dS tells us how much slower the non-syn subs are.

Notation:
d is sometimes called K
dS is sometimes called KS
dN is sometimes called KA (where the A means amino acid subs)
dN/dS is the same thing as KA/KS

Example alignment:
     Seq 1      Seq 2
1    Pro CCC    CCC Pro
2    Phe UUU    UUC Phe
3    Gly GGG    GCG Ala
4    Leu UUA    CUA Leu
5    Phe UUU    GUA Val

Calculate S for each codon. Check the genetic code, and score each of the three positions in the codon:
A fourfold degenerate site counts as S = 1 (N = 0)
A non-degenerate site counts as S = 0 (N = 1)
A twofold degenerate site counts as S = 1/3 (N = 2/3)

1. S = 0 + 0 + 1 = 1
2. S = 0 + 0 + 1/3 = 1/3
3. S = 0 + 0 + 1 = 1 (whether we look at Gly or Ala codons)
4. for UUA, S = 1/3 + 0 + 1/3 = 2/3
   for CUA, S = 1/3 + 0 + 1 = 4/3
   Take the average of these: S = 1 for codon 4.
5. for UUU, S = 1/3
   for GUA, S = 1
   Take average: S = 2/3

For whole sequence, S = 1 + 1/3 + 1 + 1 + 2/3 = 4
N = total number of sites - S = 15 - 4 = 11

Calculate Sd and Nd for each codon of the same alignment.
1. Sd = 0, Nd = 0
2. Sd = 1, Nd = 0
3. Sd = 0, Nd = 1
4. Sd = 1, Nd = 0
5. UUU and GUA differ at two positions, so this could happen two ways:
   route 1: UUU --> GUU --> GUA (first step Nd = 1, second step Sd = 1), so Sd = 1, Nd = 1
   route 2: UUU --> UUA --> GUA (first step Nd = 1, second step Nd = 1), so Sd = 0, Nd = 2
   Take average of these two: Sd = 0.5, Nd = 1.5
   (note that if all three positions were different there would be 6 routes to average)

Total Sd = 2.5
Total Nd = 2.5

DS = 2.5/4 = 0.625    dS = 1.34
DN = 2.5/11 = 0.227   dN = 0.271

Non-syn rate is much slower than syn rate in this example: dN/dS = 0.271/1.34, which is about 0.2.
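The hand calculation above can be checked with a short script. What follows is a minimal Python sketch, not a reference implementation of Nei & Gojobori (1986). The function names (`syn_sites`, `count_diffs`, `jc_distance`, `nei_gojobori`) are invented here for illustration. Two assumptions are made: a change to a stop codon is counted as non-synonymous, and routes that pass through a stop codon are not excluded. Codon 3 of Seq 2 is entered as GCG, the Ala codon that matches the S = 1 count in the notes.

```python
from itertools import permutations, product
from math import log

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... ('*' = stop)
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def syn_sites(codon):
    """Number of synonymous sites in one codon (between 0 and 3)."""
    count = 0
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            if CODE[mutant] == CODE[codon]:
                count += 1
    # Each position has 3 possible changes, so each synonymous change adds 1/3.
    return count / 3


def count_diffs(c1, c2):
    """Average (Sd, Nd) over all orderings of the differing positions."""
    positions = [i for i in range(3) if c1[i] != c2[i]]
    if not positions:
        return 0.0, 0.0
    routes = list(permutations(positions))
    sd = nd = 0
    for route in routes:
        current = c1
        for i in route:
            nxt = current[:i] + c2[i] + current[i + 1:]
            if CODE[current] == CODE[nxt]:
                sd += 1
            else:
                nd += 1
            current = nxt
    return sd / len(routes), nd / len(routes)


def jc_distance(p):
    """Jukes-Cantor distance for a fraction p of differing sites."""
    return -0.75 * log(1 - 4 * p / 3)


def nei_gojobori(seq1, seq2):
    """S, N, Sd, Nd, dS and dN for two aligned, gap-free RNA coding sequences."""
    codons1 = [seq1[i:i + 3] for i in range(0, len(seq1), 3)]
    codons2 = [seq2[i:i + 3] for i in range(0, len(seq2), 3)]
    S = sum((syn_sites(a) + syn_sites(b)) / 2 for a, b in zip(codons1, codons2))
    N = 3 * len(codons1) - S
    Sd = Nd = 0.0
    for a, b in zip(codons1, codons2):
        s, n = count_diffs(a, b)
        Sd += s
        Nd += n
    return {"S": S, "N": N, "Sd": Sd, "Nd": Nd,
            "dS": jc_distance(Sd / S), "dN": jc_distance(Nd / N)}


if __name__ == "__main__":
    result = nei_gojobori("CCCUUUGGGUUAUUU", "CCCUUCGCGCUAGUA")
    for name, value in result.items():
        print(f"{name} = {value:.3f}")
```

Run on the five-codon example, the script should reproduce the values worked out by hand: S = 4, N = 11, Sd = Nd = 2.5, dS of about 1.34 and dN of about 0.271.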