Molecular evolution

Reminders:
• First writing assignment due in class on Wednesday
• Exam 2 next Monday, Nov 3

TA extra office hours this week:
  Kyra: Wed (10/29), 11 am - noon
  Nic: Fri (10/31), 9-10 am
  Pu: Fri (10/31), 10-11 am
TA review session (Katie and Pu): 2:00 - 3:00 Friday, McMillan 149

Early (1960s) analyses of amino acid variation among different species (Kimura; Zuckerkandl & Pauling)
e.g., cytochrome c, hemoglobin
[Figures: cytochrome c and hemoglobin compared among human, horse, bird, fish]

The amount of amino acid (aa) difference between species roughly corresponds to the length of time since the species diverged from a common ancestor.
e.g., horses and humans: good fossil data on time of divergence
This suggested a constant, steady rate of amino acid substitution.

Molecular clock: constant, steady rate of change at the molecular level (amino acids, DNA sequences).
e.g., for a hypothetical protein sequenced in several species:
• Humans and horses last shared a common ancestor ~45 MY ago. Differ at 6 out of 100 aa sites.
• Humans and fruit flies last shared a common ancestor ~600 MY ago. Differ at 90 out of 100 aa sites.
[Figure: aa differences plotted against time since divergence; one labeled point is reptile-fish]
A linear relationship indicates a molecular clock.

Implications of molecular clock observations…
• Inconsistent with evolution by natural selection (NS):
  - Would expect far fewer changes (since the protein’s function didn’t change, e.g., cytochrome c)
  - Would expect changes to be episodic, not steady: associated with periods of NS (e.g., environmental change, rapid speciation, etc.)
• Suggested that most mutations that arise and go to fixation do so by genetic drift: they are selectively neutral

Neutral Theory of Evolution
Most mutations that arise and go to fixation are neutral with respect to natural selection; fixation is by genetic drift.
Revolutionary at the time, since it had been assumed that natural selection was the primary mechanism of evolution.

This doesn’t mean that natural selection isn’t acting on these genes:
  - We assume that most mutations that affect protein function are deleterious and immediately selected against. Many mutations are quickly eliminated by natural selection (and so are not observed).
  - The Neutral Theory focuses on mutations that arise and go to fixation.
  - Very few mutations are driven to fixation through natural selection.

The proportion of a gene showing neutral evolution depends on the proportion of sites under functional constraint (where mutations will likely be deleterious).

Molecular clock varies by protein: depends on functional constraint
e.g., functional constraint is lower for a gene that is mostly introns
e.g., functional constraint is lower for a gene where amino acid changes are less likely to disrupt protein function

Variation in functional constraint within and among genes:
  - pseudogenes (no expressed protein): no functional constraints
  - noncoding regions (e.g., introns): very few functional constraints
  - 3rd codon position (often synonymous): some functional constraints
  - 1st codon position (nonsynonymous): high functional constraints
Cyt c is under high functional constraint: a low proportion of mutations evolve neutrally.
The molecular clock will vary for different genes/proteins and gene regions.

Level of functional constraint varies within a gene
[Figure: influenza A strains with different degrees of divergence]

Degree of functional constraint on nonsynonymous sites varies depending on gene function:
Rates of nucleotide substitution between humans and mice/rats

Gene                   Nonsynonymous*   Synonymous*
Histone 3              0.00             6.38
Actin α                0.01             3.13
Thyrotropin            0.33             4.66
Immunoglobulin Ig VH   1.07             5.66
Interleukin I          1.42             4.60
Interferon γ           3.06             5.50
(*avg # substitutions per site per billion years)
Lower proportion of nonsynonymous sites evolving neutrally: slower molecular clock

For sites that are not under functional constraint…
Evolution purely by genetic drift:
u = rate of mutation (per gamete per generation)
For a diploid population of size Ne, the number of new mutations per generation = 2Ne·u
Recall: probability of fixation of an allele equals its frequency.
For a newly arisen allele, this is 1/(2Ne) (e.g., for N = 5, 2N = 10, p = 1/10 = 0.1)
So the number of mutations that arise per generation and eventually get fixed is (2Ne·u)(1/[2Ne]) = u
Rate of fixation equals the mutation rate = u

Rate of fixation at neutral sites equals the mutation rate (u)… Implications:
1. Does not depend on population size!
   Large N: more mutants, but lower likelihood that any one is fixed
2. Often expect to find variation (polymorphism) at a site:
   Recall: time to fixation by drift: 4Ne generations (don’t need to know derivation)
   e.g., Ne = 1000: 4000 generations to fixation of 1 new mutation…
   If u = 1 × 10^-9 mutations per bp per gamete per generation,
   and if we’re looking at a 1000 bp gene,
   then over a period of 4000 generations we expect 8 new mutations:
   (1 × 10^-9)(2000 gametes)(4000 generations)(1000 bp) = 8 mutations
   Variation! (= no fixation)

[Figure labels: "Higher proportion of neutrally evolving sites"; "Lower proportion of neutrally evolving sites"]
This would explain the molecular clock too, right?… Wait! There is a catch, and Tomoko Ohta saves the day: she proposes the Nearly Neutral Model.
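The fixation-rate argument and the polymorphism arithmetic above can be checked numerically. This is only a minimal sketch: the function names `neutral_fixation_rate` and `expected_new_mutations` are made up for illustration, and u is treated as a per-bp rate, as the 1000 bp calculation implies.

```python
def neutral_fixation_rate(ne: int, u: float) -> float:
    """Neutral mutations fixed per generation: (2*Ne*u) * (1 / (2*Ne))."""
    new_mutations = 2 * ne * u   # new mutations entering a diploid population each generation
    p_fix = 1 / (2 * ne)         # a new neutral allele fixes with probability equal to its frequency
    return new_mutations * p_fix


def expected_new_mutations(ne: int, u: float, gene_length_bp: int) -> float:
    """New mutations arising in a gene over the ~4*Ne generations one fixation takes."""
    gametes = 2 * ne             # gene copies in the population
    generations = 4 * ne         # time to fixation by drift
    return u * gametes * generations * gene_length_bp


if __name__ == "__main__":
    u = 1e-9
    for ne in (5, 1000, 1_000_000):
        # The rate stays at (approximately, up to float rounding) u for every Ne.
        print(ne, neutral_fixation_rate(ne, u))
    # The example from the notes: Ne = 1000, 1000 bp gene, about 8 mutations.
    print(expected_new_mutations(1000, u, 1000))
```

Population size cancels out of the first function, which is implication 1; the second function reproduces the count of about 8 mutations behind implication 2.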
We have a problem…
Rate of fixation equals the mutation rate = u
u = rate of mutation per gamete per generation, NOT per absolute time
But the molecular clock seems to follow absolute time…
even though species vary widely in generation time.
Why aren’t more mutations accumulating in lineages with short generation times??

If we allow for the possibility that most mutations are slightly deleterious instead of strictly neutral, then the probability of drifting to fixation will depend on population size:
• Small N = drift overrides weak selection, so most mutations evolve as if neutral: ‘effectively neutral’
• Large N = drift is weak, so most mutations are not neutral and are selected against
Mutations are effectively neutral (i.e., behave as if s = 0) for s < 1/(2Ne)

So why does this save the day?…
Because species with short generation times tend to have larger populations:
• Short generation time: many mutations per year, but fewer mutations are effectively neutral (s < 1/(2Ne) holds for fewer mutations when Ne is large)
• Long generation time: few mutations per year, but many mutations are effectively neutral
So the difference in generation time is balanced out by N, and the clock follows absolute time.

Detecting natural selection on DNA sequences
Using (nearly) neutral evolution as the null hypothesis: look for deviations from neutral expectations.
In other words, look for patterns of evolution that don’t fit evolution by genetic drift alone.
Methods: Method 1 is dN/dS ratios; Method 2 is the MK test (example: the Adh (alcohol dehydrogenase) gene in Drosophila species).

Method 1. dN/dS ratios
dN = rate of nonsynonymous substitutions per site (measured as # nonsynonymous polymorphisms)
dS = rate of synonymous substitutions per site (measured as # synonymous polymorphisms)

dN/dS < 1: aa replacements largely deleterious (e.g., normal functioning gene)
dN/dS = 1: aa replacements are neutral (e.g., pseudogene, no functional constraint)
dN/dS > 1: aa replacements are advantageous and are favored by selection

Example: MHC protein (Nature Vol. 335, 8 September 1988)
MHC: major histocompatibility complex
ARS: antigen recognition site

Region                                 dN (per 100 sites)   dS (per 100 sites)   dN/dS
Antigen recognition site (57 codons)   13.3                 3.5                  3.8
Remaining codons in Exons 2 and 3      1.6                  2.5                  0.64

ARS: selection favors amino acid changes (new alleles favored).
Remaining codons: typical for nonsynonymous vs synonymous sites (functional constraint).

Example: Adaptive evolution of the tumour suppressor BRCA1 in humans and chimpanzees. Gavin A. Huttley, Simon Easteal, Melissa C. Southey, Andrea Tesoriero, Graham G. Giles, Margaret R.E. McCredie, John L. Hopper & Deon J. Venter. Nature Genetics 25, 410 - 413 (2000)

One problem with using dN/dS ratios to infer selection: it’s extremely conservative if you average across the entire gene
e.g., 363 genes examined in mice/rats: only one with dN/dS > 1
Most useful for comparing different domains within a protein:
e.g., abalone lysin protein domains
Recall: sperm competition to penetrate the egg leads to rapid evolution, which suggests selection.
Exposed protein regions are evolving rapidly:
[Figure: lysin protein shaded by dN/dS ratio. Black: dN/dS > 3.0; Gray: dN/dS < 1.0; White: dN/dS ~1.0]
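The dN/dS thresholds from Method 1 can be applied to the MHC numbers quoted above. The sketch below is illustrative only: `dn_ds` and `interpret` are hypothetical helper names, and the tolerance band used to call a ratio "approximately 1" is an assumption, not something stated in the notes.

```python
def dn_ds(dn: float, ds: float) -> float:
    """Ratio of nonsynonymous to synonymous substitution rates (same units for both)."""
    if ds <= 0:
        raise ValueError("dS must be positive to form a ratio")
    return dn / ds


def interpret(ratio: float, tol: float = 0.05) -> str:
    """Map a dN/dS ratio onto the three cases; tol is an assumed cutoff around 1."""
    if ratio > 1 + tol:
        return "aa replacements favored by selection"
    if ratio < 1 - tol:
        return "aa replacements largely deleterious (functional constraint)"
    return "aa replacements approximately neutral"


# Values per 100 sites, as given for the MHC example.
regions = {
    "Antigen recognition site (57 codons)": (13.3, 3.5),
    "Remaining codons in exons 2 and 3": (1.6, 2.5),
}

for name, (dn, ds) in regions.items():
    ratio = dn_ds(dn, ds)
    print(f"{name}: dN/dS = {ratio:.2f} -> {interpret(ratio)}")
```

Computing the ratio per region rather than per gene is what makes the signal in the antigen recognition site visible; averaged with the remaining codons it would be diluted.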
McDonald-Kreitman (MK) test
Neutral theory: the number of polymorphic sites within species should be directly proportional to the number of differences fixed between species.
Might expect selection on Adh for alcohol tolerance in species whose larvae live in fermenting fruit.
Within species: look at polymorphic sites (# nonsynonymous, # synonymous)
Between species: look at fixed differences (# nonsynonymous, # synonymous)

Neutral expectation:

                 Within-species polymorphism   Differences fixed between species
Nonsynonymous:   2                             8
Synonymous:      10                            40

These ratios should be equal under neutral evolution (here 2/10 = 8/40 = 0.2).
An excess of nonsynonymous fixed differences would indicate selection driving amino acid change.

MK test reveals selection where dN/dS alone does not:

                 Within-species polymorphism   Differences fixed between species
Nonsynonymous:   2                             7
Synonymous:      42                            17
Ratio:           0.048 (= dN/dS)               0.412

By itself, dN/dS (0.048) just looks like normal functional constraint.
The higher ratio between species (0.412) shows an excess of nonsynonymous fixed differences.

Method 3: test for whether there is an excess of old or new polymorphisms (mutations) compared to neutral expectations.

With no selection (genetic drift acting alone), we expect alleles (= haplotypes) to continuously arise and go extinct.
Haplotype tree with no selection: expect a mixture of closely related alleles (= recently diverged from an ancestral allele) and those that are more distantly related (= older common ancestor allele).
[Figure: haplotype trees drawn along a time axis; * = selectively favored mutation]

With directional selection for an advantageous mutation (= positive selection), we expect fewer older alleles than with neutrality.
Haplotype tree with positive selection: most alleles will be recent descendants of the favored allele.
Positive selection: alleles closely related, no long branches: shallow haplotype tree (blue = extinct alleles)

With selection to maintain two or more allele classes (= balancing selection), find maintenance of allele lineages that would otherwise go extinct.
e.g., heterozygote advantage, negative frequency-dependent selection, diversifying selection
Haplotype tree with balancing selection: some alleles will be very distantly related: long branches (blue = extinct alleles)
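Returning to Method 2: the MK comparison for the two 2x2 tables of counts given earlier can be sketched in a few lines. The notes only compare the two ratios; the Fisher's exact test below is an added assumption, one common way to ask whether the ratios differ by more than chance, and the function names are hypothetical.

```python
from math import comb


def mk_ratios(poly_nonsyn: int, poly_syn: int, fixed_nonsyn: int, fixed_syn: int):
    """Nonsynonymous/synonymous ratio within species and between species."""
    return poly_nonsyn / poly_syn, fixed_nonsyn / fixed_syn


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Assumption: this significance test is not named in the notes; it is used
    here only to illustrate testing whether the two ratios are equal.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = comb(row1 + row2, col1)

    def prob(k: int) -> float:
        # Hypergeometric probability of k counts in the top-left cell.
        return comb(row1, k) * comb(row2, col1 - k) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more probable than the observed one.
    return sum(prob(k) for k in range(lo, hi + 1) if prob(k) <= p_obs * (1 + 1e-9))


if __name__ == "__main__":
    # Rows: nonsynonymous, synonymous. Columns: polymorphic, fixed.
    for label, table in (("neutral expectation", (2, 8, 10, 40)),
                         ("Adh", (2, 7, 42, 17))):
        a, b, c, d = table
        within, between = mk_ratios(a, c, b, d)
        p = fisher_exact_two_sided(a, b, c, d)
        print(f"{label}: within = {within:.3f}, between = {between:.3f}, p = {p:.4f}")
```

For the first table the two ratios are identical, so the test finds no departure from neutrality; for the Adh table the between-species ratio is much higher, which is the excess of nonsynonymous fixed differences described in the notes.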