PHYLOGENETICS CONTINUED

TESTS BY TUESDAY BECAUSE SOME PROBLEMS WITH SCANTRONS

consensus tree
• can also ask equally supported trees (equally parsimonious, equal likelihood) how well they all support the same nodes
• doesn’t have to involve a subset of the data, as in the bootstrap
• may summarize the stable parts of the tree across 2+ trees
[Figure: trees with tips a, b, c, d, e and their CONSENSUS]
• consensus phylogeny is not fully resolved where there is disagreement among equally ranked trees
[Figure: blue indicates dinosaurs with bifurcation of neural spine in vertebrae]
http://svpow.com/papers-by-sv-powsketeers/wedel-and-taylor-2013-on-sauropod-neural-spinebifurcation/

support for the method
• do we believe phylogeny reconstruction works? need to test it against a known history
• (fish(salamander(bird(mouse,human)))) we feel pretty strongly about
• experimental phylogenetics uses virus evolution to go one step further

experimental evolution
[Figure: experimental lineages, each branch labeled 40 generations]
• growing T7 phage on E. coli plates; speed up the mutation process by adding a mutagen

experimental evolution
• so the phylogeny is known, and ancestral strains can be kept in the freezer
• sequence part of the DNA and use parsimony, likelihood, and other approaches
• consistently got the right (TRUE) answer!
• can also track “traits” on this tree, e.g. changes in growth rate and plaque size on E. coli plates (and check against actual ancestors)
[Figure label: # DNA mutations on this branch. Rem: each branch is 40 generations]

Text: “Because constructing phylogenies, and science more broadly, is often a process of evaluating evidence, scientists often test the effectiveness of the methodologies used to draw conclusions.”

case studies
• text goes through Origin of Tetrapods, Human phylogeny, Darwin’s finches, HIV
• show phylogeny, explain the likely mechanisms for the pattern
[Figure: well-supported phylogeny of rabies virus lineages, coded by host bat species]

Phylogeny: how?
Methods from Streicker et al (2010 bat rabies phylogeny paper)
• what gene region(s)?
• PCR of gene with primers
• sampling effort
• how sequence data generated

Phylogeny: how?
Methods from Streicker et al (2010 bat rabies phylogeny paper)
• tree criterion: uses a statistical model of DNA evolution
• every type of mutation happens at a different rate, as observed
• mutations happen at different rates across codons in protein-coding genes
• are our data consistently supporting the same phylogeny?
• outgroup comparison ‘roots’ the phylogeny at the ancestral node

Phylogeny: how?
Methods from Streicker et al (2010 bat rabies phylogeny paper)
• coalescent: statistical model of how different evolutionary histories of drift, selection, migration, and change in population size are associated with DATA
• oh, no. now it is getting gnarly.
• treat bat species as locations and ask how frequently migration of virus among species could explain the pattern we see now
• analysis indicates rate of virus jumping from one host to another

For RNA viruses, rapid viral evolution and the biological similarity of closely related host species have been proposed as key determinants of the occurrence and long-term outcome of cross-species transmission. Using a data set of hundreds of rabies viruses sampled from 23 North American bat species, we present a general framework to quantify per capita rates of cross-species transmission and reconstruct historical patterns of viral establishment in new host species using molecular sequence data. These estimates demonstrate diminishing frequencies of both cross-species transmission and host shifts with increasing phylogenetic distance between bat species. Evolutionary constraints on viral host range indicate that host species barriers may trump the intrinsic mutability of RNA viruses in determining the fate of emerging host-virus interactions.
• so this study requires TWO phylogenies (virus and bats)
• CST: cross-species transmission

neutrality
• neutral: doesn’t affect fitness of organism
• compare mutations in protein-coding regions: synonymous mutations do not change the amino acid, nonsynonymous do
• if much of diversity is neutral (or nearly so), mutations will accumulate and fix (become a substitution) in populations regularly through time

“molecular clock” works for many genome partitions
• neutrality acts as our NULL HYPOTHESIS
• different homologous genome regions have different rates, slower rates when more functional constraints
• remember: fossil record, biogeography/geology, mutation accumulation studies help us estimate substitution rate µ

[Figure: isthmus closes via volcanic uplift ~3.5 mya]
• two locations - are they two populations? different allele frequencies, distinct clades on tree: yes
• compare cytochrome oxidase mtDNA gene: 7% divergence
• d = 2µt: time (t), rate µ along 2 branches
• µ is the rate of mutations going to fixation (substitutions); under neutrality the mutation rate IS the substitution rate because selection doesn’t accelerate or halt or change probability of fixation
• here we know t = 3,500,000 years, d = 0.07
• µ = d/2t = (0.07)/(7,000,000) = 1x10^-8 substitutions per site per year
• another way to put it, rate of divergence (2µ) ~2% per million years
• what is our assumption in those slides about clock calibration?
• how would YOU test that?
• idea is any mutation is equally likely to become a substitution
• how have we divided (point) mutations up so far?
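
The clock calibration arithmetic (d = 2µt, with d = 0.07 and t = 3,500,000 years) can be checked with a short Python sketch. The function names `substitution_rate` and `divergence_time` are made up for this illustration, and the sketch assumes a strict clock with the same rate µ on both branches, which is exactly the assumption the slide asks you to question.

```python
def substitution_rate(divergence, years_since_split):
    """Per-lineage substitution rate mu, rearranged from d = 2 * mu * t.

    divergence: proportion of sites that differ between the two lineages (d)
    years_since_split: calibration time t, e.g. from geology
    """
    if years_since_split <= 0:
        raise ValueError("years_since_split must be positive")
    return divergence / (2 * years_since_split)


def divergence_time(divergence, rate):
    """Time since the split, t = d / (2 * mu), once mu has been calibrated."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return divergence / (2 * rate)


# Values from the slide: 7% divergence, isthmus closed ~3.5 million years ago.
mu = substitution_rate(0.07, 3_500_000)

# Rate of divergence between the two lineages (2 * mu), as percent per million years.
percent_per_million_years = 2 * mu * 1_000_000 * 100

print(f"mu = {mu:.1e} substitutions per site per year")
print(f"divergence rate = {percent_per_million_years:.1f}% per million years")
```

Once µ is calibrated this way, `divergence_time` turns an observed divergence for another pair of lineages into an estimated split time, but only if the same gene region and the same clock assumption apply.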
neutrality
• synonymous is assumed neutral
• so we can ask if nonsynonymous substitutions happen at a different rate
• neutrality: nonsynonymous divergence rate (dN) = synonymous divergence rate (dS)
• rate, not number of mutations: remember there are many more ways for a mutation to be nonsynonymous than synonymous
• does dN/dS = 1? (in the book and elsewhere this is often called kA/kS; it adjusts for the “more ways” of nonsynonymy)

This is the dN:dS or kA:kS approach we have been discussing
• if kA:kS >> 1, change has been selected FOR
• if kA:kS << 1, change is generally BAD
• if kA:kS ~ 1, neutrality
• positive selection: amino acid change is favored
• functional constraints lead to high levels of homology: change is generally bad (purifying selection)
• region of high homology led to discovery of new functional region that influences mammalian heart disease
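
The "rate, not number" point about dN/dS can be sketched in Python. Everything here is an assumption for illustration: the function names, the example counts, and the cutoffs `low` and `high`, which are arbitrary stand-ins for "<< 1" and ">> 1" and not a statistical test. The sketch also uses raw proportions of differing sites and skips the multiple-hit corrections that real dN/dS software applies.

```python
def dn_ds_ratio(nonsyn_diffs, nonsyn_sites, syn_diffs, syn_sites):
    """dN/dS from counts, dividing each count by its number of possible sites.

    Dividing by the number of sites is what adjusts for there being many
    more ways for a mutation to be nonsynonymous than synonymous.
    """
    if nonsyn_sites <= 0 or syn_sites <= 0:
        raise ValueError("site counts must be positive")
    if syn_diffs == 0:
        raise ValueError("dS is zero, so the ratio is undefined")
    dn = nonsyn_diffs / nonsyn_sites  # nonsynonymous differences per nonsynonymous site
    ds = syn_diffs / syn_sites        # synonymous differences per synonymous site
    return dn / ds


def interpret(ratio, low=0.5, high=2.0):
    """Map a dN/dS ratio onto the three cases from the slide.

    low and high are illustrative cutoffs only (an assumption of this sketch).
    """
    if ratio > high:
        return "positive selection: amino acid change is favored"
    if ratio < low:
        return "purifying selection: change is generally bad"
    return "consistent with neutrality"


# Hypothetical gene: more nonsynonymous sites than synonymous sites, but
# fewer nonsynonymous differences per site.
ratio = dn_ds_ratio(nonsyn_diffs=12, nonsyn_sites=600, syn_diffs=30, syn_sites=200)
print(f"dN/dS = {ratio:.2f}: {interpret(ratio)}")
```

Comparing raw counts (12 vs 30) would understate the contrast here; the per-site rates (0.02 vs 0.15) are what the ratio compares.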