Lecture 17: Phylogenetics and Phylogeography
October 22, 2012

Announcements
  Exam next Wednesday (Oct 31)
  Review on Monday: bring questions!
  Covers material from genetic drift (Sept 28) through coalescence (Friday)
  I will be gone Monday, Oct 29 (after office hours) through Oct 31

Last Time
  Using FST to estimate migration
  Direct estimates of migration: parentage analysis
  Introduction to phylogenetic analysis

Today
  Phylogeography
  Limitations of phylogenetic analysis
  Coalescence introduction
  Influence of demography on coalescence time

UPGMA Method
  Use all pairwise comparisons to make a dendrogram
  UPGMA: Unweighted Pair Group Method with Arithmetic Mean
  Hierarchically link the most closely related individuals
  Read the Lab 9 Introduction!

Phenetics (distance) vs Cladistics (character state based)
  Lowe, Harris, and Ashton 2004

Parsimony Methods
  Based on underlying genealogical relationships among alleles
  Occam's Razor: the simplest scenario is the most likely
  Useful for depicting evolutionary relationships among taxa or populations
  Choose the tree that requires the smallest number of steps (mutations) to produce the observed relationships

Choosing Phylogenetic Trees
  MANY possible trees can be built for a given set of taxa
  Very computationally intensive to choose among these
  Number of unrooted (U_n) and rooted (R_n) bifurcating trees for n taxa (Lowe, Harris, and Ashton 2004):
  $$U_n = \frac{(2n-5)!}{2^{n-3}\,(n-3)!}$$
  $$R_n = \frac{(2n-3)!}{2^{n-2}\,(n-2)!} = (2n-3)\,U_n$$
  For example, n = 10 taxa gives U_n = 2,027,025 unrooted and R_n = 34,459,425 rooted trees

Choosing Phylogenetic Trees
  Many algorithms exist for searching tree space
  Local optima are a problem: need to traverse valleys to get to other peaks
  Heuristic search: cut trees up systematically and reassemble
  Branch and bound: search for optimal path through tree space
  [Figure: tree-space search diagram with numeric labels (Felsenstein 2004)]

Choosing Phylogenetic Trees
  If multiple trees are equally likely, summarize them with a majority-rule or strict consensus tree
  Strict consensus is the most conservative approach
  Bootstrap the data matrix (sample with replacement) to determine robustness of nodes
  [Figure: trees for taxa A-F with node support values of 60 (Lowe, Harris, and Ashton 2004; Felsenstein 2004)]

Phylogeography
  The study of evolutionary relationships among individuals, based on phylogenetic analysis of DNA sequences in a geographic context
  Can be used to infer evolutionary history of populations
    Migrations
    Population subdivisions
    Bottlenecks/founder effects
  Can provide insights on current relationships among populations
    Connectedness of populations
    Effects of landscape features on gene flow

Phylogeography
  Topology of the tree provides clues about the evolutionary and ecological history of a set of populations
  Dispersal creates poor correspondence between geography and tree topology
  Vicariance (division of populations preventing gene flow among subpopulations) results in neat mapping of geography onto haplotypes

Example: Pocket gophers (Geomys pinetis)
  Fossorial rodent that inhabits a three-state area in the U.S.
  RFLP analysis of mtDNA from 87 individuals revealed 23 haplotypes
  Parsimony network reveals geographic relationships among haplotypes
  Haplotypes generally confined to single populations
  Major east-west split in distribution revealed
  Avise 2004

Problems with Using Phylogenetics for Inferring Evolution
  It's a black box: starting from the end point, reconstructing the past based on an assumed evolutionary model
  Orthologs versus paralogs
  Hybridization
  Differential evolutionary rates
  Assumes coalescence

Gene Orthology
  Phylogenetics requires unambiguous identification of orthologous genes
  Paralogous genes are duplicated copies; their divergence reflects the duplication event rather than the history of the taxa being compared
  Difficult to determine orthology relationships
  Lowe, Harris, Ashton 2004

Gene Trees vs Species Trees
  Genes (or loci) evolve at different rates. Why?
  Topology derived from a single gene may not match topology based on the whole genome or on morphological traits
  [Figure: gene tree for taxa A, B, C]

Gene Trees vs Species Trees
  Failure to coalesce within species lineages drives divergence of relationships between gene and species trees
  [Figure: two panels, "Divergent Gene Tree" and "Concordant Gene Tree", for taxa a, b, c; in one panel b is closer to a than to c, in the other b is closer to c than to a]

Coalescence
  Retrospective tracing of ancestry of individual alleles
  Allows explicit simulation of sequence evolution
  Incorporation of factors that cause deviation from neutrality: selection, drift, and gene flow
  [Figure: 9 generations in the history of a population of 14 gene copies; axes: time (to present) and individual alleles. Slide courtesy of Yoav Gilad]
  How to model this process?
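One way to picture the figure described above (9 generations, 14 gene copies) is a small forward simulation. The sketch below is only an illustration under stated assumptions: reproduction is neutral, population size is constant, and each gene copy picks its parent uniformly at random from the previous generation. The function names `simulate_forward` and `founders_of_present` are hypothetical and do not come from the lecture.

```python
import random


def simulate_forward(n_copies=14, n_generations=9, seed=1):
    """Neutral forward simulation (assumed model, constant size).

    Returns a list with one entry per round of reproduction; entry g is a
    list where position i holds the index of the parent of gene copy i.
    """
    rng = random.Random(seed)
    return [[rng.randrange(n_copies) for _ in range(n_copies)]
            for _ in range(n_generations)]


def founders_of_present(parents, n_copies):
    """Trace every present-day copy back to its ancestor in the oldest generation."""
    lineage = list(range(n_copies))
    # Walk from the most recent round of reproduction back to the oldest one.
    for generation in reversed(parents):
        lineage = [generation[i] for i in lineage]
    return lineage


parents = simulate_forward()
founders = founders_of_present(parents, 14)
print("ancestor of each present-day copy:", founders)
print("founding copies with surviving descendants:", len(set(founders)))
```

Most founding copies typically leave no descendants in the present, which is why tracing only the surviving lineages backward can be much cheaper than simulating everything forward.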
Modeling from Theoretical Ancestors: Forward Evolution
  Can model populations in a forward direction, starting with a theoretical past
  Wright-Fisher (Fisher-Wright) model of neutral evolution
  Very computationally intensive for large populations

Alternative: Start at the end and work your way back
  [Figure: lineages traced back from the present to the most recent common ancestor (MRCA); axes: time (to present) and individual alleles. Slide courtesy of Yoav Gilad]

The genealogy of a sample of 5 gene copies
  [Figure: genealogy of 5 sampled gene copies back to the most recent common ancestor (MRCA). Slide courtesy of Yoav Gilad]

Examples of coalescent trees for a sample of 6
  [Figure: several coalescent trees; axes: time and individual alleles. Slide courtesy of Yoav Gilad]

Coalescence Advantages
  Don't have to model dead ends
  Only consider lineages that survive to modern day: computationally efficient
  Based on actual observations
  Can simulate different evolutionary scenarios to see what best fits the observed data

Coalescent Tree Example
  Coalescence: merging of two lineages in their Most Recent Common Ancestor (MRCA)
  Waiting time: time to coalescence for two lineages
  Increases with each coalescent event

Probability of Coalescence
  For any two lineages, a function of population size:
  $$P_{\text{coalescence}} = \frac{1}{2N_e}$$
  Also a function of the number of lineages:
  $$P_{\text{coalescence}} = \frac{k(k-1)}{2} \cdot \frac{1}{2N_e}$$
  where k is the number of lineages

Probability of Coalescence
  Probability declines over time
  Lineages decrease in number
  Can be estimated based on a negative exponential:
  $$P_{\text{coalescence}} = e^{-\frac{k(k-1)}{2} \cdot \frac{1}{2N_e} \cdot t}$$
  where k is the number of lineages and t is time in generations
  (Strictly, this exponential term is the probability that the k lineages have not yet coalesced after t generations; the waiting time to the next coalescence is exponentially distributed with rate k(k-1)/(4N_e))

Time to Coalescence Affected by Population History
  Bottleneck

Time to Coalescence Affected by Population History
  Population growth

Time to Coalescence Affected by Population Structure

Applications of the Coalescent Approach
  Framework for efficiently testing alternative models for evolution
  Inferences about effective population size
  Detection of population structure
  Signatures of selection (coming attraction)
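The coalescence probabilities above can be turned into a small backward-in-time simulation. The sketch below is a minimal illustration, not the lecture's own method. It assumes a constant effective population size and treats the waiting time while k lineages remain as exponentially distributed with rate k(k-1)/2 x 1/(2Ne). The function names `coalescence_rate`, `simulate_tmrca` and `expected_tmrca`, and the example values k = 5 and Ne = 1000, are hypothetical.

```python
import random


def coalescence_rate(k, ne):
    """Per-generation probability that some pair among k lineages coalesces:
    k(k-1)/2 * 1/(2Ne)."""
    return k * (k - 1) / 2.0 / (2.0 * ne)


def simulate_tmrca(k, ne, rng):
    """Draw one time to the MRCA (in generations) by summing exponential
    waiting times as the number of lineages drops from k to 1."""
    total = 0.0
    while k > 1:
        total += rng.expovariate(coalescence_rate(k, ne))
        k -= 1  # each coalescent event merges two lineages into one
    return total


def expected_tmrca(k, ne):
    """Sum of the mean waiting times 1/rate, which equals 4Ne(1 - 1/k)."""
    return sum(1.0 / coalescence_rate(j, ne) for j in range(2, k + 1))


rng = random.Random(42)
k, ne = 5, 1000
draws = [simulate_tmrca(k, ne, rng) for _ in range(20000)]
print("simulated mean time to MRCA:", sum(draws) / len(draws))
print("expected mean time to MRCA:", expected_tmrca(k, ne))
```

Rerunning with a smaller `ne` shortens every waiting time, which is the direction of the effect a bottleneck has on time to coalescence. Note also that the last two lineages account for the largest share of the expected total, matching the point that the waiting time increases with each coalescent event.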