Gene Trees and Species Trees: Lessons from morning glories
Lauren A. Eserman & Richard E. Miller
Department of Biological Sciences
Southeastern Louisiana University

Introduction
• DNA sequences are an important source of data for phylogenetic reconstruction
• Single-gene trees were considered exciting and sufficient at one time
• Chase and 41 other authors, 1993
• Phylogeny of angiosperms using rbcL

Introduction
• Next sequenced additional gene regions
  – Philosophical argument for “total evidence”: more data will strengthen the ability to determine species relationships
  – Used concatenated datasets to implement this idea
  – Still dominates the way species trees are estimated

Introduction
• Population genetics and coalescent theory emphasize that genes have unique histories
  – Gene trees do not always reflect the true species history

Introduction
• Gene tree heterogeneity can come about by:
  – Gene duplication events
  – Horizontal gene transfer
  – Incomplete lineage sorting (deep coalescence)
  – Branch length heterogeneity
  (Edwards, 2009)
• This provides evidence against concatenation

Introduction
• Paradigm shift in systematics? (Edwards, 2009)
  – Moving away from notion that gene trees show true species relationships
  – Promotes synergism of phylogenetic systematics with population genetics and coalescent theory

Introduction
• Application of the paradigm shift:
  – Use collective information from multiple gene trees to estimate a species tree
  – Consider conflicting results, valid alternative hypotheses for species relationships

Research Objectives
1. Explore how gene trees with different phylogenetic signal influence the estimated species tree
  – Using 28 gene trees
  – Effects of concatenation on estimated species tree
2.
Alternative objective is to obtain an understanding of species relationships for the organisms of interest (not discussed here)

Study Organisms
• Morning glories are generally species of the genus Ipomoea (not monophyletic)
• Focus on tribe Ipomoeeae
  – Ipomoea + 9 other genera
  – c. 900 species
  – Distributed throughout the subtropics and tropics of the world
[Figure: Ipomoea nil]

Methods 1. Bayesian phylogenetic analysis of 28 gene trees
• Obtained 28 gene regions for species of Ipomoeeae based on our research and additional genes from GenBank
• Number of taxa ranged from 6 to 129
• Alignments made using MAFFT and manually adjusted
• Models of nucleotide substitution chosen using jModelTest
• Gene trees constructed using MrBayes v3.1.2
  – 4 runs, 4 chains, sampling every 100-200 generations
  – Runs were continued until the stationary distribution was reached
  – Burn-in determined as the asymptote in plots of total tree length by generation
  – Convergence criteria:
    • Same topology among 4 runs
    • Posterior probability (PP) of clade support within ±3% among 4 runs
  – Majority rule consensus tree constructed from the combined post-burn-in trees from all 4 runs

Methods 1. Synthesis of 28 gene trees
• ITS tree used as working hypothesis
  – Densest taxon sampling (129 species)
  – Good intrageneric resolution
  – NOTE: Not assuming this is the species tree; rather, a working hypothesis to compare to other gene trees
• Topology and clade support of 27 other gene trees compared to ITS gene tree

Results 1.
Same relationships between ITS and other genes
[Figure: majority-rule consensus trees for DFRB-2 and myb1, with clade labels PHAR and MINA]

Results 2. Individual species with unique positions not shown in any other gene tree
[Figure: majority-rule consensus trees for DFRB-2 and bHLH3, with clade labels PHAR and MINA]

Results 3. Major alternative topology in CHSE
[Figure: majority-rule consensus tree for CHSE, with clade labels PHAR and MINA]

Results 4.
Identify new unnamed clades
[Figure: majority-rule consensus trees for bHLH2 and waxy; clade labels shown include PHAR, MINA, BATA, TRIC, CALO, ‘VIOL’, ‘OBSC’, and ‘AMNI’]

Methods 2.
Concatenated dataset
• To address the issue of concatenation, constructed concatenated dataset using 10 genes
• All gene trees showed similar topologies
[Figure: majority-rule consensus trees for individual genes, including DFRB-1, UF3GT, and CHI]

Results – 10-gene concatenated dataset
• Maintains topologies of individual gene trees
[Figure: majority-rule consensus tree from the 10-gene concatenated dataset]

Concatenated dataset
• What happens when one more gene is added?
• Add CHSE to 10-gene concatenated dataset
  – Alternative topology
  – All coding region
  – No indels

Results – 11-gene concatenated dataset
• Exhibits topology of CHSE: the new gene overwhelms this analysis
[Figure: majority-rule consensus tree from the 11-gene concatenated dataset]

[Figure: side-by-side majority-rule consensus trees from the 10-gene and 11-gene concatenated datasets]

BEST Analysis
Bayesian Estimation of Species Trees (Liu, 2008)
• Incorporates a multispecies coalescent model to estimate species tree from many gene trees
• Methods:
  – 11-gene concatenated dataset
  – 2 runs, 4 chains
  – 8 million generations (did not reach convergence on topology)

BEST Analysis
• Results:
  – Clade present in CHSE appears in BEST tree
  – Overall topology differs
  – Species pairs supported throughout
[Figure: majority-rule consensus tree from the BEST analysis]

Discussion
• Analysis of 28 gene trees
  – Provides an estimate of species tree
  – Alternative hypotheses for species relationships have emerged
  – Overall congruence among gene trees

Discussion – Concatenated
datasets
• Total evidence is philosophically justified but can give misleading results because of gene tree heterogeneity
  – Shown clearly in 11-gene concatenated dataset
• Left with idea that we have two alternative hypotheses of species relationships
  – Two estimates of the species tree

Discussion – Concatenated datasets
• Can now appreciate how a single gene can overwhelm results of a concatenated dataset
  – Topology of CHSE dominated
[Figure: two majority-rule consensus trees, side by side]

Acknowledgements
Research Assistants: A. McDaniel, K. Robichaux, W. Terry, S. Major, H. Echlin, F. St. Cyr
Seed Donations: M. Clegg, M. Rausher, J. A. McDonald, J. Miller, P. Tiffin, B. Zufall, S. M. Chang
[Figure: Ipomoea purpurea]
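
Supplementary sketch: what concatenation does to the data
The concatenation step from Methods 2 can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name `concatenate`, the toy sequences, and the use of `?` for taxa missing from a gene are all assumptions made for illustration. It simply joins per-gene alignments end to end into one supermatrix and records which columns came from which gene.

```python
from typing import Dict, List, Tuple

Alignment = Dict[str, str]  # taxon name -> aligned sequence for one gene


def concatenate(
    alignments: Dict[str, Alignment], missing: str = "?"
) -> Tuple[Alignment, List[Tuple[str, int, int]]]:
    """Join per-gene alignments into one supermatrix (hypothetical helper).

    Returns the supermatrix and a list of (gene, start, end) partitions,
    using 1-based inclusive column coordinates.
    """
    # Every taxon seen in any gene gets a row in the supermatrix.
    taxa = sorted({taxon for aln in alignments.values() for taxon in aln})
    supermatrix = {taxon: "" for taxon in taxa}
    partitions = []
    start = 1
    for gene, aln in alignments.items():
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: sequences are not aligned to one length")
        width = lengths.pop()
        for taxon in taxa:
            # Taxa not sampled for this gene are padded with the missing symbol.
            supermatrix[taxon] += aln.get(taxon, missing * width)
        partitions.append((gene, start, start + width - 1))
        start += width
    return supermatrix, partitions


# Toy data only; these are not sequences from the study.
genes = {
    "geneA": {"sp1": "ACGT", "sp2": "ACGA", "sp3": "ACTA"},
    "geneB": {"sp1": "GG", "sp3": "GC"},
}
matrix, parts = concatenate(genes)
for taxon, seq in matrix.items():
    print(taxon, seq)
print(parts)
```

Once the genes are joined this way, an unpartitioned analysis treats every column as evidence about a single shared tree, so the result reflects whichever signal is strongest across all columns. That is consistent with the 11-gene result in the slides, where adding CHSE changed the topology.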