The relationship between the rate of molecular evolution and the rate of genome rearrangement in animal mitochondrial genomes

Wei Xu (1), Daniel Jameson (2), Paul G Higgs (1)
(1) Department of Physics, McMaster University, Hamilton, Ontario.
(2) School of Biological Sciences, University of Manchester, UK.

Background to this project

Phylogenetic Artefacts: Gene sequences are often used in molecular phylogenetics to deduce the evolutionary relationships between species. This is difficult with mitochondrial genomes for two reasons. Rapidly evolving ‘problem species’ have very divergent sequences, which leads to long branch attraction, and the frequencies of bases and amino acids vary widely among species, which biases the trees.

Can Gene Order Help? Gene order rearrangements sometimes provide strong evidence of shared ancestry of a group. For example, the translocation of tRNA-Leu shown in the Drosophila gene order comparison is a derived character shared by many other insects and crustaceans, and this supports the existence of the Pancrustacea clade. The gene order of Rhipicephalus is shared by two other ticks and by no other species, which is a strong signature of the relationship of this group. However, there are also species, such as Tigriopus, with extremely scrambled genomes, where gene order tells us little.

Sequence Evolution and Genome Rearrangement are Related: Here we show that species with high rates of sequence evolution also tend to have high rates of genome rearrangement. Problem species in molecular phylogenetics also tend to be problem species in gene order studies.

OGRe is a relational database for comparative analysis of mitochondrial genomes. It contains information on gene sequences, gene order and genome rearrangements. Please visit OGRe on-line at http://ogre.mcmaster.ca

Typical animal mitochondrial genomes contain 13 protein-coding genes, 2 rRNAs and 22 tRNAs. OGRe produces comparisons of mitochondrial gene orders for any two species. The examples show comparisons between the horseshoe crab, Limulus polyphemus, and three other arthropods. These genomes are circular (the two ends are connected) but they are shown as linear for convenience. Each gene is shown as a block labelled by its gene symbol. Single letter abbreviations are for tRNA genes. Genes drawn below the central line are transcribed from left to right. Genes drawn above the line are transcribed from right to left (and a minus sign is added to the gene symbol).

[Figure: OGRe gene order comparisons of Limulus polyphemus with Drosophila, Rhipicephalus and Tigriopus.]

Limulus polyphemus, the horseshoe crab. Image courtesy of Marine Biology Laboratory, Woods Hole.

Limulus and the fruit fly, Drosophila, differ by a single translocation of a tRNA-Leu gene (shown in yellow and marked by an arrow).

Limulus and the tick, Rhipicephalus, differ by several gene rearrangements, but many common stretches of identical gene order are found between the two species (as indicated by the colour scheme). A point of discontinuity of gene order between two species is called a break point. The Break Point Distance between two species is the number of break points (7 in this example). This is the simplest quantitative measure of the amount of genome rearrangement that has occurred. Another measure is the Inversion Distance, i.e. the minimum number of inversions of DNA sections that would be required to convert one gene order into the other. Limulus can be converted to Rhipicephalus with 6 inversions. This does not imply that inversion is the dominant mechanism of genome rearrangement. Translocations and duplication/deletion processes can rearrange gene orders without changing strands. Only tRNA-Cys has changed strand in this case.

Methods

A consensus tree topology for arthropods was obtained from morphological evidence, published molecular phylogenies and our own analysis of mitochondrial sequences. The base of the Pancrustacea (P) was left as a multifurcation as there is no reliable consensus. Maximum Likelihood branch lengths were obtained using this fixed topology. The protein tree (left) is derived from a concatenation of 4 mitochondrial proteins. The tRNA tree (right) is derived from a concatenation of 22 mitochondrial tRNAs. For each species, the total branch length from the root of the arthropods (A) to the tip was measured (see Table 1). Rates of sequence evolution vary substantially between species.

[Figure: Protein Tree (left) and tRNA Tree (right), drawn on the fixed consensus topology for the species of Table 1 plus the non-arthropods Terebratulina and Katharina. Nodes A (root of the arthropods) and P (base of the Pancrustacea) are marked. Clade labels: Chelicerata, Myriapoda, Crustacea, Hexapoda. Scale bar: 0.1.]

It is thought that the ancestral gene order (at A) is the same as that of Limulus. Therefore, the break point and inversion distances from Limulus to each species were measured.
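The break point distance defined above can be sketched in a few lines of Python. This is a minimal illustration, not the OGRe implementation: the function names (`flip`, `adjacencies`, `breakpoint_distance`) and the six toy gene labels are assumptions made for the example. Both genomes are assumed to contain the same genes exactly once, so duplicated or deleted genes are not handled. The inversion distance is not attempted here, because it needs a considerably more involved algorithm.

```python
def flip(gene):
    """Return the same gene on the opposite strand ('C' <-> '-C')."""
    return gene[1:] if gene.startswith("-") else "-" + gene


def adjacencies(order):
    """All neighbouring gene pairs of a circular genome, in both reading directions."""
    n = len(order)
    pairs = set()
    for i in range(n):
        a, b = order[i], order[(i + 1) % n]   # wrap around: the genome is circular
        pairs.add((a, b))
        pairs.add((flip(b), flip(a)))          # same adjacency read from the other strand
    return pairs


def breakpoint_distance(order_a, order_b):
    """Count adjacencies of order_a that are not conserved in order_b."""
    genes_a = sorted(g.lstrip("-") for g in order_a)
    genes_b = sorted(g.lstrip("-") for g in order_b)
    if genes_a != genes_b:
        raise ValueError("both genomes must contain the same genes exactly once")
    conserved = adjacencies(order_b)
    n = len(order_a)
    return sum(
        1 for i in range(n)
        if (order_a[i], order_a[(i + 1) % n]) not in conserved
    )


# Toy circular genome (hypothetical labels, not real mitochondrial gene orders).
ancestral = ["A", "B", "C", "D", "E", "F"]
translocated = ["A", "C", "D", "B", "E", "F"]   # gene B moved, strand unchanged
inverted = ["A", "B", "-D", "-C", "E", "F"]     # segment C-D inverted

print(breakpoint_distance(ancestral, translocated))
print(breakpoint_distance(ancestral, inverted))
```

In this toy case the single translocation produces three break points and the single inversion produces two. The first figure agrees with the Drosophila row of Table 1, where one tRNA-Leu translocation corresponds to a break point distance of 3.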
Results

Table 1 shows the two gene-order distances and the two sequence-based distances between the ancestral arthropod and each current species. The number of deleted or duplicated genes with respect to the ancestor is also shown. Species are classed into four categories according to breakpoint distance (shown by colour on the poster and listed here in the Category column).

Table 1

Species                      Breakpoints  Inversions  Dup/Del   tRNA  Protein  Category
Tigriopus japonicus                   35          32        0   2.15     1.34  Very High
Heterodoxus macropus                  35          32        0   1.39     1.83  Very High
Thrips imaginis                       32          29        1   1.34     1.32  Very High
Pollicipes polymerus                  22          16        2   0.69     0.59  High
Cherax destructor                     20          16        0   0.54     0.57  High
Tetraclita japonica                   20          16        0   0.66     0.57  High
Argulus americanus                    20          18        0   0.72     1.12  High
Speleonectes tulumensis               19          16        1   0.83     0.93  High
Apis mellifera                        19          16        0   0.84     1.50  High
Hutchinsoniella macracantha           18          16        0   0.86     0.87  High
Pagurus longicarpus                   18          12        0   0.65     0.45  High
Vargula hilgendorfii                  17          15        0   0.79     1.41  High
Lepidopsocid RS-2001                  17          16        0   0.60     0.59  High
Habronattus oregonensis               16          14        0   1.48     1.09  High
Ornithoctonus huwena                  15          13        0   1.95     1.23  High
Scutigera coleoptrata                 15          15        0   0.48     0.44  High
Melipona bicolor                      14           8        2   0.93     1.66  High
Varroa destructor                     14          12        0   0.83     1.09  High
Armillifer armillatus                 13          12        0   0.85     1.73  High
Narceus annularus                      9           9        0   0.63     0.58  Moderate
Thyropygus sp.                         9           9        0   0.49     0.46  Moderate
Aleurodicus dugesii                    8           5        1   1.04     1.54  Moderate
Anopheles gambiae                      8           6        0   0.41     0.47  Moderate
Tetrodontophora bielanensis            8           6        0   0.77     0.70  Moderate
Artemia franciscana                    7           5        0   0.63     0.64  Moderate
Rhipicephalus sanguineus               7           6        0   0.82     0.96  Moderate
Amblyomma triguttatum                  7           6        0   0.88     1.00  Moderate
Haemaphysalis flava                    7           6        0   0.82     0.96  Moderate
Locusta migratoria                     6           5        0   0.38     0.52  Moderate
Bombyx mori                            6           5        0   0.51     0.54  Moderate
Portunus trituberculatus               6           5        0   0.51     0.44  Moderate
Ostrinia furnacalis                    6           5        0   0.49     0.48  Moderate
Tribolium castaneum                    6           5        0   0.55     0.53  Moderate
Antheraea pernyi                       6           5        0   0.50     0.54  Moderate
Chrysomya putoria                      4           2        1   0.36     0.42  Low
Tricholepidion gertschi                3           2        0   0.44     0.39  Low
Daphnia pulex                          3           2        0   0.62     0.51  Low
Pyrocoelia rufa                        3           2        0   0.52     0.77  Low
Drosophila melanogaster                3           2        0   0.37     0.42  Low
Panulirus japonicus                    3           2        0   0.58     0.53  Low
Triatoma dimidiata                     3           2        0   0.59     0.50  Low
Lithobius forficatus                   3           3        0   1.13     0.61  Low
Philaenus spumarius                    3           2        0   0.69     0.58  Low
Gomphiocephalus hodgsoni               3           2        0   0.69     0.62  Low
Penaeus monodon                        3           2        0   0.34     0.32  Low
Crioceris duodecimpunctata             3           2        0   0.55     0.58  Low
Triops cancriformis                    3           2        0   0.42     0.40  Low
Limulus polyphemus                     0           0        0   0.36     0.40  Low
Ixodes persulcatus                     0           0        0   0.72     0.82  Low
Ixodes holocyclus                      0           0        0   0.76     0.83  Low
Ixodes hexagonus                       0           0        0   0.74     0.90  Low
Carios capensis                        0           0        0   0.70     0.79  Low
Ornithodoros porcinus                  0           0        0   0.67     0.86  Low
Heptathela hangzhouensis               0           0        0   0.76     0.87  Low
Ornithodoros moubata                   0           0        0   0.68     0.88  Low

The gene order of the crustacean Tigriopus is completely scrambled with respect to Limulus. There are 35 break points for only 37 genes. The Limulus order has features in common with non-arthropod species, and is thought to be the ancestral arthropod gene order. The Tigriopus gene order has very little in common with any other known species. Thus there has been extensive recent genome scrambling in this lineage.

The figure on the right shows that all four distance measures are positively correlated with each other. The correlation coefficients are shown by each graph.

[Figure: scatter plots comparing the four distance measures. Correlation coefficients shown on the panels: r = 0.99, r = 0.69, r = 0.59, r = 0.53.]

This is also demonstrated by Table 2, which shows the minimum, mean and maximum of the sequence-based distances in each of the categories. Species with high break point distances also have high tRNA and protein distances.

Table 2

Breakpoint category      tRNA distance          protein distance
                         min    mean   max      min    mean   max
Very High                1.33   1.62   2.14     1.32   1.50   1.83
High                     0.48   0.86   1.94     0.44   0.99   1.73
Moderate                 0.38   0.63   1.04     0.43   0.69   1.54
Low                      0.34   0.60   1.13     0.32   0.62   0.90
only tRNA, High          0.66   1.01   1.94     0.57   1.15   1.73
only tRNA, Mod/Low       0.34   0.60   1.13     0.32   0.63   1.54

It is found that tRNA genes are more frequently translocated than rRNA or protein genes. There are many species where only tRNAs have moved. These include 9 species whose breakpoint distance is in the ‘High’ category and 21 species in the ‘Moderate’ or ‘Low’ categories. The two bottom rows of Table 2 show that, even when only tRNAs have moved, tRNA and protein distances are higher for species with higher breakpoint distances. This means that high rates of tRNA translocation are correlated with increased rates of evolution in tRNA and protein genes.

Acknowledgements

This work is supported by Canada Research Chairs and NSERC.

Images courtesy of University of Nebraska, Dept. of Entomology, http://entomology.unl.edu/images/
Limulus image: www.mbl.edu/animals/Limulus

Discussion

These results show that the rates of both sequence evolution and genome rearrangement are very non-clocklike. Species with high evolutionary rates often have close relatives with much lower rates. This means that rates have increased independently in scattered lineages. For example:

Holometabolous insects : Bees (Apis, Melipona) >> Beetles (Tribolium, Crioceris)
Hemiptera (bugs) : Aleurodicus >> Triatoma
Maxillopod crustaceans : Tigriopus, Argulus >> Tetraclita, Pollicipes
Spiders : Ornithoctonus, Habronattus >> Heptathela

This suggests a breakdown in the accuracy of mitochondrial genome replication in the fast-evolving lineages, causing both a higher mutation rate and a higher susceptibility to major rearrangements. It is also possible to envisage selective explanations, such as rapid adaptation to a new environment.

There are still many questions that we cannot answer.
Why do tRNAs move much more frequently than larger genes?
Why are there many examples of ‘long jump’ translocations of tRNAs?
Why are there many examples of genes whose positions are reshuffled but which remain on the same strand (no inversions)?

We hope that further analysis of the OGRe gene order database will give some clues to these questions.
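The correlations reported in the Results can, in principle, be rechecked from the columns of Table 1. The sketch below is an illustration under stated assumptions, not the authors' procedure: the `pearson` helper and the variable names are introduced here, and the lists are the rounded Table 1 entries in table order, so the output may differ slightly from the coefficients on the poster. The poster text does not say which pair of measures each coefficient belongs to. For break points against inversions, a hand calculation on these rounded values gives a coefficient close to 0.99.

```python
from itertools import combinations
from math import sqrt

# Columns of Table 1 in row order (55 species, Tigriopus japonicus first).
breakpoints = [35, 35, 32, 22, 20, 20, 20, 19, 19, 18, 18, 17, 17, 16, 15, 15, 14, 14, 13,
               9, 9, 8, 8, 8, 7, 7, 7, 7, 6, 6, 6, 6, 6, 6,
               4, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 0, 0, 0, 0, 0, 0, 0, 0]
inversions = [32, 32, 29, 16, 16, 16, 18, 16, 16, 16, 12, 15, 16, 14, 13, 15, 8, 12, 12,
              9, 9, 5, 6, 6, 5, 6, 6, 6, 5, 5, 5, 5, 5, 5,
              2, 2, 2, 2, 2, 2, 2, 3, 2, 2, 2, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0]
trna = [2.15, 1.39, 1.34, 0.69, 0.54, 0.66, 0.72, 0.83, 0.84, 0.86, 0.65, 0.79, 0.60,
        1.48, 1.95, 0.48, 0.93, 0.83, 0.85,
        0.63, 0.49, 1.04, 0.41, 0.77, 0.63, 0.82, 0.88, 0.82, 0.38, 0.51, 0.51, 0.49,
        0.55, 0.50,
        0.36, 0.44, 0.62, 0.52, 0.37, 0.58, 0.59, 1.13, 0.69, 0.69, 0.34, 0.55, 0.42,
        0.36, 0.72, 0.76, 0.74, 0.70, 0.67, 0.76, 0.68]
protein = [1.34, 1.83, 1.32, 0.59, 0.57, 0.57, 1.12, 0.93, 1.50, 0.87, 0.45, 1.41, 0.59,
           1.09, 1.23, 0.44, 1.66, 1.09, 1.73,
           0.58, 0.46, 1.54, 0.47, 0.70, 0.64, 0.96, 1.00, 0.96, 0.52, 0.54, 0.44, 0.48,
           0.53, 0.54,
           0.42, 0.39, 0.51, 0.77, 0.42, 0.53, 0.50, 0.61, 0.58, 0.62, 0.32, 0.58, 0.40,
           0.40, 0.82, 0.83, 0.90, 0.79, 0.86, 0.87, 0.88]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


measures = {
    "breakpoints": breakpoints,
    "inversions": inversions,
    "tRNA": trna,
    "protein": protein,
}

# Print the coefficient for every pair of the four distance measures.
for (name_a, col_a), (name_b, col_b) in combinations(measures.items(), 2):
    print(f"{name_a} vs {name_b}: r = {pearson(col_a, col_b):.2f}")
```

Four measures give six possible pairs, while only four coefficients survive from the figure, so the printed values are best read as a consistency check on Table 1 and not as a reconstruction of the individual panels.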