GENE ORDER DATABASE FOR PHYLOGENETIC ANALYSIS

Takashi Kunisawa

Department of Applied Biological Science, Science University of Tokyo, Noda 278, Japan

Abstract: Gene arrangement is one of the key characters that can discriminate evolutionary relationships among genomes. However, there is no public database of gene order. Here, plastid genomes, whose sizes are usually less than 200,000 base pairs, were taken as a model for phylogenetic inference based on gene arrangement. Gene order data of completely sequenced plastid genomes were compiled and structured as a database. A major problem in building the database is the incomparability of database entries: because of differences in nomenclature, it is difficult to identify the same gene among species. Using a homology search program, a unified labeling system for identical genes was constructed across the genomes. The gene order data shed light on the evolutionary origin of apicomplexan plastids, which is controversial. Although a green algal origin was proposed from sequence-based analyses, comparison of gene order and content suggests a red algal origin of apicomplexan plastids.

1 Introduction

Gene arrangement is one of the key characters that can discriminate evolutionary relationships among genomes. Alterations in gene order are caused by inversions and/or transpositions of genome segments. Gene order comparisons have been emerging as a useful new tool for resolving phylogenetic relationships. However, there is no public database of gene order. To build one, the same gene must first be identified among different genomes, using, for instance, the major sequence databases. However, entries are not directly comparable even within a single database; the identification is often not straightforward because of differences in nomenclature for gene or protein names. For these reasons, we see a need for gene order databases with a unified labeling system for easy comparison.
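To illustrate why a unified labeling system makes gene orders easier to compare, the following minimal sketch relabels two differently annotated gene lists and reports the gene adjacencies they share. The synonym table, the two gene lists, and the function names are hypothetical placeholders chosen for illustration; they are not entries from the database, and a real correspondence table would be derived from homology searches, not written by hand.

```python
# Hypothetical synonym table: annotation name -> unified label.
# Toy values only; a real table would come from homology searches.
SYNONYMS = {
    "rpl2": "rpl2",
    "ribosomal protein L2": "rpl2",
    "rps19": "rps19",
    "ribosomal protein S19": "rps19",
    "rpl22": "rpl22",
    "ORF_A": "rpl22",  # unnamed ORF assumed (for this toy case) to match rpl22
}


def unify(gene_order):
    """Map each annotated name to its unified label; unknown names are kept."""
    return [SYNONYMS.get(name, name) for name in gene_order]


def adjacencies(gene_order):
    """Return the set of unordered neighbouring gene pairs in a linear order."""
    return {frozenset(pair) for pair in zip(gene_order, gene_order[1:])}


def shared_adjacencies(order_a, order_b):
    """Adjacencies common to both gene orders after relabeling."""
    return adjacencies(unify(order_a)) & adjacencies(unify(order_b))


# Two toy annotations of the same three-gene segment.
genome_a = ["rpl2", "rps19", "rpl22"]
genome_b = ["ribosomal protein L2", "ribosomal protein S19", "ORF_A"]

shared = shared_adjacencies(genome_a, genome_b)
```

Without relabeling, the two toy lists share no adjacency at all, because no name matches; after relabeling, both neighbouring pairs are recognized as conserved.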
Gene orders of plastid genomes are a good test case for such a database, since plastid genomes from various lineages have been completely sequenced and the number of encoded genes is less than 300. The current state of a database for plastid gene orders is presented. Based on gene order comparison, the evolutionary origin of apicomplexan plastids, which is controversial, is discussed.

2 Gene Order Database of Plastid Genomes

Gene order data of 12 completely sequenced plastid genomes were compiled from the GenBank/EMBL/DDBJ sequence database. The plastid genomes thus compiled are summarized in Table 1. As mentioned, the incomparability of gene labels is a major problem associated with the use of database annotations. Based on amino acid sequence data, homologous relationships of plastid-encoded genes or ORFs were identified with the FASTA program [1], and a unified gene labeling system across different genomes was developed. Currently the gene order data are stored in a flat file. Utilities, for example for retrieving similar gene orders, are under development.

3 Ancestry of Apicomplexan Plastids

Apicomplexan parasites such as the malaria parasites contain a plastid genome in addition to the nuclear and mitochondrial genomes. The evolutionary origin of apicomplexan plastids is controversial. Phylogenetic analyses of gene or protein sequences suggest that the apicomplexan plastid was captured directly from a green alga [2] or from a euglenoid alga [3], while comparisons of gene content of the plastid genomes suggest a rhodophyte ancestry [4,5]. In order to avoid the pitfalls of sequence-based phylogenetic analyses (e.g., nucleotide substitution biases and ambiguities in amino acid sequence alignment due to a high A+T content), we have compared the gene order of the plastid genomes of Plasmodium falciparum [5] and Toxoplasma gondii (GenBank/EMBL/DDBJ Acc.
# U87145), for which virtually identical genetic maps are reported, to gene orders of algal plastid genomes. Although the non-photosynthetic plastid genome of Plasmodium is rather reduced in size, its genetic map still includes more than 50 protein-encoding or RNA-specifying genes. In Fig. 1, arrangements of genes coding for ribosomal proteins are depicted, where contiguous gene segments are indicated by solid vertical lines and homologous genes are horizontally aligned. Genes absent from the alignment are likely to have been transferred to the nucleus during evolution. The arrangement in Plasmodium or Toxoplasma is similar to that in non-green plastid genomes from the rhodophyte alga Porphyra purpurea (U38804), the chromophyte diatom Odontella sinensis (Z67757) and the cryptomonad alga Guillardia theta (formerly Cryptomonas Φ) [6] in that Cluster II is located just downstream of Cluster I. In contrast, in green plastids from the chlorophyte alga Chlorella vulgaris (AB001684), the euglenophyte Euglena gracilis (X70810) and land plants such as the liverwort Marchantia polymorpha [7], Clusters I and II are split and located separately on the genome. The split arrangement is also found in the cyanobacteria Synechocystis sp. PCC6803 [8] and Spirulina platensis [9], while in diverse bacteria other than cyanobacteria, i.e., Escherichia coli, Bacillus subtilis and Thermotoga maritima, Cluster II is located upstream of Cluster I. The split arrangement is thus likely to have arisen in the cyanobacteria and hence shows up in the green plastids, so that the apicomplexan plastids must have had a different origin from the green plastids. Instead, the unique order 5'-Cluster I-Cluster II-3' found in the plastids of Plasmodium/Toxoplasma, Porphyra, Odontella, and Guillardia indicates a separate common ancestry for these plastids.
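The three arrangements distinguished above (Cluster II just downstream of Cluster I, Cluster II upstream of Cluster I, and the two clusters split) can be told apart mechanically once gene orders carry unified labels. The sketch below is one possible way to do this, not the procedure used for Fig. 1. It assumes a linear gene list read in the 5' to 3' direction, ignores strand and genome circularity, and uses placeholder gene labels (a1, b1, x1 and so on) because the actual cluster membership is given only in the figure.

```python
def cluster_span(order, cluster):
    """Return (first, last) index of the cluster's genes present in the order,
    or None if no gene of the cluster is present."""
    idx = [i for i, gene in enumerate(order) if gene in cluster]
    if not idx:
        return None
    return idx[0], idx[-1]


def classify(order, cluster_i, cluster_ii):
    """Classify the relative arrangement of two gene clusters in a gene order."""
    span_i = cluster_span(order, cluster_i)
    span_ii = cluster_span(order, cluster_ii)
    if span_i is None or span_ii is None:
        return "undetermined"
    if span_ii[0] == span_i[1] + 1:
        return "I-II"   # Cluster II directly downstream of Cluster I
    if span_i[0] == span_ii[1] + 1:
        return "II-I"   # Cluster II directly upstream of Cluster I
    return "split"      # clusters separated by other genes


# Placeholder cluster definitions and toy gene orders (hypothetical labels).
CLUSTER_I = {"a1", "a2", "a3"}
CLUSTER_II = {"b1", "b2"}

adjacent_order = ["x1", "a1", "a2", "a3", "b1", "b2", "x2"]
split_order = ["a1", "a2", "a3", "x1", "x2", "b1", "b2"]
reversed_order = ["b1", "b2", "a1", "a2", "a3"]
reduced_order = ["a1", "a3", "b1"]  # some cluster genes lost from the genome
```

Because only the genes actually present are considered, a reduced genome that has lost some cluster members can still be classified, which matters for the small apicomplexan plastid genomes.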
It is widely thought that the Odontella and Guillardia plastids, which are also surrounded by four membranes, were acquired by secondary endosymbiosis from a red alga like Porphyra [10]. Thus, the arrangements shown in the figure support the red algal origin of apicomplexan plastids. In further support of this, arrangements of contiguous tRNA genes, trnC(GCA)-trnL(UAA) and trnS(GCU)-trnD(GUC), are shared among Plasmodium, Porphyra and Odontella but are not found in the green plastids.

References

1. D. Lipman, W.R. Pearson, Science 227, 1435 (1985).
2. S. Kohler et al., Science 275, 1485 (1997).
3. N. Egea, N. Lang-Unnasch, J. Euk. Microbiol. 42, 679 (1995).
4. D.H. Williamson et al., Mol. Gen. Genet. 243, 249 (1994).
5. R.J.M. Wilson et al., J. Mol. Biol. 261, 155 (1996).
6. S.L. Wang et al., Biochem. Mol. Biol. Int. 41, 1035 (1997).
7. M. Sugiura, Essays Biochem. 30, 49 (1995).
8. T. Kaneko et al., DNA Res. 3, 109 (1996).
9. M. Anna et al., J. Gen. Microbiol. 139, 2579 (1993).
10. M.A. Ragan, Bot. J. Linn. Soc. 118, 105 (1995).