GENE ORDER DATABASE FOR PHYLOGENETIC ANALYSIS
Takashi Kunisawa Department of Applied Biological Science, Science University of
Tokyo, Noda 278, Japan
Abstract:
Gene arrangement is one of the key characters that can discriminate evolutionary
relationships among genomes. However, there is no public database of gene order.
Here, plastid genomes, whose sizes are usually less than 200,000 base pairs, were
regarded as a model for phylogenetic inference based on gene arrangement. Gene
order data of completely sequenced plastid genomes were compiled and were
structured as a database. A major obstacle in building the database is the
incomparability of database entries: because of differences in nomenclature, it is
difficult to identify the same gene among species. Using a homology search program,
a unified labeling system for identical genes was constructed across the genomes.
The gene order data shed light on the evolutionary origin of apicomplexan plastids,
which is controversial. Although a green algal origin was proposed from sequence-based
analyses, comparison of gene order and content suggests a red algal origin of
apicomplexan plastids.
1 Introduction
Gene arrangement is one of the key characters that can discriminate evolutionary
relationships among genomes. Alterations in gene order are caused by inversions
and/or transpositions of genome segments. Gene order comparisons have been
emerging as a useful new tool for resolving phylogenetic relationships. However,
there is no public database for gene order. To compare gene orders, one must first
identify the same gene among different genomes, using, for instance, the major
sequence databases. However, entries are not directly comparable even within a single
database; the identification is often not straightforward because of differences in
gene or protein nomenclature. For these reasons, we see a need for gene order
databases with a unified labeling system for easy comparison. Gene orders of plastid
genomes may serve as a good test case, since plastid genomes from various lineages
have been completely sequenced and the number of encoded genes is less than 300. The
current state of a database for plastid gene orders is presented. Based on gene
order comparison, the evolutionary origin of apicomplexan plastids, which is
controversial, is discussed.
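The two rearrangement types mentioned above can be illustrated with a minimal Python sketch. It assumes a gene order is written as a list of signed integers, where the sign stands for the coding strand; this encoding and the function names `invert` and `transpose` are illustrative choices, not part of the database described here.

```python
from typing import List

# Assumption of this sketch: a gene order is a list of signed integers,
# and the sign encodes the strand on which the gene lies.
GeneOrder = List[int]


def invert(order: GeneOrder, start: int, end: int) -> GeneOrder:
    """Invert the segment order[start:end]: reverse it and flip each strand."""
    segment = [-gene for gene in reversed(order[start:end])]
    return order[:start] + segment + order[end:]


def transpose(order: GeneOrder, start: int, end: int, dest: int) -> GeneOrder:
    """Cut order[start:end] and reinsert it before index dest of the remainder."""
    segment = order[start:end]
    rest = order[:start] + order[end:]
    return rest[:dest] + segment + rest[dest:]


# Toy gene order, not taken from any real genome.
ancestral = [1, 2, 3, 4, 5]
inverted = invert(ancestral, 1, 4)          # genes 2..4 reversed and flipped
transposed = transpose(ancestral, 0, 2, 3)  # genes 1..2 moved to the end
```

Applying the same inversion twice restores the original order, which is one reason inversions are convenient to model as elementary rearrangement steps.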
2 Gene Order Database of Plastid Genomes
Gene order data of 12 completely sequenced plastid genomes were
compiled from the GenBank/EMBL/DDBJ sequence database. The plastid genomes
thus compiled are summarized in Table 1. As mentioned, the incomparability of
gene labels is a major problem associated with the use of database annotations.
Based on amino acid sequence data, homologous relationships of plastid-encoded
genes or ORFs were identified using the FASTA program [1], and a
unified gene labeling system across different genomes was developed. Currently the
gene order data are stored in a flat file. Development of utilities for retrieving similar
gene orders, for example, is in progress.
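As a rough illustration of what a unified labeling step might look like, the following Python sketch groups genes into families from pairwise homology hits and rewrites each genome's gene order with the shared labels. It is a sketch under stated assumptions, not the procedure actually used for the database: the homolog pairs are assumed to come from a homology search, the union-find grouping, the `G0001`-style labels, the tab-separated flat-file line, and all genome and gene names are hypothetical.

```python
from typing import Dict, List, Tuple

Gene = Tuple[str, str]  # (genome name, gene name as annotated in that genome)


def unify_labels(genes: List[Gene],
                 homolog_pairs: List[Tuple[Gene, Gene]]) -> Dict[Gene, str]:
    """Assign one shared label to every group of genes linked by homology."""
    parent = {gene: gene for gene in genes}

    def find(gene: Gene) -> Gene:
        # Union-find lookup with path halving.
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]
            gene = parent[gene]
        return gene

    for a, b in homolog_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    family_ids: Dict[Gene, str] = {}
    labels: Dict[Gene, str] = {}
    for gene in genes:
        root = find(gene)
        if root not in family_ids:
            # Hypothetical label format; labels are numbered in input order.
            family_ids[root] = f"G{len(family_ids) + 1:04d}"
        labels[gene] = family_ids[root]
    return labels


# Toy input: two hypothetical genomes whose annotations name genes differently.
orders = {
    "genomeA": ["rpl2", "orf91", "psbA"],
    "genomeB": ["RPL2", "rps19"],
}
genes = [(genome, g) for genome, order in orders.items() for g in order]
homolog_pairs = [  # assumed output of a homology search
    (("genomeA", "rpl2"), ("genomeB", "RPL2")),
    (("genomeA", "orf91"), ("genomeB", "rps19")),
]
labels = unify_labels(genes, homolog_pairs)

# One flat-file line per genome: name, tab, gene order in unified labels.
flat_lines = [
    genome + "\t" + " ".join(labels[(genome, g)] for g in order)
    for genome, order in orders.items()
]
```

Grouping by connected components means that two genes receive the same label even when they are linked only through a third genome, which is a design choice of this sketch and may be too permissive for distant or multi-domain homologs.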
3 Ancestry of Apicomplexan Plastids
Apicomplexan parasites such as the malaria parasites contain a plastid genome in
addition to the nuclear and mitochondrial genomes. The evolutionary origin of
apicomplexan plastids is controversial. Phylogenetic analyses of gene or protein
sequences suggest that the apicomplexan plastid was captured directly from a green
alga [2] or from a euglenoid alga [3], while comparisons of gene content of the plastid
genomes suggest a rhodophyte ancestry [4,5].
In order to avoid the pitfalls of sequence-based phylogenetic analyses (e.g., nucleotide
substitution biases and ambiguities in amino acid sequence alignment due to a high
A+T content), we have compared gene order of the plastid genomes of Plasmodium
falciparum [5] and Toxoplasma gondii (GenBank/EMBL/DDBJ Acc. # U87145), for
which virtually identical genetic maps are reported, to gene orders of algal plastid
genomes. Although the non-photosynthetic plastid genome of Plasmodium is rather
reduced in size, its genetic map still includes more than 50 protein-encoding or
RNA-specifying genes. In Fig. 1, arrangements of genes coding for ribosomal proteins are
depicted, where contiguous gene segments are indicated by solid vertical lines and
homologous genes are horizontally aligned.
Genes absent from the alignment are likely to have been transferred to the nucleus
during evolution. The arrangement in Plasmodium or Toxoplasma is similar to that in
non-green plastid genomes from the rhodophyte alga Porphyra purpurea (U38804),
the chromophyte diatom Odontella sinensis (Z67757) and the cryptomonad alga
Guillardia theta (formerly Cryptomonas Φ) [6] in that Cluster II is located
just downstream of Cluster I, whereas in green plastids from the chlorophyte alga
Chlorella vulgaris (AB001684), the euglenophyte Euglena gracilis (X70810) and land
plants such as the liverwort Marchantia polymorpha [7], Clusters I and II are
split and are located separately on the genome. The split arrangement can also be
found in the cyanobacteria Synechocystis sp. PCC6803 [8] and Spirulina platensis [9],
while in diverse bacteria other than cyanobacteria, e.g., Escherichia coli, Bacillus
subtilis, and Thermotoga maritima, Cluster II is located upstream of Cluster I. The split
arrangement is thus likely to have arisen in the cyanobacteria and hence shows up
in the green plastids, so that the apicomplexan plastids must have had a different
origin from the green plastids. Instead, the unique order 5'-Cluster I-Cluster II-3' found
in the plastids of Plasmodium/Toxoplasma, Porphyra, Odontella, and Guillardia
indicates a separate common ancestry for them. It is widely thought that Odontella and
Guillardia plastids, which are also surrounded by four membranes, were acquired by
secondary endosymbiosis from a red alga like Porphyra [10]. Thus, the arrangements
shown in the figure support the red algal origin of apicomplexan plastids. In further
support of this, arrangements of contiguous tRNA genes, trnC(GCA)-trnL(UAA) and
trnS(GCU)-trnD(GUC), are shared among Plasmodium, Porphyra and Odontella but
are not found in the green plastids.
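The kind of comparison made in this section can be sketched in Python as a check on whether two labeled blocks are adjacent on a circular genome. This is a minimal sketch under stated assumptions: each gene order is a list of unified labels read 5' to 3' around a circular map, each label occurs once, and the function name `classify_arrangement` and the toy orders (with filler blocks `x1`, `x2`, `x3`) are hypothetical rather than real plastid maps.

```python
from typing import List


def classify_arrangement(order: List[str],
                         first: str = "I",
                         second: str = "II") -> str:
    """Classify how two blocks are arranged on a circular gene order.

    Returns "I-II" style strings when one block immediately follows the
    other, and "split" when other blocks lie between them on both sides.
    """
    n = len(order)
    i = order.index(first)
    j = order.index(second)
    if (i + 1) % n == j:      # second lies just downstream of first
        return f"{first}-{second}"
    if (j + 1) % n == i:      # first lies just downstream of second
        return f"{second}-{first}"
    return "split"


# Toy circular gene orders; only the relative positions of I and II matter.
contiguous = ["x1", "I", "II", "x2"]
wrapped = ["II", "x1", "I"]              # I is last, II is first: still I-II
reversed_pair = ["II", "I", "x1"]
separated = ["I", "x1", "x2", "II", "x3"]

results = [classify_arrangement(order)
           for order in (contiguous, wrapped, reversed_pair, separated)]
```

The same function can be applied to a tRNA gene pair by passing the two gene labels as `first` and `second`; a result other than "split" then means the pair is contiguous in that genome.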
References
1. D. Lipman, W.R. Pearson, Science 227, 1435 (1985).
2. S. Kohler et al., Science 275, 1485 (1997).
3. N. Egea, N. Lang-Unnasch, J. Euk. Microbiol. 42, 679 (1995).
4. D.H. Williamson et al., Mol. Gen. Genet. 243, 249 (1994).
5. R.J.M. Wilson et al., J. Mol. Biol. 261, 155 (1996).
6. S.L. Wang et al., Biochem. Mol. Biol. Int. 41, 1035 (1997).
7. M. Sugiura, Essays Biochem. 30, 49 (1995).
8. T. Kaneko et al., DNA Res. 3, 109 (1996).
9. M. Anna et al., J. Gen. Microbiol. 139, 2579 (1993).
10. M.A. Ragan, Bot. J. Linn. Soc. 118, 105 (1995).