The Rieske protein; a case study on the pitfalls of... phylogenetic reconstruction

MBE Advance Access published March 28, 2006
The Rieske protein; a case study on the pitfalls of multiple sequence alignments and
phylogenetic reconstruction
Research Article
Evelyne Lebrun*, Joanne M. Santini#, Myriam Brugna*, Anne-Lise Ducluzeau*, Soufian
Ouchane&, Barbara Schoepp-Cothenet*, Frauke Baymann* & Wolfgang Nitschke*+
* Laboratoire de Bioénergétique et Ingénierie des Protéines, Institut de Biologie Structurale et
Microbiologie (IFR..), 31 chemin Joseph-Aiguier, 13402 Marseille Cedex 20, France
# Department of Biology, University College, Room 524 Darwin Building, Gower Street,
London WC1E 6BT, UK
& Centre de Génétique Moléculaire CNRS (UPR 2167), Bât. 24, avenue de la Terrasse, 91198
Gif-sur-Yvette Cedex, France
+ corresponding author: Wolfgang Nitschke, BIP/CNRS, 31 chemin Joseph-Aiguier, 13402
Marseille Cedex 20, phone: +33 491164435, fax: +33 491164578, e-mail:
nitschke@ibsm.cnrs-mrs.fr
Keywords: Rieske protein, phylogeny, lateral gene transfer, indel, bioenergetics
Running Head: Phylogeny of Rieske proteins
© The Author 2006. Published by Oxford University Press on behalf of the Society for Molecular Biology
and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oxfordjournals.org
Abstract
Previously published phylogenetic trees reconstructed on Rieske protein sequences frequently
are at odds with each other, with those of other subunits of the parent enzymes and with small
subunit rRNA trees. These differences are shown to be at least partially if not completely due
to problems in the reconstruction procedures. A major source of erroneous Rieske protein
trees lies in the presence of a large, poorly conserved domain prone to accommodate very
long insertions in well-defined structural hotspots substantially hampering multiple
alignments. The remaining smaller domain, in contrast, is too conserved to allow distant
phylogenies to be deduced with sufficient confidence. 3D structures of representatives from
this protein family are now available from phylogenetically distant species and from diverse
enzymes. Multiple alignments can thus be refined on the basis of these structures. We show
that structurally guided alignments of Rieske proteins from Rieske/cytb complexes and
arsenite oxidases strongly reduce conflicts between resulting trees and those obtained on their
companion enzyme subunits. Further problems encountered during this work, mainly
consisting of database errors such as wrong annotations and frameshifts, are described. The
obtained results are discussed against the background of hypotheses stipulating pervasive
lateral gene transfer in prokaryotes.
Introduction
Since the early 1960's, when Zuckerkandl and Pauling recognised that protein
sequences are documents of evolutionary history (Zuckerkandl & Pauling 1965), protein and
ribonucleotide sequence signatures have been used to infer family relationships between
species. The analysis of hemo/myoglobin and mitochondrial cytochrome c proved the validity
of this approach while at the same time already revealing some of the limitations of the
method. Small subunit (ssu) rRNA has subsequently replaced proteins as reference molecules
for taxonomic classification (Woese 1987; Olsen, Woese and Overbeek 1994). More recently,
the availability of a rapidly increasing number of whole genome sequences reinvigorated
interest in protein-based phylogenetic studies (Doolittle and Logsdon 1998; Snel et al. 1999;
Hansmann and Martin 2000; Nesbø et al. 2001; Clarke et al. 2002; Raymond et al. 2003).
The comparison of phylogenetic trees derived from selected proteins and from ssu
rRNA frequently revealed substantial discrepancies between the respective trees. These
incongruencies have in the last few years increasingly been interpreted as providing evidence
for a high frequency of lateral gene transfer (Doolittle and Logsdon 1998; Doolittle 1999;
Nesbø et al. 2001; Gogarten, Doolittle and Lawrence 2002; Raymond et al. 2002) implying
that the "tree of life" might actually look more like an interwoven mesh than a hierarchical
tree. Such a mesh-like structure of the phylogenetic relationship between species would have
far-reaching consequences for the study of the metabolic capacities of the earliest cells
thriving on Earth. While the results of comparative studies of metabolic pathways in the
framework of a (mostly) hierarchical tree were taken to indicate a very early origin of the
majority of these mechanisms (Castresana and Moreira 1999; Kyrpidis et al. 1999; Schütz et
al. 2000; Castresana 2001; Baymann et al. 2003), a mesh-like topology of the phylogenetic
tree of species would suggest more recent lateral gene transfer rather than common ancestry
as the most likely cause for the universal distribution of metabolic capabilities. However, the
notion of the phylogenetic tree of prokaryotes not being mainly hierarchical has been
challenged during recent years (Clarke et al. 2002; Daubin et al. 2003; Eisen and Fraser 2003;
Kurland et al. 2003; Yang et al. 2005). These studies argue that discordant phylogenies
possibly arising from methodological problems are too readily blamed on lateral gene transfer
by the proponents of the net-like ancestries. In this work we describe various parameters
potentially influencing topologies of phylogenetic trees as observed for a specific example, the
"Rieske" [2Fe-2S]-protein.
Rieske proteins are found as subunits of a number of different enzymes (Carrell et al.
1997; Colbert et al. 2000; Lebrun et al. 2003), the most prominent of which is the Rieske/cytb
complex, a key enzyme of chemiosmotic bioenergetic electron transport chains. This enzyme
has been the object of several phylogenetic studies in the past (Castresana 1995; Schütz et al.
2000; Xiong et al. 2000; Sone et al. 2001; Baymann et al. 2003). Whereas the topology of the
phylogenetic tree obtained from sequences of the cytochrome b subunit fits standard ssu
rRNA trees well (with the obvious exception of the endosymbiotically acquired proteins of
mitochondria and plastids in eukaryotes), published phylogenies of the Rieske subunit
strongly deviate from those of the cytochrome b subunit and of the ssu rRNA present in the
corresponding species (Schütz et al. 2000; Schmidt and Shaw 2001). Horizontal gene transfer
lends itself as an obvious explanation for this fact and has indeed been discussed previously
(Schmidt and Shaw 2001). The integration of the gene coding for the Rieske protein of
Rieske/cytb complexes in an operon with the cytochrome b subunit in all cases studied so far
(except the cyanobacteria), however, raises doubts about the feasibility of frequent horizontal
gene transfer events. Furthermore, several differing trees have been reported, raising the
suspicion that weak robustness of Rieske protein phylogenetic trees rather than horizontal
gene transfer may account for the observed differences.
The results presented below highlight the difficulties in deducing reliable multiple
alignments for phylogenetically distant and hence strongly divergent sequences of members
from the Rieske protein superfamily. An inspection of 3D structures of Rieske proteins from
diverse representatives (Iwata et al. 1996; Carrell et al. 1997; Zhang et al. 1998; Ellis et al.
2001; Bönisch et al. 2002; Hunsicker-Wang et al. 2003; Kurisu et al. 2003; Stroebel et al.
2003) reveals the presence of extensive indels in structural hotspots. These indels complicate
multiple alignment procedures based on commonly used algorithms. Guiding multiple
alignments of Rieske proteins by structural, genetic, biochemical and biophysical information
results in a substantial increase in congruency of Rieske-, cytochrome b- and ssu rRNA-trees.
In addition to the alignment problem, a number of further parameters potentially leading
phylogenetic trees of Rieske proteins astray will be addressed.
Materials and Methods
ORFs coding for Rieske proteins, cytochrome b from Rieske/cytb complexes and
molybdenum-subunits of arsenite oxidase were retrieved from the NCBI
(http://www.ncbi.nlm.nih.gov) and KEGG servers (http://www.genome.ad.jp/kegg-bin/).
Several unfinished genomes were analysed via the TIGR-server (http://www.tigr.org).
Coordinates of the Rieske proteins from yeast mitochondrial cytochrome bc1-complex
(entry 1KB9), plastidic cytochrome b6f complex (1Q90), Sulfolobus acidocaldarius (1JM1),
Thermus thermophilus (1NYK) and arsenite oxidase from Alcaligenes faecalis (1G8J) were
obtained from the pdb-database (http://www.rcsb.org/pdb/).
Structural alignments were obtained using the rms-fit option of the Swiss PdbViewer
(version 3.7; http://www.expasy.org/spdbv).
All available structures were found to yield good structural alignments for the β-strand
skeleton β2 to β7. A Swiss pdb-Viewer file was created containing those structures aligned as
described above in successive layers. In the parent layer, representing the arsenite oxidase
Rieske protein, all residues except those present in the β-strand skeleton were subsequently
deleted whereas in the other layers, the β-skeleton was taken out and only the connecting
loops were retained. Visualising all layers therefore yielded a Rieske protein featuring short
and long loop connections as observed in the structures. This "virtual" Rieske structure was
subsequently reintegrated into the parent enzymes cyt bc1-, cyt b6f complex and arsenite
oxidase by structure alignment of the β-sheet skeleton with that of the genuine Rieske protein
present in the complex and subsequently deleting the latter from the structure file.
Secondary structure prediction was performed by means of the pSAAM-package
(www.life.uiuc.edu_crofts_ahab_psaam.html).
ClustalX (Thompson et al. 1997) was used to obtain multiple sequence alignments of
cytochrome b, the arsenite oxidase molybdenum-subunit and of sequence stretches not fixed
by the structural alignment in the Rieske proteins. For cytochrome b, the ClustalX alignment
was found to nicely superimpose secondary features such as transmembrane and membrane-parallel helices as well as functionally important residues.
Phylogenetic trees were reconstructed from these alignments using the neighbour-joining (NJ-) algorithm implemented in ClustalX or using the parsimony method (Phylip package).
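For readers unfamiliar with the neighbour-joining procedure, the following is a minimal sketch of the textbook algorithm, returning only the branching order as nested tuples. It is not the ClustalX implementation used in this work: branch lengths, bootstrap resampling and distance corrections are omitted, and the function name `neighbour_joining` as well as the five-taxon distance matrix are illustrative assumptions, not data from this study.

```python
def neighbour_joining(names, matrix):
    """Return a nested-tuple topology built by neighbour-joining.

    names  : list of taxon labels (hypothetical labels in the example below)
    matrix : symmetric list-of-lists of pairwise distances, same order as names
    """
    nodes = list(names)
    # Store distances under unordered pairs so that d(a, b) == d(b, a).
    d = {
        frozenset((names[i], names[j])): float(matrix[i][j])
        for i in range(len(names))
        for j in range(i + 1, len(names))
    }
    while len(nodes) > 2:
        n = len(nodes)
        # Sum of distances from each node to all other current nodes.
        total = {a: sum(d[frozenset((a, b))] for b in nodes if b != a) for a in nodes}
        best_pair, best_q = None, None
        for i in range(n):
            for j in range(i + 1, n):
                a, b = nodes[i], nodes[j]
                # Q-criterion: the pair with the smallest value is joined.
                q = (n - 2) * d[frozenset((a, b))] - total[a] - total[b]
                if best_q is None or q < best_q:
                    best_pair, best_q = (a, b), q
        a, b = best_pair
        joined = (a, b)
        # Distance from the new internal node to every remaining node.
        for c in nodes:
            if c != a and c != b:
                d[frozenset((joined, c))] = 0.5 * (
                    d[frozenset((a, c))] + d[frozenset((b, c))] - d[frozenset((a, b))]
                )
        nodes = [c for c in nodes if c != a and c != b] + [joined]
    return tuple(nodes)


# Toy distance matrix (illustrative values only).
taxa = ["a", "b", "c", "d", "e"]
distances = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
tree = neighbour_joining(taxa, distances)
print(tree)
```

In this toy case the taxa a and b, and likewise d and e, end up as sister leaves. Since the distances are computed from the alignment, any misalignment propagates directly into the resulting topology.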
Sequences analysed in this work are detailed as Table I of the Supplementary Material.
Results
Defining the limits of the present analysis
The "Rieske protein" was identified 40 years ago as an indispensable subunit of the
mitochondrial cytochrome bc1-complex (Rieske et al., 1964) and since then has been found in
all cytochrome bc-type enzymes. Since the Rieske protein and cytochrome b form the
functional core of a "cytochrome bc-complex", it has been proposed to rename this enzyme
family the "Rieske/cytochrome b" complexes (Schütz et al., 2000). In addition to their role as
a crucial subunit of these enzymes, however, Rieske proteins are widespread electron transfer
proteins and participate in very divergent enzymes. Characterised examples comprise the
proteobacterial dioxygenases (Schmidt and Shaw, 2001) and the prokaryotic arsenite oxidases
(Ellis et al., 2001). In Archaea, Rieske proteins appear to serve as soluble electron carriers
(Iwasaki et al., 1996).
Ideally, a single phylogenetic tree encompassing all Rieske proteins would be constructed.
Previous reports attempted the construction of a common tree for the proteins of Rieske/cytb
complexes, dioxygenases and the archaeal soluble electron shuttles (Schmidt and Shaw 2001).
As will be detailed below, two criteria strongly improve the reliability of phylogenetic tree
reconstruction, i.e. (a) the presence of crystal structures and (b) a good coverage of the species
tree for the individual enzymes. In contrast to Rieske/cytb complexes (Schütz et al. 2000;
Schmidt and Shaw 2001) and arsenite oxidases (Lebrun et al. 2003), the sample of available
sequences of dioxygenase Rieskes is incomplete. Almost all representatives are from γ-proteobacterial enzymes with very divergent substrate specificities. In some of these enzymes,
the Rieske protein is actually a domain fused to the large catalytic subunit (Kauppi et al.
1998). It therefore seems quite likely to us that the bacterial dioxygenases by themselves form
a diverse family of enzymes whereof only a rather limited subset of primary sequences is
known so far. The same is true for the cases of the archaeal soluble Rieske proteins and the
desaturases. Moreover, crystal structures for the latter two groups are lacking. We have
therefore decided to restrict our present analysis to the Rieske proteins from Rieske/cytb and
arsenite oxidase enzymes for which both crystal structures are available and sufficient
sequence coverage can be obtained.
Data mining and correction of genome sequence errors
To have as complete a set of sequences as possible for the reconstruction of phylogenetic
trees at our disposal, we searched protein and nucleotide data bases as well as whole genomes
for the presence of operons encoding Rieske/cytb complexes and arsenite oxidases. A few
selected Rieske genes not situated in such operon contexts but for which
biochemical/molecular biological evidence suggests a function in the respective
Rieske/cytb complexes (Schneider et al. 2004; Ouchane et al., 2005) were included in the
analysis.
A closer inspection of ORFs retrieved from whole genomes which were both (i)
annotated as Rieske or cytochrome b genes and (ii) present in an operon context with the
other subunits of the Rieske/cytb complexes or arsenite oxidase showed that a sizeable
fraction of these genes featured complications hampering phylogenetic reconstruction or
outright contained severe sequence errors.
(a) Gene fusions: In a sub-group of the α-proteobacteria (e.g. Rhodopseudomonas
palustris and Bradyrhizobium japonicum) as well as in most Bacilli, the c-type cytochrome of
the Rieske/cytb complex is fused to the C-terminal end of cytochrome b. Failing to remove
the cyt c part of the sequence results in an artificial attraction of these two phylogenetically
rather distant species.
(b) N-terminal extensions: The gene of the Rieske protein in most Actinobacteria
contains a long sequence stretch coding for two additional N-terminal transmembrane helices.
These helices are indeed present in the mature protein (Sone et al. 2001; 2003). Equivalent
sequences coding for hydrophobic stretches are present in the mitochondrial Rieske genes.
The stretches in the genes of the mitochondrial proteins serve for addressing the gene
products into the organelle and are subsequently cleaved off. Clustal failed to correctly align
the bulk of the protein unless these stretches were removed from the respective sequences.
(c) Erroneous lengths of ORFs and frameshifts: In a few cases (see Table I in the
Supplementary Material), annotated ORFs did not represent the full-length genes. This was
found to be due to erroneous interpretation of within-sequence methionines as N-termini and
to frameshifts induced by sequencing errors. Correct ORFs were determined directly from the
base sequences of the genomes and truncations resulting from obvious frameshifts were
corrected for our analysis.
(d) Erroneous annotation: The genes for arsenite oxidase in Chloroflexus aurantiacus
and Aeropyrum pernix are incorrectly annotated in the respective genomes despite
unambiguous BLAST-scores. Failing to detect misannotations will either reduce species
coverage or confuse the ortholog/paralog distinction.
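The kind of check described under (c) can be illustrated by a short sketch that tests whether an annotated start codon is merely a within-sequence methionine. It walks upstream in frame from the annotated start until an in-frame stop codon is met and reports the most upstream start codon found. This is an assumption-laden illustration rather than the procedure actually used here: only the forward strand is considered, only ATG is treated as a start codon by default (prokaryotes also use GTG and TTG), and the function name `most_upstream_start` and the toy sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def most_upstream_start(dna, annotated_start, start_codons=("ATG",)):
    """Return the index of the most upstream in-frame start codon.

    dna             : forward-strand nucleotide sequence
    annotated_start : 0-based index of the annotated start codon
    The scan stops at the first in-frame stop codon upstream of the
    annotated start, or at the beginning of the sequence.
    """
    dna = dna.upper()
    best = annotated_start
    pos = annotated_start - 3
    while pos >= 0:
        codon = dna[pos:pos + 3]
        if codon in STOP_CODONS:
            break  # the reading frame is closed further upstream
        if codon in start_codons:
            best = pos  # a longer ORF in the same frame is possible
        pos -= 3
    return best


# Toy sequence: a stop codon, an upstream ATG, two sense codons, then the
# "annotated" ATG at index 13 (all values are illustrative).
toy_dna = "C" + "TAA" + "ATG" + "GCT" + "GCA" + "ATG" + "AAATTT" + "TGA"
print(most_upstream_start(toy_dna, 13))  # 4
```

A returned index smaller than the annotated one only flags a candidate for a truncated ORF. Whether the longer reading frame is genuine still has to be judged against homologous sequences.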
Multiple sequence alignments
The set of sequences for Rieske proteins and cytochrome b from Rieske/cytb
complexes retrieved as described above was fed into a Clustal multiple alignment routine.
Phylogenetic trees were subsequently built from these multiple alignments by the neighbour-joining method. Resulting trees for the Rieske protein and cytochrome b are shown in Fig. 1a
and b, respectively. These two trees differ in numerous places. Apart from minor branching
differences in the hierarchically high nodes of the proteobacteria, a number of inter-phyla
replacements are observed such as the positionings of the heliobacteria/desulfitobacterium- or
the chlorobiaceae-clades and that of the SoxF-protein from an archaeal Rieske/cytb complex.
Arsenite oxidases do not contain cytochrome b but a molybdopterin-containing catalytic
subunit and we therefore added the phylogeny obtained for this protein in the upper part of the
tree in Fig. 1b.
Since an incorrect multiple alignment might rationalise the deviant tree topologies, we
tried to assess the validity of these alignments. For cytochrome b, only structures for the
mitochondrial and for the plastidic/cyanobacterial representatives are available to date.
Cytochrome b is an integral membrane protein with 7-12 transmembrane helices (n.b.: the
split cyt b6/subunit IV gene is treated as a single sequence in this work), and thus the overall
correctness of the alignment can be verified by comparing the positioning of the hydrophilic
stretches. Furthermore, four glycine and four histidine residues involved in binding the two b-hemes as well as the so-called PEWY-motif participating in forming the substrate binding Qo-site are fully conserved in all cytochrome b sequences (e.g. see Schütz et al. 2000). These
criteria allowed us to conclude that in the major part of the sequence, Clustal had managed to
correctly align these proteins. The only uncertain stretches occur in the region of the b6/SUIV-split and at the C-terminal end after the P(EWY)-motif. Using standard gap penalties, Clustal
produces numerous indels in this region. To minimise biases introduced by misalignments in
these stretches, positions containing gaps were excluded from the analysis yielding the tree of
Fig. 1b. However, even considering these positions in NJ-calculations did not significantly
modify the tree topology.
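Excluding gap-containing positions, as done for the cytochrome b tree, amounts to deleting every alignment column in which at least one sequence carries a gap. A minimal sketch of this filtering step is given below. The function name `drop_gapped_columns` and the toy sequences are illustrative assumptions, not the sequences analysed in this work.

```python
def drop_gapped_columns(alignment, gap_chars="-."):
    """Return a copy of the alignment without any column containing a gap.

    alignment : dict mapping sequence name -> aligned sequence (equal lengths)
    """
    names = list(alignment)
    rows = [alignment[name] for name in names]
    if len({len(row) for row in rows}) > 1:
        raise ValueError("all aligned sequences must have the same length")
    # Keep only columns in which no sequence shows a gap character.
    kept = [col for col in zip(*rows) if not any(c in gap_chars for c in col)]
    # Transpose the kept columns back into one string per sequence.
    new_rows = ["".join(chars) for chars in zip(*kept)] if kept else [""] * len(rows)
    return dict(zip(names, new_rows))


# Toy alignment (illustrative residues only).
toy = {"seqA": "MK-LVPEWY", "seqB": "MKALV-EWY", "seqC": "MRALVPEWY"}
trimmed = drop_gapped_columns(toy)
print(trimmed["seqA"])  # MKLVEWY
```

Running the tree reconstruction once on the full and once on the trimmed alignment then gives a simple test of whether the uncertain stretches influence the topology.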
For the Rieske proteins, however, a substantially different result was obtained.
Comparison of the Clustal-alignment underlying the phylogram of Fig. 1a and the published
crystal structures showed that the N-terminal halves of the sequences were strongly
misaligned in many cases. By contrast, in the C-terminal half of the sequences, the Clustal
alignment fitted the structural alignment extremely well.
To understand this peculiar pattern, a detailed inspection of the published structures
was performed.
Global structural features of an archetypal Rieske protein
3D-structures of Rieske proteins that are part of Rieske/cytb-type enzymes have been
reported from mitochondria (Iwata et al. 1996; Zhang et al. 1998; Hunte et al. 2000),
chloroplasts (Carrell et al. 1997; Stroebel et al. 2003) and cyanobacteria (Kurisu et al. 2003),
the thermophilic Bacterium Thermus thermophilus (Hunsicker-Wang et al. 2003) and the
hyperthermoacidophilic Archaeon Sulfolobus acidocaldarius (Bönisch et al. 2003). Since the
amino acid sequences of the mitochondrial protein are very similar to those of the protein in
α-proteobacteria, these five structures represent examples from very diverse regions of the
phylogenetic tree of the prokaryotes and therefore provide a good picture of conserved and
variable structural elements in the Rieske proteins from the Rieske/cytb complexes. In
addition, the structure of the Rieske subunit in arsenite oxidase has been solved (Ellis et al.
2001).
As has been noted previously (Hunsicker-Wang et al. 2003), the conserved structural
core of Rieske proteins consists of 10 antiparallel β-sheets. For the sake of a comparative
analysis, we have created the structure file of a virtual Rieske protein encompassing all the
structural features of its phylogenetically diverse individual constituents via a structural
alignment of this β-sheet skeleton (see Materials and Methods Section). This "chimeric"
Rieske protein is shown in Fig. 2a,b emphasising the conserved and the variable elements.
Three obvious tertiary structural elements can be distinguished. (a) The cluster binding
domain with highly conserved sequence motifs extending from strand β5 to strand β8
(marked in orange in Fig. 2a), (b) the "large" domain formed by strands β1 to β4 and β9/β10
together with structural elements (loops or helices) connecting these β-strands (dark magenta
in Fig. 2a) and (c) the hydrophobic α-helix preceding the large domain and anchoring the
protein in the membrane (green in Fig. 2a-f).
Considering the extreme evolutionary distance between these proteins (spanning all
three domains of life as well as comprising two quite distinct enzymes), the high degree of
structural conservation is striking. As shown below, this surprisingly low structural variability
is due to strong functional constraints exerted on this protein.
Substantial sequence variability occurs at well-defined mutational hot spots
Albeit demonstrating a strong conservation of global structure and in particular of the
β-sheet skeleton throughout Rieske proteins from both Rieske/cytb complexes and arsenite
oxidases in all domains of the phylogenetic tree, Figure 2a,b also clearly shows the presence
of specific regions of extensive structural variability with indels of up to 35 residues. These
indels mainly involve β-strand-connecting loops. Three such regions can be discerned in the
presently available structures with the most prominent one being represented by the loop
connecting sheets β3 and β4. For all three strand-connecting loops, the most divergent
versions observed in the structures are schematically indicated in Fig. 2a,b.
Fig. 2b represents a view of the protein facilitating the visualisation of the three
strongly variable stretches with respect to the invariable β-sheet core (in grey). For the β3/β4-connecting loop, two different maximal-length loops are shown, encountered in (α-, β-, γ-)
proteobacteria (pink) and in crenarchaeota (cyan). The phylogenetic distance between these
two phyla is enormous. The fact that much shorter insertions are observed in all other phyla
raises the suspicion that a long insertion in this position may have occurred twice
independently. The strongly divergent fold of these stretches shown in Fig. 2a,b supports this
scenario which is further corroborated by the complete absence of sequence homology (see
below). Intriguingly, the space occupied by these two insertions connecting β-strands 3 and 4
is filled in by the C-terminal extension of the cyanobacterial/plastidic Rieske protein (dark
blue in Fig. 2a-f). This observation of spatial regions harbouring significant structural
variability as opposed to those featuring an invariant scaffold suggests that the mutability of
the Rieske protein is largely dominated by constraints imposed by its interactions with other
subunits of the respective parent enzymes.
Structural constraints determining positions of hotspots
Figure 2 also shows an incorporation of the virtual chimeric protein into the available
3D-structures of the parent enzymes, i.e. the cytochrome bc1-complex from mitochondria
(Fig. 2c,d), the cytochrome b6f complex from cyanobacteria and chloroplasts (Fig. 2e,f) and
arsenite oxidase from a β-proteobacterium (Fig. 2g,h). In the images to the left side of the
figure, the enzymes are viewed along an axis parallel to the membrane, whereas the right hand
views represent the enzymes seen from the positively charged (periplasmic/luminal) side of
the membrane. The inner and outer surfaces of the membrane in the side views (left) are
indicated by hatched boxes.
In the mitochondrial bc1 complex, the long extension connecting β-sheets 3 and 4 is
found on the extreme periphery of the enzyme (pink in Fig. 2c,d; dotted circle) and does not
interact with other subunits in the functional dimer. In the cytochrome b6f complex where the
structurally unrelated cytochrome f replaces cytochrome c1, the long extension present in the
mitochondrial Rieske protein would intrude into the space occupied by cytochrome f's small
domain rationalising the absence of an insertion in the b6f-Rieske protein (arrow in Fig. 2f; the
structural overlap occurs on the level of the side chains whereas only backbone traces are
shown in Fig. 2). In arsenite oxidase, the interaction surface on the Rieske protein with the
large, catalytic subunit (shown in green) strongly differs from that involved in the Rieske/cytb
complexes. The β3/β4 connecting loops both of bc1 complexes and of the Sulfolobus enzyme
as well as the C-terminal extension of the cytochrome b6f complex would structurally interfere
with the catalytic subunit (arrows in Fig. 2g) and they are correspondingly absent in all
arsenite oxidase Rieske proteins (see multiple alignment in Supplementary Material).
Intriguingly, the loops equivalent to the β2/β3-connecting region in the Rieske protein from
Thermus thermophilus (purple in Fig. 2) would be fully exposed at the surface of arsenite
oxidase (Fig. 2g, dotted circle) suggesting that insertions might be tolerated in this position.
Yet, such an insertion is not observed in any of the arsenite oxidases. A structural reason for
this absence might be sought in the association of the enzyme to the cytoplasmic membrane.
EPR data obtained on the arsenite oxidase from Chloroflexus aurantiacus indeed suggest a
geometry of association to the membrane as indicated in Fig. 2g (Lebrun et al. 2003). The
question of arsenite oxidase's membrane attachment actually is a matter of debate. Whereas
the corresponding enzymes in Cenibacterium arsenoxidans (Muller et al. 2003) and
Chloroflexus aurantiacus (Lebrun et al. 2003) have been reported to be associated to the
cytoplasmic membrane, those from "Alcaligenes faecalis" (Ellis et al. 2001) and Rhizobium
sp. st. NT-26 (Santini and vanden Hoven 2004) and Hydrogenophaga sp. str. NT-14 (vanden
Hoven & Santini 2004) represent fully soluble isolates. The genes encoding the arsenite
oxidase Rieske protein contain the N-terminal transmembrane helix (see sequence alignment
in Supplementary Material). However, this part of the protein corresponding to a signal
sequence required for translocation via the Tat (Twin Arginine Translocation)-system may
well be cleaved off in the mature enzyme. Even if a membrane-anchoring helix was not
retained in the functional state of the enzyme by all or part of arsenite oxidase's Rieske
proteins, a high affinity of this protein for the cytoplasmic membrane would still be possible
in which case a loop extension between β2 and β3 would not be tolerated.
The loop between the strands β5 and β6 in SoxF from Sulfolobus acidocaldarius (red
in Fig. 2) overlaps with the large subunit of arsenite oxidase but also with the small domain of
cytochrome f of the cyanobacterial/plastidic b6f complex (arrows in Fig. 2f,g). In line with the
fact that the small size of cytochrome c1 as compared to cytochrome f opens up space for this
loop, corresponding insertions are indeed present in a number of proteobacterial complexes
(see multiple alignment in Supplementary Material). In turn, the strict absence of this insertion
in the group of the actinobacterial Rieske/cytb complexes suggests that the electron accepting
redox protein likely occupies this particular space. Indeed, a (probably bulkier) diheme
cytochrome has been reported to play the role of cytochrome c1 or cytochrome f in these
bacteria (Sone et al. 2001).
All the mentioned examples show that evolutionary variability in Rieske proteins is
only tolerated in a few well-defined areas of the structure. In these specific places, however,
very extensive sequence and structural variability is observed. The structural conservation
required by the embeddedness into the parent enzymes allows only structurally neutral
exchanges in the remaining bulk of the protein. In the cluster binding domain (representing
roughly one third of the total sequence), additional functional constraints (cluster ligation and
stabilisation, redox potential and pK-value thereof; for a detailed discussion see Link 1999)
further limit the degree of free evolutionary change strongly suppressing the evolutionary
signal in this part of the sequence.
Phylogenies of Rieske proteins obtained from structure-based multiple sequence alignments
The structural alignment obtained for Rieske proteins with a published 3D structure
was used as basic scaffold for a multiple alignment. Secondary structural elements such as β-sheets and turns were predicted from the amino acid sequences for phyla where no structural
information is available. Stretches predicted to correspond to each other were further aligned
by Clustal. As control, the same procedure was applied to the proteins with known crystal
structures. The alignment obtained by this method was found to be identical to the rigorously
structural alignment.
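The principle of a structure-anchored alignment can be sketched as follows. Each sequence is cut into blocks that alternate between variable loops and structurally equivalent β-strands. The strand blocks are kept in register while the loop blocks are free to differ in length. In the sketch below the loops are simply padded with gaps on the right, whereas in this work corresponding stretches were further aligned by Clustal. The function name `anchored_alignment`, the block boundaries and the toy residues are illustrative assumptions.

```python
def anchored_alignment(blocks_by_seq, gap="-"):
    """Assemble an alignment from per-sequence blocks.

    blocks_by_seq : dict mapping sequence name -> list of sequence blocks.
    Every sequence must supply the same number of blocks, in the same
    structural order (e.g. loop, strand, loop, strand). Each block is
    padded with gaps to the width of the longest block at that position,
    so that equivalent strands start in the same alignment column.
    """
    names = list(blocks_by_seq)
    counts = {len(blocks) for blocks in blocks_by_seq.values()}
    if len(counts) != 1:
        raise ValueError("every sequence needs the same number of blocks")
    aligned = {name: [] for name in names}
    for i in range(counts.pop()):
        width = max(len(blocks_by_seq[name][i]) for name in names)
        for name in names:
            aligned[name].append(blocks_by_seq[name][i].ljust(width, gap))
    return {name: "".join(parts) for name, parts in aligned.items()}


# Toy example: N-terminal stretch, strand, loop with or without a long
# insertion, strand (residues are made up for illustration).
toy_blocks = {
    "long_loop": ["MS", "VKVEF", "DGSGKPLTAEE", "RWLVI"],
    "short_loop": ["M", "IRVEY", "DG", "KWVVL"],
}
result = anchored_alignment(toy_blocks)
for name, row in result.items():
    print(f"{name:<11}{row}")
```

The long insertion is thereby confined to its own block and cannot shift the flanking strands out of register, which is the failure observed with unguided alignments of the large domain.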
The resulting multiple alignment was then used to produce an NJ-phylogram (Fig. 1c).
The comparison of the trees in Fig. 1 demonstrates that the number of clashes between the
cytochrome b- and the Rieske tree is substantially reduced for the tree obtained on the
structure-based alignment (Fig. 1c). In particular, no discordant topologies for the deep-branching nodes persist in the trees b and c. Disagreements, however, remain in the highest
nodes of the proteo- and cyanobacteria. For these nodes, the topologies of the trees based on
the Clustal-only (Fig. 1a) and the structure-assisted (Fig. 1c) alignments are identical. This is
not astonishing since the respective sequences are very closely related and thus relatively
straightforward to align. Moreover, both these trees are perfectly in line with ssu-rRNA trees
in that they reproduce the α- to ε-subdivisions of the proteobacteria with the β- and γ-subdivisions forming a common clade that branches off from the α-proteobacterial lineage.
The β/γ-clustering of the proteobacterial subdivisions is much less rigorous in the cytochrome
b-tree (Fig. 1b). For example, the β-proteobacterium Neisseria meningitidis clusters with the
γ-proteobacteria whereas Allochromatium vinosum (γ-subdivision) is found within the β-group. The corresponding nodes both show relatively low bootstrap values in Fig. 1b.
Furthermore, minor realignments in the region where the cytochrome b alignment is not fully
determined (as discussed above) alter the positioning of the Allochromatium and Neisseria
branches (not shown). Accordingly, it seems extremely unlikely to us that these
disagreements reflect true phylogenetic discordances due to lateral gene transfer.
Phylogenies based on parsimony approaches had topologies almost identical (for more
than 90% of all nodes) to those depicted in Fig. 1, a-c (not shown). Disagreements were
restricted to orderings of a few of the highest nodes. Interestingly, both the structure-guided
and the Clustal-only alignments yielded similar trees in NJ- and parsimony methods
demonstrating that, for the case of the considered sequences, the dominant source of error lies
in the alignment rather than in the employed tree-building algorithm.
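Tree comparisons of this kind were made by inspection in this work. A way to quantify the number of clashes between two topologies, given here only as a hedged illustration, is to count the bipartitions (splits) of the taxon set that are present in one tree but not in the other (the Robinson-Foulds distance). The sketch below represents trees as nested tuples of leaf labels. The function names and the five-leaf example trees are assumptions and do not correspond to the trees of Fig. 1.

```python
def leaves(tree):
    """Return the set of leaf labels below a node (leaves are strings)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def splits(tree):
    """Return the non-trivial bipartitions induced by the internal branches."""
    all_leaves = leaves(tree)
    found = set()

    def visit(node):
        if isinstance(node, str):
            return
        side = leaves(node)
        other = all_leaves - side
        # Splits separating fewer than two leaves occur in every tree.
        if len(side) >= 2 and len(other) >= 2:
            found.add(frozenset([side, other]))
        for child in node:
            visit(child)

    visit(tree)
    return found


def robinson_foulds(tree_a, tree_b):
    """Count the splits found in exactly one of the two trees."""
    if leaves(tree_a) != leaves(tree_b):
        raise ValueError("both trees must contain the same leaves")
    return len(splits(tree_a) ^ splits(tree_b))


# Toy trees over hypothetical taxa A to E.
tree_1 = ((("A", "B"), "C"), ("D", "E"))
tree_2 = ((("A", "C"), "B"), ("D", "E"))
tree_1_rerooted = (("A", "B"), ("C", ("D", "E")))
print(robinson_foulds(tree_1, tree_2))           # 2
print(robinson_foulds(tree_1, tree_1_rerooted))  # 0
```

Because splits are compared rather than rooted clades, the position of the root does not affect the count, as the second call shows.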
Discussion
The phylogenetic signal varies strongly along the amino acid sequence of Rieske proteins
As shown above, the Rieske proteins are characterised by a bimodal architecture. The
domain ligating the [2Fe-2S] cluster and harbouring a strongly conserved disulfide bond, is
practically devoid of indels (with the exception of the SoxF case) and contains about 20%
almost invariable (i.e. fewer than 3 different residues at a given site) sequence positions. At
least two positions have been shown to tune the redox potential of the cluster (Denke et al.
1998; Schröter et al. 1998) and are thus determined by functional constraints. All these
structural/functional constraints strongly limit the extent of freely variable sequence sites.
Considering the phylogenetic diversity of parent organisms and enzymes, the sequence
forming the cluster binding domain certainly is saturated with respect to the evolutionary
signal limiting the reliability of extractable phylogenetic information. The sequence of the
large domain, by contrast, probably contains strong phylogenetically relevant information.
The structurally conserved core of this domain is made up of β-sheet structural elements that
are intrinsically much less mutationally constrained by side-chain interaction than α-helices.
The fact that one side of the protein in the cytochrome bc1-complexes is in contact with
cytochrome b forbids the presence of extended indels. The opposite side (on the periphery of
the enzyme complex), however, features extensive insertions in several places along the
primary structure. A high amount of phylogenetic information can therefore be extracted from
the sequence of the large domain of Rieske proteins provided that the reliability of multiple
sequence alignments is ensured by structural comparisons.
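The criterion of "almost invariable" positions used above (fewer than 3 different residues at a given site) can be evaluated directly on a multiple alignment. The Python sketch below is a minimal illustration under stated assumptions: the alignment is a list of equal-length strings, gaps are written as "-", and the function name near_invariant_fraction and the toy sequences are hypothetical rather than taken from the alignments of this work.

```python
def near_invariant_fraction(alignment, max_residues=2, gap="-"):
    """Fraction of alignment columns showing at most max_residues
    different residues (gap characters are ignored)."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    columns = list(zip(*alignment))
    conserved = 0
    for column in columns:
        residues = {aa for aa in column if aa != gap}
        if 0 < len(residues) <= max_residues:
            conserved += 1
    return conserved / len(columns)


# Toy alignment (hypothetical sequences): columns 2 and 4 each show
# three different residues, the other four columns are invariant.
toy = ["CTHLGC", "CSHLGC", "CTHIGC", "CAHVGC"]
print(near_invariant_fraction(toy))
```

A large fraction of such sites, as in the cluster-binding domain, leaves few freely variable positions from which a phylogenetic signal can be extracted.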
Reconstruction of phylogenetic trees for Rieske proteins requires both multiple alignments
guided by 3D structural information and an informed selection of sequences
In our hands, none of the commonly used multiple alignment algorithms (e.g.
CLUSTAL or MACAW) yielded results in line with the structural data regarding the large
domain of Rieske proteins. Extensively varying gap penalties did not significantly improve
the obtained alignments. This failure to correctly align these sequences most probably is due
to the simultaneous presence of the weakly sequence-, but strongly structure-conserved β-sheet backbone and a number of long indels, for instance making up roughly one third of
the large domain in the case of the proteobacterial bc1-Rieske protein. The difficulties in
aligning the large domain were noted previously and frequently led to the approach of
restricting phylogenetic analysis to the cluster-binding domain (e.g. see Schmidt and Shaw
2001). Studies on other genes have indeed demonstrated changes in tree topologies upon
exclusion of uncertain sites in multiple alignments (Hansmann and Martin 2000). Reported
alignments of Rieske proteins trying to consider the full sequence (e.g. Schütz et al. 2000) are
structurally wrong in several places. In the absence of 3D structures of a wide sample of
phylogenetically diverse Rieske proteins, exclusion of the large domain from analysis was
thus certainly the best possible method to tackle the problem. However, the validity of this
approach breaks down when it comes to comparing phylogenetically distant species, for which
the small number of remaining sites does not carry enough evolutionary information. Inclusion
of the full sequence in a relatively reliable alignment, by contrast, results in a much better
correspondence between the phylogenetic relationships of Rieske proteins and of cytochrome
b (Fig. 1). This latter observation provides strong evidence that the structure-guided alignment
is indeed much closer to reality than those either arrived at by automated sequence alignments
of the full sequence or obtained on a reduced subset of strongly conserved sites.
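The trade-off described in this paragraph, i.e. that excluding uncertain regions also removes sites, can be made concrete with a simple column filter. The following Python sketch is only an illustration and not the procedure applied here: it assumes an alignment given as a list of equal-length strings with "-" as gap symbol, and the function name drop_gappy_columns, its threshold parameter and the toy sequences are hypothetical.

```python
def drop_gappy_columns(alignment, max_gap_fraction=0.0, gap="-"):
    """Remove alignment columns whose gap fraction exceeds the threshold.

    Returns the filtered alignment as a list of strings in the
    original sequence order.
    """
    n_seqs = len(alignment)
    kept = [
        column
        for column in zip(*alignment)
        if column.count(gap) / n_seqs <= max_gap_fraction
    ]
    if not kept:
        return [""] * n_seqs
    # Transpose the retained columns back into sequences.
    return ["".join(row) for row in zip(*kept)]


# Toy alignment (hypothetical): a two-residue insertion in one sequence.
toy = ["MK--LVC", "MKAALIC", "MR--LVC"]
filtered = drop_gappy_columns(toy)
print(filtered)
print(len(toy[0]), "->", len(filtered[0]), "sites")
```

Each excluded column reduces the number of sites available for tree reconstruction, which is why this strategy loses resolving power for distantly related sequences.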
In phylogenetic studies, congruency of tree topologies calculated via different methods
is often taken to corroborate the reliability of the presented phylograms. The above
mentioned observation that parsimony trees deviate only slightly from NJ-trees for all three
cases shown in Fig. 1 raises concerns about the general validity of this argument. Alignment
quality certainly prevails over the choice of tree-building methods and, even with all available
methods yielding congruent results, a tree may be wrong if its underlying alignment is
incorrect.
The difficulties encountered in arriving at a correct alignment for strongly divergent
sequences, especially if they contain long indels, are certainly the most important, yet not the
only, source of erroneous tree topologies. A further complication comes from the fact that
frequently a detailed inspection of the gene context is required to distinguish between
orthologous and paralogous genes. Rieske genes in databases are classed into the cluster of
orthologous genes COG 0723. As emphasised previously (Schmidt and Shaw 2001; Baymann
et al. 2003), representatives of this COG belong to enzyme families as divergent as
cytoplasmic dioxygenases, soluble electron shuttles, arsenite oxidases or Rieske/cytb
complexes. The Rieske subunits in these enzymes have diverged prior to the
Bacteria/Archaea-split, i.e. are in fact paralogs rather than orthologs. Again, a machine-based
retrieval of sequences from this COG followed by their multiple alignment without further
selection will result in severely distorted phylogenies.
Conversely, true orthologs maximally covering the diversity of species need to be
considered in the analysis. In the framework of the work presented here, the observation that
including a single sequence, i.e. that from Desulfitobacterium hafniense, was crucial for
attracting the two heliobacterial sequences into the Bacillus/cyanobacterial cluster,
emphasised the need for completeness. Prior to retrieval and inclusion of the D. hafniense
Rieske gene but using an identical alignment for the remaining sequences, the heliobacterial
Rieske proteins were found to cluster with the Actinobacteria (as found in previous reports;
see Schmidt and Shaw 2001), although with extremely low bootstrap values (below 5%). D.
hafniense lying midway between the Heliobacteria and Bacilli allowed the clustering of this
set of sequences characterised by a strongly divergent sequence of the large domain.
Indels are not always good indicators for clades
The analysis of the presence/absence of insertions is employed as a method
independent of the standard tree building algorithms to deduce phylogenetic clades (Baldauf
and Palmer 1993; Gupta 1998; Gupta et al. 1999; Gupta et al. 2003; Griffiths and Gupta
2004a,b). This approach is intuitively appealing and avoids the psychological obstacle of
relying on intricate mathematical algorithms based on sequence evolution models. This
"indel"-approach frequently yields tree topologies at variance with those obtained with the
conventional methods (Gupta et al. 1999; Griffiths and Gupta 2004b) which is taken by the
authors of the respective articles as evidence for the limitations of the conventional methods.
The analysis of the occurrence of indels in the Rieske protein detailed above suggests
some caution in their use for cladistic purposes. As emphasised in Fig. 2, insertions occur at
specific hotspots where the structure of the protein and in particular its interactions with the
other subunit(s) of the entire enzyme allow additional bulk to be inserted. The most striking
example is the loop connecting the sheets β3 and β4 which, being positioned on the periphery
of the dimeric Rieske/cytb complex, houses insertions more than 30 residues long. Very long
insertions in this position have occurred independently several times during the evolution of
Rieske/cytb complexes (see for example the structurally and sequencewise completely
unrelated loops of mitochondria and Crenarchaeota).
Conversely, a complete absence of a loop extension is observed in the Rieske/cytb
complex from the Bacillus/cyanobacterial phylum and in arsenite oxidase, i.e. proteins even
belonging to different enzyme families. Cladistic interpretations of the occurrence of indels in
the Rieske proteins would thus produce severely misleading results. The high probability of
adding and removing sequence stretches in defined places of the sequence rationalised by the
structural analysis limits the cladistic significance of such events.
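The pitfall described in this section can be illustrated by scoring the presence or absence of an insertion at one hotspot. The Python sketch below is a hedged toy example: the lineage names, sequences and column range are invented for illustration, and the function names insertion_lengths and group_by_presence are hypothetical. It is not the analysis performed in this work.

```python
def insertion_lengths(alignment, start, end, gap="-"):
    """Number of non-gap residues each sequence carries in the
    alignment columns start..end-1 (the putative insertion hotspot)."""
    return {
        name: sum(1 for aa in seq[start:end] if aa != gap)
        for name, seq in alignment.items()
    }


def group_by_presence(lengths, min_length=1):
    """Split taxa into those with and those without an insertion."""
    present = sorted(n for n, length in lengths.items() if length >= min_length)
    absent = sorted(n for n, length in lengths.items() if length < min_length)
    return present, absent


# Toy alignment (hypothetical): lineages 1 and 2 both carry a
# five-residue insertion in columns 2-6, but of unrelated sequence.
toy = {
    "lineage_1": "GTPEDKQVY",
    "lineage_2": "GTWWFHNVY",
    "lineage_3": "GT-----VY",
    "lineage_4": "GT-----VY",
}
lengths = insertion_lengths(toy, 2, 7)
print(group_by_presence(lengths))
```

A purely presence/absence-based reading groups lineage_1 with lineage_2 although their inserted stretches share no residues, which mirrors the situation of independently acquired insertions at a structural hotspot.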
Comparison of 16S rRNA and Rieske/cytb trees
As detailed above, founding multiple alignments of the Rieske proteins on alignments
of the basic structural elements results in rather similar tree topologies for both subunits of the
Rieske/cytb and arsenite oxidase enzymes. This topology can now be compared to species
phylogenies (based on ssu rRNA). The tree shown in Fig. 1b corresponds rather closely to that
proposed in standard phylogenetic studies (e.g. Olsen et al. 1994). A major discrepancy with the
tree put forward by Olsen et al. (1994), however, consists in the fact that the Firmicutes
(comprising the clades of the low-GC Gram-positive and the Heliobacteria) and the
Actinobacteria (formerly denoted as high-GC Gram-positive bacteria) clearly do not form
sister clades in the protein trees. This splitting in Rieske/cytb trees has already been noted by
Sone (Sone et al. 2001; Sone et al. 2003).
The apparent discrepancy between Rieske/cytb- and ssu rRNA-based phylogenies can
be rationalised in two different ways. (i) the Rieske/cytb enzyme in one of the two phyla has
been imported by lateral gene transfer or (ii) one of the two trees shows a wrong branching
order. In this context, it is noteworthy that 16S rRNA trees have been published which
separate Actinobacteria and Firmicutes positioning the Actinobacteria as a phylum closer to
the root of the domain Bacteria. It therefore seems quite likely to us that Actinobacteria and
Firmicutes are indeed not sister clades, as suggested by the trees in Fig. 1b and 1c. The
clustering of Firmicutes (including Heliobacteria), Chlorobiaceae and cyanobacteria seen in
Fig. 1b,c corroborates scenarios for the emergence of photosystems which stipulate a common
branching node for phototrophs containing an RCI-type, i.e. photosystem I-related,
photosynthetic reaction centre (Schütz et al. 2000; Baymann et al. 2001).
The Aquificales Rieske/cytb complex clusters with the ε-proteobacteria instead of
representing the lowest branch of the bacterial domain as observed in ssu-rRNA trees. The
respective nodes on our protein trees are highly supported and similar in all phylograms of
Fig. 1. Corresponding affinities of Aquifex genes to ε-proteobacterial representatives are
observed for roughly 20% of the organism's genome. Aquificales are now known to share the
hyperthermal deep-sea vent habitat with numerous ε-proteobacterial species (Corre et al.
2001) rationalising extensive lateral gene transfer between these phyla. The case of the
Aquificales Rieske/cytb complex therefore almost certainly represents a true example of
horizontally transferred genes observable in the trees of Fig. 1.
Incidentally, it is noteworthy that the trees b and c in Fig. 1 both support the
Opisthokonta hypothesis (Cavalier-Smith 1998), i.e. indicate a common clade for
mitochondrial Rieske/cytochrome b from animals and fungi together with an earlier branching
of the plant mitochondrial proteins.
Multiple orthologous Rieskes in a single organism
The recent finding that two distinct Rieske/cytb complexes encoded by two separate
operons function in the acidophilic proteobacterium Acidithiobacillus ferrooxidans (Brasseur
et al. 2002) was strongly reminiscent of the previously observed multiple Rieske/cytb
complexes in the thermoacidophilic Archaeon Sulfolobus acidocaldarius (Hiller et al. 2003).
These observations raise the question whether multiple Rieske/cytb complexes were present
already in the common ancestor of Archaea and Bacteria as is for example the case for [NiFe]
hydrogenases (Vignais et al. 2001; Brugna-Guiral et al. 2003). The phylogenetic trees of
Fig. 1b,c show that this is not the case. The two operons in A. ferrooxidans result from a relatively
recent duplication event in the Acidithiobacillus lineage. In the domain of the Archaea, by
contrast, the Rieske/cytb operon seems to have undergone an operon duplication early on
during the radiation of the domain and prior to the Cren-/Euryarchaeota split giving rise to
two separate families of Rieske/cytb complexes (denoted "L/N" and "F/G" in Fig. 1) inherited
vertically in Archaea.
Both for the case of the archaeal species containing two distinct Rieske/cytb
complexes and of the Acidithiobacilli, the functional significance of this redundancy is not
firmly established so far. Operation in different electron transfer directions has been
suggested for the two enzymes in A. ferrooxidans (Brasseur et al. 2002).
Biochemical evidence recently suggested that in some cyano- and proteobacteria,
extraoperonic Rieske genes code for proteins which can functionally substitute the "genuine"
subunits, i.e. those encoded in the Rieske/cytochrome b operon context (Schneider et al. 2004;
Ouchane et al. 2005). The corresponding sequences are labelled C2, C3 (cyanobacteria) and
A2 (proteobacteria) in Figs. 1a and 1c. The evolutionary histories of these "second" Rieske
genes differ strikingly between cyano- and proteobacteria. Whereas the proteobacterial
extraoperonic genes appear to have originated independently by individual duplication events
in several species, the gene duplications leading to the cyanobacterial C2 and C3 genes predate the radiation of the cyanobacterial phylum (Fig. 1c). A more detailed account of these
results has been presented in Ouchane et al. (2005).
Automated phylogenetic reconstruction not backed-up by circumstantial information very
likely overestimates frequency of lateral gene transfer
As elaborated above, the reconstruction of a phylogenetic tree for Rieske proteins
requires significantly more circumstantial information about this specific protein than is
presently handled by automated tree building procedures.
Most of the proteins whose phylogeny we studied in the recent past, e.g. [Fe-S]- and cytochrome b subunits of [NiFe] hydrogenases (Brugna-Guiral et al. 2003) or molybdoproteins of the DMSO-reductase family (Lebrun et al. 2003), in fact behaved the same way. A
notable exception is the cytochrome b subunit from the Rieske/cytb complex. While showing
a globally poor conservation, cytochrome b features a number of fully conserved residues (i.e.
the ligands to the two b-hemes, the glycine residues sterically required to form the heme
pocket or the residues involved in forming the quinol-oxidising site) scattered along the whole
sequence. These fully conserved residues obviously provide the crucial sequence landmarks
allowing the multiple alignment algorithms to achieve correct alignments all along the full
sequence. Furthermore, no paralogous genes for cytochrome b from Rieske/cytb complexes
have been detected in sequenced genomes so far. Cytochrome b thus contains a number of
particular properties favouring it for use in phylogenetic analyses. However, this situation in
our experience represents the exception rather than the rule. We therefore tend to suspect that
the abundance of discordant trees attributed to lateral gene transfer events that can be found in
the recent literature may in part rather be due to methodological problems as treated in this
work. The results discussed above by no means question the important role of lateral gene
transfer in numerous cases. For instance, the position of the Rieske/cytb complex from
Aquificales is highly supported and identical in all trees of Fig. 1 but differs strongly from
that seen in ssu rRNA trees. This case therefore almost certainly represents a bona fide
example of the occurrence of lateral gene transfer observable in our trees.
Supplementary Material
The multiple sequence alignments underlying the NJ-trees depicted in Fig. 1 b and c as well
as a table listing entry numbers of the analysed sequences are available as supplementary
material to this work.
Acknowledgements
The authors would like to thank Nobuhito Sone (Iizuka/Japan) and Christian Schmidt
(Lübeck/Germany) for stimulating discussions and suggestions.
References
Aravind, L., R.L. Tatusov, Y.I. Wolf, D.R. Walker, and E.V. Koonin. 1998. Evidence for
massive gene exchange between archaeal and bacterial hyperthermophiles. Trends in genetics
14:442-444.
Baldauf, S.L., and J.D. Palmer. 1993. Animals and fungi are each other's closest relatives –
congruent evidence from multiple proteins. Proc.Natl.Acad.Sci. USA 90:11558-11562.
Baymann, F., M. Brugna, U. Mühlenhoff, and W. Nitschke. 2001. Daddy, where did (PS) I
come from? Biochim.Biophys.Acta 1507:291-310.
Baymann, F., E. Lebrun, M. Brugna, B. Schoepp-Cothenet, M.-T. Giudici-Orticoni, and W.
Nitschke. 2003. The redox protein construction kit; pre-LUCA evolution of energy conserving
enzymes. Phil.Trans.R. Soc.Lond. B 358:267-274.
Bönisch, H., C.L. Schmidt, G. Schäfer, and R. Ladenstein. 2002. The structure of the soluble
domain of an archaeal Rieske iron-sulfur protein at 1.1 Å resolution. J.Mol.Biol. 319:791-805.
Brasseur, G., P. Bruscella, V. Bonnefoy, and D. Lemesle-Meunier. 2002. The bc1
complex of the iron-grown acidophilic chemolithotrophic bacterium Acidithiobacillus
ferrooxidans functions in the reverse but not in the forward direction. Is there a second
bc1 complex? Biochim Biophys Acta 1555: 37-43.
Brugna-Guiral, M., P. Tron, W. Nitschke, K.-O. Stetter, B. Burlat, B. Guigliarelli, M.
Bruschi, and M.-T. Giudici-Orticoni. 2003. [NiFe] hydrogenases from the hyperthermophilic
bacterium Aquifex aeolicus; properties, function and phylogenetics. Extremophiles 7:145-157.
Carrell, C.J., H. Zhang, W.A. Cramer, and J.L. Smith. 1997. Biological identity and diversity
in photosynthesis and respiration: structure of the lumen-side domain of the chloroplast
Rieske protein. Structure 5:1613-1625.
Castresana, J., M. Lübben, and M. Saraste. 1995. New archaebacterial
genes coding for redox proteins: implications for the evolution of aerobic mechanisms.
J.Mol.Biol. 250:202-210.
Castresana, J., and D. Moreira. 1999. Respiratory chains in the last common ancestor of living
organisms. J.Mol.Evol. 49:453-460.
Castresana, J. 2001. Comparative genomics and bioenergetics. Biochim. Biophys. Acta
1506:147-162.
Cavalier-Smith, T. 1998. Neomonada and the origin of animals and fungi. Pp. 375-407 in G.H.
Coombs, K. Vickermann, M.A.Sleigh and A. Warren, eds. Evolutionary relationships among
protozoa, Kluwer, London.
Clarke, G.D.P., R.G. Beiko, M.A. Ragan, and R.L. Charlebois. 2002. Inferring genome trees
by using a filter to eliminate phylogenetically discordant sequences and a distance matrix
based on mean normalised BLASTP scores. J.Bacteriol. 184:2072-2080.
Colbert, C.L., M.M-J. Couture, L.D. Eltis, and J.T. Bolin. 2000. A cluster exposed: crystal
structure of BPhF and determinants of the redox potential of Rieske FeS proteins. Structure
8:1267-1278.
Corre, E., A.-L. Reysenbach, and D. Prieur. 2001. Epsilon-proteobacterial diversity from a
deep-sea hydrothermal vent on the mid-atlantic ridge. FEMS Microbiol. Lett 205:329-335.
Daubin, V., N.A. Moran, and H. Ochman. 2003. Phylogenetics and the cohesion of bacterial
genomes. Science 301:829-832.
Denke, E., T. Merbitz-Zahradnik, O.M. Hatzfeld, C.H. Snyder, T.A. Link, and B.L.
Trumpower. 1998. Alteration of the midpoint potential and catalytic activity of the Rieske
iron-sulfur protein by changes of amino acids forming hydrogen bonds to the iron-sulfur
cluster. J.Biol.Chem. 273: 9085-9093.
Doolittle, W.F., and J.M.Jr. Logsdon. 1998. Archaeal genomics: Do archaea have a mixed
heritage? Current Biology 8:R209-R211.
Doolittle, W.F. 1999. Lateral genomics. Trends in Biochemical Sciences 24:M5-M8.
Doolittle, W.F. 1999. Phylogenetic classification and the universal tree. Science 284:2124-2129.
Eisen, J.A., and C.M. Fraser. 2003. Phylogenomics: Intersection of evolution and genomics.
Science 300:1706-1707.
Ellis, P. J., T. Conrads, R. Hille, and P. Kuhn. 2001. Crystal structure of the 100 kDa arsenite
oxidase from Alcaligenes faecalis in two crystal forms at 1.64 and 2.03 Å. Structure
9:125-132.
Gogarten, P, W.F. Doolittle, and J.G. Lawrence. 2002. Prokaryotic evolution in light of gene
transfer. Mol.Biol.Evol. 19:2226-2238.
Griffiths, E., and R.S. Gupta. 2004a. Distinctive protein sequences provide molecular markers
and evidence for the monophyletic nature of the Deinococcus-Thermus phylum. J.Bact.
186:3097-3107.
Griffiths, E., and R.S. Gupta. 2004b. Signature sequences in diverse proteins provide
evidence for the late divergence of the order Aquificales. Int.Microbiol. 7:41-52.
Gupta, R.S. 1998. Protein phylogenies and signature sequences: a reappraisal of evolutionary
relationships among archaebacteria, eubacteria and eukaryotes. Microbiol.Mol.Biol.Rev. 62:1435-1491.
Gupta, R.S., T. Mukhtar, and B. Singh. 1999. Evolutionary relationships among
photosynthetic prokaryotes (Heliobacterium chlorum, Chloroflexus aurantiacus,
cyanobacteria, Chlorobium tepidum and proteobacteria): implications regarding the origin of
photosynthesis. Mol.Microbiol. 32:893-906.
Gupta, R.S., M. Pereira, C. Chandrasekra, and V. Jahari. 2003. Molecular signatures in
protein sequences that are characteristic of cyanobacteria and plastid homologues.
Int.J.Sys.Evol.Microbiol. 53:1833-1841.
Hansmann, S., and W. Martin. 2000. Phylogeny of 33 ribosomal and six other proteins
encoded in an ancient gene cluster that is conserved across prokaryotic genomes: influence of
excluding poorly alignable sites from analysis. Int.J.Sys.Evol.Microbiol. 50:1655-1663.
Hiller, A., T. Henninger, G. Schäfer, and C.L. Schmidt. 2003. New respiratory genes
encoding subunits of a cytochrome bc1-analogous complex in the respiratory chain of the
hyperthermoacidophilic crenarchaeon Sulfolobus acidocaldarius. J.Bioenerg.Biomembr.
35:121-131.
Hunsicker-Wang, L.M., A. Heine, T. Chen, Y et al. (8 co-authors). 2003. High-resolution
structure of the soluble, respiratory-type Rieske protein from Thermus thermophilus: Analysis
and comparison. Biochemistry 42:7303-7317.
Hunte, C., J. Koepke, C. Lange, T. Rossmanith, and H. Michel. 2000. Structure of the yeast
cytochrome bc1-complex co-crystallized with an antibody FV-Fragment. Structure 8:669–684.
Iwata, S., M. Saynovits, T.A. Link, and H. Michel. 1996. Structure of a water soluble
fragment of the Rieske iron-sulfur protein of the bovine heart mitochondrial cytochrome bc1-complex determined by MAD phasing at 1.5 Å resolution. Structure 4:567-579.
Kauppi, B., K. Lee, E. Carredano, R.E. Parales, D.T. Gibson, H. Eklund, and S. Ramaswamy.
1998. Structure of an aromatic-ring-hydroxylating dioxygenase – Naphthalene 1,2-dioxygenase. Structure 6:571-586.
Kurisu, G., H. Zhang, J.L. Smith, and W.A. Cramer. 2003. Structure of the cytochrome b6f
complex of oxygenic photosynthesis: tuning the cavity. Science 302:1009-1014.
Kurland, C.G., B. Canback, and O.G. Berg. 2003. Horizontal gene transfer: A critical view.
Proc.Natl.Acad.Sci. USA 100:9658-9662.
Kyrpidis, N., R. Overbeek, and C. Ouzounis. 1999. Universal protein families and the
functional content of the last universal common ancestor. J.Mol.Evol. 49: 413-423.
Lebrun, E., M. Brugna, F. Baymann, D. Muller, D. Lièvremont, M.-C. Lett, and W. Nitschke.
2003. Arsenite oxidase, an ancient bioenergetic enzyme. Mol.Biol.Evol. 20:686-693.
Link, T.A. 1999. The structures of Rieske and Rieske-type proteins. Advances in Inorganic
Chemistry 47: 83-157.
Muller, D., D. Lièvremont, D.D. Simeonova, J.C. Hubert, and M.C. Lett. 2003. Arsenite
oxidase aox genes from a metal-resistant beta-proteobacterium. J.Bact. 185:135-141.
Nesbø, C.L., S.L. Haridon, K.O. Stetter, and W.F. Doolittle. 2001. Phylogenetic analysis of two
"archaeal" genes in Thermotoga maritima reveal multiple transfers between Archaea and
Bacteria. Mol.Biol.Evol. 18:362-375.
Olsen, G. J., C.R. Woese, and R. Overbeek. 1994. The winds of (evolutionary) change:
breathing new life into microbiology. J. Bacteriol. 176:1-6.
Ouchane, S., W. Nitschke, P. Bianco, D.D. Verméglio, and C. Astier. 2005. Multiple Rieske
genes in prokaryotes: exchangeable Rieske subunits in the cytochrome bc1 complex of
Rubrivivax gelatinosus. Mol.Microbiol. 57: 261-275.
Raymond, J., O. Zhaxybayeva, J.P. Gogarten, S.Y. Gerdes, and R.E. Blankenship. 2002.
Whole-genome analysis of photosynthetic prokaryotes. Science 298:1616-1620.
Raymond, J., and R.E. Blankenship. 2003. Horizontal gene transfer in eukaryotic algal
evolution. Proc.Natl.Acad.Sci. USA 100:7419-7420.
Santini, J.M., and R.N. vanden Hoven. 2004. Molybdenum containing arsenite oxidase of the
chemolithoautotrophic arsenite oxidiser NT-26. J. Bact. 186:1614-1619.
Schmidt, C.L., and L. Shaw. 2001. A comprehensive phylogenetic analysis of Rieske and
Rieske-type iron-sulfur proteins. J.Bioenerg.Biomembr. 33:9-26.
Schneider, D., S. Berry, T. Volkmer, A. Seidler, and M. Rögner. 2004. PetC1 is the major
Rieske iron-sulfur protein in the cytochrome b6f complex of Synechocystis sp. PCC6803.
J.Biol.Chem. 279:39383-39388.
Schröter, T., O.M. Hatzfeld, S. Gemeinhardt, M. Korn, T. Friedrich, B. Ludwig, and T.A.
Link. 1998. Mutational analysis of residues forming hydrogen bonds in the Rieske [2Fe-2S]
cluster of the cytochrome bc1-complex in Paracoccus denitrificans. Eur. J. Biochem. 255:
100-106.
Schütz, M., M. Brugna, E. Lebrun, et al. (9 co-authors). 2000. Early evolution of cytochrome
bc complexes. J. Mol. Biol. 300:663-675.
Snel, B., P. Bork, and M.A. Huynen. 1999. Genome phylogeny based on gene content. Nature
Genet. 21:108-110.
Sone, N., K. Nagata, H. Kojima, J. Tajima, Y. Kodera, T. Kanamaru, S. Noguchi, and J.
Sakamoto. 2001. A novel hydrophobic diheme c-type cytochrome. Purification from
Corynebacterium glutamicum and analysis of the QcrCBA operon encoding three subunit
proteins of a putative cytochrome reductase complex. Biochim.Biophys.Acta 1503:279-290.
Sone, N., M. Fukuda, S. Katayama, A. Jyoudai, M. Syugyou, S. Noguchi, and J. Sakamoto.
2003. QcrCAB operon of a nocardia-form actinomycete Rhodococcus rhodochrous encodes
cytochrome reductase complex with diheme cytochrome cc subunit. Biochim.Biophys.Acta
1557:125-131.
Stroebel, D., Y. Choquet, J.-L. Popot, and D. Picot. 2003. An atypical haem in the
cytochrome b6f complex. Nature 426:413-418.
Thompson, J.D., T.J. Gibson, F. Plewniak, F. Jeanmougin, and D.G. Higgins. 1997. The
ClustalX windows interface: flexible strategies for multiple sequence alignment aided by
quality analysis tools. Nucleic Acids Research 25:4876-4882.
vanden Hoven, R.N., and J.M. Santini. 2004. Arsenite oxidation by the heterotroph
Hydrogenophaga sp. str. NT-14: the arsenite oxidase and its physiological electron acceptor.
Biochim.Biophys.Acta 1656:148-155.
Vignais, P.M., B. Billoud, and J. Meyer. 2001. Classification and phylogeny of hydrogenases.
FEMS Microbiol.Rev. 25:455–501.
Woese, C.R. 1987. Bacterial evolution. Microbiol.Rev. 51:221-271.
Xiong, J., W.M. Fischer, K. Inoue, M. Nakahara, and C.E. Bauer. 2000. Molecular evidence
for the early evolution of photosynthesis. Science 289:1724-1730.
Yang, S., R.F. Doolittle, and P.E. Bourne. 2005. Phylogeny determined by protein domain
content. Proc.Natl.Acad.Sci. USA 102:373-378.
Zhang, Z., L. Huang, V.M. Shulmeister, Y.I. Chi, K.K. Kim, L.W. Hung, A.R. Crofts, E.A.
Berry, and S.H. Kim. 1998. Electron transfer by domain movement in cytochrome bc1. Nature
392:677-684.
Zuckerkandl, E., and L. Pauling. 1965. Molecules as documents of evolutionary history.
J.Theor.Biol. 8:357-366.
Figure legends
Figure 1:
NJ-phylograms reconstructed from multiple sequence alignments of Rieske proteins
(a,c) and cytochrome b (b, below dashed horizontal line) as well as the Molybdopterin subunit
(b, above dashed line) from Rieske/cytb complexes and arsenite oxidases. The alignments in
(a) and (b) were produced by Clustal whereas the alignment in (c) relies on superposition of
structurally conserved stretches as detailed in the text. Dash-dotted lines indicate inter-phylum
clashes between trees. Thin dotted lines correlate branches of trees (a) and (c) to the species-labelled ones of tree (b). Bootstrap supports exceeding 90% are denoted by dots.
Figure 2:
(a) and (b) represent the structure of Rieske proteins emphasizing the conserved β-sheet skeleton with the large domain (magenta in a and grey in b) and the cluster-bearing
small domain (in orange). The transition from the N-terminal transmembrane helix to the
hydrophilic head-domain is indicated in green. The figure furthermore shows the most
divergent versions of β-strand connecting stretches as observed in 3D-structures so far, such
as the β3/β4-loops in the b6f- (green) and the bc1-complexes (light magenta) and the SoxF
protein from Sulfolobus acidocaldarius (light blue) as well as the β2/β3-loop in the
mitochondrial enzyme (grey) as compared to that seen in the Thermus thermophilus Rieske
protein (purple). The long insertion between the cluster ligating boxes in SoxF is represented
in red and the C-terminal extension of the cyanobacterial/plastidic enzyme in dark blue.
(c)-(h) show the structural incorporation of this virtual Rieske protein into the
presently known 3D-structures of parent enzymes, i.e. the mitochondrial bc1-complex (c, d),
the cyanobacterial/plastidic b6f complex (e, f) and a proteobacterial arsenite oxidase (g, h). In
(c), (e) and (g), view directions are within the membrane plane (indicated by hatches and
delimited by dashed lines), whereas in (d), (f) and (h) the enzymes are observed from the
periplasmic side of their parent enzymes.