MBE Advance Access published March 28, 2006

The Rieske protein; a case study on the pitfalls of multiple sequence alignments and phylogenetic reconstruction

Research Article

Evelyne Lebrun*, Joanne M. Santini#, Myriam Brugna*, Anne-Lise Ducluzeau*, Soufian Ouchane&, Barbara Schoepp-Cothenet*, Frauke Baymann* & Wolfgang Nitschke*+

* Laboratoire de Bioénergétique et Ingénierie des Protéines, Institut de Biologie Structurale et Microbiologie (IFR..), 31 chemin Joseph-Aiguier, 13402 Marseille Cedex 20, France
# Department of Biology, University College, Room 524 Darwin Building, Gower Street, London WC1E 6BT, UK
& Centre de Génétique Moléculaire CNRS (UPR 2167), Bât. 24, avenue de la Terrasse, 91198 Gif-sur-Yvette Cedex, France
+ corresponding author: Wolfgang Nitschke, BIP/CNRS, 31 chemin Joseph-Aiguier, 13402 Marseille Cedex 20, phone: +33 491164435, fax: +33 491164578, e-mail: nitschke@ibsm.cnrs-mrs.fr

Keywords: Rieske protein, phylogeny, lateral gene transfer, indel, bioenergetics
Running Head: Phylogeny of Rieske proteins

The Author 2006. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oxfordjournals.org

Abstract

Previously published phylogenetic trees reconstructed on Rieske protein sequences frequently are at odds with each other, with those of other subunits of the parent enzymes and with small subunit rRNA trees. These differences are shown to be at least partially if not completely due to problems in the reconstruction procedures. A major source of erroneous Rieske protein trees lies in the presence of a large, poorly conserved domain prone to accommodate very long insertions in well-defined structural hotspots substantially hampering multiple alignments. The remaining smaller domain, in contrast, is too conserved to allow distant phylogenies to be deduced with sufficient confidence.
3D structures of representatives from this protein family are now available from phylogenetically distant species and from diverse enzymes. Multiple alignments can thus be refined on the basis of these structures. We show that structurally guided alignments of Rieske proteins from Rieske/cytb complexes and arsenite oxidases strongly reduce conflicts between resulting trees and those obtained on their companion enzyme subunits. Further problems encountered during this work, mainly consisting of database errors such as wrong annotations and frameshifts, are described. The results obtained are discussed against the background of hypotheses postulating pervasive lateral gene transfer in prokaryotes.

Introduction

Since the early 1960s, when Zuckerkandl and Pauling recognised that protein sequences are documents of evolutionary history (Zuckerkandl & Pauling 1965), protein and ribonucleotide sequence signatures have been used to infer family relationships between species. The analysis of hemo/myoglobin and mitochondrial cytochrome c proved the validity of this approach while at the same time already revealing some of the limitations of the method. Small subunit (ssu) rRNA has subsequently replaced proteins as reference molecules for taxonomic classification (Woese 1987; Olsen, Woese and Overbeek 1994). More recently, the availability of a rapidly increasing number of whole genome sequences reinvigorated interest in protein-based phylogenetic studies (Doolittle and Logsdon 1998; Snel et al. 1999; Hansmann and Martin 2000; Nesbø et al. 2001; Clarke et al. 2002; Raymond et al. 2003). The comparison of phylogenetic trees derived from selected proteins and from ssu rRNA frequently revealed substantial discrepancies between the respective trees. These incongruencies have in the last few years increasingly been interpreted as providing evidence for a high frequency of lateral gene transfer (Doolittle and Logsdon 1998; Doolittle 1999; Nesbø et al.
2001; Gogarten, Doolittle and Lawrence 2002; Raymond et al. 2002), implying that the "tree of life" might actually look more like an interwoven mesh than a hierarchical tree. Such a mesh-like structure of the phylogenetic relationship between species would have far-reaching consequences for the study of the metabolic capacities of the earliest cells thriving on Earth. While the results of comparative studies of metabolic pathways in the framework of a (mostly) hierarchical tree were taken to indicate a very early origin of the majority of these mechanisms (Castresana and Moreira 1999; Kyrpides et al. 1999; Schütz et al. 2000; Castresana 2001; Baymann et al. 2003), a mesh-like topology of the phylogenetic tree of species would suggest more recent lateral gene transfer rather than common ancestry as the most likely cause for the universal distribution of metabolic capabilities. However, the notion of the phylogenetic tree of prokaryotes not being mainly hierarchical has been challenged during recent years (Clarke et al. 2002; Daubin et al. 2003; Eisen and Fraser 2003; Kurland et al. 2003; Yang et al. 2005). These studies argue that discordant phylogenies possibly arising from methodological problems are too readily blamed on lateral gene transfer by the proponents of the net-like ancestries.

In this work we describe various parameters potentially influencing topologies of phylogenetic trees, as illustrated by a specific example, the "Rieske" [2Fe-2S]-protein. Rieske proteins are found as subunits of a number of different enzymes (Carrell et al. 1997; Colbert et al. 2000; Lebrun et al. 2003), the most prominent of which is the Rieske/cytb complex, a key enzyme of chemiosmotic bioenergetic electron transport chains. This enzyme has been the object of several phylogenetic studies in the past (Castresana 1995; Schütz et al. 2000; Xiong et al. 2000; Sone et al. 2001; Baymann et al. 2003).
Whereas the topology of the phylogenetic tree obtained from sequences of the cytochrome b subunit fits standard ssu rRNA trees well (with the obvious exception of the endosymbiotically acquired proteins of mitochondria and plastids in eukaryotes), published phylogenies of the Rieske subunit strongly deviate from those of the cytochrome b subunit and of the ssu rRNA present in the corresponding species (Schütz et al. 2000; Schmidt and Shaw 2001). Horizontal gene transfer lends itself as an obvious explanation for this fact and has indeed been discussed previously (Schmidt and Shaw 2001). The integration of the gene coding for the Rieske protein of Rieske/cytb complexes in an operon with the cytochrome b subunit in all cases studied so far (except the cyanobacteria), however, raises doubts about the feasibility of frequent horizontal gene transfer events. Furthermore, several differing trees have been reported, raising the suspicion that weak robustness of Rieske protein phylogenetic trees rather than horizontal gene transfer may account for the observed differences.

The results presented below highlight the difficulties in deducing reliable multiple alignments for phylogenetically distant and hence strongly divergent sequences of members from the Rieske protein superfamily. An inspection of 3D structures of Rieske proteins from diverse representatives (Iwata et al. 1996; Carrell et al. 1997; Zhang et al. 1998; Ellis et al. 2001; Bönisch et al. 2002; Hunsicker-Wang et al. 2003; Kurisu et al. 2003; Stroebel et al. 2003) reveals the presence of extensive indels in structural hotspots. These indels complicate multiple alignment procedures based on commonly used algorithms. Guiding multiple alignments of Rieske proteins by structural, genetic, biochemical and biophysical information results in a substantial increase in congruency of Rieske-, cytochrome b- and ssu rRNA-trees.
In addition to the alignment problem, a number of further parameters potentially leading phylogenetic trees of Rieske proteins astray will be addressed.

Materials and Methods

ORFs coding for Rieske proteins, cytochrome b from Rieske/cytb complexes and molybdenum-subunits of arsenite oxidase were retrieved from the NCBI (http://www.ncbi.nlm.nih.gov) and KEGG servers (http://www.genome.ad.jp/kegg-bin/). Several unfinished genomes were analysed via the TIGR-server (http://www.tigr.org). Coordinates of the Rieske proteins from yeast mitochondrial cytochrome bc1-complex (entry 1KB9), plastidic cytochrome b6f complex (1Q90), Sulfolobus acidocaldarius (1JM1), Thermus thermophilus (1NYK) and arsenite oxidase from Alcaligenes faecalis (1G8J) were obtained from the PDB database (http://www.rcsb.org/pdb/).

Structural alignments were obtained using the rms-fit option of the Swiss-PdbViewer (version 3.7; http://www.expasy.org/spdbv). All available structures were found to yield good structural alignments for the β-strand skeleton β2 to β7. A Swiss-PdbViewer file was created containing those structures aligned as described above in successive layers. In the parent layer, representing the arsenite oxidase Rieske protein, all residues except those present in the β-strand skeleton were subsequently deleted, whereas in the other layers the β-skeleton was taken out and only the connecting loops were retained. Visualising all layers therefore yielded a Rieske protein featuring short and long loop connections as observed in the structures. This "virtual" Rieske structure was subsequently reintegrated into the parent enzymes cyt bc1-, cyt b6f complex and arsenite oxidase by structurally aligning its β-sheet skeleton with that of the genuine Rieske protein present in the complex and then deleting the latter from the structure file.

Secondary structure prediction was performed by means of the pSAAM-package (www.life.uiuc.edu/crofts/ahab/psaam.html). ClustalX (Thompson et al.
1997) was used to obtain multiple sequence alignments of cytochrome b, the arsenite oxidase molybdenum-subunit and of sequence stretches not fixed by the structural alignment in the Rieske proteins. For cytochrome b, the ClustalX alignment was found to superimpose well secondary features such as transmembrane and membrane-parallel helices as well as functionally important residues. Phylogenetic trees were reconstructed from these alignments using the neighbour-joining (NJ-) algorithm implemented in ClustalX or using the parsimony method (Phylip package). Sequences analysed in this work are detailed in Table I of the Supplementary Material.

Results

Defining the limits of the present analysis

The "Rieske protein" was identified 40 years ago as an indispensable subunit of the mitochondrial cytochrome bc1-complex (Rieske et al., 1964) and since then has been found in all cytochrome bc-type enzymes. Since the Rieske protein and cytochrome b form the functional core of a "cytochrome bc-complex", it has been proposed to rename this enzyme family the "Rieske/cytochrome b" complexes (Schütz et al., 2000). In addition to their role as a crucial subunit of these enzymes, however, Rieske proteins are widespread electron transfer proteins and participate in very divergent enzymes. Characterised examples comprise the proteobacterial dioxygenases (Schmidt and Shaw, 2001) and the prokaryotic arsenite oxidases (Ellis et al., 2001). In Archaea, Rieske proteins appear to serve as soluble electron carriers (Iwasaki et al., 1996).

Ideally, a phylogenetic tree encompassing all Rieske proteins should be arrived at. Previous reports attempted the construction of a common tree for the proteins of Rieske/cytb complexes, dioxygenases and the archaeal soluble electron shuttles (Schmidt and Shaw 2001). As will be detailed below, two criteria strongly improve the reliability of phylogenetic tree reconstruction, i.e.
(a) the presence of crystal structures and (b) a good coverage of the species tree for the individual enzymes. In contrast to Rieske/cytb complexes (Schütz et al. 2000; Schmidt and Shaw 2001) and arsenite oxidases (Lebrun et al. 2003), the sample of available sequences of dioxygenase Rieske proteins is incomplete. Almost all representatives are from γ-proteobacterial enzymes with very divergent substrate specificities. In some of these enzymes, the Rieske protein is actually a domain fused to the large catalytic subunit (Kauppi et al. 1998). It therefore seems quite likely to us that the bacterial dioxygenases by themselves form a diverse family of enzymes of which only a rather limited subset of primary sequences is known so far. The same is true for the archaeal soluble Rieske proteins and the desaturases. Moreover, crystal structures for the latter two groups are lacking. We have therefore decided to restrict our present analysis to the Rieske proteins from Rieske/cytb and arsenite oxidase enzymes, for which both crystal structures and sufficient sequence coverage are available.

Data mining and correction of genome sequence errors

To have as complete a set of sequences as possible at our disposal for the reconstruction of phylogenetic trees, we searched protein and nucleotide databases as well as whole genomes for the presence of operons encoding Rieske/cytb complexes and arsenite oxidases. A few selected Rieske genes not situated in such operon contexts but for which biochemical/molecular biological evidence suggests a function in the respective Rieske/cytb complexes (Schneider et al. 2004; Ouchane et al., 2005) were included in the analysis.
A closer inspection of ORFs retrieved from whole genomes which were both (i) annotated as Rieske or cytochrome b genes and (ii) present in an operon context with the other subunits of the Rieske/cytb complexes or arsenite oxidase showed that a sizeable fraction of these genes featured complications hampering phylogenetic reconstruction or outright contained severe sequence errors.

(a) Gene fusions: In a sub-group of the α-proteobacteria (e.g. Rhodopseudomonas palustris and Bradyrhizobium japonicum) as well as in most Bacilli, the c-type cytochrome of the Rieske/cytb complex is fused to the C-terminal end of cytochrome b. Failing to remove the cyt c part of the sequence results in an artificial attraction of these two phylogenetically rather distant groups of species.

(b) N-terminal extensions: The gene of the Rieske protein in most Actinobacteria contains a long sequence stretch coding for two additional N-terminal transmembrane helices. These helices are indeed present in the mature protein (Sone et al. 2001; 2003). Equivalent sequences coding for hydrophobic stretches are present in the mitochondrial Rieske genes. The stretches in the genes of the mitochondrial proteins serve to target the gene products to the organelle and are subsequently cleaved off. Clustal failed to correctly align the bulk of the protein unless these stretches were removed from the respective sequences.

(c) Erroneous lengths of ORFs and frameshifts: In a few cases (see Table I in the Supplementary Material), annotated ORFs did not represent the full-length genes. This was found to be due to erroneous interpretation of within-sequence methionines as N-termini and to frameshifts induced by sequencing errors. Correct ORFs were determined directly from the base sequences of the genomes, and truncations resulting from obvious frameshifts were corrected for our analysis.
(d) Erroneous annotation: The genes for arsenite oxidase in Chloroflexus aurantiacus and Aeropyrum pernix are incorrectly annotated in the respective genomes despite unambiguous BLAST-scores. Failing to detect misannotations will either reduce species coverage or confuse the ortholog/paralog distinction.

Multiple sequence alignments

The set of sequences for Rieske proteins and cytochrome b from Rieske/cytb complexes retrieved as described above was fed into a Clustal multiple alignment routine. Phylogenetic trees were subsequently built from these multiple alignments by the neighbour-joining method. Resulting trees for the Rieske protein and cytochrome b are shown in Fig. 1a and b, respectively. These two trees differ in numerous places. Apart from minor branching differences in the hierarchically high nodes of the proteobacteria, a number of inter-phyla replacements are observed, such as the positionings of the heliobacteria/desulfitobacterium- or the chlorobiaceae-clades and that of the SoxF-protein from an archaeal Rieske/cytb complex. Arsenite oxidases do not contain cytochrome b but a molybdopterin-containing catalytic subunit, and we therefore added the phylogeny obtained for this protein in the upper part of the tree in Fig. 1b.

Since an incorrect multiple alignment might rationalise the deviant tree topologies, we tried to assay the validity of these alignments. For cytochrome b, only structures for the mitochondrial and for the plastidic/cyanobacterial representatives are available to date. Cytochrome b is an integral membrane protein with 7-12 transmembrane helices (n.b.: the split cyt b6/subunit IV gene is treated as a single sequence in this work), and thus the overall correctness of the alignment can be verified by comparing the positioning of the hydrophilic stretches.
Furthermore, four glycine and four histidine residues involved in binding the two b-hemes as well as the so-called PEWY-motif participating in forming the substrate binding Qo-site are fully conserved in all cytochrome b sequences (e.g. see Schütz et al. 2000). These criteria allowed us to conclude that in the major part of the sequence, Clustal had managed to correctly align these proteins. The only uncertain stretches occur in the region of the b6/SUIV-split and at the C-terminal end after the P(EWY)-motif. Using standard gap penalties, Clustal produces numerous indels in these regions. To minimise biases introduced by misalignments in these stretches, positions containing gaps were excluded from the analysis yielding the tree of Fig. 1b. However, even including these positions in NJ-calculations did not significantly modify the tree topology.

For the Rieske proteins, however, a substantially different result was obtained. Comparison of the Clustal-alignment underlying the phylogram of Fig. 1a and the published crystal structures showed that the N-terminal halves of the sequences were strongly misaligned in many cases. By contrast, in the C-terminal half of the sequences, the Clustal alignment fitted the structural alignment extremely well. To understand this peculiar pattern, a detailed inspection of the published structures was performed.

Global structural features of an archetypal Rieske protein

3D-structures of Rieske proteins that are part of Rieske/cytb-type enzymes have been reported from mitochondria (Iwata et al. 1996; Zhang et al. 1998; Hunte et al. 2000), chloroplasts (Carrell et al. 1997; Stroebel et al. 2003) and cyanobacteria (Kurisu et al. 2003), the thermophilic Bacterium Thermus thermophilus (Hunsicker-Wang et al. 2003) and the hyperthermoacidophilic Archaeon Sulfolobus acidocaldarius (Bönisch et al. 2002).
Since the amino acid sequences of the mitochondrial protein are very similar to those of the protein in α-proteobacteria, these five structures represent examples from very diverse regions of the phylogenetic tree of the prokaryotes and therefore provide a good picture of conserved and variable structural elements in the Rieske proteins from the Rieske/cytb complexes. In addition, the structure of the Rieske subunit in arsenite oxidase has been solved (Ellis et al. 2001). As has been noted previously (Hunsicker-Wang et al. 2003), the conserved structural core of Rieske proteins consists of 10 antiparallel β-strands. For the sake of a comparative analysis, we have created the structure file of a virtual Rieske protein encompassing all the structural features of its phylogenetically diverse individual constituents via a structural alignment of this β-sheet skeleton (see Materials and Methods Section). This "chimeric" Rieske protein is shown in Fig. 2a,b emphasising the conserved and the variable elements. Three obvious tertiary structural elements can be distinguished: (a) the cluster binding domain with highly conserved sequence motifs extending from strand β5 to strand β8 (marked in orange in Fig. 2a), (b) the "large" domain formed by strands β1 to β4 and β9/β10 together with structural elements (loops or helices) connecting these β-strands (dark magenta in Fig. 2a) and (c) the hydrophobic α-helix preceding the large domain and anchoring the protein in the membrane (green in Fig. 2a-f). Considering the extreme evolutionary distance between these proteins (spanning all three domains of life as well as comprising two quite distinct enzymes), the high degree of structural conservation is striking. As shown below, this surprisingly low structural variability is due to strong functional constraints exerted on this protein.
Substantial sequence variability occurs at well-defined mutational hot spots

While demonstrating a strong conservation of global structure and in particular of the β-sheet skeleton throughout Rieske proteins from both Rieske/cytb complexes and arsenite oxidases in all domains of the phylogenetic tree, Figure 2a,b also clearly shows the presence of specific regions of extensive structural variability with indels of up to 35 residues. These indels mainly involve β-strand-connecting loops. Three such regions can be discerned in the presently available structures, the most prominent one being the loop connecting strands β3 and β4. For all three strand-connecting loops, the most divergent versions observed in the structures are schematically indicated in Fig. 2a,b. Fig. 2b represents a view of the protein facilitating the visualisation of the three strongly variable stretches with respect to the invariable β-sheet core (in grey). For the β3/β4-connecting loop, two different maximal-length loops are shown, encountered in (α-, β-, γ-) proteobacteria (pink) and in crenarchaeota (cyan). The phylogenetic distance between these two phyla is enormous. The fact that much shorter insertions are observed in all other phyla raises the suspicion that a long insertion in this position may have occurred twice independently. The strongly divergent fold of these stretches shown in Fig. 2a,b supports this scenario, which is further corroborated by the complete absence of sequence homology (see below). Intriguingly, the space occupied by these two insertions connecting β-strands 3 and 4 is filled in by the C-terminal extension of the cyanobacterial/plastidic Rieske protein (dark blue in Fig. 2a-f).
This observation of spatial regions harbouring significant structural variability as opposed to those featuring an invariant scaffold suggests that the mutability of the Rieske protein is largely dominated by constraints imposed by its interactions with other subunits of the respective parent enzymes.

Structural constraints determining positions of hotspots

Figure 2 also shows an incorporation of the virtual chimeric protein into the available 3D-structures of the parent enzymes, i.e. the cytochrome bc1-complex from mitochondria (Fig. 2c,d), the cytochrome b6f complex from cyanobacteria and chloroplasts (Fig. 2e,f) and arsenite oxidase from a β-proteobacterium (Fig. 2g,h). In the images on the left side of the figure, the enzymes are viewed along an axis parallel to the membrane, whereas the right hand views represent the enzymes seen from the positively charged (periplasmic/luminal) side of the membrane. The inner and outer surfaces of the membrane in the side views (left) are indicated by hatched boxes.

In the mitochondrial bc1 complex, the long extension connecting β-strands 3 and 4 is found on the extreme periphery of the enzyme (pink in Fig. 2c,d; dotted circle) and does not interact with other subunits in the functional dimer. In the cytochrome b6f complex, where the structurally unrelated cytochrome f replaces cytochrome c1, the long extension present in the mitochondrial Rieske protein would intrude into the space occupied by cytochrome f's small domain, rationalising the absence of an insertion in the b6f-Rieske protein (arrow in Fig. 2f; the structural overlap occurs on the level of the side chains whereas only backbone traces are shown in Fig. 2). In arsenite oxidase, the interaction surface on the Rieske protein with the large, catalytic subunit (shown in green) strongly differs from that involved in the Rieske/cytb complexes.
The β3/β4-connecting loops both of bc1 complexes and of the Sulfolobus enzyme as well as the C-terminal extension of the cytochrome b6f complex would structurally interfere with the catalytic subunit (arrows in Fig. 2g) and they are correspondingly absent in all arsenite oxidase Rieske proteins (see multiple alignment in Supplementary Material). Intriguingly, the loops equivalent to the β2/β3-connecting region in the Rieske protein from Thermus thermophilus (purple in Fig. 2) would be fully exposed at the surface of arsenite oxidase (Fig. 2g, dotted circle), suggesting that insertions might be tolerated in this position. Yet, such an insertion is not observed in any of the arsenite oxidases. A structural reason for this absence might be sought in the association of the enzyme with the cytoplasmic membrane. EPR data obtained on the arsenite oxidase from Chloroflexus aurantiacus indeed suggest a geometry of association with the membrane as indicated in Fig. 2g (Lebrun et al. 2003). The question of arsenite oxidase's membrane attachment actually is a matter of debate. Whereas the corresponding enzymes in Cenibacterium arsenoxidans (Muller et al. 2003) and Chloroflexus aurantiacus (Lebrun et al. 2003) have been reported to be associated with the cytoplasmic membrane, those from "Alcaligenes faecalis" (Ellis et al. 2001), Rhizobium sp. str. NT-26 (Santini and vanden Hoven 2004) and Hydrogenophaga sp. str. NT-14 (vanden Hoven & Santini 2004) represent fully soluble isolates. The genes encoding the arsenite oxidase Rieske protein contain the N-terminal transmembrane helix (see sequence alignment in Supplementary Material). However, this part of the protein, corresponding to a signal sequence required for translocation via the Tat (Twin Arginine Translocation)-system, may well be cleaved off in the mature enzyme.
Even if a membrane-anchoring helix were not retained in the functional state of the enzyme by all or part of arsenite oxidase's Rieske proteins, a high affinity of this protein for the cytoplasmic membrane would still be possible, in which case a loop extension between β2 and β3 would not be tolerated.

The loop between the strands β5 and β6 in SoxF from Sulfolobus acidocaldarius (red in Fig. 2) overlaps with the large subunit of arsenite oxidase but also with the small domain of cytochrome f of the cyanobacterial/plastidic b6f complex (arrows in Fig. 2f,g). In line with the fact that the small size of cytochrome c1 as compared to cytochrome f opens up space for this loop, corresponding insertions are indeed present in a number of proteobacterial complexes (see multiple alignment in Supplementary Material). In turn, the strict absence of this insertion in the group of the actinobacterial Rieske/cytb complexes suggests that the electron accepting redox protein likely occupies this particular space. Indeed, a (probably bulkier) diheme cytochrome has been reported to play the role of cytochrome c1 or cytochrome f in these bacteria (Sone et al. 2001).

All the mentioned examples show that evolutionary variability in Rieske proteins is only tolerated in a few well-defined areas of the structure. In these specific places, however, very extensive sequence and structural variability is observed. The structural conservation required by the embedding into the parent enzymes allows only structurally neutral exchanges in the remaining bulk of the protein. In the cluster binding domain (representing roughly one third of the total sequence), additional functional constraints (cluster ligation and stabilisation, redox potential and pK-value thereof; for a detailed discussion see Link 1999) further limit the degree of free evolutionary change, strongly suppressing the evolutionary signal in this part of the sequence.
Phylogenies of Rieske proteins obtained from structure-based multiple sequence alignments

The structural alignment obtained for Rieske proteins with a published 3D structure was used as basic scaffold for a multiple alignment. Secondary structural elements such as β-sheets and turns were predicted from the amino acid sequences for phyla where no structural information is available. Stretches predicted to correspond to each other were further aligned by Clustal. As a control, the same procedure was applied to the proteins with known crystal structures. The alignment obtained by this method was found to be identical to the rigorously structural alignment. The resulting multiple alignment was then used to produce an NJ-phylogram (Fig. 1c).

The comparison of the trees in Fig. 1 demonstrates that the number of clashes between the cytochrome b- and the Rieske tree is substantially reduced for the tree obtained on the structure-based alignment (Fig. 1c). In particular, no discordant topologies for the deep-branching nodes persist in the trees b and c. Disagreements, however, remain in the highest nodes of the proteo- and cyanobacteria. For these nodes, the topologies of the trees based on the Clustal-only (Fig. 1a) and the structure-assisted (Fig. 1c) alignments are identical. This is not astonishing since the respective sequences are very closely related and thus relatively straightforward to align. Moreover, both these trees are perfectly in line with ssu-rRNA trees in that they reproduce the α- to ε-subdivisions of the proteobacteria with the β- and γ-subdivisions forming a common clade that branches off from the α-proteobacterial lineage. The β/γ-clustering of the proteobacterial subdivisions is much less rigorous in the cytochrome b-tree (Fig. 1b). For example, the β-proteobacterium Neisseria meningitidis clusters with the γ-proteobacteria whereas Allochromatium vinosum (γ-subdivision) is found within the β-group.
The corresponding nodes both show relatively low bootstrap values in Fig. 1b. Furthermore, minor realignments in the region where the cytochrome b alignment is not fully determined (as discussed above) alter the positioning of the Allochromatium and Neisseria branches (not shown). Accordingly, it seems extremely unlikely to us that these disagreements reflect true phylogenetic discordances due to lateral gene transfer.

Phylogenies based on parsimony approaches had topologies almost identical (for more than 90% of all nodes) to those depicted in Fig. 1, a-c (not shown). Disagreements were restricted to orderings of a few of the highest nodes. Interestingly, both the structure-guided and the Clustal-only alignments yielded similar trees in NJ- and parsimony methods, demonstrating that, for the case of the considered sequences, the dominant source of error lies in the alignment rather than in the employed tree-building algorithm.

Discussion

The phylogenetic signal varies strongly along the amino acid sequence of Rieske proteins

As shown above, the Rieske proteins are characterised by a bimodal architecture. The domain ligating the [2Fe-2S] cluster and harbouring a strongly conserved disulfide bond is practically devoid of indels (with the exception of the SoxF case) and contains about 20% almost invariable (i.e. fewer than 3 different residues at a given site) sequence positions. At least two positions have been shown to tune the redox potential of the cluster (Denke et al. 1998; Schröter et al. 1998) and are thus determined by functional constraints. All these structural/functional constraints strongly limit the extent of freely variable sequence sites. Considering the phylogenetic diversity of parent organisms and enzymes, the sequence forming the cluster binding domain certainly is saturated with respect to the evolutionary signal, limiting the reliability of extractable phylogenetic information.
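
The "almost invariable" criterion used above (fewer than 3 different residues at a given site) can be made concrete with a short sketch. The Python below is an illustration only and not the procedure used in this work: the function names, the decision to ignore gap characters when counting residues, and the toy alignment are all assumptions introduced for the example.

```python
from typing import Sequence

GAP_CHARS = {"-", "."}  # assumption: both symbols denote alignment gaps


def distinct_residues(column: str) -> int:
    """Count the different residues in one alignment column, ignoring gaps."""
    return len({c.upper() for c in column if c not in GAP_CHARS})


def almost_invariable_fraction(alignment: Sequence[str], max_distinct: int = 2) -> float:
    """Fraction of columns with at most `max_distinct` different residues.

    With max_distinct=2 this corresponds to "fewer than 3 different
    residues at a given site". Columns consisting only of gaps are not counted.
    """
    if not alignment or not alignment[0]:
        raise ValueError("alignment is empty")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must have equal length")
    hits = 0
    for i in range(length):
        column = "".join(seq[i] for seq in alignment)
        if 1 <= distinct_residues(column) <= max_distinct:
            hits += 1
    return hits / length


# Hypothetical toy alignment, not taken from the Rieske data set.
toy = [
    "CTHLGC",
    "CPHLGC",
    "CTHMGC",
    "CSHIGC",
]
print(almost_invariable_fraction(toy))
```

In the toy alignment, four of the six columns carry a single residue and the other two carry three different residues each, so two thirds of the sites count as almost invariable. Applied to a real alignment of the cluster binding domain, the same count would give the kind of percentage quoted in the text, provided that gaps are treated as assumed here.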
The sequence of the large domain, by contrast, probably contains strong phylogenetically relevant information. The structurally conserved core of this domain is made up of β-sheet structural elements that are intrinsically much less mutationally constrained by side-chain interaction than α-helices. The fact that one side of the protein in the cytochrome bc1-complexes is in contact with cytochrome b forbids the presence of extended indels. The opposite side (on the periphery of the enzyme complex), however, features extensive insertions in several places along the primary structure. A high amount of phylogenetic information can therefore be extracted from the sequence of the large domain of Rieske proteins provided that the reliability of multiple sequence alignments is ensured by structural comparisons.

Reconstruction of phylogenetic trees for Rieske proteins requires both multiple alignments guided by 3D structural information and an informed selection of sequences

In our hands, none of the commonly used multiple alignment algorithms (e.g. CLUSTAL or MACAW) yielded results in line with the structural data regarding the large domain of Rieske proteins. Varying gap penalties extensively did not significantly improve the obtained alignments. This failure to correctly align these sequences is most probably due to the simultaneous presence of the weakly sequence-, but strongly structure-conserved β-sheet backbone and a number of long indels, which for instance make up roughly one third of the large domain in the case of the proteobacterial bc1-Rieske protein. The difficulties in aligning the large domain were noted previously and frequently led to the approach of restricting phylogenetic analysis to the cluster-binding domain (e.g. see Schmidt and Shaw 2001). Studies on other genes have indeed demonstrated changes in tree topologies upon exclusion of uncertain sites in multiple alignments (Hansmann and Martin 2000).
Reported alignments of Rieske proteins trying to consider the full sequence (e.g. Schütz et al. 2000) are structurally wrong in several places. In the absence of 3D structures of a wide sample of phylogenetically diverse Rieske proteins, exclusion of the large domain from analysis was thus certainly the best possible method to tackle the problem. However, the validity of this approach breaks down when it comes to comparing phylogenetically distant species, for which the number of remaining sites does not carry enough evolutionary information. Inclusion of the full sequence in a relatively reliable alignment, by contrast, results in a much better correspondence between the phylogenetic relationships of Rieske proteins and of cytochrome b (Fig. 1). This latter observation provides strong evidence that the structure-guided alignment is indeed much closer to reality than those either arrived at by automated sequence alignments of the full sequence or obtained on a reduced subset of strongly conserved sites.
In phylogenetic studies, congruency of tree topologies calculated via different methods is often taken to corroborate the reliability of the presented phylograms. The above-mentioned observation that parsimony trees deviate only slightly from NJ trees for all three cases shown in Fig. 1 raises concerns about the general validity of this argument. Alignment quality certainly prevails over the choice of tree-building method, and even when all available methods yield congruent results, a tree may be wrong if its underlying alignment is incorrect.
The difficulties encountered in arriving at a correct alignment for strongly divergent sequences, especially if they contain long indels, are certainly the most important, yet not the only, source of erroneous tree topologies. A further complication comes from the fact that a detailed inspection of the gene context is frequently required to distinguish between orthologous and paralogous genes.
Rieske genes in databases are classed into the cluster of orthologous genes COG 0723. As emphasised previously (Schmidt and Shaw 2001; Baymann et al. 2003), representatives of this COG belong to enzyme families as divergent as cytoplasmic dioxygenases, soluble electron shuttles, arsenite oxidases or Rieske/cytb complexes. The Rieske subunits in these enzymes diverged prior to the Bacteria/Archaea split, i.e. they are in fact paralogs rather than orthologs. Again, a machine-based retrieval of sequences from this COG followed by their multiple alignment without further selection will result in strongly distorted phylogenies. Conversely, true orthologs maximally covering the diversity of species need to be considered in the analysis. In the framework of the work presented here, the observation that including a single sequence, i.e. that from Desulfitobacterium hafniense, was crucial for attracting the two heliobacterial sequences into the Bacillus/cyanobacterial cluster emphasised the need for completeness. Prior to retrieval and inclusion of the D. hafniense Rieske gene, but using an identical alignment for the remaining sequences, the heliobacterial Rieske proteins were found to cluster with the Actinobacteria (as found in previous reports; see Schmidt and Shaw 2001), although with extremely low bootstrap values (below 5%). D. hafniense, lying midway between the Heliobacteria and Bacilli, allowed the clustering of this set of sequences, which is characterised by a strongly divergent sequence of the large domain.

Indels are not always good indicators for clades

The analysis of the presence/absence of insertions is employed as a method independent of the standard tree-building algorithms to deduce phylogenetic clades (Baldauf and Palmer 1993; Gupta 1998; Gupta et al. 1999; Gupta et al. 2003; Griffiths and Gupta 2004a,b).
This approach is intuitively appealing and avoids the psychological obstacle of relying on intricate mathematical algorithms based on sequence evolution models. This "indel" approach frequently yields tree topologies at variance with those obtained with the conventional methods (Gupta et al. 1999; Griffiths and Gupta 2004b), which is taken by the authors of the respective articles as evidence for the limitations of the conventional methods.
The analysis of the occurrence of indels in the Rieske protein detailed above suggests some caution in their use for cladistic purposes. As emphasised in Fig. 2, insertions occur at specific hotspots where the structure of the protein, and in particular its interactions with the other subunit(s) of the entire enzyme, allows additional bulk to be inserted. The most striking example is the loop connecting the sheets β3 and β4 which, being positioned on the periphery of the dimeric Rieske/cytb complex, houses insertions more than 30 residues long. Very long insertions in this position have occurred independently several times during the evolution of Rieske/cytb complexes (see for example the structurally and sequence-wise completely unrelated loops of mitochondria and Crenarchaeota). Conversely, a complete absence of a loop extension is observed in the Rieske/cytb complex from the Bacillus/cyanobacterial phylum and in arsenite oxidase, i.e. in proteins that even belong to different enzyme families. Cladistic interpretations of the occurrence of indels in the Rieske proteins would thus produce severely misleading results. The high probability of adding and removing sequence stretches in defined places of the sequence, rationalised by the structural analysis, limits the cladistic significance of such events.
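The pitfall of treating an insertion at a structural hotspot as a cladistic character can be illustrated with a minimal Python sketch. It scores the presence or absence of a long insertion in a given alignment window and partitions the taxa accordingly. All names, the window coordinates, the length cut-off and the toy sequences are assumptions invented for this example; they do not correspond to real Rieske sequences.

```python
def loop_length(aligned_seq, start, end, gap="-"):
    """Number of residues (non-gap characters) within the window [start, end)."""
    return sum(1 for ch in aligned_seq[start:end] if ch != gap)


def split_by_insertion(alignment, start, end, min_length):
    """Partition taxon names by presence/absence of an insertion of at least
    `min_length` residues in the alignment window [start, end)."""
    with_insertion, without_insertion = set(), set()
    for name, seq in alignment.items():
        if loop_length(seq, start, end) >= min_length:
            with_insertion.add(name)
        else:
            without_insertion.add(name)
    return with_insertion, without_insertion


# Hypothetical toy data: A1/A2 and B1/B2 share flanking sequence pairwise,
# whereas A1 and B1 each carry an unrelated insertion at the same hotspot.
toy = {
    "A1": "GSV" + "KKDDEEKK" + "TLW",
    "A2": "GSV" + "--------" + "TLW",
    "B1": "GAV" + "PQRSTNWY" + "SLW",
    "B2": "GAV" + "--------" + "SLW",
}
present, absent = split_by_insertion(toy, start=3, end=11, min_length=5)
```

In this toy case the presence/absence character groups A1 with B1, although the flanking sequence groups A1 with A2 and B1 with B2, and the two inserted stretches share no residues. This mirrors, in a deliberately simplified way, the situation described above for independently acquired loop insertions.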
Comparison of 16S rRNA and Rieske/cytb trees

As detailed above, founding multiple alignments of the Rieske proteins on alignments of the basic structural elements results in rather similar tree topologies for both subunits of the Rieske/cytb and arsenite oxidase enzymes. This topology can now be compared to species phylogenies (based on ssu rRNA). The tree shown in Fig. 1b corresponds rather closely to that proposed in standard phylogenetic studies (e.g. Olsen et al. 1994). A major discrepancy with the tree put forward by Olsen et al. (1994), however, consists in the fact that the Firmicutes (comprising the clades of the low-GC Gram-positive bacteria and the Heliobacteria) and the Actinobacteria (formerly denoted as high-GC Gram-positive bacteria) clearly do not form sister clades in the protein trees. This splitting in Rieske/cytb trees has already been noted by Sone (Sone et al. 2001; Sone et al. 2003). The apparent discrepancy between Rieske/cytb- and ssu rRNA-based phylogenies can be rationalised in two different ways: (i) the Rieske/cytb enzyme in one of the two phyla has been imported by lateral gene transfer, or (ii) one of the two trees shows a wrong branching order. In this context, it is noteworthy that 16S rRNA trees have been published which separate Actinobacteria and Firmicutes, positioning the Actinobacteria as a phylum closer to the root of the domain Bacteria. It therefore seems quite likely to us that Actinobacteria and Firmicutes are indeed not sister clades, as suggested by the trees in Fig. 1b and 1c.
The clustering of Firmicutes (including Heliobacteria), Chlorobiaceae and cyanobacteria seen in Fig. 1b,c corroborates scenarios for the emergence of photosystems which stipulate a common branching node for phototrophs containing an RCI-type, i.e. photosystem I-related, photosynthetic reaction centre (Schütz et al. 2000; Baymann et al. 2001).
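The question of whether two groups form sister clades in a given rooted tree can be stated precisely with a short Python sketch. Trees are written as nested tuples, and the function checks whether some internal node has the two groups as separate child subtrees. The representation, the function names and the two toy topologies are assumptions chosen for illustration; in particular, the toy trees are not the topologies of Fig. 1 or of any published ssu rRNA tree, and the outcome of such a test depends on how a tree is rooted.

```python
def leaves(node):
    """Set of leaf names below a node; a leaf is a plain string."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaves(child) for child in node))


def are_sister_clades(tree, group_a, group_b):
    """True if some internal node has one child subtree containing exactly
    `group_a` and another child subtree containing exactly `group_b`."""
    if isinstance(tree, str):
        return False
    child_sets = [leaves(child) for child in tree]
    if frozenset(group_a) in child_sets and frozenset(group_b) in child_sets:
        return True
    return any(are_sister_clades(child, group_a, group_b) for child in tree)


# Two hypothetical rooted topologies over the same four groups
tree_one = (("Firmicutes", "Actinobacteria"), ("Cyanobacteria", "Proteobacteria"))
tree_two = ((("Firmicutes", "Cyanobacteria"), "Proteobacteria"), "Actinobacteria")

sisters_in_one = are_sister_clades(tree_one, {"Firmicutes"}, {"Actinobacteria"})
sisters_in_two = are_sister_clades(tree_two, {"Firmicutes"}, {"Actinobacteria"})
```

Applied to real phylograms, the same test would have to be complemented by the support values of the nodes involved, since a sister relationship resting on a weakly supported node carries little weight.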
The Aquificales Rieske/cytb complex clusters with the ε-proteobacteria instead of representing the lowest branch of the bacterial domain as observed in ssu rRNA trees. The respective nodes on our protein trees are highly supported and similar in all phylograms of Fig. 1. Corresponding affinities of Aquifex genes to ε-proteobacterial representatives are observed for roughly 20% of the organism's genome. Aquificales are now known to share the hyperthermal deep-sea vent habitat with numerous ε-proteobacterial species (Corre et al. 2001), rationalising extensive lateral gene transfer between these phyla. The case of the Aquificales Rieske/cytb complex therefore almost certainly represents a true example of horizontally transferred genes observable in the trees of Fig. 1.
Incidentally, it is noteworthy that the trees b and c in Fig. 1 both support the Opisthokonta hypothesis (Cavalier-Smith 1998), i.e. indicate a common clade for mitochondrial Rieske/cytochrome b from animals and fungi together with an earlier branching of the plant mitochondrial proteins.

Multiple orthologous Rieskes in a single organism

The recent finding that two distinct Rieske/cytb complexes encoded by two separate operons function in the acidophilic proteobacterium Acidithiobacillus ferrooxidans (Brasseur et al. 2002) was strongly reminiscent of the previously observed multiple Rieske/cytb complexes in the thermoacidophilic archaeon Sulfolobus acidocaldarius (Hiller et al. 2003). These observations raise the question of whether multiple Rieske/cytb complexes were already present in the common ancestor of Archaea and Bacteria, as is for example the case for [NiFe] hydrogenases (Vignais et al. 2001; Brugna-Guiral et al. 2003). The phylogenetic trees of Fig. 1b,c show that this is not the case. The two operons in A. ferrooxidans result from a relatively recent duplication event in the Acidithiobacillus lineage.
In the domain of the Archaea, by contrast, the Rieske/cytb operon seems to have undergone a duplication early on during the radiation of the domain and prior to the Cren-/Euryarchaeota split, giving rise to two separate families of Rieske/cytb complexes (denoted "L/N" and "F/G" in Fig. 1) inherited vertically in Archaea. Both for the archaeal species containing two distinct Rieske/cytb complexes and for the Acidithiobacilli, the functional significance of this redundancy is not firmly established so far. Operation in different electron transfer directions has been suggested for the two enzymes in A. ferrooxidans (Brasseur et al. 2002).
Biochemical evidence recently suggested that in some cyano- and proteobacteria, extraoperonic Rieske genes code for proteins which can functionally substitute for the "genuine" subunits, i.e. those encoded in the Rieske/cytochrome b operon context (Schneider et al. 2004; Ouchane et al. 2005). The corresponding sequences are labelled C2, C3 (cyanobacteria) and A2 (proteobacteria) in Figs. 1a and 1c. The evolutionary histories of these "second" Rieske genes differ strikingly between cyano- and proteobacteria. Whereas the proteobacterial extraoperonic genes appear to have originated independently by individual duplication events in several species, the gene duplications leading to the cyanobacterial C2 and C3 genes predate the radiation of the cyanobacterial phylum (Fig. 1c). A more detailed account of these results has been presented in Ouchane et al. (2005).

Automated phylogenetic reconstruction not backed up by circumstantial information very likely overestimates the frequency of lateral gene transfer

As elaborated above, the reconstruction of a phylogenetic tree for Rieske proteins requires significantly more circumstantial information about this specific protein than is presently handled by automated tree-building procedures. Most of the proteins whose phylogeny we studied in the recent past, e.g. [Fe-S] and cytochrome b subunits of [NiFe] hydrogenases (Brugna-Guiral et al. 2003) or molybdoproteins of the DMSO-reductase family (Lebrun et al. 2003), in fact behaved the same way. A notable exception is the cytochrome b subunit from the Rieske/cytb complex. While showing globally poor conservation, cytochrome b features a number of fully conserved residues (i.e. the ligands to the two b-hemes, the glycine residues sterically required to form the heme pocket or the residues involved in forming the quinol-oxidising site) scattered along the whole sequence. These fully conserved residues obviously provide the crucial sequence landmarks allowing the multiple alignment algorithms to achieve correct alignments along the full sequence. Furthermore, no paralogous genes for cytochrome b from Rieske/cytb complexes have been detected in sequenced genomes so far. Cytochrome b thus has a number of particular properties favouring its use in phylogenetic analyses. However, this situation in our experience represents the exception rather than the rule. We therefore tend to suspect that the abundance of discordant trees attributed to lateral gene transfer events in the recent literature may in part be due instead to methodological problems such as those treated in this work.
The results discussed above by no means question the important role of lateral gene transfer in numerous cases. For instance, the position of the Rieske/cytb complex from Aquificales is highly supported and identical in all trees of Fig. 1 but differs strongly from that seen in ssu rRNA trees. This case therefore almost certainly represents a bona fide example of the occurrence of lateral gene transfer observable in our trees.

Supplementary Material

The multiple sequence alignments underlying the NJ trees depicted in Fig. 1b and c, as well as a table listing entry numbers of the analysed sequences, are available as supplementary material to this work.
Acknowledgements

The authors would like to thank Nobuhito Sone (Iizuka/Japan) and Christian Schmidt (Lübeck/Germany) for stimulating discussions and suggestions.

References

Aravind, L., R.L. Tatusov, Y.I. Wolf, D.R. Walker, and E.V. Koonin. 1998. Evidence for massive gene exchange between archaeal and bacterial hyperthermophiles. Trends in Genetics 14:442-444.
Baldauf, S.L., and J.D. Palmer. 1993. Animals and fungi are each other's closest relatives – congruent evidence from multiple proteins. Proc. Natl. Acad. Sci. USA 90:11558-11562.
Baymann, F., M. Brugna, U. Mühlenhoff, and W. Nitschke. 2001. Daddy, where did (PS) I come from? Biochim. Biophys. Acta 1507:291-310.
Baymann, F., E. Lebrun, M. Brugna, B. Schoepp-Cothenet, M.-T. Giudici-Orticoni, and W. Nitschke. 2003. The redox protein construction kit; pre-LUCA evolution of energy conserving enzymes. Phil. Trans. R. Soc. Lond. B 358:267-274.
Bönisch, H., C.L. Schmidt, G. Schäfer, and R. Ladenstein. 2002. The structure of the soluble domain of an archaeal Rieske iron-sulfur protein at 1.1 Å resolution. J. Mol. Biol. 319:791-805.
Brasseur, G., P. Bruscella, V. Bonnefoy, and D. Lemesle-Meunier. 2002. The bc1 complex of the iron-grown acidophilic chemolithotrophic bacterium Acidithiobacillus ferrooxidans functions in the reverse but not in the forward direction. Is there a second bc1 complex? Biochim. Biophys. Acta 1555:37-43.
Brugna-Guiral, M., P. Tron, W. Nitschke, K.-O. Stetter, B. Burlat, B. Guigliarelli, M. Bruschi, and M.-T. Giudici-Orticoni. 2003. [NiFe] hydrogenases from the hyperthermophilic bacterium Aquifex aeolicus; properties, function and phylogenetics. Extremophiles 7:145-157.
Carrell, C.J., H. Zhang, W.A. Cramer, and J.L. Smith. 1997. Biological identity and diversity in photosynthesis and respiration: structure of the lumen-side domain of the chloroplast Rieske protein. Structure 5:1613-1625.
Castresana, J., M. Lübben, and M. Saraste. 1995. New archaebacterial genes coding for redox proteins: implications for the evolution of aerobic mechanisms. J. Mol. Biol. 250:202-210.
Castresana, J., and D. Moreira. 1999. Respiratory chains in the last common ancestor of living organisms. J. Mol. Evol. 49:453-460.
Castresana, J. 2001. Comparative genomics and bioenergetics. Biochim. Biophys. Acta 1506:147-162.
Cavalier-Smith, T. 1998. Neomonada and the origin of animals and fungi. Pp. 375-407 in G.H. Coombs, K. Vickerman, M.A. Sleigh, and A. Warren, eds. Evolutionary relationships among protozoa. Kluwer, London.
Clarke, G.D.P., R.G. Beiko, M.A. Ragan, and R.L. Charlebois. 2002. Inferring genome trees by using a filter to eliminate phylogenetically discordant sequences and a distance matrix based on mean normalised BLASTP scores. J. Bacteriol. 184:2072-2080.
Colbert, C.L., M.M.-J. Couture, L.D. Eltis, and J.T. Bolin. 2000. A cluster exposed: crystal structure of BPhF and determinants of the redox potential of Rieske FeS proteins. Structure 8:1267-1278.
Corre, E., A.-L. Reysenbach, and D. Prieur. 2001. Epsilon-proteobacterial diversity from a deep-sea hydrothermal vent on the Mid-Atlantic Ridge. FEMS Microbiol. Lett. 205:329-335.
Daubin, V., N.A. Moran, and H. Ochman. 2003. Phylogenetics and the cohesion of bacterial genomes. Science 301:829-832.
Denke, E., T. Merbitz-Zahradnik, O.M. Hatzfeld, C.H. Snyder, T.A. Link, and B.L. Trumpower. 1998. Alteration of the midpoint potential and catalytic activity of the Rieske iron-sulfur protein by changes of amino acids forming hydrogen bonds to the iron-sulfur cluster. J. Biol. Chem. 273:9085-9093.
Doolittle, W.F., and J.M. Logsdon Jr. 1998. Archaeal genomics: Do archaea have a mixed heritage? Current Biology 8:R209-R211.
Doolittle, W.F. 1999. Lateral genomics. Trends in Biochemical Sciences 24:M5-M8.
Doolittle, W.F. 1999. Phylogenetic classification and the universal tree. Science 284:2124-2129.
Eisen, J.A., and C.M. Fraser. 2003. Phylogenomics: Intersection of evolution and genomics. Science 300:1706-1707.
Ellis, P.J., T. Conrads, R. Hille, and P. Kuhn. 2001. Crystal structure of the 100 kDa arsenite oxidase from Alcaligenes faecalis in two crystal forms at 1.64 Å and 2.03 Å. Structure 9:125-132.
Gogarten, J.P., W.F. Doolittle, and J.G. Lawrence. 2002. Prokaryotic evolution in light of gene transfer. Mol. Biol. Evol. 19:2226-2238.
Griffiths, E., and R.S. Gupta. 2004a. Distinctive protein sequences provide molecular markers and evidence for the monophyletic nature of the Deinococcus-Thermus phylum. J. Bacteriol. 186:3097-3107.
Griffiths, E., and R.S. Gupta. 2004b. Signature sequences in diverse proteins provide evidence for the late divergence of the order Aquificales. Int. Microbiol. 7:41-52.
Gupta, R.S. 1998. Protein phylogenies and signature sequences: a reappraisal of evolutionary relationships among archaebacteria, eubacteria and eukaryotes. Microbiol. Mol. Biol. Rev. 62:1435-1491.
Gupta, R.S., T. Mukhtar, and B. Singh. 1999. Evolutionary relationships among photosynthetic prokaryotes (Heliobacterium chlorum, Chloroflexus aurantiacus, cyanobacteria, Chlorobium tepidum and proteobacteria): implications regarding the origin of photosynthesis. Mol. Microbiol. 32:893-906.
Gupta, R.S., M. Pereira, C. Chandrasekra, and V. Jahari. 2003. Molecular signatures in protein sequences that are characteristic of cyanobacteria and plastid homologues. Int. J. Syst. Evol. Microbiol. 53:1833-1841.
Hansmann, S., and W. Martin. 2000. Phylogeny of 33 ribosomal and six other proteins encoded in an ancient gene cluster that is conserved across prokaryotic genomes: influence of excluding poorly alignable sites from analysis. Int. J. Syst. Evol. Microbiol. 50:1655-1663.
Hiller, A., T. Henninger, G. Schäfer, and C.L. Schmidt. 2003. New respiratory genes encoding subunits of a cytochrome bc1-analogous complex in the respiratory chain of the hyperthermoacidophilic crenarchaeon Sulfolobus acidocaldarius. J. Bioenerg. Biomembr. 35:121-131.
Hunsicker-Wang, L.M., A. Heine, T. Chen, et al. (8 co-authors). 2003. High-resolution structure of the soluble, respiratory-type Rieske protein from Thermus thermophilus: Analysis and comparison. Biochemistry 42:7303-7317.
Hunte, C., J. Koepke, C. Lange, T. Rossmanith, and H. Michel. 2000. Structure of the yeast cytochrome bc1-complex co-crystallized with an antibody FV-Fragment. Structure 8:669-684.
Iwata, S., M. Saynovits, T.A. Link, and H. Michel. 1996. Structure of a water soluble fragment of the Rieske iron-sulfur protein of the bovine heart mitochondrial cytochrome bc1 complex determined by MAD phasing at 1.5 Å resolution. Structure 4:567-579.
Kauppi, B., K. Lee, E. Carredano, R.E. Parales, D.T. Gibson, H. Eklund, and S. Ramaswamy. 1998. Structure of an aromatic-ring-hydroxylating dioxygenase – Naphthalene 1,2-dioxygenase. Structure 6:571-586.
Kurisu, G., H. Zhang, J.L. Smith, and W.A. Cramer. 2003. Structure of the cytochrome b6f complex of oxygenic photosynthesis: tuning the cavity. Science 302:1009-1014.
Kurland, C.G., B. Canback, and O.G. Berg. 2003. Horizontal gene transfer: A critical view. Proc. Natl. Acad. Sci. USA 100:9658-9662.
Kyrpides, N., R. Overbeek, and C. Ouzounis. 1999. Universal protein families and the functional content of the last universal common ancestor. J. Mol. Evol. 49:413-423.
Lebrun, E., M. Brugna, F. Baymann, D. Muller, D. Lièvremont, M.-C. Lett, and W. Nitschke. 2003. Arsenite oxidase, an ancient bioenergetic enzyme. Mol. Biol. Evol. 20:686-693.
Link, T.A. 1999. The structures of Rieske and Rieske-type proteins. Advances in Inorganic Chemistry 47:83-157.
Muller, D., D. Lièvremont, D.D. Simeonova, J.C. Hubert, and M.C. Lett. 2003. Arsenite oxidase aox genes from a metal-resistant beta-proteobacterium. J. Bacteriol. 185:135-141.
Nesbø, C.L., S. L'Haridon, K.O. Stetter, and W.F. Doolittle. 2001. Phylogenetic analysis of two "archaeal" genes in Thermotoga maritima reveal multiple transfers between Archaea and Bacteria. Mol. Biol. Evol. 18:362-375.
Olsen, G.J., C.R. Woese, and R. Overbeek. 1994. The winds of (evolutionary) change: breathing new life into microbiology. J. Bacteriol. 176:1-6.
Ouchane, S., W. Nitschke, P. Bianco, D.D. Verméglio, and C. Astier. 2005. Multiple Rieske genes in prokaryotes: exchangeable Rieske subunits in the cytochrome bc1 complex of Rubrivivax gelatinosus. Mol. Microbiol. 57:261-275.
Raymond, J., O. Zhaxybayeva, J.P. Gogarten, S.Y. Gerdes, and R.E. Blankenship. 2002. Whole-genome analysis of photosynthetic prokaryotes. Science 298:1616-1620.
Raymond, J., and R.E. Blankenship. 2003. Horizontal gene transfer in eukaryotic algal evolution. Proc. Natl. Acad. Sci. USA 100:7419-7420.
Santini, J.M., and R.N. vanden Hoven. 2004. Molybdenum containing arsenite oxidase of the chemolithoautotrophic arsenite oxidiser NT-26. J. Bacteriol. 186:1614-1619.
Schmidt, C.L., and L. Shaw. 2001. A comprehensive phylogenetic analysis of Rieske and Rieske-type iron-sulfur proteins. J. Bioenerg. Biomembr. 33:9-26.
Schneider, D., S. Berry, T. Volkmer, A. Seidler, and M. Rögner. 2004. PetC1 is the major Rieske iron-sulfur protein in the cytochrome b6f complex of Synechocystis sp. PCC6803. J. Biol. Chem. 279:39383-39388.
Schröter, T., O.M. Hatzfeld, S. Gemeinhardt, M. Korn, T. Friedrich, B. Ludwig, and T.A. Link. 1998. Mutational analysis of residues forming hydrogen bonds in the Rieske [2Fe-2S] cluster of the cytochrome bc1-complex in Paracoccus denitrificans. Eur. J. Biochem. 255:100-106.
Schütz, M., M. Brugna, E. Lebrun, et al. (9 co-authors). 2000. Early evolution of cytochrome bc complexes. J. Mol. Biol. 300:663-675.
Snel, B., P. Bork, and M.A. Huynen. 1999. Genome phylogeny based on gene content. Nature Genet. 21:108-110.
Sone, N., K. Nagata, H. Kojima, J. Tajima, Y. Kodera, T. Kanamaru, S. Noguchi, and J. Sakamoto. 2001. A novel hydrophobic diheme c-type cytochrome. Purification from Corynebacterium glutamicum and analysis of the QcrCBA operon encoding three subunit proteins of a putative cytochrome reductase complex. Biochim. Biophys. Acta 1503:279-290.
Sone, N., M. Fukuda, S. Katayama, A. Jyoudai, M. Syugyou, S. Noguchi, and J. Sakamoto. 2003. QcrCAB operon of a nocardia-form actinomycete Rhodococcus rhodochrous encodes cytochrome reductase complex with diheme cytochrome cc subunit. Biochim. Biophys. Acta 1557:125-131.
Stroebel, D., Y. Choquet, J.-L. Popot, and D. Picot. 2003. An atypical haem in the cytochrome b6f complex. Nature 426:413-418.
Thompson, J.D., T.J. Gibson, F. Plewniak, F. Jeanmougin, and D.G. Higgins. 1997. The ClustalX windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Research 25:4876-4882.
vanden Hoven, R.N., and J.M. Santini. 2004. Arsenite oxidation by the heterotroph Hydrogenophaga sp. str. NT-14: the arsenite oxidase and its physiological electron acceptor. Biochim. Biophys. Acta 1656:148-155.
Vignais, P.M., B. Billoud, and J. Meyer. 2001. Classification and phylogeny of hydrogenases. FEMS Microbiol. Rev. 25:455-501.
Woese, C.R. 1987. Bacterial evolution. Microbiol. Rev. 51:221-271.
Xiong, J., W.M. Fischer, K. Inoue, M. Nakahara, and C.E. Bauer. 2000. Molecular evidence for the early evolution of photosynthesis. Science 289:1724-1730.
Yang, S., R.F. Doolittle, and P.E. Bourne. 2005. Phylogeny determined by protein domain content. Proc. Natl. Acad. Sci. USA 102:373-378.
Zhang, Z., L. Huang, V.M. Shulmeister, Y.I. Chi, K.K. Kim, L.W. Hung, A.R. Crofts, E.A. Berry, and S.H. Kim. 1998. Electron transfer by domain movement in cytochrome bc1. Nature 392:677-684.
Zuckerkandl, E., and L. Pauling. 1965. Molecules as documents of evolutionary history. J. Theor. Biol. 8:357-366.
Figure legends

Figure 1: NJ-phylograms reconstructed from multiple sequence alignments of Rieske proteins (a, c) and cytochrome b (b, below dashed horizontal line) as well as the molybdopterin subunit (b, above dashed line) from Rieske/cytb complexes and arsenite oxidases. The alignments in (a) and (b) were produced by Clustal whereas the alignment in (c) relies on superposition of structurally conserved stretches as detailed in the text. Dash-dotted lines indicate inter-phylum clashes between trees. Thin dotted lines correlate branches of trees (a) and (c) to the species-labelled ones of tree (b). Bootstrap supports exceeding 90% are denoted by dots.

Figure 2: (a) and (b) represent the structure of Rieske proteins, emphasising the conserved β-sheet skeleton with the large domain (magenta in a and grey in b) and the cluster-bearing small domain (in orange). The transition from the N-terminal transmembrane helix to the hydrophilic head-domain is indicated in green. The figure furthermore shows the most divergent versions of β-strand connecting stretches as observed in 3D structures so far, such as the β3/β4-loops in the b6f- (green) and the bc1-complexes (light magenta) and the SoxF protein from Sulfolobus acidocaldarius (light blue), as well as the β2/β3-loop in the mitochondrial enzyme (grey) as compared to that seen in the Thermus thermophilus Rieske protein (purple). The long insertion between the cluster-ligating boxes in SoxF is represented in red and the C-terminal extension of the cyanobacterial/plastidic enzyme in dark blue. (c)-(h) show the structural incorporation of this virtual Rieske protein into the presently known 3D structures of parent enzymes, i.e. the mitochondrial bc1-complex (c, d), the cyanobacterial/plastidic b6f complex (e, f) and a proteobacterial arsenite oxidase (g, h). In (c), (e) and (g), view directions are within the membrane plane (indicated by hatches and delimited by dashed lines), whereas in (d), (f) and (h) the enzymes are observed from the periplasmic side of their parent enzymes.