Running head: Genome Annotation

Summary report on Ferroplasma acidarmanus fer1
Richard Johnson
Genome Annotation Assessment
St. George’s University
17th November 2014

The tricarboxylic acid (TCA) cycle, also known as the Krebs cycle or citric acid cycle, plays several roles in metabolism. It is a key pathway in virtually all cells, as it comprises important reactions in carbon metabolism associated with the formation of ATP. It begins when the two-carbon compound acetyl-CoA condenses with the four-carbon compound oxaloacetate to form the six-carbon compound citrate. Through a series of oxidations and transformations, this six-carbon compound is ultimately converted back to oxaloacetate, which then begins another cycle with the addition of the next molecule of acetyl-CoA, as shown in Fig. 1 (Madigan, Martinko, Stahl, & Clark, 2012).

Fig. 1: TCA cycle

Ferroplasma acidarmanus fer1, hereinafter referred to as F. acidarmanus, is a mesophilic, extreme acidophile (Edwards et al., 2000). This cell wall-less extremophile is capable of mobilizing metals from sulfide ores and is more acid-resistant than iron- and sulfur-oxidizing bacteria. Consequently, such organisms are major influences on the biogeochemical cycling of sulfur and sulfide metals in highly acidic environments (Golyshina & Timmis, 2005). Their mechanism of survival in such an acidic environment, probably involving a proton pump to maintain the pH gradient, must be highly energy-intensive. In this regard, the gene being annotated seems vital, as it appears to be implicated in several pathways, most notably the TCA cycle, an important cycle in energy generation, as noted above.

The substrate of the TCA cycle is the two-carbon compound acetyl-CoA. However, this compound is generated from pyruvate, the end product of glycolysis. Pyruvate is converted to acetyl-CoA by the pyruvate dehydrogenase (PDH) complex, which is a multienzyme complex.
Strictly speaking, the PDH complex is not part of the TCA cycle proper, but it is a major source of acetyl-CoA, the two-carbon substrate for the cycle (Harvey & Ferrier, 2011).

The F. acidarmanus gene with object identification number (OID) 638394153 was previously annotated automatically as a hypothetical protein, i.e. a protein of unknown function. This gene, with DNA coordinates 1815827..1817182 (1356 bp), codes for a protein 451 amino acids long. After annotation, however, the evidence suggests that it is dihydrolipoamide dehydrogenase (EC 1.8.1.4), an enzyme that is part of the multienzyme PDH complex. The PDH complex comprises three components: pyruvate dehydrogenase (E1), dihydrolipoyl transacetylase (E2), and dihydrolipoyl dehydrogenase (E3). It is therefore suggested that this gene product is the E3 component of the PDH complex.

Similarity-based tools such as BLASTP found that this gene’s hypothetical protein was similar to a highly conserved class of proteins that have oxidoreductase activity and therefore have a vital role in energy metabolism. The top hits were all the same enzyme, dihydrolipoamide dehydrogenase, also known as dihydrolipoyl dehydrogenase, found in organisms such as Thermoplasma volcanium and Picrophilus torridus (top hits against the non-redundant protein sequences database) and Bacillus subtilis and Chlamydia muridarum (top Swiss-Prot hits). All of these hits had very low E-values (1e-172 to 2e-49), high coverage (>96.7%) and good percentage identity (>30%). A CDD search produced one Clusters of Orthologous Groups (COG) hit, COG1249 (multi-domain), described as Lpd: pyruvate/2-oxoglutarate dehydrogenase complex, dihydrolipoamide dehydrogenase (E3) component, and related enzymes.

TMHMM predicted no transmembrane helices. Similarly, SignalP predicted low signal peptide probabilities: 0.117 (eukaryotic comparison), 0.137 (Gram-negative bacterial comparison) and 0.138 (Gram-positive bacterial comparison).
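As a quick consistency check on the gene coordinates reported above, the stated gene length and protein length can be reproduced with simple arithmetic. The sketch below is illustrative only: the function name `orf_summary` is hypothetical, and it assumes that the coordinates are inclusive and that the final codon is a stop codon that does not contribute a residue to the protein.

```python
def orf_summary(start: int, end: int) -> tuple[int, int]:
    """Return (gene length in bp, protein length in aa) for inclusive coordinates.

    Assumption: the ORF ends with a stop codon that encodes no amino acid.
    """
    length_bp = end - start + 1              # inclusive coordinates
    if length_bp % 3 != 0:
        raise ValueError("ORF length is not a multiple of 3")
    codons = length_bp // 3
    return length_bp, codons - 1             # drop the stop codon


# Coordinates reported for OID 638394153
print(orf_summary(1815827, 1817182))  # (1356, 451)
```

For the coordinates 1815827..1817182 this gives 1356 bp and 451 amino acids, in agreement with the values reported for the gene.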
A signal peptide is an N-terminal signal that directs the protein across the ER membrane in eukaryotes and across the plasma membrane in prokaryotes. Signal peptides are also known as ER signal peptides or secretory signal peptides (http://www.cbs.dtu.dk/services/SignalP/faq.php?&sessionid=26c4a4fac2d7a0209973972fa06c8564). The predicted absence of transmembrane helices indicates that the protein is not membrane-associated, and the low SignalP probabilities indicate that it is not secreted. The Phobius results agreed with those of TMHMM and SignalP. LipoP predicted fewer than 4 putative cleavage sites, so no plot was made; LipoP also predicted that the protein is cytoplasmic. This cytoplasmic localization was further supported by PSORT-B, which gave a high cytoplasmic score of 9.96. These findings agree with the proposed function of the protein, which MetaCyc indicates should be located in the cytoplasm.

TIGRFAM detected dihydrolipoamide dehydrogenase (TIGR01350) as its top hit. TIGRFAM further describes dihydrolipoamide dehydrogenase as a flavoprotein that acts in several ways, identifying it as the E3 component of dehydrogenase complexes for pyruvate, 2-oxoglutarate, 2-oxoisovalerate, and acetoin. This family includes a few members known to have distinct functions (ferric leghemoglobin reductase and NADH:ferredoxin oxidoreductase) but that may be predicted by homology to act as dihydrolipoamide dehydrogenase as well. The motif GGXCXXXGCXP near the N-terminus contains a redox-active disulfide (Marchler-Bauer et al., 2013).

Pfam’s top hit, Pyr_redox_2, belongs to the NADP_Rossmann clan, which describes a class of redox enzymes that are two-domain proteins. One domain, termed the catalytic domain, confers substrate specificity and determines the precise reaction of the enzyme. The other domain, which is common to this class of redox enzymes, is a Rossmann-fold domain.
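The redox-active motif GGXCXXXGCXP mentioned in the TIGRFAM description can be located in a protein sequence with a simple pattern search, where X stands for any residue. The following is a minimal sketch, not the method used by TIGRFAM or CDD; the function name `find_redox_motif` and the example peptide are made up for illustration and are not the F. acidarmanus sequence.

```python
import re

# GGXCXXXGCXP with X = any single residue
REDOX_MOTIF = re.compile(r"GG.C...GC.P")


def find_redox_motif(protein: str) -> list[tuple[int, str]]:
    """Return (1-based start position, matched residues) for each motif hit."""
    return [(m.start() + 1, m.group()) for m in REDOX_MOTIF.finditer(protein.upper())]


# Hypothetical toy peptide, for illustration only
toy_peptide = "MAAAGGTCLNVGCIPSKAL"
print(find_redox_motif(toy_peptide))  # [(5, 'GGTCLNVGCIP')]
```

A hit close to position 1 would be consistent with the motif lying near the N-terminus; a pattern match alone does not show that the disulfide is functional.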
The Rossmann domain binds nicotinamide adenine dinucleotide (NAD+), and it is this cofactor that reversibly accepts a hydride ion, which is lost or gained by the substrate in the redox reaction. In some more distantly related Rossmann domains, the NAD+ cofactor is replaced by the functionally similar cofactor FAD (http://pfam.xfam.org/family/PF07992.9#tabview=tab2). All of this is consistent with dihydrolipoamide dehydrogenase, a flavoprotein involved in redox reactions, and further supports the prediction that the query sequence is dihydrolipoamide dehydrogenase.

Fig. 2: 3D structure of Pyr_redox_2

The top hit from the PDB search was the crystal structure of a dihydrolipoyl dehydrogenase from Sulfolobus solfataricus.

The proposed gene product of the query gene, dihydrolipoamide dehydrogenase, is found in the close relatives Picrophilus torridus, Thermoplasma volcanium and Thermoplasma acidophilum. According to the KEGG, MetaCyc and EC databases, this enzyme is an integral component of several pathways in these organisms, including the TCA cycle, glycolysis/gluconeogenesis, glycine, serine and threonine metabolism, and valine, leucine and isoleucine degradation. The fact that this one enzyme is involved in several pathways may help account for the reduced genome size seen in archaeal organisms.

Fig. 3: Enzymatic reaction of dihydrolipoamide dehydrogenase

Fig. 4: KEGG results showing the TCA cycle and query sequence (shown in red)

Fig. 5: MetaCyc results (P. torridus)

There is compelling evidence that horizontal gene transfer (HGT) occurred between F. acidarmanus and several other organisms. The phylogenetic tree indicated that the organisms most closely related to F. acidarmanus were within the same order, Thermoplasmatales. These organisms are Picrophilus torridus, Thermoplasma volcanium, Thermoplasma acidophilum and Thermoplasmatales archaeon Gpl.
The presence in the tree of distantly related organisms, such as Hydrogenobaculum, Leptospirillum and Acidithiobacillus, is however a strong indication that HGT occurred, resulting in homologous copies of the gene under assessment. The evidence of HGT is considered strong because these organisms are all bacterial and would not be expected to carry the same gene as an archaeal organism unless there had been some exchange of genetic material. It is important to note that while Metallosphaera, Sulfolobus and Acidilobus belong to the archaeal domain, they are classified in the phylum Crenarchaeota, which may suggest a point in evolutionary history at which HGT occurred.

Fig. 6: Phylogenetic tree

The Gene Neighborhood Region analysis indicated that the gene is not part of an operon (one or more genes transcribed into a single RNA under the control of a single regulatory region comprising a promoter and an operator), since it is not flanked by adjacent genes. It also suggests that HGT might have occurred, since the gene was found in Hydrogenobaculum, a member of the bacterial family Aquificaceae.

Fig. 7: Gene Neighborhood Region results

For this gene, three paralogous genes were found, only one of which has a known function. This suggests that the gene is highly conserved and supports the theory of evolutionary conservation, indicating that it is protective and most likely necessary for survival. It also suggests that this gene is part of a multi-gene family, with the paralogs acting as isozymes.

Not surprisingly, the Rfam analysis indicated that the query gene does not code for tRNA, rRNA, or siRNA. This is consistent with the gene coding for a protein, and therefore with the proposed annotation.

Finally, it is important to state that the gene is not a pseudogene, i.e. it is predicted to be functional. This is based on three criteria.
Firstly, the HMM alignment coverage of both domains was much greater than 30%, indicating that the domains are still functional and hence that the query sequence may not be a pseudogene. Secondly, the missing domain regions were not found in the flanking regions of the translated DNA sequence. A review of the literature shows that it is unclear whether the missing regions are necessary for the functionality of the domains. Thirdly, the PROSITE database found several motif signatures within the query sequence that lacked a functional site. However, upon further research, it was determined that other sequences within the database may also lack these sites yet remain functional. Hence, based on the third criterion, the query gene may not be a pseudogene.

It is noteworthy that an alternative ORF was identified for this gene. The original ORF has a start codon, a stop codon and a Shine-Dalgarno (SD) region; however, the SD is more than 20 bp upstream of the start codon, and the possible start codons within that range are all followed by a premature stop codon. Since an alternative SD region was required, the ORF was redefined. No SD that met all criteria, i.e. located 5-20 bp upstream of a potential start codon whose reading frame is not terminated before the original stop codon, was found in the original ORF. However, an SD was found 90 bp downstream of the original start codon. Two SDs were found before this one; however, their associated start codons were in a different frame (F3) from the original start codon (F2) and so could not be used.

One alternative ORF (>80 aa) was reported in the textual output of the IMG Sequence Viewer. However, this alternative ORF was in frame 5 and produced insignificant BLASTP results (E-values between 2 and 5).
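The SD criteria described above (an SD located 5-20 bp upstream of a potential start codon whose reading frame runs uninterrupted to the original stop codon) can be expressed as a short search. The sketch below is an illustration under stated assumptions, not the procedure used in IMG: the SD motif `GGAGG`, the set of start codons, the convention of measuring the gap from the end of the SD to the first base of the start codon, and all function names are assumptions, and the sequence is a made-up toy example.

```python
import re

START_CODONS = {"ATG", "GTG", "TTG"}   # assumed set of possible start codons
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_in_frame_stop(seq: str, start: int):
    """Return the index of the first stop codon in frame with `start`, or None."""
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i
    return None


def candidate_starts(seq, original_stop, sd_motif="GGAGG", min_gap=5, max_gap=20):
    """Return (sd_index, start_index) pairs (0-based) meeting all SD criteria.

    A start codon qualifies if it lies min_gap..max_gap bp after the end of an
    SD motif and the first stop codon in its frame is the original stop codon.
    """
    hits = []
    # Lookahead so that overlapping SD motif occurrences are all reported
    for m in re.finditer(f"(?={re.escape(sd_motif)})", seq):
        sd_end = m.start() + len(sd_motif)
        for gap in range(min_gap, max_gap + 1):
            start = sd_end + gap
            if seq[start:start + 3] not in START_CODONS:
                continue
            if first_in_frame_stop(seq, start) == original_stop:
                hits.append((m.start(), start))
    return hits


# Toy sequence: SD at index 2, start codon at index 14, stop codon at index 23
toy = "AAGGAGG" + "AAAAAAA" + "ATG" + "GCTGCT" + "TAA" + "CC"
print(candidate_starts(toy, original_stop=23))  # [(2, 14)]
```

Requiring the first in-frame stop to coincide with the original stop codon enforces both conditions at once: the candidate start is in the same frame as the original ORF, and it is not terminated early. Start codons in another frame, such as the F3 candidates described above, are rejected by the same test.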
In contrast to the frame 5 ORF, the alternative ORF at IMG DNA coordinates 891..2165 gave the same top BLASTP hits as the original ORF, with very low E-values and similar percentage identity, and the same specific hits were found in the conserved domain search.

References

Dopson, M., Baker-Austin, C., Hind, A., Bowman, J. P., & Bond, P. L. (2004). Characterization of Ferroplasma isolates and Ferroplasma acidarmanus sp. nov., extreme acidophiles from acid mine drainage and industrial bioleaching environments. Applied and Environmental Microbiology, 70(4), 2079-2088. doi: 10.1128/AEM.70.4.2079-2088.2004

Edwards, K. J., Bond, P. L., Gihring, T. M., & Banfield, J. F. (2000). An archaeal iron-oxidizing extreme acidophile important in acid mine drainage. Science, 287(5459), 1796-1799. Retrieved from PubMed.

Golyshina, O., & Timmis, K. N. (2005). Ferroplasma and relatives, recently discovered cell wall-lacking archaea making a living in extremely acidic, heavy metal-rich environments. Environmental Microbiology, 7(9), 1277-1288. doi: 10.1111/j.1462-2920.2005.00861.x

Harvey, R., & Ferrier, D. (2011). Lippincott's illustrated reviews: Biochemistry (5th ed.). Philadelphia: Wolters Kluwer Health.

Madigan, M. T., Martinko, J. M., Stahl, D. A., & Clark, D. P. (2012). Brock biology of microorganisms (13th ed.). Upper Saddle River, NJ: Prentice Hall.

Marchler-Bauer, A., et al. (2013). CDD: conserved domains and protein three-dimensional structure. Nucleic Acids Research, 41(D1), D348-D352.

Wu et al. (2009). A phylogeny-driven genomic encyclopedia of Bacteria and Archaea. Nature, 462, 1056-1060.