Running head: Genome Annotation
Summary report on Ferroplasma acidarmanus fer1
Richard Johnson
Genome Annotation Assessment
St. George’s University
17th November 2014
The tricarboxylic acid (TCA) cycle, also known as the Krebs cycle or citric acid cycle, plays
several roles in metabolism. It is a key pathway in virtually all cells, as it comprises important
reactions in carbon metabolism associated with the formation of ATP. It begins when the two-carbon
compound acetyl-CoA condenses with the four-carbon compound oxaloacetate to form the six-carbon
compound citrate. Through a series of oxidations and transformations, this six-carbon compound is
ultimately converted back to the four-carbon compound oxaloacetate, which then begins another
cycle with the addition of the next molecule of acetyl-CoA, as seen below (Madigan, Martinko,
Stahl, & Clark, 2012).
Fig 1: TCA Cycle
Ferroplasma acidarmanus fer1, hereinafter referred to as F. acidarmanus, is a
mesophilic, extreme acidophile (Edwards et al., 2000). This cell wall-less extremophile is
capable of mobilizing metals from sulfide ores and is more acid-resistant than iron- and
sulfur-oxidizing bacteria. Consequently, Ferroplasma species are a major influence on the
biogeochemical cycling of sulfur and sulfide metals in highly acidic environments (Golyshina &
Timmis, 2005). Their mechanism of survival in such an acidic environment, probably involving a
proton pump to maintain the pH gradient, must be highly energy-intensive. In this regard, the gene
being annotated seems vital, as it appears to be implicated in several pathways, most notably the
tricarboxylic acid (TCA) cycle, an important cycle in energy generation, as noted above.
The substrate of the TCA cycle is the two-carbon compound acetyl-CoA. This compound is
generated from pyruvate, the end product of glycolysis. Pyruvate is converted to acetyl-CoA by
the pyruvate dehydrogenase (PDH) complex, a multienzyme complex. Strictly speaking, the PDH
complex is not part of the TCA cycle proper, but it is a major source of acetyl-CoA, the
two-carbon substrate for the cycle (Harvey & Ferrier, 2011).
The F. acidarmanus gene with object identification number (OID) 638394153 was previously
annotated and computationally predicted to be a hypothetical protein, i.e. a protein of
unknown function. This gene, with DNA coordinates 1815827..1817182 (1356 bp), codes for a
protein 451 amino acids long. After manual annotation, however, the evidence strongly suggests
that it encodes dihydrolipoamide dehydrogenase (EC 1.8.1.4), an enzyme that is part of the
multienzyme PDH complex. The PDH complex comprises three components: pyruvate
dehydrogenase (E1), dihydrolipoyl transacetylase (E2), and dihydrolipoyl dehydrogenase (E3). It
is therefore suggested that this gene product is the E3 component of the PDH complex.
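The gene and protein lengths quoted above can be checked with simple arithmetic. The sketch below assumes the coordinates are 1-based and inclusive and that the final codon is a stop codon that is not translated. The variable names are illustrative only.

```python
# Coordinates as reported in the text; assumed to be 1-based and inclusive.
start, end = 1815827, 1817182

gene_length_bp = end - start + 1      # inclusive span of the gene
codons = gene_length_bp // 3          # total codons, including the stop codon
protein_length_aa = codons - 1        # the stop codon does not add a residue

print(gene_length_bp, codons, protein_length_aa)  # 1356 452 451
```

Under these assumptions the result agrees with the 1356 bp gene and 451-residue protein stated in the report.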
Similarity-based tools such as BLASTP found that this gene’s hypothetical protein is
similar to a highly conserved class of proteins that have oxidoreductase activity and therefore
a vital role in energy metabolism. The top hits were all the same enzyme, dihydrolipoamide
dehydrogenase, also known as dihydrolipoyl dehydrogenase, found in organisms such as
Thermoplasma volcanium and Picrophilus torridus (non-redundant protein sequences top hits),
and Bacillus subtilis and Chlamydia muridarum (Swiss-Prot top hits). All of these hits had very
low E-values (1e-172 to 2e-49), high coverage (>96.7%) and good percentage identity (>30%). A
CDD search produced one cluster of orthologous groups (COG) hit: Lpd, pyruvate/2-oxoglutarate
dehydrogenase complex, dihydrolipoamide dehydrogenase (E3) component, and related enzymes,
COG1249 (multi-domain).
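The way hits are judged on E-value, coverage and identity can be illustrated with a small filter. This is a minimal sketch, not the procedure used in the report: the cut-off values are assumptions chosen for illustration, and the hit records are invented placeholders rather than actual BLASTP output.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    subject: str         # name of the database sequence (placeholder values below)
    evalue: float        # expected number of chance hits with this score
    coverage_pct: float  # percentage of the query covered by the alignment
    identity_pct: float  # percentage of identical residues in the alignment


def is_strong_hit(hit, max_evalue=1e-10, min_coverage=90.0, min_identity=30.0):
    """Return True when a hit passes all three (assumed) thresholds."""
    return (hit.evalue <= max_evalue
            and hit.coverage_pct >= min_coverage
            and hit.identity_pct >= min_identity)


# Hypothetical records for illustration only.
hits = [
    Hit("example_hit_A", 1e-172, 99.0, 65.0),
    Hit("example_hit_B", 2e-49, 96.8, 31.0),
    Hit("example_hit_C", 3.0, 40.0, 22.0),
]

strong = [h.subject for h in hits if is_strong_hit(h)]
print(strong)  # ['example_hit_A', 'example_hit_B']
```

Requiring all three criteria at once guards against hits that score well on one measure only, such as a short but highly identical local match.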
TMHMM predicted no transmembrane helices. Similarly, SignalP predicted low signal
peptide probabilities: eukaryotic comparison, 0.117; Gram-negative bacterial comparison, 0.137;
Gram-positive bacterial comparison, 0.138. A signal peptide is an N-terminal signal that
directs the protein across the ER membrane in eukaryotes and across the plasma membrane in
prokaryotes. Signal peptides are also known as ER signal peptides or secretory signal peptides
(http://www.cbs.dtu.dk/services/SignalP/faq.php?&sessionid=26c4a4fac2d7a0209973972fa06c8564).
The predicted absence of transmembrane helices indicates that the protein is not
membrane-associated, and the SignalP prediction indicates that it is not secreted. Phobius
results agreed with those of TMHMM and SignalP. LipoP predicted fewer than four putative
cleavage sites, so no plot was made. LipoP also predicted that the protein is cytoplasmic. This
prediction was further supported by PSORT-B, which gave a high cytoplasmic score of 9.96. These
findings are consistent with the proposed function of the protein, which MetaCyc indicates
should be located in the cytoplasm.
TIGRFAM detected dihydrolipoamide dehydrogenase, TIGR01350, as its top hit.
TIGRFAM further describes dihydrolipoamide dehydrogenase as a flavoprotein that acts in
several ways, identifying it as the E3 component of dehydrogenase complexes for pyruvate,
2-oxoglutarate, 2-oxoisovalerate, and acetoin. This family includes a few members known to
have distinct functions (ferric leghemoglobin reductase and NADH:ferredoxin oxidoreductase)
but that may be predicted by homology to act as dihydrolipoamide dehydrogenase as well. The
motif GGXCXXXGCXP near the N-terminus contains a redox-active disulfide (Marchler-Bauer
et al., 2013).
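A motif such as GGXCXXXGCXP can be located in a protein sequence with a regular expression. The sketch below assumes that X stands for any residue. The helper name and the test peptide are invented for illustration; the peptide is not the F. acidarmanus sequence.

```python
import re


def motif_to_regex(motif):
    """Turn a motif such as 'GGXCXXXGCXP' into a compiled pattern.

    'X' is treated as a wildcard for any single residue; every other
    letter must match exactly.
    """
    return re.compile("".join("." if ch == "X" else re.escape(ch) for ch in motif))


pattern = motif_to_regex("GGXCXXXGCXP")

# Made-up peptide used only to exercise the pattern.
toy_sequence = "MKYDVVIGGGTCLNVGCIPSKALL"
match = pattern.search(toy_sequence)

if match:
    # Report a 1-based position, as is conventional for protein sequences.
    print(match.group(), match.start() + 1)  # GGTCLNVGCIP 9
```

The two cysteines captured by the pattern correspond to the positions that would form the redox-active disulfide described above.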
Pfam’s top hit, Pyr_redox_2, belongs to the NADP_Rossmann clan, which describes a class of
two-domain redox enzymes. One domain, termed the catalytic domain, confers substrate
specificity and determines the precise reaction of the enzyme. The other domain, which is common
to this class of redox enzymes, is a Rossmann-fold domain. The Rossmann domain binds
nicotinamide adenine dinucleotide (NAD+), and it is this cofactor that reversibly accepts a
hydride ion, which is lost or gained by the substrate in the redox reaction. In some more
distantly related Rossmann domains, the NAD+ cofactor is replaced by the functionally similar
cofactor FAD (http://pfam.xfam.org/family/PF07992.9#tabview=tab2). All of this is consistent
with dihydrolipoamide dehydrogenase, as it is a flavoprotein involved in redox reactions, and
further supports the prediction that the query sequence is indeed dihydrolipoamide
dehydrogenase.
Fig. 2: 3D structure of Pyr_redox_2
The top hit from the PDB search was the crystal structure of a dihydrolipoyl dehydrogenase from
Sulfolobus solfataricus.
The proposed gene product of the query gene, dihydrolipoamide dehydrogenase, is found in the
close relatives Picrophilus torridus, Thermoplasma volcanium and Thermoplasma acidophilum.
According to the KEGG, MetaCyc and EC databases, this enzyme is an integral component of several
enzymatic pathways in these organisms, including the TCA cycle, glycolysis/gluconeogenesis,
glycine, serine and threonine metabolism, and valine, leucine and isoleucine degradation. The
fact that this one enzyme is so multifunctional, being involved in several pathways, may account
for the reduced genome size seen in archaeal organisms.
Fig 3: Enzymatic reaction of dihydrolipoamide dehydrogenase
Fig 4: KEGG results showing the TCA cycle and query sequence (shown in red)
Fig 5: MetaCyc results (P. torridus)
There is compelling evidence that horizontal gene transfer (HGT) occurred between F.
acidarmanus and several other organisms. The phylogenetic tree indicated that the organisms most
closely related to F. acidarmanus were within the same order, Thermoplasmatales. These
organisms are Picrophilus torridus, Thermoplasma volcanium, Thermoplasma acidophilum and
Thermoplasmatales archaeon Gpl. The presence of unrelated organisms such as
Hydrogenobaculum, Leptospirillum, Acidithiobacillus and Metallosphaera, however, is a
strong indication that HGT occurred, resulting in orthologous copies of the gene under
assessment. The evidence for HGT is strong because Hydrogenobaculum, Leptospirillum and
Acidithiobacillus are bacterial and would not be expected to contain the same gene found in an
archaeal organism unless there was some exchange of genetic material. It is important to note
that while Sulfolobus, Acidilobus and Metallosphaera are of the archaeal domain, they are
classified under the phylum Crenarchaeota, which may suggest an intermediate point in history at
which HGT occurred.
Fig. 6: Phylogenetic tree
The Gene Neighborhood Region analysis indicated that the gene is not part of an operon (one or
more genes transcribed into a single RNA under the control of a single regulatory site,
comprising an operator, a promoter and multiple genes), since it is not flanked by adjacent genes.
It also suggests that HGT might have occurred, since the gene was found in Hydrogenobaculum,
a member of the Aquificaceae in the bacterial domain.
Fig. 7: Gene Neighborhood Region results
Three paralogs of this gene were found, only one of which has a known function. This
suggests that the gene is highly conserved and supports the theory of evolutionary conservation,
indicating that it is protective and most likely necessary for survival. It also suggests that this
gene is part of a multi-gene family, with the paralogs acting as isozymes.
Not surprisingly, the Rfam analysis indicated that the query gene does not code for
tRNA, rRNA, or siRNA. This indicates that the gene codes for a protein, which again is
consistent with the proposed annotation of the gene in question.
Finally, it is important to state that the gene is not a pseudogene, i.e. it is predicted to be
functional. This is based on three criteria. Firstly, the HMM alignment coverage of both domains
was much greater than 30%, indicating that the domains are still functional and hence that the
query sequence may not be a pseudogene. Secondly, the missing domain regions were not found in
the flanking regions of the translated DNA sequence. Review of the literature shows that it is
unclear whether the missing regions are necessary for functionality of the domains. Thirdly, the
PROSITE database found several PROSITE motif signatures within the query sequence that lacked a
functional site. However, upon further research, it was determined that other sequences within
the database may also be missing these sites yet remain functional. Hence, based on criterion 3,
this query gene may not be a pseudogene.
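The first criterion can be expressed as a short calculation. The sketch below assumes that coverage means the fraction of the HMM model length spanned by the alignment. The function name and the example coordinates are hypothetical and are not values from this annotation.

```python
def hmm_coverage_pct(hmm_from, hmm_to, model_length):
    """Percentage of the HMM model covered by the alignment.

    hmm_from and hmm_to are 1-based, inclusive positions on the model.
    """
    return 100.0 * (hmm_to - hmm_from + 1) / model_length


# Hypothetical alignment: model positions 11..190 of a 200-position model.
coverage = hmm_coverage_pct(hmm_from=11, hmm_to=190, model_length=200)

print(coverage)          # 90.0
print(coverage > 30.0)   # True: passes the 30% criterion described above
```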
It is noteworthy that an alternative ORF was identified for this gene. The original ORF of
the gene has a start codon, a stop codon and a Shine-Dalgarno (SD) region; however, the SD is
more than 20 bp upstream of the start codon, and the possible start codons within that range are
all followed by a stop codon. Since an alternative SD region was required, the ORF was in turn
redefined. No SD that met all criteria, i.e. 5-20 bp upstream of a potential start codon whose
reading frame is not terminated before the original stop codon, was found in the original ORF.
However, an SD was found 90 bp downstream of the original start codon. Two SDs were found before
this one; however, the alternative start codons were in a different frame (F3) from the original
start codon (F2) and so could not be used.
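The spacing rule described above, an SD region 5-20 bp upstream of an in-frame start codon, can be sketched as a simple scan. This is an illustration under stated assumptions, not the procedure used with the IMG tools: the list of SD-like motifs, the set of start codons, the definition of the gap as the number of bases between the end of the motif and the start codon, and the toy sequence are all assumptions.

```python
START_CODONS = {"ATG", "GTG", "TTG"}
# Assumed SD-like cores, longest first; real SD sequences vary between organisms.
SD_MOTIFS = ("AGGAGG", "GGAGG", "AGGAG", "GAGG", "AGGA")


def find_sd_backed_starts(dna, frame, min_gap=5, max_gap=20):
    """Return (start_index, sd_index, motif) tuples for start codons in `frame`
    (0, 1 or 2) that have an SD-like motif ending min_gap..max_gap bases upstream.
    Indices are 0-based positions in `dna`."""
    hits = []
    for start in range(frame, len(dna) - 2, 3):
        if dna[start:start + 3] not in START_CODONS:
            continue
        hit = None
        for motif in SD_MOTIFS:
            for gap in range(min_gap, max_gap + 1):
                sd_index = start - gap - len(motif)
                if sd_index >= 0 and dna[sd_index:sd_index + len(motif)] == motif:
                    hit = (start, sd_index, motif)
                    break
            if hit:
                break
        if hit:
            hits.append(hit)
    return hits


# Toy sequence (hypothetical): AGGAGG, a 7-base spacer, then ATG in frame 0.
toy = "TT" + "AGGAGG" + "TTACAAA" + "ATG" + "AAACCC" + "TAA"
result = find_sd_backed_starts(toy, frame=0)
print(result)  # [(15, 2, 'AGGAGG')]
```

The sketch checks only the frame and the SD spacing. The further requirement in the text, that the reading frame is not terminated before the original stop codon, would need an additional scan for in-frame stop codons.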
One alternative ORF (>80 aa) was provided when using the textual output of the IMG
Sequence Viewer. However, this alternative ORF was in frame 5 and produced insignificant
BLASTP results, with E-values of 2-5. Conversely, when compared to the original BLASTP
results, the alternative ORF indicated by (IMG) DNA coordinates 891..2165 gave the same top
hits, with very low E-values and similar % identity; the same specific hits were found in the
conserved domains.
References
Dopson, M., Baker-Austin, C., Hind, A., Bowman, J. P., & Bond, P. L. (2004). Characterization
of Ferroplasma isolates and Ferroplasma acidarmanus sp. nov., extreme acidophiles from
acid mine drainage and industrial bioleaching environments. Applied and Environmental
Microbiology, 70(4), 2079-2088. doi: 10.1128/AEM.70.4.2079-2088.2004
Edwards, K. J., Bond, P. L., Gihring, T. M., & Banfield, J. F. (2000). An archaeal iron-oxidizing
extreme acidophile important in acid mine drainage. Science, 287(5459), 1796–1799.
Retrieved from PubMed.
Golyshina, O., & Timmis, K. N. (2005). Ferroplasma and relatives, recently discovered cell
wall-lacking archaea making a living in extremely acidic, heavy metal-rich
environments. Environmental Microbiology, 7(9), 1277-1288. doi: 10.1111/j.1462-2920.2005.00861.x
Harvey, R., & Ferrier, D. (2011). Lippincott's illustrated reviews: Biochemistry (5th ed.).
Philadelphia: Wolters Kluwer Health.
Madigan, M. T., Martinko, J. M., Stahl, D. A., & Clark, D. P. (2012). Brock biology of
microorganisms (13th ed.). Upper Saddle River, NJ: Prentice Hall.
Marchler-Bauer, A., et al. (2013). CDD: Conserved domains and protein three-dimensional
structure. Nucleic Acids Research, 41(D1), D348-D352.
Wu et al. (2009). A phylogeny-driven genomic encyclopedia of Bacteria and Archaea. Nature,
462, 1056-1060.