Databases and reading what enzymes do

Advanced Environmental Biotechnology I
Water is polluted.
What can we do?
If we know the pollutant, perhaps we can look it up in a database.
For example, suppose the pollutant is 1,2-dichloroethane.
University of Minnesota Biocatalysis/Biodegradation Database
This database contains information on microbial biocatalytic reactions and biodegradation
pathways for primarily xenobiotic, chemical compounds. The UM-BBD provides information on
microbial enzyme-catalyzed reactions that are important for biotechnology.
From http://umbbd.ethz.ch/
Selecting the 1,2-Dichloroethane pathway leads to its pathway page.
Clicking on the 1,2-Dichloroethane box gives:
1,2-Dichloroethane
Formula: C2H4Cl2
MW: 98.96
SMILES String: ClCCCl
CAS Reg. 107-06-2
PubChem Substance Entry
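The molecular weight on the compound page can be checked from the formula. The sketch below is only an illustration and is not part of the UM-BBD: the function names and the small table of atomic masses are assumptions made for this example, and the parser handles only simple formulas without brackets or charges.

```python
import re

# Standard atomic masses in g/mol, rounded. Only the elements needed here.
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "O": 15.999, "Cl": 35.453}

def parse_formula(formula):
    """Return element counts for a simple formula such as 'C2H4Cl2'."""
    counts = {}
    # Each match is an element symbol followed by an optional count.
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts

def molecular_weight(formula):
    """Sum the atomic masses over the parsed element counts."""
    return sum(ATOMIC_MASS[el] * n for el, n in parse_formula(formula).items())

print(round(molecular_weight("C2H4Cl2"), 2))  # prints 98.96
```

The result agrees with the MW value listed for 1,2-dichloroethane.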
UM-BBD reactions whose substrate is 1,2-Dichloroethane
o 1,2-Dichloroethane -----> 2-Chloroethanol (reacID# r0001)
Or click on haloalkane dehalogenase
From 1,2-Dichloroethane to 2-Chloroethanol
Graphic of the reaction:
1,2-Dichloroethane + H2O -----> 2-Chloroethanol + HCl
Enzyme: haloalkane dehalogenase (EC 3.8.1.5)
Enzyme links: Kyoto, ExPASy. Search GenBank: 578 hits on Apr. 03, 2012.
Medline reference: Verschueren KH, Seljee F, Rozeboom HJ, Kalk KH, Dijkstra BW. Nature (1993) 363(6431): 693-8.
Search Medline titles for haloalkane dehalogenase: 86 citations on March 09, 2012.
Display a pathway starting from this reaction.
UM-BBD Biotransformation rules in accord with this reaction:
Halomethyl derivative -----> 1-Methylalcohol derivative (bt0022)
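The haloalkane dehalogenase reaction takes 1,2-dichloroethane and water and gives 2-chloroethanol and HCl. A quick way to check that a reaction read from a database page is complete is to count the atoms on each side. This is a minimal sketch: the function names are made up for the example, and the formula C2H5ClO for 2-chloroethanol is supplied here rather than taken from the page.

```python
import re
from collections import Counter

def atom_counts(formula):
    """Count atoms in a simple formula such as 'C2H4Cl2' (no brackets)."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts

def is_balanced(substrates, products):
    """Return True when both sides contain the same atoms."""
    left = sum((atom_counts(f) for f in substrates), Counter())
    right = sum((atom_counts(f) for f in products), Counter())
    return left == right

# 1,2-Dichloroethane + H2O -----> 2-Chloroethanol + HCl
print(is_balanced(["C2H4Cl2", "H2O"], ["C2H5ClO", "HCl"]))  # prints True
```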
The University of Minnesota Biocatalysis/Biodegradation Database (UM-BBD) began in 1995
and now contains information on almost 1200 compounds, over 800 enzymes, almost 1300
reactions and almost 500 microorganism entries. Besides these data, it includes a Biochemical
Periodic Table (UM-BPT) and a rule-based Pathway Prediction System (UM-PPS) that predicts
plausible pathways for microbial degradation of organic compounds. Currently, the UM-PPS
contains 260 biotransformation rules derived from reactions found in the UM-BBD and scientific
literature. Public access to UM-BBD data is increasing. A new mirror website of the UM-BBD,
UM-BPT and UM-PPS is being developed at ETH Zürich to improve speed and reliability of
online access from anywhere in the world.
Another useful database is KEGG, the Kyoto Encyclopedia of Genes and Genomes.
KEGG Overview
1. Genomes to Biological System
KEGG is a database resource for understanding high-level functions and utilities of the biological
system, such as the cell, the organism and the ecosystem, from genomic and molecular-level
information. It is a computer representation of the biological system, consisting of molecular
building blocks of genes and proteins (genomic information) and chemical substances (chemical
information) that are integrated with the knowledge on molecular wiring diagrams of interaction,
reaction and relation networks (systems information).
The KEGG database has been in development by Kanehisa Laboratories since 1995, and is now
a prominent reference knowledge base for integration and interpretation of large-scale molecular
data sets generated by genome sequencing and other high-throughput experimental
technologies.
2. The KEGG Database
KEGG is an integrated database resource consisting of the sixteen main databases shown below.
They are broadly categorized into systems information, genomic information and chemical
information and further subcategorized by color coding of web pages.
Category              Database        Content
Systems information   KEGG PATHWAY    KEGG pathway maps
Systems information   KEGG BRITE      BRITE functional hierarchies
Systems information   KEGG MODULE     KEGG modules of functional units
Genomic information   KEGG ORTHOLOGY  KEGG Orthology (KO) groups
Genomic information   KEGG GENOME     KEGG organisms with complete genomes
Genomic information   KEGG GENES      Gene catalogs in complete genomes
Genomic information   KEGG SSDB       Sequence similarity database for GENES
Chemical information  KEGG COMPOUND   Metabolites and other small molecules
Chemical information  KEGG GLYCAN     Glycans
Chemical information  KEGG REACTION   Biochemical reactions
Chemical information  KEGG RPAIR      Reactant pair chemical transformations
Chemical information  KEGG RCLASS     Reaction class defined by RPAIR
Chemical information  KEGG ENZYME     Enzyme nomenclature
Health information    KEGG DISEASE    Human diseases
Health information    KEGG DRUG       Drugs
Health information    KEGG ENVIRON    Crude drugs and health-related substances
Chemical information category is collectively called KEGG LIGAND
Health information category is collectively called KEGG MEDICUS
These databases contain various data objects for computer representation of the biological
systems. Thus, the database entry of each database is called the KEGG object, which is
identified by the KEGG object identifier consisting of a database-dependent prefix and a five-digit
number.
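As an illustration of the identifier format just described (a prefix followed by a five-digit number), the sketch below splits an identifier into its two parts. The function name and the short prefix table are assumptions for this example. The table is not complete, so check the KEGG documentation for the full list of prefixes.

```python
import re

# A few KEGG database prefixes. Illustrative only, not a complete list.
PREFIX_TO_DATABASE = {
    "C": "COMPOUND",
    "G": "GLYCAN",
    "R": "REACTION",
    "K": "ORTHOLOGY",
    "D": "DRUG",
}

# One or more letters (the prefix) followed by exactly five digits.
KEGG_ID = re.compile(r"^([A-Za-z]+)(\d{5})$")

def split_kegg_id(identifier):
    """Split an identifier into (prefix, number, database name)."""
    match = KEGG_ID.match(identifier)
    if match is None:
        raise ValueError(f"not a prefix plus five digits: {identifier!r}")
    prefix, number = match.groups()
    return prefix, number, PREFIX_TO_DATABASE.get(prefix, "unknown")

print(split_kegg_id("C00001"))  # prints ('C', '00001', 'COMPOUND')
```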
3. KEGG Molecular Networks
The most distinctive data objects in KEGG are the molecular networks -- molecular interaction,
reaction and relation networks representing systemic functions of the cell and the organism.
Experimental knowledge on such systemic functions is captured from literature and organized in
the following three forms:
o Pathway map - in KEGG PATHWAY (see: Pathway maps)
o Functional hierarchy (ontology) - in KEGG BRITE (see: Brite hierarchies)
o Membership (logical expression) - in KEGG MODULE
o Membership (simple list) - in KEGG DISEASE
These databases constitute the reference knowledge base for biological interpretation of
genomes and high-throughput molecular datasets through the process of KEGG mapping
(see: KEGG mapping).
In 1995 the concept of mapping was first introduced in KEGG for linking genomes to metabolic
pathways (metabolic reconstruction) using the EC number. Once the EC numbers were assigned
to enzyme genes in the genome, organism-specific pathways could be generated automatically
by matching against the enzyme (EC number) networks of the KEGG reference metabolic
pathways. The EC number is no longer used as an identifier in KEGG.
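The original mapping idea, matching the EC numbers found in a genome against the EC numbers of reference pathways, can be sketched as a set comparison. Everything in this example is a toy: the pathway names, the gene names, and every EC number except 3.8.1.5 are placeholders invented for illustration. This is not how KEGG is implemented today.

```python
# Toy reference pathways: pathway name -> set of EC numbers.
# Only 3.8.1.5 (haloalkane dehalogenase) is real; the rest are placeholders.
REFERENCE_PATHWAYS = {
    "toy pathway 1": {"3.8.1.5", "0.0.0.1", "0.0.0.2"},
    "toy pathway 2": {"0.0.0.3", "0.0.0.4"},
}

# Toy genome annotation: gene name -> assigned EC number.
GENOME_ANNOTATION = {"geneA": "3.8.1.5", "geneB": "0.0.0.1", "geneC": "0.0.0.9"}

def reconstruct(annotation, reference):
    """For each pathway, list the EC numbers found and the fraction covered."""
    genome_ecs = set(annotation.values())
    result = {}
    for name, pathway_ecs in reference.items():
        found = pathway_ecs & genome_ecs
        result[name] = (sorted(found), len(found) / len(pathway_ecs))
    return result

for name, (found, coverage) in reconstruct(GENOME_ANNOTATION, REFERENCE_PATHWAYS).items():
    print(f"{name}: {found} ({coverage:.0%} of steps)")
```

A pathway with high coverage is a candidate for the organism-specific pathway. A pathway with no matching EC numbers is probably absent.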
KEGG - Metabolic Maps
The KEGG global map shows the metabolic pathways recorded in KEGG on a single diagram.
http://www.genome.jp/kegg/
Chemicals which are new to the environment are called xenobiotics.
Many of these chemicals can be degraded by living things.
This is called biodegradation.
We can look at the biodegradation of one chemical.
Let's look at 1,2-dichloroethane.
Here is the biodegradation pathway.
We can look at this enzyme.
ENZYME: 3.8.1.5
We can see the reaction it catalyses.
We can look at the primary sequence of the protein.
We can also look at the base sequence of the gene.
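The base sequence of a gene and the primary sequence of its protein are linked by the genetic code. The sketch below translates a DNA coding sequence using the standard genetic code. The input is a short toy sequence, not the real haloalkane dehalogenase gene, and the function name is an assumption for this example.

```python
from itertools import product

# Standard genetic code, with codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(dna):
    """Translate a coding sequence, stopping at the first stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)

print(translate("ATGGCTTAA"))  # toy sequence; prints MA
```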
Conclusion
There is a lot of very useful information on the internet.
More is being added all the time.
We can find out what enzymes degrade pollutants.
We can find what organisms have those enzymes.
We can find out which genes are useful.
In this way, we can get information. We can then look for those enzymes, organisms and genes in the environment.
Some other useful databases.
BRENDA
http://www.brenda-enzymes.org/
The Comprehensive Enzyme Information System (BRENDA) is an excellent source for a variety
of information on enzymes. Comprehensive data on a particular enzyme can be found by using
standard nomenclature searches for name and EC number, but searches can also be done using
a large array of additional criteria. For example, it is possible to search for enzymes based on
functional parameters such as Km, pH range, and temperature range. It is also possible to
identify enzymes based on the ligands they use or their inhibitors, cofactors, structure, stability, etc.
In addition, the database lets the user see if researchers have purified, cloned or crystallized an
enzyme of interest and provides links to journal articles with relevant information. It allows the
user to search and browse for enzymes based on their application in biotechnology and industry.
The site lists about 20 fields in which enzymes, many of which are produced by microbes, play
important roles in emerging and existing technologies. Several of the fields listed are not ones
that most people would immediately associate with biotechnology and enzymes.
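Searching by functional parameters amounts to filtering enzyme records on numeric fields. The sketch below shows the idea on a local list. It does not use the BRENDA website or any BRENDA interface, and the record type, the enzyme names, and all the values are invented placeholders.

```python
from dataclasses import dataclass

@dataclass
class EnzymeRecord:
    name: str
    km_mM: float   # Km in millimolar (placeholder value)
    ph_min: float  # lower end of the working pH range
    ph_max: float  # upper end of the working pH range

# Invented records for illustration only. These are not real measurements.
RECORDS = [
    EnzymeRecord("enzyme A", 0.5, 6.0, 8.0),
    EnzymeRecord("enzyme B", 5.0, 4.0, 6.0),
    EnzymeRecord("enzyme C", 0.1, 7.5, 9.5),
]

def search(records, max_km_mM, working_ph):
    """Return names of enzymes with a low enough Km that work at the given pH."""
    return [
        r.name
        for r in records
        if r.km_mM <= max_km_mM and r.ph_min <= working_ph <= r.ph_max
    ]

print(search(RECORDS, max_km_mM=1.0, working_ph=7.8))  # prints ['enzyme A', 'enzyme C']
```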
Entrez PubMed
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?DB=pubmed
Entrez PubMed is a very useful site because it allows one to perform literature searches and
read abstracts (and sometimes full articles) indexed in PubMed. It covers a large, diverse
collection of scientific papers authored by scientists around the world, which makes it useful for
finding information on specific microbiology topics. One can search PubMed by topic, author, or
journal, and this flexibility makes it easier to find articles that are relevant to a particular
topic. In addition, Entrez gives access to several other major databases that can be searched for
other kinds of information. These include Nucleotide and Protein Sequences, Protein Structures,
Complete Genomes, and Taxonomy, and they can provide a person with up-to-date scientific
information. Finally, this website has links to related resources. These provide another set of
references that can help a person find information about a chosen topic, remain up to date with
current industry topics, and/or obtain information about different government regulatory
agencies.
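PubMed searches can also be written as a URL for the NCBI E-utilities ESearch service. The sketch below only builds the URL and does not contact NCBI. The function name and the default of 20 results are assumptions for this example. The search term repeats the title search for haloalkane dehalogenase used earlier.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities ESearch service.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def pubmed_search_url(term, max_results=20):
    """Build an ESearch URL for a PubMed query (no request is sent)."""
    params = {"db": "pubmed", "term": term, "retmax": max_results}
    return EUTILS_ESEARCH + "?" + urlencode(params)

# [Title] restricts the search to article titles.
print(pubmed_search_url("haloalkane dehalogenase[Title]"))
```

Opening the printed URL in a browser should return a list of PubMed IDs. Check the NCBI usage guidelines before sending many requests.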
GenBank
http://www.ncbi.nlm.nih.gov/Genbank/
GenBank is a good tool for bioinformatics. It contains annotated genetic sequences submitted by
researchers. Submitted sequences are annotated and checked, and each record is given a short
accession number. Often when a new gene is identified in a paper, the GenBank accession number
for the gene is also published. This is an NCBI database, so it can generally be trusted. One
strong point of GenBank is that it provides amino acid and nucleotide sequences for many
different organisms. For bioremediation, this can help in finding the specific organism that may
carry out the desired bioremediation. The GenBank nucleotide format is generally considered the
standard, and most molecular genetics tools handle gene information in that format.
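To give a feel for the GenBank flat-file format, the sketch below reads the accession, definition, and sequence from a very small record. The record is synthetic and written for this example (TOY00001 is not a real accession), and the parser covers only the few fields shown.

```python
# Synthetic record in the style of a GenBank flat file. Not a real entry.
TOY_RECORD = """\
LOCUS       TOY00001                  24 bp    DNA     linear   SYN 01-JAN-2000
DEFINITION  Synthetic toy record for illustration only.
ACCESSION   TOY00001
ORIGIN
        1 atggctaaag gtcgtacctt ataa
//
"""

def parse_genbank(text):
    """Extract accession, definition and sequence from one simple record."""
    record = {"accession": None, "definition": None, "sequence": ""}
    in_origin = False
    for line in text.splitlines():
        if line.startswith("//"):  # end of record
            break
        if in_origin:
            # Sequence lines mix position numbers, spaces and bases.
            record["sequence"] += "".join(ch for ch in line if ch.isalpha())
        elif line.startswith("ACCESSION"):
            record["accession"] = line.split()[1]
        elif line.startswith("DEFINITION"):
            record["definition"] = line[len("DEFINITION"):].strip()
        elif line.startswith("ORIGIN"):
            in_origin = True
    return record

record = parse_genbank(TOY_RECORD)
print(record["accession"], len(record["sequence"]))  # prints TOY00001 24
```

For real records, a maintained parser such as Biopython's Bio.SeqIO is a better choice than a hand-written one.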
BioCyc Database
http://biocyc.org/
BioCyc is a collection of 3530 Pathway/Genome Databases (PGDBs), with tools for
understanding their data. This database allows the user to search metabolic pathways used by
specific bacteria. BioCyc provides tools for navigating, visualizing, and analyzing the underlying
databases, and for analyzing omics data:
o Genome browser
o Display of individual metabolic pathways, and of full metabolic maps
o Multiple analysis methods for user-supplied omics and multi-omics datasets including
painting onto metabolic maps, regulatory maps, and genome maps
o Store groups of genes and pathways in your account as SmartTables; share, analyze,
transform those groups
o Comparative analysis tools
Other important databases look at ribosomal RNA
Ribosomal RNA (rRNA) is the main component of the ribosome.
The ribosome makes proteins.
The rRNA and about 70 to 80 ribosomal proteins fold up into two complex structures, the small
and large ribosomal subunits.
rRNA helps decode mRNA into amino acids (at the center of the small ribosomal subunit) and
interacts with the tRNAs during translation, providing peptidyltransferase activity (large subunit).
rRNA genes are among the most conserved (least variable) genes in all cells. The genes that
encode rRNA (rDNA) can therefore be used to identify an organism's taxonomic group and to
estimate how closely groups are related.
In Bacteria, Archaea, Mitochondria, and Chloroplasts the small ribosomal subunit contains 16S
rRNA.
S stands for Svedberg units, a measure of how quickly particles sediment in a centrifuge.
The large ribosomal subunit contains two rRNA species (the 5S and 23S rRNAs).
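Because rRNA genes change slowly, comparing aligned 16S rRNA gene sequences gives a rough measure of how closely organisms are related. The sketch below computes percent identity between sequences that are already aligned. The sequences are short invented strings, not real 16S data, and real studies use full alignments and proper phylogenetic methods.

```python
def percent_identity(seq_a, seq_b):
    """Percent of matching positions in two aligned sequences, ignoring gaps."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":  # skip alignment gaps
            continue
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0

# Invented aligned fragments for illustration only.
ISOLATE = "ACGTACGTAC"
REFERENCES = {"relative 1": "ACGTACGTAT", "relative 2": "ACGAAC-TTT"}

for name, sequence in REFERENCES.items():
    print(name, round(percent_identity(ISOLATE, sequence), 1))

closest = max(REFERENCES, key=lambda name: percent_identity(ISOLATE, REFERENCES[name]))
print("closest:", closest)  # prints closest: relative 1
```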
rna.ucsc.edu/rnacenter/ribosome_images.html
tigger.uic.edu/classes/phys/phys461/phys450/ANJUM04/ribosome.jpg