Advanced Environmental Biotechnology I

Databases and reading what enzymes do

Water is polluted. What can we do? If we know the pollutant, perhaps we can look it up in a database. Suppose the pollutant is 1,2-Dichloroethane.

University of Minnesota Biocatalysis/Biodegradation Database

This database contains information on microbial biocatalytic reactions and biodegradation pathways for primarily xenobiotic chemical compounds. The UM-BBD provides information on microbial enzyme-catalyzed reactions that are important for biotechnology.
From http://umbbd.ethz.ch/

Selecting the 1,2-Dichloroethane pathway leads to the pathway page. Clicking on the 1,2-Dichloroethane box gives the compound entry:

1,2-Dichloroethane
Formula: C2H4Cl2
MW: 98.96
SMILES String: ClCCCl
CAS Reg. 107-06-2
UM-BBD reactions whose substrate is 1,2-Dichloroethane:
o 1,2-Dichloroethane -----> 2-Chloroethanol (reacID# r0001)

Alternatively, clicking on haloalkane dehalogenase gives the reaction entry:

From 1,2-Dichloroethane to 2-Chloroethanol

1,2-Dichloroethane + H2O -----> 2-Chloroethanol + HCl
Enzyme: haloalkane dehalogenase (3.8.1.5)

Medline reference: Verschueren KH, Seljee F, Rozeboom HJ, Kalk KH, Dijkstra BW. Nature (1993) 363(6431): 693-8.
Searches linked from the entry: 86 Medline citations for haloalkane dehalogenase on March 09, 2012; 578 GenBank hits on Apr. 03, 2012.
UM-BBD biotransformation rule in accord with this reaction: Halomethyl derivative -----> 1-Methylalcohol derivative (bt0022)

The University of Minnesota Biocatalysis/Biodegradation Database (UM-BBD) began in 1995 and now contains information on almost 1200 compounds, over 800 enzymes, almost 1300 reactions and almost 500 microorganism entries. Besides these data, it includes a Biochemical Periodic Table (UM-BPT) and a rule-based Pathway Prediction System (UM-PPS) that predicts plausible pathways for microbial degradation of organic compounds.
Currently, the UM-PPS contains 260 biotransformation rules derived from reactions found in the UM-BBD and scientific literature. Public access to UM-BBD data is increasing. A new mirror website of the UM-BBD, UM-BPT and UM-PPS is being developed at ETH Zürich to improve speed and reliability of online access from anywhere in the world.

A different useful database is KEGG: Kyoto Encyclopedia of Genes and Genomes

KEGG Overview

1. Genomes to Biological System

KEGG is a database resource for understanding high-level functions and utilities of the biological system, such as the cell, the organism and the ecosystem, from genomic and molecular-level information. It is a computer representation of the biological system, consisting of molecular building blocks of genes and proteins (genomic information) and chemical substances (chemical information) that are integrated with the knowledge on molecular wiring diagrams of interaction, reaction and relation networks (systems information).

The KEGG database has been in development by Kanehisa Laboratories since 1995, and is now a prominent reference knowledge base for integration and interpretation of large-scale molecular data sets generated by genome sequencing and other high-throughput experimental technologies.

2. The KEGG Database

KEGG is an integrated database resource consisting of the sixteen main databases shown below. They are broadly categorized into systems information, genomic information, chemical information and health information, and further subcategorized by color coding of web pages.
Category              Database        Content
Systems information   KEGG PATHWAY    KEGG pathway maps
                      KEGG BRITE      BRITE functional hierarchies
                      KEGG MODULE     KEGG modules of functional units
Genomic information   KEGG ORTHOLOGY  KEGG Orthology (KO) groups
                      KEGG GENOME     KEGG organisms with complete genomes
                      KEGG GENES      Gene catalogs in complete genomes
                      KEGG SSDB       Sequence similarity database for GENES
Chemical information  KEGG COMPOUND   Metabolites and other small molecules
                      KEGG GLYCAN     Glycans
                      KEGG REACTION   Biochemical reactions
                      KEGG RPAIR      Reactant pair chemical transformations
                      KEGG RCLASS     Reaction class defined by RPAIR
                      KEGG ENZYME     Enzyme nomenclature
Health information    KEGG DISEASE    Human diseases
                      KEGG DRUG       Drugs
                      KEGG ENVIRON    Crude drugs and health-related substances

The chemical information category is collectively called KEGG LIGAND.
The health information category is collectively called KEGG MEDICUS.

These databases contain various data objects for computer representation of the biological systems. The database entry of each database is called the KEGG object, which is identified by the KEGG object identifier consisting of a database-dependent prefix and a five-digit number.

3. KEGG Molecular Networks

The most distinctive data object in KEGG is the molecular network: the molecular interaction, reaction and relation networks representing systemic functions of the cell and the organism. Experimental knowledge on such systemic functions is captured from literature and organized in the following three forms:

o Pathway map - in KEGG PATHWAY
o Functional hierarchy (ontology) - in KEGG BRITE
o Membership (logical expression) - in KEGG MODULE
o Membership (simple list) - in KEGG DISEASE

These databases constitute the reference knowledge base for biological interpretation of genomes and high-throughput molecular datasets through the process of KEGG mapping.
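The identifier scheme described above (a database-dependent prefix followed by a five-digit number) can be illustrated with a short Python sketch. The function name parse_kegg_id and the prefix table are hypothetical helpers written for this example; the table is only a small illustrative subset of prefixes and should not be read as a complete or official list.

```python
import re

# Illustrative subset of prefixes (an assumption for this sketch,
# not a complete or official table).
PREFIX_TO_DATABASE = {
    "map": "KEGG PATHWAY",
    "M": "KEGG MODULE",
    "K": "KEGG ORTHOLOGY",
    "C": "KEGG COMPOUND",
    "G": "KEGG GLYCAN",
    "R": "KEGG REACTION",
    "H": "KEGG DISEASE",
    "D": "KEGG DRUG",
}

# A run of letters (the prefix) followed by exactly five digits.
_ID_PATTERN = re.compile(r"^([A-Za-z]+)(\d{5})$")


def parse_kegg_id(identifier):
    """Split an identifier into (prefix, five-digit number, database name)."""
    match = _ID_PATTERN.match(identifier.strip())
    if match is None:
        raise ValueError(f"not a prefix + five-digit identifier: {identifier!r}")
    prefix, number = match.groups()
    # Prefixes missing from the illustrative table are reported as "unknown".
    database = PREFIX_TO_DATABASE.get(prefix, "unknown")
    return prefix, number, database


if __name__ == "__main__":
    for example in ("C00001", "map00010", "K00001"):
        print(example, "->", parse_kegg_id(example))
```

An EC number such as 3.8.1.5 does not fit this pattern, so the sketch rejects it with a ValueError rather than guessing.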
In 1995 the concept of mapping was first introduced in KEGG for linking genomes to metabolic pathways (metabolic reconstruction) using the EC number. Once the EC numbers were assigned to enzyme genes in the genome, organism-specific pathways could be generated automatically by matching against the enzyme (EC number) networks of the KEGG reference metabolic pathways. The EC number is no longer used as an identifier in KEGG.

KEGG - Metabolic Maps

This shows a map of all known metabolic pathways.
http://www.genome.jp/kegg/

Chemicals which are new to the environment are called xenobiotics. Many of these chemicals can be degraded by living things. This is called biodegradation.

We can look at the biodegradation of one chemical. Let's look at 1,2-Dichloroethane. Here is the biodegradation pathway.

We can look at this enzyme. ENZYME: 3.8.1.5
We can see the reaction it catalyses.
We can look at the primary sequence of the protein.
We can also look at the base sequence of the gene.

Conclusion

There is a lot of very useful information on the internet. More is being added all the time. We can find out what enzymes degrade pollutants. We can find what organisms have those enzymes. We can find out which genes are useful. In this way, we learn which enzymes, organisms and genes matter, and we can then look for them in the environment.

Some other useful databases

BRENDA
http://www.brenda-enzymes.org/

The Comprehensive Enzyme Information System (BRENDA) is an excellent source for a variety of information on enzymes. Comprehensive data on a particular enzyme can be found by using standard nomenclature searches for name and EC number, but searches can also be done using a large array of additional criteria. For example, it is possible to search for enzymes based on functional parameters such as Km, pH range, and temperature range. It is also possible to identify enzymes based on the ligands they use or their inhibitors, cofactors, structure, stability, etc.
In addition, the database lets the user see whether researchers have purified, cloned or crystallized an enzyme of interest and provides links to journal articles with relevant information. It also allows the user to search and browse for enzymes based on their application in biotechnology and industry. The site lists about 20 fields in which enzymes, many of which are produced by microbes, play important roles in emerging and existing technologies. Several of the fields listed are not ones that most people would immediately associate with biotechnology and enzymes.

Entrez PubMed
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?DB=pubmed

Entrez PubMed is a very useful site because it allows one to perform literature searches and gives access to abstracts (and sometimes full articles) indexed in PubMed. The articles provide insight and technical information from a large, diverse collection of scientific papers authored by scientists around the world. It is useful for finding information on specific microbiology topics. One can search PubMed by topic, author, or journal, and this flexibility makes it easier to find articles that are relevant to a particular topic.

In addition, the Entrez site has multiple major databases that can be searched to find information about many other topics. Some of the databases available for searching include: Nucleotide and Protein Sequences, Protein Structures, Complete Genomes, and Taxonomy. These databases can provide a person with up-to-date scientific information.

Finally, this website has links to related resources. These provide another set of references that can help a person find information about a chosen topic, remain up to date with current industry topics, and obtain information about different government regulatory agencies.
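Searching PubMed by topic, author, or journal, as described above, can also be scripted. The Python sketch below only builds a search URL and does not contact any server. It assumes the NCBI E-utilities ESearch endpoint and the PubMed field tags [Title/Abstract], [Author] and [Journal], none of which are mentioned in these slides; the helper names build_pubmed_term and build_esearch_url are hypothetical.

```python
from urllib.parse import urlencode

# Assumed endpoint: NCBI E-utilities ESearch (not part of the original slides).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_term(topic=None, author=None, journal=None):
    """Combine optional topic, author and journal filters into one search term."""
    parts = []
    if topic:
        parts.append(f'"{topic}"[Title/Abstract]')
    if author:
        parts.append(f"{author}[Author]")
    if journal:
        parts.append(f'"{journal}"[Journal]')
    if not parts:
        raise ValueError("give at least one of topic, author or journal")
    # All supplied filters must match, so they are joined with AND.
    return " AND ".join(parts)


def build_esearch_url(term, retmax=20):
    """Return a URL that would ask ESearch for up to `retmax` PubMed record IDs."""
    query = urlencode({"db": "pubmed", "term": term, "retmax": retmax})
    return f"{ESEARCH_URL}?{query}"


if __name__ == "__main__":
    term = build_pubmed_term(topic="haloalkane dehalogenase", journal="Nature")
    print(term)
    print(build_esearch_url(term))
```

Building the URL separately from fetching it keeps the example runnable offline; the actual request could then be made with any HTTP client.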
GenBank
http://www.ncbi.nlm.nih.gov/Genbank/

GenBank is a good tool for bioinformatics. It contains information about the genetic sequences of cataloged genes. Submitted sequences are annotated and compared for authenticity, and each is given a short alphanumeric accession number. Often when a new gene is identified in a paper, the GenBank accession number for the gene is also published. This is an NCBI database, so it can generally be trusted to be reputable. One strong point of GenBank is that it provides amino acid and nucleotide sequences for many different organisms. For bioremediation this can help in finding the specific organism that may carry out the desired bioremediation. The GenBank nucleotide format is generally considered the standard, and most molecular genetics tools manipulate gene information in that format.

BioCyc Database
http://biocyc.org/

BioCyc is a collection of 3530 Pathway/Genome Databases (PGDBs), with tools for understanding their data. This database allows the user to search metabolic pathways used by specific bacteria. BioCyc provides tools for navigating, visualizing, and analyzing the underlying databases, and for analyzing omics data:

o Genome browser
o Display of individual metabolic pathways, and of full metabolic maps
o Multiple analysis methods for user-supplied omics and multi-omics datasets, including painting onto metabolic maps, regulatory maps, and genome maps
o Store groups of genes and pathways in your account as SmartTables; share, analyze, transform those groups
o Comparative analysis tools

Other important databases look at Ribosomal RNA

Ribosomal RNA (rRNA) is the main component of the ribosome. The ribosome makes proteins. The rRNA and about 70 – 80 ribosomal proteins fold up into two complex structures, the small and large ribosomal subunits. rRNA decodes mRNA into amino acids (at the center of the small ribosomal subunit) and interacts with the tRNAs during translation by providing peptidyltransferase activity (large subunit).
The genes for rRNA are the most conserved (least variable) genes in all cells. Genes that encode the rRNA (rDNA) can therefore be used to identify an organism's taxonomic group and to work out which groups are related. In Bacteria, Archaea, Mitochondria, and Chloroplasts the small ribosomal subunit contains 16S rRNA. S stands for Svedberg units, a measure of how quickly particles sediment. The large ribosomal subunit contains two rRNA species (the 5S and 23S rRNAs).

rna.ucsc.edu/rnacenter/ribosome_images.html
tigger.uic.edu/classes/phys/phys461/phys450/ANJUM04/ribosome.jpg
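The idea that rRNA gene sequences can place an organism in a taxonomic group can be sketched as a nearest-match search. The Python example below is a minimal illustration, assuming the sequences have already been aligned to the same length. The function names and the 20-base fragments are invented for this sketch and are not real 16S rRNA data; real analyses use full-length genes, a proper alignment step and curated reference databases.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of matching positions in two pre-aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(1 for a, b in zip(seq_a.upper(), seq_b.upper()) if a == b)
    return 100.0 * matches / len(seq_a)


def closest_reference(query, references):
    """Return (name, identity) for the reference sequence most similar to the query."""
    scored = {name: percent_identity(query, seq) for name, seq in references.items()}
    best = max(scored, key=scored.get)
    return best, scored[best]


# Hypothetical 20-base fragments, invented for illustration only.
REFERENCES = {
    "group_A": "ACGTACGTACGTACGTACGT",
    "group_B": "ACGTTTGTACGAACGTACCT",
}
QUERY = "ACGTACGTACGTACGTACCT"

if __name__ == "__main__":
    name, identity = closest_reference(QUERY, REFERENCES)
    print(f"closest reference: {name} ({identity:.1f}% identity)")
```

Because rRNA genes change slowly, a high identity to a reference sequence suggests a close taxonomic relationship, which is why a simple similarity score is a reasonable first approximation here.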