Comparison of the RAST and KEGG Pathway Annotations for Glycolysis/Gluconeogenesis

Pathway annotations can be very helpful in determining which metabolic pathways an organism has and which characteristics make it unique. The SEED Viewer provides good visualizations of the enzymes involved in different pathways in Halorhabdus utahensis. However, annotations are not perfect. This tutorial shows you how to navigate KEGG and RAST to find the pathway you are interested in, and then how to compare the two annotations by hand to look for mistakes in the pathway annotation for H. utahensis.

1. Log in to the SEED Viewer to access the annotation. The login page is: http://rast.nmpdr.org/

2. After you log in, go to the overview of the genome. This is the first page you will see:

[Figure: SEED Viewer genome overview page]

The top of the page lists all of the major pathways found in Halorhabdus utahensis:

[Figure: major pathway categories found in H. utahensis]

The number in blue shows how many EC numbers were found in H. utahensis. Clicking this link takes you to a page with information specific to carbohydrate metabolism. From there, click the link for glycolysis/gluconeogenesis to find more information. Below is the pathway map you will see for Glycolysis/Gluconeogenesis:

[Figure: RAST/SEED pathway map for Glycolysis/Gluconeogenesis, with the enzymes found in KEGG but not in RAST marked. EC numbers highlighted in green are those the annotation found in H. utahensis.]

Next, we will look at the KEGG annotation for the same pathway. The KEGG pathway website is: http://www.genome.ad.jp/kegg/pathway.html

[Figure: first page of the KEGG pathway database]

Clicking the carbohydrate metabolism link takes you to a list of pathways under carbohydrate metabolism. Click on glycolysis/gluconeogenesis to reach the page below:

[Figure: KEGG Glycolysis/Gluconeogenesis pathway page]
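The core of this comparison, listing the EC numbers that appear in the KEGG map but are not highlighted in the RAST map, is a set difference. The Python sketch below assumes you have copied the EC numbers from each map into plain text (for example, KEGG-style `ec:1.1.1.1` and RAST-style `EC 1.1.1.1`); the function names are hypothetical, and only complete four-part EC numbers are recognized.

```python
from __future__ import annotations

import re

# A complete EC number such as "1.1.1.1" or "3.1.3.11". The lookarounds stop
# "1.1.1.1" from matching inside "1.1.1.10" or "11.1.1.1"; a trailing
# sentence period is still allowed.
EC_PATTERN = re.compile(r"(?<![\d.])(\d+\.\d+\.\d+\.\d+)(?!\.?\d)")


def extract_ecs(text: str) -> set[str]:
    """Return every complete EC number mentioned in text."""
    return set(EC_PATTERN.findall(text))


def ec_sort_key(ec: str) -> tuple[int, ...]:
    """Sort EC numbers numerically (2.7.1.2 before 2.7.1.11)."""
    return tuple(int(part) for part in ec.split("."))


def compare_annotations(kegg_text: str, rast_text: str) -> dict[str, list[str]]:
    """Group EC numbers into KEGG-only, RAST-only and shared."""
    kegg, rast = extract_ecs(kegg_text), extract_ecs(rast_text)
    return {
        "kegg_only": sorted(kegg - rast, key=ec_sort_key),
        "rast_only": sorted(rast - kegg, key=ec_sort_key),
        "shared": sorted(kegg & rast, key=ec_sort_key),
    }
```

The "kegg_only" list is the set of candidate gaps to investigate one at a time, in the same way EC 1.1.1.1 is investigated in the next step.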
Click on the drop-down menu in the upper left corner and choose the organism Halobacterium salinarum. I chose this archaeon because it is the haloarchaeon most closely related to H. utahensis. These are the enzymes found in KEGG and not in RAST:

[Figure: enzymes in the H. salinarum KEGG pathway map that are not highlighted in the RAST map]

I will investigate one of these enzymes, EC 1.1.1.1 (alcohol dehydrogenase), in more detail using SEED, JGI and MANATEE.

Go to the SEED homepage:

[Figure: SEED homepage]

This is the page you will get after you click on "Genome Browser":

[Figure: SEED Genome Browser]

Set the display to 2915 items per page so that you can search the entire annotation at once. Then search for either 1.1.1.1 or alcohol dehydrogenase. I was able to find the enzyme, which means that SEED had called both the enzyme name and the EC number but did not highlight the enzyme in the pathway annotation.

To investigate the calls that JGI and MANATEE made, go to the H. utahensis GCAT wiki: http://gcat.davidson.edu/GcatWiki/index.php/Halorhabdus_utahensis_Genome

On that page, one link gives all of the protein sequences for the JGI annotation and another gives all of the protein sequences for the MANATEE annotation. Once you open these sequences, search for either the EC number or the enzyme name to see whether that database called the enzyme. I suggest searching for both, because databases sometimes call the enzyme name without the EC number. This is what I found in the MANATEE search:

[Figure: MANATEE search result for alcohol dehydrogenase]

I did not find the enzyme in JGI. However, finding the enzyme with exactly the same EC number and protein name in two of the databases (SEED and MANATEE) gives strong evidence that the protein exists in the H. utahensis genome. To further support this claim, I used BLAST 2 to compare the amino acid sequence from each of these two databases against the amino acid sequence associated with EC 1.1.1.1 in KEGG (you can find it by clicking on EC number 1.1.1.1 in the KEGG pathway map).
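The name-or-EC search can also be scripted against the downloaded protein files. This sketch assumes the JGI and MANATEE downloads are FASTA files whose header lines contain the product name and, when assigned, the EC number; the actual header layout of those files is not shown here, so check it before relying on the script. `read_fasta` and `find_enzyme` are hypothetical helper names.

```python
from __future__ import annotations

import re
from collections.abc import Iterable, Iterator


def read_fasta(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def find_enzyme(records: Iterable[tuple[str, str]], ec_number: str,
                name: str) -> list[tuple[str, str, str]]:
    """Return (header, sequence, matched_by) for each record whose header
    contains the exact EC number and/or the enzyme name (case-insensitive)."""
    # Exact EC match: "1.1.1.1" must not match inside "1.1.1.10" or "11.1.1.1".
    ec_re = re.compile(r"(?<![\d.])" + re.escape(ec_number) + r"(?!\.?\d)")
    wanted = name.lower()
    hits = []
    for header, seq in records:
        by_ec = ec_re.search(header) is not None
        by_name = wanted in header.lower()
        if by_ec and by_name:
            matched_by = "EC and name"
        elif by_ec:
            matched_by = "EC only"
        elif by_name:
            matched_by = "name only"
        else:
            continue
        hits.append((header, seq, matched_by))
    return hits
```

The EC pattern is deliberately exact: a plain text search for 1.1.1.1 also matches 1.1.1.10, 1.1.1.100 and other EC numbers, so hits from a browser's find box are worth checking by eye. Reporting whether each record matched by EC number, by name, or both also shows when a database called the name without the EC number. The returned sequences are what you would paste into BLAST 2.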
To run a BLAST 2 comparison, go to http://www.ncbi.nlm.nih.gov/blast/bl2seq/wblast2.cgi. If you are comparing protein sequences, make sure that the "Program" option in the upper left corner is set to "blastp". Below are the results for the comparisons between KEGG and SEED and between KEGG and MANATEE.

[Figure: BLAST 2 alignment. Top: KEGG; bottom: SEED]

[Figure: BLAST 2 alignment. Top: KEGG; bottom: MANATEE]

Neither alignment is a perfect match, but this is expected because the sequences come from two different organisms (H. salinarum in KEGG, H. utahensis in SEED and MANATEE). However, it is important to make sure that the protein sequences in SEED and MANATEE have functional domains associated with an alcohol dehydrogenase. I will check this using the Conserved Domains Database and Pfam.

First, however, I compared the SEED and MANATEE sequences against each other with BLAST 2.

[Figure: BLAST 2 alignment of the SEED and MANATEE sequences]

These results are very interesting: even though the two databases started from the same raw DNA sequence, they used different portions of the genome to call the same protein. This suggests that the two databases use different methods to call proteins.

To make sure that both amino acid sequences have the functional domains of an alcohol dehydrogenase, I entered the SEED and MANATEE sequences into the Conserved Domains Database (CDD) (http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml). This is the result I got for the SEED sequence:

[Figure: CDD result for the SEED sequence]

This is the result I got for the MANATEE sequence:

[Figure: CDD result for the MANATEE sequence]

Both sequences returned the same superfamily hits, which suggests that both have the functional domains needed to act as an alcohol dehydrogenase. Searching CDD is a good way to confirm that a sequence has the function it was assigned, especially when different databases report different amino acid sequences for the same protein.
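BLAST 2 reports a local alignment: a score, the fraction of identical residues, and the start and end coordinates of the aligned region in each sequence. To make those numbers concrete, here is a simplified Smith-Waterman local alignment in Python. It is not BLAST: there is no BLOSUM62 matrix, no affine gap penalty, no E-value and no heuristic seeding, and the scores (match +2, mismatch -1, gap -2) are arbitrary assumptions chosen for illustration.

```python
from __future__ import annotations


def local_align(a: str, b: str, match: int = 2, mismatch: int = -1,
                gap: int = -2) -> tuple[int, float, tuple[int, int], tuple[int, int]]:
    """Smith-Waterman local alignment with a linear gap penalty.

    Returns (score, identity, a_span, b_span). identity is the fraction of
    aligned columns (gaps included) with identical residues; each span is the
    1-based, inclusive (start, end) of the aligned region in that sequence.
    """
    def pair_score(x: str, y: str) -> int:
        return match if x == y else mismatch

    # H[i][j] = best score of a local alignment ending at a[i-1] and b[j-1].
    H = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best, best_i, best_j = 0, 0, 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            H[i][j] = max(0,
                          H[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]),
                          H[i - 1][j] + gap,   # a[i-1] aligned to a gap in b
                          H[i][j - 1] + gap)   # b[j-1] aligned to a gap in a
            if H[i][j] > best:
                best, best_i, best_j = H[i][j], i, j

    # Trace back from the highest-scoring cell until the score drops to zero.
    i, j = best_i, best_j
    identical = columns = 0
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]):
            identical += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
        columns += 1

    identity = identical / columns if columns else 0.0
    return best, identity, (i + 1, best_i), (j + 1, best_j)
```

The spans are the useful part for the SEED versus MANATEE comparison: if the aligned region covers only part of one protein, the two calls likely start or end at different places, which is consistent with the two databases having called the protein from different portions of the genome. The quadratic table is fine for single proteins of a few hundred residues.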
To further support the CDD results, I searched the Pfam database, which identifies protein families within a query sequence. The Pfam search site is: http://pfam.sanger.ac.uk/search. These are the results I found for both the RAST and MANATEE sequences:

[Figure: Pfam results for the RAST and MANATEE sequences]

From these data, we can see that the two protein sequences do in fact represent the same enzyme. Using other databases to check annotation calls is therefore very helpful for confirming a database's pathway annotation.

In a similar manner, I found the following information for the other enzymes that were in the KEGG pathway but not highlighted in the RAST pathway:

EC number | Name | Found in other databases? | CDD and Pfam results
2.7.1.2 | Glucokinase | Yes (SEED and MANATEE) | SEED: ROK family, sugar kinase (CDD); ROK family (Pfam). MANATEE: same as SEED for both
3.1.3.11 | Fructose-bisphosphatase | Yes (SEED) | SEED: IMPase-like domains, substrate is fructose 1,6-bisphosphate (CDD); IMP family (Pfam)
2.3.1.12 | Dihydrolipoyllysine-residue acetyltransferase | No | N/A
1.2.4.1 | Pyruvate dehydrogenase | No | N/A
1.2.1.3 | Aldehyde dehydrogenase (NAD+) | No | N/A

This information shows how important it is to hand-curate a pathway and to compare the protein calls of different databases. Looking at the KEGG pathway for Halobacterium salinarum gave me a starting point for seeing where RAST may not have called proteins accurately in the Glycolysis/Gluconeogenesis pathway. I could then compare protein sequences between KEGG and the other databases, and among those databases, using tools such as BLAST 2, CDD and Pfam. These tools are very helpful in determining whether a protein has the function the database assigned to it. Going through this process gives us a better idea of where the holes in a pathway are and which experiments could test whether these annotations are correct.
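When several enzymes are being curated, it helps to record the evidence for each one in a consistent structure. The Python sketch below is one hypothetical way to do that: the `GapEnzyme` record and the evidence categories in `triage` are rules of thumb invented for illustration, not a standard scheme, and the domain-hit sets are a simplified transcription of the CDD and Pfam column of the table above.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class GapEnzyme:
    """An enzyme shown in the KEGG pathway but not highlighted by RAST."""
    ec: str
    name: str
    # Other databases (e.g. SEED, MANATEE, JGI) that called this enzyme.
    found_in: list[str] = field(default_factory=list)
    # Database -> CDD/Pfam hits found for that database's sequence.
    domain_hits: dict[str, set[str]] = field(default_factory=dict)


def triage(enzyme: GapEnzyme) -> str:
    """Assign a rough, hypothetical evidence level to one gap enzyme."""
    if not enzyme.found_in:
        return "no support: candidate pathway hole"
    hit_sets = [hits for hits in enzyme.domain_hits.values() if hits]
    if not hit_sets:
        return "called, but domains not yet checked"
    if len(hit_sets) > 1:
        if all(hits == hit_sets[0] for hits in hit_sets[1:]):
            return "strong: consistent domain hits in several databases"
        return "conflicting: databases disagree on domain hits"
    return "moderate: domain hits from one database"


# Simplified transcription of the table above.
GAP_ENZYMES = [
    GapEnzyme("2.7.1.2", "Glucokinase", ["SEED", "MANATEE"],
              {"SEED": {"ROK family", "sugar kinase"},
               "MANATEE": {"ROK family", "sugar kinase"}}),
    GapEnzyme("3.1.3.11", "Fructose-bisphosphatase", ["SEED"],
              {"SEED": {"IMPase-like domain", "IMP family"}}),
    GapEnzyme("2.3.1.12", "Dihydrolipoyllysine-residue acetyltransferase"),
    GapEnzyme("1.2.4.1", "Pyruvate dehydrogenase"),
    GapEnzyme("1.2.1.3", "Aldehyde dehydrogenase (NAD+)"),
]

if __name__ == "__main__":
    for enzyme in GAP_ENZYMES:
        print(f"{enzyme.ec:<9} {enzyme.name:<46} {triage(enzyme)}")
```

Under these rules, the three enzymes that no other database called come out as candidate pathway holes, which is where follow-up work would start.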
Although automated annotation tools are very helpful as a starting point for annotating a pathway, they are by no means perfect.