Comparison of the RAST and KEGG Pathway Annotations for
Glycolysis/Gluconeogenesis
Pathway annotations can be very helpful in determining which
metabolic pathways an organism has and which characteristics
make it unique. The SEED Viewer gives very good visualizations
of the enzymes involved in different pathways in Halorhabdus
utahensis. However, it is important to note that annotations are
not perfect. This tutorial shows you how to navigate KEGG and
RAST to find the pathway you are interested in, and then how to
manually compare the two annotations to see whether there are
any mistakes in the pathway annotation for H. utahensis.
1. You must first log in to the SEED Viewer in order to access
the annotation. The following is the link to the page where
you can log in: http://rast.nmpdr.org/
2. After you log in and go to the overview of the genome, this
is the first page you will see:
This is a picture of what you will see towards the top of the
page. It gives you all of the different major pathways found in
Halorhabdus utahensis:
The number in blue shows you how many EC numbers were found in
H. utahensis. If you click on this link, it will take you to a
page with information specific to carbohydrate metabolism.
Click on the link to find out more information about
glycolysis/gluconeogenesis.
Below is the pathway map you will see for Glycolysis/Gluconeogenesis:
Enzymes found in KEGG, but not in RAST
*The numbers highlighted in green are the ones that the
annotation found in H. utahensis.
Next, we will look at the KEGG annotation for this same pathway.
The website for KEGG pathways is as follows:
http://www.genome.ad.jp/kegg/pathway.html
Here is the first page you will see when you get to the database:
Clicking on this link will take you to a list of pathways under
carbohydrate metabolism. Click on glycolysis/gluconeogenesis.
You will find the page below. Click on the drop-down menu in the upper
left corner and choose the organism Halobacterium salinarum:
I chose this organism because it is the halobacterium that is
most closely related to H. utahensis.
These are the enzymes found in KEGG and not in RAST.
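The visual comparison of the two pathway maps amounts to a set
difference over EC numbers. Here is a minimal Python sketch of that
idea, assuming you have typed the highlighted EC numbers from each
map into two lists by hand. The function names are hypothetical, and
the entries marked as placeholders were not read from either map:

```python
def ec_sort_key(ec_number):
    """Sort key that orders EC numbers numerically, field by field.

    Assumes fully specified four-part EC numbers such as "1.1.1.1".
    """
    return tuple(int(part) for part in ec_number.split("."))


def missing_from(reference_ecs, target_ecs):
    """Return EC numbers present in reference_ecs but absent from target_ecs."""
    return sorted(set(reference_ecs) - set(target_ecs), key=ec_sort_key)


# EC numbers highlighted in the KEGG map. The first six are the ones this
# tutorial reports as not highlighted in RAST; the last two are placeholders.
kegg_ecs = [
    "1.1.1.1", "2.7.1.2", "3.1.3.11", "2.3.1.12", "1.2.4.1", "1.2.1.3",
    "4.2.1.11", "5.3.1.1",
]
# Placeholder list standing in for the EC numbers highlighted in RAST.
rast_ecs = ["4.2.1.11", "5.3.1.1"]

print(missing_from(kegg_ecs, rast_ecs))
```

Sorting on integer fields rather than on the raw strings keeps
2.3.1.12 after 2.3.1.2, which plain string sorting would not.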
I will further investigate one of these enzymes, alcohol
dehydrogenase (EC 1.1.1.1), using SEED, JGI and MANATEE.
Go to the SEED homepage:
This is the page you will get after you click on “Genome Browser”:
Make sure that you display 2915 items per page so that you can
search the entire annotation. Then search for either 1.1.1.1 or
alcohol dehydrogenase.
Because I was able to find the enzyme, this means that the
database had called both the name of the enzyme and the EC
number, but did not highlight it in the pathway annotation.
To investigate the calls that JGI and MANATEE made, go to the H.
utahensis GCAT wiki:
http://gcat.davidson.edu/GcatWiki/index.php/Halorhabdus_utahensis_Genome
This link will give you all of the protein sequences for the
annotation in JGI.
This link will give you all of the protein sequences for the
annotation in MANATEE.
Once you pull up these sequences, you can search for either the
EC number or the name of the enzyme to see whether that database
called it. I would suggest doing both, because sometimes the
databases call the enzyme name without the EC number.
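If you save the protein sequences to a file, the same two searches
can be scripted. The sketch below is only an illustration: it assumes
the sequences are in FASTA format with the annotation text in the
header line, and the sample records, identifiers and function names
are made up for the example rather than taken from JGI or MANATEE.

```python
import re


def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def find_calls(fasta_text, ec_number, enzyme_name):
    """Return (header, matched_by) for headers that mention the enzyme.

    matched_by is "both", "ec", or "name".
    """
    # Reject partial matches such as 1.1.1.1 inside 1.1.1.10 or 11.1.1.1.
    ec_pattern = re.compile(r"(?<![\d.])" + re.escape(ec_number) + r"(?!\.?\d)")
    hits = []
    for header, _sequence in parse_fasta(fasta_text):
        has_ec = ec_pattern.search(header) is not None
        has_name = enzyme_name.lower() in header.lower()
        if has_ec and has_name:
            hits.append((header, "both"))
        elif has_ec:
            hits.append((header, "ec"))
        elif has_name:
            hits.append((header, "name"))
    return hits


# Made-up records for illustration only (not real annotation output).
sample = """\
>prot_0001 alcohol dehydrogenase (EC 1.1.1.1)
MKAAVLTE
>prot_0002 Alcohol dehydrogenase
MSTAGKVI
>prot_0003 example protein (EC 1.1.1.10)
MTTPLQRS
>prot_0004 oxidoreductase (EC 1.1.1.1)
MQQLNNGT
"""

for header, matched_by in find_calls(sample, "1.1.1.1", "alcohol dehydrogenase"):
    print(matched_by, header)
```

Reporting which of the two searches matched makes it easy to spot
records where a database called the name without the EC number.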
This is what I found in the MANATEE search:
I did not find the enzyme in JGI. However, the fact that I found
the enzyme with the exact same EC number and protein name in two
of the databases is strong evidence that the protein exists in
the H. utahensis genome. To further support this claim, I ran
BLAST 2 comparisons between the amino acid sequences from the
two databases (SEED and MANATEE) and the amino acid sequence
associated with EC number 1.1.1.1 in KEGG (this can be found by
clicking on the EC number 1.1.1.1 in the pathway map in KEGG).
To do a BLAST 2 comparison, go to this website:
http://www.ncbi.nlm.nih.gov/blast/bl2seq/wblast2.cgi
If you are doing a protein BLAST, make sure that the “Program”
option in the upper left corner is set to “blastp.” Below are
the results for the comparisons between KEGG and SEED and
between KEGG and MANATEE.
Top: KEGG; Bottom: SEED
Top: KEGG; Bottom: MANATEE
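As an aside, the idea behind a pairwise comparison can be sketched
offline. The code below is not BLAST and will not reproduce the BLAST
2 results: it is a plain Smith-Waterman local alignment score with a
simple match/mismatch/gap scheme (an assumption made for brevity;
blastp uses a substitution matrix and its own heuristics), and the
input strings are toy examples rather than the KEGG, SEED or MANATEE
sequences.

```python
def local_alignment_score(seq_a, seq_b, match=2, mismatch=-1, gap=-2):
    """Best Smith-Waterman local alignment score with a linear gap penalty."""
    previous_row = [0] * (len(seq_b) + 1)
    best = 0
    for i in range(1, len(seq_a) + 1):
        current_row = [0] * (len(seq_b) + 1)
        for j in range(1, len(seq_b) + 1):
            same = seq_a[i - 1] == seq_b[j - 1]
            diagonal = previous_row[j - 1] + (match if same else mismatch)
            current_row[j] = max(
                0,                         # start a new local alignment
                diagonal,                  # align the two residues
                previous_row[j] + gap,     # gap in seq_b
                current_row[j - 1] + gap,  # gap in seq_a
            )
            best = max(best, current_row[j])
        previous_row = current_row
    return best


# Toy strings only: a shared core surrounded by unrelated residues.
print(local_alignment_score("XXACDEXX", "YYACDEYY"))
```

A local alignment is used here because, like BLAST, it scores the
best-matching region and ignores unrelated flanking residues.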
Neither the KEGG-SEED alignment nor the KEGG-MANATEE alignment
is a perfect match, but this is expected, because in each case
the two protein sequences come from different organisms.
However, it is important to make sure that the protein sequences
in SEED and MANATEE have functional domains associated with an
alcohol dehydrogenase. I will investigate this by using the
Conserved Domains Database and Pfam. First, however, I wanted to
compare the SEED and MANATEE sequences against each other, so I
did a BLAST 2 comparison between these sequences.
These results are very interesting, because they show that even
though the two databases had the same raw DNA sequence, they
used two different portions of the genome to call the same
protein. This suggests that the two databases may have different
methods of calling proteins. To make sure that both of these
amino acid sequences have the functional domains of an alcohol
dehydrogenase, I entered the sequences from SEED and MANATEE
into the Conserved Domains Database (CDD)
(http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml). This is
the result I got for the amino acid sequence in SEED:
This is the result I got for the amino acid sequence found in
MANATEE:
Both of these amino acid sequences returned the same superfamily
hits, which suggests that they both have functional domains that
allow them to perform the function of an alcohol dehydrogenase.
When the amino acid sequences differ between databases,
searching CDD is a very good way to confirm that a sequence has
the function it was annotated with.
To further support the data I found in CDD, I searched the Pfam
database, which looks for protein families within the query
sequence. This is the link for the Pfam search site:
http://pfam.sanger.ac.uk/search
For both the RAST and MANATEE sequences, these are the results I
found:
From this data, we can see that these two protein sequences do
in fact correspond to the same enzyme. As you can see, using
other databases can be very helpful in confirming the pathway
annotation calls of a database.
In a similar manner, I found the following information for the
other enzymes that were in the KEGG pathway, but not highlighted
in the RAST pathway.
| EC Number | Name | Other Databases? | CDD and Pfam Results |
|---|---|---|---|
| 2.7.1.2 | Glucokinase | Yes: SEED and MANATEE | SEED (ROK family, sugar kinase; ROK family); MANATEE (same, same) |
| 3.1.3.11 | Fructose-bisphosphatase | Yes: SEED | SEED (IMPase-like domains, substrate is fructose 1,6-bisphosphate; IMP family) |
| 2.3.1.12 | Dihydrolipoyllysine-residue acetyltransferase | No | N/A |
| 1.2.4.1 | Pyruvate dehydrogenase | No | N/A |
| 1.2.1.3 | Aldehyde dehydrogenase (NAD+) | No | N/A |
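The rows above can also be kept as a small data structure, which
makes it easy to list the enzymes that no other database supported.
This is only a sketch of one way to record the findings: the class
and function names are hypothetical, and the data is copied from the
table rather than computed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnzymeCheck:
    ec_number: str
    name: str
    found_in: tuple  # databases other than KEGG that called the enzyme


# Rows copied by hand from the table in this tutorial.
checks = [
    EnzymeCheck("2.7.1.2", "Glucokinase", ("SEED", "MANATEE")),
    EnzymeCheck("3.1.3.11", "Fructose-bisphosphatase", ("SEED",)),
    EnzymeCheck("2.3.1.12", "Dihydrolipoyllysine-residue acetyltransferase", ()),
    EnzymeCheck("1.2.4.1", "Pyruvate dehydrogenase", ()),
    EnzymeCheck("1.2.1.3", "Aldehyde dehydrogenase (NAD+)", ()),
]


def split_by_support(rows):
    """Separate enzymes called by another database from those that were not."""
    supported = [row for row in rows if row.found_in]
    unsupported = [row for row in rows if not row.found_in]
    return supported, unsupported


supported, unsupported = split_by_support(checks)
print("Supported elsewhere:", [row.ec_number for row in supported])
print("KEGG only:", [row.ec_number for row in unsupported])
```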
Based on this information, we can see how important it is to
hand curate a pathway and to compare the protein calls of
different databases. Looking at the KEGG pathway for
Halobacterium salinarum gave me a starting place to see where
RAST may not have done an accurate job of calling proteins in
the Glycolysis/Gluconeogenesis pathway. I was then able to
compare the protein sequences between KEGG and the other
databases, and to make comparisons among the databases, using
tools such as BLAST 2, CDD and Pfam. These tools are very
helpful in determining whether the function of the protein is
what the database said it was. By going through this process, we
have a better idea of what holes there are in a pathway and what
experiments we can do to test whether or not these annotations
are correct. Although automated annotation tools are very
helpful as a starting place for annotating a pathway, they are
by no means perfect.