IMG-ER Curation Exercises

IMG User Scenario February 1, 2012
IMG-ER Curation Exercises
A. Manual curation of candidate enzyme assignments.
Find the genome of Actinosynnema mirum 101, DSM 43827.
1. How many genes in this genome have no enzyme assignment, but
have candidate KEGG Orthology-based enzymes?
2. Which gene in the above list of candidates has the highest percent
identity to the gene with KO-based enzyme?
3. Consider the gene with the gene_oid 644946660. Can you figure out
why it wasn’t automatically annotated with a KO term?
4. Consider the gene with the gene_oid 644940710 from the list of genes
with candidate KO terms. Should the candidate KO term KO:K01485,
cytosine deaminase [EC:3.5.4.1] be assigned to this gene?
5. Which EC number (if any) should be assigned to the gene
644940710?
6. Annotate the gene 644940710 using “Enter MyIMG annotation”
option in Gene Cart.
B. Finding “missing enzymes” and their manual curation.
In the genome of Actinosynnema mirum 101, DSM 43827
1. How many genes in this genome are connected to KEGG Pathways?
2. Does it have all enzymes necessary for histidine biosynthesis
according to KEGG map?
3. Which enzymes appear to be missing?
4. Are there candidate genes for these functions in the list of “Genes w/o
enzymes but with candidate KO based enzymes”?
5. Can you find any candidate genes for EC:5.3.1.16 in Actinosynnema
mirum 101, DSM 43827?
6. Is there any chromosomal neighborhood evidence that the candidate
gene for EC:5.3.1.16 is involved in histidine biosynthesis?
C. Using Compare Gene Annotations.
1. How many genes in the genome of Sanguibacter keddieii belong to
the category “No Product Name/With Evidence”?
2. What is the likely function of the gene 646607935 in Sanguibacter
keddieii?
3. How many genes in the genome of Sanguibacter keddieii belong to
the category “With Product Name/No Evidence”?
4. Is annotation of the gene 646609130 “Fimbrial assembly protein
(PilN)” likely correct or not?
5. Annotate the gene 646607935 using “Enter MyIMG Annotation” tool
in Gene Cart. Which gene symbol should be assigned to it? Which EC
number should be assigned to it?
IMG-ER Curation Exercises
A. Manual curation of candidate enzyme assignments.
Find the genome of Actinosynnema mirum 101, DSM 43827.
1. How many genes in this genome have no enzyme assignment, but have
candidate KEGG Orthology-based enzymes?
Answer: 246
Explanation: find the genome using “Quick Genome Search” option or
“Find Genomes” -> “Genome Search” menu. Click on the genome name,
on “Organism Details” page go to “Genome Statistics”. In the table find
the row “w/o enzymes but with candidate KO based enzymes”.
2. Which non-pseudogene gene in the above list of candidates has the
highest percent identity to the gene with KO-based enzyme?
Answer: gene_oid 644940672 has 83% identity to the gene with KO term
“cysteine desulfurase [EC:2.8.1.7]”.
Explanation: click on the count of genes “w/o enzymes but with
candidate KO based enzymes” in “Genome Statistics” table. Sort the
resulting table by the column “Percent Identity” by clicking on the
column header. The gene 645947926 with 99% identity is a pseudogene.
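
The same selection can be reproduced offline on an exported copy of the candidate table. The sketch below is illustrative only: the field names (gene_oid, percent_identity, is_pseudogene) are assumptions rather than IMG's export format, and the third row is made up. Only the two gene_oids and identities quoted above come from the exercise.

```python
from typing import Optional

# Hypothetical rows; only the first two reflect values quoted in the exercise.
candidates = [
    {"gene_oid": 645947926, "percent_identity": 99.0, "is_pseudogene": True},
    {"gene_oid": 644940672, "percent_identity": 83.0, "is_pseudogene": False},
    {"gene_oid": 1, "percent_identity": 41.5, "is_pseudogene": False},  # made-up filler row
]


def best_non_pseudogene(rows: list[dict]) -> Optional[dict]:
    """Return the non-pseudogene row with the highest percent identity."""
    usable = [r for r in rows if not r["is_pseudogene"]]
    if not usable:
        return None
    return max(usable, key=lambda r: r["percent_identity"])


best = best_non_pseudogene(candidates)
print(best["gene_oid"], best["percent_identity"])
```

Filtering before taking the maximum mirrors the manual step of skipping the 99% pseudogene hit at the top of the sorted table.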
3. Consider the gene with the gene_oid 644946660. Can you figure out
why it wasn’t automatically annotated with a KO term?
Answer: There are conflicting KO term assignments among the close
homologs of this gene.
Explanation: click on gene_oid to go to the corresponding Gene Details
page. Review protein family assignments and protein family alignments. All
protein family assignments agree that it is an acyl-CoA dehydrogenase
family protein, and butyryl-CoA dehydrogenase is one specific instance of
this activity. Go to Homolog Display and select “Orthologs only” option
(KO term assignments in IMG are propagated only to orthologs, defined as
reciprocal best BLASTp hits between the genomes).
On the Top IMG Homolog Hits page select “Percent identity” as target
column, “regex” as a filter and >75 as the keyword; hit “Apply” button. This
filters the list of orthologs to display only the genes with greater than
75% identity to the target gene. Click “Select All” and add to Gene Cart.
In the Gene Cart select all genes, and go to the “Function Cart” tab. Select
“KEGG KO” as a filter and hit “Add to Function Cart” button. This will add
KO term assignments of the selected genes to Function Cart.
3 KO terms will appear in Function Cart indicating that closely related
homologs of the target gene have 3 different KO term assignments. KO
terms in IMG will not be assigned to the genes if a potential conflict
between the assignments exists.
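
The conflict check described above can be sketched in a few lines. This is not IMG's actual pipeline, only a minimal illustration of the rule "do not propagate a KO term when close orthologs disagree". The record layout, the function names and the placeholder labels KO:A, KO:B and KO:C are all assumptions; the 75% cutoff is the one used in the walkthrough.

```python
def ko_terms_of_close_orthologs(orthologs, min_identity=75.0):
    """Collect the distinct KO terms of orthologs above the identity cutoff."""
    return {
        o["ko"]
        for o in orthologs
        if o["percent_identity"] > min_identity and o["ko"] is not None
    }


def propagate_ko(orthologs, min_identity=75.0):
    """Return the single agreed KO term, or None if there is none or a conflict."""
    terms = ko_terms_of_close_orthologs(orthologs, min_identity)
    return next(iter(terms)) if len(terms) == 1 else None


# Made-up orthologs with placeholder KO labels (not real KO identifiers).
orthologs = [
    {"gene_oid": 101, "percent_identity": 88.0, "ko": "KO:A"},
    {"gene_oid": 102, "percent_identity": 81.0, "ko": "KO:B"},
    {"gene_oid": 103, "percent_identity": 79.0, "ko": "KO:C"},
    {"gene_oid": 104, "percent_identity": 60.0, "ko": "KO:A"},  # below cutoff, ignored
]

print(sorted(ko_terms_of_close_orthologs(orthologs)))
print(propagate_ko(orthologs))
```

With three distinct terms among the close orthologs, the sketch returns no assignment, which corresponds to the three KO terms seen in the Function Cart.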
4. Consider the gene with the gene_oid 644940710 from the list of genes
with candidate KO terms. Should the candidate KO term KO:K01485,
cytosine deaminase [EC:3.5.4.1] be assigned to this gene?
Answer: No.
Explanation: Go to the gene page of this gene, review different annotations
provided by IMG. This protein belongs to a family of Zn-dependent
deaminases acting on (deoxy)nucleotides and nucleosides, and it has a KO
term assignment of KO:K01500 E3.5.4.-, which corresponds to this broad
multifunctional family (EC 3.5.4.- is hydrolase acting on carbon-nitrogen
bonds, other than peptide bonds, in cyclic amidines). Cytosine deaminase
(EC 3.5.4.1) is one representative of this class. However, both SEED
annotation and IMG term suggest that this protein has a different activity,
that of tRNA-specific adenosine deaminase, i.e. an enzyme which
deaminates adenosine rather than cytosine, and acts on a nucleoside in tRNA
rather than on a free nucleoside.
In order to verify the correctness of SEED and IMG term annotations, you
can find experimentally characterized bacterial tRNA-adenosine deaminases
(in E. coli, Staphylococcus aureus and Streptococcus pyogenes). After that
you can run IMG Genome BLAST against these genomes only.
Sort the resulting table by the genome name. It shows that the query gene
has 2 homologs in several E. coli genomes, and only 1 homolog in S. aureus
and S. pyogenes genomes. The latter are experimentally characterized
tRNA-adenosine deaminases. In E. coli it is orthologous to the gene tadA
(b2559), which is also an experimentally characterized adenosine
deaminase.
Furthermore, the function of tRNA-adenosine deaminase is ubiquitous in
prokaryotic and eukaryotic organisms, so Actinosynnema mirum is likely to
need it as well. Therefore the suggested KO annotation of “cytosine
deaminase” is incorrect and should not be assigned to this gene.
5. Which EC number (if any) should be assigned to the gene
644940710?
Answer: The gene should be annotated as EC:3.5.4.-.
Explanation: the authoritative source is Enzyme Nomenclature, at
http://www.expasy.org/enzyme/. Other alternatives include KEGG and
MetaCyc pathways. The list of enzymes in the class 3.5.4.- does not include
any entries for tRNA-specific adenosine deaminases, so a partial EC number
corresponding to the enzymatic class should be assigned. In order to find out
whether the annotation of “tRNA-specific adenosine-34 deaminase” or
“tRNA-adenosine deaminase” is more correct, one should review the
literature to see if the only adenosine in tRNA that can be deaminated is
found in position 34. If other adenosines in tRNA can be deaminated by this
enzyme, the annotation of “adenosine-34” will be too specific.
6. Annotate the gene 644940710 using “Enter MyIMG annotation”
option in Gene Cart.
B. Finding “missing enzymes” and their manual curation.
In the genome of Actinosynnema mirum 101, DSM 43827
1. How many genes in this genome are connected to KEGG Pathways?
Answer: 1408.
Explanation: the number can be found in “Organism Details” page,
“Genome Statistics” table, “Protein coding genes connected to KEGG
pathways” row.
2. Does it have all enzymes necessary for histidine biosynthesis
according to KEGG map?
Answer: no.
Explanation: click on the number of “Protein coding genes connected to
KEGG pathways”, then on the “KEGG Categories” page on the count of
genes in Amino Acid Metabolism (290), then go to “Histidine
metabolism” map and click on the name of the map. Enzymes found in
the query genome are colored blue; enzymes not found in the query genome
but found in other genomes are colored orange. Two enzymes in the pathway
from PRPP to L-histidine are colored orange.
3. Which enzymes appear to be missing?
Answer: EC:5.3.1.16, 1-(5-phosphoribosyl)-5-[(5-phosphoribosylamino)
methylideneamino]imidazole-4-carboxamide isomerase, and EC:3.1.3.15,
histidinol phosphatase.
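
Reading the colored map amounts to a set difference between the enzymes the pathway requires and the enzymes annotated in the genome. The sketch below assumes a list of EC numbers for the PRPP to L-histidine route written from general knowledge of the pathway (check it against the current KEGG map), and the "found" set is constructed purely for illustration so that it matches the answer above; it is not an export from IMG.

```python
# EC numbers for the PRPP -> L-histidine route, listed from general knowledge
# of the pathway; verify against the current KEGG map before relying on them.
HISTIDINE_BIOSYNTHESIS_ECS = {
    "2.4.2.17",  # ATP phosphoribosyltransferase
    "3.6.1.31",  # phosphoribosyl-ATP diphosphatase
    "3.5.4.19",  # phosphoribosyl-AMP cyclohydrolase
    "5.3.1.16",  # the isomerase discussed in this exercise
    "4.3.2.10",  # imidazole glycerol-phosphate synthase
    "4.2.1.19",  # imidazoleglycerol-phosphate dehydratase
    "2.6.1.9",   # histidinol-phosphate transaminase
    "3.1.3.15",  # histidinol phosphatase
    "1.1.1.23",  # histidinol dehydrogenase
}


def missing_enzymes(required, found):
    """Return the required EC numbers that are absent from the genome, sorted."""
    return sorted(set(required) - set(found))


# Illustrative only: constructed so that the two enzymes named in the answer
# are absent; a real run would use the EC numbers annotated in the genome.
found_in_genome = HISTIDINE_BIOSYNTHESIS_ECS - {"5.3.1.16", "3.1.3.15"}

missing = missing_enzymes(HISTIDINE_BIOSYNTHESIS_ECS, found_in_genome)
print(missing)
```

An enzyme reported as missing this way may simply be unannotated, which is why the following questions look for candidate genes by BLAST.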
4. Are there candidate genes for these functions in the list of “Genes w/o
enzymes but with candidate KO based enzymes”?
Answer: no.
Explanation: go to “Organism Details” page, “Genome Statistics” table, row
“w/o enzymes but with candidate KO based enzymes” and click on the gene
count (265). Use “Search column” function with “KO definition” as a filter
and enzyme names or EC numbers as keywords.
5. Can you find any candidate genes for EC:5.3.1.16 in Actinosynnema
mirum?
Answer: yes, it has gene_oid 644946150.
Explanation: click on the box with EC:5.3.1.16 in KEGG map. It will bring
up the list of genes in genomes other than Actinosynnema mirum annotated
with this KO term. Select one of these genes as a query; since
Actinosynnema mirum is an actinobacterium, it is better to select another
actinobacterial gene as a query – for instance, the gene 637266285 from
Streptomyces coelicolor (you can filter the list by Genome column using
Streptomyces as a keyword). Perform IMG Genome BLAST of the gene
637266285 against the genome of Actinosynnema mirum.
There is a gene in Actinosynnema with 72% identity to the S. coelicolor
protein annotated as EC:5.3.1.16; its gene_oid is 644946150.
6. Is there any chromosomal neighborhood evidence that the candidate
gene for EC:5.3.1.16 is involved in histidine biosynthesis?
Answer: yes.
Explanation: go to the gene page of the gene 644946150. Go to
“Evidence For Function Prediction” section of the page and review the
chromosomal context. The query gene is colored in red, and there are 3
more genes next to it colored in green, indicating that the query gene
and these 3 genes have KO terms from the same KEGG map, including
the hisH gene next to the candidate gene. You can mouse-over these genes
to review their annotations. You can also review conserved chromosomal
neighborhoods using orthologs (e.g. click on “Show ortholog
neighborhood regions”).
Chromosomal neighborhood is well conserved in Actinobacteria.
C. Using Compare Gene Annotations.
1. How many genes in the genome of Sanguibacter keddieii belong to
the category “No Product Name/With Evidence”?
Answer: 361.
Explanation: go to Sanguibacter keddieii “Organism Details” page,
“Compare Gene Annotations” link. 3735 genes are retrieved. Select filter
option “No Product Name/With Evidence”. 361 genes are retrieved.
2. What is the likely function of the gene 646607935 in Sanguibacter
keddieii?
Answer: L-rhamnose mutarotase.
Explanation: this gene has a product name “hypothetical protein”, but is
assigned to KO Term K03534 rhaM L-rhamnose mutarotase [EC:5.1.3.-].
This is the first gene in the conserved chromosomal cassette (likely an
operon) containing other genes for L-rhamnose degradation. L-rhamnose
mutarotase catalyzes interconversion between alpha-L-rhamnose and
beta-L-rhamnose; the next step in the pathway is catalyzed by
L-rhamnose kinase, which is specific for beta-L-rhamnose.
3. How many genes in the genome of Sanguibacter keddieii belong to
the category “With Product Name/No Evidence”?
Answer: 23.
4. Is annotation of the gene 646609130 “Fimbrial assembly protein
(PilN)” likely correct or not?
Answer: this annotation is likely correct.
Explanation: go to Gene Details page and mouse over the genes
immediately upstream and downstream of the query gene. The query
gene appears to be in the middle of an operon for pili biosynthesis.
5. Annotate the gene 646607935 using “Enter MyIMG Annotation” tool
in Gene Cart. Which gene symbol should be assigned to it? Which EC
number should be assigned to it?
Explanation: KEGG assigns a gene symbol rhaM to this gene, which is
based on E. coli nomenclature. However, there is an alternative gene
symbol rhaU, which is based on Rhizobium leguminosarum
nomenclature. There are no established rules for assigning gene symbols
when multiple nomenclatures exist. Either rhaM or rhaU can be used, but
the second gene symbol should be added in “Notes”. There is no complete
EC number for rhamnose mutarotase. A search in Enzyme Nomenclature
shows that a preliminary EC number is assigned to this enzyme,
5.1.3.n3, which will cause problems upon GenBank submission. A partial
EC number should be used for this gene: “EC:5.1.3.-”.
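
The distinction between complete, partial and preliminary EC numbers used in this answer (and in exercise A.5) can be checked mechanically before entering an annotation. The sketch below is a hypothetical helper, not part of IMG or of any GenBank tool: it classifies an EC string and replaces a preliminary serial such as n3 with a dash. The function names are assumptions made for illustration.

```python
import re


def classify_ec(ec: str) -> str:
    """Classify an EC number as 'complete', 'partial', 'preliminary' or 'invalid'."""
    fields = ec.removeprefix("EC:").split(".")
    if len(fields) != 4:
        return "invalid"
    if all(f.isdigit() for f in fields):
        return "complete"
    if all(f.isdigit() for f in fields[:3]) and re.fullmatch(r"n\d+", fields[3]):
        return "preliminary"
    # A partial number is one or more numeric fields followed only by dashes.
    seen_dash = False
    for f in fields:
        if f == "-":
            seen_dash = True
        elif f.isdigit() and not seen_dash:
            continue
        else:
            return "invalid"
    return "partial" if fields[0].isdigit() else "invalid"


def to_partial_if_preliminary(ec: str) -> str:
    """Replace a preliminary serial (e.g. n3) with '-'; leave other numbers as is."""
    bare = ec.removeprefix("EC:")
    if classify_ec(bare) == "preliminary":
        return ".".join(bare.split(".")[:3] + ["-"])
    return bare


for ec in ("EC:3.5.4.1", "EC:3.5.4.-", "EC:5.1.3.n3"):
    print(ec, classify_ec(ec), to_partial_if_preliminary(ec))
```

Under these assumptions, 5.1.3.n3 is reported as preliminary and rewritten to 5.1.3.-, the form recommended above.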