IMG User Scenario February 1, 2012

IMG-ER Curation Exercises

A. Manual curation of candidate enzyme assignments.
Find the genome of Actinosynnema mirum 101, DSM 43827.
1. How many genes in this genome have no enzyme assignment but have candidate KEGG Orthology (KO)-based enzymes?
2. Which gene (excluding pseudogenes) in the above list of candidates has the highest percent identity to the gene with the KO-based enzyme?
3. Consider the gene with gene_oid 644946660. Can you figure out why it wasn’t automatically annotated with a KO term?
4. Consider the gene with gene_oid 644940710 from the list of genes with candidate KO terms. Should the candidate KO term KO:K01485, cytosine deaminase [EC:3.5.4.1], be assigned to this gene?
5. Which EC number (if any) should be assigned to the gene 644940710?
6. Annotate the gene 644940710 using the “Enter MyIMG annotation” option in Gene Cart.

B. Finding “missing enzymes” and their manual curation.
In the genome of Actinosynnema mirum 101, DSM 43827:
1. How many genes in this genome are connected to KEGG pathways?
2. Does it have all enzymes necessary for histidine biosynthesis according to the KEGG map?
3. Which enzymes appear to be missing?
4. Are there candidate genes for these functions in the list of “Genes w/o enzymes but with candidate KO based enzymes”?
5. Can you find any candidate genes for EC:5.3.1.16 in Actinosynnema mirum 101, DSM 43827?
6. Is there any chromosomal neighborhood evidence that the candidate gene for EC:5.3.1.16 is involved in histidine biosynthesis?

C. Using Compare Gene Annotations.
1. How many genes in the genome of Sanguibacter keddieii belong to the category “No Product Name/With Evidence”?
2. What is the likely function of the gene 646607935 in Sanguibacter keddieii?
3. How many genes in the genome of Sanguibacter keddieii belong to the category “With Product Name/No Evidence”?
4. Is the annotation of the gene 646609130 as “Fimbrial assembly protein (PilN)” likely correct?
5. Annotate the gene 646607935 using the “Enter MyIMG Annotation” tool in Gene Cart. Which gene symbol should be assigned to it? Which EC number should be assigned to it?

IMG-ER Curation Exercises: Answers and Explanations

A. Manual curation of candidate enzyme assignments.
Find the genome of Actinosynnema mirum 101, DSM 43827.

1. How many genes in this genome have no enzyme assignment but have candidate KEGG Orthology-based enzymes?
Answer: 246.
Explanation: Find the genome using the “Quick Genome Search” option or the “Find Genomes” -> “Genome Search” menu. Click on the genome name, and on the “Organism Details” page go to “Genome Statistics”. In the table, find the row “w/o enzymes but with candidate KO based enzymes”.

2. Which gene (excluding pseudogenes) in the above list of candidates has the highest percent identity to the gene with the KO-based enzyme?
Answer: gene_oid 644940672, which has 83% identity to the gene with the KO term “cysteine desulfurase [EC:2.8.1.7]”.
Explanation: Click on the count of genes “w/o enzymes but with candidate KO based enzymes” in the “Genome Statistics” table. Sort the resulting table by the “Percent Identity” column by clicking on the column header. The gene 645947926, with 99% identity, is a pseudogene and is therefore excluded.

3. Consider the gene with gene_oid 644946660. Can you figure out why it wasn’t automatically annotated with a KO term?
Answer: There are conflicting KO term assignments among the close homologs of this gene.
Explanation: Click on the gene_oid to go to the corresponding Gene Details page. Review the protein family assignments and protein family alignments. All protein family assignments agree that it is an acyl-CoA dehydrogenase family protein; butyryl-CoA dehydrogenase is one specific instance of this activity.
Go to Homolog Display and select the “Orthologs only” option (KO term assignments in IMG are propagated only to orthologs, defined as reciprocal best BLASTp hits between genomes).
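The orthology criterion above, combined with the agreement rule this exercise goes on to describe, can be illustrated with a short Python sketch. It is a simplified illustration under stated assumptions, not IMG's actual pipeline: BLASTp hits are assumed to be already parsed into (query, subject, bit score) tuples, the gene names and KO labels (`a1`, `b1`, `KO_X`, ...) are hypothetical placeholders, and the function names `best_hits`, `reciprocal_best_hits` and `consensus_ko` are invented for this example. The 75% cutoff mirrors the filter used on the Top IMG Homolog Hits page.

```python
from typing import Dict, Iterable, Optional, Set, Tuple

# Assumed input format: (query gene, subject gene, bit score) from a BLASTp run.
Hit = Tuple[str, str, float]


def best_hits(hits: Iterable[Hit]) -> Dict[str, str]:
    """Map each query gene to its highest-scoring subject (first hit wins ties)."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b: Iterable[Hit], b_vs_a: Iterable[Hit]) -> Set[Tuple[str, str]]:
    """Return (a, b) pairs where each gene is the other's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return {(a, b) for a, b in forward.items() if reverse.get(b) == a}


def consensus_ko(orthologs: Iterable[Tuple[str, float, Optional[str]]],
                 min_identity: float = 75.0) -> Optional[str]:
    """Return the single KO term shared by orthologs above min_identity, else None.

    Each ortholog is (gene, percent identity, KO term or None).
    Zero or several distinct KO terms both yield None (no assignment).
    """
    kos = {ko for _, identity, ko in orthologs if identity > min_identity and ko is not None}
    return kos.pop() if len(kos) == 1 else None


if __name__ == "__main__":
    # Hypothetical hits: a2 hits b1 strongly, but b1's best hit is a1, so a2 gets no ortholog.
    a_vs_b = [("a1", "b1", 250.0), ("a1", "b2", 90.0), ("a2", "b1", 240.0)]
    b_vs_a = [("b1", "a1", 248.0), ("b1", "a2", 239.0), ("b2", "a2", 60.0)]
    print(sorted(reciprocal_best_hits(a_vs_b, b_vs_a)))  # [('a1', 'b1')]

    # Hypothetical orthologs: two distinct KO terms above 75% identity, so no KO is assigned.
    orthologs = [("g1", 82.0, "KO_X"), ("g2", 78.0, "KO_Y"), ("g3", 90.0, "KO_X"), ("g4", 60.0, "KO_Z")]
    print(consensus_ko(orthologs))  # None
```

Requiring reciprocity is what excludes paralogs such as `a2` here: a strong one-directional hit is not enough to propagate a KO term.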
On the Top IMG Homolog Hits page, select “Percent identity” as the target column, “regex” as the filter, and >75 as the keyword, then click the “Apply” button. This filters the list of orthologs to display only the genes with greater than 75% identity to the target gene. Click “Select All” and add the genes to Gene Cart.
In Gene Cart, select all genes and go to the “Function Cart” tab. Select “KEGG KO” as the filter and click the “Add to Function Cart” button. This adds the KO term assignments of the selected genes to Function Cart.
Three KO terms will appear in Function Cart, indicating that closely related homologs of the target gene have 3 different KO term assignments. IMG does not assign a KO term to a gene if there is a potential conflict between the assignments.

4. Consider the gene with gene_oid 644940710 from the list of genes with candidate KO terms. Should the candidate KO term KO:K01485, cytosine deaminase [EC:3.5.4.1], be assigned to this gene?
Answer: No.
Explanation: Go to the Gene Details page of this gene and review the different annotations provided by IMG. This protein belongs to a family of Zn-dependent deaminases acting on (deoxy)nucleotides and nucleosides, and it has the KO term assignment KO:K01500 E3.5.4.-, which corresponds to this broad multifunctional family (EC 3.5.4.- covers hydrolases acting on carbon-nitrogen bonds, other than peptide bonds, in cyclic amidines). Cytosine deaminase (EC 3.5.4.1) is one representative of this class. However, both the SEED annotation and the IMG term suggest that this protein has a different activity: that of a tRNA-specific adenosine deaminase, i.e., an enzyme that deaminates adenosine rather than cytosine and acts on a nucleoside in tRNA rather than on a free nucleoside.
To verify the SEED and IMG term annotations, find experimentally characterized bacterial tRNA-adenosine deaminases (in E. coli, Staphylococcus aureus and Streptococcus pyogenes), then run IMG Genome BLAST against these genomes only. Sort the resulting table by genome name. It shows that the query gene has 2 homologs in several E. coli genomes and only 1 homolog in the S. aureus and S. pyogenes genomes; the latter are experimentally characterized tRNA-adenosine deaminases. In E. coli, the query gene is orthologous to tadA (b2559), whose product is also an experimentally characterized adenosine deaminase. Furthermore, tRNA-adenosine deaminase activity is ubiquitous in prokaryotic and eukaryotic organisms, so Actinosynnema mirum is likely to need it as well. Therefore, the suggested KO annotation of “cytosine deaminase” is incorrect and should not be assigned to this gene.

5. Which EC number (if any) should be assigned to the gene 644940710?
Answer: The gene should be annotated as EC:3.5.4.-.
Explanation: The authoritative source is Enzyme Nomenclature, at http://www.expasy.org/enzyme/; alternatives include KEGG and MetaCyc pathways. The list of enzymes in class 3.5.4.- does not include any entry for tRNA-specific adenosine deaminases, so a partial EC number corresponding to the enzyme class should be assigned.
To decide whether “tRNA-specific adenosine-34 deaminase” or “tRNA-adenosine deaminase” is the more correct annotation, review the literature to see whether the only adenosine in tRNA that can be deaminated is the one at position 34. If this enzyme can deaminate other adenosines in tRNA, the annotation “adenosine-34” would be too specific.

6. Annotate the gene 644940710 using the “Enter MyIMG annotation” option in Gene Cart.

B. Finding “missing enzymes” and their manual curation.
In the genome of Actinosynnema mirum 101, DSM 43827:

1. How many genes in this genome are connected to KEGG pathways?
Answer: 1408.
Explanation: The number can be found on the “Organism Details” page, in the “Genome Statistics” table, row “Protein coding genes connected to KEGG pathways”.

2. Does it have all enzymes necessary for histidine biosynthesis according to the KEGG map?
Answer: No.
Explanation: Click on the number of “Protein coding genes connected to KEGG pathways”; then, on the “KEGG Categories” page, click on the count of genes in Amino Acid Metabolism (290). Go to the “Histidine metabolism” map and click on the name of the map. Enzymes found in the query genome are colored blue; enzymes not found in the query genome but found in other genomes are colored orange. Two enzymes in the pathway from PRPP to L-histidine are colored orange.

3. Which enzymes appear to be missing?
Answer: EC:5.3.1.16, 1-(5-phosphoribosyl)-5-[(5-phosphoribosylamino)methylideneamino]imidazole-4-carboxamide isomerase, and EC:3.1.3.15, histidinol phosphatase.

4. Are there candidate genes for these functions in the list of “Genes w/o enzymes but with candidate KO based enzymes”?
Answer: No.
Explanation: Go to the “Genome Details” page, “Genome Statistics” table, row “w/o enzymes but with candidate KO based enzymes”, and click on the gene count (265). Use the “Search column” function with “KO definition” as the filter and the enzyme names or EC numbers as keywords.

5. Can you find any candidate genes for EC:5.3.1.16 in Actinosynnema mirum?
Answer: Yes, the gene with gene_oid 644946150.
Explanation: Click on the box with EC:5.3.1.16 in the KEGG map. This brings up the list of genes in genomes other than Actinosynnema mirum that are annotated with this KO term. Select one of these genes as a query. Since Actinosynnema mirum is an actinobacterium, it is better to select another actinobacterial gene as the query, for instance the gene 637266285 from Streptomyces coelicolor (you can filter the list by the Genome column using Streptomyces as a keyword). Perform IMG Genome BLAST of the gene 637266285 against the genome of Actinosynnema mirum.
There is a gene in Actinosynnema with 72% identity to the S. coelicolor protein annotated as EC:5.3.1.16; its gene_oid is 644946150.

6. Is there any chromosomal neighborhood evidence that the candidate gene for EC:5.3.1.16 is involved in histidine biosynthesis?
Answer: Yes.
Explanation: Go to the Gene Details page of the gene 644946150. In the “Evidence For Function Prediction” section of the page, review the chromosomal context. The query gene is colored red, and 3 neighboring genes are colored green, indicating that the query gene and these 3 genes have KO terms from the same KEGG map; they include the hisH gene next to the candidate gene. You can mouse over these genes to review their annotations. You can also review conserved chromosomal neighborhoods using orthologs (e.g., click on “Show ortholog neighborhood regions”). The chromosomal neighborhood is well conserved in Actinobacteria.

C. Using Compare Gene Annotations.

1. How many genes in the genome of Sanguibacter keddieii belong to the category “No Product Name/With Evidence”?
Answer: 361.
Explanation: Go to the Sanguibacter keddieii “Organism Details” page and follow the “Compare Gene Annotations” link; 3735 genes are retrieved. Select the filter option “No Product Name/With Evidence”; 361 genes are retrieved.

2. What is the likely function of the gene 646607935 in Sanguibacter keddieii?
Answer: L-rhamnose mutarotase.
Explanation: This gene has the product name “hypothetical protein” but is assigned to KO term K03534 rhaM, L-rhamnose mutarotase [EC:5.1.3.-]. It is the first gene in a conserved chromosomal cassette (likely an operon) that encodes other L-rhamnose degradation genes. L-rhamnose mutarotase catalyzes the interconversion of alpha-L-rhamnose and beta-L-rhamnose; the next step in the pathway is catalyzed by L-rhamnose kinase, which is specific for beta-L-rhamnose.

3. How many genes in the genome of Sanguibacter keddieii belong to the category “With Product Name/No Evidence”?
Answer: 23.

4. Is the annotation of the gene 646609130 as “Fimbrial assembly protein (PilN)” likely correct?
Answer: This annotation is likely correct.
Explanation: Go to the Gene Details page and mouse over the genes immediately upstream and downstream of the query gene. The query gene appears to be in the middle of an operon for pilus biosynthesis.

5. Annotate the gene 646607935 using the “Enter MyIMG Annotation” tool in Gene Cart. Which gene symbol should be assigned to it? Which EC number should be assigned to it?
Answer: Gene symbol rhaM (or rhaU); EC number EC:5.1.3.-.
Explanation: KEGG assigns the gene symbol rhaM to this gene, based on E. coli nomenclature. However, there is an alternative gene symbol, rhaU, based on Rhizobium leguminosarum nomenclature. There are no established rules for assigning gene symbols when multiple nomenclatures exist. Either rhaM or rhaU can be used, but the other symbol should be added in “Notes”. There is no complete EC number for rhamnose mutarotase. A search in Enzyme Nomenclature shows that a preliminary EC number, 5.1.3.n3, has been assigned to this enzyme, but using it would cause problems upon GenBank submission. The partial EC number should therefore be used for this gene: “EC:5.1.3.-”.
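The EC-number rule applied in A.5 and C.5 (use a complete EC number only when one exists for the actual activity; otherwise fall back to the partial, sub-subclass-level number) can be sketched in Python. This is an illustrative helper under stated assumptions, not IMG or GenBank validation code: the function names and regular expressions are hypothetical, and the sketch treats any non-numeric fourth field (such as the preliminary “n3” in 5.1.3.n3) as unsuitable for submission. Deciding that a complete number such as 3.5.4.1 describes the wrong activity, as in A.5, remains a curation judgment that the code cannot make; in that case the curator calls `partial_ec` directly.

```python
import re

# A complete, fully numeric EC number, e.g. "3.5.4.1".
COMPLETE_EC = re.compile(r"^\d+\.\d+\.\d+\.\d+$")
# Three numeric levels plus any fourth field, e.g. "5.1.3.n3" or "3.5.4.-".
EC_WITH_FOURTH_FIELD = re.compile(r"^(\d+\.\d+\.\d+)\.(\S+)$")


def _strip_prefix(ec: str) -> str:
    """Remove surrounding whitespace and an optional 'EC:' prefix."""
    ec = ec.strip()
    return ec[3:] if ec.upper().startswith("EC:") else ec


def partial_ec(ec: str) -> str:
    """Truncate an EC number to its sub-subclass, e.g. '3.5.4.1' -> 'EC:3.5.4.-'."""
    match = EC_WITH_FOURTH_FIELD.match(_strip_prefix(ec))
    if match is None:
        raise ValueError(f"unrecognized EC number: {ec!r}")
    return f"EC:{match.group(1)}.-"


def ec_for_submission(ec: str) -> str:
    """Keep a complete numeric EC number; otherwise fall back to the partial number."""
    body = _strip_prefix(ec)
    if COMPLETE_EC.match(body):
        return f"EC:{body}"
    return partial_ec(body)


if __name__ == "__main__":
    print(ec_for_submission("5.1.3.n3"))  # EC:5.1.3.-  (preliminary number, as in C.5)
    print(partial_ec("EC:3.5.4.1"))       # EC:3.5.4.-  (class-level number, as in A.5)
```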