PathoLogic - Bioinformatics Research Group at SRI International

PathoLogic Pathway Predictor SRI International Bioinformatics Inference of Metabolic Pathways Annotated Genomic Sequence Pathway/Genome Database Gene Products Pathways Genes/ORFs DNA Sequences Multi-organism Pathway Database (MetaCyc) Pathways Reactions PathoLogic Software Integrates genome and pathway data to identify putative metabolic networks Compounds Gene Products Genes Reactions Genomic Map Compounds PathoLogic Functionality  Initialize SRI International Bioinformatics schema for new PGDB  Transform existing genome to PGDB form  Infer metabolic pathways and store in PGDB  Infer operons and store in PGDB  Assemble Overview diagram  Assist user with manual tasks  Assign enzymes to reactions they catalyze  Identify false-positive pathway predictions  Build protein complexes from monomers  Infer transport reactions SRI International Bioinformatics PathoLogic Input/Output  Inputs:  File listing genetic elements      http://bioinformatics.ai.sri.com/ptools/genetic-elements.dat Files containing DNA sequence for each genetic element Files containing annotation for each genetic element MetaCyc database Output:  Pathway/genome database for the subject organism  Reports that summarize:   Evidence contained in the input genome for the presence of reference pathways Reactions missing from inferred pathways PathoLogic Analysis Phases     SRI International Bioinformatics Trial parsing of input data files [few days] Initialize schema of new PGDB [3 min] Create DB objects for replicons, genes, proteins [5 min] Assign enzymes to reactions they catalyze  ferrochelatase [10 min / 1 week]  glutamate 1-semialdehyde 2,1-aminomutase  porphobilinogen deaminase A E1 B E2 C D E G F PathoLogic Analysis Phases SRI International Bioinformatics  From assigned reactions, infer what pathways are present [5 min / few days]  Define metabolic overview diagram  Define protein complexes [30 min] [few days] genetic-elements.dat ID TEST-CHROM-1 NAME Chromosome 1 TYPE :CHRSM CIRCULAR? N ANNOT-FILE chrom1.pf SEQ-FILE chrom1.fsa // ID TEST-CHROM-2 NAME Chromosome 2 CIRCULAR? N ANNOT-FILE /mydata/chrom2.gbk SEQ-FILE /mydata/chrom2.fna // SRI International Bioinformatics SRI International Bioinformatics File Naming Conventions  One pair of sequence and annotation files for each genetic element  Sequence files: FASTA format  suffix fsa or fna  Annotation file:  Genbank format: suffix .gbk  PathoLogic format: suffix .pf SRI International Bioinformatics Typical Problems Using Genbank Files With PathoLogic  Wrong qualifier names used: read PathoLogic documentation!  Extraneous  Check information in a given qualifier results of trial parse carefully GenBank File Format  Accepted feature types:  CDS, tRNA, rRNA, misc_RNA  Accepted qualifiers:  /locus_tag  /gene  /product  /EC_number  /product_comment  /gene_comment  /alt_name  /pseudo  SRI International Bioinformatics Unique ID [recm] Gene name [req] [req] [recm] [opt] [opt] Synonyms [opt] Gene is a pseudogene [opt] For multifunctional proteins, put each function in a separate /product line PathoLogic File Format    SRI International Bioinformatics Each record starts with line containing an ID attribute Tab delimited Each record ends with a line containing //  One attribute-value pair is allowed per line  Use multiple FUNCTION lines for multifunctional proteins  Lines starting with ‘;’ are comment lines  Valid attributes are:  ID, NAME, SYNONYM  STARTBASE, ENDBASE, GENE-COMMENT  FUNCTION, PRODUCT-TYPE, EC, FUNCTION-COMMENT  DBLINK  INTRON PathoLogic File Format SRI International Bioinformatics ID TP0734 NAME deoD STARTBASE 799084 ENDBASE 799785 FUNCTION purine nucleoside phosphorylase DBLINK PID:g3323039 PRODUCT-TYPE P GENE-COMMENT similar to GP:1638807 percent identity: 57.51; identified by sequence similarity; putative // ID TP0735 NAME gltA STARTBASE 799867 ENDBASE 801423 FUNCTION glutamate synthase DBLINK PID:g3323040 PRODUCT-TYPE P SRI International Bioinformatics Before you start: What to do when an error occurs Navigator errors are automatically trapped – debugging information is saved to error.tmp file.  All other errors (including most PathoLogic errors) will cause software to drop into the Lisp debugger  Unix: error message will show up in the original terminal window from which you started Pathway Tools.  Windows: Error message will show up in the Lisp console. The Lisp console usually starts out iconified – its icon is a blue bust of Franz Liszt  2 goals when an error occurs:  Try to continue working  Obtain enough information for a bug report to send to pathway-tools support team.  Most The Lisp Debugger  SRI International Bioinformatics Sample error (details and number of restart actions differ for each case) Error: Received signal number 2 (Keyboard interrupt) Restart actions (select using :continue): 0: continue computation 1: Return to command level 2: Pathway Tools version 10.0 top level 3: Exit Pathway Tools version 10.0 [1c] EC(2):  To generate debugging information (stack backtrace): :zoom :count :all  To continue from error, find a restart that takes you to the top level – in this case, number 2 :cont 2  To exit Pathway Tools: :exit How to report an error  Determine SRI International Bioinformatics if problem is reproducible, and how to reproduce it (make sure you have all the latest patches installed)  Send email to ptools-support@ai.sri.com containing:  Pathway Tools version number and platform  Description of exactly what you were doing (which command you invoked, what you typed, etc.) or instructions for how to reproduce the problem  error.tmp file, if one was generated  If software breaks into the lisp debugger, the complete error message and stack backtrace (obtained using the command :zoom :count :all, as described on previous slide) SRI International Bioinformatics Using the PPP GUI to Create a Pathway/Genome Database  Input Project Information  Organism -> Create New SRI International Bioinformatics Input Project Information Next Steps  Trial Parse  Build -> Trial Parse  Fix any errors in input files  Build pathway/genome database  Build -> Automated Build SRI International Bioinformatics SRI International Bioinformatics PathoLogic Parser Output SRI International Bioinformatics Assign Enzymes to Reactions 5.1.3.2 Gene product MetaCyc UDP-glucose-4epimerase Match yes no Probable enzyme -ase no yes Not a metabolic enzyme Assign UDP-D-glucose  UDP-galactose Manually search no Can’t Assign yes Assign Enzyme Name Matcher  Matches SRI International Bioinformatics on full enzyme name  Match is case-insensitive and removes the punctuation characters “ -_(){}',:”  Also matches after removal of prefixes and suffixes such as:  “Putative”, “Hypothetical”, etc  alpha|beta|…|catalytic|inducible chain|subunit|component  Parenthetical gene name Enzyme Name Matcher  For SRI International Bioinformatics names that do not match, software identifies probable metabolic enzymes as those  Containing “ase”  Not containing keywords such as      “sensor kinase” “topoisomerase” “protein kinase” “peptidase” Etc  Research unknown enzymes  MetaCyc, Swiss-Prot, PubMed SRI International Bioinformatics Enzyme Name to Reaction Mapping See also file PTools Tutorial/PathoLogic Reports/name-matching-report.txt SRI International Bioinformatics Manual Polishing  Refine -> Assign Probable Enzymes  Do this first  Refine -> Rescore Pathways  Redo after assigning enzymes  Refine -> Create Protein Complexes  Can be done at any time  Refine -> Assign Modified Proteins  Can be done at any time  Refine -> Transport Identification Parser  Can be done at any time  Refine -> Pathway Hole Filler  Refine -> Predict Transcription Units  Refine -> Update Overview  Do this last, and repeat after any material changes to PGDB Assign Probable Enzymes SRI International Bioinformatics SRI International Bioinformatics How to find reactions for probable enzymes  First, verify that enzyme name describes a specific, metabolic function  Search for fragment of name in MetaCyc – you may be able to find a match that PathoLogic missed  Look up protein in SwissProt or other DBs  Search for gene name in PGDB for related organism (bear in mind that gene names are not reliable indicators of function, so check carefully)  Search for function name in PubMed  Other… Manual Polishing  Refine -> Assign Probable Enzymes  Refine -> Rescore Pathways  Refine -> Create Protein Complexes  Refine -> Assign Modified Proteins  Refine -> Transport Identification Parser  Refine -> Pathway Hole Filler  Refine -> Predict Transcription Units  Refine -> Run Consistency Checker  Refine -> Update Overview SRI International Bioinformatics Automated Pathway Inference  All SRI International Bioinformatics pathways in MetaCyc for which there is at least one enzyme identified in the target organism are considered for possible inclusion.  Algorithm errs on side of inclusivity – easier to manually delete a pathway from an organism than to find a pathway that should have been predicted but wasn’t. SRI International Bioinformatics Considerations taken into account when deciding whether or not a pathway should be inferred:     Is there a unique enzyme – an enzyme not involved in any other pathway? Does the organism fall in the expected taxonomic domain of the pathway? Is this pathway part of a variant set, and, if so, is there more evidence for some other variant? If there is no unique enzyme:  Is there evidence for more than one enzyme?  If a biosynthetic pathway, is there evidence for final reaction(s)?  If a degradation pathway, is there evidence for initial reaction(s)?  If an energy metabolism pathway, is there evidence for more than half the reactions? SRI International Bioinformatics Assigning Evidence Scores to Predicted Pathways  X|Y|Z denotes score for P in O  where:    X = total number of reactions in P Y = enzymes catalyzing number of reactions for which there is evidence in O Z = number of Y reactions that are used in other pathways in O Manual Pruning of Pathways SRI International Bioinformatics  Use pathway evidence report  Coloring scheme aids in assessing pathway evidence  Phase I: Prune extra variant pathways  Rescore pathways, re-generate pathway evidence report  Phase II: Prune pathways unlikely to be present  No/few unique enzymes  Most pathway steps present because they are used in another pathway  Pathway very unlikely to be present in this organism  Nonspecific enzyme name assigned to a pathway step Caveats  Cannot predict pathways not present in MetaCyc  Evidence  Since SRI International Bioinformatics for short pathways is hard to interpret many reactions occur in multiple pathways, some false positives Output from PPP  Pathway/genome SRI International Bioinformatics database  Summary pages  Pathway evidence page   Click “Summary of Organisms”, then click organism name, then click “Pathway Evidence”, then click “Save Pathway Report” Missing enzymes report  Directory etc. tree containing sequence files, reports, SRI International Bioinformatics Resulting Directory Structure  ROOT/ptools-local/pgdbs/user/ORGIDcyc/VERSION/  input       reports    ORGIDbase.ocelot data   name-matching-report.txt trial-parse-report.txt kb   organism.dat organism-init.dat genetic-elements.dat annotation files sequence files overview.graph released -> VERSION Manual Polishing  Refine -> Assign Probable Enzymes  Refine -> Rescore Pathways  Refine -> Create Protein Complexes  Refine -> Assign Modified Proteins  Refine -> Transport Identification Parser  Refine -> Pathway Hole Filler  Refine -> Predict Transcription Units  Refine -> Run Consistency Checker  Refine -> Update Overview SRI International Bioinformatics SRI International Bioinformatics Creating Protein Complexes SRI International Bioinformatics Complex Subunits Stoichiometries Manual Polishing  Refine -> Assign Probable Enzymes  Refine -> Re-run Name Matcher  Refine -> Create Protein Complexes  Refine -> Assign Modified Proteins  Refine -> Transport Identification Parser  Refine -> Pathway Hole Filler  Refine -> Predict Transcription Units  Refine -> Run Consistency Checker  Refine -> Update Overview SRI International Bioinformatics SRI International Bioinformatics Proteins as Reaction Substrates Manual polishing  Refine -> Assign Probable Enzymes  Refine -> Rescore Pathways  Refine -> Create Protein Complexes  Refine -> Assign Modified Proteins  Refine -> Transport Identification Parser  Refine -> Pathway Hole Filler  Refine -> Predict Transcription Units  Refine -> Run Consistency Checker  Refine -> Update Overview SRI International Bioinformatics Nomenclature SRI International Bioinformatics • WO pair = pair of genes within an operon • TUB pair = pair of genes at a transcription unit boundary (delineate operons) SRI International Bioinformatics Operation of the operon predictor   For each contiguous gene pair, predict whether gene pairs are within the same operon or at a transcription unit boundary Use pairwise predictions to identify potential operons AB = TUB pair BC = WO pair CD = WO pair DE = TUB pair A operon = BCD B C D E Operon predictor SRI International Bioinformatics  Predicts operon gene pairs based on:  intergenic distance between genes  genes in the same functional class  Typically used for operon prediction  We use method from Salgado et al, PNAS (2000) as a starting point.   Uses E. coli experimentally verified data as a training set. Compute log likelihood of two genes being WO or TUB pair based on intergenic distance. Operon predictor SRI International Bioinformatics Additional features easily computed from a PGDB 1. both genes products enzymes in the same metabolic pathway 2. both gene products monomers in the same protein complex 3. one gene product transports a substrate for a metabolic pathway in which the other gene product is involved as an enzyme 4. a gene upstream or downstream from the gene pair (and within the same directon) is related to either one of the genes in the pair as per features 1, 2 and 3 above.

PathoLogic - Bioinformatics Research Group at SRI International

Related documents

Products

Support

PathoLogic - Bioinformatics Research Group at SRI International

Related documents

Add this document to collection(s)

Add this document to saved

Suggest us how to improve StudyLib