Amherst, NY
May 16, 2013
Cathy H. Wu, Ph.D.
• Ontology Developers
– GO ontology: interfaces of GO/PRO complexes; GO definitions (e.g., GO:0005109)
– GO annotation: precise annotation of protein forms in PomBase
– Cell Ontology: defining cell types based on protein types
– Annotation Ontology for annotating scientific documents on the web
– Brucellosis Ontology (IDOBRU), an extension of the Infectious Disease Ontology (IDO)
• Semantic Resources
– Semantic Web Applications in Neuromedicine (SWAN); Neuroscience Information Framework (NIF)
• Pathway/Process-Modeling Resources
– Reactome, MouseCyc, EcoCyc, MaizeCyc
• Chemical/Proteomic Resources: PubChem, IUPHAR, P3DB, Top-Down Proteomics, PDB
• Pharma/Clinical Communities: Drug Discovery & Disease Biomarker
– Alzforum
– Salivaomics KB/SALO (Saliva Ontology): saliva biomarkers
– Clinical flow cytometry, immunology (ImmPort) community
• List all the genes differentially expressed in the leaves of rice varieties IRBB5 and IR24 at the 5-leaf visible growth stage, when the plants were infected with Xanthomonas oryzae pv. oryzae and grown in a growth chamber. IRBB5 is resistant and IR24 is susceptible to rice bacterial blight disease.
• Filter the differentially expressed gene set for those with:
– LRR domains
– Transmembrane domains (e.g., more than one)
– Receptor-like kinase function
– Plasma membrane cellular location
– OR those having tryptophan decarboxylase function
– Involvement in tryptophan metabolism
– Known alleles and homologs with a disease-resistance phenotype
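The filter above can be sketched as a predicate over gene records. This is a minimal illustration, not a description of any existing tool: the `Gene` record, its field names and the label strings are hypothetical, and because the slide does not spell out how the criteria combine, the sketch assumes the first four criteria are ANDed into one receptor-like profile and that each remaining criterion is an OR alternative.

```python
from dataclasses import dataclass, field


@dataclass
class Gene:
    """Hypothetical record for one differentially expressed gene."""
    gene_id: str
    domains: set = field(default_factory=set)     # protein domains, e.g. {"LRR"}
    tm_domains: int = 0                           # number of transmembrane domains
    functions: set = field(default_factory=set)   # molecular function labels
    locations: set = field(default_factory=set)   # cellular component labels
    processes: set = field(default_factory=set)   # biological process labels
    resistance_alleles_or_homologs: bool = False  # alleles/homologs with a resistance phenotype


def receptor_like(g: Gene) -> bool:
    """Assumed AND of the first four criteria (LRR, >1 TM domain, RLK function, plasma membrane)."""
    return ("LRR" in g.domains
            and g.tm_domains > 1
            and "receptor-like kinase" in g.functions
            and "plasma membrane" in g.locations)


def passes_filter(g: Gene) -> bool:
    """Assumed OR between the receptor-like profile and each remaining criterion."""
    return (receptor_like(g)
            or "tryptophan decarboxylase" in g.functions
            or "tryptophan metabolism" in g.processes
            or g.resistance_alleles_or_homologs)


def filter_genes(genes):
    """Return the IDs of genes that pass the combined filter, in input order."""
    return [g.gene_id for g in genes if passes_filter(g)]


# Toy input with made-up gene IDs.
genes = [
    Gene("g1", domains={"LRR"}, tm_domains=2,
         functions={"receptor-like kinase"}, locations={"plasma membrane"}),
    Gene("g2", domains={"LRR"}, tm_domains=1,
         functions={"receptor-like kinase"}, locations={"plasma membrane"}),
    Gene("g3", functions={"tryptophan decarboxylase"}),
    Gene("g4"),
]
print(filter_genes(genes))  # ['g1', 'g3']
```

Gene g2 is dropped because it has only one transmembrane domain; a different reading of the AND/OR grouping would only change `passes_filter`.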
Object: XX (object type: PRO)
Feature type: feature ontology
– Molecular Function: GO
– Biological Process: GO
– Cellular Component: GO
– Plant structure: PO
– Plant growth stage: PO
– (Bio)chemical: ChEBI
– Disease: DO
– Protein domains: InterPro
– Pathway: ?? (ontology not yet determined)
– Trait: TO
Attribute and score: PATO
Context: any of the ontologies, including the Environment Ontology, for adding context to the annotation.
E.g., PEP carboxylase activity (GO-MF) in maize is required for C4 carbon assimilation (GO-BP). The process occurs in the plastid (GO-CC) of the leaf mesophyll cell (PO).
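The annotation model above (object, object type, feature type with its ontology, PATO attribute, and context terms) can be sketched as a small data structure. This is an illustrative sketch only: the class and field names are hypothetical, term identifiers are omitted, Pathway is left out because its ontology is undecided, and the PEP carboxylase example is encoded with the labels used on the slide.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Term:
    ontology: str  # e.g. "GO", "PO", "PATO"
    label: str     # term name; identifiers omitted in this sketch


# Feature type -> ontology, as listed above (Pathway left out: undecided).
FEATURE_ONTOLOGY = {
    "Molecular Function": "GO",
    "Biological Process": "GO",
    "Cellular Component": "GO",
    "Plant structure": "PO",
    "Plant growth stage": "PO",
    "(Bio)chemical": "ChEBI",
    "Disease": "DO",
    "Protein domains": "InterPro",
    "Trait": "TO",
}


@dataclass
class Annotation:
    obj: str                            # the annotated object ("XX")
    object_type: Term                   # assumed to be a PRO term
    feature_type: str                   # a key of FEATURE_ONTOLOGY
    feature: Term                       # the annotated value
    attribute: Optional[Term] = None    # PATO attribute/score, if any
    context: List[Term] = field(default_factory=list)  # terms from any ontology


def is_consistent(a: Annotation) -> bool:
    """True if the feature term comes from the ontology expected for its feature type."""
    return FEATURE_ONTOLOGY.get(a.feature_type) == a.feature.ontology


# The PEP carboxylase example as one annotation plus context terms.
pepc = Annotation(
    obj="PEP carboxylase (maize)",
    object_type=Term("PRO", "PEP carboxylase"),
    feature_type="Molecular Function",
    feature=Term("GO", "PEP carboxylase activity"),
    context=[
        Term("GO", "C4 carbon assimilation"),  # biological process
        Term("GO", "plastid"),                 # cellular component
        Term("PO", "leaf mesophyll cell"),
    ],
)
print(is_consistent(pepc))  # True
```

Keeping the feature-type-to-ontology mapping in one table makes it easy to reject annotations whose value comes from the wrong ontology.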
[Figure: annotation graph with nodes Gene:XA21, Allele-A, Allele-B, Oryza genotype (Allele-A belongs_to Oryza genotype), GO: response to pathogen, and GO: Receptor like Kinase]
Data Sources
• Manual annotation (curator, collaborator, user): SourceForge tracker; RACE-PRO
• Semi-automated processing of external databases (e.g., UniProtKB, Reactome, MouseCyc, EcoCyc); coverage of 12 reference genomes in progress
• Integration with text mining: RLIMS-P/eFIP (Phosphorylation and Functional Impact)
RACE-PRO annotation interface: captures knowledge of protein forms/complexes of interest to support integrated analysis
PRO representation of the spindle checkpoint
PRO search query retrieving PRO terms that contain the phrases “spindle checkpoint”, “spindle assembly checkpoint”, or “mitotic checkpoint”, with a combined Cytoscape Web view of the search results: nodes retrieved by the search are blue; related nodes (parents and children) are gray.
Use of the protein ontology for multi-faceted analysis of biological processes: a case study of the spindle checkpoint. Ross et al. (2013) Front Genet. 4:62. [PMID: 23637705]
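The search-and-view step can be sketched as a phrase match over term names followed by a one-step expansion to parents and children. This is a minimal sketch, not the PRO website's implementation: the term IDs, names and is_a edges below are made up, and matching is assumed to be a case-insensitive substring test.

```python
def search_terms(names, phrases):
    """Return IDs of terms whose name contains any phrase (case-insensitive)."""
    lowered = [p.lower() for p in phrases]
    return {tid for tid, name in names.items()
            if any(p in name.lower() for p in lowered)}


def color_view(hits, is_a_edges):
    """Color search hits blue and their direct parents/children gray.

    is_a_edges: iterable of (child_id, parent_id) pairs.
    """
    colors = {tid: "blue" for tid in hits}
    for child, parent in is_a_edges:
        if child in hits and parent not in colors:
            colors[parent] = "gray"   # parent of a hit
        if parent in hits and child not in colors:
            colors[child] = "gray"    # child of a hit
    return colors


# Made-up terms and edges for illustration.
names = {
    "A": "spindle assembly checkpoint protein",
    "B": "mitotic checkpoint kinase, phosphorylated form",
    "C": "protein kinase",
    "D": "unrelated protein",
}
edges = [("A", "C"), ("B", "A"), ("D", "E")]
phrases = ["spindle checkpoint", "spindle assembly checkpoint", "mitotic checkpoint"]

hits = search_terms(names, phrases)
print(color_view(hits, edges))  # {'A': 'blue', 'B': 'blue', 'C': 'gray'}
```

The `not in colors` checks keep a node that is itself a search hit from being recolored gray when it is also a neighbor of another hit.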
Phosphorylated forms of BUB1B in PRO [PMID: 23637705]
Four species-independent BUB1B phosphorylated forms (blue nodes). Display options are set to show parents and all children, including organism-level terms.
Sequence alignment of human, frog, and mouse BUB1B, highlighted to indicate experimentally determined phosphorylation sites (blue) and predicted phosphorylation sites (red).
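Comparing phosphorylation sites across an alignment requires mapping each sequence's residue numbering onto alignment columns, since gaps shift positions differently in each species. A minimal sketch of that mapping follows; the sequences and site positions are toy values, not BUB1B data, and the "experimental"/"predicted" labels simply mirror the blue/red highlighting described above.

```python
def residue_to_column(aligned_seq):
    """Map 1-based residue positions of the ungapped sequence to 0-based alignment columns."""
    mapping = {}
    pos = 0
    for col, ch in enumerate(aligned_seq):
        if ch != "-":          # gaps occupy a column but are not residues
            pos += 1
            mapping[pos] = col
    return mapping


def sites_per_column(alignment, sites):
    """Group per-species sites by alignment column.

    alignment: {species: aligned_seq}
    sites:     {species: {residue_pos: label}}
    Returns    {column: {species: label}}
    """
    columns = {}
    for species, seq in alignment.items():
        mapping = residue_to_column(seq)
        for pos, label in sites.get(species, {}).items():
            columns.setdefault(mapping[pos], {})[species] = label
    return columns


# Toy alignment: the human sequence has a gap at column 2.
alignment = {"human": "MS-TPK", "frog": "MSATPK"}
sites = {"human": {2: "experimental"}, "frog": {2: "predicted", 4: "predicted"}}
print(sites_per_column(alignment, sites))
# {1: {'human': 'experimental', 'frog': 'predicted'}, 3: {'frog': 'predicted'}}
```

Column 3 holds threonine in both toy sequences, so a site predicted in frog can be checked against the aligned human residue even though their residue numbers differ (4 vs. 3).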
PTM network of enzyme-substrate relationships and protein-protein interactions: iPTMnet, with rich relations
• Data Mining: iProClass database for molecular and omics data integration
• Text Mining: RLIMS-P/eFIP system for knowledge extraction from literature
• Ontology: PRO for knowledge representation of PTM forms
• Web portal linking data and analysis/visualization tools for scientific queries (http://proteininformationresource.org/iPTMnet)
• Literature-curated kinase-substrate data
– PhosphoSitePlus, Phospho.ELM, HPRD
– PhosphoGRID
– P3DB, PhosPhAt
– UniProtKB, PRO
• Database content
– Substrates: 28,000; P-sites: 126,000; kinases: 700
– Substrate/site-kinase pairs: 13,000
– Coverage: human, mouse, rat, other vertebrates, Drosophila, C. elegans, yeast, and plants
– Curated phosphorylation papers: 10,000
• Full-scale processing of PubMed abstracts: 22 million
– Phosphorylation papers identified by RLIMS-P: 143,000
– Phosphorylation-PPI related papers identified by eFIP: 10,000
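As a quick back-of-envelope reading of the counts above (which are rounded figures), the database averages about 4.5 phosphorylation sites per substrate, and RLIMS-P flagged well under 1% of the processed PubMed abstracts as phosphorylation papers:

```python
# Approximate counts as given above.
substrates = 28_000
p_sites = 126_000
pubmed_abstracts = 22_000_000
rlims_p_papers = 143_000

sites_per_substrate = p_sites / substrates
rlims_p_fraction = rlims_p_papers / pubmed_abstracts

print(f"{sites_per_substrate:.1f} sites per substrate")  # 4.5 sites per substrate
print(f"{rlims_p_fraction:.2%} of abstracts flagged")    # 0.65% of abstracts flagged
```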
Exploring Relations
• Substrate-centric:
What PTM forms of a protein and their modifying enzymes are known?
• Enzyme-centric:
What substrates are known for a given PTM enzyme?
• Interaction:
What interacting partners are known for each PTM form of a given protein?
• Pathway:
What modifications and enzymes are known in a given signaling pathway?
• Other relations: homology, disease, tissue/cell, etc.
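The first three query types can be sketched over a tiny in-memory set of enzyme-substrate and interaction records. This is an illustrative sketch, not iPTMnet's data model or API: the record classes and function names are hypothetical, the toy records use the BUB1B, CDK1, PLK1 and PPP2R5A relations from the spindle checkpoint example, and pathway queries are omitted because they would need pathway membership data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Modification:
    enzyme: str     # modifying enzyme, e.g. a kinase
    substrate: str  # modified protein
    form: str       # resulting PTM form, e.g. "BUB1B/Phos:2"


@dataclass(frozen=True)
class Interaction:
    form: str       # PTM form taking part in the interaction
    partner: str    # interacting protein


MODS = [
    Modification("CDK1", "BUB1B", "BUB1B/Phos:2"),
    Modification("PLK1", "BUB1B", "BUB1B/Phos:2"),
]
PPIS = [Interaction("BUB1B/Phos:2", "PPP2R5A")]


def forms_and_enzymes(substrate):
    """Substrate-centric: PTM forms of a protein and their modifying enzymes."""
    result = {}
    for m in MODS:
        if m.substrate == substrate:
            result.setdefault(m.form, set()).add(m.enzyme)
    return result


def substrates_of(enzyme):
    """Enzyme-centric: substrates known for a given PTM enzyme."""
    return {m.substrate for m in MODS if m.enzyme == enzyme}


def partners_by_form(substrate):
    """Interaction: known partners for each PTM form of a protein."""
    return {form: {i.partner for i in PPIS if i.form == form}
            for form in forms_and_enzymes(substrate)}


print(forms_and_enzymes("BUB1B"))  # {'BUB1B/Phos:2': {'CDK1', 'PLK1'}} (set order may vary)
print(substrates_of("PLK1"))       # {'BUB1B'}
print(partners_by_form("BUB1B"))   # {'BUB1B/Phos:2': {'PPP2R5A'}}
```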
• 73 nodes
• 24 phosphorylated forms
• 9 protein kinases
• 10 phospho-specific PPIs
• BUB1B/Phos:2 interacts specifically with PPP2R5A
• BUB1B/Phos:2 is phosphorylated by two important mitotic kinases: CDK1 and PLK1
• BUB1B interacts with both phosphorylated and unphosphorylated CDC27
• Phosphorylation at the CDC27/Phos:1 sites does not regulate the CDC27 interaction with BUB1B
Construction of protein phosphorylation networks by data mining, text mining, and ontology integration: analysis of the spindle checkpoint. Ross et al. (2013) Database (Oxford) (in press).
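The distinction drawn above between phospho-specific interactions (BUB1B/Phos:2 with PPP2R5A) and phosphorylation-independent ones (BUB1B with CDC27) can be made explicit by comparing the partners of a phosphorylated form with those of the unphosphorylated protein. A minimal sketch, assuming a hypothetical notation in which a bare protein name denotes its unphosphorylated form; the observation set paraphrases the bullets above.

```python
# Observed interactions as (protein form, partner) pairs. Assumed notation:
# "X/Phos:n" is a phosphorylated form, a bare name is the unphosphorylated form.
OBSERVED = {
    ("BUB1B/Phos:2", "PPP2R5A"),
    ("CDC27/Phos:1", "BUB1B"),
    ("CDC27", "BUB1B"),
}


def classify(protein, phos_form, partner, observed):
    """Classify a protein-partner interaction by its dependence on phosphorylation."""
    with_phos = (phos_form, partner) in observed
    with_unphos = (protein, partner) in observed
    if with_phos and not with_unphos:
        return "phospho-specific"
    if with_phos and with_unphos:
        return "phosphorylation-independent"
    if with_unphos:
        return "unphosphorylated-only"
    return "no interaction observed"


print(classify("BUB1B", "BUB1B/Phos:2", "PPP2R5A", OBSERVED))  # phospho-specific
print(classify("CDC27", "CDC27/Phos:1", "BUB1B", OBSERVED))    # phosphorylation-independent
```

Absence from the observation set is treated as absence of the interaction, which is a simplification: real evidence is rarely that complete.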
• Brassinosteroids (BRs): a class of growth-promoting hormones that play a role in plant growth and development.
• BR signaling is highly integrated with the light, gibberellin, and auxin pathways, and cross-talks with other receptor kinase pathways to modulate stomatal development and innate immunity.
BR signaling curation
Step 1: Search RLIMS-P with the core genes (bri1, bak1, bin2, bsu1, bzr1, bes1) and “brassinosteroid mediated signaling pathway” to identify papers with phosphorylation information (kinase, substrate, site)
Step 2: Use RACE-PRO to curate phosphorylated protein forms, their kinases, PPIs, and associated GO functions, processes, and subcellular components
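Step 1 can be sketched as building one query from the core genes and the process phrase, then keeping only extracted records that carry the full kinase/substrate/site triple. This is a hypothetical sketch: it uses generic boolean syntax rather than RLIMS-P's actual input format, assumes the genes and phrase are combined with OR, and the record fields and values are placeholders, not RLIMS-P output.

```python
CORE_GENES = ["bri1", "bak1", "bin2", "bsu1", "bzr1", "bes1"]
PROCESS_PHRASE = "brassinosteroid mediated signaling pathway"


def build_query(genes, phrase):
    """OR together the gene names and the quoted process phrase (generic boolean syntax)."""
    return " OR ".join(list(genes) + [f'"{phrase}"'])


def complete_records(records):
    """Keep extracted records that name a kinase, a substrate and a site (assumed field names)."""
    return [r for r in records
            if r.get("kinase") and r.get("substrate") and r.get("site")]


print(build_query(CORE_GENES, PROCESS_PHRASE))
# bri1 OR bak1 OR bin2 OR bsu1 OR bzr1 OR bes1 OR "brassinosteroid mediated signaling pathway"

# Toy extraction results with placeholder values.
records = [
    {"pmid": "P1", "kinase": "K1", "substrate": "S1", "site": "Ser10"},
    {"pmid": "P2", "kinase": None, "substrate": "S2", "site": "Thr5"},
]
print([r["pmid"] for r in complete_records(records)])  # ['P1']
```

Records that lack one element of the triple are not discarded by curators in practice; in this sketch they would simply be set aside for manual review in Step 2.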
Core proteins and other associated proteins annotated with GO terms related to the BR signaling pathway (blue)
Cullin-1, rubylated
• SCF complexes formed in response to auxin and jasmonate signaling
• Link to ChEBI for small-molecule-containing complexes