Research in Semantics and Services Science
October 26, 2006
Knowledge Enabled Information and Services Science (kno.e.sis)

Where do we fit in Computer Science?
• Semantic Web
• Service Oriented Computing
• Business Process Management
• Data Management and Mining
• Bioinformatics

What capabilities do we have?
• ontology management and multi-ontology environments
• integration and analysis of heterogeneous data (structured, semi-structured, unstructured)
• advanced and intelligent search, browsing, querying, mining, analysis and knowledge discovery
• semantic annotation of documents, scientific data and services, involving entity and relationship extraction/disambiguation
• semantic enhancement of Web 2.0, including social search and lightweight services
• semantic middleware and semantics-enabled networking
• semantic Web services and processes, including semantics-based publication, discovery, composition and dynamic binding of services

Where are our application areas?
• e-Science
• Web-based Information Management
• bioinformatics, biomedicine, health care
• search and business intelligence
• National and Homeland Security
• intelligence analysis

Overview
• Semantic Web Services and Processes (Meteor-S)
• Entity/Relationship Extraction, Disambiguation and Annotation (Semantic Middleware)
• Semantic Analytics (SemDis)
• Semantics for Life Sciences (Bioinformatics for Glycan Expression)

Semantic Web Services and Processes

Challenges
Business/Organizational Challenges
• How to effectively create new business solutions using a global workforce
• How to make IT more responsive to business strategy
Technical/Tactical Challenges
• How to add more dynamism in business process creation
• How to make processes adapt to changing environments

“Each enterprise will measure and aspire to its own unique level of dynamism based on its individual purpose. It is about being nimble and adaptable. A fully integrated business platform can respond faster, and completely, to change. Whether it involves fulfilling a new mandate or embracing a new market opportunity. Some organizations will push the envelope, automating event-triggered responses for highly integrated closed-loop processes, setting the stage for self-optimizing systems.”
Sandra Rogers, White Paper: Business Forces Driving Adoption of Service Oriented Architecture, Sponsored by: SAP AG

Ontologies to Describe Service Semantics (ontologies are about agreements)
[Figure: agreement space. Recoverable axis labels: Technical Aspect of Agreement (Data/Info., Functional, Non Functional, Execution); Scope of Agreement (Common Sense, Gen. Purpose, Domain, Industry, Task/App); People, Organization.]

Autonomic Web Process*
• Self Healing
• Agile
• Self Optimizing
• Self Configuring
Layers
• Strategy Layer (Corporate Strategy and Goals)
• Operational Layer (Modeling Business Process to provide business services)
• IT Layer (SOA Based IT Processes and Services)
• Implementation Layer (Databases, OS, etc.)
Example requirements
• Requirement: Only provide customer support to gold customers
• Requirement: If cost > $$$$, customer = gold
*it’s about the business, not just computing resources

Broad Based Semantics for Technical Services
• Data/Information Semantics
  - What: (semi-)formal definition of data in input and output messages of a web service
  - Why: for discovery and interoperability
  - How: by annotating input/output data of web services using ontologies
• Functional Semantics
  - (Semi-)formally representing capabilities of a web service for discovery and composition of Web Services
  - By annotating operations of Web Services as well as providing preconditions and effects
• Execution Semantics
  - (Semi-)formally representing the execution or flow of services in a process, or of operations in a service
  - For analysis (verification), validation (simulation) and execution (exception handling) of the process models
  - Using State Machines, Petri nets, activity diagrams etc.
• Non Functional Semantics (WS-*)
  - (Semi-)formally represent qualitative and quantitative measures of a Web process
  - Non-quantitative includes security, transactions
  - Quantitative includes cost, time etc.
  - Business constraints and inter-service dependencies (domain and application ontologies)

Semantics for Technical Services
• Development / Description / Annotation: WSDL, WSDL-S, SAWSDL, WSMO, OWL-S; METEOR-S (MWSAF)
• Publication / Discovery: (Semantic) UDDI; METEOR-S (MWSDI)
• Composition, Configuration and Negotiation: BPEL, WSAgreement, WSPolicy; METEOR-S (MWSCF)
• Execution, Adaptation and Mediation: BPWS4J, activeBPEL, WSMX; METEOR-S

Dynamic Process Configuration
• Operations Research has been used in industry for business process optimization
• There is often a lot of domain knowledge in business process optimization
  - In the minds of analysts/experts
  - Hidden in databases/texts
• We try to explicitly capture domain knowledge and link it with IT systems

Dynamic Process Configuration
• Find optimal partners for the process based on process constraints: cost, supply time, etc.
• Conceptual Approach
  1. Create a framework to capture and represent domain knowledge
  2. Represent constraints on the domain knowledge
  3. Ability to reason on the constraints and configure the process

Dynamic Process Configuration: Research Challenges
• Capturing functional and non-functional requirements of the Web process (abstract process specification)
• Discovering service partners based on functional requirements (Semantic Web service discovery)
• Choosing optimal partners that satisfy non-functional requirements (constraint analysis)
K. Verma, R. Akkiraju, R. Goodwin, P. Doshi, J. Lee, On Accommodating Inter Service Dependencies in Web Process Flow, AAAI Spring Symposium on Semantic Web Services, 2004.
R. Aggarwal, K. Verma, J. A. Miller, Constraint Driven Composition in METEOR-S, SCC 2004.
K. Verma, K. Gomadam, J. Miller and A. Sheth, Configuration and Execution of Dynamic Web Processes, LSDIS Lab Technical Report, 2005.

Process Adaptation
• Ability to adapt processes to failures and unexpected events
• Two kinds of failures
  - Failures of physical components like services, processes, network: can replace services using dynamic configuration
  - Logical failures like violation of SLA constraints/agreements, such as delay in delivery or partial fulfillment of an order: need additional decision making capabilities
K. Verma, A. Sheth, Autonomic Web Processes, ICSOC 2005.
K. Verma, P. Doshi, K. Gomadam, A. Sheth, J. Miller, Optimal Adaptation of Web Processes with Coordination Constraints, ICWS 2006.
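The constraint analysis step of dynamic process configuration described earlier (choosing partners that satisfy cost and supply-time constraints) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Offer` record, the supplier names and the numbers are hypothetical, and exhaustive search stands in for whatever constraint solver the cited work uses. The same routine could be rerun over the remaining offers when a service has to be replaced.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Offer:
    """One candidate partner for one task in the process (hypothetical record)."""
    supplier: str
    cost: float
    supply_days: int


def configure(offers: Dict[str, List[Offer]],
              max_cost: float,
              max_days: int) -> Optional[Tuple[Offer, ...]]:
    """Pick one offer per task so that the process constraints hold.

    Constraints (assumed): total cost <= max_cost and the slowest supplier
    delivers within max_days. Among feasible combinations, the cheapest wins.
    """
    if not offers:
        return None
    best, best_cost = None, float("inf")
    for combo in product(*offers.values()):  # every way to pick one offer per task
        total_cost = sum(o.cost for o in combo)
        slowest = max(o.supply_days for o in combo)
        if total_cost <= max_cost and slowest <= max_days and total_cost < best_cost:
            best, best_cost = combo, total_cost
    return best


# Hypothetical offers for a two-task process.
OFFERS = {
    "motherboard": [Offer("A", 100, 7), Offer("B", 120, 3)],
    "ram": [Offer("C", 40, 2), Offer("D", 30, 6)],
}

chosen = configure(OFFERS, max_cost=170, max_days=5)
print([o.supplier for o in chosen] if chosen else "no feasible configuration")
```

Exhaustive search is only workable for a handful of tasks and offers; it is used here because it makes the constraints easy to read, not because it scales.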
Process Adaptation: Research Challenges
• Creating a model to recover from failures and handle future events
• Model must deal with two important factors
  - Uncertainty about when a failure occurs
  - Cost based recovery
• Scenario
  - After orders for MB and RAM are placed, they may get delayed
  - The manufacturer may have severe costs if assembly is halted
  - It must evaluate whether it is cheaper to cancel/return and reorder, or to take the penalty of delay
  - Caveat: it is possible that reordered goods may be delayed too
• Proposed Solution
  - Modeling decision making capabilities of Service Managers as Markov Decision Processes (MDPs)

SWAPS: Use of Semantics in Agreement Matching
• An agreement is a collection of alternatives: A = {Alt1, Alt2, …, AltN}
• An alternative is a collection of guarantees: Alt = {G1, G2, …, GN}
• A guarantee is defined as a collection: G = {Scope, Obligated, SLO, Qualifying Condition, Business Value}
• There is a potential match between provider and consumer alternatives if, for each requirement of one alternative, there is a capability in the other alternative which has the same scope and the same obligation, and the SLO of the capability satisfies the request
Definitions
• “requirement(Alt, G)” returns true if G is a requirement of Alt
• “capability(Alt, G)” returns true if G is an assurance of Alt
• “scope(G)” returns the scope of G
• “obligation(G)” returns the obligated party of G
• “satisfies(Gj, Gi)” returns true if the SLO of Gj is equivalent to or stronger than the SLO of Gi
An alternative Alt1 is a suitable match for Alt2 if:
  (∀ Gi) such that Gi ∈ Alt1 ∧ requirement(Alt1, Gi),
  (∃ Gj) such that Gj ∈ Alt2 ∧ capability(Alt2, Gj) ∧ scope(Gi) = scope(Gj) ∧ obligation(Gi) = obligation(Gj) ∧ satisfies(Gj, Gi)

WS-Agreement Definition and Ontology
• An agreement consists of a collection of guarantee terms
• A guarantee term has a scope, e.g. an operation of a service
• A guarantee term may have a collection of service level objectives, e.g. responseTime < 2 seconds
• A guarantee term may have a qualifying condition for the SLOs to hold, e.g. numRequests < 100
• There might be business values associated with each guarantee term. Business values include importance, confidence, penalty, and reward, e.g. Penalty 5 USD
[Figure: OWL ontology for WS-Agreement. Recoverable classes: GuaranteeTerm, Scope, ServiceLevelObjective, Qualifying Condition, BusinessValue, Predicate, Parameter, Unit, Value, ValueExpression, ValueUnit, Reward, Penalty, Importance, Assessment Interval, TimeInterval, Count. Recoverable properties: hasGuaranteeTerm, hasScope, hasObjective, hasCondition, hasBusinessValue, hasReward, hasPenalty, hasImportance.]
Agreement represented as an instance of the ontology

Semantic Middleware

Semantic Middleware
• Investigating fundamental issues in entity/relationship extraction, disambiguation (matching & mapping) and annotation
• Three fundamental steps. Semantic tagging of resources (simplest form):
  - Entity identification
  - Entity disambiguation
  - Annotation
[Figure: annotation pipeline. Recoverable labels: Documents to annotate; World Model; Knowledge Base; Lexical Analysis, Natural Language Processing, additional linguistic resources: thesaurus, dictionary (synonyms, common variations); Entity Identification / Metadata Creation; decision "Multiple matches found during lookup?" (YES / NO); Entity Disambiguation; Semantic Annotation of selected documents; Annotated Documents.]

Semantic Annotation
Entities in a drug advisory annotated with concepts and relationships from a Drug Ontology
[Figure: Excerpt of Drug Ontology]
Sample Created Metadata
  <Entity id="122805" class="DrugOntology#prescription_drug_brandname">
    Bextra
    <Relationship id="442134" class="DrugOntology#has_interaction">
      <Entity id="14280" class="DrugOntology#interaction_with_physical_condition">sulfa allergy</Entity>
    </Relationship>
  </Entity>

Disambiguation
Functionality
• Merging two databases / ontologies: multiple references pointing to the same logical entity
• Adding new instances to an ontology: a similar entity already exists and has to be merged with the new one
• Example: merging person instances recorded in a government ontology and an incoming choice point person entity
Challenges
• Varying information content in entities
• Differences in schema
• Variations in representation: use of abbreviations, misspellings, different naming conventions, representation formats changing over time etc.
• Insufficient information while merging two entities
• Exploiting relationships and other/previous reconciliation decisions

Conflicting instances
  Schema (Person)           Instance 1     Instance 2
  Name                      Tim Robins     Timothy Wallace Robinson
  SSN                       889889889      889889889
  TelNumber                 7065434567     7062123443
  FirstName                 Tim            Timothy
  MiddleName                (none)         Wallace
  LastName                  Robins         Robinson
  Generation                (none)         (none)
  Marital Status            Single         Married
  Applicant                 (none)         (none)
  dependent of              (none)         (none)
  spouse of                 (none)         person12332
  works for                 People Soft    Oracle
  affiliated with           (none)         (none)
  foreign influence event   event7823      event099
  address                   place23        place23
Callouts on the table
• Nature of an attribute indicates its relative importance: SSN is given a high weight in disambiguating person entities
• String similarity metrics
• Recognized as a time sensitive attribute
• Reconciling Oracle and PeopleSoft indicates the two person entities work for the same organization

Application: Disambiguating entities from two domains
• DBLP vs. DBLP
• FOAF vs. FOAF
• DBLP vs. FOAF
[Figure: the two schemas, with rdfs:literal as the value of each attribute. FOAF: foaf:Person with label, foaf:mbox, foaf:mbox_sha1sum, foaf:schoolpage, foaf:workplacepage, foaf:homepage, foaf:surname, foaf:firstName, foaf:nickName, foaf:depiction, foaf:knows. DBLP: dblp:Researcher with dblp:has_label, dblp:has_homepage, dblp:has_no_of_co_authors, dblp:has_no_of_publications, dblp:has_coauthor, dblp:has_iswcLocation, dblp:has_iswc_type, dblp:has_iswc_affiliation.]

Exploiting relationships and propagating reconciliation decisions
• Syntactic matches (string similarity)
• Attribute weights
• Relationship with other entities
  - Presupposition: some coauthors could also be your friends
• Propagating decisions
Reference reconciliation in complex information spaces, X Dong, A Halevy, J Madhavan - Proceedings of the 2005 ACM SIGMOD international conference

Syntactic matches
[Figure: running example. A DBLP Researcher with label "Amit P Sheth", homepage http://lsdis.cs.uga.edu/~amit, mbox_shasum 9c1dfd993ad7d1852e80ef8c87fac30e10776c0c, title Professor, iswc_affiliation UGA and a list of coauthors, next to a FOAF Person with label "Amit Sheth", the same mbox_shasum and homepage, workplace homepage and a list of friends. Other recoverable values: http://www.informatik.uni-trier.de/~ley/db/indices/a-tree/s/Sheth:Amit_P=.html, http://www.semagix.com, http://lsdis.cs.uga.edu. Coauthor and friend names shown: Marek Rusinkiewicz, Carole Goble, Steffen Staab, Ramesh Jain, John Miller, John A. Miller.]

Attribute weights
The uniqueness property of the mailbox and homepage values gives those attributes more weight
[Figure: same DBLP/FOAF example]

Relationship with other entities
A coauthor who is also a friend
[Figure: same DBLP/FOAF example]

Propagating decisions
If John Miller and John A. Miller are found to be the same entity, there is more support for reconciliation of the entities Amit P. Sheth and Amit Sheth (based on our presupposition that some coauthors could also be your friends)
[Figure: same DBLP/FOAF example]

Results / Evaluation
• Attributes: weights and thresholds
• Properties of the dataset and results

Semantic Analytics
• NSF funded Semantic Discovery: Discovering Complex Relationships in the Semantic Web (SemDis)
• 4 faculty members, 7 PhD students
• http://lsdis.cs.uga.edu/semdis

Semantic Discovery
From ….. Finding things
To ….. Finding out about things
Relationships!

Semantic Discovery (SemDis) Overview
How is entity 1 (Reviewer) related to entity 7 (Submission)?
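A question of this kind amounts to enumerating paths between two resources in an RDF graph. The following Python sketch shows one simple way to do that with a hop limit. The triples are a hypothetical toy graph that only reuses the entity labels from the slide; they are not a reconstruction of the SemDis figure, and the function names are illustrative rather than part of any SemDis API.

```python
from collections import defaultdict

# Hypothetical toy instance base (subject, predicate, object).
TRIPLES = [
    ("E1:Reviewer", "author_of", "E2:Paper"),
    ("E6:Person", "author_of", "E2:Paper"),
    ("E6:Person", "author_of", "E7:Submission"),
    ("E1:Reviewer", "knows", "E3:Person"),
    ("E3:Person", "knows", "E5:Person"),
    ("E5:Person", "author_of", "E7:Submission"),
]


def build_index(triples):
    """Adjacency index; each edge is also stored reversed, marked with ^-1."""
    adj = defaultdict(list)
    for s, p, o in triples:
        adj[s].append((p, o))
        adj[o].append((p + "^-1", s))
    return adj


def associations(adj, start, goal, max_hops):
    """Return all simple paths from start to goal using at most max_hops edges."""
    found = []

    def walk(node, path, visited):
        if node == goal:
            found.append(list(path))
            return
        if len(path) == max_hops:
            return
        for pred, nxt in adj[node]:
            if nxt not in visited:  # keep paths simple (no repeated nodes)
                visited.add(nxt)
                path.append((pred, nxt))
                walk(nxt, path, visited)
                path.pop()
                visited.discard(nxt)

    walk(start, [], {start})
    return sorted(found, key=len)  # shortest associations first


paths = associations(build_index(TRIPLES), "E1:Reviewer", "E7:Submission", max_hops=3)
for path in paths:
    print("E1:Reviewer", *[f"-{pred}-> {node}" for pred, node in path])
```

On this toy graph the reviewer is connected to the submission through a shared co-author and through a chain of acquaintances. Sorting by length is only a placeholder for a real ranking step.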
[Figure: instance graph over E1:Reviewer, E2:Paper, E3:Person, E4:Paper, E5:Person, E6:Person and E7:Submission, connected by author_of and knows edges.]
[Figure: SemDis architecture. Sources (Text, XML, HTML, RDMS) and Ontology Schema(s) feed an Aggregated RDF Instance Base; the User reaches it through Semantic Analytics: Semantic Association Discovery and Ranking, Subgraph Discovery, Browsing.]

Semantic Associations: Concepts and Definitions
• Semantic Connectivity
  [Figure: resources &r1, &r5 and &r6 with names “Matt”, “Perry”, “LSDIS Lab” and “The University of Georgia”, marked Semantically Connected.]
• Semantic Similarity
  [Figure: Passenger, Ticket and Corporate Account instances (&r1, &r2, &r3 and &r7, &r8, &r9; “Bill”, “Fred”, “Smith”, “Jones”) linked by purchased, paidby and lname, marked Semantically Similar.]

Battling Information Overload
• Enumeration and Ranking
• Subgraph Discovery

Ranking Semantic Associations [5]
Association Rank combines:
• Association Length
• Rarity
• Subsumption (e.g. Organization, Political Organization, Democratic Political Organization)
• Context
• Trust
• Popularity
[5] Boanerges Aleman-Meza, Christian Halaschek-Wiener, Budak Arpinar, Cartic Ramakrishnan, and Amit Sheth. “Ranking Complex Relationships on the Semantic Web”, IEEE Internet Computing, 9(3), 37-44. 2005.

Ranking Semantic Associations (SemRank)
• Modulative Ranking
• Relevance: Search Mode + Predictability
• Refraction Count: How varied is the result from what is expected from the schema?
• Information Gain: How much information does a user gain by being informed about a result?
• S-Match: Best semantic match with user need (if provided)
[Figure: adjustable search mode, ranging from (Low Information Gain, Low Refraction Count, High S-Match) to (High Information Gain, High Refraction Count, High S-Match).]
Kemafor Anyanwu, Angela Maduko, Amit Sheth, SemRank: Ranking Complex Relationship Search Results on the Semantic Web, The 14th International World Wide Web Conference (WWW2005), Chiba, Japan, May 10-14, 2005.

Subgraph Discovery
• Idea is to summarize the important connections between two resources
• Tied with visualization (what is the best set of associations that can be visually comprehended at one time?)
• Given: RDF Graph, Budget b
• Find: The best set of associations which pass through at most b different nodes
Cartic Ramakrishnan, William Milnor, Matthew Perry, Amit Sheth. "Discovering Informative Connection Subgraphs in Multi-relational Graphs", SIGKDD Explorations Special Issue on Link Mining, Volume 7, Issue 2, December 2005.
[Figure: example connection subgraph. Nodes: 1988 Democratic Natl Conv, 1992 Democratic Natl Conv, 2000 Democratic Natl Conv, 2004 Republican Natl Conv, 1992 Natl Presidential Election, Bill Clinton, Edward Kennedy, Maria Shriver, Zell Miller, George H W Bush, Arnold Schwarzenegger, Council of Physical Fitness. Edge labels: spoke_at, nominated_at, won, lost, relative_of, spouse_of, started, leader_of.]

Subgraph Discovery Approach
• Heuristic algorithm
• Puts weights on the edges based on semantics of node and edge types and on structural properties of the graph
• Models the graph as an electrical circuit (weights are conductance)
• Uses a greedy algorithm to maximize current flow and minimize the number of nodes

Spatiotemporal and Thematic Semantic Analytics
Matthew Perry, Farshad Hakimpour, Amit Sheth. "Analyzing Theme, Space and Time: An Ontology-based Approach", Fourteenth International Symposium on Advances in Geographic Information Systems (ACMGIS '06), Arlington, VA, November 10 - 11, 2006.

From thematic analytics to spatio-temporal, thematic (STT) analytics (Ex: Bioterrorism)
[Figure: example graph. Entities: E1:Soldier, E2:Symptom, E3:Disease, E4:Chemical, E5:Terrorist, E6:Attack, E7:Platoon, E8:Soldier, E9:Base, E10:Doctor, E11:Location, E12:Platoon, E13:Soldier, E14:Battle. Relationships: assigned_to, stationed_at, member_of, participated_in, spotted_at, exhibits, sign_of, causes, used_in, carried_out, several carrying time intervals such as [0, 2], [0, 10], [3, 5], [4, 6], [8, 10]. Annotations: "Spotted Before and Close in Time", "After the Battle", "Near in Space".]

Proposed Model: 3 Dimensions (Thematic, Geospatial, Temporal)
[Figure: upper-level model. Dynamic Entity, Named Place and Event, with located_at [ti : tj] and occurred_at [ti : tj]; subClassOf (isA) links to arbitrary user-defined classes and relationships; [ti : tj] is the time interval of a relationship; Named Place has a Footprint with a Spatial Geometry Representation; spatial relationships part_of, contains, overlaps, adjacent_to.]

Thematic Context for Spatial Extent
• Spatial extent of non-spatial entities is derived from thematic context
• Context: path expression connecting dynamic entity type to static entity type / event
• Example Context: Residency of Co-Workers: works_for.works_for.lives_at
• Spatial extent in context of employment and in context of residency
[Figure: Dynamic Entities Bill Allen and Fred Smith (Works For University of Georgia) linked by Lives At to Named Places 15 Spring Street and 150 Elm Street, which map to points (x1, y1), (x2, y2), (x3, y3) in a Georeferenced Coordinate Space.]

Queries based on Spatiotemporal Contexts
Query types
• Basic ST Query
• ST Range Query
• ST Behavior Query
• ST Relationship Query
Example queries
• When was the 3rd Armored Division within Iraq?
• Where were bombing targets of the US Air Force in April 2003?
• How did the distribution of US airstrips in Iraq change during March 2003?
• Show the dates and locations of battles of the 101st Airborne Division
• How does the battle pattern of the 3rd Armored Division compare to the pattern of the 1st Armored Division?
• When and where were the 101st Airborne and the 82nd Airborne likely to have interacted?

Spatiotemporal Semantic Associations
• Define setting as a region of space in combination with an interval of time
• How is entity X related to spatial setting S? ρ(entity, setting)
• How is Group 1 connected to the setting of the expected attack?
[Figure: Group 1, Account_1234, Fred, Jim, 125 Broad Street, Attack Site]

Spatiotemporal Semantic Associations
• How are entity X and entity Y related w.r.t. spatial setting S? ρ(entity, entity, setting)
• How are Group 1 and Group 2 connected with respect to the attack site?
[Figure: Group 1, Group 2, Account_1234, Fred, Jim, 125 Broad Street]

Spatiotemporal Semantic Associations
• Idea of Virtual Links between entities based on spatiotemporal information
• Possible definition of rules to define a virtual link type
  - Collaboration: entity X and Y are in close ST proximity more often than a given threshold
  - Knows: entity X and Y are in close ST proximity regularly

Other Aspects
• How do temporal relationships affect association semantics
  - 2 works_for relationships (overlapping times, disjoint times, etc.)
• Complex queries based on all 3 dimensions
  - Which location is the most likely storage facility for exfiltrated weapon material?
  - Thematic (correct capabilities, linked to correct people)
  - Spatial (where was the material last seen)
  - Temporal (how long can the material stay out of storage)

REmBRANDTS: Retrieval, Browsing, Analytics and knowledge Discovery from Text using Semantics
Cartic Ramakrishnan
LSDIS Lab, University of Georgia, Athens, GA
SEMANTICS, SEMANTICS ….. SEMANTICS

Overview
[Figure: UMLS schema level with Biologically active substance, Lipid and Disease or Syndrome linked by affects, complicates and causes; at the MeSH level, Fish Oils and Raynaud’s Disease are each an instance_of a UMLS class, and the relationship between them is unknown (???????); PubMed document counts shown: 9284 documents, 5 documents, 4733 documents.]

About the data used
• UMLS: a high level schema of the biomedical domain
  - 136 classes and 49 relationships
  - Synonyms of all relationships, using variant lookup (tools from NLM), e.g. T147: effect, induce, etiology, cause, effecting, induced
• MeSH
  - Terms already asserted as instance of one or more classes in UMLS
• PubMed
  - Abstracts annotated with one or more MeSH terms

Method: Parse Sentences in PubMed
• SS-Tagger (University of Tokyo)
• SS-Parser (University of Tokyo)
  (TOP (S (NP (NP (DT An) (JJ excessive) (ADJP (JJ endogenous) (CC or) (JJ exogenous)) (NN stimulation)) (PP (IN by) (NP (NN estrogen))))
          (VP (VBZ induces) (NP (NP (JJ adenomatous) (NN hyperplasia)) (PP (IN of) (NP (DT the) (NN endometrium)))))))

Method: Identify entities and Relationships in Parse Tree
• Modifiers
• Modified entities
• Composite Entities

Result of Extraction
• Semantic Metadata represented in RDF
  - With complex entities and relationships connecting them
• Provenance of extracted facts
  - Pointers to original document and sentence
• Current results
  - ~2MB RDF for Migraine Magnesium subset of PubMed
  - ~150MB RDF for all documents pertaining to Neoplasms subtree of MeSH

Use of Generated Semantic Metadata
• Semantic Browsing of PubMed based on named relationships between MeSH terms
• Corpus-based hypothesis validation
  - Path/hypothesis based document retrieval
• Knowledge discovery from literature
  - Corpus-based complex relationship discovery and ranking
  - Corpus-based relevant connection subgraph discovery

Corpus-based Hypothesis Validation
[Figure: a Complex Query over Magnesium, Migraine, Stress, Patient and Calcium Channel Blockers (relationships affectedBy, inhibit, isa) is run against PubMed; supporting document sets are retrieved.]

Discovering Complex Relationships
[Figure: unknown connections (?) among Stress, Migraine, Magnesium, Calcium Channel Blockers and Cortical Spreading Depression, drawn from PubMed.]
• Possibly thousands of paths
• Need corpus-based relevance model for paths and subgraphs

Discovering Maximally Relevant Connection Subgraphs
[Figure: a connection subgraph between Migraine and Magnesium over PubMed; the subgraph with maximal support is highlighted.]

Computing Semantic Associations
Graph-based, Main-Memory RDF Processing

BRAHMS: Design Goals
• Offer high performance for basic operations used in graph traversal algorithms
• Capable of handling big ontologies (100s of Mbytes to many Gbytes)
• Handle RDF / RDFS
• Distinguish between schema and instance level
• Provide a framework for testing different semantic association discovery algorithms
Maciej Janik, Krys Kochut, "BRAHMS: A WorkBench RDF Store And High Performance Memory System for Semantic Association Discovery", in the Proceedings of the 4th International Semantic Web Conference (ISWC2005), November 2005, Galway, Ireland, pp. 431-445.

BRAHMS
• Performance requirements
  - use main memory for storage: fastest access
  - create indexes for operations used in graph traversal algorithms
  - use C/C++ in implementation instead of Java
• Design decisions
  - compact knowledge base to minimize memory usage
  - no memory fragmentation: use contiguous memory blocks
  - make it read-only
  - create snapshot of memory structures for fast start-up (parse once, use many times)

BRAHMS Results
• Speed
  - outperforms Sesame, Jena and Redland in k-hop limited semantic association searches using a main-memory RDF model
  - big impact on large datasets, where other datastores either perform slowly or cannot execute the algorithm at all
• Handling datasets
  - size limited by main memory (physical) and/or system (32 vs. 64 bit)
  - able to efficiently run algorithms on large datasets that other RDF stores cannot handle using a memory model
  - tested: SWETO [255 MB], Lehigh University Univ(50, 0) [556 MB], synthetic [9 GB] (64-bit machine)

Future of BRAHMS
• SPARQL
  - most of the functionality is currently implemented over BRAHMS
  - create querying extension for regular expressions on graphs
• Distributed storage [4] (current work)
  - handle very large datasets (10s of GB+) partitioned to a cluster of computers
  - efficient distributed SPARQL query model and implementation
[4] Matthew Perry, Maciej Janik, Cartic Ramakrishnan, Conrad Ibanez, Ismailcem Budak Arpinar, Amit Sheth. "Peer-to-Peer Discovery of Semantic Associations", Second International Workshop on Peer-to-Peer Knowledge Management, San Diego, CA, July 17, 2005.

Semantics for Life Sciences Applications

Semantic Bioinformatics in Glycoproteomics
Acknowledgement: NCRR funded Bioinformatics of Glycan Expression, collaborators, partners at CCRC (Dr. William S. York) and Christopher Thomas, Cory Henson, Prateek Jain

Outline
• Semantic integration of large distributed data sources
• Glycoproteomics ontologies
• Services architecture based biological resources
• GLYDE: XML-based representation standard

Integrated Semantic Information and knowledge System (Isis)
• “Have I performed an error? Give me all result files from a similar organism, cell, preparation, mass spectrometric conditions and compare results.”
• “Is the result erroneous? Give me all result files from a similar organism, cell, preparation, mass spectrometric conditions and compare results.”
[Figure: Isis architecture. Recoverable labels: SPARQL query-based User Interface, ProPreO ontology, Experimental Data, Semantic Annotation, Semantic Metadata Registry, Metadata File.]
[Figure, continued: PROTEOMECOMMONS EXPERIMENTAL DATA and PROTEOMICS WORKFLOW. Data artifacts: Raw, mzXML, Pkl, pSplit, MASCOT result, ProVault result. Services: Raw2mzXML, mzXML2Pkl, Pkl2pSplit, MASCOT Search, ProVault.]

N-Glycosylation Process (NGP)
[Figure: experimental workflow. Recoverable steps and artifacts, in order of appearance: Cell Culture; extract; Glycoprotein Fraction; proteolysis; Glycopeptides Fraction; Separation technique I; n Glycopeptides Fractions; PNGase; n Peptide Fractions; Separation technique II; n*m Peptide Fractions; Mass spectrometry; ms data and ms/ms data; Data reduction; ms peaklist and ms/ms peaklist; binning; N-dimensional array; Signal integration; Glycopeptide identification and quantification; Peptide identification; Peptide list; Data correlation.]

Ontologies
• Glyco
  • An ontology for structure and function of Glycopeptides
  • 573 classes, 113 relationships
  • Published through the National Center for Biomedical Ontology (NCBO)
• ProPreO
  • An ontology for capturing process and lifecycle information related to proteomic experiments
  • 398 classes, 32 relationships
  • 3.1 million instances
  • Published through the National Center for Biomedical Ontology (NCBO) and Open Biomedical Ontologies (OBO)

Zooming in a little …
[Figure: ontology excerpt showing Reaction R05987, catalyzed by enzyme 2.4.1.145, with the relationship adds_glycosyl_residue to N-glycan_b-D-GlcpNAc_13.]
• The N-Glycan with KEGG ID 00015 is the substrate to the reaction R05987, which is catalyzed by an enzyme of the class EC 2.4.1.145.
• The product of this reaction is the Glycan with KEGG ID 00020.
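The statements about reaction R05987 can be written as subject-predicate-object triples and queried by pattern matching. The Python sketch below is an illustration only: the identifier prefixes and the predicate names (`catalyzed_by`, `has_substrate`, `has_product`) are assumptions chosen for readability and are not taken from the Glyco ontology vocabulary.

```python
# The three facts from the slide as (subject, predicate, object) triples.
# Prefixes and predicate names are hypothetical.
TRIPLES = {
    ("reaction:R05987", "catalyzed_by", "enzyme_class:EC_2.4.1.145"),
    ("reaction:R05987", "has_substrate", "glycan:KEGG_00015"),
    ("reaction:R05987", "has_product", "glycan:KEGG_00020"),
}


def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def products_of_enzyme(triples, enzyme):
    """Two-step query: reactions catalyzed by the enzyme, then their products."""
    reactions = [s for s, _, _ in match(triples, p="catalyzed_by", o=enzyme)]
    return sorted(
        o for r in reactions for _, _, o in match(triples, s=r, p="has_product")
    )


print(products_of_enzyme(TRIPLES, "enzyme_class:EC_2.4.1.145"))
```

A query such as "which glycans can an enzyme of class EC 2.4.1.145 produce?" then becomes a short chain of pattern matches. That is the kind of question an ontology populated with such instances is meant to answer.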
Semantic annotation of Scientific Data
  <ms/ms_peak_list>
    <parameter instrument="micromass_QTOF_2_quadropole_time_of_flight_mass_spectrometer" mode="ms/ms"/>
    <parent_ion_mass>830.9570</parent_ion_mass>
    <total_abundance>194.9604</total_abundance>
    <z>2</z>
    <mass_spec_peak m/z = 580.2985 abundance = 0.3592/>
    <mass_spec_peak m/z = 688.3214 abundance = 0.2526/>
    <mass_spec_peak m/z = 779.4759 abundance = 38.4939/>
    <mass_spec_peak m/z = 784.3607 abundance = 21.7736/>
    <mass_spec_peak m/z = 1543.7476 abundance = 1.3822/>
    <mass_spec_peak m/z = 1544.7595 abundance = 2.9977/>
    <mass_spec_peak m/z = 1562.8113 abundance = 37.4790/>
    <mass_spec_peak m/z = 1660.7776 abundance = 476.5043/>
  </ms/ms_peak_list>
Annotated ms/ms peaklist data

Semantic Biological Web Service Registry

Semantic Web Service

GLYDE-CT: GLYcan Data Exchange Based on a Connection Table Format
  <?xml version="1.0" encoding="ISO-8859-1"?>
  <!DOCTYPE GlydeCT SYSTEM "http://glycomics.ccrc.uga.edu/GLYDE-CT/GLYDE-CT_v2.11.DTD">
  <GlydeCT xmlns:GlydeCT="http://glycomics.ccrc.uga.edu/GLYDE-CT/GLYDE-CT_v2.11">
    <structure type="molecule" id="molecule_1" name="GP1">
      <part type="moiety" id="moiety_1" ref="some_file#GNGS" name="GNGS"/>
      <part type="moiety" id="moiety_2" ref="some_file#Man3" name="Man3GlcNAc2"/>
      <link from="moiety_2" to="moiety_1">
        <link from="residue_1" to="residue_2">
          <link from="C1" to="N4"/>
        </link>
      </link>
    </structure>
  </GlydeCT>
[Figure: moiety_2 (a glycan with residues numbered 1 to 5) linked to moiety_1, the peptide Gly 1 | Asn 2 | Gly 3 | Ser 4.]

GLYDE-CT: Collaborative GlycoInformatics
Evolving collaboration between:
• LSDIS/CCRC: Will York, Amit Sheth, Michael Pierce
• EUROCarbDB (German Cancer Research Center): Willi von der Lieth
• Consortium for Functional Glycomics (CFG): Rahul Raman, Ram Sasisekharan, Thomas Lütteke
• N.D. Zelinsky Institute of Organic Chemistry (Moscow): Yuriy Knirel
• Mitsui Knowledge Industry (Japan): Hisashi Narimatsu, Norihiro Kikuchi
• Kyoto Encyclopedia of Genes and Genomes (KEGG): Minoru Kanehisa, Kiyoko F. Aoki-Kinoshita
• Palo Alto Research Center (PARC): David Goldberg

Moving Forward
• Dr. Amit Sheth will take up his new position as LexisNexis Eminent Scholar at Wright State University starting January 2, 2007
• Dr. Sheth, along with 10 of his Ph.D. students and existing and newly-selected faculty at WSU, will form the kno.e.sis (Knowledge Enabled Information and Services Science) lab
• Collaborative research will continue between the newly-formed kno.e.sis lab at WSU and the remaining members of LSDIS at UGA

http://lsdis.cs.uga.edu
http://knoesis.org
http://lsdis.cs.uga.edu/projects/asdoc/
http://lsdis.cs.uga.edu/projects/glycomics/