Semantic Web: Promising Technologies, Current Applications & Future Directions Invited and Colloquia talks at: Swinburne Institute of Technology –Melbourne (July 18), University of Adelaide-Adelaide (July 23), University of Melbourne- Melbourne (July 31), Victoria University- Melbourne Australia, 2008 Amit P. Sheth amit.sheth@wright.edu Kno.e.sis Center, Comp. Sc & Engg Wright State University, Dayton OH, USA Thanks Kno.e.sis team and collaborators Knowledge Enabled Information and Services Science Outline • Semantic Web –key capabilities and technlologies • Real-world Applications demonstrating benefit of semantic web technologies • Exciting on-going research Knowledge Enabled Information and Services Science Evolution of the Web Web as an oracle / assistant / partner - “ask the Web”: using semantics to leverage text + data + services 2007 - Powerset Web of people - social networks, user-created casual content - Twine, GeneRIF, Connotea Web of resources - data = service = data, mashups - ubiquitous computing 1997 Web of databases - dynamically generated pages - web query interfaces Web of pages - text, manually created links - extensive navigation Knowledge Enabled Information and Services Science 1 2 3 of Semantic Web Knowledge Enabled Information and Services Science 1 • Ontology: Agreement with a common vocabulary/nomenclature, conceptual models and domain Knowledge • Schema + Knowledge base • Agreement is what enables interoperability • Formal description - Machine processability is what leads to automation Knowledge Enabled Information and Services Science 2 • Semantic Annotation (Metadata Extraction): Associating meaning with data, or labeling data so it is more meaningful to the system and people. • Can be manual, semi-automatic (automatic with human verification), automatic. Knowledge Enabled Information and Services Science 3 • Reasoning/Computation: semantics enabled search, integration, answering complex queries, connections and analyses (paths, sub graphs), pattern finding, mining, hypothesis validation, discovery, visualization Knowledge Enabled Information and Services Science Different foci • TBL – focus on data: Data Web (“In a way, the Semantic Web is a bit like having all the databases out there as one big database.”) • Others focus on reasoning and intelligent processing Knowledge Enabled Information and Services Science SW Stack: Architecture, Standards Knowledge Enabled Information and Services Science From Syntax to Semantics Deep semantics Shallow semantics Knowledge Enabled Information and Services Science a little bit about ontologies Knowledge Enabled Information and Services Science Many Ontologies Available Today Open Biomedical Ontologies Open Biomedical Ontologies, http://obo.sourceforge.net/ Knowledge Enabled Information and Services Science From simple ontologies Knowledge Enabled Information and Services Science Drug Ontology Hierarchy (showing is-a relationships) non_drug_ reactant interaction_ property formulary_ property formulary indication monograph _ix_class prescription _drug_ property cpnum_ group property indication_ property brandname_ individual brandname_ undeclared prescription _drug_ brand_name brandname_ composite generic_ composite prescription _drug prescription _drug_ generic owl:thing interaction interaction_ with_prescri ption_drug generic_ individual Knowledge Enabled Information and Services Science interaction_ with_non_ drug_reactant interaction_ with_mono graph_ix_cl ass to complex ontologies Knowledge Enabled Information and Services Science N-Glycosylation metabolic pathway N-glycan_beta_GlcNAc_9 GNT-I attaches GlcNAc at position 2 N-acetyl-glucosaminyl_transferase_V N-glycan_alpha_man_4 GNT-V attaches GlcNAc at position 6 UDP-N-acetyl-D-glucosamine + alpha-D-Mannosyl-1,3-(R1)-beta-D-mannosyl-R2 <=> UDP + N-Acetyl-$beta-D-glucosaminyl-1,2-alpha-D-mannosyl-1,3-(R1)-beta-D-mannosyl-$R2 UDP-N-acetyl-D-glucosamine + G00020 <=> UDP + G00021 Knowledge Enabled Information and Services Science A little bit about semantic metadata extractions and annotations Knowledge Enabled Information and Services Science Extraction for Metadata Creation WWW, Enterprise Repositories Nexis UPI AP Feeds/ Documents Digital Videos ... ... Data Stores Digital Maps ... Digital Images Create/extract as much (semantics) metadata automatically as possible; Use ontlogies to improve and enhance extraction Digital Audios EXTRACTORS METADATA Knowledge Enabled Information and Services Science Automatic Semantic Metadata Extraction/Annotation Knowledge Enabled Information and Services Science Semantic Web in Action Supporting Clinical Decision Making Knowledge Enabled Information and Services Science 1. Supporting Clinical Decision Making • Status: In use today • Where: Athens Heart Center • What: Use of Semantic Web technologies for clinical decision support Knowledge Enabled Information and Services Science Operational Since January 2006 Knowledge Enabled Information and Services Science Active Semantic Electronic Medical Records (ASEMR) Goals: • Increase efficiency with decision support • formulary, billing, reimbursement • real time chart completion • automated linking with billing • Reduce Errors, Improve Patient Satisfaction & Reporting • drug interactions, allergy, insurance • Improve Profitability Technologies: • Ontologies, semantic annotations & rules • Service Oriented Architecture Thanks -- Dr. Agrawal, Dr. Wingeth, and others. ISWC2006 paper Knowledge Enabled Information and Services Science ASEMR - Demonstration Click to Launch Knowledge Enabled Information and Services Science Further Opportunity: Clinical and Biomedical Data binary text Scientific Literature Health Information Services PubMed 300 Documents Published Online each day Elsevier iConsult NCBI User-contributed Content (Informal) Public Datasets GeneRifs Genome, Protein DBs new sequences daily Clinical Data Personal health history Laboratory Data Lab tests, RTPCR, Mass spec Search, browsing, complex query, integration, workflow, analysis, hypothesis validation, decision support. Knowledge Enabled Information and Services Science Semantic Web in Action Querying Integrated Data Sources Knowledge Enabled Information and Services Science 2. Querying Integrated Data Sources • Status: Completed research • Where: NIH • What: Querying Integrated Data Sources – Enriching data with ontologies for integration, querying, and automation – Ontologies beyond vocabularies : the power of relationships Knowledge Enabled Information and Services Science Using Data to Test Hypothesis Link between glycosyltransferase activity and congenital muscular dystrophy? Gene name Interactions Glycosyltransferase GO gene Sequence PubMed OMIM Congenital muscular dystrophy Adapted from: Olivier Bodenreider, presentation at HCLS Workshop, WWW07 Knowledge Enabled Information and Services Science On the World Wide Web (GeneID: 9215) has_associated_disease Congenital muscular dystrophy, type 1D has_molecular_function Adapted from: Olivier Bodenreider, presentation at HCLS Workshop, WWW07 Knowledge Enabled Information and Services Science Acetylglucosaminyltransferase activity With the Semantically Enhanced Data SELECT DISTINCT ?t ?g ?d { ?t is_a GO:0016757 . GO:0016757 ?g has molecular functionglycosyltransferase ?t . ?g has_associated_phenotype ?b2 . isa ?b2 has_textual_description ?d . FILTER (?d, “muscular distrophy”, “i”) . GO:0008194 FILTER (?d, “congenital”,GO:0016758 “i”) } GO:0008375 acetylglucosaminyltransferase GO:0008375 acetylglucosaminyltransferase MIM:608840 Muscular dystrophy, congenital, type 1D has_molecular_function LARGE EG:9215 has_associated_phenotype From medinfo paper. Adapted from: Olivier Bodenreider, presentation at HCLS Workshop, WWW07 Knowledge Enabled Information and Services Science Semantic Web in Action Industry Examples Knowledge Enabled Information and Services Science • • • • • • Zemanta Twine Digger Calais – Reuters Thompson Powerset Talis Knowledge Enabled Information and Services Science Emerging Research Areas Knowledge Enabled Information and Services Science Fact Extraction and Schema Creation Knowledge Extraction from Community-Generated Content Knowledge Enabled Information and Services Science Fact Extraction From Community Content • Search helps us find relevant pages/articles • But: It doesn’t answer questions. Knowledge Enabled Information and Services Science Fact Extraction From Community Content • Fact Extraction is the first step towards answering questions. • Famous new company that does fact extraction from Wikipedia is Powerset http://www.powerset.com Knowledge Enabled Information and Services Science Fact Extraction From Community Content • Problem: without a guiding schema, extracted predicates are just terms • useful for humans, but not for machines Knowledge Enabled Information and Services Science Fact Extraction From Community Content • Expert-created schemas are expensive and usually very restricted Knowledge Enabled Information and Services Science Fact Extraction From Community Content • Solution: Have a community-generated schema • Wikipedia hierarchy for terms and concepts Knowledge Enabled Information and Services Science Hierarchy Creation Wikigraph-Based expansion Graph Search Query: “cognition” Seed Query Fulltext Concept Search Graph Search Graph Search B Wikipedia Hierarchy Creation Knowledge Enabled Information and Services Science Fact Extraction From Community Content • Solution: Have a community-generated schema Relationship • Wikipedia types hierarchy for terms and concepts – See Automatic Domain Model creation • Wikipedia Infoboxes for relationship types Knowledge Enabled Information and Services Science Learn Patterns that indicate Relationships Query: Australia Sidney Australia New South Wales, Australia •in Sydney, •in Sydney, New South Wales, Australia Canberra •Sydney is the most populous city in Australia •Sydney is the most populous city in Australia We know that countries •Canberra, the Australian capitalhave city capitals. •Canberra, the Australian capital city Which one is Australia’s? •Canberra is the capital city of the Commonwealth of •Canberra is the capital city of the Commonwealth of Australia Australia •Canberra, the Australian capital •Canberra, the Australian capital Knowledge Enabled Information and Services Science Add relationships Knowledge Enabled Information and Services Science Fact Extraction From Community Content • The accumulation of many pattern occurrences give the necessary support – Canberra, Australia minimal positive support – The Australian capital of Canberra additional major support Knowledge Enabled Information and Services Science Summary • Create Domain models from seed queries or seed concepts • Connect the concepts in the created domain models with valid relationships – Learn pertinent patterns for relationships – Find evidence for relationships in text • Wikipedia • WWW Knowledge Enabled Information and Services Science Discovering Undiscovered Knowledge Connecting the Dots Knowledge Enabled Information and Services Science How are Harry Potter and Dan Brown related? mentioned_in Nicolas Flammel Harry Potter mentioned_in Nicolas Poussin member_of The Hunchback of Notre Dame painted_by written_by cryptic_motto_of Et in Arcadia Ego Victor Hugo Holy Blood, Holy Grail member_of Priory of Sion mentioned_in displayed_at member_of The Da Vinci code mentioned_in painted_by Leonardo Da Vinci The Louvre The Mona Lisa painted_by displayed_at The Last Supper painted_by displayed_at The Vitruvian man Santa Maria delle Grazie Knowledge Enabled Information and Services Science Motivation • Undiscovered Public Knowledge [Swanson 89] – Hidden connections in text • Our objective: build mechanisms to reveal these connections • Our approach: o Populate existing ontology schemas via information extraction from text o Use the extracted information to o Support browsing o Text retrieval o Knowledge discovery Knowledge Enabled Information and Services Science Discovering the Undiscovered Knowledge • Swanson’s discoveries – Associations Magne leads to between loss Migraine and Magnesium [Hearst99] • • • • • • • • stress is associated with migraines sium stress can lead to loss of magnesium Stress calcium channel blockers prevent some migraines magnesium is a natural calcium channel blocker Spreading spreading cortical depression (SCD) is implicated in some migraines high levels of magnesium inhibit SCD Cortical migraine patients have high platelet aggregability Depression magnesium can suppress platelet aggregability platelet aggregability • Data sets generated using these entities (marked red above) as boolean keyword queries against pubmed Calcium Channel Blockers prevents Migraine • Bidirectional breadth-first search used to find paths in resulting RDF Knowledge Enabled Information and Services Science Background Knowledge Used • UMLS – A high level schema of the biomedical domain – 136 classes and 49 relationships – Synonyms of all relationship – using variant lookup (tools from NLM) – 49 relationship + their synonyms = ~350 mostly verbs T147—effect T147—induce T147—etiology T147—cause T147—effecting T147—induced • MeSH – 22,000+ topics organized as a forest of 16 trees – Used to query PubMed • PubMed – Over 16 million abstract – Abstracts annotated with one or more MeSH terms Knowledge Enabled Information and Services Science Method – Parse Sentences in PubMed SS-Tagger (University of Tokyo) SS-Parser (University of Tokyo) • Entities (MeSH terms) in sentences occur in modified forms • “adenomatous” modifies “hyperplasia” (TOP (S (NP (NP (DT An) (JJ excessive) (ADJP (JJ endogenous) (CC or) (JJ • “An excessive endogenous or exogenous modifies exogenous) ) (NN stimulation) ) (PP (IN by) (NPstimulation” (NN estrogen) ) ) ) (VP (VBZ “estrogen” induces) (NP (NP (JJ adenomatous) (NN hyperplasia) ) (PP (IN of) (NP (DT • Entities can also occur) as of 2 or more other entities the) (NN endometrium) ) ) composites ))) • “adenomatous hyperplasia” and “endometrium” occur as “adenomatous hyperplasia of the endometrium” Knowledge Enabled Information and Services Science Knowledge Enabled Information and Services Science Utilizing Extracted Knowledge Supporting browsing, querying and knowledge discovery – Semantic Browser – Query semi-structured representations • SPARQL • Hypothesis-Driven Retrieval – Discovery complex connection patterns • Knowledge Discovery operators Knowledge Enabled Information and Services Science Example - Evaluating Hypotheses Migraine affects Magnesium Stress inhibit Patient isa Calcium Channel Blockers Complex Query Keyword query: Migraine[MH] + Magnesium[MH] PubMed Supporting Document sets retrieved Knowledge Enabled Information and Services Science Example - Semantic Browser Click to Launch Knowledge Enabled Information and Services Science Web 2.0 Man Meets Machine Knowledge Enabled Information and Services Science Putting the man back in Semantics Semantic Web focuses on artificial agents “Web 2.0 is made of people” (Ross Mayfield) “Web 2.0 is about systems that harness collective intelligence.” (Tim O’Reilly) The relationship web combines the skills of humans and machines Knowledge Enabled Information and Services Science Putting the man back in Semantics Semantic Web focuses on artificial agents “Web 2.0 is made of people” (Ross Mayfield) “Web 2.0 is about systems that harness collective intelligence.” (Tim O’Reilly) The relationship web combines the skills of humans and machines Knowledge Enabled Information and Services Science Going places … Formal Powerful Implicit Social, Informal Knowledge Enabled Information and Services Science A Community’s Pulse • Wealth of information available in blogs, social networks, chats etc. • Free medium of self-expression makes mass opinions / interests available • Polling for popular culture opinions is easier • Social Production undeniably affects markets • Results of analysis more effectively tailored to specific audience : geo-specific retail ads, demographic interests in music Knowledge Enabled Information and Services Science Buzz on MySpace Mining artist popularity from chatter on MySpace - Lists close to listeners preferences User Comments: vs. BB May 07 - Bill Boards Rihanna Rihanna User Comments: Jun 07 Rihanna Biffy Clyro Winehouse Winehouse Twang Maroon 5 Maroon 5 Maroon 5 Mccartney Mccartney McCartney Biffy Clyro Biffy Clyro Winehouse Twang Rascal Rascal Rascal Twang Knowledge Enabled Information and Services Science The How Metadata Extraction from Comments Artist, Track name in comments are common words “Keep your smile on Lil.” (Artist: Lilly Allen, Track: Smile) Necessitate a combination of linguistic, statistical, domain knowledge and domain specific rules to do well Detecting and discarding Spam Accurate popularity estimates Transliterating Slang I say: “Your music is wicked” What I really mean: “Your music is good” Hypercube: Demographics' of users who post, nonspam positive and negative sentiment comment counts Lets one ask questions like “Who is the most popular artists among the 19 year olds in New York?” Knowledge Enabled Information and Services Science Opportunities Casual Text more and more pervasive Extracting Semantic Metadata a whole different problem What works for a news article, scientific literature does not work well for content that does not follow rules of edited text Need to systematically understand differences in these types of text in order to improve enablers like entity extraction Knowledge Enabled Information and Services Science Event Web and the Semantic Sensor Web Time, Space and Theme Knowledge Enabled Information and Services Science Events – Spatial, Temporal and Thematic Spatial Temporal Thematic Knowledge Enabled Information and Services Science Events and STT Dimensions E2:Soldier E4:Address located_at lives_at located_at lives_at E6:Address Georeferenced Coordinate Space (Spatial Regions) E1:Soldier assigned_to occurred_at E7:Battle participates_in E8:Military_Unit E8:Military_Unit participates_in E5:Battle assigned_to occurred_at Residency Battle Participation Named Places Spatial Occurrents Dynamic Entities Knowledge Enabled Information and Services Science E3:Soldier Scenario: Sensor Data Fusion and Analysis High-level Sensor Low-level Sensor How do we determine if the three images depict … • the same time and same place? • the same entity? • a serious threat? Knowledge Enabled Information and Services Science 90 Data Pyramid Sensor Data Pyramid Ontology Metadata Knowledge Entity Metadata Information Feature Metadata Raw Sensor (Phenomenological) Data Data Knowledge Enabled Information and Services Science What is Sensor Web Enablement? http://www.opengeospatial.org/projects/groups/sensorweb Knowledge Enabled Information and Services Science 92 SWE Components - Languages Information Model for Observations and Sensing Sensor and Processing Description Language Observations & Measurements (O&M) GeographyML (GML) Common Model for Geographical Information SensorML (SML) TransducerML (TML) Sam Bacharach, “GML by OGC to AIXM 5 UGM,” OGC, Feb. 27, 2007. Knowledge Enabled Information and Services Science Real Time Streaming Protocol SWE Components – Web Services Discover Services Sensors Providers Data Sensor Planning Service: Command and Task Sensor Systems Sensor Observation Service: Access Sensor Description and Data SOS SPS SAS Sensor Alert Service Dispatch Sensor Alerts to registered Users Catalog Service Sam Bacharach, “GML by OGC to AIXM 5 UGM,” OGC, Feb. 27, 2007. Clients Accessible from various types of clients from PDAs and Cell Phones to high end Workstations Knowledge Enabled Information and Services Science Semantic Sensor Web Knowledge Enabled Information and Services Science 95 Conclusion Knowledge Enabled Information and Services Science Take Home Points Semantics - from documents, to entities, to relationships – Richer, meaningful representations offer more insight, powerful reasoning capabilities Semantics and Web technologies for integration of information from disparate sources, often created for very different purposes with lesser human involvement Semantic Web is highly interdisciplinary – uses IR, AI, KR, DB, DC, ... Increasing mesh of Semantics, Services, People for better exploitation of resources (data, sensors, services, people) Knowledge Enabled Information and Services Science Kno.e.sis Labs (3rd floor, Joshi) Semantic Sciences Lab (Dr Sheth) Bioinformatics Lab (Dr Raymer) Semantic Web Lab (Dr Sheth + Dr. S.Wang) Service Research Lab (Dr Sheth) Metadata and Languages Lab (Dr Prasad) Data Mining Lab (Dr Dong) Joint Proposals With Each Sensor Networking Bin Wang Knowledge Enabled Information and Services Science Kno.e.sis Members – a subset Knowledge Enabled Information and Services Science References Projects: http://knoesis.org/research/ Demos at: http://knoesis.wright.edu/library/demos/ Publications: http://knoesis.wright.edu/library Rest: http://knoesis.org Thanks to our key sponsors: National Science Foundation, National Institute of Health, AFRL and industry partners. Knowledge Enabled Information and Services Science