Relationships at the Heart of Semantic Web
Keynote, SOFSEM 2002, Milovy, Czech Republic, Nov 25 2002
Amit Sheth
Large Scale Distributed Information Systems (LSDIS) Lab, University of Georgia; http://lsdis.cs.uga.edu
CTO, Semagix, Inc.; http://www.semagix.com
November 2002 © Amit Sheth

Semantics: The Next Step in the Web’s Evolution
The Semantic Web -- a vision with several views:
• “The Web of data (and connections) with meaning in the sense that a computer program can learn enough about what data means to process it.” [B99]
• “The semantic Web is an extension of the current Web in which information is given well-defined meaning, better enabling computers and people to work in cooperation.” [BHL01]
• “The Semantic Web is a vision: the idea of having data on the Web defined and linked in a way that it can be used by machines not just for display purposes, but for automation, integration and reuse of data across various applications.” [W3C01]

Semantics for the Web
On the Semantic Web, every resource (people, enterprises, information services, application services, and devices) is augmented with machine-processable descriptions to support finding, reasoning about (e.g., which service is best), and using (e.g., executing or manipulating) the resource.
The idea is that self-descriptions of data and other techniques would allow context-understanding programs to selectively find what users want, or allow programs to work on behalf of humans and organizations to make them more efficient and productive.

Move from Syntax to Semantics in Information System (a personal perspective)
[Figure: three generations of information systems along an axis running from Data through Metadata to Semantics.
• Generation I (federated DB/multidatabases), 1980s: schema, “semantic data modeling”
• Generation II (mediators), 1990s: domain model
• Generation III (information brokering), 1997...: ontology, context, relationships, KB
Systems named in the figure: Multibase, MRDSM, ADDS, IISS, Omnibase, Mermaid, DDTS, InfoSleuth, KMed, DL-I projects, Infoscopes, HERMES, SIMS, Garlic, TSIMMIS, Harvest, RUFUS, InfoHarness, VisualHarness, VideoAnywhere, InfoQuilt, ADEPT, Taalee Semantic Engine, Oingo, Semantic Web, some DL-II projects.]

Semantics and Relationships
Semantics is derived from relationships. Consider the linguistics perspective.
“Semantics is the study of meaning. … We may distinguish a number of legitimate ways to approach semantics:
• …
• the relationship between linguistic expressions (e.g. synonymy, antonymy, hyperonymy, etc.): sense;
• the relationship to linguistic expressions to the "real world": reference.”
Ontologies in KR help capture the above.
Quoted part from http://www.ncl.ac.uk/sml/staff/. © 2000 Jonathan West.

Why is this a hard problem?
Are objects/entities equivalent/same? How (well) are they related?
• partial and fuzzy match: related, relevant, ...
• related in a “context”
• semantic similarity, proximity, distance, …
Semantic ambiguity, also based on incomplete, inconsistent, approximate information/knowledge.
Many problems have stumbled across these issues, e.g., schema integration (in the database management area).

Semantics and Relationships
Increasing depth and sophistication in dealing with semantics by dealing with (from identifying and searching to analyzing) documents, entities, and relationships.
[Figure: progression from Documents (past) to Entities (current) to Relationships (future).]

Issues - Relationships
• Identifying Relationships
• Discovering Relationships
• Validating Relationships
• Utilizing Relationships (in document search, querying metadata, analysis…)

Specifying/Describing Relationships
• Expressiveness of specification language
– In relational model
– In semantic data model, e.g., E-R variants
– In logic, e.g., description logics
– …

Relationship Modeling in Various Representation Models
[Figure: spectrum of representation models running from Simple Taxonomies to Expressive Ontologies.
Labels along the spectrum: Catalog/ID; Terms/glossary; Thesauri, “narrower term” relation; Informal is-a; Formal is-a; Formal instance; Frames (properties); Value Restriction; Disjointness, Inverse, part of…; General Logical constraints.
Examples placed on the spectrum: DB Schema, UMLS, RDF, Wordnet, OO, RDFS, DAML, CYC, OWL, IEEE SUO.
After Deborah L. McGuinness (Stanford) and Tim Finin (UMBC).]

Sampling Issues in Relationships: outline of this talk
• Simple Relationships – already known
– Representation
– Identification/Querying: “Which entities are related to entity X via relationship R?” where R is typically specified, possibly as a join condition or path expression
• Complex relationships
– Rho: discovery from large document set with associated metadata and ontologies: “How is X related to Y?”
– ISCAPEs: validation/human-directed knowledge discovery

Metadata and Ontology: Primary Semantic Web enablers
[Figure: layers of metadata, listed here from the user at the top down to the data.]
• Ontologies, Classifications, Domain Models
• Domain Specific Metadata: area, population (Census); land-cover, relief (GIS); metadata concept descriptions from ontologies
• Domain Independent (structural) Metadata (C++ class-subclass relationships, HTML/SGML Document Type Definitions, C program structure...)
• Direct Content Based Metadata (inverted lists, document vectors, LSI)
• Content Dependent Metadata (size, max colors, rows, columns...)
• Content Independent Metadata (creation-date, location, type-of-sensor...)
• Data (Heterogeneous Types/Media)

SCORE technology
[Figure: SCORE architecture. Labelled components:
• Knowledge Sources (KS) read by Knowledge Agents (KA); Knowledge Agent Monitor; KA Toolkit; Metadata adapter
• Content Sources (Databases, XML/Feeds, Websites, Email, Reports, Documents) read by Content Agents (CA); Content Agent Monitor; CA Toolkit; Metadata adapter
• Semantic Enhancement Server: Automatic Classification, Classification Committee, Entity Extraction, Enhanced Metadata
• Domain Experts; Ontology; Metabase
• Semantic Query Server: Ontology and Metabase Main Memory Index
• Enterprise Content Applications]

Automatic Classification & Metadata Extraction (Web page)
[Figure: Web page example, video with editorialized text on the Web. Callouts: Auto Categorization, Semantic Metadata.]

Semantic Annotation
[Figure: content ‘enhancement’ from COMTEX Tagging (limited tagging, mostly syntactic) to value-added Voquette Semantic Tagging (rich semantic metatagging).]
Value-added relevant metatags added by Voquette to existing COMTEX tags:
• Private companies
• Type of company
• Industry affiliation
• Sector
• Exchange
• Company Execs
• Competitors

Information Extraction for Metadata Creation
[Figure: EXTRACTORS turn data sources into METADATA. Sources shown: WWW, Enterprise Repositories; Nexis, UPI, AP; Feeds/Documents; Digital Videos; Digital Maps; Digital Images; Digital Audios; Data Stores.]
Key challenge: Create/extract as much (semantics) metadata automatically as possible

Automatic Semantic Annotation of Text: Entity and Relationship Extraction
[Figure.]

Ontology-directed Metadata Extraction (Semi-structured data)
[Figure labels: Web Page; Extraction Agent; Enhanced Metadata Asset; Entity and Semantic Metadata Extraction; Syntax Metadata; Semantic Metadata; Semantic Metadata Enhancement.]

Semantic Application Example – Research Dashboard
• Automatic 3rd party content integration
• Focused relevant content organized by topic (semantic categorization)
• Related relevant content not explicitly asked for (semantic associations)
• Competitive research inferred automatically
• Automatic Content Aggregation from multiple content providers and feeds

Semantic Web – Intelligent Content
Intelligent Content = What You Asked for + What you need to know!
[Figure labels: Related Stock News; COMPANY; Competition; COMPANIES in INDUSTRY with Competing PRODUCTS; COMPANIES in Same or Related INDUSTRY; Regulations; Technology; Products Important to INDUSTRY or COMPANY; Industry News; EPA; Impacting INDUSTRY or Filed By COMPANY; SEC.]

Knowledge-based and Manual Associations
[Figure labels: Syntax Metadata; Semantic Metadata; Same entity; led by; Human-assisted inference.]

Blended Semantic Browsing and Querying (Intelligence Analyst Workbench)
[Figure.]

Relationship Discovery
• Problem
– Huge volumes of data.
– Need to find relationships between two entities in the Semantic Web.
– Application areas such as National Security, Intelligence Services, Bioinformatics.
– Relationships can be of different kinds.
Example
[Figure: graph of entities and relations. Nodes: Terrorist Organization, AlQaida, Osama bin Laden, Ramzi Binalshibh, Mohammad Atta. Edge labels: name, leaderOf, memberOf, friendOf, passengerOf.]

Introduction
• Semantic Association: complex relationships which capture connectivity and similarity of entities in a knowledge base
– Complex: involve multiple relations
– Connectivity: includes both directed paths and undirected paths
– Similarity: specific notion of an isomorphism, based on the similarity of roles of entities

RDF Graph Data Model
• By using a graph data model, Semantic Associations can be viewed in terms of graph traversals
• We can distinguish between different types of Semantic Associations based on structural properties
• For example, a path, intersecting paths, isomorphic paths.
• We use the RDF Graph Data Model to model Semantic Associations.

Example Graph
ρ-pathConnected(x, y) is true if there is a path <x, p1, a, p2, b, p3, …, y> in the knowledge base
[Figure: example RDF graph with a path from X to Y marked.
Schema classes: Artist, Painter, Sculptor, Artifact, Painting, Sculpture, Museum, Ext. Resource; literal types String, Integer, Date.
Properties: creates, paints, sculpts, exhibited, lname, technique, material, title, last_modified.
Instances: resources &r1 to &r8 with literals such as “Pablo”, “Picasso”, “Rembrandt”, “August”, “Rodin”, “oil on canvas”, “Reina Sofia Museum”, “Rodin Museum”, “image/jpeg”, 2000-6-09, 2000-02-01.
Edge legend: typeOf (instance), subClassOf (isA), subPropertyOf.]

Example Graph
ρ-joinConnected(x, y) is true if there are two paths P1, P2 such that:
– P1 = <x, pa, a, pb, b, pc, c, pd, …, k, pl, l, pm, m> and
– P2 = <y, pu, b, pv, …, k, pw, l, py, n>
or
– P1 = <a, pa, b, pb, …, k, pk, l, pl, x> and
– P2 = <y, pu, b, pv, m, pw, l, …, k, p5, l, p6, n>
[Figure: the same example RDF graph, with the nodes x, y, a, b, k, m, n of the two intersecting paths marked.]

Example Graph
ρ-isoConnected(x, y) is true if there are two paths P1, P2 such that:
– P1 = <x, pa, a, pb, b, pc, c> and
– P2 = <y, pu, b, pv, m, pw, l>
and the corresponding nodes (x and y, a and b, c and l, …) and the corresponding properties (pa and pu, pb and pv, pc and pw, …) are similar
[Figure: the same example RDF graph, with the two similar paths starting at X and Y marked.]

Operators
• The ρ Operator computes Semantic Associations between two entities.
• Three kinds of ρ Operators are defined.
– ρ Path: returns all paths between two entities in the data model.
– ρ Connect: returns intersecting paths on which the two entities lie.
– ρ Iso: ρ-isomorphic paths imply a similarity of nodes and edges along the paths; this operator returns such similar paths between entities.

Formalism
• ρ-pathConnected(x, y) is true if there is a path
– <x, p1, a, p2, b, p3, …, y> in the knowledge base
• ρ-joinConnected(x, y) is true if there are two paths P1, P2 such that:
– P1 = <x, pa, a, pb, b, pc, c, pd, …, k, pl, l, pm, m> and
– P2 = <y, pu, b, pv, …, k, pw, l, py, n>
or
– P1 = <a, pa, b, pb, …, k, pk, l, pl, x> and
– P2 = <y, pu, b, pv, m, pw, l, …, k, p5, l, p6, n>

Complex Relationships
• Arise in several contexts, especially involving multiple ontologies (hence mappings)
– information interoperability where related resources subscribe to different but related ontologies
– information requestor and resource modelers choose to use different ontologies
– information requests to support analysis, knowledge discovery, decision making, and learning that require linking multiple domains with different ontologies
Developing an all-encompassing, unified ontology has not been shown to be practical.
Preexisting classifications/metadata standards/taxonomies are hard to ignore.

Complex Relationships - Cause-Effects & Knowledge discovery
[Figure: cause-effect relationships across domains. Node labels: VOLCANO, ENVIRON., LOCATION, BUILDING, ASH RAIN, PYROCLASTIC FLOW, WEATHER, TEMP, PEOPLE, PLANT. Edge labels: DESTROYS, COOLS, KILLS.]

InfoQuilt System Core capabilities
• Ability to handle heterogeneous, static or dynamic content
– wrappers & extractors, with resource modeling (completeness, data characteristics, binding patterns)
• Information Extraction: semi-automatically or automatically create domain-specific or contextually relevant metadata
• Domain modeling with complex (user-defined, inter-ontology) relationships, domain rules and FD
• User-defined functions (esp. for fuzzy/approximate matching) and simulation
• Post-processing result analysis (e.g., chart creator)

Inter-Ontological Relationships
A nuclear test could have caused an earthquake if the earthquake occurred some time after the nuclear test was conducted and in a nearby region.
NuclearTest Causes Earthquake
<= dateDifference( NuclearTest.eventDate, Earthquake.eventDate ) < 30
AND distance( NuclearTest.latitude, NuclearTest.longitude, Earthquake.latitude, Earthquake.longitude ) < 10000

IScape (Information Scape)
A computing paradigm that allows users to query and analyze the data available from diverse autonomous sources, gain a better understanding of the domains and their interactions, and discover and study relationships.

IScape … a simple example
• user’s request
– for semantically related information (regardless of all forms of heterogeneity)
– specified in terms of components of the knowledge base (domain model, relationships, functions, simulations)
“Find all earthquakes with epicenter less than 5000 miles from the location at latitude 60.790 North and longitude 97.570 East and find all tsunamis that they might have caused”
Next - KD using IScapes

Knowledge Discovery - Example
[Figure: Earthquake Sources and Nuclear Test Sources.]
Nuclear Test May Cause Earthquakes. Is it really true?
Complex Relationship: How do you model this?

Knowledge Discovery - Example
• When was the first recorded nuclear test conducted? 1950
• Find the total number of earthquakes with a magnitude 5.8 or higher on the Richter scale per year starting from 1900
Increase in number of earthquakes since 1945

Knowledge Discovery – exploring relationship…
For each group of earthquakes with magnitudes in the ranges 5.8-6, 6-7, 7-8, 8-9, and >9 on the Richter scale per year starting from 1900, find the number of earthquakes
Number of earthquakes with magnitude > 7 almost constant.
So nuclear tests probably only cause earthquakes with magnitude < 7

Knowledge Discovery - Example…
Find nuclear tests and earthquakes that may have occurred as a result of the test
[Figure: IScape processing. Labels: KB, Knowledge Builder, Knowledge, IScape Builder, IScape, IScape Execution, IScape Plan, Query, Data retrieved, Final Results.]

IScape 1
Sources noted on the slide: http://sun00781.dn.net/nuke/hew/Library/Catalog, USGS site

Resource NuclearTestsDB( testSite, explosiveYield, waveMagnitude, testType, eventDate, conductedBy, [dc] waveMagnitude > 3 );
Resource SignificantEarthquakesDB( eventDate, description, region, magnitude, latitude, longitude, numberOfDeaths, damagePhoto, [dc] eventDate > “January 1, 1970” );
Resource NuclearTestSites( testSite, latitude, longitude );

Ontology NuclearTest( testSite, explosiveYield, waveMagnitude, testType, eventDate, conductedBy, latitude, longitude, waveMagnitude > 0, waveMagnitude < 10, testSite -> latitude longitude );
Ontology Earthquake( eventDate, description, region, magnitude, latitude, longitude, numberOfDeaths, damagePhoto, magnitude > 0 );

Relationship NuclearTest Causes Earthquake
<= dateDifference( NuclearTest.eventDate, Earthquake.eventDate ) < 30
AND distance( NuclearTest.latitude, NuclearTest.longitude, Earthquake.latitude, Earthquake.longitude ) < 10000

IScape
“Find all nuclear tests conducted by India or Pakistan after January 1, 1995 with seismic body wave magnitude > 4.5 and find all earthquakes that could have been caused due to these tests.”

IScape Processing Monitor
[Figure.]

Future
Future work in Semantic Web will increasingly focus on all dimensions of relationships, especially complex relationships.

Further Information
• Related Paper: Sheth, Arpinar, Kashyap: Relationships at the Heart of Semantic Web: Modeling, Discovering, and Exploiting Complex Semantic Relationships, http://lsdis.cs.uga.edu/lib/2002.html
• InfoQuilt and Semantic Association Projects at the LSDIS Lab: http://lsdis.cs.uga.edu
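
Supplementary sketch: NuclearTest Causes Earthquake as a predicate
The Causes relationship defined in IScape 1 can be read as a boolean test over pairs of events. The Python below is only an illustrative sketch, not InfoQuilt code. Several things are assumptions that the slides do not state: that dateDifference is measured in days, that distance is a great-circle distance in kilometres (computed here with the haversine formula), that the earthquake must not precede the test (taken from the wording "some time after"), and the Event class and all sample values, which are made up.

```python
from dataclasses import dataclass
from datetime import date
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # assumption: mean Earth radius, distances in km


@dataclass(frozen=True)
class Event:
    """Hypothetical record holding only the attributes the rule reads."""
    event_date: date
    latitude: float   # degrees, north positive
    longitude: float  # degrees, east positive


def date_difference(test_date: date, quake_date: date) -> int:
    """Days from the test to the earthquake (negative if the quake came first)."""
    return (quake_date - test_date).days


def distance(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km via the haversine formula (an assumed metric)."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    # Clamp guards against rounding pushing sqrt(a) slightly above 1.
    return 2 * EARTH_RADIUS_KM * asin(min(1.0, sqrt(a)))


def causes(test: Event, quake: Event,
           max_days: int = 30, max_km: float = 10000) -> bool:
    """True when the quake follows the test within both thresholds."""
    days = date_difference(test.event_date, quake.event_date)
    # The lower bound 0 is an assumption taken from the prose rule.
    if not 0 <= days < max_days:
        return False
    return distance(test.latitude, test.longitude,
                    quake.latitude, quake.longitude) < max_km


# Made-up events for illustration only, not real records.
test = Event(date(2000, 1, 1), 10.0, 20.0)
candidates = [
    Event(date(2000, 1, 11), 12.0, 22.0),     # 10 days later, nearby
    Event(date(2000, 3, 1), 12.0, 22.0),      # 60 days later
    Event(date(1999, 12, 22), 12.0, 22.0),    # before the test
    Event(date(2000, 1, 11), -10.0, -160.0),  # 10 days later, antipodal
]
related = [quake for quake in candidates if causes(test, quake)]
```

Keeping the two thresholds as parameters mirrors how an analyst might tighten or relax the relationship while exploring whether it holds, as in the Knowledge Discovery example.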