Realizing the Relationship Web: Morphing information access on the Web from today’s document- and entity-centric paradigm to a relationship-centric paradigm ACM Multimedia International Workshop on the Many Faces of Multimedia Semantics September 28, 2007, Augsburg, Germany Amit Sheth Kno.e.sis Center, Wright State University, Dayton, OH This talk also represents work of several members of Kno.e.sis team, esp. the Semantic Discovery and Semantic Sensor Data. http://knoesis.wright.edu Thanks, M. Perry, C. Ramakrishnan, C. Thomas Knowledge Enabled Information and Services Science Objects of Interest “An object by itself is intensely uninteresting”. Grady Booch, Object Oriented Design with Applications, 1991 Keywords | Search Entities | Integration Relationships | Analysis, Insight Entities + Relationships also needed to model & study Events Knowledge Enabled Information and Services Science Semantics and Relationships Increasing depth and sophistication in dealing with semantics by dealing with (identifying/searching to analyzing) documents, entities, and relationships. Future Relationships Current Entities Past Documents/Media Knowledge Enabled Information and Services Science Data, Information, and Insight What: Which thing or which particular one Who: What or which person or persons Where: At or in what place When: At what time How: In what manner or way; by what means Why: For what purpose, reason, or cause; with what intention, justification, or motive © Ramesh Jain Knowledge Enabled Information and Services Science Insights require understanding Relationships Object Location Time What X X X Who X Where When Relationships X X How X Why X © Ramesh Jain Knowledge Enabled Information and Services Science Semantics and Relationships Semantics is derived from relationships. Consider the linguistics perspective. “Semantics is the study of meaning. …We may distinguish a number of legitimate ways to approach semantics: • … • the relationship between linguistic expressions (e.g. synonymy, antonymy, hyperonymy, etc.): sense; • the relationship to linguistic expressions to the "real world": reference. “ Ontologies use KR language to support modeling of relationships . Quoted part from http://www.ncl.ac.uk/sml/staff/. © 2000 Jonathan West. Knowledge Enabled Information and Services Science Why is this a hard problem? Are objects/entities equivalent/equal(same)? How (well) are they related? • Implicit vs explicit; formal/assertional vs social consensus based; powerful (beyond FOL): partial, probilistic and fuzzy match • Degrees of relatedness and relevance: semantic similarity, semantic proximity, semantic distance, …. – [differentiation, disjointedness] – related in a “context” • Even is-a link involves different notions: identify, unity, essense (Guarino and Wetley 2002) Semantic ambiguity, also based on incomplete, inconsistent, approximate information/knowledge Knowledge Enabled Information and Services Science Issues - Relationships • Identifying Relationship (extraction) • Expressing (specifying, representing) relationships • Discovering and Exploring Relationships • Hypothesizing and Validating Relationships • Utilizing/exploiting Relationships for Semantic Applications (in document search, querying metadata, inferencing, analysis, insight, discovery) Knowledge Enabled Information and Services Science Information Extraction for Metadata Creation WWW, Enterprise Repositories Nexis UPI AP Feeds/ Documents Digital Videos ... ... Data Stores Digital Maps ... Digital Images Create/extract as much (semantics) metadata automatically as possible; Use ontlogies to improve and enhance extraction Digital Audios EXTRACTORS METADATA Knowledge Enabled Information and Services Science Automatic Semantic Metadata Extraction/Annotation Knowledge Enabled Information and Services Science Semantic Annotation (Extraction + Enhancement) COMTEX Tagging Value-added Voquette Semantic Tagging Content ‘Enhancement’ Rich Semantic Metatagging Limited tagging (mostly syntactic) Value-added relevant metatags added by Voquette to existing COMTEX tags: Knowledge Enabled Information and Services Science • Private companies • Type of company • Industry affiliation • Sector • Exchange • Company Execs • Competitors Semantic Metadata Enhancement Knowledge Enabled Information and Services Science Automatic Classification & Metadata Extraction (Web page) Video with Editorialized Text on the Web Auto Categorization Semantic Metadata Knowledge Enabled Information and Services Science Ontology-directed Metadata Extraction (Semi-structured data) Web Page Enhanced Metadata Asset Extraction Agent Knowledge Enabled Information and Services Science Semantic Extraction/Annotation of Experimental Data ProPreO: Ontology-mediated provenance 830.9570 194.9604 2 580.2985 0.3592 parent ion m/z 688.3214 0.2526 779.4759 38.4939 784.3607 21.7736 1543.7476 1.3822 fragment ion m/z 1544.7595 2.9977 1562.8113 37.4790 1660.7776 476.5043 parent ion charge parent ion abundance fragment ion abundance ms/ms peaklist data Knowledge Enabled Information and(MS) Services Science Mass Spectrometry Data Video metadata and search Technique Who’s trying it How it works Wired Tired Scanning the script Blinkx, TVEyes Everything said in a clip is tracked through voice recognition software, closedcaption information, or a combination of the two. Hunts down TV news references to, say, Lindsay Lohan. A mere mention of her name doesn't guarantee Lindsay is in the clip. Identifying what's being shown Google, UCSD's Statistical Visual Computing Lab, VideoMining Algorithms try to figure out what's in the video by monitoring attributes like behavior and movement (VideoMining), faces (Google), and objects (UCSD). Finds friends, public figures, or specific actions like car chases. The tech is stuck in the lab or limited to specialized search tasks. Analyzing links and metadata Dabble, Google Conventional search spiders scan the text around a video and the pages that link to it. The fastest and best way to search for video data. Can't "see" what's in the videos or locate specific action in a long clip. Knowledge Enabled Information and Services Science Wired Mag, Aug 2007 Semantic Sensor ML – Adding Ontological Metadata Domain Ontology Person Company Spatial Ontology Coordinates Coordinate System Temporal Ontology Time Units Timezone Mike Botts, "SensorML and Sensor Knowledge Web Enablement," Enabled Information and Services Science Earth System Science Center, UAB Huntsville 18 Relationships on the Web: early work Knowledge Enabled Information and Services Science MREF (Metadata Reference Link -- complementing HREF) Creating “logical web” through Media Independent Metadata based Correlation Knowledge Enabled Information and Services Science Metadata Reference Link (<A MREF …>) <A HREF=“URL”>Document Description</A> physical link between document (components) • <A MREF KEYWORDS=<list-of-keywords>; THRESH=<real>>Document Description</A> • <A MREF ATTRIBUTES(<list-of-attribute-valuepairs>)>Document Description</A> Knowledge Enabled Information and Services Science MREF Metadata Reference Link -- complementing HREF (1996, 1998) Creating “logical web” through Media Independent Metadata based Correlation ONTOLOGY NAMESPACE ONTOLOGY NAMESPACE METADATA METADATA DATA MREF in RDF DATA Knowledge Enabled Information and Services Science MREF (1998) Model for Logical Correlation using Ontological Terms and Metadata MREF Framework for Representing MREFs RDF Serialization (one implementation choice) XML K. Shah and A. Sheth, "Logical Information Modeling of Web-accessible Heterogeneous Digital Assets", Proc. of the Forum on Research and Technology Advances in Digital Libraries," (ADL'98), Santa Barbara, CA, May 28-30, 1998, pp. 266-275. Knowledge Enabled Information and Services Science Figure 3: XML, RDF, and MREF Correlation based on Content-based Metadata Some interesting information on dams is available here. “information on dams” is defined by an MREF defining keywords and metadata (which may be used for a query). water.gif (Data) Metadata Storage water.gif ……mpeg ……ppm Content Dependent Metadata height, width and size Content based Metadata Major component(RGB) Knowledge Enabled Information and Services Science Blue Domain Specific Correlation Potential locations for a future shopping mall identified by all regions having a population greater than 500 and area greater than 50 sq meters having an urban land cover and moderate relief <A MREF ATTRIBUTES(population < 500; area < 50 & region-type = ‘block’ & land-cover = ‘urban’ & relief = ‘moderate’)>can be viewed here</A> => media-independent relationships between domain specific metadata: population, area, land cover, relief => correlation between image and structured data at a higher domain specific level as opposed to physical “link-chasing” in the WWW Knowledge Enabled Information and Services Science Repositories and the Media Types Population: Area: Boundaries: Land cover: Relief: Image Features (IP routines) Regions (SQL) Boundaries Census DB TIGER/Line DB Knowledge Enabled Information and Services Science Map DB Knowledge Enabled Information and Services Science Complex Relationships • Some relationships may not be manually asserted, but according to statistical analyses of text, experimental data, etc. • allow association of provenance data with classes, instances, relationship types and direct relationships or statements Knowledge Enabled Information and Services Science A simple relationship ? Smoking Causes Knowledge Enabled Information and Services Science Cancer Complex Relationships - Cause-Effects & Knowledge discovery ENVIRON. VOLCANO BUILDING LOCATION LOCATION ASH RAIN DESTROYS PYROCLASTIC FLOW WEATHER PEOPLE COOLS TEMP PLANT DESTROYS KILLS Knowledge Enabled Information and Services Science Knowledge Discovery - Example Earthquake Sources Nuclear Test Sources Nuclear Test May Cause Earthquakes Is it really true? Knowledge Enabled Information and Services Science Complex Relationship: How do you model this? Inter-Ontological Relationships A nuclear test could have caused an earthquake if the earthquake occurred some time after the nuclear test was conducted and in a nearby region. NuclearTest Causes Earthquake <= dateDifference( NuclearTest.eventDate, Earthquake.eventDate ) < 30 AND distance( NuclearTest.latitude, NuclearTest.longitude, Earthquake,latitude, Earthquake.longitude ) < 10000 Knowledge Enabled Information and Services Science Knowledge Discovery – exploring relationship… For each group of earthquakes with magnitudes in the ranges 5.8-6, 6-7, 7-8, 8-9, and >9 on the Richter scale per year starting from 1900, find number of earthquakes Number of earthquakes with magnitude > 7 almost constant. So nuclear tests probably only cause earthquakes with magnitude < 7 Knowledge Enabled Information and Services Science Now possible – Extracting relationships between MeSH terms from PubMed Biologically active substance UMLS Semantic Network complicates affects causes causes Lipid affects Disease or Syndrome instance_of instance_of ??????? Fish Oils Raynaud’s Disease MeSH 9284 documents 5 documents Knowledge Enabled Information and Services Science 4733 documents PubMed Schema-Driven Extraction of Relationships from Biomedical Text Cartic Ramakrishnan, Krys Kochut, Amit P. Sheth: A Framework for SchemaDriven Relationship Discovery from Unstructured Text. International Semantic Web Conference 2006: 583-596 [.pdf] Knowledge Enabled Information and Services Science Method – Parse Sentences in PubMed SS-Tagger (University of Tokyo) SS-Parser (University of Tokyo) • Entities (MeSH terms) in sentences occur in modified forms • “adenomatous” modifies “hyperplasia” (TOP (S (NP (NP (DT An) (JJ excessive) (ADJP (JJ endogenous) (CC or) (JJ • “An excessive endogenous or exogenous modifies exogenous) ) (NN stimulation) ) (PP (IN by) (NPstimulation” (NN estrogen) ) ) ) (VP (VBZ “estrogen” induces) (NP (NP (JJ adenomatous) (NN hyperplasia) ) (PP (IN of) (NP (DT • Entities can also occur) as of 2 or more other entities the) (NN endometrium) ) ) composites ))) • “adenomatous hyperplasia” and “endometrium” occur as “adenomatous hyperplasia of the endometrium” Knowledge Enabled Information and Services Science Method – Identify entities and Relationships in Parse Tree Modifiers Modified entities Composite Entities TOP S VP NP VBZ PP NP DT the JJ excessive JJ endogenous IN by ADJP NP induces NN estrogen NP NN stimulation JJ adenomatous CC or PP NN hyperplasia IN of NP JJ exogenous DT the Knowledge Enabled Information and Services Science NN endometrium Resulting Semantic Web Data in RDF hyperplasia adenomatous hasModifier hasPart modified_entity2 An excessive endogenous or exogenous stimulation hasModifier hasPart modified_entity1 induces composite_entity1 hasPart hasPart estrogen Modifiers Modified entities Composite Entities endometrium Knowledge Enabled Information and Services Science Blazing Semantic Trails in Biomedical Literature Cartic Ramakrishnan, Amit P. Sheth: Blazing Semantic Trails in Text: Extracting Complex Relationships from Biomedical Literature. Tech. Report #TR-RS2007 [.pdf] Knowledge Enabled Information and Services Science Relationships -- Blazing the Trails “The physician, puzzled by her patient's reactions, strikes the trail established in studying an earlier similar case, and runs rapidly through analogous case histories, with side references to the classics for the pertinent anatomy and histology. The chemist, struggling with the synthesis of an organic compound, has all the chemical literature before him in his laboratory, with trails following the analogies of compounds, and side trails to their physical and chemical behavior.” [V. Bush, As We May Think. The Atlantic Monthly, 1945. 176(1): p. 101-108. ] Knowledge Enabled Information and Services Science Original documents PMID-15886201 PMID-10037099 Knowledge Enabled Information and Services Science Semantic Trail Knowledge Enabled Information and Services Science Semantic Trails over all types of Data Semantic Trails can be built over a Web of Semantic (Meta)Data extracted (manually, semi-automatically and automatically) and gleaned from • Structured data (e.g., NCBI databases) • Semi-structured data (e.g., XML based and semantic metadata standards for domain specific data representations and exchanges) • Unstructured data (e.g., Pubmed and other biomedical literature) and • Various modalities (experimental data, medical images, etc.) Knowledge Enabled Information and Services Science Applications Applications “Everything's connected, all along the line. Cause and effect. That's the beauty of it. Our job is to trace the connections and reveal them.” Jack in Terry Gilliam’s 1985 film - “Brazil” Knowledge Enabled Information and Services Science An application in Risk & Compliance Ahmed Yaseer: Watch list • Appears on Watchlist ‘FBI’ Organization Hamas FBI Watchlist member of organization appears on Watchlist Ahmed Yaseer works for Company WorldCom Company Knowledge Enabled Information and Services Science • Works for Company ‘WorldCom’ • Member of organization ‘Hamas’ Global Investment Bank Watch Lists Law Enforcement Regulators Public Records World Wide Web content BLOGS, RSS Semi-structured Government Data Un-structure text, Semi-structured Data Establishing New Account User will be able to navigate the ontology using a number of different interfaces Scores the entity based on the content and entity relationships Example of Fraud prevention application used in financial services Knowledge Enabled Information and Services Science Hypothesis driven retrieval of Scientific Text Knowledge Enabled Information and Services Science Semantic Browser Knowledge Enabled Information and Services Science More about the Relationship Web Relationship Web takes you away from “which document” could have information I need, to “what’s in the resources” that gives me the insight and knowledge I need for decision making. Amit P. Sheth, Cartic Ramakrishnan: Relationship Web: Blazing Semantic Trails between Web Resources. IEEE Internet Computing July 2007 (to appear) [.pdf] Knowledge Enabled Information and Services Science Events: 3 Dimensions – Spatial, Temporal and Thematic Spatial Temporal Thematic Knowledge Enabled Information and Services Science Events and STT dimensions • Powerful mechanism to integrate content – Describes the Real-World occurrences – Can have video, images, text, audio all of the same event – Search and Index based on events and STT relations • Many relationship types – Spatial: • What events happened near this event? • What entities/organizations are located nearby? – Temporal: • What events happened before/after/during this event? – Thematic: • What is happening? • Who is involved? • Going further – Can we use What? Where? When? Who? to answer Why? / How? – Use integrated STT analysis to explore cause and effect Knowledge Enabled Information and Services Science Example Scenario: Sensor Data Fusion and Analysis Low-level Sensor (S-L) High-level Sensor (S-H) H L A-H E-H A-L E-L • How do we determine if A-H = A-L? (Same time? Same place?) • How do we determine if E-H = E-L? (Same entity?) • How do we determine if E-H or E-L constitutes a threat? Knowledge Enabled Information and Services Science 53 Data Pyramid Sensor Data Pyramid Relationship Semantics/Understanding /Insight Metadata Entity Metadata Information Feature Metadata Raw Sensor (Phenomenological) Data Data Knowledge Enabled Information and Services Science 54 Sensor Data Architecture Analysis Processes Annotation Processes Knowledge • Object-Event Relations • Spatiotemporal Associations Semantic Analysis RDF KB • Provenance Pathways SML-S Entity Detection SML-S Feature Extraction O&M SML-S Ontologies Information • Entity Metadata Fusion • Object-Event Ontology • Space-Time Ontology • Feature Metadata TML Collection Data • Raw Phenomenological Data Sensors (RF, EO, IR, HIS, acoustic) Knowledge Enabled Information and Services Science 55 Current Research Towards STT Relationship Analysis • Modeling Spatial and Temporal data using SW standards (RDF(S))1 – Upper-level ontology integrating thematic and spatial dimensions – Use Temporal RDF3 to encode temporal properties of relationships – Demonstrate expressiveness with various query operators built upon thematic contexts • Graph Pattern queries over spatial and temporal RDF data2 – Extended ORDBMS to store and query spatial and temporal RDF – User-defined functions for graph pattern queries involving spatial variables and spatial and temporal predicates – Implementation of temporal RDFS inferencing 1. Matthew Perry, Farshad Hakimpour, Amit Sheth. "Analyzing Theme, Space and Time: An Ontology-based Approach", Fourteenth International Symposium on Advances in Geographic Information Systems (ACM-GIS '06), Arlington, VA, November 10 - 11, 2006 2. Matthew Perry, Amit Sheth, Farshad Hakimpour, Prateek Jain. “Supporting Complex Thematic, Spatial and Temporal Queries over Semantic Web Data", Second International Conference on Geospatial Semantics (GeoS ‘07), Mexico City, MX, November 29 – 30, 2007 3. Claudio Gutiérrez, Carlos A. Hurtado, Alejandro A. Vaisman. “Temporal RDF”, ESWC 2005: 93-107 Knowledge Enabled Information and Services Science Upper-level Ontology modeling Theme and Space Continuant Occurrent Spatial_Occurrent Named_Place Dynamic_Entity located_at occurred_at Spatial_Region rdfs:subClassOf property Occurrent: Events – happen and then don’t exist occurred_at:Those Links Spatial_Occurents to theirbehavior geographic locations Named_Place: Spatial_Region: Records entities exact with spatial static location spatial (geometry objects, building) Continuant: Concrete and Abstract Entities – persist over (e.g. time Spatial_Occurrent: Events with concrete spatial locations (e.g. a speech) located_at: Links Named_Places to their geographic locations Dynamic_Entity: Those entities with dynamiccoordinate spatial behavior system (e.g. info)person) Knowledge Enabled Information and Services Science Upper-level Ontology Continuant Occurrent Named_Place Dynamic_Entity located_at occurred_at Spatial_Occurrent Spatial_Region City Person trains_at Speech gives Politician Military_Unit Soldier participates_in Military_Event assigned_to on_crew_of used_in Bombing Battle Vehicle rdfs:subClassOf used for integration rdfs:subClassOf relationship type Domain Ontology Knowledge Enabled Information and Services Science Temporal RDF Graph: Platoon Membership assigned_to [5, 15] E4:Soldier assigned_to [1, 10] E1:Soldier E2:Platoon assigned_to [11, 20] Time interval represents valid time of the relationship E3:Platoon assigned_to [5, 15] E5:Soldier E1 is assigned to E2 from time 1 to 10 and then assigned to E3 from time 11 to 20 Also need to handle inferencing: (x rdf:type Grad_Student):[2004, 2006] AND (x rdf:type Undergrad_Student):[2000, 2004] (x rdf:type Student):[2000, 2006] Knowledge Enabled Information and Services Science ORDBMS Implementation: DB Structures Unlike thematic relationships which are explicitly stated in the RDF graph, many spatial and temporal relationships (e.g., distance) are implicit and require additional computation Knowledge Enabled Information and Services Science Sample STT Query Scenario (Biochemical Threat Detection): Analysts must examine soldiers’ symptoms to detect possible biochemical attack Query specifies (1) a relationship between a soldier, a chemical agent and a battle location (graph pattern 1) (2) a relationship between members of an enemy organization and their known locations (graph pattern 2) (3) a spatial filtering condition based on the proximity of the soldier and the enemy group in this context (spatial Constraint) Knowledge Enabled Information and Services Science The Machine Factor Formal representation of knowledge – RDF(S), OWL, etc. Statistical analysis – Similarity – Cooccurrence – Clustering Intelligent aggregation of knowledge – Collaboration/Problem Solving Environments – Decision support tools Knowledge Enabled Information and Services Science Putting the man back in Semantics The Semantic Web focuses on artificial agents “Web 2.0 is made of people” (Ross Mayfield) “Web 2.0 is about systems that harness collective intelligence.” (Tim O’Reilly) The relationship web combines the skills of humans and machines Knowledge Enabled Information and Services Science Putting the man back in Semantics “Web 2.0 is about systems that harness collective intelligence.” The relationship webiscombines the skills humans and machines The Semantic Web focuses onofartificial agents “Web 2.0 made of people” (Ross Mayfield) (Tim O’Reilly) Knowledge Enabled Information and Services Science Going places … Formal Powerful Implicit Social, Informal Knowledge Enabled Information and Services Science