Web, Semantics, OIL and FUEL: Semantic Interoperability and Learning on the Web
by Amit Sheth
Director, Large-Scale Distributed Information Systems Lab., University of Georgia, Athens, GA USA, http://lsdis.cs.uga.edu
Founder/Chairman, Taalee, Inc., http://www.taalee.com
Special thanks: Digital Library project team at LSDIS
Stanford DB Seminar, October 20, 2000

Semantics
“meaning or relationship of meanings, or relating to meaning …” (Webster); meaning and use of data (Information System)

Semantic Web
“The Web of data (and connections) with meaning in the sense that a computer program can learn enough about what the data means to process it. . . . Imagine what computers can understand when there is a vast tangle of interconnected terms and data that can automatically be followed.” (Tim Berners-Lee, Weaving the Web, 1999)
• “A Web in which machine reasoning will be ubiquitous and devastatingly powerful.”
• “A place where the whim of a human being and the reasoning of a machine coexist in an ideal, powerful mixture.”
• “A semantic Web would permit more accurate and efficient Web searches, which are among the most important Web-based activities.”

Semantic Web: a personal definition
The concept that Web-accessible content can be organized semantically, rather than through syntactic and structural methods.
• Markups/Standards: DAML: Semantic Annotations and Directory; DSML: Directory (of course, XML, RDF, namespaces)
• Commercialization 1 (Oingo): Taxonomy – Ontology and Semantic Techniques
• Commercialization 2 (Taalee): Knowledgebase (Taxonomy, Domain Modeling, Entities and Relationships) and Semantic Techniques
• Research (Digital Earth at UGA): Complex Relationships and “deep semantics”

Allow semantic interoperability at the level we currently have syntactic interoperability in XML:
• Create an Agent Mark-Up Language (DAML) built upon XML that allows users to provide machine-readable semantic annotations for specific communities of interest.
• Create tools that embed DAML markup on to web pages and other information sources in a manner that is transparent and beneficial to the users.
• Use these tools to build up, instantiate, operate, and test sets of agent-based programs that mark up and use DAML.
• … applications

DARPA Agent Mark Up Language (DAML)
Program Manager: Professor James Hendler
http://dtsn.darpa.mil/iso/programtemp.asp?mode=347

<ONTOLOGY ID="powerpoint-ontology" VERSION="1.0" DESCRIPTION="formal model for powerpoint presentations">
<Title> DAML <Subtitle> an Example </Subtitle> </Title>
<USE-ONTOLOGY ID="PPT-ontology" VERSION="1.0" PREFIX="PP" URL="http://iwp.darpa.mil/ppt..html">
<CATEGORY NAME="pp.presentation" FOR="http://iwp.darpa.mil/jhendler/agents.html">
<RELATION-VALUE POS1="Agents" POS2="/madhan">
<DEF-CATEGORY NAME="Title" ISA="Pres-Feature">
<DEF-CATEGORY NAME="Subtitle" ISA="Pres-Feature">
<DEF-RELATION NAME="title-of" SHORT="was written by">
<DEF-ARG POS=1 TYPE="presentation">
<DEF-ARG POS=2 TYPE="presenter">

Objects in the Web can be marked up, in principle (manually or automatically), to include the following information:
• Descriptions of data they contain (DBs)
• Descriptions of functions they provide (Code)
• Descriptions of data they can provide (Sensors)
Source: http://www.darpa.mil/iso/DAML/
Source: http://www.zdnet.com/pcweek/stories/jumps/0,4270,2432946,00.html

Example of searching on DAML-centric semantic Web

• Semantics; Entity+Rel+Events; Meaning with Context
• Directory; Structure; Table of Contents
• Search; Syntax; Index
Value of Information
Semantics results in a deep understanding of content, and hence in a more relevant and timely match with information needs and targeting.
• Oingo Ontology – ODP based(?), the database of millions of concepts and relationships that powers Oingo's semantic technology
• Oingo Seek - the database of millions of concepts and relationships that powers Oingo's semantic technology
• Oingo Sense - the knowledge extraction tool that uncovers the essential meaning of information by sensing concepts and context
• Oingo Lingua - the language of meaning used to state intent. The basis for intelligent interaction
• Assets catalogued are Web sites or Web pages.
Broad taxonomy, shallow understanding, and results after 3 or 4 clicks

Taalee WorldModel™: Domain Models (metadata of domain-media-business attributes, types), Ontologies, Entities, Relationships, Automated “Experts”, Reference Data (Live Encyclopedia), Mappings
Taalee Distributed Intelligent Agent Infrastructure: push/pull/scheduled agents for fresh extraction
Taalee Metabase of A/V assets
Taalee Semantic Engine™ with contextual reasoning
• Semantic Categorization
• Semantic Cataloging
• Semantic Search
• Semantic Directory
• Semantic Personalization
• Semantic Targeting

Taalee Semantic Engine
[Figure: Metabase, Extractor Agents, WorldModel™]
• Metabase: Rapidly growing A/V aggregation
• Automatic Extraction Agents: Expert driven value addition
• WorldModel: Understanding of content, profiles, targeting needs

Taalee Metadata on Football Assets
Metadata from Typical Cataloging of Football Assets
Virage Search on “football touchdown”
Rich Media Reference Page
Baltimore 31, Pit 24
http://www.nfl.com
• Brian Griese Interview Part Four: Brian Griese talks about the first touchdown he ever threw. URL: http://cbs.sportsline...
• Jimmy Smith Interview Part Seven: Jimmy Smith explains his philosophy on showboating. URL: http://cbs.sportsline...
Qadry Ismail and Tony Banks hook up for their third long touchdown, this time on a 76-yarder to extend the Ravens’ lead to 31-24 in the third quarter.
League: Professional
Teams: Ravens, Steelers
Score: Bal 31, Pit 24
Players: Qadry Ismail, Tony Banks
Event: Touchdown
Produced by: NFL.com
Posted date: 2/02/2000

Semantic Enrichment (a commercial perspective)
What else can a context do?
• Simply the most precise and freshest A/V search
• Delightful, relevant information, exceptional targeting opportunity
• Context and Domain Specific Attributes
• Uniform Metadata for Content from Multiple Sources, can be sorted by any field
• Creating a Web of related information
What can a context do?
• System recognizes ENTITY & CATEGORY
• Relevant portion of the Directory is automatically presented.
• Users can explore semantically related information.

Looking ahead
FROM: Browsing; Lexical search; Data exchange; Data retrieval
TO: Information requests; Content search; Semantic retrieval; Interpretation; Knowledge creation; Knowledge sharing

Evolving targets and approaches in integrating data and information (a personal perspective)
• Generation III (1997...): Taalee, Observer, ADEPT, InfoQuilt, DL-II/DARPA/KA2 projects, OntoBroker, …
• Generation II (1990s): VisualHarness, InfoHarness, InfoSleuth, KMed, DL-I projects, Infoscopes, HERMES, SIMS, Garlic, TSIMMIS, Harvest, RUFUS, ...
• Generation I (1980s): Mermaid, DDTS, Multibase, MRDSM, ADDS, IISS, Omnibase, ...

Enablers of the emerging concepts
• Terminology (and language) transparency
• Domain modeling (entities with domain specific attributes) and complex relationships
• Comprehensive metadata management
• Context-sensitive information processing
• Semantic correlation

Digital Earth Prototype System at UGA
• Develop a Digital Earth Modeling System
• Answer requests for collection of information from distributed resources
• Develop a supportive learning environment for undergraduate geography students

A Digital Library Scenario
VOLCANOES ACTIVITY
Some volcanoes are more active than others, and a few are in a state of permanent eruption, at least for the geological present. Volcanoes may become quiescent (dormant) for months or years.
The danger to life posed by active volcanoes is not limited to eruption of molten rock or showers of ash and cinders. Mudflows that melt ice and snow on the volcano's flanks are equally hazardous*.
* Encarta® 98 Desk Encyclopedia © & 1996-97 Microsoft Corporation. All rights reserved.
[Image: Pu'u'O'o, Hawaii]

A Digital Library Scenario
VOLCANOES ACTIVITY
A sample information request: Find information on volcanoes in St. Helens and how they affect the environment.
Some of the ontologies involved in processing this information request are:
• Ontology for GIS Datasets
• Ontology for Natural Disasters
• Ontology for Volcanoes
• Ontology for Environment
TRY HERE THIS AND OTHER CONCEPT DEMOS

Iscape working definition
“An iscape is an information request that supports learning and semantic interoperability (about Digital Earth)” (ADEPT at UGA)

Iscapes in the context of digital earth (ADEPT)
• Iscapes are useful to understand geographical phenomena, typically involving relationships between them
• Iscapes are created by instructors using an iscape specification framework
• Iscapes are run by students while learning about Digital Earth
• Iscapes creation framework fits in the ADEPT agent-based architecture prototype

Iscape specification framework
[Figure: Information Landscape with its components: Ontologies, Relationships, Operations/Simulation, Presentation, Creation, Learning/What-if]

Information Landscapes
• A modular specification framework to represent information landscapes
• Specifications of complex information requests over multiple ontologies
• Specification of relationships, including “affects”
• Enabling user-configurable parameters
• Enabling operations including simulations
• A graphical toolkit for easy creation of iscapes

Information Landscapes
• Learning paradigm for students: uses embedded ontological terms and iscapes
• Metadata framework: models spatial, temporal and theme based metadata; uses FGDC and Dublin Core standards to represent domain independent metadata

Relations
Given a set X, a relation is some property
that may or may not hold between one member of X and a member of another set
Various relationships: “equals”, “less_than”, “is_a”, “is_part_of”, “like”

Semantic Relations
• Most of these relations are hierarchical or similarity based
• These are not powerful enough for our task of semantic interoperability between domains like Geography
• In these domains, we have a natural “affects” relation between the ontologies

Semantic Relations
How does A affect B?
A, in its entirety or by a set of its components, induces some changes or properties on a set of components of B

Design of “affects”
How do volcanoes affect the environment?
[Figure: VOLCANO (LOCATION, ASH RAIN, PYROCLASTIC FLOW) and ENVIRON. (LOCATION, BUILDING, ATMOSPHERE, PEOPLE, PLANT), connected by the relations DESTROYS, COOLS, DESTROYS and KILLS]

Design of “affects”
• [Area (Pyroclastic Flow) INTERSECT Area (Plant)] => [Pyroclastic Flow destroys Plant]
• [Size (Ash Particles) < 2] => [Ash Rain cools Atmosphere]
• [Pyroclastic Flow destroys Plant] and [Ash Rain cools Atmosphere] => [Volcano affects Environment]
• (x | x ∈ ASC) and (y | y ∈ BSC): [ FN(x) operator FN(y) ]* => [ ASC relation BSC ]
• [ ASC relation BSC ]* => A affects B

Mapping Functions
How do volcanoes affect the environment?
[ Location (Volcano) = Location (Environment) ]
• Enclosing function provides a standard interface to the operator
• Operator does imprecise or fuzzy match
• Achieves geo-spatial interoperability

Mapping Functions
How do volcanoes affect the environment?
[ Time (Volcano) = Time (Environment) ]
• Matches, with a tolerance depending on the granularity of values
• Tolerance different for different entities; specified default; can be user-defined
• Achieves temporal interoperability

Operations
• Powerful mechanism of studying geographical domains and other complex phenomena
• Input parameters can be changed to support learning
• E.g., statistical operations, numerical analysis, simulation modeling, etc.
[Figure: Metadata Objects (site, table, keyword, image, …) and a User Object connected through inputs i1, i2, … and outputs o1, …, om to an operation f(i1, …, in, o1, …, om)]

Clarke’s Urban Growth Model (UGM)
Domain of Learning – URBAN DYNAMICS
Demonstrates the utility of integrating existing historic maps with remotely sensed data and related geographic information to dynamically map urban land characteristics for large metropolitan areas.
[Image: San Francisco Bay Area prediction of urban extent in 2100]

Digital Earth Prototype: run-time architecture overview
[Figure: RELATE, Cost Model, Ontology Agent, User Agent, Planning Agent, Broker, Correlation Agent, Wrapped Resource Agent, Metabase Resource Agent, Simulation Resource Agent, Web Wrapper, Database Wrapper, Simulation, ADEPT Metabase]

Semantic Web: Possible Evolution
• FUEL – User defined/supplied operators, functions, computations
• Declarative Languages: DAML-O, OIL
• XHTML, HTML, SMIL
• XML, RDF

FUEL as OIL Extension?
• RDF(S): class-def, subclass-of, slot-def, subslot-of, domain, range
• OIL: class-expressions (AND, OR, NOT); slot-constraints (has-value, value-type, cardinality); slot-properties (trans, symm)
• FUEL: framework for mapping data/formats; user defined operators, e.g., affects, simulations

The Promise of the Web with Semantics….
• Semantic Web can be a basis of handling information overload and provide semantic interoperability
• Step wise enrichment: starting with a constrained and well understood language (such as one based on Description Logic), let us explore how we can support richer/deeper semantics for enabling complex decision making and learning involving heterogeneous digital media on the Global Information Infrastructure

Further reading
• http://www.semanticweb.org
• http://www.daml.org
• http://lsdis.cs.uga.edu/~adept
• “DAML could take search to a new level”, http://www.zdnet.com/pcweek/stories/news/0,4153,2432538,00.html
• V. Kashyap and A. Sheth, Information Brokering, Kluwer Academic Publishers, 2000
• Tim Berners-Lee, Weaving the Web, Harper, 1999.
• Editorial writing by Ramesh Jain in IEEE Multimedia. Gio’s papers. OIL ….

“Humankind has not woven the web of life.
We are but one thread within it. Whatever we do to the web, we do to ourselves. All things connect.” – Chief Seattle, 1854

amit@taalee.com – http://www.taalee.com
amit@cs.uga.edu – http://lsdis.cs.uga.edu

For additional details on Information Brokering Architecture:
Realizing Semantic Information Brokering and Semantic Web, ITC-IRST/University of Trento Seminar Series on Perspectives on Agents: Theories and Technologies, April 27, 2000, Trento, Italy
http://lsdis.cs.uga.edu/~adept/presenta.html

For additional details on ISCAPE specification and Execution:
Project Overview and Detailed Presentation at: http://lsdis.cs.uga.edu/~adept/presenta.html
Demonstrations at: http://lsdis.cs.uga.edu/~adept

Iscape specification using XML

<?xml version="1.0"?>
<!-- A template collection for all iscapes -->
<!DOCTYPE IscapeCollection SYSTEM "IscapeCollection.dtd">
<!-- All Iscapes -->
<IscapeCollection>
  <!-- An iscape specification for how stratovolcanoes affect the environment -->
  <Iscape>
    <!-- Identifying this iscape -->
    <Name>How do stratovolcanoes affect the environment</Name>
    <Description>An iscape using the affects relationship</Description>
    <!-- All ontologies which participate -->
    <Ontologies>
      <Ontology>Volcano</Ontology>
      <Ontology>Environment</Ontology>
    </Ontologies>
    <!-- Operations involved -->
    <Operation>
      <Relation>Affects</Relation>
    </Operation>
    <!-- Constraints on ontologies -->
    <OntologicalConstraints>
      <Constraint>Volcano morphology is stratovolcano</Constraint>
      <Constraint>Volcano start year is 1950</Constraint>
    </OntologicalConstraints>
    <!-- Metadata to present in the result -->
    <Presentation>Volcano and Environment Metadata</Presentation>
    <!-- What can the student configure -->
    <Student>
      <Config>Location of Environment</Config>
    </Student>
  </Iscape>
  <!-- This Iscape ends -->
  <!-- Next Iscape starts -->
  <Iscape> … … </Iscape>
</IscapeCollection>
<!-- Iscape Collection ends here -->

Relations

<?xml version="1.0"?>
<!-- Template collection of all relations in the system -->
<!DOCTYPE Relations SYSTEM "Relations.dtd">
<Relations>
  <!-- Relation specification starts here -->
  <Relation>
    <!-- Information to correlate with base iscape -->
    <Name>Affects</Name>
    <!-- Ontologies involved -->
    <OntologyA>Volcano</OntologyA>
    <OntologyB>Environment</OntologyB>
    <!-- All operators -->
    <OperatorSet>
      <!-- Specification has value and mapping conditions -->
      <ValueCondition>
        <OntologyName>Environment</OntologyName>
        <Attribute>Damage</Attribute>
        <ValOperator>GREATERTHANEQUALS</ValOperator>
        <Value>10000</Value>
        <Type>Integer</Type>
      </ValueCondition>
      <MappingCondition>
        <FunctionA>Area</FunctionA>
        <ElementA>Volcano</ElementA>
        <Operator>EQUALS</Operator>
        <FunctionB>Area</FunctionB>
        <ElementB>Environment</ElementB>
      </MappingCondition>
    </OperatorSet>
    <!-- End of all operators -->
  </Relation>
  <!-- End of this relation specification -->
</Relations>
<!-- End of relation collection -->

Ontological Constraints

<?xml version="1.0"?>
<!-- Template to specify ontological constraints -->
<!DOCTYPE OntologicalConstraints SYSTEM "OntologicalConstraints.dtd">
<!-- A collection of ontological constraints for all iscapes -->
<OntologicalConstraints>
  <!-- A constraint on this iscape -->
  <Constraint>
    <IscapeID>Volcano-Env</IscapeID>
    <Name>Volcano morphology is stratovolcano</Name>
    <LHSOntology>Volcano</LHSOntology>
    <LHSAttribute>Morphology</LHSAttribute>
    <Operator>LIKE</Operator>
    <Type>String</Type>
    <RHSValue>Stratovolcano</RHSValue>
  </Constraint>
</OntologicalConstraints>
<!-- Collection of ontological constraints ends here -->

Presentation

<?xml version="1.0"?>
<!-- Template for presentation attributes -->
<!DOCTYPE Presentation SYSTEM "Presentation.dtd">
<!-- All presentation attributes are embedded here -->
<Presentation>
  <!-- Presentation attributes for this iscape -->
  <IncludeThese>
    <IscapeID>Volcano-Env</IscapeID>
    <Name>Volcano and Environment Metadata</Name>
    <Include>
      <Ontology>Volcano</Ontology>
      <Attribute>TectonicSetting</Attribute>
    </Include>
    <Include>
      <Ontology>Volcano</Ontology>
      <Attribute>EndYear</Attribute>
    </Include>
  </IncludeThese>
</Presentation>
<!-- Presentation attributes end here -->

Student

<!-- Template for student configurable attributes -->
<!DOCTYPE Student SYSTEM "Student.dtd">
<!-- All parameters which can be configured by a student -->
<UserConfigurable>
  <!-- Configuration for a particular iscape -->
  <Config>
    <!-- Correlating information -->
    <Name>Location of environment</Name>
    <!-- The parameters which are configurable -->
    <Parameter>
      <Ontology>Environment</Ontology>
      <Attribute>LocationName</Attribute>
      <DisplayName>Configure Location</DisplayName>
      <Value>Hawaii</Value>
      <Value>Kilauea</Value>
    </Parameter>
  </Config>
  <!-- Configuration for this iscape ends here -->
</UserConfigurable>
<!-- End of all student configurable parameters -->

Student interface

Results

The correlation agent
• Receives the result collections from each of the resource agents
• Correlates the results on the basis of information provided in the iscape and the query plan generated by the planning agent
• Performs data cleaning operations, merges the results into a uniform result set, and passes it on to the user agent
• Responsible for performing operations, if specified in the iscape

Realizing Semantic Information Brokering and Semantic Web: in summary
• Knowledge: Semantic; Knowledge Mgmt., Information Brokering/Mediator, Cooperative IS; Visual, Scientific/Eng.
• Metadata: Structural, Schematic; Mediator, Federated IS; Semi-structured, Text
• Data: Syntax, System; Federated DB; Structured Databases
Popular Alternative perspective/approach: Linguistics, IR, AI

Taking advantage of the Web for learning
Graduate students in a College of Geography have a final project in which a case study is proposed. In the case, they are supposed to help a City Council make decisions on the planning of a new landfill. This is a hands-on learning exercise through interaction with a Digital Earth, and the starting point would be to find the best location for the landfill*.
* This scenario comes in support of one of the suggestions for Digital Earth scenarios sampled by the “First Inter-Agency Digital Earth Working Group”, an effort on behalf of NASA’s inter-agency Digital Earth Program.
[Image: Tacoma Landfill]

An example scenario of learning on the Web
A high level information request would be:
Find a landfill site for a new landfill near the source of the wastes. The earthquakes’ impacts must be evaluated.
[Refinement annotations: by definition; by semantics; by synonymy]
A first cut refinement leads us to the following information request:
Find a proper soil in sites not subject to flooding or high groundwater levels for a new landfill near the industrial zone. Liquefaction phenomenon cannot occur.

An example scenario of learning on the Web
Adding on-the-fly user constraints while processing the information request:
Retrieve satellite images in 12-meter resolution or higher, looking for soils with permeability rate < 10 (silty clay loam) for a new landfill whose distance from the city industrial park is less than 5km. Using the images’ coordinates, forecast seismic activity up to moderate magnitude (5 - 5.9, Richter scale) in the pointed areas.
domain specific metadata; correlation among multiple ontologies; return results in multiple media (in this case, images and a simulation)

An example scenario of learning on the Web
Partial sample ontologies for semantic information brokering:
[Figure: partial ontologies; terms shown include LAND (SITE), LANDFILL SITE, CULTIVATED AREA, GREENLAND AREA, LAND BANK; LAND USE ZONING with RECREATIONAL, MILITARY, AGRICULTURAL, COMMERCIAL, INDUSTRIAL, RESIDENTIAL, RURAL; WASTE DISPOSAL with SOLID, STORM, SEWAGE, HAZARDOUS, RESOURCE REC., LANDFILL, RECYCLING (washing, shredding, magnetic separation, screening); NATURAL DISASTER with FLOOD, TSUNAMI, FIRE, VOLCANO, AVALANCHE, LANDSLIDE, EARTHQUAKE; several terms are linked by “causes” relations]

An example scenario of learning on the Web
A sample result (depending on information providers) could be:
[Figure: identified landfill site; 5km; industrial zone; images source: http://www.orbimage.com]
• OrbView-4’s stereo imaging capacity providing 3-D terrain images
• Hyperspectral data will be valuable for identifying material types
The students now have the information requested for helping the City Council in the planning of the new landfill.
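
The on-the-fly user constraints in the landfill request (image resolution of 12 meters or better, permeability rate < 10, distance from the industrial park < 5km, seismic activity up to magnitude 5.9) could be applied as a simple filter over candidate sites. The following is only a minimal sketch under stated assumptions: the `CandidateSite` record, its field names, and the sample values are hypothetical and are not part of the ADEPT prototype, and "12-meter resolution or higher" is read here as a ground resolution of at most 12 m.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateSite:
    """Hypothetical merged record for one site (not an ADEPT schema)."""
    name: str
    image_resolution_m: float       # ground resolution of the satellite image
    soil_permeability: float        # permeability rate, unit as in the request
    distance_to_industry_km: float  # distance from the city industrial park
    forecast_magnitude: float       # forecast Richter magnitude at the site


# Thresholds taken from the information request on the slide.
MAX_RESOLUTION_M = 12.0   # assumption: "or higher" means 12 m or finer
MAX_PERMEABILITY = 10.0   # strict: permeability rate < 10
MAX_DISTANCE_KM = 5.0     # strict: distance < 5km
MAX_MAGNITUDE = 5.9       # up to moderate magnitude (5 - 5.9)


def satisfies_request(site: CandidateSite) -> bool:
    """Return True when a site meets every user constraint."""
    return (
        site.image_resolution_m <= MAX_RESOLUTION_M
        and site.soil_permeability < MAX_PERMEABILITY
        and site.distance_to_industry_km < MAX_DISTANCE_KM
        and site.forecast_magnitude <= MAX_MAGNITUDE
    )


def select_sites(sites):
    """Keep only the candidate sites that satisfy the request."""
    return [site for site in sites if satisfies_request(site)]


# Made-up sample data, for illustration only.
SAMPLE_SITES = [
    CandidateSite("site-a", 10.0, 8.0, 3.5, 5.2),
    CandidateSite("site-b", 30.0, 8.0, 3.5, 5.2),   # image too coarse
    CandidateSite("site-c", 10.0, 12.0, 3.5, 5.2),  # soil too permeable
    CandidateSite("site-d", 10.0, 8.0, 5.0, 5.2),   # not strictly under 5km
]

if __name__ == "__main__":
    for site in select_sites(SAMPLE_SITES):
        print(site.name)
```

In the scenario itself these values would come from different resources (satellite imagery, soil data, a seismic simulation) and be merged by the correlation agent before any such filter runs; the sketch skips that step and starts from already merged records.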