Web, Semantics, OIL and FUEL:
Semantic Interoperability and learning on the Web
by
Amit Sheth
Director, Large-Scale Distributed Information Systems Lab.
University of Georgia, Athens, GA USA
http://lsdis.cs.uga.edu
Founder/Chairman, Taalee, Inc.
http://www.taalee.com
Special thanks, Digital Library project team at LSDIS
Stanford DB Seminar, October 20, 2000
Semantics: “meaning or relationship of meanings, or
relating to meaning …” (Webster), meaning and use of data
(Information System)
Semantic Web: “The Web of data (and connections) with
meaning in the sense that a computer program can learn
enough about what the data means to process it. . . .
. . . Imagine what computers can understand when there is
a vast tangle of interconnected terms and data that can
automatically be followed.” (Tim Berners-Lee, Weaving the Web, 1999)
• “A Web in which machine reasoning will be
ubiquitous and devastatingly powerful.”
• “A place where the whim of a human being and the
reasoning of a machine coexist in an ideal, powerful
mixture.”
• “A semantic Web would permit more accurate and
efficient Web searches, which are among the most
important Web-based activities.”
— A personal definition
Semantic Web: The concept that Web-accessible
content can be organized semantically, rather than
through syntactic and structural methods.
• Markups/Standards: DAML: Semantic
Annotations and Directory; DSML: Directory
(of course, XML, RDF, namespaces)
• Commercialization 1 (Oingo): Taxonomy –
Ontology and Semantic Techniques
• Commercialization 2 (Taalee): Knowledgebase (Taxonomy, Domain Modeling, Entities
and Relationships) and Semantic Techniques
• Research (Digital Earth at UGA): Complex
Relationships and “deep semantics”
allow semantic interoperability at the level
we currently have syntactic interoperability in XML
1. Create an Agent Mark-Up Language (DAML) built
upon XML that allows users to provide machine-readable
semantic annotations for specific communities of interest.
2. Create tools that embed DAML markup on to web
pages and other information sources in a manner
that is transparent and beneficial to the users.
3. Use these tools to build up, instantiate, operate,
and test sets of agent-based programs that markup
and use DAML.
4., 5., 6. … applications
DARPA Agent Mark Up Language (DAML)
Program Manager: Professor James Hendler
http://dtsn.darpa.mil/iso/programtemp.asp?mode=347
<ONTOLOGY ID="powerpoint-ontology" VERSION="1.0"
DESCRIPTION="formal model for powerpoint presentations">
<Title> DAML
<subtitle> an Example </subtitle>
</title>
<USE-ONTOLOGY ID="PPT-ontology" VERSION="1.0"
PREFIX="PP" URL= "http://iwp.darpa.mil/ppt..html">
<CATEGORY NAME="pp.presentation"
FOR="http://iwp.darpa.mil/jhendler/agents.html">
<RELATION-VALUE POS1 = "Agents" POS2 = "/madhan">
<DEF-CATEGORY NAME="Title" ISA="Pres-Feature" >
<DEF-CATEGORY NAME="Subtitle" ISA="Pres-Feature" >
<DEF-RELATION NAME="title-of"
SHORT="was written by">
<DEF-ARG POS=1 TYPE="presentation">
<DEF-ARG POS=2 TYPE="presenter" >
Objects on the Web can be marked up, in principle (manually or automatically), to include the following
information:
• Descriptions of data they contain (DBs)
• Descriptions of functions they provide (Code)
• Descriptions of data they can provide (Sensors)
Source : http://www.darpa.mil/iso/DAML/
Source: http://www.zdnet.com/pcweek/stories/jumps/0,4270,2432946,00.html
Example of searching on DAML-centric semantic Web
Semantics; Entity+Rel+Events;
Meaning with Context
Directory; Structure; Table of Contents
Search; Syntax; Index
Value of Information
Semantics results in deep understanding of content,
resulting in more relevant and timely match with the
information needs and targeting.
• Oingo Ontology – ODP based(?), the database of
millions of concepts and relationships that powers
Oingo's semantic technology
• Oingo Seek - the database of millions of concepts and
relationships that powers Oingo's semantic technology
• Oingo Sense - the knowledge extraction tool that
uncovers the essential meaning of information by
sensing concepts and context
• Oingo Lingua - the language of meaning used to state
intent. The basis for intelligent interaction
• Assets catalogued are Web sites or Web pages.
Broad taxonomy,
Shallow understanding and results
After 3 or 4 clicks
Taalee WorldModel™: Domain Models (metadata of
domain-media-business attributes, types),
Ontologies, Entities, Relationships, Automated
“Experts”, Reference Data (Live Encyclopedia),
Mappings
Taalee Distributed Intelligent Agent Infrastructure:
push/pull/scheduled agents for fresh extraction
Taalee Metabase of A/V assets
Taalee Semantic Engine™ with contextual reasoning
Semantic Categorization
Semantic Cataloging
Semantic Search
Semantic Directory
Semantic Personalization
Semantic Targeting
Taalee Semantic Engine
Metabase
Extractor Agents
WorldModel™
Metabase: Rapidly growing A/V aggregation
Automatic Extraction Agents: Expert driven value addition
WorldModel: Understanding of content, profiles, targeting needs
Taalee Metadata on Football Assets
Metadata from Typical Cataloging of Football Assets
Virage Search on football touchdown
Rich Media Reference Page
Baltimore 31, Pit 24
http://www.nfl.com
Brian Griese Interview Part Four
Brian Griese talks about the
first touchdown he ever threw.
URL: http://cbs.sportsline...
Jimmy Smith Interview Part Seven
Jimmy Smith explains his
philosophy on showboating.
URL: http://cbs.sportsline...
Qadry Ismail and Tony Banks hook up for their third long
touchdown, this time on a 76-yarder to extend the Ravens’
lead to 31-24 in the third quarter.
League: Professional
Teams: Ravens, Steelers
Score: Bal 31, Pit 24
Players: Qadry Ismail, Tony Banks
Event: Touchdown
Produced by: NFL.com
Posted date: 2/02/2000
Semantic Enrichment
(a commercial perspective)
What else can a context do?
Simply the most precise and freshest A/V search
Delightful, relevant information,
exceptional targeting opportunity
Context and Domain Specific Attributes
Uniform Metadata for Content from Multiple
Sources, Can be sorted by any field
Creating a Web of
related information
What can a context do?
System recognizes ENTITY & CATEGORY
Relevant portion
of the Directory is
automatically
presented.
Users can explore
Semantically related
Information.
Looking ahead
FROM:
Browsing
Lexical search
Data exchange
Data retrieval
TO:
Information requests
Content search
Semantic retrieval
Interpretation
Knowledge creation
Knowledge sharing
Evolving targets and approaches in integrating
data and information (a personal perspective)
Generation III (1997...): Taalee, Observer, ADEPT, InfoQuilt, DL-II/DARPA/KA2 projects, OntoBroker, …
Generation II (1990s): VisualHarness, InfoHarness, InfoSleuth, KMed, DL-I projects, Infoscopes, HERMES, SIMS, Garlic, TSIMMIS, Harvest, RUFUS, ...
Generation I (1980s): Mermaid, DDTS, Multibase, MRDSM, ADDS, IISS, Omnibase, ...
enablers of the emerging concepts
• Terminology (and language) transparency
• Domain modeling (entities with domain specific
attributes) and complex relationships
• Comprehensive metadata management
• Context-sensitive information processing
• Semantic correlation
Digital Earth Prototype System at UGA



• Develop a Digital Earth Modeling System
• Answer requests for collection of
information from distributed resources
• Develop a supportive learning environment
for undergraduate geography students
A Digital Library Scenario
VOLCANOES ACTIVITY
Some volcanoes are more active than others, and a few
are in a state of permanent eruption, at least for the
geological present. Volcanoes may become quiescent
(dormant) for months or years. The danger to life posed by
active volcanoes is not limited to eruption of molten rock or
showers of ash and cinders.
Mudflows that melt ice and
snow on the volcano's flanks
are equally hazardous*.
* Encarta® 98 Desk Encyclopedia © &
1996-97 Microsoft Corporation.All rights reserved.
Pu'u'O'o, Hawaii
A Digital Library Scenario

VOLCANOES ACTIVITY
A sample information request:
Find information on volcanoes in St. Helens and how
they affect the environment.

Some of the ontologies involved in
processing this information request are:
• Ontology for GIS Datasets;
• Ontology for Natural Disasters;
• Ontology for Volcanoes;
• Ontology for Environment;
TRY HERE THIS AND OTHER CONCEPT DEMOS
Iscape working definition
“An iscape is an information request that
supports learning and semantic
interoperability (about Digital Earth)”
(ADEPT at UGA)
Iscapes in the context of digital earth (ADEPT)




• Iscapes are useful for understanding geographical
phenomena, typically involving relationships
between them
• Iscapes are created by instructors using
an iscape specification framework
• Iscapes are run by students while learning
about Digital Earth
• The iscape creation framework fits in the
ADEPT agent-based architecture prototype
Iscape specification framework
[Figure: iscape specification framework. Labels: Information Landscape; Ontologies; Relationships; Operations/Simulation; Presentation; Creation; Learning/What-if]
Information Landscapes

• A modular specification framework to
represent information landscapes
  • Specifications of complex information requests
  over multiple ontologies
  • Specification of relationships, including “affects”
  • Enabling user-configurable parameters
  • Enabling operations including simulations
• A graphical toolkit for easy creation
of iscapes
Information Landscapes

• Learning paradigm for students
  • Uses embedded ontological terms and iscapes
• Metadata framework
  • Models spatial, temporal and theme based
  metadata
  • Uses FGDC and Dublin Core standards to
  represent domain independent metadata
Relations

• Given a set X, a relation is some property that
may or may not hold between one member of
X and a member of another set
• Various relationships:
“equals”, “less_than”, “is_a”, “is_part_of”, “like”
Semantic Relations

• Most of these relations are hierarchical or
similarity based
• These are not powerful enough for our task of
semantic interoperability between domains
like Geography
• In these domains, we have a natural “affects”
relation between the ontologies
Semantic Relations

How does A affect B?
A, in its entirety or by a set of its components,
induces some changes or properties on a set
of components of B
Design of “affects”
How do volcanoes affect the environment?
[Figure: components of VOLCANO (Location, Ash Rain, Pyroclastic Flow) linked to components of ENVIRON. (Location, Building, Atmosphere, People, Plant) by the relations DESTROYS, COOLS, and KILLS]
Design of “affects”
[Area (Pyroclastic Flow) INTERSECT Area (Plant)]
=> [Pyroclastic Flow destroys Plant]
[Size (Ash Particles) < 2] => [Ash Rain cools Atmosphere]
[Pyroclastic Flow destroys Plant] and
[Ash Rain cools Atmosphere]
=>
[Volcano affects Environment]
(x | x ∈ A_SC) and (y | y ∈ B_SC)
[ FN(x) operator FN(y) ]* => [ A_SC relation B_SC ]
[ A_SC relation B_SC ]* => A affects B
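The volcano rules above can be sketched in Python. This is a minimal illustration and not the ADEPT implementation: the bounding-box stand-in for Area(...), the dictionary keys, and the helper names (`Rect`, `intersects`, `component_relations`, `affects`) are all assumptions. Only the two component rules, the threshold Size(Ash Particles) < 2, and their conjunction come from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned bounding box used as a stand-in for Area(...) (assumption)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def intersects(a: Rect, b: Rect) -> bool:
    """INTERSECT operator: true when the two boxes overlap or touch."""
    return (a.xmin <= b.xmax and b.xmin <= a.xmax
            and a.ymin <= b.ymax and b.ymin <= a.ymax)


def component_relations(volcano: dict, environment: dict) -> list[str]:
    """Evaluate each [FN(x) operator FN(y)] rule and collect the relations that hold."""
    relations = []
    if intersects(volcano["pyroclastic_flow_area"], environment["plant_area"]):
        relations.append("Pyroclastic Flow destroys Plant")
    if volcano["ash_particle_size"] < 2:
        relations.append("Ash Rain cools Atmosphere")
    return relations


def affects(volcano: dict, environment: dict) -> bool:
    """Volcano affects Environment when both component relations hold, as on the slide."""
    required = {"Pyroclastic Flow destroys Plant", "Ash Rain cools Atmosphere"}
    return required <= set(component_relations(volcano, environment))
```

A call such as `affects({"pyroclastic_flow_area": Rect(0, 0, 10, 10), "ash_particle_size": 1.5}, {"plant_area": Rect(5, 5, 20, 20)})` evaluates both component rules before concluding the top-level relation.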
Mapping Functions
How do volcanoes affect the environment?

• [ Location (Volcano) = Location (Environment) ]
• Enclosing function provides a standard
interface to the operator
• Operator does imprecise or fuzzy match
• Achieves geo-spatial interoperability
Mapping Functions
How do volcanoes affect the environment?

• [ Time (Volcano) = Time (Environment) ]
• Matches, with a tolerance depending on the
granularity of values
• Tolerance different for different entities;
specified default; can be user-defined
• Achieves temporal interoperability
Operations


• Powerful mechanism for studying geographical
domains and other complex phenomena
• Input parameters can be changed to support learning
• E.g., statistical operations, numerical analysis,
simulation modeling, etc.
[Figure: an operation f(i1, ..., in, o1, ..., om) with inputs i1, i2, ... and outputs o1, ..., om, connected to Metadata Objects (site, table, keyword, image …) and a User Object]
Clarke’s Urban Growth Model (UGM)
Domain of Learning – URBAN DYNAMICS
Demonstrates the utility of integrating existing historic maps
with remotely sensed data and related geographic information
to dynamically map urban land characteristics for large
metropolitan areas.
San Francisco Bay Area prediction of urban extent in 2100
Digital Earth Prototype: run-time architecture overview
[Figure: run-time architecture. Components shown: User Agent, Planning Agent, Broker, Ontology Agent, Correlation Agent, Cost Model, RELATE; Wrapped Resource Agent, Metabase Resource Agent, Simulation Resource Agent; Web Wrapper, Database Wrapper, Simulation; ADEPT Metabase]
Semantic Web: Possible Evolution
FUEL – User defined/supplied
operators, functions, computations
Declarative Languages
DAML-O, OIL
XHTML
HTML
SMIL
XML
RDF
FUEL as OIL Extension?
RDF(S):
• class-def
• subclass-of
• slot-def
• subslot-of
• domain
• range
OIL:
• class-expressions: AND, OR, NOT
• slot-constraints: has-value, value-type, cardinality
• slot-properties: trans, symm
FUEL (extending OIL):
• Framework for mapping data/formats
• user defined operators, e.g., affects, simulations
The Promise of the Web with Semantics….
• The Semantic Web can be a basis for handling information
overload and providing semantic interoperability
• Stepwise enrichment: starting with a constrained and
well-understood language (such as one based on Description
Logic), let us explore how we can support richer/deeper
semantics for enabling complex decision making and
learning involving heterogeneous digital media on the
Global Information Infrastructure
Further reading
http://www.semanticweb.org http://www.daml.org http://lsdis.cs.uga.edu/~adept
“DAML could take search to a new level”
http://www.zdnet.com/pcweek/stories/news/0,4153,2432538,00.html
V. Kashyap and A. Sheth, Information Brokering, Kluwer Academic Publishers, 2000
Tim Berners-Lee, Weaving the Web, Harper, 1999.
Editorial writing by Ramesh Jain in IEEE Multimedia. Gio’s papers. OIL ….
“Humankind has not woven the web of life.
We are but one thread within it.
Whatever we do to the web, we do to
ourselves.
All things connect.”
– Chief Seattle, 1854
amit@taalee.com – http://www.taalee.com
amit@cs.uga.edu – http://lsdis.cs.uga.edu
For additional details on Information Brokering Architecture:
Realizing Semantic Information Brokering and Semantic Web
ITC-IRST/University of Trento Seminar Series on
Perspectives on Agents: Theories and Technologies,
April 27, 2000, Trento, Italy
http://lsdis.cs.uga.edu/~adept/presenta.html
For additional details on ISCAPE specification and Execution:
Project Overview and Detailed Presentation at:
http://lsdis.cs.uga.edu/~adept/presenta.html
Demonstrations at: http://lsdis.cs.uga.edu/~adept
Iscape specification using XML
<?xml version="1.0"?>
<!-- A template collection for all iscapes -->
<!DOCTYPE IscapeCollection SYSTEM "IscapeCollection.dtd">
<!-- All Iscapes -->
<IscapeCollection>
<!-- An iscape specification for how stratovolcanoes affect the
environment -->
<Iscape>
<!-- Identifying this iscape -->
<Name> How do stratovolcanoes affect the environment </Name>
<Description> An iscape using the affects relationship
</Description>
<!-- All ontologies which participate -->
<Ontologies>
<Ontology>Volcano</Ontology>
<Ontology>Environment</Ontology>
</Ontologies>
<!-- Operations involved -->
<Operation>
<Relation>Affects</Relation>
</Operation>
Iscape specification using XML
<!-- Constraints on ontologies -->
<OntologicalConstraints>
<Constraint> Volcano morphology is stratovolcano </Constraint>
<Constraint> Volcano start year is 1950 </Constraint>
</OntologicalConstraints>
<!-- Metadata to present in the result -->
<Presentation> Volcano and Environment Metadata </Presentation>
<!-- What can the student configure -->
<Student>
<Config> Location of Environment </Config>
</Student>
</Iscape>
<!-- This Iscape Ends -->
<!-- Next Iscape starts -->
<Iscape>
…
…
</Iscape>
</IscapeCollection>
<!-- Iscape Collection ends here -->
Relations
<?xml version="1.0"?>
<!-- Template collection of all relations in the system -->
<!DOCTYPE Relations SYSTEM "Relations.dtd">
<Relations>
<!-- Relation specification starts here -->
<Relation>
<!-- Information to correlate with base iscape -->
<Name> Affects </Name>
<!-- Ontologies Involved -->
<OntologyA> Volcano </OntologyA>
<OntologyB> Environment </OntologyB>
<!-- All operators -->
<OperatorSet>
<!-- Specification has value and mapping conditions -->
<ValueCondition>
<OntologyName> Environment </OntologyName>
<Attribute> Damage </Attribute>
<ValOperator> GREATERTHANEQUALS </ValOperator>
<Value> 10000 </Value>
<Type> Integer </Type>
</ValueCondition>
Relations
<MappingCondition>
<FunctionA>Area</FunctionA>
<ElementA>Volcano</ElementA>
<Operator>EQUALS</Operator>
<FunctionB>Area</FunctionB>
<ElementB>Environment</ElementB>
</MappingCondition>
</OperatorSet>
<!-- End of all operators -->
</Relation>
<!-- End of this relation specification -->
</Relations>
<!-- End of relation collection -->
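A value condition from a relation specification like the one above could be loaded and checked roughly as follows. This is a sketch under assumptions: the operator table, the type-cast table, and the function names `load_value_conditions` and `holds` are hypothetical, and the embedded XML is a trimmed, well-formed copy of the slide's example. It uses Python's standard `xml.etree.ElementTree` parser and is not the ADEPT code.

```python
import operator
import xml.etree.ElementTree as ET

RELATION_XML = """
<Relations>
  <Relation>
    <Name> Affects </Name>
    <OntologyA> Volcano </OntologyA>
    <OntologyB> Environment </OntologyB>
    <OperatorSet>
      <ValueCondition>
        <OntologyName> Environment </OntologyName>
        <Attribute> Damage </Attribute>
        <ValOperator> GREATERTHANEQUALS </ValOperator>
        <Value> 10000 </Value>
        <Type> Integer </Type>
      </ValueCondition>
    </OperatorSet>
  </Relation>
</Relations>
"""

# Assumed mapping from operator names in the spec to Python comparisons.
OPERATORS = {"GREATERTHANEQUALS": operator.ge, "EQUALS": operator.eq}
# Assumed mapping from <Type> names to Python casts.
CASTS = {"Integer": int, "String": str}


def load_value_conditions(xml_text: str) -> list[tuple]:
    """Return (ontology, attribute, comparison, typed value) for each ValueCondition."""
    root = ET.fromstring(xml_text)
    conditions = []
    for node in root.iter("ValueCondition"):
        field = {child.tag: (child.text or "").strip() for child in node}
        cast = CASTS[field["Type"]]
        conditions.append((field["OntologyName"], field["Attribute"],
                           OPERATORS[field["ValOperator"]], cast(field["Value"])))
    return conditions


def holds(condition: tuple, record: dict) -> bool:
    """Check one condition against a metadata record of the named ontology."""
    _ontology, attribute, compare, value = condition
    return compare(record[attribute], value)
```

Keeping operators in a lookup table means a new operator name only needs one new table entry, which fits the slide's idea of user-defined operators.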
Ontological Constraints
<?xml version="1.0"?>
<!-- Template to specify ontological constraints -->
<!DOCTYPE OntologicalConstraints SYSTEM "OntologicalConstraints.dtd">
<!-- A collection of ontological constraints for all iscapes -->
<OntologicalConstraints>
<!-- A constraint on this iscape -->
<Constraint>
<IscapeID>Volcano-Env</IscapeID>
<Name>Volcano morphology is stratovolcano</Name>
<LHSOntology>Volcano</LHSOntology>
<LHSAttribute>Morphology</LHSAttribute>
<Operator>LIKE</Operator>
<Type>String</Type>
<RHSValue>Stratovolcano</RHSValue>
</Constraint>
</OntologicalConstraints>
<!-- Collection of ontological constraints ends here -->
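Applying such a constraint to candidate records could be sketched as below. The slide does not define LIKE, so treating it as a case-insensitive substring match is an assumption, as are the sample records and the names `like`, `load_constraints`, and `apply_constraint`.

```python
import xml.etree.ElementTree as ET

CONSTRAINT_XML = """
<OntologicalConstraints>
  <Constraint>
    <IscapeID>Volcano-Env</IscapeID>
    <Name>Volcano morphology is stratovolcano</Name>
    <LHSOntology>Volcano</LHSOntology>
    <LHSAttribute>Morphology</LHSAttribute>
    <Operator>LIKE</Operator>
    <Type>String</Type>
    <RHSValue>Stratovolcano</RHSValue>
  </Constraint>
</OntologicalConstraints>
"""


def like(value: str, pattern: str) -> bool:
    """Assumed LIKE semantics: case-insensitive substring match."""
    return pattern.lower() in value.lower()


def load_constraints(xml_text: str) -> list[dict]:
    """Read each <Constraint> into a dict keyed by child tag name."""
    root = ET.fromstring(xml_text)
    return [{child.tag: (child.text or "").strip() for child in node}
            for node in root.findall("Constraint")]


def apply_constraint(constraint: dict, records: list[dict]) -> list[dict]:
    """Keep the records whose left-hand attribute satisfies the constraint."""
    if constraint["Operator"] != "LIKE":
        raise ValueError(f"unsupported operator: {constraint['Operator']}")
    attribute = constraint["LHSAttribute"]
    return [r for r in records
            if like(str(r.get(attribute, "")), constraint["RHSValue"])]


# Illustrative records only; not taken from the ADEPT metabase.
volcanoes = [
    {"Name": "St. Helens", "Morphology": "Stratovolcano"},
    {"Name": "Kilauea", "Morphology": "Shield volcano"},
]
selected = apply_constraint(load_constraints(CONSTRAINT_XML)[0], volcanoes)
```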
Presentation
<?xml version="1.0"?>
<!-- Template for presentation attributes -->
<!DOCTYPE Presentation SYSTEM "Presentation.dtd">
<!-- All presentation attributes are embedded here -->
<Presentation>
<!-- presentation attributes for this iscape -->
<IncludeThese>
<IscapeID>Volcano-Env</IscapeID>
<Name>Volcano and Environment Metadata</Name>
<Include>
<Ontology>Volcano</Ontology>
<Attribute>TectonicSetting</Attribute>
</Include>
<Include>
<Ontology>Volcano</Ontology>
<Attribute>EndYear</Attribute>
</Include>
</IncludeThese>
</Presentation>
<!-- Presentation attributes end here -->
Student
<!-- Template for student configurable attributes -->
<!DOCTYPE Student SYSTEM "Student.dtd">
<!-- All parameters which can be configured by a student -->
<UserConfigurable>
<!-- Configuration for a particular iscape -->
<Config>
<!-- Correlating information -->
<Name>Location of environment</Name>
<!-- The parameters which are configurable -->
<Parameter>
<Ontology>Environment</Ontology>
<Attribute>LocationName</Attribute>
<DisplayName>Configure Location</DisplayName>
<Value>Hawaii</Value>
<Value>Kilauea</Value>
</Parameter>
</Config>
<!-- Configuration for this iscape ends here -->
</UserConfigurable>
<!-- End of all student configurable parameters -->
Student interface
Results
The correlation agent




• Receives the result collections from each
of the resource agents
• Correlates the results on the basis of information provided
in the iscape and the query plan generated by the planning
agent
• Performs data cleaning operations, merges the
results into a uniform result set, and passes it on to the user
agent
• Is responsible for performing operations, if specified in
the iscape
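The cleaning-and-merging step can be illustrated with a small sketch. Everything here is an assumption made for illustration: the source names, the per-source field mappings, the choice of a single correlation key, and the function names `clean` and `correlate`. The slides do not describe how the correlation agent is implemented.

```python
def clean(record: dict, field_map: dict) -> dict:
    """Rename source-specific fields to uniform names and trim string values."""
    return {field_map.get(key, key): (value.strip() if isinstance(value, str) else value)
            for key, value in record.items()}


def correlate(collections: dict, field_maps: dict, key: str) -> list[dict]:
    """Merge result collections from several resource agents into one uniform result set.

    Records that share the same value for `key` are combined into a single row.
    """
    merged: dict = {}
    for source, records in collections.items():
        for record in records:
            row = clean(record, field_maps.get(source, {}))
            merged.setdefault(row[key], {}).update(row)
    return list(merged.values())


# Hypothetical result collections from two resource agents.
collections = {
    "web": [{"volcano": " St. Helens ", "morph": "Stratovolcano"}],
    "db": [{"name": "St. Helens", "start_year": 1980}],
}
field_maps = {"web": {"volcano": "name", "morph": "morphology"}}
uniform = correlate(collections, field_maps, key="name")
```

In this sketch the two partial records about the same volcano end up as one row in `uniform`, which is the form a user agent would receive.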
Realizing Semantic Information Brokering
and Semantic Web in summary
[Figure: three levels of information and the systems that address them]
Knowledge: Semantic; Visual, Scientific/Eng.; Knowledge Mgmt., Information Brokering/Mediator, Cooperative IS
Metadata: Structural, Schematic; Semi-structured, Text; Mediator, Federated IS
Data: Syntax, System; Structured Databases; Federated DB
Popular Alternative perspective/approach: Linguistics, IR, AI
Taking advantage of the Web for learning
Graduate students in a College of Geography have a final
project in which a case study is proposed. In the case,
they are supposed to help a City Council make
decisions about the planning of a new landfill. This is a
hands-on learning exercise through interaction
with a Digital Earth, and the starting
point would be to find the best
location for the landfill*.
* This scenario supports one of the suggestions for
Digital Earth scenarios sampled by the “First Inter-Agency Digital
Earth Working Group”, an effort on behalf of NASA’s inter-agency
Digital Earth Program.
Tacoma Landfill
An example scenario of learning on the Web

A high-level information request would be:
Find a landfill site for a new landfill near the source of the wastes.
The earthquakes’ impacts must be evaluated.
A first-cut refinement (by definition, by semantics, by synonymy) leads us to the following
information request:
Find a proper soil in sites not subject to flooding or high
groundwater levels for a new landfill near the industrial zone.
Liquefaction phenomenon cannot occur.
An example scenario of learning on the Web

Adding on-the-fly user constraints while processing the
information request:
Retrieve satellite images in 12-meter resolution or higher,
looking for soils with permeability rate < 10 (silty clay loam)
for a new landfill
whose distance from the city industrial park is less than 5 km.
Using the images’ coordinates, forecast seismic activity up to
moderate magnitude (5 - 5.9, Richter scale) in the pointed areas.
• domain specific metadata;
• correlation among multiple ontologies;
• return results in multiple media (in this case, images and a simulation)
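The user constraints in this request translate directly into a filter over domain-specific metadata. The sketch below assumes a hypothetical `CandidateSite` record and sample values. Only the three thresholds (12-meter resolution or finer, permeability rate < 10, distance < 5 km) come from the slide, and the permeability unit is left unspecified as it is there.

```python
from dataclasses import dataclass


@dataclass
class CandidateSite:
    """Hypothetical metadata record for one candidate landfill site."""
    name: str
    image_resolution_m: float   # ground resolution of the satellite image, in meters
    permeability_rate: float    # soil permeability rate, unit as on the slide
    distance_km: float          # distance from the city industrial park


def satisfies_request(site: CandidateSite) -> bool:
    """Apply the three on-the-fly constraints from the information request."""
    return (site.image_resolution_m <= 12      # "12-meter resolution or higher"
            and site.permeability_rate < 10    # e.g. silty clay loam
            and site.distance_km < 5)


# Illustrative candidates only.
candidates = [
    CandidateSite("Site A", image_resolution_m=4, permeability_rate=6, distance_km=3.2),
    CandidateSite("Site B", image_resolution_m=30, permeability_rate=6, distance_km=3.2),
    CandidateSite("Site C", image_resolution_m=4, permeability_rate=25, distance_km=1.0),
    CandidateSite("Site D", image_resolution_m=8, permeability_rate=9, distance_km=7.5),
]
shortlist = [site.name for site in candidates if satisfies_request(site)]
```

The seismic forecast in the request would be a separate operation run on the shortlisted sites and is not modeled here.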
An example scenario of learning on the Web
Partial sample ontologies for semantic information brokering:
[Figure: partial ontologies with “causes” links between some concepts. Terms shown: LAND USE ZONING, RECREATIONAL, MILITARY, AGRICULTURAL, COMMERCIAL, INDUSTRIAL, RESIDENTIAL, RURAL; LANDFILL SITE, LAND (SITE), CULTIVATED AREA, GREENLAND AREA, LAND BANK; WASTE DISPOSAL, SOLID, STORM, SEWAGE, HAZARDOUS, RESOURCE REC., LANDFILL; RECYCLING (washing, shredding, magnetic separation, screening); NATURAL DISASTER (FLOOD, TSUNAMI, FIRE, VOLCANO, AVALANCHE, LANDSLIDE, EARTHQUAKE)]
An example scenario of learning on the Web
A sample result (depending on information providers) could be:
identified landfill site
5km
industrial zone
images source:
http://www.orbimage.com
OrbView-4’s stereo imaging capacity
providing 3-D terrain images

Hyperspectral data will be valuable for
identifying material types
The students now have the information requested for
helping the City Council in the planning of the new landfill