Relationships at the Heart of Semantic Web SOFSEM 2002 , Keynote

advertisement
Relationships at the Heart of
Semantic Web
Keynote
SOFSEM 2002 , Milovy, Czech Republic, Nov 25 2002
Amit Sheth
Large Scale Distributed Information Systems (LSDIS) Lab
University Of Georgia; http://lsdis.cs.uga.edu
CTO, Semagix, Inc. http://www.semagix.com
November 2002
© Amit Sheth
Semantics:
The Next Step in the Web’s Evolution
The Semantic Web -- a vision with several views:
•·“The Web of data (and connections) with meaning in the
sense that a computer program can learn enough about what
data means to process it.” [B99]
•·“The semantic Web is an extension of the current Web in
which information is given well-defined meaning, better
enabling computers and people to work in cooperation.”
[BHL01]
•·“The Semantic Web is a vision: the idea of having data on
the Web defined and linked in a way that it can be used by
machines not just for display purposes, but for automation,
integration and reuse of data across various applications.
[W3C01]
Semantics for the Web
On the Semantic Web every resource (people,
enterprises, information services, application services,
and devices) are augmented with machine
processable descriptions to support the finding,
reasoning about (e.g., which service is best), and
using (e.g., executing or manipulating) the resource.
The idea is that self-descriptions of data and other
techniques would allow context-understanding
programs to selectively find what users want, or for
programs to work on behalf of humans and
organizations to make them more efficient and
productive.
Move from Syntax to Semantics in
Information System (a personal perspective)
Semantics
Generation III
(information
brokering)
1997...
Generation II
(mediators)
1990s
ADEPT
Semantic Web, some DL-II projects,
VideoAnywhere
Taalee Semantic Engine, Oingo
InfoQuilt
Metadata
VisualHarness
InfoHarness
Data
Generation I
(federated DB/
multidatabases)
1980s
(Ontology, Context, Relationships, KB)
Mermaid
DDTS
(Domain model)
InfoSleuth, KMed, DL-I projects
Infoscopes, HERMES, SIMS,
Garlic,TSIMMIS,Harvest, RUFUS,...
(Schema, “semantic data modeling)
Multibase, MRDSM, ADDS,
IISS, Omnibase, ...
Semantics and Relationships
Semantics is derived from relationships. Consider the
linguistics perspective.
“Semantics is the study of meaning. …We may
distinguish a number of legitimate ways to approach
semantics:
• …
• the relationship between linguistic expressions (e.g.
synonymy, antonymy, hyperonymy, etc.): sense;
• the relationship to linguistic expressions to the "real
world": reference. “
Ontologies in KR help capture the above.
Quoted part from http://www.ncl.ac.uk/sml/staff/. © 2000 Jonathan West.
Why is this a hard problem?
Are object/entities equivalent/same?
How (well) are they related?
• partial and fuzzy match: related, relevant..
• related in a “context”
• semantic similarity, proximity, distance, ….
Semantic ambiguity, also based on incomplete,
inconsistent, approximate
information/knowledge
Many problems have stumbled across these
issues e.g., schema integration (in database
management area)
Semantics and Relationships
Increasing depth and sophistication in dealing
with semantics by dealing with
(identifying/searching to analyzing) documents,
entities, and relationships.
Future
Relationships
Current
Entities
Past
Documents
Issues - Relationships
•
•
•
•
Identifying Relationship
Discovering Relationships
Validating Relationships
Utilizing Relationships (in document
search, querying metadata, analysis…)
Specifying/Describing Relationships
• Expressiveness of specification language
– In relational model
– In semantic data model, e.g., E-R variants
– In logic, e.g., description logics
–…
Relationship Modeling in Various
Representation Models …
Catalog/ID
Thesauri
“narrower
term”
relation
DB Schema
Formal
is-a
Frames
(properties)
UMLS
RDF
Wordnet
OO
Terms/
glossary
Simple
Taxonomies
Informal
is-a
Formal
instance
RDFS
Disjointness,
Inverse,
part of…
DAML CYC
OWL IEEE SUO
Value
Restriction
General
Logical
constraints
Expressive
Ontologies
After Deborah L. McGuinness (Stanford) and Tim Finin (UMBC)
Sampling Issues in Relationshipsoutline of this talk
• Simple Relationships – already known
– Representation
– Identification/Querying: “Which entities are related to
entity X via relationship R?” where R is typically specified as
possibly a join condition or path expression
• Complex relationships
– Rho: discovery from large document set with
associated metadata and ontologies: “How is X
related to Y?”
– ISCAPEs: validation/ human-directed knowledge
discovery
Metadata and Ontology:
Primary Semantic Web enablers
User
Ontologies
Classifications
Domain Models
Domain Specific Metadata
area, population (Census),
land-cover, relief (GIS),metadata
concept descriptions from ontologies
Domain Independent (structural) Metadata
(C++ class-subclass relationships, HTML/SGML
Document Type Definitions, C program structure...)
Direct Content Based Metadata
(inverted lists, document vectors, LSI)
Content Dependent Metadata (size, max colors, rows, columns...)
Content Independent Metadata (creation-date, location, type-of-sensor...)
Data (Heterogeneous Types/Media)
SCORE technology
Knowledge
Agents
Semantic Enhancement Server
Automatic
Classification
Classification Committee
Entity Extraction,
Enhanced Metadata,
Domain Experts
Knowledge
Sources
KA
KS
KS
Ontology
Content
Sources
KA
Content
Agents
KS
KA
Databases
CA
XML/Feeds
Knowledge
Agent
Monitor
CA
Websites
Email
CA
Reports
Documents
Metabase
Content
Agent
Monitor
CA
Toolkit
Metadata
adapter
KA
Toolkit
Metadata
adapter
Enterprise Content
Applications
Semantic Query Server
Ontology
and
Metabase
Main Memory
Index
KS
Automatic Classification & Metadata
Extraction (Web page)
Video with
Editorialized
Text on the Web
Auto
Categorization
Semantic Metadata
Semantic Annotation
COMTEX Tagging
Value-added Voquette Semantic Tagging
Content
‘Enhancement’
Rich Semantic
Metatagging
Limited tagging
(mostly syntactic)
Value-added
relevant metatags
added by Voquette
to existing
COMTEX tags:
• Private companies
• Type of company
• Industry affiliation
• Sector
• Exchange
• Company Execs
• Competitors
Information Extraction
for Metadata Creation
WWW, Enterprise
Repositories
Nexis
UPI
AP
Feeds/
Documents
Digital Videos
...
...
Data Stores
Digital Maps
...
Digital Images
Key challenge:
Create/extract as much (semantics)
metadata automatically as possible
Digital Audios
EXTRACTORS
METADATA
Automatic Semantic Annotation of Text:
Entity and Relationship Extraction
Ontology-directed Metadata
Extraction (Semi-structured data)
Web Page
Enhanced Metadata Asset
Extraction
Agent
Entity and Semantic
Metadata Extraction
Syntax Metadata
Semantic Metadata
Semantic Metadata Enhancement
Semantic Application Example
– Research Dashboard
Automatic
3rd party
content
integration
Focused
relevant
content
organized
by topic
(semantic
categorization)
Related relevant
content not
explicitly asked for
(semantic
associations)
Competitive
research
inferred
automatically
Automatic Content
Aggregation
from multiple
content providers
and feeds
Semantic Web –
Intelligent Content
Intelligent Content = What You Asked for + What you need to know!
Related
Stock
News
COMPANY
Competition
COMPANIES in
INDUSTRY with
Competing PRODUCTS
COMPANIES in Same or
Related INDUSTRY
Regulations
Technology
Products
Important to INDUSTRY
or COMPANY
Industry
News
EPA
Impacting INDUSTRY
or Filed By COMPANY
SEC
Knowledge-based and
Manual Associations
Syntax Metadata
Same
entity
led by
Semantic Metadata
Humanassisted
inference
Blended Semantic Browsing and Querying
(Intelligence Analyst Workbench)
Relationship Discovery
• Problem
Huge volumes of data. Need to find relationships
between two entities in the Semantic Web.
Application areas such as National Security,
Intelligence Services, Bioinformatics.
Relationship can be of different kinds.
Example
Terrorist
Organization
Ramzi
Binalshibh
Osama,
bin laden
name
leaderOf
AlQaida
memberOf
friendOf
Mohammad
Atta
name
passengerOf
name
Introduction
• Semantic Association
Complex relationships which capture
connectivity and similarity of entities in a
knowledge base
– Complex
• Involve multiple relations
– Connectivity
• Includes both directed paths and undirected paths
Similarity
• Specific notion of an isomorphism, based on the
similarity of roles of entities.
RDF Graph Data Model
• By using a graph data model, Semantic
Associations can be viewed in terms of graph
traversals
• We can distinguish between different types of
Semantic Associations based on structural
properties
• For example, a path, intersecting paths,
isomorphic paths.
• We use the RDF Graph Data Model, to model
Semantic Associations.
Example Graph
String
creates
String
Artist
lname
Sculptor
exhibited
Artifact
Museum
String
String
-pathConnected(x, y):String
is true if there is Integer
a path
Ext. Resource
last_modified
paints
technique
a, p2, b, p3,String…. y> in the
Painter<x, p1,
Painting
knowledge base
sculpts
material
Sculpture
&r3
“Pablo”
title
Date
“Reina Sofia
Museun”
&r1
&r2
“Picasso”
2000-6-09
“oil on
canvas”
“Rembrandt”
&r5
&r4
X p1
a
technique
typeOf(instance)
“oil on
canvas”
p2
subClassOf(isA)
Y
subPropertyOf
“Rodin”
&r6
“August”
creates
&r7
exhibited
&r8
title
“Rodin
Museum”
lname
“image/jpeg”
2000-02-01
Example Graph
String
-joinConnected(x, y): is true
creates
P1
=
<x,
pa,
a, pb,Artifact
b, pc,
String
lname Artist
if there two paths P1, P2 such that:
c,exhibited
pd…k, pl
l, pm, m>Stringand
Museum
String
P2 = <y, pu, b, pv,…k, pw, l, py, n>
Or
material
sculpts
Integer
Sculptor
P1 = < a, pa, b, pb,…k,
pk, l, pl, x String
> and
Sculpture
P2 = < y, pu, b, pv, m, pw, l,…k, p5, l, p6, n >
Ext. Resource
Painter
paints
Painting
technique
Date
last_modified
String
n
X
a
“Pablo”
&r3
title
“Reina Sofia
Museun”
&r1
&r2
“Picasso”
y
“Rembrandt”
b
&r5
&r4
2000-6-09
“oil on
canvas”
technique
k
m
typeOf(instance)
“oil on
canvas”
subClassOf(isA)
subPropertyOf
“Rodin”
&r6
“August”
creates
&r7
exhibited
&r8
title
“Rodin
Museum”
lname
“image/jpeg”
2000-02-01
Example
Graph
-isoConnected(x, y) is true if there two paths P1, P2 such that:
P1 = <x,
pa, a, pb, b, pc, c> and
creates
exhibited
String
P2
=
<y,
pu,
b,
pv,
m,
pw, l>Museum
Artist
Artifact
lname
and
x  y, a  b, c  l …….
material
sculpts
Sculptor
String
Sculpture
pa  pu, pb  pv, pc  pw ….
String
String
String
Integer
Ext. Resource
Painter
paints
X
Painting
technique
c
String
pc
pa a
&r3
“Pablo”
last_modified
title
Date
“Reina Sofia
Museun”
&r1
&r2
“Picasso”
“oil on
canvas”
“Rembrandt”
&r5
&r4
Y&r6
“Rodin”
“August”
2000-6-09
pa
creates
u
&r7
technique
“oil on
canvas”
p1exhibited
typeOf(instance)
1
&r8
subClassOf(isA)
subPropertyOf
title
“Rodin
Museum”
lname
“image/jpeg”
2000-02-01
 Operators
• The  Operator computes Semantic Associations
between two entities.
• Three kinds of Operators are defined.
–  Path : This operator returns all paths between two
entities in the data model.
–  Connect : This operator returns intersecting paths,
on which the two entities lie.
–  Iso : -isomorphic paths implies a similarity of nodes
and edges along the paths, and returns such similar
paths between entities.
Formalism
• -pathConnected(x, y): is true if there is a path
– <x, p1, a, p2, b, p3, …. y> in the knowledge base
• -joinConnected(x, y): is true if there two paths
P1, P2 such that:
– P1 = <x, pa, a, pb, b, pc, c, pd…k, pl l, pm, m> and
– P2 = <y, pu, b, pv,…k, pw, l, py, n>
Or
– P1 = < a, pa, b, pb,…k, pk, l, pl, x > and
– P2 = < y, pu, b, pv, m, pw, l,…k, p5, l, p6, y >
Complex Relationships
• Arise in several contexts, especially involving
multiple ontologies (hence mappings)
– information interoperability where related resources
subscribe to different but related ontologies
– information requestor and resource modelers choose to
use different ontologies
– information requests to support analysis, knowledge
discovery, decision making, learning that requires linking
multiple domains with different ontologies
Developing all encompassing, unified ontology is not shown to be practical.
Preexisting classifications/metadata standards/taxonomies are hard to
ignore.
Complex Relationships -
Cause-Effects & Knowledge discovery
ENVIRON.
VOLCANO
BUILDING
LOCATION
LOCATION
ASH RAIN DESTROYS
PYROCLASTIC
FLOW
WEATHER
PEOPLE
COOLS TEMP
DESTROYS
KILLS
PLANT
InfoQuilt System Core capabilities
• Ability to handle heterogeneous, static or dynamic
content – wrappers & extractors, with resource modeling
(completeness, data characteristics, binding patterns)
• Information Extraction: Semi-Automatically or
Automatically create domain-specific or
contextually relevant metadata
• Domain modeling with complex (user defined,
inter-ontology) relationships, domain rules and FD
• User defined Functions (esp. for
fuzzy/approximate matching) and Simulation
• Post processing result analysis (e.g., chart
creator)
Inter-Ontological Relationships
A nuclear test could have caused an earthquake
if the earthquake occurred some time after the
nuclear test was conducted and in a nearby region.
NuclearTest Causes Earthquake
<= dateDifference( NuclearTest.eventDate,
Earthquake.eventDate ) < 30
AND distance( NuclearTest.latitude,
NuclearTest.longitude,
Earthquake,latitude,
Earthquake.longitude ) < 10000
IScape (Information Scape)
A computing paradigm that allows users to query
and analyze the data available from a diverse
autonomous sources, gain better understanding
of the domains and their interactions as well as
discover and study relationships.
IScape …a simple example
• user’s request
– for semantically related information
(regardless of all forms of heterogeneity)
– specified in terms of components of knowledge base
(domain model, relationships, functions, simulations)
“Find all earthquakes with epicenter less than 5000 mile from the
location at latitude 60.790 North and longitude 97.570 East and
find all tsunamis that they might have caused”
Next - KD using ISacpes
Knowledge Discovery - Example
Earthquake Sources
Nuclear Test Sources
Nuclear Test May Cause Earthquakes
Is it really true?
Complex
Relationship:
How do you
model this?
Knowledge Discovery - Example
When was the first recorded nuclear test conducted?
1950
Find the total number of earthquakes with a magnitude
5.8 or higher on the Richter scale per year starting from 1900
Increase in number of
earthquakes since 1945
Knowledge Discovery –
exploring relationship…
For each group of earthquakes with magnitudes in the ranges
5.8-6, 6-7, 7-8, 8-9, and >9 on the Richter scale per year
starting from 1900, find number of earthquakes
Number of earthquakes with
magnitude > 7 almost constant.
So nuclear tests probably only
cause earthquakes with
magnitude < 7
Knowledge Discovery - Example…
Find nuclear tests and earthquakes that may have
occurred as a result of the test
KB
Knowledge Builder
IScape Builder
IScape Execution
Knowledge
Final Results
IScape
Plan
Query
Query
IScape
Final Results
Plan
Data retrieved
Query
IScape 1
Resource
Resource
NuclearTestsDB(
testSite, explosiveYield, waveMagnitude,
testType, eventDate, conductedBy,
[dc] waveMagnitude > 3 );
SignificantEarthquakesDB(
eventDate, description, region,
magnitude, latitude, longitude,
numberOfDeaths, damagePhoto,
[dc] eventDate > “January 1, 1970” );
http://sun00781.dn.net/nuke/hew/Library/Catalog
Resource
USGS site
NuclearTestSites(
testSite, latitude, longitude );
Ontology
NuclearTest(
testSite, explosiveYield, waveMagnitude, testType,
eventDate, conductedBy, latitude, longitude,
waveMagnitude > 0, waveMagnitude < 10,
testSite -> latitude longitude );
Ontology
Earthquake(
eventDate, description, region,
magnitude, latitude, longitude,
numberOfDeaths, damagePhoto,
magnitude > 0 );
Relationship
NuclearTest Causes Earthquake
<= dateDifference( NuclearTest.eventDate, Earthquake.eventDate ) < 30
AND distance( NuclearTest.latitude, NuclearTest.longitude,
Earthquake,latitude, Earthquake.longitude ) < 10000
IScape
“Find all nuclear tests conducted by India or Pakistan after January 1, 1995 with
seismic body wave magnitude > 4.5 and find all earthquakes that could have been
caused due to these tests.”
IScape Processing Monitor
Future
Future work in Semantic Web will
increasingly focus on all dimensions of
relationships, especially complex
relationships.
Further Information
• Related Paper: Sheth, Arpinar, Kashyap:
Relationships at the Heart of Semantic Web: Modeling,
Discovering, and Exploiting Complex Semantic
Relationships
http://lsdis.cs.uga.edu/lib/2002.html
• InfoQuilt and Semantic Association Projects at
the LSDIS Lab: http://lsdis.cs.uga.edu
Download