Integrating Databases into the Semantic Web through an Ontology-based Framework

Dejing Dou, Paea LePendu, Shiwoong Kim
Computer and Information Science, University of Oregon, USA
Peishen Qi
Computer Science Department, Yale University, USA
April, 2006 @ SWDB’06
Outline

Introduction
– The status of the Semantic Web
– Realizing SW needs existing databases

OntoGrate: An Ontology-based Information Integration
Framework
– Some previous work
– Modules in OntoGrate Architecture

Case Study for integrating Databases into SW
– Without an existing domain ontology
– With an existing domain ontology

Conclusion and Future Work
The Semantic Web


One major goal of the Semantic Web is that web-based agents can
process and “understand” data [Berners-Lee etal01].
Ontologies formally describe the semantics of data and web-based
agents can take SW documents (e.g. in RDF/OWL) as a set of
assertions (true statements) and draw inferences from them.
[Figure: web-based agents, human, SW]
What do we have now?

DAML+OIL → OWL (Web Ontology Language)

More and more domain ontologies are defined in
DAML+OIL/OWL, even for some specific domains
(e.g., GO)

We are developing some tools, agents, services
See http://www.semwebcentral.org,
http://knowledgeweb.semanticweb.org/
http://www.daml.org/
Two things are important

Real Data for sharing
– relational databases (may be the biggest resource)
– Other kinds of databases
– WWW/XML data
– Some knowledge bases

Better Semantic Web Services/Agents
Semantic Annotation for Data?

It works well for small data resources

It works less well for large data resources (relational
databases)
– “Redundant” copies of the data
– Time-consuming query answering.
E.g., it currently works by loading OWL data into a
knowledge base and then answering queries with DL
ABox reasoning. (Can it compete with existing
DBMSs, which have well-developed indexing and query
optimization techniques?)

It is better if relational databases can be accessed/queried
directly by SW agents/services
The difficulties
– The Semantic Web: ontologies define the semantics of data
– Relational DBs: schemas define the structure and integrity constraints
A more general question

How can we make databases, SW resources,
WWW/XML data, KBs work together?

The problem is similar
– SW resources and KBs are defined by ontologies, which
are more expressive and focus on semantics
– Databases and XML documents are defined by schemas,
which focus on structure
– Syntax difference (e.g., OWL vs. SQL)
OntoGrate: An Ontology-based Information
Integration System

Some Previous Work
Schemas (e.g., the stores7 DB in IBM Informix)
Some Previous Work
Schemas, Ontologies and Web-PDDL
Relation → Type/Class
Attribute → Predicate/Property
Integrity Constraint → Axiom/Rule
Primary Key → Fact/Instance
Some Previous Work
Merging Ontologies with Bridging Axioms
Some Previous Work
The Bridge Axiom/mapping on customerfname/customerlname vs. customercontactname:
(forall (c - @stores7:Customer f l - @sql:varchar)
   (if (and (@stores7:customerfname c f)
            (@stores7:customerlname c l))
       (@nwind:customercontactname c
                                   (@sql:concat f l))))
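To make the direction of this axiom concrete, here is a minimal Python sketch that forward-applies it to a handful of facts. The triple representation, the function name `apply_contact_name_axiom`, and the sample facts are assumptions made for illustration; this is not how OntoEngine is implemented.

```python
# Facts are (predicate, subject, value) triples; predicate names mirror the axiom.
def apply_contact_name_axiom(facts):
    """Forward-apply: fname(c, f) and lname(c, l) => contactname(c, concat(f, l))."""
    first = {c: v for (p, c, v) in facts if p == "stores7:customerfname"}
    last = {c: v for (p, c, v) in facts if p == "stores7:customerlname"}
    derived = []
    # Only customers that have both a first and a last name satisfy the premise.
    for c in sorted(first.keys() & last.keys()):
        # Plain concatenation, as in (@sql:concat f l); no separator is assumed.
        derived.append(("nwind:customercontactname", c, first[c] + last[c]))
    return derived


# Hypothetical sample facts.
stores7_facts = [
    ("stores7:customerfname", "C1", "Paea"),
    ("stores7:customerlname", "C1", "LePendu"),
    ("stores7:customerfname", "C2", "Dejing"),  # no last name: nothing derived for C2
]
print(apply_contact_name_axiom(stores7_facts))
```

The sketch only covers the stores7-to-nwind direction; splitting a contact name back into first and last names is not expressible with this axiom alone.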
Some Previous Work
The Bridge Axiom/mapping on customerregion vs. customerstatecode/statename/statecode:
(forall (x - @nwind:Customer y - @sql:varchar)
   (if (@nwind:customerregion x y)
       (exists (z - @stores7:State t - @sql:varchar)
          (and (@stores7:customerstatecode x t)
               (@stores7:statename z y)
               (@stores7:statecode z t)))))
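The existential in this axiom can be read as a lookup when the target side already holds State facts. The Python sketch below assumes that such (statecode, statename) rows are available and uses them as witnesses for z and t; the function name `translate_region` and the sample rows are hypothetical, and the axiom itself only asserts that some State exists.

```python
def translate_region(nwind_facts, stores7_state_rows):
    """For each nwind customerregion(x, y), find a stores7 State whose statename is y
    and emit customerstatecode(x, t), where t is that State's statecode."""
    code_by_name = {name: code for (code, name) in stores7_state_rows}
    translated = []
    for (pred, x, y) in nwind_facts:
        if pred == "nwind:customerregion" and y in code_by_name:
            translated.append(("stores7:customerstatecode", x, code_by_name[y]))
    return translated


# Hypothetical (statecode, statename) rows and nwind facts.
state_rows = [("OR", "Oregon"), ("CT", "Connecticut")]
nwind_facts = [("nwind:customerregion", "C1", "Oregon")]
print(translate_region(nwind_facts, state_rows))
```

Regions with no matching State row are skipped in this sketch; a fuller treatment would need a placeholder (Skolem) term for the unknown state.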
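Some Previous Work
Inferential Data Integration with OntoEngine
– Data Translation:
View data as true statements, e.g., (statecode S#28 “OR”)
$(M_{s\_t};\, s) \xrightarrow{D} t$ only if $(M_{s\_t};\, s) \models t$
$(M_{s\_t};\, s) \xrightarrow{D} t \;\Leftarrow\; (M_{s\_t};\, s) \vdash t \;\Rightarrow\; (M_{s\_t};\, s) \models t$
– Query Translation:
$(M_{s\_t};\, s) \xrightarrow{Q} t$ only if $(M_{s\_t};\, \theta(t)) \models \theta(s)$
$(M_{s\_t};\, s) \xrightarrow{Q} t \;\Leftarrow\; (M_{s\_t};\, \theta(t)) \vdash \theta(s) \;\Rightarrow\; (M_{s\_t};\, \theta(t)) \models \theta(s)$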
OntoGrate Architecture Revisited
Modules in OntoGrate Architecture

The Syntax Translators (Wrappers)
– e.g., PDDSQL (SQL ↔ Web-PDDL),
PDDOWL (OWL ↔ Web-PDDL)

The Matching (correspondence) Generation
– e.g., name and structure (tree, graph) similarity, synonyms and
is-a (part-of) relationships using thesauri and dictionaries, such
as WordNet

The Data Mining Module

The Machine Learning Module

The Inference Engine (OntoEngine)

The User Interface
Learning the mappings from domain experts
(forall (x - @A1:Invertebrate)
   (if (is @A1:Insect x)
       (and (@A2:legs x 6)
            (@A2:bodySegments x 3))))
Mining the mappings from large datasets
For example, consider two medical databases in the same hospital: DB1 lists
the blood pressure of patients with nominal values, such as low, normal, at
risk, and high, while DB2 records the exact numerical
values for systolic and diastolic pressure.
By association rule mining, we may get a rule/mapping like:
@DB2:SystolicPressure ≥ 140 ∧ @DB2:DiastolicPressure ≥ 90
⇒ @DB1:BloodPressure = ‘High’
(support = 40%, confidence = 90%)
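As a reminder of what the support and confidence figures on such a rule measure, here is a small Python sketch. It assumes the two databases have already been joined per patient, that the thresholds are inclusive, and that both pressure conditions must hold; the records and the helper name `rule_stats` are made up for illustration and do not reproduce the 40%/90% figures quoted on the slide.

```python
def rule_stats(records, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent => consequent."""
    n = len(records)
    matches_antecedent = sum(1 for r in records if antecedent(r))
    matches_both = sum(1 for r in records if antecedent(r) and consequent(r))
    support = matches_both / n if n else 0.0
    confidence = matches_both / matches_antecedent if matches_antecedent else 0.0
    return support, confidence


# Hypothetical joined records: numeric values from DB2, nominal label from DB1.
patients = [
    {"systolic": 150, "diastolic": 95, "bp": "High"},
    {"systolic": 145, "diastolic": 92, "bp": "High"},
    {"systolic": 142, "diastolic": 91, "bp": "At risk"},
    {"systolic": 120, "diastolic": 80, "bp": "Normal"},
    {"systolic": 100, "diastolic": 60, "bp": "Low"},
]

support, confidence = rule_stats(
    patients,
    antecedent=lambda r: r["systolic"] >= 140 and r["diastolic"] >= 90,
    consequent=lambda r: r["bp"] == "High",
)
print(f"support={support:.2f}, confidence={confidence:.2f}")
```

Support is the fraction of all records that satisfy both sides of the rule; confidence is the fraction of records satisfying the left side that also satisfy the right side.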
Case Study in Two Scenarios
– Integrating DBs into SW without an existing domain ontology
– Integrating DBs into SW with an existing domain ontology
Without an existing domain ontology
Generating OWL ontologies from DB Schemas
SQL schema → Web-PDDL (by using PDDSQL)
Web-PDDL → OWL (by using PDDOWL)
– E.g., Stores7.sql → Stores7.pddl → Stores7.owl
...
<owl:Class rdf:ID="Customer">
  <rdfs:subClassOf
      rdf:resource="http://www.cs.uoregon.edu/~paea/sql#Relation"/>
</owl:Class>
<owl:DatatypeProperty rdf:ID="customercity">
  <rdfs:domain rdf:resource="#Customer"/>
  <rdfs:range rdf:resource="#String"/>
</owl:DatatypeProperty>
...
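The shape of the generated OWL can be illustrated with a short Python sketch that emits one class per table and one datatype property per column. It skips the intermediate Web-PDDL step, treats every column as a string, and uses a hypothetical helper name `table_to_owl`; it is a simplification for illustration, not the PDDSQL/PDDOWL translators.

```python
from xml.sax.saxutils import quoteattr

# URI of the Relation superclass, copied from the fragment shown on the slide.
RELATION_URI = "http://www.cs.uoregon.edu/~paea/sql#Relation"


def table_to_owl(table, columns, relation_uri=RELATION_URI):
    """Emit an owl:Class for the table and an owl:DatatypeProperty per column."""
    lines = [
        f"<owl:Class rdf:ID={quoteattr(table)}>",
        f"  <rdfs:subClassOf rdf:resource={quoteattr(relation_uri)}/>",
        "</owl:Class>",
    ]
    for column in columns:
        lines += [
            f"<owl:DatatypeProperty rdf:ID={quoteattr(column)}>",
            f"  <rdfs:domain rdf:resource={quoteattr('#' + table)}/>",
            # Assumption: every column is mapped to #String in this sketch.
            '  <rdfs:range rdf:resource="#String"/>',
            "</owl:DatatypeProperty>",
        ]
    return "\n".join(lines)


owl = table_to_owl("Customer", ["customercity"])
print(owl)
```

A real translator would also need to carry over column types, keys, and integrity constraints, which this sketch ignores.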
An OWL-QL query based on Stores7.owl
<owl-ql:query xmlns:owl-ql="http://www.w3.org/2003/10/owl-ql-syntax#"...>
  <owl-ql:premise>
    <rdf:RDF>
      <rdf:Description rdf:about="#C">
        <rdf:type rdf:resource="#Customer"/>
        <customercity rdf:resource="#Eugene"/>
      </rdf:Description>
    </rdf:RDF>
  </owl-ql:premise>
  <owl-ql:queryPattern>
    <rdf:RDF>
      <rdf:Description rdf:about="#C">
        <customerfname rdf:resource="http://www.w3.org/2003/10/owl-ql-variables#x"/>
        <customerlname rdf:resource="http://www.w3.org/2003/10/owl-ql-variables#y"/>
      </rdf:Description>
    </rdf:RDF>
  </owl-ql:queryPattern>
  <owl-ql:answerKBPattern>
    <owl-ql:kbRef rdf:resource="...stores7.owl"/>…
The corresponding Web-PDDL and SQL queries
→ PDDOWL →
(and (customercity ?C - Customer "Eugene")
     (customerfname ?C - Customer ?x - String)
     (customerlname ?C - Customer ?y - String))
→ PDDSQL →
SELECT C.customerfname, C.customerlname
FROM Customer C
WHERE C.customercity = 'Eugene'
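The step from a conjunctive Web-PDDL query to SQL can be sketched in Python as follows. The sketch assumes that every atom shares the same subject variable ?C, that each predicate name equals a column name, and that terms starting with "?" are variables; the function name `atoms_to_sql` and the sample rows are hypothetical, and an in-memory SQLite database stands in for the Stores7 DB.

```python
import sqlite3


def atoms_to_sql(table, atoms):
    """atoms: list of (column, term). A term starting with '?' is a variable to
    select; any other term is a constant that becomes a WHERE condition."""
    select_cols = [col for col, term in atoms if term.startswith("?")]
    conditions = [(col, term) for col, term in atoms if not term.startswith("?")]
    # Table and column names come from the trusted schema; constants are passed
    # as bound parameters rather than spliced into the SQL text.
    sql = f"SELECT {', '.join('C.' + c for c in select_cols)} FROM {table} C"
    if conditions:
        sql += " WHERE " + " AND ".join(f"C.{col} = ?" for col, _ in conditions)
    return sql, [term for _, term in conditions]


query_atoms = [("customercity", "Eugene"), ("customerfname", "?x"), ("customerlname", "?y")]
sql, params = atoms_to_sql("Customer", query_atoms)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customer (customerfname TEXT, customerlname TEXT, customercity TEXT)"
)
conn.executemany(
    "INSERT INTO Customer VALUES (?, ?, ?)",
    [("Paea", "LePendu", "Eugene"), ("Dejing", "Dou", "Eugene"), ("Peishen", "Qi", "New Haven")],
)
rows = conn.execute(sql, params).fetchall()
# Turn each result row back into variable bindings for ?x and ?y.
bindings = [{"?x": fname, "?y": lname} for fname, lname in rows]
print(sql)
print(bindings)
```

Queries that join several tables or use more than one subject variable would need a more general translation than this single-table case.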
Getting Answers from Stores7 DB
customerfname   customerlname
Paea            LePendu
Dejing          Dou
Shiwoong        Kim
(1000/100,000/3secs)
→ PDDSQL →
{?x/Paea, ?y/LePendu}
{?x/Dejing, ?y/Dou}
{?x/Shiwoong, ?y/Kim}
→ PDDOWL (1000 bindings/3 secs) →
<owl-ql:answerBundle xmlns:owl-ql="http://www.w3.org/2003/10/owl-ql-syntax#" ...>
  <owl-ql:answer>
    <owl-ql:binding-set>
      <var:x rdf:resource="#Paea"/>
      <var:y rdf:resource="#LePendu"/>
    </owl-ql:binding-set>
    <owl-ql:answerPatternInstance>
      <rdf:RDF>
        <rdf:Description rdf:about="#C">
          <customerfname rdf:resource="#Paea"/>
          …
With an existing domain ontology
Order ontology: http://www.dayf.de/2004/owl/order.owl
An OWL-QL query based on order.owl
<owl-ql:query xmlns:owl-ql="http://www.w3.org/2003/10/owl-ql-syntax#"...>
  <owl-ql:premise>
    <rdf:RDF>
      <rdf:Description rdf:about="#C">
        <rdf:type rdf:resource="#Person"/>
        <hasAddress rdf:resource="#A"/>
      </rdf:Description>
      <rdf:Description rdf:about="#A">
        <rdf:type rdf:resource="#Address"/>
        <City rdf:resource="#Eugene"/>
      </rdf:Description>
    </rdf:RDF>
  </owl-ql:premise>
  <owl-ql:queryPattern>
    <rdf:RDF>
      <rdf:Description rdf:about="#C">
        <FirstName rdf:resource="http://www.w3.org/2003/10/owl-ql-variables#x"/>
        <LastName rdf:resource="http://www.w3.org/2003/10/owl-ql-variables#y"/>
        …
  <owl-ql:kbRef rdf:resource="http://www.dayf.de/2004/owl/order.owl"/>…
The Bridging Axioms/Mappings between
Stores7.pddl and Order.pddl
(T-> @stores7:Customer @order:Person)
(forall (P - @order:Person A - @order:Address z - String)
   (if (and (@order:hasAddress P A)
            (@order:City A z))
       (@stores7:customercity P z)))
(forall (C - @stores7:Customer z - String)
   (if (@stores7:customercity C z)
       (exists (A - @order:Address)
          (and (@order:hasAddress C A)
               (@order:City A z)))))
The Bridging Axioms/Mappings between
Stores7.pddl and Order.pddl
(T-> @stores7:Customer @order:Person)
(forall (C - @stores7:Customer x - String)
   (iff (@stores7:customerfname C x)
        (@order:FirstName C x)))
(forall (C - @stores7:Customer y - String)
   (iff (@stores7:customerlname C y)
        (@order:LastName C y)))
The Query Translation between Stores7 and Order
OWL-QL query in order.owl
→ PDDOWL →
(and (hasAddress ?C - Person ?A - Address)
     (City ?A "Eugene")
     (FirstName ?C - Person ?x - String)
     (LastName ?C - Person ?y - String))
→ OntoEngine with Bridging Axioms (< 1 sec) →
(and (customercity ?C - Customer "Eugene")
     (customerfname ?C - Customer ?x - String)
     (customerlname ?C - Customer ?y - String))
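The effect of the bridging axioms on this particular query can be mimicked with a few hard-coded rewrite rules in Python. This is only a sketch under strong assumptions: the rules are written by hand for the three mappings used here, atoms are plain (predicate, subject, object) tuples, and the name `translate_query` is hypothetical. OntoEngine derives the translation by general inference over the axioms, which this sketch does not attempt.

```python
# Predicates related by an iff axiom can simply be renamed.
RENAMES = {"FirstName": "customerfname", "LastName": "customerlname"}


def translate_query(atoms):
    """Rewrite order-ontology atoms (pred, subj, obj) into stores7 atoms."""
    # Index City atoms by address variable so hasAddress/City pairs can be joined.
    city_of = {subj: obj for pred, subj, obj in atoms if pred == "City"}
    translated = []
    for pred, subj, obj in atoms:
        if pred in RENAMES:
            translated.append((RENAMES[pred], subj, obj))
        elif pred == "hasAddress" and obj in city_of:
            # hasAddress(P, A) and City(A, z) together correspond to customercity(P, z).
            translated.append(("customercity", subj, city_of[obj]))
        # City atoms are consumed by the hasAddress case; atoms with no
        # applicable rule are dropped in this sketch.
    return translated


order_query = [
    ("hasAddress", "?C", "?A"),
    ("City", "?A", "Eugene"),
    ("FirstName", "?C", "?x"),
    ("LastName", "?C", "?y"),
]
print(translate_query(order_query))
```

The address variable ?A disappears in the translated query because the stores7 side stores the city directly on the customer.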
Final Answers in the order ontology
customerfname   customerlname
Paea            LePendu
Dejing          Dou
Shiwoong        Kim
→ PDDSQL →
(customerfname C1 Paea)
(customerlname C1 LePendu)
(customerfname C2 Dejing)
…
→ OntoEngine (40,000facts/30 secs) →
(FirstName C1 Paea)
(LastName C1 LePendu)
(FirstName C2 Dejing)
…
→ PDDOWL (10,000 facts/11 secs) →
<owl-ql:answer>
  <owl-ql:binding-set>
    <var:x rdf:resource="#Paea"/>
    <var:y rdf:resource="#LePendu"/>
  </owl-ql:binding-set>
  <owl-ql:answerPatternInstance>
    <rdf:RDF>
      <rdf:Description rdf:about="#C">
        <FirstName rdf:resource="#Paea"/>
        <LastName rdf:resource="#LePendu"/>
        …
Some related work

Semantic Annotation
– [Stojanovic etal@SAC02] maps the relational model to frame logic/RDF.
– DOGMA [Verheyden etal@SWDB04] translates an ontology query to SQL.

Schema and Ontology mapping
– Similarity matching, machine learning… useful for generating
candidate matchings
– Semi-automatic tool (Clio)

Data integration and query answering
– Federated databases [Sheth&Larson 90], data warehouses, peer-to-peer
management [Halevy etal@ICDE03]; MiniCon
[PottingerLevy@VLDB00] uses query rewriting at GLV

Logic and Databases
– Reiter’s reconstruction of the relational model in FOL.
– Carnot, SIMS, and Information Manifold use a global
ontology, DL or Datalog
Conclusion and Future work

We applied OntoGrate, an ontology-based information
integration framework, to integrate relational databases with the
Semantic Web. The test results from the two scenarios are
promising.

We are developing other modules (e.g., learning/mapping/UI) in
OntoGrate.

Scalability and efficiency need to be investigated on larger
data resources.

We plan to extend the current work to integrate XML (with/without
XML schemas or DTDs) and the Semantic Web.
Thank you for your attention!