Enabling Ontology Evolution
in Data Integration
Haridimos Kondylakis
Dimitris Plexousakis
Yannis Tzitzikas
Computer Science Department, University of Crete
Information Systems Laboratory, FORTH-ICS
Problem Statement
[Figure: a Query is posed to the Data Integration System, which uses Mappings to send Sub-queries to the underlying DBs]
2 of 25
Outline
1. Past Approaches
2. Our Idea
3. Modelling Ontology Evolution
4. Rewritings among ontology versions
5. Problems & Solutions
6. Rewritings to the sources
7. Implementation/Evaluation
8. Conclusions
3 of 25
1. Past Approaches (1/2)
• Mapping Adaptation (Velegrakis, 2004)
• Idea: After each small evolution the mapping can be incrementally adapted by applying local modifications.
• System-dependent
• The list of changes may not be given and must be discovered (how?)
• Multiple lists of changes may lead to the same effect
• Cannot handle complex change operations such as split & merge
• The algorithm must be reapplied after each primitive change
• Inefficient when we have a long list of changes
• No precise criterion under which the adapted mappings indeed constitute the “right result”
[Figure: source S with mappings M1, M2, M3 to the ontology versions O, O1, O2; evolution steps labelled "move elem", "add elem", "delete constraint"]
4 of 25
1. Past Approaches (2/2)
• Mapping Composition (Bernstein, 2008)
• Idea: Is it possible to generate M’ that is equivalent to the original mappings?
• No known implementation for ontology evolution
• First-order mappings: not closed under composition
• Second-order mappings:
  • Too difficult to handle
  • Not supported by DBMSs (and not likely to be in the future either)
  • Not understood by domain experts
• The composition must be produced for all mappings.
• Several sets of mappings between each T and T’
[Figure: sources S mapped to ontology O by M; O evolves to O’ via E; M’ = M ° E. Callout: schema mapping tools can be used to construct E.]
5 of 25
“Everything should be made as simple as possible, but not simpler” -Albert Einstein
• RDF/S ontology as global schema
• System independent
• More intuitive
• Modular
• Mappings created only once
• Only one mapping set
• Verifiable mappings
[Figure: SPARQL queries are posed to the Data Integration System over the Ontology; Mappings connect it to the DBs]
6 of 25
“Everything that exists, it is only change”
-Heraclitus 535 BCE
• Definition (Change Operation). A change u from one ontology version O1 to another version O2 is defined as a tuple (δa, δd) where:
  • δa corresponds to the triples that are added to O1 in order to get O2
  • δd corresponds to the triples that are deleted from O1 in order to get O2
  • δa(u) ∪ δd(u) ≠ ø and δa(u) ∩ δd(u) = ø
  • δa(u1) ∩ δa(u2) = ø and δd(u1) ∩ δd(u2) = ø for distinct changes u1, u2
• Definition (Application semantics of a high-level change). The application of u upon O, denoted by u(O), is defined as
u(O) = (O ∪ δa(u)) \ δd(u).
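The two definitions above can be sketched in a few lines of Python. This is only an illustration, assuming an ontology version is represented as a set of triples encoded as (predicate, subject, object); the names Change, is_well_formed and apply_change and the sample triples are hypothetical and not taken from the presented system.

```python
from typing import FrozenSet, NamedTuple, Tuple

# Assumed encoding of a triple: (predicate, subject, object)
Triple = Tuple[str, str, str]


class Change(NamedTuple):
    """A change operation u = (delta_a, delta_d)."""
    added: FrozenSet[Triple]    # delta_a(u): triples added to O1
    deleted: FrozenSet[Triple]  # delta_d(u): triples deleted from O1


def is_well_formed(u: Change) -> bool:
    """A change must add or delete something, and must not add what it deletes."""
    return bool(u.added | u.deleted) and not (u.added & u.deleted)


def apply_change(ontology: FrozenSet[Triple], u: Change) -> FrozenSet[Triple]:
    """u(O) = (O union delta_a(u)) minus delta_d(u)."""
    return (ontology | u.added) - u.deleted


# Hypothetical sample data: rename the property "name" to "fullname".
o1 = frozenset({("domain", "Person", "name"), ("domain", "Person", "ssn")})
rename = Change(
    added=frozenset({("domain", "Person", "fullname")}),
    deleted=frozenset({("domain", "Person", "name")}),
)
o2 = apply_change(o1, rename)
```

Frozen sets are used here so that an ontology version cannot be modified by accident once it has been produced.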
7 of 25
3.1 Example
[Figure: example RDF/S ontology with the classes Person, Actor, Gender and Cont. Point, the properties fullname, name, ssn, street, city, address, has_gender and has_cont_point, and Literal ranges]
• u1 = (Delete, ø, {has_gender(Person, Gender)})
• u2 = (Move, {has_cont_point(Person, Cont_Point)}, {has_cont_point(Actor, Cont_Point)})
• u3 = (Merge, {domain(Cont_Point, address)}, {domain(Cont_Point, street), domain(Cont_Point, city)})
• u4 = (Rename, {domain(Person, fullname)}, {domain(Person, name)})
High-level changes are:
• Intuitive
• Concise
• Able to describe complex evolution
8 of 25
4. Data Integration Redefined
Definition (Data Integration): A data integration system I is a quadruple (O, E, S, M) where
• O is a version of the Ontology,
• E is the evolution log of the Ontology (between the ontologies under consideration),
• S is the set of the local sources,
• M is the mapping between S and one version Oi
[Figure: RDF/S Ontology connected through Mappings to the Sources]
9 of 37
4.1 Affecting change operations
• Definition (Affecting change operation). A change operation u affects the query Q (with graph pattern G), denoted u ◊ Q, if:
  • δd(u) ≠ ø, and
  • ∃ triple pattern t ∈ G that can be unified with a triple of δd(u).
• Definition (Valid Rewriting). Let q be a query expressed in O1 and us a sequence of change operations such that us(O1) = O2. q' is a valid rewriting of q over O2 using us if, for every ui in us such that ui ◊ q, it holds that
  • |δa(ui)| > 0,
  • ∀ t ∈ δd(ui), t ◊ q
and q' is constructed as follows:
q' := (q – δd(ui)) ∪ δa(ui).
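The two definitions above can be illustrated with a small Python sketch. It makes strong simplifying assumptions: a query is a set of triple patterns written at the same level as the change triples, a term starting with "?" is a variable, and unification is a position-by-position match. The function names and sample data are hypothetical, and the code is not the algorithm of the presented system.

```python
from typing import Iterable, Optional, Set, Tuple

Triple = Tuple[str, str, str]             # assumed: (predicate, subject, object)
Change = Tuple[Set[Triple], Set[Triple]]  # (delta_a, delta_d)


def unifies(pattern: Triple, triple: Triple) -> bool:
    """A pattern term matches if it is a variable ("?x") or equals the triple term."""
    return all(p.startswith("?") or p == t for p, t in zip(pattern, triple))


def affects(u: Change, query: Set[Triple]) -> bool:
    """u affects q if it deletes something that some pattern of q unifies with."""
    _, deleted = u
    return bool(deleted) and any(unifies(t, d) for t in query for d in deleted)


def valid_rewriting(query: Iterable[Triple],
                    changes: Iterable[Change]) -> Optional[Set[Triple]]:
    """Return the rewritten query, or None when no valid rewriting exists."""
    q = set(query)
    for added, deleted in changes:
        if not affects((added, deleted), q):
            continue  # unaffected queries stay as they are
        if not added:
            return None  # |delta_a(ui)| > 0 is violated
        if not all(any(unifies(t, d) for t in q) for d in deleted):
            return None  # some deleted triple is not covered by the query
        # q' := (q - delta_d(ui)) union delta_a(ui)
        q = {t for t in q if not any(unifies(t, d) for d in deleted)} | set(added)
    return q


# Hypothetical sample data: "fullname" is replaced by "name".
rename_back = ({("domain", "Person", "name")}, {("domain", "Person", "fullname")})
delete_gender = (set(), {("has_gender", "Person", "Gender")})
q = {("domain", "Person", "fullname"), ("domain", "Person", "ssn")}
rewritten = valid_rewriting(q, [rename_back])
```

A pure deletion such as delete_gender yields None for any query it affects, since it offers nothing to substitute for the deleted triple.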
10 of 25
4.2 Query answering semantics
• Definition (Equivalent query rewriting) (Lenzerini, 2002). Let
  • O1, O2 be two ontology versions,
  • E a set of dependencies on O1 ∪ O2,
  • q2 an O2-query.
An equivalent rewriting of q2 in the presence of E is an O1-query q1 such that q1 gives the same answers as q2 on any O1 instance that satisfies E.
• Theorem: Valid rewritings are equivalent query rewritings and can be computed in O(N*T) time (N = number of change operations in us, T = number of triples in G)
11 of 25
4.3 Results
• Proposition (Uniqueness): Valid rewritings are unique.
• Proposition (Inverse Query Rewriting): If q2 is a query over O2 and E is the evolution log from O1 to O2, we can produce an equivalent rewriting of q2 over O1 by computing the valid rewriting of q2 on the sequence of the inverted changes of E.
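The inversion of an evolution log can be sketched as follows. The sketch assumes that inverting a change simply swaps its added and deleted triples and that the log is replayed in reverse order; the function names and the triple encoding (predicate, subject, object) are hypothetical.

```python
def apply_change(ontology, change):
    """u(O) = (O union delta_a(u)) minus delta_d(u)."""
    added, deleted = change
    return (set(ontology) | set(added)) - set(deleted)


def apply_log(ontology, log):
    """Apply a sequence of changes in order."""
    for change in log:
        ontology = apply_change(ontology, change)
    return ontology


def invert_log(log):
    """Swap delta_a and delta_d of every change and reverse the sequence."""
    return [(deleted, added) for added, deleted in reversed(log)]


# Hypothetical sample data modelled on a merge and a rename.
merge = ({("domain", "Cont_Point", "address")},
         {("domain", "Cont_Point", "street"), ("domain", "Cont_Point", "city")})
rename = ({("domain", "Person", "fullname")}, {("domain", "Person", "name")})

o1 = {("domain", "Person", "name"),
      ("domain", "Cont_Point", "street"),
      ("domain", "Cont_Point", "city")}
log = [merge, rename]
o2 = apply_log(o1, log)
o1_again = apply_log(o2, invert_log(log))
```

Under these assumptions, replaying the inverted log on O2 gives back O1, which is what allows a query over O2 to be rewritten towards O1.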
12 of 25
4.3. Example
[Figure: the initial query and the rewritten query, drawn next to the two ontology versions (classes Person, Actor, Gender, Cont. Point). Initial query: variables ?NAME, ?SSN and ?Address over Person, using fullname, ssn and address. Rewritten query: variables ?NAME, ?SSN, ?STREET, ?CITY and ?Address, using name, ssn, street and city]
13 of 37
5. Problems & Solutions
[Figure: two ontology versions with the classes Person, Actor and Cont. Point and the properties fullname, ssn, address and has_cont_point, together with a query over the variables ?NAME, ?SSN and ?Address]
• Problem identification: One class is deleted, but there exists a parent class that maintains all its properties
• Problem resolution: Use that parent class to find more general answers
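The resolution above can be illustrated with a minimal Python sketch, assuming the class hierarchy is available as a child-to-parent dictionary and the query is a set of triple patterns encoded as (predicate, subject, object). The function name generalize, the hierarchy (Person below Actor) and the sample query are hypothetical and chosen only for illustration.

```python
def generalize(query, deleted_class, subclass_of):
    """Replace a deleted class by its parent class in every triple pattern.

    Returns None when the deleted class has no known parent, in which case
    no more general query can be offered by this sketch.
    """
    parent = subclass_of.get(deleted_class)
    if parent is None:
        return None
    return {
        tuple(parent if term == deleted_class else term for term in pattern)
        for pattern in query
    }


# Hypothetical hierarchy and query.
subclass_of = {"Person": "Actor"}
query = {("type", "?x", "Person"), ("has_cont_point", "?x", "?c")}
more_general = generalize(query, "Person", subclass_of)
```

The answers to the generalized query are a superset of what the user asked for, so a system would typically present them as more general answers and not as exact ones.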
14 of 25
6.1. System Architecture
DlvHex Prototype
(Polleres, 2007)
15 of 25
6.2 Source Rewriter
• Traditionally the problem was to find the maximally contained rewriting for one user query
  • Algorithms: MiniCon (Pottinger, 2001), Bucket, Inverse rules
• Now we have several queries, one for each ontology version.
  • Information might need to be combined among ontology versions
16 of 25
6.3 Source Rewriter
• Reuse the best algorithm for finding maximally contained rewritings
• But adapt it for multiple queries
• Properties of the algorithm
  • Sound & complete
  • Complexity O(q (n m M)^n), where
    • q is the number of valid rewritings,
    • n is the number of subgoals in the biggest query,
    • m is the maximal number of subgoals in a view,
    • M is the number of mappings
Algorithm 3.3: EDI-Minicon(Q, M)
Input: Q, a set of datalog queries; M, the mappings
Output: MQ, the set of maximally-contained rewritings
Initialize MCD = {}, MQ = {}
For each qj in Q
    MCDj := FormMCDs(qj, M)
    Add MCDj to MCD
For each qj in Q
    mqj := CombineMCDs(MCD, qj)
    Add mqj to MQ
Return MQ
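The control flow of the algorithm above can be sketched in Python as a two-phase driver. The MiniCon steps FormMCDs and CombineMCDs are not reproduced here: they are passed in as callables, and the stand-ins at the bottom are hypothetical placeholders that only make the sketch executable. Pooling the MCDs of all queries into one list is one reading of "Add MCDj to MCD".

```python
from typing import Any, Callable, Iterable, List


def edi_minicon(
    queries: Iterable[Any],
    mappings: Iterable[Any],
    form_mcds: Callable[[Any, List[Any]], List[Any]],
    combine_mcds: Callable[[List[Any], Any], Any],
) -> List[Any]:
    """Two-phase driver: collect MCDs for all queries, then combine per query."""
    queries = list(queries)
    mappings = list(mappings)

    # Phase 1: form the MCDs of every query and pool them together.
    mcds: List[Any] = []
    for q in queries:
        mcds.extend(form_mcds(q, mappings))

    # Phase 2: combine the pooled MCDs into rewritings, one entry per query.
    rewritings: List[Any] = []
    for q in queries:
        rewritings.append(combine_mcds(mcds, q))
    return rewritings


# Hypothetical placeholders, NOT the real MiniCon procedures.
def fake_form_mcds(q, mappings):
    return [(q, view) for view in mappings]


def fake_combine_mcds(mcds, q):
    return [mcd for mcd in mcds if mcd[0] == q]


result = edi_minicon(["q1", "q2"], ["v1", "v2"], fake_form_mcds, fake_combine_mcds)
```

Because the second phase sees the MCDs of all queries at once, a real CombineMCDs could draw on information formed for other ontology versions, which the placeholder here does not attempt.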
17 of 25
7. Preliminary Evaluation
• CIDOC-CRM
  • 80 classes
  • 250 properties
  • 726 changes (01.02.02-01.06.05)
• Queries
  • 50 real user queries from 3D-COFORM
• Adding & restructuring information does not affect valid rewritings
• Deleting information, however, does
• In general, assuming queries over v4.2 of CIDOC, we would be able to rewrite 89% of them
[Pie charts: "Changes v4.2 to v3.2.1" and "Changes v3.2.1 to v4.2", each split into DELETIONS, ADDITIONS and RESTRUCTURE, with shares of 13%, 66% and 21%]
18 of 37
7.4 Problems: Fiction or Reality?
[Figure: timeline (Time) of ontology versions containing the classes A, B, C and D; one transition is labelled "Del D, Add C" and the other "Add D, Del C"]
• It makes no sense searching for C in previous versions
• Actually, we can provide access to 99% of the source information through valid rewritings
• In general, assuming queries over v4.2 of CIDOC, we would be able to rewrite 89% of them to v3.2.1
19 of 25
7.2 Avg Running Time: 0,06 msec
20 of 25
8.1 Advantages of our approach
• We don’t rewrite all the mappings but the query
  • Exploit the locality of the query
  • Mappings are produced one time and can be validated by domain experts, which greatly reduces human effort & time spent
• Our approach works independently of the family of mappings to the sources (GAV, LAV, GLAV, nested, etc.)
  • The mappings to the sources are not affected at all, so they maintain their initial semantics
  • Modularity & scalability: new mappings or ontology changes can be defined independently
• We use high-level changes to model ontology evolution
  • High-level changes can model complex ontology evolution
  • Reduces the size of the evolution log
  • Can be provided efficiently for two ontology versions.
21 of 25
8.2 Advantages of our approach
• Valid Rewritings
  • We define the answer semantics in such a setting
  • Precise criteria exist for deciding when it is possible to compute valid rewritings
  • With small complexity
• Even when no valid rewritings exist
  • We can still return more general answers
  • We can guide the user in mapping redefinition
• Computing Source Rewritings
  • The increased computational complexity is linear in the number of input queries and remains scalable.
22 of 25
8.3 Conclusions
• Ontology evolution is a reality and data integration systems should be aware of it
• We have shown how to answer queries over multiple ontology versions
• To the best of our knowledge, no system today is capable of query answering over multiple ontology versions
• Future Work
  • More extensive evaluation using Gene Ontology
  • Semantic Infrastructure for plugIT
  • Integrate our system with the Protégé MASTRO system
  • Extend our approach to OWL variants
  • Consider RDF Sources and their Evolution as well
23 of 25
References
1. Philip A. Bernstein, Todd J. Green, Sergey Melnik, Alan Nash: Implementing mapping composition. VLDB J. (VLDB) 17(2):333-353 (2008)
2. Vicky Papavassiliou, Giorgos Flouris, Irini Fundulaki, Dimitris Kotzinos, Vassilis Christophides: On Detecting High-Level Changes in RDF/S KBs. International Semantic Web Conference 2009:473-488
3. Maurizio Lenzerini: Data Integration: A Theoretical Perspective. PODS 2002:233-246
4. Rachel Pottinger, Alon Y. Halevy: MiniCon: A scalable algorithm for answering queries using views. VLDB J. (VLDB) 10(2-3):182-198 (2001)
5. Axel Polleres: From SPARQL to rules (and back). WWW 2007:787-796
6. Yannis Tzitzikas, Dimitris Kotzinos: (Semantic web) evolution through change logs: Problems and solutions. Artificial Intelligence and Applications 2007:654-659
7. Yannis Velegrakis, Renée J. Miller, Lucian Popa, John Mylopoulos: ToMAS: A System for Adapting Mappings while Schemas Evolve. ICDE 2004:862
Questions?