Enabling Ontology Evolution in Data Integration
Haridimos Kondylakis, Dimitris Plexousakis, Yannis Tzitzikas
Computer Science Department, University of Crete
Information Systems Laboratory, FORTH-ICS

Problem Statement
[Figure: a query posed to the data integration system is translated, through the mappings, into sub-queries over the underlying source databases.]

Outline
1. Past Approaches
2. Our Idea
3. Modelling Ontology Evolution
4. Rewritings among Ontology Versions
5. Problems & Solutions
6. Rewritings to the Sources
7. Implementation/Evaluation
8. Conclusions

1. Past Approaches (1/2): Mapping Adaptation (Velegrakis, 2004)
Idea: after each small evolution step, the mappings can be incrementally adapted by applying local modifications.
[Figure: source S mapped to successive ontology versions O, O1, O2 through mappings M1, M2, M3, after the changes "move element", "add element", "delete constraint".]
• System-dependent.
• The list of changes may not be given and would have to be discovered (how?).
• Multiple lists of changes may lead to the same effect.
• Cannot handle complex change operations such as split & merge.
• The algorithm must be reapplied after each primitive change, which is inefficient for long lists of changes.
• There is no precise criterion under which the adapted mappings indeed constitute the "right result".

1. Past Approaches (2/2): Mapping Composition (Bernstein, 2008)
Idea: is it possible to generate a mapping M' = M ∘ E that is equivalent to the original mappings? Schema-mapping tools can be used to construct E.
[Figure: source S mapped to ontology O via M; the evolution E leads from O to O'; the composed mapping is M' = M ∘ E.]
• No known implementation on ontology evolution.
• First-order mappings are not closed under composition.
• Second-order mappings are too difficult to handle, are not supported by DBMSs (and are not likely to be in the future), and are not understood by domain experts.
• The composition has to be produced for all mappings, with several sets of mappings between each T and T'.

2. Our Idea
"Everything should be made as simple as possible, but not simpler." - Albert Einstein
• An RDF/S ontology is used as the global schema.
• System-independent, more intuitive, modular.
• Mappings are created only once; there is only one mapping set, and it is verifiable.
• Queries are posed in SPARQL.
[Figure: SPARQL queries over the RDF/S ontology of the data integration system; a single set of mappings connects the ontology to the source databases.]

3. Modelling Ontology Evolution
"Everything that exists is only change." - Heraclitus, 535 BCE
Definition (Change Operation). A change u from one ontology version O1 to another version O2 is defined as a tuple (δa, δd), where:
• δa corresponds to the triples that are added to O1 in order to get O2,
• δd corresponds to the triples that are deleted from O1 in order to get O2,
such that δa(u) ∪ δd(u) ≠ ∅, δa(u) ∩ δd(u) = ∅, and for distinct changes δa(u1) ∩ δa(u2) = ∅ and δd(u1) ∩ δd(u2) = ∅.
Definition (Application semantics of a high-level change). The application of u upon O, denoted by u(O), is defined as u(O) = (O ∪ δa(u)) \ δd(u). (See the code sketch after the data integration definition below.)

3.1 Example
[Figure: an example ontology with classes Person, Actor, Gender and Cont. Point, and properties fullname/name, ssn, has_gender, has_cont_point, address, street and city (range Literal).]
u1 = (Delete, ∅, {has_gender(Person, Gender)})
u2 = (Move, {has_cont_point(Person, Cont_Point)}, {has_cont_point(Actor, Cont_Point)})
u3 = (Merge, {domain(Cont_Point, address)}, {domain(Cont_Point, street), domain(Cont_Point, city)})
u4 = (Rename, {domain(Person, fullname)}, {domain(Person, name)})
High-level changes are intuitive and concise, and they can describe complex evolution.

4. Data Integration Redefined
Definition (Data Integration System). A data integration system I is a quadruple (O, E, S, M), where:
• O is a version of the ontology,
• E is the evolution log of the ontology (between the ontology versions under consideration),
• S is the set of local sources,
• M is the mapping between S and one ontology version Oi.
[Figure: the RDF/S ontology, its evolution log, and the mappings to the sources.]
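To make the above concrete, the following is a minimal Python sketch of how the change model of Section 3 and the evolution log E of the quadruple (O, E, S, M) could be represented. It is illustrative only, not the authors' implementation: the names Change, apply_change and apply_log, and the encoding of a triple as a plain tuple, are assumptions made here.

    # Minimal sketch (illustrative, not the authors' code) of the change model:
    # an ontology version is a set of triples, a change u is a pair (δa, δd),
    # and it is applied as u(O) = (O ∪ δa(u)) \ δd(u).
    from dataclasses import dataclass, field

    @dataclass(frozen=True)
    class Change:
        name: str                                                # e.g. "Delete", "Move", "Merge", "Rename"
        delta_add: frozenset = field(default_factory=frozenset)  # δa(u): triples added
        delta_del: frozenset = field(default_factory=frozenset)  # δd(u): triples deleted

        def is_well_formed(self) -> bool:
            # δa(u) ∪ δd(u) ≠ ∅  and  δa(u) ∩ δd(u) = ∅
            return bool(self.delta_add | self.delta_del) and not (self.delta_add & self.delta_del)

    def apply_change(ontology: set, u: Change) -> set:
        """Application semantics: u(O) = (O ∪ δa(u)) \\ δd(u)."""
        return (ontology | u.delta_add) - u.delta_del

    def apply_log(ontology: set, evolution_log: list) -> set:
        """Apply the evolution log E (a sequence of changes us), so that us(O1) = O2."""
        for u in evolution_log:
            ontology = apply_change(ontology, u)
        return ontology

    # Example: the Delete change u1 of slide 3.1, with has_gender(Person, Gender)
    # encoded here simply as the tuple ("Person", "has_gender", "Gender").
    O1 = {("Person", "has_gender", "Gender"), ("Person", "fullname", "Literal")}
    u1 = Change("Delete",
                delta_add=frozenset(),
                delta_del=frozenset({("Person", "has_gender", "Gender")}))
    O2 = apply_log(O1, [u1])        # {("Person", "fullname", "Literal")}

Under this encoding, applying the whole evolution log to O1 reproduces O2, exactly as us(O1) = O2 in the definitions that follow.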
4.1 Affecting Change Operations
Definition (Affecting change operation). A change operation u affects a query q with graph pattern G, denoted u ◊ q, if δd(u) ≠ ∅ and there exists a triple pattern t ∈ G that can be unified with a triple of δd(u).
Definition (Valid Rewriting). Let q be a query expressed over O1 and us a sequence of change operations such that us(O1) = O2. A query q' is a valid rewriting of q over O2 using us if, for every ui ∈ us with ui ◊ q, it holds that |δa(ui)| > 0 and every triple t ∈ δd(ui) unifies with some triple pattern of q; q' is then constructed as q' := (q \ δd(ui)) ∪ δa(ui). (See the code sketch after the preliminary evaluation below.)

4.2 Query Answering Semantics
Definition (Equivalent query rewriting) (Lenzerini, 2002). Let O1, O2 be two ontology versions, E a set of dependencies over O1 ∪ O2, and q2 an O2-query. An equivalent rewriting of q2 in the presence of E is an O1-query q1 such that q1 gives the same answers as q2 on any O1 instance that satisfies E.
Theorem. Valid rewritings are equivalent query rewritings and can be computed in O(N·T) time, where N is the number of change operations in us and T the number of triple patterns in G.

4.3 Results
Proposition (Uniqueness). Valid rewritings are unique.
Proposition (Inverse Query Rewriting). If q2 is a query over O2 and E is the evolution log from O1 to O2, an equivalent rewriting of q2 over O1 can be produced by computing the valid rewriting of q2 using the sequence of inverted changes of E.

4.3 Example
[Figure: the initial query retrieves ?NAME, ?SSN and ?Address of a Person through fullname, ssn and address; the rewritten query retrieves ?NAME and ?SSN through name and ssn, and obtains ?STREET and ?CITY via Actor, has_cont_point and Cont. Point.]

5. Problems & Solutions
[Figure: a query for ?NAME, ?SSN and ?Address over a class that is deleted between the two versions, while its parent class Person retains the properties fullname, ssn, address and has_cont_point.]
Problem identification: a class used in the query is deleted, but a parent class exists that maintains all of its properties.
Problem resolution: use that parent class to provide more general answers.

6.1 System Architecture
Prototype built on top of dlvhex (Polleres, 2007).
[Figure: system architecture of the prototype.]

6.2 Source Rewriter
• Traditionally, the problem is to find the maximally-contained rewriting of a single user query. Algorithms: MiniCon (Pottinger, 2001), Bucket, Inverse Rules.
• Here we have several queries, one for each ontology version, and information may need to be combined across ontology versions.

6.3 Source Rewriter
• Reuse the best algorithm for finding maximally-contained rewritings (MiniCon), but adapt it to multiple queries.
• Properties of the algorithm: sound and complete; complexity O(q·(n·m·M)^n), where q is the number of valid rewritings, n the number of subgoals in the largest query, m the maximal number of subgoals in a view, and M the number of mappings.

Algorithm 3.3: EDI-MiniCon(Q, M)
Input: Q, a set of datalog queries; M, the mappings
Output: MQ, the set of maximally-contained rewritings
1. Initialize MCD := {}, MQ := {}
2. For each qj in Q:
3.     MCDj := FormMCDs(qj, M)
4.     Add MCDj to MCD
5. For each qj in Q:
6.     mqj := CombineMCDs(MCD, qj)
7.     Add mqj to MQ
8. Return MQ

7. Preliminary Evaluation
• Ontology: CIDOC-CRM, with 80 classes and 250 properties; 726 changes between v3.2.1 and v4.2 (01.02.2002 to 01.06.2005).
• [Pie charts: distribution of the changes into additions, deletions and restructuring (66%, 21%, 13%), for v3.2.1 → v4.2 and for v4.2 → v3.2.1.]
• Adding and restructuring information does not affect valid rewritings; deleting information, however, does.
• Queries: 50 real user queries from 3D-COFORM.
• In general, assuming queries formulated over CIDOC v4.2, we would be able to rewrite 89% of them.
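As a companion to the definitions in Section 4.1, the following is a minimal Python sketch of the affecting-change test and the valid-rewriting construction. It is illustrative only, not the authors' implementation: unification is simplified to "a variable, written with a leading ?, matches anything", and the names unifies, affects and valid_rewriting, as well as the reuse of the Change class from the earlier sketch, are assumptions made here.

    # Minimal sketch (illustrative) of u ◊ q and of q' := (q \ δd(u_i)) ∪ δa(u_i).
    # A query is represented by its graph pattern, i.e. a set of triple patterns.

    def unifies(pattern, triple) -> bool:
        # A triple pattern unifies with a triple if every non-variable position matches.
        return all(p.startswith("?") or p == t for p, t in zip(pattern, triple))

    def affects(u, graph_pattern) -> bool:
        # u ◊ q: δd(u) ≠ ∅ and some triple pattern of q unifies with a triple of δd(u)
        return bool(u.delta_del) and any(
            unifies(g, t) for g in graph_pattern for t in u.delta_del)

    def valid_rewriting(graph_pattern, evolution_log):
        """Return the graph pattern of q' over O2, or None when no valid rewriting
        exists (some affecting change has |δa(u_i)| = 0, i.e. it only deletes)."""
        q = set(graph_pattern)
        for u in evolution_log:
            if not affects(u, q):
                continue                      # u does not affect q; nothing to do
            if not u.delta_add:
                return None                   # affected by a pure deletion: no valid rewriting
            affected = {g for g in q if any(unifies(g, t) for t in u.delta_del)}
            q = (q - affected) | u.delta_add  # q' := (q \ δd(u_i)) ∪ δa(u_i)
        return q

For instance, the pattern ("?p", "has_gender", "?g") is affected by the Delete change u1 of the earlier sketch, whose δa is empty, so valid_rewriting returns None for a query that uses has_gender. The loop inspects each change once against the triple patterns of the query, in line with the O(N·T) bound stated in Section 4.2.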
7.4 Problems: Fiction or Reality?
[Figure: two ontology versions over time, both containing classes A and B; between them class D is deleted and class C is added, and vice versa in the other direction (Del D / Add C, Add D / Del C).]
• It makes no sense to search for C in previous versions where it did not exist.
• In practice, we can provide access to 99% of the source information through valid rewritings.
• In general, assuming queries over CIDOC v4.2, we would be able to rewrite 89% of them to v3.2.1.

7.2 Average Running Time
Average running time: 0.06 msec.

8.1 Advantages of our Approach
• We do not rewrite all the mappings, only the query, exploiting the locality of the query.
• Mappings are produced once and can be validated by domain experts, which greatly reduces human effort and time spent.
• The approach works independently of the family of mappings to the sources (GAV, LAV, GLAV, nested, etc.).
• The mappings to the sources are not affected at all, so they maintain their initial semantics.
• Modularity & scalability: new mappings or ontology changes can be defined independently.
• We use high-level changes to model ontology evolution: they can model complex evolution, they reduce the size of the evolution log, and they can be provided efficiently for any two ontology versions.

8.2 Advantages of our Approach
Valid rewritings:
• We define the query answering semantics in this setting.
• A precise criterion exists for deciding when valid rewritings can be computed, with low complexity.
• Even when no valid rewriting exists, more general answers can be produced and the user can be guided in mapping redefinition.
Computing source rewritings:
• The increase in computational complexity is linear in the number of input queries, and the approach remains scalable.

8.3 Conclusions
• Ontology evolution is a reality and data integration systems should be aware of it.
• We have shown how to answer queries over multiple ontology versions.
• To the best of our knowledge, no system today is capable of query answering over multiple ontology versions.
Future work:
• More extensive evaluation using the Gene Ontology.
• Semantic infrastructure for plugIT.
• Integration of our system with Protégé and the MASTRO system.
• Extension of our approach to OWL variants.
• Consideration of RDF sources and their evolution as well.

References
1. Philip A. Bernstein, Todd J. Green, Sergey Melnik, Alan Nash: Implementing Mapping Composition. VLDB Journal 17(2): 333-353 (2008)
2. Vicky Papavassiliou, Giorgos Flouris, Irini Fundulaki, Dimitris Kotzinos, Vassilis Christophides: On Detecting High-Level Changes in RDF/S KBs. ISWC 2009: 473-488
3. Maurizio Lenzerini: Data Integration: A Theoretical Perspective. PODS 2002: 233-246
4. Rachel Pottinger, Alon Y. Halevy: MiniCon: A Scalable Algorithm for Answering Queries Using Views. VLDB Journal 10(2-3): 182-198 (2001)
5. Axel Polleres: From SPARQL to Rules (and Back). WWW 2007: 787-796
6. Yannis Tzitzikas, Dimitris Kotzinos: (Semantic Web) Evolution through Change Logs: Problems and Solutions. Artificial Intelligence and Applications 2007: 654-659
7. Yannis Velegrakis, Renée J. Miller, Lucian Popa, John Mylopoulos: ToMAS: A System for Adapting Mappings while Schemas Evolve. ICDE 2004: 862

Questions?