Magic Sets and their Application to Data Integration

Wolfgang Faber, Gianluigi Greco, Nicola Leone
Department of Mathematics, University of Calabria, Italy
{faber,greco,leone}@mat.unical.it

Roadmap
Motivation: Data Integration
Datalog¬
Modularity Results
Magic Sets
Some Experiments
Conclusions

Research Context
EU-funded project: INFOMIX
Data Integration
Advanced System Dealing with Incomplete and Inconsistent Information
Builds on the Datalog system DLV, http://www.dlvsystem.com
Partners: Univ. Calabria (Leone, Faber et al.), Univ. Rome (Lenzerini, Rosati et al.), TU Vienna (Eiter, Gottlob et al.), Rodan (Staniszkis et al.)

Context: Data Integration
A data integration system is a triple I = ⟨G, S, M⟩, where
G = ⟨Ψ, Σ⟩ is the global (relational) schema, with Ψ the relation schemes and Σ the integrity constraints,
S = ⟨Ψ′, ∅⟩ is the (relational) schema of the sources,
M is the mapping between G and S.

Context: Data Integration
Users issue queries on the global schema, and the system automatically retrieves data from the sources.
But: data stored in the sources may violate the global constraints, so retrieved data might be inconsistent.
Techniques for database repairing are needed.
Complexity in many settings: co-NP

Datalog¬ for Repairing Data
Idea: Given a data integration system I, construct a Datalog¬ program Π(I) whose stable models are in one-to-one correspondence with the repairs of I.
The cautious consequences of Π(I) coincide with the consistent query answers.

Datalog¬: Current Situation
Competitive systems: bottom-up
Focus on models, not on query answering
Query optimization methods?

Datalog¬ Syntax
Rules:
  a :- b1, ..., bk, not bk+1, ..., not bm.
where a, b1, ..., bm are atoms and not denotes default negation.
Intuitive reading: If b1, ..., bk are true and bk+1, ..., bm are not true, then a is true.

Datalog¬ Syntax
Program P: a finite set of safe rules.
Base B_P: the set of all ground atoms constructible from the constants and predicates in P.
Ground program Ground(P): the set of rules obtained by applying all possible substitutions (from variables in P to constants in P) to the rules of P.

Stable Model Semantics
An interpretation I ⊆ B_P is a model of a program P if it satisfies all rules in Ground(P).
The reduct P^I of a ground program P (w.r.t. I) is the positive ground program obtained by
1. deleting all rules whose negative body is false w.r.t. I, and
2. deleting the negative body of the other rules.
An interpretation I is a stable model of P iff it is the least model of Ground(P)^I.

Example
The program P1
  p(X) :- e(X), not q(X).
  q(X) :- e(X), not p(X).
  e(1).
has exactly two stable models: S1 = {p(1), e(1)} and S2 = {q(1), e(1)}.
Ground(P1)^S1 =
  p(1) :- e(1).
  e(1).
Ground(P1)^S2 =
  q(1) :- e(1).
  e(1).

Example
The program P2
  z :- t(1), not z.
  t(X) :- q(X).
  p(X) :- e(X), not q(X).
  q(X) :- e(X), not p(X).
  e(1).
has exactly one stable model: S1 = {p(1), e(1)}.
S2 = {z, q(1), t(1), e(1)} is not a stable model, as the reduct Ground(P2)^S2 does not contain a rule with z in the head. Dropping z does not help: {q(1), t(1), e(1)} violates the rule z :- t(1), not z. and is therefore not even a model.
Note: z :- t(1), not z. acts like an integrity constraint t(1) ⇒ ⊥, inhibiting any stable model containing t(1).

Brave/Cautious Consequences
A ground atom a is a
brave consequence of P (P ⊨_b a) if a is true in some stable model of P,
cautious consequence of P (P ⊨_c a) if a is true in all stable models of P.
Note: If no stable model exists, all atoms in B_P are cautious consequences, and no atom is a brave consequence.

Example
  p(X) :- e(X), not q(X).
  q(X) :- e(X), not p(X).
  e(1).
Stable models: {p(1), e(1)} and {q(1), e(1)}
Brave consequences: p(1), q(1), e(1); cautious consequences: e(1).

  z :- t(1), not z.
  t(X) :- q(X).
  p(X) :- e(X), not q(X).
  q(X) :- e(X), not p(X).
  e(1).
Stable model: {p(1), e(1)}
Brave and cautious consequences: p(1), e(1).

Queries
Syntax: a query q has the form c?, where c is an atom (with variables).
Brave answers: substitutions θ such that P ⊨_b qθ
Cautious answers: substitutions θ such that P ⊨_c qθ

Query Evaluation
Desideratum: evaluate only a subprogram relevant to the query.
This is implicit in top-down methods.
It is not straightforward for query answering using stable models.
Problem: generating subprograms along head → body is not sufficient.

Example
  z :- t(1), not z.
  t(X) :- q(X).
  p(X) :- e(X), not q(X).
  q(X) :- e(X), not p(X).
  e(1).
Generating a subprogram for the evaluation of the query p(X)?, moving only along "head to body", we would produce P′:
  p(X) :- e(X), not q(X).
  q(X) :- e(X), not p(X).
  e(1).
But then 1 is not a cautious answer for P′, while it is for the original program.

Example
  z :- t(1), not z.
  t(X) :- q(X).
  p(X) :- e(X), not q(X).
  q(X) :- e(X), not p(X).
  e(1).
The rule z :- t(1), not z. should not be dropped.
t(1) should be treated as if it were reached from the query, hence both rules
  z :- t(1), not z.
  t(X) :- q(X).
should be included in the relevant subprogram.

Dangerous Predicates and Rules
A predicate d is dangerous if
d occurs in a cycle of the dependency graph with an odd number of negations, or
d occurs in the body of a rule with a dangerous head predicate.
A rule r is dangerous if its head predicate is dangerous.

Independent Sets
An independent set for a ground program P is a set S ⊆ B_P such that for each a ∈ S:
if a is the head of a rule r, then all atoms of r are in S, and
if a appears in the body of a dangerous rule r, then all atoms of r are in S.
A subprogram T of a program P is a module if T consists of exactly the rules with head atoms from S, for some independent set S.

Theorems
Let T be a module of P, and let q occur in T.
SM(P)/T ⊆ SM(T).
(T ⊨_c q) ⇒ (P ⊨_c q), and (T ⊨_b q) ⇐ (P ⊨_b q).
Moreover, if P is consistent (SM(P) ≠ ∅), then
SM(T) = SM(P)/T.
(T ⊨_c q) ⇔ (P ⊨_c q), and (T ⊨_b q) ⇔ (P ⊨_b q).

Evaluation
For a query c? use the smallest module containing c.
Optimal, but infeasible ⇒ use an approximating technique
Adaptation of Magic Sets

Magic-Set Method
Given a query q and a program P, the method
focuses on the subset of P which is relevant for q,
"pushes down" the query constants, to eliminate rule instances which cannot contribute to the derivation of q,
simulates the top-down evaluation of q.

Magic-Set Method
Rewrite P into a query-equivalent program P′:
1. Adorn P (simulate the binding passing).
2. Generate Magic (magic rules identify the relevant atoms).
3. Modify P (limit P to the Magic Set).

Modification for Datalog¬
Rule-by-rule processing
Process also dangerous rules
... but only for generating magic rules
... by swapping head and body, and applying standard magic generation

Enhanced Magic-Set Algorithm
Input: A Datalog¬ program P, and a query Q = g(t).
Output: The optimized program MS¬(Q, P).
var S: stack of adorned predicates; modifiedRules, magicRules: set of rules;

  modifiedRules := ∅; magicRules := BuildQuerySeeds(Q, S);
  while S ≠ ∅ do
    p^α := S.pop();
    for each rule r ∈ P with H(r) = p(t_p) do
      r^a := Adorn(r, p^α, S);
      magicRules := magicRules ∪ Generate(r^a);
      modifiedRules := modifiedRules ∪ {Modify(r^a)};
    for each dangerous rule d ∈ P where d is h(t_h) :- q1(t_1), ..., qm(t_m) and qi = p do
      let d_s be the rule qi(t_i) :- h(t_h), q1(t_1), ..., qi-1(t_i-1), qi+1(t_i+1), ..., qm(t_m);
      let d^a := Adorn(d_s, p^α, S);
      magicRules := magicRules ∪ Generate(d^a);
  MS¬(Q, P) := magicRules ∪ modifiedRules;
  return MS¬(Q, P);

Magic Sets: Example
The program
  e(1).
  z :- t(1), not z.
  t(X) :- q(X).
  p(X) :- e(X), not q(X).
  q(X) :- e(X), not p(X).
  a(X) :- not b(X).
  b(X) :- not a(X).
with query p(1)? yields the following:
  e(1).
  z :- t(1), not z.
  t(X) :- magic_t^b(X), q(X).
  p(X) :- magic_p^b(X), e(X), not q(X).
  q(X) :- magic_q^b(X), e(X), not p(X).
  magic_p^b(1).
  magic_t^b(X) :- magic_q^b(X).
  magic_q^b(X) :- e(X), magic_p^b(X).
  magic_p^b(X) :- e(X), magic_q^b(X).

Theorem
Let P be a Datalog¬ program, and let Q be a query. Then it holds that
MS¬(⟨Q, P⟩) ⊆^c_Q P and MS¬(⟨Q, P⟩) ⊇^b_Q P, and
if SM(P) ≠ ∅, MS¬(⟨Q, P⟩) ≡^b_Q P and MS¬(⟨Q, P⟩) ≡^c_Q P.
Remark: Data integration programs Π(I) always have stable models, so we obtain query equivalence for these!

Demo Scenario
EU Project INFOMIX (IST-2001-33570)
Information system of the University "La Sapienza" in Rome.
14 global relations, 29 integrity constraints,
29 relations (in 3 legacy databases) and 12 web wrappers,
more than 24MB of data regarding students, professors and exams of the University.

Experiments
[Figure: relative gain]

Conclusion
Optimization for Datalog¬ with stable models
Important for Data Integration
Modularity results for Datalog¬
Magic Sets for Datalog¬
Positive impact on Data Integration Application
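
Appendix: checking the examples by brute force
The stable-model, brave/cautious, and magic-set examples of this talk are small enough to verify by exhaustive search. The sketch below is only an illustration under stated assumptions, not how DLV or the magic-set algorithm is implemented: programs are grounded by hand over the single constant 1, atoms are plain strings, adorned predicates such as magic_p^b are written magic_p_b, and all function names (rule, stable_models, brave, cautious) are hypothetical helpers introduced here.

```python
from itertools import combinations

# A ground rule is (head, positive_body, negative_body); atoms are strings.
def rule(head, pos=(), neg=()):
    return (head, frozenset(pos), frozenset(neg))

def atoms_of(program):
    atoms = set()
    for head, pos, neg in program:
        atoms |= {head} | pos | neg
    return atoms

def reduct(program, interpretation):
    # Delete rules whose negative body is false w.r.t. the interpretation,
    # then delete the negative body of the remaining rules.
    return [(h, pos) for h, pos, neg in program if not (neg & interpretation)]

def least_model(positive_program):
    # Least fixpoint of a negation-free ground program.
    model, changed = set(), True
    while changed:
        changed = False
        for head, pos in positive_program:
            if pos <= model and head not in model:
                model.add(head)
                changed = True
    return model

def stable_models(program):
    # Exhaustive search: I is stable iff I is the least model of the reduct.
    atoms = sorted(atoms_of(program))
    found = []
    for k in range(len(atoms) + 1):
        for cand in map(set, combinations(atoms, k)):
            if least_model(reduct(program, cand)) == cand:
                found.append(frozenset(cand))
    return found

def brave(program):
    sms = stable_models(program)
    return set().union(*sms) if sms else set()

def cautious(program):
    sms = stable_models(program)
    # Without stable models, every atom is vacuously a cautious consequence
    # (here restricted to the atoms occurring in the ground program).
    return set.intersection(*map(set, sms)) if sms else atoms_of(program)

# P1 from the talk (it also equals the "head to body" subprogram for p(X)?).
P1 = [rule("e(1)"),
      rule("p(1)", ["e(1)"], ["q(1)"]),
      rule("q(1)", ["e(1)"], ["p(1)"])]

# P2 = P1 plus the constraint-like rule on z and the rule for t.
P2 = P1 + [rule("z", ["t(1)"], ["z"]), rule("t(1)", ["q(1)"])]

# Hand-grounded magic-set rewriting for the query p(1)? (a and b are dropped).
MAGIC = [
    rule("e(1)"),
    rule("z", ["t(1)"], ["z"]),
    rule("t(1)", ["magic_t_b(1)", "q(1)"]),
    rule("p(1)", ["magic_p_b(1)", "e(1)"], ["q(1)"]),
    rule("q(1)", ["magic_q_b(1)", "e(1)"], ["p(1)"]),
    rule("magic_p_b(1)"),
    rule("magic_t_b(1)", ["magic_q_b(1)"]),
    rule("magic_q_b(1)", ["e(1)", "magic_p_b(1)"]),
    rule("magic_p_b(1)", ["e(1)", "magic_q_b(1)"]),
]

print(sorted(cautious(P1)))      # ['e(1)']
print(sorted(cautious(P2)))      # ['e(1)', 'p(1)']
print("p(1)" in cautious(MAGIC)) # True
```

The first two printed lines reproduce the observation that p(1) is a cautious consequence of the full program but not of the subprogram obtained by following only head-to-body dependencies; the third shows that the rewritten program, which keeps the dangerous rule for z, still answers p(1)? cautiously. The search is exponential in the number of ground atoms, so it is only suitable for toy programs like these.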