Magic Sets and their Application to Data Integration

Wolfgang Faber, Gianluigi Greco, Nicola Leone
Department of Mathematics
University of Calabria, Italy
{faber,greco,leone}@mat.unical.it
Magic Sets and their Application to Data Integration – p. 1
Roadmap
Motivation: Data Integration
Datalog¬
Modularity Results
Magic Sets
Some Experiments
Conclusions
Research Context
EU-funded project: INFOMIX
Data Integration
Advanced System Dealing with Incomplete
and Inconsistent Information
Builds on Datalog system DLV
http://www.dlvsystem.com
Univ. Calabria (Leone, Faber et al.),
Univ. Rome (Lenzerini, Rosati et al.),
TU Vienna (Eiter, Gottlob et al.),
Rodan (Staniszkis et al.)
Context: Data Integration
Data integration system
I = ⟨G, S, M⟩:
G = ⟨Ψ, Σ⟩ global (relational) scheme
– Ψ relation schemes, Σ integrity constraints,
S = ⟨Ψ′, ∅⟩ (relational) schema of the sources,
M mapping between G and S.
Context: Data Integration
Users issue queries on the global schema,
and the system automatically retrieves data
from the sources. But:
Data stored in sources may violate global
constraints
Retrieved data might be inconsistent.
Techniques for database repairing are
needed.
In many settings: co-NP
Datalog¬ for Repairing Data
Idea: Given a data integration system I,
construct a Datalog¬ program Π(I) whose
stable models are in one-to-one
correspondence with repairs of I.
The Cautious Consequences of Π(I)
Coincide with the Consistent Query Answers
Datalog¬: Current Situation
Competitive Systems: Bottom-Up
Focus on Models, not Query-Answering
Query Optimization Methods?
Datalog¬ Syntax
Rules:
a :- b_1, ..., b_k, not b_{k+1}, ..., not b_m.
where a, b_1, ..., b_m are atoms
and not denotes default negation.
Intuitive reading:
If b_1, ..., b_k are true, and b_{k+1}, ..., b_m are not true, then a is
true.
Datalog¬ Syntax
Program P:
finite set of safe rules.
Base B_P:
set of all ground atoms constructible
from constants and predicates in P.
Ground Program Ground(P):
set of rules
obtained by applying all possible substitutions
(from variables in P to constants in P) to P.
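The grounding step can be sketched in Python as follows. This is a minimal illustration, not how DLV grounds programs: the tuple encoding of atoms and rules, the convention that uppercase terms are variables, and the names is_var, atoms_of and ground are all assumptions made for this sketch.

```python
from itertools import product

# Assumed encoding: an atom is (predicate, args); a term starting with an
# uppercase letter is a variable, anything else is a constant.
# A rule is (head, positive_body, negative_body).
P1 = [
    (("p", ("X",)), [("e", ("X",))], [("q", ("X",))]),
    (("q", ("X",)), [("e", ("X",))], [("p", ("X",))]),
    (("e", ("1",)), [], []),
]

def is_var(term):
    return term[:1].isupper()

def atoms_of(rule):
    head, pos, neg = rule
    return [head, *pos, *neg]

def substitute(atom, theta):
    """Replace every variable of the atom by its image under theta."""
    pred, args = atom
    return (pred, tuple(theta.get(t, t) for t in args))

def ground(program):
    """Apply all substitutions from variables to the constants of the program."""
    constants = sorted({t for r in program for _, args in atoms_of(r)
                        for t in args if not is_var(t)})
    grounded = []
    for rule in program:
        variables = sorted({t for _, args in atoms_of(rule)
                            for t in args if is_var(t)})
        for values in product(constants, repeat=len(variables)):
            theta = dict(zip(variables, values))
            head, pos, neg = rule
            grounded.append((substitute(head, theta),
                             [substitute(a, theta) for a in pos],
                             [substitute(a, theta) for a in neg]))
    return grounded

GROUND_P1 = ground(P1)
```

With a single constant 1, each rule has exactly one ground instance, so GROUND_P1 contains three ground rules.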
Stable Model Semantics
An interpretation I ⊆ B_P is a model of a program
P if it satisfies all rules in Ground(P).
The reduct P^I of a ground program P (wrt I)
is obtained by
1. deleting all rules whose negative body is false wrt I,
2. deleting the negative body of the other
rules.
The result is a positive ground program.
An interpretation I is a stable model of P iff it is
the least model of Ground(P)^I.
Example
The program P1
p(X) :- e(X), not q(X).
q(X) :- e(X), not p(X). e(1).
has exactly two stable models:
S1 = {p(1), e(1)} and S2 = {q(1), e(1)}
Ground(P1)^S1 = p(1) :- e(1). e(1).
Ground(P1)^S2 = q(1) :- e(1). e(1).
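The reduct-based definition can be checked by brute force on this example. The Python sketch below enumerates every interpretation over the base and keeps those that equal the least model of their reduct. It only illustrates the definition (exponential in the size of the base) and is not how a system like DLV computes stable models; the string encoding of ground atoms and all function names are assumptions of the sketch.

```python
from itertools import combinations

# Ground(P1); a rule is (head, positive_body, negative_body).
GROUND_P1 = [
    ("p(1)", ["e(1)"], ["q(1)"]),
    ("q(1)", ["e(1)"], ["p(1)"]),
    ("e(1)", [], []),
]

def reduct(rules, interp):
    """Delete rules whose negative body is false in interp,
    then delete the negative body of the remaining rules."""
    return [(head, pos) for head, pos, neg in rules
            if not (set(neg) & interp)]

def least_model(positive_rules):
    """Least model of a positive ground program by fixpoint iteration."""
    model, changed = set(), True
    while changed:
        changed = False
        for head, pos in positive_rules:
            if head not in model and set(pos) <= model:
                model.add(head)
                changed = True
    return model

def stable_models(rules):
    """All interpretations I with I = least model of the reduct wrt I."""
    base = sorted({a for h, pos, neg in rules for a in [h, *pos, *neg]})
    found = []
    for size in range(len(base) + 1):
        for candidate in combinations(base, size):
            interp = set(candidate)
            if least_model(reduct(rules, interp)) == interp:
                found.append(interp)
    return found

MODELS_P1 = stable_models(GROUND_P1)
```

For P1 the enumeration yields exactly S1 = {p(1), e(1)} and S2 = {q(1), e(1)}.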
Example
The program P2
z :- t(1), not z.
t(X) :- q(X).
p(X) :- e(X), not q(X).
q(X) :- e(X), not p(X).
e(1).
has exactly one stable model: S1 = {p(1), e(1)}
S2 = {z, q(1), t(1), e(1)} is not a stable model, as Ground(P2)^S2 does
not contain a rule with z in the head (the rule z :- t(1), not z. is
deleted because z ∈ S2).
Note: z :- t(1), not z. acts like an integrity constraint
t(1) ⇒ ⊥, inhibiting any stable model containing t(1).
Brave/Cautious Consequences
A ground atom a is a
brave consequence for P (P |=b a) if a is true in
some stable model of P;
cautious consequence for P (P |=c a) if a is true
in all stable models of P.
Note: If no stable model exists, all atoms in B_P are
cautious consequences, and no atom is a brave
consequence.
Example
p(X) :- e(X), not q(X). q(X) :- e(X), not p(X). e(1).
Stable Models: {p(1), e(1)} and {q(1), e(1)}
Brave consequences: p(1), q(1), e(1),
cautious consequences: e(1).
z :- t(1), not z.
t(X) :- q(X).
p(X) :- e(X), not q(X).
q(X) :- e(X), not p(X).
e(1).
Stable Model: {p(1), e(1)}
Brave and cautious consequences: {p(1), e(1)}.
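Brave and cautious consequences follow directly from the set of stable models: the union and the intersection, respectively. The sketch below recomputes them for the second program by brute-force enumeration. The encoding and the names stable_models and consequences are assumptions of this illustration, and the base is restricted to atoms occurring in the ground rules.

```python
from itertools import combinations

# Ground version of the second program; rule = (head, pos_body, neg_body).
GROUND_P2 = [
    ("z", ["t(1)"], ["z"]),
    ("t(1)", ["q(1)"], []),
    ("p(1)", ["e(1)"], ["q(1)"]),
    ("q(1)", ["e(1)"], ["p(1)"]),
    ("e(1)", [], []),
]

def least_model(positive_rules):
    model, changed = set(), True
    while changed:
        changed = False
        for head, pos in positive_rules:
            if head not in model and set(pos) <= model:
                model.add(head)
                changed = True
    return model

def stable_models(rules):
    """Return (base, stable models) by testing every interpretation."""
    base = sorted({a for h, pos, neg in rules for a in [h, *pos, *neg]})
    models = []
    for size in range(len(base) + 1):
        for candidate in combinations(base, size):
            interp = set(candidate)
            reduct = [(h, pos) for h, pos, neg in rules
                      if not (set(neg) & interp)]
            if least_model(reduct) == interp:
                models.append(interp)
    return base, models

def consequences(rules):
    """Brave = true in some stable model; cautious = true in all of them.
    Without stable models, brave is empty and cautious is the whole base."""
    base, models = stable_models(rules)
    brave = set().union(*models)
    cautious = set(base).intersection(*models)
    return brave, cautious

BRAVE_P2, CAUTIOUS_P2 = consequences(GROUND_P2)
```

For a program without stable models, such as the single rule z :- not z, the same function returns an empty brave set and the whole base as cautious set, matching the note on brave and cautious consequences.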
Queries
Syntax: Query q:
c?
c: atom (with variables)
Brave answers:
Substitutions θ s.t. P |=b qθ
Cautious answers:
Substitutions θ s.t. P |=c qθ
Query Evaluation
Desideratum:
Evaluate only a subprogram relevant
to the query
Implicit in top-down methods.
Not straightforward for query answering
using stable models.
Problem:
Generating subprograms along head → body is
not sufficient.
Example
z :- t(1), not z. t(X) :- q(X).
p(X) :- e(X), not q(X). q(X) :- e(X), not p(X).
e(1).
Generating a subprogram for evaluation of query p(X)?,
moving only along “head to body”, we would produce P′:
p(X) :- e(X), not q(X). q(X) :- e(X), not p(X). e(1).
But then X = 1 is not a cautious answer for P′, while it is for the
original program.
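The effect can be reproduced on the ground program. The sketch below collects the rules reachable from p(1) by following head-to-body dependencies only, then compares cautious entailment of p(1) in the subprogram and in the full program. It is a brute-force illustration; the function names are assumptions of this sketch.

```python
from itertools import combinations

GROUND_P = [
    ("z", ["t(1)"], ["z"]),
    ("t(1)", ["q(1)"], []),
    ("p(1)", ["e(1)"], ["q(1)"]),
    ("q(1)", ["e(1)"], ["p(1)"]),
    ("e(1)", [], []),
]

def head_to_body_subprogram(rules, query_atom):
    """Keep only rules whose head is reachable from the query atom
    when moving from rule heads to rule bodies."""
    reached, todo = set(), [query_atom]
    while todo:
        atom = todo.pop()
        if atom in reached:
            continue
        reached.add(atom)
        for head, pos, neg in rules:
            if head == atom:
                todo.extend([*pos, *neg])
    return [r for r in rules if r[0] in reached]

def least_model(positive_rules):
    model, changed = set(), True
    while changed:
        changed = False
        for head, pos in positive_rules:
            if head not in model and set(pos) <= model:
                model.add(head)
                changed = True
    return model

def stable_models(rules):
    base = sorted({a for h, pos, neg in rules for a in [h, *pos, *neg]})
    models = []
    for size in range(len(base) + 1):
        for candidate in combinations(base, size):
            interp = set(candidate)
            reduct = [(h, pos) for h, pos, neg in rules
                      if not (set(neg) & interp)]
            if least_model(reduct) == interp:
                models.append(interp)
    return models

def cautiously_true(rules, atom):
    return all(atom in model for model in stable_models(rules))

SUBPROGRAM = head_to_body_subprogram(GROUND_P, "p(1)")
```

The subprogram keeps only the rules for p(1), q(1) and e(1). It has the two stable models {p(1), e(1)} and {q(1), e(1)}, so p(1) is no longer a cautious consequence, while it is one in the full program.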
Example
z :- t(1), not z.
t(X) :- q(X).
p(X) :- e(X), not q(X).
q(X) :- e(X), not p(X).
e(1).
z :- t(1), not z. is a rule which should not be dropped
t(1) should be treated like being reached from the query,
hence both rules
z :- t(1), not z.
and
t(X) :- q(X).
should be included in the relevant subprogram.
Dangerous Predicates and Rules
A predicate d is dangerous if
d occurs in a cycle with an odd number of
negations, or
d occurs in the body of a rule with a
dangerous head predicate.
A rule r is dangerous, if its head is dangerous.
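The definition can be turned into a small computation on the predicate dependency graph. The Python sketch below works under stated assumptions: rules are abstracted to predicate names, and "cycle" is read as a closed walk in the dependency graph (edges from head to body predicates, marked when the body literal is negated). The example program is the running example extended with the rules a(X) :- not b(X). b(X) :- not a(X). used in the later Magic Sets example; all function names are hypothetical.

```python
from collections import deque

# Predicate-level abstraction: (head, positive_body, negative_body).
RULES = [
    ("z", ["t"], ["z"]),
    ("t", ["q"], []),
    ("p", ["e"], ["q"]),
    ("q", ["e"], ["p"]),
    ("a", [], ["b"]),
    ("b", [], ["a"]),
]

def odd_cycle_predicates(rules):
    """Predicates lying on a closed walk with an odd number of negations."""
    edges, preds = {}, set()
    for head, pos, neg in rules:
        preds.update([head, *pos, *neg])
        edges.setdefault(head, [])
        edges[head] += [(b, 0) for b in pos] + [(b, 1) for b in neg]
    result = set()
    for start in preds:
        # Breadth-first search over (predicate, parity of negations so far).
        seen = {(start, 0)}
        queue = deque(seen)
        while queue:
            node, parity = queue.popleft()
            for nxt, flip in edges.get(node, []):
                state = (nxt, parity ^ flip)
                if state not in seen:
                    seen.add(state)
                    queue.append(state)
        if (start, 1) in seen:
            result.add(start)
    return result

def dangerous_predicates(rules):
    """Odd-cycle predicates, closed under 'occurs in the body of a rule
    with a dangerous head predicate'."""
    dangerous = odd_cycle_predicates(rules)
    changed = True
    while changed:
        changed = False
        for head, pos, neg in rules:
            if head in dangerous:
                for b in [*pos, *neg]:
                    if b not in dangerous:
                        dangerous.add(b)
                        changed = True
    return dangerous

DANGEROUS = dangerous_predicates(RULES)
```

Only z lies on an odd cycle here. The second condition then propagates from z to t, from t to q, and from q to e and p. The predicates a and b form an even cycle and do not occur in the body of a dangerous rule, so they stay non-dangerous.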
Independent Sets
An independent set for a ground program is a set
S ⊆ B_P such that for each a ∈ S:
if a is the head of rule r then all atoms of r are
in S, and
if a appears in the body of a dangerous rule r
then all atoms of r are in S.
A subprogram T of a program P is a module if T
consists of exactly the rules with head atoms
from S for an independent set S.
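The smallest independent set containing a given atom can be computed as a closure under the two conditions. The sketch below does this for the ground running example extended with ground instances of a(X) :- not b(X). b(X) :- not a(X). The set of dangerous predicates is hard-coded as an assumption (it is what the definition of dangerous predicates gives for this program), and the function names are hypothetical.

```python
# Ground rules: (head, positive_body, negative_body).
GROUND = [
    ("z", ["t(1)"], ["z"]),
    ("t(1)", ["q(1)"], []),
    ("p(1)", ["e(1)"], ["q(1)"]),
    ("q(1)", ["e(1)"], ["p(1)"]),
    ("e(1)", [], []),
    ("a(1)", [], ["b(1)"]),
    ("b(1)", [], ["a(1)"]),
]

# Assumption: dangerous predicates of this program, given as input.
DANGEROUS_PREDICATES = {"z", "t", "q", "e", "p"}

def predicate(atom):
    """Predicate name of a ground atom such as 'p(1)' or 'z'."""
    return atom.split("(")[0]

def smallest_independent_set(rules, seed):
    """Close {seed} under: (1) if an atom is the head of a rule, add all
    atoms of the rule; (2) if an atom occurs in the body of a dangerous
    rule, add all atoms of that rule."""
    indep, todo = set(), [seed]
    while todo:
        atom = todo.pop()
        if atom in indep:
            continue
        indep.add(atom)
        for head, pos, neg in rules:
            is_dangerous = predicate(head) in DANGEROUS_PREDICATES
            if head == atom or (is_dangerous and atom in [*pos, *neg]):
                todo.extend([head, *pos, *neg])
    return indep

def module(rules, indep):
    """The rules whose head atom belongs to the independent set."""
    return [r for r in rules if r[0] in indep]

INDEP = smallest_independent_set(GROUND, "p(1)")
MODULE = module(GROUND, INDEP)
```

Starting from p(1), the closure reaches q(1) and e(1) through rule heads, then t(1) and z through the dangerous rules, so the module contains the five rules of the running example and omits the rules for a(1) and b(1).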
Theorems
Let T be a module of P, and q occur in T .
SM(P)/T ⊆ SM(T).
(T |=c q) ⇒ (P |=c q), and
(T |=b q) ⇐ (P |=b q)
Moreover, if P is consistent, then
SM(T) = SM(P)/T .
(T |=c q) ⇔ (P |=c q), and
(T |=b q) ⇔ (P |=b q).
Evaluation
Optimal: for a query c?, use the smallest
module containing c.
⇒ infeasible
⇒ use an approximating technique:
adaptation of Magic Sets
Magic-Set Method
Given a query q, and a program P
Focuses on the subset of P which is relevant
for q
“Pushes-down” the query constants, to
eliminate rule-instances which cannot
contribute to the derivation of q
Simulates the top-down evaluation of q
Magic-Set Method
Rewrite P in a query-equivalent program P’
1. Adorn P (simulate the binding passing)
2. Generate Magic
(magic rules identify the relevant atoms).
3. Modify P (limit P to the Magic Set)
Modification for Datalog¬
Rule-by-rule processing
Process also dangerous rules
. . . but only for generating magic rules
. . . by swapping head and body, and applying
standard magic generation
Enhanced Magic-Set Algorithm
Input: A Datalog¬ program P, and a query Q = g(t).
Output: The optimized program MS¬(Q, P).
var S: stack of adorned predicates; modifiedRules, magicRules: set of rules;
modifiedRules := ∅; magicRules := BuildQuerySeeds(Q, S);
while S ≠ ∅ do
    p^α := S.pop();
    for each rule r ∈ P with H(r) = p(t_p) do
        r^a := Adorn(r, p^α, S);
        magicRules := magicRules ∪ Generate(r^a);
        modifiedRules := modifiedRules ∪ {Modify(r^a)};
    for each dangerous rule d ∈ P where d is h(t_h) :- q_1(t_1), ..., q_m(t_m) and q_i = p do
        let d_s be the rule q_i(t_i) :- h(t_h), q_1(t_1), ..., q_{i-1}(t_{i-1}), q_{i+1}(t_{i+1}), ..., q_m(t_m);
        let d^a := Adorn(d_s, p^α, S);
        magicRules := magicRules ∪ Generate(d^a);
MS¬(Q, P) := magicRules ∪ modifiedRules;
return MS¬(Q, P);
Magic Sets: Example
e(1). z :- t(1), not z. t(X) :- q(X).
p(X) :- e(X), not q(X). q(X) :- e(X), not p(X).
a(X) :- not b(X). b(X) :- not a(X).
with query p(1)? yields the following
e(1). z :- t(1), not z. t(X) :- magic_t^b(X), q(X).
p(X) :- magic_p^b(X), e(X), not q(X). q(X) :- magic_q^b(X), e(X), not p(X).
magic_p^b(1). magic_t^b(X) :- magic_q^b(X).
magic_q^b(X) :- e(X), magic_p^b(X). magic_p^b(X) :- e(X), magic_q^b(X).
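To see which atoms the magic rules mark as relevant, the magic part of the rewritten program can be evaluated bottom-up. The sketch below handles only the special case needed here (unary predicates, one shared variable per rule). The names magic_p_b, magic_q_b and magic_t_b stand for the adorned magic predicates of the example, and the extra fact e(2) is a hypothetical addition to show the focusing effect; it is not part of the example program.

```python
# Facts as (predicate, constant). e(2) is a hypothetical extra fact.
FACTS = {("e", "1"), ("e", "2"), ("magic_p_b", "1")}

# Magic rules of the example; every rule has the form head(X) :- body_1(X), ...
MAGIC_RULES = [
    ("magic_t_b", ["magic_q_b"]),
    ("magic_q_b", ["e", "magic_p_b"]),
    ("magic_p_b", ["e", "magic_q_b"]),
]

def evaluate(facts, rules):
    """Naive bottom-up fixpoint for positive rules over unary predicates."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        constants = {c for _, c in derived}
        for head, body in rules:
            for c in constants:
                if (head, c) not in derived and all((b, c) in derived for b in body):
                    derived.add((head, c))
                    changed = True
    return derived

DERIVED = evaluate(FACTS, MAGIC_RULES)
MAGIC_ATOMS = {(p, c) for p, c in DERIVED if p.startswith("magic_")}
```

Only magic atoms for the constant 1 are derived, so the modified rules for p, q and t can fire only for X = 1, and the hypothetical e(2) never triggers them.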
Theorem
Let P be a Datalog¬ program, let Q be a query.
Then, it holds that
MS¬(⟨Q, P⟩) ⊆^c_Q P and MS¬(⟨Q, P⟩) ⊇^b_Q P, and
if SM(P) ≠ ∅,
MS¬(⟨Q, P⟩) ≡^b_Q P and MS¬(⟨Q, P⟩) ≡^c_Q P.
Remark: Data integration programs Π(I) always
have stable models, so we obtain query
equivalence for these!
Demo Scenario
EU Project INFOMIX (IST-2001-33570)
Information system of University “La Sapienza” in
Rome.
14 global relations,
29 integrity constraints,
29 relations (in 3 legacy databases) and 12
web wrappers,
More than 24MB of data regarding students,
professors and exams of the University.
Experiments
[Figure: Relative Gain]
Conclusion
Optimization for Datalog¬ with stable models
Important for Data Integration
Modularity results for Datalog¬
Magic Sets for Datalog¬
Positive impact on Data Integration
Application