Consistent Query Answering Under Inclusion Dependencies Presented by: Zhijun Lin

Authors: Loreto Bravo and Leopoldo Bertossi
Carleton University, Canada
Presented by: Zhijun Lin
Advisors: Dr. Hector Hernandez
Dr. Yuanlin Zhang
1
Integrity Constraints

Integrity constraints (ICs) describe the valid instances
of a database.
For example:
- “Every student has a unique ID number”
- “Students can enroll only in the offered courses”
- “No employee can have a salary higher than that of their
manager”
2
Inconsistent databases

Inconsistent database: a database that violates the given
integrity constraints.

Some reasons a database may be inconsistent:
- The DBMS does not enforce all ICs.
- Data is integrated from different databases.
- New constraints are imposed on a pre-existing database.
- Soft or user constraints.
3
Inconsistent databases

In several cases we cannot or do not want to repair the database to
restore consistency:
- no permission.
- too expensive.
- temporary inconsistency.

How can we obtain consistent query answers from an inconsistent
database?
4
Example

A database instance r

Student
Name    Grade
John    90
John    80
Smith   70

IC: (functional dependency) Name → Grade.
5
Example


If only the deletion or insertion of whole tuples is allowed,
there are two ways to repair the database with minimal changes.
Note that Student(Smith, 70) persists in both repairs,
whereas Student(John, 90) does not.

Repair 1: Student
Name    Grade
John    90
Smith   70

Repair 2: Student
Name    Grade
John    80
Smith   70
6
Repair

A repair of a database instance r is a database
instance r’ that
- is over the same database schema and domain,
- satisfies the ICs,
- differs from r by a minimal set of changes (insertions
or deletions of tuples) wrt set inclusion.
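
The definition can be tried out by brute force on the Student example. The sketch below is only an illustration, not the method of the talk: it assumes that for a functional dependency insertions never help, so a repair is a maximal consistent subset of the instance. The function names `violates_fd` and `repairs` are hypothetical.

```python
from itertools import combinations

def violates_fd(instance):
    """Name -> Grade is violated if some name occurs with two different grades."""
    grades = {}
    for name, grade in instance:
        if grades.setdefault(name, grade) != grade:
            return True
    return False

def repairs(instance):
    """Consistent subsets whose set of deleted tuples is minimal wrt set inclusion.

    Assumption: only deletions are considered, which is enough for an FD.
    """
    tuples = sorted(instance)
    consistent = [
        frozenset(combo)
        for k in range(len(tuples) + 1)
        for combo in combinations(tuples, k)
        if not violates_fd(combo)
    ]
    # Minimal deletions correspond to maximal consistent subsets.
    return [s for s in consistent if not any(s < t for t in consistent)]

r = {("John", 90), ("John", 80), ("Smith", 70)}
for repair in repairs(r):
    print(sorted(repair))
```

The enumeration is exponential in the size of the instance, so it is only meant for checking small examples by hand.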
7
Consistent Query Answer

A tuple (a1,…,an) is a consistent query answer to a
query Q (x1,…,xn) in a database r if it is an answer to Q
in every repair of r.
8
Example

Student(Smith, 70) is a
consistent answer.

Student(John, 90) is not a
consistent answer.

For the query asking for students with a higher grade than
Smith, John is a consistent answer, because his grade
exceeds 70 in both repairs.

Repair 1: Student
Name    Grade
John    90
Smith   70

Repair 2: Student
Name    Grade
John    80
Smith   70
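
As a small illustration of the definition, consistent answers can be computed as the intersection of the query answers over all repairs. The two repairs are hard-coded from the example above, and the helper names (`consistent_answers`, `all_students`, `beats_smith`) are hypothetical.

```python
repair_1 = {("John", 90), ("Smith", 70)}
repair_2 = {("John", 80), ("Smith", 70)}

def consistent_answers(query, repairs):
    """Answers returned by `query` in every repair."""
    return set.intersection(*(set(query(rep)) for rep in repairs))

def all_students(rep):
    """Query Student(x, y): return every tuple."""
    return rep

def beats_smith(rep):
    """Names of students whose grade is higher than a grade of Smith."""
    smith_grades = [g for n, g in rep if n == "Smith"]
    return {n for n, g in rep if any(g > s for s in smith_grades)}

print(consistent_answers(all_students, [repair_1, repair_2]))
print(consistent_answers(beats_smith, [repair_1, repair_2]))
```

The second query shows why John is a consistent answer even though neither of his tuples survives in both repairs.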
9
Classes of ICs

Consider two important classes of ICs:
- Universal integrity constraints (UICs)
- Referential integrity constraints (RICs),
also known as inclusion dependencies (INDs).
10
UIC and RIC
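A universal integrity constraint has the form

$$\forall \bar{x}_1 \cdots \bar{x}_n \Big[ \bigwedge_{i=1}^{m} P_i(\bar{x}_i) \rightarrow \bigvee_{i=m+1}^{n} P_i(\bar{x}_i) \vee \varphi(\bar{x}_1, \ldots, \bar{x}_n) \Big] \qquad (2)$$

A referential integrity constraint has the form

$$\forall \bar{x}_1 \, \exists \bar{x}_3 \, \big[ Q(\bar{x}_1) \rightarrow P(\bar{x}_2, \bar{x}_3) \big] \qquad (3)$$

where the $\bar{x}_i$ are sequences of distinct variables, with $\bar{x}_2$ contained
in $\bar{x}_1$, and P, Q are database relations.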
11
Example of UIC

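The functional dependency “Emp: id → dept” can be
expressed as:
$$\forall id \, \forall dept_1 \, \forall dept_2 \, (\mathit{Emp}(id, dept_1) \wedge \mathit{Emp}(id, dept_2) \rightarrow dept_1 = dept_2)$$
which is equivalent to:
$$\forall id \, \forall dept_1 \, \forall dept_2 \, \neg (\mathit{Emp}(id, dept_1) \wedge \mathit{Emp}(id, dept_2) \wedge dept_1 \neq dept_2)$$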
12
Example of RIC

Consider a database schema {Emp(id, dept),
People(id, name)}. To represent the IND
“Emp[id] ⊆ People[id]”, which says that employees are
people, we use the RIC:
$$\forall id \, \forall dept \, \exists name \, [\mathit{Emp}(id, dept) \rightarrow \mathit{People}(id, name)]$$
which is equivalent to:
$$\forall id \, \forall dept \, (\mathit{Emp}(id, dept) \rightarrow \exists name \, \mathit{People}(id, name))$$
13
Special treatment of null-value

A UIC holds if it is satisfied by the non-null values.
{Student(john,90), Student(john,null)} satisfies the UIC Name → Grade.

A RIC is satisfied considering only non-null values for
universally quantified variables and any value for
existentially quantified variables.
{Emp(777,CS), People(777,null)} satisfies the RIC Emp[id] ⊆ People[id],
and so do {Emp(555,null)} and {Emp(null,cs)}.
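
The null-value treatment can be sketched as two small checkers. This is an illustration under stated assumptions: null is modelled as Python `None`, a tuple is skipped as soon as any universally quantified attribute is null, and the function names `satisfies_fd` and `satisfies_ric` are hypothetical.

```python
NULL = None  # assumption: null is represented by None

def satisfies_fd(student):
    """Name -> Grade, checked only on tuples that contain no null."""
    seen = {}
    for name, grade in student:
        if name is NULL or grade is NULL:
            continue  # tuples with nulls cannot cause a violation
        if seen.setdefault(name, grade) != grade:
            return False
    return True

def satisfies_ric(emp, people):
    """Emp[id] ⊆ People[id] under the null-value treatment.

    Only Emp tuples without nulls need a witness; the witness may carry
    any value, including null, in the existentially quantified position.
    """
    people_ids = {pid for pid, _name in people}
    return all(
        eid in people_ids
        for eid, dept in emp
        if eid is not NULL and dept is not NULL
    )

print(satisfies_fd({("john", 90), ("john", NULL)}))
print(satisfies_ric({(777, "CS")}, {(777, NULL)}))
print(satisfies_ric({(555, NULL)}, set()))
print(satisfies_ric({(NULL, "cs")}, set()))
```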
14
Example
Given database: D = { emp(john,cs), emp(mary,ee), dept(ee),
salary(mary,2000) },
UIC: emp(X,Y) → dept(Y)
RIC: emp(X,Y) → ∃Z salary(X,Z)
Repair 1
  New database instance: {emp(john,cs), salary(john,null), dept(cs),
                          emp(mary,ee), dept(ee), salary(mary,2000)}
  Changes: salary(john,null), dept(cs)

Repair 2
  New database instance: {emp(mary,ee), dept(ee), salary(mary,2000)}
  Changes: emp(john,cs)
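
The two repairs can be cross-checked by brute force. The sketch below makes several assumptions that are not part of the slides: the RIC is read as emp(X,Y) → ∃Z salary(X,Z), which is the reading that matches the repair table; the only insertions considered are dept(Y) and salary(X,null) for existing emp tuples; and null is modelled as `None`. The names `consistent` and `repairs` are hypothetical.

```python
from itertools import combinations

NULL = None  # assumption: null is represented by None

D = frozenset({
    ("emp", "john", "cs"), ("emp", "mary", "ee"),
    ("dept", "ee"), ("salary", "mary", 2000),
})

def consistent(db):
    """Check UIC emp(X,Y) -> dept(Y) and RIC emp(X,Y) -> exists Z salary(X,Z)."""
    depts = {f[1] for f in db if f[0] == "dept"}
    paid = {f[1] for f in db if f[0] == "salary"}
    for fact in db:
        if fact[0] != "emp":
            continue
        _, x, y = fact
        if x is NULL or y is NULL:
            continue  # constraints only apply to non-null values
        if y not in depts or x not in paid:
            return False
    return True

def repairs(db):
    """Consistent instances whose symmetric difference with db is minimal."""
    pool = set(db)
    for fact in db:
        if fact[0] == "emp":
            pool.add(("dept", fact[2]))          # candidate fix for the UIC
            pool.add(("salary", fact[1], NULL))  # candidate fix for the RIC
    pool = list(pool)
    options = []
    for k in range(len(pool) + 1):
        for combo in combinations(pool, k):
            inst = frozenset(combo)
            if consistent(inst):
                options.append((inst, inst ^ db))
    return [inst for inst, delta in options
            if not any(other < delta for _, other in options)]

for rep in repairs(D):
    print(sorted(rep, key=repr))
```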
15
Use ASP to compute repairs
1.
dom(john). dom(mary). % for all constants a != null.
dom(cs). dom(ee). dom(2000).
2.
emp(john,cs,td). emp(mary,ee,td).
salary(mary,2000,td).
dept(ee,td).
% td denotes database fact.
3.
emp(A,B,t1):- emp(A,B,td), dom(A),dom(B).
emp(A,B,t1):- emp(A,B,ta), dom(A), dom(B).
% t1 denotes true or becomes true
% ta denotes advised to be true.
% (also for salary and dept).
16
Use ASP to compute repairs
4.
emp(A, B, fa) v dept(B, ta):- emp(A, B, t1), not dept(B, td),
dom(A),dom(B).
emp(A, B, fa) v dept(B, ta):- emp(A, B, t1), dept(B, fa ),
dom(A),dom(B).
% fa denotes advised to be false.
% repair for UIC: emp(X,Y) → dept(Y)
17
Use ASP to compute repairs
5.
emp(A, B, fa) v salary(A, null, ta) :- emp(A, B, t1), not aux(A),
not salary(A, null, td), dom(A), dom(B).
aux(A) :- salary(A, Z, td), not salary(A, Z, fa), dom(A), dom(Z).
aux(A) :- salary(A, Z, ta), dom(A), dom(Z).
% repair for RIC: emp(X,Y) → ∃Z salary(X,Z)
% aux(A) means salary(A,Z) is in the final database for some non-null Z.
18
Use ASP to compute repairs
6.
emp(A, B, t2) :- emp(A, B, ta).
emp(A, B, t2) :- emp(A, B, td), not emp(A, B, fa).
% t2 denotes true in the repair.
% (also for dept and salary).
7.
% A tuple cannot be both deleted and inserted.
:- emp(A, B, ta), emp(A, B, fa).
% (also for dept and salary).
19
Consistent Query Answering
For the query ?emp(X,Y),
add the rule
ans(X,Y) :- emp(X,Y,t2).
to the repair program. If ans(A,B) appears in all
stable models, then emp(A,B) is a consistent query answer.
20
How does it work
The basic idea behind the repair program:
For each possible violation of the ICs, a rule lists the possible
repair actions (insertion or deletion of tuples) as a disjunction.
Since ASP produces answer sets that are minimal wrt
set inclusion, the changes should also be minimal wrt set
inclusion, matching the definition of repair.
21
Problem
Now consider
D = { p(a,a) },
RIC: p(X,Y) → ∃Z p(Y,Z)
Clearly D satisfies the RIC.
But the repair program will also generate a spurious repair,
which deletes p(a,a).
22
Grounded program
dom(a).
p(a,a,td).
p(a,a,t1):- p(a,a,td), dom(a).
p(a,a,fa) v p(a,null,ta):- p(a,a,t1), not aux(a),
not p(a,null,td),dom(a).
aux(a):- p(a,a,td), not p(a,a,fa), dom(a).
aux(a):- p(a,a,ta), dom(a).
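% p(a,a,fa) justifies itself: assuming it blocks aux(a), which keeps the
% body of the disjunctive rule true and so supports p(a,a,fa).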
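
The self-justification can be checked with a tiny brute-force answer set enumerator. This is only a sketch for small ground programs, not a replacement for an ASP solver. A rule is written as a hypothetical triple (head atoms, positive body, negated body), and an answer set is taken to be a minimal model of the Gelfond-Lifschitz reduct.

```python
from itertools import combinations

def subsets(atoms):
    """All subsets of a collection of atoms, smallest first."""
    atoms = sorted(atoms)
    for k in range(len(atoms) + 1):
        for combo in combinations(atoms, k):
            yield frozenset(combo)

def reduct(rules, m):
    """Drop rules blocked by m, then drop the remaining negative literals."""
    return [(head, pos) for head, pos, neg in rules if not set(neg) & m]

def is_model(pos_rules, m):
    """Every rule whose body holds in m has some head atom in m."""
    return all(not set(pos) <= m or set(head) & m for head, pos in pos_rules)

def answer_sets(rules):
    atoms = {a for head, pos, neg in rules for a in (*head, *pos, *neg)}
    found = []
    for m in subsets(atoms):
        red = reduct(rules, m)
        if is_model(red, m) and not any(
            is_model(red, s) for s in subsets(m) if s != m
        ):
            found.append(m)
    return found

# The grounded program from the slide, one triple per rule.
rules = [
    (["dom(a)"], [], []),
    (["p(a,a,td)"], [], []),
    (["p(a,a,t1)"], ["p(a,a,td)", "dom(a)"], []),
    (["p(a,a,fa)", "p(a,null,ta)"], ["p(a,a,t1)", "dom(a)"],
     ["aux(a)", "p(a,null,td)"]),
    (["aux(a)"], ["p(a,a,td)", "dom(a)"], ["p(a,a,fa)"]),
    (["aux(a)"], ["p(a,a,ta)", "dom(a)"], []),
]

for model in answer_sets(rules):
    print(sorted(model))
```

One answer set contains aux(a) and leaves D unchanged; the other contains p(a,a,fa), which is the spurious deletion.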
23
Other example with Circular justification
p(X,Y) → ∃Z q(Y,Z), q(X,Y) → ∃Z p(Y,Z),
D = {p(a,b), q(b,a)}
program:
p(a,b,td). q(b,a,td).
p(a,b,fa) v q(b,null,ta) :- p(a,b,t1), not aux1(b).
q(b,a,fa) v p(a,null,ta) :- q(b,a,t1), not aux2(a).
aux1(b) :- q(b,a,td), not q(b,a,fa).
aux2(a) :- p(a,b,td), not p(a,b,fa).
24
Cyclic / Acyclic RICs
- A set of RICs is said to be acyclic if there is no
  cycle in the directed graph whose vertices
  correspond to the relations in the schema and in which an edge
  from P to R corresponds to a RIC P(X1) → ∃Z
  R(X2,Z). Otherwise it is cyclic.
- Examples of cyclic RIC(s):
  1. ∀XY (p(X,Y) → ∃Z q(Y,Z)).
     ∀XY (q(X,Y) → ∃Z p(Y,Z)).
  2. ∀XY (p(X,Y) → ∃Z p(Y,Z)).
[Figure: dependency graphs for example 1 (nodes p, q) and example 2 (node p)]
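
The acyclicity test is ordinary cycle detection on the dependency graph. A minimal sketch, assuming each RIC P(X1) → ∃Z R(X2,Z) is given as an edge (P, R); the function name `is_cyclic` is hypothetical.

```python
def is_cyclic(edges):
    """Return True if the directed graph given by (source, target) pairs has a cycle."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, set()).add(dst)
        graph.setdefault(dst, set())

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = dict.fromkeys(graph, WHITE)

    def visit(node):
        color[node] = GREY
        for nxt in graph[node]:
            if color[nxt] == GREY:
                return True  # back edge: a cycle passes through nxt
            if color[nxt] == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in graph)

print(is_cyclic([("p", "q"), ("q", "p")]))  # example 1
print(is_cyclic([("p", "p")]))              # example 2
print(is_cyclic([("emp", "salary")]))       # RIC from the earlier emp/salary example
```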
25
Problem
The problem shown earlier arises only for cyclic
RICs.
The authors concluded that their repair program
generates exactly the repairs for UICs and acyclic RICs.
When cyclic RICs are present, the program
produces a superset of the set of repairs.
Can we fix the repair program to make it work for cyclic
RICs?
26
New repair program
Our solution is to add constraints that prevent redundant
changes.
Suppose we have the cyclic RIC set {p, q}, and in the old
program p(A,B,fa) and q(B,A,fa) justify each other.
The repair rules for these RICs in the old program look like:
p(A,B,fa) v q(B,null,ta) :- G1. -- r1
q(A,B,fa) v p(B,null,ta) :- G2. -- r2
Assume there is also another repair rule (not for a cyclic
RIC) that involves p(A,B,fa):
p(A,B,fa) v H :- G3. -- r3
27
New repair program
First we rewrite r1-r3 to:
p(A,B,fa) v q(B,null,ta):- G1, not other_p_fa(1,A,B).
p_fa(1,A,B):- p(A,B,fa), G1, not other_p_fa(1,A,B).
q(A,B,fa) v p(B,null,ta):- G2, not other_q_fa(1,A,B).
q_fa(1,A,B):- q(A,B,fa), G2, not other_q_fa(1,A,B).
p(A,B,fa) v H :- G3.
p_fa(0,A,B):- p(A,B,fa), G3.
28
New repair program
Add the following rules:
1. Suppose we have only one cyclic RIC set:
type(1). % repair rules for cyclic RIC violations.
type(0). % repair rules for other IC violations.
2. other_p_fa(X,A,B) :- p_fa(Y,A,B), X!=Y, type(X), type(Y).
other_q_fa(X,A,B) :- q_fa(Y,A,B), X!=Y, type(X), type(Y).
3. Deny circular justification:
:- p_fa(1,A,B), q_fa(1,B,A).
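
The effect of the added constraint can be checked by brute force on D = {p(a,b), q(b,a)}. The sketch below is a simplified grounding and rests on assumptions: G1 and G2 are taken to be the bodies of the earlier circular example, the t1 atoms are folded into td, and the other_p_fa and other_q_fa bookkeeping is omitted because this instance has no type-0 rule. Rules are hypothetical triples (head, positive body, negated body).

```python
from itertools import combinations

def subsets(atoms):
    atoms = sorted(atoms)
    for k in range(len(atoms) + 1):
        for combo in combinations(atoms, k):
            yield frozenset(combo)

def reduct(rules, m):
    """Gelfond-Lifschitz reduct of the ground program with respect to m."""
    return [(head, pos) for head, pos, neg in rules if not set(neg) & m]

def is_model(pos_rules, m):
    # A constraint (empty head) is satisfied only if its body fails.
    return all(not set(pos) <= m or set(head) & m for head, pos in pos_rules)

def answer_sets(rules):
    atoms = {a for head, pos, neg in rules for a in (*head, *pos, *neg)}
    return [
        m for m in subsets(atoms)
        if is_model(reduct(rules, m), m)
        and not any(is_model(reduct(rules, m), s) for s in subsets(m) if s != m)
    ]

old_rules = [
    (["p(a,b,td)"], [], []),
    (["q(b,a,td)"], [], []),
    (["p(a,b,fa)", "q(b,null,ta)"], ["p(a,b,td)"], ["aux1(b)"]),  # r1
    (["q(b,a,fa)", "p(a,null,ta)"], ["q(b,a,td)"], ["aux2(a)"]),  # r2
    (["aux1(b)"], ["q(b,a,td)"], ["q(b,a,fa)"]),
    (["aux2(a)"], ["p(a,b,td)"], ["p(a,b,fa)"]),
]

new_rules = old_rules + [
    # Record that a deletion was produced by a cyclic-RIC rule (type 1).
    (["p_fa(1,a,b)"], ["p(a,b,fa)", "p(a,b,td)"], ["aux1(b)"]),
    (["q_fa(1,b,a)"], ["q(b,a,fa)", "q(b,a,td)"], ["aux2(a)"]),
    # Deny circular justification.
    ([], ["p_fa(1,a,b)", "q_fa(1,b,a)"], []),
]

print(len(answer_sets(old_rules)), "answer sets before the rewriting")
print(len(answer_sets(new_rules)), "answer set after the rewriting")
```

On this instance the old rules admit both the unchanged database and the one that deletes p(a,b) and q(b,a); with the constraint only the unchanged database remains.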
29
An interesting observation
The main idea of our new program is to avoid circular
justification. Our method can also be used in the ASP-to-SAT
translation process.
Consider the program: P = { a :- b. b :- a. }
Its completion, Comp(P) = {(a ↔ b), (b ↔ a)}, has two models,
{} and {a,b}, while P has one answer set, {}.
We want to prevent circular justification between a and b.
30
ASP to SAT
Rewrite P to P’:
a :- b, not a2. % a2 -- other rule makes ‘a’ true.
a1:- a, b, not a2.
b:- a, not b2.
b1:- b, a, not b2.
:- a1, b1.
Now comp(P’) = { (a ↔ (b ∧ ¬a2)), (a1 ↔ (a ∧ b ∧ ¬a2)),
(b ↔ (a ∧ ¬b2)), (b1 ↔ (b ∧ a ∧ ¬b2)),
¬(a1 ∧ b1), ¬a2, ¬b2 }
which has only one model {}.
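
Both model counts can be confirmed by enumerating truth assignments. In this sketch the completions are transcribed by hand as Python predicates; `models`, `comp_p` and `comp_p_prime` are hypothetical names.

```python
from itertools import product

def models(atoms, formula):
    """All assignments over `atoms` satisfying `formula`, as sets of true atoms."""
    found = []
    for values in product([False, True], repeat=len(atoms)):
        v = dict(zip(atoms, values))
        if formula(v):
            found.append({a for a in atoms if v[a]})
    return found

def comp_p(v):
    # Comp(P) = {(a <-> b), (b <-> a)}
    return v["a"] == v["b"]

def comp_p_prime(v):
    return (
        v["a"] == (v["b"] and not v["a2"])
        and v["a1"] == (v["a"] and v["b"] and not v["a2"])
        and v["b"] == (v["a"] and not v["b2"])
        and v["b1"] == (v["b"] and v["a"] and not v["b2"])
        and not (v["a1"] and v["b1"])   # from the constraint :- a1, b1.
        and not v["a2"]                 # a2 has no rule
        and not v["b2"]                 # b2 has no rule
    )

print(models(["a", "b"], comp_p))
print(models(["a", "b", "a1", "b1", "a2", "b2"], comp_p_prime))
```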
31
ASP to SAT
Suppose we add the fact {a} to program P; the rewritten P’ should then add
{ a. a2 :- a. }
Then comp(P’) = { a, (a2 ↔ a), (a1 ↔ (a ∧ b ∧ ¬a2)),
(b ↔ (a ∧ ¬b2)), (b1 ↔ (b ∧ a ∧ ¬b2)),
¬(a1 ∧ b1), ¬b2 }
Now it has a single model {a, b, a2, b1}, which corresponds to the answer set of P,
{a, b}.
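
The same kind of enumeration confirms the single model once the fact is added. The completion is again transcribed by hand, and the names `models` and `comp_with_fact` are hypothetical.

```python
from itertools import product

def models(atoms, formula):
    """All assignments over `atoms` satisfying `formula`, as sets of true atoms."""
    found = []
    for values in product([False, True], repeat=len(atoms)):
        v = dict(zip(atoms, values))
        if formula(v):
            found.append({a for a in atoms if v[a]})
    return found

def comp_with_fact(v):
    return (
        v["a"]                          # a is now a fact
        and v["a2"] == v["a"]           # a2 :- a.
        and v["a1"] == (v["a"] and v["b"] and not v["a2"])
        and v["b"] == (v["a"] and not v["b2"])
        and v["b1"] == (v["b"] and v["a"] and not v["b2"])
        and not (v["a1"] and v["b1"])
        and not v["b2"]
    )

found = models(["a", "b", "a1", "b1", "a2", "b2"], comp_with_fact)
print(found)
# Projecting onto the atoms of the original program P gives its answer set.
print([m & {"a", "b"} for m in found])
```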
32
THE END
33