Consistent Query Answering Under Inclusion Dependencies
Authors: Loreto Bravo and Leopoldo Bertossi, Carleton University, Canada
Presented by: Zhijun Lin
Advisors: Dr. Hactor Hernandez, Dr. Yuanlin Zhang

Integrity Constraints
Integrity constraints (ICs) describe the valid instances of a database. For example:
- “Every student has a unique ID number”
- “Students can enroll only in the offered courses”
- “No employee can have a salary higher than that of his manager”

Inconsistent databases
An inconsistent database is a database that violates the given integrity constraints. Some reasons for a database to be inconsistent:
- The DBMS does not enforce all ICs.
- Data from different databases has been integrated.
- New constraints are imposed on a pre-existing database.
- Soft or user constraints.

Inconsistent databases
In several cases we do not want to repair the database to restore consistency:
- no permission,
- too expensive,
- temporary inconsistency.
How can we obtain consistent query answers from an inconsistent database?

Example
A database instance r:

Student
Name   Grade
John   90
John   80
Smith  70

IC (functional dependency): $Name \to Grade$.

Example
If only deletions and insertions of whole tuples are allowed, there are two ways to repair the database with minimal changes. Note that Student(Smith, 70) persists in both repairs, whereas Student(John, 90) does not.

Repair 1:          Repair 2:
Student            Student
Name   Grade       Name   Grade
John   90          John   80
Smith  70          Smith  70

Repair
A repair of a database instance r is a database instance r’ that
- is over the same database schema and domain,
- satisfies the ICs,
- differs from r by a set of changes (insertions or deletions of tuples) that is minimal wrt set inclusion.

Consistent Query Answer
A tuple (a1,…,an) is a consistent query answer to a query Q(x1,…,xn) in a database r if it is an answer to Q in every repair of r.

Example
Student(Smith, 70) is a consistent answer. Student(John, 90) is not a consistent answer. For the query asking for the students with a higher grade than Smith, John is a consistent answer, because John has a higher grade than Smith in both repairs.
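The definitions of repair and consistent answer can be checked by brute force on the Student example. The sketch below is only an illustration, not the method of the paper: it assumes the single functional dependency Name → Grade, for which insertions never remove a violation, so the repairs are exactly the maximal consistent subsets of r. The helper names (`satisfies_fd`, `repairs`, `consistent_answers`, `all_students`, `better_than_smith`) are hypothetical.

```python
from itertools import combinations

# Instance r from the example, as a set of Student(name, grade) tuples.
r = {("John", 90), ("John", 80), ("Smith", 70)}

def satisfies_fd(instance):
    """Check the functional dependency Name -> Grade on a collection of tuples."""
    grades = {}
    for name, grade in instance:
        # setdefault stores the first grade seen for a name and returns it afterwards.
        if grades.setdefault(name, grade) != grade:
            return False
    return True

def repairs(instance):
    """Maximal (wrt set inclusion) subsets of the instance that satisfy the FD."""
    consistent = [
        set(subset)
        for k in range(len(instance) + 1)
        for subset in combinations(sorted(instance), k)
        if satisfies_fd(subset)
    ]
    # Keep only the subsets that are not strictly contained in another consistent subset.
    return [s for s in consistent if not any(s < t for t in consistent)]

def consistent_answers(query, instance):
    """Answers that the query returns in every repair."""
    return set.intersection(*(query(rep) for rep in repairs(instance)))

def all_students(db):
    """Query Student(x, y)."""
    return set(db)

def better_than_smith(db):
    """Query: names of students whose grade is higher than a grade of Smith."""
    return {n for n, g in db for m, h in db if m == "Smith" and g > h}

print(repairs(r))
print(consistent_answers(all_students, r))
print(consistent_answers(better_than_smith, r))
```

Enumerating all subsets is exponential in the size of the instance, so this is useful only for checking tiny examples by hand.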
(The two repairs are the ones shown earlier: {Student(John, 90), Student(Smith, 70)} and {Student(John, 80), Student(Smith, 70)}.)

Classes of ICs
Consider two important classes of ICs:
- Universal integrity constraints (UICs)
- Referential integrity constraints (RICs), also known as inclusion dependencies (INDs).

UIC and RIC
A universal integrity constraint has the form

$$\forall \bar{x}_1, \ldots, \bar{x}_n \left[ \bigwedge_{i=1}^{m} P_i(\bar{x}_i) \rightarrow \bigvee_{i=m+1}^{n} P_i(\bar{x}_i) \lor \varphi(\bar{x}_1, \ldots, \bar{x}_n) \right] \quad (2)$$

A referential integrity constraint has the form

$$\forall \bar{x}_1 \, \exists \bar{x}_3 \left[ Q(\bar{x}_1) \rightarrow P(\bar{x}_2, \bar{x}_3) \right] \quad (3)$$

where the $\bar{x}_i$ are sequences of distinct variables, with $\bar{x}_2$ contained in $\bar{x}_1$, and P, Q are database relations.

Example of UIC
The functional dependency “Emp: $id \to dept$” can be expressed as:

$$\forall id \, dept_1 \, dept_2 \, (Emp(id, dept_1) \land Emp(id, dept_2) \rightarrow dept_1 = dept_2)$$

which is equivalent to:

$$\forall id \, dept_1 \, dept_2 \, (\lnot Emp(id, dept_1) \lor \lnot Emp(id, dept_2) \lor dept_1 = dept_2)$$

Example of RIC
Consider the database schema {Emp(id, dept), People(id, name)}. To represent the IND “$Emp[id] \subseteq People[id]$”, which says that employees are people, we use the RIC:

$$\forall id \, dept \, \exists name \, [Emp(id, dept) \rightarrow People(id, name)]$$

which is equivalent to:

$$\forall id \, dept \, (Emp(id, dept) \rightarrow \exists name \, People(id, name))$$

Special treatment of null values
A UIC holds if it is satisfied by the non-null values. {Student(john,90), Student(john,null)} satisfies the UIC $Name \to Grade$.
A RIC is satisfied considering only non-null values for the universally quantified variables and any value for the existentially quantified variables. {Emp(777,CS), People(777,null)} satisfies the RIC $Emp[id] \subseteq People[id]$, and so do {Emp(555,null)} and {Emp(null,cs)}.

Example
Given the database
D = { emp(john,cs), emp(mary,ee), dept(ee), salary(mary,2000) },
UIC: $emp(X,Y) \rightarrow dept(Y)$
RIC: $emp(X,Y) \rightarrow \exists Z \, salary(X,Z)$

Repair 1
  New database instance: { emp(john,cs), salary(john,null), dept(cs), emp(mary,ee), dept(ee), salary(mary,2000) }
  Changes: insert salary(john,null) and dept(cs)
Repair 2
  New database instance: { emp(mary,ee), dept(ee), salary(mary,2000) }
  Changes: delete emp(john,cs)

Use ASP to compute repairs
1.  % dom(a) for all constants a != null.
    dom(john). dom(mary). dom(cs). dom(ee). dom(2000).
2.  emp(john,cs,td).
    emp(mary,ee,td).
    salary(mary,2000,td).
    dept(ee,td).
    % td denotes a database fact.
3.  emp(A,B,t1) :- emp(A,B,td), dom(A), dom(B).
    emp(A,B,t1) :- emp(A,B,ta), dom(A), dom(B).
    % t1 denotes true or becomes true.
    % ta denotes advised to be true.
    % (Also for salary and dept.)

Use ASP to compute repairs
4.  emp(A,B,fa) v dept(B,ta) :- emp(A,B,t1), not dept(B,td), dom(A), dom(B).
    emp(A,B,fa) v dept(B,ta) :- emp(A,B,t1), dept(B,fa), dom(A), dom(B).
    % fa denotes advised to be false.
    % Repair rules for the UIC: emp(X,Y) -> dept(Y).

Use ASP to compute repairs
5.  emp(A,B,fa) v salary(A,null,ta) :- emp(A,B,t1), not aux(A), not salary(A,null,td), dom(A), dom(B).
    aux(A) :- salary(A,Z,td), not salary(A,Z,fa), dom(A), dom(Z).
    aux(A) :- salary(A,Z,ta), dom(A), dom(Z).
    % Repair rules for the RIC: emp(X,Y) -> exists Z salary(X,Z).
    % aux(A) means that salary(A,Z) is in the final database for some Z.

Use ASP to compute repairs
6.  emp(A,B,t2) :- emp(A,B,ta).
    emp(A,B,t2) :- emp(A,B,td), not emp(A,B,fa).
    % t2 denotes true in the repair. (Also for dept and salary.)
7.  % A tuple cannot be both deleted and inserted.
    :- emp(A,B,ta), emp(A,B,fa).
    % (Also for dept and salary.)

Consistent Query Answering
For the query ?emp(X,Y), add the rule
    ans(X,Y) :- emp(X,Y,t2).
to the repair program. If ans(A,B) appears in all stable models, then emp(A,B) is a consistent query answer.

How does it work
The basic idea behind the repair program: if there is a possible violation of the ICs, the program lists the possible repair actions (insertions or deletions of tuples) as a disjunction. Since ASP produces answer sets that are minimal wrt set inclusion, the changes should also be minimal wrt set inclusion, matching the definition of repair.

Problem
Now consider D = { p(a,a) } with the RIC $p(X,Y) \rightarrow \exists Z \, p(Y,Z)$.
Clearly D satisfies the RIC. But the repair program generates a redundant repair, which deletes p(a,a).

Grounded program
    dom(a).
    p(a,a,td).
    p(a,a,t1) :- p(a,a,td), dom(a).
    p(a,a,fa) v p(a,null,ta) :- p(a,a,t1), not aux(a), not p(a,null,td), dom(a).
    aux(a) :- p(a,a,td), not p(a,a,fa), dom(a).
    aux(a) :- p(a,a,ta), dom(a).
    % p(a,a,fa) justifies itself.

Other example with circular justification
RICs: $p(X,Y) \rightarrow \exists Z \, q(Y,Z)$ and $q(X,Y) \rightarrow \exists Z \, p(Y,Z)$, with D = { p(a,b), q(b,a) }.
Program:
    p(a,b,td).
    q(b,a,td).
    p(a,b,fa) v q(b,null,ta) :- p(a,b,t1), not aux1(b).
    q(b,a,fa) v p(a,null,ta) :- q(b,a,t1), not aux2(a).
    aux1(b) :- q(b,a,td), not q(b,a,fa).
    aux2(a) :- p(a,b,td), not p(a,b,fa).

Cyclic / Acyclic RICs
- A set of RICs is said to be acyclic if there is no cycle in the directed graph whose vertices are the relations of the schema and which has an edge from P to R for each RIC $P(X_1) \rightarrow \exists Z \, R(X_2, Z)$. Otherwise it is cyclic.
- Examples of cyclic RICs:
  1. $\forall X Y \, (p(X,Y) \rightarrow \exists Z \, q(Y,Z))$ and $\forall X Y \, (q(X,Y) \rightarrow \exists Z \, p(Y,Z))$.
  2. $\forall X Y \, (p(X,Y) \rightarrow \exists Z \, p(Y,Z))$.
(Figure: the dependency graphs of the two examples, a cycle between p and q, and a self-loop on p.)

Problem
The problem we showed earlier happens only for cyclic RICs. The authors concluded that their repair program generates exactly the repairs for UICs and acyclic RICs. When cyclic RICs are present, the program produces a superset of the set of repairs.
Can we fix the repair program to make it work for cyclic RICs?

New repair program
Our solution is to add constraints that prevent redundant changes. Suppose we have a cyclic set of RICs over {p, q}, and in the old program p(A,B,fa) and q(B,A,fa) justify each other. The repair rules for these RICs in the old program look like:
    p(A,B,fa) v q(B,null,ta) :- G1.     % r1
    q(A,B,fa) v p(B,null,ta) :- G2.     % r2
Assume there is another repair rule (not for a cyclic RIC) that involves p(A,B,fa):
    p(A,B,fa) v H :- G3.                % r3

New repair program
First we rewrite r1-r3 to:
    p(A,B,fa) v q(B,null,ta) :- G1, not other_p_fa(1,A,B).
    p_fa(1,A,B) :- p(A,B,fa), G1, not other_p_fa(1,A,B).
    q(A,B,fa) v p(B,null,ta) :- G2, not other_q_fa(1,A,B).
    q_fa(1,A,B) :- q(A,B,fa), G2, not other_q_fa(1,A,B).
    p(A,B,fa) v H :- G3.
    p_fa(0,A,B) :- p(A,B,fa), G3.

New repair program
Add the following rules:
1.  Suppose we have only one cyclic set of RICs:
    % type(1) marks the repair rules for cyclic RIC violations.
    type(1).
    type(0).
    % type(0) marks the repair rules for other IC violations.
2.  other_p_fa(X,A,B) :- p_fa(Y,A,B), X != Y, type(X), type(Y).
    other_q_fa(X,A,B) :- q_fa(Y,A,B), X != Y, type(X), type(Y).
3.  Deny circular justification:
    :- p_fa(1,A,B), q_fa(1,B,A).

An interesting observation
The main idea of our new program is to avoid circular justification. Our method can also be used when translating ASP programs to SAT. Consider the program
    P = { a :- b.  b :- a. }
Its completion, $Comp(P) = \{ (a \leftrightarrow b), (b \leftrightarrow a) \}$, has two models, {} and {a,b}, while P has one answer set, {}. We want to prevent the circular justification between a and b.

ASP to SAT
Rewrite P to P’:
    a  :- b, not a2.        % a2: another rule makes a true.
    a1 :- a, b, not a2.
    b  :- a, not b2.
    b1 :- b, a, not b2.
    :- a1, b1.
Now
$$Comp(P') = \{ (a \leftrightarrow (b \land \lnot a_2)), (a_1 \leftrightarrow (a \land b \land \lnot a_2)), (b \leftrightarrow (a \land \lnot b_2)), (b_1 \leftrightarrow (b \land a \land \lnot b_2)), \lnot(a_1 \land b_1), \lnot a_2, \lnot b_2 \}$$
which has only one model, {}.

ASP to SAT
Suppose we add the fact a to program P. The rewritten program P’ should then add { a.  a2 :- a. }. Then
$$Comp(P') = \{ a, (a_2 \leftrightarrow a), (a_1 \leftrightarrow (a \land b \land \lnot a_2)), (b \leftrightarrow (a \land \lnot b_2)), (b_1 \leftrightarrow (b \land a \land \lnot b_2)), \lnot(a_1 \land b_1), \lnot b_2 \}$$
Now it has the single model {a, b, a2, b1}, which corresponds to the answer set of P, {a, b}.
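The model counts claimed for the completions can be verified by enumerating truth assignments. The sketch below is an illustrative check only: each completion is written out by hand as a list of Python predicates, it is not produced by a general ASP-to-SAT translator, and the helper name `models` is hypothetical.

```python
from itertools import product

def models(atoms, formulas):
    """Return every assignment satisfying all formulas, as a frozenset of the true atoms."""
    found = []
    for values in product([False, True], repeat=len(atoms)):
        v = dict(zip(atoms, values))
        if all(f(v) for f in formulas):
            found.append(frozenset(a for a in atoms if v[a]))
    return found

# Comp(P) for P = { a :- b.  b :- a. }
comp_p = [
    lambda v: v["a"] == v["b"],                                  # a <-> b
    lambda v: v["b"] == v["a"],                                  # b <-> a
]

# Comp(P') for the rewritten program without the fact a.
comp_p_prime = [
    lambda v: v["a"] == (v["b"] and not v["a2"]),                # a  <-> b & ~a2
    lambda v: v["a1"] == (v["a"] and v["b"] and not v["a2"]),    # a1 <-> a & b & ~a2
    lambda v: v["b"] == (v["a"] and not v["b2"]),                # b  <-> a & ~b2
    lambda v: v["b1"] == (v["b"] and v["a"] and not v["b2"]),    # b1 <-> b & a & ~b2
    lambda v: not (v["a1"] and v["b1"]),                         # constraint :- a1, b1.
    lambda v: not v["a2"],                                       # a2 has no rule
    lambda v: not v["b2"],                                       # b2 has no rule
]

# Comp(P') after adding the fact a and the rule a2 :- a.
comp_p_prime_fact = [
    lambda v: v["a"],                                            # a is a fact
    lambda v: v["a2"] == v["a"],                                 # a2 <-> a
    lambda v: v["a1"] == (v["a"] and v["b"] and not v["a2"]),
    lambda v: v["b"] == (v["a"] and not v["b2"]),
    lambda v: v["b1"] == (v["b"] and v["a"] and not v["b2"]),
    lambda v: not (v["a1"] and v["b1"]),
    lambda v: not v["b2"],
]

atoms = ["a", "b", "a1", "b1", "a2", "b2"]
print(models(["a", "b"], comp_p))
print(models(atoms, comp_p_prime))
print(models(atoms, comp_p_prime_fact))
```

The first completion admits the unsupported model {a, b}, the second admits only the empty model, and the third admits only {a, b, a2, b1}, in line with the slides.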