View-based query answering: on the relationship between rewriting, answering, and losslessness Diego Calvanese Faculty of Computer Science Free University of Bolzano/Bozen Giuseppe De Giacomo, Maurizio Lenzerini Dipartimento di Informatica e Sistemistica Università di Roma “La Sapienza” Moshe Y. Vardi Rice University, Houston ICDT 2005 – Edinburgh, U.K., January 2005 View based query processing Computing the answer to a query by relying solely on a set of views Relevant problem in data integration, data warehousing, query optimization, authorization, etc. Two different approaches: • view based query answering • view based query rewriting D. Calvanese et. al Rewriting, Answering, and Losslessness 1 View based query answering certain answers certQ,V we are interested in answers to Q Q View definition V V1 V2 … Vn Database schema R1 R2 … Rm … View extension E … Database B Open world assumption (sound views): E ⊆ V(B) D. Calvanese et. al Rewriting, Answering, and Losslessness 2 View based query rewriting certain answers certQ,V answers to RmaxQ,V we are interested in answers to Q RmaxQ,V Q View definition V V1 V2 … Vn Database schema R1 R2 … Rm … View extension E … Database B Open world assumption (sound views): E ⊆ V(B) max RQ,V expressed in the “same” language as Q (but on V -symbols) D. Calvanese et. al Rewriting, Answering, and Losslessness 3 Answering vs rewriting • Answering and rewriting coincide in some interesting cases (notably, in the case of conjunctive queries and views – see later) • However, they do not coincide in general, and therefore, it makes sense to compare the query, the rewriting and the certain answers D. Calvanese et. al Rewriting, Answering, and Losslessness 4 The main focus of this work Principles and tools for comparing: • query Q max • maximal rewriting RQ,V of Q wrt views V (a maximal rewriting of Q wrt V is a maximal query R over V such that ∀B ∀E ⊆ V(B) : we have R(E) ⊆ Q(B)) • function (i.e., query) cert Q,V that computes the certain answers to Q wrt views V , given V -extension E (i.e., ~t ∈ cert Q,V (E) iff ∀B : E ⊆ V(B) we have ~t ∈ Q(B)) RmaxQ,V perfectness certQ,V losslessness Q exactness D. Calvanese et. al Rewriting, Answering, and Losslessness 5 Outline 1. Framework 2. Rewriting vs answering 3. Exactness 4. Perfectness 5. Losslessness 6. Conclusions D. Calvanese et. al Rewriting, Answering, and Losslessness 6 Framework Two settings: • Relational dbs – Conjunctive queries and views • Semistructured data: edge labeled graph, with set Σ of basic binary relations (edge labels) on nodes – Queries and views: variants of regular path queries (RPQs) ∗ an RPQ is a regular expression Q over the edge labels ∗ it returns the set of pairs of nodes connected by a path in L(Q) D. Calvanese et. al Rewriting, Answering, and Losslessness 7 Two-way regular path queries (2RPQs) Expressed as regular expression over Σ± = Σ ∪ {p− | p ∈ Σ} (p− denotes the inverse of the binary relation p), e.g., Q = r·(p− + q)·p·p− ·q ∗ Answer over DB: set of pairs of nodes connected by a semipath in DB conforming to the regular expression d1 d3 r DB: p q d5 p D. Calvanese et. al q p r r Q returns d1 d5 d1 d3 .. .. . . Rewriting, Answering, and Losslessness via rqpp− via rp− pp− q 8 Rewriting vs answering for conjunctive queries v1 (T ) = { (T ) | movie(T, Y, D) ∧ european(D) } v2 (T, Z) = { (T, Z) | movie(T, Y, D) ∧ review(T, Z) } The certain answers to Q are computed by evaluating the goal Q wrt this nonrecursive logic program [Abiteboul&Duschka 1998]: movie(T, f1 (T ), f2 (T )) ← v1 (T ) european(f2 (T )) ← v1 (T ) movie(T, f4 (T, Z), f5 (T, Z)) ← v2 (T, Z) review(T, Z)) ← v2 (T, Z) The goal and the logic program can be equivalently transformed into a finite union of conjunctive queries over the view symbols, which is the maximal rewriting of Q wrt V D. Calvanese et. al Rewriting, Answering, and Losslessness 9 Rewriting vs answering for 2RPQs Views V : V1 = d Query Q: df + eg max =∅ RQ,V V2 = e V3 = f + g cert Q,V = { (x, y) | ∃z . x V1 z ∧ x V2 z ∧ z V3 y } x V1 (d) z V3 (f + g) y V2 (e) Furthermore, computing the certain answers is coNP-complete in data complexity, while evaluating the maximal rewriting can be done in polynomial time D. Calvanese et. al Rewriting, Answering, and Losslessness 10 max Exactness: comparing RQ,V and Q RmaxQ,V perfectness certQ,V losslessness Q exactness max The maximal rewriting RQ,V of Q wrt views V is exact if for every max (V(B)) database B we have that Q(B) = RQ,V Exactness means losslessness of rewriting wrt the query (note that exactness = perfectness + losslessness) D. Calvanese et. al Rewriting, Answering, and Losslessness 11 Exactness in the case of conjunctive queries RmaxQ,V perfectness certQ,V losslessness Q exactness max • RQ,V is a union of conjunctive queries over the V -symbols • To check whether such union is equivalent to Q modulo V , it suffices to check whether there is a disjunct in the unfolding max of RQ,V that is equivalent to Q • Checking whether there exists an exact rewriting of a conjunctive query is NP-complete [Halevy&al 1995] D. Calvanese et. al Rewriting, Answering, and Losslessness 12 Exactness in the case of 2RPQs RmaxQ,V perfectness certQ,V losslessness Q exactness From [Calvanese&al 2000]: max • RQ,V can be constructed in 2EXPTIME, via an automata-theoretic approach • To check exactness, we check whether Q is contained in the max unfolding of RQ,V • Checking whether there exists an exact rewriting of a 2RPQ is 2EXPSPACE-complete D. Calvanese et. al Rewriting, Answering, and Losslessness 13 max Perfectness: comparing RQ,V and cert Q,V RmaxQ,V perfectness certQ,V losslessness Q exactness max of Q wrt views V is New notion: The maximal rewriting RQ,V perfect, if for every database B and every view extension E with max (E) E ⊆ V(B) we have that cert Q,V (E) = RQ,V Perfectness means that the maximal rewriting is powerful enough to compute the certain answers max over Perfectness allows us to compute cert Q,V by evaluating RQ,V the view extension D. Calvanese et. al Rewriting, Answering, and Losslessness 14 Perfectness in the case of conjunctive queries RmaxQ,V perfectness certQ,V losslessness Q exactness max is always equivalent to cert Q,V • RQ,V max is always perfect • RQ,V D. Calvanese et. al Rewriting, Answering, and Losslessness 15 Perfectness in the case of 2RPQs Perfectness means max ∀B ∀E ⊆ V(B) : cert Q,V (E) ⊆ RQ,V (E) that is a form of view-based query containment max Q ⊆V RQ,V From [Calvanese&al 2003], this can be checked in NEXPTIME, max is resulting in N3EXPTIME for perfectness, since the size of RQ,V doubly exponential in Q New result: checking perfectness can be done in N2EXPTIME, via characterization of view-based query answering through CSP (lower bound open) D. Calvanese et. al Rewriting, Answering, and Losslessness 16 Losslessness: comparing cert Q,V and Q RmaxQ,V perfectness certQ,V losslessness Q exactness A set of views V is lossless wrt a query Q, if for every database B we have that Q(B) = cert Q,V (V(B)) Losslessness means that the views are powerful enough to precisely answer the query In the case where we have access to B , losslessness allows us to compute cert Q,V by evaluating Q over the database D. Calvanese et. al Rewriting, Answering, and Losslessness 17 Losslessness in the case of conjunctive queries RmaxQ,V perfectness certQ,V losslessness Q exactness max is always equivalent to cert Q,V • RQ,V • Losslessness and exactness coincide, i.e., checking losslessness is NP-complete D. Calvanese et. al Rewriting, Answering, and Losslessness 18 Losslessness in the case of 2RPQs: example Views V : Query Q: V1 = 0 + 1 V2 = 01 V3 = 10 V4 = 000 V5 = 111 010 + 101 + 000 + 111 max RQ,V = V4 + V5 cert Q,V is equivalent to Q V3 database B’ V3 database B x1 0 V1 x2 V2 1 V1 x3 x1 0 V1 a V1 x2 b V1 V2 x4 x1 0 V1 x2 x3 0 V1 x4 1 V1 x4 V3 b V1 x3 V2 D. Calvanese et. al Rewriting, Answering, and Losslessness 19 Loslessness in the case of 2RPQs • In [Calvanese&al 2003] we showed that losslessness is EXPSPACE-complete for RPQs • We show in the paper that perfectness is EXPSPACE-complete also for 2RPQs • To this end, we introduce the notion of linear approximation to the certain answers D. Calvanese et. al Rewriting, Answering, and Losslessness 20 Losslessness in the case of 2RPQs The linear fragment of certain answers clin Q,V for a 2RPQ Q wrt a set V of views is the maximal two-way path query Q0 over Σ such that ∀B : Q0 (B) ⊆ cert Q,V (V(B)) New results: • We have a method for constructing clin Q,V (always a 2RPQ) • We show that losslessness means ∀B Q(B) ⊆ clin Q,V (B) • Checking losslessness is EXPSPACE-complete perfectness RmaxQ,V clinQ,V certQ,V Q losslessness exactness D. Calvanese et. al Rewriting, Answering, and Losslessness 21 The case of lossiness (with perfectness) • In case of losslessness, cert Q,V computes exactly Q (which is equivalent to clin Q,V ), and Q explains cert Q,V at best • In case of lossiness, still, we would like to express cert Q,V in the language of the user (2RPQ over the database) max max – If RQ,V is perfect, then RQ,V is equivalent to clin Q,V , and max explains cert Q,V at best the unfolding of RQ,V perfectness Rmax D. Calvanese et. al Q,V clinQ,V Rewriting, Answering, and Losslessness certQ,V losslessness Q 22 The case of lossiness (without perfectness) max • In case of lossiness and if RQ,V is not perfect, still, we would like to express cert Q,V as a 2RPQ over the database – If cert Q,V is equivalent to clin Q,V , then clin Q,V explains cert Q,V at best (and, if we have access to B , cert Q,V can be computed by evaluating clin Q,V over the database) perfectness Rmax Q,V clinQ,V certQ,V losslessness Q New result: checking whether cert Q,V is equivalent to clin Q,V can be done in N3EXPTIME (lower bound open) D. Calvanese et. al Rewriting, Answering, and Losslessness 23 Conclusions • Answering and rewriting are different notions • Exactness, perfectness and losslessness are different notions • We introduced the concept of “good” approximation of cert Q,V for 2RPQs, i.e., the linear fragment • In our past work, we addressed – answering, via the relationship with CSP – rewriting, via an automata-theoretic approach max The paper also proposes a technique for building RQ,V that reconciles the two approaches (see the proceedings) D. Calvanese et. al Rewriting, Answering, and Losslessness 24 Future work • A few lower bounds open • In the case cert Q,V is not equivalent to clin Q,V , we would like to exhibit a nonlinear counterexample database explaining lossiness to the user • Many open problems with exact views: – perfectness and losslessness in the case of conjunctive max and cert Q,V do not coincide) queries and views (RQ,V – perfectness and losslessness in the case of 2RPQs • More expressive language, i.e., C2RPQs D. Calvanese et. al Rewriting, Answering, and Losslessness 25