View-based query answering: on the relationship between rewriting, answering, and losslessness

advertisement
View-based query answering: on the relationship
between rewriting, answering, and losslessness
Diego Calvanese
Faculty of Computer Science
Free University of Bolzano/Bozen
Giuseppe De Giacomo, Maurizio Lenzerini
Dipartimento di Informatica e Sistemistica
Università di Roma “La Sapienza”
Moshe Y. Vardi
Rice University, Houston
ICDT 2005 – Edinburgh, U.K., January 2005
View based query processing
Computing the answer to a query by relying solely on a set of
views
Relevant problem in data integration, data warehousing, query
optimization, authorization, etc.
Two different approaches:
• view based query answering
• view based query rewriting
D. Calvanese et. al
Rewriting, Answering, and Losslessness
1
View based query answering
certain
answers
certQ,V
we are
interested in
answers
to Q
Q
View definition V
V1 V2 … Vn
Database schema
R1 R2 … Rm
…
View extension E
…
Database B
Open world assumption (sound views): E ⊆ V(B)
D. Calvanese et. al
Rewriting, Answering, and Losslessness
2
View based query rewriting
certain
answers
certQ,V
answers
to RmaxQ,V
we are
interested in
answers
to Q
RmaxQ,V
Q
View definition V
V1 V2 … Vn
Database schema
R1 R2 … Rm
…
View extension E
…
Database B
Open world assumption (sound views): E ⊆ V(B)
max
RQ,V
expressed in the “same” language as Q (but on V -symbols)
D. Calvanese et. al
Rewriting, Answering, and Losslessness
3
Answering vs rewriting
• Answering and rewriting coincide in some interesting cases
(notably, in the case of conjunctive queries and views – see
later)
• However, they do not coincide in general, and therefore, it
makes sense to compare the query, the rewriting and the
certain answers
D. Calvanese et. al
Rewriting, Answering, and Losslessness
4
The main focus of this work
Principles and tools for comparing:
• query Q
max
• maximal rewriting RQ,V
of Q wrt views V
(a maximal rewriting of Q wrt V is a maximal query R over V such
that ∀B ∀E ⊆ V(B) : we have R(E) ⊆ Q(B))
• function (i.e., query) cert Q,V that computes the certain answers to
Q wrt views V , given V -extension E
(i.e., ~t ∈ cert Q,V (E) iff ∀B : E ⊆ V(B) we have ~t ∈ Q(B))
RmaxQ,V
perfectness
certQ,V
losslessness
Q
exactness
D. Calvanese et. al
Rewriting, Answering, and Losslessness
5
Outline
1. Framework
2. Rewriting vs answering
3. Exactness
4. Perfectness
5. Losslessness
6. Conclusions
D. Calvanese et. al
Rewriting, Answering, and Losslessness
6
Framework
Two settings:
• Relational dbs
– Conjunctive queries and views
• Semistructured data: edge labeled graph, with set Σ of basic
binary relations (edge labels) on nodes
– Queries and views: variants of regular path queries
(RPQs)
∗ an RPQ is a regular expression Q over the edge labels
∗ it returns the set of pairs of nodes connected by a path
in L(Q)
D. Calvanese et. al
Rewriting, Answering, and Losslessness
7
Two-way regular path queries (2RPQs)
Expressed as regular expression over Σ± = Σ ∪ {p− | p ∈ Σ}
(p− denotes the inverse of the binary relation p), e.g.,
Q = r·(p− + q)·p·p− ·q ∗
Answer over DB: set of pairs of nodes connected by a semipath in
DB conforming to the regular expression
d1
d3
r
DB:
p
q
d5
p
D. Calvanese et. al
q
p
r
r
Q returns
d1 d5
d1 d3
.. ..
. .
Rewriting, Answering, and Losslessness
via rqpp−
via rp− pp− q
8
Rewriting vs answering for conjunctive queries
v1 (T )
= { (T ) | movie(T, Y, D) ∧ european(D) }
v2 (T, Z) = { (T, Z) | movie(T, Y, D) ∧ review(T, Z) }
The certain answers to Q are computed by evaluating the goal Q
wrt this nonrecursive logic program [Abiteboul&Duschka 1998]:
movie(T, f1 (T ), f2 (T )) ← v1 (T )
european(f2 (T )) ← v1 (T )
movie(T, f4 (T, Z), f5 (T, Z)) ← v2 (T, Z)
review(T, Z)) ← v2 (T, Z)
The goal and the logic program can be equivalently transformed
into a finite union of conjunctive queries over the view symbols,
which is the maximal rewriting of Q wrt V
D. Calvanese et. al
Rewriting, Answering, and Losslessness
9
Rewriting vs answering for 2RPQs
Views V :
V1 = d
Query Q:
df + eg
max
=∅
RQ,V
V2 = e
V3 = f + g
cert Q,V = { (x, y) | ∃z . x V1 z ∧ x V2 z ∧ z V3 y }
x
V1 (d)
z
V3 (f + g)
y
V2 (e)
Furthermore, computing the certain answers is coNP-complete in
data complexity, while evaluating the maximal rewriting can be
done in polynomial time
D. Calvanese et. al
Rewriting, Answering, and Losslessness
10
max
Exactness: comparing RQ,V
and Q
RmaxQ,V
perfectness
certQ,V
losslessness
Q
exactness
max
The maximal rewriting RQ,V
of Q wrt views V is exact if for every
max
(V(B))
database B we have that Q(B) = RQ,V
Exactness means losslessness of rewriting wrt the query (note
that exactness = perfectness + losslessness)
D. Calvanese et. al
Rewriting, Answering, and Losslessness
11
Exactness in the case of conjunctive queries
RmaxQ,V
perfectness
certQ,V
losslessness
Q
exactness
max
• RQ,V
is a union of conjunctive queries over the V -symbols
• To check whether such union is equivalent to Q modulo V , it
suffices to check whether there is a disjunct in the unfolding
max
of RQ,V
that is equivalent to Q
• Checking whether there exists an exact rewriting of a
conjunctive query is NP-complete [Halevy&al 1995]
D. Calvanese et. al
Rewriting, Answering, and Losslessness
12
Exactness in the case of 2RPQs
RmaxQ,V
perfectness
certQ,V
losslessness
Q
exactness
From [Calvanese&al 2000]:
max
• RQ,V
can be constructed in 2EXPTIME, via an
automata-theoretic approach
• To check exactness, we check whether Q is contained in the
max
unfolding of RQ,V
• Checking whether there exists an exact rewriting of a 2RPQ
is 2EXPSPACE-complete
D. Calvanese et. al
Rewriting, Answering, and Losslessness
13
max
Perfectness: comparing RQ,V
and cert Q,V
RmaxQ,V
perfectness
certQ,V
losslessness
Q
exactness
max
of Q wrt views V is
New notion: The maximal rewriting RQ,V
perfect, if for every database B and every view extension E with
max
(E)
E ⊆ V(B) we have that cert Q,V (E) = RQ,V
Perfectness means that the maximal rewriting is powerful enough
to compute the certain answers
max
over
Perfectness allows us to compute cert Q,V by evaluating RQ,V
the view extension
D. Calvanese et. al
Rewriting, Answering, and Losslessness
14
Perfectness in the case of conjunctive queries
RmaxQ,V
perfectness
certQ,V
losslessness
Q
exactness
max
is always equivalent to cert Q,V
• RQ,V
max
is always perfect
• RQ,V
D. Calvanese et. al
Rewriting, Answering, and Losslessness
15
Perfectness in the case of 2RPQs
Perfectness means
max
∀B ∀E ⊆ V(B) : cert Q,V (E) ⊆ RQ,V
(E)
that is a form of view-based query containment
max
Q ⊆V RQ,V
From [Calvanese&al 2003], this can be checked in NEXPTIME,
max
is
resulting in N3EXPTIME for perfectness, since the size of RQ,V
doubly exponential in Q
New result: checking perfectness can be done in N2EXPTIME,
via characterization of view-based query answering through CSP
(lower bound open)
D. Calvanese et. al
Rewriting, Answering, and Losslessness
16
Losslessness: comparing cert Q,V and Q
RmaxQ,V
perfectness
certQ,V
losslessness
Q
exactness
A set of views V is lossless wrt a query Q, if for every database B
we have that Q(B) = cert Q,V (V(B))
Losslessness means that the views are powerful enough to
precisely answer the query
In the case where we have access to B , losslessness allows us to
compute cert Q,V by evaluating Q over the database
D. Calvanese et. al
Rewriting, Answering, and Losslessness
17
Losslessness in the case of conjunctive queries
RmaxQ,V
perfectness
certQ,V
losslessness
Q
exactness
max
is always equivalent to cert Q,V
• RQ,V
• Losslessness and exactness coincide, i.e., checking
losslessness is NP-complete
D. Calvanese et. al
Rewriting, Answering, and Losslessness
18
Losslessness in the case of 2RPQs: example
Views V :
Query Q:
V1 = 0 + 1
V2 = 01
V3 = 10
V4 = 000
V5 = 111
010 + 101 + 000 + 111
max
RQ,V
= V4 + V5
cert Q,V is equivalent to Q
V3
database B’
V3
database B
x1
0
V1
x2
V2
1
V1
x3
x1
0
V1
a
V1
x2
b
V1
V2
x4
x1
0
V1
x2
x3
0
V1
x4
1
V1
x4
V3
b
V1
x3
V2
D. Calvanese et. al
Rewriting, Answering, and Losslessness
19
Loslessness in the case of 2RPQs
• In [Calvanese&al 2003] we showed that losslessness is
EXPSPACE-complete for RPQs
• We show in the paper that perfectness is
EXPSPACE-complete also for 2RPQs
• To this end, we introduce the notion of linear approximation to
the certain answers
D. Calvanese et. al
Rewriting, Answering, and Losslessness
20
Losslessness in the case of 2RPQs
The linear fragment of certain answers clin Q,V for a 2RPQ Q wrt a
set V of views is the maximal two-way path query Q0 over Σ such
that ∀B : Q0 (B) ⊆ cert Q,V (V(B))
New results:
• We have a method for constructing clin Q,V (always a 2RPQ)
• We show that losslessness means ∀B Q(B) ⊆ clin Q,V (B)
• Checking losslessness is EXPSPACE-complete
perfectness
RmaxQ,V
clinQ,V
certQ,V
Q
losslessness
exactness
D. Calvanese et. al
Rewriting, Answering, and Losslessness
21
The case of lossiness (with perfectness)
• In case of losslessness, cert Q,V computes exactly Q (which is
equivalent to clin Q,V ), and Q explains cert Q,V at best
• In case of lossiness, still, we would like to express cert Q,V in
the language of the user (2RPQ over the database)
max
max
– If RQ,V
is perfect, then RQ,V
is equivalent to clin Q,V , and
max
explains cert Q,V at best
the unfolding of RQ,V
perfectness
Rmax
D. Calvanese et. al
Q,V
clinQ,V
Rewriting, Answering, and Losslessness
certQ,V
losslessness
Q
22
The case of lossiness (without perfectness)
max
• In case of lossiness and if RQ,V
is not perfect, still, we would
like to express cert Q,V as a 2RPQ over the database
– If cert Q,V is equivalent to clin Q,V , then clin Q,V explains
cert Q,V at best (and, if we have access to B , cert Q,V can be
computed by evaluating clin Q,V over the database)
perfectness
Rmax
Q,V
clinQ,V
certQ,V
losslessness
Q
New result: checking whether cert Q,V is equivalent to clin Q,V
can be done in N3EXPTIME (lower bound open)
D. Calvanese et. al
Rewriting, Answering, and Losslessness
23
Conclusions
• Answering and rewriting are different notions
• Exactness, perfectness and losslessness are different notions
• We introduced the concept of “good” approximation of cert Q,V
for 2RPQs, i.e., the linear fragment
• In our past work, we addressed
– answering, via the relationship with CSP
– rewriting, via an automata-theoretic approach
max
The paper also proposes a technique for building RQ,V
that
reconciles the two approaches (see the proceedings)
D. Calvanese et. al
Rewriting, Answering, and Losslessness
24
Future work
• A few lower bounds open
• In the case cert Q,V is not equivalent to clin Q,V , we would like
to exhibit a nonlinear counterexample database explaining
lossiness to the user
• Many open problems with exact views:
– perfectness and losslessness in the case of conjunctive
max
and cert Q,V do not coincide)
queries and views (RQ,V
– perfectness and losslessness in the case of 2RPQs
• More expressive language, i.e., C2RPQs
D. Calvanese et. al
Rewriting, Answering, and Losslessness
25
Download