Containment of Conjunctive Regular Path Queries with Inverse
Diego Calvanese1, Giuseppe De Giacomo1, Maurizio Lenzerini1, Moshe Y. Vardi2
1 Dipartimento di Informatica e Sistemistica
Università di Roma "La Sapienza"
Via Salaria 113, I-00198 Roma, Italy
lastname@dis.uniroma1.it
http://www.dis.uniroma1.it/lastname
2 Department of Computer Science
Rice University, P.O. Box 1892
Houston, TX 77251-1892, U.S.A.
vardi@cs.rice.edu
http://www.cs.rice.edu/vardi

Abstract
Reasoning on queries is a basic problem both in knowledge representation and databases. A fundamental form of reasoning on queries is checking containment, i.e., verifying whether one query necessarily yields a subset of the result of another query. Query containment is crucial in several contexts, such as query optimization, knowledge base verification, information integration, database integrity checking, and cooperative answering. In this paper we address the problem of query containment in the context of semistructured knowledge bases, where the basic querying mechanism, namely regular path queries, asks for all pairs of objects that are connected by a path conforming to a regular expression. We consider conjunctive regular path queries with inverse, which extend regular path queries with the possibility of using both the inverse of binary relations, and conjunctions of atoms, where each atom specifies that one regular path query with inverse holds between two variables. We present a novel technique to check containment of queries in this class, based on the use of two-way finite automata. The technique shows the power of two-way automata in dealing with the inverse operator and with the variables in the queries. We also characterize the computational complexity of both the proposed algorithm and the problem.

1 INTRODUCTION
Querying is the fundamental mechanism for extracting information from a knowledge base. The basic reasoning task associated with querying is query answering, which amounts to computing the information to be returned as the result of a query. However, there are other reasoning services involving queries that knowledge representation systems should support. One of the most important is checking containment, i.e., verifying whether one query necessarily yields a subset of the result of another one. Query containment is crucial in several contexts, such as query optimization, knowledge base verification, information integration, integrity checking, and cooperative answering.
Query optimization aims at improving the efficiency of query answering, and largely benefits from the possibility of performing various kinds of comparisons between query expressions. In particular, query containment checks are useful to recognize equivalent subexpressions, to avoid computing results already available, to recognize the possibility of using materialized views, and to use integrity constraints to speed up query processing (Levy & Sagiv, 1995; Chaudhuri, Krishnamurthy, Potamianos, & Shim, 1995; Widom, 1995; Adali, Candan, Papakonstantinou, & Subrahmanian, 1996; Buneman, Davidson, Hillebrand, & Suciu, 1996; Fernandez & Suciu, 1998; Milo & Suciu, 1999).
Recently, it has been shown that query containment is relevant for the task of knowledge base verification (Levy & Rousset, 1997). In (Levy & Rousset, 1998b), the problem of verifying whether a knowledge base produces the correct set of outputs for any set of inputs is solved by means of a method that exploits the ability to check query containment.
One of the major issues in information integration (Fensel, Knoblock, Kushmerick, & Rousset, 1999) is to reformulate a query expressed over a unified domain representation in terms of the local sources. Several recent papers point out that query containment is essential for this purpose (Calvanese, De Giacomo, Lenzerini, Nardi, & Rosati, 1998; Levy, Rajaraman, & Ordille, 1996; Knoblock & Levy, 1995; Friedman, Levy, & Millstein, 1999). Also, many information integration applications are developed on the web, where data are expressed using XML-like languages (Bray, Paoli, & Sperberg-McQueen, 1998; Calvanese, De Giacomo, & Lenzerini, 1999) and semistructured models (Buneman, 1997; Florescu, Levy, & Mendelzon, 1998a). This new scenario poses interesting challenges to both database and knowledge representation technologies, and query containment is one notable example of such challenges (Florescu, Levy, & Suciu, 1998b; Calvanese, De Giacomo, Lenzerini, & Vardi, 2000).
Besides the applications described above, query containment is crucial in integrity constraint checking (Gupta & Ullman, 1992; Fernandez, Florescu, Levy, & Suciu, 1999), cooperative answering (Motro, 1996), and in general in knowledge representation systems based on description logics and conceptual graphs, where it comes in the form of subsumption checking, and is at the heart of all relevant reasoning tasks (Donini, Lenzerini, Nardi, & Schaerf, 1996; Eklund, Nagle, Nagle, & Gerholz, 1992; Donini, Lenzerini, Nardi, & Schaerf, 1998; Levy & Rousset, 1998a).
Needless to say, query containment is undecidable if we do not limit the expressive power of the query language. In fact, in knowledge representation suitable query languages have been designed for retaining decidability. The same is true in databases, where the notion of conjunctive query is the basic one in the investigation of reasoning on queries (Chandra & Merlin, 1977). A conjunctive query is simply a conjunction of atoms, where each atom is built from relation symbols and (existentially quantified) variables, and corresponds to a single rule in non-recursive datalog.
Most of the results on query containment concern conjunctive queries and their extensions. In (Chandra & Merlin, 1977), NP-completeness has been established for conjunctive queries, in (Klug, 1988; van der Meyden, 1992), $\Pi_2^p$-completeness of containment of conjunctive queries with inequalities was proved, and in (Sagiv & Yannakakis, 1980) the case of queries with the union and difference operators was studied. For various classes of datalog queries with inequalities, decidability and undecidability results were presented in (Chaudhuri & Vardi, 1992) and (van der Meyden, 1992), respectively. Other papers consider the case of conjunctive query containment in the presence of various types of constraints (Aho, Sagiv, & Ullman, 1979; David S. Johnson, 1984; Chan, 1992; Levy & Rousset, 1996; Levy & Suciu, 1997), or in knowledge representation systems integrating datalog and description logics (Levy & Rousset, 1998a).
In this paper we address the problem of query containment in the context of a general form of knowledge bases, called semistructured knowledge bases. Our goal is to capture the essential features of knowledge bases as found in semantic networks, description logics, conceptual graphs, and in databases, both traditional and semistructured. For this purpose, we conceive a knowledge base as a labeled graph, where nodes represent objects, and a labeled edge between two nodes represents the fact that the binary relation denoted by the label holds for the objects.
In our framework, the basic querying mechanism is that of regular path queries (Buneman, 1997; Calvanese, De Giacomo, Lenzerini, & Vardi, 1999; Abiteboul, Buneman, & Suciu, 1999), which ask for all pairs of objects that are connected by a path conforming to a regular expression. Regular path queries are extremely useful for expressing complex navigations in a graph. In particular, union and transitive closure are crucial when we do not have complete knowledge of the structure of the knowledge base.
In this work, we consider conjunctive regular path queries with inverse, which extend regular path queries with the possibility of using both the inverse of binary relations, and conjunctions of atoms, where each atom specifies that one regular path query with inverse holds between two variables. Notably, several authors argue that these kinds of extensions are essential for making regular path queries useful in real settings (see for example (Buneman, 1997; Buneman et al., 1996; Milo & Suciu, 1999)). Conjunctive regular path queries have been studied in (Florescu et al., 1998b), where an EXPSPACE algorithm for query containment in this class is presented. However, no lower bound is provided for the problem, and, moreover, the method does not seem easily generalizable to take into account the inverse operator. The case with the inverse operator is implicitly addressed in (Calvanese, De Giacomo, & Lenzerini, 1998), where a 2EXPTIME algorithm is proposed for query containment. However, the framework investigated in (Calvanese et al., 1998) includes a rich set of constraints, and is not suited for a precise characterization of containment of conjunctive regular path queries with inverse.
We present a novel technique to check containment of queries in this class, based on the use of two-way finite automata. Unlike standard finite-state automata, two-way automata are equipped with a head that can move back and forth on the input string. A transition of these kinds of automata indicates not only the new state, but also whether the head should move left, right, or stay in place. Our technique shows the power of two-way automata in dealing with the inverse operator and with the variables in conjunctive queries. In particular, we describe an algorithm that checks containment of two queries by checking nonemptiness of a two-way automaton constructed from the two queries. The algorithm works in exponential space, and therefore has the same worst-case complexity as the best algorithm known for the case of conjunctive regular path queries without inverse (Florescu et al., 1998b).
We also prove an EXPSPACE lower bound for the computational complexity of the problem, thus demonstrating that our method is essentially optimal. Interestingly, the lower bound holds even if we disregard the inverse operator, and therefore provides the solution to the open problem of whether containment of conjunctive regular path queries could be done in PSPACE (Florescu et al., 1998b). Besides the specific result, our method provides the basis for using two-way automata in reasoning on complex queries, and can be adapted to more general forms of queries (e.g., unions of conjunctive queries) and reasoning tasks (e.g., query rewriting), as well as to other formalisms in Artificial Intelligence (e.g., temporal and dynamic logics with a backward modality, and action theories with converse).
2 KNOWLEDGE BASES AND QUERIES
We consider a semistructured knowledge base (KB) $\mathcal{K}$ as an edge-labeled graph $(\mathcal{D}, \mathcal{E})$, where $\mathcal{D}$ is the set of nodes, and $\mathcal{E}$ is the set of edges labeled with elements of an alphabet $\Sigma_0$. A node represents an object, and an edge between nodes $x$ and $y$ labeled $p$ represents the fact that the binary relation $p$ holds for the pair $(x, y)$. We denote an edge from $x$ to $y$ labeled by $p$ with $x \xrightarrow{p} y$.

The basic querying mechanism on a KB is that of regular path queries (RPQs). An RPQ $R$ is expressed as a regular expression or a finite automaton, and computes the set of pairs of nodes of the KB connected by a path that conforms to the regular language $L(R)$ defined by $R$. As we said in the introduction, we consider queries that extend regular path queries with both the inverse operator, and the possibility of using conjunctions and variables.

Formally, let $\Sigma = \Sigma_0 \cup \{p^- \mid p \in \Sigma_0\}$ be the alphabet including a new symbol $p^-$ for each $p$ in $\Sigma_0$. Intuitively, $p^-$ denotes the inverse of the binary relation $p$. If $r \in \Sigma$, then we use $r^-$ to mean the inverse of $r$, i.e., if $r$ is $p$, then $r^-$ is $p^-$, and if $r$ is $p^-$, then $r^-$ is $p$.

Regular path queries with inverse (RPQIs) are expressed by means of regular expressions or finite automata over $\Sigma$. Thus, in contrast with RPQs, RPQIs may use also the inverse $p^-$ of $p$, for each $p \in \Sigma_0$. When evaluated over a KB $\mathcal{K}$, an RPQI $E$ computes the set $\mathit{ans}(E, \mathcal{K})$ of pairs of nodes connected by a semipath that conforms to the regular language $L(E)$ defined by $E$. A semipath in $\mathcal{K}$ from $x$ to $y$ is a sequence of the form $(y_1, r_1, y_2, r_2, y_3, \ldots, y_q, r_q, y_{q+1})$, where $q \geq 0$, $y_1 = x$, $y_{q+1} = y$, and for each $y_i, r_i, y_{i+1}$, either $y_i \xrightarrow{r_i} y_{i+1}$ or $y_{i+1} \xrightarrow{r_i^-} y_i$ is in $\mathcal{K}$. The semipath conforms to $E$ if $r_1 \cdots r_q \in L(E)$. The semipath is simple if each $y_i$, for $i \in \{2, \ldots, q\}$, is a node that does not occur elsewhere in the semipath.

Finally, we add to RPQIs the possibility of using conjunctions of atoms, where each atom specifies that one regular path query with inverse holds between two variables. More precisely, given an alphabet of variables, a conjunctive regular path query with inverse (CRPQI) $Q$ is a formula of the form
\[ Q(x_1, \ldots, x_n) \leftarrow y_1 E_1 y_2 \land \cdots \land y_{2m-1} E_m y_{2m} \]
where $x_1, \ldots, x_n, y_1, \ldots, y_{2m}$ range over a set $\{u_1, \ldots, u_k\}$ of variables from this alphabet, each $x_i$, called a distinguished variable, is one of $y_1, \ldots, y_{2m}$, and $E_1, \ldots, E_m$ are RPQIs.

The answer set $\mathit{ans}(Q, \mathcal{K})$ to a CRPQI $Q$ over a KB $\mathcal{K} = (\mathcal{D}, \mathcal{E})$ is the set of tuples $(d_1, \ldots, d_n)$ of nodes of $\mathcal{K}$ such that there is a total mapping $\sigma$ from $\{u_1, \ldots, u_k\}$ to $\mathcal{D}$ with $\sigma(x_i) = d_i$ for every distinguished variable $x_i$ of $Q$, and $(\sigma(y), \sigma(z)) \in \mathit{ans}(E, \mathcal{K})$ for every conjunct $y E z$ in $Q$.
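The semantics of RPQIs given above can be illustrated by a small Python sketch. It assumes that the RPQI is supplied as a nondeterministic finite automaton, that the inverse of a label `p` is written `p-`, and that only nodes incident to some edge are of interest; the function names `inverse` and `eval_rpqi` and the sample KB are hypothetical and not part of the paper. The search explores pairs (KB node, automaton state), following each edge forward with its label and backward with the inverse label, which is one way of enumerating the semipaths that conform to $L(E)$.

```python
from collections import deque

def inverse(label):
    """Return r^- for a label r; the inverse of p is written 'p-' (assumed convention)."""
    return label[:-1] if label.endswith("-") else label + "-"

def eval_rpqi(edges, nfa, initial, accepting):
    """Compute ans(E, K) for an RPQI E given as an NFA.

    edges:     set of triples (x, p, y), one per edge x --p--> y of the KB
    nfa:       dict mapping (state, label) to a set of successor states
    initial:   set of initial NFA states
    accepting: set of accepting NFA states
    """
    # Every edge can be traversed forward with label p and backward with
    # label p-, mirroring the definition of semipath.
    steps = {}
    nodes = set()
    for x, p, y in edges:
        nodes.update((x, y))
        steps.setdefault(x, []).append((p, y))
        steps.setdefault(y, []).append((inverse(p), x))

    answers = set()
    for start in nodes:
        # Breadth-first search over pairs (KB node, NFA state).
        seen = {(start, s) for s in initial}
        queue = deque(seen)
        while queue:
            node, state = queue.popleft()
            if state in accepting:
                answers.add((start, node))
            for label, target in steps.get(node, []):
                for nxt in nfa.get((state, label), ()):
                    if (target, nxt) not in seen:
                        seen.add((target, nxt))
                        queue.append((target, nxt))
    return answers

# Sample KB: a --father--> b and a --father--> c.
kb = {("a", "father", "b"), ("a", "father", "c")}
# NFA for the RPQI father^- . father
same_father = {(0, "father-"): {1}, (1, "father"): {2}}
pairs = eval_rpqi(kb, same_father, {0}, {2})
```

On this KB, `pairs` contains every combination of `b` and `c`, since both are reached from `a` by a `father` edge.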
Example 1 Let us consider a KB of parental relationships. The CRPQI
\[ Q(x_1, x_2) \leftarrow x_1\,(\mathit{father}^- \cdot \mathit{father} \cup \mathit{mother}^- \cdot \mathit{mother})^+\,x_2 \;\land\; x_1\,(\mathit{father} \cup \mathit{mother})^* \cdot \mathit{father}\;y \;\land\; x_2\,(\mathit{father} \cup \mathit{mother})^* \cdot \mathit{mother}\;y \]
returns all pairs of individuals $(x_1, x_2)$ that are in the transitive closure of the sibling (including step-sibling) relation, and such that there is some individual $y$ such that $x_1$ and $x_2$ have two descendants who are respectively the father and the mother of $y$.
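To make Example 1 concrete, the query can be evaluated on a toy KB by reading each regular expression as an expression over binary relations (composition, inverse, union, and closure). The sketch below is only an illustration: the helper names, the node names, and the convention that a pair `(a, b)` in `father` means "a is the father of b" are assumptions made for this example.

```python
def compose(r, s):
    """Relational composition: pairs (x, z) with (x, y) in r and (y, z) in s."""
    return {(x, z) for (x, y) in r for (y2, z) in s if y == y2}

def inverse(r):
    """Inverse relation r^-."""
    return {(y, x) for (x, y) in r}

def plus(r):
    """Transitive closure r^+."""
    closure = set(r)
    while True:
        extra = compose(closure, r) - closure
        if not extra:
            return closure
        closure |= extra

def star(r, nodes):
    """Reflexive-transitive closure r^* over the given set of nodes."""
    return plus(r) | {(x, x) for x in nodes}

# Toy KB: f is the father of x1 and x2; x1 is the father of c1;
# x2 is the mother of c2; c1 and c2 are the father and mother of y.
father = {("f", "x1"), ("f", "x2"), ("x1", "c1"), ("c1", "y")}
mother = {("x2", "c2"), ("c2", "y")}
nodes = {n for edge in father | mother for n in edge}

# First atom: (father^- . father  U  mother^- . mother)^+
sibling = plus(compose(inverse(father), father) | compose(inverse(mother), mother))
# Second and third atoms: (father U mother)^* . father  and  (father U mother)^* . mother
descendant = star(father | mother, nodes)
via_father = compose(descendant, father)
via_mother = compose(descendant, mother)

# A pair is an answer if all three atoms hold for some common y.
answers = {(x1, x2) for (x1, x2) in sibling
           if any((x1, y) in via_father and (x2, y) in via_mother for y in nodes)}
```

Here the only answer is the pair `("x1", "x2")`, witnessed by the node `y`.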
Given two CRPQIs $Q_1$ and $Q_2$, we say that $Q_1$ is contained in $Q_2$, written $Q_1 \subseteq Q_2$, if for every KB $\mathcal{K}$, $\mathit{ans}(Q_1, \mathcal{K}) \subseteq \mathit{ans}(Q_2, \mathcal{K})$. Obviously, $Q_1 \not\subseteq Q_2$ iff there is a counterexample KB to $Q_1 \subseteq Q_2$, i.e., a KB $\mathcal{K}$ with a tuple in $\mathit{ans}(Q_1, \mathcal{K})$ and not in $\mathit{ans}(Q_2, \mathcal{K})$.

Example 1 (cont.) It is possible to verify that the CRPQI
\[ Q'(x_1, x_2) \leftarrow x_1\;\mathit{father}^- \cdot \mathit{father} \cdot \mathit{mother}^- \cdot \mathit{mother}\;x_2 \;\land\; x_1\;\mathit{father} \cdot \mathit{mother}^-\;x_2 \]
is contained in $Q$.

In order to characterize containment between CRPQIs, we introduce the notion of canonical KB. Let $Q$ be the CRPQI
\[ Q(x_1, \ldots, x_n) \leftarrow y_1 E_1 y_2 \land \cdots \land y_{2m-1} E_m y_{2m}, \]
let $\mathcal{K}$ be a KB, and let $\nu$ be a total mapping from the variables $\{u_1, \ldots, u_k\}$ of $Q$ to the nodes of $\mathcal{K}$. Then $\mathcal{K}$ is said to be $\nu$-canonical for $Q$ if:
- $\mathcal{K}$ consists of $m$ simple semipaths, one for each conjunct of $Q$, which are node and edge disjoint, i.e., only start and end nodes can be shared between different semipaths.
- for $i \in \{1, \ldots, m\}$, the simple semipath associated with the conjunct $y_{2i-1} E_i y_{2i}$ connects the node $\nu(y_{2i-1})$ to the node $\nu(y_{2i})$, and conforms to $E_i$.

It is easy to see that, if $\mathcal{K}$ is $\nu$-canonical for $Q$, then the tuple $(\nu(x_1), \ldots, \nu(x_n))$ belongs to $\mathit{ans}(Q, \mathcal{K})$, and therefore $\mathit{ans}(Q, \mathcal{K})$ is nonempty.

In the following, we assume that $Q_h$, for $h \in \{1, 2\}$, are of the form
\[ Q_h(x_1, \ldots, x_n) \leftarrow y_{h,1} E_{h,1} y_{h,2} \land \cdots \land y_{h,2m_h-1} E_{h,m_h} y_{h,2m_h} \]
i.e., $Q_1$ and $Q_2$ have the same distinguished variables $x_1, \ldots, x_n$, and the sets of non-distinguished variables of $Q_1$ and $Q_2$ are disjoint. This assumption can be made without loss of generality, since we can rename variables and simulate equality between $x$ and $y$ by introducing $x\,\varepsilon\,y$ in the right hand side.

Let $\mathcal{K} = (\mathcal{D}, \mathcal{E})$ be a $\nu$-canonical KB for $Q_1$. A total mapping $\mu$ from the variables of $Q_2$ to $\mathcal{D}$ is called a $(Q_2, \mathcal{K}, \nu)$-mapping if
- it maps distinguished variables of $Q_2$ into nodes of $\mathcal{K}$ corresponding to distinguished variables of $Q_1$, i.e., for each $x_i$, we have that $\mu(x_i) = \nu(x_i)$, and
- for all $j \in \{1, \ldots, m_2\}$, we have that $(\mu(y_{2,2j-1}), \mu(y_{2,2j})) \in \mathit{ans}(E_{2,j}, \mathcal{K})$.

Note that the existence of a $(Q_2, \mathcal{K}, \nu)$-mapping implies that $(\nu(x_1), \ldots, \nu(x_n)) \in \mathit{ans}(Q_2, \mathcal{K})$. The following theorem provides an important characterization of containment between CRPQIs.

Theorem 2 Let $Q_1$ and $Q_2$ be two CRPQIs. Then $Q_1 \not\subseteq Q_2$ iff there exists a KB $\mathcal{K}$ and a mapping $\nu$ from the variables of $Q_1$ to the nodes of $\mathcal{K}$ such that (i) $\mathcal{K}$ is $\nu$-canonical for $Q_1$ and (ii) no $(Q_2, \mathcal{K}, \nu)$-mapping exists.

Proof (sketch). For the if-part, it is easy to see that $\mathcal{K}$ is a counterexample to $Q_1 \subseteq Q_2$. For the only-if part, it is possible to show that any counterexample can be transformed into a KB of the form stated in the theorem and that such a KB is still a counterexample to $Q_1 \subseteq Q_2$.

3 TWO-WAY AUTOMATA
A two-way automaton (Hopcroft & Ullman, 1979; Vardi, 1989) $A = (\Gamma, S, I, \delta, F)$ consists of an alphabet $\Gamma$, a finite set of states $S$, a set $I \subseteq S$ of initial states, a transition function $\delta : S \times \Gamma \to 2^{S \times \{-1, 0, 1\}}$, and a set $F \subseteq S$ of accepting states. Intuitively, a transition indicates both the new state of the automaton, and whether the head reading the input should move left ($-1$), right ($1$), or stay in place ($0$). If for all $s \in S$ and $a \in \Gamma$ we have that $\delta(s, a) \subseteq S \times \{1\}$, then the automaton is a traditional nondeterministic finite state automaton (also called one-way automaton).

A configuration of $A$ is a pair consisting of a state and a position represented as a natural number. A run is a sequence of configurations. The sequence $((s_0, j_0), \ldots, (s_m, j_m))$ is a run of $A$ on a word $w = a_0, \ldots, a_{n-1}$ in $\Gamma^*$ if $s_0 \in I$, $j_0 = 0$, $j_m \leq n$, and for all $i \in \{0, \ldots, m-1\}$, we have that $0 \leq j_i < n$, and there is some $(t, k) \in \delta(s_i, a_{j_i})$ such that $s_{i+1} = t$ and $j_{i+1} = j_i + k$. This run is accepting if $j_m = n$ and $s_m \in F$. $A$ accepts $w$ if it has an accepting run on $w$. The set of words accepted by $A$ is denoted $L(A)$.

It is well known that two-way automata define regular languages (Hopcroft & Ullman, 1979), and that, given a two-way automaton with $n$ states, we can construct a one-way automaton with $O(2^{n \log n})$ states accepting the same language (Vardi, 1989).

We want to gain some intuition on how two-way automata capture computations of CRPQIs over KBs. To this end we show how a two-way automaton can be used for the fundamental task of verifying the nonemptiness of an RPQI over a given KB.
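The definition of acceptance for two-way automata can be made concrete with a short simulation. The following Python sketch, in which the function name `accepts`, the dictionary encoding of the transition function, and the sample automaton are illustrative assumptions, decides acceptance by exploring the finitely many configurations (state, position) reachable from the initial ones.

```python
from collections import deque

def accepts(word, delta, initial, accepting):
    """Decide whether a two-way automaton accepts `word`.

    delta maps (state, symbol) to a set of (new_state, move) pairs with
    move in {-1, 0, 1}. A configuration is a pair (state, position); a run
    starts at position 0 and is accepting when it reaches position
    len(word) in an accepting state.
    """
    n = len(word)
    seen = {(s, 0) for s in initial}
    queue = deque(seen)
    while queue:
        state, pos = queue.popleft()
        if pos == n:
            if state in accepting:
                return True
            continue  # no symbol to read past the end of the word
        for new_state, move in delta.get((state, word[pos]), ()):
            new_pos = pos + move
            # Positions must stay within 0..n; the head cannot move left of 0.
            if 0 <= new_pos <= n and (new_state, new_pos) not in seen:
                seen.add((new_state, new_pos))
                queue.append((new_state, new_pos))
    return False

# Sample automaton over {a, b}: scan right, and on some 'b' step back to
# check that the preceding symbol is 'a', then run to the end of the word.
delta = {
    ("q0", "a"): {("q0", 1)},
    ("q0", "b"): {("q0", 1), ("back", -1)},
    ("back", "a"): {("fin", 1)},
    ("fin", "a"): {("fin", 1)},
    ("fin", "b"): {("fin", 1)},
}
```

With this sample automaton, `accepts("bab", delta, {"q0"}, {"fin"})` holds, whereas `accepts("bba", delta, {"q0"}, {"fin"})` does not, because no `b` in `bba` is immediately preceded by an `a`.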
The basic idea that allows us to exploit automata is that we can represent special KBs by means of words. In particular, we consider KBs in which the domain $\mathcal{D}$ contains a fixed set $\mathcal{D}_0$ of nodes, and that are constituted by simple semipaths which are node and edge disjoint and such that the start and end nodes of each semipath are in $\mathcal{D}_0$. Each such KB $\mathcal{K} = (\mathcal{D}, \mathcal{E})$ is represented by a word $W_\mathcal{K}$ over the alphabet $\Sigma \cup \mathcal{D}_0 \cup \{\$\}$, which has the form
\[ \$\, d_1 w_1 d_2 \,\$\, d_3 w_2 d_4 \,\$ \cdots \$\, d_{2m-1} w_m d_{2m} \,\$ \]
where $d_1, \ldots, d_{2m}$ range over $\mathcal{D}_0$, $w_i \in \Sigma^+$, and the $\$$ acts as a separator. Specifically, $W_\mathcal{K}$ consists of one subword $d_{2i-1} w_i d_{2i}$, for each simple semipath in $\mathcal{K}$ from $d_{2i-1}$ to $d_{2i}$ conforming to $w_i$. Observe that we can represent the same KB by several words that differ in the direction considered for the semipaths and in the order in which the subwords corresponding to the semipaths appear.

Now, given an RPQI $E$, we build a two-way automaton $A_E$ over the alphabet $\Sigma \cup \mathcal{D}_0 \cup \{\$\}$ such that $A_E$ accepts a word $W_\mathcal{K}$ iff $E$ has a nonempty answer on the KB $\mathcal{K}$ represented by $W_\mathcal{K}$. To do so we exploit the ability of two-way automata to:
- move on the word both forward and backward, which corresponds to traversing edges of the KB in both directions;
- "jump" from one position in the word representing a node to any other position (either preceding or succeeding) representing the same node.
These two capabilities ensure that the automaton evaluating the RPQI on the word simulates exactly the evaluation of the query on the KB.

To construct $A_E$, we assume that $E$ is represented as a finite (one-way) automaton $E = (\Sigma, S, I, \delta, F)$ over the alphabet $\Sigma$. Then $A_E = (\Sigma_A, S_A, \{s_0\}, \delta_A, \{s_f\})$, where $\Sigma_A = \Sigma \cup \mathcal{D}_0 \cup \{\$\}$, $S_A = S \cup \{s_0, s_f\} \cup \{s^b \mid s \in S\} \cup S \times \mathcal{D}_0$, and $\delta_A$ is defined as follows:

1. $(s_0, 1) \in \delta_A(s_0, \ell)$, for each $\ell \in \Sigma_A$, and also $(s, 1) \in \delta_A(s_0, \ell)$ for each $s \in I$. These transitions place the head of the automaton in some randomly chosen position of the input string and set the state of the automaton to some randomly chosen initial state of $E$.

2. $(s^b, -1) \in \delta_A(s, \ell)$, for each $s \in S$ and $\ell \in \Sigma \cup \mathcal{D}_0$. At any point such a transition makes the automaton ready to scan one step backward by placing it in "backward mode".

3. $(s_2, 1) \in \delta_A(s_1, r)$ and $(s_2, 0) \in \delta_A(s_1^b, r^-)$, for each transition $s_2 \in \delta(s_1, r)$ of $E$. These transitions correspond to the transitions of $E$ which are performed forward or backward according to the current "scanning mode".

4. for each $s \in S$ and each $d \in \mathcal{D}_0$
   $((s, d), 0) \in \delta_A(s, d)$
   $((s, d), 0) \in \delta_A(s^b, d)$
   $((s, d), 1) \in \delta_A((s, d), \ell)$, for each $\ell \in \Sigma_A$
   $((s, d), -1) \in \delta_A((s, d), \ell)$, for each $\ell \in \Sigma_A$
   $(s, 0) \in \delta_A((s, d), d)$
   $(s, 1) \in \delta_A(s, d)$
   Whenever the automaton reaches a symbol representing a node $d$ (first and second clause), it may enter into "search (for $d$) mode" and move to any other occurrence of $d$ in the word. Then the automaton exits search mode (second last clause) and continues its computation either forward (last clause) or backward (see item 2).

5. $(s_f, 1) \in \delta_A(s, \ell)$, for each $s \in F$ and each $\ell \in \Sigma_A$. These transitions move the head of the automaton to the end of the input string, when the automaton enters a final state.

Observe that the separator symbol $\$$ does not allow transitions except in "search mode". Its role is to force the automaton to move in the correct direction when exiting "search mode".

The following theorem characterizes the relationship between an RPQI and the corresponding two-way automaton.

Theorem 3 Let $E$ be an RPQI, $\mathcal{K}$ a KB over $\mathcal{D}$, and $W_\mathcal{K}$ the word representing $\mathcal{K}$. Then $A_E$ accepts $W_\mathcal{K}$ iff $\mathit{ans}(E, \mathcal{K})$ is nonempty.

4 CHECKING CONTAINMENT
We characterize both the upper bound and the lower bound of the problem of checking containment between CRPQIs.
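Since the automata used for checking containment read words of the form $W_\mathcal{K}$, it may help to see that encoding spelled out. The Python sketch below is an illustration under stated assumptions: a KB is given directly as a list of semipaths, symbols are plain strings, node names differ from edge labels and from the separator, and the inverse of a label `p` is written `p-`. The function names are hypothetical.

```python
def encode_kb(semipaths):
    """Encode a KB made of disjoint simple semipaths as the word W_K.

    semipaths: list of triples (d_start, labels, d_end), where d_start and
    d_end are nodes of D0 and labels is the nonempty list w_i of edge labels.
    The word is returned as a list of symbols, with "$" as the separator.
    """
    word = ["$"]
    for d_start, labels, d_end in semipaths:
        word.append(d_start)
        word.extend(labels)
        word.append(d_end)
        word.append("$")
    return word

def decode_word(word):
    """Recover the list of semipaths from a word produced by encode_kb."""
    semipaths, current = [], []
    for symbol in word[1:]:  # skip the leading separator
        if symbol == "$":
            semipaths.append((current[0], current[1:-1], current[-1]))
            current = []
        else:
            current.append(symbol)
    return semipaths

def reverse_semipath(semipath):
    """The same semipath read in the opposite direction."""
    d_start, labels, d_end = semipath
    flipped = [l[:-1] if l.endswith("-") else l + "-" for l in reversed(labels)]
    return (d_end, flipped, d_start)

kb = [("d1", ["father-", "father"], "d2"), ("d2", ["mother"], "d1")]
word = encode_kb(kb)
```

Replacing a semipath by `reverse_semipath` of it, or permuting the list, yields a different word for the same KB, which matches the observation that the representation is not unique.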
4.1 UPPER BOUND
Let $Q_h$, for $h \in \{1, 2\}$, be in the form
\[ Q_h(x_1, \ldots, x_n) \leftarrow y_{h,1} E_{h,1} y_{h,2} \land \cdots \land y_{h,2m_h-1} E_{h,m_h} y_{h,2m_h} \]
and let $V_1, V_2$ be the sets of variables of $Q_1$ and $Q_2$ respectively. To show that $Q_1$ is not contained in $Q_2$, we have to search for a counterexample KB. From Theorem 2, we know that it is sufficient to look for a KB $\mathcal{K}$ and a mapping $\nu$ from the variables of $Q_1$ to the nodes of $\mathcal{K}$, such that $\mathcal{K}$ is $\nu$-canonical for $Q_1$, and such that no $(Q_2, \mathcal{K}, \nu)$-mapping exists.

To generate candidate counterexample KBs and associated $\nu$-mappings, we first construct a one-way automaton $A_1$ that accepts the set of words of the form
\[ \$\, d_1 w_1 d_2 \,\$\, d_3 w_2 d_4 \,\$ \cdots \$\, d_{2m_1-1} w_{m_1} d_{2m_1} \,\$ \]
that represent a KB that is $\nu$-canonical for $Q_1$ for some $\nu$, where each $d_i$ is an element of a fixed set of nodes $\mathcal{D}_0$. We take $\mathcal{D}_0$ to be $2^{V_1}$ and we explicitly encode in a word accepted by $A_1$ the mapping $\nu$ from the variables of $Q_1$ to the nodes in $\mathcal{D}_0$. This requires ensuring that the elements of $\mathcal{D}_0$ appearing in a word accepted by $A_1$ constitute a partition of $V_1$ into equivalence classes. To construct $A_1$, we define:
- A one-way automaton $A_1^{Q_1}$ that accepts all words of the form above, where for $i \in \{1, \ldots, m_1\}$, $y_{1,2i-1} \in d_{2i-1}$, $y_{1,2i} \in d_{2i}$, and $w_i$ is a word in $L(E_{1,i})$.
- A one-way automaton $A_1^p$ that checks that in a word $w$ the distinct symbols $d \in \mathcal{D}_0$ appearing in $w$ constitute a partition of $V_1$.
- A one-way automaton $A_1^a$ that checks that in a word $w$, whenever two symbols $d_{2i-1}$ and $d_{2i}$ are adjacent, then $d_{2i-1} = d_{2i}$.

The number of states in $A_1^{Q_1}$ is polynomial in the size of $Q_1$ (although the alphabet is exponential), while the number of states in $A_1^p$ and $A_1^a$ is exponential in the number of variables in $Q_1$. $A_1$ is the product automaton of $A_1^{Q_1}$, $A_1^p$, and $A_1^a$, and hence is of exponential size in $Q_1$. We call a word $W$ accepted by $A_1$ a $Q_1$-word, and use $\mathcal{K}_W$ and $\nu_W$ to denote the KB and mapping corresponding to $W$.

By virtue of the correspondence between $Q_1$-words and KBs that are $\nu$-canonical for $Q_1$, our method for checking whether $Q_1 \not\subseteq Q_2$ is based on searching for a $Q_1$-word $W$ such that no $(Q_2, \mathcal{K}_W, \nu_W)$-mapping exists. Now, let $W$ be a $Q_1$-word. To check whether there is no $(Q_2, \mathcal{K}_W, \nu_W)$-mapping, we define an automaton $A_3$ that checks the existence of such a mapping, and then complement $A_3$, obtaining an automaton $A_4$.

In order to define $A_3$, we represent $(Q_2, \mathcal{K}_W, \nu_W)$-mappings as annotations of $Q_1$-words, which specify where the variables of $Q_2$ are being mapped to in $W$, and hence in $\mathcal{K}_W$. More precisely, the $Q_1$-word $\$ \ell_1 \cdots \ell_r \$$ with each symbol $\ell_i \neq \$$ annotated with $\gamma_i$ is represented by the word $\$ (\ell_1, \gamma_1) \cdots (\ell_r, \gamma_r) \$$ over the alphabet $((\Sigma \cup \mathcal{D}_0) \times 2^{V_2}) \cup \{\$\}$. The intended meaning is that the variables in $\gamma_i$ are mapped in $\mathcal{K}_W$ to the node $\ell_i$, if $\ell_i \in \mathcal{D}_0$, and are mapped to the target node of the edge corresponding to the occurrence of $\ell_i$, if $\ell_i \in \Sigma$.

Given a word $W'$ representing an annotated $Q_1$-word $W$, an automaton $A_2$ can check if the annotation corresponds to a $(Q_2, \mathcal{K}_W, \nu_W)$-mapping. To construct $A_2$, we define:
1. A one-way automaton $A_2^d$ that checks that for every symbol $d \in \mathcal{D}_0$ containing a distinguished variable $x$ of $Q_1$, every occurrence of $d$ in $W$ is annotated in $W'$ with a set of variables containing $x$.
2. A one-way automaton $A_2^s$ that checks that for every variable $y \in V_2$, either $y$ appears in the annotation of at most one symbol in $\Sigma$, or it appears in the annotation of every occurrence of a symbol $d \in \mathcal{D}_0$.
3. A one-way automaton $A_2^{\$}$ that checks that every occurrence in $W$ of a symbol preceding a $\$$ is annotated in $W'$ with the same set of variables as the symbol preceding it.
4. A two-way automaton $A_2^{Q_2}$ that checks that for all $i \in \{1, \ldots, m_2\}$, the atom $y_{2,2i-1} E_{2,i} y_{2,2i}$ of $Q_2$ is satisfied in $\mathcal{K}_W$, i.e., there are symbols $\ell_1$, $\ell_2$ in $W$ annotated in $W'$ with $\gamma_1$ and $\gamma_2$ respectively, such that $y_{2,2i-1} \in \gamma_1$, $y_{2,2i} \in \gamma_2$, and the pair of nodes corresponding to $\ell_1$ and $\ell_2$ is in $\mathit{ans}(E_{2,i}, \mathcal{K}_W)$. To build $A_2^{Q_2}$ we exploit the construction in Section 3.

The number of states in $A_2^{Q_2}$ is polynomial in the size of $Q_2$, while the number of states in $A_2^d$, $A_2^s$, and $A_2^{\$}$ is exponential in the number of variables in $Q_2$. $A_2$ is the product automaton of $A_2^d$, $A_2^s$, $A_2^{\$}$, and of the one-way automaton equivalent to $A_2^{Q_2}$. (Notice that the number of states of the one-way automaton equivalent to $A_2^{Q_2}$ does not depend on the size of the alphabet, which is exponential in the number of variables of $Q_1$.) Hence $A_2$ is of exponential size in $Q_2$.

Next we define the one-way automaton $A_3$ that simulates the guess of an annotation of a $Q_1$-word, and emulates the behaviour of $A_2$ on the resulting annotated word. The simulation of the guess and the emulation of $A_2$ can be obtained simply by constructing $A_2$ and then projecting out the annotation from the transitions. The idea behind this construction is that a path in $A_3$ from an initial state to a final state that leads to the acceptance of a non-annotated word $W$ corresponds to a path in $A_2$ that leads to the acceptance of a word $W'$ which represents $W$ with some annotation, and vice versa. Observe that $A_3$ has the same number of states as $A_2$.

Let us stress that projecting out the annotations from the transitions corresponds to guessing them. However, in order for the guesses to be meaningful the automaton must be one-way, since with a two-way automaton we cannot ensure that we make the same guess each time we pass over the same position in a word.

Finally, we define the one-way automaton $A_4$ as the complement of $A_3$.
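The role of $A_4$ can be illustrated on generic one-way automata: asking whether some word is accepted by $A_1$ and by the complement of $A_3$ does not require building the complement in advance. The Python sketch below determinizes the second automaton lazily while exploring the product. It is an illustration only: the function name and the dictionary encoding are assumptions, and the breadth-first search stores all visited product states, so it does not by itself achieve any particular space bound.

```python
from collections import deque

def exists_word_in_a_not_in_b(alphabet, a, b):
    """Return True iff some word is accepted by NFA `a` and rejected by NFA `b`.

    Each NFA is a triple (delta, initial, accepting), where delta maps
    (state, symbol) to a set of successor states. The complement of `b`
    is never built: its determinized states (sets of b-states) are
    computed on demand while exploring the product with `a`.
    """
    a_delta, a_init, a_acc = a
    b_delta, b_init, b_acc = b
    start_b = frozenset(b_init)
    seen = {(s, start_b) for s in a_init}
    queue = deque(seen)
    while queue:
        a_state, b_set = queue.popleft()
        # Accepted by a, and no run of b on the same word is accepting.
        if a_state in a_acc and not (b_set & b_acc):
            return True
        for symbol in alphabet:
            next_b = frozenset(t for s in b_set
                               for t in b_delta.get((s, symbol), ()))
            for next_a in a_delta.get((a_state, symbol), ()):
                if (next_a, next_b) not in seen:
                    seen.add((next_a, next_b))
                    queue.append((next_a, next_b))
    return False

# Sample automata over {0, 1}: ends_in_1 accepts words ending in 1,
# has_a_1 accepts words containing at least one 1.
ends_in_1 = ({(0, "0"): {0}, (0, "1"): {0, 1}}, {0}, {1})
has_a_1 = ({(0, "0"): {0}, (0, "1"): {1}, (1, "0"): {1}, (1, "1"): {1}}, {0}, {1})
```

Every word ending in 1 contains a 1, so `exists_word_in_a_not_in_b("01", ends_in_1, has_a_1)` is false, while swapping the arguments gives true (the word `10` is a witness).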
Theorem 4 Let $Q_1$ and $Q_2$ be two CRPQIs and $A_1$ and $A_4$ be as specified above. Then $Q_1 \not\subseteq Q_2$ iff $A_1 \cap A_4$ is nonempty.

Proof (sketch). By Theorem 2, to check $Q_1 \not\subseteq Q_2$, it is sufficient to find a KB $\mathcal{K}$ and a mapping $\nu$ from the variables of $Q_1$ to the nodes of $\mathcal{K}$, such that $\mathcal{K}$ is $\nu$-canonical for $Q_1$, and such that no $(Q_2, \mathcal{K}, \nu)$-mapping exists. Each word $W$ accepted by $A_1$ represents a KB $\mathcal{K}_W$ and a mapping $\nu_W$ such that $\mathcal{K}_W$ is $\nu_W$-canonical for $Q_1$. $A_4$ accepts a $Q_1$-word $W$ iff there is no annotation of $W$ that represents a $(Q_2, \mathcal{K}_W, \nu_W)$-mapping. Thus, $A_1 \cap A_4$ is nonempty iff there exists a $Q_1$-word $W$ such that $\mathcal{K}_W$ is $\nu_W$-canonical for $Q_1$, and is such that no $(Q_2, \mathcal{K}_W, \nu_W)$-mapping exists. Hence, $A_1 \cap A_4$ is nonempty iff $Q_1$ is not contained in $Q_2$.

Theorem 5 Given two CRPQIs $Q_1$ and $Q_2$, checking whether $Q_1 \subseteq Q_2$ can be done in EXPSPACE.

Proof. By Theorem 4, $Q_1 \subseteq Q_2$ iff $A_1 \cap A_4$ is empty. The size of $A_1$ is exponential in the size of $Q_1$, the size of $A_2$ is exponential in the size of $Q_2$, the size of $A_3$ is polynomial in the size of $A_2$, and finally, the size of $A_4$ is exponential in the size of $A_3$. Therefore, the size of $A_4$ is doubly exponential in the size of $Q_2$. However, to check whether $A_1 \cap A_4$ is empty we do not need to construct $A_4$ explicitly. Instead, starting from $A_3$, we construct $A_4$ "on the fly"; whenever the emptiness algorithm wants to move from a state $s_1$ of the intersection of $A_1$ and $A_4$ to a state $s_2$, it guesses $s_2$ and checks that it is directly connected to $s_1$. Once this has been verified, the algorithm can discard $s_1$. Thus, at each step the algorithm needs to keep in memory at most two states and there is no need to generate all of $A_4$ at any single step of the algorithm.

4.2 LOWER BOUND
Next we show an EXPSPACE lower bound for containment of conjunctive regular path queries without inverse (CRPQs). This closes the open problem in (Florescu et al., 1998b) on whether containment of CRPQs could be done in PSPACE.

To prove the result we exploit a reduction from tiling problems (van Emde Boas, 1982, 1997; Berger, 1966). A tile is a unit square of one of several types and the tiling problem we consider is specified by means of a finite set $\Delta$ of tile types, two binary relations $H$ and $V$ over $\Delta$, representing horizontal and vertical adjacency relations, respectively, and two distinguished tile types $t_S, t_F \in \Delta$. The tiling problem consists in determining whether, for a given number $n$ in unary, a region of the integer plane of size $2^n \times k$, for some $k$, can be tiled consistently with the adjacency relations $H$ and $V$, and with the left bottom tile of the region of type $t_S$ and the right upper tile of type $t_F$. Using a reduction from acceptance of EXPSPACE Turing machines analogous to the one in (van Emde Boas, 1997), it can be shown that this tiling problem is EXPSPACE-complete.

Theorem 6 The problem of checking whether $Q_1 \subseteq Q_2$, where $Q_1$ and $Q_2$ are two CRPQs, is EXPSPACE-hard.

Proof (sketch). Let $T = (\Delta, H, V, t_S, t_F)$ be an instance of the EXPSPACE-complete tiling problem above and $n$ a number in unary. The alphabet is $\Sigma = \Delta \cup \{0, 1\}$. The query $Q_1$ is
\[ Q_1(x_1, x_2) \leftarrow x_1\, E\, x_2 \]
where the regular expression in the right hand side is
\[ E = 0^n \cdot t_S \cdot ((0 + 1)^n \cdot \Delta)^* \cdot 1^n \cdot t_F. \]
Thus, a word in $L(E)$ consists of a sequence of blocks, where each block consists of an $n$-bit address and a tile in $\Delta$. We intend the sequence of addresses to behave as an $n$-bit counter, starting with $0^n$ and ending with $1^n$. A word in $L(E)$ encodes a tiling if each pair of adjacent tiles is consistent with $H$ and each pair of tiles that have the same address, where there is no tile in between them with the same address, is consistent with $V$. Thus, a word in $L(E)$ does not encode a tiling if it contains an error. An error is a pair of blocks that exhibit an incorrect behavior of the counter or that has a pair of tiles that is inconsistent with $H$ or $V$. We detect errors using the query $Q_2$, which is
\[ Q_2(x_1, x_2) \leftarrow x_1\, E_1\, y_1 \land \Big( \bigwedge_{i \in \{0, \ldots, n\}} y_1\, F_i\, y_2 \Big) \land y_2\, E_1\, x_2 \]
where $E_1$ is the regular expression $((0 + 1)^n \cdot \Delta)^*$, and $F_i$, for $i \in \{0, \ldots, n\}$, are constructed as explained below. The intuition is that $y_1$ and $y_2$ map to a pair that represents an error.

We define a regular expression $F_C$ that detects adjacent blocks with an error in the address bits. $F_C$ can be constructed by encoding an $n$-bit counter (Börger, Grädel, & Gurevich, 1997).
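For intuition about the block structure of words in $L(E)$ and about the kind of error that $F_C$ is meant to detect, the following Python sketch instantiates $E$ for a small $n$ and checks the addresses of adjacent blocks directly. It is not the regular expression $F_C$ itself. The single-character encoding of tile types, the function names, and the choice to let the counter wrap around from $1^n$ to $0^n$ are assumptions made for this illustration.

```python
import re

def build_E(n, tiles, t_start, t_final):
    """Python regex for E = 0^n t_S ((0+1)^n Delta)^* 1^n t_F.

    tiles is a string of single-character tile types (an assumed encoding),
    none of which may be '0' or '1'.
    """
    tile_class = "[" + re.escape(tiles) + "]"
    return re.compile(
        "0{%d}%s(?:[01]{%d}%s)*1{%d}%s"
        % (n, re.escape(t_start), n, tile_class, n, re.escape(t_final))
    )

def blocks(word, n):
    """Split a word of L(E) into (address, tile) blocks of length n + 1."""
    return [(word[i:i + n], word[i + n]) for i in range(0, len(word), n + 1)]

def has_counter_error(word, n):
    """True iff two adjacent blocks do not have consecutive addresses (mod 2^n)."""
    addresses = [int(addr, 2) for addr, _ in blocks(word, n)]
    return any((a + 1) % (2 ** n) != b
               for a, b in zip(addresses, addresses[1:]))

E = build_E(2, "abc", "a", "c")
good = "00a01b10b11c"   # addresses 00, 01, 10, 11
bad = "00a10b11c"       # address 00 is followed by 10
```

Both `good` and `bad` belong to $L(E)$, but only `bad` contains a counter error, which is the situation that the conjunct for $F_C$ is intended to capture.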
We define a regular expression $F_H$ that detects adjacent blocks in which the tiles do not respect the horizontal adjacency relation $H$:
$$F_H = ((0+1)^n \cdot \Delta)^* \cdot \Bigl(\sum_{(t_1,t_2) \notin H} (0+1)^n \cdot t_1 \cdot (0+1)^n \cdot t_2\Bigr) \cdot ((0+1)^n \cdot \Delta)^*$$
5 CONCLUSIONS
We have presented both upper bound and lower bound results for containment of conjunctive regular path queries with inverse. This class of queries has several features that are typical of modern query languages for knowledge and data bases. In particular, it is the largest subset of query languages for XML data (Deutsch, Fernandez, Florescu, Levy, Maier, & Suciu, 1999) for which containment has been shown decidable.
The upper bound shows that adding inverse to conjunctive regular path queries does not increase the complexity of query containment. The lower bound holds also for the case without the inverse operator, and thus settles the question of the inherent complexity of checking containment of conjunctive regular path queries.
One interesting feature of our method is that it demonstrates the power of two-way automata in reasoning on complex queries. The method can also be adapted to more general forms of queries and reasoning tasks. Indeed, it is easy to extend our algorithm to the case of unions of conjunctive regular path queries with inverse.
Query containment is typically the first step in addressing the more involved problems of query rewriting and query answering using views. For the case of regular path queries, such problems have been studied in (Calvanese et al., 1999, 2000). We are currently working on extending these results to more powerful query languages, such as the one considered in this paper, by exploiting the techniques based on two-way automata.
In order to construct a regular expression that detects a sequence of $2^n + 1$ blocks in which the tiles in the first and last block do not respect the vertical adjacency relation $V$, we define:
$$G_0 = \sum_{(t_1,t_2) \notin V} (0+1)^n \cdot t_1 \cdot ((0+1)^n \cdot \Delta)^* \cdot (0+1)^n \cdot t_2$$
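The sum over forbidden tile pairs can be spelled out mechanically. The Python sketch below builds such an expression as a `re` pattern, under the assumptions that tiles are single characters that need no escaping inside a character class and that a block is $n$ address bits followed by one tile. The name `g0_regex` and the example relation are hypothetical. On its own this pattern says nothing about how far apart the two blocks are; it only constrains the first and the last tile.

```python
import re
from itertools import product


def g0_regex(n, tiles, V):
    """Pattern matching block sequences whose first and last tile form a pair not in V.

    Assumptions: `tiles` is a string of single-character tile symbols, `V` is a
    set of (tile, tile) pairs, and at least one pair is missing from V.
    """
    addr = f"[01]{{{n}}}"          # n address bits
    any_tile = f"[{tiles}]"        # any tile symbol
    alternatives = [
        f"{addr}{t1}(?:{addr}{any_tile})*{addr}{t2}"
        for t1, t2 in product(tiles, repeat=2)
        if (t1, t2) not in V
    ]
    return "(?:" + "|".join(alternatives) + ")"


# Hypothetical example: two tiles that may only alternate vertically.
pattern = g0_regex(1, "ab", {("a", "b"), ("b", "a")})
vertical_error = re.fullmatch(pattern, "0a1b0a") is not None
no_error = re.fullmatch(pattern, "0a1b") is None
```

In the example, the word `0a1b0a` starts and ends with tile `a`, and the pair (`a`, `a`) is not in the relation, so the pattern matches it; the word `0a1b` is not matched.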
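and, for $i \in \{1,\ldots,n\}$, we define $G_i = G_i^0 + G_i^1$, where for $b \in \{0,1\}$ ($\bar{b}$ denotes the complement of bit $b$):
$$G_i^b = (0+1)^{i-1} \cdot b \cdot (0+1)^{n-i} \cdot \Delta \cdot \bigl((0+1)^* \cdot b \cdot (0+1)^* \cdot \Delta\bigr)^* \cdot \bar{b}^{\,n} \cdot \Delta \cdot \bigl((0+1)^* \cdot b \cdot (0+1)^* \cdot \Delta\bigr)^* \cdot (0+1)^{i-1} \cdot b \cdot (0+1)^{n-i} \cdot \Delta$$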
Assuming that the address bits are correct, a word accepted by all of $G_1, \ldots, G_n$ consists of a sequence of blocks in which the first and the last block are exactly $2^n$ blocks apart. This follows from the fact that the first and the last block coincide in all the address bits, and in between there is either exactly one block with all address bits equal to 0, or exactly one block with all address bits equal to 1. The intersection of $G_0, G_1, \ldots, G_n$ detects errors due to the vertical adjacency relation $V$.
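The distance property can be checked mechanically on small instances. The following Python sketch is only an illustration. It assumes one particular shape for the per-bit expressions: the first and the last block agree on bit $i$, and exactly one block in between has every address bit equal to the complement of that bit. It also assumes a hypothetical two-letter tile alphabet `ab` and addresses written most significant bit first. All helper names are invented for the example.

```python
import re

TILE = "[ab]"  # hypothetical two-tile alphabet standing in for the set of tile types


def block_with_bit(n, i, b):
    """One block whose i-th address bit (1-based, from the left) equals b."""
    return f"[01]{{{i - 1}}}{b}[01]{{{n - i}}}{TILE}"


def g_i_b(n, i, b):
    """Pattern: first and last block have bit i equal to b, and exactly one
    in-between block has all address bits equal to the complement of b."""
    nb = "1" if b == "0" else "0"
    # A block whose address contains at least one b. Written as nb* b [01]*
    # so that the match is unambiguous; it accepts the same addresses as
    # (0+1)* b (0+1)*.
    some_b = f"{nb}*{b}[01]*{TILE}"
    all_nb = f"{nb}{{{n}}}{TILE}"
    end = block_with_bit(n, i, b)
    return f"{end}(?:{some_b})*{all_nb}(?:{some_b})*{end}"


def accepted_by_all(n, word):
    """True if the word is matched by G_i^0 + G_i^1 for every bit position i."""
    return all(
        re.fullmatch(f"(?:{g_i_b(n, i, '0')}|{g_i_b(n, i, '1')})", word)
        for i in range(1, n + 1)
    )


def counter_word(n, start, m):
    """m + 1 consecutive blocks of a correct n-bit counter, each carrying tile 'a'."""
    return "".join(
        format((start + k) % 2 ** n, f"0{n}b") + "a" for k in range(m + 1)
    )
```

Under these assumptions, `accepted_by_all(2, counter_word(2, 1, 4))` holds, since the first and last block are $2^2 = 4$ blocks apart, while `accepted_by_all(2, counter_word(2, 1, 8))` does not, because two all-zero blocks occur in between.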
Hence, by defining $F_i = G_i + F_C + F_H$, for $i \in \{0,\ldots,n\}$, the query $Q_2$ detects all three types of errors.
If there is no tiling, then every word in $L(E)$ contains an error and $Q_1$ is contained in $Q_2$. If there is a tiling, then there is a word in $L(E)$ without an error, and this word provides a counterexample to the containment of $Q_1$ in $Q_2$.
Acknowledgments
This work was supported in part by the NSF grant CCR-970006, by MURST, by ESPRIT LTR Project No. 22469 DWQ (Foundations of Data Warehouse Quality), and by the Italian Space Agency (ASI) under project "Integrazione ed Accesso a Basi di Dati Eterogenee".
CONCLUSIONS
References
Abiteboul, S., Buneman, P., & Suciu, D. (1999). Data on the Web: From Relations to Semistructured Data and XML. Morgan Kaufmann, Los Altos.
Adali, S., Candan, K. S., Papakonstantinou, Y., & Subrahmanian, V. S. (1996). Query caching and optimization in distributed mediator systems. In Proc. of the ACM SIGMOD Int. Conf. on Management of Data, pp. 137–148.
Aho, A. V., Sagiv, Y., & Ullman, J. D. (1979). Equivalence among relational expressions. SIAM J. on Computing, 8, 218–246.
Berger, R. (1966). The undecidability of the domino problem. Mem. Amer. Math. Soc., 66, 1–72.
Börger, E., Graedel, E., & Gurevich, Y. (1997). The Classical Decision Problem. Perspectives in Mathematical Logic. Springer-Verlag.
Bray, T., Paoli, J., & Sperberg-McQueen, C. M. (1998). Extensible Markup Language (XML) 1.0 – W3C recommendation. Tech. rep., World Wide Web Consortium. Available at http://www.w3.org/TR/1998/REC-xml-19980210.
Buneman, P. (1997). Semistructured data. In Proc. of the 16th ACM SIGACT SIGMOD SIGART Sym. on Principles of Database Systems (PODS'97), pp. 117–121.
Buneman, P., Davidson, S., Hillebrand, G., & Suciu, D. (1996). A query language and optimization technique for unstructured data. In Proc. of the ACM SIGMOD Int. Conf. on Management of Data, pp. 505–516.
Calvanese, D., De Giacomo, G., & Lenzerini, M. (1998). On the decidability of query containment under constraints. In Proc. of the 17th ACM SIGACT SIGMOD SIGART Sym. on Principles of Database Systems (PODS'98), pp. 149–158.
Calvanese, D., De Giacomo, G., & Lenzerini, M. (1999). Representing and reasoning on XML documents: A description logic approach. J. of Logic and Computation, 9 (3), 295–318.
Calvanese, D., De Giacomo, G., Lenzerini, M., Nardi, D., & Rosati, R. (1998). Description logic framework for information integration. In Proc. of the 6th Int. Conf. on the Principles of Knowledge Representation and Reasoning (KR'98), pp. 2–13.
Calvanese, D., De Giacomo, G., Lenzerini, M., & Vardi, M. Y. (1999). Rewriting of regular expressions and regular path queries. In Proc. of the 18th ACM SIGACT SIGMOD SIGART Sym. on Principles of Database Systems (PODS'99), pp. 194–204.
Calvanese, D., De Giacomo, G., Lenzerini, M., & Vardi, M. Y. (2000). Answering regular path queries using views. In Proc. of the 16th IEEE Int. Conf. on Data Engineering (ICDE 2000). To appear.
Chan, E. P. F. (1992). Containment and minimization of positive conjunctive queries in OODB's. In Proc. of the 11th ACM SIGACT SIGMOD SIGART Sym. on Principles of Database Systems (PODS'92), pp. 202–211.
Chandra, A. K. & Merlin, P. M. (1977). Optimal implementation of conjunctive queries in relational data bases. In Proc. of the 9th ACM Sym. on Theory of Computing (STOC'77), pp. 77–90.
Chaudhuri, S., Krishnamurthy, S., Potamianos, S., & Shim, K. (1995). Optimizing queries with materialized views. In Proc. of the 11th IEEE Int. Conf. on Data Engineering (ICDE'95), Taipei (Taiwan).
Chaudhuri, S. & Vardi, M. Y. (1992). On the equivalence of recursive and nonrecursive Datalog programs. In Proc. of the 11th ACM SIGACT SIGMOD SIGART Sym. on Principles of Database Systems (PODS'92), pp. 55–66.
Deutsch, A., Fernandez, M., Florescu, D., Levy, A., Maier, D., & Suciu, D. (1999). Querying XML data. IEEE Bulletin of the Technical Committee on Data Engineering, 22 (3), 10–18.
Donini, F. M., Lenzerini, M., Nardi, D., & Schaerf, A. (1996). Reasoning in description logics. In Brewka, G. (Ed.), Principles of Knowledge Representation, Studies in Logic, Language and Information, pp. 193–238. CSLI Publications.
Donini, F. M., Lenzerini, M., Nardi, D., & Schaerf, A. (1998). AL-log: Integrating Datalog and description logics. J. of Intelligent Information Systems, 10 (3), 227–252.
Eklund, P., Nagle, T., Nagle, J., & Gerholz, L. (Eds.). (1992). Conceptual Structures: Current Research and Practice. Ellis Horwood.
Fensel, D., Knoblock, C., Kushmerick, N., & Rousset, M.-C. (Eds.). (1999). IJCAI-99 Workshop on Intelligent Information Integration. CEUR Electronic Workshop Proceedings, http://sunsite.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-23/.
Fernandez, M., Florescu, D., Levy, A., & Suciu, D. (1999). Verifying integrity constraints on web-sites. In Proc. of the 16th Int. Joint Conf. on Artificial Intelligence (IJCAI'99).
Fernandez, M. F. & Suciu, D. (1998). Optimizing regular path expressions using graph schemas. In Proc. of the 14th IEEE Int. Conf. on Data Engineering (ICDE'98), pp. 14–23.
Florescu, D., Levy, A., & Mendelzon, A. (1998a). Database techniques for the World-Wide Web: A survey. SIGMOD Record, 27 (3), 59–74.
Florescu, D., Levy, A., & Suciu, D. (1998b). Query containment for conjunctive queries with regular expressions. In Proc. of the 17th ACM SIGACT SIGMOD SIGART Sym. on Principles of Database Systems (PODS'98), pp. 139–148.
Friedman, M., Levy, A., & Millstein, T. (1999). Navigational plans for data integration. In Proc. of the 16th Nat. Conf. on Artificial Intelligence (AAAI'99). AAAI Press/The MIT Press.
Gupta, A. & Ullman, J. D. (1992). Generalizing conjunctive query containment for view maintenance and integrity constraint verification (abstract). In Workshop on Deductive Databases (In conjunction with JICSLP), p. 195, Washington D.C. (USA).
Hopcroft, J. E. & Ullman, J. D. (1979). Introduction to Automata Theory, Languages, and Computation. Addison Wesley Publ. Co., Reading, Massachusetts.
Johnson, D. S. & Klug, A. C. (1984). Testing containment of conjunctive queries under functional and inclusion dependencies. J. of Computer and System Sciences, 28 (1), 167–189.
Klug, A. C. (1988). On conjunctive queries containing inequalities. J. of the ACM, 35 (1), 146–160.
Knoblock, C. & Levy, A. Y. (Eds.). (1995). Proceedings of the AAAI 1995 Spring Symp. on Information Gathering from Heterogeneous, Distributed Environments, No. SS-95-08 in AAAI Spring Symposium Series. AAAI Press/The MIT Press.
Levy, A. Y., Rajaraman, A., & Ordille, J. J. (1996). Query answering algorithms for information agents. In Proc. of the 13th Nat. Conf. on Artificial Intelligence (AAAI'96), pp. 40–47.
Levy, A. Y. & Rousset, M.-C. (1996). CARIN: A representation language combining Horn rules and description logics. In Proc. of the 12th European Conf. on Artificial Intelligence (ECAI'96), pp. 323–327.
Levy, A. Y. & Rousset, M.-C. (1997). Verification of knowledge bases: a unifying logical view. In Proc. of the 4th European Symposium on the Validation and Verification of Knowledge Based Systems, Leuven, Belgium.
Levy, A. Y. & Rousset, M.-C. (1998a). Combining Horn rules and description logics in CARIN. Artificial Intelligence, 104 (1-2), 165–209.
Levy, A. Y. & Rousset, M.-C. (1998b). Verification of knowledge bases based on containment checking. Artificial Intelligence, 101 (1-2), 227–250.
Levy, A. Y. & Sagiv, Y. (1995). Semantic query optimization in Datalog programs. In Proc. of the 14th ACM SIGACT SIGMOD SIGART Sym. on Principles of Database Systems (PODS'95), pp. 163–173.
Levy, A. Y. & Suciu, D. (1997). Deciding containment for queries with complex objects. In Proc. of the 16th ACM SIGACT SIGMOD SIGART Sym. on Principles of Database Systems (PODS'97), pp. 20–31.
Milo, T. & Suciu, D. (1999). Index structures for path expressions. In Proc. of the 7th Int. Conf. on Database Theory (ICDT'99), Vol. 1540 of Lecture Notes in Computer Science, pp. 277–295. Springer-Verlag.
Motro, A. (1996). Panorama: A database system that annotates its answers to queries with their properties. J. of Intelligent Information Systems, 7 (1).
Sagiv, Y. & Yannakakis, M. (1980). Equivalences among relational expressions with the union and difference operators. J. of the ACM, 27 (4), 633–655.
van der Meyden, R. (1992). The Complexity of Querying Indefinite Information. Ph.D. thesis, Rutgers University.
van Emde Boas, P. (1982). Dominoes are forever. In Proc. of 1st GTI Workshop, Reihe Theoretische Informatik UGH Paderborn, pp. 75–95, Paderborn (Germany).
van Emde Boas, P. (1997). The convenience of tilings. In Sorbi, A. (Ed.), Complexity, Logic, and Recursion Theory, Vol. 187 of Lecture notes in pure and applied mathematics, pp. 331–363. Marcel Dekker Inc.
Vardi, M. Y. (1989). A note on the reduction of two-way automata to one-way automata. Information Processing Letters, 30 (5), 261–264.
Widom, J. (1995). Special issue on materialized views and data warehousing. IEEE Bulletin on Data Engineering, 18 (2).