Conjunctive Query Answering for the Description Logic SHIQ
Birte Glimm¹, Ian Horrocks¹, Carsten Lutz², Uli Sattler¹
¹ School of Computer Science, University of Manchester, UK
² Institute for Theoretical Computer Science, TU Dresden, Germany
Abstract
Conjunctive queries play an important role as an
expressive query language for Description Logics
(DLs). Although modern DLs usually provide for
transitive roles, it was an open problem whether
conjunctive query answering over DL knowledge
bases is decidable if transitive roles are admitted
in the query. In this paper, we consider conjunctive queries over knowledge bases formulated in
the popular DL SHIQ and allow transitive roles in
both the query and the knowledge base. We show
that query answering is decidable and establish the
following complexity bounds: regarding combined
complexity, we devise a deterministic algorithm for
query answering that needs time single exponential
in the size of the KB and double exponential in the
size of the query. Regarding data complexity, we
prove co-NP-completeness.
1 Introduction
Description Logics (DLs) [Baader et al., 2003] are a well-established family of logic-based knowledge representation
formalisms that have recently gained increased attention due
to their usage as the logical underpinning of ontology languages such as DAML+OIL and OWL [Horrocks et al.,
2003]. A DL knowledge base (KB) consists of a TBox, which
contains intensional knowledge such as concept definitions
and general background knowledge, and an ABox, which
contains extensional knowledge and is used to describe individuals. Using a database metaphor, the TBox corresponds
to the schema, and the ABox corresponds to the data.
In data-intensive applications, querying KBs plays a central role. Essentially, there are two forms of querying. The
first one is instance retrieval, which allows the retrieval of all
certain instances of a given (possibly complex) concept C,
i.e., it returns all individuals from the ABox that are an instance of C in every model of the KB. Technically, instance
retrieval is well-understood. For the prominent DL SHIQ,
which underlies DAML+OIL and OWL Lite, it is ExpTime-complete [Tobies, 2001], and, despite this high worst-case
complexity, efficient implementations are available. On the
other hand, instance retrieval is a rather poor form of querying: concepts are used as queries, and thus we can only query
for structures that are invariant under (guarded) bisimulations. For this reason, many applications require conjunctive
query answering as a stronger form of querying, i.e., computing the certain answers to a conjunctive query over a DL
knowledge base.
Until now it was an open problem whether conjunctive
query answering is decidable in SHIQ. In particular, the
presence of transitive and inverse roles makes the problem
rather tricky [Glimm et al., 2006], and results were only available for two restricted cases. The first case is obtained by
stipulating that the variables in queries can only be bound to
individuals that are explicitly mentioned in the ABox. The
result is a form of closed-domain semantics, which is different from the usual open-domain semantics in DLs. It is easily
seen that conjunctive query answering in this setting can be
reduced to instance retrieval. In the second case, the binary
atoms in conjunctive queries are restricted to roles that are
neither transitive nor have transitive sub-roles, and it is known
that conjunctive query answering in this setting is decidable
and co-NP-complete regarding data complexity [Ortiz et al.,
2006].
In this paper, we show that unrestricted conjunctive query
answering in SHIQ is decidable. More precisely, we devise a decision procedure for the entailment of a conjunctive
query by a SHIQ knowledge base, which is the decision
problem corresponding to conjunctive query answering. It
is well-known that decidability and complexity results carry
over from entailment to answering. Our decision procedure
for query entailment consists of a rather intricate reduction to
KB consistency in SHIQ^⊓, i.e., SHIQ extended with role
conjunction. The latter is easily seen to be decidable. The
resulting (deterministic) algorithm for conjunctive query entailment in SHIQ needs time double exponential in the size
of the query and single exponential in the size of the KB.
This result concerns the combined complexity, i.e., it is measured in the size of all inputs: the query, the ABox, and the
TBox. Since query and TBox are usually small compared to
the ABox, the data complexity (measured in the size of the
ABox, only) is also a relevant issue. We show that (the decision problem corresponding to) conjunctive query answering
in SHIQ is co-NP-complete regarding data complexity, and
thus not harder than instance retrieval [Hustadt et al., 2005].
This paper is accompanied by a technical report which contains full proofs [Glimm et al., 2006].
IJCAI-07
399
2 Preliminaries
Syntax and Semantics of SHIQ
Let NC , NR , and NI be sets of concept names, role names, and
individual names. We assume that the role names contain
a subset NtR ⊆ NR of transitive role names. A role is an
element of NR ∪ {r− | r ∈ NR }, where roles of the form
r− are called inverse roles. A role inclusion is of the form
r ⊑ s with r, s roles. A role hierarchy H is a finite set of role
inclusions.
An interpretation I = (ΔI , ·I ) consists of a non-empty set
ΔI , the domain of I, and a function ·I , which maps every
concept name A to a subset AI ⊆ ΔI , every role name
r ∈ NR to a binary relation rI ⊆ ΔI × ΔI such that rI
is transitive for every r ∈ NtR , and every individual name a
to an element aI ∈ ΔI . An interpretation I satisfies a role
inclusion r ⊑ s if rI ⊆ sI , and a role hierarchy H if it satisfies all role inclusions in H. We use the following standard
notation:
(1) Inv(r) := r− if r ∈ NR and Inv(r) := s if r = s− for a
role name s.
(2) For a role hierarchy H, ⊑*_H is the reflexive transitive closure of ⊑ over H ∪ {Inv(r) ⊑ Inv(s) | r ⊑ s ∈ H}, and we
use r ≡*_H s as an abbreviation for r ⊑*_H s and s ⊑*_H r.
(3) For a role hierarchy H and a role s, we define the set
Trans_H of transitive roles as
{s | ∃ role r with r ≡*_H s and r ∈ NtR or Inv(r) ∈ NtR }.
(4) A role r is called simple w.r.t. a role hierarchy H if, for
each role s such that s ⊑*_H r, s ∉ Trans_H.
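The notions (1) to (4) can be computed directly for a finite role hierarchy. The following Python sketch is an illustration only: the encoding of roles as strings (a trailing "-" marks an inverse role) and the function names inv, closure, trans, and is_simple are assumptions of this sketch, not notation from the paper.

```python
from itertools import product


def inv(role):
    """Inv(r): maps r to r- and r- back to r (assumed string encoding)."""
    return role[:-1] if role.endswith("-") else role + "-"


def closure(roles, hierarchy):
    """Pairs (r, s) with r below s in the reflexive transitive closure.

    `hierarchy` is a set of pairs (r, s), each standing for one role inclusion.
    The inclusions between the corresponding inverse roles are added as well.
    """
    all_roles = set(roles) | {x for pair in hierarchy for x in pair}
    all_roles |= {inv(r) for r in all_roles}
    sub = {(r, r) for r in all_roles}
    sub |= set(hierarchy) | {(inv(r), inv(s)) for r, s in hierarchy}
    changed = True
    while changed:
        changed = False
        for (r, s), (s2, t) in product(list(sub), repeat=2):
            if s == s2 and (r, t) not in sub:
                sub.add((r, t))
                changed = True
    return sub


def trans(roles, hierarchy, transitive_names):
    """Roles equivalent to a role r with r or Inv(r) declared transitive."""
    sub = closure(roles, hierarchy)
    all_roles = {x for pair in sub for x in pair}

    def declared(r):
        return r in transitive_names or inv(r) in transitive_names

    return {
        s
        for s in all_roles
        if any(declared(r) and (r, s) in sub and (s, r) in sub for r in all_roles)
    }


def is_simple(role, roles, hierarchy, transitive_names):
    """A role is simple if no transitive role lies below it."""
    sub = closure(roles, hierarchy)
    return not any((s, role) in sub for s in trans(roles, hierarchy, transitive_names))
```

For a hierarchy containing only the inclusion of a transitive role t in r, the sketch reports t and t- as transitive, r and r- as non-simple, and an unrelated role s as simple.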
The subscript H of ⊑*_H and Trans_H is dropped if clear
from the context. SHIQ-concepts (or concepts for short)
are built inductively using the following grammar, where
A ∈ NC , n ∈ IN, r is a role, and s is a simple role:
\[ C ::= \top \mid \bot \mid A \mid \neg C \mid C_1 \sqcap C_2 \mid C_1 \sqcup C_2 \mid \forall r.C \mid \exists r.C \mid {\geq} n\, s.C \mid {\leq} n\, s.C. \]
The semantics of SHIQ-concepts is defined as usual, see
e.g. [Horrocks et al., 2000] for details.
A general concept inclusion (GCI) is an expression C ⊑ D, where both C and D are concepts. A finite set of GCIs
is called a TBox. An assertion is an expression of the form
A(a), ¬A(a), r(a, b), ¬r(a, b), or a ≠ b, where A is a concept
name, r is a role, and a, b ∈ NI . An ABox is a finite set
of assertions. We use Ind(A) to denote the set of individual
names occurring in A. An interpretation I satisfies a GCI
C ⊑ D if C^I ⊆ D^I . Satisfaction of assertions is defined
in the obvious way, e.g. A(a) is satisfied if aI ∈ AI . An
interpretation I satisfies a TBox (ABox) if it satisfies each
GCI (assertion) in it.
Observe that, in ABox assertions C(a), we require C to
be a (possibly negated) concept name. This is a standard assumption when the data complexity of DLs is analyzed, see
e.g. [Donini et al., 1994]. In this paper, we will sometimes
also allow ABox assertions C(a), where C is an arbitrary
concept. To make this explicit, we will call such ABoxes
generalized.
A knowledge base (KB) is a triple (T , H, A) with T a
TBox, H a role hierarchy, and A an ABox. Let K = (T , H, A)
be a KB and I = (ΔI , ·I ) an interpretation. An interpretation
I satisfies K if it satisfies T , H, and A. If Γ is a TBox,
ABox, or KB and I satisfies Γ, we say that I is a model of Γ
and write I |= Γ. A knowledge base K is consistent if it has
a model.
Conjunctive Queries
Let NV be a countably infinite set of variables disjoint from
NC , NR , and NI . An atom is an expression A(v) (concept
atom) or r(v, v′) (role atom), where A is a concept name, r is
a role, and v, v′ ∈ NV . A conjunctive query q is a non-empty
set of atoms. Intuitively, such a set represents the conjunction
of its elements. We use Var(q) (Var(at)) to denote the set
of variables occurring in the query q (atom at). Let I be an
interpretation, q a conjunctive query, and π : Var(q) → ΔI a
total function. We write
• I |=π C(v) if π(v) ∈ C^I ;
• I |=π r(v, v′) if (π(v), π(v′)) ∈ r^I .
If I |=π at for all at ∈ q, we write I |=π q and call π a
match for I and q. We say that I satisfies q and write I |= q
if there is a match π for I and q. If I |= q for all models I of
a KB K, we write K |= q and say that K entails q.
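For a finite interpretation, the definition of a match can be checked by brute force. The Python sketch below is illustrative only: it assumes atoms are encoded as tuples (("A", "x") for a concept atom, ("r", "x", "y") for a role atom) and that extensions are given as dictionaries; the names matches and satisfies are hypothetical. It tests I |= q for one fixed finite I and says nothing about entailment K |= q, which quantifies over all (possibly infinite) models of K.

```python
from itertools import product


def matches(query, domain, concept_ext, role_ext):
    """Yield every total function pi from Var(q) to the domain with I |=_pi q."""
    variables = sorted({v for atom in query for v in atom[1:]})
    for values in product(sorted(domain), repeat=len(variables)):
        pi = dict(zip(variables, values))
        ok = True
        for atom in query:
            if len(atom) == 2:
                # Concept atom A(v): pi(v) must belong to the extension of A.
                name, v = atom
                ok = pi[v] in concept_ext.get(name, set())
            else:
                # Role atom r(v, w): (pi(v), pi(w)) must belong to the extension of r.
                name, v, w = atom
                ok = (pi[v], pi[w]) in role_ext.get(name, set())
            if not ok:
                break
        if ok:
            yield pi


def satisfies(query, domain, concept_ext, role_ext):
    """I |= q holds if at least one match exists."""
    return next(matches(query, domain, concept_ext, role_ext), None) is not None
```

The enumeration is exponential in the number of variables, which is acceptable for a definition-level illustration but not for query answering in practice.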
The query entailment problem is defined as follows: given
a knowledge base K and a query q, decide whether K |= q.
It is well-known that query entailment and query answering
can be mutually reduced and that decidability and complexity
results carry over [Calvanese et al., 1998; Horrocks and Tessaris, 2000]. In the remainder of this paper, we concentrate
on query entailment.
For convenience, we assume that conjunctive queries
are closed under inverses, i.e., if r(v, v′) ∈ q, then
Inv(r)(v′, v) ∈ q. If we add or remove atoms from a query,
we silently assume that we do this such that the resulting
query is again closed under inverses. We will also assume
that queries are connected. Formally, a query q (closed under
inverses) is connected if, for all v, v′ ∈ Var(q), there exists
a sequence v0 , . . . , vn such that v0 = v, vn = v′, and for
all i < n, there exists a role r such that r(vi , vi+1 ) ∈ q.
A collection q0 , . . . , qk of queries is a partitioning of q if
q = q0 ∪ · · · ∪ qk , Var(qi ) ∩ Var(qj ) = ∅ for i < j ≤ k,
and each qi is connected. The following lemma shows that
connectedness can be assumed w.l.o.g.
Lemma 1. Let K be a knowledge base, q a conjunctive query,
and q0 , . . . , qn a partitioning of q. Then K |= q iff K |= qi
for all i ≤ n.
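A partitioning in the sense used by Lemma 1 can be computed with a union-find pass over the role atoms. The sketch below is an illustration under assumptions: atoms are encoded as tuples (("A", "x") or ("r", "x", "y")), and the function name partition is hypothetical.

```python
def partition(query):
    """Split a query into its connected components (variables linked by role atoms)."""
    parent = {}

    def find(v):
        # Union-find lookup with path halving.
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for atom in query:
        for v in atom[1:]:
            find(v)
        if len(atom) == 3:
            # A role atom connects its two variables.
            parent[find(atom[1])] = find(atom[2])

    parts = {}
    for atom in query:
        parts.setdefault(find(atom[1]), set()).add(atom)
    return list(parts.values())
```

Each returned set of atoms is connected and the sets share no variables, so entailment of the whole query can be decided component by component.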
3 Forests and Trees
In this section, we carefully analyze the entailment of queries
by knowledge bases and establish a set of general properties
that will play a central role in our decision procedure. We
start by showing that, in order to decide whether K |= q, it
suffices to check whether I |= q for all models I of K that are
of a particular shape. Intuitively, these models are shaped like
a forest (in the graph-theoretic sense), modulo the fact that
transitive roles have to be interpreted in transitive relations.
Let IN∗ be the set of all (finite) words over the alphabet IN.
A tree T is a non-empty, prefix-closed subset of IN∗ . For
w, w′ ∈ T , we call w′ a successor of w if w′ = w · c for
some c ∈ IN, where “·” denotes concatenation. We call w′ a
neighbor of w if w′ is a successor of w or w is a successor
of w′.
Definition 2. A forest base for K is an interpretation I that
interprets transitive roles in unrestricted (i.e., not necessarily
transitive) relations and, additionally, satisfies the following
conditions:
T1 ΔI ⊆ Ind(A) × IN∗ such that, for all a ∈ Ind(A), the
set {w | (a, w) ∈ ΔI } is a tree;
T2 if ((a, w), (a′, w′)) ∈ rI , then either w = w′ = ε or
a = a′ and w′ is a neighbor of w;
T3 for all a ∈ Ind(A), aI = (b, ε) for some b ∈ Ind(A).
A model I of K is canonical if there exists a forest base J for
K such that J is identical to I except that, for all non-simple
roles r, we have
\[ r^{I} = r^{J} \cup \bigcup_{s \sqsubseteq^{*} r,\ s \in \mathsf{Trans}} (s^{J})^{+}. \]
In this case, we say that J is a forest base for I.
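The step from a forest base J to the canonical model I only adds the transitive closures of the transitive sub-roles. For a finite fragment of a forest base this can be sketched in Python as follows; the dictionary encoding of role extensions, the set `sub` of pairs (s, r) standing for s ⊑* r, and the names transitive_closure and complete_role are assumptions of this sketch.

```python
def transitive_closure(rel):
    """Smallest transitive relation containing the finite relation `rel`."""
    result = set(rel)
    while True:
        new = {(a, d) for (a, b) in result for (c, d) in result if b == c} - result
        if not new:
            return result
        result |= new


def complete_role(r, base_ext, sub, trans):
    """Extension of r in I: the extension in J plus (s^J)^+ for transitive s below r."""
    ext = set(base_ext.get(r, set()))
    for s in trans:
        if (s, r) in sub:
            ext |= transitive_closure(base_ext.get(s, set()))
    return ext
```

Forest bases of interest are usually infinite, so this only illustrates the construction on a finite part of the domain.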
Observe that, in canonical models I, each individual a is
mapped to a pair (b, ε), where a = b does not necessarily
hold. We need this since we do not adopt the unique name
assumption (UNA): if a, b ∈ NI with a ≠ b, then we allow
that aI = bI . If desired, the UNA can easily be adopted by
adding an assertion a ≠ b for each pair of distinct individual names
in Ind(A) to the ABox.
Lemma 3. K ⊭ q iff there exists a canonical model I of K
such that I ⊭ q.
Proof. Using standard unravelling (see e.g. [Tobies, 2001]),
each model of K can be converted into a canonical model
of K. Moreover, if I |= K and I′ is the canonical model
obtained by unravelling I, then it is not hard to show that
I′ |= q implies I |= q, for all conjunctive queries q. □
Lemma 3 shows that, when deciding whether K |= q, it suffices to check whether I |= q for all canonical models I of K.
As a next step, we would like to show that, for canonical models I, to check whether I |= q, we can restrict our attention
to a particular kind of match π for I and q. A match π for I
and q is a forest match if, for all r(v, v′) ∈ q, we have one of
the following:
• π(v), π(v′) ∈ NI × {ε};
• π(v), π(v′) ∈ {a} × IN∗ for some a ∈ Ind(A).
Alas, it is not sufficient to only consider forest matches for I
and q. Instead, we show the following: we can rewrite q into
a set of queries Q such that, for all canonical models I, we
have that I |= q iff I |=π q′ for some q′ ∈ Q and forest
match π. Intuitively, this complication is due to the presence
of transitive roles.
Definition 4. A query q′ is called a transitivity rewriting of q w.r.t. K if it is obtained from q by choosing
atoms r0 (v0 , v0′), . . . , rn (vn , vn′) ∈ q and roles s0 , . . . , sn ∈
Trans_H such that si ⊑*_H ri for all i ≤ n, and then replacing
ri (vi , vi′) with
si (vi , ui ), si (ui , vi′)
or
si (vi , ui ), si (ui , ui′), si (ui′, vi′)
for all i ≤ n, where ui , ui′ ∉ Var(q). We use trK (q) to denote
the set of all transitivity rewritings of q w.r.t. K.
We assume that trK (q) contains no isomorphic queries, i.e.,
differences in (newly introduced) variable names are neglected. Together with Lemma 3, the following lemma shows
that, to decide whether K |= q, it suffices to check the existence of some canonical model I, some forest match π, and
some q′ ∈ trK (q) such that I |=π q′.
Lemma 5. Let I be a model of K.
1. If I is canonical and I |= q, then there is a q′ ∈ trK (q)
such that I |=π q′, with π a forest match.
2. If I |= q′ with q′ ∈ trK (q), then I |= q.
Proof. (1) can be proved by using the match and the canonical
structure of I to guide the rewriting process. (2) holds by
definition of transitivity rewritings and the semantics. □
The most important property of forest matches is the following: if I |=π q with π a forest match, then π splits the query q
into several subqueries: the base subquery q0 contains all role
atoms that are matched to root nodes:
\[ q_0 := \{ r(v, v') \in q \mid \pi(v), \pi(v') \in N_I \times \{\varepsilon\} \}. \]
Moreover, for each (a, ε) ∈ NI × {ε} which occurs in the
range of π, there is an object subquery qa :
\[ q_a := \{ at \mid \forall v \in \mathsf{Var}(at) : \pi(v) \in \{a\} \times \mathbb{N}^{*} \} \setminus q_0. \]
Clearly, q = q0 ∪ ⋃_a qa . Although the resulting subqueries
are not a partitioning of q in the sense of Section 2, one of the
fundamental ideas behind our decision procedure is to treat
the different subqueries more or less separately. The main
benefit is that the object subqueries can be rewritten into tree-shaped queries which can then be translated into concepts.
This technique is also known as “rolling up” of tree conjunctive queries into concepts and was proposed in [Calvanese et
al., 1998; Horrocks and Tessaris, 2000].
Formally, a query q is tree-shaped if there exists a bijection
σ from Var(q) into a tree T such that r(v, v′) ∈ q implies
that σ(v) is a neighbor of σ(v′) in T . Before we show how to
rewrite the object subqueries into tree-shaped queries, let us
substantiate our claim that tree-shaped queries can be rolled
up into concepts. The result of rolling up is not a SHIQ-concept, but a concept formulated in SHIQ^⊓, the extension
of SHIQ with role intersection. More precisely, SHIQ^⊓ is
obtained from SHIQ by admitting the concept constructors
∃α.C, ∀α.C, ≥n α.C, and ≤n α.C, where α is a role conjunction r1 ⊓ · · · ⊓ rk with the ri (possibly inverse) roles.
Let q be a tree-shaped query and σ a bijection from Var(q)
into a tree T . Inductively assign to each v ∈ Var(q) a
SHIQ^⊓-concept Cq (v):
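• if σ(v) is a leaf of T , then
\[ C_q(v) := \bigsqcap_{A(v) \in q} A; \]
• if σ(v) has successors σ(v1 ), . . . , σ(vn ) in T , then
\[ C_q(v) := \bigsqcap_{A(v) \in q} A \;\sqcap\; \bigsqcap_{1 \le i \le n} \exists \Bigl( \bigsqcap_{r(v, v_i) \in q} r \Bigr) . C_q(v_i). \]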
Then the rolling up Cq of q is defined as Cq (vr ), where vr
is such that σ(vr ) = ε. (Recall that σ is a bijection, hence,
such a vr exists.) The following lemma shows the connection
between the query and the rolled up concept.
Lemma 6. Let q be a tree-shaped query and I an interpretation. Then I |= q iff Cq^I ≠ ∅.
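The inductive definition of the rolling up can be mirrored by a short recursive function. The Python sketch below is illustrative only and rests on several assumptions: atoms are tuples (("A", "x") or ("r", "x", "y")), the input is tree-shaped (no loops or cycles apart from the inverse atoms), the caller names the root variable instead of supplying the bijection σ, concepts are rendered as plain strings, and an empty conjunction is printed as ⊤. The name roll_up is hypothetical.

```python
def roll_up(query, root):
    """Render the rolled-up concept of a tree-shaped query as a string."""
    concepts = {}  # variable -> concept names asserted for it
    edges = {}     # variable -> neighbor -> roles on atoms r(variable, neighbor)
    for atom in query:
        if len(atom) == 2:
            concepts.setdefault(atom[1], set()).add(atom[0])
        else:
            r, v, w = atom
            edges.setdefault(v, {}).setdefault(w, set()).add(r)

    def build(v, parent):
        conjuncts = sorted(concepts.get(v, ()))
        for child in sorted(edges.get(v, {})):
            if child == parent:
                continue  # atoms pointing back to the predecessor are not rolled up
            roles = " ⊓ ".join(sorted(edges[v][child]))
            conjuncts.append(f"∃({roles}).{build(child, v)}")
        return "(" + " ⊓ ".join(conjuncts) + ")" if conjuncts else "⊤"

    return build(root, None)
```

For the query consisting of A(x), r(x, y), its inverse atom, and B(y), rolling up from x yields (A ⊓ ∃(r).(B)).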
We now show how to transform a query q into a set Q of tree-shaped ones. To describe the exact goal of this translation,
we need to introduce tree matches as a special case of forest
matches: a match π for a canonical model I and q is a tree
match if the range of π is a subset of {a} × IN∗ , for some
a ∈ NI . Now, our tree transformation should be such that
(∗) whenever I |=π q with I a canonical model
and π a tree match, then I |=π′ q′ for some (tree-shaped) query q′ ∈ Q and tree match π′.
Recall the splitting of a query into a base subquery and a set
of object subqueries qa , induced by a forest match π. It is not
hard to see that for each qa , the restriction of π to Var(qa ) is
a tree match for I and qa . Thus, object subqueries together
with their inducing matches π satisfy the precondition of (∗).
The rewriting of a query into a tree-shaped one is a
three-stage process. In the first stage, we derive a collapsing q0 of the original query q by (possibly) identifying variables in q. This allows us, e.g., to transform atoms
r(v, u), r(v, u′), r(u, w), r(u′, w) into a tree shape by identifying u and u′. In the second stage, we derive an extension q1 of q0 by (possibly) introducing new variables and
role atoms that make certain existing role atoms r(v, v′)
redundant, where r is non-simple. In the third stage, we
derive a reduct q′ of q1 by (possibly) removing redundant
role atoms, i.e., atoms r(v, v′) such that there exist atoms
s(v0 , v1 ), . . . , s(vn−1 , vn ) ∈ q′ with v0 = v, vn = v′, s ⊑* r,
and s ∈ Trans. Combining the extension and reduct steps allows us, e.g., to transform a non-tree-shaped “loop” r(v, v)
into a tree shape by adding a new variable v′ and edges
s(v, v′), s(v′, v) such that s ⊑* r and s ∈ Trans, and then
removing the redundant atom r(v, v).
In what follows, the size |q| of a query q is defined as the
number of atoms in q.
Definition 7. A collapsing of q is obtained by identifying
variables in q. A query q′ is an extension of q w.r.t. K if:
1. q ⊆ q′;
2. A(v) ∈ q′ implies A(v) ∈ q;
3. r(v, v′) ∈ q′ \ q implies that r occurs in K;
4. |Var(q′)| ≤ 4|q|;
5. |{r(v, v′) ∈ q′ | r(v, v′) ∉ q}| ≤ 171|q|².
A query q′ is a reduct of q w.r.t. K if:
1. q′ ⊆ q;
2. A(v) ∈ q implies A(v) ∈ q′;
3. if r(v, v′) ∈ q \ q′, then there is a role s such that s ⊑* r,
s ∈ Trans, and there are v0 , . . . , vn such that v0 = v,
vn = v′, and s(vi , vi+1 ) ∈ q′ for all i < n.
A query q′ is a tree transformation of q w.r.t. K if there exist
queries q0 and q1 such that
• q0 is a collapsing of q,
• q1 is an extension of q0 w.r.t. K, and
• q′ is a tree-shaped reduct of q1 .
We use ttK (q) to denote the set of all tree transformations of
q w.r.t. K.
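The three reduct conditions can be checked mechanically. The following Python sketch is an illustration under assumptions: atoms are tuples (("A", "x") or ("r", "x", "y")), `sub` is a set of pairs (s, r) standing for s ⊑* r, `trans` is the set Trans, and the witnessing path is required to contain at least one atom. The names s_path_exists and is_reduct are hypothetical, and tree-shapedness is not checked here.

```python
def s_path_exists(query, s, start, goal):
    """True if goal is reachable from start along one or more s-atoms of the query."""
    frontier, seen = [start], {start}
    while frontier:
        v = frontier.pop()
        for atom in query:
            if len(atom) == 3 and atom[0] == s and atom[1] == v:
                w = atom[2]
                if w == goal:
                    return True
                if w not in seen:
                    seen.add(w)
                    frontier.append(w)
    return False


def is_reduct(q_reduced, q, sub, trans):
    """Check conditions 1 to 3 of a reduct for finite sets of atoms."""
    if not q_reduced <= q:
        return False  # condition 1: only removals are allowed
    for atom in q - q_reduced:
        if len(atom) == 2:
            return False  # condition 2: concept atoms must be kept
        r, v, w = atom
        # Condition 3: a removed role atom needs a path of s-atoms with s below r.
        if not any((s, r) in sub and s_path_exists(q_reduced, s, v, w) for s in trans):
            return False
    return True
```

On the loop example from the text, the query with atoms r(v, v), s(v, u), s(u, v) has the reduct obtained by dropping r(v, v), provided s is transitive and s ⊑* r.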
We remark that Condition 5 of query extensions is not strictly
needed for correctness, but it ensures that our algorithm is
only single exponential in the size of K. As in the case of
trK (q), we assume that ttK (q) does not contain any isomorphic queries.
The next lemma states the central properties of tree transformations. Before we can formulate it, we introduce two
technical notions. Let q′ ∈ ttK (q), I |=π q, and I |=π′ q′.
Then π and π′ are called ε-compatible if the following holds:
if v ∈ Var(q), v was identified with variable v′ ∈ Var(q′)
during the collapsing step, and at least one of π(v), π′(v′) is
in NI × {ε}, then π(v) = π′(v′). Furthermore, we call π an
a-tree match if π(v) ∈ {a} × IN∗ for all v ∈ Var(q).
Lemma 8. Let I be a model of K.
1. If I is canonical and π an a-tree match with I |=π q,
then there is a q′ ∈ ttK (q) and an a-tree match π′ such
that I |=π′ q′ and π, π′ are ε-compatible.
2. If q′ ∈ ttK (q) and I |=π′ q′, then there is a match π
with I |=π q and π, π′ are ε-compatible.
Proof. The proof of (2) is straightforward, but the proof of
(1) is quite complex. The basic idea behind the proof of (1) is
that, given a canonical model I of K and a tree match π such
that I |=π q, we can use π and I to guide the transformation
process. The bounds of 4|q| and 171|q|² in Conditions 4 and 5
of extensions are derived by computing the maximum number
of variables and atoms that might be introduced in the guided
transformation process. □
Intuitively, using a-tree matches and ε-compatibility in
Lemma 8 ensures that, if we are given a match for the base
subquery and a tree match for a tree transformation of each
object subquery, then we can construct a forest match of the
original query.
4 The Decision Procedure
Let K be a knowledge base and q a conjunctive query such
that we want to decide whether K |= q. A counter model for
K and q is a model I of K such that I ⊭ q. In the following,
we show how to convert K and q into a sequence of knowledge bases K1 , . . . , Kℓ such that (i) every counter model for
K and q is a model of some Ki , (ii) every canonical model
of a Ki is a counter model for K and q, and (iii) each consistent Ki has a canonical model. Thus, K |= q iff all the Ki are
inconsistent. This gives rise to two decision procedures: a deterministic one in which we enumerate all Ki and which we
use to derive an upper bound for combined complexity; and
a non-deterministic one in which we guess a Ki and which
yields a tight upper bound for data complexity.
Since the knowledge bases Ki involve concepts obtained
by rolling up tree-shaped queries, they are formulated in
SHIQ^⊓ rather than in SHIQ. More precisely, each KB Ki
is of the form (T ∪ Tq , H, A ∪ Ai ), where
• (T , H, A) is a SHIQ knowledge base;
• Tq is a set of GCIs ⊤ ⊑ C with C a SHIQ^⊓ concept;
• Ai is a generalized SHIQ^⊓-ABox¹ such that
Ind(Ai ) ⊆ Ind(A).
In what follows, we call knowledge bases of this form extended knowledge bases. Using a standard unravelling argument, it is easy to establish Property (iii) from above, i.e.,
every consistent extended knowledge base K has a canonical
model.
The actual construction of the extended knowledge bases
is based on the analysis in Section 3. To start with, Lemmas 3
and 5 imply the following: to ensure that a canonical model of
an extended KB is a counter model for K and q, it suffices to
prevent forest matches of all queries q′ ∈ trK (q). In order to
prevent such matches, we use the parts Tq and Ai of extended
knowledge bases.
More precisely, we distinguish between two kinds of forest
matches: tree matches and true forest matches, i.e., forest
matches that are not tree matches. Preventing tree matches
of a q′ ∈ trK (q) in a canonical model is relatively simple:
by Lemmas 8 and 6, it suffices to ensure that, for all q″ ∈
ttK (q′), the corresponding concept Cq″ does not have any
instances. Therefore, Tq is defined as follows:
\[ T_q = \{ \top \sqsubseteq \neg C_{q''} \mid q'' \in \mathsf{tt}_K(q') \text{ for some } q' \in \mathsf{tr}_K(q) \}. \]
It is easily seen that true forest matches π always involve
at least one ABox individual (i.e., π(v) ∈ NI × {ε} for at
least one variable v). In order to prevent true forest matches,
we thus use an ABox Ai . Intuitively, we obtain the ABoxes
A1 , . . . , Aℓ by considering all possible ways of adding assertions to K such that, for all queries q′ ∈ trK (q), all true forest
matches are prevented. This gives rise to the knowledge bases
K1 , . . . , Kℓ .
As suggested in Section 3, we can prevent a true forest
match π of q′ ∈ trK (q) by splitting π into a base subquery
and a number of object subqueries and then making sure that
either the base query fails to match (this involves only individual names from the ABox) or at least one of the object
subqueries fails to have a tree match. In Section 3, however,
we used a concrete forest match π to split a query into subqueries. Here, we do not have a concrete π available and must
consider all possible ways in which a forest match can split a
query.
Let q′ ∈ trK (q). A splitting candidate for q′ is a partial
function τ : Var(q′) → Ind(A) with non-empty domain. For
a ∈ Ind(A), we use Reach(a, τ ) to denote the set of variables
v ∈ Var(q′) for which there exists a sequence of variables
v0 , . . . , vn , n ≥ 0, such that
• τ (v0 ) = a and vn = v;
• for all i ≤ n, τ (vi ) = a or τ (vi ) is undefined;
• for all i < n, there is a role r s.t. r(vi , vi+1 ) ∈ q′.
¹ Recall that an ABox is generalized if it admits assertions C(a)
with C an arbitrary concept.
We call τ a split mapping for q′ if, for all a, b ∈ Ind(A), a ≠
b implies Reach(a, τ ) ∩ Reach(b, τ ) = ∅. Intuitively, each
split mapping τ for q′ represents the set of forest matches π
of q′ such that π(v) = (τ (v), ε) for all v in the domain of
τ . Each injective split mapping for q′ induces a splitting of
q′ into a base query and object queries. Split mappings τ
need not be injective, however, and thus the general picture is
that they induce a splitting of the collapsing q″ of q′ obtained
by identifying all variables v, v′ with τ (v) = τ (v′). This
splitting is as follows:
\[ q_0^{\tau} := \{ r(v, v') \in q'' \mid \tau(v), \tau(v') \text{ is defined} \} \]
\[ q_a^{\tau} := \{ at \in q'' \mid \mathsf{Var}(at) \subseteq \mathsf{Reach}(a, \tau) \} \setminus q_0^{\tau} \]
for all a ∈ NI that are in the range of τ . Observe that the
condition which distinguishes splitting candidates and split
mappings ensures that a ≠ b implies Var(qaτ ) ∩ Var(qbτ ) = ∅.
This condition is satisfied by the splittings described in Section 3, and it is needed to independently roll up the subqueries
qaτ into concepts.
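The reachability sets and the induced splitting are straightforward to compute. The Python sketch below is illustrative and makes several assumptions: atoms are tuples (("A", "x") or ("r", "x", "y")), the query is closed under inverses, τ is a dictionary from variables to individual names, and τ is injective, so that the collapsing step can be skipped. The names reach, is_split_mapping, and split are hypothetical.

```python
def reach(query, tau, a):
    """Variables reachable from a variable mapped to `a` via variables that are
    mapped to `a` or left unmapped by tau."""
    seen = {v for v, ind in tau.items() if ind == a}
    frontier = list(seen)
    while frontier:
        v = frontier.pop()
        for atom in query:
            if len(atom) == 3 and atom[1] == v:
                w = atom[2]
                if w not in seen and tau.get(w, a) == a:
                    seen.add(w)
                    frontier.append(w)
    return seen


def is_split_mapping(query, tau):
    """A splitting candidate is a split mapping if distinct individuals have
    disjoint reachability sets."""
    individuals = sorted(set(tau.values()))
    return all(
        not (reach(query, tau, a) & reach(query, tau, b))
        for i, a in enumerate(individuals)
        for b in individuals[i + 1:]
    )


def split(query, tau):
    """Base query and object queries induced by an injective split mapping."""
    base = {at for at in query if len(at) == 3 and at[1] in tau and at[2] in tau}
    objects = {}
    for a in set(tau.values()):
        reachable = reach(query, tau, a)
        objects[a] = {at for at in query if set(at[1:]) <= reachable} - base
    return base, objects
```

For a non-injective τ, the variables with equal τ-values would first have to be identified, as described in the text.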
In the following, we use sub(q) to denote the set of subqueries of q, i.e., the set of non-empty subsets of q. Let
Q := {q3 | ∃q1 , q2 : q1 ∈ trK (q), q2 ∈ sub(q1 ), q3 ∈ ttK (q2 )}
and let cl(q) be the set of rolled up concepts Cq′ , for all
q′ ∈ Q. A SHIQ^⊓ ABox A′ is called a q-completion if it
contains only assertions of the form
• ¬C(a) for some C ∈ cl(q) and a ∈ Ind(A) and
• ¬r(a, b) for a role name r occurring in Q and a, b ∈
Ind(A).
Let τ be a split mapping for q′ ∈ trK (q) and A′ a q-completion. We say that A′ spoils τ if one of the following
holds:
(a) there is an r(v, v′) ∈ q0τ such that ¬r(τ (v), τ (v′)) ∈ A′;
(b) there is an a in the range of τ such that ¬C(a) ∈ A′,
where
\[ C := \bigsqcup_{q'' \in \mathsf{tt}_K(q_a^{\tau})} C_{q''}. \]
Intuitively, (b) prevents tree matches of the object subqueries,
cf. Lemmas 8 and 6.
Finally, a q-completion A′ is called a counter candidate for
K and q if for all q′ ∈ trK (q) and all split mappings τ for q′,
A′ spoils τ . Let A1 , . . . , Aℓ be all counter candidates for K
and q and K1 , . . . , Kℓ the associated extended KBs.
Lemma 9. K |= q iff K1 , . . . , Kℓ are inconsistent.
We now define the two decision procedures for query entailment in SHIQ: in the deterministic version, we generate all
q-completions A′ of A and return “K |= q” if all generated
A′ that are a counter candidate give rise to an inconsistent
extended KB. Otherwise, we return “K ⊭ q”. In the non-deterministic version, we guess a q-completion A′ of A, return “K ⊭ q” if A′ is a counter candidate and the associated
extended KB is consistent, and “K |= q” otherwise. To show
that these algorithms are decision procedures, it remains to
show that the consistency of extended knowledge bases is decidable. The following theorem can be proved by a reduction
to the DL ALCQIb and using results from [Tobies, 2001].
In the following, the size |Γ| of an ABox, TBox, role hierarchy or (extended) knowledge base Γ is simply the number of
symbols needed to write Γ (with numbers written in binary).
Theorem 10. There is a polynomial p such that, given an
extended knowledge base K′ = (T ∪ Tq , H, A ∪ A′) with
|K′| = k, |A ∪ A′| = a, |T ∪ Tq ∪ H| = t, and where the
maximum length of concepts in Tq and A′ is ℓ, we can decide
consistency of K′ in
• deterministic time in 2^{p(k) 2^{p(ℓ)}};
• non-deterministic time in p(a) · 2^{2^{p(t)}}.
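The control flow of the deterministic procedure described before Theorem 10 can be sketched as a loop over all q-completions. This Python skeleton is an illustration only: the two tests are passed in as callables because their implementation (the spoiling check for all split mappings, and the consistency check for extended KBs) is outside the scope of the sketch, and the names entails, is_counter_candidate, and is_consistent are hypothetical.

```python
from itertools import chain, combinations


def entails(candidate_assertions, is_counter_candidate, is_consistent):
    """Return True for "K |= q", i.e., if every counter candidate yields an
    inconsistent extended KB.

    `candidate_assertions` is the finite pool of negated assertions from which
    q-completions are built; every subset of the pool is one q-completion.
    """
    pool = sorted(candidate_assertions)
    subsets = chain.from_iterable(
        combinations(pool, size) for size in range(len(pool) + 1)
    )
    for completion in map(frozenset, subsets):
        if is_counter_candidate(completion) and is_consistent(completion):
            return False  # a consistent counter candidate witnesses non-entailment
    return True
```

The non-deterministic version replaces the loop by a single guessed completion, which is what allows the data complexity analysis to avoid the exponential enumeration.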
We now discuss the complexity of our algorithms. We start
by establishing some bounds on the number and size of transitivity rewritings, tree transformations, etc.
Lemma 11. Let |q| = n and |K| = m. Then there is a
polynomial p such that
(a) |trK (q)| ≤ 2^{p(n)·log p(m)};
(b) for all q′ ∈ trK (q), |q′| ≤ 3n;
(c) for all q′ ∈ trK (q), |ttK (q′)| ≤ 2^{p(n)·log p(m)};
(d) for all q′ ∈ trK (q) and q″ ∈ ttK (q′), |q″| ≤ p(n);
(e) |cl(q)| ≤ 2^{p(n)·log p(m)}; and
(f) for all C ∈ cl(q), |C| ≤ p(n).
Let k = |cl(q)|. We first show that the deterministic version
of our algorithm runs in time exponential in m and double
exponential in n. This follows from Theorem 10 together
with the following observations:
(i) The number of q-completions is bounded by 2^{k·m + k·m²},
which, by Lemma 11(e), is exponential in m and double
exponential in n;
(ii) Checking whether a q-completion is a counter candidate
can be done in time exponential in n and polynomial
in m;
(iii) By Lemma 11, the cardinality of Tq and of each q-completion is exponential in n and polynomial in m, and
the maximum length of concepts in Tq and A′ is polynomial in n (and independent of m).
Now for the non-deterministic version. Since we aim at an
upper bound for data complexity, we only need to verify that
the algorithm runs in time polynomial in the size of A, and
can neglect the contribution of T , H, and q to time complexity. The desired result follows from Theorem 10 and
Points (ii) and (iii) above. This bound is tight since conjunctive query entailment is already co-NP-hard regarding
data complexity in the AL fragment of SHIQ [Donini et
al., 1994]. Summing up, we thus have the following.
Theorem 12. Conjunctive query entailment in SHIQ is data
complete for co-NP, and can be decided in time exponential
in the size of the knowledge base and double exponential in
the size of the query.
5 Conclusions
With the decision procedure presented for conjunctive query
entailment (and therefore for query answering) in SHIQ, we
close a long-standing open problem. It will be part of future work to
extend this procedure to SHOIN, which is the DL underlying OWL DL. We will also attempt to find more implementable algorithms for query answering in SHIQ. Carrying out query transformations in a more goal-directed way
will be crucial to achieving this.
Acknowledgements. This work was supported by the EU
funded IST-2005-7603 FET Project Thinking Ontologies
(TONES). Birte Glimm was supported by an EPSRC studentship.
References
[Baader et al., 2003] F. Baader, D. Calvanese, D. McGuinness, D. Nardi, and P. F. Patel-Schneider, editors. The Description Logic Handbook: Theory, Implementation, and
Applications. Cambridge University Press, 2003.
[Calvanese et al., 1998] D. Calvanese, G. De Giacomo, and
M. Lenzerini. On the decidability of query containment
under constraints. In Proc. of PODS’98. ACM Press, 1998.
[Donini et al., 1994] F. Donini, M. Lenzerini, D. Nardi, and
A. Schaerf. Deduction in concept languages: From subsumption to instance checking. Journal of Logic and Computation, 4(4):423-452, 1994.
[Glimm et al., 2006] B. Glimm, I. Horrocks, and U. Sattler.
Conjunctive query answering for description logics with
transitive roles. In Proc. of DL-06. CEUR, 2006.
[Glimm et al., 2006] B. Glimm, I. Horrocks, C. Lutz, and
U. Sattler. Conjunctive query answering for the description logic SHIQ. LTCS-Report 06-01, Theoretical Computer Science, TU Dresden, 2006. See http://lat.inf.tu-dresden.de/research/reports.html.
[Horrocks and Tessaris, 2000] I. Horrocks and S. Tessaris. A
conjunctive query language for description logic aboxes.
In Proc. of AAAI-00, 2000.
[Horrocks et al., 2000] I. Horrocks, U. Sattler, and S. Tobies. Reasoning with Individuals for the Description
Logic SHIQ. In Proc. of CADE-00, vol. 1831 of LNAI.
Springer-Verlag, 2000.
[Horrocks et al., 2003] I. Horrocks, P. F. Patel-Schneider,
and F. van Harmelen. From SHIQ and RDF to OWL: The
making of a web ontology language. Journ. of Web Semantics, 1(1), 2003.
[Hustadt et al., 2005] U. Hustadt, B. Motik, and U. Sattler.
Data complexity of reasoning in very expressive description logics. In Proc. of IJCAI-05, 2005.
[Ortiz et al., 2006] M. M. Ortiz, D. Calvanese, and T. Eiter.
Characterizing data complexity for conjunctive query answering in expressive description logics. In Proc. of AAAI-06, 2006.
[Tobies, 2001] S. Tobies. Complexity Results and Practical
Algorithms for Logics in Knowledge Representation. PhD
thesis, RWTH Aachen, 2001.