Conjunctive Query Answering for the Description Logic SHIQ Birte Glimm1, Ian Horrocks1 , Carsten Lutz2 , Uli Sattler1 1 2 School of Computer Science University of Manchester, UK Abstract Conjunctive queries play an important role as an expressive query language for Description Logics (DLs). Although modern DLs usually provide for transitive roles, it was an open problem whether conjunctive query answering over DL knowledge bases is decidable if transitive roles are admitted in the query. In this paper, we consider conjunctive queries over knowledge bases formulated in the popular DL SHIQ and allow transitive roles in both the query and the knowledge base. We show that query answering is decidable and establish the following complexity bounds: regarding combined complexity, we devise a deterministic algorithm for query answering that needs time single exponential in the size of the KB and double exponential in the size of the query. Regarding data complexity, we prove co-NP-completeness. 1 Introduction Description Logics (DLs) [Baader et al., 2003] are a wellestablished family of logic-based knowledge representation formalisms that have recently gained increased attention due to their usage as the logical underpinning of ontology languages such as DAML+OIL and OWL [Horrocks et al., 2003]. A DL knowledge base (KB) consists of a TBox, which contains intensional knowledge such as concept definitions and general background knowledge, and an ABox, which contains extensional knowledge and is used to describe individuals. Using a database metaphor, the TBox corresponds to the schema, and the ABox corresponds to the data. In data-intensive applications, querying KBs plays a central role. Essentially, there are two forms of querying. The first one is instance retrieval, which allows the retrieval of all certain instances of a given (possibly complex) concept C, i.e., it returns all individuals from the ABox that are an instance of C in every model of the KB. Technically, instance retrieval is well-understood. For the prominent DL SHIQ, which underlies DAML+OIL and OWL Lite, it is E XP T IMEcomplete [Tobies, 2001], and, despite this high worst-case complexity, efficient implementations are available. On the other hand, instance retrieval is a rather poor form of querying: concepts are used as queries, and thus we can only query Institute for Theoretical Computer Science TU Dresden, Germany for structures that are invariant under (guarded) bisimulations. For this reason, many applications require conjunctive query answering as a stronger form of querying, i.e., computing the certain answers to a conjunctive query over a DL knowledge base. Until now it was an open problem whether conjunctive query answering is decidable in SHIQ. In particular, the presence of transitive and inverse roles makes the problem rather tricky [Glimm et al., 2006], and results were only available for two restricted cases. The first case is obtained by stipulating that the variables in queries can only be bound to individuals that are explicitly mentioned in the ABox. The result is a form of closed-domain semantics, which is different from the usual open-domain semantics in DLs. It is easily seen that conjunctive query answering in this setting can be reduced to instance retrieval. In the second case, the binary atoms in conjunctive queries are restricted to roles that are neither transitive nor have transitive sub-roles, and it is known that conjunctive query answering in this setting is decidable and co-NP-complete regarding data complexity [Ortiz et al., 2006]. In this paper, we show that unrestricted conjunctive query answering in SHIQ is decidable. More precisely, we devise a decision procedure for the entailment of a conjunctive query by a SHIQ knowledge base, which is the decision problem corresponding to conjunctive query answering. It is well-known that decidability and complexity results carry over from entailment to answering. Our decision procedure for query entailment consists of a rather intricate reduction to KB consistency in SHIQ , i.e., SHIQ extended with role conjunction. The latter is easily seen to be decidable. The resulting (deterministic) algorithm for conjunctive query entailment in SHIQ needs time double exponential in the size of the query and single exponential in the size of the KB. This result concerns the combined complexity, i.e., it is measured in the size of all inputs: the query, the ABox, and the TBox. Since query and TBox are usually small compared to the ABox, the data complexity (measured in the size of the ABox, only) is also a relevant issue. We show that (the decision problem corresponding to) conjunctive query answering in SHIQ is co-NP-complete regarding data complexity, and thus not harder than instance retrieval [Hustadt et al., 2005]. This paper is accompanied by a technical report which contains full proofs [Glimm et al., 2006]. IJCAI-07 399 2 Preliminaries Syntax and Semantics of SHIQ Let NC , NR , and NI be sets of concept names, role names, and individual names. We assume that the role names contain a subset NtR ⊆ NR of transitive role names. A role is an element of NR ∪ {r− | r ∈ NR }, where roles of the form r− are called inverse roles. A role inclusion is of the form r s with r, s roles. A role hierarchy H is a finite set of role inclusions. An interpretation I = (ΔI , ·I ) consists of a non-empty set ΔI , the domain of I, and a function ·I , which maps every concept name A to a subset AI ⊆ ΔI , every role name r ∈ NR to a binary relation rI ⊆ ΔI × ΔI such that rI is transitive for every r ∈ NtR , and every individual name a to an element aI ∈ ΔI . An interpretation I satisfies a role inclusion r s if rI ⊆ sI , and a role hierarchy H if it satisfies all role inclusions in H. We use the following standard notation: (1) Inv(r) := r− if r ∈ NR and Inv(r) := s if r = s− for a role name s. (2) For a role hierarchy H, ∗H is the reflexive transitive closure of over H ∪ {Inv(r) Inv(s) | r s ∈ H}, and we use r ≡∗H s as an abbreviation for r ∗H s and s ∗H r. (3) For a role hierarchy H and a role s, we define the set TransH of transitive roles as {s | ∃ role r with r ≡∗H s and r ∈ NtR or Inv(r) ∈ NtR }. (4) A role r is called simple w.r.t. a role hierarchy H if, for / TransH . each role s such that s ∗H r, s ∈ The subscript H of ∗H and TransH is dropped if clear from the context. SHIQ-concepts (or concepts for short) are built inductively using the following grammar, where A ∈ NC , n ∈ IN, r is a role, and s is a simple role: C ::= | ⊥ | A | ¬C | C1 C2 | C1 C2 | ∀r.C | ∃r.C | ns.C | ns.C. The semantics of SHIQ-concepts is defined as usual, see e.g. [Horrocks et al., 2000] for details. A general concept inclusion (GCI) is an expression C D, where both C and D are concepts. A finite set of GCIs is called a TBox. An assertion is an expression of the form . A(a), ¬A(a), r(a, b), ¬r(a, b), or a = b, where A is a concept name, r is a role, and a, b ∈ NI . An ABox is a finite set of assertions. We use Ind(A) to denote the set of individual names occurring in A. An interpretation I satisfies a GCI C D if C I ⊆ DI . Satisfaction of assertions is defined in the obvious way, e.g. A(a) is satisfied if aI ∈ AI . An interpretation I satisfies a TBox (ABox) if it satisfies each GCI (assertion) in it. Observe that, in ABox assertions C(a), we require C to be a (possibly negated) concept name. This is a standard assumption when the data complexity of DLs is analyzed, see e.g. [Donini et al., 1994]. In this paper, we will sometimes also allow ABox assertions C(a), where C is an arbitrary concept. To make this explicit, we will call such ABoxes generalized. A knowledge base (KB) is a triple (T , H, A) with T a TBox, H a role hierarchy, and A an ABox. Let K = (T , H, A) be a KB and I = (ΔI , ·I ) an interpretation. An interpretation I satisfies K if it satisfies T , H, and A. If Γ is a TBox, ABox, or KB and I satisfies Γ, we say that I is a model of Γ and write I |= Γ. A knowledge base K is consistent if it has a model. Conjunctive Queries Let NV be a countably infinite set of variables disjoint from NC , NR , and NI . An atom is an expression A(v) (concept atom) or r(v, v ) (role atom), where A is a concept name, r is a role, and v, v ∈ NV . A conjunctive query q is a non-empty set of atoms. Intuitively, such a set represents the conjunction of its elements. We use Var(q) (Var(at)) to denote the set of variables occurring in the query q (atom at). Let I be an interpretation, q a conjunctive query, and π : Var(q) → ΔI a total function. We write • I |=π C(v) if (π(v)) ∈ C I ; • I |=π r(v, v ) if (π(v), π(v )) ∈ rI ; If I |=π at for all at ∈ q, we write I |=π q and call π a match for I and q. We say that I satisfies q and write I |= q if there is a match π for I and q. If I |= q for all models I of a KB K, we write K |= q and say that K entails q. The query entailment problem is defined as follows: given a knowledge base K and a query q, decide whether K |= q. It is well-known that query entailment and query answering can be mutually reduced and that decidability and complexity results carry over [Calvanese et al., 1998; Horrocks and Tessaris, 2000]. In the remainder of this paper, we concentrate on query entailment. For convenience, we assume that conjunctive queries are closed under inverses, i.e., if r(v, v ) ∈ q, then Inv(r)(v , v) ∈ q. If we add or remove atoms from a query, we silently assume that we do this such that the resulting query is again closed under inverses. We will also assume that queries are connected. Formally, a query q (closed under inverses) is connected if, for all v, v ∈ Var(q), there exists a sequence v0 , . . . , vn such that v0 = v, vn = v , and for all i < n, there exists a role r such that r(vi , vi+1 ) ∈ q. A collection q0 , . . . , qk of queries is a partitioning of q if q = q0 ∪ · · · ∪ qk , Var(qi ) ∩ Var(qj ) = ∅ for i < j ≤ k, and each qi is connected. The following lemma shows that connectedness can be assumed w.l.o.g. Lemma 1. Let K be a knowledge base, q a conjunctive query, and q0 , . . . , qn a partitioning of q. Then K |= q iff K |= qi for all i ≤ n. 3 Forests and Trees In this section, we carefully analyze the entailment of queries by knowledge bases and establish a set of general properties that will play a central role in our decision procedure. We start by showing that, in order to decide whether K |= q, it suffices to check whether I |= q for all models I of K that are of a particular shape. Intuitively, these models are shaped like a forest (in the graph-theoretic sense), modulo the fact that transitive roles have to be interpreted in transitive relations. Let IN∗ be the set of all (finite) words over the alphabet IN. A tree T is a non-empty, prefix-closed subset of IN∗ . For IJCAI-07 400 w, w ∈ T , we call w a successor of w if w = w · c for some c ∈ IN, where “·” denotes concatenation. We call w a neighbor of w if w is a successor of w or w is a successor of w . Definition 2. A forest base for K is an interpretation I that interpretes transitive roles in unrestricted (i.e., not necessarily transitive) relations and, additionally, satisfies the following conditions: T1 ΔI ⊆ Ind(A) × IN∗ such that, for all a ∈ Ind(A), the set {w | (a, w) ∈ ΔI } is a tree; T2 if ((a, w), (a , w )) ∈ rI , then either w = w = ε or a = a and w is a neighbor of w; T3 for all a ∈ Ind(A), aI = (b, ε) for some b ∈ Ind(A). A model I of K is canonical if there exists a forest base J for K such that J is identical to I except that, for all non-simple roles r, we have rI = rJ ∪ (sJ )+ . s∗ r, s∈Trans In this case, we say that J is a forest base for I. Observe that, in canonical models I, each individual a is mapped to a pair (b, ε), where a = b does not necessarily hold. We need this since we do not adopt the uniqe name assumption (UNA): if a, b ∈ NI with a = b, then we allow that aI = bI . If desired, the UNA can easily be adopted by adding an assertion a = b for each pair of individual names in Ind(A) to the ABox. Lemma 3. K |= q iff there exists a canonical model I of K such that I |= q. Proof. Using standard unravelling (see e.g. [Tobies, 2001]), each model of K can be converted into a canonical model of K. Moreover, if I |= K and I is the canonical model obtained by unravelling I, then it is not hard to show that J I |= q implies I |= q, for all conjunctive queries q. Lemma 3 shows that, when deciding whether K |= q, it suffices to check whether I |= q for all canonical models I of K. As a next step, we would like to show that, for canonical models I, to check whether I |= q, we can restrict our attention to a particular kind of match π for I and q. A match π for I and q is a forest match if, for all r(v, v ) ∈ q, we have one of the following: • π(v), π(v ) ∈ NI × {ε}; • π(v), π(v ) ∈ {a} × IN∗ for some a ∈ Ind(A). Alas, it is not sufficient to only consider forest matches for I and q. Instead, we show the following: we can rewrite q into a set of queries Q such that, for all canonical models I, we have that I |= q iff I |=π q for some q ∈ Q and forest match π. Intuitively, this complication is due to the presence of transitive roles. Definition 4. A query q is called a transitivity rewriting of q w.r.t. K if it is obtained from q by choosing atoms r0 (v0 , v0 ), . . . , rn (vn , vn ) ∈ q and roles s0 , . . . , sn ∈ TransH such that si ∗H ri for all i ≤ n, and then replacing ri (vi , vi ) with si (vi , ui ), si (ui , vi ) or si (vi , ui ), si (ui , ui ), si (ui , vi ) for all i ≤ n, where ui , ui ∈ Var(q). We use trK (q) to denote the set of all transitivity rewritings of q w.r.t. K. We assume that trK (q) contains no isomorphic queries, i.e., differences in (newly introduced) variable names are neglected. Together with Lemma 3, the following lemma shows that, to decide whether K |= q, it suffices to check the existence of some canonical model I, some forest match π, and some q ∈ trK (q) such that I |=π q . Lemma 5. Let I be a model of K. 1. If I is canonical and I |= q, then there is a q ∈ trK (q) such that I |=π q , with π a forest match. 2. If I |= q with q ∈ trK (q), then I |= q. Proof. (1) can be proved by using the match and the canonical structure of I to guide the rewriting process. (2) holds by definition of transitivity rewritings and the semantics. J The most important property of forest matches is the following: if I |=π q with π a forest match, then π splits the query q into several subqueries: the base subquery q0 contains all role atoms that are matched to root nodes: q0 := {r(v, v ) ∈ q | π(v), π(v ) ∈ NI × {ε}}; Moreover, for each (a, ε) ∈ NI × {ε} which occurs in the range of π, there is an object subquery qa : qa := {at | ∀v ∈ Var(at) : π(v) ∈ {a} × IN∗ } \ q0 . Clearly, q = q0 ∪ a qa . Although the resulting subqueries are not a partitioning of q in the sense of Section 2, one of the fundamental ideas behind our decision procedure is to treat the different subqueries more or less separately. The main benefit is that the object subqueries can be rewritten into treeshaped queries which can then be translated into concepts. This technique is also known as “rolling up” of tree conjunctive queries into concepts and was proposed in [Calvanese et al., 1998; Horrocks and Tessaris, 2000]. Formally, a query q is tree-shaped if there exists a bijection σ from Var(q) into a tree T such that r(v, v ) ∈ q implies that σ(v) is a neighbor of σ(v ) in T . Before we show how to rewrite the object subqueries into tree-shaped queries, let us substantiate our claim that tree-shaped queries can be rolled up into concepts. The result of rolling up is not a SHIQconcept, but a concept formulated in SHIQ , the extension of SHIQ with role intersection. More precisely, SHIQ is obtained from SHIQ by admitting the concept constructors ∃α.C, ∀α.C, α.C, and α.C, where α is a role conjunction r1 · · · rk with the ri (possibly inverse) roles. Let q be a tree-shaped query and σ a bijection from Var(q) into a tree T . Inductively assign to each v ∈ Var(q) a SHIQ -concept Cq (v): • if σ(v) is a leaf of T , then Cq (v) := A(v)∈q A IJCAI-07 401 • if σ(v) has successors σ(v1 ), . . . , σ(vn ) in T , then Cq (v) := A ∃ r .Cq (vi ). A(v)∈q 1≤i≤n r(v,vi )∈q Then the rolling up Cq of q is defined as Cq (vr ), where vr is such that σ(vr ) = ε. (Recall that σ is a bijection, hence, such a vr exists.) The following lemma shows the connection between the query and the rolled up concept. Lemma 6. Let q be a tree-shaped query and I an interpretation. Then I |= q iff CqI = ∅. We now show how to transform a query q into a set Q of treeshaped ones. To describe the exact goal of this translation, we need to introduce tree matches as a special case of forest matches: a match π for a canonical model I and q is a tree match if the range of π is a subset of {a} × IN∗ , for some a ∈ NI . Now, our tree transformation should be such that (∗) whenever I |=π q with I a canonical model and π a tree match, then I |=π q for some (treeshaped) query q ∈ Q and tree match π . Recall the splitting of a query into a base subquery and a set of object subqueries qa , induced by a forest match π. It is not hard to see that for each qa , the restriction of π to Var(qa ) is a tree match for I and qa . Thus, object subqueries together with their inducing matches π satisfy the precondition of (∗). The rewriting of a query into a tree-shaped one is a three stage process. In the first stage, we derive a collapsing q0 of the original query q by (possibly) identifying variables in q. This allows us, e.g., to transform atoms r(v, u), r(v, u ), r(u, w), r(u , w) into a tree shape by identifying u and u . In the second stage, we derive an extension q1 of q0 by (possibly) introducing new variables and role atoms that make certain existing role atoms r(v, v ) redundant, where r is non-simple. In the third stage, we derive a reduct q of q1 by (possibly) removing redundant role atoms, i.e., atoms r(v, v ) such that there exist atoms s(v0 , v1 ), . . . , s(vn−1 , vn ) ∈ q with v0 = v, vn = v , s ∗ r, and s ∈ Trans. Combining the extension and reduct steps allows us, e.g., to transform a non-tree-shaped “loop” r(v, v) into a tree shape by adding a new variable v and edges s(v, v ), s(v , v) such that s ∗ r and s ∈ Trans, and then removing the redundant atom r(v, v). In what follows, the size |q| of a query q is defined as the number of atoms in q. Definition 7. A collapsing of q is obtained by identifying variables in q. A query q is an extension of q w.r.t. K if: 1. q ⊆ q ; 2. A(v) ∈ q implies A(v) ∈ q; 3. r(v, v ) ∈ q \ q implies that r occurs in K; 4. |Var(q )| ≤ 4|q|; 5. |{r(v, v ) ∈ q | r(v, v ) ∈ / q}| ≤ 171|q|2 . A query q is a reduct of q w.r.t. K if: 1. q ⊆ q; 2. A(v) ∈ q implies A(v) ∈ q ; 3. if r(v, v ) ∈ q \ q , then there is a role s such that s ∗ r, s ∈ Trans, and there are v0 , . . . , vn such that v0 = v, vn = v , and s(vi , vi+1 ) ∈ q for all i < n. A query q is a tree transformation of q w.r.t. K if there exist queries q0 and q1 such that • q0 is a collapsing of q, • q1 is an extension of q0 w.r.t. K, and • q is a tree-shaped reduct of q1 . We use ttK (q) to denote the set of all tree transformations of q w.r.t. K. We remark that Condition 5 of query extensions is not strictly needed for correctness, but it ensures that our algorithm is only single exponential in the size of K. As in the case of trK (q), we assume that ttK (q) does not contain any isomorphic queries. The next lemma states the central properties of tree transformations. Before we can formulate it, we introduce two technical notions. Let q ∈ ttK (q), I |=π q, and I |=π q . Then π and π are called ε-compatible if the following holds: if v ∈ Var(q), v was identified with variable v ∈ Var(q ) during the collapsing step, and at least one of π(v), π (v ) is in NI × {ε}, then π(v) = π (v ). Furthermore, we call π an a-tree match if π(v) ∈ {a} × IN∗ for all v ∈ Var(q). Lemma 8. Let I be a model of K. 1. If I is canonical and π an a-tree match with I |=π q, then there is a q ∈ ttK (q) and an a-tree match π such that I |=π q and π, π are ε-compatible. 2. If q ∈ ttK (q) and I |=π q , then there is a match π with I |=π q and π, π are ε-compatible. Proof. The proof of (2) is straightforward, but the proof of (1) is quite complex. The basic idea behind the proof of (1) is that, given a canonical model I of K and a tree match π such that I |=π q, we can use π and I to guide the transformation process. The bounds of 4|q| and 171|q|2 in Conditions 4 and 5 of extensions are derived by computing the maximum number of variables and atoms that might be introduced in the guided transformation process. J Intuitively, using a-tree matches and ε-compatibility in Lemma 8 ensures that, if we are given a match for the base subquery and a tree match for a tree transformation of each object subquery, then we can construct a forest match of the original query. 4 The Decision Procedure Let K be a knowledge base and q a conjunctive query such that we want to decide whether K |= q. A counter model for K and q is a model I of K such that I |= q. In the following, we show how to convert K and q into a sequence of knowledge bases K1 , . . . , K such that (i) every counter model for K and q is a model of some Ki , (ii) every canonical model of a Ki is a countermodel for K and q, and (iii) each consistent Ki has a canonical model. Thus, K |= q iff all the Ki are inconsistent. This gives rise to two decision procedures: a deterministic one in which we enumerate all Ki and which we IJCAI-07 402 use to derive an upper bound for combined complexity; and a non-determinstic one in which we guess a Ki and which yields a tight upper bound for data complexity. Since the knowledge bases Ki involve concepts obtained by rolling up tree-shaped queries, they are fomulated in SHIQ rather than in SHIQ. More precisely, each KB Ki is of the form (T ∪ Tq , H, A ∪ Ai ), where • (T , H, A) is a SHIQ knowledge base; • Tq is a set of GCIs C with C a SHIQ concept; • Ai is a generalized SHIQ -ABox1 such that Ind(Ai ) ⊆ Ind(A). In what follows, we call knowledge bases of this form extended knowledge bases. Using a standard unravelling argument, it is easy to establish Property (iii) from above, i.e., every consistent extended knowledge base K has a canonical model. The actual construction of the extended knowledge bases is based on the analysis in Section 3. To start with, Lemma 3 and 5 imply the following: to ensure that a canonical model of an extended KB is a counter model for K and q, it suffices to prevent forest matches of all queries q ∈ trK (q). In order to prevent such matches, we use the parts Tq and Ai of extended knowledge bases. More precisely, we distinguish between two kinds of forest matches: tree matches and true forest matches, i.e., forest matches that are not tree matches. Preventing tree matches of a q ∈ trK (q) in a canonical model is relatively simple: by Lemmas 8 and 6, it suffices to ensure that, for all q ∈ ttK (q ), the corresponding concept Cq does not have any instances. Therefore, Tq is defined as follows: Tq = { ¬Cq | q ∈ ttK (q ) for some q ∈ trK (q)}. It is easily seen that true forest matches π always involve at least one ABox individual (i.e., π(v) ∈ NI × {ε} for at least one variable v). In order to prevent true forest matches, we thus use an ABox Ai . Intuitively, we obtain the ABoxes A1 , . . . , A by considering all possible ways of adding assertions to K such that, for all queries q ∈ trK (q), all true forest matches are prevented. This gives rise to the knowledge bases K1 , . . . , K . As suggested in Section 3, we can prevent a true forest match π of q ∈ trK (q) by splitting π into a base subquery and a number of object subqueries and then making sure that either the base query fails to match (this involves only individual names from the ABox) or at least one of the object subqueries fails to have a tree match. In Section 3, however, we used a concrete forest match π to split a query into subqueries. Here, we do not have a concrete π available and must consider all possible ways in which a forest match can split a query. Let q ∈ trK (q). A splitting candidate for q is a partial function τ : Var(q ) → Ind(A) with non-empty domain. For a ∈ Ind(A), we use Reach(a, τ ) to denote the set of variables v ∈ Var(q ) for which there exists a sequence of variables v0 , . . . , vn , n ≥ 0, such that 1 Recall that an ABox is generalized if it admits assertions C(a) with C an arbitrary concept. • τ (v0 ) = a and vn = v; • for all i ≤ n, τ (vi ) = a or τ (vi ) is undefined; • for all i < n, there is a role r s.t. r(vi , vi+1 ) ∈ q . We call τ a split mapping for q if, for all a, b ∈ Ind(A), a = b implies Reach(a, τ ) ∩ Reach(b, τ ) = ∅. Intuitively, each split mapping τ for q represents the set of forest matches π of q such that π(v) = (τ (v), ε) for all v in the domain of τ . Each injective split mapping for q induces a splitting of q into a base query and object queries. Split mappings τ need not be injective, however, and thus the general picture is that they induce a splitting of the collapsing q of q obtained by identifying all variables v, v with τ (v) = τ (v ). This splitting is as follows: q0τ qaτ := := {r(v, v ) ∈ q | τ (v), τ (v ) is defined} {at ∈ q | Var(at) ⊆ Reach(a, τ )} \ q0τ for all a ∈ NI that are in the range of τ . Observe that the condition which distinguishes splitting candidates and split mappings ensures that a = b implies Var(qaτ ) ∩ Var(qbτ ) = ∅. This condition is satisfied by the splittings described in Section 3, and it is needed to independently roll up the subqueries qaτ into concepts. In the following, we use sub(q) to denote the set of subqueries of q, i.e., the set of non-empty subsets of q. Let Q := {q3 | ∃q1 , q2 : q1 ∈ trK (q), q2 ∈ sub(q1 ), q3 ∈ ttK (q2 )} and let cl(q) be the set of rolled up concepts Cq , for all q ∈ Q. A SHIQ ABox A is called a q-completion if it contains only assertions of the form • ¬C(a) for some C ∈ cl(q) and a ∈ Ind(A) and • ¬r(a, b) for a role name r occurring in Q and a, b ∈ Ind(A). Let τ be a split mapping for q ∈ trK (q) and A a qcompletion. We say that A spoils τ if one of the following holds: (a) there is an r(v, v ) ∈ q0τ such that ¬r(τ (v), τ (v )) ∈ A ; (b) there is an a in the range of τ such that ¬C(a) ∈ A , where C := Cq . τ q ∈ttK (qa ) Intuitively, (b) prevents tree matches of the object subqueries, c.f. Lemmas 8 and 6. Finally, a q-completion A is called a counter canidate for K and q if for all q ∈ trK (q) and all split mappings τ for q , A spoils τ . Let A1 , . . . , A be all counter candidates for K and q and K1 , . . . , K the associated extended KBs. Lemma 9. K |= q iff K1 , . . . , K are inconsistent. We now define the two decision procedures for query entailment in SHIQ: in the deterministic version, we generate all q-completions A of A and return “K |= q” if all generated A that are a counter candidate give rise to an inconsistent extended KB. Otherwise, we return “K |= q”. In the nondeterministic version, we guess a q-completion A of A, return “K |= q” if A is a counter candidate and the associated IJCAI-07 403 extended KB is consistent, and “K |= q” otherwise. To show that these algorithms are decision procedures, it remains to show that the consistency of extended knowledge bases is decidable. The following theorem can be proved by a reduction to the DL ALCQIb and using results from [Tobies, 2001]. In the following, the size |Γ| of an ABox, TBox, role hierarchy or (extended) knowledge base Γ is simply the number of symbols needed to write Γ (with numbers written in binary). Theorem 10. There is a polynomial p such that, given an extended knowledge base K = (T ∪ Tq , H, A ∪ A ) with |K | = k, |A ∪ A | = a, |T ∪ Tq ∪ H| = t, and where the maximum length of concepts in Tq and A is , we can decide consistency of K in p() • deterministic time in 2p(k)2 ; p(t) • non-deterministic time in p(a) · 22 . We now discuss the complexity of our algorithms. We start by establishing some bounds on the number and size of transitivity rewritings, tree transformations, etc. Lemma 11. Let |q| = n and |K| = m. Then there is a polynomial p such that (a) |trK (q)| ≤ 2p(n)·log p(m) ; (b) for all q ∈ trK (q), |q | ≤ 3n; (c) (d) (e) (f) for all q ∈ trK (q), |ttK (q )| ≤ 2p(n)·log p(m) ; for all q ∈ trK (q) and q ∈ ttK (q ), |q | ≤ p(n); |cl(q)| ≤ 2p(n)·log p(m) ; and for all C ∈ cl(q), |C| ≤ p(n). Let k = |cl(q)|. We first show that the deterministic version of our algorithm runs in time exponential in m and double exponential in n. This follows from Theorem 10 together with the following observations: 2 (i) The number of q-completions is bounded by 2k·m+k·m , which, by Lemma 11(e), is exponential in m and double exponential in n; (ii) Checking whether a q-completion is a counter candidate can be done in time exponential in n and polynomial in m; (iii) By Lemma 11, the cardinality of Tq and of each qcompletion is exponential in n and polynomial in m, and the maximum length of concepts in Tq and A is polynomial in n (and independent of m). Now for the non-deterministic version. Since we aim at an upper bound for data complexity, we only need to verify that the algorithm runs in time polynomial in the size of |A|, and can neglect the contribution of T , H, and q to time complexity. The desired result follows from Theorem 10 and Points (ii) and (iii) above. This bound is tight since conjunctive query entailment is already co-NP-hard regarding data complexity in the AL fragment of SHIQ [Donini et al., 1994]. Summing up, we thus have the following. Theorem 12. Conjunctive query entailment in SHIQ is data complete for co-NP, and can be decided in time exponential in the size of the knowledge base and double exponential in the size of the query. 5 Conclusions With the decision procedure presented for conjunctive query entailment (and therefore for query answering) in SHIQ, we close a long open problem. It will be part of future work to extend this procedure to SHOIN , which is the DL underlying OWL DL. We will also attempt to find more implementable algorithms for query answering in SHIQ. Carrying out query transformations in a more goal directed way will be crucial to achieving this. Acknowledgements. This work was supported by the EU funded IST-2005-7603 FET Project Thinking Ontologies (TONES). Birte Glimm was supported by an EPSRC studentship. References [Baader et al., 2003] F. Baader, D. Calvanese, D. McGuinness, D. Nardi, and P. F. Patel-Schneider, editors. The Description Logic Handbook: Theory, Implementation, and Applications. Cambridge University Press, 2003. [Calvanese et al., 1998] D. Calvanese, G. De Giacomo, and M. Lenzerini. On the decidability of query containment under constraints. In Proc. of PODS’98. ACM Press, 1998. [Donini et al., 1994] F. Donini, M. Lenzerini, D. Nardi, and A. Schaerf. Deduction in concept languages: From subsumption to instance checking. Journal of Logic and Computation, 4(4):423-452, 1994. [Glimm et al., 2006] B. Glimm, I. Horrocks, and U. Sattler. Conjunctive query answering for description logics with transitive roles. In Proc. of DL-06. CEUR, 2006. [Glimm et al., 2006] B. Glimm, I. Horrock, C. Lutz, and U. Sattler. Conjunctive query answering for the description logic SHIQ. LTCS-Report 06-01, Theoretical Computer Science, TU Dresden, 2006. See http://lat.inf.tudresden.de/research/reports.html. [Horrocks and Tessaris, 2000] I. Horrocks and S. Tessaris. A conjunctive query language for description logic aboxes. In Proc. of AAAI-00, 2000. [Horrocks et al., 2000] I. Horrocks, U. Sattler, and S. Tobies. Reasoning with Individuals for the Description Logic SHIQ. In Proc. of CADE-00, vol. 1831 of LNAI. Springer-Verlag, 2000. [Horrocks et al., 2003] I. Horrocks, P. F. Patel-Schneider, and F. van Harmelen. From SHIQ and RDF to OWL: The making of a web ontology language. Journ. of Web Semantics, 1(1), 2003. [Hustadt et al., 2005] U. Hustadt, B. Motik, and U. Sattler. Data complexity of reasoning in very expressive description logics. In Proc. of IJCAI-05, 2005. [Ortiz et al., 2006] M. M. Ortiz, D. Calvanese, and T. Eiter. Characterizing data complexity for conjunctive query answering in expressive description logics. In Proc. of AAAI06, 2006. [Tobies, 2001] S. Tobies. Complexity Results and Practical Algorithms for Logics in Knowledge Representation. PhD thesis, RWTH Aachen, 2001. IJCAI-07 404