Lexicalised Configuration Grammars*

Robert Grabowski, Marco Kuhlmann, and Mathias Möhl
Programming Systems Lab, Saarland University, Saarbrücken, Germany

* This paper is the extended version of an article that appears in the proceedings of the 2nd International Workshop on Constraint Solving and Language Processing, Barcelona, Spain, 2005.

Abstract

This paper introduces Lexicalised Configuration Grammars (lcgs), a new declarative framework for natural language syntax. lcg is powerful enough to encode a large number of existing grammar formalisms, facilitating their comparison from the perspective of graph configuration (Debusmann et al., 2005). Once a formalism has been encoded as an lcg, the framework offers various means to increase its expressivity in a controlled manner. Trading expressive power for computational complexity, this makes it possible to model syntactic phenomena in novel ways. Parsing algorithms for lcgs lend themselves to a combination of chart-based and constraint-based processing techniques, allowing both to bring in their strengths.

1 Introduction

Formal accounts of natural language syntax may differ in their understanding of grammar. In generative frameworks, grammars are systems of derivation rules; well-formed expressions correspond to successful derivations in these systems. In descriptive frameworks, grammars are complex constraints on syntactic structures; well-formed structures are those that satisfy a grammar. This paper presents Lexicalised Configuration Grammars (lcgs), a new descriptive framework for the syntactic analysis of natural language.

Structures and constraints

lcg does not replace existing grammar formalisms; it offers a formal landscape into which these formalisms can be embedded to study them and their relations from a different angle: as description languages for syntactic structures. To be expressed as an lcg, a grammar formalism needs to be characterised by two choices: (1) What structures does it describe? and (2) What constraints does it use to describe them? To illustrate this, we will show how context-free grammars (cfgs) fit into lcg. Following McCawley (1968), cfgs can be seen as description languages for ordered, labelled trees (Choice 1). More precisely, let G = (N, T, P, S) be a cfg with N and T being the alphabets of non-terminal and terminal symbols, respectively, P the set of productions, and S ∈ N the start symbol. A node u satisfies G if either (a) u is a leaf node labelled with a terminal symbol, or (b) u is an inner node with successors u1, …, uk (in that order), P contains a rule A → α1 ⋯ αk (where A ∈ N and αi ∈ N ∪ T), u is labelled with A, and each successor ui of u is labelled with αi; that is, the order of the successors of u is compatible with the order specified by the rule (Choice 2). An ordered, labelled tree satisfies G if its root node is labelled with the start symbol S, its frontier is the string s under consideration, and all of its nodes satisfy G.

Global and local constraints

The choice of a class of reference structures for an lcg grammar formalism (Choice 1) imposes a global constraint on the formalism's expressivity. For example, by committing itself to ordered, labelled trees, no grammar specified in the lcg version of cfg can possibly account for syntactic structures with discontinuous configurations, and no possible choice for the constraint language (Choice 2) can change that.
Similarly, in previous work (Bodirsky et al., 2005), we have identified a class of discontinuous structures that is 'just right' for a descriptive view on Lexicalised Tree Adjoining Grammar (ltag) (Joshi and Schabes, 1997). Adopting this class commits an lcg formalism to subsets of those syntactic structures describable by an ltag. The choice of the class of reference structures is the only non-lexical constraint expressible in lcg. This sets lcg formalisms apart from other formalisms employing constraints to restrict syntactic configurations, like the id/lp format of Generalised Phrase Structure Grammar (Gazdar et al., 1985) or Constraint Dependency Grammar (cdg) (Maruyama, 1990). Both of these formalisms allow for the statement of non-lexical constraints at the level of individual grammars (order constraints in id/lp grammars, all constraints in cdg). In contrast, global constraints in lcg can be imposed only by the choice of reference structures (Choice 1), which is a choice made at the level of the formalism. All remaining constraints are local: they apply to a word and the words in its immediate syntactic neighbourhood. In this sense, lcg is a lexicalised framework. The next section discusses the notion of locality employed in lcg and the role of lexical constraints in more detail.

Valencies and lexical constraints

Locality is modelled through the concept of valency. The valency of a word w specifies the possible types of w (accepted types) and the number and types of words that w must connect with to form a complete expression (required types). The concept of valency is universal among lexicalised formalisms; it is implemented by non-terminal symbols in lexicalised cfg, syntactic roles in dependency grammar, and slashed categories in categorial grammar. When we say that lexical constraints apply to words and their immediate syntactic neighbourhoods, we mean that constraints in the lexical entry for w are relations over the words permitted by the valency of w. These words can be referred to by the accepted and required types of w. We illustrate the idea behind lexical constraints by finalising our encoding of cfg as an lcg formalism. Assuming that we choose ordered, labelled trees as the reference class of structures (Choice 1), rules in a (lexicalised) cfg can be rewritten as lcg lexical entries using a single binary constraint relation ≺ to express linear precedence (Choice 2). For example, the rule A → B1 w B2 B3 (where A, Bi ∈ N and w ∈ T) corresponds to the lexical entry

    ⟨{A}, {B1, B2, B3} ; {B1 ≺ ?, ? ≺ B2, B2 ≺ B3}⟩ .

The first component of this entry specifies the types accepted by w, the second component specifies the required types; thus, in a tree satisfying this entry, the node labelled with w must have a predecessor of type A and successors of types B1, B2, and B3. The third component of the entry contains the lexical constraints on the valency; for the example entry, the node labelled with w (denoted by ? here) and its successors (referred to by their types) must be ordered as prescribed by the right-hand side of the context-free rule. Note that this semantics exactly corresponds to McCawley's conception of cfg.
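As a concrete illustration, the following sketch (ours, not part of the original formalism; all data structures and names are invented for exposition) renders the example rule as an lcg-style lexical entry in Python. The precedence constraints are stored as pairs of labels, with a distinguished label standing for the described node itself:

    # A minimal sketch of the entry for the rule A -> B1 w B2 B3.
    from dataclasses import dataclass

    SELF = "?"  # the special 'self' label denoting the described node

    @dataclass(frozen=True)
    class Entry:
        accepted: frozenset    # accepted types (labels on the incoming edge)
        required: tuple        # required types (labels on outgoing edges), a bag
        constraints: tuple     # precedence literals as (left, right) label pairs

    entry_w = Entry(
        accepted=frozenset({"A"}),
        required=("B1", "B2", "B3"),
        constraints=(("B1", SELF), (SELF, "B2"), ("B2", "B3")),
    )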
Increasing the expressivity

Given that the lcg framework is parametrised with respect to the choice of the class of reference structures and the choice of the lexical constraint languages, there are two obvious ways in which the expressivity of an lcg formalism can be increased:

• choose a more permissive class of structures (for example, the ltag structures mentioned above instead of the ordered, labelled trees employed for the encoding of cfg);

• choose other constraint languages (for example, languages with structural constraints other than precedence, like isolation or adjacency (Suhre, 1999), or languages allowing for non-structural constraints such as agreement).

It turns out that lcg facilitates a rather detailed analysis of the implications that these two changes have in terms of the generative capacity and the processing complexity of the resulting formalisms. One of the main reasons why one might want to experiment with expressivity alternations is that for most traditional grammar formalisms, there is a small number of 'killer phenomena' for which it seems necessary to locally extend the expressiveness of the formalism by just the right amount. In the case of English, for example, while most syntactic configurations disallow discontinuities, a few (such as in wh-movement) require them. It seems desirable to be able to express context-free and non-context-free phenomena in the same formalism, investing extra formal and computational resources only in places where they really are required. We claim that lcg is suitable for such endeavours. Another reason why we think that lcg is an interesting formal framework for modelling natural language is that it is able to handle linguistic phenomena that have proven to be particularly hard for other frameworks. As an example, we cite the permutation of nominal arguments in the German verb cluster known as scrambling. If we accept the linguistic analysis put forward by Becker et al. (1992), the question whether a formalism can model scrambling boils down to asking whether it can generate the indexed language

    SCR = { π(n[0], …, n[k]) v[0] ⋯ v[k] | k ≥ 0 } ,

where π is some permutation, and the bracketed indices match up verbs (vs) with their noun arguments (ns). It has been shown (Becker et al., 1992) that no formalism in the class of Linear Context-Free Rewriting Systems that produces a verb v[i] and the requirement for its matching noun argument n[i] in the same derivation step can generate SCR. (The class of Linear Context-Free Rewriting Systems includes, among other formalisms, Combinatory Categorial Grammar, ltag, and local Multi-Component tags.) In Section 3.3, we will present an lcg that does.

Previous work

Lexicalised Configuration Grammar elaborates on our previous work on graph configuration (Debusmann et al., 2005), in which we have shown how a broad range of problems within computational linguistics can be modelled as tasks in which a finite number of elementary graph structures (fragments) has to be assembled into one reference structure, obeying both global and local constraints. In lcg, the fragments are specified by the lexical entries.

Structure

We start our exposition by introducing labelled drawings as the universal class of reference structures for lcgs (Section 2). Section 3 presents the parametrised framework for constraint languages over drawings and gives some illustrative examples. In Section 4, we prove some limitative complexity results for lcg.
Section 5 then addresses the issue of parsing lcgs and shows how the standard polynomial complexities for parsing can be obtained by appropriate restrictions on the structures and constraint languages. The paper concludes with an outlook on future work in Section 6.

Acknowledgements

We thank Alexander Koller, Gert Smolka and the anonymous reviewers for useful comments on earlier versions of this paper. The work of Kuhlmann and Möhl is funded by the Collaborative Research Centre 378 of the Deutsche Forschungsgemeinschaft.

2 Labelled drawings

We introduce lcgs as description languages for (labelled) drawings (Bodirsky et al., 2005), a class of relational structures representing two essential syntactic dimensions: derivation structure and word order. Derivation structure captures the idea that a natural language expression can be composed of smaller expressions; word order concerns the possible linearisations of syntactic material. This section presents the basic terminology for drawings and cites some previous results.

2.1 Relational structures

A relational structure consists of a non-empty, finite set V of nodes and a number of relations on V. In this paper, we are mostly concerned with binary relations on the nodes. We use the standard terminology and notations available for binary relations. In particular, R+ refers to the transitive closure and R* to the reflexive-transitive closure of R. The notation Ru stands for the relational image of u under R: the set of all v such that (u, v) ∈ R. Since relational structures with binary relations can also be seen as multigraphs, all the standard graph terminology can be applied to them.

[Figure 1: Drawings in D0 (projective drawings; left), D1 − D0 (gap degree 1; middle) and D2 − D1 (gap degree 2; right). An integer at a node states that node's gap degree.]

Two types of relational structures are particularly important for the representation of syntactic configurations: trees and total orders. A relational structure (V ; ◁) is a forest iff ◁ is acyclic and every node in V has an indegree of at most one. Nodes with indegree zero are called roots. A tree is a forest with exactly one root. For a node v, we call the set ◁*v the yield of v. A total order is a relational structure (V ; ≺) in which ≺ is transitive and for all v1, v2 ∈ V, exactly one of the following three conditions holds: v1 ≺ v2, v1 = v2, or v2 ≺ v1. Given a total order, the interval between two nodes v1 and v2 is the set of all v such that v1 ⪯ v ⪯ v2. A set is convex iff it is an interval. The cover of a set V′, written C(V′), is the smallest interval containing V′. A gap in a set V′ is a maximal, non-empty interval in C(V′) − V′; the number of gaps in V′ is the gap degree of V′.

2.2 Drawings

Drawings are trees whose nodes are totally ordered.

Definition 2.1 A drawing is a relational structure (V ; ◁, ≺) in which (V ; ◁) forms a tree and (V ; ≺) forms a total order.

Note that drawings are not the same as ordered trees: in an ordered tree, only sibling nodes are ordered; in drawings, the order is total for all of the nodes. The notions of cover, gap and gap degree can be applied to nodes in a drawing by identifying a node v with its yield ◁*v; for example, the gap degree of a node v is the gap degree of ◁*v. The gap degree of a drawing is the maximum among the gap degrees of its nodes. We write Dg for the class of all drawings whose gap degree does not exceed g. The drawings in D0 are called projective.
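The gap degree is straightforward to compute. The following sketch (ours; the encoding of drawings as parent-to-children maps over integer positions is an assumption made for illustration, with the total order ≺ given by the natural order on positions) counts the holes in the cover of each yield:

    def yield_of(children, v):
        """All nodes reachable from v, including v itself (the set ◁*v)."""
        nodes, stack = set(), [v]
        while stack:
            u = stack.pop()
            if u not in nodes:
                nodes.add(u)
                stack.extend(children.get(u, []))
        return nodes

    def gap_degree(children, v):
        """Number of maximal non-empty 'holes' in the cover of the yield of v."""
        y = sorted(yield_of(children, v))
        return sum(1 for a, b in zip(y, y[1:]) if b - a > 1)

    # A small drawing on positions 0..3: the yield of node 0 is {0, 2},
    # which has one gap (position 1), so the drawing has gap degree 1.
    children = {1: [0, 3], 0: [2]}
    assert [gap_degree(children, v) for v in range(4)] == [1, 0, 0, 0]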
Fig. 1 shows three drawings of the same tree structure but with different gap degrees. The notion of gap degree yields a scale along which the non-projectivity of a drawing can be quantified. Orthogonal to that, there are linguistically relevant qualitative restrictions on non-projectivity. One of these is well-nestedness, which constrains the possible relations between gaps (Bodirsky et al., 2005).

Definition 2.2 Let D be a drawing. Two disjoint trees T1 and T2 in D interleave iff there are nodes l1, r1 ∈ T1 and l2, r2 ∈ T2 such that l1 ≺ l2 ≺ r1 ≺ r2. The drawing D is called well-nested iff it does not contain any interleaving trees.

We use the notation Dwn to refer to the class of all well-nested drawings. In Fig. 1, the first and the second drawing are well-nested; the third drawing contains two pairs of interleaving trees, rooted at b, e and c, e, respectively.

2.3 Labelled drawings

A labelled drawing is a drawing equipped with two total functions ℓV and ℓE: the function ℓV marks each node with a node label from an alphabet Σ, and ℓE marks each edge with an edge label from an alphabet Π. We write DΣ,Π for the class of labelled drawings obtained by decorating drawings from class D with node labels from Σ and edge labels from Π. In labelled drawings, labelled successor relations can be defined as follows:

    ◁π := { (u, v) | u ◁ v and ℓE(u, v) = π } .

To reduce the complexity of our presentation, we also define an accessibility relation ⇓π, which relates a node to the nodes reachable via an edge labelled with π:

    ⇓π := ◁π ∘ ◁* .

The projection of a labelled drawing D, written proj(D), is the string obtained by concatenating the node labels of the drawing in the order of their corresponding nodes. Thus, the projection of a drawing in DΣ,Π is a string over Σ.
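The interleaving test of Definition 2.2 can also be made concrete. The sketch below (ours, reusing yield_of from the previous sketch) checks whether two disjoint yields alternate in the total order; a drawing is well-nested iff no two disjoint yields do:

    from itertools import combinations

    def interleave(y1, y2):
        """Two disjoint node sets interleave iff, read left to right, their
        members alternate at least as l1 ≺ l2 ≺ r1 ≺ r2."""
        tagged = sorted([(x, 1) for x in y1] + [(x, 2) for x in y2])
        runs = 1 + sum(1 for (_, a), (_, b) in zip(tagged, tagged[1:]) if a != b)
        return runs >= 4

    def well_nested(children, nodes):
        yields = [yield_of(children, v) for v in nodes]
        return not any(y1.isdisjoint(y2) and interleave(y1, y2)
                       for y1, y2 in combinations(yields, 2))

    assert interleave({1, 3}, {2, 4})        # l1 ≺ l2 ≺ r1 ≺ r2: interleaving
    assert not interleave({1, 4}, {2, 3})    # properly nested: no interleaving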
3 Lexical constraint languages

The choice of a particular class of drawings imposes a global constraint on the syntactic structures allowed by an lcg formalism. In this section, we formalise the mechanism of lexical (local) constraints. As we illustrated in the introduction, the lexical entry for a given word w specifies the type of w and the types of the words connected to w, and imposes additional structural restrictions using constraints from a lexical constraint language. In our formal model, words will correspond to node labels, and types of nodes will correspond to edge labels. A lexical constraint between two types π1, π2 in the entry of a word ℓV(u) will be interpreted on the nodes reachable from u by the labelled successor relations named by π1 and π2.

3.1 Syntax and semantics

Syntax

The syntax of a lexical constraint language is defined relative to an alphabet R of relation symbols and an alphabet Π of edge labels. The alphabet R, together with a function ar that assigns every symbol R ∈ R a non-negative arity ar(R), forms the signature of the language. We will leave the arity function implicit, and use the letter R to refer to signatures.

Definition 3.1 Let R be a signature, and let Π be an alphabet of edge labels. A lexical constraint language with signature R over Π, written LR(Π), consists of literals of the form R(π1, …, πk), where R ∈ R, ar(R) = k, and πi ∈ Π. We write LR for the class of all lexical constraint languages with signature R.

Binary constraint literals will be written using infix notation, so the notation π1 R π2 will stand for R(π1, π2). Note that L∅, the class of languages with the empty signature, is the class of languages that contains no constraints.

Semantics

The satisfiability relation associated to a lexical constraint language LR(Π) is a ternary relation between a formula φ, a drawing D ∈ DΣ,Π, and a node u in that drawing. In this paper, we focus on satisfiability relations that meet the following requirement: the question whether a defining condition of a constraint applies must be decidable in time polynomial in the number of nodes in D. lcg as we define it here does not impose any further restrictions; it allows for the definition of arbitrary constraint languages for labelled drawings, as long as they meet the complexity condition. To simplify our presentation, we assume the existence of a special edge label ? called 'self', distinct from all other labels, and define ◁? := Id and ⇓? := Id. The self label allows for constraint definitions in which the successors and the node u itself are referenced in the same manner. It constitutes a notational trick: instead, one could define additional special constraints with references to u integrated into their definition.

3.2 Theories and grammars

Within lcg, we distinguish between theories and grammars. Formally, an lcg theory is a pair of a class of (unlabelled) drawings and a class of lexical constraint languages. An lcg theory corresponds to a 'grammar formalism' in the usual sense of the word. An lcg grammar adopts a theory and instantiates it by choosing concrete alphabets for the node and edge labels, and a lexicon. An lcg lexicon is a mapping from node labels to sets of lexical entries. The type of a lexical entry depends on the signature of its constraint language and the alphabet of edge labels that the lexical constraints may refer to.

Definition 3.2 A bag (multiset) over a base set S is a function in S → ℕ0 that assigns every element in S a natural number that tells how often the element appears in the bag. The set of all bags over S is written B(S). An option over S is a bag over S that assigns 0 to all elements in S, except possibly for exactly one element, which is assigned 1. An option is therefore equivalent to a set that contains either exactly one element of S or is empty. The set of all options over S is written O(S).

Definition 3.3 A lexical entry describes a node in a drawing. It is a triple ⟨I, Ω ; Φ⟩ ∈ LER(Π), where the option I ∈ O(Π) and the bag Ω ∈ B(Π) contain edge labels, and Φ ∈ P(LR(Π)) is a set of lexical constraints. A node u in D ∈ DΣ,Π satisfies a lexical entry ⟨I, Ω ; Φ⟩ ∈ LER(Π) iff for all π ∈ Π, |(◁π)⁻¹u| = I(π) and |◁π u| = Ω(π), and D, u ⊨ φ for all φ ∈ Φ.

A node u thus satisfies a lexical entry ⟨I, Ω ; Φ⟩ if and only if I contains exactly the incoming edge of u, Ω is exactly the bag of outgoing edges, and u satisfies all constraints in the set Φ.

Definition 3.4 Let T = (D, LR) be a theory. A grammar of type T is a triple GT = (Σ, Π, Lex) such that Σ is an alphabet of node labels, Π is an alphabet of edge labels, and Lex is a lexicon of type Σ → P(LER(Π)).

Definition 3.5 A node u ∈ D ∈ DΣ,Π satisfies a lexicon Lex ∈ Σ → P(LER(Π)) iff there is a lexical entry ⟨I, Ω ; Φ⟩ ∈ Lex(ℓV(u)) such that u satisfies ⟨I, Ω ; Φ⟩. A drawing D satisfies a grammar G, written D ⊨ G, iff every node u ∈ D satisfies the lexicon of G.
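Definition 3.3 amounts to a simple check. In the following sketch (ours; the edge-set encoding and all names are assumptions made for illustration, and constraints are passed as arbitrary polynomial-time predicates), a node satisfies an entry iff its incoming and outgoing edge labels match I and Ω exactly and all constraints hold:

    from collections import Counter

    def satisfies(edges, u, entry, drawing=None):
        """edges: set of (parent, child, label) triples; entry = (I, Omega, Phi),
        where I and Omega are label tuples (bags) and Phi is a tuple of
        predicates phi(drawing, u)."""
        I, Omega, Phi = entry
        incoming = Counter(lab for (p, c, lab) in edges if c == u)
        outgoing = Counter(lab for (p, c, lab) in edges if p == u)
        return (incoming == Counter(I)
                and outgoing == Counter(Omega)
                and all(phi(drawing, u) for phi in Phi))

    # The root of the partial drawing in Fig. 2, with no constraints:
    edges = {("w", "a", "B1"), ("w", "b", "B2"), ("w", "c", "B3")}
    assert satisfies(edges, "w", ((), ("B1", "B2", "B3"), ()))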
3.3 Sample languages

To provide an intuition for the formal concepts defined in the previous two sections, we will now translate three grammar formalisms into lcg theories. We start by adapting our previous encoding of lcfg to the new formal concepts.

Lexicalised Context-Free Grammars

As mentioned in the introduction, lexicalised context-free rules like A → B1 w B2 B3 can be seen as local well-formedness conditions on node-labelled, ordered trees (see Fig. 2).

[Figure 2: Encoding Lexicalised Context-Free Grammars: the local tree for the rule A → B1 w B2 B3, a partial drawing satisfying the corresponding entry, and the entry itself:

    w : ⟨{A}, {B1, B2, B3} ; {B1 ≺ ?, ? ≺ B2, B2 ≺ B3}⟩ ]

To express these conditions in the formal framework defined above, we first need to choose a class of drawings suitable as models for lcfgs. Since the yields of each non-terminal are continuous, a proper choice is D0, the class of projective drawings. Second, we need to choose a signature for the lexical constraint language that we want to use. As we already mentioned in the introduction, the only structural constraint relevant to lcfgs is linear order. Therefore, it suffices to have a single literal ≺ that imposes an order on the immediate successors of a node; since the language is interpreted on projective drawings, this order induces an order on the subtrees:

    D, u ⊨ π1 ≺ π2 iff ◁π1 u × ◁π2 u ⊆ ≺ .

Fig. 2 shows a node-labelled tree, the corresponding lexical entry for the word w, and a (partial) drawing satisfying the entry. Note that (instances of) non-terminals in the lcfg rule correspond to edge labels in lcg. If A were a start symbol of the underlying grammar, the first component of the corresponding lcg entry would have to be the empty set; such entries can only be satisfied at root nodes.

Lexicalised Unordered Context-Free Grammar

Since nothing forces us to impose order constraints on all types, we can write grammars corresponding to lcfgs with arbitrary permutations of the right-hand sides of the rules. If we abandon the order constraints completely, we get the theory (D0, L∅), which is equivalent to the class of (lexicalised) unordered context-free grammars.

The scrambling language

The following grammar derives drawings whose projections form the scrambling language presented in the introduction. The underlying theory uses the unrestricted class of drawings and a constraint language with two literals, ≺ (linear precedence) and 1 (adjacency), whose semantics are specified in Fig. 3.

[Figure 3: Lexical constraint language and a sample drawing for SCR. The semantics of the two literals are:

    D, u ⊨ π1 ≺ π2 iff ◁π1 u × ◁π2 u ⊆ ≺
    D, u ⊨ π1 1 π2 iff (◁π1 ∪ ◁π2)u is convex ]

The grammar is GSCR := ({n, v}, {n, v}, Lex), where the lexicon Lex contains the entry ⟨{n}, ∅ ; ∅⟩ for n and the following entries for v:

    ⟨∅, {n, v} ; {n ≺ ?, ? ≺ v, ? 1 v}⟩ ,
    ⟨{v}, {n, v} ; {n ≺ ?, ? ≺ v, ? 1 v}⟩ , and
    ⟨{v}, {n} ; {n ≺ ?}⟩ .

The precedence constraints place each v in between its n-successor and its v-successor. The adjacency constraint prevents material from entering between a v and its v-successor. Therefore, all nodes labelled with n must be placed to the left of all nodes labelled with v, and while the vs are ordered, the ns can appear in any permutation. (Fig. 3 shows a sample drawing licensed by GSCR.)
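To see the constraints at work, the following sketch (ours; the position-based encoding is an assumption, both literals are interpreted on immediate successors as in Fig. 3, and ? stands for the node itself) checks the verb entries of GSCR against a drawing for the scrambled string n[1] n[0] v[0] v[1]:

    def succ(edges, u, label):
        """Immediate successors of u reached via edges labelled 'label'."""
        return {c for (p, c, lab) in edges if p == u and lab == label}

    def precedes(edges, u, l1, l2):
        """The literal l1 ≺ l2 at u ('?' denotes u itself)."""
        a = {u} if l1 == "?" else succ(edges, u, l1)
        b = {u} if l2 == "?" else succ(edges, u, l2)
        return all(x < y for x in a for y in b)

    def adjacent(edges, u, l1, l2):
        """The literal l1 1 l2 at u: the referenced nodes form a convex set."""
        s = ({u} if l1 == "?" else succ(edges, u, l1)) \
            | ({u} if l2 == "?" else succ(edges, u, l2))
        return max(s) - min(s) == len(s) - 1

    # Positions 0..3 spell n[1] n[0] v[0] v[1]; the root verb v[0] sits at 2.
    edges = {(2, 1, "n"), (2, 3, "v"), (3, 0, "n")}
    assert precedes(edges, 2, "n", "?") and precedes(edges, 2, "?", "v")
    assert adjacent(edges, 2, "?", "v")      # nothing between v[0] and v[1]
    assert precedes(edges, 3, "n", "?")      # v[1] follows its noun argument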
Linear Specification Language

Suhre's lsl formalism (Suhre, 1999) allows for the generation of languages with a free word order. It is inspired by id/lp parsing (Gazdar et al., 1985), but allows for local constraints only, which makes it more suitable for translation into lcg. The yields in lsl are generally discontinuous; therefore, a theory for lsl needs to adopt the class of unrestricted drawings as its models. To restrict the possible linearisations, each lsl grammar rule can be annotated with local precedence and 'isolation' (zero-gap) constraints. These constraints can be translated into constraints from the lexical constraint language LLSL shown in Fig. 4.

[Figure 4: Suhre's Linear Specification Language:

    D, u ⊨ π1 < π2 iff ⇓π1 u × ⇓π2 u ⊆ ≺
    D, u ⊨ π1 ≪ π2 iff ⇓π1 u × ⇓π2 u ⊆ ≺ and C(⇓π1 u) ∪ C(⇓π2 u) is convex
    D, u ⊨ ⟨π⟩ iff ⇓π u is convex
    D, u ⊨ ⟨•⟩ iff ◁*u is convex

The last clause corresponds to an isolation constraint applied to the left-hand side of an lsl rule.]

4 Limitative complexity results

The previous section has demonstrated that the lcg framework is rather expressive. This expressive power does not come without a price. It is clear that all recognition problems for lcg are in np: we can simply guess a labelled drawing and check the lexical constraints in polynomial time. The main result of the present section is the proof that the universal recognition problem for the most general lcg theory is np-complete.

4.1 The universal recognition problem

Definition 4.1 Let G = (Σ, Π, Lex) be a grammar for the theory (D, LR), and let s be a string over Σ. The universal recognition problem for G and s, written (G, s), is the problem of deciding whether the following set is non-empty:

    C(G, s) := { D ∈ DΣ,Π | D ⊨ G and proj(D) = s } .

Elements of this set are called configurations of (G, s).

Lemma 4.2 The universal recognition problem for (D, L∅) is np-hard.

Proof We present a polynomial reduction of Hamilton Path to the universal recognition problem for (D, L∅). More specifically, for each input graph H = (V ; E) to Hamilton Path, we construct (in time linear in the size of the input graph) a grammar GH and a string sH such that C(GH, sH) is non-empty iff H has a Hamilton Path. Let sH be some string over V, and define

    ΣH, ΠH := V
    S(v) := { ⟨∅, {v′} ; ∅⟩ | v → v′ ∈ H }
    I(v) := { ⟨{v}, {v′} ; ∅⟩ | v → v′ ∈ H }
    E(v) := { ⟨{v}, ∅ ; ∅⟩ }
    LexH := { v ↦ S(v) ∪ I(v) ∪ E(v) | v ∈ V }
    GH := (ΣH, ΠH, LexH) .

Each Hamilton Path in H forms a linear tree on V. Each such tree can be configured using GH by choosing, for each node v in H, an entry from either S(v), E(v), or I(v), depending on the position of v in the Hamilton Path (start, end, or inner, respectively). Conversely, in each configuration of (GH, sH), each node has at most one predecessor and at most one successor qua lexicon. Therefore, each such configuration is a drawing whose successor relation forms a linear tree, and the path from the root to the leaf identifies a Hamilton Path in H.

To illustrate the encoding used in the proof, Fig. 5 shows an example input graph H and a corresponding configuration. The depicted drawing satisfies the following lexicon LexH:

    1 ↦ { ⟨∅, {3} ; ∅⟩, ⟨∅, {4} ; ∅⟩, ⟨{1}, {3} ; ∅⟩, ⟨{1}, {4} ; ∅⟩, ⟨{1}, ∅ ; ∅⟩ }
    2 ↦ { ⟨∅, {1} ; ∅⟩, ⟨∅, {4} ; ∅⟩, ⟨{2}, {1} ; ∅⟩, ⟨{2}, {4} ; ∅⟩, ⟨{2}, ∅ ; ∅⟩ }
    3 ↦ { ⟨{3}, ∅ ; ∅⟩ }
    4 ↦ { ⟨∅, {3} ; ∅⟩, ⟨{4}, {3} ; ∅⟩, ⟨{4}, ∅ ; ∅⟩ }

[Figure 5: An input graph H for Hamilton Path and a drawing licensing LexH. The Hamilton Path in H is marked by solid edges.]
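The construction is easy to transcribe. The sketch below (ours; the graph representation and all names are assumptions) builds the lexicon of GH from an input graph and illustrates it on the graph of Fig. 5:

    def lexicon_for(graph):
        """Lexicon of G_H for a graph given as node -> set of successors.
        Entries are pairs (I, Omega); all constraint sets are empty."""
        lexicon = {}
        for v, succs in graph.items():
            start = {((), (w,)) for w in succs}    # S(v): the path starts at v
            inner = {((v,), (w,)) for w in succs}  # I(v): v is an inner node
            end = {((v,), ())}                     # E(v): the path ends at v
            lexicon[v] = start | inner | end
        return lexicon

    # The graph of Fig. 5; its only Hamilton Path is 2 -> 1 -> 4 -> 3.
    H = {1: {3, 4}, 2: {1, 4}, 3: set(), 4: {3}}
    lex = lexicon_for(H)
    assert lex[3] == {((3,), ())}   # node 3 can only ever end the path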
Well-nested drawings

Since linear drawings do not contain disjoint trees, all solutions to the configuration problem for Hamilton graphs are well-nested.

Corollary 4.3 The universal recognition problem for (Dwn, L∅) is np-hard.

Projective drawings

While all solutions to the configuration problem for Hamilton graphs are well-nested, there is no limit on their gap degree. One may wonder if restricting the gap degree reduces the complexity of the universal recognition problem. This, however, is not the case, not even for drawings without gaps:

Lemma 4.4 The universal recognition problem for (D0, L∅) is np-hard.

Proof As indicated in Section 3.3, each lexicalised unordered context-free grammar can be transformed into an equivalent (D0, L∅) grammar. Therefore, the universal recognition problem for lucfgs, which is np-hard (Barton, 1985), reduces to the universal recognition problem for (D0, L∅).

4.2 The fixed recognition problem

The fixed recognition problem asks the same question as the universal problem, but the grammar is not considered part of the input. This invalidates the reduction that we used in the previous section, as that reduction constructed a new grammar for every input, while any reduction for the fixed recognition problem needs to assume one fixed grammar for every input string.

Lemma 4.5 The fixed recognition problem for (D, L∅) is polynomial.

Proof We assume the existence of a chart parser that solves the fixed recognition problem for (D, L∅) and uses parse items to represent partial derivations. These parse items have the form s : ⟨I, Ω⟩, where s represents the span of the input sentence covered by the partial derivation, and I and Ω represent the incoming and outgoing valencies that still need to be saturated. (We will present such a general chart-based parsing schema for lcgs in Section 5.) The time complexity of such a parser is polynomial in the number of possible parse items; since we may ignore the size of the grammar for the fixed recognition problem, we are only interested in the number of different spans. The string languages of (D, L∅) are closed under permutation; therefore, each span can be represented by a Parikh vector (a mapping from node labels to natural numbers). As each σ ∈ Σ can occur between zero and n times, the number of different such Parikh vectors is (n + 1)^|Σ| ∈ O(n^|Σ|), which is polynomial in n. (An upper bound for |Σ| is the size of the grammar, which is considered constant for the fixed recognition problem.)
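The span abstraction in this proof is a two-liner. In the sketch below (ours), two spans that cover different positions but the same multiset of words collapse to the same Parikh vector, which is what keeps the item count polynomial for a fixed alphabet:

    from collections import Counter

    def parikh(words, span):
        """Abstract a span (a set of string positions) to its Parikh vector."""
        return Counter(words[i] for i in span)

    words = ["a", "b", "a", "b"]
    assert parikh(words, {0, 1}) == parikh(words, {2, 3})
    # Over an alphabet Σ, at most (n + 1) ** len(Σ) distinct vectors exist.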
It would seem desirable to have a framework in which extending the signature of the constraint language may only reduce the complexity of the recognition problem, but never increase it. For lcgs, however, this is not necessarily the case: in an unpublished manuscript, Holzer et al. show, by a reduction of Tripartite Matching, that for the Linear Specification Language even the fixed recognition problem is np-complete (p.c.); consequently, by the encoding of lsl presented in Section 3.3, the same result applies to lcgs.

5 Parsing Lexicalised Configuration Grammars

This section presents a general schema for chart-based approaches to parsing lcgs. Parsing schemata (Sikkel, 1997) provide us with a declarative specification of concrete parsing algorithms, and allow us to analyse the complexity of these algorithms on a high level of abstraction, hiding the algorithmic details. The complexity and even the completeness heavily depend on the class of drawings that the schema is applied to. Hence we get a detailed picture of how parsers can benefit from the global constraints that are implicit in a class of drawings, and up to what limits the class can be extended without losing efficiency.

5.1 A general parsing schema

Parsing schemata (Sikkel, 1997) view parsing algorithms as inference systems. The general parsing schema for lcg derives parse items representing partial drawings licensed by a given grammar and sentence. These parse items have the form s : ⟨I, Ω⟩, where s is a span (a non-empty subset of the words in the sentence) and I and Ω are bags of edge labels. Each parse item represents the information that the grammar licenses a partial drawing covering the words of the input sentence specified by s; for this drawing to be complete, one still needs to connect its root nodes using incoming edges labelled with the labels in I and outgoing edges labelled with the labels in Ω. A parse item in which Ω is empty is fully saturated. An item s : ⟨∅, ∅⟩ in which s contains all the words in the sentence is complete.

The lookup rule

The parsing schema contains three rules, called lookup, group and plug. The lookup rule creates a new parse item with a singleton span for a word wi in the input sentence:

    ⟨I, Ω ; Φ⟩ ∈ Lex(wi)
    --------------------  (lookup)
        {i} : ⟨I, Ω⟩

The combination rules

The group and plug rules derive new parse items from existing ones. The first rule, group, combines two fully saturated items into a new fully saturated item. The plug rule saturates a bag of valencies in a parse item by combining it with another item accepting these valencies on incoming edges pointing to its root nodes:

    s1 : ⟨I1, ∅⟩    s2 : ⟨I2, ∅⟩
    -----------------------------  (group)
       s1 ⊕ s2 : ⟨I1 ∪ I2, ∅⟩

    s1 : ⟨I1, Ω ⊎ I2⟩    s2 : ⟨I2, ∅⟩
    ----------------------------------  (plug)
           s1 ⊕ s2 : ⟨I1, Ω⟩

The span of a parse item in the conclusion of the group or plug rule (s1 ⊕ s2) is the union of the spans in the premises (s1, s2). The ⊕ relation is a subset of the disjoint union relation. On which pairs of spans it is defined depends on the class of drawings that the schema is applied to; for D1, for example, it would only be defined on pairs of spans whose union has at most one gap. We regard these restrictions as implicit side conditions of the group and plug rules.

Chart-based parsing

A concrete parsing algorithm using the general schema would test whether the inferential closure of the three rules contains a complete item. Computing the inferential closure can be done efficiently by using a chart, indexed by the spans, to record parse items already derived, and by choosing a control strategy that guarantees that no two items are combined twice. Alternatively, a grammar could be translated into a definite-clause grammar (dcg): each instance of the lookup rule, as well as the group and the plug rule, can be represented by dcg rules. A dcg parser implemented as proposed in (Shieber et al., 1995) will perform the same operations as the chart parser sketched above.
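The fixpoint computation behind this schema can be sketched in a few lines. The following naive implementation (ours; it makes no claim to match the authors' parser, ignores the lexical constraints Φ and the ⊕ side conditions, and treats spans as unrestricted position sets) closes an item set under group and plug and looks for a complete item:

    from collections import Counter

    def bag(labels):
        return tuple(sorted(labels))

    def parse(sentence, lexicon):
        """lexicon: word -> set of entries (I, Omega); items are
        (span, I, Omega) with the span a frozenset of positions."""
        items = set()
        for i, w in enumerate(sentence):                     # lookup
            for I, Omega in lexicon[w]:
                items.add((frozenset({i}), bag(I), bag(Omega)))
        changed = True
        while changed:                                       # inferential closure
            changed = False
            for s1, I1, O1 in list(items):
                for s2, I2, O2 in list(items):
                    if s1 & s2 or O2:    # spans disjoint; 2nd premise saturated
                        continue
                    i2, o1 = Counter(I2), Counter(O1)
                    if not O1:                               # group
                        new = (s1 | s2, bag(I1 + I2), ())
                    elif all(o1[x] >= k for x, k in i2.items()):  # plug
                        new = (s1 | s2, I1, bag((o1 - i2).elements()))
                    else:
                        continue
                    if new not in items:
                        items.add(new)
                        changed = True
        full = frozenset(range(len(sentence)))
        return (full, (), ()) in items                       # complete item?

    # An unordered-CFG-style toy grammar: w requires two B-successors.
    lexicon = {"w": {((), ("B", "B"))}, "b": {(("B",), ())}}
    assert parse(["b", "w", "b"], lexicon)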
5.2 Completeness

Before we look at the complexity of parsing lcgs in more detail, we first need to ensure that the presented parsing schema is sound and complete, i.e., that all the inferences are valid and that every drawing can be derived with them. While this is easy to show in the general case, chart-based parsing requires a crucial invariant on the parsing rules: all spans derived during parsing must have a uniform representation. More specifically, assume that each span in the premises of a combination rule has at most g gaps and thus can be represented using 2(g + 1) integer indices (denoting the start and end positions of the g + 1 intervals that the span consists of). Then the union of two spans must also have at most g gaps. Under this side condition, the general parsing schema is no longer complete: there are drawings whose gap degree is bounded by g that cannot be derived using parse items whose gap degree is bounded by g.

Completeness for well-nested drawings

We will now show that for well-nested drawings (cf. Section 2.2), the general parsing schema is complete even in the presence of the gap invariant. For the proof of this result, we need the concept of the gap forest of a well-nested drawing (Bodirsky et al., 2005).

Definition 5.1 Let (V ; ◁, ≺) be a well-nested drawing and let v ∈ V be a node with g gaps. The gap forest for v is defined as the ordered forest gf(v) = (S ; ⊐, <), where

    S := { {v}, G_v^1, …, G_v^g } ∪ { ◁*w | v ◁ w }
    ⊐ := { (s1, s2) ∈ S × S | C(s1) ⊃ s2 }
    < := { (s1, s2) ∈ S × S | s1 ≺ s2 } .

The elements of S are called spans. (The notation G_v^i refers to the ith gap in the yield of v.)

In a gap forest, sibling spans correspond to disjoint sets whose union has at most g gaps. Sibling spans belonging to the same convex region are called span groups.

Lemma 5.2 Let G be an lcg grammar and let D be a well-nested drawing on nodes V with gap degree at most g. Then D ⊨ G implies the existence of a derivation of a parse item V : ⟨I, ∅⟩ that only involves parse items whose gap degree is bounded by g.

Proof Let G be a grammar and let D be a well-nested drawing on V such that D ⊨ G. If V = {u}, then ⟨∅, ∅ ; Φ⟩ ∈ Lex(ℓV(u)). In this case, the parse item {u} : ⟨∅, ∅⟩ can be derived by one application of the lookup rule. Now assume that D consists of a root node u with children vi, 1 ≤ i ≤ k, where each child vi is the root of a drawing Di. Then ⟨∅, ⋃ 1≤i≤k Qi ; Φ⟩ ∈ Lex(ℓV(u)), where

    Qi = { πi | ⟨{πi}, Ωi ; Φi⟩ ∈ Lex(ℓV(vi)) } .

By induction, we may assume that each of the drawings Di was derived using parse items with gap degree at most g only; in particular, each complete drawing Di corresponds to such a parse item. The drawing D can then be derived using the two combination rules, successively combining the parse items for the drawings Di and the item for the root node u (obtainable by the lookup rule). The interesting part of the proof is to show that the combining operations can be linearised in such a way that the gap degree of the intermediate parse items is bounded by g. We now present such a linearisation, based on a post-order traversal of the gap forest for the node u: In a horizontal phase of the traversal, we combine all parse items corresponding to a span group from left to right, ignoring any gap nodes. There are at most g such nodes in the complete gap forest; therefore, this phase of the traversal maintains the gap invariant. In a vertical phase, we combine the parse items from the preceding horizontal phase with the item corresponding to the parent node in the gap forest, in order of their gap degree. Since the gap degree of the final item is bounded by g, this maintains the gap invariant.

[Figure 6: Complexity estimate for a left-to-right chart parsing strategy:

    Σ_{i=1..n} (i^k − (i−1)^k)^2 + (i^k − (i−1)^k)(i−1)^k
      = Σ_{i=1..n} i^{2k} − 2 i^k (i−1)^k + (i−1)^{2k} + i^k (i−1)^k − (i−1)^{2k}
      = Σ_{i=1..n} i^{2k} − i^k (i−1)^k
      ≤ Σ_{i=1..n} i^{2k} − (i−1)^{2k} = n^{2k}

The last equality holds because the sum in question is telescopic.]
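As a quick sanity check of the reconstructed estimate in Fig. 6 (the check itself is ours), the bound can be evaluated numerically for small n and k:

    def lhs(n, k):
        return sum((i**k - (i - 1)**k)**2 + (i**k - (i - 1)**k) * (i - 1)**k
                   for i in range(1, n + 1))

    assert all(lhs(n, k) <= n**(2 * k)
               for n in range(1, 30) for k in range(1, 6))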
5.3 Complexity analysis

We now determine the complexity bounds of an implementation of our schema.

Space complexity

To bound the number of parse items stored in the chart, we look at the number of possible values for the variables of a parse item s : ⟨I, Ω⟩. As both I and Ω may represent arbitrary multisets over the edge labels, the number of parse items may be exponential in the size of the grammar. In the case that the drawings under consideration are unrestricted (so that a span s can be an arbitrary set), the number of parse items is also exponential in the length of the input sentence. However, in cases where Lemma 5.2 applies, spans can be represented by k = 2(g + 1) integers (cf. Section 5.2). Thus, there will be at most O(n^k) different parse items in the chart.

Time complexity

Since the chart-based architecture guarantees that no two parse items are combined twice, the space complexity can be used to bound the time complexity. Of course, if the number of parse items is exponential, the runtime of any algorithm faithfully implementing the general parsing schema will be exponential as well. In what follows, we will ignore the size of the grammar and focus on well-nested drawings with bounded gap degree. How many possibilities of combinations are there for parse items? Counted over the runtime of the complete algorithm, every parse item needs to be combined with every other item, so the time needed for these combinations is O(n^k) · O(n^k) = O(n^{2k}). To see this more clearly, assume that the parser works from left to right. At each position i in the input string, the chart contains i^k different spans. For the combination rules, we need to combine the parse items that we have not yet seen among each other, and with all the items previously present in the chart. Since the number of items is monotonically increasing, the estimate in Fig. 6 holds.

A refined analysis

This O(n^{2k}) time estimate is still too pessimistic. To see this, notice that in both of the combination rules, some of the indices used to represent the spans occur only in the premises: since the two spans in the premises and the span in the conclusion can each be represented using k indices, 2k − k = k indices cannot 'make it' into the conclusion. As the union operation on spans does not 'forget' about any material, the values of k/2 of these indices are determined by other indices in the premises. Thus, a better upper bound for the time complexity of the algorithm is O(n^{2k − k/2}). Remembering that k = 2(g + 1), we get the following result:

Lemma 5.3 Let D be a class of well-nested drawings whose gap degree is bounded by g, and let LR be a lexical constraint language. Then the universal recognition problem for (D, LR) has complexity O(2^{|G|} · c · n^{3g+3}), where c is the (polynomial) time it takes to check lexical constraints, measured relative to |G| and n.

For context-free grammars (g = 0), this lemma gives the familiar O(n^3) parsing result; for tags (g = 1), we get a parser that takes time O(n^6). Note that both of these complexities ignore the size of the grammar and the complexity of the lexical constraints. For lcfgs, however, our parsing framework can be as efficient as, for example, the Earley parser:

Lemma 5.4 The universal recognition problem for totally ordered grammars of type (D0, L{≺}) has complexity O(|G|^2 n^3).

Proof By the previous lemma, we know that O(2^{|G|} c n^3) is an upper bound. The restriction that the valencies of each lexical entry are totally ordered implies that we can represent valencies as lists instead of bags. The precedence constraints can be expressed entirely as side conditions on the span variables; hence we can ignore the complexity of these constraints.
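The uniform span representation underlying these bounds is easy to operationalise. In the sketch below (ours), a span with at most g gaps is a tuple of at most g + 1 disjoint intervals, i.e. k = 2(g + 1) integers, and the ⊕ operation rejects unions that would exceed the gap bound:

    def plus(s1, s2, g):
        """Union of two disjoint interval-tuple spans, or None if the result
        would have more than g gaps and hence need more than k indices."""
        intervals = sorted(s1 + s2)
        merged = [list(intervals[0])]
        for a, b in intervals[1:]:
            if a <= merged[-1][1] + 1:      # adjacent: extend the last interval
                merged[-1][1] = max(merged[-1][1], b)
            else:                           # a new interval, i.e. a new gap
                merged.append([a, b])
        if len(merged) > g + 1:
            return None
        return tuple((a, b) for a, b in merged)

    assert plus(((0, 1),), ((2, 4),), g=0) == ((0, 4),)
    assert plus(((0, 1),), ((3, 4),), g=0) is None      # would create a gap
    assert plus(((0, 1),), ((3, 4),), g=1) == ((0, 1), (3, 4))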
5.4 The size of the grammar and the complexity of the constraints

The previous section offered insights into how far the model class used by a certain grammar formalism influences the completeness of the schema and the complexity with respect to the length of the input sentence. To develop an efficient parser of practical relevance based on our parsing schema, two crucial points are the parsing complexity with respect to the size of the grammar and the complexity of the lexical constraints. Grammar size is an often neglected factor in the performance of parsing algorithms: a standard sentence of, say, 25 words is usually several orders of magnitude shorter than a lexicalised grammar. While grammar size thus is significant even for frameworks in which the grammar only contributes linearly or quadratically to the speed of the parsing algorithm (such as context-free grammar), it is definitely an issue in a framework like lcg, where for reasons of expressive power it cannot in general be avoided. It seems, then, that it is desirable to complement the chart-based parsing architecture by methods that avoid the worst-case complexity in the size of the grammar whenever possible. This is where we propose to use constraint propagation: lexical constraints can be used to control the chart-based parser. To give a very simple example: in the presence of order constraints, far from all of the possible combinations of parse items need to be considered when applying the plug rule. If an item i has open valencies π1 and π2 with a constraint π1 ≺ π2, there is no need to try to plug π2 with an item adjacent to i, since any item plugging π1 precedes any item plugging π2 in all licensing drawings. How exactly the interaction between constraint propagation and chart parsing is realised, and how much a parser can benefit from each single constraint, are open questions that we are currently addressing.

6 Conclusion

This paper presented Lexicalised Configuration Grammars (lcgs), a novel framework for the descriptive analysis of natural language. lcg is parametrised by the choice of a class of reference structures (a global constraint), and the choice of a lexical (i.e., local) constraint language used to describe those structures that should be considered grammatical. Translating grammar formalisms into lcg makes it possible to study these formalisms and their relations from a new perspective, and to experiment with gradual and local alternations of their expressivity and processing complexity. lcgs are expressive enough to generate the scrambling language, a language that cannot be generated by many traditional generative frameworks. The universal recognition problem for lcg is np-complete; however, a broad class of linguistically relevant lcgs can be parsed in polynomial time.

Future work

We plan to continue our research by investigating the potential of the processing framework outlined in Section 5 to combine chart-based and constraint-based processing techniques. Our immediate goal is the implementation of a parser for lcgs that uses constraint propagation to avoid the worst-case complexity of the chart-based parsing algorithm with respect to the size of the grammar. One of the major technical challenges in this is the constraint-based treatment of lexical ambiguity: handling disjunctive information is notoriously difficult for constraint propagation. In a second line of work, we will try to relate lcgs to more and more traditional grammar formalisms by defining appropriate lcg theories and grammars and proving the necessary equivalence results.

References

G. Edward Barton. 1985. On the complexity of ID/LP parsing. Computational Linguistics, 11(4):205–218.

Tilman Becker, Owen Rambow, and Michael Niv. 1992. The derivational generative power, or, scrambling is beyond LCFRS. Technical Report IRCS-92-38, University of Pennsylvania.
Manuel Bodirsky, Marco Kuhlmann, and Mathias Möhl. 2005. Well-nested drawings as models of syntactic structure. In 10th Conference on Formal Grammar and 9th Meeting on Mathematics of Language, Edinburgh, Scotland, UK.

Ralph Debusmann, Denys Duchier, and Marco Kuhlmann. 2005. Multi-dimensional graph configuration for natural language processing. In Constraint Solving and Language Processing, volume 3438 of Lecture Notes in Computer Science, pages 104–120. Springer.

Gerald Gazdar, Ewan Klein, Geoffrey K. Pullum, and Ivan A. Sag. 1985. Generalized Phrase Structure Grammar. Harvard University Press, Cambridge, MA.

Aravind Joshi and Yves Schabes. 1997. Tree Adjoining Grammars. In Handbook of Formal Languages, volume 3, pages 69–123. Springer.

Hiroshi Maruyama. 1990. Structural disambiguation with constraint propagation. In 28th Annual Meeting of the Association for Computational Linguistics (ACL 1990), pages 31–38, Pittsburgh, Pennsylvania, USA.

James D. McCawley. 1968. Concerning the base component of a transformational grammar. Foundations of Language, 4(3):243–269.

Stuart M. Shieber, Yves Schabes, and Fernando C. N. Pereira. 1995. Principles and implementation of deductive parsing. Journal of Logic Programming, 24(1–2):3–36.

Klaas Sikkel. 1997. Parsing Schemata: A Framework for Specification and Analysis of Parsing Algorithms. Springer.

Oliver Suhre. 1999. Computational aspects of a grammar formalism for languages with freer word order. Diploma thesis, Universität Tübingen.