Parsing permutations of first-order propositional logic formulas

Jeffrey Finkelstein
András Kornai

October 16, 2014

Copyright 2011, 2012, 2013, 2014 Jeffrey Finkelstein 〈jeffreyf@bu.edu〉. This document is licensed under the Creative Commons Attribution-ShareAlike 4.0 International License, which is available at https://creativecommons.org/licenses/by-sa/4.0/. The LaTeX markup that generated this document can be downloaded from its website at https://github.com/jfinkels/parseable. The markup is distributed under the same license.

In this work, we wish to determine the complexity of deciding whether the symbols of a given multiset, chosen from some finite set of symbols, can be arranged into a string that is derivable according to the rules of a given context-free grammar. Providing an upper bound for this problem is of interest to computational linguistics, one goal of which is to classify the complexity of parsing and understanding natural language.

We consider a simplified subset of first-order propositional logic consisting of only those formulas containing universal quantification, conjunction operations, and equality relations. A variable x_i will be represented as i repetitions of the symbol x (for example, x_3 becomes xxx).

Definition 0.1 ([5]). The context-free language L of restricted first-order propositional logic is defined by the following context-free grammar. The set of terminals is {∧, =, ∀, x, (, )}, the set of nonterminals is {S, V}, and the productions are as follows.

    S → (V = V)
    S → ∀ V S
    S → (S ∧ S)
    V → V x
    V → x

A string generated by this grammar is called a well-formed formula.

Definition 0.2. Consider a string in the language L and its associated parse tree. A variable x_i is bound if it is the descendant of a quantified formula in which that variable is quantified; a variable is free if it is not bound. The scope of a quantifier (with respect to its quantified variable) is the well-formed formula following it (that is, the formula S in the production S → ∀ V S).
A vacuous quantifier is one whose quantified variable does not appear as a free variable in its scope (though the variable may appear as a bound variable). A closed formula (or a sentence) is a formula in which all variables appearing in the formula are bound.

In each of the problems defined below, the alphabet T is the set of terminals in the grammar of L.

Definition 0.3 (Parseable).
Instance: a word φ over the alphabet T
Question: Is φ a well-formed formula? (In other words, is φ in L?)

Definition 0.4 (Closed).
Instance: a word φ over the alphabet T
Question: Is φ a closed well-formed formula?

Definition 0.5 (No Vacuous Quantifiers).
Instance: a word φ over the alphabet T
Question: Is φ a well-formed formula with no vacuous quantifiers?

Closed is sometimes called the “no free variables” problem, and No Vacuous Quantifiers is sometimes called the “no vacuous quantifiers” problem. Closed does not require that the formula have no vacuous quantifiers, and No Vacuous Quantifiers does not require that the formula be closed.

Let us define some complexity classes in order to help classify the complexity of these problems.

Definition 0.6.
• CFL is the set of all context-free languages, or equivalently the set of all languages accepted by nondeterministic pushdown automata.
• DCFL is the set of all languages accepted by deterministic pushdown automata.
• CSL is the set of all context-sensitive languages, or equivalently the set of all languages accepted by linear-bounded automata.
• GCSL is the set of all growing context-sensitive languages, that is, languages generated by a context-sensitive grammar in which, for each production, either the right side is strictly longer than the left side or the left side is the start symbol.
• MCSL is the set of all mildly context-sensitive languages (for a precise definition, see Appendix A).
• INDEXED is the set of all indexed languages, or equivalently the set of all languages accepted by one-way nondeterministic nested stack automata [1].
• AC^1 is the set of all languages accepted by non-uniform, polynomial-size, O(log n)-depth, unbounded fan-in circuits with AND, OR, and NOT gates.
• NC^2 is the set of all languages accepted by uniform, polynomial-size, O(log^2 n)-depth, fan-in 2 circuits with AND, OR, and NOT gates.
• P is the set of all languages accepted by a deterministic Turing machine running in polynomial time.
• NSPACE(n) is the set of all languages accepted by a nondeterministic Turing machine using at most n space, where n is the length of the input.
• NLINSPACE is the set of all languages accepted by a nondeterministic Turing machine using space at most linear in the length of the input.
• E is the set of all languages accepted by a deterministic Turing machine running in 2^O(n) time.
• 1NSA is the set of all languages accepted by one-way nondeterministic stack automata.

Here are some inclusions among these complexity classes. Some of these inclusions are strict.

Theorem 0.7.
1. CFL ⊆ AC^1 ⊆ NC^2 ⊆ P.
2. CFL ⊆ GCSL ⊆ CSL = NSPACE(n) ⊆ NLINSPACE ⊆ E.
3. CFL ⊆ 1NSA ⊆ MCSL ⊆ INDEXED ⊆ CSL [3].

We know the following facts about the complexity of the three languages of interest.
• Parseable ∈ CFL.
• Closed ∈ INDEXED \ CFL [5].
• It is conjectured in [5] that No Vacuous Quantifiers ∉ INDEXED, though the complexity of this problem still seems to be unknown [2].

Open problem 0.8. Is Closed a mildly context-sensitive language (more formally, is Closed ∈ MCSL \ CFL)? Could it be decided by, for example, an instance of the formalism described in [4]?

Open problem 0.9. Is No Vacuous Quantifiers ∉ INDEXED?

We will also consider permutation problems corresponding to Parseable, Closed, and No Vacuous Quantifiers. In the definitions below, if w is a word composed of symbols w_1 ⋯ w_n and σ is a permutation on n elements, then we define σ(w) = w_σ(1) ⋯ w_σ(n).

Definition 0.10 (Perm Parseable).
Instance: a word φ of length n over the alphabet T
Question: Is there a permutation σ on n elements such that σ(φ) is a well-formed formula? (In other words, is σ(φ) in Parseable?)

Definition 0.11 (Perm Closed).
Instance: a word φ of length n over the alphabet T
Question: Is there a permutation σ on n elements such that σ(φ) is a closed well-formed formula? (In other words, is σ(φ) in Closed?)

Definition 0.12 (Perm NVQ).
Instance: a word φ of length n over the alphabet T
Question: Is there a permutation σ on n elements such that σ(φ) is a well-formed formula with no vacuous quantifiers? (In other words, is σ(φ) in No Vacuous Quantifiers?)

With these problems, we wish to capture the situation in which a human has a jumble of linguistic elements and is able to synthesize a natural language sentence (preferably one which is semantically valid, but we will just consider syntax for now).

Todo 0.13. Explain the significance of the Perm Parseable, Perm Closed, and Perm NVQ problems with respect to natural language parsing and computational linguistics.

Open problem 0.14. Is either of Perm Closed or Perm NVQ in INDEXED? Is either in MCSL?

Generally, permutations of parseable languages are easier than the originals (if the number of occurrences of each terminal symbol meets some requirement, then we can rearrange the symbols to fit any appropriate parse tree), so it should be possible to show that Perm Closed ∈ INDEXED. To show this, one might start by showing that a one-way nondeterministic nested stack automaton accepts Perm Closed, or that a linear indexed grammar generates all words φ which have some permutation which is a closed well-formed formula.

Figure 1 is a fragment of a combinatorial, linear, non-erasing literal movement grammar for Perm Closed. ?? is a fragment of a combinatorial, linear, non-erasing literal movement grammar for Perm NVQ.

Todo 0.15. Definition of combinatorial, linear, non-erasing literal movement grammars.

Todo 0.16.
Determine the complexity of the literal movement grammars in Figure 1 and ??.

Figure 1: Literal movement grammar for Perm Closed. Q_i represents a quantification of variable x_i. After quantification, this grammar allows any number of instances of the variable x_i to occur in rule V_i. The rules for producing the other symbols of first-order propositional logic have been omitted, as represented by the vertical ellipsis. Note: this grammar can derive strings with vacuous quantifiers.

    S() →
    S(X) → Q_i(X)
    Q_i(X ∀ Y x_i Z) → V_i(X Y Z)
    Q_i(X x_i Y ∀ Z) → V_i(X Y Z)
    V_i(X x_i Y) → V_i(X Y)
    V_i(X) → S(X)
    ⋮

A Definition of “mildly context-sensitive”

The precise definition of the class of mildly context-sensitive languages comes from [4, Definition 1]. We refer you to that work for the precise definition of the “constant-growth property”, which states roughly that if one orders the strings in a language by increasing length, the lengths of two consecutive strings differ by at most a constant, so that string lengths grow in a linear way.

Definition A.1.
• A language L is mildly context-sensitive if it is in P and has the constant-growth property.
• A class of languages C is mildly context-sensitive if
  1. All languages in C are mildly context-sensitive languages.
  2. CFL ⊆ C.
  3. C can describe cross-serial dependencies, that is, there exists an n ≥ 2 such that for all k ≤ n, {w^k | w ∈ T*} ∈ C.
• A formalism ℱ (e.g. an abstract machine or a grammar) is mildly context-sensitive if {L | ∃F ∈ ℱ : L = L(F)} is a mildly context-sensitive class of languages.
• Denote the largest mildly context-sensitive class of languages by MCSL.

In [4], Kallmeyer states, “So far, it has not been possible to identify a grammar formalism that generates the largest possible mildly context-sensitive class of string languages.” But she proposes a formalism which is more general than all known examples of mildly context-sensitive formalisms, yet still mildly context-sensitive itself.

References

[1] Alfred V. Aho.
“Indexed Grammars—An extension of Context-Free Grammars”. In: Journal of the ACM 15.4 (1968), pp. 647–671.

[2] Christopher Potts. “No Vacuous Quantification Constraints in Syntax”. In: Proceedings of NELS. Vol. 32. 2002, pp. 451–470.

[3] J.E. Hopcroft and J.D. Ullman. Introduction to automata theory, languages, and computation. Addison-Wesley series in computer science. Addison-Wesley, 1979. isbn: 9780201029888. url: http://books.google.com/books?id=i%5C_BQAAAAMAAJ.

[4] Laura Kallmeyer. “On Mildly Context-Sensitive Non-Linear Rewriting”. In: Research on Language and Computation 8.4 (2010), pp. 341–363. issn: 1570-7075. doi: 10.1007/s11168-011-9081-6. url: http://dx.doi.org/10.1007/s11168-011-9081-6.

[5] William Marsh and Barbara H. Partee. “How Non-Context Free is Variable Binding?” In: The Formal Complexity of Natural Language. Ed. by Walter J. Savitch et al. Vol. 33. Studies in Linguistics and Philosophy. Springer Netherlands, 1987, pp. 369–386. isbn: 978-1-55608-047-0. doi: 10.1007/978-94-009-3401-6_16.
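
Supplementary sketch. To make Definitions 0.3 and 0.10 concrete, the following Python fragment gives a recursive-descent recognizer for the grammar of Definition 0.1 and a symbol-counting test for Perm Parseable. The function names (`is_parseable`, `is_perm_parseable`, and the helpers) are hypothetical and chosen for this illustration only. The counting criterion is not stated in the text above; it is an assumption derived here from the productions: a formula with a atomic subformulas (V = V) has exactly a equality signs, a − 1 conjunctions, and 2a − 1 parentheses of each kind, and each of its 2a + q variables (where q is the number of quantifiers) needs at least one x.

```python
from collections import Counter

TERMINALS = "∧=∀x()"  # the alphabet T of Definition 0.1


def parse_var(w: str, i: int) -> int:
    """Consume a maximal run of x's (V → V x | x); return the end index or -1."""
    j = i
    while j < len(w) and w[j] == "x":
        j += 1
    return j if j > i else -1


def parse_formula(w: str, i: int = 0) -> int:
    """Consume one formula S starting at w[i]; return the end index or -1."""
    if i >= len(w):
        return -1
    if w[i] == "∀":  # S → ∀ V S
        j = parse_var(w, i + 1)
        return parse_formula(w, j) if j != -1 else -1
    if w[i] != "(":
        return -1
    if i + 1 < len(w) and w[i + 1] == "x":  # S → (V = V)
        j = parse_var(w, i + 1)
        if j == -1 or j >= len(w) or w[j] != "=":
            return -1
        j = parse_var(w, j + 1)
    else:  # S → (S ∧ S)
        j = parse_formula(w, i + 1)
        if j == -1 or j >= len(w) or w[j] != "∧":
            return -1
        j = parse_formula(w, j + 1)
    # Both parenthesized productions must end with a closing parenthesis.
    return j + 1 if j != -1 and j < len(w) and w[j] == ")" else -1


def is_parseable(w: str) -> bool:
    """Parseable: is w a well-formed formula?"""
    return parse_formula(w) == len(w)


def is_perm_parseable(w: str) -> bool:
    """Perm Parseable via counting (criterion derived here, not from the text)."""
    c = Counter(w)
    if any(ch not in TERMINALS for ch in c):
        return False
    a = c["="]  # number of atomic subformulas (V = V)
    return (
        a >= 1
        and c["∧"] == a - 1
        and c["("] == 2 * a - 1 == c[")"]
        and c["x"] >= 2 * a + c["∀"]
    )
```

For example, under this sketch the word `)x=x(` is not in Parseable but is in Perm Parseable, since its symbols can be rearranged into `(x=x)`. The counting test inspects only the multiset of symbols, which is one way to read the remark that permutation problems tend to be easier than the originals.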