Parsing permutations of first-order propositional logic formulas
Jeffrey Finkelstein
András Kornai

October 16, 2014
In this work, we wish to determine the complexity of deciding whether some
arrangement of a given multiset of symbols, chosen from some finite set of
symbols, can be derived according to the rules of a given context-free grammar.
Providing an upper bound for this problem is of interest to computational
linguistics, one goal of which is to classify the complexity of parsing and
understanding natural language.
We consider a simplified subset of first-order propositional logic consisting of
only those formulas containing universal quantification, conjunction operations,
and equality relations. Variables x_i will be represented as i repetitions of the
symbol x (for example, x_3 becomes xxx).
Definition 0.1 ([5]). The context-free language L of restricted first-order propositional logic is defined by the following context-free grammar. The set of
terminals is {∧, =, ∀, x, (, )}, the set of nonterminals is {S, V}, the start
symbol is S, and the productions are as follows.
S → (V = V)
S → (S ∧ S)
S → ∀ V S
V → V x
V → x
A string generated by this grammar is called a well-formed formula.
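The grammar is simple enough to recognize with a recursive-descent parser. The following Python sketch is illustrative only and is not taken from [5]; the names ParseError, parse, and parseable, and the tuple encoding of parse trees, are hypothetical. It assumes that a word is given as a Python string over T with no spaces.

```python
class ParseError(Exception):
    """Raised when a word is not derivable from S."""


def parse(word: str):
    """Return the parse tree of word, or raise ParseError if word is not in L.

    Hypothetical tree encoding:
      ("eq", i, j)      for (x_i = x_j)
      ("and", s1, s2)   for (s1 ∧ s2)
      ("forall", i, s)  for ∀ x_i s
    """
    pos = 0

    def expect(symbol: str) -> None:
        nonlocal pos
        if pos >= len(word) or word[pos] != symbol:
            raise ParseError(f"expected {symbol!r} at position {pos}")
        pos += 1

    def variable() -> int:
        # V -> V x | x: read a maximal run of x's; x_i is written as i copies of x.
        nonlocal pos
        start = pos
        while pos < len(word) and word[pos] == "x":
            pos += 1
        if pos == start:
            raise ParseError(f"expected a variable at position {pos}")
        return pos - start

    def formula():
        nonlocal pos
        if pos < len(word) and word[pos] == "∀":  # S -> ∀ V S
            pos += 1
            i = variable()
            return ("forall", i, formula())
        expect("(")
        if pos < len(word) and word[pos] == "x":  # S -> (V = V)
            left = variable()
            expect("=")
            right = variable()
            expect(")")
            return ("eq", left, right)
        left = formula()  # S -> (S ∧ S)
        expect("∧")
        right = formula()
        expect(")")
        return ("and", left, right)

    tree = formula()
    if pos != len(word):
        raise ParseError(f"unexpected symbol at position {pos}")
    return tree


def parseable(word: str) -> bool:
    """Decide Parseable: is word a well-formed formula of L?"""
    try:
        parse(word)
    except ParseError:
        return False
    return True


print(parseable("((x=x)∧∀xx(xx=x))"))  # True
print(parseable("(x∧x)"))              # False
```

The recognizer never backtracks: after "(" the next symbol decides between S → (V = V) and S → (S ∧ S), and reading a maximal run of x's for V is safe because no well-formed formula begins with x. Each symbol is consumed once, so apart from Python's recursion limit on very deeply nested inputs it runs in linear time.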
Definition 0.2. Consider a string in the language L and its associated parse tree.
An occurrence of a variable x_i is bound if it is a descendant of a quantified formula
in which that variable is quantified; an occurrence is free if it is not bound. The
scope of a quantifier (with respect to its quantified variable) is the well-formed
formula following it (that is, the formula S in the production S → ∀ V S).
Copyright 2011, 2012, 2013, 2014 Jeffrey Finkelstein ⟨jeffreyf@bu.edu⟩.
This document is licensed under the Creative Commons Attribution-ShareAlike 4.0 International License, which is available at https://creativecommons.org/licenses/by-sa/4.0/.
The LaTeX markup that generated this document can be downloaded from its website at
https://github.com/jfinkels/parseable. The markup is distributed under the same license.
A vacuous quantifier is one whose quantified variable does not appear as a
free variable in its scope (though the variable may appear as a bound variable).
A closed formula (or a sentence) is a formula in which all variables appearing in
the formula are bound.
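These notions can be computed by a short recursion over a parse tree. The Python sketch below assumes a hypothetical tuple encoding of parse trees, ("eq", i, j) for (x_i = x_j), ("and", s1, s2) for (s1 ∧ s2), and ("forall", i, s) for ∀ x_i s; the function names are likewise illustrative and not from the paper.

```python
# Hypothetical tuple encoding of parse trees (not from the paper):
#   ("eq", i, j)      stands for (x_i = x_j)
#   ("and", s1, s2)   stands for (s1 ∧ s2)
#   ("forall", i, s)  stands for ∀ x_i s, whose scope is s


def free_vars(tree) -> frozenset:
    """Indices i such that x_i has a free occurrence in the formula."""
    kind = tree[0]
    if kind == "eq":
        return frozenset({tree[1], tree[2]})
    if kind == "and":
        return free_vars(tree[1]) | free_vars(tree[2])
    # ∀ x_i binds every occurrence of x_i that is free in its scope.
    return free_vars(tree[2]) - {tree[1]}


def is_closed(tree) -> bool:
    """A closed formula (sentence) has no free variables."""
    return not free_vars(tree)


def vacuous_quantifiers(tree) -> list:
    """All subtrees ∀ x_i s such that x_i is not free in the scope s."""
    kind = tree[0]
    if kind == "eq":
        return []
    if kind == "and":
        return vacuous_quantifiers(tree[1]) + vacuous_quantifiers(tree[2])
    _, i, scope = tree
    here = [] if i in free_vars(scope) else [tree]
    return here + vacuous_quantifiers(scope)


# ∀x∀x(x = x): closed, but the outer quantifier is vacuous.
both = ("forall", 1, ("forall", 1, ("eq", 1, 1)))
print(is_closed(both), len(vacuous_quantifiers(both)))  # True 1
```

This version recomputes free variables at every quantifier, which is quadratic in the worst case; a single bottom-up pass returning the free variables and the vacuous quantifiers together would avoid the repeated work.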
In each of the problems defined below, the alphabet T is the set of terminals
in the grammar of L.
Definition 0.3 (Parseable).
Instance: a word φ over the alphabet T
Question: Is φ a well-formed formula? (In other words, is φ in L?)
Definition 0.4 (Closed).
Instance: a word φ over the alphabet T
Question: Is φ a closed well-formed formula?
Definition 0.5 (No Vacuous Quantifiers).
Instance: a word φ over the alphabet T
Question: Is φ a well-formed formula with no vacuous quantifiers?
Closed is sometimes called the “no free variables” problem, and No Vacuous Quantifiers is sometimes called the “no vacuous quantifiers” problem.
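Closed does not require that the formula have no vacuous quantifiers, and No
Vacuous Quantifiers does not require that the formula be closed. For example,
∀x∀x(x = x) is closed but its outer quantifier is vacuous, while ∀x(x = xx) has
no vacuous quantifiers but is not closed, since x_2 is free.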
Let us define some complexity classes in order to help classify the complexity
of these problems.
Definition 0.6.
• CFL is the set of all context-free languages, or equivalently the set of all
languages accepted by nondeterministic pushdown automata.
• DCFL is the set of all languages accepted by deterministic pushdown
automata.
• CSL is the set of all context-sensitive languages, or equivalently the set of
all languages accepted by linear-bounded automata.
• GCSL is the set of all growing context-sensitive languages, that is, languages
generated by a grammar in which, for each production, either the right side is
strictly longer than the left side or the left side is the start symbol.
• MCSL is the set of all mildly context-sensitive languages (for a precise
definition, see Appendix A).
• INDEXED is the set of all indexed languages, or equivalently the set of all
languages accepted by one-way nondeterministic nested stack automata
[1].
• AC¹ is the set of all languages accepted by uniform, polynomial-size,
O(log n)-depth circuits with unbounded fan-in AND, OR, and NOT gates.
• NC² is the set of all languages accepted by uniform, polynomial-size,
O(log² n)-depth circuits with AND, OR, and NOT gates of fan-in at most 2.
• P is the set of all languages accepted by a deterministic Turing machine
running in polynomial time.
• NSPACE(n) is the set of all languages accepted by a nondeterministic
Turing machine using at most n space, where n is the length of the input.
• NLINSPACE is the set of all languages accepted by a nondeterministic
Turing machine using space at most linear in the length of the input.
• E is the set of all languages accepted by a deterministic Turing machine
running in 2^{O(n)} time.
• 1NSA is the set of all languages accepted by one-way nondeterministic
stack automata.
Here are some inclusions among these complexity classes. Some of these
inclusions are strict.
Theorem 0.7.
1. CFL ⊆ AC¹ ⊆ NC² ⊆ P.
2. CFL ⊆ GCSL ⊆ CSL = NSPACE(n) ⊆ NLINSPACE ⊆ E.
3. CFL ⊆ 1NSA ⊆ MCSL ⊆ INDEXED ⊆ CSL [3].
We know the following facts about the complexity of the three languages of
interest.
• Parseable ∈ CFL.
• Closed ∈ INDEXED \ CFL. [5]
• It is conjectured in [5] that No Vacuous Quantifiers ∉ INDEXED,
though the complexity of this problem still seems to be unknown [2].
Open problem 0.8. Is Closed a mildly context-sensitive language (more
formally, is Closed ∈ MCSL \ CFL)? Could it be decided by, for example, an
instance of the formalism described in [4]?
Open problem 0.9. Is No Vacuous Quantifiers ∉ INDEXED?
We will also consider permutation problems corresponding to Parseable,
Closed, and No Vacuous Quantifiers. In the definitions below, if w is a
word composed of symbols w_1 ⋯ w_n and σ is a permutation on n elements, then
we define σ(w) = w_{σ(1)} ⋯ w_{σ(n)}.
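As a small illustration of this definition, the Python function below applies σ to a word. The name apply_permutation is hypothetical, and σ is passed as a list whose k-th entry (counting from 1, as in the text) is σ(k).

```python
def apply_permutation(w: str, sigma: list[int]) -> str:
    """Return σ(w) = w_σ(1) ⋯ w_σ(n), where sigma[k - 1] holds σ(k) (1-indexed)."""
    n = len(w)
    if sorted(sigma) != list(range(1, n + 1)):
        raise ValueError("sigma must be a permutation of 1..n")
    # Python strings are 0-indexed, so the symbol w_σ(k) is w[σ(k) - 1].
    return "".join(w[sigma[k - 1] - 1] for k in range(1, n + 1))


print(apply_permutation("x=(x)", [3, 1, 2, 4, 5]))  # (x=x)
```

So the word x=(x), which is not in L, has a permutation that is.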
Definition 0.10 (Perm Parseable).
Instance: a word φ of length n over the alphabet T
Question: Is there a permutation σ on n elements such that σ(φ) is a well-formed
formula? (In other words, is σ(φ) in Parseable?)
Definition 0.11 (Perm Closed).
Instance: a word φ of length n over the alphabet T
Question: Is there a permutation σ on n elements such that σ(φ) is a closed
well-formed formula? (In other words, is σ(φ) in Closed?)
Definition 0.12 (Perm NVQ).
Instance: a word φ of length n over the alphabet T
Question: Is there a permutation σ on n elements such that σ(φ) is a well-formed
formula with no vacuous quantifiers? (In other words, is σ(φ) in No Vacuous
Quantifiers?)
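For very small inputs, all three permutation problems can be decided by brute force: enumerate each distinct rearrangement σ(φ) of the multiset of symbols and test it directly. The Python sketch below does this; it is a definitional check only, not a proposed algorithm. All names and the tuple encoding of parse trees are hypothetical, and a compact parser for Definition 0.1 is included so that the block runs on its own.

```python
from collections import Counter
from typing import Callable, Iterator, Optional


def parse(word: str):
    """Parse tree of word (Definition 0.1), or None if word is not in L."""
    pos = 0

    def eat(symbol: str) -> None:
        nonlocal pos
        if pos >= len(word) or word[pos] != symbol:
            raise ValueError(f"expected {symbol!r} at position {pos}")
        pos += 1

    def var() -> int:
        # V -> V x | x: the variable x_i is written as i copies of x.
        nonlocal pos
        start = pos
        while pos < len(word) and word[pos] == "x":
            pos += 1
        if pos == start:
            raise ValueError(f"expected a variable at position {pos}")
        return pos - start

    def formula():
        nonlocal pos
        if pos < len(word) and word[pos] == "∀":  # S -> ∀ V S
            pos += 1
            i = var()
            return ("forall", i, formula())
        eat("(")
        if pos < len(word) and word[pos] == "x":  # S -> (V = V)
            left = var()
            eat("=")
            right = var()
            eat(")")
            return ("eq", left, right)
        left = formula()  # S -> (S ∧ S)
        eat("∧")
        right = formula()
        eat(")")
        return ("and", left, right)

    try:
        tree = formula()
    except ValueError:
        return None
    return tree if pos == len(word) else None


def analyze(tree):
    """Return (free variable indices, whether some quantifier is vacuous)."""
    if tree[0] == "eq":
        return {tree[1], tree[2]}, False
    if tree[0] == "and":
        (f1, v1), (f2, v2) = analyze(tree[1]), analyze(tree[2])
        return f1 | f2, v1 or v2
    free, vacuous = analyze(tree[2])
    return free - {tree[1]}, vacuous or tree[1] not in free


def rearrangements(word: str) -> Iterator[str]:
    """Yield every distinct σ(word) exactly once (permutations of a multiset)."""
    counts = Counter(word)
    prefix = []

    def extend() -> Iterator[str]:
        if len(prefix) == len(word):
            yield "".join(prefix)
            return
        for symbol in sorted(counts):
            if counts[symbol] > 0:
                counts[symbol] -= 1
                prefix.append(symbol)
                yield from extend()
                prefix.pop()
                counts[symbol] += 1

    yield from extend()


# The three problems differ only in the test applied to the parse tree.
PROBLEMS = {
    "Perm Parseable": lambda tree: True,
    "Perm Closed": lambda tree: not analyze(tree)[0],
    "Perm NVQ": lambda tree: not analyze(tree)[1],
}


def perm_search(word: str, accept: Callable) -> Optional[str]:
    """Some σ(word) in L whose parse tree satisfies accept, or None."""
    for candidate in rearrangements(word):
        tree = parse(candidate)
        if tree is not None and accept(tree):
            return candidate
    return None
```

For instance, the multiset of ∀∀xx(x=x) is a yes-instance of Perm Closed (witness ∀x∀x(x=x)) but a no-instance of Perm NVQ. The search visits n!/(m_1! ⋯ m_k!) rearrangements, where m_1, …, m_k are the multiplicities of the distinct symbols, so it is exponential in n and practical only for tiny inputs.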
With these problems, we wish to capture the situation in which a human
has a jumble of linguistic elements and is able to synthesize a natural language
sentence (preferably one which is semantically valid, but we will just consider
syntax for now).
Todo 0.13. Explain the significance of the Perm Parseable, Perm Closed,
and Perm NVQ problems with respect to natural language parsing and computational linguistics.
Open problem 0.14. Is either of Perm Closed or Perm NVQ in INDEXED?
Is either in MCSL? Generally, permutation versions of these problems are easier
than the originals (if the number of occurrences of each terminal symbol meets
some requirement, then we can rearrange the symbols to fit any appropriate parse
tree), so it should be possible to show that Perm Closed ∈ INDEXED. To show this,
one might start by showing that a one-way nondeterministic nested stack automaton
accepts Perm Closed, or that a linear indexed grammar generates all words φ which
have some permutation that is a closed well-formed formula.
Figure 1 is a fragment of a combinatorial, linear, non-erasing literal movement
grammar for Perm Closed. ?? is a fragment of a combinatorial, linear, non-erasing literal movement grammar for Perm NVQ.
Todo 0.15. Definition of combinatorial, linear, non-erasing literal movement
grammars.
Todo 0.16. Determine the complexity of the literal movement grammar in
Figure 1 and ??.
Appendix A. Definition of “mildly context-sensitive”
The precise definition of the class of mildly context-sensitive languages comes
from [4, Definition 1]. We refer you to that work for the precise definition of
the “constant-growth property”, which states, roughly, that if one orders the
strings in a language by increasing length, the difference between the lengths of
two consecutive strings is bounded by a constant, so that the lengths grow linearly.
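The constant-growth property concerns only the set of lengths of strings in a language. A finite sample of lengths can never establish the property, but the Python sketch below (the function name max_length_gap is hypothetical) illustrates the intuition by measuring the largest gap between consecutive lengths.

```python
def max_length_gap(lengths) -> int:
    """Largest gap between consecutive distinct lengths in a finite sample (0 if fewer than two)."""
    ordered = sorted(set(lengths))
    return max((b - a for a, b in zip(ordered, ordered[1:])), default=0)


# Every n >= 5 is the length of some formula in L, e.g. "(x=" + "x" * (n - 4) + ")".
print(max_length_gap(range(5, 100)))                # 1
# By contrast, the lengths in {a^(2^k)} have gaps that double, so no constant bounds them.
print(max_length_gap([2 ** k for k in range(10)]))  # 256
```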
Figure 1: Literal movement grammar for Perm Closed. Q_i represents a quantification of variable x_i. After quantification, this grammar allows any number
of instances of the variable x_i to occur in rule V_i. The rules for producing the
other symbols of first-order propositional logic have been omitted, as represented
by the vertical ellipsis. Note: this grammar can derive strings with vacuous
quantifiers.
S() → S(X) → Q_i(X)
Q_i(X ∀ Y x_i Z) → V_i(X Y Z)
Q_i(X x_i Y ∀ Z) → V_i(X Y Z)
V_i(X x_i Y) → V_i(X Y)
V_i(X) → S(X)
⋮
Definition A.1.
• A language L is mildly context-sensitive if it is in P and has the constant-growth property.
• A class of languages C is mildly context-sensitive if
1. all languages in C are mildly context-sensitive languages,
2. CFL ⊆ C, and
3. C can describe cross-serial dependencies, that is, there exists an n ≥ 2
such that {w^k | w ∈ T*} ∈ C for all k ≤ n.
• A formalism 𝓕 (e.g. an abstract machine or a grammar) is mildly context-sensitive if {L | ∃F ∈ 𝓕 : L = L(F)} is a mildly context-sensitive class of
languages.
• Denote the largest mildly context-sensitive class of languages by MCSL.
In [4], Kallmeyer states, “So far, it has not been possible to identify a
grammar formalism that generates the largest possible mildly context-sensitive
class of string languages.” But she proposes a formalism which is more general
than all known examples of mildly context-sensitive formalisms, yet still mildly
context-sensitive itself.
References
[1] Alfred V. Aho. “Indexed Grammars—An extension of Context-Free Grammars”. In: Journal of the ACM 15.4 (1968), pp. 647–671.
[2] Christopher Potts. “No Vacuous Quantification Constraints in Syntax”. In:
Proceedings of NELS. Vol. 32. 2002, pp. 451–470.
[3] J.E. Hopcroft and J.D. Ullman. Introduction to automata theory, languages,
and computation. Addison-Wesley series in computer science. Addison-Wesley, 1979. isbn: 9780201029888. url: http://books.google.com/books?id=i_BQAAAAMAAJ.
[4] Laura Kallmeyer. “On Mildly Context-Sensitive Non-Linear Rewriting”.
In: Research on Language and Computation 8 (4 2010), pp. 341–363. issn:
1570-7075. doi: 10.1007/s11168-011-9081-6. url: http://dx.doi.org/10.1007/s11168-011-9081-6.
[5] William Marsh and Barbara H. Partee. “How Non-Context Free is Variable
Binding?” In: The Formal Complexity of Natural Language. Ed. by Walter
J. Savitch et al. Vol. 33. Studies in Linguistics and Philosophy. Springer
Netherlands, 1987, pp. 369–386. isbn: 978-1-55608-047-0. doi: 10.1007/978-94-009-3401-6_16.