PROBABILISTIC LEXICALIZED CONTEXT-FREE GRAMMARS
İbrahim Alız
Department of Computer Engineering,
Middle East Technical University (METU)
ibrahim_aliz@yahoo.com
“With enough knowledge we can figure out the probability of just about anything.”
True to that moral, it did not take computational linguists long to harness the power of
probability for parsing, in order to deal with the ambiguities of the natural language
understanding task. Probabilistic parsing is a key contribution to disambiguation: simply
choose the most probable parse as the answer. Moreover, with the help of
subcategorization and lexical dependency information, and hence of probabilistic lexicalized
context-free grammars (PLCFGs), an extension of probabilistic context-free
grammars (PCFGs), one can obtain better results. This paper gives a brief description of the
principles of PLCFGs, and then proposes an implementation of a PLCFG parser
for a limited Turkish lexicon and grammar.
An easy way to think of a lexicalized grammar is as a context-free grammar with many
more rules: it is as if we created many copies of each rule, one copy for each possible
headword of each constituent. In general it would be too costly to keep all these rules around
(the sketch after the parse below illustrates the blow-up), but thinking of lexicalized
grammars this way makes it clear that we can parse them with standard CFG parsing
algorithms. As an example, for a sentence like “what does your student
want to write” we have the following parsing result.
Lexicalized parse (each triple gives the head of the phrase, the head of a subphrase, and the rule used to expand the phrase):
(write, what, S → what S)
(write, does, S → does S)
(write, student, S → NP VP)
(student, your, NP → your student)
(write, want, VP → want VP)
(write, to, VP → to write)
The main idea in extending a PCFG to a lexicalized PCFG is the use of a lexical head
(the most important item in the constituent) for each constituent. For example, the head of a
noun phrase is the main noun, typically the rightmost one (e.g. student for “your student”).
More generally, heads are computed bottom-up, and the head of a constituent c is a
deterministic function of the rule used to expand c. For example, if c is expanded using
S → NP VP, the function would indicate that one should find the head of c by looking for
the head of the VP.
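A minimal sketch of such a deterministic head-finding function is given below, assuming a tiny hand-written head table; the table only covers the example parse above and is not a general head-rule set.

HEAD_CHILD = {
    ("S", ("NP", "VP")): 1,           # head of S = head of the VP
    ("NP", ("your", "student")): 1,   # head of NP = the rightmost noun
    ("VP", ("want", "VP")): 1,        # head passed up from the inner VP,
    ("VP", ("to", "write")): 1,       #   matching the triples above
}

def head(node):
    # A node is either a word (a leaf) or a (category, children) pair.
    if isinstance(node, str):
        return node
    cat, children = node
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    return head(children[HEAD_CHILD[(cat, rhs)]])

tree = ("S", [("NP", ["your", "student"]),
              ("VP", ["want", ("VP", ["to", "write"])])])
print(head(tree))   # prints "write", the head of the whole sentence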
Lexicalized statistical parsers collect, to a first approximation, two kinds of
statistics. One relates the head of a phrase to the rule used to expand the phrase, which we
denote p(r | h), and the other relates the head of a phrase to the head of a subphrase, which we
denote p(h | m, t), where h is the head of the subphrase, m the head of the mother phrase, and t
the type of the subphrase. Therefore, for a lexicalized parser to find the probability of a
parse we use the following formula: if s is the entire sentence, π is a particular
parse of s, c ranges over the constituents of π, and r(c) is the rule used to expand c, then

p(s, π) = ∏c p(h(c) | m(c), t(c)) · p(r(c) | h(c))

Here we first find the probability of the head of the constituent, h(c), given the head of the
mother, m(c), and the type of c, and then the probability of the rule r(c) given the head of c.
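The following is a minimal sketch of this product, assuming the two probability tables have already been estimated from a treebank; the numbers and the tuple encoding of constituents are made up purely for illustration.

P_RULE_GIVEN_HEAD = {            # p(r | h)
    ("S -> NP VP", "write"): 0.30,
    ("NP -> your student", "student"): 0.10,
    ("VP -> want VP", "write"): 0.20,
    ("VP -> to write", "write"): 0.40,
}

P_HEAD_GIVEN_MOTHER = {          # p(h | m, t)
    ("write", None, "S"): 0.001,        # root constituent: no mother head
    ("student", "write", "NP"): 0.005,
    ("write", "write", "VP"): 0.050,
}

def parse_probability(constituents):
    # constituents: one (rule, h, m, t) tuple per constituent c of the parse
    p = 1.0
    for rule, h, m, t in constituents:
        p *= P_HEAD_GIVEN_MOTHER[(h, m, t)] * P_RULE_GIVEN_HEAD[(rule, h)]
    return p

pi = [("S -> NP VP", "write", None, "S"),
      ("NP -> your student", "student", "write", "NP"),
      ("VP -> want VP", "write", "write", "VP"),
      ("VP -> to write", "write", "write", "VP")]
print(parse_probability(pi))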
However, before parsing we have to train the parser on a pre-parsed training corpus,
following Charniak’s work* on statistical parsing, which also uses two more equations
for calculating the probabilities of individual rules and of their dependencies; this
training step supplies the necessary probabilities for the rules.
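Below is a minimal sketch of this training step, assuming the pre-parsed corpus has been flattened into one (rule, h, m, t) event per constituent; these are plain relative-frequency (maximum-likelihood) estimates, without the smoothing a real parser such as Charniak’s adds.

from collections import Counter

def estimate(events):
    rule_head = Counter()   # counts of (rule, h)  -> for p(r | h)
    head_ctx = Counter()    # counts of (h, m, t)  -> for p(h | m, t)
    head_only = Counter()   # counts of h
    ctx_only = Counter()    # counts of (m, t)
    for rule, h, m, t in events:
        rule_head[(rule, h)] += 1
        head_only[h] += 1
        head_ctx[(h, m, t)] += 1
        ctx_only[(m, t)] += 1
    p_rule = {(r, h): c / head_only[h] for (r, h), c in rule_head.items()}
    p_head = {(h, m, t): c / ctx_only[(m, t)]
              for (h, m, t), c in head_ctx.items()}
    return p_rule, p_head   # estimates of p(r | h) and p(h | m, t)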
Thus, having the primitive probabilities approximated from the lexical dependencies
between the words in the training corpus, subcategorized by word affinities, we can
calculate the probability of each parse using the formula above. Having covered the
basics of PLCFGs, my aim is to develop a PLCFG parser for a specified Turkish
grammar and lexicon.
Since it is natural to check such probabilities against established psychological
results on human parsing, in which lexical dependency is likewise a determining factor in
human parsing, it is not surprising that parsers implementing lexicalized PCFGs
reach a success rate of about 88%. With more intelligent machines pushing the tight limits of
natural language processing every day, PLCFGs are an important step toward
understanding human speech recognition and parsing.
References
 Eugene Charniak, “Statistical Techniques for Natural Language Parsing,”
Department of Computer Science, Brown University.
 *Eugene Charniak, “Statistical Parsing with a Context-Free Grammar and Word Statistics,”
Department of Computer Science, Brown University.
 Daniel Jurafsky and James H. Martin, Speech and Language Processing, Prentice Hall, 2000.
 Yusuke Miyao and Jun’ichi Tsujii, “A Model of Syntactic Disambiguation Based on
Lexicalized Grammars,” Department of Computer Science, University of Tokyo.