Treelet Probabilities for HPSG Parsing and Error Correction
Angelina Ivanova♠, Gertjan van Noord♣
♠ University of Oslo, Department of Informatics
♣ University of Groningen, Center for Language and Cognition
Motivation

Small mistakes (typos and word repetitions) can be problematic for syntactic parsers. In such cases, a parser should produce an analysis for the intended sentence, rather than for the given literal sentence.

Parsing with the HPSG-based English Resource Grammar (ERG)

The PET parser (Callmeier, 2000), commonly used with the ERG, exploits a log-linear model for parse disambiguation. For each sentence we can produce a ranked n-best list of candidate analyses. The ranking probabilities are discriminative and permit only comparison of parse trees of the same sentence.

The linguistic depth of broad-coverage grammars, like HPSG (Pollard and Sag, 1987), could be exploited for error correction. Since common grammar parsers do not allow cross-sentence comparison of parse tree probabilities, we evaluate the treelet model (Pauls and Klein, 2012) for estimating generative probabilities for HPSG parsing with error correction.

We apply the treelet model in order to obtain generative probabilities that allow us to compare the likelihood of analyses of different sentences.
Download: http://heim.ifi.uio.no/~angelii/wiki_pairs.html
Treelet model

The treelet model is proposed in Pauls and Klein (2012):

P(T) = ∏_{r ∈ T} p(C_1^d | h)

where T is a constituency tree consisting of context-free rules of the form r = P → C_1 … C_d, P is the parent symbol of the rule r, C_1^d = C_1 … C_d are its children, and h is the context.

The context for non-terminal productions includes the parent node P, the grandparent node P′ and the rule r′ that generated P; for terminal (lexical) productions the context covers P, the sibling R immediately to the right of P, the parent rule r′ and the previous two words w−1 and w−2 (see figures).

Figure: context for a non-terminal production in the treelet model.
r = hd-cmp_u_c → v_vp_will-p_le hd-cmp_u_c; P = hd-cmp_u_c; P′ = sb-hd_mc_c; r′ = sb-hd_mc_c → sp-hd_n_c hd-cmp_u_c

Figure: context for a terminal production in the treelet model.
P = v_vp_will-p_le; R = hd-cmp_u_c; r′ = hd-cmp_u_c → v_vp_will-p_le hd-cmp_u_c; w−1 = company; w−2 = The

Treelet model for parse selection

Goal: compare the treelet model with the PCFG model on the parse selection problem.

We compute how many times each model chooses the gold-standard analysis among up to 500 analyses generated by the PET parser.

Number and percentage of sentences for which the treelet, PCFG and random choice models scored the gold-standard parse tree higher than the other analyses produced by the PET parser:

Upper bound   12,311 sent.   100%
Treelet        4,487 sent.   36.45%
PCFG           2,905 sent.   23.60%
Random           621 sent.    5.04%

Treelet model for scoring parse trees of erroneous and corrected sentences

Goal: evaluate the treelet model's ability to assign a higher probability to the corrected sentence than to the erroneous one, in comparison with the PCFG and SRILM trigram models.

The table shows the number of sentences for which each model:
1. scored the parse tree of the original version of the sentence higher ("Original");
2. scored the parse tree of the corrected version of the sentence higher ("Corrected");
3. scored the parse trees of the original and corrected versions equally ("Equal").
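The generative scoring described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the probability table, the rule encodings, and the probability floor are hypothetical, and a real model would estimate p(children | context) from treebank counts with the smoothing and backoff of Pauls and Klein (2012).

```python
from math import log

# Hypothetical treelet probability table mapping (children, context) -> p.
# For a non-terminal production the context is (parent P, grandparent P',
# parent rule r'); the ERG-style names below are illustrative only.
TREELET_PROBS = {
    (("sp-hd_n_c", "hd-cmp_u_c"),
     ("sb-hd_mc_c", "ROOT", None)): 0.4,
    (("v_vp_will-p_le", "hd-cmp_u_c"),
     ("hd-cmp_u_c", "sb-hd_mc_c", "sb-hd_mc_c -> sp-hd_n_c hd-cmp_u_c")): 0.2,
}

def tree_log_prob(productions):
    """P(T) = prod_{r in T} p(C_1..C_d | h), computed in log space.

    `productions` is a list of (children, context) pairs, one per rule in T.
    """
    total = 0.0
    for children, context in productions:
        # Crude probability floor for unseen events, standing in for real backoff.
        p = TREELET_PROBS.get((children, context), 1e-9)
        total += log(p)
    return total

def select_parse(nbest):
    """Parse selection: return the highest-scoring analysis in an n-best list."""
    return max(nbest, key=tree_log_prob)
```

Because the score is a product of conditional probabilities rather than a discriminative ranking score, the same function can score trees of different sentences on a common scale.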
Error analysis

The sentences on which the treelet model outperforms the PCFG model:
- are longer;
- contain slightly more coordination structures;
- have most of their vocabulary seen by the model during training.
Datasets

- ERG gold data (from the DeepBank and Redwoods treebanks): train 63,298 sent.; development 7,034 sent.; test 17,583 sent.
- NUS Corpus of Learner English: 2,181 pairs of erroneous and corrected plain-text sentences.
- Wikipedia: 4,604 pairs of plain-text sentences from the Wikipedia 2012 and 2013 snapshots. Assumption: sentences from Wiki12 contain errors, sentences from Wiki13 provide corrections.
           NUS corpus                      Wikipedia
Model      Corrected  Equal  Original     Corrected  Equal  Original
Oracle         2,223      0         0         4,604      0         0
Treelet        1,449     11       763         1,884    994     1,726
PCFG           1,304     11       908         1,835    996     1,773
Trigram        1,249     80       894         1,732  1,294     1,578
Random         1,112      0     1,111         2,302      0     2,302
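The comparison behind these counts can be sketched as follows, assuming we already have generative log-probabilities for the best parse of each version of a sentence; the scores in the example are hypothetical.

```python
from collections import Counter

def prefer(score_original, score_corrected):
    """Decide which version a model favours. Generative treelet probabilities
    make this cross-sentence comparison meaningful, unlike the discriminative
    parse-ranking scores of the PET parser."""
    if score_corrected > score_original:
        return "Corrected"
    if score_corrected < score_original:
        return "Original"
    return "Equal"

def tally(pairs):
    """Count outcomes over (score_original, score_corrected) pairs,
    mirroring the Corrected / Equal / Original columns of the table."""
    return Counter(prefer(o, c) for o, c in pairs)
```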
Conclusions

- the treelet model outperforms the random and PCFG models on the parse selection problem;
- the treelet model appears to perform better than the random, trigram and PCFG models on the NUS corpus data;
- differences in the performance of the treelet, PCFG and trigram models are not statistically significant on the Wikipedia data.

The Wikipedia corpus is peculiar:
- many errors occur in named entities ("Stuebing" vs. "Stübing");
- some errors can only be corrected at the discourse level;
- sentences from Wikipedia 2013 do not always provide grammatical error corrections for the sentences from Wikipedia 2012.

Future work

- enhancing the treelet model implementation with the transformations suggested in Pauls and Klein (2012);
- supplying the system with a large dictionary of named entities;
- building a system for HPSG parsing with error correction.

Language Resources and Evaluation Conference 2014
Acknowledgements
We are grateful to Lilja Øvrelid and Stephan Oepen for helpful
feedback and discussions, as well as to anonymous reviewers for
their comments. The first author is funded by the Norwegian
Research Council through its WeSearch project.