Treelet Probabilities for HPSG Parsing and Error Correction

Angelina Ivanova (University of Oslo, Department of Informatics)
Gertjan van Noord (University of Groningen, Center for Language and Cognition)

Language Resources and Evaluation Conference 2014

Motivation

Small mistakes such as typos and word repetitions can be problematic for syntactic parsers. In such cases, a parser should produce an analysis for the intended sentence rather than for the literal sentence as given.

The PET parser (Callmeier, 2000), commonly used with the HPSG-based English Resource Grammar (ERG), exploits a log-linear model for parse disambiguation. For each sentence it can produce a ranked n-best list of candidate analyses, but the ranking probabilities are discriminative and permit only the comparison of parse trees of the same sentence.

The linguistic depth of broad-coverage grammars such as HPSG (Pollard and Sag, 1987) could be exploited for error correction. Since common grammar parsers do not allow cross-sentence comparison of parse tree probabilities, we evaluate the treelet model (Pauls and Klein, 2012) for estimating generative probabilities for HPSG parsing with error correction. The treelet model yields generative probabilities that allow us to compare the likelihood of analyses of different sentences.

Treelet model

The treelet model (Pauls and Klein, 2012) assigns a probability to a constituency tree T consisting of context-free rules of the form r = P -> C1 ... Cd, where P is the parent symbol of rule r and C_1^d = C1 ... Cd are its children:

    P(T) = Π_{r ∈ T} p(C_1^d | h)

where h is the context of the rule. For non-terminal productions, the context includes the parent node P, the grandparent node P', and the rule r' that generated P. For terminal (lexical) productions, the context covers P, the sibling R immediately to the right of P, the parent rule r', and the previous two words w-1 and w-2.

Example context for a non-terminal production: r = hd-cmp_u_c -> v_vp_will-p_le hd-cmp_u_c, P = hd-cmp_u_c, P' = sb-hd_mc_c, r' = sb-hd_mc_c -> sp-hd_n_c hd-cmp_u_c.

Example context for a terminal production: P = v_vp_will-p_le, R = hd-cmp_u_c, r' = hd-cmp_u_c -> v_vp_will-p_le hd-cmp_u_c, w-1 = company, w-2 = The.

Datasets

ERG gold data (from the DeepBank and Redwoods treebanks): 63,298 training sentences, 7,034 development sentences, 17,583 test sentences.

NUS Corpus of Learner English: 2,181 pairs of erroneous and corrected plain-text sentences.

Wikipedia: 4,604 pairs of plain-text sentences from the Wikipedia 2012 and 2013 snapshots (http://heim.ifi.uio.no/~angelii/wiki_pairs.html). Assumption: sentences from Wiki12 contain errors, while the corresponding sentences from Wiki13 provide corrections.

Treelet model for parse selection

Goal: compare the treelet model with the PCFG model on the parse selection problem. We count how many times each model scores the gold-standard analysis highest among the up to 500 analyses generated by the PET parser.

Number and percentage of sentences for which the treelet, PCFG and random choice models scored the gold-standard parse tree higher than all other analyses produced by the PET parser:

Model        Sentences   Percentage
Upper bound     12,311      100%
Treelet          4,487     36.45%
PCFG             2,905     23.60%
Random             621      5.04%

Treelet model for scoring parse trees of erroneous and corrected sentences

Goal: evaluate the treelet model's ability to assign a higher probability to the corrected sentence than to the erroneous one, in comparison with the PCFG and SRILM trigram models.

The table below gives the number of sentence pairs for which each model (1) scored the parse tree of the corrected version of the sentence higher ("Corrected"), (2) scored the parse trees of the original and corrected versions equally ("Equal"), or (3) scored the parse tree of the original, erroneous version higher ("Original"):

             NUS corpus                    Wikipedia
Model        Corrected  Equal  Original    Corrected  Equal  Original
Oracle           2,223      0         0        4,604      0         0
Treelet          1,449     11       763        1,884    994     1,726
PCFG             1,304     11       908        1,835    996     1,773
Trigram          1,249     80       894        1,732  1,294     1,578
Random           1,112      0     1,111        2,302      0     2,302
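To make the scoring procedure concrete, the following is a minimal Python sketch of how treelet probabilities could be computed and put to use in both experiments. The tree encoding (Node), the probability tables nonterm_probs and term_probs, and the helper names are illustrative assumptions of ours, not part of the released system; in practice the conditional probabilities would be estimated from the ERG training treebank with the smoothing and transformations described in Pauls and Klein (2012).

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional, Tuple, Union

# Hypothetical tree encoding: an internal node carries an ERG rule or
# lexical-type label and a list of children; a leaf is the surface word
# itself (a string). Each preterminal is assumed to dominate exactly one word.
@dataclass
class Node:
    label: str
    children: List[Union["Node", str]] = field(default_factory=list)

def treelet_log_prob(root: Node, nonterm_probs: dict, term_probs: dict) -> float:
    """log P(T) = sum over rules r in T of log p(C_1^d | h).

    nonterm_probs maps (children, P, P', r') to a probability; term_probs maps
    (word, P, R, r', w-1, w-2) to a probability. Both tables are assumed to be
    smoothed, so every lookup needed here succeeds.
    """
    words: List[str] = []   # surface words emitted so far, for the w-1/w-2 context
    logp = 0.0

    def visit(node: Node, grandparent: Optional[str],
              parent_rule: Optional[Tuple[str, ...]],
              right_sibling: Optional[str]) -> None:
        nonlocal logp
        if len(node.children) == 1 and isinstance(node.children[0], str):
            # Terminal production: context is P, the sibling R immediately to
            # the right of P, the parent rule r', and the previous two words.
            word = node.children[0]
            w1 = words[-1] if len(words) >= 1 else "<s>"
            w2 = words[-2] if len(words) >= 2 else "<s>"
            logp += math.log(term_probs[(word, node.label, right_sibling,
                                         parent_rule, w1, w2)])
            words.append(word)
        else:
            # Non-terminal production: context is the parent P, the grandparent
            # P', and the rule r' that generated P.
            child_labels = tuple(c.label for c in node.children)
            rule = (node.label,) + child_labels       # r = P -> C1 ... Cd
            logp += math.log(nonterm_probs[(child_labels, node.label,
                                            grandparent, parent_rule)])
            for i, child in enumerate(node.children):
                sib = child_labels[i + 1] if i + 1 < len(child_labels) else None
                visit(child, node.label, rule, sib)

    visit(root, None, None, None)
    return logp

def best_parse(candidates: List[Node], nonterm_probs, term_probs) -> Node:
    """Parse selection: pick the tree from the PET n-best list with the
    highest treelet probability."""
    return max(candidates,
               key=lambda t: treelet_log_prob(t, nonterm_probs, term_probs))

def prefers_correction(orig_tree: Node, corr_tree: Node,
                       nonterm_probs, term_probs) -> bool:
    """Error-correction decision: compare the best parse of the corrected
    sentence directly with the best parse of the erroneous one."""
    return (treelet_log_prob(corr_tree, nonterm_probs, term_probs) >
            treelet_log_prob(orig_tree, nonterm_probs, term_probs))
```

Because P(T) is a product of locally normalized conditional probabilities, the scores of trees of different sentences live on the same scale; this cross-sentence comparability is exactly what the discriminative PET ranking does not provide.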
Error analysis

The sentences on which the treelet model outperforms the PCFG model are longer, contain slightly more coordination structures, and consist mostly of vocabulary that the model has seen during training.

The Wikipedia corpus is peculiar:
- many errors occur in named entities ("Stuebing" vs. "Stübing");
- some errors can only be corrected on the discourse level;
- sentences from Wikipedia 2013 do not always provide grammatical error corrections to the sentences from Wikipedia 2012.

Conclusions

- The treelet model outperforms the random and PCFG models on the parse selection problem.
- The treelet model appears to perform better than the random, trigram and PCFG models on the NUS corpus data.
- The differences in the performance of the treelet, PCFG and trigram models are not statistically significant on the Wikipedia data.

Future work

- Enhance the treelet model implementation with the transformations suggested in Pauls and Klein (2012).
- Supply the system with a large dictionary of named entities.
- Build a system for HPSG parsing with error correction.

Acknowledgements

We are grateful to Lilja Øvrelid and Stephan Oepen for helpful feedback and discussions, as well as to the anonymous reviewers for their comments. The first author is funded by the Norwegian Research Council through its WeSearch project.