Proceedings of the Twenty-Third AAAI Conference on Artificial Intelligence (2008)

Learning to Identify Reduced Passive Verb Phrases with a Shallow Parser

Sean Igo
School of Computing, University of Utah
Salt Lake City, UT 84112
samwibatt@gmail.com

Ellen Riloff
School of Computing, University of Utah
Salt Lake City, UT 84112
riloff@cs.utah.edu

Abstract

Our research is motivated by the observation that NLP systems frequently mislabel passive voice verb phrases as being in the active voice when there is no auxiliary verb (e.g., "The man arrested had a long record"). These errors directly impact thematic role recognition and the NLP applications that depend on it. We present a learned classifier that can accurately identify reduced passive voice constructions in shallow parsing environments.

Introduction

Natural language processing systems need to distinguish between active and passive voice to correctly identify the thematic roles associated with verb phrases (VPs). In the passive voice, verbs typically occur in the past participle form preceded by a "to be" auxiliary verb.[1] For example, "YouTube was purchased by Google" is a passive voice construction which means that YouTube was the purchasee (theme) and Google was the purchaser (agent). In contrast, "YouTube purchased Google" is an active voice construction which reverses the thematic roles, meaning that YouTube is the purchaser (agent) and Google is the purchasee (theme). As is often the case, both the agent and the theme belong to the same semantic class (here, COMPANY), so determining whether the VP is in the active or passive voice is critical to assigning thematic roles correctly.

[1] Some other auxiliary verbs can also signal passive voice, such as "get" (e.g., "he got shot") and "become" (e.g., "she became stranded") (Carter and McCarthy 1999; McEnery and Xiao 2005).

Our research is motivated by a recurring problem that we have observed: passive voice VPs are often mislabeled as active voice when the auxiliary verb is missing. For example, consider the following sentences, in which the verbs "infected" and "arrested" are in the passive voice:

(a) "A dead bird infected with the West Nile Virus has been discovered in Maine."
(b) "The ringleader was John Freire, an attorney arrested in March 1989."

The verb "infected" in sentence (a) is in the passive voice because its syntactic subject, "a dead bird", is the entity that was infected (its theme). Similarly, the verb "arrested" in sentence (b) is in the passive voice because its syntactic subject, "an attorney", is the entity that was arrested. We will refer to these constructions as reduced passives, because they typically occur in reduced relative clauses. Ideally, NLP systems should be able to accurately identify reduced passives by generating parse trees that capture the corresponding syntactic structures. However, we will show that even full parsers have trouble consistently identifying reduced passives across domains. More importantly, many NLP applications rely on shallow parsers because of their speed and robustness (Li and Roth 2001), and shallow parsers do not generate the deep syntactic structures needed to explicitly capture reduced passive constructions. We present a novel approach: a learned classifier that identifies reduced passive voice VPs in shallow parsing environments.

Motivation and Related Work

Our research was motivated by the observation that our shallow parser, Sundance (Riloff and Phillips 2004), consistently mislabeled reduced passives as being in the active voice. These errors led to mistakes in downstream applications that rely on thematic role labels.
In fact, Sundance is unusual in that it generates active/passive voice labels at all. Most shallow parsers perform syntactic "chunking" (NP, VP, etc.) but do not perform any syntactic analysis beyond that. Sundance uses heuristic rules to distinguish active voice VPs from passive voice VPs based on part-of-speech tags and the presence of auxiliary verbs, but reduced passives look almost identical to active voice VPs with respect to these properties. We are not aware of research that has focused specifically on identifying reduced passive voice constructions, although the problem has been acknowledged and discussed previously (Merlo and Stevenson 1998).

Before embarking on a solution, we wanted to know: (1) how common are reduced passives? and (2) how well do full parsers identify them? To answer these questions, we manually identified the reduced passive voice constructions in texts drawn from three corpora: 50 Penn Treebank (WSJ) texts (Bies et al. 1995), 200 MUC4 terrorism articles (MUC-4 Proceedings 1992), and 100 ProMed disease outbreak texts (ProMed-mail 2006). Reduced passives occurred in 8.4% of the Treebank sentences, 11.6% of the MUC4 sentences, and 13.3% of the ProMed sentences; on average, in 1 of every 9 sentences. We concluded that this phenomenon was common enough to warrant further investigation.

We then evaluated the ability of three widely used full parsers to recognize passive voice constructions: Charniak's parser (Charniak 2000), Collins' parser (Collins 1999), and MINIPAR (Lin 1998). The Charniak and Collins parsers generate Treebank-style parse trees, which do not explicitly assign voice labels. However, the Treebank grammar (Bies et al. 1995) includes phrase structures that capture passive voice constructions, and Treebank parsers should generate these structures when passives occur. So we wrote a program that navigates a Treebank-style parse tree and labels a verb as an ordinary or reduced passive if one of the corresponding phrase structures is found.[2]

[2] The program recognizes 2 phrase structures that correspond to ordinary passive voice, and 4 that correspond to reduced passive voice. (See http://www.xmission.com/~sgigo/aaai08/rp.html)
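As an illustrative sketch only (the actual six phrase structures are listed at the URL in footnote [2]; this simplified version checks just the two most common patterns, using NLTK's ParentedTree), a tree-navigation routine of this kind might look like:

from nltk.tree import ParentedTree

BE_FORMS = {"be", "am", "is", "are", "was", "were", "been", "being"}

def label_passives(parse_str):
    """Yield (verb, label) pairs for passives in a Treebank-style parse.
    Simplified illustration of two common patterns:
      ordinary passive: a VBN-headed VP whose parent VP contains a "be"
        auxiliary, e.g. (VP (VBD was) (VP (VBN arrested) ...));
      reduced passive: a VBN-headed VP attached directly under an NP,
        i.e. a reduced relative clause."""
    tree = ParentedTree.fromstring(parse_str)
    for vbn in tree.subtrees(lambda t: t.label() == "VBN"):
        vp = vbn.parent()                     # VP containing the participle
        if vp is None or vp.label() != "VP":
            continue
        above = vp.parent()                   # node dominating that VP
        if above is None:
            continue
        if above.label() == "VP" and any(
                isinstance(kid, ParentedTree)
                and kid.label().startswith("VB")
                and kid.leaves()[0].lower() in BE_FORMS
                for kid in above):
            yield vbn.leaves()[0], "ordinary passive"
        elif above.label() == "NP":
            yield vbn.leaves()[0], "reduced passive"

sent = ("(S (NP (NP (DT The) (NN man)) (VP (VBN arrested) (PP (IN in) "
        "(NP (NNP March))))) (VP (VBD had) (NP (DT a) (JJ long) (NN record))))")
print(list(label_passives(sent)))  # [('arrested', 'reduced passive')]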
To measure the effectiveness of our program, we compared its output against the manually identified passive verbs in our gold standard. The program identified the ordinary passives with 98% recall and 98% precision, and the reduced passives with 93% recall and 93% precision. Upon inspection, we attributed the discrepancies to highly unusual phrases, and we were satisfied that the program faithfully reflects the passive voice labels that would be assigned automatically by Treebank-style parsers.

MINIPAR generates dependency relations, one of which is the head-modifier relationship vrel, or "passive verb modifier of nouns", which corresponds to the most common type of reduced passive. We considered all verbs with vrel tags to be reduced passives. MINIPAR does not seem to have a relation that directly corresponds to ordinary passive voice constructions, so we did not evaluate MINIPAR on those.

Table 1 shows the performance of the parsers[3] on ordinary and reduced passives.

                         WSJ              MUC4              PRO
                    R    P    F      R    P    F      R    P    F
  Ordinary Passives
    CH             .95  .96  .95    .91  .96  .93    .93  .96  .94
    CO             .95  .95  .95    .91  .94  .92    .93  .94  .93
  Reduced Passives
    CH             .90  .88  .89    .77  .71  .74    .77  .78  .77
    CO             .85  .89  .87    .66  .67  .66    .64  .78  .70
    MP             .48  .57  .52    .51  .72  .60    .44  .68  .53

Table 1: Performance (R = recall, P = precision, F = F-measure) of full parsers on the WSJ, MUC4, and ProMed (PRO) test sets.

[3] The parsers failed on some sentences in the MUC4 and ProMed corpora, so we wrote scripts to allow them to continue parsing subsequent sentences in a document after a failure.

The Charniak (CH) and Collins (CO) parsers performed well on ordinary passives in all three domains. For reduced passives, these parsers performed well on WSJ texts, but performance dropped considerably on the MUC4 and ProMed corpora. MINIPAR (MP) achieved only moderate recall and precision across the board. Two observations stand out. First, the performance of the Charniak and Collins parsers was considerably lower on reduced passives than on ordinary passives, which supports our hypothesis that reduced passives are more difficult to recognize. Second, their performance on reduced passives was substantially lower on MUC4 and ProMed texts than on WSJ texts. Since there was no comparable drop across domains for ordinary passives, reduced passive recognition may be more susceptible to domain portability problems in statistical parsers.[4]

[4] The WSJ articles in our gold standard were selected randomly, so the Collins and Charniak parsers may in fact have been trained on some of these articles, in which case another explanation is that their WSJ results are artificially high.

Our goal was to develop a reduced passive voice recognizer that can be used with shallow parsers, which are widely used because they are typically much faster and more robust than full parsers. To illustrate the speed differential, we processed 1,718 sentences from ProMed using our shallow parser, Sundance, as well as the Stanford parser (Klein and Manning 2003), which claims to be a fast full parser. The Stanford parser took 48 minutes, while Sundance took 20 seconds.[5] This speed differential is similar to what we have observed with other full parsers. For applications that must process large volumes of text, it can make the use of full parsers impractical, or at least undesirable. Li and Roth (2001) have also demonstrated that shallow parsers are more robust on lower-quality texts.

[5] It is also important to note that Sundance is a research platform and has not been optimized for speed.

Reduced Passive Classification

The goal of our research is to create a classifier that can identify reduced passive VPs in shallow parsing environments. The features rely on information provided by the Sundance shallow parser, which performs part-of-speech tagging and syntactic chunking, as well as syntactic role assignment (subject, direct object, indirect object) and simple clause segmentation. It also assigns general semantic classes (e.g., ANIMAL, DISEASE) to words via dictionary look-up. In this section, we first describe a test that identifies reduced passive candidates. Next, we present the features that we hypothesized could be useful for identifying reduced passives. Finally, we describe the machine learning classifiers.

Reduced Passive Candidates

The first step is to identify verbs that could potentially be reduced passives, ruling out verbs that are obviously in active voice or ordinary passive voice constructions. We consider a verb to be a reduced passive candidate if it satisfies three criteria: (1) it is in the past tense[6]; (2) it is not itself an auxiliary verb ("be", "have", etc.); and (3) the VP containing the verb contains no passive auxiliaries, no perfective auxiliary ("have"), no "do", and no modals. We define the set of passive auxiliaries as {"be", "become", "appear", "feel", "get", "remain", "seem"}. Only verbs that satisfy these criteria are used to create training/test instances. A minimal sketch of this test appears below.

[6] Sundance does not distinguish between past and past participle verb forms, so the classifier can only look for past tense verbs. In principle, however, the passive voice requires a past participle.
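The following sketch assumes a VP chunk is available as a list of (root, POS) pairs; the Penn-style tag names are illustrative stand-ins, not Sundance's actual tagset or API:

PASSIVE_AUX = {"be", "become", "appear", "feel", "get", "remain", "seem"}

def is_reduced_passive_candidate(vp_chunk, verb_index):
    """vp_chunk: list of (root, pos_tag) pairs for one VP chunk;
    verb_index: position of the verb being tested within the chunk."""
    root, pos = vp_chunk[verb_index]
    # (1) The verb must be past tense (the proxy for past participle,
    #     since the parser does not distinguish the two forms).
    if pos != "VBD":
        return False
    # (2) The verb must not itself be an auxiliary verb.
    if root in {"be", "have", "do"}:
        return False
    # (3) The rest of the VP must contain no passive auxiliary,
    #     no perfective "have", no "do", and no modal.
    for i, (r, p) in enumerate(vp_chunk):
        if i == verb_index:
            continue
        if r in PASSIVE_AUX or r in {"have", "do"} or p == "MD":
            return False
    return True

# "The man arrested had a long record":
#   chunk for "arrested": [("arrest", "VBD")]                -> True
#   chunk for "had":      [("have", "VBD")]                  -> False (aux)
# "was arrested":         [("be", "VBD"), ("arrest", "VBD")] -> False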
Features

We created 28 features to identify reduced passives.

Lexical (1): One feature is the root of the verb.

Syntactic (7): 4 binary features represent the presence of 4 syntactic constituents around the verb (subject, direct object, indirect object, and "by" PP); 2 features represent the lexical heads of the subject and the "by" PP; and 1 binary feature indicates whether the subject is a nominative pronoun.

Part of Speech (5): 3 binary features indicate whether the verb is (a) followed by a verb, auxiliary, or modal POS tag, (b) followed by a preposition, or (c) preceded by a number; 2 features represent the POS tags of the preceding word and the following word.

Semantic (4): 4 features represent the semantic class of the head of the verb's subject and the semantic class of the head of a following "by" PP (if one exists). We define 2 features for each case: one with the most general matching semantic class, and one with the most specific matching semantic class, based on Sundance's semantic hierarchy.

Clausal (6): 2 binary features indicate the presence of multiple clauses or multiple verb phrases, and 4 binary features represent the context surrounding the current VP: is it followed by an infinitive, a new clause, or the end of the sentence, and is it the last VP in the sentence?

Transitivity

If a transitive verb occurs in the past tense without a passive auxiliary or a direct object, then it is tempting to conclude that it is a reduced passive. However, transitive verbs often appear without a direct object (e.g., "He ate after the concert").[7] Consequently, simply knowing that a verb is transitive is not sufficient. We hypothesized that it may be useful to empirically determine the transitiveness of a verb, based on the intuition that some transitive verbs almost always occur with an object while others occur in mixed settings. If a verb nearly always occurs with an object, then the absence of one is more significant.

[7] One could argue that there is an implicitly understood direct object, i.e., "He ate [food] after the concert."

We estimate the transitiveness of a verb empirically from an unannotated corpus. If a VP does not contain any passive auxiliaries and has both a subject and a direct object, then we assume it is a transitive active voice construction. If a VP is past tense, has a passive auxiliary, and has a subject but no direct object, then we assume it is an ordinary passive. All other instances of the verb are considered intransitive. These heuristics are obviously flawed (e.g., all reduced passives are counted as intransitive!), but they were designed to give a lower bound on a verb's true transitiveness. We then compute transitiveness as P(transitive | verb) and create the following feature:

Transitivity (1): P(transitive | verb) is mapped to 6 values based on ranges: < .20, .20-.40, .40-.60, .60-.80, > .80, or unknown (the verb was not seen in the training corpus). A sketch of the estimation and binning appears below.
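A rough sketch of the estimation, assuming each shallow-parsed VP arrives as a dict (the field names are assumptions for illustration, not Sundance's actual output format; both transitive-active and ordinary-passive instances are counted as evidence of transitivity, per the heuristics above):

from collections import Counter

def estimate_transitiveness(vps):
    """Estimate P(transitive | verb) from shallow-parsed VPs.  Each vp
    is assumed to have keys: 'root', 'past', 'has_subj', 'has_dobj',
    'has_passive_aux'."""
    transitive, total = Counter(), Counter()
    for vp in vps:
        total[vp["root"]] += 1
        if not vp["has_passive_aux"] and vp["has_subj"] and vp["has_dobj"]:
            transitive[vp["root"]] += 1   # transitive active voice
        elif (vp["past"] and vp["has_passive_aux"]
              and vp["has_subj"] and not vp["has_dobj"]):
            transitive[vp["root"]] += 1   # ordinary passive
        # everything else counts as intransitive (a lower bound)
    return {v: transitive[v] / total[v] for v in total}

def transitivity_bin(p_table, verb):
    """Map P(transitive | verb) to the 6 feature values; behavior at
    the exact cut points is an arbitrary choice here."""
    if verb not in p_table:
        return "unknown"
    p = p_table[verb]
    for cut, label in [(.20, "<.20"), (.40, ".20-.40"),
                       (.60, ".40-.60"), (.80, ".60-.80")]:
        if p < cut:
            return label
    return ">.80"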
Thematic Role Semantics

Conceptually, recognizing the passive voice amounts to recognizing that a verb's syntactic subject is its theme, or that a "by" PP contains its agent. Therefore it should be beneficial to know the semantic classes that each verb expects as its theme and agent. To capture this, we derived empirical estimates of the thematic role semantics for verbs. (a) For each verb, we collected its agents and themes. For transitive active voice constructions, we assumed that the verb's subject is its agent and its direct object is its theme. For ordinary passives, we assumed that the verb's subject is its theme and that (if present) a "by" PP contains its agent.[8] (b) For each verb, we collected thematic role semantics from its agents and themes. We looked up the semantic class of the head nouns in Sundance's dictionary and computed P(semclass | agent, verb) and P(semclass | theme, verb).

[8] With the exception of dates and locations.

Using this data, we defined 4 features (the plausibility tests are sketched in code after this list):

Plausible Theme (1): Given a verb and its subject, we assess whether the subject's head is a semantically plausible theme for the verb. Our goal is to determine which semantic classes virtually never occur in a thematic role (e.g., if a verb's subject has a semantic class that has never been seen as a theme, then the verb is probably not a reduced passive). This feature gets a value of 1 if P(semclass | theme, verb) > .01, otherwise its value is 0 (which means the class was either never seen as a theme, or the few instances were probably noise[9]).

[9] For example: parser error, incorrect semantic label, etc.

Plausible Agent ByPP (1): Given a verb and a following "by" PP, we assess whether the head noun of the PP is a semantically plausible agent for the verb, using the analogous criteria. If the "by" PP is not a plausible agent, then this verb instance is probably not a reduced passive.

Plausible Agent Subj (1): Given a verb and its subject, we assess whether the subject is a semantically plausible agent for the verb, using the same criteria.

Frequency (1): This feature takes 4 possible values based on the frequency of the verb's root. We use 4 bins representing a logarithmic scale: 0, 1-10, 11-100, and 100+.
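A minimal sketch of the plausibility tests, assuming the probability tables from step (b) are nested dicts (an assumed representation) and using the .01 threshold given above:

def plausible(prob_table, verb, head_semclass, threshold=0.01):
    """Return 1 if the semantic class of a role filler's head noun is a
    plausible filler for this verb, i.e. P(semclass | role, verb) > .01.
    prob_table maps verb -> {semclass: probability} and is built
    separately for themes and for agents in step (b)."""
    p = prob_table.get(verb, {}).get(head_semclass, 0.0)
    return 1 if p > threshold else 0

# For a candidate verb with subject head class subj_class and, if a
# "by" PP follows, PP head class by_class:
#   plausible_theme      = plausible(theme_probs, verb, subj_class)
#   plausible_agent_bypp = plausible(agent_probs, verb, by_class)
#   plausible_agent_subj = plausible(agent_probs, verb, subj_class)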
The Classifier

We created classifiers using decision trees and support vector machines. For the Dtree classifier, we used the Weka J48 decision tree learner (Witten and Frank 2005) with the default settings. For the SVM classifiers, we used the SVMlight package (Joachims 1999) with the default settings. We built one SVM classifier with a linear kernel, LSVM, and one with a degree 3 polynomial kernel, PSVM.
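As a rough modern stand-in (a deliberate substitution using scikit-learn, not the Weka/SVMlight setup the experiments actually used), the same three configurations could be approximated as follows, treating each candidate verb as a dict of the features described above:

from sklearn.feature_extraction import DictVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

def make_classifiers():
    """Three configurations analogous to Dtree, LSVM, and PSVM.
    DictVectorizer one-hot encodes categorical features such as the
    verb root, POS tags, and the transitivity bin."""
    return {
        "Dtree": make_pipeline(DictVectorizer(), DecisionTreeClassifier()),
        "LSVM":  make_pipeline(DictVectorizer(), SVC(kernel="linear")),
        "PSVM":  make_pipeline(DictVectorizer(), SVC(kernel="poly", degree=3)),
    }

# Each instance is a feature dict for one candidate verb, e.g.
#   {"root": "arrest", "has_dobj": False, "trans_bin": ">.80", ...},
# labeled 1 for reduced passive and 0 otherwise:
#   clf = make_classifiers()["PSVM"]
#   clf.fit(train_dicts, train_labels)
#   predictions = clf.predict(test_dicts)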
Experimental Results

We conducted two sets of experiments. First, we performed 10-fold cross-validation using 2,069 Wall Street Journal (WSJ) texts from the Penn Treebank. The gold standard answers were the reduced passive voice labels automatically derived from the Penn Treebank parse trees by the program described in the Motivation and Related Work section. Second, we performed experiments on 3 corpora using manually annotated test sets as the gold standards. To create the test sets, we randomly selected 50 WSJ articles from the Penn Treebank, 200 terrorism articles from the MUC4 corpus, and 100 disease outbreak stories from ProMed. The training set for these experiments consisted of the 2,069 Penn Treebank documents used in the cross-validation experiments.[10] Our WSJ test set did not overlap with the documents in this training set. We report recall and precision for the cross-validation experiments on Treebank documents (XVAL) as well as for the separate WSJ, MUC4, and ProMed (PRO) test sets.

[10] See http://www.xmission.com/~sgigo/aaai08/rp.html for the document ids used in all of our experiments.

We devised 4 baselines to determine how well simple rules can identify reduced passives:

Candidacy: Verbs that are past tense, have no modals and no passive, perfective, or "do" auxiliary, and are not themselves auxiliary verbs are labeled as reduced passives. (These are the Reduced Passive Candidate criteria described earlier.)

+MC+NoDO: In addition to the Candidacy rules, the verb must occur in a multi-clause sentence and must not have a direct object.

+ByPP: In addition to the Candidacy rules, the verb must be followed by a PP with the preposition "by".

+All: All of the conditions above must apply.

  Method         XVAL              WSJ               MUC4              PRO
              Rec Prec   F      Rec Prec   F      Rec Prec   F      Rec Prec   F
  Candidacy   .86  .14  .24    .86  .15  .25    .85  .17  .28    .86  .25  .38
  +MC+NoDO    .77  .23  .36    .74  .26  .38    .77  .25  .38    .78  .36  .50
  +ByPP       .13  .68  .21    .13  .69  .22    .09  .75  .16    .10  .90  .17
  +All        .12  .82  .20    .12  .81  .22    .09  .80  .16    .09  .93  .16
  Dtree       .53  .82  .64    .49  .81  .61    .52  .78  .62    .43  .81  .56
  LSVM        .62  .82  .71    .58  .79  .67    .61  .76  .67    .58  .81  .68
  PSVM        .65  .84  .73    .60  .82  .69    .60  .80  .69    .54  .82  .65

Table 2: Recall, precision, and F-measure results for reduced passive voice recognition.

Table 2 shows that the baseline systems can achieve high recall or high precision, but not both. Consequently, these simple heuristics are not a viable solution for shallow parsers. The last 3 rows of Table 2 show the performance of our classifiers. For the XVAL, WSJ, and MUC4 data sets, the best classifier is the PSVM, which achieves 80-84% precision with 60-65% recall. For ProMed, the LSVM performs best, achieving 81% precision with 58% recall. These scores are all substantially above the baseline methods.

We also conducted experiments to determine which features had the most impact. Performance dropped substantially when the lexical, POS, or transitivity features were removed. We also investigated whether any group of features performed well on its own. The POS features performed best, but still well below the full feature set.

Conclusions

Currently, NLP systems that use shallow parsers either consistently mislabel reduced passives or rely on heuristics similar to our baselines. As an alternative, we have presented a learned classifier that can accurately recognize reduced passives in shallow parsing environments. This classifier should be beneficial for NLP systems that use shallow parsing but need to identify thematic roles.

Acknowledgments

This research was supported by Department of Homeland Security Grant N0014-07-1-0152, and by the Institute for Scientific Computing Research and the Center for Applied Scientific Computing within Lawrence Livermore National Laboratory.

References

Bies, A.; Ferguson, M.; Katz, K.; and MacIntyre, R. 1995. Bracketing Guidelines for Treebank II Style Penn Treebank Project. Technical report, University of Pennsylvania.

Carter, R., and McCarthy, M. 1999. The English get-passive in spoken discourse: description and implications for an interpersonal grammar. English Language and Linguistics 3.1:41-58.

Charniak, E. 2000. A Maximum-Entropy-Inspired Parser. In Proceedings of the First Meeting of the North American Chapter of the Association for Computational Linguistics.

Collins, M. 1999. Head-Driven Statistical Models for Natural Language Parsing. Ph.D. Dissertation, University of Pennsylvania.

Joachims, T. 1999. Making Large-Scale Support Vector Machine Learning Practical. In Schölkopf, B.; Burges, C.; and Smola, A., eds., Advances in Kernel Methods: Support Vector Learning. MIT Press.

Klein, D., and Manning, C. 2003. Fast Exact Inference with a Factored Model for Natural Language Parsing. In Advances in Neural Information Processing Systems 15.

Li, X., and Roth, D. 2001. Exploring Evidence for Shallow Parsing. In Proceedings of the Annual Conference on Computational Natural Language Learning (CoNLL).

Lin, D. 1998. Dependency-based Evaluation of MINIPAR. In Workshop on the Evaluation of Parsing Systems, Granada, Spain.

McEnery, A., and Xiao, R. 2005. Passive Constructions in English and Chinese: A Corpus-Based Contrastive Study. The Corpus Linguistics Conference Series 1(1).

Merlo, P., and Stevenson, S. 1998. What Grammars Tell Us About Corpora: The Case of Reduced Relative Clauses. In Proceedings of the Sixth Workshop on Very Large Corpora.

MUC-4 Proceedings. 1992. Proceedings of the Fourth Message Understanding Conference (MUC-4). Morgan Kaufmann.

ProMed-mail. 2006. http://www.promedmail.org/.

Riloff, E., and Phillips, W. 2004. An Introduction to the Sundance and AutoSlog Systems. Technical Report UUCS-04-015, School of Computing, University of Utah.

Witten, I., and Frank, E. 2005. Data Mining: Practical Machine Learning Tools and Techniques, 2nd Edition. Morgan Kaufmann.