Logical Formalizations of Commonsense Reasoning: Papers from the 2015 AAAI Spring Symposium

Which States Can Be Changed by Which Events?

Niloofar Montazeri (niloofar@isi.edu) and Jerry R. Hobbs (Hobbs@isi.edu)
Information Sciences Institute, University of Southern California, Marina del Rey, California

Abstract

We present a method for finding (STATE, EVENT) pairs where EVENT can change STATE. For example, the event "realize" can put an end to the states "be unaware", "be confused", and "be happy", while it can rarely affect "being hungry". We extract these pairs from a large corpus using a fixed set of syntactic dependency patterns. We then apply a supervised machine learning algorithm to clean the results using syntactic and collocational features, achieving a precision of 78% and a recall of 90%. We observe three different relations between states and events that change them and present a method for using Mechanical Turk to differentiate between these relations.

This research was funded by the Office of Naval Research under Contract No. N00014-09-1-1029.

Introduction

Knowledge about event semantics plays an important role both in understanding natural language and in reasoning about the world. In a series of papers (Montazeri and Hobbs, 2011, 2012; Hobbs and Montazeri, 2014) we have described our effort in manually axiomatizing change-of-state verbs in terms of predicates in core theories of commonsense knowledge. This paper extends our previous work (Montazeri et al., 2013), in which we investigated the possibility of automatically extracting axioms for change-of-state verbs from text by detecting the states that an event can change. As before, we use hand-crafted syntactic dependency patterns to extract candidate (STATE, EVENT) pairs; but unlike our previous filtering method, in which we ranked the pairs and applied a threshold, we use a supervised machine learning algorithm to filter the candidates. We observe three different relations between states and events that change them (which result in three different types of axioms) and present a method for using Mechanical Turk to categorize pairs based on these relations.

Related Work

Related work in the area of event semantics includes detecting happens-before relations (Chklovski and Pantel, 2004), causal relations (Girju, 2003), and entailment relations (Pekar, 2006) between events. The work of Sil and Yates (2011) is the closest to ours. They identify STRIPS representations of events, which include such information as preconditions, post-conditions, and delete-effects of events. The latter are defined as "conditions that held before occurrence of the event but no longer hold afterwards" and are precisely what we are looking for. Sil and Yates (2011) limit delete-effects to those that are negations of preconditions (e.g., "unhurt" is a precondition of "maim" and its negation, "hurt", is a delete-effect), but we also extract conditional delete-effects such as "If x teaches, then x's retirement will put an end to it". In addition, we find possible delete-effects such as "If x is happy, realizing y may put an end to it".
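To make the contrast with STRIPS-style delete-effects concrete, here is a minimal sketch in Python (our own illustration, not Sil and Yates's representation or code) of the "maim" example above, together with the conditional and possible delete-effects we additionally target; the Action structure and predicate strings are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Action:
    """A STRIPS-style event representation (illustrative only)."""
    name: str
    preconditions: set = field(default_factory=set)
    add_effects: set = field(default_factory=set)
    delete_effects: set = field(default_factory=set)        # negations of preconditions
    conditional_deletes: set = field(default_factory=set)   # hold before -> surely ended
    possible_deletes: set = field(default_factory=set)      # hold before -> sometimes ended

# "unhurt" is a precondition of "maim"; its negation "hurt" is added,
# so "unhurt(x)" is a delete-effect in the sense of Sil and Yates (2011).
maim = Action(
    name="maim(x)",
    preconditions={"unhurt(x)"},
    add_effects={"hurt(x)"},
    delete_effects={"unhurt(x)"},
)

# Our extraction also covers states that are not preconditions:
# retiring surely ends teaching; realizing only sometimes ends happiness.
retire = Action(name="retire(x)", conditional_deletes={"teach(x)"})
realize = Action(name="realize(x, y)", possible_deletes={"happy(x)"})
```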
Methodology

Data Set: We harvested information from the ClueWeb09 dataset (http://lemurproject.org/clueweb09.php/), whose English portion contains just over 500 million web pages.

Patterns: We use lexico-syntactic patterns for extracting candidate (STATE, EVENT) pairs. In our patterns, STATE and EVENT are two phrases that are in an adverbial-complement relation and have a common argument. Here are simple verbal representations of our syntactic patterns:

"used to STATE, before EVENT", "STATE until t when EVENT", "STATE until EVENT", "if EVENT, no longer STATE", "no longer STATE because EVENT", "became/got/came STATE after EVENT", "became/got/came STATE when EVENT", "although EVENT, still/continued STATE", "stopped STATE, because EVENT", "how can STATE if EVENT", "no longer STATE if EVENT".

In each pattern, EVENT is a verb phrase with verb VE, and STATE is a verb phrase with either (1) a verb VS in passive form (e.g., "was detained"); in the case of state verbs we could also consider active forms, but in this work we only consider passive forms; or (2) a "being" verb VBS with a noun, adjective, or prepositional phrase (e.g., "remained successful", "was a hero", "was in the team"). A "being" verb is a verb in the set {"be", "remain", "become", "get", "stay", "keep"}. To enforce the constraint that STATE and EVENT have a common argument, the subject or the object of the second phrase (according to their order in the sentence) must be a pronoun that refers to the subject or object of the first phrase. Some examples are: "(John was detained) until March when (the authorities released him)", "(John was happy) until (he heard the news)", and "if (John hears the news), (he will get upset)".

Parsing the Corpus and Applying Patterns: Since our patterns require syntactic information, we parsed the sentences using the fast dependency parser described in Tratz and Hovy (2011) (http://www.isi.edu/publications/licensed-sw/fanseparser/). Before parsing the corpus, we first filter out sentences that cannot match any of our patterns, using a set of regular expressions derived from the patterns. Next, we parse the remaining sentences and apply our syntactic dependency patterns to extract STATE and EVENT from each sentence, along with several features such as pronoun resolution (whether the common theme is the object or subject of the event), the pattern that was matched against the sentence, the state and event verbs' voice (passive/active), and whether the state is represented by an adjective or a noun. In the next step, we aggregate the features extracted for the same (STATE, EVENT) pair into (STATE, EVENT, f, P, F) tuples, where f is the frequency of the pair (how many times it was extracted by any change-of-state pattern), P is a dictionary structure that records, for each pattern, how many times it matched the (STATE, EVENT) pair, and F is a dictionary structure that keeps the most frequent value of each feature. We then drop the "being" verbs for nouns, adjectives, and prepositional phrases, and construct the argument structures for states and events based on the most frequent pronoun-resolution case. After removing tuples with empty events like "be", "have", "get", "do", etc., we get about 68,000 instances, examples of which are (lost(y), find(x, y)) and (teacher(x), retire(x)).
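As an illustration of the aggregation step just described, here is a minimal sketch; the flat match records, field names, and function name are hypothetical stand-ins, since the real pipeline operates over dependency parses.

```python
from collections import Counter, defaultdict

def aggregate(matches):
    """Aggregate raw pattern matches into (STATE, EVENT, f, P, F) tuples.

    Each match is assumed to be a dict like:
      {"state": "happy(x)", "event": "realize(x, y)",
       "pattern": "STATE until EVENT",
       "features": {"voice": "active", "state_pos": "adjective",
                    "pronoun_res": "subject"}}
    """
    freq = Counter()                       # f: how often the pair was extracted
    per_pattern = defaultdict(Counter)     # P: per-pattern match counts
    feature_votes = defaultdict(lambda: defaultdict(Counter))  # raw feature counts

    for m in matches:
        pair = (m["state"], m["event"])
        freq[pair] += 1
        per_pattern[pair][m["pattern"]] += 1
        for name, value in m["features"].items():
            feature_votes[pair][name][value] += 1

    tuples = []
    for pair, f in freq.items():
        # F keeps the most frequent value of each feature for this pair
        F = {name: votes.most_common(1)[0][0]
             for name, votes in feature_votes[pair].items()}
        tuples.append((pair[0], pair[1], f, dict(per_pattern[pair]), F))
    return tuples
```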
Assessing the Quality of the Results

Preparing the Evaluation Data Set: One of the authors annotated 870 (STATE, EVENT) pairs, a combination of random and high-frequency pairs. While annotating these pairs, we found three types of change-of-state relationships between STATE and EVENT, and hence we used the following fine-grained tags for annotating the results:

• Category-1: STATE is a precondition of EVENT and will no longer hold after EVENT. Examples are (lost(y), find(x, y)) and (alive(x), die(x)). Pairs in this category result in axioms with the format
  EVENT'(e, x, y) → changeFrom'(e, e0) & STATE'(e0, x/y)
  where, for space reasons, we have unified two alternative axioms into one: STATE'(e0, x/y) means STATE'(e0, x) or STATE'(e0, y).

• Category-2: STATE is not a precondition of EVENT, but if STATE holds before EVENT, the occurrence of EVENT will surely put an end to it. Examples are (teacher(x), retire(x)) and (married(x), die(x)). Pairs in this category result in axioms with the format
  EVENT'(e, x, y) & STATE'(e0, x/y) → changeFrom'(e, e0)

• Category-3: Similar to Category-2, but EVENT only sometimes puts an end to STATE. Examples are (happy(x), realize(x, y)) and (confused(x), read(x, y)). Such pairs result in defeasible axioms with the format
  EVENT'(e, x, y) & STATE'(e0, x/y) & etc → changeFrom'(e, e0)
  where the predicate etc means "some other conditions hold".

We refer to pairs that belong to any of the above categories as "change-of-state" pairs and to the rest as "non-change-of-state" pairs. Table 1 shows the distribution of change-of-state pairs and the finer-grained categories for all annotated pairs. In total, 69% of the annotated pairs are change-of-state pairs. We consider this the baseline precision for our method.

Table 1: Distribution of Pair Categories

  Category     Cat1   Cat2   Cat3   All change-of-state   Non-change-of-state
  % of pairs   12%    16%    41%    69%                   31%

Filtering the Results

In order to increase the 69% baseline precision of the simple pattern-matching method, we used the C4.5 decision tree learning algorithm (Quinlan, 1986) to classify the extracted pairs into change-of-state and non-change-of-state categories. The features we used are pronoun resolution, the matched patterns, the state and event verbs' voice (passive/active), and whether the state is represented by an adjective or a noun, plus the following statistical information: 1) the number of distinct patterns that extracted the pair, and 2) the pointwise mutual information (PMI) between (STATE, EVENT) and the set of change-of-state patterns. We compute this mutual information using the following formula:

  pmi(state, event) = log [ Σi P(state, event, pti) / ( P(state, event, *) × Σi P(*, *, pti) ) ]

where pti represents pattern i, P(state, event, pti) represents the probability that pattern i extracts (state, event), and * is a wildcard. We normalized the PMI values using the discounting factor presented in Pantel and Ravichandran (2004) to moderate the bias towards rare cases. We achieved a precision of 78% and a recall of 90% in a 10-fold cross-validation test on our 870 annotated pairs, about a 10% improvement over the 69% random-selection baseline.
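The sketch below illustrates these two statistical features under assumed count dictionaries and trains an off-the-shelf decision tree as a stand-in for C4.5 (which has no standard Python implementation); all names and numbers are hypothetical, and the discount is a simplified pair-frequency version of the Pantel and Ravichandran (2004) factor.

```python
import math
from sklearn.tree import DecisionTreeClassifier  # stand-in for C4.5

def pmi_feature(pair_cos_count, pair_any_count, cos_total, grand_total):
    """Normalized PMI between a (STATE, EVENT) pair and the change-of-state patterns.

    pair_cos_count: times the pair was extracted by a change-of-state pattern
    pair_any_count: times the pair was extracted by any pattern
    cos_total:      total extractions by change-of-state patterns
    grand_total:    total extractions overall
    """
    p_joint = pair_cos_count / grand_total
    p_pair = pair_any_count / grand_total
    p_patterns = cos_total / grand_total
    pmi = math.log(p_joint / (p_pair * p_patterns))
    # Simplified discount (after Pantel and Ravichandran, 2004) to damp rare pairs
    discount = pair_cos_count / (pair_cos_count + 1.0)
    return pmi * discount

print(round(pmi_feature(12, 15, 40000, 500000), 2))  # e.g. 2.13

# Hypothetical feature rows: [n_distinct_patterns, pmi, voice_is_passive, state_is_adj]
X = [[3, 1.9, 1, 1], [1, -0.2, 0, 0], [2, 0.8, 1, 0]]
y = [1, 0, 1]  # 1 = change-of-state, 0 = non-change-of-state
clf = DecisionTreeClassifier().fit(X, y)
print(clf.predict([[2, 1.1, 1, 1]]))
```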
Categorizing Pairs With Mechanical Turk

We performed an experiment with Mechanical Turk to investigate whether we can use crowdsourcing to categorize the change-of-state pairs into the three finer-grained categories introduced earlier. For each (STATE, EVENT) pair, we asked the annotators two questions. Here is an instantiated version of the two questions for the pair (lost(y), find(x, y)):

1. If I hear "something/someone is found":
   a. I can tell that it/she was lost before being found.
   b. I cannot tell whether it/she was lost before being found.
2. If something/someone is lost:
   a. finding will surely put an end to it.
   b. finding will sometimes put an end to it.
   c. finding will rarely put an end to it.

We can aggregate the answers to these two questions according to Table 2 to obtain the right category.

Table 2: Final Categories Based on Answers to Questions

  Q1 \ Q2   Surely   Sometimes   Rarely
  Yes       Cat1     Cat3        None
  No        Cat2     Cat3        None

From the 870 pairs that we had annotated ourselves, we randomly selected 20 pairs per change-of-state category, plus 20 non-change-of-state pairs, for a total of 80 pairs. We divided them into 8 assignments, each containing 10 pairs (for each pair, 2 questions, and hence 20 questions per assignment). We required that each assignment be answered by 4 subjects. After collecting the results, we used MACE (Hovy et al., 2013; downloadable from http://www.isi.edu/publications/licensed-sw/mace/) to aggregate the answers provided by all 4 annotators and get the final answer for each question. MACE (Multi-Annotator Competence Estimation) is an implementation of an item-response model that learns in an unsupervised fashion to a) identify which annotators are trustworthy and b) predict the correct underlying labels. It is possible to have MACE produce answers only where it has a confidence above 90%, and we used this feature in our experiment. Since we had two types of questions, we ran MACE on each set of answers separately. As a result, we got two answers for each pair: a yes/no answer for question 1 and a 3-choice answer (surely/sometimes/rarely) for question 2. We then aggregated the answers to obtain the final category of the pair according to Table 2.

In the following, we refer to MACE as M and to the author who annotated the pairs as A. We measure the agreement only on those cases where MACE was sure about its answer and hence produced one. In evaluating the performance of Mechanical Turk, we are particularly sensitive to false positives, as they reduce precision, while false negatives only reduce recall. We consider the following cases as false positives. For the binary yes/no question: M said "yes" but A said "no". For the 3-choice question: 1) M said "surely" but A said "sometimes" or "rarely", or 2) M said "sometimes" but A said "rarely". Table 3 shows the agreement between M and A (which is above 80%) and the percentage of false positives (which is less than 8%) for the different types of questions.

Table 3: Agreement and False Positives for Different Types of Questions

  Question type   Agreement   False Positives
  Yes/No          85%         7.5%
  3-Choice        86%         6%
  Final Answer    82%         --
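A minimal sketch of the Table 2 aggregation applied to MACE's two per-pair outputs follows; the input format, variable names, and function name are hypothetical, and pairs for which MACE withheld a low-confidence answer are simply skipped.

```python
def final_category(q1_yes_no, q2_choice):
    """Map the two answers for a pair to a final category, per Table 2.

    q1_yes_no: "yes" or "no"  (was STATE necessarily true before EVENT?)
    q2_choice: "surely", "sometimes", or "rarely"
    Returns "Cat1", "Cat2", "Cat3", or None (non-change-of-state).
    """
    if q2_choice == "rarely":
        return None
    if q2_choice == "sometimes":
        return "Cat3"
    # q2_choice == "surely"
    return "Cat1" if q1_yes_no == "yes" else "Cat2"

# Hypothetical MACE outputs keyed by (STATE, EVENT); None means MACE withheld an answer
mace_q1 = {("lost(y)", "find(x, y)"): "yes", ("teacher(x)", "retire(x)"): "no"}
mace_q2 = {("lost(y)", "find(x, y)"): "surely", ("teacher(x)", "retire(x)"): "surely"}

categories = {pair: final_category(mace_q1[pair], mace_q2[pair])
              for pair in mace_q1
              if mace_q1[pair] is not None and mace_q2.get(pair) is not None}
print(categories)  # {('lost(y)', 'find(x, y)'): 'Cat1', ('teacher(x)', 'retire(x)'): 'Cat2'}
```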
Conclusions and Future Work

We have presented a method for extracting (STATE, EVENT) pairs in which EVENT can put an end to STATE, with a precision of 78%. We observed three different relations between states and events that change them (which result in three different types of axioms) and presented a method for using Mechanical Turk to differentiate between these relations. In the future, we would like to consider synonymy of events and consolidate the data extracted for synonymous events, which will hopefully boost the quality of the extractions. For categorizing the pairs, we can adopt the method used by Sil and Yates (2011) for identifying preconditions of events (which is important for identifying Category-1 pairs). Finally, we can try using machine learning and statistical analysis to categorize the pairs automatically.

References

Chklovski, Timothy, and Patrick Pantel. 2004. VerbOcean: Mining the Web for Fine-Grained Semantic Verb Relations. In Proceedings of EMNLP 2004.

Girju, Roxana. 2003. Automatic Detection of Causal Relations for Question Answering. In Proceedings of the ACL 2003 Workshop on Multilingual Summarization and Question Answering.

Hobbs, Jerry R., and Niloofar Montazeri. 2014. The Deep Lexical Semantics of Event Words. In Frames and Concept Types, 157-176. Springer International Publishing.

Hovy, Dirk, Taylor Berg-Kirkpatrick, Ashish Vaswani, and Eduard Hovy. 2013. Learning Whom to Trust with MACE. In Proceedings of NAACL-HLT 2013, 1120-1130.

Montazeri, Niloofar, and Jerry R. Hobbs. 2011. Elaborating a Knowledge Base for Deep Lexical Semantics. In Proceedings of the Ninth International Conference on Computational Semantics. Association for Computational Linguistics.

Montazeri, Niloofar, and Jerry R. Hobbs. 2012. Axiomatizing Change-of-State Words. In M. Donnelly and G. Guizzardi (eds.), Formal Ontology in Information Systems: Proceedings of the Seventh International Conference (FOIS 2012), 221-234. IOS Press, Amsterdam.

Montazeri, Niloofar, Jerry R. Hobbs, and Eduard H. Hovy. 2013. How Text Mining Can Help Lexical and Commonsense Knowledgebase Construction. In Proceedings of the 11th International Symposium on Logical Formalizations of Commonsense Reasoning (Commonsense 2013).

Pantel, Patrick, and Deepak Ravichandran. 2004. Automatically Labeling Semantic Classes. In Proceedings of the 2004 Human Language Technology Conference (HLT-NAACL 2004), Boston, MA, 321-328.

Pekar, Viktor. 2006. Acquisition of Verb Entailment from Text. In Proceedings of the Human Language Technology Conference of the NAACL, Main Conference.

Quinlan, J. Ross. 1986. Induction of Decision Trees. Machine Learning 1(1): 81-106.

Sil, Avirup, Fei Huang, and Alexander Yates. 2010. Extracting Action and Event Semantics from Web Text. In AAAI Fall Symposium on Commonsense Knowledge.

Sil, Avirup, and Alexander Yates. 2011. Extracting STRIPS Representations of Actions and Events. In Proceedings of RANLP 2011.

Tratz, Stephen, and Eduard Hovy. 2011. A Fast, Accurate, Non-Projective, Semantically-Enriched Parser. In Proceedings of EMNLP 2011.