Automatic Event Classification Using Surface Text Features

Hilda Hardy¹, Vika Kanchakouskaya¹, and Tomek Strzalkowski¹,²

Abstract

Introduction

Verb classes: Spray/load, Fill, Butter, Remove, Bring and Take (Get, Obtain), Killing (Murder, Poison, Destroy).

height, person, country

General frames: accident, pollution, trade
Typed frames: Attack

Event Acquisition from Unstructured Text

In the Gaza Strip, <agent>Palestinian gunners</agent> <type>fired</type> <instr>eight mortars</instr> against <target>Jewish settlements</target> overnight <time>Wednesday</time>.

In <time>1988</time> <agent>Saddam</agent> also <type>used</type> <instr>mustard and nerve agents</instr> against <target>Iraqi Kurds</target> at <location>Halabja in northern Iraq</location>.

pattern: event = “attack”, part of speech = “verb”, voice = “active”
<NP=Agent> <trigger> + <NP=Instrument> <“against | with | on | at”> + <NP=Target>

Yesterday, 7 December 1941, a date which will live in infamy, the United States of America was suddenly and deliberately attacked by naval and air forces of the Empire of Japan.

Some senior Indian ministers had threatened retaliation against Pakistan for its alleged abetment of terrorism in Jammu and Kashmir.

Event              Example Triggers (type)                                    Key Roles
Agree              treaty, agreement, sign, ratify                            PARTIES, INSTR
Assist             helped, supporting, assisted, aid                          AGENT, TARGET, INSTR
Attack             attacked, invaded, bombed, destroyed                       AGENT, TARGET, INSTR
Develop            construct, develop, manufacture                            AGENT, OBJECT
Financial          funded, financed, laundered money                          SOURCE, TARGET, QUANTITY
Law (Criminal)     arrested, detained, caught, charges                        AGENT, TARGET, CHARGE
Law (Nat’l/Int’l)  inspectors visited, imposed embargo, passed legislation    AGENT, TARGET, WHAT, CHARGE
Political          election, fired, hired, appointed, voted                   AGENT, TARGET, POSITION
Threat             threaten, fear, warned                                     AGENT, TARGET, INSTR
Transfer           acquire, smuggle, obtain, seize, export                    SOURCE, DESTINATION, OBJECT

Features for Automatically Classifying Events

Word Features (Nouns, Verbs, Adjectives, Pronouns and Prepositions)

Software Resources and Machine Learning Algorithms

                         Number of Words
             Algorithm   100     80      60      40      20
Nouns        Logistic    53.70   53.05   50.40   48.27   39.01
             Vote        55.06   53.43   51.15   48.02   40.07
             Bagging     53.88   52.43   50.15   47.70   39.89
Verbs        Logistic    47.67   47.15   44.02   39.44   36.11
             Vote        48.62   47.87   44.27   39.21   36.29
             Bagging     48.05   47.27   43.87   39.19   36.09
Adjectives   Logistic    31.93   31.06   30.93   30.0    30.38
             Vote        33.21   32.66   32.08   30.83   30.46
             Bagging     32.51   31.91   31.88   30.96   30.53

Sentence Length and Named Entity Features

Named entity types: Geo-Political Entity, Location, Person, Time

Algorithm                   Accuracy (%)
Logistic                    28.53
Vote                        29.33
Bagging                     29.08
Baseline (majority class)   21.45

Results for Combined Feature Sets

N = nouns, Vb = verbs, Adj = adjectives, P&P = pronouns and prepositions, SL = sentence length, NE = named entities.

Feature type     Number of features in each combined set
N                50     50     40     30
Vb               40     20     20     20
Adj              40     20     20     10
P&P              15     15     15     15
SL               1      1      1      1
NE               24     18     18     18
Total features   170    124    114    94

             Number of Features (values in %)
Algorithm    170     124     114     94
Logistic     59.13   59.76   59.46   58.61
Vote         58.98   58.81   58.56   56.58
Bagging      52.93   52.93   52.98   52.33
Baseline     21.45   21.45   21.45   21.45

Syntactic Chunk Patterns

Chunk labels: NP, VP, Verb, PP, ADVP, Other
Example patterns: NP-VP, NP-Verb-NP-PP, NP-ADVP-VP, Other-NP-VP
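The chunk patterns listed above (for example NP-VP or NP-Verb-NP-PP) read as hyphen-joined sequences of chunk labels. The following is a minimal sketch of how such a pattern string could be built, assuming the chunk labels for a sentence are already available from some chunker. The function name `chunk_pattern` and the rule that any label outside NP, VP, Verb, PP and ADVP becomes Other are illustrative assumptions, not a description of the authors' procedure.

```python
from typing import Iterable

# Chunk labels named in the text. Mapping every other label to "Other"
# is an assumption made for this sketch.
KNOWN_LABELS = {"NP", "VP", "Verb", "PP", "ADVP"}


def chunk_pattern(chunk_labels: Iterable[str]) -> str:
    """Join a sentence's chunk labels into one hyphen-separated pattern."""
    normalized = [
        label if label in KNOWN_LABELS else "Other" for label in chunk_labels
    ]
    return "-".join(normalized)


# Hypothetical chunker output for a short sentence.
print(chunk_pattern(["NP", "Verb", "NP", "PP"]))
```

A pattern string of this kind can then be treated as a single categorical feature alongside the word, sentence-length and named-entity features.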
Results by class (Logistic)

Class   No.   Precision   Recall   F-Measure
NONE    827   0.632       0.811    0.711
TRN     857   0.607       0.614    0.61
DEV     540   0.614       0.57     0.591
ATT     687   0.613       0.64     0.626
FIN     143   0.51        0.371    0.429
THR     177   0.61        0.469    0.53
AST     210   0.326       0.21     0.255
POL     46    0.216       0.174    0.193
LEI     134   0.262       0.164    0.202
LEC     50    0.222       0.16     0.186
AGR     325   0.712       0.692    0.702

Reducing the Size of the Data Set

Results by class, six classes (Logistic)

Class   No.   Precision   Recall   F-Measure
NONE    827   0.698       0.839    0.762
TRN     857   0.694       0.684    0.689
DEV     540   0.682       0.593    0.634
ATT     687   0.717       0.705    0.711
THR     177   0.643       0.469    0.542
AGR     325   0.755       0.702    0.727

Confusion matrix, six classes (Logistic)

        Classified as:
        NONE   TRN    DEV    ATT    THR    AGR
NONE    694    34     22     53     6      18
TRN     92     586    85     65     8      21
DEV     64     112    320    18     4      22
ATT     99     59     13     484    24     8
THR     18     26     5      40     83     5
AGR     27     27     24     15     4      228

[Figure: curves for Logistic, Bagging and Vote. Horizontal axis: Percentage of data used for training (remainder testing), 0% to 100%. Vertical axis: 0% to 100%.]

Comparative Results, Manual vs. Automatic

[Figure: bar chart comparing Manual and Auto for Transfer, Develop, Attack and Agree. Vertical axis: 0 to 0.8. Bar labels: 0.711, 0.689, 0.7134, 0.727, 0.634, 0.5504, 0.4755, 0.4518. The labels 0.689, 0.634, 0.711 and 0.727 equal the six-class F-Measure values for TRN, DEV, ATT and AGR.]
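As a worked check on the six-class Logistic results reported under Reducing the Size of the Data Set, the per-class precision, recall and F-measure can be recomputed from the confusion matrix. The sketch below assumes rows are the actual class and columns are the "Classified as" class, which is the reading under which each row sums to the No. column. The function name `per_class_metrics` is illustrative.

```python
CLASSES = ["NONE", "TRN", "DEV", "ATT", "THR", "AGR"]

# Rows: actual class. Columns: class assigned by the classifier.
# Both follow the order in CLASSES (orientation is an assumption).
CONFUSION = [
    [694, 34, 22, 53, 6, 18],
    [92, 586, 85, 65, 8, 21],
    [64, 112, 320, 18, 4, 22],
    [99, 59, 13, 484, 24, 8],
    [18, 26, 5, 40, 83, 5],
    [27, 27, 24, 15, 4, 228],
]


def per_class_metrics(matrix, labels):
    """Return {label: (precision, recall, f_measure)} for a square matrix."""
    metrics = {}
    for i, label in enumerate(labels):
        true_pos = matrix[i][i]
        actual = sum(matrix[i])                     # row total
        predicted = sum(row[i] for row in matrix)   # column total
        precision = true_pos / predicted if predicted else 0.0
        recall = true_pos / actual if actual else 0.0
        denom = precision + recall
        f_measure = 2 * precision * recall / denom if denom else 0.0
        metrics[label] = (precision, recall, f_measure)
    return metrics


for label, (p, r, f) in per_class_metrics(CONFUSION, CLASSES).items():
    print(f"{label:5s} P={p:.3f} R={r:.3f} F={f:.3f}")
```

Rounded to three decimals, the values produced this way agree with the Precision, Recall and F-Measure columns of the six-class table.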