Automatic Event Classification Using Surface Text Features
Hilda Hardy¹, Vika Kanchakouskaya¹, and Tomek Strzalkowski¹,²
Transfer,
Develop,
Abstract
Attack,
Introduction
Spray/load, Fill, Butter, Remove, Bring
and Take (Get, Obtain),
Killing (Murder, Poison,
Destroy).
height,
person
country
.
General frames
accident, pollution, trade,
Typed frames
Attack
Event Acquisition from Unstructured Text
Transfer
In the Gaza Strip, <agent>Palestinian gunners</agent> <type>fired</type> <instr>eight mortars</instr> against <target>Jewish settlements</target> overnight <time>Wednesday</time>.

In <time>1988</time> <agent>Saddam</agent> also <type>used</type> <instr>mustard and nerve agents</instr> against <target>Iraqi Kurds</target> at <location>Halabja in northern Iraq</location>.
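
The inline role tags in the annotated sentences above can be read mechanically into a frame-like record. The following is a minimal sketch, assuming the tags are simple, non-nested `<role>...</role>` spans as in the two examples; the function name `parse_annotated` and the dictionary representation are illustrative choices, not part of the original system.

```python
import re

# One <role>text</role> span; the closing tag must repeat the opening name.
TAG_RE = re.compile(r"<(\w+)>(.*?)</\1>", re.DOTALL)


def parse_annotated(sentence: str) -> dict:
    """Collect role -> filler text from an inline-tagged sentence.

    Assumes tags are not nested and each role occurs at most once.
    """
    frame = {}
    for role, text in TAG_RE.findall(sentence):
        # Collapse whitespace left over from line wrapping inside a tag.
        frame[role] = " ".join(text.split())
    return frame


example = (
    "In the Gaza Strip, <agent>Palestinian gunners</agent> "
    "<type>fired</type> <instr>eight mortars</instr> against "
    "<target>Jewish settlements</target> overnight <time>\nWednesday</time>."
)
frame = parse_annotated(example)
```

Here `frame` maps `agent`, `type`, `instr`, `target` and `time` to their fillers; untagged text such as "In the Gaza Strip" is simply ignored.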
pattern: event = “attack”, part of speech = “verb”, voice = “active”
<NP=Agent> <trigger> + <NP=Instrument> <“against | with | on | at”> + <NP=Target>
Develop
Yesterday, 7 December 1941—a date which will live in infamy—the United States of America was suddenly and deliberately attacked by naval and air forces of the Empire of Japan.

Some senior Indian ministers had threatened retaliation against Pakistan for its alleged abetment of terrorism in Jammu and Kashmir.

Event             Example Triggers (type)                                   Key Roles
Agree             treaty, agreement, sign, ratify                           PARTIES, INSTR
Assist            helped, supporting, assisted, aid                         AGENT, TARGET, INSTR
Attack            attacked, invaded, bombed, destroyed                      AGENT, TARGET, INSTR
Develop           construct, develop, manufacture                           AGENT, OBJECT
Financial         funded, financed, laundered money                         SOURCE, TARGET, QUANTITY
LawCriminal       arrested, detained, caught, charges                       AGENT, TARGET, CHARGE
LawNat’l/Int’l    inspectors visited, imposed embargo, passed legislation   AGENT, TARGET, WHAT, CHARGE
Political         election, fired, hired, appointed, voted                  AGENT, TARGET, POSITION
Threat            threaten, fear, warned                                    AGENT, TARGET, INSTR
Transfer          acquire, smuggle, obtain, seize, export                   SOURCE, DESTINATION, OBJECT

None
None
Transfer, Develop, Attack
Agree,
Features for Automatically Classifying Events
Word Features (Nouns, Verbs, Adjectives,
Pronouns and Prepositions)
None
k
k
Software Resources and Machine Learning
Algorithms
Algorithm      Number of Words
               100      80       60       40       20
Nouns
  Logistic     53.70    53.05    50.40    48.27    39.01
  Vote         55.06    53.43    51.15    48.02    40.07
  Bagging      53.88    52.43    50.15    47.70    39.89
Verbs
  Logistic     47.67    47.15    44.02    39.44    36.11
  Vote         48.62    47.87    44.27    39.21    36.29
  Bagging      48.05    47.27    43.87    39.19    36.09
Adjectives
  Logistic     31.93    31.06    30.93    30.0     30.38
  Vote         33.21    32.66    32.08    30.83    30.46
  Bagging      32.51    31.91    31.88    30.96    30.53
k
Vote
Sentence Length and Named Entity Features
Bagging
Results for Combined Feature Sets
Geo-Political Entity, Location, Person, Time
Algorithm                    Accuracy (%)
Logistic                     28.53
Vote                         29.33
Bagging                      29.08
Baseline (majority class)    21.45

N      Vb     Adj    P&P    SL     NE     Total features
50     40     40     15     1      24     170
50     20     20     15     1      18     124
40     20     20     15     1      18     114
30     20     10     15     1      18     94

(N = nouns, Vb = verbs, Adj = adjectives, P&P = pronouns and prepositions, SL = sentence length, NE = named entities)
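
As a quick consistency check on the feature-set sizes listed above, the per-group counts can be summed and compared with the reported totals. This is only an arithmetic sketch; the dictionary layout is an illustrative choice, and the grouping of the counts into four feature sets follows the column order in which they are listed.

```python
# Per-group feature counts for the four combined feature sets.
feature_sets = [
    {"N": 50, "Vb": 40, "Adj": 40, "P&P": 15, "SL": 1, "NE": 24},
    {"N": 50, "Vb": 20, "Adj": 20, "P&P": 15, "SL": 1, "NE": 18},
    {"N": 40, "Vb": 20, "Adj": 20, "P&P": 15, "SL": 1, "NE": 18},
    {"N": 30, "Vb": 20, "Adj": 10, "P&P": 15, "SL": 1, "NE": 18},
]
reported_totals = [170, 124, 114, 94]

# Sum each feature set and compare with the total given in the table.
computed_totals = [sum(fs.values()) for fs in feature_sets]
all_match = computed_totals == reported_totals
```

Under this reading each computed sum agrees with the corresponding reported total.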
NE
Logistic
Regression
Vote
Syntactic Chunk Patterns
NP, VP, Verb, PP, ADVP
Other Verb
Other
k

Algorithm    Number of Features
             170      124      114      94
Logistic     59.13    59.76    59.46    58.61
Vote         58.98    58.81    58.56    56.58
Bagging      52.93    52.93    52.98    52.33
Baseline     21.45    21.45    21.45    21.45

(All values are percentages.)
k
NP-VP, NP-Verb-NP-PP,
NP-ADVP-VP, Other-NP-VP,
Agree,
Transfer, Attack
None
Transfer
Develop
Attacks
Threat
None
Class    No.    Precision    Recall    F-Measure
NONE     827    0.632        0.811     0.711
TRN      857    0.607        0.614     0.61
DEV      540    0.614        0.57      0.591
ATT      687    0.613        0.64      0.626
FIN      143    0.51         0.371     0.429
THR      177    0.61         0.469     0.53
AST      210    0.326        0.21      0.255
POL      46     0.216        0.174     0.193
LEI      134    0.262        0.164     0.202
LEC      50     0.222        0.16      0.186
AGR      325    0.712        0.692     0.702

Attack
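
The per-class figures above can be cross-checked with two small calculations: the F-measure as the harmonic mean of precision and recall, and the majority-class baseline as the share of the largest class. The sketch below assumes the balanced F-measure, F = 2PR / (P + R), which the reported values are consistent with; because the inputs are already rounded to three decimals, recomputed values can differ from the reported ones in the last digit.

```python
# (class size, precision, recall) for each of the eleven classes.
rows = {
    "NONE": (827, 0.632, 0.811), "TRN": (857, 0.607, 0.614),
    "DEV": (540, 0.614, 0.57),   "ATT": (687, 0.613, 0.64),
    "FIN": (143, 0.51, 0.371),   "THR": (177, 0.61, 0.469),
    "AST": (210, 0.326, 0.21),   "POL": (46, 0.216, 0.174),
    "LEI": (134, 0.262, 0.164),  "LEC": (50, 0.222, 0.16),
    "AGR": (325, 0.712, 0.692),
}


def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (harmonic mean of precision and recall)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


f_scores = {cls: f_measure(p, r) for cls, (_, p, r) in rows.items()}

# Majority-class baseline: always predict the most frequent class.
sizes = [n for n, _, _ in rows.values()]
total = sum(sizes)
baseline = max(sizes) / total
```

With these counts the largest class is TRN (857 of 3,996 instances), which gives a baseline of about 21.45%.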
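[Figure: bar chart comparing Manual and Auto results for the Transfer, Develop, Attack, and Agree classes; vertical axis from 0 to 0.8. Bar labels recovered from the chart: 0.711, 0.689, 0.7134, 0.727, 0.634, 0.5504, 0.4755, 0.4518.]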
Logistic
No.
Reducing the Size of the Data Set
Class    No.    Precision    Recall    F-Measure
NONE     827    0.698        0.839     0.762
TRN      857    0.694        0.684     0.689
DEV      540    0.682        0.593     0.634
ATT      687    0.717        0.705     0.711
THR      177    0.643        0.469     0.542
AGR      325    0.755        0.702     0.727
Logistic
No.
         Classified as:
         NONE    TRN    DEV    ATT    THR    AGR
NONE     694     34     22     53     6      18
TRN      92      586    85     65     8      21
DEV      64      112    320    18     4      22
ATT      99      59     13     484    24     8
THR      18      26     5      40     83     5
AGR      27      27     24     15     4      228
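
Per-class precision and recall follow directly from the confusion matrix counts above. The sketch below assumes rows are the actual classes and columns the "Classified as" labels; that reading is supported by the row sums, which equal the class sizes 827, 857, 540, 687, 177 and 325. Variable names are illustrative.

```python
labels = ["NONE", "TRN", "DEV", "ATT", "THR", "AGR"]

# cm[i][j] = number of instances of actual class i classified as class j.
cm = [
    [694, 34, 22, 53, 6, 18],
    [92, 586, 85, 65, 8, 21],
    [64, 112, 320, 18, 4, 22],
    [99, 59, 13, 484, 24, 8],
    [18, 26, 5, 40, 83, 5],
    [27, 27, 24, 15, 4, 228],
]

n = len(labels)
row_sums = [sum(cm[i]) for i in range(n)]                       # actual counts
col_sums = [sum(cm[i][j] for i in range(n)) for j in range(n)]  # predicted counts

# Recall: correct / actual; precision: correct / predicted.
recall = {labels[i]: cm[i][i] / row_sums[i] for i in range(n)}
precision = {labels[j]: cm[j][j] / col_sums[j] for j in range(n)}

# Overall accuracy: diagonal total over all instances.
correct = sum(cm[i][i] for i in range(n))
accuracy = correct / sum(row_sums)
```

For example, NONE has 694 correct out of 827 actual instances (recall about 0.839) and out of 994 instances classified as NONE (precision about 0.698).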
Logistic
Comparative Results, Manual vs. Automatic
Logistic Regression, Bagging
Vote
[Figure: curves for Logistic, Bagging, and Vote; both axes run from 0% to 100%. Horizontal axis: Percentage of data used for training (remainder testing).]

WordNet Workshop at NAACL.
Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics (ACL-2000),
Proceedings of the Human Language Technology Conference (HLT ’02).
Applied Statistics
English Verb Classes and Alternations: A
Preliminary Investigation
Future Work
Proceedings of DARPA Broadcast News Workshop,
Proceedings of the 20th International Conference on
Computational Linguistics, Coling 2004,
Proceedings of the International Conference on Intelligence Analysis,
Proceedings of Coling 1996
Acknowledgments
.
Proceedings of the 9th International Workshop
on Parsing Technologies (IWPT 2005),
Data Mining: Practical
Machine Learning Tools and Techniques,
Proceedings of ACL-2003.
References
. Machine Learning
Machine Learning