Matt Gerber
Michigan State University, Department of Computer Science
The semantic role
Relation between a constituent and a predication
“[Agent John] presented [Theme his findings] [Experiencer to the committee].”
The task
Automatically identify semantic roles occurring in natural language
Problematic: which roles are the “right” ones?
Annotated corpus of semantic roles
“[Arg0 John] presented [Arg1 his findings] [Arg2 to the committee].”
Base corpus: TreeBank 2 (Marcus et al., 1993)
Evaluation
CoNLL Shared Task (Carreras and Marquez, 2005)
Implications
QA: Kaisser and Webber (2007), Shen and Lapata (2007)
Coreference: Ponzetto and Strube (2006)
Information extraction: Surdeanu et al. (2003)
Verbs are not the only lexical category with shallow semantic structure
Verbal
[Arg0 Judge Curry] [Predicate ordered] [Arg1 Edison] [Arg2 to make average refunds of about $45].
Nominal
Judge Curry ordered [Arg0 Edison] to make average [Predicate refunds] [Arg1 of about $45].
A more complete semantic interpretation of natural language
Corpus information
Base corpus: TreeBank 2
Distinct nominalizations: 4704
Total attestations: ~115K
NomLex (Macleod et al., 1998)
Nominalization classes (22)
Nom (deverbals)
Example: Sales departments then urged [Predicate abandonment] [Arg1 of the Pico Project].
Partitive (part-whole)
Example: Hallwood owns about 11 [Predicate %] [Arg1 of Integra].
Investigate the role of NomLex classes in automated NomBank SRL
Hypotheses
(1) Classes may exhibit consistent realizations of their arguments
(2) Modeling each class separately may result in more homogeneous training data and better SRL performance
Nominalization interpretation: related work
NomBank SRL
Class-based NomBank SRL
Preliminary results and analysis
Conclusions and future work
Rule-based methods
Associate syntactic configurations with grammatical functions and semantic properties
Dahl et al. (1987)
Hull and Gomez (1996)
Meyers et al. (1998)
Statistical models: Lapata (2000)
Identify underlying subject/object
[subject satellite] observation
[object satellite] observation
SemEval (Girju et al., 2007)
Semantic relations between nominals
Cause-Effect: laugh wrinkles
Instrument-Agency: laser printer
Product-Producer: honey bee
Origin-Entity: message from outer-space
Theme-Tool: news conference
Part-Whole: the door of the car
Content-Container: the grocery bag
NomBank SRL: Jiang and Ng (2006), Liu and Ng (2007)
Direct application of verbal SRL methods
Standard feature set
Maximum entropy modeling
Best overall f-measure score: 0.7283
NomBank-specific features had little impact
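For reference, the f-measure quoted above is presumably the balanced F1 (the harmonic mean of precision and recall over labeled arguments), as is usual in CoNLL-style SRL evaluation. The sketch below shows the arithmetic only; the function name and the counts are illustrative assumptions, not values from the cited work.

```python
def f_measure(correct: int, predicted: int, gold: int) -> float:
    """Balanced F-measure (F1) from argument counts.

    correct:   predicted arguments that match a gold argument
    predicted: all arguments output by the system
    gold:      all arguments in the gold annotation
    """
    if correct == 0 or predicted == 0 or gold == 0:
        return 0.0
    precision = correct / predicted
    recall = correct / gold
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, chosen only to illustrate the computation.
print(round(f_measure(correct=70, predicted=100, gold=90), 4))
```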
Full syntactic analysis
[Figure: constituency parse tree (node labels S, VP, NP, PP, JJ, NNS) over the sentence below]
Judge Curry ordered Edison to make average [Predicate refunds] of about $45.
Argument identification
Binary classification problem
Argument
Non-argument
[Figure: constituency parse tree (node labels S, VP, NP, PP, JJ, NNS) over the sentence below]
Judge Curry ordered [Edison] to make average [Predicate refunds] [of about $45].
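Argument identification as binary classification can be sketched as follows: every constituent of the parse tree becomes a candidate, labeled argument or non-argument. The tree is a simplified, hand-written approximation of the slide's parse (POS tags omitted), and the names `TREE`, `GOLD_ARGUMENTS`, and `identification_instances` are hypothetical. Matching spans by their words is a shortcut for this one sentence; a real system would use token offsets and extract features for a classifier.

```python
# Toy constituency tree as nested tuples: (label, child, ...); leaves are words.
# Hand-written simplification for illustration, not actual parser output.
TREE = ("S",
        ("NP", "Judge", "Curry"),
        ("VP", "ordered",
         ("S",
          ("NP", "Edison"),
          ("VP", "to",
           ("VP", "make",
            ("NP",
             ("NP", "average", "refunds"),
             ("PP", "of", "about", "$45")))))))

# Gold arguments of the predicate "refunds", identified by their words.
GOLD_ARGUMENTS = {("Edison",), ("of", "about", "$45")}


def leaves(node):
    """Return the words covered by a tree node, left to right."""
    if isinstance(node, str):
        return [node]
    words = []
    for child in node[1:]:
        words.extend(leaves(child))
    return words


def constituents(node):
    """Yield (label, words) for every constituent in the tree."""
    if isinstance(node, str):
        return
    yield node[0], leaves(node)
    for child in node[1:]:
        yield from constituents(child)


def identification_instances(tree, gold):
    """Build one (label, text, is_argument) instance per constituent."""
    return [(label, " ".join(words), tuple(words) in gold)
            for label, words in constituents(tree)]


for label, text, is_argument in identification_instances(TREE, GOLD_ARGUMENTS):
    print("Argument    " if is_argument else "Non-argument", label, "|", text)
```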
Argument classification
22-class problem
Arg0-Arg9
Temporal, location, etc.
[Figure: constituency parse tree (node labels S, VP, NP, PP, JJ, NNS) over the sentence below]
Judge Curry ordered [Arg0 Edison] to make average [Predicate refunds] [Arg1 of about $45].
Simple method
Cluster nominalizations according to NomLex class membership
Train a logistic regression model for each class
Single-stage, 23-class strategy
Baseline feature set
Heuristic post-processing
Backoff
Trained over all classes
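A minimal sketch of how the training data might be partitioned under this method: instances are grouped by the NomLex class of their predicate, and a backoff pool collects every instance regardless of class. The lexicon entries follow the examples in these slides; the function name, the instance format, and the one-class-per-nominalization simplification are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical nominalization -> NomLex class lookup.
# Assumes one class per nominalization, which ignores class ambiguity.
NOMLEX_CLASS = {
    "abandonment": "Nom",
    "%": "Partitive",
    "attorney": "Relational",
}


def partition_by_class(instances, lexicon):
    """Group training instances by class; 'Backoff' receives every instance."""
    pools = defaultdict(list)
    for instance in instances:
        pools["Backoff"].append(instance)
        nomlex_class = lexicon.get(instance["predicate"])
        if nomlex_class is not None:
            pools[nomlex_class].append(instance)
    return dict(pools)


# Hypothetical training instances; a real instance would also carry features.
training = [
    {"predicate": "%", "candidate": "of Integra", "label": "Arg1"},
    {"predicate": "abandonment", "candidate": "of the Pico Project", "label": "Arg1"},
    {"predicate": "attorney", "candidate": "Mr. Hunt's", "label": "Arg2"},
]
for name, pool in partition_by_class(training, NOMLEX_CLASS).items():
    print(name, len(pool))
```

One model would then be trained on each pool, with the backoff model seeing all classes.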
Model application
Hallwood owns about 11 [Predicate %] of Integra.
NomLex abandonment: … abatement: … abduction: … aberration: … ability: … abolition: … abomination: …
Nom Partitive Attribute Relational Backoff
Hallwood owns about 11 [Predicate %] [Arg1 of Integra].
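The application step above can be sketched as a lookup followed by model selection. The slides do not state exactly when the backoff model is used; this sketch assumes it applies whenever the predicate is not in the lexicon or its class has no trained model. All names are hypothetical, and the models are placeholder strings rather than trained classifiers.

```python
def select_model(predicate, lexicon, models):
    """Return (name, model): the class model if available, else the backoff."""
    nomlex_class = lexicon.get(predicate)
    if nomlex_class in models:
        return nomlex_class, models[nomlex_class]
    return "Backoff", models["Backoff"]


# Hypothetical lexicon fragment and placeholder models.
lexicon = {"%": "Partitive", "abandonment": "Nom"}
models = {name: f"<{name} model>"
          for name in ("Nom", "Partitive", "Attribute", "Relational", "Backoff")}

print(select_model("%", lexicon, models))
print(select_model("xyzzy", lexicon, models))
```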
Evaluation configuration
Training instances: WSJ sections 2-21
Testing instances: WSJ section 23
Automatically generated parse trees for training and testing
Key observations
Overall performance
Per-class performance
Class-based gains over baseline
Overall evaluation results
Per-class evaluation results
General observations
Negligible overall gains compared to Liu and Ng (2007), who reported overall f-measure of 0.7283
Some NomLex classes perform very well
Classes introduce gains as well as losses
Hypothesis 1: classes may exhibit consistent realizations of their arguments
Relational class (F1=90.94)
Regularity: argument incorporation
[Arg2 Mr. Hunt’s] [Arg0/Predicate attorney] said his client welcomed the gamble.
100% of Relational nominalizations have an incorporated Arg0
Constitutes 38% of test arguments for the class
Hypothesis 1: classes may exhibit consistent realizations of their arguments
Partitive class (F1=79.85)
Regularity: presence of Arg0
86% of Partitive instances take a single Arg0
Compare: 15% of Nom instances take a single Arg1
Hypothesis 2: modeling each class separately may result in more homogeneous training data and better SRL performance
Improvements
Class         Test instances   Improvement
Nom-like      798              2.06
Environment   108              3.97
Group         40               5.87
Job           30               6.29
Hypothesis 2: modeling each class separately may result in more homogeneous training data and better SRL performance
Losses
Class          Test instances   Loss    Class ambiguity   Training instances
Share          42               20.83   98.53             66 of 5211 total
Nom-adj-like   28               5.93    90.56             400 of 5086 total
NomBank SRL based on classes derived from NomLex
Demonstrates negligible gains over Liu and Ng (2007)
Intra-class regularity leads to modest gains in some classes
NomLex ambiguity causes losses in others
In-depth class modeling
Identification of class-specific regularities not captured by the current feature set
Further partitioning of the Nom class?
NomLex class disambiguation
Carreras, X. & Màrquez, L. (2005), 'Introduction to the CoNLL-2005 Shared Task: Semantic Role Labeling'.
Dahl, D. A.; Palmer, M. S. & Passonneau, R. J. (1987), Nominalizations in PUNDIT, in 'Proceedings of the 25th annual meeting on Association for Computational Linguistics', Association for Computational Linguistics, Morristown, NJ, USA, pp. 131-139.
Girju, R.; Nakov, P.; Nastase, V.; Szpakowicz, S.; Turney, P. & Yuret, D. (2007), SemEval-2007 Task 04: Classification of Semantic Relations between Nominals, in 'Proceedings of the 4th International Workshop on Semantic Evaluations'.
Hull, R. & Gomez, F. (1996), Semantic Interpretation of Nominalizations, in 'Proceedings of AAAI'.
Jiang, Z. & Ng, H. (2006), Semantic Role Labeling of NomBank: A Maximum Entropy Approach, in 'Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing'.
Kaisser, M. & Webber, B. (2007), Question Answering based on Semantic Roles, in 'ACL 2007 Workshop on Deep Linguistic Processing', Association for Computational Linguistics, Prague, Czech Republic, pp. 41-48.
Kingsbury, P. & Palmer, M. (2003), Propbank: the next level of treebank, in 'Proceedings of Treebanks and Lexical Theories'.
Lapata, M. (2000), The Automatic Interpretation of Nominalizations, in 'Proceedings of the Seventeenth National Conference on Artificial Intelligence and Twelfth Conference on Innovative Applications of Artificial Intelligence', AAAI Press / The MIT Press, pp. 716-721.
Liu, C. & Ng, H. (2007), Learning Predictive Structures for Semantic Role Labeling of NomBank, in 'Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics', Association for Computational Linguistics, Prague, Czech Republic, pp. 208-215.
Macleod, C.; Grishman, R.; Meyers, A.; Barrett, L. & Reeves, R. (1998), Nomlex: A lexicon of nominalizations, in 'Proceedings of the Eighth International Congress of the European Association for Lexicography'.
Marcus, M.; Santorini, B. & Marcinkiewicz, M. A. (1993), 'Building a large annotated corpus of English: the Penn TreeBank', Computational Linguistics 19, 313-330.
Meyers, A. (2007), 'Annotation Guidelines for NomBank - Noun Argument Structure for PropBank', Technical report, New York University.
Meyers, A.; Macleod, C.; Yangarber, R.; Grishman, R.; Barrett, L. & Reeves, R. (1998), Using NOMLEX to produce nominalization patterns for information extraction, in 'Proceedings of the COLING-ACL Workshop on the Computational Treatment of Nominals'.
Ponzetto, S. P. & Strube, M. (2006), Exploiting semantic role labeling, WordNet and Wikipedia for coreference resolution, in 'Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics', Association for Computational Linguistics, Morristown, NJ, USA, pp. 192-199.
Shen, D. & Lapata, M. (2007), Using Semantic Roles to Improve Question Answering, in 'Proceedings of the Conference on Empirical Methods in Natural Language Processing and on Computational Natural Language Learning', pp. 12-21.
Surdeanu, M.; Harabagiu, S.; Williams, J. & Aarseth, P. (2003), Using predicate-argument structures for information extraction, in 'Proceedings of the 41st Annual Meeting on Association for Computational Linguistics'.