Class-based nominal semantic role labeling: a preliminary investigation

Matt Gerber

Michigan State University, Department of Computer Science

Introduction: semantic role labeling



The semantic role



Relation between a constituent and a predication

“[Agent John] presented [Theme his findings] to [Experiencer the committee].”



The task





Automatically identify semantic roles occurring in natural language

Problematic: which roles are the “right” ones?

Introduction: PropBank (Kingsbury and Palmer 2003)



Annotated corpus of semantic roles

“[Arg0 John] presented [Arg1 his findings] to [Arg2 the committee].”







Base corpus: TreeBank 2 (Marcus et al., 1993)

Evaluation



CoNLL Shared Task (Carreras and Marquez, 2005)

Implications







QA: Kaisser and Webber (2007), Shen and Lapata (2007)

Coreference: Ponzetto and Strube (2006)

Information extraction: Surdeanu et al. (2003)

Introduction: NomBank (Meyers, 2007)

Verbs are not the only lexical category with shallow semantic structure

Verbal: [Arg0 Judge Curry] [Predicate ordered] [Arg1 Edison] [Arg2 to make average refunds of about $45].

Nominal: Judge Curry ordered [Arg0 Edison] to make average [Predicate refunds] [Arg1 of about $45].

A more complete semantic interpretation of natural language
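The verbal and nominal analyses of the Judge Curry sentence can be held in one small data structure. The sketch below is illustrative only: `Span`, `Predication`, and `render` are hypothetical names, token offsets are an assumed representation, and argument spans are assumed not to overlap.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Span:
    start: int  # index of the first token (inclusive)
    end: int    # index one past the last token (exclusive)


@dataclass
class Predication:
    predicate: Span
    arguments: Dict[str, Span] = field(default_factory=dict)  # label -> span


def render(tokens: List[str], pred: Predication) -> str:
    """Return the sentence with [Label ...] brackets around each span."""
    opens = {pred.predicate.start: "Predicate"}
    closes = {pred.predicate.end - 1}
    for label, span in pred.arguments.items():
        opens[span.start] = label
        closes.add(span.end - 1)
    pieces = []
    for i, tok in enumerate(tokens):
        piece = tok
        if i in opens:
            piece = "[" + opens[i] + " " + piece
        if i in closes:
            piece = piece + "]"
        pieces.append(piece)
    return " ".join(pieces)


TOKENS = "Judge Curry ordered Edison to make average refunds of about $45".split()

# PropBank-style analysis: the verb "ordered" is the predicate.
VERBAL = Predication(
    Span(2, 3),
    {"Arg0": Span(0, 2), "Arg1": Span(3, 4), "Arg2": Span(4, 11)},
)

# NomBank-style analysis: the noun "refunds" is the predicate.
NOMINAL = Predication(Span(7, 8), {"Arg0": Span(3, 4), "Arg1": Span(8, 11)})

print(render(TOKENS, VERBAL))
print(render(TOKENS, NOMINAL))
```

Keeping both predications over the same token list makes it easy to see that one sentence carries two overlapping role structures.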

Introduction: NomBank (Meyers, 2007)





Corpus information





Base corpus: TreeBank 2

Distinct nominalizations: 4704



Total attestations: ~115K

NomLex (Macleod et al., 1998)



Nominalization classes (22)

Nom (deverbals)

Example: Sales departments then urged [Predicate abandonment] [Arg1 of the Pico Project].

Partitive (part-whole)

Example: Hallwood owns about 11 [Predicate %] [Arg1 of Integra].

Research objectives





Investigate the role of NomLex classes in automated NomBank SRL

Hypotheses





(1) Classes may exhibit consistent realizations of their arguments

(2) Modeling each class separately may result in more homogeneous training data and better SRL performance

Outline











Nominalization interpretation: related work

NomBank SRL

Class-based NomBank SRL

Preliminary results and analysis

Conclusions and future work

Nominalization interpretation: early work





Rule-based methods



Associate syntactic configurations with grammatical functions and semantic properties





Dahl et al. (1987)

Hull and Gomez (1996)



Meyers et al. (1998)

Statistical models: Lapata (2000)



Identify underlying subject/object





[subject satellite] observation

[object satellite] observation

Nominalization interpretation: recent work



SemEval (Girju, 2007)



Semantic relations between nominals







Cause-Effect: laugh wrinkles

Instrument-Agency: laser printer

Product-Producer: honey bee









Origin-Entity: message (entity) from outer-space (origin)

Theme-Tool: news conference

Part-Whole: the door of the car

Content-Container: the grocery bag

Nominalization interpretation: recent work



NomBank SRL: Jiang and Ng (2006), Liu and Ng (2007)





Direct application of verbal SRL methods



Standard feature set



Maximum entropy modeling

Best overall f-measure score: 0.7283



NomBank-specific features had little impact

Overview of NomBank SRL



Full syntactic analysis

[Figure: parse tree of the example sentence]

Judge Curry ordered Edison to make average [Predicate refunds] of about $45.




Argument identification



Binary classification problem





Argument

Non-argument

[Figure: parse tree of the example sentence]

Judge Curry ordered [Edison] to make average [Predicate refunds] [of about $45].




Argument classification



22-class problem





Arg0-Arg9

Temporal, location, etc.

[Figure: parse tree of the example sentence]

Judge Curry ordered [Arg0 Edison] to make average [Predicate refunds] [Arg1 of about $45].
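The two stages (binary argument identification, then argument classification) can be sketched as a small pipeline. This is a minimal illustration, not the system described in the talk: `label_arguments` is a hypothetical helper, candidates are assumed to be token spans of parse-tree constituents, and the two `toy_*` functions are hand-written stand-ins for trained classifiers.

```python
from typing import Callable, Dict, List, Tuple

Span = Tuple[int, int]  # (start, end) token offsets of a parse-tree constituent


def label_arguments(
    candidates: List[Span],
    is_argument: Callable[[Span], bool],
    classify: Callable[[Span], str],
) -> Dict[Span, str]:
    """Stage 1 filters candidates; stage 2 assigns a role label to survivors."""
    labeled = {}
    for span in candidates:
        if is_argument(span):               # argument vs. non-argument
            labeled[span] = classify(span)  # Arg0, Arg1, ..., temporal, etc.
    return labeled


# Tokens: Judge Curry ordered Edison to make average refunds of about $45
PREDICATE = (7, 8)                               # "refunds"
CANDIDATES = [(0, 2), (3, 4), (6, 7), (8, 11)]   # assumed constituents


def toy_is_argument(span: Span) -> bool:
    # Stand-in for the binary classifier (assumption, hard-coded answers).
    return span in {(3, 4), (8, 11)}


def toy_classify(span: Span) -> str:
    # Stand-in for the multi-class classifier: a positional rule only.
    return "Arg0" if span[1] <= PREDICATE[0] else "Arg1"


RESULT = label_arguments(CANDIDATES, toy_is_argument, toy_classify)
print(RESULT)
```

A single-stage alternative would drop `is_argument` and let the classifier emit an extra non-argument label.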

NomBank SRL features






Class-based NomBank SRL

Simple method



Cluster nominalizations according to NomLex class membership



Train a logistic regression model for each class







Single-stage, 23-class strategy

Baseline feature set

Heuristic post-processing

Backoff



Trained over all classes
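The per-class training and backoff scheme can be sketched as follows. Several things here are assumptions rather than details from the talk: the function names are hypothetical, a trivial majority-label trainer is swapped in for logistic regression to keep the example dependency-free, and the backoff model is assumed to handle any nominalization without a class-specific model.

```python
from collections import Counter, defaultdict


def train_majority(examples):
    """Stand-in trainer (NOT logistic regression): always predicts the
    most frequent label seen in training."""
    counts = Counter(label for _, label in examples)
    top_label = counts.most_common(1)[0][0]
    return lambda features: top_label


def train_class_models(instances, nomlex_class, train=train_majority):
    """instances: (noun, features, label) triples.
    nomlex_class: noun -> NomLex class name."""
    by_class = defaultdict(list)
    for noun, features, label in instances:
        cls = nomlex_class.get(noun)
        if cls is not None:
            by_class[cls].append((features, label))
    models = {cls: train(exs) for cls, exs in by_class.items()}
    # Backoff model: trained over all instances, regardless of class.
    backoff = train([(f, lab) for _, f, lab in instances])
    return models, backoff


def predict(noun, features, models, backoff, nomlex_class):
    """Route to the class-specific model, else fall back to the backoff."""
    model = models.get(nomlex_class.get(noun), backoff)
    return model(features)


# Toy data (fabricated for illustration only).
NOMLEX = {"abandonment": "Nom", "%": "Partitive"}
TRAIN = [
    ("abandonment", {}, "Arg0"),
    ("abandonment", {}, "Arg0"),
    ("abandonment", {}, "Arg1"),
    ("%", {}, "Arg1"),
    ("%", {}, "Arg1"),
    ("%", {}, "Arg1"),
    ("price", {}, "Arg1"),  # noun with no class in the toy lexicon
]

MODELS, BACKOFF = train_class_models(TRAIN, NOMLEX)
print(predict("abandonment", {}, MODELS, BACKOFF, NOMLEX))
print(predict("price", {}, MODELS, BACKOFF, NOMLEX))
```

Passing the trainer as a parameter means a real logistic regression trainer could be substituted without touching the routing logic.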




Model application

Hallwood owns about 11 [Predicate %] of Integra.

NomLex: abandonment: … abatement: … abduction: … aberration: … ability: … abolition: … abomination: …

Models: Nom, Partitive, Attribute, Relational, Backoff

Hallwood owns about 11 [Predicate %] [Arg1 of Integra].

Preliminary results and analysis





Evaluation configuration





Training instances: WSJ 2-21

Testing instances: WSJ 23



Automatically generated parse trees for training and testing

Key observations



Overall performance





Per-class performance

Class-based gains over baseline

Overall evaluation results

Per-class evaluation results

Per-class evaluation results



General observations



Negligible overall gains compared to Liu and Ng (2007), who reported overall f-measure of 0.7283





Some NomLex classes perform very well

Classes introduce gains as well as losses
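For reference, the f-measure is the harmonic mean of precision and recall over labeled arguments. The sketch below assumes exact-match scoring, where a predicted argument counts as correct only if both its span and its label match the gold annotation; the slides do not state the scoring scheme, and `f_measure` is a hypothetical helper.

```python
def f_measure(gold, predicted):
    """gold, predicted: iterables of (span, label) pairs."""
    gold, predicted = set(gold), set(predicted)
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy example (fabricated): one of three predictions is correct,
# and one of two gold arguments is recovered.
GOLD = {((3, 4), "Arg0"), ((8, 11), "Arg1")}
PRED = {((3, 4), "Arg0"), ((8, 11), "Arg2"), ((0, 2), "Arg0")}

SCORE = f_measure(GOLD, PRED)
print(round(SCORE, 4))
```

Here precision is 1/3 and recall is 1/2, so the score is 0.4.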

Analysis: intra-class regularity





Hypothesis 1: classes may exhibit consistent realizations of their arguments

Relational class (F1=90.94)









Regularity: argument incorporation

[Arg2 Mr. Hunt’s] [Arg0/Predicate attorney] said his client welcomed the gamble.

100% of Relational nominalizations have an incorporated Arg0

Constitutes 38% of test arguments for the class
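A regularity such as argument incorporation can be measured directly from annotated instances. The sketch below is an assumption about how such a statistic might be computed, not the procedure used in the talk: `incorporation_rate` is a hypothetical helper, and an argument is treated as incorporated when its span equals the predicate span.

```python
def incorporation_rate(instances, label="Arg0"):
    """instances: dicts with a 'predicate' span and an 'arguments'
    mapping from label to span. Returns the share of instances whose
    `label` argument coincides with the predicate itself."""
    if not instances:
        return 0.0
    hits = sum(
        1 for inst in instances
        if inst["arguments"].get(label) == inst["predicate"]
    )
    return hits / len(instances)


# "Mr. Hunt's attorney ...": 'attorney' (token 2) is predicate and Arg0.
ATTORNEY = {"predicate": (2, 3), "arguments": {"Arg0": (2, 3), "Arg2": (0, 2)}}
# A fabricated non-incorporated instance, for contrast only.
OTHER = {"predicate": (7, 8), "arguments": {"Arg0": (3, 4), "Arg1": (8, 11)}}

print(incorporation_rate([ATTORNEY]))
print(incorporation_rate([ATTORNEY, OTHER]))
```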

Analysis: intra-class regularity





Hypothesis 1: classes may exhibit consistent realizations of their arguments

Partitive class (F1=79.85)







Regularity: presence of Arg0

86% of Partitive instances take a single Arg0

Compare: 15% of Nom instances take a single Arg1

Analysis: class-based gains





Hypothesis 2: modeling each class separately may result in more homogeneous training data and better SRL performance

Improvements

Class         Test instances   Improvement
Nom-like      798              2.06
Environment   108              3.97
Group         40               5.87
Job           30               6.29

Analysis: class-based gains





Hypothesis 2: modeling each class separately may result in more homogeneous training data and better SRL performance

Losses

Class          Test instances   Loss    Class ambiguity   Training instances
Share          42               20.83   98.53             66 of 5211 total
Nom-adj-like   28               5.93    90.56             400 of 5086 total










Conclusions and future work

NomBank SRL based on classes derived from NomLex

Demonstrates negligible gains over Liu and Ng (2007)

Intra-class regularity leads to modest gains in some classes

NomLex ambiguity causes losses in others






Conclusions and future work

In-depth class modeling





Identification of class-specific regularities not captured by the current feature set

Further partitioning of the Nom class?

NomLex class disambiguation

Thanks!

Any questions?

References

















Carreras, X. & Màrquez, L. (2005), 'Introduction to the CoNLL-2005 Shared Task: Semantic Role Labeling'.

Dahl, D. A.; Palmer, M. S. & Passonneau, R. J. (1987), Nominalizations in PUNDIT, in 'Proceedings of the 25th annual meeting on Association for Computational Linguistics', Association for Computational Linguistics, Morristown, NJ, USA, pp. 131--139.

Girju, R.; Nakov, P.; Nastase, V.; Szpakowicz, S.; Turney, P. & Yuret, D. (2007), SemEval-2007 Task 04: Classification of Semantic Relations between Nominals, in 'Proceedings of the 4th International Workshop on Semantic Evaluations'.

Hull, R. & Gomez, F. (1996), Semantic Interpretation of Nominalizations, in 'Proceedings of AAAI'.

Jiang, Z. & Ng, H. (2006), Semantic Role Labeling of NomBank: A Maximum Entropy Approach, in 'Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing'.

Kaisser, M. & Webber, B. (2007), Question Answering based on Semantic Roles, in 'ACL 2007 Workshop on Deep Linguistic Processing', Association for Computational Linguistics, Prague, Czech Republic, pp. 41--48.

Kingsbury, P. & Palmer, M. (2003), Propbank: the next level of treebank, in 'Proceedings of Treebanks and Lexical Theories'.

Lapata, M. (2000), The Automatic Interpretation of Nominalizations, in 'Proceedings of the Seventeenth National Conference on Artificial Intelligence and Twelfth Conference on Innovative Applications of Artificial Intelligence', AAAI Press / The MIT Press, pp. 716--721.

References (cont’d)

















Liu, C. & Ng, H. (2007), Learning Predictive Structures for Semantic Role Labeling of NomBank, in 'Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics', Association for Computational Linguistics, Prague, Czech Republic, pp. 208--215.

Macleod, C.; Grishman, R.; Meyers, A.; Barrett, L. & Reeves, R. (1998), Nomlex: A lexicon of nominalizations, in 'Proceedings of the Eighth International Congress of the European Association for Lexicography'.

Marcus, M.; Santorini, B. & Marcinkiewicz, M. A. (1993), 'Building a large annotated corpus of English: the Penn TreeBank', Computational Linguistics 19, 313--330.

Meyers, A. (2007), 'Annotation Guidelines for NomBank - Noun Argument Structure for PropBank', Technical report, New York University.

Meyers, A.; Macleod, C.; Yangarber, R.; Grishman, R.; Barrett, L. & Reeves, R. (1998), Using NOMLEX to produce nominalization patterns for information extraction, in 'Proceedings of the COLING-ACL Workshop on the Computational Treatment of Nominals'.

Ponzetto, S. P. & Strube, M. (2006), Exploiting semantic role labeling, WordNet and Wikipedia for coreference resolution, in 'Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics', Association for Computational Linguistics, Morristown, NJ, USA, pp. 192--199.

Shen, D. & Lapata, M. (2007), Using Semantic Roles to Improve Question Answering, in 'Proceedings of the Conference on Empirical Methods in Natural Language Processing and on Computational Natural Language Learning', pp. 12--21.

Surdeanu, M.; Harabagiu, S.; Williams, J. & Aarseth, P. (2003), Using predicate-argument structures for information extraction, in 'Proceedings of the 41st Annual Meeting on Association for Computational Linguistics'.
