
Class-based nominal semantic role labeling: a preliminary investigation

Matt Gerber

Michigan State University, Department of Computer Science

Introduction: semantic role labeling

The semantic role

Relation between a constituent and a predication

“[Agent John] presented [Theme his findings] to [Experiencer the committee].”

The task

Automatically identify semantic roles occurring in natural language

Problematic: which roles are the “right” ones?

Introduction: PropBank (Kingsbury and Palmer, 2003)

Annotated corpus of semantic roles

“[Arg0 John] presented [Arg1 his findings] [Arg2 to the committee].”

Base corpus: TreeBank 2 (Marcus et al., 1993)

Evaluation

CoNLL Shared Task (Carreras and Màrquez, 2005)

Implications

QA: Kaisser and Webber (2007), Shen and Lapata (2007)

Coreference: Ponzetto and Strube (2006)

Information extraction: Surdeanu et al. (2003)

Introduction: NomBank (Meyers, 2007)

Verbs are not the only lexical category with shallow semantic structure

Verbal: [Arg0 Judge Curry] [Predicate ordered] [Arg2 Edison] [Arg1 to make average refunds of about $45].

Nominal: Judge Curry ordered [Arg0 Edison] to make average [Predicate refunds] [Arg1 of about $45].

A more complete semantic interpretation of natural language
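The examples above mark each argument as a flat "[Label words]" span. A small sketch of how that notation can be read back into a predicate plus a role-to-text mapping, assuming spans never nest and each label occurs at most once per predicate; the function names are hypothetical, and "Arg0/Predicate" (used later for incorporated arguments) is treated as one span carrying both roles.

```python
import re
from typing import Dict, List, Optional, Tuple

# A flat span "[Label words]"; Label is one token such as Arg0, Predicate or Arg0/Predicate.
BRACKET = re.compile(r"\[(\S+) ([^\[\]]+)\]")


def parse_brackets(sentence: str) -> List[Tuple[str, str]]:
    """Return (label, text) pairs for every bracketed span, left to right."""
    return BRACKET.findall(sentence)


def predicate_and_arguments(sentence: str) -> Tuple[Optional[str], Dict[str, str]]:
    """Separate the predicate from its arguments (assumes at most one span per label)."""
    predicate, args = None, {}
    for label, text in parse_brackets(sentence):
        parts = label.split("/")  # "Arg0/Predicate" = argument incorporated in the predicate
        if "Predicate" in parts:
            predicate = text
        for part in parts:
            if part != "Predicate":
                args[part] = text
    return predicate, args


print(predicate_and_arguments(
    "Judge Curry ordered [Arg0 Edison] to make average [Predicate refunds] [Arg1 of about $45]."))
# ('refunds', {'Arg0': 'Edison', 'Arg1': 'of about $45'})
```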

Introduction: NomBank (Meyers, 2007)

Corpus information

Base corpus: TreeBank 2

Distinct nominalizations: 4704

Total attestations: ~115K

NomLex (Macleod et al., 1998)

Nominalization classes (22)

Nom (deverbals)

Example: Sales departments then urged [Predicate abandonment] [Arg1 of the Pico Project].

Partitive (part-whole)

Example: Hallwood owns about 11 [Predicate %] [Arg1 of Integra].

Research objectives

Investigate the role of NomLex classes in automated NomBank SRL

Hypotheses

(1) Classes may exhibit consistent realizations of their arguments

(2) Modeling each class separately may result in more homogeneous training data and better SRL performance

Outline

Nominalization interpretation: related work

NomBank SRL

Class-based NomBank SRL

Preliminary results and analysis

Conclusions and future work

Nominalization interpretation: early work

Rule-based methods

Associate syntactic configurations with grammatical functions and semantic properties

Dahl et al. (1987)

Hull and Gomez (1996)

Meyers et al. (1998)

Statistical models: Lapata (2000)

Identify underlying subject/object

[subject satellite] observation

[object satellite] observation

Nominalization interpretation: recent work

SemEval (Girju, 2007)

Semantic relations between nominals

Cause-Effect: laugh wrinkles

Instrument-Agency: laser printer

Product-Producer: honey bee

Origin-Entity: message from outer-space

Theme-Tool: news conference

Part-Whole: the door of the car

Content-Container: the grocery bag

Nominalization interpretation: recent work

NomBank SRL: Jiang and Ng (2006), Liu and Ng (2007)

Direct application of verbal SRL methods

Standard feature set

Maximum entropy modeling

Best overall f-measure score: 0.7283

NomBank-specific features had little impact
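The f-measure quoted above is the harmonic mean of precision and recall over predicted arguments. A minimal sketch, assuming the usual exact-match scoring in which an argument counts as correct only when both its span and its role label match a gold argument (the slides do not spell out the scoring details); for corpus-level scoring the tuples would also need sentence and predicate identifiers.

```python
from typing import Set, Tuple

Arg = Tuple[int, int, str]  # (start token, end token, role label) of one argument


def precision_recall_f1(gold: Set[Arg], predicted: Set[Arg]) -> Tuple[float, float, float]:
    """Exact-match scoring: a predicted argument is correct only if span and label both match."""
    correct = len(gold & predicted)
    p = correct / len(predicted) if predicted else 0.0
    r = correct / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r > 0 else 0.0  # harmonic mean of precision and recall
    return p, r, f1


gold = {(3, 4, "Arg0"), (8, 11, "Arg1")}
pred = {(3, 4, "Arg0"), (8, 11, "Arg2")}  # right span, wrong label for the second argument
print(precision_recall_f1(gold, pred))  # (0.5, 0.5, 0.5)
```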

Overview of NomBank SRL

Full syntactic analysis

(Figure: constituency parse tree of the sentence below)

Judge Curry ordered Edison to make average [Predicate refunds] of about $45.

Overview of NomBank SRL

Argument identification

Binary classification problem

Argument

Non-argument

(Figure: constituency parse tree of the sentence below)

Judge Curry ordered [Edison] to make average [Predicate refunds] [of about $45].
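Argument identification decides, for each parse-tree constituent, whether it is an argument of the predicate. A sketch of the two ingredients such a binary classifier needs, under stated assumptions: candidates are all constituents except the predicate and the nodes dominating it (a simplification that would miss incorporated arguments), and the syntactic path from candidate to predicate is used as an example feature, since the slides do not list the "standard feature set". The Node class and the tree fragment for "average refunds of about $45" are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # identity comparison; field-wise eq would recurse through parent links
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)

    def add(self, *kids: "Node") -> "Node":
        for kid in kids:
            kid.parent = self
            self.children.append(kid)
        return self


def ancestors(node: Optional[Node]) -> List[Node]:
    """Return [node, parent, grandparent, ..., root]."""
    chain = []
    while node is not None:
        chain.append(node)
        node = node.parent
    return chain


def candidates(root: Node, pred: Node) -> List[Node]:
    """Pre-order list of constituents, skipping the predicate and every node dominating it."""
    blocked = ancestors(pred)
    out, stack = [], [root]
    while stack:
        node = stack.pop()
        if node not in blocked:
            out.append(node)
        stack.extend(reversed(node.children))
    return out


def path_feature(cand: Node, pred: Node) -> str:
    """Labels from the candidate up to the lowest common ancestor, then down to the predicate."""
    up, down = ancestors(cand), ancestors(pred)
    lca = next(n for n in up if n in down)
    path = "↑".join(n.label for n in up[: up.index(lca) + 1])
    for n in reversed(down[: down.index(lca)]):
        path += "↓" + n.label
    return path


# Toy fragment for "average refunds of about $45" (labels only, words omitted).
nns = Node("NNS")                              # predicate: refunds
inner = Node("NP").add(Node("JJ"), nns)        # average refunds
pp = Node("PP").add(Node("IN"), Node("NP"))    # of [about $45]
root = Node("NP").add(inner, pp)
for c in candidates(root, nns):
    print(c.label, path_feature(c, nns))
# JJ JJ↑NP↓NNS
# PP PP↑NP↓NP↓NNS
# IN IN↑PP↑NP↓NP↓NNS
# NP NP↑PP↑NP↓NP↓NNS
```

A trained binary classifier would then score each candidate's feature vector as argument or non-argument.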

Overview of NomBank SRL

Argument classification

22-class problem

Arg0-Arg9

Temporal, location, etc.

(Figure: constituency parse tree of the sentence below)

Judge Curry ordered [Arg0 Edison] to make average [Predicate refunds] [Arg1 of about $45].
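The two steps above form a pipeline: identification filters candidates, and classification assigns one of the role labels to each survivor. A sketch of that control flow, with token-offset spans standing in for constituents; the identifier and classifier are hypothetical lookup-table stand-ins for trained models, and spans are assumed not to overlap when rendering back into the slides' bracket notation.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Span = Tuple[int, int]  # token offsets [start, end)
Identifier = Callable[[Sequence[str], Span, int], bool]
Classifier = Callable[[Sequence[str], Span, int], str]


def label_arguments(tokens: Sequence[str], pred: int, candidates: List[Span],
                    is_argument: Identifier, classify: Classifier) -> Dict[Span, str]:
    """Two-stage SRL: keep candidates the binary identifier accepts, then label each one."""
    labeled = {}
    for span in candidates:
        if is_argument(tokens, span, pred):               # stage 1: argument identification
            labeled[span] = classify(tokens, span, pred)  # stage 2: argument classification
    return labeled


def render(tokens: Sequence[str], pred: int, labeled: Dict[Span, str]) -> str:
    """Write the sentence in bracket notation (assumes non-overlapping spans)."""
    starts = {s: (e, lab) for (s, e), lab in labeled.items()}
    starts[pred] = (pred + 1, "Predicate")
    out, i = [], 0
    while i < len(tokens):
        if i in starts:
            end, lab = starts[i]
            out.append(f"[{lab} {' '.join(tokens[i:end])}]")
            i = end
        else:
            out.append(tokens[i])
            i += 1
    return " ".join(out)


# Hypothetical stand-ins for trained models: answers come from a fixed table.
tokens = "Judge Curry ordered Edison to make average refunds of about $45".split()
answers = {(3, 4): "Arg0", (8, 11): "Arg1"}
result = label_arguments(
    tokens, pred=7, candidates=[(0, 2), (3, 4), (8, 11)],
    is_argument=lambda t, span, p: span in answers,
    classify=lambda t, span, p: answers[span],
)
print(render(tokens, 7, result))
# Judge Curry ordered [Arg0 Edison] to make average [Predicate refunds] [Arg1 of about $45]
```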

NomBank SRL features

Class-based NomBank SRL

Simple method

Cluster nominalizations according to NomLex class membership

Train a logistic regression model for each class

Single-stage, 23-class strategy

Baseline feature set

Heuristic post-processing

Backoff

Trained over all classes
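A minimal pure-Python sketch of per-class training with a backoff model, under these assumptions: the 23rd label of the single-stage strategy is read as "not an argument" (written NONE here); features are binary indicator strings with made-up names such as poss and pp_of; logistic regression is a plain multinomial (softmax) model fit by batch gradient descent without regularization; SoftmaxRegression and train_class_models are hypothetical names, not the author's code. The toy data echoes the Relational example later in the talk, where a possessor is Arg2 rather than Arg0, so the class model and the backoff model disagree.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Instance = Tuple[str, Sequence[str], str]  # (nominalization, feature names, gold label or "NONE")


class SoftmaxRegression:
    """Multinomial logistic regression over binary features, batch gradient descent, no regularization."""

    def __init__(self, epochs: int = 100, lr: float = 1.0) -> None:
        self.epochs, self.lr = epochs, lr

    def fit(self, X: List[Sequence[str]], y: List[str]) -> "SoftmaxRegression":
        self.labels = sorted(set(y))
        self.w: Dict[str, Dict[str, float]] = {lab: defaultdict(float) for lab in self.labels}
        for _ in range(self.epochs):
            grad = {lab: defaultdict(float) for lab in self.labels}
            for feats, gold in zip(X, y):
                probs = self.predict_proba(feats)
                for lab in self.labels:
                    err = probs[lab] - (1.0 if lab == gold else 0.0)
                    for f in feats:
                        grad[lab][f] += err
            for lab in self.labels:
                for f, g in grad[lab].items():
                    self.w[lab][f] -= self.lr * g / len(X)
        return self

    def predict_proba(self, feats: Sequence[str]) -> Dict[str, float]:
        scores = {lab: sum(self.w[lab].get(f, 0.0) for f in feats) for lab in self.labels}
        top = max(scores.values())  # subtract the max for numerical stability
        exps = {lab: math.exp(s - top) for lab, s in scores.items()}
        z = sum(exps.values())
        return {lab: e / z for lab, e in exps.items()}

    def predict(self, feats: Sequence[str]) -> str:
        probs = self.predict_proba(feats)
        return max(probs, key=probs.get)


def train_class_models(instances: List[Instance], nomlex_class: Dict[str, str]):
    """One model per NomLex class, plus a backoff model trained over all classes."""
    by_class: Dict[str, List[Tuple[Sequence[str], str]]] = defaultdict(list)
    for noun, feats, label in instances:
        if noun in nomlex_class:
            by_class[nomlex_class[noun]].append((feats, label))
    models = {}
    for cls, data in by_class.items():
        X, y = zip(*data)
        models[cls] = SoftmaxRegression().fit(list(X), list(y))
    backoff = SoftmaxRegression().fit([f for _, f, _ in instances], [lab for _, _, lab in instances])
    return models, backoff


nomlex_class = {"refund": "Nom", "abandonment": "Nom", "attorney": "Relational"}  # toy excerpt
data = [
    ("refund", ["poss"], "Arg0"),
    ("abandonment", ["poss"], "Arg0"),
    ("refund", ["pp_of"], "Arg1"),
    ("attorney", ["poss"], "Arg2"),
    ("attorney", ["adj"], "NONE"),
]
models, backoff = train_class_models(data, nomlex_class)
print(models["Relational"].predict(["poss"]))  # Arg2
print(backoff.predict(["poss"]))               # Arg0
```

The pooled backoff model follows the majority reading of a possessor, while the Relational model keeps its own regularity; this is the kind of homogeneity Hypothesis 2 is about.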

Class-based NomBank SRL

Model application

Input: Hallwood owns about 11 [Predicate %] of Integra.

NomLex: abandonment: … abatement: … abduction: … aberration: … ability: … abolition: … abomination: …

Class models: Nom, Partitive, Attribute, Relational, Backoff

Output: Hallwood owns about 11 [Predicate %] [Arg1 of Integra].
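Model application amounts to a lexicon lookup followed by dispatch. A sketch under explicit assumptions: a noun may be listed under several NomLex classes, and this sketch simply takes the first listed class that has a trained model (the slides do not say how such ambiguity is resolved); nouns with no usable class go to the backoff model. The models here are hypothetical one-line stand-ins, and the lexicon is a toy excerpt rather than real NomLex entries.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Span = Tuple[int, int]                  # token offsets [start, end)
Model = Callable[[Sequence[str]], str]  # candidate features -> label ("NONE" = not an argument)


def select_model(noun: str, nomlex: Dict[str, List[str]],
                 models: Dict[str, Model]) -> Tuple[str, Model]:
    """First NomLex class of `noun` that has a model; otherwise the 'Backoff' model."""
    for cls in nomlex.get(noun, []):
        if cls in models:
            return cls, models[cls]
    return "Backoff", models["Backoff"]


def apply_model(noun: str, candidates: Dict[Span, Sequence[str]],
                nomlex: Dict[str, List[str]], models: Dict[str, Model]) -> Dict[Span, str]:
    """Label every candidate of one predicate with the selected model; drop NONE."""
    _, model = select_model(noun, nomlex, models)
    labels = {span: model(feats) for span, feats in candidates.items()}
    return {span: lab for span, lab in labels.items() if lab != "NONE"}


# Hypothetical stand-ins for trained class models.
models: Dict[str, Model] = {
    "Partitive": lambda feats: "Arg1" if "pp_of" in feats else "NONE",
    "Nom": lambda feats: "Arg0" if "poss" in feats else "NONE",
    "Backoff": lambda feats: "NONE",
}
nomlex = {"%": ["Partitive"], "abandonment": ["Nom"]}  # toy excerpt

# "Hallwood owns about 11 % of Integra": tokens 0..6, predicate "%" at index 4.
candidates = {(0, 1): ["subj_of_verb"], (5, 7): ["pp_of"]}
print(select_model("%", nomlex, models)[0])          # Partitive
print(apply_model("%", candidates, nomlex, models))  # {(5, 7): 'Arg1'}
```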

Preliminary results and analysis

Evaluation configuration

Training instances: WSJ 2-21

Testing instances: WSJ 23

Automatically generated parse trees for training and testing
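The split above follows Penn TreeBank WSJ sections, which can be read off file names of the form wsj_SSNN.mrg, where SS is the section number. A small sketch of selecting training (02-21) and testing (23) files; the directory layout in the example paths is illustrative, and sections outside those ranges are simply left out here because the slides do not say how they were used.

```python
import os
import re
from typing import Iterable, List, Tuple

WSJ_FILE = re.compile(r"wsj_(\d{2})\d{2}\.mrg")


def wsj_section(path: str) -> int:
    """Section of a WSJ TreeBank file: the first two digits of wsj_SSNN.mrg."""
    m = WSJ_FILE.fullmatch(os.path.basename(path))
    if m is None:
        raise ValueError(f"not a WSJ TreeBank file name: {path}")
    return int(m.group(1))


def split_wsj(paths: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Training = sections 02-21, testing = section 23; all other sections are skipped."""
    train, test = [], []
    for p in paths:
        sec = wsj_section(p)
        if 2 <= sec <= 21:
            train.append(p)
        elif sec == 23:
            test.append(p)
    return train, test


files = ["wsj/00/wsj_0001.mrg", "wsj/02/wsj_0200.mrg", "wsj/21/wsj_2100.mrg",
         "wsj/22/wsj_2200.mrg", "wsj/23/wsj_2300.mrg", "wsj/24/wsj_2454.mrg"]
train, test = split_wsj(files)
print(train)  # ['wsj/02/wsj_0200.mrg', 'wsj/21/wsj_2100.mrg']
print(test)   # ['wsj/23/wsj_2300.mrg']
```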

Key observations

Overall performance

Per-class performance

Class-based gains over baseline

Overall evaluation results

Per-class evaluation results


General observations

Negligible overall gains compared to Liu and Ng (2007), who reported overall f-measure of 0.7283

Some NomLex classes perform very well

Classes introduce gains as well as losses

Analysis: intra-class regularity

Hypothesis 1: classes may exhibit consistent realizations of their arguments

Relational class (F1=90.94)

Regularity: argument incorporation

[Arg2 Mr. Hunt’s] [Arg0/Predicate attorney] said his client welcomed the gamble.

100% of Relational nominalizations have an incorporated Arg0

Constitutes 38% of test arguments for the class

Analysis: intra-class regularity

Hypothesis 1: classes may exhibit consistent realizations of their arguments

Partitive class (F1=79.85)

Regularity: presence of Arg0

86% of Partitive instances take a single Arg0

Compare: 15% of Nom instances take a single Arg1
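Regularity figures like these are per-class proportions over annotated predicate instances. A sketch of one such statistic, reading "take a single ArgN" as "have exactly one argument labeled ArgN" (the slides do not define the statistic; if it instead means "have exactly one argument in total, and it is ArgN", the test becomes arg_labels == [label]). The toy annotations are made up and are not NomBank counts.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def single_label_rate(instances: Iterable[Tuple[str, List[str]]], label: str) -> Dict[str, float]:
    """For each class, the fraction of predicate instances with exactly one argument labeled `label`."""
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for cls, arg_labels in instances:
        totals[cls] += 1
        if arg_labels.count(label) == 1:
            hits[cls] += 1
    return {cls: hits[cls] / totals[cls] for cls in totals}


# Made-up toy annotations: (NomLex class, labels of the predicate's arguments).
toy = [
    ("Partitive", ["Arg0"]), ("Partitive", ["Arg0"]),
    ("Partitive", ["Arg0", "Arg1"]), ("Partitive", []),
    ("Nom", ["Arg1"]), ("Nom", ["Arg0", "Arg1"]),
]
print(single_label_rate(toy, "Arg0"))  # {'Partitive': 0.75, 'Nom': 0.5}
```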

Analysis: class-based gains

Hypothesis 2: modeling each class separately may result in more homogeneous training data and better SRL performance

Improvements

Class         Test instances   Improvement
Nom-like      798              2.06
Environment   108              3.97
Group         40               5.87
Job           30               6.29

Analysis: class-based gains

Hypothesis 2: modeling each class separately may result in more homogeneous training data and better SRL performance

Losses

Class          Test instances   Loss    Class ambiguity   Training instances
Share          42               20.83   98.53             66 of 5211 total
Nom-adj-like   28               5.93    90.56             400 of 5086 total

Conclusions and future work

NomBank SRL based on classes derived from NomLex

Demonstrates negligible gains over Liu and Ng (2007)

Intra-class regularity leads to modest gains in some classes

NomLex ambiguity causes losses in others

Conclusions and future work

In-depth class modeling

Identification of class-specific regularities not captured by the current feature set

Further partitioning of the Nom class?

NomLex class disambiguation

Thanks!

Any questions?

References

Carreras, X. & Màrquez, L. (2005), 'Introduction to the CoNLL-2005 Shared Task: Semantic Role Labeling'.

Dahl, D. A.; Palmer, M. S. & Passonneau, R. J. (1987), Nominalizations in PUNDIT, in 'Proceedings of the 25th annual meeting on Association for Computational Linguistics', Association for Computational Linguistics, Morristown, NJ, USA, pp. 131--139.

Girju, R.; Nakov, P.; Nastase, V.; Szpakowicz, S.; Turney, P. & Yuret, D. (2007), SemEval-2007 Task 04: Classification of Semantic Relations between Nominals, in 'Proceedings of the 4th International Workshop on Semantic Evaluations'.

Hull, R. & Gomez, F. (1996), Semantic Interpretation of Nominalizations, in 'Proceedings of AAAI'.

Jiang, Z. & Ng, H. (2006), Semantic Role Labeling of NomBank: A Maximum Entropy Approach, in 'Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing'.

Kaisser, M. & Webber, B. (2007), Question Answering based on Semantic Roles, in 'ACL 2007 Workshop on Deep Linguistic Processing', Association for Computational Linguistics, Prague, Czech Republic, pp. 41--48.

Kingsbury, P. & Palmer, M. (2003), Propbank: the next level of treebank, in 'Proceedings of Treebanks and Lexical Theories'.

Lapata, M. (2000), The Automatic Interpretation of Nominalizations, in 'Proceedings of the Seventeenth National Conference on Artificial Intelligence and Twelfth Conference on Innovative Applications of Artificial Intelligence', AAAI Press / The MIT Press, pp. 716--721.

References (cont’d)

Liu, C. & Ng, H. (2007), Learning Predictive Structures for Semantic Role Labeling of NomBank, in 'Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics', Association for Computational Linguistics, Prague, Czech Republic, pp. 208--215.

Macleod, C.; Grishman, R.; Meyers, A.; Barrett, L. & Reeves, R. (1998), Nomlex: A lexicon of nominalizations, in 'Proceedings of the Eighth International Congress of the European Association for Lexicography'.

Marcus, M.; Santorini, B. & Marcinkiewicz, M. A. (1993), 'Building a large annotated corpus of English: the Penn TreeBank', Computational Linguistics 19, 313--330.

Meyers, A. (2007), 'Annotation Guidelines for NomBank - Noun Argument Structure for PropBank', Technical report, New York University.

Meyers, A.; Macleod, C.; Yangarber, R.; Grishman, R.; Barrett, L. & Reeves, R. (1998), Using NOMLEX to produce nominalization patterns for information extraction, in 'Proceedings of the COLING-ACL Workshop on the Computational Treatment of Nominals'.

Ponzetto, S. P. & Strube, M. (2006), Exploiting semantic role labeling, WordNet and Wikipedia for coreference resolution, in 'Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics', Association for Computational Linguistics, Morristown, NJ, USA, pp. 192--199.

Shen, D. & Lapata, M. (2007), Using Semantic Roles to Improve Question Answering, in 'Proceedings of the Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning', pp. 12--21.

Surdeanu, M.; Harabagiu, S.; Williams, J. & Aarseth, P. (2003), Using predicate-argument structures for information extraction, in 'Proceedings of the 41st Annual Meeting on Association for Computational Linguistics'.
