Classification of Discourse Coherence Relations: An Exploratory Study using Multiple Knowledge Sources

Ben Wellner†*, James Pustejovsky†, Catherine Havasi†, Anna Rumshisky† and Roser Saurí†
† Brandeis University
* The MITRE Corporation
Outline of Talk

- Overview and Motivation for Modeling Discourse
  - Background
  - Objectives
- The Discourse GraphBank
  - Overview
  - Coherence Relations
  - Issues with the GraphBank
- Modeling Discourse
  - Machine learning approach
  - Knowledge Sources and Features
  - Experiments and Analysis
- Conclusions and Future Work
Modeling Discourse: Motivation

- Why model discourse?
  - Dialogue
  - General text understanding applications
    - Text summarization and generation
    - Information extraction
- MUC Scenario Template Task
  - Discourse is vital for understanding how events are related
  - Modeling discourse generally may aid specific extraction tasks
Background

- Different approaches to discourse
  - Different objectives: informational vs. intentional, dialog vs. general text
  - Different inventories of discourse relations: coarse vs. fine-grained
  - Different representations: tree vs. graph
  - Different semantics/formalisms: Hobbs [1985], Mann and Thompson [1987], Grosz and Sidner [1986], Asher [1993], others
- Same steps involved:
  1. Identifying discourse segments
  2. Grouping discourse segments into sequences
  3. Identifying the presence of a relation
  4. Identifying the type of the relation
Discourse Steps #1*

1. Segment: "Mary is in a bad mood because Fred played tuba while she was taking a nap." is divided into three clausal segments A, B, C.
2. Group
3. Connect segments (figure: relations r1 and r2 link the segments)
4. Relation Type: r1 = cause-effect; r2 = elaboration

* Example from [Danlos 2004]
Discourse Steps #2*

1. Segment: "Fred played the tuba. Next he prepared a pizza to please Mary." is divided into three clausal segments A, B, C.
2. Group
3. Connect segments (figure: relations r1 and r2 link the segments)
4. Relation Type: r1 = temporal precedence; r2 = cause-effect

* Example from [Danlos 2004]
Objectives

- Our main focus: Step 4, classifying discourse relations
  - Important for all approaches to discourse
  - Can be approached independently of representation
    - But relation types and structure are probably quite dependent
  - Task will vary with the inventory of relation types
  - What types of knowledge/features are important for this task?
- Can we apply the same approach to Step 3: identifying whether two segment groups are linked?
Discourse GraphBank: Overview [Wolf and Gibson, 2005]

- Graph-based representation of discourse
  - Tree representation inadequate: multiple parents, crossing dependencies
- Discourse composed of clausal segments
  - Segments can be grouped into sequences
  - Relations need not exist between segments within a group
- Coherence relations between segment groups
  - Roughly those of Hobbs [1985]
Why GraphBank?

- Similar inventory of relations as SDRT
  - Linked to lexical representations
  - Semantics well-developed
- Includes non-local discourse links
- Existing annotated corpus, unexplored outside of [Wolf and Gibson, 2005]
Resemblance Relations

- Similarity (parallel): "The first flight to Frankfurt this morning was delayed. The second flight arrived late as well."
- Contrast: "The first flight to Frankfurt this morning was delayed. The second flight arrived on time."
- Example: "There have been many previous missions to Mars. A famous example is the Pathfinder mission."
- Generalization: "Two missions to Mars in 1999 failed. There are many missions to Mars that have failed."
- Elaboration*: "A probe to Mars was launched from the Ukraine this week. The European-built “Mars Express” is scheduled to reach Mars by Dec."

* The elaboration relation is given one or more sub-types: organization, person, location, time, number, detail
Causal, Temporal and Attribution Relations

- Causal
  - Cause-effect: "There was bad weather at the airport and so our flight got delayed."
  - Conditional: "If the new software works, everyone should be happy."
  - Violated expectation: "The new software worked great, but nobody was happy."
- Temporal
  - Precedence: "First, John went grocery shopping. Then, he disappeared into a liquor store."
- Attribution
  - Attribution: "John said that the weather would be nice tomorrow."
- Same: "The economy, according to analysts, is expected to improve by early next year." (the two parts of the interrupted clause stand in a same relation)
Some Issues with GraphBank

- Coherence relations: conflation of actual causation and intention/purpose
  - "The university spent $30,000" [cause] "to upgrade lab equipment in 1987"
  - ?? "John pushed the door" [cause] "to open it."
- Granularity: it is desirable for relations to hold between eventualities or entities, not necessarily entire clausal segments:
  - "the new policy came about after President Reagan's historic decision in mid-December to reverse the policy of refusing to deal with members of the organization, long shunned as a band of terrorists." [elaboration] "Reagan said PLO chairman Yasser Arafat had met US demands."
A Classifier-based Approach

- For each pair of discourse segments on which we know a relation exists, classify the relation type between them
- Advantages
  - Include arbitrary knowledge sources as features
  - Easier than implementing inference on top of semantic interpretations
  - Robust performance
  - Gain insight into how different knowledge sources contribute
- Disadvantages
  - Difficult to determine why mistakes happen
- Maximum Entropy
  - Commonly used discriminative classifier
  - Allows for a high number of non-independent features
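As an illustrative sketch (not the authors' actual implementation), a maximum-entropy relation classifier over sparse, non-independent indicator features can be approximated with scikit-learn's multinomial logistic regression; the feature dictionaries and labels below are hypothetical placeholders.

```python
# Minimal sketch of a MaxEnt-style (multinomial logistic regression) relation
# classifier over sparse feature dictionaries. Assumes scikit-learn; the
# example features/labels are hypothetical, not drawn from the GraphBank.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# One feature dict per segment pair (binary indicator features).
train_features = [
    {"adjacent": 1, "First1=to": 1, "First2=The": 1, "event-pair=upgrade-spent": 1},
    {"adjacent": 1, "First1=but": 1, "tlink=before": 1},
]
train_labels = ["cause-effect", "contrast"]

model = make_pipeline(
    DictVectorizer(sparse=True),
    LogisticRegression(max_iter=1000, C=2.0),  # C controls L2 regularization, analogous to a Gaussian prior
)
model.fit(train_features, train_labels)
print(model.predict([{"adjacent": 1, "First1=to": 1}]))
```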
Knowledge Sources

- Proximity
- Cue Words
- Lexical Similarity
- Events
- Modality and Subordinating Relations
- Grammatical Relations
- Temporal Relations
- Each knowledge source is associated with one or more feature classes

Running example used throughout the feature slides (each slide lists the features its class contributes; the full table appears on the Temporal Relations slide):
SEG2: The university spent $30,000
SEG1: to upgrade lab equipment in 1987
Proximity

- Motivation
  - Some relations tend to be local, i.e., their arguments appear nearby in the text
    - Attribution, cause-effect, temporal precedence, violated expectation
  - Other relations can span larger portions of text
    - Elaboration
    - Similar, contrast
- Feature class: Proximity
  - Whether segments are adjacent or not
  - Directionality (which argument appears earlier in the text)
  - Number of intervening segments

Example features (running example):
Proximity: adjacent; dist<3; dist<5; direction-reverse; same-sentence
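A minimal sketch of how proximity features of this kind might be derived; the segment index and sentence-id inputs, and the distance thresholds, are assumptions for illustration.

```python
# Sketch: derive proximity-style features from segment positions.
def proximity_features(seg1_idx, seg2_idx, seg1_sent, seg2_sent):
    dist = abs(seg1_idx - seg2_idx)
    feats = {}
    if dist == 1:
        feats["adjacent"] = 1
    if dist < 3:
        feats["dist<3"] = 1
    if dist < 5:
        feats["dist<5"] = 1
    if seg1_idx > seg2_idx:
        feats["direction-reverse"] = 1  # SEG1 appears after SEG2 in the text
    if seg1_sent == seg2_sent:
        feats["same-sentence"] = 1
    feats["intervening=%d" % (dist - 1)] = 1
    return feats

# e.g. proximity_features(5, 4, seg1_sent=2, seg2_sent=2)
```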
Cue Words

- Motivation
  - Many coherence relations are frequently signaled by a discourse cue word or phrase: "therefore", "but", "in contrast"
  - Cues are generally captured by the first word in a segment
    - Obviates enumerating all potential cue words
    - Non-traditional discourse markers (e.g., adverbials or even determiners) may indicate a preference for certain relation types
- Feature class: Cue Words
  - First word in each segment

Example features (running example):
Cue Words: First1="to"; First2="The"
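A short sketch of the cue-word feature class; whitespace tokenization is a simplifying assumption.

```python
# Sketch: cue-word features taken as the first token of each segment.
def cue_word_features(seg1_text, seg2_text):
    return {
        "First1=" + seg1_text.split()[0]: 1,
        "First2=" + seg2_text.split()[0]: 1,
    }

# cue_word_features("to upgrade lab equipment in 1987",
#                   "The university spent $30,000")
# -> {'First1=to': 1, 'First2=The': 1}
```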
Lexical Coherence

- Motivation
  - Identify lexical associations and lexical/semantic similarities
  - E.g. push/fall, crash/injure, lab/university
- Brandeis Semantic Ontology (BSO)
  - Taxonomy of types (i.e., senses)
  - Includes qualia information for words
    - Telic (purpose), agentive (creation), constitutive (parts)
- Word Sketch Engine (WSE)
  - Similarity of words as measured by their contexts in a corpus (BNC)
- Feature classes
  - BSO: paths between words up to length 10
  - WSE: number of word pairs with similarity > 0.05, > 0.01; segment similarities (sum of word-pair similarities / # words)

Example features (running example):
BSO: Research Lab => Educational Activity => University
WSE: WSE>0.05; WSE-sentence-similarity=0.005417
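A sketch of WSE-style similarity features over a precomputed word-pair similarity function; `word_similarity` is a stand-in for corpus-derived (e.g. Word Sketch Engine) similarities and is an assumption, not part of the original system.

```python
# Sketch: similarity-threshold and segment-similarity features.
from itertools import product

def wse_features(seg1_tokens, seg2_tokens, word_similarity):
    feats = {}
    sims = [word_similarity(w1, w2) for w1, w2 in product(seg1_tokens, seg2_tokens)]
    if any(s > 0.05 for s in sims):
        feats["WSE>0.05"] = 1
    if any(s > 0.01 for s in sims):
        feats["WSE>0.01"] = 1
    # Segment similarity: sum of word-pair similarities normalized by word count.
    n_words = len(seg1_tokens) + len(seg2_tokens)
    feats["WSE-sentence-similarity"] = sum(sims) / n_words if n_words else 0.0
    return feats
```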
Events

- Motivation
  - Certain events and event pairs are indicative of certain relation types (e.g., "push"-"fall": cause)
  - Allow the learner to associate events and event pairs with particular relation types
- Evita: EVents In Text Analyzer
  - Performs domain-independent identification of events
  - Identifies all event-referring expressions (that can be temporally ordered)
- Feature class: Events
  - Event mentions in each segment
  - Event mention pairs drawn from both segments

Example features (running example):
Events: Event1="upgrade"; Event2="spent"; event-pair="upgrade-spent"
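A sketch of the event and event-pair feature class, assuming the event mentions of each segment have already been identified (e.g. by an Evita-style tagger).

```python
# Sketch: event and event-pair features from precomputed event mentions.
from itertools import product

def event_features(seg1_events, seg2_events):
    feats = {}
    for e in seg1_events:
        feats["Event1=" + e] = 1
    for e in seg2_events:
        feats["Event2=" + e] = 1
    for e1, e2 in product(seg1_events, seg2_events):
        feats["event-pair=%s-%s" % (e1, e2)] = 1
    return feats

# event_features(["upgrade"], ["spent"])
# -> {'Event1=upgrade': 1, 'Event2=spent': 1, 'event-pair=upgrade-spent': 1}
```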
Modality and Subordinating Relations

- Motivation
  - Event modality and subordinating relations are indicative of certain relations
- SlinkET [Saurí et al. 2006]
  - Identifies subordinating contexts and classifies them as: factive, counter-factive, evidential, negative evidential, or modal
    - E.g., evidential => attribution relation
  - Also provides event class, polarity, tense, etc.
- Feature class: SlinkET
  - Event class, polarity, tense and modality of events in each segment
  - Subordinating relations between event pairs

Example features (running example):
SlinkET: Class1="occurrence"; Class2="occurrence"; Tense1="infinitive"; Tense2="past"; modal-relation
Cue Words and Events

- Motivation
  - Certain events (event types) are likely to appear in particular discourse contexts keyed by certain connectives
  - Pairing connectives with events captures this more precisely than connectives or events on their own
- Feature class: CueWords + Events
  - First word of SEG1 paired with each event mention in SEG2
  - First word of SEG2 paired with each event mention in SEG1

Example features (running example):
CueWord + Events: First1="to"-Event2="spent"; First2="The"-Event1="upgrade"
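A sketch of the conjoined cue-word/event feature class; it simply pairs each segment's first token with the event mentions of the other segment, reusing the assumed inputs from the earlier sketches.

```python
# Sketch: conjoin each segment's cue word with the other segment's events.
def cue_event_features(seg1_text, seg2_text, seg1_events, seg2_events):
    first1, first2 = seg1_text.split()[0], seg2_text.split()[0]
    feats = {}
    for e in seg2_events:
        feats["First1=%s-Event2=%s" % (first1, e)] = 1
    for e in seg1_events:
        feats["First2=%s-Event1=%s" % (first2, e)] = 1
    return feats
```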
Grammatical Relations

- Motivation
  - Certain intra-sentential relations are captured or ruled out by particular dependency relations between clausal headwords
  - Identification of headwords is also important
    - Main events identified
- RASP parser
- Feature class: Syntax
  - Grammatical relations (GRs) between the two segments
  - GR + SEG1 head word
  - GR + SEG2 head word
  - GR + both head words

Example features (running example):
Syntax: Gr="ncmod"; Gr="ncmod"-Head1="equipment"; Gr="ncmod"-Head2="spent"
Temporal Relations

- Motivation
  - Temporal ordering between events constrains possible coherence relations
  - E.g. E1 BEFORE E2 => NOT(E2 CAUSE E1)
- Temporal Relation Classifier
  - Trained on TimeBank 1.2 using MaxEnt
  - See [Mani et al., "Machine Learning of Temporal Relations", ACL 2006]
- Feature class: TLink
  - Temporal relations holding between segments

Full feature table for the running example
(SEG2: The university spent $30,000 / SEG1: to upgrade lab equipment in 1987):

Feature Class      Example Features
Proximity          adjacent; dist<3; dist<5; direction-reverse; same-sentence
Cue Words          First1="to"; First2="The"
BSO                Research Lab => Educational Activity => University
WSE                WSE>0.05; WSE-sentence-similarity=0.005417
Events             Event1="upgrade"; Event2="spent"; event-pair="upgrade-spent"
SlinkET            Class1="occurrence"; Class2="occurrence"; Tense1="infinitive"; Tense2="past"; modal-relation
CueWord + Events   First1="to"-Event2="spent"; First2="The"-Event1="upgrade"
Syntax             Gr="ncmod"; Gr="ncmod"-Head1="equipment"; Gr="ncmod"-Head2="spent"
TLink              Seg2-before-Seg1
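A sketch of how the output of a separate temporal relation classifier could be encoded as a TLink feature, plus the kind of ordering constraint the slide mentions; the label strings and helper functions are assumptions for illustration.

```python
# Sketch: TLink feature from a predicted temporal link between the segments.
def tlink_features(tlink_label):
    # tlink_label is e.g. "before", meaning SEG2 temporally precedes SEG1
    return {"Seg2-%s-Seg1" % tlink_label: 1} if tlink_label else {}

def cause_direction_allowed(cause_seg, effect_seg, order):
    """Illustrates E1 BEFORE E2 => NOT(E2 CAUSE E1): a cause may not follow
    its effect. `order` maps a segment id to its temporal position."""
    return order[cause_seg] <= order[effect_seg]
```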
Relation Classification

- Identify
  - Specific coherence relation
  - Coarse-grained relation (resemblance, cause-effect, temporal, attributive)
- Evaluation methodology
  - Maximum Entropy classifier (Gaussian prior variance = 2.0)
  - 8-fold cross validation
  - Elaboration subtypes ignored (too sparse)
- Results
  - Specific relation accuracy: 81.06%
  - Inter-annotator agreement: 94.6%
  - Majority class baseline (classifying all relations as elaboration): 45.7%
  - Coarse-grained relation accuracy: 87.51%
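A sketch of how the 8-fold cross-validation accuracy could be computed, reusing the pipeline sketched earlier; `features` and `labels` are assumed to hold one feature dict and one GraphBank relation label per segment pair.

```python
# Sketch: k-fold cross-validated accuracy for the relation classifier.
from sklearn.model_selection import cross_val_score
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def evaluate(features, labels, folds=8):
    model = make_pipeline(DictVectorizer(), LogisticRegression(max_iter=1000))
    scores = cross_val_score(model, features, labels, cv=folds, scoring="accuracy")
    return scores.mean()
```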
F-Measure Results

Relation               Precision  Recall  F-measure  # True positives
elaboration            88.72      95.31   91.90      512
attribution            91.14      95.10   93.09      184
similar (parallel)     71.89      83.33   77.19      132
same                   87.09      75.00   80.60      72
cause-effect           78.78      41.26   54.16      63
contrast               65.51      66.67   66.08      57
example                78.94      48.39   60.00      31
temporal precedence    50.00      20.83   29.41      24
violated expectation   33.33      16.67   22.22      12
conditional            45.45      62.50   52.63      8
generalization         0          0       0          0
Results: Confusion Matrix
(rows = reference, columns = hypothesis)

        elab  par  attr   ce  temp  contr  same  exmp  expv  cond  gen
elab     488    3     7    3     1      0     2     4     0     3    1
par        6  110     2    2     0      8     2     0     0     2    0
attr       4    0   175    0     0      1     2     0     1     1    0
ce        18    9     3   26     3      2     2     0     0     0    0
temp       6    8     2    0     5      3     0     0     0     0    0
contr      4   12     0    0     0     38     0     0     3     0    0
same       3    9     2    2     0      2    54     0     0     0    0
exmp      15    1     0    0     0      0     0    15     0     0    0
expv       3    1     1    0     1      4     0     0     2     0    0
cond       3    0     0    0     0      0     0     0     0     5    0
gen        0    0     0    0     0      0     0     0     0     0    0
Feature Class Analysis

- What is the utility of each feature class?
  - Features overlap significantly and are highly correlated
- How can we estimate utility?
  - Independently
    - Start with the Proximity feature class (baseline)
    - Add each feature class separately
    - Determine improvement over the baseline
  - In combination with other features
    - Start with all features
    - Remove each feature class individually
    - Determine the reduction from removal of the feature class
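A sketch of the two analyses just described; `extractors` (mapping a feature-class name to a feature function) and `build_and_score` (training and scoring a classifier on feature dicts, e.g. via the cross-validation sketch above) are assumed helpers, not part of the original system.

```python
# Sketch: add-one (to the Proximity baseline) and remove-one (from all
# features) feature-class analysis.
def combine(extractors, pairs):
    feats = []
    for pair in pairs:
        d = {}
        for extract in extractors.values():
            d.update(extract(pair))
        feats.append(d)
    return feats

def feature_class_analysis(extractors, pairs, labels, build_and_score):
    baseline = {"Proximity": extractors["Proximity"]}
    results = {"Proximity": build_and_score(combine(baseline, pairs), labels)}
    for name, fn in extractors.items():
        if name == "Proximity":
            continue
        added = dict(baseline, **{name: fn})                            # + feature class
        removed = {n: f for n, f in extractors.items() if n != name}    # - feature class
        results["+" + name] = build_and_score(combine(added, pairs), labels)
        results["-" + name] = build_and_score(combine(removed, pairs), labels)
    results["All"] = build_and_score(combine(extractors, pairs), labels)
    return results
```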
Feature Class Analysis Results

All Features: accuracy 81.06%, coarse-grain accuracy 87.51%

Added in isolation (to the Proximity baseline):
Feature Class      Accuracy  Coarse-grain Acc.
Proximity          60.08%    69.43%
+ Cuewords         76.77%    83.50%
+ BSO              62.92%    74.40%
+ WSE              62.20%    70.10%
+ Events           63.84%    78.16%
+ SlinkET          69.00%    75.91%
+ CueWord/Event    67.18%    78.63%
+ Syntax           70.30%    80.84%
+ TLink            64.19%    72.30%

Removed from the full feature set:
Feature Class      Accuracy  Coarse-grain Acc.
- Proximity        71.52%    84.88%
- Cuewords         75.71%    84.69%
- BSO              80.65%    87.04%
- WSE              80.26%    87.14%
- Events           80.90%    86.92%
- SlinkET          79.68%    86.89%
- CueWord/Event    80.41%    87.14%
- Syntax           80.20%    86.89%
- TLink            80.30%    87.36%
Feature Class Contributions in Isolation (chart of the results above)
Feature Class Contributions in Conjunction (chart of the results above)
Relation Identification

- Given
  - Discourse segments (and segment sequences)
- Identify
  - For each pair of segments, whether a relation (any relation) exists between them
- Two issues:
  - Highly skewed classification
    - Many negatives, few positives
  - Many of the relations are transitive
    - These aren't annotated and will be false-negative instances
Relation Identification Results

- For all pairs of segment sequences in a document
  - Used the same features as for classification
  - Achieved accuracy only slightly above the majority class baseline
- For segment pairs in the same sentence
  - Accuracy: 70.04% (baseline 58%)
- Identification and classification in the same sentence
  - Accuracy: 64.53% (baseline 58%)
Inter-relation Dependencies

- Each relation shouldn't be identified in isolation
  - When identifying a relation between s_i and s_j, consider the other relations involving s_i and s_j: {R(s_i, s_k) | k ≠ j} and {R(s_j, s_l) | l ≠ i}
- Include as features the other (gold-standard true) relation types both segments are involved in (see the sketch below)
  - Adding this feature class improves performance to 82.3%
  - A 6.3% error reduction
- Indicates room for improvement with
  - Collective classification (where outputs influence each other)
  - Incorporating explicit modeling constraints
    - Tree-based parsing model
    - Constrained DAGs [Danlos 2004]
  - Including or deducing transitive links may help further
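A sketch of the inter-relation dependency feature class; the representation of gold relations as (segment_a, segment_b, relation_type) triples is an assumption for illustration.

```python
# Sketch: for a candidate pair (s_i, s_j), add the gold relation types of
# all other links that involve s_i or s_j.
def inter_relation_features(si, sj, gold_relations):
    feats = {}
    for a, b, rel in gold_relations:
        if {a, b} == {si, sj}:
            continue  # skip the link currently being classified
        if si in (a, b):
            feats["si-also-in=" + rel] = 1
        if sj in (a, b):
            feats["sj-also-in=" + rel] = 1
    return feats
```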
Conclusions

- A classification approach with many features achieves good performance at classifying coherence relation types
- All feature classes are helpful, but:
  - The discriminative power of most individual feature classes is captured by the union of the remaining feature classes
  - Proximity + CueWords achieves 76.77%
  - The remaining features reduce error by 23.7%
- The classification approach performs less well on the task of identifying the presence of a relation
  - Using the same features as for classifying coherence relation types
  - "Parsing" may prove better for local relationships
Future Work

- Additional linguistic analysis
  - Co-reference, for both entities and events
  - Word sense
    - Lexical similarity is confounded with multiple types for a lexeme
- Pipelined or 'stacked' architecture
  - Classify the coarse-grained category first, then the specific coherence relation
  - Justification: different categories require different types of knowledge
- Relational classification
  - Model decisions collectively
  - Include constraints on structure
  - Investigate transitivity of resemblance relations
- Consider other approaches for identification of relations
Questions?
Backup Slides
GraphBank Annotation Statistics

- Corpus and annotator statistics
  - 135 doubly annotated newswire articles
  - Identifying discourse segments had high agreement (> 90% in a pilot study of 10 documents)
    - Corpus segments were ultimately annotated once (by both annotators together)
  - Segment grouping: Kappa 0.8424
  - Relation identification and typing: Kappa 0.8355

Factors Involved in Identifying Coherence Relations

- Proximity
  - E.g. attribution is local, elaboration non-local
- Lexical and phrasal cues
  - Constrain possible relation types
    - "But" => 'contrast', 'violated expectation'
    - "And" => 'elaboration', 'similar', 'contrast'
- Co-reference
  - Coherence is established with references to mentioned entities/events
  - E.g. similar => similar/same event and/or participants
- Lexical knowledge
  - Argument structure
  - Type inclusion, word sense
  - Qualia (purpose of an object, resulting state of an action), event structure
  - Paraphrases: delay => arrive late
- World knowledge
  - E.g. Ukraine is part of Europe
Architecture

(Diagram: pre-processing feeds Knowledge Sources 1..n, whose output goes to a Feature Constructor; the resulting features are used to train the Model, which produces classifications at prediction time.)
Scenario Extraction: MUC

- Pull together relevant facts related to a "complex event"
  - Management succession
  - Mergers and acquisitions
  - Natural disasters
  - Satellite launches
- Requires identifying relations between events:
  - Parallel, cause-effect, elaboration
  - Also: identity, part-of
- Hypothesis: task-independent identification of discourse relations will allow rapid development of scenario extraction systems
Information Extraction: Current

(Diagram: pre-processing and fact extraction feed scenario-extraction components built separately per domain, each with its own tasks: Domain 1 with Task 1.1 ... Task 1.N, Domain 2 with Task 2.1 ... Task 2.N, up to Domain N.)
Information Extraction: Future

(Diagram: pre-processing and fact extraction feed a shared discourse component.)