Classification of Discourse Coherence Relations: An Exploratory Study using Multiple Knowledge Sources

Ben Wellner†*, James Pustejovsky†, Catherine Havasi†, Anna Rumshisky† and Roser Saurí†
† Brandeis University
* The MITRE Corporation

Outline of Talk
- Modeling discourse: overview and motivation
  - Background
  - Objectives
- The Discourse GraphBank
  - Overview
  - Coherence relations
  - Issues with the GraphBank
- Machine learning approach
  - Knowledge sources and features
- Experiments and analysis
- Conclusions and future work

Modeling Discourse: Motivation
- Why model discourse?
  - Dialogue
  - General text understanding applications
  - Text summarization and generation
  - Information extraction (e.g. the MUC Scenario Template task)
- Discourse is vital for understanding how events are related
- Modeling discourse generally may aid specific extraction tasks

Background
- Approaches to discourse differ along several dimensions:
  - Objectives: coarse- vs. fine-grained
  - Representations: informational vs. intentional; dialogue vs. general text; trees vs. graphs
  - Inventories of discourse relations
  - Semantics/formalisms: Hobbs [1985], Mann and Thompson [1987], Grosz and Sidner [1986], Asher [1993], others
- The same steps are involved in all of them:
  1. Identifying discourse segments
  2. Grouping discourse segments into sequences
  3. Identifying the presence of a relation
  4. Identifying the type of the relation

Discourse Steps: Example 1*
1. Segment: [A Mary is in a bad mood] [B because Fred played tuba] [C while she was taking a nap]
2. Group the segments into sequences
3. Connect the segments with relations r1 and r2
4. Identify the relation types: r1 = cause-effect, r2 = elaboration
* Example from [Danlos 2004]

Discourse Steps: Example 2*
1. Segment: [A Fred played the tuba.] [B Next he prepared a pizza] [C to please Mary.]
2. Group the segments into sequences
3. Connect the segments with relations r1 and r2
4. Identify the relation types: r1 = temporal precedence, r2 = cause-effect
* Example from [Danlos 2004]

Objectives
- Our main focus is Step 4: classifying discourse relations
  - Important for all approaches to discourse
  - Can be approached independently of representation
  - But: relation types and structure are probably quite dependent
  - The task will vary with the inventory of relation types
  - What types of knowledge/features are important for this task?
- Can we apply the same approach to Step 3, identifying whether two segment groups are linked?

Discourse GraphBank: Overview
- A graph-based representation of discourse [Wolf and Gibson, 2005]
  - Discourse is composed of clausal segments
  - Segments can be grouped into sequences
  - Relations need not exist between segments within a group
  - Coherence relations hold between segment groups; the inventory is roughly that of Hobbs [1985]
- A tree representation is inadequate: multiple parents and crossing dependencies occur

Why GraphBank?
- Inventory of relations similar to SDRT
  - Linked to lexical representations
  - Well-developed semantics
- Includes non-local discourse links
- An existing annotated corpus, unexplored outside of [Wolf and Gibson, 2005]

Resemblance Relations
- Similarity (parallel): "The first flight to Frankfurt this morning was delayed. The second flight arrived late as well."
- Contrast: "The first flight to Frankfurt this morning was delayed. The second flight arrived on time."
- Example: "There have been many previous missions to Mars. A famous example is the Pathfinder mission."
- Generalization: "Two missions to Mars in 1999 failed. There are many missions to Mars that have failed."
- Elaboration*: "A probe to Mars was launched from the Ukraine this week. The European-built “Mars Express” is scheduled to reach Mars by Dec."

* The elaboration relation is given one or more sub-types: organization, person, location, time, number, detail
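To make the graph representation concrete, here is a minimal sketch (ours, not the authors' code) of a GraphBank-style structure in Python: segments, and typed relations between groups of segments. The class names are illustrative, and since the slide's arc diagram did not survive extraction, the specific attachments encoded for Example 1 are one plausible reading of the sentence.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Segment:
    """A clausal discourse segment (Step 1)."""
    sid: str
    text: str

@dataclass(frozen=True)
class Relation:
    """A typed coherence relation between two segment groups (Steps 3 and 4)."""
    source: tuple  # segment ids in the source group
    target: tuple  # segment ids in the target group
    rtype: str     # e.g. "cause-effect", "elaboration"

@dataclass
class DiscourseGraph:
    segments: dict = field(default_factory=dict)
    relations: list = field(default_factory=list)

    def add_segment(self, sid, text):
        self.segments[sid] = Segment(sid, text)

    def relate(self, source, target, rtype):
        # A graph, not a tree: nothing here forbids multiple parents
        # or crossing dependencies.
        self.relations.append(Relation(tuple(source), tuple(target), rtype))

# Example 1 from the slides (one plausible attachment of r1 and r2):
g = DiscourseGraph()
g.add_segment("A", "Mary is in a bad mood")
g.add_segment("B", "because Fred played tuba")
g.add_segment("C", "while she was taking a nap")
g.relate(["B"], ["A"], "cause-effect")  # r1
g.relate(["C"], ["B"], "elaboration")   # r2
```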
Causal, Temporal and Attribution Relations
- Causal
  - Cause-effect: "There was bad weather at the airport and so our flight got delayed."
  - Conditional: "If the new software works, everyone should be happy."
  - Violated expectation: "The new software worked great, but nobody was happy."
- Temporal
  - Temporal precedence: "First, John went grocery shopping. Then, he disappeared into a liquor store."
- Attribution
  - Attribution: "John said that the weather would be nice tomorrow."
  - Same: "The economy, according to analysts, is expected to improve by early next year."

Some Issues with GraphBank
- Coherence relations: actual causation and intention/purpose are conflated
  - "John pushed the door to open it." (cause)
  - "The university spent $30,000 to upgrade lab equipment in 1987." (cause?? elaboration?)
- Granularity: it is desirable for relations to hold between eventualities or entities, not necessarily entire clausal segments:
  "the new policy came about after President Reagan’s historic decision in mid-December to reverse the policy of refusing to deal with members of the organization, long shunned as a band of terrorists. Reagan said PLO chairman Yasser Arafat had met US demands."

A Classifier-based Approach
- For each pair of discourse segments on which we know a relation exists, classify the relation type between them
- Advantages:
  - Arbitrary knowledge sources can be included as features
  - Easier than implementing inference on top of semantic interpretations
  - Robust performance
  - Gives insight into how different knowledge sources contribute
- Disadvantages:
  - Difficult to determine why mistakes happen
- Maximum Entropy: a commonly used discriminative classifier that allows a large number of non-independent features

Knowledge Sources
- Proximity
- Cue words
- Lexical similarity
- Events
- Modality and subordinating relations
- Grammatical relations
- Temporal relations
Each knowledge source is associated with one or more feature classes. The slides that follow illustrate each feature class on one running example pair:
  SEG2: "The university spent $30000"
  SEG1: "to upgrade lab equipment in 1987"

Proximity
- Motivation:
  - Some relations tend to be local, i.e. their arguments appear nearby in the text: attribution, cause-effect, temporal precedence, violated expectation
  - Other relations can span larger portions of text: elaboration, similarity, contrast
- Proximity feature class:
  - Whether the segments are adjacent or not
  - Directionality (which argument appears earlier in the text)
  - Number of intervening segments
- Example features: adjacent; dist<3; dist<5; direction-reverse; same-sentence

Cue Words
- Motivation:
  - Many coherence relations are frequently signaled by a discourse cue word or phrase: “therefore”, “but”, “in contrast”
  - Cues are generally captured by the first word in a segment, which obviates enumerating all potential cue words
  - Non-traditional discourse markers (e.g. adverbials or even determiners) may indicate a preference for certain relation types
- Cue Words feature class:
  - The first word in each segment
- Example features: First1=“to”; First2=“The”
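To ground these two feature classes, here is a minimal sketch (our own; the function name and segment encoding are assumptions, not the paper's code) that computes the Proximity and Cue Words features for a segment pair; on the running example it reproduces the feature values shown in the slides.

```python
def proximity_cue_features(segs, i, j):
    """Proximity and Cue Words features for the pair (segs[i], segs[j]).

    `segs` is a document's segments in textual order; each is a dict with
    "tokens" and "sentence" (sentence index). SEG1 = segs[i], SEG2 = segs[j].
    """
    feats = {}
    dist = abs(i - j)
    # Proximity feature class
    if dist == 1:
        feats["adjacent"] = 1.0
    if dist < 3:
        feats["dist<3"] = 1.0
    if dist < 5:
        feats["dist<5"] = 1.0
    if i > j:  # SEG1 appears later in the text than SEG2
        feats["direction-reverse"] = 1.0
    if segs[i]["sentence"] == segs[j]["sentence"]:
        feats["same-sentence"] = 1.0
    # Cue Words feature class: the first word of each segment
    feats['First1="%s"' % segs[i]["tokens"][0]] = 1.0
    feats['First2="%s"' % segs[j]["tokens"][0]] = 1.0
    return feats

# The running example: SEG2 precedes SEG1 in the text.
segs = [
    {"tokens": ["The", "university", "spent", "$30000"], "sentence": 0},
    {"tokens": ["to", "upgrade", "lab", "equipment", "in", "1987"], "sentence": 0},
]
print(proximity_cue_features(segs, 1, 0))
```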
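The slides specify a maximum entropy classifier with a Gaussian prior but do not name an implementation. One plausible stand-in is multinomial logistic regression over such feature dictionaries, sketched here with scikit-learn; the L2 penalty plays the role of the Gaussian prior, and the toy data is purely illustrative.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy feature dicts for two segment pairs with known relation types.
train_pairs = [
    {"adjacent": 1.0, "same-sentence": 1.0, 'First1="to"': 1.0, 'First2="The"': 1.0},
    {"dist<5": 1.0, 'First1="but"': 1.0, 'First2="John"': 1.0},
]
train_labels = ["elaboration", "contrast"]

# Multinomial logistic regression is equivalent to a maximum entropy model;
# C loosely corresponds to the Gaussian prior variance reported later.
model = make_pipeline(DictVectorizer(), LogisticRegression(C=2.0, max_iter=1000))
model.fit(train_pairs, train_labels)
print(model.predict([{"adjacent": 1.0, 'First1="to"': 1.0}]))
```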
Lexical Coherence
- Motivation: identify lexical associations and lexical/semantic similarities, e.g. push/fall, crash/injure, lab/university
- Brandeis Semantic Ontology (BSO)
  - A taxonomy of types (i.e. senses)
  - Includes qualia information for words: telic (purpose), agentive (creation), constitutive (parts)
- Word Sketch Engine (WSE)
  - Similarity of words as measured by their contexts in a corpus (the BNC)
- BSO feature class:
  - Paths between words, up to length 10
- WSE feature class:
  - Number of word pairs with similarity > 0.05, > 0.01
  - Segment similarities (sum of word-pair similarities / # words)
- Example features: BSO: Research Lab=>Educational Activity=>University; WSE: WSE>0.05; WSE-sentence-similarity=0.005417

Events
- Motivation:
  - Certain events and event pairs are indicative of certain relation types (e.g. "push"-"fall": cause)
  - Allow the learner to associate events and event pairs with particular relation types
- Evita: EVents In Text Analyzer
  - Performs domain-independent identification of events
  - Identifies all event-referring expressions (those that can be temporally ordered)
- Events feature class:
  - Event mentions in each segment
  - Event-mention pairs drawn from both segments
- Example features: Event1=“upgrade”; Event2=“spent”; event-pair=“upgrade-spent”

Modality and Subordinating Relations
- Motivation: event modality and subordinating relations are indicative of certain relations, e.g. evidential => attribution
- SlinkET [Saurí et al. 2006]
  - Identifies subordinating contexts and classifies them as factive, counter-factive, evidential, negative evidential, or modal
  - Also supplies event class, polarity, tense, etc.
- SlinkET feature class:
  - Event class, polarity, tense and modality of the events in each segment
  - Subordinating relations between event pairs
- Example features: Class1=“occurrence”; Class2=“occurrence”; Tense1=“infinitive”; Tense2=“past”; modal-relation
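The WSE segment-similarity features reduce to a simple computation once word-pair similarities are available. The sketch below is our own illustration: it assumes a precomputed word-pair similarity lookup standing in for Word Sketch Engine thesaurus scores, and all names are illustrative.

```python
from itertools import product

def wse_features(words1, words2, sim, thresholds=(0.05, 0.01)):
    """WSE-style lexical-similarity features for a segment pair.

    `sim` maps a (word, word) pair to a distributional similarity score;
    it stands in here for Word Sketch Engine lookups over the BNC.
    """
    feats = {}
    pair_sims = [sim.get((w1, w2), 0.0) for w1, w2 in product(words1, words2)]
    # Number of word pairs whose similarity clears each threshold
    for t in thresholds:
        count = sum(s > t for s in pair_sims)
        if count:
            feats["WSE>%s" % t] = float(count)
    # Segment similarity: sum of word-pair similarities / number of words
    n_words = len(words1) + len(words2)
    feats["WSE-sentence-similarity"] = sum(pair_sims) / max(n_words, 1)
    return feats

# Toy similarity lookup standing in for WSE scores:
sim = {("lab", "university"): 0.07, ("upgrade", "spent"): 0.02}
print(wse_features(["upgrade", "lab", "equipment"], ["university", "spent"], sim))
```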
Cue Words and Events
- Motivation:
  - Certain events (event types) are likely to appear in particular discourse contexts keyed by certain connectives
  - Pairing connectives with events captures this more precisely than connectives or events on their own
- CueWords + Events feature class:
  - The first word of SEG1 paired with each event mention in SEG2
  - The first word of SEG2 paired with each event mention in SEG1
- Example features: First1=“to”-Event2=“spent”; First2=“The”-Event1=“upgrade”

Grammatical Relations
- Motivation:
  - Certain intra-sentential relations are captured, or ruled out, by particular dependency relations between clausal headwords
  - Identification of headwords is also important: main events are identified
- Grammatical relations are obtained from the RASP parser
- Syntax feature class:
  - Grammatical relations (GRs) between the two segments
  - GR + SEG1 headword; GR + SEG2 headword; GR + both headwords
- Example features: Gr=“ncmod”; Gr=“ncmod”-Head1=“equipment”; Gr=“ncmod”-Head2=“spent”

Temporal Relations
- Motivation: the temporal ordering between events constrains the possible coherence relations, e.g. E1 BEFORE E2 => NOT(E2 CAUSE E1)
- Temporal relation classifier trained on TimeBank 1.2 using MaxEnt; see [Mani et al., "Machine Learning of Temporal Relations", ACL 2006]
- TLink feature class:
  - Temporal relations holding between the segments
- Example feature: Seg2-before-Seg1

All feature classes together on the running example (SEG2: “The university spent $30000”; SEG1: “to upgrade lab equipment in 1987”):

Feature Class     Example Features
Proximity         adjacent; dist<3; dist<5; direction-reverse; same-sentence
Cue Words         First1=“to”; First2=“The”
BSO               Research Lab=>Educational Activity=>University
WSE               WSE>0.05; WSE-sentence-similarity=0.005417
Events            Event1=“upgrade”; Event2=“spent”; event-pair=“upgrade-spent”
SlinkET           Class1=“occurrence”; Class2=“occurrence”; Tense1=“infinitive”; Tense2=“past”; modal-relation
CueWord + Events  First1=“to”-Event2=“spent”; First2=“The”-Event1=“upgrade”
Syntax            Gr=“ncmod”; Gr=“ncmod”-Head1=“equipment”; Gr=“ncmod”-Head2=“spent”
TLink             Seg2-before-Seg1
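The system uses TLink output only as features, but the motivating constraint (E1 BEFORE E2 rules out E2 CAUSE E1) is easy to state directly. The small check below is our own illustration, not part of the described system; the tuple encodings are assumptions.

```python
def consistent(tlink, coherence):
    """Reject coherence relations that contradict the temporal order.

    `tlink` is a triple like ("e1", "BEFORE", "e2") from a temporal
    relation classifier; `coherence` is a candidate typed relation
    like ("e2", "cause-effect", "e1"), read source-type-target.
    """
    t_src, t_rel, t_tgt = tlink
    c_src, c_rel, c_tgt = coherence
    if t_rel == "BEFORE" and c_rel == "cause-effect":
        # A cause-effect relation whose cause is the *later* event
        # violates E1 BEFORE E2 => NOT(E2 CAUSE E1).
        if c_src == t_tgt and c_tgt == t_src:
            return False
    return True

assert consistent(("e1", "BEFORE", "e2"), ("e1", "cause-effect", "e2"))
assert not consistent(("e1", "BEFORE", "e2"), ("e2", "cause-effect", "e1"))
```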
Relation Classification
- Task: identify the specific coherence relation, using the Maximum Entropy classifier (Gaussian prior variance = 2.0)
- Evaluation methodology:
  - 8-fold cross-validation
  - Elaboration sub-types are ignored (too sparse); all elaboration relations are classified simply as elaboration
- Specific-relation accuracy: 81.06%
  - Inter-annotator agreement: 94.6%
  - Majority-class baseline: 45.7%
- Coarse-grained relation (resemblance, cause-effect, temporal, attributive) accuracy: 87.51%

F-Measure Results

Relation              Precision  Recall  F-measure  # true positives
elaboration               88.72   95.31      91.90   512
attribution               91.14   95.10      93.09   184
similar (parallel)        71.89   83.33      77.19   132
same                      87.09   75.00      80.60    72
cause-effect              78.78   41.26      54.16    63
contrast                  65.51   66.67      66.08    57
example                   78.94   48.39      60.00    31
temporal precedence       50.00   20.83      29.41    24
violated expectation      33.33   16.67      22.22    12
conditional               45.45   62.50      52.63     8
generalization                0       0          0     0

Results: Confusion Matrix (rows = reference, columns = hypothesis)

       elab  par  attr  ce  temp  contr  same  exmp  expv  cond  gen
elab    488    3     7   3     1      0     2     4     0     3    1
par       6  110     2   2     0      8     2     0     0     2    0
attr      4    0   175   0     0      1     2     0     1     1    0
ce       18    9     3  26     3      2     2     0     0     0    0
temp      6    8     2   0     5      3     0     0     0     0    0
contr     4   12     0   0     0     38     0     0     3     0    0
same      3    9     2   2     0      2    54     0     0     0    0
exmp     15    1     0   0     0      0     0    15     0     0    0
expv      3    1     1   0     1      4     0     0     2     0    0
cond      3    0     0   0     0      0     0     0     0     5    0
gen       0    0     0   0     0      0     0     0     0     0    0

Feature Class Analysis
- What is the utility of each feature class? The features overlap significantly and are highly correlated.
- How can we estimate utility?
  - Independently: start with the Proximity feature class as a baseline, add each feature class separately, and measure the improvement over the baseline
  - In combination with other features: start with all features, remove each feature class individually, and measure the reduction caused by its removal
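This add-one/remove-one protocol is straightforward to script. The sketch below is ours, under two assumptions: features are grouped into per-class dicts for each segment pair, and an `evaluate` callable runs the cross-validated classifier and returns accuracy (a dummy stands in for it here).

```python
def feature_subset(pairs, classes):
    """Merge the feature dicts of the selected feature classes.

    Each element of `pairs` maps a feature-class name (e.g. "Proximity")
    to that class's feature dict for one segment pair.
    """
    return [{k: v for c in classes for k, v in p[c].items()} for p in pairs]

def analyze(pairs, labels, all_classes, evaluate):
    # In isolation: the Proximity baseline plus one feature class at a time.
    for c in all_classes:
        if c != "Proximity":
            acc = evaluate(feature_subset(pairs, ["Proximity", c]), labels)
            print("+ %s: %.2f%%" % (c, 100 * acc))
    # In conjunction: all features minus one feature class at a time.
    for c in all_classes:
        others = [o for o in all_classes if o != c]
        acc = evaluate(feature_subset(pairs, others), labels)
        print("- %s: %.2f%%" % (c, 100 * acc))

# Dummy evaluator standing in for the 8-fold cross-validated classifier:
analyze([], [], ["Proximity", "CueWords"], evaluate=lambda X, y: 0.5)
```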
Feature Class Analysis Results

In isolation (Proximity baseline plus one feature class):

Feature Class         Accuracy  Coarse-grain Acc.
Proximity (baseline)    60.08%             69.43%
+ CueWords              76.77%             83.50%
+ BSO                   62.92%             74.40%
+ WSE                   62.20%             70.10%
+ Events                63.84%             78.16%
+ SlinkET               69.00%             75.91%
+ CueWord / Event       67.18%             78.63%
+ Syntax                70.30%             80.84%
+ TLink                 64.19%             72.30%

In conjunction (all features minus one feature class):

Feature Class         Accuracy  Coarse-grain Acc.
All features            81.06%             87.51%
- Proximity             71.52%             84.88%
- CueWords              75.71%             84.69%
- BSO                   80.65%             87.04%
- WSE                   80.26%             87.14%
- Events                80.90%             86.92%
- SlinkET               79.68%             86.89%
- CueWord / Event       80.41%             87.14%
- Syntax                80.20%             86.89%
- TLink                 80.30%             87.36%

(The charts "Feature Class Contributions in Isolation" and "Feature Class Contributions in Conjunction" plot these figures.)

Relation Identification
- Given discourse segments (and segment sequences), identify for each pair of segments whether a relation (any relation) exists between them
- Two issues:
  - Highly skewed classification: many negatives, few positives
  - Many of the relations are transitive; these aren't annotated and will appear as false-negative instances

Relation Identification Results
- For all pairs of segment sequences in a document, using the same features as for classification, accuracy is only slightly above the majority-class baseline
- For segment pairs in the same sentence: accuracy 70.04% (baseline 58%)
- Identification and classification in the same sentence: accuracy 64.53% (baseline 58%)

Inter-relation Dependencies
- A relation shouldn't be identified in isolation: when identifying a relation between s_i and s_j, consider the other relations involving s_i and s_j, i.e. {R(s_i, s_k) | k ≠ j} and {R(s_j, s_l) | l ≠ i}
- Include as features the other (gold-standard) relation types that both segments are involved in
- Adding this feature class improves performance to 82.3%, a 6.3% error reduction
- This indicates room for improvement from:
  - Collective classification (where outputs influence each other)
  - Incorporating explicit modeling constraints: a tree-based parsing model, or constrained DAGs [Danlos 2004]
  - Including or deducing transitive links may help further
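As an illustration of this neighbor-relation feature class (our own sketch; the function and feature names are assumptions), the features for a pair (s_i, s_j) can be augmented with the types of the gold relations touching either segment:

```python
def neighbor_relation_features(i, j, gold_relations):
    """Types of other gold relations involving segment i or segment j.

    `gold_relations` is a list of (src, tgt, rtype) triples over segment
    indices. Implements the sets {R(s_i, s_k) | k != j} and
    {R(s_j, s_l) | l != i} as bag-of-types features.
    """
    feats = {}
    for src, tgt, rtype in gold_relations:
        pair = {src, tgt}
        if pair == {i, j}:
            continue  # skip the relation currently being classified
        if i in pair:
            feats["rel-with-si=%s" % rtype] = 1.0
        if j in pair:
            feats["rel-with-sj=%s" % rtype] = 1.0
    return feats

print(neighbor_relation_features(
    1, 2, [(0, 1, "attribution"), (1, 2, "elaboration"), (2, 3, "contrast")]))
# {'rel-with-si=attribution': 1.0, 'rel-with-sj=contrast': 1.0}
```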
Conclusions
- The classification approach with many features achieves good performance at classifying coherence relation types
- All feature classes are helpful, but the discriminative power of most individual feature classes is captured by the union of the remaining classes
  - Proximity + CueWords alone achieves 76.77%; the remaining features reduce error by 23.7%
- The classification approach performs less well at identifying the presence of a relation, using the same features as for classifying relation types; "parsing" may prove better for local relationships

Future Work
- Additional linguistic analysis:
  - Co-reference, for both entities and events
  - Word sense
- A pipelined or "stacked" architecture: classify the coarse-grained category first, then the specific coherence relation
  - Justification: different categories require different types of knowledge, and in relation classification lexical similarity is confounded by a lexeme having multiple types
- Model decisions collectively, including constraints on structure
- Investigate transitivity of resemblance relations
- Consider other approaches for identification of relations

Questions?

Backup Slides

GraphBank Annotation Statistics
- 135 doubly annotated newswire articles
- Identifying discourse segments had high agreement (> 90% in a pilot study of 10 documents); the corpus segments were ultimately annotated once, by both annotators together
- Segment grouping: Kappa 0.8424
- Relation identification and typing: Kappa 0.8355

Factors Involved in Identifying Coherence Relations
- Proximity: e.g. attribution is local, elaboration non-local
- Lexical and phrasal cues constrain the possible relation types:
  - "but" => contrast, violated expectation
  - "and" => elaboration, similarity, contrast
- Co-reference: e.g. similarity => similar/same events and/or participants; coherence is established with references to already-mentioned entities/events
- Lexical knowledge:
  - Argument structure
  - Type inclusion, word sense
  - Qualia (the purpose of an object, the resulting state of an action), event structure
  - Paraphrases: delay => arrive late
- World knowledge: e.g. the Ukraine is part of Europe

Architecture
(Diagram: pre-processing feeds knowledge sources 1 through n; a feature constructor combines their output into features; a model is trained on these and produces classifications at prediction time.)

Scenario Extraction: MUC
- Pull together relevant facts related to a "complex event": management succession, mergers and acquisitions, natural disasters, satellite launches
- Requires identifying relations between events: parallel, cause-effect, elaboration; also identity and part-of
- Hypothesis: task-independent identification of discourse relations will allow rapid development of scenario extraction systems

Information Extraction: Current
(Diagram: after pre-processing and fact extraction, separate scenario-extraction tasks (Task 1.1 through Task N.N) are built for each domain.)

Information Extraction: Future
(Diagram: pre-processing, fact extraction, and a shared discourse component replace the per-domain pipelines.)