Computational Models of Discourse and Dialogue 2011: Conversation in Social Media
Natural Language and Dialogue Systems Lab

Persuasion in Social Media
  Persuasion and argumentation in social media websites and forums

NATURAL LANGUAGE AND DIALOGUE SYSTEMS LAB UC SANTA CRUZ

NLDS Social Media Dialogue Data
  Data collected in the last year in collaboration with FoxTree’s Lab & Anand’s SemLab
  Convinceme.net
  4forums.org
  Carm.org

Using Mechanical Turk to get labels
  http://pcon.soe.ucsc.edu/mturk_external/123/123.php?pageId=1597&assignmentId=ASSIGNMENT_ID_NOT_AVAILABLE&hitId=1HNBWKACQBSEV0YDIOYSBWM1C0YNIP
  http://pcon.soe.ucsc.edu/mturk_external/qr/qr.php?pageId=1398&assignmentId=ASSIGNMENT_ID_NOT_AVAILABLE&hitId=1CEJFP6T9BRSEF7QNPYEV9U37T7Y6W

Classic Models of Discourse and Dialogue Structure (Task Oriented Dialog, Newspaper texts)
  Marilyn Walker. CS245. April 1st, 2010

Dialogue Processing (circa 1988)
  Grosz & Sidner 1986: Planning, Grice
  Mann & Thompson 1988: Rhetorical Relations, Text Structure
  Polanyi 1984: Linguistic Discourse Model
  Hobbs 1979: Coherence Relations

Dialogue Processing (circa 1988)
  Me, 1989: starting my Ph.D. with Aravind Joshi and Ellen Prince
  Science IS NOT a belief system => Empirical Methods in Discourse

Empirical/Statistical Approaches in NLP
  Penn Treebank first available ~1990
  Plenty of data for parsing and POS
  But what about language behavior above the sentence?
  What about interactive language?
  1993: NSF Workshop on Centering in Naturally Occurring Discourse => Walker, Joshi & Prince 1997
  1995: AAAI Workshop on Empirical Methods in Discourse => Walker & Moore CL special issue
  1996: NSF Workshop on Discourse & Dialogue Tagging => DAMSL markup
  NOW: there is virtually no work in NLP on discourse and dialogue that is not corpus based/empirical.

What is a dialogue model?
  A model is an abstraction of a thing, simplified or dimensionally reduced.
  A good model should be simpler but capture the essence of the real thing.
  A good dialogue model should be testable. It should make predictions. Its claims should be such that one can show whether or not they are correct.
  A good dialogue model should lead to results that are more generalizable.

Dialogue Structure
  What makes a text coherent?
  What are discourse structures?
  Theories of discourse structures
  Approaches to build discourse structures

Discourse Coherence
  Example:
    (1) John hid Bill’s car keys. (2) He was drunk.
    (1) John hid Bill’s car keys. (2) He likes junk food.
    (1) George Bush supports big business. (2) He’s sure to veto House Bill 1711.
  Hearers try to find connections between utterances in a discourse.
  The possible connections between utterances can be specified as a set of coherence relations.

Coherence relations (Hobbs, 1979)
  Result: S0 causes S1.
    John bought an Acura. His father went ballistic.
  Explanation: S1 causes S0.
    John hid Bill’s car keys. He was drunk.
  Parallel: S0 and S1 are parallel.
    John bought an Acura. Bill bought a BMW.
  Elaboration: S1 is an elaboration of S0.
    John bought an Acura this weekend. He purchased it for $40 thousand.
  …

Discourse structure
  S1: John took a train to Bill’s car dealership.
  S2: He needed to buy a car.
  S3: The company he works for now isn’t near any public transportation.
  S4: He also wanted to talk to Bill about their softball leagues.
  [Slide build: brackets mark S2 and S3 as Explanation, then S2-S3 and S4 as Parallel, then S1 and S2-S4 as Explanation.]

Discourse parsing
  Explanation (e1)
    S1 (e1)
    Parallel (e2; e4)
      Explanation (e2)
        S2 (e2)
        S3 (e3)
      S4 (e4)

Why compute discourse structure?
  Natural language understanding
  Summarization
  Information retrieval
  Natural language generation
  Reference resolution

Theories of discourse structure
  Mann and Thompson’s Rhetorical structure theory (1988)
  Grosz and Sidner’s Attention, intention and structure of discourse (1986)
  Discourse TAG. Penn Discourse Treebank (PDTB)
  We will read a lot of papers using DTAG and PDTB, so I am just going to talk about the ‘classic theories’ today.

Rhetorical structure theory (RST)
  Mann and Thompson (1988)
  One theory of discourse structure, based on identifying relations between parts of the text:
    Defined 20+ rhetorical relations
      Presentational relations: intentional
      Subject matter relations: informational
    Nucleus: central segment of text
    Satellite: more peripheral segment
  Relation definitions and more.
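The discourse parse of S1 to S4 shown earlier can be written down as a small tree data structure. The following is a minimal sketch, not a parser: the class names `Segment` and `Relation` and the helpers `leaves` and `render` are hypothetical names chosen for illustration, and the tree is entered by hand from the slide.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Segment:
    """A leaf: one discourse segment such as S1."""
    label: str
    text: str


@dataclass
class Relation:
    """An internal node: a coherence relation holding over its children."""
    name: str
    children: List[Union["Segment", "Relation"]]


def leaves(node):
    """Return the segment labels under a node in left-to-right order."""
    if isinstance(node, Segment):
        return [node.label]
    labels = []
    for child in node.children:
        labels.extend(leaves(child))
    return labels


def render(node):
    """Render the tree as a bracketed string."""
    if isinstance(node, Segment):
        return node.label
    inner = ", ".join(render(child) for child in node.children)
    return f"{node.name}({inner})"


s1 = Segment("S1", "John took a train to Bill's car dealership.")
s2 = Segment("S2", "He needed to buy a car.")
s3 = Segment("S3", "The company he works for now isn't near any public transportation.")
s4 = Segment("S4", "He also wanted to talk to Bill about their softball leagues.")

# Hand-built tree mirroring the "Discourse parsing" slide.
tree = Relation("Explanation", [
    s1,
    Relation("Parallel", [Relation("Explanation", [s2, s3]), s4]),
])

print(render(tree))
print(leaves(tree))
```

An RST-style tree would additionally mark which child of each relation is the nucleus and which is the satellite; that is left out here to keep the sketch close to the slide.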

Presentational (intentional) relations
  Those whose intended effect is to increase some inclination in the hearer.
  Relations:
    Antithesis
    Background
    Concession
    Enablement
    Evidence
    Justify
    Motivation
    Preparation
    Restatement
    Summary

Subject matter (informational) relations
  Those whose intended effect is that the hearer recognize the relation in question.
  Relations:
    Circumstance
    Condition
    Elaboration
    Evaluation
    Interpretation
    Means
    Non-volitional cause
    Non-volitional result
    Otherwise
    Purpose
    Solutionhood
    Unconditional
    Unless
    Volitional cause
    Volitional result

Multinuclear relations
  Contrast
  Joint
  List
  Multinuclear restatement
  Sequence

Some examples
  Explanation: John went to the coffee shop. He was sleepy.
  Elaboration: John likes coffee. He drinks it every day.
  Contrast: John likes coffee. Mary hates it.

Discourse structure
  [Figure: tree over the segments “John likes coffee”, “He drinks it every day”, “Mary hates coffee”, “They argue a lot”.]

A relation: Evidence
  (a) George Bush supports big business.
  (b) He’s sure to veto House Bill 1711.
  Relation Name: Evidence
  Constraints on Nucl: H might not believe Nucl to a degree satisfactory to S.
  Constraints on Sat: H believes Sat or will find it credible.
  Constraints on Nucl+Sat: H’s comprehending Sat increases H’s belief of Nucl.
  Effect: H’s belief of Nucl is increased.

A relation: Volitional-Cause
  (a) George Bush supports big business.
  (b) He’s sure to veto House Bill 1711.
  Relation Name: Volitional-Cause
  Constraints on Nucl: presents a volitional action
  Constraints on Sat: none.
  Constraints on Nucl+Sat: Sat presents a situation that could have caused the agent of the volitional action in Nucl to perform the action.
  Effect: H recognizes the situation presented in Sat as a cause for the volitional action presented in Nucl.

Another example
  S: (a) Come home by 5:00.
     (b) Then we can go to the hardware store before it closes.
     (c) That way we can finish the bookshelves tonight.
  [Figure: two analyses of (a), (b), (c), one linking the segments by motivation relations and one by condition relations.]

A Problem with RST (Moore & Pollack, 1992)
  How many rhetorical relations are there?
  How can we use RST in dialogues?
  How do we incorporate speaker intentions into RST?
  RST does not allow for multiple relations between parts of a discourse: informational and intentional levels must coexist.

Grosz & Sidner (1986)

Grosz and Sidner (1986)
  Three components:
    Linguistic structure
    Intentional structure
    Attentional state

Linguistic structure
  The structure of the sequence of utterances that comprises a discourse.
  Utterances form discourse segments (DSs), and a discourse is made up of embedded DSs.
  What exactly is a DS?
  Is there any evidence that humans naturally recognize segment boundaries?
  Do humans agree on segment boundaries?
  How can the boundaries be found automatically?

Intentional structure
  Speakers in a discourse may have many intentions, public or private.
  Discourse purpose (DP): the intention that underlies engaging in a discourse.
  Discourse segment purpose (DSP): the purpose of a DS, i.e. how the segment contributes to achieving the overall DP.
  Two relations between DSPs:
    Dominance: if DSP1 contributes to DSP2, we say DSP2 dominates DSP1.
    Satisfaction-precedence: DSP1 must be satisfied before DSP2.
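The two relations between DSPs can be sketched as a pair of small graphs. This is an illustrative sketch only: the class `IntentionalStructure`, its method names, and the labels DSP0 to DSP2 are hypothetical, and treating dominance as transitive is an assumption of the sketch rather than something stated on the slide.

```python
from collections import defaultdict


class IntentionalStructure:
    """Stores dominance and satisfaction-precedence links between DSPs."""

    def __init__(self):
        # dominating DSP -> set of DSPs that contribute directly to it
        self.dominates = defaultdict(set)
        # DSP -> set of DSPs that may only be satisfied after it
        self.precedes = defaultdict(set)

    def add_dominance(self, dominating, dominated):
        self.dominates[dominating].add(dominated)

    def add_satisfaction_precedence(self, first, second):
        self.precedes[first].add(second)

    def all_dominated(self, dsp):
        """DSPs dominated by `dsp`, directly or indirectly (assumes transitivity)."""
        seen = set()
        stack = list(self.dominates.get(dsp, ()))
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(self.dominates.get(current, ()))
        return seen

    def violations(self, order):
        """Pairs (first, second) whose required order is reversed in `order`."""
        position = {dsp: i for i, dsp in enumerate(order)}
        bad = []
        for first in sorted(self.precedes):
            for second in sorted(self.precedes[first]):
                if first in position and second in position:
                    if position[first] > position[second]:
                        bad.append((first, second))
        return bad


structure = IntentionalStructure()
structure.add_dominance("DSP0", "DSP1")
structure.add_dominance("DSP1", "DSP2")
structure.add_satisfaction_precedence("DSP1", "DSP2")

print(structure.all_dominated("DSP0"))
print(structure.violations(["DSP2", "DSP1"]))
```

The `violations` check only inspects a proposed order of satisfaction; it does not try to recognize intentions from utterances, which is the hard part of the theory.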

Attentional State
  The attentional state is an abstraction of the participants’ focus of attention as their discourse unfolds.
  The state is a stack of focus spaces.
  A focus space (FS) is associated with a DS, and it contains the DSP and the objects, properties, and relations salient in the DS.
  When a DS ends, its FS is popped.
  When a DS starts, its FS is pushed onto the stack.

An example
  C1: I need to travel in May.
  A1: And, what day in May do you want to travel?
  C2: I need to be there for a meeting on 15th.
  A2: And you are flying into what city?
  C3: Seattle.
  A3: And what time would you like to leave Pittsburgh?
  C4: Hmm. I don’t think there are many options for non-stop.
  A4: There are three non-stops today.
  C5: What are they?
  ….
  [Figure: the turns are bracketed into segments DS1 to DS5 inside DS0.]

Discourse structure with intention info
  DS0
    DS1: C1
    DS2: A1-C2
    DS3: A2-C3
    DS4: A3
    DS5: C4-C7
  I0: C wants A to find a flight for C.
  I1: C wants A to know that C is traveling in May.
  I2: A wants to know the departure date etc.
  I3: A wants to know the destination.
  I4: A wants to know the departure time.
  I5: C wants A to find a nonstop flight.

Problems with G&S 1986
  Assume that discourses are task-oriented
  Assume there is a single, hierarchical structure shared by speaker and hearer
  Do people really build such structures when they speak?
  Do they use them in interpreting what others say?

Walker 1996: Limited Attention & Discourse Structure

LIMITED ATTENTION CONSTRAINT (Walker 1993, 1996)
  ellipsis interpretation
  pronominal anaphora interpretation
  inference of discourse relations between utterances A and B
    B MOTIVATES A
    B is EVIDENCE for A

How is attention modeled?
  Linear Recency
  Hierarchical Recency

Centering
  Centering is formulated as a theory that relates focus of attention, choice of referring expression, and perceived coherence of utterances, within a discourse segment [Grosz et al., 1995].
  Brennan, Walker & Pollard 1987: Centering theory of Anaphora Resolution

What about Processing & Centering?

Informationally Redundant Utterances

Centers cross segments
  Centers are continued over discourse segment boundaries with pronominal referring expressions whose form is identical to those that occur within a discourse segment.
    (29) and he's going to take a pear or two, and then.. go on his way
    (30) um but the little boy comes,
    (31) and uh he doesn't want just a pear,
    (32) he wants a whole basket.
    (33) So he puts the bicycle down,
    (34) and he..
  [Pear Stories, Chafe, 1980; Passonneau, 1995]
  => discourse segment boundary between (32) and (33) [Passonneau, 1995; Passonneau & Litman 1997].
  In [Walker et al., 1998], (33) realizes a CONTINUE transition, indicating that utterance (33) is highly coherent in the context of utterance (32).

Why is centering only within Segment?
  It is not plausible that a different process than centering would be required to explain the relationship between utterances (32) and (33), simply because these utterances span a discourse segment boundary.
  Centering is a theory that relates focus of attention, choice of referring expression, and perceived coherence of utterances, within a discourse segment [Joshi & Weinstein 1983; Grosz, Joshi & Weinstein, 1995].

Cache Model (Human Working Memory)

Building discourse structure

Tasks
  Identify units, e.g. discourse segment boundaries
  Determine relations between segments
  Determine intentions of the segments
  Determine the attentional state
  Methods:
    Inference-based approach: symbolic
    Cue-based approach: statistical

Inference-based approach
  Ex: John hid Bill’s car keys. He was drunk.
    X is drunk => people do not want X to drive
    People don’t want X to drive => people hide X’s car keys
  Abduction
  AI-complete: requires and uses world knowledge.

Cue-based approach
  Attentional state:
    Attentional changes: (push) now, next, but, …. (pop) anyway, in any case, now back to, ok, fine, ...
    True interruption: excuse me, I must interrupt
    Flashback: oops, I forgot
  Intention:
    Satisfaction-precedes: first, second, furthermore, ….
    Dominance: for example, first, second, ….

Cues (cont)
  Linguistic structure
    Elaboration: for example, …
    Concession: although
    Condition: if
    Sequence: and, first, second.
    Contrast: and, …
    …

One example (Marcu 1999)
  Train a parser on a discourse treebank.
  90 trees, hand-annotated for rhetorical relations (RR)
  Learn to identify elementary discourse units (EDUs)
  Learn to identify nucleus (N), satellite (S), and their relation.
  Features: WordNet-based similarity, lexical, structural, …

Results
  Identify units (elementary DUs): 96%-98% accuracy
  Identify hierarchical structure (two EDUs are related): Recall=71%, Precision=84%
  Identify nucleus/satellite labels: Recall=58%, Precision=69%
  Identify rhetorical relation: Recall=38%, Precision=45%
  Hierarchical structure is easier to identify than rhetorical relations.

Discourse Representation Theory

Informational Components
  Data
  Participants
  Beliefs
  Common ground
  Intentions

Formal Representations
  Formal representation of informational components
    Typed feature structures
    Lists
    Sets
    Propositions
    First order logic

Dialog Moves
  Trigger the update of the information state
    Grammatical triggers
    External events

Update Rules
  Govern information state updates
  Sometimes incorporate domain knowledge
  Sometimes govern behavior of dialog moves

Control Strategy
  Decides which update rule applies
    Simple priority list
    Game theory
    Utility theory
    Statistical methods

Also for Dialogue Systems…

Dialog Theories
  Finite State Dialog Models
  Plan-based Models

Finite State Dialog Models
  Information is a state in the FSM
  Dialog moves are inputs matching transitions
  Update rules are FSM lookups and transitions
  Control strategy is static: the FSM itself

Plan-based Models
  Information state is the modeled beliefs, desires, and intentions of the participants
  Dialog moves are speech acts, e.g.
  request and inform
  Update rules are cognitive rules of evidence
  Control strategies are classic AI plan-based strategies

What is a discourse relation? (Joshi, Prasad, Webber, Coling/ACL Tutorial 2006)
  The meaning and coherence of a discourse results partly from how its constituents relate to each other.
    Reference relations
    Discourse relations
  [Figure: Discourse Coherence branches into Reference Relations and Discourse Relations; Discourse Relations branch into Informational and Intentional.]

Why Discourse Relations?
  Informational discourse relations convey relations that hold in the subject matter.
  Intentional discourse relations specify how intended discourse effects relate to each other.
  [Moore & Pollack, 1992] argue that discourse analysis requires both types.
  RST: informational or semantic relations (e.g., CONTRAST, CAUSE, CONDITIONAL, TEMPORAL, etc.) between abstract entities of appropriate sorts (e.g., facts, beliefs, eventualities, etc.), commonly called Abstract Objects (AOs) [Asher, 1993].

Why Discourse Relations?
  Discourse relations provide a level of description that is
    theoretically interesting, linking sentences (clauses) and discourse;
    identifiable more or less reliably on a sufficiently large scale;
    capable of supporting a level of inference potentially relevant to many NLP applications.

How are Discourse Relations declared?
  Broadly, there are two ways of specifying discourse relations:
  Abstract specification
    Relations between two given Abstract Objects are always inferred, and declared by choosing from a pre-defined set of abstract categories.
    Lexical elements can serve as partial, ambiguous evidence for inference.
  Lexically grounded
    Relations can be grounded in lexical elements.
    Where lexical elements are absent, relations may be inferred.
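The lexically grounded view can be illustrated with a toy lookup from clause-initial cue words to candidate relations. This is a sketch under stated assumptions: the table `CUE_RELATIONS` reuses cue words from the earlier “Cues” slide, the function name `candidate_relations` is hypothetical, and matching only at the start of a clause is a simplification, not how any annotation scheme actually works.

```python
import re

# Cue word or phrase -> candidate relations (ambiguous cues list several).
CUE_RELATIONS = {
    "for example": ["Elaboration"],
    "although": ["Concession"],
    "if": ["Condition"],
    "and": ["Sequence", "Contrast"],
    "first": ["Sequence"],
    "second": ["Sequence"],
}


def candidate_relations(clause):
    """Return (cue, relations) for a clause-initial cue, or (None, []) if none.

    An empty result means the relation is not lexically grounded and
    would have to be inferred.
    """
    tokens = re.findall(r"[a-z']+", clause.lower())
    # Try longer cues first so "for example" wins over any one-word prefix.
    for cue in sorted(CUE_RELATIONS, key=lambda c: len(c.split()), reverse=True):
        cue_tokens = cue.split()
        if tokens[:len(cue_tokens)] == cue_tokens:
            return cue, CUE_RELATIONS[cue]
    return None, []


print(candidate_relations("Although it was late, John went to the coffee shop."))
print(candidate_relations("And Mary hates it."))
print(candidate_relations("He was drunk."))
```

The second call returns two candidates for "and", which is the sense in which lexical elements are partial, ambiguous evidence; the third returns no cue at all, so the relation has to be inferred.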

Rhetorical Structure Theory (RST)
  RST [Mann & Thompson, 1988] associates discourse relations with discourse structure (TEXT).
  Discourse structure reflects context-free rules called schemas.
  Applied to a text, schemas define a tree structure in which:
  • Each leaf is an elementary discourse unit (a continuous text span);
  • Each non-terminal covers a contiguous, non-overlapping text span;
  • The root projects to a complete, non-overlapping cover of the text;
  • Discourse relations (aka rhetorical relations) hold only between daughters of the same non-terminal node.

Types of Schemas in RST
  RST schemas differ with respect to:
    what rhetorical relation, if any, holds between right-hand side (RHS) sisters;
    whether or not the RHS has a head (called a nucleus);
    whether the schema has binary, ternary, or arbitrary branching.
  [Figure: RST schema types in RST annotation.]

Moore & Pollack 1992
  Example 1
    (a) George Bush supports big business. [SATELLITE]
    (b) He's sure to veto House Bill 1711. [NUCLEUS]
  Relation name: EVIDENCE (MT 1987)
  Evidence is a “presentational relation”
  Constraints on Nucleus: H might not believe Nucleus to a degree satisfactory to S.
  Constraints on Satellite: H believes Satellite or will find it credible.
  Constraints on Nucleus + Satellite combination: H's comprehending Satellite increases H's belief of Nucleus.
  Effect: H's belief of Nucleus is increased.

Moore & Pollack 1992
  Example 1
    (a) George Bush supports big business.
    (b) He's sure to veto House Bill 1711.
  Relation name: VOLITIONAL-CAUSE
  Volitional Cause is a “subject matter” relation
  Constraints on Nucleus: presents a volitional action or situation that could have arisen from a volitional action.
  Constraints on Satellite: none.
  Constraints on Nucleus + Satellite combination: Satellite presents a situation that could have caused the agent of the volitional action in Nucleus to perform that action; without the presentation of Satellite, H might not regard the action as motivated or know the particular motivation; Nucleus is more central to S's purposes in putting forth the Nucleus-Satellite combination than Satellite is.
  Effect: H recognizes the situation presented in Satellite as a cause for the volitional action presented in Nucleus.

Moore & Pollack 1992
  Presentational relations == speaker intention
  Speaker always has an INTENTION
  But informational (subject matter) relations are also necessary to understand the discourse
  Multiple levels of analysis are simultaneously available
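The point that both levels of analysis are available at once can be sketched by storing more than one relation for the same pair of segments. This is a minimal illustration using Example 1: the names `RelationInstance` and `relations_by_level` are hypothetical, and the nucleus/satellite assignment for Volitional-Cause follows the relation definition above (the veto in (b) is the volitional action).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RelationInstance:
    level: str      # "intentional" (presentational) or "informational" (subject matter)
    name: str       # relation name, e.g. "Evidence"
    nucleus: str    # label of the nucleus segment
    satellite: str  # label of the satellite segment


# Two simultaneous analyses of (a) "George Bush supports big business."
# and (b) "He's sure to veto House Bill 1711."
analyses = [
    RelationInstance("intentional", "Evidence", nucleus="b", satellite="a"),
    RelationInstance("informational", "Volitional-Cause", nucleus="b", satellite="a"),
]


def relations_by_level(instances, first, second):
    """Group the relation names holding between two segments by level."""
    grouped = {}
    for inst in instances:
        if {inst.nucleus, inst.satellite} == {first, second}:
            grouped.setdefault(inst.level, []).append(inst.name)
    return grouped


print(relations_by_level(analyses, "a", "b"))
```

A representation that allowed only one relation per pair of segments would have to drop one of the two entries, which is the limitation of RST that Moore & Pollack point out.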