Dialogue Systems
Julia Hirschberg
CS 4705
7/15/2016

Today
• Dialogue Systems and Human Conversation
• Turns and Turn-taking
• Speech Acts and Dialogue Acts
• Grounding and Intentional Structure
• Pragmatics
  – Presupposition
  – Conventional Implicature
  – Conversational Implicature

Dialogue System Applications
• Information providing
  – 800-BING-411, Google Mobile App, Amtrak’s Julie
• Customer Care
  – T-Mobile’s Call Center, AT&T Call Routing
• Training
  – Language tutoring: e.g. Carnegie Speech, KTH Ville
  – Other research platforms: e.g. ItSpoke at UPitt
• Fun and games…
• Goal: Emulate Human-Human Behavior?

Turn-taking Behavior
• Dialogue characterized by turn-taking
  – How do speakers know what to say and when to say it?
• Conversational partners expect certain patterns of behavior in normal conversation
    Pat: You got an A? That’s great!
    Chris: Yeah, I’m really smart you know.
    Chris: Well, I was just lucky I happened to read the chapter on dialogue systems right before the test. Otherwise I never would have squeaked through.
  – Deviation is significant: dispreferred utterances
• Children learn turn-taking within first 2 years (Stern ’74)
• General individual differences
  – Shy people pause longer and speak less and less often (Pilkonis ’77)
  – Schizophrenics, neurotics, depressed people less skilled in turn-taking

Cultural Differences in Turn-Taking
• Chinese telephone conversations
  – Openings (Zhu ’04)
    • Mandarin vs.
      British
    • Identification differences
      – British self-report
      – Chinese callees ask the caller
  – Closings (Sun ’05)
    • 39 female-female Mandarin telephone conversations
    • Closings initiated through matter-of-fact statement of intention to end conversation
    • Verbalized thanking occurs except in mother/daughter closings (not the standard English model)
• Finnish business calls (Halmari ’93) vs. American
  – Americans get right to the point
  – Finns chat

Conversational Analysis (Sacks et al. ’74)
• Can we characterize expectations of ‘what to say’ more generally?
• ‘Rules’ of turn-taking
  – If, during this turn, the current speaker has selected A as the next speaker, then A must speak next
  – If the current speaker does not select the next speaker, any other speaker may take the next turn
  – If no one else takes the next turn, the current speaker may take the next turn
• Rules apply at Transition Relevance Places (TRPs), where something allows speaker changes to occur

Where Can Speaker Shifts Occur
• Adjacency pairs
  – Question/answer
  – Greeting/greeting
  – Compliment/downplayer
• Dispreferred responses
  – Silence
  – ‘No’ to a simple request without explanation
  – Changing the topic abruptly without transition
  – Important for Spoken Dialogue Systems

Diarization: Automatic Speaker Identification/Segmentation
• Segment audio corpora (Broadcast News, meetings, telephone conversations) into speaker segments
  – Speaker segmentation
  – Speaker identification
  – Speech and music
• Speaker segmentation (Diarization)
  – Initial segmentation
  – Segment clustering based on acoustic features
  – State-of-the-art: 8.47% error
• Speaker identification
  – Linguistic information to identify speaker types and speaker names (LIMSI ’04)
    • Templates (“<name> has this report from <location>”)
    • Results: 10.9% error on test set
  – But only 10% of segments contain relevant patterns
  – Estimate 25% error on broadcast news if segmentation and clustering are done to identify all
    of each speaker’s segments

Turn-taking Behaviors Important for SDS
• System understanding:
  – Is the user backchanneling or is she taking the turn (does ‘ok’ mean ‘I agree’ or ‘I’m listening’)?
  – Is this a good place for a system backchannel?
• System generation:
  – How to signal to the user that the system’s turn is over?
  – How to signal to the user that a backchannel might be appropriate?

Types of Behavior
• Smooth Switch: S1 is speaking and S2 speaks and takes and holds the floor
• Hold: S1 is speaking, pauses, and continues to speak
• Backchannel: S1 is speaking and S2 speaks -- to indicate continued attention -- not to take the floor (e.g. mhmm, ok, yeah)
• How do people coordinate these behaviors with their interlocutor?
• Acoustic-prosodic and lexical cues…

Smooth Switch, Backchannel, and Hold Differences

Speech Act Theory (Austin, Searle)
• Locutionary acts: the act of uttering (semantic meaning)
• Illocutionary acts: the act S intends to convey by the utterance (e.g. request, promise, statement)
• Perlocutionary acts: the effect S intends the utterance to produce in H (e.g. regret, fear, hope)
• Indirect Speech Acts (a type of illocutionary act):
  – It’s cold in here.
  – Can you tell me the time.

NLP Speech Acts
• Often identified with illocutionary force
• Can be indicated by performative verbs
  – E.g.
    promise, order, ask, beseech, deny, apologize, curse
  – NB: Perlocutionary force cannot be (I convince you to vote for me for president)
• Searle’s ’75 taxonomy (assertives, directives, commissives, expressives, declarations) now vastly expanded

Dialogue Acts in SDS
• Roughly correspond to illocutionary acts
  – Motivation: Improving Spoken Dialogue Systems
  – Many coding schemes (e.g. DAMSL)
  – Many-to-many mapping between DAs and words
    • Agreement DA can be realized by Okay, Um, Right, Yeah, …
    • But each of these can express multiple DAs, e.g.
        S: You should take the 10pm flight.
        U: Okay
           …that sounds perfect.
           …but I’d prefer an earlier flight.
           …(I’m listening)
• DA recognition important for
  – Turn recognition (which grammar to use when)
  – Turn disambiguation, e.g.
      S: What city do you want to go to?
      U1: Boston. (reply)
      U2: Boston? (request for information)

      S: Do you want to go to Boston?
      U1: Boston. (confirmation)
      U2: Boston? (question)

Automatic DA Detection
• Rosset & Lamel ’04: Can we detect DAs automatically w/ minimal reliance on lexical content?
  – Lexicons are domain-dependent
  – ASR output is errorful
• Corpora (3912 utts total)
  – Agent/client dialogues in a French bank call center, in a French web-based stock exchange customer service center, in an English bank call center
• DA tags (44)
  – Conventional (openings, closings)
  – Information level (items related to the semantic content of the task)
  – Forward Looking Function:
    • statement (e.g. assert, commit, explanation)
    • influence on Hearer (e.g. confirmation, offer, request)
  – Backward Looking Function:
    • Agreement (e.g. accept, reject)
    • Understanding (e.g. backchannel, correction)
  – Communicative Status (e.g.
    self-talk, change-mind)
  – NB: Each utt could receive a tag for each class, so utts represented as vectors
    • But…only 197 combinations observed
• Method: Memory-based learning (TIMBL)
  – Uses all examples for classification
  – Useful for sparse data
• Features
  – Speaker identity
  – First 2 words of each turn
  – # utts in turn
  – Previously proposed DA tags for utts in turn
• Results
  – With true utt boundaries:
    • ~83% accuracy on test data from same domain
    • ~75% accuracy on test data from different domain
  – On automatically identified utt units: 3.3% ins, 6.6% del, 13.5% sub
• Which DAs are easiest/hardest to detect?

    DA          GE.fr   CAP.fr  GE.eng
    Resp-to     52.0%   33.0%   55.7%
    Backch      75.0%   72.0%   89.2%
    Accept      41.7%   26.0%   30.3%
    Assert      66.0%   56.3%   50.5%
    Expression  89.0%   69.3%   56.2%
    Comm-mgt    86.8%   70.7%   59.2%
    Task        85.4%   81.4%   78.8%

• Conclusions
  – Strong ‘grammar’ of DAs in Spoken Dialogue systems
  – A few initial words perform as well as more

Grounding (Stalnaker ’78, Clark & Schaefer ’89)
• Common Ground: the set of propositions mutually believed by S and H
  – Principle of Closure: agents performing an action require evidence that they have succeeded
  – and S needs to know when s/he has succeeded in communicating
  – Presentation of utterance by S
  – Acceptance of utterance by H
• How does grounding take place in conversation?

Grounding Strategies from Weak to Strong
    I need to get your homework by Monday.
• Continued attention: …
• Next contribution: I should be finished Sunday night.
• Acknowledgment: Mhmm…
• Demonstration: You need this soon.
• Display: You need to get my homework Monday.

Discourse Structure and Intention
  Welcome to word processing.
  That’s using a computer to type letters and reports.
  Make a typo? No problem.
  Just back up, type over the mistake, and it’s gone.
  And, it eliminates retyping.
  And, it eliminates retyping.

Structures of Discourse Structure (Grosz & Sidner ’86)
• Leading alternative to Rhetorical Structure Theory
  – Provides for multiple levels of analysis: S’s purpose as well as content of utterances and S and H’s attentional state
  – Identifies only a few, general relations that hold among intentions
• Three components:
  – Linguistic structure
  – Intentional structure
  – Attentional structure

Linguistic Structure
• What is actually said/written
• How is this represented?
  – Assume discourse is segmented into Discourse Segments (DS) -- how?
    • what is basic unit of analysis?
    • segmentation agreement
    • automatic segmentation
  – Embedding relations: topic structure
  – Cue phrases

Intentional Structure
• Discourse purpose (DP): basic purpose of the discourse
• Discourse segment purposes (DSPs): how this segment contributes to the overall DP
• Segment relations:
  – Satisfaction-precedence: DSP1 must be satisfied before DSP2
  – Dominance: DSP1 dominates DSP2 if fulfilling DSP2 constitutes part of fulfilling DSP1

Attentional State
• Focus stack:
  – Stack of focus spaces, each containing objects, properties and relations salient during each DS, plus the DSP (content plus purpose)
  – State changes modeled by transition rules controlling the addition/deletion of focus spaces
• Information at lower levels may or may not be available at higher levels
• Focus spaces are
  – pushed onto the stack when new DS or embedded DS (e.g. DS that are dominated by other DS) are begun
  – popped when they are completed

Limits of G&S ’86
• Assumes that discourses are task-oriented
• Assumes there is a single, hierarchical structure shared by S and H
• How do we identify entities that are salient (on the focus stack)? Grammatical function?
• Do people really build such structures when they converse? Use them in interpreting what others say?

How are these structures recognized from a discourse?
• Linguistic markers:
  – tense and aspect
  – cue phrases
  – intonational variation
• Inference of S intentions
• Inference from task structure
• Intonational Information

Implicit Information
• Question interpretation in SDS
    S: Are you traveling to La Guardia?
    U: I’m going to New York.

    U: When does the 5 o’clock train leave from Newark?
    S: <U believes there is a 5 o’clock train from Newark.>

    S: I heard you say New York City?
    U: New York City?
• Cooperative responses in SDS
  – Correcting misconceptions
      U: When does the 5 o’clock train leave from Newark?
      S (thinks): <U believes there is a 5 o’clock train from Newark>
      S: There is no 5 o’clock train from Newark; there is a 5:20 though.
  – Providing more information than is asked for
      U: Do I have the $500 minimum in that account?
      S1: Yes.
      S2: You have $739.

Discourse Pragmatics
• Context-dependent meaning, invited inference, intended meaning
  – vs. “propositional content”
• Indirect Speech Acts
• Presupposition
• Implicature
  – Conversational
  – Conventional

Presupposition
• What is ‘taken for granted’, given some linguistic expression X
    The King of France is bald. (Is there a King of France?)
    All of Herman’s children are bright. (Does Herman have children?)
• Linguistic Test: Negative, interrogative, and embedded X preserve the same assumption
    The King of France is not bald.
    Is the King of France bald?
    I thought that the King of France was bald.
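Because a question carries its presupposition with it (the interrogative test above), a dialogue system can check that presupposition against what it knows before answering, as in the 5 o'clock train exchange. The following is a minimal sketch of that idea, not the method of any particular system: the timetable in `DEPARTURES` and the function name `answer_departure_query` are hypothetical, and the presupposed departure time is passed in by hand rather than parsed from the utterance.

```python
from datetime import time

# Hypothetical timetable (an assumption for illustration):
# station -> scheduled departure times.
DEPARTURES = {"Newark": [time(16, 40), time(17, 20), time(18, 5)]}


def answer_departure_query(station, presupposed):
    """Reply to 'When does the <presupposed> train leave from <station>?'.

    The question presupposes that a train departs at `presupposed`.
    If the timetable contradicts that belief, correct it and offer the
    next later departure instead of answering literally.
    """
    scheduled = sorted(DEPARTURES.get(station, []))
    if presupposed in scheduled:
        return f"It leaves at {presupposed:%H:%M}."
    later = [t for t in scheduled if t > presupposed]
    correction = f"There is no {presupposed:%H:%M} train from {station}"
    if later:
        return f"{correction}; there is a {later[0]:%H:%M} though."
    return f"{correction}."


print(answer_departure_query("Newark", time(17, 0)))
```

With the sample timetable, the 17:00 query yields a correction plus the 17:20 alternative, mirroring the cooperative response on the slide. A real system would also have to recover the presupposition from the recognized utterance, which this sketch leaves out.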
• Presuppositions can be suspended but they cannot be felicitously denied
    All of Herman’s children are bright, if he indeed has children.
    *All of Herman’s children are bright, though he has no children.

Presupposition and SDS
• Presuppositional information adds facts/beliefs to the dialogue history
  – Information to store and check for accuracy
    • My wife will also be a driver (S has a spouse)
    • My number is 212-555-1212 (S has a telephone account)
    • I’ll take the red-eye (S believes there is a red-eye)
    • I’m upset about being charged for a call to Ethiopia (S was charged for a call to Ethiopia)
    • I’m a bachelor. (S is an unmarried male person)

Conversational Implicature
• H. Paul Grice: Conversation is not formal logic
  – and is not ‘∧’, or is not ‘∨’, some is not ‘∃’
  – George got married and had a baby.
  – Was it a boy or a girl?
  – Some people sent baby gifts.
  – Principle of Cooperative Conversation (the Cooperative Principle, CP): Make your conversational contribution such as is required, at the stage at which it occurs, by the accepted purpose or direction of the talk exchange in which you are engaged

Maxims of Cooperative Conversation
• Maxim of Quantity:
  – 1. Make your contribution as informative as is required (for the current purposes of the exchange)
  – 2. Do not make your contribution more informative than is required.
• Maxim of Quality:
  – Try to make your contribution one that is true.
    • 1. Do not say what you believe to be false.
    • 2. Do not say that for which you lack adequate evidence.
• Maxim of Relation: Be relevant
• Maxim of Manner: Be perspicuous
  – 1. Avoid obscurity of expression.
  – 2. Avoid ambiguity.
  – 3. Be brief (avoid unnecessary prolixity).
  – 4. Be orderly.
• Maxims may be
  – Observed
      John got into Columbia and won a scholarship.
  – Violated quietly
      I never said that.
  – Flouted
      He has excellent handwriting….
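The first Quantity maxim is what lets H read "Some people sent baby gifts" as "not all did": if S could truthfully have said "all", the more informative form was required. A rough sketch of that inference follows. It assumes hand-written informativeness scales (often called Horn scales in the pragmatics literature); the scale contents in `SCALES` and the function name `quantity_implicatures` are illustrative assumptions and are not taken from the lecture.

```python
# Hypothetical informativeness scales, ordered weakest -> strongest.
SCALES = [
    ["some", "all"],
    ["or", "and"],
]


def quantity_implicatures(utterance):
    """Return the stronger alternatives H may infer S is not asserting.

    By Quantity maxim 1, a cooperative S who could truthfully use a
    stronger term would have done so; using a weaker one implicates
    (defeasibly) that the stronger ones do not hold.
    """
    words = [w.strip(".,?!").lower() for w in utterance.split()]
    inferred = []
    for scale in SCALES:
        for i, term in enumerate(scale):
            if term in words:
                # Every term later in the scale is a stronger alternative.
                inferred.extend(f"not '{s}'" for s in scale[i + 1:])
    return inferred


print(quantity_implicatures("Some people sent baby gifts."))
print(quantity_implicatures("Was it a boy or a girl?"))
```

The sketch covers only the "some" and "or" examples. The temporal reading of "George got married and had a baby" comes from orderliness (Manner), not from a scale, so the function returns nothing for it. Real implicatures are also cancellable ("some, in fact all"), which a word-lookup like this does not model.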
• Speakers may not be able to observe all maxims simultaneously
• Implicature interpretation requires both S and H to understand the CP and Maxims
  – That which S licenses and H infers via the CP and the Maxims
      A: I got an A on that exam.
      B: And I’m Queen Marie of Rumania.

      A: Where did you go?
      B: Out.

      A: Where does Arnold live?
      B: Somewhere in southern California.

Other Implicatures
• Generalized Conversational, e.g. indefinites
    A car ran over John’s foot. (not John’s car)
    John broke a foot yesterday. (John’s foot)
    John broke a nose yesterday. (not his own)
• Conventional
    George is short but brave.
    George is short; therefore he is brave.

Summary
• Dialogue Systems and Human Conversation
• Turns and Turn-taking
• Speech Acts and Dialogue Acts
• Grounding and Intentional Structure
• Pragmatics
  – Presupposition
  – Conventional Implicature
  – Conversational Implicature

Spoken Language Processing
• These are only a few of the challenges of Spoken Language Processing (CS 4706)
• How does it go beyond CS 4705?
  – Speech analysis tools and techniques
    • Deception, charisma, emotional speech, medical states
  – Speech technologies
    • Text-to-Speech
    • Automatic Speech Recognition
    • Speaker ID
    • Language and dialect ID

Project
• Build a Spoken Dialogue System of your own
  – Choose the domain and task
  – Build a speech recognizer, a text-to-speech synthesis system, and a dialogue manager (from libraries)
  – Demo your system and maybe win a prize

Next Class
• Review for the Final Exam