Recognizing Structure: Dialogue Acts and Segmentation
Julia Hirschberg
CS 6998
7/15/2016

Today
  • Recognizing structural information from speech
      • Topic structure
      • Speech/dialogue acts
  • Applications
      • Speech browsing and search of large corpora
          • Broadcast News (NIST TREC SDR track)
          • Topic Detection and Tracking (NIST/DARPA TDT)
          • Customer care, focus groups, voicemail
      • Spoken Dialogue Systems

SCAN

SCANMail Demo: Basic Layout

SCANMail Demo: Number Extraction

Discourse Structure and Topic Structure
  • Intention-based accounts
      • Grosz & Sidner ’86
  • Conversational moves (games)
      • Edinburgh map task dialogues
  • Adjacency pairs
      • Schegloff, Sacks, Jefferson

Indicators of Topic Structure
  • Cue phrases: now, well, first
  • Pronominal reference
  • Orthography and formatting (in text)
  • Lexical information (Hearst ’94, Reynar ’98, Beeferman et al ’99)
  • In speech?

Prosodic Correlates of Discourse/Topic Structure
  • Pitch range
      • Lehiste ’75, Brown et al ’83, Silverman ’86, Avesani & Vayra ’88, Ayers ’92, Swerts et al ’92, Grosz & Hirschberg ’92, Swerts & Ostendorf ’95, Hirschberg & Nakatani ’96
  • Preceding pause
      • Lehiste ’79, Chafe ’80, Brown et al ’83, Silverman ’86, Woodbury ’87, Avesani & Vayra ’88, Grosz & Hirschberg ’92, Passonneau & Litman ’93, Hirschberg & Nakatani ’96
  • Rate
      • Butterworth ’75, Lehiste ’80, Grosz & Hirschberg ’92, Hirschberg & Nakatani ’96
  • Amplitude
      • Brown et al ’83, Grosz & Hirschberg ’92, Hirschberg & Nakatani ’96
  • Contour
      • Brown et al ’83, Woodbury ’87, Swerts et al ’92

Prosodic Cues to Sentence and Topic Boundaries: Shriberg et al ’00
  • Prosodic cues perform as well as or better than text-based cues at topic segmentation (and generalize better?)
  • Goal: identify sentence and topic boundaries at ASR-defined word boundaries
      • CART decision trees provided boundary predictions
      • An HMM combined these with lexical boundary predictions

Features
  • For each potential boundary location:
      • Pause at boundary (raw and normalized by speaker)
      • Pause at word before boundary (is this a new ‘turn’ or part of a continuous speech segment?)
      • Phone and rhyme duration (normalized by inherent duration) (phrase-final lengthening?)
      • F0 (smoothed and stylized): reset, range (topline, baseline), slope and continuity
      • Voice quality (halving/doubling estimates as correlates of creak or glottalization)
      • Speaker change, time from start of turn, number of turns in conversation, and gender
  • Trained/tested on Switchboard (SB) and Broadcast News (BN)

Sentence segmentation results
  • Prosodic features
      • Better than LM for BN
      • For SB: worse than LM on transcriptions, the same as LM on ASR transcripts
      • All better than chance
  • Useful features for BN
      • Pause at boundary, turn/no turn, F0 difference across boundary, rhyme duration
  • Useful features for SB
      • Phone/rhyme duration before boundary, pause at boundary, turn/no turn, pause at preceding word boundary, time in turn

Topic segmentation results (BN only)
  • Useful features
      • Pause at boundary, F0 range, turn/no turn, gender, time in turn
  • Prosody alone better than LM
  • Combined model improves significantly

Speech Act Theory
  • John Searle
  • Locutionary acts: semantic meaning
  • Illocutionary acts: ask, promise, answer, threaten
  • Perlocutionary acts: effect intended to be produced on the hearer: regret, fear
  • Dialogue acts
      • Many tagging schemes (e.g. DAMSL)

Practical Motivations: Spoken Dialogue Systems
  • Add more information about speaker intentions
  • Disambiguate ambiguous utterances
      • Okay
      • Um
      • Right

Experimental Evidence: Nickerson & Chu-Carroll ’99
  • Can/would/would...willing questions
      • Can you move the piano?
      • Would you move the piano?
      • Would you be willing to move the piano?
  • A la Sag & Liberman ’75: can intonation disambiguate?

Experiments
  • Production studies:
      • Subjects read ambiguous questions in disambiguating contexts
      • Control for given/new and contrastiveness
      • Polite/neutral/impolite
  • Problems:
      • Cells imbalanced
      • No pretesting
      • No distractors
      • Same speaker reads both contexts

Results
  • Indirect requests
      • If L%, more likely (73%) to be indirect
      • 46% H%: differences in height of boundary tone?
  • Politeness: "can" differs in impolite (higher rise) vs. neutral
  • Variation in speaker strategy

Corpus Studies: Jurafsky et al ’98
  • Lexical, acoustic/prosodic, and syntactic differentiators for yeah, ok, uhuh, mhmm, um…
      • Continuers: Mhmm (not taking floor)
      • Assessments: Mhmm (tasty)
      • Agreements: Mhmm (I agree)
      • Yes answers: Mhmm (That’s right)
      • Incipient speakership: Mhmm (taking floor)

Corpus Study
  • Switchboard telephone conversation corpus
      • Hand segmented and labeled with DA information (initially from text)
      • Relabeled for this study
      • Analyzed for
          • Lexical realization
          • F0 and rms features
          • Syntactic patterns

Results: Lexical Differences
  • Agreements
      • yeah (36%), right (11%),...
  • Continuer
      • uhuh (45%), yeah (27%),…
  • Incipient speaker
      • yeah (59%), uhuh (17%), right (7%),…
  • Yes-answer
      • yeah (56%), yes (17%), uhuh (14%),...

Results: Prosodic and Syntactic Cues
  • Relabeling from speech changes only 2% of labels overall (114/5757)
      • 43/987 continuers --> agreements
      • Why? Shorter duration, lower F0, lower energy, longer preceding pause
  • Over all DAs, duration is the best differentiator, but…
      • Highly correlated with length in words
  • Assessments: That’s X (good, great, fine,…)

Future Work
  • Speaker differences?
  • Higher-level prosodic differences among ambiguous-word DAs?

A Coding Scheme for ‘ok’
  • Ritualistic?
      • Closing
      • You're Welcome
      • Other
      • No
  • 3rd-Turn-Receipt?
      • Yes
      • No
  • If Ritualistic==No, code all of these as well:
      • Task Management:
          • I'm done
          • I'm not done yet
          • None
      • Topic Management:
          • Starting new topic
          • Finished old topic
          • Pivot: finishing and starting
      • Turn Management:
          • Still your turn (=traditional backchannel)
          • Still my turn (=stalling for time)
          • I'm done, it is now your turn
          • None
      • Belief Management:
          • I accept your proposition
          • I entertain your proposition
          • I reject your proposition
          • Do you accept my proposition? (=y/n question)
          • None

Next Week
  • Turn-taking and disfluencies
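
Looking back at the coding scheme for ‘ok’, the rule "If Ritualistic==No, code all of these as well" can be sketched as a small data structure. This is only an illustration, not the annotation tool used in the course: the class names, field names, and the `missing_dimensions` helper are all hypothetical, and the option values simply mirror the slide.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


# All names below are hypothetical; the option values mirror the slide.
class Ritualistic(Enum):
    CLOSING = "closing"
    YOURE_WELCOME = "you're welcome"
    OTHER = "other"
    NO = "no"


class TaskManagement(Enum):
    DONE = "I'm done"
    NOT_DONE = "I'm not done yet"
    NONE = "none"


class TopicManagement(Enum):
    STARTING = "starting new topic"
    FINISHED = "finished old topic"
    PIVOT = "pivot: finishing and starting"


class TurnManagement(Enum):
    STILL_YOUR_TURN = "still your turn (traditional backchannel)"
    STILL_MY_TURN = "still my turn (stalling for time)"
    NOW_YOUR_TURN = "I'm done, it is now your turn"
    NONE = "none"


class BeliefManagement(Enum):
    ACCEPT = "I accept your proposition"
    ENTERTAIN = "I entertain your proposition"
    REJECT = "I reject your proposition"
    ASK = "do you accept my proposition? (y/n question)"
    NONE = "none"


@dataclass(frozen=True)
class OkLabel:
    """One annotated token of 'ok'."""
    ritualistic: Ritualistic
    third_turn_receipt: bool
    task: Optional[TaskManagement] = None
    topic: Optional[TopicManagement] = None
    turn: Optional[TurnManagement] = None
    belief: Optional[BeliefManagement] = None

    def missing_dimensions(self) -> List[str]:
        """Dimensions still uncoded although Ritualistic == No requires them."""
        if self.ritualistic is not Ritualistic.NO:
            return []
        dims = {"task": self.task, "topic": self.topic,
                "turn": self.turn, "belief": self.belief}
        return [name for name, value in dims.items() if value is None]


# A ritualistic closing 'ok' needs no further coding.
closing = OkLabel(Ritualistic.CLOSING, third_turn_receipt=False)

# A non-ritualistic 'ok' with only the turn dimension coded so far.
incomplete = OkLabel(Ritualistic.NO, third_turn_receipt=True,
                     turn=TurnManagement.STILL_YOUR_TURN)

# A fully coded non-ritualistic 'ok' that closes one topic and opens another.
pivot = OkLabel(Ritualistic.NO, False, TaskManagement.DONE,
                TopicManagement.PIVOT, TurnManagement.STILL_MY_TURN,
                BeliefManagement.ACCEPT)

print(incomplete.missing_dimensions())
```

The sketch enforces only the rule stated on the slide. Whether the four management dimensions should be left empty for ritualistic uses is not specified there, so the code leaves that open.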