INTERSPEECH, Antwerp, August 2007

Classification of Discourse Functions of Affirmative Words in Spoken Dialogue
Agustín Gravano, Stefan Benus, Julia Hirschberg, Shira Mitchell, Ilia Vovsha
Spoken Language Processing Group, Columbia University

Cue Words
- Ambiguous linguistic expressions used for making a semantic contribution or for conveying a pragmatic function.
- Examples: now, well, so, alright, and, okay, first, by the way, on the other hand.
- Single affirmative cue words. Examples: alright, okay, mm-hm, right, uh-huh, yes.
- May be used to convey acknowledgment or agreement, to change topic, to backchannel, etc.

Research Goals
- Learn which features best characterize the different functions of single affirmative cue words.
- Determine how these functions can be identified automatically.
- Important in spoken dialogue systems: understanding user input and producing output appropriately.

Previous Work
- Classification of cue words into discourse vs. sentential use: Hirschberg & Litman '87, '93; Litman '94; Heeman, Byron & Allen '98; Zufferey & Popescu-Belis '04.
- In our corpus: right: 15% discourse, 85% sentential; all other affirmative cue words: 99% discourse, 1% sentential.
- The discourse vs. sentential distinction is therefore insufficient; we need to define new classification tasks.

Talk Overview
- Columbia Games Corpus
- Classification tasks
- Experimental features
- Results

The Columbia Games Corpus
- 12 spontaneous task-oriented dyadic conversations in Standard American English.
- 2 subjects playing computer games; no eye contact.

The Columbia Games Corpus: Function of Affirmative Cue Words
- Cue words (7.9% of the words in our corpus): alright, gotcha, huh, mm-hm, okay, right, uh-huh, yeah, yep, yes, yup.
- Functions:
  - Acknowledgment / Agreement
  - Backchannel
  - Cue beginning discourse segment
  - Cue ending discourse segment
  - Check with the interlocutor
  - Stall / Filler
  - Back from a task
  - Literal modifier
  - Pivot beginning: Ack/Agree + Cue begin
  - Pivot ending: Ack/Agree + Cue end

The Columbia Games Corpus: Examples of Functions
- Literal modifier: "that's pretty much okay"
- Backchannel:
  Speaker 1: "between the yellow mermaid and the whale"
  Speaker 2: "okay"
  Speaker 1: "and it is ..."
- Cue beginning discourse segment: "okay we gonna be placing the blue moon"

The Columbia Games Corpus: Labeling
- 3 trained labelers.
- Inter-labeler agreement: Fleiss' Kappa = 0.69 (Fleiss '71).
- In this study we use the majority label for each affirmative cue word, i.e. the label chosen by at least two of the three labelers.

Method: Two New Classification Tasks
- Identification of a discourse segment boundary function: segment beginning vs. segment end vs. no discourse segment boundary function.
- Identification of an acknowledgment function: acknowledgment vs. no acknowledgment.

Method: Machine Learning Experiments
- ML algorithm: JRip, Weka's implementation of the propositional rule learner Ripper (Cohen '95).
- We also tried J4.8, Weka's implementation of the decision tree learner C4.5 (Quinlan '93, '96), with similar results.
- 10-fold cross-validation in all experiments (see the sketch after the feature description below).

Method: Experimental Features
- IPU (inter-pausal unit): maximal sequence of words delimited by a pause longer than 50 ms.
- Conversational turn: maximal sequence of IPUs by the same speaker, with no contribution from the other speaker.
- Text-based features, extracted from the text transcriptions: lexical identity; POS tags; position of the word in the IPU / turn; etc.
- Timing features, extracted from the time alignment of the transcriptions: word / IPU / turn duration; amount of overlap; etc.
- Acoustic features: {min, mean, max, stdev} x {pitch, intensity}; slope of pitch, stylized pitch, and intensity over the whole word and over its last 100, 200, 300 ms; acoustic features from the end of the other speaker's previous turn.
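To make the acoustic feature set concrete, here is a minimal sketch (not part of the original study) of how the {min, mean, max, stdev} statistics of pitch and intensity could be aggregated over a word and over its final 200 ms. It assumes pitch and intensity contours have already been extracted (e.g., with Praat) as frame-level arrays with a 10 ms hop; the frame step and all names are illustrative assumptions.

```python
import numpy as np

FRAME_STEP = 0.010  # assumed 10 ms hop between pitch/intensity frames

def aggregate(track, t_start, t_end):
    """{min, mean, max, stdev} of a contour between two time points.

    track: 1-D array of frame values (F0 in Hz or intensity in dB);
    unvoiced frames are assumed to be stored as NaN and are skipped.
    """
    lo, hi = int(t_start / FRAME_STEP), int(t_end / FRAME_STEP)
    segment = track[lo:hi]
    segment = segment[~np.isnan(segment)]
    if segment.size == 0:
        return {"min": None, "mean": None, "max": None, "stdev": None}
    return {"min": float(segment.min()), "mean": float(segment.mean()),
            "max": float(segment.max()), "stdev": float(segment.std())}

def word_acoustic_features(pitch, intensity, t_start, t_end):
    """Aggregate pitch and intensity over the whole word and over its last 200 ms."""
    feats = {}
    spans = {"word": (t_start, t_end),
             "last200ms": (max(t_start, t_end - 0.200), t_end)}
    for track_name, track in (("pitch", pitch), ("intensity", intensity)):
        for span_name, (a, b) in spans.items():
            for stat, value in aggregate(track, a, b).items():
                feats[f"{track_name}_{stat}_{span_name}"] = value
    return feats
```

The same aggregation can be repeated for the 100 ms and 300 ms windows and for the end of the other speaker's previous turn by changing the time spans passed in.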
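The experiments themselves used Weka's JRip (and J4.8). As a rough stand-in, the Python sketch below runs the same kind of 10-fold cross-validation with scikit-learn's DecisionTreeClassifier playing the role of the C4.5-style learner; the feature matrix and label strings are hypothetical, not the corpus's actual format.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import f1_score

def run_experiment(X, y):
    """10-fold cross-validation over a pre-built feature matrix.

    X: one row per affirmative cue word (text-based + timing + acoustic
       features, already encoded numerically).
    y: one label per word, e.g. "BEGIN" / "END" / "NONE" for the
       discourse-segment-boundary task.
    """
    clf = DecisionTreeClassifier(random_state=0)   # stand-in for Weka's JRip / J4.8
    pred = cross_val_predict(clf, X, y, cv=10)     # 10-fold cross-validation
    error_rate = float(np.mean(pred != np.asarray(y)))
    labels = np.unique(y)
    per_class_f = dict(zip(labels, f1_score(y, pred, average=None, labels=labels)))
    return error_rate, per_class_f
```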
Results: Discourse Segment Boundary Function

Feature Set            Error Rate   F-Measure (Begin)   F-Measure (End)
Text-based             11.6%        .77                 .30
Timing                 11.3%        .73                 .52
Acoustic               14.2%        .66                 .19
Text-based + Timing     9.8%        .81                 .53
Full set                9.6%        .81                 .57
Baseline (1)           19.0%        .00                 .00
Human labelers (2)      5.7%        .94                 .71

(1) Majority-class baseline: NO BOUNDARY.
(2) Calculated w.r.t. each labeler's agreement with the majority labels.

Results: Acknowledgment Function

Feature Set            Error Rate   F-Measure
Text-based              8.3%        .94
Timing                 11.0%        .92
Acoustic               17.2%        .87
Text-based + Timing     6.2%        .95
Full set                6.5%        .95
Baseline (1)           16.7%        .88
Human labelers (2)      5.5%        .98

(1) Baseline based on lexical identity: {huh, right} -> no ACK; all other words -> ACK (a sketch of this baseline appears after the results).
(2) Calculated w.r.t. each labeler's agreement with the majority labels.

Best-Performing Features
Discourse segment boundary function:
• Lexical identity
• POS tag of the following word
• Number and proportion of succeeding words in the turn
• Context-normalized mean intensity
Acknowledgment function:
• Lexical identity
• POS tag of the preceding word
• Number and proportion of preceding words in the turn
• IPU and turn length

Results: Classification of Individual Words
Classification of each individual word into its most common functions:
- alright: Ack/Agree, Cue Begin, Other
- mm-hm: Ack/Agree, Backchannel
- okay: Ack/Agree, Backchannel, Cue Begin, Ack+CueBegin, Ack+CueEnd, Other
- right: Ack/Agree, Check, Literal Modifier
- yeah: Ack/Agree, Backchannel

Results: Classification of the Word 'okay'

                                   F-Measure
Feature Set           Error Rate   Ack/Agree   Backchannel   Cue Begin   Ack+CueBegin   Ack+CueEnd
Text-based            31.7%        .76         .16           .77         .09            .33
Acoustic              40.2%        .69         .24           .64         .03            .25
Text-based + Timing   25.6%        .79         .31           .82         .18            .67
Full set              25.5%        .80         .46           .83         .21            .66
Baseline (1)          48.3%        .68         .00           .00         .00            .00
Human labelers (2)    14.0%        .89         .78           .94         .56            .73

(1) Majority-class baseline: ACK/AGREE.
(2) Calculated w.r.t. each labeler's agreement with the majority labels.
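For reference, the acknowledgment baseline in the table above is purely lexical. A minimal sketch of that rule and its evaluation (label strings and function names are illustrative, not from the original study):

```python
from sklearn.metrics import f1_score

def ack_baseline(word):
    """Lexical-identity baseline: 'huh' and 'right' predict no ACK, all other cue words ACK."""
    return "NO_ACK" if word.lower() in {"huh", "right"} else "ACK"

def evaluate_baseline(words, gold):
    """words: cue-word tokens; gold: their majority labels ('ACK' / 'NO_ACK')."""
    pred = [ack_baseline(w) for w in words]
    error_rate = sum(p != g for p, g in zip(pred, gold)) / len(gold)
    return error_rate, f1_score(gold, pred, pos_label="ACK")
```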
Summary
- The discourse/sentential distinction is insufficient for affirmative cue words in spoken dialogue.
- Two new classification tasks: detection of an acknowledgment function; detection of a discourse boundary function.
- Best-performing ML models are based on textual and timing features, with a slight improvement when acoustic features are added.

Further Work
- Gravano et al., 2007. On the role of context and prosody in the interpretation of 'okay'. ACL 2007, Prague, Czech Republic, June 2007.
- Benus et al., 2007. The prosody of backchannels in American English. ICPhS 2007, Saarbrücken, Germany, August 2007.

Appendix: Counts of Affirmative Cue Words by Function (Columbia Games Corpus)

Function           alright   mm-hm    okay   right   uh-huh   yeah   Other   Total
Ack/Agree               99      61    1137     114       18    808     133    2370
Backchannel              6     402     121      14      143     72       5     763
Cue Begin               89       0     548       2        0      2       0     641
Cue End                  8       0      10       0        0      0       0      18
Pivot Begin              5       0      68       0        0      0       0      73
Pivot End               13      12     232       2        0     22      17     298
Back from Task           9       1      33       0        0      0       0      43
Check                    0       0       6      53        0      1       8      68
Stall                    1       0      15       1        0      2       0      19
Literal Modifier         9       0      29    1079        0      0       1    1118
?                       56      27     235      10        3     65      11     407
Total                  295     503    2434    1275      164    972     175    5818
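The counts above are based on the majority labels described earlier. A minimal sketch of how such a function-by-word table could be tabulated from the three labelers' annotations (the input data structure is an assumption, not the corpus's actual format):

```python
from collections import Counter

def majority_label(labels):
    """Label chosen by at least two of the three labelers; '?' if there is no majority."""
    label, count = Counter(labels).most_common(1)[0]
    return label if count >= 2 else "?"

def tabulate(tokens):
    """tokens: iterable of (word, [label_1, label_2, label_3]) pairs.

    Returns a Counter mapping (majority function, word) to a count,
    i.e. the cells of a table like the one above.
    """
    table = Counter()
    for word, labels in tokens:
        table[(majority_label(labels), word)] += 1
    return table

# Example: tabulate([("okay", ["Ack/Agree", "Ack/Agree", "Backchannel"]), ...])
```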