Back Channel Communication
Antoine Raux
Dialogs on Dialogs
02/25/2005

Outline
• From Back Channel to backchannels
• Function of the Back Channel
• Characteristics of the Back Channel
• The Back Channel in Spoken Dialogue Systems

From back channel…
• 1970s: Conversation Analysts attempt to describe systematic rules for turn-taking management
  – Goal: minimize gaps and overlaps between speakers
• BUT natural speech contains many overlaps
  – E.g.: “mm-hmm”, “okay”, “yeah”…
• “Back channel” (Yngve 1970): a parallel channel for communication (Duncan 1972)
  – “Back channel communication does not constitute a turn or a claim for a turn”
  – But it “may participate in a variety of communication functions, including the regulation of speaking turns.”

…to backchannels
• “Backchannel”: listener-produced signal such as “mm-hmm”, “yeah”… (“to backchannel”: to produce such signals)
• Does not imply the will to take the turn
• Implies some form of acknowledgment (in general)

Front vs Back Channel
• Function
  – Front Channel: propositional, transactional, conversation management, social
  – Back Channel: conversation management, social
• Protocol
  – Front Channel: turn-taking, floor sharing?
  – Back Channel: no floor to share (controlled by FC?)
• Lexical content
  – Front Channel: anything
  – Back Channel: vocalizations, short words, phrases (“That’s true”)

Front-channel cues to backchannel signals
• Koiso et al. (1998)
• Analyze the relationship between different syntactic and prosodic features and the occurrence of backchannels

Koiso et al. (Methodology)
• Data: 8 dialogs from the Japanese Map Task corpus
  – Replica of the Edinburgh Map Task
  – Face-to-face and speech-only settings (no difference found)
• Features
  – Syntactic: POS
  – Duration of last mora (normal/long/short)
  – F0 pattern of last mora (flat-fall, rise…)
  – Peak F0 (low/high)
  – Energy pattern (late-decr, decr, no-decr)
  – Peak energy (low/high)

Koiso et al. (Results)
• Feature values more frequent in BC contexts than in no-BC contexts:
  – POS = verb phrase, postposition, conjunction
  – F0 pattern = flat-fall or rise-fall
  – Energy pattern = late-decr
  – Peak energy = high
• Feature values more frequent in no-BC contexts than in BC contexts:
  – POS = adverb, conjunction, interjection, filler
  – Duration = short
  – F0 pattern = fall or flat
  – Energy pattern = no-decr
  – Peak energy = low

Koiso et al. (Results)
• Decision tree analysis
• Compare the loss in performance when each feature is left out
  – POS: single best feature
  – Prosodic features altogether: as good as POS

Koiso et al. (Discussion)
• Some POS strongly inhibit BC
• Individual prosodic features are not good indicators of BC occurrence
• BC occurrence is conditioned by both POS and prosody (taken as a whole)
• What about other languages?
• What about BCs overlapping with speech?

BC cues in English and Japanese
• Ward and Tsukahara (2000)
• Test one hypothesis (“BCs are triggered by low-pitch cues”) in two languages

The Low Pitch Cue
• In both American English and Japanese, it appears that “after a region of low pitch lasting 110 ms the listener tends to produce back-channel feedback”.
• Goal of this paper: test this quantitatively on naturally occurring conversations

Ward and Tsukahara (Methodology)
• Data:
  – English: 8 conversations, 12 speakers (first author participates in 5 conversations!)
  – Japanese: 18 conversations, 24 speakers
• Prediction:
  – Every 10 ms, decide BC/no-BC by applying a hand-coded rule with 5 parameters tuned to the data

Ward and Tsukahara (Results)
• Each predicted BC was counted as correct if it fell within 500 ms of an actual BC
• The low-pitch-region rule performs better than chance in both English and Japanese

Ward and Tsukahara (Results)
• Issues:
  – Evaluation (tolerance window size, speakers produce BCs with different frequencies…)
  – No actual comparison between languages
  – Are low-pitch regions and BCs simply both correlated with other phenomena (syntactic completion, disfluencies…), or is there a direct cause/effect relationship?

Effects of Native Language and Gender on BC
• Feke (2003)
• Conversation Analysis study of BC in native-English and native-Spanish, same- and mixed-gender dialogs

Definition of BC
• BC: responses of the participant who is “clearly not holding the floor”…
• Very loose compared to previous papers:
  – e.g. “How did you find Quechua?” is a BC
• Distinguishes In-Between BCs and Overlap BCs

Feke (Methodology)
• Recorded 8 non-scripted conversations between 8 different speakers (2 native languages × 2 genders × 2 subjects)
• Manually coded In-Between BCs and Overlap BCs

Feke (Results)
• No differences observed across cultures
• Participants of both genders tend to use more BCs when conversing with someone of the opposite gender
• The difference seems bigger for females than for males

Feke (Discussion)
• Interesting/surprising result from the ethnological/sociological point of view
• Very few data points, no significance analysis
• Only looked at the number of BCs
• Consequences for SDS? (e.g. using gender information in BC prediction, selecting the gender of an agent…)

BC in Practical Systems…
• Takeuchi et al. (2003)
• Method to determine the timing of turn transitions and aizuchi (≈BC) on a Japanese human-human corpus

Takeuchi (Approach)
• Similar to Koiso et al., but only using automatically extracted features
• Every 100 ms, decide between:
  – Take turn
  – Aizuchi (BC)
  – Leave turn (wait)

Takeuchi (Approach)
• Decision tree using
  – Syntax (POS, content/function words)
  – Utterance duration
  – Pause duration / pause since last content word
  – Content word duration
  – F0
  – Power

Takeuchi (Results)
• Precision/recall of frame classification:
  – Around 80% on the training set
  – Less than 50% on a test set
• Subjective evaluation:
  – Artificially insert BCs at the predicted times
  – Timing was judged “good” in 70-80% of cases
  – On real utterances: 72% (!)

Takeuchi (Discussion)
• Found that syntactic information did not help (contradicts Koiso?)
• Underscores the difficulty of evaluating turn-taking/backchanneling systems

Conclusion
• Hard to account for simultaneous turns in conversation
• The Back Channel framework offers one explanation
• But most work remains very specific
• Missing a good theory of conversation…
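Koiso et al.’s decision-tree analysis ranks features by how much performance drops when a feature is withheld. A minimal Python sketch of that leave-one-out ablation follows. `ablation_losses` and the `evaluate` callback are hypothetical names: `evaluate` stands in for training and testing a decision tree on a given feature subset and is not part of the original study. Named groups allow dropping, say, all prosodic features at once, which is the comparison behind “prosodic features altogether: as good as POS”.

```python
from typing import Callable, Dict, Iterable, Sequence


def ablation_losses(
    features: Sequence[str],
    groups: Dict[str, Iterable[str]],
    evaluate: Callable[[frozenset], float],
) -> Dict[str, float]:
    """Return the score drop caused by leaving out each feature or group.

    `evaluate(subset)` is an assumed callback that trains and tests a
    classifier (e.g. a decision tree) on only the features in `subset`
    and returns an accuracy-like score.
    """
    full = frozenset(features)
    baseline = evaluate(full)  # score with every feature available
    losses: Dict[str, float] = {}
    # Leave out one feature at a time.
    for name in features:
        losses[name] = baseline - evaluate(full - {name})
    # Leave out a whole group (e.g. all prosodic features) at once.
    for group_name, members in groups.items():
        losses[group_name] = baseline - evaluate(full - frozenset(members))
    return losses
```

A larger loss means the model relies more on that feature or group. Feature and group names should not collide, since both are stored in the same result dictionary.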
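Ward and Tsukahara’s predictor makes a BC/no-BC decision every 10 ms with a hand-coded rule. The Python sketch below is a hypothetical reconstruction of such a rule, not the authors’ exact one: only the 10 ms frame step and the 110 ms low-pitch duration come from the slides. The percentile that defines “low” pitch, the minimum amount of preceding speech, and the minimum gap between two predictions are placeholder assumptions standing in for the tuned parameters. The pitch threshold is computed over the whole input, so this version is offline rather than incremental.

```python
import math
from typing import List, Optional, Sequence


def nearest_rank_percentile(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile (pct in (0, 100]) of a non-empty sequence."""
    ordered = sorted(values)
    k = max(0, math.ceil(pct / 100 * len(ordered)) - 1)
    return ordered[k]


def predict_backchannels(
    f0: Sequence[float],
    frame_ms: int = 10,        # one decision per 10 ms frame, as in the slides
    low_pct: float = 26.0,     # assumption: "low" = below this pitch percentile
    min_low_ms: int = 110,     # low-pitch region length quoted in the slides
    min_speech_ms: int = 700,  # assumption: require some speech before any BC
    refractory_ms: int = 800,  # assumption: minimum gap between two predictions
) -> List[int]:
    """Return the times (ms, end of frame) at which the rule predicts a BC.

    f0 holds one pitch value in Hz per frame; 0 marks unvoiced or silent frames.
    """
    voiced = [v for v in f0 if v > 0]
    if not voiced:
        return []
    threshold = nearest_rank_percentile(voiced, low_pct)
    predictions: List[int] = []
    low_run = 0                    # consecutive voiced frames below the threshold
    speech_ms = 0                  # voiced speech heard so far
    last_bc: Optional[int] = None  # time of the previous prediction
    for i, v in enumerate(f0):
        t = (i + 1) * frame_ms     # time at the end of the current frame
        if v > 0:
            speech_ms += frame_ms
        low_run = low_run + 1 if 0 < v < threshold else 0
        if (low_run * frame_ms >= min_low_ms
                and speech_ms >= min_speech_ms
                and (last_bc is None or t - last_bc >= refractory_ms)):
            predictions.append(t)
            last_bc = t
            low_run = 0            # a new low region is needed for the next BC
    return predictions
```

For example, `predict_backchannels([200.0] * 100 + [100.0] * 20)` returns `[1110]`: the 100 Hz frames fall below the threshold, and the rule fires once the low region has lasted 110 ms (11 frames, ending at 1110 ms).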
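Ward and Tsukahara count a predicted BC as correct if it falls within 500 ms of an actual BC. The Python sketch below implements that tolerance-window scoring. The recall side (an actual BC counts as found if some prediction lies within the window) is an added assumption, and matches are not one-to-one, so a single prediction can cover several actual BCs. Both choices change the resulting numbers, which is one reason the tolerance window appears among the evaluation issues above.

```python
from bisect import bisect_left
from typing import Sequence, Tuple


def within(sorted_times: Sequence[int], t: int, window_ms: int) -> bool:
    """True if some element of sorted_times lies in [t - window_ms, t + window_ms]."""
    i = bisect_left(sorted_times, t - window_ms)  # first element >= t - window
    return i < len(sorted_times) and sorted_times[i] <= t + window_ms


def score(predicted: Sequence[int], actual: Sequence[int],
          window_ms: int = 500) -> Tuple[float, float]:
    """Tolerance-window precision and recall for BC timing (times in ms)."""
    actual_sorted = sorted(actual)
    predicted_sorted = sorted(predicted)
    # A prediction is correct if an actual BC lies within the window.
    hits = sum(within(actual_sorted, t, window_ms) for t in predicted)
    # Assumption: an actual BC is found if a prediction lies within the window.
    found = sum(within(predicted_sorted, t, window_ms) for t in actual)
    precision = hits / len(predicted) if predicted else 0.0
    recall = found / len(actual) if actual else 0.0
    return precision, recall
```

For instance, `score([1000, 3000, 5200], [1300, 5000, 8000])` gives a precision and a recall of 2/3 each; narrowing `window_ms` to 250 drops the match between 1000 and 1300.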