Back Channel Communication Antoine Raux Dialogs on Dialogs 02/25/2005

1
Outline
• From Back Channel to backchannels
• Function of the Back Channel
• Characteristics of the Back Channel
• The Back Channel in Spoken Dialogue Systems
2
From back channel…
• 70s: Conversation Analysts attempt to describe
systematic rules for turn-taking management
– Goal: minimize gaps and overlaps between speakers
• BUT many overlaps in natural speech
– E.g.: “mm-hmm”, “okay”, “yeah”…
• “Back channel” (Yngve 1970): Parallel channel
for communication (Duncan 1972)
– “Back channel communication does not constitute a
turn or a claim for a turn”
– But it “may participate in a variety of communication
functions, including the regulation of speaking turns.”
3
…to backchannels
• “Backchannel”: listener-produced signal
such as “mm-hmm”, “yeah”…
(“To backchannel”: to produce such signals)
• Does not imply the will to take the turn
• Implies some form of acknowledgment
(in general)
4
Front vs Back Channel
• Function
– Front Channel: propositional, transactional, conversation management, social
– Back Channel: conversation management, social
• Protocol
– Front Channel: turn-taking, floor sharing
– Back Channel: ? (controlled by FC?), no floor to share
• Lexical content
– Front Channel: anything
– Back Channel: vocalizations, short words, phrases (“That’s true”)
5
Front-channel cues to backchannel signals
• Koiso et al (1998)
• Analyze the relationship between different
syntactic and prosodic features and the
occurrence of backchannels
6
Koiso et al (Methodology)
• Data: 8 dialogs from Japanese Map Task
corpus:
– replica of the Edinburgh MT
– Face-to-face and speech only (no difference)
• Features
– Syntactic: POS
– Duration of last mora (normal/long/short)
– F0 pattern of last mora (flat-fall, rise…)
– Peak F0 (low/high)
– Energy pattern (late-decr, decr, no-decr)
– Peak energy (low/high)
7
Koiso et al (Results)
• Frequency of feature values
– BC > no-BC:
POS = verb-phrase, post-position, conjunction
F0 pattern = flat-fall or rise-fall
Energy pattern = late-decr
Peak energy = high
– no-BC > BC:
POS = adv, conjunction, interjection, filler
Duration = short
F0 pattern = fall or flat
Energy pattern = non-decr
Peak energy = low
8
Koiso et al (Results)
• Decision Tree analysis
• Compare the loss in performance when each
feature is left out
– POS: single best feature
– Prosodic features altogether: as good as POS
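
The leave-one-feature-out comparison described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names (`fit_lookup`, `ablation`) are hypothetical, the toy rows are invented purely to exercise the code, and a simple majority-vote lookup table stands in for the decision tree used in the study.

```python
from collections import Counter, defaultdict


def fit_lookup(rows, labels, features):
    """Stand-in classifier (NOT a decision tree): predict the majority
    training label for each combination of the selected feature values."""
    votes = defaultdict(Counter)
    for row, label in zip(rows, labels):
        votes[tuple(row[f] for f in features)][label] += 1
    default = Counter(labels).most_common(1)[0][0]
    table = {key: c.most_common(1)[0][0] for key, c in votes.items()}
    return lambda row: table.get(tuple(row[f] for f in features), default)


def accuracy(predict, rows, labels):
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(labels)


def ablation(train, test, features):
    """Return (accuracy with all features,
    {feature: accuracy lost when that feature is removed})."""
    (train_rows, train_labels), (test_rows, test_labels) = train, test
    full = accuracy(fit_lookup(train_rows, train_labels, features),
                    test_rows, test_labels)
    drops = {}
    for f in features:
        kept = [g for g in features if g != f]
        reduced = accuracy(fit_lookup(train_rows, train_labels, kept),
                           test_rows, test_labels)
        drops[f] = full - reduced
    return full, drops


# Invented toy data: in these rows the label happens to follow POS only.
train_x = [
    {"pos": "verb", "f0": "rise"}, {"pos": "verb", "f0": "rise"},
    {"pos": "verb", "f0": "fall"}, {"pos": "verb", "f0": "fall"},
    {"pos": "adv", "f0": "rise"}, {"pos": "adv", "f0": "fall"},
]
train_y = ["BC", "BC", "BC", "BC", "no-BC", "no-BC"]
test_x = [
    {"pos": "verb", "f0": "rise"}, {"pos": "adv", "f0": "fall"},
    {"pos": "verb", "f0": "fall"}, {"pos": "adv", "f0": "rise"},
]
test_y = ["BC", "no-BC", "BC", "no-BC"]

full, drops = ablation((train_x, train_y), (test_x, test_y), ["pos", "f0"])
print(full, drops)
```

A larger drop means the classifier relied more on that feature. Removing a group of features together (for example, all prosodic ones) would follow the same pattern with `kept` excluding the whole group.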
9
Koiso et al (Discussion)
• Some POS strongly inhibit BC
• Individual prosodic features are not good
indicators of BC occurrence
• BC occurrence is conditioned by both
POS and prosody (as a whole)
• What about other languages?
• What about BC overlapping with speech?
10
BC cues in English and Japanese
• Ward and Tsukahara (2000)
• Tests one hypothesis (“BC are triggered
by low pitch cues”) for two languages
11
The Low Pitch Cue
• Both in American English and Japanese,
it appears that “after a region of low pitch
lasting 110 ms the listener tends to
produce back-channel feedback”.
• Goal of this paper: quantitatively test this
on naturally occurring conversations
12
Ward and Tsukahara (Methodology)
• Data:
– English: 8 conversations, 12 speakers (first
author participates in 5 conversations!)
– Japanese: 18 conversations, 24 speakers
• Prediction:
– Every 10 ms, decide BC/no-BC by applying a
hand-coded rule with 5 parameters tuned to
the data
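
A frame-by-frame rule of this general kind might look like the sketch below. It does not reproduce the authors' five parameters, which the slides do not list. Only the 10 ms decision step and the 110 ms low-pitch duration come from the slides; the pitch threshold in Hz and the refractory period are hypothetical placeholders, as are all names.

```python
from dataclasses import dataclass

FRAME_MS = 10  # decision interval (from the slides)


@dataclass(frozen=True)
class RuleParams:
    low_pitch_ms: int = 110      # required duration of low pitch (from the slides)
    low_pitch_hz: float = 100.0  # hypothetical threshold for "low" pitch
    refractory_ms: int = 800     # hypothetical minimum gap between predictions


def predict_backchannels(f0_hz, params=RuleParams()):
    """f0_hz holds one pitch value per 10 ms frame (None if unvoiced).
    Returns the times in ms at which the rule predicts a backchannel."""
    needed = params.low_pitch_ms // FRAME_MS  # consecutive low frames required
    run = 0          # length of the current low-pitch run, in frames
    last_bc = None   # time of the most recent prediction
    predictions = []
    for i, f0 in enumerate(f0_hz):
        if f0 is not None and f0 < params.low_pitch_hz:
            run += 1
        else:
            run = 0
        t = (i + 1) * FRAME_MS  # end time of frame i
        far_enough = last_bc is None or t - last_bc >= params.refractory_ms
        if run >= needed and far_enough:
            predictions.append(t)
            last_bc = t
            run = 0
    return predictions


# 200 ms of higher pitch, 150 ms of low pitch, then higher pitch again.
track = [200.0] * 20 + [90.0] * 15 + [200.0] * 20
print(predict_backchannels(track))
```

In practice the threshold would more plausibly be set relative to each speaker's pitch range rather than as a fixed value in Hz; the fixed value here just keeps the sketch short.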
13
Ward and Tsukahara (Results)
• Each predicted BC was considered
correct if it fell within 500 ms of an actual
BC
• Low pitch region rule is better than
chance both in English and Japanese
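
The tolerance-window scoring above can be sketched as follows. The function name is hypothetical, and two details are assumptions not stated in the slides: the 500 ms boundary is treated as inclusive, and several predictions are allowed to match the same actual BC.

```python
def score_predictions(predicted_ms, actual_ms, tolerance_ms=500):
    """Return (precision, coverage).

    precision: share of predicted BCs within tolerance of some actual BC.
    coverage:  share of actual BCs within tolerance of some predicted BC.
    """
    def near(t, others):
        return any(abs(t - o) <= tolerance_ms for o in others)

    hits = sum(near(p, actual_ms) for p in predicted_ms)
    covered = sum(near(a, predicted_ms) for a in actual_ms)
    precision = hits / len(predicted_ms) if predicted_ms else 0.0
    coverage = covered / len(actual_ms) if actual_ms else 0.0
    return precision, coverage


predicted = [1000, 3000, 6500]  # invented example times, in ms
actual = [1200, 6400, 9000]
print(score_predictions(predicted, actual))                   # 500 ms window
print(score_predictions(predicted, actual, tolerance_ms=50))  # much stricter window
```

Running the same predictions with a different `tolerance_ms` shows how strongly the reported score depends on the window size.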
14
Ward and Tsukahara (Results)
• Issues:
– Evaluation (tolerance window size, speakers
produce BCs with different frequencies…)
– No actual comparison between languages
– Are low pitch regions and BCs simply
correlated to other phenomena (syntactic
completion, disfluencies…) or is there a
direct cause/consequence relationship?
15
Effects of Native Language
and Gender on BC
• Feke (2003)
• Conversation Analysis study of BC in
native-English and native-Spanish, same- and mixed-gender dialogs
16
Definition of BC
• BC: responses of the participant who is
“clearly not holding the floor”…
• Very loose compared to previous papers:
– e.g. “How did you find Quechua?” is a BC
• Distinguishes In-Between BC and
Overlap BC
17
Feke (Methodology)
• Recorded 8 non-scripted conversations
between 8 different speakers (2 native
languages x 2 genders x 2 subjects)
• Manually coded In-Between BCs and
Overlap BCs
18
Feke (Results)
• No differences observed across cultures
• Participants of both genders tend to use
more BC when conversing with someone
of the opposite gender
• Difference seems bigger for females than
for males
19
Feke (Discussion)
• Interesting/surprising result from the
ethnological/sociological point of view
• Very few data points, no significance
analysis
• Only looked at number of BCs
• Consequences on SDS? (e.g. using
gender information in BC prediction,
selecting the gender of an agent…)
20
BC in Practical Systems…
• Takeuchi et al (2003)
• Method to determine the timing of turn
transitions and aizuchi (≈BC) on
Japanese Human-Human corpus
21
Takeuchi (Approach)
• Similar to Koiso et al, but only using
automatically extracted features
• Every 100 ms decide between:
– Take turn
– Aizuchi (BC)
– Leave turn (wait)
22
Takeuchi (Approach)
• Decision Tree using
– Syntax (POS, content/function words)
– Utterance duration
– Pause duration/pause since last content word
– Content word duration
– F0
– Power
23
Takeuchi (Results)
• Precision/Recall of frame classification:
– Around 80% on the training set
– Less than 50% on a test set
• Subjective evaluation:
– Artificially insert BC at predicted time
– Timing was judged “good” in 70-80%
– On real utterances: 72% (!)
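
Frame-level precision and recall of the kind reported above could be computed along these lines. This is a sketch only: the slides do not say whether the figures were per class or averaged, so per-class scoring is an assumption here, and the label strings and frame sequences are invented for illustration.

```python
def precision_recall(gold, predicted, label):
    """Precision and recall for one label over aligned per-frame decisions."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == label and p == label for g, p in pairs)
    fp = sum(g != label and p == label for g, p in pairs)
    fn = sum(g == label and p != label for g, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# One decision per 100 ms frame: take the turn, produce an aizuchi, or wait.
gold = ["wait", "wait", "aizuchi", "wait", "take", "wait", "aizuchi", "wait"]
pred = ["wait", "aizuchi", "aizuchi", "wait", "take", "wait", "wait", "wait"]

for label in ("take", "aizuchi", "wait"):
    print(label, precision_recall(gold, pred, label))
```

Note that a prediction one frame early (the second frame here) counts as fully wrong under this metric, even though a listener might judge its timing acceptable, which is one reason frame scores and subjective ratings can diverge.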
24
Takeuchi (Discussion)
• Found that syntactic information did not
help (contradicts Koiso?)
• Underscores the difficulty of evaluating
turn-taking/backchanneling systems
25
Conclusion
• Hard to account for simultaneous turns in
conversation
• Back Channel framework offers one
explanation
• But most work remains very specific
• Missing a good theory of conversation…
26