Classification of Discourse Functions of Affirmative Words in Spoken Dialogue

INTERSPEECH, Antwerp, August 2007
Agustín Gravano, Stefan Benus, Julia Hirschberg
Shira Mitchell, Ilia Vovsha
Spoken Language Processing Group
Columbia University
Cue Words

• Ambiguous linguistic expressions, used for:
  – making a semantic contribution, or
  – conveying a pragmatic function.
  Examples: now, well, so, alright, and, okay, first, by the way, on the other hand.

• Single affirmative cue words
  Examples: alright, okay, mm-hm, right, uh-huh, yes.
  May be used to convey acknowledgment or agreement, to change topic, to backchannel, etc.
Agustín Gravano
INTERSPEECH 2007
2
Research Goals

• Learn which features best characterize the different functions of single affirmative cue words.
• Determine how these functions can be identified automatically.
• Important in Spoken Dialogue Systems, in order to:
  – understand user input, and
  – produce output appropriately.
Previous Work

• Classification of cue words into discourse vs. sentential use: Hirschberg & Litman '87, '93; Litman '94; Heeman, Byron & Allen '98; Zufferey & Popescu-Belis '04.
• In our corpus:
  – right: 15% discourse, 85% sentential.
  – All other affirmative cue words: 99% discourse, 1% sentential.
• The discourse vs. sentential distinction is therefore insufficient; new classification tasks need to be defined.
Talk Overview

• The Columbia Games Corpus
• Classification tasks
• Experimental features
• Results
The Columbia Games Corpus

• 12 spontaneous task-oriented dyadic conversations in Standard American English.
• 2 subjects playing computer games; no eye contact.
The Columbia Games Corpus
Function of Affirmative Cue Words

Cue words (7.9% of the words in our corpus): alright, gotcha, huh, mm-hm, okay, right, uh-huh, yeah, yep, yes, yup.

Functions:
• Acknowledgment / Agreement
• Backchannel
• Cue beginning discourse segment
• Cue ending discourse segment
• Check with the interlocutor
• Stall / Filler
• Back from a task
• Literal modifier
• Pivot beginning: Ack/Agree + Cue begin
• Pivot ending: Ack/Agree + Cue end
The Columbia Games Corpus
Function of Affirmative Cue Words

• Literal Modifier:
  "that's pretty much okay"
• Backchannel:
  Speaker 1: between the yellow mermaid and the whale
  Speaker 2: okay
  Speaker 1: and it is
• Cue beginning discourse segment:
  "okay we gonna be placing the blue moon"
The Columbia Games Corpus
Function of Affirmative Cue Words

• 3 trained labelers.
• Inter-labeler agreement: Fleiss' Kappa = 0.69 (Fleiss '71).
• In this study we use the majority label for each affirmative cue word.
  – Majority label: the label chosen by at least two of the three labelers.
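As a concrete illustration of the majority-label rule and the agreement statistic above, here is a minimal pure-Python sketch; the example labels and ratings are invented, not taken from the corpus:

```python
from collections import Counter

def majority_label(item_ratings):
    """Label chosen by at least two of the three labelers, else None."""
    label, count = Counter(item_ratings).most_common(1)[0]
    return label if count >= 2 else None

def fleiss_kappa(ratings, categories):
    """Fleiss' kappa for items each rated by the same number of raters."""
    n_raters = len(ratings[0])
    counts = [Counter(item) for item in ratings]
    # P_i: pairwise agreement within each item
    p_items = [(sum(c * c for c in cnt.values()) - n_raters)
               / (n_raters * (n_raters - 1)) for cnt in counts]
    p_bar = sum(p_items) / len(ratings)
    # p_j: overall proportion of assignments to each category
    total = len(ratings) * n_raters
    p_e = sum((sum(cnt.get(cat, 0) for cnt in counts) / total) ** 2
              for cat in categories)
    return (p_bar - p_e) / (1 - p_e)

# Three labelers annotating three tokens (hypothetical labels):
ratings = [["Ack", "Ack", "Ack"],
           ["Backchannel", "Backchannel", "Ack"],
           ["Backchannel", "Backchannel", "Backchannel"]]
print(majority_label(ratings[1]))                               # Backchannel
print(round(fleiss_kappa(ratings, ["Ack", "Backchannel"]), 2))  # 0.55
```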
Method
Two New Classification Tasks

• Identification of a discourse segment boundary function:
  – segment beginning,
  – vs. segment end,
  – vs. no discourse segment boundary function.
• Identification of an acknowledgment function:
  – acknowledgment vs. no acknowledgment.
Method
Machine Learning Experiments

• ML algorithm:
  – JRip, Weka's implementation of the propositional rule learner Ripper (Cohen '95).
  – We also tried J4.8, Weka's implementation of the decision tree learner C4.5 (Quinlan '93, '96), with similar results.
• 10-fold cross-validation in all experiments.
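Weka's JRip itself is not shown here; as a sketch of the 10-fold cross-validation protocol, the following uses a trivial most-frequent-label-per-word learner as a stand-in for the rule learner (all names and the toy data are ours):

```python
import random
from collections import Counter, defaultdict

def train_lexical(pairs):
    """Stand-in learner: map each word to its most frequent label."""
    by_word = defaultdict(Counter)
    for word, label in pairs:
        by_word[word][label] += 1
    default = Counter(lbl for _, lbl in pairs).most_common(1)[0][0]
    return {w: c.most_common(1)[0][0] for w, c in by_word.items()}, default

def predict_lexical(model, word):
    table, default = model
    return table.get(word, default)

def ten_fold_cv(pairs, n_folds=10, seed=0):
    """Shuffle, split into folds, train on n-1 folds, test on the held-out one."""
    data = pairs[:]
    random.Random(seed).shuffle(data)
    folds = [data[i::n_folds] for i in range(n_folds)]
    rates = []
    for i, test in enumerate(folds):
        train = [p for j, f in enumerate(folds) if j != i for p in f]
        model = train_lexical(train)
        errors = sum(predict_lexical(model, w) != y for w, y in test)
        rates.append(errors / len(test))
    return sum(rates) / n_folds

# Toy data in which lexical identity predicts the label perfectly:
data = [("okay", "Ack")] * 100 + [("right", "Literal Modifier")] * 100
print(ten_fold_cv(data))  # 0.0
```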
Method
Experimental Features

• IPU (inter-pausal unit): maximal sequence of words delimited by pauses longer than 50 ms.
• Conversational turn: maximal sequence of IPUs by the same speaker, with no contribution from the other speaker.
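Both units can be derived directly from a time-aligned transcription; a minimal sketch, where the tuple formats are our own assumptions:

```python
def segment_ipus(words, max_pause=0.050):
    """Group time-aligned words (text, start, end) into inter-pausal units:
    a new IPU starts whenever the pause before a word exceeds max_pause s."""
    ipus = []
    for word in words:
        if ipus and word[1] - ipus[-1][-1][2] <= max_pause:
            ipus[-1].append(word)
        else:
            ipus.append([word])
    return ipus

def segment_turns(ipus):
    """Group chronologically ordered (speaker, ipu) pairs into turns:
    maximal runs of consecutive IPUs by the same speaker."""
    turns = []
    for speaker, ipu in ipus:
        if turns and turns[-1][0] == speaker:
            turns[-1][1].append(ipu)
        else:
            turns.append((speaker, [ipu]))
    return turns

words = [("okay", 0.00, 0.20), ("so", 0.23, 0.40), ("um", 0.60, 0.70)]
print(len(segment_ipus(words)))  # 2: the 30 ms pause joins, the 200 ms pause splits
```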
Method
Experimental Features

• Text-based features: extracted from the text transcriptions.
  – Lexical identity; POS tags; position of the word in the IPU / turn; etc.
• Timing features: extracted from the time alignment of the transcriptions.
  – Word / IPU / turn duration; amount of overlap; etc.
• Acoustic features:
  – {min, mean, max, stdev} × {pitch, intensity}.
  – Slope of pitch, stylized pitch, and intensity, over the whole word and over its last 100, 200, 300 ms.
  – Acoustic features from the end of the other speaker's previous turn.
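The acoustic summaries above amount to simple statistics over frame-level tracks restricted to final windows of the word; a sketch with a hypothetical 10 ms pitch track:

```python
import statistics

def summarize(values):
    """min / mean / max / stdev summary of a frame-level track segment."""
    return {"min": min(values), "mean": statistics.mean(values),
            "max": max(values), "stdev": statistics.pstdev(values)}

def final_window(track, end, window):
    """Frames (time, value) within the last `window` seconds before `end`."""
    return [v for t, v in track if end - window <= t <= end]

# Hypothetical 10 ms pitch frames over one word ending at t = 0.30 s:
pitch = [(round(0.01 * i, 2), 100.0 + i) for i in range(31)]
whole_word = summarize([v for _, v in pitch])
last_100ms = summarize(final_window(pitch, end=0.30, window=0.100))
print(last_100ms["mean"])  # 125.0
```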
Results
Discourse Segment Boundary Function

Feature Set           Error Rate   F-Measure (Begin)   F-Measure (End)
Text-based              11.6 %           .77                 .30
Timing                  11.3 %           .73                 .52
Acoustic                14.2 %           .66                 .19
Text-based + Timing      9.8 %           .81                 .53
Full set                 9.6 %           .81                 .57
Baseline (1)            19.0 %           .00                 .00
Human labelers (2)       5.7 %           .94                 .71

(1) Majority-class baseline: NO BOUNDARY.
(2) Calculated wrt each labeler's agreement with the majority labels.
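The F-measures above follow the standard definition (harmonic mean of precision and recall); for reference, a minimal implementation with an invented example count:

```python
def f_measure(tp, fp, fn):
    """F1 score; returns 0 when precision and recall are both zero,
    which is how a never-predicted class (e.g. the majority-class
    baseline's BOUNDARY classes) ends up with F = .00."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# A classifier that finds 8 of 10 true boundaries with 2 false alarms:
print(round(f_measure(tp=8, fp=2, fn=2), 2))  # 0.8
```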
Results
Acknowledgment Function

Feature Set           Error Rate   F-Measure
Text-based               8.3 %        .94
Timing                  11.0 %        .92
Acoustic                17.2 %        .87
Text-based + Timing      6.2 %        .95
Full set                 6.5 %        .95
Baseline (1)            16.7 %        .88
Human labelers (2)       5.5 %        .98

(1) Baseline based on lexical identity: {huh, right} → no ACK; all other words → ACK.
(2) Calculated wrt each labeler's agreement with the majority labels.
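The lexical-identity baseline of footnote (1) is trivially expressible in code (the label strings are ours):

```python
def ack_baseline(word):
    """Baseline from footnote (1): 'huh' and 'right' -> no ACK, else ACK."""
    return "NO_ACK" if word in {"huh", "right"} else "ACK"

print(ack_baseline("right"), ack_baseline("okay"))  # NO_ACK ACK
```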
Agustín Gravano
INTERSPEECH 2007
15
Best-Performing Features

Discourse segment boundary function:
• Lexical identity
• POS tag of the following word
• Number and proportion of succeeding words in the turn
• Context-normalized mean intensity

Acknowledgment function:
• Lexical identity
• POS tag of the preceding word
• Number and proportion of preceding words in the turn
• IPU and turn length
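Several of these features are simple counts over the turn; a sketch of the position-in-turn features (the function name and output keys are ours):

```python
def position_in_turn(turn_words, index):
    """Number and proportion of preceding / succeeding words for the
    word at `index` in a turn, like the features listed above."""
    n = len(turn_words)
    preceding, succeeding = index, n - index - 1
    return {"n_preceding": preceding, "prop_preceding": preceding / n,
            "n_succeeding": succeeding, "prop_succeeding": succeeding / n}

# A turn-initial "okay" has many succeeding words and no preceding ones:
feats = position_in_turn(["okay", "so", "the", "moon"], index=0)
print(feats["n_succeeding"], feats["prop_preceding"])  # 3 0.0
```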
Results
Classification of Individual Words

Classification of each individual word into its most common functions:
• alright → Ack/Agree, Cue Begin, Other
• mm-hm → Ack/Agree, Backchannel
• okay → Ack/Agree, Backchannel, Cue Begin, Ack+CueBegin, Ack+CueEnd, Other
• right → Ack/Agree, Check, Literal Modifier
• yeah → Ack/Agree, Backchannel
Results
Classification of the Word 'okay'

                                               F-Measure
Feature Set           Error Rate   Ack/Agree  Backchannel  Cue Begin  Ack+CueBegin  Ack+CueEnd
Text-based              31.7 %        .76        .16          .77         .09           .33
Acoustic                40.2 %        .69        .24          .64         .03           .25
Text-based + Timing     25.6 %        .79        .31          .82         .18           .67
Full set                25.5 %        .80        .46          .83         .21           .66
Baseline (1)            48.3 %        .68        .00          .00         .00           .00
Human labelers (2)      14.0 %        .89        .78          .94         .56           .73

(1) Majority-class baseline: ACK/AGREE.
(2) Calculated wrt each labeler's agreement with the majority labels.
Summary

• The discourse/sentential distinction is insufficient for affirmative cue words in spoken dialogue.
• Two new classification tasks:
  – Detection of an acknowledgment function.
  – Detection of a discourse boundary function.
• Best-performing ML models:
  – Based on textual and timing features.
  – Slight improvement when adding acoustic features.
Further Work

• Gravano et al., 2007. On the role of context and prosody in the interpretation of 'okay'. ACL 2007, Prague, Czech Republic, June 2007.
• Benus et al., 2007. The prosody of backchannels in American English. ICPhS 2007, Saarbrücken, Germany, August 2007.
Function Counts per Word

Function           alright   mm-hm    okay   right   uh-huh    yeah   Other   Total
Ack/Agree               99      61    1137     114       18     808     133    2370
Backchannel              6     402     121      14      143      72       5     763
Cue Begin               89       0     548       2        0       2       0     641
Cue End                  8       0      10       0        0       0       0      18
Pivot Begin              5       0      68       0        0       0       0      73
Pivot End               13      12     232       2        0      22      17     298
Back from Task           9       1      33       0        0       0       0      43
Check                    0       0       6      53        0       1       8      68
Stall                    1       0      15       1        0       2       0      19
Literal Modifier         9       0      29    1079        0       0       1    1118
?                       56      27     235      10        3      65      11     407
Total                  295     503    2434    1275      164     972     175    5818
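Reading off the 'right' column of the table recovers the 85% sentential figure cited on the Previous Work slide, taking the Literal Modifier row as the sentential use:

```python
# Function counts for "right", from the table above.
right = {"Ack/Agree": 114, "Backchannel": 14, "Cue Begin": 2, "Cue End": 0,
         "Pivot Begin": 0, "Pivot End": 2, "Back from Task": 0, "Check": 53,
         "Stall": 1, "Literal Modifier": 1079, "?": 10}
total = sum(right.values())                      # 1275
sentential = right["Literal Modifier"] / total   # literal (sentential) use
print(f"right: {sentential:.0%} sentential")     # right: 85% sentential
```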