Dialogue Acts Julia Hirschberg LSA07 353 7/15/2016

Today
• Recognizing structural information: Dialogue
Acts vs. Discourse Structure
• Speech Acts → Dialogue Acts
– Coding schemes (DAMSL)
– Practical goals
• Identifying DAs
– Direct and indirect DAs: experimental results
– Corpus studies of DA disambiguation
– Automatic DA identification
– More corpus studies
Speech Acts
• Wittgenstein ’53, Austin ’62 and Searle ’75
• Contributions to dialogue are actions performed
by speakers:
– I promise to make you very very sorry for that.
– Performative verbs
• Locutionary act: the act of conveying the ‘meaning’
of the sentence uttered (e.g. committing the
Speaker to making the hearer sorry)
• Illocutionary act: the act associated with the verb
uttered (e.g. promising)
• Perlocutionary act: the act of producing an effect
on the Hearer (e.g. threatening)
Searle’s Classification Scheme
• Assertives: commit S to the truth of X (e.g. The
world is flat)
• Directives: attempt by S to get H to do X (e.g.
Open the window please)
• Commissives: commit S to do X (e.g. I’ll do it
tomorrow)
• Expressives: S’s description of his/her own
feelings about X (e.g. I’m sorry I screamed)
• Declarations: S brings about a change in the
world by virtue of uttering X (e.g. I divorce you, I
resign)
Dialogue Acts
• Roughly correspond to Illocutionary acts
– Motivation: Modeling Spoken Dialogue
– Many coding schemes (e.g. DAMSL)
– Many-to-many mapping between DAs and words
• Agreement DA can be realized by Okay, Um, Right, Yeah, …
• But each of these can express multiple DAs, e.g.
S: You should take the 10pm flight.
U: Okay
…that sounds perfect.
…but I’d prefer an earlier flight.
…(I’m listening)
A Possible Coding Scheme for ‘ok’
• Ritualistic?
– Closing
– You're welcome
– Other
– No
• 3rd-Turn-Receipt?
– Yes
– No
• If Ritualistic==No, code all of these as
well:
• Task Management:
– I'm done
– I'm not done yet
– None
• Topic Management:
– Starting new topic
– Finished old topic
– Pivot: finishing and starting
• Turn Management:
– Still your turn (=traditional
backchannel)
– Still my turn (=stalling for time)
– I'm done, it is now your turn
– None
• Belief Management:
– I accept your proposition
– I entertain your proposition
– I reject your proposition
– Do you accept my proposition? (=ynq)
– None
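One way to see how the branches of this scheme fit together is to encode a single coded ‘ok’ as a record with a well-formedness check. The sketch below is only an illustration: the class and method names (`OkCoding`, `validate`) are hypothetical, the label strings are copied from the slides, and it assumes that the four management dimensions are required exactly when Ritualistic is coded "No".

```python
from dataclasses import dataclass
from typing import List, Optional

# Allowed labels per dimension, copied from the slides.
RITUALISTIC = {"Closing", "You're welcome", "Other", "No"}
THIRD_TURN_RECEIPT = {"Yes", "No"}
DIMENSIONS = {
    "task": {"I'm done", "I'm not done yet", "None"},
    "topic": {"Starting new topic", "Finished old topic",
              "Pivot: finishing and starting"},
    "turn": {"Still your turn", "Still my turn",
             "I'm done, it is now your turn", "None"},
    "belief": {"I accept your proposition", "I entertain your proposition",
               "I reject your proposition", "Do you accept my proposition?",
               "None"},
}


@dataclass
class OkCoding:
    """Hypothetical record for one coded token of 'ok'."""
    ritualistic: str
    third_turn_receipt: str
    task: Optional[str] = None
    topic: Optional[str] = None
    turn: Optional[str] = None
    belief: Optional[str] = None

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means well-formed."""
        problems = []
        if self.ritualistic not in RITUALISTIC:
            problems.append("unknown Ritualistic label")
        if self.third_turn_receipt not in THIRD_TURN_RECEIPT:
            problems.append("unknown 3rd-Turn-Receipt label")
        # Assumption: the four management dimensions are coded
        # only when the token is not ritualistic.
        if self.ritualistic == "No":
            for name, allowed in DIMENSIONS.items():
                if getattr(self, name) not in allowed:
                    problems.append(name + " must be coded when Ritualistic is 'No'")
        return problems


backchannel = OkCoding("No", "No", task="None", topic="Finished old topic",
                       turn="Still your turn", belief="I entertain your proposition")
print(backchannel.validate())  # an empty list means the coding is complete
```

Keeping the dimensions in one dictionary makes it easy to add or rename labels without touching the check itself.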
Practical Goals
• In Spoken Dialogue Systems
– Disambiguate current DA
• Represent user input correctly
• Respond appropriately
– Predict next DA
• Switch Language Models for ASR
• Switch states in semantic processing
– Produce DA for next system turn appropriately
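As a rough illustration of the "predict next DA" goal, the sketch below estimates the most likely next DA from bigram counts over DA sequences. It is a minimal sketch, not a method described in these slides: the function names (`train_bigrams`, `predict_next`), the DA labels, and the toy dialogues are all hypothetical.

```python
from collections import Counter, defaultdict


def train_bigrams(dialogues):
    """Count how often each DA follows each other DA."""
    counts = defaultdict(Counter)
    for das in dialogues:
        for prev, nxt in zip(das, das[1:]):
            counts[prev][nxt] += 1
    return counts


def predict_next(counts, prev_da):
    """Most frequent DA observed after prev_da, or None if prev_da is unseen."""
    if prev_da not in counts:
        return None
    return counts[prev_da].most_common(1)[0][0]


# Hypothetical DA sequences, one list per dialogue.
toy_dialogues = [
    ["question", "answer", "ack"],
    ["question", "answer", "ack", "question", "answer"],
    ["statement", "ack"],
]
model = train_bigrams(toy_dialogues)
print(predict_next(model, "question"))
```

A dialogue system could use such a prediction to choose which language model or semantic state to activate before the next user turn arrives.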
Disambiguating Ambiguous DAs Intonationally
• Modal (Can/would/would..willing) questions
– Can you move the piano?
– Would you move the piano?
– Would you be willing to move the piano?
• Nickerson & Chu-Carroll ’99: Can info-requests
be disambiguated reliably from action-requests?
– By prosodic information?
– Role of politeness
Production Studies
• Design
– Subjects read ambiguous questions in disambiguating
contexts
– Control for given/new and contrastiveness
– Polite/neutral/impolite readings
– ToBI-style labeling
• Problems:
– Cells imbalanced; little data
– No pretesting
– No distractors
– Same speaker reads both contexts
– No perception checks
Results
• Indirect requests (e.g. for action)
– If L%, more likely (73%) to be indirect
– If H%, 46% were indirect: differences in height
of boundary tone?
– Politeness: ‘can’ differs in impolite (higher rise)
vs. neutral cases
– Speaker variability
• Some production differences
– Limited utility in production of indirect DAs
– Beware too steep a rise
Corpus Studies: Jurafsky et al. ’98
• Can we distinguish different DA functions for
affirmative words?
– Lexical, acoustic/prosodic/syntactic
differentiators for yeah, ok, uhuh, mhmm,
um…
– Functional categories to distinguish
• Continuers: Mhmm (not taking floor)
• Assessments: Mhmm (tasty)
• Agreements: Mhmm (I agree)
• Yes answers: Mhmm (That’s right)
• Incipient speakership: Mhmm (taking floor)
Questions
• Are these terms important cues to dialogue
structure?
• Does prosodic variation help to disambiguate
them?
• Is there any difference in syntactic realization
of certain DAs, compared to others?
Corpus
• Switchboard telephone conversation
corpus
– Hand segmented and labeled with DA
information (initially from text) using the
SWBD-DAMSL dialogue tagset
• ~60 labels that could be combined in different
dimensions
– 84% inter-labeler agreement on tags
– Tagset reduced to 42
• 7 CU-Boulder linguistics grad students labeling
Switchboard conversations of human-to-human
interaction
– Relabeling from speech → only 2% changed
labels (114/5757)
• 43/987 continuers → agreements
• Why?
– Shorter duration, lower F0, lower energy, longer
preceding pause
– DAs analyzed for
• Lexical realization
• F0 and intensity features
• Syntactic patterns
Results: Lexical Differences
• Agreements
– yeah (36%), right (11%),...
• Continuer
– uhuh (45%), yeah (27%),…
• Incipient speaker
– yeah (59%), uhuh (17%), right (7%),…
• Yes-answer
– yeah (56%), yes (17%), uhuh (14%),...
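To see why these distributions leave ‘yeah’ ambiguous, the percentages above can be inverted with Bayes' rule. The sketch below treats each percentage as P(word | DA) and assumes, purely for illustration, a uniform prior over the four DAs; the actual Switchboard DA priors are not given here, so the resulting posteriors are not corpus estimates. Words elided by "…" on the slide are treated as having probability zero, which is a simplification.

```python
# P(word | DA), copied from the percentages on this slide.
# Entries hidden behind "..." on the slide are simply missing here.
P_WORD_GIVEN_DA = {
    "agreement": {"yeah": 0.36, "right": 0.11},
    "continuer": {"uhuh": 0.45, "yeah": 0.27},
    "incipient speaker": {"yeah": 0.59, "uhuh": 0.17, "right": 0.07},
    "yes-answer": {"yeah": 0.56, "yes": 0.17, "uhuh": 0.14},
}


def posterior(word, prior=None):
    """P(DA | word) by Bayes' rule; the prior defaults to uniform (an assumption)."""
    das = list(P_WORD_GIVEN_DA)
    if prior is None:
        prior = {da: 1.0 / len(das) for da in das}
    joint = {da: P_WORD_GIVEN_DA[da].get(word, 0.0) * prior[da] for da in das}
    total = sum(joint.values())
    if total == 0:
        raise ValueError("word not listed for any DA: " + word)
    return {da: p / total for da, p in joint.items()}


for da, p in posterior("yeah").items():
    print(da, round(p, 2))
```

Under the uniform-prior assumption no DA receives even 35% of the probability mass for ‘yeah’, which is one way of stating that the word alone does not identify the DA.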
Prosodic and Lexico/Syntactic Cues
• Over all DAs, duration best differentiator
– Highly correlated with DA length in words
• Assessments:
– Pro Term + Copula + (Intensifier) +
Assessment Adjective
– That’s X (good, great, fine,…)
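The assessment template above (Pro Term + Copula + optional Intensifier + Assessment Adjective) can be approximated with a regular expression. This is a minimal sketch under stated assumptions: only the adjectives good, great, and fine come from the slide; the pro-term, copula, and intensifier lists, and the names `ASSESSMENT_RE` and `is_assessment`, are hypothetical choices made for illustration.

```python
import re

PRO_TERMS = ["that", "it"]                # assumption, not from the slide
INTENSIFIERS = ["really", "very", "so"]   # assumption, not from the slide
ADJECTIVES = ["good", "great", "fine"]    # listed on the slide


def alternation(words):
    """Build a regex alternation from a list of literal words."""
    return "|".join(re.escape(w) for w in words)


# The copula is either the clitic 's or a separate word (is / was); the
# choice of copula forms is also an assumption.
ASSESSMENT_RE = re.compile(
    rf"^(?:{alternation(PRO_TERMS)})"
    rf"(?:'s|\s+(?:is|was))\s+"
    rf"(?:(?:{alternation(INTENSIFIERS)})\s+)?"
    rf"(?:{alternation(ADJECTIVES)})[.!]?$",
    re.IGNORECASE,
)


def is_assessment(utterance):
    """True if the utterance fits the assessment template."""
    text = utterance.strip().replace("\u2019", "'")  # normalize curly apostrophe
    return bool(ASSESSMENT_RE.match(text))


print(is_assessment("That's great"))
print(is_assessment("yeah"))
```

A pattern like this only captures the syntactic frame; it cannot distinguish an assessment from other utterances that happen to share the frame.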
Observations
• Yeah (and variations) ambiguous
– agreement at 36%
– incipient speaker at 59%
– Yes-answer at 56%
• Uh-huh (with its variations):
– a continuer at 45% (vs. yeah at 27%)
• Continuers (compared to agreements) are:
– shorter in duration
– less intonationally ‘marked’
– Preceded by longer pauses
Hypothesis
• Prosodic information may be particularly helpful
in distinguishing DAs with less lexical content
Automatic DA Detection
• Rosset & Lamel ’04: Can we detect DAs
automatically w/ minimal reliance on lexical
content?
– Lexicons are domain-dependent
– ASR output is errorful
• Corpora (3912 utts total)
– Agent/client dialogues in a French bank call
center, in a French web-based stock
exchange customer service center, in an
English bank call center
• DA tags (44) similar to DAMSL
– Conventional (openings, closings)
– Information level (items related to the semantic
content of the task)
– Forward Looking Function:
• statement (e.g. assert, commit, explanation)
• Influence on Hearer (e.g. confirmation, offer, request)
– Backward Looking Function:
• Agreement (e.g. accept, reject)
• Understanding (e.g. backchannel, correction)
– Communicative Status (e.g. self-talk, change-mind)
– NB: each utt could receive a tag for each
class, so utts represented as vectors
• But…only 197 combinations observed
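The vector representation mentioned above can be pictured as one slot per tag class, with the number of observed combinations being the number of distinct vectors. The sketch below is illustrative only: the class `DAVector` and the function `count_combinations` are hypothetical names, the slots follow the five classes listed on this slide, and the example utterance codings are invented (the tag values are taken from the examples on the slide).

```python
from typing import NamedTuple, Optional


class DAVector(NamedTuple):
    """One optional tag per class; None means no tag in that class."""
    conventional: Optional[str] = None
    information_level: Optional[str] = None
    forward: Optional[str] = None
    backward: Optional[str] = None
    communicative_status: Optional[str] = None


def count_combinations(vectors):
    """Number of distinct tag combinations among the coded utterances."""
    return len(set(vectors))


# Hypothetical codings of four utterances.
coded = [
    DAVector(conventional="opening"),
    DAVector(forward="assert"),
    DAVector(backward="backchannel"),
    DAVector(forward="assert"),
]
print(count_combinations(coded))
```

Because tuples are hashable, counting distinct combinations is a one-line set operation; the same idea scales to a full corpus.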
– Method: Memory-based learning (TIMBL)
• Uses all examples for classification
• Useful for sparse data
– Features
• Speaker identity
• First 2 words of each turn
• # utts in turn
• Previously proposed DA tags for utts in turn
– Results
• With true utt boundaries:
– ~83% accuracy on test data from same domain
– ~75% accuracy on test data from different domain
– On automatically identified utt units: 3.3% ins, 6.6% del, 13.5%
sub
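Memory-based learning keeps every training example and labels a new utterance by its similarity to the stored ones. The sketch below shows the general idea as a plain k-nearest-neighbour classifier with an unweighted overlap distance over the feature types listed above. It is not TiMBL's API or the authors' configuration; the function names, the DA labels, and the stored examples are hypothetical.

```python
from collections import Counter


def overlap_distance(a, b):
    """Number of feature positions on which two examples disagree."""
    return sum(x != y for x, y in zip(a, b))


def classify(memory, features, k=1):
    """Majority DA tag among the k stored examples closest to `features`."""
    ranked = sorted(memory, key=lambda ex: overlap_distance(ex[0], features))
    votes = Counter(tag for _, tag in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical memory. Feature order:
# (speaker, first word, second word, # utts in turn, previous DA tag)
MEMORY = [
    (("agent", "good", "morning", 1, "none"), "opening"),
    (("client", "yes", "please", 1, "offer"), "accept"),
    (("client", "mm", "hm", 1, "assert"), "backchannel"),
    (("agent", "can", "you", 2, "opening"), "request"),
]

print(classify(MEMORY, ("client", "mm", "hm", 1, "explanation")))
```

Since nothing is abstracted away at training time, rare feature combinations remain available at classification time, which is the property the slide points to for sparse data.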
• Which DAs are easiest/hardest to detect?
DA           GE.fr    CAP.fr   GE.eng
Resp-to      52.0%    33.0%    55.7%
Backch       75.0%    72.0%    89.2%
Accept       41.7%    26.0%    30.3%
Assert       66.0%    56.3%    50.5%
Expression   89.0%    69.3%    56.2%
Comm-mgt     86.8%    70.7%    59.2%
Task         85.4%    81.4%    78.8%
• Conclusions
– Strong ‘grammar’ of DAs in Spoken Dialogue
systems
– A few initial words perform as well as more
Phonetic, Prosodic, and Lexical Context Cues to
DA Disambiguation
• Hypothesis: Prosodic information may be
important for disambiguating shorter DAs
• Observation: ASR errors suggest it would be
useful to limit the role of lexical content in DA
disambiguation as much as possible …and that
this is feasible
• Experiment:
– Can people distinguish one (short) DA from
another purely from
phonetic/acoustic/prosodic cues?
– Are they better with lexical context?
The Columbia Games Corpus
Collection
• 12 spontaneous task-oriented dyadic
conversations in Standard American English.
• 2 subjects playing a computer game, no eye
contact.
[Figure panels: Describer; Follower]
The Columbia Games Corpus
Affirmative Cue Words
Cue Words
– alright
– gotcha
– huh
– mm-hm
– okay
– right
– uh-huh
– yeah
– yep
– yes
– yup
Word        Count
1. the      4565
2. of       1534
3. okay     1151
4. and       886
5. like      753
…
Functions
– Acknowledgment / Agreement
– Backchannel
– Cue beginning discourse
segment
– Cue ending discourse segment
– Check with the interlocutor
– Stall / Filler
– Back from a task
– Literal modifier
– Pivot beginning
– Pivot ending
Perception Study
Selection of Materials
Functions of ‘okay’:
– Acknowledgment / Agreement
– Backchannel
– Cue beginning discourse segment
Example excerpts:
Speaker 1: yeah um there's like there's some space there's
Speaker 2: okay I think I got it
Speaker 1: but it's gonna be below the onion
Speaker 2: okay
Speaker 1: okay alright I'll try it okay
Speaker 2: okay the owl is blinking
Perception Study
Experiment Design
• 54 instances of ‘okay’ (18 for each function).
• 2 tokens for each ‘okay’:
• Isolated condition: Only the word ‘okay’.
• Contextualized condition: 2 full speaker turns:
– The turn containing the target ‘okay’; and
– The previous turn by the other speaker.
[Figure labels: speakers; okay; contextualized ‘okay’]
Perception Study
Experiment Design
• Two conditions:
– Part 1: 54 isolated tokens
– Part 2: 54 contextualized tokens
• Subjects asked to classify each token of ‘okay’
as:
– Acknowledgment / Agreement, or
– Backchannel, or
– Cue beginning discourse segment.
Perception Study
Definitions Given to the Subjects
• Acknowledgment / Agreement:
– The function of okay that indicates “I believe
what you said” and/or “I agree with what you
say”.
• Backchannel:
– The function of okay in response to another
speaker's utterance that indicates only “I’m still
here” or “I hear you and please continue”.
• Cue beginning discourse segment:
– The function of okay that marks a new
segment of a discourse or a new topic. This
use of okay could be replaced by now.
Perception Study
Subjects and Procedure
• Subjects:
– 20 paid subjects (10 female, 10 male).
– Ages between 20 and 60.
– Native speakers of English.
– No hearing problems.
• GUI on a laboratory workstation with
headphones.
Results
Inter-Subject Agreement
• Kappa measure of agreement with respect to
chance (Fleiss ’71)
                           Isolated    Contextualized
                           Condition   Condition
Overall                    .120        .294
Ack / Agree vs. Other      .089        .227
Backchannel vs. Other      .118        .164
Cue beginning vs. Other    .157        .497
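For readers unfamiliar with the measure, Fleiss' kappa compares the observed agreement among a fixed number of raters with the agreement expected by chance. The sketch below implements the standard formula; the function name `fleiss_kappa` and the small rating table are hypothetical and are not the study's data (which had 20 subjects, 54 tokens, and 3 labels).

```python
def fleiss_kappa(table):
    """Fleiss' kappa for a table where table[i][j] is the number of raters
    who assigned item i to category j. Every item must have the same
    number of raters (at least 2)."""
    n_items = len(table)
    n_raters = sum(table[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in table):
        raise ValueError("each item needs the same number (>= 2) of ratings")

    # Observed agreement: mean over items of the proportion of agreeing rater pairs.
    per_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in table
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement: sum of squared overall category proportions.
    n_categories = len(table[0])
    proportions = [
        sum(row[j] for row in table) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    p_chance = sum(p * p for p in proportions)
    if p_chance == 1.0:
        raise ValueError("kappa is undefined when only one category is used")

    return (p_observed - p_chance) / (1.0 - p_chance)


# Hypothetical ratings: 4 tokens, 5 raters, 3 labels
# (Ack/Agree, Backchannel, Cue beginning).
toy = [[5, 0, 0], [2, 2, 1], [0, 4, 1], [1, 1, 3]]
print(round(fleiss_kappa(toy), 3))
```

The "vs. Other" rows in the table correspond to collapsing the three labels into two before computing the same statistic.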
Results
Cues to Interpretation
• Phonetic transcription of okay:
• Isolated Condition
Strong correlation for realization of initial vowel
 Backchannel
 Ack/Agree, Cue Beginning
• Contextualized Condition
No strong correlations found for phonetic variants.
Results
Cues to Interpretation
Ack / Agree
– Isolated Condition: Shorter /k/
– Contextualized Condition: Shorter latency between turns; shorter pause before okay
Backchannel
– Isolated Condition: Higher final pitch slope; longer 2nd syllable; lower intensity
– Contextualized Condition: Higher final pitch slope; more words by S2 before okay; fewer words by S1 after okay
Cue beginning
– Isolated Condition: Lower final pitch slope; lower overall pitch slope
– Contextualized Condition: Lower final pitch slope; longer latency between turns; more words by S1 after okay
S1 = Utterer of the target ‘okay’. S2 = The other speaker.
Conclusions
• Agreement:
– Availability of context improves inter-subject
agreement.
– Cue beginnings easier to disambiguate than
the other two functions.
• Cues to interpretation:
– Contextual features override word features
– Exception: Final pitch slope of okay in both
conditions.
• Guide to generation…
Summary: Dialogue Act Modeling for SDS
• DA identification
– Looks potentially feasible, even when
transcription is errorful
– Prosodic and lexical cues useful
• DA generation
– Descriptive results may be more useful for
generation than for recognition, ironically
– Choice of DA realization, lexical and prosodic
Next Class
• J&M 22.5
• Hirschberg et al. ’04
• Goldberg et al. ’03
• Krahmer et al. ’01