Dialogue Systems Julia Hirschberg CS 4705 7/15/2016

Today
• Dialogue Systems and Human Conversation
• Turns and Turn-taking
• Speech Acts and Dialogue Acts
• Grounding and Intentional Structure
• Pragmatics
– Presupposition
– Conventional Implicature
– Conversational Implicature
Dialogue System Applications
• Information providing
– 800-BING-411, Google Mobile App, Amtrak’s Julie
• Customer Care
– T-Mobile’s Call Center, AT&T Call Routing
• Training
– Language tutoring: e.g. Carnegie Speech, KTH Ville
– Other research platforms: e.g. ItSpoke at UPitt
• Fun and games….
• Goal: Emulate Human-Human Behavior?
Turn-taking Behavior
• Dialogue is characterized by turn-taking
– How do speakers know what to say and when to say it?
• Conversational partners expect certain patterns of behavior in normal conversation
Pat: You got an A? That’s great!
Chris (unexpected): Yeah, I’m really smart you know.
Chris (expected): Well, I was just lucky I happened to read the chapter on dialogue systems right before the test. Otherwise I never would have squeaked through.
– Deviation is significant: dispreferred utterances
• Children learn turn taking within first 2 years (Stern ’74)
• General individual differences
– Shy people pause longer and speak less and less often (Pilkonis ’77)
– Schizophrenics, neurotics, depressed people less skilled in turn-taking
Cultural Differences in Turn-Taking
• Chinese telephone conversations
– Openings (Zhu ’04)
• Mandarin vs. British
• Identification differences
– British self-report
– Chinese callees ask the caller
– Closings (Sun ’05)
• 39 female-female Mandarin telephone conversations
• Closings initiated through matter-of-fact statement of intention to end conversation
• Verbalized thanking occurs except in mother/daughter closings – not the standard English model
– Finnish business calls (Halmari ’93) vs. American
• Americans get right to the point
• Finns chat
Conversational Analysis (Sacks et al ’74)
• Can we characterize expectations of ‘what to say’ more generally?
• ‘Rules’ of turn-taking
– If, during this turn, the current speaker has selected A as the next speaker, then A must speak next
– If the current speaker does not select the next speaker, any other speaker may take the next turn
– If no one else takes the next turn, the current speaker may take the next turn
• Rules apply at Transition Relevance Places (TRPs), where something allows speaker changes to occur
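The three turn-taking ‘rules’ above can be read as an ordered decision procedure applied at each TRP. The sketch below is only an illustration of that ordering: the function name and its arguments are hypothetical, and it assumes we are already told who was selected and who (if anyone) tried to self-select.

```python
from typing import Optional

def next_speaker(current: str,
                 selected: Optional[str] = None,
                 self_selected: Optional[str] = None) -> str:
    """Apply the three turn-taking rules, in order, at a TRP."""
    # Rule 1: a speaker selected by the current speaker must speak next.
    if selected is not None:
        return selected
    # Rule 2: otherwise any other speaker may take the next turn.
    if self_selected is not None and self_selected != current:
        return self_selected
    # Rule 3: if no one else takes the turn, the current speaker may continue.
    return current
```

For example, `next_speaker("Pat", selected="Chris")` gives Chris the turn, while `next_speaker("Pat")` leaves it with Pat.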
Where Can Speaker Shifts Occur
• Adjacency pairs
– Question/answer
– Greeting/greeting
– Compliment/downplayer
• Dispreferred responses
– Silence
– ‘No’ to a simple request without explanation
– Changing the topic abruptly without transition
– Important for Spoken Dialogue Systems
Diarization: Automatic Speaker Identification/Segmentation
• Segment audio corpora (Broadcast News, meetings, telephone conversations) into speaker segments
– Speaker segmentation
– Speaker identification
– Speech and music
• Speaker segmentation (Diarization)
– Initial segmentation
– Segment clustering based on acoustic features
– State-of-the-art: 8.47% error
• Speaker identification
– Linguistic information to identify speaker types and speaker names (LIMSI ’04)
• Templates (“<name> has this report from <location>”)
• Results: 10.9% error on test set
– But only 10% of segments contain relevant patterns
– Estimate 25% error on broadcast news if segmentation and clustering is done to id all of each speaker’s segments
Turn-taking Behaviors Important for SDS
• System understanding:
– Is the user backchanneling or is she taking the turn (does ‘ok’ mean ‘I agree’ or ‘I’m listening’)?
– Is this a good place for a system backchannel?
• System generation:
– How to signal to the user that the system’s turn is over?
– How to signal to the user that a backchannel might be appropriate?
Types of Behavior
• Smooth Switch: S1 is speaking and S2 speaks and takes and holds the floor
• Hold: S1 is speaking, pauses, and continues to speak
• Backchannel: S1 is speaking and S2 speaks -- to indicate continued attention -- not to take the floor (e.g. mhmm, ok, yeah)
• How do people coordinate these behaviors with their interlocutor?
• Acoustic-prosodic and lexical cues….
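The three definitions above differ only in who speaks next and whether that speaker takes the floor. A minimal sketch of that distinction follows; it assumes hand-annotated input (the hypothetical `takes_floor` flag), whereas a real system would have to estimate this from acoustic-prosodic and lexical cues.

```python
def label_transition(prev_speaker: str, next_speaker: str, takes_floor: bool) -> str:
    """Label what happens after a pause in prev_speaker's speech."""
    # Same speaker resumes after the pause: a hold.
    if next_speaker == prev_speaker:
        return "hold"
    # A different speaker talks and keeps the floor: a smooth switch.
    if takes_floor:
        return "smooth switch"
    # A different speaker talks only to signal attention: a backchannel.
    return "backchannel"
```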
Smooth Switch, Backchannel, and Hold Differences
Speech Act Theory (Austin, Searle)
• Locutionary acts: the act of uttering (semantic meaning)
• Illocutionary acts: the act S intends to convey by the utterance (e.g. request, promise, statement)
• Perlocutionary acts: the effect S intends the utterance to produce on H (e.g. regret, fear, hope)
• Indirect Speech Acts (a type of illocutionary act):
– It’s cold in here.
– Can you tell me the time?
NLP Speech Acts
• Often identified with illocutionary force
• Can be indicated by performative verbs
– E.g. promise, order, ask, beseech, deny, apologize, curse
– NB: Perlocutionary force cannot be indicated this way (*I convince you to vote for me for president)
• Searle’s ’75 taxonomy (assertives, directives, commissives, expressives, declarations) now vastly expanded
Dialogue Acts in SDS
• Roughly correspond to Illocutionary acts
– Motivation: Improving Spoken Dialogue Systems
– Many coding schemes (e.g. DAMSL)
– Many-to-many mapping between DAs and words
• Agreement DA can be realized by Okay, Um, Right, Yeah, …
• But each of these can express multiple DAs, e.g.
S: You should take the 10pm flight.
U: Okay
…that sounds perfect.
…but I’d prefer an earlier flight.
…(I’m listening)
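The many-to-many mapping can be made concrete with two indexes built from the same (word, DA) observations. This is only a sketch: the pairs follow the example above, but the labels "acknowledgment" and "backchannel" for the second and third readings of Okay are assumed names, not tags from any particular coding scheme.

```python
from collections import defaultdict

# (word, dialogue act) observations; DA labels are illustrative assumptions.
PAIRS = [
    ("okay", "agreement"), ("um", "agreement"),
    ("right", "agreement"), ("yeah", "agreement"),
    ("okay", "acknowledgment"),  # "Okay ... but I'd prefer an earlier flight."
    ("okay", "backchannel"),     # "Okay ... (I'm listening)"
]

def build_maps(pairs):
    """Index the observations in both directions."""
    da_to_words = defaultdict(set)
    word_to_das = defaultdict(set)
    for word, da in pairs:
        da_to_words[da].add(word)
        word_to_das[word].add(da)
    return da_to_words, word_to_das

da_to_words, word_to_das = build_maps(PAIRS)
```

Here one DA maps to several words (`da_to_words["agreement"]`) and one word maps to several DAs (`word_to_das["okay"]`), which is why the word alone cannot decide the DA.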
• DA recognition important for
– Turn recognition (which grammar to use when)
– Turn disambiguation, e.g.
S: What city do you want to go to?
U1: Boston. (reply)
U2: Boston? (request for information)
S: Do you want to go to Boston?
U1: Boston. (confirmation)
U2: Boston? (question)
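The four readings of “Boston” above depend on two things: the kind of system prompt that came before and how the user says the word. A small lookup table captures this. The sketch assumes that “.” and “?” stand for falling and rising intonation, and the key names are hypothetical.

```python
# (type of preceding system prompt, user's final contour) -> dialogue act
INTERPRETATION = {
    ("wh-question", "falling"): "reply",
    ("wh-question", "rising"): "request for information",
    ("yes-no-question", "falling"): "confirmation",
    ("yes-no-question", "rising"): "question",
}

def interpret(prompt_type: str, contour: str) -> str:
    """Return the dialogue act for a one-word answer such as 'Boston'."""
    return INTERPRETATION[(prompt_type, contour)]
```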
Automatic DA Detection
• Rosset & Lamel ’04: Can we detect DAs automatically w/ minimal reliance on lexical content?
– Lexicons are domain-dependent
– ASR output is errorful
• Corpora (3912 utts total)
– Agent/client dialogues in a French bank call center, in a French web-based stock exchange customer service center, in an English bank call center
• DA tags (44)
– Conventional (openings, closings)
– Information level (items related to the semantic content of the task)
– Forward Looking Function:
• statement (e.g. assert, commit, explanation)
• influence on Hearer (e.g. confirmation, offer, request)
– Backward Looking Function:
• Agreement (e.g. accept, reject)
• Understanding (e.g. backchannel, correction)
– Communicative Status (e.g. self-talk, change-mind)
– NB: Each utt could receive a tag for each class, so utts represented as vectors
• But…only 197 combinations observed
– Method: Memory-based learning (TIMBL)
• Uses all examples for classification
• Useful for sparse data
– Features
• Speaker identity
• First 2 words of each turn
• # utts in turn
• Previously proposed DA tags for utts in turn
– Results
• With true utt boundaries:
– ~83% accuracy on test data from same domain
– ~75% accuracy on test data from different domain
• On automatically identified utt units: 3.3% ins, 6.6% del, 13.5% sub
• Which DAs are easiest/hardest to detect?

DA           GE.fr    CAP.fr   GE.eng
Resp-to      52.0%    33.0%    55.7%
Backch       75.0%    72.0%    89.2%
Accept       41.7%    26.0%    30.3%
Assert       66.0%    56.3%    50.5%
Expression   89.0%    69.3%    56.2%
Comm-mgt     86.8%    70.7%    59.2%
Task         85.4%    81.4%    78.8%
• Conclusions
– Strong ‘grammar’ of DAs in Spoken Dialogue systems
– A few initial words perform as well as more
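To make the memory-based approach and its feature set concrete, here is a simplified nearest-neighbor sketch. It is not TiMBL and not the authors' implementation: it stores every training example, compares feature tuples with a plain overlap (mismatch-count) distance, and takes a majority vote. The toy memory, the DA labels, and all function names are assumptions made for illustration.

```python
from collections import Counter

def extract_features(speaker, turn_text, n_utts, prev_da):
    """Speaker identity, first 2 words of the turn, # utts, previous DA tag."""
    words = (turn_text.lower().split() + ["<pad>", "<pad>"])[:2]
    return (speaker, words[0], words[1], n_utts, prev_da)

def overlap_distance(a, b):
    """Number of feature positions on which two examples disagree."""
    return sum(x != y for x, y in zip(a, b))

def classify(memory, query, k=1):
    """Majority label among the k stored examples closest to the query."""
    nearest = sorted(memory, key=lambda ex: overlap_distance(ex[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Toy memory of (features, DA label) pairs; entirely made up.
MEMORY = [
    (extract_features("agent", "hello this is the bank", 1, "<start>"), "opening"),
    (extract_features("client", "yes that is right", 1, "request"), "accept"),
    (extract_features("client", "mhm", 1, "assert"), "backchannel"),
    (extract_features("agent", "thank you goodbye", 1, "accept"), "closing"),
]

query = extract_features("client", "yes that works", 1, "request")
predicted = classify(MEMORY, query)
```

Because only the first two words of the turn are used, the query “yes that works” matches the stored “yes that is right” example on every feature, which mirrors the conclusion that a few initial words carry much of the signal.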
Grounding (Stalnaker ’78, Clark & Schaefer ’89)
• Common Ground: the set of propositions mutually believed by S and H
– Principle of Closure: agents performing an action require evidence that they have succeeded – and S needs to know when s/he has succeeded in communicating
– Presentation of utterance by S
– Acceptance of utterance by H
• How does grounding take place in conversation?
Grounding Strategies from Weak to Strong
I need to get your homework by Monday.
• Continued attention
…
• Next contribution
I should be finished Sunday night.
• Acknowledgment
Mhmm…
• Demonstration
You need this soon.
• Display
You need to get my homework Monday.
Discourse Structure and Intention
Welcome to word processing.
That’s using a computer to type letters and reports.
Make a typo?
No problem.
Just back up, type over the mistake, and it’s gone.
And, it eliminates retyping.
Structures of Discourse Structure (Grosz & Sidner ’86)
• Leading alternative to Rhetorical Structure Theory
– Provides for multiple levels of analysis: S’s purpose as well as content of utterances and S and H’s attentional state
– Identifies only a few, general relations that hold among intentions
• Three components:
– Linguistic structure
– Intentional structure
– Attentional structure
Linguistic Structure
• What is actually said/written
• How is this represented?
– Assume discourse is segmented into Discourse Segments (DS) -- how?
• what is basic unit of analysis?
• segmentation agreement
• automatic segmentation
– Embedding relations: topic structure
– Cue phrases
Intentional Structure
• Discourse purpose (DP): basic purpose of the discourse
• Discourse segment purposes (DSPs): how this segment contributes to the overall DP
• Segment relations:
– Satisfaction-precedence: DSP1 must be satisfied before DSP2
– Dominance: DSP1 dominates DSP2 if fulfilling DSP2 constitutes part of fulfilling DSP1
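The two segment relations can be stored as sets of ordered pairs over DSP names. The sketch below is one possible representation, not part of Grosz & Sidner's formulation; the class, its methods, and the trip-booking purposes are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Set, Tuple

@dataclass
class IntentionalStructure:
    # (DSP1, DSP2): fulfilling DSP2 is part of fulfilling DSP1.
    dominates: Set[Tuple[str, str]] = field(default_factory=set)
    # (DSP1, DSP2): DSP1 must be satisfied before DSP2.
    precedes: Set[Tuple[str, str]] = field(default_factory=set)

    def contributors(self, dsp: str):
        """DSPs directly dominated by dsp, in alphabetical order."""
        return sorted(child for parent, child in self.dominates if parent == dsp)

    def can_start(self, dsp: str, satisfied: Set[str]) -> bool:
        """True if every DSP that must precede dsp is already satisfied."""
        return all(first in satisfied
                   for first, later in self.precedes if later == dsp)

# Made-up example: the DP "book trip" dominates two DSPs, one of which
# must be satisfied before the other.
structure = IntentionalStructure(
    dominates={("book trip", "choose flight"), ("book trip", "pay")},
    precedes={("choose flight", "pay")},
)
```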
Attentional State
• Focus stack:
– Stack of focus spaces, each containing objects, properties and relations salient during each DS, plus the DSP (content plus purpose)
– State changes modeled by transition rules controlling the addition/deletion of focus spaces
• Information at lower levels may or may not be available at higher levels
• Focus spaces are
– pushed onto the stack when new DS or embedded DS (e.g. DS that are dominated by other DS) are begun
– popped when they are completed
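The push/pop behavior of the focus stack can be sketched with an ordinary list. The class and method names are hypothetical, and `salient()` makes a simplifying assumption: it treats everything in lower focus spaces as still available, which the model above leaves open.

```python
from dataclasses import dataclass, field
from typing import List, Set

@dataclass
class FocusSpace:
    dsp: str                                        # purpose of this segment
    entities: Set[str] = field(default_factory=set)  # salient objects

class FocusStack:
    def __init__(self) -> None:
        self._spaces: List[FocusSpace] = []

    def push(self, space: FocusSpace) -> None:
        """A new or embedded discourse segment begins."""
        self._spaces.append(space)

    def pop(self) -> FocusSpace:
        """The most recent discourse segment is completed."""
        return self._spaces.pop()

    def salient(self) -> List[str]:
        """Entities from the top space down (assumes lower spaces stay visible)."""
        seen: List[str] = []
        for space in reversed(self._spaces):
            for entity in sorted(space.entities):
                if entity not in seen:
                    seen.append(entity)
        return seen

stack = FocusStack()
stack.push(FocusSpace("book trip", {"flight"}))
stack.push(FocusSpace("pay", {"credit card"}))   # embedded segment begins
during_payment = stack.salient()
stack.pop()                                      # embedded segment completed
after_payment = stack.salient()
```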
Limits of G&S ’86
• Assumes that discourses are task-oriented
• Assumes there is a single, hierarchical structure shared by S and H
• How do we identify entities that are salient (on the focus stack)? Grammatical function?
• Do people really build such structures when they converse? Use them in interpreting what others say?
How are these structures recognized from a discourse?
• Linguistic markers:
– tense and aspect
– cue phrases
– intonational variation
• Inference of S intentions
• Inference from task structure
• Intonational Information
Implicit Information
• Question interpretation in SDS
S: Are you traveling to La Guardia?
U: I’m going to New York.
U: When does the 5 o’clock train leave from Newark?
S (thinks): <U believes there is a 5 o’clock train from Newark.>
S: I heard you say New York City?
U: New York City?
• Cooperative responses in SDS
– Correcting misconceptions
U: When does the 5 o’clock train leave from Newark?
S (thinks): <U believes there is a 5 o’clock train from Newark>
S: There is no 5 o’clock train from Newark; there is a 5:20, though.
– Providing more information than is asked for
U: Do I have the $500 minimum in that account?
S1: Yes.
S2: You have $739.
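Both cooperative behaviors above can be sketched as small response functions. Everything here is hypothetical: the timetable format, the function names, and the choice to offer the first listed departure as the alternative (a real system would pick the closest one).

```python
from typing import Dict, List, Optional

def check_presupposed_train(station: str, time: str,
                            timetable: Dict[str, List[str]]) -> Optional[str]:
    """Return a correction if the presupposed train does not exist, else None."""
    times = timetable.get(station, [])
    if time in times:
        return None  # presupposition holds; answer the question as asked
    if times:
        # Simplification: offer the first listed departure as the alternative.
        return f"There is no {time} train from {station}; there is a {times[0]}, though."
    return f"There are no trains from {station}."

def answer_minimum_question(balance: int, minimum: int) -> str:
    """Answer the yes/no question and volunteer the actual balance."""
    verdict = "Yes" if balance >= minimum else "No"
    return f"{verdict}. You have ${balance}."

correction = check_presupposed_train("Newark", "5:00", {"Newark": ["5:20"]})
reply = answer_minimum_question(739, 500)
```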
Discourse Pragmatics
• Context-dependent meaning, invited inference, intended meaning – vs. “propositional content”
• Indirect Speech Acts
• Presupposition
• Implicature
– Conversational
– Conventional
Presupposition
• What is ‘taken for granted’, given some linguistic expression X
The King of France is bald. (Is there a King of France?)
All of Herman’s children are bright. (Does Herman have children?)
• Linguistic Test: Negative, interrogative, and embedded X preserve the same assumption
The King of France is not bald. Is the King of France bald? I thought that the King of France was bald.
• Presuppositions can be suspended but they cannot be felicitously denied
All of Herman’s children are bright, if he indeed has children.
*All of Herman’s children are bright, though he has no children.
Presupposition and SDS
• Presuppositional information adds facts/beliefs to the dialogue history
– Information to store and check for accuracy
• My wife will also be a driver (S has a spouse)
• My number is 212-555-1212 (S has a telephone account)
• I’ll take the red-eye (S believes there is a red-eye)
• I’m upset about being charged for a call to Ethiopia (S was charged for a call to Ethiopia)
• I’m a bachelor. (S is an unmarried male person)
Conversational Implicature
• H. Paul Grice: Conversation is not formal logic
– and is not ‘∧’, or is not ‘∨’, some is not ‘∃’
– George got married and had a baby.
– Was it a boy or a girl?
– Some people sent baby gifts.
– Principles of Cooperative Conversation (the Cooperative Principle, CP):
Make your conversational contribution such as is required, at the stage at which it occurs, by the accepted purpose or direction of the talk exchange in which you are engaged
Maxims of Cooperative Conversation
• Maxim of Quantity:
– 1. Make your contribution as informative as is required (for the current purposes of the exchange).
– 2. Do not make your contribution more informative than is required.
• Maxim of Quality:
– Try to make your contribution one that is true.
• 1. Do not say what you believe to be false.
• 2. Do not say that for which you lack adequate evidence.
• Maxim of Relation: Be relevant
• Maxim of Manner: Be perspicuous
– 1. Avoid obscurity of expression.
– 2. Avoid ambiguity.
– 3. Be brief (avoid unnecessary prolixity).
– 4. Be orderly.
• Maxims may be
– Observed
John got into Columbia and won a scholarship.
– Violated quietly
I never said that.
– Flouted
He has excellent handwriting….
• Speakers may not be able to observe all maxims simultaneously
• Implicature interpretation requires both S and H to understand the CP and Maxims
– That which S licenses and H infers via the CP and the Maxims
A. I got an A on that exam.
B. And I’m Queen Marie of Rumania.
A. Where did you go?
B. Out.
A: Where does Arnold live?
B: Somewhere in southern California.
Other Implicatures
• Generalized Conversational, e.g. indefinites
A car ran over John’s foot. (not John’s car)
John broke a foot yesterday. (John’s foot)
John broke a nose yesterday. (not his own)
• Conventional
George is short but brave.
George is short; therefore he is brave.
Summary
• Dialogue Systems and Human Conversation
• Turns and Turn-taking
• Speech Acts and Dialogue Acts
• Grounding and Intentional Structure
• Pragmatics
– Presupposition
– Conventional Implicature
– Conversational Implicature
Spoken Language Processing
• These are only a few of the challenges of Spoken Language Processing (CS 4706)
• How does it go beyond CS 4705?
– Speech analysis tools and techniques
• Deception, charisma, emotional speech, medical states
– Speech technologies
• Text-to-Speech
• Automatic Speech Recognition
• Speaker ID
• Language and dialect ID
Project
• Build a Spoken Dialogue System of your own
– Choose the domain and task
– Build a speech recognizer, a text-to-speech synthesis system, and a dialogue manager (from libraries)
– Demo your system and maybe win a prize
Next Class
• Review for the Final Exam