AT&T Labs Research

Intonational Variation in Spoken Dialogue Systems:
Generation and Understanding

Julia Hirschberg
Charles University
March 2001

Talking to a Machine… and Getting an Answer
• Today’s spoken dialogue systems make it possible to accomplish real tasks, over the phone, without talking to a person
  - Real-time speech technology enables real-time interaction
  - Speech recognition and understanding are ‘good enough’ for limited, goal-directed interactions
  - Careful dialogue design can be tailored to the capabilities of the component technologies
    - Limited domain
    - Judicious use of system initiative vs. mixed initiative

Some Representative Spoken Dialogue Systems

[Figure: representative systems arranged on a timeline (1980+, 1990+, 1993+, 1995+, 1997+, 1999+) and along an initiative axis from system initiative through mixed initiative to user initiative, with deployed systems distinguished from research systems: Banking (ANSER), Directory Assistant (BNR), ATIS (DARPA Travel), Train Schedule (ARISE), Brokerage (Schwab-Nuance), E-Mail Access (myTalk), Air Travel (UA Info-SpeechWorks), Communicator (DARPA Travel), MIT Galaxy/Jupiter, Communications (Wildfire, Portico), Customer Care (HMIHY - AT&T), Multimodal Maps (Trains, Quickset)]

But we have a long way to go…

Course Overview
• Spoken Dialogue Systems today
  - Evaluating their strengths and weaknesses
  - Role of intonational variation
• Importance of corpora and conventions for annotating them
• Intonational ‘meanings’
• Prosody in Speech Generation
• Prosody in Speech Recognition/Understanding

Evaluating Dialogue Systems
• PARADISE framework (Walker et al ’00)
• “Performance” of a dialogue system is affected both by what gets accomplished by the user and the dialogue agent and how it gets accomplished:
  - Maximize Task Success
  - Minimize Costs
    - Efficiency Measures
    - Qualitative Measures
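
In the published framework this trade-off is made concrete as a single performance function: a weighted combination of normalized task success and cost measures, with the weights estimated from user satisfaction data. As a sketch, in the notation of the PARADISE papers:

    Performance = \alpha \cdot N(\kappa) - \sum_{i=1}^{n} w_i \cdot N(c_i)

where \kappa measures task success over the attribute-value matrix, the c_i are the cost measures, N is z-score normalization, and \alpha and the w_i are regression weights.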

Task Success
• Task goals seen as an Attribute-Value Matrix (AVM)
  - ELVIS e-mail retrieval task (Walker et al ‘97):
    “Find the time and place of your meeting with Kim.”

      Attribute            Value
      Selection Criterion  Kim or Meeting
      Time                 10:30 a.m.
      Place                2D516

• Task success defined by match between AVM values at end of dialogue with “true” values for AVM
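
As a concrete illustration, task success for the AVM above can be scored as the match between the values a dialogue ends with and the scenario key. A minimal Python sketch (the helper below is hypothetical; PARADISE itself uses the kappa statistic to correct raw agreement for chance):

    # Hypothetical sketch: score task success as agreement between the AVM
    # a dialogue produced and the "true" key AVM for the scenario.
    def task_success(observed: dict, key: dict) -> float:
        """Fraction of attributes whose observed value matches the key."""
        matches = sum(observed.get(attr) == value for attr, value in key.items())
        return matches / len(key)

    key = {"Selection Criterion": "Kim or Meeting",
           "Time": "10:30 a.m.",
           "Place": "2D516"}
    observed = {"Selection Criterion": "Kim or Meeting",
                "Time": "10:30 a.m.",
                "Place": "2D519"}   # wrong room extracted
    print(task_success(observed, key))  # ~0.67: two of three attributes correct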

Metrics
• Efficiency of the Interaction: User Turns, System Turns, Elapsed Time
• Quality of the Interaction: ASR Rejections, Time Out Prompts, Help Requests, Barge-Ins, Mean Recognition Score (concept accuracy), Cancellation Requests
• User Satisfaction
• Task Success: perceived completion, information extracted

Experimental Procedures
• Subjects given specified tasks
• Spoken dialogues recorded
• Cost factors, states, dialogue acts automatically logged; ASR accuracy, barge-in hand-labeled
• Users specify task solution via web page
• Users complete User Satisfaction surveys
• Use multiple linear regression to model User Satisfaction as a function of Task Success and Costs; test for significant predictive factors (a sketch of this step follows below)
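
The regression step, sketched with invented data (statsmodels is one convenient way to get per-factor significance tests; the variable names follow the metrics above):

    # Sketch: regress User Satisfaction on Task Success and cost measures,
    # then inspect which predictors are significant. All data is invented.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    n = 40
    comp = rng.integers(0, 2, n).astype(float)  # perceived task completion (0/1)
    mrs = rng.uniform(0.5, 1.0, n)              # mean recognition score
    et = rng.uniform(60, 600, n)                # elapsed time (seconds)
    user_sat = 2 + 3*comp + 4*mrs - 0.005*et + rng.normal(0, 0.5, n)

    X = sm.add_constant(np.column_stack([comp, mrs, et]))
    fit = sm.OLS(user_sat, X).fit()
    print(fit.summary())  # coefficients and p-values for each factor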

User Satisfaction: Sum of Many Measures
• Was Annie easy to understand in this conversation? (TTS Performance)
• In this conversation, did Annie understand what you said? (ASR Performance)
• In this conversation, was it easy to find the message you wanted? (Task Ease)
• Was the pace of interaction with Annie appropriate in this conversation? (Interaction Pace)
• In this conversation, did you know what you could say at each point of the dialog? (User Expertise)
• How often was Annie sluggish and slow to reply to you in this conversation? (System Response)
• Did Annie work the way you expected her to in this conversation? (Expected Behavior)
• From your current experience with using Annie to get your email, do you think you’d use Annie regularly to access your mail when you are away from your desk? (Future Use)

Performance Functions from Three Systems
• ELVIS: User Sat. = .21*COMP + .47*MRS - .15*ET
• TOOT: User Sat. = .35*COMP + .45*MRS - .14*ET
• ANNIE: User Sat. = .33*COMP + .25*MRS + .33*Help
  - COMP: User perception of task completion (task success)
  - MRS: Mean recognition accuracy (cost)
  - ET: Elapsed time (cost)
  - Help: Help requests (cost)
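
Written out as code, such fitted functions can score new dialogue logs directly. A trivial sketch of the ELVIS function (inputs are assumed to be normalized, as in the original studies):

    # The fitted ELVIS performance function above, as a plain predictor.
    # COMP, MRS, ET are assumed z-score normalized per the PARADISE studies.
    def elvis_user_sat(comp: float, mrs: float, et: float) -> float:
        return 0.21 * comp + 0.47 * mrs - 0.15 * et

    # Completed task, good recognition, shorter-than-average elapsed time:
    print(elvis_user_sat(comp=1.0, mrs=1.2, et=-0.8))  # 0.894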

Performance Model
• Perceived task completion and mean recognition score are consistently significant predictors of User Satisfaction
• Performance model useful for system development
  - Making predictions about system modifications
  - Distinguishing ‘good’ dialogues from ‘bad’ dialogues
• But can we also tell on-line when a dialogue is ‘going wrong’?

Course Overview
• Spoken Dialogue Systems today
  - Evaluating their weaknesses
  - Role of intonational variation
• Importance of corpora and conventions for annotating them
• Intonational ‘meanings’
• Prosody in Speech Generation
• Prosody in Speech Recognition/Understanding

How to Predict Problems ‘On-Line’?
• Evidence of system misconceptions is reflected in user responses (Krahmer et al ‘99, ‘00)
  - Responses to incorrect verifications:
    - contain more words (or are empty)
    - show marked word order (especially after implicit verifications)
    - contain more disconfirmations, more repeated/corrected info
  - ‘No’ after incorrect verifications, compared with ‘no’ after other yes-no questions:
    - has a higher boundary tone
    - wider pitch range
    - longer duration
    - longer pauses before and after
    - more additional words after it
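
Cues like these suggest a simple on-line detector. The sketch below is hypothetical (feature names and thresholds are invented): it flags a user’s ‘no’ as a probable correction when a majority of the prosodic cues are marked relative to the speaker’s baseline.

    # Hypothetical detector built on the cues above: a "no" that is
    # prosodically marked relative to the speaker's baseline likely
    # signals an incorrect verification. Thresholds are illustrative only.
    def looks_like_correction(no_turn: dict, baseline: dict) -> bool:
        cues = [
            no_turn["boundary_f0_hz"] > 1.2 * baseline["boundary_f0_hz"],  # higher boundary tone
            no_turn["pitch_range_hz"] > 1.5 * baseline["pitch_range_hz"],  # wider pitch range
            no_turn["duration_s"] > 1.5 * baseline["duration_s"],          # longer duration
            no_turn["pause_before_s"] > 2.0 * baseline["pause_before_s"],  # longer preceding pause
            no_turn["words_after_no"] >= 3,                                # extra words after "no"
        ]
        return sum(cues) >= 3  # majority of cues present

    baseline = {"boundary_f0_hz": 180, "pitch_range_hz": 60,
                "duration_s": 0.30, "pause_before_s": 0.20}
    no_turn = {"boundary_f0_hz": 260, "pitch_range_hz": 115,
               "duration_s": 0.55, "pause_before_s": 0.55, "words_after_no": 4}
    print(looks_like_correction(no_turn, baseline))  # True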

• User information state is reflected in the response (Shimojima et al ’99, ‘01)
  - Echoic responses repeat prior information, as acknowledgment or as a request for confirmation:
      S1: Then go to Keage station.
      S2: Keage.
  - Experiment:
    - Identify ‘degree of integration’ and prosodic features (boundary tone, pitch range, tempo, initial pause)
    - Perception studies to elicit the ‘integration’ effect
  - Results: fast tempo, little pause, and low pitch signal high integration

Can Prosodic Information Help Identify Dialogue System Problems ‘On-Line’?

Motivation
• Prosody conveys information about:
  - The state of the interaction:
    - Is the user having trouble being understood?
    - Is the user having trouble understanding the system?
  - What the speaker is trying to convey:
    - Is this a statement or a question?
  - The structure of the dialogue:
    - Is the user or the system trying to start a new topic?
  - The emotions of the speaker:
    - Is the speaker getting angry, frustrated?

Past Research Issues and Applications
• How prosodic variation influences ‘meaning’
  - Focus or contrast
  - Given/new
• How prosodic variation is related to other linguistic components
  - Syntax
  - Semantics
• How to model prosodic variation effectively
• Applications: Text-to-Speech

Current Trends
• New description schemes (e.g. ToBI)
• Corpus-based research and machine learning
• Emphasis on evaluation of algorithms and systems (NLE ‘00 special issue)
• Investigation of spontaneous speech phenomena and variation in speaking style
• Applications to CTS (concept-to-speech), ASR (automatic speech recognition), and SDS (spoken dialogue systems)

Course Overview
• Spoken Dialogue Systems today
  - Evaluating their weaknesses
  - Role of intonational variation
• Importance of corpora and conventions for annotating them
• Intonational ‘meanings’
• Prosody in Speech Generation
• Prosody in Speech Recognition/Understanding

Corpora
• Public and semi-public databases
  - ATIS, Switchboard, CallHome (NIST/DARPA/LDC)
  - TRAINS/TRIPS (U. Rochester)
  - FM Radio (BU)
• Private collections
  - Acquired for speech or dialogue research (e.g. August, Gustafson & Bell ’00)
  - Meeting, call center, focus group collections
  - Accidentally collected
• The Web
  - MUD/MOO dialogues

To(nes and)B(reak)I(ndices)
• Developed by prosody researchers in four meetings over 1991-94
• Goals:
  - devise common labeling scheme for Standard American English that is robust and reliable
  - promote collection of large, prosodically labeled, shareable corpora
• ToBI standards also proposed for Japanese, German, Italian, Spanish, British and Australian English, …

• Minimal ToBI transcription:
  - recording of speech
  - f0 contour
  - ToBI tiers:
    - orthographic tier: words
    - break-index tier: degrees of junction (Price et al ‘89)
    - tonal tier: pitch accents, phrase accents, boundary tones (Pierrehumbert ‘80)
    - miscellaneous tier: disfluencies, non-speech sounds, etc.
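
To make the tier structure concrete, here is a minimal, hypothetical Python representation of one annotated utterance (the class and field names are invented; real ToBI labels are stored as time-aligned tiers alongside the waveform, and the example words and times below are illustrative):

    # Hypothetical container for the four ToBI tiers of one utterance.
    # Each entry is (time in seconds, label), aligned to the recording.
    from dataclasses import dataclass, field

    @dataclass
    class ToBITranscription:
        audio_file: str                 # the speech recording (f0 derived from it)
        words: list = field(default_factory=list)   # orthographic tier
        breaks: list = field(default_factory=list)  # break indices 0-4 at word ends
        tones: list = field(default_factory=list)   # accents, phrase accents, boundary tones
        misc: list = field(default_factory=list)    # disfluencies, non-speech sounds

    utt = ToBITranscription(
        audio_file="example.wav",
        words=[(0.45, "Marianna"), (0.62, "made"), (0.70, "the"), (1.35, "marmalade")],
        breaks=[(0.45, 1), (0.62, 1), (0.70, 1), (1.35, 4)],
        tones=[(0.30, "H*"), (1.10, "!H*"), (1.35, "L-L%")],
    )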

Sample ToBI Labeling
[Figure: waveform, f0 contour, and time-aligned ToBI label tiers for a sample utterance]

• Online training material, available at:
  http://www.ling.ohio-state.edu/phonetics/ToBI/
• Evaluation:
  - Good inter-labeler reliability for expert and naive labelers: 88% agreement on presence/absence of tonal category, 81% agreement on category label, 91% agreement on break indices to within 1 level (Silverman et al ‘92, Pitrelli et al ‘94)
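
As a sketch of how such figures are computed, here are simplified pairwise versions of the two kinds of agreement (the cited studies aggregate over many labeler pairs and handle time alignment more carefully):

    # Simplified reliability measures: exact agreement on a categorical
    # tier, and break-index agreement "to within 1 level".
    def agreement(labels_a, labels_b):
        """Fraction of positions where two labelers chose the same label."""
        return sum(a == b for a, b in zip(labels_a, labels_b)) / len(labels_a)

    def break_agreement_within_1(breaks_a, breaks_b):
        """Fraction of boundaries whose break indices differ by at most 1."""
        return sum(abs(a - b) <= 1 for a, b in zip(breaks_a, breaks_b)) / len(breaks_a)

    print(agreement(["H*", "L*", "H*", "L-L%"], ["H*", "L+H*", "H*", "L-L%"]))  # 0.75
    print(break_agreement_within_1([1, 3, 4, 1], [1, 4, 4, 0]))                 # 1.0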

Course Overview
• Spoken Dialogue Systems today
  - Evaluating their weaknesses
  - Role of intonational variation
• Importance of corpora and conventions for annotating them
• Intonational ‘meanings’
• Prosody in Speech Generation
• Prosody in Speech Recognition/Understanding

Pitch Accent/Prominence in ToBI
• Which items are made intonationally prominent, and how?
• Accent type:
  - H*     simple high (declarative)
  - L*     simple low (ynq)
  - L*+H   scooped, late rise (uncertainty/incredulity)
  - L+H*   early rise to stress (contrastive focus)
  - H+!H*  fall onto stress (implied familiarity)
• Downstepped accents: !H*, L+!H*, L*+!H
• Degree of prominence:
  - within a phrase: HiF0
  - across phrases
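
For illustration, the same inventory as a lookup table, the kind of thing annotation or synthesis tooling might carry (the glosses are the slides’ informal characterizations, not definitive meanings):

    # The ToBI accent inventory above, with the slides' informal glosses.
    PITCH_ACCENTS = {
        "H*":    "simple high (typical of declaratives)",
        "L*":    "simple low (typical of yes-no questions)",
        "L*+H":  "scooped, late rise (uncertainty/incredulity)",
        "L+H*":  "early rise to stress (contrastive focus)",
        "H+!H*": "fall onto stress (implied familiarity)",
        "!H*":   "downstepped high",
        "L+!H*": "downstepped variant of L+H*",
        "L*+!H": "downstepped variant of L*+H",
    }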

Functions of Pitch Accent
• Given/new information
  - S: Do you need a return ticket?
  - U: No, thanks, I don’t need a return.
• Contrast (narrow focus)
  - U: No, thanks, I don’t need a RETURN… (I need a time schedule, receipt, …)
• Disambiguation of discourse markers
  - S: Now let me get you the train information.
  - U: Okay (thanks) vs. Okay… (but I really want…)

Prosodic Phrasing in ToBI
• ‘Levels’ of phrasing:
  - intermediate phrase: one or more pitch accents plus a phrase accent (H- or L-)
  - intonational phrase: one or more intermediate phrases + a boundary tone (H% or L%)
• ToBI break-index tier:
  - 0  no word boundary
  - 1  word boundary
  - 2  strong juncture with no tonal markings
  - 3  intermediate phrase boundary
  - 4  intonational phrase boundary

Functions of Phrasing
• Disambiguates syntactic constructions, e.g. PP attachment:
  - S: You should buy the ticket with the discount coupon.
• Disambiguates scope ambiguities, e.g. negation:
  - S: You aren’t booked through Rome because of the fare.
• Or modifier scope:
  - S: This fare is restricted to retired politicians and civil servants.

Contours: Accent + Phrasing
• What do intonational contours ‘mean’ (Ladd ‘80, Bolinger ‘89)?
  - Speech acts (statements, questions, requests)
      S: That’ll be credit card? (L* H-H%)
  - Propositional attitude (uncertainty, incredulity)
      S: You’d like an evening flight. (L*+H L-H%)
  - Speaker affect (anger, happiness, love)
      U: I said four SEVEN one! (L+H* L-L%)
  - “Personality”
      S: Welcome to the Sunshine Travel System.

Pitch Range and Timing
• Level of speaker engagement
  - S: Welcome to InfoTravel. How may I help you?
• Contour interpretation
  - S: You can take the L*+H bus from Malpensa to Rome L-H%.
  - U: Take the bus. vs. Take the bus!
• Discourse/topic structure

Can systems make use of this information?
Can they produce it??
Can they recognize it??