SDS Future Julia Hirschberg LSA07 353 7/15/2016

Today
• Whither Spoken Dialogue Systems?
• Technology issues
• Human factors issues
• Taking automated dialogue to the next level
– Modeling users’ emotional state
– Entrainment/adaptation/… to users
– System personality
– Cultural sensitivity
Technology Issues
• Better ASR
– Fast and accurate
– Better rejection capabilities
– Trained on real dialogue phenomena (hyper-articulation,
self-repairs and other disfluencies) and broader subject pool
• More sophisticated semantic representations
• Automated call routing
• Tools to automate creation of new systems
– Recognizers
• More accurate
• Easier to train
– Dialogue flow schemes
– TTS voices
Human Factors Issues
• Improved modeling of human/human interaction
– Better model of turn-taking (e.g. backchanneling
behavior, timing issues)
– Incorporation of dialogue act disambiguation
findings
– Error detection/correction capabilities
• Support for mixed initiative
– Moving away from the voice menu model
• Better ways to evaluate system performance
• Adaptation/customization for frequent/power users
Today
• Whither Spoken Dialogue Systems?
• Technology issues
• Human factors issues
• Taking automated dialogue to the next level
– Modeling users’ emotional state
– Entrainment/adaptation/… to users
– System personality
– Cultural sensitivity
Emotion and Speaker State
• A speaker’s emotional state represents important and
useful information
– To recognize
• Anger/frustration in call center SDS
• Confidence/uncertainty in a tutoring domain
– To generate (e.g. any emotion for games)
– Prosodic information has proven quite useful in
detecting different emotions automatically
Studies of Emotional Speech in Human/Human Corpora
• Anger/frustration
– Travel scenarios (Batliner et al (2003), Ang et
al (2002))
– Call Centers (Liscombe et al (2005), Vidrascu
& Devillers (2005), Lee & Narayanan (2005))
• Other emotions
– Meetings (Wrede & Shriberg (2003))
– Unconstrained (Roach (2000), Cowie et al
(2001), Campbell (2003),…)
Issues in Emotional Speech Studies
• Data debate:
– Acted speech vs. natural (hand labeled) corpora
• Classification tasks:
– Distinguish specific ‘classic’ emotions
– Distinguish negative emotions
– Distinguish valence, activation
• Representations of prosodic features
– Direct modeling via acoustic correlates
– A symbolic representation (e.g. ToBI)
Acted Speech: LDC Emotional Speech Corpus
Emotion categories: happy, sad, angry, confident, frustrated, friendly, interested, anxious, bored, encouraging
Can We Distinguish Classic Emotions in
Acted Speech?
• User study to classify tokens from LDC corpus
• 10 emotions:
– Positive: confident, encouraging, friendly,
happy, interested
– Negative: angry, anxious, bored, frustrated, sad
– Chosen from most convincing
• Machine learning classification of tokens by
majority label (binary classification)
– (Liscombe, Hirschberg & Venditti, 2003)
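As a rough illustration of classifying tokens by majority label, the sketch below turns several listeners' judgments into one binary label per token and computes the majority-class baseline. The yes/no rating format, the token names, and the values are assumptions made for illustration; the slide does not say how the ratings were collected or combined.

```python
from collections import Counter

def majority_label(ratings):
    """Return True if more raters judged the emotion present than absent."""
    counts = Counter(ratings)
    return counts[True] > counts[False]

# Hypothetical listener judgments: did each rater perceive "angry" in the token?
judgments = {
    "token_01": [True, True, False],
    "token_02": [False, False, True],
    "token_03": [True, True, True],
}

# One binary label per token (emotion present vs. not), decided by majority vote.
binary_labels = {tok: majority_label(r) for tok, r in judgments.items()}

# Majority-class baseline: accuracy of always guessing the most frequent label.
label_counts = Counter(binary_labels.values())
baseline = max(label_counts.values()) / len(binary_labels)
```

A classifier trained on such labels is only useful to the extent that it beats this baseline.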
What Features Are Useful in Emotion
Classification?
• Features:
– Automatically extracted pitch, intensity, rate
– Spectral tilt from hand-segmented vowels
– Hand-labeled ToBI contours
• Results:
– Direct modeling of acoustic/prosodic features
• 62% average baseline
• 75% average accuracy
• Acoustic-prosodic features identify activation
– Higher-level ToBI features distinguish valence
• H-L% correlated with negative emotions
• L-L% with positive emotions
Accuracy Distinguishing One Emotion from the
Rest: Direct Modeling
Emotion       Baseline   Accuracy
angry         69.32%     77.27%
confident     75.00%     75.00%
happy         57.39%     80.11%
interested    69.89%     74.43%
encouraging   52.27%     72.73%
sad           61.93%     80.11%
anxious       55.68%     71.59%
bored         66.48%     78.98%
friendly      59.09%     73.86%
frustrated    59.09%     73.86%
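The per-emotion figures can be summarized with a few lines of arithmetic. The sketch below uses only the baseline and accuracy values reported in the table; the variable names are illustrative.

```python
# (baseline %, accuracy %) per emotion, copied from the table above.
results = {
    "angry": (69.32, 77.27),
    "confident": (75.00, 75.00),
    "happy": (57.39, 80.11),
    "interested": (69.89, 74.43),
    "encouraging": (52.27, 72.73),
    "sad": (61.93, 80.11),
    "anxious": (55.68, 71.59),
    "bored": (66.48, 78.98),
    "friendly": (59.09, 73.86),
    "frustrated": (59.09, 73.86),
}

# Absolute gain over the baseline, in percentage points.
gains = {emo: round(acc - base, 2) for emo, (base, acc) in results.items()}

mean_baseline = sum(base for base, _ in results.values()) / len(results)
mean_accuracy = sum(acc for _, acc in results.values()) / len(results)

# Emotion with the largest gain over its baseline.
best_emotion = max(gains, key=gains.get)
```

The two means come out close to the roughly 62% baseline and 75% accuracy quoted on the previous slide.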
Different Valence/Different Activation
Different Valence/Same Activation
Can We Identify Emotions in Natural Speech?
• AT&T’s “How May I Help You?” system
• Liscombe, Guicciardi, Tur & Gokken-Tur ‘05
• Customers are sometimes angry, frustrated
• Data:
– 5690 operator/caller dialogues with 20,013 caller turns
– Labeled for degrees of anger, frustration, negativity and
collapsed to positive vs. frustrated vs. angry
HMIHY Example
Very Frustrated
Somewhat Frustrated
Features
• Automatic acoustic-prosodic
• Lexical
• Pragmatic (labeled DAs)
• Contextual (all above features for preceding 1 or 2 turns)
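One simple way to build the contextual features is to copy each turn's own features and append those of the one or two preceding turns. The sketch below assumes each turn is a dictionary of feature values; the feature names and values are hypothetical and not taken from the HMIHY study.

```python
def add_context(turn_features, n_prev=2):
    """Append the features of the preceding n_prev turns to each turn's features."""
    contextual = []
    for i, feats in enumerate(turn_features):
        row = dict(feats)
        for k in range(1, n_prev + 1):
            # Turns at the start of a dialogue have no (or fewer) predecessors.
            prev = turn_features[i - k] if i - k >= 0 else {}
            for name in feats:
                row[f"prev{k}_{name}"] = prev.get(name)  # None when no such turn
        contextual.append(row)
    return contextual

# Hypothetical caller turns with one prosodic, one lexical, and one pragmatic feature.
turns = [
    {"median_pitch": 0.1, "has_negation": False, "dialogue_act": "request"},
    {"median_pitch": 0.9, "has_negation": True, "dialogue_act": "reject"},
    {"median_pitch": 1.6, "has_negation": True, "dialogue_act": "repeat"},
]

rows = add_context(turns, n_prev=2)
```

Each row then carries the current turn plus its recent history, so a classifier can pick up trends such as pitch rising over consecutive turns.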
Direct Modeling of Prosody Features in Context
[Figure: z-scores (axis from -2 to 2) of median pitch, mean energy, and speaking rate per utterance, for an utterance sequence labeled Positive, Positive, Positive]
Direct Modeling of Prosodic Features in
Context
[Figure: z-scores (axis from -2 to 2) of median pitch, mean energy, and speaking rate per utterance, for an utterance sequence labeled Positive, Frustrated, Angry]
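The charts plot prosodic features as z-scores, which express each value as a number of standard deviations from a mean. The sketch below normalizes against the values within one dialogue; that choice of reference set and the pitch values are assumptions, since the slides do not state what the features were normalized against.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Convert raw feature values to z-scores relative to their own mean and spread."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values identical: no variation to normalize.
        return [0.0] * len(values)
    return [(v - mu) / sigma for v in values]

# Hypothetical median pitch (Hz) for three consecutive caller turns.
median_pitch = [180.0, 200.0, 220.0]
pitch_z = z_scores(median_pitch)
```

Normalizing this way makes values comparable across callers with different natural pitch ranges, energy levels, and speaking rates.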
Results
Feature Set      Accuracy   Rel. Improv. over Baseline
Majority Class   73.1%      -----
pros+lex         76.1%      -----
pros+lex+da      77.0%      1.2%
all              79.0%      3.8%
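The relative improvement column can be reproduced with a short calculation. The sketch below assumes the baseline for that column is the pros+lex row (76.1%), which is the reading that matches the reported 1.2% and 3.8%.

```python
def relative_improvement(accuracy, baseline):
    """Relative improvement of accuracy over baseline, as a percentage."""
    return (accuracy - baseline) / baseline * 100

PROS_LEX = 76.1  # assumed baseline row for the relative-improvement column

pros_lex_da = round(relative_improvement(77.0, PROS_LEX), 1)
all_features = round(relative_improvement(79.0, PROS_LEX), 1)
```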
Implications for SDS
• SDS should be able to take advantage of current
imperfect emotion prediction capabilities
– Even if you miss some angry people….
– E.g. Some call center software monitors
conversations for regions of high intensity
Today
• Whither Spoken Dialogue Systems?
• Technology issues
• Human factors issues
• Taking automated dialogue to the next level
– Modeling users’ emotional state
– Entrainment/adaptation/… to users
– System personality
– Cultural sensitivity
Entrainment/Adaptation/Accommodation/Alignment
• Hypothesis: over time, people tend to adapt their
communicative behavior to that of their
conversational partner
• Issues
– What are the dimensions of entrainment?
– How rapidly do people adapt?
– Does entrainment occur (on the human side) in
human/computer conversations?
Varieties of Entrainment…
• Lexical: speaker (S) and hearer (H) tend over time to adopt the same
method of referring to items in a discourse
A: It’s that thing that looks like a harpsichord.
B: So the harpsichord-looking thing…
....
B: The harpsichord…
• Phonological
– Word pronunciation: voiced/voiceless /t/ in better
• Acoustic/Prosodic
– Speaking rate, pitch range, choice of contour
• Discourse/dialogue/social
– Marking of topic shift, turn-taking
The Vocabulary Problem
• Furnas et al ’87: the probability that 2 subjects will produce
the same name for a command for common computer
operations varied from .07 to .18
– Remove a file: remove, delete, erase, kill, omit, destroy,
lose, change, trash
– With 20 synonyms for a single command, the likelihood
that 2 people would choose the same one was 80%
– With 25 commands, the likelihood that 2 people who
chose the same term thought it meant the same command
was 15%
• How can people possibly communicate?
– They collaborate on choice of referring expressions
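One simple way to think about the chance that two people produce the same name is to treat each person's choice as an independent draw from a distribution over names; the agreement probability is then the sum of squared name probabilities. This is an illustrative model, not necessarily the measure Furnas et al used, and the skewed distribution below is hypothetical.

```python
def agreement_probability(name_probs):
    """Probability that two independent speakers pick the same name."""
    return sum(p * p for p in name_probs.values())

# The nine names for "remove a file" listed on the slide, assumed equally likely.
names = ["remove", "delete", "erase", "kill", "omit",
         "destroy", "lose", "change", "trash"]
uniform = {name: 1 / len(names) for name in names}

# A hypothetical skewed distribution in which one name dominates.
skewed = {"delete": 0.6, "remove": 0.2, "erase": 0.1, "trash": 0.1}

p_uniform = agreement_probability(uniform)
p_skewed = agreement_probability(skewed)
```

Agreement rises as usage concentrates on fewer names, which is the effect that entrainment between conversational partners produces.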
Early Studies of Priming Effects
• Hypothesis: Users will tend to use the vocabulary and
syntax the system uses
– Evidence from data collections in the field
• Systems should take advantage of this proclivity to
prime users to speak in ways that the system can
recognize well
User Responses to Vaxholm
The answers to the question:
“What weekday do you want to go?”
(Vilken veckodag vill du åka?)
• 22% Friday (fredag)
• 11% I want to go on Friday (jag vill åka på fredag)
• 11% I want to go today (jag vill åka idag)
• 7% on Friday (på fredag)
• 6% I want to go a Friday (jag vill åka en fredag)
• - are there any hotels in Vaxholm? (finns det några hotell i Vaxholm)
Verb Priming: How often do you go abroad on holiday?
Prompt with åka: Hur ofta åker du utomlands på semestern?
jag åker en gång om året kanske
jag åker ganska sällan utomlands på semester
jag åker nästan alltid utomlands under min semester
jag åker ungefär 2 gånger per år utomlands på semester
jag åker utomlands nästan varje år
jag åker utomlands på semestern varje år
jag åker utomlands ungefär en gång om året
jag är nästan aldrig utomlands
en eller två gånger om året
en gång per semester
kanske en gång per år
ungefär en gång per år
åtminståne en gång om året
nästan aldrig

Prompt with resa: Hur ofta reser du utomlands på semestern?
jag reser en gång om året utomlands
jag reser inte ofta utomlands på semester det blir mera i arbetet
jag reser reser utomlands på semestern vartannat år
jag reser utomlands en gång per semester
jag reser utomlands på semester ungefär en gång per år
jag brukar resa utomlands på semestern åtminståne en gång i året
en gång per år kanske
en gång vart annat år
varje år
vart tredje år ungefär
nu för tiden inte så ofta
varje år brukar jag åka utomlands
Results
Response types (pie chart):
• reuse: 52%
• other: 24%
• ellipsis: 18%
• no reuse: 4%
• no answer: 2%
Lexical Entrainment in Referring Expressions
• Choice of Referring Expressions: Informativeness vs.
availability (basic level or not) vs. saliency vs. recency
• Gricean prediction
– People use descriptions that minimally but effectively
distinguish among items in the discourse
• Garrod & Anderson ’87 Output/Input Principle
– Conversational partners formulate their current utterance
according to the model used to interpret their partner’s
most recent utterance
• Clark, Brennan, et al’s Conceptual Pacts
– People make Conceptual Pacts wrt appropriate referring
expressions made with particular conversational partners
– They are loath to abandon these even when shorter
expressions are possible
Entrainment in Spontaneous Speech
S13: the orange M&M looking kind of scared and then a one on
the bottom left and a nine on the bottom right
S12: alright I have the exact same thing I just had it's an M&M
looking scared that's orange
S13: yeah the scared M&M guy yeah
S12: framed mirror and the scared M&M on the lower right
S13: and it's to the right of the scared M&M guy
S13: yeah and the iron should be on the same line as the
frightened M&M kind of like an L
S12: to the left of the scared M&M to the right of the onion and
above the iron
Extraterrestrial vs. Alien I
s11: okay in the middle of the card I have an
extraterrestrial figure
s11: okay middle of the card I have the extraterrestrial
…
s10: I've got the blue lion with the extraterrestrial on
the lower right
s11: okay I have the extraterrestrial now and then I
have the eye at the bottom right corner
s10: my extraterrestrial's gone
Extraterrestrial vs. Alien II
S03: okay I have a blue lion and then the extraterrestrial at the
lower right corner
S11: mm I'll pass I have the alien with an eye in the lower right
corner
S03: um I have just the alien so I guess I'll match that
-----------------------------------------------------------------------------
S10: yes now I've got that extraterrestrial with the yellow lion and
the money
…
S12: oh now I have the blue lion in the center with our little alien
buddy in the right hand corner
S10: with the alien buddy so I'm gonna match him with the single
blue lion okay I've got our alien with the eye in the corner
Timing and Voice Quality
• Guitar & Marchinkoski ’01:
– How early do we start to adapt to others’ speech?
– Do children adapt their speaking rate to their mother’s
speech?
• Study:
– 6 mothers spoke with their own (normally speaking) 3-yr-olds (3M, 3F)
– Mothers’ rates significantly reduced (B) or not (A) in A-B-A-B design
• Results:
– 5/6 children reduced their rates when their mothers spoke
more slowly
• Sherblom & La Riviere ’87: How are speech timing and voice
quality affected by a non-familiar conversational partner?
• Study:
– 65 pairs of undergraduates asked to discuss a ‘problem
situation’ together
– Utter a single sentence before and after the conversation
– Sentences compared for speaking rate, utterance length and
vocal jitter
• Results:
– Substantial influence of partner on all 3 measures
– Interpersonal uncertainty and differences in arousal
influenced degree of adaptation
Amplitude and Response Latency
• Coulston et al ’02:
– Do humans adapt to the behavior of non-human partners?
– Do children speak more loudly to a loud animated
character?
• Study:
– 24 children aged 7-10 interacted with an extroverted, loud
animated character and with an introverted, soft character
(TTS voices)
– Multiple tasks using different amplitude ranges
– Human/TTS amplitudes and latencies compared
• Results:
– 79-94% of children adapted their amplitude, bidirectionally
– Also adapted their response latencies (mean 18.4%),
bidirectionally
Social Status and Entrainment
• Azuma ’97: Do speakers adapt to the style of other social
classes?
• Study: Emperor Hirohito visits the countryside
– Corpus-based study of speech style of Japanese Emperor
Hirohito during chihoo jyunkoo (‘visits to countryside’),
1946-54
– Published transcripts of speeches
– Findings:
• Emperor Hirohito converged his speech style to that of listeners
lower in social status
– Choice of verb-forms, pronouns no longer those of person with
highest authority
– Perceived as like those of a (low-status) mother
Socio-Cultural Influences and Entrainment
• Co-teachers adapt teaching styles (Roth ’05)
– Social context
• High school in NE with predominantly African-American student body
• Cristobal: Cuban-African-American teacher
• Chris: new Italian-American teacher
– Adaptation of Chris to Cristobal
• Catch phrases (e.g. right!, really really hot) and their
production: pitch and intensity contours
• Pitch ‘matching’ across speakers
– Mimesis vs entrainment
Conclusions for SDS
• Systems can make use of user tendency to entrain to
system vocabulary
• Should systems also entrain to their users?
– CMU’s Let’s Go system adapts confirmation
prompts to non-native speech, finding the closest
match to user input in its own vocabulary
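Finding the closest match to user input in the system's own vocabulary can be sketched with approximate string matching. The example below uses Python's standard `difflib.get_close_matches`; it is not the Let's Go implementation, and the vocabulary, cutoff, and prompt wording are assumptions chosen for illustration.

```python
import difflib

# Hypothetical system vocabulary of place names the recognizer handles well.
SYSTEM_VOCABULARY = ["Forbes Avenue", "Fifth Avenue", "Murray Avenue", "downtown"]

def closest_system_term(user_input, vocabulary=SYSTEM_VOCABULARY, cutoff=0.6):
    """Return the vocabulary item most similar to the user's input, or None."""
    matches = difflib.get_close_matches(user_input, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None

def confirmation_prompt(user_input):
    """Build a confirmation prompt phrased in the system's own vocabulary."""
    term = closest_system_term(user_input)
    if term is None:
        return "Sorry, I did not understand. Could you repeat that?"
    return f"Did you say {term}?"

prompt = confirmation_prompt("Forbs Avenu")
```

Confirming with the system's own term both checks the recognition result and primes the user toward wording the system recognizes well.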
Today
• Whither Spoken Dialogue Systems?
• Technology issues
• Human factors issues
• Taking automated dialogue to the next level
– Modeling users’ emotional state
– Entrainment/adaptation/… to users
– System personality
– Cultural sensitivity
Personality and Computer Systems
• Early PC-era reports that significant others were jealous of the
time their partners spent with their computers.
• Reeves & Nass, The Media Equation: How People Treat
Computers, Television, and New Media Like Real People and
Places, 1996
– Evolution explains the anthropomorphization of the PC
• Humans evolved over millions of years without media
• Proper response to any stimulus was critical to survival
• Human psychology and physiological responses well developed
before media invented
• Ergo, our bodies and minds react to media, immediately and
fundamentally, as if they were real
People See ‘Personality’ Everywhere
• Humans assess personality of another (human or otherwise) quickly, with
minimal clues
• Perceived computer personality strongly affects how we evaluate the
computer and information it provides
• Experiments:
– Created “dominant” and “submissive” computer interfaces and asked
subjects to use them to solve hypothetical problems
– Max (dominant) used assertive language, showed higher confidence in
the information displayed (via a numeric scale), always presented its
own analysis of the problem first
– Linus (submissive) phrased information more tentatively, rated its own
information at lower confidence levels, and allowed human to discuss
problem first
– Each used alternately by people whose personalities previously
identified as being either dominant or submissive
User Reactions
• Users described Max and Linus in human terms: aggressive,
assertive, authoritative vs. shy, timid, submissive
– Users correctly identified machines more like themselves
– Users rated machines more like themselves as better
computers even though the content received was exactly the same.
– Users rated their own performance better when machine’s
personality matched theirs
• People more frank when rating a computer if questionnaire
presented on another machine
• Subjects thought highly of computers that praised them, even
if praise clearly undeserved
Personality in SDS
• Mairesse & Walker ’07 PERSONAGE
(PERSONAlity GEnerator)
– ‘Big 5’ personality trait model: extroversion,
neuroticism, agreeableness, conscientiousness,
openness to experience
– Attempts to generate “extroverted” language based
on traits associated with extroversion in
psychology literature
– Demo: find your personality type
Conclusions for SDS
• Systems can be designed to convey different
personalities
• Can they recognize users’ personalities and entrain to
them?
• Should they?
Goodbye!
• Final Paper