Emotional Speech
Who cares?
The Idea of Emotion
Difficulties in approaching
Describing Emotion
Computational Models
Modeling Emotion in Speech
An example – Ang ’02
Who Cares?
Practical impact
 Detecting Frustration/Anger
 Stress/Distress
 Help call prioritizing
 Tutorials – Boredom/Confusion/Frustration
Pacing/Positive feedback
 User acceptance
Users preferred talking head using ES (Stallo, in Schröder)
Who Cares?
Esoteric Impact
 Is artificial intelligence possible w/o detection of,
 or w/o display of, “emotion”?
Do we experience someone/something as
understanding us if it can’t understand our
emotional state/experience?
Who Cares? – Izard ’77
Emotion & Perception
E & Cognition
E & Action
E & Personality Development
Understanding a speaker’s emotional state
gives us insight into his/her intention, desire,
motivation (Zimring)
The Bad News (Picard ’97)
Maintaining realistic expectations
User’s confidence in information
Potential to forge affective channels
Problem solving vs. empathic/observational
Symmetry of communication
Privacy issues
Idea of Emotion (Hergenhahn ’01)
 “Passions”
Understood emotions as originating from both
physiological and cognitive sources
Pineal gland
Late 1800’s – early 1900’s
 Psychology was study of consciousness
William James: “The Science of Mental Life”
Major method was introspection – mental
Relies on a person reporting his/her experience
Idea of Emotion
1930’s – 1950’s
 Behaviorist tradition – study of behavior
“Objective” (at least measurable and observable)
Emerged from academia – a lot of rats suffered
Explains everything in terms of stimulus / response
Fails to explain some crucial issues, e.g., language
Idea of Emotion
1950’s – Cognitive “Revolution”
 Piaget, Miller, Chomsky, et al.
Miller: The Science of Mental Life
 John Searle
Syntax vs. semantics
 Materialism vs. Dualism
 What are reasonable expectations?
“No one expects to get wet in a pool
filled with ping pong ball models
of water molecules.”
– Searle ’90
Difficulties in approaching (Cowie)
E is resistant to capture in symbols
Speech presents special problems
Modeling of primary E’s not so useful
Display Rules (Ekman)
Mixes: “Love/hate relationship”
Negative response to simulated displays
“[Utterances were] said by two actors in the
emotions of happiness, sadness, anger […]”
Difficulties in approaching
Quality of reference data
Rating believability (Schröder)
 Forced choice tests often ignore issue of
 “How appropriate was utterance to given E”
(Rank ’98)
 (Iida, et al.) Rated using scales for preference and for
subjective degree of expressed E.
Subject generosity
Temporal and contextual relationships
Describing Emotion
“Everything it is possible to analyze depends
on a clear method of distinguishing the
similar from the dissimilar.”
– Carl Linnaeus
Describing Emotion (Cowie)
Primary emotions
 Acceptance, anger, anticipation, disgust, joy,
fear, sadness, surprise
Secondary Emotions 
An aside: Intention may generate all of these
activity decisiveness haughtiness restrained adoration delighted
helplessness restraint alarm dependence hope righteousness alertness
depression humiliation rigor anger desire indifference routine
animosity despair inferiority sadness annoyance dimness initiative
satisfaction anxiety disappointment intensity satisfied appetite disgust
scorn artificiality
disregard involvement serenity astonishment disrespect joy servility
at ease distress leniency shame attraction droopy loneliness
sharpness balanced embarrassment longing shyness belonging
embitterment love simplicity bitterness enjoyment meditative sincerity
bliss envy mirth sleepy restlessness blur exaggeration misery
slumber boldness excitement sorrow boredom fatigue naturalness
stability calmness fear nervousness stubbornness caution firmness
pain suffering clearness frankness panic superiority compassion
suspiciousness concern frustration pity sympathy conciliated gaiety
pleasure tenderness confidence generosity posing tension constraint
gloom pride tolerance hate confusion grateful quiescence tranquility
contempt greediness regret uneasiness contentment grievance relaxed
unstable courage guilt relief vigilance yearning craving happiness
repulsion weakness criticism haste respect worry curiosity
“…emotion is a fact upon which all introspection
agrees. [Most emotional states] are states which
we have experienced personally.”
(Gellhorn & Loofbourrow ’63)
Data of Emotion (Lang ’87)
Everyone generally agrees on existence
Basic datum is a state of feeling
 Completely private
Include understanding of antecedents and
Important to determine how E is represented
in memory
Suggest a Turing test (but don’t describe…)
Describing Emotion
One approach:
continuous dim. model (Cowie/Lang)
Activation – evaluation space
Add control
Curse of dimensionality
Primary E’s differ on at least 2 dimensions of
this scale (Pereira)
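The dimensional idea above can be sketched in a few lines of Python. This is only an illustration: the coordinates, the `EMOTIONS` table, and the helper names are hypothetical and are not taken from Cowie, Lang, or Pereira. It places a few emotion labels in a two-dimensional activation-evaluation space, looks up the nearest label for a point, and shows how adding a third "control" dimension can separate labels that sit close together in two dimensions.

```python
from math import dist

# Hypothetical (activation, evaluation) coordinates in [-1, 1].
# Illustrative placements only, not values from the cited work.
EMOTIONS = {
    "anger":   (0.8, -0.6),
    "fear":    (0.7, -0.7),
    "joy":     (0.6, 0.8),
    "sadness": (-0.6, -0.6),
    "boredom": (-0.7, -0.3),
}

# Hypothetical values on the added "control" dimension.
CONTROL = {"anger": 0.7, "fear": -0.7}


def nearest_emotion(activation, evaluation):
    """Return the label whose 2-D point lies closest to the given point."""
    point = (activation, evaluation)
    return min(EMOTIONS, key=lambda name: dist(EMOTIONS[name], point))


def with_control(name):
    """Extend a label's 2-D point with its control value (3-D point)."""
    return (*EMOTIONS[name], CONTROL[name])


# Anger and fear are near neighbours in 2-D but far apart once control is added.
gap_2d = dist(EMOTIONS["anger"], EMOTIONS["fear"])
gap_3d = dist(with_control("anger"), with_control("fear"))
```

Each added dimension makes such distinctions easier but also enlarges the space that reference data has to cover, which is the curse of dimensionality noted on the slide.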
Computational Models (Pfeifer ’87)
Emotion as process
Emotion generation
Influence of emotion
Goal oriented nature
Interaction between subsystems
E. as heuristics
Representation of emotion
Computational Models (Pfeifer ’87)
Examines models dimensionally
 A) Symbolic vs non-symbolic (cognitive vs AI)
 B) Augmented by emotion vs focused on emotion
All approaches deal with E as process
Unclear whether system state = emotion
Models must function in complex, uncontrollable,
unpredictable context
No model for physiological aspect
Emotions tightly coupled to commonsense reasoning
Modeling Emotion in Speech
Synthesis: basic issues (Schröder)
 How is a given emotion expressed?
 Which properties of the E state are to be
 Relationship between this state and another
 Formant synthesis (Burkhardt)
 Diphone concatenation
 Unit selection
Modeling Emotion in Speech
Formant synthesis (Burkhardt)
 High degree of control “emoSyn”
Mean pitch, pitch range, variation, phrase and word
contour, flutter, intensity, rate, phonation type, vowel
precision, lip spread
 Two experiments
Stimuli systematically varied, then classified
Prototype generated and varied slightly
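The two experimental designs can be sketched as follows. This is a minimal illustration under stated assumptions, not Burkhardt's procedure: the three-level settings in `LEVELS`, the function names, and the reading of "varied slightly" as "change one parameter at a time" are all hypothetical. Only the parameter names come from the list above.

```python
from itertools import product

# Hypothetical levels for three of the parameters named above; the levels
# actually used in the experiments are not given in these notes.
LEVELS = {
    "mean_pitch": ["lowered", "neutral", "raised"],
    "pitch_range": ["narrow", "neutral", "broad"],
    "rate": ["slow", "neutral", "fast"],
}


def stimulus_grid(levels):
    """Experiment-1 style: every combination of the parameter levels."""
    names = list(levels)
    combos = product(*(levels[name] for name in names))
    return [dict(zip(names, combo)) for combo in combos]


def vary_prototype(prototype, levels):
    """Experiment-2 style (assumed): change one parameter at a time."""
    variants = []
    for name, options in levels.items():
        for option in options:
            if option != prototype[name]:
                variants.append({**prototype, name: option})
    return variants


grid = stimulus_grid(LEVELS)

# A fear-like prototype built from the qualitative description in these notes.
fear_like = {"mean_pitch": "raised", "pitch_range": "broad", "rate": "fast"}
variants = vary_prototype(fear_like, LEVELS)
```

With three parameters at three levels the full grid already has 27 stimuli, which suggests why a second experiment that starts from prototypes and varies them locally is attractive.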
Modeling Emotion in Speech
Formant synthesis (Burkhardt)
 Fear
High pitch, broad range, falsetto voice, fast rate
 Joy
Broader pitch range, faster rate, modal or tense
phonation, precise articulation
Lowest recognition rates (perhaps due to intonation
 Boredom
Lowered mean pitch, narrow range, slow rate, imprecise
Modeling Emotion in Speech
Formant synthesis (Burkhardt)
 Sadness
Narrow range, slow rate, breathy articulation
Also raised pitch, falsetto
Possible that sadness was imprecise term
 Anger
Faster rate, tense phonation
 General results
Recognition rates are comparable to natural speech,
especially when the categories from experiment 2 are
Modeling Emotion in Speech
Generally: tradeoff between flexibility of
modeling and naturalness:
 Rule-based less natural
 Selection-based is less flexible
An Example – Ang ’02
Prosody-Based detection of annoyance/
frustration in human computer dialog
DARPA Communicator Project Travel
Planning Data (a simulation)
 (NIST, UC Boulder, CMU)
Considers contributions of prosody, language
model, and speaking style
Doesn’t begin with a strong hypothesis
An Example – Ang ’02
Uses recognizer output (sort of)
Examines rel. of emotion and speaking style
Uses hand coded style data
 Hyperarticulation, pauses, raised voice
Repeated requests or corrections
Hand labeled emotion relative to speaker
 Original and consensus labels
An Example – Ang ’02
Emotion Class
An Example – Ang ’02
Prosodic Features
 Duration and speaking rate
 Pause, pitch, energy, spectral tilt
Non-prosodic Features
 Repetitions & corrections
 Position in dialog
Language model features
Discriminated using decision trees
 “Brute force iterative algorithm” to determine useful features
 With and without LM features
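To make the decision-tree step concrete, here is a minimal sketch of a single tree split (a decision stump) chosen by Gini impurity, plus a ranking of features by how well each one splits the data on its own. Everything in it is an assumption for illustration: the feature names, the toy utterances, and the labels are invented, and the ranking is a simple stand-in, not the "brute force iterative algorithm" used by Ang et al.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)


def best_split(rows, labels, feature):
    """Return (weighted impurity, threshold) of the best split on one feature."""
    best = (float("inf"), None)
    values = sorted({row[feature] for row in rows})
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
        right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[0]:
            best = (score, threshold)
    return best


def rank_features(rows, labels):
    """Order features from most to least useful as a single split."""
    return sorted(rows[0], key=lambda f: best_split(rows, labels, f)[0])


# Hypothetical toy utterances; 1 = annoyed/frustrated, 0 = else.
ROWS = [
    {"duration_s": 1.2, "mean_f0_hz": 180, "is_repeat": 0},
    {"duration_s": 1.0, "mean_f0_hz": 190, "is_repeat": 0},
    {"duration_s": 1.5, "mean_f0_hz": 175, "is_repeat": 0},
    {"duration_s": 2.8, "mean_f0_hz": 230, "is_repeat": 1},
    {"duration_s": 3.1, "mean_f0_hz": 185, "is_repeat": 1},
    {"duration_s": 2.5, "mean_f0_hz": 240, "is_repeat": 0},
]
LABELS = [0, 0, 0, 1, 1, 1]

ranking = rank_features(ROWS, LABELS)
```

A full tree would apply `best_split` recursively to each side of the chosen split. Running the ranking with and without a given feature group is one simple way to mimic the "with and without LM features" comparison.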
Ang ’02 – Decision Tree Usage
Temporal features 28%
 Longer duration, slow speaking rate corr.
w/ frustration
Pitch features 26%
 Generally, high F0 correlated w/ frustration
Repeats/corrections (= system error) 26%
 Correlated w/ frustration
Raised Voice
Ang ’02 – Results
Performance better by 5-6% for utterances
on which labelers originally agreed
Use of the repeat/correction feature improves
success by 4%
Frustration vs Else – very little data
Only slight difference between labeled and