Emotional Speech

Emotional Speech
Overview

Who cares?

The Idea of Emotion

Difficulties in approaching

Describing Emotion

Computational Models

Modeling Emotion in Speech

An example – Ang ’02
Who Cares?

Practical impact
 Detecting Frustration/Anger
 Stress/Distress
 Help call prioritizing
 Tutorials – Boredom/Confusion/Frustration

Pacing/Positive feedback
 User acceptance

Users preferred a talking head using emotional speech (Stallo, in Schröder)
Who Cares?

Esoteric Impact
 Is artificial intelligence possible w/o detection of
emotion?
 w/o display of “emotion”?

Do we experience someone/something as
understanding us if it can’t understand our
emotional state/experience?
Who Cares? – Izard ’77

Emotion & Perception

E & Cognition

E & Action

E & Personality Development

Understanding a speaker’s emotional state
gives us insight into his/her intention, desire,
motivation (Zimring)
The Bad News (Picard ’97)

Maintaining realistic expectations

User’s confidence in information

Potential to forge affective channels

Problem solving vs. empathic/observational

Symmetry of communication

Privacy issues
Idea of Emotion (Hergenhahn ’01)

Descartes
 “Passions”
 Understood emotions as originating from both physiological and cognitive sources
 Pineal gland
Late 1800’s – early 1900’s
 Psychology was the study of consciousness
 William James – “The Science of Mental Life”
 Major method was introspection
  Relies on a person reporting his/her mental experience
Idea of Emotion

1930’s – 1950’s
 Behaviorist tradition – study of behavior
  “Objective” (at least measurable and observable)
  Emerged from academia – a lot of rats suffered
  Explains everything in terms of stimulus / response
  Fails to explain some crucial issues, e.g., language
Idea of Emotion

1950’s – Cognitive “Revolution”
 Piaget, Miller, Chomsky, et al.
 Miller – “The Science of Mental Life”
 John Searle
  Syntax vs. semantics
  Materialism vs. Dualism
  What are reasonable expectations?

“No one expects to get wet in a pool filled with ping pong ball models of water molecules.” (Searle ’90)
Difficulties in approaching (Cowie)

E is resistant to capture in symbols

Speech presents special problems

Modeling of primary E’s not so useful

Consensus

Display Rules (Ekman)

Mixes – “Love/hate relationship”

Negative response to simulated displays
“[Utterances were] said by two actors in the
emotions of happiness, sadness, anger […]”
Difficulties in approaching

Quality of reference data

Rating believability (Schröder)
 Forced choice tests often ignore issue of
appropriateness/believability
 “How appropriate was utterance to given E” (Rank ’98)
 (Iida, et al.) Rated using scales for preference and for
subjective degree of expressed E.

Subject generosity

Temporal and contextual relationships (Pereira)
Describing Emotion

“Everything it is possible to analyze depends on a clear method of distinguishing the similar from the dissimilar.” – Carl Linnaeus

Invariants
Describing Emotion (Cowie)

Primary emotions
 Acceptance, anger, anticipation, disgust, joy,
fear, sadness, surprise

Secondary Emotions 

Arousal

Attitude

An aside: Intention may generate all of these
A word list of emotion terms: activity, decisiveness, haughtiness, restrained, adoration, delighted, helplessness, restraint, alarm, dependence, hope, righteousness, alertness, depression, humiliation, rigor, anger, desire, indifference, routine, animosity, despair, inferiority, sadness, annoyance, dimness, initiative, satisfaction, anxiety, disappointment, intensity, satisfied, appetite, disgust, interest, skepticism, approval, disqualification, scorn, artificiality, disregard, involvement, serenity, astonishment, disrespect, joy, servility, at ease, distress, leniency, shame, attraction, droopy, loneliness, sharpness, balanced, embarrassment, longing, shyness, belonging, embitterment, love, simplicity, bitterness, enjoyment, meditative, sincerity, bliss, envy, mirth, sleepy, restlessness, blur, exaggeration, misery, slumber, boldness, excitement, sorrow, boredom, fatigue, naturalness, stability, calmness, fear, nervousness, stubbornness, caution, firmness, pain, suffering, clearness, frankness, panic, superiority, compassion, fondness, passiveness, surprise, complexity, friendly, patience, suspiciousness, concern, frustration, pity, sympathy, conciliated, gaiety, pleasure, tenderness, confidence, generosity, posing, tension, constraint, gloom, pride, tolerance, hate, confusion, grateful, quiescence, tranquility, contempt, greediness, regret, uneasiness, contentment, grievance, relaxed, unstable, courage, guilt, relief, vigilance, yearning, craving, happiness, repulsion, weakness, criticism, haste, respect, worry, curiosity
“…emotion is a fact upon which all introspection agrees. [Most emotional states] are states which we have experienced personally.” (Gellhorn & Loofbourrow ’63)
Data of Emotion (Lang ’87)

Everyone generally agrees on existence

Basic datum is a state of feeling
 Completely private

Include understanding of antecedents and
consequences

Important to determine how E is represented
in memory

Suggests a Turing test (but doesn’t describe it…)
Describing Emotion

One approach:
continuous dimensional model (Cowie/Lang)

Activation – evaluation space

Add control

Curse of dimensionality

Primary E’s differ on at least 2 dimensions of
this scale (Pereira)
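To make the dimensional model concrete, here is a minimal Python sketch that places a few primary emotions in activation–evaluation space and returns the nearest label for an arbitrary point. The coordinates and names are illustrative assumptions, not values taken from Cowie, Lang, or Pereira.

import math

# Illustrative coordinates in activation-evaluation space.
# Axes run from -1 to +1; the values are assumptions for demonstration,
# not measurements from the literature cited on the slides.
PRIMARY_EMOTIONS = {
    "joy":      (0.6,  0.8),   # (activation, evaluation)
    "anger":    (0.8, -0.6),
    "fear":     (0.7, -0.7),
    "sadness":  (-0.6, -0.6),
    "surprise": (0.8,  0.1),
    "disgust":  (0.3, -0.7),
}

def nearest_emotion(activation, evaluation):
    """Return the primary-emotion label closest to a point in the space."""
    return min(
        PRIMARY_EMOTIONS,
        key=lambda e: math.dist((activation, evaluation), PRIMARY_EMOTIONS[e]),
    )

if __name__ == "__main__":
    # A highly activated, negatively evaluated state maps to anger/fear.
    print(nearest_emotion(0.75, -0.65))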
Computational Models (Pfeifer ’87)

Emotion as process

Emotion generation

Influence of emotion

Goal oriented nature

Interaction between subsystems

E. as heuristics

Representation of emotion
Computational Models (Pfeifer ’87)

Examines models dimensionally
 A) Symbolic vs non-symbolic (cognitive vs AI)
 B) Augmented by emotion vs focused on emotion

All approaches deal with E as process

Unclear whether system state = emotion

Models must function in complex, uncontrollable,
unpredictable context

No model for physiological aspect

Emotions tightly coupled to commonsense reasoning
Modeling Emotion in Speech

Synthesis: basic issues (Schröder)
 How is a given emotion expressed?
 Which properties of the E state are to be
expressed?
 Relationship between this state and another

Approaches
 Formant synthesis (Burkhardt)
 Diphone concatenation
 Unit selection
Modeling Emotion in Speech

Formant synthesis (Burkhardt)
 High degree of control (“emoSyn”)
  Mean pitch, pitch range, variation, phrase and word contour, flutter, intensity, rate, phonation type, vowel precision, lip spread
 Two experiments
  Stimuli systematically varied, then classified
  Prototype generated and varied slightly
Modeling Emotion in Speech

Formant synthesis (Burkhardt)
 Fear
  High pitch, broad range, falsetto voice, fast rate
 Joy
  Broader pitch range, faster rate, modal or tense phonation, precise articulation
  Lowest recognition rates (perhaps due to intonation patterns)
 Boredom
  Lowered mean pitch, narrow range, slow rate, imprecise articulation
Modeling Emotion in Speech

Formant synthesis (Burkhardt)
 Sadness
  Narrow range, slow rate, breathy articulation
  Also raised pitch, falsetto
  Possible that sadness was an imprecise term
 Anger
  Faster rate, tense phonation
 General results
  Recognition rates are comparable to natural speech, especially when the categories from experiment 2 are recombined.
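The rule-based tendencies above can be thought of as parameter offsets applied to a neutral baseline. The sketch below encodes them as a Python dictionary; the parameter names, units, and numeric values are illustrative assumptions, not the actual emoSyn rules.

# Neutral baseline prosody; numbers are illustrative placeholders,
# not emoSyn's actual parameter values.
NEUTRAL = {
    "mean_pitch_hz": 120.0,
    "pitch_range_semitones": 4.0,
    "speech_rate_factor": 1.0,
    "phonation": "modal",
    "articulation": "normal",
}

# Qualitative tendencies from the listing above, encoded as overrides.
EMOTION_RULES = {
    "fear":    {"mean_pitch_hz": 220.0, "pitch_range_semitones": 10.0,
                "speech_rate_factor": 1.3, "phonation": "falsetto"},
    "joy":     {"pitch_range_semitones": 8.0, "speech_rate_factor": 1.2,
                "phonation": "tense",  # slides allow modal or tense
                "articulation": "precise"},
    "boredom": {"mean_pitch_hz": 100.0, "pitch_range_semitones": 2.0,
                "speech_rate_factor": 0.8, "articulation": "imprecise"},
    "sadness": {"pitch_range_semitones": 2.0, "speech_rate_factor": 0.8,
                "phonation": "breathy"},
    "anger":   {"speech_rate_factor": 1.2, "phonation": "tense"},
}

def prosody_for(emotion):
    """Merge an emotion's overrides onto the neutral baseline."""
    settings = dict(NEUTRAL)
    settings.update(EMOTION_RULES.get(emotion, {}))
    return settings

if __name__ == "__main__":
    print(prosody_for("fear"))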
Modeling Emotion in Speech

Generally: tradeoff between flexibility of
modeling and naturalness:
 Rule-based is less natural
 Selection-based is less flexible
An Example – Ang ’02

Prosody-based detection of annoyance/frustration in human-computer dialog

DARPA Communicator Project Travel
Planning Data (a simulation)
 (NIST, UC Boulder, CMU)

Considers contributions of prosody, language
model, and speaking style

Doesn’t begin with a strong hypothesis
An Example – Ang ’02

Uses recognizer output (sort of)

Examines the relationship between emotion and speaking style

Uses hand coded style data
 Hyperarticulation, pauses, raised voice

Repeated requests or corrections

Hand labeled emotion relative to speaker
 Original and consensus labels
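Since the emotion labels are hand-assigned (with a separate consensus pass), inter-labeler agreement matters for the results later in the talk. Below is a minimal sketch of measuring agreement with scikit-learn's cohen_kappa_score, using made-up labels rather than the actual Ang '02 annotations.

from sklearn.metrics import cohen_kappa_score

# Made-up labels from two hypothetical annotators; the real Ang '02 labels
# are per-utterance classes such as NEUTRAL, ANNOYED, FRUSTRATED, TIRED, AMUSED.
labeler_a = ["NEUTRAL", "NEUTRAL", "ANNOYED", "FRUSTRATED", "NEUTRAL", "ANNOYED"]
labeler_b = ["NEUTRAL", "ANNOYED", "ANNOYED", "ANNOYED",    "NEUTRAL", "ANNOYED"]

# Cohen's kappa corrects raw agreement for agreement expected by chance.
print("kappa:", cohen_kappa_score(labeler_a, labeler_b))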
An Example – Ang ’02
Emotion Class     Instances   Percent
NEUTRAL               41545    83.84%
ANNOYED                3777     7.62%
FRUSTRATED              358     0.72%
TIRED                   328     0.66%
AMUSED                  326     0.66%
OTHER                   115     0.23%
NOT-APPLICABLE         3104     6.26%
Total                 49553   100.00%
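The distribution is heavily skewed toward NEUTRAL, so always guessing the majority class already yields about 84% accuracy. A quick check of the percentages and that baseline, computed from the counts in the table:

# Class counts from the table above.
counts = {
    "NEUTRAL": 41545, "ANNOYED": 3777, "FRUSTRATED": 358,
    "TIRED": 328, "AMUSED": 326, "OTHER": 115, "NOT-APPLICABLE": 3104,
}
total = sum(counts.values())  # 49553

for label, n in counts.items():
    print(f"{label:15s} {n:6d} {100.0 * n / total:6.2f}%")

# Always predicting the most frequent class already scores ~83.8%.
print("majority baseline:", max(counts.values()) / total)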
An Example – Ang ’02

Prosodic Features
 Duration and speaking rate
 Pause, pitch, energy, spectral tilt

Non-prosodic Features
 Repetitions & corrections
 Position in dialog

Language model features

Discriminated using decision trees
 “Brute force iterative algorithm” to determine useful features
 With and without LM features
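As a rough illustration of the classification setup (not Ang et al.'s actual feature extraction or tree-growing procedure), the sketch below trains a scikit-learn decision tree on synthetic utterance-level features named after the categories above and reports which features the tree relies on. All feature names, data, and coefficients are assumptions for demonstration.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Hypothetical utterance-level features named after the categories above;
# the values are synthetic stand-ins, not the Ang '02 feature extraction.
feature_names = [
    "duration_s", "speaking_rate", "pause_ratio",
    "mean_f0", "energy_db", "spectral_tilt",
    "n_repeats_corrections", "dialog_position",
]
n = 2000
X = rng.normal(size=(n, len(feature_names)))

# Synthetic labels: frustration made more likely for long, slow,
# high-pitched, repeated utterances, loosely mirroring the reported trends.
score = 0.8 * X[:, 0] - 0.6 * X[:, 1] + 0.7 * X[:, 3] + 0.9 * X[:, 6]
y = (score + rng.normal(scale=1.0, size=n) > 1.5).astype(int)  # 1 = frustrated

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
tree = DecisionTreeClassifier(max_depth=4, class_weight="balanced", random_state=0)
tree.fit(X_tr, y_tr)

print("test accuracy:", tree.score(X_te, y_te))
for name, imp in zip(feature_names, tree.feature_importances_):
    print(f"{name:25s} {imp:.2f}")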
An Example – Ang ’02
Ang ’02 – Decision Tree Usage

Temporal features 28%
 Longer duration, slow speaking rate correlated w/ frustration

Pitch features 26%
 Generally, high F0 correlated w/ frustration

Repeats/corrections (= system error) 26%
 Correlated w/ frustration

Raised Voice
Ang ’02 – Results

Performance better by 5-6% for utterances
on which labelers originally agreed

Use of the repeat/correction feature improves
success by 4%

Frustration vs Else – very little data

Only slight difference between labeled and
recognized