Emotional Speech Julia Hirschberg CS 6998 7/15/2016

Today
Defining emotional speech
Emotional categories
Eliciting judgments
Producing emotional speech
Detecting emotional speech
A Subclass: Deceptive speech
Cowie ‘00
Is there a good theoretical or practical definition
of emotional speech?
“Full-blown” emotion vs. emotional state
Cause and effect descriptions
Primary and secondary (second order)
Everyday descriptions
Representations
Biological
Dimensions in continuous space, e.g.
Valence: positive or negative
Activation level: how disposed to take action
Structural models: different ways of
appraising situation that evokes emotion
e.g. positive or negative? Does situation help
agent to achieve his/her goals?
Timing as a key variable
sadness vs. grief vs. depression vs. gloominess
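The dimensional representation above can be sketched as points in a two-dimensional valence/activation space. This is only an illustration: the class name `EmotionPoint`, the helper functions, and the convention that each axis is centered on zero are assumptions, not part of any cited model.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class EmotionPoint:
    """A point in a hypothetical 2-D emotion space (axes centered on 0)."""
    valence: float     # negative (< 0) to positive (> 0)
    activation: float  # low (< 0) to high (> 0) disposition to take action


def quadrant(point):
    """Describe which quadrant of the valence/activation plane a point falls in."""
    v = "positive" if point.valence >= 0 else "negative"
    a = "high" if point.activation >= 0 else "low"
    return f"{v} valence, {a} activation"


def distance(a, b):
    """Euclidean distance between two points in the emotion space."""
    return math.hypot(a.valence - b.valence, a.activation - b.activation)


# Illustrative coordinates only; no empirical placement is implied.
p = EmotionPoint(valence=-0.6, activation=0.8)
label = quadrant(p)
```

A continuous space like this lets related states be compared by distance instead of by category label alone.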
How are emotions expressed?
Display rules? In speech?
Mixing
Simulation
Schroeder ‘01: Emotion in
Synthesis
How is a given emotion expressed in speech?
What are the properties of the emotion to be
expressed? How are they related to those of
other emotions?
What kind of synthesizer works best?
Formant
Diphone
Unit selection
Prosody rules: what to modify?
How do we evaluate the results?
Forced choice
Free response
Recognition rate
Perceived naturalness
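As a rough illustration of the recognition-rate measure in a forced-choice test, the sketch below scores listener choices against the emotion the synthesizer was meant to express. The function names and the sample judgments are made up for the example; they are not data from any study.

```python
from collections import Counter


def recognition_rate(intended, chosen):
    """Fraction of stimuli whose chosen label matches the intended emotion."""
    if len(intended) != len(chosen):
        raise ValueError("intended and chosen must have the same length")
    if not intended:
        raise ValueError("need at least one judgment")
    hits = sum(1 for i, c in zip(intended, chosen) if i == c)
    return hits / len(intended)


def per_emotion_rates(intended, chosen):
    """Recognition rate computed separately for each intended emotion."""
    totals = Counter(intended)
    hits = Counter(i for i, c in zip(intended, chosen) if i == c)
    return {emotion: hits[emotion] / totals[emotion] for emotion in totals}


# Illustrative judgments only (not results from any experiment).
intended = ["anger", "anger", "joy", "joy", "sadness", "sadness"]
chosen = ["anger", "joy", "joy", "joy", "sadness", "anger"]
overall = recognition_rate(intended, chosen)
by_emotion = per_emotion_rates(intended, chosen)
```

Per-emotion rates are worth reporting alongside the overall figure, since a single easily recognized emotion can mask poor results on the others.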
Ten Bosch ‘00: Emotion
Recognition
How hard is the problem?
Is ‘standard’ ASR technology well-suited to it?
Acoustic and language models target short
local events
Feature extraction normalizes/excludes e.g.
pitch, rate, amplitude -- why?
Interaction: emotional speech and ASR
performance
Synthesis needs one good example but...
Ang et al
Challenges:
Use output from ASR system
Use automatic prosodic features
Find good speaker normalization
Combine with lexical features
Pioneered approach of “direct modeling” – no
use of intermediate phonological units
Applications: detecting frustration,
disappointment/tiredness, amusement/surprise
Results: prediction comparable to human
accuracy 70-75%
Method: Prosodic Models
Extract pitch from signal
Speech recognizer outputs word and phone
alignments (duration features)
Utterance-level features extracted (e.g., max
speaker-normalized pitch in the longest phone-normalized vowel, etc.)
Decision trees created to provide posterior
probabilities of emotion classes given features
Feature selection from development test set
Separate test set used for evaluation
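To make the idea of a tree producing posterior probabilities concrete, here is a minimal sketch using a single-split tree (a "stump") on one feature, with class frequencies at each leaf serving as the posteriors. This is a simplification for illustration, not the system described above: the Gini splitting criterion, the function names, and the sample values are all assumptions.

```python
from collections import Counter


def class_posteriors(labels):
    """Relative class frequencies, used as the posterior at a tree leaf."""
    counts = Counter(labels)
    return {label: n / len(labels) for label, n in counts.items()}


def gini(labels):
    """Gini impurity of a list of class labels."""
    return 1.0 - sum(p * p for p in class_posteriors(labels).values())


def fit_stump(values, labels):
    """Choose the threshold on one feature that minimizes weighted Gini impurity.

    Returns (threshold, left_posteriors, right_posteriors); 'left' holds
    the examples whose value is <= threshold.
    """
    points = sorted(set(values))
    if len(points) < 2:
        raise ValueError("need at least two distinct feature values")
    best = None
    for lo, hi in zip(points, points[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if best is None or score < best[0]:
            best = (score, threshold, class_posteriors(left), class_posteriors(right))
    return best[1:]


def predict_posteriors(stump, value):
    """Return the class posteriors of the leaf that 'value' falls into."""
    threshold, left, right = stump
    return left if value <= threshold else right


# Illustrative values: a speaker-normalized pitch feature (z-scores) per utterance.
pitch_z = [-1.0, -0.5, 0.0, 0.4, 1.2, 1.8]
emotion = ["neutral", "neutral", "neutral", "frustrated", "neutral", "frustrated"]
stump = fit_stump(pitch_z, emotion)
posterior = predict_posteriors(stump, 1.0)
```

A full decision tree applies the same split search recursively within each leaf and across many features; the leaf frequencies are what allow the output to be read as probabilities and not just hard labels.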
Prosodic Features
Duration features
  Phone / Vowel / Syllable Durations
  Normalized by Phone/Vowel Means, Speaker
Speaking rate features (vowels/time)
Pause features
  Speech to pause ratio, number of long pauses
  Maximum pause length
Energy features (RMS energy)
Pitch features
  Used pitch stylization algorithm (Sonmez et al.)
  LTM model of F0 to estimate speaker range
  Pitch ranges, slopes, locations of interest
Spectral tilt features
Other (non-prosodic) features
  Position of utterance in dialog
  Repeat or correction
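A few of the simpler features above can be sketched from word alignments and raw samples. This is a minimal illustration under stated assumptions: alignments are given as (start, end) times in seconds, the 0.5 s threshold for a "long" pause is an arbitrary placeholder, and the function names are hypothetical.

```python
import math


def pause_features(words, long_pause=0.5):
    """Pause statistics from (start, end) word times in seconds, sorted by start.

    long_pause is an assumed threshold, not a value taken from the slides.
    """
    if not words:
        raise ValueError("need at least one word")
    # A pause is any positive gap between the end of one word and the start of the next.
    pauses = [nxt[0] - cur[1] for cur, nxt in zip(words, words[1:]) if nxt[0] > cur[1]]
    speech = sum(end - start for start, end in words)
    total_pause = sum(pauses)
    return {
        "speech_to_pause_ratio": speech / total_pause if total_pause > 0 else math.inf,
        "num_long_pauses": sum(1 for p in pauses if p >= long_pause),
        "max_pause": max(pauses, default=0.0),
    }


def speaking_rate(num_vowels, duration_seconds):
    """Vowels per second over the utterance."""
    if duration_seconds <= 0:
        raise ValueError("duration must be positive")
    return num_vowels / duration_seconds


def rms_energy(samples):
    """Root-mean-square energy of a sequence of audio samples."""
    if not samples:
        raise ValueError("need at least one sample")
    return math.sqrt(sum(s * s for s in samples) / len(samples))


# Illustrative alignment: four words with two gaps between them.
words = [(0.0, 0.5), (0.5, 1.0), (1.75, 2.0), (2.25, 3.0)]
feats = pause_features(words)
rate = speaking_rate(num_vowels=6, duration_seconds=3.0)
energy = rms_energy([1.0, -1.0, 1.0, -1.0])
```

In practice these raw values would still need the phone-level and speaker-level normalization listed above before being compared across speakers.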
Emotion in Deception
Motivation: why might such cues exist?
Deception evokes emotion in deceivers
(e.g. Ekman ‘85-92)
Fear of discovery: higher pitch, faster,
louder, pauses, disfluencies, indirect speech
Elation at successful deceiving: higher
pitch, faster, louder, greater elaboration
Acoustic/Prosodic/Lexical Cues
Are deceivers less forthcoming?
Shorter speech with fewer details
Are lies less compelling than truths?
Less plausible, logical, more discrepancies
Less verbal and vocal ‘involvement’
Less verbal ‘immediacy’: more passives,
negations, indirect speech
More uncertainty (subjective)
More repetitions
Are liars less positive, pleasant?
More negative statements, complaints
Are liars more tense?
Nervous overall
Vocal tension
High pitch
Do lies contain fewer ‘imperfections’?
Fewer self-repairs
Fewer admissions of forgetfulness
Fewer scene descriptions, details
More mention of peripheral events or
relationships
Current State-of-the-Art
No single cue to deceptive speech: most studied are
visual
Other acoustic/prosodic features proposed, but
evidence mixed so far
  Loudness/intensity
  Speaking rate
  Response latency
  Disfluencies
No attested method to detect deception automatically
using acoustic/prosodic/lexical cues
  All current findings are descriptive, suggestive
  All proposed methods require human intervention
Our Approach
Elicit deceptive and non-deceptive corpus
Motivation: Identity-relevant (self-image)
and instrumental (monetary) incentives
“Real” deception vs. acted
Good recording conditions
Tasks/interview paradigm
Transcription/annotation
Acoustic/prosodic/lexical analysis to identify
features of interest, test validity of paradigm
Automatic feature extraction and analysis to
train models of deceptive and non-deceptive
speech
Corpus Collection
Subjects asked to perform tasks for
comparison with target profile of 25 top
entrepreneurs
Performance manipulated to produce
performance same as/differing from target
Monetary incentive to convince an
interviewer they matched target
Recorded interview/interrogation
Biographical information (t/f)
“Big lie” on task performance
“Local lie”: Pedal indicators of t/f for each
answer
Collection
To date: 15 subjects, totaling ~3h of
subject speech
Planned: 7-8 hours of subject speech
Results of Prosodic/Acoustic
Analysis
On Arizona Mock Theft data subset:
32 interviews (72 min); required
segmentation, recording issues (50 more interviews,
160 min, being segmented)
Significant pitch feature differences
between deceptive and non-deceptive
speech, but...
Highly motivated speakers lower pitch when
lying
Low motivation speakers raise pitch when lying
Males lower pitch when lying
Females raise pitch when lying
On Columbia corpus:
Preliminary analyses of 8 speakers for
‘local’ t/f
Significant differences in pitch range for
six subjects, but differ from Mock Theft wrt
gender
Lexical findings:
Preliminary analyses on Columbia data using
LIWC show negative words more prevalent in
deceptive speech
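A category-based lexical measure of this kind can be sketched as the share of tokens that belong to a word list. The tiny word list below is a hypothetical stand-in, not the LIWC dictionary, and the tokenization rule is an assumption made for the example.

```python
import re


def category_rate(text, category_words):
    """Fraction of word tokens in 'text' that belong to 'category_words'."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in category_words)
    return hits / len(tokens)


# Hypothetical mini-lexicon for illustration only; not the LIWC category.
NEGATIVE_WORDS = {"no", "not", "never", "bad", "hate", "wrong"}

rate = category_rate("I did not do it, I never saw that.", NEGATIVE_WORDS)
```

Comparing this rate between utterances marked as true and as false, per speaker, is one simple way to look for the kind of difference reported above.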