Emotional Speech Julia Hirschberg CS 6998 7/15/2016

Today
- Defining emotional speech
- Emotional categories
- Eliciting judgments
- Producing emotional speech
- Detecting emotional speech
- A Subclass: Deceptive speech

Cowie ’00
- Is there a good theoretical or practical definition of emotional speech?
- “Full-blown” emotion vs. emotional state
- Cause and effect descriptions
- Primary and secondary (second order)
- Everyday descriptions
- Representations
- Biological
- Dimensions in continuous space, e.g.
  - Valence: positive or negative
  - Activation level: how disposed to take action
- Structural models: different ways of appraising the situation that evokes emotion
  - e.g. positive or negative? Does the situation help the agent to achieve his/her goals?
- Timing as a key variable
  - sadness vs. grief vs. depression vs. gloominess
- How are emotions expressed?
- Display rules?
- In speech?
- Mixing
- Simulation

Schroeder ’01: Emotion in Synthesis
- How is a given emotion expressed in speech?
- What are the properties of the emotion to be expressed?
- How are they related to those of other emotions?
- What kind of synthesizer works best?
  - Formant
  - Diphone
  - Unit selection
- Prosody rules: what to modify?
- How do we evaluate the results?
  - Forced choice
  - Free response
  - Recognition rate
  - Perceived naturalness

Ten Bosch ’00: Emotion Recognition
- How hard is the problem?
- Is ‘standard’ ASR technology well-suited to it?
  - Acoustic and language models target short local events
  - Feature extraction normalizes/excludes e.g. pitch, rate, amplitude. Why?
- Interaction: emotional speech and ASR performance
- Synthesis needs one good example but...
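The Ten Bosch question about why ASR feature extraction normalizes away amplitude can be illustrated with a small sketch. It assumes a front end that subtracts the per-utterance mean from frame-level log energy; that choice, the frame length, and the function names `frame_log_energy` and `mean_normalize` are illustrative assumptions, not something the slides specify. The point it shows: a louder copy of the same signal differs only by a constant offset in log energy, so mean normalization makes the two indistinguishable. That helps word recognition but discards a cue that emotion detection might want.

```python
import math

def frame_log_energy(samples, frame_len=4):
    """Log of mean squared amplitude over consecutive non-overlapping frames."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        mean_sq = sum(x * x for x in frame) / frame_len
        # Tiny floor avoids log(0) on silent frames (illustrative choice).
        energies.append(math.log(mean_sq + 1e-12))
    return energies

def mean_normalize(values):
    """Subtract the utterance-level mean from every frame value."""
    mu = sum(values) / len(values)
    return [v - mu for v in values]

# Toy waveform (made-up numbers) and a copy that is four times louder.
quiet = [0.1, -0.2, 0.3, -0.1, 0.5, -0.4, 0.2, -0.3, 0.05, -0.02, 0.01, -0.03]
loud = [4.0 * x for x in quiet]

raw_quiet = frame_log_energy(quiet)
raw_loud = frame_log_energy(loud)
norm_quiet = mean_normalize(raw_quiet)
norm_loud = mean_normalize(raw_loud)

# Before normalization the frames differ by about log(16);
# after normalization the difference is essentially zero.
raw_gap = max(abs(a - b) for a, b in zip(raw_quiet, raw_loud))
norm_gap = max(abs(a - b) for a, b in zip(norm_quiet, norm_loud))
print(raw_gap, norm_gap)
```

An emotion recognizer built on such a front end would need the raw energy (or a speaker-level rather than utterance-level normalization) restored as a separate feature stream.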
Ang et al
- Challenges:
  - Use output from ASR system
  - Use automatic prosodic features
  - Find good speaker normalization
  - Combine with lexical features
- Pioneered approach of “direct modeling”: no use of intermediate phonological units
- Applications: detecting frustration, disappointment/tiredness, amusement/surprise
- Results: prediction comparable to human accuracy, 70-75%

Method: Prosodic Models
- Extract pitch from signal
- Speech recognizer outputs word and phone alignments (duration features)
- Utterance-level features extracted (e.g., max speaker-normalized pitch in the longest phone-normalized vowel, etc.)
- Decision trees created to provide posterior probabilities of emotion classes given features
- Feature selection from development test set
- Separate test set used for evaluation

Prosodic Features
- Duration features
  - Phone / Vowel / Syllable Durations
  - Normalized by Phone/Vowel Means, Speaker
- Speaking rate features (vowels/time)
- Pause features
  - Speech to pause ratio, number of long pauses
  - Maximum pause length
- Energy features (RMS energy)
- Pitch features
  - Used pitch stylization algorithm (Sonmez et al.)
  - LTM model of F0 to estimate speaker range
  - Pitch ranges, slopes, locations of interest
- Spectral tilt features
- Other (non-prosodic) features
  - Position of utterance in dialog
  - Repeat or correction

Emotion in Deception
- Motivation: why might such cues exist?
- Deception evokes emotion in deceivers (e.g. Ekman ’85-92)
  - Fear of discovery: higher pitch, faster, louder, pauses, disfluencies, indirect speech
  - Elation at successful deceiving: higher pitch, faster, louder, greater elaboration

Acoustic/Prosodic/Lexical Cues
- Are deceivers less forthcoming?
  - Shorter speech with fewer details
- Are lies less compelling than truths?
  - Less plausible, logical, more discrepancies
  - Less verbal and vocal ‘involvement’
  - Less verbal ‘immediacy’: more passives, negations, indirect speech
  - More uncertainty (subjective)
  - More repetitions
- Are liars less positive, pleasant?
  - More negative statements, complaints
- Are liars more tense?
  - Nervous overall
  - Vocal tension
  - High pitch
- Do lies contain fewer ‘imperfections’?
  - Fewer self-repairs
  - Fewer admissions of forgetfulness
  - Fewer scene descriptions, details
  - More mention of peripheral events or relationships

Current State-of-the-Art
- No single cue to deceptive speech: most studied are visual
- Other acoustic/prosodic features proposed, but evidence mixed so far
  - Loudness/intensity
  - Speaking rate
  - Response latency
  - Disfluencies
- No attested method to detect deception automatically using acoustic/prosodic/lexical cues
  - All current findings are descriptive, suggestive
  - All proposed methods require human intervention

Our Approach
- Elicit deceptive and non-deceptive corpus
  - Motivation: Identity-relevant (self-image) and instrumental (monetary) incentives
  - “Real” deception vs. acted
  - Good recording conditions
  - Tasks/interview paradigm
- Transcription/annotation
- Acoustic/prosodic/lexical analysis to identify features of interest, test validity of paradigm
- Automatic feature extraction and analysis to train models of deceptive and non-deceptive speech

Corpus Collection
- Subjects asked to perform tasks for comparison with target profile of 25 top entrepreneurs
- Performance manipulated to produce performance same as/differing from target
- Monetary incentive to convince an interviewer they matched target
- Recorded interview/interrogation
  - Biographical information (t/f)
  - “Big lie” on task performance
  - “Local lie”: Pedal indicators of t/f for each answer

Collection
- To date: 15 subjects, totaling ~3h of subject speech
- Planned: 7-8 hours of subject speech

Results of Prosodic/Acoustic Analysis
- On Arizona Mock Theft data subset: 32 interviews/72 min, required segmentation, recording issues (50/160 min more being segmented)
  - Significant pitch feature differences between deceptive and non-deceptive speech, but...
    - Highly motivated speakers lower pitch when lying
    - Low motivation speakers raise pitch when lying
    - Males lower pitch when lying
    - Females raise pitch when lying
- On Columbia corpus: Preliminary analyses of 8 speakers for ‘local’ t/f
  - Significant differences in pitch range for six subjects, but differ from Mock Theft wrt gender
- Lexical findings: Preliminary analyses on Columbia data using LIWC show negative words more prevalent in deceptive speech
