Vocal Pitch - University of Arizona

Vocalic Markers of Deception and Cognitive Dissonance for Automated Emotion Detection Systems
Dr. Aaron C. Elkins
The University of Arizona
Emotional Voice
Can computers perceive vocal emotion?
Yes, but:
• The science of the emotional voice is young
• Communication is complex and dynamic
  • Moods and emotions switch with context
  • Emotion is computationally ill-defined
• Measuring emotion may inform theory
Emotional Dimensions
[Figure: emotional dimensions diagram, labeled "DISGUST?"]
Four Components of Speech
• Voiced vs. unvoiced sounds: [v] vs. [f]
• Airstream through the mouth or the nose: [m] vs. [o]
Speech Sounds
• Three properties: (1) pitch, (2) loudness, and (3) quality
• Sound consists of small variations in air pressure that occur rapidly in succession
  • For voiced sounds, the vocal folds superimpose a vibration on the outgoing air
  • The vocal folds vibrate to create a periodic vibration (100–250 Hz)
• We measure these features digitally
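One common way to separate voiced from unvoiced frames digitally is a short-time energy plus zero-crossing-rate heuristic: periodic vocal-fold vibration produces loud frames that change sign rarely, while the turbulent noise of sounds like [f] changes sign constantly. This is a minimal sketch, not the method used in these studies; the function names, the 16 kHz sampling rate, and both thresholds are hypothetical.

```python
import math
import random

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def rms_energy(frame):
    """Root-mean-square amplitude of the frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))

def is_voiced(frame, energy_threshold=0.1, zcr_threshold=0.1):
    # Hypothetical thresholds: voiced frames are loud AND rarely change sign.
    return rms_energy(frame) > energy_threshold and zero_crossing_rate(frame) < zcr_threshold

SR = 16000  # assumed sampling rate (Hz)
N = 400     # 25 ms analysis frame

# Voiced-like frame: a 150 Hz periodic wave, inside the 100-250 Hz range.
voiced = [0.5 * math.sin(2 * math.pi * 150 * n / SR) for n in range(N)]
# Unvoiced-like frame: random noise, standing in for the hiss of [f].
rng = random.Random(0)
unvoiced = [rng.uniform(-0.3, 0.3) for _ in range(N)]

print(is_voiced(voiced), is_voiced(unvoiced))
```

Real voicing detectors (including Praat's) rely on periodicity rather than these two cues alone, so the thresholds here would need tuning on real recordings.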
Recording Father – Digital Audio
• The waveform measures the pulses of the vocal folds
• It is based on air pressure disturbance (dB)
• Voiced vs. unvoiced (low pressure) segments
• Each peak occurs every 100th of a second (100 Hz)
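The link between peak spacing and pitch is f0 = 1 / T: peaks every 0.01 s mean 100 Hz. A bare-bones autocorrelation estimator makes this concrete by finding the lag at which the waveform best matches a shifted copy of itself. This is a simplified sketch of the autocorrelation idea only; Praat's pitch tracker is considerably more refined, and the function name, search range, and synthetic signal are assumptions.

```python
import math

def estimate_f0(samples, sample_rate, f_min=75.0, f_max=500.0):
    """Return sample_rate / lag for the lag with the largest raw autocorrelation."""
    min_lag = int(sample_rate / f_max)  # shortest period considered
    max_lag = int(sample_rate / f_min)  # longest period considered
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        score = sum(samples[i] * samples[i + lag] for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag

SR = 8000  # assumed sampling rate (Hz)
# Synthetic voiced signal: one peak every 1/100 s (F0 = 100 Hz) plus one harmonic.
signal = [math.sin(2 * math.pi * 100 * n / SR) + 0.3 * math.sin(2 * math.pi * 200 * n / SR)
          for n in range(SR // 10)]

print(estimate_f0(signal, SR))  # 100.0
```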
Vowel Articulation
Source-Filter Theory (Müller, 1848)
• The vocal folds vibrate at the same rate (pitch)
• Resonance changes in the vocal tract filter the frequencies (formants)
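Source-filter theory can be illustrated by synthesis: one periodic source (the vocal folds) passed through different resonant filters (the vocal tract) gives different vowels at the same pitch. This sketch assumes an impulse-train source and a cascade of three two-pole resonators; the formant frequencies are approximate textbook values for [ɑ] and [i], and the bandwidths and helper names are hypothetical.

```python
import math

def impulse_train(f0, sample_rate, n_samples):
    """Idealized glottal source: one unit pulse per vocal-fold cycle."""
    period = round(sample_rate / f0)
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]

def resonator(signal, freq, bandwidth, sample_rate):
    """Two-pole filter that boosts energy around one formant frequency."""
    r = math.exp(-math.pi * bandwidth / sample_rate)   # pole radius from bandwidth
    theta = 2 * math.pi * freq / sample_rate           # pole angle from frequency
    a1, a2 = 2 * r * math.cos(theta), -r * r
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = x + a1 * y1 + a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out

def vowel(f0, formants, sample_rate=8000, n_samples=1600):
    """Same source, vowel-specific filter: pass the pulses through each formant."""
    signal = impulse_train(f0, sample_rate, n_samples)
    for freq, bw in formants:
        signal = resonator(signal, freq, bw, sample_rate)
    return signal

# (formant Hz, bandwidth Hz): approximate formants, hypothetical bandwidths.
AH = [(730, 130), (1090, 70), (2440, 160)]  # as in "father"
EE = [(270, 60), (2290, 90), (3010, 170)]   # as in "heed"

a = vowel(100, AH)  # both vowels share a 100 Hz source (same pitch) ...
e = vowel(100, EE)  # ... but differ in their filters (different formants)
```

Once the start-up transient dies away, both outputs repeat every 80 samples (100 Hz) even though their waveforms differ, which is the separation of pitch and formants the slide describes.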
Vocalics
• Vocalic analysis examines how it was said
  • Amplitude
  • Pitch (frequency)
  • Response latency
  • Tempo
• Linguistics examines what was said
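Two of these vocalic measures, amplitude and response latency, can be read off a frame-level loudness envelope. The sketch below assumes the time at which the question ends is known and treats the first frame louder than a fixed threshold as the start of the answer; the 10 ms frame, the threshold, and the function names are hypothetical, and real systems would use a proper voice activity detector.

```python
import math

def frame_rms(samples, frame_len):
    """RMS amplitude of consecutive, non-overlapping frames."""
    return [math.sqrt(sum(x * x for x in samples[i:i + frame_len]) / frame_len)
            for i in range(0, len(samples) - frame_len + 1, frame_len)]

def response_latency(samples, sample_rate, question_end_s, frame_s=0.01, threshold=0.05):
    """Seconds from the end of the question to the first frame louder than threshold."""
    frame_len = round(sample_rate * frame_s)
    for idx, rms in enumerate(frame_rms(samples, frame_len)):
        t = idx * frame_s
        if t >= question_end_s and rms > threshold:
            return t - question_end_s
    return None  # no answer detected

SR = 8000
silence = [0.0] * SR  # 1 s of silence after the recording starts
answer = [0.5 * math.sin(2 * math.pi * 150 * n / SR) for n in range(SR // 2)]
latency = response_latency(silence + answer, SR, question_end_s=0.2)
print(round(latency, 2))  # 0.8
```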
Sound Production is Complex
• When we tense our muscles, such as during stress, our larynx tenses
  • Higher pitch
• The process is complex
  • Emotions affect normal operation
  • Deception takes cognitive resources away and is stressful
    • More mistakes, lower quality, increased average and variation in pitch
• Sympathetic nervous system response
  • Increased auditory acuity
  • Heightened arousal
Standard Vocal Measures Calculated with Praat and Custom Signal Processing Software
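The slide does not list which standard measures were used. As an illustration, two common Praat voice-quality measures are local jitter (mean absolute difference between consecutive glottal periods, divided by the mean period) and local shimmer (the same ratio applied to per-cycle peak amplitudes). This sketch assumes the periods and amplitudes have already been extracted; the helper name and the sample values are hypothetical.

```python
def local_perturbation(values):
    """Mean absolute difference of consecutive values, divided by the mean value.

    Applied to glottal periods this is local jitter; applied to per-cycle
    peak amplitudes it is local shimmer.
    """
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return (sum(diffs) / len(diffs)) / (sum(values) / len(values))

periods_ms = [10.0, 10.2, 9.9, 10.1, 10.0]   # hypothetical cycle lengths (~100 Hz voice)
amplitudes = [0.80, 0.78, 0.82, 0.79, 0.81]  # hypothetical per-cycle peak amplitudes

jitter = local_perturbation(periods_ms)
shimmer = local_perturbation(amplitudes)
print(f"jitter = {100 * jitter:.2f}%, shimmer = {100 * shimmer:.2f}%")
```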
Commercial Vocalic Software Evaluated
Nemesysco LVA 6.50
Five Vocalic Studies Summarized
• Study One (Deception Experiment)
• Study Two (Cognitive Dissonance)
• Study Three (Embodied Conversational Agent and Trust)
• Study Four (Embodied Conversational Agent Security Screening: Bomber)
• Study Five (Embodied Conversational Agent Security Screening: Imposter)

Vocal Deception (Study 1) – Experimental Design
• N = 96
• $10 reward for appearing credible to a professional interviewer
• Two sequences:
  • First sequence: DT DDTT TD TTDD T
  • Second sequence: DT TTDD TD DDTT T
• 13 short-answer questions
  • Only 8 had variation both within and between subjects
  • Two types of questions: charged and neutral
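Comparing the two sequences position by position shows where the figure of 8 questions plausibly comes from. Assuming D and T mark deceptive and truthful answers to the 13 questions in order, a question varies between subjects only if its condition flips between the sequences; both sequences already mix D and T, so every participant varies within subjects. The helper name below is hypothetical.

```python
FIRST = "DT DDTT TD TTDD T".replace(" ", "")
SECOND = "DT TTDD TD DDTT T".replace(" ", "")

def varying_questions(seq_a, seq_b):
    """1-based question numbers whose condition differs between the two sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must cover the same questions")
    return [i + 1 for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]

q = varying_questions(FIRST, SECOND)
print(len(FIRST), q)  # 13 [3, 4, 5, 6, 9, 10, 11, 12]
```

Questions 1, 2, 7, 8, and 13 share the same condition in both sequences, which leaves exactly 8 questions with variation at both levels.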
Results
• The built-in classification performed at chance level
• Vocal measures used independently of the system's built-in classification discriminated deception: FMain, AVJ, and SOS
  • Possible latent variables measuring conflicting thoughts, cognitive effort, and emotional fear
• Logistic regression performed best on charged questions
  • Higher pitch, cognitive effort, and hesitations are predictive of deception in more stressful interactions
• The claim that the vocal analysis software measures stress, cognitive effort, or emotion cannot be completely dismissed
• Deception and stress can be predicted by acoustic measures of voice quality and pitch when controlling for speaker characteristics
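A logistic regression of this kind maps vocal measures to a probability of deception. The sketch below fits one by plain batch gradient descent on the mean log-loss; the two features (a pitch z-score and a hesitation count), the toy data, and the learning settings are all hypothetical and are not the study's variables or results.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n, k = len(xs), len(xs[0])
    w, b = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) - y
            grad_w = [g + err * xj for g, xj in zip(grad_w, x)]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(w, b, x):
    """Estimated probability that an answer is deceptive."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Toy data, NOT from the study: [pitch z-score, hesitation count]; 1 = deceptive.
X = [[-1.0, 0], [-0.5, 1], [0.0, 0], [-0.8, 1],
     [1.0, 2], [0.6, 3], [1.2, 1], [0.4, 2]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
w, b = fit_logistic(X, y)
print([round(predict_proba(w, b, x), 2) for x in X])
```

Controlling for speaker characteristics would mean adding covariates (or standardizing each feature within speaker) before fitting; that step is omitted here.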
Vocal Dissonance (Study 2) – Experimental Design
• Modified induced-compliance paradigm
• Participants (N = 52) made two vocal counter-attitudinal arguments for cutting funding for services for the disabled
• Choice was manipulated, high vs. low (IV): high N = 24, low N = 28
• Participants reported their attitude toward the argued issue (DV)

Arousal (Vocal Pitch)
• High choice had a 10 Hz higher pitch
  • F(1,50) = 4.43, p = .04
• All participants reduced their pitch over time
  • F(1,50) = 4.90, p = .03
Cognitive Difficulty
• High choice had nearly 2x the response latency on argument two
  • F(1,50) = 4.53, p = .04
  • Arousal moderation
Cognitive Difficulty
• Participants spoke with 33% more nonfluencies on the second argument
  • F(1,50) = 4.03, p = .05
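"33% more nonfluencies" is a relative increase between the two arguments. A minimal sketch, assuming nonfluencies are counted as filled pauses per 100 words; the filler list and function names are hypothetical, and a full coding scheme would also count repetitions and false starts.

```python
import re

FILLERS = {"um", "uh", "er", "ah"}  # hypothetical filled-pause vocabulary

def nonfluency_rate(transcript):
    """Filled pauses per 100 words of a transcript."""
    words = re.findall(r"[a-z']+", transcript.lower())
    if not words:
        return 0.0
    fillers = sum(1 for w in words if w in FILLERS)
    return 100.0 * fillers / len(words)

def percent_increase(before, after):
    """Relative change from the first argument to the second, in percent."""
    return 100.0 * (after - before) / before

print(round(nonfluency_rate("Um so uh we cut it"), 1))  # 33.3
print(round(percent_increase(3, 4), 1))                 # 33.3
```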
The Importance of Language
(Imagery as Abstract Language)
Vocal Dissonance Model
χ²(1, N = 51), p = .49, SRMR = .02
R² attitude change = .17, imagery = .11
From the lab to the AVATAR
First Kiosk
Kiosk from Last Year
Third-Generation Kiosk
Gender and Demeanor
Vocal Trust (Study 3) – Experimental Design
• Participants completed a presurvey
• Packed a bag before the ECA screening interview
• Completed the security screening
• All responses to the ECA were recorded for vocal analysis
ECA Demeanor and Gender
• N = 88 participants (53 males, 35 females)
• Question Block 1, Question Block 2, Question Block 3, Question Block 4
• Repeated measures Latin square design
  • All participants interacted with all demeanor and gender ECA combinations
  • 4 questions per block, 16 total questions
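A Latin square counterbalances which ECA condition appears in which question block: every condition occurs exactly once in each row (participant group) and each column (block position). This sketch builds a simple cyclic square; the condition labels are hypothetical stand-ins for the four demeanor and gender combinations, and a cyclic square does not balance first-order carryover effects the way a Williams design would.

```python
def cyclic_latin_square(conditions):
    """Row i gives the block order for participant group i."""
    n = len(conditions)
    return [[conditions[(row + col) % n] for col in range(n)] for row in range(n)]

# Hypothetical labels for the four demeanor x gender ECA combinations.
CONDITIONS = ["male-neutral", "male-smiling", "female-neutral", "female-smiling"]
square = cyclic_latin_square(CONDITIONS)

for group, order in enumerate(square, start=1):
    print(f"group {group}: " + ", ".join(order))
```

Participants would then be assigned to rows in rotation, so each group sees all four ECAs across its four question blocks.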
Trust and Time
Main effects
• Multilevel growth model specified with trust as the DV (N = 218) and subject as a random effect (N = 60)
• Initial trust = 4.09
• Trust rate of change: .04 increase per second, p < .01
• Duration: .05 decrease in trust for every second spent answering the ECA over the 7.6 second average, p < .001
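Read as fixed effects of one growth model, these coefficients give a predicted trust score for any point in the interaction. A minimal sketch, assuming the intercept, time slope, and centered duration effect belong to the same model, taking the slide's units at face value, and ignoring each subject's random intercept; the function name is hypothetical.

```python
INITIAL_TRUST = 4.09    # intercept: trust at time 0
TIME_SLOPE = 0.04       # change in trust per second of interaction
DURATION_SLOPE = -0.05  # change per second of answer duration above the mean
MEAN_DURATION = 7.6     # average answer duration (s), the centering point

def predicted_trust(elapsed_s, answer_duration_s):
    """Fixed-effects prediction for an average subject."""
    return (INITIAL_TRUST
            + TIME_SLOPE * elapsed_s
            + DURATION_SLOPE * (answer_duration_s - MEAN_DURATION))

print(round(predicted_trust(0, 7.6), 2))   # 4.09
print(round(predicted_trust(10, 9.6), 2))  # 4.39
```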
Vocal Pitch, Time, and Trust
• Main effect of pitch: for every 1 Hz increase in pitch over 156 Hz, trust drops by .01, p = .03
• Interaction of pitch and time: Pitch × Time b = 9.3e-05, p = .03
  • Over time, pitch predicts trust less and less
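A positive interaction paired with a negative main effect means the effect of pitch shrinks as the interaction goes on. The simple slope of pitch at elapsed time t is −.01 + b·t. This sketch assumes the interaction coefficient is 9.3 × 10⁻⁵, that time is measured in the same units as in the growth model, and that both coefficients come from one model; the function names are hypothetical.

```python
PITCH_MAIN = -0.01      # trust change per 1 Hz above 156 Hz, at time 0
PITCH_X_TIME = 9.3e-05  # assumed value of the Pitch x Time coefficient

def pitch_slope(elapsed):
    """Simple slope of pitch on trust at a given elapsed time."""
    return PITCH_MAIN + PITCH_X_TIME * elapsed

def time_pitch_stops_mattering():
    """Elapsed time at which the simple slope of pitch reaches zero."""
    return -PITCH_MAIN / PITCH_X_TIME

print(round(time_pitch_stops_mattering(), 1))  # 107.5
```

Under these assumptions, pitch stops predicting trust after roughly 108 time units, consistent with the slide's point that pitch matters only early on.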
Results
• Human perceptions of trust transfer to the ECA
• Time plays an important role in the interaction
  • All participants trusted the ECA more over time, particularly when it smiled
  • 48 increase in trust when the ECA smiles
• Vocal measures of pitch predicted trust, but only early on
  • For every 1 Hz increase in pitch over 156 Hz, trust drops by .01
  • Over time, pitch predicts trust less and less
Vocalics of a Bomber (Study 4) – Experimental Design
• 29 EU border guards were randomly assigned to build a bomb (N = 16) or to a control condition (N = 13), then pack a bag
• Identical to Study 3, but with no breaks in the interview
• Only the male, neutral-demeanor ECA interviewed participants
• Bomb makers were instructed to successfully smuggle the bomb past the ECA
Vocal Analysis
• Recorded responses to the question: "Has anyone given you a prohibited substance to transport through this checkpoint?"
  • Average response 2.68 sec (SD = 1.66)
  • Responses such as "No" or "Of course not"
• Vocal measures of pitch and pitch variation
Results of Vocal Pitch
• Voice quality, gender, and intensity included as covariates
• No difference in mean vocal pitch, F(1,22) = 0.38, p = .54
• Main effect of pitch variation
  • Bomb makers had 25.34% more variation, F(1,22) = 4.79, p = .04
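Mean pitch and pitch variation can diverge: two speakers can share the same average F0 while one swings much more widely around it. This sketch measures variation as the standard deviation of voiced-frame F0 values; the slide does not say which variation measure was used, the F0 tracks are hypothetical, and the study's covariate-adjusted F-tests are not reproduced here.

```python
import statistics

def voiced_f0(f0_track):
    """Drop unvoiced frames, marked as None in the pitch track."""
    return [f for f in f0_track if f is not None]

def pitch_variation(f0_track):
    """Standard deviation (Hz) of F0 across voiced frames."""
    return statistics.stdev(voiced_f0(f0_track))

def percent_more(group_value, reference_value):
    return 100.0 * (group_value - reference_value) / reference_value

# Hypothetical F0 tracks (Hz) for a short answer such as "No".
control = [120, 125, 130, 125, None, 120, 130]
bomber = [115, 125, 135, 125, None, 115, 135]

print(statistics.mean(voiced_f0(control)), statistics.mean(voiced_f0(bomber)))
print(round(percent_more(pitch_variation(bomber), pitch_variation(control)), 1))
```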
Pitch Contours
Eye Gaze: Guilty
Eye Gaze: Innocent
Vocalics of an Imposter (Study 5) – Experimental Design
• 38 EU border guards
• All were required to present a visa and passport through multiphase screening
  • E-gate
  • Manual processing
  • AVATAR screening interview
• Four randomly assigned imposters carried false documents through screening with hostile intentions
AVATAR Interaction Example
iPad Output for Screener
Voice Quality Change from Baseline Question ("What is your full name?")
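Measuring change from a neutral baseline question controls for how each speaker's voice normally sounds. This sketch expresses a vocal measure on each later question as a percent change from the same speaker's answer to the baseline question; the slide does not specify the normalization, so the percent-change form, the example measure (a jitter-like value), the question keys, and the function name are all assumptions.

```python
BASELINE = "What is your full name?"

def change_from_baseline(measure_by_question, baseline_key=BASELINE):
    """Percent change of a vocal measure relative to the baseline question."""
    base = measure_by_question[baseline_key]
    return {q: 100.0 * (v - base) / base
            for q, v in measure_by_question.items() if q != baseline_key}

# Hypothetical voice-quality values for one traveller.
speaker = {BASELINE: 0.010, "Q2": 0.012, "Q3": 0.009}
print({q: round(c, 1) for q, c in change_from_baseline(speaker).items()})
```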
Vocalic Classification Model
Vocalic Resulting Classification
• 7 innocents falsely classified as terrorists
• 27 correctly classified as innocent
• All "guilty" referred to secondary
• Overall accuracy = 81%
• TPR = 100%, TNR = 79%, FPR = 20%, FNR = 0%
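These rates follow from the confusion matrix. Taking the counts on this slide (7 false positives, 27 true negatives) together with the four imposters of Study 5, all of whom were referred (4 true positives, 0 false negatives), gives 38 travellers in total. The function name is hypothetical.

```python
def rates(tp, fn, tn, fp):
    """Standard binary-classification rates from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "TPR": tp / (tp + fn),  # sensitivity: imposters referred
        "TNR": tn / (tn + fp),  # specificity: innocents cleared
        "FPR": fp / (fp + tn),  # innocents wrongly referred
        "FNR": fn / (fn + tp),  # imposters missed
    }

r = rates(tp=4, fn=0, tn=27, fp=7)
for name, value in r.items():
    print(f"{name}: {100 * value:.1f}%")
```

This prints 81.6%, 100.0%, 79.4%, 20.6%, and 0.0%, which match the slide's 81%, 79%, and 20% once the decimals are truncated.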
Eye Fixations on Visa
Date of Birth Results – Correct?
Final Decision Model
Vocalic Resulting Classification
• 3 innocents falsely classified as terrorists
  • One of these three was actually lying, so it was actually a true positive
• 31 correctly classified as innocent
• All "guilty" referred to secondary
• Overall accuracy = 94.47%
• TPR = 100%, TNR = 88.24%, FPR = 5.8% (reduced by 3/4), FNR = 0%
Questions?

Isn’t the voice amazing?