Evasion and Obfuscation in Speaker Recognition Surveillance and Forensics

Federico Alegre, Giovanni Soldi, Nicholas Evans,
Benoit Fauve and Jasmin Liu
IWBF 2014 - March 27, 2014

Biometrics & Subversion
- Biometric systems: two types of errors
  - False Acceptance (FA)
  - False Rejection (FR)
- Subversion: provoke deliberate error
  - Authentication & Security → Spoofing (increased FA)
  - Surveillance → Evasion & Obfuscation (increased FR)
- Forensics
  - Spoofing: generation of falsified evidence
  - Evasion & Obfuscation: impede detection
07/05/2014 -
EURECOM RESEARCH
-p2

Evasion & Obfuscation (E&O)
- Evasion and obfuscation: provoke deliberate missed detection
  - Evasion: provoke missed detection at the biometric detection stage
  - Obfuscation: provoke missed detection at the recognition stage
[Figure: BIOMETRIC SYSTEM with DETECTION, RECOGNITION and DECISION stages; EVASION acts on detection, OBFUSCATION acts on recognition]
- Need for evasion and obfuscation detection

Evasion
- Speech Activity Detection (SAD): speech/non-speech detector
  - Energy-based
  - Model-based
  - Phoneme-based
- Examples of evasion
  - Fill audio silences with high-energy noise
    - overcomes surveillance systems with energy-based SAD
  - …
- No previous work in evasion
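
The evasion example above can be illustrated with a minimal sketch. It assumes a toy energy-based SAD with a fixed threshold and non-overlapping frames; the function names, the -30 dB threshold, the 8 kHz toy signal and the uniform noise are all assumptions made for illustration, since the slides do not say how the silences were filled or how the detector was configured.

```python
import math
import random

def frame_energies_db(signal, frame_len):
    """Log-energy (dB) of consecutive non-overlapping frames."""
    energies = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        power = sum(x * x for x in frame) / frame_len
        energies.append(10.0 * math.log10(power + 1e-12))
    return energies

def energy_sad(signal, frame_len, threshold_db):
    """Toy SAD: a frame counts as speech when its energy exceeds the threshold."""
    return [e > threshold_db for e in frame_energies_db(signal, frame_len)]

def fill_silences(signal, frame_len, threshold_db, noise_amp, seed=0):
    """Evasion sketch: overwrite low-energy frames with high-energy noise."""
    rng = random.Random(seed)
    out = list(signal)
    for i, is_speech in enumerate(energy_sad(signal, frame_len, threshold_db)):
        if not is_speech:
            for n in range(i * frame_len, (i + 1) * frame_len):
                out[n] = rng.uniform(-noise_amp, noise_amp)
    return out

# Toy signal (assumed 8 kHz): three loud frames followed by three near-silent ones.
FRAME_LEN = 160
loud = [0.5 * math.sin(2 * math.pi * 200 * n / 8000) for n in range(3 * FRAME_LEN)]
quiet = [0.001 * math.sin(2 * math.pi * 200 * n / 8000) for n in range(3 * FRAME_LEN)]
signal = loud + quiet

labels_before = energy_sad(signal, FRAME_LEN, threshold_db=-30.0)
attacked = fill_silences(signal, FRAME_LEN, threshold_db=-30.0, noise_amp=0.5)
labels_after = energy_sad(attacked, FRAME_LEN, threshold_db=-30.0)
```

After the attack every frame exceeds the energy threshold, so the noise-filled frames are passed on to the recogniser as if they were speech.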

Obfuscation (1/2)
- Obfuscation (intentional disguise):
  - Non-electronic: whispering, glottal fry, pitch modification, hand over the mouth, etc.
  - Electronic: voice modification, voice conversion.
- Previous work in obfuscation
  - Non-electronic obfuscation: [Künsel’04] [Kajarekar’06] [Zhang’08] [Villalba’10]
  - Voice modification, voice conversion: [Perrot’09]
  - Electronic pitch modification (detection): [Wu’13]
  - Slide annotations: "TIMIT database, no vulnerability assessment"; "In-house collected databases, no detection"

Obfuscation (2/2)
- Voice conversion [Matrouf et al. ’05]
  - Speaker X = targeted speaker → Spoofing
  - Speaker X = non-targeted speaker → Obfuscation
[Figure: SPEAKER Y → I - FRAME BLOCKING → II - FRAME-TO-FRAME VOCAL TRACT REPLACEMENT FROM Y TO X → III - SYNTHESIS → CONVERTED VOICE]
- Why voice conversion?
  - No “footprints” of manipulation! (critical in forensics)

Experimental Setup
Targeted Speaker Verification (SV) system:
- Windowing
  - 20 ms, 10 ms overlap
- Parameters
  - 16 MFCC + 16 ∆MFCC + ∆Energy
  - Energy-based speech detector + feature normalization
- Modeling
  - GMM-UBM: standard with 1024 Gaussians
  - GSL: GMM supervector + SVM classifier
  - GSL-NAP: GSL with NAP compensation
  - GSL-FA: GSL system with FA supervectors
  - FA: GMM with factor analysis compensation
  - IV-PLDA: i-vectors + PLDA with length normalization
NIST databases:
- Development: NIST’05
- Evaluation: NIST’06
- Background: NIST’04 & NIST’08
Baseline dataset (BASE):
- Client models (298): from targeted system
- True trials (1344): 8conv4w-1conv4w
- Impostor trials (12648): 8conv4w-1conv4w
Evasion dataset (EVAS):
- True trials (1344): silences filled with high-energy noise
- Impostor trials (12648): same as Baseline dataset
Obfuscation dataset (OBF):
- True trials (1344): replaced with converted versions
- Impostor trials (12648): same as Baseline dataset
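
The windowing and parameter settings listed for the SV system can be made concrete with a small sketch. The 8 kHz sample rate is an assumption (the slides do not state it), `frame_signal` and `delta` are hypothetical helper names, and the central-difference delta is one common choice rather than the formula confirmed for this system.

```python
def frame_signal(signal, sample_rate, win_ms=20, overlap_ms=10):
    """Split a signal into 20 ms windows that overlap by 10 ms."""
    win = sample_rate * win_ms // 1000
    hop = win - sample_rate * overlap_ms // 1000
    return [signal[s:s + win] for s in range(0, len(signal) - win + 1, hop)]

def delta(seq):
    """First-order dynamic coefficients by central difference (edges replicated)."""
    padded = [seq[0]] + list(seq) + [seq[-1]]
    return [(padded[t + 2] - padded[t]) / 2.0 for t in range(len(seq))]

# One second of dummy samples at the assumed 8 kHz rate.
frames = frame_signal(list(range(8000)), sample_rate=8000)

# 16 MFCC + 16 delta-MFCC + delta-energy, as listed on the slide.
FEATURE_DIM = 16 + 16 + 1
```

Under these assumptions each window holds 160 samples and consecutive windows start 80 samples apart, so one second of audio yields 99 feature vectors of dimension 33.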

Results (1/4): E&O assessment
- Detector developed for spoofing attacks [Alegre’13]
  - 6 ASV systems
  - 3 datasets (BASE-EVAS-OBF)

ASV        BASE   EVAS   OBF
GMM-UBM     8.7   19.4   47.7
GSL         8.0   55.1   32.3
GSL-NAP     6.8   53.4   31.5
GSL-FA      6.4   54.7   29.1
FA          5.6   20.6   41.9
IV-PLDA     3.0   24.3   20.0
Table: Comparison (EER %) of six different ASV systems against three databases

[Figure: Evaluation for E&O for the IV-PLDA system]
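
The table above reports equal error rates (EER), the operating point at which the false acceptance rate equals the false rejection rate. A minimal sketch of how an EER can be estimated from trial scores follows; the function name and the toy scores are assumptions, and the brute-force threshold sweep is chosen for readability rather than for speed on thousands of trials.

```python
def equal_error_rate(true_scores, impostor_scores):
    """Estimate the EER by trying every observed score as a threshold.

    A trial is accepted when its score is >= the threshold. Returns the
    (EER estimate, threshold) pair where FAR and FRR are closest.
    """
    best = None
    for thr in sorted(set(true_scores) | set(impostor_scores)):
        frr = sum(s < thr for s in true_scores) / len(true_scores)
        far = sum(s >= thr for s in impostor_scores) / len(impostor_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2.0, thr)
    return best[1], best[2]

# Toy scores (not from the slides): licit true trials versus impostor trials.
impostor = [0.6, 0.3, 0.2, 0.1]
true_base = [0.9, 0.8, 0.7, 0.4]
eer_base, thr_base = equal_error_rate(true_base, impostor)

# The same true trials after a hypothetical disguise that lowers their scores.
true_disguised = [0.5, 0.35, 0.25, 0.15]
eer_disguised, _ = equal_error_rate(true_disguised, impostor)
```

Because only the true-trial scores move, the impostor distribution is unchanged and the EER grows through increased false rejections, which mirrors how the EVAS and OBF datasets differ from BASE only in their true trials.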

Results (2/4): E&O Detection
- Detector developed for spoofing attacks [Alegre’BTAS13]
  - Feature: Local Binary Patterns
  - Modeling: one-class classifier
[Figure: ASV and E&O detector; FEATURE EXTRACTION → CLASSIFIER → LICIT / E&O; FEATURE (LBP) SPACE showing LICIT DATA, the ONE-CLASS CLASSIFIER (SVM), EVASION and OBFUSCATION]

E&O detector   EER (%)
EVA            0
OBF            3.4
Table: Evaluation (EER %) of E&O detector against evasion and obfuscation

[Figure: Evaluation for E&O for the IV-PLDA system]
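
The idea of a one-class classifier, trained on licit data only, can be sketched with a much simpler stand-in than the SVM used on the slide. The centroid-plus-radius rule below is not the authors' detector; it is a swapped-in technique, with hypothetical function names and toy two-dimensional vectors, that only illustrates how a boundary can be learned without any evasion or obfuscation examples.

```python
import math

def fit_one_class(licit_vectors, quantile=0.95):
    """Learn a centroid and a radius that covers most of the licit training data."""
    dim = len(licit_vectors[0])
    centroid = [sum(v[d] for v in licit_vectors) / len(licit_vectors)
                for d in range(dim)]
    dists = sorted(math.dist(v, centroid) for v in licit_vectors)
    radius = dists[min(len(dists) - 1, int(quantile * len(dists)))]
    return centroid, radius

def is_licit(vector, model):
    """A sample is licit when it falls inside the learned radius."""
    centroid, radius = model
    return math.dist(vector, centroid) <= radius

# Toy licit feature vectors (assumed, 2-D for readability).
model = fit_one_class([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
inside = is_licit([0.5, 0.6], model)
outside = is_licit([5.0, 5.0], model)
```

Anything far from the licit region is rejected regardless of how it was produced, which is why a one-class model can in principle generalise to attacks it has never seen.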

Results (3/4): Integration - Evasion
- Two systems in cascade (evaluated against evasion)
  - Speaker Verification (1)
  - E/O detector (2)

System     BASE / (1)   EVAS / (1)   EVAS / (1) + (2)
GMM-UBM       8.7          19.4            0
GSL           8.0          55.1            0
GSL-NAP       6.8          53.4            0
GSL-FA        6.4          54.7            0
FA            5.6          20.6            0
IV-PLDA       3.0          24.3            0
Table: Comparison (EER %) of six different ASV systems against three databases

[Figure: Evaluation for E&O for the IV-PLDA system]

Results (4/4): Integration - Obfuscation
- Two systems in cascade (evaluated against obfuscation)
  - Speaker Verification (1)
  - E/O detector (2)

System     BASE / (1)   OBF / (1)   OBF / (1) + (2)
GMM-UBM       8.7         47.7           4.1
GSL           8.0         32.3           3.5
GSL-NAP       6.8         31.5           3.4
GSL-FA        6.4         29.1           3.1
FA            5.6         41.9           3.9
IV-PLDA       3.0         20.0           3.1
Table: Comparison (EER %) of six different ASV systems against three databases

[Figure: Evaluation for E&O for the IV-PLDA system]
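
The cascade of speaker verification (1) and E/O detector (2) evaluated on the two integration slides can be sketched as a simple decision rule. The slides do not specify how the two decisions are combined, so everything here is an assumption: the function name, the score conventions (higher means more licit, or more target-like), the thresholds, and the choice to flag a trial instead of silently rejecting it.

```python
def cascade_decision(asv_score, detector_score, asv_threshold, detector_threshold):
    """Combine an E/O detector with a speaker verification score.

    Assumed conventions: a low detector score means the sample looks
    manipulated; a high ASV score means the sample matches the target.
    """
    if detector_score < detector_threshold:
        return "flagged"      # suspected evasion or obfuscation
    if asv_score >= asv_threshold:
        return "accept"       # licit sample matching the target speaker
    return "reject"           # licit sample from another speaker

# Toy trials (asv_score, detector_score); the values are illustrative only.
trials = [(2.0, 1.0), (-2.0, 1.0), (-2.0, -1.0), (2.0, -1.0)]
decisions = [cascade_decision(a, d, 0.0, 0.0) for a, d in trials]
```

Under this rule a disguised sample from the target speaker is no longer a silent missed detection: it is routed to the "flagged" outcome even when its ASV score falls below the threshold.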

Conclusions & Future Work
- Surveillance and forensic ASV systems can be subverted through
  - Evasion
  - Obfuscation
- All systems are vulnerable
  - EERs increase from 3%-9% to
    - 24%-55% for evasion
    - 20%-48% for obfuscation
- Detection decreases EERs to 3%-4% for obfuscation and 0% for evasion
- The i-vector approach is the least sensitive
- The proof of concept shows the need for further work
End
THANK YOU FOR YOUR ATTENTION

Backup 1: E/O detector
- Generalized approach
  - Feature that properly represents real speech
  - One-class, generalized CM (applicable to all biometrics)
[Figure: FEATURE EXTRACTION → CLASSIFIER (labelled "TWO" and "ONE-CLASS") → SPOOF / NON SPOOF]

Backup 2: LBP Feature
- Treat speech as an image and extract higher-level features:
  - Computer vision successfully applied in speech field [Roy et al. ’12]
  - Here: image formed by concatenation of feature vectors
- Local Binary Patterns [Ojala et al. ’06]
[Figure: LINEAR FREQUENCY CEPSTRAL COEFFICIENTS (LFCC), ∆LFCC, ∆∆LFCC → LBP OPERATOR → TEXTROGRAM → ANTISPOOFING FEATURE → ONE-CLASS CLASSIFIER → SPOOF / NON SPOOF]
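
The LBP operator named above can be sketched on a tiny matrix standing in for the image of concatenated feature vectors. This is the basic 8-neighbour, 3x3 form of Local Binary Patterns; the function names, the bit ordering, and the pooling of codes into a single normalised histogram are assumptions for illustration, since the slides do not state how the codes are turned into the final feature.

```python
def lbp_image(matrix):
    """Basic 8-neighbour LBP code for every interior cell of a 2-D list."""
    # Neighbour offsets, clockwise from top-left; neighbour k sets bit k.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = []
    for r in range(1, len(matrix) - 1):
        row_codes = []
        for c in range(1, len(matrix[0]) - 1):
            centre = matrix[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(offsets):
                if matrix[r + dr][c + dc] >= centre:
                    code |= 1 << bit
            row_codes.append(code)
        codes.append(row_codes)
    return codes

def lbp_histogram(codes, n_bins=256):
    """Pool LBP codes into one normalised histogram (a fixed-length vector)."""
    flat = [code for row in codes for code in row]
    hist = [0.0] * n_bins
    for code in flat:
        hist[code] += 1.0 / len(flat)
    return hist

# Toy 3x3 "image": rows play the role of coefficients, columns of frames.
toy = [[5, 4, 3],
       [6, 4, 2],
       [7, 8, 1]]
codes = lbp_image(toy)
hist = lbp_histogram(codes)
```

For the toy matrix the neighbours at top-left, top, bottom, bottom-left and left are at least as large as the centre, so bits 0, 1, 5, 6 and 7 are set and the single code is 227.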

Backup 3: Results with SPOOF & Tnorm

System     BASE   EVA    OBF    SPOOF
GMM-UBM     8.7   19.4   47.7   32.6
GSL         8.0   55.1   32.3   37.2
GSL-NAP     6.8   53.4   31.5   32.1
GSL-FA      6.4   54.7   29.1   24.4
FA          5.6   20.6   41.9   30.3
IV-PLDA     3.0   24.3   20.0   20.2
Table: Comparison (EER %) of six different ASV systems against four databases

System     BASE   EVA    OBF    SPOOF
GMM-UBM     8.6   52.6   32.1
GSL         8.1   53.2   29.9
GSL-NAP     6.3   50.3   27.6
GSL-FA      5.7   49.8   32.4
FA          5.6   49.1   29.2
IV-PLDA     3.0   24.3   20.0
Table: Comparison (EER %) of six different ASV systems against four databases

Evasion
- Speech Activity Detection (SAD): speech/non-speech detector
  - Energy-based
  - Model-based
  - Phoneme-based
- Examples of evasion
  - Fill audio silences with high-energy noise
    - overcomes surveillance systems with energy-based SAD
  - Vocoder: carrier (noise, musical instrument) + modulator (message)
    - overcomes model-based and phoneme-based SADs
  - …
- No previous work in evasion