Evasion and Obfuscation in Speaker Recognition Surveillance and Forensics
Federico Alegre, Giovanni Soldi, Nicholas Evans, Benoit Fauve and Jasmin Liu
IWBF 2014 - March 27, 2014

Biometrics & Subversion
- Biometric systems: two types of errors
  - False Acceptance (FA)
  - False Rejection (FR)
- Subversion: provoke deliberate errors
  - Authentication & Security → Spoofing (increased FA)
  - Surveillance → Evasion & Obfuscation (increased FR)
  - Forensics
    - Spoofing: generation of falsified evidence
    - Evasion & Obfuscation: impede detection
07/05/2014 - EURECOM RESEARCH -p2

Evasion & Obfuscation (E&O)
- Evasion and obfuscation: provoke a deliberate missed detection
  - Evasion: provoke a missed detection at the biometric detection stage
  - Obfuscation: provoke a missed detection at the recognition stage
[Figure: biometric system pipeline DETECTION → RECOGNITION → DECISION; evasion targets the detection stage, obfuscation the recognition stage]
- Need for evasion and obfuscation detection

Evasion
- Speech Activity Detection (SAD): speech/non-speech detector
  - Energy-based
  - Model-based
  - Phoneme-based
- Examples of evasion
  - Fill audio silences with high-energy noise, to overcome a surveillance system with an energy-based SAD
  - …
- No previous work on evasion

Obfuscation (1/2)
- Obfuscation (intentional disguise):
  - Non-electronic: whispering, glottal fry, pitch modification, hand over the mouth, etc.
  - Electronic: voice modification, voice conversion
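To illustrate why filling silences with high-energy noise defeats an energy-based SAD (Evasion slide above), here is a minimal NumPy sketch under stated assumptions. The detector `energy_sad` is hypothetical: it labels a 20 ms frame (10 ms shift) as speech when its log energy lies within an assumed 30 dB margin of the loudest frame. The "speech" is only a synthetic 300 Hz tone followed by silence. This is not the detector used in the paper.

```python
import numpy as np

def frame_log_energy(signal, fs, win_ms=20, hop_ms=10):
    """Log energy (dB) of overlapping frames: 20 ms windows with a 10 ms shift."""
    win = int(fs * win_ms / 1000)
    hop = int(fs * hop_ms / 1000)
    n_frames = 1 + (len(signal) - win) // hop
    frames = np.stack([signal[i * hop:i * hop + win] for i in range(n_frames)])
    return 10 * np.log10(np.mean(frames ** 2, axis=1) + 1e-12)

def energy_sad(signal, fs, margin_db=30.0):
    """Hypothetical energy-based SAD: a frame is 'speech' if its log energy
    exceeds (maximum frame log energy - margin_db)."""
    e = frame_log_energy(signal, fs)
    return e > e.max() - margin_db

fs = 8000
t = np.arange(fs) / fs                     # 1 s of audio
rng = np.random.default_rng(0)

# Stand-in for speech: a 300 Hz tone in the first half, silence in the second.
clean = np.where(t < 0.5, 0.5 * np.sin(2 * np.pi * 300 * t), 0.0)

# Evasion: fill the silent half with high-energy white noise.
evasive = np.where(t < 0.5, clean, 0.5 * rng.standard_normal(len(t)))

vad_clean = energy_sad(clean, fs)
vad_evasive = energy_sad(evasive, fs)
print(f"speech frames: clean {vad_clean.sum()}/{len(vad_clean)}, "
      f"evasive {vad_evasive.sum()}/{len(vad_evasive)}")
# → speech frames: clean 50/99, evasive 99/99
```

Once the silences are filled, every frame clears the relative-energy threshold, so an energy-based SAD of this kind can no longer separate speech from non-speech regions.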
- Previous work on obfuscation
  - Non-electronic obfuscation [Künzel'04] [Kajarekar'06] [Zhang'08]: in-house collected databases, no detection
  - Voice modification, voice conversion [Villalba'10] [Perrot'09]
  - Electronic pitch modification (detection) [Wu'13]: TIMIT database, no vulnerability assessment

Obfuscation (2/2)
- Voice conversion [Matrouf et al.'05]
  - Speaker X = targeted speaker → Spoofing
  - Speaker X = non-targeted speaker → Obfuscation
[Figure: voice conversion of speaker Y: I - frame blocking → II - frame-to-frame vocal tract replacement from Y to X → III - synthesis → converted voice]
- Why voice conversion?
  - No "footprints" of manipulation! (critical in forensics)

Experimental Setup
- Targeted Speaker Verification (SV) system:
  - Windowing: 20 ms frames, 10 ms overlap
  - Parameters: 16 MFCC + 16 ΔMFCC + ΔEnergy
  - Energy-based speech detector + feature normalization
  - Modeling:
    - GMM-UBM: standard, with 1024 Gaussians
    - GSL: GMM supervector + SVM classifier
    - GSL-NAP: GSL with NAP compensation
    - GSL-FA: GSL system with FA supervectors
    - FA: GMM with factor analysis compensation
    - IV-PLDA: i-vectors + PLDA with length normalization
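The front-end steps listed above (dynamic coefficients appended to static ones, speech-frame selection, feature normalization) can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation, and it rests on several assumptions: MFCCs and log energy are taken as already computed (random stand-ins here), deltas use the common ±2-frame regression formula, the speech mask stands in for the energy-based detector, and "feature normalization" is assumed to be per-utterance mean and variance normalization. The helper names are hypothetical.

```python
import numpy as np

def deltas(feats, N=2):
    """Regression deltas over +/-N frames; edge frames are repeated as padding."""
    T = len(feats)
    padded = np.pad(feats, ((N, N), (0, 0)), mode="edge")
    denom = 2 * sum(n * n for n in range(1, N + 1))
    return sum(n * (padded[N + n:N + n + T] - padded[N - n:N - n + T])
               for n in range(1, N + 1)) / denom

def cmvn(feats):
    """Per-utterance mean and variance normalization (assumed normalization)."""
    return (feats - feats.mean(axis=0)) / (feats.std(axis=0) + 1e-8)

def assemble_features(mfcc, log_energy, speech_mask):
    """16 MFCC + 16 delta-MFCC + delta-energy = 33 dims, speech frames only."""
    d_mfcc = deltas(mfcc)
    d_energy = deltas(log_energy[:, None])
    feats = np.hstack([mfcc, d_mfcc, d_energy])
    return cmvn(feats[speech_mask])

rng = np.random.default_rng(0)
T = 300                                    # roughly 3 s at a 10 ms frame shift
mfcc = rng.standard_normal((T, 16))        # stand-in for real MFCCs
log_energy = rng.standard_normal(T)        # stand-in for frame log energy
speech_mask = log_energy > np.percentile(log_energy, 30)  # stand-in SAD
X = assemble_features(mfcc, log_energy, speech_mask)
print(X.shape)
```

Only the static energy is dropped while its delta is kept, matching the "16 MFCC + 16 ΔMFCC + ΔEnergy" parameterization; normalization is applied after frame selection so that non-speech frames do not bias the statistics.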
- NIST databases:
  - Development: NIST'05
  - Evaluation: NIST'06
  - Background: NIST'04 & NIST'08
- Baseline dataset (BASE):
  - Client models (298): from the targeted system
  - True trials (1344): 8conv4w-1conv4w
  - Impostor trials (12648): 8conv4w-1conv4w
- Evasion dataset (EVAS):
  - True trials (1344): silences filled with high-energy noise
  - Impostor trials (12648): same as the baseline dataset
- Obfuscation dataset (OBF):
  - True trials (1344): replaced with converted versions
  - Impostor trials (12648): same as the baseline dataset

Results (1/4): E&O assessment
- 6 ASV systems
- 3 datasets (BASE, EVAS, OBF)

System      BASE    EVAS    OBF
GMM-UBM      8.7    19.4    47.7
GSL          8.0    55.1    32.3
GSL-NAP      6.8    53.4    31.5
GSL-FA       6.4    54.7    29.1
FA           5.6    20.6    41.9
IV-PLDA      3.0    24.3    20.0
COMPARISON (EER %) OF SIX DIFFERENT ASV SYSTEMS AGAINST THREE DATABASES
[Figure: EVALUATION FOR E&O FOR IV-PLDA SYSTEM]

Results (2/4): E&O detection
- Detector developed for spoofing attacks [Alegre'BTAS13]
  - Feature: Local Binary Patterns (LBP)
  - Modeling: one-class classifier
[Figure: ASV system followed by the E&O detector: feature extraction (LBP) → one-class classifier (SVM) → licit / E&O; feature space showing licit data, evasion and obfuscation]

                EVA    OBF
E&O detector    0      3.4
EVALUATION (EER %) OF E&O DETECTOR AGAINST EVASION AND OBFUSCATION
[Figure: EVALUATION FOR E&O FOR IV-PLDA SYSTEM]

Results (3/4): Integration - Evasion
- Two systems in cascade (evaluated against evasion): Speaker Verification (1) → E/O detector (2)

System      BASE/(1)   EVAS/(1)   EVAS/(1)+(2)
GMM-UBM        8.7       19.4          0
GSL            8.0       55.1          0
GSL-NAP        6.8       53.4          0
GSL-FA         6.4       54.7          0
FA             5.6       20.6          0
IV-PLDA        3.0       24.3          0
COMPARISON (EER %) OF SIX DIFFERENT ASV SYSTEMS AGAINST THREE DATABASES
[Figure: EVALUATION FOR E&O FOR IV-PLDA SYSTEM]

Results (4/4): Integration - Obfuscation
- Two systems in cascade (evaluated against obfuscation): Speaker Verification (1) → E/O detector (2)

System      BASE/(1)   OBF/(1)   OBF/(1)+(2)
GMM-UBM        8.7       47.7        4.1
GSL            8.0       32.3        3.5
GSL-NAP        6.8       31.5        3.4
GSL-FA         6.4       29.1        3.1
FA             5.6       41.9        3.9
IV-PLDA        3.0       20.0        3.1
COMPARISON (EER %) OF SIX DIFFERENT ASV SYSTEMS AGAINST THREE DATABASES
[Figure: EVALUATION FOR E&O FOR IV-PLDA SYSTEM]

Conclusions & Future Work
- Surveillance and forensic ASV systems can be subverted through:
  - Evasion
  - Obfuscation
- All systems are vulnerable
  - EERs increase from 3%-9% to:
    - 19%-55% for evasion
    - 20%-48% for obfuscation
- Detection decreases EERs to 3%-4% for obfuscation and 0% for evasion
- The i-vector approach is the least sensitive
- This proof of concept shows the need for further work

End
THANK YOU FOR YOUR ATTENTION

Backup 1: E/O detector
- Generalized approach
  - Feature that properly represents real speech
  - One-class, generalized countermeasure (CM), applicable to all biometrics
[Figure: feature extraction → one-class classifier → spoof / non-spoof]

Backup 2: LBP feature
- Treat speech as an image and extract higher-level features:
  - Computer vision has been successfully applied in the speech field [Roy et al.'12]
  - Here: image formed by concatenating feature vectors: linear frequency cepstral coefficients (LFCC), ΔLFCC, ΔΔLFCC
- Local Binary Patterns [Ojala et al.'06]
[Figure: LFCC / ΔLFCC / ΔΔLFCC image → LBP operator → textrogram → antispoofing feature → one-class classifier → spoof / non-spoof]

Backup 3: Results with SPOOF & T-norm

System      BASE    EVA     OBF     SPOOF
GMM-UBM      8.7    19.4    47.7    32.6
GSL          8.0    55.1    32.3    37.2
GSL-NAP      6.8    53.4    31.5    32.1
GSL-FA       6.4    54.7    29.1    24.4
FA           5.6    20.6    41.9    30.3
IV-PLDA      3.0    24.3    20.0    20.2
COMPARISON (EER %) OF SIX DIFFERENT ASV SYSTEMS AGAINST FOUR DATABASES

System      BASE    EVA     OBF
GMM-UBM      8.6    52.6    32.1
GSL          8.1    53.2    29.9
GSL-NAP      6.3    50.3    27.6
GSL-FA       5.7    49.8    32.4
FA           5.6    49.1    29.2
IV-PLDA      3.0    24.3    20.0
COMPARISON (EER %) OF SIX DIFFERENT ASV SYSTEMS AGAINST FOUR DATABASES

Evasion
- Speech Activity Detection (SAD): speech/non-speech detector
  - Energy-based
  - Model-based
  - Phoneme-based
- Examples of evasion
  - Fill audio silences with high-energy noise, to overcome a surveillance system with an energy-based SAD
  - Vocoder: carrier (noise, musical instrument) + modulator (message), to overcome model-based and phoneme-based SADs
  - …
- No previous work on evasion
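The vocoder example above imposes the time-varying spectral envelope of a message (the modulator) onto a carrier such as noise or a musical instrument. A minimal channel-vocoder sketch in NumPy follows. The band count, the log-spaced 100-3800 Hz band edges, the brick-wall FFT filters and the 10 ms moving-average envelope follower are all illustrative assumptions, and the helper names are hypothetical. It shows only how such a signal can be built; it does not reproduce the paper's experiments or demonstrate the effect on any particular SAD.

```python
import numpy as np

def bandpass(x, fs, lo, hi):
    """Brick-wall band-pass filter applied in the frequency domain."""
    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1 / fs)
    spec[(freqs < lo) | (freqs >= hi)] = 0
    return np.fft.irfft(spec, n=len(x))

def envelope(x, fs, smooth_ms=10):
    """Envelope follower: full-wave rectification + moving average."""
    k = max(1, int(fs * smooth_ms / 1000))
    return np.convolve(np.abs(x), np.ones(k) / k, mode="same")

def channel_vocoder(modulator, carrier, fs, n_bands=16, fmin=100.0, fmax=3800.0):
    """Impose the per-band envelope of `modulator` onto the same band of `carrier`."""
    edges = np.geomspace(fmin, fmax, n_bands + 1)
    out = np.zeros(len(modulator))
    for lo, hi in zip(edges[:-1], edges[1:]):
        env = envelope(bandpass(modulator, fs, lo, hi), fs)
        out += env * bandpass(carrier, fs, lo, hi)
    return out / (np.max(np.abs(out)) + 1e-12)   # peak-normalize

fs = 8000
t = np.arange(fs) / fs
rng = np.random.default_rng(1)
# Stand-in "message": an amplitude-modulated two-tone signal.
message = ((np.sin(2 * np.pi * 220 * t) + 0.5 * np.sin(2 * np.pi * 1200 * t))
           * (0.5 + 0.5 * np.sin(2 * np.pi * 3 * t)))
carrier = rng.standard_normal(fs)          # noise carrier
vocoded = channel_vocoder(message, carrier, fs)
```

The output keeps the carrier's fine structure (here noise) while following the message's coarse spectral and temporal envelope; swapping the noise for a recorded instrument gives the musical-carrier variant mentioned on the slide.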