Evasion and Obfuscation in Speaker Recognition Surveillance and Forensics

Federico Alegre, Giovanni Soldi, Nicholas Evans, Benoit Fauve and Jasmin Liu
IWBF 2014 - March 27, 2014

Biometrics & Subversion
- Biometric systems: two types of errors
  - False Acceptance (FA)
  - False Rejection (FR)
- Subversion: provoke deliberate error
  - Authentication & Security → Spoofing (increased FA)
  - Surveillance → Evasion & Obfuscation (increased FR)
  - Forensics
    - Spoofing: generation of falsified evidence
    - Evasion & Obfuscation: impede detection

07/05/2014 - EURECOM RESEARCH - p2

Evasion & Obfuscation (E&O)
- Evasion and obfuscation: provoke deliberate missed detection
  - Evasion: provoke missed detection at the biometric detection stage
  - Obfuscation: provoke missed detection at the recognition stage
- [Diagram: BIOMETRIC SYSTEM with DETECTION, RECOGNITION and DECISION stages; EVASION targets detection, OBFUSCATION targets recognition]
- Need for evasion and obfuscation detection

Evasion
- Speech Activity Detection (SAD): speech/non-speech detector
  - Energy-based
  - Model-based
  - Phoneme-based
- Examples of evasion
  - Fill audio silences with high-energy noise
    - overcome surveillance system with energy-based SAD
  - …
- No previous work in evasion

Obfuscation (1/2)
- Obfuscation (intentional disguise):
  - Non-electronic: whispering, glottal fry, pitch modification, hand over the mouth, etc.
  - Electronic: voice modification, voice conversion
- Previous work in obfuscation
  - Non-electronic obfuscation [Künzel'04] [Kajarekar'06] [Zhang'08] [Villalba'10]
  - Voice modification, voice conversion [Perrot'09]
  - Electronic pitch modification (detection) [Wu'13]
  - Limitations of previous work:
    - TIMIT database, no vulnerability assessment
    - In-house collected databases, no detection

Obfuscation (2/2)
- Voice conversion [Matrouf et al. '05]
  - Speaker X = targeted speaker → Spoofing
  - Speaker X = non-targeted speaker → Obfuscation
- [Diagram: SPEAKER Y → I. FRAME BLOCKING → II. FRAME-TO-FRAME VOCAL TRACT REPLACEMENT FROM Y TO X → III. SYNTHESIS → CONVERTED VOICE]
- Why voice conversion?
  - No "footprints" of manipulation! (critical in forensics)

Experimental Setup
- Targeted Speaker Verification (SV) system:
  - Windowing
    - 20 ms, 10 ms overlap
  - Parameters
    - 16 MFCC + 16 ∆MFCC + ∆Energy
    - Energy-based speech detector + feature normalization
  - Modeling
    - GMM-UBM: standard with 1024 Gaussians
    - GSL: GMM supervector + SVM classifier
    - GSL-NAP: GSL with NAP compensation
    - GSL-FA: GSL system with FA supervectors
    - FA: GMM with factor analysis compensation
    - IV-PLDA: i-vectors + PLDA with length normalization
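The windowing and energy-based speech detector listed above can be sketched as follows. This is a minimal illustration, not the authors' front end: the 8 kHz sampling rate, the function names, and the rule "keep frames within 30 dB of the loudest frame" are all assumptions made for the example.

```python
import math


def frame_signal(samples, sample_rate=8000, frame_ms=20, overlap_ms=10):
    """Split samples into 20 ms frames with 10 ms overlap (10 ms hop)."""
    frame_len = sample_rate * frame_ms // 1000
    hop = frame_len - sample_rate * overlap_ms // 1000
    last_start = len(samples) - frame_len
    return [samples[i:i + frame_len] for i in range(0, last_start + 1, hop)]


def log_energy_db(frame):
    """Mean-square energy of one frame in dB, floored to avoid log(0)."""
    mean_square = sum(x * x for x in frame) / len(frame)
    return 10.0 * math.log10(mean_square + 1e-12)


def energy_sad(frames, margin_db=30.0):
    """Assumed rule: a frame is speech if its energy is within
    margin_db of the loudest frame in the recording."""
    energies = [log_energy_db(f) for f in frames]
    threshold = max(energies) - margin_db
    return [e > threshold for e in energies]


# Toy input: 0.5 s of digital silence followed by 0.5 s of a 200 Hz tone.
silence = [0.0] * 4000
tone = [0.5 * math.sin(2 * math.pi * 200 * n / 8000) for n in range(4000)]
flags = energy_sad(frame_signal(silence + tone))
print(sum(flags), "of", len(flags), "frames kept as speech")
```

Only the frames flagged as speech would be passed on to feature extraction and modeling, which is why the detector is a natural target for evasion.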
- NIST Databases:
  - Development: NIST'05
  - Evaluation: NIST'06
  - Background: NIST'04 & NIST'08
- Baseline dataset (BASE):
  - Client models (298): from targeted system
  - True trials (1344): 8conv4w-1conv4w
  - Impostor trials (12648): 8conv4w-1conv4w
- Evasion dataset (EVAS):
  - True trials (1344): silences filled with high-energy noise
  - Impostor trials (12648): same as Baseline dataset
- Obfuscation dataset (OBF):
  - True trials (1344): replaced with converted versions
  - Impostor trials (12648): same as Baseline dataset

Results (1/4): E&O assessment
- Detector developed for spoofing attacks [Alegre'13]
- 6 ASV systems
- 3 datasets (BASE-EVAS-OBF)

  ASV        BASE   EVAS   OBF
  GMM-UBM    8.7    19.4   47.7
  GSL        8.0    55.1   32.3
  GSL-NAP    6.8    53.4   31.5
  GSL-FA     6.4    54.7   29.1
  FA         5.6    20.6   41.9
  IV-PLDA    3.0    24.3   20.0

  Comparison (EER %) of six different ASV systems against three databases

[Figure: evaluation for E&O for IV-PLDA system]

Results (2/4): E&O Detection
- Detector developed for spoofing attacks [Alegre'BTAS13]
  - Feature: Local Binary Patterns
  - Modeling: one-class classifier

  ASV            EVA    OBF
  E&O detector   0      3.4

  Evaluation (EER %) of E&O detector against evasion and obfuscation

[Diagram: FEATURE EXTRACTION → FEATURE (LBP) SPACE, with licit data, evasion and obfuscation regions → ONE-CLASS CLASSIFIER (SVM) → LICIT / E&O]
[Figure: evaluation for E&O for IV-PLDA system]

Results (3/4): Integration - Evasion
- Two systems in cascade (evaluated against evasion)
  - Speaker Verification (1)
  - E/O detector (2)

  Dataset/System   BASE/(1)   EVAS/(1)   EVAS/(1)+(2)
  GMM-UBM          8.7        19.4       0
  GSL              8.0        55.1       0
  GSL-NAP          6.8        53.4       0
  GSL-FA           6.4        54.7       0
  FA               5.6        20.6       0
  IV-PLDA          3.0        24.3       0

  Comparison (EER %) of six different ASV systems against three databases

[Figure: evaluation for E&O for IV-PLDA system]

Results (4/4): Integration - Obfuscation
- Two systems in cascade (evaluated against obfuscation)
  - Speaker Verification (1)
  - E/O detector (2)

  Dataset/System   BASE/(1)   OBF/(1)   OBF/(1)+(2)
  GMM-UBM          8.7        47.7      4.1
  GSL              8.0        32.3      3.5
  GSL-NAP          6.8        31.5      3.4
  GSL-FA           6.4        29.1      3.1
  FA               5.6        41.9      3.9
  IV-PLDA          3.0        20.0      3.1

  Comparison (EER %) of six different ASV systems against three databases

[Figure: evaluation for E&O for IV-PLDA system]

Conclusions & Future Work
- Surveillance and forensic ASV systems can be subverted
  - Evasion
  - Obfuscation
- All systems vulnerable
  - EERs increase from 3%-9% to
    - 24%-55% for evasion
    - 20%-48% for obfuscation
- Detection reduces EERs to 3%-4% for obfuscation and 0% for evasion
- i-vector approach least sensitive
- Proof of concept shows need for further work

End
Thank you for your attention

Backup 1: E/O detector
- Generalized approach
  - Feature that properly represents real speech
  - One-class, generalized CM (applicable to all biometrics)
- [Diagram: FEATURE EXTRACTION → ONE-CLASS CLASSIFIER → SPOOF / NON SPOOF]

Backup 2: LBP Feature
- Treat speech as an image and extract higher-level features:
  - Computer vision successfully applied in speech field [Roy et al. '12]
  - Here: image formed by concatenation of feature vectors
- Local Binary Patterns [Ojala et al. '06]
- [Diagram: LINEAR FREQUENCY CEPSTRAL COEFFICIENTS (LFCC), ∆LFCC, ∆∆LFCC → LBP OPERATOR → TEXTROGRAM → ANTISPOOFING FEATURE → ONE-CLASS CLASSIFIER → SPOOF / NON SPOOF]

Backup 3: Results with SPOOF & Tnorm

  Dataset/System   BASE   EVA    OBF    SPOOF
  GMM-UBM          8.7    19.4   47.7   32.6
  GSL              8.0    55.1   32.3   37.2
  GSL-NAP          6.8    53.4   31.5   32.1
  GSL-FA           6.4    54.7   29.1   24.4
  FA               5.6    20.6   41.9   30.3
  IV-PLDA          3.0    24.3   20.0   20.2

  Comparison (EER %) of six different ASV systems against four databases

  Dataset/System   BASE   EVA    OBF
  GMM-UBM          8.6    52.6   32.1
  GSL              8.1    53.2   29.9
  GSL-NAP          6.3    50.3   27.6
  GSL-FA           5.7    49.8   32.4
  FA               5.6    49.1   29.2
  IV-PLDA          3.0    24.3   20.0

  Comparison (EER %) of six different ASV systems against four databases
  [SPOOF column of the second table not recoverable]

Backup: Evasion
- Speech Activity Detection (SAD): speech/non-speech detector
  - Energy-based
  - Model-based
  - Phoneme-based
- Examples of evasion
  - Fill audio silences with high-energy noise
    - overcome surveillance system with energy-based SAD
  - Vocoder: carrier (noise, musical instrument) + modulator (message)
    - overcome model-based and phoneme-based SADs
  - …
- No previous work in evasion
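The first evasion example, filling silences with high-energy noise to defeat an energy-based SAD, can be illustrated with a toy sketch. Everything here is an assumption made for the example rather than the setup used in the experiments: the 8 kHz sampling rate, the uniform noise, the -50 dB silence threshold, the 30 dB detector margin, and the helper names.

```python
import math
import random

FRAME = 160  # 20 ms at an assumed 8 kHz sampling rate
HOP = 80     # 10 ms hop, i.e. 10 ms overlap between frames


def energy_db(block):
    """Mean-square energy in dB, floored to avoid log(0)."""
    mean_square = sum(x * x for x in block) / len(block)
    return 10.0 * math.log10(mean_square + 1e-12)


def speech_flags(samples, margin_db=30.0):
    """Toy energy-based SAD: keep frames within margin_db of the loudest frame."""
    energies = [energy_db(samples[i:i + FRAME])
                for i in range(0, len(samples) - FRAME + 1, HOP)]
    threshold = max(energies) - margin_db
    return [e > threshold for e in energies]


def fill_silences_with_noise(samples, amplitude=0.5, silence_db=-50.0, seed=0):
    """Replace every low-energy 20 ms block with uniform noise (hypothetical attack)."""
    rng = random.Random(seed)
    out = list(samples)
    for start in range(0, len(out), FRAME):
        block = out[start:start + FRAME]
        if energy_db(block) < silence_db:
            out[start:start + len(block)] = [
                rng.uniform(-amplitude, amplitude) for _ in block
            ]
    return out


# Toy recording: 0.5 s of silence, then 0.5 s of a 200 Hz tone standing in for speech.
silence = [0.0] * 4000
tone = [0.5 * math.sin(2 * math.pi * 200 * n / 8000) for n in range(4000)]
licit = silence + tone
evasive = fill_silences_with_noise(licit)

kept_before = sum(speech_flags(licit))
kept_after = sum(speech_flags(evasive))
print("frames kept as speech:", kept_before, "before,", kept_after, "after")
```

In this toy case the detector no longer rejects any frame after the attack, so noise frames would reach the recognition stage alongside genuine speech. A vocoder-based attack against model-based or phoneme-based SADs would need a different construction and is not covered by this sketch.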
