Speaker Verification - Artificial Intelligence & Information Analysis

Informati cs Dpt. Speaker Verification Alexandros Xafopoulos Digital Days 1 05- Informati cs Dpt. Presentation Outline • Framework • Preprocessing • Features (Extraction, Noise CompensationChannel Equalization, Selection) • (Pattern) Matching-Modeling • Decision Making • Performance Evaluation • Experimental Results • References Digital Days 2 05- Introduction Informati Framework cs Dpt. • Motivation – Speech contains speaker specific characteristics (physiological-behavioral) • vocal tract • pitch range - vocal cords • articulator movement – Mouth – Nasal cavity – Lips – Voiceprint as a biometric (distinguishing trait) – Natural & economical way Digital Days 3 05- Introduction(2) Informati Framework cs Dpt. • Objective – Discriminate betw. a given speaker & all others • Definitions – Verification < Latin verus (true) • Claim: Speaker identity • Proof: Speech utterance • Binary decision to establish the truth – Client: speaker registered on the system – Impostor: speaker who claims a false identity – Model: set of parameters that represents a speaker or a group of speakers Digital Days 4 05- Related Research Areas Informati Framework cs Dpt. • Signal Processing Signal processing Analog Signal Digital Signal Processing Processing Speech Processing Analysis Recognition Speech Recognition Speaker Identification Digital Days Other Signals Coding / Synthesis Speaker Recognition Enhancement Storage / Transmission Language Identification Speaker Detection / Tracking 5 Speaker Verification 05- Related Research Areas(2) Informati Framework cs Dpt. • (Statistical) Pattern Recognition Data Digital Days Feature Features Extractor 6 Trained Classifier Class Label 05- Related Research Areas(3) Informati Framework cs Dpt. • Biometrics Technology – def: automatic identification of a person based on his/her physiological or behavioral characteristics (=biometrics) – desirable properties of biometrics [Jain_bk] • universality (found in every person) • uniqueness (different ”value” for each person) • permanence (invariant with time) • collectability (quantitatively measurable) • performance ( accuracy vs.  resources) • high acceptability (person’s willingness) • low circumvention (not easy to ”fool”) Digital Days 7 05- Related Research Areas(4) Informati Framework cs Dpt. • Speech Science Digital Days Communication by speech [Somervuo] 8 05- Related Research Areas(5) Informati Framework cs Dpt. Speech Production Physiology [Picone] • Speech Science(2) Block Diagram of Human Speech Production [Picone] Digital Days 9 05- Related Research Areas(6) Informati Framework cs Dpt. • Speech Science(3) [Morgan] (corrected) General Discrete-Time Model for Speech Production Gain for voice source G(z) H(z) Gain for noise source Digital Days 10 05- R(z) Generic Speaker Verification Process • Enrollment (Training) module Speaker ”A” N utterances Informati Framework cs Dpt.  Known Identity: ”Speaker is ”A”” Speech Pressure Wave of ”A” Digital Signal Acquisition Digital Speech Feature Creation N Sets of Feature Feature Model Vectors Vectors Registration Channel to transfer signal Speaker Model of ”A” Digital Days 11 05- Generic Speaker Verification Process(2)Informati Framework cs Dpt. • Enrollment module(2) – Digital signal acquisition Speech Pressure Wave Microphone Analog Voltage Signal Conditioned Analog Antialiasing Sampling & Signal low-pass Quantization filter (A/D converter) • Sampling frequency: Digital Days Fs 12 Digital Speech 05- Generic Speaker Verification Process(3)Informati Framework cs Dpt. • Enrollment module(3): Feature creation Digital Speech Preprocessing Preprocessed Digital Speech Feature Extraction Noise Compensation Plain Feature Vectors & Channel Equalization (Clean) Feature Vectors Feature Selection Digital Days 13 05- Generic Speaker Verification Process(4)Informati Framework cs Dpt. • Verification module Claimed Identity B Threshold of ”B” |P| Speaker Speaker Speech Pressure Models Model Model of ”B” Wave of ”A” Selection Digital Feature Matching Digital Speech Feature Vectors Pattern Results Decision Signal Matching Creation Making Aquisition Acceptance (A=B) or Rejection (AB) Digital Days 14 05- Generic Speaker Verification Process(5)Informati Framework cs Dpt. • Threshold setting module Speaker Models of " A" Speaker Model of ”A” Threshold Setting Threshold of ”A” A : A h (cohort model) or  (world model) • Cohort model: competitive clients only • World model: all the clients Digital Days 15 05- Corpus Parameters Informati Framework cs Dpt. • Text-dependency [Nedic] – Text dependent: verification done on a fixed phrase, predetermined by the recognizer (fixed phrase) – Text prompted: verification done on system generated sequence of predetermined words (fixed vocabulary) – User customized: verification done on user requested phrase – Text independent: verification done on any phrase – Language independent: verification done on any language • Vocabulary – Fixed or not – Size (|V|) Digital Days 16 05- Corpus Parameters(2) Informati Framework cs Dpt. • Population (Speakers) – Size (|P|) – Similarity • Speech Flow – Discrete Utterance (pauses betw. words) – Continuous – Spontaneous (natural) • Quantity (#sessions, #phrases, phrase duration) • Quality of speech (Problems) Digital Days 17 05- Problems under real conditions Informati Framework cs Dpt. • Microphone / Communication channel / Digitizer quality • Channel - Environmental mismatch (different channels - environments for enrollment & verification request) • Mimicry by humans & tape recorders • Bad pronunciation • Extreme emotional states (e.g. anger) • Sickness / Allergies / Tiredness / Thirst • Aging • Environmental noise / Poor room acoustics Digital Days 18 05- Errors Informati Framework cs Dpt. • False Rejection – A client makes a request to be verified as himself/herself & the request is rejected – High rate client: goat [Koolwaaij] – Low rate client: sheep • False Acceptance – An impostor makes a request to be verified as a client & the request is accepted – High rate client: lamb – Low rate client: ram – High rate impostor: wolf – Low rate impostor: badger Digital Days 19 05- Applications • • • • • Informati Framework cs Dpt. Access control to databases / facilities Electronic commerce Remote access to computer networks Forensic Telephone banking [James] Digital Days 20 05- Preemphasis-Frame Blocking Informati Preprocessing cs Dpt. • Preemphasis: Low order digital system to – spectrally flatten the signal (in favour of vocal tract parameters) – make it less susceptible to later finite precision effects – usually (order=1): s pe (n)  s(n)  α pe s(n  1), α pe  [0.9,1] • Frame blocking (short-term(st) processing) – L successive overlapping (by M samples) frames f (l; n)  s pe (n  M (l  1)), n  0,..., N  1, l  1,..., L – window size - length: N samples = N/ Fs sec – frame rate-shift-period: M samples = M/ F sec s Digital Days 21 05- Frame Windowing Informati Preprocessing cs Dpt. • Used to minimize the singal discontinuities at the beg. & end of each frame – Time (long window)<->freq. (short) resolution f w (l; n)  f (l; n)w(n), n  0,..., N  1 – Window type: [Picone] – Corrections: N  N  1, k  n, n  0,..., N  1 Digital Days 22 05- Speech Activity Detection Informati Preprocessing cs Dpt. • • • • Silence-speech detection Voiced-unvoiced discrimination Endpoint detection [Deller_bk] Can be applied afterward Digital Days 23 05- Signal Measures & Graphs Informati Preprocessing cs Dpt. Speech waveform Zerocrossing rate Time-frequency plot (Spectrogram) Energy plot [Weingessel] Digital Days 24 05- Features - General Informati cs Dpt. Feature Extraction • Maps each speech interval-frame to a multidimensional feature space • Order N coef : number of coefficients in each feature vector (dimensionality) • Several kinds of coefficients have been proposed Digital Days 25 05- Linear Prediction (LP) Informati cs Dpt. Feature Extraction • Speech sample as a linear combination of N LPC previous samples (autoregressive mdl): s ( n)  N LPC a m 1 LPC (m) s(n  m)  Gu(n) – aLPC (m), m  1,..., N LPC : LP coefficients (LPC) – u (n): normalized excitation source – G : scale factor – aLPC (l; m), m  1,..., N LPC : stLPC of frame l Digital Days 26 05- Linear Prediction (LP)(2) Informati cs Dpt. Feature Extraction • Calculation of stLPC – Mean squared error minimization – Autocorrelation method • Levinson-Durbin (L-D) recursion – Covariance method • Cholesky (LU) decomposition Digital Days L-D recursion (l is implied, R: autocorrelation matrix) [Picone2] 27 05- Linear Prediction (LP)(3) Informati cs Dpt. Feature Extraction • LPC – highly correlated – not orthonormal • Distance: Itakura-Saito – Computationally expensive • LPC processor [Rabiner_bk] Digital Days 28 05- Cepstrum (Complex-Real) Informati cs Dpt. Feature Extraction • Special case of homomorphic signal proc. • Focuses on voiced segments • Short-term complex cepstrum (stCC): 1 cCC (l; m)  DFT {log 10 (DFT{ f w (l; n)})}, m  1,..., N CC • Short-term real cepstrum (stRC): 1 cRC (l; m)  DFT {log 10 | DFT{ f w (l; n)} |}, m  1,..., N RC • Distance of cepstrum based coefficients – Euclidean: vectors defined in an orthonormal space Digital Days 29 05- Mel Cepstrum Informati cs Dpt. Feature Extraction Mel-cepstral feature generation (frame l) • Mel [Young] – unit of measure of perceived frequency of a tone – non-linear correspondance to the physical freq. (like the human ear) f mel f Hz    2595 log 10 1    700  – mel freq. cepstral coefficients (MFCC): cMFCC (l; m), m  1,..., N MFCC N FFT ( mel ) , N filters( mel ) – generalized case [Vergin] Digital Days 30 05- LP derived Cepstrum Informati cs Dpt. Feature Extraction • LP Cepstral Coefficients (LPCC): m  1,..., N LPC : m 1 k cLPCC (l ; m)  aLPC (l ; m)   cLPCC (l ; k )aLPC (l ; m  k ) k 1 m m  N LPC  1,..., N LPCC : m 1 k cLPCC (l ; m)   cLPCC (l ; k )aLPC (l ; m  k ) k m N m Digital Days 31 05- Other cepstral variants Informati cs Dpt. Feature Extraction • Linear Freq. Cepstral Coefficients (LFCC) – Like MFCC but: – filters are uniformally spaced on the Hz scale • Mel-warped LPCC (MLPCC) [Kuitert] – CC not directly derived from LPC – 1st compute the log magn. spectrum of LPC – then warp the freq. axis to correspond to the mel axis Digital Days 32 05- Variants Informati cs Dpt. Feature Extraction • Discrete Wavelet Transform (DWT) instead of FFT [Krishnan] • Application of other type than triangular filters • Application of the logarithm before the triangular filters Digital Days 33 05- Delta Cepstrum • [Milner]: cGCC (l ; m)  K  kc k  K GCC Informati cs Dpt. Feature Extraction (l  k ; m) , m  1,..., N GCC K k 2 k  K • Inclusion of temporal information Digital Days 34 05- PLP - Auditory Features Informati cs Dpt. Feature Extraction • Perceptual Linear Prediction (PLP) [Hermansky] – Spectral scale: non-linear Bark scale f Bark  f   0.76 f Hz    13atan    3.5atan  2   1000   7500  2 Hz – Spectral features smoothed within freq. bands • Auditory Features [Kumar] – Imitates signal proc. performed by the ear – cochlear modeling Digital Days 35 05- Intra-frame cepstral proc. Informati cs Dpt. Noise Compensation-Channel Equalization [Mammone] • Liftering-weighting – low order coeffs: sensitive to overall spectral slope – high order: sensitive to noise – =>tapered window (bandpass liftering) N GCC  πm  , m  1,..., N GCC w(m)  1  sin  2  N GCC  cwGCC (l ; m)  w(m)cGCC (l; m), m  1,..., N GCC • Adaptive Component Weighting (ACW) – motivation: all frames don't have same distortion Digital Days 36 05- Inter-frame cepstral proc. Informati cs Dpt. Noise Compensation-Channel Equalization • Cepstral Mean Subtraction (CMS) – mean (over a num of frames) subtraction (tackles training-testing discrepancy) cCMS GCC (l; m)  cGCC (l; m)  avg k (cGCC (k ; m)), m  1,..., NGCC – lowpass filtering – eliminates communication channel spectral shaping • Pole Filtered CMS (PFCMS): cepstrum poles modification Digital Days 37 05- RASTA proc. Informati cs Dpt. Noise Compensation-Channel Equalization • Relative Spectral Filtering (RASTA) [Hermansky] – bandpass filtering in the log-spectral domain – suppresses spectral components that change more slowly or quickly than in typical speech – RASTA-PLP • Microphone (type, position) robustness Digital Days 38 05- Feature Selection Introduction Informati Feature Selection cs Dpt. • Goal – find a transformation to a relatively low-dimensional feature space that preserves the information pertinent to the application while enabling meaningful comparisons to be performed using measures of similarity • Processing of features – Principal Component Analysis (PCA) (or Karhunen Loève Expansion-KLE) • seeks a lower dimensional representation that accounts for variance of the features • not necessarily optimum for class discrimination – Linear Discriminant Analysis (LDA) [Jin] – Non LDA (NLDA) (using MLP) [Konig] Digital Days 39 05- Matching-Modeling Introduction Informati cs Dpt. Matching-Modeling • Modeling: creation of (speaker) models • Model: Can be considered as the output of a proper proc. of a speaker’s set of feature vectors • Matching: computation of a match score betw. the input feature vectors & some speaker model • Methods [Wassner] – Template Matching • deterministic • score: distance betw. a test speaker (feature vectors of an) utterance & a reference speaker model • better score: min distance Digital Days 40 05- Matching-Modeling Introduction(2) Informati cs Dpt. Matching-Modeling • Methods(2) – Stochastic Approach • probabilistic matching • score: prob. of generation of a speech utterance by the claimed speaker P(U | Sc ) • better score: max probability • Parametric speaker model: specific pdf is assumed & its appropriate parameters (e.g. mean vector, covariance matrix) can be estimated using the Maximum Likelihood Estimation (MLE) e.g. multivariate normal model Digital Days 41 05- Template Matching Methods Informati cs Dpt. Matching-Modeling • Dynamic Time Warping (DTW) – dynamic comparison betw. a test & a reference (model) matrix (set of feature vectors) – computes a distance betw. the test & ref. patterns – allows time alignment at different costs – uses Dynamic Programming (DP) Digital Days 42 05- Template Matching Methods(2) Informati cs Dpt. Matching-Modeling • Dynamic Time Warping (DTW)(2) The DP grid with test (t) & reference (r) feature vectors at respective frame indices [Picone] Digital Days 43 05- Template Matching Methods(3) Informati cs Dpt. Matching-Modeling • Dynamic Time Warping (DTW)(3) – distances-costs on the DP grid (i,j frame indices, k step index) • Node d N (ik , jk ) – e.g. N LPCC test ref 2 ( c ( i ; m )  c ( j ; m ))  LPCC k LPCC k m 1 • Transition dT [(ik , jk ) | (ik 1 , jk 1 )] e.g. [ik  ik 1 ]  [ jk  jk 1 ] • Both d B [(ik , jk ) | (ik 1 , jk 1 )] (Type 4) – e.g. • Global d N (ik , jk )  dT [(ik , jk ) | (ik 1 , jk 1 )] K D   d B [(ik , jk ) | (ik 1 , jk 1 )] k 1 – K: number of transitions Digital Days 44 05- Template Matching Methods(4) Informati cs Dpt. Matching-Modeling • Dynamic Time Warping (DTW)(4) – DTW search constraints • Endpoint Constraints (bottom left(S) - top right(E) corners) – endpoint relaxation: ΔiS , ΔjS , ΔiE , ΔjE max points allowed in each direction • Monotonicity (going up & right) ik 1  ik  jk 1  jk • Global Path Constraints (global movement area) – permissible slope or – permissible window Digital Days jk  ik  W 45 05- Template Matching Methods(5) Informati cs Dpt. Matching-Modeling • Dynamic Time Warping (DTW)(5) – DTW search constraints(2) • Local Path Constraints (local movement area) Sakoe & Shiba local constraints on DTW path search [Picone] Digital Days 46 05- Template Matching Methods(6) Informati cs Dpt. Matching-Modeling • Dynamic Time Warping (DTW)(6) – The minimum cost final endpoint provides the distance betw. a test & a reference phrase – Training-Modeling [Deller_bk] • Casual: Unaltered feature strings form models • Averaging feature strings of utterances • The stochastic techniques possess superior training methods Digital Days 47 05- Template Matching Methods(7) Informati cs Dpt. Matching-Modeling • Vector Quantization (VQ) – Uses intra-vector dependencies to break-up a vector space in cells (unsupervised) – follows Linde-Buzo-Gray (LBG) algorithm – speaker model: codebook – codebook: set of prototype vectors used to represent vector spaces – goal: data structure "discovery" by finding how the data is clustered Digital Days 48 05- Template Matching Methods(8) Informati cs Dpt. Matching-Modeling • Learning Vector Quantization (LVQ) – Predefined classes, labeled data – defines the class borders according to the nearest neighbor rule – supervised version of VQ – set of variants (e.g. LVQ1,2,3) – goal: to determine a set of prototypes that best represent each class. Digital Days 49 05- Statistical Measures Informati cs Dpt. Matching-Modeling • Second Order Statistical Measures (SOSM) [Bimbot] – E.g. Arithmetic-Harmonic-Sphericity (AHS) • speaker model: covariance matrix of feature vectors • Distance=min(=0) iff all eigenvalues of test & ref covar matrices are equal Digital Days 50 05- Generative Models Informati cs Dpt. Matching-Modeling • Hidden Markov Models (HMMs) – Statistical - stochastic – Flexible – Types • Continuous Density (CD) • Discrete • SemiContinuous (SC) [Falavigna] – Model: prob. distributions e.g. mixtures of Gaussians of the feature vectors of the speaker Digital Days 51 05- Generative Models(2) Informati cs Dpt. Matching-Modeling • Hidden Markov Models (HMMs)(2) – Topologies • Left-Right (LR) (self & right connections): attempts to catch the temporal structure of the speech & to link consecutive short-time observations together #states/unit (e.g. phoneme) #gaussian distributions(mixtures)/state [Kumar] Example of a left-right HMM Ok : feature vectors Digital Days 52 05- Generative Models(3) Informati cs Dpt. Matching-Modeling • Hidden Markov Models (HMMs)(3) – Topologies(2) • Ergodic (fully connected) -AR HMMs: the prob. distrib. associated with each state is estimated via an AR process [Bourlard] [Picone] Example of an ergodic HMM Digital Days 53 05- Generative Models(4) Informati cs Dpt. Matching-Modeling • Gaussian Mixture Models (GMMs) – Single multi-Gaussian state ”HMM” – Uses a mixture of Gaussian densities to model the distribution of the feature vectors of each speaker – Local covariance info Digital Days 54 05- Neural Networks (NN) Informati cs Dpt. Matching-Modeling • Feed-Forward Neural Networks – supervised learning – each speaker has his own NN (each checked in turn to find the best match) – classifier-matcher: NN output – positive/negative training (rivals) Digital Days 55 05- Neural Networks (NN)(2) Informati cs Dpt. Matching-Modeling • Feed-Forward NNs(2) – Types [Haykin_bk] • Multilayer Perceptron (MLP): trained usually with the Back-Propagation (BP) algorithm – Error Correction Learning – Global optimization • Time Delay NNs (TDNN) • Radial Basis Function (RBF) Networks [Lo] – Memory-Based Learning – Local optimization • Support Vector Machines (SVM) – Learning by examples – Vapnik-Chervonenkis (VC) dimension: framework for the development of SVMs Digital Days 56 05- Neural Networks (NN)(3) Informati cs Dpt. Matching-Modeling • Self Organizing Maps (SOM) – unsupervised learning – method to form a topologically ordered codebook – speaker model: codebook – density of prototype vectors approaches the pdf of the input vectors during the training – nonlinear projection – competitive (winner neuron) learning Digital Days 57 05- NNs & Combined Methods Informati cs Dpt. Matching-Modeling • DTW-SOM – associate an entire feature vector sequence, instead of a single feature vector, as a model with each SOM node (also DTW-LVQ) [Somervuo] • Recurrent NNs (RNN) [Shrimpton] – (self-or not) feedback • Combined methods [Genoud] Digital Days 58 05- Sub-band Proc. Introduction Informati cs Dpt. Matching-Modeling • Speech signal split into band-limited channels (freq. ranges) Block diagram of an LPCC-based sub-band processing system [Finan] Digital Days 59 05- Decision Approaches Informati Decision Making cs Dpt. • ”Template” approach – threshold setting: based on inter- & intraspeaker scores/distances – comparison: test score<=thresholdacceptance [Fakotakis] • Statistical approach [Bengio] [Bourlard] – S c : speaker RV for identity c being claimed – U : utterance represented by feat. vectors – S c : other speakers RV P(U | Sc ) P( Sc ) P( S c | U )  P(U ) Digital Days 60 05- Decision Approaches(2) Informati Decision Making cs Dpt. • Statistical approach(2) – Claim c is true if: P( S c | U ) P(U | Sc ) P( Sc ) 1   c P( S c | U ) P(U | Sc ) P( Sc ) – c : decision threshold usu. found assuming Gaussian distributions for P(U | Sc ) and P(U | Sc ) – normalised likelihood - likelihood ratio – using logs: P(U | Sc ) log  log c  log P(U | Sc )  log P(U | Sc )  Θc P(U | Sc ) – Log Likelihood Ratio (LLR) Digital Days 61 05- Decision Approaches(3) Informati Decision Making cs Dpt. • Statistical approach(3) – P(U | Sc ) : speaker dependent model – P(U | Sc ) : normalization factor – cohort model Sc  Sch : group of selected speakers who are more competitive with the model of the claimed id • No well-established selection procedure – world model Sc   : all other speakers • less computation & storage needed Digital Days 62 05- Decision Approaches(4) Informati Decision Making cs Dpt. • Statistical approach extensions – If y  log P(U | Sc )  log P(U | Sc )  Θc – sign(y) gives the decision – Techniques: • Bayes Decision Rule (assumes prob.s perfectly estimated) – Minimizes Half Total Error Rate(HTER) %FA  %FR HTER  2 • Linear Regression • SVM Regression Digital Days 63 05- Threshold Setting Informati Decision Making cs Dpt. • speaker dependent – |P| thresholds: c , c  1,..., | P | • speaker independent – 1 threshold:  • leave one (client o) out – |P|*|P| thresholds: co , c  1,...,|P|, o  1,...,|P| • a priori: computed on training set (enrollment data) [Lindberg] • a posteriori: computed on test set (obtained during actual use of the system) Digital Days 64 05- Hypothesis Testing Informati Decision Making cs Dpt. Valid & impostor densities [Campbell] Digital Days 65 05- Hypothesis Testing(2) Informati Decision Making cs Dpt. Probability terms & definitions [Campbell] Digital Days 66 05- Accuracy Informati cs Dpt. Performance Evaluation • Error %s – FAR (False Acceptance Rate): # false acceptance s • Prob. of false acceptance # false claims – FRR (False Rejection Rate): # false rejections • Prob. of false rejection # true claims – Values for FAR & FRR are adjusted by changing the threshold values:  FAR vs.  FRR Digital Days 67 05- Accuracy(2) Informati cs Dpt. Performance Evaluation • Error %s(2) – EER (Equal Error Rate): operating point where FAR FRR – Choice of 2 subsequent operating points to approximate the EER value FRR k  FAR k 1  FRR k 1  FAR k EER  , (FAR k 1  FAR k )  (FRR k 1  FRR k ) FAR i 1  FAR i  FRR i 1  FRR i , i FAR k  FRR k  FAR k 1  FRR k 1 – MDE (Minimum Decision Error): operating point where FRR 10FAR Digital Days 68 05- Accuracy(3) Informati cs Dpt. Performance Evaluation • Graphs ROC (Receiver Operating Characteristics) curve: Plot of different operating points (FRR vs. FAR values). Called also DET (Detection Error Tradeoff) plot [Gauvain] • Quantities – #speakers correctly/wrongly verified Digital Days 69 05- Computational Complexity Informati cs Dpt. Performance Evaluation • CPU time – Training • Feature creation • Modeling • Threshold setting – Testing (verification throughput) • Feature creation • Matching • Memory-disk storage – Speech database, Features, Models, Thresholds Digital Days 70 05- Parameters Informati Experimental Resultscs Dpt. Text dependent – Fixed vocab.: Digits 0-9 in French or Spanish |V|=10 |P|=37 (M2VTS database) Discrete utterance speech flow #sessions(shots)/speaker=5, the 5th is for testing|S|=4 #phrases/session=1 (0-9 utterance) Phrase duration~6sec Fs  48KHz Proc. Freq.=12KHz α pe  0.95 N  360 (30ms) M  240 (20ms) Window type: Hamming N LPCC  12 N LPC  12 Liftering-weighting: cw LPCC (l ; m) Coefficients: LPCC Digital Days 71 05- Parameters(2)-EER Informati Experimental Resultscs Dpt. Matching method: DTW d N : Euclidean d B  d N  dT dT : Type 4 ΔiS  10, ΔjS  10, ΔiE  10, ΔjE  10 W  30 Local path constraint: Sakoe & Shiba (b) Decision approach: Template Threshold setting: leave one out |P|(client left out).|P-1|(rest clients as claimants).|S|(shot left out for claiming-testing)=5328 client claims |P|(client left out as impostor).|P-1|(claims of the impostor as one of the rest clients).|S|(shot left out for claiming)=5328 impostor claims EER(avg)[0.6569%,1.5390%] (FAR1=1.5390% >FRR1=0.6569%) EER(avg)=[EER(1|234)+EER(2|134)+EER(3|124)+EER(4|123)]/4 Digital Days 72 05- Parameters(3)-EER Informati Experimental Resultscs Dpt. Shot 4 left out, shot 5 used for testing: |P|.|P-1|=1332 client & 1332 impostor claims EER(5|123)=2.7027% Difference: N MFCC  12  512 N filters( mel )  40 Coefficients: MFCC N FFT ( mel ) EER(avg)=4.1817% EER(5|123)=5.4054% Digital Days 73 05- References Informati cs Dpt. • [Bengio] S. Bengio and J. Mariéthoz, Learning the Decision Function for Speaker Verification, IDIAP Research Report, 2001 • [Bimbot] F. Bimbot, I. Magrin-Chagnolleau and L. Mathan, Second-Order Statistical Measures for Text-Independent Speaker Identification, Speech Communication, vol. 17, pp. 177-192, 1995 • [Bourlard] H. Bourlard and N. Morgan, Speaker Verication: A Quick Overview, IDIAP Research Report, 1998 • [Campbell] J.P. Campbell Jr., Speaker Recognition: a Tutorial, Proc. of the IEEE, vol. 85, no. 9, pp. 1437-1462, 1997 • [Deller_bk] J.R. Deller, J.G. Proakis and J.H. Hansen, Discrete-time Processing of Speech Signals, Macmillan, New York, 1993 • [Fakotakis] N. Fakotakis, E. Dermatas, G. Kokkinakis, Optimal Decision Threshold for Speaker Verification, in Signal Processing III: Theories and Applications, editor: I.T. Young et al., pp. 585-587, Elsevier Science Publishers B.V. (North Holland), 1986 Digital Days 74 05- References(2) Informati cs Dpt. • [Falavigna] D. Falavigna, Comparison Of Different Hmm Based Methods For Speaker Verification (citeseer) • [Finan] R.A. Finan, R.I. Damper and A.T. Sapeluk, Improved Data Modeling for Text-Dependent Speaker Recognition Using Sub-Band Processing (citeseer) • [Gauvain] J. Gauvain, L. Lamel and B. Prouts, Experiments with Speaker Verification over the Telephone, Eurospeech’95, pp. 651-654, 1995 • [Genoud] D. Genoud, F. Bimbot, G. Gravier and G. Chollet, Combining Methods to Improve Speaker Verification Decision, Proc. of ICSLP'96, vol. 3, pp. 1756-1759, 1996 • [Haykin_bk] S. Haykin, Neural Networks: A Comprehensive Foundation, Macmillan, New York, 1995 • [Hermansky] H. Hermansky and N. Morgan, Rasta Processing of Speech, IEEE Trans. on Speech and Audio Processing, vol. 2, no. 4, pp. 578-589, 1994 Digital Days 75 05- References(3) Informati cs Dpt. • [Jain_bk] A. Jain, R. Bolle and S. Pankanti, editors, Biometrics: Personal Identification in Networked Society, Kluwer Academic Publishers, Boston, MA, 1999 • [James] D. James, H. Hutter and F. Bimbot, CAVE -- Speaker Verification in Banking and Telecommunications (citeseer) • [Jin] Q. Jin and A. Waibel, Application of LDA to Speaker Recognition (citeseer) • [Konig] Y. Konig, L. Heck, M. Weintraub and K. Sonmez, Nonlinear Discriminant Feature Extraction for Robust Text-Independent Speaker Recognition, Proc. of RLA2C’98 (Speaker Recognition and Its Commercial and Forensic Applications), 1998 • [Koolwaaij] J.W. Koolwaaij and L. Boves, A New Procedure for Classifying Speakers in Speaker Verification Systems, Proc. of Eurospeech'97, pp. 2355-2358, 1997 • [Krishnan] M. Krishnan, C. Neophytou and G. Prescott, Wavelet Transform Speech Recognition using Vector Quantization, Dynamic Time Warping and Artificial Neural Networks, 1994 Digital Days 76 05- References(4) • • • • • • Informati cs Dpt. [Kuitert] M. Kuitert and L. Boves, Speaker Verification with GSM Coded Telephone Speech, Proc. of Eurospeech'97, vol. 2, pp. 975-978, 1997 [Kumar] N. Kumar, Investigation of Silicon Auditory Models and Generalization of Linear Discriminant Analysis for Improved Speech Recognition, PhD thesis, Johns Hopkins University, 1997 [Lindberg] J. Lindberg, J.W. Koolwaaij, H.-P. Hutter, D. Genoud, M. Blomberg, F. Bimbot and J.-B. Pierrot, Techniques for a priori Decision Threshold Estimation in Speaker Verification, Proc. of RLA2C’98, 1998 [Lo] T.F. Lo and M.W. Mak, A New Intra-Frame and Inter-Frame Cepstral Processing Method for Telephone-Based Speaker Verification, Int. Workshop on Multimedia Data Storage, Retrieval, Integration and Applications, pp. 116-122, 2000 [Mammone] R.J. Mammone, X. Zhang and R.P. Ramachandran, Robust Speaker Recognition, IEEE Signal Proc. Magazine, vol. 13, no. 5, pp. 58-71, Sep. 1996 [Milner] B. Milner, Inclusion of Temporal Information into Features for Speech Recognition, Proc. of ICSLP’96, pp. 256-259, 1996 Digital Days 77 05- References(5) • • • • • • Informati cs Dpt. [Morgan] N. Morgan and B. Gold, Speech Analysis and Synthesis Overview, Lecture, Univ. of California Berkeley, 1999 [Nedic] B. Nedic and H. Bourlard, Recent Developments in Speaker Verification at IDIAP, IDIAP Research Report, 2000 [Picone] J. Picone, Fundamentals of Speech Recognition: A Short Course, Mississippi State Univ., 1996 [Picone2] J. Picone, Signal Modeling Techniques in Speech Recognition, Proc. of the IEEE, vol. 81, no. 9, pp. 1215-1247, 1993 [Rabiner_bk] L. Rabiner and B.H. Juang, Fundamentals of Speech Recognition, Prentice-Hall, Englewood Cliffs, NJ, 1993 [Shrimpton] D. Shrimpton and B.D. Watson, Comparison of Recurrent Neural Network Architectures for Speaker Verification. Proc. of the Fourth Australian International Conference on Speech Science and Technology, pp. 460-464, 1992 Digital Days 78 05- References(6) • • • • • Informati cs Dpt. [Somervuo] P. Somervuo, Speech Recognition using Context Vectors and Multiple Feature Streams, Helsinki University of Technology, Faculty of Electrical Engineering, 1996 [Vergin] R. Vergin, D. O'Shaughnessy and A. Farhat, Generalized Mel Frequency Cepstral Coefficients for Large-Vocabulary SpeakerIndependent Continuous-Speech Recognition, IEEE Trans. on Speech and Audio Processing, vol. 7, no. 5, pp. 525-532, 1999 [Wassner] H. Wassner, G. Maitre and G. Chollet, Speaker Verification : a Review, Technical Report, IDIAP, 1996 [Weingessel] A. Weingessel, Speech Recognition (citeseer) [Young] S. Young, Large vocabulary continuous speech recognition, IEEE Signal Proc. Magazine, vol. 13, no. 5, pp. 45-57, 1996 Digital Days 79 05-

Speaker Verification - Artificial Intelligence & Information Analysis

Related documents

Products

Support

Speaker Verification - Artificial Intelligence & Information Analysis

Related documents

Add this document to collection(s)

Add this document to saved

Suggest us how to improve StudyLib