Digital Signal Processing
(Term Project)
by
Habib ur Rehman
Abdul Basit
CENTER FOR ADVANCED STUDIES IN ENGINEERING
Speaker Recognition
Introduction
What is Speaker Recognition?
A process that automatically recognizes who is speaking on the basis of
individual information included in the speech waves.
[Figure labels: Speech Signal, Speaker Recognition, Words, “Who are you?”]
Speaker Recognition System
Goals
The goal of this project is to build a
simple, yet complete and representative
speaker recognition system.
• The system should be able to identify
speakers based on the different voice
characteristics of each of the known
speakers.
• This identification should be accomplished
regardless of the sentence spoken
(text-independent).
Basic Structure of Speaker Recognition System
Speaker Identification / Speaker Verification
Principle of Speaker Recognition System
Introduction
All speaker recognition systems operate in two distinct phases:
• Enrollment or Training phase
• Testing phase
In the training phase, each registered speaker has to provide samples of
their speech so that the system can build a reference model for that
speaker.
In the testing phase, the input speech is matched with the stored reference
model(s) and a recognition decision is made.
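The two phases can be sketched as a small class. This is only an illustration: the class name `SpeakerDatabase`, the speaker names, the random feature vectors, and the use of a mean feature vector as the "reference model" are all assumptions made for the example, not the models used in this project.

```python
import numpy as np

class SpeakerDatabase:
    """Toy two-phase recognizer; the mean-vector model is a placeholder."""

    def __init__(self):
        self.models = {}  # speaker name -> reference model

    def enroll(self, name, features):
        # Training phase: build a reference model from the speaker's samples.
        # Here the "model" is simply the mean feature vector (an assumption).
        self.models[name] = np.asarray(features, dtype=float).mean(axis=0)

    def identify(self, features):
        # Testing phase: match the input against every stored model and
        # return the name of the closest one.
        probe = np.asarray(features, dtype=float).mean(axis=0)
        distances = {name: float(np.linalg.norm(probe - model))
                     for name, model in self.models.items()}
        return min(distances, key=distances.get)

# Synthetic "features" for two hypothetical speakers.
rng = np.random.default_rng(0)
db = SpeakerDatabase()
db.enroll("alice", rng.normal(0.0, 1.0, size=(200, 12)))
db.enroll("bob", rng.normal(3.0, 1.0, size=(200, 12)))
result = db.identify(rng.normal(3.0, 1.0, size=(50, 12)))
```

A verification system would instead compare the distance to one claimed speaker's model against a threshold.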
Basic Structure of Speaker Recognition System
Feature Extraction / Feature Matching
MFCC Processor
[Block diagram: Frame Blocking → Windowing → Fourier Transform → (spectrum) → Mel-frequency Wrapping → (mel spectrum) → Cepstrum → (mel cepstrum)]
Frame Blocking
• The continuous signal is blocked into frames of N samples.
• The 1st frame consists of the first N samples.
• The 2nd frame begins M samples after the 1st and overlaps it by N-M samples, and so on.
• Typically N = 256 (radix-2 FFT), M = 100.
Windowing
• Windowing the frames minimizes the signal discontinuities at the beginning and end of each frame.
• Windowing minimizes spectral distortion by tapering the signal to zero at the beginning and end of each frame.
• $y[n] = x[n]\,w[n], \quad 0 \le n \le N-1$
• Typically a Hamming window is used, which has the form
$w[n] = 0.54 - 0.46 \cos\left(\frac{2\pi n}{N-1}\right), \quad 0 \le n \le N-1$
Fourier Transform
• FFT: $X[k] = \sum_{n=0}^{N-1} x[n]\, e^{-j 2\pi k n / N}, \quad 0 \le k \le N-1$
Cosine Transform (Mel Cepstrum)
• $\tilde{c}_n = \sum_{k=1}^{K} \left(\log \tilde{S}_k\right) \cos\left[ n \left(k - \tfrac{1}{2}\right) \frac{\pi}{K} \right], \quad n = 1, 2, \ldots, K$
where $\tilde{S}_k$ is the k-th mel spectrum coefficient.
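The first three blocks of the MFCC processor (frame blocking, Hamming windowing, FFT) can be sketched as follows. The function names `frame_signal` and `power_spectrum`, the 8 kHz sampling rate, and the 1 kHz test tone are assumptions chosen for illustration; N = 256 and M = 100 are the typical values quoted on the slide.

```python
import numpy as np

def frame_signal(x, N=256, M=100):
    """Split x into overlapping frames of N samples, advancing M samples per frame."""
    num_frames = 1 + (len(x) - N) // M
    # Row i holds the sample indices i*M ... i*M + N - 1.
    idx = np.arange(N)[None, :] + M * np.arange(num_frames)[:, None]
    return x[idx]

def power_spectrum(frames):
    """Apply a Hamming window to each frame and return |X[k]|^2 per frame."""
    N = frames.shape[1]
    n = np.arange(N)
    w = 0.54 - 0.46 * np.cos(2 * np.pi * n / (N - 1))  # Hamming window
    X = np.fft.fft(frames * w, axis=1)                  # y[n] = x[n] w[n], then FFT
    return np.abs(X) ** 2

fs = 8000                              # hypothetical sampling rate in Hz
t = np.arange(fs) / fs                 # one second of samples
x = np.sin(2 * np.pi * 1000 * t)       # 1 kHz test tone
frames = frame_signal(x)
spec = power_spectrum(frames)
peak_bin = int(np.argmax(spec[0, :128]))
```

With these values the tone falls on FFT bin 1000 * 256 / 8000 = 32, which is where the windowed spectrum of every frame should peak.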
Speech Production
A Convolution Process
• Speech can be modeled as the convolution of a glottal excitation source g[n]
and a vocal tract impulse response v[n]:
• y[n] = g[n] * v[n]
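A small numerical sketch of this model is given below. The impulse train used for g[n], the damped cosine used for v[n], and the pitch period P are toy stand-ins chosen for illustration, not measured speech.

```python
import numpy as np

P = 80                       # hypothetical pitch period in samples
g = np.zeros(400)
g[::P] = 1.0                 # impulse train as a stand-in for the glottal excitation

n = np.arange(200)
v = 0.95 ** n * np.cos(2 * np.pi * 0.1 * n)   # damped resonance as a toy vocal tract response

y = np.convolve(g, v)        # y[n] = g[n] * v[n]
```

Each impulse in g[n] launches a copy of v[n], so the output is a sum of shifted vocal tract responses, one per pitch period.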
Cepstrum
A transformation
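• It is believed that vocal tract characteristics are important to speech and
speaker recognition.
• We would like to separate the vocal tract response from the excitation.
• The cepstrum does this by converting the multiplication of spectra
(convolution in time)
$Y(\omega) = G(\omega)\,V(\omega)$
into a sum:
$\tilde{Y}(\omega) = \log G(\omega) + \log V(\omega)$
where $G(\omega)$ and $V(\omega)$ are the spectra of g[n] and v[n].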
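The additive property can be checked numerically. The sketch below uses the real cepstrum (log of the magnitude spectrum only), which is an assumption since the slide does not say which variant is meant; the signals g and v are toy stand-ins and `real_cepstrum` is a name introduced for this example.

```python
import numpy as np

def real_cepstrum(x, nfft):
    """Inverse FFT of the log magnitude spectrum of x."""
    log_mag = np.log(np.abs(np.fft.fft(x, nfft)))
    return np.fft.ifft(log_mag).real

n = np.arange(200)
g = 0.9 ** n[:64]                              # toy excitation stand-in
v = 0.95 ** n * np.cos(2 * np.pi * 0.1 * n)    # toy vocal tract response
y = np.convolve(g, v)                          # convolution in time

nfft = 512   # at least len(y), so FFT(y) equals FFT(g) * FFT(v)
c_g = real_cepstrum(g, nfft)
c_v = real_cepstrum(v, nfft)
c_y = real_cepstrum(y, nfft)
additive = bool(np.allclose(c_y, c_g + c_v, atol=1e-9))
```

Because the two contributions add in the cepstral domain, they can in principle be separated by keeping only part of the cepstrum.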
Mel Cepstrum
Mimicking the behaviour of the human ear
Mel filter bank
Linear spacing below 1 kHz, logarithmic spacing above 1 kHz
• Triangular filters emphasize the center frequency of filter i and
span to the next center frequency.
• Thus, for each tone with an actual frequency f in Hz,
a subjective pitch is measured on the mel scale:
mel(f) = 2595*log10(1 + f/700)   (Fant’s expression)
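A possible sketch of the mel filter bank and the cosine transform that follows it is shown below. The helper names, the choice of K = 20 filters, the 256-point FFT, and the 8 kHz sampling rate are assumptions for illustration. The filters are placed uniformly on the mel scale given by the expression above, and the cosine transform is assumed to take its usual form c_n = sum over k of log(S_k) cos(n (k - 1/2) pi / K).

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filter_bank(K, nfft, fs):
    """K triangular filters whose centers are equally spaced on the mel scale."""
    edges_hz = mel_to_hz(np.linspace(0.0, hz_to_mel(fs / 2.0), K + 2))
    freqs = np.arange(nfft // 2 + 1) * fs / nfft      # FFT bin frequencies in Hz
    bank = np.zeros((K, len(freqs)))
    for i in range(K):
        lo, center, hi = edges_hz[i], edges_hz[i + 1], edges_hz[i + 2]
        rising = (freqs - lo) / (center - lo)         # slope up to the center
        falling = (hi - freqs) / (hi - center)        # slope down to the next center
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def mel_cepstrum(power_spec, bank):
    """Cosine transform of the log mel spectrum, for n = 1..K."""
    K = bank.shape[0]
    S = bank @ power_spec                 # mel spectrum coefficients S_k
    k = np.arange(1, K + 1)
    n = np.arange(1, K + 1)[:, None]
    basis = np.cos(n * (k - 0.5) * np.pi / K)
    return basis @ np.log(S + 1e-12)      # small offset avoids log(0)

bank = mel_filter_bank(K=20, nfft=256, fs=8000)
coeffs = mel_cepstrum(np.ones(129), bank)   # flat power spectrum as a dummy input
```

In a full pipeline, `power_spec` would be the first `nfft // 2 + 1` bins of one frame's power spectrum rather than the dummy vector used here.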
Part 2
Speaker Verification
Speaker Verification
Feature Matching
• Pattern recognition classifies objects of interest, called patterns; here the
patterns are acoustic vectors extracted from the input speech.
• Since the classification is applied to extracted features,
the process can also be referred to as feature matching.
• Various feature matching techniques exist: Dynamic Time Warping (DTW),
Hidden Markov Models (HMM), Vector Quantization (VQ), etc.
• Vector Quantization is a process of mapping vectors
from a large vector space to a small number of regions in
that space.
• Each region is called a cluster and is represented by its
center, called a ‘codeword’.
• The collection of all the ‘codewords’ is called a
codebook.
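The mapping from vectors to codewords can be sketched as below. The two-dimensional codebooks, the speaker labels, and the test vectors are made up for illustration. Picking the speaker whose codebook gives the smallest average distortion is a common way to use VQ codebooks for matching and is an assumption here, not something stated on the slide.

```python
import numpy as np

def quantize(vectors, codebook):
    """Return, for each vector, the index of and distance to its nearest codeword."""
    d = np.linalg.norm(vectors[:, None, :] - codebook[None, :, :], axis=2)
    return d.argmin(axis=1), d.min(axis=1)

def average_distortion(vectors, codebook):
    """Mean Euclidean distance from each vector to its nearest codeword."""
    _, dist = quantize(vectors, codebook)
    return float(dist.mean())

# Hypothetical 2-D codebooks for two enrolled speakers.
codebooks = {
    "speaker_a": np.array([[0.0, 0.0], [1.0, 1.0]]),
    "speaker_b": np.array([[5.0, 5.0], [6.0, 4.0]]),
}
test_vectors = np.array([[0.1, -0.1], [0.9, 1.2], [0.2, 0.1]])

labels, _ = quantize(test_vectors, codebooks["speaker_a"])
best = min(codebooks, key=lambda s: average_distortion(test_vectors, codebooks[s]))
```

Real acoustic vectors would have one dimension per cepstral coefficient instead of two.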
Vector Quantization
The codebook
Vector Quantization
(The LBG algorithm)
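The slide names the LBG algorithm without giving its steps. The following is a sketch of the usual binary-splitting formulation (Linde, Buzo, Gray), offered as an assumption about what is meant; the function name `lbg`, the splitting parameter `eps`, the stopping tolerance `tol`, and the synthetic training data are all hypothetical.

```python
import numpy as np

def lbg(vectors, size, eps=0.01, tol=1e-4):
    """Train a codebook of `size` codewords (a power of two) by binary splitting."""
    codebook = vectors.mean(axis=0, keepdims=True)    # start with a single codeword
    while len(codebook) < size:
        # Split every codeword into two slightly perturbed copies.
        # (A codeword that is exactly zero would not separate under this rule.)
        codebook = np.vstack([codebook * (1 + eps), codebook * (1 - eps)])
        prev = np.inf
        while True:
            # Nearest-neighbour search: assign each vector to its closest codeword.
            d = np.linalg.norm(vectors[:, None, :] - codebook[None, :, :], axis=2)
            nearest = d.argmin(axis=1)
            # Centroid update: move each codeword to the mean of its cluster.
            for i in range(len(codebook)):
                members = vectors[nearest == i]
                if len(members) > 0:
                    codebook[i] = members.mean(axis=0)
            # Stop when the average distortion no longer improves noticeably.
            distortion = d.min(axis=1).mean()
            if prev - distortion <= tol * distortion:
                break
            prev = distortion
    return codebook

# Synthetic training vectors drawn around two cluster centers.
rng = np.random.default_rng(0)
data = np.vstack([rng.normal(2.0, 0.3, size=(100, 2)),
                  rng.normal(8.0, 0.3, size=(100, 2))])
cb = lbg(data, 2)
cb_sorted = cb[np.argsort(cb[:, 0])]
```

In a speaker recognition system, one such codebook would be trained per enrolled speaker from that speaker's acoustic vectors.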