ch6 (ANN Filter Banks).ppt

2.5.4.1 Basics of Neural Networks
Figure: a single neuron with inputs X_0, X_1, X_2, …, X_{N−1} (INPUT) and output Y (OUTPUT).
$$y = f\left(\sum_{i=0}^{N-1} W_i x_i\right)$$
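The neuron equation can be sketched in Python as follows. This is a minimal illustration, not the slide's own code: `neuron_output` and `step` are hypothetical names, and because the slide leaves the nonlinearity f unspecified, tanh (the default here) and a hard limiter are assumed choices.

```python
import math
from typing import Callable, Sequence


def step(a: float) -> float:
    """Hard-limiting nonlinearity: 1 if the weighted sum is non-negative, else 0."""
    return 1.0 if a >= 0.0 else 0.0


def neuron_output(
    x: Sequence[float],
    w: Sequence[float],
    f: Callable[[float], float] = math.tanh,
) -> float:
    """Compute y = f(sum_{i=0}^{N-1} W_i * x_i) for one neuron with N inputs."""
    if len(x) != len(w):
        raise ValueError("x and w must both have N elements")
    weighted_sum = sum(w_i * x_i for w_i, x_i in zip(w, x))
    return f(weighted_sum)


if __name__ == "__main__":
    x = [1.0, 0.5, -1.0]  # inputs x_0 .. x_{N-1}
    w = [0.2, 0.4, 0.1]   # weights W_0 .. W_{N-1}
    print(neuron_output(x, w, step))       # hard limiter
    print(neuron_output(x, w, math.tanh))  # smooth sigmoid-like nonlinearity
```

With these inputs and weights the weighted sum is 0.2 + 0.2 − 0.1 = 0.3, so the hard limiter outputs 1.0 and the tanh neuron outputs tanh(0.3).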
2.5.4.2 Neural Network Topologies
TDNN (time-delay neural network)
2.5.4.6 Neural Network Structures for Speech Recognition
3.1.1 Spectral Analysis Models
3.2 THE BANK-OF-FILTERS FRONT-END PROCESSOR
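3.2.1 Types of Filter Bank Used for Speech Recognition
Uniform filter bank: the center frequency of the i-th bandpass filter is
$$f_i = \frac{F_s}{N}\, i, \qquad 1 \le i \le Q$$
where F_s is the sampling rate and N is the number of uniformly spaced filters needed to span the frequency range of the speech. The number of filters actually used, Q, satisfies
$$Q \le \frac{N}{2}$$
with equality when the entire frequency range is analyzed. The bandwidth of the i-th filter satisfies
$$b_i \ge \frac{F_s}{N}$$
with equality meaning that adjacent filter channels do not overlap.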
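A short Python sketch of the uniform filter-bank relations. `uniform_filter_bank` is a hypothetical helper, the bandwidths are set to the no-overlap minimum F_s/N, and the values F_s = 8000 Hz, N = 16 are only for illustration.

```python
from typing import List, Optional, Tuple


def uniform_filter_bank(
    fs: float, n: int, q: Optional[int] = None
) -> Tuple[List[float], List[float]]:
    """Center frequencies f_i = (Fs/N) * i, 1 <= i <= Q, and minimum bandwidths b_i = Fs/N.

    Q defaults to N // 2, i.e. the whole band up to Fs/2 is covered.
    """
    q_max = n // 2
    if q is None:
        q = q_max
    if not 1 <= q <= q_max:
        raise ValueError("Q must satisfy 1 <= Q <= N/2")
    spacing = fs / n
    centers = [spacing * i for i in range(1, q + 1)]
    # b_i = Fs/N is the smallest allowed bandwidth (adjacent channels just touch);
    # larger values would make adjacent channels overlap.
    bandwidths = [spacing] * q
    return centers, bandwidths


if __name__ == "__main__":
    centers, bandwidths = uniform_filter_bank(fs=8000.0, n=16)
    print(centers)     # 500 Hz spacing up to Fs/2
    print(bandwidths)
```

With these values the channel spacing is 500 Hz and Q defaults to 8 channels, centered from 500 Hz up to 4000 Hz (= F_s/2).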
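Nonuniform Filter Banks
The bandwidths grow geometrically, where C is the bandwidth of the first filter and α is the growth factor:
$$b_1 = C$$
$$b_i = \alpha\, b_{i-1}, \qquad 2 \le i \le Q$$
$$f_i = f_1 + \sum_{j=1}^{i-1} b_j + \frac{b_i - b_1}{2}$$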
Example (octave bands, C = 200 Hz, α = 2):
Filter 1: f_1 = 300 Hz, b_1 = 200 Hz
Filter 2: f_2 = 600 Hz, b_2 = 400 Hz
Filter 3: f_3 = 1200 Hz, b_3 = 800 Hz
Filter 4: f_4 = 2400 Hz, b_4 = 1600 Hz
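The nonuniform recursion can be checked numerically with a small Python sketch (the helper name is hypothetical); with f_1 = 300 Hz, C = 200 Hz, and α = 2 it reproduces the four-filter octave example.

```python
from typing import List, Tuple


def nonuniform_filter_bank(
    f1: float, c: float, alpha: float, q: int
) -> Tuple[List[float], List[float]]:
    """Bandwidths b_1 = C, b_i = alpha * b_{i-1}; centers
    f_i = f_1 + sum_{j=1}^{i-1} b_j + (b_i - b_1) / 2."""
    if q < 1:
        raise ValueError("Q must be at least 1")
    bandwidths = [c]
    for _ in range(1, q):
        bandwidths.append(alpha * bandwidths[-1])
    centers = []
    for i in range(q):  # Python index i corresponds to filter number i + 1
        lower_sum = sum(bandwidths[:i])  # total bandwidth of all lower filters
        centers.append(f1 + lower_sum + (bandwidths[i] - bandwidths[0]) / 2)
    return centers, bandwidths


if __name__ == "__main__":
    centers, bandwidths = nonuniform_filter_bank(f1=300.0, c=200.0, alpha=2.0, q=4)
    for i, (f, b) in enumerate(zip(centers, bandwidths), start=1):
        print(f"Filter {i}: f = {f:.0f} Hz, b = {b:.0f} Hz")
```

Each filter's lower edge sits at the previous filter's upper edge, so the bands tile the spectrum without gaps while doubling in width.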
3.2.2 Implementations of Filter Banks
• Instead of direct convolution, which is computationally expensive, we assume that each bandpass filter impulse response has the form
$$h_i(n) = w(n)\, e^{j\omega_i n}$$
where w(n) is a fixed lowpass filter.
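The next sketch illustrates why this form is convenient: h_i(n) is the lowpass w(n) shifted in frequency to ω_i, so every channel shares one prototype filter. It assumes a Hamming window as w(n) (the slide only requires a fixed lowpass filter), and the function names are hypothetical.

```python
import cmath
import math
from typing import List, Sequence


def hamming(length: int) -> List[float]:
    """Symmetric Hamming window, used here as the fixed lowpass filter w(n)."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (length - 1)) for n in range(length)]


def bandpass_from_lowpass(w: Sequence[float], omega_i: float) -> List[complex]:
    """h_i(n) = w(n) * exp(j * omega_i * n): shifts the lowpass response to omega_i."""
    return [w_n * cmath.exp(1j * omega_i * n) for n, w_n in enumerate(w)]


def freq_response(h: Sequence[complex], omega: float) -> complex:
    """DTFT of a finite impulse response evaluated at a single frequency omega."""
    return sum(h_n * cmath.exp(-1j * omega * n) for n, h_n in enumerate(h))


if __name__ == "__main__":
    w = hamming(31)
    omega_i = 2.0 * math.pi * 0.2  # channel center in radians/sample
    h_i = bandpass_from_lowpass(w, omega_i)
    for omega in (omega_i - 0.5, omega_i, omega_i + 0.5):
        print(round(omega, 3), abs(freq_response(h_i, omega)))
```

The response of h_i at ω_i + Δ equals the response of w at Δ, so each channel inherits the passband shape of the single lowpass prototype.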
3.2.2.1 Frequency Domain Interpretation of the Short-Time Fourier Transform
Linear Filter Interpretation of the STFT
Figure: s(n) is multiplied by e^{−jω_i n} to form s̃(n), which is passed through the lowpass filter w(n) to give S_n(e^{jω_i}).
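The block diagram corresponds to the short-time Fourier transform S_n(e^{jω_i}) = Σ_m s(m) w(n − m) e^{−jω_i m}, computed as modulation followed by lowpass filtering. The pure-Python sketch below assumes that definition; the helper names are hypothetical, and w can be any short FIR lowpass.

```python
import cmath
from typing import List, Sequence


def modulate(s: Sequence[float], omega: float) -> List[complex]:
    """s~(m) = s(m) * exp(-j * omega * m): shifts frequency omega down to 0."""
    return [s_m * cmath.exp(-1j * omega * m) for m, s_m in enumerate(s)]


def fir_filter(x: Sequence[complex], h: Sequence[complex]) -> List[complex]:
    """Causal convolution y(n) = sum_k h(k) x(n - k), truncated to len(x) samples."""
    y = []
    for n in range(len(x)):
        acc = 0j
        for k, h_k in enumerate(h):
            if n - k >= 0:
                acc += h_k * x[n - k]
        y.append(acc)
    return y


def stft_linear_filter(s: Sequence[float], w: Sequence[float], omega: float) -> List[complex]:
    """S_n(e^{j omega}) for every n: modulate by exp(-j omega n), then lowpass with w(n)."""
    return fir_filter(modulate(s, omega), w)


if __name__ == "__main__":
    s = [((m * 7) % 5) - 2.0 for m in range(20)]  # arbitrary test signal
    w = [1.0, 0.8, 0.5, 0.2]                      # short lowpass FIR (illustrative)
    print([round(abs(v), 3) for v in stft_linear_filter(s, w, 0.7)])
```

Filtering s(n) directly with h_i(n) = w(n)e^{jω_i n} gives e^{jω_i n} S_n(e^{jω_i}), so the bandpass view and the modulation view differ only by a unit-magnitude phase factor.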
3.2.2.4 FFT Implementation of a Uniform Filter Bank
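When the channel centers are uniform, ω_k = 2πk/N, one N-point FFT of the windowed frame yields every channel at once instead of running Q separate filters. A NumPy sketch, assuming the window length equals the FFT size N and that only channel magnitudes are needed; the function name is hypothetical.

```python
import numpy as np


def fft_filter_bank_magnitudes(s, n: int, w) -> np.ndarray:
    """|S_n(e^{j 2 pi k / N})| for k = 0..N-1 from a single N-point FFT.

    The window w has length N; the frame is s(n-N+1), ..., s(n).
    """
    w = np.asarray(w, dtype=float)
    N = len(w)
    start = n - N + 1
    if start < 0 or n >= len(s):
        raise ValueError("frame [n-N+1, n] must lie inside the signal")
    segment = np.asarray(s[start : n + 1], dtype=float)
    # S_n uses w(n - m); over the segment m = start..n this is the window reversed.
    windowed = segment * w[::-1]
    # The FFT differs from S_n by a factor exp(-j 2 pi k start / N),
    # which has unit magnitude, so the magnitudes are unaffected.
    return np.abs(np.fft.fft(windowed))


if __name__ == "__main__":
    t = np.arange(200)
    s = np.cos(2 * np.pi * 4 * t / 32)           # tone centered on channel k = 4
    mags = fft_filter_bank_magnitudes(s, n=150, w=np.hamming(32))
    print(int(np.argmax(mags[: 32 // 2 + 1])))   # strongest uniform-bank channel
```

Bins k = 1, …, N/2 correspond to the uniform-bank centers f_k = (F_s/N)k.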
Direct implementation of an arbitrary filter bank
Figure: s(n) is fed in parallel to Q bandpass filters h_1(n), h_2(n), …, h_Q(n), whose outputs are X_1(n), X_2(n), …, X_Q(n).
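For comparison, the direct structure is simply Q independent convolutions, one per channel. A NumPy sketch; the helper name and the filter coefficients in the usage example are hypothetical.

```python
from typing import List, Sequence

import numpy as np


def direct_filter_bank(s: Sequence[float], filters: Sequence[Sequence[complex]]) -> List[np.ndarray]:
    """X_i(n) = sum_m h_i(m) s(n - m) for each of the Q filters, truncated to len(s).

    Cost: about len(h_i) multiply-adds per output sample per channel, which is
    what makes the direct form expensive for long filters or many channels.
    """
    x = np.asarray(s, dtype=complex)
    return [np.convolve(x, np.asarray(h, dtype=complex))[: len(x)] for h in filters]


if __name__ == "__main__":
    impulse = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
    bank = [[1.0, 2.0, 3.0], [0.5, -0.5]]  # two illustrative FIR channels
    for i, x_i in enumerate(direct_filter_bank(impulse, bank), start=1):
        print(f"X_{i}(n) =", np.real(x_i))
```

Feeding a unit impulse returns each channel's impulse response, which is a quick sanity check for any filter-bank implementation.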
3.2.2.5 Nonuniform FIR Filter Bank Implementations
3.2.2.7 Tree Structure Realizations of Nonuniform Filter Banks
3.2.4 Practical Examples of Speech-Recognition Filter Banks
3.2.5 Generalizations of Filter-Bank Analyzer
MFCC Method
• The MFCC method is based on the way the human ear perceives sounds.
• The MFCC method performs better in noisy environments than other features.
• MFCC was originally proposed for speech recognition applications, but it also performs well in speaker recognition.
• The unit of human auditory perception is the mel, which is obtained from the following relation:
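A commonly used form of the hertz-to-mel mapping is mel(f) = 2595 log10(1 + f/700), equivalently 1127 ln(1 + f/700). The sketch below assumes this form (the slide's exact variant may differ), and the function names are hypothetical.

```python
import math


def hz_to_mel(f_hz: float) -> float:
    """Common mel-scale approximation: mel = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


if __name__ == "__main__":
    for f in (0.0, 500.0, 1000.0, 4000.0, 8000.0):
        print(f, round(hz_to_mel(f), 1))
```

The mapping is roughly linear below about 1 kHz and logarithmic above it, with 1000 Hz landing close to 1000 mel.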
Steps of the MFCC Method
Step 1: Map the signal from the time domain to the frequency domain using the short-time FFT.
Z(n): speech signal
W(n): window function, such as the Hamming window
W_F = e^{−j2π/F}
m = 0, …, F − 1
F: length of the speech frame
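Assuming these definitions combine into the usual windowed DFT X(m) = Σ_{n=0}^{F−1} Z(n) W(n) W_F^{mn}, step 1 can be sketched directly in Python. X(m) and the helper names are assumptions, and the direct O(F²) sum is used for clarity; an FFT yields the same values much faster.

```python
import cmath
import math
from typing import List, Sequence


def hamming(F: int) -> List[float]:
    """Hamming window W(n), n = 0..F-1."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (F - 1)) for n in range(F)]


def frame_spectrum(z: Sequence[float], window: Sequence[float]) -> List[complex]:
    """X(m) = sum_{n=0}^{F-1} Z(n) W(n) W_F^{m n}, with W_F = exp(-j 2 pi / F)."""
    F = len(z)
    if len(window) != F:
        raise ValueError("window and frame must both have length F")
    W_F = cmath.exp(-2j * math.pi / F)
    return [
        sum(z[n] * window[n] * W_F ** (m * n) for n in range(F))
        for m in range(F)  # m = 0, ..., F - 1
    ]


if __name__ == "__main__":
    F = 32
    frame = [math.cos(2 * math.pi * 4 * n / F) for n in range(F)]  # tone at bin 4
    spectrum = frame_spectrum(frame, hamming(F))
    print([round(abs(v), 2) for v in spectrum[: F // 2 + 1]])
```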
Step 2: Find the energy of each channel of the filter bank,
where M is the number of mel-scale filters and W_k(j), k = 0, 1, …, M − 1, is the k-th filter of the filter bank.
Mel-scale filter distribution
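The channel energies of step 2 are commonly computed as E_k = Σ_j W_k(j)|X(j)|², with W_k triangular filters whose edges are equally spaced on the mel scale. The Python sketch below assumes both of those choices and the mel formula 2595 log10(1 + f/700); the function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence


def hz_to_mel(f: float) -> float:
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m: float) -> float:
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filter_bank(M: int, F: int, fs: float, f_high: Optional[float] = None) -> List[List[float]]:
    """M triangular filters W_k(j), k = 0..M-1, over FFT bins j = 0..F/2,
    with edges equally spaced on the mel scale between 0 Hz and f_high."""
    f_high = fs / 2.0 if f_high is None else f_high
    top = hz_to_mel(f_high)
    edges = [mel_to_hz(top * i / (M + 1)) for i in range(M + 2)]  # M + 2 edge points
    bin_freqs = [j * fs / F for j in range(F // 2 + 1)]
    filters = []
    for k in range(M):
        left, center, right = edges[k], edges[k + 1], edges[k + 2]
        row = []
        for f in bin_freqs:
            if left <= f <= center:
                row.append((f - left) / (center - left))    # rising edge
            elif center < f <= right:
                row.append((right - f) / (right - center))  # falling edge
            else:
                row.append(0.0)
        filters.append(row)
    return filters


def channel_energies(power: Sequence[float], filters: Sequence[Sequence[float]]) -> List[float]:
    """E_k = sum_j W_k(j) |X(j)|^2 for each mel channel k."""
    return [sum(w_kj * p_j for w_kj, p_j in zip(row, power)) for row in filters]


if __name__ == "__main__":
    fb = mel_filter_bank(M=20, F=512, fs=16000.0)
    print(len(fb), len(fb[0]))                      # 20 filters x 257 bins
    print([round(e, 2) for e in channel_energies([1.0] * 257, fb)])
```

For a flat power spectrum the energies grow with channel index, because the higher mel filters cover more FFT bins.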
Step 4: Compress the spectrum (logarithm) and apply the DCT to obtain the MFCC coefficients.
In the relation above, n = 0, …, L is the order of the MFCC coefficients.
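A typical form of step 4 takes the natural logarithm of the channel energies and applies a DCT-II, c_n = Σ_{k=0}^{M−1} log(E_k) cos(πn(k + 1/2)/M) for n = 0, …, L. The sketch assumes this form and an illustrative floor to avoid log(0); the function name is hypothetical.

```python
import math
from typing import List, Sequence


def mfcc_from_energies(energies: Sequence[float], L: int, floor: float = 1e-10) -> List[float]:
    """c_n = sum_{k=0}^{M-1} log(E_k) * cos(pi * n * (k + 0.5) / M), n = 0..L.

    log() compresses the dynamic range of the channel energies; the floor avoids log(0).
    """
    M = len(energies)
    log_e = [math.log(max(e, floor)) for e in energies]
    return [
        sum(log_e[k] * math.cos(math.pi * n * (k + 0.5) / M) for k in range(M))
        for n in range(L + 1)
    ]


if __name__ == "__main__":
    energies = [1.0 + 0.1 * k for k in range(20)]  # illustrative channel energies
    print([round(c, 3) for c in mfcc_from_energies(energies, L=12)])
```

For a flat spectrum all log energies are equal, so only c_0 is nonzero; spectral shape shows up in the higher-order coefficients.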
Mel-Cepstrum Method
Figure: time signal → framing → |FFT|² → mel scaling → logarithm → IDCT → cepstra (low-order coefficients) → differentiator → delta and delta-delta cepstra.
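The differentiator is not specified in the diagram. A common choice, assumed here, is a regression over ±Θ frames, d_t = Σ_{θ=1}^{Θ} θ(c_{t+θ} − c_{t−θ}) / (2Σθ²), with Θ = 2; applying it twice gives the delta-delta cepstra. The helper name is hypothetical.

```python
from typing import List, Sequence


def deltas(frames: Sequence[Sequence[float]], theta: int = 2) -> List[List[float]]:
    """Regression-based time derivative of each cepstral coefficient:
    d_t = sum_{th=1}^{theta} th * (c_{t+th} - c_{t-th}) / (2 * sum th^2),
    repeating the first/last frame at the edges."""
    T = len(frames)
    denom = 2.0 * sum(th * th for th in range(1, theta + 1))
    out = []
    for t in range(T):
        d = [0.0] * len(frames[t])
        for th in range(1, theta + 1):
            nxt = frames[min(t + th, T - 1)]
            prv = frames[max(t - th, 0)]
            for i in range(len(d)):
                d[i] += th * (nxt[i] - prv[i])
        out.append([v / denom for v in d])
    return out


if __name__ == "__main__":
    cepstra = [[2.0 * t, 5.0] for t in range(10)]  # coefficient 0 rises linearly
    delta = deltas(cepstra)
    delta_delta = deltas(delta)
    print(delta[5], delta_delta[5])
```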
Time-Frequency analysis
• Short-term Fourier Transform
• Standard way of frequency analysis: decompose the incoming signal into its constituent frequency components.
W(n): windowing function
N: frame length
p: step size
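A NumPy sketch of the short-term Fourier transform with window W(n), frame length N, and step size p. The 400-sample frame and 160-sample step (25 ms and 10 ms at 16 kHz) are illustrative assumptions, as is the helper name.

```python
import numpy as np


def stft(x, window, p: int) -> np.ndarray:
    """Short-term Fourier transform: frames of length N = len(window), advanced by p samples.

    Returns an array of shape (num_frames, N // 2 + 1) holding rfft(x[t*p : t*p + N] * W).
    """
    x = np.asarray(x, dtype=float)
    window = np.asarray(window, dtype=float)
    N = len(window)
    if len(x) < N or p < 1:
        raise ValueError("need len(x) >= N and step size p >= 1")
    num_frames = 1 + (len(x) - N) // p
    frames = np.stack([x[t * p : t * p + N] * window for t in range(num_frames)])
    return np.fft.rfft(frames, axis=1)


if __name__ == "__main__":
    fs = 16000
    t = np.arange(fs) / fs
    x = np.sin(2 * np.pi * 1000 * t)                 # 1 s of a 1 kHz tone
    S = stft(x, np.hamming(400), p=160)              # 25 ms frames, 10 ms step
    print(S.shape)                                   # (frames, frequency bins)
    print(int(np.argmax(np.abs(S[0]))) * fs / 400)   # peak bin converted to Hz
```

A smaller step p gives finer time resolution at the cost of more frames; the frame length N sets the frequency resolution F_s/N.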
Critical band integration
• Related to the masking phenomenon: the detection threshold of a sinusoid is elevated when its frequency is close to the center frequency of a narrow-band noise.
• Frequency components within a critical band are not resolved. The auditory system interprets the signals within a critical band as a whole.
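Critical-band integration can be approximated by summing power-spectrum bins that fall into the same band on a critical-band-rate (Bark) scale. The sketch assumes the Zwicker-Terhardt approximation z = 13 arctan(0.00076 f) + 3.5 arctan((f/7500)²) and rectangular 1-Bark bands; practical front ends often use overlapping, weighted band shapes instead. The function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def hz_to_bark(f_hz: float) -> float:
    """Zwicker-Terhardt approximation of the Bark (critical-band rate) scale."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


def critical_band_integrate(power: Sequence[float], fs: float, F: int) -> List[float]:
    """Sum FFT power-spectrum bins j = 0..F/2 into 1-Bark-wide bands, so that
    components inside one critical band are represented by a single value."""
    bands: Dict[int, float] = {}
    for j, p_j in enumerate(power):
        band = int(hz_to_bark(j * fs / F))  # index of the critical band containing bin j
        bands[band] = bands.get(band, 0.0) + p_j
    return [bands[b] for b in sorted(bands)]


if __name__ == "__main__":
    print(round(hz_to_bark(1000.0), 2))
    integrated = critical_band_integrate([1.0] * 257, fs=16000.0, F=512)
    print(len(integrated), integrated)  # bins per band grow with frequency
```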
Bark scale
Feature orthogonalization
• Spectral values in adjacent frequency channels are highly correlated.
• Because of this correlation, a Gaussian model needs a full covariance matrix with many parameters: every element of the covariance matrix has to be estimated.
• Decorrelating the features is useful to improve parameter estimation.
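To see why decorrelation helps, the sketch below models the correlation between channels as ρ^{|i−j|} (an assumed AR(1)-style model with ρ = 0.9 and d = 20 channels) and measures how much of the covariance lies off the diagonal before and after an orthonormal DCT, the transform used in the MFCC pipeline. A full covariance has d(d+1)/2 = 210 free parameters, a diagonal one only 20. The helper names are hypothetical.

```python
import numpy as np


def dct_matrix(d: int) -> np.ndarray:
    """Orthonormal DCT-II matrix; row k is the k-th cosine basis vector."""
    n = np.arange(d)
    k = n[:, None]
    C = np.sqrt(2.0 / d) * np.cos(np.pi * (n + 0.5) * k / d)
    C[0, :] /= np.sqrt(2.0)
    return C


def off_diagonal_fraction(cov: np.ndarray) -> float:
    """Share of the covariance matrix's squared entries that lies off the diagonal."""
    off = cov - np.diag(np.diag(cov))
    return float(np.sum(off ** 2) / np.sum(cov ** 2))


if __name__ == "__main__":
    d, rho = 20, 0.9
    idx = np.arange(d)
    cov = rho ** np.abs(idx[:, None] - idx[None, :])  # assumed channel correlation rho^|i-j|
    C = dct_matrix(d)
    cov_dct = C @ cov @ C.T                            # covariance after the DCT
    print("full-covariance parameters:", d * (d + 1) // 2, "diagonal:", d)
    print("off-diagonal share before:", round(off_diagonal_fraction(cov), 3))
    print("off-diagonal share after: ", round(off_diagonal_fraction(cov_dct), 3))
```

After the DCT most of the covariance sits on the diagonal, so a diagonal-covariance Gaussian becomes a reasonable approximation.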