Part3

Speech Recognition
Chapter 3
Signal Processing Front End
Convert the speech waveform
into some type of parametric representation.
$s_k$ -> Signal Processing Front End -> $O = o(1)\, o(2) \ldots o(T)$
Filterbank
Linear Prediction Analysis
Parametric Representation:
Zero crossing rate,
Short time Energy,
Short time spectral envelope, etc.
Filterbank

Introduction

Filterbank Front End

Uniform Filterbank Design

Non-uniform Filterbank Design

Implementation
$$x_i(n) = s(n) * h_i(n) = \sum_{m=0}^{L-1} h_i(m)\, s(n-m)$$
$$X_i(z) = S(z)\, H_i(z) \;\longleftrightarrow\; s(n) * h_i(n)$$
Filter-Bank Front End
$$s_i(n) = s(n) * h_i(n) = \sum_{m=0}^{L-1} h_i(m)\, s(n-m)$$
Nonlinearity (half-wave or full-wave rectifier):
shifts the band-signal spectrum to the low-frequency band and creates high-frequency images.
Lowpass filter (20-30 Hz):
retains the DC component and eliminates the high-frequency images created by the nonlinearity.
Each filterbank channel gives a measurement of the energy of the speech in its band.
Sampling rate reduction (40-60 Hz):
reduces the data rate.
Amplitude compression:
log, μ-law.
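The processing applied to each band signal (rectifier, lowpass filter, rate reduction, log compression) can be sketched as follows. This is only an illustration under stated assumptions: a moving average stands in for the lowpass filter, the 8 kHz sampling rate and 500 Hz test tone are arbitrary, and all function names are hypothetical.

```python
import math


def full_wave_rectify(x):
    """Nonlinearity: full-wave rectifier."""
    return [abs(v) for v in x]


def moving_average(x, length):
    """Crude lowpass (assumption): mean of the last `length` samples."""
    out, acc = [], 0.0
    for n, v in enumerate(x):
        acc += v
        if n >= length:
            acc -= x[n - length]
        out.append(acc / length)
    return out


def channel_energy(band_signal, fs, lowpass_len, out_rate, floor=1e-10):
    """Rectify, smooth, reduce the rate to about `out_rate` Hz, then take the log."""
    rect = full_wave_rectify(band_signal)
    smooth = moving_average(rect, lowpass_len)
    step = int(fs // out_rate)
    decimated = smooth[::step]
    return [math.log(max(v, floor)) for v in decimated]


fs = 8000
band = [math.sin(2 * math.pi * 500 * n / fs) for n in range(fs)]  # 1 s of a 500 Hz tone
# A 400-sample average has its first spectral null at fs/400 = 20 Hz.
feat = channel_energy(band, fs, lowpass_len=400, out_rate=50)
```

After the initial transient, every output value is the log of the mean rectified amplitude of the tone, so one second of input is reduced to 50 slowly varying energy measurements.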
[Figure: spectrum of the original band signal (concentration at 500 Hz) and of the rectified signal (DC concentration plus peaks at 500, 1000, 1500, ... Hz: the images).]
Uniform FB Design
Filter bandwidth (no overlapping)
Central frequencies:
$$f_i = \frac{F_s}{N}\, i, \quad 1 \le i \le Q$$
Number of filters:
$$Q \le N/2$$
$F_s$ is the sampling rate, and N is the number of uniformly spaced filters
required to span the frequency range of speech.
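As a quick numerical illustration of the uniform design, the sketch below lists the central frequencies for one choice of parameters. The values $F_s$ = 8000 Hz, N = 32 and Q = 15 are assumptions chosen for the example, and the function name is hypothetical.

```python
def uniform_center_frequencies(fs, n_total, q):
    """f_i = (fs / n_total) * i for 1 <= i <= q, requiring q <= n_total / 2."""
    if 2 * q > n_total:
        raise ValueError("Q must not exceed N/2")
    return [fs / n_total * i for i in range(1, q + 1)]


centers = uniform_center_frequencies(8000.0, 32, 15)
```

With these values the filters are spaced 250 Hz apart, from 250 Hz up to 3750 Hz.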
Non-uniform Filterbank Design
Logarithmic Frequency scale
 Critical Band Scale (Fig. 3.9)

– Mel Scale
– Bark Scale
Logarithmic Frequency Scale FB Design

For Q bandpass filters with central frequencies $f_i$ and bandwidths $b_i$:
$$b_1 = C$$
$$b_i = \alpha\, b_{i-1}, \quad 2 \le i \le Q$$
$$f_i = f_1 + \sum_{j=1}^{i-1} b_j + \frac{b_i - b_1}{2}$$
where C is the arbitrary bandwidth of the first filter, $f_1$ is the arbitrary central frequency
of the first filter, and $\alpha$ is the logarithmic growth factor (usually 2).
Example 1: $C = 200$ Hz; $f_1 = 300$ Hz; $\alpha = 2$; $Q = 4$
Example 2: $C = 50$ Hz; $f_1 = 225$ Hz; $\alpha = 1.33$; $Q = 12$
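The recursion for the bandwidths and central frequencies is easy to check numerically. The sketch below applies it to the first parameter set (C = 200 Hz, $f_1$ = 300 Hz, growth factor 2, Q = 4); the function name is hypothetical.

```python
def log_filterbank(c, f1, alpha, q):
    """Return (bandwidths, centers) for b_1 = C, b_i = alpha * b_{i-1} and
    f_i = f_1 + sum_{j<i} b_j + (b_i - b_1) / 2."""
    bandwidths = [c]
    for _ in range(1, q):
        bandwidths.append(alpha * bandwidths[-1])
    centers = []
    for i in range(q):
        centers.append(f1 + sum(bandwidths[:i]) + (bandwidths[i] - bandwidths[0]) / 2)
    return bandwidths, centers


bw, fc = log_filterbank(200.0, 300.0, 2.0, 4)
# bw -> [200.0, 400.0, 800.0, 1600.0]
# fc -> [300.0, 600.0, 1200.0, 2400.0]
```

With a growth factor of 2 both the bandwidths and the central frequencies double from one filter to the next, which gives an octave-band filterbank.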
Implementation of FBs

Basics on filter implementation

Filterbanks Implementation

Spectral Analysis Basics

Filterbank Implementation using STFT
Basics on Filter Implementation
Infinite Impulse Response (IIR) Filters
 Finite Impulse Response (FIR) Filters
 Filterbanks in Speech Recognition

– $8 < Q < 32$
– Practical systems use non-uniformly spaced
filterbanks to characterise the speech
spectrum in a manner considered more
consistent with human perception.
Infinite Impulse Response (IIR) Filters
IIR Filter Design
 IIR Filter Implementation

IIR Filter Design

From Analog Filters
– Impulse Invariance
– Analog-to-Digital Transformation
– Filter Design

Computer Aided Design
Impulse Invariance
Chooses the unit-sample (impulse) response of the digital
filter as equally spaced samples of the
impulse response of the analog filter:
$$h(n) = h_a(nT)$$
where T is the sampling period.
 In this case the design procedure is:
– Calculate the partial fraction expansion of $H_a(s)$:
$$H_a(s) = \sum_{k=1}^{N} \frac{A_k}{s - s_k}$$
– Calculate
$$H(z) = \sum_{k=1}^{N} \frac{A_k}{1 - e^{s_k T} z^{-1}}$$
Filter Design
Butterworth Filter
Chebyshev Filter
 Elliptic Filter

Butterworth Filter
Introduction
 Design Example

Introduction

Properties:
– The magnitude is maximally flat in the
passband.
– The approximation is monotonic in the
passband and the stopband.

The squared magnitude of the filter is of
the form:
$$|H_a(j\Omega)|^2 = \frac{1}{1 + (j\Omega / j\Omega_c)^{2N}}$$
The roots of the denominator polynomial
are then
$$s_p = (-1)^{1/(2N)} (j\Omega_c)$$
Thus there are 2N poles equally spaced in angle
on a circle of radius $\Omega_c$ in the s-plane.
The poles are symmetrically located with
respect to the imaginary axis.
The angular spacing between poles is $\pi/N$
radians.
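The pole positions can be computed directly from this description. The sketch below returns the N poles in the left half of the s-plane, which are the ones assigned to a stable filter; it uses N = 6 and a cutoff of 0.7032, the values of the design example in this chapter. The function name is hypothetical.

```python
import cmath
import math


def butterworth_poles(order, cutoff):
    """Left-half-plane roots of 1 + (s / (j*cutoff))^(2*order) = 0.
    They lie on a circle of radius `cutoff`, spaced pi/order apart."""
    poles = []
    for k in range(order):
        angle = math.pi * (2 * k + order + 1) / (2 * order)
        poles.append(cutoff * cmath.exp(1j * angle))
    return poles


poles = butterworth_poles(6, 0.7032)
# poles[0] is approximately -0.1820 + 0.6792j; the rest follow in 30-degree steps.
```

The remaining N roots are the mirror images of these poles in the right half-plane.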
s-plane
Design Example


Let us assume that we require a filter such
that the passband magnitude is constant
within 1 dB for frequencies below $0.2\pi$ and the
stopband attenuation is greater than 15 dB for
frequencies between $0.3\pi$ and $\pi$.
Then, if the passband is normalised to 1 at
$\omega = 0$, we require:
$$20 \log_{10} |H(e^{j0.2\pi})| \ge -1$$
$$20 \log_{10} |H(e^{j0.3\pi})| \le -15$$
Analog Butterworth filter (with T = 1, so that $\Omega = \omega$)
$$20 \log_{10} |H_a(j0.2\pi)| \ge -1$$
$$20 \log_{10} |H_a(j0.3\pi)| \le -15$$
Calculating N and $\Omega_c$
$$20 \log_{10} \left[ \frac{1}{1 + (j0.2\pi / j\Omega_c)^{2N}} \right]^{1/2} \ge -1$$
$$20 \log_{10} \left[ \frac{1}{1 + (j0.3\pi / j\Omega_c)^{2N}} \right]^{1/2} \le -15$$
Taking both conditions with equality:
$$1 + (j0.2\pi / j\Omega_c)^{2N} = 10^{0.1}$$
$$1 + (j0.3\pi / j\Omega_c)^{2N} = 10^{1.5}$$
$$N = 6, \qquad \Omega_c = 0.7032$$
For these values the poles of the Butterworth filter are:
$-0.1820 \pm j0.6792$
$-0.4972 \pm j0.4972$
$-0.6792 \pm j0.1820$
Thus the Butterworth filter is as follows:
$$H_a(s) = \frac{0.12093}{(s^2 + 0.3640s + 0.4945)(s^2 + 0.9945s + 0.4945)(s^2 + 1.3585s + 0.4945)}$$
Expressing this equation as a partial-fraction expansion
and performing the transformation:
$$H(z) = \frac{0.2871 - 0.4466 z^{-1}}{1 - 1.2971 z^{-1} + 0.6949 z^{-2}} + \frac{-2.1428 + 1.1454 z^{-1}}{1 - 1.0691 z^{-1} + 0.3699 z^{-2}} + \frac{1.8558 - 0.6304 z^{-1}}{1 - 0.9972 z^{-1} + 0.2570 z^{-2}}$$
IIR Filter Implementation
$$H(z) = \frac{\sum_{n=0}^{M} b_n z^{-n}}{1 - \sum_{n=1}^{N} a_n z^{-n}}$$
$N \approx 8$; $M \approx 8$
Direct form (4.3.1, Oppenheim & Schafer)
Cascade (4.3.3, Oppenheim & Schafer)
Parallel (4.3.2, Oppenheim & Schafer)
Direct form I (first the zeros)
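$$y(k) = \sum_{n=0}^{M} b_n x(k-n) + \sum_{n=1}^{N} a_n y(k-n)$$
[Figure: signal-flow graph with a chain of $z^{-1}$ delays on the input weighted by $b_0, b_1, \ldots, b_M$, followed by a chain of $z^{-1}$ delays on the output weighted by $a_1, \ldots, a_{N-1}, a_N$.]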
Direct form I (first the poles)
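$$y(k) = \sum_{n=1}^{N} a_n y(k-n) + \sum_{n=0}^{M} b_n x(k-n)$$
[Figure: signal-flow graph with the feedback section ($a_1, \ldots, a_{N-1}, a_N$) first, followed by the feedforward section ($b_0, b_1, \ldots$), each with its own chain of $z^{-1}$ delays.]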
Direct form II
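$$c(k) = x(k) + \sum_{n=1}^{N} a_n c(k-n)$$
$$y(k) = \sum_{n=0}^{M} b_n c(k-n)$$
(saves memory)
[Figure: signal-flow graph with a single shared chain of $z^{-1}$ delays, feedback weights $a_1, \ldots, a_{N-1}, a_N$ and feedforward weights $b_0, b_1, \ldots, b_M$.]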
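The two direct-form-II equations translate into a short loop with one shared delay line. The sketch below follows the sign convention used on these slides (feedback terms added, denominator $1 - \sum a_n z^{-n}$) and checks the result against a direct-form-I implementation; the function names and the coefficients are hypothetical examples.

```python
def direct_form_ii(x, b, a):
    """b = [b_0, ..., b_M], a = [a_1, ..., a_N].
    c(k) = x(k) + sum_n a_n c(k-n);  y(k) = sum_n b_n c(k-n)."""
    order = max(len(a), len(b) - 1)
    delay = [0.0] * order  # delay[j] holds c(k - 1 - j)
    y = []
    for xk in x:
        ck = xk + sum(a[n] * delay[n] for n in range(len(a)))
        yk = b[0] * ck + sum(b[n] * delay[n - 1] for n in range(1, len(b)))
        y.append(yk)
        if order:
            delay = [ck] + delay[:-1]  # shift the single delay line
    return y


def direct_form_i(x, b, a):
    """y(k) = sum_n b_n x(k-n) + sum_n a_n y(k-n), with separate delay lines."""
    y = []
    for k in range(len(x)):
        acc = sum(b[n] * x[k - n] for n in range(len(b)) if k - n >= 0)
        acc += sum(a[n - 1] * y[k - n] for n in range(1, len(a) + 1) if k - n >= 0)
        y.append(acc)
    return y


x = [1.0, 0.0, 0.0, 0.0, 2.0, -1.0]
b, a = [1.0, 0.5, 0.25], [0.9, -0.2]
y2 = direct_form_ii(x, b, a)
y1 = direct_form_i(x, b, a)  # same output, but needs N + M delays instead of max(N, M)
```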
Cascade Form
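$$H(z) = \frac{\sum_{n=0}^{M} b_n z^{-n}}{1 - \sum_{n=1}^{N} a_n z^{-n}} = A \prod_{n=1}^{\lfloor (N+1)/2 \rfloor} \frac{1 + \beta_{1n} z^{-1} + \beta_{2n} z^{-2}}{1 - \alpha_{1n} z^{-1} - \alpha_{2n} z^{-2}}$$
Implementing every second-order subsystem as direct form II
[Figure: cascade of second-order direct-form-II sections between $x(n)$ and $y(n)$, with overall gain A and section coefficients $\alpha_{1n}, \alpha_{2n}, \beta_{1n}, \beta_{2n}$ for $n = 1, \ldots, \lfloor (N+1)/2 \rfloor$.]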
Parallel Form
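$$H(z) = \frac{\sum_{n=0}^{M} b_n z^{-n}}{1 - \sum_{n=1}^{N} a_n z^{-n}} = \sum_{k=0}^{M-N} C_k z^{-k} + \sum_{n=1}^{\lfloor (N+1)/2 \rfloor} \frac{\gamma_{0n} + \gamma_{1n} z^{-1}}{1 - \alpha_{1n} z^{-1} - \alpha_{2n} z^{-2}}$$
Assuming M = N, and implementing every second-order subsystem as direct form II
[Figure: parallel connection of the constant branch $C_0$ and second-order direct-form-II sections between $x(n)$ and $y(n)$, with section coefficients $\gamma_{0n}, \gamma_{1n}, \alpha_{1n}, \alpha_{2n}$ for $n = 1, \ldots, \lfloor (N+1)/2 \rfloor$.]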
Finite Impulse Response
$$H(z) = \sum_{n=0}^{N-1} h(n) z^{-n}$$
$N \ge 64$
Direct form (4.5.1, Oppenheim & Schafer)
Cascade (4.5.2, Oppenheim & Schafer)
Direct Form
$$y(k) = \sum_{n=0}^{N-1} h(n)\, x(k-n)$$
[Figure: tapped delay line from $x(n)$ to $y(n)$ with $z^{-1}$ delays and tap weights $h_0, h_1, h_2, \ldots, h_{N-2}, h_{N-1}$.]
Cascade Form
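$$H(z) = \sum_{n=0}^{N} h_n z^{-n} = \prod_{n=1}^{N/2} \left( \beta_{0n} + \beta_{1n} z^{-1} + \beta_{2n} z^{-2} \right)$$
Implementing every second-order subsystem in direct form
[Figure: cascade of second-order FIR sections between $x(n)$ and $y(n)$, with section coefficients $\beta_{0n}, \beta_{1n}, \beta_{2n}$ for $n = 1, \ldots, N/2$.]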
Filterbank Implementation
For FIR filters
for $i = 1, 2, \ldots, Q$:
$$s_i(k) = \sum_{n=0}^{N-1} h_i(n)\, s(k-n)$$
Advantages:
It is simple.
Linear phase when carefully designed.
Disadvantage:
Since N is at least 64, high computational requirements:
NQ multiplications and NQ additions per output sample.
For IIR filters
for $i = 1, 2, \ldots, Q$:
$$x_i(k) = \sum_{n=1}^{N} a_n^{(i)}\, x_i(k-n) + \sum_{n=0}^{M} b_n^{(i)}\, s(k-n)$$
Advantages:
It is simple.
Since N and M are around 8, low computational requirements.
Disadvantage:
Phase distortion.
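A rough count of multiplications per output sample makes the comparison concrete. The counts are read off the two difference equations (N products per FIR channel, N + M + 1 products per IIR channel); Q = 16 is an assumed value inside the 8 < Q < 32 range, and the function names are hypothetical.

```python
def fir_bank_multiplications(n_taps, q):
    """N products per channel per output sample, Q channels."""
    return n_taps * q


def iir_bank_multiplications(n_feedback, m_feedforward, q):
    """N feedback products plus M + 1 feedforward products per channel."""
    return (n_feedback + m_feedforward + 1) * q


fir_cost = fir_bank_multiplications(64, 16)     # 1024
iir_cost = iir_bank_multiplications(8, 8, 16)   # 272
```

Under these assumptions the IIR bank needs roughly a quarter of the multiplications, at the price of a nonlinear phase response.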
Spectral Analysis

Discrete-Time Fourier Transform (DTFT).

Short-Time Fourier Transform (STFT).

Windowing effects

Windows

Discrete Fourier Transform (DFT)
Discrete-Time Fourier Transform

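Discrete-Time Fourier Transform (DTFT)
Analysis definition:
$$S(e^{j\omega T}) = \sum_{n=-\infty}^{\infty} s_n e^{-j\omega nT}$$
If $s_n$ is a periodic waveform with period N:
$$S(e^{j\omega T}) = \sum_{n=0}^{N-1} s_n e^{-j\omega nT}$$

DTFT synthesis definition:
$$s_n = \frac{T}{2\pi} \int_0^{2\pi/T} S(e^{j\omega T})\, e^{j\omega nT}\, d\omega \qquad (1)$$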
Short-Time Fourier Transform


Since the speech signal changes with time, we
use short-time analysis.
Hence the Short-Time Fourier Transform
(STFT) is defined as:
$$S'(e^{j\omega T}) = \sum_{n=0}^{N-1} w_n s_n e^{-j\omega nT}$$

Hence, the speech signal is multiplied in time
by a window $w_n$.
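The windowed sum can be evaluated at any single frequency with a few lines of code. The sketch below assumes a Hamming window, $w_n = 0.54 - 0.46\cos(2\pi n/(N-1))$, and an arbitrary 64-sample test tone at 0.125 cycles per sample; the function names are hypothetical, and `omega_T` stands for the product $\omega T$ in radians per sample.

```python
import cmath
import math


def hamming(N):
    """Hamming window: w_n = 0.54 - 0.46 cos(2 pi n / (N - 1))."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]


def short_time_spectrum(frame, window, omega_T):
    """S'(e^{j omega T}) = sum_n w_n s_n exp(-j omega n T)."""
    return sum(w * s * cmath.exp(-1j * omega_T * n)
               for n, (w, s) in enumerate(zip(window, frame)))


N = 64
frame = [math.cos(2 * math.pi * 0.125 * n) for n in range(N)]
w = hamming(N)
at_tone = short_time_spectrum(frame, w, 2 * math.pi * 0.125)
off_tone = short_time_spectrum(frame, w, 2 * math.pi * 0.3)
```

The magnitude at the tone frequency is far larger than the magnitude well away from it; what remains off the tone is the leakage through the window's side lobes.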
Windowing Effects

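$S'(e^{j\omega T})$ is only an approximation to $S(e^{j\omega T})$. Substituting (1) for $s_n$:
$$S'(e^{j\omega T}) = \sum_{n=0}^{N-1} w_n \left[ \frac{T}{2\pi} \int_0^{2\pi/T} S(e^{j\theta T})\, e^{j\theta nT}\, d\theta \right] e^{-j\omega nT}$$
(Juang, Ex. 3.2, pp. 85 and 86)
$$S'(e^{j\omega T}) = \frac{T}{2\pi} \int_0^{2\pi/T} S(e^{j\theta T})\, W(e^{j(\omega - \theta)T})\, d\theta$$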
Graphical Interpretation
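[Figure: $S(e^{j\theta T})$ and the products $S(e^{j\theta T})\, W(e^{j(\omega_1 - \theta)T})$ and $S(e^{j\theta T})\, W(e^{j(\omega_2 - \theta)T})$, labelled (1) and (2), together with the resulting $S'(e^{j\omega T})$.]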
Windows



There are no ideal windows.
Side lobes only contribute to spectral distortion.
Therefore, we wish (Fig. 1.4, Deller):
– Main lobe: narrow bandwidth.
– Side lobes: low amplitude.

The Hamming window is a good choice
(Sec. 1.1.5, Deller).
[Figure: main lobe and side lobes of 101-point FIR filters.]

For large N (narrow-bandwidth analysis) (Figs. 3.11-3.14):
– Good spectral resolution
– Bad time resolution

For small N (wide-bandwidth analysis):
– Bad spectral resolution (very smooth)
– Good time resolution (quasi-stationary segments)
Discrete Fourier Transform (DFT)

The spectrum is sampled as follows:
$$\omega_p = \frac{2\pi p}{TN}, \quad p = 0, 1, 2, \ldots, N-1$$

By substituting it in (1), the DFT is defined as follows:
$$S_p = S(e^{j\omega_p T}) = \sum_{n=0}^{N-1} s_n e^{-j\omega_p nT}, \quad 0 \le p \le N-1$$
The synthesis (inverse) DFT is as follows:
$$s_n = \frac{1}{N} \sum_{p=0}^{N-1} S_p e^{j\omega_p nT}, \quad 0 \le n \le N-1$$
Time resolution depends on N: N samples in time
correspond to N samples in frequency.
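A direct, unoptimised implementation of the DFT pair shows that N frequency samples are enough to recover the N time samples. In this sketch the product $\omega_p n T$ is written as $2\pi pn/N$, and a 1/N factor is placed in the inverse transform so that the round trip is exact; the function names and the test sequence are hypothetical.

```python
import cmath
import math


def dft(s):
    """S_p = sum_{n=0}^{N-1} s_n exp(-j 2 pi p n / N)."""
    N = len(s)
    return [sum(s[n] * cmath.exp(-2j * math.pi * p * n / N) for n in range(N))
            for p in range(N)]


def idft(S):
    """s_n = (1/N) sum_{p=0}^{N-1} S_p exp(+j 2 pi p n / N)."""
    N = len(S)
    return [sum(S[p] * cmath.exp(2j * math.pi * p * n / N) for p in range(N)) / N
            for n in range(N)]


s = [1.0, 2.0, 0.0, -1.0, 0.5, 0.0, 3.0, -2.0]
S = dft(s)
s_back = idft(S)  # equals s up to rounding error
```

In practice the same result is obtained much faster with an FFT routine; the double loop here is only meant to mirror the two sums.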
Filterbank Implementation using
STFT
Uniform Filterbanks
Non-uniform Filterbanks

Window the signal to obtain $s_n(m)$.
Break the signal into N-sample segments and add the segments:
$$u_n(k) = \sum_r s_n(Nr + k), \quad 0 \le k \le N-1$$
Take the DFT:
$$U_n(p) = \sum_{k=0}^{N-1} u_n(k)\, e^{-j\omega_p kT}$$
Modulate the DFT:
$$x_i(n) = e^{j\omega_i nT}\, U_n(i)$$
N = number of filters