AcousticProcessing1

Signal Processing Front End
[Figure: block diagram. Labels: s_k, Spectrum, Signal Processing Front End (Filterbank, MFCC, Linear Prediction Analysis), O = o(1)o(2)..o(T)]
Signal
A quantity that changes with time.
– Continuous signal
– Discrete-time signal
– Digital signal
– Digital data: 00010101

Discrete-Time Signal from a Continuous Signal
– The speech signal is a continuous signal.
– To process the speech signal in a computer, we need to digitize it.

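[Figure: digitization chain with stages numbered 1 to 5. Labels: S(t), S_k, Zero-Order Hold, Quantizer, S~(t), Digitalization, S_d, Store]
Nyquist: $f_s \ge 2 f_w$, where $f_w$ is the highest frequency in the signal. Since $T_s = 1/f_s$, this means $T_s \le 1/(2 f_w)$.
Quantization SNR: $\mathrm{SNR}_{dB} = 6.02\,n + 1.76$, where $n$ is the number of bits per sample.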
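The quantization SNR rule can be checked numerically. The sketch below is an illustration, not part of the original slides: it assumes a full-scale sine input and a uniform mid-rise quantizer on [-1, 1), and the function names `quantize` and `quantization_snr_db` are hypothetical.

```python
import numpy as np

def quantize(x, n_bits):
    """Uniform mid-rise quantizer for samples in [-1, 1] (assumed design)."""
    levels = 2 ** n_bits
    step = 2.0 / levels
    idx = np.floor(x / step)                      # index of the quantization cell
    idx = np.clip(idx, -levels // 2, levels // 2 - 1)
    return (idx + 0.5) * step                     # reconstruct at the cell midpoint

def quantization_snr_db(n_bits, num_samples=100_000):
    """Measured SNR (dB) of a full-scale sine after n_bits quantization."""
    k = np.arange(num_samples)
    x = np.sin(2 * np.pi * 0.0123456789 * k)      # full-scale test tone
    e = x - quantize(x, n_bits)                   # quantization error
    return 10 * np.log10(np.mean(x ** 2) / np.mean(e ** 2))

if __name__ == "__main__":
    for n in (8, 12, 16):
        print(n, "bits: measured", round(quantization_snr_db(n), 2),
              "dB, rule", round(6.02 * n + 1.76, 2), "dB")
```

Under these assumptions, each extra bit should add roughly 6 dB, in line with the formula above.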
Spectrum


Representation of a time signal in the frequency domain.
Tools:
– Fourier Transform (continuous signals)
– Fourier Series (periodic continuous signals)
– Discrete-Time Fourier Transform (discrete-time signals, periodic and non-periodic)
– Discrete Fourier Transform (discrete-time signals)
– Fast Fourier Transform (discrete-time signals of length $2^x$)
Spectral Analysis

Discrete-Time Fourier Transform (DTFT)
– Short-Time Fourier Transform (STFT)
– Windowing effects
Discrete Fourier Transform (DFT)
– Short-Time Fourier Transform (STFT)
Discrete-Time Fourier Transform

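Discrete-Time Fourier Transform (DTFT)
Analysis definition (forward transform):
$$S(e^{j\omega T}) = \sum_{n=-\infty}^{\infty} s_n \, e^{-j\omega nT}$$
If $s_n$ is a periodic waveform with period $N$, the sum is taken over one period:
$$S(e^{j\omega T}) = \sum_{n=0}^{N-1} s_n \, e^{-j\omega nT} \qquad (1)$$
Synthesis definition (inverse transform):
$$s_n = \frac{T}{2\pi} \int_{0}^{2\pi/T} S(e^{j\omega T}) \, e^{j\omega nT} \, d\omega$$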
Short-Time Fourier Transform


Since the speech signal changes with time, we make a short-time analysis.
Hence the Short-Time Fourier Transform (STFT) is defined as:
$$S'(e^{j\omega T}) = \sum_{n=0}^{N-1} w_n \, s_n \, e^{-j\omega nT}$$
That is, the speech signal is multiplied in time by a window $w_n$.
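The STFT sum can be evaluated directly at any frequency. The following is a minimal sketch, assuming a Hamming window and an 8 kHz sampling rate (neither is specified in the slides); `short_time_spectrum` is a hypothetical helper name.

```python
import numpy as np

def short_time_spectrum(s, w, omega, T):
    """Evaluate S'(e^{j omega T}) = sum_n w_n s_n e^{-j omega n T}.

    s     : signal samples (at least len(w) of them)
    w     : window of length N
    omega : one angular frequency or an array of them, in rad/s
    T     : sampling period in seconds
    """
    N = len(w)
    n = np.arange(N)
    omega = np.atleast_1d(omega)
    kernel = np.exp(-1j * np.outer(omega, n) * T)   # shape (len(omega), N)
    return kernel @ (w * np.asarray(s)[:N])

if __name__ == "__main__":
    fs = 8000.0                      # assumed sampling rate
    T = 1.0 / fs
    N = 256
    s = np.cos(2 * np.pi * 1000.0 * np.arange(N) * T)   # 1 kHz test tone
    w = np.hamming(N)
    mags = np.abs(short_time_spectrum(s, w, 2 * np.pi * np.array([1000.0, 2000.0]), T))
    print(mags)                      # large at 1 kHz, small at 2 kHz
```

Evaluating the same function on the grid $\omega_p = 2\pi p/(NT)$ gives the same values as an FFT of the windowed frame.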
Windowing Effects

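$S'(e^{j\omega T})$ is only an approximation to $S(e^{j\omega T})$. Writing $s_n$ as its inverse DTFT:
$$S'(e^{j\omega T}) = \sum_{n=0}^{N-1} w_n \left[ \frac{T}{2\pi} \int_{0}^{2\pi/T} S(e^{j\theta T}) \, e^{j\theta nT} \, d\theta \right] e^{-j\omega nT}$$
$$S'(e^{j\omega T}) = \frac{T}{2\pi} \int_{0}^{2\pi/T} S(e^{j\theta T}) \, W(e^{j(\omega-\theta)T}) \, d\theta$$
Effect: the window smooths the spectrum.
[Figure: window spectrum showing the main lobe and side lobes; 101-point FIR filters]
[Figure: Narrow Bandwidth Analysis]
[Figure: Wide Bandwidth Analysis]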
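The main-lobe and side-lobe trade-off can be measured from the window spectrum $W$. The sketch below is an illustration only: it compares a rectangular and a Hamming window of 101 points (the length on the figure; the choice of windows is an assumption), and `main_lobe_and_side_lobe` is a hypothetical helper.

```python
import numpy as np

def window_response_db(w, nfft=8192):
    """Magnitude of the window spectrum in dB, normalized to 0 dB at DC."""
    W = np.abs(np.fft.rfft(w, nfft))              # zero-padded for a dense grid
    return 20 * np.log10(W / W[0] + 1e-12)

def main_lobe_and_side_lobe(w, nfft=8192):
    """Return (null-to-null main-lobe width in cycles/sample, peak side lobe in dB)."""
    W_db = window_response_db(w, nfft)
    k = 0
    # Walk down the main lobe until the response stops decreasing (first null).
    while k < len(W_db) - 1 and W_db[k + 1] < W_db[k]:
        k += 1
    width = 2.0 * k / nfft
    peak_side_lobe = W_db[k:].max()
    return width, peak_side_lobe

if __name__ == "__main__":
    N = 101
    for name, w in (("rectangular", np.ones(N)), ("hamming", np.hamming(N))):
        width, side = main_lobe_and_side_lobe(w)
        print(name, "main lobe width:", round(width, 4), "peak side lobe (dB):", round(side, 1))
```

A wider main lobe means more smoothing of the spectrum; lower side lobes mean less leakage from distant frequencies.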
Discrete Fourier Transform (DFT)

The spectrum is sampled as follows:
$$\omega_p = \frac{2\pi p}{TN}, \qquad p = 0, 1, 2, \ldots, N-1$$
By substituting this into (1), the DFT is defined as follows:
$$S_p = S(e^{j\omega_p T}) = \sum_{n=0}^{N-1} s_n \, e^{-j\omega_p nT}, \qquad 0 \le p \le N-1$$
The inverse (synthesis) DFT is as follows:
$$s_n = \frac{1}{N} \sum_{p=0}^{N-1} S_p \, e^{j\omega_p nT}, \qquad 0 \le n \le N-1$$
Time resolution depends on N: N samples in time correspond to N samples in frequency.
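As a check on the definitions, the DFT and its inverse can be written directly as matrix products, using $\omega_p nT = 2\pi pn/N$. This is a sketch for illustration (an FFT would be used in practice); the names `dft` and `idft` are hypothetical, and the inverse includes the $1/N$ normalization needed for the round trip.

```python
import numpy as np

def dft(s):
    """S_p = sum_n s_n exp(-j 2 pi p n / N), for p = 0..N-1."""
    s = np.asarray(s, dtype=complex)
    N = len(s)
    n = np.arange(N)
    p = n.reshape(-1, 1)
    return np.exp(-2j * np.pi * p * n / N) @ s

def idft(S):
    """s_n = (1/N) sum_p S_p exp(+j 2 pi p n / N), for n = 0..N-1."""
    S = np.asarray(S, dtype=complex)
    N = len(S)
    p = np.arange(N)
    n = p.reshape(-1, 1)
    return (np.exp(2j * np.pi * n * p / N) @ S) / N

if __name__ == "__main__":
    x = np.random.default_rng(0).standard_normal(16)
    print(np.allclose(dft(x), np.fft.fft(x)))     # matches NumPy's FFT
    print(np.allclose(idft(dft(x)), x))           # round trip recovers the signal
```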
STDFT
[Figure: successive frames of the signal each pass through a DFT block, giving observations o(1), o(2), o(3), o(4)]
Time-Frequency Representation:
Sonogram
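A time-frequency representation of this kind can be sketched by stacking the DFT magnitudes of successive windowed frames. The frame length, hop size, Hamming window, and dB scale below are assumptions chosen for illustration, and `spectrogram` is a hypothetical function name.

```python
import numpy as np

def spectrogram(s, N=256, hop=128):
    """Log-magnitude short-time DFT; returns an array of shape (frames, N//2 + 1)."""
    s = np.asarray(s, dtype=float)
    w = np.hamming(N)
    num_frames = 1 + (len(s) - N) // hop          # requires len(s) >= N
    frames = np.stack([s[i * hop:i * hop + N] * w for i in range(num_frames)])
    return 20 * np.log10(np.abs(np.fft.rfft(frames, axis=1)) + 1e-10)

if __name__ == "__main__":
    fs = 8000
    t = np.arange(fs) / fs
    # Test signal: 500 Hz for the first half second, 2000 Hz afterwards.
    s = np.where(t < 0.5, np.sin(2 * np.pi * 500 * t), np.sin(2 * np.pi * 2000 * t))
    S = spectrogram(s)
    print(S.shape, S[0].argmax(), S[-1].argmax())
```

Each row of the result plays the role of one observation o(t); plotting the array as an image (time on one axis, frequency on the other) gives the sonogram.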

Can we use the whole sampled spectrum as our feature observations for the Automatic Speech Recognition task?
Reducing the information:
– Filterbank energies as parameters
– Formant frequencies as parameters
– Cepstral coefficients

Filterbank
– Uniform Filterbank
– Non-uniform Filterbank
– Filterbank Front End
Uniform Filterbank
Non-uniform Filterbank
Filter-Bank Front End
[Figure: each frame passes through a filterbank (FB) block, giving observations o(1), o(2), o(3), o(4)]
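One simple way to obtain filterbank energies is to sum the power spectrum of a frame over equal-width frequency bands. This is only a sketch of a uniform filterbank with rectangular, non-overlapping bands; the slides do not specify the filter shapes, and `uniform_filterbank_energies` is a hypothetical name.

```python
import numpy as np

def uniform_filterbank_energies(frame, num_bands=8):
    """Energy in num_bands equal-width bands between 0 and fs/2."""
    frame = np.asarray(frame, dtype=float)
    power = np.abs(np.fft.rfft(frame * np.hamming(len(frame)))) ** 2
    bands = np.array_split(power, num_bands)      # contiguous, near-equal bands
    return np.array([band.sum() for band in bands])

if __name__ == "__main__":
    fs, N = 8000, 256
    n = np.arange(N)
    frame = np.sin(2 * np.pi * 1250 * n / fs)     # 1250 Hz test tone
    energies = uniform_filterbank_energies(frame)
    print(energies.argmax())                      # index of the band holding the tone
```

The observation vector shrinks from N/2 + 1 spectral samples to `num_bands` values per frame, which is the reduction the filterbank front end is after.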

Can we find better parameters?
Introduction
$$S(\omega) = E(\omega)\,H(\omega)$$
Since excitation information is not needed for ASR (in English), it is desirable to separate the excitation information from the vocal tract information.
[Figure: vocal tract spectrum $H(\omega)$ and excitation spectrum $E(\omega)$]
We can think of the speech spectrum as a signal and observe that it is the product of a slow signal, $H(\omega)$, and a fast signal, $E(\omega)$.
We would like to separate the two by linear filtering of the spectrum signal; however, the excitation and vocal tract signals are multiplied, and linear filters are only useful for separating signals that are added.

On the other hand, linear filtering is convenient to use because it is simple to implement.

Therefore, we transform this non-linear relationship into a linear one using a log transformation:
$$\log S(\omega) = \log E(\omega) + \log H(\omega)$$


In the log domain, we can filter out the excitation signal and keep the vocal tract information. This is called cepstral analysis.

If we take into account only the magnitude of the spectrum in the log transform, we obtain the (real) cepstrum. Since the vocal tract information is in the slowly varying part of the spectrum, we keep the low-order cepstral coefficients.

CC block: DFT → |.|² → Log() → IDFT → Cepstral Coefficients
[Figure: each frame passes through a CC block, giving observations o(1), o(2), o(3), o(4)]
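The DFT, |.|², Log(), IDFT chain can be sketched in a few lines. The Hamming window, the small constant added before the log, and the choice of 13 coefficients are assumptions made for illustration; `real_cepstrum` and `cepstral_coefficients` are hypothetical names.

```python
import numpy as np

def real_cepstrum(frame):
    """IDFT of the log power spectrum of a windowed frame."""
    frame = np.asarray(frame, dtype=float)
    spectrum = np.fft.fft(frame * np.hamming(len(frame)))
    log_power = np.log(np.abs(spectrum) ** 2 + 1e-12)   # small constant avoids log(0)
    return np.fft.ifft(log_power).real

def cepstral_coefficients(frame, num_coeffs=13):
    """Keep only the low-order coefficients (the slowly varying part)."""
    return real_cepstrum(frame)[:num_coeffs]

if __name__ == "__main__":
    N, period = 512, 50
    excitation = np.zeros(N)
    excitation[::period] = 1.0                    # impulse train as a toy excitation
    h = 0.9 ** np.arange(200)                     # toy "vocal tract" impulse response
    frame = np.convolve(excitation, h)[:N]
    c = real_cepstrum(frame)
    print(25 + int(np.argmax(c[25:N // 2])))      # position of the excitation peak
    print(cepstral_coefficients(frame))
```

In this toy example the excitation shows up as a peak far from the origin, near the impulse spacing, while the low-order coefficients describe the smooth spectral envelope.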
MFCC

If:
– instead of the IDFT used to obtain the cepstrum, a Cosine Transform is used, and
– instead of feeding the Cosine Transform the energy at each frequency, the energy at the output of a non-uniform filterbank is used,

we obtain the MFCC (Mel-Frequency Cepstral Coefficients).
MFCC block: DFT → Non-Uniform Filterbank → |.|² → Log() → Cosine Transform
[Figure: each frame passes through an MFCC block, giving observations o(1), o(2), o(3), o(4)]
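The MFCC chain can be sketched as follows. This is one common variant, not necessarily what the Sphinx front end does: it applies triangular mel-spaced filters to the power spectrum and uses an unnormalized DCT-II. The mel formula, filter count, and all function names are assumptions for illustration.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)     # a widely used mel-scale formula

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(num_filters, nfft, fs):
    """Triangular filters with centers equally spaced on the mel scale."""
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2.0), num_filters + 2)
    bins = np.floor((nfft + 1) * mel_to_hz(mel_pts) / fs).astype(int)
    fb = np.zeros((num_filters, nfft // 2 + 1))
    for m in range(1, num_filters + 1):
        lo, mid, hi = bins[m - 1], bins[m], bins[m + 1]
        for k in range(lo, mid):                  # rising edge
            fb[m - 1, k] = (k - lo) / (mid - lo)
        for k in range(mid, hi):                  # falling edge
            fb[m - 1, k] = (hi - k) / (hi - mid)
    return fb

def dct_ii(x, num_coeffs):
    """Unnormalized DCT-II: c_k = sum_m x_m cos(pi k (m + 0.5) / M)."""
    x = np.asarray(x, dtype=float)
    M = len(x)
    m = np.arange(M)
    k = np.arange(num_coeffs).reshape(-1, 1)
    return np.cos(np.pi * k * (m + 0.5) / M) @ x

def mfcc(frame, fs, num_filters=20, num_coeffs=13):
    frame = np.asarray(frame, dtype=float)
    nfft = len(frame)
    power = np.abs(np.fft.rfft(frame * np.hamming(nfft))) ** 2
    energies = mel_filterbank(num_filters, nfft, fs) @ power
    return dct_ii(np.log(energies + 1e-10), num_coeffs)

if __name__ == "__main__":
    fs, N = 8000, 256
    n = np.arange(N)
    frame = np.sin(2 * np.pi * 1000 * n / fs) + 0.5 * np.sin(2 * np.pi * 2500 * n / fs)
    print(mfcc(frame, fs))
```

Applying `mfcc` to each successive frame gives one observation vector o(t) per frame.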

Sphinx:
– $base_dir/preprocessing/adc2mfcc.8k.csh