Part6

Speech Coding
Waveform Coding
Vocoders


Middle Term Evaluation
Waveform Coding

In the time domain
– PCM
– Delta PCM (DPCM)
– Adaptive DPCM

In the Frequency Domain
– Filterbank spectrum Analyser
– Subband coding
– Adaptive Transform Coding
– Vector Waveform Quantisation
Pulse Code Modulation
[Figure: PCM block producing the bitstream 11010010....]
– Uniform quantiser
– Nonuniform quantiser
Uniform Quantiser
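[Figure: uniform quantiser. Sampled signal s_n against time t, with sampling period T = 1/Fs and equally spaced 2-bit levels 00, 01, 10, 11; spectrum S(w) band-limited to wc, with Fs > 2Wc.]
Encoding: 11010010....
Each sample of the signal is quantised to one of 2^R amplitude values.
Rate = R·Fs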
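The uniform quantiser above can be sketched in a few lines. This is only an illustration: the function names, the 8 kHz sampling rate, the test tone and the mid-interval reconstruction rule are assumptions, not taken from the slides.

```python
import numpy as np

def uniform_quantise(x, n_bits, x_max=1.0):
    """Map samples in [-x_max, x_max) to integer codes 0 .. 2**n_bits - 1."""
    levels = 2 ** n_bits
    step = 2.0 * x_max / levels
    codes = np.floor((np.asarray(x, dtype=float) + x_max) / step).astype(int)
    return np.clip(codes, 0, levels - 1), step

def uniform_dequantise(codes, step, x_max=1.0):
    """Reconstruct each code at the midpoint of its interval."""
    return (np.asarray(codes) + 0.5) * step - x_max

fs = 8000                                  # assumed sampling rate in Hz
n_bits = 2                                 # R = 2 bits per sample, as in the figure
t = np.arange(16) / fs
s = 0.9 * np.sin(2 * np.pi * 440 * t)      # assumed test tone
codes, step = uniform_quantise(s, n_bits)
s_hat = uniform_dequantise(codes, step)
bit_rate = n_bits * fs                     # Rate = R * Fs
```

As long as the input stays inside the quantiser range, the reconstruction error is at most half a step.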
Nonuniform quantiser
– A-law
– μ-law
[Figure: nonuniform quantiser. Sampled signal against time t, with unequally spaced 2-bit levels 00, 01, 10, 11.]
Encoding: 11010011....
Rate = R·Fs
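One common way to realise a nonuniform quantiser is to compress the signal, quantise it uniformly, and expand it again at the receiver. The sketch below shows the μ-law compression and expansion curves only; the value μ = 255 and the function names are assumptions, since the slides give no parameters.

```python
import numpy as np

def mu_law_compress(x, mu=255.0):
    """Compress samples in [-1, 1]; small amplitudes get a larger share of the range."""
    x = np.asarray(x, dtype=float)
    return np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)

def mu_law_expand(y, mu=255.0):
    """Inverse of mu_law_compress."""
    y = np.asarray(y, dtype=float)
    return np.sign(y) * np.expm1(np.abs(y) * np.log1p(mu)) / mu

x = np.linspace(-1.0, 1.0, 101)
y = mu_law_compress(x)
x_back = mu_law_expand(y)
```

Quantising `y` uniformly is equivalent to quantising `x` with fine steps near zero and coarse steps near full scale.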
Delta PCM


Since successive speech samples are highly correlated, the average amplitude change between successive samples is very small.
Therefore, by encoding the differences between successive samples, fewer bits are required.
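A small numerical illustration of this point, using a slowly varying tone as an assumed stand-in for speech (the signal and sampling rate are not from the slides):

```python
import numpy as np

fs = 8000                                   # assumed sampling rate in Hz
n = np.arange(400)
s = np.sin(2 * np.pi * 200 * n / fs)        # assumed, highly correlated test signal

# d[n] = s[n] - s[n-1], taking s[-1] = 0
d = np.diff(s, prepend=0.0)

# The receiver recovers the samples by accumulating the differences
rebuilt = np.cumsum(d)

peak_sample = np.max(np.abs(s))
peak_difference = np.max(np.abs(d))
```

The differences span a much smaller range than the samples themselves, so the same step size can be covered with fewer quantiser levels.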
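DPCM coder-decoder
Goal: decorrelate the speech signal. Therefore, a simple long-term LP predictor is enough.
[Figure: DPCM block diagram. Encoder: s(n), the prediction ^s~(n) from the predictor (LP analyser), the error e(n), and the quantiser producing e~(n). Decoder: the error received from the channel is added to ^s~(n) to give s~(n), which feeds the predictor (LP analyser).]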
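DPCM coder-decoder
(this version ensures that the error in s~(n) is only the quantisation error)
[Figure: DPCM block diagram with the predictor inside the quantisation loop. Encoder: e(n) = s(n) - ^s~(n) is quantised to e~(n); the predictor (LP analyser) is driven by s~(n), the sampled signal modified by the quantisation process. Decoder: the error received from the channel is added to ^s~(n) to give s~(n), which feeds the decoder's predictor (LP analyser).]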
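The closed-loop structure can be sketched as follows. The first-order predictor with coefficient 0.9, the unbounded rounding quantiser and the test signal are all assumptions chosen to keep the example short; the slides do not specify them.

```python
import numpy as np

def quantise(e, step):
    """Unbounded uniform quantiser (no clipping), assumed for illustration."""
    return step * np.round(e / step)

def dpcm_encode(s, a, step):
    """Predict from the reconstructed signal, quantise the prediction error."""
    e_q = np.zeros(len(s))
    s_rec_prev = 0.0
    for n, sample in enumerate(s):
        pred = a * s_rec_prev              # prediction from s~(n-1)
        e_q[n] = quantise(sample - pred, step)
        s_rec_prev = pred + e_q[n]         # same s~(n) the decoder will compute
    return e_q

def dpcm_decode(e_q, a):
    """Add the received error to the prediction to rebuild s~(n)."""
    s_rec = np.zeros(len(e_q))
    prev = 0.0
    for n, e in enumerate(e_q):
        prev = a * prev + e
        s_rec[n] = prev
    return s_rec

step = 0.05
s = 0.9 * np.sin(2 * np.pi * np.arange(200) / 40)   # assumed test signal
s_rec = dpcm_decode(dpcm_encode(s, a=0.9, step=step), a=0.9)
max_error = np.max(np.abs(s - s_rec))
```

Because encoder and decoder predict from the same reconstructed samples, the reconstruction error equals the quantisation error of e(n) and does not accumulate.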
DPCM coder-decoder
(improved version)
[Figure: improved DPCM block diagram. Encoder blocks: quantiser, all-zero linear filter, predictor (LP analyser); signals: s(n), e(n), e~(n), s~(n), ^s~(n). Decoder: the error received from the channel is added to ^s~(n) to give s~(n), using the same filter and predictor (LP analyser) blocks.]
Adaptive PCM and DPCM
PCM and DPCM assume the speech signal is stationary.
The coding process can be improved by assuming the speech is quasi-stationary.
An improvement is to use an adaptive quantiser.
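One possible adaptive quantiser is sketched below: a 2-bit quantiser whose step size grows after an outer-level code and shrinks after an inner-level code, so the decoder can follow the step size from the codes alone. The multipliers 0.8 and 1.6, the step limits and the test signal are illustrative assumptions, not values from the slides.

```python
import numpy as np

def next_step(step, outer, shrink=0.8, grow=1.6, lo=1e-3, hi=1.0):
    """Grow the step after an outer-level code, shrink it after an inner one."""
    return float(np.clip(step * (grow if outer else shrink), lo, hi))

def adaptive_encode(x, step0=0.05):
    """2-bit codes (sign bit, inner/outer bit) with backward step adaptation."""
    step, codes = step0, []
    for sample in x:
        outer = bool(abs(sample) > step)
        codes.append((bool(sample < 0), outer))
        step = next_step(step, outer)
    return codes

def adaptive_decode(codes, step0=0.05):
    """Rebuild samples from the codes, tracking the same step sizes."""
    step, out = step0, []
    for negative, outer in codes:
        level = (1.5 if outer else 0.5) * step
        out.append(-level if negative else level)
        step = next_step(step, outer)
    return np.array(out)

n = np.arange(400)
x = np.where(n < 200, 0.02, 0.5) * np.sin(2 * np.pi * n / 40)   # quiet, then loud
x_hat = adaptive_decode(adaptive_encode(x))
```

The reconstructed level follows the change from the quiet to the loud segment without any side information about the step size.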

Adaptive DPCM
[Figure: adaptive DPCM block diagram. Encoder blocks: quantiser with step-size adaptation, all-zero linear filter, predictor (LP analyser) with predictor adaptation; signals: s(n), e(n), e~(n), s~(n), ^s~(n). Decoder: the error received from the channel is added to ^s~(n) to give s~(n), using predictor (LP analyser) blocks.]
Vocoders
Channel vocoder
Cepstral vocoder
Phase vocoder
Formant vocoder
Linear prediction coder
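Cepstral Vocoder
[Figure: cepstral vocoder block diagram. Encoder: s(n) goes through stDFT, Log |.|, IDFT and a “low-time” lifter w(n); a pitch estimator runs alongside. Decoder: the data from the channel goes through DFT, Exp(.) and IDFT, and the result is convolved with the excitation chosen by the V/U switch from a pitch pulse generator or a noise generator.]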
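The cepstral path of the diagram (stDFT, Log |.|, IDFT, low-time lifter, then DFT, Exp(.), IDFT at the decoder) can be sketched for a single frame as follows. The frame length, the lifter cutoff of 30 samples and the random test frame are assumptions for illustration only.

```python
import numpy as np

def real_cepstrum(frame):
    """stDFT -> Log|.| -> IDFT for one (already windowed) frame."""
    log_mag = np.log(np.abs(np.fft.fft(frame)) + 1e-12)   # offset avoids log(0)
    return np.fft.ifft(log_mag).real

def low_time_lifter(cepstrum, cutoff):
    """Keep quefrencies below `cutoff` (and their mirror images), zero the rest."""
    keep = np.zeros(len(cepstrum))
    keep[:cutoff] = 1.0
    if cutoff > 1:
        keep[-(cutoff - 1):] = 1.0
    return cepstrum * keep

def impulse_response(liftered):
    """Decoder side: DFT -> Exp(.) -> IDFT."""
    return np.fft.ifft(np.exp(np.fft.fft(liftered))).real

rng = np.random.default_rng(0)
frame = np.hamming(256) * rng.standard_normal(256)   # stand-in for a speech frame
c = real_cepstrum(frame)
h = impulse_response(low_time_lifter(c, 30))
```

The decoder would convolve `h` with a pulse train or noise, depending on the voiced/unvoiced decision.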
Linear Prediction in Speech Coding
Introduction
Generalities
Methods

Introduction


These speech coders are called vocoders (voice coders).
Basic Idea
Estimate parameters → Encode parameters → Transmit parameters → Decode parameters → Synthesise speech
They usually provide more bandwidth compression than is possible with waveform coding (2400-9600 bps).
Generalities
LP Model
Parameter Estimation
Typical Memory Requirements

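LP Model
[Figure: LP model. A voiced impulse generator (driven by the pitch period) and an unvoiced white-noise generator feed a voiced/unvoiced switch; the selected excitation, scaled by the gain, drives an all-pole filter (glottal filter, vocal tract filter, lip radiation filter) that produces the speech signal.]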
Parameter Estimation

Therefore, for each frame:
– Estimate the LP coefficients (the a_i's).
– Estimate the gain.
– Estimate the type of excitation (voiced or unvoiced).
– Estimate the pitch.
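The first two steps can be sketched with the autocorrelation method and the Levinson-Durbin recursion. This is a swapped-in, commonly used technique chosen for brevity; the slides do not say which estimator is intended here, and the function names, predictor order and test frame are assumptions. The convention is that s(n) is predicted as the sum of a_i·s(n-i).

```python
import numpy as np

def autocorrelation(frame, order):
    """Autocorrelation lags 0 .. order of one frame."""
    n = len(frame)
    return np.array([np.dot(frame[: n - lag], frame[lag:]) for lag in range(order + 1)])

def levinson_durbin(r, order):
    """Return predictor coefficients a_1..a_p, reflection coefficients, error energy."""
    a = np.zeros(order + 1)
    k = np.zeros(order + 1)
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - np.dot(a[1:i], r[i - 1:0:-1])
        k[i] = acc / err
        a_prev = a.copy()
        a[i] = k[i]
        a[1:i] = a_prev[1:i] - k[i] * a_prev[i - 1:0:-1]
        err *= 1.0 - k[i] ** 2
    return a[1:], k[1:], err

rng = np.random.default_rng(1)
frame = np.hamming(180) * rng.standard_normal(180)   # stand-in for a speech frame
order = 10
r = autocorrelation(frame, order)
a, k, err = levinson_durbin(r, order)
gain = np.sqrt(err)                                  # gain from the residual energy
```

The recursion also yields the reflection coefficients `k` as a by-product, each with magnitude below 1.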
Typical Memory Requirements
Pitch coefficient (6 bits).
Gain (5 bits).
Model parameters:
– LP coefficients (8-10 bits): small changes in the LP coefficients result in large changes in the pole positions.
– Reflection coefficients (6 bits): if |r_k| is near 1, quantisation causes large distortion.
– Log-area ratios: a non-linear transformation of the reflection coefficients that expands the scale where |r_k| is near 1.
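A minimal sketch of the log-area-ratio transformation, assuming the form LAR_k = log((1 + r_k) / (1 - r_k)); some texts use the reciprocal ratio, which only flips the sign. The function names are hypothetical.

```python
import numpy as np

def reflection_to_lar(r):
    """LAR_k = log((1 + r_k) / (1 - r_k)), valid for |r_k| < 1."""
    r = np.asarray(r, dtype=float)
    return np.log((1.0 + r) / (1.0 - r))

def lar_to_reflection(lar):
    """Inverse mapping: r_k = tanh(LAR_k / 2)."""
    return np.tanh(np.asarray(lar, dtype=float) / 2.0)

r = np.array([-0.95, -0.5, 0.0, 0.5, 0.98, 0.99])
lar = reflection_to_lar(r)
r_back = lar_to_reflection(lar)

# The same 0.01 change in r_k covers far more of the LAR scale near |r_k| = 1
spread_near_one = lar[5] - lar[4]
spread_near_zero = float(reflection_to_lar(0.01) - reflection_to_lar(0.0))
```

Quantising the LAR values uniformly therefore gives finer resolution to reflection coefficients close to ±1.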
Methods
Introduction
LPC-10
Analysis-by-Synthesis

Introduction

The main difference between the LP vocoders is how the excitation source is calculated.
LPC-10
[Figure: LPC-10 block diagram. Encoder: the speech signal is sampled by the ADC (8 kHz) and windowed (180 samples); AMDF and zero crossing give the pitch period and the voice/unvoice decision; LP analysis (covariance method) gives the reflection coefficients, which pass through non-linear warping to give LAR coefficients. Decoder: an impulse generator or a white noise generator, selected by the voice/unvoice switch, drives 1/A(z) to produce the synthesised speech signal.]
Encoder outputs:
– Pitch period (7 bits)
– Voice/unvoice switch (1 bit)
– Gain (5 bits)
– LAR coefficients (4 bits and 5 bits)
Decoder inputs:
– Pitch frequency (7 bits)
– Voice/unvoice switch (1 bit)
– 10 reflection coefficients (5 bits for one and 4 bits for the others)
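The "AMDF and zero crossing" block can be sketched as below. This is a simplified illustration, not the LPC-10 procedure itself: the search range, the function names and the test signals are assumptions, and a real coder would add smoothing and voicing logic.

```python
import numpy as np

def amdf(frame, min_lag, max_lag):
    """Average magnitude difference function over a range of lags."""
    lags = np.arange(min_lag, max_lag + 1)
    values = np.array([np.mean(np.abs(frame[lag:] - frame[:-lag])) for lag in lags])
    return lags, values

def pitch_period(frame, fs, f_min=60.0, f_max=400.0):
    """Lag (in samples) at which the AMDF is smallest."""
    lags, values = amdf(frame, int(fs / f_max), int(fs / f_min))
    return int(lags[np.argmin(values)])

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    signs = np.signbit(frame)
    return float(np.mean(signs[1:] != signs[:-1]))

fs = 8000                                        # 8 kHz, as in the diagram
n = np.arange(180)                               # 180-sample window
voiced = np.sin(2 * np.pi * 100 * n / fs)        # assumed 100 Hz voiced-like tone
unvoiced = np.random.default_rng(0).standard_normal(180)

period = pitch_period(voiced, fs)
zcr_voiced = zero_crossing_rate(voiced)
zcr_unvoiced = zero_crossing_rate(unvoiced)
```

A low zero-crossing rate together with a deep AMDF minimum points to a voiced frame; a high rate points to an unvoiced one.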
Analysis-by-Synthesis Methods
Introduction
Multipulse LPC Vocoder
Regular Excited Linear Prediction (RELP)
Code Excited Linear Prediction (CELP) Vocoder

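Introduction
[Figure: analysis-by-synthesis encoder. The sampled speech is buffered for LP analysis; a multipulse excitation generator drives the LP synthesis filter; the synthesis error passes through a perceptual weighting filter, and an error minimisation block controls the excitation generator.]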
Multipulse LPC vocoder
Multipulse excitation consists of a short sequence of pulses chosen to minimise the energy of the perceptual error.
For simplicity, the amplitudes and locations of the impulses are obtained sequentially, by minimising the error energy for one pulse at a time.
In practice, 4-8 pulses are calculated every 5 ms.
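The one-pulse-at-a-time search can be sketched as a greedy loop. Here `target` stands for the (weighted) speech segment and `h` for the impulse response of the (weighted) synthesis filter; both names, the 40-sample subframe and the example filter are assumptions, and the perceptual weighting itself is left out.

```python
import numpy as np

def multipulse_search(target, h, n_pulses):
    """Place pulses one at a time, each minimising the remaining error energy."""
    n = len(target)
    # Column m holds the filter response to a unit pulse at position m
    basis = np.zeros((n, n))
    for m in range(n):
        seg = h[: n - m]
        basis[m:m + len(seg), m] = seg
    energy = np.sum(basis ** 2, axis=0)
    residual = np.asarray(target, dtype=float).copy()
    locations, amplitudes = [], []
    for _ in range(n_pulses):
        corr = basis.T @ residual
        m = int(np.argmax(corr ** 2 / energy))   # largest error reduction
        amp = corr[m] / energy[m]                # optimal amplitude at that location
        residual -= amp * basis[:, m]
        locations.append(m)
        amplitudes.append(float(amp))
    return locations, amplitudes, residual

h = 0.8 ** np.arange(40)                         # assumed synthesis-filter response
target = np.zeros(40)
target[7:] = 1.5 * h[:33]                        # response to one pulse at n = 7
locations, amplitudes, residual = multipulse_search(target, h, n_pulses=1)
```

With more pulses, earlier amplitudes are no longer optimal once later pulses are added, because the shifted responses overlap.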

Multipulse LPC
[Figure: multipulse LPC block diagram. Encoder: buffer and LP analysis of the sampled speech; multipulse excitation generator, pitch synthesis filter and LP synthesis filter in an error minimisation loop with a perceptual weighting filter. Decoder: excitation generator E(z) followed by 1/A(z), producing the synthesised speech signal.]
Parameters sent over the channel:
– Pulses’ locations (4 bits)
– Pulses’ amplitude (4 bits)
– Scale factor (6 bits)
– Pitch filter parameters (6 bits)
– 10-12 reflection coefficients (5 bits)
Memory Requirements
Updated every 5 ms:
– Scale factor (largest amplitude), log quantised: 6 bits.
– Pulse amplitude (relative to the largest one), linearly quantised: 4 bits.
Updated every 20 ms:
– Vocal tract parameters (reflection coefficients): 6 bits.
– Pitch period: 6 bits.
Effective for good-quality speech at 9600 bps.
They have been used for airborne mobile satellite telephone service.
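As a rough consistency check, the bit rate implied by these allocations can be computed. The layout below is an assumption: 4 pulses per 5 ms, one 4-bit location per pulse, the largest pulse carried by the scale factor alone (so only the other pulses need a relative amplitude), and 10 reflection coefficients. The slides quote both 5 and 6 bits per reflection coefficient, so the width is left as a parameter.

```python
def multipulse_bit_rate(n_pulses=4, n_coeffs=10, coeff_bits=5,
                        scale_bits=6, amp_bits=4, loc_bits=4, pitch_bits=6):
    """Bits per second for the multipulse parameter set (assumed layout)."""
    # Every 5 ms (200 updates per second): scale factor for the largest pulse,
    # relative amplitudes for the remaining pulses, one location per pulse.
    bits_5ms = scale_bits + (n_pulses - 1) * amp_bits + n_pulses * loc_bits
    # Every 20 ms (50 updates per second): reflection coefficients and pitch.
    bits_20ms = n_coeffs * coeff_bits + pitch_bits
    return 200 * bits_5ms + 50 * bits_20ms

rate_5bit = multipulse_bit_rate()               # 5-bit reflection coefficients
rate_6bit = multipulse_bit_rate(coeff_bits=6)   # 6-bit reflection coefficients
```

Under these assumptions the total is 9600 bps with 5-bit coefficients and 10100 bps with 6-bit coefficients; other pulse counts or layouts give different totals.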

Variations
Every time a new impulse location and amplitude are obtained, one can go back and reoptimise the amplitudes of the previous impulses.
Alternatively, all the amplitudes can be optimised jointly after all locations have been determined.

Code Excited Linear Prediction
(CELP) Vocoder
The excitation signal is selected from a codebook of zero-mean Gaussian sequences.
LP coefficients are calculated about every 20 ms.
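The codebook selection can be sketched as an exhaustive search: each candidate sequence is passed through the synthesis filter, its best gain is computed, and the entry with the smallest error wins. The 40-sample subframe, the example filter response `h` and the omission of the pitch and perceptual weighting filters are simplifying assumptions.

```python
import numpy as np

def celp_search(target, h, codebook):
    """Return (index, gain, error) of the codebook entry that best matches target."""
    best_index, best_gain, best_err = -1, 0.0, np.inf
    for index, code in enumerate(codebook):
        synth = np.convolve(code, h)[: len(target)]   # filtered candidate
        energy = synth @ synth
        if energy == 0.0:
            continue
        gain = (target @ synth) / energy              # optimal gain for this entry
        err = np.sum((target - gain * synth) ** 2)
        if err < best_err:
            best_index, best_gain, best_err = index, float(gain), float(err)
    return best_index, best_gain, best_err

rng = np.random.default_rng(0)
codebook = rng.standard_normal((1024, 40))            # Gaussian sequences
h = 0.8 ** np.arange(40)                              # assumed synthesis-filter response
target = 0.5 * np.convolve(codebook[37], h)[:40]      # built from entry 37
index, gain, err = celp_search(target, h, codebook)
```

Only the winning index and gain need to be transmitted, since the decoder holds the same codebook.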

CELP
[Figure: CELP block diagram. Encoder: buffer and LP analysis of the sampled speech; Gaussian excitation codebook, pitch synthesis filter and LP synthesis filter in an error minimisation loop with a perceptual weighting filter. Decoder: excitation generator E(z) followed by 1/A(z), producing the synthesised speech signal; its inputs are labelled pitch filter parameters (6 bits), gain factor (6 bits), pulses’ amplitude (4 bits) and 10-12 reflection coefficients (5 bits).]
Parameters sent over the channel:
– Index of the excitation sequence (4 bits)
– Gain factor (6 bits)
– Pitch filter parameters (6 bits)
– 10-12 reflection coefficients (5 bits)
With a codebook of 1024 sequences, toll-quality speech can be obtained.
The rate is around 4.8 kbps.

Variations
Low-Delay CELP
VSELP

Topics to Evaluate
Vocal Tract Physiological Model.
Linear Prediction (LP).
Relationship between the Vocal Tract Physiological Model and LP.
Filterbank and Signal Processing.
HMM
– Basics
– Applied to Speech Recognition.
– Parameter re-estimation.