ch5.3 (Vocoders).ppt

Vocoders
The Channel Vocoder (analyzer):

The channel vocoder employs a bank of bandpass filters, each having a bandwidth between 100 Hz and 300 Hz. Typically, 16-20 linear-phase FIR filters are used.

The output of each filter is rectified and lowpass filtered. The bandwidth of the lowpass filter is selected to match the time variations in the characteristics of the vocal tract.

In addition to the measurement of the spectral magnitudes, a voicing detector and a pitch estimator are included in the speech analysis.
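
As a rough illustration of one analyzer channel (bandpass filter, rectifier, lowpass filter, 50 Hz sampling), the sketch below uses only the Python standard library. The function names, the Hamming-windowed sinc design, the 201-tap length, and the 25 Hz lowpass cutoff are assumptions chosen for the example, not values prescribed by the slides; with so few taps the lowpass is only a coarse approximation of a 20-25 Hz bandwidth.

```python
import math

def lowpass_fir(cutoff_hz, fs_hz, taps):
    """Hamming-windowed sinc lowpass; symmetric taps give linear phase."""
    mid = (taps - 1) // 2  # taps is assumed to be odd
    h = []
    for n in range(taps):
        x = n - mid
        if x == 0:
            ideal = 2.0 * cutoff_hz / fs_hz
        else:
            ideal = math.sin(2.0 * math.pi * cutoff_hz * x / fs_hz) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (taps - 1))
        h.append(ideal * window)
    dc_gain = sum(h)
    return [v / dc_gain for v in h]  # normalize to unit gain at DC

def bandpass_fir(lo_hz, hi_hz, fs_hz, taps):
    """Bandpass built as the difference of two lowpass filters."""
    upper = lowpass_fir(hi_hz, fs_hz, taps)
    lower = lowpass_fir(lo_hz, fs_hz, taps)
    return [u - l for u, l in zip(upper, lower)]

def fir_filter(x, h):
    """Causal FIR convolution; the output has the same length as x."""
    return [sum(h[k] * x[n - k] for k in range(min(n + 1, len(h))))
            for n in range(len(x))]

def channel_magnitude(x, fs_hz, lo_hz, hi_hz, frame_rate_hz=50, taps=201):
    """One channel: bandpass -> rectify -> lowpass -> keep 50 values per second."""
    band = fir_filter(x, bandpass_fir(lo_hz, hi_hz, fs_hz, taps))
    rectified = [abs(v) for v in band]
    smooth = fir_filter(rectified, lowpass_fir(25.0, fs_hz, taps))
    return smooth[::fs_hz // frame_rate_hz]

fs = 8000
tone = [math.sin(2.0 * math.pi * 1000.0 * n / fs) for n in range(2000)]
in_band = channel_magnitude(tone, fs, 900.0, 1100.0)
out_of_band = channel_magnitude(tone, fs, 2000.0, 2200.0)
print(in_band[-1], out_of_band[-1])
```

A 1 kHz test tone should give a clearly larger magnitude in the 900-1100 Hz channel than in the 2000-2200 Hz channel, which is the behavior the encoder relies on.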
The Channel Vocoder (analyzer block diagram):
[Block diagram: the input S(n) feeds a bank of bandpass filters. Each bandpass filter output passes through a rectifier, a lowpass filter, and an A/D converter to the encoder. A voicing detector and a pitch detector also feed the encoder, whose output goes to the channel.]
The Channel Vocoder (synthesizer):
16-20 linear-phase FIR filters
Covering 0-4 kHz
Each having a bandwidth between 100 and 300 Hz
20 ms frames, i.e., the spectral magnitudes are updated at 50 Hz
LPF bandwidth: 20-25 Hz
Sampling rate of the output of the filters: 50 Hz

The Channel Vocoder (synthesizer):

Bit rate:
1 bit for the voicing detector
6 bits for the pitch period
For 16 channels, each coded with 3-4 bits and updated 50 times per second, the total bit rate is 2400-3200 bps
Further reductions to 1200 bps can be achieved by exploiting frequency correlations of the spectral magnitudes
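
The bit-rate figures above can be checked with a few lines of arithmetic. The helper name below is hypothetical. Note that 16 channels at 3-4 bits and 50 frames per second already account for 2400-3200 bps on their own; under the assumption that the 1 voicing bit and 6 pitch bits are also sent every frame, they would add another 350 bps.

```python
def channel_vocoder_bit_rate(channels, bits_per_channel, frame_rate_hz,
                             voicing_bits=0, pitch_bits=0):
    """Bits per second for one frame layout of a channel vocoder."""
    bits_per_frame = channels * bits_per_channel + voicing_bits + pitch_bits
    return bits_per_frame * frame_rate_hz

low = channel_vocoder_bit_rate(16, 3, 50)    # spectral magnitudes only
high = channel_vocoder_bit_rate(16, 4, 50)   # spectral magnitudes only
side = (1 + 6) * 50                          # voicing + pitch, if sent every frame
print(low, high, side)  # 2400 3200 350
```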
The Channel Vocoder (synthesizer):

At the receiver the signal samples are passed through D/A converters.

The outputs of the D/As are multiplied by the voiced or unvoiced signal sources.

The resulting signals are passed through bandpass filters.

The outputs of the bandpass filters are summed to form the synthesized speech signal.
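The Channel Vocoder (synthesizer block diagram):
[Block diagram: the decoder output from the channel drives one D/A converter per channel. A switch controlled by the voicing information selects either a pulse generator (set by the pitch period) or a random noise generator as the excitation. Each D/A output is multiplied by the excitation and passed through a bandpass filter, and the bandpass outputs are summed (∑) to give the output speech.]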
The Phase Vocoder :

The phase vocoder is similar to the
channel vocoder.

However, instead of estimating the pitch,
the phase vocoder estimates the phase
derivative at the output of each filter.

By coding and transmitting the phase derivative, this vocoder destroys the phase information.
The Phase Vocoder (analyzer block diagram):
[Block diagram: S(n) is multiplied by cos ω_k n and by sin ω_k n; each product is lowpass filtered to give a_k(n) and b_k(n), which also feed differentiators. A block computes the short-term magnitude and the short-term phase derivative; each is passed through a decimator to the encoder and then to the channel.]
The Phase Vocoder (synthesizer block diagram, kth channel):
[Block diagram. Inputs from the channel decoder: short-term amplitude and short-term phase derivative. Blocks: decimate (two), integrator, cos, sin, interpolator (two), multipliers by cos ω_k n and sin ω_k n, and a summer (∑).]
The Phase Vocoder:

LPF bandwidth: 50 Hz
Demodulation separation: 100 Hz
Number of filters: 25-30
Sampling rate of spectrum magnitude and phase derivative: 50-60 samples per second
Spectral magnitude is coded using PCM or DPCM
Phase derivative is coded linearly using 2-3 bits
The resulting bit rate is 7200 bps
The Formant Vocoder:

The formant vocoder can be viewed as a type of channel vocoder that estimates the first three or four formants in a segment of speech.

It is this information plus the pitch period that is encoded and transmitted to the receiver.
The Formant Vocoder:

Example of formants:
(a): The spectrogram of the utterance "day one" showing the pitch and the harmonic structure of speech.
(b): A zoomed spectrogram of the fundamental and the second harmonic.
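The Formant Vocoder (analyzer block diagram):
[Block diagram: the input speech is analyzed to give the frequency and bandwidth of each of the first three formants (F1/B1, F2/B2, F3/B3). A "Pitch and V/U Decoder" block provides the voiced/unvoiced decision (V/U) and F0.]
Fk: the frequency of the kth formant
Bk: the bandwidth of the kth formant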
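The Formant Vocoder (synthesizer block diagram):
[Block diagram: an excitation signal, controlled by V/U and F0, drives three formant filters F1, F2, and F3 (set by F1/B1, F2/B2, F3/B3), whose outputs are summed (∑).]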
Linear Predictive Coding:

The objective of LP analysis is to estimate the parameters of an all-pole model of the vocal tract.

Several methods have been devised for generating the excitation sequence for speech synthesis.

LPC-type speech analysis and synthesis methods differ primarily in the type of excitation signal that is generated for speech synthesis.
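Synthesis Lattice Structure
[Block diagram: a switch selects a periodic impulse generator (voiced) or a white noise generator (unvoiced). The excitation is scaled by the gain estimate Θ̂0 and drives an M-stage lattice filter with reflection coefficients k(M;m), …, k(2;m), k(1;m) and unit delays z^-1. The output S´(m) is the synthesized speech.]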
LPC-10:

This method is called LPC-10 because 10 coefficients are typically employed.

LPC-10 partitions the speech into 180-sample frames.

Pitch and voicing decisions are determined by using the AMDF and zero-crossing measures.
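
A minimal sketch of the two measures named above, assuming a plain list of samples as the frame. The function names and the 20-70 sample search range in the example are illustrative choices, not part of the LPC-10 specification. The AMDF also dips at multiples of the true period, so practical detectors add logic to prefer the shortest plausible lag; the example sidesteps this by limiting the search range.

```python
import math

def amdf(frame, lag):
    """Average magnitude difference between the frame and itself delayed by lag."""
    n = len(frame) - lag
    return sum(abs(frame[i + lag] - frame[i]) for i in range(n)) / n

def estimate_pitch_period(frame, min_lag, max_lag):
    """Lag (in samples) at which the AMDF is smallest."""
    return min(range(min_lag, max_lag + 1), key=lambda lag: amdf(frame, lag))

def zero_crossings(frame):
    """Number of sign changes between consecutive samples."""
    return sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))

# One 180-sample frame of a synthetic voiced-like signal with a 40-sample period.
frame = [math.sin(2.0 * math.pi * n / 40.0) for n in range(180)]
print(estimate_pitch_period(frame, 20, 70))
```

A high zero-crossing count together with a shallow AMDF minimum suggests an unvoiced frame; the thresholds for that decision are not given in the slides.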
A General Discrete-Time Model
For Speech Production
Linear Prediction: Determining the Prediction Order
[Figure: panels for voiced and unvoiced speech]

Linear Prediction: Determining the Prediction Order
Prediction gain:
$$PG = 10 \log_{10} \left( \frac{\sum_{n=m-M+1}^{m} s^2[n]}{\sum_{n=m-M+1}^{m} e^2[n]} \right)$$
[Figure: panels for voiced and unvoiced speech]

Linear Prediction: Example
[Figure: panels for M = 4 and M = 10]

Linear Prediction: Example
[Figure: panels for M = 2, M = 10, and M = 54]

Linear Prediction: The Idea of Long-Term Linear Prediction
[Figure: panels for M = 10 and M = 50]

Linear Prediction: Long-Term Linear Prediction
LPC10 Vocoder: General Specifications
It is known as LPC10 because 10 linear prediction coefficients are transmitted.
The bit rate is 2400 bits per second.
Each frame contains 180 samples.
54 bits are transmitted per frame.
The analog input signal is sampled at 8000 Hz and quantized with 16 bits.
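
The LPC10 figures listed above (180 samples per frame, 8000 Hz sampling, 54 bits per frame, 16-bit input) are mutually consistent, as the short check below shows. The constant names are chosen for this example only.

```python
SAMPLE_RATE_HZ = 8000
SAMPLES_PER_FRAME = 180
BITS_PER_FRAME = 54
PCM_BITS_PER_SAMPLE = 16

frame_ms = 1000 * SAMPLES_PER_FRAME / SAMPLE_RATE_HZ              # frame duration
lpc10_bps = BITS_PER_FRAME * SAMPLE_RATE_HZ / SAMPLES_PER_FRAME   # coded bit rate
pcm_bps = SAMPLE_RATE_HZ * PCM_BITS_PER_SAMPLE                    # input PCM bit rate
print(frame_ms, lpc10_bps, pcm_bps)  # 22.5 2400.0 128000
```

The input PCM stream is therefore compressed by a factor of roughly 53.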
LPC10 Vocoder: Encoder
[Block diagram. Input: PCM signal. Blocks: framing, pre-emphasis filter, voicing detector, pitch period estimation, determination of the prediction coefficients, LPC coefficient encoding, LPC decoding, prediction-error filter, gain computation, pitch period encoding, and gain encoding. The pitch period index, gain index, and LPC coefficient index feed the bit encoder, which produces the transmitted bit stream.]

Pitch Period Detection
Autocorrelation method:
$$R[l, m] = \sum_{n=m-N+1}^{m} s[n]\, s[n-l]$$
Magnitude difference function method:
$$MDF[l, m] = \sum_{n=m-N+1}^{m} \left| s[n] - s[n-l] \right|$$
YMC method:
$$s[n] = b\, s[n-N] + e[n], \quad m-N+1 \le n \le m$$
LPC10 Vocoder: Encoder
Voicing detector:
1. Compute the energy (low band)
2. Compute the zero-crossing rate
3. Compute the prediction gain
Pitch period estimation:
Compute the MDF
Transmit one of the values: T = 20, 21, …, 39, 40, 42, …, 80, 84, …, 154
LPC10 Vocoder: Encoder
Quantization of the LPC coefficients:
1. Solve the normal equations by the Levinson-Durbin method
2. Compute the reflection coefficients (RC)
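
The two steps above can be sketched together, since the Levinson-Durbin recursion produces the reflection coefficients as a by-product. This is a minimal version assuming the autocorrelation values r[0..order] are already available and that the predictor is written as s[n] ≈ Σ a_i s[n-i]; the sign convention for reflection coefficients varies between texts, and the function name is chosen for this example.

```python
def levinson_durbin(r, order):
    """Solve the LP normal equations from autocorrelation values r[0..order].

    Returns (a, k, err): predictor coefficients a_1..a_order, reflection
    coefficients k_1..k_order, and the final prediction error energy.
    """
    a = [0.0] * (order + 1)  # a[0] is unused
    k = []
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        ki = acc / err
        k.append(ki)
        new_a = a[:]
        new_a[i] = ki
        for j in range(1, i):
            new_a[j] = a[j] - ki * a[i - j]
        a = new_a
        err *= (1.0 - ki * ki)
    return a[1:], k, err

# Autocorrelation of a first-order process with correlation 0.5 per sample:
# a second-order predictor should need only its first coefficient.
coeffs, reflection, error = levinson_durbin([1.0, 0.5, 0.25], 2)
print(coeffs, reflection, error)  # [0.5, 0.0] [0.5, 0.0] 0.75
```

Reflection coefficients with magnitude below 1 guarantee a stable synthesis filter, which is one reason they are a common choice for quantization.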
LPC10 Vocoder: Speech Synthesis
Encoder (input: original signal):
Decide whether the frame is voiced or unvoiced
Determine the pitch period, only for voiced frames
Compute the signal gain
Decoder (source model):
[Block diagram: a V/U switch selects either an impulse train with period equal to the pitch period or random noise; the excitation is scaled by the gain G. The output is the synthesized speech.]
LPC10 Vocoder: Limitations
1. Frames are divided into only two classes, voiced and unvoiced.
2. Random noise and a periodic impulse train are used for excitation (an impulse train alone cannot produce all voiced sounds).
3. The phase of the original signal is not preserved.
4. Using an impulse train is a violation of the AR model.
Residual Excited LP Vocoder :

Speech quality can be improved at the
expense of a higher bit rate by computing
and transmitting a residual error, as done
in the case of DPCM.

In one method, the LPC model and excitation parameters are estimated from a frame of speech.
Residual Excited LP Vocoder :

The speech is synthesized at the transmitter and
subtracted from the original speech signal to
form the residual error.

The residual error is quantized, coded, and
transmitted to the receiver

At the receiver the signal is synthesized by
adding the residual error to the signal generated
from the model.
Residual Excited LP Vocoder :

The residual signal is lowpass filtered at 1000 Hz in the analyzer to reduce the bit rate.

In the synthesizer, the residual is rectified and spectrum-flattened (using a highpass filter); the lowpass and highpass signals are summed, and the resulting residual error signal is used to excite the LPC model.

The RELP vocoder provides communication-quality speech at about 9600 bps.
RELP Analyzer (type 1):
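[Block diagram: S(n) goes to a buffer and window, giving f(n; m). Short-term LP analysis produces the LP parameters {â(i; m)} and the excitation parameters (gain estimate Θ̂0, V/U decision, pitch estimate P̂), which drive an LP synthesis model. The model output is subtracted from f(n; m) at ∑ to form the residual error e(n; m). The parameters and the residual go to the encoder and then to the channel.]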
RELP Analyzer (type 2):
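[Block diagram: S(n) goes to a buffer and window, giving f(n; m). Short-term LP analysis produces the LP parameters {â(i; m)}, which set the inverse filter Â(z; m). The inverse filter output is the prediction residual, which passes through a lowpass filter, a decimator, and a DFT to the encoder and then to the channel.]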
Synthesizer for a RELP vocoder
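[Block diagram: the channel decoder output goes to a buffer and controller. The residual passes through an interpolator; a rectifier and highpass filter branch is summed (∑) with it to form the excitation of the LP synthesizer, which receives LP model parameter updates.]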
Multipulse LPC Vocoder

RELP needs to regenerate the high-frequency components at the decoder, which yields only a crude approximation of the high frequencies.

The multipulse LPC is a time-domain analysis-by-synthesis method that results in a better excitation signal for the LPC vocal system filter.
Multipulse LPC Vocoder

The information concerning the excitation sequence includes:
the locations of the pulses
an overall scale factor corresponding to the largest pulse amplitude
the pulse amplitudes relative to the overall scale factor

The scale factor is logarithmically quantized into 6 bits.
The amplitudes are linearly quantized into 4 bits.
The pulse locations are encoded using a differential coding scheme.
The excitation parameters are updated every 5 msec.
The LPC vocal-tract parameters and the pitch period are updated every 20 msec.
The bit rate is 9600 bps.
Analysis-by-synthesis coder
A stored sequence from a Gaussian excitation codebook is scaled and used to excite the cascade of a pitch synthesis filter and the LPC synthesis filter
The synthetic speech is compared with the original speech
Residual error signal is weighted perceptually by a filter
$$W(z) = \frac{\hat{\Theta}(z/c)}{\hat{\Theta}(z)} = \frac{\hat{A}(z)}{\hat{A}(z/c)}$$
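
A sketch of how a weighting filter of the form A(z)/A(z/c) can be applied to a signal. It assumes the convention A(z) = 1 - Σ a_k z^-k, under which A(z/c) has coefficients a_k c^k. The function names and the default c = 0.8 are assumptions for the example; the slides do not give a value for c.

```python
def bandwidth_expand(a, c):
    """Coefficients of A(z/c): the kth coefficient is scaled by c**k."""
    return [ak * c ** (k + 1) for k, ak in enumerate(a)]

def weighting_filter(x, a, c=0.8):
    """Filter x through W(z) = A(z) / A(z/c), with A(z) = 1 - sum_k a[k-1] z^-k."""
    ac = bandwidth_expand(a, c)
    y = []
    for n in range(len(x)):
        acc = x[n]
        for k in range(1, len(a) + 1):
            if n - k >= 0:
                acc -= a[k - 1] * x[n - k]   # numerator A(z): zeros
                acc += ac[k - 1] * y[n - k]  # denominator A(z/c): poles
        y.append(acc)
    return y

impulse = [1.0, 0.0, 0.0, 0.0]
print(weighting_filter(impulse, [0.9], c=0.5))
```

With c = 1 the numerator and denominator cancel and the filter passes the signal unchanged; smaller values of c de-emphasize the error near the formant peaks, where it is masked by the speech itself.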
Obtaining the multipulse excitation:
(Analysis by synthesis method)
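[Block diagram: the input speech s(n) goes to a buffer and LP analysis, giving f(n; m) and the pitch estimate P̂. A multipulse excitation generator drives the pitch synthesis filter Θ_p(z) and the LP synthesis filter to produce f̂(n; m). The difference between f(n; m) and f̂(n; m), formed at ∑, passes through the perceptual weighting filter W(z), and an error minimization block adjusts the excitation generator.]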
Code Excited LP :

CELP is an analysis-by-synthesis method in which the excitation sequence is selected from a codebook of zero-mean Gaussian sequences.

The bit rate of the CELP is 4800 bps.
CELP (analysis-by-synthesis coder) :
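[Block diagram: the speech samples go to a buffer and LP analysis, which produces the LP parameters (side information). A sequence from the Gaussian excitation codebook is scaled by a gain and passed through the pitch synthesis filter and the spectral envelope (LP) synthesis filter. The result is subtracted from the input at ∑, the error is filtered by the perceptual weighting filter W(z), and its energy is computed (square and sum) to select the index of the excitation sequence.]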
Analysis-by-synthesis coder
This weighted error is squared and summed over a subframe block to give the error energy
By performing an exhaustive search through the codebook, we find the excitation sequence that minimizes the error energy

Analysis-by-synthesis coder

The gain factor for scaling the excitation
sequence is determined for each
codeword in the codebook by minimizing
the error energy for the block of samples
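
The search and gain computation described above can be sketched as follows. To keep the example short it assumes the codewords have already been passed through the synthesis and weighting filters, so the search only has to compare each filtered codeword with the weighted target. The function name, the random codebook, and the test target are all illustrative.

```python
import random

def search_codebook(target, filtered_codebook):
    """Exhaustive search: return (index, gain, error_energy) of the best codeword.

    For each filtered codeword y, the gain that minimizes
    sum((target - gain * y)**2) is <target, y> / <y, y>.
    """
    best = None
    for index, y in enumerate(filtered_codebook):
        energy = sum(v * v for v in y)
        if energy == 0.0:
            continue  # an all-zero codeword cannot be scaled to fit anything
        gain = sum(t * v for t, v in zip(target, y)) / energy
        err = sum((t - gain * v) ** 2 for t, v in zip(target, y))
        if best is None or err < best[2]:
            best = (index, gain, err)
    return best

random.seed(0)
codebook = [[random.gauss(0.0, 1.0) for _ in range(40)] for _ in range(1024)]
target = [2.5 * v for v in codebook[37]]  # a scaled copy of codeword 37
index, gain, err = search_codebook(target, codebook)
print(index, gain, err)
```

The cost of this search grows with the codebook size times the subframe length, which is why structured codebooks are attractive in practice.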
CELP (synthesizer) :
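[Block diagram: the channel decoder output goes to a buffer and controller, which selects a sequence from the Gaussian excitation codebook and supplies the LP parameter, gain, and pitch estimate updates. The excitation passes through the pitch synthesis filter and then the LP synthesis filter.]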
CELP synthesizer



Cascade of two all-pole filters with coefficients that are updated periodically
The first filter is a long-delay pitch filter used to generate the pitch periodicity in voiced speech
This filter has the form
$$\Theta_p(z) = \frac{1}{1 - b z^{-p}}$$
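
In the time domain, a filter of the form 1 / (1 - b z^-p) is the recursion y[n] = x[n] + b y[n - p]. The sketch below applies it to an impulse; the function name and the values b = 0.5 and p = 3 are chosen only to keep the output short.

```python
def pitch_synthesis(x, b, p):
    """Long-delay pitch filter: y[n] = x[n] + b * y[n - p]."""
    y = []
    for n, xn in enumerate(x):
        feedback = b * y[n - p] if n >= p else 0.0
        y.append(xn + feedback)
    return y

print(pitch_synthesis([1, 0, 0, 0, 0, 0, 0, 0], b=0.5, p=3))
# [1.0, 0.0, 0.0, 0.5, 0.0, 0.0, 0.25, 0.0]
```

A single input pulse is repeated every p samples with amplitude decaying by b, which is how the filter imposes pitch periodicity on the excitation.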
CELP
Parameters of the filter can be determined by minimizing the prediction error energy, after pitch estimation, over a frame duration of 5 msec
The second filter is a short-delay all-pole (vocal-tract) filter with 10-12 coefficients that are determined every 10-20 msec

Example:
The sampling frequency is 8 kHz
The subframe block duration for pitch estimation and excitation sequence selection is 5 msec
This gives 40 samples per 5-msec subframe
The excitation sequence consists of 40 samples

Example:
A codebook of 1024 sequences gives good-quality speech
For such a codebook size, we require 10 bits to send the codebook index
Hence the excitation costs 10 bits per 40 samples, i.e., 1/4 bit per sample
The transmission of the pitch predictor parameters and the spectral predictor brings the bit rate to about 4800 bps
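
The numbers in this example follow from simple arithmetic, shown below with variable names chosen for the illustration. The split of the remaining bits among gain, pitch predictor, and spectral parameters is not given in the slides.

```python
import math

fs_hz = 8000
subframe_ms = 5
codebook_size = 1024

samples_per_subframe = fs_hz * subframe_ms // 1000     # 40
index_bits = int(math.log2(codebook_size))             # 10
bits_per_sample = index_bits / samples_per_subframe    # 0.25
index_bps = index_bits * 1000 // subframe_ms           # 2000
print(samples_per_subframe, index_bits, bits_per_sample, index_bps)
```

The codebook index alone therefore accounts for 2000 bps; if the total is about 4800 bps, roughly 2800 bps remain for all other parameters.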

Low-delay CELP coder
CELP has been used to achieve toll-quality speech at 16000 bps with low delay.
Although other types of vocoders produce high-quality speech at 16000 bps, these vocoders buffer 10-20 msec of speech samples

Low-delay CELP coder
The one-way delay of such coders is on the order of 20-40 msec
With a modification of CELP, it is possible to reduce the one-way delay to about 2 ms
Low-delay CELP is achieved by using a backward-adaptive predictor with a gain parameter and an excitation vector size as small as 5 samples

Low-delay CELP coder
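[Block diagram: the input speech s(n) goes to a buffer and window, giving f(n; m). A codeword from the excitation vector quantizer codebook is scaled by a gain (set by a gain adaptation block) and passed through the high-order LP synthesis filter (set by a predictor adaptation block) to give f̂(n; m). The difference between f(n; m) and f̂(n; m), formed at ∑, is filtered by the perceptual weighting filter W(z) and drives the error minimization block that selects the codeword.]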
Low-delay CELP coder

The pitch predictor used in the conventional forward-adaptive coder is eliminated

In order to compensate for the loss in pitch information, the LPC predictor order is increased significantly, to an order of 50
Low-delay CELP coder

LPC coefficients are updated more frequently, every 2.5 ms

A 5-sample excitation vector corresponds to an excitation block duration of 0.625 msec at an 8-kHz sampling rate
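
The timing figures for low-delay CELP can be verified directly. The sketch assumes that one 10-bit codebook index per 5-sample vector is the only information transmitted, which is consistent with backward adaptation of the predictor and gain and with the 16000 bps rate quoted for this coder; the variable names are illustrative.

```python
fs_hz = 8000
vector_samples = 5
index_bits = 10          # 10-bit excitation codebook
blocks_per_update = 4    # predictor updated every four blocks

block_ms = 1000 * vector_samples / fs_hz                # 0.625
update_ms = blocks_per_update * block_ms                # 2.5
excitation_bps = index_bits * fs_hz // vector_samples   # 16000
print(block_ms, update_ms, excitation_bps)
```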
Low-delay CELP coder

The logarithm of the excitation gain is
adapted every subframe excitation block
by employing a 10th-order adaptive linear
predictor in the logarithmic domain

The coefficients of the logarithmic-gain
predictor are updated every four blocks by
performing an LPC analysis of previously
quantized excitation signal blocks
Low-delay CELP coder
The perceptual weighting filter is also 10th order and is updated once every four blocks by employing an LPC analysis on frames of the input speech signal of duration 2.5 msec
The excitation codebook in low-delay CELP is also modified compared to conventional CELP.
A 10-bit excitation codebook is employed.

Vector Sum Excited LP :

The VSELP coder and decoder differ from CELP basically in the method by which the excitation sequence is formed.

In the following block diagram of the VSELP decoder, there are three excitation sources.

One excitation is obtained from the pitch period (long-term filter) state.

The other two excitation sources are obtained from two codebooks.
VSELP Decoder :
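[Block diagram: three excitation sources (the long-term filter state, codebook 1, and codebook 2) are each scaled by their own gain and summed (∑). The sum passes through the pitch synthesis filter, the spectral envelope (LP) synthesis filter, and the spectral post filter to give the synthetic speech.]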
VSELP Decoder
The LPC synthesis filter is implemented as a 10-pole filter, and its coefficients are coded and transmitted every 20 ms.
Coefficients are updated in each 5-ms frame by interpolation
Excitation parameters are also updated every 5 ms

VSELP Decoder
There are 128 codewords in each of the two codebooks
The codewords are constructed from two sets of seven basis codewords by forming linear combinations of the seven basis codewords
The long-term filter state is also a codebook with 128 codeword sequences
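
Seven basis codewords give exactly 128 codewords if each basis vector enters the sum with a coefficient of either +1 or -1, since 2^7 = 128. The sketch below builds such a codebook; the ±1 weighting is an assumption made here to match the count, and the function name, the random basis, and the 40-sample length are illustrative.

```python
import random
from itertools import product

def vector_sum_codebook(basis):
    """All +/-1 combinations of the basis vectors: 2**len(basis) codewords."""
    length = len(basis[0])
    return [
        [sum(sign * vec[n] for sign, vec in zip(signs, basis)) for n in range(length)]
        for signs in product((-1, 1), repeat=len(basis))
    ]

random.seed(1)
basis = [[random.gauss(0.0, 1.0) for _ in range(40)] for _ in range(7)]
codebook = vector_sum_codebook(basis)
print(len(codebook))  # 128
```

Because every codeword is a sum of the same seven filtered basis vectors, a search only needs to filter the basis vectors once per subframe rather than all 128 codewords.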

VSELP Decoder


In each 5-msec frame, the codewords from this codebook are filtered through the speech system filter $\hat{\Theta}(z)$ and correlated with the input speech sequence
The filtered codeword is used to update the history and the lag is transmitted to the decoder
VSELP Decoder
Thus the update occurs by appending the
best-filtered codeword to the history
codebook
 The oldest sample in the history array is
discarded
 The result is that the long-term state
becomes an adaptive codebook

VSELP Decoder
The three excitation sequences are selected sequentially from each of the three codebooks
Each codebook search attempts to find the codeword that minimizes the total energy of the perceptually weighted error
Once the codewords have been selected, the three gain parameters are optimized

VSELP Decoder
Joint gain optimization is sequentially accomplished by orthogonalizing each weighted codeword vector prior to the codebook search
These parameters are vector quantized to one of 256 eight-bit vectors and transmitted in every 5-ms frame.

Vector Sum Excited LP :

The bit rate of the VSELP is about 8000 bps.
Bit allocations for 8000-bps VSELP:

Parameter                                        Bits/5-ms frame   Bits/20 ms
10 LPC coefficients                              -                 38
Average speech energy                            -                 5
Excitation codewords from two VSELP codebooks    14                56
Gain parameters                                  8                 32
Lag of pitch filter                              7                 28
Total                                            29                159
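
The totals in the bit allocation table can be cross-checked as follows. The dictionary layout is just a convenient representation for the example; the bit counts are those listed in the table.

```python
per_20ms_only = {"10 LPC coefficients": 38, "Average speech energy": 5}
per_5ms = {"Excitation codewords": 14, "Gain parameters": 8, "Lag of pitch filter": 7}

bits_per_5ms = sum(per_5ms.values())                             # 29
bits_per_20ms = sum(per_20ms_only.values()) + 4 * bits_per_5ms   # 159
bit_rate_bps = bits_per_20ms * 1000 // 20                        # 7950
print(bits_per_5ms, bits_per_20ms, bit_rate_bps)
```

The result, 7950 bps, is the exact rate behind the "about 8000 bps" figure.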
VSELP Decoder

Finally, an adaptive spectral post filter is employed in VSELP following the LPC synthesis filter. This post filter is a pole-zero filter of the form
$$W(z) = \frac{\hat{\Theta}(z/c)}{\hat{\Theta}(z)} = \frac{\hat{A}(z)}{\hat{A}(z/c)}$$
DEMO
Speech codecs (clips for male speaker, female speaker, and music):
Original speech/music (16-bit, sampled at 8 kHz)
FS-1015 (LPC-10e, 2.4 kb/s)
FS-1016 (CELP, 4.8 kb/s)
IS-54 (VSELP, 7.95 kb/s)
G.721 (ADPCM, 32 kb/s)