Digital Systems: Hardware Organization and Design

Speech & Audio Processing
Speech & Audio Coding Examples
A Simple Speech Coder
• LPC-Based Analysis Structure
[Block diagram: Audio Input, Preemphasis, Windowing, Linear Prediction Analysis (AutoCorrelation, Levinson-Durbin), Analysis Filter, Quantization; signals passed between blocks: Filter Coeffs, Residual]
15 April 2020
Veton Këpuska
Windowing Analysis Stage
N: length of the analysis window, typically 10-30 msec
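A minimal sketch of how a window length in milliseconds translates into samples and overlapping frames. The sampling rate, the 20 msec length, the 50% overlap, the Hamming window, and all variable names are assumptions chosen for illustration, not values from the slides.

```matlab
fs       = 8000;                              % sampling rate in Hz (assumed)
frame_ms = 20;                                % window length, inside the 10-30 msec range
N        = round(frame_ms*1e-3*fs);           % window length in samples
hop      = N/2;                               % 50% overlap between frames (assumed)
x        = sin(2*pi*200*(0:fs-1).'/fs);       % one second of a placeholder test tone
w        = 0.54 - 0.46*cos(2*pi*(0:N-1).'/(N-1));   % Hamming window written out explicitly
nframes  = floor((numel(x) - N)/hop) + 1;     % number of complete frames
frames   = zeros(N, nframes);
for m = 1:nframes
    idx = (m-1)*hop + (1:N).';                % sample indices of frame m
    frames(:, m) = w .* x(idx);               % windowed frame
end
```

Each column of frames is one windowed analysis frame, ready for the LPC analysis stage.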
Some Analysis Windows
MATLAB Useful Functions
• wintool
  • Use "doc wintool" for more information
• window
  • Use "doc window" for the list of supported windows
• Define your own window if needed, e.g., the sine window and the Vorbis window, for n = 0, 1, ..., N-1 with N the window length:
  Sine window: $w[n] = \sin\left(\frac{\pi (n + 0.5)}{N}\right)$
  Vorbis window: $w[n] = \sin\left(\frac{\pi}{2}\,\sin^{2}\left(\frac{\pi (n + 0.5)}{N}\right)\right)$
LPC Analysis Stage
• LPC method described in:
  • Ch5-Analysis_&_Synthesis_of_PoleZero_Speech_Models.ppt
• Summary:
  • Compute the autocorrelation sequence
  • Solve the resulting system of equations with the Levinson-Durbin method
• MATLAB help:
  • doc lpc, etc.
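The two summary steps (autocorrelation, then the Levinson-Durbin recursion) can be sketched with base MATLAB functions only. The test signal, the order p, and the variable names are assumptions for illustration. The sign convention assumed here is $A(z) = 1 - \sum_k \alpha_k z^{-k}$, so the coefficient vector in the style returned by lpc would be [1; -alpha].

```matlab
p = 2;                                        % prediction order (illustrative)
x = filter(1, [1 -0.9], [1; zeros(199, 1)]);  % test frame: impulse response of 1/(1 - 0.9 z^-1)
% Step 1: autocorrelation, with r(m+1) holding r[m] for m = 0..p
r = zeros(p+1, 1);
for m = 0:p
    r(m+1) = sum(x(1:end-m) .* x(1+m:end));
end
% Step 2: Levinson-Durbin recursion
alpha = zeros(p, 1);                          % predictor coefficients
k     = zeros(p, 1);                          % PARCOR (reflection) coefficients
E     = r(1);                                 % prediction error energy
for i = 1:p
    k(i) = (r(i+1) - alpha(1:i-1).' * r(i:-1:2)) / E;
    prev = alpha(1:i-1);
    alpha(1:i-1) = prev - k(i) * prev(end:-1:1);
    alpha(i) = k(i);
    E = (1 - k(i)^2) * E;
end
```

For this first-order test signal the recursion should recover alpha(1) close to 0.9 and alpha(2) close to 0.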
Example of MATLAB Code
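function myLPCCodec(wavfile, N)
%
% wavfile - input MS wav file
% N       - LPC filter order
%
% audioread replaces wavread, which was removed from recent MATLAB releases
[x, fs] = audioread(wavfile);
x = x(:, 1);                 % use the first channel if the file is stereo
% plot(x);
% Playing original signal
soundsc(x, fs);
% Performing LPC analysis using MATLAB lpc function:
% a = [1 a(2) ... a(N+1)] holds the coefficients of A(z),
% g is the variance (power) of the prediction error
[a, g] = lpc(x, N);
% Filtering the input with the estimated predictor coefficients,
% producing the predicted samples
est_x = filter([0 -a(2:end)], 1, x);
% Error (residual) signal
e = x - est_x;
% Testing the quality of the predicted samples
soundsc(est_x, fs);
% Synthesis stage with zero loss of information:
% pass the unquantized residual through the all-pole filter H(z) = 1/A(z)
syn_x = filter(1, a, e);
soundsc(syn_x, fs);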
H z  
1
A z 
ŝ[n]
p
sˆn   k sˆn  k   gen
k 1
Analysis of Quantization Errors
• Use MATLAB functions to study the effects of quantization errors introduced by the precision of the arithmetic operations and by the representation of the filter and error signal:
  • Double (float64) representation (software emulation)
  • Float (float32) representation (software emulation)
  • Int (int32) representation (hardware emulation)
  • Short (int16) representation (hardware emulation)
• Useful MATLAB functions:
  • fix, floor, round, ceil
• Example:
  • sig_hat = fix(sig*2^(B-1))/2^(B-1);
  • Truncates sig to B bits (one sign bit and B-1 fractional bits, for sig in [-1, 1)).
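A short sketch comparing truncation (fix) and rounding (round) to B bits, together with a float32 round trip. The test signal, the value of B, and the variable names are illustrative assumptions.

```matlab
B   = 8;                                      % word length in bits (illustrative)
sig = 0.9 * sin(2*pi*(0:99).'/100);           % test signal inside (-1, 1)
q   = 2^(B-1);                                % scale factor: B-1 fractional bits
sig_fix    = fix(sig*q)/q;                    % truncation toward zero
sig_round  = round(sig*q)/q;                  % rounding to the nearest level
err_fix    = max(abs(sig - sig_fix));         % stays below one quantization step, 1/q
err_round  = max(abs(sig - sig_round));       % stays within half a step, 1/(2q)
sig_single = double(single(sig));             % float32 representation of the same signal
err_single = max(abs(sig - sig_single));
```

Rounding halves the worst-case error relative to truncation, while the float32 error is several orders of magnitude smaller than either for this word length.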
Quantization of Error Signal & Filter Coefficients
• ADPCM can be applied to the error signal.
• Filter coefficients in the direct form are sensitive to quantization errors:
  • A small quantization error can have a large effect on the filter characteristics.
  • The issue is that the polynomial coefficients map nonlinearly to the poles of the filter (i.e., the roots of the polynomial).
• Alternative representations exist that have significantly better tolerance to quantization error.
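The sensitivity of direct-form coefficients can be illustrated with a small experiment. The pole locations, the number of fractional bits, and the variable names below are hypothetical choices, picked so that the poles are clustered and the effect is easy to see. The pairwise distance computation relies on implicit expansion (MATLAB R2016b or later).

```matlab
p_true = [0.95 0.96 0.97 0.98];               % hypothetical clustered poles inside the unit circle
a      = poly(p_true);                        % direct-form coefficients of A(z)
B      = 8;                                   % fractional bits kept per coefficient (illustrative)
a_q    = round(a * 2^B) / 2^B;                % quantized coefficients
p_q    = roots(a_q);                          % poles of the quantized filter
coeff_err  = max(abs(a - a_q));               % largest coefficient error
d          = abs(p_true(:) - p_q(:).');       % distances between original and quantized poles
pole_shift = max(min(d, [], 2));              % worst distance to the nearest quantized pole
fprintf('max coefficient error %.2e, max pole shift %.2e\n', coeff_err, pole_shift);
```

In this setup the poles move by much more than the coefficients were perturbed, which is the nonlinear mapping the bullets above refer to.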
LPC Filter Representations
• As noted previously, when the Levinson-Durbin algorithm was introduced, an alternative representation of the filter coefficients was also mentioned: the PARCOR coefficients.
• LPC to PARCOR:
$$a_j^{(p)} = \alpha_j, \quad 1 \le j \le p$$
for $i = p, p-1, \ldots, 1$:
$$k_i = a_i^{(i)}$$
$$a_j^{(i-1)} = \frac{a_j^{(i)} + a_i^{(i)}\, a_{i-j}^{(i)}}{1 - k_i^{2}}, \quad 1 \le j \le i-1$$
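A minimal sketch of the LPC to PARCOR (step-down) recursion. The predictor coefficients and variable names are illustrative assumptions, with alpha following the convention $A(z) = 1 - \sum_k \alpha_k z^{-k}$.

```matlab
alpha = [0.65 -0.3];                          % predictor coefficients alpha_1..alpha_p (illustrative)
p = numel(alpha);
a = alpha;                                    % a holds a_j^{(i)} for the current order i
k = zeros(1, p);                              % PARCOR coefficients
for i = p:-1:1
    k(i) = a(i);                              % k_i = a_i^{(i)}
    if i > 1
        % a_j^{(i-1)} = (a_j^{(i)} + a_i^{(i)} a_{i-j}^{(i)}) / (1 - k_i^2), j = 1..i-1
        a = (a(1:i-1) + k(i) * a(i-1:-1:1)) / (1 - k(i)^2);
    end
end
```

For this example the recursion gives k = [0.5 -0.3]. Both magnitudes are below 1, which indicates a stable synthesis filter.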
PARCOR Filter Representation
• PARCOR to LPC:
for $i = 1, 2, \ldots, p$:
$$a_i^{(i)} = k_i$$
$$a_j^{(i)} = a_j^{(i-1)} - k_i\, a_{i-j}^{(i-1)}, \quad 1 \le j \le i-1$$
$$\alpha_j = a_j^{(p)}, \quad 1 \le j \le p$$
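A minimal sketch of the PARCOR to LPC (step-up) recursion. The PARCOR values and variable names are illustrative assumptions.

```matlab
k = [0.5 -0.3];                               % PARCOR coefficients k_1..k_p (illustrative)
p = numel(k);
a = zeros(1, p);                              % a(1:i) holds a_j^{(i)} after step i
for i = 1:p
    prev = a(1:i-1);                          % a_j^{(i-1)}, j = 1..i-1
    a(1:i-1) = prev - k(i) * prev(end:-1:1);  % a_j^{(i)} = a_j^{(i-1)} - k_i a_{i-j}^{(i-1)}
    a(i) = k(i);                              % a_i^{(i)} = k_i
end
alpha = a;                                    % alpha_j = a_j^{(p)}
```

For these values the result is alpha = [0.65 -0.3].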
Line Spectral Frequency Representation
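• It turns out that the PARCOR coefficients can be converted to line spectral frequencies (LSF), which have significantly better quantization properties.
• Note that:
$$H(z) = \frac{1}{A(z)}$$
• The PARCOR lattice structure of the LPC synthesis filter above:
[Figure: PARCOR lattice from Input to Output, with forward branches $A_p, A_{p-1}, \ldots, A_0$, backward branches $B_p, B_{p-1}, \ldots, B_0$ through unit delays $z^{-1}$, reflection coefficients $k_p, k_{p-1}, \ldots$, and an additional stage with $k_{p+1} = \mp 1$]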
Line Spectral Frequency Representation
• From the previous slide, the following holds:
$$A_{p-1}(z) = A_p(z) + k_p B_{p-1}(z)$$
$$B_p(z) = z^{-1}\left[B_{p-1}(z) - k_p A_{p-1}(z)\right]$$
$$A_0(z) = 1, \quad B_0(z) = z^{-1}, \quad B_p(z) = z^{-(p+1)} A_p(z^{-1})$$
• From this realization of the filter the LSP representation is derived:
LSF Representation
$$k_{p+1} = 1: \quad P_{p+1}(z) = A_p(z) - B_p(z)$$
$$k_{p+1} = -1: \quad Q_{p+1}(z) = A_p(z) + B_p(z)$$
$$A_p(z) = \frac{1}{2}\left[P_{p+1}(z) + Q_{p+1}(z)\right]$$
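A small sketch of how the line spectral frequencies could be computed from the polynomials $P_{p+1}(z)$ and $Q_{p+1}(z)$. The example coefficients, the variable names, and the sign assignment (P as the difference, Q as the sum) are assumptions. The polynomial $A(z) = 1 - 0.65 z^{-1} + 0.3 z^{-2}$ is minimum phase, so the roots of P and Q should lie on the unit circle.

```matlab
a  = [1 -0.65 0.3];                           % coefficients of A_p(z) in powers of z^-1 (illustrative)
Ap = [a, 0];                                  % A_p(z), padded to order p+1
Bp = [0, fliplr(a)];                          % B_p(z) = z^-(p+1) A_p(1/z)
P  = Ap - Bp;                                 % P_{p+1}(z), case k_{p+1} = +1
Q  = Ap + Bp;                                 % Q_{p+1}(z), case k_{p+1} = -1
rts = [roots(P); roots(Q)];                   % all roots, expected on the unit circle
ang = sort(angle(rts));
lsf = ang(ang > 1e-6 & ang < pi - 1e-6);      % one angle per conjugate pair; drops the roots at z = 1 and z = -1
```

The vector lsf then holds the line spectral frequencies in radians, in increasing order, alternating between roots of Q and roots of P.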
LPC Synthesis Filter with LSF
$$H(z) = \frac{1}{A(z)} = \frac{1}{1 + \left[A(z) - 1\right]} = \frac{1}{1 + \frac{1}{2}\left\{\left[P_{p+1}(z) - 1\right] + \left[Q_{p+1}(z) - 1\right]\right\}}$$
A Simple Speech Coder
• LPC-Based Synthesis Structure
[Block diagram: Decoding, Synthesis Filter, Deemphasis, Audio Output; signals passed between blocks: Residual Signal, Residual, Filter Coeffs]
Audio Coding
• Most audio coding standards use principles of psychoacoustics.
• Example: basic structure of an MP3 encoder:
[Block diagram: Audio Input, Filterbank & Transform, Quantization, Bit-stream, Psychoacoustic Model]
Basic Structure of Audio Coders
• Filterbank Processing
• Psychoacoustic Model
• Quantization
Filter Bank Analysis/Synthesis
Filterbank Processing:
• Splitting the full-band signal into several subbands:
  • Uniform sub-bands (FFT)
  • Critical bands (FFT followed by a non-linear transformation)
    • Reflect the human auditory apparatus.
    • Mel-scale and Bark-scale transformations, with f in Hz:
$$\mathrm{Mel} = 1127.01048 \cdot \ln\left(1 + \frac{f}{700}\right)$$
$$\mathrm{Bark} = 13 \cdot \arctan(0.00076 f) + 3.5 \cdot \arctan\left(\left(\frac{f}{7500}\right)^{2}\right)$$
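The Mel and Bark mappings can be evaluated directly from the two formulas. The frequency grid and variable names are illustrative assumptions.

```matlab
f    = [0 500 1000 4000 8000];                          % frequencies in Hz (illustrative)
mel  = 1127.01048 * log(1 + f/700);                     % Hz -> Mel
bark = 13*atan(0.00076*f) + 3.5*atan((f/7500).^2);      % Hz -> Bark
```

By construction of the Mel formula, 1000 Hz maps to approximately 1000 Mel. Both scales grow roughly linearly at low frequencies and compress the high frequencies.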
Mel-Scale
Bark-Scale
Analysis Structure of Filterbank
$h_k[n]$: impulse response of the k-th quadrature mirror filter
N: number of channels, typically 32
↓: down-sampling
MDCT: Modified Discrete Cosine Transform
[Block diagram: the Audio Input feeds N parallel branches; branch k applies $h_k[n]$ ($h_1[n], \ldots, h_N[n]$), down-sampling (↓), and an MDCT; the MDCT outputs go to Quantization, which produces the Bit Stream]
Synthesis Structure of Filterbank
$g_k[n]$: impulse response of the k-th inverse quadrature mirror filter
N: number of channels, typically 32
↑: up-sampling
IMDCT: Inverse Modified Discrete Cosine Transform
[Block diagram: the Bit Stream is decoded (Decoding) into N parallel branches; branch k applies an IMDCT, up-sampling (↑), and $g_k[n]$ ($g_1[n], \ldots, g_N[n]$); the branch outputs are combined into the Audio Output]
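A minimal sketch of an MDCT/IMDCT pair with 50% overlap-add, to illustrate that the analysis and synthesis transforms reconstruct the signal when a power-complementary window is used. This is a single-band toy example, not the 32-channel structure of the diagrams. The frame size, test signal, scaling convention, and variable names are all assumptions.

```matlab
M = 8;                                        % hop size = MDCT coefficients per frame (illustrative)
n = (0:2*M-1).';                              % time index inside a frame of length 2M
k = 0:M-1;                                    % frequency index
w = sin(pi*(n + 0.5)/(2*M));                  % sine window of length 2M
C = cos(pi/M * (n + 0.5 + M/2) * (k + 0.5));  % 2M-by-M matrix of MDCT basis functions
x = sin(0.3*(0:5*M-1)).';                     % test signal
y = zeros(size(x));
for start = 1:M:numel(x) - 2*M + 1
    idx = (start:start + 2*M - 1).';
    X = C.' * (w .* x(idx));                  % MDCT: M coefficients from 2M windowed samples
    y(idx) = y(idx) + (2/M) * w .* (C * X);   % IMDCT, synthesis window, overlap-add
end
err = max(abs(y(M+1:end-M) - x(M+1:end-M)));  % error where two frames overlap
```

The time-domain aliasing introduced by each frame cancels in the overlap-add, so err should be at the level of floating-point round-off. The first and last M samples are covered by only one frame and are not reconstructed.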
Psycho-Acoustic Modeling
Psychoacoustic Model
• The masking threshold is computed according to human auditory perception.
• The masking threshold is used to quantize the Discrete Cosine Transform coefficients.
• The analysis is done in the frequency domain, represented by the DFT and computed with the FFT.
Threshold of Hearing
• Absolute threshold of audible events in quiet conditions (no other sounds).
• Any signal below the threshold can be removed without affecting perception.
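The slides do not give a formula for the threshold in quiet. As an assumption, the sketch below uses Terhardt's widely used approximation of the absolute threshold of hearing (in dB SPL, f in Hz). The frequency grid and variable names are illustrative.

```matlab
f  = [100 1000 3300 10000];                   % frequencies in Hz (illustrative)
fk = f/1000;                                  % frequency in kHz
% Terhardt's approximation of the threshold in quiet (an assumption, not taken from the slides)
Tq = 3.64*fk.^(-0.8) - 6.5*exp(-0.6*(fk - 3.3).^2) + 1e-3*fk.^4;   % dB SPL
```

The curve has its minimum in the 3-4 kHz region, where hearing is most sensitive, and rises steeply toward very low and very high frequencies.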
Threshold of Hearing
Frequency Masking
• Schröder spreading function
• Bark scale function:
$$z(f) = 13 \cdot \arctan(0.00076 f) + 3.5 \cdot \arctan\left(\left(\frac{f}{7500}\right)^{2}\right)$$
$$\Delta z = z(f_{\mathrm{maskee}}) - z(f_{\mathrm{masker}})$$
$$10 \log_{10} F(\Delta z) = 15.81 + 7.5\,(\Delta z + 0.474) - 17.5\left[1 + (\Delta z + 0.474)^{2}\right]^{1/2}$$
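The spreading function can be evaluated for a masker/maskee pair as sketched below. The 1 kHz masker and 900 Hz maskee are chosen as an illustration. The function handles and variable names are assumptions.

```matlab
z_of = @(f) 13*atan(0.00076*f) + 3.5*atan((f/7500).^2);                    % Hz -> Bark
F_dB = @(dz) 15.81 + 7.5*(dz + 0.474) - 17.5*sqrt(1 + (dz + 0.474).^2);    % spreading function in dB
f_masker = 1000;                              % masker frequency in Hz (illustrative)
f_maskee = 900;                               % maskee frequency in Hz (illustrative)
dz  = z_of(f_maskee) - z_of(f_masker);        % Bark distance, negative for a maskee below the masker
att = F_dB(dz);                               % spreading-function value at the maskee, in dB
```

The function is close to 0 dB at dz = 0 and falls off more slowly toward higher frequencies (about 10 dB per Bark) than toward lower ones (about 25 dB per Bark).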
Masking Curve
Primary Tone 1kHz
Masked Tone 900 Hz
Combined Sound 1kHz + 0.9kHz
Combined 1kHz + 0.9kHz (-10dB)
Combined 1kHz + 5kHz (-10dB)
END