AUDIO CODER USING PERCEPTUAL LINEAR PREDICTIVE CODING
Pratik R. Bhatt
B.E., C.U. Shah College of Engineering and Technology, India, 2006
PROJECT
Submitted in partial satisfaction of
the requirements for the degree of
MASTER OF SCIENCE
in
ELECTRICAL AND ELECTRONIC ENGINEERING
at
CALIFORNIA STATE UNIVERSITY, SACRAMENTO
FALL
2010
AUDIO CODER USING PERCEPTUAL LINEAR PREDICTIVE CODING
A Project
by
Pratik R. Bhatt
Approved by:
__________________________________, Committee Chair
Jing Pang, Ph.D.
__________________________________, Second Reader
Preetham Kumar, Ph.D.
____________________________
Date
Student: Pratik R. Bhatt
I certify that this student has met the requirements for format contained in the University
format manual, and that this project is suitable for shelving in the Library and credit is to
be awarded for the project.
, Graduate Coordinator
Preetham Kumar, Ph.D.
Date
Department of Electrical and Electronic Engineering
Abstract
of
AUDIO CODER USING PERCEPTUAL LINEAR PREDICTIVE CODING
by
Pratik R. Bhatt
Audio coder using perceptual linear predictive coding is a new technique for
compressing and decompressing an audio signal. This technique codes only the
information in an audio signal that is audible to the human ear; all other information is
discarded during coding. The audio signal is given to a psychoacoustic model, which
generates information to control the linear prediction filter. At the encoder side, the
signal is passed through a prediction filter; the filtered signal then contains less
information, as it is coded according to perceptual criteria. At the decoder side, the
encoded signal is decoded using a linear prediction filter, which is the inverse of the
filter used at the encoder side. The decoded signal is similar to the original signal. Since
only the information in the audio signal that is relevant to human hearing is coded, the
signal contains less data and has a high signal-to-noise ratio.
This project is about designing and implementing the MATLAB code for an audio
coder using perceptual linear predictive coding. The following tasks were performed in
this project:
(i) Process the audio signal according to a perceptual model.
(ii) Encode and decode the signal using the linear predictor.
(iii) Check simulation results for the original audio signal as well as the retrieved
audio signal after signal processing.
(iv) Listen to the wave file of the original as well as the reconstructed signal.
, Committee Chair
Jing Pang, Ph.D.
____________________________
Date
ACKNOWLEDGMENTS
I would like to thank my project adviser, Dr. Jing Pang, for giving me an
opportunity to implement the Audio coder based on perceptual linear predictive coding. I
have learned many concepts of signal processing during the implementation of this
project. Dr. Pang provided great support and guidance throughout the development of this
project. She has been a great listener and problem solver throughout the implementation.
Her valuable research experience has helped me to develop fundamental research skills.
I would also like to thank Dr. Preetham Kumar for his valuable guidance and
support in writing this project report. In addition, I would like to thank Dr. Suresh
Vadhva, Chair of the Electrical and Electronic Engineering Department, for his support in
completing the requirements for a Master’s degree at California State University,
Sacramento.
TABLE OF CONTENTS
Acknowledgments
List of Tables
List of Figures
Chapter
1. INTRODUCTION
1.1 Background
1.2 Types of Audio Coding
1.3 Concept
1.4 Overview of Design
2. PSYCHOACOUSTIC MODEL
2.1 Threshold of Hearing
2.2 Critical Bands
2.3 Simultaneous Masking
2.3.1 Noise Masking Tone
2.3.2 Tone Masking Noise
2.4 Non-simultaneous Masking
2.5 Calculate Overall Masking Threshold
3. LINEAR PREDICTIVE CODING
3.1 Linear Prediction
3.2 Linear FIR Predictor as Pre-filter
3.3 Linear IIR Predictor as Post-filter
3.4 Calculation of LPC Filter Coefficients
4. MATLAB IMPLEMENTATION
5. SIMULATION RESULTS
6. CONCLUSION
6.1 Future Improvements
Appendix
References
LIST OF TABLES
1. Table 1: List of critical bands
LIST OF FIGURES
1. Figure 1: Audio coder using perceptual linear predictive coding
2. Figure 2: Threshold of hearing for average human
3. Figure 3: Frequency to place transformation along the basilar membrane
4. Figure 4: Narrow band noise sources masked by two tones
5. Figure 5: Decrease in threshold after critical bandwidth
6. Figure 6: Noise masking tone
7. Figure 7: Tone masking noise
8. Figure 8: Non-simultaneous masking properties of human ear
9. Figure 9: Steps to calculate masking threshold of signal
10. Figure 10: FIR linear predictor structure
11. Figure 11: IIR linear predictor structure
12. Figure 12: Determination of linear predictor coefficients
13. Figure 13: Input signal
14. Figure 14: Pre-filtered signal
15. Figure 15: Reconstructed signal from post-filter
16. Figure 16: Spectrogram for masking threshold of input signal
17. Figure 17: Spectrogram for estimated masking threshold
18. Figure 18: Masking threshold and LPC response for frame of signal
Chapter 1
INTRODUCTION
1.1 Background
In our day-to-day lives, we use various kinds of audio equipment, such as cell
phones, music players, and wireless radio receivers. These devices process audio signals
in different ways. The basic form of an audio signal is analog, but it is easier to process a
signal in digital form than in analog form. The process of converting an analog signal
into a digital representation is described as audio coding, and audio processing has
become significantly more digitized in recent years. The general procedure of audio
processing is as follows: an analog signal is sampled at regular intervals and the sampled
data is encoded with a coding scheme; at the decoder side, the data is converted back to
the original form. As the sampling rate of the audio signal is increased, the bandwidth
required by that signal increases and the audio quality also increases. The main goal of
audio coding is therefore to reduce the bit rate while maintaining a high signal-to-noise
ratio and high audio quality.
1.2 Types of Audio Coding
There are various types of coding schemes available for audio coding. Each
coding scheme has its benefits and drawbacks depending on the technique used to
process the audio signal. For any coding technique, an area of concern is the quality of
the signal and the achievable bit rate used to reduce the transmission bandwidth. Based
on these criteria, different coding techniques are used to code the audio signal. One of
the coding techniques is waveform coding, which codes the signal waveform directly
and produces a reconstructed signal that closely matches the original signal.
Pulse coded modulation (PCM), Differential pulse coded modulation (DPCM), and
Adaptive differential pulse coded modulation (ADPCM) are known as waveform coders.
Source coding technique or parametric coder uses a model of the signal source to code
the signal; this source model is represented by a mathematical representation to reduce
signal redundancies. Linear predictive coding is an example of source coding. In linear
predictive coding, signal parameters are coded as well as transmitted and these
parameters are used to produce a signal at the decoder side. In this technique, the
reconstructed signal does not have the same quality as the original signal. Sub-band
coding and transform coding are examples of source coders that work in the frequency
domain [1].
Transform-based coding uses a transform of a given length to analyze and process the
audio signal; the resolution of the analysis therefore depends on the transform length. A
long transform provides good resolution for signals that change very little over time,
while a short transform provides good resolution for signals that change abruptly within
a short time. The required transform length is therefore dependent on the type of signal
to be coded [2]. Another
concept is to code the signal according to perceptual criteria. In this method, information
or the sound of a signal that cannot be heard by the human ear is removed and
quantization is performed according to a perceptual model. This allows quantization
noise to be kept below the masking threshold of the signal such that it cannot be
perceived by the human ear; then, appropriate transform coding is used to achieve
compression by removing redundant information from the signal. The purpose of an
audio coder using perceptual linear predictive coding is to use a linear predictive filter at
the encoder side as well as at the decoder side. A psychoacoustic model is used to control
the linear predictor filter according to a perceptual criterion of the human auditory
system. The linear predictive filter at the encoder side codes the signal based on a
psychoacoustic model and the linear predictive filter at the decoder side decodes the
signal to reconstruct the original signal.
1.3 Concept
The basic idea of an audio coder based on perceptual linear predictive coding is to
combine a linear predictive filter with a psychoacoustic model. The combination of a
psychoacoustic model and linear predictive coding is used to reduce irrelevant
information of the input signal. The linear predictor at the encoder side is known as the
pre-filter and the linear predictor at the decoder side is known as the post-filter. The pre- and
post-filter are controlled by a psychoacoustic model which shapes the quantization noise
such that it stays below the masking threshold of the signal and it becomes inaudible to
the human ear. This technique provides good resolution for speech as well as audio
signals and the resolution of signals is independent of the coding technique used [2].
1.4 Overview of Design
The project design consists of a pre-filter and post-filter with a psychoacoustic
model. Figure 1 shows a basic block diagram of an audio coder using perceptual linear
predictive coding. The psychoacoustic model takes the input audio signal and generates a
masking threshold value for each frame of signal. It utilizes properties of the human
auditory system to generate the masking threshold of the input audio signal.
Figure 1. Audio coder using perceptual linear predictive coding [2] (block diagram: the
input audio signal feeds the psychoacoustic model and the pre-filter; the pre-filtered
signal is passed to the post-filter, which produces the post-filtered output signal)
Pre- and post-filter are based on the linear predictive filter structure and are
adapted by the masking threshold generated by a psychoacoustic model. The masking
threshold of an audio signal is used to estimate the linear predictive filter coefficients,
and these estimated coefficients change with time making a linear predictive filter
adaptive.
At the transmitter side, a psychoacoustic pre-filter generates a time varying
magnitude response according to the masking threshold of the input signal, and this is
achieved by time varying filter coefficients from a masking threshold for that particular
frame of signal. A linear predictor as a pre-filter takes the input audio signal and
processes it for each frame. Adaptation of the pre-filter using a psychoacoustic model
removes any inaudible components from the input signal and reduces information that is
irrelevant to the human ear. This pre-filtered audio signal is now coded, as it contains less
information than the original signal. During pre-filtering, the signal is processed or coded
according to perceptual criteria.
At the receiver side, a linear predictor works as a post-filter. It takes an encoded
signal to reconstruct the original signal according to a perceptual model. This
psychoacoustic post-filter also shapes noise according to the masking threshold level of a
signal. Filter coefficients are estimated from the masking threshold for each frame of
signal. Therefore, the post-filter provides a time varying magnitude response for each
frame of signal. Post-filter has an inverse magnitude response of the pre-filter used at the
encoder side, and it predicts the output using the same predictor coefficients used by the
pre-filter. In this design, the predictor coefficients are not transmitted but are stored in a
file for each frame of signal such that a post-filter has access to these coefficients to
reconstruct the signal.
Chapter 2
PSYCHOACOUSTIC MODEL
The psychoacoustic model is based on the characteristics of the human ear. The
human ear does not hear all frequencies in the same manner. The human ear typically
hears sounds in the 20 Hz to 20 KHz range. Sounds outside of this range are not
perceived by the human ear. The psychoacoustic model is used to eliminate information
which is not audible to the average human ear. This basic phenomenon of the
psychoacoustic model is used in audio signal processing to reduce the size of information
or data of the audio signal. While doing audio coding based on a psychoacoustic model,
signal components in the range of the average human ear are coded and others are
discarded since they are not perceived by the human ear. By coding this way, fewer bits
are required to code an audio signal and efficiency of the coding scheme is improved.
2.1 Threshold of Hearing
In the frequency range of 20 Hz to 20 KHz, a sound must have a minimum sound
pressure level (SPL) to be heard by the human ear. This minimum sound pressure level is
known as the threshold of hearing for the average human. If sound has a pressure level
below the absolute threshold of hearing, then it will not be heard even if it is in the
frequency range of the human ear. The equation for the threshold of hearing is as follows
[3]:
Th(f) = 3.64 (f/1000)^(−0.8) − 6.5 e^(−0.6 (f/1000 − 3.3)^2) + 10^(−3) (f/1000)^4 (dB SPL)   (1)
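As a quick reference, equation (1) can be transcribed directly into MATLAB; the function name ath below is illustrative and is not part of the project code:

% ath.m -- absolute threshold of hearing in dB SPL, per equation (1).
% f is a frequency vector in Hz (illustrative helper, not project code).
function Th = ath (f)
kf = f / 1000;                                   % frequency in kHz
Th = 3.64 * kf.^(-0.8) ...
   - 6.5 * exp (-0.6 * (kf - 3.3).^2) ...
   + 1e-3 * kf.^4;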
For example, a signal with a frequency of 1 KHz is going to be heard only when its
pressure level exceeds the value given by equation (1). This level is defined as the threshold level
of hearing for this particular frequency. Each different frequency has its own threshold
level with which to have an effect on the human ear. Figure 2 shows the value of the
absolute threshold of hearing for different values of frequency. It also shows that the
level for the threshold of hearing is more sensitive in the middle range of the human
auditory system. While coding the signal, if the signal frequency has a pressure level less
than the threshold of hearing, then that component can be removed since it cannot be
Sound Pressure Level, SPL (dB)
heard by the average human [4].
Frequency (Hz)
Figure 2. Threshold of hearing for average human [4]
2.2 Critical Bands
The basic idea of critical bands is to analyze the human auditory system. As
signals vary over time, the threshold of hearing also changes with time. By analyzing the
human auditory system, determination of the actual threshold becomes possible. Figure 3
shows how the frequency to place transformation occurs along the basilar membrane of
the human inner ear. Inside the inner ear (cochlea), the wave travels along the length of
basilar membrane and it reaches the maximum amplitude at a certain point of the basilar
membrane because each point of basilar membrane is associated with a particular
frequency [3].
Figure 3. Frequency to place transformation along the basilar membrane [4] (plot:
displacement versus distance from oval window in mm)
To imitate this behavior of the human ear (cochlea) for audio spectrum analysis,
band pass filters are used. They are also known as Cochlea filters. The bandwidth of
each filter is known as critical bandwidth. This bandwidth is smaller for low frequencies
and it increases for higher frequencies. It is a non-linear function of frequency and can be
described as follows [4].
Critical Bandwidth = 25 + 75 (1 + 1.4 (f/1000)^2)^0.69 (Hz)   (2)
Figure 4. Narrow band noise sources masked by two tones [3] (plot: sound pressure level
in dB versus frequency, showing the audibility threshold between the two tones)
Figure 5. Decrease in threshold after critical bandwidth [3] (plot: threshold versus tone
separation Δf relative to the critical bandwidth f_cb)
One way to see the effect on the threshold in critical bandwidth is shown in
Figure 4. It shows that the narrowband noise is located between two masking tones
separated in the critical band. Figure 5 shows that within the critical band value, the
threshold remains constant and independent of location of the masking tone but it starts to
decrease as the difference between tones increases beyond the critical bandwidth. To
perform spectrum analysis, the human auditory range is divided into 25 critical bands as
shown in Table 1, and the width of each critical band corresponds to one "bark." The bark
value is calculated from a given frequency by the following equation [4].
Z(f) = 13 arctan(0.00076 f) + 3.5 arctan[(f/7500)^2] (bark)   (3)

Band No   Center Frequency (Hz)   Bandwidth (Hz)
1         50                      0 - 100
2         150                     100 - 200
3         250                     200 - 300
4         350                     300 - 400
5         450                     400 - 510
6         570                     510 - 630
7         700                     630 - 770
8         840                     770 - 920
9         1000                    920 - 1080
10        1175                    1080 - 1270
11        1370                    1270 - 1480
12        1600                    1480 - 1720
13        1850                    1720 - 2000
14        2150                    2000 - 2320
15        2500                    2320 - 2700
16        2900                    2700 - 3150
17        3400                    3150 - 3700
18        4000                    3700 - 4400
19        4800                    4400 - 5300
20        5800                    5300 - 6400
21        7000                    6400 - 7700
22        8500                    7700 - 9500
23        10500                   9500 - 12000
24        13500                   12000 - 15500
25        19500                   15500 -

Table 1. List of critical bands [4]
Table 1 shows how the human auditory range is divided into 25 critical bands; the
bandwidth of the bands is non-linear with frequency.
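The appendix code calls a helper hz2bark to perform this mapping, but that helper is not listed there; a minimal sketch consistent with equation (3) would be:

% hz2bark.m -- convert frequency in Hz to the bark scale, per equation (3).
function b = hz2bark (f)
b = 13 * atan (0.00076 * f) + 3.5 * atan ((f / 7500).^2);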
2.3 Simultaneous Masking
In an audio signal, there is more than one sound or signal present at a particular
time or frequency. When the presence of a strong signal makes a weaker signal inaudible
to the human ear, the event is known as a masking effect. This effect increases the
threshold value of hearing from the threshold value in quiet. A strong signal is known as
a masker while a weaker signal is known as the masked signal. When two signals are close in
frequency range or within critical bandwidth and perceived by the human ear at same
time, then this type of masking is known as simultaneous masking. This type of masking
describes most of the masking effects in an audio signal. In the frequency domain, a
magnitude spectrum determines which components will be detected and which
components will be masked. The theory of simultaneous masking is explained by the fact
that the basilar membrane is excited by a strong signal much more than by a weaker
signal; therefore, a weaker signal is not detected [3]. In an audio signal, simultaneous
masking is described mainly by Noise masking tone and Tone masking noise scenarios.
2.3.1 Noise Masking Tone
Figure 6. Noise masking tone [4] (plot: SPL in dB versus frequency in Hz; a noise
masker of critical bandwidth centered at 410 Hz)
The above figure shows a noise masker with critical bandwidth centered at 410 Hz; it
has a sound pressure level of 80 dB. The masked tone is also centered at 410 Hz, with a
sound pressure level of 76 dB. The masked tone threshold level depends on the center
frequency and the sound pressure level (SPL) of the masker noise. When the tone frequency
becomes equal to the center frequency of the masking noise, the minimum signal-to-mask
ratio (SMR) is achieved, which gives the threshold value for a tone masked by critical band
noise. The signal-to-mask ratio is defined as the level difference between the noise masker
and the masked tone.
The scenario represented in Figure 6 shows the minimum signal-to-mask ratio with the
value of 4 dB. The signal-to-mask ratio increases and the masking power decreases as the
tone frequency moves above or below the center frequency [4].
2.3.2 Tone Masking Noise
Figure 7. Tone masking noise [4] (plot: SPL in dB versus frequency in Hz; a masking
tone at 1000 Hz within the critical bandwidth)
The above figure shows a tone signal with a center frequency of 1000 Hz within a
critical bandwidth; it has a sound pressure level of 80 dB. This tone signal masks a noise
centered at 1000 Hz with a sound pressure level of 56 dB. The masked noise threshold
level depends on the center frequency and sound pressure level of the masker tone. When
the center frequency of the masked noise is close to the frequency of the masker tone, the
minimum signal-to-mask ratio (SMR) is achieved, giving the threshold value for noise
masked by a tone. In this case, a minimum SMR value of 24 dB is achieved for the
threshold value of the noise. The SMR value increases and the masking power decreases
as the noise moves above or below the center frequency of the masking tone. Here the
minimum value of SMR is much higher compared to the noise masking tone case [4].
2.4 Non-simultaneous Masking
Simultaneous masking takes place when a masker and masked signal are
perceived at the same time. However, a masking effect also takes place before the masker
appears as well as after the masker is removed. This type of masking is known as
non-simultaneous or temporal masking [3][4].
Figure 8. Non-simultaneous masking properties of human ear [3] (plot: masker audibility
threshold increase in dB versus time after masker appearance and time after masker
removal, in ms)
Pre-masking is described as the effect before the masker appears, and post-masking
as the effect after the masker is removed. Figure 8 shows the pre- and post-masking
regions, and it also shows how the threshold of hearing changes within these regions with
non-simultaneous masking. Pre-masking can range from 1-2 ms before
the masker appears and post-masking can be in the range of 100-300 ms depending on
intensity as well as the duration of the masker. This masking effect increases the
threshold of hearing for the masked signal from the threshold of hearing in quiet [3].
2.5 Calculate Overall Masking Threshold [4]
For this project, a basic psychoacoustic model of MPEG layer-1 is used to
generate the masking threshold of input signal. These steps are as follows.
Step 1: Get the power spectrum of signal.
In this step, an input signal is first normalized such that its sound pressure level
stays at the appropriate threshold of hearing for the average human. After normalization,
Fast Fourier Transform (FFT) is used to take frames of the input signal, and the power
spectrum of the input signal is obtained for each frame.
Step 2: Determination of tone and noise maskers.
For a given power spectrum of the signal, a component is considered a tone masker
when it exceeds the level of its neighboring components by a minimum of 7 dB; its power
is then combined with that of its immediate neighbors to generate a new power spectrum
of maskers. A given frequency bin i is a tone if its power satisfies both of the conditions
below:
(i) P[i] > P[i−1] and P[i] > P[i+1]
(ii) P[i] > P[i+Nk] + 7 dB and P[i] > P[i−Nk] + 7 dB
where the neighborhood Nk is defined as:
[i−2 .. i+2] for 0.17 Hz < f < 5.5 KHz
[i−3 .. i+3] for 5.5 KHz ≤ f < 11 KHz
[i−6 .. i+6] for 11 KHz ≤ f < 20 KHz
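As an illustration, the test above can be sketched in MATLAB as follows; here P is the power spectrum in dB for one frame, f holds the bin frequencies in Hz, the neighborhood distances follow the usual MPEG-1 convention, and the loop bounds are simplified to sidestep boundary handling:

% Sketch: flag bin i as a tone masker when it is a local maximum and
% exceeds every bin in its neighborhood Nk by at least 7 dB.
isTone = false (size (P));
for i = 7 : length (P) - 7
    if f (i) < 5500
        nk = 2;        % 0.17 Hz  < f < 5.5 KHz
    elseif f (i) < 11000
        nk = 2:3;      % 5.5 KHz <= f < 11 KHz
    else
        nk = 2:6;      % 11 KHz  <= f < 20 KHz
    end
    isTone (i) = P (i) > P (i-1) && P (i) > P (i+1) ...
        && all (P (i) > P (i + nk) + 7) && all (P (i) > P (i - nk) + 7);
end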
Noise maskers are calculated after determining tone maskers for a given spectrum of
signal.
Figure 9. Steps to calculate masking threshold of signal [4] (flowchart: input signal →
get the power spectrum of signal → determine tone maskers and noise maskers →
reduction and determination of maskers → determine individual masking threshold →
get the overall masking threshold)
Step 3: Reduce and find maskers in critical bandwidth.
In the masker power spectrum, if the masker is below the threshold of hearing,
then it is discarded since it is not heard by the average human. After that, if there is more
than one masker within a critical bandwidth for a given power spectrum, then only the
strongest masker is kept while the weaker one is discarded.
Step 4: Determine individual masking threshold.
After the tone and noise maskers are determined, the individual masking thresholds
for noise and tone are calculated using a spreading function. A spreading function
determines the minimum level at which a neighboring frequency component is still
affected by the masker; it describes the effect of masking on nearby critical bands. It
depends on the masker location j, the masked signal location i, the tone masker power
P_TM(j), and the difference between the masker and masked signal locations in barks. It
is described as follows.
Spreading function SF(i,j) = 17Δ − 0.4 P_TM(j) + 11,              −3 ≤ Δ < −1   (4)
                           = (0.4 P_TM(j) + 6)Δ,                  −1 ≤ Δ < 0    (5)
                           = −17Δ,                                 0 ≤ Δ < 1    (6)
                           = (0.15 P_TM(j) − 17)Δ − 0.15 P_TM(j),  1 ≤ Δ < 8    (7)
where Δ = z(i) − z(j) is the distance in barks between the masked signal location i and
the masker location j.
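A direct MATLAB transcription of equations (4) through (7) might look like the following; the function name spreadfn is illustrative, with dz standing for Δ in barks and Ptm for the masker power P_TM(j) in dB:

% spreadfn.m -- spreading function of equations (4)-(7), in dB.
function sf = spreadfn (dz, Ptm)
if dz < -3 || dz >= 8
    sf = -inf;                                % outside the masking window
elseif dz < -1
    sf = 17 * dz - 0.4 * Ptm + 11;            % equation (4)
elseif dz < 0
    sf = (0.4 * Ptm + 6) * dz;                % equation (5)
elseif dz < 1
    sf = -17 * dz;                            % equation (6)
else
    sf = (0.15 * Ptm - 17) * dz - 0.15 * Ptm; % equation (7)
end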
Step 5: Find overall masking threshold.
To determine the overall masking threshold, the individual masking thresholds at
each frequency component are added together over all maskers.
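The text leaves the exact combination rule implicit; in the MPEG-1 model the usual convention is to sum the threshold in quiet and the individual masker thresholds in the power domain. A sketch under that assumption, where Tq, Ttm and Tnm are dB-valued (Tq per frequency bin, Ttm and Tnm one row per masker):

% Overall masking threshold in dB for each frequency bin, combining the
% threshold in quiet with all tone and noise masker thresholds in power.
Tg = 10 * log10 (10.^(Tq/10) + sum (10.^(Ttm/10), 1) + sum (10.^(Tnm/10), 1));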
Chapter 3
LINEAR PREDICTIVE CODING
3.1 Linear Prediction
A speech signal is continuous and varies with time; therefore, it is difficult to analyze
and complex to code. After coding, the transmission of the coded signal requires more
bandwidth and a higher bit rate, since speech contains a large amount of information. At
the receiver side, it also becomes much more complex to reconstruct the original signal
from the coded speech signal. For this reason, it becomes necessary to code a signal such
that it requires a lower bit rate while the reconstructed signal still resembles the original
signal.
The basic idea behind linear prediction is that consecutive samples of a signal are
closely related: their values are approximately the same and the difference between them
is small. It therefore becomes possible to predict a sample from a linear combination of
previous samples. For example, let x be a continuous time-varying signal whose value at
time t is x(t). When this signal is converted into discrete time domain samples, the value
of the nth sample is x(n). To determine the value of x(n) using linear prediction
techniques, the values of past samples x(n−1), x(n−2), x(n−3), …, x(n−p) are used, where
p is the predictor order of the filter [8]. This coding based on the values of previous
samples is known as linear predictive coding. This
coding technique is used to code speech signals. The normal speech signal is sampled at
8000 samples/sec and 8 bits are used to code each sample. This provides a bit rate of around
64,000 bits/sec. By using linear predictive coding, a bit rate of 2400 bits/sec is achieved.
In a linear prediction system or coding scheme, prediction error e(n) is transmitted rather
than a sampled signal x(n) because the error is relatively smaller than a sample of signal;
therefore, it requires fewer bits to code [5]. At the receiver side, this error signal is used
to generate and reconstruct the original signal using previous output samples.
3.2 Linear FIR Predictor as Pre-filter
The basic Linear FIR predictor is defined as the finite impulse response linear
predictor. This structure is used to build a psychoacoustic pre-filter at the encoder side.
The pre-filter takes an input sample from a current frame of signal and the linear
predictor structure determines the predicted sample using previous samples of the input
signal [6].
Figure 10. FIR linear predictor structure [6] (diagram: the input x(n) passes through a
chain of unit delays z⁻¹; the delayed samples are weighted by the coefficients a1, a2, …,
ap and summed to form the prediction x̂(n), which is subtracted from x(n) to give the
error e(n))
Figure 10 shows the basic FIR linear predictor structure. The predicted signal is
generated by multiplying predictor coefficients with corresponding previous values of a
signal and then adding these values together. Here a1, a2, …, ap are known as the linear
prediction coefficients; they are explained in the following section. The value of p is
known as the predictor order, and its value determines how closely the signal is estimated.
The predicted signal x̂(n) is a function of these LPC coefficients and the previous input
samples, and it is represented by the following equation.
x̂(n) = Σ_{k=1}^{p} a_k x(n−k)   (8)
Prediction error = original signal − predicted signal   (9)
e(n) = x(n) − x̂(n)   (10)
Substituting equation (8),
e(n) = x(n) − Σ_{k=1}^{p} a_k x(n−k)   (11)
To convert the above equation to the frequency domain, taking the z-transform:
E(z) = X(z) − Σ_{k=1}^{p} a_k X(z) z^(−k)   (12)
E(z) = X(z) [1 − Σ_{k=1}^{p} a_k z^(−k)]   (13)
E(z) = X(z) H(z)   (14)
where H(z) = 1 − Σ_{k=1}^{p} a_k z^(−k) is the transfer function of the linear predictor as a
pre-filter. Here the error signal E(z) is the product of the input signal X(z) and the
transfer function of the FIR predictor filter H(z). The form of H(z) shows that it is an
all-zero filter; it is also called the analysis filter in linear predictive coding.
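Because H(z) is FIR, this pre-filtering step can be expressed with MATLAB's built-in filter function. The sketch below uses the sign convention of equation (8), with a assumed to be the row vector [a1 … ap]; the project's own prefilter.m in the appendix performs the same operation sample by sample:

% Prediction error (pre-filter output) for one frame x.
A = [1, -a];             % coefficients of H(z) = 1 - sum_k a_k z^-k
e = filter (A, 1, x);    % e(n) = x(n) - sum_k a_k x(n-k), equation (11)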
3.3 Linear IIR Predictor as Post-filter
The basic infinite impulse response linear predictor is known as linear IIR
predictor. This linear predictor structure is used for a psychoacoustic post-filter at the
decoder side, and its filter response is the inverse of that of the pre-filter. The linear IIR
predictor takes the error signal generated at the encoder side and forms a prediction from
previous output samples. To reconstruct the original signal, the predicted values from
previous samples are added to the error signal [6].
Figure 11. IIR linear predictor structure [6] (diagram: the error input e(n) is added to a
prediction ŷ(n) formed from previous outputs passed through unit delays z⁻¹ and
weighted by a1, a2, …, ap, producing the output y(n))
The above figure shows the basic structure for IIR linear predictor. In this
structure, previous output samples are used to predict the current output sample. Here, the
reconstructed signal is a function of the error signal and the predicted output samples.
y(n) = e(n) + Σ_{k=1}^{p} a_k y(n−k)   (15)
Taking the z-transform of the above equation,
Y(z) = E(z) + Σ_{k=1}^{p} a_k Y(z) z^(−k)   (16)
Y(z) [1 − Σ_{k=1}^{p} a_k z^(−k)] = E(z)   (17)
Y(z)/E(z) = 1 / (1 − Σ_{k=1}^{p} a_k z^(−k)) = 1/H(z)   (18)
The above equation represents a transfer function of the linear predictor at the
decoder side. The post-filter transfer function is the inverse of the pre-filter transfer function.
This transfer function represents an all-pole filter. It is also defined as a synthesis filter in
linear predictive coding since it is used to reproduce speech signals at the decoder side.
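Since the post-filter 1/H(z) exactly inverts the pre-filter H(z), a quick round-trip check can be written with the same built-in filter call (a sketch under the equation (8) convention, with quantization of the error signal omitted):

% Round trip: pre-filtering followed by post-filtering recovers the input.
A  = [1, -a];            % H(z) of the pre-filter, a = [a1 ... ap]
e  = filter (A, 1, x);   % analysis / pre-filter, equation (11)
xr = filter (1, A, e);   % synthesis / post-filter, equation (18)
max (abs (xr - x))       % ~0, up to floating-point rounding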
3.4 Calculation of LPC Filter Coefficients
In a normal linear predictor, coefficients are obtained from a signal for each
frame, and they are calculated using linear equations to get a minimum prediction error.
These coefficients make the linear predictor response vary with time according to the signal. The
calculated coefficients are transmitted with coded speech such that they are used to
reconstruct the signal at the receiver side [9].
Figure 12 shows the method to determine linear predictor coefficients from the
masking threshold of each frame of the signal. The psychoacoustic pre- and post-filters are linear
predictors but they are adapted by the masking threshold generated by a psychoacoustic
model.
The masking threshold is estimated for each frame of the signal. Filter
coefficients are determined from the masking threshold spectrum of the signal. However,
in normal linear predictor, they are determined from signal spectra [6].
Figure 12. Determination of linear predictor coefficients [6] (flowchart: audio signal →
psychoacoustic model → masking threshold → IFFT to get autocorrelation values →
Levinson's theorem → linear predictor coefficients)
In a conventional linear predictor, the coefficients are extracted from the signal
spectrum through the autocorrelation values of the signal. Here, the autocorrelation
values are obtained from the masking threshold of the signal instead, so that the predictor
coefficients represent an approximation of the masking threshold rather than the signal.
According to [8], linear predictor coefficients are obtained such that the predicted
sample value is close to the original sample; therefore, prediction error should be small.
Overall error is defined as the sum of the squared error over the samples, which can be
expressed by the equation below.
E = Σ e²(n)   (19)
From equation (11),
e(n) = x(n) − Σ_{k=1}^{p} a_k x(n−k)   (20)
E = Σ {x(n) − Σ_{k=1}^{p} a_k x(n−k)}²   (21)
To get the filter coefficients, the derivative of E is taken with respect to each coefficient
a_k, k = 1, 2, …, p, and set to zero:
dE/da_1 = 0, dE/da_2 = 0, …, dE/da_p = 0
The above linear equations can be represented in matrix form as follows:

| R(0)     R(1)     ⋯   R(p−2)   R(p−1) |   | a_1     |   | R(1)   |
| R(1)     R(0)     ⋯   R(p−3)   R(p−2) |   | a_2     |   | R(2)   |
|  ⋮        ⋮       ⋱     ⋮        ⋮    | × |  ⋮      | = |  ⋮     |
| R(p−2)   R(p−3)   ⋯   R(0)     R(1)   |   | a_(p−1) |   | R(p−1) |
| R(p−1)   R(p−2)   ⋯   R(1)     R(0)   |   | a_p     |   | R(p)   |
(22)

Here R(k) = Σ_{n=k}^{N−1} x(n) x(n−k) is the autocorrelation function of the sequence x(n).

For this project, predictor order p = 10 is taken, so the above equation simplifies to:

| R(0)   R(1)   ⋯   R(8)   R(9) |   | a_1  |   | R(1)  |
| R(1)   R(0)   ⋯   R(7)   R(8) |   | a_2  |   | R(2)  |
|  ⋮      ⋮     ⋱    ⋮      ⋮   | × |  ⋮   | = |  ⋮    |
| R(8)   R(7)   ⋯   R(0)   R(1) |   | a_9  |   | R(9)  |
| R(9)   R(8)   ⋯   R(1)   R(0) |   | a_10 |   | R(10) |
(23)

The above equation can be represented compactly by:

R · a = r   (24)

where R is the 10×10 Toeplitz matrix of autocorrelation values, a = [a_1, a_2, …, a_10]^T
is the vector of predictor coefficients, and r = [R(1), R(2), …, R(10)]^T.
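For a small order such as p = 10, equation (24) can also be solved directly in MATLAB; Rv below is an assumed vector holding the autocorrelation values R(0) through R(10):

% Direct solution of R * a = r from equation (24), p = 10.
p = 10;
R = toeplitz (Rv (1:p));   % 10x10 symmetric Toeplitz matrix from R(0)..R(9)
r = Rv (2:p+1);            % right-hand side [R(1) ... R(10)]
a = R \ r (:);             % predictor coefficients a1 ... a10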
According to [7][9], to get the optimum predictor coefficients, the above equation is
solved by Levinson's recursion because it requires less computation than direct matrix
inversion. From equation (11), the sum of the squared error is represented by the equation
below:
E = Σ e²[n] = Σ {x(n) − Σ_{k=1}^{p} a_k x(n−k)}²   (25)
Solving the above equation for the Pth order can be represented as:
E_p = R(0) − a_1 R(1) − a_2 R(2) − … − a_p R(p)   (26)
E_p = R(0) − Σ_{k=1}^{p} a_k R(k)   (27)
To compute the pth order coefficients, the following steps are performed iteratively for
p = 1, 2, …, P:
(i) B_p = R(p) − Σ_{i=1}^{p−1} a_i^{(p−1)} R(p−i)
(ii) a_p^{(p)} = k_p = B_p / E_{p−1}
(iii) a_i^{(p)} = a_i^{(p−1)} − k_p a_{p−i}^{(p−1)}, where i = 1, 2, …, p−1
(iv) E_p = E_{p−1} (1 − k_p²)
Therefore, the final solution for the LPC coefficients is given by a_i = a_i^{(P)}, where
i = 1, 2, …, P. For each analysis frame, the psychoacoustic model calculates the masking
threshold of the audio signal. According to [6], the masking threshold is expressed in the
frequency domain and is treated as a power spectral density function of the signal, from
which the autocorrelation function can be obtained.
To convert the masking threshold from the frequency domain to the time domain,
the Inverse Fast Fourier Transform (IFFT) of the masking threshold is performed, using
the built-in IFFT function in MATLAB. By the Wiener-Khinchin theorem, the inverse
Fourier transform of a power spectral density function gives the values of the
autocorrelation function. While taking the Inverse Fast Fourier Transform of the masking
threshold, only the real part of the result is kept, because the autocorrelation of a real
signal is real-valued and only the real part is needed to approximate the predictor
coefficients. After getting the autocorrelation values for the masking threshold,
Levinson's recursion is used to determine the linear predictive coefficients.
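A condensed sketch of this chain, close in spirit to the project's getMTest.m but using the Signal Processing Toolbox function levinson in place of the custom levi routine; th is assumed to hold the masking threshold in dB over the positive-frequency bins of one frame:

% From the masking threshold to LPC coefficients for one frame.
th2 = [th, fliplr(th)];           % mirror to rebuild the negative frequencies
acf = real (ifft (10.^(th2/10))); % IFFT of the PSD gives the autocorrelation
p = 10;
a = levinson (acf (1:p+1), p);    % prediction error filter [1, a(2), ..., a(p+1)]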
Chapter 4
MATLAB IMPLEMENTATION
The audio coder using a perceptual linear predictive filter is implemented in MATLAB.
For this project, the perceptual model is the basic psychoacoustic model of MPEG
layer-1, and its implementation is taken from Rice University. It is used to determine the
overall masking threshold of the audio signal. This psychoacoustic model takes the
current portion of the signal and performs a Fast Fourier Transform (FFT), converting
the signal from the time domain to the frequency domain. To determine the masking
effect, the tone maskers, noise maskers, and overall maskers within a critical band are
calculated. After considering the overall effects of the maskers, the masking threshold is
obtained for each window frame of the input signal.
In the encoder part, the spectrum of the signal is computed over a 2048-point Fast
Fourier Transform (FFT) length, and the psychoacoustic model provides the masking
threshold for each frame. This FFT length gives a large number of samples for the
masking threshold; therefore, a close approximation of the predictor coefficients is
possible for each frame of signal. The linear predictor codes the audio signal using these
predictor coefficients. At the encoder side, the following steps are performed to code the
signal.
• A window size of 25 ms is used in the encoder as well as in the decoder because it
provides enough resolution to perform analysis for speech and audio signals.
• A processing window size of 5 ms is used in the encoder as well as in the decoder to
update the predictor coefficients within each window frame.
• First, the input audio signal is read from a WAV (Waveform Audio Format) file for
processing with the perceptual model.
• Initially, for the first window, the masking threshold is calculated and used to estimate
the predictor coefficients for that frame of signal. The linear prediction filter processes
that portion of the signal using the predictor coefficients.
• Once enough data is available to process, the current window is shifted by the window
size of 25 ms.
• The masking threshold is calculated for the current window of the signal.
• The calculated masking threshold is used to get the linear predictor coefficients for the
current portion of the signal.
• The estimated coefficients are stored in a matrix for each 5 ms processing window
within the current window.
• The pre-filter takes the current sample, the predictor coefficients, and the previous
values stored in a buffer to compute the filter output.
• At every processing window, the predictor coefficients are updated and stored in the
matrix.
• The above steps are repeated over the length of the input signal.
• For each frame of signal, the estimated predictor coefficients are stored in a MAT
(MATLAB) file so the predictive filter at the decoder side has access to them.
• Once the pre-filter output is computed, it is stored in a WAV file.
For the predictor coefficient estimation, the masking threshold spectrum is
represented in the frequency domain; it is defined for positive as well as negative
frequencies. To get a real-valued estimate of the coefficients, the negative frequencies are
reconstructed by mirroring the masking threshold spectrum. The Inverse Fast Fourier Transform
(IFFT) is performed on the power spectrum of the masking threshold to convert from
frequency domain to time domain. While converting masking threshold spectrum to time
domain values, only the real part is used to approximate values. These values are
represented as auto-correlation values for the masking threshold of signal. In this project,
Levinson’s theorem is implemented in MATLAB. It is used to estimate linear predictive
coefficients from auto-correlation values of the masking threshold. These estimated linear
predictive coefficients generate a predictor response according to the masking threshold
of signal.
At the decoder side, the linear predictive filter is used to decode and reconstruct the
signal. In the decoder implementation, the following steps are performed to decode the
signal.
• The processed (coded) audio signal is read from a WAV (Waveform Audio Format)
file.
• The predictor coefficients from the MAT (MATLAB) file are loaded into a matrix.
These coefficients correspond to the masking threshold of the input audio signal as
generated by the psychoacoustic model.
• The initial window of the signal is processed. The initial predictor coefficients are
taken from the matrix to generate the post-filtered signal and the initial data for the next
window of the signal.
• After the initial window is processed, the current window is shifted by the window
size (25 ms) to process the signal.
• Within each current window, the predictor coefficients are extracted from the matrix
every 5 ms and the post-filter is updated with them.
• For each window, the post-filter takes the current sample, the predictor coefficients,
and previous samples to determine the output sample.
• After processing the signal, the post-filter output is computed over the length of the
input signal.
• The reconstructed original signal is stored in an audio (WAV) file.
After performing the post-filter operation, the output audio signal is played and
compared with the original audio signal. The output audio signal quality is the same as
that of the original signal.
Chapter 5
SIMULATION RESULTS
To verify the project design, the input signal is taken from a wave (Waveform
Audio Format) file recorded in a quiet room with no noise effects.
Figure 13. Input signal (plot: amplitude (V) versus time (sec))
Figure 13 shows the input signal at the encoder side. The signal is sampled at 16
KHz and is taken from the ARCTIC database of the CMU (Carnegie Mellon University)
speech group. The audio signal has audible quality.
Figure 14. Pre-filtered signal (plot: amplitude (V) versus time (sec))
Figure 14 shows the output signal from the pre-filter. It contains less information
than the original signal, since that information is removed based on the perceptual model.
The masking threshold of the input signal is extracted by the psychoacoustic model, and
the linear predictor coefficients are estimated from it to generate the filter's frequency
response. The pre-filtered signal is coded according to perceptual criteria.
Figure 15. Reconstructed signal from post-filter (plot: amplitude (V) versus time (sec))
Figure 15. Reconstructed signal from post-filter
Figure 15 shows the reconstructed signal from the post-filter, and it is similar to the
input signal. The post-filter generates a frequency response inverse to that of the
pre-filter to decode the signal, using the same filter coefficients used by the pre-filter for
that particular frame of signal. The predictor coefficients are stored in a MAT file so the
post-filter has access to them to generate a frequency response according to perceptual
criteria and reconstruct the signal.
Figure 16. Spectrogram for masking threshold of input signal (plot: frequency (Hz)
versus time (sec))
Figure 16 shows the masking threshold of the input signal extracted by the
psychoacoustic model. This spectrogram shows the different magnitude levels of the
masking threshold of the signal at each frequency, using color to represent magnitude. It
is extracted for each frame of signal. A red color refers to the highest magnitude level of
the masking threshold, while blue refers to the lowest magnitude level.
Figure 17. Spectrogram for estimated masking threshold (plot: frequency (Hz) versus
time (sec))
Figure 17 shows the estimated masking threshold of the input signal, which can also
be described as the frequency response of the linear predictor. The colors show the
approximation of the masking threshold of the input signal generated from the LPC
coefficients. A red color refers to the highest magnitude level, while blue refers to the
lowest magnitude level of the estimated masking threshold.
Figure 18. Masking threshold and LPC response for frame of signal (plot: magnitude
(dB) versus frequency (Hz); red: extracted masking threshold, blue: estimated magnitude
response)
Figure 18 shows the masking threshold for one of the frames of the audio signal.
The red plot shows the extracted masking threshold, while the blue plot shows the
magnitude response of the linear predictor for that frame of the input signal. From the
plot it is clear that the response closely follows the extracted masking threshold of the
input signal, and the approximation by the LPC coefficients is accurate.
Chapter 6
CONCLUSION
The implementation of an audio coder based on perceptual linear predictive coding
represents another way to code an audio signal. This project combines a perceptual
model with a linear predictive technique to perform audio coding. The project was
successfully implemented in MATLAB. The linear predictor coefficients are obtained
using Levinson's recursion. During coding, information is removed from the audio signal
based on the perceptual model, and irrelevance reduction is achieved. After decoding, the
reconstructed audio signal has the same quality as the original audio signal.
This project provided me with a great opportunity to study and learn the
fundamentals of digital signal processing. During implementation, I used these
fundamentals to get the best results for this project. I have finished all the requirements
for the success of this project. Through developing this project, I have learned the
fundamentals of a psychoacoustic model and how it is combined with a linear predictor
filter to process an audio signal.
6.1 Future Improvements
This project uses a perceptual model that removes irrelevant information from an
audio signal. To achieve further compression, redundancy reduction is also needed. This
design can be extended or combined with appropriate quantization and coding
techniques; it would then be able to reach a lower bit rate while maintaining signal
quality. Such a coding technique can be applied either in the frequency domain or in the
time domain. In this project, the filter coefficients are not transmitted to the decoder side.
To control the post-filter according to perceptual criteria in another way, the coefficients
could be quantized and efficiently coded; after coding, they would be transmitted along
with the coded signal as side information.
APPENDIX
encode.m
% It reads the audio signal from a wave file and codes the input signal according to
% perceptual criteria and a linear predictive filter (pre-filter).
% function [y, A, fs] = encode_file (filename, plotting)
filename = 'cmu_us_arctic_slt_a0001.wav';
plotting = 1;
% if plotting is not set, no plotting should be done
if nargin == 1
plotting = 0;
end
% create figure for masking threshold and estimate filter evolution
if plotting == 2
fig = figure;
end
% reading the signal from the file
[x, fs] = wavread (filename);
%-------------------------------------------------------------------------
% system initialization
% assigning the number of point over which the FFT is computed
FFTlength = 2048;
% set the LPC analysis order
p = 10;
% computing the processing step, window size and frequency values (in Hz
% and Barks)
ps = round (fs * 0.005);    % 5 ms
ws = round (fs * 0.025);    % 25 ms
N = length (x);    % the length of the signal in samples
f = fs/FFTlength:fs/FFTlength:fs/2;
b = hz2bark (f);
% initializing the filter coefficients matrix
A = zeros (floor ((N-ws)/ps)+1, p+1);
% if plotting is enabled, initialize the filter response and MT
% spectrograms
if plotting
H = [];
TH = [];
end
%-------------------------------------------------------------------------
% estimating initial values for the pre-filter
% selecting the first window from the signal
crtW = x (1:ws);
% extracting the masking threshold
th = findThreshold (crtW, FFTlength, f);
% get an estimate filter for the extracted threshold
a = getMTest (th, p, f);
% save the filter coefficients into matrix A
A (1, :) = a;
% if plotting is enabled, save the filter response and the MT
if plotting
H = [H 10*log10 (abs (freqz (1, a, FFTlength/2)))];
TH = [TH; th];
end
%-------------------------------------------------------------------------
% initialize the channel output
y = zeros (size (x));
%-------------------------------------------------------------------------
% start processing the signal
% initialize the filter delay chains
prebuff = zeros (p+1, 1);
% first process and transmit the initial frame
for i=1:ws
% computing the prefilter output
[y (i), prebuff] = prefilter (x (i), prebuff, a);
crtW = [crtW (2:ws); x (i)];
end
% sufficient new data has been gathered and now the normal processing steps
% can be taken
lastUpdated = 1;
crtW = [crtW (ps+1:ws); x (i+1:i+ps)];
th = findThreshold (crtW, FFTlength, f);
a = getMTest (th, p, f);
A (floor ((i-ws)/ps)+1, :) = a;
% if plotting is enabled, save the filter response and the MT
if plotting
H = [H 10*log10 (abs (freqz (1, a, FFTlength/2)))];
TH = [TH; th];
end
for i=i+1:N-ps
% computing the prefilter output
[y (i), prebuff] = prefilter (x (i), prebuff, a);
% adding new values to the analysis window
% crtW = [crtW (2:ws); x (i)];
% if the coefficients were last updated ps samples ago, it's time to
% compute new coefficients
if lastUpdated == ps
lastUpdated = 0;
crtW = [crtW (ps+1:ws); x (i+1:i+ps)];
% th = get_AMT (crtW, FFTlength, f, b);
th = findThreshold (crtW, FFTlength, f);
a = getMTest (th, p, f);
A (floor ((i-ws)/ps)+2, :) = a;
% if plotting is enabled, save the filter response and the MT
if plotting
h = 10*log10 (abs (freqz (1, a, FFTlength/2)).^2);
H = [H h-mean (h)];
TH = [TH; th-mean (th)];
if plotting == 2
figure (fig);
plot (f, th-mean (th), 'r', f, h-mean (h), 'b');
pause (0.1);
end
end
end
lastUpdated = lastUpdated+1;
end
for i=i+1:N
% computing the prefilter output
[y (i), prebuff] = prefilter (x (i), prebuff, a);
end
time = (1:N)/fs;
wavwrite (y, fs, [filename (1:end-3) 'pre.wav']);
% LPC coefficients are stored for each frame of signal and updated for every processing
% window; this file is used to pass the coefficients to the post-filter.
save ('filter_coeffs.mat', 'A');
if plotting
close all;
figure;
plot (time, x), title ('Original signal'), xlim ([time (1) time (N)]);
figure;
plot (time, y), title ('Pre-filtered signal'), xlim ([time (1) time (N)]);
figure;
imagesc (time, f, TH');
title ('Masking threshold spectrogram');
xlim ([time (1) time (N)]);
ylim ([f (1) f (FFTlength/2)]);
figure;
imagesc (time, f, H);
title ('Masking threshold estimate spectrogram');
xlim ([time (1) time (N)]);
ylim ([f (1) f (FFTlength/2)]);
end
prefilter.m
function [y, buff] = prefilter (x, buff, a)
% determine the filter order
p = length (a) - 1;
% push the current value in the delay chain
buff = [x; buff (1:p)];
% compute the output sample
y = a*buff;
getMTest.m
function [a, e] = getMTest (th, p, f)
th = th - min (th);
th = [th fliplr (th)];
% plot (th);
% pause (1);
acf = real (ifft (power (10, th/10)));
[a e] = levi (acf (1:round (length (acf)/2)), p);
% [b a] = invfreqz (th, f/max(f)*pi, 1, p);
Decode.m
% function y = decode (x, A, fs, plotting)
clear all;
filename = 'cmu_us_arctic_slt_a0001.pre.wav';
[x, fs] = wavread (filename);
load ('filter_coeffs.mat');
% if plotting is not set, no plotting should be done
% if nargin == 3
plotting = 1;
% end
%-------------------------------------------------------------------------
% system initialization
% set the LPC analysis order
p = 10;
% computing the processing step and window size
ps = round (fs * 0.005); % 5 ms
ws = round (fs * 0.025); % 25 ms
N = length (x);    % the length of the signal in samples
%-------------------------------------------------------------------------
% initialize the postfilter output
y = zeros (size (x));
%-------------------------------------------------------------------------
% start processing the signal
% initialize the filter delay chains
postbuff = zeros (p+1, 1);
% get the first filter coefficients
a = A (1,:);
% process the first frame
for i=1:ws
% compute the post-filter output
[y (i), postbuff] = postfilter (x (i), postbuff, a);
end
lastUpdated = 1;
a = A (2, :);
for i=i+1:N-ps
% computing the postfilter output
[y (i), postbuff] = postfilter (x (i), postbuff, a);
% if the coefficients were last updated ps samples ago, it's time to
% compute new coefficients
if lastUpdated == ps
lastUpdated = 0;
a = A (floor ((i-ws)/ps)+2, :);
end
lastUpdated = lastUpdated+1;
end
for i=i+1:N
% computing the postfilter output
[y (i), postbuff] = postfilter (x (i), postbuff, a);
end
time = (1:N)/fs;
if plotting
figure;
plot (time, x), title ('Pre-filtered signal'), xlim ([time (1) time (N)]);
figure;
plot (time, y), title ('Reconstructed signal from post-filter'), xlim ([time (1) time (N)]);
end
wavwrite (y, fs, [filename (1:end-7) 'post.wav']);
levi.m
% It takes the autocorrelation values and predictor order for each frame of signal
% and calculates the LPC coefficients using Levinson's recursion.
function [a, E] = levi(r, p)
% initializing the LP coefficients
a = zeros(1, p+1);
a(1) = 1;
% running the first iteration
k = -r(2) / r(1);
a(2) = k;
E = r(1) * (1-k^2);
% iterating the remaining steps
for i=2 : p
% setting the j index
j = 1:i-1;
% computing the reflection coefficient
k = r(i+1);
k = k + sum(a(j+1) .* r(i-j+1));
k = -k/E;
% saving the previous LP coefficients
ap = a;
% computing the new LP coefficients from 1 to i-1
a(j+1) = ap(j+1) + k * ap(i-j+1);
% saving the reflection coefficient as the i'th LP coef
a(i+1) = k;
% computing the new prediction gain
E = E * (1-k^2);
end
Postfilter.m
% It takes the stored LPC coefficients from the matrix for each frame of signal and
% applies the frequency response inverse to the pre-filter, using previous output
% samples to decode the signal.
function [y, buff] = postfilter (x, buff, a)
% determine the filter order
p = length (a) - 1;
% compute the output sample
y = x - [a (2:end) 0]*buff;
% push the current value in the delay chain
buff = [y; buff (1:p)];
REFERENCES
[1] A. Gersho, “Advances in Speech and Audio Compression.” Proceedings of the IEEE,
Vol. 82, Issue 6, 1994, pp. 900-918.
[2] B. Edler and G. Schuller, "Audio Coding Using a Psychoacoustic Pre- and Post-filter." Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal
Processing, Istanbul, Turkey, 2000, pp. 881–884.
[3] E. Zwicker, and H. Fastl, “Psychoacoustics: Facts and Models.” New York, Springer,
1999. Ch. 2, 4, and 6.
[4] Ted Painter, and Andreas Spanias, “Perceptual Coding of Digital Audio.” Proceedings
of the IEEE, Vol. 88, No. 4, April 2000, pp. 451–512.
[5] Jeremy Bradbury, “Linear Predictive Coding.” December 5, 2000.
[6] B. Edler, C. Faller, and G. Schuller, "Perceptual Audio Coding Using a Time-Varying
Linear Pre- and Post-filter.” Proceedings of AES Symposium, Los Angeles, CA,
September 2000.
[7] Nagarajan and S. Sankar, “Efficient Implementation of Linear Predictive Coding
Algorithms.” Proceedings of the IEEE, Southeastcon 1998, pp. 69-72.
[8] John Makhoul, “Linear Prediction: A Tutorial Review.” Proceedings of the IEEE,
Vol. 63, No. 4, April 1975.
[9] Douglas O’Shaughnessy, “Linear Predictive Coding.” IEEE Potentials, Vol. 7, Issue
1, 1988, pp. 19-32.