
UMI
MICROFILMED 2003
Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.
INFORMATION TO USERS
This manuscript has been reproduced from the microfilm master. UMI films
the text directly from the original or copy submitted. Thus, some thesis and
dissertation copies are in typewriter face, while others may be from any type of
computer printer.
The quality of this reproduction is dependent upon the quality of the
copy submitted. Broken or indistinct print, colored or poor quality illustrations
and photographs, print bleedthrough, substandard margins, and improper
alignment can adversely affect reproduction.
In the unlikely event that the author did not send UMI a complete manuscript
and there are missing pages, these will be noted. Also, if unauthorized
copyright material had to be removed, a note will indicate the deletion.
Oversize materials (e.g., maps, drawings, charts) are reproduced by
sectioning the original, beginning at the upper left-hand corner and continuing
from left to right in equal sections with small overlaps.
ProQuest Information and Learning
300 North Zeeb Road, Ann Arbor, MI 48106-1346 USA
800-521-0600
FAST INTEGER MDCT FOR MPEG/AUDIO CODING
The members of the Committee approve the master's
thesis of Tharakram Krishnan
Soontorn Oraintara
Supervising Professor
Venkat Devarajan
Michael Manry
Copyright © by Tharakram Krishnan 2002
All Rights Reserved
FAST INTEGER MDCT FOR MPEG/AUDIO CODING
by
THARAKRAM KRISHNAN
Presented to the Faculty of the Graduate School of
The University of Texas at Arlington in Partial Fulfillment
of the Requirements
for the Degree of
MASTER OF SCIENCE IN ELECTRICAL ENGINEERING
THE UNIVERSITY OF TEXAS AT ARLINGTON
August 2002
UMI Number: 1410963
UMI Microform 1410963
Copyright 2002 by ProQuest Information and Learning Company.
All rights reserved. This microform edition is protected against
unauthorized copying under Title 17, United States Code.
ProQuest Information and Learning Company
300 North Zeeb Road
P.O. Box 1346
Ann Arbor, MI 48106-1346
ACKNOWLEDGMENTS
I wish to acknowledge the patient encouragement and guidance of Prof. Soontorn
Oraintara, without whom this thesis could not have been completed in time. I would also
like to thank my parents, my sister and my friends, whose constant support and understanding
have helped me weather many difficult times.
May 1, 2002
FAST INTEGER MDCT FOR MPEG/AUDIO CODING
Publication No. ______
Tharakram Krishnan, M.S.
The University of Texas at Arlington, 2002
Supervising Professor: Soontorn Oraintara
The modified discrete cosine transform (MDCT) is a lapped transform used in
transform coding schemes as an analysis/synthesis filter bank based on the concept of
time domain aliasing cancellation. This thesis proposes to implement the MDCT using the
lifting scheme, which yields a lossless implementation with integer coefficients. Because
each lifting step remains exactly invertible even when its output is rounded, the rounding
(quantization) errors introduced inside the integer transform do not prevent perfect
reconstruction, which makes it a lossless scheme.
The new fast algorithm can be easily adopted for the MDCT computation in current
audio coding schemes such as those of the MPEG audio standards. A performance
comparison of the proposed implementation and the current technique will be presented.
The integer MDCT is implemented in a standard MPEG layer III codec, and the performance
of the codec with the new structure will be evaluated.
TABLE OF CONTENTS
ACKNOWLEDGMENTS
ABSTRACT
LIST OF FIGURES
LIST OF TABLES
Chapter
1. INTRODUCTION
  1.1 MPEG/Audio coding
    1.1.1 Features
    1.1.2 Applications
  1.2 The Modified Discrete Cosine Transform
  1.3 Outline
2. MPEG/AUDIO LAYER III AND THE MDCT
  2.1 The MPEG Layer III Encoder
  2.2 The MPEG Layer III Decoder
  2.3 The Modified Discrete Cosine Transform
  2.4 A fast structure for the MDCT
  2.5 The Lifting scheme
3. A FAST, INTEGER STRUCTURE FOR THE MDCT
  3.1 Fast 12-point Integer MDCT structure
  3.2 Fast 36-point Integer MDCT structure
    3.2.1 Orthonormal Factorization of the 9-point DCT-II
  3.3 Accuracy of the forward transform
    3.3.1 Computational Complexity
4. THE INTEGER MDCT IN A STANDARD MPEG LAYER III CODEC
  4.1 Simulation results
5. CONCLUSIONS
BIBLIOGRAPHY
BIOGRAPHICAL INFORMATION
LIST OF FIGURES
Figure
2.1 MPEG layer III encoder functional block diagram
2.2 The four types of windows
2.3 MPEG layer III decoder functional block diagram
2.4 Butterfly implementation of the 12-point MDCT structure
2.5 Butterfly implementation of the 36-point MDCT structure
2.6 Butterfly and Lifting steps
3.1 Fast, integer structure for the 12-point Integer MDCT/IMDCT
3.2 Fast, integer structure for the 36-point Integer MDCT/IMDCT
3.3 Structure for the orthogonal 9-point DCT-II
3.4 An integer structure for the orthogonal 9-point DCT-II
3.5 Plot of mean square error versus Nc
3.6 Plot of Nc versus the minimum number of binary adders
4.1 Block diagram of an integer MDCT based MPEG layer III encoder
4.2 Block diagram of an integer IMDCT based MPEG layer III decoder
4.3 Time plots of the audio signal
4.4 Spectrogram of audio signal
4.5 Spectrogram of audio signal
4.6 Spectrogram of audio signal
4.7 Spectrogram of audio signal
4.8 MSE vs Nc for the integer codec
4.9 MSE vs Nc for the integer encoder/standard decoder
4.10 MSE vs Nc for the standard encoder/integer decoder
LIST OF TABLES
Table
2.1 Butterfly coefficients of the 12-point MDCT shown in figure 2.4
3.1 Lifting coefficients used in the 12-point Integer MDCT, figure 3.1
3.2 Butterfly coefficients of the 9-point DCT-II block shown in figure 3.3
3.3 Lifting coefficients used in the 9-point DCT-II/DST-II block, figure 3.4
CHAPTER 1
INTRODUCTION
Ever since the modern computer came into use, there has been a disparity between
the volume of information and the resources available to store it. Memory resources have
always been limited. Engineers and computer scientists have tried to make the best use of
the available memory, either by improving memory fabrication technology or by compressing
the stored data. Memory fabrication technology today makes it possible to store a million
times more information than the memory devices of twenty years ago. With the advent of a
faster and more accessible internet, the volume and the variety of stored information have
increased many times over. In addition, the exponential growth in processing power has
turned today's computing devices into number-crunching behemoths. This means that we can
run complex data compression algorithms to make the most of memory resources while making
full use of the available processing power. Data compression reduces the memory occupied
by the data. A less obvious advantage of a smaller file is that it occupies less bandwidth
when sent over the internet.
A major share of the information sent over the internet today is in the form of
streaming audio and video. Raw digitized video and audio require very high bandwidth
and are therefore not suited for transmission over networks with limited bandwidth and high
load. The human ear perceives sounds in the range of roughly 20 Hz to 20 kHz, and for most
audio applications frequencies above about 22 kHz are regarded as irrelevant. Sampling at
44.1 kHz, which by the Nyquist criterion captures frequencies up to 22.05 kHz, therefore
allows near-perfect reconstruction of audio signals. At 16 bits per sample this requires a
bit rate of about 706 kilobits per second per channel, or about 1.4 megabits per second
for a stereo signal. By today's standards this is a large amount of bandwidth, making it
expensive to send even moderately large raw audio files over the internet.
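The bandwidth figures quoted above follow directly from the sampling parameters. A quick check in Python (the 128 kbit/s target used for the compression ratio is an illustrative value, not one given in the text):

```python
def pcm_bitrate(sample_rate_hz, bits_per_sample, channels):
    """Bit rate of uncompressed PCM audio, in bits per second."""
    return sample_rate_hz * bits_per_sample * channels

mono = pcm_bitrate(44_100, 16, 1)     # 705,600 bit/s
stereo = pcm_bitrate(44_100, 16, 2)   # 1,411,200 bit/s

# Compression ratio needed to carry the stereo stream at an
# illustrative 128 kbit/s (an assumption, not a value from the text).
ratio = stereo / 128_000
print(f"mono: {mono / 1e3:.1f} kbit/s, stereo: {stereo / 1e6:.3f} Mbit/s, ratio: {ratio:.2f}")
```

The same arithmetic explains why perceptual coders aim for compression ratios of roughly an order of magnitude at common bitrates.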
Fortunately, many advanced audio coding schemes have made it easier to compress,
store and transfer audio files. Among them are AAC (Advanced Audio Coding) and the MPEG
(Moving Picture Experts Group) audio coding standards. In this thesis we concern ourselves
with layer III of the MPEG/audio coding scheme, specifically with an integer approximation
of the modified discrete cosine transform (MDCT) filterbank, which forms its core.
1.1 MPEG/Audio coding
The need for an efficient, quality-preserving data compression algorithm led the
International Organization for Standardization (ISO) to develop a standard for compressing
digital video and audio. The Moving Picture Experts Group (MPEG) was established to meet
this need, and in November 1992 the first standard, called MPEG-1, was proposed. Later, in
November 1994, an extension of the MPEG-1 standard called MPEG-2 was developed.
MPEG/audio addresses the compression of synchronized audio and video at 1.5 megabits per
second [1]. Unlike speech coders, which model the vocal tract and are tuned specifically to
speech, the MPEG coder achieves its data reduction by exploiting the limitations of the
human ear. The compression results from removing the perceptually redundant parts of the
audio signal. Since the distortions introduced are inaudible to the human ear, the MPEG
coder can compress any signal meant to be perceived by the human auditory system.
1.1.1 Features
MPEG/Audio offers a diverse assortment of compression modes and a number of useful
features such as fast forwarding, audio reversing and random access. The sampling rate of
the audio stream can be set to 32, 44.1 or 48 kHz. The compressed bitstream can support
one or two audio channels in four possible modes [1]:
1. a monophonic mode for a single audio channel,
2. a dual-monophonic mode for two independent channels,
3. a stereo mode for stereo channels that share bits,
4. a joint stereo mode that takes advantage of either the correlations between the stereo
channels or the irrelevancy of the phase difference between channels.
There are several predefined bitrates, ranging from 32 to 224 kilobits per second per
channel. Compression ratios range from 2.7 to 24. A “free” bitrate mode is also provided to
support bitrates other than the predefined ones. MPEG/audio offers a choice of three layers
of compression. Together, the three layers provide a range of tradeoffs between codec
complexity and compression ratio [1].
Layer I, the simplest, is suited for bitrates of 128 kbps and upward. Layer II, which is
more complex than layer I, is suited for bitrates around 128 kbps. Layer III is the most
complex but offers the highest fidelity. All three layers are simple enough to be
implemented on a single chip.
1.1.2 Applications
Based on the quality of service offered, each layer has its own applications. Layer
I is used in Philips' Digital Compact Cassette (DCC) at 192 kbit/s per channel. Applications
of layer II include the coding of audio for Digital Audio Broadcasting (DAB) and the storage
of full-motion, synchronized audio and video on CD-ROM, more popularly called Video CD.
Layer III, which offers bitrates of about 64 kbps per channel, suits audio transmission over
ISDN. With the advent of cheap and powerful processors, coupled with the high bandwidth
offered by cable and DSL internet service providers, MPEG layer III compression has become
the favorite format for exchanging digital audio over the internet. The layer III algorithm,
though complex compared to layers I and II, is simple enough to be implemented on a chip.
This has brought many small, portable and inexpensive MP3 players into vogue.
1.2 The Modified Discrete Cosine Transform
The modified discrete cosine transform (MDCT) has been employed in transform
coding schemes as the analysis/synthesis filterbank based on time domain aliasing
cancellation [2]. The MDCT is used in layer III of MPEG-1 and MPEG-2 to provide better
spectral resolution by subdividing in frequency the subband outputs of the preceding
polyphase filterbank. The MDCT can be viewed as a lapped transform whose number of inputs
N is twice its number of outputs [3]. The MPEG audio standards use two sizes of MDCT:
6 x 12 (N = 12) and 18 x 36 (N = 36).
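To make the "N inputs, N/2 outputs" relationship concrete, the following Python sketch evaluates the MDCT directly from its cosine kernel (the same kernel as equation (2.3) in section 2.3) and checks time domain aliasing cancellation: each block of N windowed samples is transformed to N/2 coefficients, inverted, windowed again and overlap-added with 50 percent overlap. The sine window equals the MPEG long window for N = 36. The 4/N scale on the inverse is an assumption of this sketch, chosen so that the overlap-added output equals the input at unit gain. This is an O(N^2) reference, not a fast algorithm, and the function names are ours.

```python
import math

def _kernel(n, k, N):
    # cos(pi (2n + 1 + N/2)(2k + 1) / (2N)), the MDCT basis function
    return math.cos(math.pi * (2 * n + 1 + N / 2) * (2 * k + 1) / (2 * N))

def mdct(block):
    """N windowed inputs -> N/2 coefficients."""
    N = len(block)
    return [sum(block[n] * _kernel(n, k, N) for n in range(N)) for k in range(N // 2)]

def imdct(coeffs):
    """N/2 coefficients -> N time samples (time-aliased), scaled by 4/N (assumed)."""
    N = 2 * len(coeffs)
    return [4.0 / N * sum(coeffs[k] * _kernel(n, k, N) for k in range(N // 2))
            for n in range(N)]

def sine_window(N):
    return [math.sin(math.pi * (n + 0.5) / N) for n in range(N)]

def tdac_roundtrip(x, N):
    """Window, MDCT, IMDCT, window again and overlap-add with hop N/2."""
    hop, w = N // 2, sine_window(N)
    y = [0.0] * len(x)
    for start in range(0, len(x) - N + 1, hop):
        frame = [w[n] * x[start + n] for n in range(N)]
        rec = imdct(mdct(frame))
        for n in range(N):
            y[start + n] += w[n] * rec[n]
    return y
```

Only samples covered by two overlapping blocks are reconstructed; the first and last N/2 samples lack an overlapping partner, so any error check should skip them.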
Recently, new fast algorithms for the forward and inverse MDCT computation have
been proposed [4]. They are based on fast DCT-II/DST-II algorithms and their inverses, which
have real (irrational) coefficients. The algorithms can be used to compute the MDCT for data
sequences whose length N is divisible by 4. Despite their efficiency, their internal
operations require real-valued multiplications, which are undesirable in battery-powered
applications such as mobile multimedia communications. In practice, these transforms are
often approximated using fixed-point arithmetic. However, this type of implementation does
not preserve the invertibility of the transform.
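The loss of invertibility is already visible in the smallest case, a single plane rotation, which is the basic building block of fast DCT/MDCT structures. In the Python sketch below (function names hypothetical; the angle π/4 is illustrative), the rotation is computed with real coefficients and its outputs are rounded to integers, as a fixed-point implementation would do:

```python
import math

def _round(v):
    # Round half up, so the example does not depend on Python's round-half-even.
    return math.floor(v + 0.5)

def fixed_point_rotate(x, y, theta):
    """Rotate an integer pair with real coefficients, then round to integers."""
    c, s = math.cos(theta), math.sin(theta)
    return _round(c * x - s * y), _round(s * x + c * y)

def fixed_point_unrotate(u, v, theta):
    """Apply the exact inverse rotation (the transpose), then round."""
    c, s = math.cos(theta), math.sin(theta)
    return _round(c * u + s * v), _round(-s * u + c * v)

theta = math.pi / 4
# Two different inputs collide after rounding, so no inverse can recover both.
print(fixed_point_rotate(1, 0, theta), fixed_point_rotate(2, 0, theta))  # (1, 1) (1, 1)
print(fixed_point_unrotate(1, 1, theta))                                 # (1, 0)
```

Because (1, 0) and (2, 0) map to the same output, the rounded transform is not one-to-one, and information is lost before any quantization of the coefficients takes place.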
1.3 Outline
As mentioned previously, layer III is the most complex of the three layers.
In this thesis we concentrate on improving the efficiency of one of the main components
of the layer III codec, the Modified Discrete Cosine Transform (MDCT) filterbank. We
are interested in approximating the MDCT using integer or dyadic coefficients while
maintaining its reversibility. The lifting scheme [5] is used to implement the orthogonal
matrices. This technique has been used to approximate other orthogonal transforms such as
the DCT [6, 7] and the DFT [8]. It has also been employed in some classes of lapped
transforms with symmetric and anti-symmetric basis functions [9]. It should be noted that
the basis functions of the MDCT are neither symmetric nor anti-symmetric, so the MDCT
cannot use the existing structures. We will show that a lossless implementation of the MDCT
with integer coefficients is possible using liftings. The sizes of the MDCT being considered
are also problematic, since they require 3-point and 9-point forward and inverse
DCT-II/DST-II. Although some fast algorithms for computing radix-3 DCT and DST have been
presented in the literature [10, 11], they are not based on orthogonal operations, which
makes it difficult to control the dynamic range of the internal nodes of the transforms. In
this thesis, a novel factorization of these transforms, suitable for the integer
implementation of the MDCT, is presented. The accuracy of the integer transform, when
compared to the original fast transform, will be analyzed.
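The key property of the lifting scheme can be shown on a single plane rotation. The standard factorization

$$\begin{pmatrix}\cos\theta & -\sin\theta\\ \sin\theta & \cos\theta\end{pmatrix} = \begin{pmatrix}1 & p\\ 0 & 1\end{pmatrix}\begin{pmatrix}1 & 0\\ \sin\theta & 1\end{pmatrix}\begin{pmatrix}1 & p\\ 0 & 1\end{pmatrix}, \qquad p = \frac{\cos\theta - 1}{\sin\theta} = -\tan\frac{\theta}{2}$$

writes the rotation as three lifting steps. Rounding the output of each step keeps integers mapped to integers, and each step can be undone exactly by subtracting the same rounded quantity. The Python sketch below is a generic illustration of this idea (function names hypothetical), not the specific MDCT factorization or lifting coefficients developed in this thesis.

```python
import math

def _round(v):
    return math.floor(v + 0.5)

def lifting_rotate(x, y, theta):
    """Integer-to-integer approximation of [[cos, -sin], [sin, cos]]
    built from three rounded lifting steps."""
    p = -math.tan(theta / 2)      # equals (cos(theta) - 1) / sin(theta)
    s = math.sin(theta)
    x = x + _round(p * y)         # lifting step 1: update x from y
    y = y + _round(s * x)         # lifting step 2: update y from x
    x = x + _round(p * y)         # lifting step 3: update x from y
    return x, y

def lifting_unrotate(x, y, theta):
    """Exact inverse: undo the steps in reverse order, subtracting the
    same rounded quantities that the forward pass added."""
    p = -math.tan(theta / 2)
    s = math.sin(theta)
    x = x - _round(p * y)
    y = y - _round(s * x)
    x = x - _round(p * y)
    return x, y
```

Unlike the rounded rotation, this map is a bijection on integer pairs, while its output stays within a couple of units of the exact rotation for angles up to π/2.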
The integer transform is incorporated into a standard MPEG layer III encoder
and decoder. Based on subjective tests, the performance of the new codec is compared
with that of the standard MPEG layer III codec.
In chapter 2, a detailed description of MPEG layer III and its components, a
comparison with the other two layers, and an analysis of the role and operation of the
MDCT are presented. The mathematics behind the MDCT, certain properties of the MDCT and
some previous work on its fast implementation are also described. The lifting scheme,
which is the technique used in the integer implementation of the MDCT, is briefly
described.
Chapter 3 deals with the new integer structure for the MDCT. The orthogonal
factorizations of the 3-point and 9-point discrete cosine transform of type II (DCT-II)
are also described. Results of simulations to determine the accuracy of the integer MDCT
are presented.
The performance of the integer MDCT when incorporated in a commercial MPEG/Audio
codec is discussed in chapter 4.
Chapter 5 concludes and summarizes the contributions of the thesis.
CHAPTER 2
MPEG/AUDIO LAYER III AND THE MDCT
This chapter briefly describes the different algorithms of the MPEG layer III codec.
The MDCT and its role in the layer III standard are discussed in detail. A fast MDCT
structure and the lifting scheme, which is used in integer implementations of transforms,
are also presented.
2.1 The MPEG Layer III Encoder
In this section a brief functional description of the MPEG/Audio Layer III encoder
is presented. The block diagram of the encoder is shown in figure 2.1. Here the audio
signal is a single channel Pulse Code Modulated (PCM) signal sampled at 44.1 kHz and
quantized to 16 bits.
Analysis polyphase filterbank: The first block in the encoder is the polyphase
filterbank, which coarsely divides the samples in frequency into 32 equally spaced
subbands. A frame of 1,152 PCM samples is filtered by a filterbank of 32 equal-width
subbands, each about 689 Hz wide (22.05 kHz / 32) at a 44.1 kHz sampling rate. This is
followed by decimation by a factor of 32, so each subband contains 36 samples. The aliasing
introduced by the filterbank is cancelled in the decoder to achieve near-perfect
reconstruction.
Fast Fourier Transform: Two fast Fourier transforms are computed in parallel with
the polyphase filterbank. A 256-point and a 1,024-point FFT are performed to provide the
psychoacoustic model with high spectral resolution.
Figure 2.1. MPEG layer III encoder functional block diagram. [Diagram blocks: audio signal
in PCM format; Analysis Polyphase Filter Bank; MDCT; FFT; Psychoacoustic Model; Nonuniform
Quantization; Huffman Encoding; Coding of Side Information; Bitstream Formatting and CRC
word generation; Ancillary Data; output: Coded Audio Signal.]
Modified Discrete Cosine Transform: The MDCT enhances the spectral resolution of the
36 samples obtained from each of the 32 subbands by mapping them to 36 frequency lines per
subband. As a result, the 1,152 PCM samples are spectrally resolved into 1,152 frequency
lines. Before the transform is applied, the subband samples are windowed. MPEG layer III
stipulates four kinds of windows. The long window, of length 36, is applied when the
samples within a subband exhibit stationary, slowly changing behavior. The short window is
applied when the samples in the subband exhibit non-stationary or transient behavior. Two
other windows, termed “start” and “stop” windows, handle the transitions between the long
and short windows. These windows provide better frequency resolution in the lower
frequencies without sacrificing time resolution for the higher frequencies. The different
windows, and the sequence in which they are applied to a subband, are illustrated in
figure 2.2. The output of the MDCT consists of 18 frequency samples, obtained by applying
the transform either to the long window or to three overlapping short windows. The decision
on which window to apply depends on the psychoacoustic model. Performing the MDCT on the
long windows produces 18 frequency lines with 50 percent overlap. The short windows produce
3 groups of 6 frequencies, each group belonging to a different time interval. Thus applying
the MDCT once to the 32 subbands produces 576 samples in frequency, and with 50 percent
overlap the MDCT produces 1,152 frequency points when transforming the subbands.
Because the MDCT processing of a subband has good frequency resolution, it has poor
time resolution. The MDCT operates on 12 or 36 polyphase filter samples, so because of the
decimation by 32 the effective time window spans 384 or 1,152 input audio samples.
Quantization errors in the MDCT values spread over this long time window and are therefore
more likely to manifest as audible distortion in the form of pre-echo. The psychoacoustic
model of layer III incorporates several measures to reduce pre-echo. The MDCT is dealt with
in greater detail in the next section.
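The frame, subband and frequency-line counts quoted in this section are consistent with one another. The following Python lines reproduce them for the 44.1 kHz case assumed here (the variable names are ours):

```python
FRAME = 1152          # PCM samples per layer III frame
SUBBANDS = 32         # polyphase filterbank channels (decimation factor 32)
FS = 44_100           # sampling rate assumed in this section, Hz

samples_per_subband = FRAME // SUBBANDS         # 36 samples per subband per frame
subband_width_hz = (FS / 2) / SUBBANDS          # about 689 Hz
long_lines = 36 // 2                            # long-block MDCT: 36 in, 18 out
short_lines = 12 // 2                           # short-block MDCT: 12 in, 6 out
lines_per_mdct_pass = SUBBANDS * long_lines     # 576 lines for one pass over all subbands
short_block_lines = 3 * short_lines             # three short MDCTs give 3 groups of 6
span_long = 36 * SUBBANDS                       # long window spans 1152 input samples
span_short = 12 * SUBBANDS                      # short window spans 384 input samples
```

Two overlapping MDCT passes per frame give 2 x 576 = 1,152 lines, matching the 1,152 PCM samples, and a short-block MDCT yields the same 18 lines per subband as a long one.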
Alias reduction: Aliasing introduced in the analysis polyphase filterbank is removed by
means of a series of butterfly computations. This is done to reduce the amount of data to
be transmitted.
Figure 2.2. The plots of the (a) short window, (b) long window, (c) start (long to short
transition) window and (d) stop (short to long transition) window.
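Each alias-reduction butterfly is a plane rotation between a line near the top of one subband and its mirror image at the bottom of the next, so it can be undone exactly with the transposed rotation. The sketch below assumes the eight coefficients c_i as tabulated in ISO/IEC 11172-3 (please verify them against the standard), 576 MDCT lines arranged as 32 subbands of 18, and hypothetical function names; which of the two rotations is labeled encoder-side is also an assumption.

```python
import math

# Alias-reduction coefficients c_i (as tabulated in ISO/IEC 11172-3; verify).
C = [-0.6, -0.535, -0.33, -0.185, -0.095, -0.041, -0.0142, -0.0037]
CS = [1.0 / math.sqrt(1.0 + c * c) for c in C]   # cosine-like factor
CA = [c / math.sqrt(1.0 + c * c) for c in C]     # sine-like factor

def _butterflies(lines, sign):
    """Apply 8 butterflies at each of the 31 subband boundaries.

    sign=+1 and sign=-1 give mutually inverse (transposed) rotations.
    """
    out = list(lines)
    for sb in range(1, 32):
        for i in range(8):
            lo = 18 * sb - 1 - i      # mirrored line at the top of the lower subband
            hi = 18 * sb + i          # line at the bottom of the upper subband
            bu, bd = out[lo], out[hi]
            out[lo] = bu * CS[i] + sign * bd * CA[i]
            out[hi] = bd * CS[i] - sign * bu * CA[i]
    return out

def alias_reduce(lines):
    """Encoder-side butterflies (sign convention assumed)."""
    return _butterflies(lines, +1)

def alias_restore(lines):
    """Decoder-side butterflies that reintroduce the aliasing."""
    return _butterflies(lines, -1)
```

Because CS[i]^2 + CA[i]^2 = 1, each butterfly preserves energy, and the decoder's pass restores the encoder's input up to rounding.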
Psychoacoustic modelling: The psychoacoustic model is a model of the human ear's
auditory perception. It is used only in the encoder, to decide which parts of the signal can
be heard by the human ear and which parts cannot. The psychoacoustic model decides which
window type to apply before the MDCT is applied to the subband samples. This decision is
based on the differences between the FFT spectrum of the present samples and that of the
previous samples. If the signal is stationary, there will not be much difference between the
FFT spectrum calculated at the present instant and the one calculated previously. If it is a
fast-changing transient signal, there will be significant differences between the current
and the previous FFT spectra. The psychoacoustic model will then call for the application of
the long-to-short transition window followed by the short window.
The psychoacoustic model also supplies the nonuniform quantization block with
information on how the frequencies obtained from the MDCT block are to be quantized. The
quantization of the spectral lines is adapted to the limitations of the human ear's
perception of audio signals. The human ear has 24 critical bands, within each of which its
frequency selectivity is limited. Masking is a phenomenon in which a weak signal is made
inaudible by a simultaneously occurring strong signal. Masking occurs in a critical band
when a dominant tonal component is present, so that the other frequencies in the band are
not perceived properly. The dominant component introduces a masking threshold below which
frequencies in the same band are masked out. This allows coarse quantization of the masked
frequencies within the band without perceivable distortion. Whenever a dominant tone is
present, the masking threshold is calculated. Based on this threshold, an upper limit on the
quantization noise allowed in each scalefactor band is determined. The layer III standard
specifies two different psychoacoustic models that can be used with the encoder.
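The window-switching decision described above can be sketched as a comparison of successive FFT power spectra. This is an illustration under stated assumptions: the function names, the relative-difference measure and the threshold of 0.5 are hypothetical, and the psychoacoustic models in the standard use a more elaborate criterion.

```python
import cmath
import math

def power_spectrum(frame):
    """|DFT|^2 of one frame, computed directly (O(n^2), fine for a sketch)."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))) ** 2
            for k in range(n // 2 + 1)]

def spectral_change(prev, cur, eps=1e-12):
    """Relative L1 change between two power spectra."""
    diff = sum(abs(a - b) for a, b in zip(cur, prev))
    return diff / (sum(prev) + eps)

def choose_block_type(prev_frame, cur_frame, threshold=0.5):
    """Hypothetical rule: switch to short blocks on a large spectral change."""
    change = spectral_change(power_spectrum(prev_frame), power_spectrum(cur_frame))
    return "short" if change > threshold else "long"
```

For a steady tone, successive 256-sample spectra are nearly identical and long blocks are kept; a sudden onset after silence produces a large change and triggers short blocks.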
Nonuniform quantization: The nonlinear quantization of the spectral lines is
performed in this block. Nonlinearity is introduced by first raising the magnitude of each
sample to the power of 3/4. In order to reduce quantization noise, the spectral coefficients
in each scalefactor band are scaled prior to the nonuniform quantization. The output
therefore consists of the quantized spectral lines and the scalefactors for each band. The
grouping and scaling of frequencies into scalefactor bands is performed in accordance with
the psychoacoustic model.
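The 3/4 power law can be sketched as follows. This is a simplified illustration, not the quantizer of the standard: a single hypothetical step size stands in for the global gain and scalefactors, and the small rounding offset used by practical encoders is omitted. The inverse raises the integer back to the power 4/3, as a decoder would.

```python
import math

def quantize(x, step):
    """Power-law quantizer: (|x| / step) ** (3/4), rounded to an integer, sign kept."""
    q = round((abs(x) / step) ** 0.75)
    return int(math.copysign(q, x))

def dequantize(q, step):
    """Inverse mapping: |q| ** (4/3) * step, with the sign restored."""
    return math.copysign(abs(q) ** (4.0 / 3.0) * step, q)
```

Because the reconstruction levels grow as q^(4/3), large spectral values are quantized with coarser absolute steps than small ones, which keeps the relative error roughly uniform across amplitudes.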
Huffman encoding and coding of side information: The scaled frequency lines are
Huffman coded using 32 static Huffman tables. It is here that the majority of the data
reduction takes place. In order for the decoder to reproduce the audio signal successfully,
all the parameters used and generated by the encoder must be provided to it. This side
information consists of the boundaries of certain data blocks, the quantizer step sizes, etc.
Bitstream formatting, CRC word generation and ancillary data: The Huffman-coded
spectral lines, the side information and a frame header are assembled to form the
bitstream. The bitstream is partitioned into frames, each representing 1,152 PCM samples.
An optional CRC (Cyclic Redundancy Check) word can be included for data validation.
Ancillary data, which is optional, is used for features such as the artist's name, the album
or the music category.
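The optional CRC is a 16-bit cyclic redundancy check. The sketch below assumes the parameters we understand the MPEG-1 audio check to use: generator polynomial x^16 + x^15 + x^2 + 1 (0x8005), a register initialized to all ones, and bits processed most significant first. The function name is ours, and the selection of protected header and side-information bits is not modeled.

```python
def crc16_mpeg_audio(data: bytes, crc: int = 0xFFFF) -> int:
    """Bitwise CRC-16, generator 0x8005, MSB first, no final XOR (assumed)."""
    for byte in data:
        crc ^= byte << 8                     # bring the next byte into the top of the register
        for _ in range(8):
            if crc & 0x8000:                 # top bit set: shift and subtract the generator
                crc = ((crc << 1) ^ 0x8005) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc
```

A receiver can validate a frame by recomputing the CRC over the same bits and comparing it with the transmitted word; any single-bit error changes the result.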
Stereo encoding: The encoder model presented so far is applicable only to single-channel
audio. Dual-channel or stereo audio is encoded by time-sharing the model described above.
This does not introduce extra complexity because the two channels are encoded
independently. The two stereo redundancy modes supported by layer III are Middle/Side (MS)
stereo and intensity stereo. The first method transmits the sum and the difference of the
two channels; the two new channels are then transmitted as described for a dual channel.
Intensity stereo requires only one channel and transmits the sum of the two audio signals.
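Middle/side stereo can be sketched as a sum/difference transform. The 1/√2 scaling is an assumption of this sketch that keeps the transform orthonormal, so that the same formula applied to (M, S) recovers (L, R); the text itself only specifies sum and difference, and the function names are hypothetical.

```python
import math

R2 = 1.0 / math.sqrt(2.0)   # orthonormal scaling (assumed)

def ms_encode(left, right):
    """Middle = scaled sum, side = scaled difference of the two channels."""
    mid = [(l + r) * R2 for l, r in zip(left, right)]
    side = [(l - r) * R2 for l, r in zip(left, right)]
    return mid, side

def ms_decode(mid, side):
    """Inverse of ms_encode: recover left and right."""
    left = [(m + s) * R2 for m, s in zip(mid, side)]
    right = [(m - s) * R2 for m, s in zip(mid, side)]
    return left, right
```

When the channels are nearly identical, the side signal is close to zero and can be coded with very few bits, which is where the saving comes from.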
2.2 The MPEG Layer III Decoder
A functional description of the MPEG layer III decoder is presented in this section.
Figure 2.3 shows the block diagram of the layer III decoder. The decoder does not have
the psychoacoustic model block. As the figure shows, the blocks function as inverses of the
ones presented in the encoder. The integrity of the bitstream is verified based on the CRC
data and the header information, and the ancillary data is decoded.
The Huffman decoding block decodes the Huffman-coded bits. The codewords in the
Huffman-coded bits are not separated by any markers, so a codeword in the middle of the
bitstream cannot be identified without starting from the initial codeword. The Huffman info
decoding block is a controller of the Huffman decoding block; all parameters required for
the correct decoding of the Huffman-coded bits are set up by this block. The scalefactor
decoding block decodes the coded scalefactors. The decoded scalefactors are used in
descaling. Descaling establishes a perceptually identical copy of the frequency lines
generated by the MDCT block in the encoder. The descaled frequency lines are then
reordered. The stereo processing block converts the encoded stereo signal into left and
right channels. As mentioned previously, alias reduction was applied to the MDCT output in
the encoder. In order to reconstruct the audio signal correctly in the following blocks, the
aliasing artifacts have to be reintroduced. This is done by eight butterfly calculations for
each subband.
The frequency lines from the alias reduction block are mapped onto polyphase filter
samples in the IMDCT block. The expression for the IMDCT is presented in the next
section in equation (2.4). Depending on the block type, the output of the IMDCT block
is multiplied by one of the following windows:
Block Type = 0 (long block)
$$w_i = \sin\left(\frac{\pi}{36}\left(i + \frac{1}{2}\right)\right), \qquad i = 0, \ldots, 35 \tag{2.1}$$
[Figure 2.3. MPEG layer III decoder functional block diagram: bitstream synchronization and error checking, Huffman decoding, Huffman info decoding, scalefactor decoding, descaling, reordering and joint stereo decoding, followed in each (left and right) channel by alias reduction, IMDCT, frequency inversion and the synthesis polyphase filterbank, which outputs PCM samples.]
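Block Type = 1 (long to short transition)

$$ w_i = \begin{cases} \sin\left(\frac{\pi}{36}\left(i + \frac{1}{2}\right)\right), & i = 0, \ldots, 17 \\ 1, & i = 18, \ldots, 23 \\ \sin\left(\frac{\pi}{12}\left(i - 18 + \frac{1}{2}\right)\right), & i = 24, \ldots, 29 \\ 0, & i = 30, \ldots, 35 \end{cases} $$

Block Type = 2 (Short Block)

$$ w_i = \sin\left(\frac{\pi}{12}\left(i + \frac{1}{2}\right)\right), \qquad i = 0, \ldots, 11 \tag{2.2} $$

Block Type = 3 (short to long transition)

$$ w_i = \begin{cases} 0, & i = 0, \ldots, 5 \\ \sin\left(\frac{\pi}{12}\left(i - 6 + \frac{1}{2}\right)\right), & i = 6, \ldots, 11 \\ 1, & i = 12, \ldots, 17 \\ \sin\left(\frac{\pi}{36}\left(i + \frac{1}{2}\right)\right), & i = 18, \ldots, 35 \end{cases} $$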
When the short block window is applied, the windowed short blocks are overlapped and
concatenated.
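As an illustration, the four window shapes can be tabulated in a few lines. This is a minimal sketch that simply evaluates the block type 0 to 3 definitions above; the function name `mp3_window` is hypothetical and is not taken from any particular codec.

```python
import math

def mp3_window(block_type):
    """Window w_i for MPEG layer III block types 0-3.

    Types 0, 1 and 3 give 36 values; type 2 gives the 12-point window that is
    applied to each of the three short blocks.
    """
    def long_w(i):
        return math.sin(math.pi / 36 * (i + 0.5))

    def short_w(i):
        return math.sin(math.pi / 12 * (i + 0.5))

    if block_type == 0:   # long block
        return [long_w(i) for i in range(36)]
    if block_type == 1:   # long-to-short transition
        return ([long_w(i) for i in range(18)] + [1.0] * 6
                + [short_w(i - 18) for i in range(24, 30)] + [0.0] * 6)
    if block_type == 2:   # short block
        return [short_w(i) for i in range(12)]
    if block_type == 3:   # short-to-long transition
        return ([0.0] * 6 + [short_w(i - 6) for i in range(6, 12)]
                + [1.0] * 6 + [long_w(i) for i in range(18, 36)])
    raise ValueError("block_type must be 0, 1, 2 or 3")
```

The transition windows reuse the rising half of the long window and the falling (type 1) or rising (type 3) half of the short window, so consecutive long and short blocks still overlap smoothly.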
In order to compensate for the frequency inversion in the synthesis filterbank, the odd time samples of every odd subband are multiplied by -1. Each time one sample from each of the 32 subbands has been calculated, the 32 samples are applied to the synthesis filterbank and 32 consecutive audio samples are computed.
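A minimal sketch of the frequency inversion step, assuming the IMDCT output of one granule is held as a 32 × 18 array indexed as `[subband][time]`; this layout and the function name are illustrative assumptions, not part of the original text.

```python
def frequency_inversion(samples):
    """Return a copy of samples[subband][time] with the odd time samples of
    every odd subband negated, compensating for the frequency inversion of
    the synthesis polyphase filterbank."""
    out = [list(row) for row in samples]
    for sb in range(1, len(out), 2):          # odd subbands
        for t in range(1, len(out[sb]), 2):   # odd time samples
            out[sb][t] = -out[sb][t]
    return out
```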
2.3 The Modified Discrete Cosine Transform
The Modified Discrete Cosine Transform is an orthogonal lapped transform used in Layer III of MPEG/Audio to provide better spectral resolution by subdividing in frequency the sub-band outputs obtained from the multirate filterbank. The MDCT and its inverse (IMDCT) for a windowed input sequence are defined as [2]:
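$$ C_k = \sum_{n=0}^{N-1} x_n \cos\left[\frac{\pi\left(2n + 1 + \frac{N}{2}\right)(2k+1)}{2N}\right] \tag{2.3} $$

$$ \hat{x}_n = \frac{2}{N} \sum_{k=0}^{N/2-1} C_k \cos\left[\frac{\pi\left(2n + 1 + \frac{N}{2}\right)(2k+1)}{2N}\right] \tag{2.4} $$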
where $k = 0, 1, \ldots, N/2 - 1$ and $n = 0, 1, \ldots, N - 1$. The MDCT coefficients possess an even antisymmetry property, so that only $N/2$ MDCT coefficients $C_k$ are obtained from $N$ input samples. The input window, of length $N$, is shifted by $N/2$ samples to obtain the next set of coefficients. The transform is therefore critically sampled: every $N/2$ new input samples produce $N/2$ coefficients. The notation $\hat{x}_n$ indicates that the recovered sequence is not the same as the original input vector; it is said to be time aliased. The data vector is reconstructed by overlapping and adding the IMDCT outputs $\hat{x}_n$ of consecutive blocks.
Equations (2.3) and (2.4) apply to a windowed input signal. For perfect reconstruction, the window $h(n)$ must satisfy two constraints [12]:

$$ h(N - 1 - n) = h(n), \qquad n = 0, 1, \ldots, N - 1 \tag{2.5} $$

$$ h^2(n) + h^2\left(n + \frac{N}{2}\right) = 1, \qquad n = 0, 1, \ldots, \frac{N}{2} - 1 \tag{2.6} $$

Two windows that satisfy these constraints are the rectangular window, $h(n) = 1/\sqrt{2}$ for $n = 0, 1, \ldots, N - 1$, and the sine window, given by

$$ h(n) = \sin\left[\frac{\pi}{N}\left(n + \frac{1}{2}\right)\right], \qquad n = 0, 1, \ldots, N - 1. \tag{2.7} $$
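The window is applied twice, once before the forward transform and once again after the inverse transform, so each sample is effectively weighted by $h^2(n)$, which is the quantity constrained by equation (2.6). Writing $Z_k$ for the MDCT coefficient of equation (2.3) evaluated at an arbitrary integer index $k$, and substituting $N - k - 1$ for $k$ in equation (2.3), we get

$$ Z_{N-k-1} = -\sum_{n=0}^{N-1} x_n \cos\left[\frac{\pi\left(2n + 1 + \frac{N}{2}\right)(2k+1)}{2N}\right] \tag{2.8} $$

where $n = 0, 1, \ldots, N - 1$ and $k = 0, 1, \ldots, N/2 - 1$. The sign change holds because $N$ is a multiple of 4 for the transform sizes considered here (12 and 36), so $\pi(2n + 1 + N/2)$ is an odd multiple of $\pi$. The MDCT therefore exhibits the even antisymmetry property

$$ Z_{N-k-1} = -Z_k, \qquad k = 0, 1, \ldots, \frac{N}{2} - 1. \tag{2.9} $$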
2.4 A fast structure for the MDCT

According to equation (2.3), the MDCT can be written in the form

$$ Z_k = \sum_{n=0}^{N-1} x_n \cos\left[\frac{\pi(2n+1)(2k+1)}{2N} + \frac{\pi}{4}(2k+1)\right], \qquad k = 0, 1, \ldots, \frac{N}{2} - 1. \tag{2.10} $$
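Because of the even antisymmetry property, only the even-indexed coefficients need to be computed directly [13]; the odd-indexed ones follow from equation (2.9). Therefore

$$ Z_{2k} = \sum_{n=0}^{N-1} x_n \cos\left[\frac{\pi(2n+1)(4k+1)}{2N} + \frac{\pi}{4}(4k+1)\right], \qquad k = 0, 1, \ldots, \frac{N}{2} - 1. \tag{2.11} $$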
This can be rewritten as

$$ Z_{2k} = (-1)^k \frac{\sqrt{2}}{2} \sum_{n=0}^{N-1} x_n \left[\cos\frac{\pi(2n+1)(4k+1)}{2N} - \sin\frac{\pi(2n+1)(4k+1)}{2N}\right] \tag{2.12} $$
where $k = 0, 1, \ldots, N/2 - 1$. The symmetry of the sine and cosine kernels provides a further reduction. Pairing the term with index $n$ with the term with index $N - n - 1$ in equation (2.12), for $n = 0, 1, \ldots, N/2 - 1$, and using trigonometric identities, it can be written in the form

$$ Z_{2k} = (-1)^k \frac{\sqrt{2}}{2} \sum_{n=0}^{N/2-1} \left[x'_n \cos\frac{\pi(2n+1)(4k+1)}{2N} - x''_n \sin\frac{\pi(2n+1)(4k+1)}{2N}\right], \tag{2.13} $$

where $k = 0, 1, \ldots, N/2 - 1$ and

$$ x'_n = x_n - x_{N-n-1}, \tag{2.14} $$

$$ x''_n = x_n + x_{N-n-1}, \tag{2.15} $$

for $n = 0, 1, \ldots, N/2 - 1$.
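Using the symmetry of the sine and cosine kernels again, and pairing the term with index $n$ with the term with index $N/2 - n - 1$ in equation (2.13) for $n = 0, 1, \ldots, N/4 - 1$, the second reduction is achieved as

$$ Z_{2k} = (-1)^k \frac{\sqrt{2}}{2} \sum_{n=0}^{N/4-1} \left[\left(x'_n - x''_{N/2-n-1}\right) \cos\frac{\pi(2n+1)(4k+1)}{2N} - \left(x''_n - x'_{N/2-n-1}\right) \sin\frac{\pi(2n+1)(4k+1)}{2N}\right] \tag{2.16} $$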
where $k = 0, 1, \ldots, N/2 - 1$. The sine and cosine kernels are expanded as follows:

$$ \cos\frac{\pi(2n+1)(4k+1)}{2N} = \cos\frac{\pi(2n+1)}{2N} \cos\frac{\pi(2n+1)k}{N/2} - \sin\frac{\pi(2n+1)}{2N} \sin\frac{\pi(2n+1)k}{N/2}, $$

$$ \sin\frac{\pi(2n+1)(4k+1)}{2N} = \sin\frac{\pi(2n+1)}{2N} \cos\frac{\pi(2n+1)k}{N/2} + \cos\frac{\pi(2n+1)}{2N} \sin\frac{\pi(2n+1)k}{N/2}. \tag{2.17} $$
After their substitution in equation (2.16) and a few manipulations we get

$$ Z_{2k} = (-1)^k \frac{\sqrt{2}}{2} \sum_{n=0}^{N/4-1} \left[a_n \cos\frac{\pi(2n+1)k}{2(N/4)} - b_n \sin\frac{\pi(2n+1)k}{2(N/4)}\right], \qquad k = 0, 1, \ldots, \frac{N}{2} - 1, \tag{2.18} $$

where

$$ a_n = \left(x'_n - x''_{N/2-n-1}\right) \cos\frac{\pi(2n+1)}{2N} - \left(x''_n - x'_{N/2-n-1}\right) \sin\frac{\pi(2n+1)}{2N}, $$

$$ b_n = \left(x'_n - x''_{N/2-n-1}\right) \sin\frac{\pi(2n+1)}{2N} + \left(x''_n - x'_{N/2-n-1}\right) \cos\frac{\pi(2n+1)}{2N}, \tag{2.19} $$

for $n = 0, 1, \ldots, N/4 - 1$.
These are recognized as plane rotations. The sine and cosine kernels are recognized as the $N/4$-point DCT and DST of type II. Finally, substituting $k + N/4$ for $k$ in equation (2.16) and using the trigonometric identities

$$ \cos\frac{\pi(2n+1)(4k+1+N)}{2N} = (-1)^{n+1} \sin\frac{\pi(2n+1)(4k+1)}{2N}, \qquad \sin\frac{\pi(2n+1)(4k+1+N)}{2N} = (-1)^{n} \cos\frac{\pi(2n+1)(4k+1)}{2N}, \tag{2.20} $$
after some manipulations the complete formula for the fast MDCT is obtained as

$$ Z_{2k} = (-1)^k \frac{\sqrt{2}}{2} \sum_{n=0}^{N/4-1} \left[a_n \cos\frac{\pi(2n+1)k}{2(N/4)} - b_n \sin\frac{\pi(2n+1)k}{2(N/4)}\right], $$

$$ Z_{2k+N/2} = (-1)^{k+1} \frac{\sqrt{2}}{2} \sum_{n=0}^{N/4-1} (-1)^{n+1} \left[b_n \cos\frac{\pi(2n+1)k}{2(N/4)} + a_n \sin\frac{\pi(2n+1)k}{2(N/4)}\right], \tag{2.21} $$

for $k = 0, 1, \ldots, N/4 - 1$. The factor $(-1)^{k+1}$ in the second line is $(-1)^{k+N/4}$ evaluated for odd $N/4$, which holds for both $N = 12$ and $N = 36$.
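The reduction from equation (2.3) to the fast formula can be checked numerically. The following sketch (pure Python, floating point, function names hypothetical) forms $x'_n$ and $x''_n$ as in (2.14) and (2.15), the rotated sequences $a_n$ and $b_n$ of (2.19), evaluates the two $N/4$-point kernels of the fast formula directly rather than with a fast DCT/DST, and fills in the odd-indexed coefficients from the antisymmetry (2.9). It illustrates the algebra only; it is not the signal flow graph of figures 2.4 and 2.5.

```python
import math

def mdct_direct(x, num_coeffs=None):
    """Equation (2.3) evaluated directly; num_coeffs > N/2 exposes the antisymmetric extension."""
    N = len(x)
    K = N // 2 if num_coeffs is None else num_coeffs
    return [sum(x[n] * math.cos(math.pi * (2 * n + 1 + N / 2) * (2 * k + 1) / (2 * N))
                for n in range(N))
            for k in range(K)]

def mdct_fast(x):
    """MDCT via the reductions (2.14)-(2.21); N must be a multiple of 4."""
    N = len(x)
    if N % 4:
        raise ValueError("N must be a multiple of 4")
    half, M = N // 2, N // 4
    xp = [x[n] - x[N - n - 1] for n in range(half)]    # x'_n, eq. (2.14)
    xpp = [x[n] + x[N - n - 1] for n in range(half)]   # x''_n, eq. (2.15)
    a, b = [], []
    for n in range(M):                                 # plane rotations, eq. (2.19)
        u = xp[n] - xpp[half - n - 1]
        v = xpp[n] - xp[half - n - 1]
        phi = math.pi * (2 * n + 1) / (2 * N)
        a.append(u * math.cos(phi) - v * math.sin(phi))
        b.append(u * math.sin(phi) + v * math.cos(phi))
    scale = math.sqrt(2) / 2
    Z = [0.0] * N                                      # extended sequence Z_0 .. Z_{N-1}
    for k in range(M):
        cos_k = [math.cos(math.pi * (2 * n + 1) * k / (2 * M)) for n in range(M)]
        sin_k = [math.sin(math.pi * (2 * n + 1) * k / (2 * M)) for n in range(M)]
        Z[2 * k] = (-1) ** k * scale * sum(a[n] * cos_k[n] - b[n] * sin_k[n] for n in range(M))
        # second half of the even coefficients; sign (-1)^(k + N/4)
        Z[2 * k + half] = (-1) ** (k + M) * scale * sum(
            (-1) ** (n + 1) * (b[n] * cos_k[n] + a[n] * sin_k[n]) for n in range(M))
    for j in range(0, N, 2):                           # odd indices: Z_{N-j-1} = -Z_j, eq. (2.9)
        Z[N - j - 1] = -Z[j]
    return Z[:half]
```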
Detailed analysis of the computation of the fast algorithm results in a simple signal flow graph. The Inverse Modified Discrete Cosine Transform (IMDCT) structure is obtained from the MDCT structure by simply reversing the flow in the signal flow graph. The fast 12-point MDCT/IMDCT structure is shown in figure 2.4, and the fast 36-point MDCT/IMDCT structure in figure 2.5. The rotations block and the DCT/DST block are composed of rotations, as shown in figures 2.4 and 2.5. The coefficients used in the rotations are called butterfly coefficients; their values for the 12-point MDCT are listed in table 2.1. The butterfly coefficients of the rotations block of the $N$-point MDCT are well defined and are given by [4]:

$$ c_i = \cos\frac{(N/2 - 1 - 2i)\pi}{2N}, \qquad s_i = \sin\frac{(N/2 - 1 - 2i)\pi}{2N}, $$

where $i = 0, 1, \ldots, N/4 - 1$. The 3-point and 9-point DCT-II structures presented in [4] have the disadvantage that they cannot be orthonormally factorized. The orthonormal factorization of the 3-point DCT-II is trivial and is shown in the 3-point DCT-II block in figure 2.4. However, orthonormal factorizations of the DCT-II and DST-II blocks of the 36-point MDCT remain unknown. A novel structure for the 9-point DCT-II/DST-II is presented in the next chapter.
Table 2.1. Butterfly coefficients of the 12-point MDCT shown in figure 2.4.

$c_1 = \cos\frac{5\pi}{24}$, $s_1 = \sin\frac{5\pi}{24}$
$c_2 = \cos\frac{3\pi}{24}$, $s_2 = \sin\frac{3\pi}{24}$
$c_3 = \cos\frac{\pi}{24}$, $s_3 = \sin\frac{\pi}{24}$
$c_4 = c_6 = \frac{1}{\sqrt{2}}$, $s_4 = s_6 = \frac{1}{\sqrt{2}}$
$c_5 = c_7 = \frac{\sqrt{2}}{\sqrt{3}}$, $s_5 = s_7 = \frac{1}{\sqrt{3}}$
[Figure 2.4. Fast structure for the 12-point MDCT/IMDCT with rotations implemented using butterflies. Blocks: rotations, 3-point DCT-II, inverse rotations, 3-point IDCT-II.]

[Figure 2.5. Fast structure for the 36-point MDCT/IMDCT with rotations implemented using butterflies. Blocks: rotations, inverse rotations.]
2.5 The Lifting scheme

One way of understanding a mathematical object is to reduce it into simpler steps: polynomials are factorized into linear factors, and numbers into their prime factors. In this section we study the lifting scheme, with which we factorize the butterfly coefficients. Such a factorization has several distinct advantages. Each rotation in the fast implementation can be represented by an orthonormal matrix:

$$ R = \begin{bmatrix} c & s \\ -s & c \end{bmatrix}, $$

where $c = \cos\theta$ and $s = \sin\theta$.
Figure 2.6 (a) shows a block diagram for computing R and its inverse. The matrix R can be factorized into a product of lower and upper triangular matrices as follows:

$$ R = \begin{bmatrix} 1 & 0 \\ a & 1 \end{bmatrix} \begin{bmatrix} 1 & b \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ g & 1 \end{bmatrix}, $$

where

$$ a = \frac{c - 1}{s}, \qquad b = s, \qquad g = \frac{c - 1}{s}. $$

The coefficients a, b and g are called lifting coefficients in this thesis. Figure 2.6 (b) shows a flowgraph of the forward and inverse lifting structures of R.
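The factorization can be confirmed numerically. This sketch builds R and the product of the three triangular lifting matrices for a few angles; θ must not be a multiple of π, since s appears in a denominator. The helper names are illustrative.

```python
import math

def matmul(A, B):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def rotation(theta):
    """R = [[c, s], [-s, c]] with c = cos(theta), s = sin(theta)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, s], [-s, c]]

def lifting_coefficients(theta):
    """Lifting coefficients a, b, g of R (requires sin(theta) != 0)."""
    c, s = math.cos(theta), math.sin(theta)
    return (c - 1) / s, s, (c - 1) / s

def lifting_product(theta):
    """Lower (a) x upper (b) x lower (g) triangular lifting matrices."""
    a, b, g = lifting_coefficients(theta)
    return matmul(matmul([[1, 0], [a, 1]], [[1, b], [0, 1]]), [[1, 0], [g, 1]])
```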
[Figure 2.6. Forward and inverse (a) butterflies and (b) lifting steps.]
The advantage of lifting is that the quantization introduced in the forward structure can be cancelled in the inverse structure, making the structure invertible. Consider a signal $x_1$ flowing through the top branch of the forward lifting structure, and let the signal $x_2$ flow through the lower branch. The signals $x_1$ and $x_2$ are quantized to a number of bits much smaller than $N_i$, the lifting coefficients to $N_c$ bits and the internal nodes to $N_i$ bits, with $N_c \ll N_i$. The output at the top branch node following the coefficient $a$ is

$$ y_1 = x_1 + a x_2 + Q_i + Q_a, $$

where $Q_i$ and $Q_a$ are the quantization errors introduced by the multiplication with the lifting coefficient and at the node. In the inverse structure, $x_2$ is available unchanged, so exactly the same quantized product can be computed and subtracted. The output at the top branch node following the coefficient $a$ is therefore

$$ y_1 - a x_2 - Q_i - Q_a = x_1. $$

Thus any quantization error introduced in the forward structure is cancelled by the inverse. This property of the lifting scheme is especially useful for mobile devices: by using fewer bits to quantize the lifting coefficients, battery power is saved. Moreover, nonlinear operations can be used in the lifting steps without violating invertibility, as long as the same operations are used in the inverse. Assuming that the resolution of the lifting coefficient is sufficiently high, it can be shown that the lifting implementation of each coefficient increases the resolution of the input by at most one bit [8]. The dynamic range at the internal nodes is therefore an important factor: the number of bits at the nodes, $N_i$, has to be at least this dynamic range. Converting a rotation to lifting reduces the number of multiplications needed to compute R from four to three, although the number of additions increases from two to three. This is a definite advantage, since multiplications are computationally more expensive.
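The cancellation argument can be demonstrated with integer arithmetic. In this sketch each lifting step adds a rounded product to one branch, with rounding to the nearest integer standing in for the node quantization; the inverse subtracts the identical rounded products in reverse order, so integer inputs are recovered exactly even though every step is lossy. The rounding rule, the step order (g, then b, then a, assuming the factorization of R = [[c, s], [-s, c]] into lower (a), upper (b) and lower (g) triangular factors acting on a column vector) and the function names are assumptions made for illustration.

```python
import math

def q(v):
    """Quantizer: round to the nearest integer (halves rounded up)."""
    return math.floor(v + 0.5)

def lift_forward(x1, x2, theta):
    """Integer approximation of R @ [x1, x2] using three rounded lifting steps."""
    c, s = math.cos(theta), math.sin(theta)
    a, b, g = (c - 1) / s, s, (c - 1) / s
    x2 = x2 + q(g * x1)   # lower triangular step with g
    x1 = x1 + q(b * x2)   # upper triangular step with b
    x2 = x2 + q(a * x1)   # lower triangular step with a
    return x1, x2

def lift_inverse(y1, y2, theta):
    """Undo lift_forward exactly by subtracting the same rounded products in reverse order."""
    c, s = math.cos(theta), math.sin(theta)
    a, b, g = (c - 1) / s, s, (c - 1) / s
    y2 = y2 - q(a * y1)
    y1 = y1 - q(b * y2)
    y2 = y2 - q(g * y1)
    return y1, y2
```

The forward output only approximates the rotation (each rounding adds at most half a unit), but the inverse reproduces the original integers bit for bit, which is what makes the integer MDCT lossless.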
Chapter 3 discusses the integer implementation of the fast MDCT structure using lifting.
CHAPTER 3
A FAST, INTEGER STRUCTURE
FOR THE MDCT
In this chapter we discuss a fast, integer MDCT structure for the sizes of 12 and 36 points used in the MPEG layer III codec. We also present a novel orthonormally factorized 9-point DCT-II/DST-II structure, which is used in the 36-point MDCT structure. Finally, we discuss the merits of the new MDCT structures and assess the accuracy of the forward integer MDCT compared with the fast structure presented in the previous chapter.
3.1 Fast 12-point Integer MDCT structure

The fast structure shown in figure 2.4 uses butterflies to implement the rotations and the DCT-II/DST-II. The conversion of butterflies to lifting steps was shown in section 2.5. By converting all of the butterflies in figure 2.4 to lifting steps, the fast, integer structure shown in figure 3.1 is obtained. The lifting coefficients are listed in table 3.1.
3.2 Fast 36-point Integer MDCT structure

The 3-point DCT-II/DST-II structure that is part of the 12-point MDCT shown in figure 2.4 was trivial to factorize, since it uses only two rotation angles. Structures for the DCT-II/DST-II of larger dimensions are far more difficult to derive; in particular, the orthogonal factorization of the 9-point DCT-II is not obvious. The integer structure obtained by converting the butterflies into lifting steps is shown in figure 3.2.

The lifting coefficients of the rotations block of the 36-point MDCT in figure 3.2 are given by the expressions that follow Table 3.1.
[Figure 3.1. Fast structure for the 12-point Integer MDCT/IMDCT with rotations implemented using lifting. Blocks: rotations, 3-point DCT-II, 3-point DST-II, inverse rotations, 3-point IDCT-II, 3-point IDST-II.]
Table 3.1. Lifting coefficients used in the 12-point Integer MDCT, figure 3.1.

Each butterfly with coefficients $c_i$, $s_i$ of Table 2.1 ($i = 1, \ldots, 7$) is converted into three lifting steps with coefficients

$a_i = g_i = \frac{c_i - 1}{s_i}$, $b_i = s_i$
$$ a_i = g_i = \frac{\cos\gamma_i - 1}{\sin\gamma_i}, \qquad b_i = \sin\gamma_i, $$

where $\gamma_i = \frac{(N - 4i + 2)\pi}{4N}$ and $i = 1, 2, 3, \ldots, 9$.
3.2.1 Orthonormal Factorization of the 9-point DCT of Type II

In this section, the factorization of the 9-point DCT-II and DST-II using only orthonormal rotations is presented. Since the DST-II can be obtained from the DCT-II by negating the odd inputs, only the factorization of the DCT part is presented. The calculation of the 9-point DCT-II can be separated into two parts, each with approximately one-fourth of the computational complexity, via the DCT-V and DCT-VII [14]. The N-point DCT-II, V and VII are defined as [14]:

$$ Z_k^{II} = b_k \sum_{n=0}^{N-1} x_n \cos\frac{\left(n + \frac{1}{2}\right)k\pi}{N} \tag{3.1} $$

$$ Z_k^{V} = b_k \sum_{n=0}^{N-1} x_n \cos\frac{\left(n + \frac{1}{2}\right)k\pi}{N - \frac{1}{2}} \tag{3.2} $$

$$ Z_k^{VII} = \sum_{n=0}^{N-1} x_n \cos\frac{\left(n + \frac{1}{2}\right)\left(k + \frac{1}{2}\right)\pi}{N + \frac{1}{2}} \tag{3.3} $$
[Figure 3.2. Fast, integer structure for the 36-point Integer MDCT/IMDCT with rotations implemented using lifting. Blocks: rotations, two 9-point lossless DCT-II blocks, inverse rotations.]
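where $0 \le k, n \le N - 1$ and $b_k = 1 - (1 - 1/\sqrt{2})\,\delta[k]$. When $N$ is odd, the $N$-point DCT-II can be calculated from the $\frac{N+1}{2}$-point DCT-V and the $\frac{N-1}{2}$-point DCT-VII as follows. Substituting $2k$ for $k$ in (3.1) and combining the terms with indices $n$ and $N - 1 - n$ yields

$$ Z_{2k}^{II} = b_k \sum_{n=0}^{(N-1)/2} u_n \cos\frac{\left(n + \frac{1}{2}\right)k\pi}{N/2}, \qquad u_n = \begin{cases} x_n + x_{N-1-n}, & 0 \le n < \frac{N-1}{2}, \\ x_{(N-1)/2}, & n = \frac{N-1}{2}, \end{cases} \tag{3.4} $$

for $0 \le k \le \frac{N-1}{2}$, which is the $\frac{N+1}{2}$-point DCT-V of $u_n$. Similarly, substituting $2k + 1$ for $k$ in (3.1) yields

$$ Z_{2k+1}^{II} = \sum_{n=0}^{(N-3)/2} \left(x_n - x_{N-1-n}\right) \cos\frac{\left(n + \frac{1}{2}\right)\left(k + \frac{1}{2}\right)\pi}{N/2} \tag{3.5} $$

for $0 \le k \le \frac{N-3}{2}$, which is the $\frac{N-1}{2}$-point DCT-VII of $x_n - x_{N-1-n}$; the middle sample $x_{(N-1)/2}$ drops out because its kernel $\cos\left((k + \frac{1}{2})\pi\right)$ is zero. It is easy to see that the computational complexity, commonly measured by the number of multiplications, is approximately halved for large $N$.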
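The split can be verified directly for the 9-point case. This sketch implements equations (3.1) to (3.3) as plain sums (no factorization into rotations) and checks that the even outputs of the 9-point DCT-II equal a 5-point DCT-V and the odd outputs a 4-point DCT-VII, with the inputs formed as in (3.4) and (3.5). Function names are illustrative.

```python
import math

def b(k):
    """Scaling b_k = 1 - (1 - 1/sqrt(2)) * delta[k]."""
    return 1 / math.sqrt(2) if k == 0 else 1.0

def dct2(x):
    """N-point DCT-II, equation (3.1)."""
    N = len(x)
    return [b(k) * sum(x[n] * math.cos((n + 0.5) * k * math.pi / N) for n in range(N))
            for k in range(N)]

def dct5(x):
    """N-point DCT-V as defined in equation (3.2)."""
    N = len(x)
    return [b(k) * sum(x[n] * math.cos((n + 0.5) * k * math.pi / (N - 0.5)) for n in range(N))
            for k in range(N)]

def dct7(x):
    """N-point DCT-VII as defined in equation (3.3)."""
    N = len(x)
    return [sum(x[n] * math.cos((n + 0.5) * (k + 0.5) * math.pi / (N + 0.5)) for n in range(N))
            for k in range(N)]

def dct2_odd_split(x):
    """DCT-II of odd length N via an (N+1)/2-point DCT-V and an (N-1)/2-point DCT-VII."""
    N = len(x)
    if N % 2 == 0:
        raise ValueError("N must be odd")
    mid = (N - 1) // 2
    u = [x[n] + x[N - 1 - n] for n in range(mid)] + [x[mid]]   # input of (3.4)
    v = [x[n] - x[N - 1 - n] for n in range(mid)]              # input of (3.5)
    out = [0.0] * N
    out[0::2] = dct5(u)   # Z_0, Z_2, ..., Z_{N-1}
    out[1::2] = dct7(v)   # Z_1, Z_3, ..., Z_{N-2}
    return out
```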
Figure 3.3 illustrates the calculation of the 9-point DCT-II via the 5-point DCT-V and the 4-point DCT-VII. It also shows the orthonormal factorization of the two reduced-size matrices. The 9-point DST-II is obtained by negating the even inputs. Table 3.2 summarizes the values of the butterfly coefficients used in the 9-point DCT-II. Figure 3.4 shows the integer structure for the 9-point DCT-II. The lifting coefficients used are tabulated in table 3.3.
3.3 Accuracy of the forward transform

In our implementation, we convert the butterflies in the 12-point and 36-point MDCT structures to lifting units. The lifting coefficients are then quantized to a finite number of bits $N_c$, which ranges from 1 to 20 bits. In the evaluation, the input signal is
Table 3.2. Butterfly coefficients of the 9-point DCT-II block shown in figure 3.3.
c l ) f-2> f-3) c 8
COS f
5 lj S 2, S3, S 8
sin j
C g , C 1 0 , C n , C 15
c o s f
®9> s 10> S l l ) S l 5
s in f
C 4 , C 7 , C \ 2 ) C 14
%/2
1
\/3
S4> S7> S 1 2 , S l 4
c5
COS f p
s5
^6
1
sin ^
S6
3
Cl3
Vs
3
cos
S l3
s in T
Table 3.3. Lifting Coefficients used in the 9-point DCT-II/DST-II
block, figure 3.4.
cos ^ - - 1
ai
02
03
0,4
cos £ —1
s in f
9i
sin f
cos £ 1
f>2
s in f
92
sin £
cos £ —1
h
s in f
9s
sin £
94
y/2 - y / 3
s in f
COS j - 1
s in 7
cos f - 1
sin f
sfi -
sfi
b4
1
VS
cos ^ - 1
a§
a7
cos ^ - 1
h
05
-:i
VI
V 2 -V 3
b6
67
s i n f
1
Vi
9b
9b
1
V3
97
COS -j - 1
a8
Gg
^10
a n
s in j
COS 7 - 1
s in J
co sf-1
s in 1
cos^-1
sin £
y/2
cos 4 —1
... 4 ...
sin f
COS £ - 1
s in f
9s
bo
s in f
99
bio
s in f
9\o
bn
s in f
9n
sin £
912
y/2 -y/3
y/2 - y / S
b\2
&13
sin ^ - 1
------ —
cos ^
bn
G 14
y/2 - y / Z
bu
1
VS
sin £
cos 4 —1
., 4 ..
sin £
COS £ —1
s in
cos ^
—1
913
cos^
vs
9u
y / 2 - y/3
s in f
9lb
1
co s£ -l
sin £
Vs
-y/3
&8
G 12
^15
sin £ £
-2
cos £ —1
b\ s
sin £
quantized to 8 bits, the product between the signal through the lifting branch and the lifting coefficient is quantized to 25 bits, and the internal nodes are set at 25 bits. Sending the input through the forward fast MDCT gives the reference MDCT output; sending the same input through the forward fast integer transform gives an approximation of it. The error of the integer MDCT is obtained by subtracting the two outputs. The accuracy of the integer MDCT relative to the ideal MDCT is shown in figure 3.5 (a) and (b). The figures show that, for both transform sizes, the accuracy of the integer transform (in dB) improves approximately linearly with the number of bits $N_c$ allocated to the lifting coefficients: each additional coefficient bit improves the accuracy by approximately 6 dB.
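The roughly 6 dB-per-bit trend is what uniform rounding of a coefficient predicts: each extra bit halves the worst-case coefficient error, and 20·log10(2) ≈ 6.02 dB. The sketch below assumes that quantizing to $N_c$ bits means rounding to the nearest multiple of $2^{-N_c}$ (that is, $N_c$ fractional bits); the exact fixed-point format is not spelled out in the text, so this is an illustrative assumption, and `quantize_coefficient` is a hypothetical helper.

```python
import math

def quantize_coefficient(c, n_bits):
    """Round coefficient c to the nearest multiple of 2**-n_bits (assumed N_c fractional bits)."""
    scale = 1 << n_bits
    return math.floor(c * scale + 0.5) / scale

# Lifting coefficients (cos t - 1) / sin t and sin t for one example butterfly angle.
theta = 5 * math.pi / 24
coeffs = [(math.cos(theta) - 1) / math.sin(theta), math.sin(theta)]

for n_bits in (1, 2, 4, 8, 12, 16, 20):
    worst = max(abs(c - quantize_coefficient(c, n_bits)) for c in coeffs)
    print(f"N_c = {n_bits:2d}: max error {worst:.2e} (bound {2.0 ** -(n_bits + 1):.2e})")

print(f"gain per extra bit: {20 * math.log10(2):.2f} dB")
```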
3.3.1 Computational Complexity

An integer multiplication is equivalent to shifting the multiplicand to the left by different numbers of bits and summing the shifted versions. An algorithm for calculating the minimum number of adders for a given integer multiplication is presented in [15]. Using this algorithm, we obtained the minimum number of adders for the multiplications involved in the 12-point and 36-point MDCT for coefficient bits $N_c$ varying from 1 to 8. The resulting graphs are shown in figure 3.6 (a) and (b). The minimum number of adders required for the implementation of the integer MDCT increases linearly with the resolution of the coefficient bits.
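To make the shift-and-add idea concrete, the sketch below multiplies by an integer constant using its canonical signed-digit (CSD) representation, in which each nonzero digit contributes one shifted term. CSD is a standard, simple recoding; it is not the minimum-adder algorithm of [15] and only gives an upper bound on the adder count. Function names are illustrative.

```python
def csd_digits(c):
    """Canonical signed-digit recoding of a non-negative integer, LSB first, digits in {-1, 0, 1}."""
    if c < 0:
        raise ValueError("expects a non-negative integer")
    digits = []
    while c:
        if c & 1:
            d = 2 - (c & 3)   # c % 4 == 1 -> digit +1, c % 4 == 3 -> digit -1
            c -= d
        else:
            d = 0
        digits.append(d)
        c >>= 1
    return digits

def shift_add_multiply(x, c):
    """Compute x * c for integers using only shifts, additions and subtractions."""
    total = 0
    for shift, d in enumerate(csd_digits(abs(c))):
        if d == 1:
            total += x << shift
        elif d == -1:
            total -= x << shift
    return -total if c < 0 else total

def csd_adder_count(c):
    """Adders/subtractors used by shift_add_multiply: nonzero CSD digits minus one."""
    nonzero = sum(1 for d in csd_digits(abs(c)) if d)
    return max(nonzero - 1, 0)
```

For example, 7 is recoded as 8 - 1, so multiplying by 7 needs one subtractor instead of the two adders implied by its binary form 111.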
In chapter 4, the implementation of the integer MDCT structure in a commercial
MPEG layer III codec is discussed.
R eproduced with permission of the copyright owner. Further reproduction prohibited without permission.
35
-50
“
—100
P-150
-200
-250
NumberofCoefficientsBits(No)
(a)
-50
2--100
S-150
-2 0 0
-250
NumberofCoefficientsBits(Nc)
(b)
Figure 3.5. Mean square error of the integer MDCT versus the
number of coefficient bits Nc: (a) N = 12 and (b) N = 36.
R eproduced with permission of the copyright owner. Further reproduction prohibited without permission.
36
40
35
30
w
a>
5
Ito 25
c
in
o
20
15
10
5
1
2
3
4
5
Numberof Coefficient Bits
(a)
6
7
8
140
120
(0 100
0)
T3
T5
<
S
ro'
c
CD
O
IE
z3
NumberofCoefficientBits
(b)
Figure 3.6. Number of coefficient bits versus the minimum num­
ber of Binary adders for (a) N = 12 and (b) N = 36.
Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.
CHAPTER 4
THE IMPLEMENTATION OF THE INTEGER MDCT
IN A STANDARD MPEG LAYER III CODEC
Commercial MPEG Layer III codecs implement the MDCT either by straightforward computation of equation (2.3) or with faster techniques such as the one represented by equation (2.21). Either way, the implementation involves numerous floating point additions and multiplications, resulting in loss of information due to quantization error.

The fast integer MDCT structure presented in the previous chapter is implemented in a standard MPEG layer III codec. Figures 4.1 and 4.2 show the modifications to the encoder and decoder block diagrams. We assess the performance of the integer MDCT based encoder and decoder by quantizing the lifting coefficients and increasing the number of bits used until the performance is satisfactory, that is, as good as or better than that of the original floating point transform. The lifting coefficients are quantized to $N_c$ bits. Since an addition is carried out at each node, the resolution at a node after the addition will be at least $N_c + 1$ bits. The nodes are therefore quantized to $N_i$ bits, with $N_i$ much greater than $N_c$. In the simulation, $N_i$ is set to 25 bits and $N_c$ is varied from 1 to 8 bits. There are different criteria for assessing the performance of the integer MDCT based encoder or decoder. The integer MDCT based encoder should be capable of producing a bitstream that can be decoded by any standard decoder, and the decoded bitstream must be perceptually as good as or better than that produced by the original floating point encoder/decoder combination. Similarly, the integer MDCT based decoder should produce sound that is perceptually as good as that of the standard decoder when decoding a
[Figure 4.1. Fast integer MDCT based MPEG layer III encoder functional block diagram: audio signal in PCM format, analysis polyphase filter bank, integer MDCT, FFT and psychoacoustic model, nonuniform quantization, coding of side information, bitstream formatting and CRC word generation with ancillary data, producing the coded audio signal.]
[Figure 4.2. Fast integer IMDCT based MPEG layer III decoder functional block diagram: bitstream synchronization and error checking, Huffman decoding, Huffman info decoding, scalefactor decoding, reordering and joint stereo decoding, followed in each channel by alias reduction, integer IMDCT, frequency inversion and the synthesis polyphase filterbank, producing PCM samples.]
bitstream encoded using the standard encoder. Finally, when the bitstream from the integer MDCT based encoder is decoded by an integer MDCT based decoder, the result should be perceptually lossless audio. By varying the number of bits $N_c$ used to quantize the lifting coefficients, we can find the minimum value of $N_c$ that satisfies each of the criteria listed above.
4.1 Simulation results

Figure 4.3(a) shows one frame of the original sample. Figures 4.3(b), (c) and (d) show the corresponding waveforms obtained using the integer encoder and decoder with the MDCT coefficients quantized to $N_c$ = 1, 2 and 3 bits, respectively. Subjective tests showed that 2 bits for the MDCT coefficients are sufficient to produce perceptually lossless sound. If the integer encoder is used with the standard decoder, or the standard encoder with the integer decoder, perceptually lossless sound is obtained when $N_c$ is set to 3 bits.

Figure 4.4(a) shows the spectrogram of a sample PCM signal. Figures 4.4(b), 4.5(a) and 4.5(b) show the spectrograms of the decoded files obtained using the integer encoder and standard decoder combination with $N_c$ = 1, 2 and 3, respectively. Figure 4.6(a) shows the spectrogram of the standard MP3 file decoded using a standard decoder. Figures 4.6(b), 4.7(a) and 4.7(b) show the spectrograms of the decoded files obtained using the integer encoder/decoder combination for $N_c$ = 1, 2 and 3. Certain minor differences can be noticed when examining the spectrograms. Because the psychoacoustic model removes acoustically redundant frequencies, the decoded files have a spectrum that is different from that of the original PCM file. Comparing figure 4.4(a) with all other plots, it is easily seen that the decoded files have frequencies above the normalized frequency of 0.6 clipped. For the 1-bit integer encoder/standard decoder combination, red blotches indicating higher amplitude can be seen in the low frequency region that differ from those of the 2-bit and 3-bit encoded files.
According to figure 3.6, the numbers of adders required for implementing the 12-point and 36-point MDCT/IMDCT in the encoder/decoder are 54 and 86 for $N_c = 2$ and $N_c = 3$, respectively.
The mean square error between a given audio signal and a reference signal can be used to assess the performance of the codec quantitatively. We consider three cases: the integer encoder/integer decoder combination, the standard encoder/integer decoder combination and the integer encoder/standard decoder combination. The reference signal in each case is the audio signal obtained from that combination with $N_c$ set to 20 bits; this signal is chosen because it gives the highest fidelity among the values of $N_c$ tested. The mean square error for the integer encoder/integer decoder combination for $N_c$ varying from 1 to 20 is shown in figure 4.8. The normal plot shows that the mean square error decreases slowly at first and then drops quickly to zero as $N_c$ approaches 20. Figures 4.9 and 4.10 show the mean square error versus the number of coefficient bits $N_c$ for the integer encoder/standard decoder and the standard encoder/integer decoder combinations, respectively. The plots show that the integer encoder/integer decoder combination performs best at 2 bits, with a mean square error on the order of $10^{-5}$, whereas the standard encoder/integer decoder and integer encoder/standard decoder combinations both have mean square errors on the order of $10^{-3}$ at 2 bits. This result agrees with the subjective tests.
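For reference, the mean square error used in this comparison is the average squared sample difference between a decoded signal and the $N_c = 20$ reference. A minimal sketch, assuming both signals are already time-aligned sample sequences of equal length (alignment and level matching are not described in the text and are assumptions here):

```python
def mean_square_error(signal, reference):
    """Average squared difference between two equal-length, time-aligned sample sequences."""
    if len(signal) != len(reference) or not signal:
        raise ValueError("signals must be non-empty and of equal length")
    return sum((s - r) ** 2 for s, r in zip(signal, reference)) / len(signal)
```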
Chapter 5 summarizes the contributions of this thesis.
[Figure 4.3. Time plots of (a) the PCM signal and (b), (c), (d) the decoded signal obtained using the $N_c$ = 1, 2 and 3 bit integer encoder-decoder combination, respectively. Horizontal axis: time (ms).]
[Figure 4.4. Spectrogram of (a) the PCM signal (panel title: "Spectrogram of the Original PCM file") and (b) the encoded signal with MDCT coefficients quantized to 1 bit (panel title: "1 bit encoded mp3 file, standard decoder"). Horizontal axis: time.]
[Figure 4.5. Spectrogram of (a) the encoded signal with MDCT coefficients quantized to 2 bits (panel title: "2 bit encoded mp3 file, standard decoder") and (b) the encoded signal with MDCT coefficients quantized to 3 bits (panel title: "3 bit encoded mp3 file, standard decoder").]
[Figure 4.6. Spectrogram of (a) the decoded signal obtained using the standard codec (panel title: "Spectrogram of the decoded file obtained from the original encoder decoder combination") and (b) the decoded signal obtained from the integer encoder/decoder combination with $N_c$ = 1.]
[Figure 4.7. Spectrogram of the decoded signal obtained from the integer encoder/decoder combination with (a) $N_c$ = 2 (panel title: "Spectrogram of the decoded file with coefficients quantized to 2 bits") and (b) $N_c$ = 3.]
[Figure 4.8. The mean square error versus $N_c$, the number of coefficient bits, for the integer encoder/integer decoder combination as $N_c$ varies from 1 to 20 bits: (a) normal plot, (b) semilog plot.]
[Figure 4.9. The mean square error versus $N_c$, the number of coefficient bits, for the integer encoder/standard decoder combination as $N_c$ varies from 1 to 20 bits: (a) normal plot, (b) semilog plot.]
[Figure 4.10. The mean square error versus $N_c$, the number of coefficient bits, for the standard encoder/integer decoder combination as $N_c$ varies from 1 to 20 bits: (a) normal plot, (b) semilog plot.]
CHAPTER 5
CONCLUSIONS
In this thesis we have described a novel fast, integer structure for the MDCT used
in MPEG audio coding. It is a lapped transform which provides better spectral res­
olution by subdividing in frequency the sub-band outputs obtained from the previous
layers. MPEG layer III uses MDCT of lengths 12 and 36. In order to improve the speed
and efficiency of the codec fast structures for the MDCT have been implemented. The
fast structures are implemented using Given’s plane rotations and DCT/DST of type II.
The 12-point and the 36-point structures use 3-point and 9-point DCT/DST structures.
The rotations block contain orthogonal factorizations which are represented by butterfly
units. Using the lifting scheme the butterfly units can be converted to liftings. One of
the properties of lifting is that using a combination of the forward and inverse lifting
structure, quantization error can be completely removed.
For the entire fast MDCT structure to be lossless, the DCT/DST blocks must also be
implemented with lifting steps. The orthonormal factorization of the 3-point DCT/DST
is trivial, whereas that of the 9-point DCT/DST is far more complex. We propose a
new structure for the 9-point DCT/DST and obtain its orthonormal factorization. The
resulting MDCT structure is implemented with the lifting scheme, giving a lossless
implementation with integer coefficients. The performance of the new transforms was
evaluated by estimating their accuracy. The results show that the accuracy of the
integer transform increases approximately linearly as the resolution of the coefficients
increases. It was also shown that the minimum number of adders required to implement
the structure increases linearly with the resolution of the coefficients [16]. This
result is particularly important for implementation in low-power mobile devices: the
fewer the bits used, the greater the power savings.
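The role of the coefficient resolution Nc can be illustrated on the same single rotation. The sketch below is an assumption, not the evaluation procedure of this thesis: it rounds the lifting multipliers p and u to the nearest multiple of 2^-Nc, forms the resulting 2 × 2 matrix with exact rational arithmetic, and measures its largest entry-wise deviation from the exact rotation. Since every lifting matrix is unit triangular, the determinant stays exactly 1 for every Nc, so the integer structure remains perfectly invertible; only its closeness to the true rotation depends on Nc, and for angles in (0, π/2] that deviation is bounded by a quantity that halves with each extra bit. The helper names are hypothetical.

```python
import math
from fractions import Fraction

Matrix = list[list[Fraction]]

def quantize(c: float, nc: int) -> Fraction:
    """Nearest multiple of 2**-nc to c, i.e. a dyadic coefficient with nc fractional bits."""
    scale = 1 << nc
    return Fraction(math.floor(c * scale + 0.5), scale)

def matmul2(a: Matrix, b: Matrix) -> Matrix:
    """Product of two 2x2 matrices."""
    return [[a[i][0] * b[0][j] + a[i][1] * b[1][j] for j in range(2)] for i in range(2)]

def lifted_rotation(theta: float, nc: int) -> Matrix:
    """Matrix realized by P*U*P when p and u are quantized to nc fractional bits."""
    p = quantize((math.cos(theta) - 1.0) / math.sin(theta), nc)
    u = quantize(math.sin(theta), nc)
    one, zero = Fraction(1), Fraction(0)
    P = [[one, p], [zero, one]]
    U = [[one, zero], [u, one]]
    return matmul2(matmul2(P, U), P)

def det2(m: Matrix) -> Fraction:
    """Determinant of a 2x2 matrix (exact for Fraction entries)."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def max_deviation(theta: float, nc: int) -> float:
    """Largest |entry| difference between the quantized lifting matrix and the exact rotation."""
    c, s = math.cos(theta), math.sin(theta)
    exact = [[c, -s], [s, c]]
    m = lifted_rotation(theta, nc)
    return max(abs(float(m[i][j]) - exact[i][j]) for i in range(2) for j in range(2))

if __name__ == "__main__":
    theta = math.pi / 4  # example angle, chosen arbitrarily
    for nc in range(1, 11):
        m = lifted_rotation(theta, nc)
        print(f"Nc={nc:2d}  det={det2(m)}  max deviation={max_deviation(theta, nc):.2e}")
```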
The new MDCT structure with integer coefficients was incorporated into a standard
MPEG Layer III encoder and decoder, and its performance was compared with that of the
original codec. The integer-MDCT-based encoder/decoder combination gave perceptually
lossless reproduction with the coefficients quantized to only 2 bits. In contrast, when
the standard encoder or decoder is combined with the integer-MDCT-based decoder or
encoder, the integer side requires at least 3 coefficient bits for perceptually lossless
reproduction of sound. The waveforms and spectrograms obtained from the standard codec
and the integer-MDCT-based codec were analyzed, and the differences between them were
highlighted. Although there were minor differences in the waveforms and the spectra, no
perceptual difference was found for the integer codec when the MDCT coefficients were
quantized to 2 or more bits.
The new integer MDCT structure is particularly useful to manufacturers of portable
MPEG Layer III audio devices. If the MDCT coefficients are quantized to 2 or 3 bits,
the number of adders and shifters required is considerably reduced. Moreover, rather
than powering floating-point coefficients of 32 to 64 bits, only 2 or 3 coefficient bits
need to be powered, which would prolong battery life in such devices.
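Once a lifting multiplier is a dyadic value k/2^Nc, each multiplication inside a lifting step reduces to shifts and additions. The sketch below assumes a plain binary expansion of the integer numerator k, and its helper names are hypothetical; it is not the minimum-adder design referred to above, and canonical signed-digit recoding or the adder-constrained search of [15] would typically need fewer adders. It does show why a small Nc helps: for a multiplier of magnitude at most 1, |k| has at most Nc + 1 significant bits, so the number of partial products, and hence adders, stays small.

```python
import math

def dyadic(c: float, nc: int) -> int:
    """Numerator k such that k / 2**nc is the nearest value to c with nc fractional bits."""
    return math.floor(c * (1 << nc) + 0.5)

def shift_add_multiply(x: int, k: int, nc: int) -> tuple[int, int]:
    """Compute round(x * k / 2**nc) with shifts and additions only (requires nc >= 1).

    Returns (result, adders), where `adders` counts the additions needed to
    accumulate the shifted partial products of x. The rounding offset and the
    sign of k are not counted: in a lifting step the sign can be absorbed by
    turning the update's addition into a subtraction."""
    acc = 0
    terms = 0
    mag, shift = abs(k), 0
    while mag:
        if mag & 1:  # this bit of |k| contributes the partial product x << shift
            acc += x << shift
            terms += 1
        mag >>= 1
        shift += 1
    if k < 0:
        acc = -acc
    # Add half an output LSB, then arithmetic right shift (floor) by nc bits.
    result = (acc + (1 << (nc - 1))) >> nc
    return result, max(terms - 1, 0)

if __name__ == "__main__":
    p = -math.tan(math.pi / 8)  # example multiplier: (cos θ - 1)/sin θ at θ = π/4
    for nc in (2, 3, 8):
        k = dyadic(p, nc)
        y, adders = shift_add_multiply(100, k, nc)
        print(f"Nc={nc}: p ≈ {k}/{1 << nc}, round(100*p) -> {y}, adders = {adders}")
```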
BIBLIOGRAPHY
[1] D. Pan, “A tutorial on MPEG/audio compression,” IEEE Multimedia, pp. 60-74,
Summer 1995.
[2] K. R. Rao and J. J. Hwang, Techniques and Standards for Digital Image/Video/Audio
Coding, Prentice Hall, 1996.
[3] H. S. Malvar, Signal Processing with Lapped Transforms, Artech House, 1992.
[4] V. Britanak and K. R. Rao, “An efficient implementation of the forward and inverse
MDCT in MPEG audio coding,” IEEE Signal Processing Letters, vol. 8, no. 2, pp.
48-51, February 2001.
[5] I. Daubechies and W. Sweldens, “Factoring wavelet transforms into lifting steps,”
Technical report, Bell Laboratories, Lucent Technologies, 1996.
[6] T. D. Tran, “The binDCT: Fast multiplierless approximation of the DCT,” IEEE Signal
Processing Letters, vol. 7, no. 6, pp. 141-144, June 2000.
[7] Y.-J. Chen, S. Oraintara, and T. Q. Nguyen, “Video compression using integer DCT,”
in Proc. ICIP, 2000.
[8] S. Oraintara, Y. Chen, and T. Nguyen, “Integer fast Fourier transform (IntFFT),”
Proc. IEEE International Conf. on Acoustics, Speech, and Signal Processing, vol. 6,
pp. 3485-3488, 2001.
[9] T. D. Tran, “M-channel linear phase perfect reconstruction filter banks with rational
coefficients,” submitted to IEEE Trans. CAS-I, 2001.
[10] Y. Wu and Z. Zhu, “New radix-3 fast algorithm for the discrete cosine transform,”
Proc. of the Aerospace and Electronics Conf., vol. 1, pp. 86-89, September 1993.
[11] Y.-H. Chan and W.-C. Siu, “An efficient implementation of the forward and inverse
MDCT in MPEG audio coding,” IEEE Signal Processing Letters, vol. 8, no. 2, pp.
48-51, February 2001.
[12] Y. Wang, L. Yaroslavsky, M. Vilermo, and M. Vaananen, “Some peculiar properties
of the MDCT,” Proc. of the International Conference on Signal Processing, vol. 1, pp.
61-64, 2000.
[13] V. Britanak and K. R. Rao, “New fast algorithms for the unified forward and inverse
MDCT/MDST computation,” Technical report, Institute of Control Theory and
Robotics, Slovak Academy of Sciences, May 2000.
[14] G. Strang and T. Nguyen, Wavelets and Filter Banks, Wellesley-Cambridge Press, 1997.
[15] Y. J. Chen, S. Oraintara, T. D. Tran, K. Amaratunga, and T. Q. Nguyen, “Multiplierless
approximations of transforms using lifting scheme and coordinate descent
with adder constraint,” IEEE International Conf. on Acoustics, Speech, and Signal
Processing, May 2002.
[16] T. Krishnan and S. Oraintara, “A fast and lossless forward and inverse structure for
the MDCT in MPEG audio coding,” Proc. of the International Symposium on Circuits
and Systems, May 2002.
BIOGRAPHICAL INFORMATION
The author was born on June 15, 1979, in Bangalore, a metropolitan city in South
India. Owing to the nomadic nature of his father’s profession as a banker with a
national bank that had numerous branches, he spent his childhood in many different,
exciting cities and small towns in India. Goaded on by his grandmother with small bribes
and threats, he developed a keen interest in mathematics. After graduating from high
school at the top of his class, he went on to pursue a Baccalaureate degree in Electronics
and Communication Engineering at the Sri Venkateswara College of Engineering under the
University of Madras, where he soon developed an interest in Digital Signal Processing
and Communications engineering.
He decided to continue on to a Master’s program in Electrical Engineering at The
University of Texas at Arlington, where he was offered a graduate fellowship and an
opportunity to work with Professor Soontorn Oraintara at the Multirate Signal Processing
Laboratory. It is here, under the professor’s guidance, that he developed the work
documented in this thesis. He is a student member of the IEEE and a member of the
Electrical Engineering honor society Eta Kappa Nu. He is a voracious reader, fond of
music, movies, long after-dinner walks, and good vegetarian cuisine. Being vain and
extremely conscious about his height-weight ratio, he works out and swims. He resides on
campus at The University of Texas at Arlington and is currently seeking a career in
industry. He graduated from The University of Texas at Arlington with a Master of Science
in Electrical Engineering in August 2002.