UMI, microfilmed 2003.

Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.

FAST INTEGER MDCT FOR MPEG/AUDIO CODING

The members of the Committee approve the master's thesis of Tharakram Krishnan

Soontorn Oraintara, Supervising Professor
Venkat Devarajan
Michael Manry

Copyright © by Tharakram Krishnan 2002
All Rights Reserved
FAST INTEGER MDCT FOR MPEG/AUDIO CODING

by

THARAKRAM KRISHNAN

Presented to the Faculty of the Graduate School of The University of Texas at Arlington in Partial Fulfillment of the Requirements for the Degree of

MASTER OF SCIENCE IN ELECTRICAL ENGINEERING

THE UNIVERSITY OF TEXAS AT ARLINGTON
August 2002

UMI Number: 1410963

UMI Microform 1410963. Copyright 2002 by ProQuest Information and Learning Company. All rights reserved. This microform edition is protected against unauthorized copying under Title 17, United States Code.

ProQuest Information and Learning Company, 300 North Zeeb Road, P.O. Box 1346, Ann Arbor, MI 48106-1346

ACKNOWLEDGMENTS

I wish to acknowledge the patient encouragement and guidance of Prof. Soontorn Oraintara, without whom this thesis could not have been completed in time. I would also like to thank my parents, my sister and friends, whose constant support and understanding have helped me weather many difficult times.

May 1, 2002

ABSTRACT

FAST INTEGER MDCT FOR MPEG/AUDIO CODING

Publication No. ______

Tharakram Krishnan, M.S.
The University of Texas at Arlington, 2002

Supervising Professor: Soontorn Oraintara

The modified discrete cosine transform (MDCT) is a lapped transform used in transform coding schemes as an analysis/synthesis filter bank based on the concept of time domain aliasing cancellation. This thesis proposes an implementation of the MDCT based on the lifting scheme, which allows a lossless implementation with integer coefficients.
The lifting scheme also makes the MDCT immune to quantization errors, so the transform is lossless. The new fast algorithm can easily be adopted for the MDCT computation in current audio coding schemes such as those defined by the MPEG audio standards. A performance comparison of the proposed implementation and the current technique is presented. The integer MDCT is implemented in a standard MPEG layer III codec, and the performance of the codec with the new structure is evaluated.

TABLE OF CONTENTS

ACKNOWLEDGMENTS
ABSTRACT
LIST OF FIGURES
LIST OF TABLES

Chapter
1. INTRODUCTION
   1.1 MPEG/Audio coding
      1.1.1 Features
      1.1.2 Applications
   1.2 The Modified Discrete Cosine Transform
   1.3 Outline
2. MPEG/AUDIO LAYER III AND THE MDCT
   2.1 The MPEG Layer III Encoder
   2.2 The MPEG Layer III Decoder
   2.3 The Modified Discrete Cosine Transform
   2.4 A fast structure for the MDCT
   2.5 The Lifting scheme
3. A FAST, INTEGER STRUCTURE FOR THE MDCT
   3.1 Fast 12-point Integer MDCT structure
   3.2 Fast 36-point Integer MDCT structure
      3.2.1 Orthonormal Factorization of the 9-point DCT II
   3.3 Accuracy of the forward transform
      3.3.1 Computational Complexity
4. THE INTEGER MDCT IN A STANDARD MPEG LAYER III CODEC
   4.1 Simulation results
5. CONCLUSIONS
BIBLIOGRAPHY
BIOGRAPHICAL INFORMATION

LIST OF FIGURES

Figure
2.1 MPEG layer III encoder functional block diagram
2.2 The four types of windows
2.3 MPEG layer III decoder functional block diagram
2.4 Butterfly implementation of the 12-point MDCT structure
2.5 Butterfly implementation of the 36-point MDCT structure
2.6 Butterfly and Lifting steps
3.1 Fast, integer structure for the 12-point Integer MDCT/IMDCT
3.2 Fast, integer structure for the 36-point Integer MDCT/IMDCT
3.3 Structure for the orthogonal 9-point DCT-II
3.4 An integer structure for the orthogonal 9-point DCT-II
3.5 Plot of Mean square error versus Nc
3.6 Plot of Nc versus the minimum number of binary adders
4.1 Block diagram of an integer MDCT based MPEG layer III encoder
4.2 Block diagram of an integer IMDCT based MPEG layer III decoder
4.3 Time plots of the audio signal
4.4 Spectrogram of audio signal
4.5 Spectrogram of audio signal
4.6 Spectrogram of audio signal
4.7 Spectrogram of audio signal
4.8 MSE vs Nc for the integer codec
4.9 MSE vs Nc for the integer encoder/standard decoder
4.10 MSE vs Nc for the standard encoder/integer decoder

LIST OF TABLES

Table
2.1 Butterfly coefficients of the 12-point MDCT shown in figure 2.4
3.1 Lifting Coefficients used in the 12-point Integer MDCT, figure 3.1
3.2 Butterfly coefficients of the 9-point DCT-II block shown in figure 3.3
3.3 Lifting Coefficients used in the 9-point DCT-II/DST-II block, figure 3.4

CHAPTER 1
INTRODUCTION

From the time the modern computer came into use, there has been a disparity between the volume of information and the resources available to store it. Memory resources have always been limited. Engineers and computer scientists have tried to make maximum use of the available memory, either by improving memory fabrication technology or by compressing the stored data. Memory fabrication technology today makes it possible to store a million times more information than the memory devices of twenty years ago. With the advent of a faster and more accessible internet, the volume and the types of information stored have increased many times. The exponential growth in processing power also makes today's computing devices number-crunching behemoths. This means that we can run complex data compression algorithms to maximize memory resources while making full use of the processing power available to us.

By using data compression techniques we can reduce the memory occupied by the data. A less obvious advantage of a smaller data file is that it occupies less bandwidth when sent over the internet. A major share of the information sent over the internet today is in the form of streaming audio and video.
Raw digitized video and audio require very high bandwidth and are therefore not suited for transmission over networks with limited bandwidth and high load. The human ear perceives sound frequencies in the range of roughly 20 Hz to 20 kHz. For most audio applications, frequencies above about 22 kHz are regarded as redundant. For near-perfect reconstruction the audio signal must be sampled at 44.1 kHz, and at 16 bits per sample this requires approximately 0.7 megabits per second per channel (44,100 x 16 = 705,600 bits per second). By today's standards this is a huge amount of bandwidth, making it extremely expensive to send even moderately large raw audio files over the internet.

Fortunately, many advanced audio coding schemes have made it easier to compress, store and transfer audio files. Among them are AAC (Advanced Audio Coding) and the MPEG (Moving Picture Experts Group) audio coding standards. In this thesis we concern ourselves with layer III of the MPEG/audio coding scheme, specifically an integer approximation of the modified discrete cosine transform (MDCT) filterbank which forms its core.

1.1 MPEG/Audio coding

The need for an efficient and quality-preserving data compression algorithm led the International Organization for Standardization (ISO) to develop a standard for compressing digital video and audio. The Moving Picture Experts Group (MPEG) was established to cater to this need, and in November 1992 the first standard, called MPEG-1, was proposed. Later, in November 1994, an extension to the MPEG-1 standard called MPEG-2 was developed. The standard addresses the compression of synchronized audio and video at a total bit rate of about 1.5 megabits per second [1]. Unlike coders that model the vocal tract and tune themselves specifically to speech, the MPEG coder gets its data reduction by exploiting the limitations of the human ear.
The compression results from removing the perceptually redundant parts of the audio signal. Since the distortions introduced are inaudible to the human ear, the MPEG coder can compress any signal meant to be perceived by the human auditory system.

1.1.1 Features

MPEG/Audio offers a diverse assortment of compression modes and a number of useful features such as fast forwarding, audio reversing and random access. The sampling rate of the audio stream can be set at 32, 44.1 or 48 kHz. The compressed bitstream can support one or two audio channels in four possible modes [1]:
1. a monophonic mode for a single audio channel,
2. a dual-monophonic mode for two independent channels,
3. a stereo mode for stereo channels that share bits,
4. a joint stereo mode that takes advantage of the correlations between stereo channels or the irrelevancy of the phase difference between channels.

There are several predefined bitrates, ranging from 32 to 224 kilobits per second per channel. Compression ratios range from 2.7 to 24. A "free" bitrate mode supports bitrates other than the predefined ones.

MPEG/audio offers a choice of three layers of compression. Between them, the three layers provide a range of tradeoffs between codec complexity and compression rate [1]. Layer I, the simplest, is suited for bitrates of 128 kbps and upward. Layer II, which is more complex than layer I, targets bitrates around 128 kbps. Layer III has the highest codec complexity but offers the highest fidelity. All three layers are simple enough to be implemented on a small chip.

1.1.2 Applications

Based on the quality of service offered, each layer has its own applications. Layer I is used in Philips' Digital Compact Cassette (DCC) at 192 kbit/s per channel.
Applications for layer II include coding of audio for Digital Audio Broadcasting (DAB) and the storage of full-motion, synchronized audio and video on CD-ROM, more popularly called Video CD. Layer III, which offers bitrates of about 64 kbps per channel, suits audio transmission over ISDN. With the advent of cheap and powerful processors, coupled with the high bandwidth offered by cable and DSL based internet service providers, MPEG layer III compression has become the favorite mode of exchange of digital audio over the internet. The layer III algorithm, though complex when compared to layers I and II, is simple enough to be implemented on a chip. This has brought many small, hand-held, portable and inexpensive MP3 players into vogue.

1.2 The Modified Discrete Cosine Transform

The modified discrete cosine transform (MDCT) has been employed in transform coding schemes as the analysis/synthesis filterbank based upon time domain aliasing cancellation [2]. The MDCT is used in layer III of MPEG-1 and MPEG-2 to provide better spectral resolution by subdividing in frequency the sub-band outputs obtained from the filterbank of the previous layers. The MDCT can be viewed as a lapped transform in which the number of inputs N is twice the number of outputs [3]. The MPEG audio standards use two sizes of MDCT, 6 x 12 (N = 12) and 18 x 36 (N = 36).

Recently, new fast algorithms for the forward and inverse MDCT computation have been proposed [4]. They are based on DCT-II/DST-II fast algorithms and their inverses, which have real (irrational) coefficients. The algorithms can be used to compute the MDCT for data sequences of length N divisible by 4. Despite the efficiency of these fast algorithms, the internal operations require real multiplications, which are undesirable in battery-powered applications such as mobile multimedia communications.
In practice, these transforms are often approximated by using fixed-point arithmetic. However, this type of implementation does not preserve the invertibility of the transform.

1.3 Outline

As previously mentioned, layer III is the most complex of the three layers. In this thesis we concentrate on improving the efficiency of one of the main components of the layer III codec, the modified discrete cosine transform (MDCT) filterbank. We are interested in approximating the MDCT using integer or dyadic coefficients while maintaining its reversibility. The lifting scheme [5] is used to implement the orthogonal matrices. This technique has been used in approximating other orthogonal transforms such as the DCT [6, 7] and the DFT [8]. It has also been employed in some classes of lapped transforms with symmetric and anti-symmetric basis functions [9]. It should be noted that the basis functions of the MDCT are neither symmetric nor anti-symmetric, so the existing structures cannot be used. We will illustrate the possibility of a lossless implementation of the MDCT with integer coefficients using lifting steps. The size of the MDCT being considered is also problematic, since it uses 3-point and 9-point forward and inverse DCT-II/DST-II. Though some fast algorithms for computing radix-3 DCT and DST have been presented in the literature [10, 11], they are not based on orthogonal operations, which makes it difficult to control the dynamic range of the internal nodes in the transforms. In this thesis, a novel factorization of these transforms that is suitable for the integer implementation of the MDCT is presented.

The accuracy of the integer transform, compared to the original fast transform, will be analyzed. The integer transform is incorporated in a standard MPEG layer III encoder and decoder.
Based on certain subjective tests, the performance of the new codec is compared with the standard MPEG layer III codec.

Chapter 2 presents a detailed description of MPEG layer III and its components, a comparison with the other two layers, and an analysis of the role and working of the MDCT. The mathematics behind the MDCT, certain properties of the MDCT and some previous work on its fast implementation are also described. The lifting scheme, which is the technique used in the integer implementation of the MDCT, is described briefly.

Chapter 3 deals with the new integer structure for the MDCT. The orthogonal factorizations for the 3-point and 9-point discrete cosine transform of type II (DCT-II) are also described. Results obtained from simulations to determine the accuracy of the integer MDCT are presented.

The performance of the integer MDCT when incorporated in a commercial MPEG/Audio codec is discussed in chapter 4. Chapter 5 concludes and summarizes the contributions of the thesis.

CHAPTER 2
MPEG/AUDIO LAYER III AND THE MDCT

This chapter briefly describes the different algorithms of the MPEG layer III codec. The MDCT and its role in the layer III standard are discussed in detail. A fast MDCT structure and the lifting scheme, which is used in integer implementations of transforms, are also presented.

2.1 The MPEG Layer III Encoder

In this section a brief functional description of the MPEG/Audio Layer III encoder is presented. The block diagram of the encoder is shown in figure 2.1. Here the audio signal is a single-channel pulse code modulated (PCM) signal sampled at 44.1 kHz and quantized to 16 bits.
Analysis polyphase filterbank: The first block in the encoder is the polyphase filterbank, which coarsely divides the samples in frequency into 32 equally spaced subbands. 1,152 PCM samples are filtered by a filterbank consisting of 32 equal subbands, each about 689 Hz wide at this sampling rate (22.05 kHz divided by 32). This is followed by decimation by a factor of 32. Each subband will therefore contain 36 samples. Aliasing cancellation is taken care of in the decoder to achieve perfect reconstruction.

Fast Fourier Transform: Two fast Fourier transform calculations are performed simultaneously with the polyphase filterbank calculations. Both 256 and 1,024 point FFTs are performed to provide for high spectral resolution.

[Figure 2.1. MPEG layer III encoder functional block diagram.]

Modified Discrete Cosine Transform: The MDCT enhances the spectral resolution of the 36 samples obtained from each of the 32 subbands by mapping them to 36 frequency lines per subband. This results in the 1,152 PCM samples being spectrally resolved into 1,152 frequency lines. Prior to the application of the transform, windowing of the subband samples is performed. MPEG layer III stipulates four kinds of windows. The long window, which is of length 36, is applied when the samples within a subband exhibit stationary, slowly changing behavior. The short window is applied when the samples in the subband exhibit non-stationary or transient behavior. Two other windows, termed "start" and "stop" windows, are used to handle transitions between the long and short windows.
These windows provide better frequency resolution at the lower frequencies without sacrificing time resolution at the higher frequencies. The different windows are illustrated in figure 2.2. The output of the MDCT consists of 18 frequency samples obtained after applying the transform on either the long window or three overlapping short windows. The different types of windows and the sequence of windows applied to a subband are shown in the figure. The decision on which window to apply depends on the psychoacoustic model. Performing the MDCT on the long windows will produce 18 frequency lines with 50 percent overlap. The short windows will produce 3 groups of 6 frequencies, each belonging to a different time interval. Thus applying the MDCT once to the 32 subbands will produce 576 samples in frequency. 50 percent overlap will cause the MDCT to produce 1,152 frequency points when transforming the subbands.

Because the MDCT processing of a subband has good frequency resolution, it has poor time resolution. The MDCT operates on 12 or 36 polyphase filter samples, so the effective time window of audio samples involved is 12 or 36 times larger. The quantization of the MDCT values will cause the errors to spread over this large time window, which makes audible distortions in the form of pre-echo more likely. The psychoacoustic model of layer III incorporates several measures to reduce pre-echo. The MDCT is dealt with in greater detail in the next section.

[Figure 2.2. The plots of the (a) short window, (b) long window, (c) start (long to short transition) window and (d) stop (short to long transition) window.]

Alias reduction: Aliasing introduced in the analysis polyphase filterbank is removed by means of a series of butterfly computations.
This is done so as to reduce the amount of data for transmission.

Psychoacoustic modelling: The psychoacoustic model is a model of the human ear's auditory perception. It is used only in the encoder, to decide which parts of the signal can be heard by the human ear and which cannot. The psychoacoustic model decides which window type to apply before the MDCT is applied to the subband samples. This decision is based on the differences between the FFT spectra of the present samples and those of the previous samples. If the signal is stationary, there will not be much difference between the FFT spectrum calculated at the present instant and that calculated previously. If it is a fast-moving transient signal, there will be marked differences between the current and previous FFT spectra. The psychoacoustic model will then call for the application of the long to short transitional window and the short window.

The psychoacoustic model also supplies the nonuniform quantization block with information on how the frequencies obtained from the MDCT block are to be quantized. The quantization of spectral lines is adapted to the limitations of the human ear's perception of audio signals. The human ear has 24 critical bands within which it is less frequency selective. Masking is a phenomenon in which a weak signal is made inaudible by a simultaneously occurring strong signal. Masking occurs in a critical band when a dominant tonal component is present and all other frequencies in the band are not perceived properly. The dominant component introduces a masking threshold below which frequencies in the same band are masked out. This allows for coarse quantization of the masked frequencies within the band without introducing any perceivable distortion. Whenever a dominant tone is present the masking threshold is calculated.
Based on this threshold, an upper limit for the quantization level required in individual scalefactor bands is determined. Layer III specifies two different models that can be used with the encoder.

Nonuniform quantization: The nonlinear quantization of the spectral lines is performed in this block. Nonlinearity is introduced by first raising each sample to the power of 3/4. In order to reduce quantization noise, a scaling of the spectral coefficients in each scalefactor band is performed prior to the nonuniform quantization. Hence the output consists of quantized spectral lines and scalefactors for each band. Grouping and scaling of frequencies into scalefactor bands is performed in accordance with the psychoacoustic model.

Huffman encoding and coding of side information: The scaled frequency lines are Huffman coded using 32 stationary Huffman tables. It is here that the majority of the data reduction takes place. In order for the decoder to reproduce the audio signal successfully, all the parameters used and generated by the encoder must be provided. The side information consists of the boundaries of certain data blocks, quantizer step sizes, etc.

Bitstream formatting, CRC word generation and ancillary data: The Huffman coded spectral lines, side information and a frame header are assembled to form the bitstream. The bitstream is partitioned into frames representing 1,152 PCM samples. An optional CRC (Cyclic Redundancy Check) can be included for data validation. Ancillary data is an optional feature used for items like the artist's name, album or music category.

Stereo encoding: The encoder model presented so far is applicable only to single-channel audio. Encoding dual channels or stereo audio channels is achieved by time sharing the model described above.
It does not introduce extra complexity because the two channels are encoded independently. The two stereo redundancy modes supported by layer III are Middle/Side (MS) stereo and intensity stereo. The first method transmits the sum and the difference of the two channels. The two new channels are transmitted as described for a dual channel. Intensity stereo requires only one channel and transmits the sum of the two audio signals.

2.2 The MPEG Layer III Decoder

A functional description of the MPEG layer III decoder is presented in this section. Figure 2.3 shows the block diagram of the layer III decoder. The decoder does not have the psychoacoustic model block. As is evident from the figure, the blocks function as inverses of the ones presented in the encoder. Based on the CRC data and the header information, the integrity of the bitstream is verified. The ancillary data is decoded. The Huffman decoding block performs the decoding of the Huffman coded bits. The code words in the Huffman coded bits are not separated in any way. Therefore a codeword from the middle of the bitstream cannot be identified without starting from the initial codeword. The Huffman info decoding block is a controller of the Huffman decoding block; all parameters required for the correct decoding of the Huffman coded bits are set up by this block. The scalefactor decoding block decodes the coded scalefactors. The decoded scalefactors are used in descaling. Descaling establishes a perceptually identical copy of the frequency lines generated by the MDCT block in the encoder. The descaled frequency lines are then reordered. The stereo processing block converts the encoded stereo signal into left and right channels. It was mentioned previously that aliasing reduction was applied in the MDCT block of the encoder.
In order to obtain the correct reconstruction of the audio signals in the next few blocks, the aliasing artifacts have to be added again. This is done by eight butterfly calculations for each subband. The frequency lines from the alias reduction block are mapped onto polyphase filter samples in the IMDCT block. The expression for the IMDCT is presented in the next section in equation (2.4).

[Figure 2.3. MPEG layer III decoder functional block diagram.]

Depending on the block type, the output of the IMDCT block is multiplied with one of the following windows:

Block Type = 0 (long block)
\[ w_i = \sin\left(\frac{\pi}{36}\left(i + \frac{1}{2}\right)\right), \quad i = 0 \text{ to } 35 \tag{2.1} \]

Block Type = 1 (long to short transition)
\[ w_i = \begin{cases}
\sin\left(\frac{\pi}{36}\left(i + \frac{1}{2}\right)\right), & i = 0 \text{ to } 17 \\
1, & i = 18 \text{ to } 23 \\
\sin\left(\frac{\pi}{12}\left(i - 18 + \frac{1}{2}\right)\right), & i = 24 \text{ to } 29 \\
0, & i = 30 \text{ to } 35
\end{cases} \]

Block Type = 2 (short block)
\[ w_i = \sin\left(\frac{\pi}{12}\left(i + \frac{1}{2}\right)\right), \quad i = 0 \text{ to } 11 \tag{2.2} \]

Block Type = 3 (short to long transition)
\[ w_i = \begin{cases}
0, & i = 0 \text{ to } 5 \\
\sin\left(\frac{\pi}{12}\left(i - 6 + \frac{1}{2}\right)\right), & i = 6 \text{ to } 11 \\
1, & i = 12 \text{ to } 17 \\
\sin\left(\frac{\pi}{36}\left(i + \frac{1}{2}\right)\right), & i = 18 \text{ to } 35
\end{cases} \]

When the short block window is applied, the windowed short blocks are overlapped and concatenated. In order to compensate for the frequency inversion in the synthesis filterbank, every odd time sample of every odd subband is multiplied by -1. Each time one sample from each of the 32 subbands has been calculated, the samples are applied to the synthesis filterbank and 32 consecutive audio samples are calculated.
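The four block-type windows listed above can be sketched in Python as follows. This is only an illustration under the usual reading of the piecewise definitions; the function name `synthesis_window` and the helper names are hypothetical and do not come from the thesis or from any codec library.

```python
import math


def synthesis_window(block_type):
    """Return the IMDCT output window for block types 0-3 as a list of floats."""

    def long_sine(i):
        # Sine lobe spanning 36 samples (long block).
        return math.sin(math.pi / 36 * (i + 0.5))

    def short_sine(i):
        # Sine lobe spanning 12 samples (short block).
        return math.sin(math.pi / 12 * (i + 0.5))

    if block_type == 0:   # long block
        return [long_sine(i) for i in range(36)]
    if block_type == 1:   # long to short transition ("start")
        return ([long_sine(i) for i in range(18)]
                + [1.0] * 6
                + [short_sine(i - 18) for i in range(24, 30)]
                + [0.0] * 6)
    if block_type == 2:   # short block
        return [short_sine(i) for i in range(12)]
    if block_type == 3:   # short to long transition ("stop")
        return ([0.0] * 6
                + [short_sine(i - 6) for i in range(6, 12)]
                + [1.0] * 6
                + [long_sine(i) for i in range(18, 36)])
    raise ValueError("block_type must be 0, 1, 2 or 3")


start, stop = synthesis_window(1), synthesis_window(3)
# The stop window is the time reverse of the start window.
mirrored = all(abs(stop[i] - start[35 - i]) < 1e-12 for i in range(36))
```

Written this way, it is easy to see that the start window keeps the rising half of the long window and ends with the falling half of a short window, while the stop window is its mirror image.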
2.3 The Modified Discrete Cosine Transform

The Modified Discrete Cosine Transform is an orthogonal lapped transform used in Layer III of MPEG/Audio to provide better spectral resolution by subdividing in frequency the sub-band outputs obtained from the multirate filterbank. The MDCT and its inverse (IMDCT) for a windowed input sequence are defined as [2]:
\[ C_k = \sum_{n=0}^{N-1} x_n \cos\left[\frac{\pi\left(2n + 1 + \frac{N}{2}\right)(2k + 1)}{2N}\right] \tag{2.3} \]
\[ \hat{x}_n = \sum_{k=0}^{N/2-1} C_k \cos\left[\frac{\pi\left(2n + 1 + \frac{N}{2}\right)(2k + 1)}{2N}\right] \tag{2.4} \]
where $k = 0, 1, \dots, N/2 - 1$ and $n = 0, 1, \dots, N - 1$. The MDCT coefficients possess an even antisymmetry property, so only $N/2$ distinct MDCT coefficients $C_k$ are obtained from $N$ input samples. The input window, which is of length $N$, is shifted by $N/2$ samples to obtain the next set of coefficients. Each shift of $N/2$ samples therefore produces $N/2$ coefficients, and the length of the data remains the same after transformation. The notation $\hat{x}_n$ denotes that the recovered data sequence is not the same as the original input vector; it is said to be time aliased. The data vector is reconstructed by overlapping and adding the IMDCT outputs $\hat{x}_n$ of successive blocks.

Equations (2.3) and (2.4) are applicable to a windowed input signal. For perfect reconstruction, the window $h(n)$ must comply with two constraints [12]:
\[ h(N - 1 - n) = h(n), \quad n = 0, 1, 2, \dots, N - 1 \tag{2.5} \]
\[ h^2(n) + h^2\left(n + \frac{N}{2}\right) = 1, \quad n = 0, 1, 2, \dots, \frac{N}{2} - 1 \tag{2.6} \]
Possible choices for a window matching these constraints are the rectangular window, which is constant over $n = 0, 1, \dots, N - 1$ (the constant must be $h(n) = 1/\sqrt{2}$ for (2.6) to hold), and the sine window, which is given by:
\[ h(n) = \sin\left[\frac{\pi}{N}\left(n + \frac{1}{2}\right)\right], \quad n = 0, 1, 2, \dots, N - 1. \tag{2.7} \]
The window is applied twice, once before the forward transform and once again after the inverse transform, which is why the condition specified by equation (2.6) involves the square of the window.
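The time domain aliasing cancellation described above can be checked numerically. The sketch below evaluates equations (2.3) and (2.4) directly, applies the sine window of equation (2.7) before and after the transform, and overlap-adds blocks shifted by $N/2$. The scale factor $4/N$ is an assumption made here, since the definitions carry no normalization; the function names are hypothetical.

```python
import math

def sine_window(N):
    # Sine window of equation (2.7).
    return [math.sin(math.pi / N * (n + 0.5)) for n in range(N)]

def mdct(x):
    # Direct evaluation of equation (2.3): N inputs, N/2 coefficients.
    N = len(x)
    return [sum(x[n] * math.cos(math.pi * (2 * n + 1 + N / 2) * (2 * k + 1) / (2 * N))
                for n in range(N))
            for k in range(N // 2)]

def imdct(C):
    # Direct evaluation of equation (2.4): N/2 coefficients, N aliased outputs.
    N = 2 * len(C)
    return [sum(C[k] * math.cos(math.pi * (2 * n + 1 + N / 2) * (2 * k + 1) / (2 * N))
                for k in range(N // 2))
            for n in range(N)]

def tdac_roundtrip(signal, N):
    """Window, transform, inverse transform, window again and overlap-add.

    The factor 4/N is an assumed normalization for the unnormalized
    transform pair above; it is not taken from the text.
    """
    h = sine_window(N)
    hop = N // 2
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - N + 1, hop):
        block = [h[n] * signal[start + n] for n in range(N)]
        y = imdct(mdct(block))
        for n in range(N):
            out[start + n] += (4.0 / N) * h[n] * y[n]
    return out
```

Only samples covered by two overlapping blocks are reconstructed exactly; the first and last $N/2$ samples of the signal keep their aliasing because they have no overlapping partner block.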
If $N - k - 1$ is substituted for $k$ in equation (2.3), and the MDCT coefficient $C_k$ is from here on denoted $Z_k$, we get
\[ Z_{N-k-1} = -\sum_{n=0}^{N-1} x_n \cos\left[\frac{\pi\left(2n + 1 + \frac{N}{2}\right)(2k + 1)}{2N}\right] \tag{2.8} \]
where $n = 0, 1, 2, \dots, N - 1$ and $k = 0, 1, 2, \dots, \frac{N}{2} - 1$. The MDCT thus exhibits the even antisymmetry property, given by
\[ Z_{N-k-1} = -Z_k, \quad k = 0, 1, 2, \dots, \frac{N}{2} - 1. \tag{2.9} \]

2.4 A fast structure for the MDCT

According to equation (2.3) the MDCT can be written in the form
\[ Z_k = \sum_{n=0}^{N-1} x_n \cos\left[\frac{\pi(2n + 1)(2k + 1)}{2N} + \frac{\pi(2k + 1)}{4}\right], \quad k = 0, 1, 2, \dots, N - 1. \tag{2.10} \]
The even antisymmetry property allows for the computation of only one half of the coefficients [13]. Therefore
\[ Z_{2k} = \sum_{n=0}^{N-1} x_n \cos\left[\frac{\pi(2n + 1)(4k + 1)}{2N} + \frac{\pi(4k + 1)}{4}\right], \quad k = 0, 1, 2, \dots, \frac{N}{2} - 1. \tag{2.11} \]
This can be rewritten as
\[ Z_{2k} = (-1)^k \frac{\sqrt{2}}{2} \sum_{n=0}^{N-1} x_n \left[ \cos\frac{\pi(2n + 1)(4k + 1)}{2N} - \sin\frac{\pi(2n + 1)(4k + 1)}{2N} \right] \tag{2.12} \]
where $k = 0, 1, \dots, N/2 - 1$.

Symmetry of the sine and cosine kernels provides further reduction. Substituting $N - n - 1$ for $n$ in equation (2.12), for $n = 0, 1, 2, \dots, N/2 - 1$, and using trigonometric identities, it can be written in the form
\[ Z_{2k} = (-1)^k \frac{\sqrt{2}}{2} \sum_{n=0}^{N/2-1} \left[ x'_n \cos\frac{\pi(2n + 1)(4k + 1)}{2N} - x''_n \sin\frac{\pi(2n + 1)(4k + 1)}{2N} \right], \tag{2.13} \]
where $k = 0, 1, \dots, N/2 - 1$, and where
\[ x'_n = x_n - x_{N-n-1}, \tag{2.14} \]
\[ x''_n = x_n + x_{N-n-1}, \tag{2.15} \]
for $n = 0, 1, 2, \dots, N/2 - 1$. Using the symmetry of the sine and cosine kernels once more, and substituting $\frac{N}{2} - n - 1$ for $n$ in equation (2.13), for $n = 0, 1, \dots, \frac{N}{4} - 1$, the second reduction is achieved as
\[ Z_{2k} = (-1)^k \frac{\sqrt{2}}{2} \sum_{n=0}^{N/4-1} \left[ \left(x'_n - x''_{\frac{N}{2}-n-1}\right) \cos\frac{\pi(2n + 1)(4k + 1)}{2N} - \left(x''_n - x'_{\frac{N}{2}-n-1}\right) \sin\frac{\pi(2n + 1)(4k + 1)}{2N} \right] \tag{2.16} \]
where $k = 0, 1, \dots, N/2 - 1$. The sine and cosine kernels are expanded as follows:
\[ \begin{aligned}
\cos\frac{\pi(2n + 1)(4k + 1)}{2N} &= \cos\frac{\pi(2n + 1)}{2N} \cos\frac{\pi(2n + 1)k}{N/2} - \sin\frac{\pi(2n + 1)}{2N} \sin\frac{\pi(2n + 1)k}{N/2}, \\
\sin\frac{\pi(2n + 1)(4k + 1)}{2N} &= \sin\frac{\pi(2n + 1)}{2N} \cos\frac{\pi(2n + 1)k}{N/2} + \cos\frac{\pi(2n + 1)}{2N} \sin\frac{\pi(2n + 1)k}{N/2}.
\end{aligned} \tag{2.17} \]
After their substitution in equation (2.16) and a few manipulations we get
\[ Z_{2k} = (-1)^k \frac{\sqrt{2}}{2} \sum_{n=0}^{N/4-1} \left[ a_n \cos\frac{\pi(2n + 1)k}{2(N/4)} - b_n \sin\frac{\pi(2n + 1)k}{2(N/4)} \right], \quad k = 0, 1, \dots, N/2 - 1, \tag{2.18} \]
where
\[ \begin{aligned}
a_n &= \left(x'_n - x''_{\frac{N}{2}-n-1}\right) \cos\frac{\pi(2n + 1)}{2N} - \left(x''_n - x'_{\frac{N}{2}-n-1}\right) \sin\frac{\pi(2n + 1)}{2N}, \\
b_n &= \left(x'_n - x''_{\frac{N}{2}-n-1}\right) \sin\frac{\pi(2n + 1)}{2N} + \left(x''_n - x'_{\frac{N}{2}-n-1}\right) \cos\frac{\pi(2n + 1)}{2N},
\end{aligned} \tag{2.19} \]
for $n = 0, 1, 2, \dots, N/4 - 1$. These are recognized as plane rotations. The sine and cosine kernels are recognized as those of the $\frac{N}{4}$-point DCT and DST of type II. Finally, substituting $k + \frac{N}{4}$ for $k$ in equation (2.16) and using the trigonometric identities
\[ \begin{aligned}
\cos\frac{\pi(2n + 1)(4k + 1 + N)}{2N} &= (-1)^{n+1} \sin\frac{\pi(2n + 1)(4k + 1)}{2N}, \\
\sin\frac{\pi(2n + 1)(4k + 1 + N)}{2N} &= (-1)^{n} \cos\frac{\pi(2n + 1)(4k + 1)}{2N},
\end{aligned} \tag{2.20} \]
and after some manipulations, the complete formula for the fast MDCT is obtained as
\[ \begin{aligned}
Z_{2k} &= (-1)^k \frac{\sqrt{2}}{2} \sum_{n=0}^{N/4-1} \left[ a_n \cos\frac{\pi(2n + 1)k}{2(N/4)} - b_n \sin\frac{\pi(2n + 1)k}{2(N/4)} \right], \\
Z_{2k+\frac{N}{2}} &= (-1)^{k+\frac{N}{4}} \frac{\sqrt{2}}{2} \sum_{n=0}^{N/4-1} (-1)^{n+1} \left[ a_n \sin\frac{\pi(2n + 1)k}{2(N/4)} + b_n \cos\frac{\pi(2n + 1)k}{2(N/4)} \right],
\end{aligned} \tag{2.21} \]
for $k = 0, 1, 2, \dots, N/4 - 1$. The coefficients with index $2k + \frac{N}{2}$ lie in the upper half of the index range and give the odd-indexed coefficients of the lower half through the antisymmetry property (2.9).

Detailed analysis of the computation of the fast algorithm results in a simple signal flow graph. The inverse modified discrete cosine transform (IMDCT) structure is obtained from the MDCT structure by simply reversing the flow in the signal flowgraph. The fast 12-point MDCT/IMDCT structure is shown in figure 2.4, whereas that used in the fast 36-point MDCT/IMDCT is shown in figure 2.5. The rotations block and the DCT/DST block are composed of rotations as shown in figures 2.4 and 2.5. The coefficients used in the rotations are called butterfly coefficients and their values for the 12-point MDCT are listed in table 2.1.
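The chain of reductions (2.14) to (2.21) can be verified against the direct definition (2.3). The following is a minimal sketch with hypothetical function names: it forms the butterflies $x'_n$ and $x''_n$, the plane rotations $a_n$ and $b_n$, evaluates the two sums of (2.21) directly rather than through fast DCT-II/DST-II blocks, and places the upper-half results using the antisymmetry $Z_{N-k-1} = -Z_k$. It assumes $N$ is a multiple of 4, as is the case for $N = 12$ and $N = 36$.

```python
import math

def mdct_direct(x):
    # Equation (2.3) evaluated term by term.
    N = len(x)
    return [sum(x[n] * math.cos(math.pi * (2 * n + 1 + N / 2) * (2 * k + 1) / (2 * N))
                for n in range(N))
            for k in range(N // 2)]

def mdct_fast(x):
    """Fast MDCT following equations (2.14) to (2.21); N must be a multiple of 4."""
    N = len(x)
    M = N // 4
    xp = [x[n] - x[N - 1 - n] for n in range(N // 2)]    # x'_n,  equation (2.14)
    xpp = [x[n] + x[N - 1 - n] for n in range(N // 2)]   # x''_n, equation (2.15)
    a, b = [], []
    for n in range(M):                                   # plane rotations, equation (2.19)
        u = xp[n] - xpp[N // 2 - 1 - n]
        v = xpp[n] - xp[N // 2 - 1 - n]
        beta = math.pi * (2 * n + 1) / (2 * N)
        a.append(u * math.cos(beta) - v * math.sin(beta))
        b.append(u * math.sin(beta) + v * math.cos(beta))
    r = math.sqrt(2) / 2
    Z = [0.0] * (N // 2)
    for k in range(M):                                   # equation (2.21)
        g = [math.pi * (2 * n + 1) * k / (2 * M) for n in range(M)]
        low = (-1) ** k * r * sum(a[n] * math.cos(g[n]) - b[n] * math.sin(g[n])
                                  for n in range(M))
        high = (-1) ** (k + M) * r * sum((-1) ** (n + 1)
                                         * (a[n] * math.sin(g[n]) + b[n] * math.cos(g[n]))
                                         for n in range(M))
        Z[2 * k] = low                 # even-indexed coefficient Z_{2k}
        Z[N // 2 - 1 - 2 * k] = -high  # Z_{2k+N/2} mapped back with (2.9)
    return Z
```

For example, `mdct_fast(x)` and `mdct_direct(x)` should agree to rounding error for any 12-sample or 36-sample input `x`.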
The values of the butterfly coefficients of the rotations block of the $N$-point MDCT are well defined and are given by [4]:
\[ c_i = \cos\frac{(N/2 - 1 - 2i)\pi}{2N}, \qquad s_i = \sin\frac{(N/2 - 1 - 2i)\pi}{2N}, \]
where $i = 0, 1, \dots, N/4 - 1$. The 3-point and 9-point DCT-II structures presented in [4] have the disadvantage that they are not orthonormally factorized. The orthonormal factorization of the 3-point DCT-II is trivial and is shown in the 3-point DCT-II block in figure 2.4. However, the factorizations for the DCT-II and DST-II blocks of the 36-point MDCT remain unknown. A novel structure for the 9-point DCT-II/DST-II is presented in the next chapter.

Table 2.1. Butterfly coefficients of the 12-point MDCT shown in figure 2.4.
  $c_1 = \cos\frac{5\pi}{24}$        $s_1 = \sin\frac{5\pi}{24}$
  $c_2 = \cos\frac{3\pi}{24}$        $s_2 = \sin\frac{3\pi}{24}$
  $c_3 = \cos\frac{\pi}{24}$         $s_3 = \sin\frac{\pi}{24}$
  $c_4, c_6 = \frac{1}{\sqrt{2}}$    $s_4, s_6 = \frac{1}{\sqrt{2}}$
  $c_5, c_7 = \frac{1}{\sqrt{3}}$    $s_5, s_7 = \sqrt{\frac{2}{3}}$

Figure 2.4. Fast structure for the 12-point MDCT/IMDCT with rotations implemented using butterflies.

Figure 2.5. Fast structure for the 36-point MDCT/IMDCT with rotations implemented using butterflies.

2.5 The Lifting scheme

One of the methods of evaluating a concept in mathematics is to reduce it into simpler steps. Polynomials are factorized into monomials, numbers into their prime factors. In this section we study the lifting scheme, which we use to factorize the butterflies built from these coefficients. Such a factorization has certain singular advantages.
The rotations in the fast implementation can be represented by an orthonormal matrix as:
\[ R = \begin{bmatrix} c & -s \\ s & c \end{bmatrix}, \quad \text{where } c = \cos\theta \text{ and } s = \sin\theta. \]
Figure 2.6 (a) shows a block diagram for computing $R$ and its inverse. This matrix $R$ can be factorized into a product of upper and lower triangular matrices as follows:
\[ R = \begin{bmatrix} 1 & \frac{c - 1}{s} \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ s & 1 \end{bmatrix} \begin{bmatrix} 1 & \frac{c - 1}{s} \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & a \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ b & 1 \end{bmatrix} \begin{bmatrix} 1 & g \\ 0 & 1 \end{bmatrix}, \]
where
\[ a = \frac{c - 1}{s}, \quad b = s, \quad \text{and} \quad g = \frac{c - 1}{s}. \]
The coefficients $a$, $b$ and $g$ are called lifting coefficients in this thesis. Figure 2.6 (b) shows a flowgraph of the forward and inverse lifting structures of $R$.

Figure 2.6. Forward and inverse (a) butterflies and (b) lifting steps.

The advantage of lifting is that the quantization introduced in the forward structure can be cancelled out in the inverse structure, making the structure invertible. Consider a signal $x_1$ flowing through the top branch of the forward lifting structure, and let the signal $x_2$ flow through the lower branch. The signals $x_1$ and $x_2$ are quantized to a fixed input word length, the lifting coefficients to $N_c$ bits and the internal nodes to $N_i$ bits, where $N_i$ is much larger than both the input word length and $N_c$. The output at the top branch node following the coefficient $a$ is
\[ y_1 = x_1 + a x_2 + Q_i + Q_a \]
where $Q_i$ and $Q_a$ are the quantization errors introduced by the lifting coefficient and at the node. In the inverse structure the same quantized product is subtracted, so the output at the top branch node following the coefficient $a$ is
\[ y'_1 = y_1 - a x_2 - Q_i - Q_a = x_1. \]
Thus any quantization error introduced in the forward structure is cancelled out by the inverse. This property of the lifting scheme is especially useful for mobile devices: using fewer bits to quantize the lifting coefficients saves battery power. Moreover, nonlinear operations can be used in the lifting steps without violating invertibility, as long as the same operations are used in the inverse.
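The cancellation of quantization error can be illustrated with integer signals and rounded lifting products. The following is a minimal sketch, assuming rounding to the nearest integer as the quantizer at each node and an optional quantization of the coefficients to `nc_bits` fractional bits; the function names and the choice of rounding are illustrative assumptions, not the thesis implementation.

```python
import math

def lifting_coeffs(theta, nc_bits=None):
    """Lifting coefficients a, b, g of the rotation by theta.

    If nc_bits is given, each coefficient is rounded to a multiple
    of 2**-nc_bits (an assumed quantization rule).
    """
    c, s = math.cos(theta), math.sin(theta)
    a, b, g = (c - 1) / s, s, (c - 1) / s
    if nc_bits is not None:
        scale = 1 << nc_bits
        a, b, g = (round(v * scale) / scale for v in (a, b, g))
    return a, b, g

def lifting_forward(x1, x2, a, b, g):
    # R = [1 a; 0 1][1 0; b 1][1 g; 0 1]; the rightmost factor acts first.
    x1 = x1 + round(g * x2)
    x2 = x2 + round(b * x1)
    x1 = x1 + round(a * x2)
    return x1, x2

def lifting_inverse(y1, y2, a, b, g):
    # Undo the three steps in reverse order, subtracting the same rounded products.
    y1 = y1 - round(a * y2)
    y2 = y2 - round(b * y1)
    y1 = y1 - round(g * y2)
    return y1, y2
```

Because each inverse step recomputes exactly the rounded product that the forward step added, `lifting_inverse(*lifting_forward(x1, x2, a, b, g), a, b, g)` returns `(x1, x2)` for any integers, however coarsely the coefficients are quantized.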
Assuming that the resolution of the lifting coefficient is sufficiently high, it can be shown that the lifting implementation of each coefficient increases the resolution of the input by at most one bit [8]. Thus the dynamic range at the internal nodes is an important factor. The number of bits at the nodes, $N_i$, has to be at least this dynamic range. The lifting conversion reduces the number of multiplications needed to compute $R$ from four to three, although the number of additions is increased from two to three. This is a definite advantage since multiplications are computationally more expensive.

Chapter 3 discusses the integer implementation of the fast MDCT structure using lifting.

CHAPTER 3

A FAST, INTEGER STRUCTURE FOR THE MDCT

In this chapter we discuss a fast, integer MDCT structure for the sizes 12 and 36 points used in the MPEG layer III codec. We also present a novel 9-point orthonormally factorized DCT-II/DST-II structure which is used in the 36-point MDCT structure. We discuss the merits of the new MDCT structures and assess the accuracy of the forward integer MDCT when compared with the fast structure presented in the previous chapter.

3.1 Fast 12-point Integer MDCT structure

The fast structure shown in figure 2.4 uses butterflies to implement the rotations and the DCT-II/DST-II. The conversion of butterflies to lifting steps was shown in section 2.5. By converting all of the butterflies in figure 2.4 to lifting, the fast, integer structure can be obtained as shown in figure 3.1. The lifting coefficients are listed in table 3.1.

3.2 Fast 36-point Integer MDCT structure

The 3-point DCT-II/DST-II structure which is a part of the 12-point MDCT shown in figure 2.4 is trivial to factorize since it uses only two rotation angles.
By contrast, the structure of the DCT-II/DST-II for larger dimensions is far more difficult to derive. In particular, the orthogonal factorization of the 9-point DCT-II is not obvious. The integer structure obtained by converting the butterflies into lifting is shown in figure 3.2. The lifting coefficients for the rotations block in the 36-point MDCT in figure 3.2 are given by
\[ a_i = \frac{\cos\gamma_i - 1}{\sin\gamma_i}, \quad b_i = \sin\gamma_i, \quad g_i = \frac{\cos\gamma_i - 1}{\sin\gamma_i}, \]
where $\gamma_i = \frac{(N/2 + 1 - 2i)\pi}{2N}$ and $i = 1, 2, 3, \dots, 9$.

Figure 3.1. Fast structure for the 12-point Integer MDCT/IMDCT with rotations implemented using lifting.

Table 3.1. Lifting coefficients used in the 12-point Integer MDCT, figure 3.1.
  $a_1 = g_1 = \frac{\cos\frac{5\pi}{24} - 1}{\sin\frac{5\pi}{24}}$      $b_1 = \sin\frac{5\pi}{24}$
  $a_2 = g_2 = \frac{\cos\frac{3\pi}{24} - 1}{\sin\frac{3\pi}{24}}$      $b_2 = \sin\frac{3\pi}{24}$
  $a_3 = g_3 = \frac{\cos\frac{\pi}{24} - 1}{\sin\frac{\pi}{24}}$        $b_3 = \sin\frac{\pi}{24}$
  $a_4, a_6 = g_4, g_6 = \frac{\frac{1}{\sqrt{2}} - 1}{\frac{1}{\sqrt{2}}} = 1 - \sqrt{2}$      $b_4, b_6 = \frac{1}{\sqrt{2}}$
  $a_5, a_7 = g_5, g_7 = \frac{\frac{1}{\sqrt{3}} - 1}{\sqrt{\frac{2}{3}}} = \frac{1 - \sqrt{3}}{\sqrt{2}}$      $b_5, b_7 = \sqrt{\frac{2}{3}}$

3.2.1 Orthonormal Factorization of the 9-point DCT of type II

In this section, the factorization of the 9-point DCT-II and DST-II using only orthonormal rotations is presented. Since the DST-II can be obtained from the DCT-II by negating the odd inputs, only the factorization of the DCT part will be presented. The calculation of the 9-point DCT-II can be separated into two parts, each with approximately one-fourth of the computational complexity, via the DCT-V and DCT-VII [14]. The $N$-point DCT-II, V and VII are defined as [14]:
\[ Z_k^{II} = b_k \sum_{n=0}^{N-1} x_n \cos\left[\left(n + \frac{1}{2}\right)\frac{k\pi}{N}\right] \tag{3.1} \]
\[ Z_k^{V} = b_k \sum_{n=0}^{N-1} x_n \cos\left[\left(n + \frac{1}{2}\right)\frac{k\pi}{N - \frac{1}{2}}\right] \tag{3.2} \]
\[ Z_k^{VII} = \sum_{n=0}^{N-1} x_n \cos\left[\left(n + \frac{1}{2}\right)\left(k + \frac{1}{2}\right)\frac{\pi}{N + \frac{1}{2}}\right] \tag{3.3} \]
where $0 \le k, n \le N - 1$ and $b_k = 1 - (1 - 1/\sqrt{2})\,\delta[k]$. It is clear that, when $N$ is odd, the $N$-point DCT-II can be calculated from the $\frac{N+1}{2}$-point DCT-V and the $\frac{N-1}{2}$-point DCT-VII as follows. From (3.1), substituting $2k$ for $k$ yields
\[ Z_{2k}^{II} = b_k \sum_{n=0}^{\frac{N-1}{2}} \{x_n + x_{N-1-n}\} \cos\left[\left(n + \frac{1}{2}\right)\frac{k\pi}{N/2}\right] \tag{3.4} \]
for $0 \le k \le \frac{N-1}{2}$, where the middle sample $n = \frac{N-1}{2}$ pairs with itself and is counted once. This is equivalent to the $\frac{N+1}{2}$-point DCT-V of $x_n + x_{N-1-n}$. Similarly, substituting $2k + 1$ for $k$ in (3.1) yields
\[ Z_{2k+1}^{II} = \sum_{n=0}^{\frac{N-3}{2}} \{x_n - x_{N-1-n}\} \cos\left[\left(n + \frac{1}{2}\right)\left(k + \frac{1}{2}\right)\frac{\pi}{N/2}\right] \tag{3.5} \]
for $0 \le k \le \frac{N-3}{2}$, which is equivalent to the $\frac{N-1}{2}$-point DCT-VII of $x_n - x_{N-1-n}$. It is easy to see that the computational complexity, commonly measured by the number of multiplications, is approximately reduced by half for large $N$.

Figure 3.3 illustrates the calculation of the 9-point DCT-II via the 5-point DCT-V and 4-point DCT-VII. It also shows the orthonormal factorization of the two reduced-size matrices. The 9-point DST-II is obtained by negating the odd inputs, as noted above. Table 3.2 summarizes the values of the butterfly coefficients used in the 9-point DCT-II. Figure 3.4 shows the integer structure for the 9-point DCT-II. The lifting coefficients used are tabulated in table 3.3.

Figure 3.2. Fast, integer structure for the 36-point Integer MDCT/IMDCT with rotations implemented using lifting.

Figure 3.3. Calculation of the 9-point DCT-II via the 5-point DCT-V and the 4-point DCT-VII.

Table 3.2. Butterfly coefficients of the 9-point DCT-II block shown in figure 3.3. The coefficients are tabulated in the groups $(c_1, c_2, c_3, c_8)$, $(c_9, c_{10}, c_{11}, c_{15})$, $(c_4, c_7, c_{12}, c_{14})$, $c_5$, $c_6$ and $c_{13}$, each with the matching sine coefficients $s_i$.

Table 3.3. Lifting coefficients $a_i$, $b_i$ and $g_i$, $i = 1, \dots, 15$, used in the 9-point DCT-II/DST-II block, figure 3.4.

Figure 3.4. Integer structure for the 9-point DCT-II.

3.3 Accuracy of the forward transform

In our implementation, we convert the butterflies in the 12-point and 36-point MDCT structures to lifting units. The lifting coefficients are then quantized to a finite number of bits $N_c$, which ranges from 1 to 20 bits. In the evaluation, the input signal is quantized to 8 bits, the product between the signal through the lifting branch and the lifting coefficient is quantized to 25 bits, and the internal nodes are set at 25 bits. By sending the input through the forward fast MDCT transform, the output of the MDCT is obtained. By sending the same input through the forward fast integer transform, an approximation of the MDCT output is obtained. The error of the integer MDCT is obtained by subtracting the two outputs. The accuracy of the integer MDCT compared with the ideal MDCT is presented in figure 3.5 (a) and (b). From the figures, it is clear that as the number of bits $N_c$ allocated to the lifting coefficients is increased, the accuracy of the integer transform increases approximately linearly with $N_c$ for both sizes of the transform. The accuracy improves by approximately 6 dB for each additional bit in the multiplication coefficients.

3.3.1 Computational Complexity

An integer multiplication is equivalent to shifting the multiplicand to the left by different numbers of bits and summing these shifted versions. An algorithm for calculating the minimum number of adders for a given integer multiplication is presented in [15]. Based on this algorithm we have obtained the minimum number of adders for the multiplications involved in the 12-point and 36-point MDCT for coefficient bits $N_c$ varying from 1 to 8. The graphs so obtained are shown in figure 3.6 (a) and (b). It is seen that the minimum number of adders required for the implementation of the integer MDCT increases linearly with the resolution of the coefficient bits.

In chapter 4, the implementation of the integer MDCT structure in a commercial MPEG layer III codec is discussed.
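The shift-and-add view of integer multiplication in section 3.3.1 can be sketched as follows. This is an illustrative example only: it uses the plain binary expansion of the quantized coefficient, which gives an upper bound on the adder count, and it is not the minimum-adder algorithm of [15]. The function names and the rounding rule used to quantize a coefficient to $N_c$ fractional bits are assumptions.

```python
def quantize_coefficient(value, nc_bits):
    # Integer numerator of value rounded to a multiple of 2**-nc_bits (assumed rule).
    return round(value * (1 << nc_bits))

def shift_add_multiply(x, numerator, nc_bits):
    """Compute floor(x * numerator / 2**nc_bits) using only shifts and additions."""
    sign = -1 if numerator < 0 else 1
    m = abs(numerator)
    acc = 0
    shift = 0
    while m:
        if m & 1:
            acc += x << shift   # add one shifted copy of the multiplicand
        m >>= 1
        shift += 1
    return (sign * acc) >> nc_bits

def adders_needed(numerator):
    # Additions used by the plain binary expansion: number of set bits minus one.
    return max(bin(abs(numerator)).count("1") - 1, 0)
```

For instance, a coefficient quantized to the numerator 5 (binary 101) over $2^3$ needs two shifted copies of the input and therefore one adder, while the numerator 7 (binary 111) needs two adders in this plain binary form.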
Figure 3.5. Mean square error of the integer MDCT versus the number of coefficient bits $N_c$: (a) $N = 12$ and (b) $N = 36$.

Figure 3.6. Number of coefficient bits versus the minimum number of binary adders for (a) $N = 12$ and (b) $N = 36$.

CHAPTER 4

THE IMPLEMENTATION OF THE INTEGER MDCT IN A STANDARD MPEG LAYER III CODEC

Commercial MPEG Layer III codecs implement the MDCT either by straightforward computation of equation (2.3) or by faster techniques such as that represented by equation (2.21). Either way the implementation involves numerous floating point additions and multiplications, resulting in loss of information due to quantization error. The fast integer MDCT structure presented in the previous chapter is implemented in a standard MPEG layer III codec. Figures 4.1 and 4.2 show the modifications to the encoder and the decoder block diagrams. We assess the performance of the integer MDCT based encoder and decoder by quantizing the lifting coefficients and increasing the number of bits used until we achieve satisfactory performance, as good as or better than the original floating point transform. The lifting coefficients are quantized to $N_c$ bits. Since addition is carried out in the node, after addition the resolution at the node will be at least $N_c + 1$. Therefore the nodes are quantized at $N_i$ bits, $N_i$ being much greater than $N_c$. In the simulation $N_i$ is set at 25 bits. $N_c$ is varied from 1 to 8 bits.
There are different criteria for assessing the performance of the integer MDCT based encoder or decoder. The integer MDCT based encoder should be capable of producing a bitstream which can be decoded by any standard decoder. Moreover, the decoded audio must be perceptually as good as or better than that produced by the original floating point encoder/decoder combination. Similarly, the integer MDCT based decoder should be able to produce sound that is perceptually as good as that of the standard decoder while decoding a bitstream encoded using the standard encoder. Finally, the integer MDCT based encoder bitstream, when decoded by an integer MDCT based decoder, should produce perceptually lossless audio. By varying the number of bits $N_c$ used to quantize the lifting coefficients, we can find the minimum value of $N_c$ that satisfies each of the criteria listed above.

Figure 4.1. Fast integer MDCT based MPEG layer III encoder functional block diagram. Blocks: audio signal in PCM format, analysis polyphase filter bank, integer MDCT, FFT, psychoacoustic model, nonuniform quantization, coding of side information, bitstream formatting and CRC word generation, ancillary data, coded audio signal.

Figure 4.2. Fast integer IMDCT based MPEG layer III decoder functional block diagram. Blocks: bitstream synchronization and error checking, Huffman decoding, Huffman info decoding, scalefactor decoding, reordering, joint stereo decoding, and, for each of the left and right channels, alias reduction, integer IMDCT, frequency inversion and synthesis polyphase filterbank producing PCM samples.
4.1 Simulation results

Figure 4.3(a) shows one frame of the original sample. Figures 4.3 (b), (c) and (d) show the corresponding waveforms obtained using the integer encoder and decoder with the MDCT coefficients quantized to $N_c = 1, 2$ and 3 bits respectively. It was found from subjective tests that the number of bits for the MDCT coefficients required to produce perceptually lossless sound is 2 bits. If the integer encoder is used with the standard decoder, or the standard encoder with the integer decoder, perceptually lossless sound is obtained when $N_c$ is set to 3 bits.

Figure 4.4(a) shows the spectrogram of a sample PCM signal. Figures 4.4(b), 4.5(a) and (b) show the spectrogram of the decoded file obtained by using the integer encoder and standard decoder combination with $N_c = 1, 2$ and 3 respectively. Figure 4.6(a) shows the spectrogram of the standard MP3 file decoded using a standard decoder. Figures 4.6(b), 4.7(a) and (b) show the spectrogram of the decoded file obtained using the integer encoder/decoder combination for $N_c = 1, 2$ and 3.

Certain minor differences can be noticed while examining the spectrograms. Because the psychoacoustic model removes acoustically redundant frequencies, the decoded files have a spectrum that is different from that of the original PCM file. It is easily seen by comparing figure 4.4 (a) to all other plots that the spectrograms of the decoded files have frequencies above the normalized frequency 0.6 clipped. For the 1-bit integer encoder/standard decoder combination, it can be seen that there are red blotches, indicating higher amplitude, in the low frequency region that differ from the 2-bit and 3-bit encoded files.

According to figure 3.6, the number of adders required for implementing the 12-point and 36-point MDCT/IMDCT in the encoder/decoder are 54 and 86 for $N_c = 2$ and $N_c = 3$ respectively.
The mean square error between a given audio signal and a reference one can be used to quantitatively assess the performance of the codec. We consider three cases: the integer encoder/integer decoder combination, the standard encoder/integer decoder combination and the integer encoder/standard decoder combination. The reference signal in each case is the audio signal obtained from that combination with $N_c$ set at 20 bits. This signal is chosen because it gives the highest fidelity among the values of $N_c$ considered. The mean square error for the integer encoder/integer decoder combination for $N_c$ varying from 1 to 20 is shown in figure 4.8. From the linear-scale plot it is seen that the mean square error decreases slowly at first and then drops quickly to zero as $N_c$ approaches 20, where the signal coincides with the reference. Figures 4.9 and 4.10 show the plots of the mean square error versus the number of coefficient bits $N_c$ for the integer encoder/standard decoder and the standard encoder/integer decoder combinations respectively. From the plots it is clear that the integer encoder/integer decoder combination performs best at 2 bits, since the mean square error is of the order of $10^{-5}$. The standard encoder/integer decoder and the integer encoder/standard decoder combinations both have mean square errors of the order of $10^{-3}$ at 2 bits. This result is in agreement with those obtained in the subjective tests.

Chapter 5 summarizes the contributions of this thesis.

Figure 4.3. Time plots of (a) PCM signal. Decoded signal obtained using the $N_c = 1, 2, 3$ bit integer encoder-decoder combination shown in plots (b), (c), (d) respectively. Horizontal axis: time (ms).
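The mean square error used in this comparison can be computed as in the following minimal sketch. The function names are hypothetical, and the decibel conversion is an assumption added for convenience when reading logarithmic plots; the text does not state how the plotted values were scaled.

```python
import math

def mean_square_error(signal, reference):
    """Average squared difference between two equal-length sample sequences."""
    if len(signal) != len(reference):
        raise ValueError("signal and reference must have the same length")
    return sum((s - r) ** 2 for s, r in zip(signal, reference)) / len(signal)

def mse_in_db(mse):
    # Assumed conversion: 10 log10 of the mean square error (undefined for mse == 0).
    return 10 * math.log10(mse)
```

With the reference taken as the output for $N_c = 20$, the error of that case against itself is exactly zero, which is why it cannot appear on a logarithmic axis.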
Figure 4.4. Spectrogram of (a) PCM signal (b) Encoded signal with MDCT coefficients quantized to 1 bit. Panel titles: "Spectrogram of the Original PCM file" and "1 bit encoded mp3 file, standard decoder". Horizontal axis: time.

Figure 4.5. Spectrogram of (a) Encoded signal with MDCT coefficients quantized to 2 bits (b) Encoded signal with MDCT coefficients quantized to 3 bits. Panel titles: "2 bit encoded mp3 file, standard decoder" and "3 bit encoded mp3 file, standard decoder".

Figure 4.6. Spectrogram of (a) Decoded signal obtained using the standard codec (b) Decoded signal obtained from the integer encoder/decoder combination with $N_c = 1$.

Figure 4.7. Spectrogram of the decoded signal obtained from the integer encoder/decoder combination with (a) $N_c = 2$ (b) $N_c = 3$.

Figure 4.8. The mean square error versus $N_c$, the number of coefficient bits, for the integer encoder/integer decoder combination as $N_c$ varies from 1 to 20 bits (a) Normal plot (b) Semilog plot.

Figure 4.9. The mean square error versus $N_c$, the number of coefficient bits, for the integer encoder/standard decoder combination as $N_c$ varies from 1 to 20 bits (a) Normal plot (b) Semilog plot.

Figure 4.10. The mean square error versus $N_c$, the number of coefficient bits, for the standard encoder/integer decoder combination as $N_c$ varies from 1 to 20 bits (a) Normal plot (b) Semilog plot.

CHAPTER 5

CONCLUSIONS

In this thesis we have described a novel fast, integer structure for the MDCT used in MPEG audio coding. The MDCT is a lapped transform which provides better spectral resolution by subdividing in frequency the sub-band outputs obtained from the preceding filterbank. MPEG layer III uses MDCTs of lengths 12 and 36. In order to improve the speed and efficiency of the codec, fast structures for the MDCT have been implemented. The fast structures are implemented using Givens plane rotations and the DCT/DST of type II. The 12-point and the 36-point structures use 3-point and 9-point DCT/DST structures respectively. The rotations block contains orthogonal factorizations which are represented by butterfly units. Using the lifting scheme, the butterfly units can be converted to lifting steps. One of the properties of lifting is that, using a combination of the forward and inverse lifting structures, quantization error can be completely removed. In order for the entire fast MDCT structure to be lossless, the DCT/DST blocks also have to be implemented using lifting steps. The orthonormal factorization of the 3-point DCT/DST is trivial, whereas that for the 9-point DCT/DST is far more complex. We propose a new structure for the 9-point DCT/DST and obtain its orthonormal factorization.
The new MDCT structure obtained is implemented using the lifting scheme, which gives a lossless implementation with integer coefficients. The performance of the new transforms is presented and their accuracy is estimated. It is evident that the accuracy of the integer transform increases approximately linearly as the resolution of the coefficients increases. It was also shown that the minimum number of adders required to implement the structure increases linearly with the resolution of the coefficients [16]. This is a particularly important result as far as implementation in low power, mobile devices is concerned: the fewer bits used, the greater the savings in power.

The new MDCT structure with integer coefficients was incorporated in a standard MPEG layer III encoder and decoder, and its performance was compared with the original codec. The integer MDCT based encoder-decoder combination performed optimally with the coefficients quantized at only 2 bits, whereas combinations of the standard encoder or decoder with the integer MDCT based decoder or encoder require the integer part to have at least 3 bits for perceptually lossless reproduction of sound. An analysis of the differences in the plots and the spectrograms of the waveforms obtained from the standard codec and the integer MDCT based codec was performed, and the differences in the waveforms were highlighted. Even though there were minor differences in the waveforms and the spectra, no perceptual difference was found in the integer codec when the MDCT coefficients were quantized to 2 or more bits.

The new integer MDCT structure is particularly useful to manufacturers of portable MPEG layer III audio devices. If the MDCT coefficients are quantized to 2 or 3 bits, the number of adders and shifters required is considerably reduced.
Also, rather than providing power to floating point coefficients that consist of 32 to 64 bits, only 2 or 3 bits need to be powered. This would prolong battery life in such devices.

BIBLIOGRAPHY

[1] D. Pan, "A tutorial on MPEG/audio compression," IEEE Multimedia, pp. 60-74, Summer 1995.

[2] K.R. Rao and J.J. Hwang, Techniques and Standards for Digital Image/Video/Audio Coding, Prentice Hall, 1996.

[3] H.S. Malvar, Signal Processing with Lapped Transforms, Artech House, 1992.

[4] V. Britanak and K.R. Rao, "An efficient implementation of the forward and inverse MDCT in MPEG audio coding," IEEE Signal Processing Letters, vol. 8, no. 2, pp. 48-51, February 2001.

[5] I. Daubechies and W. Sweldens, "Factoring wavelet transforms into lifting steps," Technical report, Bell Laboratories, Lucent Technologies, 1996.

[6] T.D. Tran, "The binDCT: Fast multiplierless approximation of the DCT," IEEE Signal Processing Letters, vol. 7, no. 6, pp. 141-144, June 2001.

[7] Y-J. Chen, S. Oraintara, and T.Q. Nguyen, "Video compression using integer DCT," in ICIP, 2000.

[8] S. Oraintara, Y. Chen, and T. Nguyen, "Integer fast Fourier transform (INTFFT)," Proc. IEEE International Conf. on Acoustics, Speech, and Signal Processing, vol. 6, pp. 3485-3488, 2001.

[9] T.D. Tran, "M-channel linear phase perfect reconstruction filter banks with rational coefficients," Submitted to IEEE Trans. CAS-I, 2001.

[10] Y. Wu and Z. Zhu, "New radix-3 fast algorithm for the discrete cosine transform," Proc. of the Aerospace and Electronics Conf., vol. 1, pp. 86-89, September 1993.

[11] Y-H. Chan and W-C. Sui, "An efficient implementation of the forward and inverse MDCT in MPEG audio coding," IEEE Signal Processing Letters, vol. 8, no. 2, pp. 48-51, February 2001.

[12] Y. Wang, L. Yaroslavsky, M. Vilermo, and M. Vaananen, "Some peculiar properties of the MDCT," Proc. of the International Conference in Signal Processing, vol. 1, pp. 61-64, 2000.

[13] V. Britanak and K.R. Rao, "New fast algorithms for the unified forward and inverse MDCT/MDST computation," Technical report, Institute of Control Theory and Robotics, Slovak Academy of Sciences, May 2000.

[14] G. Strang and T. Nguyen, Wavelets and Filter Banks, Wellesley Cambridge, 1997.

[15] Y.J. Chen, S. Oraintara, T.D. Tran, K. Amaratunga, and T.Q. Nguyen, "Multiplierless approximations of transforms using lifting scheme and coordinate descent with adder constraint," IEEE International Conf. on Acoustics, Speech, and Signal Processing, May 2002.

[16] T. Krishnan and S. Oraintara, "A fast and lossless forward and inverse structure for the MDCT in MPEG audio coding," Proc. of the International Symposium on Circuits and Systems, May 2002.

BIOGRAPHICAL INFORMATION

The author was born on June 15, 1979 in Bangalore, a metropolitan city in South India. Owing to the nomadic nature of his father's profession as a banker working for a national bank with numerous branches, he spent his childhood in many different exciting cities and small towns in India. Goaded on by his grandmother with small bribes and threats, he developed a keen interest in mathematics. After finishing high school at the top of his class, he went on to pursue a Baccalaureate degree in Electronics and Communication Engineering at the Sri Venkateswara College of Engineering under the University of Madras. He soon developed an interest in Digital Signal Processing and Communications Engineering.
He decided to continue on to a Master's program in Electrical Engineering at The University of Texas at Arlington, where he was offered a graduate fellowship and an opportunity to work with Professor Soontorn Oraintara at the Multirate Signal Processing laboratory. It is here, under the professor's guidance, that he developed the work documented in this thesis. He is a student member of the IEEE and a member of the Electrical Engineering Honor Society Eta Kappa Nu. He is a voracious reader, fond of music, the movies, long after-dinner walks and good vegetarian cuisine. Being vain and extremely conscious about his height-weight ratio, he works out and swims. He resides on campus at The University of Texas at Arlington and is currently seeking a career path in the industry. He graduated from The University of Texas at Arlington with a Master of Science in Electrical Engineering, August 2002.