AUDIO COMPRESSION TOOLS & TECHNIQUES
Gautam Bhattacharya

CD Quality
CD Audio:
• 2 channels (stereo)
• 16-bit encoding
• 44.1 kHz sampling rate
Data rate: this leads to a data rate of 1.4 to 1.54 Mbps (2 channels × 16 bits × 44,100 samples/s ≈ 1.41 Mbps).

AUDIO ENCODER!
• Bit rate: as low as 1 bit per sample or less
• Based on a perceptual model
• Capable of high-fidelity audio
• Lossy or lossless?

‘CD Quality’
• Many perceptual tests were conducted to verify the quality of the audio output.
• The encoder takes advantage of perceptual irrelevancies as well as statistical redundancies.

Moving Picture Experts Group (MPEG)
• MPEG is a family of encoding standards for digital multimedia information.
• MPEG-1: a standard for storage and retrieval of moving pictures and audio on storage media (e.g., CD-ROM).
  • Layer I
  • Layer II
  • Layer III (aka MP3)
• MPEG-2: a standard for digital television, including high-definition television (HDTV), and for addressing multimedia applications.
  • Advanced Audio Coding (AAC)
• MPEG-4: a standard for multimedia applications, with very low bit-rate audio-visual compression for channels with very limited bandwidth (e.g., wireless channels).
• MPEG-7: a content representation standard for information search.

Back to the Encoder!
Generic Audio Encoder Architecture
Painter, T. & Spanias, A. Perceptual Coding of Digital Audio, Proceedings of the IEEE, 2000

Psychoacoustic Model
Critical Listening Threshold
The absolute threshold of hearing is defined as the amount of energy needed in a pure tone such that it can be detected by a listener in a noiseless environment.
This criterion assumes that the volume control on the decoder is set so that the smallest possible output signal is presented at 0 dB SPL.

Psychoacoustic Model
Absolute threshold of hearing
Painter, T. & Spanias, A. Perceptual Coding of Digital Audio, Proceedings of the IEEE, 2000, Vol. 88, No. 4

Psychoacoustic Model
Critical Bands
The ear has a limited frequency selectivity that varies in acuity from less than 100 Hz for the lowest audible frequencies to more than 4 kHz for the highest.
As a result, the audible spectrum can be partitioned into critical bands that reflect the resolving power of the ear as a function of frequency.
Because of this limited frequency resolving power, the threshold for noise masking at any given frequency depends solely on the signal activity within a critical band of that frequency.

Psychoacoustic Model
MPEG/Audio filter banks vs. critical bands
Pan, D.Y. Digital Audio Compression, Digital Technical Journal, 1993, Vol. 5

Psychoacoustic Model
Auditory Masking
Auditory masking is a perceptual weakness of the ear that occurs whenever the presence of a strong audio signal makes a spectral neighbourhood of weaker audio signals imperceptible.
Two types of masking:
* Simultaneous masking
* Temporal masking

Psychoacoustic Model
Audio Masking
Pan, D.Y. Digital Audio Compression, Digital Technical Journal, 1993, Vol. 5

Psychoacoustic Model
Perceptual Entropy
Johnston at Bell Labs combined notions of psychoacoustic masking with signal quantization principles to define perceptual entropy (PE), a measure of the perceptually relevant information contained in any audio record.
Expressed in bits per sample, PE represents a theoretical limit on the compressibility of a particular signal.
Reported PE measurements suggest that a wide variety of CD-quality audio source material can be transparently compressed at approximately 2.1 bits per sample.

Time-Frequency Analysis
Filter Banks
The filter bank divides the signal spectrum into frequency sub-bands and generates a time-indexed series of coefficients representing the frequency-localized signal power within each band.
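To make the filter-bank idea concrete, here is a minimal Python sketch. It is only an illustration: a plain frame-by-frame DFT stands in for the filter bank (it is not the pseudo-QMF bank used in MPEG-1), and the function name band_powers, the 8 kHz sampling rate, the 64-sample frame, and the 8 equal-width bands are all assumptions chosen for the example.

```python
import cmath
import math

def band_powers(signal, frame_len=64, num_bands=8):
    """Return one list of per-band powers for each non-overlapping frame.

    Assumption: a direct DFT is used as a crude stand-in for a real
    analysis filter bank; bins are grouped into equal-width bands.
    """
    half = frame_len // 2              # bins 0 .. half-1 cover 0 .. fs/2
    bins_per_band = half // num_bands  # equal-width bands
    frames = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        powers = [0.0] * num_bands
        for k in range(half):
            coeff = sum(x * cmath.exp(-2j * math.pi * k * n / frame_len)
                        for n, x in enumerate(frame))
            band = min(k // bins_per_band, num_bands - 1)
            powers[band] += abs(coeff) ** 2
        frames.append(powers)
    return frames

fs = 8000        # hypothetical sampling rate in Hz
tone_hz = 1250   # test tone; with 8 bands of 500 Hz it belongs to band 2
signal = [math.sin(2 * math.pi * tone_hz * n / fs) for n in range(256)]

frames = band_powers(signal)
# Index of the strongest band in each frame (the time-indexed series).
loudest = [max(range(len(p)), key=p.__getitem__) for p in frames]
print(loudest)   # prints [2, 2, 2, 2]
```

Each inner list is one time step and each entry is the power in one sub-band, which is the time-frequency picture a perceptual model needs in order to place masking thresholds.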
Masking thresholds are applied to the resulting frequency sub-band signals.
By providing explicit information about the distribution of signal power, and hence masking power, over the time-frequency plane, the filter bank plays an essential role in identifying perceptual irrelevancies when used in conjunction with a perceptual model.

Time-Frequency Analysis
Pseudo-QMF M-Band Banks
• Used in all MPEG-1 encoders.
• The signal is separated into sub-bands whose widths are equal over the entire frequency range.
• The resulting sub-band signals are then down-sampled to conserve bandwidth (they are up-sampled again at the decoder).

Pre-Echo Distortion
Pre-echoes occur when a signal with a sharp attack begins near the end of a transform block immediately following a region of low energy.
This situation can arise when coding recordings of percussive instruments such as the triangle, the glockenspiel, or the castanets.
Painter, T. & Spanias, A. Perceptual Coding of Digital Audio, Proceedings of the IEEE, 2000

AUDIO COMPRESSION
THANK YOU!
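
Backup: Pre-Echo Sketch
Returning to the pre-echo slide, the effect can be reproduced in a few lines of Python. This is a sketch under stated assumptions, not how any MPEG coder is implemented: a plain DFT stands in for the codec's transform, the block is 64 samples long, the attack is a single impulse at the last sample, and the coefficients are rounded with a deliberately coarse step of 1.0. The helper names (dft, idft, quantise) are made up for the example.

```python
import cmath
import math

def dft(x):
    """Forward DFT of a sequence (direct O(N^2) form)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(coeffs):
    """Inverse DFT, keeping only the real part of each output sample."""
    n = len(coeffs)
    return [sum(coeffs[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]

def quantise(c, step):
    """Round real and imaginary parts to the nearest multiple of step."""
    return complex(round(c.real / step) * step, round(c.imag / step) * step)

block_len = 64
# Low-energy region (here: pure silence) followed by a sharp attack
# at the very end of the transform block.
block = [0.0] * (block_len - 1) + [1.0]

# Encode (transform + coarse quantisation), then decode.
decoded = idft([quantise(c, 1.0) for c in dft(block)])

# Energy that appears BEFORE the attack; it is exactly zero in the original.
pre_echo_energy = sum(v * v for v in decoded[:-1])
print(f"energy before attack: original 0.0, decoded {pre_echo_energy:.4f}")
```

The quantisation error is made in the frequency domain, so after the inverse transform it is spread over the whole block, including the silent samples that precede the attack. That spread-out noise ahead of the transient is the pre-echo.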