Guerino Mazzola (Fall 2014©): Introduction to Music Technology
III Digital Audio
III.6 (Fr Oct 24) The MP3 algorithm with PAC

The MP3 encoder chain

[Figure: block diagram of the encoder chain. Labels: Audio Data (Data Stream, 2*16); Filter Bank (32 Subbands); Subbands; Line; Psychoacoustical Model; External Check; Quantization and Encoding (Check of Quantization loop); Encoding of Additional Information (Additional Data); Datastream Formatting to Frames etc.]

1. Digital Datastream
2. FFT with Filter Bank
3. Psychoacoustical Model (Perceptual-Audio-Coding Model PAC)
4. Quantization
5. Huffman Compression
6. Frame Outputstream Formatting

1. Digital Datastream
2*16: 2 ~ stereo; per channel 768 kbit/s ~ 48 000 × 16 b/s

2. FFT with Filter Bank
Important: Since # sample rate = # Fourier coefficients, speak of “Fourier samples per second”.
2.1 Cut spectrum 0 – 20 kHz into 32 subbands of 625 Hz each (32 × 625 = 20 000) for 1/40 sec windows.
2.2 Use MDCT (Modified Discrete Cosine Transformation ~ variant of FFT) to split each 625 Hz band into 18 subbands with variable widths, according to psychoacoustical criteria. Get 576 = 18 × 32 “lines”.
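The arithmetic behind steps 1 and 2 can be checked with a few lines of Python. This is only a sketch using the figures quoted on the slides (48 000 samples/s, 16 bit, 32 subbands of 625 Hz, 18 lines per subband); the function and constant names are illustrative and not part of any MP3 library.

```python
# Figures from the slides; all names below are illustrative assumptions.
SAMPLE_RATE_HZ = 48_000      # samples per second, per channel
BITS_PER_SAMPLE = 16
CHANNELS = 2                 # stereo

NUM_SUBBANDS = 32
SUBBAND_WIDTH_HZ = 625       # slide's simplification: 0 - 20 kHz in 32 equal bands
LINES_PER_SUBBAND = 18       # MDCT splits each subband into 18 lines


def pcm_bitrate(sample_rate_hz, bits_per_sample, channels=1):
    """Uncompressed PCM bitrate in bit/s."""
    return sample_rate_hz * bits_per_sample * channels


def subband_index(freq_hz):
    """Index (0..31) of the 625 Hz subband that contains freq_hz."""
    if not 0 <= freq_hz < NUM_SUBBANDS * SUBBAND_WIDTH_HZ:
        raise ValueError("frequency outside the 0 - 20 kHz range of the slide")
    return int(freq_hz // SUBBAND_WIDTH_HZ)


per_channel_bps = pcm_bitrate(SAMPLE_RATE_HZ, BITS_PER_SAMPLE)
stereo_bps = pcm_bitrate(SAMPLE_RATE_HZ, BITS_PER_SAMPLE, CHANNELS)
covered_spectrum_hz = NUM_SUBBANDS * SUBBAND_WIDTH_HZ
total_lines = NUM_SUBBANDS * LINES_PER_SUBBAND

print(per_channel_bps, stereo_bps, covered_spectrum_hz, total_lines)
```

With these numbers, a 4 kHz component falls into `subband_index(4000)`, i.e. the seventh subband when counting from one.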
4. Lossy Quantization
5. Huffman lossless Compression
Already discussed. 40 % of compression.

3. Psychoacoustical Model (Perceptual-Audio-Coding Model PAC)
The psychoacoustical model is the core feature of MP3; it covers 60 % of MP3 compression.
The PAC model is based upon three limitations of human audio perception:
PAC 1: hearing thresholds
PAC 2: auditory masking
PAC 3: temporal masking
All three PAC components generate lossy compression.

PAC 1: hearing thresholds
[Figure: loudness vs. frequency (kHz), threshold curve]
You don’t hear sinusoidal sounds below this threshold of loudness.

PAC 2: auditory masking
[Figure: loudness vs. frequency, masking surface around a component]
For every sinusoidal frequency component of frequency f and loudness l, there is a surrounding masking surface, where other frequency/loudness components cannot be heard together with the given one.
Example: the 4 kHz/40 dB component (red) masks the blue one.
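PAC 1 and PAC 2 can be illustrated with a small Python sketch. It rests on two assumptions that are not from the slides: the threshold curve uses Terhardt's widely quoted approximation of the absolute hearing threshold, and the masking surface is a toy triangle in log-frequency whose offset and slope values are purely illustrative (they are not the values used by an actual MP3 encoder).

```python
import math


def threshold_in_quiet_db(freq_hz):
    """PAC 1: Terhardt's approximation of the hearing threshold (dB SPL).

    Assumption: this formula is a standard textbook fit, not from the slides.
    """
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    k = freq_hz / 1000.0
    return (3.64 * k ** -0.8
            - 6.5 * math.exp(-0.6 * (k - 3.3) ** 2)
            + 1e-3 * k ** 4)


def toy_masking_threshold_db(freq_hz, masker_hz, masker_db,
                             offset_db=10.0, slope_db_per_octave=25.0):
    """PAC 2: toy triangular masking surface around one masker.

    offset_db and slope_db_per_octave are hypothetical illustration values.
    """
    octaves = abs(math.log2(freq_hz / masker_hz))
    return masker_db - offset_db - slope_db_per_octave * octaves


def is_audible(freq_hz, level_db, maskers=()):
    """True if the component lies above both the quiet threshold and all masks."""
    limit = threshold_in_quiet_db(freq_hz)
    for masker_hz, masker_db in maskers:
        limit = max(limit, toy_masking_threshold_db(freq_hz, masker_hz, masker_db))
    return level_db > limit


# A 4 kHz / 40 dB masker (the "red" component) and a nearby weaker component.
red = (4000.0, 40.0)
print(is_audible(4500.0, 20.0))          # no masker present
print(is_audible(4500.0, 20.0, [red]))   # with the 4 kHz / 40 dB masker
```

An encoder built on this idea would simply spend no bits on components for which `is_audible` is false.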
PAC 3: temporal masking
[Figure: loudness vs. time, masking curve around a component]
For every sinusoidal frequency component of frequency f and loudness l (red), a subsequent component (blue) cannot be heard below the given curve of loudness in time, because the ear needs some time to “recover” from the first component’s perception.
This is even true for sounds before the given one (red curve), because the perception needs to be built up!

6. Frame Outputstream Formatting
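Returning to PAC 3, the masking curve in time can be sketched in Python as well. This is a toy model only: the linear decay and the window lengths (`pre_ms`, `post_ms`) are hypothetical illustration values, chosen to reflect that masking after a sound lasts much longer than masking before it; they are not taken from the slides or from the MP3 standard.

```python
def temporal_mask_db(dt_ms, masker_db, pre_ms=20.0, post_ms=200.0):
    """Toy temporal masking curve.

    dt_ms is the probe time minus the masker time: positive values mean the
    probe comes after the masker, negative values mean it comes before.
    The mask falls linearly (in dB) from masker_db at dt_ms = 0 to 0 dB at
    the edge of the window. pre_ms and post_ms are assumed values.
    """
    if dt_ms >= 0:
        remaining = 1.0 - dt_ms / post_ms   # ear "recovers" after the masker
    else:
        remaining = 1.0 + dt_ms / pre_ms    # perception is still building up
    return masker_db * max(0.0, remaining)


def is_masked_in_time(dt_ms, probe_db, masker_db):
    """True if the probe lies below the toy masking curve at offset dt_ms."""
    return probe_db < temporal_mask_db(dt_ms, masker_db)


# A 40 dB masker (red) and a 15 dB probe (blue) at several time offsets.
for dt in (-30, -10, 0, 100, 300):
    print(dt, temporal_mask_db(dt, 40.0), is_masked_in_time(dt, 15.0, 40.0))
```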