The MP3 encoder chain

Guerino Mazzola (Fall 2014©): Introduction to Music Technology
III Digital Audio
III.6 (Fr Oct 24)
The MP3 algorithm with PAC
The MP3 encoder chain
[Figure: block diagram of the MP3 encoder chain: audio data, filter bank (32 subbands, lines), quantization and encoding (check-of-quantization loop), Huffman encoding, data stream formatting to frames etc.; psychoacoustical model with external check; encoding of additional information from additional data]
1. Digital Datastream
2. FFT with Filter Bank
3. Psychoacoustical Model (Perceptual-Audio-Coding Model PAC)
4. Quantization
5. Huffman Compression Encoding
6. Frame Formatting (output stream)
1. Digital Datastream
2 × 16 bit (2 ~ stereo): 768 kbit/s = 48 000 × 16 b/s per channel, i.e. 1 536 kbit/s for the stereo pair
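The data rate above is plain arithmetic. A minimal sketch, assuming 48 kHz sampling and 16-bit linear PCM as on the slide (the function name pcm_bit_rate is illustrative only):

```python
def pcm_bit_rate(sample_rate_hz: int, bits_per_sample: int, channels: int = 1) -> int:
    """Raw (uncompressed) PCM bit rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


mono = pcm_bit_rate(48_000, 16)                # one channel
stereo = pcm_bit_rate(48_000, 16, channels=2)  # the "2 × 16" stereo stream
print(mono // 1000, "kbit/s per channel")
print(stereo // 1000, "kbit/s for stereo")
```

This is the input rate the rest of the chain has to reduce.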
2. FFT with Filter Bank
Important: since the sample rate equals the number of Fourier coefficients per second, we may speak of "Fourier samples per second".
2.1 Cut the spectrum 0 – 20 kHz into 32 subbands of 625 Hz each (32 × 625 = 20 000), for windows of 1/40 sec.
2.2 Use the MDCT (Modified Discrete Cosine Transform, a Fourier-type transform closely related to the FFT) to split each 625 Hz band into 18 frequency lines. This gives 576 = 18 × 32 "lines", which are grouped into bands of variable width according to psychoacoustical criteria.
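A sketch of the band arithmetic and of a direct MDCT. It follows the slide's simplified 0 – 20 kHz / 625 Hz figures and assumes a plain, unwindowed MDCT of 2N = 36 samples giving N = 18 lines. A real encoder applies a window and overlaps consecutive blocks, which is not shown. The function mdct here is an illustrative O(N²) implementation, not a library API.

```python
import math

BANDWIDTH_HZ = 20_000      # spectrum 0–20 kHz, as on the slide
N_SUBBANDS = 32
LINES_PER_SUBBAND = 18

subband_width = BANDWIDTH_HZ / N_SUBBANDS        # 625.0 Hz
n_lines = N_SUBBANDS * LINES_PER_SUBBAND         # 576
line_width = subband_width / LINES_PER_SUBBAND   # about 34.7 Hz


def mdct(x: list[float]) -> list[float]:
    """Naive MDCT: 2N input samples -> N coefficients (no window, no overlap)."""
    two_n = len(x)
    if two_n % 2:
        raise ValueError("MDCT input length must be even")
    n = two_n // 2
    n0 = 0.5 + n / 2  # phase offset of the MDCT basis functions
    return [
        sum(x[i] * math.cos(math.pi / n * (i + n0) * (k + 0.5)) for i in range(two_n))
        for k in range(n)
    ]


# Test signal: exactly the MDCT basis function of line k = 3 (36 samples)
block = [math.cos(math.pi / 18 * (i + 9.5) * (3 + 0.5)) for i in range(36)]
coeffs = mdct(block)
print(subband_width, n_lines, round(line_width, 1))
print(max(range(18), key=lambda k: abs(coeffs[k])))  # index of the strongest line
```

Because the MDCT basis functions are mutually orthogonal, all of this signal's energy lands in line 3 and the other 17 coefficients vanish.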
4. Quantization (lossy)
Already discussed; 40% of compression.
5. Huffman Encoding (lossless compression)
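To see why the Huffman stage loses nothing, here is a sketch that builds a Huffman code from symbol counts and decodes the result back exactly. It is a simplification: the MP3 standard selects among predefined Huffman tables rather than building a new code for each frame. The quantized values below are hypothetical.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_code(symbols):
    """Build a prefix-free Huffman code {symbol: bitstring} from symbol counts."""
    freq = Counter(symbols)
    if len(freq) == 1:  # degenerate case: only one distinct symbol
        return {next(iter(freq)): "0"}
    tie = count()  # tie-breaker so heapq never has to compare dicts
    heap = [(f, next(tie), {s: ""}) for s, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)  # two least frequent subtrees
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + b for s, b in c1.items()}
        merged.update({s: "1" + b for s, b in c2.items()})
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]


def encode(symbols, code):
    return "".join(code[s] for s in symbols)


def decode(bits, code):
    inverse = {b: s for s, b in code.items()}
    out, buf = [], ""
    for bit in bits:
        buf += bit
        if buf in inverse:  # prefix-free: the first match is the symbol
            out.append(inverse[buf])
            buf = ""
    return out


# Hypothetical quantized values: after quantization, small values dominate
quantized = [0, 0, 0, 0, 1, 0, -1, 0, 0, 2, 0, 1, 0, 0, -1, 0]
code = huffman_code(quantized)
bits = encode(quantized, code)
assert decode(bits, code) == quantized  # lossless round trip
print(code[0], len(bits), "bits instead of", 16 * len(quantized))
```

The frequent value 0 gets a 1-bit code, so the 16 values shrink from 256 bits (as 16-bit samples) to 24 bits with no information lost.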
3. Psychoacoustical Model
(Perceptual-Audio-Coding Model PAC)
Psychoacoustical Model (Perceptual-Audio-Coding Model PAC): the core feature of MP3; it accounts for 60% of MP3 compression.
The PAC Model is based upon three limitations of human auditory perception:
PAC 1: hearing thresholds
PAC 2: auditory masking
PAC 3: temporal masking
All three PAC components generate lossy compression.
PAC 1: hearing thresholds
[Figure: threshold of hearing, loudness vs. frequency (kHz)]
You don't hear sinusoidal sounds below this threshold of loudness.
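The threshold curve can be approximated analytically. The sketch below uses the widely quoted Terhardt-style fit for the threshold in quiet and drops every component below it. It assumes "loudness" is measured as sound pressure level in dB SPL, and the spectral components are hypothetical.

```python
import math


def hearing_threshold_db(f_hz: float) -> float:
    """Approximate threshold in quiet (dB SPL); Terhardt-style fit, f in Hz."""
    f = f_hz / 1000.0  # the fit is written in kHz
    return 3.64 * f ** -0.8 - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2) + 1e-3 * f ** 4


def drop_inaudible(components):
    """Keep only (frequency_hz, level_db) pairs that lie above the threshold."""
    return [(f, level) for f, level in components if level > hearing_threshold_db(f)]


# Hypothetical spectral components: (frequency in Hz, level in dB SPL)
spectrum = [(100, 10.0), (1000, 20.0), (3300, 0.0), (15000, 30.0)]
print(drop_inaudible(spectrum))
```

The ear is most sensitive around 3 to 4 kHz, where the fitted threshold even dips below 0 dB. The quiet 100 Hz and 15 kHz components are discarded, so no bits need to be spent on them.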
PAC 2: auditory masking
[Figure: masking surface around a sinusoidal component, loudness vs. frequency]
For every sinusoidal frequency component of frequency f and loudness l, there is a surrounding masking surface within which other frequency/loudness components cannot be heard together with the given one.
Example: the 4 kHz/40 dB component (red) masks the blue one.
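A rough model of the masking surface, for illustration only. Frequencies are mapped to the Bark (critical-band) scale with Zwicker's approximation. Around the masker the threshold falls off linearly in dB per Bark, more steeply toward lower frequencies than toward higher ones. The offset and the two slopes are assumed illustrative values, not parameters from the MP3 standard.

```python
import math


def bark(f_hz: float) -> float:
    """Zwicker-style Hz -> Bark (critical-band rate) approximation."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


# Illustrative assumptions, not values from the MP3 standard:
OFFSET_DB = 10.0       # threshold peak lies this far below the masker level
SLOPE_BELOW_DB = 27.0  # dB per Bark toward lower frequencies
SLOPE_ABOVE_DB = 10.0  # dB per Bark toward higher frequencies (masking spreads upward)


def masking_threshold_db(masker_hz, masker_db, probe_hz):
    """Level below which a probe tone at probe_hz is masked by the masker."""
    dz = bark(probe_hz) - bark(masker_hz)
    slope = SLOPE_ABOVE_DB if dz >= 0 else SLOPE_BELOW_DB
    return masker_db - OFFSET_DB - slope * abs(dz)


def is_masked(masker_hz, masker_db, probe_hz, probe_db):
    return probe_db < masking_threshold_db(masker_hz, masker_db, probe_hz)


# The slide's masker: 4 kHz at 40 dB
print(is_masked(4000, 40, 4500, 15))  # quiet neighbour
print(is_masked(4000, 40, 4500, 35))  # loud neighbour
print(is_masked(4000, 40, 1000, 15))  # far away in frequency
```

Under these assumptions a quiet 4.5 kHz tone is masked by the 4 kHz/40 dB component, while a louder neighbour or a distant 1 kHz tone is not.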
PAC 3: temporal masking
[Figure: temporal masking curve, loudness vs. time]
For every sinusoidal frequency component of frequency f and loudness l (red), another subsequent component (blue) cannot be heard below the given loudness curve in time, because the ear needs some time to "recover" from the first component's perception (post-masking). This is even true for sounds before the given one (red curve), because the perception needs time to build up (pre-masking)!
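Temporal masking can be sketched the same way. This is a strong simplification under stated assumptions. The masker is treated as a single instant. Pre-masking is assumed to reach about 20 ms back and post-masking about 200 ms forward, which are rough orders of magnitude only. The threshold is assumed to fall linearly in dB toward 0 dB at the edges of that window.

```python
# Illustrative assumptions (rough orders of magnitude, not standard values):
PRE_MS = 20.0     # pre-masking reaches about 20 ms before the masker
POST_MS = 200.0   # post-masking lasts about 200 ms after the masker
OFFSET_DB = 10.0  # threshold starts this far below the masker level


def temporal_threshold_db(masker_db: float, dt_ms: float) -> float:
    """Masked threshold at time dt_ms relative to the masker (negative = before).

    Simplification: falls linearly (in dB) from masker_db - OFFSET_DB at dt = 0
    to 0 dB at the edges of the pre- and post-masking windows.
    """
    peak = masker_db - OFFSET_DB
    if 0 <= dt_ms <= POST_MS:
        return peak * (1 - dt_ms / POST_MS)
    if -PRE_MS <= dt_ms < 0:
        return peak * (1 + dt_ms / PRE_MS)
    return float("-inf")  # outside the window: no temporal masking


def is_temporally_masked(masker_db, dt_ms, probe_db):
    return probe_db < temporal_threshold_db(masker_db, dt_ms)


# A 60 dB masker and 30 dB probes at different times relative to it
for dt in (50, -5, -10, 300):
    print(dt, "ms:", is_temporally_masked(60, dt, 30))
```

The asymmetric window reflects the slide: the ear's "recovery" (post-masking) lasts much longer than the build-up of perception (pre-masking).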
6. Frame Formatting (output stream)
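For orientation, a sketch of the frame arithmetic, assuming MPEG-1 Layer III. Each frame carries 1152 samples (2 granules of 576 lines) plus an optional padding byte. The 128 kbit/s bit rate is an assumed example, not a figure from these slides.

```python
SAMPLES_PER_FRAME = 1152  # MPEG-1 Layer III: 2 granules x 576 lines


def frame_length_bytes(bit_rate: int, sample_rate: int, padding: bool = False) -> int:
    """Length of one MPEG-1 Layer III frame in bytes, header included."""
    # 1152 samples per frame / 8 bits per byte = 144
    return 144 * bit_rate // sample_rate + (1 if padding else 0)


def frame_duration_s(sample_rate: int) -> float:
    return SAMPLES_PER_FRAME / sample_rate


print(frame_length_bytes(128_000, 48_000))  # 48 kHz: frames divide evenly
print(frame_length_bytes(128_000, 44_100))  # 44.1 kHz: padding byte added on some frames
print(frame_duration_s(48_000))             # seconds per frame
```

At 48 kHz one frame lasts 24 ms, close to the 1/40 sec windows mentioned for the filter bank.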