Uploaded by vishwas867

Explain audio encoder and decoder used in MPEG by vishwas

MPEG audio encoder
The principle of MPEG audio compression is quantization.
The values being quantized, however, are not the audio samples but numbers
(called signals) taken from the frequency domain of the sound.
The fact that the compression ratio (or equivalently bit rate) is known to the
encoder means that the encoder knows at any time how many bits it can allocate
to the quantized signals.
Thus the (adaptive) bit allocation algorithm is an important part of the encoder.
This algorithm uses the known bit rate and the frequency spectrum of the most
recent audio samples to determine the size of the quantized signals such that the
quantization noise (the difference between an original signal and a quantized
one) will be inaudible.
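
The adaptive bit allocation described above can be sketched as a greedy loop. This is only an illustration, not the allocation procedure of any particular MPEG layer. It assumes that each subband comes with a signal-to-mask ratio in dB (`smr_db`, a hypothetical input from the psychoacoustic model), that every extra bit lowers the quantization noise by about 6.02 dB, and that one extra bit for a subband costs `samples_per_band` bits of the budget. The function name and its parameters are made up for this example.

```python
def allocate_bits(smr_db, bit_budget, samples_per_band=12, max_bits=15):
    """Greedy sketch: give one more bit to the subband whose noise is most audible."""
    bits = [0] * len(smr_db)
    remaining = bit_budget
    while remaining >= samples_per_band:
        # Noise-to-mask ratio per subband; a positive value means audible noise.
        # Assumption: each bit improves the signal-to-noise ratio by ~6.02 dB.
        nmr = [smr - 6.02 * b for smr, b in zip(smr_db, bits)]
        candidates = [i for i in range(len(bits))
                      if bits[i] < max_bits and nmr[i] > 0]
        if not candidates:
            break  # all quantization noise is already below the masking threshold
        worst = max(candidates, key=lambda i: nmr[i])
        bits[worst] += 1
        remaining -= samples_per_band
    return bits


# Four subbands, enough budget for eight one-bit increases.
allocation = allocate_bits([20.0, 5.0, -3.0, 12.0], bit_budget=96)
```

The subband whose signal is already below the masking threshold (negative ratio) receives no bits at all, and the loop stops early once no subband has audible noise left.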
The psychoacoustic models use the frequencies of the sound that is being
compressed, but the input stream consists of audio samples, not sound frequencies.
The frequencies have to be computed from the samples.
This is why the first step in MPEG audio encoding is a discrete Fourier transform,
where a set of 512 consecutive audio samples is transformed to the frequency
domain.
Since the number of frequencies can be huge, they are grouped into 32 equal-width
frequency subbands (Layer III uses different numbers but the same
principle).
For each subband, a number is obtained that indicates the intensity of the sound
at the subband’s frequency range.
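
The grouping of frequencies into subbands can be sketched as follows. This is a minimal illustration of the idea as described here, not the filter bank of an actual MPEG encoder. It assumes a plain discrete Fourier transform over the input block, keeps the lower half of the spectrum, and sums the power of the bins that fall into each of 32 equal-width subbands. The function name `subband_energies` is hypothetical.

```python
import cmath
import math


def subband_energies(samples, num_bands=32):
    """Return one intensity value per equal-width subband of the spectrum."""
    n = len(samples)
    half = n // 2  # bins 0 .. n/2 - 1 cover the frequencies up to half the sample rate
    power = []
    for k in range(half):
        acc = 0j
        for t, x in enumerate(samples):
            acc += x * cmath.exp(-2j * math.pi * k * t / n)
        power.append(abs(acc) ** 2)
    width = half // num_bands  # 512 samples -> 256 bins -> 8 bins per subband
    return [sum(power[b * width:(b + 1) * width]) for b in range(num_bands)]


# A pure tone that falls exactly on bin 20 of a 512-sample block.
block = [math.sin(2 * math.pi * 20 * t / 512) for t in range(512)]
energies = subband_energies(block)
loudest = energies.index(max(energies))  # bin 20 lies in subband 20 // 8
```

The direct double loop is slow but keeps the example free of dependencies; a real implementation would use a fast Fourier transform.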
These numbers (called signals) are then quantized. The coarseness of the
quantization in each subband is determined by the masking threshold in the
subband and by the number of bits still available to the encoder.
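
To see how the number of bits controls the coarseness of the quantization, consider the small sketch below. It assumes a uniform quantizer over the range [-1, 1]; the actual MPEG quantizers and their scaling are more involved, and the function names are made up for this example.

```python
def quantize(value, bits, max_abs=1.0):
    """Map a value in [-max_abs, max_abs] to an integer code of `bits` bits."""
    levels = 2 ** bits
    step = 2 * max_abs / levels
    index = int((value + max_abs) / step)
    return max(0, min(index, levels - 1))  # clamp to the valid code range


def dequantize(index, bits, max_abs=1.0):
    """Reconstruct the midpoint of the interval that the code stands for."""
    step = 2 * max_abs / 2 ** bits
    return -max_abs + (index + 0.5) * step


def quantization_noise(value, bits):
    """Absolute difference between the original and the quantized signal."""
    return abs(value - dequantize(quantize(value, bits), bits))


coarse = quantization_noise(0.3, bits=3)  # few bits, large step
fine = quantization_noise(0.3, bits=8)    # many bits, small step
```

With this quantizer the noise never exceeds half a step, so each additional bit halves the worst-case error. The encoder can therefore spend few bits where the masking threshold is high and more bits where it is low.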
The masking threshold is computed for each subband using a psychoacoustic
model.
MPEG uses psychoacoustic models to implement frequency masking and
temporal masking.
Each model describes how a loud sound masks other sounds that happen to be
close to it in frequency or in time.
The model partitions the frequency range into 24 critical bands and specifies
how masking effects apply within each band.
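
The partition into 24 critical bands can be illustrated with the Bark scale. The text does not give the band edges, so the sketch below uses the Zwicker-Terhardt approximation of the Bark scale as an assumption. It is not the table used by the MPEG psychoacoustic models, and the function names are hypothetical.

```python
import math


def bark(frequency_hz):
    """Approximate critical-band rate (in Bark) for a frequency in Hz."""
    return (13.0 * math.atan(0.00076 * frequency_hz)
            + 3.5 * math.atan((frequency_hz / 7500.0) ** 2))


def critical_band(frequency_hz, num_bands=24):
    """Index (0 .. num_bands - 1) of the critical band containing the frequency."""
    return min(int(bark(frequency_hz)), num_bands - 1)


low = critical_band(100)     # a low tone falls into the first band
mid = critical_band(1000)    # 1 kHz is roughly 8.5 Bark
high = critical_band(15000)  # high frequencies land in the last band
```

The bands are narrow at low frequencies and wide at high frequencies, which is why a masker affects a wider range of neighbouring frequencies (in Hz) when it is high-pitched.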
The masking effects depend, of course, on the frequency and amplitude of the sound.
When the sound is decompressed and played, the user (listener) may select any
playback amplitude, which is why the psychoacoustic model has to be designed
for the worst case.
The masking effects also depend on the nature of the source of the sound being
compressed.
The source may be tone-like or noise-like.
The two psychoacoustic models employed by MPEG are based on experimental
work done by researchers over many years.
MPEG audio decoder
The decoder must be fast, since it may have to decode the entire movie (video
and audio) in real time, so it must be simple.
As a result it does not use any psychoacoustic model or bit allocation algorithm.
The compressed stream must therefore contain all the information that the
decoder needs for dequantizing the signals.
This information must be written by the encoder to the compressed stream, and
it constitutes overhead that must be subtracted from the number of bits that
remain available for the quantized signals.
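
The effect of this overhead on the bit budget can be sketched with a little arithmetic. The default figures below (384 samples per frame, a 32-bit header, 4 allocation bits for each of 32 subbands, and a 6-bit scale factor for each subband that actually receives bits) are Layer I-like values taken as assumptions for this example. The function name and parameters are hypothetical.

```python
def bits_for_samples(bit_rate, sample_rate, active_bands,
                     samples_per_frame=384, header_bits=32,
                     alloc_bits_per_band=4, scalefactor_bits=6,
                     num_bands=32):
    """Bits left in one frame for quantized signals after the overhead."""
    # Total size of one frame, fixed by the known bit rate.
    frame_bits = samples_per_frame * bit_rate // sample_rate
    # Information the decoder needs in order to dequantize the signals.
    overhead = (header_bits
                + alloc_bits_per_band * num_bands
                + scalefactor_bits * active_bands)
    return frame_bits - overhead


# 192 kbit/s at 48 kHz with 20 subbands that carry quantized signals.
available = bits_for_samples(192000, 48000, active_bands=20)
```

Under these assumptions a frame holds 1536 bits, of which 280 are overhead, so the bit allocation algorithm has 1256 bits to distribute among the quantized signals.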
The ancillary data is user-definable and would normally consist of information
related to specific applications. This data is optional.