MPEG AAC

An Overview of Perceptual
Audio Coding and MPEG
AAC
Introduction
• Audio coding or audio compression algorithms are used
to obtain compact digital representation of high-fidelity
(wideband) audio signals for the purpose of efficient
transmission or storage.
• The central objective in audio coding is to represent the
signal with the minimum number of bits while achieving
transparent signal reproduction, i.e. generating output
audio that cannot be distinguished from the original input
even by a listener with "golden ears".
• The Moving Picture Experts Group (MPEG) audio
compression algorithm is an International Organization
for Standardization (ISO) standard for high-fidelity audio
compression.
Continue …
• MPEG audio compression standards are lossy audio
coding standards. They compress audio by reducing
perceptual and statistical redundancies.
• The basic task of a perceptual audio coding system is to
compress the digital audio data in such a way that
- the compression is as high as possible, and
- the reconstructed (decoded) audio sounds exactly like (or as
close as possible to) the original audio before compression.
Audio Coding Techniques
• Parametric Coding
• Waveform Coding
Time Domain
PCM, DPCM, ADPCM etc.
Frequency Domain
Transform Coding, Subband Coding
• Hybrid Coding
Perceptual Audio Coding Basics
• Human hearing is limited to frequencies below
~20 kHz in most cases
• Human hearing is insensitive to quiet frequency
components that accompany other, stronger
frequency components
• Stereo audio streams contain largely redundant
information
• MPEG audio compression takes advantage of
these facts to reduce the extent and detail of mostly
inaudible frequency ranges
Generic Perceptual Audio Coding
Architecture
Psychoacoustic Principles
• High-precision engineering models for high-fidelity
audio currently do not exist. So, audio
coding algorithms rely upon generalized receiver
models to optimize coding efficiency.
• In the case of audio, the receiver is ultimately
the human ear and sound perception is affected
by its masking properties.
• Perceptual audio coders achieve compression
by exploiting the fact that “irrelevant” signal
information is not detectable by even a well
trained or sensitive listener.
• Irrelevant signal information is identified during
signal analysis by incorporating into the coder
several psychoacoustic principles, including
absolute hearing thresholds, critical band
frequency analysis, simultaneous masking, the
spread of masking along the basilar membrane,
and temporal masking.
• By combining all these, a quantitative estimate
of the fundamental limit of transparent audio
signal compression, the perceptual entropy, is
determined for a given audio frame.
• Perceptual entropy denotes the minimum number of
bits that must be allocated to a given audio
frame to represent 'perceptually lossless' audio.
Absolute Threshold of Hearing
• The absolute threshold of hearing characterizes
the amount of energy needed in a pure tone
such that it can be detected by a listener in a
noiseless environment.
It can be expressed with a non-linear function (f in Hz),
$$T_q(f) = 3.64\,(f/1000)^{-0.8} - 6.5\,e^{-0.6\,(f/1000 - 3.3)^2} + 10^{-3}\,(f/1000)^4 \quad \text{(dB SPL)}$$
• When applied to signal compression, it could be
interpreted as a maximum allowable energy
level for coding distortions introduced in the
frequency domain.
• Using this information, the quantizer tries to keep its
noise level below this threshold.
• As a result, the quantization noise does not become
audible.
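The threshold formula for Tq(f) can be evaluated directly. The following is a minimal sketch, assuming f is given in Hz; the function name `absolute_threshold_db_spl` is made up for illustration and is not part of any standard.

```python
import math

def absolute_threshold_db_spl(f_hz):
    """Approximate threshold in quiet (dB SPL) for a pure tone at f_hz (Hz)."""
    if f_hz <= 0:
        raise ValueError("frequency must be positive")
    khz = f_hz / 1000.0
    return (3.64 * khz ** -0.8                        # rise toward low frequencies
            - 6.5 * math.exp(-0.6 * (khz - 3.3) ** 2)  # sensitivity dip near 3.3 kHz
            + 1e-3 * khz ** 4)                         # steep rise at high frequencies

if __name__ == "__main__":
    for f in (100, 1000, 3300, 10000, 16000):
        print(f"{f:>6} Hz -> {absolute_threshold_db_spl(f):7.2f} dB SPL")
```

The curve is lowest around 3 to 4 kHz, where the ear is most sensitive, and rises steeply toward both ends of the audible range.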
However …
• The detection threshold for spectrally complex
quantization noise is a modified version of the
absolute threshold, with its shape determined by
the stimuli present at any given time.
• Since stimuli are in general time-varying, the
detection threshold is also a time-varying
function of the input signal.
• A spreading function helps to determine the
modified detection threshold in the
presence of stimuli in a given audio frame.
Critical Bands
• The human ear can be viewed as a discrete set of
band-pass filters that covers the entire 20 kHz
frequency range.
• The inner ear (the cochlea) contains
frequency-sensitive positions. A tone entering
the cochlea travels along it until it reaches
the position where it resonates. (The cochlea works as a
spectrum analyzer.)
• The "critical bandwidth" is a function of
frequency that quantifies the cochlear filter pass
bands. (One critical band corresponds to one Bark.)
• As the center frequency increases, the critical bandwidth also increases.
• Spectral analysis of audio content is performed using
critical bands.
The critical bandwidth at center frequency f (in Hz) is given by
$$BW_c(f) = 25 + 75\,\left[1 + 1.4\,(f/1000)^2\right]^{0.69} \quad \text{(Hz)}$$
To convert a frequency in Hz to Bark:
$$Z(f) = 13\,\arctan(0.00076\,f) + 3.5\,\arctan\left[(f/7500)^2\right] \quad \text{(Bark)}$$
Figure: Idealized critical band filter bank
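The two critical-band formulas can be sketched in a few lines. This is an illustrative sketch only: it assumes f in Hz and uses f/1000 inside the bandwidth term, as in the commonly published form of the expression. The function names are hypothetical.

```python
import math

def hz_to_bark(f_hz):
    """Map a frequency in Hz to the critical-band rate in Bark."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)

def critical_bandwidth_hz(f_hz):
    """Approximate critical bandwidth (Hz) around center frequency f_hz."""
    return 25.0 + 75.0 * (1.0 + 1.4 * (f_hz / 1000.0) ** 2) ** 0.69

if __name__ == "__main__":
    for f in (100, 500, 1000, 4000, 10000):
        print(f"{f:>6} Hz: {hz_to_bark(f):5.2f} Bark, "
              f"bandwidth {critical_bandwidth_hz(f):7.1f} Hz")
```

Below about 500 Hz the bandwidth stays close to 100 Hz; above that it grows with the center frequency, which is why the Bark scale compresses high frequencies.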
Masking
Masking refers to a process where one sound is
rendered inaudible because of the presence of
another sound
• Simultaneous Masking (Frequency domain)
Relative shapes of the masker and maskee magnitude spectra
determine extent of masking
• Non-simultaneous Masking (Time domain)
Phase relationships between masker and maskee determine
masking outcome.
Depending on the nature of the masker and
maskee, the following cases arise:
• Noise Masking Tone (NMT)
• Tone Masking Noise (TMN)
• Noise Masking Noise (NMN)
Noise Masking Tone
Tone Masking Noise
We can see the asymmetry of masking power between noise and tonal maskers.
Significantly greater masking power is associated with noise maskers than with tonal
maskers.
Difference between SMR, NMR and SNR
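The three ratios are commonly related as follows (all in dB, per band): SMR is the signal level relative to the masking threshold, SNR is the signal level relative to the quantization noise, and NMR = SMR - SNR. Quantization noise stays below the masking threshold when NMR is negative. The sketch below assumes the usual rule of thumb of roughly 6.02 dB of SNR per quantizer bit, which is a simplification and not a statement about how any particular encoder allocates bits; the function names are illustrative.

```python
import math

DB_PER_BIT = 6.02  # assumed approximate SNR gain per bit of a uniform quantizer

def noise_to_mask_ratio_db(smr_db, snr_db):
    """NMR = SMR - SNR; negative values mean the noise sits below the mask."""
    return smr_db - snr_db

def min_bits_for_transparency(smr_db):
    """Smallest bit count whose approximate SNR reaches the band's SMR."""
    if smr_db <= 0:
        return 0  # signal already below the masking threshold
    return math.ceil(smr_db / DB_PER_BIT)

if __name__ == "__main__":
    smr = 20.0
    bits = min_bits_for_transparency(smr)
    print(bits, noise_to_mask_ratio_db(smr, bits * DB_PER_BIT))
```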
Spread of Masking
• A masker centered within one critical band has some predictable effect on
detection thresholds in other critical bands. This effect is known as the
spread of masking.
• It is often modeled in coding applications by an approximately triangular
spreading function.
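One analytical form often quoted for such an approximately triangular spreading function (usually attributed to Schroeder) is sketched below. The slides do not give this expression, so treat it as an assumption. Here x is the maskee position minus the masker position in Bark, and the result is the attenuation of the masking effect in dB.

```python
import math

def spreading_db(delta_bark):
    """Assumed Schroeder-style spreading function; delta_bark = maskee - masker (Bark)."""
    x = delta_bark + 0.474
    return 15.81 + 7.5 * x - 17.5 * math.sqrt(1.0 + x * x)

if __name__ == "__main__":
    for d in (-3, -2, -1, 0, 1, 2, 3):
        print(f"{d:+d} Bark -> {spreading_db(d):7.2f} dB")
```

The function peaks near 0 dB at the masker's own band and falls off more steeply toward lower bands (about 25 dB per Bark) than toward higher bands (about 10 dB per Bark), so masking spreads mainly upward in frequency.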
Non-simultaneous Masking
(Temporal Masking)
MPEG Audio Codec Family
• MPEG-1 (ISO/IEC 11172-3) Layer 2 (mp2)
• MPEG-1 Layer 3 (mp3)
• MPEG-2 (ISO/IEC 13818-7) AAC
• MPEG-4 (ISO/IEC 14496-3) AAC
• MPEG-4 HE-AAC
• MPEG-4 HE-AAC v2
MP3 Compression Flow Chart
Figure: MP3 encoder flow chart (QMF filter bank followed by MDCT filter bank)
Layer 3 adds a two-stage filter bank, finer frequency
resolution and improved Huffman coding to the
basic perceptual coder principle
Available bit rates:
• MPEG-1 Layer 3 supports 32, 40, 48, 56, 64, 80, 96, 112,
128, 160, 192, 224, 256 and 320 kbit/s, and the available
sampling frequencies are 32, 44.1 and 48 kHz. 44.1 kHz
is almost always used (it coincides with the sampling rate
of compact discs), and 128 kbit/s has become the de
facto "good enough" standard, although 192 kbit/s is
becoming increasingly popular on peer-to-peer file
sharing networks.
• MPEG-2 and the unofficial MPEG-2.5 add
some lower bit rates (8, 16, 24, 32, 40, 48, 56, 64,
80, 96, 112, 128, 144, 160 kbit/s) while providing lower
sampling frequencies (8, 11.025, 12, 16, 22.05 and 24
kHz).
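To put these bit rates in perspective, they can be compared with uncompressed CD audio. The sketch below assumes 16-bit stereo PCM at 44.1 kHz as the reference (the CD format); the function names are illustrative.

```python
def pcm_bitrate_kbps(sample_rate_hz=44100, bits_per_sample=16, channels=2):
    """Bit rate of uncompressed PCM in kbit/s (defaults assume CD audio)."""
    return sample_rate_hz * bits_per_sample * channels / 1000.0

def compression_ratio(coded_kbps, **pcm_kwargs):
    """How many times smaller the coded stream is than the PCM reference."""
    return pcm_bitrate_kbps(**pcm_kwargs) / coded_kbps

if __name__ == "__main__":
    print(f"CD PCM: {pcm_bitrate_kbps():.1f} kbit/s")
    for rate in (128, 192, 320):
        print(f"{rate} kbit/s -> about {compression_ratio(rate):.1f}:1")
```

Under these assumptions, 128 kbit/s corresponds to roughly an 11:1 reduction relative to CD audio.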
Design limitations of MP3
There are several limitations inherent to the MP3 format that cannot
be overcome by using a better encoder. Newer audio compression
formats such as Vorbis and AAC no longer have these limitations.
In technical terms, MP3 is limited in the following ways:
• Bitrate is limited to a maximum of 320 kbit/s
• Time resolution can be too low for highly transient signals, causing
some smearing of percussive sounds
• Frequency resolution is limited by the small long block window size,
decreasing coding efficiency
• No scale factor band for frequencies above 15.5/15.8 kHz
• Joint stereo is done on a frame-to-frame basis
• Encoder/decoder overall delay is not defined, which means lack of
official provision for gapless playback. However, some encoders
such as LAME can attach additional metadata that will allow players
that are aware of it to deliver gapless playback.
• Nevertheless, a well-tuned MP3 encoder can perform competitively
even with these restrictions.
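The time and frequency resolution limits can be made concrete with a little arithmetic. The sketch below assumes a 44.1 kHz sampling rate and the block sizes quoted in this deck's MP3/AAC comparison (MP3: 576 long, 192 short; AAC: 1024 long, 128 short, counted in spectral lines per block). The function names are illustrative.

```python
def line_spacing_hz(sample_rate_hz, lines_per_block):
    """Spacing between adjacent spectral lines: Nyquist band / number of lines."""
    return (sample_rate_hz / 2.0) / lines_per_block

def block_hop_ms(sample_rate_hz, lines_per_block):
    """Time advance per block in milliseconds (one new sample per spectral line)."""
    return 1000.0 * lines_per_block / sample_rate_hz

if __name__ == "__main__":
    fs = 44100
    for name, long_n, short_n in (("MP3", 576, 192), ("AAC", 1024, 128)):
        print(f"{name}: long block {line_spacing_hz(fs, long_n):.1f} Hz/line, "
              f"short block {block_hop_ms(fs, short_n):.2f} ms/block")
```

With these numbers, AAC's long blocks resolve frequency more finely than MP3's, and its short blocks resolve time more finely.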
Advanced Audio Coding (AAC)
• It is a standardized, lossy digital audio
compression scheme. It was developed with the
cooperation and contributions of companies
including Dolby, Fraunhofer (FhG), AT&T,
Sony and Nokia, and was officially declared an
international standard by the Moving Picture
Experts Group in April 1997.
• Not backward compatible with other MPEG
audio standards (like mp3)
• AAC was promoted as the successor to MP3 for audio
coding at medium to high bitrates.
• AAC follows the same basic coding paradigm as Layer-3
(high frequency resolution filterbank, non-uniform
quantization, Huffman coding, iteration loop structure
using analysis-by-synthesis), but improves on Layer-3 in
many details and uses new coding tools for improved
quality at low bit-rates.
• Its popularity is currently sustained by its being the
default codec of iTunes, the media player that powers the
iPod, the most popular digital audio player on the
market.
• Furthermore, the iTunes Music Store, whose sales
account for 85% of the market for legal online
downloads, sells AAC-encoded songs (protected
with FairPlay Digital Rights Management).
AAC's improvements over MP3
• Sample frequencies from 8 kHz to 96 kHz (official MP3:
16 kHz to 48 kHz)
• Up to 48 channels
• Higher efficiency and simpler filterbank (hybrid → pure
MDCT)
• Higher coding efficiency for stationary signals (blocksize:
576 → 1024 samples)
• Higher coding efficiency for transient signals (blocksize:
192 → 128 samples)
• Can use a Kaiser-Bessel derived (KBD) window function to
reduce spectral leakage at the expense of widening
the main lobe
• Much better handling of frequencies above 16 kHz
• More flexible joint stereo (separate for every scale band)
• Both mid/side coding and intensity coding are
more flexible, so they can be applied more often to reduce the bitrate.
• An optional backward prediction, computed line by line,
achieves better coding efficiency especially for very
tone-like signals. This feature is only available within the
rarely used main profile.
• Improved Huffman coding: in AAC, coding by
quadruples of frequency lines is applied more often. In
addition, the assignment of Huffman code tables to
coder partitions can be much more flexible.
• AAC and HE-AAC are far better than MP3 at very low
bitrates, but at medium to higher bitrates the two formats
are more comparable.
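The Kaiser-Bessel derived window mentioned in the list above can be built from a Kaiser window by cumulative summation. The following is a minimal pure-Python sketch of that construction; the shape parameter `alpha` is a free parameter here, and the value used in the example is illustrative rather than taken from the slides. The helper names are hypothetical.

```python
import math

def bessel_i0(x, terms=50):
    """Zeroth-order modified Bessel function of the first kind (power series)."""
    total, term = 0.0, 1.0
    for k in range(1, terms + 1):
        total += term
        term *= (x / (2.0 * k)) ** 2
    return total

def kbd_window(length, alpha):
    """Kaiser-Bessel derived window of even `length` with shape parameter alpha."""
    if length % 2 != 0 or length < 2:
        raise ValueError("length must be a positive even number")
    half = length // 2
    # Kaiser window of half + 1 points with beta = pi * alpha.
    kaiser = [
        bessel_i0(math.pi * alpha
                  * math.sqrt(max(0.0, 1.0 - (2.0 * n / half - 1.0) ** 2)))
        for n in range(half + 1)
    ]
    total = sum(kaiser)
    # First half: square root of the normalized running sum.
    first_half, running = [], 0.0
    for n in range(half):
        running += kaiser[n]
        first_half.append(math.sqrt(running / total))
    # Second half mirrors the first.
    return first_half + first_half[::-1]

if __name__ == "__main__":
    w = kbd_window(256, alpha=6.0)
    # Power complementarity of overlapping halves, needed for MDCT reconstruction.
    print(max(abs(w[n] ** 2 + w[n + 128] ** 2 - 1.0) for n in range(128)))
```

By construction, the squared window halves sum to one where consecutive blocks overlap, which is the condition an MDCT filterbank needs for perfect reconstruction.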
Modular encoding
AAC takes a modular approach to encoding. Depending on the
complexity of the bitstream to be encoded, the desired performance
and the acceptable output, implementers may create profiles that
define which of a specific set of tools they want to use for a particular
application. The standard offers four default profiles:
• Low Complexity (LC) - the simplest and most widely used and
supported;
• Main Profile (MAIN) - like the LC profile, with the addition of
backwards prediction;
• Scalable Sample Rate (SSR), a.k.a. MPEG-4 AAC-SSR;
• Long Term Prediction (LTP), added in the MPEG-4 standard - an
improvement of the MAIN profile using a forward predictor with lower
computational complexity.
• Depending on the AAC profile and the MP3 encoder, 96 kbit/s AAC
can match or exceed the perceptual quality of 128 kbit/s
MP3.
MPEG-2 AAC Flowchart
MPEG AAC Family
Extensions and Improvements
Some extensions have been added to the original AAC
standard:
• MPEG-4 Scalable To Lossless (SLS);
• High Efficiency AAC (HE-AAC), a.k.a. aacPlus v1 or
AAC+ - the combination of SBR (Spectral Band
Replication) and AAC; used for low bitrates;
• HE-AAC v.2, a.k.a. aacPlus v2 - the combination of
Parametric Stereo (PS) and HE-AAC;
• Perceptual Noise Substitution (PNS);
• Long Term Predictor (LTP) - added in MPEG-4 Part 3.
MPEG AAC Performance
• MPEG AAC provides excellent audio quality. Reaching
perceptually transparent quality at only 64 kbit/s per
channel, it fulfills the requirements for broadcast quality
as defined by the European Broadcasting Union.
• With sampling rates ranging from 8 kHz up to 96 kHz and
above, bit rates up to 256 kbit/s, and support
for up to 48 channels, MPEG AAC is one of the most
flexible audio codecs. The standard also
supports mono, stereo, and all common multi-channel
configurations (e.g. 5.1 or 7.1).
• The low computational demands make AAC the ideal
codec for any low bit rate high-quality audio application.
MPEG-HE AAC
• HE-AAC is the low bit rate codec in the AAC
family and is a combination of the AAC LC
(Advanced Audio Coding Low Complexity) audio
coder and the SBR (Spectral Band Replication)
bandwidth expansion tool.
• This combination achieves good stereo quality
already at bit rates of 32 to 48 kbit/s. HE-AAC is
also known as aacPlus and can be used in multichannel operations.
MPEG-4 HE-AAC v2
• Combined with parametric stereo, the HE-AAC
codec provides good audio quality
starting at bit rates around 16 to 24 kbit/s
for stereo content.
• HE-AAC v2 is also known as aacPlus v2.
Rough work …
• Explain basic psychoacoustic principles –
Absolute threshold of hearing, Critical bands,
Phenomenon of masking – Simultaneous,
Masking asymmetry, Spread of masking, Non-simultaneous, Perceptual Entropy
• MPEG audio codec family – mp3, mp2 AAC,
mp4 AAC, advanced AAC plus version 1,
advanced AAC plus version 2
(mention features present/absent in each)
• Limitations of mp3
• What is different in AAC ?
• Features in AAC
• Explain each feature in detail (mp2, mp4)