Presentation 2: Audio Compression Techniques

AUDIO COMPRESSION
TOOLS & TECHNIQUES
Gautam Bhattacharya
CD Quality
CD Audio:
2 Channel (stereo)
16 bit encoding
44.1 kHz sampling rate
Data Rate:
This leads to a data rate of roughly 1.4 - 1.54 Mbps (the raw PCM stream alone is 2 channels × 16 bits × 44,100 samples/s ≈ 1.41 Mbps)
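The raw PCM figure follows directly from the three CD parameters listed on this slide. A minimal sketch of the arithmetic (the function name `pcm_bit_rate` is only illustrative):

```python
def pcm_bit_rate(channels, bits_per_sample, sample_rate_hz):
    """Bit rate of uncompressed PCM audio, in bits per second."""
    return channels * bits_per_sample * sample_rate_hz

# CD audio: 2 channels, 16 bits per sample, 44.1 kHz sampling rate
cd_rate = pcm_bit_rate(2, 16, 44_100)
print(f"{cd_rate} bit/s = {cd_rate / 1e6:.4f} Mbps")  # 1411200 bit/s = 1.4112 Mbps
```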
AUDIO ENCODER!
Bit rate: as low as 1 bit per sample or less
Based on a Perceptual model
Capable of high fidelity audio
Lossy or Lossless?
‘CD Quality’
Many perceptual tests were conducted to
verify the quality of the audio output
Takes advantage of perceptual
irrelevancies as well as statistical
redundancies.
Moving Picture Experts Group
MPEG
• MPEG is a family of encoding standards for digital multimedia information
• MPEG-1: a standard for storage and retrieval of moving pictures and audio on storage media (e.g., CD-ROM).
    • Layer I
    • Layer II
    • Layer III (aka MP3)
• MPEG-2: a standard for digital television, including high-definition television (HDTV), and for addressing multimedia applications.
    • Advanced Audio Coding (AAC)
• MPEG-4: a standard for multimedia applications, with very low bit-rate audio-visual compression for channels with very limited bandwidth (e.g., wireless channels).
• MPEG-7: a content representation standard for information search
Back to the Encoder!
Generic Audio Encoder Architecture
Painter, T. & Spanias, A. Perceptual Coding of Digital Audio, Proceedings of IEEE, 2000
Psychoacoustic Model
Critical Listening Threshold:
The absolute threshold of hearing is defined as the amount of energy needed in a
pure tone such that it can be detected by a listener in a noiseless environment.
This criterion assumes that the volume control on the decoder will be set such
that the smallest possible output signal is presented at 0 dB SPL.
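The slide does not give a formula for the threshold curve. A commonly cited closed-form approximation (attributed to Terhardt and reproduced in the Painter & Spanias paper) can be sketched as follows; treat the function name and the choice of this particular approximation as assumptions, not as part of the original presentation:

```python
import math

def absolute_threshold_db_spl(freq_hz):
    """Approximate threshold in quiet (dB SPL) for a pure tone at freq_hz.

    Assumption: the widely used Terhardt-style approximation, valid for a
    young listener with acute hearing; it is not taken from the slides.
    """
    khz = freq_hz / 1000.0
    return (3.64 * khz ** -0.8
            - 6.5 * math.exp(-0.6 * (khz - 3.3) ** 2)
            + 1e-3 * khz ** 4)

for f in (100, 1000, 3300, 10000):
    print(f, round(absolute_threshold_db_spl(f), 2))
```

The curve is lowest (the ear is most sensitive) around 3 to 4 kHz and rises steeply toward both ends of the audible range, which is why coding noise is easier to hide at very low and very high frequencies.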
Psychoacoustic Model
Absolute threshold of hearing
Painter, T. & Spanias, A. Perceptual Coding of Digital Audio, Proceedings of IEEE, 2000, Vol. 88(No. 4)
Psychoacoustic Model
Critical Bands
The ear has a limited frequency selectivity that varies in acuity from less
than 100 Hz for the lowest audible frequencies to more than 4 kHz for the
highest.
As a result the audible spectrum can be partitioned into critical bands that
reflect the resolving power of the ear as a function of frequency.
Due to this limited frequency resolving power, the threshold for noise
masking at any given frequency is solely dependent on the signal activity
within a critical band of that frequency.
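To make the growth of critical-band width with frequency concrete, here is a small sketch using two commonly quoted approximations (Zwicker's Bark-scale mapping and critical-bandwidth formula, as reproduced in the Painter & Spanias paper). The formulas and function names are assumptions added for illustration; they do not appear on the slides:

```python
import math

def bark(freq_hz):
    """Map frequency in Hz to critical-band rate in Bark (approximation)."""
    return (13.0 * math.atan(0.00076 * freq_hz)
            + 3.5 * math.atan((freq_hz / 7500.0) ** 2))

def critical_bandwidth_hz(freq_hz):
    """Approximate critical bandwidth (Hz) centred at freq_hz."""
    return 25.0 + 75.0 * (1.0 + 1.4 * (freq_hz / 1000.0) ** 2) ** 0.69

for f in (100, 1000, 4000, 16000):
    print(f, round(bark(f), 2), round(critical_bandwidth_hz(f)))
```

With these approximations the bandwidth is on the order of 100 Hz at low frequencies and grows to several kHz near the top of the audible range.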
Psychoacoustic Model
MPEG/Audio Filter Banks vs. Critical Bands
Pan, D.Y. Digital Audio Compression, Digital Technical Journal, 1993, Vol. 5
Psychoacoustic Model
Auditory Masking
Auditory Masking is a perceptual weakness of the ear that occurs whenever the
presence of a strong audio signal makes a spectral neighbourhood of weaker
audio signals imperceptible.
Two types of Masking:
* Simultaneous Masking
* Temporal Masking
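For simultaneous masking, the way a masker's influence falls off across neighbouring critical bands is often modelled with a spreading function. The sketch below uses one common analytical form (the Schroeder spreading function, as given in the Painter & Spanias paper). It is an assumption chosen for illustration, not the model prescribed by any particular MPEG layer:

```python
import math

def spreading_db(delta_bark):
    """Relative masking level (dB) at a distance of delta_bark from the masker.

    delta_bark > 0 means the maskee lies above the masker in frequency.
    Assumption: Schroeder's analytical spreading function.
    """
    x = delta_bark + 0.474
    return 15.81 + 7.5 * x - 17.5 * math.sqrt(1.0 + x * x)

for dz in (-2, -1, 0, 1, 2):
    print(dz, round(spreading_db(dz), 1))
```

The function is close to 0 dB at the masker itself and falls off more steeply below the masker than above it, reflecting that a tone masks higher frequencies more effectively than lower ones.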
Psychoacoustic Model
Audio Masking
Pan, D.Y. Digital Audio Compression, Digital Technical Journal, 1993, Vol. 5
Psychoacoustic Model
Perceptual Entropy
Johnston at Bell Labs has combined notions of psychoacoustic masking with
signal quantization principles to define perceptual entropy (PE), a measure of
perceptually relevant information contained in any audio record.
Expressed in bits per sample, PE represents a theoretical limit on the
compressibility of a particular signal.
Reported PE measurements suggest that a wide variety of CD-quality audio
source material can be transparently compressed at approximately 2.1 bits per
sample.
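To put the 2.1 bits-per-sample figure in perspective, it can be converted into a bit rate and a compression ratio relative to 16-bit CD audio. A minimal sketch of that arithmetic (the helper name is illustrative, and the 44.1 kHz rate is taken from the CD parameters earlier in the deck):

```python
def bits_per_sample_to_kbps(bits_per_sample, sample_rate_hz=44_100, channels=1):
    """Convert an average bits-per-sample figure into kbit/s."""
    return bits_per_sample * sample_rate_hz * channels / 1000.0

mono_kbps = bits_per_sample_to_kbps(2.1)                # about 92.6 kbps per channel
stereo_kbps = bits_per_sample_to_kbps(2.1, channels=2)  # about 185.2 kbps for stereo
ratio = 16 / 2.1                                        # about 7.6 : 1 versus 16-bit PCM
print(round(mono_kbps, 1), round(stereo_kbps, 1), round(ratio, 1))
```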
Time - Frequency Analysis
Filter Banks
The filter bank divides the signal spectrum into frequency sub-bands and
generates a time-indexed series of coefficients representing the frequency
localized signal power within each band.
Masking thresholds are applied to the resulting frequency sub-band signals.
By providing explicit information about the distribution of signal power, and hence
masking power, over the time-frequency plane, the filter bank plays an essential
role in identifying perceptual irrelevancies when used in conjunction with
a perceptual model.
Time - Frequency Analysis
Pseudo QMF - M Band Banks
Used in all MPEG-1 encoders
The signal is separated into sub-bands of equal width across the entire
frequency range
The resulting sub-band signals are then down-sampled to conserve
bandwidth (they are up-sampled again at the decoder)
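The equal-width split followed by down-sampling can be sketched with a cosine-modulated analysis bank. This is only an illustration under stated assumptions: the prototype low-pass filter here is a Hann-windowed sinc chosen for simplicity (not the prototype tabulated in the MPEG-1 standard), the defaults of 32 bands and 512 taps mirror the sizes usually quoted for MPEG-1, and the function name is hypothetical.

```python
import numpy as np

def cosine_modulated_analysis(x, num_bands=32, taps=512):
    """Split x into num_bands equal-width sub-bands, each down-sampled by num_bands."""
    n = np.arange(taps)
    center = (taps - 1) / 2.0
    # Prototype low-pass filter (assumption: Hann-windowed sinc).
    # Its cutoff is pi/(2M) rad/sample, i.e. 1/(4M) cycles per sample.
    fc = 1.0 / (4.0 * num_bands)
    prototype = 2.0 * fc * np.sinc(2.0 * fc * (n - center)) * np.hanning(taps)

    subbands = []
    for k in range(num_bands):
        # Shift the prototype to the centre of band k: (k + 0.5) * pi / M.
        phase = (-1) ** k * np.pi / 4.0
        h_k = 2.0 * prototype * np.cos(
            np.pi / num_bands * (k + 0.5) * (n - center) + phase)
        filtered = np.convolve(x, h_k)
        subbands.append(filtered[::num_bands])  # keep every M-th sample
    return np.array(subbands)

# Usage: a tone placed at the centre of band 5 should land mostly in band 5.
t = np.arange(4096)
tone = np.cos(np.pi * 5.5 / 32 * t)
energy = np.sum(cosine_modulated_analysis(tone) ** 2, axis=1)
print(int(np.argmax(energy)))
```

Because each of the M sub-bands is down-sampled by M, the total number of sub-band samples stays roughly equal to the number of input samples, which is the sense in which bandwidth is conserved.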
Pre Echo Distortion
Pre-echoes occur when a signal with a sharp attack begins near the end of a
transform block immediately following a region of low energy.
This situation can arise when coding recordings of percussive instruments such as
the triangle, the glockenspiel, or the castanets
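The mechanism can be shown numerically: quantization is applied to transform coefficients, so after the inverse transform the quantization error is spread over the whole block, including the quiet part before the attack. The sketch below is an assumption-laden illustration, using a plain FFT and a uniform quantizer as stand-ins for the transform and quantizer of a real codec; the block length, attack position and step size are arbitrary.

```python
import numpy as np

def pre_echo_energy(block_len=1024, attack_at=900, step=2.0):
    """Return (energy before the attack in the original, same after coding)."""
    n = np.arange(block_len - attack_at)
    x = np.zeros(block_len)
    # Silence, then a sharp attack (decaying sinusoid) near the end of the block.
    x[attack_at:] = np.exp(-n / 40.0) * np.sin(2 * np.pi * 0.1 * n)

    spectrum = np.fft.rfft(x)
    # Coarse uniform quantization of real and imaginary parts (stand-in quantizer).
    quantized = step * (np.round(spectrum.real / step)
                        + 1j * np.round(spectrum.imag / step))
    y = np.fft.irfft(quantized, n=block_len)

    before = float(np.sum(x[:attack_at] ** 2))
    after = float(np.sum(y[:attack_at] ** 2))
    return before, after

before, after = pre_echo_energy()
print(before, after)
```

The original block is exactly silent before the attack, while the decoded block contains noise there. Heard in time, that noise precedes the transient, which is what makes it audible as a pre-echo.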
Painter, T. & Spanias, A. Perceptual Coding of Digital Audio, Proceedings of IEEE, 2000
AUDIO COMPRESSION
THANK YOU!