CS 414 – Multimedia Systems Design

Lecture 11 – MP3 and MP4

Audio (Part 7)

Klara Nahrstedt

Spring 2012

CS 414 - Spring 2012

Administrative

 MP1 – deadline February 18

Outline

 MP3 Audio Encoding

 MP4 Audio

 Reading:

Media Coding book, Section 7.7.2 – 7.7.5

Recommended Paper on MP3: Davis Pan, “A Tutorial on MPEG/Audio Compression”, IEEE Multimedia, pp. 60-74, 1995

 Recommended books on JPEG/ MPEG Audio/Video Fundamentals:

Haskell, Puri, Netravali, “Digital Video: An Introduction to MPEG-2”, Chapman and Hall, 1996

Why Compression is Needed

 Data rate = sampling rate * quantization bits * channels (+ control information)

 For example (digital audio):

 44100 Hz; 16 bits; 2 channels

 generates about 1.4 Mbit of data per second; about 84.7 Mbit per minute; about 5.1 Gbit per hour
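The arithmetic above can be checked with a short Python sketch. The helper name `pcm_data_rate` is hypothetical, and control information is ignored.

```python
def pcm_data_rate(sampling_rate_hz, bits_per_sample, channels):
    """Uncompressed PCM data rate in bits per second (control information ignored)."""
    return sampling_rate_hz * bits_per_sample * channels


# CD-quality stereo: 44100 Hz, 16 bits, 2 channels
rate = pcm_data_rate(44100, 16, 2)
per_minute_mbit = rate * 60 / 1e6
per_hour_gbit = rate * 3600 / 1e9

print(rate)             # bits per second
print(per_minute_mbit)  # Mbit per minute
print(per_hour_gbit)    # Gbit per hour
```

The figures are in bits, not bytes; dividing by 8 gives roughly 176 kB per second.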

MPEG-1 Audio

 Lossy compression of audio

In the late 1980s, ISO’s MPEG group started to standardize

 TV broadcasting

 Use of Audio on CD-ROM (later DVD)

 MPEG-1 Audio – 1992

 MPEG-2 Audio - 1994

 MPEG-1 Audio Layer I, II, III

Criteria for A Good Standard

Achieve desired outcome

Be comprehensible

Allow efficient implementation

Support competition

Give benchmark tests

Be supported by industry

Be good for end users

….

Two models: implement first, then standardize; or standardize first, then implement

MPEG-1 Audio Layer II

 Called MP2

 Dominant standard for audio broadcasting

 DAB digital radio and DVB digital television

 Came out of MUSICAM codecs with bit rates of 64-192 kbps

 MUSICAM audio coding - basis for MPEG-1 and MPEG-2 audio

Sampling rates: 32, 44.1, 48 kHz

Bit rates: 32, 48, 56, 64, 80, 96, … 384 kbps

Format: mono, stereo, dual channel, …

 MP2 – sub-band audio encoder in time domain

MPEG-1 Audio Layer III

 MPEG-1 Layer III is called MP3 format

 Popular for PC and Internet applications

 Goal is to compress to 128 kbps, but higher or lower bit rates can be used, with correspondingly higher or lower quality

 Utilization of psychoacoustics

 Scientific study of sound perception
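As a rough illustration of what the 128 kbps goal implies, the sketch below compares it with uncompressed CD-quality stereo (44100 Hz, 16 bits, 2 channels). Treating 128 kbps as the total rate of the stereo stream is an assumption, and `compression_ratio` is a hypothetical helper.

```python
def compression_ratio(pcm_bps, coded_bps):
    """Ratio of uncompressed to compressed bit rate."""
    return pcm_bps / coded_bps


cd_bps = 44100 * 16 * 2            # uncompressed CD-quality stereo, bits per second
ratio = compression_ratio(cd_bps, 128_000)
print(ratio)                       # roughly 11, i.e. about an 11:1 reduction
```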

MPEG Audio – MP3

 First psychoacoustic masking codec was proposed in 1979 at AT&T Bell Labs, Murray Hill.

 MP3 based on OCF (optimum coding in frequency domain) and PXFM (Perceptual transform coding)

 MPEG-1 Audio Layer III – public release 1993

 MPEG-2 Audio Layer III – public release 1995

MPEG Audio – MP3

 1997 – mp3.com – offering thousands of

MP3s created by independent artists for free

 1999 – Napster MP3 peer-to-peer file sharing

 Problem: copyright infringement

 Authorized services: Amazon.com, Rhapsody, Juno Records, ..

MPEG-1 Audio Encoding

 Characteristics

 Precision 16 bits

 Sampling frequency: 32 kHz, 44.1 kHz, 48 kHz

 3 compression layers: Layer 1, Layer 2, Layer 3 (MP3)

 Layer 3: 32-320 kbps, target 64 kbps per channel

 Layer 2: 32-384 kbps, target 128 kbps per channel

 Layer 1: 32-448 kbps, target 192 kbps per channel

MPEG Audio Encoding Steps

MPEG Audio Filter Bank

 Filter bank divides input into multiple sub-bands

(32 equal frequency sub-bands)

 Sub-band i defined as

$$S_t[i] = \sum_{k=0}^{63} \sum_{j=0}^{7} \cos\left(\frac{(2i+1)(k-16)\pi}{64}\right) \cdot \left(C[k+64j] \cdot x[k+64j]\right)$$

where $i \in [0, 31]$, $S_t[i]$ is the filter output sample for sub-band i at time t, C[n] is one of 512 coefficients, and x[n] is an audio input sample from the 512-sample buffer
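The sub-band formula can be evaluated directly, as in the minimal Python sketch below. It is a literal, unoptimized evaluation for illustration only. The real coefficients C[n] come from a table in the standard that is not reproduced here, so `C_placeholder` (all ones) is a stand-in, and the function names are hypothetical.

```python
import math


def subband_sample(i, x, C):
    """Output sample S_t[i] for sub-band i from a 512-sample buffer x and coefficients C."""
    if not 0 <= i <= 31:
        raise ValueError("sub-band index must be in [0, 31]")
    if len(x) != 512 or len(C) != 512:
        raise ValueError("x and C must each hold 512 values")
    total = 0.0
    for k in range(64):
        # Cosine modulation term for sub-band i and tap k
        m = math.cos((2 * i + 1) * (k - 16) * math.pi / 64)
        for j in range(8):
            n = k + 64 * j
            total += m * C[n] * x[n]
    return total


def analyze(x, C):
    """One output sample for each of the 32 sub-bands."""
    return [subband_sample(i, x, C) for i in range(32)]


# Placeholder coefficients: NOT the table from the standard, just a stand-in.
C_placeholder = [1.0] * 512

# Unit impulse at n = 16, where the cosine argument is zero for every sub-band.
x = [0.0] * 512
x[16] = 1.0
outputs = analyze(x, C_placeholder)
```

Each call consumes the current 512-sample buffer and produces one sample per sub-band. In an encoder the buffer would be shifted by new input samples before the next call.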

MPEG Audio Psycho-acoustic Model

 MPEG audio compresses by removing acoustically irrelevant parts of audio signals

 Takes advantage of the human auditory system’s inability to hear quantization noise under auditory masking

 Auditory masking: occurs whenever the presence of a strong audio signal makes a temporal or spectral neighborhood of weaker audio signals imperceptible.

Loudness and Pitch

(Review on Psychoacoustic Effects)

 More sensitive to loudness at mid frequencies than at other frequencies

 intermediate frequencies at [500 Hz, 5000 Hz]

 Human hearing frequencies at [20 Hz, 20000 Hz]

 Perceived loudness of a sound changes based on frequency of that sound

 basilar membrane reacts more to intermediate frequencies than other frequencies

Fletcher-Munson Contours

Each contour represents an equal perceived loudness

Perception sensitivity (loudness) is not linear across all frequencies and intensities

Masking Effects

(Review of Psychoacoustic Effects)

Frequency masking

Temporal masking

MPEG/audio divides audio signal into frequency sub-bands that approximate critical bands. Then we quantize each sub-band according to the audibility of quantization noise within the band

MPEG Audio Bit Allocation

This process determines number of code bits allocated to each sub-band based on information from the psychoacoustic model

Algorithm:

1. Compute mask-to-noise ratio: MNR = SNR - SMR (in dB), where SMR is the signal-to-mask ratio from the psychoacoustic model

 Standard provides tables that give estimates for SNR resulting from quantizing to a given number of quantizer levels

2. Get MNR for each sub-band

3. Search for sub-band with the lowest MNR

4. Allocate code bits to this sub-band

 When a sub-band is allocated more code bits, look up the new estimate of SNR, recompute its MNR, and repeat from step 1 until no more code bits can be allocated
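The loop above can be sketched as a greedy allocation in Python. This is a simplified illustration, not the procedure from the standard: the SNR lookup table is replaced by an assumed 6 dB per code bit, bits are handed out one at a time, and the SMR values, the per-band cap, and the name `allocate_bits` are all hypothetical.

```python
def allocate_bits(smr_db, total_bits, max_bits_per_band=15, snr_per_bit_db=6.0):
    """Greedily give one code bit at a time to the sub-band with the lowest MNR.

    smr_db: signal-to-mask ratio per sub-band in dB (from a psychoacoustic model).
    SNR is approximated as snr_per_bit_db * bits, standing in for the standard's tables.
    """
    bits = [0] * len(smr_db)
    for _ in range(total_bits):
        # Only sub-bands that can still accept more bits are candidates
        open_bands = [b for b in range(len(smr_db)) if bits[b] < max_bits_per_band]
        if not open_bands:
            break
        # MNR = SNR - SMR; the lowest MNR marks the most audible quantization noise
        worst = min(open_bands, key=lambda b: snr_per_bit_db * bits[b] - smr_db[b])
        bits[worst] += 1
    return bits


# Three sub-bands with made-up SMR values and a budget of four code bits
allocation = allocate_bits([20.0, 5.0, -10.0], total_bits=4)
print(allocation)  # [3, 1, 0]
```

The sub-band with SMR of -10 dB receives no bits because its signal already lies below the masking threshold, which is where the compression gain comes from.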

Audio Quality

 Bitrate

 With too low bit rate, we get compression artifacts

 Ringing

 Pre-echo – sound is heard before it occurs. It is most noticeable in impulsive sounds from percussion instruments such as cymbals

 Occurs in transform-based audio compression algorithms

 Quality of encoder and encoding parameters

 Constant Bit rate encoding

 Variable Bit rate encoding

MP3 Audio Format

Source: http://wiki.hydrogenaudio.org/images/e/ee/Mp3filestructure.jpg

MPEG Audio Comments

 Precision of 16 bits per sample is needed to get good SNR

 Noise we are getting is quantization noise from the digitization process

 For each added bit, we get about 6 dB better SNR

 Masking effect means that we can raise the noise floor around a strong sound because the noise will be masked

 Raising the noise floor is the same as using fewer bits, and using fewer bits is the same as compression
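The 6 dB-per-bit rule can be checked with the common approximation SNR ≈ 6.02 N + 1.76 dB for an ideal N-bit uniform quantizer. The full-scale sine wave input is an assumption of that approximation, not something the slides state, and `quantization_snr_db` is a hypothetical helper.

```python
import math


def quantization_snr_db(bits):
    """Approximate SNR in dB of an ideal uniform quantizer for a full-scale sine wave."""
    return 20 * math.log10(2) * bits + 10 * math.log10(1.5)


snr_16 = quantization_snr_db(16)
gain_per_bit = quantization_snr_db(17) - quantization_snr_db(16)
print(round(snr_16, 2))        # about 98 dB for 16-bit samples
print(round(gain_per_bit, 2))  # about 6 dB per added bit
```

Read in reverse, every bit removed from a sub-band raises its noise floor by about 6 dB, which is acceptable as long as the noise stays below the masking threshold.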

Successor of MP3

 Advanced Audio Coding (AAC) – now part of MPEG-4 Audio

 Supports up to 48 full-bandwidth audio channels

 Default audio format for iPhone, iPad, Nintendo, PlayStation, Nokia, Android, BlackBerry

 Introduced 1997 as MPEG-2 Part 7

 In 1999 – updated and included in MPEG-4

AAC’s Improvements over MP3

 More sample frequencies (8-96 kHz)

 Arbitrary bit rates and variable frame length

 Higher efficiency and simpler filterbank

 Uses pure MDCT (modified discrete cosine transform)

 MDCT is also used in Windows Media Audio

MPEG-4 Audio

 Variety of applications

 General audio signals

 Speech signals

 Synthetic audio

 Synthesized speech (structured audio)

MPEG-4 Audio Part 3

 Includes variety of audio coding technologies

 Lossy speech coding (e.g., CELP)

 CELP – code-excited linear prediction – speech coding

 General audio coding (AAC)

 Lossless audio coding

 Text-to-Speech interface

 Structured Audio (e.g., MIDI)

MPEG-4 Part 14

 Called MP4, with file extension .mp4

 Multimedia container format

 Stores digital video and audio streams and allows streaming over Internet

 Container or wrapper format

 Meta-file format whose specification describes how different data elements and metadata coexist in a computer file

MPEG-4 Audio

 Bit-rate 2-64 kbps

 Scalable for variable rates

 MPEG-4 defines set of coders

 Parametric Coding Techniques: low bit-rate 2-6 kbps, 8 kHz sampling frequency

 Code Excited Linear Prediction: medium bit-rates 6-24 kbps, 8 and 16 kHz sampling rate

 Time Frequency Techniques: high quality audio, 16 kbps and higher bit-rates, sampling rate > 7 kHz

Conclusion

 MPEG Audio is an integral part of the MPEG standard to be considered together with video

 MPEG-4 Audio represents a major extension in terms of capabilities to MPEG-1 Audio
