
CS 414 – Multimedia Systems Design

Lecture 11 – MP3 and MP4

Audio (Part 7)

Klara Nahrstedt

Spring 2012

CS 414 - Spring 2012

Administrative

 MP1 – deadline February 18


Outline

 MP3 Audio Encoding

 MP4 Audio

 Reading:





Media Coding book, Section 7.7.2 – 7.7.5

Recommended Paper on MP3: Davis Pan, “A Tutorial on MPEG/Audio Compression”, IEEE Multimedia, pp. 60-74, 1995

 Recommended books on JPEG/ MPEG Audio/Video Fundamentals:



Haskell, Puri, Netravali, “Digital Video: An Introduction to MPEG-2”,

Chapman and Hall, 1996


Why Compression is Needed

 Data rate = sampling rate * quantization bits * channels (+ control information)

 For example (digital audio):

 44100 Hz; 16 bits; 2 channels

 generates about 1.4 Mbit of data per second; about 85 Mbit per minute; about 5 Gbit per hour
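The data-rate arithmetic can be checked with a short sketch. The function name `pcm_data_rate_bps` is an illustrative assumption, and the control-information overhead from the formula is ignored.

```python
def pcm_data_rate_bps(sampling_rate_hz, bits_per_sample, channels):
    """Raw PCM data rate in bits per second (control information ignored)."""
    return sampling_rate_hz * bits_per_sample * channels


# CD-quality audio: 44100 Hz, 16 bits, 2 channels
rate = pcm_data_rate_bps(44100, 16, 2)
print(f"{rate / 1e6:.2f} Mbit per second")       # 1.41
print(f"{rate * 60 / 1e6:.1f} Mbit per minute")  # 84.7
print(f"{rate * 3600 / 1e9:.2f} Gbit per hour")  # 5.08
```

Note that the figures are in bits, not bytes; dividing by 8 gives roughly 176 kB per second.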

MPEG-1 Audio

 Lossy compression of audio



In the late 1980s, ISO’s MPEG group started standardization work for

 TV broadcasting

 Use of Audio on CD-ROM (later DVD)

 MPEG-1 Audio – 1992

 MPEG-2 Audio - 1994

 MPEG-1 Audio Layer I, II, III


Criteria for A Good Standard

















Achieve desired outcome

Be comprehensible

Allow efficient implementation

Support competition

Give benchmark tests

Be supported by industry

Be good for end users

….







Two models: implement first, then standardize; or standardize first, then implement

MPEG-1 Audio Layer II

 Called MP2

 Dominant standard for audio broadcasting

 DAB digital radio and DVB digital television

 Came out of MUSICAM codecs with bit rates

64-192 kbps

 MUSICAM audio coding - basis for MPEG-1 and MPEG-2 audio







Sampling rates: 32, 44.1, 48 kHz

Bit rates: 32, 48, 56, 64, 80, 96, … 384 kbps

Format: mono, stereo, dual channel, …

 MP2 – sub-band audio encoder in time domain

MPEG-1 Audio Layer III

 MPEG-1 Layer III is called MP3 format

 Popular for PC and Internet applications

 Typical target bit rate is 128 kbps, but audio can be encoded at higher or lower bit rates, with correspondingly higher or lower quality

 Utilization of psychoacoustics

 Scientific study of sound perception


MPEG Audio – MP3

 First psychoacoustic masking codec was proposed in 1979 at AT&T Bell Labs, Murray Hill.

 MP3 is based on OCF (optimum coding in the frequency domain) and PXFM (perceptual transform coding)

 MPEG-1 Audio Layer III – public release 1993

 MPEG-2 Audio Layer III – public release 1995


MPEG Audio – MP3

 1997 – mp3.com – offering thousands of

MP3s created by independent artists for free

 1999 – Napster MP3 peer-to-peer file sharing

 Problem: copyright infringement

 Authorized services: Amazon.com,

Rhapsody, Juno Records, ..


MPEG-1 Audio Encoding

 Characteristics

 Precision 16 bits

 Sampling frequency: 32 kHz, 44.1 kHz, 48 kHz

 3 compression layers: Layer 1, Layer 2, Layer 3 (MP3)

 Layer 3: 32-320 kbps, target 64 kbps per channel




MPEG Audio Encoding Steps


MPEG Audio Filter Bank

 Filter bank divides input into multiple sub-bands

(32 equal frequency sub-bands)

 Sub-band i defined

$$S_t[i] = \sum_{k=0}^{63} \sum_{j=0}^{7} \cos\left(\frac{(2i+1)(k-16)\pi}{64}\right) \cdot \left( C[k+64j] \cdot x[k+64j] \right)$$

where $i \in [0, 31]$; $S_t[i]$ is the filter output sample for sub-band $i$ at time $t$; $C[n]$ is one of 512 coefficients; $x[n]$ is an audio input sample from the 512-sample buffer
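A minimal sketch of the filter-bank sum, assuming k runs over 0..63 and j over 0..7 as in Pan's tutorial. The function name `analysis_filter_bank` is illustrative, and the coefficients used in the toy check are placeholders, not the 512-entry table defined by the standard.

```python
import math


def analysis_filter_bank(x, C):
    """Compute the 32 sub-band output samples S_t[i] from a 512-sample buffer.

    x: 512 most recent audio input samples
    C: 512 analysis window coefficients (the standard defines these in a
       table; the caller must supply them)
    """
    if len(x) != 512 or len(C) != 512:
        raise ValueError("x and C must each hold 512 values")

    # The inner sum over j does not depend on i, so compute it once:
    # z[k] = sum_{j=0}^{7} C[k + 64j] * x[k + 64j]
    z = [sum(C[k + 64 * j] * x[k + 64 * j] for j in range(8)) for k in range(64)]

    # Outer sum over k with the cosine modulation term for each sub-band i
    return [
        sum(math.cos((2 * i + 1) * (k - 16) * math.pi / 64) * z[k] for k in range(64))
        for i in range(32)
    ]


# Toy check with placeholder coefficients (NOT the standard's table):
# a unit impulse at x[16] with C[n] = 1 hits cos(0) = 1 in every sub-band.
x = [0.0] * 512
x[16] = 1.0
C = [1.0] * 512
S = analysis_filter_bank(x, C)
print(len(S), S[0], S[31])
```

Factoring out the inner sum reduces the work per output frame from 32 × 512 multiply-adds to 512 plus 32 × 64, which is one reason the formula is usually implemented this way.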


MPEG Audio Psycho-acoustic

Model

 MPEG audio compresses by removing acoustically irrelevant parts of audio signals

 Takes advantage of the human auditory system’s inability to hear quantization noise under auditory masking

 Auditory masking: occurs whenever the presence of a strong audio signal makes a temporal or spectral neighborhood of weaker audio signals imperceptible.


Loudness and Pitch

(Review on Psychoacoustic Effects)

 More sensitive to loudness at mid frequencies than at other frequencies

 Intermediate frequencies: [500 Hz, 5000 Hz]

 Human hearing range: [20 Hz, 20000 Hz]

 Perceived loudness of a sound changes with the frequency of that sound

 Basilar membrane reacts more to intermediate frequencies than to other frequencies


Fletcher-Munson Contours

Each contour represents equal perceived loudness

Perception sensitivity (loudness) is not linear across all frequencies and intensities


Masking Effects

(Review of Psychoacoustic Effects)

Frequency masking

Temporal masking


MPEG/audio divides audio signal into frequency sub-bands that approximate critical bands. Then we quantize each sub-band according to the audibility of quantization noise within the band


MPEG Audio Bit Allocation





This process determines number of code bits allocated to each sub-band based on information from the psychoacoustic model

Algorithm:

1. Compute mask-to-noise ratio: MNR = SNR - SMR (all in dB)

 Standard provides tables that give estimates for SNR resulting from quantizing to a given number of quantizer levels

2. Get MNR for each sub-band

3. Search for sub-band with the lowest MNR

4. Allocate code bits to this sub-band.

 When a sub-band is allocated more code bits, look up the new estimate of SNR and repeat from step 1 until no more code bits can be allocated
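The allocation loop can be sketched as a greedy procedure. This is a simplified illustration, not the standard's exact algorithm: the SNR tables are replaced by an assumed 6 dB per bit, the SMR values are hypothetical, and `allocate_bits` with its parameters is a name chosen for this example.

```python
def allocate_bits(smr_db, total_bits, max_bits_per_band=15, snr_per_bit_db=6.0):
    """Greedy bit allocation: repeatedly give one bit to the sub-band with the lowest MNR.

    smr_db: signal-to-mask ratio per sub-band (from the psychoacoustic model)
    total_bits: number of code bits available (one bit per step, simplified)
    snr_per_bit_db: stand-in for the standard's SNR tables (assumption)
    """
    bits = [0] * len(smr_db)

    def mnr(band):
        # MNR = SNR - SMR, with SNR estimated from the bits allocated so far
        return snr_per_bit_db * bits[band] - smr_db[band]

    for _ in range(total_bits):
        # Only sub-bands that can still accept bits are candidates
        candidates = [b for b in range(len(smr_db)) if bits[b] < max_bits_per_band]
        if not candidates:
            break
        worst = min(candidates, key=mnr)  # sub-band with the lowest MNR
        bits[worst] += 1                  # allocate one more code bit
    return bits


# Hypothetical SMR values (dB) for four sub-bands
print(allocate_bits([20.0, 5.0, -3.0, 12.0], total_bits=6))  # [3, 1, 0, 2]
```

In this toy run the sub-band with SMR of -3 dB receives no bits at all: its signal already lies below the masking threshold, so quantization noise there is inaudible.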


Audio Quality

 Bitrate

 With too low bit rate, we get compression artifacts

 Ringing

 Pre-echo – sound is heard before it occurs. It is most noticeable in impulsive sounds from percussion instruments such as cymbals

 Occurs in transform-based audio compression algorithms

 Quality of encoder and encoding parameters

 Constant Bit rate encoding

 Variable Bit rate encoding


MP3 Audio Format

Source: http://wiki.hydrogenaudio.org/images/e/ee/Mp3filestructure.jpg


MPEG Audio Comments

 Precision of 16 bits per sample is needed to get a good SNR

 Noise we are getting is quantization noise from the digitization process

 For each added bit, we get about 6 dB better SNR

 Masking effect means that we can raise the noise floor around a strong sound because the noise will be masked away

 Raising the noise floor is the same as using fewer bits, and using fewer bits is the same as compression
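The 6 dB-per-bit rule follows from the ratio between full scale and one quantization step. The sketch below assumes an ideal uniform quantizer and ignores the small constant offset that depends on the signal type; `quantization_snr_db` is an illustrative name.

```python
import math


def quantization_snr_db(bits):
    """Approximate SNR (dB) of an ideal uniform quantizer: 20*log10(2**bits)."""
    return 20 * math.log10(2 ** bits)


for n in (8, 12, 16):
    # Each extra bit doubles the number of levels, adding about 6.02 dB
    print(n, round(quantization_snr_db(n), 1))
```

Read in the other direction, if masking lets the noise floor in a sub-band rise by about 24 dB, roughly 4 bits per sample can be dropped there.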


Successor of MP3

 Advanced Audio Coding (AAC) – now part of MPEG-4 Audio

 Inclusion of 48 full-bandwidth audio channels

 Default audio format for iPhone, iPad,

Nintendo, PlayStation, Nokia, Android,

BlackBerry

 Introduced 1997 as MPEG-2 Part 7

 In 1999 – updated and included in MPEG-4


AAC’s Improvements over MP3

 More sample frequencies (8-96 kHz)

 Arbitrary bit rates and variable frame length

 Higher efficiency and simpler filterbank

 Uses pure MDCT (modified discrete cosine transform)

 MDCT is also used in Windows Media Audio


MPEG-4 Audio

 Variety of applications

 General audio signals

 Speech signals

 Synthetic audio

 Synthesized speech (structured audio)


MPEG-4 Part 3 (Audio)

 Includes variety of audio coding technologies

 Lossy speech coding (e.g., CELP)

 CELP – code-excited linear prediction – speech coding

 General audio coding (AAC)

 Lossless audio coding

 Text-to-Speech interface

 Structured Audio (e.g., MIDI)


MPEG-4 Part 14

 Called MP4 with Extension .mp4

 Multimedia container format

 Stores digital video and audio streams and allows streaming over Internet

 Container or wrapper format

 Meta-file format whose specification describes how different data elements and metadata coexist in a computer file


MPEG-4 Audio

 Bit rate 2-64 kbps

 Scalable for variable rates

 MPEG-4 defines set of coders

 Parametric Coding Techniques: low bit rate 2-6 kbps, 8 kHz sampling frequency

 Code Excited Linear Prediction: medium bit rates 6-24 kbps, 8 and 16 kHz sampling rate

 Time Frequency Techniques: high-quality audio at 16 kbps and higher bit rates, sampling rate > 7 kHz


Conclusion

 MPEG Audio is an integral part of the

MPEG standard to be considered together with video

 MPEG-4 Audio represents a major extension in terms of capabilities over MPEG-1 Audio

