Audio Coding

Audio Codecs
Miikka Vilermo
Nokia Research Center – Audio Visual Systems Laboratory
© NOKIA
Introduction
• Codecs evolve and new technologies emerge.
• What can we do with all these codecs?
• Will the emerging technologies change the “status quo”?
• What do people want?
Audio Codecs
• Recent technical advances
• Existing and emerging codecs
• How good is good enough? – Codec requirements.
• Important issues outside today’s presentation
• One case closed and another reopened (if time)
• Questions
Recent Technical Advances
• Spectral Band Replication (SBR)
• Binaural Cue Coding (BCC)
• Integer-to-Integer Modified Discrete Cosine Transform (INTMDCT)
Spectral Band Replication (SBR)
• SBR is one method of Bandwidth Extension (BWE).
• BWE is a class of methods that increase the perceived audio bandwidth without using many bits, by exploiting psychoacoustics.
• SBR was introduced by Coding Technologies.
• The technology is applicable to any coder, e.g. AAC+ and MP3Pro.
• Achieves very high quality @ 48 kbps stereo.
• SBR has been standardized as High Efficiency Advanced Audio Coding (HE-AAC).
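The copy-up-and-rescale idea behind SBR can be sketched in a few lines. This is a minimal sketch, not the standardized HE-AAC algorithm. It makes several assumptions:
- It works on a plain magnitude spectrum instead of QMF subband signals.
- The band layout `edges` is hypothetical: a list of bin indices whose first entry equals the crossover bin.
- The patch is naive and simply repeats the low band upward.

The encoder keeps the low band for the core coder and sends only one energy value per high band. The decoder patches and then rescales each band. The noise-level and sine coding of the real SBR encoder are omitted, and all function names are hypothetical.

```python
import math

def band_energies(mags, edges):
    """Mean energy per band; edges are bin indices [e0, e1, ..., eB]."""
    return [sum(m * m for m in mags[lo:hi]) / (hi - lo)
            for lo, hi in zip(edges[:-1], edges[1:])]

def sbr_encode(spectrum, crossover, edges):
    """Sketch only: the core coder gets bins below `crossover`,
    SBR sends one energy value per high band (edges[0] == crossover assumed)."""
    return spectrum[:crossover], band_energies(spectrum, edges)

def sbr_decode(low, envelope, edges):
    """Patch the low band upward, then rescale each high band to the sent energy."""
    # Naive copy-up patch (assumption): repeat low-band bins until edges[-1].
    out = list(low) + [low[k % len(low)] for k in range(edges[-1] - len(low))]
    patched = band_energies(out, edges)
    for lo, hi, target, current in zip(edges[:-1], edges[1:], envelope, patched):
        gain = math.sqrt(target / current) if current > 0 else 0.0
        for b in range(lo, hi):
            out[b] *= gain
    return out
```

Take `spectrum = [1.0] * 8 + [0.5] * 8` with `crossover = 8` and `edges = [8, 12, 16]`. Only two envelope values are sent for the upper eight bins, and the decoded band energies match them. The fine structure of the high band, however, is simply borrowed from the low band.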
[Figure: High-level block diagram of the SBR incorporated into an audio encoder (a) and audio decoder (b). (a) Audio encoder and SBR encoder feeding a MUX; (b) DEMUX feeding the audio decoder and the SBR decoder. (Juha Ojanperä)]
[Figure: Block diagram of the SBR encoder module combined with AAC core encoder. SBR encoder: analysis QMF, t/f grid, prediction, coding of spectral envelope, noise level and sines, quantization (Q), Huffman coding, SBR MUX. SBR + AAC encoder: the input signal is downsampled and coded by the AAC encoder; the SBR bits are carried in the Fill element of the AAC bitstream data, forming the SBR + AAC frame. (Juha Ojanperä, Miikka Vilermo)]
[Figure: Example of the time/frequency (t/f) grid of the SBR; axes: time, frequency. (Juha Ojanperä, Miikka Vilermo)]
Binaural Cue Coding (BCC)
• Traditional multichannel coding requires roughly (number of channels) × (mono bitrate) kbps.
• Without specific matrixing, traditional multichannel coding is restricted to a certain number of channels (e.g. 5.1) and a fixed speaker placement.
• Binaural Cue Coding (BCC) has two versions: flexible rendering and natural rendering.
• In flexible rendering, the original multichannel input is downmixed to (usually) one channel, and the spatial information is sent as a separate low-bitrate parametric stream. The decoder then renders as many channels as are needed from the parameterised spatial image. The decoder can also apply Head-Related Transfer Functions (HRTFs) to create surround headphone playback.
• In natural rendering one parameterised stream of spatial information is
created for each of the original channels. This increases the bitrate and
limits rendering options in the decoder, but also improves quality.
• BCC can also be used as parametric stereo.
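The flexible-rendering path can be sketched as a mono downmix plus per-band level cues. This is a minimal sketch under several assumptions:
- The channels are given as real-valued per-bin spectral values. A real BCC system works on complex filter-bank or STFT coefficients.
- The band layout `edges` is hypothetical.
- Only level cues are used: the ICLD of every channel relative to channel 0. ICTD and ICC are ignored.

The gain normalization, with squared gains summing to one in each band, is one simple choice and is not taken from any standard. Function names are hypothetical.

```python
import math

EPS = 1e-12  # avoids log/division by zero in silent bands

def band_power(x, lo, hi):
    return sum(v * v for v in x[lo:hi])

def bcc_encode(channels, edges):
    """Sketch: mono downmix plus per-band ICLD (dB) of each channel vs channel 0."""
    downmix = [sum(bins) for bins in zip(*channels)]
    icld = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        ref = band_power(channels[0], lo, hi) + EPS
        icld.append([10 * math.log10((band_power(ch, lo, hi) + EPS) / ref)
                     for ch in channels])
    return downmix, icld

def bcc_render(downmix, icld, edges):
    """Re-create one signal per channel by scaling the downmix band by band."""
    out = [[0.0] * len(downmix) for _ in icld[0]]
    for (lo, hi), levels in zip(zip(edges[:-1], edges[1:]), icld):
        ratios = [10 ** (d / 10) for d in levels]  # power relative to channel 0
        total = sum(ratios)
        for c, r in enumerate(ratios):
            g = math.sqrt(r / total)  # assumed normalization: squared gains sum to 1
            for k in range(lo, hi):
                out[c][k] = g * downmix[k]
    return out
```

For a two-channel input, the side information is two level values per band, however wide the band is. The rendered channels keep the original inter-channel power ratio in each band but share the downmix waveform.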
BCC continued
• Typical parameters for BCC are:
• Inter-Channel Level Difference (ICLD)
• Inter-Channel Time Difference (ICTD)
• Inter-Channel Correlation (ICC)
• Parameters are applied on critical bands.
• BCC is based on the assumption that on every critical band the dominant
source defines the spatial perception.
• BCC doesn’t suffer from unmasking effects since the quantization noise is
automatically rendered to the same direction as the source.
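The three cues can be estimated for one critical band as sketched below. This is an illustration under assumed definitions:
- The inputs `x1` and `x2` are hypothetical time-domain subband signals of equal length for a single band.
- ICTD is the lag in samples, searched up to `max_lag`, that maximizes the normalized cross-correlation.
- ICC is the largest absolute normalized correlation over that lag range.

The function name is hypothetical.

```python
import math

def cue_estimates(x1, x2, max_lag):
    """Estimate ICLD (dB), ICTD (samples) and ICC for one band of two signals."""
    p1 = sum(v * v for v in x1)
    p2 = sum(v * v for v in x2)
    icld = 10 * math.log10(p2 / p1)  # level of channel 2 relative to channel 1

    def xcorr(lag):
        # Normalised cross-correlation between x1[n] and x2[n + lag].
        n = len(x1)
        s = sum(x1[i] * x2[i + lag] for i in range(max(0, -lag), min(n, n - lag)))
        return s / math.sqrt(p1 * p2)

    lags = range(-max_lag, max_lag + 1)
    ictd = max(lags, key=xcorr)             # lag with the strongest positive match
    icc = max(abs(xcorr(d)) for d in lags)  # assumed ICC definition
    return icld, ictd, icc
```

As a check, let `x1` be an impulse train and `x2` the same train delayed by 3 samples at half the amplitude, with `max_lag = 5`. The estimates are then about −6.02 dB, 3 samples and 1.0.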
Integer-to-Integer Modified Discrete Cosine Transform (INTMDCT)
• Lossy coding is important, but how could it be extended to lossless coding?
• The Modified Discrete Cosine Transform (MDCT) is the most popular audio coding transform, but its floating-point output is difficult to code losslessly.
• The Integer-to-Integer Modified Discrete Cosine Transform (INTMDCT) is similar to the MDCT, but integer input produces integer output.
• An integer version can be created of any transform whose matrix can be expressed as a product of matrices that have ones on the diagonal and zeros everywhere else, except in one row or one column:
1
0

 

 x i,1
 

0
0

11
© NOKIA
Audio Codecs

0
0
1
0

x i,2
1
x i,N-1

0
0

1
0
0 
0 
 

x i,N 
 

0 
1 
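A matrix of this kind leaves the i-th entry of the vector unchanged. Each product term can therefore be rounded in the forward direction, recomputed from the unchanged entry, and subtracted again exactly in the inverse. The sketch below applies the column form and the equivalent row form to an integer vector and inverts both without error.

The function names and coefficient lists are hypothetical. Python's built-in `round` is used; any deterministic rounding works, as long as the inverse uses the same one.

```python
def lift_column(v, i, coeffs):
    """Integer map for the identity with column i replaced by coeffs:
    y_k = v_k + round(coeffs[k] * v_i) for k != i, y_i = v_i (coeffs[i] ignored)."""
    return [vk if k == i else vk + round(coeffs[k] * v[i]) for k, vk in enumerate(v)]

def unlift_column(y, i, coeffs):
    # y[i] == v[i], so every rounded term can be recomputed and subtracted.
    return [yk if k == i else yk - round(coeffs[k] * y[i]) for k, yk in enumerate(y)]

def lift_row(v, i, coeffs):
    """Row form: only entry i changes, y_i = v_i + round(sum_{k != i} coeffs[k] * v_k)."""
    y = list(v)
    y[i] = v[i] + round(sum(c * vk for k, (c, vk) in enumerate(zip(coeffs, v)) if k != i))
    return y

def unlift_row(y, i, coeffs):
    # All entries except i are unchanged, so the same rounded sum is recomputed.
    v = list(y)
    v[i] = y[i] - round(sum(c * yk for k, (c, yk) in enumerate(zip(coeffs, y)) if k != i))
    return v
```

The same two ideas carry over to a product of such matrices: apply the integer steps in order, and undo them in reverse order.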
INTMDCT Continued
• Givens rotations (butterfly operations) can be expressed as products of such matrices (three per rotation). Thus any transform that can be factored into Givens rotations can be used as the basis for an integer transform.
• MPEG has ongoing standardisation work on lossless coding. INTMDCT was a basis for that work.
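A 2×2 rotation by angle θ factors into three such lifting steps:

[[cos θ, −sin θ], [sin θ, cos θ]] = [[1, p], [0, 1]] · [[1, 0], [sin θ, 1]] · [[1, p], [0, 1]], with p = (cos θ − 1)/sin θ.

Rounding inside each step gives an integer-to-integer rotation. It differs from the exact rotation only by a small rounding error and is exactly invertible. The sketch below assumes sin θ ≠ 0 and uses hypothetical function names. It is the generic lifting factorization, not the specific one used in any INTMDCT standard.

```python
import math

def int_rotate(x, y, theta):
    """Integer Givens rotation via three rounded lifting steps (sin(theta) != 0)."""
    c, s = math.cos(theta), math.sin(theta)
    p = (c - 1) / s
    x = x + round(p * y)  # [[1, p], [0, 1]]
    y = y + round(s * x)  # [[1, 0], [s, 1]]
    x = x + round(p * y)  # [[1, p], [0, 1]]
    return x, y

def int_rotate_inv(x, y, theta):
    """Undo the three lifting steps in reverse order, recomputing the same roundings."""
    c, s = math.cos(theta), math.sin(theta)
    p = (c - 1) / s
    x = x - round(p * y)
    y = y - round(s * x)
    x = x - round(p * y)
    return x, y
```

Every output is an integer, and `int_rotate_inv(*int_rotate(x, y, t), t)` returns `(x, y)` exactly for integer inputs. The forward result stays within about one unit of `(cos t * x - sin t * y, sin t * x + cos t * y)`.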
[Figure: Block diagram of scalable lossless INTMDCT enhanced perceptual codec. Encoder: input → MDCT → quantization and coding (controlled by a perceptual model) → bitstream coding → lossy bitstream; the inverse-quantized and rounded spectrum is subtracted from the INTMDCT of the input, and the difference is entropy coded into the lossless enhancement bitstream. Decoder: lossy bitstream → bitstream decode → inverse quantization → inverse MDCT → lossy audio; the rounded inverse-quantized spectrum plus the entropy-decoded lossless enhancement bitstream → inverse INTMDCT → lossless audio.]
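The lossless enhancement layer in this structure reduces to integer arithmetic on the spectrum. This is a minimal sketch under several assumptions:
- The INTMDCT output is already available as a list of integers, `int_spec`.
- A uniform quantizer with an integer step size stands in for the perceptually controlled quantization.
- The lossy path's MDCT spectrum is approximated by the same integer spectrum.

Entropy coding is omitted, and all function names are hypothetical.

```python
def lossy_layer(int_spec, step):
    """Stand-in for perceptual quantization: uniform quantizer, integer step."""
    return [round(v / step) for v in int_spec]

def dequantize_and_round(q, step):
    """Inverse quantization followed by rounding to the integer grid."""
    return [round(v * step) for v in q]

def encode(int_spec, step):
    q = lossy_layer(int_spec, step)
    # Lossless enhancement: integer spectrum minus rounded lossy reconstruction.
    residual = [a - b for a, b in zip(int_spec, dequantize_and_round(q, step))]
    return q, residual

def decode_lossless(q, residual, step):
    # Rounded lossy spectrum + residual = exact integer spectrum (before inverse INTMDCT).
    return [a + b for a, b in zip(dequantize_and_round(q, step), residual)]
```

Whatever step size the lossy layer uses, adding the residual back restores the integer spectrum exactly. With an integer step, each residual value is at most half the step in magnitude. A decoder that ignores the enhancement layer still has the lossy spectrum.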
Existing and Emerging Codecs
• Internet codecs
• Multichannel codecs
• Lossless codecs
• Low delay codecs
• New codecs
• Others
Internet Codecs
• MP3
• MPEG-1 layer 3
• largest user base
• near-CD quality can require over 192 kbps for difficult material
• Ogg Vorbis
• open source
• claimed to be IPR-free
• quality comparable to mp3, but varies greatly between samples
• AAC
• MPEG-2 and MPEG-4
• lowest bitrate for CD quality
• near-CD quality at around 128 kbps, even for difficult material
• Quicktime and RealAudio use AAC for high bitrates
• Windows Media
• proprietary
• large user base through Windows
• better than mp3; WMA9 comes close to AAC in quality
• includes lossless and multichannel coding
Internet Codecs Continued
• RealAudio
• uses AAC for high bitrates
• proprietary low bitrate codecs, the same as in earlier versions
• proprietary multichannel codecs
• built for streaming
• ATRAC
• proprietary
• ATRAC3plus for low bitrates (≤ 64 kbps)
• ATRAC3 for high bitrates
• mp3-like quality at high bitrates
• better than AAC at low bitrates
Multichannel Codecs
• Windows Media 9 and RealAudio 10 include multichannel coding; AAC and AAC+ also support multichannel coding
• AC-3 (Audio Coding, Dolby)
• proprietary
• largest installed user base
• quality close to mp3
• production point of view taken into account
• DTS (Digital Theater Systems)
• proprietary
• high bitrate, high quality
• MLP (Meridian Lossless Packing)
• proprietary
• lossless
• SDDS (Sony Dynamic Digital Sound)
• proprietary
• based on ATRAC
Lossless Codecs
• Compressed size is typically 1/3 to 1/2 of the original, depending on the material
• FLAC (Free Lossless Audio Codec)
• free
• Monkey’s Audio
• free
• Windows Media
• Many others exist
• MPEG has an ongoing standardization work
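Lossless coders typically reach such ratios by predicting each sample from the previous ones and entropy coding the small prediction residual. The sketch below uses an order-2 fixed polynomial predictor and Rice codes, two tools found in FLAC. It is an illustration, not FLAC's actual bitstream or parameter search. It assumes 16-bit samples, with the first two samples stored verbatim, and its function names are hypothetical.

```python
import math

def fixed2_residual(x):
    """Order-2 fixed predictor residual e[n] = x[n] - 2x[n-1] + x[n-2] for n >= 2."""
    return [x[n] - 2 * x[n - 1] + x[n - 2] for n in range(2, len(x))]

def fixed2_restore(warmup, e):
    """Invert the predictor exactly from the two verbatim warm-up samples."""
    x = list(warmup)
    for r in e:
        x.append(r + 2 * x[-1] - x[-2])
    return x

def rice_bits(values, k):
    """Bits to Rice-code signed residuals with parameter k after zigzag mapping."""
    total = 0
    for v in values:
        u = 2 * v if v >= 0 else -2 * v - 1  # 0, -1, 1, -2, ... -> 0, 1, 2, 3, ...
        total += (u >> k) + 1 + k            # unary quotient + stop bit + k low bits
    return total

def compressed_bits(x, bits_per_sample=16):
    """Warm-up samples stored verbatim plus the best Rice parameter for the rest."""
    e = fixed2_residual(x)
    return 2 * bits_per_sample + min(rice_bits(e, k) for k in range(bits_per_sample))
```

As an example, take the test tone `x = [round(2000 * math.sin(2 * math.pi * 440 * n / 44100)) for n in range(4096)]`. For this smooth signal, `compressed_bits(x) / (16 * len(x))` comes out below 1/2. Real music is less predictable, which is why ratios depend so much on the material.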
Low-Delay Codecs
• G.722-based teleconferencing codecs
• low quality, but enough for speech @ 64 kbps
• AAC-LD
• MPEG-4
• quality better than mp3
• Most ordinary codecs are not good enough for two-way communication; AAC+ in particular has a very high delay
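One contributor to codec delay is simply the frame length, which the arithmetic below makes concrete. The frame sizes are assumptions for illustration:
- 1024 samples per frame for AAC-LC.
- 512 samples for AAC-LD.
- A 1024-sample AAC core running at half the output sampling rate in HE-AAC.

Real algorithmic delay also includes window overlap, look-ahead and, for AAC+, the SBR filter banks. These figures are therefore only part of the total, not measured delays. The function name is hypothetical.

```python
def frame_duration_ms(frame_len, sample_rate_hz):
    """Duration of one codec frame in milliseconds."""
    return 1000.0 * frame_len / sample_rate_hz

# Assumed frame sizes (see above); the HE-AAC core runs at half of 48 kHz.
examples = {
    "AAC-LC, 1024 samples @ 48 kHz": frame_duration_ms(1024, 48000),
    "AAC-LD, 512 samples @ 48 kHz": frame_duration_ms(512, 48000),
    "HE-AAC core, 1024 samples @ 24 kHz": frame_duration_ms(1024, 24000),
}
for name, ms in examples.items():
    print(f"{name}: {ms:.1f} ms")
```

This prints 21.3 ms, 10.7 ms and 42.7 ms. Halving the core sampling rate doubles the frame duration, which is one reason AAC+ is ill-suited to two-way communication.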
New Codecs
• Spectral Band Replication
• AAC+ = MPEG HE-AAC, very high quality around 48 kbps
• mp3+
• AMR-WB+ (Adaptive Multi-Rate Wideband, Nokia)
• good quality around 24 kbps
• optional codec in 3GPP alongside AAC+
• Discrete multichannel
• AAC+ discrete 5.1 @ 128 kbps
• E-AC3 (Enhanced Audio Coding, Dolby)
• Binaural Cue Coding
• mp3 surround 192 kbps (FhG, Agere)
• HE-AAC surround 64 kbps, supposedly better than AC-3 at ??? kbps
• MPEG standardization about to start
• Spectral Band Replication & Binaural Cue Coding
• E-AAC+ (Enhanced AAC+, FhG, CT, Philips)
Other Codecs
• SBC (Sub Band Coding)
• used with Bluetooth
• low complexity, low power
• near-CD quality @ 320 kbps
• Dolby-E
• multichannel
• synchronous with video frames
• high bitrates, but tandem coding quality has been studied
How Good Is Good Enough? – Codec Requirements
• Many users are happy with 128 kbps mp3, but others are moving to 192 kbps mp3.
• iTunes AAC at 128 kbps is near-CD quality but not fully transparent. However, judging by the popularity of the service, this seems to be enough.
• On the other hand, RealAudio AAC at 192 kbps is practically transparent.
• Personally, I find AAC at 320 kbps enough, but at that rate lossless codecs, at around 700 kbps, are not far off.
• Some Internet music services offer songs with lossless compression.
• One unanswered question is: what is enough for streaming?
• Streaming at 128 kbps over a fixed line is achievable. But what about wireless links such as 3G, WLAN and Bluetooth? In many cases there also has to be room for video.
How Good Is Good Enough? – Codec Requirements (Contd.)
• Delay
• usually, high efficiency means long delay; AAC+ is a prime example
• Will multichannel become important?
• Error resilience is a must in wireless applications
• Scalability would be useful; some new ideas were presented recently by A. Aggarwal
• Editability
• Transcoding is a sin!
• Reversible codecs
• High enough bitrate
Important Issues Outside Today’s Presentation
• DRM (Digital Rights Management)
• usability
• parametric coding
One Case Closed and Another Reopened (if time)
• Gockel et al., "Louder Sounds Can Produce Less Forward Masking: Effects of Component Phase in Complex Tones," J. Acoust. Soc. Am., Vol. 114, No. 2, August 2003.
• Aggarwal et al., "Near-optimal selection of encoding parameters for audio coding," Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '01), Vol. 5, pp. 3269–3272, 7–11 May 2001.
• M. Wolters, "A closer look into MPEG-4 high efficiency AAC," 115th AES Convention, New York, NY, October 2003.
Conclusion
• Existing codecs have matured and added new features
• For most needs, a codec already exists
• Emerging codecs make good-quality stereo @ 48 kbps and 5.1 multichannel @ 64 kbps possible
• User requirements are still an open question