EE5359 MULTIMEDIA PROCESSING PROJECT PROPOSAL
Study and implementation of G.719 audio codec and performance analysis of
G.719 with AAC (advanced audio codec) and HE-AAC (high efficiency-advanced
audio codec) audio codecs.
Student: Yashas Prakash
Student ID: 1000803680
Instructor: Dr. K. R. Rao
E-mail: yashas.prakash@mavs.uta.edu
Date: 03-27-2012.
Project Proposal:
Title: Study and implementation of G.719 audio codec and performance analysis of G.719
with AAC (advanced audio codec) and HE-AAC (high efficiency-advanced audio codec)
audio codecs.
Abstract:
This project describes a low-complexity full-band (20 kHz) audio coding algorithm which has recently been standardized by the ITU-T (International Telecommunication Union - Telecommunication Standardization Sector) as Recommendation G.719 [1]. G.719 encoded audio can be carried in the 3GP container format. The algorithm is designed to provide 20 Hz - 20 kHz
audio bandwidth using a 48 kHz sample rate, operating at 32 - 128 kbps. This codec features
very high audio quality and low computational complexity and is suitable for use in applications
such as videoconferencing, teleconferencing, and streaming audio over the Internet [1]. This
technology, while leading to exceptionally low complexity and small memory footprint, results
in high full band audio quality, making the codec a great choice for any kind of communication
devices, from large telepresence systems to small low-power devices for mobile communication
[2]. A comparison of G.719 with the widely used AAC and HE-AAC [9] audio codecs is carried out in terms of performance, reliability, memory requirements and applications. A Windows Media Audio file is encoded to the 3GP, AAC and HE-AAC formats using the SUPER (c) [13] software, and testing of
different coding schemes is carried out for performance, encoding and decoding durations,
memory requirements and compression ratios.
List of acronyms
AAC - Advanced audio coding
ATSC - Advanced television systems committee
AES - Audio Engineering Society
EBU - European broadcasting union
FLVQ - Fast lattice vector quantization
HE-AAC - High efficiency advanced audio coding
HRQ - Higher rate lattice vector quantization
IMDCT - Inverse modified discrete cosine transform
ISO - International organization for standardization
ITU - International telecommunication union
JAES - Journal of the Audio Engineering Society
LC - Low complexity
LRQ - Lower rate lattice vector quantization
LFE - Low frequency enhancement
LTP - Long term prediction
MDCT - Modified discrete cosine transform
MPEG - Moving picture experts group
SBR - Spectral band replication
SMR - Signal-to-mask ratio
SRS - Sample rate scalable
An Overview of G.719 Audio Codec:
In hands-free videoconferencing and teleconferencing markets, there is strong and increasing
demand for audio coding providing the full human auditory bandwidth of 20 Hz to 20 kHz [1].
This is because:
• Conferencing systems are increasingly used for more elaborate presentations, often including music and sound effects (i.e. animal sounds, musical instruments, vehicles or nature sounds, etc.) which occupy a wider audio band than speech. Presentations involve remote music education, playback of audio and video from DVDs and VCRs, audio/video clips from PCs, and elaborate audio-visual presentations from, for example, PowerPoint [1].
• Users perceive the bandwidth of 20 Hz to 20 kHz as representing the ultimate goal for audio bandwidth. The resulting market pressures are causing a shift in this direction, now that sufficient IP (internet protocol) bit rate and audio coding technology are available to deliver this. As with any audio codec for hands-free videoconferencing use, the requirements include [1]:
• Low latency (support natural conversation)
• Low complexity (free cycles for video processing and other processing reduce cost)
• High quality on all signal types [1].
Block diagram of the G.719 encoder:
Figure 1: Block diagram of the G.719 encoder [1].
In Figure 1, the input signal, sampled at 48 kHz, is processed through a transient detector. Depending on the detection of a transient, indicated by a flag IsTransient, a high frequency resolution or a low frequency resolution transform is applied to the input signal frame. The adaptive transform is based on a modified discrete cosine transform (MDCT) in the case of stationary frames [1]. For transient frames, the MDCT is modified to obtain a higher temporal
resolution without a need for additional delay and with very little overhead in complexity.
Transient frames have a temporal resolution equivalent to 5 ms frames [1].
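To make the role of the IsTransient flag concrete, the following Python sketch flags a frame as transient when the short-term energy jumps sharply between sub-blocks of the frame. The sub-block layout and threshold are assumptions for illustration only; G.719 specifies its own detector.

```python
import numpy as np

def is_transient(frame, n_sub=4, threshold=8.0):
    """Flag a frame as transient when the short-term energy jumps sharply
    between neighbouring sub-blocks. Sub-block count and threshold are
    illustrative assumptions; G.719 specifies its own detector."""
    sub = np.array_split(np.asarray(frame, dtype=float), n_sub)
    energy = np.array([np.sum(s ** 2) for s in sub]) + 1e-12
    return bool(np.any(energy[1:] / energy[:-1] > threshold))

# Example: a frame that is silent in its first half and loud in its second
frame = np.concatenate((np.zeros(480),
                        np.random.default_rng(0).standard_normal(480)))
print(is_transient(frame))   # True
```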
The MDCT is defined, in terms of the time-domain aliased signal, as:

y(k) = sqrt(2/L) * sum_{n=0..L-1} x(n) * cos[ (pi/L) * (n + 1/2) * (k + 1/2) ],   k = 0, 1, ..., L-1

where
y(k) = transform coefficients of the input frame
x(n) = time-domain aliased signal of the input signal
L = transform length (the number of coefficients per frame)
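As an illustration of the transform above, the following Python sketch computes the MDCT of one overlapped frame by windowing, time-domain aliasing (folding) and a DCT-IV. The sine window and the normalization are assumptions for illustration; this is not the bit-exact G.719 transform.

```python
import numpy as np

def mdct(frame, window):
    """Illustrative MDCT of one 2N-sample overlapped frame via folding + DCT-IV.

    frame  : 2N time-domain samples (previous half frame + current half frame)
    window : 2N-point analysis window (e.g. a sine window)
    returns: N transform coefficients y(k)
    """
    w = np.asarray(frame, dtype=float) * np.asarray(window, dtype=float)
    n = len(w) // 2                   # number of output coefficients
    half = n // 2
    a, b, c, d = w[:half], w[half:2 * half], w[2 * half:3 * half], w[3 * half:]

    # Time-domain aliasing ("folding") gives the N-sample signal x(n) of the text.
    x = np.concatenate((-c[::-1] - d, a - b[::-1]))

    # DCT-IV of the folded signal yields the MDCT coefficients.
    nn = np.arange(n)
    basis = np.cos(np.pi / n * (nn[:, None] + 0.5) * (nn[None, :] + 0.5))
    return np.sqrt(2.0 / n) * (x @ basis)

# Example: a 20 ms frame at 48 kHz carries 960 new samples, so the transform
# operates on 2 * 960 windowed samples and produces 960 coefficients.
sig = np.random.default_rng(0).standard_normal(1920)
coeffs = mdct(sig, np.sin(np.pi * (np.arange(1920) + 0.5) / 1920))
print(coeffs.shape)   # (960,)
```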
Block diagram of the G.719 decoder:
Figure 2: Block diagram of the G.719 decoder [1].
A block diagram of the G.719 decoder is shown in Figure 2. The transient flag is first decoded
which indicates the frame configuration, i.e. stationary or transient. The spectral envelope is then
decoded, and the same bit-exact norm adjustment and bit-allocation algorithms are used at the decoder to re-compute the bit allocation, which is essential for decoding the quantization indices of the normalized transform coefficients [1]. After decoding the transform coefficients, the non-coded transform coefficients (allocated zero bits) in low frequencies are regenerated by using a spectral-fill codebook built from the decoded transform coefficients [2].
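The following Python sketch illustrates the spectral-fill idea described above: bands that received zero bits are filled from a pool built out of the coefficients that were actually decoded, and the filled band is rescaled to its decoded norm. The band layout, the copy rule and the RMS-based scaling are assumptions; the actual G.719 procedure differs in detail.

```python
import numpy as np

def spectral_fill(decoded, band_start, band_size, bits_per_band, band_norm):
    """Illustrative spectral fill for bands that received zero bits.

    Builds a small 'codebook' from the coefficients that were actually
    decoded, fills every zero-bit band from it, and rescales the result to
    the band's decoded norm (RMS here). Assumes at least one coded band.
    """
    coded_pool = np.concatenate(
        [decoded[s:s + n]
         for s, n, b in zip(band_start, band_size, bits_per_band) if b > 0])

    out = decoded.copy()
    pos = 0
    for s, n, b, g in zip(band_start, band_size, bits_per_band, band_norm):
        if b == 0:
            idx = (pos + np.arange(n)) % len(coded_pool)   # wrap around the pool
            seg = coded_pool[idx]
            rms = np.sqrt(np.mean(seg ** 2))
            out[s:s + n] = g * seg / rms if rms > 0 else 0.0
            pos += n
    return out
```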
Transform coefficient quantization:
Each band consists of one or more vectors of 8-dimensional transform coefficients and the
coefficients are normalized by the quantized norm. All 8-dimensional vectors belonging to one
band are assigned the same number of bits for quantization. A fast lattice vector quantization
(FLVQ) scheme is used to quantize the normalized coefficients in 8 dimensions. In FLVQ the
quantizer comprises two sub-quantizers: a D8-based higher rate lattice vector quantizer (HRQ)
and an RE8-based lower-rate lattice vector quantizer (LRQ). HRQ is a multi-rate quantizer
designed to quantize the transform coefficients at rates of 2 up to 9 bit/coefficient and its
codebook is based on the so-called Voronoi code for the D8 lattice [4]. D8 is a well-known
lattice, defined as:

D8 = { (x1, x2, ..., x8) ∈ Z8 : x1 + x2 + ... + x8 is even }

where Z8 is the lattice which consists of all points with integer coordinates. It can be seen that D8
consists of the points having integer coordinates with an even sum. The codebook of HRQ is
constructed from a finite region of the D8 lattice and is not stored in memory. The code words are
generated by a simple algebraic method and a fast quantization algorithm is used.
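A minimal sketch of the lattice search behind such a quantizer is given below: it finds the nearest D8 point to an 8-dimensional input using the classic "round, then fix the parity" rule. The scaling, Voronoi codebook indexing and rate selection of the actual FLVQ are not shown.

```python
import numpy as np

def nearest_point_d8(x):
    """Nearest neighbour of x in the D8 lattice (points of Z^8 with even sum).

    Round every coordinate to the nearest integer; if the sum comes out odd,
    re-round the coordinate with the largest rounding error the other way.
    """
    x = np.asarray(x, dtype=float)
    f = np.round(x)                      # nearest point in Z^8
    if int(f.sum()) % 2 != 0:            # not in D8: fix one coordinate
        err = x - f
        i = np.argmax(np.abs(err))       # coordinate rounded "worst"
        f[i] += 1.0 if err[i] > 0 else -1.0
    return f.astype(int)

# Example: quantize a normalized 8-dimensional coefficient vector
v = np.array([0.4, -1.2, 2.6, 0.1, -0.6, 1.7, -2.2, 0.9])
print(nearest_point_d8(v))              # a D8 lattice point close to v
```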
Figure 3: Observed spectrum of different sounds, voiced speech, unvoiced speech and pop music
on different audio bandwidths [3].
Figure 3 illustrates how, for some signals, a large portion of energy is beyond the wideband
frequency range. While the use of wideband speech codecs primarily addresses the requirement
of intelligibility, the perceived naturalness and experienced quality of speech can be further
enhanced by providing a larger acoustic bandwidth [3]. This is especially true in applications
such as teleconferencing where a high-fidelity representation of both speech and natural sounds
enables a much higher degree of naturalness and spontaneity. The logical step toward the sense of "being there" is the coding and rendering of super-wideband signals with an acoustic bandwidth of 14 kHz. The response of ITU-T to this increased need for naturalness was the standardization of
the G.722.1 Annex C extension in 2005 [2]. More recently, this has also led ITU-T to start work
on extensions of the G.718, G.729.1, G.722, and G.711.1 codecs to provide super-wideband
telephony as extension layers to these wideband core codecs [3].
An overview of MPEG – Advanced Audio Coding
The advanced audio coding (AAC) scheme was a joint development by Dolby, Fraunhofer,
AT&T, Sony and Nokia [9]. It is a digital audio compression scheme for medium to high bit
rates which is not backward compatible with previous MPEG audio standards. The AAC
encoding follows a modular approach and the standard defines four profiles, which can be chosen based on factors like the complexity of the bit stream to be encoded, the desired performance and output:
• Low complexity (LC)
• Main profile (MAIN)
• Sample-rate scalable (SRS)
• Long term prediction (LTP)
AAC provides excellent audio quality and is suitable for low bit rate, high quality audio applications; the MPEG-AAC audio coder uses the AAC scheme.
HE-AAC, also known as AACplus, is a low bit rate audio coder. It is an AAC LC audio coder
enhanced with spectral band replication (SBR) technology.
AAC is a second generation coding scheme which is used for stereo and multichannel signals.
When compared to earlier perceptual coders, AAC provides more flexibility and uses more coding
tools [12].
The coding efficiency is enhanced by the following tools, which help attain higher quality at lower bit rates [12]:
• The scheme has higher frequency resolution, with the number of spectral lines increased from 576 to 1024.
• Joint stereo coding has been improved. The bit rate can frequently be reduced owing to the flexibility of mid/side coding and intensity coding (a small mid/side sketch follows this list).
• Huffman coding [12] is applied to the coder partitions.
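A minimal sketch of the mid/side transform referenced in the list above, assuming simple (L+R)/2 and (L-R)/2 channels; the AAC tool applies this selectively per scale-factor band.

```python
import numpy as np

def ms_encode(left, right):
    """Mid/side transform: the mid channel carries the common content,
    the side channel the (often small) difference."""
    return 0.5 * (left + right), 0.5 * (left - right)

def ms_decode(mid, side):
    """Exact inverse of ms_encode."""
    return mid + side, mid - side

# For strongly correlated channels the side signal has little energy,
# so it needs far fewer bits than coding left/right independently.
l = np.sin(np.linspace(0, 2 * np.pi, 480))
r = 0.95 * l + 0.01 * np.random.default_rng(1).standard_normal(480)
m, s = ms_encode(l, r)
print(np.var(l), np.var(r), np.var(m), np.var(s))   # side variance is tiny
```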
An overview of spectral band replication technology in AACplus audio codec
Spectral band replication (SBR) is a new audio coding tool that significantly improves the coding
gain of perceptual coders and speech coders. Currently, there are three different audio coders
that have shown a vast improvement by the combination with SBR: MPEG-AAC, MPEG-Layer
II and MPEG-Layer III (mp3), all three being parts of the open ISO-MPEG standard. The
combination of AAC and SBR will be used in the standardized Digital audio Mondiale (DRM)
system, and SBR is currently also being standardized within MPEG-4 [15].
Block diagram of SBR encoder:
Figure 4: Block diagram of the SBR encoder [15]
The basic layout of the SBR encoder is shown in figure 4. The input signal is initially fed to a
down-sampler, which supplies the core encoder with a time domain signal having half the
sampling frequency of the input signal. The input signal is in parallel fed to a 64-channel
analysis QMF bank. The outputs from the filter bank are complex-valued sub-band signals. The
sub-band signals are fed to an envelope estimator and various detectors. The outputs from the
detectors and the envelope estimator are assembled into the SBR data stream. The data is
subsequently coded using entropy coding and, in the case of multichannel signals, also channel-
redundancy coding. The coded SBR data and a bitrate control signal are then supplied to the core
encoder. The SBR encoder interacts closely with the core encoder. Information is exchanged
between the systems in order to, for example, determine the optimal cutoff frequency between
the core coder and the SBR band. The core coder finally multiplexes the SBR data stream into
the combined bitstream [15].
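The sketch below illustrates the envelope-estimation step on the encoder side, with a plain STFT standing in for the 64-channel complex QMF bank; the frame size, hop, crossover frequency and band split are assumptions for illustration only.

```python
import numpy as np

def highband_envelope(x, fs, frame=2048, hop=1024, n_bands=8, f_split=None):
    """Time/frequency grid of high-band energies, as an SBR-like encoder
    would quantize and transmit alongside the core-coded low band."""
    if f_split is None:
        f_split = fs / 4.0                       # assumed crossover frequency
    win = np.hanning(frame)
    n_frames = 1 + (len(x) - frame) // hop
    spec = np.stack([np.fft.rfft(win * x[i * hop:i * hop + frame])
                     for i in range(n_frames)])  # (n_frames, frame // 2 + 1)

    freqs = np.fft.rfftfreq(frame, 1.0 / fs)
    hi = np.where(freqs >= f_split)[0]           # bins in the SBR band
    edges = np.linspace(0, len(hi), n_bands + 1).astype(int)

    env = np.empty((n_frames, n_bands))
    for b in range(n_bands):
        bins = hi[edges[b]:edges[b + 1]]
        env[:, b] = np.mean(np.abs(spec[:, bins]) ** 2, axis=1)
    return env

# Example: envelope of one second of noise sampled at 48 kHz
env = highband_envelope(np.random.default_rng(0).standard_normal(48000), 48000)
print(env.shape)   # (n_frames, 8)
```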
Block diagram of the SBR decoder:
Figure 6: Block diagram of the SBR decoder [15]
Figure 6 illustrates the layout of the SBR enhanced decoder. The received bitstream is divided
into two parts: the core coder bitstream and the SBR data stream. The core bitstream is decoded
by the core decoder, and the output audio signal, typically of lowpass character, is forwarded to
the SBR decoder together with the SBR data stream. The core audio signal, sampled at half the
frequency of the original signal, is first filtered in the analysis QMF bank. The filter bank splits
the time domain signal into 32 sub-band signals. The outputs from the filter bank, i.e. the sub-band signals, are complex-valued and thus oversampled by a factor of two compared to a regular QMF bank [15].
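To make the decoder-side principle concrete, the following sketch regenerates a high band by copying the decoded low-band spectrum upward and scaling each high-band region to a transmitted envelope (such as the one produced by the encoder sketch above). Operating on an FFT spectrum, the band grid and scaling rule are assumptions; the real SBR decoder works on QMF sub-bands and additionally applies noise-floor and tonality corrections.

```python
import numpy as np

def regenerate_highband(low_spec, env_target, hi_bins, band_edges):
    """Copy the decoded low-band spectrum upward ("spectral band replication")
    and scale each high-band region to its transmitted envelope energy.

    low_spec   : complex spectrum with only the low band populated
    env_target : target mean energy per high-frequency region
    hi_bins    : indices of the high-band bins
    band_edges : indices into hi_bins delimiting the envelope regions
    """
    spec = np.asarray(low_spec, dtype=complex).copy()
    n_hi = len(hi_bins)
    low_part = spec[:hi_bins[0]]                       # bins below the crossover
    # Patch: tile the low band into the (empty) high band.
    spec[hi_bins] = np.tile(low_part, 1 + n_hi // len(low_part))[:n_hi]
    # Adjust each region so its mean energy matches the decoded envelope.
    for b in range(len(band_edges) - 1):
        bins = hi_bins[band_edges[b]:band_edges[b + 1]]
        e = np.mean(np.abs(spec[bins]) ** 2)
        if e > 0:
            spec[bins] *= np.sqrt(env_target[b] / e)
    return spec
```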
Proposal: In this project, the working of the G.719 codec is comprehensively studied and analyzed with an implementation, and a comparative analysis of the G.719 codec with the AAC and HE-AAC audio codecs is carried out, with results, in terms of performance, reliability, memory requirements, compression ratios and applications.
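As part of the planned comparison, compression ratios can be computed directly from file sizes. The sketch below assumes hypothetical file names for the original Windows Media Audio file and the SUPER (c) encoded outputs.

```python
import os

def compression_ratio(original_path, encoded_path):
    """Compression ratio = size of the original file / size of the encoded file."""
    return os.path.getsize(original_path) / os.path.getsize(encoded_path)

# Hypothetical file names; the actual files are the SUPER (c) encoder outputs.
original = "sample_original.wma"
for label, path in [("G.719 / 3GP", "sample_g719.3gp"),
                    ("AAC", "sample_aac.aac"),
                    ("HE-AAC", "sample_heaac.aac")]:
    print(label, round(compression_ratio(original, path), 2))
```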
References:
[1] M. Xie, P. Chu, A. Taleb and M. Briand, "A new low-complexity full-band (20 kHz) audio coding standard for high-quality conversational applications", IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, pp. 265-268, Oct. 2009.
[2] A. Taleb and S. Karapetkov, "The first ITU-T standard for high-quality conversational fullband audio coding", IEEE Communications Magazine, vol. 47, pp. 124-130, Oct. 2009.
[3] J. Wang, B. Chen, H. He, S. Zhao and J. Kuang, "An adaptive window switching method for ITU-T G.719 transient coding in TDA domain", IEEE International Conference on Wireless, Mobile and Multimedia Networks, pp. 298-301, Jan. 2011.
[4] J. Wang, N. Ning, X. Ji and J. Kuang, "Norm adjustment with segmental weighted SMR for ITU-T G.719 audio codec", IEEE International Conference on Multimedia and Signal Processing, vol. 2, pp. 282-285, May 2011.
[5] K. Brandenburg and M. Bosi, "Overview of MPEG audio: current and future standards for low-bit-rate audio coding", JAES, vol. 45, pp. 4-21, Jan./Feb. 1997.
[6] ATSC A/52B, Digital Audio Compression Standard: http://www.atsc.org/cms/standards/a_52b.pdf
[7] F. Henn, R. Böhm and S. Meltzer, "Spectral band replication technology and its application in broadcasting", International Broadcasting Convention, 2003.
[8] M. Dietz and S. Meltzer, "CT-aacPlus - a state of the art audio coding scheme", Coding Technologies, EBU Technical Review, July 2002.
[9] ISO/IEC IS 13818-7, "Information technology - Generic coding of moving pictures and associated audio information - Part 7: Advanced audio coding (AAC)", 1997.
[10] M. Bosi and R. E. Goldberg, "Introduction to digital audio coding standards", Norwell, MA: Kluwer, 2003.
[11] H. S. Malvar, "Signal processing with lapped transforms", Norwood, MA: Artech House, 1992.
[12] D. Meares, K. Watanabe and E. Scheirer, "Report on the MPEG-2 AAC stereo verification tests", ISO/IEC JTC1/SC29/WG11, Feb. 1998.
[13] SUPER (c) v.2012.build.50: a simplified universal player, encoder and renderer; a graphic user interface to FFmpeg, MEncoder, MPlayer, x264, Musepack, Shorten audio, True Audio, WavPack, the libavcodec library and the Theora/Vorbis RealProducer plugin: www.erightsoft.com
[14] T. Ogunfunmi and M. Narasimha, "Principles of speech coding", Boca Raton, FL: CRC Press, 2010.
[15] P. Ekstrand, "Bandwidth extension of audio signals by spectral band replication", IEEE Workshop on Model Based Processing and Coding of Audio, pp. 53-58, Nov. 2002.