Block Diagram of MPEG AAC

Design of MPEG-4 AAC Encoder
Authors:
Chi-Min Liu, Wen-Chieh Lee, Chung-Han Yang, KangYan Peng, Ting Chiou, Tzu-Wen Chang, Yu-Hua Hsiao,
Hen-Wen Hue and Chu-Ting Chien
Outline







Introduction
Psychoacoustic Model
M/S Coding
Window Switch
Temporal Noise Shaping
Experiments & Demonstration
Conclusion
Introduction–
NCTU-AAC Encoder
[Figure: NCTU-AAC encoder block diagram with the blocks Audio in, Filterbank, Psychoacoustic Model, W-Switch, TNS, M/S, Bit Allocation, Bit Reservoir, Quantization, VLC and Bit-Stream Packing]
1. Introduction

Modules





Psychoacoustic Model
M/S Coding
Window Switch
Temporal Noise Shaping
Objective



Theoretical Frameworks
Quality
Complexity
2. Psychoacoustic Model

Approach
MDCT-based instead of FFT-based.
New Masking Models
Detection of tonal attack bands.
Detection of tone-rich signals.

2. Psychoacoustic Model (c.1)

MDCT and FFT



Similar spectrum.
MDCT spectrum is chaotic due to the aliasing.
MDCT leads to a consistent spectrum for the analysis and encoding processes.
2. Psychoacoustic Model (c.2)
DCT Spectrum


Q-Bands instead of Lines or P-Bands
Tone/Noise information based on Band Flatness instead of Frame Predictivity

\[ flatness_b = \frac{GM_b}{AM_b}, \quad GM_b = \left( \prod_{i=0}^{N-1} x_i \right)^{1/N}, \quad AM_b = \frac{1}{N} \sum_{i=0}^{N-1} x_i \]

For a tone-rich signal in band b, flatness_b approaches 0.
For a noise-rich signal in band b, flatness_b approaches 1.
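The flatness measure above can be sketched in a few lines of Python. This is a minimal illustration, not the encoder's implementation: it assumes the x_i are non-negative spectral magnitudes of the lines in one band, and the function name `band_flatness` and the small `eps` guard are hypothetical additions.

```python
import math

def band_flatness(magnitudes, eps=1e-12):
    """Ratio of geometric mean to arithmetic mean for one band."""
    n = len(magnitudes)
    # eps is an assumption of this sketch: it keeps log() finite for zero lines.
    values = [abs(x) + eps for x in magnitudes]
    geometric_mean = math.exp(sum(math.log(v) for v in values) / n)
    arithmetic_mean = sum(values) / n
    return geometric_mean / arithmetic_mean

# A nearly flat band behaves like noise; one dominant line behaves like a tone.
noise_like = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95]
tone_like = [0.001, 0.001, 10.0, 0.001, 0.001, 0.001]
print(band_flatness(noise_like), band_flatness(tone_like))
```

The geometric mean is computed in the log domain so that a long band does not underflow when many small values are multiplied together.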
2. Psychoacoustic Model-Adaptive TMN and NMT offset

Utilizing Human Perception
Insensitivity in high frequency
The masking effect is stronger at high frequencies than at low frequencies
[Figure: offset (vertical axis, 0 to 4) plotted against index 1 to 49 (horizontal axis)]
2. Psychoacoustic Model–
Tonal Attack and Tone-Rich Signals

Tone/Harmonic



Tonal attack.
Tone-rich signals.
Solution


Masking adjustment
Disable window switch
[Figure panels: Original Spectrum, Reconstructed Spectrum]
2. Psychoacoustic Model–
Concluding Remarks

New Models






Filterbank instead of FFT.
SFM instead of unpredictivity.
Detection of tonal attack bands.
Detection of tone-rich signals.
Noise masking effect alone.
Results


Speedup of 70% for AAC and 65% for MP3.
Quality improves by 0.2 for AAC and 0.1 for MP3.
3. M/S Coding
Issues & Approach

Band-Level Switching Decision


M/S Psychoacoustic Model


Conservative masking threshold
Bit Allocated to M/S Channels


Viterbi Algorithm: from O(2^49) to O(49)
Allocation Entropy
Joint Design with Window Switch

Coupling
3. M/S Coding-- Viterbi Algorithm

Find the Optimal Solution


S_LR(i) and S_MS(i) represent the optimal accumulated cost found in the i-th band.
α_{LR,LR}, α_{LR,MS}, α_{MS,LR} and α_{MS,MS} represent the transition costs.
[Figure: two-state trellis over scale factor bands 0 to 48. Each band i has an L/R node labeled n_LR(i) with accumulated cost S_LR(i) and an M/S node labeled n_MS(i) with accumulated cost S_MS(i); edges between adjacent bands carry the transition costs α_{LR,LR}, α_{LR,MS}, α_{MS,LR} and α_{MS,MS}]
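The trellis search can be sketched as a standard two-state Viterbi recursion. The sketch below is illustrative only: it assumes n_LR(i) and n_MS(i) are per-band node costs and that the transition costs are constant across bands, and the function name `viterbi_ms_lr` and the dictionary layout are hypothetical.

```python
def viterbi_ms_lr(node_lr, node_ms, trans):
    """Return the cheapest per-band L/R or M/S labeling and its total cost."""
    states = ("LR", "MS")
    node = {"LR": node_lr, "MS": node_ms}
    num_bands = len(node_lr)

    # cost[s] plays the role of S_s(i): best accumulated cost ending in state s.
    cost = {s: node[s][0] for s in states}
    back = []  # back[i - 1][s] is the best predecessor of state s at band i
    for i in range(1, num_bands):
        new_cost, choice = {}, {}
        for s in states:
            prev = min(states, key=lambda p: cost[p] + trans[(p, s)])
            new_cost[s] = cost[prev] + trans[(prev, s)] + node[s][i]
            choice[s] = prev
        cost = new_cost
        back.append(choice)

    # Trace the best path backwards from the cheaper final state.
    last = min(states, key=lambda s: cost[s])
    path = [last]
    for choice in reversed(back):
        path.append(choice[path[-1]])
    path.reverse()
    return path, cost[last]

free = {("LR", "LR"): 0, ("LR", "MS"): 0, ("MS", "LR"): 0, ("MS", "MS"): 0}
print(viterbi_ms_lr([1, 5, 1], [5, 1, 5], free))
```

Each band needs only four transition evaluations, so the work grows linearly with the 49 bands instead of enumerating all 2^49 labelings.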
3. M/S Coding–
Frame-Level Switching

Compare the AE of MS and LR
C1 is a constant factor
If AE_MS < C1 * AE_LR is true, use an M/S frame; otherwise, use an L/R frame
3. M/S Coding–
M/S Psychoacoustic Model

Noise of Reconstructed Signal
\[ L'_i[k] = M'_i[k] + S'_i[k] \]
\[ R'_i[k] = M'_i[k] - S'_i[k] \]
\[ L'_i[k] = L_i[k] + N_{L_i}[k] = M_i[k] + S_i[k] + N_{M_i}[k] + N_{S_i}[k] \]
\[ R'_i[k] = R_i[k] + N_{R_i}[k] = M_i[k] - S_i[k] + N_{M_i}[k] - N_{S_i}[k] \]
3. M/S Coding–
M/S Psychoacoustic Model

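Variance of Noise
\[ \sigma^2_{N_{L_i}} = \sigma^2_{N_{M_i}} + \sigma^2_{N_{S_i}} \le T_{L_i} \]
\[ \sigma^2_{N_{R_i}} = \sigma^2_{N_{M_i}} + \sigma^2_{N_{S_i}} \le T_{R_i} \]
\[ \sigma^2_{N_{M_i}} \le 0.5 \cdot \min(T_{L_i}, T_{R_i}) \]
\[ \sigma^2_{N_{S_i}} \le 0.5 \cdot \min(T_{L_i}, T_{R_i}) \]
T_X is the masking threshold of channel X
\sigma^2_X is the variance of channel X
Threshold of M/S Channels
\[ T_{M_i} = 0.5 \cdot \min(T_{L_i}, T_{R_i}) \]
\[ T_{S_i} = 0.5 \cdot \min(T_{L_i}, T_{R_i}) \]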
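A short sketch of the conservative threshold rule follows. It assumes per-band threshold lists for the left and right channels; the function name `ms_thresholds` and the sample values are hypothetical.

```python
def ms_thresholds(t_left, t_right):
    """Per-band thresholds for the M and S channels from the L and R thresholds."""
    # Each of M and S receives half of the smaller L/R threshold in the band.
    t_mid = [0.5 * min(tl, tr) for tl, tr in zip(t_left, t_right)]
    t_side = list(t_mid)
    return t_mid, t_side

t_left = [4.0, 2.0, 3.0]
t_right = [1.0, 6.0, 3.0]
t_mid, t_side = ms_thresholds(t_left, t_right)
print(t_mid, t_side)
```

Because the noise in L and in R is the sum or difference of the M and S noise, keeping each of the two below half of min(T_L, T_R) keeps their combined variance below both original thresholds, provided the two noise terms are uncorrelated.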
3. M/S Coding–
Allocation Entropy
\[ SMR_{Channel_i} = \begin{cases} \dfrac{E_i}{T_i B_i} & \text{if } E_i > T_i B_i \\ 0 & \text{if } E_i \le T_i B_i \end{cases} \]
\[ AE_{Channel_i} = W_i \cdot \log(SMR_{Channel_i} + 1) \]

E_i is the energy of the i-th quantization band
B_i is the effective bandwidth of the i-th quantization band
W_i is the bandwidth of the i-th quantization band
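The allocation entropy of a single band can be sketched as below. The slides do not state the logarithm base, so the natural logarithm is an assumption here, and the function name `band_allocation_entropy` is hypothetical.

```python
import math

def band_allocation_entropy(energy, threshold, eff_bandwidth, bandwidth):
    """AE of one band: W_i * log(SMR_i + 1), with SMR_i = 0 for masked bands."""
    masked_level = threshold * eff_bandwidth
    # A band whose energy does not exceed T_i * B_i contributes nothing.
    smr = energy / masked_level if energy > masked_level else 0.0
    # Natural logarithm is an assumption of this sketch.
    return bandwidth * math.log(smr + 1.0)

print(band_allocation_entropy(energy=8.0, threshold=1.0, eff_bandwidth=2.0, bandwidth=4))
print(band_allocation_entropy(energy=1.0, threshold=1.0, eff_bandwidth=2.0, bandwidth=4))
```

Adding 1 inside the logarithm makes a fully masked band map to exactly zero, so AE only accumulates over bands that actually need bits.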
3. M/S Coding–
Available Bits in the M/S Channels

Channel Allocation Bits
For each band i, while i < 49:
    If band i is an L/R band: AEM = AEM + L_AE[i] and AES = AES + R_AE[i]
    Otherwise: AEM = AEM + M_AE[i] and AES = AES + S_AE[i]
After the last band:
\[ Bit_M = \frac{AE_M}{AE_M + AE_S} \cdot B, \quad Bit_S = \frac{AE_S}{AE_M + AE_S} \cdot B \]
B is the number of bits allocated to the current frame
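The flowchart above amounts to one accumulation loop followed by a proportional split. The sketch below assumes per-band AE lists and a per-band flag marking L/R bands; the function name `split_frame_bits` and the even split for an all-zero total are hypothetical choices not taken from the slides.

```python
def split_frame_bits(is_lr_band, l_ae, r_ae, m_ae, s_ae, frame_bits):
    """Split the frame's bit budget between the two coded channels by AE."""
    ae_m = 0.0
    ae_s = 0.0
    for i, lr in enumerate(is_lr_band):
        if lr:
            # L/R band: the first channel carries L, the second carries R.
            ae_m += l_ae[i]
            ae_s += r_ae[i]
        else:
            # M/S band: the first channel carries M, the second carries S.
            ae_m += m_ae[i]
            ae_s += s_ae[i]
    total = ae_m + ae_s
    if total == 0:
        # Assumption of this sketch: split evenly when nothing needs bits.
        return frame_bits / 2, frame_bits / 2
    return frame_bits * ae_m / total, frame_bits * ae_s / total

print(split_frame_bits([True, False], [3, 9], [1, 9], [9, 2], [9, 2], 800))
```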
4. Window Switch
Design Issues




Window Decision
Psychoacoustic Model
Window Grouping
Joint Design with Other AAC Modules
4. Window Switch–
Window Decision



Global Energy Ratio
Zero-Crossing Ratio
Tonal Attack
4. Window Switch–
Psychoacoustic Model

Models based on Long Window

Calculate SMRs for Short Windows from SMRs for Long Windows
[Figure: band SMRs for short window, band SMRs for long window]
4. Window Switch–
Window Grouping

Calculate the Scale Factor
The bit allocation module calculates the scale factor for each band.
Error of Scale Factors
\[ E_g = \sum_{b} \sum_{w \in g} \left( sf_{b,w} - sharedsf_{g,b} \right) \cdot bandwidth_b \]
Criterion
Minimize the number of groups
E_g in each group should be smaller than a threshold M
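One way to apply this criterion is a greedy left-to-right grouping, sketched below. Several points are assumptions rather than details from the slides: groups are taken to be runs of consecutive short windows, the shared scale factor of a group is taken to be the minimum over its windows in each band, and the names `group_error` and `group_windows` are hypothetical.

```python
def group_error(sf, windows, bandwidth):
    """E_g for one group; sf[w][b] is the scale factor of window w in band b."""
    error = 0.0
    for b, width in enumerate(bandwidth):
        # Assumption: the shared scale factor is the smallest one in the group.
        shared = min(sf[w][b] for w in windows)
        error += sum(sf[w][b] - shared for w in windows) * width
    return error

def group_windows(sf, bandwidth, max_error):
    """Extend the current group while its E_g stays below max_error."""
    groups = [[0]]
    for w in range(1, len(sf)):
        candidate = groups[-1] + [w]
        if group_error(sf, candidate, bandwidth) < max_error:
            groups[-1] = candidate
        else:
            groups.append([w])
    return groups

sf = [[10, 10], [10, 10], [20, 20], [20, 20]]
print(group_windows(sf, bandwidth=[4, 4], max_error=1))
```

Under these assumptions, removing a window from a group never increases its error, so extending each group as far as possible yields the fewest consecutive groups.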
5. Temporal Noise Shaping
5. TNS

Three Artifacts




Error Amplification at Attack Periods
Time-Aliasing
TNS Order vs. Error
Design Issues


Detection Mechanism
TNS Design
5. TNS

Remarks


Pre-aliasing leads to a tradeoff with pre-echo
Post-aliasing may be masked by post-masking
5. TNS-- Ease Aliasing Artifacts

Combining with Window Switch

Long Start and Long Stop window
6. Experiments





Psychoacoustic Model
M/S Coding
Window Switch
TNS
Overall
6. Experiments-- Test Samples
Track  Time  Name  Signal description          Category
1      10    es01  Vocal (Suzanne Vega)        Speech signal
2      8     es02  German speech               Speech signal
3      7     es03  English speech              Speech signal
4      10    sc01  Trumpet solo and orchestra  Complex sound mixtures
5      12    sc02  Orchestral piece            Complex sound mixtures
6      11    sc03  Contemporary pop music      Complex sound mixtures
7      7     si01  Harpsichord                 Single instruments
8      7     si02  Castanets                   Single instruments
9      27    si03  Pitch pipe                  Single instruments
10     11    sm01  Bagpipes                    Simple sound mixtures
11     10    sm02  Glockenspiel                Simple sound mixtures
12     13    sm03  Plucked strings             Simple sound mixtures
6. Experiments–
Psychoacoustic Model


Intel VTune 7.0
Psychoacoustic Models
P1: Psychoacoustic Model II
P4: MDCT Psychoacoustic Model
Speed up 72.58% over P1

      1      2      3      4      5      Average  Speedup (%)
P1    30.24  29.66  29.75  29.96  27.75  29.47
P4    8.57   8.94   8.00   7.31   7.59   8.08     72.58
6. Experiments--
Psychoacoustic Model

Speed up 14.59% over P1

Tracks   Length  P1    P4    Percentage (%)
es01     02:51   26    19    26.92
es02     02:17   19    14    26.32
es03     04:03   36    27    25.00
sc01     02:55   22    18    18.18
sc02     03:23   28    23    17.86
sc03     03:04   27    23    14.81
si01     04:47   39    36    7.69
si02     03:05   30    26    13.33
si03     05:34   49    45    8.16
sm01     04:27   38    35    7.89
sm02     02:01   18    16    11.11
sm03     04:11   38    34    10.53
Average          30.8  26.3  14.59
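The percentages in this table can be reproduced from the P1 and P4 columns as the relative reduction (P1 - P4) / P1. The short check below assumes that reading of the "Percentage" column; the slides do not state the units of P1 and P4, and the variable names are hypothetical.

```python
# P1 and P4 values per track, in the order es01 ... sm03.
p1 = [26, 19, 36, 22, 28, 27, 39, 30, 49, 38, 18, 38]
p4 = [19, 14, 27, 18, 23, 23, 36, 26, 45, 35, 16, 34]

# Relative reduction of P4 with respect to P1, in percent.
per_track = [round(100 * (a - b) / a, 2) for a, b in zip(p1, p4)]

# The overall figure is computed from the totals, not by averaging the rows.
overall = round(100 * (sum(p1) - sum(p4)) / sum(p1), 2)

print(per_track)
print(overall)
```

Computing the overall figure from the column totals matches the reported 14.59%; averaging the twelve per-track percentages would give a different number.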
6. Experiments--
Psychoacoustic Model

Category Result


P4 gets better quality than P1 for speech signals, single instruments and simple sound mixtures.
For complex sound mixtures, only sc02 is worse than P1.
[Figure: per-track results for P1 and P4 on es01 to sm03; vertical axis from 0 to -4]
6. Experiments– M/S Coding

Environment
Bit reservoir, window switch and TNS disabled
Uses P4
Improves average ODG by 0.39

Coding Mode  L/R      New M/S
es01         -1.57    -0.82
es02         -2.03    -0.55
es03         -2.21    -0.84
sc01         -0.74    -0.54
sc02         -1.11    -0.83
sc03         -0.7     -0.52
si01         -1.16    -1.05
si02         -3.24    -3.01
si03         -1.29    -1.21
sm01         -0.9     -0.93
sm02         -1.54    -1.4
sm03         -1.37    -1.5
Average      -1.4883  -1.1



6. Experiments– Window Switch
Coupling Method
Average ODGs with and without the coupling method are −0.7025 and −0.8483, respectively.
[Figure: ODG per track (es01 to sm03, plus average) for NCTU_AAC without Coupling Method and NCTU_AAC with Coupling Method; bit rate 128 kbps, sample rate 44.1 kHz, with short window and M/S; ODG axis from 0 to -1.6]
6. Experiments– TNS

Easing Aliasing Method


Improves quality on every track except sm01, especially on si02
6. Experiments– Overall
Track    Nero 6.3  QuickTime 6.3  NCTU-AAC
es01     -0.6      -0.32          -0.27
es02     -0.45     -0.11          -0.15
es03     -0.51     0.02           -0.23
sc01     -0.88     -0.22          -0.45
sc02     -1.38     -0.84          -0.66
sc03     -0.84     -0.64          -0.4
si01     -1.32     -0.71          -0.62
si02     -0.82     -0.72          -0.54
si03     -1.59     -0.78          -0.98
sm01     -1.36     -0.75          -0.61
sm02     -0.72     -0.37          -0.53
sm03     -1.29     -0.73          -0.62
Average  -0.98     -0.51417       -0.505

Commercial Encoders
Nero 6.3
QuickTime 6.3
Result
NCTU-AAC has better quality than Nero 6.3 on all tracks.
NCTU-AAC has better quality than QuickTime 6.3 on 7 tracks.
NCTU-AAC performs better than both encoders on average.
Encoders with Audio Patch Method

Track    Nero 6.3  Nero 6.3+APM  QuickTime 6.3  QT 6.3+APM  NCTU-AAC  NCTU-AAC+APM
es01     -0.6      -0.38         -0.32          -0.26       -0.27     -0.28
es02     -0.45     -0.44         -0.11          -0.18       -0.15     -0.14
es03     -0.51     -0.43         0.02           -0.02       -0.23     -0.24
sc01     -0.88     -0.73         -0.22          -0.21       -0.45     -0.43
sc02     -1.38     -0.70         -0.84          -0.43       -0.66     -0.51
sc03     -0.84     -0.40         -0.64          -0.32       -0.4      -0.37
si01     -1.32     -0.52         -0.71          -0.47       -0.62     -0.43
si02     -0.82     -0.63         -0.72          -0.55       -0.54     -0.53
si03     -1.59     -0.64         -0.78          -0.43       -0.98     -0.51
sm01     -1.36     -0.83         -0.75          -0.53       -0.61     -0.46
sm02     -0.72     -0.73         -0.37          -0.38       -0.53     -0.54
sm03     -1.29     -0.55         -0.73          -0.35       -0.62     -0.42
Average  -0.98     -0.5817       -0.51417       -0.34417    -0.505    -0.4050

QuickTime 6.3 with APM gets the best quality on average.
Conclusion
Quality and Efficiency
Efficient Psychoacoustic Model
    DCT-based Approach.
    Tonal Attack bands and Tone-Rich Signals.
M/S Coding
    Efficient decision method.
    Psychoacoustic model for M/S channels.
    Viterbi algorithm.
Window Switch
    Switch Detection.
    New grouping method.
    Psychoacoustic Model for Short Window.
Conclusion

TNS



Bit Allocation


Two-Step Approach.
Filter bank


Single Loop Approach
Bit Reservoir


Window Detection
New window switch policy
Fast DCT method
Audio Patch Method

Zero band and High frequency extension.
5. NCTU-AAC CODEC
[Figure: NCTU-AAC codec block diagram with the blocks Audio in, Filterbank, Psychoacoustic Model, W-Switch, TNS, M/S, Bit Allocation, Bit Reservoir, Quantization, VLC, Bit-Stream Packing, PatchEnable, Decoder and Effect]
5. NCTU-AAC CODEC (Patents)
[Figure: the same codec block diagram]
SC03 demonstration
Clips: SC03 Original, QT 6.3, Nero 6.3, Lame 3.88, NCTU-AAC, NCTU-MP3, QT 6.3+APM, Nero 6.3+APM, NCTU-AAC+APM, NCTU-MP3+APM, Lame 3.88+APM

Encoder    Without APM  With APM
QT 6.3     -0.64        -0.32
Nero 6.3   -0.84        -0.4
Lame 3.88  -1.16        -0.41
NCTU-AAC   -0.4         -0.37
NCTU-MP3   -0.91        -0.38
Questions