Design of MPEG-4 AAC Encoder
Authors: Chi-Min Liu, Wen-Chieh Lee, Chung-Han Yang, KangYan Peng, Ting Chiou, Tzu-Wen Chang, Yu-Hua Hsiao, Hen-Wen Hue and Chu-Ting Chien

Outline
  Introduction
  Psychoacoustic Model
  M/S Coding
  Window Switch
  Temporal Noise Shaping
  Experiments & Demonstration
  Conclusion

1. Introduction – NCTU-AAC Encoder
[Block diagram. Labels: Audio in, Psychoacoustic Model, W-Switch, Bit Reservoir, TNS, M/S, Bit Allocation, Quantization, VLC, Bit-Stream Packing, Filterbank]

1. Introduction
Modules
  Psychoacoustic Model
  M/S Coding
  Window Switch
  Temporal Noise Shaping
Objective
  Theoretical Frameworks
  Quality
  Complexity

2. Psychoacoustic Model
Approach
  MDCT-based instead of FFT-based.
  New masking models.
  Detection of tonal attack bands.
  Detection of tone-rich signals.

2. Psychoacoustic Model (c.1)
MDCT and FFT
  The two spectra are similar.
  The MDCT spectrum is chaotic because of aliasing.
  Using the MDCT gives one consistent spectrum for both the analysis and the encoding process.

2. Psychoacoustic Model (c.2)
DCT Spectrum
  Q-bands instead of lines or P-bands.
  Tone/noise information based on band flatness instead of frame predictability.

$$\mathrm{flatness}_b = \frac{GM_b}{AM_b}, \qquad GM_b = \left(\prod_{i=0}^{N-1} x_i\right)^{1/N}, \qquad AM_b = \frac{1}{N}\sum_{i=0}^{N-1} x_i$$

  For a tone-rich signal in band b, flatness_b approaches 0.
  For a noise-rich signal in band b, flatness_b approaches 1.

2. Psychoacoustic Model – Adaptive TMN and NMT Offset
Utilization
  Human perception is less sensitive at high frequencies.
  The masking effect at high frequencies is stronger than at low frequencies.
[Figure: Offset (vertical axis, 0 to 4) plotted over indices 1 to 49]

2. Psychoacoustic Model – Tonal Attack and Tone-Rich Signals
Tone/Harmonic
  Tonal attack.
  Tone-rich signals.
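The band flatness measure from the DCT Spectrum slide, which is what separates tone-rich bands from noise-rich ones, can be sketched as follows. This is a minimal illustration and not the NCTU-AAC implementation: it assumes the x_i are non-negative per-line values of one band (for example squared MDCT coefficients), and the function name `band_flatness` and the small `eps` floor that guards against log(0) are hypothetical additions.

```python
import math


def band_flatness(values, eps=1e-12):
    """Return GM_b / AM_b for the values x_i of one band.

    Assumption: values are non-negative (e.g. squared MDCT coefficients).
    eps is an illustrative floor so that a zero coefficient does not
    produce log(0); it is not part of the slide's formula.
    """
    n = len(values)
    if n == 0:
        raise ValueError("band must contain at least one value")
    floored = [max(abs(x), eps) for x in values]
    # Geometric mean computed in the log domain to avoid underflow.
    geometric_mean = math.exp(sum(math.log(x) for x in floored) / n)
    arithmetic_mean = sum(floored) / n
    return geometric_mean / arithmetic_mean


if __name__ == "__main__":
    noise_like = [1.0, 1.0, 1.0, 1.0]      # flat band
    tone_like = [100.0, 0.001, 0.001, 0.001]  # one dominant line
    print(band_flatness(noise_like))
    print(band_flatness(tone_like))
```

A perfectly flat band gives a ratio of 1, and a band dominated by a single line gives a ratio close to 0, matching the two limiting cases stated on the slide.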
Solution
  Masking adjustment
  Disable window switch
[Figure panels: Original Spectrum; Reconstructed Spectrum]

2. Psychoacoustic Model – Concluding Remarks
New Models
  Filterbank instead of FFT.
  SFM instead of unpredictability.
  Detection of tonal attack bands.
  Detection of tone-rich signals.
  Noise masking effect alone.
Results
  Speedup of 70% for AAC and 65% for MP3.
  Quality improves by 0.2 for AAC and 0.1 for MP3.

3. M/S Coding
[Block diagram of the NCTU-AAC encoder]

3. M/S Coding – Issues & Approach
  Band-level switching decision: Viterbi algorithm, from O(2^49) to O(49)
  M/S psychoacoustic model: conservative masking threshold
  Bits allocated to M/S channels: allocation entropy
  Joint design with window switch: coupling

3. M/S Coding – Viterbi Algorithm
Find the Optimal Solution
  S_LR(i) and S_MS(i) represent the optimal accumulated cost found at the i-th band.
  α_LR,LR, α_LR,MS, α_MS,LR and α_MS,MS represent the transition costs.
[Figure: two-state trellis over scale factor bands 0 to 48, with states S_LR(i) and S_MS(i), node costs n_LR(i) and n_MS(i), and the four transition costs between consecutive bands]

3. M/S Coding – Frame-Level Switching
  Compare the AE of M/S and L/R.
  C1 is a constant factor.
  AE_MS < C1 * AE_LR ?
    True: use M/S frame
    False: use L/R frame

3. M/S Coding – M/S Psychoacoustic Model
Noise of Reconstructed Signal

$$L'_i[k] = M'_i[k] + S'_i[k], \qquad R'_i[k] = M'_i[k] - S'_i[k]$$
$$L'_i[k] = L_i[k] + N_{L_i}[k] = M_i[k] + S_i[k] + N_{M_i}[k] + N_{S_i}[k]$$
$$R'_i[k] = R_i[k] + N_{R_i}[k] = M_i[k] - S_i[k] + N_{M_i}[k] - N_{S_i}[k]$$

3. M/S Coding – M/S Psychoacoustic Model
Variance of Noise

$$\sigma^2_{N_{L_i}} = \sigma^2_{N_{M_i}} + \sigma^2_{N_{S_i}} \le T_{L_i}, \qquad \sigma^2_{N_{R_i}} = \sigma^2_{N_{M_i}} + \sigma^2_{N_{S_i}} \le T_{R_i}$$
$$\sigma^2_{N_{M_i}} \le 0.5\,\min(T_{L_i}, T_{R_i}), \qquad \sigma^2_{N_{S_i}} \le 0.5\,\min(T_{L_i}, T_{R_i})$$

  T_X is the masking threshold of channel X.
  σ²_X is the variance of channel X.
Threshold of M/S Channels

$$T_{M_i} = 0.5\,\min(T_{L_i}, T_{R_i}), \qquad T_{S_i} = 0.5\,\min(T_{L_i}, T_{R_i})$$

3. M/S Coding – Allocation Entropy

$$SMR_{Channel_i} = \begin{cases} \dfrac{E_i}{T_i B_i} & \text{if } E_i > T_i B_i \\ 0 & \text{if } E_i \le T_i B_i \end{cases}$$
$$AE_{Channel_i} = W_i \log\left(SMR_{Channel_i} + 1\right)$$

  E_i is the energy of the i-th quantization band.
  B_i is the effective bandwidth of the i-th quantization band.
  W_i is the bandwidth of the i-th quantization band.

3. M/S Coding – Available Bits in the M/S Channels
Channel Allocation Bits
  For each band i, while i < 49:
    L/R band: AEM = AEM + L_AE[i], AES = AES + R_AE[i]
    otherwise: AEM = AEM + M_AE[i], AES = AES + S_AE[i]

$$Bit_M = B \cdot \frac{AE_M}{AE_M + AE_S}, \qquad Bit_S = B \cdot \frac{AE_S}{AE_M + AE_S}$$

  B is the number of bits allocated to the current frame.

4. Window Switch
[Block diagram of the NCTU-AAC encoder]

4. Window Switch
Design Issues
  Window Decision
  Psychoacoustic Model
  Window Grouping
  Joint Design with Other AAC Modules

4. Window Switch – Window Decision
  Global Energy Ratio
  Zero-Crossing Ratio
  Tonal Attack

4. Window Switch – Psychoacoustic Model
  Models based on the long window.
  Calculate SMRs for short windows from SMRs for long windows.
[Figure labels: band SMRs for short window; band SMRs for long window]

4. Window Switch – Window Grouping
Calculate the Scale Factor
  The bit allocation module calculates the scale factor for each band.
Error of Scale Factors

$$E_g = \sum_{b} \sum_{w \in g} \left| sf_{b,w} - sharedsf_{g,b} \right| \cdot bandwidth_b$$

Criterion
  Minimize the number of groups.
  E_g in each group should be smaller than a threshold M.

5. Temporal Noise Shaping
[Block diagram of the NCTU-AAC encoder]

5. TNS
Three Artifacts
  Error amplification at attack periods.
  Time-aliasing.
  TNS order vs. error.
Design Issues
  Detection mechanism
  TNS design

5. TNS
Remarks
  Pre-aliasing leads to a tradeoff with pre-echo.
  Post-aliasing may be masked by post-masking.

5. TNS – Ease Aliasing Artifacts
  Combining with window switch.
  Long Start and Long Stop windows.

6. Experiments
  Psychoacoustic Model
  M/S Coding
  Window Switch
  TNS
  Overall

6. Experiments – Test Samples

Track  Time  Signal  Description                 Category
1      10    es01    Vocal (Suzanne Vega)        Speech signal
2      8     es02    German speech               Speech signal
3      7     es03    English speech              Speech signal
4      10    sc01    Trumpet solo and orchestra  Complex sound mixtures
5      12    sc02    Orchestral piece            Complex sound mixtures
6      11    sc03    Contemporary pop music      Complex sound mixtures
7      7     si01    Harpsichord                 Single instruments
8      7     si02    Castanets                   Single instruments
9      27    si03    Pitch pipe                  Single instruments
10     11    sm01    Bagpipes                    Simple sound mixtures
11     10    sm02    Glockenspiel                Simple sound mixtures
12     13    sm03    Plucked strings             Simple sound mixtures

6. Experiments – Psychoacoustic Model
Measured with Intel vTune 7.0
Psychoacoustic Models
  P1: Psychoacoustic Model II
  P4: MDCT Psychoacoustic Model
Speedup of 72.58% over P1

Model  1      2      3      4      5      Average  Speedup (%)
P1     30.24  29.66  29.75  29.96  27.75  29.47
P4     8.57   8.94   8.00   7.31   7.59   8.08     72.58

6. Experiments – Psychoacoustic Model
Speedup of 14.59% over P1

Tracks   Length  P1    P4    Percentage (%)
es01     02:51   26    19    26.92
es02     02:17   19    14    26.32
es03     04:03   36    27    25.00
sc01     02:55   22    18    18.18
sc02     03:23   28    23    17.86
sc03     03:04   27    23    14.81
si01     04:47   39    36    7.69
si02     03:05   30    26    13.33
si03     05:34   49    45    8.16
sm01     04:27   38    35    7.89
sm02     02:01   18    16    11.11
sm03     04:11   38    34    10.53
Average          30.8  26.3  14.59

6. Experiments – Psychoacoustic Model
Category Result
  P4 gets better quality than P1 for speech signals, single instruments and simple sound mixtures.
  For complex sound mixtures, only sc02 is worse than with P1.
[Figure: chart over tracks es01 to sm03 comparing P1 and P4; vertical axis from 0 to -4]

6. Experiments – M/S Coding
Environment
  Bit reservoir, window switch and TNS disabled.
  Uses P4.
Coding Mode
  L/R
  New M/S
Average ODG improves by 0.39.

Track    L/R      New M/S
es01     -1.57    -0.82
es02     -2.03    -0.55
es03     -2.21    -0.84
sc01     -0.74    -0.54
sc02     -1.11    -0.83
sc03     -0.7     -0.52
si01     -1.16    -1.05
si02     -3.24    -3.01
si03     -1.29    -1.21
sm01     -0.9     -0.93
sm02     -1.54    -1.4
sm03     -1.37    -1.5
Average  -1.4883  -1.1

6. Experiments – Window Switch
Coupling Method
  Average ODGs with and without the coupling method are -0.7025 and -0.8483, respectively.
[Figure: ODG per track (es01 to sm03, plus average) for NCTU_AAC without Coupling Method and NCTU_AAC with Coupling Method. Bit rate = 128 kbps, sample rate = 44.1 kHz, with short window and M/S]

6. Experiments – TNS
Easing Aliasing Method
  Improves quality on all tracks except sm01.
  The improvement is largest for si02.

6. Experiments – Overall
Commercial Encoders
  Nero 6.3
  QuickTime 6.3

Track    Nero 6.3  QuickTime 6.3  NCTU-AAC
es01     -0.6      -0.32          -0.27
es02     -0.45     -0.11          -0.15
es03     -0.51     0.02           -0.23
sc01     -0.88     -0.22          -0.45
sc02     -1.38     -0.84          -0.66
sc03     -0.84     -0.64          -0.4
si01     -1.32     -0.71          -0.62
si02     -0.82     -0.72          -0.54
si03     -1.59     -0.78          -0.98
sm01     -1.36     -0.75          -0.61
sm02     -0.72     -0.37          -0.53
sm03     -1.29     -0.73          -0.62
Average  -0.98     -0.51417       -0.505

Result
  NCTU-AAC has better quality than Nero 6.3 on all tracks.
  NCTU-AAC has better quality than QuickTime 6.3 on 7 tracks.
  NCTU-AAC performs better than both encoders on average.

6. Experiments – Overall
Encoders with Audio Patch Method (APM)

Track    Nero 6.3  Nero 6.3 +APM  QuickTime 6.3  QT6.3 +APM  NCTU-AAC  NCTU-AAC +APM
es01     -0.6      -0.38          -0.32          -0.26       -0.27     -0.28
es02     -0.45     -0.44          -0.11          -0.18       -0.15     -0.14
es03     -0.51     -0.43          0.02           -0.02       -0.23     -0.24
sc01     -0.88     -0.73          -0.22          -0.21       -0.45     -0.43
sc02     -1.38     -0.70          -0.84          -0.43       -0.66     -0.51
sc03     -0.84     -0.40          -0.64          -0.32       -0.4      -0.37
si01     -1.32     -0.52          -0.71          -0.47       -0.62     -0.43
si02     -0.82     -0.63          -0.72          -0.55       -0.54     -0.53
si03     -1.59     -0.64          -0.78          -0.43       -0.98     -0.51
sm01     -1.36     -0.83          -0.75          -0.53       -0.61     -0.46
sm02     -0.72     -0.73          -0.37          -0.38       -0.53     -0.54
sm03     -1.29     -0.55          -0.73          -0.35       -0.62     -0.42
Average  -0.98     -0.5817        -0.51417       -0.34417    -0.505    -0.4050

  QuickTime 6.3 with APM gets the best quality on average.

Conclusion
Quality and Efficiency
Efficient Psychoacoustic Model
  DCT-based approach.
  Tonal attack bands and tone-rich signals.
M/S Coding
  Efficient decision method.
  Psychoacoustic model for M/S channels.
  Viterbi algorithm.
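The Viterbi search named above for the band-level L/R versus M/S decision (Section 3) can be sketched as follows. This is a hedged illustration rather than the encoder's actual code: the function name `viterbi_ms_decision`, the dictionary layout of the transition costs, and the reading of α_X,Y as "cost of moving from mode X in one band to mode Y in the next" are assumptions. The node costs n_LR(i) and n_MS(i) are taken as given inputs.

```python
def viterbi_ms_decision(n_lr, n_ms, alpha):
    """Choose "LR" or "MS" for every band with minimum accumulated cost.

    n_lr, n_ms: per-band node costs n_LR(i), n_MS(i) (assumed given).
    alpha: dict mapping (previous_mode, current_mode) to a transition
           cost; this layout is an illustrative assumption.
    Returns (list of modes per band, total cost).
    """
    if len(n_lr) != len(n_ms) or not n_lr:
        raise ValueError("need equal-length, non-empty cost lists")
    modes = ("LR", "MS")
    node = {"LR": n_lr, "MS": n_ms}

    # Accumulated costs S_LR(0) and S_MS(0) for the first band.
    cost = {m: node[m][0] for m in modes}
    back_pointers = []

    for i in range(1, len(n_lr)):
        new_cost, pointers = {}, {}
        for cur in modes:
            # Best predecessor for arriving at mode `cur` in band i.
            prev = min(modes, key=lambda p: cost[p] + alpha[(p, cur)])
            new_cost[cur] = cost[prev] + alpha[(prev, cur)] + node[cur][i]
            pointers[cur] = prev
        cost = new_cost
        back_pointers.append(pointers)

    # Trace back from the cheaper final state.
    last = min(modes, key=lambda m: cost[m])
    total = cost[last]
    path = [last]
    for pointers in reversed(back_pointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, total


if __name__ == "__main__":
    switch_penalty = {("LR", "LR"): 0, ("LR", "MS"): 10,
                      ("MS", "LR"): 10, ("MS", "MS"): 0}
    print(viterbi_ms_decision([1, 5, 1], [2, 1, 2], switch_penalty))
```

Each band is visited once and only two states are kept, so the work grows linearly with the number of bands, instead of enumerating all 2^49 mode combinations for 49 bands.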
Window Switch
  Switch detection.
  New grouping method.
  Psychoacoustic model for short window.

Conclusion
  TNS
  Bit Allocation
  Two-Step Approach.
  Filter bank
  Single Loop Approach
  Bit Reservoir
  Window Detection
  New window switch policy
  Fast DCT method
  Audio Patch Method
  Zero band and high-frequency extension.

5. NCTU-AAC CODEC
[Block diagram of the NCTU-AAC encoder with additional labels: PatchEnable, Decoder, Effect]

5. NCTU-AAC CODEC (Patents)
[Block diagram of the NCTU-AAC encoder with additional labels: PatchEnable, Decoder, Effect]

SC03 Demonstration
[Slide sequence, one item per slide: SC03 Original, SC03 QT 6.3, SC03 Nero 6.3, SC03 Lame 3.88, SC03 NCTU-AAC, SC03 NCTU-MP3, SC03 QT 6.3+APM, SC03 Nero 6.3+APM, SC03 NCTU-AAC+APM, SC03 NCTU-MP3+APM, SC03 Lame 3.88+APM]

Encoder         SC03
QT6.3           -0.64
Nero 6.3        -0.84
Lame 3.88       -1.16
NCTU-AAC        -0.4
NCTU-MP3        -0.91
QT6.3 +APM      -0.32
Nero 6.3 +APM   -0.4
NCTU-AAC +APM   -0.37
NCTU-MP3 +APM   -0.38
Lame 3.88 +APM  -0.41

Questions
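Backup: the allocation-entropy bit split between the M and S channels from Section 3 can be sketched as follows. This is a minimal sketch under stated assumptions, not the encoder's implementation: it reads the slide's SMR as E/(T·B) when E exceeds T·B and 0 otherwise, uses the natural logarithm (the slide does not state a base, and the base cancels in the bit ratio), and adds an even split when both entropies are zero. The function names `allocation_entropy` and `split_bits` are hypothetical.

```python
import math


def allocation_entropy(energy, threshold, eff_bandwidth, bandwidth):
    """AE of one band: W * log(SMR + 1).

    Assumed reading of the slide: SMR = E / (T * B) if E > T * B, else 0.
    """
    mask = threshold * eff_bandwidth
    smr = energy / mask if energy > mask else 0.0
    return bandwidth * math.log(smr + 1.0)


def split_bits(frame_bits, band_is_lr, l_ae, r_ae, m_ae, s_ae):
    """Split the frame's bit budget B between the two coded channels.

    band_is_lr[i] is True when band i is coded as L/R; that band then
    contributes L_AE[i] to AEM and R_AE[i] to AES, as in the flowchart.
    Otherwise it contributes M_AE[i] and S_AE[i].
    """
    ae_m = ae_s = 0.0
    for i, is_lr in enumerate(band_is_lr):
        if is_lr:
            ae_m += l_ae[i]
            ae_s += r_ae[i]
        else:
            ae_m += m_ae[i]
            ae_s += s_ae[i]
    total = ae_m + ae_s
    if total == 0.0:
        # Illustrative fallback, not taken from the slides.
        return frame_bits / 2.0, frame_bits / 2.0
    bit_m = frame_bits * ae_m / total
    return bit_m, frame_bits - bit_m


if __name__ == "__main__":
    print(split_bits(1000, [True, False],
                     l_ae=[3.0, 0.0], r_ae=[1.0, 0.0],
                     m_ae=[0.0, 4.0], s_ae=[0.0, 2.0]))
```

In the usage example the first band is L/R coded and the second is M/S coded, so AEM accumulates 3 + 4 and AES accumulates 1 + 2, giving a 70/30 split of the frame's bits.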