EE 5359 TOPICS IN SIGNAL PROCESSING
FINAL REPORT

ANALYSIS OF AVS-M FOR LOW PICTURE RESOLUTION MOBILE APPLICATIONS

Under the guidance of DR. K. R. RAO
DEPARTMENT OF ELECTRICAL ENGINEERING
UNIVERSITY OF TEXAS AT ARLINGTON

Submitted By: ADITYA DESHKAR (1000848085)
aditya.deshkar@mavs.uta.edu

List of Acronyms:
• AU Access Unit
• AVS Audio Video Standard
• AVS-M Audio Video Standard for Mobile
• B-Frame Interpolated Frame
• CAVLC Context Adaptive Variable Length Coding
• CBP Coded Block Pattern
• CIF Common Intermediate Format
• DIP Direct Intra Prediction
• DPB Decoded Picture Buffer
• EOB End of Block
• HD High Definition
• HHR Horizontal High Resolution
• ICT Integer Cosine Transform
• IDR Instantaneous Decoding Refresh
• I-Frame Intra Frame
• IMS IP Multimedia Subsystem
• ITU-T Telecommunication Standardization Sector of the International Telecommunication Union
• MB Macroblock
• MPEG Moving Picture Experts Group
• MPM Most Probable Mode
• MV Motion Vector
• NAL Network Abstraction Layer
• P-Frame Predicted Frame
• PIT Prescaled Integer Transform
• PPS Picture Parameter Set
• QCIF Quarter Common Intermediate Format
• QP Quantization Parameter
• RD Cost Rate Distortion Cost
• SAD Sum of Absolute Differences
• SD Standard Definition
• SEI Supplemental Enhancement Information
• SPS Sequence Parameter Set
• VLC Variable Length Coding

LIST OF FIGURES
FIGURE 1 : HISTORY OF A/V CODING STANDARD
FIGURE 2 : STANDARD STRUCTURE OF AVS-VIDEO
FIGURE 3 : LAYERED STRUCTURE AVS CHINA
FIGURE 4 : PICTURE TYPES IN AVS PART 7
FIGURE 5 : SLICE STRUCTURE FOR AVS PART 7
FIGURE 6 : MACROBLOCK PARTITIONING
FIGURE 7 : AVS-M ENCODER
FIGURE 8 : AVS-M DECODER
FIGURE 9 : INTRA_4X4 PREDICTION
FIGURE 10 : EIGHT DIRECTIONAL PREDICTION MODES OF AVS PART 7
FIGURE 11 : NINE INTRA_4X4 PREDICTION MODES OF AVS PART 7
FIGURE 12 : RELATIONS BETWEEN VARIABLE POSITIONS AND REFERENCE SAMPLES
FIGURE 13 : THE POSITION OF INTEGER, HALF, AND QUARTER PIXEL SAMPLES
FIGURE 14 : THE FLOW CHART OF MAIN()
FIGURE 15 : THE FLOW CHART OF ENCODE_I_FRAME()
FIGURE 16 : THE FLOW CHART OF ENCODE_P_FRAME()

LIST OF TABLES
TABLE 1 : PROFILES OF AVS CHINA STANDARD
TABLE 2 : DIFFERENT PARTS OF AVS STANDARD
TABLE 3 : NAL UNIT TYPES
TABLE 4 : CONTENT BASED MOST PROBABLE INTRA MODE DECISION
TABLE 5 : COMPARISON BETWEEN AVS PART 2 AND AVS PART 7
TABLE 6 : COMPARISON BETWEEN AVS PART 7 AND H.264 BASELINE PROFILE
TABLE 7 : COMPRESSED FILE SIZE, COMPRESSION RATIO, BIT RATE, PSNR AND SSIM AT VARIOUS QP FOR MOTHER-DAUGHTER_QCIF SEQUENCE
TABLE 8 : COMPRESSED FILE SIZE, COMPRESSION RATIO, BIT RATE, PSNR AND SSIM AT VARIOUS QP FOR NEWS_CIF SEQUENCE
TABLE 9 : COMPRESSED FILE SIZE, COMPRESSION RATIO, BIT RATE, PSNR AND SSIM AT VARIOUS QP FOR FOREMAN_QCIF SEQUENCE

Abstract:
Audio Video Standard for Mobile (AVS-M) [1] is the seventh part of the standard developed by the Audio Video coding Standard (AVS) workgroup of China. AVS-M is aimed particularly at mobile systems and devices with limited processing capability and power. This project provides insight into the AVS-M video coding standard (Jiben profile) [2] and analyzes its architecture, features and data formats for use in low-complexity, low-picture-resolution mobile applications. The project mainly focuses on providing an understanding of the AVS-M video encoder and decoder, detailing the logical components within these systems. A performance comparison is made with other popular standards [5], and the major applications of AVS-M are discussed. The key techniques used in this standard are studied, namely intra prediction, quarter-pixel interpolation, motion compensation modes, transform and quantization, entropy coding, in-loop deblocking filter, and profiles and tools, and the various methods of implementing each key technique are explored.

INTRODUCTION
Digital entertainment media is the largest among the plethora of applications [5] that have advanced owing to the success of standards for audio and video signals.
Mobile devices typically need an efficient video coding standard that takes the following factors into account:
1. Robust system design to deal with transmission errors
2. Estimation of information lost due to transmission errors
3. Performance at lower cost
4. High audio-video quality at low resolution
5. Performance at low power

The AVS-M [1] standard can cover a broad range of applications, including mobile multimedia services, IP multimedia subsystems, multimedia mailing, multimedia services over packet networks, video conferencing, video phone and video surveillance, all of which require the criteria listed above.

Over the past 20 years, analog communication around the world has largely been displaced by digital communication. The digital representation of information such as audio and video signals has advanced by leaps and bounds. With the increase in commercial interest in video communications, the need for international image and video compression standards arose. Many successful standards for audio-video signals have been released, and these have advanced a plethora of applications, the largest of which is digital entertainment media. Products have been developed that span a wide range of applications and have been enhanced by advances in other technologies such as the internet and digital media storage.

Figure 1 : HISTORY OF A/V CODING STANDARD[1]

AVS China [1] was developed by the AVS workgroup, and is currently owned by China. This audio and video standard was initiated by the Chinese government in order to counter the monopoly of the MPEG standards [5], which were costing it dearly. AVS China seeks to reduce the dependence of audio-video formatting on the MPEG formats, thereby providing China with a standard that helps save the millions of dollars otherwise lost to the MPEG group.
The objective of AVS was to create a national audio-video standard for broadcasting in China and to extend this technology across the globe.

Figure 2 : STANDARD STRUCTURE OF AVS-VIDEO[1]

PROFILES AND LEVELS
The Audio Video coding Standard (AVS) working group of China was established in 2002. AVS-video defines four profiles targeting different applications [4]: the Jizhun (base) profile, the Jiben (basic) profile, the Shenzhan (extended) profile and the Jiaqiang (enhanced) profile.

Profiles            Key Applications
Jizhun profile      Television broadcasting, HDTV, etc.
Jiben profile       Mobility applications, etc.
Shenzhan profile    Video surveillance, etc.
Jiaqiang profile    Multimedia entertainment, etc.

Table 1 : PROFILES OF AVS CHINA STANDARD [4]

AVS is an integrated set of standards covering system, video, audio and media copyright management. AVS-M is the 7th part of the video coding standard developed by the AVS Workgroup of China and is aimed at mobile systems and devices.

Profiles and Levels characteristics:
Significance: To facilitate interoperability among streams from various applications.
Profile: A subset of the syntax, semantics and algorithms defined by AVS-M.
Level: A set of constraints on the parameters of the stream.
The Jiben profile is defined with 9 levels: 1.0, 1.1, 1.2, 1.3, 2.0, 2.1, 2.2, 3.0 and 3.1 [4].

Table 2 : DIFFERENT PARTS OF AVS STANDARD[3]

DATA FORMATS USED IN AVS[5]
1) Progressive scan format: A method of storing and transmitting images in which all lines of each frame are drawn in sequence.
2) Interlaced scan format: Odd and even lines are drawn alternately.
(Interlacing even and odd fields)

Progressive scan format has the following advantages over interlaced format:
• More efficient motion estimation [11]
• Significantly lower bit rate required for encoding
• Less complexity in motion compensation [11]
Thus the progressive scan format satisfies the characteristics required by low-power, low-resolution mobile devices.

Layered Structure
AVS follows a layered structure for the data, which is clearly visible in the coded bit stream. The layered structure is shown in Figure 3.

Figure 3 : Layered Structure AVS China[5]

1) Sequence[3]
The sequence layer provides an entry point into the coded video.

2) Picture[3]
The picture layer provides the coded representation of a video frame. It comprises a header with mandatory and optional parameters, optionally followed by user data. AVS defines 3 types of pictures:
• I-Pictures (Intra Pictures)
• P-Pictures (Predicted Pictures)
• B-Pictures (Interpolated Pictures)
AVS-M uses the 4:2:0 subsampling format as shown in figure 6. AVS-M supports only I pictures and P pictures, as shown in Figure 4. AVS-M supports only progressive video sequences, so one picture is one frame. As shown in Figure 4, a P picture can have at most two reference frames for forward prediction.

Figure 4 : PICTURE TYPES IN AVS PART 7[3]

3) Slice[3]
A slice comprises a series of macroblocks. Slices must not overlap, must be contiguous, and must begin and terminate at the left and right edges of the picture. A single slice can cover the entire picture. Slices are coded independently, so no slice can refer to another during the decoding process.

Figure 5 : SLICE STRUCTURE FOR AVS PART 7[3]

4) Macroblocks and Blocks[3]
A picture is divided into macroblocks. The upper-left sample of each MB must not lie outside the picture boundary. The macroblocks are partitioned for motion compensation. The number in each rectangle specifies the order of appearance of the motion vectors.
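To make the picture and macroblock layers concrete, the following minimal sketch counts the macroblocks in a QCIF and a CIF picture and computes the raw size of one 4:2:0 frame. It assumes the 16x16 luma macroblock size used throughout this report and 8-bit samples (the sample depth is an assumption, not taken from the text); the function names are hypothetical.

```python
MB_SIZE = 16  # luma samples per macroblock side (16x16 macroblocks)


def macroblock_grid(width, height):
    """Return (columns, rows) of macroblocks needed to cover the picture."""
    cols = (width + MB_SIZE - 1) // MB_SIZE
    rows = (height + MB_SIZE - 1) // MB_SIZE
    return cols, rows


def frame_bytes_420(width, height):
    """Raw size of one 4:2:0 frame, assuming 8 bits per sample.

    One full-resolution luma plane plus two chroma planes, each
    subsampled by 2 horizontally and vertically.
    """
    luma = width * height
    chroma = (width // 2) * (height // 2)
    return luma + 2 * chroma


for name, (w, h) in {"QCIF": (176, 144), "CIF": (352, 288)}.items():
    cols, rows = macroblock_grid(w, h)
    print(name, "MB grid:", cols, "x", rows,
          "macroblocks:", cols * rows,
          "bytes per frame:", frame_bytes_420(w, h))
```

For QCIF this gives an 11 x 9 grid (99 macroblocks) and 38016 bytes per frame; for CIF it gives a 22 x 18 grid (396 macroblocks) and 152064 bytes per frame.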
Figure 6 : MACROBLOCK PARTITIONING[3]

AVS-M ENCODER[5]

Figure 7 : AVS-M encoder [5]

A video consists of a sequence of frames (YUV) [5]. Each frame is split into rectangular blocks known as macroblocks, each containing a fixed 16x16 block of luminance samples and the corresponding chrominance samples. Predictive coding, classified as either inter-frame coding or intra-frame coding, is performed on each macroblock. The transform is applied to the prediction residuals of the macroblocks, which are the differences between the original pixel values of the current image and the predicted pixel values. The transform coefficients are then quantized and scanned before entropy coding, and finally the entropy-coded information is converted into a bit stream.

AVS-M DECODER[5]

Figure 8 : AVS-M decoder [5]

The AVS decoder takes the compressed video elementary stream from the storage or transmission medium as its input and stores it in a rate buffer, from which the data is read out at the rate demanded by the decoding of each macroblock and picture. This is followed by a bit stream parser, which separates the quantization parameter, motion vectors and other side information from the coded data. The data is then passed through the VLD entropy decoder [5], which extracts the header information and the slice data along with the motion vectors. The signal is then decoded by the inverse quantizer and inverse DCT to reconstruct the prediction error, or coded data. The motion vectors are used by the motion compensation unit to generate the prediction of the current picture, which is added to the prediction error to generate the output signal.

Network Abstraction Layer (NAL) and Supplemental Enhancement Information[12]
NAL unit stands for network abstraction layer unit. It is a kind of packetization layer that prefixes certain headers to the encoded video bit stream.
A NAL unit is designed primarily for the following reasons:
1) To provide a network-friendly environment for the transmission of video data
2) To address video-related applications such as video telephony, video storage, broadcast and streaming applications, IPTV, etc.
3) The AVS encoded raw bit stream is converted to NAL units before it is sent over the network

In AVS-M video compression, a compressed video bit stream is made up of access units (AUs), and each AU contains the information for decoding a picture. An AU consists of a number of NAL units, some of which are optional. A NAL unit can be a sequence parameter set (SPS), a picture parameter set (PPS), an SEI, a picture header, or a slice_layer_rbsp (raw byte sequence payload), which consists of a slice_header followed by slice data [2] [5]. In the byte-format bit stream, a NAL unit starts with a 3-byte start code (0x000001) followed by a 1-byte NAL unit indicator in which nal_unit_type is represented in a 5-bit field. For decoding a picture in AVS-M, an AU contains optional SPS, PPS and SEI NAL units followed by a mandatory picture header NAL unit and several slice_layer_rbsp NAL units. Table 3 lists the NAL unit types.

Table 3 : NAL UNIT TYPES[13]

Intra Prediction[4],[13]
Two types of intra prediction are used:
[A] Intra_4x4 [13]
[B] Direct Intra Prediction (DIP) [4]
It significantly reduces the complexity and maintains comparable performance.

Intra_4x4
Each 4x4 block is predicted from spatially neighboring samples. For each 4x4 block, one of nine prediction modes can be used to exploit spatial correlation: eight directional prediction modes (such as Down Left, Vertical, etc.) and one non-directional prediction mode (DC). The 16 samples of the 4x4 block, labeled a-p, are predicted from previously decoded samples in adjacent blocks, labeled A-D, E-H and X. The up-right pixels used for prediction are extended from sample D, and the down-left pixels are extended from sample H.
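As an illustration of how a 4x4 block is predicted from its neighbors, the following minimal sketch implements simplified vertical, horizontal and DC predictors and selects among them by SAD. It is not the normative AVS-M process: it assumes that A-D are the samples above the block and E-H the samples to its left, it covers only three of the nine modes, and the function names are hypothetical.

```python
def predict_4x4(mode, top, left):
    """Return a 4x4 prediction block (list of rows) for a simplified mode.

    top  -- four decoded samples above the block (assumed to be A-D)
    left -- four decoded samples left of the block (assumed to be E-H)
    """
    if mode == "vertical":
        # Every row repeats the samples directly above the block.
        return [list(top) for _ in range(4)]
    if mode == "horizontal":
        # Every row is filled with the sample to its left.
        return [[left[r]] * 4 for r in range(4)]
    if mode == "dc":
        # Rounded mean of the eight neighboring samples.
        dc = (sum(top) + sum(left) + 4) >> 3
        return [[dc] * 4 for _ in range(4)]
    raise ValueError("unsupported mode: " + mode)


def sad(block, pred):
    """Sum of absolute differences between a block and its prediction."""
    return sum(abs(b - p)
               for brow, prow in zip(block, pred)
               for b, p in zip(brow, prow))


def best_mode(block, top, left):
    """Pick the simplified mode with the smallest SAD."""
    modes = ("vertical", "horizontal", "dc")
    return min(modes, key=lambda m: sad(block, predict_4x4(m, top, left)))


top = [10, 20, 30, 40]
left = [10, 10, 10, 10]
block = [list(top) for _ in range(4)]  # content with vertical structure
print(best_mode(block, top, left))
```

A real encoder would search all nine modes and may use a rate-distortion cost instead of plain SAD.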
Figure 9 : INTRA_4X4 PREDICTION[13]

Figure 10 : EIGHT DIRECTIONAL PREDICTION MODES OF AVS PART 7[13]

• One of the nine prediction modes shown in Figure 11 is used to exploit spatial correlation.

Figure 11 : NINE INTRA_4X4 PREDICTION MODES OF AVS PART 7[13]

Content based Most Probable Intra Mode Decision
A statistical model is used to determine the most probable intra mode of the current block based on video characteristics and content correlation. A look-up table is used to predict the most probable intra mode of the current block. Irrespective of whether Intra_4x4 or DIP is used, the most probable mode is decided as follows:
• Get the intra modes of the up block and the left block. If the up (or left) block is not available for intra mode prediction, the mode of the up (or left) block is defined as -1.
• Use the up intra mode and the left intra mode to find the most probable mode in the table.

Table 4 : CONTENT BASED MOST PROBABLE INTRA MODE DECISION[13]

If the current MB is coded in Intra_4x4 mode, the intra prediction mode is coded as follows:
• If the best mode equals the most probable mode, a 1-bit flag is transmitted for each block to indicate that the mode of the current block is its most probable mode.
• If the best mode is not the most probable mode, the 1-bit flag indicates that the mode of the current block is not the most probable mode, and 3 bits of mode information are then transmitted.
Thus the mode information of each block is represented in either 1 bit or 4 bits.

Direct Intra Prediction
When direct intra prediction is used, a new method is followed to code the intra prediction mode information. Rate-distortion based direct intra prediction consists of 5 steps.

Step 1: All 16 4x4 blocks in an MB use their most probable modes to perform Intra_4x4 prediction, and RDCost(DIP) of this MB is calculated.

RDCost(mode) = D(mode) + λ · R(mode)    (11)

Step 2: Perform the Intra_4x4 mode search, find the best intra prediction mode of each block, and calculate RDCost(Intra_4x4).
Step 3: Compare RDCost(DIP) and RDCost(Intra_4x4). If RDCost(DIP) is less than RDCost(Intra_4x4), the DIP flag is set to 1 and encoding proceeds to Step 4; otherwise the DIP flag is set to 0 and encoding proceeds to Step 5.
Step 4: Encode the MB using DIP and finish the encoding of this MB.
Step 5: Encode the MB using ordinary Intra_4x4 and finish the encoding of this MB.

Interframe Prediction[13]
AVS-M defines I pictures and P pictures. A P picture uses forward motion-compensated prediction, and the maximum number of reference pictures used by a P picture is 2. The standard also specifies non-reference P pictures: if the nal_ref_idc of a P picture is equal to 0, the P picture shall not be used as a reference picture. Non-reference P pictures can be used for temporal scalability.

The reference pictures are identified by the reference picture number, which is 0 for an IDR picture. After decoding the current picture, if nal_ref_idc of the current picture is not equal to 0, the current picture is marked as "used for reference". If the current picture is an IDR picture, all reference pictures except the current picture shall be marked as "unused for reference". Otherwise, if nal_unit_type of the current picture is not equal to 0 and the total number of reference pictures excluding the current picture is equal to num_ref_frames, the following applies:
• If num_ref_frames is 1, the reference pictures excluding the current picture in the DPB shall be marked as "unused for reference".
• If num_ref_frames is 2 and sliding_window_size is 2, the reference picture excluding the current picture in the DPB with the smaller reference picture number shall be marked as "unused for reference".
• Otherwise, if num_ref_frames is 2 and sliding_window_size is 1, the reference picture excluding the current picture in the DPB with the larger reference picture number shall be marked as "unused for reference".

The size of the motion compensation block can be 16x16, 16x8, 8x16, 8x8, 8x4, 4x8 or 4x4.
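The reference picture marking rules above can be sketched as follows. This is a minimal illustration that follows the wording of the rules literally, not decoder code from the reference software: it assumes the DPB can be modeled as a plain list of reference picture numbers, ignores any wrap-around of picture numbers, and uses a hypothetical function name.

```python
def mark_references(refs, cur_num, is_idr, nal_ref_idc, nal_unit_type,
                    num_ref_frames, sliding_window_size):
    """Return the picture numbers marked "used for reference" afterwards.

    refs    -- picture numbers currently marked "used for reference",
               excluding the picture that was just decoded
    cur_num -- reference picture number of the picture just decoded
    """
    refs = sorted(refs)
    if is_idr:
        # An IDR picture invalidates every earlier reference picture.
        refs = []
    elif nal_unit_type != 0 and len(refs) == num_ref_frames:
        if num_ref_frames == 1:
            refs = []
        elif num_ref_frames == 2 and sliding_window_size == 2:
            refs = refs[1:]    # drop the smaller picture number
        elif num_ref_frames == 2 and sliding_window_size == 1:
            refs = refs[:-1]   # drop the larger picture number
    if nal_ref_idc != 0:
        # The current picture itself becomes a reference picture.
        refs.append(cur_num)
    return refs


# Two reference frames with a sliding window of 2: the older one is dropped.
print(mark_references([3, 4], 5, is_idr=False, nal_ref_idc=1,
                      nal_unit_type=1, num_ref_frames=2,
                      sliding_window_size=2))
```

With sliding_window_size equal to 1, the same call keeps picture 3 and drops picture 4 instead, so one older picture is retained while the most recent reference is refreshed.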
If half_pixel_mv_flag is equal to 1, the precision of the motion vector is up to ½ pixel; otherwise the precision of the motion vector is up to ¼ pixel. When half_pixel_mv_flag is not present in the bitstream, it shall be inferred to be 1. The interpolated values at half-sample positions are obtained using the 8-tap filter F1 = (-1, 4, -12, 41, 41, -12, 4, -1) and the 4-tap filter F2 = (-1, 5, 5, -1). The positions of the integer, half and quarter pixel samples are shown in Figure 13. Capital letters indicate integer sample positions, while lower-case letters indicate half and quarter sample positions.

Figure 12 : RELATIONS BETWEEN VARIABLE POSITIONS AND REFERENCE SAMPLES[13]

Figure 13 : THE POSITION OF INTEGER, HALF, AND QUARTER PIXEL SAMPLES[13]

COMPARISON BETWEEN AVS PART 2 AND AVS PART 7
The major tools and technical features of the Jizhun and Jiben profiles of AVS-video are listed in Table 5.

Table 5 : Comparison between AVS Part 2 and AVS Part 7[3]

COMPARISON BETWEEN AVS PART 7 AND H.264 BASELINE PROFILE
Table 6 gives an overview of the tools and technical features of the Jiben profile of AVS and the H.264 baseline profile.

Table 6 : Comparison between AVS Part 7 and H.264 baseline Profile[3]

ERROR CONCEALMENT
Numerous techniques have been specified to deal with the transmission error problem: forward error concealment, backward error concealment and interactive error concealment [12]. In forward error concealment, the encoder plays the primary role. Backward error concealment refers to the concealment or estimation of information lost due to transmission errors, in which the decoder performs the error concealment task. Interactive techniques involving both decoder and encoder achieve the best reconstruction quality, but are more difficult to implement.

Error concealment scheme
Error concealment scheme 1 replaces the lost MBs, both intra and inter, with data chosen to make the picture look smoother [12].
For a lost MB in an I frame, the most probable intra prediction mode defined by the AVS-M intra prediction algorithm is used as the intra prediction mode. The lost MB is then reconstructed using this intra prediction mode and the available data from neighboring MBs. For lost MBs in a P frame, temporal prediction is used for the concealment. In this scheme, all lost P MBs are assumed to be 16×16 type MBs, and the motion vectors of neighboring MBs are used to predict the motion vector of the current MB using the algorithms defined in the AVS-M standard.

Main program flow analysis for encoder [20]:
In this section, we analyze in detail the main program flow in three key functions, main(), Encode_I_Frame() and Encode_P_Frame(), and give a flow chart for each.

main() is the main function of the AVS-M program. It first allocates and initializes the parameters and buffers required by the entire program. Then, according to the parameter pgImage->type, it decides whether the current image is coded as an I frame or a P frame and enters the corresponding I frame or P frame coding procedure. Finally, the motion-compensated image is reconstructed, returned to the main function and stored. With motion compensation, the amount of image data is significantly reduced. The flow chart of main() is shown in Figure 14.

Figure 14 : The flow chart of main() [20]

The flow chart of Encode_I_Frame() is shown in Figure 15.

Figure 15 : The flow chart of Encode_I_Frame() [20]

The flow chart of Encode_P_Frame() is shown in Figure 16.

Figure 16 : The flow chart of Encode_P_Frame()[20]

Simulation Result:
The software used for AVS China Part 7 is RM 3.3.7 [9]. Microsoft Visual Studio Professional 2012 has been used to build the project for the codec and run the code. Building the project generates two application files, encode.exe and decode.exe.
We run these two files with the appropriate parameters and obtain the final result, which is a decoded file. The original file and the decoded file are then evaluated using the MSU video quality measurement tool, from which the values of PSNR, MSE and SSIM are obtained.

Input sequence: mother-daughter_qcif.yuv
Total number of frames: 30
Original file size: 1139 KB
Width: 176, Height: 144
Frame rate: 30 fps

Figure 17 : Video quality at various QP values for mother_daughter_qcif (panels: Original Image, QP = 10, QP = 50, QP = 63)

Table 7 : Compressed file size, compression ratio, bit rate, PSNR and SSIM at various QP for mother-daughter_qcif sequence

Figure 18 : PSNR vs Bit Rate

Figure 19 : SSIM vs Bit Rate

Input sequence: news_cif.yuv
Total number of frames: 30
Original file size: 14850 KB
Width: 352, Height: 288
Frame rate: 25 fps

Figure 20 : Video quality at various QP values for news_cif (panels: Original Image, QP = 10, QP = 31, QP = 63)

Table 8 : Compressed file size, compression ratio, bit rate, PSNR and SSIM at various QP for news_cif sequence

Figure 21 : PSNR vs Bit Rate

Figure 22 : SSIM vs Bit Rate

Input sequence: foreman_qcif.yuv
Total number of frames: 30
Original file size: 3713 KB
Width: 176, Height: 144
Frame rate: 25 fps

Figure 23 : Video quality at various QP values for foreman_qcif (panels: Original, QP = 31, QP = 15, QP = 63)

Table 9 : Compressed file size, compression ratio, bit rate, PSNR and SSIM at various QP for foreman_qcif sequence

Conclusion:
AVS Part 7 targets low-complexity, low-picture-resolution mobility applications. The AVS encoder and decoder were implemented using the AVS-M software. Tests were carried out on various QCIF and CIF sequences, and the performance of AVS China was analyzed by varying the quantization parameter (QP). The bit rate, PSNR and SSIM values were calculated and tabulated. It can be observed that video quality is best at low QP, but the compressed file is then also largest. As QP increases, both the quality and the size of the compressed video decrease.

References:
[1] AVS working group official website, http://www.avs.org.cn
[2] W. Gao et al, "AVS - the Chinese next-generation video coding standard", National Association of Broadcasters, Las Vegas, 2004.
[3] L. Fan et al, "Overview of AVS Video Standard", IEEE International Conference on Multimedia and Expo, Vol. 1, pp. 423-426, June 2004.
[4] B. Tang, Y. Chen and W. Ji, "AVS Encoder Performance and Complexity Analysis Based on Mobile Video Communication", 2009 International Conference on Communications and Mobile Computing.
[5] L. Fan, "Mobile Multimedia Broadcasting Standards", Springer US, 2009.
[6] AVS-M Reference Software, http://www.avs.org.cn/fruits/en/softList.asp
[7] Y. Cheng et al, "Analysis and application of error concealment tools in AVS-M decoder", Journal of Zhejiang University - Science A, vol. 7, pp. 54-58, Jan 2006.
[8] Website for PSNR, http://en.wikipedia.org/wiki/Peak_signal-to-noise_ratio
[9] AVS China software: Part 7: ftp://124.207.250.92/incoming/video_codec/AVS1_P7
[10] S. Ma, S. Wang and W. Gao, "Overview of IEEE 1857 Video Coding Standards", IEEE ICIP, pp. 1500-1504, September 2013, Melbourne, Australia. (Several papers related to AVS China are in IEEE ICIP, 2013.)
[11] L. Yu et al, "Overview of AVS-video coding standards", Signal Processing: Image Communication, pp. 247-262, Nov 2009.
[12] Y. Wang, "AVS_M: From standards to Applications", Journal of Computer Science and Technology - Special section on China AVS standard, Vol. 21, No. 3, pp. 332-344, May 2006.
[13] L. Yu, "AVS Project and AVS-Video Techniques", ISPACS 2005, Dec. 13, 2005, http://wwwee.uta.edu/dip/Courses/EE5351/ISPACSAVS.pdf
[14] Microsoft Visual Studio Professional 2012: http://www.microsoft.com/enus/download/details.aspx?id=34673
[15] MSU video quality measurement tool: http://www.softrecipe.com/Download/msu_video_quality_measurement_tool.html
[16] Test video sequences: http://trace.eas.asu.edu/yuv/
[17] M. Liu and Z. Wei, "A fast mode decision algorithm for intra prediction in AVS-M video coding", ICWAPR '07, Vol. 1, pp. 326-331, 2-4 Nov. 2007.
[18] Y. Cheng et al, "Analysis and application of error concealment tools in AVS-M decoder", Journal of Zhejiang University - Science A, vol. 7, pp. 54-58, Jan 2006.
[19] S. Hu, X. Zhang and Z. Yang, "Efficient Implementation of Interpolation for AVS", Congress on Image and Signal Processing, 2008, Vol. 3, pp. 133-138, 27-30 May 2008.
[20] S. Hu, X. Zhang and Z. Yang, "Efficient Implementation of Interpolation for AVS", Congress on Image and Signal Processing, 2008, Vol. 3, pp. 133-138, 27-30 May 2008.