International Journal of Engineering Trends and Technology (IJETT) – Volume 10 Number 10 - Apr 2014 Video Coding Based on SSIM Roselin Raju1, Arun P.S2 1 PG Scholar, 2 Assistant Professor , Dept: of Electronics & Communication, Kerala University Mar Baselios college of Engineering and Technology, Nalanchira Trivandrum,India Abstract— This paper proposes a new image quality assessment algorithm- structured similarity index (SSIM) for video coding applications. It mainly incorporates the human visual sensitivity characteristics for video coding. This method can achieve significant gain at lower bit rate in terms of rate SSIM performance. Similarity measure is based on three comparisons such as luminance, chrominance and structure. Structural comparison extracts structural characteristics of an image which can improves the quality of video. Better quality video can be achieved by varying the value of quantisation parameter. Experimental results show the variation of video quality at different quantisation parameters. Keywords— Video coding, Structured Similarity Index, Peak Signal to Noise Ratio, Mean Square Error I. INTRODUCTION In several decades, video coding has been a very popular research area in video processing applications. It provides efficient solutions for storing and transmitting huge amount of data. Basically, video coding is equivalent to video compression. The main aim of video coding is reduce the size of the video without reducing the quality of video. Development of some video coding standards such as MPEG1, MPEG-2 , H.263 and H.263/AVC improves the efficiency of video coding out of which H.261, H.263,H.263+ are used for video conferencing and MPEG-1,MPEG-2 are used for video broad casting and archiving. These standards increase the coding efficiency by removing the statistical redundancy and temporal redundancy. A brief overview of statistical and temporal redundancy is given in [1] .There are several approaches used to improve the coding efficiency out of which perceptual video coding provides tremendous applications in video processing. In modern video coding methods, spatial redundancy has been extensively exploited using motion compensation, inter frame prediction technologies. Incorporating the human visual characteristics into the video coding techniques greatly reduces the bit rate without exploiting their quality. The main characteristics of HVS systems are frequency sensitivity, response to luminance amplitude variations and response to objective motion. A study of basic video coding algorithms is discussed in [5]. Initial procedure in every video coding algorithm is the generation of frame sequence. Normally, all video sequences are in RGB format. Conversion of RGB format to YUV in video coding makes the coding much simpler. Initially, video sequences are converted into frames. The quality of each frames are assessed using image quality ISSN: 2231-5381 assessment algorithms. Algorithms used for measuring image quality are classified into two types- Subjective evaluation and objective evaluation. The former one is usually time consuming and inconvenient. The latter one automatically predicts perceived image quality [4]. Objective quality metric is classified into full reference, reduced reference and no reference assessments. Mean square error (MSE) was one of the most commonly used full reference image quality assessment algorithms. II. RELATED WORKS H. Chen [2] had proposed a paper on Distributed video coding in which he makes a new paradigm of video coding based on temporal correlations between successive frames. C. Sun, H. J. Wang and H. Li [8]-[10] proposed a macro block level rate distortion optimization scheme based on the perceptual features of captured video. Based on the sensitivity features of human visual system this paper proposes three visual distortion sensitivity models-Attention model, position model, and texture structure model. These models are used to detect the active, central, flat regions. Zhi-Yi Mai proposed a paper discussed in [11], [12]. This paper is based on rate distortion optimization. Rate distortion optimization scheme is used for the selection of best prediction modes in H.264 I frame encoder. Rate distortion optimization is based on mean squared error (MSE) between the reconstructed blocks and the original blocks. This paper proposes two improved prediction mode selection methods based on structural similarity index. The first method is used to reduce the error or distortions in the perceived video. The second method is used to improve the compression efficiency. Domimque Brunet had proposed a paper on quality assessment from error sensitivity to structural similarity was discussed in [6]. Mean square error is simple to calculate and have clear physical meaning. It is computed by taking the average of squared intensity difference between the transmitted image and distorted image. But the disadvantage is that it does not incorporate the characteristics of human visual system (HVS). In [7], better perceived image quality can be achieved through proper bit allocation process. Psycho visual model consists of motion attention model and texture structure model which is used for bit allocation process. The idea behind in the bit allocation process is to allocate more bits to the video areas where more distortions occurs and allocate lesser number of bits to the video areas where less distortions occurs. http://www.ijettjournal.org Page 508 International Journal of Engineering Trends and Technology (IJETT) – Volume 10 Number 10 - Apr 2014 Many researches had gone into the development of quality assessment algorithms that incorporates the characteristics of human visual system. Based on the property of human visual system, a new paradigm (Structured similarity index) for quality assessment was proposed. Section III summarizes the advantages of structural similarity index vary from 0 to1. Human visual system is sensitive to relative luminance change rather than absolute luminance change. III. STRUCTURED SIMILARITY INDEX (SSIM) Pixels of the images that are highly structured exhibit strong dependencies. Those pixels exhibit strong dependencies carry important information about the structure of the object. Mean square error quality metric are independent of quality structure. Most of the existing quality measures based error sensitivity do not depends on the structure of the object. But the structured similarity index (SSIM) compares the structure of both the transmitted and the distorted image. This metric also incorporate the characteristics of human visual system thereby providing a good approximation to perceived image quality. Mean square approach is a bottom up approach where as structured similarity index is a top down approach. This top down approach also evaluates the structural changes between two complex structured signals. Each of the frame sequence consists of one luminance and two chrominance components. Luminance components are sensitive to human visual system where as chrominance components are less sensitive to human visual system. Luminance of an image is the product of illumination and reflectance components. But the structure of the image is independent of illumination. Consequently, in order to extract the structure of the information, influence of the illumination components was to be separated. Structural information in the image gives the structure of the objects in the image. The system diagram for structured similarity index is given in figure 1. Consider two image signals x and y with one signal having high quality and the other having lesser quality compared to the first signal. Initially, the system separates the luminance, chrominance and contrast components of the two image signals. First, comparison of luminance signal is done. Then mean intensities are removed from the image signal and contrast comparison is performed. After that the image signals are to be normalized. Finally structural comparison is performed on normalized signals. Normalized signals are obtained by dividing the signal with its standard deviation. Similarity can be measured by combining the three components. Boundedness, uniqueness and symmetry are the properties of SSIM. Each SSIM value of different video sequence must satisfy these properties. Structured similarity index is calculated using the following equation (1) l(x,y) represents the luminance comparison, c(x,y) represents the chrominance comparison and s(x,y) represents the structural comparison. The symbols α, β, ϒ are used to adjust the relative importance parameters. The value these symbols ISSN: 2231-5381 Fig. 1 Block Diagram of Structured Similarity Index(SSIM)[6] IV. PROPOSED SYSTEM A. Encoder Section Input video DCT SSIM Encoder Bit stream output B. Decoder Section Inverse DCT Encoded output Decoder Output In the encoder section, all the frame sequence in RGB format is converted into YUV sequence. The coefficients obtained after performing the DCT are stored in 8x8 matrix. The pixels of the transformed coefficients are compared with the coefficients of the pixels in the reference frames. Structural similarity is measured between the pixels in the reference frame and distorted frames. Inverse operation is performed on the decoding section. V. RESULTS AND DISCUSSION Simulation is done on different video sequences which are in .CIF format. Simulation is performed in MATLAB (2013). Image quality is assessed for each I, P, B frames and SSIM value is calculated for each of the video sequences. Performance can be evaluated for each frame. http://www.ijettjournal.org Page 509 International Journal of Engineering Trends and Technology (IJETT) – Volume 10 Number 10 - Apr 2014 Each frame is considered as an image. Each image is divided into macroblocks of variable block size. Here the following graph showing the variation of image quality of P frame based on the division of macroblocks. Quality of the video can also be varied by varying the quantisation parameter. Experimental results shows that better quality video can be achieved for a quantisation parameter 36. Table 1 Comparison of different frames for a quantisation parameter of 36 Parameters I frame P frame B frame No: of blocks - 76 36 Luma PSNR 34.857dB 33.9528dB 33.0944dB Chroma PSNR 37.2297dB 38.7243dB 37.5946dB Total rate 1.1615bpp 0.75738bpp 0.18535bpp Table 2 Comparison of different frames for a quantization parameter of 29 Fig. 2 Original frame & DCT transform result Parameters I frame P frame B frame No: of blocks - 27 4 Luma PSNR 27.9979dB 27.3599dB 26.8857dB Chroma PSNR 33.3316dB 33.1308dB 33.2892dB 34.2128dB 34.0324dB 34.1167dB 0.3997bpp 0.10156bpp 0.0665164bpp Total rate Table 3 Comparison of different frames for a quantization parameter of 40 Video No: of Resolution File SSIM sequence frames Foreman 300 352x288 YUV file 0.123 Football 125 352x240 YUV file 0.1109 Crowd 125 352x288 YUV file 0.0079 Garden 115 352x240 YUV file 0.0187 format Table 4 SSIM values for different video sequence Fig. 3 PSNR versus No: of blocks for P frame (QP-30) VI.CONCLUSION Similarily PSNR versus no: of blocks for P, B frames are also be obtained. Experimental results show that for a quantization parameter of 36, proposed approach provides better quality video. Performance evaluation also shows that the proposed approach doesn’t give better quality for a quantization parameter above 36 and below 36. Performance evaluation for different quantisation parameters is shown in the below tables. Parameters I frame P frame B frame No: of blocks - 54 27 Luma PSNR 42.803dB 34.828dB 34.1163dB Chroma PSNR 43.0735dB 43.8125dB 39.7412dB 41.7122dB 40.3192dB 42.0975dB Total rate 2.7826bpp 0.63585bpp 0.18316bpp ISSN: 2231-5381 We propose an image quality assessment metric, the structured similarity index for perceptual video coding. The novelty of this scheme lies in the comparison of three components. The proposed scheme demonstrates superior performance as compared to other image quality assessment algorithms. Improvement in visual quality and low bit rate is also achieved by the proposed scheme. ACKNOWLEDGMENT We would like to thank the almighty for sustaining the enthusiasm with which plunged into this endeavor. We are grateful to our families for supporting us and praying for us. We avail this opportunity to express our profound sense of sincere and deep gratitude to the Professors of MBCET for their valuable guidance for making this paper. My hearty and http://www.ijettjournal.org Page 510 International Journal of Engineering Trends and Technology (IJETT) – Volume 10 Number 10 - Apr 2014 inevitable thanks, to all our friends who supported us for the successful completion of this paper. REFERENCES [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] [14] Shiqi Wang, Abdul Rehman, Student Member, IEEE, Zhou Wang, Member, IEEE,”Perceptual Video Coding Based on SSIM-Inspired Divisive Normalization” Vol. 22, No. 4,April 2013 H. Chen, Y. Huang, P. Su, and T. Ou, “Improving video coding quality by perceptual rate-distortion optimization,” in Proc. IEEE Int. Conf. Multimedia Expo, Jul. 2010, pp. 1287–1292.. S. Channappayya, A. C. Bovik, and J. R. W. Heathh, “Rate bounds on SSIM index of quantized images,” IEEE Trans. Image Process., vol. 17, no. 9, pp. 1624– 1639, Sep. 2008. J. Chen, J. Zheng, and Y. He, “Macroblock-level adaptive frequency weighting for perceptual video coding,” IEEE Trans. Consum. Electron., vol. 53, no. 2, pp. 775–781, May 2007. Image and Video Compression: A Survey, Roger J. Clarke Department of Computing and Electrical Engineering, Heriot-Watt University, Riccarton, Edinburgh EH14 4 AS Scotland,1999. Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: From error visibility to structural similarity,” IEEE Trans. Image Process., vol. 13, no. 4, pp. 600–612, Apr. 2004. C.-W. Tang, C.-H. Chen, Y.-H. Yu, and C.-J. Tsai, “Visual sensitivity guided bit allocation for video coding,” IEEE Trans. Multimedia, vol. 8, no. 1, pp. 11–18, Feb. 2006. C. Sun, H.-J. Wang, and H. Li, “Macroblock-level rate-distortion optimization with perceptual adjustment for video coding,” in Proc. IEEEData Compress. Conf., Mar. 2008, p. 546. J. Chen, J. Zheng, and Y. He, “Macroblock-level adaptive frequency weighting for perceptual video coding,” IEEE Trans. Consum. Electron.,vol. 53, no. 2, pp. 775–781, May 2007. A. Tanizawa and T. Chujoh, “Adaptive quantization matrix selection,” Toshiba, Geneva, Switzerland, Tech. Rep. T05-SG16-060403-D-0266, Apr. 2006. Z. Mai, C. Yang, L. Po, and S. Xie, “A new rate-distortion optimization using structural information in H.264 I-frame encoder,” in Proc.ACIVS, 2005, pp. 435–441. X. Zhao, J. Sun, S. Ma, and W. Gao, “Novel statistical modeling, analysis and implementation of rate-distortion estimation for H.264/AVC coders,” IEEE Trans. Circuits Syst. Video Technol., vol. 20, no. 5,pp. 647–660, May 2010. E. Y. Lam and J. W. Goodman, “A mathematical analysis of the DCT coefficient distributions for images,” IEEE Trans. Image Process., vol. 9, no. 10, pp. 1661–1666, Oct. 2000. Joint Video Team (JVT) Reference Software. (2010) [Online].Available: http://iphome.hhi.de/suehring/tml/download/old-jm ISSN: 2231-5381 http://www.ijettjournal.org Page 511