International Journal of Engineering Trends and Technology (IJETT) – Volume 12 Number 10 – Jun 2014

Low Complexity Video Compression Based on Orthogonal Transforms

Neelesh Choudhary#1, Alok Jain#2
#Samrat Ashok Technological Institute, Vidisha, M.P., India

Abstract— Video is a three-dimensional (3D) array of pixels. Two of its dimensions form the spatial domain of the moving pictures, and the remaining dimension forms the time domain. Compared with other multimedia applications, transmitting video frames requires a large bandwidth and storing them requires a large amount of space, so video compression is necessary. The proposed technique compresses video by converting it from 3D to two dimensions (2D) using the Accordion function. This transformation converts the temporal redundancy of the video into spatial redundancy. To eliminate the spatial redundancy, the Discrete Cosine Transform (DCT), Discrete Walsh Transform (DWT) and Discrete Kekre Transform (DKT) are applied separately. The transforms are compared on the basis of the calculated performance parameters, and the Discrete Kekre Transform is found to perform better than the Discrete Cosine Transform and the Discrete Walsh Transform.

Keywords— Discrete cosine transform, Discrete Walsh Transform, Discrete Kekre Transform, Accordion function, Arithmetic encoding.

I. INTRODUCTION
Video is a sequence of pictures, or frames. A video signal has a high correlation between successive frames, which is known as temporal redundancy. Neighbouring pixels within an image are also similar to some extent, which is known as spatial redundancy. Video is now used in almost every field of information technology, so it must be compressed and decompressed efficiently. The aim of video compression is to reduce the size of the video signal, and with it the storage size and the transmission time, without degrading its visual quality.
In this paper, a new video compression technique is proposed that removes both temporal and spatial redundancy in order to improve compression efficiency with minimal processing complexity. Portable digital video applications require efficient real-time video compression and decompression, and for such applications a technique that is efficient, low-cost and of low complexity is desirable. Motion estimation is widely used among video compression techniques. It exploits inter-frame correlation and therefore provides efficient compression. However, it is a complex process: it requires a large number of operations per pixel, and the motion estimation step is computationally expensive [1, 2]. Motion-based video coding standards such as MPEG were designed for stored-video applications, in which encoding is typically carried out offline; they are therefore not well suited to real-time compression of portable video.

ISSN: 2231-5381

Another way to exploit inter-frame correlation is a transform-based approach. The objective here is to find a suitable transform that has the energy compaction property and allows efficient subsequent entropy coding. For image compression, the wavelet transform and the 2D DCT have shown these properties, and both are used in image/video compression in the spatial domain. The 2D DCT can be extended to a third dimension, giving the 3D DCT, by including time as the third dimension of the transformation and of the energy compaction [3, 4, 5, 6]. A transform coder achieves a compression ratio close to that of motion-estimation-based coding at a lower complexity [7].

II. TRANSFORM BASED CODING
Transform coding has been in use for more than two decades and has proven to be one of the most efficient coding techniques, especially in the spatial domain. It is now the basis of almost all video coding standards.
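To make the energy-compaction idea concrete, here is a minimal plain-Python sketch that builds N×N matrices for the three transforms compared in this paper (the DCT of equations (1) and (2), a Walsh matrix matching Fig. 9, and the Kekre matrix of equation (3)) and applies each one separably to a block as T·X·Tᵀ. It is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the Walsh rows are obtained by sorting Hadamard rows by their number of sign changes, and the Walsh and Kekre rows are scaled to unit length so that every matrix is orthonormal and the inverse is simply Tᵀ·Y·T.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT matrix: entry (u, x) = alpha(u) * cos((2x+1) u pi / 2n), eqs. (1)-(2)."""
    return [[(math.sqrt(1 / n) if u == 0 else math.sqrt(2 / n))
             * math.cos((2 * x + 1) * u * math.pi / (2 * n))
             for x in range(n)]
            for u in range(n)]


def walsh_matrix(n):
    """Walsh matrix with +1/-1 entries, rows ordered by sign changes (n a power of 2)."""
    assert n > 0 and (n & (n - 1)) == 0, "n must be a power of 2"
    h = [[1]]
    while len(h) < n:  # Sylvester construction of the Hadamard matrix of order n
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]

    def sign_changes(row):
        return sum(a != b for a, b in zip(row, row[1:]))

    return sorted(h, key=sign_changes)


def kekre_matrix(n):
    """Kekre matrix of eq. (3): 1 on/above the diagonal, -n + (x-1) just below it, 0 elsewhere."""
    k = [[0] * n for _ in range(n)]
    for x in range(1, n + 1):        # 1-based row index, as in eq. (3)
        for y in range(1, n + 1):    # 1-based column index
            if x <= y:
                k[x - 1][y - 1] = 1
            elif x == y + 1:
                k[x - 1][y - 1] = -n + (x - 1)
    return k


def normalize_rows(m):
    """Scale each row to unit length; a matrix with orthogonal rows becomes orthonormal."""
    return [[v / math.sqrt(sum(w * w for w in row)) for v in row] for row in m]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def transform_2d(t, block):
    """Separable 2D transform Y = T X T^T of a square block."""
    return matmul(matmul(t, block), transpose(t))


def energy_in_top_left(coeffs, k):
    """Fraction of the total energy held by the k x k lowest-index coefficients."""
    total = sum(v * v for row in coeffs for v in row)
    low = sum(coeffs[i][j] ** 2 for i in range(k) for j in range(k))
    return low / total


# Hypothetical smooth 8x8 block (a brightness ramp) standing in for correlated pixels.
block = [[100 + 2 * i + 3 * j for j in range(8)] for i in range(8)]
transforms = {
    "DCT": dct_matrix(8),
    "Walsh": normalize_rows(walsh_matrix(8)),
    "Kekre": normalize_rows(kekre_matrix(8)),
}
for name, t in transforms.items():
    coeffs = transform_2d(t, block)
    print(name, round(energy_in_top_left(coeffs, 2), 4))
```

For this smooth block, more than 99% of the energy lands in the 2×2 lowest-index coefficients for all three transforms. The unnormalized ±1 Walsh matrix needs only additions and subtractions; the normalization used here just adds one scale factor per row.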
The most commonly used transform-based video coder applies the DCT to each frame, as in the JPEG compression standard; for video this is known as M-JPEG, where “M” stands for “motion”. Transform-based coding is shown in Fig. 1. First, each input video frame is divided into N×N blocks, and a transform is then applied to each block [6, 7, 8]. The purpose of the transform is to de-correlate the pixels of the input block. This is achieved by redistributing the energy of the pixels so that most of it is concentrated in a small set of transform coefficients, a property known as energy compaction.

Fig. 1 Transform based video compression

Compression is achieved in two ways. First, low-energy coefficients are discarded because they have minimal impact on the reconstruction. Second, the retained coefficients are quantized according to their visual importance, because the human visual system has different sensitivities to different frequencies. The quantized coefficients are then suitably coded [9, 10].

http://www.ijettjournal.org Page 543

III. METHODOLOGY
The approach here is to represent the data in a highly correlated form, which is obtained by exploiting both the temporal and the spatial redundancy of the video signal. The input of the encoder is referred to as a video cube (3D), which is made up of a number of frames. This cube is decomposed into temporal frames, which are then combined into one frame (2D). In the next step, the 2D frame is decomposed into N×N blocks of pixels and a transform is applied. The block diagram of accordion-based transform coding is shown in Fig. 2.

Fig. 2 Block diagram representation of Accordion Based Transform Coding

The objective is to carry out the spatial and temporal decomposition of the 3D video signal. Fig. 4 shows the spatial decomposition of a video cube made of the first four frames. Here I(h, w, tn) is the pixel intensity value at the corresponding coordinates, where h and w index the height and width of the video frame and tn is the time instant. Initially, in spatial decomposition, each frame is decomposed into the pixel intensity values of its rows and columns, as shown in Fig. 4. The group of frames is then decomposed in the temporal (time) domain, as shown in Fig. 5.

Fig. 4 Spatial decomposition of frames
Fig. 5 Temporal decomposition of frames

Temporal decomposition is achieved by merging the video-cube pixels that have the same column rank, since the resulting frames are more strongly correlated than the spatial frames. The frames obtained from temporal decomposition are called temporal frames.

A. Accordion Representation
The accordion representation transforms the video into images in which pixels that are adjacent in the temporal frames become spatial neighbours [11]. Fig. 3 shows the first four frames of the News reader sequence. To obtain the accordion representation, temporal frames of the same column rank are grouped together, since they are highly correlated with each other. The direction of the arrows in Fig. 6 shows the order in which the frames are arranged to form the accordion.

Fig. 3 Frames of News reader video
Fig. 6 Arrangement of Temporal frames to form Accordion representation of video

Fig. 6 shows the resulting accordion representation of the spatially and temporally decomposed video file. The accordion representation of the News reader sequence is shown in Fig. 7.

Fig. 7 Accordion representation in the form of intensity values of video file

The full two-dimensional Discrete Walsh Transform applied to a frame (image) of size N×N requires only additions and subtractions, and no multiplications [13]. The 4×4 Walsh transform matrix is shown in Fig. 9.

 1   1   1   1
 1   1  -1  -1
 1  -1  -1   1
 1  -1   1  -1

Fig. 9 4×4 Walsh Transform Matrix

Although the DWT is computationally simpler than the other transforms, it is not suitable for applications that require a high compression ratio.
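The accordion construction described in Section III-A can be sketched in plain Python as follows. The sketch assumes the serpentine ordering of Ouni et al. [11], in which column w of every frame in the group of pictures is placed in a run of T adjacent columns and the frame order reverses in every other run; that is our reading of the arrows in Fig. 6, so treat the orientation, the function names and the nested-list frame layout (frames[t][row][col]) as assumptions rather than the authors' code.

```python
def _frame_for(col, k, t_len):
    """Frame index placed at offset k inside the column group `col` (serpentine order)."""
    return k if col % 2 == 0 else t_len - 1 - k


def accordion(frames):
    """Turn a GOP of T frames (each H x W, nested lists) into one H x (W*T) accordion image."""
    t_len, height, width = len(frames), len(frames[0]), len(frames[0][0])
    acc = [[0] * (width * t_len) for _ in range(height)]
    for col in range(width):          # one group of T adjacent columns per frame column
        for k in range(t_len):
            t = _frame_for(col, k, t_len)
            for row in range(height):
                acc[row][col * t_len + k] = frames[t][row][col]
    return acc


def inverse_accordion(acc, t_len):
    """Rebuild the T frames from an image produced by accordion()."""
    height, width = len(acc), len(acc[0]) // t_len
    frames = [[[0] * width for _ in range(height)] for _ in range(t_len)]
    for col in range(width):
        for k in range(t_len):
            t = _frame_for(col, k, t_len)
            for row in range(height):
                frames[t][row][col] = acc[row][col * t_len + k]
    return frames


# Hypothetical GOP of two 2x3 frames; pixel value = 100*t + 10*row + col.
gop = [[[100 * t + 10 * r + c for c in range(3)] for r in range(2)] for t in range(2)]
image = accordion(gop)
print(image[0])  # [0, 100, 101, 1, 2, 102]
```

Inside each run of columns the neighbours are the same pixel at consecutive time instants, and at the boundary between runs the neighbours come from the same frame, so temporal correlation becomes horizontal spatial correlation that a block transform can exploit. The mapping is a pure permutation, so inverse_accordion recovers the GOP exactly.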
Fig. 8 Accordion representation of Akiyo video file

The obtained accordion frames, denoted IACC, are decomposed into 8×8 blocks, and a transform is applied to each 8×8 block. The transforms used here are the DCT, DWT and DKT. The transformed coefficients are then quantized and encoded.

B. Transforms Used: The transforms used in this work are described below.

1) Discrete Cosine Transform: The Discrete Cosine Transform (DCT) is widely used in lossy image compression. For an N×N block f(x, y), it is given by

$$F(u,v) = \alpha(u)\,\alpha(v) \sum_{x=0}^{N-1} \sum_{y=0}^{N-1} f(x,y) \cos\left[\frac{(2x+1)u\pi}{2N}\right] \cos\left[\frac{(2y+1)v\pi}{2N}\right] \qquad (1)$$

where

$$\alpha(u) = \begin{cases} \sqrt{1/N}, & u = 0 \\ \sqrt{2/N}, & u = 1, 2, \ldots, N-1 \end{cases} \qquad (2)$$

and α(v) is defined in the same way. Because of its simplicity and satisfactory compression performance, the DCT has been the most widely used transform in transform coding. Its drawback is that it introduces blocking artifacts at low bit rates.

2) Discrete Walsh Transform: The Walsh transform matrix is defined as a set of N rows, denoted Wj, for j = 0, 1, ..., N-1. Each Wj takes only the values +1 and -1, and Wj[0] = 1 for all j. The Walsh transform matrix is defined using the Hadamard matrix of order N. At high compression ratios, the reconstruction quality of the Walsh transform drops drastically.

3) Discrete Kekre Transform: Unlike the Walsh transform matrix, whose size must be a power of 2, the Kekre transform matrix can be of any size. All elements on and above the main diagonal are one, and all elements below the diagonal are zero except those immediately below it [13, 14]. The elements of the N×N Kekre transform matrix are given by

$$K(x,y) = \begin{cases} 1, & x \le y \\ -N + (x-1), & x = y + 1 \\ 0, & x > y + 1 \end{cases} \qquad (3)$$

where x and y, with 1 ≤ x, y ≤ N, are the row and column indices. The N×N Kekre transform matrix is shown in Fig. 10.

   1       1       1     ...    1          1
  -N+1     1       1     ...    1          1
   0      -N+2     1     ...    1          1
   :       :       :            :          :
   0       0       0     ...    1          1
   0       0       0     ...  -N+(N-1)     1

Fig. 10 N×N Kekre Transform Matrix

C. Quantization and Zigzag Scanning: Quantization is the process of mapping a large set of values to a smaller set by rounding values to some unit of precision. After the transformed values are quantized, an entropy encoder compresses them further. Entropy coding is lossless and reversible.
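The quantization step, together with the zigzag scan and run-length coding that follow it in Sections III-C and III-D, can be sketched as below. The paper does not give its quantization table or step size, so the single uniform step used here is a hypothetical stand-in (JPEG, for instance, uses a different step per frequency); the zigzag order is the familiar JPEG-style one, and the (run count, run value) pairs follow the run-length description in Section III-D. Arithmetic coding is not shown.

```python
def quantize(coeffs, step):
    """Uniform quantizer: divide by `step` and round to the nearest integer (ties to even)."""
    return [[round(v / step) for v in row] for row in coeffs]


def dequantize(levels, step):
    """Approximate reconstruction of the coefficients from the quantization levels."""
    return [[q * step for q in row] for row in levels]


def zigzag_order(n):
    """(row, col) positions of an n x n block in JPEG-style zigzag order."""
    order = []
    for s in range(2 * n - 1):  # s = row + col selects one anti-diagonal
        cells = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        # odd diagonals run top-right to bottom-left, even ones bottom-left to top-right
        order.extend(cells if s % 2 else reversed(cells))
    return order


def run_length_encode(values):
    """Encode a sequence as (run count, run value) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][1] == v:
            runs[-1][0] += 1
        else:
            runs.append([1, v])
    return [tuple(r) for r in runs]


def run_length_decode(runs):
    return [value for count, value in runs for _ in range(count)]


# Hypothetical 4x4 block of transform coefficients and quantizer step.
coeffs = [[80.0, -12.0, 3.0, 0.5],
          [-9.0, 4.0, 0.4, 0.0],
          [2.5, 0.3, 0.0, 0.0],
          [0.2, 0.0, 0.0, 0.0]]
levels = quantize(coeffs, 4)
scan = [levels[i][j] for i, j in zigzag_order(4)]
print(scan)                     # [20, -3, -2, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
print(run_length_encode(scan))  # [(1, 20), (1, -3), (1, -2), (3, 1), (10, 0)]
```

Because the zigzag scan visits low frequencies first, the small high-frequency coefficients that quantize to zero end up in one long trailing run, which is what makes run-length coding effective here. Only the quantizer is lossy; the scan and the run-length pairs are exactly reversible.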
An effective way to code the resulting set of quantized coefficients is to read them out in zigzag order, which runs from low to high frequencies so that the mostly zero high-frequency coefficients are grouped together at the end. Fig. 11 shows the zigzag ordering of the quantized coefficients.

Fig. 11 Zig zag scanning of coefficients

D. Encoding: In entropy encoding, the idea is to find a reversible mapping of the quantized values such that the average number of bits per symbol is minimized. The encoding used here is Run Length Encoding (RLE) and arithmetic coding. Run-length encoding works by reducing the physical size of a repeating string of characters. Such a repeating string, called a run, is typically encoded into two bytes. The first byte represents the number of characters in the run and is called the run count. The second byte is the value of the character in the run, which is in the range 0 to 255, and is called the run value. RLE schemes are simple and fast, but their compression efficiency depends on the type of image data being encoded. The idea behind arithmetic coding is to take a probability line from 0 to 1 and assign every symbol a range on this line based on its probability: the higher the probability, the wider the range assigned to it.

The performance parameters used here to compare the DCT, the Discrete Walsh Transform (DWT) and the Discrete Kekre Transform (DKT) are Peak Signal to Noise Ratio (PSNR) and Compression Ratio.

A. MSE and PSNR
The two error metrics commonly used to compare image/video compression techniques are the mean square error (MSE) and the peak signal-to-noise ratio (PSNR). The MSE is the cumulative squared error between the compressed and the original image, whereas the PSNR is a measure of the peak error. PSNR is the ratio between the maximum possible power of a signal and the power of the corrupting noise that affects the fidelity of its representation. Because many signals have a very wide dynamic range, PSNR is usually expressed on a logarithmic decibel scale.
PSNR is the most commonly used measure of reconstruction quality for lossy compression methods. MSE and PSNR are given by

$$\mathrm{MSE} = \frac{1}{MN} \sum_{i=1}^{M} \sum_{j=1}^{N} \left[F(i,j) - F'(i,j)\right]^2 \qquad (4)$$

where M and N are the numbers of rows and columns of a frame, F(i, j) is the original frame and F′(i, j) is the reconstructed frame, and

$$\mathrm{PSNR} = 20 \log_{10}\left(\frac{\mathrm{MAX}}{\sqrt{\mathrm{MSE}}}\right) \qquad (5)$$

Here MAX is the maximum possible pixel value of the image; when pixels are represented with 8 bits per sample, MAX = 255.

B. Compression Ratio
The compression ratio (CR) measures the extent to which the video file is compressed. It is given by

$$\mathrm{CR} = \frac{n_1}{n_2} \qquad (6)$$

where n_1 is the number of bits in the input video and n_2 is the number of bits in the compressed video.

IV. EXPERIMENTS AND RESULTS
Experiments were performed to study the performance of the proposed technique on the four test videos shown in Fig. 12.

Fig. 12 (a) 1st frame of Akiyo (b) 4th frame of Deadline (c) 2nd frame of Container (d) 100th frame of Bowing

The transform that provides the highest PSNR and compression ratio is considered the best. The transforms used here are the Discrete Cosine Transform (DCT), the Discrete Walsh Transform (DWT) and the Discrete Kekre Transform (DKT). As Fig. 13 shows, DKT gives a better PSNR than DCT and DWT for every frame.

Fig. 13 PSNR (dB) versus frame number: comparison between ACC-DKT, ACC-DCT and ACC-DWT (Bowing test video)

Table I shows the overall performance of the three transforms (DCT, DWT and DKT) at different compression ratios, in terms of the PSNR observed at each compression ratio. The transform that gives the highest PSNR at a high compression ratio is considered the best.
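Equations (4) to (6) translate directly into code. The sketch below is a plain-Python illustration with hypothetical function names; it assumes 8-bit frames stored as nested lists. Table I reports PSNR averaged over 50 frames, and average_psnr assumes this means the mean of the per-frame PSNR values, which the paper does not state explicitly.

```python
import math


def mse(original, reconstructed):
    """Mean squared error between two equal-size frames, eq. (4)."""
    m, n = len(original), len(original[0])
    return sum((original[i][j] - reconstructed[i][j]) ** 2
               for i in range(m) for j in range(n)) / (m * n)


def psnr(original, reconstructed, max_value=255):
    """PSNR in dB, eq. (5); returns infinity when the frames are identical."""
    error = mse(original, reconstructed)
    if error == 0:
        return math.inf
    return 20 * math.log10(max_value / math.sqrt(error))


def average_psnr(original_frames, reconstructed_frames, max_value=255):
    """Mean of the per-frame PSNR values over a sequence of frames."""
    values = [psnr(a, b, max_value) for a, b in zip(original_frames, reconstructed_frames)]
    return sum(values) / len(values)


def compression_ratio(bits_in, bits_out):
    """CR = n1 / n2, eq. (6): input bits divided by compressed bits."""
    return bits_in / bits_out


# Hypothetical 2x2 frames in which two pixels are off by one grey level.
original = [[100, 100], [100, 100]]
decoded = [[101, 99], [100, 100]]
print(mse(original, decoded))                 # 0.5
print(round(psnr(original, decoded), 2))      # 51.14
print(compression_ratio(8_000_000, 320_000))  # 25.0
```

Since PSNR depends on the MSE only through a logarithm, a fixed PSNR gap between two transforms corresponds to a fixed ratio of their MSEs: a gap of about 3 dB means roughly half the squared error.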
From these observations, we found that the Kekre transform provides a high PSNR at the different compression ratios without distorting the quality of the video file.

TABLE I
NUMERICAL RESULTS ON THE TEST VIDEO FRAMES, AVERAGED OVER 50 FRAMES (PSNR IN dB)

Test video   CR    DCT     DWT     DKT
Akiyo        10    32.15   33.70   34.26
             25    31.62   32.25   33.13
             50    29.48   29.10   31.66
Deadline     10    30.01   30.65   31.30
             25    30.00   30.00   30.55
             50    28.34   29.10   30.14
Container    10    32.21   33.18   35.33
             25    30.61   31.06   32.29
             50    28.06   29.06   30.28
Bowing       10    33.72   33.22   37.94
             25    32.48   33.01   37.66
             50    29.38   30.42   34.99

In this methodology, redundancies are exploited in the temporal domain. Some artifacts produced by transform-based compression techniques persist in the accordion representation. Applying transform-based coding to the IACC frames carries the temporal domain over into the spatial domain: the high spatial frequencies of the IACC frames, which are eliminated by quantization, actually represent the high temporal frequencies of the 3D video. Thus, strong quantization does not degrade the quality of the individual images but instead affects the fluency of the video.

V. CONCLUSION
In this paper, a comparative study of orthogonal transforms, namely the Discrete Cosine Transform, the Discrete Walsh Transform and the Discrete Kekre Transform, is carried out for effective video compression. These transforms were applied to the accordion representation of the video file, which converts the 3D video into 2D. The objective of transform-based coding here is to remove temporal redundancy. On the basis of the calculated PSNR and compression ratio, the inference drawn is that the Discrete Kekre Transform (DKT) performs better than the Discrete Cosine Transform (DCT) and the Discrete Walsh Transform (DWT). DKT provided the highest PSNR of all the transforms used.

References
[1] Q. L. X. Zhou and Y. Chen, “Implementation of H.264 decoder on general purpose processors with media instructions”, in SPIE Conf. on Image and Video Communications and Processing, pp. 224-235, 2003.
[2] M. B. T. Q. N. A. Molino and F. Vacca, “Low complexity video codec for mobile video conferencing”, in European Signal Processing Conference, pp. 665-668, 2004.
[3] S. B. Gokturk and A. M. Aaron, “Applying 3D techniques to video for compression”, in Digital Video Processing (EE392J) Projects, Winter Quarter, 2002.
[4] T. Fryza, “Compression of Video Signals by 3D DCT Transform”, Diploma thesis, Institute of Radio Electronics, FEKT Brno University of Technology, Czech Republic, 2002.
[5] G. M. P. Servais, “Video compression using the three dimensional discrete cosine transform”, in Proc. COMSIG, pp. 27-32, 1997.
[6] R. A. Burg, “A 3D-DCT real-time video compression system for low complexity single chip VLSI implementation”, in the Mobile Multimedia Conference (MoMuC), 2000.
[7] Jaya Krishna Sunkara, E. Navaneethasagari, D. Pradeep, E. Naga Chaithanya, D. Pravani and D. V. Sai Sudeer, “A new video compression method using DCT/DWT and SPIHT based on Accordion representation”, International Journal of Image, Graphics and Signal Processing, 2012.
[8] Reny Catherin L., Thirupurasunthari P., Sherley Arcksily Sylvia A., Sravani Kumari G., Joany R. M. and N. M. Nandhitha, “A survey on hybrid image compression techniques for video transmission”, IJECE, 2013.
[9] M. Atheeshwari and K. Mahesh, “Video compression techniques - a comprehensive survey”, IJARCSSE, 2014.
[10] Detlev Marpe, Heiko Schwarz and Thomas Wiegand, “Context-based adaptive binary arithmetic coding in the H.264/AVC video compression standard”, IEEE Transactions on Circuits and Systems for Video Technology, 2003.
[11] Tarek Ouni, Walid Ayedi and Mohamed, “New low complexity DCT based video compression method”, IEEE, 2009.
[12] Gregorio Bernabe, Jose M. Garcia and Jose Gonzalez, “A lossy 3D wavelet transform for high quality compression of medical video”, The Journal of Systems and Software, Elsevier, 2009.
[13] Sudeep D. Thepade and Jaya H. Dewan, “Varying proportions of constituent transforms to generate hybrid wavelet transform for image compression”, in IEEE International Conference on Emerging Trends in Computing, Communication and Nanotechnology, 2013.
[14] H. B. Kekre, Sudeep D. Thepade and Akshay Maloo, “Performance comparison of image retrieval techniques using wavelet pyramids of Walsh, Haar and Kekre transforms”, IJCA, 2010.