Department of Electrical and Computer Engineering
University of Victoria

Wavelet Based Video Compression
ELEC499a – Final Report

Group Members: Scott Chin, Anup Misra
Supervisor: Dr. Wu Sheng Lu
Submission Date: August 1, 2003

Table of Contents

Table of Contents
Table of Figures
1 Introduction
  1.1 Motivation
  1.2 Purpose
  1.3 Project Goals
2 Video Processing Basics
  2.1 Motion Compensated Prediction
    2.1.1 Motion Estimation
    2.1.2 Motion Compensation
  2.2 Transform coding – The Discrete Wavelet Transform
  2.3 Group of Pictures
3 Implementation
  3.1 Encoder and Decoder Implementation
  3.2 DWT Implementation
  3.3 Motion Estimation Algorithms
4 Challenges Faced in Hardware Implementation
5 Results
  5.1 PSNR Measurements
  5.2 Visual Quality
  5.3 Computation Time
6 Recommendations
7 Conclusion
References

Table of Figures

Figure 1. Anchor Frame
Figure 2. Target Frame
Figure 3. Motion Vector Field
Figure 4. Reconstructed Target Frame
Figure 5. Error Frame For Train Example
Figure 6. Example Image Before DWT
Figure 7. Example Image After One Application of DWT
Figure 8. Example Picture Grouping Scheme
Figure 9. Encoder Block Diagram
Figure 10. Decoder Block Diagram
Figure 11. PSNR Comparison Between DCT and DWT
Figure 12. DWT compressed frame
Figure 13. DCT compressed frame

1 Introduction

1.1 Motivation

Digital multimedia has changed the way we view computers. For example, the emergence of digital multimedia has brought DVD players to our home theaters, transformed our computers into media centers, and allowed us to take our media with us on our handheld devices. Our project examines one of the technologies that has been key to this digital revolution: digital video compression. The most recognizable consumer application is DVD video. On a DVD a movie is stored digitally as a string of ones and zeros. Video compression algorithms, such as MPEG, use various signal processing and video processing techniques to reduce the storage space needed to represent a movie while maintaining the highest quality possible.

1.2 Purpose

The main trade-off in any signal compression (e.g. music, image or video) is compression rate versus signal quality. When compression is too high, the signal quality usually degrades to an unacceptable level.
An example would be an MP3 file encoded at a very low bitrate. The file size may be very small, but the music will not sound good. Engineers face the same problem when compressing video signals: if a video is compressed too much, the picture quality becomes unacceptable. To make better compression algorithms, the challenge is to find techniques that achieve good quality at high compression rates. Wavelets are an emerging signal processing technique that fits the bill.

1.3 Project Goals

The goal of this project is to explore the merits of the Discrete Wavelet Transform (DWT) in video compression. The three original goals were to:

1. Design and implement an encoder and decoder in Matlab.
2. Test and compare the performance of the DWT against the Discrete Cosine Transform (DCT).
3. Implement the decoding algorithm on a high-speed DSP chip or alternate dedicated hardware.

However, due to several factors discussed later in this report, goal number three could not be achieved. We therefore replaced it with the following:

3. Implement additional features in the Matlab encoder/decoder to increase the peak signal-to-noise ratio (PSNR).

2 Video Processing Basics

There are three main parts in a standard video compression algorithm:

- Motion Compensated Prediction (MCP)
- Transform Coding (DWT/DCT)
- Group of Pictures (GOP)

2.1 Motion Compensated Prediction

Video sequences show a high degree of correlation from frame to frame. One compression strategy is to take the difference between adjacent frames and store this difference. If the two frames are similar, the difference frame will not contain much information. This technique is known as Predictive Coding and is commonly used in signal compression. However, in video signals there may be large differences between frames due to lighting effects, quick camera movements, and fast scene changes.
If neighboring frames are very different, the difference frame may be large and contain more information than the original frames themselves. To overcome this we use Motion-Compensated Prediction (MCP), a refinement of predictive coding. Rather than finding the difference frame directly, we use the motion of objects in the scene to produce a better prediction. Motion estimation is used to judge the movement of objects in a scene.

2.1.1 Motion Estimation

In general, a scene has multiple moving objects, and the motion of each object can be characterized from frame to frame. For example, in a movie of a car driving across the screen, each frame shows the same car, shifted with respect to the previous frame. This shift can be calculated and characterized by a Motion Vector.

However, determining where all the objects are in a scene is extremely complex. A simple, but non-ideal, solution is to partition each frame into non-overlapping uniform square blocks and characterize the motion of each block. This type of motion estimation is called Block Matching. We assume that each block undergoes translation only, with no scaling or rotation. The blocks in the first frame, called the anchor frame, are compared to the blocks in the second frame, called the target frame. A Motion Vector is then calculated for each block, describing where that block from the anchor frame ends up in the target frame.

Figure 1 and Figure 2 show an example of two frames from a video sequence. Note that the train is moving from right to left, the ball is rolling, the gimbals are spinning, and the camera is slowly panning upwards.

Figure 1. Anchor Frame

Figure 2. Target Frame

Figure 3 shows the motion vectors generated between these two frames.
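As a rough illustration of how such a motion vector field can be computed, the block matching idea can be sketched as follows. This is a minimal sketch in Python, not the Matlab code used in the project. The function names (`sad`, `block_match`), the sum-of-absolute-differences matching criterion, the 4-pixel block size and the 2-pixel search range are all assumptions made for illustration; the report does not state which criterion or parameters were used.

```python
def sad(anchor, target, ay, ax, ty, tx, n):
    """Sum of absolute differences between the n-by-n anchor block at
    (ay, ax) and the n-by-n target block at (ty, tx).
    Assumption: SAD is the matching criterion (not named in the report)."""
    total = 0
    for dy in range(n):
        for dx in range(n):
            total += abs(anchor[ay + dy][ax + dx] - target[ty + dy][tx + dx])
    return total


def block_match(anchor, target, n=4, search=2):
    """Exhaustively search a +/- `search` pixel window for every n-by-n
    anchor block. Returns {(row, col): (dy, dx)}, mapping the top-left
    corner of each anchor block to the displacement of its best match."""
    height, width = len(anchor), len(anchor[0])
    vectors = {}
    for ay in range(0, height - n + 1, n):
        for ax in range(0, width - n + 1, n):
            best, best_cost = None, None
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    ty, tx = ay + dy, ax + dx
                    # Skip candidates that fall outside the target frame.
                    if ty < 0 or tx < 0 or ty + n > height or tx + n > width:
                        continue
                    cost = sad(anchor, target, ay, ax, ty, tx, n)
                    if best_cost is None or cost < best_cost:
                        best_cost, best = cost, (dy, dx)
            vectors[(ay, ax)] = best
    return vectors


# Toy frames: the target is the anchor shifted one pixel to the right.
anchor = [[y * 8 + x for x in range(8)] for y in range(8)]
target = [[row[(x - 1) % 8] for x in range(8)] for row in anchor]
vectors = block_match(anchor, target)
print(vectors[(0, 0)])  # (0, 1): zero rows down, one column right
```

Only translation is modelled, matching the assumption stated above that blocks neither scale nor rotate. Blocks on the right-hand edge of the toy frame have no exact match inside the frame, so their vectors are simply the least-bad candidates.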
Figure 3. Motion Vector Field

Figure 4 shows the target frame predicted from the anchor frame and the motion vectors.

Figure 4. Reconstructed Target Frame

2.1.2 Motion Compensation

When a target frame is reconstructed using the anchor frame and motion vectors, the reconstruction is not perfect. To compensate for these errors, an error frame is generated at the encoder. The error frame is the difference between the actual target frame and the reconstructed target frame. These error frames generally contain little information and compress quite well. By spending the extra time to calculate a set of motion vectors, we generally obtain an error frame with much smaller components than a plain difference frame. Figure 5 shows the error frame generated for the target frame of the same train sequence.

Figure 5. Error Frame For Train Example

2.2 Transform coding – The Discrete Wavelet Transform

Most mainstream encoders use the Discrete Cosine Transform (DCT) to perform transform coding. The DCT maps a time-domain signal to a frequency-domain representation. We can compress the signal by discarding the low-magnitude coefficients of that representation.

However, the DCT has several drawbacks. Computing the DCT over a large signal is expensive, and the cost grows faster than linearly with signal size (a direct N-point DCT needs on the order of N^2 operations). Calculating the DCT of an entire video frame takes an unacceptable amount of time. The usual solution is to partition the frame into small blocks and apply the DCT to each block. However, this leads to a degradation in picture quality.

The Discrete Wavelet Transform (DWT) offers a better solution. The DWT is another transform that maps time-domain signals to a frequency-oriented representation, splitting the signal into frequency subbands.
But the DWT has a distinct advantage: in essence, it can be computed by applying a set of digital filters, which can be done quickly. This allows us to apply the DWT to entire signals without taking a significant performance hit. By analyzing the entire signal, the DWT captures more information than the block-based DCT and can produce better results.

Figure 6 and Figure 7 show one step of the DWT decomposition. The DWT separates the image's high-frequency components from the rest of the image, downsamples the resulting parts, and arranges them to form a new ‘transformed’ image.

Figure 6. Example Image Before DWT

Figure 7. Example Image After One Application of DWT

The image is separated into four subimages. The bottom-left, bottom-right and top-right subimages show the high-frequency detail of the image. The top-left quadrant contains the low-frequency, lower-detail portion of the image, and we can see that most of the information is in this portion. We can achieve compression by removing data in the high-detail areas. As Figure 7 shows, if we retain only the top-left subimage we drop information whose loss does not distort the image in a noticeable fashion.

2.3 Group of Pictures

What is the general structure of a compressed video sequence? In general, there are three different types of frames, called I, P, and B frames.

I frames are essentially the main anchor frames. No motion estimation is performed to generate these frames. They are transform coded directly to ensure a high quality reconstruction, because all following frames are predicted from the I frame and any error in the I frame will propagate through the rest of the group.

P frames are predicted using MCP from the preceding I or P frame. The error frame generated is then transformed and compressed. Both the error frame and the set of motion vectors are stored to file.
B frames are encoded much like P frames, except that the prediction is formed from a combination of a previous I or P frame and a future I or P frame. The two predictions are then averaged to represent the current frame. This is called bi-directional prediction. Prediction from future frames is needed to capture new objects that may appear in the video in the middle of the group of pictures.

Frame ordering is very important to overall picture quality. When predicting frames from previously encoded frames, any errors in the previously encoded frames will degrade the reconstruction of the current frame. This error propagation can be controlled by using a specific frame ordering known as the group of pictures. Figure 8 shows a general group of pictures structure.

Figure 8. Example Picture Grouping Scheme

By using a fixed group of pictures structure and repeating it, we ensure that error propagation is kept within each group. The group should not be too long, so as to limit the amount of error propagation. It is also important to have a good mix of P and B frames. B frames are more costly to calculate, but the bi-directional prediction helps video quality considerably.

3 Implementation

3.1 Encoder and Decoder Implementation

Both the encoder and decoder were implemented in Matlab. Figure 9 and Figure 10 show the general structure of the codec. Our codec was designed so that the type of transform coding could be switched without affecting the rest of the codec. One of the input arguments to our encoder allows the user to specify which type of transform coding to perform, DCT or DWT. This way we were able to test the effect of using the DWT instead of the DCT.

Figure 9. Encoder Block Diagram

Figure 10. Decoder Block Diagram

3.2 DWT Implementation

The wavelet transform was performed using a Daubechies 4 filter with 5 levels of decomposition. The UVi_Wave toolbox was used to implement the wavelet transform and filter generation.

To perform lossy compression, all components of the transformed image with magnitude below a threshold were set to zero. The threshold was chosen by a simple rate control technique: a portion of the video was encoded in three passes, with the threshold adjusted until the desired compression ratio was achieved. This threshold was then used on the remainder of the sequence. This approach is not ideal, but since our test sequences were short and contained only one scene, it was deemed acceptable.

3.3 Motion Estimation Algorithms

Originally only one motion estimation algorithm was implemented: the Exhaustive Block Matching Algorithm (EBMA). This algorithm performs an exhaustive search within a given search range for the best matching block in the target frame. Therefore, it always finds the best match within that range. However, since the search is exhaustive, computation time is relatively long.

To reduce the time spent on quick tests, the Three Step Search algorithm was implemented. This algorithm skips the testing of unlikely candidate blocks. Because it skips candidates, it sometimes discards a good one, so the overall quality is somewhat degraded. It was nevertheless useful when quick tests were required.

As a final attempt to increase the PSNR of the encoded video sequences, a Fractional Exhaustive Block Matching Algorithm was implemented. This is a modification of the EBMA. True motion is not always a whole number of pixels, so to increase the accuracy of the motion vectors a sub-pixel search was required. Half-pixel resolution was tested using the Fractional EBMA.
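A sub-pixel search needs sample values that lie between the original pixels. One common way to obtain them, shown here only as a sketch, is bilinear interpolation onto a grid with twice the resolution. The report does not say which interpolation filter was used, so the bilinear averaging and the function name `upsample_half_pel` are assumptions, and the sketch is written in Python rather than the project's Matlab.

```python
def upsample_half_pel(frame):
    """Bilinearly interpolate a frame onto a half-pixel grid.

    An H x W frame becomes (2H-1) x (2W-1). Output sample (2y, 2x) equals
    input pixel (y, x); odd indices lie halfway between neighbouring pixels.
    Assumption: bilinear averaging (the report names no interpolation filter).
    """
    h, w = len(frame), len(frame[0])
    out = [[0.0] * (2 * w - 1) for _ in range(2 * h - 1)]
    for y in range(2 * h - 1):
        for x in range(2 * w - 1):
            y0, x0 = y // 2, x // 2
            # On odd indices, average with the next pixel along that axis.
            y1 = min(y0 + (y % 2), h - 1)
            x1 = min(x0 + (x % 2), w - 1)
            out[y][x] = (frame[y0][x0] + frame[y0][x1]
                         + frame[y1][x0] + frame[y1][x1]) / 4.0
    return out


half_pel = upsample_half_pel([[0, 10], [20, 30]])
for row in half_pel:
    print(row)
# [0.0, 5.0, 10.0]
# [10.0, 15.0, 20.0]
# [20.0, 25.0, 30.0]
```

A block search can then run on the upsampled frames, with each motion vector expressed in half-pixel units. The same displacement range contains about twice as many candidate positions in each dimension. Every interpolated sample is an average of its neighbours, so the interpolation also acts as a mild low-pass filter on the predicted block.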
Relative to the integer-pixel EBMA, the number of matching operations increased by a factor of 4, in addition to the interpolation operations. This algorithm was implemented because our research suggested that it would yield better picture quality. After testing, it was found that this algorithm did indeed yield improved picture quality to the human eye. However, it gave a decreased PSNR. The reason was determined to be the low-pass filtering effect of the interpolation process. By smoothing out blocking effects, the picture looked better to the viewer. But since the predicted frame is now low-pass filtered, it differs more from the original target frame. This causes the error frame to contain much more data and thus compress poorly. Due to these problems, the integer-pixel EBMA was chosen as the motion estimation algorithm for the encoder and decoder.

4 Challenges Faced in Hardware Implementation

Our original goal was to implement a real-time decoder on a high-speed DSP chip. The implementation would have allowed us to learn more about DSP chips and real-time systems. Unfortunately, due to limited hardware availability we decided that this portion of the project was not feasible within the time limit, and we chose to focus on the software based decoder.

The hardware the technical department has in stock is not suitable for video applications. The Motorola DSP boards are geared towards audio applications. The processor itself is capable of meeting our processing needs, but the data transfer rate between the board and the PC was inadequate. Our hope was to store the encoded video file on our computer and use the DSP chip to perform decoding in real time. Unfortunately, the serial connection between the computer and the development board could not transfer the data to the DSP chip fast enough.

Newer DSP development boards provide USB connections.
A connection of this type would be ideal in applications that involve large amounts of data. A chip that offers floating-point arithmetic would also be desirable, since the filtering operations of the DWT could then be calculated efficiently and accurately.

For more detail on motion estimation algorithms, please refer to the text Video Processing and Communications by Y. Wang, J. Ostermann and Y.-Q. Zhang.

5 Results

The wavelet based codec has several advantages over its DCT counterpart. The PSNR was roughly 3-4 dB higher than that of the equivalent DCT based codec. The distortions introduced into the video also differed in character. Artifacts in the DCT compressed video were mainly blocking effects and green/red speckled colour distortion. The DWT based encoder also showed the colour distortion, but rather than blocking artifacts the video appeared smeared. In some low bit rate cases the distortion in the DWT compressed video was more appealing than the harsher blocking artifacts introduced by the DCT.

5.1 PSNR Measurements

The following plot compares the peak signal-to-noise ratio (PSNR) obtained with the DWT and the DCT. Six GOPs of a video sequence were encoded at a 90% compression ratio. The PSNR was calculated for the luminance component only.

Figure 11. PSNR Comparison Between DCT and DWT

As seen from Figure 11, the PSNR of the DWT encoded sequence is consistently at least 3 dB higher than that of the DCT encoded sequence. Note that each peak corresponds to an I frame in the GOP structure, which has the highest PSNR in its group. However, as errors propagate through the GOP, the PSNR of the DCT frames degrades much faster than that of the DWT frames.
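For reference, PSNR is conventionally computed from the mean squared error (MSE) between the original and decoded frames as PSNR = 10 log10(peak^2 / MSE). The following is a minimal Python sketch of that formula, not the project's Matlab code. The function name `psnr` and the peak value of 255 (8-bit samples) are assumptions, since the report does not state the sample depth.

```python
import math


def psnr(original, decoded, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized frames.
    Assumption: 8-bit samples, so the default peak value is 255."""
    total = 0.0
    count = 0
    for row_o, row_d in zip(original, decoded):
        for o, d in zip(row_o, row_d):
            total += (o - d) ** 2
            count += 1
    mse = total / count
    if mse == 0:
        return math.inf  # identical frames have no measurable noise
    return 10.0 * math.log10(peak * peak / mse)


original = [[100, 100], [100, 100]]
decoded = [[101, 99], [100, 100]]
print(round(psnr(original, decoded), 2))
```

Because the scale is logarithmic, a 3 dB gain in PSNR corresponds to roughly halving the mean squared error, since 10 log10(2) is approximately 3.01.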
The PSNR difference between the I frame and the last B frame in the GOP is only approximately 1 dB for the DWT encoded sequence, but almost 3 dB for the DCT encoded sequence.

5.2 Visual Quality

Figure 12 shows an example frame from one of the test sequences encoded with the DWT based codec. Figure 13 shows the same frame from the DCT based codec. The increased quality can be seen most clearly on the blue and yellow jersey of the football player. The DWT encoded frame is generally smoother and has less green/red distortion.

Figure 12. DWT compressed frame

Figure 13. DCT compressed frame

5.3 Computation Time

Table 1 summarizes the encoding and decoding times for four video sequences. These computation times are per 13-frame GOP, averaged over four GOPs, and are given in minutes. From this table we see that the encoding time for the DWT is longer in every case, by roughly 17 to 78 seconds (0.28 to 1.30 minutes). The decoding time for the DWT is also longer, by roughly 24 to 40 seconds. As a final note, these calculations were executed in Matlab on a 1 GHz desktop PC.

Table 1. Computation Comparison Between DWT and DCT (times in minutes)

            DWT                   DCT
Sequence    Encoding  Decoding    Encoding  Decoding
tempete     10.68     1.04        10.14     0.62
mobile      10.36     0.99         9.66     0.59
edberg      10.72     1.21        10.44     0.55
football    11.32     1.14        10.02     0.65

6 Recommendations

The codec that was implemented contains only the essential components of a video compression algorithm. Commercial codecs such as MPEG-2 have numerous additional features, for example streaming functionality, audio standards, and backwards compatibility. For the codec to become a useful standard, such additional features would need to be implemented.
For future students undertaking a wavelet based video compression project in ELEC499, some areas of focus could include:

- Implement the encoder/decoder in dedicated software
- Make the codec compatible with commercial players such as Microsoft Media Player
- Implement the decoder in hardware
- Research and implement additional features required in a commercial standard

If students wish to implement the algorithm in hardware, it is strongly recommended (based on our experience) to research and obtain the appropriate DSP hardware as early as possible. This should be done within the first week of class or even prior to the start of class.

7 Conclusion

As seen from the results, a DWT based codec can be superior to a DCT based codec. PSNR improved by about 3 dB. The PSNR loss across a GOP due to error propagation was reduced from about 3 dB to about 1 dB. Video quality was also noticeably better to the human eye. Computation time was slightly longer but could be reduced with dedicated hardware or optimized software. We were pleased with the results, and we believe that wavelet based compression will play an integral role in the future of video coding standards.

References

P. Agathoklis, Lecture Notes, 2003
W.-S. Lu, Lecture Notes, 2003
Y. Wang, J. Ostermann and Y.-Q. Zhang, “Video Processing and Communications”, Prentice-Hall, 2002