Final Report - Electrical and Computer Engineering

Department of Electrical and Computer Engineering
University of Victoria
Wavelet Based Video Compression
Final Report
Group Members:
Scott Chin
Anup Misra
Supervisor: Dr. Wu Sheng Lu
Submission Date: August 1, 2003
Wavelet Based Video Compression
ELEC499a – Final Report
Table of Contents
Table of Figures
1 Introduction
1.1 Motivation
1.2 Purpose
1.3 Project Goals
2 Video Processing Basics
2.1 Motion Compensated Prediction
2.1.1 Motion Estimation
2.1.2 Motion Compensation
2.2 Transform Coding – The Discrete Wavelet Transform
2.3 Group of Pictures
3 Implementation
3.1 Encoder and Decoder Implementation
3.2 DWT Implementation
3.3 Motion Estimation Algorithms
4 Challenges Faced in Hardware Implementation
5 Results
5.1 PSNR Measurements
5.2 Visual Quality
5.3 Computation Time
6 Recommendations
7 Conclusion
References
Scott Chin, Anup Misra
Table of Figures
Figure 1. Anchor Frame
Figure 2. Target Frame
Figure 3. Motion Vector Field
Figure 4. Reconstructed Target Frame
Figure 5. Error Frame For Train Example
Figure 6. Example Image Before DWT
Figure 7. Example Image After One Application of DWT
Figure 8. Example Picture Grouping Scheme
Figure 9. Encoder Block Diagram
Figure 10. Decoder Block Diagram
Figure 11. PSNR Comparison Between DCT and DWT
Figure 12. DWT compressed frame
Figure 13. DCT compressed frame
1 Introduction
1.1 Motivation
Digital multimedia has changed the way we view computers. Its emergence has brought DVD
players to our home theaters, transformed our computers into media centers, and allowed us
to carry our media with us on handheld devices. Our project examines one of the technologies
that has been key to this digital revolution: digital video compression.
The most recognizable consumer application is DVD video. On a DVD, a movie is
stored digitally as a string of ones and zeros. Video compression algorithms, such as MPEG,
use various signal processing and video processing techniques to reduce the storage space
needed to represent a movie while maintaining the highest quality possible.
1.2 Purpose
The main trade-off in any signal compression (e.g. music, image or video) is compression
rate versus signal quality. When compression is too high, the signal quality usually degrades
to an unacceptable level. An example would be an mp3 file encoded at a very low bitrate.
The file size may be very small but the music won’t sound good.
Engineers face the same problem when trying to compress video signals. If a video is
compressed too much, the picture quality suffers badly. To make better compression
algorithms, the challenge is to find techniques that can achieve good quality at high
compression rates. Wavelets are an emerging signal processing technique that fits the bill.
1.3 Project Goals
The goal of this project is to explore the merits of the Discrete Wavelet Transform (DWT) in
video compression. The three original goals were to
1. Design and implement an encoder and decoder in Matlab
2. Test and compare the performance of the DWT against the Discrete Cosine Transform (DCT)
3. Implement the decoding algorithm on a high speed DSP chip or alternate dedicated
hardware.
However, due to several factors discussed later in this report, goal number three could not be
achieved. Therefore, we changed goal number three to the following.
3. Implement additional features in the Matlab encoder/decoder to increase the peak
signal-to-noise ratio (PSNR).
2 Video Processing Basics
There are three main parts found in a standard video compression algorithm.
• Motion Compensated Prediction (MCP)
• Transform Coding (DWT/DCT)
• Group of Pictures (GOP)
2.1 Motion Compensated Prediction
Video sequences show a high degree of correlation from frame to frame. One compression
strategy is to take the difference between adjacent frames and store only that difference. If
the two frames are similar, the difference frame will not contain much information. This
technique is known as Predictive Coding and is commonly used in signal compression.
However, in video signals there may be large differences between frames due to lighting
effects, quick camera movements, and fast scene changes. If neighboring frames are very
different, the difference frame may contain more information than the original
frames themselves. To overcome this we use ‘Motion-Compensated’ Prediction.
MCP is a refinement of predictive coding. Rather than finding the difference frame directly,
we can use the motion of objects in the scene to produce a better predictive coding
algorithm. We use motion estimation to judge the movement of objects in a scene.
2.1.1 Motion Estimation
In general, a scene has multiple moving objects, and the motion of each object can be
characterized from frame to frame. For example, in a movie of a car driving across
the screen, each frame shows the same car but shifted with respect to the previous frame.
This shift can be calculated and characterized by a Motion Vector.
However, determining where all the objects are in a scene is extremely complex. A simple,
but non-ideal, solution is to partition each frame into non-overlapping uniform square blocks
and characterize the motion of each block. This type of motion estimation is called Block
Matching. We assume that each block undergoes translation only with no scaling or
rotation. The blocks in the first frame, called the anchor frame, are compared to the blocks
in the second frame, called the target frame. Motion Vectors can then be calculated for each
block to see where each block from the anchor frame ends up in the target frame.
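The block matching step described above can be sketched in a few lines. This is a minimal Python illustration, not the Matlab code used in the project; the function names, the sum of absolute differences (SAD) error measure, and the toy 6x6 frames are assumptions made for the example.

```python
def sad(anchor, target, ax, ay, tx, ty, n):
    """Sum of absolute differences between an n x n anchor block and an n x n target block."""
    total = 0
    for dy in range(n):
        for dx in range(n):
            total += abs(anchor[ay + dy][ax + dx] - target[ty + dy][tx + dx])
    return total


def match_block(anchor, target, ax, ay, n, search_range):
    """Return (mvx, mvy, error) for the anchor block whose top-left corner is (ax, ay)."""
    height, width = len(target), len(target[0])
    best = None
    for mvy in range(-search_range, search_range + 1):
        for mvx in range(-search_range, search_range + 1):
            tx, ty = ax + mvx, ay + mvy
            # Skip candidates that fall outside the target frame.
            if tx < 0 or ty < 0 or tx + n > width or ty + n > height:
                continue
            err = sad(anchor, target, ax, ay, tx, ty, n)
            if best is None or err < best[2]:
                best = (mvx, mvy, err)
    return best


# Toy 6x6 frames (assumed data): a bright 2x2 patch moves one pixel right and one pixel down.
anchor = [[0] * 6 for _ in range(6)]
target = [[0] * 6 for _ in range(6)]
for y in range(2, 4):
    for x in range(2, 4):
        anchor[y][x] = 200
        target[y + 1][x + 1] = 200

print(match_block(anchor, target, 2, 2, 2, 2))
```

For these toy frames the only candidate with zero matching error is the displacement (1, 1), which is the motion vector the search returns.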
Figure 1 and Figure 2 show an example of two frames from a video sequence. Note that the
train is moving from right to left, the ball is rolling, the gimbals are spinning, and the camera
is slowly panning upwards.
Figure 1. Anchor Frame
Figure 2. Target Frame
Figure 3 shows the motion vectors generated between these two frames.
Figure 3. Motion Vector Field
Figure 4 shows the target frame predicted from the anchor frame and the motion vectors.
Figure 4. Reconstructed Target Frame
2.1.2 Motion Compensation
When a target frame is reconstructed using the anchor frame and motion vectors, the
reconstruction is not perfect. In order to compensate for these errors, an error frame is
generated at the encoder. The error frame is the difference between the actual target frame,
and the reconstructed target frame. These error frames are generally very small and
compress quite well. By spending the extra time to calculate a set of motion vectors we
generally ensure an error frame with much smaller components.
Figure 5 shows the error frame generated for the target frame of the same train sequence.
Figure 5. Error Frame For Train Example
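The relationship between the prediction, the error frame, and the decoder's reconstruction can be illustrated with a small Python sketch. It is not the project's Matlab code. As an assumption made for simplicity, the motion vectors here are indexed by target block and point back to the anchor block to copy, which avoids holes and overlaps in the prediction; the report's own vectors may be stored in the opposite direction. All names and the 4x4 frames are hypothetical.

```python
def predict_frame(anchor, vectors, n):
    """Predict the target frame by copying one n x n anchor block per target block.

    vectors maps the top-left corner (x, y) of each target block to the
    displacement (mvx, mvy) of the anchor block it is copied from (assumed layout).
    """
    height, width = len(anchor), len(anchor[0])
    predicted = [[0] * width for _ in range(height)]
    for (bx, by), (mvx, mvy) in vectors.items():
        for dy in range(n):
            for dx in range(n):
                predicted[by + dy][bx + dx] = anchor[by + mvy + dy][bx + mvx + dx]
    return predicted


def error_frame(target, predicted):
    """Encoder side: residual left over after motion compensated prediction."""
    return [[t - p for t, p in zip(t_row, p_row)] for t_row, p_row in zip(target, predicted)]


def reconstruct(predicted, error):
    """Decoder side: add the residual back onto the prediction."""
    return [[p + e for p, e in zip(p_row, e_row)] for p_row, e_row in zip(predicted, error)]


def energy(frame):
    """Sum of absolute values, a rough measure of how much data a frame holds."""
    return sum(abs(v) for row in frame for v in row)


# Toy 4x4 frames with 2x2 blocks (assumed data): a bright block moves from the
# top left to the top right and one of its pixels brightens slightly.
anchor = [[100, 100, 0, 0], [100, 100, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
target = [[0, 0, 104, 100], [0, 0, 100, 100], [0, 0, 0, 0], [0, 0, 0, 0]]
vectors = {(0, 0): (2, 0), (2, 0): (-2, 0), (0, 2): (0, 0), (2, 2): (0, 0)}

predicted = predict_frame(anchor, vectors, 2)
residual = error_frame(target, predicted)
print(energy(error_frame(target, anchor)), energy(residual))
```

In this toy case the plain frame difference has an absolute sum of 804, while the motion compensated error frame has an absolute sum of only 4, and adding the error frame back onto the prediction recovers the target exactly.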
2.2 Transform Coding – The Discrete Wavelet Transform
Most mainstream encoders, such as MPEG, use the Discrete Cosine Transform (DCT) to perform
transform coding. The DCT maps a spatial domain signal to a frequency domain representation.
We can compress the frequency domain spectrum by truncating low intensity regions.
However, the DCT has several drawbacks. Computing the DCT directly is expensive, and the
cost grows rapidly with signal size (on the order of N^2 operations for a direct N-point
transform). Calculating the DCT of an entire video frame therefore takes an unacceptable
amount of time. The usual solution is to partition the frame into small blocks and then
apply the DCT to each block. However, this leads to a degradation in picture quality.
The Discrete Wavelet Transform (DWT) offers a better solution. The DWT is another
transform that maps time domain signals to frequency domain representations, but it has a
distinct advantage: in essence, it can be computed by applying a set of digital filters,
which can be done quickly. This allows us to apply the DWT to entire signals without taking
a significant performance hit. By analyzing the entire signal, the DWT captures more
information than the block-based DCT and can produce better results.
Figure 6 and Figure 7 show one step of the DWT decomposition. The DWT separates the
image’s high frequency components from the rest of the image, resizes the remaining parts
and rearranges them to form a new ‘transformed’ image.
Figure 6. Example Image Before DWT
Figure 7. Example Image After One Application of DWT
The image is separated into four subimages. The bottom-left, bottom-right and top-right
quadrants show the high-frequency detail of the image. The top-left quadrant contains the low
frequency, or lower detail, portion of the image, and we can see that most of the information
is in this portion. We can achieve compression by removing data in the high detail areas. As
you can see, if we retain only the top-left subimage we drop information whose loss does not
distort the image in a noticeable fashion.
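The four-quadrant layout can be reproduced with a very small example. The sketch below is a Python illustration that swaps in the Haar wavelet (pairwise averages and differences) because it is the simplest wavelet filter; the project itself used a Daubechies 4 filter in Matlab. The function names and the 4x4 test image are assumptions, and the code expects even image dimensions.

```python
def haar_1d(signal):
    """One Haar analysis step: pairwise averages (low pass), then pairwise half differences (high pass)."""
    half = len(signal) // 2
    low = [(signal[2 * i] + signal[2 * i + 1]) / 2 for i in range(half)]
    high = [(signal[2 * i] - signal[2 * i + 1]) / 2 for i in range(half)]
    return low + high


def haar_2d(image):
    """Apply haar_1d to every row, then to every column of the result."""
    rows = [haar_1d(row) for row in image]
    cols = [haar_1d(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


# Assumed 4x4 test image: four flat regions plus one slightly brighter pixel.
image = [
    [14, 10, 20, 20],
    [10, 10, 20, 20],
    [30, 30, 40, 40],
    [30, 30, 40, 40],
]
for row in haar_2d(image):
    print(row)
```

The top-left 2x2 quadrant of the output is a half-resolution version of the image, while the other three quadrants hold only the small detail values caused by the one brighter pixel. Setting those detail values to zero changes the image very little, which is the basis of the compression described above.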
2.3 Group of Pictures
A compressed video sequence is built from three different types of frames, called I, P, and
B frames.
I frames are essentially the main anchor frames. No motion estimation is performed to
generate these frames. They are transform coded directly to ensure a high quality
reconstruction. This is because all following frames are predicted from the I frame and any
error in the I frame will propagate through the rest of the group.
P frames are predicted using MCP from the preceding I or P frame. The error frame
generated is then transformed and compressed. Both the error frame and the set of motion
vectors are stored to file.
B frames are encoded much like P frames except that the prediction is done from a
combination of a previous P or I frame and a future P or I frame. The two predictions are
then averaged to represent the current frame. This is called bi-directional prediction. The
prediction relative to future frames is needed to capture new objects that may appear in the
video in the middle of the group of pictures.
Frame ordering is very important to overall picture quality. When predicting frames from
previously encoded frames any errors in the previously encoded frames will degrade the
reconstruction of the current frame. This error propagation can be controlled by using a
specific frame ordering known as the group of pictures. Figure 8 shows a general group of
pictures structure.
Figure 8. Example Picture Grouping Scheme
By using a fixed group of pictures structure and repeating it, we ensure that error
propagation is confined to each group. The group should be kept short enough to limit the
amount of error propagation. It is also important to have a good mix of P and B frames. B
frames are more costly to calculate but the bi-directional prediction helps video quality
immensely.
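The frame types and the order in which they must be coded can be sketched as follows. This is a Python illustration under stated assumptions: the 13 frame group length matches the GOP size quoted in the results, but the spacing of three frames between I/P frames and both function names are hypothetical, since the exact pattern of Figure 8 is not reproduced here.

```python
def gop_pattern(length, anchor_spacing):
    """Frame types in display order: one I frame, then P frames every anchor_spacing frames."""
    types = []
    for i in range(length):
        if i == 0:
            types.append("I")
        elif i % anchor_spacing == 0:
            types.append("P")
        else:
            types.append("B")
    return types


def coding_order(types):
    """Order in which frames are coded so that each B frame follows both frames it is predicted from."""
    order = []
    pending_b = []
    for index, frame_type in enumerate(types):
        if frame_type == "B":
            pending_b.append(index)
        else:
            order.append(index)      # code the I or P frame first
            order.extend(pending_b)  # then the B frames that sit before it
            pending_b = []
    # B frames after the last I/P frame have no future reference in this group;
    # this sketch simply codes them last.
    order.extend(pending_b)
    return order


pattern = gop_pattern(13, 3)
print("".join(pattern))
print(coding_order(pattern))
```

With these assumed parameters the display order is IBBPBBPBBPBBP, and each P frame is coded before the two B frames that precede it in display order because those B frames need it as their future reference.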
3 Implementation
3.1 Encoder and Decoder Implementation
Both the encoder and decoder were implemented in Matlab. Figure 9 and Figure 10 show
the general structure of the codec. Our codec was specially designed so that the type of
transform coding to be performed could be switched without affecting the rest of the codec.
One of the input arguments to our encoder allows the user to specify which type of transform
coding to perform, DCT or DWT. This way we were able to test the effect of using the
DWT over the DCT.
Figure 9. Encoder Block Diagram
Figure 10. Decoder Block Diagram
3.2 DWT Implementation
The wavelet transform was performed using a Daubechies 4 filter with 5 levels of
decomposition. The UVi_Wave toolbox was used to implement the wavelet transform and
filter generation. To perform lossy compression, all components of the transformed image
with magnitude below a threshold were set to zero. The threshold was chosen by a simple rate
control technique: a portion of the video was encoded in three passes and the threshold was
adjusted until the desired compression ratio was achieved. This threshold was then used on
the remainder of the sequence. This approach is not ideal, but because our test sequences
were short and contained only one scene, it was deemed acceptable.
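One way to picture the thresholding and the multi-pass threshold adjustment is the Python sketch below. It is an illustration only, not the project's rate control. It assumes that the fraction of zeroed coefficients stands in for the compression ratio and that the threshold is adjusted by bisection; the report does not state how the adjustment was made. The coefficient list and function names are hypothetical.

```python
def hard_threshold(coeffs, threshold):
    """Set every coefficient whose magnitude is below the threshold to zero."""
    return [c if abs(c) >= threshold else 0 for c in coeffs]


def zero_fraction(coeffs):
    """Fraction of coefficients that are zero (assumed proxy for the compression ratio)."""
    return sum(1 for c in coeffs if c == 0) / len(coeffs)


def tune_threshold(coeffs, target_fraction, passes=3):
    """Bisect on the threshold; return the (threshold, zero fraction) pair closest to the target."""
    low, high = 0.0, max(abs(c) for c in coeffs)
    best = None
    for _ in range(passes):
        threshold = (low + high) / 2
        achieved = zero_fraction(hard_threshold(coeffs, threshold))
        if best is None or abs(achieved - target_fraction) < abs(best[1] - target_fraction):
            best = (threshold, achieved)
        if achieved < target_fraction:
            low = threshold   # too few zeros: raise the threshold
        else:
            high = threshold  # enough zeros: try a lower threshold
    return best


# Assumed wavelet coefficients: a few large values and many small ones.
coeffs = [120, -64, 30, -12, 8, -5, 3, 2, -1, 1, 0.5, -0.5, 0.2, 0.1, 0, 0]
print(tune_threshold(coeffs, 0.75))
print(tune_threshold(coeffs, 0.75, passes=5))
```

With three passes the search settles on a threshold that zeroes slightly more coefficients than the 75% target; allowing five passes reaches the target exactly for this data. This shows why a small fixed number of passes is only an approximate form of rate control.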
3.3 Motion Estimation Algorithms
Originally only one motion estimation algorithm was implemented. This was the Exhaustive
Block Matching Algorithm (EBMA). This algorithm performs an exhaustive search within a
given search range for the best matching block in the target frame. Therefore, it always
finds the best match. However, since it is an exhaustive search, computation time is
relatively long.
To reduce the time spent on quick tests, the Three Step Search algorithm was implemented.
This algorithm skips unlikely candidate blocks. Because it skips candidates, however, it
sometimes discards a good one, so overall quality is somewhat lower than with the EBMA. The
algorithm was nevertheless useful when quick tests were required.
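The coarse-to-fine idea behind the Three Step Search can be sketched as follows. This Python illustration uses the common step sizes of 4, 2 and 1 pixels, which cover displacements of up to 7 pixels; the report does not state its step sizes or search range, so these are assumptions. The `cost` callback and both synthetic error surfaces are hypothetical stand-ins for the block matching error.

```python
def three_step_search(cost, first_step=4):
    """Coarse-to-fine search for the displacement (mvx, mvy) that minimises cost.

    cost(mvx, mvy) is any function returning the matching error of a candidate
    displacement (hypothetical interface; in a codec it would compare two blocks).
    """
    best_x, best_y = 0, 0
    best_err = cost(0, 0)
    evaluated = 1
    step = first_step
    while step >= 1:
        centre_x, centre_y = best_x, best_y
        for oy in (-step, 0, step):
            for ox in (-step, 0, step):
                if ox == 0 and oy == 0:
                    continue  # the centre was already evaluated
                err = cost(centre_x + ox, centre_y + oy)
                evaluated += 1
                if err < best_err:
                    best_x, best_y, best_err = centre_x + ox, centre_y + oy, err
        step //= 2
    return best_x, best_y, best_err, evaluated


def smooth_cost(mvx, mvy):
    # Synthetic error surface that falls off evenly towards its minimum at (3, -2).
    return abs(mvx - 3) + abs(mvy + 2)


def ridge_cost(mvx, mvy):
    # Synthetic error surface with the same minimum but a narrow, slanted valley.
    return abs(100 * (3 - mvx) + (-2 - mvy))


print(three_step_search(smooth_cost))
print(three_step_search(ridge_cost))
print((2 * 7 + 1) ** 2)  # candidates an exhaustive search over +/-7 pixels would test
```

On the smooth surface the search finds the true minimum at (3, -2) after 25 evaluations, compared with the 225 candidates of an exhaustive search over the same range. On the ridge-shaped surface it ends at (3, -5) with a non-zero error, which illustrates how discarding candidates early can miss the best match.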
As a final attempt to increase the PSNR of the encoded video sequences, a Fractional
Exhaustive Block Matching Algorithm was implemented. This is a modification of the
EBMA. True motion is not always a whole number of pixels, so a sub-pixel search was added
to increase the accuracy of the motion vectors.
Half pixel resolution was tested using the Fractional EBMA. The number of search operations
increased by a factor of 4, in addition to the interpolation operations.
This algorithm was implemented because our research indicated that it would
yield better picture quality. After testing, it was found that the algorithm did
indeed yield improved picture quality to the human eye. However, it gave a decreased
PSNR. The reason was determined to be the low pass filtering effect of the interpolation
process. By smoothing out blocking effects, the picture looked better to the viewer. But
since the predicted frame is now low pass filtered, it differs more from the original target
frame. This causes the error frame to contain much more data and thus not compress very
well.
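The interpolation behind a half pixel search, and the growth in the number of candidates, can be illustrated with the Python sketch below. Bilinear interpolation and the search range of 7 pixels are assumptions; the report does not say which interpolation filter or search range was used. The function names and the 2x2 frame are hypothetical.

```python
def upsample_half_pel(frame):
    """Bilinear interpolation onto a grid with half pixel spacing."""
    h, w = len(frame), len(frame[0])
    up = [[0.0] * (2 * w - 1) for _ in range(2 * h - 1)]
    for y in range(2 * h - 1):
        for x in range(2 * w - 1):
            y0, x0 = y // 2, x // 2
            y1, x1 = y0 + y % 2, x0 + x % 2  # neighbour used at half pixel positions
            up[y][x] = (frame[y0][x0] + frame[y0][x1] + frame[y1][x0] + frame[y1][x1]) / 4
    return up


def candidate_count(search_range, steps_per_pixel):
    """Number of displacements tested by an exhaustive search over +/- search_range pixels."""
    per_axis = 2 * search_range * steps_per_pixel + 1
    return per_axis * per_axis


frame = [[0, 100], [100, 200]]
for row in upsample_half_pel(frame):
    print(row)
print(candidate_count(7, 1), candidate_count(7, 2))
```

Every half pixel sample is an average of its neighbours, which is the low pass filtering effect described above. For the assumed range of 7 pixels the number of candidates rises from 225 to 841, close to the factor of 4 quoted in the text.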
Due to these problems, the EBMA was chosen as the motion estimation algorithm for the
encoder and decoder.
4 Challenges Faced in Hardware Implementation
Our original goal was to implement a real time decoder on a high speed DSP chip. The
implementation would have allowed us to learn more about DSP chips and real time systems.
Unfortunately, due to limited hardware availability we decided that this portion of the
project was not feasible within the time limit, and decided to focus on the software based
decoder.
The hardware the technical department has in stock is not suitable for video applications.
The Motorola DSP boards are geared towards audio applications. The processor itself is
capable of meeting our processing needs, but the data transfer rate between the board and
the PC was inadequate. Our hope was to store the encoded video file on our computer and
use the DSP chip to perform decoding in real time. Unfortunately, the serial connection
between the computer and the development board could not transfer the data fast enough to
the DSP chip.
Newer DSP development boards provide USB connections. A connection of this type would
be ideal in applications where there is a large amount of data. A chip that offers
floating point calculations would also be desirable, since the filtering operation of the
DWT could then be calculated efficiently and accurately in floating point format.
For more detail and information on motion estimation algorithms, please refer to the text
Video Processing and Communications by Y. Wang, J. Ostermann and Y.-Q. Zhang.
5 Results
The wavelet based codec has several advantages over its DCT counterpart. The PSNR was
roughly 3-4 dB higher than that of the equivalent DCT based codec. The distortions
introduced into the video also differed. Artifacts in the DCT compressed video were mainly
blocking effects and green/red speckled colour distortion. The DWT based encoder also showed
the colour distortion, but rather than blocking artifacts the video appeared more smeared. In
some low bit rate cases the distortion in the DWT compressed video was more appealing
than the harsher blocking artifacts introduced by the DCT.
5.1 PSNR Measurements
The following plot compares the peak signal-to-noise ratio (PSNR) obtained with the DWT
and the DCT. Six GOPs of a video sequence were encoded at a 90% compression ratio. The
plot was calculated for the luminance component only.
Figure 11. PSNR Comparison Between DCT and DWT
As seen from Figure 11, the PSNR of the DWT encoded sequence is consistently at least 3 dB
higher than that of the DCT encoded sequence. Also note that each peak corresponds to an I
frame in the GOP structure, which has the highest PSNR because it is transform coded
directly. However, as errors propagate through the GOP, the PSNR of the DCT frames degrades
much faster than that of the DWT frames.
The PSNR difference between the I frame and the last B frame in the GOP is only
approximately 1 dB for the DWT encoded sequence, but for the DCT encoded sequence this
difference is almost 3 dB.
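For reference, the PSNR of a decoded frame can be computed as in the Python sketch below, using the standard definition PSNR = 10 log10(peak^2 / MSE). The peak value of 255 assumes 8-bit luminance samples, and the 2x2 frames and function name are hypothetical; this is not the project's Matlab measurement code.

```python
import math


def psnr(original, reconstructed, peak=255):
    """Peak signal-to-noise ratio in dB between two equally sized frames."""
    squared_error = 0
    count = 0
    for o_row, r_row in zip(original, reconstructed):
        for o, r in zip(o_row, r_row):
            squared_error += (o - r) ** 2
            count += 1
    mse = squared_error / count
    if mse == 0:
        return float("inf")  # identical frames
    return 10 * math.log10(peak ** 2 / mse)


# Assumed 2x2 luminance frames with small reconstruction errors.
original = [[100, 100], [100, 100]]
decoded = [[101, 99], [100, 102]]
print(round(psnr(original, decoded), 2))
```

Because PSNR is logarithmic in the mean squared error, a gain of 3 dB corresponds to roughly halving the mean squared error.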
5.2 Visual Quality
Figure 12 shows an example frame from one of the test sequences encoded with the DWT
based codec. Figure 13 shows the same frame from the DCT based codec. The increased
quality can be most clearly seen on the blue and yellow jersey of the football player. The
DWT encoded frame is generally smoother and has less green/red distortion.
Figure 12. DWT compressed frame
Figure 13. DCT compressed frame
5.3 Computation Time
Table 1 summarizes the encoding and decoding times for four video sequences. Note that
these computation times are calculated for each 13 frame GOP, averaged over four GOPs.
The calculation times are in minutes.
From this table we see that the encoding time for DWT is consistently longer, by roughly
0.3 to 1.3 minutes (about 17 to 78 seconds). The decoding time for DWT is also longer, but
only by approximately 30 seconds.
As a final note, these calculations were executed in Matlab on a 1 GHz desktop PC.
Table 1. Computation Comparison Between DWT and DCT (times in minutes)

              DWT                     DCT
Sequence      Encoding   Decoding     Encoding   Decoding
tempete       10.68      1.04         10.14      0.62
mobile        10.36      0.99          9.66      0.59
edberg        10.72      1.21         10.44      0.55
football      11.32      1.14         10.02      0.65
6 Recommendations
The codec that was implemented contains only the essential components of a video
compression algorithm. Commercial codecs such as MPEG2 have numerous additional
features, for example streaming functionality, audio standards, and backwards compatibility.
For our codec to become a useful standard, such additional features would need to be
implemented.
For future students undertaking a wavelet based video compression project in ELEC499,
some areas of focus could include:
• Implement the encoder/decoder through dedicated software
• Implement the codec to be compatible with commercial players such as Microsoft
Media Player
• Implement the decoder in hardware
• Research and implement additional features required in a commercial standard.
If students wish to implement the algorithm in hardware, it is strongly recommended (based
on our experience) to research and obtain the appropriate DSP hardware as early as possible.
This should be done within the first week of class or even prior to the start of class.
7 Conclusion
As seen from the results, a DWT based codec can be superior to a
DCT based codec. PSNR improved by at least 3 dB. The PSNR loss due to error propagation
throughout a GOP was reduced from about 3 dB to about 1 dB. Video quality was also
noticeably better to the human eye. Computation time was slightly longer but could be
reduced with dedicated hardware or software. We were pleased with the results and we believe
that wavelet based compression will play an integral role in the future of video coding
standards.
References
P. Agathoklis, Lecture Notes, 2003.
W.-S. Lu, Lecture Notes, 2003.
Y. Wang, J. Ostermann and Y.-Q. Zhang, “Video Processing and Communications”,
Prentice-Hall, 2002.