VIDEO COMPRESSION AND DECOMPRESSION
USING ADAPTIVE ROOD PATTERN SEARCH
Nirav Shah
B.E., Dharmsinh Desai University, India, 2006
PROJECT
Submitted in partial satisfaction of
the requirements for the degree of
MASTER OF SCIENCE
in
ELECTRICAL AND ELECTRONIC ENGINEERING
at
CALIFORNIA STATE UNIVERSITY, SACRAMENTO
FALL 2010
VIDEO COMPRESSION AND DECOMPRESSION
USING ADAPTIVE ROOD PATTERN SEARCH
A Project
by
Nirav Shah
Approved by:
__________________________________, Committee Chair
Jing Pang, Ph. D.
__________________________________, Second Reader
Preetham Kumar, Ph. D.
____________________________
Date
Student: Nirav Shah
I certify that this student has met the requirements for format contained in the University
format manual, and that this project is suitable for shelving in the Library and credit is to
be awarded for the Project.
__________________________, Graduate Coordinator
Preetham Kumar, Ph. D.
Department of Electrical and Electronic Engineering
________________
Date
Abstract
of
VIDEO COMPRESSION AND DECOMPRESSION
USING ADAPTIVE ROOD PATTERN SEARCH
by
Nirav Shah
Video compression is becoming increasingly important in the electronic world, as growing video usage over the internet increases bandwidth and storage requirements. Pioneering advances in video compression algorithms are therefore important. This project discusses various algorithms that are currently available in the commercial market, along with their advantages and disadvantages. One of them is the H.264 standard. H.264 is a motion-block oriented codec standard developed by ITU-T. The aim of this standard is to provide better video quality with less information transfer.
The final goal of the project was to implement a video encoder and decoder using Matlab. A video captured in RGB format was encoded, with each frame processed by dividing it into several motion blocks. In the encoder part, several motion estimation algorithms were studied. The algorithms were compared with respect to the number of calculations done by each algorithm and the arithmetic complexity involved in them. Peak signal to noise ratio over multiple frames was also calculated for the different algorithms to measure their quality. From the discussed algorithms, the ARPS (Adaptive Rood Pattern Search) algorithm was used in the final encoder. Motion vectors generated by ARPS were given to motion compensation to generate the compensated image. The compensated image was transformed using the DCT (Discrete Cosine Transform). Finally, the transformed vectors were quantized and encoded using RLE (run-length encoding).
The encoded video stream was successfully decoded by a decoder following the reverse process to regenerate the video in its original format.
_______________________, Committee Chair
Jing Pang, Ph. D.
_______________________
Date
ACKNOWLEDGMENTS
Before discussing the project, I would like to offer some heartfelt words to those who motivated and helped me to complete this project successfully. I am thankful to Dr. Jing Pang for providing me an opportunity to work on this project, which was a great exposure to the field of video processing. I thank her for providing all the resources, help and guidance needed to successfully finish the project. Her knowledge and expertise in the field were very helpful for me in understanding the project and finishing it successfully. Without her guidance and help this project would not have been completed.
I would like to thank Dr. Preetham Kumar for reviewing my report and providing valuable suggestions that helped me to improve my project report. I would like to thank my family members for providing me strength and inspiration during the critical phases of this project over the last year. Finally, I would like to thank all the faculty members of the Department of Electrical and Electronic Engineering at California State University, Sacramento for their help and support in completing my graduation successfully.
TABLE OF CONTENTS
Acknowledgments
List of Tables
List of Figures
Chapter
1. INTRODUCTION
   1.1 Purpose of the Project
   1.2 Significance of the Project
   1.3 Organization of Report
2. BASIC COMPONENTS OF H.264 ENCODER AND DECODER
   2.1 Types of Redundancy
   2.2 Encoder Module
      2.2.1 Intra Frame Prediction
      2.2.2 Inter Frame Prediction
      2.2.3 Motion Compensation
      2.2.4 Residual Frame
      2.2.5 Residual Frame Encoding
   2.3 Decoder Module
3. IMAGE COMPRESSION COMPONENTS
   3.1 Transformation
   3.2 Quantization
   3.3 Entropy Encoding and Decoding
4. INTER FRAME PREDICTION BASICS
5. INTER FRAME PREDICTION ALGORITHMS
   5.1 Exhaustive Search
   5.2 Three Step Search
   5.3 Diamond Search
   5.4 Adaptive Rood Pattern Search
6. SIMULATION RESULTS
7. CONCLUSION
References
LIST OF TABLES
1. Table 3-1: 8x8 Discrete Cosine Transform Matrix
2. Table 3-2: 8x8 Pixels of Block
3. Table 3-3: DCT of the Block Shown in Table 3-2
4. Table 3-4: Quantization Matrix Example
5. Table 3-5: DCT Matrix Before Quantization
6. Table 3-6: De-quantized DCT Matrix
7. Table 5-1: Average Cost in ES, TSS and NTSS
8. Table 6-1: Frames Taken as Reference
9. Table 6-2: Average Number of Search Points per MV Generation
LIST OF FIGURES
1. Figure 2-1: Spatial Redundancy Example
2. Figure 2-2: Temporal Redundancy
3. Figure 2-3: Difference Between Consecutive Frames
4. Figure 2-4: Transform Distribution of the Image Shown in Figure 2-1
5. Figure 2-5: Block Diagram of H.264 Encoder
6. Figure 2-6: Intra Frame Prediction Block Diagram
7. Figure 2-7: Prediction Using One Reference Frame
8. Figure 2-8: Prediction Using Two Reference Frames
9. Figure 2-9: Compensated Frame
10. Figure 2-10: Residual Frame
11. Figure 2-11: Residual Frame 3D Mesh Plot
12. Figure 2-12: Block Diagram of H.264 Decoder
13. Figure 3-1: Image Compression Components
14. Figure 4-1: A 16x16 Macro Block Breaking
15. Figure 4-2: Area of Search for a Macro Block
16. Figure 4-3: Motion Vector for a Macro Block
17. Figure 5-1: Searching Block and Current Macro Block
18. Figure 5-2: Basic Three Step Search
19. Figure 5-3: New Three Step Search
20. Figure 5-4: LDSP and SDSP
21. Figure 5-5: Example of Diamond Search
22. Figure 5-6: Matching Error Surface
23. Figure 5-7: Types of Prediction of Motion
24. Figure 5-8: First Step of ARPS
25. Figure 6-1: Comparison of PSNR (dB) for 20 Frames Between ES and ARPS
26. Figure 6-2: Pseudo Block Diagram of the Video Encoder
Chapter 1
INTRODUCTION
Usage of HD (High Definition) video is increasing day by day in applications like television, video over the internet, gaming and video surveillance [1]. Considerable storage space is required to store such information, and the computational requirement to process such large amounts of data is also very high. Pervasive, seamless, high-quality digital video has been the goal of companies, researchers and standards bodies. Areas like television and consumer video storage have already captured a huge share of the consumer electronics market. Applications like videoconferencing, video email, and mobile video are also growing day by day, demanding intense video processing.
Getting digital video from its source (a camera or a stored clip) to its destination
(a display) involves a chain of components or processes. Key to this chain are the
processes of compression (encoding) and decompression (decoding), in which
bandwidth-intensive ‘raw’ digital video is reduced to a manageable size for transmission
or storage, then reconstructed for display. Getting the compression and decompression
processes ‘right’ can give a significant technical and commercial edge to a product, by
providing better image quality, greater reliability and/or more flexibility than competing
solutions.
The challenge is the adoption of a common process that can be used by a variety of audiences. Two major groups are actively involved in providing standards for video compression: the Moving Picture Experts Group (MPEG) and the Video Coding Experts Group (VCEG). MPEG and VCEG have developed a new standard that promises to outperform the earlier MPEG-4 and H.263 standards, providing better compression of video images. The new standard is entitled 'Advanced Video Coding' (AVC) and is published jointly as Part 10 of MPEG-4 and ITU-T Recommendation H.264 [2].
1.1 Purpose of the Project
A video stream needs to be processed through several steps in order to encode and decode the video such that it is compressed efficiently with the limited hardware and software resources available. Each step can be implemented with different algorithms to accomplish the required task. All advantages and disadvantages of the available algorithms should be known to implement a codec that accomplishes the final requirement. The purpose of this project is to implement all basic building blocks of an H.264 video encoder and decoder.
1.2 Significance of the Project
The significance of the project is the inclusion of all components required to encode and decode a video in Matlab. This project contains several algorithms for inter frame prediction. Inter frame prediction predicts the position of a macro block within the current frame by taking past and future frames as reference. Along with predicting a macro block, the challenge is processing the huge amount of data in each frame. That is because the usage of high definition video is increasing widely. High definition videos are captured at rates of several frames per second, and such videos have a large number of macro blocks in each frame. Components designed in Matlab for the project will be helpful when implementing them on hardware, giving information on factors like complexity and performance.
1.3 Organization of Report
Chapter two contains brief information on all basic components of a video encoder and decoder. First, the top-level diagram of the encoder module is explained, including the use of inter frame prediction, intra frame prediction and motion compensation inside the encoder to encode video information. Then the top-level diagram of the decoder module is explained, including each component inside the decoder used to decode video information.
The residual image is compressed with image compression concepts. Chapter three contains information on the components involved in image compression and decompression. In it, the discrete cosine transform is explained first, which transforms a given image into the frequency domain. Because of that, frequencies containing more information and less information are separated.
Chapter four contains basic information on inter frame prediction. Using inter frame prediction, the location of a macro block in the current frame is matched with a macro block of a past or future frame. Each macro block in the current frame is compared with several macro blocks of a past or future frame to find the best match. Because of that, inter frame prediction is the most computational part of video encoding.
Various algorithms used for inter frame prediction are explained in Chapter five. The exhaustive search algorithm is a very basic inter frame prediction algorithm. Implementing this algorithm is simple and the PSNR achieved by it is very high, but the amount of calculation required is very large. Later, the adaptive rood pattern search algorithm is explained. It has a very low computational requirement compared to the other algorithms, yet the PSNR achieved by it is very close to that of the exhaustive search algorithm.
Chapter six contains results of simulations done using adaptive rood pattern search and diamond search. For the comparison, the computational complexity involved in each algorithm and the peak signal to noise ratio were compared.
Chapter 2
BASIC COMPONENTS OF H.264 ENCODER AND DECODER
The intention of video compression algorithms is to remove different types of redundant information from a video. Video compression algorithms operate by removing redundancy in the temporal, spatial and/or frequency domains. Algorithms are implemented in such a way that redundant information is removed before sending the video information. Because of that, the amount of space required to save a video is greatly reduced. On the decoder side, with the help of the algorithms used, the redundant information is reconstructed from the available compressed data.
2.1 Types of Redundancy
Spatial redundancy refers to redundancy available within a picture. Notice the spatial redundancy in the following figure [4]:
Figure 2-1: Spatial redundancy example
In the figure above, notice the area surrounded by the black border. In this area, there is not a significant change in information. In this case, size can be saved by sending information for a very small part of the block and then indicating that the same information should be used for the remaining area inside the black border.
Temporal redundancy refers to redundancy between multiple frames. Multiple frames are captured each second to form a video. In a normal video capture, objects are not moving rapidly, and not all objects in a frame move between frames located near each other. The following figure shows two consecutive frames:
Figure 2-2: Temporal redundancy
The two figures above are consecutive frames. Notice that there is not much difference between them; in other words, almost all objects are steady between the two frames. In this case, while sending the second frame we can just send the related information compared to the previous frame and save a huge amount of information that needs to be transferred or saved.
The following figure shows the difference between the above two frames computed using Matlab. Notice that there is very little movement between the two frames. With the help of the inter frame prediction technique described in Chapter 4, the movement-related information is transferred using a motion vector for each macro block. On the receiver end, with the help of a reference frame and the motion vector of each macro block, the other frame is reconstructed. By sending only motion vector related information, a large amount of the data that needs to be transferred or saved for a frame is saved.
Figure 2-3: Difference between consecutive frames
The human eye and brain (the Human Visual System) are more sensitive to lower frequencies [3]. Because of that, even if we remove high frequency information, an image is still recognizable. The following figure shows the discrete cosine transform of the frame shown in Figure 2-1:
Figure 2-4: Transform distribution of the image shown in Figure 2-1
Notice that the image is 190x256 pixels. After the transform, the information is concentrated towards the center. Because of this concentration towards the lower frequencies, higher frequency information is removed from a frame during compression to achieve higher compression.
In video compression, by removing different types of redundancy (spatial,
frequency and/or temporal) it is possible to compress the data significantly at the expense
of a certain amount of information loss (distortion). Further compression can be achieved
by encoding the processed data using an entropy coding scheme such as Huffman coding
or Arithmetic coding.
2.2 Encoder Module
An H.264 video encoder mainly comprises inter frame prediction, motion compensation, intra frame prediction, discrete cosine transform, quantization and entropy encoding [1]. Below is a block diagram showing how all the components connect with each other:
9
Fn
(Current)
T and Q
Entropy
Encoder
Encoded
Stream
Inter
Prediction
MC
F’n-1
(Reference)
Choose
Intra
Prediction
Fn’
Intra
Prediction
Filter
Inverse T
and Q
Figure 2-5: Block diagram of H.264 encoder
As shown in the figure, intra frame prediction is done in almost the same way as image compression. Notice the feedback path in the figure for intra prediction. The feedback is to prevent a difference arising at the decoder side compared to the encoder side [5]. Intra predicted frames are known as I-frames.
Inter frame prediction is done based on the current frame and a reference frame. The reference frame can be a past frame or a future frame. Inter frame prediction can be done by taking one or more frames as reference. Based on that, an inter predicted frame can be a P-frame or a B-frame. If a frame is predicted from one reference frame, the predicted frame is known as a P-frame. On the other hand, if a frame is predicted from more than one reference frame, the predicted frame is known as a B-frame. Inter prediction gives information about the motion vector (MV) for each macro block (MB). The motion vector for a macro block gives the location of the macro block in the current frame with respect to its location in the reference frame.
The motion compensation block generates a compensated image by taking the motion vector information of the current frame and the reference frame(s) as input. The frame generated by this block is almost the same as the current frame. That is because the motion vector for the current frame tells the motion compensation block from which position in the reference frame to take a macro block to put in the current frame.
The motion compensated image is not exactly the same as the current frame. Because of that, a residual image is generated from the current frame and the compensated image. The following example shows how the residual image helps to regenerate the current frame at the decoder end, given only the MVs of the current frame and the reference frame:
Cur_fr = Current frame
Ref_fr = Reference frame
Comp_fr = Compensated frame
Res_fr = Residual frame
MV= fun (Cur_fr, Ref_fr)
Comp_fr = fun (MV, Ref_fr)
Res_fr = Cur_fr – Comp_fr
Notice that the decoder already has Ref_fr and MV. Based on that information it can generate Comp_fr. Ref_fr is extracted at the decoder end after passing the input through the inverse discrete cosine transform and entropy decoding stages. From the residual frame and the compensated frame, the current frame is generated in the following way:
Cur_fr = Res_fr + Comp_fr
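As a minimal Matlab sketch of this relationship (assuming cur_fr and ref_fr are same-sized grayscale frames stored as doubles, and that motionEstimate and motionCompensate are illustrative stand-ins for the project's ARPS and motion compensation functions):
mv      = motionEstimate(cur_fr, ref_fr);   % motion vector per macro block
comp_fr = motionCompensate(mv, ref_fr);     % compensated frame
res_fr  = cur_fr - comp_fr;                 % residual sent to the decoder
% Decoder side: rebuild the compensated frame, then add the residual back.
cur_fr_dec = res_fr + motionCompensate(mv, ref_fr);
% cur_fr_dec equals cur_fr exactly here, since no quantization loss is modeled.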
Notice that prior to sending the residual frame, it is discrete cosine transformed, quantized and entropy encoded. These stages are similar to JPEG compression. By transforming a frame, low frequency information and high frequency information are separated from each other. Quantization allows more low frequency information to pass compared to high frequency information. Finally, the M x N frame is converted to a serial stream by a zigzag pattern and entropy encoded.
2.2.1 Intra Frame Prediction
Intra frame prediction refers to prediction of the position of a macro block with respect to the position of another macro block within the same frame. During intra prediction mode selection, the prediction cost is computed and compared, and the mode with the least cost is selected in video coding [5]. The following is a block diagram showing the major components involved in intra frame prediction:
[Block diagram: the current frame Fn feeds intra prediction for search and a mode decision; the chosen intra prediction for the coder is combined with Fn, passed through T and Q to the entropy encoder (encoded stream), and through inverse T and Q to build the reconstructed frame used for prediction.]
Figure 2-6: Intra frame prediction block diagram
Intra frame prediction is done based on mode selection. Nine modes are defined for intra frame prediction in H.264. In practice, these intra prediction modes take the correlation between two macro blocks into consideration. During intra prediction mode selection, the prediction cost is computed and compared, and the mode with the least cost is selected.
2.2.2 Inter Frame Prediction
Inter frame prediction refers to prediction of the position of a macro block with respect to the position of another macro block in a reference frame. There can be one or more reference frames for predicting the position of a macro block of the current frame. Based on that, an inter frame predicted frame can be a P-frame or a B-frame. If only one frame is used as reference, the predicted frame is known as a P-frame; otherwise it is known as a B-frame. The following figure shows an example of prediction done using a single reference frame:
[Figure: a reference frame and the current frame, with the same object displaced between them.]
Figure 2-7: Prediction using one reference frame
Notice that the object in the reference frame has simply moved to another position in the current frame. In inter prediction, only the displacement information is passed for the current frame, which is known as a P-frame.
Figure 2-8: Prediction using two reference frames
As shown in the figure above, the light blue object has moved less with respect to the past frame than with respect to the future frame. Because of that, the prediction of the position of the blue object is done with respect to its position in the past frame. The square object has moved less with respect to the future frame than with respect to the past frame. Because of that, the prediction of the square object is done with respect to its position in the future frame. As the current frame is predicted from a past frame as well as a future frame, the predicted frame is known as a B-frame.
A frame is divided into macro blocks. After that, each macro block is searched for in a nearby search area of the reference frame. The computation required to do this prediction is very large, and several algorithms have been developed to reduce the required computations. Chapter 5 describes the algorithms used to do inter frame prediction.
2.2.3 Motion Compensation
In motion compensation, a compensated image is generated by taking the reference image and the motion vectors of the current frame as input [6]. The motion compensation block has motion vector information for each macro block. The motion vector gives information about where in the reference frame to take a macro block from for the current frame. The following example shows how a macro block for the current frame is taken from the reference frame with the help of a motion vector:
MV(i, j) = (p, q)    (2.1)
MB_CF(i, j) = MB_RF(i + p, j + q)    (2.2)
where MV(i, j) is the motion vector of the macro block at (i, j), MB_RF is the macro block in the reference frame and MB_CF is the macro block in the current frame.
Figure 2-9: Compensated frame
The figure above was generated from the motion vectors of the current frame and the reference frame. Notice that the compensated frame almost matches the current frame. The difference between the compensated frame and the current frame is transmitted so that the decoder can regenerate the current frame with the help of the motion vectors for the frame and the reference frame.
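The per-block copy of equation (2.2) can be sketched in Matlab as follows (an illustrative function, not the project's exact code, assuming the block size divides the frame size and every displaced block stays inside the reference frame):
function comp = compensate(ref, mv, mbSize)
% Build the compensated frame by copying, for every macro block, the
% reference-frame block displaced by that block's motion vector (eq. 2.2).
% mv is assumed to be of size [rows/mbSize, cols/mbSize, 2].
[h, w] = size(ref);
comp = zeros(h, w);
for i = 1:mbSize:h
    for j = 1:mbSize:w
        bi = (i - 1)/mbSize + 1;   % block row index
        bj = (j - 1)/mbSize + 1;   % block column index
        p = mv(bi, bj, 1);         % vertical displacement
        q = mv(bi, bj, 2);         % horizontal displacement
        comp(i:i+mbSize-1, j:j+mbSize-1) = ...
            ref(i+p:i+p+mbSize-1, j+q:j+q+mbSize-1);
    end
end
end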
2.2.4 Residual Frame
The compensated image is not exactly equal to the current frame. Because of that, the compensated image is subtracted from the current frame and the difference is sent to the decoder. The decoder already has the compensated frame information because of the motion vectors and the reference frame. The decoder adds the residual frame to the compensated frame to regenerate the current frame. The following figure shows a residual frame:
Figure 2-10: Residual Frame
Notice that the difference between the compensated frame and the current frame is very small. In the figure above, the black area indicates no difference between the compensated frame and the current frame.
The following is a 3D mesh plot of the residual frame, giving more information about the difference between the compensated frame and the current frame:
Figure 2-11: Residual Frame 3D mesh plot
In the figure above, notice that throughout the frame the difference between the compensated frame and the current frame is negligible. Because of this small difference throughout the frame, huge spatial redundancy is available in the residual frame.
2.2.5 Residual Frame Encoding
The residual frame is in M x N format. As the residual frame contains the difference between the compensated frame and the current frame, it has huge spatial redundancy. This redundancy is removed in the frequency domain by taking the discrete cosine transform of the residual frame. Finally, to transmit or save the encoded data, it is converted to a serial stream and entropy encoded.
2.3 Decoder Module
The decoder regenerates the residual frame by entropy decoding the input serial stream and applying the inverse discrete cosine transform. The decoder also receives motion vector information for each macro block of the current frame. Based on the available information, the decoder is able to generate the current frame at the decoder side. The following figure shows the basic building components of a decoder module. Notice that, except for the motion estimation component, the decoder has all the same components as the encoder module. Because there is no motion estimation part in the decoder, implementation of a decoder is relatively simple compared to the encoder module.
[Block diagram: the input stream is entropy decoded, passed through inverse T and Q, combined with either motion compensation (MC) from the reference frame F'n-1 or intra prediction, and filtered to produce the reconstructed frame F'n.]
Figure 2-12: Block diagram of H.264 decoder
The residual frame, in the form of a serial stream, is received as the input to the decoder. As shown in the block diagram, the input stream is first entropy decoded. By doing that, the serial redundancy removed while sending the information is recovered.
Similar to entropy coding, entropy decoding can also be considered a 2-step process. The first step converts the input bit stream into intermediate symbols. The second step converts the intermediate symbols into the quantized DCT coefficients.
The quantized DCT coefficient information is de-quantized with the same algorithm used to quantize the DCT coefficients. The de-quantized DCT coefficients are passed through the inverse discrete cosine transform to get the residual image back. The process is similar to the decompression method used to decompress an image, and is described in more detail in Chapter 3.
If a received frame is a P-frame or a B-frame, the compensated frame is generated with the help of the motion vectors associated with the frame, taking a past frame as reference. The compensated frame is added to the residual frame to reconstruct the current frame. If a received frame is an I-frame, the frame is reconstructed with the help of the intra frame prediction component.
Chapter 3
IMAGE COMPRESSION COMPONENTS
In the block diagram of the H.264 video encoder shown in Figure 2-5, notice that the components towards the end match the components involved in image compression. These components are re-drawn in the following figure in more detail:
[Block diagram: the input frame is broken into 8x8 blocks; each block goes through the DCT, quantization, zigzag scanning and entropy encoding to produce the output serial stream.]
Figure 3-1: Image compression components
As shown in the figure above, the input image is first divided into 8x8 blocks. The discrete cosine transform is applied to each block. By doing that, spatial domain information is transferred into the frequency domain. In this frequency domain representation, low frequency information is separated from high frequency information.
An 8x8 quantization matrix is formed to quantize an 8x8 discrete cosine transformed block. Human eyes are more sensitive to low frequency components compared to high frequency components [3]. Because of that, the quantization matrix is implemented in such a way that low frequency components are quantized less than high frequency components.
By zigzag patterning, each 8x8 block is transferred to a serial stream from the lowest frequency information to the highest frequency components. Finally, the serial data is entropy encoded to compress the serial stream without losing any information.
3.1 Transformation
A discrete cosine transform (DCT) is a Fourier-related transform. It is similar to the discrete Fourier transform but without imaginary numbers. In a DCT, a sequence of numbers is represented in terms of a sum of cosine functions [7]. There are different types of discrete cosine transform; of them, the type-II DCT is used in applications like image and video compression [1]. The DCT of an NxN block can be expressed as follows:
Y = AXB
In the above equation, A is the discrete cosine transform matrix, X is the block of samples of which we want to take the discrete cosine transform, B is the transpose of matrix A, and Y is the discrete cosine transformed matrix. The inverse DCT of an NxN block can be expressed as follows:
X = BYA
Notice that the same discrete cosine transform matrix is used for both the discrete cosine transform and its inverse. This matrix is a constant which is shared by all 8x8 blocks of a frame. The elements of the discrete cosine transform matrix are given by the following equation:
A_ij = c_i cos((2j + 1) i π / 2N)    (3.1)
where
c_0 = sqrt(1/N)  and  c_i = sqrt(2/N) for i > 0    (3.2)
with i, j = 0, 1, ..., N-1. The following is the discrete cosine transform matrix used for an 8x8 pixel block:
 0.353553  0.353553  0.353553  0.353553  0.353553  0.353553  0.353553  0.353553
 0.490393  0.415818  0.277992  0.097887 -0.097106 -0.277329 -0.415375 -0.490246
 0.461978  0.191618 -0.190882 -0.461673 -0.462282 -0.192353  0.190145  0.461366
 0.414818 -0.097106 -0.490246 -0.278653  0.276667  0.490710  0.099448 -0.414486
 0.353694 -0.353131 -0.354256  0.352567  0.354819 -0.352001 -0.355378  0.351435
 0.277992 -0.490246  0.096324  0.416700 -0.414486 -0.100228  0.491013 -0.274673
 0.191618 -0.462282  0.461366 -0.189409 -0.193822  0.463187 -0.460440  0.187195
 0.097887 -0.278653  0.416700 -0.490862  0.489771 -0.413593  0.274008 -0.092414
Table 3-1: 8x8 discrete cosine transform matrix
The set of 64 numbers above is created by multiplying a horizontally oriented set of one-dimensional 8-point cosine basis functions by a vertically oriented set of the same functions. In the table above, horizontal frequencies are represented by the horizontal set of coefficients while vertical frequencies are represented by the vertical set of coefficients. The following tables show a block of 8x8 pixels and its discrete cosine transform, respectively:
140 144 147 140 140 155 179 175
144 152 140 147 140 148 167 179
152 155 136 167 163 162 152 172
168 145 156 160 152 155 136 160
162 148 156 148 140 136 147 162
147 167 140 155 155 145 147 162
136 156 123 167 162 144 140 147
148 155 136 155 152 147 147 136
Table 3-2: 8x8 pixels of block
186 -18  15  -9  23  -9 -14  19
 21 -34  26  -9 -11  11  14   7
-10 -24  -2   6 -18   3 -20  -1
 -8  -5  14 -15  -8  -3  -3   8
 -3  10   8   1 -11  18  18  15
  4  -2 -18   8   8  -4   1  -7
  9   1  -3   4  -1  -7  -1  -2
  0  -8  -2   2   1   4  -6   0
Table 3-3: DCT of the block shown in Table 3-2
In the transformed matrix, low frequency information is concentrated at the top left of the matrix. Notice how the information is concentrated there in the table above.
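Equation (3.1) and the transform Y = AXB translate directly into Matlab. The following sketch builds the matrix from scratch (dctmtx(8) from the Image Processing Toolbox would return the same values as Table 3-1):
N = 8;
% Build the 8x8 type-II DCT matrix from equation (3.1).
A = zeros(N, N);
for i = 0:N-1
    for j = 0:N-1
        if i == 0
            c = sqrt(1/N);
        else
            c = sqrt(2/N);
        end
        A(i+1, j+1) = c * cos((2*j + 1) * i * pi / (2*N));
    end
end
X  = double(block);   % an 8x8 pixel block such as Table 3-2
Y  = A * X * A';      % forward DCT of the block (Y = AXB with B = A')
Xr = A' * Y * A;      % inverse DCT recovers the block (X = BYA)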
3.2 Quantization
Each element of an 8x8 block of pixels occupies 8 bits of memory. Each element of the discrete cosine transformed matrix occupies more memory than in the original block. That is because the value of an element within the discrete cosine transformed matrix can range from a low of -1024 to a high of 1023, so 11 bits of memory are required to store each element.
Human eyes are more sensitive to lower frequencies compared to higher frequencies [3]. A huge amount of memory can be saved by quantizing the high frequency information in a discrete cosine transformed matrix. A quantization matrix is implemented in such a way that low frequency information is quantized less than high frequency information. By quantizing, the number of bits required to save an element is reduced without compromising much on distortion in the lower frequency components. The following is an example of a quantization matrix:
 3  5  7  9 11 13 15 17
 5  7  9 11 13 15 17 19
 7  9 11 13 15 17 19 21
 9 11 13 15 17 19 21 23
11 13 15 17 19 21 23 25
13 15 17 19 21 23 25 27
15 17 19 21 23 25 27 29
17 19 21 23 25 27 29 31
Table 3-4: Quantization matrix example
Notice that with a larger quantization value, a larger error can be generated in the DCT output during de-quantization. As errors generated in the high frequency components have less effect on human eyes, an almost identical frame is seen after de-quantization and the inverse discrete cosine transform of a block. The following tables show a DCT matrix before quantization and after de-quantization.
92 -39 -84 -52 -86 -62 -17 -54
 3 -58  62 -36 -40  65  14  32
-9  12   1 -10  49 -12 -36  -9
-7  17 -18  14  -7  -2  17  -9
 3  -2   3 -10  17   3 -11  22
-1   2   4   4  -6  -8   3   0
 0   4  -5  -2  -2  -2   3   1
 2   2   5   0   5   0  -1   3
Table 3-5: DCT matrix before quantization
90 -35 -84 -45 -77 -52 -15 -51
 0 -56  54 -33 -39  60   0  19
-7  -9   0   0  45   0 -19   0
 0  11 -13   0   0   0   0   0
 0   0   0   0   0   0   0   0
 0   0   0   0   0   0   0   0
 0   0   0   0   0   0   0   0
 0   0   0   0   0   0   0   0
Table 3-6: De-quantized DCT matrix
Even though there is a difference between the DCT matrix before and after quantization, this difference is not visible to human eyes. Also, notice that the discrete cosine transformed matrix is divided by a number that occupies 4 bits. Because of that, only seven bits are required to save each element of the quantized DCT matrix, compared to the eleven bits required to save each element of the un-quantized DCT matrix.
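A minimal Matlab sketch of this round trip, assuming Y is an 8x8 DCT block such as Table 3-5 and Q is the quantization matrix of Table 3-4 (fix() truncates toward zero, which reproduces the values in Table 3-6):
% Quantize: divide element-wise and truncate toward zero.
Yq = fix(Y ./ Q);   % transmitted coefficients (fewer bits per element)
% De-quantize at the decoder: multiply back by the same matrix.
Yd = Yq .* Q;       % e.g. 92 -> fix(92/3)*3 = 90, as in Table 3-6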
3.3 Entropy Encoding and Decoding
Entropy encoding refers to compressing information without losing any information from the source. In H.264, two types of entropy encoding are used: context-based adaptive variable length coding (CAVLC) and context-based adaptive binary arithmetic coding (CABAC).
In context-based adaptive variable length coding, the input 4x4 block is first converted to a serial stream using zigzag patterning, as sketched below. The serial data is encoded in the form of different parameters [6]. To send each type of parameter, the input serial stream is passed through an algorithm to decide the parameter value. The following parameters are transmitted to represent the serial information in a decreased number of bits:
1. Number of nonzero coefficients (numCoef) and trailing ones (T1)
2. The pattern of trailing ones (T1)
3. The non-zero coefficients (Levels)
4. Number of zeros embedded in the non-zero coefficients (Total Zeros)
5. The location of those embedded zeros (run_before)
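The zigzag conversion that precedes this parameter extraction can be sketched for an NxN block as follows (an illustrative version; the exact scan-order convention varies slightly between codecs):
function s = zigzag(B)
% Serialize an NxN block from lowest to highest frequency by walking
% the anti-diagonals (i + j constant), alternating direction on each one.
N = size(B, 1);
s = zeros(1, N*N);
k = 1;
for d = 0:2*N-2
    if mod(d, 2) == 0
        i = min(d, N-1); j = d - i;    % walk up-right on even diagonals
        while i >= 0 && j <= N-1
            s(k) = B(i+1, j+1); k = k + 1;
            i = i - 1; j = j + 1;
        end
    else
        j = min(d, N-1); i = d - j;    % walk down-left on odd diagonals
        while j >= 0 && i <= N-1
            s(k) = B(i+1, j+1); k = k + 1;
            j = j - 1; i = i + 1;
        end
    end
end
end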
Chapter 4
INTER FRAME PREDICTION BASICS
Inter frame prediction refers to prediction by taking one or more other frames as reference. By doing inter frame prediction, temporal redundancy between consecutive frames is removed. To do that, each frame is divided into 16x16, 8x8 or 4x4 macro blocks. In H.264, macro blocks can be of variable size [1]. A larger macro block size is used for the portion of a picture where the picture is continuous over the given area. To capture small changes in a picture from one frame to another, a smaller macro block size is used.
[Figure: a 16x16 macro block and its 16x8, 8x16 and 8x8 partitions.]
Figure 4-1: A 16x16 macro block breaking
As shown in Figure 4-1 above, a 16x16 macro block can be further divided into 16x8, 8x16 or 8x8 macro blocks. To send a 16x16 macro block, only one motion vector is required. If a macro block is further divided into sub macro blocks, then one motion vector is required to give the displacement information of each sub macro block. In the above case, two motion vectors are required for a 16x16 macro block with 16x8 or 8x16 sub macro block breaking. In the case of processing a 16x16 macro block with an 8x8 sub macro block size, four motion vectors are required to give the position information of one macro block. Thus there is a tradeoff between precision in movement and the computation required to achieve that precision. To get higher precision in movement, a macro block needs to be further divided into multiple sub macro blocks, and in that case multiple motion vectors are required to locate the position of a macro block, for which more calculations are required.
[Figure: the macro block mb-x and the dotted search window around its position in the reference frame.]
Figure 4-2: Area of search for a macro block
Each macro block in the current frame is searched for within a small area surrounding the position of the current macro block in a reference frame, to find the best match for the location of the macro block. In a video capture, multiple frames are captured per second. Because of that, there is not much movement of an object from one consecutive frame to another, so a macro block in the current frame is not searched for in the entire reference frame. Refer to Figure 4-2 for an example. Notice that mb-x of the current frame is searched for within the dotted area of the reference frame. The search for a matching mb-x in the reference frame starts from the top-left corner of the dotted-line box shown in the figure above. The SAD (sum of absolute differences) or MAD (mean of absolute differences) is calculated at that point and saved in a temporary variable associated with the search point. After that, mb-x is moved within the box shown by the dotted line of the reference frame, and the SAD or MAD at each point is calculated and saved in another temporary variable associated with that search point. The following are the equations of SAD and MAD respectively:
SAD = Σ_{i=1}^{N} Σ_{j=1}^{M} |C_ij − R_ij|    (4.1)
and
MAD = (1 / MN) Σ_{i=1}^{N} Σ_{j=1}^{M} |C_ij − R_ij|    (4.2)
where N is the number of rows, M is the number of columns, C_ij is a pixel of the current macro block and R_ij is the corresponding pixel in the reference frame.
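In Matlab these two cost functions reduce to one-liners (assuming C and R are same-sized double matrices holding the current block and the candidate reference block):
sad = sum(abs(C(:) - R(:)));   % equation (4.1)
mad = sad / numel(C);          % equation (4.2): SAD averaged over M*N pixels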
The motion vector for mb-x is taken as the search point where the SAD or MAD is minimum compared to all other search points. The following figure is an example of a motion vector found at location (-1,-1) in the reference frame compared to the location of mb-x in the current frame.
[Figure: mb-x and its best match displaced by (-1,-1) in the reference frame.]
Figure 4-3: Motion vector for a macro block
Notice that the macro block at the location given by a macro block of the current frame does not match exactly inside the reference frame. A frame generated from the calculated motion vectors and the reference frame is known as the compensated frame. As the compensated frame does not exactly match the current frame, the difference between the current frame and the compensated frame, known as the residual frame, is transferred to the decoder. The PSNR (peak signal to noise ratio) between the current frame and the compensated frame can be calculated by the following equation:
PSNR = 10 log10(255^2 / MSE)    (4.3)
where
MSE = (1 / MN) Σ_{i=1}^{N} Σ_{j=1}^{M} (C_ij − R_ij)^2    (4.4)
To adopt an appropriate algorithm, the PSNR between the compensated frame and the current frame is calculated for the different algorithms. Based on the complexity involved in an algorithm and the average PSNR associated with it, the appropriate algorithm is chosen.
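In Matlab this comes out as (assuming cur and comp are same-sized frames of 8-bit pixel values stored as doubles):
% PSNR between the current frame and the compensated frame (eqs. 4.3, 4.4).
mse     = mean((cur(:) - comp(:)).^2);
psnr_db = 10 * log10(255^2 / mse);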
Chapter 5
INTER FRAME PREDICTION ALGORITHMS
Finding the motion vector for a macro block within the current frame with respect to a macro block in the reference frame is the most computationally expensive task in any video encoding [8]. Several algorithms are available to do inter frame motion estimation, offering a range of computational complexity and PSNR. In this project four algorithms were studied in detail. Of these, exhaustive search is a very simple search algorithm that provides the best PSNR [8] among all available search algorithms, but it has the maximum computational complexity. The new three step search and diamond search algorithms have less computational complexity at the cost of some PSNR. Finally, the adaptive rood pattern search algorithm is discussed; it has less computational complexity than the new three step search and diamond search, and higher PSNR than the three step search and diamond search algorithms. The following figure shows the current macro block and the search block within which the current macro block is searched in the reference frame for motion estimation. Note that the same process is repeated for each macro block in the current frame.
[Figure: the current macro block (MxM pixels) and the surrounding search block extending P pixels on each side.]
Figure 5-1: Searching block and current macro block
In the figure above, the current MB is a macro block of the current frame with size MxM pixels, while P is the search range around the position of the current MB in the reference frame.
Comparison between the algorithms is done based on two factors: cost and PSNR. PSNR is calculated after a motion vector is found for the current MB; it is explained in more detail in Chapter 4. The cost counts the number of calculations required to complete the search for the motion vector of a macro block. Depending on the algorithm used, the cost can be different for different macro blocks.
Cost_frame = (M/m)^2 × Cost_MB    (5.1)
where M is the number of rows or columns of a frame and m is the size of a macro block.
In high definition video, the value of M is very large. The size of a macro block is indicated by 'm' in the above equation. Notice that for a larger value of 'm', fewer computations are required. Because of that, the value of 'm' is kept large in areas of a picture in which the positions of pixels in the nearby area are not changing from one frame to another. The cost of one MB search, Cost_MB, depends on the algorithm used. P is the search range within which the best match is calculated. A higher value of P leads to more computations during the search for the motion vector of each macro block. Because of that, in areas of a picture where there is less movement of an object from one frame to another, the value of P is kept small.
5.1 Exhaustive Search
The exhaustive search algorithm calculates the cost function at each possible location in the search window. Because of that, this algorithm is also known as the full search algorithm. By finding the cost function at each possible position in the reference frame, the best possible match for the current macro block is found within the reference frame. Searching the maximum number of points in the search area leads to the highest PSNR among all block matching algorithms. Fast algorithms try to achieve a PSNR matching the PSNR of exhaustive search with reduced computational complexity. The following equation shows the computations required in exhaustive search:
Cost_ES = (2p + 1)^2 × m^2    (5.2)
where m is the size of a macro block and p is the search range.
In the equation above, m is the side length of the current macro block and p is the search range, and the equation assumes that the macro block has equal height and width. Note that the m^2 term refers to the calculations required to compare the current macro block of the current frame with one macro block of the reference frame located within the search range. That term is common to any algorithm being used. Special purpose processors are used to compare one macro block with another in parallel [8]. By doing that, the computational time required to do one comparison is saved. Notice that the (2p + 1)^2 term is the main factor making the cost of exhaustive search very large. Fast search algorithms are implemented in such a way that this factor is reduced, to minimize the overall cost of the motion vector search for a macro block in the current frame.
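A minimal Matlab sketch of the full search for one macro block follows (grayscale frames as double matrices; the function and variable names are illustrative, not the project's exact code):
function [dy, dx] = fullSearch(cur, ref, i, j, m, p)
% Test every displacement in [-p, p]^2 for the m x m block of cur whose
% top-left corner is (i, j), and keep the displacement with minimum SAD.
best = inf; dy = 0; dx = 0;
C = cur(i:i+m-1, j:j+m-1);
[h, w] = size(ref);
for u = -p:p
    for v = -p:p
        if i+u >= 1 && j+v >= 1 && i+u+m-1 <= h && j+v+m-1 <= w
            R = ref(i+u:i+u+m-1, j+v:j+v+m-1);
            sad = sum(abs(C(:) - R(:)));   % equation (4.1)
            if sad < best
                best = sad; dy = u; dx = v;
            end
        end
    end
end
end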
5.2 Three Step Search
The three step search algorithm was one of the earliest attempts to calculate motion estimation quickly [9]. General information about the algorithm is given in the figure below:
[Figure: the search points of the first, second and third steps.]
Figure 5-2: Basic three step search
As shown in the figure above, for P=7 the initial step size is taken as 4. With a step size of four, the current macro block is compared with the macro block in the reference frame at nine positions, as shown in the figure. At the end of the first step of searching at nine points, the best point is taken as the center for the second step. The best point is the point where the MAD is minimum compared to the eight other search points. For the second step, the step size is reduced to two and again the current macro block is compared with the reference frame at nine points. Around the best point found in step two, the search is again done at nine points with a step size of one.
Notice how drastically the number of comparisons required to do motion estimation decreases with the basic three step search. In the case of P=7, the number of comparisons required in exhaustive search is 225. In the three step search it is reduced to 25. The following is the breakdown of the 25 searches:
9 points: during the first step
8 points: during the second step
8 points: during the third step
During the first step of the three step search, the cost function at the best matching point is already calculated. Because of that, a search is required at only eight points in the second step. The same reasoning applies to the third step.
The TSS uses uniformly allocated checking for motion detection; at every step the step size for searching decreases. The three step search pattern stops after 25 points irrespective of the PSNR obtained with the motion vector of the last step. Because of that, it is prone to missing small motions [9]. The new three step search algorithm is an improved method compared to the original three step search algorithm. In this algorithm, center-biased searching is provided by adding an extra eight search points to the original three step search method. It was widely used in earlier standards like MPEG-1 and H.261. The following figure shows an example of how a motion vector is calculated in the new three step search:
[Figure: the outer and inner points of the first step, the five points of the second step and the three points of the third step.]
Figure 5-3: New three step search
As shown in the figure above, in the new three step search the current macro block is searched at 17 places in the first step. If, among those 17 cost functions, the minimum cost is found at the center, the search for the motion vector stops at that point. Otherwise the ordinary three step method is continued by dividing the search step by two, and cost functions are calculated at the required points. The number of blocks that need to be matched can be estimated by the following equation:
N_NTSS = 17 P1 + 25 P2 + 33 P3    (5.3)
where P1, P2 and P3 are the probabilities of finding a good match in the first, second and third steps respectively.
Note that in the new three step search method, searching continues until the best cost function is found at a center. Because of that, in the best case the motion vector can be found in 17 comparisons; on the other hand it can take up to 33 comparisons if the minimum cost is not found at the center in the first step. The following table shows the average number of computations required in exhaustive search, three step search and new three step search:
            ES    TSS   NTSS
Board move  137   25    18
Taxi move   165   25    21
Table 5-1: Average cost in ES, TSS and NTSS
In the table above, notice that the number of calculations required in TSS and NTSS is very small compared to ES. The peak signal to noise ratio of these methods is lower than that of ES. The PSNR difference is explained in more detail in Chapter 5.4.
5.3 Diamond Search
The diamond search algorithm provides PSNR almost similar to the exhaustive search algorithm [10]. In this algorithm, computation continues until the minimum cost function is found at the center of a diamond. The following are the two fixed search patterns used in the diamond search algorithm:
[Figure: the nine-point large diamond search pattern and the five-point small diamond search pattern.]
Figure 5-4: LDSP and SDSP
As shown in the figure above, in the case of the LDSP (large diamond search pattern), comparisons are done at nine different places and the step size is two. In the case of the SDSP (small diamond search pattern), comparisons are done at five different places and the step size is one. The area covered by the large diamond search pattern is large; through it, the current macro block is matched with the reference frame over a broader area, with comparatively lower precision than the small diamond search pattern. The following figure shows how both patterns are used together to find a motion vector:
[Figure: four successive LDSP steps followed by one final SDSP step.]
Figure 5-5: Example of diamond search
The large diamond search pattern is used from the beginning until the minimum cost function is found at the center. Once the minimum cost function is found at the center, the small diamond pattern is used to find the precise location of the motion vector in the reference frame. Because the small diamond search pattern is the last step of diamond search, the PSNR of the diamond search algorithm is almost the same as the PSNR of the exhaustive search algorithm.
The diamond search algorithm is better than NTSS in most conditions as far as the required number of computations is concerned [10]. That is because motion vectors for objects that move faster are quickly caught by the large diamond search pattern, which lowers the computations required to find a motion vector compared to the new three step search algorithm.
5.4 Adaptive Rood Pattern Search
The algorithms used so far follow a fixed pattern to find the motion vector for a given macro block in the current frame. Adaptive rood pattern search is based on a fixed pattern as well as a natural property of video: experiments have shown that objects near each other move at the same rate in the same direction [11].
The fast block-based motion estimation techniques explained so far use fixed sets of search patterns. These algorithms are implemented based on the assumption that the motion estimation matching error decreases gradually as the search point moves towards the position where the minimum error is found. The following figure shows the sum of absolute differences at different points for a macro block of the current frame and reference frame shown in Chapter 3:
Figure 5-6: Matching error surface
In the figure above, notice that the sum of absolute differences decreases gradually towards the point where the error is minimum compared to all other points. The mesh plot showing the sum of absolute differences was plotted based on the exhaustive search pattern algorithm. Plots for the three step search pattern and diamond search pattern are similar to the exhaustive search pattern algorithm.
The advantage of the algorithms mentioned so far is that they are simple to implement as far as algorithmic complexity is concerned. The implementation of these algorithms is regular at different steps. Because of that, these algorithms were widely used prior to H.264. Except for diamond search, all the other algorithms are less efficient in tracking large motions. Because of this advantage, the diamond search pattern algorithm was accepted in the MPEG-4 verification model. Both patterns of the diamond search are center biased. Because of that, the diamond search algorithm provides higher search accuracy with a small increase in computational complexity.
Instead of using a pre-determined search pattern, algorithms are available that exploit the correlation between the current block and the reference block to predict the target motion vector [12]. In such methods, prediction is done by taking a statistical average of the motion vectors of neighboring macro blocks and deciding the size of a macro block and step size accordingly. Thus, in these kinds of methods, the search window is redefined. These methods give comparable performance at the expense of higher computation. Notice that additional memory is required for storing neighboring motion vectors in this method.
There are algorithms that do motion estimation using a multi-resolution frame structure [12]. These algorithms are implemented based on the fact that the content of an image remains similar across different resolutions. Motion vectors are calculated at a lower resolution with a smaller macro block size and are then used to predict the direction of the motion vector in the actual, larger frame. These algorithms lead to poor performance if this assumption does not hold in the prediction process. Block sizes are kept proportional in the lower resolution while predicting the direction of the motion vector. By doing that, a motion vector found in the lower resolution frame is easily mapped to the associated block in the larger frame.
The algorithms explained so far calculate the MAD by taking all horizontal and vertical pixels as reference. There are algorithms that use sub-sampled pixels for calculating the mean absolute difference [11]. That is based on the fact that pixels located near each other possess spatial redundancy within a macro block. By doing sub-sampled calculations, several computations are avoided at each position while finding the cost function to decide the appropriate motion vector.
Each class of algorithm explained so far achieves a different trade-off. The algorithmic complexity involved, the speed of the search for a motion vector and the picture quality after prediction are some of the factors on which the trade-off is evaluated.
In the algorithms explained so far, there are two major parts on which an algorithm focuses: prediction of the direction of a motion vector, for finding the motion vector for a block quickly, and the most suitable size and shape of the search pattern. The adaptive rood pattern search algorithm uses the motion vector of a previous block at the beginning of the algorithm to predict the motion vector direction. The following figure shows four scenarios for predicting the direction of motion of the current block with reference to other blocks:
[Figure: four neighborhood types (Type-1 through Type-4) used to predict the motion of the current block.]
Figure 5-7: Types of prediction of motion
In the figure above, in the case of type-1 prediction, the motion vectors of the four adjacent macro blocks are checked. Based on the motion vectors of these four blocks, the prediction of the motion vector for the current macro block is done. In case the directions of the motion vectors differ among the adjacent macro blocks, a majority rule is applied to predict the motion vector of the current macro block. In the present project, type-4 is used for prediction. In this type, only the motion vector of the previously processed macro block is taken as reference for predicting the direction of the motion vector of the current block. The following figure gives an example of adaptive rood pattern search:
[Figure: the rood (plus-shaped) pattern with arm length equal to the step size, plus the predicted MV point.]
Figure 5-8: First step of ARPS
As shown in the figure above, during the first step of the adaptive rood pattern search, cost functions are found at the five locations used in the small diamond search algorithm, as well as at the point where the motion vector was found for the previous macro block. Notice that the step size used in this rood pattern is not equal to the step size used in the ordinary small diamond search explained in Chapter 5.3. In the case of adaptive rood pattern search, the step size is selected in such a way that it matches the size of the predicted motion vector. The following equation is used to find the step size:
Γ = round( sqrt(MV'_x^2 + MV'_y^2) )    (5.4)
which in practice is simplified to
Γ = max( |MV'_x|, |MV'_y| )    (5.5)
In the equations above, MV' is the motion vector of the previous macro block. Notice that computing the step size of equation (5.4) in hardware requires many calculations. To minimize that complexity, the step size is chosen as the maximum of the horizontal and vertical components of the motion vector predicted from the previous block, as in equation (5.5).
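The first step can be sketched in Matlab as follows (a simplified illustration; sadAt is a hypothetical helper that evaluates equation (4.1) for a candidate displacement of the current block, and mvPrev holds the previous block's motion vector):
% ARPS first step: rood arm length from eq. (5.5), plus the predicted MV.
step = max(abs(mvPrev(1)), abs(mvPrev(2)));
if step == 0, step = 2; end   % fall back to a small fixed rood (an assumption)
cands = [0 0; step 0; -step 0; 0 step; 0 -step; mvPrev(1) mvPrev(2)];
best = inf;
for k = 1:size(cands, 1)
    cost = sadAt(cands(k, 1), cands(k, 2));   % SAD at this displacement
    if cost < best
        best = cost;
        mv = cands(k, :);
    end
end
% The search then refines mv with the unit-size small diamond pattern
% (SDSP) until the minimum stays at the center.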
Chapter 6
SIMULATION RESULTS
In this project, the following videos were taken as references to implement the various components of a video encoder and decoder, including the motion estimation algorithms mentioned in the report. The following table shows the list of videos taken as reference:
            Format   Bit Rate (kbps)   Frame Rate (fps)   Search Range (pixels)
Car phone   QCIF     190               29                 8
Room        CIF      48                18                 8
Crossing    QCIF     232               14                 16
Board       CIF      512               15                 8
Foreman     CIF      313               25                 8
Table 6-1: Frames taken as reference
In Matlab, each video was read into a data structure. Frames were extracted from the data structure based on the video format, bit rate and frame rate information. After extracting the frames, they were passed into a Matlab function along with the macro block size and search range as parameters. The following table shows the average number of search points required to perform motion estimation with the exhaustive search algorithm, the diamond search algorithm and finally the adaptive rood pattern search algorithm:
            ES    DS    ARPS
Car phone   260   24    10
Room        255   19    10
Crossing    934   28    16
Board       255   14    8
Foreman     275   29    13
Table 6-2: Average number of search points per MV generation
From the table above it is clear that in the case of adaptive rood pattern search, the search point for a macro block is found very quickly compared to the other algorithms. Notice that the difference between the numbers of computations required by NTSS and DS is not major. Because of that, NTSS and DS were widely and interchangeably used in past video encoding systems: codecs requiring less complexity used NTSS and codecs capable of complex calculations used DS. Note that the adaptive rood pattern search algorithm requires a very small number of search points to find the appropriate motion vector for a macro block within the current frame. On the other hand, the complexity involved in the algorithm is very large compared to the past algorithms. Because of that, industries related to video transmission and storage are concentrating on implementing this algorithm in hardware, to process high definition videos faster and more efficiently [10].
Finding a search point quickly is not the only requirement for videos demanding high quality; a good peak signal to noise ratio is also important. The following figure shows a comparison of PSNR over 20 frames:
Figure 6-1: Comparison of PSNR (dB) for 20 frames between ES and ARPS
In the figure above, notice that the peak signal to noise ratio found with the adaptive rood pattern search algorithm over 20 frames is around one dB less than the peak signal to noise ratio found with the exhaustive search algorithm. On the other hand, from Table 6-2, notice that the number of computations required to find the motion vector of a given macro block in exhaustive search is very large compared to the number required by the adaptive rood pattern search algorithm.
The following is a pseudo block diagram of the encoder module used in the project. Prior to giving frames as input to the encoder, a Matlab function was used to generate frames from the given video. Those frames were saved in a folder for reference by the encoder module. The encoder module separated the current frame and reference frames from each other, and the input frames were then processed by the following video processing components written in Matlab.
[Block diagram of EncoderModule.m: Video_encode_n.m extracts the current and reference frames; AdaptiveRoodPattern.m produces Motion_vect; Compansated_motion.m produces Comp_frame; Residual_image.m produces Resi_frame, which FrameEncodeTx.m encodes for transmission.]
Figure 6-2: Pseudo block diagram of the video encoder
In the figure above, the adaptive rood pattern search file generates the motion vectors and the cost of doing the motion estimation. The motion vector information is then given to the motion compensation module to generate the compensated frame. The compensated frame is not exactly equal to the current frame; because of that, the difference between those frames is calculated by the residual image module and transmitted.
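Under the assumption that the modules of Figure 6-2 take roughly these inputs, the top-level loop can be sketched as follows (the signatures shown for AdaptiveRoodPattern, Compansated_motion, Residual_image and FrameEncodeTx are guesses for illustration only; VideoReader and rgb2gray are stock Matlab functions):
v = VideoReader('foreman.avi');   % any of the test clips in Table 6-1
ref = rgb2gray(readFrame(v));     % first frame serves as reference
while hasFrame(v)
    cur = rgb2gray(readFrame(v)); % current frame to encode
    [mv, cost] = AdaptiveRoodPattern(double(cur), double(ref), 16, 8);
    comp = Compansated_motion(mv, double(ref));   % compensated frame
    resi = Residual_image(double(cur), comp);     % residual to encode
    FrameEncodeTx(resi);          % DCT, quantize, zigzag, RLE
    ref = cur;                    % encoded frame becomes the next reference
end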
Chapter 7
CONCLUSION
H.264 provides a mechanism for compressing video very efficiently without much loss in video quality. Because of that, this standard meets practical multimedia communication requirements. The Matlab implementation of all major components within the H.264 encoder and decoder gives information on the complexity involved in implementing the codec on a hardware platform. Inter frame prediction, for estimating the movement of a macro block with respect to a macro block in a reference frame, was the main focus of the project. High definition videos are captured at higher frame rates. Because of that, a huge amount of temporal redundancy is available between consecutive frames. By removing the temporal redundancy with effective inter frame prediction algorithms, very little information needs to be sent. Intra frame prediction and variable length encoding are also attractions of the encoder compared to previous standards. Inter frame prediction is the most complex component of an H.264 encoder.
Different algorithms for inter frame prediction were discussed in the project, and factors like the computational complexity and peak signal to noise ratio associated with each algorithm were examined. The adaptive rood pattern search algorithm described in this project requires the minimum computation to find a motion vector. The average peak signal to noise ratio achieved by this algorithm almost matches the average peak signal to noise ratio of the exhaustive search algorithm, which has the maximum peak signal to noise ratio.
REFERENCES
1. T. Wiegand, G. J. Sullivan, G. Bjøntegaard, and A. Luthra, "Overview of the H.264/AVC Video Coding Standard", IEEE Trans. on Circuits and Systems for Video Technology, vol. 13, no. 7, pp. 560-576, July 2003.
2. A. Sadka, Compressed Video Communications, John Wiley & Sons, 2002.
3. Edwin Paul J. Tozer, Broadcast Engineer's Reference Book, Elsevier, 2004.
4. Current and Previous, http://extra.cmis.csiro.au/IA/changs/motion/taxi.0.gif
5. Ilker Hamzaoglu, Ozgur Tasdizen, and Esra Sahin, "An Efficient H.264 Intra Frame Encoder System Design", Faculty of Engineering and Natural Sciences, Sabanci University, 34956 Tuzla, Istanbul, Turkey.
6. Hirohisa Jozawa, Kazuto Kamikura, and Atsushi Sagata, "Two-Stage Motion Compensation Using Adaptive Global MC and Local Affine MC", IEEE Trans. on Circuits and Systems for Video Technology, vol. 7, no. 1, February 1997.
7. N. Ahmed, T. Natarajan, and K. R. Rao, "Discrete Cosine Transform", IEEE Trans. Computers, pp. 90-93, January 1974.
8. Jianhua Lu and Ming L. Liou, "A Simple and Efficient Search Algorithm for Block-Matching Motion Estimation", IEEE Trans. on Circuits and Systems for Video Technology, vol. 7, no. 2, April 1997.
9. Renxiang Li, Bing Zeng, and Ming L. Liou, "A New Three-Step Search Algorithm for Block Motion Estimation", IEEE Trans. on Circuits and Systems for Video Technology, vol. 4, no. 4, pp. 438-442, August 1994.
10. Yao Nie and Kai-Kuang Ma, "Adaptive Rood Pattern Search for Fast Block-Matching Motion Estimation", IEEE Trans. Image Processing, vol. 11, no. 12, pp. 1442-1448, December 2002.
11. B. Liu and A. Zaccarin, "New Fast Algorithms for the Estimation of Block Motion Vectors", IEEE Trans. Circuits Syst. Video Technol., vol. 3, no. 2, pp. 148-157, 1993.
12. L.-J. Luo, C. Zou, and X.-Q. Gao, "A New Prediction Search Algorithm for Block Motion Estimation in Video Coding", IEEE Trans. Consumer Electron., vol. 43, pp. 56-61, February 1997.