Fast mode decision for Inter Mode
Selection in H.264/AVC Video Coding
By
Amruta Kulkarni
Under Guidance of
DR. K.R. RAO
Contents
 Need for video compression
 Motivation
 Video coding standards, video formats and quality
 Overview of H.264
 Complexity reduction algorithm for inter mode selection
 Experimental results
 Conclusions
 References
Need for Video Compression
 It reduces both storage and bandwidth
demands.
 Resources are insufficient to handle uncompressed video.
 A better proposition is to send high-resolution compressed video than a low-resolution, uncompressed stream over a high bit-rate transmission channel.
Motivation [2]
 Removing redundancy in a video clip
 Only a small percentage of any
particular frame is new information
 Video encoding is a highly complex process
 Reducing the overall complexity makes encoding suitable for handheld devices
Timeline of Video Development [10]
 Inter-operability between encoders and decoders from different manufacturers
 Build a video platform which interacts with video codecs, audio codecs, transport protocols, and security and rights management in well-defined and consistent ways
OVERVIEW OF H.264 / AVC STANDARD
 Built on the concepts of earlier standards such as MPEG-2 and
MPEG-4 Visual
 Achieves substantially higher video compression and offers a network-friendly video representation
 50% reduction in bit-rate over MPEG-2
 Error resilience tools
 Supports various interactive (video telephony) and non-interactive
applications (broadcast, streaming, storage, video on demand)
H.264/MPEG-4 Part 10 or AVC [2, 5]
 It is an advanced video compression standard, developed by the ITU-T Video Coding Experts Group (VCEG) together with the ISO/IEC Moving Picture Experts Group (MPEG).
 It is a widely used video codec in mobile applications, the internet (YouTube, Flash players), set-top boxes, DTV, etc.
 An H.264 encoder converts video into a compressed format (.264) and a decoder converts the compressed video back into displayable form.
How does the H.264 codec work?
 An H.264 video encoder carries out prediction, transform
and encoding processes to produce a compressed H.264 bit
stream. The block diagram of the H.264 video encoder is
shown in Fig 1.
 A decoder carries out a complementary process by decoding,
inverse transform and reconstruction to output a decoded
video sequence. The block diagram of the H.264 video
decoder is shown in Fig 2.
H.264 encoder block diagram
Fig. 1 H.264 Encoder block diagram[7]
H.264 decoder block diagram
[Figure: bitstream input → entropy decoding → inverse quantization & inverse transform → reconstruction (+) with intra prediction or motion compensation selected by intra/inter mode selection → deblocking filter → picture buffering → video output]
Fig. 2 H.264 decoder block diagram [2]
Slice Types [3]
 I (intra) slice – contains references only to itself.
 P (predictive) slice – uses one or more recently decoded slices as a reference (or prediction) for picture construction.
 B (bi-predictive) slice – works like P slices, except that former and future I or P slices may also be used as reference pictures.
 SI and SP or “switching” slices may be used for transitions between two different H.264 video streams.
Profiles in H.264
 The H.264 standard defines sets of capabilities, referred to as “profiles”, targeting specific classes of applications (Fig. 3). Different features are supported in different profiles depending on the application. Table 1 lists some profiles and their applications.

Table 1. List of H.264 profiles and applications [2]

Profile     Applications
Baseline    Video conferencing, videophone
Main        Digital storage media, television broadcasting
High        Streaming video
Extended    Content distribution, post processing
Profiles in H.264
Fig. 3 Profiles in H.264 [9]
Intra Prediction
 I-pictures usually have a large amount of information present in the frame.
 The spatial correlation between adjacent macroblocks in a given frame is exploited.
 H.264 offers nine modes for intra prediction of 4x4 luminance blocks.
 H.264 offers four modes of intra prediction for 16x16 luminance blocks.
 H.264 supports four modes, similar to the 16x16 luminance case, for prediction of 8x8 chrominance blocks.
Intra prediction
Fig.4 16x16 intra prediction modes [11]
Fig. 5 4x4 Intra prediction modes [11]
Inter Prediction [5]
 Takes advantage of the temporal redundancies that exist
among successive frames.
 Temporal prediction in P frames involves predicting from
one or more past frames known as reference frames.
Motion Estimation/Compensation
 It includes motion estimation (ME) and motion
compensation (MC).
 ME/MC performs prediction: a predicted version of a rectangular block of pixels is generated by choosing another, similarly sized rectangular block of pixels from a previously decoded reference picture.
 The reference block is translated to the position of the current rectangular block; the displacement is given by a motion vector.
 Different sizes of block for luma: 4x4, 4x8, 8x4, 8x8, 16x8,
8x16, 16x16 pixels.
Inter prediction
Fig. 6 Partitioning of a MB for motion compensation [5]
Integer Transform and Quantization
 Transform:
 The prediction error block is expressed in the form of transform coefficients.
 H.264 employs a purely integer spatial transform, which is a
rough approximation of the DCT.
 Quantization:
 Significant portion of data compression takes place.
 Fifty-two different quantization step sizes can be chosen.
 Step sizes are increased at a compounding rate of approximately
12.5%.
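The compounding behaviour can be checked with a small sketch (not code from the thesis; the base step size below is illustrative): the step size doubles every 6 QP values, i.e. each QP increment scales it by 2^(1/6) ≈ 1.122, the roughly 12.5% rate quoted above.

```python
# Illustrative sketch: H.264 Qstep grows by a factor of 2**(1/6) per QP
# increment (~12.2%, commonly quoted as ~12.5%), doubling every 6 QP values.
def qstep(qp, base=0.625):
    # base: nominal step size at QP 0 (illustrative value, not normative)
    return base * 2 ** (qp / 6.0)

growth = qstep(29) / qstep(28)  # ratio between adjacent QPs, ~1.122
```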
De-blocking Filter and Entropy Coding
 De-blocking filter:
 Removes the blocking artifacts due to the block based encoding
pattern
 In-loop de-blocking filter
 Entropy coding:
 Assigning shorter code-words to symbols with higher
probabilities of occurrence, and longer code-words to symbols
with less frequent occurrences.
 CAVLC and CABAC
FAT (Fast Adaptive Termination) for Mode Selection [9]
 The proposed fast adaptive mode selection algorithm
includes the following:
 Fast mode prediction
 Adaptive rate distortion threshold
 Homogeneity detection
 Early Skip mode detection
Fast mode prediction
 In H.264/AVC, coding is performed on each frame by dividing the frame into small macroblocks, processed from top-left to bottom-right.
 Spatially neighboring macroblocks in the same frame generally have similar characteristics, such as motion and level of detail.
 For example, if most of the neighboring macroblocks have skip mode, the current macroblock is likely to have the same mode.
 Temporal similarity also exists between collocated macroblocks in the previously encoded frame.
Fast mode prediction
 Fig. 7 shows the spatial neighborhood: the current macroblock X has similar characteristics to its neighboring macroblocks A through H.
 Fig. 8 shows the temporal similarity between the current macroblock and the collocated macroblock PX in the previous frame, together with its neighbors.
Fig. 7 Spatial Neighboring blocks [8]
Fig. 8 Temporal Neighboring blocks [8]
Fast mode prediction
 A mode histogram is built from the spatial and temporal neighboring macroblocks; the best mode is selected as the index corresponding to the maximum value in the mode histogram.
 The average rate-distortion cost of the neighboring macroblocks that used this best mode is then taken as the prediction cost for the current macroblock.
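The neighbour-based prediction can be sketched as follows (a minimal illustration with a hypothetical list-of-(mode, cost) input, not the JM implementation):

```python
from collections import Counter

def predict_mode(neighbors):
    """neighbors: list of (mode, rd_cost) pairs for the spatial and temporal
    neighboring macroblocks (hypothetical representation).

    Returns the most frequent neighbor mode and the average RD cost of the
    neighbors that used it, i.e. the predicted cost for the current MB."""
    histogram = Counter(mode for mode, _ in neighbors)   # mode histogram
    best_mode, _ = histogram.most_common(1)[0]           # index of the maximum
    costs = [c for mode, c in neighbors if mode == best_mode]
    rd_pred = sum(costs) / len(costs)                    # average RD cost
    return best_mode, rd_pred
```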
Rate Distortion Optimization
•Rate–distortion optimization (RDO) is a method of improving video quality in
video compression. The name refers to the optimization of the amount of distortion (loss of
video quality) against the amount of data required to encode the video, the rate.
• Macroblock parameters: QP (quantization parameter) and Lagrange multiplier (λ)
• Calculate: λ_MODE = 0.85 × 2^((QP − 12)/3)
• Then calculate the cost, which determines the best mode:
RDcost = D + λ_MODE × R, where
D – distortion
R – bit rate with the given QP
λ – Lagrange multiplier
• Distortion (D) is obtained by the SAD (sum of absolute differences) between the original macroblock and its reconstructed block.
• Bit rate (R) includes the bits for the mode information and the transform coefficients of the macroblock.
• The quantization parameter (QP) can vary from 0 to 51.
• The Lagrange multiplier (λ) is a value representing the relationship between bit cost and quality.
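The two formulas above can be written down directly (a minimal sketch; in a real encoder D and R would come from the SAD and entropy-coding stages):

```python
def lagrange_mode(qp):
    # lambda_MODE = 0.85 * 2**((QP - 12) / 3)
    return 0.85 * 2 ** ((qp - 12) / 3.0)

def rd_cost(distortion, rate_bits, qp):
    # RDcost = D + lambda_MODE * R
    return distortion + lagrange_mode(qp) * rate_bits
```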
Adaptive Rate Distortion Threshold
 RDthres for early termination depends on RDpred, which is computed from spatial and temporal correlations.
 RDthres also depends on the value of the β modulator.
 Thus, the rate-distortion threshold is given by:
RDthres = (1 + β) × RDpred
 The β modulator provides a trade-off between computational efficiency and accuracy.
Threshold selection
 Adaptive Threshold I: RDthres = RDpred × (1 − 8×β)
 Adaptive Threshold II: RDthres = RDpred × (1 + 10×β)
 The threshold is adaptive as it depends on the predicted rate-distortion cost derived from spatial and temporal correlations.
 β is the modulation coefficient; it depends on two factors, namely the quantization step (Qstep) and the block size (N and M).
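A minimal sketch of the two thresholds (the derivation of β from Qstep and the block size is omitted here; β is simply passed in):

```python
def adaptive_thresholds(rd_pred, beta):
    # Threshold I tightens the predicted cost; Threshold II relaxes it.
    thr1 = rd_pred * (1 - 8 * beta)    # Adaptive Threshold I
    thr2 = rd_pred * (1 + 10 * beta)   # Adaptive Threshold II
    return thr1, thr2
```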
Homogeneity Detection
 Smaller block sizes like P4x8, P8x4 and P4x4 often correspond to detailed regions and thus require much more computation than larger block sizes.
 So, before checking the smaller block sizes, it is necessary to check whether a P8x8 block is homogeneous or not.
 The method adopted to detect homogeneity is based on edge
detection.
 An edge map is created for each frame using the Sobel
operator [27].
Homogeneity Detection
 For each pixel p(m,n), an edge vector D(m,n) = (dx(m,n), dy(m,n)) is obtained:
dx(m,n) = p(m-1,n+1) + 2·p(m,n+1) + p(m+1,n+1) − p(m-1,n-1) − 2·p(m,n-1) − p(m+1,n-1)   (1)
dy(m,n) = p(m+1,n-1) + 2·p(m+1,n) + p(m+1,n+1) − p(m-1,n-1) − 2·p(m-1,n) − p(m-1,n+1)   (2)
 Here dx(m,n) and dy(m,n) represent the differences in the vertical and horizontal directions, respectively.
 The amplitude Amp(D(m,n)) of the edge vector is given by:
Amp(D(m,n)) = |dx(m,n)| + |dy(m,n)|   (3)
 A homogeneous region is detected by comparing the sum of the amplitudes of the edge vectors over a region with predefined threshold values [30]. In the proposed algorithm, these thresholds are made adaptive, depending on the amplitudes of the left and up blocks and on the mode information.
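Equations (1)–(3) and the homogeneity test can be sketched in plain Python over a 2-D list of luma values (an illustration only; the adaptive threshold itself is passed in rather than derived):

```python
def edge_amplitude(p, m, n):
    # Sobel-style edge vector D(m,n) = (dx, dy), eqs. (1) and (2)
    dx = (p[m-1][n+1] + 2*p[m][n+1] + p[m+1][n+1]
          - p[m-1][n-1] - 2*p[m][n-1] - p[m+1][n-1])
    dy = (p[m+1][n-1] + 2*p[m+1][n] + p[m+1][n+1]
          - p[m-1][n-1] - 2*p[m-1][n] - p[m-1][n+1])
    return abs(dx) + abs(dy)           # Amp(D(m,n)), eq. (3)

def is_homogeneous(block, threshold):
    # Sum edge amplitudes over the interior pixels, compare to the threshold.
    h, w = len(block), len(block[0])
    total = sum(edge_amplitude(block, m, n)
                for m in range(1, h - 1) for n in range(1, w - 1))
    return total < threshold
```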
Homogeneity Detection
 The adaptive threshold is determined according to the following four cases:
 Case 1: the left block and the up block are both P8x8
 Case 2: the left block is P8x8 and the up block is not P8x8
 Case 3: the left block is not P8x8 and the up block is P8x8
 Case 4: neither the left block nor the up block is P8x8
[The per-case threshold formulas appeared as equation images on the original slides.]
FAT Algorithm [8]
Fig. 9 FAT algorithm [8]
FAT Algorithm
 Step 1: If the current macroblock belongs to an I slice, check intra prediction using I4x4 or I16x16 and go to step 10; else go to step 2.
 Step 2: If the current macroblock is the first macroblock in a P slice, check the inter and intra prediction modes and go to step 10; else go to step 3.
 Step 3: Compute the mode histogram from the neighboring spatial and temporal macroblocks; go to step 4.
 Step 4: Select the prediction mode as the index corresponding to the maximum of the mode histogram and obtain the values of Adaptive Threshold I and Adaptive Threshold II; go to step 5.
 Step 5: Always check the P16x16 mode and the conditions for skip mode; if the skip-mode conditions are satisfied, go to step 10, otherwise go to step 6.
FAT Algorithm
 Step 6: If the left, up, up-left and up-right blocks all have skip mode, check skip mode against Adaptive Threshold I; if the rate distortion is less than Adaptive Threshold I, the current macroblock is labeled as skip mode, go to step 10; otherwise go to step 7.
 Step 7 : First round check over the predicted mode; if the
predicted mode is P8x8, go to step 8; otherwise, check the rate
distortion cost of the predicted mode against Adaptive Threshold I.
If the RD cost is less than Adaptive Threshold I, go to step 10;
otherwise go to step 9.
 Step 8 : If a current P8x8 is homogeneous, no further partition is
required. Otherwise, further partitioning into smaller blocks
8x4,4x8, 4x4 is performed. If the RD of P8x8 is less than
Adaptive Threshold I , go to step 10; otherwise go to step 9.
FAT Algorithm
 Step 9: Second-round check over the remaining modes against Adaptive Threshold II: if the rate distortion is less than Adaptive Threshold II, go to step 10; otherwise continue checking all the remaining modes, then go to step 10.
 Step 10 : Save the best mode and rate distortion cost.
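A simplified control-flow sketch of steps 5–9 (a hypothetical interface, not the JM code: `rd_cost_of` stands in for the encoder's RD computation, and the intra checks of steps 1–2 and the homogeneity branch of step 8 are omitted for brevity):

```python
def fat_mode_decision(neighbors_all_skip, predicted_mode, thr1, thr2, rd_cost_of):
    """Sketch of the FAT early-termination flow.
    rd_cost_of: callable mapping a mode name to its RD cost."""
    best_mode, best_cost = "P16x16", rd_cost_of("P16x16")  # step 5: always check P16x16
    # Step 6: early SKIP detection when all four neighbours are SKIP
    if neighbors_all_skip and rd_cost_of("SKIP") < thr1:
        return "SKIP", rd_cost_of("SKIP")
    # Step 7: first round - check the predicted mode against Threshold I
    cost = rd_cost_of(predicted_mode)
    if cost < best_cost:
        best_mode, best_cost = predicted_mode, cost
    if cost < thr1:
        return best_mode, best_cost                        # early termination
    # Step 9: second round - remaining modes against Threshold II
    for mode in ("P16x8", "P8x16", "P8x8"):
        cost = rd_cost_of(mode)
        if cost < best_cost:
            best_mode, best_cost = mode, cost
        if cost < thr2:
            break                                          # early termination
    return best_mode, best_cost                            # step 10: save best
```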
CIF and QCIF sequences
 CIF (Common Intermediate Format) is a format used to standardize the
horizontal and vertical resolutions in pixels of Y, Cb, Cr sequences in video
signals, commonly used in video teleconferencing systems.
 QCIF means “Quarter CIF”. Having one fourth of the area, as “quarter” implies, means the height and width of the frame are halved.
 The differences in the Y, Cb, Cr resolutions of CIF and QCIF are shown in Fig. 10 [16].
Fig. 10 CIF and QCIF resolutions (Y, Cb, Cr)
Results
 The following QCIF and CIF sequences were used to test the complexity reduction algorithm [10]:
 Akiyo
 Foreman
 Car phone
 Hall monitor
 Silent
 News
 Container
 Coastguard
Test Sequences
[Thumbnails of the eight test sequences: Akiyo, Coastguard, News, Foreman, Car phone, Container, Hall monitor, Silent]
Experimental Results
 Baseline profile
 IPPP GOP structure.
 QP values of 22, 27, 32 and 37.
 QCIF – 30 frames; CIF – 30 frames.
 The results were compared with exhaustive search of JM in
terms of the change of PSNR, bit-rate, SSIM, compression
ratio, and encoding time.
 Intel Pentium Dual Core processor of 2.10GHz and 4GB
memory.
Experimental Results
 Computational efficiency is measured by the amount of time reduction, computed as:
ΔTime = (Time_JM17.2 − Time_new) / Time_JM17.2 × 100%
 Delta bit rate is measured by the amount of reduction, computed as:
ΔBit rate = (Bit rate_JM17.2 − Bit rate_new) / Bit rate_JM17.2 × 100%
 Delta PSNR (peak signal-to-noise ratio) is measured by the amount of reduction, computed as:
ΔPSNR = (PSNR_JM17.2 − PSNR_new) / PSNR_JM17.2 × 100%
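All three deltas are the same percentage-reduction formula applied to a different metric; a one-line sketch:

```python
def percent_reduction(ref_value, new_value):
    # e.g. percent_reduction(time_jm17_2, time_fat) for the encoding-time saving
    return (ref_value - new_value) / ref_value * 100.0
```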
Quality
 Visual quality must be specified, evaluated and compared.
 Visual quality is inherently subjective.
 Two types of quality measures:
 Objective quality measures – PSNR, MSE
 Structural quality measure – SSIM [29]
 PSNR is the most widely used objective quality measurement:
PSNRdB = 10 log10((2^n − 1)² / MSE)
where n = number of bits per pixel and MSE = mean square error.
 SSIM emphasizes that the human visual system is highly adapted to extract structural information from visual scenes; therefore, structural similarity measurement should provide a good approximation to perceptual image quality.
Results
Conclusions
 To achieve time complexity reduction in inter prediction, a fast adaptive
termination mode selection algorithm, named FAT [8] has been used.
 Experimental results reported on different video sequences and comparison
with open source code (JM17.2) indicate that the algorithm used achieves faster
encoding time with a negligible loss in video quality. Numbers are as shown
below:
 Encoding time: ~43% reduction for QCIF and ~40% reduction for CIF
 PSNR: ~0.15% reduction for QCIF and ~0.26% reduction for CIF
 Bit Rate: ~6% reduction for QCIF and ~9.5% reduction for CIF
 SSIM: ~0.077% reduction for QCIF and ~0.073% reduction for CIF
 These results show that a considerable reduction in encoding time is achieved using the FAT algorithm while not degrading the video quality.
References:
1. Open source article, “Intra frame coding”: http://www.cs.cf.ac.uk/Dave/Multimedia/node248.html
2. Open source article, “MPEG 4 new compression techniques”: http://www.meabi.com/wp-content/uploads/2010/11/21.jpg
3. Open source article, “H.264/MPEG-4 AVC”, Wikipedia: http://en.wikipedia.org/wiki/H.264/MPEG-4_AVC
4. I. E. Richardson, “The H.264 advanced video compression standard”, 2nd edition, Wiley, 2010.
5. R. Schafer and T. Sikora, “Digital video coding standards and their role in video communications”, Proceedings of the IEEE, vol. 83, pp. 907-923, Jan. 1995.
6. G. Escribano et al., “Video encoding and transcoding using machine learning”, MDM/KDD ’08, Las Vegas, NV, USA, Aug. 2008.
7. D. Marpe, T. Wiegand and S. Gordon, “H.264/MPEG4-AVC fidelity range extensions: tools, profiles, performance, and application areas”, Proceedings of the IEEE International Conference on Image Processing 2005, vol. 1, pp. 593-596, Sept. 2005.
8. ITU-T Recommendation H.264, Advanced video coding for generic audio-visual services.
9. S. Kwon, A. Tamhankar and K. R. Rao, “Overview of H.264 / MPEG-4 Part 10”, J. Visual Communication and Image Representation, vol. 17, pp. 186-216, April 2006.
10. A. Puri et al., “Video coding using the H.264/MPEG-4 AVC compression standard”, Signal Processing: Image Communication, vol. 19, pp. 793-849, Oct. 2004.
11. G. Sullivan, P. Topiwala and A. Luthra, “The H.264/AVC advanced video coding standard: overview and introduction to the fidelity range extensions”, SPIE Conference on Applications of Digital Image Processing XXVII, vol. 5558, pp. 53-74, Aug. 2004.
12. K. R. Rao and P. C. Yip, “The transform and data compression handbook”, Boca Raton, FL: CRC Press, 2001.
13. T. Wiegand and G. J. Sullivan, “The H.264 video coding standard”, IEEE Signal Processing Magazine, vol. 24, pp. 148-153, March 2007.
14. I. E. Richardson, “H.264/MPEG-4 Part 10 white paper: inter prediction”, www.vcodex.com, March 2003.
15. JM reference software: http://iphome.hhi.de/suehring/tml/
16. G. Raja and M. Mirza, “In-loop de-blocking filter for H.264/AVC video”, Proceedings of the IEEE International Conference on Communication and Signal Processing 2006, Marrakech, Morocco, Mar. 2006.
17. M. Wien, “Variable block size transforms for H.264/AVC”, IEEE Trans. on Circuits and Systems for Video Technology, vol. 13, pp. 604-613, July 2003.
18. A. Luthra, G. Sullivan and T. Wiegand, “Introduction to the special issue on the H.264/AVC video coding standard”, IEEE Trans. on Circuits and Systems for Video Technology, vol. 13, no. 7, pp. 557-559, July 2003.
19. H. Kim and Y. Altunbasak, “Low-complexity macroblock mode selection for H.264-AVC encoders”, IEEE International Conference on Image Processing, vol. 2, pp. 765-768, Oct. 2004.
20. “Editor's proposed draft text modifications for Joint Video Specification (ITU-T Rec. H.264 | ISO/IEC 14496-10 AVC), Draft 2”, JVT-E022d2, Geneva, Switzerland, 9-17 October 2002.
21. A. Tourapis, O. C. Au and M. L. Liou, “Highly efficient predictive zonal algorithm for fast block-matching motion estimation”, IEEE Trans. Circuits Syst. Video Technol., vol. 12, pp. 934-947, Oct. 2002.
22. Z. Chen, P. Zhou and Y. He, “Fast integer pel and fractional pel motion estimation for JVT”, JVT-F017r1.doc, Joint Video Team (JVT) of ISO/IEC MPEG & ITU-T VCEG, 6th meeting, Awaji Island, JP, 5-13 December 2002.
23. A. M. Tourapis, “Enhanced predictive zonal search for single and multiple frame motion estimation”, Proceedings of Visual Communications and Image Processing 2002 (VCIP-2002), pp. 1069-1079, San Jose, CA, January 2002.
24. Y. Lin and S. Tai, “Fast full search block matching algorithm for motion compensated video compression”, IEEE Transactions on Communications, vol. 45, pp. 527-531, May 1997.
25. T. Uchiyama, N. Mukawa and H. Kaneko, “Estimation of homogeneous regions for segmentation of textured images”, Proceedings of IEEE ICPR, pp. 1072-1075, 2002.
26. X. Liu, D. Liang and A. Srivastava, “Image segmentation using local special histograms”, Proceedings of IEEE ICIP, pp. 70-73, 2001.
27. F. Pan, X. Lin, R. Susanto, K. Lim, Z. Li, G. Feng, D. Wu and S. Wu, “Fast mode decision for intra prediction”, Doc. JVT-G013, Mar. 2003.
30. D. Wu et al., “Fast intermode decision in H.264/AVC video coding”, IEEE Transactions on Circuits and Systems for Video Technology, vol. 15, no. 7, pp. 953-958, July 2005.
31. YUV test video sequences: http://trace.eas.asu.edu/yuv/
32. J. Ren et al., “Computationally efficient mode selection in H.264/AVC video coding”, IEEE Transactions on Consumer Electronics, vol. 54, no. 2, pp. 877-886, May 2008.
33. Z. Wang et al., “Image quality assessment: from error visibility to structural similarity”, IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600-612, Apr. 2004.
34. A. Puri et al., “Video coding using the H.264/MPEG-4 AVC compression standard”, Signal Processing: Image Communication, vol. 19, pp. 793-849, Oct. 2004.
35. Multi-view coding H.264/MPEG-4 AVC: http://mpeg.chiariglione.org/technologies/mpeg-4/mp04-mvc/index.htm
36. CIF and QCIF format: http://en.wikipedia.org/wiki/Common_Intermediate_Format
37. T. Wiegand et al., “Rate-constrained coder control and comparison of video coding standards”, IEEE Trans. Circuits Systems Video Technology, vol. 13, no. 7, pp. 688-703, July 2003.
38. T. Stockhammer, D. Kontopodis and T. Wiegand, “Rate-distortion optimization for H.26L video coding in packet loss environment”, Proc. Packet Video Workshop 2002, Pittsburgh, PA, April 2002.
39. K. R. Rao and J. J. Hwang, “Techniques and standards for digital image/video/audio coding”, Englewood Cliffs, NJ: Prentice Hall, 1996.
40. Open source article, “Blu-ray discs”: http://www.blu-ray.com/info/
41. Open source article, “Coding of moving pictures and audio”: http://mpeg.chiariglione.org/standards/mpeg-2/mpeg-2.htm
42. Open source article, “Studio encoding parameters of digital television for standard 4:3 and wide screen 16:9 aspect ratios”: http://www.itu.int/rec/R-REC-BT.601/
43. Integrated Performance Primitives, Intel: http://software.intel.com/en-us/articles/intel-ipp/#support, 2009.
44. T. Purushotham, “Low complexity H.264 encoder using machine learning”, M.S. Thesis, EE Dept., UTA, 2010.
45. S. Muniyappa, “Implementation of complexity reduction algorithm for intra mode selection in H.264/AVC”, M.S. Thesis, EE Dept., UTA, 2011.
46. R. Su, G. Liu and T. Zhang, “Fast mode decision algorithm for intra prediction in H.264/AVC with integer transform and adaptive threshold”, Signal, Image and Video Processing, vol. 1, no. 1, pp. 11-27, Apr. 2007.
47. D. Kim, K. Han and Y. Lee, “Adaptive single-multiple prediction for H.264/AVC intra coding”, IEEE Trans. on Circuits and Systems for Video Technology, vol. 20, no. 4, pp. 610-615, April 2010.
48. G. J. Sullivan, “The H.264/MPEG-4 AVC video coding standard and its deployment status”, Proc. SPIE Conference on Visual Communications and Image Processing (VCIP), Beijing, China, July 2005.
49. D. Marpe, T. Wiegand and G. Sullivan, “The H.264/MPEG-4 advanced video coding standard and its applications”, IEEE Communications Magazine, vol. 44, no. 8, pp. 134-143, Aug. 2006.
50. T. Wiegand and G. Sullivan, “The picturephone is here. Really”, IEEE Spectrum, vol. 48, pp. 50-54, Sept. 2011.
Thank You !!
SSIM
 The difference with respect to the other techniques mentioned previously, such as MSE or PSNR, is that those approaches estimate perceived errors, whereas SSIM considers image degradation as a perceived change in structural information. Structural information is the idea that pixels have strong inter-dependencies, especially when they are spatially close. These dependencies carry important information about the structure of the objects in the visual scene.
 The SSIM metric is calculated on various windows of an image. The measure between two windows x and y of common size N×N is:
SSIM(x, y) = ((2 μx μy + C1)(2 σxy + C2)) / ((μx² + μy² + C1)(σx² + σy² + C2))
with
μx the average of x; μy the average of y;
σx² the variance of x; σy² the variance of y;
σxy the covariance of x and y;
C1 and C2 two variables that stabilize the division when the denominator is weak.
 To evaluate image quality, this formula is applied only on luma. The resulting SSIM index is a decimal value between −1 and 1, and the value 1 is only reachable for two identical sets of data. Typically it is calculated on window sizes of 8×8. The window can be displaced pixel-by-pixel over the image, but the authors propose to use only a subgroup of the possible windows to reduce the complexity of the calculation.
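A direct plain-Python transcription of the SSIM window formula (a sketch, not the reference implementation; C1 and C2 follow the common choice k1 = 0.01, k2 = 0.03 with dynamic range L = 255):

```python
def ssim(x, y, c1=(0.01 * 255) ** 2, c2=(0.03 * 255) ** 2):
    """SSIM between two equal-size windows given as flat lists of luma values."""
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((v - mu_x) ** 2 for v in x) / n
    var_y = sum((v - mu_y) ** 2 for v in y) / n
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    return (((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2))
            / ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)))
```

Identical windows give an index of 1, matching the property stated above.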
RDO
Rate–distortion optimization (RDO) is a method of improving video quality in video
compression. The name refers to the optimization of the amount of distortion (loss of video quality)
against the amount of data required to encode the video, the rate. While it is primarily used by video
encoders, rate-distortion optimization can be used to improve quality in any encoding situation
(image, video, audio, or otherwise) where decisions have to be made that affect both file size and
quality simultaneously.
 Rate–distortion optimization solves the aforementioned problem by acting as a video quality
metric, measuring both the deviation from the source material and the bit cost for each possible
decision outcome. The bits are mathematically measured by multiplying the bit cost by the
Lagrangian, a value representing the relationship between bit cost and quality for a particular
quality level. The deviation from the source is usually measured as the mean squared error, in order
to maximize the PSNR video quality metric.
 Calculating the bit cost is made more difficult by the entropy encoders in modern video codecs,
requiring the rate-distortion optimization algorithm to pass each block of video to be tested to the
entropy coder to measure its actual bit cost. In MPEG codecs, the full process consists of a discrete
cosine transform, followed by quantization and entropy encoding. Because of this, rate-distortion
optimization is much slower than most other block-matching metrics, such as the simple sum of
absolute differences (SAD) and sum of absolute transformed differences (SATD). As such it is
usually used only for the final steps of the motion estimation process, such as deciding between
different partition types in H.264/AVC.
PSNR
The PSNR is most commonly used as a measure of the quality of reconstruction by lossy compression codecs (e.g., for image compression). The signal in this case is the original data, and the noise is the error introduced by compression. When comparing compression codecs it is used as an approximation to human perception of reconstruction quality; therefore, in some cases one reconstruction may appear closer to the original than another even though it has a lower PSNR (a higher PSNR would normally indicate a higher-quality reconstruction). One has to be extremely careful with the range of validity of this metric; it is only conclusively valid when used to compare results from the same codec (or codec type) and the same content.
PSNR is most easily defined via the mean squared error (MSE), which for two m×n monochrome images I and K, where one image is considered a noisy approximation of the other, is defined as:
MSE = (1/(m·n)) Σᵢ Σⱼ [I(i,j) − K(i,j)]²
The PSNR is then defined as:
PSNR = 10 log10(MAX_I² / MSE) = 20 log10(MAX_I / √MSE)
Here, MAX_I is the maximum possible pixel value of the image. When the pixels are represented using 8 bits per sample, this is 255. More generally, when samples are represented using linear PCM with B bits per sample, MAX_I is 2^B − 1. For color images with three RGB values per pixel, the definition of PSNR is the same except that the MSE is the sum over all squared value differences divided by the image size and by three. Alternatively, for color images the image is converted to a different color space and PSNR is reported against each channel of that color space, e.g., YCbCr or HSL.
Typical values for the PSNR in lossy image and video compression are between 30 and 50 dB, where higher is better. Acceptable values for wireless transmission quality loss are considered to be about 20 dB to 25 dB.
When the two images are identical, the MSE is zero; for this value the PSNR is undefined (division by zero).
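The MSE/PSNR definitions above in a short sketch (flat pixel lists, 8-bit samples by default):

```python
import math

def psnr(original, reconstructed, max_val=255):
    """PSNR in dB between two equal-size images given as flat pixel lists."""
    mse = (sum((a - b) ** 2 for a, b in zip(original, reconstructed))
           / len(original))
    if mse == 0:
        return float("inf")  # identical images: MSE is zero, PSNR unbounded
    return 10 * math.log10(max_val ** 2 / mse)
```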
Bit rate
 In telecommunications and computing, bit rate (sometimes written bitrate, data rate, or as a variable R) is the number of bits that are conveyed or processed per unit of time.
 In digital multimedia, bitrate represents the amount of information, or detail, that is stored per unit of time of a recording. The bitrate depends on several factors:
 The original material may be sampled at different frequencies
 The samples may use different numbers of bits
 The data may be encoded by different schemes
 The information may be digitally compressed by different algorithms or to different degrees
 Generally, choices about the above factors are made to achieve the desired trade-off between minimizing the bitrate and maximizing the quality of the material when it is played.
 If lossy data compression is used on audio or visual data, differences from the original signal will be introduced; if the compression is substantial, or lossy data is decompressed and recompressed, this may become noticeable in the form of compression artifacts. Whether these affect the perceived quality, and if so how much, depends on the compression scheme, encoder power, the characteristics of the input data, the listener's perceptions, the listener's familiarity with artifacts, and the listening or viewing environment.
 The most computationally expensive process in H.264 is motion estimation.
 For example, assuming full search (FS) and P block types, Q reference frames and a search range of M×N, M×N×P×Q computations are needed.
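The operation count for exhaustive full search is simply the product of the four factors; a one-line sketch (the numbers in the example are hypothetical):

```python
def full_search_operations(m, n, p, q):
    # M x N search positions, P block types, Q reference frames
    return m * n * p * q

ops = full_search_operations(32, 32, 7, 5)  # e.g. 32x32 range, 7 types, 5 refs
```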