basics-of-video-compression(01/27/2016)

Video Compression
Evolution of video coding standards
Outline
• Need for Video Compression
• Application Scenarios
• Fundamentals of Video Coding
• Redundancy Removal Techniques
• Compression Artifacts
• Encoding and Decoding Process Flow
• Video Coding Standards
• Related Work
Digital Video
• A sequence of digital video frames
[Figure: four consecutive frames, Frame 1 to Frame 4]
• Video compression is the process of reducing the amount of data
required to represent a digital video signal, prior to transmission or
storage
• Coding techniques can be lossy or lossless
Need for Video Compression
• Uncompressed 1080p (progressive) high definition (HD) video at 24 frames/second
⁻ Pixels per frame : 1920 x 1080
⁻ Bits per pixel : 8 x 3 (RGB)
⁻ Bitrate : 1.2 Gbit/s
⁻ Size of 1.5 hours of video : 806 GB
• Blu-ray Disc
⁻ Capacity : 25 GB (gigabytes) for single layer.
⁻ Read rate : 36 Mbit/s
• Video Streaming or TV Broadcast
⁻ 1 Mbit/s to 20 Mbit/s
• Requires compression on the order of 20x to 1000x
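The figures on this slide can be re-derived with a few lines of arithmetic. The sketch below is only a check of the numbers quoted above; the variable names and the choice of target bitrates (the 36 Mbit/s disc read rate and the 1 to 20 Mbit/s streaming range) are taken from the slide, and decimal units (1 GB = 10^9 bytes) are assumed.

```python
# Raw video parameters quoted on the slide.
WIDTH, HEIGHT = 1920, 1080
BITS_PER_PIXEL = 8 * 3          # 8 bits for each of R, G, B
FPS = 24
DURATION_S = 1.5 * 3600         # 1.5 hours in seconds

bits_per_second = WIDTH * HEIGHT * BITS_PER_PIXEL * FPS
total_bytes = bits_per_second * DURATION_S / 8


def compression_ratio(target_bps):
    """How many times smaller the stream must become to fit target_bps."""
    return bits_per_second / target_bps


print(f"raw bitrate : {bits_per_second / 1e9:.2f} Gbit/s")
print(f"1.5 h size  : {total_bytes / 1e9:.0f} GB")
for target in (36e6, 20e6, 1e6):
    print(f"{target / 1e6:>4.0f} Mbit/s needs about {compression_ratio(target):.0f}x")
```

The ratios land between roughly 33x (disc read rate) and roughly 1200x (1 Mbit/s streaming), which is consistent with the range stated on the slide.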
Application Scenarios
Digital television broadcasting
Internet Video Streaming
Mobile Video Streaming
DVD video
Video Calling
Fundamentals of Video Coding
YCbCr Color Space
• YCbCr is the digital form of the YUV color space. The RGB to YCbCr conversion for an image with 8 bits
per color component is given below.
$$
\begin{bmatrix} Y \\ C_b \\ C_r \end{bmatrix} =
\begin{bmatrix} 0.299 & 0.587 & 0.114 \\ -0.169 & -0.331 & 0.500 \\ 0.500 & -0.419 & -0.081 \end{bmatrix}
\begin{bmatrix} R \\ G \\ B \end{bmatrix} +
\begin{bmatrix} 0 \\ 128 \\ 128 \end{bmatrix}
$$
• Has luminance (Y) and color difference or chrominance (Cr, Cb) components
• Widely used in image and video compression schemes
• Chroma subsampling example – 4:2:0 format
[Figure: the Y plane is W x H samples; the Cb and Cr planes are W/2 x H/2 samples each]
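A minimal sketch of the colour conversion and 4:2:0 subsampling described on this slide. The coefficients are the commonly quoted rounded full-range values, which is an assumption about the matrix the slide intends; the function names and the 2x2 averaging used for subsampling are illustrative choices, not something the slide prescribes.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to 8-bit YCbCr (assumed full-range matrix)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.169 * r - 0.331 * g + 0.500 * b + 128
    cr = 0.500 * r - 0.419 * g - 0.081 * b + 128

    def clip(v):
        return max(0, min(255, round(v)))

    return clip(y), clip(cb), clip(cr)


def subsample_420(plane):
    """Halve a chroma plane in both directions by averaging each 2x2 group.

    Assumes the plane has even width and height.
    """
    h, w = len(plane), len(plane[0])
    return [[(plane[y][x] + plane[y][x + 1] +
              plane[y + 1][x] + plane[y + 1][x + 1] + 2) // 4
             for x in range(0, w, 2)]
            for y in range(0, h, 2)]


print(rgb_to_ycbcr(255, 255, 255))   # a grey-scale pixel has neutral chroma
chroma = [[10, 20, 30, 40],
          [10, 20, 30, 40],
          [50, 60, 70, 80],
          [50, 60, 70, 80]]
print(subsample_420(chroma))
```

With 4:2:0 the two chroma planes together hold W x H / 2 samples, so a frame needs 1.5 samples per pixel instead of 3, halving the data before any further coding.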
Redundancies in Video Sequences
• Compression can be achieved by exploiting various redundancies
in video sequences
• Types of redundancies:
- Spatial redundancy
- Perceptual redundancy
- Statistical redundancy
- Temporal redundancy
Spatial Redundancy Removal
1. Intra Prediction
- Blocks are predicted using neighboring pixels reconstructed from the
same picture
- Prediction block is subtracted from the current block prior to encoding
[Figure: the current block to be coded is predicted horizontally from previously coded pixels in the current frame; the predicted block is subtracted from the current block and the difference is encoded]
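The horizontal prediction illustrated above can be sketched as follows. This is a simplified illustration under the assumption that only the reconstructed column to the left of the block is used; the function names and the 4x4 sample values are hypothetical.

```python
def predict_horizontal(left_column, size):
    """Each row of the prediction repeats the reconstructed pixel to its left."""
    return [[left_column[r]] * size for r in range(size)]


def residual(current, predicted):
    """Difference block that is passed on to transform and entropy coding."""
    return [[c - p for c, p in zip(crow, prow)]
            for crow, prow in zip(current, predicted)]


current = [[100, 101, 102, 103],
           [110, 110, 111, 112],
           [120, 121, 121, 122],
           [130, 130, 131, 131]]
left = [100, 110, 120, 130]      # previously coded pixels left of the block

pred = predict_horizontal(left, 4)
res = residual(current, pred)
for row in res:
    print(row)
```

The residual values are much smaller than the original pixel values, which is what makes the block cheaper to encode.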
Spatial Redundancy Removal
2. Block Transforms
• Convert spatial variations within a block to frequency variations without
changing the data
• Typically matrix operations, invertible
• Used for energy compaction in the block (DCT is used)
Forward 2D-DCT
$$
F(u,v) = \frac{2}{N}\, C(u)\, C(v) \sum_{x=0}^{N-1} \sum_{y=0}^{N-1} f(x,y)\,
\cos\left[\frac{(2x+1)u\pi}{2N}\right] \cos\left[\frac{(2y+1)v\pi}{2N}\right]
$$
for $u = 0, \ldots, N-1$ and $v = 0, \ldots, N-1$, where $N = 8$ and
$$
C(k) = \begin{cases} 1/\sqrt{2} & \text{for } k = 0 \\ 1 & \text{otherwise} \end{cases}
$$
Inverse 2D-DCT
$$
f(x,y) = \frac{2}{N} \sum_{u=0}^{N-1} \sum_{v=0}^{N-1} C(u)\, C(v)\, F(u,v)\,
\cos\left[\frac{(2x+1)u\pi}{2N}\right] \cos\left[\frac{(2y+1)v\pi}{2N}\right]
$$
for $x = 0, \ldots, N-1$ and $y = 0, \ldots, N-1$, where $N = 8$.
f(x,y) is the value of each pixel in the selected 8×8 block, and F(u,v) is the DCT coefficient after
transformation. The transform of an 8×8 block is also an 8×8 block, composed of the coefficients F(u,v).
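The forward and inverse formulas translate directly into code. The sketch below is a deliberately naive, unoptimised implementation of the 8×8 2D-DCT with the 2/N scaling and C(k) factor defined above; the helper names are hypothetical, and real codecs use fast or integer approximations instead.

```python
import math

N = 8  # block size used on the slide


def c(k):
    """Normalisation factor C(k)."""
    return 1 / math.sqrt(2) if k == 0 else 1.0


def basis(i, k):
    """cos[(2i + 1) k pi / 2N], shared by the forward and inverse transforms."""
    return math.cos((2 * i + 1) * k * math.pi / (2 * N))


def dct2(f):
    """Forward 2D-DCT of an N x N block f[x][y]; returns F[u][v]."""
    return [[(2 / N) * c(u) * c(v) *
             sum(f[x][y] * basis(x, u) * basis(y, v)
                 for x in range(N) for y in range(N))
             for v in range(N)] for u in range(N)]


def idct2(F):
    """Inverse 2D-DCT; returns the pixel block f[x][y]."""
    return [[(2 / N) *
             sum(c(u) * c(v) * F[u][v] * basis(x, u) * basis(y, v)
                 for u in range(N) for v in range(N))
             for y in range(N)] for x in range(N)]


flat = [[128] * N for _ in range(N)]
coeffs = dct2(flat)
restored = idct2(coeffs)
print(round(coeffs[0][0], 3))    # DC coefficient of a flat block
```

For a flat block all energy ends up in the DC coefficient F(0,0) and every other coefficient is zero up to rounding error, which illustrates the energy compaction property mentioned earlier.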
Contd..
[Figure: an 8x8 block of pixels (8 bits per pixel, 0-255 levels) is passed through the 8x8 2D Discrete Cosine Transform (DCT) to give an 8x8 block of transformed coefficients]
Perceptual Redundancy Removal
• From a perceptual point of view, not all video data is equally significant
• The human visual system is more sensitive to low-frequency information than to high-frequency
information
• Quantization is a good tool for removing perceptual redundancy
- It is not invertible and introduces distortion
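A minimal sketch of why quantization is not invertible. It assumes a uniform scalar quantizer with a single step size of 16, which is an arbitrary illustrative value; practical codecs use frequency-dependent step sizes so that high frequencies are quantized more coarsely.

```python
def quantize(coeffs, step):
    """Map each coefficient to an integer level (this is where information is lost)."""
    return [[round(c / step) for c in row] for row in coeffs]


def dequantize(levels, step):
    """Reconstruct approximate coefficients from the integer levels."""
    return [[level * step for level in row] for row in levels]


coeffs = [[1024.0, 37.2],
          [-5.1, 3.0]]           # hypothetical transform coefficients
levels = quantize(coeffs, 16)
recon = dequantize(levels, 16)
print(levels)
print(recon)
```

The small coefficients collapse to zero and cannot be recovered, while the large low-frequency coefficient survives almost unchanged.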
Statistical Redundancy Removal
• Not all pixel values in an image (or transformed image) occur with equal probability
• Entropy coding can be used to exploit statistical redundancy (example: variable length coding)
- Shorter code words are used to represent more frequent values
- Longer code words are used to represent less frequent values
- It is a lossless compression technique
- Entropy (H) is the theoretical minimum average bit rate, in bits per symbol, at which the samples can be coded, and is given by
$$
H = -\sum_{i=1}^{N} P(a_i) \log_2 P(a_i)
$$
where N is the number of symbols and P(a_i) is the probability of symbol a_i
• Various coding techniques:
- CABAC – Context Adaptive Binary Arithmetic Coding
- CAVLC – Context Adaptive Variable Length Coding
- Huffman Coding
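The entropy formula and the idea of shorter codes for more frequent values can be checked on a toy example. The sketch below computes the entropy of a symbol sequence and the code word lengths produced by a textbook Huffman construction; the function names and the sample string are hypothetical, and the context-adaptive schemes (CAVLC, CABAC) listed above are considerably more elaborate.

```python
import heapq
import math
from collections import Counter


def entropy(symbols):
    """H = -sum P(a_i) log2 P(a_i), in bits per symbol."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def huffman_code_lengths(symbols):
    """Return {symbol: code length in bits} for a Huffman code."""
    counts = Counter(symbols)
    # Heap entries: (weight, unique tie-breaker, {symbol: depth so far}).
    heap = [(n, i, {s: 0}) for i, (s, n) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        n1, _, a = heapq.heappop(heap)
        n2, _, b = heapq.heappop(heap)
        merged = {s: depth + 1 for s, depth in {**a, **b}.items()}
        heapq.heappush(heap, (n1 + n2, tie, merged))
        tie += 1
    return heap[0][2]


data = "aaaabbcd"
lengths = huffman_code_lengths(data)
average = sum(lengths[s] for s in data) / len(data)
print(entropy(data), lengths, average)
```

For this particular sequence the probabilities are powers of 1/2, so the average Huffman code length equals the entropy; in general it is at least the entropy.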
Temporal Redundancy Removal
• Inter Prediction
- Adjacent picture(s) are used as references to predict the current block of a frame
1. Frame difference coding
- Difference can be encoded using DCT + Quantization + Entropy coding
[Figure: Frame 1, Frame 2, and the difference image Frame 1 - Frame 2]
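Frame difference coding can be illustrated on two tiny frames. This is only a sketch with hypothetical 3x3 sample values; it computes the difference as current minus reference, so that the decoder can rebuild the current frame by adding the difference back to the reference.

```python
def frame_difference(current, reference):
    """Pixel-wise difference that would be transformed, quantized and entropy coded."""
    return [[c - r for c, r in zip(crow, rrow)]
            for crow, rrow in zip(current, reference)]


def reconstruct(reference, diff):
    """Decoder side: add the difference back onto the reference frame."""
    return [[r + d for r, d in zip(rrow, drow)]
            for rrow, drow in zip(reference, diff)]


frame1 = [[50, 50, 50], [50, 200, 50], [50, 50, 50]]
frame2 = [[50, 50, 50], [50, 205, 50], [50, 50, 52]]

diff = frame_difference(frame2, frame1)
print(diff)
```

When little changes between frames, most difference values are zero, which the later transform and entropy coding stages compress very well.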
Temporal Redundancy Removal
2. Inter prediction using Motion Compensated Prediction
• Divide the frame into blocks
• For each block, find the relative motion between
the current block and a matching block of the same
size in the reference frame using a Block Matching
Algorithm*
• The displacement between the current block and the
best matching block is the Motion Vector (MV); the
process of determining it is called Motion
Estimation.
• The best matching block is used in place of the current
block to form the Motion Compensated Prediction
• Transmit the motion vector(s) for each block
* M. Jakubowski and G. Pastuszak, “Block-based motion
estimation algorithms – a survey”, Opto-Electronics
Review, Rev. 21, No. 1, pp. 86-102, 2013.
Contd..
• The dissimilarity D(s,t) (sometimes referred to as error, distortion, or
distance) between two images Ψn and Ψn-1 is defined as follows
$$
D(s,t) = \sum_{y=1}^{p} \sum_{x=1}^{q} M\left[\Psi_n(x,y),\, \Psi_{n-1}(x+V_x,\, y+V_y)\right]
$$
where the sums run over the q×p pixels of the block, (s,t) = (Vx, Vy) is the candidate displacement, and
M(u,v) is a metric that measures the dissimilarity between its two arguments u and v
[here u = Ψn(x,y) and v = Ψn-1(x+Vx, y+Vy)]
• There are several types of matching criteria; the two most frequently
used are MSE and MAD, which are defined as follows:
1) Mean square error (MSE): $M(u,v) = (u - v)^2$
2) Mean absolute difference (MAD): $M(u,v) = |u - v|$
• A study based on experimental work reported that the choice of matching
criterion does not significantly affect the search. Hence, MAD is
preferred due to its simplicity of implementation.
Contd..
Given a macroblock Bm in the anchor frame, motion estimation determines a matching block Bm’
in the target frame such that the error D(s,t) between the two blocks is minimized. The most
straightforward method is the exhaustive block-matching algorithm (EBMA).
$$
MV = (V_x, V_y) = \arg\min_{(V_x, V_y) \in S} \sum_{(x,y) \in MB} M\left[\Psi_n(x,y),\, \Psi_{n-1}(x+V_x,\, y+V_y)\right]
$$
where S is the search region, MB is the set of pixel positions in the macroblock, and MV
is the motion vector that minimizes the distance.
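The exhaustive search can be sketched in a few lines. The code below is an illustrative, unoptimised version that uses the MAD criterion, a square search region of ±search_range pixels, and skips candidates that fall outside the reference frame; the function name, the parameters and the synthetic test frames are all hypothetical.

```python
def ebma(cur, ref, bx, by, size, search_range):
    """Exhaustive block matching with the MAD criterion.

    cur, ref     : current and reference frames as lists of rows
    (bx, by)     : top-left corner of the current block
    Returns ((vx, vy), cost) for the best match inside the search region.
    """
    h, w = len(ref), len(ref[0])
    best_mv, best_cost = None, None
    for vy in range(-search_range, search_range + 1):
        for vx in range(-search_range, search_range + 1):
            x0, y0 = bx + vx, by + vy
            if x0 < 0 or y0 < 0 or x0 + size > w or y0 + size > h:
                continue                      # candidate leaves the frame
            cost = sum(abs(cur[by + j][bx + i] - ref[y0 + j][x0 + i])
                       for j in range(size) for i in range(size)) / size ** 2
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (vx, vy), cost
    return best_mv, best_cost


# Synthetic frames: the current frame shows the reference content
# displaced by 2 pixels horizontally and 1 pixel vertically.
ref = [[10 * y + x for x in range(8)] for y in range(8)]
cur = [[ref[min(y + 1, 7)][min(x + 2, 7)] for x in range(8)] for y in range(8)]

mv, cost = ebma(cur, ref, bx=2, by=2, size=2, search_range=2)
print(mv, cost)
```

The cost of the search grows with the square of the search range, which is why the fast block-matching algorithms surveyed in the reference above exist.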
Video Coding Picture Types
• Three types:
I-frames - least compressible, but do not require other video frames to decode
P-frames - use data from previous frames to decompress, and are more
compressible than I-frames
B-frames - use both previous and following frames as references, and achieve the
highest amount of data compression
• Typical Group Of Pictures (GOP) structure is IBBPBBP…
I : Intra coded frame
P : Predicted coded frame
B : Bi-directionally predicted frame
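Because a B-frame references a later frame, frames cannot be coded in display order. The sketch below derives a coding order from a display-order GOP pattern, assuming the classic arrangement in which each B-frame depends on the nearest I- or P-frame on either side; the function name is hypothetical and newer standards allow more flexible reference structures.

```python
def coding_order(display_types):
    """Return display indices in the order the frames must be coded.

    Each I- or P-frame (anchor) is emitted before the B-frames that
    precede it in display order, since those B-frames reference it.
    """
    order, pending_b = [], []
    for index, frame_type in enumerate(display_types):
        if frame_type == "B":
            pending_b.append(index)
        else:
            order.append(index)        # anchor first
            order.extend(pending_b)    # then the B-frames that needed it
            pending_b.clear()
    order.extend(pending_b)            # trailing B-frames wait for the next GOP
    return order


gop = "IBBPBBP"
order = coding_order(gop)
print([f"{gop[i]}{i}" for i in order])
```

This reordering is the reason B-frames add decoding delay and extra frame buffers compared with streams that use only I- and P-frames.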
Sample Prediction Blocks in a Frame
Compression Artifacts
• Noticeable distortion of video caused by the application of lossy data
compression techniques
• Types of artifacts in the context of predictive coding
• Blocking artifacts
• Occur due to discontinuities at the boundaries of adjacent blocks in a
reconstructed picture.
• Induced by two different sources: block-wise prediction and block transform coding.
• Can be mitigated using a de-blocking filter.
• Ringing artifacts
• Occur in the context of transformation and quantization, due to the loss of high
frequencies.
• Blurring artifacts
• Occur due to loss of spatial detail in regions of moderate to high spatial activity,
such as roughly textured areas or around the edges of scene objects.
[Figure: original image compared with examples of blocking artifacts, ringing artifacts, and a blurred image]
Typical Hybrid Block-Based Encoder
VLC – Variable length coding
DCT – Discrete cosine transform
IDCT – Inverse discrete cosine transform
Typical Decoder Process Flow
VLC – Variable length coding
IDCT – Inverse discrete cosine transform
Video Coding Standards
• Support multiple use cases and profiles
• Approximately 2x improvement in compression every decade
• Standards:
• H.265/HEVC – High Efficiency Video Coding
• H.264/AVC – Advanced Video Coding
• MPEG-2
Block diagram of MPEG-2 encoder
Block diagram of MPEG-2 decoder
MPEG-2 intra and inter quantization matrices
• These matrices reflect the characteristics of the Human Visual System (HVS).
Scanning types
[Figure: two coefficient scanning patterns, one used mainly in frame-based coding and one used mainly in field-based coding]
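As an illustration of a coefficient scan, the sketch below generates the classic zigzag order, which visits coefficients diagonal by diagonal from low to high frequency. It assumes that the frame-based pattern in the figure is this zigzag scan; the function name is hypothetical, and the field-based pattern is not reproduced here.

```python
def zigzag_order(n=8):
    """Return (row, col) positions of an n x n block in zigzag scan order.

    Positions are grouped by anti-diagonal (row + col). Odd diagonals are
    walked with the row increasing, even diagonals with the column increasing.
    """
    def key(position):
        row, col = position
        diagonal = row + col
        return (diagonal, row if diagonal % 2 else col)

    return sorted(((r, c) for r in range(n) for c in range(n)), key=key)


scan = zigzag_order(8)
print(scan[:6])
```

Scanning in this order tends to place the non-zero low-frequency coefficients first and the zeros produced by quantization last, which suits run-length and variable length coding.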
Improvements of H.264 over previous standards
• Prediction
• Intra prediction using neighboring samples
• Temporal prediction using multiple frames
• Motion compensation on variable block sizes, quarter pixel resolution.
• Transform
• 4x4 or 8x8 integer transform, 2x2 or 4x4 Secondary Hadamard.
• Quantization
• Finer quantization supported
• Entropy coding
• Context Adaptive Variable Length Coding (CAVLC) and Arithmetic Coding (CABAC)
• In-loop de-blocking filter
Improved features in HEVC compared to its
predecessor H.264
• Enhanced hybrid spatial-temporal prediction model
- Flexible partitioning: introduces Coding Tree Units (CTUs), which are divided into Coding, Prediction and
Transform Units (CU, PU, TU)
• CTUs support a larger block structure (64x64) with more variable sub-partition
structures (32x32, 16x16).
• Supports high-resolution video up to 8K UHD (8192x4320), while H.264 supports up
to 4K UHD (4096x2160).
• 33 directional modes for intra prediction, apart from DC and planar
(H.264 has 8 directional modes).
• Entropy coding using only CABAC
• In-loop filtering having de-blocking and Sample Adaptive Offset filters
• Superior parallel processing architecture
Contd..
• 7-tap or 8-tap filters for fractional sample interpolation (up to quarter-sample precision), whereas H.264 uses a 6-tap filter for half-sample precision
and linear interpolation for quarter-sample precision.
• Supports tiles for parallel processing.
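A one-dimensional sketch of fractional sample interpolation with 8-tap and 7-tap filters. The tap values are the luma coefficients as commonly listed for HEVC and are stated here as an assumption, since the coefficient tables themselves are not reproduced in this text; the function name, the edge handling and the single-stage rounding for 8-bit samples are simplifications.

```python
# Assumed HEVC luma taps; each set sums to 64.
HALF_SAMPLE_TAPS = (-1, 4, -11, 40, 40, -11, 4, -1)    # 8-tap, half-sample position
QUARTER_SAMPLE_TAPS = (-1, 4, -10, 58, 17, -5, 1)      # 7-tap, quarter-sample position


def interpolate(row, pos, taps):
    """Fractional sample just to the right of row[pos].

    The filter window starts three samples to the left of pos; samples
    beyond the ends of the row are replaced by the nearest edge sample.
    """
    acc = 0
    for k, tap in enumerate(taps):
        index = min(max(pos - 3 + k, 0), len(row) - 1)
        acc += tap * row[index]
    return min(255, max(0, (acc + 32) >> 6))            # round, divide by 64, clip


ramp = list(range(0, 100, 10))                          # 0, 10, ..., 90
half = interpolate(ramp, 4, HALF_SAMPLE_TAPS)
quarter = interpolate(ramp, 4, QUARTER_SAMPLE_TAPS)
print(half, quarter)
```

On a linear ramp the results fall between the two neighbouring integer samples (40 and 50), with the quarter-sample value closer to the left neighbour, as expected.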
Block diagram of H.264 encoder
NAL – Network abstraction layer
MC – Motion compensation
ME- Motion estimation
T – Transform
Q – Quantization
Q-1 – Inverse quantization
T-1 - Inverse transform
Block diagram of H.264 decoder
NAL – Network abstraction layer
MC- Motion compensation
Q-1 – Inverse Quantization
T-1 - Inverse transform
Block diagram of HEVC encoder
Block diagram of HEVC decoder
Modes for intra-prediction in H.264
Division of picture into slices and tiles in HEVC
CTU : Coding Tree Unit
Modes and directional orientations for intra prediction in HEVC
Integer and fractional positions for luma interpolation
A – Full pixels
b – Half-pel interpolated pixels
a,c – Quarter-pel interpolated pixels
Filter coefficients for luma and chroma fractional sample interpolation in HEVC
Filter coefficients for luma fractional sample interpolation
Filter coefficients for chroma fractional sample interpolation
References
• I.E.G. Richardson, “Video Codec Design: Developing Image and Video Compression Systems”, Wiley, 2002.
• I.E.G. Richardson, “The H.264 advanced video compression standard”, 2nd Edition, Hoboken, NJ, Wiley, 2010.
• K. Sayood, “Introduction to Data compression”, Third Edition, Morgan Kaufmann Series in Multimedia Information and Systems, San Francisco, CA, 2005.
• V. Sze and M. Budagavi, “Design and Implementation of Next Generation Video Coding Systems (H.265/HEVC Tutorial)”, IEEE International Symposium on Circuits and Systems (ISCAS), Melbourne, Australia, June 2014.
• V. Sze, M. Budagavi and G.J. Sullivan (Editors), “High Efficiency Video Coding (HEVC): Algorithms and Architectures”, Springer, 2014.
• G. J. Sullivan et al, “Overview of the High Efficiency Video Coding (HEVC) Standard”, IEEE Trans. on Circuits and Systems for Video Technology, Vol. 22, No. 12, pp. 1649-1668, Dec. 2012.
• G. J. Sullivan et al, “Standardized Extensions of High Efficiency Video Coding (HEVC)”, IEEE Journal of Selected Topics in Signal Processing, Vol. 7, pp. 1001-1016, Dec. 2013.
• K.R. Rao, D.N. Kim and J.J. Hwang, “Video Coding Standards: AVS China, H.264/MPEG-4 Part 10, HEVC, VP6, DIRAC and VC-1”, Springer, 2014.
• D. Grois, B. Bross and D. Marpe, “HEVC/H.265 Video Coding Standard (Version 2) including the Range Extensions, Scalable Extensions, and Multiview Extensions” (Tutorial, Sunday 27 Sept. 2015, 9:00 am to 12:30 pm), IEEE ICIP, Quebec City, Canada, 27 – 30 Sept. 2015.
The tutorial below is for personal use only [Password: a2FazmgNK ]
https://datacloud.hhi.fraunhofer.de/owncloud/public.php?service=files&t=8edc97d26d46d4458a9c1a17964bf881
• Links to YouTube videos on the tutorial “HEVC/H.265 Video Coding Standard including the Range Extensions, Scalable Extensions and Multiview Extensions”:
https://www.youtube.com/watch?v=TLNkK5C1KN8
• HEVC tutorial by I.E.G. Richardson: http://www.vcodex.com/h265.html
• “Special issue on HEVC extensions and efficient HEVC implementations”, IEEE Trans. on Circuits and Systems for Video Technology, Vol. 26, pp. 1-249, Jan. 2016.
• K.R. Rao and J.J. Hwang, “Techniques and standards for image/video/audio coding”, Prentice Hall, 1996.
Contd..
• Video lectures from IITs and IISC: http://nptel.iitm.ac.in/
• Image and video processing courses at UT Arlington (EE 5351, EE 5355, EE 5356 and EE 5359) :
http://www.uta.edu/faculty/krrao/dip/
• HEVC chapter 1 :http://www.uta.edu/faculty/krrao/dip/Courses/EE5359/HEVCCH1a_updated.doc
• Online course on fundamentals of digital image and video processing from Coursera:
https://www.coursera.org/course/digital
• Access to HM 16.0 Software Manual:
http://iphome.hhi.de/marpe/download/Performance_HEVC_VP9_X264_PCS_2013_preprint.pdf
• Test Sequences: ftp://ftp.kw.bbc.co.uk/hevc/hm-11.0-anchors/bitstreams/
• HEVC white paper-Ittiam Systems: http://www.ittiam.com/Downloads/en/documentation.aspx
• HEVC white paper-Elemental Technologies: http://www.elementaltechnologies.com/lp/hevch265-demystified-white-paper
• Access to HM 16.0 Reference Software: http://hevc.hhi.fraunhofer.de/
Books for beginners
• A. Bovik (Ed), “Handbook of image and video processing”, Orlando, FL: Academic Press, 2000. II Edition,
2005. (Elsevier) www.books.elsevier.com/communications.
• M. Ghanbari, Video Coding: An Introduction to Standard Codecs, IEE Press, 1999.
• J.D. Gibson et al, "Digital Compression for Multimedia: Principles and Standards," San Francisco, CA: Morgan
Kaufmann, 1998.
• M. Ghanbari, “Standard codecs: Image compression to advanced video coding”, IEE, UK, 2003.
• B. Haskell, A. Puri and A. Netravali, Digital Video: An Introduction to MPEG-2, Chapman & Hall, 1996.
• K. Jack, “Video demystified: a handbook for the digital engineer”, Oxford, UK: Newnes/Elsevier, 2005.
• K.N. Ngan, T. Meier and D. Chai, “Advanced video coding: Principles and techniques”, Elsevier, Aug. 1999.
• A. Puri and T. Chen (Eds), Multimedia Systems, Standards and Networks, Marcel Dekker, 2000.
• W.B. Pennebaker, J.L. Mitchell, C. Fogg and D. Le Gall, MPEG Digital Video Compression Standard, Chapman
& Hall, 1997.
• D.S. Peter, “Video compression: Fundamental compression techniques and an overview of the JPEG and
MPEG compression systems”, McGraw-Hill, New York, NY: 1998.
Contd..
• C. Poynton, “Digital video and HDTV”, San Francisco, CA: Morgan Kaufmann, 2003.
• Y.Q. Shi and H. Sun, “Image and video compression for multimedia engineering: Fundamentals, algorithms
and standards,” Boca Raton, FL: CRC Press, 2000. II Edition, 2008. CD-ROM, Solution manual and
downloadable images in the text.
• J. Watkinson, “Digital compression in video and audio,” Focal Press. Ltd., U.K./Butterworth-Heinemann,
Boston, 1995.
• D.G. Duffy, “Advanced engineering mathematics with MATLAB”, Boca Raton, FL: CRC Press, 2011.
• M. Sonka, “Image processing, Analysis and Machine Vision”, 4th Edition, Boston, MA: Cengage learning, 2013.
(projects – MATLAB companion).
• L. Guan, Y. He and S.-Y. Kung (Ed), “Multimedia image and video processing”, II Edition, CRC Press, 2012.
• S . Jayaraman, S. Esakkirajan and T. Veerakumar, “Digital image processing”, McGraw Hill Education (India)
Private Ltd., New Delhi, 2009. (excellent book with examples, MATLAB codes etc)
• A. Bovik, “The Essential Guide to Video Processing”, 2nd Edition, Elsevier, 2009.
• W. Pearlman and A. Said, “Digital signal compression: Principles and practice”, Cambridge University Press,
2009.
Thank You!