Video Compression
Evolution of video coding standards

Outline
• Need for Video Compression
• Application Scenarios
• Fundamentals of Video Coding
• Redundancy Removal Techniques
• Compression Artifacts
• Encoding and Decoding Process Flow
• Video Coding Standards
• Related Work

Digital Video
• A sequence of digital video frames
  [Figure: Frame 1, Frame 2, Frame 3, Frame 4]
• Video compression is the process of reducing the amount of data required to represent a digital video signal prior to transmission or storage
• Coding techniques can be lossy or lossless

Need for Video Compression
• Uncompressed 1080p (progressive) high-definition (HD) video at 24 frames/second
  Pixels per frame: 1920 x 1080
  Bits per pixel: 8 x 3 (RGB)
  Bitrate: 1.2 Gbit/s (1920 x 1080 pixels x 24 bits x 24 frames/s)
  Size of 1.5 hours: 806 GB
• Blu-ray Disc
  - Capacity: 25 GB (gigabytes) for single layer
  - Read rate: 36 Mbit/s
• Video streaming or TV broadcast
  - 1 Mbit/s to 20 Mbit/s
• Requires compression on the order of 20x to 1000x

Application Scenarios
• Digital television broadcasting
• Internet video streaming
• Mobile video streaming
• DVD video
• Video calling

Fundamentals of Video Coding

YCbCr Color Space
• YCbCr is the digital form of the YUV color space. The RGB to YCbCr conversion for an image with 8 bits per component is given below.
\[
\begin{bmatrix} Y \\ C_b \\ C_r \end{bmatrix} =
\begin{bmatrix} 0.299 & 0.587 & 0.114 \\ -0.169 & -0.331 & 0.500 \\ 0.500 & -0.419 & -0.081 \end{bmatrix}
\begin{bmatrix} R \\ G \\ B \end{bmatrix} +
\begin{bmatrix} 0 \\ 128 \\ 128 \end{bmatrix}
\]
• Has a luminance (Y) component and two color difference, or chrominance, components (Cb, Cr)
• Widely used in image and video compression schemes
• Chroma subsampling example: 4:2:0 format
  [Figure: the Y plane is W x H; the Cb and Cr planes are each W/2 x H/2]

Redundancies in Video Sequences
• Compression can be achieved by exploiting various redundancies in video sequences
• Types of redundancies:
  - Spatial redundancy
  - Perceptual redundancy
  - Statistical redundancy
  - Temporal redundancy

Spatial Redundancy Removal 1.
Intra Prediction
- Blocks are predicted using neighboring pixels reconstructed from the same picture
- The prediction block is subtracted from the current block prior to encoding
  [Figure: horizontal intra prediction. Previously coded pixels next to the current block form the horizontally predicted block; the prediction is subtracted from the current block to be coded, and the difference is encoded.]

Spatial Redundancy Removal
2. Block Transforms
• Convert spatial variations within a block to frequency variations without losing information
• Typically matrix operations, invertible
• Used for energy compaction in the block (the DCT is used)

Forward 2D-DCT
\[
F(u,v) = \frac{2}{N}\, C(u)\, C(v) \sum_{x=0}^{N-1} \sum_{y=0}^{N-1} f(x,y)\,
\cos\frac{(2x+1)u\pi}{2N}\, \cos\frac{(2y+1)v\pi}{2N}
\]
for u = 0,..., N-1 and v = 0,..., N-1, where N = 8 and
\[
C(k) = \begin{cases} 1/\sqrt{2} & \text{for } k = 0 \\ 1 & \text{otherwise} \end{cases}
\]

Inverse 2D-DCT
\[
f(x,y) = \frac{2}{N} \sum_{u=0}^{N-1} \sum_{v=0}^{N-1} C(u)\, C(v)\, F(u,v)\,
\cos\frac{(2x+1)u\pi}{2N}\, \cos\frac{(2y+1)v\pi}{2N}
\]
for x = 0,..., N-1 and y = 0,..., N-1, where N = 8.

f(x,y) is the value of each pixel in the selected 8×8 block, and F(u,v) is the DCT coefficient after transformation. The transform of an 8×8 block is also an 8×8 block, composed of the coefficients F(u,v).
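The forward and inverse 2D-DCT formulas above can be sketched directly in code. This is a minimal, unoptimized illustration that follows the slide's definitions term by term (N = 8, C(0) = 1/√2); the function names dct2 and idct2 are hypothetical, and real codecs use fast separable or integer approximations rather than this four-level loop.

```python
import math

N = 8  # block size used in the slides


def c(k):
    """Normalization factor C(k) from the DCT definition."""
    return 1 / math.sqrt(2) if k == 0 else 1.0


def basis(i, k):
    """cos((2i + 1) * k * pi / (2N)), the 1-D cosine basis term."""
    return math.cos((2 * i + 1) * k * math.pi / (2 * N))


def dct2(block):
    """Forward 2D-DCT of an N x N block f(x, y), returning F(u, v)."""
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += block[x][y] * basis(x, u) * basis(y, v)
            out[u][v] = (2 / N) * c(u) * c(v) * s
    return out


def idct2(coeffs):
    """Inverse 2D-DCT of an N x N coefficient block F(u, v), returning f(x, y)."""
    out = [[0.0] * N for _ in range(N)]
    for x in range(N):
        for y in range(N):
            s = 0.0
            for u in range(N):
                for v in range(N):
                    s += c(u) * c(v) * coeffs[u][v] * basis(x, u) * basis(y, v)
            out[x][y] = (2 / N) * s
    return out


if __name__ == "__main__":
    flat = [[128] * N for _ in range(N)]  # a uniform gray block
    coeffs = dct2(flat)
    print(round(coeffs[0][0], 6))  # all energy lands in the DC coefficient
```

For a uniform block of value 128, the only non-zero coefficient is the DC term F(0,0) = 1024, which illustrates the energy compaction property; applying idct2 to the result recovers the original block up to floating-point error.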
[Figure: 8x8 2D Discrete Cosine Transform (DCT). An 8x8 block of pixels (8 bits per pixel, levels 0-255) is mapped to an 8x8 block of transformed coefficients.]

Perceptual Redundancy Removal
• From a perceptual point of view, not all video data is equally significant
• The human visual system is more sensitive to low-frequency information than to high-frequency information
• Quantization is a good tool for removing perceptual redundancy
  - It is not invertible and introduces distortion

Statistical Redundancy Removal
• Not all pixel values in an image (or coefficient values in a transformed image) occur with equal probability
• Entropy coding can be used to exploit statistical redundancy (example: variable length coding)
  - Shorter code words represent more frequent values
  - Longer code words represent less frequent values
  - Lossless compression technique
• Entropy (H) is the theoretical minimum average bit rate, in bits per symbol, at which the source can be losslessly coded, and is given by
\[
H = -\sum_{i=1}^{N} P(a_i)\, \log_2 P(a_i)
\]
  where N = number of symbols and P(a_i) = probability of symbol a_i
• Various coding techniques:
  - CABAC – Context Adaptive Binary Arithmetic Coding
  - CAVLC – Context Adaptive Variable Length Coding
  - Huffman coding

Temporal Redundancy Removal
• Inter prediction
  - Adjacent picture(s) are used as a reference to predict the current block or frame
1. Frame difference coding
  - The difference can be encoded using DCT + quantization + entropy coding
  [Figure: Frame 1, Frame 2, and the difference Frame 1 - Frame 2]

Temporal Redundancy Removal
2. Inter prediction using Motion Compensated Prediction
• Divide the frame into blocks
• For each block, find the relative motion between the current block and a matching block of the same size in the reference frame using a Block Matching Algorithm*
• The displacement between the current block and the best matching block is the Motion Vector (MV); the process of determining the motion is called Motion Estimation
• The best matching block is used in place of the current block to form the Motion Compensated Prediction
• Transmit the motion vector(s) for each block

* M.
Jakubowski and G. Pastuszak, “Block-based motion estimation algorithms – a survey”, Opto-Electronics Review, Vol. 21, No. 1, pp. 86-102, 2013.

• The dissimilarity D(s,t) (sometimes referred to as error, distortion, or distance) between two images Ψn and Ψn-1 is defined as follows:
\[
D(s,t) = \sum_{x=1}^{p} \sum_{y=1}^{q} M\left[\Psi_n(x,y),\ \Psi_{n-1}(x+V_x,\ y+V_y)\right]
\]
  where (s,t) = (Vx, Vy) is the candidate displacement, the block measures p x q pixels, and M(u,v) is a metric that measures the dissimilarity between its two arguments u and v. [Here u = Ψn(x,y) and v = Ψn-1(x+Vx, y+Vy)]
• There are several matching criteria; the two most frequently used are MSE and MAD, which are defined as follows:
  1) Mean square error (MSE): \( M(u,v) = (u - v)^2 \)
  2) Mean absolute difference (MAD): \( M(u,v) = |u - v| \)
• An experimental study reported that the choice of matching criterion does not significantly affect the search. Hence, MAD is preferred because it is simpler to implement.

Given a macroblock Bm in the anchor frame, motion estimation determines a matching block Bm' in the target frame such that the error D(s,t) between the two blocks is minimized. The most straightforward method is the exhaustive block-matching algorithm (EBMA):
\[
MV = (V_x, V_y) = \arg\min_{(V_x, V_y) \in S} \sum_{(x,y) \in MB} M\left[\Psi_n(x,y),\ \Psi_{n-1}(x+V_x,\ y+V_y)\right]
\]
where S is the search region and MV is the motion vector that minimizes the distance.

Video Coding Picture Types
• Three types:
  I-frames - least compressible, but do not require other video frames to decode
  P-frames - use data from previous frames to decompress and are more compressible than I-frames
  B-frames - use both previous and following frames as references to achieve the highest amount of data compression
• A typical Group Of Pictures (GOP) structure is IBBPBBP…
  I : Intra coded frame
  P : Predicted coded frame
  B : Bi-directionally predicted frame
  [Figure: Sample prediction blocks in a frame]

Compression Artifacts
• Noticeable distortion of video caused by the application of lossy data compression techniques
• Types of artifacts in the context of predictive coding
  • Blocking artifacts
    • Occur due to discontinuities at the boundaries of adjacent blocks in a reconstructed picture
    • Induced by two different sources: block-wise prediction and block transform coding
    • Can be mitigated using a de-blocking filter
  • Ringing artifacts
    • Occur in the context of transformation and quantization due to the loss of high frequencies
  • Blurring artifacts
    • Occur due to loss of spatial detail in regions of moderate to high spatial activity, such as roughly textured areas or around the edges of scene objects
  [Figure: Original image, blocking artifacts, ringing artifacts, blurred image]

Typical Hybrid Block-Based Encoder
  [Figure: block diagram of the encoder]
  VLC – Variable length coding
  DCT – Discrete cosine transform
  IDCT – Inverse discrete cosine transform

Typical Decoder Process Flow
  [Figure: block diagram of the decoder]
  VLC – Variable length coding
  IDCT – Inverse discrete cosine transform

Video Coding Standards
• Support multiple use cases and profiles
• Approximately 2x improvement in compression every decade
• Standards:
  • H.265/HEVC – High Efficiency Video Coding
  • H.264/AVC – Advanced Video Coding
  • MPEG-2
  [Figure: Block diagram of MPEG-2 encoder]
  [Figure: Block diagram of MPEG-2 decoder]

MPEG-2 intra and inter quantization matrices
• These matrices reflect the Human Visual System (HVS).
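How a frequency-weighted quantization matrix discards perceptually less important detail can be sketched as follows. This is a simplified illustration under stated assumptions: the weight matrix is a hypothetical one whose step size simply grows with the frequency indices (it is not the MPEG-2 default table), the function names are made up for this example, and the additional quantizer scale factor used by real encoders is omitted.

```python
N = 8  # block size


def make_weight_matrix(base=8, slope=4):
    """Hypothetical weights: the step size grows with frequency index u + v,
    so high-frequency coefficients are quantized more coarsely."""
    return [[base + slope * (u + v) for v in range(N)] for u in range(N)]


def quantize(coeffs, weights):
    """Divide each coefficient by its step size and round to an integer level."""
    return [[int(round(coeffs[u][v] / weights[u][v])) for v in range(N)]
            for u in range(N)]


def dequantize(levels, weights):
    """Reconstruct approximate coefficients; the rounding loss is not recoverable."""
    return [[levels[u][v] * weights[u][v] for v in range(N)] for u in range(N)]


if __name__ == "__main__":
    weights = make_weight_matrix()
    coeffs = [[30.0] * N for _ in range(N)]  # same magnitude at every frequency
    levels = quantize(coeffs, weights)
    recon = dequantize(levels, weights)
    print(levels[0][0], recon[0][0])  # low frequency: small step, small error
    print(levels[7][7], recon[7][7])  # high frequency: large step, coefficient dropped
```

With these illustrative weights, a coefficient of 30 survives at the lowest frequency (step 8) but is quantized to zero at the highest frequency (step 64), which is the behavior a matrix shaped by the HVS is meant to produce.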
Scanning types
  [Figure: scan pattern used mainly in frame-based coding]
  [Figure: scan pattern used mainly in field-based coding]

Improvements of H.264 over previous standards
• Prediction
  • Intra prediction using neighboring samples
  • Temporal prediction using multiple frames
  • Motion compensation on variable block sizes with quarter-pixel resolution
• Transform
  • 4x4 or 8x8 integer transform; 2x2 or 4x4 secondary Hadamard transform
• Quantization
  • Finer quantization supported
• Entropy coding
  • Context Adaptive Variable Length Coding (CAVLC) and Context Adaptive Binary Arithmetic Coding (CABAC)
• In-loop de-blocking filter

Improved features in HEVC compared to its predecessor H.264
• Enhanced hybrid spatial-temporal prediction model
  - Flexible partitioning; introduces Coding Tree Units (CTU) with Coding, Prediction and Transform Units (CU, PU, TU)
• CTU supports a larger block structure (64x64) with more variable sub-partition structures (32x32, 16x16)
• Supports high-resolution video up to 8K UHD (8192 x 4320), while H.264 supports up to 4K UHD (4096 x 2160)
• 33 directional modes for intra prediction (H.264 has 8 directional modes), apart from DC and planar
• Entropy coding using only CABAC
• In-loop filtering with de-blocking and Sample Adaptive Offset filters
• Superior parallel processing architecture
• 7-tap or 8-tap filters for fractional sample interpolation (up to quarter-sample precision), whereas H.264 uses a 6-tap filter for half-sample precision and linear interpolation for quarter-sample precision
• Supports tiles for parallel processing
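The difference between the HEVC and H.264 luma interpolation filters mentioned above can be sketched in one dimension. This is a simplified illustration, assuming 8-bit samples, a single horizontal pass, and edge samples repeated at the picture boundary; the helper names are hypothetical. The tap values are the commonly published luma coefficients: an 8-tap half-sample and a 7-tap quarter-sample filter for HEVC (normalized by 64), and a 6-tap half-sample filter for H.264 (normalized by 32).

```python
HEVC_HALF = (-1, 4, -11, 40, 40, -11, 4, -1)  # 8-tap, sum = 64
HEVC_QUARTER = (-1, 4, -10, 58, 17, -5, 1)    # 7-tap, sum = 64
H264_HALF = (1, -5, 20, 20, -5, 1)            # 6-tap, sum = 32


def interpolate(row, i, taps, shift):
    """Sample at a fractional position between row[i] and row[i + 1].

    taps are aligned so that (len(taps) + 1) // 2 of them cover samples
    at or before position i; shift is log2 of the tap sum.
    """
    left = (len(taps) + 1) // 2
    acc = 0
    for k, t in enumerate(taps):
        j = i - left + 1 + k
        j = min(max(j, 0), len(row) - 1)  # assumption: repeat edge samples
        acc += t * row[j]
    value = (acc + (1 << (shift - 1))) >> shift  # round, then normalize
    return min(max(value, 0), 255)               # clip to the 8-bit range


def h264_quarter(row, i):
    """H.264 quarter-sample: rounded average of the integer sample and
    the neighboring half-sample (linear interpolation)."""
    half = interpolate(row, i, H264_HALF, 5)
    return (row[i] + half + 1) >> 1


if __name__ == "__main__":
    ramp = list(range(0, 160, 10))  # 0, 10, ..., 150
    print(interpolate(ramp, 7, HEVC_HALF, 6))     # between 70 and 80
    print(interpolate(ramp, 7, HEVC_QUARTER, 6))  # a quarter of the way
    print(interpolate(ramp, 7, H264_HALF, 5))
    print(h264_quarter(ramp, 7))
```

On a linear ramp both half-sample filters return the exact midpoint (75 between 70 and 80); the filters differ on real content, where the longer HEVC filters follow high-frequency detail more closely.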
[Figure: Block diagram of H.264 encoder]
  NAL – Network abstraction layer
  MC – Motion compensation
  ME – Motion estimation
  T – Transform
  Q – Quantization
  Q-1 – Inverse quantization
  T-1 – Inverse transform

[Figure: Block diagram of H.264 decoder]
  NAL – Network abstraction layer
  MC – Motion compensation
  Q-1 – Inverse quantization
  T-1 – Inverse transform

[Figure: Block diagram of HEVC encoder]
[Figure: Block diagram of HEVC decoder]
[Figure: Modes for intra prediction in H.264]
[Figure: Division of a picture into slices and tiles in HEVC]
  CTU : Coding Tree Unit
[Figure: Modes and directional orientations for intra prediction in HEVC]
[Figure: Integer and fractional positions for luma interpolation]
  A – Full pixels
  b – Half-pel interpolated pixels
  a, c – Quarter-pel interpolated pixels

Filter coefficients for luma and chroma fractional sample interpolation in HEVC
  [Table: Filter coefficients for luma fractional sample interpolation]
  [Table: Filter coefficients for chroma fractional sample interpolation]