Motion Compensated Prediction and the Role of the DCT in Video Coding
Michael Horowitz
Applied Video Compression
michael@avcompression.com
(c) 2008 Michael Horowitz

Outline
• Overview block-based hybrid motion compensated predictive video coding
  – ITU-T standards: H.261, H.263, H.264
  – ISO/IEC standards: MPEG-1, MPEG-2 & MPEG-4
• Survey motion estimation & compensation
• Discrete cosine transform (DCT)
  – Coding efficiency
  – Computational complexity
  – Perceptual implications

Block-Based Hybrid Motion Compensated Predictive Coding
• Video picture partitioned into macroblocks
• Macroblock (MB) has three components
  – One luma
    • “Y”, represents “lightness”
    • 16x16 luma samples
  – Two chroma
    • “Cb” & “Cr”, represent color
    • 16x16, 8x16, or 8x8 chroma samples

Block-Based Hybrid Motion Compensated Predictive Coding (continued)
  – Human Visual System more sensitive to luma
    • Chroma frequently sub-sampled
    • Sub-sampling examples: 4:4:4, 4:2:2, 4:2:0
    [Figure: Y, Cb and Cr sample grids for 4:4:4, 4:2:2 and 4:2:0]
• Two coding modes for macroblocks

Inter-Picture Macroblock Coding
  – Estimate motion of blocks from picture-to-picture
  – Search previously coded (reference) pictures
  [Figure: reference picture showing the location of the input MB, the search region and the motion estimate]
  – Encode
    • Location of motion estimate (motion vector)
    • Difference between input MB and motion estimate

Intra-Picture Macroblock Coding
• Input MB coded using intra-picture prediction
  – Prediction derived from spatially adjacent MBs
  – Earlier algorithms offer no intra-picture prediction
• Significantly lower coding efficiency than inter-coded MBs at low data rates
• Useful when motion estimate is poor
• Can be used to stop error propagation

Block-Based Hybrid Motion Compensated Predictive Coding
[Figure: image not available]
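As a small illustration of the macroblock layout described in the overview slides, the following Python sketch tabulates the chroma block size that accompanies a 16x16 luma block for each sub-sampling format. The names `SUBSAMPLING`, `chroma_block_size` and `samples_per_macroblock` are hypothetical helpers introduced here for illustration; they are not taken from any standard or library.

```python
# Horizontal and vertical chroma sub-sampling factors.
# This table is an assumption made for illustration, matching the
# 16x16, 8x16 and 8x8 chroma sizes listed on the slide.
SUBSAMPLING = {
    "4:4:4": (1, 1),  # chroma at full resolution
    "4:2:2": (2, 1),  # chroma halved horizontally
    "4:2:0": (2, 2),  # chroma halved horizontally and vertically
}


def chroma_block_size(fmt, luma_w=16, luma_h=16):
    """Return (width, height) of each chroma block (Cb or Cr) in one macroblock."""
    sx, sy = SUBSAMPLING[fmt]
    return luma_w // sx, luma_h // sy


def samples_per_macroblock(fmt):
    """Total samples in one macroblock: one luma block plus two chroma blocks."""
    cw, ch = chroma_block_size(fmt)
    return 16 * 16 + 2 * cw * ch


for fmt in SUBSAMPLING:
    print(fmt, chroma_block_size(fmt), samples_per_macroblock(fmt))
```

Under these assumptions a 4:2:0 macroblock carries half as many samples as a 4:4:4 macroblock, which is one way to see why chroma sub-sampling is attractive when the eye is less sensitive to chroma.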

Survey Motion Estimation and Motion Compensation
• Motion models
  – Translational (focus of talk)
    • Location of kth motion compensated block
      – $(X_k + MV_{x,k},\; Y_k + MV_{y,k})$
      – $(X_k, Y_k)$ is location of kth input block
      – $(MV_{x,k}, MV_{y,k})$ is motion vector (MV) for kth block
  – Affine motion models
    • Rotation
    • Scaling
• Video standards do not use affine models

Motion Estimation
• Estimate inter-picture block translation
• Luma samples (and sometimes chroma)
• Example
  – Distortion: Sum of Absolute Differences (SAD)
    • Low complexity
    • Commonly used in real-time production encoders
  – Find $(MV_{x,k}, MV_{y,k})$ that minimizes SAD between
    • Input block $s_k(i,j)$
    • Motion compensated prediction in reference picture $r(i,j)$
    • Subject to search range

Motion Estimation (continued)
$$(MV_{x,k}, MV_{y,k}) = \arg\min_{x,y} \sum_j \sum_i \left| s_k(i,j) - r(X_k + x + i,\; Y_k + y + j) \right|, \qquad (X_k + x,\; Y_k + y) \in \text{SearchRange}$$
[Figure: reference picture $r(i,j)$ with sample locations, block location $(X_k, Y_k)$ and search range]
• Fast motion estimation algorithms

Fractional Sample Motion Estimation
• Estimate content between samples
• Example: bilinear interpolation
  – $x_1 \le x^* < x_2$ and $y_1 \le y^* < y_2$
  – $f_x = (x^* - x_1)/(x_2 - x_1)$, $f_y = (y^* - y_1)/(y_2 - y_1)$
  [Figure: $z(x^*,y^*)$ interpolated from the four surrounding samples $z(x_1,y_1)$, $z(x_2,y_1)$, $z(x_1,y_2)$ and $z(x_2,y_2)$]
$$z(x^*,y^*) = (1-f_x)(1-f_y)\,z(x_1,y_1) + f_x(1-f_y)\,z(x_2,y_1) + f_x f_y\,z(x_2,y_2) + (1-f_x)f_y\,z(x_1,y_2)$$

Fractional Sample Motion Estimation (continued)
  – H.261
    • No fractional sample motion estimation
  – MPEG-1, MPEG-2 and H.263
    • 1/2-sample, bilinear interpolation
  – H.264 | MPEG-4 AVC & SVC
    • Luma
      – 1/2-sample, 6-tap interpolation
      – 1/4-sample, simple average
    • Chroma (1/8-sample, bilinear)

Fractional Sample Motion Estimation (continued)
• Coding efficiency gain H.263 [from Wang 2002]
[Figure: image not available]
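The SAD-based search from the Motion Estimation slides can be sketched in a few lines of Python. This is a minimal exhaustive integer-sample search under stated assumptions: pictures are plain lists of rows, candidates that fall outside the reference picture are skipped, and the function names `sad` and `full_search` as well as the toy data are hypothetical. Production encoders typically use fast search algorithms rather than this brute-force loop.

```python
def sad(block, ref, x0, y0):
    """Sum of absolute differences between `block` and the same-sized
    region of `ref` whose top-left sample is at column x0, row y0."""
    return sum(
        abs(block[j][i] - ref[y0 + j][x0 + i])
        for j in range(len(block))
        for i in range(len(block[0]))
    )


def full_search(block, ref, xk, yk, search_range):
    """Exhaustive integer-sample motion search.

    (xk, yk) is the block's location in the input picture. Candidate
    displacements (x, y) run over [-search_range, search_range] and are
    skipped when the candidate block would fall outside `ref`.
    Returns ((mvx, mvy), best_sad).
    """
    bh, bw = len(block), len(block[0])
    rh, rw = len(ref), len(ref[0])
    best_mv, best_cost = None, None
    for y in range(-search_range, search_range + 1):
        for x in range(-search_range, search_range + 1):
            x0, y0 = xk + x, yk + y
            if x0 < 0 or y0 < 0 or x0 + bw > rw or y0 + bh > rh:
                continue  # candidate lies outside the reference picture
            cost = sad(block, ref, x0, y0)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (x, y), cost
    return best_mv, best_cost


# Toy data: an 8x8 reference with a bright 2x2 patch at column 5, row 3.
ref = [[0] * 8 for _ in range(8)]
for j in range(2):
    for i in range(2):
        ref[3 + j][5 + i] = 100 + 10 * j + i

# The input block is located at (4, 4) and contains the same patch.
block = [[100, 101], [110, 111]]
mv, cost = full_search(block, ref, xk=4, yk=4, search_range=2)
print(mv, cost)
```

In this toy case the patch has moved one sample right and one sample up relative to the block location, so the search should report a zero-SAD match at that displacement.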

Multiple Motion Vectors per MB
• One motion vector for each sub-block
• H.264 results [Bjontegaard 2001]
[Figure: Foreman, PSNR (dB) versus rate (Kbps); curves for 2 blocks, 4 blocks and 7 blocks]

Multiple Reference Pictures [Wiegand, Zhang, & Girod 1997]
• Coding gains
  – Uncovered areas
  – More integer motion vector estimates
[Figure: integer sample locations in pictures t-3, t-2, t-1 and t0 along the direction of motion]

Multiple Reference Pictures (continued)
[Figure: Woman, CIF, 10fps, QP=7,8,10,13,19,28; PSNR (dB) versus bit rate (bps); curves for Annex U (5 frames) and No Annex U]
• H.263 Annex U [Horowitz 2000]

Multi-Hypothesis Motion Compensated Prediction [Flierl, Wiegand & Girod 1998]
• Linear combination of multiple predictions
  – One motion vector for each prediction
  – Bi-predicted pictures are special case (2 MVs)
  – Predictions may be forward & backward in time

Multi-Hypothesis for H.263
• Sequences Mobile & Calendar and Foreman
• Results [Flierl 1998]

Overlapped Block Motion Compensation [Orchard & Sullivan 1994]
• Special case of multi-hypothesis coding
• H.263 advanced prediction mode (Annex F)
  – Overlapped block motion compensation
    • 1 coded + 2 “derived” motion vectors
    • Non-uniform spatial weighting of samples
  – 4 motion vectors per macroblock

Rate-Distortion Optimization
• MV resulting in lowest distortion often not optimal
• Goal: Find best tradeoff between distortion and rate
• Strategy [Everett III 1963], [Shoham & Gersho 1988]
$$J = D + \lambda R = \sum_k D_k + \lambda \sum_k R_k = \sum_k J_k$$
  where $D$ is the total distortion, $R$ the total bit-rate, $D_k$ the distortion for block k and $R_k$ the rate for block k
  – Minimize $J_k$ for each block k separately, using common $\lambda$

Perceptual Tuning
• Prevent transparent foreground macroblocks
• Blurring of fast moving objects
• Deblocking filter
• Artifacts in the motion wake
[Figure: moving object, its direction of motion, and the macroblocks in the motion wake]

Coding Summary
• Macroblock-based coding
• Two basic macroblock coding modes
  – Inter-coded MB: motion compensated prediction
  – Intra-coded MB
[Figure: image not available]

1-D Discrete Cosine Transform
• Type II forward DCT [Ahmed et al. 1974]
$$d_m = \sum_{k=0}^{N-1} x_k \cos\!\left(\frac{\pi m (k + \tfrac12)}{N}\right), \qquad m = 0, \ldots, N-1$$
• Type II inverse DCT
$$x_k = \frac{d_0}{N} + \frac{2}{N} \sum_{m=1}^{N-1} d_m \cos\!\left(\frac{\pi m (k + \tfrac12)}{N}\right), \qquad k = 0, \ldots, N-1$$

2-Dimensional DCT
• Forward
$$d_{u,v} = \sum_{x=0}^{N-1} \sum_{y=0}^{N-1} s_{x,y} \cos\!\left(\frac{\pi (x + \tfrac12) u}{N}\right) \cos\!\left(\frac{\pi (y + \tfrac12) v}{N}\right)$$
• Inverse
$$s_{x,y} = \sum_{u=0}^{N-1} \sum_{v=0}^{N-1} C_u C_v\, d_{u,v} \cos\!\left(\frac{\pi (x + \tfrac12) u}{N}\right) \cos\!\left(\frac{\pi (y + \tfrac12) v}{N}\right), \qquad C_t = \begin{cases} \dfrac{1}{N} & \text{for } t = 0 \\[4pt] \dfrac{2}{N} & \text{for } t = 1, 2, \ldots, N-1 \end{cases}$$

Basis Functions for 8x8 DCT
[Figure: image not available]

Why Choose the DCT?
• Coding efficiency
• Computational complexity
• Perceptual implications

Coding Efficiency
[Figure: source $X = [X_1, X_2]$; $X_1$ is quantized by $Q_1$ and $X_2$ by $Q_2$, giving $\hat{X}_1$, $\hat{X}_2$ and the reconstruction $\hat{X}$]
• Source $X = [X_1, X_2]$
  – $X_i$ is a Gaussian random variable
  – Mean = 0, Variance = $\sigma_i^2$
• Rate of quantizer $Q_i$ is $R_i$ (bits / index)
  – Total rate $R = R_1 + R_2$

Coding Efficiency (continued)
• Distortion
  – Square error
  – High-rate assumption
    • High-rate implies R ≥ 3 bits / sample
    • Often works well for lower rates
• Asymptotic Quantization Theory [Gray & Neuhoff 1998]
$$D_i \approx h_g\, \sigma_i^2\, 2^{-2R_i} \quad \text{where } h_g = \frac{\sqrt{3}\,\pi}{2} \text{ (for Gaussian)}$$
  – Total distortion $D = D_1 + D_2$

Rate Allocation Problem
• What is smallest $D = D^*$ subject to $R \le R_{max}$?
• Find optimal value for $R_1 = R_1^*$
$$D = D_1 + D_2 = h_g \sigma_1^2\, 2^{-2R_1} + h_g \sigma_2^2\, 2^{-2(R - R_1)}$$
• Minimizing D with respect to $R_1$ yields
$$R_1^* = \frac{R}{2} + \frac12 \log_2 \frac{\sigma_1}{\sigma_2}$$

Rate Allocation Problem (continued)
It follows that
$$R_2^* = \frac{R}{2} + \frac12 \log_2 \frac{\sigma_2}{\sigma_1}$$
and
$$D_1^* = D_2^* = h_g\, \sigma_1 \sigma_2\, 2^{-R}$$
which implies
$$D^* = 2 h_g\, \sigma_1 \sigma_2\, 2^{-R}$$

Generalize for k Quantizers
[Figure: source $X = [X_{1,2}, X_3]$ with $X_{1,2} = [X_1, X_2]$; quantizers $Q_1$, $Q_2$, $Q_3$ produce $\hat{X}_1$, $\hat{X}_2$, $\hat{X}_3$, which form $\hat{X}_{1,2}$ and $\hat{X}$]
• Rate $R = R_{1,2} + R_3 \le R_{max}$
• Distortion $D = D_{1,2} + D_3$
• Recall $D_3 = h_g \sigma_3^2\, 2^{-2R_3}$

Generalization (continued)
• 2 quantizers with $R_{1,2} = R - R_3$ subject to $R \le R_{max}$
$$D_{1,2}^* = 2 h_g\, \sigma_1 \sigma_2\, 2^{-(R - R_3)} \quad \text{(from previous result)}$$
• Minimize $D = D_{1,2}^* + D_3$ with respect to $R_3$
$$R_3^* = \frac{R}{3} + \frac12 \log_2 \frac{\sigma_3^2}{\left(\sigma_1^2 \sigma_2^2 \sigma_3^2\right)^{1/3}}$$

Generalization (continued)
• It follows that
$$D^* = 3 h_g \left( \prod_{j=1}^{3} \sigma_j^2 \right)^{1/3} 2^{-2R/3}$$
• Generalize to k quantizers by induction

Optimal Rate and Distortion [Huang & Schultheiss 1963]
• Rate
$$R_i^* = \frac{R}{k} + \frac12 \log_2 \frac{\sigma_i^2}{\left( \prod_{j=1}^{k} \sigma_j^2 \right)^{1/k}}$$
• Distortion
$$D^* = k\, h_g \left( \prod_{j=1}^{k} \sigma_j^2 \right)^{1/k} 2^{-2R/k}$$

Observations and Comments
• #1 Optimal rate for $Q_i$ increases linearly with $\log_2 \sigma_i$
• #2 Optimal distortion
$$D_i^* = \frac{D^*}{k} = h_g \left( \prod_{j=1}^{k} \sigma_j^2 \right)^{1/k} 2^{-2R/k} \quad \text{for all } i$$
• #3 In practice, systems use positive [Segall 1976] integer [Farber & Zeger 2005] $R_i$

Question
• Given Gaussian source X & fixed encoder structure (i.e., k scalar quantizers), how can we minimize D subject to $R \le R_{max}$?
  Answer: Transform X to minimize $\prod_{j=1}^{k} \sigma_j^2$.

Transform Coding [Kramer & Mathews 1956]
[Figure: $X = [X_1, \ldots, X_k]$ is transformed by $T$ into $Y_1, \ldots, Y_k$; each $Y_i$ is quantized by $Q_i$ to $\hat{Y}_i$; the inverse transform $T^{-1}$ yields $\hat{X} = [\hat{X}_1, \ldots, \hat{X}_k]$]
• For orthogonal T
$$D = E\,\|Y - \hat{Y}\|^2 = E\,\|X - \hat{X}\|^2$$

Fact 1
• Karhunen-Loeve Transform (KLT) produces smallest $\prod_{j=1}^{k} \sigma_j^2$ [Huang & Schultheiss 1963]
  – a) Gaussian input random variables
  – b) High-rate quantizers
  – c) Rate of each quantizer is arbitrary real value
  – d) Square error distortion measure

Fact 2
• The autocorrelation matrix of the KLT transform vector is diagonal.
  – KLT coefficients are uncorrelated
  – There is no general theorem stating uncorrelated quantities can be more efficiently quantized than correlated ones

Fact 3
• If KLT produces $\{\sigma_j^2\}$ and orthogonal $\tilde{T}$ produces $\{\tilde{\sigma}_j^2\}$, with $\sigma_1^2 \ge \sigma_2^2 \ge \cdots \ge \sigma_k^2$ & $\tilde{\sigma}_1^2 \ge \tilde{\sigma}_2^2 \ge \cdots \ge \tilde{\sigma}_k^2$, then
$$\sum_{j=1}^{n} \sigma_j^2 \ge \sum_{j=1}^{n} \tilde{\sigma}_j^2 \quad \text{for all } n \le k$$
• Energy compaction

Practical Considerations
• KLT impractical for many systems
  – Computational complexity
    • Transform is signal dependent
    • Compute and apply transform for each input
• Consider Fourier based transforms
  – Fast algorithms exist
  – Examine loss of coding efficiency resulting from loss of energy compaction

Energy Compaction of Some Discrete Transforms
[Figure: image not available]
• 1x32 block in natural images [Lohscheller]

2-D Energy Compaction [from Hedberg & Nilsson 2004]
• KLT
• DCT
• DFT
[Figures: 2-D energy compaction for each transform; images not available]

Computational Complexity
• Recall DCT may be derived from DFT
  – First N coefficients of 2N-point DFT
  – Requires appropriate input sequence symmetry
  – Requires scaling [Tseng & Miller 1978]
$$d_m = \tfrac12 \sec\!\left(\frac{\pi m}{2N}\right) \mathrm{Re}\{f_m\} \quad \text{for } m = 0, \ldots, N-1$$
  where $f_m$ is the mth coefficient of the 2N-point DFT
• Leverage FFT to compute DCT

Computational Complexity (continued)
• 1-D 8-point DCT from 16-point DFT
  – 13 mults, 29 adds [Arai et al.
1988]
  – 8 final scaling multiplies rolled into quantization
    • Net 5 mults, 29 adds best known
• Fast 2-D DCT (8x8)
  – Separable [from Pennebaker & Mitchell 1992]
    • 80 mults, 464 adds best known
  – Non-separable [Feig 1992]
    • 54 mults, 416 adds, 6 shifts

Perceptual Implications
• Contrast sensitivity of HVS
  – See last page of handout [Barlow & Mollon 1982]
• Perceptually tuned quantization tables [Watson]
• Filter coefficients prior to quantization
  – Shape frequency content of source
  – Exploit HVS contrast sensitivity

Concluding Summary
• Motion estimation & compensation
  – Translation-based motion models
  – Fractional sample motion estimation
  – Multiple motion vectors per macroblock
  – Multiple reference pictures
  – Multi-hypothesis motion compensated prediction
  – Overlapped block motion compensation

Concluding Summary
• DCT
  – Near optimal R-D performance for wide range of sources (Gaussian, high-rate assumptions)
  – Simple relationship to DFT, so fast algorithms exist
  – Perceptual relevance

References
• N. Ahmed, T. Natarajan, and K. R. Rao, “Discrete cosine transform,” IEEE Trans. Comput., vol. C-23, pp. 90–93, Jan. 1974.
• Y. Arai, T. Agui, and M. Nakajima, “A Fast DCT-SQ Scheme for Images,” Trans. of the IEICE, vol. E71, no. 11, p. 1095, Nov. 1988.
• E. Feig and S. T. Winograd, “Fast Algorithms for Discrete Cosine Transform,” IEEE Trans. Signal Proc., vol. 40, pp. 2174–2193, 1992.
• H. B. Barlow and J. D. Mollon, The Senses. Cambridge: Cambridge University Press, 1982.
• G. Bjontegaard, “Objective simulation results,” Document VCEG-M34, Video Coding Experts Group (VCEG), Thirteenth Meeting, Austin, Texas, USA, 2-4 April 2001.
• H. Everett III, “Generalized Lagrangian Multiplier Method for Solving Problems of Optimum Allocation of Resources,” Operations Research, vol. 11, pp. 399–417, 1963.
• B. Farber and K.
Zeger, “Quantization of Multiple Sources Using Integer Bit Allocation,” Data Compression Conference (DCC), Salt Lake City, Utah, March 2005 (to appear).

References (continued)
• M. Flierl, T. Wiegand, and B. Girod, “Locally Optimal Design Algorithm for Block-Based Multi-Hypothesis Motion-Compensated Prediction,” Proc. of the IEEE Data Compression Conference (DCC'98), pp. 239–248, Snowbird, USA, Apr. 1998.
• A. Gersho and R. M. Gray, Vector Quantization and Signal Compression, Kluwer Academic Publishers, Boston, 1992.
• B. Girod, Lecture for EE368b, Video and Image Compression, Stanford University.
• R. M. Gray and D. L. Neuhoff, “Quantization,” IEEE Transactions on Information Theory, vol. 44, pp. 2325–2384, Oct. 1998.
• R. M. Haralick, “A Storage Efficient Way to Implement the Discrete Cosine Transform,” IEEE Transactions on Computers, vol. 25, no. 6, pp. 764–765, 1976.
• H. Hedberg and P. Nilsson, “A Survey of Various Discrete Transforms used in Digital Image Compression Algorithms,” Proceedings of the Swedish System-On-Chip Conference 2004, Bastad, Sweden, April 13-14, 2004.

References (continued)
• M. J. Horowitz, “Demonstration of H.263++ Annex U Performance,” Document Q15-J11, Tenth Meeting (Meeting J) of the ITU-T Q.15/16, Advanced Video Coding Experts Group, Osaka, Japan, 16-18 May 2000.
• J.-Y. Huang and P. M. Schultheiss, “Block quantization of correlated Gaussian random variables,” IEEE Trans. Comm., vol. 11, pp. 289–296, September 1963.
• F. Kossentini, Y. Lee, M. Smith, and R. Ward, “Predictive RD Optimized Motion Estimation for Very Low Bit-Rate Video Coding,” Special Issue of the IEEE Journal on Selected Areas in Communications, vol. 15, no. 9, pp. 1752–1763, December 1997.
• H. P. Kramer and M. V. Mathews, “A linear coding for transmitting a set of correlated signals,” IRE Trans. Inform. Theory, vol. 23, no. 3, pp. 41–46, Sept. 1956.
• M. T. Orchard and G. J.
Sullivan, “Overlapped block motion compensation: An estimation-theoretic approach,” IEEE Trans. Image Processing, vol. 3, no. 9, pp. 693–699, Sept. 1994.
• W. B. Pennebaker and J. L. Mitchell, JPEG, p. 53, Kluwer Academic Publishers, Norwell, MA, USA, 1992.

References (continued)
• A. Segall, “Bit allocation and encoding for vector sources,” IEEE Trans. Inform. Theory, vol. IT-22, pp. 162–169, March 1976.
• Y. Shoham and A. Gersho, “Efficient Bit Allocation for an Arbitrary Set of Quantizers,” IEEE Trans. on Acoust., Speech, Signal Processing, vol. 36, no. 9, pp. 1445–1453, September 1988.
• B. D. Tseng and W. C. Miller, “On Computing the Discrete Cosine Transform,” IEEE Transactions on Computers, vol. 27, no. 10, pp. 966–968, 1978.
• Y. Wang, “Video Coding Standards,” lecture slides based on text Video Processing and Communications, Prentice Hall, 2002.
• A. B. Watson, “DCT quantization matrices visually optimized for individual images,” Proc. SPIE, vol. 1913, pp. 202–216, 1993.
• T. Wiegand, X. Zhang, and B. Girod, “Block-Based Hybrid Video Coding Using Motion-Compensated Long-Term Memory Prediction,” in Proc. of the Picture Coding Symposium, Berlin, Germany, pp. 153–158, Sept. 1997.
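
The optimal rate and distortion result [Huang & Schultheiss 1963] from the main slides can be checked numerically. The Python sketch below assumes the high-rate Gaussian model $D_i = h_g \sigma_i^2 2^{-2R_i}$ with $h_g = \sqrt{3}\pi/2$; the function names and the example variances and total rate are hypothetical values chosen for illustration.

```python
import math

# High-rate constant for a Gaussian source, h_g = sqrt(3) * pi / 2.
H_G = math.sqrt(3) * math.pi / 2


def distortion(variances, rates):
    """Total high-rate distortion: sum of h_g * sigma_i^2 * 2^(-2 R_i)."""
    return sum(H_G * v * 2.0 ** (-2.0 * r) for v, r in zip(variances, rates))


def optimal_rates(variances, total_rate):
    """R_i* = R/k + 0.5 * log2(sigma_i^2 / geometric mean of the variances)."""
    k = len(variances)
    log_gm = sum(math.log2(v) for v in variances) / k
    return [total_rate / k + 0.5 * (math.log2(v) - log_gm) for v in variances]


def optimal_distortion(variances, total_rate):
    """D* = k * h_g * (prod sigma_j^2)^(1/k) * 2^(-2R/k)."""
    k = len(variances)
    gm = 2.0 ** (sum(math.log2(v) for v in variances) / k)
    return k * H_G * gm * 2.0 ** (-2.0 * total_rate / k)


variances = [16.0, 4.0, 1.0]  # hypothetical component variances
R = 12.0                      # hypothetical total rate in bits

rates = optimal_rates(variances, R)
d_opt = distortion(variances, rates)
d_equal = distortion(variances, [R / len(variances)] * len(variances))

print("optimal rates:", rates)
print("distortion at optimal rates:", d_opt)
print("closed-form D*:", optimal_distortion(variances, R))
print("distortion with equal rates:", d_equal)
```

With these example values the optimal allocation gives more bits to the high-variance components and yields a lower total distortion than splitting the rate equally. The closed-form rates are real-valued and can become negative when variances differ widely, which is why observation #3 notes that practical systems restrict $R_i$ to positive integers.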

Backup slides
• Little Things Big Difference
• Motion search over picture boundary
  – Reference Gisle’s Austin Contribution

DCT from the DFT [Haralick 1976]
• N-point DCT
$$d_m = \sum_{k=0}^{N-1} x_k \cos\!\left(\frac{\pi m (k + \tfrac12)}{N}\right), \qquad m = 0, \ldots, N-1$$
• Extend N-point sequence $x_k$ by reflection
$$x'_k = \begin{cases} x_k & 0 \le k < N \\ x_{2N-k-1} & N \le k < 2N \end{cases}$$

Extend N-point Sequence $x_k$ by Reflection
• Example
[Figure: $x_k$ extended by reflection from length N to length 2N]

Compute 2N-point DFT
$$f_m = \sum_{k=0}^{2N-1} x'_k\, e^{-j 2\pi m k / 2N} = \sum_{k=0}^{N-1} x'_k\, e^{-j 2\pi m k / 2N} + \sum_{k=N}^{2N-1} x'_k\, e^{-j 2\pi m k / 2N} \quad \text{for } m = 0, \ldots, 2N-1$$
• Second sum equals (by symmetry of $x'_k$)
$$\sum_{L=0}^{N-1} x'_L\, e^{j 2\pi m (L+1) / 2N} \quad \text{where } L = 2N - k - 1$$

Compute 2N-point DFT (continued)
• It follows that
$$f_m = \sum_{k=0}^{N-1} x'_k \left[ e^{-j 2\pi m k / 2N} + e^{j 2\pi m (k+1) / 2N} \right]$$
• Multiply by $e^{j \pi m / 2N}\, e^{-j \pi m / 2N} = 1$ & employ Euler’s formula
$$f_m = 2\, e^{j \pi m / 2N} \sum_{k=0}^{N-1} x'_k \cos\!\left(\frac{\pi m (k + \tfrac12)}{N}\right)$$

Compute 2N-point DFT (continued)
• Recognizing the DCT
$$d_m = \tfrac12\, e^{-j \pi m / 2N} f_m \quad \text{for } m = 0, \ldots, N-1$$
• Note $x_k$ is real, so $d_m$ is real and $\mathrm{Re}\{f_m\} = 2 \cos\!\left(\frac{\pi m}{2N}\right) d_m$
• It follows that [Tseng & Miller]
$$d_m = \tfrac12 \sec\!\left(\frac{\pi m}{2N}\right) \mathrm{Re}\{f_m\} \quad \text{for } m = 0, \ldots, N-1$$
• First N coeffs of 2N-point DFT → N-point DCT
  – with appropriate scaling and $x_k$ symmetry

Energy Compaction of Some Discrete Transforms
[Figure: image not available]
• Transform coefficient variances for N=16, ρ=0.95 [Ahmed 1974]

KLT Computational Complexity
• Transform is signal dependent
• Construct transform
  – Compute correlation matrix for input vector
  – Find eigenvectors of correlation matrix
• Apply transform

Multiple Motion Vectors per MB
• One motion vector for each sub-block
• H.264 results [Bjontegaard]
[Figure: Mobile and Calendar (CIF@30fps), PSNR (dB) versus rate (Kbps); curves for 2 blocks, 4 blocks and 7 blocks]

Practical Matters
• 16-bit math for 4x4 in H.264 → complexity reduction on certain platforms
• 4x4 and 8x8 transforms in H.264
  – Exact inverses
• Non-exact specification for inverse DCT
  – How is it done?
  – Implications

Overlapped Block Motion Compensation in H.263
• Coding efficiency
  – Baseline vs advanced prediction mode [from Girod]
[Figure: PSNR [dB] comparison; image not available]

1-D DFT Energy Compaction Analysis
• Fourier transform of ramp (continuous both domains)
• Sample Fourier domain → Fourier Series
  – Repetition in time domain
[Figure: ramp, amplitude versus time]

Ramp: First 5 Fourier Terms
[ptolemy.eecs.berkeley.edu/eecs20/week8/examples.html]
[Figure: image not available]
• Fourier term decay rate: $1/n$

Better Energy Compaction
• DFT energy compaction not very good
• Better energy compacting Fourier based transforms exist
• Consider DFT of extended sequence
  – Extend input to force even symmetry
  – Leads to DCT

Extended Ramp (Triangle)
• 2N-point extended ramp
[Figure: extended ramp (triangle), amplitude versus time]
• Sample Fourier domain → Fourier Series
  – No discontinuities at boundary (symmetrical extension)
  – Expect better energy compaction

Triangle: First 5 Fourier Terms
[ptolemy.eecs.berkeley.edu/eecs20/week8/examples.html]
[Figure: image not available]
• Fourier term decay rate: $1/n^2$

Compaction Comparison Summary
• DFT coefficient amplitude decay
  – Ramp: $1/n$
  – Extended ramp: $1/n^2$
• Suggests DCT will compact well
• Fourier Series → DFT
  – Sampling in time → repetition in frequency
  – “series-based” observations valid for DFT

Contrast Sensitivity
• Allen B. Poirson & Brian A. Wandell, “Pattern-color separable pathways predict sensitivity to simple colored patterns,” Vision Research, 1995
[Figure: image not available]

Uniform Scalar Quantization
• Distortion of ith cell
$$D_i = E\!\left[ (X_i - \hat{x}_i)^2 \right]$$
• 0th cell (width $\Delta$, reconstruction value 0)
$$D_0 = \int_{-\Delta/2}^{\Delta/2} (x - 0)^2\, p_{X_0}(x)\, dx$$
  – Assume high rate
$$p_{X_0}(x) = \begin{cases} \dfrac{1}{\Delta} & -\dfrac{\Delta}{2} \le x \le \dfrac{\Delta}{2} \\[4pt] 0 & \text{otherwise} \end{cases}$$
$$D_0 = \frac{\Delta^2}{12}, \qquad D_i = \frac{\Delta^2}{12} \;\; \forall i \;\;\Rightarrow\;\; D = \frac{\Delta^2}{12}$$
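
The $\Delta^2/12$ result can be checked with a short numerical experiment. The Python sketch below is an illustration under stated assumptions: the input is taken to be uniformly distributed over a range that spans many cells (which mimics the high-rate assumption of a flat density within each cell), the quantizer rounds to the nearest multiple of the step size, and the names `quantize` and `mean_square_error` are hypothetical.

```python
def quantize(x, delta):
    """Uniform quantizer: round x to the nearest multiple of delta."""
    return delta * round(x / delta)


def mean_square_error(delta, lo, hi, n=100000):
    """Average (x - Q(x))^2 over n points spread evenly across [lo, hi)."""
    step = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * step  # midpoint of the i-th sub-interval
        total += (x - quantize(x, delta)) ** 2
    return total / n


delta = 0.25  # hypothetical step size
mse = mean_square_error(delta, lo=-4.0, hi=4.0)
print("measured mean square error:", mse)
print("delta^2 / 12:             ", delta ** 2 / 12)
```

The two printed values should agree closely. For a non-uniform source such as a Gaussian, the agreement depends on the step size being small relative to the spread of the source, which is the content of the high-rate assumption.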