Image/Video Compression
September 28, 1999

Lawrence A. Rowe
University of California, Berkeley
URL: http://www.BMRC.Berkeley.EDU/~larry
Copyright ©1999, L.A. Rowe

Outline
• Background
• Block Transform Coding
• Other Coding Algorithms
• Software/Hardware CODECs
• Pragmatic Issues

Multimedia Systems and Applications

Video Data Size
Size of uncompressed video in gigabytes

               1 sec     1 min    1 hour    1000 hours
1920x1080       0.19     11.20    671.85    671,846.40
1280x720        0.08      4.98    298.60    298,598.40
640x480         0.03      1.66     99.53     99,532.80
320x240         0.01      0.41     24.88     24,883.20
160x120         0.00      0.10      6.22      6,220.80

[Figure: image size of video: 1280x720 (1.77), 640x480 (1.33), 320x240, 160x120]

Video Bit Rate Calculation

$$\frac{\text{width} \times \text{height} \times \text{depth} \times \text{fps}}{\text{compression factor}} = \text{bits/sec}$$

    width ~ pixels (160, 320, 640, 720, 1280, 1920, …)
    height ~ pixels (120, 240, 480, 485, 720, 1080, …)
    depth ~ bits (1, 4, 8, 15, 16, 24, …)
    fps ~ frames per second (5, 15, 20, 24, 30, …)
    compression factor (1, 6, 24, …)

Effects of Compression
Storage for 1 hour of compressed video in megabytes (3 bytes/pixel, 30 frames/sec)

                 1:1       3:1       6:1      25:1    100:1
1920x1080    671,846   223,949   111,974    26,874    6,718
1280x720     298,598    99,533    49,766    11,944    2,986
640x480       99,533    33,178    16,589     3,981      995
320x240       24,883     8,294     4,147       995      249
160x120        6,221     2,074     1,037       249       62

Be Careful...
    MPEG 200:1, JPEG 24:1
    [Figure: analog source vs. digital representation vs. compressed representation]

Another View

Data Rate    Size/Hour
128 Kbs      60 MB
384 Kbs      170 MB
1.5 Mbs      680 MB
3.0 Mbs      1.4 GB
6.0 Mbs      2.7 GB
25 Mbs       11.0 GB

Perceptual Coding
• Encode source signal using lossy compression
    Lossless algorithms typically reduce signal by 3:1
    Must use lossy algorithm to get adequate compression
• Hide errors where humans will not see or hear them
    Study hearing and vision system to understand how we see/hear
    Masking refers to one signal overwhelming/hiding another (e.g., loud siren or bright flash)
    Audio perception is 20 Hz to 20 kHz but most sounds are in low frequencies (e.g., 2 kHz to 4 kHz)
    Visual perception is influenced by edges and low frequencies

What is…
• JPEG - Joint Photographic Experts Group
    Still image compression, intraframe picture technology
    MJPEG is a sequence of images coded with JPEG
• MPEG - Moving Picture Experts Group
    Many standards: MPEG1, MPEG2, and MPEG4
    Very sophisticated technology involving intra- and interframe picture coding and many other optimizations => high quality and cost in time/computation
• H.261/H.263/H.263+ - Video Conferencing
    Low to medium bit rate, quality, and computational cost
    Used in H.320 and H.323 video conferencing standards

Coding Overview
• Digitize
    Subsample to reduce data
• Intraframe compression
    Remove redundancy within frame (spatial compression)
• Interframe compression
    Remove redundancy between frames (temporal compression)
• Symbol coding
    Efficient coding of sequence of symbols

Digitizing
• Modify color space
    24 bit RGB => 15 or 16 bit RGB
    24 bit RGB => YUV (8 bit Y, 4 bit U, 4 bit V)
    24 bit RGB => 8 bit color map
• Encode only 1 field
• Reduce frame rate
    Film is 24 fps, so why encode 30 frames of video?
    Is 20 fps good enough? 18? 15? 12? 8? 4? ...
    Variable frame rate?

Block Transform Encoding
    [Figure: DCT -> Quantize -> Zig-zag -> Run-length Code -> Huffman Code -> 011010001011101...]

Block Encoding

original image
    139  144  149  153
    144  151  153  156
    150  155  160  163
    159  161  162  160

DCT (DC component in the upper-left corner, AC components elsewhere)
    1260   -1  -12   -5
     -23  -17   -6   -3
     -11   -9   -2    2
      -7   -2    0    1

Quantize
    79    0   -1    0
    -2   -1    0    0
    -1   -1    0    0
     0    0    0    0

zigzag
    79 0 -2 -1 -1 -1 0 0 -1 0 0 0 0 0 0 0

run-length code
    run:     0    1    0    0    0    2    0
    value:  79   -2   -1   -1   -1   -1    0

Huffman code
    10011011100011...

coded bitstream < 10 bits (0.55 bits/pixel)

Result of Coding/Decoding

original block
    139  144  149  153
    144  151  153  156
    150  155  160  163
    159  161  162  160

reconstructed block
    144  146  149  152
    148  150  152  154
    155  156  157  158
    160  161  161  162

errors (original minus reconstructed)
     -5   -2    0    1
     -4    1    1    2
     -5   -1    3    5
     -1    0    1   -2

Examples
    [Figure: Uncompressed (262 KB); Compressed (50) (22 KB, 12:1); Compressed (1) (6 KB, 43:1)]

Discrete Cosine Transform

$$F[u,v] = \frac{4\,C(u)\,C(v)}{n^2} \sum_{j=0}^{n-1} \sum_{k=0}^{n-1} f(j,k) \cos\frac{(2j+1)u\pi}{2n} \cos\frac{(2k+1)v\pi}{2n}$$

$$\text{where } C(w) = \begin{cases} \frac{1}{\sqrt{2}} & \text{for } w = 0 \\ 1 & \text{for } w = 1, 2, \ldots, n-1 \end{cases}$$

• Inverse is very similar
• DCT is better at reducing redundancy than the Discrete Fourier Transform but is computationally expensive

DCT vs DFT
    [Figure: original signal; signal recovered from DCT; signal recovered from DFT]

Inter-frame Compression
• Pixel difference with previous frame
    If previous pixel is very similar, skip it
• Send sequence of blocks rather than frames
    If previous block is similar, skip it or send difference
• Motion compensation
    Search around block in previous frame for a better matching block and encode position and error difference
    Search in future frame
    Average block in previous and future frame

Background
• JPEG still image (1 bpp)
    Symmetric codec
• MPEG
    Interleaved audio and video
    Low cost decoder at expense of
    high cost encoder (asymmetric)
• H.26x
    Video conferencing standard (QCIF, CIF, 4CIF, and 16CIF)
    Variable bit rates and coding flexibility

MPEG Standards
• MPEG1 - VHS quality (1992)
    CIF images, 4:2:0 sampling, 1.5 Mbs
    Frame encoding
• MPEG2 - broadcast quality (1994)
    CCIR 601 images, 4:2:2 sampling, 15 Mbs
    Interlaced and progressive scanning
    Frame and field encoding

MPEG Technology
• Picture coding types
• Motion compensation
• Bit rate control
• Picture order

Do MPEG slides HERE
• Frame types
    I, P, B, and Bi frames
• Jargon
    GOP, macroblock, slice
• Motion vectors
    Predicted and interpolated vectors

Compression Formats
• Cinepak (2 bpp)
    Inter-frame, RGB15, software playback
• Motion JPEG (1-4 bpp)
    Intra-frame, DCT+Quantization, good for editing
• MPEG (0.5-2 bpp)
    Inter-frame, DCT+Quantization+Motion Compensation, excellent for playback
• H.261/H.263
    Inter-frame, DCT+Quantization+Motion Compensation, block stream, excellent for conferencing

Quantitative Measures of Quality
• Mean Squared Error

$$\text{MSE} = \frac{1}{n^2} \sum_{i=0}^{n-1} \sum_{j=0}^{n-1} \left[ f(i,j) - f'(i,j) \right]^2$$

    where f(i,j) is the original image and f'(i,j) is the image after compression and decompression
• Perceptual Measures
    Perceptual Distortion Measure (Heeger)
    Picture Quality Scale (Miyahara, et al.)
    Spatial/Temporal Measure (Webster, et al.)

Discussion
• Where are the lossy steps?
    Quantization and subsampling before coding
• How do you choose quantization matrix?
    Standard prescribes matrix based on psychophysics
    Vary quantization by scaling Q matrix (i.e.
    MQUANT)
    Custom designed Q matrix
• Can improve compression by using motion compensation (MC)
• Comparison of standards…
    JPEG - still image, uses fixed Q matrix
    H.261 - video conferencing, uses MC and variable quantization
    MPEG - video playback, uses MC and variable quantization

Discussion
• How to do very low bit rate (28.8 Kbs)?
    Small image, low quality/bit rate, and better coding
    MPEG4 will address this problem
• Gazillions of other proposals…
    Fine, but who needs them?
    Scalable proposals will be useful
• Must be prepared to deliver several formats

H.261 Coding Example
    [Figure: H.261 coding example with numbered blocks]

Other Techniques
• Vector Quantization
    Use small codebook of values
    Slow encode, fast decode
• Fractal Coding
    Fit curves to signal
    Slow encode, fast decode?
• Wavelet Coding
    Better transform than DCT: incorporates both spatial and frequency information in the transform
    Used in most research work and scalable codecs

Scalable Algorithms
• Some applications want to control bandwidth and performance
    Send low quality, low bit rate image on high priority channel and quality improvements on low priority channel(s)
    May be dictated by network bandwidth (wireless), end station (PDA), or application need (video gallery)
• Scalability parameters
    Image size
    Frame rate
    Reconstructed image quality

Pyramid Codes
    [Figure: CodedImage-0 (low quality), CodedImage-1]

MPEG4 Standards Work
• Original goal was very low bit rate coding
    Can do remarkably well at rates as low as 10 bits/sec but coding/decoding time is large
    Hierarchical VQ is popular because of low cost to decode, but trade-off is less coding efficiency
• Recent activity has moved towards programmable decoders
    Idea is to allow code to be downloaded to the decoding chip
• Difficult trade-off between power
  consumption, quality, and coding algorithm

Understanding CODEC Performance

CODEC    Compress           Decompress
JPEG     314 inst/pixel     283 inst/pixel
MPEG1    1120 inst/pixel    60-100 inst/pixel

• Problem is motion vector search in MPEG
• Low bit-rate coding (1.5 Mbs) requires excellent coding

MPEG Coding Performance
• Decoding is easy
    MPEG1 decoding in software on most platforms
    Hardware decoders are widely available ($150/board)
    Windows graphics accelerators with MPEG decoding now entering market (e.g., Matrox, Diamond, …)
• Encoding is expensive
    Sequential software encoders are 20:1 real-time (about 20 times slower than real time)
    Real-time encoders use parallel processing
    Real-time hardware encoders are expensive (e.g., $12K-$50K for MPEG1 and $100K-$500K for MPEG2)
    Hardware-assisted off-line MPEG1 encoders (3:1) used for multimedia authoring at reasonable cost ($5K)

Put decoding/encoding performance slides here

Other Boards
• MJPEG
    DEC J300 (Turbochannel, CL560) costs $3K
    Parallax (Sun, HP, IBM R6K, CL560) costs $4K-$7.5K
    SGI Cosmo (CL560) cost (?)
    Radius VVS (Mac, CL560) costs $3K (?)
    Radius Truevision (PC, (?)) costs (?)
    PC boards (ASL, RealMagic MPEG Editor, IIT, …)
• Proprietary Algorithms
    Intel Smart Video Recorder Pro ($3K, i860)
• Programmable Boards
    Sun VideoPix (CL4K) supports MJPEG, MPEG, CELLB; costs $1.5K
    Chromatics MPACT chip board (mid '96)
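
Worked example for the Video Bit Rate Calculation and Effects of Compression slides. This is a minimal sketch, assuming decimal megabytes (10^6 bytes) and the 3 bytes/pixel, 30 frames/sec stated on the slide. The function names are illustrative and do not come from the slides.

```python
def video_bits_per_second(width, height, depth_bits, fps, compression_factor=1.0):
    """Bit rate = width * height * depth * fps / compression factor."""
    return width * height * depth_bits * fps / compression_factor


def storage_megabytes(width, height, depth_bits, fps, seconds, compression_factor=1.0):
    """Storage in decimal megabytes (10**6 bytes) for a clip of the given length."""
    bits = video_bits_per_second(width, height, depth_bits, fps, compression_factor) * seconds
    return bits / 8 / 1e6


if __name__ == "__main__":
    sizes = [(1920, 1080), (1280, 720), (640, 480), (320, 240), (160, 120)]
    factors = (1, 3, 6, 25, 100)
    # One hour of video at 24 bits/pixel and 30 frames/sec, one row per image size.
    for w, h in sizes:
        row = [storage_megabytes(w, h, 24, 30, 3600, cf) for cf in factors]
        print(f"{w}x{h}".ljust(10), " ".join(f"{mb:>10,.0f}" for mb in row))
```

Under these assumptions the printed rows should line up with the one-hour storage table on the Effects of Compression slide.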
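
Worked example for the Block Encoding slide. The sketch below walks the quantize, zigzag, and run-length steps on the slide's 4x4 block. Two things are assumptions: the row arrangement of the slide's numbers, and the quantization table, which the slides do not give. The table used here is the upper-left 4x4 corner of the example luminance table from the JPEG standard; it was picked because it reproduces the slide's quantized values. Huffman coding is left out.

```python
# DCT coefficients from the Block Encoding slide (row arrangement assumed).
DCT_BLOCK = [
    [1260, -1, -12, -5],
    [-23, -17, -6, -3],
    [-11, -9, -2, 2],
    [-7, -2, 0, 1],
]

# ASSUMPTION: upper-left 4x4 corner of the JPEG example luminance table.
Q_TABLE = [
    [16, 11, 10, 16],
    [12, 12, 14, 19],
    [14, 13, 16, 24],
    [14, 17, 22, 29],
]


def quantize(block, table):
    """Divide each coefficient by its table entry and round to an integer.

    Python's round() sends exact ties to the even integer, so -7/14 = -0.5
    becomes 0 here.
    """
    return [[round(c / q) for c, q in zip(brow, qrow)] for brow, qrow in zip(block, table)]


def zigzag_order(n):
    """Return (row, col) pairs in zigzag order for an n x n block."""
    order = []
    for s in range(2 * n - 1):  # s = row + col indexes one anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:  # even diagonals are walked from bottom-left to top-right
            rows = reversed(rows)
        order.extend((r, s - r) for r in rows)
    return order


def zigzag(block):
    """Flatten a square block into a list in zigzag order."""
    return [block[r][c] for r, c in zigzag_order(len(block))]


def run_length(seq):
    """Encode as (zero run, value) pairs, closed by a (0, 0) end-of-block pair."""
    pairs, run = [], 0
    for value in seq:
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    pairs.append((0, 0))  # trailing zeros collapse into the end-of-block marker
    return pairs


if __name__ == "__main__":
    quantized = quantize(DCT_BLOCK, Q_TABLE)
    scanned = zigzag(quantized)
    print(quantized)
    print(scanned)
    print(run_length(scanned))
```

The run and value rows on the slide correspond to the first and second element of each pair returned by `run_length`.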
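
Worked example for the Mean Squared Error formula, applied to the original and reconstructed blocks from the Result of Coding/Decoding slide. This is a sketch under two assumptions: the row arrangement of the blocks, and the reconstructed value 148 in the second row, which is inferred from the slide's error value of -4 (the extracted slide text shows 156 there).

```python
ORIGINAL = [
    [139, 144, 149, 153],
    [144, 151, 153, 156],
    [150, 155, 160, 163],
    [159, 161, 162, 160],
]

# ASSUMPTION: 148 in row 2 is inferred from the slide's error table.
RECONSTRUCTED = [
    [144, 146, 149, 152],
    [148, 150, 152, 154],
    [155, 156, 157, 158],
    [160, 161, 161, 162],
]


def block_errors(original, decoded):
    """Per-pixel error, original minus decoded, as on the slide."""
    return [[o - d for o, d in zip(orow, drow)] for orow, drow in zip(original, decoded)]


def mean_squared_error(original, decoded):
    """MSE = (1 / n^2) * sum over i, j of [f(i,j) - f'(i,j)]^2 for an n x n block."""
    n = len(original)
    total = sum(e * e for row in block_errors(original, decoded) for e in row)
    return total / (n * n)


if __name__ == "__main__":
    for row in block_errors(ORIGINAL, RECONSTRUCTED):
        print(row)
    print("MSE =", mean_squared_error(ORIGINAL, RECONSTRUCTED))
```

The squared errors in this block sum to 118, so the MSE over the 16 pixels is 7.375.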
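
Illustration for the motion compensation bullet on the Inter-frame Compression slide: search around a block's position in the previous frame for the best matching block, then encode the position. This is a minimal sketch, assuming an exhaustive search and a sum-of-absolute-differences cost. The slides name neither choice, and the function names are hypothetical. The cost of this nested search is the motion vector search problem noted on the CODEC performance slide.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between a block in the current frame
    and the block displaced by (dx, dy) in the reference frame."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def best_match(cur, ref, bx, by, size, radius):
    """Exhaustive search within +/- radius pixels.

    Returns (cost, dx, dy) for the lowest-cost displacement; (dx, dy) is the
    motion vector and the remaining per-pixel difference is the error to encode.
    """
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # Skip candidate blocks that fall outside the reference frame.
            if by + dy < 0 or by + dy + size > height:
                continue
            if bx + dx < 0 or bx + dx + size > width:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best


if __name__ == "__main__":
    # Synthetic 8x8 frames: the current frame is the previous one shifted right by one pixel.
    ref = [[8 * y + x for x in range(8)] for y in range(8)]
    cur = [[ref[y][max(x - 1, 0)] for x in range(8)] for y in range(8)]
    print(best_match(cur, ref, bx=3, by=3, size=2, radius=2))
```

In the synthetic example the content moved one pixel to the right, so the best match for the block sits one pixel to the left in the previous frame.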