Digital Image Processing – Fall 2010
Prof. Dmitry Goldgof

Digital Video Processing
Matthew Shreve
Computer Science and Engineering, University of South Florida
mshreve@cse.usf.edu

Outline
• Basics of Video
• Digital Video
• MPEG
• Summary

Basics of Video
• Static scene capture → Image
• Bring in motion → Video
• Image sequence: a 3-D signal
  – 2 spatial dimensions & 1 time dimension
  – Continuous I(x, y, t) → discrete I(m, n, tk)

Video Camera
• Frame-by-frame capturing
• CCD sensors (Charge-Coupled Devices)
  – 2-D array of solid-state sensors
  – Each sensor corresponds to a pixel
  – Values are stored in a buffer and sequentially read out
  – Widely used

Progressive vs. Interlaced Video
• Progressive
  – Every pixel on the screen is refreshed in order (monitors) or simultaneously (film)
• Interlaced
  – The screen is refreshed twice per frame: the electron gun at the back of a CRT lights the phosphors on the even-numbered rows of pixels first, then the odd-numbered rows
  – The NTSC frame rate of 29.97 fps means the screen is redrawn 59.94 times a second
  – In other words, 59.94 half-frames per second, i.e. 59.94 fields per second

Progressive vs. Interlaced Video
• How interlaced video can cause problems
  – Suppose you resize a 720 x 480 interlaced video to 576 x 384 (a 20% reduction)
  – How does resizing work?
    • It takes a sample of the pixels from the original source and blends them together to create the new pixels
  – With interlaced video, you might end up blending scan lines from two completely different images!

[Figure: image at full 720 x 480 resolution; observe the distinct scan lines]
[Figure: the same image resized to 576 x 384; some scan lines blended together!]

DIGITAL VIDEO

Why Digital?
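The pitfall above comes from the two fields of an interlaced frame capturing two different instants. A minimal NumPy sketch of the safe first step, splitting the fields before any scaling (a toy illustration with synthetic data, not a full deinterlacer):

```python
import numpy as np

def split_fields(frame):
    """Split an interlaced frame (H x W) into its two fields.

    Even-numbered rows form one field, odd-numbered rows the other.
    Each field is a half-height image from a single moment in time,
    so it can be resized on its own without blending scan lines
    from two different pictures.
    """
    top_field = frame[0::2, :]     # even-numbered rows
    bottom_field = frame[1::2, :]  # odd-numbered rows
    return top_field, bottom_field

# A synthetic 480-line "frame" standing in for one NTSC-sized image.
frame = np.arange(480 * 720).reshape(480, 720)
top, bottom = split_fields(frame)
```

Each half-height field could then be resized independently and re-interleaved, which is essentially what a field-aware resizer does.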
• “Exactness”
  – Exact reproduction without degradation
  – Accurate duplication of processing results
• Convenient & powerful computer-aided processing
  – Rather sophisticated processing can be performed in hardware or software
• Easy storage and transmission
  – One DVD can store a three-hour movie!
  – High-quality video can be transmitted over a network in reasonable time

Digital Video Coding
• The basic idea is to remove redundancy in the video and encode what remains
• Perceptual redundancy
  – The Human Visual System is less sensitive to color and to high frequencies
• Spatial redundancy
  – Pixels in a neighborhood have similar luminance levels (low frequency)
• How about temporal redundancy?
  – Differences between subsequent frames can be small. Shouldn’t we exploit this?

Hybrid Video Coding
• “Hybrid” ~ a combination of spatial, perceptual, and temporal redundancy removal
• Issues to be handled
  – Not all regions are easily inferable from the previous frame
    • Occlusion ~ solved by backward prediction, using future frames as reference
    • The decision of whether or not to use prediction is made adaptively
  – Drifting and error propagation
    • Solved by encoding reference regions or frames at constant intervals of time
  – Random access
    • Solved by encoding a frame without prediction at constant intervals of time
  – Bit allocation
    • According to statistics
    • Constant and variable bit-rate requirements
• MPEG combines all of these features!

MPEG
• MPEG – Moving Pictures Experts Group
  – Coding of moving pictures and associated audio
• Picture part
  – Can achieve a compression ratio of about 50:1 by storing only the difference between successive frames
  – Even higher compression ratios are possible

Bit Rate
• Defined in two ways
  – Bits per second (all inter-frame compression algorithms)
  – Bits per frame (most intra-frame compression algorithms, except DV and MJPEG)
• What does this mean?
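The temporal-redundancy idea can be made concrete with a toy frame-difference sketch (synthetic data; real codecs predict with motion compensation rather than raw subtraction, but the point is the same: when little changes, the residual is almost all zeros and compresses far better than the raw frame):

```python
import numpy as np

def frame_difference(prev, curr):
    """Temporal prediction in its simplest form: encode only the
    signed change from the previous frame."""
    return curr.astype(np.int16) - prev.astype(np.int16)

# Two synthetic frames: a flat background plus a small 8x8 object
# that appears in the second frame.
prev = np.zeros((64, 64), dtype=np.uint8)
curr = prev.copy()
curr[10:18, 10:18] = 200

residual = frame_difference(prev, curr)
changed = np.count_nonzero(residual)  # only 64 of the 4096 pixels differ
```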
  – If you encode something in MPEG and specify a bit rate of 1.5 Mbps, it takes the same amount of space no matter what the frame rate is; a lower frame rate will look sharper but less smooth
  – If you do the same with a codec like Huffyuv or Intel Indeo, you will get the same image quality at every frame rate, but the smoothness and file size will change as the frame rate changes

MPEG-1 Compression Aspects
• Lossless and lossy compression are both used to achieve a high compression rate
• Down-sampled chrominance
  – Perceptual redundancy
• Intra-frame compression
  – Spatial redundancy
  – Correlation/compression within a frame
  – Based on the “baseline” JPEG compression standard
• Inter-frame compression
  – Temporal redundancy
  – Correlation/compression between similar frames
• Audio compression
  – Three different layers (MP3)

Perceptual Redundancy
[Figure: the same image represented at 8, 7, 6, 5, and 4 bits per pixel]
• It is clear that we don’t need all of these bits!
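The bit-depth experiment on those slides is easy to reproduce: dropping least-significant bits is a crude exploitation of perceptual redundancy. A minimal sketch (synthetic ramp image; real codecs quantize in the frequency domain instead, as the next slides show):

```python
import numpy as np

def reduce_bit_depth(img, bits):
    """Keep only the `bits` most-significant bits of each 8-bit pixel.

    Shifting right discards the low-order bits; shifting back left
    keeps the values on the original 0-255 scale for display.
    """
    shift = 8 - bits
    return (img >> shift) << shift

img = np.arange(256, dtype=np.uint8)  # a grey ramp, all 256 levels
img4 = reduce_bit_depth(img, 4)       # only 16 distinct levels remain
levels = len(np.unique(img4))
```

At 4 bits per pixel the 256 grey levels collapse to 16, yet as the slides show, natural images often remain quite watchable.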
  – Our previous example illustrated the eye’s limited sensitivity to luminance levels
• We can build a perceptual model
  – Give more importance to what is perceivable by the Human Visual System
  – Usually this is a function of spatial frequency

Fundamentals of JPEG
• Encoder: DCT → Quantizer → Entropy coder → compressed image data
• Decoder: Entropy decoder → Dequantizer → IDCT
• JPEG works on 8×8 blocks
  – Extract an 8×8 block of pixels
  – Convert it to the DCT domain
  – Quantize each coefficient
    • A different step size for each coefficient, based on the sensitivity of the human visual system
  – Order the coefficients in zig-zag order
    • Similar frequencies are grouped together
  – Run-length encode the quantized values, then apply Huffman coding to what is left

Random Access and Inter-frame Compression
• Temporal redundancy
  – Only repeatedly encode the parts of a picture frame that are rapidly changing
  – Do not repeatedly encode background elements and still elements
• Random access capability
  – Prediction must not depend on the user accessing the first frame (skipping through movie scenes, arbitrary-point pick-up)

[Figure: sample 2-D motion field; target frame, anchor frame, motion field]
[Figure: 2-D motion corresponding to camera motion; camera zoom, camera rotation around the Z-axis (roll)]

General Considerations for Motion Estimation
• Two categories of approaches:
  – Feature based (more often used in object tracking and 3-D reconstruction from 2-D)
  – Intensity based, built on the constant-intensity assumption (more often used for motion-compensated prediction, as required in video coding and frame interpolation)
• Three important questions
  – How to represent the motion field?
  – What criteria to use to estimate the motion parameters?
  – How to search for the motion parameters?

Motion Representation
• Global: the entire motion field is represented by a few global parameters
• Pixel-based: one motion vector (MV) at each pixel, with a smoothness constraint between adjacent MVs
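The block pipeline described above can be sketched in NumPy. This is a toy illustration: the orthonormal DCT matrix and zig-zag scan follow the standard definitions, but the flat quantizer step of 16 is a stand-in for JPEG's perceptual quantization tables, and entropy coding is omitted:

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II basis matrix (the transform JPEG applies
    to each 8x8 block). Row k holds the k-th cosine basis vector."""
    k = np.arange(n)
    C = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    C[0, :] *= 1 / np.sqrt(2)           # DC row normalization
    return C * np.sqrt(2 / n)

def zigzag_order(n=8):
    """Index pairs of an n x n block in zig-zag scan order, grouping
    coefficients of similar frequency (constant i + j) together."""
    return sorted(((i, j) for i in range(n) for j in range(n)),
                  key=lambda p: (p[0] + p[1],
                                 p[0] if (p[0] + p[1]) % 2 else p[1]))

C = dct_matrix()
block = np.full((8, 8), 128.0)          # a flat, purely low-frequency block
coeffs = C @ (block - 128.0) @ C.T      # level shift, then 2-D DCT
q = np.round(coeffs / 16)               # uniform quantizer (toy step size)
scan = [q[i, j] for i, j in zigzag_order()]
```

For this flat block every quantized coefficient is zero, which is exactly why run-length coding of the zig-zag scan pays off: smooth regions reduce to long runs of zeros.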
• Block-based: the entire frame is divided into blocks, and the motion in each block is characterized by a few parameters
• Region-based: the entire frame is divided into regions, each corresponding to an object or sub-object with consistent motion, represented by a few parameters
• Also mesh-based (flow at mesh corners, interpolated inside)

[Figure: half-pel Exhaustive Block Matching Algorithm (EBMA); anchor frame, target frame, motion field, predicted target frame]
[Figure: three-level hierarchical block matching algorithm]
[Figure: EBMA vs. mesh-based motion estimation]

Motion Compensated Prediction
• Divide the current frame, i, into disjoint 16×16 macroblocks
• Search a window in the previous frame, i-1, for the closest match
• Calculate the prediction error
• For each of the four 8×8 blocks in the macroblock, perform DCT-based coding
• Transmit the motion vector + the entropy-coded prediction error (lossy coding)

MPEG-1 Video Coding
• Most MPEG-1 implementations use a large number of I frames to ensure fast access
  – Somewhat low compression ratio by itself
• For predictive coding, P frames depend on only a small number of past frames
  – Using fewer past frames reduces the propagated error
• To further enhance compression in an MPEG-1 file, a third frame type is introduced: the “B” (bi-directional) frame
  – B frames are encoded using predictive coding from only two other frames: a past frame and a future frame
  – Looking at both the past and the future helps reduce the prediction error due to rapid changes from frame to frame (e.g.
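The exhaustive block matching idea can be sketched directly. This is a brute-force, integer-pel toy version on synthetic data; practical encoders add half-pel refinement and hierarchical search as the examples above note, and the block/search sizes here are illustrative choices:

```python
import numpy as np

def ebma(anchor, target, block=8, search=4):
    """Exhaustive block matching: for each block x block tile of the
    target frame, scan every displacement in a +/-search window of the
    anchor frame and keep the one minimizing the sum of absolute
    differences (SAD). Returns {(row, col) of block: (dy, dx)}."""
    h, w = target.shape
    motion = {}
    for by in range(0, h - block + 1, block):
        for bx in range(0, w - block + 1, block):
            cur = target[by:by + block, bx:bx + block].astype(np.int32)
            best, best_mv = None, (0, 0)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    y, x = by + dy, bx + dx
                    if y < 0 or x < 0 or y + block > h or x + block > w:
                        continue  # candidate block falls outside the frame
                    ref = anchor[y:y + block, x:x + block].astype(np.int32)
                    sad = int(np.abs(cur - ref).sum())
                    if best is None or sad < best:
                        best, best_mv = sad, (dy, dx)
            motion[(by, bx)] = best_mv
    return motion

# Synthetic test: the target frame is the anchor shifted by (2, 3) pixels,
# so interior blocks should report the displacement (-2, -3) back to it.
rng = np.random.default_rng(0)
anchor = rng.integers(0, 256, (16, 16), dtype=np.uint8)
target = np.roll(anchor, (2, 3), axis=(0, 1))
mv = ebma(anchor, target)
```

The triple loop makes the cost of exhaustive search obvious, which is exactly why the hierarchical variant shown in the examples exists.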
a fight scene or a fast-action scene)

Predictive Coding Hierarchy: I, P and B Frames
• I frames (black) do not depend on any other frame and are encoded separately
  – Called “anchor frames”
• P frames (red) depend on the last P frame or I frame (whichever is closer)
  – Also called “anchor frames”
• B frames (blue) depend on two frames: the closest past P or I frame, and the closest future P or I frame
  – B frames are NOT used to predict other B frames; only P frames and I frames are used for predicting other frames

MPEG-1 Temporal Order of Compression
• I frames are generated and compressed first
  – They have no frame dependence
• P frames are generated and compressed second
  – They depend only on the past I frame values
• B frames are generated and compressed last
  – They depend on surrounding frames
  – Forward prediction is needed

Adaptive Predictive Coding in MPEG-1
• Coding each block in a P-frame
  – Predictive block, using the previous I/P frame as reference
  – Intra-block ~ encode without prediction
    • Use this if prediction costs more bits than non-prediction
    • Good for occluded areas
    • Can also avoid error propagation
• Coding each block in a B-frame
  – Intra-block ~ encode without prediction
  – Predictive block
    • Use the previous I/P frame as reference (forward prediction)
    • Or use a future I/P frame as reference (backward prediction)
    • Or use both for prediction

MPEG Library
• The MPEG Library is a C library for decoding MPEG-1 video streams and dithering them to a variety of color schemes
• Most of the code in the library comes directly from an old version of the Berkeley MPEG player (mpeg_play)
• The library can be downloaded from http://starship.python.net/~gward/mpeglib/mpeg_lib-1.3.1.tar.gz
• It works well on all modern Unix and Unix-like platforms with an ANSI C compiler. I have tested it on “grad”.
• NOTE: this is not the best library available, but it works well for MPEG-1 and is fairly easy to use.
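The compression-order rule above (anchors before the B frames that reference them) can be illustrated with a hypothetical GOP string. This is a sketch of the reordering logic only, not an operation on real MPEG bitstreams:

```python
def coding_order(display):
    """Reorder a display-order GOP so that every B frame comes after
    both of its anchors (the nearest preceding and following I/P frame).

    `display` is a string like 'IBBPBBP' in display order; the result
    labels each frame with its display index, e.g. 'P3'.
    """
    order, pending_b = [], []
    for i, f in enumerate(display):
        if f in 'IP':            # anchor frame: emit it, then the B frames
            order.append(i)      # that were waiting for this future anchor
            order.extend(pending_b)
            pending_b = []
        else:                    # B frame: must wait for its future anchor
            pending_b.append(i)
    order.extend(pending_b)      # trailing Bs with no future anchor here
    return [display[i] + str(i) for i in order]

seq = coding_order('IBBPBBP')
# seq is ['I0', 'P3', 'B1', 'B2', 'P6', 'B4', 'B5']: each P jumps ahead
# of the B frames that are displayed before it but depend on it.
```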
If you are inquisitive, you should check the MPEG Software Simulation Group at http://www.mpeg.org/MPEG/MSSG/ where you can find a free MPEG-2 video coder/decoder.

MPEGe Library
• The MPEGe(ncoding) Library is designed to let you create MPEG movies from your application
• The library can be downloaded from the files section of http://groups.yahoo.com/group/mpegelib/
• The encoder library uses the Berkeley MPEG encoder engine, which handles all the complexities of MPEG streams
• As was the case with the decoder, this library can write only one MPEG movie at a time
• The library works well with most common image formats
  – To keep things simple, we will stick to PPM

MPEGe Library Functions
• The library consists of 3 simple functions
  – MPEGe_open: initializes the encoder
  – MPEGe_image: called each time you want to add a frame to the sequence. The format of the image pointed to by image is that used by the SDSC Image library
    • SDSC is a powerful library which allows you to read/write 32 different image types and also contains functions to manipulate them. The source code as well as pre-compiled binaries can be downloaded at ftp://ftp.sdsc.edu/pub/sdsc/graphics/
  – MPEGe_close: called to end the MPEG sequence. This function resets the library to a sane state, writes the MPEG end sequences, and closes the output file
• Note: all functions return non-NULL (i.e. TRUE) on success and zero (i.e. FALSE) on failure.
Usage Details
• You are not required to write code using the libraries to decode and encode MPEG streams
• Copy the binary executables from
  – http://www.csee.usf.edu/~mshreve/readframes
  – http://www.csee.usf.edu/~mshreve/encodeframes
• Usage
  – To read frames from an MPEG movie (say test.mpg) and store them in a directory extractframes (relative to your current working directory) with the filename prefix testframe:
    • readframes test.mpg extractframes/testframe
    • This will decode all the frames of test.mpg into the directory extractframes with the filenames testframe0.ppm, testframe1.ppm, …
  – To encode:
    • encodeframes 0 60 extractframes/testframe testresult.mpg
    • This will encode images testframe0.ppm through testframe60.ppm from the directory extractframes into testresult.mpg
• To convert between the PPM and PGM formats, copy the script from
  – http://www.csee.usf.edu/~mshreve/batchconvert