MPEG-4
• Objective
  • Standardize algorithms for audiovisual coding in multimedia applications, allowing for
    • Interactivity
    • High compression
    • Scalability of audio and video content
    • Support for natural and synthetic audio and video
• The Idea
  • An audiovisual scene is a coded representation of audiovisual objects related in space and time

MPEG-4: Scenario
• A/V object
  • A video object within a scene
  • The background
  • An instrument or voice
  • Coded independently
• A/V scene
  • Mixture of natural or synthetic objects
  • Individual bitstreams multiplexed and transmitted
  • One or more channels
  • Each channel may have its own quality of service

MPEG-4: Video Object Plane
• Video frame = sum of segmented regions with arbitrary shape (VOPs)
• Shape, motion, and texture information of VOPs belonging to the same video object is encoded into a video object layer (VOL)
• Encode
  • VOL identifiers
  • Composition information
  • Overlapping configuration of VOPs

MPEG-4: Coding
• Shape coding
  • Shape information in alpha planes
  • Transparency of shape encoded
  • Inter and intra shape coding functions
  • After shape coding, each VOP in a VO is partitioned into non-overlapping macroblocks
• Motion coding
  • Shift parameter with respect to the reference window
  • Standard macroblock
  • Contour macroblock

MPEG-4: Coding
• Texture coding
  • Intra-VOPs and residual errors from motion compensation are DCT coded, as in MPEG-1
  • 4 luminance and 2 chrominance blocks in a macroblock
  • P-VOPs (prediction error blocks) may not conform to the VOP boundary
    • Pixels outside the active area are set to a constant value
  • Standard compression
  • Efficient prediction of DC and AC components from intra and inter coded blocks
• Multiplexing
  • Shape, motion, and texture coded data
  • Motion and DCT coefficients can be jointly (H.263) or individually coded

MPEG-4 Video Object Segmentation-I
• Construct a video object
  • User selects a start frame and outlines a polygon designating the rough object boundary
  • Refine the boundary using a snake algorithm, if needed
  • Compute a k-pixel bounding box around the object
  • Within the bounding box compute
    • Edge map: bit plane obtained by thresholding the output of a convolution kernel
    • Color map: compute luminance and chrominance, quantize by k-means clustering, keep the quantization table
    • Motion field: block-based motion vectors
  • Segment into regions with no significant edges, smooth color, and smooth motion
  • Intersect the segments with the initial object boundary and determine foreground and background regions
  • Estimate the motion of regions in the next frame with an affine motion model

MPEG-4 Video Object Segmentation-II
• Track object
  • Locate the estimated position of foreground and background regions from the previous frame. Call this the object mask.
  • Generate the same three feature maps with the quantization table; requantize if the error is large
  • Classify regions into foreground/background and new regions
  • Intersection ratio r with object mask
  • For foreground regions, if r > 80% OR foreground mask, mark as foreground; label foreground - mask as new
  • For new regions, if r < 30% mark as new; if r > 80% mark as foreground; else find nearest-motion-similar neighbor. If it is in the foreground, do previous step, else keep region as new
  • Iterate until stable
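
The threshold test for new regions in the tracking step can be sketched as follows. This is a minimal illustration, not the method's actual implementation. It assumes that r is the fraction of a region's pixels that fall inside the object mask (the slides do not define r), and that regions and the mask are represented as sets of (row, col) pixel coordinates. The names `intersection_ratio` and `classify_new_region` are hypothetical. The in-between case, which the slides resolve by looking at the nearest motion-similar neighbor, is only reported as "ambiguous" here.

```python
def intersection_ratio(region, mask):
    """Fraction of the region's pixels that lie inside the object mask.

    Assumption: r = |region ∩ mask| / |region|; the slides do not define r.
    """
    if not region:
        return 0.0
    return len(region & mask) / len(region)


def classify_new_region(region, mask, low=0.30, high=0.80):
    """Apply the 30% / 80% thresholds from the slide to a new region.

    Returns "new", "foreground", or "ambiguous". The ambiguous case is where
    the slides defer to the nearest motion-similar neighbor, which this
    sketch does not implement.
    """
    r = intersection_ratio(region, mask)
    if r < low:
        return "new"
    if r > high:
        return "foreground"
    return "ambiguous"


# Hypothetical object mask: a 4x4 block of pixels in the top-left corner.
mask = {(row, col) for row in range(4) for col in range(4)}

inside = {(0, 0), (0, 1), (1, 0), (1, 1), (1, 2)}        # r = 1.0
mostly_outside = {(0, 0), (10, 10), (10, 11), (10, 12)}  # r = 0.25
half = {(0, 0), (0, 1), (9, 8), (9, 9)}                  # r = 0.5

print(classify_new_region(inside, mask))
print(classify_new_region(mostly_outside, mask))
print(classify_new_region(half, mask))
```

In a fuller version, the "ambiguous" result would trigger the neighbor lookup, and the whole classification would be repeated until no label changes, matching the "iterate until stable" step.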