Video Special Effects
Wen-Hung Liao
10/3/2006

Outline
- Hardware-based video special effects
- Software-based video special effects
- Video content analysis

Hardware-based VFX
- Matrox RT.X2: http://www.matrox.com/video/ct/home.cfm
- Real-time multi-layer workflows in HD and SD: Designed primarily for real-time native HDV and DV editing, Matrox RT.X2 also provides a high-quality MPEG-2 4:2:2 I-frame codec so you can capture other HD and SD formats using RT.X2's analog inputs, and mix all types of footage on the timeline in real time.
- Where to purchase?
  - http://www.voxelvision.com.tw/
  - http://www.avideo.com.tw/

Real-time CPU Effects
- Realtime primary color correction
- Realtime secondary color correction
- Realtime chroma and luma keying
- Realtime speed changes
- Realtime transitions
- Realtime track matte
- Realtime move & scale
- Realtime SD clip upscaling in an HD timeline
- Realtime HD clip downscaling in an SD timeline
- Native Adobe Premiere Pro effects

Real-time GPU Effects
- Realtime Adobe Motion effect
- Realtime advanced 2D/3D DVE
- Realtime shadow
- Realtime blur/glow/soft focus
- Realtime page curl
- Realtime surface finish
- Realtime pan & scan
- Realtime mask
- Realtime mask blur
- Realtime mask mosaic
- Realtime four-corner pin
- Accelerated shine
- Native Adobe Premiere Pro transitions
- Realtime crystallize
- Realtime lens flare
- Realtime old movie effect

Graphics Processing Unit (GPU)
- A Graphics Processing Unit or GPU (also occasionally called Visual Processing Unit or VPU) is a dedicated graphics rendering device for a personal computer, workstation, or game console.
- Modern GPUs are very efficient at manipulating and displaying computer graphics, and their highly parallel structure makes them more effective than typical CPUs for a range of complex algorithms.

GPU Operations
- A GPU implements a number of graphics primitive operations in a way that makes running them much faster than drawing directly to the screen with the host CPU.
- The most common operations for early 2D computer graphics include the BitBLT operation (combine two bitmap patterns using a RasterOp), usually in special hardware called a "blitter", and operations for drawing rectangles, triangles, circles, and arcs.
- Modern GPUs also have support for 3D computer graphics, and typically include digital video-related functions as well.

Applications: Example 1
- OpenVIDIA: GPU-accelerated Computer Vision Library, http://openvidia.sourceforge.net/
- The OpenVIDIA project implements computer vision algorithms on computer graphics hardware, using OpenGL and Cg.
- The project provides useful example programs which run real-time computer vision algorithms on single or parallel graphics processing units.

Applications: Example 2
- Real-time stereo using GPU
- “... Since the GPU is built to process images it is particularly well suited to perform some computer vision and image processing algorithms very efficiently. We developed a real-time stereo algorithm that runs on the GPU and is several times faster than most CPU-based implementations.”

Software-based Video Special Effects
- Examples:
  - EffecTV: http://effectv.sourceforge.net/
  - FreeFrame: http://freeframe.sourceforge.net/gallery.html

RGB/YUV Conversion
- http://www.fourcc.org/index.php?http%3A//www.fourcc.org/intro.php
- RGB to YUV Conversion
    Y      =  (0.257 * R) + (0.504 * G) + (0.098 * B) + 16
    Cr = V =  (0.439 * R) - (0.368 * G) - (0.071 * B) + 128
    Cb = U = -(0.148 * R) - (0.291 * G) + (0.439 * B) + 128
- YUV to RGB Conversion
    B = 1.164(Y - 16) + 2.018(U - 128)
    G = 1.164(Y - 16) - 0.813(V - 128) - 0.391(U - 128)
    R = 1.164(Y - 16) + 1.596(V - 128)

Types of Special Effects
- Applying to the whole image frame
- Applying to part of the image (edges, moving pixels, …)
- Applying to a collection of frames (framebuffer)
- Applying to detected areas
- Overlaying virtual objects:
  - at pre-determined locations
  - in response to user’s position

Compressed-Domain Processing
- Video special effects editing in MPEG-2 compressed video
  Fernando, W.A.C.; Canagarajah, C.N.; Bull, D.R.
- Fade, dissolve and wipe production in MPEG-2 compressed video
  Fernando, W.A.C.; Canagarajah, C.N.; Bull, D.R.

Video Content Analysis
- Event detection
- For indexing/searching
- To obtain high-level semantic description of the content.

Image Databases
- Problem: accessing and searching large databases of images, videos and music
- Traditional solutions: file IDs, keywords, associated text.
- Problems:
  - can’t query based on visual or musical properties
  - depends on the particular vocabulary used
  - doesn’t provide queries by example
  - time consuming
- Solution: content-based retrieval using automatic analysis tools (see http://wwwqbic.almaden.ibm.com)

Retrieval of images by similarity
- Components:
  - Extraction of features or image signatures and efficient representation and storage
  - A set of similarity measures
  - A user interface for efficient and ordered representation of retrieved images and to support relevance feedback
- Considerations:
  - Many definitions of similarity are possible
  - User interface plays a crucial role
  - Visual content-based retrieval is best utilized when combined with traditional search

Image features for similarity definition
- Color similarity
  - Similarity: e.g., “distance” between color histograms
  - Should use perceptually meaningful color spaces (HSV, Lab...)
  - Should be relatively independent of illumination (color constancy)
  - Locality: “find a red object such as this one”
- Texture similarity
  - Texture feature extraction (statistical models)
  - Texture qualities: directionality, roughness, granularity...
- Shape similarity
  - Must distinguish between similarity between actual geometrical 2-D shapes in the image and underlying 3-D shape
  - Shape features: circularity, eccentricity, principal axis orientation...
- Spatial similarity
  - Assumes images have been (automatically or manually) segmented into meaningful objects (symbolic image)
  - Considers the spatial layout of the objects in the scene
- Object presence analysis
  - Is this particular object in the image?
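
The color-similarity idea above (a “distance” between color histograms, computed in a perceptually meaningful color space) can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names `hue_histogram` and `histogram_distance`, the choice of 8 bins, and the use of hue alone (ignoring saturation and value) are hypothetical simplifications and do not come from the slides or from any particular retrieval system.

```python
import colorsys


def hue_histogram(pixels, bins=8):
    """Normalized hue histogram of an iterable of (R, G, B) tuples in 0..255.

    Assumption: only the hue channel of HSV is binned, which makes the
    signature largely insensitive to brightness changes but also maps
    grey pixels (undefined hue) into the first bin.
    """
    hist = [0.0] * bins
    count = 0
    for r, g, b in pixels:
        h, _s, _v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        # h lies in [0, 1]; clamp so that h == 1.0 falls in the last bin.
        hist[min(int(h * bins), bins - 1)] += 1.0
        count += 1
    if count == 0:
        raise ValueError("image has no pixels")
    return [x / count for x in hist]


def histogram_distance(h1, h2):
    """Half the L1 distance between two normalized histograms (range 0..1)."""
    if len(h1) != len(h2):
        raise ValueError("histograms must have the same number of bins")
    return 0.5 * sum(abs(a - b) for a, b in zip(h1, h2))


# Tiny synthetic "images" given as flat lists of RGB pixels.
bright_red = [(255, 0, 0)] * 4
dark_red = [(128, 0, 0)] * 4
blue = [(0, 0, 255)] * 4

d_same_hue = histogram_distance(hue_histogram(bright_red), hue_histogram(dark_red))
d_diff_hue = histogram_distance(hue_histogram(bright_red), hue_histogram(blue))
```

In this toy case the bright and dark red images get identical hue histograms, while red versus blue gives the maximum distance. A practical system would probably bin saturation and value as well, or use a space such as Lab, and would handle locality by computing histograms per region rather than per image.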

Main components of retrieval system
- Database population: images and videos are processed to extract features (color, texture, shape, camera and object motion)
- Database query: user composes query via graphical user interface. Features are generated from the graphical query and input to the matching engine
- Relevance feedback: automatically adjusts the existing query using information fed back by the user about the relevance of previously retrieved objects

Video parsing and representation
- Interaction with video using conventional VCR-like manipulation is difficult; structural video analysis needs to be introduced
- Video parsing:
  - Temporal segmentation into elemental units
  - Compact representation of elemental unit

Temporal segmentation
- Fundamental unit of video manipulation: video shots
- Types of transition between shots:
  - Abrupt shot change
  - Fades: slow change in brightness
  - Dissolve
  - Wipe: pixels from the second shot replace those of the previous shot in regular patterns
- Other factors of image change:
  - Motion, including camera motion and object motion
  - Luminosity changes and noise

Representation of Video
- Video database population has three major components:
  - Shot detection
  - Representative frame creation for each shot
  - Derivation of layered representation of coherently moving structures/objects
- A representative frame (R-frame) is used for:
  - population: R-frame is treated as a still image for representation
  - query: R-frames are basic units initially returned in video query
- Choice of R-frame:
  - first, middle, or last frame in the video shot
  - sprite built by seamlessly mosaicing all frames in a shot

Video soundtrack analysis
- Image/sound relationships are critical to the perception and understanding of video content.
- Possibilities:
  - Detection and representation of speech, music and Foley sound
  - Speaker identification and retrieval
  - Word spotting and labeling (speech recognition)
- A possible query could be: “find the next time this speaker is again present in this soundtrack”

Video scene analysis
- 500-1000 shots per hour in typical movies
- One level above shot: sequence or scene (a series of consecutive shots constituting a unit from the narrative point of view)
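
Since scene analysis builds on shots, it may help to see how the simplest case of temporal segmentation, an abrupt shot change, could be detected. The sketch below compares grey-level histograms of consecutive frames and reports a cut when the difference exceeds a threshold. The names `gray_histogram` and `detect_cuts`, the 16 bins, and the threshold of 0.5 are hypothetical choices for illustration, not values from the slides.

```python
def gray_histogram(frame, bins=16):
    """Normalized histogram of a frame given as a flat list of 0..255 grey levels."""
    if not frame:
        raise ValueError("frame has no pixels")
    hist = [0.0] * bins
    for value in frame:
        hist[min(value * bins // 256, bins - 1)] += 1.0
    return [x / len(frame) for x in hist]


def detect_cuts(frames, threshold=0.5, bins=16):
    """Return the indices of frames that start a new shot.

    Assumption: a cut is declared when half the L1 distance between the
    histograms of consecutive frames exceeds `threshold` (range 0..1).
    """
    cuts = []
    previous = None
    for index, frame in enumerate(frames):
        current = gray_histogram(frame, bins)
        if previous is not None:
            difference = 0.5 * sum(abs(a - b) for a, b in zip(previous, current))
            if difference > threshold:
                cuts.append(index)
        previous = current
    return cuts


# Three dark frames followed by two bright frames: one abrupt change.
dark = [10, 12, 14, 11]
bright = [240, 250, 245, 242]
cuts = detect_cuts([dark, dark, dark, bright, bright])
```

A single fixed threshold like this is only suited to abrupt changes. Fades, dissolves and wipes spread the change over many frames, and camera or object motion and luminosity changes can cause false detections, so a real detector would likely need more elaborate tests.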