Mid-level Representations for Images and Videos
Sylvain Paris, Adobe
with Jiawen Chen, Michael Cohen, Frédo Durand, Wojciech Matusik, Jue Wang

Making good videos is hard, much harder than taking good photos.
Better tools will help.
Today's focus: How to represent video data to build better algorithms.

Outline
• Mid-level representations
• Bilateral grid
• Video mesh

Low-level Representation
R=255 G=50 B=35

High-level Representation
building, car, street

Mid-level Information
texture, smooth, edge

Mid-level Representations
• Data are indexed by (x,y,t,r,g,b)
• (x,y,t): location. For video, it's a grid.
• (r,g,b): color. That is where the information is.
– Many representations exist: HSV, Lab, [Kim 09]...
• We are going to look at (x,y,t,r,g,b).

Example: the Edge-aware Brush
• Classical paint brush
– Ignores edges
• Our edge-aware brush
– Respects edges
[Figure: input image; stroke with classical brush; stroke with bilateral brush]

Bilateral Grid – Definition
• Bilateral grid = 3D array
– x and y correspond to pixel position
– z corresponds to pixel intensity
– Euclidean distance accounts for edges: space distance (x,y) and intensity distance (z)
• Grid can be coarsely sampled
– E.g., 70 x 70 x 10 for an 8 megapixel image
[Figure: Standard 2D Image (axes x, y); Bilateral Grid]

2D vs. Bilateral Grid Downsampling
• Bilateral grid enables aggressive downsampling
• Extra dimension preserves edges
• Downsampling in 2D:
– Nearest neighbor: arbitrarily bright or dark
– Bicubic: intermediate value not in original
[Figure: Downsampling in 2D (axes x, y); Bilateral Grid]

Bilateral Grid Painting
• When mouse is held down, paint only at intensity level of initial mouse click
• Edge-aware brush used to change hue
[Figure: Input image; Bilateral Grid]

Bilateral Filter [Tomasi 98]
• Smooth image except across strong edges
• Ubiquitous in computational photography [Oh 01, Durand 02, Eisemann 04, Petschnigg 04, Bennett 05, Bae 06, Fattal 07, Kopf 07, …]
[Figure: Input; Filtered Output]

Bilateral Filter on the Bilateral Grid
[Figure: image scanline; bilateral grid (axes: space, intensity); Gaussian blur grid values; Slice: query grid with input image intensity; filtered scanline]
More than 100 Hz at HD video resolution using GPU.
[Video]

Many Operations and Applications
• Local histogram equalization
• Interactive tone mapping
• Video abstraction [Winnemoller 06, DeCarlo 02]
• Segmentation [Paris 07]
• Photographic style transfer [Bae 06]
[Figure: Model; Input; Output]

Works great frame by frame if no noise.
What if there is noise?
Not everything can be done frame by frame, e.g. segmentation.

Frame by Frame
• Consider only the current frame.
• Do not exploit all the available information.
[Figure: frames labeled "unused"]

Volumetric, Off-line
• Good if we know the entire sequence in advance.
• Time is the same as x and y.
– Use a time Gaussian.

Causal, On-line
• Good for real time
– recursive algorithm
• Formal equivalence between exp. decay and Gaussian [ECCV 08]
[Figure: frames labeled "fixed" and "unused"]
[Video]

Summary on Bilateral Grid
• Image data as a coarse volumetric grid.
• Edges are naturally represented.
– Standard operations become edge-aware for free.
• Fast: HD video in real time
• Grid good up to 5D/6D, e.g. (x,y,t,r,g,b)
– See work by Adams et al. at Stanford for higher dimensions.
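
Example: Bilateral Filter on a Scanline (sketch)
The bilateral-grid slides describe three steps on a scanline: build a coarse grid over space and intensity, blur the grid values, then slice by querying the grid at each pixel's own position and intensity. The pure-Python sketch below illustrates those steps under several assumptions that are not in the talk: intensities lie in [0, 1], the grid has one cell per sigma with one cell of padding, a small [1, 2, 1] kernel stands in for the Gaussian blur, and the names `bilateral_filter_scanline`, `blur_121`, `sigma_s` and `sigma_r` are hypothetical. It is not the GPU implementation behind the timings quoted in the talk.

```python
def blur_121(grid, n_x, n_z):
    """Separable [1, 2, 1] / 4 blur along both axes (zero outside the grid).

    Assumption: this small kernel stands in for the Gaussian blur.
    """
    def at(g, x, z):
        return g[x][z] if 0 <= x < n_x and 0 <= z < n_z else 0.0

    along_x = [[(at(grid, x - 1, z) + 2 * at(grid, x, z) + at(grid, x + 1, z)) / 4
                for z in range(n_z)] for x in range(n_x)]
    return [[(at(along_x, x, z - 1) + 2 * at(along_x, x, z) + at(along_x, x, z + 1)) / 4
             for z in range(n_z)] for x in range(n_x)]


def bilateral_filter_scanline(signal, sigma_s=4.0, sigma_r=0.1):
    """Edge-aware smoothing of a 1D scanline with values assumed in [0, 1]."""
    # Coarse grid: one cell per sigma, plus padding for blur and interpolation.
    n_x = int((len(signal) - 1) / sigma_s) + 3
    n_z = int(1.0 / sigma_r) + 3
    values = [[0.0] * n_z for _ in range(n_x)]
    weights = [[0.0] * n_z for _ in range(n_x)]

    def coords(i, v):
        # Map pixel index (space) and intensity to continuous grid coordinates.
        return i / sigma_s + 1.0, v / sigma_r + 1.0

    # 1. Create the grid: accumulate each pixel into its nearest cell,
    #    keeping a sum of intensities and a sample count.
    for i, v in enumerate(signal):
        gx, gz = coords(i, v)
        cx, cz = int(gx + 0.5), int(gz + 0.5)
        values[cx][cz] += v
        weights[cx][cz] += 1.0

    # 2. Blur the grid values (sums and counts alike).
    values = blur_121(values, n_x, n_z)
    weights = blur_121(weights, n_x, n_z)

    def bilinear(g, gx, gz):
        x0, z0 = int(gx), int(gz)
        fx, fz = gx - x0, gz - z0
        return ((1 - fx) * (1 - fz) * g[x0][z0] + fx * (1 - fz) * g[x0 + 1][z0]
                + (1 - fx) * fz * g[x0][z0 + 1] + fx * fz * g[x0 + 1][z0 + 1])

    # 3. Slice: query the grid at each pixel's position and input intensity,
    #    then normalize the blurred sum by the blurred count.
    out = []
    for i, v in enumerate(signal):
        gx, gz = coords(i, v)
        w = bilinear(weights, gx, gz)
        out.append(bilinear(values, gx, gz) / w if w > 0 else v)
    return out


# Usage: a noisy step edge. The noise is smoothed, the edge is kept.
noisy_step = ([0.1 + (0.02 if i % 2 else -0.02) for i in range(20)]
              + [0.9 + (0.02 if i % 2 else -0.02) for i in range(20)])
filtered = bilateral_filter_scanline(noisy_step)
```

Because the two sides of the step fall into distant intensity cells, the blur never mixes them. This is the sense in which the extra dimension preserves edges while the spatial axis is coarsely sampled.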

Video Meshes
• Coarse data:
– motion, depth
– represented by a triangle mesh
• Finely detailed data:
– colors, boundaries
– represented by RGBA textures
[Video]

Conclusion
• Mid-level cues carry a lot of information
• Can be "embedded" in image representation
– Fast, easy-to-adapt algorithms
• Many existing structures
– Unwrap mosaics, locally rigid triangles, edge-aware wavelets à trous, video front…