Mid-level Representation for Images and Videos

advertisement
Mid-level Representations
for Images and Videos
Sylvain Paris, Adobe
with Jiawen Chen,
Michael Cohen, Frédo Durand,
Wojciech Matusik, Jue Wang
Making good videos is hard, much
harder than taking good photos.
Better tools will help.
Today's focus:
How to represent video data
to build better algorithms.
Outline
• Mid-level representations
• Bilateral grid
• Video mesh
Low-level Representation
R=255
G=50
B=35
High-level Representation
building
car
street
Mid-level Information
texture
smooth
edge
Mid-level Representations
• Data are indexed by (x,y,t,r,g,b)
• (x,y,t): location. For video, it's a grid.
• (r,g,b): color. That is where the information is.
– Many representations exist: HSV, Lab, [Kim 09]...
• We are going to look at (x,y,t,r,g,b).
Outline
• Mid-level representations
• Bilateral grid
• Video mesh
Example: the Edge-aware Brush
• Classical paint brush
– Ignores edges
Strokeimage
Input
with classical brush
• Our edge-aware brush
– Respects edges
Stroke with bilateral brush
Bilateral Grid – Definition
y
Standard 2D Image
• Bilateral grid = 3D array
–x and y correspond to pixel position
–z corresponds to pixel intensity
–Euclidean distance accounts
for edges
x
• space distance (x,y) and
intensity distance (z)
• Grid can be coarsely sampled
–E.g., 70 x 70 x 10 for an
8 megapixel image
Bilateral Grid
2D vs. Bilateral Grid Downsampling
• Bilateral grid enables aggressive downsampling
• Extra dimension preserves edges
y
• Nearest neighbor
arbitrarily bright or dark
• Bicubic
intermediate value not in original
x
Downsampling in 2D
2D vs. Bilateral Grid Downsampling
• Bilateral grid enables aggressive downsampling
• Extra dimension preserves edges
y
x
Downsampling in 2D
Bilateral Grid
Bilateral Grid Painting
• When mouse is held down, paint only at intensity level
of initial mouse click
Input image
Bilateral Grid
Bilateral Grid Painting
• Edge-aware brush used to change hue
Bilateral Filter [Tomasi 98]
• Smooth image except across strong edges
• Ubiquitous in computational photography
[Oh 01, Durand 02, Eisemann 04, Petschnigg 04, Bennett 05, Bae 06, Fattal 07, Kopf 07, …]
Input
Filtered
Output
Bilateral Filter on the Bilateral Grid
Bilateral
Grid
intensity
Image scanline
space
Bilateral Filter on the Bilateral Grid
Bilateral
Grid
intensity
Image scanline
Gaussian blur
grid values
Slice: query grid
with input image
intensity
space
space
Filtered scanline
More than 100 Hz
at HD video resolution
using GPU.
Video
Many Operations and Applications
• Local histogram equalization
• Interactive tone mapping
• Video abstraction
[Winnemoller 06, DeCarlo 02]
• Segmentation [Paris 07]
• Photographic style transfer [Bae 06]
Model
Input
Output
Works great frame by frame if no noise.
What if there is noise?
Not everything can be done frame by
frame, e.g. segmentation.
Frame by Frame
• Consider only the
current frame.
• Do not exploit all
the available
information.
unused
unused
Volumetric, Off-line
• Good if we know the
entire sequence in
advance.
• Time is the same as x
and y.
– Use a time Gaussian.
Causal, On-line
• Good for real time
– recursive algorithm
• Formal equivalence
between exp.
decay and
Gaussian
[ECCV 08]
fixed
unused
Video
Summary on Bilateral Grid
• Image data as a coarse volumetric grid.
• Edges are naturally represented.
– Standard operations become edge-aware for free.
• Fast: HD video in real time
• Grid good up to 5D/6D, e.g. (x,y,t,r,g,b)
– See work by Adams et al. at Stanford
for higher dimensions.
Outline
• Mid-level representations
• Bilateral grid
• Video mesh
Video Meshes
• Coarse data:
– motion, depth
– represented by a triangle mesh
• Finely detailed data:
– colors, boundaries
– represented by RGBA textures
Video
Conclusion
• Mid-level cues carry
a lot of information
• Can be "embedded"
in image representation
– Fast, easy-to-adapt
algorithms
• Many existing structures
– Unwrap mosaics, locally rigid triangles,
edge-aware wavelets à trous, video front…
Download