Automatic scene inference for 3D object compositing

Karsch, K. (UIUC); Sunkavalli, K.; Hadap, S.; Carr, N.;
Jin, H.; Fonte, R.; Sittig, M.; Forsyth, D.
What is this system?
Image editing system
Drag-and-drop object insertion
Place objects in 3D and relight
Fully automatic recovery of a comprehensive
3D scene model: geometry, illumination, diffuse
albedo, and camera parameters
• From a single low-dynamic-range (LDR) image
Existing problems
• It’s the artist’s job to create photorealistic
effects by recognizing the physical space
• Lighting, shadow, perspective
• Need: camera parameters, scene geometry,
surface materials, and sources of illumination
What this system cannot handle
• Works best when scene lighting is diffuse;
therefore generally works better indoors than
outdoors
• Errors in either geometry, illumination, or
materials may be prominent
• Does not handle object insertion behind
existing scene elements
• Illumination inference: recovers a full lighting
model including light sources not directly
visible in the photograph
• Depth estimation: combines data-driven
depth transfer with geometric reasoning
about the scene layout
How to do this
• Need: geometry, illumination, surface
materials
• Even though the estimates are coarse, the
composites still look realistic because even
large changes in lighting are often not
perceptible
Indoor/outdoor scene classification
K-nearest-neighbor matching of GIST features
Indoor dataset: NYUv2
Outdoor dataset: Make3D
Different training images and classifiers are
chosen depending on indoor/outdoor scene
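
The classification step can be illustrated with a minimal k-nearest-neighbor sketch. It assumes GIST descriptors have already been computed elsewhere; the 4-D toy vectors, the value of k, and the function name `knn_classify` are hypothetical and not taken from the slides.

```python
import numpy as np

def knn_classify(query, train_features, train_labels, k=5):
    """Return the majority label among the k nearest training descriptors."""
    dists = np.linalg.norm(train_features - query, axis=1)  # Euclidean distance
    nearest = np.argsort(dists)[:k]
    labels, counts = np.unique(train_labels[nearest], return_counts=True)
    return labels[np.argmax(counts)]

# Toy stand-ins for precomputed GIST descriptors (hypothetical 4-D vectors).
rng = np.random.default_rng(0)
indoor = rng.normal(0.0, 0.1, size=(20, 4))
outdoor = rng.normal(1.0, 0.1, size=(20, 4))
train_features = np.vstack([indoor, outdoor])
train_labels = np.array(["indoor"] * 20 + ["outdoor"] * 20)

label = knn_classify(np.full(4, 0.9), train_features, train_labels, k=5)
```

The predicted label would then select which training set (indoor or outdoor) drives the later stages.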
Single image reconstruction
• Camera parameters, geometry
– Focal length f, camera center (cx, cy) and extrinsic
parameters are computed from three orthogonal
vanishing points detected in the scene
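
As a rough illustration of the vanishing-point step, the sketch below recovers the focal length and camera center from three orthogonal vanishing points. It assumes square pixels and zero skew, in which case the camera center is the orthocenter of the vanishing-point triangle and f² = -(v1 - c)·(v2 - c). The function name and the synthetic camera are hypothetical; vanishing-point detection itself is not shown.

```python
import numpy as np

def intrinsics_from_vanishing_points(v1, v2, v3):
    """Focal length and principal point from three orthogonal vanishing points."""
    v1, v2, v3 = (np.asarray(v, dtype=float) for v in (v1, v2, v3))
    # Orthocenter: (c - v1) is perpendicular to (v3 - v2), and
    # (c - v2) is perpendicular to (v3 - v1).
    A = np.array([v3 - v2, v3 - v1])
    b = np.array([v1 @ (v3 - v2), v2 @ (v3 - v1)])
    c = np.linalg.solve(A, b)
    f_sq = -(v1 - c) @ (v2 - c)
    if f_sq <= 0:
        raise ValueError("vanishing points are not consistent with orthogonal directions")
    return np.sqrt(f_sq), c

def rotation(ax, ay):
    """Rotation about x then y, used only to build a synthetic test camera."""
    cx, sx, cy, sy = np.cos(ax), np.sin(ax), np.cos(ay), np.sin(ay)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    return Rx @ Ry

# Synthetic camera (assumed values): f = 800 px, center = (320, 240).
K = np.array([[800.0, 0, 320], [0, 800.0, 240], [0, 0, 1]])
proj = K @ rotation(0.4, 0.6)        # columns are homogeneous vanishing points
vps = (proj[:2] / proj[2]).T         # three 2-D vanishing points
f_est, c_est = intrinsics_from_vanishing_points(*vps)
```

With the intrinsics known, the camera rotation follows from back-projecting each vanishing point through the inverse intrinsic matrix and normalizing.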
Surface materials
• Per-pixel diffuse material albedo and shading,
estimated with the Color Retinex method
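
To give a feel for the Retinex idea, here is a deliberately simplified 1-D grayscale sketch, not the Color Retinex method itself (which also uses chromaticity changes). It assumes intensity = albedo × shading, treats large log-intensity gradients as albedo edges, and treats small ones as shading. The threshold value and function name are assumptions.

```python
import numpy as np

def retinex_1d(intensity, threshold=0.1):
    """Split a 1-D intensity signal into albedo and shading (albedo up to scale)."""
    log_i = np.log(intensity)
    grad = np.diff(log_i)
    # Large gradients are attributed to albedo, small ones to shading.
    albedo_grad = np.where(np.abs(grad) > threshold, grad, 0.0)
    log_albedo = np.concatenate([[0.0], np.cumsum(albedo_grad)])
    log_shading = log_i - log_albedo
    return np.exp(log_albedo), np.exp(log_shading)

# Toy signal: one albedo step (0.2 -> 0.8) under smoothly falling shading.
true_albedo = np.array([0.2] * 50 + [0.8] * 50)
true_shading = np.linspace(1.0, 0.5, 100)
intensity = true_albedo * true_shading
albedo, shading = retinex_1d(intensity)
```

The recovered albedo is only defined up to a global scale, so it is the ratio across the edge that is meaningful here.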
Data-driven depth estimation
• Database: RGB-D images
• Appearance cues for correspondences: multiscale SIFT features
• Incorporate geometric information
Data-driven depth estimation
• E_t: depth transfer
• E_m: Manhattan world
• E_o: orientation
• E_3s: spatial smoothness in 3D
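
One plausible reading of these four terms is a weighted sum minimized over the depth map D. The weights λ below are assumptions for illustration; the slide does not state how the terms are combined or weighted.

```latex
\hat{D} = \arg\min_{D} \; E_t(D) + \lambda_m E_m(D) + \lambda_o E_o(D) + \lambda_{3s} E_{3s}(D)
```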
Scene illumination
Visible sources
• Segment the image into superpixels
• Compute features for each superpixel:
– Location in the image
– The 340 features used in Make3D
• Train a binary classifier with annotated data to
predict whether or not a superpixel is
emitting/reflecting a significant amount of light.
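
The slide does not say which binary classifier is used. As a stand-in, the sketch below trains a plain logistic regression by gradient descent on toy per-superpixel features. The 3-D features (brightness, row, column), the learning rate, and the function names are hypothetical and much smaller than the real feature set.

```python
import numpy as np

def train_logistic(X, y, lr=1.0, steps=2000):
    """Fit logistic-regression weights (last entry is the bias) by gradient descent."""
    X1 = np.hstack([X, np.ones((len(X), 1))])
    w = np.zeros(X1.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X1 @ w))      # predicted probability of "light"
        w -= lr * X1.T @ (p - y) / len(y)      # gradient of the mean log-loss
    return w

def predict_light(X, w):
    """Return True for superpixels classified as light sources."""
    X1 = np.hstack([X, np.ones((len(X), 1))])
    return (X1 @ w) > 0.0

# Toy annotated data: dark superpixels vs. bright ones near the top of the image.
rng = np.random.default_rng(0)
dark = rng.normal([0.2, 0.5, 0.5], 0.05, size=(50, 3))
bright = rng.normal([0.9, 0.2, 0.5], 0.05, size=(50, 3))
X = np.vstack([dark, bright])
y = np.array([0.0] * 50 + [1.0] * 50)

w = train_logistic(X, y)
pred = predict_light(X, w)
```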
Out-of-view sources
• Data-driven: annotated SUN360 panoramas
• Assumption: if photographs are similar, then
the illumination environment beyond the
photographed region will be similar as well.
Out-of-view sources
• Features: geometric context, orientation maps,
spatial pyramids, HSV histograms, output of the light
classifier
• Measures: histogram intersection score, per-pixel inner
product
• Similarity metric for IBLs (image-based lights): how
similar canonical objects look when rendered under each
• Ranking function: 1-slack, linear SVM-ranking
optimization (trained).
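
One of the measures named above, the histogram intersection score, can be sketched as follows. The example ranks hypothetical panorama histograms against a query by that single score. The trained ranking function that combines several measures is not reproduced, and all names and values are illustrative.

```python
import numpy as np

def histogram_intersection(h1, h2):
    """Intersection of two histograms after normalizing each to sum to 1."""
    h1 = np.asarray(h1, dtype=float)
    h2 = np.asarray(h2, dtype=float)
    h1 = h1 / h1.sum()
    h2 = h2 / h2.sum()
    return float(np.minimum(h1, h2).sum())

# Hypothetical HSV-style histograms for a query photo and three panoramas.
query = [4, 3, 2, 1]
panoramas = {
    "pano_a": [4, 3, 2, 1],   # same distribution as the query
    "pano_b": [1, 2, 3, 4],   # reversed distribution
    "pano_c": [0, 0, 0, 10],  # mass concentrated in one bin
}
scores = {name: histogram_intersection(query, h) for name, h in panoramas.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
```

A score of 1 means identical normalized histograms and 0 means no overlap, so higher-ranked panoramas are the more similar candidates under this one measure.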
Relative intensities of the light sources
• Intensity estimation through rendering: adjust the source
intensities until a rendered version of the scene matches
the original image
• Humans cannot distinguish between a range of
illumination configurations, suggesting that there is a
family of lighting conditions that produce the same
perceptual response.
• So simply choose the lighting configuration in that
family that renders fastest.
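
The intensity-matching idea can be sketched under one simplifying assumption: rendering is linear in the source intensities, so the scene can be rendered once per light at unit intensity and the weights fitted by least squares. The random basis images stand in for real renders. Clipping to non-negative values is a shortcut rather than true non-negative least squares, and the function name is hypothetical.

```python
import numpy as np

def fit_light_intensities(basis_renders, target):
    """Least-squares intensities so that sum_k w_k * basis_k approximates target."""
    A = np.stack([b.ravel() for b in basis_renders], axis=1)  # pixels x lights
    w, *_ = np.linalg.lstsq(A, target.ravel(), rcond=None)
    return np.clip(w, 0.0, None)  # light intensities cannot be negative

# Synthetic stand-ins: two unit-intensity renders and a target built from them.
rng = np.random.default_rng(0)
basis = [rng.random((8, 8)), rng.random((8, 8))]
true_w = np.array([2.0, 0.5])
target = true_w[0] * basis[0] + true_w[1] * basis[1]

w_est = fit_light_intensities(basis, target)
```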
Physically grounded image editing
• Drag-and-drop insertion
• Lighting adjustment
• Synthetic depth-of-field
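
Synthetic depth-of-field needs a per-pixel blur size derived from the estimated depth. A minimal sketch using the standard thin-lens circle-of-confusion formula is shown below. The lens parameters are assumed values, all distances share one unit, and the blur filter itself is not shown.

```python
import numpy as np

def circle_of_confusion(depth, focus_dist, focal_len, aperture):
    """Thin-lens blur-circle diameter on the sensor for each depth value."""
    depth = np.asarray(depth, dtype=float)
    return (aperture * focal_len * np.abs(depth - focus_dist)
            / (depth * (focus_dist - focal_len)))

# Assumed lens: 50 mm focal length, 25 mm aperture, focused at 2 m (units: meters).
depths = np.array([1.0, 2.0, 4.0])
coc = circle_of_confusion(depths, focus_dist=2.0, focal_len=0.05, aperture=0.025)
```

Pixels at the focus distance get zero blur, and the blur diameter grows as depth moves away from it in either direction.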
User study
• Real object, real scene vs. inserted object, real
scene
• Synthetic object, synthetic scene vs. inserted
object, synthetic scene
• Produces perceptually convincing results