Automatic Scene Inference for 3D Object Compositing
Kevin Karsch (UIUC), K. Sunkavalli, S. Hadap, N. Carr, H. Jin, R. Fonte, M. Sittig, David Forsyth
SIGGRAPH 2014

What is this system?
• An image editing system
• Drag-and-drop object insertion
• Objects are placed in 3D and relit
• Fully automatic recovery of a comprehensive 3D scene model: geometry, illumination, diffuse albedo, and camera parameters
• Works from a single low dynamic range (LDR) image

Existing problems
• Traditionally it is the artist's job to create photorealistic composites by reasoning about the physical space
• Lighting, shadows, perspective
• Needed: camera parameters, scene geometry, surface materials, and sources of illumination

State-of-the-art
• http://www.popularmechanics.com/technology/digital/visual-effects/4218826
• http://en.wikipedia.org/wiki/The_Adventures_of_Seinfeld_%26_Superman

What this system cannot handle
• Works best when scene lighting is diffuse; therefore it generally works better indoors than outdoors
• Errors in geometry, illumination, or materials may be prominent
• Does not handle object insertion behind existing scene elements

Contribution
• Illumination inference: recovers a full lighting model, including light sources not directly visible in the photograph
• Depth estimation: combines data-driven depth transfer with geometric reasoning about the scene layout

How to do this
• Needed: geometry, illumination, surface reflectance
• Even though the estimates are coarse, the composites still look realistic, because even large changes in lighting are often not perceivable

Workflow

Indoor/outdoor scene classification
• K-nearest-neighbor matching of GIST features
• Indoor dataset: NYUv2
• Outdoor dataset: Make3D
• Different training images and classifiers are chosen depending on whether the scene is indoor or outdoor

Single image reconstruction
• Camera parameters and geometry
  – Focal length f, camera center (cx, cy), and extrinsic parameters are computed from three orthogonal vanishing points detected in the scene

Surface materials
• Per-pixel diffuse albedo and shading are estimated with the Color Retinex method

Data-driven depth estimation
• Database: an RGB-D image database
• Appearance cues for correspondences: multiscale SIFT features
• Geometric information is also incorporated

Data-driven depth estimation (cont.)
• The depth map is obtained by minimizing an objective with four terms:
  – Et: depth transfer
  – Em: Manhattan world
  – Eo: surface orientation
  – E3s: spatial smoothness in 3D

Scene illumination

Visible sources
• Segment the image into superpixels
• Compute features for each superpixel:
  – Location in the image
  – The 340 features used in Make3D
• Train a binary classifier on annotated data to predict whether or not a superpixel is emitting/reflecting a significant amount of light

Out-of-view sources
• Data-driven: annotated SUN360 panorama dataset
• Assumption: if two photographs are similar, then the illumination environments beyond the photographed regions will be similar as well

Out-of-view sources (cont.)
• Features: geometric context, orientation maps, spatial pyramids, HSV histograms, and the output of the light classifier
• Matching measures: histogram intersection score and per-pixel inner product
• Similarity metric for IBLs: how similar canonical objects look when rendered under them
• Ranking function: trained with a 1-slack linear SVM ranking optimization

Relative intensities of the light sources
• Intensity estimation through rendering: light intensities are adjusted until a rendered version of the scene matches the original image
• Humans cannot distinguish between a range of illumination configurations, suggesting that there is a family of lighting conditions that produce the same perceptual response
• Among these, simply choose the lighting configuration that can be rendered fastest
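The intensity fit described above can be viewed as a small non-negative least-squares problem. Below is a minimal sketch, assuming the scene has been rendered once per light source at unit intensity, so that (by linearity of light transport) any intensity assignment is a weighted sum of those renders. The function name and the use of scipy.optimize.nnls are illustrative assumptions, not the paper's implementation.

```python
import numpy as np
from scipy.optimize import nnls


def estimate_light_intensities(original, unit_renders):
    """Fit relative light-source intensities by matching a render to the photo.

    original     : HxWx3 float array, the input photograph.
    unit_renders : list of HxWx3 float arrays, one render per light source,
                   each with that source at unit intensity.

    Because light transport is linear in source intensity, the composite
    render is a non-negative weighted sum of the unit renders; the weights
    are recovered with non-negative least squares.
    """
    b = original.reshape(-1)                                   # target pixels
    A = np.stack([r.reshape(-1) for r in unit_renders], axis=1)
    weights, _residual = nnls(A, b)                            # intensities >= 0
    return weights


# Illustrative usage (photo and per-light renders produced elsewhere):
# intensities = estimate_light_intensities(photo, [render_lamp, render_window, render_sun])
```

In practice one would subsample pixels before solving, since the full system has one row per pixel per channel.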
Physically grounded image editing
• Drag-and-drop insertion
• Lighting adjustment
• Synthetic depth-of-field (see the sketch at the end of these notes)

User study
• Real object in a real scene vs. inserted object in a real scene
• Synthetic object in a synthetic scene vs. inserted object in a synthetic scene
• The method produces perceptually convincing results
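As a footnote to the "Physically grounded image editing" slide: the recovered per-pixel depth is what makes the synthetic depth-of-field edit possible. The sketch below is only an illustration of that idea using a layered Gaussian blur driven by circle-of-confusion size; the blur model, parameters, and function name are assumptions, not the paper's renderer.

```python
import numpy as np
from scipy.ndimage import gaussian_filter


def synthetic_depth_of_field(image, depth, focus_depth, max_sigma=6.0, n_levels=6):
    """Approximate depth-of-field from a recovered depth map (illustrative sketch).

    image       : HxWx3 float array.
    depth       : HxW float array of per-pixel scene depth.
    focus_depth : depth value that stays sharp.

    A stack of progressively blurred copies of the image is precomputed, and
    each pixel is taken from the level whose blur matches its circle of
    confusion, which grows with |1/depth - 1/focus_depth|.
    """
    coc = np.abs(1.0 / np.maximum(depth, 1e-6) - 1.0 / focus_depth)
    coc = coc / (coc.max() + 1e-12)                      # normalize to [0, 1]
    sigmas = np.linspace(0.0, max_sigma, n_levels)
    stack = [gaussian_filter(image, sigma=(s, s, 0)) for s in sigmas]

    level = np.clip((coc * (n_levels - 1)).round().astype(int), 0, n_levels - 1)
    out = np.empty_like(image)
    for i in range(n_levels):
        mask = level == i                                # pixels using blur level i
        out[mask] = stack[i][mask]
    return out
```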