The Perception of Shading and Reflectance
E.H. Adelson, A.P. Pentland
Presenter: Stefan Zickler

The “Intrinsic Image”
- The underlying physical properties of a scene.
- Looking at a 2D image, what does its 3-dimensional source model look like?

What makes an image?
A combination of three factors:
- Lighting
- Shading
- Reflectance

Lighting
Variables:
- Number of light sources
- Intensity
- Position
- Distribution (spot light or global)

Reflectance
How a surface’s material changes the light:
- Color
- Absorbance
- Transparency
- Etc.

Shading
- The change in luminance caused by the angle of incidence of the light relative to the surface normal.

A simple formulation of an image in terms of reflectance and shading
  I(x,y) = r(x,y) s(x,y)
- r(x,y) is the reflectance image
- s(x,y) is the shading image (luminance image)
where
  s(x,y) = λ N(x,y)·L
- N(x,y) is the surface normal
- L is the illumination direction
- λ is the “luminous flux”, meaning the intensity of the light

The bad news
- Any 2D image can be described by infinitely many 3D models of shading and reflectance (the simplest being a flat 2D screen, colored with the image).

The good news
- Humans can easily reason about which intrinsic 3D model is likely to be the correct one.
- Therefore, a computer should be able to do the same…

How do we find the best intrinsic image?
- A perception should correspond to the simplest or likeliest explanation.
- One way to define simplicity is by introducing a cost function.

The “workshop” metaphor
A generative model for shading, reflectance, and lighting. We have three workers:
- Painter
- Sheet metal worker
- Lighting designer

The painter
- Can paint polygons with certain colors.
- Works on the reflectance component of our image.

The sheet metal worker
- Can cut out new pieces of metal.
- Can bend pieces of metal.
- This is the shading component of our image.

The lighting designer
- Can position lights to illuminate a scene.
- Can choose between flood lights and spot lights.

What does this give us?
- A fairly complete generative model to create any arbitrary 3D scene.

How do we enforce simple solutions?
- Through a cost function.

The price list
Painter fees:
  Paint rectangular patch    $5 each
  Paint general polygon      $5 each
Sheet metal worker fees:
  Right angle cuts           $2 each
  Odd angle cuts             $5 each
  Right angle bends          $2 each
  Odd angle bends            $5 each
Lighting designer fees:
  Flood light                $5 each
  Custom spot light          $30 each

Each worker can create an entire image with a minimum of help from the other workers.

Painter’s solution:
  Paint 9 polygons           $180
  Set up 1 flood light       $5
  Cut 1 rectangle            $8
  Total                      $193

Sheet metal worker's solution:
  Cut 24 odd angles          $120
  Bend 6 odd angles          $30
  Set up 1 flood light       $5
  Total                      $155

Lighting designer's solution:
  Cut 1 rectangle            $8
  Set up 9 spot lights       $270
  Total                      $278

We need a supervisor
- His role: coordinate the three workers to find a cooperative solution with the minimum overall cost.
- In more scientific terms: perform a search through the entire solution space and find the point of minimum overall cost.

The supervisor’s solution:
  Cut 1 rectangle            $8
  Paint 3 rectangles         $5
  Bend 2 right angles        $4
  Supervisor's fee           $30
  Total                      $47

Compare to:
  Painter’s solution             $193
  Sheet metal worker’s solution  $155
  Lighting designer’s solution   $278

Tweaking the price list: discouraging naïve solutions
- Make naïve solutions expensive. We don’t want our algorithm to simply create a painted 2D screen.
- On the other hand, we don’t want to make things like paint so expensive that they never get used.
- Cooperative solutions should be cheaper than single-worker solutions.

Is there an optimal price list?
- Price-list values can be determined experimentally and tweaked so that they deliver the most likely solution for most images.
- However, there is no universal price list that correctly describes all possible images.
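
The cost comparison between the four solutions can be reproduced with a few lines of Python. This is only a sketch of the bookkeeping: the itemized fees are copied as quoted on the slides rather than recomputed from the unit prices, the names `solutions` and `total_cost` are made up for illustration, and picking the minimum over four hand-built candidates is of course far simpler than searching the full solution space.

```python
# Itemized fees in dollars, exactly as quoted on the slides.
solutions = {
    "painter": [
        ("paint 9 polygons", 180),
        ("set up 1 flood light", 5),
        ("cut 1 rectangle", 8),
    ],
    "sheet metal worker": [
        ("cut 24 odd angles", 120),
        ("bend 6 odd angles", 30),
        ("set up 1 flood light", 5),
    ],
    "lighting designer": [
        ("cut 1 rectangle", 8),
        ("set up 9 spot lights", 270),
    ],
    "supervisor": [
        ("cut 1 rectangle", 8),
        ("paint 3 rectangles", 5),
        ("bend 2 right angles", 4),
        ("supervisor's fee", 30),
    ],
}


def total_cost(items):
    """Sum the fees of one solution's line items."""
    return sum(fee for _, fee in items)


# Total cost per solution, then the cheapest explanation of the image.
totals = {name: total_cost(items) for name, items in solutions.items()}
cheapest = min(totals, key=totals.get)

for name, cost in totals.items():
    print(f"{name}: ${cost}")
print("cheapest:", cheapest)
```

The cost function plays the role of the simplicity measure here: the interpretation with the lowest total is the one the model prefers.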
The main problem with this workshop theory
- The search space for cooperative solutions of our workers is enormous, as there are infinitely many ways of combining their skills.
- Even for small scenes, there exists no efficient search algorithm to solve this problem in a simultaneous fashion.

Their solution
- Instead of a simultaneous cooperative model, we use a simplified, multi-stage generative model.
- Where have we seen this before?

Stage 1: The shape specialist
Assumptions:
- The image was made by orthographic projection.
- We are given the observed x,y coordinates of all edges and vertices in the image.
Operations:
- We can move vertices along the z axis.

The shape specialist, continued
- Simple solutions are enforced by assigning higher costs to non-right angles.
- Compactness (shorter edges) and planarity (less angle variance) are rewarded.
- This cost metric works for most figures, but not all of them.

Stage 2: The lighting specialist
- Given the shape from the previous specialist, find the lighting direction that best explains the observed luminance variation in terms of shading.
- This can be estimated linearly by solving for the light direction L of two connected surfaces:
    I1 = r1 λ N1·L
    I2 = r2 λ N2·L
  where r(x,y) is an estimated average, and λ = 1.

Stage 3: The reflectance specialist
- Given the shape and lighting from the previous two specialists, explain any left-over differences by painting the surfaces.

An example:

The problem with this approach
- Real-world scenes don’t look like this:
- Instead, they look more like this:

Some other shortcomings
- Tuning the cost factors is done manually.
- There will never be a single set of parameters that will correctly describe all scenes.
- A psychologist’s approach to computer science: there is not much information on how far this approach can scale up to more complex scenes, and not much work on a better search algorithm or on parameter learning.
- How well this approach works on random, real-world scenes is questionable.
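
As a rough illustration of Stages 2 and 3, the linear relation I = r λ N·L can be solved for L once the shape specialist has fixed the surface normals. The sketch below is an assumption-laden toy, not the authors' procedure: it uses NumPy least squares over all given faces instead of pairs of connected surfaces, and the function names `estimate_light` and `residual_reflectance`, the face normals, and the intensities are all invented for the example.

```python
import numpy as np


def estimate_light(normals, intensities, r_avg=1.0, lam=1.0):
    """Stage 2 sketch: least-squares fit of L in I_i = r_avg * lam * (N_i . L).

    normals: (n, 3) array of surface normals from the shape stage.
    intensities: (n,) array of observed face luminances.
    r_avg: the assumed average reflectance shared by all faces.
    """
    N = np.asarray(normals, dtype=float)
    I = np.asarray(intensities, dtype=float)
    L, *_ = np.linalg.lstsq(r_avg * lam * N, I, rcond=None)
    return L


def residual_reflectance(normals, intensities, L, lam=1.0):
    """Stage 3 sketch: whatever shading cannot explain is attributed to paint.

    Assumes every face is lit, i.e. N_i . L > 0, so the division is safe.
    """
    shading = lam * np.asarray(normals, dtype=float) @ L
    return np.asarray(intensities, dtype=float) / shading


# Hypothetical scene: four faces with known normals, uniform reflectance 0.5.
N = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [2 ** -0.5, 2 ** -0.5, 0.0],
])
L_true = np.array([0.2, 0.5, 0.8])
I = 0.5 * N @ L_true          # forward model with lam = 1

L_est = estimate_light(N, I, r_avg=0.5)
r_est = residual_reflectance(N, I, L_est)
print("estimated L:", L_est)
print("per-face reflectance:", r_est)
```

When the faces really do share one reflectance, the fit recovers the light direction and the reflectance stage has nothing left to paint. If one face were painted darker, the least-squares fit would absorb part of that difference into L and the remainder would show up as per-face reflectance variation.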