Image and Depth from a Conventional Camera with a Coded Aperture Anat Levin, Rob Fergus, Frédo Durand, William Freeman MIT CSAIL Output #1: Depth map Single input image: Output #2: All-focused image Lens and defocus Image of a point light source Lens’ aperture Lens Camera sensor Point spread function Focal plane Lens and defocus Image of a defocused point light source Lens’ aperture Object Lens Camera sensor Point spread function Focal plane Lens and defocus Image of a defocused point light source Lens’ aperture Object Lens Camera sensor Point spread function Focal plane Lens and defocus Image of a defocused point light source Lens’ aperture Object Lens Camera sensor Point spread function Focal plane Depth and defocus Out of focus Depth from defocus: Infer depth by analyzing local scale of defocus blur In focus Challenges • Hard to discriminate a smooth scene from defocus blur ? Out of focus • Hard to undo defocus blur Input Ringing with conventional deblurring algorithm Key contributions • Exploit prior on natural images - Improve deconvolution - Improve depth discrimination Natural • Coded aperture (mask inside lens) - make defocus patterns different from natural images and easier to discriminate Unnatural Related Work •Active methods e.g. Nayar et al. , Zhang and Nayar •Depth from (de)focus e.g. Pentland, Chaudhuri, Favaro et al. • Plenoptic/ light field cameras e.g. Adelson and Wang, Ng et al. • Wave front coding e.g. Cathey & Dowski •Blind Deconvolution e.g. Kundur and Hatzinakos , Fergus et al, Levin Never recover both depth AND full resolution image from a single image Defocus as local convolution Calibrated blur kernels at different depths Input defocused image Defocus as local convolution y fk x Local sub-window Input defocused image Calibrated blur kernels at depth k Sharp sub-window Depth k=1: y fk x Depth k=2: y fk x Depth k=3: y fk x Overview Try deconvolving local input windows with different scaled filters: Somehow: select best scale. ? ? ? Larger scale Correct scale Smaller scale Challenges • Hard to deconvolve even when kernel is known Input • Hard to identify correct scale: ? ? ? Ringing with the traditional Richardson-Lucy deconvolution algorithm Larger scale Correct scale Smaller scale Deconvolution is ill posed f x y ? = Linear, but rank deficient Deconvolution is ill posed f x y Solution 1: ? = ? = Solution 2: Linear, but rank deficient Image Restoration Techniques • Frequency domain – Wiener filter, deconvwnr() – Direct inverse method ? Clear = Kernel Blur X • Spatial domain – Richardson-Lucy method, deconvlucy() – MAP estimator, iterative method – Slow convergence for high freq. (needs priors) Idea 1: Natural images prior What makes images special? Natural Image gradient Unnatural Natural Image Statistics Characteristic distribution with heavy tails Histogram of image gradients Blurry image has different statistics Histogram of image gradients Deconvolution with prior | f x y | i (xi ) x arg min 2 Convolution error ? _ Derivatives prior 2 + Low Equal convolution error 2 ? _ + High Comparing deconvolution algorithms (Non blind) deconvolution code available online: http://groups.csail.mit.edu/graphics/CodedAperture/ Input Richardson-Lucy (x) x 2 (x) x 0.8 “spread” gradients “localizes” gradients Gaussian prior Sparse prior Comparing deconvolution algorithms (Non blind) deconvolution code available online: http://groups.csail.mit.edu/graphics/CodedAperture/ Input Richardson-Lucy (x) x 2 (x) x 0.8 “spread” gradients “localizes” gradients Gaussian prior Sparse prior Recall: Overview Try deconvolving local input windows with different scaled filters: Larger scale Correct scale Smaller scale ? ? ? Somehow: select best scale. Challenge: smaller scale not so different than correct Idea 2: Coded Aperture • Mask (code) in aperture plane - make defocus patterns different from natural images and easier to discriminate Conventional aperture Our coded aperture Solution: lens with occluder Object Lens Camera sensor Point spread function Focal plane Solution: lens with occluder Image of a defocused point light source Aperture pattern Object Lens with coded aperture Camera sensor Point spread function Focal plane Solution: lens with occluder Image of a defocused point light source Aperture pattern Object Lens with coded aperture Camera sensor Point spread function Focal plane Solution: lens with occluder Image of a defocused point light source Aperture pattern Object Lens with coded aperture Camera sensor Point spread function Focal plane Solution: lens with occluder Image of a defocused point light source Aperture pattern Object Lens with coded aperture Camera sensor Point spread function Focal plane Solution: lens with occluder Image of a defocused point light source Aperture pattern Object Lens with coded aperture Camera sensor Point spread function Focal plane Why coded? Coded aperture- reduce uncertainty in scale identification Conventional Larger scale Correct scale Smaller scale Coded Frequency spectrum 0 Filter, 1st scale spectrum = 1st observed image 0 Frequency Frequency 0 Sharp Image Frequency 0 spectrum spectrum Sharp Image Filter, 2nd scale spectrum spectrum Convolution- frequency domain representation = 2nd observed image 0 Frequency 0 Spatial convolution Frequency Output spectrum has zeros frequency multiplication where filter spectrum has zeros Estimated image ? Frequency spectrum 0 Filter, correct scale spectrum Observed image 0 Division by zero Estimated image ? Frequency 0 spectrum = spectrum spectrum Coded aperture: Scale estimation and division by zero spatial ringing 0 Frequency 0 Frequency Filter, wrong scale = Frequency Estimated image ? Frequency spectrum 0 Filter, correct scale spectrum Observed image 0 0 Frequency division of tiny value by zero no spatial ringing ? Frequency Filter, wrong scale 0 Frequency Estimated image 0 spectrum = spectrum spectrum Division by zero with a conventional aperture? Frequency = Filter Design Analytically search for a pattern maximizing discrimination between images at different defocus scales (KL-divergence) Account for image prior and physical constraints Score More discrimination between scales Less discrimination between scales Sampled aperture patterns Conventional aperture Kullback-Leibler divergence Non-symmetric measure of distance between 2 probabilities. Closely related to entropy. Use probability distribution in the frequency domain. Filter Design Considerations •Filters should be binary •Easy to cut from material – No floating elements •Avoid using the full aperture due to radial distortion in optics •Minimum size on filter holes to avoid diffraction Filter Design Optimization Input: •Binary 13x13 Patterns with 1mm features •8 different scales vary between 5 to 15 pixels Method: •KL-divergence scores computed for pairs of varying scales of each filter design •Minimum score between pairs is assigned as score for filter Result: Score More discrimination between scales Less discrimination between scales Sampled aperture patterns Conventional aperture Zero frequencies- pros and cons Previous methods: 0 0 Our solution: 0 Include zero frequencies: No zero frequencies: + - Filter can be easily inverted Weaker depth discrimination 0 + + Zeros improve depth discrimination Inversion difficult Inversion made possible with image priors Depth results Regularizing depth estimation Try deblurring with 10 different aperture scales x arg min | f x y |2 Convolution error _ i (xi ) Derivatives prior 2 + Keep minimal error scale in each local window + regularization Input Local depth estimation Regularized depth Regularizing depth estimation Local depth estimation Input Regularized depth Sometimes, manual intervention Input Local depth estimation Regularized depth After user corrections All focused results Input All-focused (deconvolved) Close-up Original image All-focus image Input All-focused (deconvolved) Close-up Original image All-focus image Naïve sharpening Comparison- conventional aperture result Ringing due to wrong scale estimation Comparison- coded aperture result Application: Digital refocusing from a single image Application: Digital refocusing from a single image Application: Digital refocusing from a single image Application: Digital refocusing from a single image Application: Digital refocusing from a single image Application: Digital refocusing from a single image Application: Digital refocusing from a single image Coded aperture: pros and cons AND depth at a single shot + Image + No loss of image resolution + Simple modification to lens - Depth is coarse unable to get depth at untextured areas, might need manual corrections. + But depth is a pure bonus - Loss some light + But deconvolution increases depth of field Deconvolution code available http://groups.csail.mit.edu/graphics/CodedAperture/ 50mm f/1.8: $79.95 Cardboard: $1 Tape: $1 Depth acquisition: priceless