Precis 1 (reconstructing occluded surfaces…)

Liz Bondi
V. Vaish, M. Levoy, R. Szeliski, C. L. Zitnick, and S. B. Kang. (2006, June).
Reconstructing Occluded Surfaces Using Synthetic Apertures: Stereo, Focus and Robust
Measures. 2006 IEEE Computer Society Conference on Computer Vision and Pattern
Recognition (CVPR'06) [Online]. vol. 2, pp. 2331-2338. Available:
http://www.computer.org/portal/web/csdl/doi/10.1109/CVPR.2006.244
The paper begins by identifying two problems that arise when reconstructing
occluded objects: a limited number of views, and cost functions (the functions used in
3D reconstruction algorithms) that assume the camera can see every object. To increase
the number of views, the authors use synthetic aperture focusing, which relies on a large
camera array whose combined view is wider than the occluding objects. For the second
problem, the Stanford scientists, who coincidentally helped build the multi-camera array
shown as an example in class, compare how effectively different cost functions
reconstruct occluded surfaces.
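To make the synthetic aperture idea concrete, here is a minimal sketch of the shift-and-add
refocusing it relies on, assuming the views are already rectified to a common plane and that
each camera's offset from a reference camera is known; the function name, the simple
offset/depth parallax model, and the integer pixel shifts are my own simplifications, not
details taken from the paper.

```python
import numpy as np

def synthetic_aperture_refocus(views, offsets, depth):
    """Shift-and-add refocusing: average many camera views after shifting
    each one according to its camera offset and the chosen focal depth.

    views   : list of HxW grayscale images (NumPy arrays), one per camera
    offsets : list of (dx, dy) camera positions relative to a reference camera
    depth   : scalar focal depth; parallax is assumed proportional to offset/depth
    """
    acc = np.zeros_like(views[0], dtype=np.float64)
    for img, (dx, dy) in zip(views, offsets):
        # Shift each view so that points at the chosen depth line up.
        shift_x = int(round(dx / depth))
        shift_y = int(round(dy / depth))
        acc += np.roll(img, (shift_y, shift_x), axis=(0, 1))
    # Points at `depth` add up coherently; occluders in front are blurred away.
    return acc / len(views)
```

Sweeping `depth` over a range of candidate values produces the stack of refocused images
on which the cost functions discussed below operate.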
There are four cost functions that V. Vaish, M. Levoy, R. Szeliski, C. L. Zitnick,
and S. B. Kang explore in this paper: shape from focus, shape from stereo, shape from
median, and shape from entropy. Shape from focus and shape from stereo are similar in
that both work with the mean color of the rays, although stereo specifically measures the
variance around that mean. The problem with these two cost functions is that rays that hit
occluders still contribute to the reconstruction even though they should be treated as
outliers. Shape from median is therefore essentially the same as stereo, but it uses the
median color instead of the mean, since the median is a far more robust measure of central
tendency when outliers are present. Shape from entropy instead looks at how concentrated
the distribution of ray colors is, effectively favoring the modal color rather than the mean
or median. In the three experiments conducted by the Stanford scientists, shape from focus
performed best for high percentages of occluders of similar color, while entropy performed
best for low percentages of occluders.
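As a rough illustration of how these measures differ, the sketch below evaluates a stereo
(variance), median-based, and entropy cost on the list of colors seen along the rays through
a single pixel at one candidate depth; the function names, the median-of-absolute-deviations
form of the robust cost, and the 16-bin histogram are my own assumptions rather than the
paper's exact definitions.

```python
import numpy as np

def stereo_cost(ray_colors):
    # Shape from stereo: variance of the ray colors around their mean;
    # small variance means the rays agree, i.e. the depth is likely correct.
    return np.var(ray_colors)

def median_cost(ray_colors):
    # Median-based robust cost: measure spread around the median instead of
    # the mean, so the outlier rays that hit occluders have less influence.
    med = np.median(ray_colors)
    return np.median(np.abs(ray_colors - med))

def entropy_cost(ray_colors, bins=16):
    # Shape from entropy: H = -sum_i (b_i / N) * log(b_i / N) over a color
    # histogram; low entropy means most rays concentrate on one (modal) color.
    counts, _ = np.histogram(ray_colors, bins=bins)
    p = counts[counts > 0] / counts.sum()
    return -np.sum(p * np.log(p))
```

In each case, the candidate depth with the lowest cost at that pixel would be the one kept.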
Overall, this paper seems to be a good source for several reasons: it was published at
the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), it uses
twenty sources to back up its claims, and it provides many diagrams, mathematical
proofs, and equations that help the reader visualize its claims. For example, when
discussing stereo versus focus, there is a diagram of the camera layout, two graphs of
intensity, and a final graph comparing the depth responses of stereo and focus. Also, in
the beginning, the authors use a conclusion drawn by Schechner et al. to "…inspire this
research." However, the paper does seem somewhat disorganized. For example, the
stereo vs. focus section does not define those cost functions until after it has discussed
them, stated a theorem, and proved the theorem. Once definitions finally appear, they are
given in several different ways throughout the paper, and the definitions seem to
contradict one another. Additionally, the paper does not give a clear conclusion about
which cost function is best overall.
I got a lot from this source, especially since I had to look up so many details about
algorithms, synthetic apertures, and so forth. Once I understood as much of the paper as I
could, I decided it was very useful to our project. Specifically, I found the experiments
section helpful. That section gave experimental setups that not only validated the results
but also gave me ideas on how we could start building a multi-camera array. For example,
in an experiment in which an ivy wall stood in front of a person and a statue, the authors
used 88 cameras. In an experiment in which they tried to image a CD case behind plants,
they used a single camera with a very large aperture. This paper will also be useful once a
multi-camera array is built and we need to combine images from many cameras to see the
occluded objects. All we need to decide is whether the people in an airport will create a
high or low percentage of occlusions. If it's a high percentage, we can use the shape from
focus cost function (f_d(x) = −[∂Ī_d(x)/∂x]²), and if it's a low percentage, we can use the
shape from entropy cost function (H = −Σ_{i=0}^{K} (b_i/N) log(b_i/N)). Of course, we will need to
figure out how these algorithms work and how cost functions are actually applied, but this
will save us from having to devise a cost function ourselves. Therefore, this paper was
useful because it told us what happens after we set up a few cameras: how to turn many
different images into one image that reveals an occluded object. Now we have some ideas
on how to "see through" occlusions.
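As a sketch of how that choice might play out in code, the snippet below applies a focus
cost of the form quoted above, f_d(x) = −[∂Ī_d(x)/∂x]², to a stack of refocused images and
keeps, per pixel, the depth with the lowest cost; the helper names, the plain horizontal
gradient, and the argmin selection are my assumptions, not the authors' implementation.

```python
import numpy as np

def focus_cost(mean_image):
    # Shape-from-focus cost f_d(x) = -[dI_d(x)/dx]^2: the negative squared
    # horizontal gradient of the synthetically refocused (mean) image, so a
    # sharper image gives a lower (better) cost.
    grad_x = np.gradient(mean_image, axis=1)
    return -(grad_x ** 2)

def best_depth_map(refocused_stack, depths):
    """Pick, per pixel, the candidate depth whose refocused image scores
    best (lowest) under the focus cost.

    refocused_stack : list of HxW refocused mean images, one per candidate depth
    depths          : list of the corresponding depth values
    """
    costs = np.stack([focus_cost(img) for img in refocused_stack])  # D x H x W
    best_idx = np.argmin(costs, axis=0)                             # H x W
    return np.asarray(depths)[best_idx]
```

If the occlusion percentage turns out to be low, the same selection loop could use the
entropy cost from the earlier sketch in place of the focus cost.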