Word Document

advertisement
Homework #4: Stereo Vision
Due April 15, 2010.
Professor Davi Geiger.
Disparities are encoded using a scale factor 4 for gray levels 1 .. 255, while gray level 0 means "unknown disparity".
Therefore, the encoded disparity range is 0.25 .. 63.75 pixels. We will use the notation that Dmin= -63 pixels and
Dmax=1 pixels. Since no disparity is 0, use 0 disparity for the occlusion pixels.
FILTERS: COMBINING THEM
So far we used a feature and feature distance for stereo that is the following combination of two features (and their distances)
Ce[l,w+D]=0.1 (√๐‘Š๐œƒ=0,๐‘ =3 (๐‘’, ๐‘™, ๐‘Ÿ(๐‘™, ๐‘ค)) + √๐‘Š๐œƒ=๐œ‹,๐‘ =3 (๐‘’, ๐‘™, ๐‘Ÿ(๐‘™, ๐‘ค)))
2
where the feature distance ๐‘Š๐œƒ,๐‘  (๐‘’, ๐‘™, ๐‘Ÿ(๐‘™, ๐‘ค)) is defined in terms of the features ๐ผฬ‚๐ฟ and ๐ผฬ‚๐‘… as follows
2
2
๐‘Š๐œƒ,๐‘  (๐‘’, ๐‘™, ๐‘Ÿ(๐‘™, ๐‘ค)) = ๐‘š๐‘–๐‘› ((๐ผฬ‚๐ฟ (๐‘’, ๐‘™, ๐œƒ, ๐‘ ) − ๐ผฬ‚๐‘… (๐‘’, ๐‘Ÿ, ๐œƒ, ๐‘ )) , (๐ผฬ‚๐ฟ (๐‘’, ๐‘™, ๐œƒ + ๐œ‹, ๐‘ ) − ๐ผฬ‚๐‘… (๐‘’, ๐‘Ÿ, ๐œƒ + ๐œ‹, ๐‘ )) )
Why did we use the same coefficients for both, ๏ฑ๏€ฝ๏€ฐ๏€ and๏€ ๏ฑ๏€ฝ๏ฐ๏€ฏ๏€ฒ comparisons? What if we use an
another feature, say intensity grey level comparison, such as ๐‘Š๐ผ (๐‘’, ๐‘™, ๐‘Ÿ(๐‘™, ๐‘ค)) = (๐ผ ๐ฟ (๐‘’, ๐‘™) − ๐ผ ๐‘… (๐‘’, ๐‘Ÿ))2 ,
how should it be combined with the measures above? We could also use color features, a feature for
each channel R, G, B or the average intensity I and ratios such as r=255*(R/I) and g=255*(G/I) (blue
channels are more noisy), where I=R+G+B. How to combine them?
Let us study the case we have n features and feature distances, let us call the distance {๐น๐‘– (๐‘’, ๐‘™, ๐‘ค); ๐‘– =
1, … , ๐‘›}. A feature distance could be for example ๐น๐‘– (๐‘’, ๐‘™, ๐‘ค) = ๐‘Š๐ผ (๐‘’, ๐‘™, ๐‘Ÿ(๐‘™, ๐‘ค))1/2 . The smaller the value
of a feature distance at (e,l) for a disparity w, the more likely the disparity is w (the more similar are the image
features at left and right image). Let us work with five feature distances
1
๐ฟ
1
๐‘…
๐น1 = √๐‘Š๐ผ (๐‘’, ๐‘™, ๐‘Ÿ(๐‘™, ๐‘ค)) = ๐ด๐ต๐‘† ( ๐ผ (๐‘’, ๐‘™) − ๐ผ (๐‘’, ๐‘Ÿ)),
3
1
3
๐น2 = √๐‘Š๐œƒ=0,๐‘ =3 (๐‘’, ๐‘™, ๐‘Ÿ(๐‘™, ๐‘ค)),
๐น3 = √๐‘Š๐œƒ=๐œ‹/2,๐‘ =3 (๐‘’, ๐‘™, ๐‘Ÿ(๐‘™, ๐‘ค)),
๐น4 = √๐‘Š๐‘Ÿ (๐‘’, ๐‘™, ๐‘Ÿ(๐‘™, ๐‘ค)) = 255 ∗ ๐ด๐ต๐‘† (
๐‘…๐ฟ (๐‘’, ๐‘™)
๐ผ๐ฟ (๐‘’, ๐‘™)
๐บ๐ฟ (๐‘’, ๐‘™)
๐น5 = √๐‘Š๐‘Ÿ (๐‘’, ๐‘™, ๐‘Ÿ(๐‘™, ๐‘ค)) = 255 ∗ ๐ด๐ต๐‘† (
๐ผ๐ฟ (๐‘’, ๐‘™)
−
๐‘…๐‘… (๐‘’, ๐‘Ÿ)
−
),
๐ผ๐‘… (๐‘’, ๐‘Ÿ)
๐บ๐‘… (๐‘’, ๐‘Ÿ)
๐ผ๐‘… (๐‘’, ๐‘Ÿ)
)
where ๐ผ ๐ฟ (๐‘’, ๐‘™) = ๐‘… ๐ฟ (๐‘’, ๐‘™) + ๐บ ๐ฟ (๐‘’, ๐‘™) + ๐ต๐ฟ (๐‘’, ๐‘™) and ๐ผ ๐‘… (๐‘’, ๐‘Ÿ) = ๐‘… ๐‘… (๐‘’, ๐‘Ÿ) + ๐บ ๐‘… (๐‘’, ๐‘Ÿ) + ๐ต๐‘… (๐‘’, ๐‘Ÿ)
So one possible definition of a probability of a feature at a given location to have a match at a disparity w is
given by
๐‘’ −๐น๐‘– (๐‘’,๐‘™,๐‘ค)
๐‘ƒ๐‘–,๐‘’,๐‘™ (๐‘ค) = ๐ท๐‘š๐‘Ž๐‘ฅ
∑๐‘ค=−๐ท๐‘š๐‘–๐‘› ๐‘’ −๐น๐‘– (๐‘’,๐‘™,๐‘ค)
where ๐‘– = 1,2,3,4,5. There is an ambiguity as we can chose a weight ๏ก to scale the feature distance
without affecting the ranking of the probabilities. So we may refer to each feature distance as being
the feature similarity comparison multiplied by some scale factor, we call it ๏ก๏€ฎ๏€  In the previous
homework we used ๏ก=0.1. This factor could vary locally to be ๐›ผ๐‘– (๐‘’, ๐‘™, ๐‘ค) for each feature distance i.
Taking into account this factor, we should define the probabilities as
๐‘ƒ๐‘–,๐‘’,๐‘™ (๐‘ค) =
๐‘’ −๐›ผ๐‘– (๐‘’,๐‘™,๐‘ค) ๐น๐‘– (๐‘’,๐‘™,๐‘ค)
๐‘’ −๐›ผ๐‘– (๐‘’,๐‘™,๐‘ค) ๐น๐‘– (๐‘’,๐‘™,๐‘ค)
=
−๐›ผ๐‘– (๐‘’,๐‘™,๐‘ค) ๐น๐‘– (๐‘’,๐‘™,๐‘ค)
∑๐ท๐‘š๐‘Ž๐‘ฅ
๐‘๐‘ค
๐‘ค=−๐ท๐‘š๐‘–๐‘› ๐‘’
How to estimate these parameters ๐›ผ๐‘– (๐‘’, ๐‘™, ๐‘ค) ? We do have the true disparity for these images and so
we do have the correspondence between pixels. We can evaluate the value of ๐น๐‘– (๐‘’, ๐‘™, ๐‘ค) for each
pixel of the image and its correspondence.
1. Construct a histogram for each ๐น๐‘– (๐‘’, ๐‘™, ๐‘ค), i.e., for each value ๐ธ๐‘– = ๐น๐‘– (๐‘’, ๐‘™, ๐‘ค) define an
interval ๏„ and count the number of matched pixels falling in this interval. Normalize the
histogram by dividing the count by the number of pixels considered. Plot the histogram. This
histogram represents the probability ๐‘ƒ(๐ธ๐‘– = ๐น๐‘– (๐‘’, ๐‘™, ๐‘ค)) =
the parameter ๐›ผ๐‘– (๐‘’, ๐‘™, ๐‘ค) from it.
๐‘’ −๐›ผ๐‘– (๐‘’,๐‘™,๐‘ค) ๐ธ๐‘–
๐‘๐ธ
, so we need to extract
2. One can see that
๐‘™๐‘œ๐‘”๐‘ƒ๐‘– (๐ธ๐‘– = ๐น๐‘– (๐‘’, ๐‘™, ๐‘ค)) = −๐›ผ๐‘– (๐‘’, ๐‘™, ๐‘ค) ๐ธ๐‘– − log ๐‘๐ธ
and so
๐œ• ๐‘™๐‘œ๐‘”๐‘ƒ๐‘– (๐ธ๐‘– = ๐น๐‘– (๐‘’, ๐‘™, ๐‘ค))
1 ๐œ• ๐‘ƒ๐‘– ( ๐ธ๐‘– )
=−
๐œ• ๐ธ๐‘–
๐‘ƒ๐‘– ( ๐ธ๐‘– ) ๐œ• ๐ธ๐‘–
Thus, we can take these derivatives of the histogram to obtain the local parameters. Note that
these parameters vary with ๐ธ๐‘– = ๐น๐‘– (๐‘’, ๐‘™, ๐‘ค) so plot ๐›ผ๐‘– (๐ธ๐‘– ) per feature i and report the average
value ๐›ผฬ…๐‘– per image, i.e.,
๐›ผ๐‘– (๐ธ๐‘– = ๐น๐‘– (๐‘’, ๐‘™, ๐‘ค)) = −
2
๐‘š๐‘Ž๐‘ฅ๐ธ๐‘–
๐›ผฬ…๐‘– = ∑ ๐›ผ๐‘– (๐ธ๐‘– ) ๐‘ƒ๐‘– ( ๐ธ๐‘– )
๐ธ๐‘– =0
3. Disparity Changes. Create the histogram of disparity changes. For each two consecutive
epipolar-line-pixels on all horizontal lines (epipolar lines), compute the magnitude of the
disparity change and add to one count for its bin on the histogram. So the x-axis of the
histogram represent the possible values of the disparity change, i.e., it should vary from 0 to
Dmax – (–Dmin)=Dmax+Dmin. Normalize the histogram by dividing the bin counts by the
total number of counts (total number of consecutive pixels considered). This histogram
generates
๐‘’ −๐œ†(๐‘’,๐‘™,๐‘ค) ๐ธ
๐‘ƒ๐‘Ÿ(๐ธ = |๐‘ค๐‘’,๐‘™ − ๐‘ค๐‘’,๐‘™−1 |) =
๐‘๐ธ
so we need to extract the parameter ๐œ†(๐‘’, ๐‘™, ๐‘ค) from it.
4. One can see that
๐‘™๐‘œ๐‘” ๐‘ƒ๐‘Ÿ(๐ธ = |๐‘ค๐‘’,๐‘™ − ๐‘ค๐‘’,๐‘™−1 |) = −๐œ†๐‘– (๐‘’, ๐‘™, ๐‘ค) ๐ธ๐‘– − log ๐‘๐ธ
and so
๐œ• ๐‘™๐‘œ๐‘” ๐‘ƒ๐‘Ÿ(๐ธ = |๐‘ค๐‘’,๐‘™ − ๐‘ค๐‘’,๐‘™−1 |)
1 ๐œ• ๐‘ƒ๐‘Ÿ(๐ธ)
=−
๐œ• ๐ธ๐‘–
๐‘ƒ๐‘Ÿ(๐ธ) ๐œ•๐ธ
Thus, we can take these derivatives of the histogram to obtain the local parameters. Note that
the parameter depends on ๐ธ๐‘– = |๐‘ค๐‘’,๐‘™ − ๐‘ค๐‘’,๐‘™−1 |, so plot ๐œ†๐‘– (๐ธ๐‘– ) and report the average value ๐›ผฬ…๐‘–
per image
๐œ†(๐ธ = |๐‘ค๐‘’,๐‘™ − ๐‘ค๐‘’,๐‘™−1 |) = −
๐‘š๐‘Ž๐‘ฅ๐ธ
๐œ†ฬ… = ∑ ๐œ†(๐ธ) ๐‘ƒ๐‘Ÿ(๐ธ)
๐ธ=0
5. Use these parameters learned from these images and apply to the Tsukuba image. Note that
in the case of the ๐œ†๐‘– (๐ธ๐‘– ) we only use ๐œ† since the dynamic programing would have to be
modified to allow for local variations. However, for ๐›ผ๐‘– (๐ธ๐‘– ) we can use a local value that depends on the feature matching error ๐ธ๐‘– = ๐น๐‘– (๐‘’, ๐‘™, ๐‘ค).
3
Download