Homework #4: Stereo Vision Due April 15, 2010. Professor Davi Geiger. Disparities are encoded using a scale factor 4 for gray levels 1 .. 255, while gray level 0 means "unknown disparity". Therefore, the encoded disparity range is 0.25 .. 63.75 pixels. We will use the notation that Dmin= -63 pixels and Dmax=1 pixels. Since no disparity is 0, use 0 disparity for the occlusion pixels. FILTERS: COMBINING THEM So far we used a feature and feature distance for stereo that is the following combination of two features (and their distances) Ce[l,w+D]=0.1 (√๐๐=0,๐ =3 (๐, ๐, ๐(๐, ๐ค)) + √๐๐=๐,๐ =3 (๐, ๐, ๐(๐, ๐ค))) 2 where the feature distance ๐๐,๐ (๐, ๐, ๐(๐, ๐ค)) is defined in terms of the features ๐ผฬ๐ฟ and ๐ผฬ๐ as follows 2 2 ๐๐,๐ (๐, ๐, ๐(๐, ๐ค)) = ๐๐๐ ((๐ผฬ๐ฟ (๐, ๐, ๐, ๐ ) − ๐ผฬ๐ (๐, ๐, ๐, ๐ )) , (๐ผฬ๐ฟ (๐, ๐, ๐ + ๐, ๐ ) − ๐ผฬ๐ (๐, ๐, ๐ + ๐, ๐ )) ) Why did we use the same coefficients for both, ๏ฑ๏ฝ๏ฐ๏ and๏ ๏ฑ๏ฝ๏ฐ๏ฏ๏ฒ comparisons? What if we use an another feature, say intensity grey level comparison, such as ๐๐ผ (๐, ๐, ๐(๐, ๐ค)) = (๐ผ ๐ฟ (๐, ๐) − ๐ผ ๐ (๐, ๐))2 , how should it be combined with the measures above? We could also use color features, a feature for each channel R, G, B or the average intensity I and ratios such as r=255*(R/I) and g=255*(G/I) (blue channels are more noisy), where I=R+G+B. How to combine them? Let us study the case we have n features and feature distances, let us call the distance {๐น๐ (๐, ๐, ๐ค); ๐ = 1, … , ๐}. A feature distance could be for example ๐น๐ (๐, ๐, ๐ค) = ๐๐ผ (๐, ๐, ๐(๐, ๐ค))1/2 . The smaller the value of a feature distance at (e,l) for a disparity w, the more likely the disparity is w (the more similar are the image features at left and right image). Let us work with five feature distances 1 ๐ฟ 1 ๐ ๐น1 = √๐๐ผ (๐, ๐, ๐(๐, ๐ค)) = ๐ด๐ต๐ ( ๐ผ (๐, ๐) − ๐ผ (๐, ๐)), 3 1 3 ๐น2 = √๐๐=0,๐ =3 (๐, ๐, ๐(๐, ๐ค)), ๐น3 = √๐๐=๐/2,๐ =3 (๐, ๐, ๐(๐, ๐ค)), ๐น4 = √๐๐ (๐, ๐, ๐(๐, ๐ค)) = 255 ∗ ๐ด๐ต๐ ( ๐ ๐ฟ (๐, ๐) ๐ผ๐ฟ (๐, ๐) ๐บ๐ฟ (๐, ๐) ๐น5 = √๐๐ (๐, ๐, ๐(๐, ๐ค)) = 255 ∗ ๐ด๐ต๐ ( ๐ผ๐ฟ (๐, ๐) − ๐ ๐ (๐, ๐) − ), ๐ผ๐ (๐, ๐) ๐บ๐ (๐, ๐) ๐ผ๐ (๐, ๐) ) where ๐ผ ๐ฟ (๐, ๐) = ๐ ๐ฟ (๐, ๐) + ๐บ ๐ฟ (๐, ๐) + ๐ต๐ฟ (๐, ๐) and ๐ผ ๐ (๐, ๐) = ๐ ๐ (๐, ๐) + ๐บ ๐ (๐, ๐) + ๐ต๐ (๐, ๐) So one possible definition of a probability of a feature at a given location to have a match at a disparity w is given by ๐ −๐น๐ (๐,๐,๐ค) ๐๐,๐,๐ (๐ค) = ๐ท๐๐๐ฅ ∑๐ค=−๐ท๐๐๐ ๐ −๐น๐ (๐,๐,๐ค) where ๐ = 1,2,3,4,5. There is an ambiguity as we can chose a weight ๏ก to scale the feature distance without affecting the ranking of the probabilities. So we may refer to each feature distance as being the feature similarity comparison multiplied by some scale factor, we call it ๏ก๏ฎ๏ In the previous homework we used ๏ก=0.1. This factor could vary locally to be ๐ผ๐ (๐, ๐, ๐ค) for each feature distance i. Taking into account this factor, we should define the probabilities as ๐๐,๐,๐ (๐ค) = ๐ −๐ผ๐ (๐,๐,๐ค) ๐น๐ (๐,๐,๐ค) ๐ −๐ผ๐ (๐,๐,๐ค) ๐น๐ (๐,๐,๐ค) = −๐ผ๐ (๐,๐,๐ค) ๐น๐ (๐,๐,๐ค) ∑๐ท๐๐๐ฅ ๐๐ค ๐ค=−๐ท๐๐๐ ๐ How to estimate these parameters ๐ผ๐ (๐, ๐, ๐ค) ? We do have the true disparity for these images and so we do have the correspondence between pixels. We can evaluate the value of ๐น๐ (๐, ๐, ๐ค) for each pixel of the image and its correspondence. 1. Construct a histogram for each ๐น๐ (๐, ๐, ๐ค), i.e., for each value ๐ธ๐ = ๐น๐ (๐, ๐, ๐ค) define an interval ๏ and count the number of matched pixels falling in this interval. Normalize the histogram by dividing the count by the number of pixels considered. Plot the histogram. This histogram represents the probability ๐(๐ธ๐ = ๐น๐ (๐, ๐, ๐ค)) = the parameter ๐ผ๐ (๐, ๐, ๐ค) from it. ๐ −๐ผ๐ (๐,๐,๐ค) ๐ธ๐ ๐๐ธ , so we need to extract 2. One can see that ๐๐๐๐๐ (๐ธ๐ = ๐น๐ (๐, ๐, ๐ค)) = −๐ผ๐ (๐, ๐, ๐ค) ๐ธ๐ − log ๐๐ธ and so ๐ ๐๐๐๐๐ (๐ธ๐ = ๐น๐ (๐, ๐, ๐ค)) 1 ๐ ๐๐ ( ๐ธ๐ ) =− ๐ ๐ธ๐ ๐๐ ( ๐ธ๐ ) ๐ ๐ธ๐ Thus, we can take these derivatives of the histogram to obtain the local parameters. Note that these parameters vary with ๐ธ๐ = ๐น๐ (๐, ๐, ๐ค) so plot ๐ผ๐ (๐ธ๐ ) per feature i and report the average value ๐ผฬ ๐ per image, i.e., ๐ผ๐ (๐ธ๐ = ๐น๐ (๐, ๐, ๐ค)) = − 2 ๐๐๐ฅ๐ธ๐ ๐ผฬ ๐ = ∑ ๐ผ๐ (๐ธ๐ ) ๐๐ ( ๐ธ๐ ) ๐ธ๐ =0 3. Disparity Changes. Create the histogram of disparity changes. For each two consecutive epipolar-line-pixels on all horizontal lines (epipolar lines), compute the magnitude of the disparity change and add to one count for its bin on the histogram. So the x-axis of the histogram represent the possible values of the disparity change, i.e., it should vary from 0 to Dmax – (–Dmin)=Dmax+Dmin. Normalize the histogram by dividing the bin counts by the total number of counts (total number of consecutive pixels considered). This histogram generates ๐ −๐(๐,๐,๐ค) ๐ธ ๐๐(๐ธ = |๐ค๐,๐ − ๐ค๐,๐−1 |) = ๐๐ธ so we need to extract the parameter ๐(๐, ๐, ๐ค) from it. 4. One can see that ๐๐๐ ๐๐(๐ธ = |๐ค๐,๐ − ๐ค๐,๐−1 |) = −๐๐ (๐, ๐, ๐ค) ๐ธ๐ − log ๐๐ธ and so ๐ ๐๐๐ ๐๐(๐ธ = |๐ค๐,๐ − ๐ค๐,๐−1 |) 1 ๐ ๐๐(๐ธ) =− ๐ ๐ธ๐ ๐๐(๐ธ) ๐๐ธ Thus, we can take these derivatives of the histogram to obtain the local parameters. Note that the parameter depends on ๐ธ๐ = |๐ค๐,๐ − ๐ค๐,๐−1 |, so plot ๐๐ (๐ธ๐ ) and report the average value ๐ผฬ ๐ per image ๐(๐ธ = |๐ค๐,๐ − ๐ค๐,๐−1 |) = − ๐๐๐ฅ๐ธ ๐ฬ = ∑ ๐(๐ธ) ๐๐(๐ธ) ๐ธ=0 5. Use these parameters learned from these images and apply to the Tsukuba image. Note that in the case of the ๐๐ (๐ธ๐ ) we only use ๐ since the dynamic programing would have to be modified to allow for local variations. However, for ๐ผ๐ (๐ธ๐ ) we can use a local value that depends on the feature matching error ๐ธ๐ = ๐น๐ (๐, ๐, ๐ค). 3