Project 3 Report

Trevor McCasland
Arch Kelley
Project 3 Report - Dynamic Programming for Stereo Reconstruction
Brief summary of project
Students were tasked with achieving one primary goal in project 2: taking two distinct
images depicting the same location or situation in the real world, and deriving the spatial
relationship between the two images. This task was to be completed using a sequence of steps,
the first of which was feature detection using the Harris corner-detection algorithm. Feature
matching involving a bidirectional matching algorithm was then conducted using those detected
corners, which then made it possible to finally compute a homography matrix that described the
transformation from one image to another. The homography matrix was used to warp one of the
images into the coordinate system of the other image, and the fitness of the matrix (or the
relationship between the images themselves) was then tested by visually comparing the warped
image to the actual second image and identifying how closely the warped image matched the
real second image.
Brief outline of the algorithmic approach
The steps of this project required work completed in previous homework assignments
and two overlapping images taken from a camera. First, the images were loaded into the
program and then passed to two functions written in previous homework assignments. The first
function, which was an implementation of the Harris corner detection algorithm, scanned the
image and returned a set of identified corners, which acted as ‘features’ later in the process.
The features were then matched between images based on similarity and relative location by a
separate function written in another previous homework assignment. Once matches were
determined between images, an implementation of the RANSAC algorithm was used to classify
each member of the set of matching features as either an outlier (which was ultimately ignored)
or an inlier. RANSAC relied on random samples of three matches (which we referred to as
s=3), and ran a calculated N number of times to find the function of best fit, thereby gathering
the largest possible pool of inliers. Using the newly-found set of inliers, two 3x3 homography
matrices were created under two conditions. The first matrix was constructed under the
assumption that the final element in the matrix, h33, was always equal to 1, which allowed for a
simpler calculation that only had to produce eight of the nine h values. On the other
hand, the second matrix was computed assuming all nine of its values were variable but that the
magnitude of the matrix itself was always equal to one, which made it possible to find a nontrivial solution to the system. Finally, forward and backward warping techniques were used to
warp one image to the coordinate system of another image using either of the calculated
homography matrices, which were mathematically similar.
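The h33 = 1 formulation described above can be illustrated with a minimal sketch. It is written in Python rather than the MATLAB used for the project, and it is not the code from project2Main.m. For simplicity it assumes exactly four correspondences, which give eight equations for the eight unknowns; the report's version used the full inlier set. The function names solve_linear, homography_h33_one, and apply_h are hypothetical.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def homography_h33_one(src, dst):
    """Estimate H with h33 fixed to 1 from exactly four correspondences."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # From u = (h11 x + h12 y + h13) / (h31 x + h32 y + 1), and likewise for v.
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(A, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def apply_h(H, x, y):
    """Map (x, y) through H and divide by the homogeneous coordinate."""
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)


# Illustrative points only: a unit square scaled by 2 and shifted by (2, 3).
src = [(0, 0), (1, 0), (1, 1), (0, 1)]
dst = [(2, 3), (4, 3), (4, 5), (2, 5)]
H = homography_h33_one(src, dst)
```

Fixing h33 = 1 turns the problem into an ordinary linear solve, at the cost of failing when the true h33 is zero; the unit-magnitude formulation avoids that restriction.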
Pictures of intermediate results
Images of the results of our program are shown below.
Design decisions
By and large, we created our image warping program by following the lecture notes and
instructions given to us by Professor Yin. We modified exactly one existing file, project2Main.m
(the main program), which included calls to the functions written for homeworks 3 and 4 and
contained the code for homography matrix formulation and image warping. Figures displaying
intermediate and final results of the entire warping process were generated inside this main
program file. We chose not to implement any special features or bonus features, including the
creation of image mosaics, due to time and effort constraints.
One design decision that we made was to assume that the probability of any given match
being an outlier was 50%. This decision, which mainly influenced the RANSAC
implementation, was made to ensure that RANSAC would run a sufficient number of
times to arrive at an acceptably accurate result. If we assumed that fewer than 50% of the
matches were outliers, the algorithm tended to run too few times, which made our results
less accurate. For similar reasons, we chose a RANSAC success probability of 99% to
further increase the number of times that the algorithm’s loop ran. We also only considered
matches within a certain distance, which we determined to be roughly 5.
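To make the effect of these choices concrete, the sketch below evaluates the standard RANSAC iteration-count formula, N = log(1 - p) / log(1 - (1 - e)^s). It is written in Python and assumes this is the formula behind the "calculated N" mentioned earlier; the report does not state it explicitly. The function name ransac_iterations and the 30% comparison value are hypothetical.

```python
import math


def ransac_iterations(p, e, s):
    """Samples needed so that, with probability p, at least one sample of
    s matches contains no outliers when a fraction e of matches are outliers."""
    return math.ceil(math.log(1.0 - p) / math.log(1.0 - (1.0 - e) ** s))


# Values stated in the report: p = 99%, e = 50%, s = 3.
n_report = ransac_iterations(p=0.99, e=0.50, s=3)

# Hypothetical comparison: assuming only 30% outliers gives far fewer iterations.
n_lower_e = ransac_iterations(p=0.99, e=0.30, s=3)

print(n_report, n_lower_e)
```

Under this formula the report's settings give 35 iterations, while the 30% assumption gives only 11, which is consistent with the observation that a lower outlier estimate made the loop run too few times.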
In terms of performance and robustness, we made a key choice while developing the
program that we believe increased its overall performance and ‘niceness’. Using our method,
we were able to avoid creating a completely new set of points and storage for points and
labelling that set as ‘inliers’; doing so would have required us to write a segment of boilerplate
loop code to copy inliers into a newly resized container each time a new inlier was found.
Instead, we chose to create an Mx1 boolean array in which all values were initialized to zero. If
a pair of matched points was found to be an inlier by RANSAC, the entry of the boolean array at
that match’s index in the matrix of matches was set to 1. In this
fashion, we could construct the homography matrix by iterating over all points but using
a point only if it was part of a match that was marked with a 1 in the
boolean array. This meant that we avoided the computational cost of creating a resized inlier
container and copying over all of the inlier info from the old to new storage each time another
inlier was found.
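The boolean-array approach can be sketched as follows. This is a Python illustration rather than the MATLAB from the project; the match coordinates and error values are made up, and the names mark_inliers, matches, and errors are hypothetical.

```python
def mark_inliers(errors, threshold):
    """Return one boolean per match: True where the error is within the threshold."""
    return [err <= threshold for err in errors]


# Made-up data for illustration: three matches and their transfer errors.
matches = [((10, 12), (110, 14)), ((40, 8), (141, 9)), ((25, 30), (60, 90))]
errors = [1.2, 0.7, 48.0]
is_inlier = mark_inliers(errors, threshold=5.0)

# Walk over every match, but only use the ones flagged in the mask.
rows_added = 0
for match, keep in zip(matches, is_inlier):
    if not keep:
        continue
    rows_added += 2  # each used match would contribute two rows to the linear system
```

The mask has a fixed size of one entry per match, so nothing needs to be resized or copied as RANSAC finds more inliers.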
Experimental observations
When using images with a translation in a certain direction, there were clear differences
in the outcome of forward and backward warping. Forward warping preserved all overlapping
pixels in the first image while backward warping preserved all overlapping pixels in the second
image. When the warping was complete, the black pixels that were in the warped image
represented the direction of the translation. For example, if there was a one-dimensional
translation from left to right, then there would be black pixels on the left of the image for forward
warping and black pixels on the right for backward warping. This is because the
pixels from one image are fitted into the area of the matched features in the second image,
which is only the overlapping part. It is also worth mentioning that the forward warping
operations tended to run faster than both variations of the backward warping operations.
Adjusting parameters related to RANSAC often led to drastically improved or drastically
worsened results. Generally, we found that tweaking parameters to maximize the number of
times RANSAC’s loop ran produced better results; this made sense because more iterations
made it statistically more likely that a better set of inliers for the resulting homography matrix
was found. Specifically, larger values of e (the assumed fraction of matches that are outliers)
and larger values of p (the target probability that RANSAC draws at least one outlier-free sample)
typically led to a larger N (the number of RANSAC loop iterations), which then typically led to a
larger number of correctly-identified inliers.
We observed that after the computation of our two versions of the homography matrix,
the matrices were indeed mathematically similar. We included code that multiplied the matrix
derived by assuming h33=1 by the actual h33 value from the magnitude-derived matrix, and
discovered that the resulting matrix (which we named Htest) was very close to the magnitude-based matrix. We displayed the sum of squared differences between the two matrices as
an output to the console from our application.
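The comparison can be sketched in Python as below. The matrix values are illustrative, not results from the project, and the unit-magnitude matrix is simulated by normalizing the h33 = 1 matrix, so the difference is zero up to rounding by construction. The helper names scale, ssd, and frobenius are hypothetical.

```python
import math


def scale(H, k):
    """Multiply every entry of H by the scalar k."""
    return [[k * v for v in row] for row in H]


def ssd(A, B):
    """Sum of squared differences between two matrices of the same shape."""
    return sum((a - b) ** 2 for ra, rb in zip(A, B) for a, b in zip(ra, rb))


def frobenius(H):
    """Magnitude of H: square root of the sum of its squared entries."""
    return math.sqrt(sum(v * v for row in H for v in row))


# Illustrative h33 = 1 matrix (not a result from the report).
H_one = [[2.0, 0.0, 2.0], [0.0, 2.0, 3.0], [0.0, 0.0, 1.0]]

# Stand-in for the magnitude-based solution: the same matrix scaled to unit magnitude.
H_unit = scale(H_one, 1.0 / frobenius(H_one))

# Rescale the h33 = 1 matrix by the other matrix's h33, as described above.
H_test = scale(H_one, H_unit[2][2])
print(ssd(H_test, H_unit))
```

Multiplying by h33 also absorbs the sign ambiguity of a unit-magnitude solution: if that matrix comes out negated, its h33 is negative too, and Htest flips sign with it.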
Though our application largely worked as expected, forward warping with and without
for-looping would sometimes leave black pixels in the shape of a warped grid on the warped
image. In this grid, the cells would shrink as they got closer to the top-left corner of the image and
expand as they got closer to the bottom-right corner of the image. However, this
behavior was to be expected; warping an image also scales it in certain situations, which would
leave gaps in the resulting image that had no counterpart in the other image to derive intensity from.
Alternatively, the act of borrowing intensity values from the other image could have missed a set
of pixels as a result of rounding operations like floor, ceiling, and averaging.
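The gap pattern can be reproduced on a toy example. The Python sketch below assumes a pure 2x scaling homography, a 3x3 image of constant intensity, and floor rounding; none of these come from the project code, and the function names are hypothetical. Forward warping writes each source pixel to one destination cell, while backward warping looks up a source pixel for every destination cell.

```python
import math


def apply_h(H, x, y):
    """Map (x, y) through H and divide by the homogeneous coordinate."""
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)


def forward_warp(img, H, out_w, out_h):
    """Push every source pixel to its (floored) destination; unfilled cells stay 0."""
    out = [[0] * out_w for _ in range(out_h)]
    for y, row in enumerate(img):
        for x, val in enumerate(row):
            u, v = apply_h(H, x, y)
            u, v = math.floor(u), math.floor(v)
            if 0 <= u < out_w and 0 <= v < out_h:
                out[v][u] = val
    return out


def backward_warp(img, H_inv, out_w, out_h):
    """For every destination pixel, pull the intensity from its (floored) source."""
    out = [[0] * out_w for _ in range(out_h)]
    for v in range(out_h):
        for u in range(out_w):
            x, y = apply_h(H_inv, u, v)
            x, y = math.floor(x), math.floor(y)
            if 0 <= y < len(img) and 0 <= x < len(img[0]):
                out[v][u] = img[y][x]
    return out


img = [[9] * 3 for _ in range(3)]                    # toy 3x3 image, intensity 9
H = [[2, 0, 0], [0, 2, 0], [0, 0, 1]]                # assumed 2x scaling
H_inv = [[0.5, 0, 0], [0, 0.5, 0], [0, 0, 1]]        # its inverse, written by hand

fwd = forward_warp(img, H, 6, 6)
bwd = backward_warp(img, H_inv, 6, 6)
holes_fwd = sum(v == 0 for row in fwd for v in row)
holes_bwd = sum(v == 0 for row in bwd for v in row)
```

In this toy case the forward warp fills only every other row and column, leaving a regular grid of black pixels, while the backward warp fills the whole output. With a real homography the scale varies across the image, which would make the grid cells change size as described above.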