Ankitha Miryala
Computer Vision - CMPEN 454
Project 1 Report

a) This first project lays a foundation for becoming comfortable with several kinds of feature extraction and identification used to find corresponding matching points in two images. The process draws on the mathematical theory we have learned in class so far, implementing these algorithms and models in Matlab or, in our case, C++. Through this project, we learn how to apply Harris corner detection to grayscale versions of the images to efficiently detect points of interest and identify local features. Since we will be matching these points between images, we combine different matching methods, such as NCC and SSD scores, to find the corresponding feature in the other image. After finding matches, we use homographies (projective transformations) to map points between the coordinate systems of the two images. Last, we evaluate the performance of our Harris corner detection along with the matchability of the feature descriptors we used. A more detailed description is provided in the following section. By the completion of this project, all team members should obtain some level of proficiency in C++ programming and in developing code for computer vision algorithms.

b) Initially we included all the basic libraries and the OpenCV libraries, since our code is in C++. We defined the values for k and the SSD threshold; the reasoning behind these design decisions is explained later. We declared a function called myHarris, which computes a matrix of Harris corner responses and is explained below. Next we started our main function. We assigned the first image to a matrix called ‘img’ and the second image to a matrix called ‘img1’. Then we declared the matrices dst_norm_scaled, dst_norm, dst_norm_scaled1 and dst_norm1. The corner-response threshold is set to 75.

As previously mentioned, we wrote the function myHarris to compute the Harris corners. First, we convert the image to grayscale so that each pixel is reduced to a single intensity channel. We chose a 3x3 kernel with a sigma value of 1 to blur the image; after trying various combinations of kernel sizes and sigma values, this combination seemed to yield the best results. We then used the Sobel operator to compute the horizontal and vertical derivatives, which makes the edges of the smoothed image more prominent. Next, we compute the gradient products grad_xx, grad_xy, and grad_yy that will eventually be placed into the autocorrelation matrix A. We blur these gradient products with a 3x3 kernel and a sigma value of 2 and denote the results as in the following equation: A = [S_xx, S_xy; S_xy, S_yy]. Next we calculate the determinant and trace of A in order to compute the Harris R value, plugging the computed values into the equation R = det(A) − k·(Tr(A))². At the end of the myHarris function, we normalize the R-score values, which we denote dst_norm, so that they lie between 0 and 255, which keeps thresholding simple.

The Harris corner responses of the first image are assigned to dst_norm and those of the second image to dst_norm1. The responses are then converted to scaled absolute values using the function ‘convertScaleAbs’; the scaled values of dst_norm and dst_norm1 are assigned to dst_norm_scaled and dst_norm_scaled1 respectively.
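The corner-response computation described above can be summarized with the following minimal C++/OpenCV sketch. It is an illustration under the parameters described above (3x3 kernels, sigma 1 for the image blur and sigma 2 for the gradient products), not our exact code; the variable names and the return convention are assumptions.

#include <opencv2/opencv.hpp>

// Sketch of the Harris response computation (illustrative, not the exact project code).
cv::Mat myHarris(const cv::Mat& img, double k)
{
    cv::Mat gray, blurred, Ix, Iy;
    cv::cvtColor(img, gray, cv::COLOR_BGR2GRAY);           // reduce to a single intensity channel
    cv::GaussianBlur(gray, blurred, cv::Size(3, 3), 1.0);  // 3x3 kernel, sigma = 1

    cv::Sobel(blurred, Ix, CV_32F, 1, 0);                  // horizontal derivative
    cv::Sobel(blurred, Iy, CV_32F, 0, 1);                  // vertical derivative

    // Gradient products, smoothed to form the entries of A = [S_xx, S_xy; S_xy, S_yy].
    cv::Mat Ixx = Ix.mul(Ix), Ixy = Ix.mul(Iy), Iyy = Iy.mul(Iy);
    cv::Mat Sxx, Sxy, Syy;
    cv::GaussianBlur(Ixx, Sxx, cv::Size(3, 3), 2.0);
    cv::GaussianBlur(Ixy, Sxy, cv::Size(3, 3), 2.0);
    cv::GaussianBlur(Iyy, Syy, cv::Size(3, 3), 2.0);

    // R = det(A) - k * (Tr(A))^2, computed per pixel.
    cv::Mat det   = Sxx.mul(Syy) - Sxy.mul(Sxy);
    cv::Mat trace = Sxx + Syy;
    cv::Mat R     = det - k * trace.mul(trace);

    // Normalize the responses to [0, 255] so a fixed threshold such as 75 can be applied.
    cv::Mat dst_norm;
    cv::normalize(R, dst_norm, 0, 255, cv::NORM_MINMAX, CV_32F);
    return dst_norm;
}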
Four vectors of type int are defined to hold the row and column indices of corner points in the two images; we chose type int because row and column indices are themselves integers. The vectors for the first image are pointX and pointY, and the vectors for the second image are pointX1 and pointY1. Next we define two vectors called pointXY and pointXY1 of type Point2f, which stores a point in (x, y) format. We also define two other variables, XY and XY1, which are used later in the code.

The next part of the code draws circles where corners are found in the images. A nested for loop checks every row from zero to the last row and every column from zero to the last column of dst_norm (image 1). At each location we compare the response value to the threshold; if it is greater than the threshold, we draw a red circle using the command ‘circle( dst_norm_scaled, Point( i, j ), 1, Scalar(0,0,255), 1, 8, 0 );’ and push the location into the vector pointXY. The same code is repeated for the second image, pushing the values into the vector pointXY1. Now we have two vectors, one per image, containing all points that pass the corner threshold. The sizes of these vectors are then printed out, which is simply the number of detected corners in each image.

Vectors called SSD, mat and mat1 are created with type float. The minimum of the sizes of the two corner vectors is computed and called ‘bound’; this gives us the set of point pairs for which we can compute SSD. Vector pointXY is assigned to XY and vector pointXY1 is assigned to XY1. For each of the points up to the bound value, we extract a 5x5 patch and save it into the vector ‘mat’ for the first image and the vector ‘mat1’ for the second image. Here, we calculate the SSD value at each point. A float called temp_ssd holds the running SSD value for each corner pair within the bound. For each of the 25 positions in mat and mat1, the corresponding values are assigned to the variables var and var1, their difference is taken, squared, and added to temp_ssd. Repeating this 25 times gives the final SSD value, temp_ssd, at a particular point, which is then pushed back into the vector ‘SSD’. Hence, we have calculated an SSD value for each corner pair within the bound and stored it in the vector ‘SSD’.

Two vectors of type Point2f are created, called newXY and newXY1. We compare the SSD value obtained at each point to the SSD threshold defined at the beginning. If the calculated SSD value is less than the threshold, we copy that location from pointXY to newXY; similarly, under the same condition, we copy the value from pointXY1 to newXY1. Now the two vectors newXY and newXY1 contain the locations satisfying the SSD condition. We then print out the sizes of these two vectors, which gives the number of match points found using SSD.
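The per-pair SSD scoring and thresholding described above can be sketched as follows. This is a rough illustration under simplifying assumptions: the images are assumed to be 8-bit grayscale, corner pairs are compared by index as in the description above, and every point is assumed to lie at least two pixels from the image border; the helper names patchSSD and matchBySSD are ours for illustration only.

#include <opencv2/opencv.hpp>
#include <algorithm>
#include <vector>

// Sum of squared differences over 5x5 grayscale patches centered on two points.
float patchSSD(const cv::Mat& gray1, const cv::Mat& gray2, cv::Point2f p1, cv::Point2f p2)
{
    float ssd = 0.0f;
    for (int dy = -2; dy <= 2; ++dy) {
        for (int dx = -2; dx <= 2; ++dx) {
            float a = gray1.at<uchar>(cvRound(p1.y) + dy, cvRound(p1.x) + dx);
            float b = gray2.at<uchar>(cvRound(p2.y) + dy, cvRound(p2.x) + dx);
            float diff = a - b;
            ssd += diff * diff;                       // squared difference over all 25 pixels
        }
    }
    return ssd;
}

// Keep only the point pairs whose SSD score falls below the threshold.
void matchBySSD(const std::vector<cv::Point2f>& XY, const std::vector<cv::Point2f>& XY1,
                const cv::Mat& gray1, const cv::Mat& gray2, float ssd_threshold,
                std::vector<cv::Point2f>& newXY, std::vector<cv::Point2f>& newXY1)
{
    size_t bound = std::min(XY.size(), XY1.size());   // common range of point pairs
    for (size_t i = 0; i < bound; ++i) {
        if (patchSSD(gray1, gray2, XY[i], XY1[i]) < ssd_threshold) {
            newXY.push_back(XY[i]);
            newXY1.push_back(XY1[i]);
        }
    }
}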
For Part 4, Repeatability and Matchability, we first found the homography matrix H relating the images. The findHomography function computes a 3x3 homography matrix describing the transform between the matched keypoints, using the coordinates of the points from the original image and from the destination image. We chose to use CV_LMEDS, the least-median robust method, because the recommended papers seemed to suggest that it would yield a more accurate result than the more familiar RANSAC. Next, we used perspectiveTransform to map these interest points from image 1 to image i.

We repeated this process to map interest points from image i to image 1; the only difference was that we needed the inverse of the homography matrix to map image i interest points onto image 1. We were then able to count the number of interest points transferred to image i or image 1 by the homography or inverse homography matrix, which lets us compute the repeatability rate. We chose the distance (ε, or eps) between a transferred interest point and one of the interest points in image i to be 1.5 pixels, after testing various values ranging from 0 to 100, because the Schmid paper seems to favor 1.5 pixels. We then transferred the image 1 interest points to image i using the homography matrix and counted the number R_i that fall within eps distance of one of the interest points in the destination image i. With this value, we compute the repeatability rate (denoted r_i in the code) using the equation r_i(ε) = R_i(ε) / min(n_1, n_i), where n_1 and n_i are the numbers of interest points detected in image 1 and image i. The repeatability rate is commonly viewed as the fraction of interest points that have also been located in the other image, creating a pool of possible matching features; accordingly, it can take any value from 0 to 1.

We calculated the matchability rate in a similar way, except that we counted the number M_i of matched interest points in image i that fell within the previously defined eps distance of their predicted locations. We were then able to compute the matchability rate using the equation m_i(ε) = M_i(ε) / min(n_1, n_i). The matchability rate serves as a check on the accuracy of our feature descriptor and matching process. As a sanity check on both quantities, we verified that the matchability rate was less than or equal to the repeatability rate.
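A condensed sketch of the repeatability computation is shown below, using cv::findHomography with the least-median method (CV_LMEDS / cv::LMEDS) and cv::perspectiveTransform as described above. The helper name countWithinEps and the function interface are assumptions for illustration; the matchability rate m_i(ε) is computed the same way with the matched points in place of all detected points.

#include <opencv2/opencv.hpp>
#include <algorithm>
#include <cmath>
#include <vector>

// Count how many transferred points land within eps pixels of some detected point.
static int countWithinEps(const std::vector<cv::Point2f>& transferred,
                          const std::vector<cv::Point2f>& detected, double eps)
{
    int count = 0;
    for (const cv::Point2f& t : transferred) {
        for (const cv::Point2f& d : detected) {
            if (std::hypot(t.x - d.x, t.y - d.y) <= eps) { ++count; break; }
        }
    }
    return count;
}

// r_i(eps) = R_i(eps) / min(n_1, n_i); findHomography needs at least 4 matched pairs.
double repeatability(const std::vector<cv::Point2f>& matched1,   // matched points, image 1
                     const std::vector<cv::Point2f>& matchedI,   // matched points, image i
                     const std::vector<cv::Point2f>& corners1,   // all interest points, image 1
                     const std::vector<cv::Point2f>& cornersI,   // all interest points, image i
                     double eps = 1.5)
{
    cv::Mat H = cv::findHomography(matched1, matchedI, cv::LMEDS);  // least-median estimate

    std::vector<cv::Point2f> transferred;
    cv::perspectiveTransform(corners1, transferred, H);             // map image 1 points into image i

    int R_i = countWithinEps(transferred, cornersI, eps);
    return static_cast<double>(R_i) / std::min(corners1.size(), cornersI.size());
}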
Because we did not get good results for the feature descriptor part of our own implementation, we also wanted to implement the algorithms in C++ using the OpenCV library, so we provided a second version of the code that uses some of its built-in functions. This version uses the KeyPoint and descriptor types implemented in OpenCV, which makes it considerably more accurate. A KeyPoint contains more than just the coordinates of a point: it also stores the size, angle, response and octave. We also used the FlannBasedMatcher, which can be trained on a set of computed descriptors; essentially, the more matches we attempt, the better the results become. By doing this, we gained a good working knowledge of newer feature-detection algorithms, such as SURF, and were also able to obtain much more accurate results.

c) Run the different portions of your code and show pictures of intermediate and final results.

Harris corner images: the R-score threshold used for each image pair was 75 for set 1,2; 65 for set 1,3; 75 for set 1,4; 95 for set 1,5; and 105 for set 1,6. (The corresponding figures, named Harris_image#, Set#, are paired so that image 1_1,2 goes with 2_1,2, image 1_1,3 with 3_1,3, and so on.)

d) Run repeatability and matchability experiments on the images given and plot or tabulate the quantitative results.

e) Experimental observations: The first code works, but we wanted to see whether the functions from the OpenCV library were more accurate or robust than our original code. As we predicted, the OpenCV functions we used in the alternative code yielded better results. If we apply too much Gaussian smoothing, we eventually lose the ability to detect corners with Harris corner detection, because the pixel-to-pixel contrast becomes less prominent.

We did not visually notice much of a difference when selecting the parameters k and Rlowthreshold, but we settled on k = 0.5 because it was suggested in the project description, and we selected threshold values ranging from 65 to 105 depending on the relative number of corners being detected. For images in which we detected a substantially larger number of corners, we used a higher threshold to limit the number of corners. If we increased or decreased k, we needed to change the threshold accordingly to maintain a reasonable result, because these parameters are coupled.

For repeatability and matchability, the only design consideration we had to take into account was the choice of the epsilon value. At first, we were getting a count of zero unless we increased epsilon to nearly 100. Finally, after comparing the keypoints in a better way before computing the homography, we were able to use the epsilon values suggested as appropriate distances in the project description and in Schmid's paper.

For our original implementation, we would not be comfortable reusing the feature descriptors, but we are comfortable with our Harris corner method and SSD computations. We would, however, be comfortable using the SURF feature descriptor from OpenCV in future projects.
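As a reference for that future use, the descriptor-based alternative described in part (b) can be sketched roughly as follows. This assumes an OpenCV build that includes the xfeatures2d (contrib/nonfree) module, where SURF lives in OpenCV 3.x and later; the file names and the Hessian threshold of 400 are illustrative values, not the ones used in our code.

#include <opencv2/opencv.hpp>
#include <opencv2/xfeatures2d/nonfree.hpp>   // SURF (requires an opencv_contrib build)
#include <vector>

int main()
{
    cv::Mat img1 = cv::imread("img1.png", cv::IMREAD_GRAYSCALE);   // hypothetical file names
    cv::Mat img2 = cv::imread("img2.png", cv::IMREAD_GRAYSCALE);

    // Detect keypoints and compute SURF descriptors; 400 is an illustrative Hessian threshold.
    cv::Ptr<cv::xfeatures2d::SURF> surf = cv::xfeatures2d::SURF::create(400.0);
    std::vector<cv::KeyPoint> kp1, kp2;
    cv::Mat desc1, desc2;
    surf->detectAndCompute(img1, cv::noArray(), kp1, desc1);
    surf->detectAndCompute(img2, cv::noArray(), kp2, desc2);

    // Match descriptors with the FLANN-based matcher (KD-tree index for float descriptors).
    cv::FlannBasedMatcher matcher;
    std::vector<cv::DMatch> matches;
    matcher.match(desc1, desc2, matches);

    // Draw and save the matches for visual inspection.
    cv::Mat vis;
    cv::drawMatches(img1, kp1, img2, kp2, matches, vis);
    cv::imwrite("matches.png", vis);
    return 0;
}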