TECHNION - ISRAEL INSTITUTE OF TECHNOLOGY
Faculty of Electrical Engineering
Control and Robotics Laboratory

The Panorama Creator

by
Diana Tsamalashvili & Yevgeny Yusepovsky

Supervisor: Arie Nakhmani

Spring 2010
List of Images

Image  Chapter                                                    Description                                                     Page
1      FFT Based Algorithm for Automatic Image Registration       Transformation from rectangular coordinates to log-polar       7
2      Calculating the affine parameters for creating panorama    Pictorial representation of the fundamental RANSAC iteration   12
3      Shift                                                      The log-polar transformation example                           19
4      Shift                                                      The rotated image corrected by the shift function              20
5      Results and Conclusions                                    FFT panorama                                                   36
6      Results and Conclusions                                    Normalized weighted sum panorama                               36
7      Results and Conclusions                                    Weighted stitch panorama                                       37
Table of Contents
1. Abstract .................................................................................................................................................. 5
2. Theory .................................................................................................................................................... 6
2.1 FFT Based Algorithm for Automatic Image Registration ............................................. 6
2.2 Feature Based Algorithm for Automatic Image Registration .................................. 10
2.2.1 Feature Matching .............................................................................................................. 10
2.2.2 Calculating the affine parameters for creating panorama ................................ 11
2.2.3 Model verification by linear least square ................................................................ 13
2.2.4 Image Stitching and Blending....................................................................................... 14
Normalized Weighted Sum. ..................................................................................................... 14
Multiband Blending algorithm ................................................................................................ 15
3. Implementation ............................................................................................................................... 17
3.1 Main-FFT.m ................................................................................................................................. 17
3.2 Shift.m ........................................................................................................................................... 18
3.3 Merger.m ...................................................................................................................................... 22
3.4 Main_SIFT.m ............................................................................................................................... 26
3.5 RANSAC.m ................................................................................................................................... 29
3.6 Affinein.m .................................................................................................................................... 31
3.7 Affineout.m .................................................................................................................................. 32
3.8 Affinecheck.m ............................................................................................................................. 32
3.9 Border_filter.m........................................................................................................................... 32
3.10 SIFT.m ......................................................................................................................................... 33
4. Results and Conclusion ................................................................................................................. 36
5. Bibliography ...................................................................................................................................... 38
1. Abstract
In today’s world people try to utilize to the fullest everything they possess. We try to catch with all our senses whatever the world has to offer us, and it is no discovery to state that our sight is one of the biggest consumers of our brain. However, even today, technology cannot compete with the human body, and more than once we have found ourselves trying to capture a beautiful landscape, only to be disappointed by the limitations of our camera.
A panorama (formed from Greek πᾶν "all" + ὅραμα "sight") is a wide-angle view or representation of a physical space. The goal of our project is to implement a "Panorama Creator" from a video film taken by a regular camera.
In this project we implement both a direct FFT-based and a feature-based algorithm for stitching images. We then use two compositing techniques, normalized weighted sum and weighted stitching, in order to remove a variety of artifacts and create a high quality panorama image.
At the end of this report we compare the techniques we used and present the conditions required to get the best possible result.
2. Theory
2.1 FFT Based Algorithm for Automatic Image Registration
The first automatic registration algorithm we decided to implement is based on the Fast Fourier Transform (FFT). The displacement between two given images can be determined by computing the ratio F1·conj(F2)/(|F1||F2|) and then applying the inverse Fourier transform (Hongjie Xie et al., 2003). The result is an impulse-like function, which is approximately zero everywhere except at the displacement that is necessary to optimally register the images.
The FFT-based automatic registration method relies on the Fourier shift theorem, which
guarantees that the phase of a specially defined “ratio” is equal to the phase difference
between the images. It is known that if two images I1 and I2 differ only by a shift, (x0,
y0), [i.e., I2(x, y) = I1(x- x0, y- y0)], then their Fourier transforms are related by the
formula:
F2 ( , )  e  j  2 (  x0   y0 )  F1( , )
(1)
The “ratio” of two images I1 and I2 is defined as:
F1 ( , ) conj(F2 ( , ))
F1 ( , ) F1 ( , )e j2 ( x0  y0 )
R


(2)
abs(F1 ( , )) abs(F2 ( , )) F1 ( , ) F1 ( , ) e j2 ( x0  y0 )
where conj is the complex conjugate.
e j2 ( x0  y0 )
1
By taking the inverse Fourier transform of R , we see that the resulting function is
approximately zero everywhere except for a small neighborhood around a single point.
This single point is where the absolute value of the inverse Fourier transform of R attains
its maximum value. The location of this point is exactly the displacement (x0, y0) needed
to optimally register the images.
If the two images differ by shift, rotation and scaling, then converting $\mathrm{abs}(F(\xi,\eta))$ from rectangular coordinates (x, y) to log-polar coordinates $(\log(\rho), \theta)$ (Fig. 1) makes it possible to represent both rotation and scaling as shifts. However, computing $(\log(\rho), \theta)$ from the original rectangular grid leads to points that are not located exactly at points in the original grid. Thus, interpolation is needed to find a value of $\mathrm{abs}(F(\xi,\eta))$ on the desired grid. A bilinear interpolation is used in this implementation. Let (x, y) be the point related to the desired grid point $(\log(\rho), \theta)$:

$$x = e^{\log(\rho)}\cos(\theta); \qquad y = e^{\log(\rho)}\sin(\theta)$$

To find the new value M(x, y) using this interpolation, take the intensities $M_{j,k}$, $M_{j+1,k}$, $M_{j,k+1}$, and $M_{j+1,k+1}$ of the four original grid points (j, k), (j+1, k), (j, k+1), and (j+1, k+1) surrounding (x, y). Then interpolate M(x, y) as follows:

$$M(x, y) = M_{j,k}(1-t)(1-u) + M_{j+1,k}\,t\,(1-u) + M_{j,k+1}(1-t)\,u + M_{j+1,k+1}\,t\,u$$

where t is the fractional part of x, and u is the fractional part of y.
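A small sketch of one such interpolation step in MATLAB (here M(j,k) denotes the grid sample at x = j, y = k, following the text's notation rather than MATLAB's row/column order):

j = floor(x);  k = floor(y);   % surrounding grid corner
t = x - j;     u = y - k;      % fractional parts of x and y
Mxy = M(j,k)*(1-t)*(1-u) + M(j+1,k)*t*(1-u) ...
    + M(j,k+1)*(1-t)*u + M(j+1,k+1)*t*u;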
The final algorithm for determining rotation, scaling, and shift is:
1. Apply the FFT to images I1 and I2: F1(ξ,η) and F2(ξ,η);
2. Compute the absolute values of F1(ξ,η) and F2(ξ,η);
3. Apply a high pass filter to the absolute values to remove low frequency noise;
4. Transform the resulting values from rectangular coordinates to log-polar coordinates;
5. Apply the FFT to the log-polar images I1 and I2: Flp1(ξ,η) and Flp2(ξ,η);
6. Compute the ratio R1 of Flp1(ξ,η) and Flp2(ξ,η) using equation (2);
7. Compute the inverse FFT IR1 of the ratio R1;
8. Find the location (log(ρ0), θ0) of the maximum of abs(IR1) and obtain the values of scale (ρ0 = base^{log(ρ0)}) and rotation angle (θ0);
9. Construct a new image, I3, by applying the reverse rotation and scaling to I2 or I1;
10. Apply the FFT to images I1 and I3 (or I2 and I3), depending on whether I1 or I2 is chosen as the base image;
11. Compute the ratio R2 using equation (2);
12. Take the inverse FFT IR2 of R2;
13. Obtain the values (x0, y0) of the shift from the location of the maximum of abs(IR2).
The results of this process are the values of the scale, rotation and shift parameters
needed to register the two images. The process of image stitching will be presented in
section 2.2.4.
Limitations of the algorithm are:
1) The algorithm only works for two images of the exact same size; for the rotation and scale computation, the images also need to be square. This limitation is not severe because it is easy to produce images that are of equal size and, if necessary, square before we run the user functions.
2) The algorithm requires images that have an overlapping area larger than 30%.
3) The algorithm only works for images in which the scale changes by less than a factor of 1.8; otherwise, the criterion of 30% overlapping area is not satisfied.
4) We cannot get full homography parameters for creating the panorama.
In order to find a better solution, an additional algorithm was explored; its theoretical explanation can be found in the next chapter.
2.2 Feature Based Algorithm for Automatic Image Registration
2.2.1 Feature Matching
The first step in the panoramic recognition algorithm is to extract and match SIFT
features between all of the images. SIFT features are located at scale-space maxima/minima of a difference of Gaussian function. At each feature location, a characteristic
scale and orientation is established. This gives a similarity-invariant frame in which to
make measurements. Although simply sampling intensity values in this frame would be
similarity invariant, the invariant descriptor is actually computed by accumulating local
gradients in orientation histograms. This allows edges to shift slightly without altering
the descriptor vector, giving some robustness to affine change. The vector of gradients
is normalized, and since it consists of differences of intensity values, it is invariant to
affine changes in intensity.
Assuming that the camera rotates about its optical centre, the group of transformations the images may undergo is a special group of homographies. We parameterize each camera by 3 rotation angles $\theta = [\theta_1\;\theta_2\;\theta_3]$ and focal length f. This gives pairwise homographies

$$\tilde{u}_i = H_{ij}\tilde{u}_j, \qquad H_{ij} = K_i R_i R_j^T K_j^{-1}$$

where

$$K_i = \begin{bmatrix} f_i & 0 & 0 \\ 0 & f_i & 0 \\ 0 & 0 & 1 \end{bmatrix} \quad \text{and} \quad R_i = e^{[\theta_i]_\times}, \; [\theta_i]_\times = \begin{bmatrix} 0 & -\theta_{i3} & \theta_{i2} \\ \theta_{i3} & 0 & -\theta_{i1} \\ -\theta_{i2} & \theta_{i1} & 0 \end{bmatrix}$$

However, for small changes in image position

$$u_i = u_{i0} + \left.\frac{\partial u_i}{\partial u_j}\right|_{u_{i0}} \Delta u_j \quad \text{or equivalently} \quad \tilde{u}_i = A_{ij}\tilde{u}_j$$

where

$$A_{ij} = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ 0 & 0 & 1 \end{bmatrix}$$

is an affine transformation obtained by linearising the homography about $u_{i0}$. This implies that each small image patch undergoes an affine transformation, and justifies the use of SIFT features which are partially invariant under affine change.
Once features have been extracted from all n images (linear time), they must be
matched.
2.2.2 Calculating the affine parameters for creating panorama
Using SIFT we get a large set of points matched between two images. As described above, we need a set of 4 pairs of coordinates in order to obtain the homographic model. That would suffice if every set of points gave us the same model. However, due to noise, differences in illumination, different focal lengths, etc., our model depends on the set of points we choose. In order to choose the best set of points we need an additional algorithm.
The RANSAC algorithm (RANdom Sample And Consensus) was first introduced by
Fischler and Bolles [5] in 1981 as a method to estimate the parameters of a certain
model starting from a set of data contaminated by large amounts of outliers (Marco
Zuliani, 2008). A datum is considered to be an outlier if it does not fit a model
instantiated by a given set of parameters (assuming that both the model and the
parameters are the “true” ones) within some error threshold that defines the maximum
deviation attributable to the effect of noise. The percentage of outliers which can be
handled by RANSAC can be larger than 50% of the entire data set.
The RANSAC algorithm is essentially composed of two steps that are repeated in an
iterative fashion (hypothesize-and-test framework):
• Hypothesize. First, minimal sample sets (MSSs) are randomly selected from the input dataset and the model parameters are computed using only the elements of the MSS. The cardinality of the MSS is the smallest sufficient to determine the model parameters.
• Test. In the second step, RANSAC checks which elements of the entire dataset are consistent with the model instantiated with the parameters estimated in the first step. The set of such elements is called the consensus set (CS).
RANSAC terminates when the probability of finding a better CS drops below a certain
threshold.
Let q be the probability of sampling from the dataset D an MSS that produces an accurate estimate of the model parameters. Consequently, the probability of picking an MSS containing at least one outlier (i.e. an MSS that produces a biased estimate of the true model parameter vector) is 1 − q. If we construct h different MSSs, then the probability that all of them are contaminated by outliers is $(1-q)^h$. We would like to pick h (i.e. the number of iterations) large enough so that the probability $(1-q)^h$ is smaller than or equal to a certain probability threshold $\varepsilon$ (often called the alarm rate), i.e. $(1-q)^h \leq \varepsilon$. From the previous relation it can be concluded that the number of iterations is:

$$T_{iter} = \left\lceil \frac{\log \varepsilon}{\log(1-q)} \right\rceil$$

Image 2 shows the pictorial representation of the fundamental RANSAC iteration.
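As a quick numeric check of the iteration formula (a hedged illustration; these values of q and ε are arbitrary, not the settings used in the project):

q       = 0.5;                            % assumed probability of sampling an all-inlier MSS
epsilon = 0.01;                           % assumed alarm rate
Titer   = ceil(log(epsilon)/log(1 - q))   % = ceil(6.64) = 7 iterations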
2.2.3 Model verification by linear least square
Each calculated model is subject to a verification procedure in which a linear least
squares solution is performed for the parameters of the affine transformation (D. Lowe,
1999). The affine transformation of a model point [x y]^T to an image point [u v]^T can be written as:

$$\begin{bmatrix} u \\ v \end{bmatrix} = \begin{bmatrix} m_1 & m_2 \\ m_3 & m_4 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} t_x \\ t_y \end{bmatrix}$$

where the model translation is [tx ty]^T and the affine rotation, scale, and stretch are represented by the parameters m1, m2, m3 and m4. To solve for the transformation parameters, the equation above can be rewritten to gather the unknowns into a column vector:

$$\begin{bmatrix} x & y & 0 & 0 & 1 & 0 \\ 0 & 0 & x & y & 0 & 1 \\ & & & \cdots & & \end{bmatrix} \begin{bmatrix} m_1 \\ m_2 \\ m_3 \\ m_4 \\ t_x \\ t_y \end{bmatrix} = \begin{bmatrix} u \\ v \\ \vdots \end{bmatrix}$$

This equation shows a single match, but any number of further matches can be added, with each match contributing two more rows to the first and last matrix. At least 3 matches are needed to provide a solution. We can write this linear system as

$$A\hat{x} \approx b$$

where A is a known m-by-n matrix (usually with m > n), $\hat{x}$ is an unknown n-dimensional parameter vector, and b is a known m-dimensional measurement vector. Therefore the minimizing vector $\hat{x}$ is a solution of the normal equation

$$A^T A \hat{x} = A^T b$$

The solution of the system of linear equations is given in terms of the matrix $(A^T A)^{-1} A^T$, called the pseudoinverse of A, by

$$\hat{x} = (A^T A)^{-1} A^T b$$

which minimizes the sum of the squares of the distances from the projected model locations to the corresponding image locations.
Outliers can now be removed by checking for agreement between each image feature and the model, given the parameter solution. Given the linear least squares solution, each match is required to agree within the error range that was used for the parameters in the RANSAC threshold. Once the inliers are obtained, their number is compared to the maximum found in previous iterations; if it is larger, the new best model is saved, and the process is iterated. If fewer than 3 points remain after discarding outliers, then the match is rejected.
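In MATLAB the same solution can be obtained directly; a minimal sketch, assuming A and b have been assembled as described (two rows of A and two entries of b per match):

xhat = (A.'*A) \ (A.'*b);   % normal-equation form of the pseudoinverse solution
% equivalently: xhat = A \ b; MATLAB's backslash solves the same
% least-squares problem with better numerical conditioning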
2.2.4 Image Stitching and Blending
After calculating all affine parameters and adapting the images accordingly we are ready
to create the final composite. The simplest way would be to take an average value at
each pixel. However, this usually does not work very well, because of exposure
differences, mis-registrations and movie quality. In this section we will present two
techniques to deal with the problems mentioned above.
Normalized Weighted Sum.
One of the recommended techniques is to weight pixels near the center of the image more heavily and to down-weight pixels near the edges (Richard Szeliski, 2005). From the previous steps we have n images $I^i(x,y)$ which, given the known registration, may be expressed in a common (spherical) coordinate system as $I^i(\theta,\phi)$. In order to combine information from multiple images we assign a weight function to each image, $W(x,y) = w(x)w(y)$, where $w(x)$ and $w(y)$ vary linearly from 1 at the centre of the image to 0 at the edge [6]. The weight functions are also resampled in spherical coordinates, $W^i(\theta,\phi)$. A simple approach to blending is to perform a weighted sum of the image intensities along each ray using these weight functions:

$$I^{linear}(\theta,\phi) = \frac{\sum_{i=1}^{n} I^i(\theta,\phi)\,W^i(\theta,\phi)}{\sum_{i=1}^{n} W^i(\theta,\phi)}$$

where $I^{linear}(\theta,\phi)$ is a composite spherical image formed using linear blending. Weighted averaging with a distance map is often called feathering. However, this approach can cause blurring of high frequency detail if there are small registration errors.
Multiband Blending algorithm
To prevent the above mentioned problem the multiband blending algorithm of Burt and
Adelson (M. Brown, 2007) can be used. The idea behind multi-band blending is to blend
low frequencies over a large spatial range and high frequencies over a short range.
We initialize blending weights for each image by finding the set of points for which image i is most responsible:

$$W^i_{\max}(\theta,\phi) = \begin{cases} 1 & \text{if } W^i(\theta,\phi) = \arg\max_j W^j(\theta,\phi) \\ 0 & \text{otherwise} \end{cases}$$

i.e. $W^i_{\max}(\theta,\phi)$ is 1 for $(\theta,\phi)$ values where image i has maximum weight, and 0 where some other image has a higher weight. These max-weight maps are successively blurred to form the blending weights for each band.

A high pass version of the rendered image is formed:

$$B^i_\sigma(\theta,\phi) = I^i(\theta,\phi) - I^i_\sigma(\theta,\phi), \qquad I^i_\sigma(\theta,\phi) = I^i(\theta,\phi) * g_\sigma(\theta,\phi)$$

where $g_\sigma(\theta,\phi)$ is a Gaussian of standard deviation $\sigma$, and $B^i_\sigma(\theta,\phi)$ represents spatial frequencies in the range of wavelengths $\lambda \in [0,\sigma]$. We blend this band between images using a blending weight formed by blurring the max-weight map for this image:

$$W^i_\sigma(\theta,\phi) = W^i_{\max}(\theta,\phi) * g_\sigma(\theta,\phi)$$

where $W^i_\sigma(\theta,\phi)$ is the blend weight for the wavelength $\lambda \in [0,\sigma]$ band. Subsequent frequency bands are blended using lower frequency bandpass images and further blurring the blend weights, i.e. for $k \geq 1$:

$$B^i_{(k+1)\sigma} = I^i_{k\sigma} - I^i_{(k+1)\sigma}, \qquad I^i_{(k+1)\sigma} = I^i_{k\sigma} * g_{\sigma'}, \qquad W^i_{(k+1)\sigma} = W^i_{k\sigma} * g_{\sigma'}$$

where the standard deviation of the Gaussian blurring kernel $\sigma' = \sqrt{2k+1}\,\sigma$ is set such that subsequent bands have the same range of wavelengths.

For each band, overlapping images are linearly combined using the corresponding blend weights:

$$I^{multi}_{k\sigma}(\theta,\phi) = \frac{\sum_{i=1}^{n} B^i_{k\sigma}(\theta,\phi)\,W^i_{k\sigma}(\theta,\phi)}{\sum_{i=1}^{n} W^i_{k\sigma}(\theta,\phi)}$$

This causes high frequency bands (small $k\sigma$) to be blended over short ranges whilst low frequency bands (large $k\sigma$) are blended over larger ranges.
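Multiband blending was not implemented in this project, but a minimal two-band MATLAB sketch conveys the idea; I1 and I2 are assumed to be two registered images (double), and Wmax1, Wmax2 their max-weight maps (all names are illustrative):

sigma = 5;                                    % band-defining blur width
g = fspecial('gaussian', 6*sigma+1, sigma);   % Gaussian kernel g_sigma
low1 = imfilter(I1, g, 'replicate');  high1 = I1 - low1;   % band split of image 1
low2 = imfilter(I2, g, 'replicate');  high2 = I2 - low2;   % band split of image 2
W1 = imfilter(Wmax1, g, 'replicate');         % blurred blend weights
W2 = imfilter(Wmax2, g, 'replicate');
% high band: blended with the sharp max-weight maps (short spatial range);
% low band: blended with the blurred weights (large spatial range)
Iblend = (high1.*Wmax1 + high2.*Wmax2)./max(Wmax1 + Wmax2, eps) ...
       + (low1.*W1 + low2.*W2)./max(W1 + W2, eps);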
3. Implementation
In this part we describe the code we wrote in order to implement our project. For this purpose we used the MATLAB technical computing language. Below we describe the functions we wrote in order to reach our goal, with attention to the main issues we dealt with during the implementation.
3.1 Main-FFT.m
The "Main-FFT.m" file is the main macro for implementing the FFT-based algorithm. It reads the input file, calls the other functions following the algorithm, and its output is a panorama image.
In its recent versions MATLAB offers the option to work with different types of video, such as AVI, MPEG-1, Windows Media® Video (.wmv, .asf, .asx), and any format supported by Microsoft DirectShow, using the built-in function mmreader. Based on this function we implemented the "movie_read.m" function, which provides us with a MATLAB movie structure, the number of frames, and the size of a frame.
After getting access to the information we need, we prepare our data for further processing. First we convert the RGB data to black & white (BW) frames. It is important to notice that the conversion to grayscale mode does not affect our ability to calculate the parameters required for combining the panorama; it does, however, reduce the amount of data we process. The final result is presented in the original colors.
In the next step, we calculate the shift vector for all frames relative to the first frame using the "shift.m" function. This function is based on the FFT image registration algorithm described in chapter 2.1; more details about the implementation of "shift.m" are given below.
The final step in creating the panorama is merging all the frames according to the shift vector calculated before, as sketched below.
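The flow of "Main-FFT.m" can be summarized in a hedged skeleton; the exact signatures of movie_read and shift are assumptions based on the descriptions in this chapter, and 'input.avi' is a placeholder file name:

[mov, N, rows, cols] = movie_read('input.avi');      % movie structure, frame count, frame size
bw = zeros(rows, cols, N);
for k = 1:N
    bw(:,:,k) = double(rgb2gray(frame2im(mov(k))));  % RGB frames -> grayscale
end
x0 = zeros(1,N); y0 = zeros(1,N);
for k = 2:N
    [y0(k), x0(k)] = shift(bw(:,:,1), bw(:,:,k), rows, cols);  % FFT registration (chapter 2.1)
end
panorama = merger(bw, x0, y0, N);                    % stitch along the shift vector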
3.2 Shift.m
The "shift" function's inputs are two images (im1, im2) and their size (rows, cols). This function rotates and scales im2 so it best fits im1, and then calculates the relative shift between the frames based on the FFT algorithm, as described below.
At the beginning we preprocess the input images in order to make them square, as required for the log-polar transformation and the following calculations.
Then we transform the above results into the frequency domain using the fftn function, and calculate the absolute values.
F1=fftn(double(im1s));
F2=fftn(double(im2s));
F1=abs(fftshift(F1));
F2=abs(fftshift(F2));
The second step is to apply a high pass filter to the absolute values to remove low frequency noise, and a low pass filter with the radius of the square frame in order to remove direction dependence, i.e. to create a circle so that the transformation to log-polar coordinates will be more accurate.
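The exact filter shapes are not given in the text, so the following is only an illustrative sketch of this step, applied to the shifted spectra F1 and F2 of the square siz x siz frames:

[X,Y] = meshgrid(linspace(-1,1,siz));  % normalized frequency grid
r = hypot(X,Y);                        % distance from the spectrum centre
H = min(r,1);                          % simple high-pass: attenuates low frequencies
H(r>1) = 0;                            % low-pass disc: removes direction dependence
F1 = F1.*H;
F2 = F2.*H;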
Then we transform images from rectangular coordinates to log-polar coordinates using
the “imgpolarcoord.m” function.
log_F1=imgpolarcoord((F1),rows,cols);
log_F2=imgpolarcoord((F2),rows,cols);
In order to get some intuition about the process, below is an example of the calculations executed by MATLAB for a white square on a black background, comparing the original and rotated log-polar images (Image 3). The shift which represents the rotation in the log-polar images can be noticed by examining the white lines, which originate from the black half-circles, along the y axis:
Following the above calculations we apply the FFT to the log-polar images:
Flp1=fftn(double(log_F1));
Flp2=fftn(double(log_F2));
And now we are able to calculate the ratio R1 as defined in chapter 2.1. Following this step we calculate the inverse FFT of the ratio R1:
R1=Flp1.*conj(Flp2)./(abs(Flp1).*abs(Flp2));
IR1=ifft2(R1);
In order to obtain the values of the scale and rotation angle we need to find the location of the maximum of the absolute value of IR1. In the algorithm, one image must be chosen as the base image. It is assumed that the base image defines the reference orientation and that the object of the process is to register the second image to it. A positive rotation angle means that the image is rotated to the east relative to the base image (i.e. clockwise rotation), while a negative rotation angle means that the image is rotated to the west (i.e. counterclockwise rotation). It is also assumed that the rotation angle is in the interval [−45°, +45°], because normally the two images have similar orientations.
max_corr=max(max(abs(IR1)));
[pos_y, pos_x]=find(abs(IR1)==max_corr);
b=10^((log10(rows))/siz);
scale=b^(mod(pos_y,siz));
angle=(pos_x*pi / (2*siz))*(180/pi);
if angle>45
    angle=angle-90;
end
In our implementation we chose im1 as the base image. After calculating the relative scale and rotation angle of the second image (im2), we create a third image (im3). Since the rotation and scaling may change the size of the original matrix, we are required to resize im3 so it will be compatible with the base image. The rescaling and rotation of im2 is executed by the "img_sc_rot.m" function, which is based on two built-in MATLAB functions: imresize and imrotate.
im3=img_sc_rot(im2,1/scale,-angle);
im3=border_filter(im3);
The result for the above example, after calculating the rotation angle and correcting it, is shown in Image 4 (the rotated image corrected by the shift function).
When the frames are rotated, a bilinear interpolation is used. This means that pixels at the edge are averaged with zeros, which causes undesired artifacts during stitching. In order to deal with this we use a border filter; its implementation is explained in section 3.9.
Now we are ready to calculate the shift between the two images. Similarly to the previous steps, we are required to calculate the FFT of the two images. Then we calculate the ratio R2, defined in the previous section. Finally, in order to obtain the relative shift, we need to calculate the inverse FFT of the ratio R2 (IR2) and find the coordinates of the maximum of its absolute value.
F3=fftn(double(im3));
F1=fftn(double(im1));
R2=F1.*conj(F3)./(abs(F1).*abs(F3));
IR2=ifft2(R2);
max_corr=max(max(abs(IR2)));
[y,x]=find(abs(IR2)==max_corr);
The sign convention is that image shifts to the east or south are positive, while shifts to the west or north are negative. Also, special consideration is needed when the computed value (y and x) in the last step above is bigger than half of the image columns or rows (Table 1). The following approach is used to deal with these situations:
if (y>0.5*rows)
    y=y-rows;
end;
if (x>0.5*cols)
    x=x-cols;
end;

Status of computed shift value    Output shift value        Image shift direction
x < ½ columns                     positive (x)              to east
x > ½ columns                     negative (x - columns)    to west
y < ½ rows                        positive (y)              to south
y > ½ rows                        negative (y - rows)       to north

Table 1
3.3 Merger.m
The merger function creates a panorama from a set of discrete frames and the translation between each frame and the previous one.
As an input, the merger function gets the number of frames, the shift between every two frames, and the movie itself with the final version of every frame. In other words, if we use any kind of transform beyond translation, all of the frames are already transformed according to the model and the border filter is already applied.
The first step in the merger function is the creation of a common scale between all of the frames and a matrix of the right size to contain exactly the full panorama picture. The way to implement it is by using the "cumsum" MATLAB function, which creates a vector of cumulative sums of elements. These partial sums of the translation vector represent the translation between each frame and the first one; thus, a common scale for all of the frames is created.
The problem now is that the coordinates of the frames are limited to matrix indexes. This forces us to start our scale from the point (1,1). However, this is solved easily by subtracting the minimum of all frame locations as an offset and adding 1, as sketched below. Now everything is ready for creating a zero matrix of the right size, aimed to contain the panorama.
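A minimal sketch of these two steps (dx and dy are assumed to hold the per-frame translations; the names are illustrative):

absolute_x = cumsum(dx);                        % translation of each frame w.r.t. the first
absolute_y = cumsum(dy);
absolute_x = absolute_x - min(absolute_x) + 1;  % offset so that indexes start at (1,1)
absolute_y = absolute_y - min(absolute_y) + 1;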
%% the borders of the panorama
x_max=max(absolute_x);
x_min=min(absolute_x);
y_max=max(absolute_y);
y_min=min(absolute_y);
%% making the right size matrix for panorama + 3 color palettes
panoram=zeros(y_max+size_y,x_max+size_x);
absolute_x and absolute_y are the coordinates of the leftmost top point of each frame, and size_x, size_y are the frame sizes.
Next, the way of frame stitching has to be chosen. Thus, there are two versions of the merger function:
1) Normalized weighted sum
2) Weighted sum
Normalized weighted sum
First, the masks of the frames should be created:
temp=moviex(:,:,k);
temp3=double((temp>0));
This is done for every frame k (a 'for' loop is used).
Next, a function of weights should be chosen:
[ X,Y ] = meshgrid(linspace(-1,1,size_x),linspace(-1,1,size_y) );
a=1;
b=40;
w=1./((abs(b*X)).^a+(abs(b*Y)).^a);
To reduce smoothing, the gradient of the chosen function should be high enough, especially near the origin. This guarantees a dominant frame for most of the central pixels. On the other hand, near the edges the gradient should be low enough for smooth frame transitions. Among several implemented functions, which included an inverse parabola, a Gaussian, and (X^a + Y^a) for different values of a, the function shown above gave the best results.
The next step is the multiplication of the frame masks by the function of weights.
temp3=temp3.*w;
Now, a matrix of weights of every pixel in every frame may be created:
weight=zeros(y_max+size_y,x_max+size_x,frame_num);
and for every iteration k:
weight(absolute_y(k):(absolute_y(k)+size_y-1),absolute_x(k):(absolute_x(k)+size_x-1),k)= ...
    weight(absolute_y(k):(absolute_y(k)+size_y-1),absolute_x(k):(absolute_x(k)+size_x-1),k)+temp3;
The normalization factor is:
nirmul=sum(weight,3);
Now the panorama may be created, using the following formula:
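The formula is the normalized weighted sum of section 2.2.4; a sketch of this final step, with names following the surrounding code:

for k = 1:frame_num
    y = absolute_y(k):(absolute_y(k)+size_y-1);  % frame rows in panorama scale
    x = absolute_x(k):(absolute_x(k)+size_x-1);  % frame columns in panorama scale
    panoram(y,x) = panoram(y,x) + weight(y,x,k).*double(moviex(:,:,k));
end
panoram = panoram./max(nirmul, eps);             % normalize by the total weight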
Weighted stitching
This is another method of frame stitching, aimed at smoothing the transitions between frames as the normalized weighted stitch does. However, in contrast to normalized weighted stitching, this algorithm also aims to preserve the sharpness of the frames as much as possible.
The algorithm is based on finding the overlapping region between each two frames and making a fast transition function from one frame to another. This function should be fast enough so that as few frames as possible contribute to each panorama pixel, yet slow enough for a smooth transition.
The implementation of this algorithm uses the same idea of a function of weights; however, this time the "function of weights" has a different purpose, which will be explained later. For a translation mostly along the x axis (move_x > 2*move_y) the function is:
w=abs(X)+1;
While the masks are:
temp3=temp3.*w+(1-temp3)*(5);
weight(y,x,k)=temp3;
where 5 is just an arbitrary choice of a number greater than any value of the w function. Another option would be simply to change "5" into "inf".
Then, in order to make the code more readable, the x and y index vectors of each frame in the panorama scale are defined:
%% new frame place (all its y and x in panorama coordinates)
y=absolute_y(k):(absolute_y(k)+size_y-1); % the y axis of the new frame in panorama coordinates
x=absolute_x(k):(absolute_x(k)+size_x-1); % the x axis of the new frame in panorama coordinates
The next step is dividing the panorama and the frame into three regions: the overlapping region, the panorama-only region, and the frame-only region. The last two are straightforward:
%% masks (overlap, new, existing)
overlap=zeros(y_max+size_y,x_max+size_x);
frame=double(moviex(:,:,k)); % new frame
temp=(frame>0);              % frame mask
temp2=double(panoram>0);     % panorama mask
overlap(y,x)=temp+temp2(y,x);
overlap=double(overlap==2);  % overlap region
%% add new
add_frame=(1-overlap(y,x)).*temp.*frame;
The overlapping region, on the other hand, is handled in the following way: using the "function of weights", we find the dominant frame at every pixel and define the scale for the Gaussian function:
distt=min(weight(y,x,1:(k-1)),[],3); % dist exist
distp=distt(:,:,1);
distf=weight(y,x,k);
alfa=0.92;
l=abs(distp-alfa*distf);
The "1-Gaussian" function is defined with a standard deviation of several pixels, a peak value of 0.5, and its average at the transition line. Then the value of the pixels is calculated by multiplying the dominant frame by "1-Gaussian" and the other one by "Gaussian".
For a transition in the middle of the overlapping region the "alfa" parameter should be 1; however, with smaller "alfa" values the transition line is placed closer to the left (top) side of the new frame, which gives better results. This is due to the location of the (1,1) point, which is in the leftmost top corner of the frame.
Note that in the general case the alfa parameter may be obtained analytically (it is a linear function of the translation vector); however, "alfa=0.92" gives good results for most of the examined cases.
3.4 Main_SIFT.m
The "Main_SIFT.m" file is the main macro for implementing the feature-based algorithm. It reads the input file, calls the other functions following the algorithm, and its output is a panorama image.
As in the "Main_FFT.m" function, first we read the movie and convert the frames from the RGB color scheme to grayscale.
The next step is to extract features from the frames using the "sift.m" function. First some parameters should be initialized. Applying the "sift.m" function on the first frame provides a vector with the extracted descriptors and the matching matrix with location parameters (for more detailed information about the SIFT function, read below). As part of the initialization, two variables are also created, "loc1x" and "loc2x", which will be used to sort the matching points between two frames. In addition we define a ratio, distRatio, between two descriptors, which helps us to find the matching points between two frames.
Following the initialization we begin the main cycle of feature extraction and matching. Using a loop over the desired number of frames, we extract the descriptors and location parameters for the following frame. In order to find the matching descriptors between two frames, a vector of dot products between a descriptor from the first image and the descriptor vectors of the second frame is calculated. The values in the calculated vector represent the projections of the descriptors of the second frame on the descriptor of the first frame. In order to find the matching descriptor, the inverse cosine values are calculated and the vector is sorted.
dotprods = des1(l,:) * des2.';
[vals,indx] = sort(acos(dotprods));
The matches are identified by finding the 2 nearest neighbors of each keypoint from the first image among those in the second image, and only accepting a match if the distance to the closest neighbor is less than distRatio = 0.6 times the distance to the second closest neighbor. The threshold of 0.6 can be adjusted up to select more matches, or down to select only the most reliable ones. If the angle to the closest descriptor is smaller than distRatio times the angle to the second closest one, we copy the matching location parameters into loc1x and loc2x accordingly.
if (vals(1) < distRatio * vals(2))
loc1x(count,:)=loc1(l,:);
loc2x(count,:)=loc2(indx(1),:);
count=count+1;
end
After finding all the relevant points we calculate the transformation model using RANSAC. The inputs to the function are the location vectors (loc2x and loc1x) calculated previously, the number of point pairs used by RANSAC to calculate the model, the number of iterations, and the desired resolution.
In order to get a precise model, RANSAC uses 15 pairs of pixels and the desired threshold is half a pixel.
After a model is calculated, its parameters are stored in a matrix which represents the affine transformation. Then a transformation structure is calculated and the new position of the second frame is extracted.
[vmask,model]=RANSAC(loc2x, loc1x, 15, 1000,0.5);
if (k==2)
modl=model;
else
modl=[modl model];
end
A=[model(1) model(2) model(5);model(3) model(4) model(6)].';
tfrm=maketform('affine',A);
[y x]=tformfwd(tfrm,1,1);
As preparation for the next cycle, the descriptor and location vectors are updated for the new reference frame. By transforming the second frame we changed the locations of the features, so they also need to be multiplied by the transformation matrix. Since the shift parameters have no meaning here, only the first four values are taken.
transform=[model(1) model(2); model(3) model(4)];
loc1(:,1:2)=(transform*(loc2(:,1:2).')).';
loc1x=zeros(size(loc1));
loc2x=zeros(size(loc1));
After all the transformations are calculated, a matrix containing all the transformed images is created. The transformations are applied using the "imtransform" function. When an image is rotated and scaled, bilinear interpolation is used. This technique causes blurring and some artifacts on the edges. In order to deal with these problems, the "border_filter" function is used.
homo_frame(1:rows,1:cols,1)=bw_frame(:,:,1);
for k=2:N
    % affine
    A=[modl(1,k-1) modl(2,k-1) modl(5,k-1);modl(3,k-1) modl(4,k-1) modl(6,k-1)].';
    tfrm=maketform('affine',A);
    res=imtransform(bw_frame(:,:,k),tfrm,'Udata',[1 size_y],'Vdata',[1 size_x]);
    homo_frame(1:size(res,1),1:size(res,2),k)=res;
    homo_frame(:,:,k)=border_filter(uint8(homo_frame(:,:,k)));
end
Now all the frames are ready to be merged into one panorama image. By applying the "merger.m" function on the "homo_frame" matrix we construct the desired image.
panorama =merger(homo_frame,x0,y0,N);
3.5 RANSAC.m
The purpose of RANSAC function is to select the combination of points which will give us
the best model for transformation. In order to reach that goal the general RANSAC
implementation was adapted in order to meet our project requirements.
The inputs of the function are:
mData and mData2 – the location vectors extracted by SIFT function for two frames.
nSamleLen – the number of random pair of points used to calculate the model in each
iteration.
nIter – the number of iterations for each pair of frames.
dThreshold – the threshold for residuum.
The outputs are:
vMask – a vector which indicated the inliers in location vector. 1s set for inliers and 0s
for outliers.
Model – approximate model of the transformation between two images.
The model calculated by RANSAC is verified using the Least-Square method, as explained in chapter 2.2.3. In order to check it we need to build two matrices. This process is implemented by applying the functions "affinein" and "affineout" (see sections 3.6 and 3.7) on the location parameter vectors (mData and mData2) accordingly.
At the beginning of the main cycle, nSampLen pairs of points are chosen randomly. Then a model is calculated using the Least-Square method:
A_in=affinein(mData(:,Sample));
A_out=affineout(mData2(:,Sample));
ModelSet= inv((A_in.')*A_in)*(A_in.')*A_out;
In order to verify the model, we multiply the vector of location parameters from the first image by the calculated transformation. Then, using the "affinecheck" function (see section 3.8), we create a mask representing the inliers found for this model. By summing the mask we calculate an indication of the quality of the model:
transformed=A_in_check*ModelSet;
CurMask=affinecheck(A_out_check,transformed,dThreshold);
nCurInlyerCount = sum(CurMask);
If the newly calculated indicator, nCurInlyerCount, is higher than the previous one, the latest model is saved as the best found.
After choosing the best model, the model is calculated again using only the inliers, in order to get the most precise result possible:
A_in=affinein(mData(:,vMask));
A_out=affineout(mData2(:,vMask));
Model= inv((A_in.')*A_in)*(A_in.')*A_out;
The calculated model and the mask representing the quality of the model are returned to the main function.
3.6 Affinein.m
This function gets a 2 x N matrix, "mData", which contains the pixel coordinates from the first image; N is the number of points. The function returns the first matrix needed for the Least-Square algorithm calculations (see the related explanation in the "Theory" part) with the input data sorted as required.
In order to create this matrix, a temporary matrix "temp" of size N x 6 is created. Then the contents of "mData" are copied into the first two columns of "temp", and the 5th column is set to 1. Following this, the "upsample" function is used in order to create a temporary copy of "temp" with every second row set to zeros, "half1". Then we create "half2", which is a copy of "half1" with the columns shifted one column right and the rows shifted one row down. Then columns 2-4 of the "half2" matrix are shifted by one more column to the right. To complete the creation, "half2" is summed with "half1", and we get the desired matrix.
temp=zeros(size(mData,2),6);
temp(:,1:2)=mData.';
temp(:,5)=1;
half1=upsample(temp,2);
half2=circshift(half1,[1 1]);
half2(:,2:4)=circshift(half2(:,2:4),[0 1]);
res=half1+half2;
3.7 Affineout.m
This function gets a 2 x N matrix, "mData", which contains the pixel coordinates from the second image; N is the number of points. The function returns the result vector needed for the Least-Square algorithm calculations (see the related explanation in the "Theory" part) with the input data sorted as required.
In order to create this vector, temporary vectors "x" and "y" are created. They contain the x and y coordinates respectively, with zeros between the values. Then the contents of "y" are shifted down by one place. To complete the creation of the desired vector, "res", we sum the "x" and "y" vectors.
x=upsample(mData(1,:).',2);
y=upsample(mData(2,:).',2);
y=circshift(y,1);
res=x+y;
3.8 Affinecheck.m
This function gets two sets of data in the format f = [x1 y1 x2 y2 x3 y3 ...]. It finds the elements for which

$$(x_1 - x_2)^2 + (y_1 - y_2)^2 \leq \mathrm{threshold}$$

and returns the answer in the form of a mask. For example, if points 1, 2 and 4 do not satisfy the rule, while 3 and 5 satisfy it, the returned vector will be [0 0 1 0 1].'.
The function is used to verify the model calculated by RANSAC.
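A hedged sketch of this test, assuming the two inputs are column vectors in the interleaved [x1 y1 x2 y2 ...] format described above:

dx = A_out_check(1:2:end) - transformed(1:2:end);  % x residuals
dy = A_out_check(2:2:end) - transformed(2:2:end);  % y residuals
CurMask = ((dx.^2 + dy.^2) <= dThreshold);         % 1 = inlier, 0 = outlier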
3.9 Border_filter.m
After bilinear interpolation is applied to an image while rotating it, undesired artifacts are created on the borders. They are the result of averaging between an original pixel and the black background created during the rotation. This function removes the unwanted border artifacts and returns the image with sharp borders.
The filter is implemented using four convolutions between the image and a shifted delta function, one for each of four directions:
c1=[0,0,1];
tempx1=(conv2(double(im3),double(c1),'same'));
c2=[1,0,0];
tempx2=(conv2(double(im3),double(c2),'same'));
c3=[0;0;1];
tempy1=(conv2(double(im3),double(c3),'same'));
c4=[1;0;0];
tempy2=(conv2(double(im3),double(c4),'same'));
The results are copies of the image shifted by one pixel in each of the four directions; each is zero where the corresponding neighbor lies on the black background. Multiplying them together with the image itself (presumably held in "temp", which is set earlier in the function) gives a mask that is zero exactly at the pixels with artifacts:
temp=temp.*tempx1.*tempx2.*tempy1.*tempy2;
temp=uint8(temp>0);
After multiplying the created mask "temp" with the image, we get rid of the undesired pixels, and the corrected image is returned to the main function.
3.10 SIFT.m
This function reads an image and returns its SIFT keypoints according to David Lowe's algorithm. Additional outputs of this function are the descriptors: a K-by-128 matrix, where each row gives an invariant descriptor for one of the K keypoints. The descriptor is a vector of 128 values normalized to unit length. The function also returns a K-by-4 matrix, locs, in which each row has the 4 values for a keypoint location (row, column, scale, orientation). The orientation is in the range [-π, π] radians.
In our implementation we use a demo version of David Lowe's SIFT keypoint detector in the form of compiled binaries that can run under Linux or Windows. The demo software uses the PGM format for image input, so first we convert the input image into a PGM image file.
Then the function calls the executable file "siftWin32". If there is any error while reading the file or creating a temporary file, "tmp.key", for saving the keypoints, corresponding messages are shown. Otherwise two output matrices are created, "locs" and "descriptors", and the information is read from the tmp.key file.
The file format starts with 2 integers giving the total number of keypoints and the length of the descriptor vector for each keypoint (128). Then the location of each keypoint in the image is specified by 4 floating point numbers giving the pixel's row and column location, scale, and orientation (in the range [-π, π] radians). Obviously, these numbers are not invariant to viewpoint, but can be used in later stages of processing to check for geometric consistency among matches. Finally, the invariant descriptor vector for the keypoint is given as a list of 128 integers in the range [0, 255]. Keypoints from a new image can be matched to those from previous images by simply looking for the descriptor vector with the closest Euclidean distance among all vectors from previous images.
Each descriptor is normalized to unit length, i.e. divided by the square root of the sum of its squared components:
num = header(1);
len = header(2);
…
for i = 1:num
    [vector, count] = fscanf(g, '%f %f %f %f', [1 4]);
    if count ~= 4
        error('Invalid keypoint file format');
    end
    locs(i, :) = vector(1, :);
    [descrip, count] = fscanf(g, '%d', [1 len]);
    if (count ~= 128)
        error('Invalid keypoint file value.');
    end
    % Normalize each input vector to unit length
    descrip = descrip / sqrt(sum(descrip.^2));
    descriptors(i, :) = descrip(1, :);
end
In the end all the outputs are returned and the "tmp.key" file is closed.
4. Results and Conclusion
In this chapter we present the results obtained with the algorithms described in the previous chapters. The movie we used in order to demonstrate the abilities of the implemented code contains 46 frames. The movie was filmed at a rate of 8 frames/second, and the size of each frame is 240 x 320 pixels [height x width].
Below you can see the result of the FFT based algorithm (Image 5). The stitching method used to obtain this image is the weighted stitch. The creation of the panorama took approximately 10 seconds (depending on the hardware used to run the MATLAB code). Examining the result, we conclude that the parameters required to obtain the Translation, Euclidean and Similarity transformations are extracted correctly from the frames. The horizontal lines which define the windows of the left building are a good example of correct stitching.
The results obtained using the feature based algorithm differ in the stitching method. The best results were calculated using these parameters: number of points to create the model – 8; number of iterations – 100,000; threshold – 0.25 (which is 0.5 pixels). The creation using this approach with the above-mentioned parameters took approximately 6 hours.
The panorama below (Image 6) was created using the normalized weighted sum method for stitching. As you can see, all the edges of the stitched frames are smoothed; however, some areas are blurred. The reason for this is the image quality and minor inaccuracies in the transformation model obtained by RANSAC.
The second panorama obtained with the feature based algorithm uses the weighted stitch (Image 7). As you can see, the resulting image is less blurred; however, the small model inaccuracies can be noticed along a few stitching lines, caused mainly by the difference in illumination at the edges of the frames.
After examining the results presented above, we can summarize that in order to create a panorama which is mainly based on a translation transformation, it is wise to use the FFT algorithm, since the image we obtained is focused and the stitching is precise. When we are creating a panorama where an affine transformation should be obtained, we recommend using the weighted stitch, since the quality of the image is less damaged by the applied masks. If the stitching lines are not smoothed enough (this depends on the image quality, the illumination and more), the normalized weighted sum should be used.
5. Bibliography
1) Richard Szeliski, "Image Alignment and Stitching", (2005).
2) Matthew Brown and David G. Lowe, "Recognising panoramas", International Conference on Computer Vision (ICCV 2003), Nice, France (October 2003), pp. 1218-1225.
3) David G. Lowe, "Object recognition from local scale-invariant features", International Conference on Computer Vision, Corfu, Greece (September 1999), pp. 1150-1157.
4) Hongjie Xie, Nigel Hicks, G. Randy Keller, Haitao Huang, Vladik Kreinovich, "An IDL/ENVI implementation of the FFT-based algorithm for automatic image registration", Computers & Geosciences 29 (2003), pp. 1045-1055.
5) Marco Zuliani, "RANSAC for Dummies", a tutorial, (November 2008).
6) Matthew Brown and David G. Lowe, "Automatic panoramic image stitching using invariant features", International Journal of Computer Vision, 74(1) (2007), pp. 59-73.