TECHNION - ISRAEL INSTITUTE OF TECHNOLOGY
Faculty of Electrical Engineering
Control and Robotics Laboratory

The Panorama Creator

by
Diana Tsamalashvili & Yevgeny Yusepovsky

Supervisor: Arie Nakhmani

Spring 2010
List of Images

Image  Chapter                                                    Description                                                     Page
1      FFT Based Algorithm for Automatic Image Registration       Transformation from rectangular coordinates to log-polar       7
2      Calculating the affine parameters for creating panorama    Pictorial representation of the fundamental RANSAC iteration   12
3      Shift                                                      The log-polar transformation example                           19
4      Shift                                                      The rotated image corrected by the shift function              20
5      Results and Conclusions                                    FFT panorama                                                   36
6      Results and Conclusions                                    Normalized weighted sum panorama                               36
7      Results and Conclusions                                    Weighted stitch panorama                                       37
Table of Contents
1. Abstract .................................................................................................................................................. 5
2. Theory .................................................................................................................................................... 6
2.1 FFT Based Algorithm for Automatic Image Registration ............................................. 6
2.2 Feature Based Algorithm for Automatic Image Registration .................................. 10
2.2.1 Feature Matching .............................................................................................................. 10
2.2.2 Calculating the affine parameters for creating panorama ................................ 11
2.2.3 Model verification by linear least square ................................................................ 13
2.2.4 Image Stitching and Blending....................................................................................... 14
Normalized Weighted Sum. ..................................................................................................... 14
Multiband Blending algorithm ................................................................................................ 15
3. Implementation ............................................................................................................................... 17
3.1 Main-FFT.m ................................................................................................................................. 17
3.2 Shift.m ........................................................................................................................................... 18
3.3 Merger.m ...................................................................................................................................... 22
3.4 Main_SIFT.m ............................................................................................................................... 26
3.5 RANSAC.m ................................................................................................................................... 29
3.6 Affinein.m .................................................................................................................................... 31
3.7 Affineout.m .................................................................................................................................. 32
3.8 Affinecheck.m ............................................................................................................................. 32
3.9 Border_filter.m........................................................................................................................... 32
3.10 SIFT.m ......................................................................................................................................... 33
4. Results and Conclusion ................................................................................................................. 36
5. Bibliography ...................................................................................................................................... 38
1. Abstract
In today’s world people try to utilize to the fullest everything they possess. We try to catch with all our senses whatever the world has to offer us, and it is no discovery to state that our sight is one of the biggest consumers of our brain. However, even today, technology cannot compete with the human body, and more than once we have found ourselves trying to capture a beautiful landscape, only to be disappointed by the limitations of our camera.
A panorama (formed from Greek πᾶν "all" + ὅραμα "sight") is a wide-angle view or representation of a physical space. The goal of our project is to implement a "Panorama Creator" from a video film taken by a regular camera.
In this project we implement both a direct FFT-based and a feature-based algorithm for stitching images. We then use two compositing techniques, normalized weighted sum and weighted stitching, in order to remove a variety of artifacts and create a high quality panorama image.
At the end of this report we compare the techniques we used and present the conditions required to get the best possible result.
2. Theory
2.1 FFT Based Algorithm for Automatic Image Registration
The first automatic registration algorithm we decided to implement is based on the Fast Fourier Transform (FFT). The displacement between two given images can be determined by computing the ratio F1·conj(F2)/(|F1||F2|) and then applying the inverse Fourier transform (Hongjie Xie et al., 2003). The result is an impulse-like function, which is approximately zero everywhere except at the displacement that is necessary to optimally register the images.
The FFT-based automatic registration method relies on the Fourier shift theorem, which
guarantees that the phase of a specially defined “ratio” is equal to the phase difference
between the images. It is known that if two images I1 and I2 differ only by a shift, (x0,
y0), [i.e., I2(x, y) = I1(x- x0, y- y0)], then their Fourier transforms are related by the
formula:
F2 ( , )  e  j  2 (  x0   y0 )  F1( , )
(1)
The “ratio” of two images I1 and I2 is defined as:
F1 ( , ) conj(F2 ( , ))
F1 ( , ) F1 ( , )e j2 ( x0  y0 )
R


(2)
abs(F1 ( , )) abs(F2 ( , )) F1 ( , ) F1 ( , ) e j2 ( x0  y0 )
where conj is the complex conjugate.
e j2 ( x0  y0 )
1
By taking the inverse Fourier transform of R , we see that the resulting function is
approximately zero everywhere except for a small neighborhood around a single point.
This single point is where the absolute value of the inverse Fourier transform of R attains
its maximum value. The location of this point is exactly the displacement (x0, y0) needed
to optimally register the images.
If the two images differ by shift, rotation and scaling, then converting $\mathrm{abs}(F(\xi,\eta))$ from rectangular coordinates (x, y) to log-polar coordinates $(\log(\rho), \theta)$ (Fig. 1) makes it possible to represent both rotation and scaling as shifts. However, computing $(\log(\rho), \theta)$ from the original rectangular grid leads to points that are not located exactly at points in the original grid. Thus, interpolation is needed to find a value of $\mathrm{abs}(F(\xi,\eta))$ on the desired grid. A bilinear interpolation is used in this implementation. Let (x, y) be the point related to the desired grid point $(\log(\rho), \theta)$:

$$x = e^{\log(\rho)}\cos(\theta); \qquad y = e^{\log(\rho)}\sin(\theta)$$

To find the new value M(x, y) using this interpolation, take the intensities $M_{j,k}$, $M_{j+1,k}$, $M_{j,k+1}$, and $M_{j+1,k+1}$ of the four original grid points (j, k), (j+1, k), (j, k+1), and (j+1, k+1) surrounding (x, y). Then interpolate M(x, y) as follows:

$$M(x, y) = M_{j,k}(1-t)(1-u) + M_{j+1,k}\,t\,(1-u) + M_{j,k+1}(1-t)\,u + M_{j+1,k+1}\,t\,u$$

where t is the fractional part of x, and u is the fractional part of y.
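A small sketch of one such interpolation step in MATLAB (here M(j,k) denotes the grid sample at x = j, y = k, following the text's notation rather than MATLAB's row/column order):

j = floor(x);  k = floor(y);   % surrounding grid corner
t = x - j;     u = y - k;      % fractional parts of x and y
Mxy = M(j,k)*(1-t)*(1-u) + M(j+1,k)*t*(1-u) ...
    + M(j,k+1)*(1-t)*u + M(j+1,k+1)*t*u;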
The final algorithm for determining rotation, scaling, and shift is:
1. Apply the FFT to images I1 and I2: F1(ξ,η) and F2(ξ,η);
2. Compute the absolute values of F1(ξ,η) and F2(ξ,η);
3. Apply a high pass filter to the absolute values to remove low frequency noise;
4. Transform the resulting values from rectangular coordinates to log-polar coordinates;
5. Apply the FFT to the log-polar images I1 and I2: Flp1(ξ,η) and Flp2(ξ,η);
6. Compute the ratio R1 of Flp1(ξ,η) and Flp2(ξ,η) using equation (2);
7. Compute the inverse FFT IR1 of the ratio R1;
8. Find the location (log(ρ0), θ0) of the maximum of abs(IR1) and obtain the values of scale (ρ0 = base^{log(ρ0)}) and rotation angle (θ0);
9. Construct a new image, I3, by applying the reverse rotation and scaling to I2 or I1;
10. Apply the FFT to images I1 and I3 (or I2 and I3), depending on whether I1 or I2 is chosen as the base image;
11. Compute the ratio R2 using equation (2);
12. Take the inverse FFT IR2 of R2;
13. Obtain the values (x0, y0) of the shift from the location of the maximum of abs(IR2).
The results of this process are the values of the scale, rotation and shift parameters
needed to register the two images. The process of image stitching will be presented in
section 2.2.4.
Limitations of the algorithm are:
1) The algorithm only works for two images of the exact same size; for the rotation and scale computation, the images also need to be square. This limitation is not severe because it is easy to produce images that are of equal size and, if necessary, square before we run the user functions.
2) The algorithm requires images that have an overlapping area larger than 30%.
3) The algorithm only works for images in which the scale changes by less than a factor of 1.8; otherwise, the criterion of 30% overlapping area is not satisfied.
4) We cannot get full homography parameters for creating the panorama.
In order to find a better solution, an additional algorithm was explored; its theoretical explanation can be found in the next chapter.
2.2 Feature Based Algorithm for Automatic Image Registration
2.2.1 Feature Matching
The first step in the panoramic recognition algorithm is to extract and match SIFT
features between all of the images. SIFT features are located at scale-space maxima/minima of a difference of Gaussian function. At each feature location, a characteristic
scale and orientation is established. This gives a similarity-invariant frame in which to
make measurements. Although simply sampling intensity values in this frame would be
similarity invariant, the invariant descriptor is actually computed by accumulating local
gradients in orientation histograms. This allows edges to shift slightly without altering
the descriptor vector, giving some robustness to affine change. The vector of gradients
is normalized, and since it consists of differences of intensity values, it is invariant to
affine changes in intensity.
Assuming that the camera rotates about its optical centre, the group of transformations the images may undergo is a special group of homographies. We parameterize each camera by 3 rotation angles $\theta = [\theta_1\;\theta_2\;\theta_3]$ and focal length f. This gives pairwise homographies

$$\tilde{u}_i = H_{ij}\tilde{u}_j, \qquad H_{ij} = K_i R_i R_j^T K_j^{-1}$$

where

$$K_i = \begin{bmatrix} f_i & 0 & 0 \\ 0 & f_i & 0 \\ 0 & 0 & 1 \end{bmatrix} \quad \text{and} \quad R_i = e^{[\theta_i]_\times}, \; [\theta_i]_\times = \begin{bmatrix} 0 & -\theta_{i3} & \theta_{i2} \\ \theta_{i3} & 0 & -\theta_{i1} \\ -\theta_{i2} & \theta_{i1} & 0 \end{bmatrix}$$

However, for small changes in image position

$$u_i = u_{i0} + \left.\frac{\partial u_i}{\partial u_j}\right|_{u_{i0}} \Delta u_j \quad \text{or equivalently} \quad \tilde{u}_i = A_{ij}\tilde{u}_j$$

where

$$A_{ij} = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ 0 & 0 & 1 \end{bmatrix}$$

is an affine transformation obtained by linearising the homography about $u_{i0}$. This implies that each small image patch undergoes an affine transformation, and justifies the use of SIFT features which are partially invariant under affine change.
Once features have been extracted from all n images (linear time), they must be
matched.
2.2.2 Calculating the affine parameters for creating panorama
Using SIFT we get a large set of points matched between two images. As described above, we need a set of 4 pairs of coordinates in order to obtain the homographic model. That would suffice if every set of points gave us the same model. However, due to noise, differences in illumination, different focal lengths, etc., our model depends on the set of points we choose. In order to choose the best set of points we need an additional algorithm.
The RANSAC algorithm (RANdom Sample And Consensus) was first introduced by
Fischler and Bolles [5] in 1981 as a method to estimate the parameters of a certain
model starting from a set of data contaminated by large amounts of outliers (Marco
Zuliani, 2008). A datum is considered to be an outlier if it does not fit a model
instantiated by a given set of parameters (assuming that both the model and the
parameters are the “true” ones) within some error threshold that defines the maximum
deviation attributable to the effect of noise. The percentage of outliers which can be
handled by RANSAC can be larger than 50% of the entire data set.
The RANSAC algorithm is essentially composed of two steps that are repeated in an
iterative fashion (hypothesize-and-test framework):
• Hypothesize. First, minimal sample sets (MSSs) are randomly selected from the input dataset and the model parameters are computed using only the elements of the MSS. The cardinality of the MSS is the smallest sufficient to determine the model parameters.
• Test. In the second step, RANSAC checks which elements of the entire dataset are consistent with the model instantiated with the parameters estimated in the first step. The set of such elements is called the consensus set (CS).
RANSAC terminates when the probability of finding a better CS drops below a certain
threshold.
Let q be the probability of sampling from the dataset D an MSS that produces an accurate estimate of the model parameters. Consequently, the probability of picking an MSS containing at least one outlier (i.e. an MSS that produces a biased estimate of the true model parameter vector) is 1 − q. If we construct h different MSSs, then the probability that all of them are contaminated by outliers is $(1-q)^h$. We would like to pick h (i.e. the number of iterations) large enough so that the probability $(1-q)^h$ is smaller than or equal to a certain probability threshold $\varepsilon$ (often called the alarm rate), i.e. $(1-q)^h \leq \varepsilon$. From the previous relation it can be concluded that the number of iterations is:

$$T_{iter} = \left\lceil \frac{\log \varepsilon}{\log(1-q)} \right\rceil$$

Image 2 shows the pictorial representation of the fundamental RANSAC iteration.
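As a quick numeric check of the iteration formula (a hedged illustration; these values of q and ε are arbitrary, not the settings used in the project):

q       = 0.5;                            % assumed probability of sampling an all-inlier MSS
epsilon = 0.01;                           % assumed alarm rate
Titer   = ceil(log(epsilon)/log(1 - q))   % = ceil(6.64) = 7 iterations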
2.2.3 Model verification by linear least square
Each calculated model is subject to a verification procedure in which a linear least
squares solution is performed for the parameters of the affine transformation (D. Lowe,
1999). The affine transformation of a model point [x y]^T to an image point [u v]^T can be written as:

$$\begin{bmatrix} u \\ v \end{bmatrix} = \begin{bmatrix} m_1 & m_2 \\ m_3 & m_4 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} t_x \\ t_y \end{bmatrix}$$

where the model translation is [tx ty]^T and the affine rotation, scale, and stretch are represented by the parameters m1, m2, m3 and m4. To solve for the transformation parameters, the equation above can be rewritten to gather the unknowns into a column vector:

$$\begin{bmatrix} x & y & 0 & 0 & 1 & 0 \\ 0 & 0 & x & y & 0 & 1 \\ & & & \cdots & & \end{bmatrix} \begin{bmatrix} m_1 \\ m_2 \\ m_3 \\ m_4 \\ t_x \\ t_y \end{bmatrix} = \begin{bmatrix} u \\ v \\ \vdots \end{bmatrix}$$

This equation shows a single match, but any number of further matches can be added, with each match contributing two more rows to the first and last matrix. At least 3 matches are needed to provide a solution. We can write this linear system as

$$A\hat{x} \approx b$$

where A is a known m-by-n matrix (usually with m > n), $\hat{x}$ is an unknown n-dimensional parameter vector, and b is a known m-dimensional measurement vector. Therefore the minimizing vector $\hat{x}$ is a solution of the normal equation

$$A^T A \hat{x} = A^T b$$

The solution of the system of linear equations is given in terms of the matrix $(A^T A)^{-1} A^T$, called the pseudoinverse of A, by

$$\hat{x} = (A^T A)^{-1} A^T b$$

which minimizes the sum of the squares of the distances from the projected model locations to the corresponding image locations.
Outliers can now be removed by checking for agreement between each image feature and the model, given the parameter solution. Given the linear least squares solution, each match is required to agree within the error range that was used for the parameters in the RANSAC threshold. Once the inliers are obtained, their number is compared to the maximum found in previous iterations; if it is larger, the new best model is saved, and the process is iterated. If fewer than 3 points remain after discarding outliers, then the match is rejected.
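In MATLAB the same solution can be obtained directly; a minimal sketch, assuming A and b have been assembled as described (two rows of A and two entries of b per match):

xhat = (A.'*A) \ (A.'*b);   % normal-equation form of the pseudoinverse solution
% equivalently: xhat = A \ b; MATLAB's backslash solves the same
% least-squares problem with better numerical conditioning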
2.2.4 Image Stitching and Blending
After calculating all affine parameters and adapting the images accordingly we are ready
to create the final composite. The simplest way would be to take an average value at
each pixel. However, this usually does not work very well, because of exposure
differences, mis-registrations and movie quality. In this section we will present two
techniques to deal with the problems mentioned above.
Normalized Weighted Sum.
One of the recommended techniques is to weight pixels near the center of the image more heavily and to down-weight pixels near the edges (Richard Szeliski, 2005). From the previous steps we have n images $I^i(x,y)$ which, given the known registration, may be expressed in a common (spherical) coordinate system as $I^i(\theta,\phi)$. In order to combine information from multiple images we assign a weight function to each image, $W(x,y) = w(x)w(y)$, where $w(x)$ and $w(y)$ vary linearly from 1 at the centre of the image to 0 at the edge [6]. The weight functions are also resampled in spherical coordinates, $W^i(\theta,\phi)$. A simple approach to blending is to perform a weighted sum of the image intensities along each ray using these weight functions:

$$I^{linear}(\theta,\phi) = \frac{\sum_{i=1}^{n} I^i(\theta,\phi)\,W^i(\theta,\phi)}{\sum_{i=1}^{n} W^i(\theta,\phi)}$$

where $I^{linear}(\theta,\phi)$ is a composite spherical image formed using linear blending. Weighted averaging with a distance map is often called feathering. However, this approach can cause blurring of high frequency detail if there are small registration errors.
Multiband Blending algorithm
To prevent the above mentioned problem the multiband blending algorithm of Burt and
Adelson (M. Brown, 2007) can be used. The idea behind multi-band blending is to blend
low frequencies over a large spatial range and high frequencies over a short range.
We initialize blending weights for each image by finding the set of points for which image i is most responsible:

$$W^i_{\max}(\theta,\phi) = \begin{cases} 1 & \text{if } W^i(\theta,\phi) = \arg\max_j W^j(\theta,\phi) \\ 0 & \text{otherwise} \end{cases}$$

i.e. $W^i_{\max}(\theta,\phi)$ is 1 for $(\theta,\phi)$ values where image i has maximum weight, and 0 where some other image has a higher weight. These max-weight maps are successively blurred to form the blending weights for each band.

A high pass version of the rendered image is formed:

$$B^i_\sigma(\theta,\phi) = I^i(\theta,\phi) - I^i_\sigma(\theta,\phi), \qquad I^i_\sigma(\theta,\phi) = I^i(\theta,\phi) * g_\sigma(\theta,\phi)$$

where $g_\sigma(\theta,\phi)$ is a Gaussian of standard deviation $\sigma$, and $B^i_\sigma(\theta,\phi)$ represents spatial frequencies in the range of wavelengths $\lambda \in [0,\sigma]$. We blend this band between images using a blending weight formed by blurring the max-weight map for this image:

$$W^i_\sigma(\theta,\phi) = W^i_{\max}(\theta,\phi) * g_\sigma(\theta,\phi)$$

where $W^i_\sigma(\theta,\phi)$ is the blend weight for the wavelength $\lambda \in [0,\sigma]$ band. Subsequent frequency bands are blended using lower frequency bandpass images and further blurring the blend weights, i.e. for $k \geq 1$:

$$B^i_{(k+1)\sigma} = I^i_{k\sigma} - I^i_{(k+1)\sigma}, \qquad I^i_{(k+1)\sigma} = I^i_{k\sigma} * g_{\sigma'}, \qquad W^i_{(k+1)\sigma} = W^i_{k\sigma} * g_{\sigma'}$$

where the standard deviation of the Gaussian blurring kernel $\sigma' = \sqrt{2k+1}\,\sigma$ is set such that subsequent bands have the same range of wavelengths.

For each band, overlapping images are linearly combined using the corresponding blend weights:

$$I^{multi}_{k\sigma}(\theta,\phi) = \frac{\sum_{i=1}^{n} B^i_{k\sigma}(\theta,\phi)\,W^i_{k\sigma}(\theta,\phi)}{\sum_{i=1}^{n} W^i_{k\sigma}(\theta,\phi)}$$

This causes high frequency bands (small $k\sigma$) to be blended over short ranges whilst low frequency bands (large $k\sigma$) are blended over larger ranges.
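Multiband blending was not implemented in this project, but a minimal two-band MATLAB sketch conveys the idea; I1 and I2 are assumed to be two registered images (double), and Wmax1, Wmax2 their max-weight maps (all names are illustrative):

sigma = 5;                                    % band-defining blur width
g = fspecial('gaussian', 6*sigma+1, sigma);   % Gaussian kernel g_sigma
low1 = imfilter(I1, g, 'replicate');  high1 = I1 - low1;   % band split of image 1
low2 = imfilter(I2, g, 'replicate');  high2 = I2 - low2;   % band split of image 2
W1 = imfilter(Wmax1, g, 'replicate');         % blurred blend weights
W2 = imfilter(Wmax2, g, 'replicate');
% high band: blended with the sharp max-weight maps (short spatial range);
% low band: blended with the blurred weights (large spatial range)
Iblend = (high1.*Wmax1 + high2.*Wmax2)./max(Wmax1 + Wmax2, eps) ...
       + (low1.*W1 + low2.*W2)./max(W1 + W2, eps);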
3. Implementation
In this part we describe the code we wrote in order to implement our project. For this purpose we used the MATLAB technical computing language. Below we describe the functions we wrote in order to reach our goal, with attention to the main issues we dealt with during the implementation.
3.1 Main-FFT.m
The "Main-FFT.m" file is the main macro for implementing the FFT-based algorithm. It reads the input file, calls the other functions following the algorithm, and its output is a panorama image.
In its recent versions MATLAB offers the option to work with different types of video, such as AVI, MPEG-1, Windows Media® Video (.wmv, .asf, .asx), and any format supported by Microsoft DirectShow, using the built-in function mmreader. Based on this function we implemented the "movie_read.m" function, which provides us with a MATLAB movie structure, the number of frames, and the size of a frame.
After getting access to the information we need, we prepare our data for further processing. First we convert the RGB data to black & white (BW) frames. It is important to notice that the conversion to grayscale mode does not affect our ability to calculate the parameters required for combining the panorama; it does, however, reduce the amount of data we process. The final result is presented in the original colors.
In the next step, we calculate the shift vector for all frames relative to the first frame using the "shift.m" function. This function is based on the FFT image registration algorithm described in chapter 2.1; more details about the implementation of "shift.m" are given below.
The final step in creating the panorama is merging all the frames according to the shift vector calculated before, as sketched below.
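The flow of "Main-FFT.m" can be summarized in a hedged skeleton; the exact signatures of movie_read and shift are assumptions based on the descriptions in this chapter, and 'input.avi' is a placeholder file name:

[mov, N, rows, cols] = movie_read('input.avi');      % movie structure, frame count, frame size
bw = zeros(rows, cols, N);
for k = 1:N
    bw(:,:,k) = double(rgb2gray(frame2im(mov(k))));  % RGB frames -> grayscale
end
x0 = zeros(1,N); y0 = zeros(1,N);
for k = 2:N
    [y0(k), x0(k)] = shift(bw(:,:,1), bw(:,:,k), rows, cols);  % FFT registration (chapter 2.1)
end
panorama = merger(bw, x0, y0, N);                    % stitch along the shift vector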
3.2 Shift.m
The "shift" function's inputs are two images (im1, im2) and their size (rows, cols). This function rotates and scales im2 so it best fits im1, and then calculates the relative shift between the frames based on the FFT algorithm, as described below.
At the beginning we preprocess the input images in order to make them square, as required for the log-polar transformation and the following calculations.
Then we transform the above results into the frequency domain using the fftn function, and calculate the absolute values.
F1=fftn(double(im1s));
F2=fftn(double(im2s));
F1=abs(fftshift(F1));
F2=abs(fftshift(F2));
The second step is to apply a high pass filter to the absolute values to remove low frequency noise, and a low pass filter with the radius of the square frame in order to remove direction dependence, i.e. to create a circle so that the transformation to log-polar coordinates will be more accurate.
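The exact filter shapes are not given in the text, so the following is only an illustrative sketch of this step, applied to the shifted spectra F1 and F2 of the square siz x siz frames:

[X,Y] = meshgrid(linspace(-1,1,siz));  % normalized frequency grid
r = hypot(X,Y);                        % distance from the spectrum centre
H = min(r,1);                          % simple high-pass: attenuates low frequencies
H(r>1) = 0;                            % low-pass disc: removes direction dependence
F1 = F1.*H;
F2 = F2.*H;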
Then we transform images from rectangular coordinates to log-polar coordinates using
the “imgpolarcoord.m” function.
log_F1=imgpolarcoord((F1),rows,cols);
log_F2=imgpolarcoord((F2),rows,cols);
In order to get some intuition about the process, below is an example of the calculations executed by MATLAB for a white square on a black background, comparing the original and rotated log-polar images (Image 3). The shift which represents the rotation in the log-polar images can be noticed by examining the white lines, which originate from the black half-circles, along the y axis:
Following the above calculations we apply the FFT to the log-polar images:
Flp1=fftn(double(log_F1));
Flp2=fftn(double(log_F2));
And now we are able to calculate the ratio R1 as defined in chapter 2.1. Following this step we calculate the inverse FFT of the ratio R1:
R1=Flp1.*conj(Flp2)./(abs(Flp1).*abs(Flp2));
IR1=ifft2(R1);
In order to obtain the values of the scale and rotation angle we need to find the location of the maximum of the absolute value of IR1. In the algorithm, one image must be chosen as the base image. It is assumed that the base image defines the reference orientation and that the object of the process is to register the second image to it. A positive rotation angle means that the image is rotated to the east relative to the base image (i.e. clockwise rotation), while a negative rotation angle means that the image is rotated to the west (i.e. counterclockwise rotation). It is also assumed that the rotation angle is in the interval [−45°, +45°], because normally the two images have similar orientations.
max_corr=max(max(abs(IR1)));
[pos_y, pos_x]=find(abs(IR1)==max_corr);
b=10^((log10(rows))/siz);
scale=b^(mod(pos_y,siz));
angle=(pos_x*pi / (2*siz))*(180/pi);
if angle>45
    angle=angle-90;
end
In our implementation we chose im1 as the base image. After calculating the relative scale and rotation angle of the second image (im2), we create a third image (im3). Since the rotation and scaling may change the size of the original matrix, we are required to resize im3 so it will be compatible with the base image. The rescaling and rotation of im2 is executed by the "img_sc_rot.m" function, which is based on two built-in MATLAB functions: imresize and imrotate.
im3=img_sc_rot(im2,1/scale,-angle);
im3=border_filter(im3);
The result for the above example, after calculating the rotation angle and correcting it, is shown in Image 4 (the rotated image corrected by the shift function).
When the frames are rotated, a bilinear interpolation is used. This means that pixels at the edge are averaged with zeros, which causes undesired artifacts during stitching. In order to deal with this we use a border filter; its implementation is explained in section 3.9.
Now we are ready to calculate the shift between the two images. Similarly to the previous steps, we are required to calculate the FFT of the two images. Then we calculate the ratio R2, defined in the previous section. Finally, in order to obtain the relative shift, we need to calculate the inverse FFT of the ratio R2 (IR2) and find the coordinates of the maximum of its absolute value.
F3=fftn(double(im3));
F1=fftn(double(im1));
R2=F1.*conj(F3)./(abs(F1).*abs(F3));
IR2=ifft2(R2);
max_corr=max(max(abs(IR2)));
[y,x]=find(abs(IR2)==max_corr);
The sign convention is that image shifts to the east or south are positive, while shifts to the west or north are negative. Also, special consideration is needed when the computed value (y and x) in the last step above is bigger than half of the image columns or rows (Table 1). The following approach is used to deal with these situations:
if (y>0.5*rows)
    y=y-rows;
end;
if (x>0.5*cols)
    x=x-cols;
end;

Status of computed shift value    Output shift value        Image shift direction
x < ½ columns                     positive (x)              to east
x > ½ columns                     negative (x - columns)    to west
y < ½ rows                        positive (y)              to south
y > ½ rows                        negative (y - rows)       to north

Table 1
3.3 Merger.m
The merger function creates a panorama from a set of discrete frames and the translation between each frame and the previous one.
As an input, the merger function gets the number of frames, the shift between every two frames, and the movie itself with the final version of every frame. In other words, if we use any kind of transform beyond translation, all of the frames are already transformed according to the model and the border filter is already applied.
The first step in the merger function is the creation of a common scale between all of the frames and a matrix of the right size to contain exactly the full panorama picture. The way to implement it is by using the "cumsum" MATLAB function, which creates a vector of cumulative sums of elements. These partial sums of the translation vector represent the translation between each frame and the first one; thus, a common scale for all of the frames is created.
The problem now is that the coordinates of the frames are limited to matrix indexes. This forces us to start our scale from the point (1,1). However, this is solved easily by subtracting the minimum of all frame locations as an offset and adding 1, as sketched below. Now everything is ready for creating a zero matrix of the right size, aimed to contain the panorama.
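A minimal sketch of these two steps (dx and dy are assumed to hold the per-frame translations; the names are illustrative):

absolute_x = cumsum(dx);                        % translation of each frame w.r.t. the first
absolute_y = cumsum(dy);
absolute_x = absolute_x - min(absolute_x) + 1;  % offset so that indexes start at (1,1)
absolute_y = absolute_y - min(absolute_y) + 1;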
%% the borders of the panorama
x_max=max(absolute_x);
x_min=min(absolute_x);
y_max=max(absolute_y);
y_min=min(absolute_y);
%% making the right size matrix for panorama + 3 color palettes
panoram=zeros(y_max+size_y,x_max+size_x);
absolute_x and absolute_y are the coordinates of the leftmost top point of each frame, and size_x, size_y are the frame sizes.
Next, the way of frame stitching has to be chosen. Thus, there are two versions of the merger function:
1) Normalized weighted sum
2) Weighted sum
Normalized weighted sum
First, the masks of the frames should be created:
temp=moviex(:,:,k);
temp3=double((temp>0));
This is done for every frame k (a 'for' loop is used).
Next, a function of weights should be chosen:
[ X,Y ] = meshgrid(linspace(-1,1,size_x),linspace(-1,1,size_y) );
a=1;
b=40;
w=1./((abs(b*X)).^a+(abs(b*Y)).^a);
To reduce smoothing, the gradient of the chosen function should be high enough, especially near the origin. This guarantees a dominant frame for most of the central pixels. On the other hand, near the edges the gradient should be low enough for smooth frame transitions. Among several implemented functions, which included an inverse parabola, a Gaussian, and (X^a + Y^a) for different values of a, the function shown above gave the best results.
The next step is the multiplication of the frame masks by the function of weights.
temp3=temp3.*w;
Now, a matrix of weights of every pixel in every frame may be created:
weight=zeros(y_max+size_y,x_max+size_x,frame_num);
and for every iteration k:
weight(absolute_y(k):(absolute_y(k)+size_y-1),absolute_x(k):(absolute_x(k)+size_x-1),k)= ...
    weight(absolute_y(k):(absolute_y(k)+size_y-1),absolute_x(k):(absolute_x(k)+size_x-1),k)+temp3;
The normalization factor is:
nirmul=sum(weight,3);
Now the panorama may be created, using the following formula:
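The formula is the normalized weighted sum of section 2.2.4; a sketch of this final step, with names following the surrounding code:

for k = 1:frame_num
    y = absolute_y(k):(absolute_y(k)+size_y-1);  % frame rows in panorama scale
    x = absolute_x(k):(absolute_x(k)+size_x-1);  % frame columns in panorama scale
    panoram(y,x) = panoram(y,x) + weight(y,x,k).*double(moviex(:,:,k));
end
panoram = panoram./max(nirmul, eps);             % normalize by the total weight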
Weighted stitching
This is another method of frame stitching, aimed at smoothing the transitions between frames as the normalized weighted stitch does. However, in contrast to normalized weighted stitching, this algorithm also aims to preserve the sharpness of the frames as much as possible.
The algorithm is based on finding the overlapping region between each two frames and making a fast transition function from one frame to another. This function should be fast enough so that as few frames as possible contribute to each panorama pixel, yet slow enough for a smooth transition.
The implementation of this algorithm uses the same idea of a function of weights; however, this time the "function of weights" has a different purpose, which will be explained later. For a translation mostly along the x axis (move_x > 2*move_y) the function is:
w=abs(X)+1;
While the masks are:
temp3=temp3.*w+(1-temp3)*(5);
weight(y,x,k)=temp3;
where 5 is just an arbitrary choice of a number greater than any value of the w function. Another option would be simply to change "5" into "inf".
Then, in order to make the code more readable, the x and y index vectors of each frame in the panorama scale are defined:
%% new frame place (all its y and x in panorama coordinates)
y=absolute_y(k):(absolute_y(k)+size_y-1); % the y axis of the new frame in panorama coordinates
x=absolute_x(k):(absolute_x(k)+size_x-1); % the x axis of the new frame in panorama coordinates
The next step is dividing the panorama and the frame into three regions: the overlapping region, the panorama-only region, and the frame-only region. The last two are straightforward:
%% masks (overlap, new, existing)
overlap=zeros(y_max+size_y,x_max+size_x);
frame=double(moviex(:,:,k)); % new frame
temp=(frame>0);              % frame mask
temp2=double(panoram>0);     % panorama mask
overlap(y,x)=temp+temp2(y,x);
overlap=double(overlap==2);  % overlap region
%% add new
add_frame=(1-overlap(y,x)).*temp.*frame;
The overlapping region, on the other hand, is handled in the following way: using the "function of weights", we find the dominant frame at every pixel and define the scale for the Gaussian function:
distt=min(weight(y,x,1:(k-1)),[],3); % dist exist
distp=distt(:,:,1);
distf=weight(y,x,k);
alfa=0.92;
l=abs(distp-alfa*distf);
The "1-Gaussian" function is defined with a standard deviation of several pixels, a peak value of 0.5, and its average at the transition line. Then the value of the pixels is calculated by multiplying the dominant frame by "1-Gaussian" and the other one by "Gaussian".
For a transition in the middle of the overlapping region the "alfa" parameter should be 1; however, with smaller "alfa" values the transition line is placed closer to the left (top) side of the new frame, which gives better results. This is due to the location of the (1,1) point, which is in the leftmost top corner of the frame.
Note that in the general case the alfa parameter may be obtained analytically (it is a linear function of the translation vector); however, "alfa=0.92" gives good results for most of the examined cases.
3.4 Main_SIFT.m
The "Main_SIFT.m" file is the main macro for implementing the feature-based algorithm. It reads the input file, calls the other functions following the algorithm, and its output is a panorama image.
As in the "Main_FFT.m" function, first we read the movie and convert the frames from the RGB color scheme to grayscale.
The next step is to extract features from the frames using the "sift.m" function. First some parameters should be initialized. Applying the "sift.m" function on the first frame provides a vector with the extracted descriptors and the matching matrix with location parameters (for more detailed information about the SIFT function, read below). As part of the initialization, two variables are also created, "loc1x" and "loc2x", which will be used to sort the matching points between two frames. In addition we define a ratio, distRatio, between two descriptors, which helps us to find the matching points between two frames.
Following the initialization we begin the main cycle of feature extraction and matching. Using a loop over the desired number of frames, we extract the descriptors and location parameters for the following frame. In order to find the matching descriptors between two frames, a vector of dot products between a descriptor from the first image and the descriptor vectors of the second frame is calculated. The values in the calculated vector represent the projections of the descriptors of the second frame on the descriptor of the first frame. In order to find the matching descriptor, the inverse cosine values are calculated and the vector is sorted.
dotprods = des1(l,:) * des2.';
[vals,indx] = sort(acos(dotprods));
The matches are identified by finding the 2 nearest neighbors of each keypoint from the first image among those in the second image, and only accepting a match if the distance to the closest neighbor is less than distRatio = 0.6 times the distance to the second closest neighbor. The threshold of 0.6 can be adjusted up to select more matches, or down to select only the most reliable ones. If the angle to the closest descriptor is smaller than distRatio times the angle to the second closest one, we copy the matching location parameters into loc1x and loc2x accordingly.
if (vals(1) < distRatio * vals(2))
loc1x(count,:)=loc1(l,:);
loc2x(count,:)=loc2(indx(1),:);
count=count+1;
end
After finding all the relevant points we calculate the transformation model using RANSAC. The inputs to the function are the location vectors (loc2x and loc1x) calculated previously, the number of point pairs used by RANSAC to calculate the model, the number of iterations, and the desired resolution.
In order to get a precise model, RANSAC uses 15 pairs of pixels and the desired threshold is half a pixel.
After a model is calculated, its parameters are stored in a matrix which represents the affine transformation. Then a transformation structure is calculated and the new position of the second frame is extracted.
[vmask,model]=RANSAC(loc2x, loc1x, 15, 1000,0.5);
if (k==2)
modl=model;
else
modl=[modl model];
end
A=[model(1) model(2) model(5);model(3) model(4) model(6)].';
tfrm=maketform('affine',A);
[y x]=tformfwd(tfrm,1,1);
As preparation for the next cycle, the descriptor and location vectors are updated for the new reference frame. By transforming the second frame we changed the locations of the features, so they also need to be multiplied by the transformation matrix. Since the shift parameters have no meaning here, only the first four values are taken.
transform=[model(1) model(2); model(3) model(4)];
loc1(:,1:2)=(transform*(loc2(:,1:2).')).';
loc1x=zeros(size(loc1));
loc2x=zeros(size(loc1));
After all the transformations are calculated, a matrix containing all the transformed images is created. The transformations are applied using the "imtransform" function. When an image is rotated and scaled, bilinear interpolation is used. This technique causes blurring and some artifacts on the edges. In order to deal with these problems, the "border_filter" function is used.
homo_frame(1:rows,1:cols,1)=bw_frame(:,:,1);
for k=2:N
    % affine
    A=[modl(1,k-1) modl(2,k-1) modl(5,k-1);modl(3,k-1) modl(4,k-1) modl(6,k-1)].';
    tfrm=maketform('affine',A);
    res=imtransform(bw_frame(:,:,k),tfrm,'Udata',[1 size_y],'Vdata',[1 size_x]);
    homo_frame(1:size(res,1),1:size(res,2),k)=res;
    homo_frame(:,:,k)=border_filter(uint8(homo_frame(:,:,k)));
end
Now all the frames are ready to be merged into one panorama image. By applying the "merger.m" function on the "homo_frame" matrix we construct the desired image.
panorama =merger(homo_frame,x0,y0,N);
3.5 RANSAC.m
The purpose of RANSAC function is to select the combination of points which will give us
the best model for transformation. In order to reach that goal the general RANSAC
implementation was adapted in order to meet our project requirements.
The inputs of the function are:
mData and mData2 – the location vectors extracted by SIFT function for two frames.
nSamleLen – the number of random pair of points used to calculate the model in each
iteration.
nIter – the number of iterations for each pair of frames.
dThreshold – the threshold for residuum.
The outputs are:
vMask – a vector which indicated the inliers in location vector. 1s set for inliers and 0s
for outliers.
Model – approximate model of the transformation between two images.
The model calculated by RANSAC is verified using the Least-Square method, as explained in chapter 2.2.3. In order to check it we need to build two matrices. This process is implemented by applying the functions "affinein" and "affineout" (see sections 3.6 and 3.7) on the location parameter vectors (mData and mData2) accordingly.
At the beginning of the main cycle, nSampLen pairs of points are chosen randomly. Then a model is calculated using the Least-Square method:
A_in=affinein(mData(:,Sample));
A_out=affineout(mData2(:,Sample));
ModelSet= inv((A_in.')*A_in)*(A_in.')*A_out;
In order to verify the model, we multiply the vector of location parameters from the first image by the calculated transformation. Then, using the "affinecheck" function (see section 3.8), we create a mask representing the inliers found for this model. By summing the mask we calculate an indication of the quality of the model:
transformed=A_in_check*ModelSet;
CurMask=affinecheck(A_out_check,transformed,dThreshold);
nCurInlyerCount = sum(CurMask);
If the newly calculated indicator, nCurInlyerCount, is higher than the previous one, the latest model is saved as the best found.
After choosing the best model, the model is calculated again using only the inliers, in order to get the most precise result possible:
A_in=affinein(mData(:,vMask));
A_out=affineout(mData2(:,vMask));
Model= inv((A_in.')*A_in)*(A_in.')*A_out;
The calculated model and the mask representing the quality of the model are returned to the main function.
3.6 Affinein.m
This function gets a 2 x N matrix, "mData", which contains the pixel coordinates from the first image; N is the number of points. The function returns the first matrix needed for the Least-Square algorithm calculations (see the related explanation in the "Theory" part) with the input data sorted as required.
In order to create this matrix, a temporary matrix "temp" of size N x 6 is created. Then the contents of "mData" are copied into the first two columns of "temp", and the 5th column is set to 1. Following this, the "upsample" function is used in order to create a temporary copy of "temp" with every second row set to zeros, "half1". Then we create "half2", which is a copy of "half1" with the columns shifted one column right and the rows shifted one row down. Then columns 2-4 of the "half2" matrix are shifted by one more column to the right. To complete the creation, "half2" is summed with "half1", and we get the desired matrix.
temp=zeros(size(mData,2),6);
temp(:,1:2)=mData.';
temp(:,5)=1;
half1=upsample(temp,2);
half2=circshift(half1,[1 1]);
half2(:,2:4)=circshift(half2(:,2:4),[0 1]);
res=half1+half2;
3.7 Affineout.m
This function gets a 2 x N matrix, "mData", which contains the pixel coordinates from the second image; N is the number of points. The function returns the result vector needed for the Least-Square algorithm calculations (see the related explanation in the "Theory" part) with the input data sorted as required.
In order to create this vector, temporary vectors "x" and "y" are created. They contain the x and y coordinates respectively, with zeros between the values. Then the contents of "y" are shifted down by one place. To complete the creation of the desired vector, "res", we sum the "x" and "y" vectors.
x=upsample(mData(1,:).',2);
y=upsample(mData(2,:).',2);
y=circshift(y,1);
res=x+y;
3.8 Affinecheck.m
This function gets two sets of data in the format f = [x1 y1 x2 y2 x3 y3 ...]. It finds the elements for which

$$(x_1 - x_2)^2 + (y_1 - y_2)^2 \leq \mathrm{threshold}$$

and returns the answer in the form of a mask. For example, if points 1, 2 and 4 do not satisfy the rule, while 3 and 5 satisfy it, the returned vector will be [0 0 1 0 1].'.
The function is used to verify the model calculated by RANSAC.
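A hedged sketch of this test, assuming the two inputs are column vectors in the interleaved [x1 y1 x2 y2 ...] format described above:

dx = A_out_check(1:2:end) - transformed(1:2:end);  % x residuals
dy = A_out_check(2:2:end) - transformed(2:2:end);  % y residuals
CurMask = ((dx.^2 + dy.^2) <= dThreshold);         % 1 = inlier, 0 = outlier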
3.9 Border_filter.m
After bilinear interpolation is applied to an image while rotating it, undesired artifacts are created on the borders. They are the result of averaging between an original pixel and the black background created during the rotation. This function removes the unwanted border artifacts and returns the image with sharp borders.
The filter is implemented using four convolutions between the image and a shifted delta function, one for each of four directions:
c1=[0,0,1];
tempx1=(conv2(double(im3),double(c1),'same'));
c2=[1,0,0];
tempx2=(conv2(double(im3),double(c2),'same'));
c3=[0;0;1];
tempy1=(conv2(double(im3),double(c3),'same'));
c4=[1;0;0];
tempy2=(conv2(double(im3),double(c4),'same'));
The results are copies of the image shifted by one pixel in each of the four directions; each is zero where the corresponding neighbor lies on the black background. Multiplying them together with the image itself (presumably held in "temp", which is set earlier in the function) gives a mask that is zero exactly at the pixels with artifacts:
temp=temp.*tempx1.*tempx2.*tempy1.*tempy2;
temp=uint8(temp>0);
After multiplying the created mask "temp" with the image, we get rid of the undesired pixels, and the corrected image is returned to the main function.
3.10 SIFT.m
This function reads an image and returns its SIFT keypoints according to David Lowe's algorithm. Additional outputs of this function are the descriptors: a K-by-128 matrix, where each row gives an invariant descriptor for one of the K keypoints. The descriptor is a vector of 128 values normalized to unit length. The function also returns a K-by-4 matrix, locs, in which each row has the 4 values for a keypoint location (row, column, scale, orientation). The orientation is in the range [-π, π] radians.
In our implementation we use a demo version of David Lowe's SIFT keypoint detector in the form of compiled binaries that can run under Linux or Windows. The demo software uses the PGM format for image input, so first we convert the input image into a PGM image file.
Then the function calls the executable file "siftWin32". If there is any error while reading the file or creating a temporary file, "tmp.key", for saving the keypoints, corresponding messages are shown. Otherwise two output matrices are created, "locs" and "descriptors", and the information is read from the tmp.key file.
The file format starts with 2 integers giving the total number of keypoints and the length of the descriptor vector for each keypoint (128). Then the location of each keypoint in the image is specified by 4 floating point numbers giving the pixel's row and column location, scale, and orientation (in the range [-π, π] radians). Obviously, these numbers are not invariant to viewpoint, but can be used in later stages of processing to check for geometric consistency among matches. Finally, the invariant descriptor vector for the keypoint is given as a list of 128 integers in the range [0, 255]. Keypoints from a new image can be matched to those from previous images by simply looking for the descriptor vector with the closest Euclidean distance among all vectors from previous images.
Each descriptor is normalized to unit length, i.e. divided by the square root of the sum of its squared components:
num = header(1);
len = header(2);
…
for i = 1:num
    [vector, count] = fscanf(g, '%f %f %f %f', [1 4]);
    if count ~= 4
        error('Invalid keypoint file format');
    end
    locs(i, :) = vector(1, :);
    [descrip, count] = fscanf(g, '%d', [1 len]);
    if (count ~= 128)
        error('Invalid keypoint file value.');
    end
    % Normalize each input vector to unit length
    descrip = descrip / sqrt(sum(descrip.^2));
    descriptors(i, :) = descrip(1, :);
end
In the end all the outputs are returned and the "tmp.key" file is closed.
4. Results and Conclusion
In this chapter we present the results obtained with the algorithms described in the previous chapters. The movie we used in order to demonstrate the abilities of the implemented code contains 46 frames. The movie was filmed at a rate of 8 frames/second, and the size of each frame is 240 x 320 pixels [height x width].
Below you can see the result of the FFT based algorithm (Image 5). The stitching method used to obtain this image is the weighted stitch. The creation of the panorama took approximately 10 seconds (depending on the hardware used to run the MATLAB code). Examining the result, we conclude that the parameters required to obtain the Translation, Euclidean and Similarity transformations are extracted correctly from the frames. The horizontal lines which define the windows of the left building are a good example of correct stitching.
The results obtained using the feature based algorithm differ in the stitching method. The best results were calculated using these parameters: number of points to create the model – 8; number of iterations – 100,000; threshold – 0.25 (which is 0.5 pixels). The creation using this approach with the above-mentioned parameters took approximately 6 hours.
The panorama below (Image 6) was created using the normalized weighted sum method for stitching. As you can see, all the edges of the stitched frames are smoothed; however, some areas are blurred. The reason for this is the image quality and minor inaccuracies in the transformation model obtained by RANSAC.
The second panorama obtained with the feature based algorithm uses the weighted stitch (Image 7). As you can see, the resulting image is less blurred; however, the small model inaccuracies can be noticed along a few stitching lines, caused mainly by the difference in illumination at the edges of the frames.
After examining the results presented above, we can summarize that in order to create a panorama which is mainly based on a translation transformation, it is wise to use the FFT algorithm, since the image we obtained is focused and the stitching is precise. When we are creating a panorama where an affine transformation should be obtained, we recommend using the weighted stitch, since the quality of the image is less damaged by the applied masks. If the stitching lines are not smoothed enough (this depends on the image quality, the illumination and more), the normalized weighted sum should be used.
5. Bibliography
1) Richard Szeliski, "Image Alignment and Stitching", (2005).
2) Matthew Brown and David G. Lowe, "Recognising panoramas", International Conference on Computer Vision (ICCV 2003), Nice, France (October 2003), pp. 1218-1225.
3) David G. Lowe, "Object recognition from local scale-invariant features", International Conference on Computer Vision, Corfu, Greece (September 1999), pp. 1150-1157.
4) Hongjie Xie, Nigel Hicks, G. Randy Keller, Haitao Huang, Vladik Kreinovich, "An IDL/ENVI implementation of the FFT-based algorithm for automatic image registration", Computers & Geosciences 29 (2003), pp. 1045-1055.
5) Marco Zuliani, "RANSAC for Dummies", a tutorial, (November 2008).
6) Matthew Brown and David G. Lowe, "Automatic panoramic image stitching using invariant features", International Journal of Computer Vision, 74(1) (2007), pp. 59-73.