Abstract

This project investigates a stitching algorithm that makes it possible to mosaic a set of images. These images form part of the output of a portable scanner. The project involves noise reduction, feature extraction, identification of matching features (feature correspondence), and estimation of the transformation parameters that describe the motion between the images. The algorithm uses a local structure matrix based operator to extract the points of interest (features). For noise reduction it employs a smoothing spatial filter. A singular value decomposition based technique is applied to the extracted features to identify corresponding points in the sequence of images. These matching points are needed in order to estimate the transformation parameters. This project discusses a procedure to obtain initial estimates of the transformation parameters; these estimates could then be refined by applying a least squares solution. For this project a data set library containing various images has been compiled which, to some extent, replicates the distortions that might occur in such applications. Observations made while implementing the algorithm reveal that the methodology employed to extract features and establish a correspondence between them works considerably well for images subjected to small distortions. Screen shots have been provided where applicable to assist in visualizing the implementation of the algorithm.

Chapter 1: Algorithm Specifications

1.1 Introduction

The main aim of the project was to investigate an algorithm that would make it possible to stitch (mosaic) a set of images. The process of stitching is considered complex because it depends on other issues such as noise (chapter 4), feature extraction (chapter 5), feature correspondence (chapter 6), and warping and interpolation (chapter 7).
The structures in these images are considered to be transformed under rigid motion, and the images form part of the output of a portable scanner. The following chapters describe the issues mentioned above, as well as a few additional issues for completeness. The methodologies adopted in developing this algorithm are based on suggestions from other researchers and are referenced where applicable.

1.2 Computation Environment and Algorithm Validation

The algorithm has been implemented in the Matlab® (Release 13) development environment, because features such as the Image Processing Toolbox and data visualisation (graphics) speed up algorithm development. The algorithm was validated using sample data (images) because it was not possible to obtain real data at the time of drafting this dissertation. The sample data is a compilation from various sources such as the internet, flatbed scanners, and synthetic data created using image editing tools (e.g. MS Paint). The images under consideration are in 24-bit colour bitmap format (.bmp). The test criterion for the algorithm is visual interpretation, i.e. the result of each operation is displayed as markers or lines superimposed on the input image.

1.3 Assumptions

1.3.1 Noise

Noise can occur in images in many forms, for example salt & pepper noise and random noise. Because original data was unavailable, it has not been possible to estimate the types of noise that may occur in this particular application. Hence, after considering similar applications and acquisition methods [1, 6], it is assumed that random noise is the more likely form in such applications, and consequently a suitable filter is applied to reduce this form of noise (chapter 4).
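
As an illustration of the kind of smoothing spatial filter assumed here, the sketch below builds a normalised 2-D Gaussian kernel and applies it to a small grey-level image. It is written in Python rather than the project's Matlab, and the kernel size, sigma, border handling (edge replication), and function names are assumptions made for illustration, not the implementation described in chapter 4.

```python
import math

def gaussian_kernel(size=3, sigma=1.0):
    """Return a size x size Gaussian kernel (size odd) whose weights sum to 1."""
    half = size // 2
    kernel = [[math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
               for x in range(-half, half + 1)]
              for y in range(-half, half + 1)]
    total = sum(sum(row) for row in kernel)
    return [[w / total for w in row] for row in kernel]

def smooth(image, kernel):
    """Filter a 2-D list of grey levels with the kernel, replicating edge pixels."""
    rows, cols = len(image), len(image[0])
    half = len(kernel) // 2
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            acc = 0.0
            for i, krow in enumerate(kernel):
                for j, w in enumerate(krow):
                    # Clamp neighbour coordinates so borders reuse the edge pixel.
                    rr = min(max(r + i - half, 0), rows - 1)
                    cc = min(max(c + j - half, 0), cols - 1)
                    acc += w * image[rr][cc]
            out[r][c] = acc
    return out

# Hypothetical test patch: flat grey background with one noisy pixel in the centre.
noisy = [[10.0] * 5 for _ in range(5)]
noisy[2][2] = 110.0
smoothed = smooth(noisy, gaussian_kernel(3, 1.0))
```

The isolated spike at the centre is pulled towards its neighbours, while pixels away from it keep the background value. A larger sigma spreads the spike further, at the cost of blurring genuine corners that the feature extraction stage relies on.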
1.3.2 Image Transformations

Considering that the end application or device allows three degrees of freedom, the acquired images could be distorted by translations, rotations, or a combination of both. Data acquired in this way resembles a rigid body under motion and can be analysed using the theory of kinematics of rigid bodies [18].

1.3.3 Images under consideration

The acquired data is stored in memory in a 24-bit bitmap format, and hence all sample images considered for the validation of the algorithm are in the same format (i.e. 24-bit true colour bitmap, .bmp). For this project the scaling factor is taken to be unity (one), and the images being processed are assumed to have similar intensities. The number of images processed is limited to two in order to prove the algorithm. The algorithm could be used to process more than two images with minor changes to the source code.

1.4 Methodology

Studying IEEE publications and other documentation related to this subject has provided an understanding of the project and the challenges involved. With due consideration to the research done on this topic and the suggestions from the references, a flow diagram (figure 1) is proposed which describes the method adopted in implementing the algorithm. The flow diagram illustrates the main tasks involved in the algorithm. The later chapters of this document give a more detailed description, explaining the technique and mathematics involved. In figure 1, I1 and I2 are the input images.
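
The rigid-motion assumption of section 1.3.2 can be made concrete with a short sketch. Under that assumption, a point (x, y) in one image maps to the other image through a rotation by an angle theta followed by a translation (tx, ty), which gives the three degrees of freedom mentioned there. The code is Python rather than Matlab, and the function names and sample values are hypothetical.

```python
import math

def rigid_transform(points, theta_deg, tx, ty):
    """Rotate each (x, y) point by theta_deg about the origin, then translate."""
    theta = math.radians(theta_deg)
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]

def distance(p, q):
    """Euclidean distance between two 2-D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

# Hypothetical feature positions in I1 and their positions after rigid motion.
corners = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0)]
moved = rigid_transform(corners, theta_deg=10.0, tx=5.0, ty=-2.0)

# Rigid motion preserves distances between points (scale factor of one).
before = distance(corners[0], corners[2])
after = distance(moved[0], moved[2])
```

Here `before` and `after` agree to within rounding error. This distance-preserving property is what separates a rigid transformation from a general affine one, and it is consistent with the unit scaling factor assumed in section 1.3.3.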
1.5 Flow Diagram

[Figure 1: the input images I1 and I2 each pass through NOISE REDUCTION and then FEATURE EXTRACTION; the two sets of features are then combined in FEATURE CORRESPONDENCE, followed by ALIGNMENT and WARPING & INTERPOLATION.]

Figure 1, Flow diagram

1.6 Document Structure

This document is organised according to the flow diagram (figure 1), with additional chapters that cover the development environment (chapter 3) and the end application. Chapter 4 discusses some of the types of noise that normally occur in images and the possibility of their interfering in the current application. Chapter 5 covers the feature extraction part of the algorithm, whereas chapter 6 focuses on the method of establishing correspondence among the extracted features. Chapter 6 also discusses the image transformations and the estimation of the transformation parameters under the initial assumption of rigid motion. Warping and interpolation, which produce the final stitched image, are discussed briefly in chapter 7. Details of the literature referred to are provided in the References section of the thesis. Appendices at the end of the document may assist with some of the applied mathematical concepts.

Chapter 8: Conclusion and Future Work

8.1 Conclusion

This project has investigated an algorithm that would make it possible to stitch (mosaic) a set of images. The proposed algorithm reduces the noise content in the acquired images (based on the initial assumptions, section 1.3), identifies features in a given set of images (chapter 5), establishes a correspondence between features from different images (chapter 6), and gives an initial estimate of the motion parameters (section 6.2) that assist in the alignment and stitching process. These processes have been validated using images from the data set library compiled for this purpose (section 2.2). It is observed that the operator employed to extract features from the images works considerably well for small rotations, i.e.
for angles <10° (Table 1, Figures 22 – 24, 34), but its performance varies as the angle of rotation increases, i.e. for angles >10° (Table 1, Figures 26, 28, 29, 30, 33, 35). The number of features extracted is at its maximum when the angle of rotation is 45° (Table 1). The influence of noise on the feature extraction process has been observed and is illustrated by a bar graph (Figure 21). It was also observed that the computation time increases with the number of features extracted (Table 2), because more image elements take part in the correspondence establishment process.

8.2 Future Work

Although a possible algorithm has been proposed, more validation is required for each block in Figure 1. Due to time constraints only a few test images have been considered. The optimality of the algorithm could be confirmed by extending the validation procedure to a wide variety of images and to original scanner output images. As part of the future work, experiments can be conducted on additional techniques for each block of the algorithm's flow diagram (Figure 1); these are discussed in the following sections.

8.2.1 Noise Reduction

Instead of the Gaussian smoothing filter, experiments can be conducted with other spatial filtering techniques such as alpha-trimmed mean filters, which are useful where images contain a combination of salt & pepper and Gaussian noise (more than one type of noise) [1]. Adaptive filtering could also be applied to reduce local noise while extracting points of interest (features).

8.2.2 Feature Extraction

It has been observed that the Harris operator tends to identify more features when images are subject to a rotation of about 45°.
In order to confirm this, experiments were initially conducted on images distorted using image editing tools, but these tools induced unwanted artefacts such as extra edges and image borders. Hence, to avoid these artefacts, documents were scanned using an HP-Precision Pro flatbed scanner. Sample documents were scanned while intentionally disturbing the scanning procedure. The observations are shown in Figures 24 – 35 (Chapter 5). Work can therefore be carried out in this area to identify methods that can extract features from images subjected to larger angles of rotation.

In addition to this extension of the current work on feature extraction, another area worth experimenting with is the derivative operator: for this project a normal derivative operator has been applied; a Sobel operator could be applied instead. A possible advantage of the Sobel derivative operator is its weight of 2 in the centre row or column, which could help in the smoothing operation by giving more importance to the centre pixel [1].

Normal derivative operator

    -1  0  1        -1 -1 -1
    -1  0  1         0  0  0
    -1  0  1         1  1  1
       (a)              (b)

Figure 2, (a), (b) x & y - directional derivative masks, respectively

Sobel operator

    -1  0  1        -1 -2 -1
    -2  0  2         0  0  0
    -1  0  1         1  2  1
       (a)              (b)

Figure 3, (a), (b) x & y - directional Sobel masks, respectively

8.2.3 Feature Correspondence

In chapter 6 a methodology to identify corresponding points from a set of features was discussed and the implementation procedure presented. This method works well only for small rotations (< 10°). This might be because some features that are not in the overlapping region (outliers) also participate in the matching process. For the current project, rogue features or outliers (i.e. features that do not correspond to anything) have not been considered; hence work could be carried out to identify and minimize the participation of outliers in feature matching.
This may be useful in optimizing the stitching process. Huynh et al. as well as Press et al. provide information about detecting and minimizing the errors induced by these outliers [14, 17]. As an alternative method of identifying a possible 1:1 correspondence between the features extracted from different images, the Random Sample Consensus (RANSAC) algorithm could be implemented. RANSAC can be used for robust fitting of models in the presence of many outliers [10].

8.2.4 Motion Parameter Estimation

Section 6.2 described a method by which initial estimates of the motion parameters can be obtained. Since this is only an initial estimate, a least squares approximation can be applied to minimize the error in the estimation. This minimization is given by equation 8.1 [15]:

\[ E = \sum_{i=1}^{n} \left\| X'_i - (R X_i + t) \right\|^2 \qquad (8.1) \]

where X'_i is the i-th feature in I2, X_i is the matching feature in I1, R is the rotation matrix, t is the translation vector, and n is the number of matched features.

In addition to the above-mentioned extensions to the current project work, frequency domain techniques could be explored in order to reduce the overall computation time. Reddy and Chatterji have described an FFT based technique to estimate the motion parameters which is claimed to be computationally less expensive [16]. Other frequency domain techniques such as the Discrete Cosine Transform (DCT) and wavelet transforms could be used to reduce the overall computational cost of the algorithm.

References:

[1] Gonzalez.R.C and Woods.R.E, “Digital Image Processing”, Second Edition, Prentice Hall, Inc., ISBN 0-201-18075-8, 2001.
[2] Heung-Yeung Shum and Szeliski.R, “Panoramic image mosaics”, Technical report, MSR-TR-97-23 (updated), 1997.
[3] Weisstein.E.W, MathWorld: A Wolfram Web Resource, http://mathworld.wolfram.com/AffineTransformation.html
[4] Hartley.R and Zisserman.A, “Multiple view geometry in computer vision”, Second Edition.
Cambridge University Press, ISBN 0521540518, March 2004.
[5] Zappala.T et al, “Document mosaicing”, University of Cambridge, BMVC97 proceedings, 1997.
[6] Harris.C and Stephens.M, “A Combined Corner and Edge Detector”, Proc. 4th Alvey Vision Conference, Manchester, U.K., pp. 147 – 151, 1988.
[7] Ramoser.H et al, “Efficient alignment of fingerprint images”, Pattern Recognition, Proceedings. 16th International Conference on, Volume 3, 11-15 Aug. 2002, pp. 748 – 751.
[8] Scott.G and Longuet-Higgins.H, “An algorithm for associating the features of two patterns”, Proc. Royal Society London, volume B244, pp. 21 – 26, 1991.
[9] Wallis.J.W and Miller.T.R, “An Optimal Rotator for Iterative Reconstruction”, IEEE Transactions on Medical Imaging, Vol. 16, No. 1, Feb. 1997.
[10] Fischler.M.A and Bolles.R.C, “Random Sample Consensus: a paradigm for model fitting with applications to image analysis and automated cartography”, Communications of the ACM, 24(6), pp. 381 – 395, 1981.
[11] Pilu.M, “Uncalibrated Stereo Correspondence by Singular Value Decomposition”, HP Laboratories Bristol, HPL-97-96, August 1997.
[12] Denton.J and Beveridge.J.R, “Two dimensional projective point matching”, Image Analysis and Interpretation, 2002. Proceedings. Fifth IEEE Southwest Symposium on, 7-9 April 2002, pp. 77 – 81.
[13] Stegmann.M.B, “Image Warping”, Informatics and Mathematical Modeling, Technical University of Denmark, Richard Petersens Plads, Building 321, DK-2800 Kgs. Lyngby, Denmark, 29th October 2001.
[14] Huynh.D.Q et al, “Outlier Detection in Video Sequences under Affine Projection”, IEEE, 2001, pp. 695 – 701.
[15] Meshoul.S and Batouche.M, “A fully automatic method for feature-based image registration”, Systems, Man and Cybernetics, 2002 IEEE International Conference on, Volume 4, 6-9 Oct. 2002, 5 pp.
[16] Reddy.B.S and Chatterji.B.N, “An FFT-based technique for translation, rotation, and scale-invariant image registration”, IEEE Transactions on Image Processing, Vol. 5, Issue 8, Aug. 1996, pp. 1266 – 1271.
[17] Press.W.H et al, “Numerical Recipes in C: The Art of Scientific Computing”, Second Edition, Cambridge University Press, ISBN 0-521-43108-5, 1992.
[18] Huang.T.S and Netravali.A.N, “Motion and structure from feature correspondences: A review”, Proceedings of the IEEE, Vol. 82, Issue 2, Feb. 1994, pp. 252 – 268.
[19] Smith.S.W, “The Scientist and Engineer's Guide to Digital Signal Processing”, Second Edition, California Technical Publishing, ISBN 0-9660176-6-8, 1999.
[20] Rockett.P.I, “Performance Assessment of feature detection algorithms: A methodology and case study on corner detectors”, IEEE Trans. on Image Processing, Vol. 12, No. 12, December 2003, pp. 1668 – 1676.
[21] Glasbey.C.A et al, “A Review of Image warping methods”, Journal of Applied Statistics, 25, pp. 155 – 171, 1998.
[22] Matlab Release 13, Documentation, Mathworks Inc., 2002.
[23] Stroud.K.A, “Further Engineering Mathematics”, 3rd edition, Palgrave Macmillan, ISBN 0333657411, 1996.
[24] Chanereley.A, Class handouts, Digital signal processing, School of Computing and Technology, University of East London.