Variable-Length Signature for Near-Duplicate Image Matching

ABSTRACT:
We propose a variable-length signature for near-duplicate image matching. An image is represented by a signature whose length varies with the number of patches in the image. A new visual descriptor, the probabilistic center-symmetric local binary pattern, is proposed to characterize the appearance of each image patch. Beyond each individual patch, the spatial relationships among the patches are also captured. To compute the similarity between two images, we use the earth mover's distance, which handles variable-length signatures well. The proposed image signature is evaluated in two applications: near-duplicate document image retrieval and near-duplicate natural image detection. The experimental results demonstrate the validity and effectiveness of the proposed variable-length signature.

EXISTING SYSTEM:
Kim employed ordinal measures of the discrete cosine transform coefficients to represent an image and used the L1 norm to compute image similarity. Liu and Yang built a color difference histogram for an image, which encodes the color and edge orientations of the image in a uniform framework; the similarity of two images was then computed with the enhanced Canberra distance. Aksoy and Haralick proposed line-angle-ratio statistics and co-occurrence variances to represent an image, organized into a 28-dimensional feature vector, and compared different similarity measures in an image retrieval scenario. Meng et al. first represented an image by a 279-dimensional feature vector. For similarity computation, they proposed the enhanced Dynamic Partial Function, which adaptively activates a different number of features for each image pair to accommodate the characteristics of that pair. Chum et al.
represented an image based on its color histograms and then employed Locality Sensitive Hashing (LSH) for fast retrieval. For the sake of computational efficiency, some works first embedded the vectorial representations into binary codes.

DISADVANTAGES OF EXISTING SYSTEM:
In the bag-of-visual-words model, the spatial layout of the visual words is disregarded entirely, which introduces ambiguity during matching.

PROPOSED SYSTEM:
A visual descriptor named Probabilistic Center-symmetric Local Binary Pattern (PCSLBP) is proposed to depict patch appearance; it is flexible in the presence of image distortions. Beyond each individual patch, we also describe the relationships among the patches, namely the distance between every pair of patches in the image. A weight is assigned to each patch to indicate its contribution to identifying the image. Given the characteristics of all the patches, the image is represented by a signature. The advantage of signatures over fixed-length vectors is that their length varies across images, reflecting each image's characteristics. To compute the similarity between two images, we employ the Earth Mover's Distance, which copes well with variable-length signatures. Furthermore, it handles unstable patch extraction naturally by allowing many-to-many patch correspondence.

ADVANTAGES OF PROPOSED SYSTEM:
We further justify the proposed patch extraction approach by comparing it with a commonly used image segmentation method, Watershed; the comparison shows the advantage of the proposed approach. A descriptor of patch visual appearance should be robust to variations in image orientation, illumination and scale. In our work, we propose a patch visual appearance descriptor, namely the
Probabilistic Center-symmetric Local Binary Pattern (PCSLBP), which is an improvement on the Center-symmetric Local Binary Pattern (CSLBP).

SYSTEM ARCHITECTURE:
[Figure: system architecture diagram. Recoverable block labels: "Source Image", "Patch Extraction", "Patch 1", "Patch 2", "Patch n", "PCSLBP (Probabilistic Center-symmetric Local Binary Pattern)", "Signatures", "Matching Score of", "Return a Set of Near"]

SYSTEM REQUIREMENTS:

HARDWARE REQUIREMENTS:
System : Pentium IV 2.4 GHz
Hard Disk : 40 GB
Floppy Drive : 1.44 MB
Monitor : 15 VGA Colour
Mouse : Logitech
RAM : 512 MB

SOFTWARE REQUIREMENTS:
Operating system : Windows XP/7
Coding Language : MATLAB
Tool : MATLAB R2013A

REFERENCE:
Li Liu, Yue Lu, Senior Member, IEEE, and Ching Y. Suen, Fellow, IEEE, "Variable-Length Signature for Near-Duplicate Image Matching", IEEE Transactions on Image Processing, vol. 24, no. 4, April 2015.
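
ILLUSTRATIVE SKETCH (CSLBP BASELINE):
For orientation, the baseline CSLBP descriptor that PCSLBP improves on can be sketched in a few lines. The sketch below is plain CSLBP, not the probabilistic variant proposed in the paper, and it is written in dependency-free Python rather than the MATLAB environment listed above. The function name cslbp_histogram, the 3x3 square neighborhood (no interpolation on a circle), and the threshold value 0.01 are illustrative assumptions, not details taken from the source.

```python
def cslbp_histogram(patch, threshold=0.01):
    """Return a normalized 16-bin CSLBP histogram for a 2-D grayscale patch.

    `patch` is a list of rows with intensities assumed to lie in [0, 1].
    `threshold` is a hypothetical default chosen only for illustration.
    """
    # (row, col) offsets of the 8 neighbors, ordered around the center so
    # that neighbor i and neighbor i + 4 are diametrically opposite.
    offsets = [(0, 1), (-1, 1), (-1, 0), (-1, -1),
               (0, -1), (1, -1), (1, 0), (1, 1)]
    rows, cols = len(patch), len(patch[0])
    hist = [0] * 16  # 4 center-symmetric pairs -> 2**4 possible codes

    # Border pixels are skipped because they lack a full 3x3 neighborhood.
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            code = 0
            for i in range(4):
                dr1, dc1 = offsets[i]
                dr2, dc2 = offsets[i + 4]
                diff = patch[r + dr1][c + dc1] - patch[r + dr2][c + dc2]
                # Set bit i when the pair differs by more than the threshold.
                if diff > threshold:
                    code |= 1 << i
            hist[code] += 1

    total = sum(hist)
    return [h / total for h in hist] if total else hist


# Usage: a patch whose intensity increases from left to right.
ramp = [[0.1 * c for c in range(5)] for _ in range(5)]
print(cslbp_histogram(ramp))
```

On the left-to-right ramp above, every interior pixel receives code 3 (binary 0011), so the whole histogram mass falls in bin 3; a perfectly flat patch would instead put all mass in bin 0. Because each code compares opposite neighbors rather than neighbors against the center, the histogram has 16 bins instead of the 256 of a standard 8-neighbor LBP.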