International Journal of Advancements in Research & Technology, Volume 2, Issue 4, April-2013 ISSN 2278-7763 1 357 Retrieving Content Based Images with Query point technique based on K-mean Clustering Miss Dipti S. Zambre, Miss Sonal P. Patil 1 Research Fellow of Department of Computer Science, G.H.R.I.E.M., Jalgaon, India; 2Department of Computer Science, G.H.R.I.E.M., Jalgaon, India Email: 1zambredipti@gmail.com, 2sonalpatil3@gmail.com ABSTRACT This paper provides an overview of the technical achievements in the research area of k-means clustering in content-based image retrieval (CBIR). K-means clustering is a powerful technique in CBIR systems, in order to improve the performance of CBIR effectively. It is an open research area to the researcher to reduce the local maximum trapping and number of iteration to find target image. The paper covers the current state of art of the research in k-means clustering in CBIR and query point method for the system. Keywords : Clustering, K-means clustering, content based image retrieval, feature extraction. 1 INTRODUCTION Image retrieval is the process of browsing, searching and retrieving images from a large database of digital images. The collection of images in the web are growing larger and becoming more diverse. Retrieving images from such large collections is a challenging problem. One of the main problems they highlighted was the difficulty of locating a desired image in a large and varied collection. While it is perfectly possible to identify a desired image from a small collection simply by browsing, more effective techniques are needed with collections containing thousands of items. To search for images, a user may provide query terms such as keyword, image file/link, or click on some image, and the system will return images "similar" to the query. The similarity used for search criteria could be meta tags, color distribution in images, region/shape attributes, etc. Unfortunately, image retrieval systems have not kept pace with the collections they are searching. The shortcomings of these systems are due both to the image representations they use and to their methods of accessing those representations to find images. The problems of image retrieval are becoming widely recognized, and the search for solutions an increasingly active area for research and development. In recent years, with large scale storing of images the need to have an efficient method of image searching and retrieval has increased. It can simplify many tasks in many application areas such as biomedicine, forensics, artificial intelligence, military, education, web image searching etc. while dealing with this sort of information like organizing and searching large volumes of images in databases, users can have difficulties, as the current commercial database systems are designed for textual data, it is not well suited and compatible for digital images. Therefore there is a need for an efficient way for image retrieval. There are different ways to retrieve the images in CBIR including QBIC [1], Photobook [2], MARS [3], NeTra [4], Copyright © 2013 SciResPub. PicHunter [5], Blobworld [6], VisualSEEK [7]. In a typical CBIR system, low-level visual image features (e.g., color, texture, and shape) are automatically extracted for image descriptions and indexing purposes. To search for desirable images, a user presents an image as an example of similarity, and the system returns a set of similar images based on the extracted features. In CBIR systems with relevance feedback (RF), a user can mark returned images as positive or negative, which are then fed back into the systems as a new, refined query for the next round of retrieval. The process is repeated until the user is satisfied with the query result. Relevance feedback uses the user’s knowledge for the marking images, but user’s knowledge may be accurate or inaccurate. It is not 100% accurate knowledge. Due to this we will get more number of irrelevant images and inaccuracy. To avoid that inaccuracy we are not integrating relevance feedback in proposed system. To overcome the inaccuracy we are proposing k-means clustering technique which can be applied for scalable image retrieval from large databases. Image classification or categorization has often been treated as a pre-processing step for speeding-up image retrieval in large databases and improving accuracy, or for performing automatic image annotation. Image clustering inherently depends on a similarity measure, image categorization has been performed by varied methods that neither require nor make use of similarity metrics. Image categorization is often followed by a step of similarity measurement, restricted to those images in a large database that belong to the same visual class as predicted for the query. In such cases, the retrieval process is intertwined, whereby categorization and similarity matching steps together form the retrieval process. Similar arguments hold for clustering as well, due to which, in many cases, it is also a fundamental “early” step in image retrieval. IJOART 358 International Journal of Advancements in Research & Technology, Volume 2, Issue 4, April-2013 ISSN 2278-7763 2 PROBLEM DEFINITIONS Many CBIR systems have been developed for efficient and effective image retrieval from a large multimedia database. In a typical CBIR system, low-level visual image features (e.g., color, texture, and shape) are automatically extracted for image descriptions and indexing purposes. To search for desirable images, a user presents an image as an example of similarity, and the system returns a set of similar images based on the extracted features. There are two general types of image search: target search and category search. The goal of target search is to find a specific (target) image, such as a registered logo, a historical photograph, or a particular painting. The goal of category search is to retrieve a given semantic class or genre of images, such as scenery images or skyscrapers. These strategy leads to the following disadvantages: Local Maximum Trap Problem: In local maximum trap there is no guarantee that the target can be found. Slow convergence: In slow convergence problem more number of iterations is required to lead to the target image. The proposed methodology will be used to retrieve images in the database that are nearest to query image. The methodology efficiently retrieves image, base on CBIR concept, in which a clustering approach is used to efficiently prune the search space and initializing it to minimum bounding box in order to lead to target image with minimum number of iterations and overcome the problem of local maximum trap and slow convergence. 2 images. So that it reduces the storage required and hence the system becomes faster and effective in CBIR. Once the features are extracted, they are stored in the database for future use. The degree to which a computer can extract meaningful information from the image is the most powerful key to the advancement of intelligent image interpreting systems. One of the biggest advantages of feature extraction is that, it significantly reduces the information (compared to the original image) to represent an image for understanding the content of that image. The most commonly used features are color, texture, and shape. 3 SYSTEM ARCHITECTURE Fig .3.1 shows the general scheme of image retrieval from a database using K-mean clustering. Image retrieval is according to the similarities between the query image and images in image database, ignoring the similarities between images in image database. The paper applies the clustering algorithm to further explore the similarities between images in image database for reducing the image retrieval space. To retrieve the image from the database, we first extract feature vectors from images (the features can be shape, color, texture etc), then store feature vectors into another database for future use. When given query image, we similarly extract its feature vectors, and match those features with database image features. If the distance between two images feature vectors is small enough; we consider the corresponding image in the database similar to the query. The system could be any real value symmetric image retrieval system. First, extract the image features of each image in image database and apply the clustering algorithm to analysis the similarities of images in the database for constructing the images clustering database, then, input the query image, extracting its features and comparing the similarities between features of it and those of images in image clustering database, and output the best matching results. A. FEATURE EXTRACTION Feature extraction is the basis of content based image retrieval. It involves extracting the meaningful information from the Copyright © 2013 SciResPub. Fig 3.1: System Architecture a. Colour Features Color is linked to the chromatic part of an image. Color represents one of the most widely used visual features in CBIR systems. First a color space is used to represent color images. Typically, RGB space where the gray level intensity is represented as the sum of red, green and blue gray level intensities [8].In image retrieval a histogram is employed to represent the distribution of colors in image [9]. A color histogram provides allotment of colors which is achieved by damaging image color and plus how many numbers of pixels fit into every color. For the whole collection every image’s color histogram is examined and saved in the database. Retrieval of those images has been done in the matching process whose color allotment matches to the example query very much [10]. IJOART 359 International Journal of Advancements in Research & Technology, Volume 2, Issue 4, April-2013 ISSN 2278-7763 b. Texture Features Texture is another important property of images. It refers to the visual patterns that have property of homogeneity or arrangement that do not result from the presence of only a single color or intensity. By dissimilarity in brightness with high frequencies in the image spectrum, textures are characterized. While making a distinction between areas of the mages with same color, these features are very useful. Measuring of image texture such as the degree of coarseness, contrast, directionality, line likeness, regularity and roughness can be calculated using second-order statistics [10]. c. Shape Features By either the global form of the shape or local elements of its boundary, shape features can be differentiated. Global form of the shape: like the area, the extension and the major axis orientation. Local elements of its boundary: like corners, characteristics points or curvature elements. The degree of similarity between two shapes is evaluated through standard mathematical distances measures, like Euclidian distance between two pairs [11]. B. Similarity Measures In similarity measure, the query image feature vector and database image feature vector are compared using the distance metric. The images are ranked based on the distance value. It is proposed in [12], the detailed comparison of nine different metrics such as Manhattan, weighted mean-variance, Euclidean, Chebychev, Mahanobis etc distance for texture image retrieval with empirical evaluation. They found that Canberra and Bray-Curtis distance metrics performed exceptionally well than all other distance metrics. 4 K-MEANS CLUSTERING Clustering algorithm has been widely used in computer vision such as image segmentation and database organization. KMeans is a clustering method based on the optimization of an overall measure of clustering quality is known for its efficiency in producing accurate results in image retrieval. In K-means clustering algorithms, group the images into clusters based on the color content, the clustered images apply to K-Means, so that we can get better favoured image results. After clustering and selecting the cluster centers, the given query image is first compared with all the cluster centers. The clusters are ranked according to their similarity with the query. Then the query image is compared directly with the images in these clusters. Thus, the number of comparisons is reduced considerably from comparing the query with all the images in the database. This process iterates until the criterion function converges. Thus, the retrieval will be very accurate. The number of similarity comparisons required depends on the sizes of the clusters and the number of clusters being examined. It leads to the better performance. Copyright © 2013 SciResPub. 3 The purpose of clustering is to group images whose feature vectors are similar by similarity judgment standard; meanwhile to separate the dissimilar images. Clustering algorithms can be broadly divided into two groups: hierarchical and partitional. Hierarchical clustering algorithms recursively find nested clusters either in agglomerative mode (starting with each data point in its own cluster and merging the most similar pair of clusters successively to form a cluster hierarchy) or in divisive (top-down) mode (starting with all the data points in one cluster and recursively dividing each cluster into smaller clusters). Compared to hierarchical clustering algorithms, partitional clustering algorithms find all the clusters simultaneously as a partition of the data and do not impose a hierarchical structure. Input to a hierarchical algorithm is an n*n similarity matrix, where n is the number of objects to be clustered. On the other hand, a partitional algorithm can use either an n*d pattern matrix, where n objects are embedded in a d-dimensional feature space, or an n*n similarity matrix. Note that a similarity matrix can be easily derived from a pattern matrix, but ordination methods such as multi-dimensional scaling (MDS) are needed to derive a pattern matrix from a similarity matrix. The most popular and the simplest partitional algorithm is Kmeans. Since partitional algorithms are preferred in pattern recognition due to the nature of available data, K-means has a rich and diverse history as it was independently discovered in different scientific fields, it is one of the most widely used algorithms for clustering. Ease of implementation, simplicity, efficiency, and empirical success are the main reasons for its popularity. In the paper, we apply k-means algorithm to analysis images similarities in the database [13]. 5 QUERY POINT MOVEMENT TECHNIQUE It is essentially tries to improve the estimate of the ‘ideal query point’ by moving it towards good example points and away from bad example points. Query is represented by a single point in a feature space and refinement process attempts to move that point toward the direction where relevant points were located. The frequently used technique to iteratively improve this estimation is Rocchio’s formula [14]. Query processing technique evaluate three types of queries (i.e., sampling queries, constrained sampling queries, and constrained k-NN queries) based on our four target search methods. The query cost is the sum of disk seek (including cylinder seek and rotation), data transfer, and CPU time, in which seek time dominates the total query cost. Query processing technique, designed to minimize the disk I/O cost [15]. 6 CONCLUSIONS Image retrieval algorithms always use the similarity between the query image and images in image database. However, they ignore the similarities between images in image database. In this paper we addressed this problem by introducing a clustering approach for image retrieval by finding image similarity clustering to reduce the images retrieving space. IJOART 360 International Journal of Advancements in Research & Technology, Volume 2, Issue 4, April-2013 ISSN 2278-7763 K-means clustering, which also can improve the efficiency of image retrieving and evidently promote retrieval precision. This k-means algorithm independent on the feature extraction algorithm is used as a post-processing step in retrieval. The improvements in selecting neighbourhood vertices of the retrieval results from tradition image retrieval system in image feature space could also improve the recall rate. We can conclude from the results that the proposed system achieve high retrieval performance, reduce the local maximum trapping and number of iteration. Due to which the retrieval speed is increased. 4 [15] Danzhou Liu,Kein A. Hua, Khanh Vu, Ning Yu, “ Fast Query Point Movement Techniques for Large CBIR Systems”, IEEE Trans on Know and data engg, vol.21 no.5,May 2009. REFERENCES [1] M. Flickner, H.S. Sawhney, J. Ashley, Q. Huang, B. Dom, M. Gorkani, J. Hafner, D. Lee, D. Petkovic, D. Steele, and P. Yanker,“Query by Image and Video Content: The QBIC System,”Computer, vol. 28, no. 9, pp. 23-32, Sept. 1995. [2] A. Pentland, R.W. Picard, and S. Sclaroff, “Photobook: Content-Based Manipulation for Image Databases,” Int’l J. Computer Vision,vol. 18, no. 3, pp. 233-254, 1996. [3] M. Ortega- Binderberger and S. Mehrotra, “Relevance FeedbackTechniques in the MARS Image Retrieval Systems,”Multimedia Systems, vol. 9, no. 6, pp. 535-547, 2004. [4] W.Y. Ma and B. Manjunath, “Netra: A Toolbox for NavigatingLarge Image Databases,” Proc. IEEE Int’l Conf. Image Processing (ICIP ’97), pp. 568-571, 1997. [5] I.J. Cox, M.L. Miller, T.P. Minka, T.V. Papathomas, and P.N. Yianilos, “The Bayesian Image Retrieval System, PicHunter: Theory, Implementation, and Psychophysical Experiments,” IEEE Trans. Image Processing, vol. 9, no. 1, pp. 20-37, 2000. [6] C. Carson, S. Belongie, H. Greenspan, and J. Malik, “Blobworld: Image Segmentation Using Expectation-Maximization and ItsApplication to Image Querying,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 24, no. 8, pp. 1026-1038, Aug. 2002. [7] J.R. Smith and S.-F. Chang, “VisualSEEk: A Fully Automated Content-Based Image Query System,” Proc. Fourth ACM Int’l Conf. Multimedia (MULTIMEDIA ’96), pp. 87-98, 1996. [8] A.C. She and T. S. Huang, "Segmentation of road scene using color and fractal based texture classification," in Proc.ICIP, Austin,Nov 1994. [9] J. R Smith and S.F.Chang, "A fully automated content based image query systems," in Proc. ACM Multimedia 96, 1996. [10] C. Carson, S. Belongie, H.Greenspan and J.malik "Image segmentation using expectation maximization and its application to image querying" IEEE Trans. Pattern Anal. Mach. Intell, 2002, p.1026-1038 [11] M.Borowski, L.Brocker, S. Heisterkamp, J. Loffler, “Structuring the visual content of Digital Libraries Using CBIR Systems”, IEEE, pp.288-293,2000. [12] Manesh Kokare, B.N. Chatterji and P.K Biswas, “Comparision of Similarity Metrics for Texture Image Retrieval” in IEEE TENCON, pp. 571-575, 2003. [13] Hong Liu 1, and Xiaohong Yu 2, “Application Research of kmeans Clustering Algorithm in Image Retrieval System”, in Proc ISCSCT, China, Dec. 2009 [14] Rocchio JJ, “Relevance Feedback in Information Retrieval” in the SMART Retrieval system, pp. 313-323, prentice Hall. 1971. Copyright © 2013 SciResPub. IJOART